Compare commits


38 Commits

Author SHA1 Message Date
Anna Mikhlin
908a82bea0 release: prepare for 5.2.0-rc2 2023-02-28 10:13:06 +02:00
Gleb Natapov
39158f55d0 lwt: do not destroy capture in upgrade_if_needed lambda since the lambda is used more than once
If on the first call the capture is destroyed the second call may crash.

Fixes: #12958

Message-Id: <Y/sks73Sb35F+PsC@scylladb.com>
(cherry picked from commit 1ce7ad1ee6)
2023-02-27 14:19:37 +02:00
Raphael S. Carvalho
22c1685b3d sstables: Temporarily disable loading of first and last position metadata
It's known that reading large cells in reverse causes large allocations.
Source: https://github.com/scylladb/scylladb/issues/11642

The loading is preliminary work for splitting large partitions into
fragments composing a run, so that such a run can later be read
efficiently using the position metadata.

The splitting is not turned on yet, anywhere. Therefore, we can
temporarily disable the loading, as a way to avoid regressions in
stable versions. Large allocations can cause stalls due to foreground
memory eviction kicking in.
The default values for position metadata say that the first and last
positions include all clustering rows. They aren't used anywhere
other than by sstable_run to determine whether a run is disjoint at
the clustering level, and given that no splitting is done yet, this
does not really matter.

Unit tests relying on position metadata were adjusted to enable
the loading, such that they can still pass.

Fixes #11642.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #12979

(cherry picked from commit d73ffe7220)
2023-02-27 08:58:34 +02:00
Botond Dénes
9ba6fc73f1 mutation_compactor: only pass consumed range-tombstone-change to validator
Currently all consumed range tombstone changes are unconditionally
forwarded to the validator, even if they are shadowed by a higher-level
tombstone and/or purgeable. This can result in a situation where a range
tombstone change was seen by the validator but not passed to the
consumer. The validator expects the range tombstone change to be closed
by end-of-partition, but the end fragment won't come as the tombstone was
dropped, resulting in a false-positive validation failure.
Fix by passing to the validator only those tombstones that are actually
passed to the consumer.

Fixes: #12575

Closes #12578

(cherry picked from commit e2c9cdb576)
2023-02-23 22:52:47 +02:00
Botond Dénes
f2e2c0127a types: unserialize_value for multiprecision_int,bool: don't read uninitialized memory
Check the first fragment before dereferencing it: the fragment might be
empty, in which case we move to the next one.
Found by running range scan tests with random schema and random data.

Fixes: #12821
Fixes: #12823
Fixes: #12708

Closes #12824

(cherry picked from commit ef548e654d)
2023-02-23 22:38:03 +02:00
Gleb Natapov
363ea87f51 raft: abort applier fiber when a state machine aborts
After 5badf20c7a the applier fiber does not
stop after it gets an abort error from a state machine, which may trigger
an assertion because the previous batch is not applied. Fix it.

Fixes #12863

(cherry picked from commit 9bdef9158e)
2023-02-23 14:12:12 +02:00
Kefu Chai
c49fd6f176 tools/schema_loader: do not return ref to a local variable
We should never return a reference to a local variable.
In this change, a reference to a static variable is returned
instead. This addresses the following warning from Clang 17:

```
/home/kefu/dev/scylladb/tools/schema_loader.cc:146:16: error: returning reference to local temporary object [-Werror,-Wreturn-stack-address]
        return {};
               ^~
```

Fixes #12875
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12876

(cherry picked from commit 6eab8720c4)
2023-02-22 22:02:43 +02:00
Takuya ASADA
3114589a30 scylla_coredump_setup: fix coredump timeout settings
We currently configure only TimeoutStartSec, but that is probably not
enough to prevent coredump timeouts, since TimeoutStartSec is the maximum
waiting time for service startup, and there is another directive that
specifies the maximum service running time (RuntimeMaxSec).

To fix the problem, we should specify RuntimeMaxSec and TimeoutSec (which
configures both TimeoutStartSec and TimeoutStopSec).

Fixes #5430

Closes #12757

(cherry picked from commit bf27fdeaa2)
2023-02-19 21:13:36 +02:00
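A hypothetical systemd drop-in illustrating the two directives (the unit name and values are assumptions for illustration, not the exact ones the script writes):

```ini
# e.g. /etc/systemd/system/systemd-coredump@.service.d/timeout.conf
[Service]
# TimeoutSec sets both TimeoutStartSec and TimeoutStopSec.
TimeoutSec=infinity
# RuntimeMaxSec caps how long the service may run once started.
RuntimeMaxSec=infinity
```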
Anna Stuchlik
34f68a4c0f doc: related https://github.com/scylladb/scylladb/issues/12658, fix the service name in the upgrade guide from 2022.1 to 2022.2
Closes #12698

(cherry picked from commit 826f67a298)
2023-02-17 12:17:48 +02:00
Botond Dénes
b336e11f59 Merge 'doc: fix the service name from "scylla-enterprise-server" to "scylla-server"' from Anna Stuchlik
Related https://github.com/scylladb/scylladb/issues/12658.

This fixes a bug in the upgrade guides for the released versions.

Closes #12679

* github.com:scylladb/scylladb:
  doc: fix the service name in the upgrade guide for patch releases versions 2022
  doc: fix the service name in the upgrade guide from 2021.1 to 2022.1

(cherry picked from commit 325246ab2a)
2023-02-17 12:16:52 +02:00
Anna Stuchlik
9ef73d7e36 doc: fixes https://github.com/scylladb/scylladb/issues/12754, document the metric update in 5.2
Closes #12891

(cherry picked from commit bcca706ff5)
2023-02-17 12:16:13 +02:00
Botond Dénes
8700a72b4c Merge 'Backport compaction-backlog-tracker fixes to branch-5.2' from Raphael "Raph" Carvalho
Both patches are important to fix inefficiencies when updating the backlog tracker, which can manifest as a reactor stall on special events like schema change.

No conflicts when backporting.

Regression since 1d9f53c881, which is present in branch 5.1 onwards.

Closes #12851

* github.com:scylladb/scylladb:
  compaction: Fix inefficiency when updating LCS backlog tracker
  table: Fix quadratic behavior when inserting sstables into tracker on schema change
2023-02-15 07:22:25 +02:00
Raphael S. Carvalho
886dd3e1d2 compaction: Fix inefficiency when updating LCS backlog tracker
The LCS backlog tracker uses the STCS tracker for L0. It turns out the
LCS tracker calls the STCS tracker's replace_sstables() with empty
arguments even when *only* higher levels (> 0) had sstables replaced.
This unnecessary call causes the STCS tracker to recompute
the L0 backlog, yielding the same value as before.

As LCS has a fragment size of 0.16G on higher levels, we may be
updating the tracker multiple times during incremental compaction,
which operates on SSTables on higher levels.

The inefficiency is fixed by updating the STCS tracker only if an
L0 sstable is being added to or removed from the table.

This may also fix a quadratic behavior during boot or refresh,
as new sstables are loaded one by one.
Higher levels have a substantially higher number of sstables, so
updating the STCS tracker only on level-0 changes greatly reduces
the number of times the L0 backlog is recomputed.

Refs #12499.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #12676

(cherry picked from commit 1b2140e416)
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-02-14 12:14:27 -03:00
Raphael S. Carvalho
f565f3de06 table: Fix quadratic behavior when inserting sstables into tracker on schema change
Each time the backlog tracker is informed about a new or old sstable, it
recomputes the static part of the backlog, whose complexity is
proportional to the total number of sstables.
On schema change, we call backlog_tracker::replace_sstables()
for each existing sstable, which produces O(N^2) complexity.

Fixes #12499.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #12593

(cherry picked from commit 87ee547120)
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-02-14 12:14:21 -03:00
Anna Stuchlik
76ff6d981c doc: related https://github.com/scylladb/scylladb/issues/12754, add the requirement to upgrade Monitoring to version 4.3
Closes #12784

(cherry picked from commit c7778dd30b)
2023-02-10 10:28:35 +02:00
Botond Dénes
f924f59055 Merge 'Backport test.py improvements to 5.2' from Kamil Braun
Backport the following improvements for test.py efficiency and user experience:
- https://github.com/scylladb/scylladb/pull/12542
- https://github.com/scylladb/scylladb/pull/12560
- https://github.com/scylladb/scylladb/pull/12564
- https://github.com/scylladb/scylladb/pull/12563
- https://github.com/scylladb/scylladb/pull/12588
- https://github.com/scylladb/scylladb/pull/12613
- https://github.com/scylladb/scylladb/pull/12569
- https://github.com/scylladb/scylladb/pull/12612
- https://github.com/scylladb/scylladb/pull/12549
- https://github.com/scylladb/scylladb/pull/12678

Fixes #12617

Closes #12770

* github.com:scylladb/scylladb:
  test/pylib: put UNIX-domain socket in /tmp
  Merge 'test/pylib: scylla_cluster: ensure there's space in the cluster pool when running a sequence of tests' from Kamil Braun
  Merge 'test.py: manual cluster pool handling for Python suite' from Alecco
  Merge 'test.py: handle broken clusters for Python suite' from Alecco
  test/pylib: scylla_cluster: don't leak server if stopping it fails
  Merge 'test/pylib: scylla_cluster: improve server startup check' from Kamil Braun
  test/pylib: scylla_cluster: return error details from test framework endpoints
  test/pylib: scylla_cluster: release cluster IPs when stopping ScyllaClusterManager
  test/pylib: scylla_cluster: mark cluster as dirty if it fails to boot
  test: disable commitlog O_DSYNC, preallocation
2023-02-08 15:09:09 +02:00
Nadav Har'El
d5cef05810 test/pylib: put UNIX-domain socket in /tmp
The "cluster manager" used by the topology test suite uses a UNIX-domain
socket to communicate between the cluster manager and the individual tests.
The socket is currently located in the test directory, but there is a
problem: on Linux, the length of the path used as a UNIX-domain socket
address is limited to just a little over 100 bytes. In Jenkins runs, the
test directory names are very long, and we sometimes go over this length
limit, with the result that test.py fails to create the socket.

In this patch we simply put the socket in /tmp instead of the test
directory. We only need to do this change in one place - the cluster
manager, as it already passes the socket path to the individual tests
(using the "--manager-api" option).

Tested by cloning Scylla in a very long directory name.
A test like ./test.py --mode=dev test_concurrent_schema fails before
this patch, and passes with it.

Fixes #12622

Closes #12678

(cherry picked from commit 681a066923)
2023-02-07 17:12:14 +01:00
Nadav Har'El
e0f4e99e9b Merge 'test/pylib: scylla_cluster: ensure there's space in the cluster pool when running a sequence of tests' from Kamil Braun
`ScyllaClusterManager` is used to run a sequence of test cases from
a single test file. Between two consecutive tests, if the previous test
left the cluster 'dirty', meaning the cluster cannot be reused, it would
free up space in the pool (using `steal`), stop the cluster, then get a
new cluster from the pool.

Between the `steal` and the `get`, a concurrent test run (with its own
instance of `ScyllaClusterManager`) could start, because there was free
space in the pool.

This resulted in undesirable behavior when we ran tests with
`--repeat X` for a large `X`: we would start with e.g. 4 concurrent
runs of a test file, because the pool size was 4. As soon as one of the
runs freed up space in the pool, we would start another concurrent run.
Soon we'd end up with 8 concurrent runs. Then 16 concurrent runs. And so
on. We would have a large number of concurrent runs, even though the
original 4 runs didn't finish yet. All of these concurrent runs would
compete waiting on the pool, and waiting for space in the pool would
take longer and longer (the duration is linear w.r.t. the number of
competing concurrent runs). Tests would then time out because they would
have to wait too long.

Fix that by using the new `replace_dirty` function introduced to the
pool. This function frees up space by returning a dirty cluster and then
immediately takes it away to be used for a new cluster. Thanks to this,
we will only have at most as many concurrent runs as the pool size. For
example with --repeat 8 and pool size 4, we would run 4 concurrent runs
and start the 5th run only when one of the original 4 runs finishes,
then the 6th run when a second run finishes and so on.

The fix is preceded by a refactor that replaces `steal` with `put(is_dirty=True)`
and a `destroy` function passed to the pool (now the pool is responsible
for stopping the cluster and releasing its IPs).

Fixes #11757

Closes #12549

* github.com:scylladb/scylladb:
  test/pylib: scylla_cluster: ensure there's space in the cluster pool when running a sequence of tests
  test/pylib: pool: introduce `replace_dirty`
  test/pylib: pool: replace `steal` with `put(is_dirty=True)`

(cherry picked from commit 132af20057)
2023-02-07 17:08:17 +01:00
Kamil Braun
6795715011 Merge 'test.py: manual cluster pool handling for Python suite' from Alecco
From reviews of https://github.com/scylladb/scylladb/pull/12569, avoid
using `async with` and access the `Pool` of clusters with
`get()`/`put()`.

Closes #12612

* github.com:scylladb/scylladb:
  test.py: manual cluster handling for PythonSuite
  test.py: stop cluster if PythonSuite fails to start
  test.py: minor fix for failed PythonSuite test

(cherry picked from commit 5bc7f0732e)
2023-02-07 17:07:43 +01:00
Nadav Har'El
aa9e91c376 Merge 'test.py: handle broken clusters for Python suite' from Alecco
If the after-test check fails (is_after_test_ok is False), discard the cluster and raise an exception so the context manager (pool) does not recycle it.

Ignore exception re-raised by the context manager.

Fixes #12360

Closes #12569

* github.com:scylladb/scylladb:
  test.py: handle broken clusters for Python suite
  test.py: Pool discard method

(cherry picked from commit 54f174a1f4)
2023-02-07 17:07:36 +01:00
Kamil Braun
ddfb9ebab2 test/pylib: scylla_cluster: don't leak server if stopping it fails
`ScyllaCluster.server_stop` had this piece of code:
```
        server = self.running.pop(server_id)
        if gracefully:
            await server.stop_gracefully()
        else:
            await server.stop()
        self.stopped[server_id] = server
```

We observed `stop_gracefully()` failing due to a server hanging during
shutdown. We then ended up in a state where neither `self.running` nor
`self.stopped` had this server. Later, when releasing the cluster and
its IPs, we would release that server's IP - but the server might have
still been running (all servers in `self.running` are killed before
releasing IPs, but this one wasn't in `self.running`).

Fix this by popping the server from `self.running` only after
`stop_gracefully`/`stop` finishes.

Make an analogous fix in `server_start`: put `server` into
`self.running` *before* we actually start it. If the start fails, the
server will be considered "running" even though it isn't necessarily,
but that is OK - if it isn't running, then trying to stop it later will
simply do nothing; if it is actually running, we will kill it (which we
should do) when clearing after the cluster; and we don't leak it.

Closes #12613

(cherry picked from commit a0ff33e777)
2023-02-07 17:05:20 +01:00
Nadav Har'El
d58a3e4d16 Merge 'test/pylib: scylla_cluster: improve server startup check' from Kamil Braun
Don't use a range scan, which is very inefficient, to perform a query for checking CQL availability.

Improve logging when waiting for server startup times out. Provide details about the failure: whether we managed to obtain the Host ID of the server and whether we managed to establish a CQL connection.

Closes #12588

* github.com:scylladb/scylladb:
  test/pylib: scylla_cluster: better logging for timeout on server startup
  test/pylib: scylla_cluster: use less expensive query to check for CQL availability

(cherry picked from commit ccc2c6b5dd)
2023-02-07 17:05:02 +01:00
Kamil Braun
2ebac52d2d test/pylib: scylla_cluster: return error details from test framework endpoints
If an endpoint handler throws an exception, the details of the exception
are not returned to the client. Normally this is desirable so that
information is not leaked, but in this test framework we do want to
return the details to the client so it can log a useful error message.

Do it by wrapping every handler into a catch clause that returns
the exception message.

Also modify a bit how HTTPErrors are rendered so it's easier to discern
the actual body of the error from other details (such as the params used
to make the request etc.)

Before:
```
E test.pylib.rest_client.HTTPError: HTTP error 500: 500 Internal Server Error
E
E Server got itself in trouble, params None, json None, uri http+unix://api/cluster/before-test/test_stuff
```

After:
```
E test.pylib.rest_client.HTTPError: HTTP error 500, uri: http+unix://api/cluster/before-test/test_stuff, params: None, json: None, body:
E Failed to start server at host 127.155.129.1.
E Check the log files:
E /home/kbraun/dev/scylladb/testlog/test.py.dev.log
E /home/kbraun/dev/scylladb/testlog/dev/scylla-1.log
```

Closes #12563

(cherry picked from commit 2f84e820fd)
2023-02-07 17:04:37 +01:00
Kamil Braun
b536614913 test/pylib: scylla_cluster: release cluster IPs when stopping ScyllaClusterManager
When we obtained a new cluster for a test case after the previous test
case left a dirty cluster, we would release the old cluster's used IP
addresses (`_before_test` function). However, we would not release the
last cluster's IP after the last test case. We would run out of IPs with
sufficiently many test files or `--repeat` runs. Fix this.

Also reorder the operations a bit: stop the cluster (and release its
IPs) before freeing up space in the cluster pool (i.e. call
`self.cluster.stop()` before `self.clusters.steal()`). This reduces
concurrency a bit - fewer Scyllas running at the same time, which is
good (the pool size gives a limit on the desired max number of
concurrently running clusters). Killing a cluster is quick so it won't
make a significant difference for the next guy waiting on the pool.

Closes #12564

(cherry picked from commit 3ed3966f13)
2023-02-07 17:04:19 +01:00
Kamil Braun
85df0fd2b1 test/pylib: scylla_cluster: mark cluster as dirty if it fails to boot
If a cluster fails to boot, it saves the exception in
`self.start_exception` variable; the exception will be rethrown when
a test tries to start using this cluster. As explained in `before_test`:
```
    def before_test(self, name) -> None:
        """Check that  the cluster is ready for a test. If
        there was a start error, throw it here - the server is
        running when it's added to the pool, which can't be attributed
        to any specific test, throwing it here would stop a specific
        test."""
```
It's arguable whether we should blame some random test for a failure
that it didn't cause, but nevertheless, there's a problem here: the
`start_exception` will be rethrown and the test will fail, but then the
cluster will be simply returned to the pool and the next test will
attempt to use it... and so on.

Prevent this by marking the cluster as dirty the first time we rethrow
the exception.

Closes #12560

(cherry picked from commit 147dd73996)
2023-02-07 17:03:56 +01:00
Avi Kivity
cdf9fe7023 test: disable commitlog O_DSYNC, preallocation
Commitlog O_DSYNC is intended to make Raft and schema writes durable
in the face of power loss. To make O_DSYNC performant, we preallocate
the commitlog segments, so that the commitlog writes only change file
data and not file metadata (which would require the filesystem to commit
its own log).

However, in tests, this causes each ScyllaDB instance to write 384MB
of commitlog segments. This overloads the disks and slows everything
down.

Fix this by disabling O_DSYNC (and therefore preallocation) during
the tests. They can't survive power loss, and run with
--unsafe-bypass-fsync anyway.

Closes #12542

(cherry picked from commit 9029b8dead)
2023-02-07 17:02:59 +01:00
Beni Peled
8ff4717fd0 release: prepare for 5.2.0-rc1 2023-02-06 22:13:53 +02:00
Kamil Braun
291b1f6e7f service/raft: raft_group0: prevent double abort
There was a small chance that we called `timeout_src.request_abort()`
twice in the `with_timeout` function, first by the timeout and then by
shutdown. `abort_source` fails on an assertion in this case. Fix this.

Fixes: #12512

Closes #12514

(cherry picked from commit 54170749b8)
2023-02-05 18:31:50 +02:00
Kefu Chai
b2699743cc db: system_keyspace: take the reserved_memory into account
Before this change, we returned the total memory managed by Seastar
in the "total" field of system.memory, but this value only reflects
the memory managed by Seastar's allocator. If
`reserve_additional_memory` is set when starting app_template,
Seastar's memory subsystem reserves a chunk of memory of the
specified size for the system and takes the remaining memory. Since
f05d612da8, we set this value to 50MB for the wasmtime runtime; hence
the `TestRuntimeInfoTable.test_default_content` test in dtest
fails. The test expects the size passed via the `--memory` option
to be identical to the value reported by system.memory's
"total" field.

After this change, the "total" field takes the memory reserved
for wasm udf into account. The "total" field should reflect the total
amount of memory used by Scylla, no matter how a given portion
of the allocated memory is used.

Fixes #12522
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12573

(cherry picked from commit 4a0134a097)
2023-02-05 18:30:05 +02:00
Botond Dénes
50ae73a4bd types: is_tuple(): handle reverse types
Currently reversed types match the default case (false), even though they
might be wrapping a tuple type. One user-visible effect of this is that
a schema which has a reversed<frozen<UDT>> clustering key component
will have this component incorrectly represented in the schema's CQL dump:
the UDT will lose the frozen attribute. When attempting to recreate
the schema based on the dump, it will fail, as only frozen UDTs are
allowed in primary key components.

Fixes: #12576

Closes #12579

(cherry picked from commit ebc100f74f)
2023-02-05 18:20:21 +02:00
Calle Wilund
c3dd4a2b87 alternator::streams: Sort tables in list_streams to ensure no duplicates
Fixes #12601 (maybe?)

Sort the set of tables on ID. This should ensure we never
generate duplicates in a paged listing here. Can obviously miss things if they
are added between paged calls and end up with a "smaller" UUID/ARN, but that
is to be expected.

(cherry picked from commit da8adb4d26)
2023-02-05 17:44:00 +02:00
Benny Halevy
0f9fe61d91 view: row_lock: lock_ck: find or construct row_lock under partition lock
Since we're potentially searching for the row_lock in parallel with
acquiring the read_lock on the partition, we're racing with row_locker::unlock,
that may erase the _row_locks entry for the same clustering key, since
there is no lock to protect it up until the partition lock has been
acquired and the lock_partition future is resolved.

This change moves the code to search for or allocate the row lock
_after_ the partition lock has been acquired to make sure we're
synchronously starting the read/write lock function on it, without
yielding, to prevent this use-after-free.

This adds an allocation for copying the clustering key in advance
even if a row_lock entry already exists, that wasn't needed before.
It only us slows down (a bit) when there is contention and the lock
already existed when we want to go locking. In the fast path there
is no contention and then the code already had to create the lock
and copy the key. In any case, the penalty of copying the key once
is tiny compared to the rest of the work that view updates are doing.

This is required on top of 5007ded2c1 as
seen in https://github.com/scylladb/scylladb/issues/12632
which is closely related to #12168 but demonstrates a different race
causing use-after-free.

Fixes #12632

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 4b5e324ecb)
2023-02-05 17:22:31 +02:00
Anna Stuchlik
59d30ff241 docs: fixes https://github.com/scylladb/scylladb/issues/12654, update the links to the Download Center
Closes #12655

(cherry picked from commit 64cc4c8515)
2023-02-05 17:19:56 +02:00
Anna Stuchlik
fb82dff89e doc: fixes https://github.com/scylladb/scylladb/issues/12672, fix the redirects to the Cloud docs
Closes #12673

(cherry picked from commit 2be131da83)
2023-02-05 17:17:35 +02:00
Kefu Chai
b588b19620 cql3/selection: construct string_view using char* not size
Before this change, we constructed an sstring from a comma expression,
which evaluates to the return value of `name.size()`, but what we
expected was `sstring(const char*, size_t)`.

In this change:

* instead of passing the size of the string_view,
  both its address and size are used
* `std::string_view` is constructed instead of sstring, for better
  performance, as we don't need to perform a deep copy

The issue is reported by GCC-13:

```
In file included from cql3/selection/selectable.cc:11:
cql3/selection/field_selector.hh:83:60: error: ignoring return value of function declared with 'nodiscard' attribute [-Werror,-Wunused-result]
        auto sname = sstring(reinterpret_cast<const char*>(name.begin(), name.size()));
                                                           ^~~~~~~~~~
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12666

(cherry picked from commit 186ceea009)

Fixes #12739.
2023-02-05 13:50:48 +02:00
Michał Chojnowski
608ef92a71 commitlog: fix total_size_on_disk accounting after segment file removal
Currently, segment file removal first calls `f.remove_file()` and
does `total_size_on_disk -= f.known_size()` later.
However, `remove_file()` resets `known_size` to 0, so in effect
the freed space is not accounted for.

`total_size_on_disk` is not just a metric. It is also responsible
for deciding whether a segment should be recycled -- it is recycled
only if `total_size_on_disk - known_size < max_disk_size`.
Therefore this bug has dire performance consequences:
if `total_size_on_disk - known_size` ever exceeds `max_disk_size`,
the recycling of commitlog segments will stop permanently, because
`total_size_on_disk - known_size` will never go back below
`max_disk_size` due to the accounting bug. All new segments from this
point will be allocated from scratch.

The bug was uncovered by a QA performance test. It isn't easy to trigger --
it took the test 7 hours of constant high load to step into it.
However, the fact that the effect is permanent, and degrades the
performance of the cluster silently, makes the bug potentially quite severe.

The bug can be easily spotted with Prometheus as infinitely rising
`commitlog_total_size_on_disk` on the affected shards.

Fixes #12645

Closes #12646

(cherry picked from commit fa7e904cd6)
2023-02-01 21:54:37 +02:00
Kamil Braun
d2732b2663 Merge 'Enable Raft by default in new clusters' from Kamil Braun
New clusters that use a fresh conf/scylla.yaml will have `consistent_cluster_management: true`, which will enable Raft, unless the user explicitly turns it off before booting the cluster.

People using existing yaml files will continue without Raft, unless consistent_cluster_management is explicitly requested during/after upgrade.

Also update the docs: cluster creation and node addition procedures.

Fixes #12572.

Closes #12585

* github.com:scylladb/scylladb:
  docs: mention `consistent_cluster_management` for creating cluster and adding node procedures
  conf: enable `consistent_cluster_management` by default

(cherry picked from commit 5c886e59de)
2023-01-26 12:21:55 +01:00
Anna Mikhlin
34ab98e1be release: prepare for 5.2.0-rc0 2023-01-18 14:54:36 +02:00
51 changed files with 437 additions and 202 deletions

View File

@@ -72,7 +72,7 @@ fi
 # Default scylla product/version tags
 PRODUCT=scylla
-VERSION=5.2.0-dev
+VERSION=5.2.0-rc2
 if test -f version
 then

View File

@@ -145,19 +145,24 @@ future<alternator::executor::request_return_type> alternator::executor::list_str
     auto table = find_table(_proxy, request);
     auto db = _proxy.data_dictionary();
     auto cfs = db.get_tables();
-    auto i = cfs.begin();
-    auto e = cfs.end();
     if (limit < 1) {
         throw api_error::validation("Limit must be 1 or more");
     }
     // TODO: the unordered_map here is not really well suited for partial
     // querying - we're sorting on local hash order, and creating a table
     // between queries may or may not miss info. But that should be rare,
     // and we can probably expect this to be a single call.
+    // # 12601 (maybe?) - sort the set of tables on ID. This should ensure we never
+    // generate duplicates in a paged listing here. Can obviously miss things if they
+    // are added between paged calls and end up with a "smaller" UUID/ARN, but that
+    // is to be expected.
+    std::sort(cfs.begin(), cfs.end(), [](const data_dictionary::table& t1, const data_dictionary::table& t2) {
+        return t1.schema()->id().uuid() < t2.schema()->id().uuid();
+    });
+    auto i = cfs.begin();
+    auto e = cfs.end();
     if (streams_start) {
-        i = std::find_if(i, e, [&](data_dictionary::table t) {
+        i = std::find_if(i, e, [&](const data_dictionary::table& t) {
             return t.schema()->id().uuid() == streams_start
                 && cdc::get_base_table(db.real_database(), *t.schema())
                 && is_alternator_keyspace(t.schema()->ks_name())

View File

@@ -409,7 +409,9 @@ public:
                 l0_old_ssts.push_back(std::move(sst));
             }
         }
-        _l0_scts.replace_sstables(std::move(l0_old_ssts), std::move(l0_new_ssts));
+        if (l0_old_ssts.size() || l0_new_ssts.size()) {
+            _l0_scts.replace_sstables(std::move(l0_old_ssts), std::move(l0_new_ssts));
+        }
     }
 };

View File

@@ -553,4 +553,16 @@ murmur3_partitioner_ignore_msb_bits: 12
 # WARNING: It's unsafe to set this to false if the node previously booted
 # with the schema commit log enabled. In such case, some schema changes
 # may be lost if the node was not cleanly stopped.
-force_schema_commit_log: true
+force_schema_commit_log: true
+
+# Use Raft to consistently manage schema information in the cluster.
+# Refer to https://docs.scylladb.com/master/architecture/raft.html for more details.
+# The 'Handling Failures' section is especially important.
+#
+# Once enabled in a cluster, this cannot be turned off.
+# If you want to bootstrap a new cluster without Raft, make sure to set this to `false`
+# before starting your nodes for the first time.
+#
+# A cluster not using Raft can be 'upgraded' to use Raft. Refer to the aforementioned
+# documentation, section 'Enabling Raft in ScyllaDB 5.2 and further', for the procedure.
+consistent_cluster_management: true

View File

@@ -80,7 +80,7 @@ public:
     virtual sstring assignment_testable_source_context() const override {
         auto&& name = _type->field_name(_field);
-        auto sname = sstring(reinterpret_cast<const char*>(name.begin(), name.size()));
+        auto sname = std::string_view(reinterpret_cast<const char*>(name.data()), name.size());
         return format("{}.{}", _selected, sname);
     }

View File

@@ -2116,6 +2116,9 @@ future<> db::commitlog::segment_manager::do_pending_deletes() {
     clogger.debug("Discarding segments {}", ftd);
     for (auto& [f, mode] : ftd) {
+        // `f.remove_file()` resets known_size to 0, so remember the size here,
+        // in order to subtract it from total_size_on_disk accurately.
+        size_t size = f.known_size();
         try {
             if (f) {
                 co_await f.close();
@@ -2132,7 +2135,6 @@ future<> db::commitlog::segment_manager::do_pending_deletes() {
             }
         }
-        auto size = f.known_size();
         auto usage = totals.total_size_on_disk;
         auto next_usage = usage - size;
@@ -2165,7 +2167,7 @@ future<> db::commitlog::segment_manager::do_pending_deletes() {
         // or had such an exception that we consider the file dead
         // anyway. In either case we _remove_ the file size from
         // footprint, because it is no longer our problem.
-        totals.total_size_on_disk -= f.known_size();
+        totals.total_size_on_disk -= size;
     }
     // #8376 - if we had an error in recycling (disk rename?), and no elements

View File

@@ -401,6 +401,10 @@ public:
     named_value<uint64_t> wasm_udf_yield_fuel;
     named_value<uint64_t> wasm_udf_total_fuel;
     named_value<size_t> wasm_udf_memory_limit;
+    // wasm_udf_reserved_memory is static because the options in db::config
+    // are parsed using seastar::app_template, while this option is used for
+    // configuring the Seastar memory subsystem.
+    static constexpr size_t wasm_udf_reserved_memory = 50 * 1024 * 1024;
     seastar::logging_settings logging_settings(const log_cli::options&) const;


@@ -2276,7 +2276,10 @@ public:
add_partition(mutation_sink, "trace_probability", format("{:.2}", tracing::tracing::get_local_tracing_instance().get_trace_probability()));
co_await add_partition(mutation_sink, "memory", [this] () {
struct stats {
uint64_t total = 0;
// take the pre-reserved memory into account, as seastar only returns
// the stats of memory managed by the seastar allocator, but we instruct
// it to reserve additional memory for the system.
uint64_t total = db::config::wasm_udf_reserved_memory;
uint64_t free = 0;
static stats reduce(stats a, stats b) { return stats{a.total + b.total, a.free + b.free}; }
};


@@ -85,29 +85,25 @@ future<row_locker::lock_holder>
row_locker::lock_ck(const dht::decorated_key& pk, const clustering_key_prefix& cpk, bool exclusive, db::timeout_clock::time_point timeout, stats& stats) {
mylog.debug("taking shared lock on partition {}, and {} lock on row {} in it", pk, (exclusive ? "exclusive" : "shared"), cpk);
auto tracker = latency_stats_tracker(exclusive ? stats.exclusive_row : stats.shared_row);
auto ck = cpk;
// Create a two-level lock entry for the partition if it doesn't exist already.
auto i = _two_level_locks.try_emplace(pk, this).first;
// The two-level lock entry we've just created is guaranteed to be kept alive as long as it's locked.
// Initiating read locking in the background below ensures that even if the two-level lock is currently
// write-locked, releasing the write-lock will synchronously engage any waiting
// locks and will keep the entry alive.
future<lock_type::holder> lock_partition = i->second._partition_lock.hold_read_lock(timeout);
auto j = i->second._row_locks.find(cpk);
if (j == i->second._row_locks.end()) {
// Not yet locked, need to create the lock. This makes a copy of cpk.
try {
j = i->second._row_locks.emplace(cpk, lock_type()).first;
} catch(...) {
// If this emplace() failed, e.g., out of memory, we fail. We
// could do nothing - the partition lock we already started
// taking will be unlocked automatically after being locked.
// But it's better form to wait for the work we started, and it
// will also allow us to remove the hash-table row we added.
return lock_partition.then([ex = std::current_exception()] (auto lock) {
// The lock is automatically released when "lock" goes out of scope.
// TODO: unlock (lock = {}) now, search for the partition in the
// hash table (we know it's still there, because we held the lock until
// now) and remove the unused lock from the hash table if still unused.
return make_exception_future<row_locker::lock_holder>(std::current_exception());
});
return lock_partition.then([this, pk = &i->first, row_locks = &i->second._row_locks, ck = std::move(ck), exclusive, tracker = std::move(tracker), timeout] (auto lock1) mutable {
auto j = row_locks->find(ck);
if (j == row_locks->end()) {
// Not yet locked, need to create the lock.
j = row_locks->emplace(std::move(ck), lock_type()).first;
}
}
return lock_partition.then([this, pk = &i->first, cpk = &j->first, &row_lock = j->second, exclusive, tracker = std::move(tracker), timeout] (auto lock1) mutable {
auto* cpk = &j->first;
auto& row_lock = j->second;
// Like the two-level lock entry above, the row_lock entry we've just created
// is guaranteed to be kept alive as long as it's locked.
// Initiating read/write locking in the background below ensures that.
auto lock_row = exclusive ? row_lock.hold_write_lock(timeout) : row_lock.hold_read_lock(timeout);
return lock_row.then([this, pk, cpk, exclusive, tracker = std::move(tracker), lock1 = std::move(lock1)] (auto lock2) mutable {
lock1.release();


@@ -42,7 +42,8 @@ if __name__ == '__main__':
if systemd_unit.available('systemd-coredump@.service'):
dropin = '''
[Service]
TimeoutStartSec=infinity
RuntimeMaxSec=infinity
TimeoutSec=infinity
'''[1:-1]
os.makedirs('/etc/systemd/system/systemd-coredump@.service.d', exist_ok=True)
with open('/etc/systemd/system/systemd-coredump@.service.d/timeout.conf', 'w') as f:


@@ -1112,14 +1112,14 @@ tls-ssl/index.html: /stable/operating-scylla/security
/using-scylla/integrations/integration_kairos/index.html: /stable/using-scylla/integrations/integration-kairos
/upgrade/ami_upgrade/index.html: /stable/upgrade/ami-upgrade
/scylla-cloud/cloud-setup/gcp-vpc-peering/index.html: /stable/scylla-cloud/cloud-setup/GCP/gcp-vpc-peering
/scylla-cloud/cloud-setup/GCP/gcp-vcp-peering/index.html: /stable/scylla-cloud/cloud-setup/GCP/gcp-vpc-peering
/scylla-cloud/cloud-setup/gcp-vpc-peering/index.html: https://cloud.docs.scylladb.com/stable/cloud-setup/gcp-vpc-peering.html
/scylla-cloud/cloud-setup/GCP/gcp-vcp-peering/index.html: https://cloud.docs.scylladb.com/stable/cloud-setup/gcp-vpc-peering.html
# move scylla cloud for AWS to dedicated directory
/scylla-cloud/cloud-setup/aws-vpc-peering/index.html: /stable/scylla-cloud/cloud-setup/AWS/aws-vpc-peering
/scylla-cloud/cloud-setup/cloud-prom-proxy/index.html: /stable/scylla-cloud/cloud-setup/AWS/cloud-prom-proxy
/scylla-cloud/cloud-setup/outposts/index.html: /stable/scylla-cloud/cloud-setup/AWS/outposts
/scylla-cloud/cloud-setup/scylla-cloud-byoa/index.html: /stable/scylla-cloud/cloud-setup/AWS/scylla-cloud-byoa
/scylla-cloud/cloud-setup/aws-vpc-peering/index.html: https://cloud.docs.scylladb.com/stable/cloud-setup/aws-vpc-peering.html
/scylla-cloud/cloud-setup/cloud-prom-proxy/index.html: https://cloud.docs.scylladb.com/stable/monitoring/cloud-prom-proxy.html
/scylla-cloud/cloud-setup/outposts/index.html: https://cloud.docs.scylladb.com/stable/cloud-setup/outposts.html
/scylla-cloud/cloud-setup/scylla-cloud-byoa/index.html: https://cloud.docs.scylladb.com/stable/cloud-setup/scylla-cloud-byoa.html
/scylla-cloud/cloud-services/scylla_cloud_costs/index.html: /stable/scylla-cloud/cloud-services/scylla-cloud-costs
/scylla-cloud/cloud-services/scylla_cloud_managin_versions/index.html: /stable/scylla-cloud/cloud-services/scylla-cloud-managin-versions
/scylla-cloud/cloud-services/scylla_cloud_support_alerts_sla/index.html: /stable/scylla-cloud/cloud-services/scylla-cloud-support-alerts-sla


@@ -25,7 +25,7 @@ Getting Started
:id: "getting-started"
:class: my-panel
* `Install ScyllaDB (Binary Packages, Docker, or EC2) <https://www.scylladb.com/download/>`_ - Links to the ScyllaDB Download Center
* `Install ScyllaDB (Binary Packages, Docker, or EC2) <https://www.scylladb.com/download/#core>`_ - Links to the ScyllaDB Download Center
* :doc:`Configure ScyllaDB </getting-started/system-configuration/>`
* :doc:`Run ScyllaDB in a Shared Environment </getting-started/scylla-in-a-shared-environment>`


@@ -20,7 +20,7 @@ Install ScyllaDB
Keep your versions up-to-date. The two latest versions are supported. Also always install the latest patches for your version.
* Download and install ScyllaDB Server, Drivers and Tools in `Scylla Download Center <https://www.scylladb.com/download/#server/>`_
* Download and install ScyllaDB Server, Drivers and Tools in `ScyllaDB Download Center <https://www.scylladb.com/download/#core>`_
* :doc:`ScyllaDB Web Installer for Linux <scylla-web-installer>`
* :doc:`ScyllaDB Unified Installer (relocatable executable) <unified-installer>`
* :doc:`Air-gapped Server Installation <air-gapped-install>`


@@ -4,7 +4,7 @@ ScyllaDB Web Installer for Linux
ScyllaDB Web Installer is a platform-agnostic installation script you can run with ``curl`` to install ScyllaDB on Linux.
See `ScyllaDB Download Center <https://www.scylladb.com/download/#server>`_ for information on manually installing ScyllaDB with platform-specific installation packages.
See `ScyllaDB Download Center <https://www.scylladb.com/download/#core>`_ for information on manually installing ScyllaDB with platform-specific installation packages.
Prerequisites
--------------


@@ -3,6 +3,7 @@
* endpoint_snitch - ``grep endpoint_snitch /etc/scylla/scylla.yaml``
* Scylla version - ``scylla --version``
* Authenticator - ``grep authenticator /etc/scylla/scylla.yaml``
* consistent_cluster_management - ``grep consistent_cluster_management /etc/scylla/scylla.yaml``
.. Note::


@@ -119,6 +119,7 @@ Add New DC
* **listen_address** - IP address that Scylla uses to connect to the other Scylla nodes in the cluster.
* **endpoint_snitch** - Set the selected snitch.
* **rpc_address** - Address for client connections (Thrift, CQL).
* **consistent_cluster_management** - set to the same value as used by your existing nodes.
The parameters ``seeds``, ``cluster_name`` and ``endpoint_snitch`` need to match the existing cluster.


@@ -54,6 +54,8 @@ Procedure
* **seeds** - Specifies the IP address of an existing node in the cluster. The new node will use this IP to connect to the cluster and learn the cluster topology and state.
* **consistent_cluster_management** - set to the same value as used by your existing nodes.
.. note::
In earlier versions of ScyllaDB, seed nodes assisted in gossip. Starting with Scylla Open Source 4.3 and Scylla Enterprise 2021.1, the seed concept in gossip has been removed. If you are using an earlier version of ScyllaDB, you need to configure the seeds parameter in the following way:


@@ -70,6 +70,7 @@ the file can be found under ``/etc/scylla/``
- **listen_address** - IP address that Scylla uses to connect to other Scylla nodes in the cluster
- **endpoint_snitch** - Set the selected snitch
- **rpc_address** - Address for client connection (Thrift, CQLSH)
- **consistent_cluster_management** - ``true`` by default, can be set to ``false`` if you don't want to use Raft for consistent schema management in this cluster (will be mandatory in later versions). Check the :doc:`Raft in ScyllaDB document</architecture/raft/>` to learn more.
3. In the ``cassandra-rackdc.properties`` file, edit the rack and data center information.
The file can be found under ``/etc/scylla/``.


@@ -26,6 +26,7 @@ The file can be found under ``/etc/scylla/``
- **listen_address** - IP address that Scylla uses to connect to other Scylla nodes in the cluster
- **endpoint_snitch** - Set the selected snitch
- **rpc_address** - Address for client connection (Thrift, CQL)
- **consistent_cluster_management** - ``true`` by default, can be set to ``false`` if you don't want to use Raft for consistent schema management in this cluster (will be mandatory in later versions). Check the :doc:`Raft in ScyllaDB document</architecture/raft/>` to learn more.
3. This step needs to be done **only** if you are using the **GossipingPropertyFileSnitch**. If not, skip this step.
In the ``cassandra-rackdc.properties`` file, edit the parameters listed below.


@@ -63,6 +63,7 @@ Perform the following steps for each node in the new cluster:
* **rpc_address** - Address for client connection (Thrift, CQL).
* **broadcast_address** - The IP address a node tells other nodes in the cluster to contact it by.
* **broadcast_rpc_address** - Default: unset. The RPC address to broadcast to drivers and other Scylla nodes. It cannot be set to 0.0.0.0. If left blank, it will be set to the value of ``rpc_address``. If ``rpc_address`` is set to 0.0.0.0, ``broadcast_rpc_address`` must be explicitly configured.
* **consistent_cluster_management** - ``true`` by default, can be set to ``false`` if you don't want to use Raft for consistent schema management in this cluster (will be mandatory in later versions). Check the :doc:`Raft in ScyllaDB document</architecture/raft/>` to learn more.
#. After you have installed and configured Scylla and edited ``scylla.yaml`` file on all the nodes, start the node specified with the ``seeds`` parameter. Then start the rest of the nodes in your cluster, one at a time, using
``sudo systemctl start scylla-server``.


@@ -25,6 +25,7 @@ Login to one of the nodes in the cluster with (UN) status, collect the following
* seeds - ``cat /etc/scylla/scylla.yaml | grep seeds:``
* endpoint_snitch - ``cat /etc/scylla/scylla.yaml | grep endpoint_snitch``
* Scylla version - ``scylla --version``
* consistent_cluster_management - ``grep consistent_cluster_management /etc/scylla/scylla.yaml``
Procedure
---------


@@ -66,6 +66,8 @@ Procedure
- **rpc_address** - Address for client connection (Thrift, CQL)
- **consistent_cluster_management** - set to the same value as used by your existing nodes.
#. Add the ``replace_node_first_boot`` parameter to the ``scylla.yaml`` config file on the new node. This line can be added to any place in the config file. After a successful node replacement, there is no need to remove it from the ``scylla.yaml`` file. (Note: The obsolete parameters "replace_address" and "replace_address_first_boot" are not supported and should not be used). The value of the ``replace_node_first_boot`` parameter should be the Host ID of the node to be replaced.
For example (using the Host ID of the failed node from above):


@@ -68,7 +68,7 @@ Gracefully stop the node
.. code:: sh
sudo service scylla-enterprise-server stop
sudo service scylla-server stop
Download and install the new release
------------------------------------
@@ -92,13 +92,13 @@ Start the node
.. code:: sh
sudo service scylla-enterprise-server start
sudo service scylla-server start
Validate
--------
1. Check cluster status with ``nodetool status`` and make sure **all** nodes, including the one you just upgraded, are in UN status.
2. Use ``curl -X GET "http://localhost:10000/storage_service/scylla_release_version"`` to check the ScyllaDB version.
3. Check scylla-enterprise-server log (by ``journalctl _COMM=scylla``) and ``/var/log/syslog`` to validate there are no errors.
3. Check scylla-server log (by ``journalctl _COMM=scylla``) and ``/var/log/syslog`` to validate there are no errors.
4. Check again after 2 minutes to validate no new issues are introduced.
Once you are sure the node upgrade is successful, move to the next node in the cluster.
@@ -130,7 +130,7 @@ Gracefully shutdown ScyllaDB
.. code:: sh
nodetool drain
sudo service scylla-enterprise-server stop
sudo service scylla-server stop
Downgrade to the previous release
----------------------------------
@@ -164,7 +164,7 @@ Start the node
.. code:: sh
sudo service scylla-enterprise-server start
sudo service scylla-server start
Validate
--------


@@ -66,7 +66,7 @@ Gracefully stop the node
.. code:: sh
sudo service scylla-enterprise-server stop
sudo service scylla-server stop
Download and install the new release
------------------------------------


@@ -16,13 +16,13 @@ Start the node
.. code:: sh
sudo service scylla-enterprise-server start
sudo service scylla-server start
Validate
--------
#. Check cluster status with ``nodetool status`` and make sure **all** nodes, including the one you just upgraded, are in UN status.
#. Use ``curl -X GET "http://localhost:10000/storage_service/scylla_release_version"`` to check the ScyllaDB version.
#. Check scylla-enterprise-server log (by ``journalctl _COMM=scylla``) and ``/var/log/syslog`` to validate there are no errors.
#. Check scylla-server log (by ``journalctl _COMM=scylla``) and ``/var/log/syslog`` to validate there are no errors.
#. Check again after 2 minutes to validate no new issues are introduced.
Once you are sure the node upgrade is successful, move to the next node in the cluster.
@@ -54,7 +54,7 @@ Gracefully shutdown ScyllaDB
.. code:: sh
nodetool drain
sudo service scylla-enterprise-server stop
sudo service scylla-server stop
Downgrade to the previous release
----------------------------------
@@ -88,7 +88,7 @@ Start the node
.. code:: sh
sudo service scylla-enterprise-server start
sudo service scylla-server start
Validate
--------


@@ -69,7 +69,7 @@ Gracefully stop the node
.. code:: sh
sudo service scylla-enterprise-server stop
sudo service scylla-server stop
Download and install the new release
------------------------------------


@@ -36,13 +36,13 @@ A new io.conf format was introduced in Scylla 2.3 and 2019.1. If your io.conf do
.. code:: sh
sudo service scylla-enterprise-server start
sudo service scylla-server start
Validate
--------
#. Check cluster status with ``nodetool status`` and make sure **all** nodes, including the one you just upgraded, are in UN status.
#. Use ``curl -X GET "http://localhost:10000/storage_service/scylla_release_version"`` to check the ScyllaDB version.
#. Check scylla-enterprise-server log (by ``journalctl _COMM=scylla``) and ``/var/log/syslog`` to validate there are no errors.
#. Check scylla-server log (by ``journalctl _COMM=scylla``) and ``/var/log/syslog`` to validate there are no errors.
#. Check again after two minutes to validate no new issues are introduced.
Once you are sure the node upgrade is successful, move to the next node in the cluster.
@@ -75,7 +75,7 @@ Gracefully shutdown ScyllaDB
.. code:: sh
nodetool drain
sudo service scylla-enterprise-server stop
sudo service scylla-server stop
Download and install the old release
------------------------------------
@@ -120,7 +120,7 @@ Start the node
.. code:: sh
sudo service scylla-enterprise-server start
sudo service scylla-server start
Validate
--------


@@ -102,7 +102,7 @@ Gracefully stop the node
.. code:: sh
sudo service scylla-enterprise-server stop
sudo service scylla-server stop
.. _upgrade-debian-ubuntu-enterprise-2022.2:
@@ -138,7 +138,7 @@ Download and install the new release
sudo apt-get clean all
sudo apt-get update
sudo apt-get dist-upgrade scylla-enterprise-server
sudo apt-get dist-upgrade scylla-enterprise
Answer y to the first two questions.
@@ -213,13 +213,13 @@ Start the node
.. code:: sh
sudo service scylla-enterprise-server start
sudo service scylla-server start
Validate
--------
#. Check cluster status with ``nodetool status`` and make sure **all** nodes, including the one you just upgraded, are in ``UN`` status.
#. Use ``curl -X GET "http://localhost:10000/storage_service/scylla_release_version"`` to check the ScyllaDB version. Validate that the version matches the one you upgraded to.
#. Check scylla-enterprise-server log (using ``journalctl _COMM=scylla``) and ``/var/log/syslog`` to validate there are no new errors in the log.
#. Check scylla-server log (using ``journalctl _COMM=scylla``) and ``/var/log/syslog`` to validate there are no new errors in the log.
#. Check again after two minutes, to validate no new issues are introduced.
Once you are sure the node upgrade was successful, move to the next node in the cluster.
@@ -260,7 +260,7 @@ Drain and gracefully stop the node
.. code:: sh
nodetool drain
sudo service scylla-enterprise-server stop
sudo service scylla-server stop
Download and install the old release
------------------------------------
@@ -359,7 +359,7 @@ Start the node
.. code:: sh
sudo service scylla-enterprise-server start
sudo service scylla-server start
Validate
--------


@@ -63,7 +63,7 @@ Stop ScyllaDB
.. code:: sh
sudo systemctl stop scylla-enterprise-server
sudo systemctl stop scylla-server
Download and install the new release
------------------------------------
@@ -84,7 +84,7 @@ Start the node
.. code:: sh
sudo systemctl start scylla-enterprise-server
sudo systemctl start scylla-server
Validate
--------
@@ -125,7 +125,7 @@ Gracefully shutdown ScyllaDB
.. code:: sh
nodetool drain
sudo systemctl stop scylla-enterprise-server
sudo systemctl stop scylla-server
Downgrade to the previous release
-----------------------------------
@@ -149,7 +149,7 @@ Start the node
.. code:: sh
sudo systemctl start scylla-enterprise-server
sudo systemctl start scylla-server
Validate
--------


@@ -1,5 +1,5 @@
Scylla Metric Update - Scylla 5.1 to 5.2
========================================
ScyllaDB Metric Update - Scylla 5.1 to 5.2
============================================
.. toctree::
:maxdepth: 2
@@ -7,8 +7,8 @@ Scylla Metric Update - Scylla 5.1 to 5.2
Scylla 5.2 Dashboards are available as part of the latest |mon_root|.
The following metrics are new in Scylla 5.2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The following metrics are new in ScyllaDB 5.2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. list-table::
:widths: 25 150
@@ -16,5 +16,42 @@ The following metrics are new in Scylla 5.2
* - Metric
- Description
* - TODO
- TODO
* - scylla_database_disk_reads
- Holds the number of currently active disk read operations.
* - scylla_database_sstables_read
- Holds the number of currently read sstables.
* - scylla_memory_malloc_failed
- Total count of failed memory allocations
* - scylla_raft_group0_status
- Status of the Raft group: 0 - disabled, 1 - normal, 2 - aborted
* - scylla_storage_proxy_coordinator_cas_read_latency_summary
- CAS read latency summary
* - scylla_storage_proxy_coordinator_cas_write_latency_summary
- CAS write latency summary
* - scylla_storage_proxy_coordinator_read_latency_summary
- Read latency summary
* - scylla_storage_proxy_coordinator_write_latency_summary
- Write latency summary
* - scylla_streaming_finished_percentage
- Finished percentage of node operation on this shard
* - scylla_view_update_generator_sstables_pending_work
- Number of bytes remaining to be processed from SSTables for view updates
The following metrics are renamed in ScyllaDB 5.2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. list-table::
:widths: 25 150
:header-rows: 1
* - 5.1
- 5.2
* - scylla_database_active_reads_memory_consumption
- scylla_database_reads_memory_consumption
* - scylla_memory_regular_virtual_dirty_bytes
- scylla_memory_regular_unspooled_dirty_bytes
* - scylla_memory_system_virtual_dirty_bytes
- scylla_memory_system_unspooled_dirty_bytes
* - scylla_memory_virtual_dirty_bytes
- scylla_memory_unspooled_dirty_bytes


@@ -67,7 +67,11 @@ Apply the following procedure **serially** on each node. Do not move to the next
If you enabled consistent cluster management in each node's configuration file, then as soon as every node has been upgraded to the new version, the cluster will start a procedure which initializes the Raft algorithm for consistent cluster metadata management.
You must then :ref:`verify <validate-raft-setup>` that this procedure successfully finishes.
.. note:: Before upgrading, make sure to use the latest `ScyllaDB Monitoring <https://monitoring.docs.scylladb.com/>`_ stack.
.. note::
If you use the `ScyllaDB Monitoring Stack <https://monitoring.docs.scylladb.com/>`_, we recommend upgrading the Monitoring Stack to the latest version **before** upgrading ScyllaDB.
For ScyllaDB 5.2, you MUST upgrade the Monitoring Stack to version 4.3 or later.
Upgrade Steps
=============


@@ -476,7 +476,7 @@ To start the scylla server proper, simply invoke as: scylla server (or just scyl
// We need to have the entire app config to run the app, but we need to
// run the app to read the config file with UDF specific options so that
// we know whether we need to reserve additional memory for UDFs.
app_cfg.reserve_additional_memory = 50 * 1024 * 1024;
app_cfg.reserve_additional_memory = db::config::wasm_udf_reserved_memory;
app_template app(std::move(app_cfg));
auto ext = std::make_shared<db::extensions>();


@@ -177,7 +177,6 @@ private:
template <typename Consumer, typename GCConsumer>
requires CompactedFragmentsConsumerV2<Consumer> && CompactedFragmentsConsumerV2<GCConsumer>
stop_iteration do_consume(range_tombstone_change&& rtc, Consumer& consumer, GCConsumer& gc_consumer) {
_validator(mutation_fragment_v2::kind::range_tombstone_change, rtc.position(), rtc.tombstone());
stop_iteration gc_consumer_stop = stop_iteration::no;
stop_iteration consumer_stop = stop_iteration::no;
if (rtc.tombstone() <= _partition_tombstone) {
@@ -199,6 +198,7 @@ private:
partition_is_not_empty(consumer);
_current_emitted_tombstone = rtc.tombstone();
consumer_stop = consumer.consume(std::move(rtc));
_validator(mutation_fragment_v2::kind::range_tombstone_change, rtc.position(), rtc.tombstone());
}
return gc_consumer_stop || consumer_stop;
}


@@ -1144,7 +1144,7 @@ future<> server_impl::applier_fiber() {
co_await _state_machine->apply(std::move(commands));
} catch (abort_requested_exception& e) {
logger.info("[{}] applier fiber stopped because state machine was aborted: {}", _id, e);
co_return;
throw stop_apply_fiber{};
} catch (...) {
std::throw_with_nested(raft::state_machine_error{});
}


@@ -1253,10 +1253,13 @@ void table::set_compaction_strategy(sstables::compaction_strategy_type strategy)
cg.get_backlog_tracker().copy_ongoing_charges(new_bt, move_read_charges);
new_sstables = make_lw_shared<sstables::sstable_set>(new_cs.make_sstable_set(t._schema));
cg.main_sstables()->for_each_sstable([this] (const sstables::shared_sstable& s) {
add_sstable_to_backlog_tracker(new_bt, s);
std::vector<sstables::shared_sstable> new_sstables_for_backlog_tracker;
new_sstables_for_backlog_tracker.reserve(cg.main_sstables()->all()->size());
cg.main_sstables()->for_each_sstable([this, &new_sstables_for_backlog_tracker] (const sstables::shared_sstable& s) {
new_sstables->insert(s);
new_sstables_for_backlog_tracker.push_back(s);
});
new_bt.replace_sstables({}, std::move(new_sstables_for_backlog_tracker));
}
void execute() noexcept {


@@ -63,4 +63,15 @@ MemoryLimit=$MEMORY_LIMIT
EOS
fi
if [ -e /etc/systemd/system/systemd-coredump@.service.d/timeout.conf ]; then
COREDUMP_RUNTIME_MAX=$(grep RuntimeMaxSec /etc/systemd/system/systemd-coredump@.service.d/timeout.conf)
if [ -z "$COREDUMP_RUNTIME_MAX" ]; then
cat << EOS > /etc/systemd/system/systemd-coredump@.service.d/timeout.conf
[Service]
RuntimeMaxSec=infinity
TimeoutSec=infinity
EOS
fi
fi
systemctl --system daemon-reload >/dev/null || true


@@ -103,7 +103,7 @@ future<prepare_response> paxos_state::prepare(storage_proxy& sp, tracing::trace_
auto ex = f2.get_exception();
logger.debug("Failed to get data or digest: {}. Ignored.", std::move(ex));
}
auto upgrade_if_needed = [schema = std::move(schema)] (std::optional<proposal> p) mutable {
auto upgrade_if_needed = [schema = std::move(schema)] (std::optional<proposal> p) {
if (!p || p->update.schema_version() == schema->version()) {
return make_ready_future<std::optional<proposal>>(std::move(p));
}
@@ -115,7 +115,7 @@ future<prepare_response> paxos_state::prepare(storage_proxy& sp, tracing::trace_
// for that version and upgrade the mutation with it.
logger.debug("Stored mutation references outdated schema version. "
"Trying to upgrade the accepted proposal mutation to the most recent schema version.");
return service::get_column_mapping(p->update.column_family_id(), p->update.schema_version()).then([schema = std::move(schema), p = std::move(p)] (const column_mapping& cm) {
return service::get_column_mapping(p->update.column_family_id(), p->update.schema_version()).then([schema, p = std::move(p)] (const column_mapping& cm) {
return make_ready_future<std::optional<proposal>>(proposal(p->ballot, freeze(p->update.unfreeze_upgrading(schema, cm))));
});
};


@@ -969,7 +969,11 @@ with_timeout(abort_source& as, db::timeout_clock::duration d, F&& fun) {
// FIXME: using lambda as workaround for clang bug #50345 (miscompiling coroutine templates).
auto impl = [] (abort_source& as, db::timeout_clock::duration d, F&& fun) -> future_t {
abort_source timeout_src;
auto sub = as.subscribe([&timeout_src] () noexcept { timeout_src.request_abort(); });
auto sub = as.subscribe([&timeout_src] () noexcept {
if (!timeout_src.abort_requested()) {
timeout_src.request_abort();
}
});
if (!sub) {
throw abort_requested_exception{};
}


@@ -136,7 +136,9 @@ struct sstable_open_config {
// fields respectively. Problematic sstables might fail to load. Set to
// false if you want to disable this, to be able to read such sstables.
// Should only be disabled for diagnostics purposes.
bool load_first_and_last_position_metadata = true;
// FIXME: Enable it by default once the root cause of large allocation when reading sstable in reverse is fixed.
// Ref: https://github.com/scylladb/scylladb/issues/11642
bool load_first_and_last_position_metadata = false;
};
class sstable : public enable_lw_shared_from_this<sstable> {

test.py

@@ -343,7 +343,16 @@ class PythonTestSuite(TestSuite):
pool_size = cfg.get("pool_size", 2)
self.create_cluster = self.get_cluster_factory(cluster_size)
self.clusters = Pool(pool_size, self.create_cluster)
async def recycle_cluster(cluster: ScyllaCluster) -> None:
"""When a dirty cluster is returned to the cluster pool,
stop it and release the used IPs. We don't necessarily uninstall() it yet,
which would delete the log file and directory - we might want to preserve
these if it came from a failed test.
"""
await cluster.stop()
await cluster.release_ips()
self.clusters = Pool(pool_size, self.create_cluster, recycle_cluster)
def get_cluster_factory(self, cluster_size: int) -> Callable[..., Awaitable]:
def create_server(create_cfg: ScyllaCluster.CreateServerParams):
@@ -686,7 +695,8 @@ class CQLApprovalTest(Test):
if self.server_log is not None:
logger.info("Server log:\n%s", self.server_log)
async with self.suite.clusters.instance(logger) as cluster:
# TODO: consider dirty_on_exception=True
async with self.suite.clusters.instance(False, logger) as cluster:
try:
cluster.before_test(self.uname)
logger.info("Leasing Scylla cluster %s for test %s", cluster, self.uname)
@@ -842,26 +852,32 @@ class PythonTest(Test):
loggerPrefix = self.mode + '/' + self.uname
logger = LogPrefixAdapter(logging.getLogger(loggerPrefix), {'prefix': loggerPrefix})
async with self.suite.clusters.instance(logger) as cluster:
try:
cluster.before_test(self.uname)
logger.info("Leasing Scylla cluster %s for test %s", cluster, self.uname)
self.args.insert(0, "--host={}".format(cluster.endpoint()))
self.is_before_test_ok = True
cluster.take_log_savepoint()
status = await run_test(self, options)
cluster.after_test(self.uname)
self.is_after_test_ok = True
self.success = status
except Exception as e:
self.server_log = cluster.read_server_log()
self.server_log_filename = cluster.server_log_filename()
if self.is_before_test_ok is False:
print("Test {} pre-check failed: {}".format(self.name, str(e)))
print("Server log of the first server:\n{}".format(self.server_log))
# Don't try to continue if the cluster is broken
raise
logger.info("Test %s %s", self.uname, "succeeded" if self.success else "failed ")
cluster = await self.suite.clusters.get(logger)
try:
cluster.before_test(self.uname)
logger.info("Leasing Scylla cluster %s for test %s", cluster, self.uname)
self.args.insert(0, "--host={}".format(cluster.endpoint()))
self.is_before_test_ok = True
cluster.take_log_savepoint()
status = await run_test(self, options)
cluster.after_test(self.uname)
self.is_after_test_ok = True
self.success = status
except Exception as e:
self.server_log = cluster.read_server_log()
self.server_log_filename = cluster.server_log_filename()
if self.is_before_test_ok is False:
print("Test {} pre-check failed: {}".format(self.name, str(e)))
print("Server log of the first server:\n{}".format(self.server_log))
logger.info("Discarding cluster after failed start for test %s...", self.name)
elif self.is_after_test_ok is False:
print("Test {} post-check failed: {}".format(self.name, str(e)))
print("Server log of the first server:\n{}".format(self.server_log))
logger.info("Discarding cluster after failed test %s...", self.name)
await self.suite.clusters.put(cluster, is_dirty=True)
else:
await self.suite.clusters.put(cluster, is_dirty=False)
logger.info("Test %s %s", self.uname, "succeeded" if self.success else "failed ")
return self
def write_junit_failure_report(self, xml_res: ET.Element) -> None:

View File

@@ -4926,6 +4926,7 @@ SEASTAR_TEST_CASE(test_large_partition_splitting_on_compaction) {
position_in_partition::tri_compare pos_tri_cmp(*s);
for (auto& sst : ret.new_sstables) {
sst = env.reusable_sst(s, tmp.path().string(), sst->generation().value()).get0();
BOOST_REQUIRE(sst->may_have_partition_tombstones());
auto reader = sstable_reader(sst, s, env.make_reader_permit());

View File

@@ -205,6 +205,7 @@ def run_scylla_cmd(pid, dir):
'--max-networking-io-control-blocks', '100',
'--unsafe-bypass-fsync', '1',
'--kernel-page-cache', '1',
'--commitlog-use-o-dsync', '0',
'--flush-schema-tables-after-modification', 'false',
'--api-address', ip,
'--rpc-address', ip,

View File

@@ -106,6 +106,7 @@ cql_test_config::cql_test_config(shared_ptr<db::config> cfg)
db_config->add_per_partition_rate_limit_extension();
db_config->flush_schema_tables_after_modification.set(false);
db_config->commitlog_use_o_dsync(false);
}
cql_test_config::cql_test_config(const cql_test_config&) = default;

View File

@@ -85,7 +85,8 @@ public:
future<shared_sstable> reusable_sst(schema_ptr schema, sstring dir, unsigned long generation,
sstable::version_types version, sstable::format_types f = sstable::format_types::big) {
auto sst = make_sstable(std::move(schema), dir, generation, version, f);
return sst->load().then([sst = std::move(sst)] {
sstable_open_config cfg { .load_first_and_last_position_metadata = true };
return sst->load(default_priority_class(), cfg).then([sst = std::move(sst)] {
return make_ready_future<shared_sstable>(std::move(sst));
});
}

View File

@@ -43,7 +43,8 @@ sstables::shared_sstable make_sstable_containing(std::function<sstables::shared_
}
}
write_memtable_to_sstable_for_test(*mt, sst).get();
sst->open_data().get();
sstable_open_config cfg { .load_first_and_last_position_metadata = true };
sst->open_data(cfg).get();
std::set<mutation, mutation_decorated_key_less_comparator> merged;
for (auto&& m : muts) {

View File

@@ -72,7 +72,11 @@ class HostRegistry:
self.next_host_id += 1
return Host(self.subnet.format(self.next_host_id))
self.pool = Pool[Host](254, create_host)
async def destroy_host(h: Host) -> None:
# Doesn't matter, we never return hosts to the pool as 'dirty'.
pass
self.pool = Pool[Host](254, create_host, destroy_host)
async def cleanup() -> None:
if self.lock_filename:
@@ -85,5 +89,5 @@ class HostRegistry:
return await self.pool.get()
async def release_host(self, host: Host) -> None:
return await self.pool.put(host)
return await self.pool.put(host, is_dirty=False)

View File

@@ -1,5 +1,5 @@
import asyncio
from typing import Generic, Callable, Awaitable, TypeVar, AsyncContextManager, Final
from typing import Generic, Callable, Awaitable, TypeVar, AsyncContextManager, Final, Optional
T = TypeVar('T')
@@ -10,12 +10,15 @@ class Pool(Generic[T]):
on demand, so that if you use less, you don't create anything upfront.
If there is no object in the pool and all N objects are in use, you want
to wait until one of the objects is returned to the pool. Expects a
builder async function to build a new object.
builder async function to build a new object and a destruction async
function to clean up after a 'dirty' object (see below).
Usage example:
async def start_server():
return Server()
pool = Pool(4, start_server)
async def destroy_server(server):
await server.free_resources()
pool = Pool(4, start_server, destroy_server)
server = await pool.get()
try:
@@ -24,25 +27,51 @@ class Pool(Generic[T]):
await pool.put(server)
Alternatively:
async with pool.instance() as server:
async with pool.instance(dirty_on_exception=False) as server:
await run_test(test, server)
If the object is considered no longer usable by other users of the pool
you can 'steal' it, which frees up space in the pool.
you can pass `is_dirty=True` flag to `put`, which will cause the object
to be 'destroyed' (by calling the provided `destroy` function on it) and
will free up space in the pool.
server = await pool.get()
dirty = True
try:
dirty = await run_test(test, server)
finally:
if dirty:
await pool.steal()
else:
await pool.put(server)
await pool.put(server, is_dirty=dirty)
Alternatively:
async with (cm := pool.instance(dirty_on_exception=True)) as server:
cm.dirty = await run_test(test, server)
# It will also be considered dirty if run_test throws an exception
To atomically return a dirty object and use the freed space to obtain
another object, you can use `replace_dirty`. This is different from a
`put(is_dirty=True)` call followed by a `get` call, where a concurrent
waiter might take the space freed up by `put`.
server = await pool.get()
dirty = False
try:
for _ in range(num_runs):
if dirty:
srv = server
server = None
server = await pool.replace_dirty(srv)
dirty = await run_test(test, server)
finally:
if server:
await pool.put(server, is_dirty=dirty)
"""
def __init__(self, max_size: int, build: Callable[..., Awaitable[T]]):
def __init__(self, max_size: int,
build: Callable[..., Awaitable[T]],
destroy: Callable[[T], Awaitable[None]]):
assert(max_size >= 0)
self.max_size: Final[int] = max_size
self.build: Final[Callable[..., Awaitable[T]]] = build
self.destroy: Final[Callable[[T], Awaitable[None]]] = destroy
self.cond: Final[asyncio.Condition] = asyncio.Condition()
self.pool: list[T] = []
self.total: int = 0 # len(self.pool) + leased objects
@@ -64,6 +93,68 @@ class Pool(Generic[T]):
# No object in pool, but total < max_size so we can construct one
self.total += 1
return await self._build_and_get(*args, **kwargs)
async def put(self, obj: T, is_dirty: bool):
"""Return a previously borrowed object to the pool
if it's not dirty, otherwise destroy the object
and free up space in the pool.
"""
if is_dirty:
await self.destroy(obj)
async with self.cond:
if is_dirty:
self.total -= 1
else:
self.pool.append(obj)
self.cond.notify()
async def replace_dirty(self, obj: T, *args, **kwargs) -> T:
"""Atomically `put` a previously borrowed dirty object and `get` another one.
The 'atomicity' guarantees that the space freed up by the returned object
is used to return another object to the caller. The caller doesn't need
to wait for space to be freed by another user of the pool.
Note: the returned object might have been constructed earlier or it might
be built right now, as in `get`.
*args and **kwargs are used as in `get`.
"""
await self.destroy(obj)
async with self.cond:
if self.pool:
self.total -= 1
return self.pool.pop()
# Need to construct a new object.
# The space for this object is already accounted for in self.total.
return await self._build_and_get(*args, **kwargs)
def instance(self, dirty_on_exception: bool, *args, **kwargs) -> AsyncContextManager[T]:
class Instance:
def __init__(self, pool: Pool[T], dirty_on_exception: bool):
self.pool = pool
self.dirty = False
self.dirty_on_exception = dirty_on_exception
async def __aenter__(self):
self.obj = await self.pool.get(*args, **kwargs)
return self.obj
async def __aexit__(self, exc_type, exc, obj):
if self.obj:
self.dirty |= self.dirty_on_exception and exc is not None
await self.pool.put(self.obj, is_dirty=self.dirty)
self.obj = None
return Instance(self, dirty_on_exception)
async def _build_and_get(self, *args, **kwargs) -> T:
"""Precondition: we allocated space for this object
(it's included in self.total).
"""
try:
obj = await self.build(*args, **kwargs)
except:
@@ -72,33 +163,3 @@ class Pool(Generic[T]):
self.cond.notify()
raise
return obj
async def steal(self) -> None:
"""Take ownership of a previously borrowed object.
Frees up space in the pool.
"""
async with self.cond:
self.total -= 1
self.cond.notify()
async def put(self, obj: T):
"""Return a previously borrowed object to the pool."""
async with self.cond:
self.pool.append(obj)
self.cond.notify()
def instance(self, *args, **kwargs) -> AsyncContextManager[T]:
class Instance:
def __init__(self, pool):
self.pool = pool
async def __aenter__(self):
self.obj = await self.pool.get(*args, **kwargs)
return self.obj
async def __aexit__(self, exc_type, exc, obj):
if self.obj:
await self.pool.put(self.obj)
self.obj = None
return Instance(self)

View File
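The reworked `Pool` API described in the docstring above can be condensed into a minimal runnable sketch. It mirrors the diff's `build`/`destroy`/`is_dirty` semantics and the `replace_dirty` slot-reuse trick, but it is not the harness's actual implementation (the error accounting of `_build_and_get` is omitted, and all names in the demo are illustrative):

```python
import asyncio

class Pool:
    """Minimal sketch of the dirty-object pool: objects are built lazily
    up to max_size; returning one with is_dirty=True destroys it and
    frees its slot; replace_dirty atomically trades a dirty object for
    an idle or freshly built one."""
    def __init__(self, max_size, build, destroy):
        self.max_size = max_size
        self.build = build
        self.destroy = destroy
        self.cond = asyncio.Condition()
        self.pool = []   # idle objects
        self.total = 0   # idle + leased

    async def get(self):
        async with self.cond:
            await self.cond.wait_for(
                lambda: self.pool or self.total < self.max_size)
            if self.pool:
                return self.pool.pop()   # idle -> leased, total unchanged
            self.total += 1              # reserve a slot, build outside the lock
        return await self.build()

    async def put(self, obj, is_dirty):
        if is_dirty:
            await self.destroy(obj)
        async with self.cond:
            if is_dirty:
                self.total -= 1          # slot freed for other waiters
            else:
                self.pool.append(obj)    # back to idle
            self.cond.notify()

    async def replace_dirty(self, obj):
        # Atomically free the dirty object's slot and use it ourselves,
        # so no concurrent waiter can grab it in between.
        await self.destroy(obj)
        async with self.cond:
            if self.pool:
                self.total -= 1          # give back the freed slot...
                return self.pool.pop()   # ...and lease an idle object instead
        return await self.build()        # slot already counted in total

async def pool_demo():
    built, destroyed = [], []
    async def build():
        built.append(object())
        return built[-1]
    async def destroy(obj):
        destroyed.append(obj)
    pool = Pool(2, build, destroy)
    a = await pool.get()                 # builds object #1
    b = await pool.get()                 # builds object #2 (pool at capacity)
    await pool.put(b, is_dirty=False)    # b goes back idle
    c = await pool.replace_dirty(a)      # a destroyed, idle b handed back
    await pool.put(c, is_dirty=True)     # c (== b) destroyed too
    return len(built), len(destroyed), c is b

print(asyncio.run(pool_demo()))  # → (2, 2, True)
```

Building and destroying outside the condition lock keeps slow `build`/`destroy` calls from serializing every pool operation, which is the same design choice the diff makes.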

@@ -21,14 +21,17 @@ logger = logging.getLogger(__name__)
class HTTPError(Exception):
def __init__(self, uri, code, message):
def __init__(self, uri, code, params, json, message):
super().__init__(message)
self.uri = uri
self.code = code
self.params = params
self.json = json
self.message = message
def __str__(self):
return f"HTTP error {self.code}: {self.message}, uri {self.uri}"
return f"HTTP error {self.code}, uri: {self.uri}, " \
f"params: {self.params}, json: {self.json}, body:\n{self.message}"
# TODO: support ssl and verify_ssl
@@ -63,7 +66,7 @@ class RESTClient(metaclass=ABCMeta):
params = params, json = json, timeout = client_timeout) as resp:
if resp.status != 200:
text = await resp.text()
raise HTTPError(uri, resp.status, f"{text}, params {params}, json {json}")
raise HTTPError(uri, resp.status, params, json, text)
if response_type is not None:
# Return response.text() or response.json()
return await getattr(resp, response_type)()

View File
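The reworked `HTTPError` keeps the request context as fields rather than folding it into the message string, so callers can inspect the failing parameters programmatically. A standalone sketch (the endpoint and parameters in the example are made up for illustration):

```python
class HTTPError(Exception):
    """Sketch of the structured error from the diff: the request's uri,
    params and json payload are stored as attributes instead of being
    baked into a single message string."""
    def __init__(self, uri, code, params, json, message):
        super().__init__(message)
        self.uri = uri
        self.code = code
        self.params = params
        self.json = json
        self.message = message

    def __str__(self):
        return (f"HTTP error {self.code}, uri: {self.uri}, "
                f"params: {self.params}, json: {self.json}, body:\n{self.message}")

# Illustrative values only, not a real manager endpoint call.
err = HTTPError("/cluster/up", 500, {"id": 1}, None, "boom")
print(err)
```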

@@ -17,8 +17,10 @@ import pathlib
import shutil
import tempfile
import time
import traceback
from typing import Optional, Dict, List, Set, Tuple, Callable, AsyncIterator, NamedTuple, Union
import uuid
from enum import Enum
from io import BufferedWriter
from test.pylib.host_registry import Host, HostRegistry
from test.pylib.pool import Pool
@@ -111,6 +113,7 @@ SCYLLA_CMDLINE_OPTIONS = [
'--max-networking-io-control-blocks', '100',
'--unsafe-bypass-fsync', '1',
'--kernel-page-cache', '1',
'--commitlog-use-o-dsync', '0',
'--abort-on-lsa-bad-alloc', '1',
'--abort-on-seastar-bad-alloc',
'--abort-on-internal-error', '1',
@@ -173,6 +176,11 @@ def merge_cmdline_options(base: List[str], override: List[str]) -> List[str]:
return run()
class CqlUpState(Enum):
NOT_CONNECTED = 1
CONNECTED = 2
QUERIED = 3
class ScyllaServer:
"""Starts and handles a single Scylla server, managing logs, checking if responsive,
and cleanup when finished."""
@@ -295,7 +303,7 @@ class ScyllaServer:
except Exception as exc: # pylint: disable=broad-except
return f"Exception when reading server log {self.log_filename}: {exc}"
async def cql_is_up(self) -> bool:
async def cql_is_up(self) -> CqlUpState:
"""Test that CQL is serving (a check we use at start up)."""
caslog = logging.getLogger('cassandra')
oldlevel = caslog.getEffectiveLevel()
@@ -310,6 +318,7 @@ class ScyllaServer:
# work, so rely on this "side effect".
profile = ExecutionProfile(load_balancing_policy=WhiteListRoundRobinPolicy([self.ip_addr]),
request_timeout=self.START_TIMEOUT)
connected = False
try:
# In a cluster setup, it's possible that the CQL
# here is directed to a node different from the initial contact
@@ -321,16 +330,19 @@ class ScyllaServer:
protocol_version=4,
auth_provider=auth) as cluster:
with cluster.connect() as session:
session.execute("SELECT * FROM system.local")
connected = True
# See the comment above about `auth::standard_role_manager`. We execute
# a 'real' query to ensure that the auth service has finished initializing.
session.execute("SELECT key FROM system.local where key = 'local'")
self.control_cluster = Cluster(execution_profiles=
{EXEC_PROFILE_DEFAULT: profile},
contact_points=[self.ip_addr],
auth_provider=auth)
self.control_connection = self.control_cluster.connect()
return True
return CqlUpState.QUERIED
except (NoHostAvailable, InvalidRequest, OperationTimedOut) as exc:
self.logger.debug("Exception when checking if CQL is up: %s", exc)
return False
return CqlUpState.CONNECTED if connected else CqlUpState.NOT_CONNECTED
finally:
caslog.setLevel(oldlevel)
# Any other exception may indicate a problem, and is passed to the caller.
@@ -363,6 +375,7 @@ class ScyllaServer:
self.start_time = time.time()
sleep_interval = 0.1
cql_up_state = CqlUpState.NOT_CONNECTED
while time.time() < self.start_time + self.START_TIMEOUT:
if self.cmd.returncode:
@@ -377,20 +390,30 @@ class ScyllaServer:
logpath = log_handler.baseFilename # type: ignore
else:
logpath = "?"
raise RuntimeError(f"Failed to start server at host {self.ip_addr}.\n"
raise RuntimeError(f"Failed to start server with ID = {self.server_id}, IP = {self.ip_addr}.\n"
"Check the log files:\n"
f"{logpath}\n"
f"{self.log_filename}")
if hasattr(self, "host_id") or await self.get_host_id(api):
if await self.cql_is_up():
cql_up_state = await self.cql_is_up()
if cql_up_state == CqlUpState.QUERIED:
return
# Sleep and retry
await asyncio.sleep(sleep_interval)
raise RuntimeError(f"failed to start server {self.ip_addr}, "
f"check server log at {self.log_filename}")
err = f"Failed to start server with ID = {self.server_id}, IP = {self.ip_addr}."
if hasattr(self, "host_id"):
err += f" Managed to obtain the server's Host ID ({self.host_id})"
if cql_up_state == CqlUpState.CONNECTED:
err += " and to connect the CQL driver, but failed to execute a query."
else:
err += " but failed to connect the CQL driver."
else:
err += " Failed to obtain the server's Host ID."
err += f"\nCheck server log at {self.log_filename}."
raise RuntimeError(err)
async def force_schema_migration(self) -> None:
"""This is a hack to change schema hash on an existing cluster node
@@ -705,6 +728,8 @@ class ScyllaCluster:
to any specific test, throwing it here would stop a specific
test."""
if self.start_exception:
# Mark as dirty so further test cases don't try to reuse this cluster.
self.is_dirty = True
raise self.start_exception
for server in self.running.values():
@@ -729,11 +754,14 @@ class ScyllaCluster:
if server_id not in self.running:
return ScyllaCluster.ActionReturn(success=False, msg=f"Server {server_id} unknown")
self.is_dirty = True
server = self.running.pop(server_id)
server = self.running[server_id]
# Remove the server from `running` only after we successfully stop it.
# Stopping may fail and if we removed it from `running` now it might leak.
if gracefully:
await server.stop_gracefully()
else:
await server.stop()
self.running.pop(server_id)
self.stopped[server_id] = server
return ScyllaCluster.ActionReturn(success=True, msg=f"{server} stopped")
@@ -753,8 +781,10 @@ class ScyllaCluster:
self.is_dirty = True
server = self.stopped.pop(server_id)
server.seeds = self._seeds()
await server.start(self.api)
# Put the server in `running` before starting it.
# Starting may fail and if we didn't add it now it might leak.
self.running[server_id] = server
await server.start(self.api)
return ScyllaCluster.ActionReturn(success=True, msg=f"{server} started")
async def server_restart(self, server_id: ServerNum) -> ActionReturn:
@@ -817,7 +847,9 @@ class ScyllaClusterManager:
self.is_after_test_ok: bool = False
# API
# NOTE: need to make a safe temp dir as tempfile can't make a safe temp sock name
self.manager_dir: str = tempfile.mkdtemp(prefix="manager-", dir=base_dir)
# Put the socket in /tmp, not base_dir, to avoid going over the length
# limit of UNIX-domain socket addresses (issue #12622).
self.manager_dir: str = tempfile.mkdtemp(prefix="manager-", dir="/tmp")
self.sock_path: str = f"{self.manager_dir}/api"
app = aiohttp.web.Application()
self._setup_routes(app)
@@ -828,7 +860,8 @@ class ScyllaClusterManager:
if self.is_running:
self.logger.warning("ScyllaClusterManager already running")
return
await self._get_cluster()
self.cluster = await self.clusters.get(self.logger)
self.logger.info("First Scylla cluster: %s", self.cluster)
self.cluster.setLogger(self.logger)
await self.runner.setup()
self.site = aiohttp.web.UnixSite(self.runner, path=self.sock_path)
@@ -839,12 +872,10 @@ class ScyllaClusterManager:
self.current_test_case_full_name = f'{self.test_uname}::{test_case_name}'
self.logger.info("Setting up %s", self.current_test_case_full_name)
if self.cluster.is_dirty:
self.logger.info(f"Current cluster %s is dirty after last test, stopping...", self.cluster.name)
await self.clusters.steal()
await self.cluster.stop()
await self.cluster.release_ips()
self.logger.info(f"Waiting for new cluster for test %s...", self.current_test_case_full_name)
await self._get_cluster()
self.logger.info("Current cluster %s is dirty after test %s, replacing with a new one...",
self.cluster.name, self.current_test_case_full_name)
self.cluster = await self.clusters.replace_dirty(self.cluster, self.logger)
self.logger.info("Got new Scylla cluster: %s", self.cluster.name)
self.cluster.setLogger(self.logger)
self.logger.info("Leasing Scylla cluster %s for test %s", self.cluster, self.current_test_case_full_name)
self.cluster.before_test(self.current_test_case_full_name)
@@ -860,44 +891,56 @@ class ScyllaClusterManager:
del self.site
if not self.cluster.is_dirty:
self.logger.info("Returning Scylla cluster %s for test %s", self.cluster, self.test_uname)
await self.clusters.put(self.cluster)
await self.clusters.put(self.cluster, is_dirty=False)
else:
self.logger.info("ScyllaManager: Scylla cluster %s is dirty after %s, stopping it",
self.cluster, self.test_uname)
await self.clusters.steal()
await self.cluster.stop()
await self.clusters.put(self.cluster, is_dirty=True)
del self.cluster
if os.path.exists(self.manager_dir):
shutil.rmtree(self.manager_dir)
self.is_running = False
async def _get_cluster(self) -> None:
self.cluster = await self.clusters.get(self.logger)
self.logger.info("Got new Scylla cluster %s", self.cluster)
def _setup_routes(self, app: aiohttp.web.Application) -> None:
app.router.add_get('/up', self._manager_up)
app.router.add_get('/cluster/up', self._cluster_up)
app.router.add_get('/cluster/is-dirty', self._is_dirty)
app.router.add_get('/cluster/replicas', self._cluster_replicas)
app.router.add_get('/cluster/running-servers', self._cluster_running_servers)
app.router.add_get('/cluster/host-ip/{server_id}', self._cluster_server_ip_addr)
app.router.add_get('/cluster/host-id/{server_id}', self._cluster_host_id)
app.router.add_get('/cluster/before-test/{test_case_name}', self._before_test_req)
app.router.add_get('/cluster/after-test', self._after_test)
app.router.add_get('/cluster/mark-dirty', self._mark_dirty)
app.router.add_get('/cluster/server/{server_id}/stop', self._cluster_server_stop)
app.router.add_get('/cluster/server/{server_id}/stop_gracefully',
self._cluster_server_stop_gracefully)
app.router.add_get('/cluster/server/{server_id}/start', self._cluster_server_start)
app.router.add_get('/cluster/server/{server_id}/restart', self._cluster_server_restart)
app.router.add_put('/cluster/addserver', self._cluster_server_add)
app.router.add_put('/cluster/remove-node/{initiator}', self._cluster_remove_node)
app.router.add_get('/cluster/decommission-node/{server_id}',
self._cluster_decommission_node)
app.router.add_get('/cluster/server/{server_id}/get_config', self._server_get_config)
app.router.add_put('/cluster/server/{server_id}/update_config', self._server_update_config)
def make_catching_handler(handler: Callable) -> Callable:
async def catching_handler(request) -> aiohttp.web.Response:
"""Catch all exceptions and return them to the client.
Without this, the client would get an 'Internal server error' message
without any details. Thanks to this the test log shows the actual error.
"""
try:
return await handler(request)
except Exception as e:
tb = traceback.format_exc()
self.logger.error(f'Exception when executing {handler.__name__}: {e}\n{tb}')
return aiohttp.web.Response(status=500, text=str(e))
return catching_handler
def add_get(route: str, handler: Callable):
app.router.add_get(route, make_catching_handler(handler))
def add_put(route: str, handler: Callable):
app.router.add_put(route, make_catching_handler(handler))
add_get('/up', self._manager_up)
add_get('/cluster/up', self._cluster_up)
add_get('/cluster/is-dirty', self._is_dirty)
add_get('/cluster/replicas', self._cluster_replicas)
add_get('/cluster/running-servers', self._cluster_running_servers)
add_get('/cluster/host-ip/{server_id}', self._cluster_server_ip_addr)
add_get('/cluster/host-id/{server_id}', self._cluster_host_id)
add_get('/cluster/before-test/{test_case_name}', self._before_test_req)
add_get('/cluster/after-test', self._after_test)
add_get('/cluster/mark-dirty', self._mark_dirty)
add_get('/cluster/server/{server_id}/stop', self._cluster_server_stop)
add_get('/cluster/server/{server_id}/stop_gracefully', self._cluster_server_stop_gracefully)
add_get('/cluster/server/{server_id}/start', self._cluster_server_start)
add_get('/cluster/server/{server_id}/restart', self._cluster_server_restart)
add_put('/cluster/addserver', self._cluster_server_add)
add_put('/cluster/remove-node/{initiator}', self._cluster_remove_node)
add_get('/cluster/decommission-node/{server_id}', self._cluster_decommission_node)
add_get('/cluster/server/{server_id}/get_config', self._server_get_config)
add_put('/cluster/server/{server_id}/update_config', self._server_update_config)
async def _manager_up(self, _request) -> aiohttp.web.Response:
return aiohttp.web.Response(text=f"{self.is_running}")

View File
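The `catching_handler` wrapper above surfaces the exception text to the client instead of aiohttp's opaque "Internal server error". The same idea in a framework-agnostic sketch, where a plain `(status, text)` tuple stands in for `aiohttp.web.Response` and a list stands in for the logger (both are illustrative assumptions, not the harness's real types):

```python
import asyncio
import traceback

def make_catching_handler(handler, log):
    """Catch every exception from the wrapped async handler, record the
    traceback, and return the exception text with status 500 so the
    client sees the actual error."""
    async def catching_handler(request):
        try:
            return await handler(request)
        except Exception as e:
            log.append(f"Exception when executing {handler.__name__}: {e}\n"
                       f"{traceback.format_exc()}")
            return (500, str(e))  # stands in for aiohttp.web.Response(status=500, text=...)
    return catching_handler

async def handler_demo():
    log = []
    async def ok(request):
        return (200, "fine")
    async def broken(request):
        raise RuntimeError("server 3 is unknown")
    assert await make_catching_handler(ok, log)(None) == (200, "fine")
    status, text = await make_catching_handler(broken, log)(None)
    return status, text, len(log)

print(asyncio.run(handler_demo()))  # → (500, 'server 3 is unknown', 1)
```

With aiohttp this would typically be written once as a middleware rather than per-route wrappers; the per-route wrapper shown here matches the diff's approach.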

@@ -143,7 +143,8 @@ private:
throw std::bad_function_call();
}
virtual const std::vector<view_ptr>& get_table_views(data_dictionary::table t) const override {
return {};
static const std::vector<view_ptr> empty;
return empty;
}
virtual sstring get_available_index_name(data_dictionary::database db, std::string_view ks_name, std::string_view table_name,
std::optional<sstring> index_name_root) const override {

View File

@@ -735,6 +735,7 @@ bool abstract_type::is_collection() const {
bool abstract_type::is_tuple() const {
struct visitor {
bool operator()(const abstract_type&) { return false; }
bool operator()(const reversed_type_impl& t) { return t.underlying_type()->is_tuple(); }
bool operator()(const tuple_type_impl&) { return true; }
};
return visit(*this, visitor{});
@@ -1956,6 +1957,10 @@ data_value deserialize_aux(const tuple_type_impl& t, View v) {
template<FragmentedView View>
utils::multiprecision_int deserialize_value(const varint_type_impl&, View v) {
if (v.empty()) {
throw marshal_exception("cannot deserialize multiprecision int - empty buffer");
}
skip_empty_fragments(v);
bool negative = v.current_fragment().front() < 0;
utils::multiprecision_int num;
while (v.size_bytes()) {
@@ -2052,6 +2057,7 @@ bool deserialize_value(const boolean_type_impl&, View v) {
if (v.size_bytes() != 1) {
throw marshal_exception(format("cannot deserialize boolean, size mismatch ({:d})", v.size_bytes()));
}
skip_empty_fragments(v);
return v.current_fragment().front() != 0;
}