Compare commits

...

114 Commits

Author SHA1 Message Date
Anna Mikhlin
f1c45553bc release: prepare for 5.2.1 2023-05-08 22:15:46 +03:00
Botond Dénes
1a288e0a78 Update seastar submodule
* seastar 1488aaf8...aa46b980 (1):
  > core/on_internal_error: always log error with backtrace

Fixes: #13786
2023-05-08 10:30:10 +03:00
Marcin Maliszkiewicz
a2fed1588e db: view: use deferred_close for closing staging_sstable_reader
When consume_in_thread throws, the reader should still be closed.

Related https://github.com/scylladb/scylla-enterprise/issues/2661

Closes #13398
Refs: scylladb/scylla-enterprise#2661
Fixes: #13413

(cherry picked from commit 99f8d7dcbe)
2023-05-08 09:41:07 +03:00
Botond Dénes
f07a06d390 Merge 'service:forward_service: use long type instead of counter in function mocking' from Michał Jadwiszczak
Aggregation query on a counter column is failing because forward_service is looking for a function with counter as an argument, and such a function doesn't exist. Instead, the long type should be used.

Fixes: #12939

Closes #12963

* github.com:scylladb/scylladb:
  test:boost: counter column parallelized aggregation test
  service:forward_service: use long type when column is counter

(cherry picked from commit 61e67b865a)
2023-05-07 14:27:29 +03:00
Anna Stuchlik
4ec531d807 doc: remove the sequential repair option from docs
Fixes https://github.com/scylladb/scylladb/issues/12132

The sequential repair mode is not supported. This commit
removes the incorrect information from the documentation.

Closes #13544

(cherry picked from commit 3d25edf539)
2023-05-07 14:27:29 +03:00
Asias He
4867683f80 storage_service: Fix removing replace node as pending
Consider

- n1, n2, n3
- n3 is down
- n4 replaces n3 with the same ip address 127.0.0.3
- Inside the storage_service::handle_state_normal callback for 127.0.0.3 on n1/n2

  ```
  auto host_id = _gossiper.get_host_id(endpoint);
  auto existing = tmptr->get_endpoint_for_host_id(host_id);
  ```

  host_id = new host id
  existing = empty

  As a result, del_replacing_endpoint() will not be called.

This means 127.0.0.3 will not be removed as a pending node on n1 and n2 when
replacing is done. This is wrong.

This is a regression since commit 9942c60d93
(storage_service: do not inherit the host_id of a replaced node), where
the replacing node uses a different host id than the node it replaces.

To fix, call del_replacing_endpoint() when a node becomes NORMAL and existing
is empty.

Before:
n1:
storage_service - replace[cd1f187a-0eee-4b04-91a9-905ecc499cfc]: Added replacing_node=127.0.0.3 to replace existing_node=127.0.0.3, coordinator=127.0.0.3
token_metadata - Added node 127.0.0.3 as pending replacing endpoint which replaces existing node 127.0.0.3
storage_service - replace[cd1f187a-0eee-4b04-91a9-905ecc499cfc]: Marked ops done from coordinator=127.0.0.3
storage_service - Node 127.0.0.3 state jump to normal
storage_service - Set host_id=6f9ba4e8-9457-4c76-8e2a-e2be257fe123 to be owned by node=127.0.0.3

After:
n1:
storage_service - replace[28191ea6-d43b-3168-ab01-c7e7736021aa]: Added replacing_node=127.0.0.3 to replace existing_node=127.0.0.3, coordinator=127.0.0.3
token_metadata - Added node 127.0.0.3 as pending replacing endpoint which replaces existing node 127.0.0.3
storage_service - replace[28191ea6-d43b-3168-ab01-c7e7736021aa]: Marked ops done from coordinator=127.0.0.3
storage_service - Node 127.0.0.3 state jump to normal
token_metadata - Removed node 127.0.0.3 as pending replacing endpoint which replaces existing node 127.0.0.3
storage_service - Set host_id=72219180-e3d1-4752-b644-5c896e4c2fed to be owned by node=127.0.0.3

Tests: https://github.com/scylladb/scylla-dtest/pull/3126

Closes #13677

Fixes: https://github.com/scylladb/scylla-enterprise/issues/2852

(cherry picked from commit a8040306bb)
2023-05-03 14:15:13 +03:00
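A rough Python sketch of the bookkeeping fix in the commit above (hypothetical, much-simplified model; `pending_replacing` and `endpoint_for_host_id` are illustrative names, not Scylla's actual API):

```python
# Hypothetical, simplified model of the token-metadata bookkeeping described
# above. When a node reaches NORMAL state and token metadata holds no endpoint
# for its (new) host id, the pending-replacement entry must still be removed.

pending_replacing = {"127.0.0.3"}   # endpoints added as pending replacing nodes
endpoint_for_host_id = {}           # host id -> endpoint; empty for a new host id

def handle_state_normal(endpoint, host_id):
    existing = endpoint_for_host_id.get(host_id)
    if existing is None:
        # The fix: previously this branch never called del_replacing_endpoint(),
        # so the endpoint stayed pending forever; now it is always removed.
        pending_replacing.discard(endpoint)
    endpoint_for_host_id[host_id] = endpoint

handle_state_normal("127.0.0.3", "72219180-e3d1-4752-b644-5c896e4c2fed")
```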
Botond Dénes
0e42defe06 readers: evictable_reader: skip progress guarantee when next pos is partition start
The evictable reader must ensure that each buffer fill makes forward
progress, i.e. the last fragment in the buffer has a position larger
than the last fragment from the last buffer-fill. Otherwise, the reader
could get stuck in an infinite loop between buffer fills, if the reader
is evicted in-between.
The code guaranteeing this forward progress has a bug: when the next
expected position is a partition-start (another partition), the code
would loop forever, effectively reading all there is from the underlying
reader.
To avoid this, add a special case to ignore the progress guarantee loop
altogether when the next expected position is a partition start. In this
case, progress is guaranteed anyway, because there is exactly one
partition-start fragment in each partition.

Fixes: #13491

Closes #13563

(cherry picked from commit 72003dc35c)
2023-05-02 21:58:41 +03:00
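The special case can be sketched in Python (illustrative only; position kinds are simplified stand-ins for the real reader machinery):

```python
# Illustrative sketch of the special case described above, not Scylla code.
PARTITION_START = "partition-start"

def must_run_progress_loop(next_pos_kind):
    # The fix: when the next expected position is a partition start, each
    # partition contributes exactly one partition-start fragment, so forward
    # progress is guaranteed and the extra loop (which could otherwise drain
    # the whole underlying reader) is skipped.
    return next_pos_kind != PARTITION_START

skip_for_partition_start = not must_run_progress_loop(PARTITION_START)
run_for_row = must_run_progress_loop("clustering-row")
```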
Avi Kivity
f73d017f05 tools: toolchain: regenerate
Fixes #13744
2023-05-02 13:16:59 +03:00
Pavel Emelyanov
3723678b82 scylla-gdb: Parse and eval _all_threads without quotes
I've no idea why the quotes are there at all, it works even without
them. However, with quotes gdb-13 fails to find the _all_threads static
thread-local variable _unless_ it's printed with gdb "p" command
beforehand.

fixes: #13125

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #13132

(cherry picked from commit 537510f7d2)
2023-05-02 13:16:59 +03:00
Wojciech Mitros
b0a7c02e09 rust: update dependencies
Cranelift-codegen 0.92.0 and wasmtime 5.0.0 have security issues
potentially allowing malicious UDFs to read some memory outside
the wasm sandbox. This patch updates them to versions 0.92.1
and 5.0.1 respectively, where the issues are fixed.

Fixes #13157

Closes #13171

(cherry picked from commit aad2afd417)
2023-04-27 22:01:44 +03:00
Wojciech Mitros
f18c49dcc6 rust: update dependencies
Wasmtime added some improvements in recent releases - particularly,
two security issues were patched in version 2.0.2. There were no
breaking changes for our use other than the strategy of returning
Traps - all of them are now anyhow::Errors instead, but we can
still downcast to them, and read the corresponding error message.

The cxx, anyhow and futures dependency versions now match the
versions saved in the Cargo.lock.

Closes #12830

(cherry picked from commit 8b756cb73f)

Ref #13157
2023-04-27 22:00:54 +03:00
Anna Stuchlik
35dfec78d1 doc: fixes https://github.com/scylladb/scylladb/issues/12964, removes the information that the CDC options are experimental
Closes #12973

(cherry picked from commit 4dd1659d0b)
2023-04-27 21:06:49 +03:00
Raphael S. Carvalho
dbd8ca4ade replica: Fix undefined behavior in table::generate_and_propagate_view_updates()
This is undefined behavior because the argument evaluation order is unspecified.

With GCC, where evaluation is right-to-left, schema will be moved
once it's forwarded to make_flat_mutation_reader_from_mutations_v2().

The consequence is that memory tracking of mutation_fragment_v2
(for tracking only permit used by view update), which uses the schema,
can be incorrect. However, it's more likely that Scylla will crash
when estimating memory usage for a row, which accesses schema column
information using schema::column_at(), which in turn asserts that
the requested column really exists.

Fixes #13093.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #13092

(cherry picked from commit 3fae46203d)
2023-04-27 19:56:38 +03:00
Anna Stuchlik
1be4afb842 doc: remove incorrect info about BYPASS CACHE
Fixes https://github.com/scylladb/scylladb/issues/13106

This commit removes the information that BYPASS CACHE
is an Enterprise-only feature and replaces that info
with the link to the BYPASS CACHE description.

Closes #13316

(cherry picked from commit 1cfea1f13c)
2023-04-27 19:54:04 +03:00
Kefu Chai
7cc9f5a05f dist/redhat: enforce dependency on %{release} also
* tools/python3 279b6c1...cf7030a (1):
  > dist: redhat: provide only a single version

s/%{version}/%{version}-%{release}/ in `Requires:` sections.

this enforces that the runtime dependencies between scylla packages
are on exactly the same release.

Fixes #13222
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
(cherry picked from commit 7165551fd7)
2023-04-27 19:27:34 +03:00
Nadav Har'El
bf7fc9709d test/rest_api: fix flaky test for toppartitions
The REST test test_storage_service.py::test_toppartitions_pk_needs_escaping
was flaky. It tests the toppartitions request, which unfortunately needs
to choose a sampling duration in advance, and we chose 1 second, which we
considered more than enough - and indeed, typically even 1ms is enough!
But very rarely (we know of only one occurrence, in issue #13223) one
second is not enough.

Instead of increasing this 1 second and making this test even slower,
this patch takes a retry approach: the test starts with a 0.01 second
duration, and is then retried with increasing durations until it succeeds
or a 5-second duration is reached. This retry approach has two benefits:
1. It de-flakes the test (allowing a very slow test to take 5 seconds
instead of the 1 second which wasn't enough), and 2. At the same time it
makes a successful test much faster (it used to always take a full
second, now it takes 0.07 seconds on a dev build on my laptop).

A *failed* test may, in some cases, take 10 seconds after this patch
(although in some other cases, an error will be caught immediately),
but I consider this acceptable - this test should pass, after all,
and a failure indicates a regression and taking 10 seconds will be
the last of our worries in that case.

Fixes #13223.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #13238

(cherry picked from commit c550e681d7)
2023-04-27 19:16:58 +03:00
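The retry approach above can be sketched as follows (illustrative; the real test issues REST requests, here `sample` is a stand-in callable returning the sampled partitions for a given duration):

```python
# Sketch of the retry-with-growing-duration approach described above.
def sample_with_retries(sample, start=0.01, cap=5.0):
    duration = start
    while True:
        result = sample(duration)
        if result:                      # got at least one sampled partition
            return duration, result
        if duration >= cap:
            raise AssertionError("nothing sampled even at the duration cap")
        duration = min(duration * 2, cap)

# A responsive server succeeds on the first, very short attempt:
fast_duration, fast_result = sample_with_retries(lambda d: ["pk1"])
# A very slow server still succeeds once the duration grows enough:
slow_duration, _ = sample_with_retries(lambda d: ["pk1"] if d >= 1.0 else [])
```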
Nadav Har'El
00a8c3a433 test/alternator: increase CQL connection timeout
This patch increases the connection timeout in the get_cql_cluster()
function in test/cql-pytest/run.py. This function is used to test
that Scylla came up, and also test/alternator/run uses it to set
up the authentication - which can only be done through CQL.

The Python driver has 2-second and 5-second default timeouts that should
have been more than enough for everybody (TM), but in #13239 we saw
that in one case it apparently wasn't enough. So to be extra safe,
let's increase the default connection-related timeouts to 60 seconds.

Note this change only affects the Scylla *boot* in the test/*/run
scripts, and it does not affect the actual tests - those have different
code to connect to Scylla (see cql_session() in test/cql-pytest/util.py),
and we already increased the timeouts there in #11289.

Fixes #13239

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #13291

(cherry picked from commit 4fdcee8415)
2023-04-27 19:15:39 +03:00
Tomasz Grabiec
c08ed39a33 direct_failure_detector: Avoid throwing exceptions in the success path
sleep_abortable() is aborted on success, which causes sleep_aborted
exception to be thrown. This causes scylla to throw every 100ms for
each pinged node. Throwing may reduce performance if happens often.

Also, it spams the logs if --logger-log-level exception=trace is enabled.

Avoid by swallowing the exception on cancellation.

Fixes #13278.

Closes #13279

(cherry picked from commit 99cb948eac)
2023-04-27 19:14:31 +03:00
Kefu Chai
04424f8956 test: cql-pytest: test_describe: clamp bloom filter's fp rate
before this change, we used `round(random.random(), 5)` for
the value of the `bloom_filter_fp_chance` config option. there are
chances that this expression could return a number lower than or equal
to 6.71e-05.

but we do have a minimum for this option, which is defined by
`utils::bloom_calculations::probs`, and the minimal false positive
rate is 6.71e-05.

we are observing test failures where we use 0 for
the option, and scylla rightly rejects it with the error message of
```
bloom_filter_fp_chance must be larger than 6.71e-05 and less than or equal to 1.0 (got 0)
```.

so, in this change, to address the test failure, we always use a number
greater than or equal to the minimum, to ensure that the randomly picked
number is in the range of supported false positive rates.

Fixes #13313
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13314

(cherry picked from commit 33f4012eeb)
2023-04-27 19:12:53 +03:00
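A minimal sketch of the clamp (illustrative; the constant comes from the commit message, and the exact boundary handling in the real test may differ):

```python
import random

# Minimal supported false positive rate per the commit message
# (utils::bloom_calculations::probs).
MIN_FP_CHANCE = 6.71e-05

def pick_fp_chance(rng=random):
    # Clamp the rounded random value so it never falls below the minimum
    # supported false positive rate.
    return max(round(rng.random(), 5), MIN_FP_CHANCE)

values = [pick_fp_chance() for _ in range(1000)]
```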
Beni Peled
429b696bbc release: prepare for 5.2.0 2023-04-27 16:26:43 +03:00
Beni Peled
a89867d8c2 release: prepare for 5.2.0-rc5 2023-04-25 14:37:54 +03:00
Benny Halevy
6ad94fedf3 utils: clear_gently: do not clear null unique_ptr
Otherwise the null pointer is dereferenced.

Add a unit test reproducing the issue
and testing this fix.

Fixes #13636

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 12877ad026)
2023-04-24 17:51:01 +03:00
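A Python stand-in for the fix (illustrative; `None` plays the role of a null `unique_ptr`):

```python
# clear_gently() must check for a null pointer before dereferencing it.
def clear_gently(ptr):
    if ptr is None:
        return          # the fix: nothing to clear, do not dereference
    ptr.clear()

items = [1, 2, 3]
clear_gently(items)     # clears the container
clear_gently(None)      # previously crashed; now a no-op
```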
Anna Stuchlik
a6188d6abc doc: document tombstone_gc as not experimental
The tombstone_gc was documented as experimental in version 5.0.
It is no longer experimental in version 5.2.
This commit updates the information about the option.

Closes #13469

(cherry picked from commit a68b976c91)
2023-04-24 11:54:06 +03:00
Botond Dénes
50095cc3a5 Merge 'db: system_keyspace: use microsecond resolution for group0_history range tombstone' from Kamil Braun
in `make_group0_history_state_id_mutation`, when adding a new entry to
the group 0 history table, if the parameter `gc_older_than` is engaged,
we create a range tombstone in the mutation which deletes entries older
than the new one by `gc_older_than`. In particular if
`gc_older_than = 0`, we want to delete all older entries.

There was a subtle bug there: we were using millisecond resolution when
generating the tombstone, while the provided state IDs used microsecond
resolution. On a super fast machine it could happen that we managed to
perform two schema changes in a single millisecond; this happened
sometimes in `group0_test.test_group0_history_clearing_old_entries`
on our new CI/promotion machines, causing the test to fail because the
tombstone didn't clear the entry corresponding to the previous schema
change when performing the next schema change (since they happened in
the same millisecond).

Use microsecond resolution to fix that. The consecutive state IDs used
in group 0 mutations are guaranteed to be strictly monotonic at
microsecond resolution (see `generate_group0_state_id` in
service/raft/raft_group0_client.cc).

Fixes #13594

Closes #13604

* github.com:scylladb/scylladb:
  db: system_keyspace: use microsecond resolution for group0_history range tombstone
  utils: UUID_gen: accept decimicroseconds in min_time_UUID

(cherry picked from commit 10c1f1dc80)
2023-04-23 16:03:02 +03:00
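The resolution mismatch can be reproduced arithmetically (illustrative; the timestamps are made-up microseconds-since-epoch values):

```python
# Two state IDs created within the same millisecond are distinct at
# microsecond resolution, but a tombstone computed at millisecond
# resolution cannot separate them.
t1_us = 1_682_000_000_123_456
t2_us = 1_682_000_000_123_999   # same millisecond, later microsecond

def to_millis(us):
    return us // 1000

# At millisecond resolution both collapse to the same value, so a
# "delete entries older than t2" tombstone misses t1:
collide_at_ms = to_millis(t1_us) == to_millis(t2_us)
# At microsecond resolution the two state IDs remain strictly ordered:
ordered_at_us = t1_us < t2_us
```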
Botond Dénes
7b2215d8e0 Merge 'Backport bugfixes regarding UDT, UDF, UDA interactions to branch-5.2' from Wojciech Mitros
This patch backports https://github.com/scylladb/scylladb/pull/12710 to branch-5.2. To resolve the conflicts that it's causing, it also includes
* https://github.com/scylladb/scylladb/pull/12680
* https://github.com/scylladb/scylladb/pull/12681

Closes #13542

* github.com:scylladb/scylladb:
  uda: change the UDF used in a UDA if it's replaced
  functions: add helper same_signature method
  uda: return aggregate functions as shared pointers
  udf: also check reducefunc to confirm that a UDF is not used in a UDA
  udf: fix dropping UDFs that share names with other UDFs used in UDAs
  pytest: add optional argument for new_function argument types
  udt: disallow dropping a user type used in a user function
2023-04-19 01:38:08 -04:00
Botond Dénes
da9f90362d Merge 'Compaction reevaluation bug fixes' from Raphael "Raph" Carvalho
A problem in compaction reevaluation can cause the SSTable set to be left uncompacted for an indefinite amount of time, potentially causing space and read amplification to be suboptimal.

Two reevaluation problems are being fixed: one after off-strategy compaction ends, and another in the compaction manager, which is supposed to periodically reevaluate the need for compaction.

Fixes https://github.com/scylladb/scylladb/issues/13429.
Fixes https://github.com/scylladb/scylladb/issues/13430.

Closes #13431

* github.com:scylladb/scylladb:
  compaction: Make compaction reevaluation actually periodic
  replica: Reevaluate regular compaction on off-strategy completion

(cherry picked from commit 9a02315c6b)
2023-04-19 01:14:33 -04:00
Botond Dénes
c9a17c80f6 mutation/mutation_compactor: consume_partition_end(): reset _stop
The purpose of `_stop` is to remember whether the consumption of the
last partition was interrupted or it was consumed fully. In the former
case, the compactor allows retrieving the compaction state for the given
partition, so that its compaction can be resumed at a later point in
time.
Currently, `_stop` is set to `stop_iteration::yes` whenever the return
value of any of the `consume()` methods is also `stop_iteration::yes`.
Meaning, if the consuming of the partition is interrupted, this is
remembered in `_stop`.
However, a partition whose consumption was interrupted is not always
continued later. Sometimes consumption of a partition is interrupted
because the partition is not interesting and the downstream consumer
wants to stop it. In these cases the compactor should not return an
engaged optional from `detach_state()`, because there is no state to
detach; the state should be thrown away. This was handled incorrectly so
far and is fixed in this patch, by overwriting `_stop` in
`consume_partition_end()` with whatever the downstream consumer returns.
Meaning, if they want to skip the partition, then `_stop` is reset to
`stop_iteration::no` and `detach_state()` will return a disengaged
optional, as it should in this case.

Fixes: #12629

Closes #13365

(cherry picked from commit bae62f899d)
2023-04-18 02:32:24 -04:00
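A minimal model of the fix (illustrative, not the real mutation_compactor; a plain bool stands in for `stop_iteration`):

```python
# consume_partition_end() overwrites _stop with the downstream consumer's
# decision, instead of leaving it sticky from an earlier mid-partition stop.
class Compactor:
    def __init__(self):
        self._stop = False            # stands in for stop_iteration

    def consume(self, downstream_stop):
        if downstream_stop:
            self._stop = True         # interrupted mid-partition
        return downstream_stop

    def consume_partition_end(self, downstream_stop):
        self._stop = downstream_stop  # the fix: reset, don't keep "yes"
        return downstream_stop

    def detach_state(self):
        # Only engaged when the partition should be resumed later.
        return {"resume": True} if self._stop else None

c = Compactor()
c.consume(True)                 # downstream stops an uninteresting partition
c.consume_partition_end(False)  # ...and does not intend to resume it
```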
Wojciech Mitros
7242c42089 uda: change the UDF used in a UDA if it's replaced
Currently, if a UDA uses a UDF that's being replaced,
the UDA will still keep using the old UDF until the
node is restarted.
This patch fixes this behavior by checking all UDAs
when replacing a UDF and updating them if necessary.

Fixes #12709

(cherry picked from commit 02bfac0c66)
2023-04-17 13:14:46 +02:00
Wojciech Mitros
70ff69afab functions: add helper same_signature method
When deciding whether two functions have the same
signature, we have to check if they have the same name
and parameter types. Additionally, if they're represented
by pointers, we need to check if any of them is a nullptr.
This logic is used multiple times, so it's extracted to
a separate function.
To use this function, the `used_by_user_aggregate` method
takes now a function instead of name and types list - we
can do it because we always use it with an existing user
function (that we're trying to drop).
The method will also be useful when we'll be not dropping,
but replacing a user function.

(cherry picked from commit 58987215dc)
2023-04-17 13:14:40 +02:00
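The helper's logic can be sketched like this (hypothetical shape; the real functions are C++ objects, here plain dicts with a name and parameter types):

```python
# Compare two functions by name and argument types; mirror the nullptr checks.
def same_signature(f, g):
    if f is None or g is None:
        return False
    return f["name"] == g["name"] and f["arg_types"] == g["arg_types"]

state_func = {"name": "acc", "arg_types": ("int", "int")}
other_overload = {"name": "acc", "arg_types": ("text", "text")}

matches_itself = same_signature(state_func, dict(state_func))
matches_overload = same_signature(state_func, other_overload)
```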
Wojciech Mitros
5fd4bb853b uda: return aggregate functions as shared pointers
We will want to reuse the functions that we get from an aggregate
without making a deep copy, and it's only possible if we get
pointers from the aggregate instead of actual values.

(cherry picked from commit 20069372e7)
2023-04-17 13:14:24 +02:00
Wojciech Mitros
313649e86d udf: also check reducefunc to confirm that a UDF is not used in a UDA
When dropping a UDF we check that it's not being used in any UDAs,
and fail otherwise. However, we only check its state function
and final function, and it may also be used as its reduce function.
This patch adds the missing checks and a test for them.

(cherry picked from commit ef1dac813b)
2023-04-17 13:14:16 +02:00
Wojciech Mitros
14d8cec130 udf: fix dropping UDFs that share names with other UDFs used in UDAs
Currently, when dropping a function, we only check whether there exists
an aggregate that uses a function with the same name as its state
function or final function. This may cause the drop to fail even
when it's just another UDF with the same name that's used in the
aggregate, and the actually dropped function is not used there.
This patch fixes this by checking not only the name of the
UDA's sfunc and finalfunc, but also their argument types.

(cherry picked from commit 49077dd144)
2023-04-17 13:14:09 +02:00
Wojciech Mitros
203cbb79a1 pytest: add optional argument for new_function argument types
When multiple functions with the same name but different argument types
are created, the default drop statement for these functions will fail
because it does not include the argument types.
With this change, this problem can be worked around by specifying
argument types when creating the function, as this will cause the drop
statement to include them.

(cherry picked from commit 8791b0faf5)
2023-04-17 13:13:59 +02:00
Wojciech Mitros
51f19d1b8c udt: disallow dropping a user type used in a user function
Currently, nothing prevents us from dropping a user type
used in a user function, even though doing so may make us
unable to use the function correctly.
This patch prevents this behavior by checking all function
argument and return types when executing a drop type statement
and preventing it from completing if the type is referenced
by any of them.

(cherry picked from commit 86c61828e6)
2023-04-17 13:13:35 +02:00
Anna Stuchlik
83735ae77f doc: update the metrics between 5.2 and 2023.1
Related: https://github.com/scylladb/scylla-enterprise/issues/2794

This commit adds the information about the metric changes
in version 2023.1 compared to version 5.2.

This commit is part of the 5.2-to-2023.1 upgrade guide and
must be backported to branch-5.2.

Closes #13506

(cherry picked from commit 989a75b2f7)
2023-04-17 11:29:43 +02:00
Avi Kivity
9d384e3af2 Merge 'Backport "reader_concurrency_semaphore: don't evict inactive readers needlessly" to branch-5.2' from Botond Dénes
The patch doesn't apply cleanly, so a targeted backport PR was necessary.
I also needed to cherry-pick two patches from https://github.com/scylladb/scylladb/pull/13255 that the backported patch depends on. Decided against backporting the entire https://github.com/scylladb/scylladb/pull/13255 as it is quite an intrusive change.

Fixes: https://github.com/scylladb/scylladb/issues/11803

Closes #13515

* github.com:scylladb/scylladb:
  reader_concurrency_semaphore: don't evict inactive readers needlessly
  reader_concurrency_semaphore: add stats to record reason for queueing permits
  reader_concurrency_semaphore: can_admit_read(): also return reason for rejection
2023-04-17 12:25:21 +03:00
Nadav Har'El
0da0c94f49 cql: USING TTL 0 means unlimited, not default TTL
Our documentation states that writing an item with "USING TTL 0" means it
should never expire. This should be true even if the table has a default
TTL. But Scylla mistakenly handled "USING TTL 0" exactly like having no
USING TTL at all (i.e., it took the default TTL, instead of unlimited).
We had two xfailing tests demonstrating that Scylla's behavior in this
is different from Cassandra. Scylla's behavior in this case was also
undocumented.

By the way, Cassandra used to have the same bug (CASSANDRA-11207) but
it was fixed already in 2016 (Cassandra 3.6).

So in this patch we fix Scylla's "USING TTL 0" behavior to match the
documentation and Cassandra's behavior since 2016. One xfailing test
starts to pass, and the second test gets past this bug but fails on a
different one. This patch also adds a third test for "USING TTL ?"
with UNSET_VALUE - it behaves, on both Scylla and Cassandra, like a
missing "USING TTL".

The origin of this bug was that after parsing the statement, we saved
the USING TTL in an integer, and used 0 for the case of no USING TTL
given. This meant that we couldn't tell if we have USING TTL 0 or
no USING TTL at all. This patch uses an std::optional so we can tell
the case of a missing USING TTL from the case of USING TTL 0.

Fixes #6447

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #13079

(cherry picked from commit a4a318f394)
2023-04-17 10:41:08 +03:00
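The optional-based fix can be sketched as follows (illustrative names and default; the real code is C++ using `std::optional`):

```python
from typing import Optional

# Keep the parsed TTL in an optional so "USING TTL 0" (never expire) is
# distinguishable from a missing USING TTL (use the table's default TTL).
TABLE_DEFAULT_TTL = 3600

def effective_ttl(using_ttl: Optional[int]) -> int:
    if using_ttl is None:       # no USING TTL clause at all
        return TABLE_DEFAULT_TTL
    return using_ttl            # 0 stays 0: the row never expires
```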
Nadav Har'El
1a9f51b767 cql: fix empty aggregation, and add more tests
This patch fixes #12475, where an aggregation (e.g., COUNT(*), MIN(v))
of absolutely no partitions (e.g., "WHERE p = null" or "WHERE p in ()")
resulted in an internal error instead of the "zero" result that each
aggregator expects (e.g., 0 for COUNT, null for MIN).

The problem is that normally our aggregator forwarder picks the nodes
which hold the relevant partition(s), forwards the request to each of
them, and then combines these results. When there are no partitions,
the query is sent to no node, and we end up with an empty result set
instead of the "zero" results. So in this patch we recognize this
case and build those "zero" results (as mentioned above, these aren't
always 0 and depend on the aggregation function!).

The patch also adds two tests reproducing this issue in a fairly general
way (e.g., several aggregators, different aggregation functions) and
confirming the patch fixes the bug.

The test also includes two additional tests for COUNT aggregation, which
uncovered an incompatibility with Cassandra which is still not fixed -
so these tests are marked "xfail":

Refs #12477: Combining COUNT with GROUP by results with empty results
             in Cassandra, and one result with empty count in Scylla.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12715

(cherry picked from commit 3ba011c2be)
2023-04-17 10:41:08 +03:00
Raphael S. Carvalho
dba0e604a7 table: Fix disk-space related metrics
The total disk space used metric incorrectly reports the amount of
disk space ever used. It should report the size of
all sstables in use + the ones waiting to be deleted.
Live disk space used, by this definition, shouldn't account for the
ones waiting to be deleted.
And live sstable count shouldn't account for sstables waiting to
be deleted.

Fix all that.

Fixes #12717.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit 529a1239a9)
2023-04-16 22:14:01 +03:00
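The corrected definitions amount to simple arithmetic (illustrative byte counts):

```python
# "Total" counts sstables in use plus those awaiting deletion;
# "live" counts only those in use.
in_use_sizes = [100, 250]       # sstables currently in the set
pending_deletion_sizes = [400]  # sstables waiting to be deleted

total_disk_space_used = sum(in_use_sizes) + sum(pending_deletion_sizes)
live_disk_space_used = sum(in_use_sizes)
live_sstable_count = len(in_use_sizes)
```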
Michał Chojnowski
4ea67940cb locator: token_metadata: get rid of a quadratic behaviour in get_address_ranges()
Some callees of update_pending_ranges use the variant of get_address_ranges()
which builds a hashmap of all <endpoint, owned range> pairs. For
everywhere_topology, the size of this map is quadratic in the number of
endpoints, making it big enough to cause contiguous allocations of tens of MiB
for clusters of realistic size, potentially causing trouble for the
allocator (as seen e.g. in #12724). This deserves a correction.

This patch removes the quadratic variant of get_address_ranges() and replaces
its uses with its linear counterpart.

Refs #10337
Refs #10817
Refs #10836
Refs #10837
Fixes #12724

(cherry picked from commit 9e57b21e0c)
2023-04-16 21:59:14 +03:00
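Rough arithmetic behind the quadratic blow-up (the 256 vnodes-per-node figure is an illustrative assumption, not taken from the commit):

```python
# With everywhere_topology every endpoint owns every range, so a map of all
# <endpoint, owned range> pairs grows with the square of the cluster size.
def pair_map_entries(n_endpoints, vnodes_per_node=256):
    total_ranges = n_endpoints * vnodes_per_node
    return n_endpoints * total_ranges     # every endpoint owns every range

small = pair_map_entries(3)
large = pair_map_entries(100)
```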
Jan Ciolek
a8c49c44e5 cql/query_options: add a check for missing bind marker name
There was a missing check in validation of named
bind markers.

Let's say that a user prepares a query like:
```cql
INSERT INTO ks.tab (pk, ck, v) VALUES (:pk, :ck, :v)
```
Then they execute the query, but specify only
values for `:pk` and `:ck`.

We should detect that a value for :v is missing
and throw an invalid_request_exception.

Until now there was no such check, in case of a missing variable
invalid `query_options` were created and Scylla crashed.

Sadly it's impossible to create a regression test
using `cql-pytest` or `boost`.

`cql-pytest` uses the python driver, which silently
ignores mising named bind variables, deciding
that the user meant to send an UNSET_VALUE for them.
When given values like `{'pk': 1, 'ck': 2}`, it will automaticaly
extend them to `{'pk': 1, 'ck': 2, 'v': UNSET_VALUE}`.

In `boost` I tried to use `cql_test_env`,
but it only has methods which take valid `query_options`
as a parameter. I could create separate unit tests
for the creation and validation of `query_options`,
but they wouldn't be a true end-to-end test like `cql-pytest`.

The bug was found using the rust driver,
the reproducer is available in the issue description.

Fixes: #12727

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>

Closes #12730

(cherry picked from commit 2a5ed115ca)
2023-04-16 21:57:28 +03:00
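The added check can be sketched like this (illustrative; Scylla's real validation operates on the wire-protocol values when building `query_options`):

```python
# Every named bind marker in a prepared statement must have a value;
# otherwise reject the request instead of building invalid options.
class InvalidRequest(Exception):
    pass

def bind_values(marker_names, values):
    bound = []
    for name in marker_names:
        if name not in values:
            raise InvalidRequest("missing value for bind marker :" + name)
        bound.append(values[name])
    return bound

ok = bind_values(["pk", "ck", "v"], {"pk": 1, "ck": 2, "v": 3})
try:
    bind_values(["pk", "ck", "v"], {"pk": 1, "ck": 2})
    detected = False
except InvalidRequest:
    detected = True
```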
Nadav Har'El
12a29edf90 test/alternator: fix flaky test for partition-tombstone scan
The test test_scan.py::test_scan_long_partition_tombstone_string
checks that a full-table Scan operation ends a page in the middle of
a very long string of partition tombstones, and does NOT scan the
entire table in one page (if we did that, getting a single page could
take an unbounded amount of time).

The test is currently flaky, having failed in CI runs three times in
the past two months.

The reason for the flakiness is that we don't know exactly how long
we need to make the sequence of partition tombstones in the test before
we can be absolutely sure a single page will not read this entire sequence.
For single-partition scans we have the "query_tombstone_page_limit"
configuration parameter, which tells us exactly how long we need to
make the sequence of row tombstones. But for a full-table scan of
partition tombstones, the situation is more complicated - because the
scan is done on several vnodes in parallel, and each of
them needs to read query_tombstone_page_limit tombstones before it stops.

In my experiments, using query_tombstone_page_limit * 4 consecutive tombstones
was always enough - I ran this test hundreds of times and it didn't fail
once. But since it did fail on Jenkins very rarely (3 times in the last
two months), maybe the multiplier 4 isn't enough. So this patch doubles
it to 8. Hopefully this would be enough for anyone (TM).

This makes this test even bigger and slower than it was. To make it
faster, I changed this test's write isolation mode from the default
always_use_lwt to forbid_rmw (not use LWT). This leaves the test's
total run time to be similar to what it was before this patch - around
0.5 seconds in dev build mode on my laptop.

Fixes #12817

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12819

(cherry picked from commit 14cdd034ee)
2023-04-14 11:54:45 +03:00
Botond Dénes
3e10c3fc89 reader_concurrency_semaphore: don't evict inactive readers needlessly
Inactive readers should only be evicted to free up resources for waiting
readers. Evicting them when waiters are not admitted for any other
reason than resources is wasteful and leads to extra load later on when
these evicted readers have to be recreated and requeued.
This patch changes the logic on both the registering path and the
admission path to not evict inactive readers unless there are readers
actually waiting on resources.
A unit-test is also added, reproducing the overly-aggressive eviction and
checking that it doesn't happen anymore.

Fixes: #11803

Closes #13286

(cherry picked from commit bd57471e54)
2023-04-14 10:37:30 +03:00
Botond Dénes
f11deb5074 reader_concurrency_semaphore: add stats to record reason for queueing permits
When diagnosing problems, knowing why permits were queued is very
valuable. Record the reason in a new stats, one for each reason a permit
can be queued.

(cherry picked from commit 7b701ac52e)
2023-04-14 10:37:30 +03:00
Botond Dénes
1baf9dddd7 reader_concurrency_semaphore: can_admit_read(): also return reason for rejection
So caller can bump the appropriate counters or log the reason why the
the request cannot be admitted.

(cherry picked from commit bb00405818)
2023-04-14 09:30:02 +03:00
Kamil Braun
9717ff5057 docs: cleaning up after failed membership change
After a failed topology operation, like bootstrap / decommission /
removenode, the cluster might contain a garbage entry in either token
ring or group 0. This entry can be cleaned-up by executing removenode on
any other node, pointing to the node that failed to bootstrap or leave
the cluster.

Document this procedure, including a method of finding the host ID of a
garbage entry.

Add references in other documents.

Fixes: #13122

Closes #13186

(cherry picked from commit c2a2996c2b)
2023-04-13 10:35:02 +02:00
Anna Stuchlik
b293b1446f doc: remove Enterprise upgrade guides from OSS doc
This commit removes the Enterprise upgrade guides from
the Open Source documentation. The Enterprise upgrade guides
should only be available in the Enterprise documentation,
with the source files stored in scylla-enterprise.git.

In addition, this commit:
- adds the links to the Enterprise user guides in the Enterprise
documentation at https://enterprise.docs.scylladb.com/
- adds the redirections for the removed pages to avoid
breaking any links.

This commit must be reverted in scylla-enterprise.git.

(cherry picked from commit 61bc05ae49)

Closes #13473
2023-04-11 14:26:35 +03:00
Yaron Kaikov
e6f7ac17f6 doc: update supported os for 2022.1
Ubuntu 22.04 is already supported on both `5.0` and `2022.1`.

This commit updates the table accordingly.

Closes #13340

(cherry picked from commit c80ab78741)
2023-04-05 13:56:07 +03:00
Anna Stuchlik
36619fc7d9 doc: add upgrade guide from 5.2 to 2023.1
Related: https://github.com/scylladb/scylla-enterprise/issues/2770

This commit adds the upgrade guide from ScyllaDB Open Source 5.2
to ScyllaDB Enterprise 2023.1.
This commit does not cover metric updates (the metrics file has no
content, which needs to be added in another PR).

As this is an upgrade guide, this commit must be merged to master and
backported to branch-5.2 and branch-2023.1 in scylla-enterprise.git.

Closes #13294

(cherry picked from commit 595325c11b)
2023-04-05 06:43:01 +03:00
Anna Stuchlik
750414c196 doc: update Raft doc for versions 5.2 and 2023.1
Fixes https://github.com/scylladb/scylladb/issues/13345
Fixes https://github.com/scylladb/scylladb/issues/13421

This commit updates the Raft documentation page to be up to date in versions 5.2 and 2023.1.

- Irrelevant information about previous releases is removed.
- Some information is clarified.
- Mentions of version 5.2 are either removed (if possible) or version 2023.1 is added.

Closes #13426

(cherry picked from commit 447ce58da5)
2023-04-05 06:42:28 +03:00
Botond Dénes
128050e984 Merge 'commitlog: Fix updating of total_size_on_disk on segment alloc when o_dsync is off' from Calle Wilund
Fixes #12810

We did not update total_size_on_disk in commitlog totals when o_dsync was off.
This means we essentially ran with no registered footprint, also causing broken comparisons in delete_segments.

Closes #12950

* github.com:scylladb/scylladb:
  commitlog: Fix updating of total_size_on_disk on segment alloc when o_dsync is off
  commitlog: change type of stored size

(cherry picked from commit e70be47276)
2023-04-03 08:57:43 +03:00
Yaron Kaikov
d70751fee3 release: prepare for 5.2.0-rc4 2023-04-02 16:40:56 +03:00
Tzach Livyatan
1fba43c317 docs: minor improvements to the Raft Handling Failures and recovery procedure sections
Closes #13292

(cherry picked from commit 46e6c639d9)
2023-03-31 11:22:20 +02:00
Botond Dénes
e380c24c69 Merge 'Improve database shutdown verbosity' from Pavel Emelyanov
The `database::stop` method sometimes hangs, and it's always hard to spot where exactly it sleeps. A few more logging messages make this much simpler.

refs: #13100
refs: #10941

Closes #13141

* github.com:scylladb/scylladb:
  database: Increase verbosity of database::stop() method
  large_data_handler: Increase verbosity on shutdown
  large_data_handler: Coroutinize .stop() method

(cherry picked from commit e22b27a107)
2023-03-30 17:01:24 +03:00
Avi Kivity
76a76a95f4 Update tools/java submodule (hdrhistogram with Java 11)
* tools/java 1c4e1e7a7d...83b2168b19 (1):
  > Fix cassandra-stress -log hdrfile=... with java 11

Fixes #13287
2023-03-29 14:10:27 +03:00
Anna Stuchlik
f6837afec7 doc: update the Ubuntu version used in the image
Starting from 5.2 and 2023.1 our images are based on Ubuntu:22.04.
See https://github.com/scylladb/scylladb/issues/13138#issuecomment-1467737084

This commit adds that information to the docs.
It should be merged and backported to branch-5.2.

Closes #13301

(cherry picked from commit 9e27f6b4b7)
2023-03-27 14:08:57 +03:00
Botond Dénes
6350c8836d Revert "repair: Reduce repair reader eviction with diff shard count"
This reverts commit c6087cf3a0.

Said commit can cause a deadlock when 2 or more repairs compete for
locks on 2 or more nodes. Consider the following scenario:

Node n1 and n2 in the cluster, 1 shard per node, rf = 2, each shard has
1 available unit for the reader lock

    n1 starts repair r1
    r1-n1 (instance of r1 on node1) takes the reader lock on node1
    n2 starts repair r2
    r2-n2 (instance of r2 on node2) takes the reader lock on node2
    r1-n2 will fail to take the reader lock on node2
    r2-n1 will fail to take the reader lock on node1

As a result, r1 and r2 cannot make progress and a deadlock occurs.

The complexity comes from the fact that a repair job needs locks on more
than one node. It is not guaranteed that all the participant nodes can
take the lock in one shot.

There is no simple solution to this, so we have to revert this locking
mechanism and look for another way to prevent reader thrashing when
repairing nodes with mismatching shard count.

Fixes: #12693

Closes #13266

(cherry picked from commit 7699904c54)
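The interleaving from the scenario above can be modeled with a toy try-lock (names r1/r2/n1/n2 follow the message; this is an illustration, not repair code):

```python
# Toy try-lock model of the deadlock: r1/r2 are repairs, n1/n2 are
# nodes, and each node has a single reader-lock slot.
locks = {"n1": None, "n2": None}

def take(node, job):
    # non-blocking: succeed only if the node's reader lock is free
    if locks[node] is None:
        locks[node] = job
        return True
    return False

assert take("n1", "r1")       # r1-n1 takes the lock on node1
assert take("n2", "r2")       # r2-n2 takes the lock on node2
assert not take("n2", "r1")   # r1-n2 fails
assert not take("n1", "r2")   # r2-n1 fails: neither repair can proceed
```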
2023-03-24 09:44:16 +02:00
Avi Kivity
5457948437 Update seastar submodule (rpc cancellation during negotiation)
* seastar 8889cbc198...1488aaf842 (1):
  > Merge 'Keep outgoing queue all cancellable while negotiating (again)' from Pavel Emelyanov

Fixes #11507.
2023-03-23 17:15:00 +02:00
Avi Kivity
da41001b5c .gitmodules: point seastar submodule at scylla-seastar.git
This allows us to backport seastar commits.

Ref #11507.
2023-03-23 17:11:43 +02:00
Anna Stuchlik
dd61e8634c doc: related https://github.com/scylladb/scylladb/issues/12754; add the missing information about reporting latencies to the upgrade guide 5.1 to 5.2
Closes #12935

(cherry picked from commit 26bb36cdf5)
2023-03-22 10:38:28 +02:00
Anna Stuchlik
b642b4c30e doc: fix the service name in upgrade guides
Fixes https://github.com/scylladb/scylladb/issues/13207

This commit fixes the service and package names in
the upgrade guides 5.0-to-2022.1 and 5.1-to-2022.2.
Service name: scylla-server
Package name: scylla-enterprise

Previous PRs to fix the same issue in other
upgrade guides:
https://github.com/scylladb/scylladb/pull/12679
https://github.com/scylladb/scylladb/pull/12698

This commit must be backported to branch-5.1 and branch 5.2.

Closes #13225

(cherry picked from commit 922f6ba3dd)
2023-03-22 10:37:12 +02:00
Botond Dénes
c013336121 db/view/view_update_check: check_needs_view_update_path(): filter out non-member hosts
We currently don't clean up the system_distributed.view_build_status
table after nodes are removed. This can cause a false-positive check for
whether view update generation is needed for streaming.
The proper fix is to clean up this table, but that will be more
involved, and even when done, it might not be immediate.
and to be on the safe side, filter out entries belonging to unknown
hosts from said table.

Fixes: #11905
Refs: #11836

Closes #11860

(cherry picked from commit 84a69b6adb)
2023-03-22 09:03:50 +02:00
Kamil Braun
b6b35ce061 service: storage_proxy: sequence CDC preimage select with Paxos learn
`paxos_response_handler::learn_decision` was calling
`cdc_service::augment_mutation_call` concurrently with
`storage_proxy::mutate_internal`. `augment_mutation_call` was selecting
rows from the base table in order to create the preimage, while
`mutate_internal` was writing rows to the table. It was therefore
possible for the preimage to observe the update that it accompanied,
which doesn't make any sense, because the preimage is supposed to show
the state before the update.

Fix this by performing the operations sequentially. We can still perform
the CDC mutation write concurrently with the base mutation write.

`cdc_with_lwt_test` was sometimes failing in debug mode due to this bug
and was marked flaky. Unmark it.

Fixes #12098

(cherry picked from commit 1ef113691a)
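A minimal asyncio sketch of the fix (hypothetical function names, not the actual storage_proxy API): the preimage select completes before the base write begins, so the preimage can never observe the update it accompanies.

```python
import asyncio

events = []  # records the order of operations

async def select_preimage():
    # reads the base-table rows the update is about to modify
    events.append("select_preimage")

async def write_base_mutation():
    events.append("write_base")

async def learn_decision():
    # sequential, as in the fix: the preimage read finishes before
    # the write starts, so it cannot observe the update it accompanies
    await select_preimage()
    await write_base_mutation()

asyncio.run(learn_decision())
assert events == ["select_preimage", "write_base"]
```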
2023-03-21 20:23:19 +02:00
Petr Gusev
069e38f02d transport server: fix unexpected server errors handling
If request processing ended with an error, it is worth
sending the error to the client through
make_error/write_response. Previously in this case we
just wrote a message to the log and didn't handle the
client connection in any way. As a result, the only
thing the client got in this case was a timeout error.

A new test_batch_with_error is added. It is quite
difficult to reproduce the error condition in a test,
so we use error injection instead. Passing injection_key
in the body of the request ensures that the exception
will be thrown only for this test request and
will not affect other requests that
the driver may send in the background.

Closes: scylladb#12104
(cherry picked from commit a4cf509c3d)
2023-03-21 20:23:09 +02:00
Anna Mikhlin
61a8003ad1 release: prepare for 5.2.0-rc3 2023-03-20 10:10:27 +02:00
Botond Dénes
8a17066961 Merge 'doc: Updates the recommended OS to be Ubuntu 22.04' from Anna Stuchlik
Fixes https://github.com/scylladb/scylladb/issues/13138
Fixes https://github.com/scylladb/scylladb/issues/13153

This PR:

- Fixes outdated information about the recommended OS. Since version 5.2, the recommended OS should be Ubuntu 22.04 because that OS is used for building the ScyllaDB image.
- Adds the OS support information for version 5.2.

This PR (both commits) needs to be backported to branch-5.2.

Closes #13188

* github.com:scylladb/scylladb:
  doc: Add OS support for version 5.2
  doc: Updates the recommended OS to be Ubuntu 22.04

(cherry picked from commit f4b5679804)
2023-03-17 10:30:06 +02:00
Pavel Emelyanov
487ba9f3e1 Merge '[backport] reader_concurrency_semaphore:: clear_inactive_reads(): defer evicting to evict()' from Botond Dénes
This PR backports 2f4a793457 to branch-5.2. Said patch depends on some other patches that are not part of any release yet.
This PR should apply to 5.1 and 5.0 too.

Closes #13162

* github.com:scylladb/scylladb:
  reader_concurrency_semaphore:: clear_inactive_reads(): defer evicting to evict()
  reader_permit: expose operator<<(reader_permit::state)
  reader_permit: add get_state() accessor
2023-03-16 18:41:08 +03:00
Botond Dénes
bd4f9e3615 Merge 'readers/nonforwarding: don't emit partition_end on next_partition,fast_forward_to' from Gusev Petr
The series fixes the `make_nonforwardable` reader: it shouldn't emit `partition_end` for the previous partition after `next_partition()` and `fast_forward_to()`.

Fixes: #12249

Closes #12978

* github.com:scylladb/scylladb:
  flat_mutation_reader_test: cleanup, seastar::async -> SEASTAR_THREAD_TEST_CASE
  make_nonforwardable: test through run_mutation_source_tests
  make_nonforwardable: next_partition and fast_forward_to when single_partition is true
  make_forwardable: fix next_partition
  flat_mutation_reader_v2: drop forward_buffer_to
  nonforwardable reader: fix indentation
  nonforwardable reader: refactor, extract reset_partition
  nonforwardable reader: add more tests
  nonforwardable reader: no partition_end after fast_forward_to()
  nonforwardable reader: no partition_end after next_partition()
  nonforwardable reader: no partition_end for empty reader
  row_cache: pass partition_start though nonforwardable reader

(cherry picked from commit 46efdfa1a1)
2023-03-16 10:42:03 +02:00
Botond Dénes
c68deb2461 reader_concurrency_semaphore:: clear_inactive_reads(): defer evicting to evict()
Instead of open-coding the same logic in an incomplete way.
clear_inactive_reads() does incomplete eviction in several ways:
* it doesn't decrement _stats.inactive_reads
* it doesn't set the permit to evicted state
* it doesn't cancel the ttl timer (if any)
* it doesn't call the eviction notifier on the permit (if there is one)

The list goes on. We already have an evict() method that does all this
correctly; use that instead of the current badly open-coded alternative.

This patch also enhances the existing test for clear_inactive_reads()
and adds a new one specifically for `stop()` being called while having
inactive reads.

Fixes: #13048

Closes #13049

(cherry picked from commit 2f4a793457)
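A toy model of the idea (illustrative Python, not the semaphore's real interface): one evict() performs every cleanup step, and clear_inactive_reads() simply calls it.

```python
class Permit:
    def __init__(self):
        self.state = "inactive"
        self.ttl_timer_armed = True

class Semaphore:
    def __init__(self):
        self.inactive_reads = 0
        self.permits = []

    def register_inactive(self):
        permit = Permit()
        self.permits.append(permit)
        self.inactive_reads += 1
        return permit

    def evict(self, permit):
        # the single place that performs *all* eviction steps,
        # so no caller can forget one of them
        self.inactive_reads -= 1
        permit.state = "evicted"
        permit.ttl_timer_armed = False

    def clear_inactive_reads(self):
        while self.permits:
            self.evict(self.permits.pop())

sem = Semaphore()
p1, p2 = sem.register_inactive(), sem.register_inactive()
sem.clear_inactive_reads()
assert sem.inactive_reads == 0
assert p1.state == p2.state == "evicted"
```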
2023-03-14 09:50:16 +02:00
Botond Dénes
dd96d3017a reader_permit: expose operator<<(reader_permit::state)
(cherry picked from commit ec1c615029)
2023-03-14 09:50:16 +02:00
Botond Dénes
6ca80ee118 reader_permit: add get_state() accessor
(cherry picked from commit 397266f420)
2023-03-14 09:40:11 +02:00
Jan Ciolek
eee8f750cc cql3: preserve binary_operator.order in search_and_replace
There was a bug in `expr::search_and_replace`:
it didn't preserve the `order` field of binary_operator.

The `order` field is used to mark relations created
using the SCYLLA_CLUSTERING_BOUND.
It is a CQL feature used for internal queries inside Scylla.
It means that we should handle the restriction as a raw
clustering bound, not as an expression in the CQL language.

Losing the SCYLLA_CLUSTERING_BOUND marker could cause issues,
the database could end up selecting the wrong clustering ranges.

Fixes: #13055

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>

Closes #13056

(cherry picked from commit aa604bd935)
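The fix's principle, rebuilding a node while carrying all fields over, can be sketched in Python with dataclasses.replace (the types and field names here are illustrative, not Scylla's expression AST):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class BinaryOperator:
    lhs: str
    op: str
    rhs: str
    order: str = "cql"   # "clustering" marks a SCYLLA_CLUSTERING_BOUND

def search_and_replace(expr, target, replacement):
    # rebuild the node but carry every field over, including `order`;
    # the bug was constructing a fresh node and losing the marker
    if expr.rhs == target:
        return replace(expr, rhs=replacement)
    return expr

e = BinaryOperator("ck", ">", "?", order="clustering")
e2 = search_and_replace(e, "?", ":bound")
assert e2.order == "clustering"   # marker preserved
```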
2023-03-09 12:52:39 +02:00
Botond Dénes
8d5206e6c6 sstables/sstable: validate_checksums(): force-check EOF
EOF is only guaranteed to be set if one tried to read past the end of the
file. So when checking for EOF, also try to read some more. This
should force the EOF flag into a correct value. We can then check that
the read yielded 0 bytes.
This should ensure that `validate_checksums()` will not falsely declare
the validation to have failed.

Fixes: #11190

Closes #12696

(cherry picked from commit 693c22595a)
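The same force-check pattern in Python (a sketch, not the sstable validation code): after consuming the expected bytes, issue one more read and require it to return zero bytes.

```python
import os
import tempfile

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"0123456789")
    path = f.name

with open(path, "rb") as f:
    data = f.read(10)   # consume everything we expected to find
    extra = f.read(1)   # force-check EOF with one more read

assert data == b"0123456789"
assert extra == b""     # zero bytes read: we really are at end-of-file
os.remove(path)
```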
2023-03-09 12:30:44 +02:00
Anna Stuchlik
cfa40402f4 doc: Update the documentation landing page
This commit makes the following changes to the docs landing page:

- Adds the ScyllaDB enterprise docs as one of three tiles.

- Modifies the three tiles to reflect the three flavors of ScyllaDB.

- Moves the "New to ScyllaDB? Start here!" under the page title.

- Renames "Our Products" to "Other Products" to list the products other
  than ScyllaDB itself. In addition, the boxes are enlarged to
  large-4 to look better.

The major purpose of this commit is to expose the ScyllaDB
documentation.

docs: fix the link
(cherry picked from commit 27bb8c2302)

Closes #13086
2023-03-06 14:18:15 +02:00
Botond Dénes
2d170e51cf Merge 'doc: specify the versions where Alternator TTL is no longer experimental' from Anna Stuchlik
This PR adds a note to the Alternator TTL section to specify in which Open Source and Enterprise versions the feature was promoted from experimental to non-experimental.

The challenge here is that OSS and Enterprise are (still) **documented together**, but they're **not in sync** in promoting the TTL feature: it's still experimental in 5.1 (released) but no longer experimental in 2022.2 (to be released soon).

We can take one of the following approaches:
a) Merge this PR with master and ask the 2022.2 users to refer to master.
b) Merge this PR with master and then backport to branch-5.1. If we choose this approach, it is necessary to backport https://github.com/scylladb/scylladb/pull/11997 beforehand to avoid conflicts.

I'd opt for a) because it makes more sense from the OSS perspective and helps us avoid mess and backporting.

Closes #12295

* github.com:scylladb/scylladb:
  doc: fix the version in the comment on removing the note
  doc: specify the versions where Alternator TTL is no longer experimental

(cherry picked from commit d5dee43be7)
2023-03-02 12:09:16 +02:00
Anna Stuchlik
860e79e4b1 doc: fixes https://github.com/scylladb/scylladb/issues/12954, adds the minimal version from which the 2021.1-to-2022.1 upgrade is supported for Ubuntu, Debian, and image
Closes #12974

(cherry picked from commit 91b611209f)
2023-02-28 13:02:05 +02:00
Anna Mikhlin
908a82bea0 release: prepare for 5.2.0-rc2 2023-02-28 10:13:06 +02:00
Gleb Natapov
39158f55d0 lwt: do not destroy capture in upgrade_if_needed lambda since the lambda is used more than once
If on the first call the capture is destroyed the second call may crash.

Fixes: #12958

Message-Id: <Y/sks73Sb35F+PsC@scylladb.com>
(cherry picked from commit 1ce7ad1ee6)
2023-02-27 14:19:37 +02:00
Raphael S. Carvalho
22c1685b3d sstables: Temporarily disable loading of first and last position metadata
It's known that reading large cells in reverse causes large allocations.
Source: https://github.com/scylladb/scylladb/issues/11642

The loading is preliminary work for splitting large partitions into
fragments composing a run and then be able to later read such a run
in an efficient way using the position metadata.

The splitting is not turned on yet, anywhere. Therefore, we can
temporarily disable the loading, as a way to avoid regressions in
stable versions. Large allocations can cause stalls due to foreground
memory eviction kicking in.
The default values for position metadata say that first and last
position include all clustering rows, but they aren't used anywhere
other than by sstable_run to determine if a run is disjoint at
clustering level, but given that no splitting is done yet, it
does not really matter.

Unit tests relying on position metadata were adjusted to enable
the loading, such that they can still pass.

Fixes #11642.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #12979

(cherry picked from commit d73ffe7220)
2023-02-27 08:58:34 +02:00
Botond Dénes
9ba6fc73f1 mutation_compactor: only pass consumed range-tombstone-change to validator
Currently all consumed range tombstone changes are unconditionally
forwarded to the validator, even if they are shadowed by a higher-level
tombstone and/or are purgeable. This can result in a situation where a range
tombstone change was seen by the validator but not passed to the
consumer. The validator expects the range tombstone change to be closed
by end-of-partition but the end fragment won't come as the tombstone was
dropped, resulting in a false-positive validation failure.
Fix by only passing tombstones to the validator that are actually
passed to the consumer too.

Fixes: #12575

Closes #12578

(cherry picked from commit e2c9cdb576)
2023-02-23 22:52:47 +02:00
Botond Dénes
f2e2c0127a types: unserialize_value for multiprecision_int,bool: don't read uninitialized memory
Check the first fragment before dereferencing it, the fragment might be
empty, in which case move to the next one.
Found by running range scan tests with random schema and random data.

Fixes: #12821
Fixes: #12823
Fixes: #12708

Closes #12824

(cherry picked from commit ef548e654d)
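A sketch of the guard in Python (fragment handling is illustrative, not the actual multiprecision deserializer): check each fragment before dereferencing it, and skip empty ones.

```python
def first_byte(fragments):
    # check each fragment before dereferencing it; an empty fragment
    # means "move to the next one", not "read whatever memory is there"
    for frag in fragments:
        if frag:
            return frag[0]
    raise ValueError("value has no bytes")

assert first_byte([b"", b"", b"\x01\x02"]) == 1
```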
2023-02-23 22:38:03 +02:00
Gleb Natapov
363ea87f51 raft: abort applier fiber when a state machine aborts
After 5badf20c7a, the applier fiber does not
stop after it gets an abort error from a state machine, which may trigger an
assertion because the previous batch is not applied. Fix it.

Fixes #12863

(cherry picked from commit 9bdef9158e)
2023-02-23 14:12:12 +02:00
Kefu Chai
c49fd6f176 tools/schema_loader: do not return ref to a local variable
We should never return a reference to a local variable,
so in this change, a reference to a static variable is returned
instead. This addresses the following warning from Clang 17:

```
/home/kefu/dev/scylladb/tools/schema_loader.cc:146:16: error: returning reference to local temporary object [-Werror,-Wreturn-stack-address]
        return {};
               ^~
```

Fixes #12875
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12876

(cherry picked from commit 6eab8720c4)
2023-02-22 22:02:43 +02:00
Takuya ASADA
3114589a30 scylla_coredump_setup: fix coredump timeout settings
We currently configure only TimeoutStartSec, but it's probably not
enough to prevent a coredump timeout, since TimeoutStartSec is the maximum
waiting time for service startup, and there is another directive to
specify the maximum service running time (RuntimeMaxSec).

To fix the problem, we should specify RuntimeMaxSec and TimeoutSec (it
configures both TimeoutStartSec and TimeoutStopSec).

Fixes #5430

Closes #12757

(cherry picked from commit bf27fdeaa2)
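The directives map to a unit drop-in roughly like this (a sketch; the values are illustrative, not the ones scylla_coredump_setup writes):

```ini
[Service]
# TimeoutSec sets both TimeoutStartSec and TimeoutStopSec;
# RuntimeMaxSec caps how long the service may run once started.
TimeoutSec=300
RuntimeMaxSec=300
```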
2023-02-19 21:13:36 +02:00
Anna Stuchlik
34f68a4c0f doc: related https://github.com/scylladb/scylladb/issues/12658, fix the service name in the upgrade guide from 2022.1 to 2022.2
Closes #12698

(cherry picked from commit 826f67a298)
2023-02-17 12:17:48 +02:00
Botond Dénes
b336e11f59 Merge 'doc: fix the service name from "scylla-enterprise-server" "to "scylla-server"' from Anna Stuchlik
Related https://github.com/scylladb/scylladb/issues/12658.

This issue fixes the bug in the upgrade guides for the released versions.

Closes #12679

* github.com:scylladb/scylladb:
  doc: fix the service name in the upgrade guide for patch releases versions 2022
  doc: fix the service name in the upgrade guide from 2021.1 to 2022.1

(cherry picked from commit 325246ab2a)
2023-02-17 12:16:52 +02:00
Anna Stuchlik
9ef73d7e36 doc: fixes https://github.com/scylladb/scylladb/issues/12754, document the metric update in 5.2
Closes #12891

(cherry picked from commit bcca706ff5)
2023-02-17 12:16:13 +02:00
Botond Dénes
8700a72b4c Merge 'Backport compaction-backlog-tracker fixes to branch-5.2' from Raphael "Raph" Carvalho
Both patches are important to fix inefficiencies when updating the backlog tracker, which can manifest as a reactor stall, on a special event like schema change.

No conflicts when backporting.

Regression since 1d9f53c881, which is present in branch 5.1 onwards.

Closes #12851

* github.com:scylladb/scylladb:
  compaction: Fix inefficiency when updating LCS backlog tracker
  table: Fix quadratic behavior when inserting sstables into tracker on schema change
2023-02-15 07:22:25 +02:00
Raphael S. Carvalho
886dd3e1d2 compaction: Fix inefficiency when updating LCS backlog tracker
LCS backlog tracker uses STCS tracker for L0. Turns out LCS tracker
is calling the STCS tracker's replace_sstables() with empty arguments
even when *only* higher levels (> 0) had sstables replaced.
This unnecessary call to STCS tracker will cause it to recompute
the L0 backlog, yielding the same value as before.

As LCS has a fragment size of 0.16G on higher levels, we may be
updating the tracker multiple times during incremental compaction,
which operates on SSTables on higher levels.

Inefficiency is fixed by only updating the STCS tracker if any
L0 sstable is being added or removed from the table.

This may be fixing a quadratic behavior during boot or refresh,
as new sstables are loaded one by one.
Higher levels have a substantially higher number of sstables;
therefore, updating the STCS tracker only when level 0 changes
significantly reduces the number of times the L0 backlog is recomputed.

Refs #12499.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #12676

(cherry picked from commit 1b2140e416)
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-02-14 12:14:27 -03:00
Raphael S. Carvalho
f565f3de06 table: Fix quadratic behavior when inserting sstables into tracker on schema change
Each time the backlog tracker is informed about a new or old sstable, it
will recompute the static part of the backlog, whose complexity is
proportional to the total number of sstables.
On schema change, we're calling backlog_tracker::replace_sstables()
for each existing sstable, therefore it produces O(N ^ 2) complexity.

Fixes #12499.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #12593

(cherry picked from commit 87ee547120)
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
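The complexity difference can be shown with a counting model (illustrative Python, not tracker code): recomputing after every insert does quadratic work, while one recompute over the final set is linear.

```python
def insert_one_by_one(n):
    # the recompute after each insert scans every sstable tracked so far
    work, tracked = 0, 0
    for _ in range(n):
        tracked += 1
        work += tracked
    return work

def insert_in_bulk(n):
    # one recompute over the final set
    return n

assert insert_one_by_one(100) == 100 * 101 // 2   # quadratic: 5050
assert insert_in_bulk(100) == 100                 # linear
```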
2023-02-14 12:14:21 -03:00
Anna Stuchlik
76ff6d981c doc: related https://github.com/scylladb/scylladb/issues/12754, add the requirement to upgrade Monitoring to version 4.3
Closes #12784

(cherry picked from commit c7778dd30b)
2023-02-10 10:28:35 +02:00
Botond Dénes
f924f59055 Merge 'Backport test.py improvements to 5.2' from Kamil Braun
Backport the following improvements for test.py efficiency and user experience:
- https://github.com/scylladb/scylladb/pull/12542
- https://github.com/scylladb/scylladb/pull/12560
- https://github.com/scylladb/scylladb/pull/12564
- https://github.com/scylladb/scylladb/pull/12563
- https://github.com/scylladb/scylladb/pull/12588
- https://github.com/scylladb/scylladb/pull/12613
- https://github.com/scylladb/scylladb/pull/12569
- https://github.com/scylladb/scylladb/pull/12612
- https://github.com/scylladb/scylladb/pull/12549
- https://github.com/scylladb/scylladb/pull/12678

Fixes #12617

Closes #12770

* github.com:scylladb/scylladb:
  test/pylib: put UNIX-domain socket in /tmp
  Merge 'test/pylib: scylla_cluster: ensure there's space in the cluster pool when running a sequence of tests' from Kamil Braun
  Merge 'test.py: manual cluster pool handling for Python suite' from Alecco
  Merge 'test.py: handle broken clusters for Python suite' from Alecco
  test/pylib: scylla_cluster: don't leak server if stopping it fails
  Merge 'test/pylib: scylla_cluster: improve server startup check' from Kamil Braun
  test/pylib: scylla_cluster: return error details from test framework endpoints
  test/pylib: scylla_cluster: release cluster IPs when stopping ScyllaClusterManager
  test/pylib: scylla_cluster: mark cluster as dirty if it fails to boot
  test: disable commitlog O_DSYNC, preallocation
2023-02-08 15:09:09 +02:00
Nadav Har'El
d5cef05810 test/pylib: put UNIX-domain socket in /tmp
The "cluster manager" used by the topology test suite uses a UNIX-domain
socket to communicate between the cluster manager and the individual tests.
The socket is currently located in the test directory but there is a
problem: In Linux the length of the path used as a UNIX-domain socket
address is limited to just a little over 100 bytes. In Jenkins run, the
test directory names are very long, and we sometimes go over this length
limit and the result is that test.py fails creating this socket.

In this patch we simply put the socket in /tmp instead of the test
directory. We only need to do this change in one place - the cluster
manager, as it already passes the socket path to the individual tests
(using the "--manager-api" option).

Tested by cloning Scylla in a very long directory name.
A test like ./test.py --mode=dev test_concurrent_schema fails before
this patch, and passes with it.

Fixes #12622

Closes #12678

(cherry picked from commit 681a066923)
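A Python sketch of the constraint (file names are illustrative): sun_path on Linux is 108 bytes, and a socket created under /tmp stays comfortably below that limit.

```python
import os
import socket
import tempfile

# sockaddr_un.sun_path on Linux is 108 bytes; a socket nested in a deep
# Jenkins workspace can exceed it, while /tmp stays well under the limit.
sock_dir = tempfile.mkdtemp(dir="/tmp")
sock_path = os.path.join(sock_dir, "manager.sock")
assert len(sock_path.encode()) < 108

srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
srv.bind(sock_path)     # would raise OSError for an over-long path
srv.close()
os.remove(sock_path)
os.rmdir(sock_dir)
```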
2023-02-07 17:12:14 +01:00
Nadav Har'El
e0f4e99e9b Merge 'test/pylib: scylla_cluster: ensure there's space in the cluster pool when running a sequence of tests' from Kamil Braun
`ScyllaClusterManager` is used to run a sequence of test cases from
a single test file. Between two consecutive tests, if the previous test
left the cluster 'dirty', meaning the cluster cannot be reused, it would
free up space in the pool (using `steal`), stop the cluster, then get a
new cluster from the pool.

Between the `steal` and the `get`, a concurrent test run (with its own
instance of `ScyllaClusterManager`) would start, because there was free
space in the pool.

This resulted in undesirable behavior when we ran tests with
`--repeat X` for a large `X`: we would start with e.g. 4 concurrent
runs of a test file, because the pool size was 4. As soon as one of the
runs freed up space in the pool, we would start another concurrent run.
Soon we'd end up with 8 concurrent runs. Then 16 concurrent runs. And so
on. We would have a large number of concurrent runs, even though the
original 4 runs didn't finish yet. All of these concurrent runs would
compete waiting on the pool, and waiting for space in the pool would
take longer and longer (the duration is linear w.r.t number of
concurrent competing runs). Tests would then time out because they would
have to wait too long.

Fix that by using the new `replace_dirty` function introduced to the
pool. This function frees up space by returning a dirty cluster and then
immediately takes it away to be used for a new cluster. Thanks to this,
we will only have at most as many concurrent runs as the pool size. For
example with --repeat 8 and pool size 4, we would run 4 concurrent runs
and start the 5th run only when one of the original 4 runs finishes,
then the 6th run when a second run finishes and so on.

The fix is preceded by a refactor that replaces `steal` with `put(is_dirty=True)`
and a `destroy` function passed to the pool (now the pool is responsible
for stopping the cluster and releasing its IPs).

Fixes #11757

Closes #12549

* github.com:scylladb/scylladb:
  test/pylib: scylla_cluster: ensure there's space in the cluster pool when running a sequence of tests
  test/pylib: pool: introduce `replace_dirty`
  test/pylib: pool: replace `steal` with `put(is_dirty=True)`

(cherry picked from commit 132af20057)
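A toy pool showing why replace_dirty bounds concurrency (illustrative, not the pylib Pool API): the dirty slot is swapped for a fresh cluster without ever passing through the free list.

```python
class ClusterPool:
    def __init__(self, size):
        self.free = size       # slots available for new clusters
        self.next_id = 0

    def _fresh(self):
        self.next_id += 1
        return f"cluster-{self.next_id}"

    def get(self):
        assert self.free > 0, "would exceed the pool size"
        self.free -= 1
        return self._fresh()

    def replace_dirty(self, dirty):
        # destroy `dirty` and immediately reuse its slot: there is no
        # window in which another runner can grab the freed capacity
        return self._fresh()

pool = ClusterPool(size=2)
a, b = pool.get(), pool.get()
c = pool.replace_dirty(a)      # still only two live clusters
assert pool.free == 0
```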
2023-02-07 17:08:17 +01:00
Kamil Braun
6795715011 Merge 'test.py: manual cluster pool handling for Python suite' from Alecco
From reviews of https://github.com/scylladb/scylladb/pull/12569, avoid
using `async with` and access the `Pool` of clusters with
`get()`/`put()`.

Closes #12612

* github.com:scylladb/scylladb:
  test.py: manual cluster handling for PythonSuite
  test.py: stop cluster if PythonSuite fails to start
  test.py: minor fix for failed PythonSuite test

(cherry picked from commit 5bc7f0732e)
2023-02-07 17:07:43 +01:00
Nadav Har'El
aa9e91c376 Merge 'test.py: handle broken clusters for Python suite' from Alecco
If the after-test check fails (is_after_test_ok is False), discard the cluster and raise an exception so the context manager (pool) does not recycle it.

Ignore exception re-raised by the context manager.

Fixes #12360

Closes #12569

* github.com:scylladb/scylladb:
  test.py: handle broken clusters for Python suite
  test.py: Pool discard method

(cherry picked from commit 54f174a1f4)
2023-02-07 17:07:36 +01:00
Kamil Braun
ddfb9ebab2 test/pylib: scylla_cluster: don't leak server if stopping it fails
`ScyllaCluster.server_stop` had this piece of code:
```
        server = self.running.pop(server_id)
        if gracefully:
            await server.stop_gracefully()
        else:
            await server.stop()
        self.stopped[server_id] = server
```

We observed `stop_gracefully()` failing due to a server hanging during
shutdown. We then ended up in a state where neither `self.running` nor
`self.stopped` had this server. Later, when releasing the cluster and
its IPs, we would release that server's IP - but the server might have
still been running (all servers in `self.running` are killed before
releasing IPs, but this one wasn't in `self.running`).

Fix this by popping the server from `self.running` only after
`stop_gracefully`/`stop` finishes.

Make an analogous fix in `server_start`: put `server` into
`self.running` *before* we actually start it. If the start fails, the
server will be considered "running" even though it isn't necessarily,
but that is OK - if it isn't running, then trying to stop it later will
simply do nothing; if it is actually running, we will kill it (which we
should do) when clearing after the cluster; and we don't leak it.

Closes #12613

(cherry picked from commit a0ff33e777)
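The ordering fix can be sketched as follows (hypothetical toy classes, not the pylib API): the server leaves `running` only after the stop call succeeds.

```python
import asyncio

class Server:
    async def stop_gracefully(self):
        raise RuntimeError("hung during shutdown")

class Cluster:
    def __init__(self):
        self.running = {}
        self.stopped = {}

    async def server_stop(self, server_id):
        server = self.running[server_id]   # look up, but don't pop yet
        await server.stop_gracefully()     # may raise
        # move between buckets only after the stop really succeeded
        self.stopped[server_id] = self.running.pop(server_id)

cluster = Cluster()
cluster.running[1] = Server()
try:
    asyncio.run(cluster.server_stop(1))
except RuntimeError:
    pass
assert 1 in cluster.running    # a failed stop does not leak the server
```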
2023-02-07 17:05:20 +01:00
Nadav Har'El
d58a3e4d16 Merge 'test/pylib: scylla_cluster: improve server startup check' from Kamil Braun
Don't use a range scan, which is very inefficient, to perform a query for checking CQL availability.

Improve logging when waiting for server startup times out. Provide details about the failure: whether we managed to obtain the Host ID of the server and whether we managed to establish a CQL connection.

Closes #12588

* github.com:scylladb/scylladb:
  test/pylib: scylla_cluster: better logging for timeout on server startup
  test/pylib: scylla_cluster: use less expensive query to check for CQL availability

(cherry picked from commit ccc2c6b5dd)
2023-02-07 17:05:02 +01:00
Kamil Braun
2ebac52d2d test/pylib: scylla_cluster: return error details from test framework endpoints
If an endpoint handler throws an exception, the details of the exception
are not returned to the client. Normally this is desirable so that
information is not leaked, but in this test framework we do want to
return the details to the client so it can log a useful error message.

Do it by wrapping every handler into a catch clause that returns
the exception message.
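
The wrapping described above can be sketched as a decorator (illustrative only; the real framework's handlers are async HTTP endpoints, and the message becomes the 500 response body):

```python
import functools

def return_exception_body(handler):
    """Wrap an endpoint handler so that, instead of a bare 500,
    the exception message is returned as the response body."""
    @functools.wraps(handler)
    def wrapped(*args, **kwargs):
        try:
            return 200, handler(*args, **kwargs)
        except Exception as e:
            # In the test framework we deliberately leak the details to
            # the client so it can log a useful error message.
            return 500, str(e)
    return wrapped

@return_exception_body
def before_test(name):
    # Hypothetical failing handler, mirroring the "After:" example above.
    raise RuntimeError(f"Failed to start server for test {name}")
```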

Also modify a bit how HTTPErrors are rendered so it's easier to discern
the actual body of the error from other details (such as the params used
to make the request etc.)

Before:
```
E test.pylib.rest_client.HTTPError: HTTP error 500: 500 Internal Server Error
E
E Server got itself in trouble, params None, json None, uri http+unix://api/cluster/before-test/test_stuff
```

After:
```
E test.pylib.rest_client.HTTPError: HTTP error 500, uri: http+unix://api/cluster/before-test/test_stuff, params: None, json: None, body:
E Failed to start server at host 127.155.129.1.
E Check the log files:
E /home/kbraun/dev/scylladb/testlog/test.py.dev.log
E /home/kbraun/dev/scylladb/testlog/dev/scylla-1.log
```

Closes #12563

(cherry picked from commit 2f84e820fd)
2023-02-07 17:04:37 +01:00
Kamil Braun
b536614913 test/pylib: scylla_cluster: release cluster IPs when stopping ScyllaClusterManager
When we obtained a new cluster for a test case after the previous test
case left a dirty cluster, we would release the old cluster's used IP
addresses (`_before_test` function). However, we would not release the
last cluster's IPs after the last test case. We would run out of IPs with
sufficiently many test files or `--repeat` runs. Fix this.

Also reorder the operations a bit: stop the cluster (and release its
IPs) before freeing up space in the cluster pool (i.e. call
`self.cluster.stop()` before `self.clusters.steal()`). This reduces
concurrency a bit - fewer Scyllas running at the same time, which is
good (the pool size gives a limit on the desired max number of
concurrently running clusters). Killing a cluster is quick so it won't
make a significant difference for the next guy waiting on the pool.

Closes #12564

(cherry picked from commit 3ed3966f13)
2023-02-07 17:04:19 +01:00
Kamil Braun
85df0fd2b1 test/pylib: scylla_cluster: mark cluster as dirty if it fails to boot
If a cluster fails to boot, it saves the exception in
`self.start_exception` variable; the exception will be rethrown when
a test tries to start using this cluster. As explained in `before_test`:
```
    def before_test(self, name) -> None:
        """Check that  the cluster is ready for a test. If
        there was a start error, throw it here - the server is
        running when it's added to the pool, which can't be attributed
        to any specific test, throwing it here would stop a specific
        test."""
```
It's arguable whether we should blame some random test for a failure
that it didn't cause, but nevertheless, there's a problem here: the
`start_exception` will be rethrown and the test will fail, but then the
cluster will be simply returned to the pool and the next test will
attempt to use it... and so on.

Prevent this by marking the cluster as dirty the first time we rethrow
the exception.
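
The dirty-marking can be sketched like this (hypothetical names, not the framework's actual classes):

```python
class PooledCluster:
    """A cluster handed out by the pool. If it failed to boot, the saved
    exception is rethrown to the first test that tries to use it, and the
    cluster is marked dirty so it is recycled instead of being returned
    to the pool in a usable-looking state."""

    def __init__(self, start_exception=None):
        self.start_exception = start_exception
        self.is_dirty = False

    def before_test(self, name):
        if self.start_exception is not None:
            # Mark dirty *before* rethrowing, so the next test cannot
            # receive this broken cluster from the pool again.
            self.is_dirty = True
            raise self.start_exception
```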

Closes #12560

(cherry picked from commit 147dd73996)
2023-02-07 17:03:56 +01:00
Avi Kivity
cdf9fe7023 test: disable commitlog O_DSYNC, preallocation
Commitlog O_DSYNC is intended to make Raft and schema writes durable
in the face of power loss. To make O_DSYNC performant, we preallocate
the commitlog segments, so that the commitlog writes only change file
data and not file metadata (which would require the filesystem to commit
its own log).

However, in tests, this causes each ScyllaDB instance to write 384MB
of commitlog segments. This overloads the disks and slows everything
down.

Fix this by disabling O_DSYNC (and therefore preallocation) during
the tests. They can't survive power loss, and run with
--unsafe-bypass-fsync anyway.

Closes #12542

(cherry picked from commit 9029b8dead)
2023-02-07 17:02:59 +01:00
Beni Peled
8ff4717fd0 release: prepare for 5.2.0-rc1 2023-02-06 22:13:53 +02:00
Kamil Braun
291b1f6e7f service/raft: raft_group0: prevent double abort
There was a small chance that we called `timeout_src.request_abort()`
twice in the `with_timeout` function, first by timeout and then by
shutdown. `abort_source` fails on an assertion in this case. Fix this.
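
The fix amounts to making the abort request idempotent across the two completion paths. A rough sketch of the pattern (Seastar's actual `abort_source` is C++ and single-sharded; the names here are illustrative):

```python
import threading

class AbortOnce:
    """Two completion paths (timeout and shutdown) may both try to abort
    the same abort source; the second request must be a no-op instead of
    tripping an assertion."""

    def __init__(self, abort_source):
        self._abort_source = abort_source  # callable performing the abort
        self._lock = threading.Lock()
        self._aborted = False

    def request_abort(self):
        with self._lock:
            if self._aborted:
                # Second caller (whichever of timeout/shutdown lost the
                # race) does nothing.
                return False
            self._aborted = True
        self._abort_source()
        return True
```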

Fixes: #12512

Closes #12514

(cherry picked from commit 54170749b8)
2023-02-05 18:31:50 +02:00
Kefu Chai
b2699743cc db: system_keyspace: take the reserved_memory into account
Before this change, we returned the total memory managed by Seastar
in the "total" field of system.memory, but this value only reflects
the memory managed by Seastar's allocator. If
`reserve_additional_memory` is set when starting app_template,
Seastar's memory subsystem reserves a chunk of memory of the
specified size for the system and takes the remaining memory. Since
f05d612da8, we set this value to 50MB for the wasmtime runtime. Hence
the test `TestRuntimeInfoTable.test_default_content` in dtest
fails: it expects the size passed via the `--memory` option
to be identical to the value reported by system.memory's
"total" field.

After this change, the "total" field takes the memory reserved
for wasm UDFs into account. The "total" field should reflect the total
size of memory used by Scylla, no matter how we use a certain portion
of the allocated memory.
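
The arithmetic of the fix, as a sketch (the 50MB constant matches the commit message; the function name is hypothetical):

```python
# Memory reserved for the wasm UDF runtime before Seastar's allocator
# takes the rest (mirrors wasm_udf_reserved_memory in db/config).
WASM_UDF_RESERVED_MEMORY = 50 * 1024 * 1024

def reported_total(allocator_total: int) -> int:
    """The "total" field should match what was passed via --memory:
    the allocator's memory plus the reserved chunk. Before the fix,
    only allocator_total was reported, so a node started with
    --memory 1G reported 1G minus 50MB."""
    return allocator_total + WASM_UDF_RESERVED_MEMORY
```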

Fixes #12522
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12573

(cherry picked from commit 4a0134a097)
2023-02-05 18:30:05 +02:00
Botond Dénes
50ae73a4bd types: is_tuple(): handle reverse types
Currently reverse types match the default case (false), even though they
might be wrapping a tuple type. One user-visible effect of this is that
a schema, which has a reversed<frozen<UDT>> clustering key component,
will have this component incorrectly represented in the schema cql dump:
the UDT will lose the frozen attribute. When attempting to recreate
the schema from the dump, it will fail, as only frozen UDTs are
allowed in primary key components.
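
The type-matching fix can be sketched as follows (class names are illustrative, not Scylla's actual type hierarchy):

```python
class Type:
    """Base of a toy type hierarchy."""

class TupleType(Type):
    """Stands in for tuple/frozen<UDT> types."""

class ReversedType(Type):
    """Wraps the underlying type of a clustering column declared with
    a descending order."""
    def __init__(self, underlying: Type):
        self.underlying = underlying

def is_tuple(t: Type) -> bool:
    # Unwrap reversed types instead of letting them fall through to the
    # default (False) case, so reversed<frozen<UDT>> is still recognized.
    if isinstance(t, ReversedType):
        return is_tuple(t.underlying)
    return isinstance(t, TupleType)
```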

Fixes: #12576

Closes #12579

(cherry picked from commit ebc100f74f)
2023-02-05 18:20:21 +02:00
Calle Wilund
c3dd4a2b87 alternator::streams: Sort tables in list_streams to ensure no duplicates
Fixes #12601 (maybe?)

Sort the set of tables on ID. This should ensure we never
generate duplicates in a paged listing here. Can obviously miss things if they
are added between paged calls and end up with a "smaller" UUID/ARN, but that
is to be expected.

(cherry picked from commit da8adb4d26)
2023-02-05 17:44:00 +02:00
Benny Halevy
0f9fe61d91 view: row_lock: lock_ck: find or construct row_lock under partition lock
Since we're potentially searching the row_lock in parallel to acquiring
the read_lock on the partition, we're racing with row_locker::unlock
that may erase the _row_locks entry for the same clustering key, since
there is no lock to protect it up until the partition lock has been
acquired and the lock_partition future is resolved.

This change moves the code to search for or allocate the row lock
_after_ the partition lock has been acquired to make sure we're
synchronously starting the read/write lock function on it, without
yielding, to prevent this use-after-free.

This adds an allocation for copying the clustering key in advance
even if a row_lock entry already exists, that wasn't needed before.
It only us slows down (a bit) when there is contention and the lock
already existed when we want to go locking. In the fast path there
is no contention and then the code already had to create the lock
and copy the key. In any case, the penalty of copying the key once
is tiny compared to the rest of the work that view updates are doing.
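
The reordering can be sketched with ordinary locks (illustrative only; the real `row_locker` uses coroutine-friendly read/write locks, so this shows only the shape of the fix):

```python
import threading

class RowLockerSketch:
    """Find-or-construct of the per-row lock happens only while the
    partition lock is held, so a concurrent unlock() cannot erase the
    map entry between the lookup and our use of it."""

    def __init__(self):
        self._partition_lock = threading.Lock()
        self._row_locks = {}

    def lock_ck(self, ck):
        key = bytes(ck)  # copy the key up front (the extra allocation)
        with self._partition_lock:
            # Safe: the entry cannot be erased while we hold the
            # partition lock, and we take our own reference to the lock.
            lock = self._row_locks.setdefault(key, threading.Lock())
        lock.acquire()
        return key, lock

    def unlock(self, key, lock):
        with self._partition_lock:
            lock.release()
            if not lock.locked():
                # Erase the entry; lookups also run under the partition
                # lock now, so this no longer races with lock_ck().
                self._row_locks.pop(key, None)
```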

This is required on top of 5007ded2c1 as
seen in https://github.com/scylladb/scylladb/issues/12632
which is closely related to #12168 but demonstrates a different race
causing use-after-free.

Fixes #12632

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 4b5e324ecb)
2023-02-05 17:22:31 +02:00
Anna Stuchlik
59d30ff241 docs: fixes https://github.com/scylladb/scylladb/issues/12654, update the links to the Download Center
Closes #12655

(cherry picked from commit 64cc4c8515)
2023-02-05 17:19:56 +02:00
Anna Stuchlik
fb82dff89e doc: fixes https://github.com/scylladb/scylladb/issues/12672, fix the redirects to the Cloud docs
Closes #12673

(cherry picked from commit 2be131da83)
2023-02-05 17:17:35 +02:00
Kefu Chai
b588b19620 cql3/selection: construct string_view using char* not size
Before this change, we constructed an sstring from a comma expression,
which evaluates to the return value of `name.size()`, but what we
expected was `sstring(const char*, size_t)`.

In this change:

* instead of passing the size of the string_view,
  both its address and size are used
* `std::string_view` is constructed instead of sstring, for better
  performance, as we don't need to perform a deep copy

The issue is reported by GCC-13:

```
In file included from cql3/selection/selectable.cc:11:
cql3/selection/field_selector.hh:83:60: error: ignoring return value of function declared with 'nodiscard' attribute [-Werror,-Wunused-result]
        auto sname = sstring(reinterpret_cast<const char*>(name.begin(), name.size()));
                                                           ^~~~~~~~~~
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12666

(cherry picked from commit 186ceea009)

Fixes #12739.
2023-02-05 13:50:48 +02:00
Michał Chojnowski
608ef92a71 commitlog: fix total_size_on_disk accounting after segment file removal
Currently, segment file removal first calls `f.remove_file()` and
does `total_size_on_disk -= f.known_size()` later.
However, `remove_file()` resets `known_size` to 0, so in effect
the freed space is not accounted for.

`total_size_on_disk` is not just a metric. It is also responsible
for deciding whether a segment should be recycled -- it is recycled
only if `total_size_on_disk - known_size < max_disk_size`.
Therefore this bug has dire performance consequences:
if `total_size_on_disk - known_size` ever exceeds `max_disk_size`,
the recycling of commitlog segments will stop permanently, because
`total_size_on_disk - known_size` will never go back below
`max_disk_size` due to the accounting bug. All new segments from this
point will be allocated from scratch.
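
The accounting bug in miniature (illustrative names): the file's size must be captured *before* `remove_file()` resets `known_size` to 0, otherwise the subtraction removes nothing and `total_size_on_disk` only ever grows.

```python
class SegmentFile:
    def __init__(self, size):
        self.known_size = size

    def remove_file(self):
        # Mirrors the real behaviour: removal resets known_size to 0.
        self.known_size = 0

def delete_segment(f, totals):
    size = f.known_size              # remember the size first (the fix)
    f.remove_file()
    # Buggy order would read f.known_size here and subtract 0.
    totals["total_size_on_disk"] -= size
```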

The bug was uncovered by a QA performance test. It isn't easy to trigger --
it took the test 7 hours of constant high load to step into it.
However, the fact that the effect is permanent, and degrades the
performance of the cluster silently, makes the bug potentially quite severe.

The bug can be easily spotted with Prometheus as infinitely rising
`commitlog_total_size_on_disk` on the affected shards.

Fixes #12645

Closes #12646

(cherry picked from commit fa7e904cd6)
2023-02-01 21:54:37 +02:00
Kamil Braun
d2732b2663 Merge 'Enable Raft by default in new clusters' from Kamil Braun
New clusters that use a fresh conf/scylla.yaml will have `consistent_cluster_management: true`, which will enable Raft, unless the user explicitly turns it off before booting the cluster.

People using existing yaml files will continue without Raft, unless consistent_cluster_management is explicitly requested during/after upgrade.

Also update the docs: cluster creation and node addition procedures.

Fixes #12572.

Closes #12585

* github.com:scylladb/scylladb:
  docs: mention `consistent_cluster_management` for creating cluster and adding node procedures
  conf: enable `consistent_cluster_management` by default

(cherry picked from commit 5c886e59de)
2023-01-26 12:21:55 +01:00
Anna Mikhlin
34ab98e1be release: prepare for 5.2.0-rc0 2023-01-18 14:54:36 +02:00
218 changed files with 2856 additions and 4686 deletions

2
.gitmodules vendored
View File

@@ -1,6 +1,6 @@
[submodule "seastar"]
path = seastar
url = ../seastar
url = ../scylla-seastar
ignore = dirty
[submodule "swagger-ui"]
path = swagger-ui

View File

@@ -72,7 +72,7 @@ fi
# Default scylla product/version tags
PRODUCT=scylla
VERSION=5.2.0-dev
VERSION=5.2.1
if test -f version
then

View File

@@ -145,19 +145,24 @@ future<alternator::executor::request_return_type> alternator::executor::list_str
auto table = find_table(_proxy, request);
auto db = _proxy.data_dictionary();
auto cfs = db.get_tables();
auto i = cfs.begin();
auto e = cfs.end();
if (limit < 1) {
throw api_error::validation("Limit must be 1 or more");
}
// TODO: the unordered_map here is not really well suited for partial
// querying - we're sorting on local hash order, and creating a table
// between queries may or may not miss info. But that should be rare,
// and we can probably expect this to be a single call.
// # 12601 (maybe?) - sort the set of tables on ID. This should ensure we never
// generate duplicates in a paged listing here. Can obviously miss things if they
// are added between paged calls and end up with a "smaller" UUID/ARN, but that
// is to be expected.
std::sort(cfs.begin(), cfs.end(), [](const data_dictionary::table& t1, const data_dictionary::table& t2) {
return t1.schema()->id().uuid() < t2.schema()->id().uuid();
});
auto i = cfs.begin();
auto e = cfs.end();
if (streams_start) {
i = std::find_if(i, e, [&](data_dictionary::table t) {
i = std::find_if(i, e, [&](const data_dictionary::table& t) {
return t.schema()->id().uuid() == streams_start
&& cdc::get_base_table(db.real_database(), *t.schema())
&& is_alternator_keyspace(t.schema()->ks_name())

View File

@@ -647,6 +647,7 @@ sstables::compaction_stopped_exception compaction_manager::task::make_compaction
compaction_manager::compaction_manager(config cfg, abort_source& as)
: _cfg(std::move(cfg))
, _compaction_submission_timer(compaction_sg().cpu, compaction_submission_callback())
, _compaction_controller(make_compaction_controller(compaction_sg(), static_shares(), [this] () -> float {
_last_backlog = backlog();
auto b = _last_backlog / available_memory();
@@ -681,6 +682,7 @@ compaction_manager::compaction_manager(config cfg, abort_source& as)
compaction_manager::compaction_manager()
: _cfg(config{ .available_memory = 1 })
, _compaction_submission_timer(compaction_sg().cpu, compaction_submission_callback())
, _compaction_controller(make_compaction_controller(compaction_sg(), 1, [] () -> float { return 1.0; }))
, _backlog_manager(_compaction_controller)
, _throughput_updater(serialized_action([this] { return update_throughput(throughput_mbs()); }))
@@ -738,7 +740,7 @@ void compaction_manager::register_metrics() {
void compaction_manager::enable() {
assert(_state == state::none || _state == state::disabled);
_state = state::enabled;
_compaction_submission_timer.arm(periodic_compaction_submission_interval());
_compaction_submission_timer.arm_periodic(periodic_compaction_submission_interval());
_waiting_reevalution = postponed_compactions_reevaluation();
}

View File

@@ -296,10 +296,10 @@ private:
std::function<void()> compaction_submission_callback();
// all registered tables are reevaluated at a constant interval.
// Submission is a NO-OP when there's nothing to do, so it's fine to call it regularly.
timer<lowres_clock> _compaction_submission_timer = timer<lowres_clock>(compaction_submission_callback());
static constexpr std::chrono::seconds periodic_compaction_submission_interval() { return std::chrono::seconds(3600); }
config _cfg;
timer<lowres_clock> _compaction_submission_timer;
compaction_controller _compaction_controller;
compaction_backlog_manager _backlog_manager;
optimized_optional<abort_source::subscription> _early_abort_subscription;

View File

@@ -409,7 +409,9 @@ public:
l0_old_ssts.push_back(std::move(sst));
}
}
_l0_scts.replace_sstables(std::move(l0_old_ssts), std::move(l0_new_ssts));
if (l0_old_ssts.size() || l0_new_ssts.size()) {
_l0_scts.replace_sstables(std::move(l0_old_ssts), std::move(l0_new_ssts));
}
}
};

View File

@@ -553,4 +553,16 @@ murmur3_partitioner_ignore_msb_bits: 12
# WARNING: It's unsafe to set this to false if the node previously booted
# with the schema commit log enabled. In such case, some schema changes
# may be lost if the node was not cleanly stopped.
force_schema_commit_log: true
force_schema_commit_log: true
# Use Raft to consistently manage schema information in the cluster.
# Refer to https://docs.scylladb.com/master/architecture/raft.html for more details.
# The 'Handling Failures' section is especially important.
#
# Once enabled in a cluster, this cannot be turned off.
# If you want to bootstrap a new cluster without Raft, make sure to set this to `false`
# before starting your nodes for the first time.
#
# A cluster not using Raft can be 'upgraded' to use Raft. Refer to the aforementioned
# documentation, section 'Enabling Raft in ScyllaDB 5.2 and further', for the procedure.
consistent_cluster_management: true

View File

@@ -10,6 +10,7 @@
#include "cql3/attributes.hh"
#include "cql3/column_identifier.hh"
#include <optional>
namespace cql3 {
@@ -55,9 +56,9 @@ int64_t attributes::get_timestamp(int64_t now, const query_options& options) {
}
}
int32_t attributes::get_time_to_live(const query_options& options) {
std::optional<int32_t> attributes::get_time_to_live(const query_options& options) {
if (!_time_to_live.has_value() || _time_to_live_unset_guard.is_unset(options))
return 0;
return std::nullopt;
cql3::raw_value tval = expr::evaluate(*_time_to_live, options);
if (tval.is_null()) {

View File

@@ -45,7 +45,7 @@ public:
int64_t get_timestamp(int64_t now, const query_options& options);
int32_t get_time_to_live(const query_options& options);
std::optional<int32_t> get_time_to_live(const query_options& options);
db::timeout_clock::duration get_timeout(const query_options& options) const;

View File

@@ -1416,7 +1416,7 @@ expression search_and_replace(const expression& e,
};
},
[&] (const binary_operator& oper) -> expression {
return binary_operator(recurse(oper.lhs), oper.op, recurse(oper.rhs));
return binary_operator(recurse(oper.lhs), oper.op, recurse(oper.rhs), oper.order);
},
[&] (const column_mutation_attribute& cma) -> expression {
return column_mutation_attribute{cma.kind, recurse(cma.column)};

View File

@@ -13,6 +13,7 @@
#include "cql3/lists.hh"
#include "cql3/constants.hh"
#include "cql3/user_types.hh"
#include "cql3/ut_name.hh"
#include "cql3/type_json.hh"
#include "cql3/functions/user_function.hh"
#include "cql3/functions/user_aggregate.hh"
@@ -52,6 +53,13 @@ bool abstract_function::requires_thread() const { return false; }
bool as_json_function::requires_thread() const { return false; }
static bool same_signature(const shared_ptr<function>& f1, const shared_ptr<function>& f2) {
if (f1 == nullptr || f2 == nullptr) {
return false;
}
return f1->name() == f2->name() && f1->arg_types() == f2->arg_types();
}
thread_local std::unordered_multimap<function_name, shared_ptr<function>> functions::_declared = init();
void functions::clear_functions() noexcept {
@@ -143,22 +151,56 @@ void functions::replace_function(shared_ptr<function> func) {
with_udf_iter(func->name(), func->arg_types(), [func] (functions::declared_t::iterator i) {
i->second = std::move(func);
});
auto scalar_func = dynamic_pointer_cast<scalar_function>(func);
if (!scalar_func) {
return;
}
for (auto& fit : _declared) {
auto aggregate = dynamic_pointer_cast<user_aggregate>(fit.second);
if (aggregate && (same_signature(aggregate->sfunc(), scalar_func)
|| (same_signature(aggregate->finalfunc(), scalar_func))
|| (same_signature(aggregate->reducefunc(), scalar_func))))
{
// we need to replace at least one underlying function
shared_ptr<scalar_function> sfunc = same_signature(aggregate->sfunc(), scalar_func) ? scalar_func : aggregate->sfunc();
shared_ptr<scalar_function> finalfunc = same_signature(aggregate->finalfunc(), scalar_func) ? scalar_func : aggregate->finalfunc();
shared_ptr<scalar_function> reducefunc = same_signature(aggregate->reducefunc(), scalar_func) ? scalar_func : aggregate->reducefunc();
fit.second = ::make_shared<user_aggregate>(aggregate->name(), aggregate->initcond(), sfunc, reducefunc, finalfunc);
}
}
}
void functions::remove_function(const function_name& name, const std::vector<data_type>& arg_types) {
with_udf_iter(name, arg_types, [] (functions::declared_t::iterator i) { _declared.erase(i); });
}
std::optional<function_name> functions::used_by_user_aggregate(const function_name& name) {
std::optional<function_name> functions::used_by_user_aggregate(shared_ptr<user_function> func) {
for (const shared_ptr<function>& fptr : _declared | boost::adaptors::map_values) {
auto aggregate = dynamic_pointer_cast<user_aggregate>(fptr);
if (aggregate && (aggregate->sfunc().name() == name || (aggregate->has_finalfunc() && aggregate->finalfunc().name() == name))) {
if (aggregate && (same_signature(aggregate->sfunc(), func)
|| (same_signature(aggregate->finalfunc(), func))
|| (same_signature(aggregate->reducefunc(), func))))
{
return aggregate->name();
}
}
return {};
}
std::optional<function_name> functions::used_by_user_function(const ut_name& user_type) {
for (const shared_ptr<function>& fptr : _declared | boost::adaptors::map_values) {
for (auto& arg_type : fptr->arg_types()) {
if (arg_type->references_user_type(user_type.get_keyspace(), user_type.get_user_type_name())) {
return fptr->name();
}
}
if (fptr->return_type()->references_user_type(user_type.get_keyspace(), user_type.get_user_type_name())) {
return fptr->name();
}
}
return {};
}
lw_shared_ptr<column_specification>
functions::make_arg_spec(const sstring& receiver_ks, const sstring& receiver_cf,
const function& fun, size_t i) {

View File

@@ -71,7 +71,8 @@ public:
static void add_function(shared_ptr<function>);
static void replace_function(shared_ptr<function>);
static void remove_function(const function_name& name, const std::vector<data_type>& arg_types);
static std::optional<function_name> used_by_user_aggregate(const function_name& name);
static std::optional<function_name> used_by_user_aggregate(shared_ptr<user_function>);
static std::optional<function_name> used_by_user_function(const ut_name& user_type);
private:
template <typename F>
static void with_udf_iter(const function_name& name, const std::vector<data_type>& arg_types, F&& f);

View File

@@ -37,14 +37,14 @@ public:
virtual sstring element_type() const override { return "aggregate"; }
virtual std::ostream& describe(std::ostream& os) const override;
const scalar_function& sfunc() const {
return *_sfunc;
seastar::shared_ptr<scalar_function> sfunc() const {
return _sfunc;
}
const scalar_function& reducefunc() const {
return *_reducefunc;
seastar::shared_ptr<scalar_function> reducefunc() const {
return _reducefunc;
}
const scalar_function& finalfunc() const {
return *_finalfunc;
seastar::shared_ptr<scalar_function> finalfunc() const {
return _finalfunc;
}
const bytes_opt& initcond() const {
return _initcond;

View File

@@ -135,12 +135,21 @@ void query_options::prepare(const std::vector<lw_shared_ptr<column_specification
ordered_values.reserve(specs.size());
for (auto&& spec : specs) {
auto& spec_name = spec->name->text();
bool found_value_for_name = false;
for (size_t j = 0; j < names.size(); j++) {
if (names[j] == spec_name) {
ordered_values.emplace_back(_value_views[j]);
found_value_for_name = true;
break;
}
}
// No bound value was found with the name `spec_name`.
// This means that the user forgot to include a bound value with such name.
if (!found_value_for_name) {
throw exceptions::invalid_request_exception(
format("Missing value for bind marker with name: {}", spec_name));
}
}
_value_views = std::move(ordered_values);
}

View File

@@ -22,6 +22,7 @@
#include "db/config.hh"
#include "data_dictionary/data_dictionary.hh"
#include "hashers.hh"
#include "utils/error_injection.hh"
namespace cql3 {
@@ -600,6 +601,14 @@ query_processor::get_statement(const sstring_view& query, const service::client_
std::unique_ptr<raw::parsed_statement>
query_processor::parse_statement(const sstring_view& query) {
try {
{
const char* error_injection_key = "query_processor-parse_statement-test_failure";
utils::get_local_injector().inject(error_injection_key, [&]() {
if (query.find(error_injection_key) != sstring_view::npos) {
throw std::runtime_error(error_injection_key);
}
});
}
auto statement = util::do_with_parser(query, std::mem_fn(&cql3_parser::CqlParser::query));
if (!statement) {
throw exceptions::syntax_exception("Parsing failed");

View File

@@ -80,7 +80,7 @@ public:
virtual sstring assignment_testable_source_context() const override {
auto&& name = _type->field_name(_field);
auto sname = sstring(reinterpret_cast<const char*>(name.begin(), name.size()));
auto sname = std::string_view(reinterpret_cast<const char*>(name.data()), name.size());
return format("{}.{}", _selected, sname);
}

View File

@@ -35,7 +35,7 @@ drop_function_statement::prepare_schema_mutations(query_processor& qp, api::time
if (!user_func) {
throw exceptions::invalid_request_exception(format("'{}' is not a user defined function", func));
}
if (auto aggregate = functions::functions::used_by_user_aggregate(user_func->name()); bool(aggregate)) {
if (auto aggregate = functions::functions::used_by_user_aggregate(user_func)) {
throw exceptions::invalid_request_exception(format("Cannot delete function {}, as it is used by user-defined aggregate {}", func, *aggregate));
}
m = co_await qp.get_migration_manager().prepare_function_drop_announcement(user_func, ts);

View File

@@ -10,6 +10,7 @@
#include "cql3/statements/drop_type_statement.hh"
#include "cql3/statements/prepared_statement.hh"
#include "cql3/query_processor.hh"
#include "cql3/functions/functions.hh"
#include "boost/range/adaptor/map.hpp"
@@ -109,6 +110,9 @@ void drop_type_statement::validate_while_executing(query_processor& qp) const {
}
}
if (auto&& fun_name = functions::functions::used_by_user_function(_name)) {
throw exceptions::invalid_request_exception(format("Cannot drop user type {}.{} as it is still used by function {}", keyspace, type->get_name_as_string(), *fun_name));
}
} catch (data_dictionary::no_such_keyspace& e) {
throw exceptions::invalid_request_exception(format("Cannot drop type in unknown keyspace {}", keyspace()));
}

View File

@@ -17,6 +17,7 @@
#include "cql3/util.hh"
#include "validation.hh"
#include "db/consistency_level_validations.hh"
#include <optional>
#include <seastar/core/shared_ptr.hh>
#include <boost/range/adaptor/transformed.hpp>
#include <boost/range/adaptor/map.hpp>
@@ -95,8 +96,9 @@ bool modification_statement::is_timestamp_set() const {
return attrs->is_timestamp_set();
}
gc_clock::duration modification_statement::get_time_to_live(const query_options& options) const {
return gc_clock::duration(attrs->get_time_to_live(options));
std::optional<gc_clock::duration> modification_statement::get_time_to_live(const query_options& options) const {
std::optional<int32_t> ttl = attrs->get_time_to_live(options);
return ttl ? std::make_optional<gc_clock::duration>(*ttl) : std::nullopt;
}
future<> modification_statement::check_access(query_processor& qp, const service::client_state& state) const {

View File

@@ -130,7 +130,7 @@ public:
bool is_timestamp_set() const;
gc_clock::duration get_time_to_live(const query_options& options) const;
std::optional<gc_clock::duration> get_time_to_live(const query_options& options) const;
virtual future<> check_access(query_processor& qp, const service::client_state& state) const override;

View File

@@ -93,7 +93,7 @@ public:
};
// Note: value (mutation) only required to contain the rows we are interested in
private:
const gc_clock::duration _ttl;
const std::optional<gc_clock::duration> _ttl;
// For operations that require a read-before-write, stores prefetched cell values.
// For CAS statements, stores values of conditioned columns.
// Is a reference to an outside prefetch_data container since a CAS BATCH statement
@@ -106,7 +106,7 @@ public:
const query_options& _options;
update_parameters(const schema_ptr schema_, const query_options& options,
api::timestamp_type timestamp, gc_clock::duration ttl, const prefetch_data& prefetched)
api::timestamp_type timestamp, std::optional<gc_clock::duration> ttl, const prefetch_data& prefetched)
: _ttl(ttl)
, _prefetched(prefetched)
, _timestamp(timestamp)
@@ -127,11 +127,7 @@ public:
}
atomic_cell make_cell(const abstract_type& type, const raw_value_view& value, atomic_cell::collection_member cm = atomic_cell::collection_member::no) const {
auto ttl = _ttl;
if (ttl.count() <= 0) {
ttl = _schema->default_time_to_live();
}
auto ttl = this->ttl();
return value.with_value([&] (const FragmentedView auto& v) {
if (ttl.count() > 0) {
@@ -143,11 +139,7 @@ public:
};
atomic_cell make_cell(const abstract_type& type, const managed_bytes_view& value, atomic_cell::collection_member cm = atomic_cell::collection_member::no) const {
auto ttl = _ttl;
if (ttl.count() <= 0) {
ttl = _schema->default_time_to_live();
}
auto ttl = this->ttl();
if (ttl.count() > 0) {
return atomic_cell::make_live(type, _timestamp, value, _local_deletion_time + ttl, ttl, cm);
@@ -169,7 +161,7 @@ public:
}
gc_clock::duration ttl() const {
return _ttl.count() > 0 ? _ttl : _schema->default_time_to_live();
return _ttl.value_or(_schema->default_time_to_live());
}
gc_clock::time_point expiry() const {

View File

@@ -59,7 +59,7 @@ public:
}
_end_of_stream = false;
forward_buffer_to(pr.start());
clear_buffer();
return _underlying->fast_forward_to(std::move(pr));
}

View File

@@ -1671,9 +1671,9 @@ future<db::commitlog::segment_manager::sseg_ptr> db::commitlog::segment_manager:
align = f.disk_write_dma_alignment();
auto is_overwrite = false;
auto existing_size = f.known_size();
if ((flags & open_flags::dsync) != open_flags{}) {
auto existing_size = f.known_size();
is_overwrite = true;
// would be super nice if we just could mmap(/dev/zero) and do sendto
// instead of this, but for now we must do explicit buffer writes.
@@ -1683,8 +1683,6 @@ future<db::commitlog::segment_manager::sseg_ptr> db::commitlog::segment_manager:
if (existing_size > max_size) {
co_await f.truncate(max_size);
} else if (existing_size < max_size) {
totals.total_size_on_disk += (max_size - existing_size);
clogger.trace("Pre-writing {} of {} KB to segment {}", (max_size - existing_size)/1024, max_size/1024, filename);
// re-open without o_dsync for pre-alloc. The reason/rationale
@@ -1732,6 +1730,12 @@ future<db::commitlog::segment_manager::sseg_ptr> db::commitlog::segment_manager:
co_await f.truncate(max_size);
}
// #12810 - we did not update total_size_on_disk unless o_dsync was
// on. So kept running with total == 0 -> free for all in creating new segment.
// Always update total_size_on_disk. Will wrap-around iff existing_size > max_size.
// That is ok.
totals.total_size_on_disk += (max_size - existing_size);
if (cfg.extensions && !cfg.extensions->commitlog_file_extensions().empty()) {
for (auto * ext : cfg.extensions->commitlog_file_extensions()) {
auto nf = co_await ext->wrap_file(filename, f, flags);
@@ -2116,6 +2120,9 @@ future<> db::commitlog::segment_manager::do_pending_deletes() {
clogger.debug("Discarding segments {}", ftd);
for (auto& [f, mode] : ftd) {
// `f.remove_file()` resets known_size to 0, so remember the size here,
// in order to subtract it from total_size_on_disk accurately.
auto size = f.known_size();
try {
if (f) {
co_await f.close();
@@ -2132,7 +2139,6 @@ future<> db::commitlog::segment_manager::do_pending_deletes() {
}
}
auto size = f.known_size();
auto usage = totals.total_size_on_disk;
auto next_usage = usage - size;
@@ -2165,7 +2171,7 @@ future<> db::commitlog::segment_manager::do_pending_deletes() {
// or had such an exception that we consider the file dead
// anyway. In either case we _remove_ the file size from
// footprint, because it is no longer our problem.
totals.total_size_on_disk -= f.known_size();
totals.total_size_on_disk -= size;
}
// #8376 - if we had an error in recycling (disk rename?), and no elements

View File

@@ -401,6 +401,10 @@ public:
named_value<uint64_t> wasm_udf_yield_fuel;
named_value<uint64_t> wasm_udf_total_fuel;
named_value<size_t> wasm_udf_memory_limit;
// wasm_udf_reserved_memory is static because the options in db::config
// are parsed using seastar::app_template, while this option is used for
// configuring the Seastar memory subsystem.
static constexpr size_t wasm_udf_reserved_memory = 50 * 1024 * 1024;
seastar::logging_settings logging_settings(const log_cli::options&) const;


@@ -7,6 +7,7 @@
*/
#include <seastar/core/print.hh>
#include <seastar/core/coroutine.hh>
#include "db/system_keyspace.hh"
#include "db/large_data_handler.hh"
#include "sstables/sstables.hh"
@@ -55,11 +56,11 @@ void large_data_handler::start() {
}
future<> large_data_handler::stop() {
if (!running()) {
return make_ready_future<>();
if (running()) {
_running = false;
large_data_logger.info("Waiting for {} background handlers", max_concurrency - _sem.available_units());
co_await _sem.wait(max_concurrency);
}
_running = false;
return _sem.wait(max_concurrency);
}
void large_data_handler::plug_system_keyspace(db::system_keyspace& sys_ks) noexcept {


@@ -2216,15 +2216,15 @@ std::vector<mutation> make_create_aggregate_mutations(schema_features features,
mutation& m = p.first;
clustering_key& ckey = p.second;
data_type state_type = aggregate->sfunc().arg_types()[0];
data_type state_type = aggregate->sfunc()->arg_types()[0];
if (aggregate->has_finalfunc()) {
m.set_clustered_cell(ckey, "final_func", aggregate->finalfunc().name().name, timestamp);
m.set_clustered_cell(ckey, "final_func", aggregate->finalfunc()->name().name, timestamp);
}
if (aggregate->initcond()) {
m.set_clustered_cell(ckey, "initcond", state_type->deserialize(*aggregate->initcond()).to_parsable_string(), timestamp);
}
m.set_clustered_cell(ckey, "return_type", aggregate->return_type()->as_cql3_type().to_string(), timestamp);
m.set_clustered_cell(ckey, "state_func", aggregate->sfunc().name().name, timestamp);
m.set_clustered_cell(ckey, "state_func", aggregate->sfunc()->name().name, timestamp);
m.set_clustered_cell(ckey, "state_type", state_type->as_cql3_type().to_string(), timestamp);
std::vector<mutation> muts = {m};
@@ -2233,7 +2233,7 @@ std::vector<mutation> make_create_aggregate_mutations(schema_features features,
auto sa_p = get_mutation(sa_schema, *aggregate);
mutation& sa_mut = sa_p.first;
clustering_key& sa_ckey = sa_p.second;
sa_mut.set_clustered_cell(sa_ckey, "reduce_func", aggregate->reducefunc().name().name, timestamp);
sa_mut.set_clustered_cell(sa_ckey, "reduce_func", aggregate->reducefunc()->name().name, timestamp);
sa_mut.set_clustered_cell(sa_ckey, "state_type", state_type->as_cql3_type().to_string(), timestamp);
muts.emplace_back(sa_mut);


@@ -295,7 +295,7 @@ future<> size_estimates_mutation_reader::fast_forward_to(const dht::partition_ra
}
future<> size_estimates_mutation_reader::fast_forward_to(position_range pr) {
forward_buffer_to(pr.start());
clear_buffer();
_end_of_stream = false;
if (_partition_reader) {
return _partition_reader->fast_forward_to(std::move(pr));


@@ -2276,7 +2276,10 @@ public:
add_partition(mutation_sink, "trace_probability", format("{:.2}", tracing::tracing::get_local_tracing_instance().get_trace_probability()));
co_await add_partition(mutation_sink, "memory", [this] () {
struct stats {
uint64_t total = 0;
// take the pre-reserved memory into account, as seastar only returns
// the stats of memory managed by the seastar allocator, but we instruct
// it to reserve additional memory for the system.
uint64_t total = db::config::wasm_udf_reserved_memory;
uint64_t free = 0;
static stats reduce(stats a, stats b) { return stats{a.total + b.total, a.free + b.free}; }
};
@@ -3344,11 +3347,11 @@ mutation system_keyspace::make_group0_history_state_id_mutation(
using namespace std::chrono;
assert(*gc_older_than >= gc_clock::duration{0});
auto ts_millis = duration_cast<milliseconds>(microseconds{ts});
auto gc_older_than_millis = duration_cast<milliseconds>(*gc_older_than);
assert(gc_older_than_millis < ts_millis);
auto ts_micros = microseconds{ts};
auto gc_older_than_micros = duration_cast<microseconds>(*gc_older_than);
assert(gc_older_than_micros < ts_micros);
auto tomb_upper_bound = utils::UUID_gen::min_time_UUID(ts_millis - gc_older_than_millis);
auto tomb_upper_bound = utils::UUID_gen::min_time_UUID(ts_micros - gc_older_than_micros);
// We want to delete all entries with IDs smaller than `tomb_upper_bound`
// but the deleted range is of the form (x, +inf) since the schema is reversed.
auto range = query::clustering_range::make_starting_with({


@@ -172,7 +172,7 @@ class build_progress_virtual_reader {
}
virtual future<> fast_forward_to(position_range range) override {
forward_buffer_to(range.start());
clear_buffer();
_end_of_stream = false;
return _underlying.fast_forward_to(std::move(range));
}


@@ -85,29 +85,25 @@ future<row_locker::lock_holder>
row_locker::lock_ck(const dht::decorated_key& pk, const clustering_key_prefix& cpk, bool exclusive, db::timeout_clock::time_point timeout, stats& stats) {
mylog.debug("taking shared lock on partition {}, and {} lock on row {} in it", pk, (exclusive ? "exclusive" : "shared"), cpk);
auto tracker = latency_stats_tracker(exclusive ? stats.exclusive_row : stats.shared_row);
auto ck = cpk;
// Create a two-level lock entry for the partition if it doesn't exist already.
auto i = _two_level_locks.try_emplace(pk, this).first;
// The two-level lock entry we've just created is guaranteed to be kept alive as long as it's locked.
// Initiating read locking in the background below ensures that even if the two-level lock is currently
// write-locked, releasing the write-lock will synchronously engage any waiting
// locks and will keep the entry alive.
future<lock_type::holder> lock_partition = i->second._partition_lock.hold_read_lock(timeout);
auto j = i->second._row_locks.find(cpk);
if (j == i->second._row_locks.end()) {
// Not yet locked, need to create the lock. This makes a copy of cpk.
try {
j = i->second._row_locks.emplace(cpk, lock_type()).first;
} catch(...) {
// If this emplace() failed, e.g., out of memory, we fail. We
// could do nothing - the partition lock we already started
// taking will be unlocked automatically after being locked.
// But it's better form to wait for the work we started, and it
// will also allow us to remove the hash-table row we added.
return lock_partition.then([ex = std::current_exception()] (auto lock) {
// The lock is automatically released when "lock" goes out of scope.
// TODO: unlock (lock = {}) now, search for the partition in the
// hash table (we know it's still there, because we held the lock until
// now) and remove the unused lock from the hash table if still unused.
return make_exception_future<row_locker::lock_holder>(std::current_exception());
});
return lock_partition.then([this, pk = &i->first, row_locks = &i->second._row_locks, ck = std::move(ck), exclusive, tracker = std::move(tracker), timeout] (auto lock1) mutable {
auto j = row_locks->find(ck);
if (j == row_locks->end()) {
// Not yet locked, need to create the lock.
j = row_locks->emplace(std::move(ck), lock_type()).first;
}
}
return lock_partition.then([this, pk = &i->first, cpk = &j->first, &row_lock = j->second, exclusive, tracker = std::move(tracker), timeout] (auto lock1) mutable {
auto* cpk = &j->first;
auto& row_lock = j->second;
// Like the two-level lock entry above, the row_lock entry we've just created
// is guaranteed to be kept alive as long as it's locked.
// Initiating read/write locking in the background below ensures that.
auto lock_row = exclusive ? row_lock.hold_write_lock(timeout) : row_lock.hold_read_lock(timeout);
return lock_row.then([this, pk, cpk, exclusive, tracker = std::move(tracker), lock1 = std::move(lock1)] (auto lock2) mutable {
lock1.release();


@@ -2523,24 +2523,28 @@ update_backlog node_update_backlog::add_fetch(unsigned shard, update_backlog bac
return std::max(backlog, _max.load(std::memory_order_relaxed));
}
future<bool> check_view_build_ongoing(db::system_distributed_keyspace& sys_dist_ks, const sstring& ks_name, const sstring& cf_name) {
return sys_dist_ks.view_status(ks_name, cf_name).then([] (std::unordered_map<locator::host_id, sstring>&& view_statuses) {
return boost::algorithm::any_of(view_statuses | boost::adaptors::map_values, [] (const sstring& view_status) {
return view_status == "STARTED";
future<bool> check_view_build_ongoing(db::system_distributed_keyspace& sys_dist_ks, const locator::token_metadata& tm, const sstring& ks_name,
const sstring& cf_name) {
using view_statuses_type = std::unordered_map<locator::host_id, sstring>;
return sys_dist_ks.view_status(ks_name, cf_name).then([&tm] (view_statuses_type&& view_statuses) {
return boost::algorithm::any_of(view_statuses, [&tm] (const view_statuses_type::value_type& view_status) {
// Only consider status of known hosts.
return view_status.second == "STARTED" && tm.get_endpoint_for_host_id(view_status.first);
});
});
}
future<bool> check_needs_view_update_path(db::system_distributed_keyspace& sys_dist_ks, const replica::table& t, streaming::stream_reason reason) {
future<bool> check_needs_view_update_path(db::system_distributed_keyspace& sys_dist_ks, const locator::token_metadata& tm, const replica::table& t,
streaming::stream_reason reason) {
if (is_internal_keyspace(t.schema()->ks_name())) {
return make_ready_future<bool>(false);
}
if (reason == streaming::stream_reason::repair && !t.views().empty()) {
return make_ready_future<bool>(true);
}
return do_with(t.views(), [&sys_dist_ks] (auto& views) {
return do_with(t.views(), [&sys_dist_ks, &tm] (auto& views) {
return map_reduce(views,
[&sys_dist_ks] (const view_ptr& view) { return check_view_build_ongoing(sys_dist_ks, view->ks_name(), view->cf_name()); },
[&sys_dist_ks, &tm] (const view_ptr& view) { return check_view_build_ongoing(sys_dist_ks, tm, view->ks_name(), view->cf_name()); },
false,
std::logical_or<bool>());
});


@@ -22,9 +22,13 @@ class system_distributed_keyspace;
}
namespace locator {
class token_metadata;
}
namespace db::view {
future<bool> check_view_build_ongoing(db::system_distributed_keyspace& sys_dist_ks, const sstring& ks_name, const sstring& cf_name);
future<bool> check_needs_view_update_path(db::system_distributed_keyspace& sys_dist_ks, const replica::table& t, streaming::stream_reason reason);
future<bool> check_needs_view_update_path(db::system_distributed_keyspace& sys_dist_ks, const locator::token_metadata& tm, const replica::table& t,
streaming::stream_reason reason);
}


@@ -157,11 +157,11 @@ future<> view_update_generator::start() {
service::get_local_streaming_priority(),
nullptr,
::mutation_reader::forwarding::no);
auto close_sr = deferred_close(staging_sstable_reader);
inject_failure("view_update_generator_consume_staging_sstable");
auto result = staging_sstable_reader.consume_in_thread(view_updating_consumer(s, std::move(permit), *t, sstables, _as, staging_sstable_reader_handle),
dht::incremental_owned_ranges_checker::make_partition_filter(_db.get_keyspace_local_ranges(s->ks_name())));
staging_sstable_reader.close().get();
if (result == stop_iteration::yes) {
break;
}


@@ -478,7 +478,15 @@ static future<bool> ping_with_timeout(pinger::endpoint_id id, clock::timepoint_t
auto f = pinger.ping(id, timeout_as);
auto sleep_and_abort = [] (clock::timepoint_t timeout, abort_source& timeout_as, clock& c) -> future<> {
co_await c.sleep_until(timeout, timeout_as);
co_await c.sleep_until(timeout, timeout_as).then_wrapped([&timeout_as] (auto&& f) {
// Avoid throwing if sleep was aborted.
if (f.failed() && timeout_as.abort_requested()) {
// Expected (if ping() resolved first or we were externally aborted).
f.ignore_ready_future();
return make_ready_future<>();
}
return f;
});
if (!timeout_as.abort_requested()) {
// We resolved before `f`. Abort the operation.
timeout_as.request_abort();
@@ -501,8 +509,6 @@ static future<bool> ping_with_timeout(pinger::endpoint_id id, clock::timepoint_t
// Wait on the sleep as well (it should return shortly, being aborted) so we don't discard the future.
try {
co_await std::move(sleep_and_abort);
} catch (const sleep_aborted&) {
// Expected (if `f` resolved first or we were externally aborted).
} catch (...) {
// There should be no other exceptions, but just in case... log it and discard,
// we want to propagate exceptions from `f`, not from sleep.


@@ -42,7 +42,8 @@ if __name__ == '__main__':
if systemd_unit.available('systemd-coredump@.service'):
dropin = '''
[Service]
TimeoutStartSec=infinity
RuntimeMaxSec=infinity
TimeoutSec=infinity
'''[1:-1]
os.makedirs('/etc/systemd/system/systemd-coredump@.service.d', exist_ok=True)
with open('/etc/systemd/system/systemd-coredump@.service.d/timeout.conf', 'w') as f:


@@ -7,7 +7,7 @@ Group: Applications/Databases
License: AGPLv3
URL: http://www.scylladb.com/
Source0: %{reloc_pkg}
Requires: %{product}-server = %{version} %{product}-conf = %{version} %{product}-python3 = %{version} %{product}-kernel-conf = %{version} %{product}-jmx = %{version} %{product}-tools = %{version} %{product}-tools-core = %{version} %{product}-node-exporter = %{version}
Requires: %{product}-server = %{version}-%{release} %{product}-conf = %{version}-%{release} %{product}-python3 = %{version}-%{release} %{product}-kernel-conf = %{version}-%{release} %{product}-jmx = %{version}-%{release} %{product}-tools = %{version}-%{release} %{product}-tools-core = %{version}-%{release} %{product}-node-exporter = %{version}-%{release}
Obsoletes: scylla-server < 1.1
%global _debugsource_template %{nil}
@@ -54,7 +54,7 @@ Group: Applications/Databases
Summary: The Scylla database server
License: AGPLv3
URL: http://www.scylladb.com/
Requires: %{product}-conf = %{version} %{product}-python3 = %{version}
Requires: %{product}-conf = %{version}-%{release} %{product}-python3 = %{version}-%{release}
Conflicts: abrt
AutoReqProv: no


@@ -1,6 +1,77 @@
### a dictionary of redirections
#old path: new path
# removing the Enterprise upgrade guides from the Open Source documentation
/stable/upgrade/upgrade-enterprise/index.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/index.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2021.1-to-2022.1/index.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2021.1-to-2022.1/index.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2021.1-to-2022.1/upgrade-guide-from-2021.1-to-2022.1-ubuntu.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2021.1-to-2022.1/upgrade-guide-from-2021.1-to-2022.1-ubuntu.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2021.1-to-2022.1/upgrade-guide-from-2021.1-to-2022.1-debian.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2021.1-to-2022.1/upgrade-guide-from-2021.1-to-2022.1-debian.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2021.1-to-2022.1/upgrade-guide-from-2021.1-to-2022.1-image.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2021.1-to-2022.1/upgrade-guide-from-2021.1-to-2022.1-image.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2021.1-to-2022.1/metric-update-2021.1-to-2022.1.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2021.1-to-2022.1/metric-update-2021.1-to-2022.1.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2020.1-to-2021.1/index.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2020.1-to-2021.1/index.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2020.1-to-2021.1/upgrade-guide-from-2020.1-to-2021.1-rpm.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2020.1-to-2021.1/upgrade-guide-from-2020.1-to-2021.1-rpm.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2020.1-to-2021.1/upgrade-guide-from-2020.1-to-2021.1-ubuntu-16-04.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2020.1-to-2021.1/upgrade-guide-from-2020.1-to-2021.1-ubuntu-16-04.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2020.1-to-2021.1/upgrade-guide-from-2020.1-to-2021.1-ubuntu-18-04.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2020.1-to-2021.1/upgrade-guide-from-2020.1-to-2021.1-ubuntu-18-04.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2020.1-to-2021.1/upgrade-guide-from-2020.1-to-2021.1-debian.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2020.1-to-2021.1/upgrade-guide-from-2020.1-to-2021.1-debian.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2020.1-to-2021.1/metric-update-2020.1-to-2021.1.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2020.1-to-2021.1/metric-update-2020.1-to-2021.1.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2019.1-to-2020.1/index.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2019.1-to-2020.1/index.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2019.1-to-2020.1/upgrade-guide-from-2019.1-to-2020.1-rpm.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2019.1-to-2020.1/upgrade-guide-from-2019.1-to-2020.1-rpm.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2019.1-to-2020.1/upgrade-guide-from-2019.1-to-2020.1-ubuntu-16-04.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2019.1-to-2020.1/upgrade-guide-from-2019.1-to-2020.1-ubuntu-16-04.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2019.1-to-2020.1/upgrade-guide-from-2019.1-to-2020.1-ubuntu-18-04.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2019.1-to-2020.1/upgrade-guide-from-2019.1-to-2020.1-ubuntu-18-04.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2019.1-to-2020.1/upgrade-guide-from-2019.1-to-2020.1-debian.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2019.1-to-2020.1/upgrade-guide-from-2019.1-to-2020.1-debian.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2019.1-to-2020.1/metric-update-2019.1-to-2020.1.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2019.1-to-2020.1/metric-update-2019.1-to-2020.1.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2018.1-to-2019.1/index.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2018.1-to-2019.1/index.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2018.1-to-2019.1/upgrade-guide-from-2018.1-to-2019.1-rpm.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2018.1-to-2019.1/upgrade-guide-from-2018.1-to-2019.1-rpm.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2018.1-to-2019.1/upgrade-guide-from-2018.1-to-2019.1-ubuntu-16-04.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2018.1-to-2019.1/upgrade-guide-from-2018.1-to-2019.1-ubuntu-16-04.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2018.1-to-2019.1/metric-update-2018.1-to-2019.1.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2018.1-to-2019.1/metric-update-2018.1-to-2019.1.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2017.1-to-2018.1/index.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2017.1-to-2018.1/index.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2017.1-to-2018.1/upgrade-guide-from-2017.1-to-2018.1-rpm.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2017.1-to-2018.1/upgrade-guide-from-2017.1-to-2018.1-rpm.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2017.1-to-2018.1/upgrade-guide-from-2017.1-to-2018.1-ubuntu.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2017.1-to-2018.1/upgrade-guide-from-2017.1-to-2018.1-ubuntu.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2017.1-to-2018.1/upgrade-guide-from-2017.1-to-2018.1-debian.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2017.1-to-2018.1/upgrade-guide-from-2017.1-to-2018.1-debian.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2017.1-to-2018.1/metric-update-2017.1-to-2018.1.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2017.1-to-2018.1/metric-update-2017.1-to-2018.1.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-ubuntu-14-to-16.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-ubuntu-14-to-16.html
/stable/getting-started/install-scylla/unified-installer.html#unified-installed-upgrade: https://enterprise.docs.scylladb.com/stable/getting-started/install-scylla/unified-installer.html#unified-installed-upgrade
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2022.x.y-to-2022.x.z/index.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2022.x.y-to-2022.x.z/index.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2022.x.y-to-2022.x.z/upgrade-guide-from-2022.x.y-to-2022.x.z-image.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2022.x.y-to-2022.x.z/upgrade-guide-from-2022.x.y-to-2022.x.z-image.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2022.x.y-to-2022.x.z/upgrade-guide-from-2022.x.y-to-2022.x.z-rpm.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2022.x.y-to-2022.x.z/upgrade-guide-from-2022.x.y-to-2022.x.z-rpm.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2022.x.y-to-2022.x.z/upgrade-guide-from-2022.x.y-to-2022.x.z-ubuntu-18-04.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2022.x.y-to-2022.x.z/upgrade-guide-from-2022.x.y-to-2022.x.z-ubuntu-18-04.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2022.x.y-to-2022.x.z/upgrade-guide-from-2022.x.y-to-2022.x.z-ubuntu-20-04.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2022.x.y-to-2022.x.z/upgrade-guide-from-2022.x.y-to-2022.x.z-ubuntu-20-04.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2022.x.y-to-2022.x.z/upgrade-guide-from-2022.x.y-to-2022.x.z-debian-10.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2022.x.y-to-2022.x.z/upgrade-guide-from-2022.x.y-to-2022.x.z-debian-10.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2021.x.y-to-2021.x.z/index.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2021.x.y-to-2021.x.z/index.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2021.x.y-to-2021.x.z/upgrade-guide-from-2021.x.y-to-2021.x.z-rpm.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2021.x.y-to-2021.x.z/upgrade-guide-from-2021.x.y-to-2021.x.z-rpm.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2021.x.y-to-2021.x.z/upgrade-guide-from-2021.x.y-to-2021.x.z-ubuntu-16-04.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2021.x.y-to-2021.x.z/upgrade-guide-from-2021.x.y-to-2021.x.z-ubuntu-16-04.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2021.x.y-to-2021.x.z/upgrade-guide-from-2021.x.y-to-2021.x.z-ubuntu-18-04.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2021.x.y-to-2021.x.z/upgrade-guide-from-2021.x.y-to-2021.x.z-ubuntu-18-04.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2021.x.y-to-2021.x.z/upgrade-guide-from-2021.x.y-to-2021.x.z-ubuntu-20-04.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2021.x.y-to-2021.x.z/upgrade-guide-from-2021.x.y-to-2021.x.z-ubuntu-20-04.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2021.x.y-to-2021.x.z/upgrade-guide-from-2021.x.y-to-2021.x.z-debian-9.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2021.x.y-to-2021.x.z/upgrade-guide-from-2021.x.y-to-2021.x.z-debian-9.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2021.x.y-to-2021.x.z/upgrade-guide-from-2021.x.y-to-2021.x.z-debian-10.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2021.x.y-to-2021.x.z/upgrade-guide-from-2021.x.y-to-2021.x.z-debian-10.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2020.x.y-to-2020.x.z/index.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2020.x.y-to-2020.x.z/index.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2020.x.y-to-2020.x.z/upgrade-guide-from-2020.x.y-to-2020.x.z-rpm.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2020.x.y-to-2020.x.z/upgrade-guide-from-2020.x.y-to-2020.x.z-rpm.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2020.x.y-to-2020.x.z/upgrade-guide-from-2020.x.y-to-2020.x.z-ubuntu-16-04.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2020.x.y-to-2020.x.z/upgrade-guide-from-2020.x.y-to-2020.x.z-ubuntu-16-04.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2020.x.y-to-2020.x.z/upgrade-guide-from-2020.x.y-to-2020.x.z-ubuntu-18-04.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2020.x.y-to-2020.x.z/upgrade-guide-from-2020.x.y-to-2020.x.z-ubuntu-18-04.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2020.x.y-to-2020.x.z/upgrade-guide-from-2020.x.y-to-2020.x.z-debian-9.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2020.x.y-to-2020.x.z/upgrade-guide-from-2020.x.y-to-2020.x.z-debian-9.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2020.x.y-to-2020.x.z/upgrade-guide-from-2020.x.y-to-2020.x.z-debian-10.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2020.x.y-to-2020.x.z/upgrade-guide-from-2020.x.y-to-2020.x.z-debian-10.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2019.x.y-to-2019.x.z/index.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2019.x.y-to-2019.x.z/index.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2019.x.y-to-2019.x.z/upgrade-guide-from-2019.x.y-to-2019.x.z-rpm.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2019.x.y-to-2019.x.z/upgrade-guide-from-2019.x.y-to-2019.x.z-rpm.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2019.x.y-to-2019.x.z/upgrade-guide-from-2019.x.y-to-2019.x.z-ubuntu.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2019.x.y-to-2019.x.z/upgrade-guide-from-2019.x.y-to-2019.x.z-ubuntu.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2019.x.y-to-2019.x.z/upgrade-guide-from-2019.x.y-to-2019.x.z-debian.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2019.x.y-to-2019.x.z/upgrade-guide-from-2019.x.y-to-2019.x.z-debian.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2018.x.y-to-2018.x.z/index.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2018.x.y-to-2018.x.z/index.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2018.x.y-to-2018.x.z/upgrade-guide-from-2018.x.y-to-2018.x.z-rpm.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2018.x.y-to-2018.x.z/upgrade-guide-from-2018.x.y-to-2018.x.z-rpm.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2018.x.y-to-2018.x.z/upgrade-guide-from-2018.x.y-to-2018.x.z-ubuntu.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2018.x.y-to-2018.x.z/upgrade-guide-from-2018.x.y-to-2018.x.z-ubuntu.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2018.x.y-to-2018.x.z/upgrade-guide-from-2018.x.y-to-2018.x.z-debian.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2018.x.y-to-2018.x.z/upgrade-guide-from-2018.x.y-to-2018.x.z-debian.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2017.x.y-to-2017.x.z/index.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2017.x.y-to-2017.x.z/index.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2017.x.y-to-2017.x.z/upgrade-guide-from-2017.x.y-to-2017.x.z-rpm.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2017.x.y-to-2017.x.z/upgrade-guide-from-2017.x.y-to-2017.x.z-rpm.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2017.x.y-to-2017.x.z/upgrade-guide-from-2017.x.y-to-2017.x.z-ubuntu.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2017.x.y-to-2017.x.z/upgrade-guide-from-2017.x.y-to-2017.x.z-ubuntu.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2017.x.y-to-2017.x.z/upgrade-guide-from-2017.x.y-to-2017.x.z-debian.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2017.x.y-to-2017.x.z/upgrade-guide-from-2017.x.y-to-2017.x.z-debian.html
# removing the Enterprise-only content from the Open Source documentation
/stable/using-scylla/workload-prioritization: https://enterprise.docs.scylladb.com//stable/using-scylla/workload-prioritization.html
/stable/operating-scylla/security/encryption-at-rest: https://enterprise.docs.scylladb.com/stable/operating-scylla/security/encryption-at-rest.html
/stable/operating-scylla/security/ldap-authentication: https://enterprise.docs.scylladb.com/stable/operating-scylla/security/ldap-authentication.html
/stable/operating-scylla/security/ldap-authorization: https://enterprise.docs.scylladb.com/stable/operating-scylla/security/ldap-authorization.html
/stable/operating-scylla/security/auditing: https://enterprise.docs.scylladb.com/stable/operating-scylla/security/auditing.html
# unifying the Ubuntu upgrade guide for different Ubuntu versions: from 5.0 to 2022.1
/stable/upgrade/upgrade-to-enterprise/upgrade-guide-from-5.0-to-2022.1/upgrade-guide-from-5.0-to-2022.1-ubuntu-18-04.html: /stable/upgrade/upgrade-to-enterprise/upgrade-guide-from-5.0-to-2022.1/upgrade-guide-from-5.0-to-2022.1-ubuntu.html
@@ -1112,14 +1183,14 @@ tls-ssl/index.html: /stable/operating-scylla/security
/using-scylla/integrations/integration_kairos/index.html: /stable/using-scylla/integrations/integration-kairos
/upgrade/ami_upgrade/index.html: /stable/upgrade/ami-upgrade
/scylla-cloud/cloud-setup/gcp-vpc-peering/index.html: /stable/scylla-cloud/cloud-setup/GCP/gcp-vpc-peering
/scylla-cloud/cloud-setup/GCP/gcp-vcp-peering/index.html: /stable/scylla-cloud/cloud-setup/GCP/gcp-vpc-peering
/scylla-cloud/cloud-setup/gcp-vpc-peering/index.html: https://cloud.docs.scylladb.com/stable/cloud-setup/gcp-vpc-peering.html
/scylla-cloud/cloud-setup/GCP/gcp-vcp-peering/index.html: https://cloud.docs.scylladb.com/stable/cloud-setup/gcp-vpc-peering.html
# move scylla cloud for AWS to dedicated directory
/scylla-cloud/cloud-setup/aws-vpc-peering/index.html: /stable/scylla-cloud/cloud-setup/AWS/aws-vpc-peering
/scylla-cloud/cloud-setup/cloud-prom-proxy/index.html: /stable/scylla-cloud/cloud-setup/AWS/cloud-prom-proxy
/scylla-cloud/cloud-setup/outposts/index.html: /stable/scylla-cloud/cloud-setup/AWS/outposts
/scylla-cloud/cloud-setup/scylla-cloud-byoa/index.html: /stable/scylla-cloud/cloud-setup/AWS/scylla-cloud-byoa
/scylla-cloud/cloud-setup/aws-vpc-peering/index.html: https://cloud.docs.scylladb.com/stable/cloud-setup/aws-vpc-peering.html
/scylla-cloud/cloud-setup/cloud-prom-proxy/index.html: https://cloud.docs.scylladb.com/stable/monitoring/cloud-prom-proxy.html
/scylla-cloud/cloud-setup/outposts/index.html: https://cloud.docs.scylladb.com/stable/cloud-setup/outposts.html
/scylla-cloud/cloud-setup/scylla-cloud-byoa/index.html: https://cloud.docs.scylladb.com/stable/cloud-setup/scylla-cloud-byoa.html
/scylla-cloud/cloud-services/scylla_cloud_costs/index.html: /stable/scylla-cloud/cloud-services/scylla-cloud-costs
/scylla-cloud/cloud-services/scylla_cloud_managin_versions/index.html: /stable/scylla-cloud/cloud-services/scylla-cloud-managin-versions
/scylla-cloud/cloud-services/scylla_cloud_support_alerts_sla/index.html: /stable/scylla-cloud/cloud-services/scylla-cloud-support-alerts-sla

View File

@@ -161,6 +161,10 @@ events appear in the Streams API as normal deletions - without the
distinctive marker on deletions which are really expirations.
See <https://github.com/scylladb/scylla/issues/5060>.
<!--- REMOVE IN FUTURE VERSIONS - Remove the note below in version 5.3/2023.1 -->
> **Note** This feature is experimental in versions earlier than ScyllaDB Open Source 5.2 and ScyllaDB Enterprise 2022.2.
---

View File

@@ -5,7 +5,7 @@ Raft Consensus Algorithm in ScyllaDB
Introduction
--------------
ScyllaDB was originally designed, following Apache Cassandra, to use gossip for topology and schema updates and the Paxos consensus algorithm for
strong data consistency (:doc:`LWT </using-scylla/lwt>`). To achieve stronger consistency without performance penalty, ScyllaDB 5.x has turned to Raft - a consensus algorithm designed as an alternative to both gossip and Paxos.
strong data consistency (:doc:`LWT </using-scylla/lwt>`). To achieve stronger consistency without performance penalty, ScyllaDB has turned to Raft - a consensus algorithm designed as an alternative to both gossip and Paxos.
Raft is a consensus algorithm that implements a distributed, consistent, replicated log across members (nodes). Raft implements consensus by first electing a distinguished leader, then giving the leader complete responsibility for managing the replicated log. The leader accepts log entries from clients, replicates them on other servers, and tells servers when it is safe to apply log entries to their state machines.
@@ -13,9 +13,9 @@ Raft uses a heartbeat mechanism to trigger a leader election. All servers start
Leader selection is described in detail in the `Raft paper <https://raft.github.io/raft.pdf>`_.
ScyllaDB 5.x may use Raft to maintain schema updates in every node (see below). Any schema update, like ALTER, CREATE or DROP TABLE, is first committed as an entry in the replicated Raft log, and, once stored on most replicas, applied to all nodes **in the same order**, even in the face of a node or network failures.
ScyllaDB can use Raft to maintain schema updates in every node (see below). Any schema update, like ALTER, CREATE or DROP TABLE, is first committed as an entry in the replicated Raft log, and, once stored on most replicas, applied to all nodes **in the same order**, even in the face of a node or network failures.
Following ScyllaDB 5.x releases will use Raft to guarantee consistent topology updates similarly.
Upcoming ScyllaDB releases will use Raft to guarantee consistent topology updates similarly.
.. _raft-quorum-requirement:
@@ -26,90 +26,55 @@ Raft requires at least a quorum of nodes in a cluster to be available. If multip
and the quorum is lost, the cluster is unavailable for schema updates. See :ref:`Handling Failures <raft-handling-failures>`
for information on how to handle failures.
Upgrade Considerations for ScyllaDB 5.0 and Later
==================================================
Note that when you have a two-DC cluster with the same number of nodes in each DC, the cluster will lose the quorum if one
of the DCs is down.
**We recommend configuring three DCs per cluster to ensure that the cluster remains available and operational when one DC is down.**
.. _enabling-raft-existing-cluster:
Enabling Raft
---------------
Enabling Raft in ScyllaDB 5.0 and 5.1
=====================================
.. warning::
In ScyllaDB 5.0 and 5.1, Raft is an experimental feature.
It is not possible to enable Raft in an existing cluster in ScyllaDB 5.0 and 5.1.
In order to have a Raft-enabled cluster in these versions, you must create a new cluster with Raft enabled from the start.
.. warning::
**Do not** use Raft in production clusters in ScyllaDB 5.0 and 5.1. Such clusters won't be able to correctly upgrade to ScyllaDB 5.2.
Use Raft only for testing and experimentation in clusters which can be thrown away.
.. warning::
Once enabled, Raft cannot be disabled on your cluster. The cluster nodes will fail to restart if you remove the Raft feature.
When creating a new cluster, add ``raft`` to the list of experimental features in your ``scylla.yaml`` file:
.. code-block:: yaml
experimental_features:
- raft
.. _enabling-raft-existing-cluster:
Enabling Raft in ScyllaDB 5.2 and further
=========================================
.. TODO include enterprise versions in this documentation
.. note::
In ScyllaDB 5.2, Raft is Generally Available and can be safely used for consistent schema management.
In ScyllaDB 5.3 it will become enabled by default.
In further versions it will be mandatory.
In ScyllaDB 5.2 and ScyllaDB Enterprise 2023.1 Raft is Generally Available and can be safely used for consistent schema management.
In further versions, it will be mandatory.
ScyllaDB 5.2 and later comes equipped with a procedure that can set up Raft-based consistent cluster management in an existing cluster. We refer to this as the **internal Raft upgrade procedure** (do not confuse with the :doc:`ScyllaDB version upgrade procedure </upgrade/upgrade-opensource/upgrade-guide-from-5.1-to-5.2/upgrade-guide-from-5.1-to-5.2-generic>`).
ScyllaDB Open Source 5.2 and later, and ScyllaDB Enterprise 2023.1 and later come equipped with a procedure that can set up Raft-based consistent cluster management in an existing cluster. We refer to this as the **Raft upgrade procedure** (do not confuse with the :doc:`ScyllaDB version upgrade procedure </upgrade/index/>`).
.. warning::
Once enabled, Raft cannot be disabled on your cluster. The cluster nodes will fail to restart if you remove the Raft feature.
To enable Raft in an existing cluster in Scylla 5.2 and beyond:
To enable Raft in an existing cluster, you need to enable the ``consistent_cluster_management`` option in the ``scylla.yaml`` file
for **each node** in the cluster:
* ensure that the schema is synchronized in the cluster by executing :doc:`nodetool describecluster </operating-scylla/nodetool-commands/describecluster>` on each node and ensuring that the schema version is the same on all nodes,
* then perform a :doc:`rolling restart </operating-scylla/procedures/config-change/rolling-restart/>`, updating the ``scylla.yaml`` file for **each node** in the cluster before restarting it to enable the ``consistent_cluster_management`` flag:
#. Ensure that the schema is synchronized in the cluster by executing :doc:`nodetool describecluster </operating-scylla/nodetool-commands/describecluster>` on each node and ensuring that the schema version is the same on all nodes.
#. Perform a :doc:`rolling restart </operating-scylla/procedures/config-change/rolling-restart/>`, updating the ``scylla.yaml`` file for **each node** in the cluster before restarting it to enable the ``consistent_cluster_management`` option:
.. code-block:: yaml
.. code-block:: yaml
consistent_cluster_management: true
consistent_cluster_management: true
When all the nodes in the cluster are updated and restarted, the cluster will start the **internal Raft upgrade procedure**.
**You must then verify** that the internal Raft upgrade procedure has finished successfully. Refer to the :ref:`next section <verify-raft-procedure>`.
When all the nodes in the cluster are updated and restarted, the cluster will start the **Raft upgrade procedure**.
**You must then verify** that the Raft upgrade procedure has finished successfully. Refer to the :ref:`next section <verify-raft-procedure>`.
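The per-node step of the rolling restart can be sketched as a small shell routine. This is a sketch only: the ``scylla.yaml`` path and the ``scylla-server`` unit name are the stock defaults and may differ on your installation, and it runs here against a scratch copy so it is safe to dry-run:

```shell
# Enable consistent_cluster_management in scylla.yaml, idempotently.
# CONF points at a scratch copy for a safe dry-run; on a real node use
# CONF=/etc/scylla/scylla.yaml and follow with:
#   sudo systemctl restart scylla-server   # one node at a time
CONF=$(mktemp)
printf 'cluster_name: mycluster\n' > "$CONF"   # stand-in for the real file
if grep -q '^consistent_cluster_management:' "$CONF"; then
  # Option already present: force it to true.
  sed -i 's/^consistent_cluster_management:.*/consistent_cluster_management: true/' "$CONF"
else
  # Option absent: append it.
  echo 'consistent_cluster_management: true' >> "$CONF"
fi
grep '^consistent_cluster_management' "$CONF"
```

Because the edit is idempotent, re-running it on a node that already has the option set is harmless, which suits a rolling restart that may be retried.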
You can also enable the ``consistent_cluster_management`` flag while performing :doc:`rolling upgrade from 5.1 to 5.2 </upgrade/upgrade-opensource/upgrade-guide-from-5.1-to-5.2/upgrade-guide-from-5.1-to-5.2-generic>`: update ``scylla.yaml`` before restarting each node. The internal Raft upgrade procedure will start as soon as the last node was upgraded and restarted. As above, this requires :ref:`verifying <verify-raft-procedure>` that this internal procedure successfully finishes.
Alternatively, you can enable the ``consistent_cluster_management`` option when you are:
Finally, you can enable the ``consistent_cluster_management`` flag when creating a new cluster. This does not use the internal Raft upgrade procedure; instead, Raft is functioning in the cluster and managing schema right from the start.
* Performing a rolling upgrade from version 5.1 to 5.2 or version 2022.x to 2023.1 by updating ``scylla.yaml`` before restarting each node. The Raft upgrade procedure will start as soon as the last node was upgraded and restarted. As above, this requires :ref:`verifying <verify-raft-procedure>` that the procedure successfully finishes.
* Creating a new cluster. This does not use the Raft upgrade procedure; instead, Raft is functioning in the cluster and managing schema right from the start.
Until all nodes are restarted with ``consistent_cluster_management: true``, it is still possible to turn this option back off. Once enabled on every node, it must remain turned on (or the node will refuse to restart).
.. _verify-raft-procedure:
Verifying that the internal Raft upgrade procedure finished successfully
Verifying that the Raft upgrade procedure finished successfully
========================================================================
.. versionadded:: 5.2
The internal Raft upgrade procedure starts as soon as every node in the cluster restarts with the ``consistent_cluster_management`` flag enabled in ``scylla.yaml``.
The Raft upgrade procedure starts as soon as every node in the cluster restarts with the ``consistent_cluster_management`` flag enabled in ``scylla.yaml``.
.. TODO: update the above sentence once 5.3 and later are released.
The procedure requires **full cluster availability** to correctly set up the Raft algorithm; after the setup finishes, Raft can proceed with only a majority of nodes, but this initial setup is an exception.
An unlucky event, such as a hardware failure, may cause one of your nodes to fail. If this happens before the internal Raft upgrade procedure finishes, the procedure will get stuck and your intervention will be required.
An unlucky event, such as a hardware failure, may cause one of your nodes to fail. If this happens before the Raft upgrade procedure finishes, the procedure will get stuck and your intervention will be required.
To verify that the procedure finishes, look at the log of every Scylla node (using ``journalctl _COMM=scylla``). Search for the following patterns:
@@ -204,8 +169,6 @@ If some nodes are **dead and irrecoverable**, you'll need to perform a manual re
Verifying that Raft is enabled
===============================
.. versionadded:: 5.2
You can verify that Raft is enabled on your cluster by performing the following query on each node:
.. code-block:: sql
@@ -224,7 +187,7 @@ The query should return:
on every node.
If the query returns 0 rows, or ``value`` is ``synchronize`` or ``use_pre_raft_procedures``, it means that the cluster is in the middle of the internal Raft upgrade procedure; consult the :ref:`relevant section <verify-raft-procedure>`.
If the query returns 0 rows, or ``value`` is ``synchronize`` or ``use_pre_raft_procedures``, it means that the cluster is in the middle of the Raft upgrade procedure; consult the :ref:`relevant section <verify-raft-procedure>`.
If ``value`` is ``recovery``, it means that the cluster is in the middle of the manual recovery procedure. The procedure must be finished. Consult :ref:`the section about Raft recovery <recover-raft-procedure>`.
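The states named above can be checked mechanically. The helper below is a hypothetical sketch: the ``use_post_raft_procedures`` "finished" value is an assumption not shown in this excerpt, while the other values come from the text above; the query output is fed in as a plain string:

```shell
# Hypothetical helper: map the upgrade-state string returned by the query
# to a human-readable status. 'use_post_raft_procedures' as the finished
# state is an assumption; the other values come from the text above.
raft_state() {
  case "$1" in
    use_post_raft_procedures)            echo "Raft enabled" ;;
    synchronize|use_pre_raft_procedures) echo "Raft upgrade procedure in progress" ;;
    recovery)                            echo "manual recovery in progress" ;;
    *)                                   echo "unexpected state: $1" ;;
  esac
}
raft_state use_post_raft_procedures   # prints: Raft enabled
```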
@@ -276,12 +239,8 @@ Examples
- Schema updates are possible and safe.
- Try restarting the node. If the node is dead, :doc:`replace it with a new node </operating-scylla/procedures/cluster-management/replace-dead-node/>`.
* - 2 nodes
- Cluster is not fully operational. The data is available for reads and writes, but schema changes are impossible.
- Data is available for reads and writes, schema changes are impossible.
- Restart at least 1 of the 2 nodes that are down to regain quorum. If you can't recover at least 1 of the 2 nodes, consult the :ref:`manual Raft recovery section <recover-raft-procedure>`.
* - 1 datacenter
- Cluster is not fully operational. The data is available for reads and writes, but schema changes are impossible.
- When the DC comes back online, restart the nodes. If the DC does not come back online and nodes are lost, consult the :ref:`manual Raft recovery section <recover-raft-procedure>`.
.. list-table:: Cluster B: 2 datacenters, 6 nodes (3 nodes per DC)
:widths: 20 40 40
@@ -294,10 +253,10 @@ Examples
- Schema updates are possible and safe.
- Try restarting the node(s). If the node is dead, :doc:`replace it with a new node </operating-scylla/procedures/cluster-management/replace-dead-node/>`.
* - 3 nodes
- Cluster is not fully operational. The data is available for reads and writes, but schema changes are impossible.
- Data is available for reads and writes, schema changes are impossible.
- Restart 1 of the 3 nodes that are down to regain quorum. If you can't recover at least 1 of the 3 failed nodes, consult the :ref:`manual Raft recovery section <recover-raft-procedure>`.
* - 1DC
- Cluster is not fully operational. The data is available for reads and writes, but schema changes are impossible.
- Data is available for reads and writes, schema changes are impossible.
- When the DCs come back online, restart the nodes. If the DC fails to come back online and the nodes are lost, consult the :ref:`manual Raft recovery section <recover-raft-procedure>`.
@@ -315,7 +274,7 @@ Examples
- Schema updates are possible and safe.
- When the DC comes back online, try restarting the nodes in the cluster. If the nodes are dead, :doc:`add 3 new nodes in a new region </operating-scylla/procedures/cluster-management/add-dc-to-existing-dc/>`.
* - 2 DCs
- Cluster is not fully operational. The data is available for reads and writes, but schema changes are impossible.
- Data is available for reads and writes, schema changes are impossible.
- When the DCs come back online, restart the nodes. If at least one DC fails to come back online and the nodes are lost, consult the :ref:`manual Raft recovery section <recover-raft-procedure>`.
.. _recover-raft-procedure:
@@ -323,26 +282,24 @@ Examples
Raft manual recovery procedure
==============================
.. versionadded:: 5.2
The manual Raft recovery procedure applies to the following situations:
* :ref:`The internal Raft upgrade procedure <verify-raft-procedure>` got stuck because one of your nodes failed in the middle of the procedure and is irrecoverable,
* :ref:`The Raft upgrade procedure <verify-raft-procedure>` got stuck because one of your nodes failed in the middle of the procedure and is irrecoverable,
* or the cluster was running Raft but a majority of nodes (e.g. 2 out of 3) failed and are irrecoverable. Raft cannot progress unless a majority of nodes is available.
.. warning::
Perform the manual recovery procedure **only** if you're dealing with **irrecoverable** nodes. If it is possible to restart your nodes, do that instead of manual recovery.
.. warning::
.. note::
Before proceeding, make sure that the irrecoverable nodes are truly dead, and not, for example, temporarily partitioned away due to a network failure. If it is possible for the 'dead' nodes to come back to life, they might communicate and interfere with the recovery procedure and cause unpredictable problems.
If you have no means of ensuring that these irrecoverable nodes won't come back to life and communicate with the rest of the cluster, set up firewall rules or otherwise isolate your live nodes to reject any communication attempts from these dead nodes.
During the manual recovery procedure you'll enter a special ``RECOVERY`` mode, remove all faulty nodes (using the standard :doc:`node removal procedure </operating-scylla/procedures/cluster-management/remove-node/>`), delete the internal Raft data, and restart the cluster. This will cause the cluster to perform the internal Raft upgrade procedure again, initializing the Raft algorithm from scratch. The manual recovery procedure is applicable both to clusters which were not running Raft in the past and then had Raft enabled, and to clusters which were bootstrapped using Raft.
During the manual recovery procedure you'll enter a special ``RECOVERY`` mode, remove all faulty nodes (using the standard :doc:`node removal procedure </operating-scylla/procedures/cluster-management/remove-node/>`), delete the internal Raft data, and restart the cluster. This will cause the cluster to perform the Raft upgrade procedure again, initializing the Raft algorithm from scratch. The manual recovery procedure is applicable both to clusters which were not running Raft in the past and then had Raft enabled, and to clusters which were bootstrapped using Raft.
.. warning::
.. note::
Entering ``RECOVERY`` mode requires a node restart. Restarting an additional node while some nodes are already dead may lead to unavailability of data queries (assuming that you haven't lost availability already). For example, if you're using the standard RF=3, CL=QUORUM setup, and you're recovering from a stuck upgrade procedure because one of your nodes is dead, restarting another node will cause temporary data query unavailability (until the node finishes restarting). Prepare your service for downtime before proceeding.
@@ -393,4 +350,3 @@ Learn More About Raft
* `Making Schema Changes Safe with Raft <https://www.scylladb.com/presentations/making-schema-changes-safe-with-raft/>`_ - A Scylla Summit talk by Konstantin Osipov (register for access)
* `The Future of Consensus in ScyllaDB 5.0 and Beyond <https://www.scylladb.com/presentations/the-future-of-consensus-in-scylladb-5-0-and-beyond/>`_ - A Scylla Summit talk by Tomasz Grabiec (register for access)

View File

@@ -746,9 +746,7 @@ CDC options
.. versionadded:: 3.2 Scylla Open Source
The following options are to be used with Change Data Capture. Available as an experimental feature from Scylla Open Source 3.2.
To use this feature, you must enable the :ref:`experimental tag <yaml_enabling_experimental_features>` in the scylla.yaml.
The following options can be used with Change Data Capture.
+---------------------------+-----------------+------------------------------------------------------------------------------------------------------------------------+
| option | default | description |
@@ -823,7 +821,8 @@ The ``tombstone_gc`` option allows you to prevent data resurrection. With the ``
are only removed after :term:`repair` is performed. Unlike ``gc_grace_seconds``, ``tombstone_gc`` has no time constraints - when
the ``repair`` mode is on, tombstones garbage collection will wait until repair is run.
The ``tombstone_gc`` option can be enabled using ``ALTER TABLE`` and ``CREATE TABLE``. For example:
You can enable the after-repair tombstone GC by setting the ``repair`` mode using
``ALTER TABLE`` or ``CREATE TABLE``. For example:
.. code-block:: cql
@@ -833,10 +832,6 @@ The ``tombstone_gc`` option can be enabled using ``ALTER TABLE`` and ``CREATE TA
ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'repair'} ;
.. note::
The ``tombstone_gc`` option was added in ScyllaDB 5.0 as an experimental feature, and it is disabled by default.
You need to explicitly specify the ``repair`` mode table property to enable the feature.
The following modes are available:
.. list-table::
@@ -846,7 +841,7 @@ The following modes are available:
* - Mode
- Description
* - ``timeout``
- Tombstone GC is performed after the wait time specified with ``gc_grace_seconds``. Default in ScyllaDB 5.0.
- Tombstone GC is performed after the wait time specified with ``gc_grace_seconds`` (default).
* - ``repair``
- Tombstone GC is performed after repair is run.
* - ``disabled``

View File

@@ -25,7 +25,7 @@ Getting Started
:id: "getting-started"
:class: my-panel
* `Install ScyllaDB (Binary Packages, Docker, or EC2) <https://www.scylladb.com/download/>`_ - Links to the ScyllaDB Download Center
* `Install ScyllaDB (Binary Packages, Docker, or EC2) <https://www.scylladb.com/download/#core>`_ - Links to the ScyllaDB Download Center
* :doc:`Configure ScyllaDB </getting-started/system-configuration/>`
* :doc:`Run ScyllaDB in a Shared Environment </getting-started/scylla-in-a-shared-environment>`

View File

@@ -20,7 +20,7 @@ Install ScyllaDB
Keep your versions up to date; the two latest versions are supported. Always install the latest patches for your version.
* Download and install ScyllaDB Server, Drivers and Tools in `Scylla Download Center <https://www.scylladb.com/download/#server/>`_
* Download and install ScyllaDB Server, Drivers and Tools in `ScyllaDB Download Center <https://www.scylladb.com/download/#core>`_
* :doc:`ScyllaDB Web Installer for Linux <scylla-web-installer>`
* :doc:`ScyllaDB Unified Installer (relocatable executable) <unified-installer>`
* :doc:`Air-gapped Server Installation <air-gapped-install>`

View File

@@ -4,7 +4,7 @@ ScyllaDB Web Installer for Linux
ScyllaDB Web Installer is a platform-agnostic installation script you can run with ``curl`` to install ScyllaDB on Linux.
See `ScyllaDB Download Center <https://www.scylladb.com/download/#server>`_ for information on manually installing ScyllaDB with platform-specific installation packages.
See `ScyllaDB Download Center <https://www.scylladb.com/download/#core>`_ for information on manually installing ScyllaDB with platform-specific installation packages.
Prerequisites
--------------

View File

@@ -25,11 +25,7 @@ ScyllaDB Open Source
.. note::
Recommended OS and ScyllaDB AMI/Image OS for ScyllaDB Open Source:
- Ubuntu 20.04 for versions 4.6 and later.
- CentOS 7 for versions earlier than 4.6.
The recommended OS for ScyllaDB Open Source is Ubuntu 22.04.
+----------------------------+----------------------------------+-----------------------------+---------+-------+
| Linux Distributions | Ubuntu | Debian | CentOS /| Rocky/|
@@ -37,6 +33,8 @@ ScyllaDB Open Source
+----------------------------+------+------+------+------+------+------+------+-------+-------+---------+-------+
| ScyllaDB Version / Version | 14.04| 16.04| 18.04|20.04 |22.04 | 8 | 9 | 10 | 11 | 7 | 8 |
+============================+======+======+======+======+======+======+======+=======+=======+=========+=======+
| 5.2 | |x| | |x| | |v| | |v| | |v| | |x| | |x| | |v| | |v| | |v| | |v| |
+----------------------------+------+------+------+------+------+------+------+-------+-------+---------+-------+
| 5.1 | |x| | |x| | |v| | |v| | |v| | |x| | |x| | |v| | |v| | |v| | |v| |
+----------------------------+------+------+------+------+------+------+------+-------+-------+---------+-------+
| 5.0 | |x| | |x| | |v| | |v| | |v| | |x| | |x| | |v| | |v| | |v| | |v| |
@@ -63,17 +61,18 @@ ScyllaDB Open Source
+----------------------------+------+------+------+------+------+------+------+-------+-------+---------+-------+
All releases are available as a Docker container, EC2 AMI, and a GCP image (GCP image from version 4.3).
All releases are available as a Docker container, EC2 AMI, and a GCP image (GCP image from version 4.3). Since
version 5.2, the ScyllaDB AMI/Image OS for ScyllaDB Open Source is based on Ubuntu 22.04.
ScyllaDB Enterprise
--------------------
.. note::
Recommended OS and ScyllaDB AMI/Image OS for ScyllaDB Enterprise:
- Ubuntu 20.04 for versions 2021.1 and later.
- CentOS 7 for versions earlier than 2021.1.
The recommended OS for ScyllaDB Enterprise is Ubuntu 22.04.
+----------------------------+-----------------------------------+---------------------------+--------+-------+
| Linux Distributions | Ubuntu | Debian | CentOS/| Rocky/|
@@ -83,7 +82,7 @@ ScyllaDB Enterprise
+============================+======+======+======+======+=======+======+======+======+======+========+=======+
| 2022.2 | |x| | |x| | |v| | |v| | |v| | |x| | |x| | |v| | |v| | |v| | |v| |
+----------------------------+------+------+------+------+-------+------+------+------+------+--------+-------+
| 2022.1 | |x| | |x| | |v| | |v| | |x| | |x| | |x| | |v| | |v| | |v| | |v| |
| 2022.1 | |x| | |x| | |v| | |v| | |v| | |x| | |x| | |v| | |v| | |v| | |v| |
+----------------------------+------+------+------+------+-------+------+------+------+------+--------+-------+
| 2021.1 | |x| | |v| | |v| | |v| | |x| | |x| | |v| | |v| | |x| | |v| | |v| |
+----------------------------+------+------+------+------+-------+------+------+------+------+--------+-------+
@@ -95,4 +94,5 @@ ScyllaDB Enterprise
+----------------------------+------+------+------+------+-------+------+------+------+------+--------+-------+
All releases are available as a Docker container, EC2 AMI, and a GCP image (GCP image from version 2021.1).
All releases are available as a Docker container, EC2 AMI, and a GCP image (GCP image from version 2021.1). Since
version 2023.1, the ScyllaDB AMI/Image OS for ScyllaDB Enterprise is based on Ubuntu 22.04.

View File

@@ -13,7 +13,7 @@
:image: /_static/img/mascots/scylla-docs.svg
:search_box:
The most up-to-date documents for the fastest, best performing, high availability NoSQL database.
New to ScyllaDB? Start `here <https://cloud.docs.scylladb.com/stable/scylladb-basics/>`_!
.. raw:: html
@@ -26,16 +26,7 @@
<div class="grid-x grid-margin-x hs">
.. topic-box::
:title: New to ScyllaDB? Start here!
:link: https://cloud.docs.scylladb.com/stable/scylladb-basics/
:class: large-4
:anchor: ScyllaDB Basics
Learn the essentials of ScyllaDB.
.. topic-box::
:title: Let us manage your DB
:title: ScyllaDB Cloud
:link: https://cloud.docs.scylladb.com
:class: large-4
:anchor: ScyllaDB Cloud Documentation
@@ -43,12 +34,20 @@
Simplify application development with ScyllaDB Cloud - a fully managed database-as-a-service.
.. topic-box::
:title: Manage your own DB
:title: ScyllaDB Enterprise
:link: https://enterprise.docs.scylladb.com
:class: large-4
:anchor: ScyllaDB Enterprise Documentation
Deploy and manage ScyllaDB's most stable enterprise-grade database with premium features and 24/7 support.
.. topic-box::
:title: ScyllaDB Open Source
:link: getting-started
:class: large-4
:anchor: ScyllaDB Open Source and Enterprise Documentation
:anchor: ScyllaDB Open Source Documentation
Deploy and manage your database in your own environment.
Deploy and manage your database in your environment.
.. raw:: html
@@ -59,40 +58,16 @@
<div class="topics-grid topics-grid--products">
<h2 class="topics-grid__title">Our Products</h2>
<h2 class="topics-grid__title">Other Products</h2>
<div class="grid-container full">
<div class="grid-x grid-margin-x">
.. topic-box::
:title: ScyllaDB Enterprise
:link: getting-started
:image: /_static/img/mascots/scylla-enterprise.svg
:class: topic-box--product,large-3,small-6
ScyllaDBs most stable high-performance enterprise-grade NoSQL database.
.. topic-box::
:title: ScyllaDB Open Source
:link: getting-started
:image: /_static/img/mascots/scylla-opensource.svg
:class: topic-box--product,large-3,small-6
A high-performance NoSQL database with a close-to-the-hardware, shared-nothing approach.
.. topic-box::
:title: ScyllaDB Cloud
:link: https://cloud.docs.scylladb.com
:image: /_static/img/mascots/scylla-cloud.svg
:class: topic-box--product,large-3,small-6
A fully managed NoSQL database as a service powered by ScyllaDB Enterprise.
.. topic-box::
:title: ScyllaDB Alternator
:link: https://docs.scylladb.com/stable/alternator/alternator.html
:image: /_static/img/mascots/scylla-alternator.svg
:class: topic-box--product,large-3,small-6
:class: topic-box--product,large-4,small-6
Open source Amazon DynamoDB-compatible API.
@@ -100,7 +75,7 @@
:title: ScyllaDB Monitoring Stack
:link: https://monitoring.docs.scylladb.com
:image: /_static/img/mascots/scylla-monitor.svg
:class: topic-box--product,large-3,small-6
:class: topic-box--product,large-4,small-6
Complete open source monitoring solution for your ScyllaDB clusters.
@@ -108,7 +83,7 @@
:title: ScyllaDB Manager
:link: https://manager.docs.scylladb.com
:image: /_static/img/mascots/scylla-manager.svg
:class: topic-box--product,large-3,small-6
:class: topic-box--product,large-4,small-6
Hassle-free ScyllaDB NoSQL database management for scale-out clusters.
@@ -116,7 +91,7 @@
:title: ScyllaDB Drivers
:link: https://docs.scylladb.com/stable/using-scylla/drivers/
:image: /_static/img/mascots/scylla-drivers.svg
:class: topic-box--product,large-3,small-6
:class: topic-box--product,large-4,small-6
Shard-aware drivers for superior performance.
@@ -124,7 +99,7 @@
:title: ScyllaDB Operator
:link: https://operator.docs.scylladb.com
:image: /_static/img/mascots/scylla-enterprise.svg
:class: topic-box--product,large-3,small-6
:class: topic-box--product,large-4,small-6
Easily run and manage your ScyllaDB cluster on Kubernetes.

View File

@@ -41,14 +41,6 @@ Scylla nodetool repair command supports the following options:
nodetool repair -et 90874935784
nodetool repair --end-token 90874935784
- ``-seq``, ``--sequential`` Use *-seq* to carry out a sequential repair.
For example, a sequential repair of all keyspaces on a node:
::
nodetool repair -seq
- ``-hosts`` ``--in-hosts`` syncs the **repair master** data subset only between a list of nodes, using host ID or Address. The list *must* include the **repair master**.

View File

@@ -3,6 +3,7 @@
* endpoint_snitch - ``grep endpoint_snitch /etc/scylla/scylla.yaml``
* Scylla version - ``scylla --version``
* Authenticator - ``grep authenticator /etc/scylla/scylla.yaml``
* consistent_cluster_management - ``grep consistent_cluster_management /etc/scylla/scylla.yaml``
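The checks above can be combined into a single pass. A sketch, run here against a scratch file with made-up placeholder values so it is safe to execute anywhere; on a real node, set ``CONF=/etc/scylla/scylla.yaml`` (and get the version separately via ``scylla --version``):

```shell
# Collect the yaml-based settings listed above in one grep. The scratch
# file and its values are placeholders; on a real node point CONF at the
# live /etc/scylla/scylla.yaml instead.
CONF=$(mktemp)
cat > "$CONF" <<'EOF'
endpoint_snitch: GossipingPropertyFileSnitch
authenticator: AllowAllAuthenticator
consistent_cluster_management: true
EOF
grep -E '^(endpoint_snitch|authenticator|consistent_cluster_management):' "$CONF"
```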
.. Note::

View File

@@ -119,6 +119,7 @@ Add New DC
* **listen_address** - IP address that Scylla used to connect to the other Scylla nodes in the cluster.
* **endpoint_snitch** - Set the selected snitch.
* **rpc_address** - Address for client connections (Thrift, CQL).
* **consistent_cluster_management** - set to the same value as used by your existing nodes.
The parameters ``seeds``, ``cluster_name`` and ``endpoint_snitch`` need to match the existing cluster.
@@ -200,6 +201,11 @@ Add New DC
#. If you are using Scylla Monitoring, update the `monitoring stack <https://monitoring.docs.scylladb.com/stable/install/monitoring_stack.html#configure-scylla-nodes-from-files>`_ to monitor it. If you are using Scylla Manager, make sure you install the `Manager Agent <https://manager.docs.scylladb.com/stable/install-scylla-manager-agent.html>`_ and Manager can access the new DC.
Handling Failures
=================
If one of the new nodes starts bootstrapping but then fails in the middle, e.g. due to a power loss, you can retry the bootstrap (by restarting the node). If you don't want to retry, or the node refuses to boot on subsequent attempts, consult the :doc:`Handling Membership Change Failures document</operating-scylla/procedures/cluster-management/handling-membership-change-failures>`.
Configure the Client not to Connect to the New DC
-------------------------------------------------

View File

@@ -54,6 +54,8 @@ Procedure
* **seeds** - Specifies the IP address of an existing node in the cluster. The new node will use this IP to connect to the cluster and learn the cluster topology and state.
* **consistent_cluster_management** - set to the same value as used by your existing nodes.
.. note::
In earlier versions of ScyllaDB, seed nodes assisted in gossip. Starting with Scylla Open Source 4.3 and Scylla Enterprise 2021.1, the seed concept in gossip has been removed. If you are using an earlier version of ScyllaDB, you need to configure the seeds parameter in the following way:
@@ -117,3 +119,8 @@ Procedure
You don't need to restart the Scylla service after modifying the seeds list in ``scylla.yaml``.
#. If you are using Scylla Monitoring, update the `monitoring stack <https://monitoring.docs.scylladb.com/stable/install/monitoring_stack.html#configure-scylla-nodes-from-files>`_ to monitor it. If you are using Scylla Manager, make sure you install the `Manager Agent <https://manager.docs.scylladb.com/stable/install-scylla-manager-agent.html>`_ and that Manager can access it.
Handling Failures
=================
If the node starts bootstrapping but then fails in the middle, e.g., due to a power loss, you can retry the bootstrap (by restarting the node). If you don't want to retry, or the node refuses to boot on subsequent attempts, consult the :doc:`Handling Membership Change Failures document</operating-scylla/procedures/cluster-management/handling-membership-change-failures>`.

View File

@@ -70,6 +70,7 @@ the file can be found under ``/etc/scylla/``
- **listen_address** - IP address that Scylla uses to connect to other Scylla nodes in the cluster
- **endpoint_snitch** - Set the selected snitch
- **rpc_address** - Address for client connection (Thrift, CQL)
- **consistent_cluster_management** - ``true`` by default, can be set to ``false`` if you don't want to use Raft for consistent schema management in this cluster (will be mandatory in later versions). Check the :doc:`Raft in ScyllaDB document</architecture/raft/>` to learn more.
3. In the ``cassandra-rackdc.properties`` file, edit the rack and data center information.
The file can be found under ``/etc/scylla/``.

View File

@@ -26,6 +26,7 @@ The file can be found under ``/etc/scylla/``
- **listen_address** - IP address that Scylla uses to connect to other Scylla nodes in the cluster
- **endpoint_snitch** - Set the selected snitch
- **rpc_address** - Address for client connection (Thrift, CQL)
- **consistent_cluster_management** - ``true`` by default, can be set to ``false`` if you don't want to use Raft for consistent schema management in this cluster (will be mandatory in later versions). Check the :doc:`Raft in ScyllaDB document</architecture/raft/>` to learn more.
3. This step needs to be done **only** if you are using the **GossipingPropertyFileSnitch**. If not, skip this step.
In the ``cassandra-rackdc.properties`` file, edit the parameters listed below.

View File

@@ -63,6 +63,7 @@ Perform the following steps for each node in the new cluster:
* **rpc_address** - Address for client connection (Thrift, CQL).
* **broadcast_address** - The IP address a node tells other nodes in the cluster to contact it by.
* **broadcast_rpc_address** - Default: unset. The RPC address to broadcast to drivers and other Scylla nodes. It cannot be set to 0.0.0.0. If left blank, it will be set to the value of ``rpc_address``. If ``rpc_address`` is set to 0.0.0.0, ``broadcast_rpc_address`` must be explicitly configured.
* **consistent_cluster_management** - ``true`` by default, can be set to ``false`` if you don't want to use Raft for consistent schema management in this cluster (will be mandatory in later versions). Check the :doc:`Raft in ScyllaDB document</architecture/raft/>` to learn more.
#. After you have installed and configured Scylla and edited ``scylla.yaml`` file on all the nodes, start the node specified with the ``seeds`` parameter. Then start the rest of the nodes in your cluster, one at a time, using
``sudo systemctl start scylla-server``.

View File

@@ -0,0 +1,204 @@
Handling Cluster Membership Change Failures
*******************************************
A failure may happen in the middle of a cluster membership change (that is, bootstrap, decommission, removenode, or replace), such as a loss of power. If that happens, you should ensure that the cluster is brought back to a consistent state as soon as possible. Further membership changes might be impossible until you do so.
For example, a node that crashed in the middle of decommission might leave the cluster in a state where it considers the node to still be a member, but the node itself will refuse to restart and communicate with the cluster. This particular case is very unlikely - it requires a specifically timed crash to happen, after the data streaming phase of decommission finishes but before the node commits that it left. But if it happens, you won't be able to bootstrap other nodes (they will try to contact the partially-decommissioned node and fail) until you remove the remains of the node that crashed.
---------------------------
Handling a Failed Bootstrap
---------------------------
If a failure happens when trying to bootstrap a new node to the cluster, you can try bootstrapping the node again by restarting it.
If the failure persists or you decided that you don't want to bootstrap the node anymore, follow the instructions in the :ref:`cleaning up after a failed membership change <cleaning-up-after-change>` section to remove the remains of the bootstrapping node. You can then clear the node's data directories and attempt to bootstrap it again.
------------------------------
Handling a Failed Decommission
------------------------------
There are two cases.
Most likely the failure happened during the data repair/streaming phase - before the node tried to leave the token ring. Look for a log message containing "leaving token ring" in the logs of the node that you tried to decommission. For example:
.. code-block:: console
INFO 2023-03-14 13:08:38,323 [shard 0] storage_service - decommission[5b2e752e-964d-4f36-871f-254491f4e8cc]: leaving token ring
If the message is **not** present, the failure happened before the node tried to leave the token ring. In that case you can simply restart the node and attempt to decommission it again.
If the message is present, the node attempted to leave the token ring, but it might have left the cluster only partially before the failure. **Do not try to restart the node**. Instead, you must make sure that the node is dead and remove any leftovers using the :doc:`removenode operation </operating-scylla/nodetool-commands/removenode/>`. See :ref:`cleaning up after a failed membership change <cleaning-up-after-change>`. Trying to restart the node after such failure results in unpredictable behavior - it may restart normally, it may refuse to restart, or it may even try to rebootstrap.
If you don't have access to the node's logs anymore, assume the second case (the node might have attempted to leave the token ring): **do not try to restart the node**; instead, follow the :ref:`cleaning up after a failed membership change <cleaning-up-after-change>` section.
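The log check described above can be scripted. The sketch below is illustrative, not part of the official procedure: it writes a sample log line to a local file (on a real node you would capture the log instead, e.g. with ``journalctl _COMM=scylla``), then branches on the presence of the "leaving token ring" message.

```shell
# Illustrative only: create a sample log so this sketch is self-contained.
# On a real node, capture the log instead, e.g.:
#   journalctl _COMM=scylla > decommission.log
printf '%s\n' \
  'INFO  2023-03-14 13:08:38,323 [shard 0] storage_service - decommission[5b2e752e-964d-4f36-871f-254491f4e8cc]: leaving token ring' \
  > decommission.log

if grep -q 'leaving token ring' decommission.log; then
  echo 'node may have partially left the cluster: do NOT restart it; clean up with removenode'
else
  echo 'safe to restart the node and retry the decommission'
fi
```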
----------------------------
Handling a Failed Removenode
----------------------------
Simply retry the removenode operation.
If you somehow lost the host ID of the node that you tried to remove, follow the instructions in :ref:`cleaning up after a failed membership change <cleaning-up-after-change>`.
--------------------------
Handling a Failed Replace
--------------------------
Replace is a special case of bootstrap, but the bootstrapping node tries to take the place of another dead node. You can retry a failed replace operation by restarting the replacing node.
If the failure persists or you decided that you don't want to perform the replace anymore, follow the instructions in :ref:`cleaning up after a failed membership change <cleaning-up-after-change>` section to remove the remains of the replacing node. You can then clear the node's data directories and attempt to replace again. Alternatively, you can remove the dead node which you initially tried to replace using :doc:`removenode </operating-scylla/nodetool-commands/removenode/>`, and perform a regular bootstrap.
.. _cleaning-up-after-change:
--------------------------------------------
Cleaning up after a Failed Membership Change
--------------------------------------------
After a failed membership change, the cluster may contain remains of a node that tried to leave or join - other nodes may consider the node a member, possibly in a transitioning state. It is important to remove any such "ghost" members. Their presence may reduce the cluster's availability or performance, or prevent further membership changes.
You need to determine the host IDs of any potential ghost members, then remove them using the :doc:`removenode operation </operating-scylla/nodetool-commands/removenode/>`. Note that after a failed replace, there may be two different host IDs that you'll want to find and run ``removenode`` on: the new replacing node and the old node that you tried to replace. (Or you can remove the new node only, then try to replace the old node again.)
Step One: Determining Host IDs of Ghost Members
===============================================
* After a failed bootstrap, you need to determine the host ID of the node that tried to bootstrap, if it managed to generate a host ID (it might not have chosen the host ID yet if it failed very early in the procedure, in which case there's nothing to remove). Look for a message containing ``system_keyspace - Setting local host id to`` in the node's logs, which will contain the node's host ID. For example: ``system_keyspace - Setting local host id to f180b78b-6094-434d-8432-7327f4d4b38d``. If you don't have access to the node's logs, read the generic method below.
* After a failed decommission, you need to determine the host ID of the node that tried to decommission. You can search the node's logs as in the failed bootstrap case (see above), or you can use the generic method below.
* After a failed removenode, you need to determine the host ID of the node that you tried to remove. You should already have it, since executing a removenode requires the host ID in the first place. But if you lost it somehow, read the generic method below.
* After a failed replace, you need to determine the host ID of the replacing node. Search the node's logs as in the failed bootstrap case (see above), or you can use the generic method below. You may also want to determine the host ID of the replaced node - either to attempt replacing it again after removing the remains of the previous replacing node, or to remove it using :doc:`nodetool removenode </operating-scylla/nodetool-commands/removenode/>`. You should already have the host ID of the replaced node if you used the ``replace_node_first_boot`` option to perform the replace.
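The log search described in the bullets above can be sketched as follows. The log content and the ``node.log`` path are illustrative; on a real node you would search the journal (e.g. ``journalctl _COMM=scylla``) for the same message.

```shell
# Illustrative only: a sample log containing the host-ID message quoted above.
printf '%s\n' \
  'INFO  2023-03-14 13:00:01,000 [shard 0] init - starting' \
  'INFO  2023-03-14 13:00:02,000 [shard 0] system_keyspace - Setting local host id to f180b78b-6094-434d-8432-7327f4d4b38d' \
  > node.log

# Extract the host ID (the last field of the matched message).
host_id=$(grep -o 'Setting local host id to [0-9a-f-]*' node.log | awk '{print $NF}')
echo "$host_id"
```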
If you cannot determine the ghost members' host ID using the suggestions above, use the method described below. The approach differs depending on whether Raft is enabled in your cluster.
.. tabs::
.. group-tab:: Raft enabled
#. Make sure there are no ongoing membership changes.
#. Execute the following CQL query on one of your nodes to retrieve the Raft group 0 ID:
.. code-block:: cql
select value from system.scylla_local where key = 'raft_group0_id';
For example:
.. code-block:: cql
cqlsh> select value from system.scylla_local where key = 'raft_group0_id';
value
--------------------------------------
607fef80-c276-11ed-a6f6-3075f294cc65
#. Use the obtained Raft group 0 ID to query the set of all cluster members' host IDs (which includes the ghost members), by executing the following query:
.. code-block:: cql
select server_id from system.raft_state where group_id = <group0_id>;
Replace ``<group0_id>`` with the group 0 ID that you obtained. For example:
.. code-block:: cql
cqlsh> select server_id from system.raft_state where group_id = 607fef80-c276-11ed-a6f6-3075f294cc65;
server_id
--------------------------------------
26a9badc-6e96-4b86-a8df-5173e5ab47fe
7991e7f5-692e-45a0-8ae5-438be5bc7c4f
aff11c6d-fbe7-4395-b7ca-3912d7dba2c6
#. Execute the following CQL query to obtain the host IDs of all token ring members:
.. code-block:: cql
select peer, host_id, up from system.cluster_status;
For example:
.. code-block:: cql
cqlsh> select peer, host_id, up from system.cluster_status;
peer | host_id | up
-----------+--------------------------------------+-------
127.0.0.3 | null | False
127.0.0.1 | 26a9badc-6e96-4b86-a8df-5173e5ab47fe | True
127.0.0.2 | 7991e7f5-692e-45a0-8ae5-438be5bc7c4f | True
The output of this query is similar to the output of ``nodetool status``.
We included the ``up`` column to see which nodes are down and the ``peer`` column to see their IP addresses.
In this example, one of the nodes tried to decommission and crashed as soon as it left the token ring but before it left the Raft group. Its entry will show up in ``system.cluster_status`` queries with ``host_id = null``, like above, until the cluster is restarted.
#. A host ID belongs to a ghost member if:
* It appears in the ``system.raft_state`` query but not in the ``system.cluster_status`` query,
* Or it appears in the ``system.cluster_status`` query but does not correspond to any remaining node in your cluster.
In our example, the ghost member's host ID was ``aff11c6d-fbe7-4395-b7ca-3912d7dba2c6`` because it appeared in the ``system.raft_state`` query but not in the ``system.cluster_status`` query.
If you're unsure whether a given row in the ``system.cluster_status`` query corresponds to a node in your cluster, you can connect to each node in the cluster and execute ``select host_id from system.local`` (or search the node's logs) to obtain that node's host ID, collecting the host IDs of all nodes in your cluster. Then check if each host ID from the ``system.cluster_status`` query appears in your collected set; if not, it's a ghost member.
A good rule of thumb is to look at the members marked as down (``up = False`` in ``system.cluster_status``) - ghost members are eventually marked as down by the remaining members of the cluster. But remember that a real member might also be marked as down if it was shut down or partitioned away from the rest of the cluster. If in doubt, connect to each node and collect their host IDs, as described in the previous paragraph.
.. group-tab:: Raft disabled
#. Make sure there are no ongoing membership changes.
#. Execute the following CQL query on one of your nodes to obtain the host IDs of all token ring members:
.. code-block:: cql
select peer, host_id, up from system.cluster_status;
For example:
.. code-block:: cql
cqlsh> select peer, host_id, up from system.cluster_status;
peer | host_id | up
-----------+--------------------------------------+-------
127.0.0.3 | 42405b3b-487e-4759-8590-ddb9bdcebdc5 | False
127.0.0.1 | 4e3ee715-528f-4dc9-b10f-7cf294655a9e | True
127.0.0.2 | 225a80d0-633d-45d2-afeb-a5fa422c9bd5 | True
The output of this query is similar to the output of ``nodetool status``.
We included the ``up`` column to see which nodes are down.
In this example, one of the 3 nodes tried to decommission but crashed while it was leaving the token ring. The node is in a partially left state and will refuse to restart, but other nodes still consider it as a normal member. We'll have to use ``removenode`` to clean up after it.
#. A host ID belongs to a ghost member if it appears in the ``system.cluster_status`` query but does not correspond to any remaining node in your cluster.
If you're unsure whether a given row in the ``system.cluster_status`` query corresponds to a node in your cluster, you can connect to each node in the cluster and execute ``select host_id from system.local`` (or search the node's logs) to obtain that node's host ID, collecting the host IDs of all nodes in your cluster. Then check if each host ID from the ``system.cluster_status`` query appears in your collected set; if not, it's a ghost member.
A good rule of thumb is to look at the members marked as down (``up = False`` in ``system.cluster_status``) - ghost members are eventually marked as down by the remaining members of the cluster. But remember that a real member might also be marked as down if it was shut down or partitioned away from the rest of the cluster. If in doubt, connect to each node and collect their host IDs, as described in the previous paragraph.
In our example, the ghost member's host ID is ``42405b3b-487e-4759-8590-ddb9bdcebdc5`` because it is the only member marked as down and we can verify that the other two rows appearing in ``system.cluster_status`` belong to the remaining 2 nodes in the cluster.
In some cases, even after a failed topology change, there may be no ghost members left - for example, if a bootstrapping node crashed very early in the procedure or a decommissioning node crashed after it committed the membership change but before it finalized its own shutdown steps.
If any ghost members are present, proceed to the next step.
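The comparison in both cases boils down to a set difference: host IDs that appear in one listing but not in the other are ghost candidates. A minimal sketch, using the sample host IDs from the Raft-enabled example above:

```shell
# Host IDs from the sample system.raft_state query (sorted for comm).
printf '%s\n' \
  26a9badc-6e96-4b86-a8df-5173e5ab47fe \
  7991e7f5-692e-45a0-8ae5-438be5bc7c4f \
  aff11c6d-fbe7-4395-b7ca-3912d7dba2c6 | sort > raft_members.txt

# Host IDs from the sample system.cluster_status query.
printf '%s\n' \
  26a9badc-6e96-4b86-a8df-5173e5ab47fe \
  7991e7f5-692e-45a0-8ae5-438be5bc7c4f | sort > ring_members.txt

# Lines present only in raft_members.txt are ghost-member candidates.
ghosts=$(comm -23 raft_members.txt ring_members.txt)
echo "$ghosts"
```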
Step Two: Removing the Ghost Members
====================================
Given the host IDs of ghost members, you can remove them using ``removenode``; follow the :doc:`documentation for removenode operation </operating-scylla/nodetool-commands/removenode/>`.
If you execute ``removenode`` too quickly after a failed membership change, an error similar to the following may appear:
.. code-block:: console
nodetool: Scylla API server HTTP POST to URL '/storage_service/remove_node' failed: seastar::rpc::remote_verb_error (node_ops_cmd_check: Node 127.0.0.2 rejected node_ops_cmd=removenode_abort from node=127.0.0.1 with ops_uuid=0ba0a5ab-efbd-4801-a31c-034b5f55487c, pending_node_ops={b47523f2-de6a-4c38-8490-39127dba6b6a}, pending node ops is in progress)
In that case, wait two minutes and then try ``removenode`` again.
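If you want to automate the retry, a hedged sketch of the idea follows. ``run_removenode`` is a stand-in for ``nodetool removenode <host-id>``, mocked here to be rejected once and then succeed so the sketch is self-contained; on a real cluster the wait between attempts should be on the order of two minutes, not one second.

```shell
attempts=0
run_removenode() {
  # Stand-in for: nodetool removenode <host-id>
  # Simulates the "pending node ops" rejection on the first attempt.
  attempts=$((attempts + 1))
  [ "$attempts" -ge 2 ]
}

until run_removenode; do
  echo 'removenode rejected (pending node ops?); waiting before retry'
  sleep 1   # shortened for illustration; wait ~2 minutes on a real cluster
done
echo "removenode succeeded after $attempts attempt(s)"
```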
If ``removenode`` returns an error like:
.. code-block:: console
nodetool: Scylla API server HTTP POST to URL '/storage_service/remove_node' failed: std::runtime_error (removenode[12e7e05b-d1ae-4978-b6a6-de0066aa80d8]: Host ID 42405b3b-487e-4759-8590-ddb9bdcebdc5 not found in the cluster)
and you're sure that you're providing the correct Host ID, it means that the member was already removed and you don't have to clean up after it.

View File

@@ -25,6 +25,7 @@ Cluster Management Procedures
Safely Shutdown Your Cluster <safe-shutdown>
Safely Restart Your Cluster <safe-start>
Cluster Membership Change <membership-changes>
Handling Membership Change Failures <handling-membership-change-failures>
repair-based-node-operation
.. panel-box::
@@ -80,6 +81,8 @@ Cluster Management Procedures
* :doc:`Cluster Membership Change Notes </operating-scylla/procedures/cluster-management/membership-changes/>`
* :doc:`Handling Membership Change Failures </operating-scylla/procedures/cluster-management/handling-membership-change-failures>`
* :ref:`Add Bigger Nodes to a Cluster <add-bigger-nodes-to-a-cluster>`
* :doc:`Repair Based Node Operations (RBNO) </operating-scylla/procedures/cluster-management/repair-based-node-operation>`

View File

@@ -49,6 +49,11 @@ Removing a Running Node
.. include:: /rst_include/clean-data-code.rst
Handling Failures
-----------------
If ``nodetool decommission`` starts executing but then fails in the middle, e.g., due to a power loss, consult the :doc:`Handling Membership Change Failures document</operating-scylla/procedures/cluster-management/handling-membership-change-failures>`.
----------------------------
Removing an Unavailable Node
----------------------------
@@ -81,7 +86,6 @@ the ``nodetool removenode`` operation will fail. To ensure successful operation
``nodetool removenode`` (not required when :doc:`Repair Based Node Operations (RBNO) <repair-based-node-operation>` for ``removenode``
is enabled).
Additional Information
----------------------
* :doc:`Nodetool Reference </operating-scylla/nodetool>`

View File

@@ -25,6 +25,7 @@ Login to one of the nodes in the cluster with (UN) status, collect the following
* seeds - ``cat /etc/scylla/scylla.yaml | grep seeds:``
* endpoint_snitch - ``cat /etc/scylla/scylla.yaml | grep endpoint_snitch``
* Scylla version - ``scylla --version``
* consistent_cluster_management - ``grep consistent_cluster_management /etc/scylla/scylla.yaml``
Procedure
---------

View File

@@ -66,6 +66,8 @@ Procedure
- **rpc_address** - Address for client connection (Thrift, CQL)
- **consistent_cluster_management** - set to the same value as used by your existing nodes.
#. Add the ``replace_node_first_boot`` parameter to the ``scylla.yaml`` config file on the new node. This line can be added to any place in the config file. After a successful node replacement, there is no need to remove it from the ``scylla.yaml`` file. (Note: The obsolete parameters "replace_address" and "replace_address_first_boot" are not supported and should not be used). The value of the ``replace_node_first_boot`` parameter should be the Host ID of the node to be replaced.
For example (using the Host ID of the failed node from above):
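A hedged illustration of the resulting ``scylla.yaml`` line; the value shown is a placeholder, not a real host ID from this procedure:

```yaml
# Placeholder value - substitute the Host ID of the dead node you are replacing.
replace_node_first_boot: <Host ID of the dead node>
```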
@@ -150,6 +152,12 @@ Procedure
.. note::
When :doc:`Repair Based Node Operations (RBNO) <repair-based-node-operation>` for **replace** is enabled, there is no need to rerun repair.
Handling Failures
-----------------
If the new node starts and begins the replace operation but then fails in the middle, e.g., due to a power loss, you can retry the replace (by restarting the node). If you don't want to retry, or the node refuses to boot on subsequent attempts, consult the :doc:`Handling Membership Change Failures document</operating-scylla/procedures/cluster-management/handling-membership-change-failures>`.
------------------------------
Setup RAID Following a Restart
------------------------------

View File

@@ -198,7 +198,7 @@ By default ScyllaDB will try to use cache, but since the data wont be used ag
As a consequence it can lead to bad latency on operational workloads due to increased rate of cache misses.
To prevent this problem, queries from analytical workloads can bypass the cache using the bypass cache option.
:ref:`Bypass Cache <select-statement>` is only available with Scylla Enterprise.
See :ref:`Bypass Cache <bypass-cache>` for more information.
Batching
========

View File

@@ -68,7 +68,7 @@ Gracefully stop the node
.. code:: sh
sudo service scylla-enterprise-server stop
sudo service scylla-server stop
Download and install the new release
------------------------------------
@@ -92,13 +92,13 @@ Start the node
.. code:: sh
sudo service scylla-enterprise-server start
sudo service scylla-server start
Validate
--------
1. Check cluster status with ``nodetool status`` and make sure **all** nodes, including the one you just upgraded, are in UN status.
2. Use ``curl -X GET "http://localhost:10000/storage_service/scylla_release_version"`` to check the ScyllaDB version.
3. Check scylla-enterprise-server log (by ``journalctl _COMM=scylla``) and ``/var/log/syslog`` to validate there are no errors.
3. Check scylla-server log (by ``journalctl _COMM=scylla``) and ``/var/log/syslog`` to validate there are no errors.
4. Check again after 2 minutes to validate no new issues are introduced.
Once you are sure the node upgrade is successful, move to the next node in the cluster.
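The version check in the validation steps above can be scripted. In this sketch the API response is mocked with a literal so it is self-contained, and ``2022.1.0`` is an assumed target version; on a live node you would substitute the ``curl`` call shown in the comment.

```shell
# On a live node, the response would come from:
#   curl -s -X GET "http://localhost:10000/storage_service/scylla_release_version"
expected="2022.1.0"
reported='"2022.1.0"'                      # mocked API response (a JSON string)
version=$(echo "$reported" | tr -d '"')    # strip the JSON quotes

if [ "$version" = "$expected" ]; then
  echo "node reports $version - upgrade looks good"
else
  echo "version mismatch: expected $expected, got $version" >&2
fi
```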
@@ -130,7 +130,7 @@ Gracefully shutdown ScyllaDB
.. code:: sh
nodetool drain
sudo service scylla-enterprise-server stop
sudo service scylla-server stop
Downgrade to the previous release
----------------------------------
@@ -164,7 +164,7 @@ Start the node
.. code:: sh
sudo service scylla-enterprise-server start
sudo service scylla-server start
Validate
--------

View File

@@ -114,7 +114,7 @@ New io.conf format was introduced in ScyllaDB 2.3 and 2019.1. If your io.conf do
.. code:: sh
sudo service scylla-enterprise-server start
sudo service scylla-server start
Validate
--------
@@ -154,7 +154,7 @@ Gracefully shutdown ScyllaDB
.. code:: sh
nodetool drain
sudo service scylla-enterprise-server stop
sudo service scylla-server stop
Download and install the old release
------------------------------------

View File

@@ -66,7 +66,7 @@ Gracefully stop the node
.. code:: sh
sudo service scylla-enterprise-server stop
sudo service scylla-server stop
Download and install the new release
------------------------------------

View File

@@ -16,13 +16,13 @@ Start the node
.. code:: sh
sudo service scylla-enterprise-server start
sudo service scylla-server start
Validate
--------
#. Check cluster status with ``nodetool status`` and make sure **all** nodes, including the one you just upgraded, are in UN status.
#. Use ``curl -X GET "http://localhost:10000/storage_service/scylla_release_version"`` to check the ScyllaDB version.
#. Check scylla-enterprise-server log (by ``journalctl _COMM=scylla``) and ``/var/log/syslog`` to validate there are no errors.
#. Check scylla-server log (by ``journalctl _COMM=scylla``) and ``/var/log/syslog`` to validate there are no errors.
#. Check again after 2 minutes to validate no new issues are introduced.
Once you are sure the node upgrade is successful, move to the next node in the cluster.
@@ -54,7 +54,7 @@ Gracefully shutdown ScyllaDB
.. code:: sh
nodetool drain
sudo service scylla-enterprise-server stop
sudo service scylla-server stop
Downgrade to the previous release
----------------------------------
@@ -88,7 +88,7 @@ Start the node
.. code:: sh
sudo service scylla-enterprise-server start
sudo service scylla-server start
Validate
--------

View File

@@ -7,7 +7,7 @@ This document is a step-by-step procedure for upgrading from ScyllaDB Enterprise
Applicable Versions
===================
This guide covers upgrading ScyllaDB Enterprise from version 2021.1.x to ScyllaDB Enterprise version 2022.1.y on |OS|. See :doc:`OS Support by Platform and Version </getting-started/os-support>` for information about supported versions.
This guide covers upgrading ScyllaDB Enterprise from version **2021.1.8** or later to ScyllaDB Enterprise version 2022.1.y on |OS|. See :doc:`OS Support by Platform and Version </getting-started/os-support>` for information about supported versions.
Upgrade Procedure
=================
@@ -69,7 +69,7 @@ Gracefully stop the node
.. code:: sh
sudo service scylla-enterprise-server stop
sudo service scylla-server stop
Download and install the new release
------------------------------------

View File

@@ -36,13 +36,13 @@ A new io.conf format was introduced in Scylla 2.3 and 2019.1. If your io.conf do
.. code:: sh
sudo service scylla-enterprise-server start
sudo service scylla-server start
Validate
--------
#. Check cluster status with ``nodetool status`` and make sure **all** nodes, including the one you just upgraded, are in UN status.
#. Use ``curl -X GET "http://localhost:10000/storage_service/scylla_release_version"`` to check the ScyllaDB version.
#. Check scylla-enterprise-server log (by ``journalctl _COMM=scylla``) and ``/var/log/syslog`` to validate there are no errors.
#. Check scylla-server log (by ``journalctl _COMM=scylla``) and ``/var/log/syslog`` to validate there are no errors.
#. Check again after two minutes to validate no new issues are introduced.
Once you are sure the node upgrade is successful, move to the next node in the cluster.
@@ -75,7 +75,7 @@ Gracefully shutdown ScyllaDB
.. code:: sh
nodetool drain
sudo service scylla-enterprise-server stop
sudo service scylla-server stop
Download and install the old release
------------------------------------
@@ -120,7 +120,7 @@ Start the node
.. code:: sh
sudo service scylla-enterprise-server start
sudo service scylla-server start
Validate
--------

View File

@@ -8,8 +8,8 @@ Upgrading ScyllaDB images requires updating:
* Underlying OS packages. Starting with ScyllaDB 4.6, each ScyllaDB version includes a list of 3rd party and
OS packages tested with the ScyllaDB release. The list depends on the base OS:
* ScyllaDB Open Source **4.4** and ScyllaDB Enterprise **2020.1** or earlier are based on **CentOS 7**.
* ScyllaDB Open Source **4.5** and ScyllaDB Enterprise **2021.1** or later are based on **Ubuntu 20.04**.
* ScyllaDB Open Source **5.0 and 5.1** and ScyllaDB Enterprise **2021.1, 2022.1, and 2022.2** are based on **Ubuntu 20.04**.
* ScyllaDB Open Source **5.2** and ScyllaDB Enterprise **2023.1** are based on **Ubuntu 22.04**.
If you're running ScyllaDB Open Source 5.0 or later or ScyllaDB Enterprise 2021.1.10 or later, you can
automatically update 3rd party and OS packages together with the ScyllaDB packages - by running one command.

View File

@@ -6,10 +6,10 @@ Upgrade ScyllaDB
:titlesonly:
:hidden:
ScyllaDB Enterprise <upgrade-enterprise/index>
ScyllaDB Open Source <upgrade-opensource/index>
ScyllaDB Open Source to ScyllaDB Enterprise <upgrade-to-enterprise/index>
ScyllaDB AMI <ami-upgrade>
ScyllaDB Enterprise <https://enterprise.docs.scylladb.com/enterprise/upgrade/upgrade-enterprise/index.html>
.. raw:: html
@@ -23,14 +23,14 @@ Upgrade ScyllaDB
Procedures for upgrading Scylla.
* :doc:`Upgrade ScyllaDB Enterprise <upgrade-enterprise/index>`
* :doc:`Upgrade ScyllaDB Open Source <upgrade-opensource/index>`
* :doc:`Upgrade from ScyllaDB Open Source to Scylla Enterprise <upgrade-to-enterprise/index>`
* :doc:`Upgrade ScyllaDB AMI <ami-upgrade>`
* `Upgrade ScyllaDB Enterprise <https://enterprise.docs.scylladb.com/enterprise/upgrade/upgrade-enterprise/index.html>`_
.. raw:: html

View File

@@ -1,17 +0,0 @@
.. include:: /upgrade/upgrade-enterprise/_common/gossip_generation_bug_warning.rst
.. note::
Scylla Enterprise 2019.1.6 added a new configuration to restrict the memory usage of cartesian product IN queries.
If you are using IN in SELECT operations and hitting a *"cartesian product size ... is greater than maximum"* error, you can either update the query (recommended) or bypass the warning temporarily by adding the following parameters to *scylla.yaml*:
* *max_clustering_key_restrictions_per_query: 1000*
* *max_partition_key_restrictions_per_query: 1000*
The higher the values, the more likely you are to hit an out-of-memory issue.
.. note::
Scylla Enterprise 2019.1.8 added a new configuration to restrict the memory usage of reverse queries.
If you are using reverse queries and hitting an error *"Aborting reverse partition read because partition ... is larger than the maximum safe size of ... for reversible partitions"* see the :doc:`reverse queries FAQ section </troubleshooting/reverse-queries>`.

View File

@@ -1,4 +0,0 @@
.. include:: /upgrade/upgrade-enterprise/_common/gossip_generation_bug_warning.rst
.. include:: /upgrade/upgrade-enterprise/_common/mv_si_rebuild_warning.rst

View File

@@ -1,10 +0,0 @@
.. note:: The note is only useful when CDC is GA supported in the target Scylla. Execute the following commands one node at a time, moving to the next node only **after** the upgrade procedure has completed successfully.
.. warning::
If you are using CDC and upgrading Scylla 2020.1 to 2021.1, please review the API updates in :doc:`querying CDC streams </using-scylla/cdc/cdc-querying-streams>` and :doc:`CDC stream generations </using-scylla/cdc/cdc-stream-generations>`.
In particular, you should update applications that use CDC according to :ref:`CDC Upgrade notes <scylla-4-3-to-4-4-upgrade>` **before** upgrading the cluster to 2021.1.
If you are using CDC and upgrading from pre 2020.1 version to 2020.1, note the :doc:`upgrading from experimental CDC </kb/cdc-experimental-upgrade>`.
.. include:: /upgrade/upgrade-enterprise/_common/mv_si_rebuild_warning.rst

View File

@@ -1,6 +0,0 @@
.. note:: The note is only useful when CDC is GA supported in the target ScyllaDB. Execute the following commands one node at a time, moving to the next node only **after** the upgrade procedure has completed successfully.
.. warning::
If you are using CDC and upgrading ScyllaDB 2021.1 to 2022.1, please review the API updates in :doc:`querying CDC streams </using-scylla/cdc/cdc-querying-streams>` and :doc:`CDC stream generations </using-scylla/cdc/cdc-stream-generations>`.
In particular, you should update applications that use CDC according to :ref:`CDC Upgrade notes <scylla-4-3-to-4-4-upgrade>` **before** upgrading the cluster to 2022.1.

View File

@@ -1,9 +0,0 @@
.. note::
If **any** of your instances are running Scylla Enterprise 2019.1.6 or earlier, **and** one of your Scylla nodes is up for more than a year, you might have been exposed to issue `#6063 <https://github.com/scylladb/scylla/pull/6083>`_.
One way to check this is by comparing `Generation No` (from `nodetool gossipinfo` output) with the current time in Epoch format (`date +%s`), and checking whether the difference is greater than one year (31536000 seconds).
See `scylla-check-gossiper-generation <https://github.com/scylladb/scylla-code-samples/tree/master/scylla-check-gossiper-generation>`_ for a script to do just that.
If this is the case, do **not** initiate the upgrade process before consulting with Scylla Support for further instructions.
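The comparison described in the note can be sketched as a small shell check. The helper name is illustrative, and obtaining the generation value from `nodetool gossipinfo` is shown only as a comment, since the exact output format varies between versions.

```sh
# Hedged sketch of the generation-age check described above, assuming the
# gossip generation is an epoch timestamp (as shown by `nodetool gossipinfo`).
ONE_YEAR=31536000

is_exposed() {
  # $1 = generation (epoch seconds), $2 = current time (epoch seconds)
  [ $(( $2 - $1 )) -gt "$ONE_YEAR" ]
}

now=$(date +%s)
# In a real check, the generation would come from something like:
#   nodetool gossipinfo | grep -i generation
if is_exposed "$(( now - 2 * ONE_YEAR ))" "$now"; then
  echo "node up for more than one year - consult Scylla Support first"
fi
```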


@@ -1,6 +0,0 @@
.. warning::
If you are using materialized views or secondary indexes created in Scylla 2019.1.x, **and** you updated your schema **while** upgrading to 2020.1.x (x = 7 or lower), you might have a materialized view (MV) inconsistency.
To fix it, rebuild the MV.
It is recommended to avoid schema and topology updates during the upgrade (mixed cluster).


@@ -1,50 +0,0 @@
=============================
Upgrade Scylla Enterprise
=============================
.. toctree::
:hidden:
:titlesonly:
ScyllaDB Enterprise 2022 <upgrade-guide-from-2022.x.y-to-2022.x.z/index>
ScyllaDB Enterprise 2021 <upgrade-guide-from-2021.x.y-to-2021.x.z/index>
ScyllaDB Enterprise 2020 <upgrade-guide-from-2020.x.y-to-2020.x.z/index>
ScyllaDB Enterprise 2019 <upgrade-guide-from-2019.x.y-to-2019.x.z/index>
ScyllaDB Enterprise 2018 <upgrade-guide-from-2018.x.y-to-2018.x.z/index>
ScyllaDB Enterprise 2017 <upgrade-guide-from-2017.x.y-to-2017.x.z/index>
ScyllaDB Enterprise 2022.1 to Scylla Enterprise 2022.2 <upgrade-guide-from-2022.1-to-2022.2/index>
ScyllaDB Enterprise 2021.1 to Scylla Enterprise 2022.1 <upgrade-guide-from-2021.1-to-2022.1/index>
ScyllaDB Enterprise 2020.1 to Scylla Enterprise 2021.1 <upgrade-guide-from-2020.1-to-2021.1/index>
ScyllaDB Enterprise 2019.1 to Scylla Enterprise 2020.1 <upgrade-guide-from-2019.1-to-2020.1/index>
ScyllaDB Enterprise 2018.1 to Scylla Enterprise 2019.1 <upgrade-guide-from-2018.1-to-2019.1/index>
ScyllaDB Enterprise 2017.1 to Scylla Enterprise 2018.1 <upgrade-guide-from-2017.1-to-2018.1/index>
Ubuntu 14.04 to 16.04 <upgrade-guide-from-ubuntu-14-to-16>
.. panel-box::
:title: Upgrade ScyllaDB Enterprise
:id: "getting-started"
:class: my-panel
Procedures for upgrading to a new version of ScyllaDB Enterprise.
Patch Release Upgrade
* :doc:`Upgrade Guide - ScyllaDB Enterprise 2022.x </upgrade/upgrade-enterprise/upgrade-guide-from-2022.x.y-to-2022.x.z/index>`
* :doc:`Upgrade Guide - ScyllaDB Enterprise 2021.x <upgrade-guide-from-2021.x.y-to-2021.x.z/index>`
* :doc:`Upgrade Guide - ScyllaDB Enterprise 2020.x <upgrade-guide-from-2020.x.y-to-2020.x.z/index>`
* :doc:`Upgrade Guide - ScyllaDB Enterprise 2019.x <upgrade-guide-from-2019.x.y-to-2019.x.z/index>`
* :doc:`Upgrade Guide - ScyllaDB Enterprise 2018.x <upgrade-guide-from-2018.x.y-to-2018.x.z/index>`
* :doc:`Upgrade Guide - ScyllaDB Enterprise 2017.x <upgrade-guide-from-2017.x.y-to-2017.x.z/index>`
Major Release Upgrade
* :doc:`Upgrade Guide - From ScyllaDB Enterprise 2022.1 to Scylla Enterprise 2022.2 (minor release) <upgrade-guide-from-2022.1-to-2022.2/index>`
* :doc:`Upgrade Guide - From ScyllaDB Enterprise 2021.1 to Scylla Enterprise 2022.1 <upgrade-guide-from-2021.1-to-2022.1/index>`
* :doc:`Upgrade Guide - From ScyllaDB Enterprise 2020.1 to Scylla Enterprise 2021.1 <upgrade-guide-from-2020.1-to-2021.1/index>`
* :doc:`Upgrade Guide - From ScyllaDB Enterprise 2019.1 to Scylla Enterprise 2020.1 <upgrade-guide-from-2019.1-to-2020.1/index>`
* :doc:`Upgrade Guide - From ScyllaDB Enterprise 2018.1 to Scylla Enterprise 2019.1 <upgrade-guide-from-2018.1-to-2019.1/index>`
* :doc:`Upgrade Guide - From ScyllaDB Enterprise 2017.1 to Scylla Enterprise 2018.1 <upgrade-guide-from-2017.1-to-2018.1/index>`
* :doc:`Upgrade Guide - Ubuntu 14.04 to 16.04 <upgrade-guide-from-ubuntu-14-to-16>`
* :ref:`Upgrade Unified Installer (relocatable executable) install <unified-installed-upgrade>`


@@ -1,39 +0,0 @@
==================================================
Upgrade from Scylla Enterprise 2017.1 to 2018.1
==================================================
.. toctree::
:hidden:
:titlesonly:
Red Hat Enterprise Linux and CentOS <upgrade-guide-from-2017.1-to-2018.1-rpm>
Ubuntu <upgrade-guide-from-2017.1-to-2018.1-ubuntu>
Debian <upgrade-guide-from-2017.1-to-2018.1-debian>
Metrics <metric-update-2017.1-to-2018.1>
.. raw:: html
<div class="panel callout radius animated">
<div class="row">
<div class="medium-3 columns">
<h5 id="getting-started">Upgrade to Scylla Enterprise 2018.1</h5>
</div>
<div class="medium-9 columns">
Upgrade guides are available for:
* :doc:`Upgrade Scylla Enterprise from 2017.1.x to 2018.1.y on Red Hat Enterprise Linux and CentOS <upgrade-guide-from-2017.1-to-2018.1-rpm>`
* :doc:`Upgrade Scylla Enterprise from 2017.1.x to 2018.1.y on Ubuntu <upgrade-guide-from-2017.1-to-2018.1-ubuntu>`
* :doc:`Upgrade Scylla Enterprise from 2017.1.x to 2018.1.y on Debian <upgrade-guide-from-2017.1-to-2018.1-debian>`
* :doc:`Scylla Enterprise Metrics Update - Scylla 2017.1 to 2018.1<metric-update-2017.1-to-2018.1>`
.. raw:: html
</div>
</div>
</div>


@@ -1,291 +0,0 @@
====================================================================
Scylla Enterprise Metric Update - Scylla Enterprise 2017.1 to 2018.1
====================================================================
Updated Metrics
~~~~~~~~~~~~~~~
The following metric names have changed between Scylla Enterprise 2017.1 and 2018.1
=========================================================================== ===========================================================================
2017.1 2018.1
=========================================================================== ===========================================================================
scylla_batchlog_manager_total_operations_total_write_replay_attempts scylla_batchlog_manager_total_write_replay_attempts
scylla_cache_objects_partitions scylla_cache_partitions
scylla_cache_total_operations_concurrent_misses_same_key scylla_cache_concurrent_misses_same_key
scylla_cache_total_operations_evictions scylla_cache_partition_evictions
scylla_cache_total_operations_hits scylla_cache_partition_hits
scylla_cache_total_operations_insertions scylla_cache_partition_insertions
scylla_cache_total_operations_merges scylla_cache_partition_merges
scylla_cache_total_operations_misses scylla_cache_partition_misses
scylla_cache_total_operations_removals scylla_cache_partition_removals
scylla_commitlog_memory_buffer_list_bytes scylla_commitlog_memory_buffer_bytes
scylla_commitlog_memory_total_size scylla_commitlog_disk_total_bytes
scylla_commitlog_queue_length_allocating_segments scylla_commitlog_allocating_segments
scylla_commitlog_queue_length_pending_allocations scylla_commitlog_pending_allocations
scylla_commitlog_queue_length_pending_flushes scylla_commitlog_pending_flushes
scylla_commitlog_queue_length_segments scylla_commitlog_segments
scylla_commitlog_queue_length_unused_segments scylla_commitlog_unused_segments
scylla_commitlog_total_bytes_slack scylla_commitlog_slack
scylla_commitlog_total_bytes_written scylla_commitlog_bytes_written
scylla_commitlog_total_operations_alloc scylla_commitlog_alloc
scylla_commitlog_total_operations_cycle scylla_commitlog_cycle
scylla_commitlog_total_operations_flush scylla_commitlog_flush
scylla_commitlog_total_operations_flush_limit_exceeded scylla_commitlog_flush_limit_exceeded
scylla_commitlog_total_operations_requests_blocked_memory scylla_commitlog_requests_blocked_memory
scylla_compaction_manager_objects_compactions scylla_compaction_manager_compactions
scylla_cql_total_operations_batches scylla_cql_batches
scylla_cql_total_operations_deletes scylla_cql_deletes
scylla_cql_total_operations_inserts scylla_cql_inserts
scylla_cql_total_operations_reads scylla_cql_reads
scylla_cql_total_operations_updates scylla_cql_updates
scylla_database_bytes_total_result_memory scylla_database_total_result_bytes
scylla_database_queue_length_active_reads scylla_database_active_reads
scylla_database_queue_length_active_reads_streaming scylla_database_active_reads
scylla_database_queue_length_active_reads_system_keyspace scylla_database_active_reads
scylla_database_queue_length_queued_reads scylla_database_queued_reads
scylla_database_queue_length_queued_reads_streaming scylla_database_queued_reads
scylla_database_queue_length_queued_reads_system_keyspace scylla_database_queued_reads
scylla_database_queue_length_requests_blocked_memory scylla_database_requests_blocked_memory_current
scylla_database_total_operations_clustering_filter_count scylla_database_clustering_filter_count
scylla_database_total_operations_clustering_filter_fast_path_count scylla_database_clustering_filter_fast_path_count
scylla_database_total_operations_clustering_filter_sstables_checked scylla_database_clustering_filter_sstables_checked
scylla_database_total_operations_clustering_filter_surviving_sstables scylla_database_clustering_filter_surviving_sstables
scylla_database_total_operations_requests_blocked_memory scylla_database_requests_blocked_memory
scylla_database_total_operations_short_data_queries scylla_database_short_data_queries
scylla_database_total_operations_short_mutation_queries scylla_database_short_mutation_queries
scylla_database_total_operations_sstable_read_queue_overloads scylla_database_sstable_read_queue_overloads
scylla_database_total_operations_total_reads scylla_database_total_reads
scylla_database_total_operations_total_reads_failed scylla_database_total_reads_failed
scylla_database_total_operations_total_writes scylla_database_total_writes
scylla_database_total_operations_total_writes_failed scylla_database_total_writes_failed
scylla_database_total_operations_total_writes_timedout scylla_database_total_writes_timedout
scylla_gossip_derive_heart_beat_version scylla_gossip_heart_beat
scylla_http_0_connections_http_connections scylla_httpd_connections_total
scylla_http_0_current_connections_current scylla_httpd_connections_current
scylla_http_0_http_requests_served scylla_httpd_requests_served
scylla_io_queue_delay_commitlog scylla_io_queue_commitlog_delay
scylla_io_queue_delay_compaction scylla_io_queue_compaction_delay
scylla_io_queue_delay_default scylla_io_queue_default_delay
scylla_io_queue_delay_memtable_flush scylla_io_queue_memtable_flush_delay
scylla_io_queue_derive_commitlog scylla_io_queue_commitlog_total_bytes
scylla_io_queue_derive_compaction scylla_io_queue_compaction_total_bytes
scylla_io_queue_derive_default scylla_io_queue_default_total_bytes
scylla_io_queue_derive_memtable_flush scylla_io_queue_memtable_flush_total_bytes
scylla_io_queue_queue_length_commitlog scylla_io_queue_commitlog_queue_length
scylla_io_queue_queue_length_compaction scylla_io_queue_compaction_queue_length
scylla_io_queue_queue_length_default scylla_io_queue_default_queue_length
scylla_io_queue_queue_length_memtable_flush scylla_io_queue_memtable_flush_queue_length
scylla_io_queue_total_operations_commitlog scylla_io_queue_commitlog_total_operations
scylla_io_queue_total_operations_compaction scylla_io_queue_compaction_total_operations
scylla_io_queue_total_operations_default scylla_io_queue_default_total_operations
scylla_io_queue_total_operations_memtable_flush scylla_io_queue_memtable_flush_total_operations
scylla_lsa_bytes_free_space_in_zones scylla_lsa_free_space_in_zones
scylla_lsa_bytes_large_objects_total_space scylla_lsa_large_objects_total_space_bytes
scylla_lsa_bytes_non_lsa_used_space scylla_lsa_non_lsa_used_space_bytes
scylla_lsa_bytes_small_objects_total_space scylla_lsa_small_objects_total_space_bytes
scylla_lsa_bytes_small_objects_used_space scylla_lsa_small_objects_used_space_bytes
scylla_lsa_bytes_total_space scylla_lsa_total_space_bytes
scylla_lsa_bytes_used_space scylla_lsa_used_space_bytes
scylla_lsa_objects_zones scylla_lsa_zones
scylla_lsa_operations_segments_compacted scylla_lsa_segments_compacted
scylla_lsa_operations_segments_migrated scylla_lsa_segments_migrated
scylla_lsa_percent_occupancy scylla_lsa_occupancy
scylla_memory_bytes_dirty scylla_memory_dirty_bytes
scylla_memory_bytes_regular_dirty scylla_memory_regular_dirty_bytes
scylla_memory_bytes_regular_virtual_dirty scylla_memory_regular_virtual_dirty_bytes
scylla_memory_bytes_streaming_dirty scylla_memory_streaming_dirty_bytes
scylla_memory_bytes_streaming_virtual_dirty scylla_memory_streaming_virtual_dirty_bytes
scylla_memory_bytes_system_dirty scylla_memory_system_dirty_bytes
scylla_memory_bytes_system_virtual_dirty scylla_memory_system_virtual_dirty_bytes
scylla_memory_bytes_virtual_dirty scylla_memory_virtual_dirty_bytes
scylla_memory_memory_allocated_memory scylla_memory_allocated_memory
scylla_memory_memory_free_memory scylla_memory_free_memory
scylla_memory_memory_total_memory scylla_memory_total_memory
scylla_memory_objects_malloc scylla_memory_malloc_live_objects
scylla_memory_total_operations_cross_cpu_free scylla_memory_cross_cpu_free_operations
scylla_memory_total_operations_free scylla_memory_free_operations
scylla_memory_total_operations_malloc scylla_memory_malloc_operations
scylla_memory_total_operations_reclaims scylla_memory_reclaims_operations
scylla_memtables_bytes_pending_flushes scylla_memtables_pending_flushes
scylla_memtables_queue_length_pending_flushes scylla_memtables_pending_flushes_bytes
scylla_query_processor_total_operations_statements_prepared scylla_query_processor_statements_prepared
scylla_reactor_derive_aio_read_bytes scylla_reactor_aio_bytes_read
scylla_reactor_derive_aio_write_bytes scylla_reactor_aio_bytes_write
scylla_reactor_derive_busy_ns scylla_reactor_cpu_busy_ns
scylla_reactor_derive_polls scylla_reactor_polls
scylla_reactor_gauge_load scylla_reactor_utilization
scylla_reactor_gauge_queued_io_requests scylla_reactor_io_queue_requests
scylla_reactor_queue_length_tasks_pending scylla_reactor_tasks_pending
scylla_reactor_queue_length_timers_pending scylla_reactor_timers_pending
scylla_reactor_total_operations_aio_reads scylla_reactor_aio_reads
scylla_reactor_total_operations_aio_writes scylla_reactor_aio_writes
scylla_reactor_total_operations_cexceptions scylla_reactor_cpp_exceptions
scylla_reactor_total_operations_fsyncs scylla_reactor_fsyncs
scylla_reactor_total_operations_io_threaded_fallbacks scylla_reactor_io_threaded_fallbacks
scylla_reactor_total_operations_logging_failures scylla_reactor_logging_failures
scylla_reactor_total_operations_tasks_processed scylla_reactor_tasks_processed
scylla_storage_proxy_coordinator_background_reads scylla_storage_proxy_coordinator_background_read_repairs
scylla_storage_proxy_coordinator_completed_data_reads_local_node scylla_storage_proxy_coordinator_completed_reads_local_node
scylla_storage_proxy_coordinator_data_read_errors_local_node scylla_storage_proxy_coordinator_read_errors_local_node
scylla_storage_proxy_coordinator_data_reads_local_node scylla_storage_proxy_coordinator_reads_local_node
scylla_streaming_derive_total_incoming_bytes scylla_streaming_total_incoming_bytes
scylla_streaming_derive_total_outgoing_bytes scylla_streaming_total_outgoing_bytes
scylla_thrift_connections_thrift_connections scylla_thrift_current_connections
scylla_thrift_current_connections_current scylla_thrift_thrift_connections
scylla_thrift_total_requests_served scylla_thrift_served
scylla_tracing_keyspace_helper_total_operations_bad_column_family_errors scylla_tracing_keyspace_helper_bad_column_family_errors
scylla_tracing_keyspace_helper_total_operations_tracing_errors scylla_tracing_keyspace_helper_tracing_errors
scylla_tracing_queue_length_active_sessions scylla_tracing_active_sessions
scylla_tracing_queue_length_cached_records scylla_tracing_cached_records
scylla_tracing_queue_length_flushing_records scylla_tracing_flushing_records
scylla_tracing_queue_length_pending_for_write_records scylla_tracing_pending_for_write_records
scylla_tracing_total_operations_dropped_records scylla_tracing_dropped_records
scylla_tracing_total_operations_dropped_sessions scylla_tracing_dropped_sessions
scylla_tracing_total_operations_trace_errors scylla_tracing_trace_errors
scylla_tracing_total_operations_trace_records_count scylla_tracing_trace_records_count
scylla_transport_connections_cql_connections scylla_transport_cql_connections
scylla_transport_current_connections_current scylla_transport_current_connections
scylla_transport_queue_length_requests_blocked_memory scylla_transport_requests_blocked_memory
scylla_transport_queue_length_requests_serving scylla_transport_requests_serving
scylla_transport_total_requests_requests_served scylla_transport_requests_served
=========================================================================== ===========================================================================
New Metrics
~~~~~~~~~~~
The following metrics are new in 2018.1
+--------------------------------------------------------------------------+
| New Metric Name |
+==========================================================================+
| scylla_cache_active_reads |
+--------------------------------------------------------------------------+
| scylla_cache_garbage_partitions |
+--------------------------------------------------------------------------+
| scylla_cache_mispopulations |
+--------------------------------------------------------------------------+
| scylla_cache_evictions_from_garbage |
+--------------------------------------------------------------------------+
| scylla_cache_pinned_dirty_memory_overload |
+--------------------------------------------------------------------------+
| scylla_cache_reads |
+--------------------------------------------------------------------------+
| scylla_cache_reads_with_misses |
+--------------------------------------------------------------------------+
| scylla_cache_row_hits |
+--------------------------------------------------------------------------+
| scylla_cache_row_insertions |
+--------------------------------------------------------------------------+
| scylla_cache_row_misses |
+--------------------------------------------------------------------------+
| scylla_cache_sstable_partition_skips |
+--------------------------------------------------------------------------+
| scylla_cache_sstable_reader_recreations |
+--------------------------------------------------------------------------+
| scylla_cache_sstable_row_skips |
+--------------------------------------------------------------------------+
| scylla_cql_batches_pure_logged |
+--------------------------------------------------------------------------+
| scylla_cql_batches_pure_unlogged |
+--------------------------------------------------------------------------+
| scylla_cql_batches_unlogged_from_logged |
+--------------------------------------------------------------------------+
| scylla_cql_prepared_cache_evictions |
+--------------------------------------------------------------------------+
| scylla_cql_prepared_cache_memory_footprint |
+--------------------------------------------------------------------------+
| scylla_cql_prepared_cache_size |
+--------------------------------------------------------------------------+
| scylla_cql_statements_in_batches |
+--------------------------------------------------------------------------+
| scylla_database_active_reads_memory_consumption |
+--------------------------------------------------------------------------+
| scylla_database_counter_cell_lock_acquisition |
+--------------------------------------------------------------------------+
| scylla_database_counter_cell_lock_pending |
+--------------------------------------------------------------------------+
| scylla_database_cpu_flush_quota |
+--------------------------------------------------------------------------+
| scylla_execution_stages_function_calls_enqueued |
+--------------------------------------------------------------------------+
| scylla_execution_stages_function_calls_executed |
+--------------------------------------------------------------------------+
| scylla_execution_stages_tasks_preempted |
+--------------------------------------------------------------------------+
| scylla_execution_stages_tasks_scheduled |
+--------------------------------------------------------------------------+
| scylla_httpd_read_errors |
+--------------------------------------------------------------------------+
| scylla_httpd_reply_errors |
+--------------------------------------------------------------------------+
| scylla_scheduler_queue_length |
+--------------------------------------------------------------------------+
| scylla_scheduler_runtime_ms |
+--------------------------------------------------------------------------+
| scylla_scheduler_shares |
+--------------------------------------------------------------------------+
| scylla_scheduler_tasks_processed |
+--------------------------------------------------------------------------+
| scylla_scylladb_current_version |
+--------------------------------------------------------------------------+
| scylla_sstables_index_page_blocks |
+--------------------------------------------------------------------------+
| scylla_sstables_index_page_hits |
+--------------------------------------------------------------------------+
| scylla_sstables_index_page_misses |
+--------------------------------------------------------------------------+
| scylla_storage_proxy_coordinator_background_reads |
+--------------------------------------------------------------------------+
| scylla_storage_proxy_coordinator_foreground_read_repair |
+--------------------------------------------------------------------------+
| scylla_storage_proxy_coordinator_foreground_reads |
+--------------------------------------------------------------------------+
| scylla_storage_proxy_coordinator_read_latency |
+--------------------------------------------------------------------------+
| scylla_storage_proxy_coordinator_write_latency |
+--------------------------------------------------------------------------+
| scylla_storage_proxy_replica_reads |
+--------------------------------------------------------------------------+
| scylla_storage_proxy_replica_received_counter_updates |
+--------------------------------------------------------------------------+
| scylla_transport_unpaged_queries |
+--------------------------------------------------------------------------+
Deprecated Metrics
~~~~~~~~~~~~~~~~~~
The following metrics are deprecated in 2018.1
+--------------------------------------------------------------------------+
| Deprecated Metric Name |
+==========================================================================+
| scylla_cache_total_operations_uncached_wide_partitions |
+--------------------------------------------------------------------------+
| scylla_cache_total_operations_wide_partition_evictions |
+--------------------------------------------------------------------------+
| scylla_io_queue_delay_query |
+--------------------------------------------------------------------------+
| scylla_io_queue_derive_query |
+--------------------------------------------------------------------------+
| scylla_io_queue_queue_length_query |
+--------------------------------------------------------------------------+
| scylla_io_queue_total_operations_query |
+--------------------------------------------------------------------------+
| scylla_storage_proxy_coordinator_digest_read_errors_local_node |
+--------------------------------------------------------------------------+
| scylla_storage_proxy_coordinator_digest_reads_local_node |
+--------------------------------------------------------------------------+
| scylla_storage_proxy_coordinator_mutation_data_read_errors_local_node |
+--------------------------------------------------------------------------+
| scylla_storage_proxy_coordinator_mutation_data_reads_local_node |
+--------------------------------------------------------------------------+
| scylla_storage_proxy_coordinator_completed_mutation_data_reads_local_node|
+--------------------------------------------------------------------------+
| scylla_storage_proxy_coordinator_reads |
+--------------------------------------------------------------------------+


@@ -1,8 +0,0 @@
.. |OS| replace:: Debian 8
.. |ROLLBACK| replace:: rollback
.. _ROLLBACK: /upgrade/upgrade-enterprise/upgrade-guide-from-2017.1-to-2018.1/upgrade-guide-from-2017.1-to-2018.1-debian/#rollback-procedure
.. |APT| replace:: Scylla Enterprise Deb repo
.. _APT: http://www.scylladb.com/enterprise-download/debian8/
.. |ENABLE_APT_REPO| replace:: echo 'deb http://http.debian.net/debian jessie-backports main' > /etc/apt/sources.list.d/jessie-backports.list
.. |JESSIE_BACKPORTS| replace:: -t jessie-backports openjdk-8-jre-headless
.. include:: /upgrade/_common/upgrade-guide-from-2017.1-to-2018.1-ubuntu-and-debian.rst


@@ -1,180 +0,0 @@
=============================================================================================
Upgrade Guide - Scylla Enterprise 2017.1 to 2018.1 for Red Hat Enterprise Linux 7 or CentOS 7
=============================================================================================
This document is a step-by-step procedure for upgrading from Scylla Enterprise 2017.1 to Scylla Enterprise 2018.1, and for rolling back to 2017.1 if required.
Applicable versions
===================
This guide covers upgrading Scylla Enterprise from version 2017.1.x to version 2018.1.y on the following platforms:
* Red Hat Enterprise Linux, version 7 and later
* CentOS, version 7 and later
* Packages are no longer provided for Fedora
Upgrade Procedure
=================
.. include:: /upgrade/_common/warning.rst
A Scylla upgrade is a rolling procedure that does not require a full cluster shutdown. For each of the nodes in the cluster, serially (i.e. one at a time), you will:
* Check cluster schema
* Drain node and backup the data
* Backup configuration file
* Stop Scylla
* Download and install new Scylla packages
* Start Scylla
* Validate that the upgrade was successful
Apply the following procedure **serially** on each node. Do not move to the next node before validating the node is up and running with the new version.
**During** the rolling upgrade, it is highly recommended:
* Not to use new 2018.1 features
* Not to run administration functions, such as repair, refresh, rebuild, or adding or removing nodes
* Not to apply schema changes
Upgrade steps
=============
Check cluster schema
--------------------
Make sure that all nodes have the schema synced prior to the upgrade. The upgrade will fail if there is a schema disagreement between nodes.
.. code:: sh
nodetool describecluster
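As a sketch of what "schema agreement" means in practice, the following counts distinct schema versions in ``nodetool describecluster`` output. The sample text and the ``': ['`` matching pattern are assumptions about the output format, which may differ between versions.

```sh
# Hedged sketch: count distinct schema versions in `nodetool describecluster`
# output. The sample below stands in for the real command output.
sample='Cluster Information:
        Name: Test Cluster
        Snitch: org.apache.cassandra.locator.SimpleSnitch
        Schema versions:
                aaaa-bbbb: [192.168.1.201, 192.168.1.202]
                cccc-dddd: [192.168.1.203]'

# Each "<uuid>: [ip, ...]" line is one schema version.
versions=$(printf '%s\n' "$sample" | grep -c ': \[')
echo "distinct schema versions: $versions"
if [ "$versions" -ne 1 ]; then
  echo "schema disagreement - resolve before upgrading"
fi
```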
Drain node and backup the data
------------------------------
Before any major procedure, like an upgrade, it is recommended to back up all the data to an external device. In Scylla, a backup is done using the ``nodetool snapshot`` command. For **each** node in the cluster, run the following commands:
.. code:: sh
nodetool drain
nodetool snapshot
Take note of the directory name that nodetool gives you, and copy all directories with this name under ``/var/lib/scylla`` to a backup device.
When the upgrade is complete (all nodes), the snapshot should be removed by ``nodetool clearsnapshot -t <snapshot>``, or you risk running out of space.
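The copy step can be sketched like this. A throwaway directory under ``/tmp`` stands in for ``/var/lib/scylla`` so the sketch is self-contained, and the tag value is a placeholder for whatever ``nodetool snapshot`` printed.

```sh
# Hedged sketch of the backup step: copy every directory named after the
# snapshot tag under the data directory to a backup device, preserving layout.
DATA_DIR=$(mktemp -d)
BACKUP_DIR=$(mktemp -d)
TAG=1525785978994   # placeholder for the tag printed by `nodetool snapshot`

# Stand-in for an existing table directory with a snapshot.
mkdir -p "$DATA_DIR/ks1/tbl-uuid/snapshots/$TAG"
echo demo > "$DATA_DIR/ks1/tbl-uuid/snapshots/$TAG/md-1-big-Data.db"

# Find every snapshot directory with this tag and mirror it to the backup.
(cd "$DATA_DIR" && find . -type d -path "*/snapshots/$TAG") |
while read -r dir; do
  mkdir -p "$BACKUP_DIR/$dir"
  cp -a "$DATA_DIR/$dir/." "$BACKUP_DIR/$dir/"
done

find "$BACKUP_DIR" -name '*-Data.db'
```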
Backup configuration file
-------------------------
.. code:: sh
sudo cp -a /etc/scylla/scylla.yaml /etc/scylla/scylla.yaml.backup-2017.1
Stop Scylla
-----------
.. code:: sh
sudo systemctl stop scylla-server
Download and install the new release
------------------------------------
Before upgrading, check what version you are running now using ``rpm -qa | grep scylla-server``. You should use the same version in case you want to :ref:`rollback <upgrade-2017.1-2018.1-rpm-rollback-procedure>` the upgrade. If you are not running a 2017.1.x version, stop right here! This guide only covers 2017.1.x to 2018.1.y upgrades.
To upgrade:
1. Update the `Scylla Enterprise RPM repo <http://www.scylladb.com/enterprise-download/centos_rpm/>`_ to **2018.1**.
2. Install:
.. code:: sh
sudo yum update scylla\* -y
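The "check what version you are running" gate at the start of this step can be sketched as a shell function. The function name is illustrative; in practice the version string would come from ``rpm -qa | grep scylla-server``, and here it is a parameter so the logic stands on its own.

```sh
# Hedged sketch of the pre-upgrade version gate: only 2017.1.x -> 2018.1.y
# upgrades are covered by this guide.
covered_by_this_guide() {
  case "$1" in
    2017.1.*) return 0 ;;  # guide applies
    *)        return 1 ;;  # stop: some other starting version
  esac
}

covered_by_this_guide "2017.1.12" && echo "ok to proceed"
covered_by_this_guide "2018.1.0"  || echo "not covered - stop here"
```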
Start the node
--------------
.. code:: sh
sudo systemctl start scylla-server
Validate
--------
1. Check cluster status with ``nodetool status`` and make sure **all** nodes, including the one you just upgraded, are in UN status.
2. Use ``curl -X GET "http://localhost:10000/storage_service/scylla_release_version"`` to check the Scylla version.
3. Use ``journalctl _COMM=scylla`` to check there are no new errors in the log.
4. Check again after two minutes to validate that no new issues were introduced.
Once you are sure the node upgrade is successful, move to the next node in the cluster.
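Step 1 above can be sketched as follows. The sample text stands in for real ``nodetool status`` output, whose column layout is an assumption here and may vary between versions.

```sh
# Hedged sketch: count nodes not in UN (Up/Normal) state from a sample of
# `nodetool status` output; a real check would pipe the command itself.
status='UN  192.168.1.201  112.82 KB  256  33.3%  host1  rack1
UN  192.168.1.202   91.11 KB  256  33.3%  host2  rack1
DN  192.168.1.203  124.42 KB  256  33.4%  host3  rack1'

not_un=$(printf '%s\n' "$status" | awk '$1 != "UN"' | wc -l)
echo "nodes not in UN state: $not_un"
```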
* More on :doc:`Scylla Metrics Update - Scylla Enterprise 2017.1 to 2018.1<metric-update-2017.1-to-2018.1>`
.. _upgrade-2017.1-2018.1-rpm-rollback-procedure:
Rollback Procedure
==================
.. include:: /upgrade/_common/warning_rollback.rst
The following procedure describes a rollback from Scylla Enterprise release 2018.1.x to 2017.1.y. Apply this procedure if an upgrade from 2017.1 to 2018.1 failed before completing on all nodes. Use this procedure only on nodes you upgraded to 2018.1.
Scylla rollback is a rolling procedure that does **not** require a full cluster shutdown.
For each of the nodes rollback to 2017.1, you will:
* Drain the node and stop Scylla
* Retrieve the old Scylla packages
* Restore the configuration file
* Restart Scylla
* Validate the rollback success
Apply the following procedure **serially** on each node. Do not move to the next node before validating that the node is up and running with the rolled-back 2017.1 version.
Rollback steps
==============
Gracefully shutdown Scylla
--------------------------
.. code:: sh
nodetool drain
sudo systemctl stop scylla-server
Download and install the previous release
-----------------------------------------
1. Remove the old repo file.
.. code:: sh
sudo rm -rf /etc/yum.repos.d/scylla.repo
2. Update the `Scylla Enterprise RPM repo <http://www.scylladb.com/enterprise-download/centos_rpm/>`_ to **2017.1**.
3. Install
.. code:: sh
sudo yum clean all
sudo rm -rf /var/cache/yum
sudo yum remove scylla\*tools-core
sudo yum downgrade scylla\* -y
sudo yum install scylla-enterprise
Restore the configuration file
------------------------------
.. code:: sh
sudo rm -rf /etc/scylla/scylla.yaml
sudo cp -a /etc/scylla/scylla.yaml.backup-2017.1 /etc/scylla/scylla.yaml
Restore system tables
---------------------
Restore all tables of **system** and **system_schema** from the previous snapshot, as 2018.1 uses a different set of system tables. Reference: :doc:`Restore from a Backup and Incremental Backup </operating-scylla/procedures/backup-restore/restore/>`
.. code:: sh
cd /var/lib/scylla/data/keyspace_name/table_name-UUID/snapshots/<snapshot_name>/
sudo cp -r * /var/lib/scylla/data/keyspace_name/table_name-UUID/
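To restore **all** system tables rather than one, the per-table copy can be wrapped in a loop like the sketch below. A throwaway ``/tmp`` tree stands in for ``/var/lib/scylla/data``, and the snapshot name is an illustrative placeholder.

```sh
# Hedged sketch: restore every system/system_schema table directory from a
# chosen snapshot tag.
DATA=$(mktemp -d)
SNAP=pre-2018.1-rollback   # placeholder snapshot name

# Stand-in for one system table directory with a snapshot.
mkdir -p "$DATA/system/local-abc123/snapshots/$SNAP"
echo old > "$DATA/system/local-abc123/snapshots/$SNAP/md-1-big-Data.db"

for ks in system system_schema; do
  for tbl in "$DATA/$ks"/*/; do
    # Skip keyspaces/tables without this snapshot (also handles empty globs).
    [ -d "$tbl/snapshots/$SNAP" ] || continue
    cp -r "$tbl/snapshots/$SNAP/." "$tbl"
  done
done

ls "$DATA/system/local-abc123/"
```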
Start the node
--------------
.. code:: sh
sudo systemctl start scylla-server
Validate
--------
Follow the validation steps from the upgrade procedure above. Once you are sure the node rollback is successful, move to the next node in the cluster.


@@ -1,8 +0,0 @@
.. |OS| replace:: Ubuntu 14.04 or 16.04
.. |ROLLBACK| replace:: rollback
.. _ROLLBACK: /upgrade/upgrade-enterprise/upgrade-guide-from-2017.1-to-2018.1/upgrade-guide-from-2017.1-to-2018.1-ubuntu/#rollback-procedure
.. |APT| replace:: Scylla Enterprise Deb repo
.. _APT: http://www.scylladb.com/enterprise-download/
.. |ENABLE_APT_REPO| replace:: sudo add-apt-repository -y ppa:openjdk-r/ppa
.. |JESSIE_BACKPORTS| replace:: openjdk-8-jre-headless
.. include:: /upgrade/_common/upgrade-guide-from-2017.1-to-2018.1-ubuntu-and-debian.rst


@@ -1,38 +0,0 @@
=====================================================
Upgrade Scylla Enterprise 2017
=====================================================
.. toctree::
   :titlesonly:
   :hidden:

   Red Hat Enterprise Linux and CentOS <upgrade-guide-from-2017.x.y-to-2017.x.z-rpm>
   Ubuntu <upgrade-guide-from-2017.x.y-to-2017.x.z-ubuntu>
   Debian <upgrade-guide-from-2017.x.y-to-2017.x.z-debian>

.. raw:: html

   <div class="panel callout radius animated">
   <div class="row">
   <div class="medium-3 columns">
   <h5 id="getting-started">Upgrade to Scylla Enterprise 2017.x.z</h5>
   </div>
   <div class="medium-9 columns">

Upgrade guides are available for:

* :doc:`Upgrade Scylla Enterprise from 2017.x.y to 2017.x.z on Red Hat Enterprise Linux and CentOS <upgrade-guide-from-2017.x.y-to-2017.x.z-rpm>`
* :doc:`Upgrade Scylla Enterprise from 2017.x.y to 2017.x.z on Ubuntu <upgrade-guide-from-2017.x.y-to-2017.x.z-ubuntu>`
* :doc:`Upgrade Scylla Enterprise from 2017.x.y to 2017.x.z on Debian <upgrade-guide-from-2017.x.y-to-2017.x.z-debian>`

.. raw:: html

   </div>
   </div>
   </div>

View File

@@ -1,6 +0,0 @@
.. |OS| replace:: Debian 8
.. |ROLLBACK| replace:: rollback
.. _ROLLBACK: /upgrade/upgrade-enterprise/upgrade-guide-from-2017.x.y-to-2017.x.z/upgrade-guide-from-2017.x.y-to-2017.x.z-debian/#rollback-procedure
.. |APT| replace:: Scylla Enterprise deb repo
.. _APT: http://www.scylladb.com/enterprise-download/debian8/
.. include:: /upgrade/_common/upgrade-guide-from-2017.x.y-to-2017.x.z-ubuntu-and-debian.rst

View File

@@ -1,153 +0,0 @@
===========================================================================================
Upgrade Guide - Scylla Enterprise 2017.x.y to 2017.x.z for Red Hat Enterprise 7 or CentOS 7
===========================================================================================
This document is a step by step procedure for upgrading from Scylla Enterprise 2017.x.y to 2017.x.z.
Applicable versions
===================
This guide covers upgrading Scylla Enterprise from the following versions: 2017.x.y to 2017.x.z, on the following platforms:
* Red Hat Enterprise Linux, version 7 and later
* CentOS, version 7 and later
* Fedora: packages are no longer provided
Upgrade Procedure
=================
.. include:: /upgrade/_common/warning.rst
A Scylla upgrade is a rolling procedure that does not require a full cluster shutdown. For each of the nodes in the cluster, serially (i.e. one at a time), you will:
* Drain node and backup the data
* Check your current release
* Backup configuration file
* Stop Scylla
* Download and install new Scylla packages
* Start Scylla
* Validate that the upgrade was successful
Apply the following procedure **serially** on each node. Do not move to the next node before validating the node is up and running with the new version.
**During** the rolling upgrade, it is highly recommended:
* Not to use new 2017.x.z features
* Not to run administration functions, like repairs, refresh, rebuild or add or remove nodes
* Not to apply schema changes
Upgrade steps
=============
Drain node and backup the data
------------------------------
Before any major procedure, like an upgrade, it is recommended to backup all the data to an external device. In Scylla, backup is done using the ``nodetool snapshot`` command. For **each** node in the cluster, run the following command:

.. code:: sh

   nodetool drain
   nodetool snapshot

Take note of the directory name that nodetool gives you, and copy all the directories having this name under ``/var/lib/scylla`` to a backup device.
When the upgrade is complete (all nodes), the snapshot should be removed by ``nodetool clearsnapshot -t <snapshot>``, or you risk running out of space.
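Copying the tagged snapshot directories can be sketched as below. This is an illustrative sketch only: the keyspace, table, and tag names are placeholders created in a temporary directory, and a real run would use the tag printed by ``nodetool snapshot`` and your actual backup mount point.

```shell
# Mock up a data directory so the copy logic itself can be shown.
# All paths and the tag "1234567890" are made-up examples.
data_dir=$(mktemp -d)
backup_dir=$(mktemp -d)
mkdir -p "$data_dir/ks1/tbl-0001/snapshots/1234567890"
touch "$data_dir/ks1/tbl-0001/snapshots/1234567890/data.db"

# Find every directory named after the snapshot tag and mirror it,
# keeping the relative layout under the backup location.
find "$data_dir" -type d -name 1234567890 | while read -r snap; do
    dest="$backup_dir/${snap#"$data_dir"/}"
    mkdir -p "$dest"
    cp -a "$snap/." "$dest/"
done

ls "$backup_dir/ks1/tbl-0001/snapshots/1234567890"   # prints: data.db
```

The relative-path trick (`${snap#"$data_dir"/}`) keeps the ``keyspace/table/snapshots/<tag>`` hierarchy intact, which makes the later restore step a straight copy back.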
Backup configuration file
-------------------------

.. code:: sh

   sudo cp -a /etc/scylla/scylla.yaml /etc/scylla/scylla.yaml.backup-2017.x.z

Stop Scylla
-----------

.. code:: sh

   sudo systemctl stop scylla-server

Download and install the new release
------------------------------------
Before upgrading, check what version you are running now using ``rpm -qa | grep scylla-server``. You should use the same version in case you want to :ref:`rollback <upgrade-2017.x.y-to-2017.x.z-rpm-rollback-procedure>` the upgrade. If you are not running a 2017.x.y version, stop right here! This guide only covers 2017.x.y to 2017.x.z upgrades.
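The pre-upgrade version check above can be scripted as a guard; this is a hedged sketch in which the helper name ``check_upgrade_source`` is illustrative, and the version string is passed in as an argument rather than taken live from ``rpm -q --queryformat '%{VERSION}' scylla-server``.

```shell
# Refuse to proceed unless the installed version matches what this
# guide covers. $1 is the version string, e.g. "2017.1.6".
check_upgrade_source() {
    case "$1" in
        2017.*) echo "ok: upgrading from $1"; return 0 ;;
        *)      echo "abort: this guide only covers 2017.x.y to 2017.x.z"; return 1 ;;
    esac
}

check_upgrade_source "2017.1.6"   # prints: ok: upgrading from 2017.1.6
```

Recording the matched version also gives you the exact package version to pass to ``yum downgrade`` if a rollback becomes necessary.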
To upgrade:
1. Update the `Scylla Enterprise RPM repo <http://www.scylladb.com/enterprise-download/centos_rpm>`_ to **2017.x**
2. Install:

   .. code:: sh

      sudo yum update scylla\* -y

Start the node
--------------

.. code:: sh

   sudo systemctl start scylla-server

Validate
--------
1. Check cluster status with ``nodetool status`` and make sure **all** nodes, including the one you just upgraded, are in UN status.
2. Use ``curl -X GET "http://localhost:10000/storage_service/scylla_release_version"`` to check the Scylla version.
3. Use ``journalctl _COMM=scylla`` to check there are no new errors in the log.
4. Check again after two minutes to validate no new issues are introduced.
Once you are sure the node upgrade is successful, move to the next node in the cluster.
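The UN-status part of the validation can be automated along these lines. The status text below is a hardcoded, trimmed-down example of ``nodetool status`` output (an assumption for illustration); a real script would capture the command's output instead.

```shell
# Sample (fabricated) nodetool status rows: state, address, load, tokens.
status='UN  192.168.1.201  112.82 KB  256
UN  192.168.1.202  91.11 KB   256
DN  192.168.1.203  124.42 KB  256'

# Collect the addresses of any node whose state is not UN.
not_up_normal=$(printf '%s\n' "$status" | awk '$1 != "UN" {print $2}')

if [ -n "$not_up_normal" ]; then
    echo "not UN: $not_up_normal"
else
    echo "all nodes UN"
fi
```

With the sample input above this prints ``not UN: 192.168.1.203``; an empty result means it is safe to move to the next node.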
.. _upgrade-2017.x.y-to-2017.x.z-rpm-rollback-procedure:
Rollback Procedure
==================
.. include:: /upgrade/_common/warning_rollback.rst
The following procedure describes a rollback from Scylla Enterprise release 2017.x.z to 2017.x.y. Apply this procedure if an upgrade from 2017.x.y to 2017.x.z failed before completing on all nodes. Use this procedure only for nodes you upgraded to 2017.x.z.
Scylla rollback is a rolling procedure that does **not** require a full cluster shutdown.
For each of the nodes you roll back to 2017.x.y, you will:
* Drain the node and stop Scylla
* Downgrade to previous release
* Restore the configuration file
* Restart Scylla
* Validate the rollback success
Apply the following procedure **serially** on each node. Do not move to the next node before validating the node is up and running with the new version.
Rollback steps
==============
Gracefully shutdown Scylla
--------------------------

.. code:: sh

   nodetool drain
   sudo systemctl stop scylla-server

Downgrade to previous release
-----------------------------
1. Install:

   .. code:: sh

      sudo yum downgrade scylla\*-2017.x.y

Restore the configuration file
------------------------------

.. code:: sh

   sudo rm -rf /etc/scylla/scylla.yaml
   sudo cp -a /etc/scylla/scylla.yaml.backup-2017.x.z /etc/scylla/scylla.yaml

Start the node
--------------

.. code:: sh

   sudo systemctl start scylla-server

Validate
--------
Check the upgrade instruction above for validation. Once you are sure the node rollback is successful, move to the next node in the cluster.

View File

@@ -1,6 +0,0 @@
.. |OS| replace:: Ubuntu 14.04 or 16.04
.. |ROLLBACK| replace:: rollback
.. _ROLLBACK: /upgrade/upgrade-enterprise/upgrade-guide-from-2017.x.y-to-2017.x.z/upgrade-guide-from-2017.x.y-to-2017.x.z-ubuntu/#rollback-procedure
.. |APT| replace:: Scylla Enterprise deb repo
.. _APT: http://www.scylladb.com/enterprise-download/
.. include:: /upgrade/_common/upgrade-guide-from-2017.x.y-to-2017.x.z-ubuntu-and-debian.rst

View File

@@ -1,37 +0,0 @@
==================================================
Upgrade from Scylla Enterprise 2018.1 to 2019.1
==================================================
.. toctree::
   :hidden:
   :titlesonly:

   Red Hat Enterprise Linux and CentOS <upgrade-guide-from-2018.1-to-2019.1-rpm>
   Ubuntu 16.04 <upgrade-guide-from-2018.1-to-2019.1-ubuntu-16-04>
   Metrics <metric-update-2018.1-to-2019.1>

.. raw:: html

   <div class="panel callout radius animated">
   <div class="row">
   <div class="medium-3 columns">
   <h5 id="getting-started">Upgrade to Scylla Enterprise 2019.1</h5>
   </div>
   <div class="medium-9 columns">

Upgrade guides are available for:

* :doc:`Upgrade Scylla Enterprise from 2018.1.x to 2019.1.y on Red Hat Enterprise Linux and CentOS <upgrade-guide-from-2018.1-to-2019.1-rpm>`
* :doc:`Upgrade Scylla Enterprise from 2018.1.x to 2019.1.y on Ubuntu 16.04 <upgrade-guide-from-2018.1-to-2019.1-ubuntu-16-04>`
* :doc:`Scylla Enterprise Metrics Update - Scylla 2018.1 to 2019.1 <metric-update-2018.1-to-2019.1>`

.. raw:: html

   </div>
   </div>
   </div>

View File

@@ -1,141 +0,0 @@
====================================================================
Scylla Enterprise Metric Update - Scylla Enterprise 2018.1 to 2019.1
====================================================================
New Metrics
~~~~~~~~~~~
The following metrics are new in 2019.1 compared to 2018.1:
* scylla_alien_receive_batch_queue_length
* scylla_alien_total_received_messages
* scylla_alien_total_sent_messages
* scylla_cql_authorized_prepared_statements_cache_evictions
* scylla_cql_authorized_prepared_statements_cache_size
* scylla_cql_filtered_read_requests
* scylla_cql_filtered_rows_dropped_total
* scylla_cql_filtered_rows_matched_total
* scylla_cql_filtered_rows_read_total
* scylla_cql_rows_read
* scylla_cql_secondary_index_creates
* scylla_cql_secondary_index_drops
* scylla_cql_secondary_index_reads
* scylla_cql_secondary_index_rows_read
* scylla_cql_unpaged_select_queries
* scylla_cql_user_prepared_auth_cache_footprint
* scylla_database_dropped_view_updates
* scylla_database_large_partition_exceeding_threshold
* scylla_database_multishard_query_failed_reader_saves
* scylla_database_multishard_query_failed_reader_stops
* scylla_database_multishard_query_unpopped_bytes
* scylla_database_multishard_query_unpopped_fragments
* scylla_database_paused_reads
* scylla_database_paused_reads_permit_based_evictions
* scylla_database_total_view_updates_failed_local
* scylla_database_total_view_updates_failed_remote
* scylla_database_total_view_updates_pushed_local
* scylla_database_total_view_updates_pushed_remote
* scylla_database_view_building_paused
* scylla_database_view_update_backlog
* scylla_hints_for_views_manager_corrupted_files
* scylla_hints_for_views_manager_discarded
* scylla_hints_for_views_manager_dropped
* scylla_hints_for_views_manager_errors
* scylla_hints_for_views_manager_sent
* scylla_hints_for_views_manager_size_of_hints_in_progress
* scylla_hints_for_views_manager_written
* scylla_hints_manager_corrupted_files
* scylla_hints_manager_discarded
* scylla_hints_manager_dropped
* scylla_hints_manager_errors
* scylla_hints_manager_sent
* scylla_hints_manager_size_of_hints_in_progress
* scylla_hints_manager_written
* scylla_node_operation_mode
* scylla_query_processor_queries
* scylla_reactor_aio_errors
* scylla_reactor_cpu_steal_time_ms
* scylla_scheduler_time_spent_on_task_quota_violations_ms
* scylla_sstables_capped_local_deletion_time
* scylla_sstables_capped_tombstone_deletion_time
* scylla_sstables_cell_tombstone_writes
* scylla_sstables_cell_writes
* scylla_sstables_partition_reads
* scylla_sstables_partition_seeks
* scylla_sstables_partition_writes
* scylla_sstables_range_partition_reads
* scylla_sstables_range_tombstone_writes
* scylla_sstables_row_reads
* scylla_sstables_row_writes
* scylla_sstables_single_partition_reads
* scylla_sstables_sstable_partition_reads
* scylla_sstables_static_row_writes
* scylla_sstables_tombstone_writes
* scylla_storage_proxy_coordinator_background_replica_writes_failed_local_node
* scylla_storage_proxy_coordinator_background_writes_failed
* scylla_storage_proxy_coordinator_last_mv_flow_control_delay
* scylla_storage_proxy_replica_cross_shard_ops
* scylla_transport_requests_blocked_memory_current
* scylla_io_queue_shares
Updated Metrics
~~~~~~~~~~~~~~~
The following metric names have changed between Scylla Enterprise 2018.1 and 2019.1:

.. list-table::
   :widths: 30 30
   :header-rows: 1

   * - Scylla 2018.1 Name
     - Scylla 2019.1 Name
   * - scylla_io_queue_compaction_queue_length
     - scylla_io_queue_queue_length
   * - scylla_io_queue_compaction_total_bytes
     - scylla_io_queue_total_bytes
   * - scylla_io_queue_compaction_total_operations
     - scylla_io_queue_total_operations
   * - scylla_io_queue_default_delay
     - scylla_io_queue_delay
   * - scylla_io_queue_default_queue_length
     - scylla_io_queue_queue_length
   * - scylla_io_queue_default_total_bytes
     - scylla_io_queue_total_bytes
   * - scylla_io_queue_default_total_operations
     - scylla_io_queue_total_operations
   * - scylla_io_queue_memtable_flush_delay
     - scylla_io_queue_delay
   * - scylla_io_queue_memtable_flush_queue_length
     - scylla_io_queue_queue_length
   * - scylla_io_queue_memtable_flush_total_bytes
     - scylla_io_queue_total_bytes
   * - scylla_io_queue_memtable_flush_total_operations
     - scylla_io_queue_total_operations
   * - scylla_io_queue_commitlog_delay
     - scylla_io_queue_delay
   * - scylla_io_queue_commitlog_queue_length
     - scylla_io_queue_queue_length
   * - scylla_io_queue_commitlog_total_bytes
     - scylla_io_queue_total_bytes
   * - scylla_io_queue_commitlog_total_operations
     - scylla_io_queue_total_operations
   * - scylla_io_queue_compaction_delay
     - scylla_io_queue_delay
   * - scylla_reactor_cpu_busy_ns
     - scylla_reactor_cpu_busy_ms
   * - scylla_storage_proxy_coordinator_current_throttled_writes
     - scylla_storage_proxy_coordinator_current_throttled_base_writes
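When updating saved dashboards or alert rules for these renames, a simple substitution is usually enough. The sketch below applies one rename from the table, the ``scylla_reactor_cpu_busy_ns`` to ``_ms`` change, to a sample Prometheus query; the query string itself is a made-up example.

```shell
# A saved query that still uses the 2018.1 metric name.
old_query='rate(scylla_reactor_cpu_busy_ns[1m])'

# Rewrite it to the 2019.1 name per the rename table.
new_query=$(printf '%s\n' "$old_query" \
    | sed 's/scylla_reactor_cpu_busy_ns/scylla_reactor_cpu_busy_ms/')

echo "$new_query"   # prints: rate(scylla_reactor_cpu_busy_ms[1m])
```

Note that renames which collapse several old metrics into one (e.g. the per-class ``scylla_io_queue_*`` metrics) also change the metric's label structure, so those queries may need more than a name swap.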
Deprecated Metrics
~~~~~~~~~~~~~~~~~~
* scylla_database_cpu_flush_quota
* scylla_scollectd_latency
* scylla_scollectd_records
* scylla_scollectd_total_bytes_sent
* scylla_scollectd_total_requests
* scylla_scollectd_total_time_in_ms
* scylla_scollectd_total_values
* scylla_transport_unpaged_queries

View File

@@ -1,190 +0,0 @@
=============================================================================================
Upgrade Guide - Scylla Enterprise 2018.1 to 2019.1 for Red Hat Enterprise Linux 7 or CentOS 7
=============================================================================================
This document is a step by step procedure for upgrading from Scylla Enterprise 2018.1 to Scylla Enterprise 2019.1, and rollback to 2018.1 if required.
Applicable versions
===================
This guide covers upgrading Scylla Enterprise from version 2018.1.7 or later to version 2019.1.y, on the following platforms:

.. note::

   This upgrade procedure only works from **2018.1.7** or later. If you have an older Scylla Enterprise 2018.1.x version, please contact the Scylla Support team for advice.

* Red Hat Enterprise Linux, version 7 and later
* CentOS, version 7 and later
* Fedora: packages are no longer provided
Upgrade Procedure
=================
.. include:: /upgrade/_common/warning.rst
.. include:: /upgrade/upgrade-enterprise/_common/enterprise_2019.1_warnings.rst
A Scylla upgrade is a rolling procedure that does not require a full cluster shutdown. For each of the nodes in the cluster, serially (i.e. one at a time), you will:
* Check cluster schema
* Drain node and backup the data
* Backup configuration file
* Stop Scylla
* Download and install new Scylla packages
* Start Scylla
* Validate that the upgrade was successful
Apply the following procedure **serially** on each node. Do not move to the next node before validating the node is up and running with the new version.
**During** the rolling upgrade it is highly recommended:
* Not to use new 2019.1 features
* Not to run administration functions, like repairs, refresh, rebuild or add or remove nodes. See `sctool <https://manager.docs.scylladb.com/stable/sctool/index.html>`_ for suspending Scylla Manager scheduled or running repairs.
* Not to apply schema changes
.. include:: /upgrade/_common/upgrade_to_2019_warning.rst
Upgrade steps
=============
Check cluster schema
--------------------
Make sure that all nodes have their schema synchronized before you upgrade; the upgrade will fail if there is schema disagreement between nodes.

.. code:: sh

   nodetool describecluster

Drain node and backup the data
------------------------------
Before any major procedure, like an upgrade, it is recommended to backup all the data to an external device. In Scylla, backup is done using the ``nodetool snapshot`` command. For **each** node in the cluster, run the following command:

.. code:: sh

   nodetool drain
   nodetool snapshot

Take note of the directory name that nodetool gives you, and copy all the directories having this name under ``/var/lib/scylla`` to a backup device.
When the upgrade is complete (all nodes), the snapshot should be removed by ``nodetool clearsnapshot -t <snapshot>``, or you risk running out of space.
Backup configuration file
-------------------------

.. code:: sh

   sudo cp -a /etc/scylla/scylla.yaml /etc/scylla/scylla.yaml.backup-2018.1

Stop Scylla
-----------

.. code:: sh

   sudo systemctl stop scylla-server

Download and install the new release
------------------------------------
Before upgrading, check what version you are running now using ``rpm -qa | grep scylla-server``. You should use the same version in case you want to :ref:`rollback <upgrade-2018.1-2019.1-rpm-rollback-procedure>` the upgrade. If you are not running a 2018.1.x version, stop right here! This guide only covers 2018.1.x to 2019.1.y upgrades.
To upgrade:
1. Update the `Scylla Enterprise RPM repo <http://www.scylladb.com/enterprise-download/centos_rpm/>`_ to **2019.1**
2. Install:

   .. code:: sh

      sudo yum update scylla\* -y

Start the node
--------------

.. code:: sh

   sudo systemctl start scylla-server

Validate
--------
1. Check cluster status with ``nodetool status`` and make sure **all** nodes, including the one you just upgraded, are in UN status.
2. Use ``curl -X GET "http://localhost:10000/storage_service/scylla_release_version"`` to check the Scylla version.
3. Use ``journalctl _COMM=scylla`` to check there are no new errors in the log.
4. Check again after two minutes, to validate no new issues are introduced.
Once you are sure the node upgrade is successful, move to the next node in the cluster.
* More on :doc:`Scylla Metrics Update - Scylla Enterprise 2018.1 to 2019.1 <metric-update-2018.1-to-2019.1>`
.. _upgrade-2018.1-2019.1-rpm-rollback-procedure:
Rollback Procedure
==================
.. include:: /upgrade/_common/warning_rollback.rst
The following procedure describes a rollback from Scylla Enterprise release 2019.1.x to 2018.1.y. Apply this procedure if an upgrade from 2018.1 to 2019.1 failed before completing on all nodes. Use this procedure only for nodes you upgraded to 2019.1.
Scylla rollback is a rolling procedure that does **not** require a full cluster shutdown.
For each of the nodes you roll back to 2018.1, you will:
* Drain the node and stop Scylla
* Retrieve the old Scylla packages
* Restore the configuration file
* Restart Scylla
* Validate the rollback success
Apply the following procedure **serially** on each node. Do not move to the next node before validating the node is up and running with the new version.
Rollback steps
==============
Gracefully shutdown Scylla
--------------------------

.. code:: sh

   nodetool drain
   sudo systemctl stop scylla-server

Download and install the previous release
-----------------------------------------
1. Remove the old repo file.

   .. code:: sh

      sudo rm -rf /etc/yum.repos.d/scylla.repo

2. Update the `Scylla Enterprise RPM repo <http://www.scylladb.com/enterprise-download/centos_rpm/>`_ to **2018.1**
3. Install:

   .. code:: sh

      sudo yum clean all
      sudo rm -rf /var/cache/yum
      sudo yum remove scylla\*tools-core
      sudo yum downgrade scylla\* -y
      sudo yum install scylla-enterprise

Restore the configuration file
------------------------------

.. code:: sh

   sudo rm -rf /etc/scylla/scylla.yaml
   sudo cp -a /etc/scylla/scylla.yaml.backup-2018.1 /etc/scylla/scylla.yaml

Restore system tables
---------------------
Restore all tables of **system** and **system_schema** from the previous snapshot, as 2019.1 uses a different set of system tables. Reference doc: :doc:`Restore from a Backup and Incremental Backup </operating-scylla/procedures/backup-restore/restore/>`.

.. code:: sh

   cd /var/lib/scylla/data/keyspace_name/table_name-UUID/snapshots/<snapshot_name>/
   sudo cp -r * /var/lib/scylla/data/keyspace_name/table_name-UUID/
   sudo chown -R scylla:scylla /var/lib/scylla/data/keyspace_name/table_name-UUID/

Start the node
--------------

.. code:: sh

   sudo systemctl start scylla-server

Validate
--------
Check the upgrade instruction above for validation. Once you are sure the node rollback is successful, move to the next node in the cluster.

View File

@@ -1,7 +0,0 @@
.. |OS| replace:: 16.04
.. |ROLLBACK| replace:: rollback
.. _ROLLBACK: /upgrade/upgrade-enterprise/upgrade-guide-from-2018.1-to-2019.1/upgrade-guide-from-2018.1-to-2019.1-ubuntu-16-04/#rollback-procedure
.. |APT| replace:: Scylla Enterprise Deb repo
.. _APT: https://www.scylladb.com/download/enterprise/scylla-ubuntu-16-04/
.. |OPENJDK| replace:: openjdk-8-jre-headless
.. include:: /upgrade/_common/upgrade-guide-from-2018.1-to-2019.1-ubuntu-and-debian.rst

View File

@@ -1,35 +0,0 @@
=====================================================
Upgrade Scylla Enterprise 2018
=====================================================
.. toctree::
   :titlesonly:
   :hidden:

   Red Hat Enterprise Linux and CentOS <upgrade-guide-from-2018.x.y-to-2018.x.z-rpm>
   Ubuntu <upgrade-guide-from-2018.x.y-to-2018.x.z-ubuntu>
   Debian <upgrade-guide-from-2018.x.y-to-2018.x.z-debian>

.. raw:: html

   <div class="panel callout radius animated">
   <div class="row">
   <div class="medium-3 columns">
   <h5 id="getting-started">Upgrade Scylla Enterprise</h5>
   </div>
   <div class="medium-9 columns">

Upgrade guides are available for:

* :doc:`Upgrade Scylla Enterprise from 2018.x.y to 2018.x.z on Red Hat Enterprise Linux and CentOS <upgrade-guide-from-2018.x.y-to-2018.x.z-rpm>`
* :doc:`Upgrade Scylla Enterprise from 2018.x.y to 2018.x.z on Ubuntu <upgrade-guide-from-2018.x.y-to-2018.x.z-ubuntu>`
* :doc:`Upgrade Scylla Enterprise from 2018.x.y to 2018.x.z on Debian <upgrade-guide-from-2018.x.y-to-2018.x.z-debian>`

.. raw:: html

   </div>
   </div>
   </div>

View File

@@ -1,6 +0,0 @@
.. |OS| replace:: Debian 8
.. |ROLLBACK| replace:: rollback
.. _ROLLBACK: /upgrade/upgrade-enterprise/upgrade-guide-from-2018.x.y-to-2018.x.z/upgrade-guide-from-2018.x.y-to-2018.x.z-debian/#rollback-procedure
.. |APT| replace:: Scylla Enterprise deb repo
.. _APT: http://www.scylladb.com/enterprise-download/debian8/
.. include:: /upgrade/_common/upgrade-guide-from-2018.x.y-to-2018.x.z-ubuntu-and-debian.rst

View File

@@ -1,166 +0,0 @@
===========================================================================================
Upgrade Guide - Scylla Enterprise 2018.x.y to 2018.x.z for Red Hat Enterprise 7 or CentOS 7
===========================================================================================
This document is a step by step procedure for upgrading from Scylla Enterprise 2018.x.y to 2018.x.z.
Applicable versions
===================
This guide covers upgrading Scylla Enterprise from the following versions: 2018.x.y to 2018.x.z, on the following platforms:
* Red Hat Enterprise Linux, version 7 and later
* CentOS, version 7 and later
* Fedora: packages are no longer provided
Upgrade Procedure
=================
.. include:: /upgrade/_common/warning.rst
.. include:: /upgrade/upgrade-enterprise/_common/gossip_generation_bug_warning.rst
A Scylla upgrade is a rolling procedure that does not require a full cluster shutdown. For each of the nodes in the cluster, serially (i.e. one at a time), you will:
* Drain node and backup the data
* Check your current release
* Backup configuration file
* Stop Scylla
* Download and install new Scylla packages
* Start Scylla
* Validate that the upgrade was successful
Apply the following procedure **serially** on each node. Do not move to the next node before validating the node is up and running with the new version.
**During** the rolling upgrade, it is highly recommended:
* Not to use new 2018.x.z features.
* Not to run administration functions, like repairs, refresh, rebuild or add or remove nodes.
* Not to apply schema changes.
Upgrade steps
=============
Drain node and backup the data
------------------------------
Before any major procedure, like an upgrade, it is recommended to backup all the data to an external device. In Scylla, backup is done using the ``nodetool snapshot`` command. For **each** node in the cluster, run the following command:

.. code:: sh

   nodetool drain
   nodetool snapshot

Take note of the directory name that nodetool gives you, and copy all the directories having this name under ``/var/lib/scylla`` to a backup device.
When the upgrade is complete (all nodes), the snapshot should be removed by ``nodetool clearsnapshot -t <snapshot>``, or you risk running out of space.
Backup configuration file
-------------------------

.. code:: sh

   sudo cp -a /etc/scylla/scylla.yaml /etc/scylla/scylla.yaml.backup-2018.x.z

Stop Scylla
-----------

.. code:: sh

   sudo systemctl stop scylla-server

Download and install the new release
------------------------------------
Before upgrading, check what version you are running now using ``rpm -qa | grep scylla-server``. You should use the same version in case you want to :ref:`rollback <upgrade-2018.x.y-to-2018.x.z-rpm-rollback-procedure>` the upgrade. If you are not running a 2018.x.y version, stop right here! This guide only covers 2018.x.y to 2018.x.z upgrades.
To upgrade:
1. Update the `Scylla Enterprise RPM repo <http://www.scylladb.com/enterprise-download/centos_rpm>`_ to **2018.x**
2. Install:

   .. code:: sh

      sudo yum update scylla\* -y

Start the node
--------------

.. code:: sh

   sudo systemctl start scylla-server

Validate
--------
1. Check cluster status with ``nodetool status`` and make sure **all** nodes, including the one you just upgraded, are in UN status.
2. Use ``curl -X GET "http://localhost:10000/storage_service/scylla_release_version"`` to check the Scylla version.
3. Use ``journalctl _COMM=scylla`` to check there are no new errors in the log.
4. Check again after two minutes to validate no new issues are introduced.
Once you are sure the node upgrade is successful, move to the next node in the cluster.
.. _upgrade-2018.x.y-to-2018.x.z-rpm-rollback-procedure:
Rollback Procedure
==================
.. include:: /upgrade/_common/warning_rollback.rst
The following procedure describes a rollback from Scylla Enterprise release 2018.x.z to 2018.x.y. Apply this procedure if an upgrade from 2018.x.y to 2018.x.z failed before completing on all nodes. Use this procedure only for nodes you upgraded to 2018.x.z.
Scylla rollback is a rolling procedure that does **not** require a full cluster shutdown.
For each of the nodes you roll back to 2018.x.y, you will:
* Drain the node and stop Scylla
* Downgrade to previous release
* Restore the configuration file
* Restart Scylla
* Validate the rollback success
Apply the following procedure **serially** on each node. Do not move to the next node before validating the node is up and running with the new version.
Rollback steps
==============
Gracefully shutdown Scylla
--------------------------

.. code:: sh

   nodetool drain
   sudo systemctl stop scylla-server

Downgrade to previous release
-----------------------------
Scylla Enterprise 2018.1.5 started using new gcc packages. If you upgraded from 2018.1.x (x<5) to 2018.1.y (y>=5), the gcc packages must be removed before downgrading:

.. code:: sh

   sudo yum remove scylla-libgcc73 scylla-libstdc++73 -y
   sudo yum downgrade scylla\*-2018.x.y -y
   sudo yum install scylla-enterprise

If you did not upgrade from 2018.1.x (x<5) to 2018.1.y (y>=5), you can downgrade the packages directly:

.. code:: sh

   sudo yum downgrade scylla\*-2018.x.y

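The x<5 / y>=5 rule above can be expressed as a small guard. This is a hedged sketch: the helper name ``needs_gcc_removal`` and the hardcoded version strings are illustrative, and it assumes plain ``2018.1.N`` version strings where only the final patch number varies.

```shell
# Return success (0) when rolling back from 2018.1.y (y>=5) to
# 2018.1.x (x<5), i.e. when the gcc packages must be removed first.
# $1 = version being removed, $2 = downgrade target.
needs_gcc_removal() {
    from_patch=${1##*.}    # patch level of the version being removed
    to_patch=${2##*.}      # patch level of the downgrade target
    [ "$from_patch" -ge 5 ] && [ "$to_patch" -lt 5 ]
}

if needs_gcc_removal "2018.1.7" "2018.1.4"; then
    echo "remove scylla-libgcc73 scylla-libstdc++73 before downgrading"
else
    echo "plain downgrade is fine"
fi
```

With the example versions above the guard fires, matching the first branch of the procedure; for a 2018.1.4 to 2018.1.3 rollback it would not.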
Restore the configuration file
------------------------------

.. code:: sh

   sudo rm -rf /etc/scylla/scylla.yaml
   sudo cp -a /etc/scylla/scylla.yaml.backup-2018.x.z /etc/scylla/scylla.yaml

Start the node
--------------

.. code:: sh

   sudo systemctl start scylla-server

Validate
--------
Check the upgrade instruction above for validation. Once you are sure the node rollback is successful, move to the next node in the cluster.

View File

@@ -1,6 +0,0 @@
.. |OS| replace:: Ubuntu 14.04 or 16.04
.. |ROLLBACK| replace:: rollback
.. _ROLLBACK: /upgrade/upgrade-enterprise/upgrade-guide-from-2018.x.y-to-2018.x.z/upgrade-guide-from-2018.x.y-to-2018.x.z-ubuntu/#rollback-procedure
.. |APT| replace:: Scylla Enterprise deb repo
.. _APT: http://www.scylladb.com/enterprise-download/
.. include:: /upgrade/_common/upgrade-guide-from-2018.x.y-to-2018.x.z-ubuntu-and-debian.rst

View File

@@ -1,41 +0,0 @@
==================================================
Upgrade from Scylla Enterprise 2019.1 to 2020.1
==================================================
.. toctree::
   :hidden:
   :titlesonly:

   Red Hat Enterprise Linux and CentOS <upgrade-guide-from-2019.1-to-2020.1-rpm>
   Ubuntu 16.04 <upgrade-guide-from-2019.1-to-2020.1-ubuntu-16-04>
   Ubuntu 18.04 <upgrade-guide-from-2019.1-to-2020.1-ubuntu-18-04>
   Debian <upgrade-guide-from-2019.1-to-2020.1-debian>
   Metrics <metric-update-2019.1-to-2020.1>

.. raw:: html

   <div class="panel callout radius animated">
   <div class="row">
   <div class="medium-3 columns">
   <h5 id="getting-started">Upgrade to Scylla Enterprise 2020.1</h5>
   </div>
   <div class="medium-9 columns">

Upgrade guides are available for:

* :doc:`Upgrade Scylla Enterprise from 2019.1.x to 2020.1.y on Red Hat Enterprise Linux and CentOS <upgrade-guide-from-2019.1-to-2020.1-rpm>`
* :doc:`Upgrade Scylla Enterprise from 2019.1.x to 2020.1.y on Ubuntu 16.04 <upgrade-guide-from-2019.1-to-2020.1-ubuntu-16-04>`
* :doc:`Upgrade Scylla Enterprise from 2019.1.x to 2020.1.y on Ubuntu 18.04 <upgrade-guide-from-2019.1-to-2020.1-ubuntu-18-04>`
* :doc:`Upgrade Scylla Enterprise from 2019.1.x to 2020.1.y on Debian <upgrade-guide-from-2019.1-to-2020.1-debian>`
* :doc:`Scylla Enterprise Metrics Update - Scylla 2019.1 to 2020.1 <metric-update-2019.1-to-2020.1>`

.. raw:: html

   </div>
   </div>
   </div>

View File

@@ -1,109 +0,0 @@
====================================================================
Scylla Enterprise Metric Update - Scylla Enterprise 2019.1 to 2020.1
====================================================================
The following metrics are new in 2020.1 compared to 2019.1
CQL metrics
~~~~~~~~~~~
* *scylla_cql_deletes_per_ks* : Counts the number of CQL DELETE requests executed on particular keyspaces. The label 'who' indicates where the reqs come from (clients or DB internals)
* *scylla_cql_inserts_per_ks* : Counts the number of CQL INSERT requests executed on particular keyspaces. The label 'who' indicates where the reqs come from (clients or DB internals).
* *scylla_cql_reads_per_ks* : Counts the number of CQL SELECT requests executed on particular keyspaces. The label 'who' indicates where the reqs come from (clients or DB internals)
* *scylla_cql_select_allow_filtering* : Counts the number of SELECT query executions with ALLOW FILTERING option.
* *scylla_cql_select_bypass_caches* : Counts the number of SELECT query executions with BYPASS CACHE option.
* *scylla_cql_select_partition_range_scan* : Counts the number of SELECT query executions requiring partition range scan.
* *scylla_cql_select_partition_range_scan_no_bypass_cache* : Counts the number of SELECT query executions requiring partition range scan without BYPASS CACHE option.
* *scylla_cql_unpaged_select_queries_per_ks* : Counts the number of unpaged CQL SELECT requests against particular keyspaces.
* *scylla_cql_updates_per_ks* : Counts the number of CQL UPDATE requests executed on particular keyspaces. The label 'who' indicates where the reqs come from (clients or DB internals)
SSTable metrics
~~~~~~~~~~~~~~~
* *scylla_sstables_capped_local_deletion_time* : Was local deletion time capped at maximum allowed value in Statistics
* *scylla_sstables_capped_tombstone_deletion_time* : Was partition tombstone deletion time capped at maximum allowed value
* *scylla_sstables_cell_tombstone_writes* : Number of cell tombstones written
* *scylla_sstables_cell_writes* : Number of cells written
* *scylla_sstables_partition_reads* : Number of partitions read
* *scylla_sstables_partition_seeks* : Number of partitions seeked
* *scylla_sstables_partition_writes* : Number of partitions written
* *scylla_sstables_range_partition_reads* : Number of partition range flat mutation reads
* *scylla_sstables_range_tombstone_writes* : Number of range tombstones written
* *scylla_sstables_row_reads* : Number of rows read
* *scylla_sstables_row_writes* : Number of clustering rows written
* *scylla_sstables_single_partition_reads* : Number of single partition flat mutation reads
* *scylla_sstables_sstable_partition_reads* : Number of whole sstable flat mutation reads
* *scylla_sstables_static_row_writes* : Number of static rows written
* *scylla_sstables_tombstone_writes* : Number of tombstones written
Storage Proxy Metrics
~~~~~~~~~~~~~~~~~~~~~
* *scylla_storage_proxy_coordinator_cas_dropped_prune* : How many times a coordinator did not perform prune after cas
* *scylla_storage_proxy_coordinator_cas_failed_read_round_optimization* : Cas read rounds issued only if previous value is missing on some replica
* *scylla_storage_proxy_coordinator_cas_prune* : How many times paxos prune was done after successful cas operation
* *scylla_storage_proxy_coordinator_cas_read_contention* : How many contended reads were encountered
* *scylla_storage_proxy_coordinator_cas_read_latency* : Transactional read latency histogram
* *scylla_storage_proxy_coordinator_cas_read_timeouts* : Number of transactional read requests that failed due to a timeout
* *scylla_storage_proxy_coordinator_cas_read_unavailable* : Number of transactional read requests failed due to an "unavailable" error
* *scylla_storage_proxy_coordinator_cas_read_unfinished_commit* : Number of transaction commit attempts that occurred on read
* *scylla_storage_proxy_coordinator_cas_write_condition_not_met* : Number of transaction preconditions that did not match current values
* *scylla_storage_proxy_coordinator_cas_write_contention* : How many contended writes were encountered
* *scylla_storage_proxy_coordinator_cas_write_latency* : Transactional write latency histogram
* *scylla_storage_proxy_coordinator_cas_write_timeout_due_to_uncertainty* : How many times write timeout was reported because of uncertainty in the result
* *scylla_storage_proxy_coordinator_cas_write_timeouts* : Number of transactional write requests that failed due to a timeout
* *scylla_storage_proxy_coordinator_cas_write_unavailable* : Number of transactional write requests failed due to an "unavailable" error
* *scylla_storage_proxy_coordinator_cas_write_unfinished_commit* : Number of transaction commit attempts that occurred on write
* *scylla_storage_proxy_coordinator_foreground_read_repairs* : Number of foreground read repairs
* *scylla_storage_proxy_coordinator_reads_coordinator_outside_replica_set* : Number of CQL read requests which arrived at a non-replica and had to be forwarded to a replica
* *scylla_storage_proxy_coordinator_writes_coordinator_outside_replica_set* : Number of CQL write requests which arrived at a non-replica and had to be forwarded to a replica
* *scylla_storage_proxy_replica_cas_dropped_prune* : How many times a coordinator did not perform a prune after a CAS operation
* *scylla_tracing_keyspace_helper_bad_column_family_errors*
* *scylla_tracing_keyspace_helper_tracing_errors*
Other Metrics
~~~~~~~~~~~~~
* *scylla_stall_detector_reported* : Total number of reported stalls. Look in the traces for the exact reason
* *scylla_database_paused_reads* : The number of currently active reads that are temporarily paused.
* *scylla_database_paused_reads_permit_based_evictions* : The number of paused reads evicted to free up permits. Permits are required for new reads to start, and the database will evict paused reads (if any) to be able to admit new ones if there is a shortage of permits.
* *scylla_database_schema_changed* : The number of times the schema changed
* *scylla_memtables_failed_flushes* : Holds the number of failed memtable flushes. A high value in this metric may indicate a permanent failure to flush a memtable.
* *scylla_query_processor_queries* : Counts queries by consistency level.
* *scylla_reactor_abandoned_failed_futures* : Total number of abandoned failed futures, futures destroyed while still containing an exception
* *scylla_reactor_aio_errors* : Total aio errors
CDC Metrics (disabled in 2020.1.0)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* *scylla_cdc_operations_failed* : Number of failed CDC operations
* *scylla_cdc_operations_on_clustering_row_performed_failed* : Number of failed CDC operations that processed a clustering_row
* *scylla_cdc_operations_on_clustering_row_performed_total* : Number of total CDC operations that processed a clustering_row
* *scylla_cdc_operations_on_list_performed_failed* : Number of failed CDC operations that processed a list
* *scylla_cdc_operations_on_list_performed_total* : Number of total CDC operations that processed a list
* *scylla_cdc_operations_on_map_performed_failed* : Number of failed CDC operations that processed a map
* *scylla_cdc_operations_on_map_performed_total* : Number of total CDC operations that processed a map
* *scylla_cdc_operations_on_partition_delete_performed_failed* : Number of failed CDC operations that processed a partition_delete
* *scylla_cdc_operations_on_partition_delete_performed_total* : Number of total CDC operations that processed a partition_delete
* *scylla_cdc_operations_on_range_tombstone_performed_failed* : Number of failed CDC operations that processed a range_tombstone
* *scylla_cdc_operations_on_range_tombstone_performed_total* : Number of total CDC operations that processed a range_tombstone
* *scylla_cdc_operations_on_row_delete_performed_failed* : Number of failed CDC operations that processed a row_delete
* *scylla_cdc_operations_on_row_delete_performed_total* : Number of total CDC operations that processed a row_delete
* *scylla_cdc_operations_on_set_performed_failed* : Number of failed CDC operations that processed a set
* *scylla_cdc_operations_on_set_performed_total* : Number of total CDC operations that processed a set
* *scylla_cdc_operations_on_static_row_performed_failed* : Number of failed CDC operations that processed a static_row
* *scylla_cdc_operations_on_static_row_performed_total* : Number of total CDC operations that processed a static_row
* *scylla_cdc_operations_on_udt_performed_failed* : Number of failed CDC operations that processed a udt
* *scylla_cdc_operations_on_udt_performed_total* : Number of total CDC operations that processed a udt
* *scylla_cdc_operations_total* : Number of total CDC operations
* *scylla_cdc_operations_with_postimage_failed* : Number of failed operations that included postimage
* *scylla_cdc_operations_with_postimage_total* : Number of total operations that included postimage
* *scylla_cdc_operations_with_preimage_failed* : Number of failed operations that included preimage
* *scylla_cdc_operations_with_preimage_total* : Number of total operations that included preimage
* *scylla_cdc_preimage_selects_failed* : Number of failed preimage queries performed
* *scylla_cdc_preimage_selects_total* : Number of total preimage queries performed
* *scylla_compaction_manager_pending_compactions* : Holds the number of compaction tasks waiting for an opportunity to run.
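All of the metrics above are exported in Prometheus text format on the node's metrics port (9180 by default). A minimal sketch of filtering a metric family out of that output; the sample lines and their values below are illustrative, not captured from a real node:

```shell
# Illustrative sample of the Prometheus-format lines Scylla exposes
# (metric values here are made up for the example):
cat > /tmp/scylla_metrics_sample.txt <<'EOF'
scylla_cdc_operations_total{shard="0"} 1024
scylla_cdc_operations_failed{shard="0"} 3
scylla_sstables_row_writes{shard="0"} 99182
EOF

# On a live node the same lines come from the metrics endpoint, e.g.:
#   curl -s http://localhost:9180/metrics | grep '^scylla_cdc_'
grep '^scylla_cdc_' /tmp/scylla_metrics_sample.txt
```

The same filter works for any of the metric prefixes documented above (``scylla_sstables_``, ``scylla_storage_proxy_``, and so on).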

.. |OS| replace:: Debian 9
.. |ROLLBACK| replace:: rollback
.. _ROLLBACK: /upgrade/upgrade-enterprise/upgrade-guide-from-2019.1-to-2020.1/upgrade-guide-from-2019.1-to-2020.1-debian/#rollback-procedure
.. |APT| replace:: Scylla Enterprise Deb repo
.. _APT: https://www.scylladb.com/customer-portal/?product=ent&platform=debian-9&version=stable-release-2020.1
.. |OPENJDK| replace:: openjdk-8-jre-headless
.. include:: /upgrade/_common/upgrade-guide-from-2019.1-to-2020.1-ubuntu-and-debian.rst

=============================================================================================
Upgrade Guide - Scylla Enterprise 2019.1 to 2020.1 for Red Hat Enterprise Linux 7 or CentOS 7
=============================================================================================
This document is a step-by-step procedure for upgrading from Scylla Enterprise 2019.1 to Scylla Enterprise 2020.1, and for rolling back to 2019.1 if required.
Applicable versions
===================
This guide covers upgrading Scylla from the following versions: 2019.1.7 or later to Scylla Enterprise version 2020.1.y, on the following platforms:
.. note::
This upgrade procedure only works from **2019.1.7** or later. If you have an older Scylla Enterprise 2019.1.x version, please contact the Scylla Support team for advice.
* Red Hat Enterprise Linux, version 7 and later
* CentOS, version 7 and later
* Fedora: packages are no longer provided
Upgrade Procedure
=================
.. include:: /upgrade/_common/warning.rst
.. include:: /upgrade/upgrade-enterprise/_common/enterprise_2020.1_warnings.rst
A Scylla upgrade is a rolling procedure that does not require a full cluster shutdown. For each of the nodes in the cluster, serially (i.e. one at a time), you will:
* Check cluster schema
* Drain node and backup the data
* Backup configuration file
* Stop Scylla
* Download and install new Scylla packages
* Start Scylla
* Validate that the upgrade was successful
Apply the following procedure **serially** on each node. Do not move to the next node before validating the node is up and running with the new version.
**During** the rolling upgrade, it is highly recommended:
* Not to use new 2020.1 features.
* Not to run administration functions, like repairs, refresh, rebuild or add or remove nodes. See `sctool <https://manager.docs.scylladb.com/stable/sctool/index.html>`_ for suspending Scylla Manager scheduled or running repairs.
* Not to apply schema changes.
.. include:: /upgrade/_common/upgrade_to_2020_warning.rst
Upgrade steps
=============
Check cluster schema
--------------------
Make sure that all nodes have their schema in sync prior to the upgrade. The upgrade will fail if there is a schema disagreement between nodes.
.. code:: sh
nodetool describecluster
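The node is safe to upgrade only when ``describecluster`` reports a single schema version shared by all nodes. A small sketch of that check, run here against a sample of the "Schema versions" section (the UUID and IP addresses are placeholders; on a real node you would pipe the live output in instead):

```shell
# Sample "Schema versions" lines as printed by `nodetool describecluster`
# (placeholder UUID and IPs):
schema_versions='a1b2c3d4-0000-0000-0000-000000000000: [10.0.0.1, 10.0.0.2, 10.0.0.3]'

# Each schema version line contains a bracketed node list; count them.
# Exactly one line means the cluster is in schema agreement:
n=$(printf '%s\n' "$schema_versions" | grep -c '\[')
if [ "$n" -eq 1 ]; then
  echo "schema in agreement"
else
  echo "schema disagreement: $n versions" >&2
fi
```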
Drain node and backup the data
------------------------------
Before any major procedure, like an upgrade, it is recommended to back up all the data to an external device. In Scylla, backup is done using the ``nodetool snapshot`` command. For **each** node in the cluster, run the following commands:
.. code:: sh
nodetool drain
nodetool snapshot
Take note of the directory name that nodetool gives you, and copy all the directories having this name under ``/var/lib/scylla`` to a backup device.
When the upgrade is complete (all nodes), the snapshot should be removed by ``nodetool clearsnapshot -t <snapshot>``, or you risk running out of space.
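The copy step above can be sketched as a small helper that preserves the keyspace/table layout under the backup root. This is not an official tool; the data directory, backup mount point, and snapshot tag in the example call are placeholders for the values on your system:

```shell
# Copy every directory belonging to a given snapshot tag to a backup
# location, keeping the keyspace/table-UUID layout (a sketch; paths
# and the tag are placeholders):
backup_snapshots() {
  data_dir=$1; backup_dir=$2; tag=$3
  find "$data_dir" -type d -path "*/snapshots/$tag" | while read -r d; do
    dest="$backup_dir/${d#"$data_dir"/}"
    mkdir -p "$dest"
    cp -a "$d/." "$dest/"
  done
}

# Example (the tag comes from the `nodetool snapshot` output):
# backup_snapshots /var/lib/scylla/data /mnt/backup 1683500000
```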
Backup configuration file
-------------------------
.. code:: sh
sudo cp -a /etc/scylla/scylla.yaml /etc/scylla/scylla.yaml.backup-2019.1
Stop Scylla
-----------
.. code:: sh
sudo systemctl stop scylla-server
Download and install the new release
------------------------------------
Before upgrading, check what version you are running now using ``rpm -qa | grep scylla-server``. You should use the same version in case you want to :ref:`rollback <upgrade-2019.1-2020.1-rpm-rollback-procedure>` the upgrade. If you are not running a 2019.1.x version, stop right here! This guide only covers 2019.1.x to 2020.1.y upgrades.
To upgrade:
1. Update the `Scylla RPM Enterprise repo <https://www.scylladb.com/customer-portal/?product=ent&platform=centos7&version=stable-release-2020.1>`_ to **2020.1**
2. Install:
.. code:: sh
sudo yum update scylla\* -y
Start the node
--------------
.. code:: sh
sudo systemctl start scylla-server
Validate
--------
1. Check cluster status with ``nodetool status`` and make sure **all** nodes, including the one you just upgraded, are in UN status.
2. Use ``curl -X GET "http://localhost:10000/storage_service/scylla_release_version"`` to check the Scylla version.
3. Use ``journalctl _COMM=scylla`` to check there are no new errors in the log.
4. Check again after 2 minutes to validate that no new issues are introduced.
Once you are sure the node upgrade is successful, move to the next node in the cluster.
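The "all nodes UN" check from step 1 can be scripted. A sketch that scans ``nodetool status`` output for any node line whose status is not UN; the exact set of status codes can vary between versions, and the sample lines in the usage comment are illustrative:

```shell
# Succeed only if no node line reports a status other than UN
# (UJ/UL joining or leaving, DN/DJ/DL/DM down variants).
# Reads `nodetool status` output on stdin — a sketch, not an
# exhaustive parser of the status format:
all_nodes_un() {
  ! grep -Eq '^(UJ|UL|UM|DN|DJ|DL|DM)[[:space:]]'
}

# Usage on a live node:
#   nodetool status | all_nodes_un && echo "all nodes UN"
```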
* More on :doc:`Scylla Metrics Update - Scylla Enterprise 2019.1 to 2020.1<metric-update-2019.1-to-2020.1>`
.. _upgrade-2019.1-2020.1-rpm-rollback-procedure:
Rollback Procedure
==================
.. include:: /upgrade/_common/warning_rollback.rst
The following procedure describes a rollback from Scylla Enterprise release 2020.1.x to 2019.1.y. Apply this procedure if an upgrade from 2019.1 to 2020.1 failed before completing on all nodes. Use this procedure only for nodes you upgraded to 2020.1.
Scylla rollback is a rolling procedure that does **not** require a full cluster shutdown.
For each of the nodes you roll back to 2019.1, you will:
* Drain the node and stop Scylla
* Retrieve the old Scylla packages
* Restore the configuration file
* Restart Scylla
* Validate the rollback success
Apply the following procedure **serially** on each node. Do not move to the next node before validating the node is up and running with the rolled-back version.
Rollback steps
==============
Gracefully shutdown Scylla
--------------------------
.. code:: sh
nodetool drain
sudo systemctl stop scylla-server
Download and install the previous release
-----------------------------------------
1. Remove the old repo file.
.. code:: sh
sudo rm -rf /etc/yum.repos.d/scylla.repo
2. Update the `Scylla RPM Enterprise 2019.1 repo <https://www.scylladb.com/customer-portal/?product=ent&platform=centos7&version=stable-release-2019.1>`_ to **2019.1**
3. Install
.. code:: sh
sudo yum clean all
sudo rm -rf /var/cache/yum
sudo yum remove scylla\*tools-core
sudo yum downgrade scylla\* -y
sudo yum install scylla-enterprise
Restore the configuration file
------------------------------
.. code:: sh
sudo rm -rf /etc/scylla/scylla.yaml
sudo cp -a /etc/scylla/scylla.yaml.backup-2019.1 /etc/scylla/scylla.yaml
Restore system tables
---------------------
Restore all tables of **system** and **system_schema** from the previous snapshot, as 2020.1 uses a different set of system tables. Reference doc: :doc:`Restore from a Backup and Incremental Backup </operating-scylla/procedures/backup-restore/restore/>`
.. code:: sh
cd /var/lib/scylla/data/keyspace_name/table_name-UUID/snapshots/<snapshot_name>/
sudo cp -r * /var/lib/scylla/data/keyspace_name/table_name-UUID/
sudo chown -R scylla:scylla /var/lib/scylla/data/keyspace_name/table_name-UUID/
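The per-table commands above have to be repeated for every table under the ``system`` and ``system_schema`` keyspaces. A sketch of that loop; the data directory and snapshot name in the example call are placeholders:

```shell
# Restore every table of the system keyspaces from a named snapshot
# (a sketch; fix ownership afterwards as shown above):
restore_system_tables() {
  data_dir=$1; snap=$2
  for ks in system system_schema; do
    for t in "$data_dir/$ks"/*/; do
      # Skip tables that have no snapshot with this name
      [ -d "${t}snapshots/$snap" ] || continue
      cp -r "${t}snapshots/$snap/." "$t"
    done
  done
}

# Example (placeholder snapshot name):
# restore_system_tables /var/lib/scylla/data my_snapshot_tag
# sudo chown -R scylla:scylla /var/lib/scylla/data/system*
```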
Start the node
--------------
.. code:: sh
sudo systemctl start scylla-server
Validate
--------
Check the validation steps in the upgrade instructions above. Once you are sure the node rollback is successful, move to the next node in the cluster.

.. |OS| replace:: 16.04
.. |ROLLBACK| replace:: rollback
.. _ROLLBACK: /upgrade/upgrade-enterprise/upgrade-guide-from-2019.1-to-2020.1/upgrade-guide-from-2019.1-to-2020.1-ubuntu-16-04/#rollback-procedure
.. |APT| replace:: Scylla Enterprise Deb repo
.. _APT: https://www.scylladb.com/customer-portal/?product=ent&platform=ubuntu-16.04&version=stable-release-2020.1
.. |OPENJDK| replace:: openjdk-8-jre-headless
.. include:: /upgrade/_common/upgrade-guide-from-2019.1-to-2020.1-ubuntu-and-debian.rst

.. |OS| replace:: 18.04
.. |ROLLBACK| replace:: rollback
.. _ROLLBACK: /upgrade/upgrade-enterprise/upgrade-guide-from-2019.1-to-2020.1/upgrade-guide-from-2019.1-to-2020.1-ubuntu-18-04/#rollback-procedure
.. |APT| replace:: Scylla Enterprise Deb repo
.. _APT: https://www.scylladb.com/customer-portal/?product=ent&platform=ubuntu-18.04&version=stable-release-2020.1
.. |OPENJDK| replace:: openjdk-8-jre-headless
.. include:: /upgrade/_common/upgrade-guide-from-2019.1-to-2020.1-ubuntu-and-debian.rst

=====================================================
Upgrade Scylla Enterprise 2019
=====================================================
.. toctree::
:titlesonly:
:hidden:
Red Hat Enterprise Linux and CentOS <upgrade-guide-from-2019.x.y-to-2019.x.z-rpm>
Ubuntu <upgrade-guide-from-2019.x.y-to-2019.x.z-ubuntu>
Debian <upgrade-guide-from-2019.x.y-to-2019.x.z-debian>
.. raw:: html
<div class="panel callout radius animated">
<div class="row">
<div class="medium-3 columns">
<h5 id="getting-started">Upgrade Scylla Enterprise</h5>
</div>
<div class="medium-9 columns">
Upgrade guides are available for:
* :doc:`Upgrade Scylla Enterprise from 2019.x.y to 2019.x.z on Red Hat Enterprise Linux and CentOS <upgrade-guide-from-2019.x.y-to-2019.x.z-rpm>`
* :doc:`Upgrade Scylla Enterprise from 2019.x.y to 2019.x.z on Ubuntu <upgrade-guide-from-2019.x.y-to-2019.x.z-ubuntu>`
* :doc:`Upgrade Scylla Enterprise from 2019.x.y to 2019.x.z on Debian <upgrade-guide-from-2019.x.y-to-2019.x.z-debian>`
.. raw:: html
</div>
</div>
</div>

.. |OS| replace:: Debian 9
.. |ROLLBACK| replace:: rollback
.. _ROLLBACK: /upgrade/upgrade-enterprise/upgrade-guide-from-2019.x.y-to-2019.x.z/upgrade-guide-from-2019.x.y-to-2019.x.z-debian/#rollback-procedure
.. |APT| replace:: Scylla Enterprise deb repo
.. _APT: http://www.scylladb.com/enterprise-download/debian9/
.. include:: /upgrade/_common/upgrade-guide-from-2019.x.y-to-2019.x.z-ubuntu-and-debian.rst

====================================================================================================
Upgrade Guide - Scylla Enterprise 2019.x.y to 2019.x.z for Red Hat Enterprise Linux 7 or CentOS 7
====================================================================================================
This document is a step-by-step procedure for upgrading from Scylla Enterprise 2019.x.y to 2019.x.z.
Applicable versions
===================
This guide covers upgrading Scylla Enterprise from the following versions: 2019.x.y to 2019.x.z, on the following platforms:
* Red Hat Enterprise Linux, version 7 and later
* CentOS, version 7 and later
* Fedora: packages are no longer provided
Upgrade Procedure
=================
.. include:: /upgrade/_common/warning.rst
.. include:: /upgrade/upgrade-enterprise/_common/enterprise_2019.1_warnings.rst
A Scylla upgrade is a rolling procedure that does not require a full cluster shutdown. For each of the nodes in the cluster, serially (i.e. one at a time), you will:
* Drain node and backup the data
* Check your current release
* Backup configuration file
* Stop Scylla
* Download and install new Scylla packages
* Start Scylla
* Validate that the upgrade was successful
Apply the following procedure **serially** on each node. Do not move to the next node before validating the node is up and running with the new version.
**During** the rolling upgrade, it is highly recommended:
* Not to use new 2019.x.z features
* Not to run administration functions, like repairs, refresh, rebuild or add or remove nodes
* Not to apply schema changes
Upgrade steps
=============
Drain node and backup the data
------------------------------
Before any major procedure, like an upgrade, it is recommended to back up all the data to an external device. In Scylla, backup is done using the ``nodetool snapshot`` command. For **each** node in the cluster, run the following commands:
.. code:: sh
nodetool drain
nodetool snapshot
Take note of the directory name that nodetool gives you, and copy all the directories having this name under ``/var/lib/scylla`` to a backup device.
When the upgrade is complete (all nodes), the snapshot should be removed by ``nodetool clearsnapshot -t <snapshot>``, or you risk running out of space.
Backup configuration file
-------------------------
.. code:: sh
sudo cp -a /etc/scylla/scylla.yaml /etc/scylla/scylla.yaml.backup-2019.x.z
Stop Scylla
-----------
.. code:: sh
sudo systemctl stop scylla-server
Download and install the new release
------------------------------------
Before upgrading, check what version you are running now using ``rpm -qa | grep scylla-server``. You should use the same version in case you want to :ref:`rollback <upgrade-2019.x.y-to-2019.x.z-rpm-rollback-procedure>` the upgrade. If you are not running a 2019.x.y version, stop right here! This guide only covers 2019.x.y to 2019.x.z upgrades.
To upgrade:
1. Update the `Scylla Enterprise RPM repo <http://www.scylladb.com/enterprise-download/centos_rpm>`_ to **2019.x**
2. Install:
.. code:: sh
sudo yum update scylla\* -y
Start the node
--------------
.. code:: sh
sudo systemctl start scylla-server
Validate
--------
1. Check cluster status with ``nodetool status`` and make sure **all** nodes, including the one you just upgraded, are in UN status.
2. Use ``curl -X GET "http://localhost:10000/storage_service/scylla_release_version"`` to check the Scylla version.
3. Use ``journalctl _COMM=scylla`` to check there are no new errors in the log.
4. Check again after two minutes to validate that no new issues are introduced.
Once you are sure the node upgrade is successful, move to the next node in the cluster.
.. _upgrade-2019.x.y-to-2019.x.z-rpm-rollback-procedure:
Rollback Procedure
==================
.. include:: /upgrade/_common/warning_rollback.rst
The following procedure describes a rollback from Scylla Enterprise release 2019.x.z to 2019.x.y. Apply this procedure if an upgrade from 2019.x.y to 2019.x.z failed before completing on all nodes. Use this procedure only for nodes you upgraded to 2019.x.z.
Scylla rollback is a rolling procedure that does **not** require a full cluster shutdown.
For each of the nodes you roll back to 2019.x.y, you will:
* Drain the node and stop Scylla
* Downgrade to previous release
* Restore the configuration file
* Restart Scylla
* Validate the rollback success
Apply the following procedure **serially** on each node. Do not move to the next node before validating the node is up and running with the rolled-back version.
Rollback steps
==============
Gracefully shutdown Scylla
--------------------------
.. code:: sh
nodetool drain
sudo systemctl stop scylla-server
Downgrade to previous release
--------------------------------------------------
1. Install
.. code:: sh
sudo yum downgrade scylla\*-2019.x.y -y
Restore the configuration file
------------------------------
.. code:: sh
sudo rm -rf /etc/scylla/scylla.yaml
sudo cp -a /etc/scylla/scylla.yaml.backup-2019.x.z /etc/scylla/scylla.yaml
Start the node
--------------
.. code:: sh
sudo systemctl start scylla-server
Validate
--------
Check the validation steps in the upgrade instructions above. Once you are sure the node rollback is successful, move to the next node in the cluster.

.. |OS| replace:: Ubuntu 16.04 or 18.04
.. |ROLLBACK| replace:: rollback
.. _ROLLBACK: /upgrade/upgrade-enterprise/upgrade-guide-from-2019.x.y-to-2019.x.z/upgrade-guide-from-2019.x.y-to-2019.x.z-ubuntu/#rollback-procedure
.. |APT| replace:: Scylla Enterprise deb repo
.. _APT: http://www.scylladb.com/enterprise-download/
.. include:: /upgrade/_common/upgrade-guide-from-2019.x.y-to-2019.x.z-ubuntu-and-debian.rst
