Compare commits

...

214 Commits

Author SHA1 Message Date
Beni Peled
58acf071bf release: prepare for 5.2.6 2023-07-30 14:19:28 +03:00
Raphael S. Carvalho
d2369fc546 cached_file: Evict unused pages that aren't linked to LRU yet
It was found that cached_file dtor can hit the following assert
after OOM

cached_file_test: utils/cached_file.hh:379: cached_file::~cached_file(): Assertion _cache.empty()' failed.`

cached_file's dtor iterates through all entries and evict those
that are linked to LRU, under the assumption that all unused
entries were linked to LRU.

That's partially correct. get_page_ptr() may fetch more than 1
page due to read ahead, but it will only call cached_page::share()
on the first page, the one that will be consumed now.

share() is responsible for automatically placing the page into
LRU once refcount drops to zero.

If the read is aborted midway, before cached_file has a chance
to hit the 2nd page (read ahead) in cache, it will remain there
with refcount 0 and unlinked to LRU, in hope that a subsequent
read will bring it out of that state.

Our main user of cached_file is per-sstable index caching.
If the scenario above happens, and the sstable and its associated
cached_file is destroyed, before the 2nd page is hit, cached_file
will not be able to clear all the cache because some of the
pages are unused and not linked.

A page read ahead will be linked into LRU so it doesn't sit in
memory indefinitely. Also allowing for cached_file dtor to
clear all cache if some of those pages brought in advance
aren't fetched later.

A reproducer was added.

Fixes #14814.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #14818

(cherry picked from commit 050ce9ef1d)
2023-07-28 13:56:28 +02:00
Kamil Braun
6273c4df35 test: use correct timestamp resolution in test_group0_history_clearing_old_entries
In 10c1f1dc80 I fixed
`make_group0_history_state_id_mutation` to use correct timestamp
resolution (microseconds instead of milliseconds) which was supposed to
fix the flakiness of `test_group0_history_clearing_old_entries`.

Unfortunately, the test is still flaky, although now it's failing at a
later step -- this is because I was sloppy and I didn't adjust this
second part of the test to also use microsecond resolution. The test is
counting the number of entries in the `system.group0_history` table that
are older than a certain timestamp, but it's doing the counting using
millisecond resolution, causing it to give results that are off by one
sometimes.

Fix it by using microseconds everywhere.

Fixes #14653

Closes #14670

(cherry picked from commit 9d4b3c6036)
2023-07-27 15:46:37 +02:00
Raphael S. Carvalho
752984e774 Fix stack-use-after-return in mutation source excluding staging
The new test detected a stack-use-after-return when using table's
as_mutation_source_excluding_staging() for range reads.

This doesn't really affect view updates that generate single
key reads only. So the problem was only stressed in the recently
added test. Otherwise, we'd have seen it when running dtests
(in debug mode) that stress the view update path from staging.

The problem happens because the closure was feeded into
a noncopyable_function that was taken by reference. For range
reads, we defer before subsequent usage of the predicate.
For single key reads, we only defer after finished using
the predicate.

Fix is about using sstable_predicate type, so there won't
be a need to construct a temporary object on stack.

Fixes #14812.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #14813

(cherry picked from commit 0ac43ea877)
2023-07-26 14:30:32 +03:00
Raphael S. Carvalho
986491447b table: Optimize creation of reader excluding staging for view building
View building from staging creates a reader from scratch (memtable
+ sstables - staging) for every partition, in order to calculate
the diff between new staging data and data in base sstable set,
and then pushes the result into the view replicas.

perf shows that the reader creation is very expensive:
+   12.15%    10.75%  reactor-3        scylla             [.] lexicographical_tri_compare<compound_type<(allow_prefixes)0>::iterator, compound_type<(allow_prefixes)0>::iterator, legacy_compound_view<compound_type<(allow_prefixes)0> >::tri_comparator::operator()(managed_bytes_basic_view<(mutable_view)0>, managed_bytes
+   10.01%     9.99%  reactor-3        scylla             [.] boost::icl::is_empty<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> >
+    8.95%     8.94%  reactor-3        scylla             [.] legacy_compound_view<compound_type<(allow_prefixes)0> >::tri_comparator::operator()
+    7.29%     7.28%  reactor-3        scylla             [.] dht::ring_position_tri_compare
+    6.28%     6.27%  reactor-3        scylla             [.] dht::tri_compare
+    4.11%     3.52%  reactor-3        scylla             [.] boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sst+    4.09%     4.07%  reactor-3        scylla             [.] sstables::index_consume_entry_context<sstables::index_consumer>::process_state
+    3.46%     0.93%  reactor-3        scylla             [.] sstables::sstable_run::will_introduce_overlapping
+    2.53%     2.53%  reactor-3        libstdc++.so.6     [.] std::_Rb_tree_increment
+    2.45%     2.45%  reactor-3        scylla             [.] boost::icl::non_empty::exclusive_less<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> >
+    2.14%     2.13%  reactor-3        scylla             [.] boost::icl::exclusive_less<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> >
+    2.07%     2.07%  reactor-3        scylla             [.] logalloc::region_impl::free
+    2.06%     1.91%  reactor-3        scylla             [.] sstables::index_consumer::consume_entry(sstables::parsed_partition_index_entry&&)::{lambda()#1}::operator()() const::{lambda()#1}::operator()
+    2.04%     2.04%  reactor-3        scylla             [.] boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sst+    1.87%     0.00%  reactor-3        [kernel.kallsyms]  [k] entry_SYSCALL_64_after_hwframe
+    1.86%     0.00%  reactor-3        [kernel.kallsyms]  [k] do_syscall_64
+    1.39%     1.38%  reactor-3        libc.so.6          [.] __memcmp_avx2_movbe
+    1.37%     0.92%  reactor-3        scylla             [.] boost::icl::segmental::join_left<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::
+    1.34%     1.33%  reactor-3        scylla             [.] logalloc::region_impl::alloc_small
+    1.33%     1.33%  reactor-3        scylla             [.] seastar::memory::small_pool::add_more_objects
+    1.30%     0.35%  reactor-3        scylla             [.] seastar::reactor::do_run
+    1.29%     1.29%  reactor-3        scylla             [.] seastar::memory::allocate
+    1.19%     0.05%  reactor-3        libc.so.6          [.] syscall
+    1.16%     1.04%  reactor-3        scylla             [.] boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sst
+    1.07%     0.79%  reactor-3        scylla             [.] sstables::partitioned_sstable_set::insert

That shows some significant amount of work for inserting sstables
into the interval map and maintaining the sstable run (which sorts
fragments by first key and checks for overlapping).

The interval map is known for having issues with L0 sstables, as
it will have to be replicated almost to every single interval
stored by the map, causing terrible space and time complexity.
With enough L0 sstables, it can fall into quadratic behavior.

This overhead is fixed by not building a new fresh sstable set
when recreating the reader, but rather supplying a predicate
to sstable set that will filter out staging sstables when
creating either a single-key or range scan reader.

This could have another benefit over today's approach which
may incorrectly consider a staging sstable as non-staging, if
the staging sst wasn't included in the current batch for view
building.

With this improvement, view building was measured to be 3x faster.

from
INFO  2023-06-16 12:36:40,014 [shard 0] view_update_generator - Processed keyspace1.standard1: 5 sstables in 963957ms = 50kB/s

to
INFO  2023-06-16 14:47:12,129 [shard 0] view_update_generator - Processed keyspace1.standard1: 5 sstables in 319899ms = 150kB/s

Refs #14089.
Fixes #14244.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit 1d8cb32a5d)
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #14764
2023-07-20 16:46:15 +03:00
Takuya ASADA
a05bb26cd6 scylla_fstrim_setup: start scylla-fstrim.timer on setup
Currently, scylla_fstrim_setup does not start scylla-fstrim.timer and
just enables it, so the timer starts only after rebooted.
This is incorrect behavior, we start start it during the setup.

Also, unmask is unnecessary for enabling the timer.

Fixes #14249

Closes #14252

(cherry picked from commit c70a9cbffe)

Closes #14421
2023-07-18 16:05:09 +03:00
Michał Chojnowski
41aef6dc96 partition_snapshot_reader.hh: fix iterator invalidation in do_refresh_state
do_refresh_state() keeps iterators to rows_entry in a vector.
This vector might be resized during the procedure, triggering
memory reclaim and invalidating the iterators, which can cause
arbitrarily long loops and/or a segmentation fault during make_heap().
To fix this, do_refresh_state has to always be called from the allocating
section.

Additionally, it turns out that the first do_refresh_state is useless,
because reset_state() doesn't set _change_mark. This causes do_refresh_state
to be needlessly repeated during a next_row() or next_range_tombstone() which
happens immediately after it. Therefore this patch moves the _change_mark
assignment from maybe_refresh_state to do_refresh_state, so that the change mark
is properly set even after the first refresh.

Fixes #14696

Closes #14697
2023-07-17 14:20:37 +02:00
Botond Dénes
aa5e904c40 repair: Release permit earlier when the repair_reader is done
Consider

- 10 repair instances take all the 10 _streaming_concurrency_sem

- repair readers are done but the permits are not released since they
  are waiting for view update _registration_sem

- view updates trying to take the _streaming_concurrency_sem to make
  progress of view update so it could release _registration_sem, but it
  could not take _streaming_concurrency_sem since the 10 repair
  instances have taken them

- deadlock happens

Note, when the readers are done, i.e., reaching EOS, the repair reader
replaces the underlying (evictable) reader with an empty reader. The
empty reader is not evictable, so the resources cannot be forcibly
released.

To fix, release the permits manually as soon as the repair readers are
done even if the repair job is waiting for _registration_sem.

Fixes #14676

Closes #14677

(cherry picked from commit 1b577e0414)
2023-07-14 18:18:43 +03:00
Marcin Maliszkiewicz
eff2fe79b1 alternator: close output_stream when exception is thrown during response streaming
When exception occurs and we omit closing output_stream then the whole process is brought down
by an assertion in ~output_stream.

Fixes https://github.com/scylladb/scylladb/issues/14453
Relates https://github.com/scylladb/scylladb/issues/14403

Closes #14454

(cherry picked from commit 6424dd5ec4)
2023-07-13 23:27:46 +03:00
Nadav Har'El
ee8b26167b Merge 'Yield while building large results in Alternator - rjson::print, executor::batch_get_item' from Marcin Maliszkiewicz
Adds preemption points used in Alternator when:
 - sending bigger json response
 - building results for BatchGetItem

I've tested manually by inserting in preemptible sections (e.g. before `os.write`) code similar to:

    auto start  = std::chrono::steady_clock::now();
    do { } while ((std::chrono::steady_clock::now() - start) < 100ms);

and seeing reactor stall times. After the patch they
were not increasing while before they kept building up due to no preemption.

Refs #7926
Fixes #13689

Closes #12351

* github.com:scylladb/scylladb:
  alternator: remove redundant flush call in make_streamed
  utils: yield when streaming json in print()
  alternator: yield during BatchGetItem operation

(cherry picked from commit d2e089777b)
2023-07-13 23:27:38 +03:00
Yaron Kaikov
02bc54d4b6 release: prepare for 5.2.5 2023-07-13 14:23:18 +03:00
Avi Kivity
c9a5c4c876 Merge ' message: match unknown tenants to the default tenant' from Botond Dénes
On connection setup, the isolation cookie of the connection is matched to the appropriate scheduling group. This is achieved by iterating over the known statement tenant connection types as well as the system connections and choosing the one with a matching name.

If a match is not found, it is assumed that the cluster is upgraded and the remote node has a scheduling group the local one doesn't have. To avoid demoting a scheduling group of unknown importance, in this case the default scheduling group is chosen.

This is problematic when upgrading an OSS cluster to an enterprise version, as the scheduling groups of the enterprise service-levels will match none of the statement tenants and will hence fall-back to the default scheduling group. As a consequence, while the cluster is mixed, user workload on old (OSS) nodes, will be executed under the system scheduling group and concurrency semaphore. Not only does this mean that user workloads are directly competing for resources with system ones, but the two workloads are now sharing the semaphore too, reducing the available throughput. This usually manifests in queries timing out on the old (OSS) nodes in the cluster.

This PR proposes to fix this, by recognizing that the unknown scheduling group is in fact a tenant this node doesn't know yet, and matching it with the default statement tenant. With this, order should be restored, with service-level connections being recognized as user connections and being executed in the statement scheduling group and the statement (user) concurrency semaphore.

I tested this manually, by creating a cluster of 2 OSS nodes, then upgrading one of the nodes to enterprise and verifying (with extra logging) that service level connections are matched to the default statement tenant after the PR and they indeed match to the default scheduling group before.

Fixes: #13841
Fixes: #12552

Closes #13843

* github.com:scylladb/scylladb:
  message: match unknown tenants to the default tenant
  message: generalize per-tenant connection types

(cherry picked from commit a7c2c9f92b)
2023-07-12 15:31:48 +03:00
Tomasz Grabiec
1a6f4389ae Merge 'atomic_cell: compare value last' from Benny Halevy
Currently, when two cells have the same write timestamp
and both are alive or expiring, we compare their value first,
before checking if either of them is expiring
and if both are expiring, comparing their expiration time
and ttl value to determine which of them will expire
later or was written later.

This was based on an early version of Cassandra.
However, the Cassandra implementation rightfully changed in
e225c88a65 ([CASSANDRA-14592](https://issues.apache.org/jira/browse/CASSANDRA-14592)),
where the cell expiration is considered before the cell value.

To summarize, the motivation for this change is three fold:
1. Cassandra compatibility
2. Prevent an edge case where a null value is returned by select query when an expired cell has a larger value than a cell with later expiration.
3. A generalization of the above: value-based reconciliation may cause select query to return a mixture of upserts, if multiple upserts use the same timeastamp but have different expiration times.  If the cell value is considered before expiration, the select result may contain cells from different inserts, while reconciling based the expiration times will choose cells consistently from either upserts, as all cells in the respective upsert will carry the same expiration time.

\Fixes scylladb/scylladb#14182

Also, this series:
- updates dml documentation
- updates internal documentation
- updates and adds unit tests and cql pytest reproducing #14182

\Closes scylladb/scylladb#14183

* github.com:scylladb/scylladb:
  docs: dml: add update ordering section
  cql-pytest: test_using_timestamp: add tests for rewrites using same timestamp
  mutation_partition: compare_row_marker_for_merge: consider ttl in case expiry is the same
  atomic_cell: compare_atomic_cell_for_merge: update and add documentation
  compare_atomic_cell_for_merge: compare value last for live cells
  mutation_test: test_cell_ordering: improve debuggability

(cherry picked from commit 87b4606cd6)

Closes #14649
2023-07-12 10:09:56 +03:00
Calle Wilund
1088c3e24a storage_proxy: Make split_stats resilient to being called from different scheduling group
Fixes #11017

When doing writes, storage proxy creates types deriving from abstract_write_response_handler.
These are created in the various scheduling groups executing the write inducing code. They
pick up a group-local reference to the various metrics used by SP. Normally all code
using (and esp. modifying) these metrics are executed in the same scheduling group.
However, if gossip sees a node go down, it will notify listeners, which eventually
calls get_ep_stat and register_metrics.
This code (before this patch) uses _active_ scheduling group to eventually add
metrics, using a local dict as guard against double regs. If, as described above,
we're called in a different sched group than the original one however, this
can cause double registrations.

Fixed here by keeping a reference to creating scheduling group and using this, not
active one, when/if creating new metrics.

Closes #14636
2023-07-12 09:24:56 +03:00
Botond Dénes
c9cb8dcfd0 Merge '[backport 5.2] view: fix range tombstone handling on flushes in view_updating_consumer' from Michał Chojnowski
View update routines accept mutation objects.
But what comes out of staging sstable readers is a stream of mutation_fragment_v2 objects.
To build view updates after a repair/streaming, we have to convert the fragment stream into mutations. This is done by piping the stream to mutation_rebuilder_v2.

To keep memory usage limited, the stream for a single partition might have to be split into multiple partial mutation objects. view_update_consumer does that, but in improper way -- when the split/flush happens inside an active range tombstone, the range tombstone isn't closed properly. This is illegal, and triggers an internal error.

This patch fixes the problem by closing the active range tombstone (and reopening in the same position in the next mutation object).

The tombstone is closed just after the last seen clustered position. This is not necessary for correctness -- for example we could delay all processing of the range tombstone until we see its end bound -- but it seems like the most natural semantic.

Backported from c25201c1a3. `view_build_test.cc` needed some tiny adjustments for the backport.

Closes #14619
Fixes #14503

* github.com:scylladb/scylladb:
  test: view_build_test: add range tombstones to test_view_update_generator_buffering
  test: view_build_test: add test_view_udate_generator_buffering_with_random_mutations
  view_updating_consumer: make buffer limit a variable
  view: fix range tombstone handling on flushes in view_updating_consumer
2023-07-11 15:04:23 +03:00
Takuya ASADA
91c1feec51 scylla_raid_setup: wipe filesystem signatures from specified disks
The discussion on the thread says, when we reformat a volume with another
filesystem, kernel and libblkid may skip to populate /dev/disk/by-* since it
detected two filesystem signatures, because mkfs.xxx did not cleared previous
filesystem signature.
To avoid this, we need to run wipefs before running mkfs.

Note that this runs wipefs twice, for target disks and also for RAID device.
wipefs for RAID device is needed since wipefs on disks doesn't clear filesystem signatures on /dev/mdX (we may see previous filesystem signature on /dev/mdX when we construct RAID volume multiple time on same disks).

Also dropped -f option from mkfs.xfs, it will check wipefs is working as we
expected.

Fixes #13737

Signed-off-by: Takuya ASADA <syuu@scylladb.com>

Closes #13738

(cherry picked from commit fdceda20cc)
2023-07-11 15:00:03 +03:00
Piotr Dulikowski
57d0310dcc combined: mergers: remove recursion in operator()()
In mutation_reader_merger and clustering_order_reader_merger, the
operator()() is responsible for producing mutation fragments that will
be merged and pushed to the combined reader's buffer. Sometimes, it
might have to advance existing readers, open new and / or close some
existing ones, which requires calling a helper method and then calling
operator()() recursively.

In some unlucky circumstances, a stack overflow can occur:

- Readers have to be opened incrementally,
- Most or all readers must not produce any fragments and need to report
  end of stream without preemption,
- There has to be enough readers opened within the lifetime of the
  combined reader (~500),
- All of the above needs to happen within a single task quota.

In order to prevent such a situation, the code of both reader merger
classes were modified not to perform recursion at all. Most of the code
of the operator()() was moved to maybe_produce_batch which does not
recur if it is not possible for it to produce a fragment, instead it
returns std::nullopt and operator()() calls this method in a loop via
seastar::repeat_until_value.

A regression test is added.

Fixes: scylladb/scylladb#14415

Closes #14452

(cherry picked from commit ee9bfb583c)

Closes #14605
2023-07-11 11:09:25 +03:00
Michał Chojnowski
78f25f2d36 test: view_build_test: add range tombstones to test_view_update_generator_buffering
This patch adds a full-range tombstone to the compacted mutation.
This raises the coverage of the test. In particular, it reproduces
issue #14503, which should have been caught by this test, but wasn't.
2023-07-11 09:44:00 +02:00
Michał Chojnowski
14fa3ee34e test: view_build_test: add test_view_udate_generator_buffering_with_random_mutations
A random mutation test for view_updating_consumer's buffering logic.
Reproduces #14503.
2023-07-11 09:44:00 +02:00
Michał Chojnowski
75933b9906 view_updating_consumer: make buffer limit a variable
The limit doesn't change at runtime, but we this patch makes it variable for
unit testing purposes.
2023-07-11 09:44:00 +02:00
Michał Chojnowski
fc7b02c8e4 view: fix range tombstone handling on flushes in view_updating_consumer
View update routines accept `mutation` objects.
But what comes out of staging sstable readers is a stream of
mutation_fragment_v2 objects.
To build view updates after a repair/streaming, we have to
convert the fragment stream into `mutation`s. This is done by piping
the stream to mutation_rebuilder_v2.

To keep memory usage limited, the stream for a single partition might
have to be split into multiple partial `mutation` objects.
view_update_consumer does that, but in improper way -- when the
split/flush happens inside an active range tombstone, the range
tombstone isn't closed properly. This is illegal, and triggers an
internal error.

This patch fixes the problem by closing the active range tombstone
(and reopening in the same position in the next `mutation` object).

The tombstone is closed just after the last seen clustered position.
This is not necessary for correctness -- for example we could delay
all processing of the range tombstone until we see its end
bound -- but it seems like the most natural semantic.

Fixes #14503
2023-07-11 09:44:00 +02:00
Jan Ciolek
0f4f8638c5 forward_service: fix forgetting case-sensitivity in aggregates
There was a bug that caused aggregates to fail when
used on column-sensitive columns.

For example:
```
SELECT SUM("SomeColumn") FROM ks.table;
```
would fail, with a message saying that there
is no column "somecolumn".

This is because the case-sensitivity got lost on the way.

For non case-sensitive column names we convert them to lowercase,
but for case sensitive names we have to preserve the name
as originally written.

The problem was in `forward_service` - we took a column name
and created a non case-sensitive `column_identifier` out of it.
This converted the name to lowercase, and later such column
couldn't be found.

To fix it, let's make the `column_identifier` case-sensitive.
It will preserve the name, without converting it to lowercase.

Fixes: https://github.com/scylladb/scylladb/issues/14307

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
(cherry picked from commit 7fca350075)
2023-07-10 15:22:58 +03:00
Botond Dénes
0ba37fa431 Merge 'doc: fix rollback in the 4.3-to-2021.1, 5.0-to-2022.1, and 5.1-to-2022.2 upgrade guides' from Anna Stuchlik
This PR fixes the Restore System Tables section of the upgrade guides by adding a command to clean upgraded SStables during rollback or adding the entire section to restore system tables (which was missing from the older documents).

This PR fixes is a bug and must be backported to branch-5.3, branch-5.2., and branch-5.1.

Refs: https://github.com/scylladb/scylla-enterprise/issues/3046

- [x]  5.1-to-2022.2 - update command (backport to branch-5.3, branch-5.2, and branch-5.1)
- [x]  5.0-to-2022.1 - add "Restore system tables" to rollback (backport to branch-5.3, branch-5.2, and branch-5.1)
- [x]  4.3-to-2021.1 - add "Restore system tables" to rollback (backport to branch-5.3, branch-5.2, and branch-5.1)

(see https://github.com/scylladb/scylla-enterprise/issues/3046#issuecomment-1604232864)

Closes #14444

* github.com:scylladb/scylladb:
  doc: fix rollback in 4.3-to-2021.1 upgrade guide
  doc: fix rollback in 5.0-to-2022.1 upgrade guide
  doc: fix rollback in 5.1-to-2022.2 upgrade guide

(cherry picked from commit 8a7261fd70)
2023-07-10 15:16:24 +03:00
Raphael S. Carvalho
55edbded47 compaction: avoid excessive reallocation and during input list formatting
with off-strategy, input list size can be close to 1k, which will
lead to unneeded reallocations when formatting the list for
logging.

in the past, we faced stalls in this area, and excessive reallocation
(log2 ~1k = ~10) may have contributed to that.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #13907

(cherry picked from commit 5544d12f18)

Fixes scylladb/scylladb#14071
2023-07-09 23:54:18 +03:00
Marcin Maliszkiewicz
9f79c9f41d docs: link general repairs page to RBNO page
Information was duplicated before and the version on this page was outdated - RBNO is enabled for replace operation already.

Closes #12984

(cherry picked from commit bd7caefccf)
2023-07-07 16:38:32 +02:00
Kamil Braun
6dd09bb4ea storage_proxy: query_partition_key_range_concurrent: don't access empty range
`query_partition_range_concurrent` implements an optimization when
querying a token range that intersects multiple vnodes. Instead of
sending a query for each vnode separately, it sometimes sends a single
query to cover multiple vnodes - if the intersection of replica sets for
those vnodes is large enough to satisfy the CL and good enough in terms
of the heat metric. To check the latter condition, the code would take
the smallest heat metric of the intersected replica set and compare them
to smallest heat metrics of replica sets calculated separately for each
vnode.

Unfortunately, there was an edge case that the code didn't handle: the
intersected replica set might be empty and the code would access an
empty range.

This was catched by an assertion added in
8db1d75c6c by the dtest
`test_query_dc_with_rf_0_does_not_crash_db`.

The fix is simple: check if the intersected set is empty - if so, don't
calculate the heat metrics because we can decide early that the
optimization doesn't apply.

Also change the `assert` to `on_internal_error`.

Fixes #14284

Closes #14300

(cherry picked from commit 732feca115)

Backport note: the original `assert` was never added to branch-5.2, but
the fix is still applicable, so I backported the fix and the
`on_internal_error` check.
2023-07-05 13:14:24 +02:00
Mikołaj Grzebieluch
f431345ab6 raft topology: wait_for_peers_to_enter_synchronize_state doesn't need to resolve all IPs
Another node can stop after it joined the group0 but before it advertised itself
in gossip. `get_inet_addrs` will try to resolve all IPs and
`wait_for_peers_to_enter_synchronize_state` will loop indefinitely.

But `wait_for_peers_to_enter_synchronize_state` can return early if one of
the nodes confirms that the upgrade procedure has finished. For that, it doesn't
need the IPs of all group 0 members - only the IP of some nodes which can do
the confirmation.

This commit restructures the code so that IPs of nodes are resolved inside the
`max_concurrent_for_each` that `wait_for_peers_to_enter_synchronize_state` performs.
Then, even if some IPs won't be resolved, but one of the nodes confirms a
successful upgrade, we can continue.

Fixes #13543

(cherry picked from commit a45e0765e4)
2023-07-05 13:01:57 +02:00
Anna Stuchlik
009601d374 doc: fix rollback in 5.2-to-2023.1 upgrade guide
This commit fixes the Restore System Tables section
in the 5.2-to-2023.1 upgrade guide by adding a command
to clean upgraded SStables during rollback.

This is a bug (an incomplete command) and must be
backported to branch-5.3 and branch-5.2.

Refs: https://github.com/scylladb/scylla-enterprise/issues/3046

Closes #14373

(cherry picked from commit f4ae2c095b)
2023-06-29 12:07:41 +03:00
Botond Dénes
8e63b2f3e3 Merge 'readers: evictable_reader: don't accidentally consume the entire partition' from Kamil Braun
The evictable reader must ensure that each buffer fill makes forward progress, i.e. the last fragment in the buffer has a position larger than the last fragment from the previous buffer-fill. Otherwise, the reader could get stuck in an infinite loop between buffer fills, if the reader is evicted in-between.

The code guranteeing this forward progress had a bug: the comparison between the position after the last buffer-fill and the current last fragment position was done in the wrong direction.

So if the condition that we wanted to achieve was already true, we would continue filling the buffer until partition end which may lead to OOMs such as in #13491.

There was already a fix in this area to handle `partition_start` fragments correctly - #13563 - but it missed that the position comparison was done in the wrong order.

Fix the comparison and adjust one of the tests (added in #13563) to detect this case.

After the fix, the evictable reader starts generating some redundant (but expected) range tombstone change fragments since it's now being paused and resumed. For this we need to adjust mutation source tests which were a bit too specific. We modify `flat_mutation_reader_assertions` to squash the redundant `r_t_c`s.

Fixes #13491

Closes #14375

* github.com:scylladb/scylladb:
  readers: evictable_reader: don't accidentally consume the entire partition
  test: flat_mutation_reader_assertions: squash `r_t_c`s with the same position

(cherry picked from commit 586102b42e)
2023-06-29 12:04:35 +03:00
Benny Halevy
483c0b183a repair: use fmt::join to print ks_erms|boost::adaptors::map_keys
This is a minimal fix for #13146 for branch-5.2

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #14405
2023-06-27 14:15:28 +03:00
Anna Stuchlik
d5063e6347 doc: add Ubuntu 22 to 2021.1 OS support
Fixes https://github.com/scylladb/scylla-enterprise/issues/3036

This commit adds support for Ubuntu 22.04 to the list
of OSes supported by ScyllaDB Enterprise 2021.1.

This commit fixex a bug and must be backported to
branch-5.3 and branch-5.2.

Closes #14372

(cherry picked from commit 74fc69c825)
2023-06-26 13:43:58 +03:00
Anna Stuchlik
543fa04e4d doc: udpate the OSS docs landing page
Fixes https://github.com/scylladb/scylladb/issues/14333

This commit replaces the documentation landing page with
the Open Source-only documentation landing page.

This change is required as now there is a separate landing
page for the ScyllaDB documentation, so the page is duplicated,
creating bad user experience.

(cherry picked from commit f60f89df17)

Closes #14370
2023-06-23 14:00:33 +02:00
Anna Mikhlin
cebbf6c5df release: prepare for 5.2.4 2023-06-22 16:23:46 +03:00
Avi Kivity
73b8669953 Update seastar submodule (default priority class shares)
* seastar 32ab15cda6...29a0e64513 (1):
  > reactor: change shares for default IO class from 1 to 200

Fixes #13753.

In 5.3: 37e6e65211
2023-06-21 21:23:14 +03:00
Botond Dénes
9efca96cf2 Merge 'Backport 5.2 test.py stability/UX improvemenets' from Kamil Braun
Backport the following improvements for test.py topology tests for CI stability:
- https://github.com/scylladb/scylladb/pull/12652
- https://github.com/scylladb/scylladb/pull/12630
- https://github.com/scylladb/scylladb/pull/12619
- https://github.com/scylladb/scylladb/pull/12686
- picked from https://github.com/scylladb/scylladb/pull/12726: 9ceb6aba81
- picked from https://github.com/scylladb/scylladb/pull/12173: fc60484422
- https://github.com/scylladb/scylladb/pull/12765
- https://github.com/scylladb/scylladb/pull/12804
- https://github.com/scylladb/scylladb/pull/13342
- https://github.com/scylladb/scylladb/pull/13589
- picked from https://github.com/scylladb/scylladb/pull/13135: 7309a1bd6b
- picked from https://github.com/scylladb/scylladb/pull/13134: 21b505e67c, a4411e9ec4, c1d0ee2bce, 8e3392c64f, 794d0e4000, e407956e9f
- https://github.com/scylladb/scylladb/pull/13271
- https://github.com/scylladb/scylladb/pull/13399
- picked from https://github.com/scylladb/scylladb/pull/12699: 3508a4e41e, 08d754e13f, 62a945ccd5, 041ee3ffdd
- https://github.com/scylladb/scylladb/pull/13438 (but skipped the test_mutation_schema_change.py fix since I didn't backport this new test)
- https://github.com/scylladb/scylladb/pull/13427
- https://github.com/scylladb/scylladb/pull/13756
- https://github.com/scylladb/scylladb/pull/13789
- https://github.com/scylladb/scylladb/pull/13933 (but skipped the test_snapshot.py fix since I didn't backport this new test)

Closes #14215

* github.com:scylladb/scylladb:
  test: pylib: fix `read_barrier` implementation
  test: pylib: random_tables: perform read barrier in `verify_schema`
  test: issue a read barrier before checking ring consistency
  Merge 'scylla_cluster.py: fix read_last_line' from Gusev Petr
  test/pylib: ManagerClient helpers to wait for...
  test: pylib: Add a way to create cql connections with particular coordinators
  test/pylib: get gossiper alive endpoints
  test/topology: default replication factor 3
  test/pylib: configurable replication factor
  scylla_cluster.py: optimize node logs reading
  test/pylib: RandomTables.add_column with value column
  scylla_cluster.py: add start flag to server_add
  ServerInfo: drop host_id
  scylla_cluster.py: add config to server_add
  scylla_cluster.py: add expected_error to server_start
  scylla_cluster.py: ScyllaServer.start, refactor error reporting
  scylla_cluster.py: fix ScyllaServer.start, reset cmd if start failed
  test: improve logging in ScyllaCluster
  test: topology smp test with custom cluster
  test/pylib: topology: support clusters of initial size 0
  Merge 'test/pylib: split and refactor topology tests' from Alecco
  Merge 'test/pylib: use larger timeout for decommission/removenode' from Kamil Braun
  test: Increase START_TIMEOUT
  test/pylib: one-shot error injection helper
  test: topology: wait for token ring/group 0 consistency after decommission
  test: topology: verify that group 0 and token ring are consistent
  Merge 'pytest: start after ungraceful stop' from Alecco
  Merge 'test.py: improve test failure handling' from Kamil Braun
2023-06-15 07:19:39 +03:00
Pavel Emelyanov
210e3d1999 Backport 'Merge 'Enlighten messaging_service::shutdown()''
This includes seastar update titled
  'Merge 'Split rpc::server stop into two parts''

* br-5.2-backport-ms-shutdown:
  messaging_service: Shutdown rpc server on shutdown
  messaging_service: Generalize stop_servers()
  messaging_service: Restore indentation after previous patch
  messaging_service: Coroutinize stop()
  messaging_service: Coroutinize stop_servers()
  Update seastar submodule

refs: #14031
2023-06-14 09:14:06 +03:00
Pavel Emelyanov
702d622b38 messaging_service: Shutdown rpc server on shutdown
The RPC server now has a lighter .shutdown() method that just does what
m.s. shutdown() needs, so call it. On stop call regular stop to finalize
the stopping process

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-06-14 09:04:04 +03:00
Pavel Emelyanov
db44630254 messaging_service: Generalize stop_servers()
Make it do_with_servers() and make it accept method to call and message
to print. This gives the ability to reuse this helper in next patch

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-06-14 09:03:59 +03:00
Pavel Emelyanov
5d3d64bafe messaging_service: Restore indentation after previous patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-06-14 09:03:53 +03:00
Pavel Emelyanov
079f5d8eca messaging_service: Coroutinize stop()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-06-14 09:03:48 +03:00
Pavel Emelyanov
fd7310b104 messaging_service: Coroutinize stop_servers()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-06-14 09:03:42 +03:00
Pavel Emelyanov
991d00964d Update seastar submodule
* seastar 8c86e6de...32ab15cd (1):
  > rpc: Introduce server::shutdown()

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-06-14 09:02:46 +03:00
Anna Stuchlik
0137ddaec8 doc: remove support for Ubuntu 18
Fixes https://github.com/scylladb/scylladb/issues/14097

This commit removes support for Ubuntu 18 from
platform support for ScyllaDB Enterprise 2023.1.

The update is in sync with the change made for
ScyllaDB 5.2.

This commit must be backported to branch-5.2 and
branch-5.3.

Closes #14118

(cherry picked from commit b7022cd74e)
2023-06-13 12:06:56 +03:00
Raphael S. Carvalho
58f88897c8 compaction: Fix incremental compaction for sstable cleanup
After c7826aa910, sstable runs are cleaned up together.

The procedure which executes cleanup was holding reference to all
input sstables, such that it could later retry the same cleanup
job on failure.

Turns out it was not taking into account that incremental compaction
will exhaust the input set incrementally.

Therefore cleanup is affected by the 100% space overhead.

To fix it, cleanup will now have the input set updated, by removing
the sstables that were already cleaned up. On failure, cleanup
will retry the same job with the remaining sstables that weren't
exhausted by incremental compaction.

New unit test reproduces the failure, and passes with the fix.

Fixes #14035.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #14038

(cherry picked from commit 23443e0574)
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #14193
2023-06-13 09:53:46 +03:00
Kamil Braun
f4115528d6 test: pylib: fix read_barrier implementation
The previous implementation didn't actually do a read barrier, because
the statement failed on an early prepare/validate step which happened
before read barrier was even performed.

Change it to a statement which does not fail and doesn't perform any
schema change but requires a read barrier.

This breaks one test which uses `RandomTables.verify_schema()` when only
one node is alive, but `verify_schema` performs a read barrier. Unbreak
it by skipping the read barrier in this case (it makes sense in this
particular test).

Closes #13933

(cherry picked from commit 64dc76db55)
Backport note: skipped the test_snapshot.py change, as the test doesn't
exist on this branch.
2023-06-12 12:40:22 +02:00
Kamil Braun
9c941aba0b test: pylib: random_tables: perform read barrier in verify_schema
`RandomTables.verify_schema` is often called in topology tests after
performing a schema change. It compares the schema tables fetched from
some node to the expected latest schema stored by the `RandomTables`
object.

However there's no guarantee that the latest schema change has already
propagated to the node which we query. We could have performed the
schema change on a different node and the change may not have been
applied yet on all nodes.

To fix that, pick a specific node and perform a read barrier on it, then
use that node to fetch the schema tables.

Fixes #13788

Closes #13789

(cherry picked from commit 3f3dcf451b)
2023-06-12 12:40:22 +02:00
Konstantin Osipov
094bcac399 test: issue a read barrier before checking ring consistency
Raft replication doesn't guarantee that all replicas see
identical Raft state at all times, it only guarantees the
same order of events on all replicas.

When comparing raft state with gossip state on a node, first
issue a read barrier to ensure the node has the latest raft state.

To issue a read barrier it is sufficient to alter a non-existing
state: in order to validate the DDL the node needs to sync with the
leader and fetch its latest group0 state.

Fixes #13518 (flaky topology test).

Closes #13756

(cherry picked from commit e7c9ca560b)
2023-06-12 12:40:22 +02:00
Kamil Braun
e49a531aaa Merge 'scylla_cluster.py: fix read_last_line' from Gusev Petr
This is a follow-up to #13399, the patch
addresses the issues mentioned there:
* linesep can be split between blocks;
* linesep can be part of UTF-8 sequence;
* avoid excessively long lines, limit to 256 chars;
* the logic of the function made simpler and more maintainable.

Closes #13427

* github.com:scylladb/scylladb:
  pylib_test: add tests for read_last_line
  pytest: add pylib_test directory
  scylla_cluster.py: fix read_last_line
  scylla_cluster.py: move read_last_line to util.py

(cherry picked from commit 70f2b09397)
2023-06-12 12:40:22 +02:00
Alejo Sanchez
bcf99a37cd test/pylib: ManagerClient helpers to wait for...
server to see other servers after start/restart

When starting/restarting a server, provide a way to wait for the server
to see at least n other servers.

Also leave the implementation methods available for manual use and
update previous tests, one to wait for a specific server to be seen, and
one to wait for a specific server to not be seen (down).

Fixes #13147

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Closes #13438

(cherry picked from commit 11561a73cb)
Backport note: skipped the test_mutation_schema_change.py fix as the
test doesn't exist on this branch.
2023-06-12 12:40:08 +02:00
Tomasz Grabiec
fe4af95745 test: pylib: Add a way to create cql connections with particular coordinators
Usage:

  await manager.driver_connect(server=servers[0])
  manager.cql.execute(f"...", execution_profile='whitelist')

(cherry picked from commit 041ee3ffdd)
2023-06-12 12:38:15 +02:00
Alejo Sanchez
ac5dff7de0 test/pylib: get gossiper alive endpoints
Helper to get list of gossiper alive endpoints from REST API.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
(cherry picked from commit 62a945ccd5)
2023-06-12 12:38:15 +02:00
Alejo Sanchez
ad99456a9d test/topology: default replication factor 3
For most tests there will be nodes down, increase replication factor to
3 to avoid having problems for partitions belonging to down nodes.

Use replication factor 1 for raft upgrade tests.

(cherry picked from commit 08d754e13f)
2023-06-12 12:38:15 +02:00
Alejo Sanchez
937e890fba test/pylib: configurable replication factor
Make replication factor configurable for the RandomTables helper.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
(cherry picked from commit 3508a4e41e)
2023-06-12 12:38:15 +02:00
Petr Gusev
12eec5bb2b scylla_cluster.py: optimize node logs reading
There are two occasions in scylla_cluster
where we read the node logs, and in both of
them we read the entire file in memory.
This is not efficient and may cause an OOM.

In the first case we need the last line of the
log file, so we seek at the end and move backwards
looking for a new line symbol.

In the second case we look through the
log file to find the expected_error.
The readlines() method returns a Python
list object, which means it reads the entire
file in memory. It's sufficient to just remove
it since iterating over the file instance
already yields lines lazily one by one.

This is a follow-up for #13134.

Closes #13399

(cherry picked from commit 09636b20f3)
2023-06-12 12:38:15 +02:00
Alejo Sanchez
59847389d4 test/pylib: RandomTables.add_column with value column
When adding extra columns in a test, make them value column. Name them
with the "v_" prefix and use the value column number counter.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Closes #13271

(cherry picked from commit 81b40c10de)
2023-06-12 12:38:15 +02:00
Petr Gusev
7a8c5db55b scylla_cluster.py: add start flag to server_add
Sometimes when creating a node it's useful
to just install it and not start. For example,
we may want to try to start it later with
expected error.

The ScyllaServer.install method has been made
exception safe, if an exception occurs, it
reverts to the original state. This allows
to not duplicate the try/except logic
in two of its call sites.

(cherry picked from commit e407956e9f)
2023-06-12 12:38:15 +02:00
Petr Gusev
15ea5bf53f ServerInfo: drop host_id
We are going to allow the
ScyllaCluster.add_server function not to
start the server if the caller has requested
that with a special parameter. The host_id
can only be obtained from a running node, so
add_server won't be able to return it in
this case. I've grepped the tests for host_id
and there doesn't seem to be any
reference to it in the code.

(cherry picked from commit 794d0e4000)
2023-06-12 12:38:15 +02:00
Petr Gusev
3ab610753e scylla_cluster.py: add config to server_add
Sometimes when creating a node it's useful
to pass a custom node config.

(cherry picked from commit 8e3392c64f)
2023-06-12 12:38:15 +02:00
Petr Gusev
1959eddf86 scylla_cluster.py: add expected_error to server_start
Sometimes it's useful to check that the node has failed
to start for a particular reason. If server_start can't
find expected_error in the node's log or if the
node has started without errors, it throws an exception.

(cherry picked from commit c1d0ee2bce)
2023-06-12 12:38:15 +02:00
Petr Gusev
43525aec83 scylla_cluster.py: ScyllaServer.start, refactor error reporting
Extract the function that encapsulates all the error
reporting logic. We are going to use it in several
other places to implement expected_error feature.

(cherry picked from commit a4411e9ec4)
2023-06-12 12:38:15 +02:00
Petr Gusev
930c4e65d6 scylla_cluster.py: fix ScyllaServer.start, reset cmd if start failed
The ScyllaServer expects cmd to be None if the
Scylla process is not running. Otherwise, if start failed
and the test called update_config, the latter will
try to send a signal to a non-existent process via cmd.

(cherry picked from commit 21b505e67c)
2023-06-12 12:38:15 +02:00
Konstantin Osipov
d2caaef188 test: improve logging in ScyllaCluster
Print IP addresses and cluster identifiers in more log messages,
it helps debugging.

(cherry picked from commit 7309a1bd6b)
2023-06-12 12:38:15 +02:00
Alejo Sanchez
6474edd691 test: topology smp test with custom cluster
Instead of decommission of initial cluster, use custom cluster.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Closes #13589

(cherry picked from commit ce87aedd30)
2023-06-12 12:38:15 +02:00
Alejo Sanchez
b39cdadff3 test/pylib: topology: support clusters of initial size 0
To allow tests with custom clusters, allow configuration of initial
cluster size of 0.

Add a proof-of-concept test to be removed later.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Closes #13342

(cherry picked from commit e3b462507d)
2023-06-12 12:38:15 +02:00
Nadav Har'El
7b60cddae7 Merge 'test/pylib: split and refactor topology tests' from Alecco
Move long running topology tests out of  `test_topology.py` and into their own files, so they can be run in parallel.

While there, merge simple schema tests.

Closes #12804

* github.com:scylladb/scylladb:
  test/topology: rename topology test file
  test/topology: lint and type for topology tests
  test/topology: move topology ip tests to own file
  test/topology: move topology test remove garbaje...
  test/topology: move topology rejoin test to own file
  test/topology: merge topology schema tests and...
  test/topology: isolate topology smp params test
  test/topology: move topology helpers to common file

(cherry picked from commit a24600a662)
2023-06-12 12:38:15 +02:00
Botond Dénes
ea80fe20ad Merge 'test/pylib: use larger timeout for decommission/removenode' from Kamil Braun
Recently we enabled RBNO by default in all topology operations. This
made the operations a bit slower (repair-based topology ops are a bit
slower than classic streaming - they do more work), and in debug mode
with large number of concurrent tests running, they might timeout.

The timeout for bootstrap was already increased before, do the same for
decommission/removenode. The previously used timeout was 300 seconds
(this is the default used by aiohttp library when it makes HTTP
requests), now use the TOPOLOGY_TIMEOUT constant from ScyllaServer which
is 1000 seconds.

Closes #12765

* github.com:scylladb/scylladb:
  test/pylib: use larger timeout for decommission/removenode
  test/pylib: scylla_cluster: rename START_TIMEOUT to TOPOLOGY_TIMEOUT

(cherry picked from commit e55f475db1)
2023-06-12 12:38:15 +02:00
Asias He
f90fe6f312 test: Increase START_TIMEOUT
It is observed that CI machine is slow to run the test. Increase the
timeout of adding servers.

(cherry picked from commit fc60484422)
2023-06-12 12:38:15 +02:00
Alejo Sanchez
6e2c547388 test/pylib: one-shot error injection helper
Existing helper with async context manager only worked for non one-shot
error injections. Fix it and add another helper for one-shot without a
context manager.

Fix tests using the previous helper.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
(cherry picked from commit 9ceb6aba81)
2023-06-12 12:38:05 +02:00
Kamil Braun
91aa2cd8d7 test: topology: wait for token ring/group 0 consistency after decommission
There was a check for immediate consistency after a decommission
operation has finished in one of the tests, but it turns out that also
after decommission it might take some time for token ring to be updated
on other nodes. Replace the check with a wait.

Also do the wait in another test that performs a sequence of
decommissions. We won't attempt to start another decommission until
every node learns that the previously decommissioned node has left.

Closes #12686

(cherry picked from commit 40142a51d0)
2023-06-12 11:58:32 +02:00
Kamil Braun
05c3f7ecef test: topology: verify that group 0 and token ring are consistent
After topology changes like removing a node, verify that the set of
group 0 members and token ring members is the same.

Modify `get_token_ring_host_ids` to only return NORMAL members. The
previous version which used the `/storage_service/host_id` endpoint
might have returned non-NORMAL members as well.

Fixes: #12153

Closes #12619

(cherry picked from commit fa9cf81af2)
2023-06-12 11:58:02 +02:00
Kamil Braun
3aa73e8b5a Merge 'pytest: start after ungraceful stop' from Alecco
If a server is stopped suddenly (i.e. not graceful), schema tables might
be in inconsistent state. Add a test case and enable Scylla
configuration option (force_schema_commit_log) to handle this.

Fixes #12218

Closes #12630

* github.com:scylladb/scylladb:
  pytest: test start after ungraceful stop
  test.py: enable force_schema_commit_log

(cherry picked from commit 5eadea301e)
2023-06-12 11:57:09 +02:00
Nadav Har'El
a0ba3b3350 Merge 'test.py: improve test failure handling' from Kamil Braun
Improve logging by printing the cluster at the end of each test.

Stop performing operations like attempting queries or dropping keyspaces on dirty clusters. Dirty clusters might be completely dead and these operations would only cause more "errors" to happen after a failed test, making it harder to find the real cause of failure.

Mark cluster as dirty when a test that uses it fails - after a failed test, we shouldn't assume that the cluster is in a usable state, so we shouldn't reuse it for another test.

Rely on the `is_dirty` flag in `PythonTest`s and `CQLApprovalTest`s, similarly to what `TopologyTest`s do.

Closes #12652

* github.com:scylladb/scylladb:
  test.py: rely on ScyllaCluster.is_dirty flag for recycling clusters
  test/topology: don't drop random_tables keyspace after a failed test
  test/pylib: mark cluster as dirty after a failed test
  test: pylib, topology: don't perform operations after test on a dirty cluster
  test/pylib: print cluster at the end of test

(cherry picked from commit 2653865b34)
2023-06-12 11:47:54 +02:00
Anna Mikhlin
ea08d409f1 release: prepare for 5.2.3 2023-06-08 22:04:50 +03:00
Avi Kivity
f32971b81f Merge 'multishard_mutation_query: make reader_context::lookup_readers() exception safe' from Botond Dénes
With regards to closing the looked-up querier if an exception is thrown. In particular, this requires closing the querier if a semaphore mismatch is detected. Move the table lookup above the line where the querier is looked up, to avoid having to handle the exception from it. As a consequence of closing the querier on the error path, the lookup lambda has to be made a coroutine. This is sad, but this is executed once per page, so its cost should be insignificant when spread over an
entire page worth of work.

Also add a unit test checking that the mismatch is detected in the first place and that readers are closed.

Fixes: #13784

Closes #13790

* github.com:scylladb/scylladb:
  test/boost/database_test: add unit test for semaphore mismatch on range scans
  partition_slice_builder: add set_specific_ranges()
  multishard_mutation_query: make reader_context::lookup_readers() exception safe
  multishard_mutation_query: lookup_readers(): make inner lambda a coroutine

(cherry picked from commit 1c0e8c25ca)
2023-06-08 04:29:51 -04:00
Michał Chojnowski
8872157422 data_dictionary: fix forgetting of UDTs on ALTER KEYSPACE
Due to a simple programming oversight, one of keyspace_metadata
constructors is using empty user_types_metadata instead of the
passed one. Fix that.

Fixes #14139

Closes #14143

(cherry picked from commit 1a521172ec)
2023-06-06 21:52:47 +03:00
Kamil Braun
b5785ed434 auth: don't use infinite timeout in default_role_row_satisfies query
A long long time ago there was an issue about removing infinite timeouts
from distributed queries: #3603. There was also a fix:
620e950fc8. But apparently some queries
escaped the fix, like the one in `default_role_row_satisfies`.

With the right conditions and timing this query may cause a node to hang
indefinitely on shutdown. A node tries to perform this query after it
starts. If we kill another node which is required to serve this query
right before that moment, the query will hang; when we try to shutdown
the querying node, it will wait for the query to finish (it's a
background task in auth service), which it never does due to infinite
timeout.

Use the same timeout configuration as other queries in this module do.

Fixes #13545.

Closes #14134

(cherry picked from commit f51312e580)
2023-06-06 19:39:29 +03:00
Pavel Emelyanov
70f93767fd Update seastar submodule
* seastar 98504c4b...8c86e6de (1):
  > rpc: Wait for server socket to stop before killing conns

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-30 20:10:44 +03:00
Tzach Livyatan
bb3751334c Remove Ubuntu 18.04 support from 5.2
Ubuntu [18.04 will be soon out of standard support](https://ubuntu.com/blog/18-04-end-of-standard-support), and can be removed from 5.2 supported list
https://github.com/scylladb/scylla-pkg/issues/3346

Closes #13529

(cherry picked from commit e655060429)
2023-05-30 16:25:42 +03:00
Beni Peled
9dd70a58c3 release: prepare for 5.2.2 2023-05-18 14:03:20 +03:00
Anna Stuchlik
0bc6694ac5 doc: fix the links to the Enterprise docs
Fixes https://github.com/scylladb/scylladb/issues/13915

This commit fixes broken links to the Enterprise docs.
They are links to the enterprise branch, which is not
published. The links to the Enterprise docs should include
"stable" instead of the branch name.

This commit must be backported to branch-5.2, because
the broken links are present in the published 5.2 docs.

Closes #13917

(cherry picked from commit 6f4a68175b)
2023-05-18 08:40:02 +03:00
Botond Dénes
486483b379 Merge '[Backport 5.2]: node ops backports' from Benny Halevy
This branch backports to branch-5.2 several fixes related to node operations:
- ba919aa88a (PR #12980; Fixes: #11011, #12969)
- 53636167ca (part of PR #12970; Fixes: #12764, #12956)
- 5856e69462 (part of PR #12970)
- 2b44631ded (PR #13028; Fixes: #12989)
- 6373452b31 (PR #12799; Fixes #12798)

Closes #13531

* github.com:scylladb/scylladb:
  Merge 'Do not mask node operation errors' from Benny Halevy
  Merge 'storage_service: Make node operations safer by detecting asymmetric abort' from Tomasz Grabiec
  storage_service: Wait for normal state handler to finish in replace
  storage_service: Wait for normal state handler to finish in bootstrap
  storage_service: Send heartbeat earlier for node ops
2023-05-17 16:46:49 +03:00
Tzach Livyatan
9afaec5b12 Update Azure recommended instances type from the Lsv2-series to the Lsv3-series
Closes #13835

(cherry picked from commit a73fde6888)
2023-05-17 15:41:47 +03:00
Anna Stuchlik
9c99dc36b5 doc: add OS support for version 2023.1
Fixes https://github.com/scylladb/scylladb/issues/13857

This commit adds the OS support for ScyllaDB Enterprise 2023.1.
The support is the same as for ScyllaDB Open Source 5.2, on which
2023.1 is based.

After this commit is merged, it must be backported to branch-5.2.
In this way, it will be merged to branch-2023.1 and available in
the docs for Enterprise 2023.1

Closes: #13858
(cherry picked from commit 84ed95f86f)
2023-05-16 10:11:21 +03:00
Tomasz Grabiec
548a7f73d3 Merge 'range_tombstone_change_generator: fix an edge case in flush()' from Michał Chojnowski
range_tombstone_change_generator::flush() mishandles the case when two range
tombstones are adjacent and flush(pos, end_of_range=true) is called with pos
equal to the end bound of the lesser-position range tombstone.

In such case, the start change of the greater-position rtc will be accidentally
emitted, and there won't be an end change, which breaks reader assumptions by
ending the stream with an unclosed range tombstone, triggering an assertion.

This is due to a non-strict inequality used in a place where strict inequality
should be used. The modified line was intended to close range tombstones
which end exactly on the flush position, but this is unnecessary because such
range tombstones are handled by the last `if` in the function anyway.
Instead, this line caused range tombstones beginning right after the flush
position to be emitted sometimes.

Fixes https://github.com/scylladb/scylladb/issues/12462

Closes #13894

* github.com:scylladb/scylladb:
  tests: row_cache: Add reproducer for reader producing missing closing range tombstone
  range_tombstone_change_generator: fix an edge case in flush()
2023-05-15 23:29:08 +02:00
Raphael S. Carvalho
5c66875dbe sstables: Fix use-after-move when making reader in reverse mode
static report:
sstables/mx/reader.cc:1705:58: error: invalid invocation of method 'operator*' on object 'schema' while it is in the 'consumed' state [-Werror,-Wconsumed]
            legacy_reverse_slice_to_native_reverse_slice(*schema, slice.get()), pc, std::move(trace_state), fwd, fwd_mr, monitor);

Fixes #13394.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit 213eaab246)
2023-05-15 20:27:34 +03:00
Raphael S. Carvalho
26b4d2c3c1 db/view/build_progress_virtual_reader: Fix use-after-move
use-after-free in ctor, which potentially leads to a failure
when locating table from moved schema object.

static report
In file included from db/system_keyspace.cc:51:
./db/view/build_progress_virtual_reader.hh:202:40: warning: invalid invocation of method 'operator->' on object 's' while it is in the 'consumed' state [-Wconsumed]
                _db.find_column_family(s->ks_name(), system_keyspace::v3::SCYLLA_VIEWS_BUILDS_IN_PROGRESS),

Fixes #13395.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit 1ecba373d6)
2023-05-15 20:26:01 +03:00
Raphael S. Carvalho
874062b72a index/built_indexes_virtual_reader.hh: Fix use-after-move
static report:
./index/built_indexes_virtual_reader.hh:228:40: warning: invalid invocation of method 'operator->' on object 's' while it is in the 'consumed' state [-Wconsumed]
                _db.find_column_family(s->ks_name(), system_keyspace::v3::BUILT_VIEWS),

Fixes #13396.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit f8df3c72d4)
2023-05-15 20:24:35 +03:00
Raphael S. Carvalho
71ec750a59 replica: Fix use-after-move in table::make_streaming_reader
Variant used by
streaming/stream_transfer_task.cc:        , reader(cf.make_streaming_reader(cf.schema(), std::move(permit_), prs))

as full slice is retrieved after schema is moved (clang evaluates
left-to-right), the stream transfer task can be potentially working
on a stale slice for a particular set of partitions.

static report:
In file included from replica/dirty_memory_manager.cc:6:
replica/database.hh:706:83: error: invalid invocation of method 'operator->' on object 'schema' while it is in the 'consumed' state [-Werror,-Wconsumed]
        return make_streaming_reader(std::move(schema), std::move(permit), range, schema->full_slice());

Fixes #13397.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit 04932a66d3)
2023-05-15 20:21:48 +03:00
Tomasz Grabiec
7c1bdc6553 tests: row_cache: Add reproducer for reader producing missing closing range tombstone
Adds a reproducer for #12462.

The bug manifests by reader throwing:

  std::logic_error: Stream ends with an active range tombstone: {range_tombstone_change: pos={position: clustered,ckp{},-1}, {tombstone: timestamp=-9223372036854775805, deletion_time=2}}

The reason is that prior to the fix range_tombstone_change_generator::flush()
was used with end_of_range=true to produce the closing range_tombstone_change
and it did not handle correctly the case when there are two adjacent range
tombstones and flush(pos, end_of_range=true) is called such that pos is the
boundary between the two.

Cherry-picked from a717c803c7.
2023-05-15 18:02:40 +02:00
Michał Chojnowski
24d966f806 range_tombstone_change_generator: fix an edge case in flush()
range_tombstone_change_generator::flush() mishandles the case when two range
tombstones are adjacent and flush(pos, end_of_range=true) is called with pos
equal to the end bound of the lesser-position range tombstone.

In such case, the start change of the greater-position rtc will be accidentally
emitted, and there won't be an end change, which breaks reader assumptions by
ending the stream with an unclosed range tombstone, triggering an assertion.

This is due to a non-strict inequality used in a place where strict inequality
should be used. The modified line was intended to close range tombstones
which end exactly on the flush position, but this is unnecessary because such
range tombstones are handled by the last `if` in the function anyway.
Instead, this line caused range tombstones beginning right after the flush
position to be emitted sometimes.

Fixes #12462
2023-05-15 17:48:24 +02:00
Asias He
05a3a1bf55 tombstone_gc: Fix gc_before for immediate mode
The immediate mode is similar to timeout mode with gc_grace_seconds
zero. Thus, the gc_before returned should be the query_time instead of
gc_clock::time_point::max in immediate mode.

Setting gc_before to gc_clock::time_point::max, a row could be dropped
by compaction even if the ttl is not expired yet.

The following procedure reproduces the issue:

- Start 2 nodes

- Insert data

```
CREATE KEYSPACE ks2a WITH REPLICATION = { 'class' : 'SimpleStrategy',
'replication_factor' : 2 };
CREATE TABLE ks2a.tb (pk int, ck int, c0 text, c1 text, c2 text, PRIMARY
KEY(pk, ck)) WITH tombstone_gc = {'mode': 'immediate'};
INSERT into ks2a.tb (pk,ck, c0, c1, c2) values (10 ,1, 'x', 'y', 'z')
USING TTL 1000000;
INSERT into ks2a.tb (pk,ck, c0, c1, c2) values (20 ,1, 'x', 'y', 'z')
USING TTL 1000000;
INSERT into ks2a.tb (pk,ck, c0, c1, c2) values (30 ,1, 'x', 'y', 'z')
USING TTL 1000000;
```

- Run nodetool flush and nodetool compact

- Compaction drops all data

```
~128 total partitions merged to 0.
```

Fixes #13572

Closes #13800

(cherry picked from commit 7fcc403122)
2023-05-15 10:33:29 +03:00
Takuya ASADA
f148a6be1d scylla_kernel_check: suppress verbose iotune messages
Stop printing verbose iotune messages while the check, just print error
message.

Fixes #13373.

Closes #13362

(cherry picked from commit 160c184d0b)
2023-05-14 21:25:57 +03:00
Benny Halevy
5785550e24 view: view_builder: start: demote sleep_aborted log error
This is not really an error, so print it in debug log_level
rather than error log_level.

Fixes #13374

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #13462

(cherry picked from commit cc42f00232)
2023-05-14 21:21:59 +03:00
Avi Kivity
401de17c82 Update seastar submodule (condition_variable tasktrace fix)
* seastar aa46b980ec...98504c4bb6 (1):
  > condition-variable: replace the coroutine wakeup task with a promise

Fixes #13368
2023-05-14 21:12:12 +03:00
Raphael S. Carvalho
94c9553e8a Fix use-after-move when initializing row cache with dummy entry
Courtersy of clang-tidy:
row_cache.cc:1191:28: warning: 'entry' used after it was moved [bugprone-use-after-move]
_partitions.insert(entry.position().token().raw(), std::move(entry), dht::ring_position_comparator{_schema});
^
row_cache.cc:1191:60: note: move occurred here
_partitions.insert(entry.position().token().raw(), std::move(entry), dht::ring_position_comparator{_schema});
^
row_cache.cc:1191:28: note: the use and move are unsequenced, i.e. there is no guarantee about the order in which they are evaluated
_partitions.insert(entry.position().token().raw(), std::move(entry), dht::ring_position_comparator{*_schema});

The use-after-move is UB, as for it to happen, depends on evaluation order.

We haven't hit it yet as clang is left-to-right.

Fixes #13400.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #13401

(cherry picked from commit d2d151ae5b)
2023-05-14 21:02:24 +03:00
Anna Mikhlin
f1c45553bc release: prepare for 5.2.1 2023-05-08 22:15:46 +03:00
Botond Dénes
1a288e0a78 Update seastar submodule
* seastar 1488aaf8...aa46b980 (1):
  > core/on_internal_error: always log error with backtrace

Fixes: #13786
2023-05-08 10:30:10 +03:00
Marcin Maliszkiewicz
a2fed1588e db: view: use deferred_close for closing staging_sstable_reader
When consume_in_thread throws the reader should still be closed.

Related https://github.com/scylladb/scylla-enterprise/issues/2661

Closes #13398
Refs: scylladb/scylla-enterprise#2661
Fixes: #13413

(cherry picked from commit 99f8d7dcbe)
2023-05-08 09:41:07 +03:00
Botond Dénes
f07a06d390 Merge 'service:forward_service: use long type instead of counter in function mocking' from Michał Jadwiszczak
Aggregation query on counter column is failing because forward_service is looking for function with counter as an argument and such function doesn't exist. Instead the long type should be used.

Fixes: #12939

Closes #12963

* github.com:scylladb/scylladb:
  test:boost: counter column parallelized aggregation test
  service:forward_service: use long type when column is counter

(cherry picked from commit 61e67b865a)
2023-05-07 14:27:29 +03:00
Anna Stuchlik
4ec531d807 doc: remove the sequential repair option from docs
Fixes https://github.com/scylladb/scylladb/issues/12132

The sequential repair mode is not supported. This commit
removes the incorrect information from the documentation.

Closes #13544

(cherry picked from commit 3d25edf539)
2023-05-07 14:27:29 +03:00
Asias He
4867683f80 storage_service: Fix removing replace node as pending
Consider

- n1, n2, n3
- n3 is down
- n4 replaces n3 with the same ip address 127.0.0.3
- Inside the storage_service::handle_state_normal callback for 127.0.0.3 on n1/n2

  ```
  auto host_id = _gossiper.get_host_id(endpoint);
  auto existing = tmptr->get_endpoint_for_host_id(host_id);
  ```

  host_id = new host id
  existing = empty

  As a result, del_replacing_endpoint() will not be called.

This means 127.0.0.3 will not be removed as a pending node on n1 and n2 when
replacing is done. This is wrong.

This is a regression since commit 9942c60d93
(storage_service: do not inherit the host_id of a replaced a node), where
replacing node uses a new host id than the node to be replaced.

To fix, call del_replacing_endpoint() when a node becomes NORMAL and existing
is empty.

Before:
n1:
storage_service - replace[cd1f187a-0eee-4b04-91a9-905ecc499cfc]: Added replacing_node=127.0.0.3 to replace existing_node=127.0.0.3, coordinator=127.0.0.3
token_metadata - Added node 127.0.0.3 as pending replacing endpoint which replaces existing node 127.0.0.3
storage_service - replace[cd1f187a-0eee-4b04-91a9-905ecc499cfc]: Marked ops done from coordinator=127.0.0.3
storage_service - Node 127.0.0.3 state jump to normal
storage_service - Set host_id=6f9ba4e8-9457-4c76-8e2a-e2be257fe123 to be owned by node=127.0.0.3

After:
n1:
storage_service - replace[28191ea6-d43b-3168-ab01-c7e7736021aa]: Added replacing_node=127.0.0.3 to replace existing_node=127.0.0.3, coordinator=127.0.0.3
token_metadata - Added node 127.0.0.3 as pending replacing endpoint which replaces existing node 127.0.0.3
storage_service - replace[28191ea6-d43b-3168-ab01-c7e7736021aa]: Marked ops done from coordinator=127.0.0.3
storage_service - Node 127.0.0.3 state jump to normal
token_metadata - Removed node 127.0.0.3 as pending replacing endpoint which replaces existing node 127.0.0.3
storage_service - Set host_id=72219180-e3d1-4752-b644-5c896e4c2fed to be owned by node=127.0.0.3

Tests: https://github.com/scylladb/scylla-dtest/pull/3126

Closes #13677

Fixes: https://github.com/scylladb/scylla-enterprise/issues/2852

(cherry picked from commit a8040306bb)
2023-05-03 14:15:13 +03:00
Botond Dénes
0e42defe06 readers: evictable_reader: skip progress guarantee when next pos is partition start
The evictable reader must ensure that each buffer fill makes forward
progress, i.e. the last fragment in the buffer has a position larger
than the last fragment from the last buffer-fill. Otherwise, the reader
could get stuck in an infinite loop between buffer fills, if the reader
is evicted in-between.
The code guranteeing this forward change has a bug: when the next
expected position is a partition-start (another partition), the code
would loop forever, effectively reading all there is from the underlying
reader.
To avoid this, add a special case to ignore the progress guarantee loop
altogether when the next expected position is a partition start. In this
case, progress is garanteed anyway, because there is exactly one
partition-start fragment in each partition.

Fixes: #13491

Closes #13563

(cherry picked from commit 72003dc35c)
2023-05-02 21:58:41 +03:00
Avi Kivity
f73d017f05 tools: toolchain: regenerate
Fixes #13744
2023-05-02 13:16:59 +03:00
Pavel Emelyanov
3723678b82 scylla-gdb: Parse and eval _all_threads without quotes
I've no idea why the quotes are there at all, it works even without
them. However, with quotes gdb-13 fails to find the _all_threads static
thread-local variable _unless_ it's printed with gdb "p" command
beforehand.

fixes: #13125

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #13132

(cherry picked from commit 537510f7d2)
2023-05-02 13:16:59 +03:00
Botond Dénes
ea506f50cc Merge 'Do not mask node operation errors' from Benny Halevy
This series handles errors when aborting node operations and prints them rather letting them leak and be exposed to the user.

Also, cleanup the node_ops logging formats when aborting different node ops
and add more error logging around errors in the "worker" nodes.

Closes #12799

* github.com:scylladb/scylladb:
  storage_service: node_ops_signal_abort: print a warning when signaling abort
  storage_service: s/node_ops_singal_abort/node_ops_signal_abort/
  storage_service: node_ops_abort: add log messages
  storage_service: wire node_ops_ctl for node operations
  storage_service: add node_ops_ctl class to formalize all node_ops flow
  repair: node_ops_cmd_request: add print function
  repair: do_decommission_removenode_with_repair: log ignore_nodes
  repair: replace_with_repair: get ignore_nodes as unordered_set
  gossiper: get_generation_for_nodes: get nodes as unordered_set
  storage_service: don't let node_ops abort failures mask the real error

(cherry picked from commit 6373452b31)
2023-04-30 18:58:28 +03:00
Kamil Braun
42fd3704e4 Merge 'storage_service: Make node operations safer by detecting asymmetric abort' from Tomasz Grabiec
This patch fixes a problem which affects decommission and removenode
which may lead to data consistency problems under conditions which
lead one of the nodes to unliaterally decide to abort the node
operation without the coordinator noticing.

If this happens during streaming, the node operation coordinator would
proceed to make a change in the gossiper, and only later dectect that
one of the nodes aborted during sending of decommission_done or
removenode_done command. That's too late, because the operation will
be finalized by all the nodes once gossip propagates.

It's unsafe to finalize the operation while another node aborted. The
other node reverted to the old topolgy, with which they were running
for some time, without considering the pending replica when handling
requests. As a result, we may end up with consistency issues. Writes
made by those coordinators may not be replicated to CL replicas in the
new topology. Streaming may have missed to replicate those writes
depending on timing.

It's possible that some node aborts but streaming succeeds if the
abort is not due to network problems, or if the network problems are
transient and/or localized and affect only heartbeats.

There is no way to revert after we commit the node operation to the
gossiper, so it's ok to close node_ops sessions before making the
change to the gossiper, and thus detect aborts and prevent later aborts
after the change in the gossiper is made. This is already done during
bootstrap (RBNO enabled) and replacenode. This patch canges removenode
to also take this approach by moving sending of remove_done earlier.

We cannot take this approach with decommission easily, because
decommission_done command includes a wait for the node to leave the
ring, which won't happen before the change to the gossiper is
made. Separating this from decommission_done would require protocol
changes. This patch adds a second-best solution, which is to check if
sessions are still there right before making a change to the gossiper,
leaving decommission_done where it was.

The race can still happen, but the time window is now much smaller.

The PR also lays down infrastructure which enables testing the scenarios. It makes node ops
watchdog periods configurable, and adds error injections.

Fixes #12989
Refs #12969

Closes #13028

* github.com:scylladb/scylladb:
  storage_service: node ops: Extract node_ops_insert() to reduce code duplication
  storage_service: Make node operations safer by detecting asymmetric abort
  storage_service: node ops: Add error injections
  service: node_ops: Make watchdog and heartbeat intervals configurable

(cherry picked from commit 2b44631ded)
2023-04-30 18:58:28 +03:00
Asias He
c9d19b3595 storage_service: Wait for normal state handler to finish in replace
Similar to "storage_service: Wait for normal state handler to finish in
bootstrap", this patch enables the check on the replace procedure.

(cherry picked from commit 5856e69462)
2023-04-30 18:58:28 +03:00
Asias He
9a873bf4b3 storage_service: Wait for normal state handler to finish in bootstrap
In storage_service::handle_state_normal, storage_service::notify_joined
will be called which drops the rpc connections to the node becomes
normal. This causes rpc calls with that node fail with
seastar::rpc::closed_error error.

Consider this:

- n1 in the cluster
- n2 is added to join the cluster
- n2 sees n1 is in normal status
- n2 starts bootstrap process
- notify_joined on n2 closes rpc connection to n1 in the middle of
  bootstrap
- n2 fails to bootstrap

For example, during bootstrap with RBNO, we saw repair failed in a
test that sets ring_delay to zero and does not wait for gossip to
settle.

repair - repair[9cd0dbf8-4bca-48fc-9b1c-d9e80d0313a2]: sync data for
keyspace=system_distributed_everywhere, status=failed:
std::runtime_error ({shard 0: seastar::rpc::closed_error (connection is
closed)})

This patch fixes the race by waiting for the handle_state_normal handler
to finish before the bootstrap process.

Fixes #12764
Fixes #12956

(cherry picked from commit 53636167ca)
2023-04-30 18:58:28 +03:00
Asias He
51a00280a2 storage_service: Send heartbeat earlier for node ops
Node ops has the following procedure:

1   for node in sync_nodes
      send prepare cmd to node

2   for node in sync_nodes
      send heartbeat cmd to node

If any of the prepare cmd in step 1 takes longer than the heartbeat
watchdog timeout, the heartbeat in step 2 will be too late to update the
watchdog, as a result the watchdog will abort the operation.

To prevent slow prepare cmd kills the node operations, we can start the
heartbeat earlier in the procedure.

Fixes #11011
Fixes #12969

Closes #12980

(cherry picked from commit ba919aa88a)
2023-04-30 18:58:28 +03:00
Wojciech Mitros
b0a7c02e09 rust: update dependencies
Cranelift-codegen 0.92.0 and wasmtime 5.0.0 have security issues
potentially allowing malicious UDFs to read some memory outside
the wasm sandbox. This patch updates them to versions 0.92.1
and 5.0.1 respectively, where the issues are fixed.

Fixes #13157

Closes #13171

(cherry picked from commit aad2afd417)
2023-04-27 22:01:44 +03:00
Wojciech Mitros
f18c49dcc6 rust: update dependencies
Wasmtime added some improvements in recent releases - particularly,
two security issues were patched in version 2.0.2. There were no
breaking changes for our use other than the strategy of returning
Traps - all of them are now anyhow::Errors instead, but we can
still downcast to them, and read the corresponding error message.

The cxx, anyhow and futures dependency versions now match the
versions saved in the Cargo.lock.

Closes #12830

(cherry picked from commit 8b756cb73f)

Ref #13157
2023-04-27 22:00:54 +03:00
Anna Stuchlik
35dfec78d1 doc: fixes https://github.com/scylladb/scylladb/issues/12964, removes the information that the CDC options are experimental
Closes #12973

(cherry picked from commit 4dd1659d0b)
2023-04-27 21:06:49 +03:00
Raphael S. Carvalho
dbd8ca4ade replica: Fix undefined behavior in table::generate_and_propagate_view_updates()
Undefined behavior because the evaluation order is undefined.

With GCC, where evaluation is right-to-left, schema will be moved
once it's forwarded to make_flat_mutation_reader_from_mutations_v2().

The consequence is that memory tracking of mutation_fragment_v2
(for tracking only permit used by view update), which uses the schema,
can be incorrect. However, it's more likely that Scylla will crash
when estimating memory usage for row, which access schema column
information using schema::column_at(), which in turn asserts that
the requested column does really exist.

Fixes #13093.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #13092

(cherry picked from commit 3fae46203d)
2023-04-27 19:56:38 +03:00
Anna Stuchlik
1be4afb842 doc: remove incorrect info about BYPASS CACHE
Fixes https://github.com/scylladb/scylladb/issues/13106

This commit removes the information that BYPASS CACHE
is an Enterprise-only feature and replaces that info
with the link to the BYPASS CACHE description.

Closes #13316

(cherry picked from commit 1cfea1f13c)
2023-04-27 19:54:04 +03:00
Kefu Chai
7cc9f5a05f dist/redhat: enforce dependency on %{release} also
* tools/python3 279b6c1...cf7030a (1):
  > dist: redhat: provide only a single version

s/%{version}/%{version}-%{release}/ in `Requires:` sections.

this enforces the runtime dependencies of exactly the same
releases between scylla packages.

Fixes #13222
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
(cherry picked from commit 7165551fd7)
2023-04-27 19:27:34 +03:00
Nadav Har'El
bf7fc9709d test/rest_api: fix flaky test for toppartitions
The REST test test_storage_service.py::test_toppartitions_pk_needs_escaping
was flaky. It tests the toppartition request, which unfortunately needs
to choose a sampling duration in advance, and we chose 1 second which we
considered more than enough - and indeed typically even 1ms is enough!
but very rarely (only know of only one occurance, in issue #13223) one
second is not enough.

Instead of increasing this 1 second and making this test even slower,
this patch takes a retry approach: The tests starts with a 0.01 second
duration, and is then retried with increasing durations until it succeeds
or a 5-seconds duration is reached. This retry approach has two benefits:
1. It de-flakes the test (allowing a very slow test to take 5 seconds
instead of 1 seconds which wasn't enough), and 2. At the same time it
makes a successful test much faster (it used to always take a full
second, now it takes 0.07 seconds on a dev build on my laptop).

A *failed* test may, in some cases, take 10 seconds after this patch
(although in some other cases, an error will be caught immediately),
but I consider this acceptable - this test should pass, after all,
and a failure indicates a regression and taking 10 seconds will be
the last of our worries in that case.

Fixes #13223.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #13238

(cherry picked from commit c550e681d7)
2023-04-27 19:16:58 +03:00
Nadav Har'El
00a8c3a433 test/alternator: increase CQL connection timeout
This patch increases the connection timeout in the get_cql_cluster()
function in test/cql-pytest/run.py. This function is used to test
that Scylla came up, and also test/alternator/run uses it to set
up the authentication - which can only be done through CQL.

The Python driver has 2-second and 5-second default timeouts that should
have been more than enough for everybody (TM), but in #13239 we saw
that in one case it apparently wasn't enough. So to be extra safe,
let's increase the default connection-related timeouts to 60 seconds.

Note this change only affects the Scylla *boot* in the test/*/run
scripts, and it does not affect the actual tests - those have different
code to connect to Scylla (see cql_session() in test/cql-pytest/util.py),
and we already increased the timeouts there in #11289.

Fixes #13239

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #13291

(cherry picked from commit 4fdcee8415)
2023-04-27 19:15:39 +03:00
Tomasz Grabiec
c08ed39a33 direct_failure_detector: Avoid throwing exceptions in the success path
sleep_abortable() is aborted on success, which causes sleep_aborted
exception to be thrown. This causes scylla to throw every 100ms for
each pinged node. Throwing may reduce performance if happens often.

Also, it spams the logs if --logger-log-level exception=trace is enabled.

Avoid by swallowing the exception on cancellation.

Fixes #13278.

Closes #13279

(cherry picked from commit 99cb948eac)
2023-04-27 19:14:31 +03:00
Kefu Chai
04424f8956 test: cql-pytest: test_describe: clamp bloom filter's fp rate
before this change, we use `round(random.random(), 5)` for
the value of `bloom_filter_fp_chance` config option. there are
chances that this expression could return a number lower or equal
to 6.71e-05.

but we do have a minimal for this option, which is defined by
`utils::bloom_calculations::probs`. and the minimal false positive
rate is 6.71e-05.

we are observing test failures where the we are using 0 for
the option, and scylla right rejected it with the error message of
```
bloom_filter_fp_chance must be larger than 6.71e-05 and less than or equal to 1.0 (got 0)
```.

so, in this change, to address the test failure, we always use a number
slightly greater or equal to a number slightly greater to the minimum to
ensure that the randomly picked number is in the range of supported
false positive rate.

Fixes #13313
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13314

(cherry picked from commit 33f4012eeb)
2023-04-27 19:12:53 +03:00
Beni Peled
429b696bbc release: prepare for 5.2.0 2023-04-27 16:26:43 +03:00
Beni Peled
a89867d8c2 release: prepare for 5.2.0-rc5 2023-04-25 14:37:54 +03:00
Benny Halevy
6ad94fedf3 utils: clear_gently: do not clear null unique_ptr
Otherwise the null pointer is dereferenced.

Add a unit test reproducing the issue
and testing this fix.

Fixes #13636

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 12877ad026)
2023-04-24 17:51:01 +03:00
Anna Stuchlik
a6188d6abc doc: document tombstone_gc as not experimental
The tombstone_gc was documented as experimental in version 5.0.
It is no longer experimental in version 5.2.
This commit updates the information about the option.

Closes #13469

(cherry picked from commit a68b976c91)
2023-04-24 11:54:06 +03:00
Botond Dénes
50095cc3a5 Merge 'db: system_keyspace: use microsecond resolution for group0_history range tombstone' from Kamil Braun
in `make_group0_history_state_id_mutation`, when adding a new entry to
the group 0 history table, if the parameter `gc_older_than` is engaged,
we create a range tombstone in the mutation which deletes entries older
than the new one by `gc_older_than`. In particular if
`gc_older_than = 0`, we want to delete all older entries.

There was a subtle bug there: we were using millisecond resolution when
generating the tombstone, while the provided state IDs used microsecond
resolution. On a super fast machine it could happen that we managed to
perform two schema changes in a single millisecond; this happened
sometimes in `group0_test.test_group0_history_clearing_old_entries`
on our new CI/promotion machines, causing the test to fail because the
tombstone didn't clear the entry correspodning to the previous schema
change when performing the next schema change (since they happened in
the same millisecond).

Use microsecond resolution to fix that. The consecutive state IDs used
in group 0 mutations are guaranteed to be strictly monotonic at
microsecond resolution (see `generate_group0_state_id` in
service/raft/raft_group0_client.cc).

Fixes #13594

Closes #13604

* github.com:scylladb/scylladb:
  db: system_keyspace: use microsecond resolution for group0_history range tombstone
  utils: UUID_gen: accept decimicroseconds in min_time_UUID

(cherry picked from commit 10c1f1dc80)
2023-04-23 16:03:02 +03:00
Botond Dénes
7b2215d8e0 Merge 'Backport bugfixes regarding UDT, UDF, UDA interactions to branch-5.2' from Wojciech Mitros
This patch backports https://github.com/scylladb/scylladb/pull/12710 to branch-5.2. To resolve the conflicts that it's causing, it also includes
* https://github.com/scylladb/scylladb/pull/12680
* https://github.com/scylladb/scylladb/pull/12681

Closes #13542

* github.com:scylladb/scylladb:
  uda: change the UDF used in a UDA if it's replaced
  functions: add helper same_signature method
  uda: return aggregate functions as shared pointers
  udf: also check reducefunc to confirm that a UDF is not used in a UDA
  udf: fix dropping UDFs that share names with other UDFs used in UDAs
  pytest: add optional argument for new_function argument types
  udt: disallow dropping a user type used in a user function
2023-04-19 01:38:08 -04:00
Botond Dénes
da9f90362d Merge 'Compaction reevaluation bug fixes' from Raphael "Raph" Carvalho
A problem in compaction reevaluation can cause the SSTable set to be left uncompacted for indefinite amount of time, potentially causing space and read amplification to be suboptimal.

Two revaluation problems are being fixed, one after off-strategy compaction ended, and another in compaction manager which intends to periodically reevaluate a need for compaction.

Fixes https://github.com/scylladb/scylladb/issues/13429.
Fixes https://github.com/scylladb/scylladb/issues/13430.

Closes #13431

* github.com:scylladb/scylladb:
  compaction: Make compaction reevaluation actually periodic
  replica: Reevaluate regular compaction on off-strategy completion

(cherry picked from commit 9a02315c6b)
2023-04-19 01:14:33 -04:00
Botond Dénes
c9a17c80f6 mutation/mutation_compactor: consume_partition_end(): reset _stop
The purpose of `_stop` is to remember whether the consumption of the
last partition was interrupted or it was consumed fully. In the former
case, the compactor allows retreiving the compaction state for the given
partition, so that its compaction can be resumed at a later point in
time.
Currently, `_stop` is set to `stop_iteration::yes` whenever the return
value of any of the `consume()` methods is also `stop_iteration::yes`.
Meaning, if the consuming of the partition is interrupted, this is
remembered in `_stop`.
However, a partition whose consumption was interrupted is not always
continued later. Sometimes consumption of a partitions is interrputed
because the partition is not interesting and the downstream consumer
wants to stop it. In these cases the compactor should not return an
engagned optional from `detach_state()`, because there is not state to
detach, the state should be thrown away. This was incorrectly handled so
far and is fixed in this patch, but overwriting `_stop` in
`consume_partition_end()` with whatever the downstream consumer returns.
Meaning if they want to skip the partition, then `_stop` is reset to
`stop_partition::no` and `detach_state()` will return a disengaged
optional as it should in this case.

Fixes: #12629

Closes #13365

(cherry picked from commit bae62f899d)
2023-04-18 02:32:24 -04:00
Wojciech Mitros
7242c42089 uda: change the UDF used in a UDA if it's replaced
Currently, if a UDA uses a UDF that's being replaced,
the UDA will still keep using the old UDF until the
node is restarted.
This patch fixes this behavior by checking all UDAs
when replacing a UDF and updating them if necessary.

Fixes #12709

(cherry picked from commit 02bfac0c66)
2023-04-17 13:14:46 +02:00
Wojciech Mitros
70ff69afab functions: add helper same_signature method
When deciding whether two functions have the same
signature, we have to check if they have the same name
and parameter types. Additionally, if they're represented
by pointers, we need to check if any of them is a nullptr.
This logic is used multiple times, so it's extracted to
a separate function.
To use this function, the `used_by_user_aggregate` method
takes now a function instead of name and types list - we
can do it because we always use it with an existing user
function (that we're trying to drop).
The method will also be useful when we'll be not dropping,
but replacing a user function.

(cherry picked from commit 58987215dc)
2023-04-17 13:14:40 +02:00
Wojciech Mitros
5fd4bb853b uda: return aggregate functions as shared pointers
We will want to reuse the functions that we get from an aggregate
without making a deep copy, and it's only possible if we get
pointers from the aggregate instead of actual values.

(cherry picked from commit 20069372e7)
2023-04-17 13:14:24 +02:00
Wojciech Mitros
313649e86d udf: also check reducefunc to confirm that a UDF is not used in a UDA
When dropping a UDF we're checking if it's not begin used in any UDAs
and fail otherwise. However, we're only checking its state function
and final function, and it may also be used as its reduce function.
This patch adds the missing checks and a test for them.

(cherry picked from commit ef1dac813b)
2023-04-17 13:14:16 +02:00
Wojciech Mitros
14d8cec130 udf: fix dropping UDFs that share names with other UDFs used in UDAs
Currently, when dropping a function, we only check if there exist
an aggregate that uses a function with the same name as its state
function or final function. This may cause the drop to fail even
when it's just another UDF with the same name that's used in the
aggregate, even when the actual dropped function is not used there.
This patch fixes this by checking whether not only the name of the
UDA's sfunc and finalfunc, but also their argument types.

(cherry picked from commit 49077dd144)
2023-04-17 13:14:09 +02:00
Wojciech Mitros
203cbb79a1 pytest: add optional argument for new_function argument types
When multiple functions with the same name but different argument types
are created, the default drop statement for these functions will fail
because it does not include the argument types.
With this change, this problem can be worked around by specifying
argument types when creating the function, as this will cause the drop
statement to include them.

(cherry picked from commit 8791b0faf5)
2023-04-17 13:13:59 +02:00
Wojciech Mitros
51f19d1b8c udt: disallow dropping a user type used in a user function
Currently, nothing prevents us from dropping a user type
used in a user function, even though doing so may make us
unable to use the function correctly.
This patch prevents this behavior by checking all function
argument and return types when executing a drop type statement
and preventing it from completing if the type is referenced
by any of them.

(cherry picked from commit 86c61828e6)
2023-04-17 13:13:35 +02:00
Anna Stuchlik
83735ae77f doc: update the metrics between 5.2 and 2023.1
Related: https://github.com/scylladb/scylla-enterprise/issues/2794

This commit adds the information about the metric changes
in version 2023.1 compared to version 5.2.

This commit is part of the 5.2-to-2023.1 upgrade guide and
must be backported to branch-5.2.

Closes #13506

(cherry picked from commit 989a75b2f7)
2023-04-17 11:29:43 +02:00
Avi Kivity
9d384e3af2 Merge 'Backport "reader_concurrency_semaphore: don't evict inactive readers needlessly" to branch-5.2' from Botond Dénes
The patch doesn't apply cleanly, so a targeted backport PR was necessary.
I also needed to cherry-pick two patches from https://github.com/scylladb/scylladb/pull/13255 that the backported patch depends on. Decided against backporting the entire https://github.com/scylladb/scylladb/pull/13255 as it is quite an intrusive change.

Fixes: https://github.com/scylladb/scylladb/issues/11803

Closes #13515

* github.com:scylladb/scylladb:
  reader_concurrency_semaphore: don't evict inactive readers needlessly
  reader_concurrency_semaphore: add stats to record reason for queueing permits
  reader_concurrency_semaphore: can_admit_read(): also return reason for rejection
2023-04-17 12:25:21 +03:00
Nadav Har'El
0da0c94f49 cql: USING TTL 0 means unlimited, not default TTL
Our documentation states that writing an item with "USING TTL 0" means it
should never expire. This should be true even if the table has a default
TTL. But Scylla mistakenly handled "USING TTL 0" exactly like having no
USING TTL at all (i.e., it took the default TTL, instead of unlimited).
We had two xfailing tests demonstrating that Scylla's behavior in this
is different from Cassandra. Scylla's behavior in this case was also
undocumented.

By the way, Cassandra used to have the same bug (CASSANDRA-11207) but
it was fixed already in 2016 (Cassandra 3.6).

So in this patch we fix Scylla's "USING TTL 0" behavior to match the
documentation and Cassandra's behavior since 2016. One xfailing test
starts to pass and the second test passes this bug and fails on a
different one. This patch also adds a third test for "USING TTL ?"
with UNSET_VALUE - it behaves, on both Scylla and Cassandra, like a
missing "USING TTL".

The origin of this bug was that after parsing the statement, we saved
the USING TTL in an integer, and used 0 for the case of no USING TTL
given. This meant that we couldn't tell if we have USING TTL 0 or
no USING TTL at all. This patch uses an std::optional so we can tell
the case of a missing USING TTL from the case of USING TTL 0.

Fixes #6447

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #13079

(cherry picked from commit a4a318f394)
2023-04-17 10:41:08 +03:00
Nadav Har'El
1a9f51b767 cql: fix empty aggregation, and add more tests
This patch fixes #12475, where an aggregation (e.g., COUNT(*), MIN(v))
of absolutely no partitions (e.g., "WHERE p = null" or "WHERE p in ()")
resulted in an internal error instead of the "zero" result that each
aggregator expects (e.g., 0 for COUNT, null for MIN).

The problem is that normally our aggregator forwarder picks the nodes
which hold the relevant partition(s), forwards the request to each of
them, and then combines these results. When there are no partitions,
the query is sent to no node, and we end up with an empty result set
instead of the "zero" results. So in this patch we recognize this
case and build those "zero" results (as mentioned above, these aren't
always 0 and depend on the aggregation function!).

The patch also adds two tests reproducing this issue in a fairly general
way (e.g., several aggregators, different aggregation functions) and
confirming the patch fixes the bug.

The test also includes two additional tests for COUNT aggregation, which
uncovered an incompatibility with Cassandra which is still not fixed -
so these tests are marked "xfail":

Refs #12477: Combining COUNT with GROUP by results with empty results
             in Cassandra, and one result with empty count in Scylla.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12715

(cherry picked from commit 3ba011c2be)
2023-04-17 10:41:08 +03:00
Raphael S. Carvalho
dba0e604a7 table: Fix disk-space related metrics
total disk space used metric is incorrectly telling the amount of
disk space ever used, which is wrong. It should tell the size of
all sstables being used + the ones waiting to be deleted.
live disk space used, by this defition, shouldn't account the
ones waiting to be deleted.
and live sstable count, shouldn't account sstables waiting to
be deleted.

Fix all that.

Fixes #12717.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit 529a1239a9)
2023-04-16 22:14:01 +03:00
Michał Chojnowski
4ea67940cb locator: token_metadata: get rid of a quadratic behaviour in get_address_ranges()
Some callees of update_pending_ranges use the variant of get_address_ranges()
which builds a hashmap of all <endpoint, owned range> pairs. For
everywhere_topology, the size of this map is quadratic in the number of
endpoints, making it big enough to cause contiguous allocations of tens of MiB
for clusters of realistic size, potentially causing trouble for the
allocator (as seen e.g. in #12724). This deserves a correction.

This patch removes the quadratic variant of get_address_ranges() and replaces
its uses with its linear counterpart.

Refs #10337
Refs #10817
Refs #10836
Refs #10837
Fixes #12724

(cherry picked from commit 9e57b21e0c)
2023-04-16 21:59:14 +03:00
Jan Ciolek
a8c49c44e5 cql/query_options: add a check for missing bind marker name
There was a missing check in validation of named
bind markers.

Let's say that a user prepares a query like:
```cql
INSERT INTO ks.tab (pk, ck, v) VALUES (:pk, :ck, :v)
```
Then they execute the query, but specify only
values for `:pk` and `:ck`.

We should detect that a value for :v is missing
and throw an invalid_request_exception.

Until now there was no such check, in case of a missing variable
invalid `query_options` were created and Scylla crashed.

Sadly it's impossible to create a regression test
using `cql-pytest` or `boost`.

`cql-pytest` uses the python driver, which silently
ignores mising named bind variables, deciding
that the user meant to send an UNSET_VALUE for them.
When given values like `{'pk': 1, 'ck': 2}`, it will automaticaly
extend them to `{'pk': 1, 'ck': 2, 'v': UNSET_VALUE}`.

In `boost` I tried to use `cql_test_env`,
but it only has methods which take valid `query_options`
as a parameter. I could create a separate unit tests
for the creation and validation of `query_options`
but it won't be a true end-to-end test like `cql-pytest`.

The bug was found using the rust driver,
the reproducer is available in the issue description.

Fixes: #12727

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>

Closes #12730

(cherry picked from commit 2a5ed115ca)
2023-04-16 21:57:28 +03:00
Nadav Har'El
12a29edf90 test/alternator: fix flaky test for partition-tombstone scan
The test test_scan.py::test_scan_long_partition_tombstone_string
checks that a full-table Scan operation ends a page in the middle of
a very long string of partition tombstones, and does NOT scan the
entire table in one page (if we did that, getting a single page could
take an unbounded amount of time).

The test is currently flaky, having failed in CI runs three times in
the past two months.

The reason for the flakiness is that we don't know exactly how long
we need to make the sequence of partition tombstones in the test before
we can be absolutely sure a single page will not read this entire sequence.
For single-partition scans we have the "query_tombstone_page_limit"
configuration parameter, which tells us exactly how long we need to
make the sequence of row tombstones. But for a full-table scan of
partition tombstones, the situation is more complicated - because the
scan is done in parallel on several vnodes in parallel and each of
them needs to read query_tombstone_page_limit before it stops.

In my experiments, using query_tombstone_limit * 4 consecutive tombstones
was always enough - I ran this test hundreds of times and it didn't fail
once. But since it did fail on Jenkins very rarely (3 times in the last
two months), maybe the multiplier 4 isn't enough. So this patch doubles
it to 8. Hopefully this would be enough for anyone (TM).

This makes this test even bigger and slower than it was. To make it
faster, I changed this test's write isolation mode from the default
always_use_lwt to forbid_rmw (not use LWT). This leaves the test's
total run time to be similar to what it was before this patch - around
0.5 seconds in dev build mode on my laptop.

Fixes #12817

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12819

(cherry picked from commit 14cdd034ee)
2023-04-14 11:54:45 +03:00
Botond Dénes
3e10c3fc89 reader_concurrency_semaphore: don't evict inactive readers needlessly
Inactive readers should only be evicted to free up resources for waiting
readers. Evicting them when waiters are not admitted for any other
reason than resources is wasteful and leads to extra load later on when
these evicted readers have to be recreated end requeued.
This patch changes the logic on both the registering path and the
admission path to not evict inactive readers unless there are readers
actually waiting on resources.
A unit-test is also added, reproducing the overly-agressive eviction and
checking that it doesn't happen anymore.

Fixes: #11803

Closes #13286

(cherry picked from commit bd57471e54)
2023-04-14 10:37:30 +03:00
Botond Dénes
f11deb5074 reader_concurrency_semaphore: add stats to record reason for queueing permits
When diagnosing problems, knowing why permits were queued is very
valuable. Record the reason in a new stats, one for each reason a permit
can be queued.

(cherry picked from commit 7b701ac52e)
2023-04-14 10:37:30 +03:00
Botond Dénes
1baf9dddd7 reader_concurrency_semaphore: can_admit_read(): also return reason for rejection
So caller can bump the appropriate counters or log the reason why the
the request cannot be admitted.

(cherry picked from commit bb00405818)
2023-04-14 09:30:02 +03:00
Kamil Braun
9717ff5057 docs: cleaning up after failed membership change
After a failed topology operation, like bootstrap / decommission /
removenode, the cluster might contain a garbage entry in either token
ring or group 0. This entry can be cleaned-up by executing removenode on
any other node, pointing to the node that failed to bootstrap or leave
the cluster.

Document this procedure, including a method of finding the host ID of a
garbage entry.

Add references in other documents.

Fixes: #13122

Closes #13186

(cherry picked from commit c2a2996c2b)
2023-04-13 10:35:02 +02:00
Anna Stuchlik
b293b1446f doc: remove Enterprise upgrade guides from OSS doc
This commit removes the Enterprise upgrade guides from
the Open Source documentation. The Enterprise upgrade guides
should only be available in the Enterprise documentation,
with the source files stored in scylla-enterprise.git.

In addition, this commit:
- adds the links to the Enterprise user guides in the Enterprise
documentation at https://enterprise.docs.scylladb.com/
- adds the redirections for the removed pages to avoid
breaking any links.

This commit must be reverted in scylla-enterprise.git.

(cherry picked from commit 61bc05ae49)

Closes #13473
2023-04-11 14:26:35 +03:00
Yaron Kaikov
e6f7ac17f6 doc: update supported os for 2022.1
ubuntu22.04 is already supported on both `5.0` and `2022.1`

updating the table

Closes #13340

(cherry picked from commit c80ab78741)
2023-04-05 13:56:07 +03:00
Anna Stuchlik
36619fc7d9 doc: add upgrade guide from 5.2 to 2023.1
Related: https://github.com/scylladb/scylla-enterprise/issues/2770

This commit adds the upgrade guide from ScyllaDB Open Source 5.2
to ScyllaDB Enterprise 2023.1.
This commit does not cover metric updates (the metrics file has no
content, which needs to be added in another PR).

As this is an upgrade guide, this commit must be merged to master and
backported to branch-5.2 and branch-2023.1 in scylla-enterprise.git.

Closes #13294

(cherry picked from commit 595325c11b)
2023-04-05 06:43:01 +03:00
Anna Stuchlik
750414c196 doc: update Raft doc for versions 5.2 and 2023.1
Fixes https://github.com/scylladb/scylladb/issues/13345
Fixes https://github.com/scylladb/scylladb/issues/13421

This commit updates the Raft documentation page to be up to date in versions 5.2 and 2023.1.

- Irrelevant information about previous releases is removed.
- Some information is clarified.
- Mentions of version 5.2 are either removed (if possible) or version 2023.1 is added.

Closes #13426

(cherry picked from commit 447ce58da5)
2023-04-05 06:42:28 +03:00
Botond Dénes
128050e984 Merge 'commitlog: Fix updating of total_size_on_disk on segment alloc when o_dsync is off' from Calle Wilund
Fixes #12810

We did not update total_size_on_disk in commitlog totals when use o_dsync was off.
This means we essentially ran with no registered footprint, also causing broken comparisons in delete_segments.

Closes #12950

* github.com:scylladb/scylladb:
  commitlog: Fix updating of total_size_on_disk on segment alloc when o_dsync is off
  commitlog: change type of stored size

(cherry picked from commit e70be47276)
2023-04-03 08:57:43 +03:00
Yaron Kaikov
d70751fee3 release: prepare for 5.2.0-rc4 2023-04-02 16:40:56 +03:00
Tzach Livyatan
1fba43c317 docs: minor improvments to the Raft Handling Failures and recovery procedure sections
Closes #13292

(cherry picked from commit 46e6c639d9)
2023-03-31 11:22:20 +02:00
Botond Dénes
e380c24c69 Merge 'Improve database shutdown verbosity' from Pavel Emelyanov
The `database::stop` method is sometimes hanging and it's always hard to spot where exactly it sleeps. Few more logging messages would make this much simpler.

refs: #13100
refs: #10941

Closes #13141

* github.com:scylladb/scylladb:
  database: Increase verbosity of database::stop() method
  large_data_handler: Increase verbosity on shutdown
  large_data_handler: Coroutinize .stop() method

(cherry picked from commit e22b27a107)
2023-03-30 17:01:24 +03:00
Avi Kivity
76a76a95f4 Update tools/java submodule (hdrhistogram with Java 11)
* tools/java 1c4e1e7a7d...83b2168b19 (1):
  > Fix cassandra-stress -log hdrfile=... with java 11

Fixes #13287
2023-03-29 14:10:27 +03:00
Anna Stuchlik
f6837afec7 doc: update the Ubuntu version used in the image
Starting from 5.2 and 2023.1 our images are based on Ubuntu:22.04.
See https://github.com/scylladb/scylladb/issues/13138#issuecomment-1467737084

This commit adds that information to the docs.
It should be merged and backported to branch-5.2.

Closes #13301

(cherry picked from commit 9e27f6b4b7)
2023-03-27 14:08:57 +03:00
Botond Dénes
6350c8836d Revert "repair: Reduce repair reader eviction with diff shard count"
This reverts commit c6087cf3a0.

Said commit can cause a deadlock when 2 or more repairs compete for
locks on 2 or more nodes. Consider the following scenario:

Node n1 and n2 in the cluster, 1 shard per node, rf = 2, each shard has
1 available unit for the reader lock

    n1 starts repair r1
    r1-n1 (instance of r1 on node1) takes the reader lock on node1
    n2 starts repair r2
    r2-n2 (instance of r2 on node2) takes the reader lock on node2
    r1-n2 will fail to take the reader lock on node2
    r2-n1 will fail to take the reader lock on node1

As a result, r1 and r2 could not make progress and deadlock happens.

The complexity comes from the fact that a repair job needs lock on more
than one node. It is not guaranteed that all the participant nodes could
take the lock in one short.

There is no simple solution to this so we have to revert this locking
mechanism and look for another way to prevent reader trashing when
repairing nodes with mismatching shard count.

Fixes: #12693

Closes #13266

(cherry picked from commit 7699904c54)
2023-03-24 09:44:16 +02:00
Avi Kivity
5457948437 Update seastar submodule (rpc cancellation during negotiation)
* seastar 8889cbc198...1488aaf842 (1):
  > Merge 'Keep outgoing queue all cancellable while negotiating (again)' from Pavel Emelyanov

Fixes #11507.
2023-03-23 17:15:00 +02:00
Avi Kivity
da41001b5c .gitmodules: point seastar submodule at scylla-seastar.git
This allows is to backport seastar commits.

Ref #11507.
2023-03-23 17:11:43 +02:00
Anna Stuchlik
dd61e8634c doc: related https://github.com/scylladb/scylladb/issues/12754; add the missing information about reporting latencies to the upgrade guide 5.1 to 5.2
Closes #12935

(cherry picked from commit 26bb36cdf5)
2023-03-22 10:38:28 +02:00
Anna Stuchlik
b642b4c30e doc: fix the service name in upgrade guides
Fixes https://github.com/scylladb/scylladb/issues/13207

This commit fixes the service and package names in
the upgrade guides 5.0-to-2022.1 and 5.1-to-2022.2.
Service name: scylla-server
Package name: scylla-enterprise

Previous PRs to fix the same issue in other
upgrade guides:
https://github.com/scylladb/scylladb/pull/12679
https://github.com/scylladb/scylladb/pull/12698

This commit must be backported to branch-5.1 and branch 5.2.

Closes #13225

(cherry picked from commit 922f6ba3dd)
2023-03-22 10:37:12 +02:00
Botond Dénes
c013336121 db/view/view_update_check: check_needs_view_update_path(): filter out non-member hosts
We currently don't clean up the system_distributed.view_build_status
table after removed nodes. This can cause false-positive check for
whether view update generation is needed for streaming.
The proper fix is to clean up this table, but that will be more
involved, it even when done, it might not be immediate. So until then
and to be on the safe side, filter out entries belonging to unknown
hosts from said table.

Fixes: #11905
Refs: #11836

Closes #11860

(cherry picked from commit 84a69b6adb)
2023-03-22 09:03:50 +02:00
Kamil Braun
b6b35ce061 service: storage_proxy: sequence CDC preimage select with Paxos learn
`paxos_response_handler::learn_decision` was calling
`cdc_service::augment_mutation_call` concurrently with
`storage_proxy::mutate_internal`. `augment_mutation_call` was selecting
rows from the base table in order to create the preimage, while
`mutate_internal` was writing rows to the table. It was therefore
possible for the preimage to observe the update that it accompanied,
which doesn't make any sense, because the preimage is supposed to show
the state before the update.

Fix this by performing the operations sequentially. We can still perform
the CDC mutation write concurrently with the base mutation write.

`cdc_with_lwt_test` was sometimes failing in debug mode due to this bug
and was marked flaky. Unmark it.

Fixes #12098

(cherry picked from commit 1ef113691a)
2023-03-21 20:23:19 +02:00
Petr Gusev
069e38f02d transport server: fix unexpected server errors handling
If request processing ended with an error, it is worth
sending the error to the client through
make_error/write_response. Previously in this case we
just wrote a message to the log and didn't handle the
client connection in any way. As a result, the only
thing the client got in this case was timeout error.

A new test_batch_with_error is added. It is quite
difficult to reproduce error condition in a test,
so we use error injection instead. Passing injection_key
in the body of the request ensures that the exception
will be thrown only for this test request and
will not affect other requests that
the driver may send in the background.

Closes: scylladb#12104
(cherry picked from commit a4cf509c3d)
2023-03-21 20:23:09 +02:00
Anna Mikhlin
61a8003ad1 release: prepare for 5.2.0-rc3 2023-03-20 10:10:27 +02:00
Botond Dénes
8a17066961 Merge 'doc: Updates the recommended OS to be Ubuntu 22.04' from Anna Stuchlik
Fixes https://github.com/scylladb/scylladb/issues/13138
Fixes https://github.com/scylladb/scylladb/issues/13153

This PR:

- Fixes outdated information about the recommended OS. Since version 5.2, the recommended OS should be Ubuntu 22.04 because that OS is used for building the ScyllaDB image.
- Adds the OS support information for version 5.2.

This PR (both commits) needs to be backported to branch-5.2.

Closes #13188

* github.com:scylladb/scylladb:
  doc: Add OS support for version 5.2
  doc: Updates the recommended OS to be Ubuntu 22.04

(cherry picked from commit f4b5679804)
2023-03-17 10:30:06 +02:00
Pavel Emelyanov
487ba9f3e1 Merge '[backport] reader_concurrency_semaphore:: clear_inactive_reads(): defer evicting to evict()' from Botond Dénes
This PR backports 2f4a793457 to branch-5.2. Said patch depends on some other patches that are not part of any release yet.
This PR should apply to 5.1 and 5.0 too.

Closes #13162

* github.com:scylladb/scylladb:
  reader_concurrency_semaphore:: clear_inactive_reads(): defer evicting to evict()
  reader_permit: expose operator<<(reader_permit::state)
  reader_permit: add get_state() accessor
2023-03-16 18:41:08 +03:00
Botond Dénes
bd4f9e3615 Merge 'readers/nonforwarding: don't emit partition_end on next_partition,fast_forward_to' from Gusev Petr
The series fixes the `make_nonforwardable` reader, it shouldn't emit `partition_end` for previous partition after `next_partition()` and `fast_forward_to()`

Fixes: #12249

Closes #12978

* github.com:scylladb/scylladb:
  flat_mutation_reader_test: cleanup, seastar::async -> SEASTAR_THREAD_TEST_CASE
  make_nonforwardable: test through run_mutation_source_tests
  make_nonforwardable: next_partition and fast_forward_to when single_partition is true
  make_forwardable: fix next_partition
  flat_mutation_reader_v2: drop forward_buffer_to
  nonforwardable reader: fix indentation
  nonforwardable reader: refactor, extract reset_partition
  nonforwardable reader: add more tests
  nonforwardable reader: no partition_end after fast_forward_to()
  nonforwardable reader: no partition_end after next_partition()
  nonforwardable reader: no partition_end for empty reader
  row_cache: pass partition_start though nonforwardable reader

(cherry picked from commit 46efdfa1a1)
2023-03-16 10:42:03 +02:00
Botond Dénes
c68deb2461 reader_concurrency_semaphore:: clear_inactive_reads(): defer evicting to evict()
Instead of open-coding the same, in an incomplete way.
clear_inactive_reads() does incomplete eviction in severeal ways:
* it doesn't decrement _stats.inactive_reads
* it doesn't set the permit to evicted state
* it doesn't cancel the ttl timer (if any)
* it doesn't call the eviction notifier on the permit (if there is one)

The list goes on. We already have an evict() method that all this
correctly, use that instead of the current badly open-coded alternative.

This patch also enhances the existing test for clear_inactive_reads()
and adds a new one specifically for `stop()` being called while having
inactive reads.

Fixes: #13048

Closes #13049

(cherry picked from commit 2f4a793457)
2023-03-14 09:50:16 +02:00
Botond Dénes
dd96d3017a reader_permit: expose operator<<(reader_permit::state)
(cherry picked from commit ec1c615029)
2023-03-14 09:50:16 +02:00
Botond Dénes
6ca80ee118 reader_permit: add get_state() accessor
(cherry picked from commit 397266f420)
2023-03-14 09:40:11 +02:00
Jan Ciolek
eee8f750cc cql3: preserve binary_operator.order in search_and_replace
There was a bug in `expr::search_and_replace`.
It doesn't preserve the `order` field of binary_operator.

`order` field is used to mark relations created
using the SCYLLA_CLUSTERING_BOUND.
It is a CQL feature used for internal queries inside Scylla.
It means that we should handle the restriction as a raw
clustering bound, not as an expression in the CQL language.

Losing the SCYLLA_CLUSTERING_BOUND marker could cause issues,
the database could end up selecting the wrong clustering ranges.

Fixes: #13055

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>

Closes #13056

(cherry picked from commit aa604bd935)
2023-03-09 12:52:39 +02:00
Botond Dénes
8d5206e6c6 sstables/sstable: validate_checksums(): force-check EOF
EOF is only guarateed to be set if one tried to read past the end of the
file. So when checking for EOF, also try to read some more. This
should force the EOF flag into a correct value. We can then check that
the read yielded 0 bytes.
This should ensure that `validate_checksums()` will not falsely declare
the validation to have failed.

Fixes: #11190

Closes #12696

(cherry picked from commit 693c22595a)
2023-03-09 12:30:44 +02:00
Anna Stuchlik
cfa40402f4 doc: Update the documentation landing page
This commit makes the following changes to the docs landing page:

- Adds the ScyllaDB enterprise docs as one of three tiles.

- Modifies the three tiles to reflect the three flavors of ScyllaDB.

- Moves the "New to ScyllaDB? Start here!" under the page title.

- Renames "Our Products" to "Other Products" to list the products other
  than ScyllaDB itself. In addtition, the boxes are enlarged from to
  large-4 to look better.

The major purpose of this commit is to expose the ScyllaDB
documentation.

docs: fix the link
(cherry picked from commit 27bb8c2302)

Closes #13086
2023-03-06 14:18:15 +02:00
Botond Dénes
2d170e51cf Merge 'doc: specify the versions where Alternator TTL is no longer experimental' from Anna Stuchlik
This PR adds a note to the Alternator TTL section to specify in which Open Source and Enterprise versions the feature was promoted from experimental to non-experimental.

The challenge here is that OSS and Enterprise are (still) **documented together**, but they're **not in sync** in promoting the TTL feature: it's still experimental in 5.1 (released) but no longer experimental in 2022.2 (to be released soon).

We can take one of the following approaches:
a) Merge this PR with master and ask the 2022.2 users to refer to master.
b) Merge this PR with master and then backport to branch-5.1. If we choose this approach, it is necessary to backport https://github.com/scylladb/scylladb/pull/11997 beforehand to avoid conflicts.

I'd opt for a) because it makes more sense from the OSS perspective and helps us avoid mess and backporting.

Closes #12295

* github.com:scylladb/scylladb:
  doc: fix the version in the comment on removing the note
  doc: specify the versions where Alternator TTL is no longer experimental

(cherry picked from commit d5dee43be7)
2023-03-02 12:09:16 +02:00
Anna Stuchlik
860e79e4b1 doc: fixes https://github.com/scylladb/scylladb/issues/12954, adds the minimal version from which the 2021.1-to-2022.1 upgrade is supported for Ubuntu, Debian, and image
Closes #12974

(cherry picked from commit 91b611209f)
2023-02-28 13:02:05 +02:00
Anna Mikhlin
908a82bea0 release: prepare for 5.2.0-rc2 2023-02-28 10:13:06 +02:00
Gleb Natapov
39158f55d0 lwt: do not destroy capture in upgrade_if_needed lambda since the lambda is used more then once
If on the first call the capture is destroyed the second call may crash.

Fixes: #12958

Message-Id: <Y/sks73Sb35F+PsC@scylladb.com>
(cherry picked from commit 1ce7ad1ee6)
2023-02-27 14:19:37 +02:00
Raphael S. Carvalho
22c1685b3d sstables: Temporarily disable loading of first and last position metadata
It's known that reading large cells in reverse cause large allocations.
Source: https://github.com/scylladb/scylladb/issues/11642

The loading is preliminary work for splitting large partitions into
fragments composing a run and then be able to later read such a run
in an efficiency way using the position metadata.

The splitting is not turned on yet, anywhere. Therefore, we can
temporarily disable the loading, as a way to avoid regressions in
stable versions. Large allocations can cause stalls due to foreground
memory eviction kicking in.
The default values for position metadata say that first and last
position include all clustering rows, but they aren't used anywhere
other than by sstable_run to determine if a run is disjoint at
clustering level, but given that no splitting is done yet, it
does not really matter.

Unit tests relying on position metadata were adjusted to enable
the loading, such that they can still pass.

Fixes #11642.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #12979

(cherry picked from commit d73ffe7220)
2023-02-27 08:58:34 +02:00
Botond Dénes
9ba6fc73f1 mutation_compactor: only pass consumed range-tombstone-change to validator
Currently all consumed range tombstone changes are unconditionally
forwarded to the validator. Even if they are shadowed by a higher level
tombstone and/or purgable. This can result in a situation where a range
tombstone change was seen by the validator but not passed to the
consumer. The validator expects the range tombstone change to be closed
by end-of-partition but the end fragment won't come as the tombstone was
dropped, resulting in a false-positive validation failure.
Fix by only passing tombstones to the validator, that are actually
passed to the consumer too.

Fixes: #12575

Closes #12578

(cherry picked from commit e2c9cdb576)
2023-02-23 22:52:47 +02:00
Botond Dénes
f2e2c0127a types: unserialize_value for multiprecision_int,bool: don't read uninitialized memory
Check the first fragment before dereferencing it, the fragment might be
empty, in which case move to the next one.
Found by running range scan tests with random schema and random data.

Fixes: #12821
Fixes: #12823
Fixes: #12708

Closes #12824

(cherry picked from commit ef548e654d)
2023-02-23 22:38:03 +02:00
Gleb Natapov
363ea87f51 raft: abort applier fiber when a state machine aborts
After 5badf20c7a applier fiber does not
stop after it gets abort error from a state machine which may trigger an
assertion because previous batch is not applied. Fix it.

Fixes #12863

(cherry picked from commit 9bdef9158e)
2023-02-23 14:12:12 +02:00
Kefu Chai
c49fd6f176 tools/schema_loader: do not return ref to a local variable
we should never return a reference to local variable.
so in this change, a reference to a static variable is returned
instead. this should address following warning from Clang 17:

```
/home/kefu/dev/scylladb/tools/schema_loader.cc:146:16: error: returning reference to local temporary object [-Werror,-Wreturn-stack-address]
        return {};
               ^~
```

Fixes #12875
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12876

(cherry picked from commit 6eab8720c4)
2023-02-22 22:02:43 +02:00
Takuya ASADA
3114589a30 scylla_coredump_setup: fix coredump timeout settings
We currently configure only TimeoutStartSec, but probably it's not
enough to prevent coredump timeout, since TimeoutStartSec is maximum
waiting time for service startup, and there is another directive to
specify maximum service running time (RuntimeMaxSec).

To fix the problem, we should specify RunTimeMaxSec and TimeoutSec (it
configures both TimeoutStartSec and TimeoutStopSec).

Fixes #5430

Closes #12757

(cherry picked from commit bf27fdeaa2)
2023-02-19 21:13:36 +02:00
Anna Stuchlik
34f68a4c0f doc: related https://github.com/scylladb/scylladb/issues/12658, fix the service name in the upgrade guide from 2022.1 to 2022.2
Closes #12698

(cherry picked from commit 826f67a298)
2023-02-17 12:17:48 +02:00
Botond Dénes
b336e11f59 Merge 'doc: fix the service name from "scylla-enterprise-server" "to "scylla-server"' from Anna Stuchlik
Related https://github.com/scylladb/scylladb/issues/12658.

This issue fixes the bug in the upgrade guides for the released versions.

Closes #12679

* github.com:scylladb/scylladb:
  doc: fix the service name in the upgrade guide for patch releases versions 2022
  doc: fix the service name in the upgrade guide from 2021.1 to 2022.1

(cherry picked from commit 325246ab2a)
2023-02-17 12:16:52 +02:00
Anna Stuchlik
9ef73d7e36 doc: fixes https://github.com/scylladb/scylladb/issues/12754, document the metric update in 5.2
Closes #12891

(cherry picked from commit bcca706ff5)
2023-02-17 12:16:13 +02:00
Botond Dénes
8700a72b4c Merge 'Backport compaction-backlog-tracker fixes to branch-5.2' from Raphael "Raph" Carvalho
Both patches are important to fix inefficiencies when updating the backlog tracker, which can manifest as a reactor stall, on a special event like schema change.

No conflicts when backporting.

Regression since 1d9f53c881, which is present in branch 5.1 onwards.

Closes #12851

* github.com:scylladb/scylladb:
  compaction: Fix inefficiency when updating LCS backlog tracker
  table: Fix quadratic behavior when inserting sstables into tracker on schema change
2023-02-15 07:22:25 +02:00
Raphael S. Carvalho
886dd3e1d2 compaction: Fix inefficiency when updating LCS backlog tracker
LCS backlog tracker uses STCS tracker for L0. Turns out LCS tracker
is calling STCS tracker's replace_sstables() with empty arguments
even when higher levels (> 0) *only* had sstables replaced.
This unnecessary call to STCS tracker will cause it to recompute
the L0 backlog, yielding the same value as before.

As LCS has a fragment size of 0.16G on higher levels, we may be
updating the tracker multiple times during incremental compaction,
which operates on SSTables on higher levels.

Inefficiency is fixed by only updating the STCS tracker if any
L0 sstable is being added or removed from the table.

This may be fixing a quadratic behavior during boot or refresh,
as new sstables are loaded one by one.
Higher levels have a substantial higher number of sstables,
therefore updating STCS tracker only when level 0 changes, reduces
significantly the number of times L0 backlog is recomputed.

Refs #12499.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #12676

(cherry picked from commit 1b2140e416)
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-02-14 12:14:27 -03:00
Raphael S. Carvalho
f565f3de06 table: Fix quadratic behavior when inserting sstables into tracker on schema change
Each time backlog tracker is informed about a new or old sstable, it
will recompute the static part of backlog which complexity is
proportional to the total number of sstables.
On schema change, we're calling backlog_tracker::replace_sstables()
for each existing sstable, therefore it produces O(N ^ 2) complexity.

Fixes #12499.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #12593

(cherry picked from commit 87ee547120)
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-02-14 12:14:21 -03:00
Anna Stuchlik
76ff6d981c doc: related https://github.com/scylladb/scylladb/issues/12754, add the requirement to upgrade Monitoring to version 4.3
Closes #12784

(cherry picked from commit c7778dd30b)
2023-02-10 10:28:35 +02:00
Botond Dénes
f924f59055 Merge 'Backport test.py improvements to 5.2' from Kamil Braun
Backport the following improvements for test.py efficiency and user experience:
- https://github.com/scylladb/scylladb/pull/12542
- https://github.com/scylladb/scylladb/pull/12560
- https://github.com/scylladb/scylladb/pull/12564
- https://github.com/scylladb/scylladb/pull/12563
- https://github.com/scylladb/scylladb/pull/12588
- https://github.com/scylladb/scylladb/pull/12613
- https://github.com/scylladb/scylladb/pull/12569
- https://github.com/scylladb/scylladb/pull/12612
- https://github.com/scylladb/scylladb/pull/12549
- https://github.com/scylladb/scylladb/pull/12678

Fixes #12617

Closes #12770

* github.com:scylladb/scylladb:
  test/pylib: put UNIX-domain socket in /tmp
  Merge 'test/pylib: scylla_cluster: ensure there's space in the cluster pool when running a sequence of tests' from Kamil Braun
  Merge 'test.py: manual cluster pool handling for Python suite' from Alecco
  Merge 'test.py: handle broken clusters for Python suite' from Alecco
  test/pylib: scylla_cluster: don't leak server if stopping it fails
  Merge 'test/pylib: scylla_cluster: improve server startup check' from Kamil Braun
  test/pylib: scylla_cluster: return error details from test framework endpoints
  test/pylib: scylla_cluster: release cluster IPs when stopping ScyllaClusterManager
  test/pylib: scylla_cluster: mark cluster as dirty if it fails to boot
  test: disable commitlog O_DSYNC, preallocation
2023-02-08 15:09:09 +02:00
Nadav Har'El
d5cef05810 test/pylib: put UNIX-domain socket in /tmp
The "cluster manager" used by the topology test suite uses a UNIX-domain
socket to communicate between the cluster manager and the individual tests.
The socket is currently located in the test directory but there is a
problem: In Linux the length of the path used as a UNIX-domain socket
address is limited to just a little over 100 bytes. In Jenkins run, the
test directory names are very long, and we sometimes go over this length
limit and the result is that test.py fails creating this socket.

In this patch we simply put the socket in /tmp instead of the test
directory. We only need to do this change in one place - the cluster
manager, as it already passes the socket path to the individual tests
(using the "--manager-api" option).

Tested by cloning Scylla in a very long directory name.
A test like ./test.py --mode=dev test_concurrent_schema fails before
this patch, and passes with it.

Fixes #12622

Closes #12678

(cherry picked from commit 681a066923)
2023-02-07 17:12:14 +01:00
Nadav Har'El
e0f4e99e9b Merge 'test/pylib: scylla_cluster: ensure there's space in the cluster pool when running a sequence of tests' from Kamil Braun
`ScyllaClusterManager` is used to run a sequence of test cases from
a single test file. Between two consecutive tests, if the previous test
left the cluster 'dirty', meaning the cluster cannot be reused, it would
free up space in the pool (using `steal`), stop the cluster, then get a
new cluster from the pool.

Between the `steal` and the `get`, a concurrent test run (with its own
instance of `ScyllaClusterManager` would start, because there was free
space in the pool.

This resulted in undesirable behavior when we ran tests with
`--repeat X` for a large `X`: we would start with e.g. 4 concurrent
runs of a test file, because the pool size was 4. As soon as one of the
runs freed up space in the pool, we would start another concurrent run.
Soon we'd end up with 8 concurrent runs. Then 16 concurrent runs. And so
on. We would have a large number of concurrent runs, even though the
original 4 runs didn't finish yet. All of these concurrent runs would
compete waiting on the pool, and waiting for space in the pool would
take longer and longer (the duration is linear w.r.t number of
concurrent competing runs). Tests would then time out because they would
have to wait too long.

Fix that by using the new `replace_dirty` function introduced to the
pool. This function frees up space by returning a dirty cluster and then
immediately takes it away to be used for a new cluster. Thanks to this,
we will only have at most as many concurrent runs as the pool size. For
example with --repeat 8 and pool size 4, we would run 4 concurrent runs
and start the 5th run only when one of the original 4 runs finishes,
then the 6th run when a second run finishes and so on.

The fix is preceded by a refactor that replaces `steal` with `put(is_dirty=True)`
and a `destroy` function passed to the pool (now the pool is responsible
for stopping the cluster and releasing its IPs).

Fixes #11757

Closes #12549

* github.com:scylladb/scylladb:
  test/pylib: scylla_cluster: ensure there's space in the cluster pool when running a sequence of tests
  test/pylib: pool: introduce `replace_dirty`
  test/pylib: pool: replace `steal` with `put(is_dirty=True)`

(cherry picked from commit 132af20057)
2023-02-07 17:08:17 +01:00
Kamil Braun
6795715011 Merge 'test.py: manual cluster pool handling for Python suite' from Alecco
From reviews of https://github.com/scylladb/scylladb/pull/12569, avoid
using `async with` and access the `Pool` of clusters with
`get()`/`put()`.

Closes #12612

* github.com:scylladb/scylladb:
  test.py: manual cluster handling for PythonSuite
  test.py: stop cluster if PythonSuite fails to start
  test.py: minor fix for failed PythonSuite test

(cherry picked from commit 5bc7f0732e)
2023-02-07 17:07:43 +01:00
Nadav Har'El
aa9e91c376 Merge 'test.py: handle broken clusters for Python suite' from Alecco
If the after test check fails (is_after_test_ok is False), discard the cluster and raise exception so context manager (pool) does not recycle it.

Ignore exception re-raised by the context manager.

Fixes #12360

Closes #12569

* github.com:scylladb/scylladb:
  test.py: handle broken clusters for Python suite
  test.py: Pool discard method

(cherry picked from commit 54f174a1f4)
2023-02-07 17:07:36 +01:00
Kamil Braun
ddfb9ebab2 test/pylib: scylla_cluster: don't leak server if stopping it fails
`ScyllaCluster.server_stop` had this piece of code:
```
        server = self.running.pop(server_id)
        if gracefully:
            await server.stop_gracefully()
        else:
            await server.stop()
        self.stopped[server_id] = server
```

We observed `stop_gracefully()` failing due to a server hanging during
shutdown. We then ended up in a state where neither `self.running` nor
`self.stopped` had this server. Later, when releasing the cluster and
its IPs, we would release that server's IP - but the server might have
still been running (all servers in `self.running` are killed before
releasing IPs, but this one wasn't in `self.running`).

Fix this by popping the server from `self.running` only after
`stop_gracefully`/`stop` finishes.

Make an analogous fix in `server_start`: put `server` into
`self.running` *before* we actually start it. If the start fails, the
server will be considered "running" even though it isn't necessarily,
but that is OK - if it isn't running, then trying to stop it later will
simply do nothing; if it is actually running, we will kill it (which we
should do) when clearing after the cluster; and we don't leak it.

Closes #12613

(cherry picked from commit a0ff33e777)
2023-02-07 17:05:20 +01:00
Nadav Har'El
d58a3e4d16 Merge 'test/pylib: scylla_cluster: improve server startup check' from Kamil Braun
Don't use a range scan, which is very inefficient, to perform a query for checking CQL availability.

Improve logging when waiting for server startup times out. Provide details about the failure: whether we managed to obtain the Host ID of the server and whether we managed to establish a CQL connection.

Closes #12588

* github.com:scylladb/scylladb:
  test/pylib: scylla_cluster: better logging for timeout on server startup
  test/pylib: scylla_cluster: use less expensive query to check for CQL availability

(cherry picked from commit ccc2c6b5dd)
2023-02-07 17:05:02 +01:00
Kamil Braun
2ebac52d2d test/pylib: scylla_cluster: return error details from test framework endpoints
If an endpoint handler throws an exception, the details of the exception
are not returned to the client. Normally this is desirable so that
information is not leaked, but in this test framework we do want to
return the details to the client so it can log a useful error message.

Do it by wrapping every handler into a catch clause that returns
the exception message.

Also modify a bit how HTTPErrors are rendered so it's easier to discern
the actual body of the error from other details (such as the params used
to make the request etc.)

Before:
```
E test.pylib.rest_client.HTTPError: HTTP error 500: 500 Internal Server Error
E
E Server got itself in trouble, params None, json None, uri http+unix://api/cluster/before-test/test_stuff
```

After:
```
E test.pylib.rest_client.HTTPError: HTTP error 500, uri: http+unix://api/cluster/before-test/test_stuff, params: None, json: None, body:
E Failed to start server at host 127.155.129.1.
E Check the log files:
E /home/kbraun/dev/scylladb/testlog/test.py.dev.log
E /home/kbraun/dev/scylladb/testlog/dev/scylla-1.log
```

Closes #12563

(cherry picked from commit 2f84e820fd)
2023-02-07 17:04:37 +01:00
Kamil Braun
b536614913 test/pylib: scylla_cluster: release cluster IPs when stopping ScyllaClusterManager
When we obtained a new cluster for a test case after the previous test
case left a dirty cluster, we would release the old cluster's used IP
addresses (`_before_test` function). However, we would not release the
last cluster's IP after the last test case. We would run out of IPs with
sufficiently many test files or `--repeat` runs. Fix this.

Also reorder the operations a bit: stop the cluster (and release its
IPs) before freeing up space in the cluster pool (i.e. call
`self.cluster.stop()` before `self.clusters.steal()`). This reduces
concurrency a bit - fewer Scyllas running at the same time, which is
good (the pool size gives a limit on the desired max number of
concurrently running clusters). Killing a cluster is quick so it won't
make a significant difference for the next guy waiting on the pool.

Closes #12564

(cherry picked from commit 3ed3966f13)
2023-02-07 17:04:19 +01:00
Kamil Braun
85df0fd2b1 test/pylib: scylla_cluster: mark cluster as dirty if it fails to boot
If a cluster fails to boot, it saves the exception in
`self.start_exception` variable; the exception will be rethrown when
a test tries to start using this cluster. As explained in `before_test`:
```
    def before_test(self, name) -> None:
        """Check that  the cluster is ready for a test. If
        there was a start error, throw it here - the server is
        running when it's added to the pool, which can't be attributed
        to any specific test, throwing it here would stop a specific
        test."""
```
It's arguable whether we should blame some random test for a failure
that it didn't cause, but nevertheless, there's a problem here: the
`start_exception` will be rethrown and the test will fail, but then the
cluster will be simply returned to the pool and the next test will
attempt to use it... and so on.

Prevent this by marking the cluster as dirty the first time we rethrow
the exception.

Closes #12560

(cherry picked from commit 147dd73996)
2023-02-07 17:03:56 +01:00
Avi Kivity
cdf9fe7023 test: disable commitlog O_DSYNC, preallocation
Commitlog O_DSYNC is intended to make Raft and schema writes durable
in the face of power loss. To make O_DSYNC performant, we preallocate
the commitlog segments, so that the commitlog writes only change file
data and not file metadata (which would require the filesystem to commit
its own log).

However, in tests, this causes each ScyllaDB instance to write 384MB
of commitlog segments. This overloads the disks and slows everything
down.

Fix this by disabling O_DSYNC (and therefore preallocation) during
the tests. They can't survive power loss, and run with
--unsafe-bypass-fsync anyway.

Closes #12542

(cherry picked from commit 9029b8dead)
2023-02-07 17:02:59 +01:00
Beni Peled
8ff4717fd0 release: prepare for 5.2.0-rc1 2023-02-06 22:13:53 +02:00
Kamil Braun
291b1f6e7f service/raft: raft_group0: prevent double abort
There was a small chance that we called `timeout_src.request_abort()`
twice in the `with_timeout` function, first by timeout and then by
shutdown. `abort_source` fails on an assertion in this case. Fix this.

Fixes: #12512

Closes #12514

(cherry picked from commit 54170749b8)
2023-02-05 18:31:50 +02:00
Kefu Chai
b2699743cc db: system_keyspace: take the reserved_memory into account
before this change, we returns the total memory managed by Seastar
in the "total" field in system.memory. but this value only reflect
the total memory managed by Seastar's allocator. if
`reserve_additional_memory` is set when starting app_template,
Seastar's memory subsystem just reserves a chunk of memory of this
specified size for system, and takes the remaining memory. since
f05d612da8, we set this value to 50MB for wasmtime runtime. hence
the test of `TestRuntimeInfoTable.test_default_content` in dtest
fails. the test expects the size passed via the option of
`--memory` to be identical to the value reported by system.memory's
"total" field.

after this change, the "total" field takes the reserved memory
for wasm udf into account. the "total" field should reflect the total
size of memory used by Scylla, no matter how we use a certain portion
of the allocated memory.

Fixes #12522
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12573

(cherry picked from commit 4a0134a097)
2023-02-05 18:30:05 +02:00
Botond Dénes
50ae73a4bd types: is_tuple(): handle reverse types
Currently reverse types match the default case (false), even though they
might be wrapping a tuple type. One user-visible effect of this is that
a schema, which has a reversed<frozen<UDT>> clustering key component,
will have this component incorrectly represented in the schema cql dump:
the UDT will loose the frozen attribute. When attempting to recreate
this schema based on the dump, it will fail as the only frozen UDTs are
allowed in primary key components.

Fixes: #12576

Closes #12579

(cherry picked from commit ebc100f74f)
2023-02-05 18:20:21 +02:00
Calle Wilund
c3dd4a2b87 alterator::streams: Sort tables in list_streams to ensure no duplicates
Fixes #12601 (maybe?)

Sort the set of tables on ID. This should ensure we never
generate duplicates in a paged listing here. Can obviously miss things if they
are added between paged calls and end up with a "smaller" UUID/ARN, but that
is to be expected.

(cherry picked from commit da8adb4d26)
2023-02-05 17:44:00 +02:00
Benny Halevy
0f9fe61d91 view: row_lock: lock_ck: find or construct row_lock under partition lock
Since we're potentially searching the row_lock in parallel to acquiring
the read_lock on the partition, we're racing with row_locker::unlock
that may erase the _row_locks entry for the same clustering key, since
there is no lock to protect it up until the partition lock has been
acquired and the lock_partition future is resolved.

This change moves the code to search for or allocate the row lock
_after_ the partition lock has been acquired to make sure we're
synchronously starting the read/write lock function on it, without
yielding, to prevent this use-after-free.

This adds an allocation for copying the clustering key in advance
even if a row_lock entry already exists, that wasn't needed before.
It only us slows down (a bit) when there is contention and the lock
already existed when we want to go locking. In the fast path there
is no contention and then the code already had to create the lock
and copy the key. In any case, the penalty of copying the key once
is tiny compared to the rest of the work that view updates are doing.

This is required on top of 5007ded2c1 as
seen in https://github.com/scylladb/scylladb/issues/12632
which is closely related to #12168 but demonstrates a different race
causing use-after-free.

Fixes #12632

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 4b5e324ecb)
2023-02-05 17:22:31 +02:00
Anna Stuchlik
59d30ff241 docs: fixes https://github.com/scylladb/scylladb/issues/12654, update the links to the Download Center
Closes #12655

(cherry picked from commit 64cc4c8515)
2023-02-05 17:19:56 +02:00
Anna Stuchlik
fb82dff89e doc: fixes https://github.com/scylladb/scylladb/issues/12672, fix the redirects to the Cloud docs
Closes #12673

(cherry picked from commit 2be131da83)
2023-02-05 17:17:35 +02:00
Kefu Chai
b588b19620 cql3/selection: construct string_view using char* not size
before this change, we construct a sstring from a comma statement,
which evaluates to the return value of `name.size()`, but what we
expect is `sstring(const char*, size_t)`.

in this change

* instead of passing the size of the string_view,
  both its address and size are used
* `std::string_view` is constructed instead of sstring, for better
  performance, as we don't need to perform a deep copy

the issue is reported by GCC-13:

```
In file included from cql3/selection/selectable.cc:11:
cql3/selection/field_selector.hh:83:60: error: ignoring return value of function declared with 'nodiscard' attribute [-Werror,-Wunused-result]
        auto sname = sstring(reinterpret_cast<const char*>(name.begin(), name.size()));
                                                           ^~~~~~~~~~
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12666

(cherry picked from commit 186ceea009)

Fixes #12739.
2023-02-05 13:50:48 +02:00
Michał Chojnowski
608ef92a71 commitlog: fix total_size_on_disk accounting after segment file removal
Currently, segment file removal first calls `f.remove_file()` and
does `total_size_on_disk -= f.known_size()` later.
However, `remove_file()` resets `known_size` to 0, so in effect
the freed space in not accounted for.

`total_size_on_disk` is not just a metric. It is also responsible
for deciding whether a segment should be recycled -- it is recycled
only if `total_size_on_disk - known_size < max_disk_size`.
Therefore this bug has dire performance consequences:
if `total_size_on_disk - known_size` ever exceeds `max_disk_size`,
the recycling of commitlog segments will stop permanently, because
`total_size_on_disk - known_size` will never go back below
`max_disk_size` due to the accounting bug. All new segments from this
point will be allocated from scratch.

The bug was uncovered by a QA performance test. It isn't easy to trigger --
it took the test 7 hours of constant high load to step into it.
However, the fact that the effect is permanent, and degrades the
performance of the cluster silently, makes the bug potentially quite severe.

The bug can be easily spotted with Prometheus as infinitely rising
`commitlog_total_size_on_disk` on the affected shards.

Fixes #12645

Closes #12646

(cherry picked from commit fa7e904cd6)
2023-02-01 21:54:37 +02:00
Kamil Braun
d2732b2663 Merge 'Enable Raft by default in new clusters' from Kamil Braun
New clusters that use a fresh conf/scylla.yaml will have `consistent_cluster_management: true`, which will enable Raft, unless the user explicitly turns it off before booting the cluster.

People using existing yaml files will continue without Raft, unless consistent_cluster_management is explicitly requested during/after upgrade.

Also update the docs: cluster creation and node addition procedures.

Fixes #12572.

Closes #12585

* github.com:scylladb/scylladb:
  docs: mention `consistent_cluster_management` for creating cluster and adding node procedures
  conf: enable `consistent_cluster_management` by default

(cherry picked from commit 5c886e59de)
2023-01-26 12:21:55 +01:00
Anna Mikhlin
34ab98e1be release: prepare for 5.2.0-rc0 2023-01-18 14:54:36 +02:00
295 changed files with 5744 additions and 6005 deletions

2
.gitmodules vendored
View File

@@ -1,6 +1,6 @@
[submodule "seastar"]
path = seastar
url = ../seastar
url = ../scylla-seastar
ignore = dirty
[submodule "swagger-ui"]
path = swagger-ui

View File

@@ -72,7 +72,7 @@ fi
# Default scylla product/version tags
PRODUCT=scylla
VERSION=5.2.0-dev
VERSION=5.2.6
if test -f version
then

View File

@@ -88,17 +88,20 @@ json::json_return_type make_streamed(rjson::value&& value) {
// move objects to coroutine frame.
auto los = std::move(os);
auto lrs = std::move(rs);
std::exception_ptr ex;
try {
co_await rjson::print(*lrs, los);
co_await los.flush();
co_await los.close();
} catch (...) {
// at this point, we cannot really do anything. HTTP headers and return code are
// already written, and quite potentially a portion of the content data.
// just log + rethrow. It is probably better the HTTP server closes connection
// abruptly or something...
elogger.error("Unhandled exception in data streaming: {}", std::current_exception());
throw;
ex = std::current_exception();
elogger.error("Exception during streaming HTTP response: {}", ex);
}
co_await los.close();
if (ex) {
co_await coroutine::return_exception_ptr(std::move(ex));
}
co_return;
};
@@ -2358,21 +2361,22 @@ std::optional<rjson::value> executor::describe_single_item(schema_ptr schema,
return item;
}
std::vector<rjson::value> executor::describe_multi_item(schema_ptr schema,
const query::partition_slice& slice,
const cql3::selection::selection& selection,
const query::result& query_result,
const std::optional<attrs_to_get>& attrs_to_get) {
cql3::selection::result_set_builder builder(selection, gc_clock::now());
query::result_view::consume(query_result, slice, cql3::selection::result_set_builder::visitor(builder, *schema, selection));
future<std::vector<rjson::value>> executor::describe_multi_item(schema_ptr schema,
const query::partition_slice&& slice,
shared_ptr<cql3::selection::selection> selection,
foreign_ptr<lw_shared_ptr<query::result>> query_result,
shared_ptr<const std::optional<attrs_to_get>> attrs_to_get) {
cql3::selection::result_set_builder builder(*selection, gc_clock::now());
query::result_view::consume(*query_result, slice, cql3::selection::result_set_builder::visitor(builder, *schema, *selection));
auto result_set = builder.build();
std::vector<rjson::value> ret;
for (auto& result_row : result_set->rows()) {
rjson::value item = rjson::empty_object();
describe_single_item(selection, result_row, attrs_to_get, item);
describe_single_item(*selection, result_row, *attrs_to_get, item);
ret.push_back(std::move(item));
co_await coroutine::maybe_yield();
}
return ret;
co_return ret;
}
static bool check_needs_read_before_write(const parsed::value& v) {
@@ -3254,8 +3258,7 @@ future<executor::request_return_type> executor::batch_get_item(client_state& cli
service::storage_proxy::coordinator_query_options(executor::default_timeout(), permit, client_state, trace_state)).then(
[schema = rs.schema, partition_slice = std::move(partition_slice), selection = std::move(selection), attrs_to_get = rs.attrs_to_get] (service::storage_proxy::coordinator_query_result qr) mutable {
utils::get_local_injector().inject("alternator_batch_get_item", [] { throw std::runtime_error("batch_get_item injection"); });
std::vector<rjson::value> jsons = describe_multi_item(schema, partition_slice, *selection, *qr.query_result, *attrs_to_get);
return make_ready_future<std::vector<rjson::value>>(std::move(jsons));
return describe_multi_item(std::move(schema), std::move(partition_slice), std::move(selection), std::move(qr.query_result), std::move(attrs_to_get));
});
response_futures.push_back(std::move(f));
}

View File

@@ -222,11 +222,11 @@ public:
const query::result&,
const std::optional<attrs_to_get>&);
static std::vector<rjson::value> describe_multi_item(schema_ptr schema,
const query::partition_slice& slice,
const cql3::selection::selection& selection,
const query::result& query_result,
const std::optional<attrs_to_get>& attrs_to_get);
static future<std::vector<rjson::value>> describe_multi_item(schema_ptr schema,
const query::partition_slice&& slice,
shared_ptr<cql3::selection::selection> selection,
foreign_ptr<lw_shared_ptr<query::result>> query_result,
shared_ptr<const std::optional<attrs_to_get>> attrs_to_get);
static void describe_single_item(const cql3::selection::selection&,
const std::vector<bytes_opt>&,

View File

@@ -145,19 +145,24 @@ future<alternator::executor::request_return_type> alternator::executor::list_str
auto table = find_table(_proxy, request);
auto db = _proxy.data_dictionary();
auto cfs = db.get_tables();
auto i = cfs.begin();
auto e = cfs.end();
if (limit < 1) {
throw api_error::validation("Limit must be 1 or more");
}
// TODO: the unordered_map here is not really well suited for partial
// querying - we're sorting on local hash order, and creating a table
// between queries may or may not miss info. But that should be rare,
// and we can probably expect this to be a single call.
// # 12601 (maybe?) - sort the set of tables on ID. This should ensure we never
// generate duplicates in a paged listing here. Can obviously miss things if they
// are added between paged calls and end up with a "smaller" UUID/ARN, but that
// is to be expected.
std::sort(cfs.begin(), cfs.end(), [](const data_dictionary::table& t1, const data_dictionary::table& t2) {
return t1.schema()->id().uuid() < t2.schema()->id().uuid();
});
auto i = cfs.begin();
auto e = cfs.end();
if (streams_start) {
i = std::find_if(i, e, [&](data_dictionary::table t) {
i = std::find_if(i, e, [&](const data_dictionary::table& t) {
return t.schema()->id().uuid() == streams_start
&& cdc::get_base_table(db.real_database(), *t.schema())
&& is_alternator_keyspace(t.schema()->ks_name())

View File

@@ -66,36 +66,48 @@ atomic_cell::atomic_cell(const abstract_type& type, atomic_cell_view other)
set_view(_data);
}
// Based on:
// - org.apache.cassandra.db.AbstractCell#reconcile()
// - org.apache.cassandra.db.BufferExpiringCell#reconcile()
// - org.apache.cassandra.db.BufferDeletedCell#reconcile()
// Based on Cassandra's resolveRegular function:
// - https://github.com/apache/cassandra/blob/e4f31b73c21b04966269c5ac2d3bd2562e5f6c63/src/java/org/apache/cassandra/db/rows/Cells.java#L79-L119
//
// Note: the ordering algorithm for cell is the same as for rows,
// except that the cell value is used to break a tie in case all other attributes are equal.
// See compare_row_marker_for_merge.
std::strong_ordering
compare_atomic_cell_for_merge(atomic_cell_view left, atomic_cell_view right) {
// Largest write timestamp wins.
if (left.timestamp() != right.timestamp()) {
return left.timestamp() <=> right.timestamp();
}
// Tombstones always win reconciliation with live cells of the same timestamp
if (left.is_live() != right.is_live()) {
return left.is_live() ? std::strong_ordering::less : std::strong_ordering::greater;
}
if (left.is_live()) {
auto c = compare_unsigned(left.value(), right.value()) <=> 0;
if (c != 0) {
return c;
}
// Prefer expiring cells (which will become tombstones at some future date) over live cells.
// See https://issues.apache.org/jira/browse/CASSANDRA-14592
if (left.is_live_and_has_ttl() != right.is_live_and_has_ttl()) {
// prefer expiring cells.
return left.is_live_and_has_ttl() ? std::strong_ordering::greater : std::strong_ordering::less;
}
// If both are expiring, choose the cell with the latest expiry or derived write time.
if (left.is_live_and_has_ttl()) {
// Prefer cell with latest expiry
if (left.expiry() != right.expiry()) {
return left.expiry() <=> right.expiry();
} else {
// prefer the cell that was written later,
// so it survives longer after it expires, until purged.
} else if (right.ttl() != left.ttl()) {
// The cell write time is derived by (expiry - ttl).
// Prefer the cell that was written later,
// so it survives longer after it expires, until purged,
// as it become purgeable gc_grace_seconds after it was written.
//
// Note that this is an extension to Cassandra's algorithm
// which stops at the expiration time, and if equal,
// move forward to compare the cell values.
return right.ttl() <=> left.ttl();
}
}
// The cell with the largest value wins, if all other attributes of the cells are identical.
// This is quite arbitrary, but still required to break the tie in a deterministic way.
return compare_unsigned(left.value(), right.value());
} else {
// Both are deleted

View File

@@ -55,6 +55,7 @@ future<bool> default_role_row_satisfies(
return qp.execute_internal(
query,
db::consistency_level::ONE,
internal_distributed_query_state(),
{meta::DEFAULT_SUPERUSER_NAME},
cql3::query_processor::cache_internal::yes).then([&qp, &p](::shared_ptr<cql3::untyped_result_set> results) {
if (results->empty()) {

View File

@@ -414,9 +414,12 @@ private:
class formatted_sstables_list {
bool _include_origin = true;
std::vector<sstring> _ssts;
std::vector<std::string> _ssts;
public:
formatted_sstables_list() = default;
void reserve(size_t n) {
_ssts.reserve(n);
}
explicit formatted_sstables_list(const std::vector<shared_sstable>& ssts, bool include_origin) : _include_origin(include_origin) {
_ssts.reserve(ssts.size());
for (const auto& sst : ssts) {
@@ -435,9 +438,7 @@ public:
};
std::ostream& operator<<(std::ostream& os, const formatted_sstables_list& lst) {
os << "[";
os << boost::algorithm::join(lst._ssts, ",");
os << "]";
fmt::print(os, "[{}]", fmt::join(lst._ssts, ","));
return os;
}
@@ -641,6 +642,7 @@ private:
future<> setup() {
auto ssts = make_lw_shared<sstables::sstable_set>(make_sstable_set_for_input());
formatted_sstables_list formatted_msg;
formatted_msg.reserve(_sstables.size());
auto fully_expired = _table_s.fully_expired_sstables(_sstables, gc_clock::now());
min_max_tracker<api::timestamp_type> timestamp_tracker;

View File

@@ -647,6 +647,7 @@ sstables::compaction_stopped_exception compaction_manager::task::make_compaction
compaction_manager::compaction_manager(config cfg, abort_source& as)
: _cfg(std::move(cfg))
, _compaction_submission_timer(compaction_sg().cpu, compaction_submission_callback())
, _compaction_controller(make_compaction_controller(compaction_sg(), static_shares(), [this] () -> float {
_last_backlog = backlog();
auto b = _last_backlog / available_memory();
@@ -681,6 +682,7 @@ compaction_manager::compaction_manager(config cfg, abort_source& as)
compaction_manager::compaction_manager()
: _cfg(config{ .available_memory = 1 })
, _compaction_submission_timer(compaction_sg().cpu, compaction_submission_callback())
, _compaction_controller(make_compaction_controller(compaction_sg(), 1, [] () -> float { return 1.0; }))
, _backlog_manager(_compaction_controller)
, _throughput_updater(serialized_action([this] { return update_throughput(throughput_mbs()); }))
@@ -738,7 +740,7 @@ void compaction_manager::register_metrics() {
void compaction_manager::enable() {
assert(_state == state::none || _state == state::disabled);
_state = state::enabled;
_compaction_submission_timer.arm(periodic_compaction_submission_interval());
_compaction_submission_timer.arm_periodic(periodic_compaction_submission_interval());
_waiting_reevalution = postponed_compactions_reevaluation();
}
@@ -1444,14 +1446,18 @@ protected:
co_return std::nullopt;
}
private:
// Releases reference to cleaned files such that respective used disk space can be freed.
void release_exhausted(std::vector<sstables::shared_sstable> exhausted_sstables) {
_compacting.release_compacting(exhausted_sstables);
}
future<> run_cleanup_job(sstables::compaction_descriptor descriptor) {
co_await coroutine::switch_to(_cm.compaction_sg().cpu);
// Releases reference to cleaned files such that respective used disk space can be freed.
auto release_exhausted = [this, &descriptor] (std::vector<sstables::shared_sstable> exhausted_sstables) mutable {
auto exhausted = boost::copy_range<std::unordered_set<sstables::shared_sstable>>(exhausted_sstables);
std::erase_if(descriptor.sstables, [&] (const sstables::shared_sstable& sst) {
return exhausted.contains(sst);
});
_compacting.release_compacting(exhausted_sstables);
};
for (;;) {
compaction_backlog_tracker user_initiated(std::make_unique<user_initiated_backlog_tracker>(_cm._compaction_controller.backlog_of_shares(200), _cm.available_memory()));
_cm.register_backlog_tracker(user_initiated);
@@ -1459,8 +1465,7 @@ private:
std::exception_ptr ex;
try {
setup_new_compaction(descriptor.run_identifier);
co_await compact_sstables_and_update_history(descriptor, _compaction_data,
std::bind(&cleanup_sstables_compaction_task::release_exhausted, this, std::placeholders::_1));
co_await compact_sstables_and_update_history(descriptor, _compaction_data, release_exhausted);
finish_compaction();
_cm.reevaluate_postponed_compactions();
co_return; // done with current job

View File

@@ -296,10 +296,10 @@ private:
std::function<void()> compaction_submission_callback();
// all registered tables are reevaluated at a constant interval.
// Submission is a NO-OP when there's nothing to do, so it's fine to call it regularly.
timer<lowres_clock> _compaction_submission_timer = timer<lowres_clock>(compaction_submission_callback());
static constexpr std::chrono::seconds periodic_compaction_submission_interval() { return std::chrono::seconds(3600); }
config _cfg;
timer<lowres_clock> _compaction_submission_timer;
compaction_controller _compaction_controller;
compaction_backlog_manager _backlog_manager;
optimized_optional<abort_source::subscription> _early_abort_subscription;

View File

@@ -409,7 +409,9 @@ public:
l0_old_ssts.push_back(std::move(sst));
}
}
_l0_scts.replace_sstables(std::move(l0_old_ssts), std::move(l0_new_ssts));
if (l0_old_ssts.size() || l0_new_ssts.size()) {
_l0_scts.replace_sstables(std::move(l0_old_ssts), std::move(l0_new_ssts));
}
}
};

View File

@@ -553,4 +553,16 @@ murmur3_partitioner_ignore_msb_bits: 12
# WARNING: It's unsafe to set this to false if the node previously booted
# with the schema commit log enabled. In such case, some schema changes
# may be lost if the node was not cleanly stopped.
force_schema_commit_log: true
force_schema_commit_log: true
# Use Raft to consistently manage schema information in the cluster.
# Refer to https://docs.scylladb.com/master/architecture/raft.html for more details.
# The 'Handling Failures' section is especially important.
#
# Once enabled in a cluster, this cannot be turned off.
# If you want to bootstrap a new cluster without Raft, make sure to set this to `false`
# before starting your nodes for the first time.
#
# A cluster not using Raft can be 'upgraded' to use Raft. Refer to the aforementioned
# documentation, section 'Enabling Raft in ScyllaDB 5.2 and further', for the procedure.
consistent_cluster_management: true

View File

@@ -10,6 +10,7 @@
#include "cql3/attributes.hh"
#include "cql3/column_identifier.hh"
#include <optional>
namespace cql3 {
@@ -55,9 +56,9 @@ int64_t attributes::get_timestamp(int64_t now, const query_options& options) {
}
}
int32_t attributes::get_time_to_live(const query_options& options) {
std::optional<int32_t> attributes::get_time_to_live(const query_options& options) {
if (!_time_to_live.has_value() || _time_to_live_unset_guard.is_unset(options))
return 0;
return std::nullopt;
cql3::raw_value tval = expr::evaluate(*_time_to_live, options);
if (tval.is_null()) {

View File

@@ -45,7 +45,7 @@ public:
int64_t get_timestamp(int64_t now, const query_options& options);
int32_t get_time_to_live(const query_options& options);
std::optional<int32_t> get_time_to_live(const query_options& options);
db::timeout_clock::duration get_timeout(const query_options& options) const;

View File

@@ -1416,7 +1416,7 @@ expression search_and_replace(const expression& e,
};
},
[&] (const binary_operator& oper) -> expression {
return binary_operator(recurse(oper.lhs), oper.op, recurse(oper.rhs));
return binary_operator(recurse(oper.lhs), oper.op, recurse(oper.rhs), oper.order);
},
[&] (const column_mutation_attribute& cma) -> expression {
return column_mutation_attribute{cma.kind, recurse(cma.column)};

View File

@@ -13,6 +13,7 @@
#include "cql3/lists.hh"
#include "cql3/constants.hh"
#include "cql3/user_types.hh"
#include "cql3/ut_name.hh"
#include "cql3/type_json.hh"
#include "cql3/functions/user_function.hh"
#include "cql3/functions/user_aggregate.hh"
@@ -52,6 +53,13 @@ bool abstract_function::requires_thread() const { return false; }
bool as_json_function::requires_thread() const { return false; }
static bool same_signature(const shared_ptr<function>& f1, const shared_ptr<function>& f2) {
if (f1 == nullptr || f2 == nullptr) {
return false;
}
return f1->name() == f2->name() && f1->arg_types() == f2->arg_types();
}
thread_local std::unordered_multimap<function_name, shared_ptr<function>> functions::_declared = init();
void functions::clear_functions() noexcept {
@@ -143,22 +151,56 @@ void functions::replace_function(shared_ptr<function> func) {
with_udf_iter(func->name(), func->arg_types(), [func] (functions::declared_t::iterator i) {
i->second = std::move(func);
});
auto scalar_func = dynamic_pointer_cast<scalar_function>(func);
if (!scalar_func) {
return;
}
for (auto& fit : _declared) {
auto aggregate = dynamic_pointer_cast<user_aggregate>(fit.second);
if (aggregate && (same_signature(aggregate->sfunc(), scalar_func)
|| (same_signature(aggregate->finalfunc(), scalar_func))
|| (same_signature(aggregate->reducefunc(), scalar_func))))
{
// we need to replace at least one underlying function
shared_ptr<scalar_function> sfunc = same_signature(aggregate->sfunc(), scalar_func) ? scalar_func : aggregate->sfunc();
shared_ptr<scalar_function> finalfunc = same_signature(aggregate->finalfunc(), scalar_func) ? scalar_func : aggregate->finalfunc();
shared_ptr<scalar_function> reducefunc = same_signature(aggregate->reducefunc(), scalar_func) ? scalar_func : aggregate->reducefunc();
fit.second = ::make_shared<user_aggregate>(aggregate->name(), aggregate->initcond(), sfunc, reducefunc, finalfunc);
}
}
}
void functions::remove_function(const function_name& name, const std::vector<data_type>& arg_types) {
with_udf_iter(name, arg_types, [] (functions::declared_t::iterator i) { _declared.erase(i); });
}
std::optional<function_name> functions::used_by_user_aggregate(const function_name& name) {
std::optional<function_name> functions::used_by_user_aggregate(shared_ptr<user_function> func) {
for (const shared_ptr<function>& fptr : _declared | boost::adaptors::map_values) {
auto aggregate = dynamic_pointer_cast<user_aggregate>(fptr);
if (aggregate && (aggregate->sfunc().name() == name || (aggregate->has_finalfunc() && aggregate->finalfunc().name() == name))) {
if (aggregate && (same_signature(aggregate->sfunc(), func)
|| (same_signature(aggregate->finalfunc(), func))
|| (same_signature(aggregate->reducefunc(), func))))
{
return aggregate->name();
}
}
return {};
}
std::optional<function_name> functions::used_by_user_function(const ut_name& user_type) {
for (const shared_ptr<function>& fptr : _declared | boost::adaptors::map_values) {
for (auto& arg_type : fptr->arg_types()) {
if (arg_type->references_user_type(user_type.get_keyspace(), user_type.get_user_type_name())) {
return fptr->name();
}
}
if (fptr->return_type()->references_user_type(user_type.get_keyspace(), user_type.get_user_type_name())) {
return fptr->name();
}
}
return {};
}
lw_shared_ptr<column_specification>
functions::make_arg_spec(const sstring& receiver_ks, const sstring& receiver_cf,
const function& fun, size_t i) {

View File

@@ -71,7 +71,8 @@ public:
static void add_function(shared_ptr<function>);
static void replace_function(shared_ptr<function>);
static void remove_function(const function_name& name, const std::vector<data_type>& arg_types);
static std::optional<function_name> used_by_user_aggregate(const function_name& name);
static std::optional<function_name> used_by_user_aggregate(shared_ptr<user_function>);
static std::optional<function_name> used_by_user_function(const ut_name& user_type);
private:
template <typename F>
static void with_udf_iter(const function_name& name, const std::vector<data_type>& arg_types, F&& f);

View File

@@ -37,14 +37,14 @@ public:
virtual sstring element_type() const override { return "aggregate"; }
virtual std::ostream& describe(std::ostream& os) const override;
const scalar_function& sfunc() const {
return *_sfunc;
seastar::shared_ptr<scalar_function> sfunc() const {
return _sfunc;
}
const scalar_function& reducefunc() const {
return *_reducefunc;
seastar::shared_ptr<scalar_function> reducefunc() const {
return _reducefunc;
}
const scalar_function& finalfunc() const {
return *_finalfunc;
seastar::shared_ptr<scalar_function> finalfunc() const {
return _finalfunc;
}
const bytes_opt& initcond() const {
return _initcond;

View File

@@ -135,12 +135,21 @@ void query_options::prepare(const std::vector<lw_shared_ptr<column_specification
ordered_values.reserve(specs.size());
for (auto&& spec : specs) {
auto& spec_name = spec->name->text();
bool found_value_for_name = false;
for (size_t j = 0; j < names.size(); j++) {
if (names[j] == spec_name) {
ordered_values.emplace_back(_value_views[j]);
found_value_for_name = true;
break;
}
}
// No bound value was found with the name `spec_name`.
// This means that the user forgot to include a bound value with such name.
if (!found_value_for_name) {
throw exceptions::invalid_request_exception(
format("Missing value for bind marker with name: {}", spec_name));
}
}
_value_views = std::move(ordered_values);
}

View File

@@ -22,6 +22,7 @@
#include "db/config.hh"
#include "data_dictionary/data_dictionary.hh"
#include "hashers.hh"
#include "utils/error_injection.hh"
namespace cql3 {
@@ -600,6 +601,14 @@ query_processor::get_statement(const sstring_view& query, const service::client_
std::unique_ptr<raw::parsed_statement>
query_processor::parse_statement(const sstring_view& query) {
try {
{
const char* error_injection_key = "query_processor-parse_statement-test_failure";
utils::get_local_injector().inject(error_injection_key, [&]() {
if (query.find(error_injection_key) != sstring_view::npos) {
throw std::runtime_error(error_injection_key);
}
});
}
auto statement = util::do_with_parser(query, std::mem_fn(&cql3_parser::CqlParser::query));
if (!statement) {
throw exceptions::syntax_exception("Parsing failed");

View File

@@ -80,7 +80,7 @@ public:
virtual sstring assignment_testable_source_context() const override {
auto&& name = _type->field_name(_field);
auto sname = sstring(reinterpret_cast<const char*>(name.begin(), name.size()));
auto sname = std::string_view(reinterpret_cast<const char*>(name.data()), name.size());
return format("{}.{}", _selected, sname);
}

View File

@@ -35,7 +35,7 @@ drop_function_statement::prepare_schema_mutations(query_processor& qp, api::time
if (!user_func) {
throw exceptions::invalid_request_exception(format("'{}' is not a user defined function", func));
}
if (auto aggregate = functions::functions::used_by_user_aggregate(user_func->name()); bool(aggregate)) {
if (auto aggregate = functions::functions::used_by_user_aggregate(user_func)) {
throw exceptions::invalid_request_exception(format("Cannot delete function {}, as it is used by user-defined aggregate {}", func, *aggregate));
}
m = co_await qp.get_migration_manager().prepare_function_drop_announcement(user_func, ts);

View File

@@ -10,6 +10,7 @@
#include "cql3/statements/drop_type_statement.hh"
#include "cql3/statements/prepared_statement.hh"
#include "cql3/query_processor.hh"
#include "cql3/functions/functions.hh"
#include "boost/range/adaptor/map.hpp"
@@ -109,6 +110,9 @@ void drop_type_statement::validate_while_executing(query_processor& qp) const {
}
}
if (auto&& fun_name = functions::functions::used_by_user_function(_name)) {
throw exceptions::invalid_request_exception(format("Cannot drop user type {}.{} as it is still used by function {}", keyspace, type->get_name_as_string(), *fun_name));
}
} catch (data_dictionary::no_such_keyspace& e) {
throw exceptions::invalid_request_exception(format("Cannot drop type in unknown keyspace {}", keyspace()));
}

View File

@@ -17,6 +17,7 @@
#include "cql3/util.hh"
#include "validation.hh"
#include "db/consistency_level_validations.hh"
#include <optional>
#include <seastar/core/shared_ptr.hh>
#include <boost/range/adaptor/transformed.hpp>
#include <boost/range/adaptor/map.hpp>
@@ -95,8 +96,9 @@ bool modification_statement::is_timestamp_set() const {
return attrs->is_timestamp_set();
}
gc_clock::duration modification_statement::get_time_to_live(const query_options& options) const {
return gc_clock::duration(attrs->get_time_to_live(options));
std::optional<gc_clock::duration> modification_statement::get_time_to_live(const query_options& options) const {
std::optional<int32_t> ttl = attrs->get_time_to_live(options);
return ttl ? std::make_optional<gc_clock::duration>(*ttl) : std::nullopt;
}
future<> modification_statement::check_access(query_processor& qp, const service::client_state& state) const {

View File

@@ -130,7 +130,7 @@ public:
bool is_timestamp_set() const;
gc_clock::duration get_time_to_live(const query_options& options) const;
std::optional<gc_clock::duration> get_time_to_live(const query_options& options) const;
virtual future<> check_access(query_processor& qp, const service::client_state& state) const override;

View File

@@ -93,7 +93,7 @@ public:
};
// Note: value (mutation) only required to contain the rows we are interested in
private:
const gc_clock::duration _ttl;
const std::optional<gc_clock::duration> _ttl;
// For operations that require a read-before-write, stores prefetched cell values.
// For CAS statements, stores values of conditioned columns.
// Is a reference to an outside prefetch_data container since a CAS BATCH statement
@@ -106,7 +106,7 @@ public:
const query_options& _options;
update_parameters(const schema_ptr schema_, const query_options& options,
api::timestamp_type timestamp, gc_clock::duration ttl, const prefetch_data& prefetched)
api::timestamp_type timestamp, std::optional<gc_clock::duration> ttl, const prefetch_data& prefetched)
: _ttl(ttl)
, _prefetched(prefetched)
, _timestamp(timestamp)
@@ -127,11 +127,7 @@ public:
}
atomic_cell make_cell(const abstract_type& type, const raw_value_view& value, atomic_cell::collection_member cm = atomic_cell::collection_member::no) const {
auto ttl = _ttl;
if (ttl.count() <= 0) {
ttl = _schema->default_time_to_live();
}
auto ttl = this->ttl();
return value.with_value([&] (const FragmentedView auto& v) {
if (ttl.count() > 0) {
@@ -143,11 +139,7 @@ public:
};
atomic_cell make_cell(const abstract_type& type, const managed_bytes_view& value, atomic_cell::collection_member cm = atomic_cell::collection_member::no) const {
auto ttl = _ttl;
if (ttl.count() <= 0) {
ttl = _schema->default_time_to_live();
}
auto ttl = this->ttl();
if (ttl.count() > 0) {
return atomic_cell::make_live(type, _timestamp, value, _local_deletion_time + ttl, ttl, cm);
@@ -169,7 +161,7 @@ public:
}
gc_clock::duration ttl() const {
return _ttl.count() > 0 ? _ttl : _schema->default_time_to_live();
return _ttl.value_or(_schema->default_time_to_live());
}
gc_clock::time_point expiry() const {

View File

@@ -216,7 +216,7 @@ keyspace_metadata::keyspace_metadata(std::string_view name,
std::move(strategy_options),
durable_writes,
std::move(cf_defs),
user_types_metadata{},
std::move(user_types),
storage_options{}) { }
keyspace_metadata::keyspace_metadata(std::string_view name,

View File

@@ -59,7 +59,7 @@ public:
}
_end_of_stream = false;
forward_buffer_to(pr.start());
clear_buffer();
return _underlying->fast_forward_to(std::move(pr));
}

View File

@@ -1671,9 +1671,9 @@ future<db::commitlog::segment_manager::sseg_ptr> db::commitlog::segment_manager:
align = f.disk_write_dma_alignment();
auto is_overwrite = false;
auto existing_size = f.known_size();
if ((flags & open_flags::dsync) != open_flags{}) {
auto existing_size = f.known_size();
is_overwrite = true;
// would be super nice if we just could mmap(/dev/zero) and do sendto
// instead of this, but for now we must do explicit buffer writes.
@@ -1683,8 +1683,6 @@ future<db::commitlog::segment_manager::sseg_ptr> db::commitlog::segment_manager:
if (existing_size > max_size) {
co_await f.truncate(max_size);
} else if (existing_size < max_size) {
totals.total_size_on_disk += (max_size - existing_size);
clogger.trace("Pre-writing {} of {} KB to segment {}", (max_size - existing_size)/1024, max_size/1024, filename);
// re-open without o_dsync for pre-alloc. The reason/rationale
@@ -1732,6 +1730,12 @@ future<db::commitlog::segment_manager::sseg_ptr> db::commitlog::segment_manager:
co_await f.truncate(max_size);
}
// #12810 - we did not update total_size_on_disk unless o_dsync was
// on. So kept running with total == 0 -> free for all in creating new segment.
// Always update total_size_on_disk. Will wrap-around iff existing_size > max_size.
// That is ok.
totals.total_size_on_disk += (max_size - existing_size);
if (cfg.extensions && !cfg.extensions->commitlog_file_extensions().empty()) {
for (auto * ext : cfg.extensions->commitlog_file_extensions()) {
auto nf = co_await ext->wrap_file(filename, f, flags);
@@ -2116,6 +2120,9 @@ future<> db::commitlog::segment_manager::do_pending_deletes() {
clogger.debug("Discarding segments {}", ftd);
for (auto& [f, mode] : ftd) {
// `f.remove_file()` resets known_size to 0, so remember the size here,
// in order to subtract it from total_size_on_disk accurately.
auto size = f.known_size();
try {
if (f) {
co_await f.close();
@@ -2132,7 +2139,6 @@ future<> db::commitlog::segment_manager::do_pending_deletes() {
}
}
auto size = f.known_size();
auto usage = totals.total_size_on_disk;
auto next_usage = usage - size;
@@ -2165,7 +2171,7 @@ future<> db::commitlog::segment_manager::do_pending_deletes() {
// or had such an exception that we consider the file dead
// anyway. In either case we _remove_ the file size from
// footprint, because it is no longer our problem.
totals.total_size_on_disk -= f.known_size();
totals.total_size_on_disk -= size;
}
// #8376 - if we had an error in recycling (disk rename?), and no elements

View File

@@ -909,6 +909,8 @@ db::config::config(std::shared_ptr<db::extensions> exts)
, force_schema_commit_log(this, "force_schema_commit_log", value_status::Used, false,
"Use separate schema commit log unconditionally rater than after restart following discovery of cluster-wide support for it.")
, task_ttl_seconds(this, "task_ttl_in_seconds", liveness::LiveUpdate, value_status::Used, 10, "Time for which information about finished task stays in memory.")
, nodeops_watchdog_timeout_seconds(this, "nodeops_watchdog_timeout_seconds", liveness::LiveUpdate, value_status::Used, 120, "Time in seconds after which node operations abort when not hearing from the coordinator")
, nodeops_heartbeat_interval_seconds(this, "nodeops_heartbeat_interval_seconds", liveness::LiveUpdate, value_status::Used, 10, "Period of heartbeat ticks in node operations")
, cache_index_pages(this, "cache_index_pages", liveness::LiveUpdate, value_status::Used, false,
"Keep SSTable index pages in the global cache after a SSTable read. Expected to improve performance for workloads with big partitions, but may degrade performance for workloads with small partitions.")
, x_log2_compaction_groups(this, "x_log2_compaction_groups", value_status::Used, 0, "Controls static number of compaction groups per table per shard. For X groups, set the option to log (base 2) of X. Example: Value of 3 implies 8 groups.")

View File

@@ -388,6 +388,8 @@ public:
named_value<bool> force_schema_commit_log;
named_value<uint32_t> task_ttl_seconds;
named_value<uint32_t> nodeops_watchdog_timeout_seconds;
named_value<uint32_t> nodeops_heartbeat_interval_seconds;
named_value<bool> cache_index_pages;
@@ -401,6 +403,10 @@ public:
named_value<uint64_t> wasm_udf_yield_fuel;
named_value<uint64_t> wasm_udf_total_fuel;
named_value<size_t> wasm_udf_memory_limit;
// wasm_udf_reserved_memory is static because the options in db::config
// are parsed using seastar::app_template, while this option is used for
// configuring the Seastar memory subsystem.
static constexpr size_t wasm_udf_reserved_memory = 50 * 1024 * 1024;
seastar::logging_settings logging_settings(const log_cli::options&) const;

View File

@@ -7,6 +7,7 @@
*/
#include <seastar/core/print.hh>
#include <seastar/core/coroutine.hh>
#include "db/system_keyspace.hh"
#include "db/large_data_handler.hh"
#include "sstables/sstables.hh"
@@ -55,11 +56,11 @@ void large_data_handler::start() {
}
future<> large_data_handler::stop() {
if (!running()) {
return make_ready_future<>();
if (running()) {
_running = false;
large_data_logger.info("Waiting for {} background handlers", max_concurrency - _sem.available_units());
co_await _sem.wait(max_concurrency);
}
_running = false;
return _sem.wait(max_concurrency);
}
void large_data_handler::plug_system_keyspace(db::system_keyspace& sys_ks) noexcept {

View File

@@ -2216,15 +2216,15 @@ std::vector<mutation> make_create_aggregate_mutations(schema_features features,
mutation& m = p.first;
clustering_key& ckey = p.second;
data_type state_type = aggregate->sfunc().arg_types()[0];
data_type state_type = aggregate->sfunc()->arg_types()[0];
if (aggregate->has_finalfunc()) {
m.set_clustered_cell(ckey, "final_func", aggregate->finalfunc().name().name, timestamp);
m.set_clustered_cell(ckey, "final_func", aggregate->finalfunc()->name().name, timestamp);
}
if (aggregate->initcond()) {
m.set_clustered_cell(ckey, "initcond", state_type->deserialize(*aggregate->initcond()).to_parsable_string(), timestamp);
}
m.set_clustered_cell(ckey, "return_type", aggregate->return_type()->as_cql3_type().to_string(), timestamp);
m.set_clustered_cell(ckey, "state_func", aggregate->sfunc().name().name, timestamp);
m.set_clustered_cell(ckey, "state_func", aggregate->sfunc()->name().name, timestamp);
m.set_clustered_cell(ckey, "state_type", state_type->as_cql3_type().to_string(), timestamp);
std::vector<mutation> muts = {m};
@@ -2233,7 +2233,7 @@ std::vector<mutation> make_create_aggregate_mutations(schema_features features,
auto sa_p = get_mutation(sa_schema, *aggregate);
mutation& sa_mut = sa_p.first;
clustering_key& sa_ckey = sa_p.second;
sa_mut.set_clustered_cell(sa_ckey, "reduce_func", aggregate->reducefunc().name().name, timestamp);
sa_mut.set_clustered_cell(sa_ckey, "reduce_func", aggregate->reducefunc()->name().name, timestamp);
sa_mut.set_clustered_cell(sa_ckey, "state_type", state_type->as_cql3_type().to_string(), timestamp);
muts.emplace_back(sa_mut);

View File

@@ -295,7 +295,7 @@ future<> size_estimates_mutation_reader::fast_forward_to(const dht::partition_ra
}
future<> size_estimates_mutation_reader::fast_forward_to(position_range pr) {
forward_buffer_to(pr.start());
clear_buffer();
_end_of_stream = false;
if (_partition_reader) {
return _partition_reader->fast_forward_to(std::move(pr));

View File

@@ -2276,7 +2276,10 @@ public:
add_partition(mutation_sink, "trace_probability", format("{:.2}", tracing::tracing::get_local_tracing_instance().get_trace_probability()));
co_await add_partition(mutation_sink, "memory", [this] () {
struct stats {
uint64_t total = 0;
// take the pre-reserved memory into account, as seastar only returns
// the stats of memory managed by the seastar allocator, but we instruct
// it to reserve addition memory for system.
uint64_t total = db::config::wasm_udf_reserved_memory;
uint64_t free = 0;
static stats reduce(stats a, stats b) { return stats{a.total + b.total, a.free + b.free}; }
};
@@ -3344,11 +3347,11 @@ mutation system_keyspace::make_group0_history_state_id_mutation(
using namespace std::chrono;
assert(*gc_older_than >= gc_clock::duration{0});
auto ts_millis = duration_cast<milliseconds>(microseconds{ts});
auto gc_older_than_millis = duration_cast<milliseconds>(*gc_older_than);
assert(gc_older_than_millis < ts_millis);
auto ts_micros = microseconds{ts};
auto gc_older_than_micros = duration_cast<microseconds>(*gc_older_than);
assert(gc_older_than_micros < ts_micros);
auto tomb_upper_bound = utils::UUID_gen::min_time_UUID(ts_millis - gc_older_than_millis);
auto tomb_upper_bound = utils::UUID_gen::min_time_UUID(ts_micros - gc_older_than_micros);
// We want to delete all entries with IDs smaller than `tomb_upper_bound`
// but the deleted range is of the form (x, +inf) since the schema is reversed.
auto range = query::clustering_range::make_starting_with({

View File

@@ -172,7 +172,7 @@ class build_progress_virtual_reader {
}
virtual future<> fast_forward_to(position_range range) override {
forward_buffer_to(range.start());
clear_buffer();
_end_of_stream = false;
return _underlying.fast_forward_to(std::move(range));
}
@@ -197,7 +197,7 @@ public:
streamed_mutation::forwarding fwd,
mutation_reader::forwarding fwd_mr) {
return flat_mutation_reader_v2(std::make_unique<build_progress_reader>(
std::move(s),
s,
std::move(permit),
_db.find_column_family(s->ks_name(), system_keyspace::v3::SCYLLA_VIEWS_BUILDS_IN_PROGRESS),
range,

View File

@@ -85,29 +85,25 @@ future<row_locker::lock_holder>
row_locker::lock_ck(const dht::decorated_key& pk, const clustering_key_prefix& cpk, bool exclusive, db::timeout_clock::time_point timeout, stats& stats) {
mylog.debug("taking shared lock on partition {}, and {} lock on row {} in it", pk, (exclusive ? "exclusive" : "shared"), cpk);
auto tracker = latency_stats_tracker(exclusive ? stats.exclusive_row : stats.shared_row);
auto ck = cpk;
// Create a two-level lock entry for the partition if it doesn't exist already.
auto i = _two_level_locks.try_emplace(pk, this).first;
// The two-level lock entry we've just created is guaranteed to be kept alive as long as it's locked.
// Initiating read locking in the background below ensures that even if the two-level lock is currently
// write-locked, releasing the write-lock will synchronously engage any waiting
// locks and will keep the entry alive.
future<lock_type::holder> lock_partition = i->second._partition_lock.hold_read_lock(timeout);
auto j = i->second._row_locks.find(cpk);
if (j == i->second._row_locks.end()) {
// Not yet locked, need to create the lock. This makes a copy of cpk.
try {
j = i->second._row_locks.emplace(cpk, lock_type()).first;
} catch(...) {
// If this emplace() failed, e.g., out of memory, we fail. We
// could do nothing - the partition lock we already started
// taking will be unlocked automatically after being locked.
// But it's better form to wait for the work we started, and it
// will also allow us to remove the hash-table row we added.
return lock_partition.then([ex = std::current_exception()] (auto lock) {
// The lock is automatically released when "lock" goes out of scope.
// TODO: unlock (lock = {}) now, search for the partition in the
// hash table (we know it's still there, because we held the lock until
// now) and remove the unused lock from the hash table if still unused.
return make_exception_future<row_locker::lock_holder>(std::current_exception());
});
return lock_partition.then([this, pk = &i->first, row_locks = &i->second._row_locks, ck = std::move(ck), exclusive, tracker = std::move(tracker), timeout] (auto lock1) mutable {
auto j = row_locks->find(ck);
if (j == row_locks->end()) {
// Not yet locked, need to create the lock.
j = row_locks->emplace(std::move(ck), lock_type()).first;
}
}
return lock_partition.then([this, pk = &i->first, cpk = &j->first, &row_lock = j->second, exclusive, tracker = std::move(tracker), timeout] (auto lock1) mutable {
auto* cpk = &j->first;
auto& row_lock = j->second;
// Like to the two-level lock entry above, the row_lock entry we've just created
// is guaranteed to be kept alive as long as it's locked.
// Initiating read/write locking in the background below ensures that.
auto lock_row = exclusive ? row_lock.hold_write_lock(timeout) : row_lock.hold_read_lock(timeout);
return lock_row.then([this, pk, cpk, exclusive, tracker = std::move(tracker), lock1 = std::move(lock1)] (auto lock2) mutable {
lock1.release();

View File

@@ -1825,6 +1825,8 @@ future<> view_builder::start(service::migration_manager& mm) {
(void)_build_step.trigger();
return make_ready_future<>();
});
}).handle_exception_type([] (const seastar::sleep_aborted& e) {
vlogger.debug("start aborted: {}", e.what());
}).handle_exception([] (std::exception_ptr eptr) {
vlogger.error("start failed: {}", eptr);
return make_ready_future<>();
@@ -2523,32 +2525,33 @@ update_backlog node_update_backlog::add_fetch(unsigned shard, update_backlog bac
return std::max(backlog, _max.load(std::memory_order_relaxed));
}
future<bool> check_view_build_ongoing(db::system_distributed_keyspace& sys_dist_ks, const sstring& ks_name, const sstring& cf_name) {
return sys_dist_ks.view_status(ks_name, cf_name).then([] (std::unordered_map<locator::host_id, sstring>&& view_statuses) {
return boost::algorithm::any_of(view_statuses | boost::adaptors::map_values, [] (const sstring& view_status) {
return view_status == "STARTED";
future<bool> check_view_build_ongoing(db::system_distributed_keyspace& sys_dist_ks, const locator::token_metadata& tm, const sstring& ks_name,
const sstring& cf_name) {
using view_statuses_type = std::unordered_map<locator::host_id, sstring>;
return sys_dist_ks.view_status(ks_name, cf_name).then([&tm] (view_statuses_type&& view_statuses) {
return boost::algorithm::any_of(view_statuses, [&tm] (const view_statuses_type::value_type& view_status) {
// Only consider status of known hosts.
return view_status.second == "STARTED" && tm.get_endpoint_for_host_id(view_status.first);
});
});
}
future<bool> check_needs_view_update_path(db::system_distributed_keyspace& sys_dist_ks, const replica::table& t, streaming::stream_reason reason) {
future<bool> check_needs_view_update_path(db::system_distributed_keyspace& sys_dist_ks, const locator::token_metadata& tm, const replica::table& t,
streaming::stream_reason reason) {
if (is_internal_keyspace(t.schema()->ks_name())) {
return make_ready_future<bool>(false);
}
if (reason == streaming::stream_reason::repair && !t.views().empty()) {
return make_ready_future<bool>(true);
}
return do_with(t.views(), [&sys_dist_ks] (auto& views) {
return do_with(t.views(), [&sys_dist_ks, &tm] (auto& views) {
return map_reduce(views,
[&sys_dist_ks] (const view_ptr& view) { return check_view_build_ongoing(sys_dist_ks, view->ks_name(), view->cf_name()); },
[&sys_dist_ks, &tm] (const view_ptr& view) { return check_view_build_ongoing(sys_dist_ks, tm, view->ks_name(), view->cf_name()); },
false,
std::logical_or<bool>());
});
}
const size_t view_updating_consumer::buffer_size_soft_limit{1 * 1024 * 1024};
const size_t view_updating_consumer::buffer_size_hard_limit{2 * 1024 * 1024};
void view_updating_consumer::do_flush_buffer() {
_staging_reader_handle.pause();
@@ -2571,6 +2574,10 @@ void view_updating_consumer::do_flush_buffer() {
}
void view_updating_consumer::flush_builder() {
_buffer.emplace_back(_mut_builder->flush());
}
void view_updating_consumer::end_builder() {
_mut_builder->consume_end_of_partition();
if (auto mut_opt = _mut_builder->consume_end_of_stream()) {
_buffer.emplace_back(std::move(*mut_opt));
@@ -2579,11 +2586,9 @@ void view_updating_consumer::flush_builder() {
}
void view_updating_consumer::maybe_flush_buffer_mid_partition() {
if (_buffer_size >= buffer_size_hard_limit) {
if (_buffer_size >= _buffer_size_hard_limit) {
flush_builder();
auto dk = _buffer.back().decorated_key();
do_flush_buffer();
consume_new_partition(dk);
}
}

View File

@@ -22,9 +22,13 @@ class system_distributed_keyspace;
}
namespace locator {
class token_metadata;
}
namespace db::view {
future<bool> check_view_build_ongoing(db::system_distributed_keyspace& sys_dist_ks, const sstring& ks_name, const sstring& cf_name);
future<bool> check_needs_view_update_path(db::system_distributed_keyspace& sys_dist_ks, const replica::table& t, streaming::stream_reason reason);
future<bool> check_needs_view_update_path(db::system_distributed_keyspace& sys_dist_ks, const locator::token_metadata& tm, const replica::table& t,
streaming::stream_reason reason);
}

View File

@@ -157,11 +157,11 @@ future<> view_update_generator::start() {
service::get_local_streaming_priority(),
nullptr,
::mutation_reader::forwarding::no);
auto close_sr = deferred_close(staging_sstable_reader);
inject_failure("view_update_generator_consume_staging_sstable");
auto result = staging_sstable_reader.consume_in_thread(view_updating_consumer(s, std::move(permit), *t, sstables, _as, staging_sstable_reader_handle),
dht::incremental_owned_ranges_checker::make_partition_filter(_db.get_keyspace_local_ranges(s->ks_name())));
staging_sstable_reader.close().get();
if (result == stop_iteration::yes) {
break;
}

View File

@@ -33,8 +33,17 @@ public:
// We prefer flushing on partition boundaries, so at the end of a partition,
// we flush on reaching the soft limit. Otherwise we continue accumulating
// data. We flush mid-partition if we reach the hard limit.
static const size_t buffer_size_soft_limit;
static const size_t buffer_size_hard_limit;
static constexpr size_t buffer_size_soft_limit_default = 1 * 1024 * 1024;
static constexpr size_t buffer_size_hard_limit_default = 2 * 1024 * 1024;
private:
size_t _buffer_size_soft_limit = buffer_size_soft_limit_default;
size_t _buffer_size_hard_limit = buffer_size_hard_limit_default;
public:
// Meant only for usage in tests.
void set_buffer_size_limit_for_testing_purposes(size_t sz) {
_buffer_size_soft_limit = sz;
_buffer_size_hard_limit = sz;
}
private:
schema_ptr _schema;
@@ -49,6 +58,7 @@ private:
private:
void do_flush_buffer();
void flush_builder();
void end_builder();
void maybe_flush_buffer_mid_partition();
public:
@@ -113,8 +123,8 @@ public:
if (_as->abort_requested()) {
return stop_iteration::yes;
}
flush_builder();
if (_buffer_size >= buffer_size_soft_limit) {
end_builder();
if (_buffer_size >= _buffer_size_soft_limit) {
do_flush_buffer();
}
return stop_iteration::no;

View File

@@ -478,7 +478,15 @@ static future<bool> ping_with_timeout(pinger::endpoint_id id, clock::timepoint_t
auto f = pinger.ping(id, timeout_as);
auto sleep_and_abort = [] (clock::timepoint_t timeout, abort_source& timeout_as, clock& c) -> future<> {
co_await c.sleep_until(timeout, timeout_as);
co_await c.sleep_until(timeout, timeout_as).then_wrapped([&timeout_as] (auto&& f) {
// Avoid throwing if sleep was aborted.
if (f.failed() && timeout_as.abort_requested()) {
// Expected (if ping() resolved first or we were externally aborted).
f.ignore_ready_future();
return make_ready_future<>();
}
return f;
});
if (!timeout_as.abort_requested()) {
// We resolved before `f`. Abort the operation.
timeout_as.request_abort();
@@ -501,8 +509,6 @@ static future<bool> ping_with_timeout(pinger::endpoint_id id, clock::timepoint_t
// Wait on the sleep as well (it should return shortly, being aborted) so we don't discard the future.
try {
co_await std::move(sleep_and_abort);
} catch (const sleep_aborted&) {
// Expected (if `f` resolved first or we were externally aborted).
} catch (...) {
// There should be no other exceptions, but just in case... log it and discard,
// we want to propagate exceptions from `f`, not from sleep.

View File

@@ -42,7 +42,8 @@ if __name__ == '__main__':
if systemd_unit.available('systemd-coredump@.service'):
dropin = '''
[Service]
TimeoutStartSec=infinity
RuntimeMaxSec=infinity
TimeoutSec=infinity
'''[1:-1]
os.makedirs('/etc/systemd/system/systemd-coredump@.service.d', exist_ok=True)
with open('/etc/systemd/system/systemd-coredump@.service.d/timeout.conf', 'w') as f:

View File

@@ -16,7 +16,7 @@ if __name__ == '__main__':
if os.getuid() > 0:
print('Requires root permission.')
sys.exit(1)
systemd_unit('scylla-fstrim.timer').unmask()
systemd_unit('scylla-fstrim.timer').enable()
systemd_unit('scylla-fstrim.timer').start()
if is_redhat_variant() or is_arch() or is_suse_variant():
systemd_unit('fstrim.timer').disable()

View File

@@ -25,7 +25,7 @@ if __name__ == '__main__':
run('dd if=/dev/zero of=/var/tmp/kernel-check.img bs=1M count=128', shell=True, check=True, stdout=DEVNULL, stderr=DEVNULL)
run('mkfs.xfs /var/tmp/kernel-check.img', shell=True, check=True, stdout=DEVNULL, stderr=DEVNULL)
run('mount /var/tmp/kernel-check.img /var/tmp/mnt -o loop', shell=True, check=True, stdout=DEVNULL, stderr=DEVNULL)
ret = run('iotune --fs-check --evaluation-directory /var/tmp/mnt', shell=True).returncode
ret = run('iotune --fs-check --evaluation-directory /var/tmp/mnt --default-log-level error', shell=True).returncode
run('umount /var/tmp/mnt', shell=True, check=True)
shutil.rmtree('/var/tmp/mnt')
os.remove('/var/tmp/kernel-check.img')

View File

@@ -125,9 +125,12 @@ if __name__ == '__main__':
procs.append(proc)
for proc in procs:
proc.wait()
for disk in disks:
run(f'wipefs -a {disk}', shell=True, check=True)
if raid:
run('udevadm settle', shell=True, check=True)
run('mdadm --create --verbose --force --run {raid} --level={level} -c1024 --raid-devices={nr_disk} {disks}'.format(raid=fsdev, level=args.raid_level, nr_disk=len(disks), disks=args.disks.replace(',', ' ')), shell=True, check=True)
run(f'wipefs -a {fsdev}', shell=True, check=True)
run('udevadm settle', shell=True, check=True)
major_minor = os.stat(fsdev).st_rdev
@@ -138,7 +141,7 @@ if __name__ == '__main__':
# and it also cannot be smaller than the sector size.
block_size = max(1024, sector_size)
run('udevadm settle', shell=True, check=True)
run(f'mkfs.xfs -b size={block_size} {fsdev} -f -K', shell=True, check=True)
run(f'mkfs.xfs -b size={block_size} {fsdev} -K', shell=True, check=True)
run('udevadm settle', shell=True, check=True)
if is_debian_variant():

View File

@@ -7,7 +7,7 @@ Group: Applications/Databases
License: AGPLv3
URL: http://www.scylladb.com/
Source0: %{reloc_pkg}
Requires: %{product}-server = %{version} %{product}-conf = %{version} %{product}-python3 = %{version} %{product}-kernel-conf = %{version} %{product}-jmx = %{version} %{product}-tools = %{version} %{product}-tools-core = %{version} %{product}-node-exporter = %{version}
Requires: %{product}-server = %{version}-%{release} %{product}-conf = %{version}-%{release} %{product}-python3 = %{version}-%{release} %{product}-kernel-conf = %{version}-%{release} %{product}-jmx = %{version}-%{release} %{product}-tools = %{version}-%{release} %{product}-tools-core = %{version}-%{release} %{product}-node-exporter = %{version}-%{release}
Obsoletes: scylla-server < 1.1
%global _debugsource_template %{nil}
@@ -54,7 +54,7 @@ Group: Applications/Databases
Summary: The Scylla database server
License: AGPLv3
URL: http://www.scylladb.com/
Requires: %{product}-conf = %{version} %{product}-python3 = %{version}
Requires: %{product}-conf = %{version}-%{release} %{product}-python3 = %{version}-%{release}
Conflicts: abrt
AutoReqProv: no

View File

@@ -1,6 +1,77 @@
### a dictionary of redirections
#old path: new path
# removing the Enterprise upgrade guides from the Open Source documentation
/stable/upgrade/upgrade-enterprise/index.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/index.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2021.1-to-2022.1/index.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2021.1-to-2022.1/index.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2021.1-to-2022.1/upgrade-guide-from-2021.1-to-2022.1-ubuntu.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2021.1-to-2022.1/upgrade-guide-from-2021.1-to-2022.1-ubuntu.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2021.1-to-2022.1/upgrade-guide-from-2021.1-to-2022.1-debian.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2021.1-to-2022.1/upgrade-guide-from-2021.1-to-2022.1-debian.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2021.1-to-2022.1/upgrade-guide-from-2021.1-to-2022.1-image.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2021.1-to-2022.1/upgrade-guide-from-2021.1-to-2022.1-image.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2021.1-to-2022.1/metric-update-2021.1-to-2022.1.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2021.1-to-2022.1/metric-update-2021.1-to-2022.1.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2020.1-to-2021.1/index.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2020.1-to-2021.1/index.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2020.1-to-2021.1/upgrade-guide-from-2020.1-to-2021.1-rpm.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2020.1-to-2021.1/upgrade-guide-from-2020.1-to-2021.1-rpm.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2020.1-to-2021.1/upgrade-guide-from-2020.1-to-2021.1-ubuntu-16-04.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2020.1-to-2021.1/upgrade-guide-from-2020.1-to-2021.1-ubuntu-16-04.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2020.1-to-2021.1/upgrade-guide-from-2020.1-to-2021.1-ubuntu-18-04.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2020.1-to-2021.1/upgrade-guide-from-2020.1-to-2021.1-ubuntu-18-04.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2020.1-to-2021.1/upgrade-guide-from-2020.1-to-2021.1-debian.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2020.1-to-2021.1/upgrade-guide-from-2020.1-to-2021.1-debian.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2020.1-to-2021.1/metric-update-2020.1-to-2021.1.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2020.1-to-2021.1/metric-update-2020.1-to-2021.1.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2019.1-to-2020.1/index.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2019.1-to-2020.1/index.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2019.1-to-2020.1/upgrade-guide-from-2019.1-to-2020.1-rpm.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2019.1-to-2020.1/upgrade-guide-from-2019.1-to-2020.1-rpm.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2019.1-to-2020.1/upgrade-guide-from-2019.1-to-2020.1-ubuntu-16-04.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2019.1-to-2020.1/upgrade-guide-from-2019.1-to-2020.1-ubuntu-16-04.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2019.1-to-2020.1/upgrade-guide-from-2019.1-to-2020.1-ubuntu-18-04.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2019.1-to-2020.1/upgrade-guide-from-2019.1-to-2020.1-ubuntu-18-04.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2019.1-to-2020.1/upgrade-guide-from-2019.1-to-2020.1-debian.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2019.1-to-2020.1/upgrade-guide-from-2019.1-to-2020.1-debian.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2019.1-to-2020.1/metric-update-2019.1-to-2020.1.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2019.1-to-2020.1/metric-update-2019.1-to-2020.1.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2018.1-to-2019.1/index.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2018.1-to-2019.1/index.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2018.1-to-2019.1/upgrade-guide-from-2018.1-to-2019.1-rpm.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2018.1-to-2019.1/upgrade-guide-from-2018.1-to-2019.1-rpm.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2018.1-to-2019.1/upgrade-guide-from-2018.1-to-2019.1-ubuntu-16-04.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2018.1-to-2019.1/upgrade-guide-from-2018.1-to-2019.1-ubuntu-16-04.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2018.1-to-2019.1/metric-update-2018.1-to-2019.1.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2018.1-to-2019.1/metric-update-2018.1-to-2019.1.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2017.1-to-2018.1/index.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2017.1-to-2018.1/index.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2017.1-to-2018.1/upgrade-guide-from-2017.1-to-2018.1-rpm.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2017.1-to-2018.1/upgrade-guide-from-2017.1-to-2018.1-rpm.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2017.1-to-2018.1/upgrade-guide-from-2017.1-to-2018.1-ubuntu.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2017.1-to-2018.1/upgrade-guide-from-2017.1-to-2018.1-ubuntu.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2017.1-to-2018.1/upgrade-guide-from-2017.1-to-2018.1-debian.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2017.1-to-2018.1/upgrade-guide-from-2017.1-to-2018.1-debian.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2017.1-to-2018.1/metric-update-2017.1-to-2018.1.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2017.1-to-2018.1/metric-update-2017.1-to-2018.1.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-ubuntu-14-to-16.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-ubuntu-14-to-16.html
/stable/getting-started/install-scylla/unified-installer.html#unified-installed-upgrade: https://enterprise.docs.scylladb.com/stable/getting-started/install-scylla/unified-installer.html#unified-installed-upgrade
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2022.x.y-to-2022.x.z/index.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2022.x.y-to-2022.x.z/index.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2022.x.y-to-2022.x.z/upgrade-guide-from-2022.x.y-to-2022.x.z-image.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2022.x.y-to-2022.x.z/upgrade-guide-from-2022.x.y-to-2022.x.z-image.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2022.x.y-to-2022.x.z/upgrade-guide-from-2022.x.y-to-2022.x.z-rpm.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2022.x.y-to-2022.x.z/upgrade-guide-from-2022.x.y-to-2022.x.z-rpm.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2022.x.y-to-2022.x.z/upgrade-guide-from-2022.x.y-to-2022.x.z-ubuntu-18-04.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2022.x.y-to-2022.x.z/upgrade-guide-from-2022.x.y-to-2022.x.z-ubuntu-18-04.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2022.x.y-to-2022.x.z/upgrade-guide-from-2022.x.y-to-2022.x.z-ubuntu-20-04.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2022.x.y-to-2022.x.z/upgrade-guide-from-2022.x.y-to-2022.x.z-ubuntu-20-04.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2022.x.y-to-2022.x.z/upgrade-guide-from-2022.x.y-to-2022.x.z-debian-10.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2022.x.y-to-2022.x.z/upgrade-guide-from-2022.x.y-to-2022.x.z-debian-10.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2021.x.y-to-2021.x.z/index.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2021.x.y-to-2021.x.z/index.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2021.x.y-to-2021.x.z/upgrade-guide-from-2021.x.y-to-2021.x.z-rpm.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2021.x.y-to-2021.x.z/upgrade-guide-from-2021.x.y-to-2021.x.z-rpm.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2021.x.y-to-2021.x.z/upgrade-guide-from-2021.x.y-to-2021.x.z-ubuntu-16-04.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2021.x.y-to-2021.x.z/upgrade-guide-from-2021.x.y-to-2021.x.z-ubuntu-16-04.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2021.x.y-to-2021.x.z/upgrade-guide-from-2021.x.y-to-2021.x.z-ubuntu-18-04.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2021.x.y-to-2021.x.z/upgrade-guide-from-2021.x.y-to-2021.x.z-ubuntu-18-04.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2021.x.y-to-2021.x.z/upgrade-guide-from-2021.x.y-to-2021.x.z-ubuntu-20-04.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2021.x.y-to-2021.x.z/upgrade-guide-from-2021.x.y-to-2021.x.z-ubuntu-20-04.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2021.x.y-to-2021.x.z/upgrade-guide-from-2021.x.y-to-2021.x.z-debian-9.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2021.x.y-to-2021.x.z/upgrade-guide-from-2021.x.y-to-2021.x.z-debian-9.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2021.x.y-to-2021.x.z/upgrade-guide-from-2021.x.y-to-2021.x.z-debian-10.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2021.x.y-to-2021.x.z/upgrade-guide-from-2021.x.y-to-2021.x.z-debian-10.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2020.x.y-to-2020.x.z/index.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2020.x.y-to-2020.x.z/index.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2020.x.y-to-2020.x.z/upgrade-guide-from-2020.x.y-to-2020.x.z-rpm.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2020.x.y-to-2020.x.z/upgrade-guide-from-2020.x.y-to-2020.x.z-rpm.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2020.x.y-to-2020.x.z/upgrade-guide-from-2020.x.y-to-2020.x.z-ubuntu-16-04.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2020.x.y-to-2020.x.z/upgrade-guide-from-2020.x.y-to-2020.x.z-ubuntu-16-04.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2020.x.y-to-2020.x.z/upgrade-guide-from-2020.x.y-to-2020.x.z-ubuntu-18-04.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2020.x.y-to-2020.x.z/upgrade-guide-from-2020.x.y-to-2020.x.z-ubuntu-18-04.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2020.x.y-to-2020.x.z/upgrade-guide-from-2020.x.y-to-2020.x.z-debian-9.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2020.x.y-to-2020.x.z/upgrade-guide-from-2020.x.y-to-2020.x.z-debian-9.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2020.x.y-to-2020.x.z/upgrade-guide-from-2020.x.y-to-2020.x.z-debian-10.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2020.x.y-to-2020.x.z/upgrade-guide-from-2020.x.y-to-2020.x.z-debian-10.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2019.x.y-to-2019.x.z/index.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2019.x.y-to-2019.x.z/index.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2019.x.y-to-2019.x.z/upgrade-guide-from-2019.x.y-to-2019.x.z-rpm.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2019.x.y-to-2019.x.z/upgrade-guide-from-2019.x.y-to-2019.x.z-rpm.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2019.x.y-to-2019.x.z/upgrade-guide-from-2019.x.y-to-2019.x.z-ubuntu.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2019.x.y-to-2019.x.z/upgrade-guide-from-2019.x.y-to-2019.x.z-ubuntu.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2019.x.y-to-2019.x.z/upgrade-guide-from-2019.x.y-to-2019.x.z-debian.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2019.x.y-to-2019.x.z/upgrade-guide-from-2019.x.y-to-2019.x.z-debian.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2018.x.y-to-2018.x.z/index.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2018.x.y-to-2018.x.z/index.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2018.x.y-to-2018.x.z/upgrade-guide-from-2018.x.y-to-2018.x.z-rpm.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2018.x.y-to-2018.x.z/upgrade-guide-from-2018.x.y-to-2018.x.z-rpm.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2018.x.y-to-2018.x.z/upgrade-guide-from-2018.x.y-to-2018.x.z-ubuntu.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2018.x.y-to-2018.x.z/upgrade-guide-from-2018.x.y-to-2018.x.z-ubuntu.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2018.x.y-to-2018.x.z/upgrade-guide-from-2018.x.y-to-2018.x.z-debian.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2018.x.y-to-2018.x.z/upgrade-guide-from-2018.x.y-to-2018.x.z-debian.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2017.x.y-to-2017.x.z/index.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2017.x.y-to-2017.x.z/index.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2017.x.y-to-2017.x.z/upgrade-guide-from-2017.x.y-to-2017.x.z-rpm.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2017.x.y-to-2017.x.z/upgrade-guide-from-2017.x.y-to-2017.x.z-rpm.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2017.x.y-to-2017.x.z/upgrade-guide-from-2017.x.y-to-2017.x.z-ubuntu.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2017.x.y-to-2017.x.z/upgrade-guide-from-2017.x.y-to-2017.x.z-ubuntu.html
/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2017.x.y-to-2017.x.z/upgrade-guide-from-2017.x.y-to-2017.x.z-debian.html: https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2017.x.y-to-2017.x.z/upgrade-guide-from-2017.x.y-to-2017.x.z-debian.html
# removing the Enterprise-only content from the Open Source documentation
/stable/using-scylla/workload-prioritization: https://enterprise.docs.scylladb.com//stable/using-scylla/workload-prioritization.html
/stable/operating-scylla/security/encryption-at-rest: https://enterprise.docs.scylladb.com/stable/operating-scylla/security/encryption-at-rest.html
/stable/operating-scylla/security/ldap-authentication: https://enterprise.docs.scylladb.com/stable/operating-scylla/security/ldap-authentication.html
/stable/operating-scylla/security/ldap-authorization: https://enterprise.docs.scylladb.com/stable/operating-scylla/security/ldap-authorization.html
/stable/operating-scylla/security/auditing: https://enterprise.docs.scylladb.com/stable/operating-scylla/security/auditing.html
# unifying the Ubunut upgrade guide for different Ubuntu versions: from 5.0 to 2022.1
/stable/upgrade/upgrade-to-enterprise/upgrade-guide-from-5.0-to-2022.1/upgrade-guide-from-5.0-to-2022.1-ubuntu-18-04.html: /stable/upgrade/upgrade-to-enterprise/upgrade-guide-from-5.0-to-2022.1/upgrade-guide-from-5.0-to-2022.1-ubuntu.html
@@ -1112,14 +1183,14 @@ tls-ssl/index.html: /stable/operating-scylla/security
/using-scylla/integrations/integration_kairos/index.html: /stable/using-scylla/integrations/integration-kairos
/upgrade/ami_upgrade/index.html: /stable/upgrade/ami-upgrade
/scylla-cloud/cloud-setup/gcp-vpc-peering/index.html: /stable/scylla-cloud/cloud-setup/GCP/gcp-vpc-peering
/scylla-cloud/cloud-setup/GCP/gcp-vcp-peering/index.html: /stable/scylla-cloud/cloud-setup/GCP/gcp-vpc-peering
/scylla-cloud/cloud-setup/gcp-vpc-peering/index.html: https://cloud.docs.scylladb.com/stable/cloud-setup/gcp-vpc-peering.html
/scylla-cloud/cloud-setup/GCP/gcp-vcp-peering/index.html: https://cloud.docs.scylladb.com/stable/cloud-setup/gcp-vpc-peering.html
# move scylla cloud for AWS to dedicated directory
/scylla-cloud/cloud-setup/aws-vpc-peering/index.html: /stable/scylla-cloud/cloud-setup/AWS/aws-vpc-peering
/scylla-cloud/cloud-setup/cloud-prom-proxy/index.html: /stable/scylla-cloud/cloud-setup/AWS/cloud-prom-proxy
/scylla-cloud/cloud-setup/outposts/index.html: /stable/scylla-cloud/cloud-setup/AWS/outposts
/scylla-cloud/cloud-setup/scylla-cloud-byoa/index.html: /stable/scylla-cloud/cloud-setup/AWS/scylla-cloud-byoa
/scylla-cloud/cloud-setup/aws-vpc-peering/index.html: https://cloud.docs.scylladb.com/stable/cloud-setup/aws-vpc-peering.html
/scylla-cloud/cloud-setup/cloud-prom-proxy/index.html: https://cloud.docs.scylladb.com/stable/monitoring/cloud-prom-proxy.html
/scylla-cloud/cloud-setup/outposts/index.html: https://cloud.docs.scylladb.com/stable/cloud-setup/outposts.html
/scylla-cloud/cloud-setup/scylla-cloud-byoa/index.html: https://cloud.docs.scylladb.com/stable/cloud-setup/scylla-cloud-byoa.html
/scylla-cloud/cloud-services/scylla_cloud_costs/index.html: /stable/scylla-cloud/cloud-services/scylla-cloud-costs
/scylla-cloud/cloud-services/scylla_cloud_managin_versions/index.html: /stable/scylla-cloud/cloud-services/scylla-cloud-managin-versions
/scylla-cloud/cloud-services/scylla_cloud_support_alerts_sla/index.html: /stable/scylla-cloud/cloud-services/scylla-cloud-support-alerts-sla

View File

@@ -161,6 +161,10 @@ events appear in the Streams API as normal deletions - without the
distinctive marker on deletions which are really expirations.
See <https://github.com/scylladb/scylla/issues/5060>.
<!--- REMOVE IN FUTURE VERSIONS - Remove the note below in version 5.3/2023.1 -->
> **Note** This feature is experimental in versions earlier than ScyllaDB Open Source 5.2 and ScyllaDB Enterprise 2022.2.
---

View File

@@ -5,7 +5,7 @@ Raft Consensus Algorithm in ScyllaDB
Introduction
--------------
ScyllaDB was originally designed, following Apache Cassandra, to use gossip for topology and schema updates and the Paxos consensus algorithm for
strong data consistency (:doc:`LWT </using-scylla/lwt>`). To achieve stronger consistency without performance penalty, ScyllaDB 5.x has turned to Raft - a consensus algorithm designed as an alternative to both gossip and Paxos.
strong data consistency (:doc:`LWT </using-scylla/lwt>`). To achieve stronger consistency without performance penalty, ScyllaDB has turned to Raft - a consensus algorithm designed as an alternative to both gossip and Paxos.
Raft is a consensus algorithm that implements a distributed, consistent, replicated log across members (nodes). Raft implements consensus by first electing a distinguished leader, then giving the leader complete responsibility for managing the replicated log. The leader accepts log entries from clients, replicates them on other servers, and tells servers when it is safe to apply log entries to their state machines.
@@ -13,9 +13,9 @@ Raft uses a heartbeat mechanism to trigger a leader election. All servers start
Leader selection is described in detail in the `Raft paper <https://raft.github.io/raft.pdf>`_.
ScyllaDB 5.x may use Raft to maintain schema updates in every node (see below). Any schema update, like ALTER, CREATE or DROP TABLE, is first committed as an entry in the replicated Raft log, and, once stored on most replicas, applied to all nodes **in the same order**, even in the face of a node or network failures.
ScyllaDB can use Raft to maintain schema updates in every node (see below). Any schema update, like ALTER, CREATE or DROP TABLE, is first committed as an entry in the replicated Raft log, and, once stored on most replicas, applied to all nodes **in the same order**, even in the face of a node or network failures.
Following ScyllaDB 5.x releases will use Raft to guarantee consistent topology updates similarly.
Upcoming ScyllaDB releases will use Raft to guarantee consistent topology updates similarly.
.. _raft-quorum-requirement:
@@ -26,90 +26,55 @@ Raft requires at least a quorum of nodes in a cluster to be available. If multip
and the quorum is lost, the cluster is unavailable for schema updates. See :ref:`Handling Failures <raft-handling-failures>`
for information on how to handle failures.
Upgrade Considerations for ScyllaDB 5.0 and Later
==================================================
Note that when you have a two-DC cluster with the same number of nodes in each DC, the cluster will lose the quorum if one
of the DCs is down.
**We recommend configuring three DCs per cluster to ensure that the cluster remains available and operational when one DC is down.**
.. _enabling-raft-existing-cluster:
Enabling Raft
---------------
Enabling Raft in ScyllaDB 5.0 and 5.1
=====================================
.. warning::
In ScyllaDB 5.0 and 5.1, Raft is an experimental feature.
It is not possible to enable Raft in an existing cluster in ScyllaDB 5.0 and 5.1.
In order to have a Raft-enabled cluster in these versions, you must create a new cluster with Raft enabled from the start.
.. warning::
**Do not** use Raft in production clusters in ScyllaDB 5.0 and 5.1. Such clusters won't be able to correctly upgrade to ScyllaDB 5.2.
Use Raft only for testing and experimentation in clusters which can be thrown away.
.. warning::
Once enabled, Raft cannot be disabled on your cluster. The cluster nodes will fail to restart if you remove the Raft feature.
When creating a new cluster, add ``raft`` to the list of experimental features in your ``scylla.yaml`` file:
.. code-block:: yaml
experimental_features:
- raft
.. _enabling-raft-existing-cluster:
Enabling Raft in ScyllaDB 5.2 and further
=========================================
.. TODO include enterprise versions in this documentation
.. note::
In ScyllaDB 5.2, Raft is Generally Available and can be safely used for consistent schema management.
In ScyllaDB 5.3 it will become enabled by default.
In further versions it will be mandatory.
In ScyllaDB 5.2 and ScyllaDB Enterprise 2023.1 Raft is Generally Available and can be safely used for consistent schema management.
In further versions, it will be mandatory.
ScyllaDB 5.2 and later comes equipped with a procedure that can setup Raft-based consistent cluster management in an existing cluster. We refer to this as the **internal Raft upgrade procedure** (do not confuse with the :doc:`ScyllaDB version upgrade procedure </upgrade/upgrade-opensource/upgrade-guide-from-5.1-to-5.2/upgrade-guide-from-5.1-to-5.2-generic>`).
ScyllaDB Open Source 5.2 and later, and ScyllaDB Enterprise 2023.1 and later come equipped with a procedure that can setup Raft-based consistent cluster management in an existing cluster. We refer to this as the **Raft upgrade procedure** (do not confuse with the :doc:`ScyllaDB version upgrade procedure </upgrade/index/>`).
.. warning::
Once enabled, Raft cannot be disabled on your cluster. The cluster nodes will fail to restart if you remove the Raft feature.
To enable Raft in an existing cluster in Scylla 5.2 and beyond:
To enable Raft in an existing cluster, you need to enable the ``consistent_cluster_management`` option in the ``scylla.yaml`` file
for **each node** in the cluster:
* ensure that the schema is synchronized in the cluster by executing :doc:`nodetool describecluster </operating-scylla/nodetool-commands/describecluster>` on each node and ensuring that the schema version is the same on all nodes,
* then perform a :doc:`rolling restart </operating-scylla/procedures/config-change/rolling-restart/>`, updating the ``scylla.yaml`` file for **each node** in the cluster before restarting it to enable the ``consistent_cluster_management`` flag:
#. Ensure that the schema is synchronized in the cluster by executing :doc:`nodetool describecluster </operating-scylla/nodetool-commands/describecluster>` on each node and ensuring that the schema version is the same on all nodes.
#. Perform a :doc:`rolling restart </operating-scylla/procedures/config-change/rolling-restart/>`, updating the ``scylla.yaml`` file for **each node** in the cluster before restarting it to enable the ``consistent_cluster_management`` option:
.. code-block:: yaml
.. code-block:: yaml
consistent_cluster_management: true
consistent_cluster_management: true
When all the nodes in the cluster and updated and restarted, the cluster will start the **internal Raft upgrade procedure**.
**You must then verify** that the internal Raft upgrade procedure has finished successfully. Refer to the :ref:`next section <verify-raft-procedure>`.
When all the nodes in the cluster and updated and restarted, the cluster will start the **Raft upgrade procedure**.
**You must then verify** that the Raft upgrade procedure has finished successfully. Refer to the :ref:`next section <verify-raft-procedure>`.
You can also enable the ``consistent_cluster_management`` flag while performing :doc:`rolling upgrade from 5.1 to 5.2 </upgrade/upgrade-opensource/upgrade-guide-from-5.1-to-5.2/upgrade-guide-from-5.1-to-5.2-generic>`: update ``scylla.yaml`` before restarting each node. The internal Raft upgrade procedure will start as soon as the last node was upgraded and restarted. As above, this requires :ref:`verifying <verify-raft-procedure>` that this internal procedure successfully finishes.
Alternatively, you can enable the ``consistent_cluster_management`` option when you are:
Finally, you can enable the ``consistent_cluster_management`` flag when creating a new cluster. This does not use the internal Raft upgrade procedure; instead, Raft is functioning in the cluster and managing schema right from the start.
* Performing a rolling upgrade from version 5.1 to 5.2 or version 2022.x to 2023.1 by updating ``scylla.yaml`` before restarting each node. The Raft upgrade procedure will start as soon as the last node was upgraded and restarted. As above, this requires :ref:`verifying <verify-raft-procedure>` that the procedure successfully finishes.
* Creating a new cluster. This does not use the Raft upgrade procedure; instead, Raft is functioning in the cluster and managing schema right from the start.
Until all nodes are restarted with ``consistent_cluster_management: true``, it is still possible to turn this option back off. Once enabled on every node, it must remain turned on (or the node will refuse to restart).
.. _verify-raft-procedure:
Verifying that the internal Raft upgrade procedure finished successfully
Verifying that the Raft upgrade procedure finished successfully
========================================================================
.. versionadded:: 5.2
The internal Raft upgrade procedure starts as soon as every node in the cluster restarts with ``consistent_cluster_management`` flag enabled in ``scylla.yaml``.
The Raft upgrade procedure starts as soon as every node in the cluster restarts with ``consistent_cluster_management`` flag enabled in ``scylla.yaml``.
.. TODO: update the above sentence once 5.3 and later are released.
The procedure requires **full cluster availability** to correctly setup the Raft algorithm; after the setup finishes, Raft can proceed with only a majority of nodes, but this initial setup is an exception.
An unlucky event, such as a hardware failure, may cause one of your nodes to fail. If this happens before the internal Raft upgrade procedure finishes, the procedure will get stuck and your intervention will be required.
An unlucky event, such as a hardware failure, may cause one of your nodes to fail. If this happens before the Raft upgrade procedure finishes, the procedure will get stuck and your intervention will be required.
To verify that the procedure finishes, look at the log of every Scylla node (using ``journalctl _COMM=scylla``). Search for the following patterns:
@@ -204,8 +169,6 @@ If some nodes are **dead and irrecoverable**, you'll need to perform a manual re
Verifying that Raft is enabled
===============================
.. versionadded:: 5.2
You can verify that Raft is enabled on your cluster by performing the following query on each node:
.. code-block:: sql
@@ -224,7 +187,7 @@ The query should return:
on every node.
If the query returns 0 rows, or ``value`` is ``synchronize`` or ``use_pre_raft_procedures``, it means that the cluster is in the middle of the internal Raft upgrade procedure; consult the :ref:`relevant section <verify-raft-procedure>`.
If the query returns 0 rows, or ``value`` is ``synchronize`` or ``use_pre_raft_procedures``, it means that the cluster is in the middle of the Raft upgrade procedure; consult the :ref:`relevant section <verify-raft-procedure>`.
If ``value`` is ``recovery``, it means that the cluster is in the middle of the manual recovery procedure. The procedure must be finished. Consult :ref:`the section about Raft recovery <recover-raft-procedure>`.
@@ -276,12 +239,8 @@ Examples
- Schema updates are possible and safe.
- Try restarting the node. If the node is dead, :doc:`replace it with a new node </operating-scylla/procedures/cluster-management/replace-dead-node/>`.
* - 2 nodes
- Cluster is not fully operational. The data is available for reads and writes, but schema changes are impossible.
- Data is available for reads and writes, schema changes are impossible.
- Restart at least 1 of the 2 nodes that are down to regain quorum. If you cant recover at least 1 of the 2 nodes, consult the :ref:`manual Raft recovery section <recover-raft-procedure>`.
* - 1 datacenter
- Cluster is not fully operational. The data is available for reads and writes, but schema changes are impossible.
- When the DC comes back online, restart the nodes. If the DC does not come back online and nodes are lost, consult the :ref:`manual Raft recovery section <recover-raft-procedure>`.
.. list-table:: Cluster B: 2 datacenters, 6 nodes (3 nodes per DC)
:widths: 20 40 40
@@ -294,10 +253,10 @@ Examples
- Schema updates are possible and safe.
- Try restarting the node(s). If the node is dead, :doc:`replace it with a new node </operating-scylla/procedures/cluster-management/replace-dead-node/>`.
* - 3 nodes
- Cluster is not fully operational. The data is available for reads and writes, but schema changes are impossible.
- Data is available for reads and writes, schema changes are impossible.
- Restart 1 of the 3 nodes that are down to regain quorum. If you cant recover at least 1 of the 3 failed nodes, consult the :ref:`manual Raft recovery section <recover-raft-procedure>`.
* - 1DC
- Cluster is not fully operational. The data is available for reads and writes, but schema changes are impossible.
- Data is available for reads and writes, schema changes are impossible.
- When the DCs come back online, restart the nodes. If the DC fails to come back online and the nodes are lost, consult the :ref:`manual Raft recovery section <recover-raft-procedure>`.
@@ -315,7 +274,7 @@ Examples
- Schema updates are possible and safe.
- When the DC comes back online, try restarting the nodes in the cluster. If the nodes are dead, :doc:`add 3 new nodes in a new region </operating-scylla/procedures/cluster-management/add-dc-to-existing-dc/>`.
* - 2 DCs
- Cluster is not fully operational. The data is available for reads and writes, but schema changes are impossible.
- Data is available for reads and writes, schema changes are impossible.
- When the DCs come back online, restart the nodes. If at least one DC fails to come back online and the nodes are lost, consult the :ref:`manual Raft recovery section <recover-raft-procedure>`.
.. _recover-raft-procedure:
@@ -323,26 +282,24 @@ Examples
Raft manual recovery procedure
==============================
.. versionadded:: 5.2
The manual Raft recovery procedure applies to the following situations:
* :ref:`The internal Raft upgrade procedure <verify-raft-procedure>` got stuck because one of your nodes failed in the middle of the procedure and is irrecoverable,
* :ref:`The Raft upgrade procedure <verify-raft-procedure>` got stuck because one of your nodes failed in the middle of the procedure and is irrecoverable,
* or the cluster was running Raft but a majority of nodes (e.g. 2 our of 3) failed and are irrecoverable. Raft cannot progress unless a majority of nodes is available.
.. warning::
Perform the manual recovery procedure **only** if you're dealing with **irrecoverable** nodes. If it is possible to restart your nodes, do that instead of manual recovery.
.. warning::
.. note::
Before proceeding, make sure that the irrecoverable nodes are truly dead, and not, for example, temporarily partitioned away due to a network failure. If it is possible for the 'dead' nodes to come back to life, they might communicate and interfere with the recovery procedure and cause unpredictable problems.
If you have no means of ensuring that these irrecoverable nodes won't come back to life and communicate with the rest of the cluster, setup firewall rules or otherwise isolate your alive nodes to reject any communication attempts from these dead nodes.
During the manual recovery procedure you'll enter a special ``RECOVERY`` mode, remove all faulty nodes (using the standard :doc:`node removal procedure </operating-scylla/procedures/cluster-management/remove-node/>`), delete the internal Raft data, and restart the cluster. This will cause the cluster to perform the internal Raft upgrade procedure again, initializing the Raft algorithm from scratch. The manual recovery procedure is applicable both to clusters which were not running Raft in the past and then had Raft enabled, and to clusters which were bootstrapped using Raft.
During the manual recovery procedure you'll enter a special ``RECOVERY`` mode, remove all faulty nodes (using the standard :doc:`node removal procedure </operating-scylla/procedures/cluster-management/remove-node/>`), delete the internal Raft data, and restart the cluster. This will cause the cluster to perform the Raft upgrade procedure again, initializing the Raft algorithm from scratch. The manual recovery procedure is applicable both to clusters which were not running Raft in the past and then had Raft enabled, and to clusters which were bootstrapped using Raft.
.. warning::
.. note::
Entering ``RECOVERY`` mode requires a node restart. Restarting an additional node while some nodes are already dead may lead to unavailability of data queries (assuming that you haven't lost it already). For example, if you're using the standard RF=3, CL=QUORUM setup, and you're recovering from a stuck of upgrade procedure because one of your nodes is dead, restarting another node will cause temporary data query unavailability (until the node finishes restarting). Prepare your service for downtime before proceeding.
@@ -393,4 +350,3 @@ Learn More About Raft
* `Making Schema Changes Safe with Raft <https://www.scylladb.com/presentations/making-schema-changes-safe-with-raft/>`_ - A Scylla Summit talk by Konstantin Osipov (register for access)
* `The Future of Consensus in ScyllaDB 5.0 and Beyond <https://www.scylladb.com/presentations/the-future-of-consensus-in-scylladb-5-0-and-beyond/>`_ - A Scylla Summit talk by Tomasz Grabiec (register for access)

View File

@@ -746,9 +746,7 @@ CDC options
.. versionadded:: 3.2 Scylla Open Source
The following options are to be used with Change Data Capture. Available as an experimental feature from Scylla Open Source 3.2.
To use this feature, you must enable the :ref:`experimental tag <yaml_enabling_experimental_features>` in the scylla.yaml.
The following options can be used with Change Data Capture.
+---------------------------+-----------------+------------------------------------------------------------------------------------------------------------------------+
| option | default | description |
@@ -823,7 +821,8 @@ The ``tombstone_gc`` option allows you to prevent data resurrection. With the ``
are only removed after :term:`repair` is performed. Unlike ``gc_grace_seconds``, ``tombstone_gc`` has no time constraints - when
the ``repair`` mode is on, tombstones garbage collection will wait until repair is run.
The ``tombstone_gc`` option can be enabled using ``ALTER TABLE`` and ``CREATE TABLE``. For example:
You can enable the after-repair tombstone GC by setting the ``repair`` mode using
``ALTER TABLE`` or ``CREATE TABLE``. For example:
.. code-block:: cql
@@ -833,10 +832,6 @@ The ``tombstone_gc`` option can be enabled using ``ALTER TABLE`` and ``CREATE TA
ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'repair'} ;
.. note::
The ``tombstone_gc`` option was added in ScyllaDB 5.0 as an experimental feature, and it is disabled by default.
You need to explicitly specify the ``repair`` mode table property to enable the feature.
The following modes are available:
.. list-table::
@@ -846,7 +841,7 @@ The following modes are available:
* - Mode
- Description
* - ``timeout``
- Tombstone GC is performed after the wait time specified with ``gc_grace_seconds``. Default in ScyllaDB 5.0.
- Tombstone GC is performed after the wait time specified with ``gc_grace_seconds`` (default).
* - ``repair``
- Tombstone GC is performed after repair is run.
* - ``disabled``

View File

@@ -609,7 +609,7 @@ of eventual consistency on an event of a timestamp collision:
``INSERT`` statements happening concurrently at different cluster
nodes proceed without coordination. Eventually cell values
supplied by a statement with the highest timestamp will prevail.
supplied by a statement with the highest timestamp will prevail (see :ref:`update ordering <update-ordering>`).
Unless a timestamp is provided by the client, Scylla will automatically
generate a timestamp with microsecond precision for each
@@ -618,7 +618,7 @@ by the same node are unique. Timestamps assigned at different
nodes are not guaranteed to be globally unique.
With a steadily high write rate timestamp collision
is not unlikely. If it happens, i.e. two ``INSERTS`` have the same
timestamp, the lexicographically bigger value prevails:
timestamp, a conflict resolution algorithm determines which of the inserted cells prevails (see :ref:`update ordering <update-ordering>`).
Please refer to the :ref:`UPDATE <update-parameters>` section for more information on the :token:`update_parameter`.
@@ -726,8 +726,8 @@ Similarly to ``INSERT``, ``UPDATE`` statement happening concurrently at differen
cluster nodes proceed without coordination. Cell values
supplied by a statement with the highest timestamp will prevail.
If two ``UPDATE`` statements or ``UPDATE`` and ``INSERT``
statements have the same timestamp,
lexicographically bigger value prevails.
statements have the same timestamp, a conflict resolution algorithm determines which cells prevails
(see :ref:`update ordering <update-ordering>`).
Regarding the :token:`assignment`:
@@ -768,7 +768,7 @@ parameters:
Scylla ensures that query timestamps created by the same coordinator node are unique (even across different shards
on the same node). However, timestamps assigned at different nodes are not guaranteed to be globally unique.
Note that with a steadily high write rate, timestamp collision is not unlikely. If it happens, e.g. two INSERTS
have the same timestamp, conflicting cell values are compared and the cells with the lexicographically bigger value prevail.
have the same timestamp, a conflict resolution algorithm determines which of the inserted cells prevails (see :ref:`update ordering <update-ordering>` for more information):
- ``TTL``: specifies an optional Time To Live (in seconds) for the inserted values. If set, the inserted values are
automatically removed from the database after the specified time. Note that the TTL concerns the inserted values, not
the columns themselves. This means that any subsequent update of the column will also reset the TTL (to whatever TTL
@@ -778,6 +778,55 @@ parameters:
- ``TIMEOUT``: specifies a timeout duration for the specific request.
Please refer to the :ref:`SELECT <using-timeout>` section for more information.
.. _update-ordering:
Update ordering
~~~~~~~~~~~~~~~
:ref:`INSERT <insert-statement>`, :ref:`UPDATE <update-statement>`, and :ref:`DELETE <delete_statement>`
operations are ordered by their ``TIMESTAMP``.
Ordering of such changes is done at the cell level, where each cell carries a write ``TIMESTAMP``,
other attributes related to its expiration when it has a non-zero time-to-live (``TTL``),
and the cell value.
The fundamental rule for ordering cells that insert, update, or delete data in a given row and column
is that the cell with the highest timestamp wins.
However, it is possible that multiple such cells will carry the same ``TIMESTAMP``.
There could be several reasons for ``TIMESTAMP`` collision:
* Benign collision can be caused by "replay" of a mutation, e.g., due to client retry, or due to internal processes.
In such cases, the cells are equivalent, and any of them can be selected arbitrarily.
* ``TIMESTAMP`` collisions might be normally caused by parallel queries that are served
by different coordinator nodes. The coordinators might calculate the same write ``TIMESTAMP``
based on their local time in microseconds.
* Collisions might also happen with user-provided timestamps if the application does not guarantee
unique timestamps with the ``USING TIMESTAMP`` parameter (see :ref:`Update parameters <update-parameters>` for more information).
As said above, in the replay case, ordering of cells should not matter, as they carry the same value
and same expiration attributes, so picking any of them will reach the same result.
However, other ``TIMESTAMP`` conflicts must be resolved in a consistent way by all nodes.
Otherwise, if nodes would have picked an arbitrary cell in case of a conflict and they would
reach different results, reading from different replicas would detect the inconsistency and trigger
read-repair that will generate yet another cell that would still conflict with the existing cells,
with no guarantee for convergence.
Therefore, Scylla implements an internal, consistent conflict-resolution algorithm
that orders cells with conflicting ``TIMESTAMP`` values based on other properties, like:
* whether the cell is a tombstone or a live cell,
* whether the cell has an expiration time,
* the cell ``TTL``,
* and finally, what value the cell carries.
The conflict-resolution algorithm is documented in `Scylla's internal documentation <https://github.com/scylladb/scylladb/blob/master/docs/dev/timestamp-conflict-resolution.md>`_
and it may be subject to change.
Reliable serialization can be achieved using unique write ``TIMESTAMP``
and by using :doc:`Lightweight Transactions (LWT) </using-scylla/lwt>` to ensure atomicity of
:ref:`INSERT <insert-statement>`, :ref:`UPDATE <update-statement>`, and :ref:`DELETE <delete_statement>`.
.. _delete_statement:
DELETE
@@ -817,7 +866,7 @@ For more information on the :token:`update_parameter` refer to the :ref:`UPDATE
In a ``DELETE`` statement, all deletions within the same partition key are applied atomically,
meaning either all columns mentioned in the statement are deleted or none.
If ``DELETE`` statement has the same timestamp as ``INSERT`` or
``UPDATE`` of the same primary key, delete operation prevails.
``UPDATE`` of the same primary key, delete operation prevails (see :ref:`update ordering <update-ordering>`).
A ``DELETE`` operation can be conditional through the use of an ``IF`` clause, similar to ``UPDATE`` and ``INSERT``
statements. Each such ``DELETE`` gets a globally unique timestamp.

View File

@@ -0,0 +1,37 @@
# Timestamp conflict resolution
The fundamental rule for ordering cells that insert, update, or delete data in a given row and column
is that the cell with the highest timestamp wins.
However, it is possible that multiple such cells will carry the same `TIMESTAMP`.
In this case, conflicts must be resolved in a consistent way by all nodes.
Otherwise, if nodes would have picked an arbitrary cell in case of a conflict and they would
reach different results, reading from different replicas would detect the inconsistency and trigger
read-repair that will generate yet another cell that would still conflict with the existing cells,
with no guarantee for convergence.
The first tie-breaking rule when two cells have the same write timestamp is that
dead cells win over live cells; and if both cells are deleted, the one with the later deletion time prevails.
If both cells are alive, their expiration time is examined.
Cells that are written with a non-zero TTL (either implicit, as determined by
the table's default TTL, or explicit, `USING TTL`) are due to expire
TTL seconds after the time they were written (as determined by the coordinator,
and rounded to 1 second resolution). That time is the cell's expiration time.
When cells expire, they become tombstones, shadowing any data written with a write timestamp
less than or equal to the timestamp of the expiring cell.
Therefore, cells that have an expiration time win over cells with no expiration time.
If both cells have an expiration time, the one with the latest expiration time wins;
and if they have the same expiration time (in whole second resolution),
their write time is derived from the expiration time less the original time-to-live value
and the one that was written at a later time prevails.
Finally, if both cells are live and have no expiration, or have the same expiration time and time-to-live,
the cell with the lexicographically bigger value prevails.
Note that when multiple columns are INSERTed or UPDATEed using the same timestamp,
SELECTing those columns might return a result that mixes cells from either upsert.
This may happen when both upserts have no expiration time, or both their expiration time and TTL are the
same, respectively (in whole second resolution). In such a case, cell selection would be based on the cell values
in each column, independently of each other.

View File

@@ -25,7 +25,7 @@ Getting Started
:id: "getting-started"
:class: my-panel
* `Install ScyllaDB (Binary Packages, Docker, or EC2) <https://www.scylladb.com/download/>`_ - Links to the ScyllaDB Download Center
* `Install ScyllaDB (Binary Packages, Docker, or EC2) <https://www.scylladb.com/download/#core>`_ - Links to the ScyllaDB Download Center
* :doc:`Configure ScyllaDB </getting-started/system-configuration/>`
* :doc:`Run ScyllaDB in a Shared Environment </getting-started/scylla-in-a-shared-environment>`

View File

@@ -20,7 +20,7 @@ Install ScyllaDB
Keep your versions up-to-date. The two latest versions are supported. Also always install the latest patches for your version.
* Download and install ScyllaDB Server, Drivers and Tools in `Scylla Download Center <https://www.scylladb.com/download/#server/>`_
* Download and install ScyllaDB Server, Drivers and Tools in `ScyllaDB Download Center <https://www.scylladb.com/download/#core>`_
* :doc:`ScyllaDB Web Installer for Linux <scylla-web-installer>`
* :doc:`ScyllaDB Unified Installer (relocatable executable) <unified-installer>`
* :doc:`Air-gapped Server Installation <air-gapped-install>`

View File

@@ -4,7 +4,7 @@ ScyllaDB Web Installer for Linux
ScyllaDB Web Installer is a platform-agnostic installation script you can run with ``curl`` to install ScyllaDB on Linux.
See `ScyllaDB Download Center <https://www.scylladb.com/download/#server>`_ for information on manually installing ScyllaDB with platform-specific installation packages.
See `ScyllaDB Download Center <https://www.scylladb.com/download/#core>`_ for information on manually installing ScyllaDB with platform-specific installation packages.
Prerequisites
--------------

View File

@@ -25,11 +25,7 @@ ScyllaDB Open Source
.. note::
Recommended OS and ScyllaDB AMI/Image OS for ScyllaDB Open Source:
- Ubuntu 20.04 for versions 4.6 and later.
- CentOS 7 for versions earlier than 4.6.
The recommended OS for ScyllaDB Open Source is Ubuntu 22.04.
+----------------------------+----------------------------------+-----------------------------+---------+-------+
| Linux Distributions | Ubuntu | Debian | CentOS /| Rocky/|
@@ -37,6 +33,8 @@ ScyllaDB Open Source
+----------------------------+------+------+------+------+------+------+------+-------+-------+---------+-------+
| ScyllaDB Version / Version | 14.04| 16.04| 18.04|20.04 |22.04 | 8 | 9 | 10 | 11 | 7 | 8 |
+============================+======+======+======+======+======+======+======+=======+=======+=========+=======+
| 5.2 | |x| | |x| | |x| | |v| | |v| | |x| | |x| | |v| | |v| | |v| | |v| |
+----------------------------+------+------+------+------+------+------+------+-------+-------+---------+-------+
| 5.1 | |x| | |x| | |v| | |v| | |v| | |x| | |x| | |v| | |v| | |v| | |v| |
+----------------------------+------+------+------+------+------+------+------+-------+-------+---------+-------+
| 5.0 | |x| | |x| | |v| | |v| | |v| | |x| | |x| | |v| | |v| | |v| | |v| |
@@ -63,17 +61,18 @@ ScyllaDB Open Source
+----------------------------+------+------+------+------+------+------+------+-------+-------+---------+-------+
All releases are available as a Docker container, EC2 AMI, and a GCP image (GCP image from version 4.3).
All releases are available as a Docker container, EC2 AMI, and a GCP image (GCP image from version 4.3). Since
version 5.2, the ScyllaDB AMI/Image OS for ScyllaDB Open Source is based on Ubuntu 22.04.
ScyllaDB Enterprise
--------------------
.. note::
Recommended OS and ScyllaDB AMI/Image OS for ScyllaDB Enterprise:
- Ubuntu 20.04 for versions 2021.1 and later.
- CentOS 7 for versions earlier than 2021.1.
The recommended OS for ScyllaDB Enterprise is Ubuntu 22.04.
+----------------------------+-----------------------------------+---------------------------+--------+-------+
| Linux Distributions | Ubuntu | Debian | CentOS/| Rocky/|
@@ -81,11 +80,13 @@ ScyllaDB Enterprise
+----------------------------+------+------+------+------+-------+------+------+------+------+--------+-------+
| ScyllaDB Version / Version | 14.04| 16.04| 18.04| 20.04| 22.04 | 8 | 9 | 10 | 11 | 7 | 8 |
+============================+======+======+======+======+=======+======+======+======+======+========+=======+
| 2023.1 | |x| | |x| | |x| | |v| | |v| | |x| | |x| | |v| | |v| | |v| | |v| |
+----------------------------+------+------+------+------+-------+------+------+------+------+--------+-------+
| 2022.2 | |x| | |x| | |v| | |v| | |v| | |x| | |x| | |v| | |v| | |v| | |v| |
+----------------------------+------+------+------+------+-------+------+------+------+------+--------+-------+
| 2022.1 | |x| | |x| | |v| | |v| | |x| | |x| | |x| | |v| | |v| | |v| | |v| |
| 2022.1 | |x| | |x| | |v| | |v| | |v| | |x| | |x| | |v| | |v| | |v| | |v| |
+----------------------------+------+------+------+------+-------+------+------+------+------+--------+-------+
| 2021.1 | |x| | |v| | |v| | |v| | |x| | |x| | |v| | |v| | |x| | |v| | |v| |
| 2021.1 | |x| | |v| | |v| | |v| | |v| | |x| | |v| | |v| | |x| | |v| | |v| |
+----------------------------+------+------+------+------+-------+------+------+------+------+--------+-------+
| 2020.1 | |x| | |v| | |v| | |x| | |x| | |x| | |v| | |v| | |x| | |v| | |v| |
+----------------------------+------+------+------+------+-------+------+------+------+------+--------+-------+
@@ -95,4 +96,5 @@ ScyllaDB Enterprise
+----------------------------+------+------+------+------+-------+------+------+------+------+--------+-------+
All releases are available as a Docker container, EC2 AMI, and a GCP image (GCP image from version 2021.1).
All releases are available as a Docker container, EC2 AMI, and a GCP image (GCP image from version 2021.1). Since
version 2023.1, the ScyllaDB AMI/Image OS for ScyllaDB Enterprise is based on Ubuntu 22.04.

View File

@@ -325,7 +325,7 @@ Storage: each instance can support maximum of 24 local SSD of 375 GB partitions
Microsoft Azure
---------------
The `Lsv2-series <https://azure.microsoft.com/en-us/blog/announcing-the-general-availability-of-lsv2-series-azure-virtual-machines/>`_ features high throughput, low latency, and directly mapped local NVMe storage. The Lsv2 VMs run on the AMD EPYCTM 7551 processor with an all-core boost of 2.55GHz.
The `Lsv3-series <https://learn.microsoft.com/en-us/azure/virtual-machines/lsv3-series/>`_ of Azure Virtual Machines (Azure VMs) features high-throughput, low latency, directly mapped local NVMe storage. These VMs run on the 3rd Generation Intel® Xeon® Platinum 8370C (Ice Lake) processor in a hyper-threaded configuration.
.. list-table::
@@ -336,32 +336,32 @@ The `Lsv2-series <https://azure.microsoft.com/en-us/blog/announcing-the-general-
- vCPU
- Mem (GB)
- Storage
* - L8s_v2
* - Standard_L8s_v3
- 8
- 64
- 1 x 1.92 TB
* - L16s_v2
* - Standard_L16s_v3
- 16
- 128
- 2 x 1.92 TB
* - L32s_v2
* - Standard_L32s_v3
- 32
- 256
- 4 x 1.92 TB
* - L48s_v2
* - Standard_L48s_v3
- 48
- 384
- 6 x 1.92 TB
* - L64s_v2
* - Standard_L64s_v3
- 64
- 512
- 8 x 1.92 TB
* - L80s_v2
* - Standard_L80s_v3
- 80
- 640
- 10 x 1.92 TB
More on Azure Lsv2 instances `here <https://azure.microsoft.com/en-us/blog/announcing-the-general-availability-of-lsv2-series-azure-virtual-machines/>`_
More on Azure Lsv3 instances `here <https://learn.microsoft.com/en-us/azure/virtual-machines/lsv3-series/>`_
Oracle Cloud Infrastructure (OCI)
----------------------------------------

View File

@@ -1,173 +1,53 @@
:full-width:
:hide-version-warning:
:hide-pre-content:
:hide-post-content:
:hide-sidebar:
:hide-secondary-sidebar:
:landing:
:orphan:
.. title:: Welcome to ScyllaDB Documentation
====================================
ScyllaDB Open Source Documentation
====================================
.. hero-box::
:title: Welcome to ScyllaDB Documentation
:image: /_static/img/mascots/scylla-docs.svg
:search_box:
.. meta::
:title: ScyllaDB Open Source Documentation
:description: ScyllaDB Open Source Documentation
:keywords: ScyllaDB Open Source, Scylla Open Source, Scylla docs, ScyllaDB documentation, Scylla Documentation
The most up-to-date documents for the fastest, best performing, high availability NoSQL database.
About This User Guide
-----------------------
.. raw:: html
ScyllaDB is a distributed NoSQL wide-column database for data-intensive apps that require
high performance and low latency.
<div class="landing__content landing__content">
This user guide covers topics related to ScyllaDB Open Source - an open-source project that allows you to evaluate
experimental features, review the `source code <https://github.com/scylladb/scylladb>`_, and add your contributions
to the project.
.. raw:: html
For topics related to other ScyllaDB flavors, see the documentation for `ScyllaDB Enterprise <https://enterprise.docs.scylladb.com/>`_ and
`ScyllaDB Cloud <https://cloud.docs.scylladb.com/>`_.
<div class="topics-grid topics-grid--scrollable grid-container full">
Documentation Highlights
--------------------------
<div class="grid-x grid-margin-x hs">
* :doc:`Install ScyllaDB Open Source </getting-started/install-scylla/index>`
* :doc:`Configure ScyllaDB Open Source </getting-started/system-configuration/>`
* :doc:`Cluster Management Procedures </operating-scylla/procedures/cluster-management/index>`
* :doc:`Upgrade ScyllaDB Open Source </upgrade/index>`
* :doc:`CQL Reference </cql/index>`
* :doc:`ScyllaDB Drivers </using-scylla/drivers/index>`
.. topic-box::
:title: New to ScyllaDB? Start here!
:link: https://cloud.docs.scylladb.com/stable/scylladb-basics/
:class: large-4
:anchor: ScyllaDB Basics
ScyllaDB Community
--------------------------
Learn the essentials of ScyllaDB.
Join the ScyllaDB Open Source community:
* Contribute to the ScyllaDB Open Source `project <https://github.com/scylladb/scylladb>`_.
* Join the `ScyllaDB Community Forum <https://forum.scylladb.com/>`_.
* Join our `Slack Channel <https://slack.scylladb.com/>`_.
* Sign up for the `scylladb-users <https://groups.google.com/d/forum/scylladb-users>`_ Google group.
.. topic-box::
:title: Let us manage your DB
:link: https://cloud.docs.scylladb.com
:class: large-4
:anchor: ScyllaDB Cloud Documentation
Learn How to Use ScyllaDB
---------------------------
Simplify application development with ScyllaDB Cloud - a fully managed database-as-a-service.
.. topic-box::
:title: Manage your own DB
:link: getting-started
:class: large-4
:anchor: ScyllaDB Open Source and Enterprise Documentation
Deploy and manage your database in your own environment.
.. raw:: html
</div></div>
.. raw:: html
<div class="topics-grid topics-grid--products">
<h2 class="topics-grid__title">Our Products</h2>
<div class="grid-container full">
<div class="grid-x grid-margin-x">
.. topic-box::
:title: ScyllaDB Enterprise
:link: getting-started
:image: /_static/img/mascots/scylla-enterprise.svg
:class: topic-box--product,large-3,small-6
ScyllaDBs most stable high-performance enterprise-grade NoSQL database.
.. topic-box::
:title: ScyllaDB Open Source
:link: getting-started
:image: /_static/img/mascots/scylla-opensource.svg
:class: topic-box--product,large-3,small-6
A high-performance NoSQL database with a close-to-the-hardware, shared-nothing approach.
.. topic-box::
:title: ScyllaDB Cloud
:link: https://cloud.docs.scylladb.com
:image: /_static/img/mascots/scylla-cloud.svg
:class: topic-box--product,large-3,small-6
A fully managed NoSQL database as a service powered by ScyllaDB Enterprise.
.. topic-box::
:title: ScyllaDB Alternator
:link: https://docs.scylladb.com/stable/alternator/alternator.html
:image: /_static/img/mascots/scylla-alternator.svg
:class: topic-box--product,large-3,small-6
Open source Amazon DynamoDB-compatible API.
.. topic-box::
:title: ScyllaDB Monitoring Stack
:link: https://monitoring.docs.scylladb.com
:image: /_static/img/mascots/scylla-monitor.svg
:class: topic-box--product,large-3,small-6
Complete open source monitoring solution for your ScyllaDB clusters.
.. topic-box::
:title: ScyllaDB Manager
:link: https://manager.docs.scylladb.com
:image: /_static/img/mascots/scylla-manager.svg
:class: topic-box--product,large-3,small-6
Hassle-free ScyllaDB NoSQL database management for scale-out clusters.
.. topic-box::
:title: ScyllaDB Drivers
:link: https://docs.scylladb.com/stable/using-scylla/drivers/
:image: /_static/img/mascots/scylla-drivers.svg
:class: topic-box--product,large-3,small-6
Shard-aware drivers for superior performance.
.. topic-box::
:title: ScyllaDB Operator
:link: https://operator.docs.scylladb.com
:image: /_static/img/mascots/scylla-enterprise.svg
:class: topic-box--product,large-3,small-6
Easily run and manage your ScyllaDB cluster on Kubernetes.
.. raw:: html
</div></div></div>
.. raw:: html
<div class="topics-grid">
<h2 class="topics-grid__title">Learn More About ScyllaDB</h2>
<p class="topics-grid__text"></p>
<div class="grid-container full">
<div class="grid-x grid-margin-x">
.. topic-box::
:title: Attend ScyllaDB University
:link: https://university.scylladb.com/
:image: /_static/img/mascots/scylla-university.png
:class: large-6,small-12
:anchor: Find a Class
| Register to take a *free* class at ScyllaDB University.
| There are several learning paths to choose from.
.. topic-box::
:title: Register for a Webinar
:link: https://www.scylladb.com/resources/webinars/
:image: /_static/img/mascots/scylla-with-computer-2.png
:class: large-6,small-12
:anchor: Find a Webinar
| You can either participate in a live webinar or see a recording on demand.
| There are several webinars to choose from.
.. raw:: html
</div></div></div>
.. raw:: html
</div>
You can learn to use ScyllaDB by taking **free courses** at `ScyllaDB University <https://university.scylladb.com/>`_.
In addition, you can read our `blog <https://www.scylladb.com/blog/>`_ and attend ScyllaDB's
`webinars, workshops, and conferences <https://www.scylladb.com/company/events/>`_.
.. toctree::
:hidden:
@@ -179,7 +59,6 @@
architecture/index
troubleshooting/index
kb/index
ScyllaDB University <https://university.scylladb.com/>
faq
Contribute to ScyllaDB <contribute>
glossary

View File

@@ -41,14 +41,6 @@ Scylla nodetool repair command supports the following options:
nodetool repair -et 90874935784
nodetool repair --end-token 90874935784
- ``-seq``, ``--sequential`` Use *-seq* to carry out a sequential repair.
For example, a sequential repair of all keyspaces on a node:
::
nodetool repair -seq
- ``-hosts`` ``--in-hosts`` syncs the **repair master** data subset only between a list of nodes, using host ID or Address. The list *must* include the **repair master**.

View File

@@ -3,6 +3,7 @@
* endpoint_snitch - ``grep endpoint_snitch /etc/scylla/scylla.yaml``
* Scylla version - ``scylla --version``
* Authenticator - ``grep authenticator /etc/scylla/scylla.yaml``
* consistent_cluster_management - ``grep consistent_cluster_management /etc/scylla/scylla.yaml``
.. Note::

View File

@@ -119,6 +119,7 @@ Add New DC
* **listen_address** - IP address that Scylla used to connect to the other Scylla nodes in the cluster.
* **endpoint_snitch** - Set the selected snitch.
* **rpc_address** - Address for client connections (Thrift, CQL).
* **consistent_cluster_management** - set to the same value as used by your existing nodes.
The parameters ``seeds``, ``cluster_name`` and ``endpoint_snitch`` need to match the existing cluster.
@@ -200,6 +201,11 @@ Add New DC
#. If you are using Scylla Monitoring, update the `monitoring stack <https://monitoring.docs.scylladb.com/stable/install/monitoring_stack.html#configure-scylla-nodes-from-files>`_ to monitor it. If you are using Scylla Manager, make sure you install the `Manager Agent <https://manager.docs.scylladb.com/stable/install-scylla-manager-agent.html>`_ and Manager can access the new DC.
Handling Failures
=================
If one of the new nodes starts bootstrapping but then fails in the middle e.g. due to a power loss, you can retry bootstrap (by restarting the node). If you don't want to retry, or the node refuses to boot on subsequent attempts, consult the :doc:`Handling Membership Change Failures document</operating-scylla/procedures/cluster-management/handling-membership-change-failures>`.
Configure the Client not to Connect to the New DC
-------------------------------------------------

View File

@@ -54,6 +54,8 @@ Procedure
* **seeds** - Specifies the IP address of an existing node in the cluster. The new node will use this IP to connect to the cluster and learn the cluster topology and state.
* **consistent_cluster_management** - set to the same value as used by your existing nodes.
.. note::
In earlier versions of ScyllaDB, seed nodes assisted in gossip. Starting with Scylla Open Source 4.3 and Scylla Enterprise 2021.1, the seed concept in gossip has been removed. If you are using an earlier version of ScyllaDB, you need to configure the seeds parameter in the following way:
@@ -117,3 +119,8 @@ Procedure
You don't need to restart the Scylla service after modifying the seeds list in ``scylla.yaml``.
#. If you are using Scylla Monitoring, update the `monitoring stack <https://monitoring.docs.scylladb.com/stable/install/monitoring_stack.html#configure-scylla-nodes-from-files>`_ to monitor it. If you are using Scylla Manager, make sure you install the `Manager Agent <https://manager.docs.scylladb.com/stable/install-scylla-manager-agent.html>`_, and Manager can access it.
Handling Failures
=================
If the node starts bootstrapping but then fails in the middle e.g. due to a power loss, you can retry bootstrap (by restarting the node). If you don't want to retry, or the node refuses to boot on subsequent attempts, consult the :doc:`Handling Membership Change Failures document</operating-scylla/procedures/cluster-management/handling-membership-change-failures>`.

View File

@@ -70,6 +70,7 @@ the file can be found under ``/etc/scylla/``
- **listen_address** - IP address that the Scylla use to connect to other Scylla nodes in the cluster
- **endpoint_snitch** - Set the selected snitch
- **rpc_address** - Address for client connection (Thrift, CQLSH)
- **consistent_cluster_management** - ``true`` by default, can be set to ``false`` if you don't want to use Raft for consistent schema management in this cluster (will be mandatory in later versions). Check the :doc:`Raft in ScyllaDB document</architecture/raft/>` to learn more.
3. In the ``cassandra-rackdc.properties`` file, edit the rack and data center information.
The file can be found under ``/etc/scylla/``.

View File

@@ -26,6 +26,7 @@ The file can be found under ``/etc/scylla/``
- **listen_address** - IP address that Scylla used to connect to other Scylla nodes in the cluster
- **endpoint_snitch** - Set the selected snitch
- **rpc_address** - Address for client connection (Thrift, CQL)
- **consistent_cluster_management** - ``true`` by default, can be set to ``false`` if you don't want to use Raft for consistent schema management in this cluster (will be mandatory in later versions). Check the :doc:`Raft in ScyllaDB document</architecture/raft/>` to learn more.
3. This step needs to be done **only** if you are using the **GossipingPropertyFileSnitch**. If not, skip this step.
In the ``cassandra-rackdc.properties`` file, edit the parameters listed below.

View File

@@ -63,6 +63,7 @@ Perform the following steps for each node in the new cluster:
* **rpc_address** - Address for client connection (Thrift, CQL).
* **broadcast_address** - The IP address a node tells other nodes in the cluster to contact it by.
* **broadcast_rpc_address** - Default: unset. The RPC address to broadcast to drivers and other Scylla nodes. It cannot be set to 0.0.0.0. If left blank, it will be set to the value of ``rpc_address``. If ``rpc_address`` is set to 0.0.0.0, ``broadcast_rpc_address`` must be explicitly configured.
* **consistent_cluster_management** - ``true`` by default, can be set to ``false`` if you don't want to use Raft for consistent schema management in this cluster (will be mandatory in later versions). Check the :doc:`Raft in ScyllaDB document</architecture/raft/>` to learn more.
#. After you have installed and configured Scylla and edited ``scylla.yaml`` file on all the nodes, start the node specified with the ``seeds`` parameter. Then start the rest of the nodes in your cluster, one at a time, using
``sudo systemctl start scylla-server``.

View File

@@ -0,0 +1,204 @@
Handling Cluster Membership Change Failures
*******************************************
A failure may happen in the middle of a cluster membership change (that is bootstrap, decommission, removenode, or replace), such as loss of power. If that happens, you should ensure that the cluster is brought back to a consistent state as soon as possible. Further membership changes might be impossible until you do so.
For example, a node that crashed in the middle of decommission might leave the cluster in a state where it considers the node to still be a member, but the node itself will refuse to restart and communicate with the cluster. This particular case is very unlikely - it requires a specifically timed crash to happen, after the data streaming phase of decommission finishes but before the node commits that it left. But if it happens, you won't be able to bootstrap other nodes (they will try to contact the partially-decommissioned node and fail) until you remove the remains of the node that crashed.
---------------------------
Handling a Failed Bootstrap
---------------------------
If a failure happens when trying to bootstrap a new node to the cluster, you can try bootstrapping the node again by restarting it.
If the failure persists or you decided that you don't want to bootstrap the node anymore, follow the instructions in the :ref:`cleaning up after a failed membership change <cleaning-up-after-change>` section to remove the remains of the bootstrapping node. You can then clear the node's data directories and attempt to bootstrap it again.
------------------------------
Handling a Failed Decommission
------------------------------
There are two cases.
Most likely the failure happened during the data repair/streaming phase - before the node tried to leave the token ring. Look for a log message containing "leaving token ring" in the logs of the node that you tried to decommission. For example:
.. code-block:: console
INFO 2023-03-14 13:08:38,323 [shard 0] storage_service - decommission[5b2e752e-964d-4f36-871f-254491f4e8cc]: leaving token ring
If the message is **not** present, the failure happened before the node tried to leave the token ring. In that case you can simply restart the node and attempt to decommission it again.
If the message is present, the node attempted to leave the token ring, but it might have left the cluster only partially before the failure. **Do not try to restart the node**. Instead, you must make sure that the node is dead and remove any leftovers using the :doc:`removenode operation </operating-scylla/nodetool-commands/removenode/>`. See :ref:`cleaning up after a failed membership change <cleaning-up-after-change>`. Trying to restart the node after such failure results in unpredictable behavior - it may restart normally, it may refuse to restart, or it may even try to rebootstrap.
If you don't have access to the node's logs anymore, assume the second case (the node might have attempted to leave the token ring), **do not try to restart the node**, instead follow the :ref:`cleaning up after a failed membership change <cleaning-up-after-change>` section.
----------------------------
Handling a Failed Removenode
----------------------------
Simply retry the removenode operation.
If you somehow lost the host ID of the node that you tried to remove, follow the instructions in :ref:`cleaning up after a failed membership change <cleaning-up-after-change>`.
--------------------------
Handling a Failed Replace
--------------------------
Replace is a special case of bootstrap, but the bootstrapping node tries to take the place of another dead node. You can retry a failed replace operation by restarting the replacing node.
If the failure persists or you decided that you don't want to perform the replace anymore, follow the instructions in :ref:`cleaning up after a failed membership change <cleaning-up-after-change>` section to remove the remains of the replacing node. You can then clear the node's data directories and attempt to replace again. Alternatively, you can remove the dead node which you initially tried to replace using :doc:`removenode </operating-scylla/nodetool-commands/removenode/>`, and perform a regular bootstrap.
.. _cleaning-up-after-change:
--------------------------------------------
Cleaning up after a Failed Membership Change
--------------------------------------------
After a failed membership change, the cluster may contain remains of a node that tried to leave or join - other nodes may consider the node a member, possibly in a transitioning state. It is important to remove any such "ghost" members. Their presence may reduce the cluster's availability, performance, or prevent further membership changes.
You need to determine the host IDs of any potential ghost members, then remove them using the :doc:`removenode operation </operating-scylla/nodetool-commands/removenode/>`. Note that after a failed replace, there may be two different host IDs that you'll want to find and run ``removenode`` on: the new replacing node and the old node that you tried to replace. (Or you can remove the new node only, then try to replace the old node again.)
Step One: Determining Host IDs of Ghost Members
===============================================
* After a failed bootstrap, you need to determine the host ID of the node that tried to bootstrap, if it managed to generate a host ID (it might not have chosen the host ID yet if it failed very early in the procedure, in which case there's nothing to remove). Look for a message containing ``system_keyspace - Setting local host id to`` in the node's logs, which will contain the node's host ID. For example: ``system_keyspace - Setting local host id to f180b78b-6094-434d-8432-7327f4d4b38d``. If you don't have access to the node's logs, read the generic method below.
* After a failed decommission, you need to determine the host ID of the node that tried to decommission. You can search the node's logs as in the failed bootstrap case (see above), or you can use the generic method below.
* After a failed removenode, you need to determine the host ID of the node that you tried to remove. You should already have it, since executing a removenode requires the host ID in the first place. But if you lost it somehow, read the generic method below.
* After a failed replace, you need to determine the host ID of the replacing node. Search the node's logs as in the failed bootstrap case (see above), or you can use the generic method below. You may also want to determine the host ID of the replaced node - either to attempt replacing it again after removing the remains of the previous replacing node, or to remove it using :doc:`nodetool removenode </operating-scylla/nodetool-commands/removenode/>`. You should already have the host ID of the replaced node if you used the ``replace_node_first_boot`` option to perform the replace.
If you cannot determine the ghost members' host ID using the suggestions above, use the method described below. The approach differs depending on whether Raft is enabled in your cluster.
.. tabs::
.. group-tab:: Raft enabled
#. Make sure there are no ongoing membership changes.
#. Execute the following CQL query on one of your nodes to retrieve the Raft group 0 ID:
.. code-block:: cql
select value from system.scylla_local where key = 'raft_group0_id'
For example:
.. code-block:: cql
cqlsh> select value from system.scylla_local where key = 'raft_group0_id';
value
--------------------------------------
607fef80-c276-11ed-a6f6-3075f294cc65
#. Use the obtained Raft group 0 ID to query the set of all cluster members' host IDs (which includes the ghost members), by executing the following query:
.. code-block:: cql
select server_id from system.raft_state where group_id = <group0_id>
replace ``<group0_id>`` with the group 0 ID that you obtained. For example:
.. code-block:: cql
cqlsh> select server_id from system.raft_state where group_id = 607fef80-c276-11ed-a6f6-3075f294cc65;
server_id
--------------------------------------
26a9badc-6e96-4b86-a8df-5173e5ab47fe
7991e7f5-692e-45a0-8ae5-438be5bc7c4f
aff11c6d-fbe7-4395-b7ca-3912d7dba2c6
#. Execute the following CQL query to obtain the host IDs of all token ring members:
.. code-block:: cql
select host_id, up from system.cluster_status;
For example:
.. code-block:: cql
cqlsh> select peer, host_id, up from system.cluster_status;
peer | host_id | up
-----------+--------------------------------------+-------
127.0.0.3 | null | False
127.0.0.1 | 26a9badc-6e96-4b86-a8df-5173e5ab47fe | True
127.0.0.2 | 7991e7f5-692e-45a0-8ae5-438be5bc7c4f | True
The output of this query is similar to the output of ``nodetool status``.
We included the ``up`` column to see which nodes are down and the ``peer`` column to see their IP addresses.
In this example, one of the nodes tried to decommission and crashed as soon as it left the token ring but before it left the Raft group. Its entry will show up in ``system.cluster_status`` queries with ``host_id = null``, like above, until the cluster is restarted.
#. A host ID belongs to a ghost member if:
* It appears in the ``system.raft_state`` query but not in the ``system.cluster_status`` query,
* Or it appears in the ``system.cluster_status`` query but does not correspond to any remaining node in your cluster.
In our example, the ghost member's host ID was ``aff11c6d-fbe7-4395-b7ca-3912d7dba2c6`` because it appeared in the ``system.raft_state`` query but not in the ``system.cluster_status`` query.
If you're unsure whether a given row in the ``system.cluster_status`` query corresponds to a node in your cluster, you can connect to each node in the cluster and execute ``select host_id from system.local`` (or search the node's logs) to obtain that node's host ID, collecting the host IDs of all nodes in your cluster. Then check if each host ID from the ``system.cluster_status`` query appears in your collected set; if not, it's a ghost member.
A good rule of thumb is to look at the members marked as down (``up = False`` in ``system.cluster_status``) - ghost members are eventually marked as down by the remaining members of the cluster. But remember that a real member might also be marked as down if it was shutdown or partitioned away from the rest of the cluster. If in doubt, connect to each node and collect their host IDs, as described in the previous paragraph.
.. group-tab:: Raft disabled
#. Make sure there are no ongoing membership changes.
#. Execute the following CQL query on one of your nodes to obtain the host IDs of all token ring members:
.. code-block:: cql
select peer, host_id, up from system.cluster_status;
For example:
.. code-block:: cql
cqlsh> select peer, host_id, up from system.cluster_status;
peer | host_id | up
-----------+--------------------------------------+-------
127.0.0.3 | 42405b3b-487e-4759-8590-ddb9bdcebdc5 | False
127.0.0.1 | 4e3ee715-528f-4dc9-b10f-7cf294655a9e | True
127.0.0.2 | 225a80d0-633d-45d2-afeb-a5fa422c9bd5 | True
The output of this query is similar to the output of ``nodetool status``.
We included the ``up`` column to see which nodes are down.
In this example, one of the 3 nodes tried to decommission but crashed while it was leaving the token ring. The node is in a partially left state and will refuse to restart, but other nodes still consider it as a normal member. We'll have to use ``removenode`` to clean up after it.
#. A host ID belongs to a ghost member if it appears in the ``system.cluster_status`` query but does not correspond to any remaining node in your cluster.
If you're unsure whether a given row in the ``system.cluster_status`` query corresponds to a node in your cluster, you can connect to each node in the cluster and execute ``select host_id from system.local`` (or search the node's logs) to obtain that node's host ID, collecting the host IDs of all nodes in your cluster. Then check if each host ID from the ``system.cluster_status`` query appears in your collected set; if not, it's a ghost member.
A good rule of thumb is to look at the members marked as down (``up = False`` in ``system.cluster_status``) - ghost members are eventually marked as down by the remaining members of the cluster. But remember that a real member might also be marked as down if it was shutdown or partitioned away from the rest of the cluster. If in doubt, connect to each node and collect their host IDs, as described in the previous paragraph.
In our example, the ghost member's host ID is ``42405b3b-487e-4759-8590-ddb9bdcebdc5`` because it is the only member marked as down and we can verify that the other two rows appearing in ``system.cluster_status`` belong to the remaining 2 nodes in the cluster.
In some cases, even after a failed topology change, there may be no ghost members left - for example, if a bootstrapping node crashed very early in the procedure or a decommissioning node crashed after it committed the membership change but before it finalized its own shutdown steps.
If any ghost members are present, proceed to the next step.
Step Two: Removing the Ghost Members
====================================
Given the host IDs of ghost members, you can remove them using ``removenode``; follow the :doc:`documentation for removenode operation </operating-scylla/nodetool-commands/removenode/>`.
If you're executing ``removenode`` too quickly after a failed membership change, an error similar to the following might pop up:
.. code-block:: console
nodetool: Scylla API server HTTP POST to URL '/storage_service/remove_node' failed: seastar::rpc::remote_verb_error (node_ops_cmd_check: Node 127.0.0.2 rejected node_ops_cmd=removenode_abort from node=127.0.0.1 with ops_uuid=0ba0a5ab-efbd-4801-a31c-034b5f55487c, pending_node_ops={b47523f2-de6a-4c38-8490-39127dba6b6a}, pending node ops is in progress)
In that case simply wait for 2 minutes before trying ``removenode`` again.
If ``removenode`` returns an error like:
.. code-block:: console
nodetool: Scylla API server HTTP POST to URL '/storage_service/remove_node' failed: std::runtime_error (removenode[12e7e05b-d1ae-4978-b6a6-de0066aa80d8]: Host ID 42405b3b-487e-4759-8590-ddb9bdcebdc5 not found in the cluster)
and you're sure that you're providing the correct Host ID, it means that the member was already removed and you don't have to clean up after it.

View File

@@ -25,6 +25,7 @@ Cluster Management Procedures
Safely Shutdown Your Cluster <safe-shutdown>
Safely Restart Your Cluster <safe-start>
Cluster Membership Change <membership-changes>
Handling Membership Change Failures <handling-membership-change-failures>
repair-based-node-operation
.. panel-box::
@@ -80,6 +81,8 @@ Cluster Management Procedures
* :doc:`Cluster Membership Change Notes </operating-scylla/procedures/cluster-management/membership-changes/>`
* :doc:`Handling Membership Change Failures </operating-scylla/procedures/cluster-management/handling-membership-change-failures>`
* :ref:`Add Bigger Nodes to a Cluster <add-bigger-nodes-to-a-cluster>`
* :doc:`Repair Based Node Operations (RBNO) </operating-scylla/procedures/cluster-management/repair-based-node-operation>`

View File

@@ -49,6 +49,11 @@ Removing a Running Node
.. include:: /rst_include/clean-data-code.rst
Handling Failures
-----------------
If ``nodetool decommission`` starts executing but then fails in the middle e.g. due to a power loss, consult the :doc:`Handling Membership Change Failures document</operating-scylla/procedures/cluster-management/handling-membership-change-failures>`.
----------------------------
Removing an Unavailable Node
----------------------------
@@ -81,7 +86,6 @@ the ``nodetool removenode`` operation will fail. To ensure successful operation
``nodetool removenode`` (not required when :doc:`Repair Based Node Operations (RBNO) <repair-based-node-operation>` for ``removenode``
is enabled).
Additional Information
----------------------
* :doc:`Nodetool Reference </operating-scylla/nodetool>`

View File

@@ -25,6 +25,7 @@ Login to one of the nodes in the cluster with (UN) status, collect the following
* seeds - ``cat /etc/scylla/scylla.yaml | grep seeds:``
* endpoint_snitch - ``cat /etc/scylla/scylla.yaml | grep endpoint_snitch``
* Scylla version - ``scylla --version``
* consistent_cluster_management - ``grep consistent_cluster_management /etc/scylla/scylla.yaml``
Procedure
---------

View File

@@ -66,6 +66,8 @@ Procedure
- **rpc_address** - Address for client connection (Thrift, CQL)
- **consistent_cluster_management** - set to the same value as used by your existing nodes.
#. Add the ``replace_node_first_boot`` parameter to the ``scylla.yaml`` config file on the new node. This line can be added to any place in the config file. After a successful node replacement, there is no need to remove it from the ``scylla.yaml`` file. (Note: The obsolete parameters "replace_address" and "replace_address_first_boot" are not supported and should not be used). The value of the ``replace_node_first_boot`` parameter should be the Host ID of the node to be replaced.
For example (using the Host ID of the failed node from above):
@@ -150,6 +152,12 @@ Procedure
.. note::
When :doc:`Repair Based Node Operations (RBNO) <repair-based-node-operation>` for **replace** is enabled, there is no need to rerun repair.
Handling Failures
-----------------
If the new node starts and begins the replace operation but then fails in the middle e.g. due to a power loss, you can retry the replace (by restarting the node). If you don't want to retry, or the node refuses to boot on subsequent attempts, consult the :doc:`Handling Membership Change Failures document</operating-scylla/procedures/cluster-management/handling-membership-change-failures>`.
------------------------------
Setup RAID Following a Restart
------------------------------

View File

@@ -15,6 +15,7 @@ As long as the cluster can satisfy the required consistency level (usually quoru
* :doc:`Hinted Handoff </architecture/anti-entropy/hinted-handoff>`
* :doc:`Read Repair </architecture/anti-entropy/read-repair>`
* :doc:`Repair Based Node Operations </operating-scylla/procedures/cluster-management/repair-based-node-operation>`
* Repair - described in the following sections
Repair Overview
@@ -50,30 +51,6 @@ Row-level repair improves Scylla in two ways:
* keeping the data in a temporary buffer.
* using the cached data to calculate the checksum and send it to the replicas.
Repair Base Operation
---------------------
.. versionadded:: 4.0 Scylla Open Source (disabled)
ScyllaDB has two mechanisms to synchronize data between nodes:
* Streaming - used for cluster topology changes, such as adding or removing nodes.
* Row Level Repair - an offline process that compares and syncs data between nodes .
With *Repair Base Operation*, Scylla uses row-level repair as the unified underlying mechanism for repair operation **and** all node operations, e.g., bootstrap, decommission, remove node, replace node, rebuild node.
This safer process makes the node operations resumable, syncing only the inconsistent data.
Also, replaced nodes now accept writes, which means there is no longer a need to repair after replacing a node.
**This feature is disabled by default.**
You can enable or disable this feature with a configuration parameter in the *scylla.yaml*:
.. code-block:: none
enable_repair_based_node_ops: [true|false]
See also
* `Scylla Manager documentation <https://manager.docs.scylladb.com/>`_

View File

@@ -198,7 +198,7 @@ By default ScyllaDB will try to use cache, but since the data wont be used ag
As a consequence it can lead to bad latency on operational workloads due to increased rate of cache misses.
To prevent this problem, queries from analytical workloads can bypass the cache using the bypass cache option.
:ref:`Bypass Cache <select-statement>` is only available with Scylla Enterprise.
See :ref:`Bypass Cache <bypass-cache>` for more information.
Batching
========

View File

@@ -68,7 +68,7 @@ Gracefully stop the node
.. code:: sh
sudo service scylla-enterprise-server stop
sudo service scylla-server stop
Download and install the new release
------------------------------------
@@ -92,13 +92,13 @@ Start the node
.. code:: sh
sudo service scylla-enterprise-server start
sudo service scylla-server start
Validate
--------
1. Check cluster status with ``nodetool status`` and make sure **all** nodes, including the one you just upgraded, are in UN status.
2. Use ``curl -X GET "http://localhost:10000/storage_service/scylla_release_version"`` to check the ScyllaDB version.
3. Check scylla-enterprise-server log (by ``journalctl _COMM=scylla``) and ``/var/log/syslog`` to validate there are no errors.
3. Check scylla-server log (by ``journalctl _COMM=scylla``) and ``/var/log/syslog`` to validate there are no errors.
4. Check again after 2 minutes to validate no new issues are introduced.
Once you are sure the node upgrade is successful, move to the next node in the cluster.
@@ -130,7 +130,7 @@ Gracefully shutdown ScyllaDB
.. code:: sh
nodetool drain
sudo service scylla-enterprise-server stop
sudo service scylla-server stop
Downgrade to the previous release
----------------------------------
@@ -164,7 +164,7 @@ Start the node
.. code:: sh
sudo service scylla-enterprise-server start
sudo service scylla-server start
Validate
--------

View File

@@ -179,6 +179,19 @@ Restore the configuration files
for conf in $(cat /var/lib/dpkg/info/scylla-*server.conffiles /var/lib/dpkg/info/scylla-*conf.conffiles /var/lib/dpkg/info/scylla-*jmx.conffiles | grep -v init ) /etc/systemd/system/{var-lib-scylla,var-lib-systemd-coredump}.mount; do sudo cp -v $conf.backup-4.3 $conf; done
sudo systemctl daemon-reload (Ubuntu 16.04)
Restore system tables
---------------------
Restore all tables of **system** and **system_schema** from the previous snapshot because 2021.1 uses a different set of system tables. See :doc:`Restore from a Backup and Incremental Backup </operating-scylla/procedures/backup-restore/restore/>` for reference.
.. code:: console
cd /var/lib/scylla/data/keyspace_name/table_name-UUID/
sudo find . -maxdepth 1 -type f -exec sudo rm -f "{}" +
cd /var/lib/scylla/data/keyspace_name/table_name-UUID/snapshots/<snapshot_name>/
sudo cp -r * /var/lib/scylla/data/keyspace_name/table_name-UUID/
sudo chown -R scylla:scylla /var/lib/scylla/data/keyspace_name/table_name-UUID/
Start the node
--------------

View File

@@ -114,7 +114,7 @@ New io.conf format was introduced in ScyllaDB 2.3 and 2019.1. If your io.conf do
.. code:: sh
sudo service scylla-enterprise-server start
sudo service scylla-server start
Validate
--------
@@ -154,7 +154,7 @@ Gracefully shutdown ScyllaDB
.. code:: sh
nodetool drain
sudo service scylla-enterprise-server stop
sudo service scylla-server stop
Download and install the old release
------------------------------------
@@ -184,6 +184,19 @@ Restore the configuration files
for conf in $(cat /var/lib/dpkg/info/scylla-*server.conffiles /var/lib/dpkg/info/scylla-*conf.conffiles /var/lib/dpkg/info/scylla-*jmx.conffiles | grep -v init ) /etc/systemd/system/{var-lib-scylla,var-lib-systemd-coredump}.mount; do sudo cp -v $conf.backup-5.0 $conf; done
sudo systemctl daemon-reload
Restore system tables
---------------------
Restore all tables of **system** and **system_schema** from the previous snapshot because 2022.1 uses a different set of system tables. See :doc:`Restore from a Backup and Incremental Backup </operating-scylla/procedures/backup-restore/restore/>` for reference.
.. code:: console
cd /var/lib/scylla/data/keyspace_name/table_name-UUID/
sudo find . -maxdepth 1 -type f -exec sudo rm -f "{}" +
cd /var/lib/scylla/data/keyspace_name/table_name-UUID/snapshots/<snapshot_name>/
sudo cp -r * /var/lib/scylla/data/keyspace_name/table_name-UUID/
sudo chown -R scylla:scylla /var/lib/scylla/data/keyspace_name/table_name-UUID/
Start the node
--------------

View File

@@ -66,7 +66,7 @@ Gracefully stop the node
.. code:: sh
sudo service scylla-enterprise-server stop
sudo service scylla-server stop
Download and install the new release
------------------------------------

View File

@@ -16,13 +16,13 @@ Start the node
.. code:: sh
sudo service scylla-enterprise-server start
sudo service scylla-server start
Validate
--------
#. Check cluster status with ``nodetool status`` and make sure **all** nodes, including the one you just upgraded, are in UN status.
#. Use ``curl -X GET "http://localhost:10000/storage_service/scylla_release_version"`` to check the ScyllaDB version.
#. Check scylla-enterprise-server log (by ``journalctl _COMM=scylla``) and ``/var/log/syslog`` to validate there are no errors.
#. Check scylla-server log (by ``journalctl _COMM=scylla``) and ``/var/log/syslog`` to validate there are no errors.
#. Check again after 2 minutes to validate no new issues are introduced.
Once you are sure the node upgrade is successful, move to the next node in the cluster.
@@ -54,7 +54,7 @@ Gracefully shutdown ScyllaDB
.. code:: sh
nodetool drain
sudo service scylla-enterprise-server stop
sudo service scylla-server stop
Downgrade to the previous release
----------------------------------
@@ -88,7 +88,7 @@ Start the node
.. code:: sh
sudo service scylla-enterprise-server start
sudo service scylla-server start
Validate
--------

View File

@@ -7,7 +7,7 @@ This document is a step-by-step procedure for upgrading from ScyllaDB Enterprise
Applicable Versions
===================
This guide covers upgrading ScyllaDB Enterprise from version 2021.1.x to ScyllaDB Enterprise version 2022.1.y on |OS|. See :doc:`OS Support by Platform and Version </getting-started/os-support>` for information about supported versions.
This guide covers upgrading ScyllaDB Enterprise from version **2021.1.8** or later to ScyllaDB Enterprise version 2022.1.y on |OS|. See :doc:`OS Support by Platform and Version </getting-started/os-support>` for information about supported versions.
Upgrade Procedure
=================
@@ -69,7 +69,7 @@ Gracefully stop the node
.. code:: sh
sudo service scylla-enterprise-server stop
sudo service scylla-server stop
Download and install the new release
------------------------------------

View File

@@ -36,13 +36,13 @@ A new io.conf format was introduced in Scylla 2.3 and 2019.1. If your io.conf do
.. code:: sh
sudo service scylla-enterprise-server start
sudo service scylla-server start
Validate
--------
#. Check cluster status with ``nodetool status`` and make sure **all** nodes, including the one you just upgraded, are in UN status.
#. Use ``curl -X GET "http://localhost:10000/storage_service/scylla_release_version"`` to check the ScyllaDB version.
#. Check scylla-enterprise-server log (by ``journalctl _COMM=scylla``) and ``/var/log/syslog`` to validate there are no errors.
#. Check scylla-server log (by ``journalctl _COMM=scylla``) and ``/var/log/syslog`` to validate there are no errors.
#. Check again after two minutes to validate no new issues are introduced.
Once you are sure the node upgrade is successful, move to the next node in the cluster.
@@ -75,7 +75,7 @@ Gracefully shutdown ScyllaDB
.. code:: sh
nodetool drain
sudo service scylla-enterprise-server stop
sudo service scylla-server stop
Download and install the old release
------------------------------------
@@ -120,7 +120,7 @@ Start the node
.. code:: sh
sudo service scylla-enterprise-server start
sudo service scylla-server start
Validate
--------

View File

@@ -8,8 +8,8 @@ Upgrading ScyllaDB images requires updating:
* Underlying OS packages. Starting with ScyllaDB 4.6, each ScyllaDB version includes a list of 3rd party and
OS packages tested with the ScyllaDB release. The list depends on the base OS:
* ScyllaDB Open Source **4.4** and ScyllaDB Enterprise **2020.1** or earlier are based on **CentOS 7**.
* ScyllaDB Open Source **4.5** and ScyllaDB Enterprise **2021.1** or later are based on **Ubuntu 20.04**.
* ScyllaDB Open Source **5.0 and 5.1** and ScyllaDB Enterprise **2021.1, 2022.1, and 2022.2** are based on **Ubuntu 20.04**.
* ScyllaDB Open Source **5.2** and ScyllaDB Enterprise **2023.1** are based on **Ubuntu 22.04**.
If you're running ScyllaDB Open Source 5.0 or later or ScyllaDB Enterprise 2021.1.10 or later, you can
automatically update 3rd party and OS packages together with the ScyllaDB packages - by running one command.

View File

@@ -6,10 +6,10 @@ Upgrade ScyllaDB
:titlesonly:
:hidden:
ScyllaDB Enterprise <upgrade-enterprise/index>
ScyllaDB Open Source <upgrade-opensource/index>
ScyllaDB Open Source to ScyllaDB Enterprise <upgrade-to-enterprise/index>
ScyllaDB AMI <ami-upgrade>
ScyllaDB Enterprise <https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/index.html>
.. raw:: html
@@ -23,14 +23,14 @@ Upgrade ScyllaDB
Procedures for upgrading Scylla.
* :doc:`Upgrade ScyllaDB Enterprise <upgrade-enterprise/index>`
* :doc:`Upgrade ScyllaDB Open Source <upgrade-opensource/index>`
* :doc:`Upgrade from ScyllaDB Open Source to Scylla Enterprise <upgrade-to-enterprise/index>`
* :doc:`Upgrade ScyllaDB AMI <ami-upgrade>`
* `Upgrade ScyllaDB Enterprise <https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/index.html>`_
.. raw:: html

View File

@@ -1,17 +0,0 @@
.. include:: /upgrade/upgrade-enterprise/_common/gossip_generation_bug_warning.rst
.. note::
Scylla Enterprise 2019.1.6 added a new configuration to restrict the memory usage cartesian product IN queries.
If you are using IN in SELECT operations and hitting a *"cartesian product size ... is greater than maximum"* error, you can either update the query (recommended) or bypass the warning temporarily by adding the following parameters to *scylla.yaml*:
* *max_clustering_key_restrictions_per_query: 1000*
* *max_partition_key_restrictions_per_query: 1000*
The higher the values, the more likely you will hit an out of memory issue.
.. note::
Scylla Enterprise 2019.1.8 added a new configuration to restrict the memory usage of reverse queries.
If you are using reverse queries and hitting an error *"Aborting reverse partition read because partition ... is larger than the maximum safe size of ... for reversible partitions"* see the :doc:`reverse queries FAQ section </troubleshooting/reverse-queries>`.

View File

@@ -1,4 +0,0 @@
.. include:: /upgrade/upgrade-enterprise/_common/gossip_generation_bug_warning.rst
.. include:: /upgrade/upgrade-enterprise/_common/mv_si_rebuild_warning.rst

View File

@@ -1,10 +0,0 @@
.. note:: The note is only useful when CDC is GA supported in the target Scylla. Execute the following commands one node at the time, moving to the next node only **after** the upgrade procedure completed successfully.
.. warning::
If you are using CDC and upgrading Scylla 2020.1 to 2021.1, please review the API updates in :doc:`querying CDC streams </using-scylla/cdc/cdc-querying-streams>` and :doc:`CDC stream generations </using-scylla/cdc/cdc-stream-generations>`.
In particular, you should update applications that use CDC according to :ref:`CDC Upgrade notes <scylla-4-3-to-4-4-upgrade>` **before** upgrading the cluster to 2021.1.
If you are using CDC and upgrading from pre 2020.1 version to 2020.1, note the :doc:`upgrading from experimental CDC </kb/cdc-experimental-upgrade>`.
.. include:: /upgrade/upgrade-enterprise/_common/mv_si_rebuild_warning.rst

View File

@@ -1,6 +0,0 @@
.. note:: The note is only useful when CDC is GA supported in the target ScyllaDB. Execute the following commands one node at a time, moving to the next node only **after** the upgrade procedure completed successfully.
.. warning::
If you are using CDC and upgrading ScyllaDB 2021.1 to 2022.1, please review the API updates in :doc:`querying CDC streams </using-scylla/cdc/cdc-querying-streams>` and :doc:`CDC stream generations </using-scylla/cdc/cdc-stream-generations>`.
In particular, you should update applications that use CDC according to :ref:`CDC Upgrade notes <scylla-4-3-to-4-4-upgrade>` **before** upgrading the cluster to 2022.1.

View File

@@ -1,9 +0,0 @@
.. note::
If **any** of your instances are running Scylla Enterprise 2019.1.6 or earlier, **and** one of your Scylla nodes is up for more than a year, you might have been exposed to issue `#6063 <https://github.com/scylladb/scylla/pull/6083>`_.
One way to check this is by comparing `Generation No` (from `nodetool gossipinfo` output) with the current time in Epoch format (`date +%s`), and check if the difference is higher than one year (31536000 seconds).
See `scylla-check-gossiper-generation <https://github.com/scylladb/scylla-code-samples/tree/master/scylla-check-gossiper-generation>`_ for a script to do just that.
If this is the case, do **not** initiate the upgrade process before consulting with Scylla Support for further instructions.

View File

@@ -1,6 +0,0 @@
.. warning::
If you are using materialized views or secondary indexes created in Scylla 2019.1.x and, **while** upgrading to 2020.1.x (7 or lower) updated your schema; you might have MV inconsistency.
To fix: rebuild the MV.
It is recommended to avoid schema and topology updates during upgrade (mix cluster).

View File

@@ -1,50 +0,0 @@
=============================
Upgrade Scylla Enterprise
=============================
.. toctree::
:hidden:
:titlesonly:
ScyllaDB Enterprise 2022 <upgrade-guide-from-2022.x.y-to-2022.x.z/index>
ScyllaDB Enterprise 2021 <upgrade-guide-from-2021.x.y-to-2021.x.z/index>
ScyllaDB Enterprise 2020 <upgrade-guide-from-2020.x.y-to-2020.x.z/index>
ScyllaDB Enterprise 2019 <upgrade-guide-from-2019.x.y-to-2019.x.z/index>
ScyllaDB Enterprise 2018 <upgrade-guide-from-2018.x.y-to-2018.x.z/index>
ScyllaDB Enterprise 2017 <upgrade-guide-from-2017.x.y-to-2017.x.z/index>
ScyllaDB Enterprise 2022.1 to Scylla Enterprise 2022.2 <upgrade-guide-from-2022.1-to-2022.2/index>
ScyllaDB Enterprise 2021.1 to Scylla Enterprise 2022.1 <upgrade-guide-from-2021.1-to-2022.1/index>
ScyllaDB Enterprise 2020.1 to Scylla Enterprise 2021.1 <upgrade-guide-from-2020.1-to-2021.1/index>
ScyllaDB Enterprise 2019.1 to Scylla Enterprise 2020.1 <upgrade-guide-from-2019.1-to-2020.1/index>
ScyllaDB Enterprise 2018.1 to Scylla Enterprise 2019.1 <upgrade-guide-from-2018.1-to-2019.1/index>
ScyllaDB Enterprise 2017.1 to Scylla Enterprise 2018.1 <upgrade-guide-from-2017.1-to-2018.1/index>
Ubuntu 14.04 to 16.04 <upgrade-guide-from-ubuntu-14-to-16>
.. panel-box::
:title: Upgrade ScyllaDB Enterprise
:id: "getting-started"
:class: my-panel
Procedures for upgrading to a new version of ScyllaDB Enterprise.
Patch Release Upgrade
* :doc:`Upgrade Guide - ScyllaDB Enterprise 2022.x </upgrade/upgrade-enterprise/upgrade-guide-from-2022.x.y-to-2022.x.z/index>`
* :doc:`Upgrade Guide - ScyllaDB Enterprise 2021.x <upgrade-guide-from-2021.x.y-to-2021.x.z/index>`
* :doc:`Upgrade Guide - ScyllaDB Enterprise 2020.x <upgrade-guide-from-2020.x.y-to-2020.x.z/index>`
* :doc:`Upgrade Guide - ScyllaDB Enterprise 2019.x <upgrade-guide-from-2019.x.y-to-2019.x.z/index>`
* :doc:`Upgrade Guide - ScyllaDB Enterprise 2018.x <upgrade-guide-from-2018.x.y-to-2018.x.z/index>`
* :doc:`Upgrade Guide - ScyllaDB Enterprise 2017.x <upgrade-guide-from-2017.x.y-to-2017.x.z/index>`
Major Release Upgrade
* :doc:`Upgrade Guide - From ScyllaDB Enterprise 2022.1 to Scylla Enterprise 2022.2 (minor release) <upgrade-guide-from-2022.1-to-2022.2/index>`
* :doc:`Upgrade Guide - From ScyllaDB Enterprise 2021.1 to Scylla Enterprise 2022.1 <upgrade-guide-from-2021.1-to-2022.1/index>`
* :doc:`Upgrade Guide - From ScyllaDB Enterprise 2020.1 to Scylla Enterprise 2021.1 <upgrade-guide-from-2020.1-to-2021.1/index>`
* :doc:`Upgrade Guide - From ScyllaDB Enterprise 2019.1 to Scylla Enterprise 2020.1 <upgrade-guide-from-2019.1-to-2020.1/index>`
* :doc:`Upgrade Guide - From ScyllaDB Enterprise 2018.1 to Scylla Enterprise 2019.1 <upgrade-guide-from-2018.1-to-2019.1/index>`
* :doc:`Upgrade Guide - From ScyllaDB Enterprise 2017.1 to Scylla Enterprise 2018.1 <upgrade-guide-from-2017.1-to-2018.1/index>`
* :doc:`Upgrade Guide - Ubuntu 14.04 to 16.04 <upgrade-guide-from-ubuntu-14-to-16>`
* :ref:`Upgrade Unified Installer (relocatable executable) install <unified-installed-upgrade>`

View File

@@ -1,39 +0,0 @@
==================================================
Upgrade from Scylla Enterprise 2017.1 to 2018.1
==================================================
.. toctree::
:hidden:
:titlesonly:
Red Hat Enterprise Linux and CentOS <upgrade-guide-from-2017.1-to-2018.1-rpm>
Ubuntu <upgrade-guide-from-2017.1-to-2018.1-ubuntu>
Debian <upgrade-guide-from-2017.1-to-2018.1-debian>
Metrics <metric-update-2017.1-to-2018.1>
.. raw:: html
<div class="panel callout radius animated">
<div class="row">
<div class="medium-3 columns">
<h5 id="getting-started">Upgrade to Scylla Enterprise 2018.1</h5>
</div>
<div class="medium-9 columns">
Upgrade guides are available for:
* :doc:`Upgrade Scylla Enterprise from 2017.1.x to 2018.1.y on Red Hat Enterprise Linux and CentOS <upgrade-guide-from-2017.1-to-2018.1-rpm>`
* :doc:`Upgrade Scylla Enterprise from 2017.1.x to 2018.1.y on Ubuntu <upgrade-guide-from-2017.1-to-2018.1-ubuntu>`
* :doc:`Upgrade Scylla Enterprise from 2017.1.x to 2018.1.y on Debian <upgrade-guide-from-2017.1-to-2018.1-debian>`
* :doc:`Scylla Enterprise Metrics Update - Scylla 2017.1 to 2018.1<metric-update-2017.1-to-2018.1>`
.. raw:: html
</div>
</div>
</div>

View File

@@ -1,291 +0,0 @@
====================================================================
Scylla Enterprise Metric Update - Scylla Enterprise 2017.1 to 2018.1
====================================================================
Updated Metrics
~~~~~~~~~~~~~~~
The following metric names have changed between Scylla Enterprise 2017.1 and 2018.1
=========================================================================== ===========================================================================
2017.1 2018.1
=========================================================================== ===========================================================================
scylla_batchlog_manager_total_operations_total_write_replay_attempts scylla_batchlog_manager_total_write_replay_attempts
scylla_cache_objects_partitions scylla_cache_partitions
scylla_cache_total_operations_concurrent_misses_same_key scylla_cache_concurrent_misses_same_key
scylla_cache_total_operations_evictions scylla_cache_partition_evictions
scylla_cache_total_operations_hits scylla_cache_partition_hits
scylla_cache_total_operations_insertions scylla_cache_partition_insertions
scylla_cache_total_operations_merges scylla_cache_partition_merges
scylla_cache_total_operations_misses scylla_cache_partition_misses
scylla_cache_total_operations_removals scylla_cache_partition_removals
scylla_commitlog_memory_buffer_list_bytes scylla_commitlog_memory_buffer_bytes
scylla_commitlog_memory_total_size scylla_commitlog_disk_total_bytes
scylla_commitlog_queue_length_allocating_segments scylla_commitlog_allocating_segments
scylla_commitlog_queue_length_pending_allocations scylla_commitlog_pending_allocations
scylla_commitlog_queue_length_pending_flushes scylla_commitlog_pending_flushes
scylla_commitlog_queue_length_segments scylla_commitlog_segments
scylla_commitlog_queue_length_unused_segments scylla_commitlog_unused_segments
scylla_commitlog_total_bytes_slack scylla_commitlog_slack
scylla_commitlog_total_bytes_written scylla_commitlog_bytes_written
scylla_commitlog_total_operations_alloc scylla_commitlog_alloc
scylla_commitlog_total_operations_cycle scylla_commitlog_cycle
scylla_commitlog_total_operations_flush scylla_commitlog_flush
scylla_commitlog_total_operations_flush_limit_exceeded scylla_commitlog_flush_limit_exceeded
scylla_commitlog_total_operations_requests_blocked_memory scylla_commitlog_requests_blocked_memory
scylla_compaction_manager_objects_compactions scylla_compaction_manager_compactions
scylla_cql_total_operations_batches scylla_cql_batches
scylla_cql_total_operations_deletes scylla_cql_deletes
scylla_cql_total_operations_inserts scylla_cql_inserts
scylla_cql_total_operations_reads scylla_cql_reads
scylla_cql_total_operations_updates scylla_cql_updates
scylla_database_bytes_total_result_memory scylla_database_total_result_bytes
scylla_database_queue_length_active_reads scylla_database_active_reads
scylla_database_queue_length_active_reads_streaming scylla_database_active_reads
scylla_database_queue_length_active_reads_system_keyspace scylla_database_active_reads
scylla_database_queue_length_queued_reads scylla_database_queued_reads
scylla_database_queue_length_queued_reads_streaming scylla_database_queued_reads
scylla_database_queue_length_queued_reads_system_keyspace scylla_database_queued_reads
scylla_database_queue_length_requests_blocked_memory scylla_database_requests_blocked_memory_current
scylla_database_total_operations_clustering_filter_count scylla_database_clustering_filter_count
scylla_database_total_operations_clustering_filter_fast_path_count scylla_database_clustering_filter_fast_path_count
scylla_database_total_operations_clustering_filter_sstables_checked scylla_database_clustering_filter_sstables_checked
scylla_database_total_operations_clustering_filter_surviving_sstables scylla_database_clustering_filter_surviving_sstables
scylla_database_total_operations_requests_blocked_memory scylla_database_requests_blocked_memory
scylla_database_total_operations_short_data_queries scylla_database_short_data_queries
scylla_database_total_operations_short_mutation_queries scylla_database_short_mutation_queries
scylla_database_total_operations_sstable_read_queue_overloads scylla_database_sstable_read_queue_overloads
scylla_database_total_operations_total_reads scylla_database_total_reads
scylla_database_total_operations_total_reads_failed scylla_database_total_reads_failed
scylla_database_total_operations_total_writes scylla_database_total_writes
scylla_database_total_operations_total_writes_failed scylla_database_total_writes_failed
scylla_database_total_operations_total_writes_timedout scylla_database_total_writes_timedout
scylla_gossip_derive_heart_beat_version scylla_gossip_heart_beat
scylla_http_0_connections_http_connections scylla_httpd_connections_total
scylla_http_0_current_connections_current scylla_httpd_connections_current
scylla_http_0_http_requests_served scylla_httpd_requests_served
scylla_io_queue_delay_commitlog scylla_io_queue_commitlog_delay
scylla_io_queue_delay_compaction scylla_io_queue_compaction_delay
scylla_io_queue_delay_default scylla_io_queue_default_delay
scylla_io_queue_delay_memtable_flush scylla_io_queue_memtable_flush_delay
scylla_io_queue_derive_commitlog scylla_io_queue_commitlog_total_bytes
scylla_io_queue_derive_compaction scylla_io_queue_compaction_total_bytes
scylla_io_queue_derive_default scylla_io_queue_default_total_bytes
scylla_io_queue_derive_memtable_flush scylla_io_queue_memtable_flush_total_bytes
scylla_io_queue_queue_length_commitlog scylla_io_queue_commitlog_queue_length
scylla_io_queue_queue_length_compaction scylla_io_queue_compaction_queue_length
scylla_io_queue_queue_length_default scylla_io_queue_default_queue_length
scylla_io_queue_queue_length_memtable_flush scylla_io_queue_memtable_flush_queue_length
scylla_io_queue_total_operations_commitlog scylla_io_queue_commitlog_total_operations
scylla_io_queue_total_operations_compaction scylla_io_queue_compaction_total_operations
scylla_io_queue_total_operations_default scylla_io_queue_default_total_operations
scylla_io_queue_total_operations_memtable_flush scylla_io_queue_memtable_flush_total_operations
scylla_lsa_bytes_free_space_in_zones scylla_lsa_free_space_in_zones
scylla_lsa_bytes_large_objects_total_space scylla_lsa_large_objects_total_space_bytes
scylla_lsa_bytes_non_lsa_used_space scylla_lsa_non_lsa_used_space_bytes
scylla_lsa_bytes_small_objects_total_space scylla_lsa_small_objects_total_space_bytes
scylla_lsa_bytes_small_objects_used_space scylla_lsa_small_objects_used_space_bytes
scylla_lsa_bytes_total_space scylla_lsa_total_space_bytes
scylla_lsa_bytes_used_space scylla_lsa_used_space_bytes
scylla_lsa_objects_zones scylla_lsa_zones
scylla_lsa_operations_segments_compacted scylla_lsa_segments_compacted
scylla_lsa_operations_segments_migrated scylla_lsa_segments_migrated
scylla_lsa_percent_occupancy scylla_lsa_occupancy
scylla_memory_bytes_dirty scylla_memory_dirty_bytes
scylla_memory_bytes_regular_dirty scylla_memory_regular_dirty_bytes
scylla_memory_bytes_regular_virtual_dirty scylla_memory_regular_virtual_dirty_bytes
scylla_memory_bytes_streaming_dirty scylla_memory_streaming_dirty_bytes
scylla_memory_bytes_streaming_virtual_dirty scylla_memory_streaming_virtual_dirty_bytes
scylla_memory_bytes_system_dirty scylla_memory_system_dirty_bytes
scylla_memory_bytes_system_virtual_dirty scylla_memory_system_virtual_dirty_bytes
scylla_memory_bytes_virtual_dirty scylla_memory_virtual_dirty_bytes
scylla_memory_memory_allocated_memory scylla_memory_allocated_memory
scylla_memory_memory_free_memory scylla_memory_free_memory
scylla_memory_memory_total_memory scylla_memory_total_memory
scylla_memory_objects_malloc scylla_memory_malloc_live_objects
scylla_memory_total_operations_cross_cpu_free scylla_memory_cross_cpu_free_operations
scylla_memory_total_operations_free scylla_memory_free_operations
scylla_memory_total_operations_malloc scylla_memory_malloc_operations
scylla_memory_total_operations_reclaims scylla_memory_reclaims_operations
scylla_memtables_bytes_pending_flushes scylla_memtables_pending_flushes
scylla_memtables_queue_length_pending_flushes scylla_memtables_pending_flushes_bytes
scylla_query_processor_total_operations_statements_prepared scylla_query_processor_statements_prepared
scylla_reactor_derive_aio_read_bytes scylla_reactor_aio_bytes_read
scylla_reactor_derive_aio_write_bytes scylla_reactor_aio_bytes_write
scylla_reactor_derive_busy_ns scylla_reactor_cpu_busy_ns
scylla_reactor_derive_polls scylla_reactor_polls
scylla_reactor_gauge_load scylla_reactor_utilization
scylla_reactor_gauge_queued_io_requests scylla_reactor_io_queue_requests
scylla_reactor_queue_length_tasks_pending scylla_reactor_tasks_pending
scylla_reactor_queue_length_timers_pending scylla_reactor_timers_pending
scylla_reactor_total_operations_aio_reads scylla_reactor_aio_reads
scylla_reactor_total_operations_aio_writes scylla_reactor_aio_writes
scylla_reactor_total_operations_cexceptions scylla_reactor_cpp_exceptions
scylla_reactor_total_operations_fsyncs scylla_reactor_fsyncs
scylla_reactor_total_operations_io_threaded_fallbacks scylla_reactor_io_threaded_fallbacks
scylla_reactor_total_operations_logging_failures scylla_reactor_logging_failures
scylla_reactor_total_operations_tasks_processed scylla_reactor_tasks_processed
scylla_storage_proxy_coordinator_background_reads scylla_storage_proxy_coordinator_background_read_repairs
scylla_storage_proxy_coordinator_completed_data_reads_local_node scylla_storage_proxy_coordinator_completed_reads_local_node
scylla_storage_proxy_coordinator_data_read_errors_local_node scylla_storage_proxy_coordinator_read_errors_local_node
scylla_storage_proxy_coordinator_data_reads_local_node scylla_storage_proxy_coordinator_reads_local_node
scylla_streaming_derive_total_incoming_bytes scylla_streaming_total_incoming_bytes
scylla_streaming_derive_total_outgoing_bytes scylla_streaming_total_outgoing_bytes
scylla_thrift_connections_thrift_connections scylla_thrift_current_connections
scylla_thrift_current_connections_current scylla_thrift_thrift_connections
scylla_thrift_total_requests_served scylla_thrift_served
scylla_tracing_keyspace_helper_total_operations_bad_column_family_errors scylla_tracing_keyspace_helper_bad_column_family_errors
scylla_tracing_keyspace_helper_total_operations_tracing_errors scylla_tracing_keyspace_helper_tracing_errors
scylla_tracing_queue_length_active_sessions scylla_tracing_active_sessions
scylla_tracing_queue_length_cached_records scylla_tracing_cached_records
scylla_tracing_queue_length_flushing_records scylla_tracing_flushing_records
scylla_tracing_queue_length_pending_for_write_records scylla_tracing_pending_for_write_records
scylla_tracing_total_operations_dropped_records scylla_tracing_dropped_records
scylla_tracing_total_operations_dropped_sessions scylla_tracing_dropped_sessions
scylla_tracing_total_operations_trace_errors scylla_tracing_trace_errors
scylla_tracing_total_operations_trace_records_count scylla_tracing_trace_records_count
scylla_transport_connections_cql_connections scylla_transport_cql_connections
scylla_transport_current_connections_current scylla_transport_current_connections
scylla_transport_queue_length_requests_blocked_memory scylla_transport_requests_blocked_memory
scylla_transport_queue_length_requests_serving scylla_transport_requests_serving
scylla_transport_total_requests_requests_served scylla_transport_requests_served
=========================================================================== ===========================================================================
New Metrics
~~~~~~~~~~~
The following metrics are new in 2018.1
+--------------------------------------------------------------------------+
| New Metric Name |
+==========================================================================+
| scylla_cache_active_reads |
+--------------------------------------------------------------------------+
| scylla_cache_garbage_partitions |
+--------------------------------------------------------------------------+
| scylla_cache_mispopulations |
+--------------------------------------------------------------------------+
| scylla_cache_evictions_from_garbage |
+--------------------------------------------------------------------------+
| scylla_cache_pinned_dirty_memory_overload |
+--------------------------------------------------------------------------+
| scylla_cache_reads |
+--------------------------------------------------------------------------+
| scylla_cache_reads_with_misses |
+--------------------------------------------------------------------------+
| scylla_cache_row_hits |
+--------------------------------------------------------------------------+
| scylla_cache_row_insertions |
+--------------------------------------------------------------------------+
| scylla_cache_row_misses |
+--------------------------------------------------------------------------+
| scylla_cache_sstable_partition_skips |
+--------------------------------------------------------------------------+
| scylla_cache_sstable_reader_recreations |
+--------------------------------------------------------------------------+
| scylla_cache_sstable_row_skips |
+--------------------------------------------------------------------------+
| scylla_cql_batches_pure_logged |
+--------------------------------------------------------------------------+
| scylla_cql_batches_pure_unlogged |
+--------------------------------------------------------------------------+
| scylla_cql_batches_unlogged_from_logged |
+--------------------------------------------------------------------------+
| scylla_cql_prepared_cache_evictions |
+--------------------------------------------------------------------------+
| scylla_cql_prepared_cache_memory_footprint |
+--------------------------------------------------------------------------+
| scylla_cql_prepared_cache_size |
+--------------------------------------------------------------------------+
| scylla_cql_statements_in_batches |
+--------------------------------------------------------------------------+
| scylla_database_active_reads_memory_consumption |
+--------------------------------------------------------------------------+
| scylla_database_counter_cell_lock_acquisition |
+--------------------------------------------------------------------------+
| scylla_database_counter_cell_lock_pending |
+--------------------------------------------------------------------------+
| scylla_database_cpu_flush_quota |
+--------------------------------------------------------------------------+
| scylla_execution_stages_function_calls_enqueued |
+--------------------------------------------------------------------------+
| scylla_execution_stages_function_calls_executed |
+--------------------------------------------------------------------------+
| scylla_execution_stages_tasks_preempted |
+--------------------------------------------------------------------------+
| scylla_execution_stages_tasks_scheduled |
+--------------------------------------------------------------------------+
| scylla_httpd_read_errors |
+--------------------------------------------------------------------------+
| scylla_httpd_reply_errors |
+--------------------------------------------------------------------------+
| scylla_scheduler_queue_length |
+--------------------------------------------------------------------------+
| scylla_scheduler_runtime_ms |
+--------------------------------------------------------------------------+
| scylla_scheduler_shares |
+--------------------------------------------------------------------------+
| scylla_scheduler_tasks_processed |
+--------------------------------------------------------------------------+
| scylla_scylladb_current_version |
+--------------------------------------------------------------------------+
| scylla_sstables_index_page_blocks |
+--------------------------------------------------------------------------+
| scylla_sstables_index_page_hits |
+--------------------------------------------------------------------------+
| scylla_sstables_index_page_misses |
+--------------------------------------------------------------------------+
| scylla_storage_proxy_coordinator_background_reads |
+--------------------------------------------------------------------------+
| scylla_storage_proxy_coordinator_foreground_read_repair |
+--------------------------------------------------------------------------+
| scylla_storage_proxy_coordinator_foreground_reads |
+--------------------------------------------------------------------------+
| scylla_storage_proxy_coordinator_read_latency |
+--------------------------------------------------------------------------+
| scylla_storage_proxy_coordinator_write_latency |
+--------------------------------------------------------------------------+
| scylla_storage_proxy_replica_reads |
+--------------------------------------------------------------------------+
| scylla_storage_proxy_replica_received_counter_updates |
+--------------------------------------------------------------------------+
| scylla_transport_unpaged_queries |
+--------------------------------------------------------------------------+
Deprecated Metrics
~~~~~~~~~~~~~~~~~~
The following metrics are deprecated in 2018.1
+--------------------------------------------------------------------------+
| Deprecated Metric Name |
+==========================================================================+
| scylla_cache_total_operations_uncached_wide_partitions |
+--------------------------------------------------------------------------+
| scylla_cache_total_operations_wide_partition_evictions |
+--------------------------------------------------------------------------+
| scylla_io_queue_delay_query |
+--------------------------------------------------------------------------+
| scylla_io_queue_derive_query |
+--------------------------------------------------------------------------+
| scylla_io_queue_queue_length_query |
+--------------------------------------------------------------------------+
| scylla_io_queue_total_operations_query |
+--------------------------------------------------------------------------+
| scylla_storage_proxy_coordinator_digest_read_errors_local_node |
+--------------------------------------------------------------------------+
| scylla_storage_proxy_coordinator_digest_reads_local_node |
+--------------------------------------------------------------------------+
| scylla_storage_proxy_coordinator_mutation_data_read_errors_local_node |
+--------------------------------------------------------------------------+
| scylla_storage_proxy_coordinator_mutation_data_reads_local_node |
+--------------------------------------------------------------------------+
| scylla_storage_proxy_coordinator_completed_mutation_data_reads_local_node|
+--------------------------------------------------------------------------+
| scylla_storage_proxy_coordinator_reads |
+--------------------------------------------------------------------------+

View File

@@ -1,8 +0,0 @@
.. |OS| replace:: Debian 8
.. |ROLLBACK| replace:: rollback
.. _ROLLBACK: /upgrade/upgrade-enterprise/upgrade-guide-from-2017.1-to-2018.1/upgrade-guide-from-2017.1-to-2018.1-debian/#rollback-procedure
.. |APT| replace:: Scylla Enterprise Deb repo
.. _APT: http://www.scylladb.com/enterprise-download/debian8/
.. |ENABLE_APT_REPO| replace:: echo 'deb http://http.debian.net/debian jessie-backports main' > /etc/apt/sources.list.d/jessie-backports.list
.. |JESSIE_BACKPORTS| replace:: -t jessie-backports openjdk-8-jre-headless
.. include:: /upgrade/_common/upgrade-guide-from-2017.1-to-2018.1-ubuntu-and-debian.rst

View File

@@ -1,180 +0,0 @@
=============================================================================================
Upgrade Guide - Scylla Enterprise 2017.1 to 2018.1 for Red Hat Enterprise Linux 7 or CentOS 7
=============================================================================================
This document is a step by step procedure for upgrading from Scylla Enterprise 2017.1 to Scylla Enterprise 2018.1, and rollback to 2017.1 if required.
Applicable versions
===================
This guide covers upgrading Scylla from the following versions: 2017.1.x to Scylla Enterprise version 2018.1.y, on the following platforms:
* Red Hat Enterprise Linux, version 7 and later
* CentOS, version 7 and later
* No longer provide packages for Fedora
Upgrade Procedure
=================
.. include:: /upgrade/_common/warning.rst
A Scylla upgrade is a rolling procedure that does not require a full cluster shutdown. For each of the nodes in the cluster, serially (i.e. one at a time), you will:
* Check cluster schema
* Drain node and backup the data
* Backup configuration file
* Stop Scylla
* Download and install new Scylla packages
* Start Scylla
* Validate that the upgrade was successful
Apply the following procedure **serially** on each node. Do not move to the next node before validating the node is up and running with the new version.
**During** the rolling upgrade, it is highly recommended:
* Not to use new 2018.1 features
* Not to run administration functions, like repairs, refresh, rebuild or add or remove nodes
* Not to apply schema changes
Upgrade steps
=============
Check cluster schema
--------------------
Make sure that all nodes have the schema synched prior to the upgrade. The upgrade will fail if there is a schema disagreement between nodes.
.. code:: sh
nodetool describecluster
Drain node and backup the data
------------------------------
Before any major procedure, like an upgrade, it is recommended to backup all the data to an external device. In Scylla, backup is done using the ``nodetool snapshot`` command. For **each** node in the cluster, run the following command:
.. code:: sh
nodetool drain
nodetool snapshot
Take note of the directory name that nodetool gives you, and copy all the directories having this name under ``/var/lib/scylla`` to a backup device.
When the upgrade is complete (all nodes), the snapshot should be removed by ``nodetool clearsnapshot -t <snapshot>``, or you risk running out of space.
Backup configuration file
-------------------------
.. code:: sh
sudo cp -a /etc/scylla/scylla.yaml /etc/scylla/scylla.yaml.backup-2017.1
Stop Scylla
-----------
.. code:: sh
sudo systemctl stop scylla-server
Download and install the new release
------------------------------------
Before upgrading, check what version you are running now using ``rpm -qa | grep scylla-server``. You should use the same version in case you want to :ref:`rollback <upgrade-2017.1-2018.1-rpm-rollback-procedure>` the upgrade. If you are not running a 2017.1.x version, stop right here! This guide only covers 2017.1.x to 2018.1.y upgrades.
To upgrade:
1. Update the `Scylla RPM Enterprise repo <http://www.scylladb.com/enterprise-download/centos_rpm/>`_ to **2018.1**
2. install
.. code:: sh
sudo yum update scylla\* -y
Start the node
--------------
.. code:: sh
sudo systemctl start scylla-server
Validate
--------
1. Check cluster status with ``nodetool status`` and make sure **all** nodes, including the one you just upgraded, are in UN status.
2. Use ``curl -X GET "http://localhost:10000/storage_service/scylla_release_version"`` to check the Scylla version.
3. Use ``journalctl _COMM=scylla`` to check there are no new errors in the log.
4. Check again after two minutes to validate no new issues are introduced.
Once you are sure the node upgrade is successful, move to the next node in the cluster.
* More on :doc:`Scylla Metrics Update - Scylla Enterprise 2017.1 to 2018.1<metric-update-2017.1-to-2018.1>`
.. _upgrade-2017.1-2018.1-rpm-rollback-procedure:
Rollback Procedure
==================
.. include:: /upgrade/_common/warning_rollback.rst
The following procedure describes a rollback from Scylla Enterprise release 2018.1.x to 2017.1.y. Apply this procedure if an upgrade from 2017.1 to 2018.1 failed before completing on all nodes. Use this procedure only for nodes you upgraded to 2018.1
Scylla rollback is a rolling procedure that does **not** require a full cluster shutdown.
For each of the nodes rollback to 2017.1, you will:
* Drain the node and stop Scylla
* Retrieve the old Scylla packages
* Restore the configuration file
* Restart Scylla
* Validate the rollback success
Apply the following procedure **serially** on each node. Do not move to the next node before validating the node is up and running with the new version.
Rollback steps
==============
Gracefully shutdown Scylla
--------------------------
.. code:: sh
nodetool drain
sudo systemctl stop scylla-server
Download and install the new release
------------------------------------
1. Remove the old repo file.
.. code:: sh
sudo rm -rf /etc/yum.repos.d/scylla.repo
2. Update the `Scylla RPM Enterprise repo <http://www.scylladb.com/enterprise-download/centos_rpm/>`_ to **2017.1**
3. Install
.. code:: sh
sudo yum clean all
sudo rm -rf /var/cache/yum
sudo yum remove scylla\*tools-core
sudo yum downgrade scylla\* -y
sudo yum install scylla-enterprise
Restore the configuration file
------------------------------
.. code:: sh
sudo rm -rf /etc/scylla/scylla.yaml
sudo cp -a /etc/scylla/scylla.yaml.backup-2017.1 /etc/scylla/scylla.yaml
Restore system tables
---------------------
Restore all tables of **system** and **system_schema** from previous snapshot, 2018.1 uses a different set of system tables. Reference doc: :doc:`Restore from a Backup and Incremental Backup </operating-scylla/procedures/backup-restore/restore/>`
.. code:: sh
cd /var/lib/scylla/data/keyspace_name/table_name-UUID/snapshots/<snapshot_name>/
sudo cp -r * /var/lib/scylla/data/keyspace_name/table_name-UUID/
Start the node
--------------
.. code:: sh
sudo systemctl start scylla-server
Validate
--------
Check the upgrade instruction above for validation. Once you are sure the node rollback is successful, move to the next node in the cluster.

View File

@@ -1,8 +0,0 @@
.. |OS| replace:: Ubuntu 14.04 or 16.04
.. |ROLLBACK| replace:: rollback
.. _ROLLBACK: /upgrade/upgrade-enterprise/upgrade-guide-from-2017.1-to-2018.1/upgrade-guide-from-2017.1-to-2018.1-ubuntu/#rollback-procedure
.. |APT| replace:: Scylla Enterprise Deb repo
.. _APT: http://www.scylladb.com/enterprise-download/
.. |ENABLE_APT_REPO| replace:: sudo add-apt-repository -y ppa:openjdk-r/ppa
.. |JESSIE_BACKPORTS| replace:: openjdk-8-jre-headless
.. include:: /upgrade/_common/upgrade-guide-from-2017.1-to-2018.1-ubuntu-and-debian.rst

View File

@@ -1,38 +0,0 @@
=====================================================
Upgrade Scylla Enterprise 2017
=====================================================
.. toctree::
:titlesonly:
:hidden:
Red Hat Enterprise Linux and CentOS <upgrade-guide-from-2017.x.y-to-2017.x.z-rpm>
Ubuntu <upgrade-guide-from-2017.x.y-to-2017.x.z-ubuntu>
Debian <upgrade-guide-from-2017.x.y-to-2017.x.z-debian>
.. raw:: html
<div class="panel callout radius animated">
<div class="row">
<div class="medium-3 columns">
<h5 id="getting-started">Upgrade to Scylla Enterprise 2017.x.z</h5>
</div>
<div class="medium-9 columns">
Upgrade guides are available for:
* :doc:`Upgrade Scylla Enterprise from 2017.x.y to 2017.x.z on Red Hat Enterprise Linux and CentOS <upgrade-guide-from-2017.x.y-to-2017.x.z-rpm>`
* :doc:`Upgrade Scylla Enterprise from 2017.x.y to 2017.x.z on Ubuntu <upgrade-guide-from-2017.x.y-to-2017.x.z-ubuntu>`
* :doc:`Upgrade Scylla Enterprise from 2017.x.y to 2017.x.z on Debian <upgrade-guide-from-2017.x.y-to-2017.x.z-debian>`
.. raw:: html
</div>
</div>
</div>

View File

@@ -1,6 +0,0 @@
.. |OS| replace:: Debian 8
.. |ROLLBACK| replace:: rollback
.. _ROLLBACK: /upgrade/upgrade-enterprise/upgrade-guide-from-2017.x.y-to-2017.x.z/upgrade-guide-from-2017.x.y-to-2017.x.z-debian/#rollback-procedure
.. |APT| replace:: Scylla Enterprise deb repo
.. _APT: http://www.scylladb.com/enterprise-download/debian8/
.. include:: /upgrade/_common/upgrade-guide-from-2017.x.y-to-2017.x.z-ubuntu-and-debian.rst

View File

@@ -1,153 +0,0 @@
===========================================================================================
Upgrade Guide - Scylla Enterprise 2017.x.y to 2017.x.z for Red Hat Enterprise 7 or CentOS 7
===========================================================================================
This document is a step by step procedure for upgrading from Scylla Enterprise 2017.x.y to 2017.x.z.
Applicable versions
===================
This guide covers upgrading Scylla Enterprise from the following versions: 2017.x.y to 2017.x.z, on the following platforms:
* Red Hat Enterprise Linux, version 7 and later
* CentOS, version 7 and later
* No longer provide packages for Fedora
Upgrade Procedure
=================
.. include:: /upgrade/_common/warning.rst
A Scylla upgrade is a rolling procedure that does not require a full cluster shutdown. For each of the nodes in the cluster, serially (i.e. one at a time), you will:
* Drain node and backup the data
* Check your current release
* Backup configuration file
* Stop Scylla
* Download and install new Scylla packages
* Start Scylla
* Validate that the upgrade was successful
Apply the following procedure **serially** on each node. Do not move to the next node before validating the node is up and running with the new version.
**During** the rolling upgrade, it is highly recommended:
* Not to use new 2017.x.z features
* Not to run administration functions, like repairs, refresh, rebuild or add or remove nodes
* Not to apply schema changes
Upgrade steps
=============
Drain node and backup the data
------------------------------
Before any major procedure, like an upgrade, it is recommended to backup all the data to an external device. In Scylla, backup is done using the ``nodetool snapshot`` command. For **each** node in the cluster, run the following command:
.. code:: sh
nodetool drain
nodetool snapshot
Take note of the directory name that nodetool gives you, and copy all the directories having this name under ``/var/lib/scylla`` to a backup device.
When the upgrade is complete (all nodes), the snapshot should be removed by ``nodetool clearsnapshot -t <snapshot>``, or you risk running out of space.
Backup configuration file
-------------------------
.. code:: sh
sudo cp -a /etc/scylla/scylla.yaml /etc/scylla/scylla.yaml.backup-2017.x.z
Stop Scylla
-----------
.. code:: sh
sudo systemctl stop scylla-server
Download and install the new release
------------------------------------
Before upgrading, check what version you are running now using ``rpm -qa | grep scylla-server``. You should use the same version in case you want to :ref:`rollback <upgrade-2017.x.y-to-2017.x.z-rpm-rollback-procedure>` the upgrade. If you are not running a 2017.x.y version, stop right here! This guide only covers 2017.x.y to 2017.x.z upgrades.
To upgrade:
1. Update the `Scylla Enterprise RPM repo <http://www.scylladb.com/enterprise-download/centos_rpm>`_ to **2017.x**
2. install
.. code:: sh
sudo yum update scylla\* -y
Start the node
--------------
.. code:: sh
sudo systemctl start scylla-server
Validate
--------
1. Check cluster status with ``nodetool status`` and make sure **all** nodes, including the one you just upgraded, are in UN status.
2. Use ``curl -X GET "http://localhost:10000/storage_service/scylla_release_version"`` to check the Scylla version.
3. Use ``journalctl _COMM=scylla`` to check there are no new errors in the log.
4. Check again after two minutes to validate no new issues are introduced.
Once you are sure the node upgrade is successful, move to the next node in the cluster.
.. _upgrade-2017.x.y-to-2017.x.z-rpm-rollback-procedure:
Rollback Procedure
==================
.. include:: /upgrade/_common/warning_rollback.rst
The following procedure describes a rollback from Scylla Enterprise release 2017.x.z to 2017.x.y. Apply this procedure if an upgrade from 2017.x.y to 2017.x.z failed before completing on all nodes. Use this procedure only for nodes you upgraded to 2017.x.z
Scylla rollback is a rolling procedure that does **not** require a full cluster shutdown.
For each of the nodes rollback to 2017.x.y, you will:
* Drain the node and stop Scylla
* Downgrade to previous release
* Restore the configuration file
* Restart Scylla
* Validate the rollback success
Apply the following procedure **serially** on each node. Do not move to the next node before validating the node is up and running with the new version.
Rollback steps
==============
Gracefully shutdown Scylla
--------------------------
.. code:: sh
nodetool drain
sudo systemctl stop scylla-server
Downgrade to previous release
-----------------------------
1. Install
.. code:: sh
sudo yum downgrade scylla\*-2017.x.y
Restore the configuration file
------------------------------
.. code:: sh
sudo rm -rf /etc/scylla/scylla.yaml
sudo cp -a /etc/scylla/scylla.yaml.backup-2017.x.z /etc/scylla/scylla.yaml
Start the node
--------------
.. code:: sh
sudo systemctl start scylla-server
Validate
--------
Check the upgrade instruction above for validation. Once you are sure the node rollback is successful, move to the next node in the cluster.

View File

@@ -1,6 +0,0 @@
.. |OS| replace:: Ubuntu 14.04 or 16.04
.. |ROLLBACK| replace:: rollback
.. _ROLLBACK: /upgrade/upgrade-enterprise/upgrade-guide-from-2017.x.y-to-2017.x.z/upgrade-guide-from-2017.x.y-to-2017.x.z-ubuntu/#rollback-procedure
.. |APT| replace:: Scylla Enterprise deb repo
.. _APT: http://www.scylladb.com/enterprise-download/
.. include:: /upgrade/_common/upgrade-guide-from-2017.x.y-to-2017.x.z-ubuntu-and-debian.rst

View File

@@ -1,37 +0,0 @@
==================================================
Upgrade from Scylla Enterprise 2018.1 to 2019.1
==================================================
.. toctree::
:hidden:
:titlesonly:
Red Hat Enterprise Linux and CentOS <upgrade-guide-from-2018.1-to-2019.1-rpm>
Ubuntu 16.04 <upgrade-guide-from-2018.1-to-2019.1-ubuntu-16-04>
Metrics <metric-update-2018.1-to-2019.1>
.. raw:: html
<div class="panel callout radius animated">
<div class="row">
<div class="medium-3 columns">
<h5 id="getting-started">Upgrade to Scylla Enterprise 2019.1</h5>
</div>
<div class="medium-9 columns">
Upgrade guides are available for:
* :doc:`Upgrade Scylla Enterprise from 2018.1.x to 2019.1.y on Red Hat Enterprise Linux and CentOS <upgrade-guide-from-2018.1-to-2019.1-rpm>`
* :doc:`Upgrade Scylla Enterprise from 2018.1.x to 2019.1.y on Ubuntu 16.04 <upgrade-guide-from-2018.1-to-2019.1-ubuntu-16-04>`
* :doc:`Scylla Enterprise Metrics Update - Scylla 2018.1 to 2019.1<metric-update-2018.1-to-2019.1>`
.. raw:: html
</div>
</div>
</div>

View File

@@ -1,141 +0,0 @@
====================================================================
Scylla Enterprise Metric Update - Scylla Enterprise 2018.1 to 2019.1
====================================================================
New Metrics
~~~~~~~~~~~
The following metrics are new in 2019.1 compare to 2018.1
* scylla_alien_receive_batch_queue_length
* scylla_alien_total_received_messages
* scylla_alien_total_sent_messages
* scylla_cql_authorized_prepared_statements_cache_evictions
* scylla_cql_authorized_prepared_statements_cache_size
* scylla_cql_filtered_read_requests
* scylla_cql_filtered_rows_dropped_total
* scylla_cql_filtered_rows_matched_total
* scylla_cql_filtered_rows_read_total
* scylla_cql_rows_read
* scylla_cql_secondary_index_creates
* scylla_cql_secondary_index_drops
* scylla_cql_secondary_index_reads
* scylla_cql_secondary_index_rows_read
* scylla_cql_unpaged_select_queries
* scylla_cql_user_prepared_auth_cache_footprint
* scylla_database_dropped_view_updates
* scylla_database_large_partition_exceeding_threshold
* scylla_database_multishard_query_failed_reader_saves
* scylla_database_multishard_query_failed_reader_stops
* scylla_database_multishard_query_unpopped_bytes
* scylla_database_multishard_query_unpopped_fragments
* scylla_database_paused_reads
* scylla_database_paused_reads_permit_based_evictions
* scylla_database_total_view_updates_failed_local
* scylla_database_total_view_updates_failed_remote
* scylla_database_total_view_updates_pushed_local
* scylla_database_total_view_updates_pushed_remote
* scylla_database_view_building_paused
* scylla_database_view_update_backlog
* scylla_hints_for_views_manager_corrupted_files
* scylla_hints_for_views_manager_discarded
* scylla_hints_for_views_manager_dropped
* scylla_hints_for_views_manager_errors
* scylla_hints_for_views_manager_sent
* scylla_hints_for_views_manager_size_of_hints_in_progress
* scylla_hints_for_views_manager_written
* scylla_hints_manager_corrupted_files
* scylla_hints_manager_discarded
* scylla_hints_manager_dropped
* scylla_hints_manager_errors
* scylla_hints_manager_sent
* scylla_hints_manager_size_of_hints_in_progress
* scylla_hints_manager_written
* scylla_node_operation_mode
* scylla_query_processor_queries
* scylla_reactor_aio_errors
* scylla_reactor_cpu_steal_time_ms
* scylla_scheduler_time_spent_on_task_quota_violations_ms
* scylla_sstables_capped_local_deletion_time
* scylla_sstables_capped_tombstone_deletion_time
* scylla_sstables_cell_tombstone_writes
* scylla_sstables_cell_writes
* scylla_sstables_partition_reads
* scylla_sstables_partition_seeks
* scylla_sstables_partition_writes
* scylla_sstables_range_partition_reads
* scylla_sstables_range_tombstone_writes
* scylla_sstables_row_reads
* scylla_sstables_row_writes
* scylla_sstables_single_partition_reads
* scylla_sstables_sstable_partition_reads
* scylla_sstables_static_row_writes
* scylla_sstables_tombstone_writes
* scylla_storage_proxy_coordinator_background_replica_writes_failed_local_node
* scylla_storage_proxy_coordinator_background_writes_failed
* scylla_storage_proxy_coordinator_last_mv_flow_control_delay
* scylla_storage_proxy_replica_cross_shard_ops
* scylla_transport_requests_blocked_memory_current
* scylla_io_queue_shares
Updated Metrics
~~~~~~~~~~~~~~~
The following metric names have changed between Scylla Enterprise 2018.1 and 2019.1
.. list-table::
:widths: 30 30
:header-rows: 1
* - Scylla 2018.1 Name
- Scylla 2019.1 Name
* - scylla_io_queue_compaction_queue_length
- scylla_io_queue_queue_length
* - scylla_io_queue_compaction_total_bytes
- scylla_io_queue_total_bytes
* - scylla_io_queue_compaction_total_operations
- scylla_io_queue_total_operations
* - scylla_io_queue_default_delay
- scylla_io_queue_delay
* - scylla_io_queue_default_queue_length
- scylla_io_queue_queue_length
* - scylla_io_queue_default_total_bytes
- scylla_io_queue_total_bytes
* - scylla_io_queue_default_total_operations
- scylla_io_queue_total_operations
* - scylla_io_queue_memtable_flush_delay
- scylla_io_queue_delay
* - scylla_io_queue_memtable_flush_queue_length
- scylla_io_queue_queue_length
* - scylla_io_queue_memtable_flush_total_bytes
- scylla_io_queue_total_bytes
* - scylla_io_queue_memtable_flush_total_operations
- scylla_io_queue_total_operations
* - scylla_io_queue_commitlog_delay
- scylla_io_queue_delay
* - scylla_io_queue_commitlog_queue_length
- scylla_io_queue_queue_length
* - scylla_io_queue_commitlog_total_bytes
- scylla_io_queue_total_bytes
* - scylla_io_queue_commitlog_total_operations
- scylla_io_queue_total_operations
* - scylla_io_queue_compaction_delay
- scylla_io_queue_delay
* - scylla_reactor_cpu_busy_ns
- scylla_reactor_cpu_busy_ms
* - scylla_storage_proxy_coordinator_current_throttled_writes
- scylla_storage_proxy_coordinator_current_throttled_base_writes
Deprecated Metrics
~~~~~~~~~~~~~~~~~~
* scylla_database_cpu_flush_quota
* scylla_scollectd_latency
* scylla_scollectd_records
* scylla_scollectd_total_bytes_sent
* scylla_scollectd_total_requests
* scylla_scollectd_total_time_in_ms
* scylla_scollectd_total_values
* scylla_transport_unpaged_queries

Some files were not shown because too many files have changed in this diff Show More