Compare commits

...

253 Commits

Author SHA1 Message Date
Anna Mikhlin
b635a30b59 release: prepare for 5.1.13 2023-06-22 16:44:06 +03:00
Avi Kivity
342d13e26a Update seastar submodule (default priority class shares)
* seastar 8d7cc3129d...5c27348333 (1):
  > reactor: change shares for default IO class from 1 to 200

Fixes #13753.

In 5.3: 37e6e65211
2023-06-21 21:24:56 +03:00
Pavel Emelyanov
db01be31c6 Backport 'Merge 'Enlighten messaging_service::shutdown()''
This includes a seastar update titled
  'Merge 'Split rpc::server stop into two parts''

Includes a backport of the #12244 fix

* br-5.1-backport-ms-shutdown:
  messaging_service: Shutdown rpc server on shutdown
  messaging_service: Generalize stop_servers()
  messaging_service: Restore indentation after previous patch
  messaging_service: Coroutinize stop()
  messaging_service: Coroutinize stop_servers()
  messaging: Shutdown on stop() if it wasn't shut down earlier
  Update seastar submodule

refs: #14031
2023-06-14 09:28:56 +03:00
Pavel Emelyanov
87531915d9 messaging_service: Shutdown rpc server on shutdown
The RPC server now has a lighter .shutdown() method that does just what
messaging_service's shutdown() needs, so call it. On stop(), call the
regular stop to finalize the stopping process.

backport: messaging_service::shutdown() had a conflict due to the
          missing e147681d85 commit

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-06-14 09:28:23 +03:00
Pavel Emelyanov
4075daf96d messaging_service: Generalize stop_servers()
Rename it to do_with_servers() and make it accept the method to call and
the message to print. This allows reusing the helper in the next patch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-06-14 09:28:23 +03:00
Pavel Emelyanov
b27c5567fa messaging_service: Restore indentation after previous patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-06-14 09:28:23 +03:00
Pavel Emelyanov
8877f0b28a messaging_service: Coroutinize stop()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-06-14 09:28:23 +03:00
Pavel Emelyanov
fabc7df720 messaging_service: Coroutinize stop_servers()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-06-14 09:28:23 +03:00
Tomasz Grabiec
bdafe2b98c messaging: Shutdown on stop() if it wasn't shut down earlier
All rpc::client objects have to be stopped before they are
destroyed. Currently this is done in
messaging_service::shutdown(), but cql_test_env does not call
shutdown(). This can lead to a use-after-free on the
rpc::client object, manifesting like this:

Segmentation fault on shard 0.
Backtrace:
column_mapping::~column_mapping() at schema.cc:?
db::cql_table_large_data_handler::internal_record_large_cells(sstables::sstable const&, sstables::key const&, clustering_key_prefix const*, column_definition const&, unsigned long, unsigned long) const at ./db/large_data_handler.cc:180
operator() at ./db/large_data_handler.cc:123
 (inlined by) seastar::future<void> std::__invoke_impl<seastar::future<void>, db::cql_table_large_data_handler::cql_table_large_data_handler(gms::feature_service&, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>)::$_1&, sstables::sstable const&, sstables::key const&, clustering_key_prefix const*, column_definition const&, unsigned long, unsigned long>(std::__invoke_other, db::cql_table_large_data_handler::cql_table_large_data_handler(gms::feature_service&, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>)::$_1&, sstables::sstable const&, sstables::key const&, clustering_key_prefix const*&&, column_definition const&, unsigned long&&, unsigned long&&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:61
 (inlined by) std::enable_if<is_invocable_r_v<seastar::future<void>, db::cql_table_large_data_handler::cql_table_large_data_handler(gms::feature_service&, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>)::$_1&, sstables::sstable const&, sstables::key const&, clustering_key_prefix const*, column_definition const&, unsigned long, unsigned long>, seastar::future<void> >::type std::__invoke_r<seastar::future<void>, db::cql_table_large_data_handler::cql_table_large_data_handler(gms::feature_service&, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>)::$_1&, sstables::sstable const&, sstables::key const&, clustering_key_prefix const*, column_definition const&, unsigned long, unsigned long>(db::cql_table_large_data_handler::cql_table_large_data_handler(gms::feature_service&, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>)::$_1&, sstables::sstable const&, sstables::key const&, clustering_key_prefix const*&&, column_definition const&, unsigned long&&, unsigned long&&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:114
 (inlined by) std::_Function_handler<seastar::future<void> (sstables::sstable const&, sstables::key const&, clustering_key_prefix const*, column_definition const&, unsigned long, unsigned long), db::cql_table_large_data_handler::cql_table_large_data_handler(gms::feature_service&, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>)::$_1>::_M_invoke(std::_Any_data const&, sstables::sstable const&, sstables::key const&, clustering_key_prefix const*&&, column_definition const&, unsigned long&&, unsigned long&&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/std_function.h:290
std::function<seastar::future<void> (sstables::sstable const&, sstables::key const&, clustering_key_prefix const*, column_definition const&, unsigned long, unsigned long)>::operator()(sstables::sstable const&, sstables::key const&, clustering_key_prefix const*, column_definition const&, unsigned long, unsigned long) const at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/std_function.h:591
 (inlined by) db::cql_table_large_data_handler::record_large_cells(sstables::sstable const&, sstables::key const&, clustering_key_prefix const*, column_definition const&, unsigned long, unsigned long) const at ./db/large_data_handler.cc:175
seastar::rpc::log_exception(seastar::rpc::connection&, seastar::log_level, char const*, std::__exception_ptr::exception_ptr) at ./build/release/seastar/./seastar/src/rpc/rpc.cc:109
operator() at ./build/release/seastar/./seastar/src/rpc/rpc.cc:788
operator() at ./build/release/seastar/./seastar/include/seastar/core/future.hh:1682
 (inlined by) void seastar::futurize<seastar::future<void> >::satisfy_with_result_of<seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, seastar::rpc::client::client(seastar::rpc::logger const&, void*, seastar::rpc::client_options, seastar::socket, seastar::socket_address const&, seastar::socket_address const&)::$_14>(seastar::rpc::client::client(seastar::rpc::logger const&, void*, seastar::rpc::client_options, seastar::socket, seastar::socket_address const&, seastar::socket_address const&)::$_14&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, seastar::rpc::client::client(seastar::rpc::logger const&, void*, seastar::rpc::client_options, seastar::socket, seastar::socket_address const&, seastar::socket_address const&)::$_14&, seastar::future_state<seastar::internal::monostate>&&)#1}::operator()(seastar::internal::promise_base_with_type<void>&&, seastar::rpc::client::client(seastar::rpc::logger const&, void*, seastar::rpc::client_options, seastar::socket, seastar::socket_address const&, seastar::socket_address const&)::$_14&, seastar::future_state<seastar::internal::monostate>&&) const::{lambda()#1}>(seastar::internal::promise_base_with_type<void>&&, seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, seastar::rpc::client::client(seastar::rpc::logger const&, void*, seastar::rpc::client_options, seastar::socket, seastar::socket_address const&, seastar::socket_address const&)::$_14>(seastar::rpc::client::client(seastar::rpc::logger const&, void*, seastar::rpc::client_options, seastar::socket, seastar::socket_address const&, seastar::socket_address const&)::$_14&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, seastar::rpc::client::client(seastar::rpc::logger const&, void*, seastar::rpc::client_options, seastar::socket, seastar::socket_address const&, seastar::socket_address const&)::$_14&, seastar::future_state<seastar::internal::monostate>&&)#1}::operator()(seastar::internal::promise_base_with_type<void>&&, 
seastar::rpc::client::client(seastar::rpc::logger const&, void*, seastar::rpc::client_options, seastar::socket, seastar::socket_address const&, seastar::socket_address const&)::$_14&, seastar::future_state<seastar::internal::monostate>&&) const::{lambda()#1}&&) at ./build/release/seastar/./seastar/include/seastar/core/future.hh:2134
 (inlined by) operator() at ./build/release/seastar/./seastar/include/seastar/core/future.hh:1681
 (inlined by) seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::rpc::client::client(seastar::rpc::logger const&, void*, seastar::rpc::client_options, seastar::socket, seastar::socket_address const&, seastar::socket_address const&)::$_14, seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, seastar::rpc::client::client(seastar::rpc::logger const&, void*, seastar::rpc::client_options, seastar::socket, seastar::socket_address const&, seastar::socket_address const&)::$_14>(seastar::rpc::client::client(seastar::rpc::logger const&, void*, seastar::rpc::client_options, seastar::socket, seastar::socket_address const&, seastar::socket_address const&)::$_14&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, seastar::rpc::client::client(seastar::rpc::logger const&, void*, seastar::rpc::client_options, seastar::socket, seastar::socket_address const&, seastar::socket_address const&)::$_14&, seastar::future_state<seastar::internal::monostate>&&)#1}, void>::run_and_dispose() at ./build/release/seastar/./seastar/include/seastar/core/future.hh:781
seastar::reactor::run_tasks(seastar::reactor::task_queue&) at ./build/release/seastar/./seastar/src/core/reactor.cc:2319
 (inlined by) seastar::reactor::run_some_tasks() at ./build/release/seastar/./seastar/src/core/reactor.cc:2756
seastar::reactor::do_run() at ./build/release/seastar/./seastar/src/core/reactor.cc:2925
seastar::reactor::run() at ./build/release/seastar/./seastar/src/core/reactor.cc:2808
seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) at ./build/release/seastar/./seastar/src/core/app-template.cc:265
seastar::app_template::run(int, char**, std::function<seastar::future<int> ()>&&) at ./build/release/seastar/./seastar/src/core/app-template.cc:156
operator() at ./build/release/seastar/./seastar/src/testing/test_runner.cc:75
 (inlined by) void std::__invoke_impl<void, seastar::testing::test_runner::start_thread(int, char**)::$_0&>(std::__invoke_other, seastar::testing::test_runner::start_thread(int, char**)::$_0&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:61
 (inlined by) std::enable_if<is_invocable_r_v<void, seastar::testing::test_runner::start_thread(int, char**)::$_0&>, void>::type std::__invoke_r<void, seastar::testing::test_runner::start_thread(int, char**)::$_0&>(seastar::testing::test_runner::start_thread(int, char**)::$_0&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:111
 (inlined by) std::_Function_handler<void (), seastar::testing::test_runner::start_thread(int, char**)::$_0>::_M_invoke(std::_Any_data const&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/std_function.h:290
std::function<void ()>::operator()() const at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/std_function.h:591
 (inlined by) seastar::posix_thread::start_routine(void*) at ./build/release/seastar/./seastar/src/core/posix.cc:73

Fix by making sure that shutdown() is called prior to destruction.

Fixes #12244

Closes #12276
2023-06-14 09:28:23 +03:00
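The shutdown-before-destruction contract above can be sketched in plain C++ (a hedged model with illustrative names, not Scylla's actual API): stop() performs the shutdown step itself when the caller skipped shutdown(), so the object is never torn down while still "live".

```cpp
#include <cassert>

// Minimal model of the fix: stop() shuts down first if shutdown()
// was never called explicitly (as cql_test_env failed to do).
struct messaging_service_model {
    bool shut_down = false;
    bool stopped = false;

    void shutdown() { shut_down = true; }

    void stop() {
        if (!shut_down) {   // the fix: shut down on stop() if needed
            shutdown();
        }
        stopped = true;
    }
};
```

With this shape, both call orders (shutdown() then stop(), or stop() alone) leave the service fully shut down before destruction.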
Pavel Emelyanov
d78bc60a74 Update seastar submodule
* seastar 09063faa...8d7cc312 (1):
  > rpc: Introduce server::shutdown()

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-06-14 09:21:20 +03:00
Raphael S. Carvalho
97985a68a1 compaction: Fix incremental compaction for sstable cleanup
After c7826aa910, sstable runs are cleaned up together.

The procedure which executes cleanup was holding a reference to all
input sstables, so that it could later retry the same cleanup job on
failure.

It turns out this did not take into account that incremental compaction
exhausts the input set incrementally.

Therefore cleanup suffers from the 100% space overhead.

To fix it, cleanup now has its input set updated by removing the
sstables that were already cleaned up. On failure, cleanup retries the
same job with the remaining sstables that weren't exhausted by
incremental compaction.

New unit test reproduces the failure, and passes with the fix.

Fixes #14035.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #14038

(cherry picked from commit 23443e0574)
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #14195
2023-06-13 09:57:59 +03:00
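The input-set bookkeeping described above can be sketched in plain C++ (hypothetical names; std::string stands in for an sstable handle): each exhausted sstable is dropped from the job's input, so a retry only reprocesses what remains.

```cpp
#include <cassert>
#include <set>
#include <string>

// Sketch of the fix: the cleanup job keeps a mutable input set and
// drops each sstable as incremental compaction exhausts it, so a
// failed job can be retried without pinning already-cleaned sstables.
struct cleanup_job_model {
    std::set<std::string> input;   // sstables still to be cleaned

    void on_sstable_exhausted(const std::string& sst) {
        input.erase(sst);          // released: no longer pins disk space
    }

    // on failure, the retry works off whatever remains
    std::set<std::string> remaining_for_retry() const { return input; }
};
```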
Avi Kivity
c1278994d8 Merge 'multishard_mutation_query: make reader_context::lookup_readers() exception safe' from Botond Dénes
With regards to closing the looked-up querier if an exception is thrown. In particular, this requires closing the querier if a semaphore mismatch is detected. Move the table lookup above the line where the querier is looked up, to avoid having to handle the exception from it. As a consequence of closing the querier on the error path, the lookup lambda has to be made a coroutine. This is sad, but it is executed once per page, so its cost should be insignificant when spread over an entire page's worth of work.

Also add a unit test checking that the mismatch is detected in the first place and that readers are closed.

Fixes: #13784

Closes #13790

* github.com:scylladb/scylladb:
  test/boost/database_test: add unit test for semaphore mismatch on range scans
  partition_slice_builder: add set_specific_ranges()
  multishard_mutation_query: make reader_context::lookup_readers() exception safe
  multishard_mutation_query: lookup_readers(): make inner lambda a coroutine

(cherry picked from commit 1c0e8c25ca)
2023-06-08 05:12:12 -04:00
Michał Chojnowski
c33fb41802 data_dictionary: fix forgetting of UDTs on ALTER KEYSPACE
Due to a simple programming oversight, one of the keyspace_metadata
constructors was using an empty user_types_metadata instead of the
passed one. Fix that.

Fixes #14139

Closes #14143

(cherry picked from commit 1a521172ec)
2023-06-06 21:53:03 +03:00
Kamil Braun
bca4bf6c11 auth: don't use infinite timeout in default_role_row_satisfies query
A long long time ago there was an issue about removing infinite timeouts
from distributed queries: #3603. There was also a fix:
620e950fc8. But apparently some queries
escaped the fix, like the one in `default_role_row_satisfies`.

With the right conditions and timing this query may cause a node to hang
indefinitely on shutdown. A node tries to perform this query after it
starts. If we kill another node which is required to serve this query
right before that moment, the query will hang; when we try to shut down
the querying node, it will wait for the query to finish (it's a
background task in the auth service), which it never does due to the
infinite timeout.

Use the same timeout configuration as other queries in this module do.

Fixes #13545.

Closes #14134

(cherry picked from commit f51312e580)
2023-06-06 19:39:55 +03:00
Anna Mikhlin
cf08b19dad release: prepare for 5.1.12 2023-06-05 18:13:42 +03:00
Vlad Zolotarov
0d5751b4b6 scylla_prepare: correctly handle a former 'MQ' mode
Fixes a regression introduced in 80917a1054:
"scylla_prepare: stop generating 'mode' value in perftune.yaml"

When cpuset.conf contains a "full" CPU set, negating it against the
"full" CPU set produces a zero mask as the irq_cpu_mask. This is an
illegal value that would end up in the generated perftune.yaml, which
in turn would make the scylla service fail to start until the issue is
resolved.

In such a case the irq_cpu_mask must represent a "full" CPU set,
mimicking the former 'MQ' mode.

Fixes scylladb/scylladb#11701
Tested:
 - Manually on a 2 vCPU VM in an 'auto-selection' mode.
 - Manually on a large VM (48 vCPUs) with an 'MQ' manually
   enforced.
Message-Id: <20221004004237.2961246-1-vladz@scylladb.com>

(cherry picked from commit 8195dab92a)
2023-06-04 19:26:04 +03:00
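The mask computation being fixed can be sketched in plain C++ (a hypothetical helper; the real logic lives in scylla_prepare's shell/python scriptology): IRQ CPUs are "all CPUs minus the Scylla cpuset", and an empty result falls back to the full mask, the former 'MQ' mode.

```cpp
#include <cassert>
#include <cstdint>

// Sketch: compute irq_cpu_mask as the negation of the Scylla cpuset
// within the machine's full CPU mask. If the cpuset covers every CPU,
// the negation is zero (illegal), so fall back to the full mask,
// mimicking the former 'MQ' mode.
uint64_t compute_irq_cpu_mask(uint64_t all_cpus, uint64_t scylla_cpuset) {
    uint64_t irq_mask = all_cpus & ~scylla_cpuset;
    if (irq_mask == 0) {
        irq_mask = all_cpus;   // former 'MQ' mode: IRQs share all CPUs
    }
    return irq_mask;
}
```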
Vlad Zolotarov
1c7cdf68d6 scylla_prepare + scylla_cpuset_setup: make scylla_cpuset_setup idempotent without introducing regressions
This patch fixes the regression introduced by 3a51e78 which broke
a very important contract: perftune.yaml should not be "touched"
by Scylla scriptology unless explicitly requested.

And a call for scylla_cpuset_setup is such an explicit request.

The issue that the offending patch intended to fix was that cpuset.conf
was always generated anew on every call of scylla_cpuset_setup - even
if the resulting cpuset.conf would come out exactly the same as the one
present on disk before that call.

And since the original code was following the contract mentioned above,
it was also deleting perftune.yaml every time too.
However, this was just an unavoidable side effect of that cpuset.conf
re-generation.

The above also means that if scylla_cpuset_setup doesn't write to
cpuset.conf we should not "touch" perftune.yaml, and vice versa.

This patch implements exactly that together with reverting the dangerous
logic introduced by 3a51e78.

Fixes scylladb/scylladb#11385
Fixes scylladb/scylladb#10121

(cherry picked from commit c538cc2372)
2023-06-04 19:25:41 +03:00
Vlad Zolotarov
8fc0591f98 scylla_prepare: stop generating 'mode' value in perftune.yaml
Modern perftune.py supports a more generic way of defining IRQ CPUs:
'irq_cpu_mask'.

This patch makes our auto-generation code create a perftune.yaml
that uses this new parameter instead of using outdated 'mode'.

As a side effect, this change eliminates the notion of "incorrect"
value in cpuset.conf - every value is valid now as long as it fits into
the 'all' CPU set of the specific machine.

Auto-generated 'irq_cpu_mask' is going to include all bits from 'all'
CPU mask except those defined in cpuset.conf.

Fixes scylladb/scylladb#9903

(cherry picked from commit 80917a1054)
2023-06-04 19:25:27 +03:00
Pavel Emelyanov
a0fa2d043d Update seastar submodule
* seastar a6389d17...09063faa (1):
  > rpc: Wait for server socket to stop before killing conns

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-30 20:03:52 +03:00
Botond Dénes
50a3cc6b90 compatible_ring_position_or_view: make it cheap to copy
This class exists for one purpose only: to serve as glue code between
dht::ring_position and boost::icl::interval_map. The latter requires
that keys in its intervals are:
* default constructible
* copyable
* have standalone compare operations

For this reason we have to wrap `dht::ring_position` in a class,
together with a schema to provide all this. This is
`compatible_ring_position`. There is one further requirement by code
using the interval map: it wants to do lookups without copying the
lookup key(s). To solve this, we came up with
`compatible_ring_position_or_view` which is a union of a key or a key
view + schema. As we recently found out, boost::icl copies its keys **a
lot**. It seems to assume these keys are cheap to copy and carelessly
copies them around even when iterating over the map. But
`compatible_ring_position_or_view` is not cheap to copy as it copies a
`dht::ring_position` which allocates, and it does that via an
`std::optional` and `std::variant` to add insult to injury.
This patch makes said class cheap to copy by getting rid of the variant
and storing the `dht::ring_position` via a shared pointer. The view is
stored separately and points either to the ring position held in the
shared pointer or to an outside ring position (for lookups).

Fixes: #11669

Closes #11670

(cherry picked from commit 169a8a66f2)
2023-05-25 17:30:40 +03:00
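The cheap-to-copy layout described above can be sketched in plain C++ (std::string stands in for dht::ring_position; names are illustrative): the owned value sits behind a shared_ptr and a raw pointer serves as the view, so copying copies a shared_ptr and a pointer instead of an allocating ring position.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Sketch of compatible_ring_position_or_view's new shape: either owns
// its value through a shared_ptr (view points at it), or is a pure
// view of an external value (for lookups, no allocation).
class crpov_model {
    std::shared_ptr<const std::string> _owned; // null for pure views
    const std::string* _view;

    crpov_model() : _owned(nullptr), _view(nullptr) {}
public:
    // owning constructor: allocates once; subsequent copies are cheap
    explicit crpov_model(std::string rp)
        : _owned(std::make_shared<const std::string>(std::move(rp)))
        , _view(_owned.get()) {}

    // lookup constructor: no copy of the key, just a view of it
    static crpov_model view_of(const std::string& external) {
        crpov_model m;
        m._view = &external;
        return m;
    }

    const std::string& value() const { return *_view; }
};
```

Copies share the owned value, which is what makes boost::icl's enthusiastic key copying tolerable.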
Botond Dénes
cfa8fa1d77 Merge 'Backport compaction reevaluation fixes to branch-5.1' from Raphael "Raph" Carvalho
Fixes #13429.
Fixes #12390.
Fixes #13430.

Closes #14009

* github.com:scylladb/scylladb:
  compaction: Make compaction reevaluation actually periodic
  compaction_manager: Fix reactor stalls during periodic submissions
  compaction_manager: reindent postponed_compactions_reevaluation()
  compaction_manager: coroutinize postponed_compactions_reevaluation()
  compaction_manager: make postponed_compactions_reevaluation() return a future
  replica: Reevaluate regular compaction on off-strategy completion
2023-05-25 07:55:17 +03:00
Raphael S. Carvalho
6cdd5ccabd compaction: Make compaction reevaluation actually periodic
The manager intended to periodically reevaluate the compaction need for
each registered table, but it was not working as intended: the
reevaluation was one-off.

This means compaction would not later kick in for a table with little
to no write activity whose data expires an hour from now.

Also make sure that reevaluation happens within the compaction
scheduling group.

Fixes #13430.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit 156ac0a67a)
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-05-23 21:30:47 -03:00
Raphael S. Carvalho
204baa0c1e compaction_manager: Fix reactor stalls during periodic submissions
Every 1 hour, compaction manager will submit all registered table_state
for a regular compaction attempt, all without yielding.

This can potentially cause a reactor stall if there are 1000s of table
states, as compaction strategy heuristics will run on behalf of each,
and processing all buckets and picking the best one is not cheap.
This problem can be magnified with compaction groups, as each group
is represented by a table state.

This might appear in dashboards as periodic stalls, every 1h,
misleading the investigator into believing that the problem is caused
by some scheduled job.

This is fixed by piggybacking on compaction reevaluation loop which
can yield between each submission attempt if needed.

Fixes #12390.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #12391

(cherry picked from commit 67ebd70e6e)
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-05-23 21:18:36 -03:00
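The fix's loop shape can be sketched in plain C++ (hedged model: yield() stands in for seastar::maybe_yield(), and the per-table work is elided): submissions happen one at a time with a yield between them, so thousands of table states cannot stall the reactor.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Sketch of the reevaluation loop: submit each registered table state
// in turn, yielding between submissions so a long list is processed
// cooperatively instead of in one unbroken reactor task.
int submit_all(const std::vector<int>& table_states,
               const std::function<void()>& yield) {
    int submitted = 0;
    for (int ts : table_states) {
        (void)ts;     // compaction strategy heuristics would run here
        ++submitted;
        yield();      // give the reactor a chance to run other tasks
    }
    return submitted;
}
```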
Avi Kivity
3556d2b4e8 compaction_manager: reindent postponed_compactions_reevaluation()
(cherry picked from commit d2b1d2f695)
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-05-23 21:18:18 -03:00
Avi Kivity
6b699c9667 compaction_manager: coroutinize postponed_compactions_reevaluation()
So much nicer.

(cherry picked from commit 1669025736)
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-05-23 21:17:58 -03:00
Avi Kivity
316ea63ea0 compaction_manager: make postponed_compactions_reevaluation() return a future
postponed_compactions_reevaluation() runs until compaction_manager is
stopped, checking if it needs to launch new compactions.

Make it return a future instead of stashing its completion somewhere.
This makes it easier to convert it to a coroutine.

(cherry picked from commit d2c44cba77)
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-05-23 21:17:26 -03:00
Raphael S. Carvalho
bafde878ba replica: Reevaluate regular compaction on off-strategy completion
When off-strategy compaction completes, regular compaction is not triggered.

If the off-strategy output causes the table's SSTable set to not
conform to the strategy's goal, read and space amplification will be
suboptimal until the next compaction kicks in, which can take an
indefinite amount of time (e.g. until the active memtable is flushed).

Let's reevaluate compaction on main SSTable set when off-strategy ends.

Fixes #13429.

Backport note: conflict is around compaction_group vs table.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit 2652b41606)
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-05-23 21:06:14 -03:00
Yaron Kaikov
88eeab7838 release: prepare for 5.1.11 2023-05-22 15:14:22 +03:00
Raphael S. Carvalho
f6230b5eec sstables: Fix use-after-move when making reader in reverse mode
static report:
sstables/mx/reader.cc:1705:58: error: invalid invocation of method 'operator*' on object 'schema' while it is in the 'consumed' state [-Werror,-Wconsumed]
            legacy_reverse_slice_to_native_reverse_slice(*schema, slice.get()), pc, std::move(trace_state), fwd, fwd_mr, monitor);

Fixes #13394.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit 213eaab246)
2023-05-15 20:27:51 +03:00
Raphael S. Carvalho
737285d342 db/view/build_progress_virtual_reader: Fix use-after-move
Use-after-move in the constructor, which can potentially lead to a
failure when locating the table from the moved schema object.

static report:
In file included from db/system_keyspace.cc:51:
./db/view/build_progress_virtual_reader.hh:202:40: warning: invalid invocation of method 'operator->' on object 's' while it is in the 'consumed' state [-Wconsumed]
                _db.find_column_family(s->ks_name(), system_keyspace::v3::SCYLLA_VIEWS_BUILDS_IN_PROGRESS),

Fixes #13395.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit 1ecba373d6)
2023-05-15 20:26:17 +03:00
Raphael S. Carvalho
f1ee68e128 index/built_indexes_virtual_reader.hh: Fix use-after-move
static report:
./index/built_indexes_virtual_reader.hh:228:40: warning: invalid invocation of method 'operator->' on object 's' while it is in the 'consumed' state [-Wconsumed]
                _db.find_column_family(s->ks_name(), system_keyspace::v3::BUILT_VIEWS),

Fixes #13396.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit f8df3c72d4)
2023-05-15 20:24:52 +03:00
Raphael S. Carvalho
7d4abd9e64 replica: Fix use-after-move in table::make_streaming_reader
Variant used by
streaming/stream_transfer_task.cc:        , reader(cf.make_streaming_reader(cf.schema(), std::move(permit_), prs))

As the full slice is retrieved after the schema is moved (clang
evaluates left-to-right), the stream transfer task can potentially be
working on a stale slice for a particular set of partitions.

static report:
In file included from replica/dirty_memory_manager.cc:6:
replica/database.hh:706:83: error: invalid invocation of method 'operator->' on object 'schema' while it is in the 'consumed' state [-Werror,-Wconsumed]
        return make_streaming_reader(std::move(schema), std::move(permit), range, schema->full_slice());

Fixes #13397.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit 04932a66d3)
2023-05-15 20:22:15 +03:00
Asias He
0b4b5c21ad tombstone_gc: Fix gc_before for immediate mode
The immediate mode is similar to timeout mode with gc_grace_seconds of
zero. Thus, gc_before should be the query_time instead of
gc_clock::time_point::max in immediate mode.

With gc_before set to gc_clock::time_point::max, a row could be dropped
by compaction even if its TTL has not expired yet.

The following procedure reproduces the issue:

- Start 2 nodes

- Insert data

```
CREATE KEYSPACE ks2a WITH REPLICATION = { 'class' : 'SimpleStrategy',
'replication_factor' : 2 };
CREATE TABLE ks2a.tb (pk int, ck int, c0 text, c1 text, c2 text, PRIMARY
KEY(pk, ck)) WITH tombstone_gc = {'mode': 'immediate'};
INSERT into ks2a.tb (pk,ck, c0, c1, c2) values (10 ,1, 'x', 'y', 'z')
USING TTL 1000000;
INSERT into ks2a.tb (pk,ck, c0, c1, c2) values (20 ,1, 'x', 'y', 'z')
USING TTL 1000000;
INSERT into ks2a.tb (pk,ck, c0, c1, c2) values (30 ,1, 'x', 'y', 'z')
USING TTL 1000000;
```

- Run nodetool flush and nodetool compact

- Compaction drops all data

```
~128 total partitions merged to 0.
```

Fixes #13572

Closes #13800

(cherry picked from commit 7fcc403122)
2023-05-15 10:34:16 +03:00
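The gc_before rule the fix establishes can be sketched in plain C++ (a simplified model: time_point is a plain integer of seconds, and the real code handles more modes):

```cpp
#include <cassert>

enum class gc_mode { timeout, immediate };
using time_point = long;   // seconds, simplified stand-in for gc_clock

// Sketch of the fix: immediate mode behaves like timeout mode with
// gc_grace_seconds == 0, i.e. gc_before == query_time. The bug was
// returning time_point::max here, letting compaction drop rows whose
// TTL had not expired yet.
time_point get_gc_before(gc_mode mode, time_point query_time,
                         long gc_grace_seconds) {
    switch (mode) {
    case gc_mode::timeout:
        return query_time - gc_grace_seconds;
    case gc_mode::immediate:
        return query_time;   // not time_point::max
    }
    return query_time;
}
```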
Takuya ASADA
3c7d2a3284 scylla_kernel_check: suppress verbose iotune messages
Stop printing verbose iotune messages during the check; just print the
error message.

Fixes #13373.

Closes #13362

(cherry picked from commit 160c184d0b)
2023-05-14 21:26:15 +03:00
Benny Halevy
d7e65a1a0a view: view_builder: start: demote sleep_aborted log error
This is not really an error, so print it at debug log level rather
than error log level.

Fixes #13374

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #13462

(cherry picked from commit cc42f00232)
2023-05-14 21:22:21 +03:00
Raphael S. Carvalho
73d80d55d1 Fix use-after-move when initializing row cache with dummy entry
Courtesy of clang-tidy:
row_cache.cc:1191:28: warning: 'entry' used after it was moved [bugprone-use-after-move]
_partitions.insert(entry.position().token().raw(), std::move(entry), dht::ring_position_comparator{_schema});
^
row_cache.cc:1191:60: note: move occurred here
_partitions.insert(entry.position().token().raw(), std::move(entry), dht::ring_position_comparator{_schema});
^
row_cache.cc:1191:28: note: the use and move are unsequenced, i.e. there is no guarantee about the order in which they are evaluated
_partitions.insert(entry.position().token().raw(), std::move(entry), dht::ring_position_comparator{*_schema});

The use-after-move is UB; whether it happens depends on evaluation
order.

We haven't hit it yet, as clang evaluates left-to-right.

Fixes #13400.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #13401

(cherry picked from commit d2d151ae5b)
2023-05-14 21:02:52 +03:00
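The unsequenced-arguments pattern and its fix can be shown in a minimal plain-C++ model (illustrative names; std::map stands in for the cache's partition tree):

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Model of the bug: in
//   partitions.emplace(entry.position(), std::move(entry));
// the key read and the move are unsequenced, so the key may be read
// from an already-moved-from entry. The fix evaluates the key into a
// local first, making the order explicit.
struct entry_model {
    std::string key;
    const std::string& position() const { return key; }
};

void insert_fixed(std::map<std::string, entry_model>& partitions,
                  entry_model entry) {
    std::string key = entry.position();   // sequenced: read key first
    partitions.emplace(std::move(key), std::move(entry));
}
```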
Anna Mikhlin
37538f00f5 release: prepare for 5.1.10 2023-05-08 22:11:10 +03:00
Botond Dénes
1cb11e7e2f Update seastar submodule
* seastar 84858fde...a6389d17 (2):
  > core/on_internal_error: always log error with backtrace
  > on_internal_error: refactor log_error_and_backtrace

Fixes: #13786
2023-05-08 10:36:55 +03:00
Marcin Maliszkiewicz
f4200098ce db: view: use deferred_close for closing staging_sstable_reader
When consume_in_thread throws the reader should still be closed.

Related https://github.com/scylladb/scylla-enterprise/issues/2661

Closes #13398
Refs: scylladb/scylla-enterprise#2661
Fixes: #13413

(cherry picked from commit 99f8d7dcbe)
2023-05-08 09:45:54 +03:00
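The deferred-close idea can be sketched as RAII in plain C++ (a hedged model with illustrative names, not Seastar's actual deferred_close API): the reader is closed on scope exit even when the consume step throws.

```cpp
#include <cassert>
#include <stdexcept>

// Model reader: only tracks whether close() ran.
struct reader_model {
    bool closed = false;
    void close() { closed = true; }
};

// RAII guard: closes the reader when the scope exits, normally or via
// an exception (the bug was leaving the reader open on throw).
struct deferred_close_model {
    reader_model& r;
    ~deferred_close_model() { r.close(); }
};

bool consume_with_deferred_close(reader_model& r, bool should_throw) {
    deferred_close_model guard{r};
    if (should_throw) {
        throw std::runtime_error("consume_in_thread failed");
    }
    return true;   // guard closes r on return as well
}
```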
Botond Dénes
f751613924 Merge 'service:forward_service: use long type instead of counter in function mocking' from Michał Jadwiszczak
An aggregation query on a counter column fails because forward_service
looks for a function with counter as an argument, and no such function
exists. The long type should be used instead.

Fixes: #12939

Closes #12963

* github.com:scylladb/scylladb:
  test:boost: counter column parallelized aggregation test
  service:forward_service: use long type when column is counter

(cherry picked from commit 61e67b865a)
2023-05-07 14:29:33 +03:00
Michał Jadwiszczak
b38d56367f test/boost/cql_query_test: enable parallelized_aggregation
Run tests for parallelized aggregation with
`enable_parallelized_aggregation` set always to true, so the tests work
even if the default value of the option is false.

Closes #12409

(cherry picked from commit 83bb77b8bb)

Ref #12939.
2023-05-07 14:29:33 +03:00
Anna Stuchlik
991f4ab104 doc: remove the sequential repair option from docs
Fixes https://github.com/scylladb/scylladb/issues/12132

The sequential repair mode is not supported. This commit
removes the incorrect information from the documentation.

Closes #13544

(cherry picked from commit 3d25edf539)
2023-05-07 14:29:33 +03:00
Nadav Har'El
c72058199f cql: fix empty aggregation, and add more tests
This patch fixes #12475, where an aggregation (e.g., COUNT(*), MIN(v))
of absolutely no partitions (e.g., "WHERE p = null" or "WHERE p in ()")
resulted in an internal error instead of the "zero" result that each
aggregator expects (e.g., 0 for COUNT, null for MIN).

The problem is that normally our aggregator forwarder picks the nodes
which hold the relevant partition(s), forwards the request to each of
them, and then combines these results. When there are no partitions,
the query is sent to no node, and we end up with an empty result set
instead of the "zero" results. So in this patch we recognize this
case and build those "zero" results (as mentioned above, these aren't
always 0 and depend on the aggregation function!).

The patch also adds two tests reproducing this issue in a fairly general
way (e.g., several aggregators, different aggregation functions) and
confirming the patch fixes the bug.

The test also includes two additional tests for COUNT aggregation, which
uncovered an incompatibility with Cassandra which is still not fixed -
so these tests are marked "xfail":

Refs #12477: Combining COUNT with GROUP by results with empty results
             in Cassandra, and one result with empty count in Scylla.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12715

(cherry picked from commit 3ba011c2be)
2023-05-04 14:18:06 +03:00
Botond Dénes
c69e1a4e12 readers: evictable_reader: skip progress guarantee when next pos is partition start
The evictable reader must ensure that each buffer fill makes forward
progress, i.e. the last fragment in the buffer has a position larger
than the last fragment from the last buffer-fill. Otherwise, the reader
could get stuck in an infinite loop between buffer fills, if the reader
is evicted in-between.
The code guaranteeing this forward progress has a bug: when the next
expected position is a partition-start (another partition), the code
would loop forever, effectively reading all there is from the underlying
reader.
To avoid this, add a special case to ignore the progress-guarantee loop
altogether when the next expected position is a partition start. In this
case, progress is guaranteed anyway, because there is exactly one
partition-start fragment in each partition.

Fixes: #13491

Closes #13563

(cherry picked from commit 72003dc35c)
2023-05-02 21:26:52 +03:00
Anna Stuchlik
62c737aa49 doc: fixes https://github.com/scylladb/scylladb/issues/12964, removes the information that the CDC options are experimental
Closes #12973

(cherry picked from commit 4dd1659d0b)
2023-04-27 21:06:58 +03:00
Raphael S. Carvalho
edeec94a89 replica: Fix undefined behavior in table::generate_and_propagate_view_updates()
Undefined behavior because the evaluation order is undefined.

With GCC, where evaluation is right-to-left, schema will be moved
once it's forwarded to make_flat_mutation_reader_from_mutations_v2().

The consequence is that memory tracking of mutation_fragment_v2
(which tracks only the permit used by the view update), which uses the schema,
can be incorrect. However, it's more likely that Scylla will crash
when estimating memory usage for a row, which accesses schema column
information using schema::column_at(), which in turn asserts that
the requested column really exists.

Fixes #13093.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #13092

(cherry picked from commit 3fae46203d)
2023-04-27 20:49:39 +03:00
Anna Stuchlik
95a63edea8 doc: remove incorrect info about BYPASS CACHE
Fixes https://github.com/scylladb/scylladb/issues/13106

This commit removes the information that BYPASS CACHE
is an Enterprise-only feature and replaces that info
with the link to the BYPASS CACHE description.

Closes #13316

(cherry picked from commit 1cfea1f13c)
2023-04-27 20:49:39 +03:00
Kefu Chai
e04eef29a8 dist/redhat: enforce dependency on %{release} also
* tools/python3 bf6e892...4b04b46 (1):
  > dist: redhat: provide only a single version

s/%{version}/%{version}-%{release}/ in `Requires:` sections.

This enforces runtime dependencies on exactly the same
release across scylla packages.

Fixes #13222
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
(cherry picked from commit 7165551fd7)
2023-04-27 20:49:39 +03:00
Nadav Har'El
c14ccceec9 test/rest_api: fix flaky test for toppartitions
The REST test test_storage_service.py::test_toppartitions_pk_needs_escaping
was flaky. It tests the toppartition request, which unfortunately needs
to choose a sampling duration in advance, and we chose 1 second which we
considered more than enough - and indeed typically even 1ms is enough!
But very rarely (we know of only one occurrence, in issue #13223) one
second is not enough.

Instead of increasing this 1 second and making this test even slower,
this patch takes a retry approach: the test starts with a 0.01-second
duration, and is then retried with increasing durations until it succeeds
or a 5-second duration is reached. This retry approach has two benefits:
1. It de-flakes the test (allowing a very slow test to take 5 seconds
instead of the 1 second which wasn't enough), and 2. At the same time it
makes a successful test much faster (it used to always take a full
second; now it takes 0.07 seconds on a dev build on my laptop).

A *failed* test may, in some cases, take 10 seconds after this patch
(although in some other cases, an error will be caught immediately),
but I consider this acceptable - this test should pass, after all,
and a failure indicates a regression and taking 10 seconds will be
the last of our worries in that case.

Fixes #13223.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #13238

(cherry picked from commit c550e681d7)
2023-04-27 20:49:39 +03:00
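The retry scheme described above can be sketched as follows; `probe` and the specific durations are illustrative, not the actual test code:

```python
def sample_until_success(probe, first=0.01, limit=5.0, factor=2.0):
    """Retry `probe(duration)` with geometrically growing sampling
    durations until it returns a truthy result or `limit` is reached."""
    duration = first
    while True:
        result = probe(duration)
        if result:
            return result
        if duration >= limit:
            return None  # give up: a real failure, not just slowness
        duration = min(duration * factor, limit)

# A fast "server" succeeds on the very first, short sample.
fast = sample_until_success(lambda d: d >= 0.01)
# A slow one needs a longer sampling window, reached after a few retries.
slow = sample_until_success(lambda d: d >= 1.0)
```

A fast run pays only the first, tiny duration; a slow run still succeeds before the cap, and only a genuine regression exhausts the budget.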
Nadav Har'El
29e3cd80fb test/alternator: increase CQL connection timeout
This patch increases the connection timeout in the get_cql_cluster()
function in test/cql-pytest/run.py. This function is used to test
that Scylla came up, and also test/alternator/run uses it to set
up the authentication - which can only be done through CQL.

The Python driver has 2-second and 5-second default timeouts that should
have been more than enough for everybody (TM), but in #13239 we saw
that in one case it apparently wasn't enough. So to be extra safe,
let's increase the default connection-related timeouts to 60 seconds.

Note this change only affects the Scylla *boot* in the test/*/run
scripts, and it does not affect the actual tests - those have different
code to connect to Scylla (see cql_session() in test/cql-pytest/util.py),
and we already increased the timeouts there in #11289.

Fixes #13239

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #13291

(cherry picked from commit 4fdcee8415)
2023-04-27 20:49:39 +03:00
Tomasz Grabiec
067f94d3f8 direct_failure_detector: Avoid throwing exceptions in the success path
sleep_abortable() is aborted on success, which causes a sleep_aborted
exception to be thrown. This causes Scylla to throw every 100ms for
each pinged node. Throwing may reduce performance if it happens often.

Also, it spams the logs if --logger-log-level exception=trace is enabled.

Avoid by swallowing the exception on cancellation.

Fixes #13278.

Closes #13279

(cherry picked from commit 99cb948eac)
2023-04-27 20:49:35 +03:00
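A rough asyncio analog of the fix — swallowing the abort exception in the success path of a ping — with names and logic that are illustrative, not the actual direct_failure_detector code:

```python
import asyncio

class SleepAborted(Exception):
    """Stands in for seastar's sleep_aborted exception."""

async def abortable_sleep(duration, abort_event):
    # Like sleep_abortable(): returns normally if the full duration
    # elapses, raises if aborted early.
    try:
        await asyncio.wait_for(abort_event.wait(), timeout=duration)
    except asyncio.TimeoutError:
        return  # slept the full duration
    raise SleepAborted()

async def ping(abort_event, log):
    try:
        await abortable_sleep(0.2, abort_event)
        log.append("no reply")   # the sleep ran out: node is suspect
    except SleepAborted:
        # Success path: the node answered and the sleep was aborted.
        # Swallow the exception instead of letting it propagate (and
        # get logged) on every ping period.
        log.append("replied")

async def main():
    abort = asyncio.Event()
    log = []
    task = asyncio.create_task(ping(abort, log))
    await asyncio.sleep(0.01)
    abort.set()   # the "success": the pinged node answered quickly
    await task
    return log

log = asyncio.run(main())
```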
Avi Kivity
f2f9f26b79 Update seastar submodule
* seastar acdf7dca9b...84858fde99 (1):
  > http: request_parser: fix grammar ambiguity in field_content

Fixes #12468

Related upstream commit: 42575340ba
2023-04-27 17:43:21 +03:00
Benny Halevy
41cfe1c103 utils: clear_gently: do not clear null unique_ptr
Otherwise the null pointer is dereferenced.

Add a unit test reproducing the issue
and testing this fix.

Fixes #13636

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 12877ad026)
2023-04-24 17:51:16 +03:00
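A minimal sketch of the guard, with a Python object standing in for std::unique_ptr (names are illustrative):

```python
def clear_gently(ptr):
    """Clear the pointee of an owning pointer. The fix: a null
    (None) pointer must be a no-op, not a dereference."""
    if ptr.target is None:   # the guard added by this patch
        return
    ptr.target.clear()
    ptr.target = None

class Ptr:
    """Toy stand-in for std::unique_ptr: owns `target` or is null."""
    def __init__(self, target=None):
        self.target = target

held = Ptr(target=[1, 2, 3])
empty = Ptr()          # a null unique_ptr
clear_gently(held)
clear_gently(empty)    # previously this would dereference the null
```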
Petr Gusev
1de577d696 removenode: add warning in case of exception
The removenode_abort logic that follows the warning
may throw, in which case information about
the original exception would be lost.

Fixes: #11722
Closes #11735

(cherry picked from commit 40bd9137f8)
2023-04-24 09:38:59 +02:00
Anna Stuchlik
a6a23acd9b doc: remove load-and-stream option from 5.1
Related: https://github.com/scylladb/scylla-enterprise/issues/2807

This commit removes the --load-and-stream nodetool option
from version 5.1 - it is not supported in this version.

This commit should only be merged to branch-5.1 (not to master)
as the feature will be added in the later versions => in versions
prior to 5.2.x the information about the option is a bug.

Closes #13618
2023-04-24 09:53:42 +03:00
Botond Dénes
efba2c38ef Merge 'db: system_keyspace: use microsecond resolution for group0_history range tombstone' from Kamil Braun
in `make_group0_history_state_id_mutation`, when adding a new entry to
the group 0 history table, if the parameter `gc_older_than` is engaged,
we create a range tombstone in the mutation which deletes entries older
than the new one by `gc_older_than`. In particular if
`gc_older_than = 0`, we want to delete all older entries.

There was a subtle bug there: we were using millisecond resolution when
generating the tombstone, while the provided state IDs used microsecond
resolution. On a super fast machine it could happen that we managed to
perform two schema changes in a single millisecond; this happened
sometimes in `group0_test.test_group0_history_clearing_old_entries`
on our new CI/promotion machines, causing the test to fail because the
tombstone didn't clear the entry corresponding to the previous schema
change when performing the next schema change (since they happened in
the same millisecond).

Use microsecond resolution to fix that. The consecutive state IDs used
in group 0 mutations are guaranteed to be strictly monotonic at
microsecond resolution (see `generate_group0_state_id` in
service/raft/raft_group0_client.cc).

Fixes #13594

Closes #13604

* github.com:scylladb/scylladb:
  db: system_keyspace: use microsecond resolution for group0_history range tombstone
  utils: UUID_gen: accept decimicroseconds in min_time_UUID

(cherry picked from commit 10c1f1dc80)
2023-04-23 16:03:21 +03:00
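The resolution mismatch is easy to demonstrate; the timestamps and helper names below are illustrative:

```python
US_PER_MS = 1000

def tombstone_covers_ms(entry_us, new_entry_us):
    """Buggy variant: tombstone boundary truncated to milliseconds,
    losing sub-millisecond ordering between state IDs."""
    return entry_us // US_PER_MS < new_entry_us // US_PER_MS

def tombstone_covers_us(entry_us, new_entry_us):
    """Fixed variant: keep the full microsecond resolution."""
    return entry_us < new_entry_us

# Two schema changes landing within the same millisecond
# (as with gc_older_than = 0 on a fast machine):
prev, curr = 1_681_000_000_123_400, 1_681_000_000_123_900
```

At millisecond resolution both entries truncate to the same boundary, so the older one is not covered by the tombstone; at microsecond resolution it is.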
Anna Mikhlin
ba1a57bd55 release: prepare for 5.1.9 2023-04-23 10:09:19 +03:00
Botond Dénes
1891ad2551 Merge 'distributed_loader: detect highest generation before populating column families' from Benny Halevy
We should scan all sstables in the table directory and its
subdirectories to determine the highest sstable version and generation
before using it for creating new sstables (via reshard or reshape).

Otherwise, the generations of new sstables created when populating staging (via reshard or reshape) may collide with generations in the base directory, leading to https://github.com/scylladb/scylladb/issues/11789

Refs scylladb/scylladb#11789
Fixes scylladb/scylladb#11793

Closes #11795

* github.com:scylladb/scylladb:
  distributed_loader: populate_column_family: reindent
  distributed_loader: coroutinize populate_column_family
  distributed_loader: table_population_metadata: start: reindent
  distributed_loader: table_population_metadata: coroutinize start_subdir
  distributed_loader: table_population_metadata: start_subdir: reindent
  distributed_loader: pre-load all sstables metadata for table before populating it

(cherry picked from commit 4aa0b16852)
2023-04-21 08:12:44 +03:00
Nadav Har'El
2ea3d5ebf0 cql: USING TTL 0 means unlimited, not default TTL
Our documentation states that writing an item with "USING TTL 0" means it
should never expire. This should be true even if the table has a default
TTL. But Scylla mistakenly handled "USING TTL 0" exactly like having no
USING TTL at all (i.e., it took the default TTL, instead of unlimited).
We had two xfailing tests demonstrating that Scylla's behavior in this
is different from Cassandra. Scylla's behavior in this case was also
undocumented.

By the way, Cassandra used to have the same bug (CASSANDRA-11207) but
it was fixed already in 2016 (Cassandra 3.6).

So in this patch we fix Scylla's "USING TTL 0" behavior to match the
documentation and Cassandra's behavior since 2016. One xfailing test
starts to pass and the second test passes this bug and fails on a
different one. This patch also adds a third test for "USING TTL ?"
with UNSET_VALUE - it behaves, on both Scylla and Cassandra, like a
missing "USING TTL".

The origin of this bug was that after parsing the statement, we saved
the USING TTL in an integer, and used 0 for the case of no USING TTL
given. This meant that we couldn't tell if we have USING TTL 0 or
no USING TTL at all. This patch uses an std::optional so we can tell
the case of a missing USING TTL from the case of USING TTL 0.

Fixes #6447

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #13079

(cherry picked from commit a4a318f394)

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-04-19 00:12:55 +03:00
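The sentinel problem maps directly onto optional values; here is a sketch with Python's None playing the role of a disengaged std::optional (the helper names and default TTL are hypothetical):

```python
DEFAULT_TTL = 3600  # table's default_time_to_live, for illustration

def effective_ttl_buggy(using_ttl):
    """Old behavior: 0 doubled as 'no USING TTL', so an explicit
    USING TTL 0 fell through to the table's default TTL."""
    return DEFAULT_TTL if using_ttl == 0 else using_ttl

def effective_ttl_fixed(using_ttl):
    """New behavior: None means 'no USING TTL given'; an explicit 0
    means 'never expire' and is preserved as-is."""
    if using_ttl is None:
        return DEFAULT_TTL
    return using_ttl  # 0 stays 0: unlimited
```

With the integer sentinel, `USING TTL 0` and a missing `USING TTL` are indistinguishable; the optional keeps the two cases apart.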
Botond Dénes
d7d9300453 mutation/mutation_compactor: consume_partition_end(): reset _stop
The purpose of `_stop` is to remember whether the consumption of the
last partition was interrupted or it was consumed fully. In the former
case, the compactor allows retreiving the compaction state for the given
partition, so that its compaction can be resumed at a later point in
time.
Currently, `_stop` is set to `stop_iteration::yes` whenever the return
value of any of the `consume()` methods is also `stop_iteration::yes`.
Meaning, if the consuming of the partition is interrupted, this is
remembered in `_stop`.
However, a partition whose consumption was interrupted is not always
continued later. Sometimes consumption of a partition is interrupted
because the partition is not interesting and the downstream consumer
wants to stop it. In these cases the compactor should not return an
engaged optional from `detach_state()`, because there is no state to
detach; the state should be thrown away. This was incorrectly handled so
far and is fixed in this patch, by overwriting `_stop` in
`consume_partition_end()` with whatever the downstream consumer returns.
Meaning if they want to skip the partition, then `_stop` is reset to
`stop_iteration::no` and `detach_state()` will return a disengaged
optional as it should in this case.

Fixes: #12629

Closes #13365

(cherry picked from commit bae62f899d)
2023-04-18 03:00:05 -04:00
Avi Kivity
ab2817b4a6 Merge 'Backport "reader_concurrency_semaphore: don't evict inactive readers needlessly" to branch-5.1' from Botond Dénes
The patch doesn't apply cleanly, so a targeted backport PR was necessary.
I also needed to cherry-pick two patches from https://github.com/scylladb/scylladb/pull/13255 that the backported patch depends on. Decided against backporting the entire https://github.com/scylladb/scylladb/pull/13255 as it is quite an intrusive change.

Fixes: https://github.com/scylladb/scylladb/issues/11803

Closes #13516

* github.com:scylladb/scylladb:
  reader_concurrency_semaphore: don't evict inactive readers needlessly
  reader_concurrency_semaphore: add stats to record reason for queueing permits
  reader_concurrency_semaphore: can_admit_read(): also return reason for rejection
  reader_concurrency_semaphore: add set_resources()
2023-04-17 12:26:05 +03:00
Raphael S. Carvalho
73a340033e table: Fix disk-space related metrics
The total disk space used metric incorrectly reports the amount of
disk space ever used. It should report the size of
all sstables in use plus the ones waiting to be deleted.
Live disk space used, by this definition, shouldn't count the
ones waiting to be deleted,
and live sstable count shouldn't count sstables waiting to
be deleted.

Fix all that.

Fixes #12717.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit 529a1239a9)
2023-04-16 22:18:46 +03:00
Michał Chojnowski
fec4f88744 locator: token_metadata: get rid of a quadratic behaviour in get_address_ranges()
Some callees of update_pending_ranges use the variant of get_address_ranges()
which builds a hashmap of all <endpoint, owned range> pairs. For
everywhere_topology, the size of this map is quadratic in the number of
endpoints, making it big enough to cause contiguous allocations of tens of MiB
for clusters of realistic size, potentially causing trouble for the
allocator (as seen e.g. in #12724). This deserves a correction.

This patch removes the quadratic variant of get_address_ranges() and replaces
its uses with its linear counterpart.

Refs #10337
Refs #10817
Refs #10836
Refs #10837
Fixes #12724

(cherry picked from commit 9e57b21e0c)
2023-04-16 22:00:40 +03:00
Botond Dénes
802beb972e reader_concurrency_semaphore: don't evict inactive readers needlessly
Inactive readers should only be evicted to free up resources for waiting
readers. Evicting them when waiters are not admitted for any other
reason than resources is wasteful and leads to extra load later on when
these evicted readers have to be recreated and requeued.
This patch changes the logic on both the registering path and the
admission path to not evict inactive readers unless there are readers
actually waiting on resources.
A unit-test is also added, reproducing the overly-aggressive eviction and
checking that it doesn't happen anymore.

Fixes: #11803

Closes #13286

(cherry picked from commit bd57471e54)
2023-04-14 11:58:53 +03:00
Botond Dénes
1101694169 reader_concurrency_semaphore: add stats to record reason for queueing permits
When diagnosing problems, knowing why permits were queued is very
valuable. Record the reason in a new stats, one for each reason a permit
can be queued.

(cherry picked from commit 7b701ac52e)
2023-04-14 11:58:53 +03:00
Botond Dénes
d361bce0f6 reader_concurrency_semaphore: can_admit_read(): also return reason for rejection
So the caller can bump the appropriate counters or log the reason why
the request cannot be admitted.

(cherry picked from commit bb00405818)
2023-04-14 11:58:53 +03:00
Botond Dénes
7e0dcf9bc5 reader_concurrency_semaphore: add set_resources()
Allows changing the total or initial resources the semaphore has.
After calling `set_resources()` the semaphore behaves as if it had
been created with the specified amount of resources.

(cherry picked from commit ecc7c72acd)
2023-04-14 11:58:53 +03:00
Yaron Kaikov
011c5ac37e doc: update supported os for 2022.1
ubuntu22.04 is already supported on both `5.0` and `2022.1`

updating the table

Closes #13340

(cherry picked from commit c80ab78741)
2023-04-05 13:56:37 +03:00
Avi Kivity
9430465a52 Merge 'Transport server error handling fixes backport' from Gusev Petr
This is a backport of #11949

Closes #13303

* github.com:scylladb/scylladb:
  transport server: fix "request size too large" handling
  transport server: fix unexpected server errors handling
  test/cql-pytest.py: add scylla_inject_error() utility
  test/cql-pytest: add simple tests for USE statement

Fixes #12104
2023-03-26 19:55:37 +03:00
Petr Gusev
6f28b77962 transport server: fix "request size too large" handling
Calling _read_buf.close() doesn't imply eof(); some data
may have already been read into kernel or client buffers
and will be returned the next time read() is called.
When the _server._max_request_size limit was exceeded
and the _read_buf was closed, the process_request method
finished and we started processing the next request in
connection::process. The unread data from _read_buf was
treated as the header of the next request frame, resulting
in "Invalid or unsupported protocol version" error.

The existing test_shed_too_large_request was adjusted.
It was originally written with the assumption that the data
of a large query would simply be dropped from the socket
and the connection could be used to handle the
next requests. This behaviour was changed in scylladb#8800;
now the connection is closed on the Scylla side and
can no longer be used. To check there are no errors
in this case, we use Scylla metrics, getting them
from the Scylla Prometheus API.

(cherry picked from commit 3263523)
2023-03-24 13:44:36 +04:00
Petr Gusev
06d5557c42 transport server: fix unexpected server errors handling
If request processing ended with an error, it is worth
sending the error to the client through
make_error/write_response. Previously in this case we
just wrote a message to the log and didn't handle the
client connection in any way. As a result, the only
thing the client got in this case was timeout error.

A new test_batch_with_error is added. It is quite
difficult to reproduce error condition in a test,
so we use error injection instead. Passing injection_key
in the body of the request ensures that the exception
will be thrown only for this test request and
will not affect other requests that
the driver may send in the background.

Closes: scylladb#12104

(cherry picked from commit a4cf509)
2023-03-24 13:44:36 +04:00
Nadav Har'El
bc31472469 test/cql-pytest.py: add scylla_inject_error() utility
This patch adds a scylla_inject_error(), a context manager which tests
can use to temporarily enable some error injection while some test
code is running. It can be used to write tests that artificially
inject certain errors instead of trying to reach the elaborate (and
often requiring precise timing or high amounts of data) situation where
they occur naturally.

The error-injection API is Scylla-specific (it uses the Scylla REST API)
and does not work on "release"-mode builds (all other modes are supported),
so when Cassandra or release-mode build are being tested, the test which
uses scylla_inject_error() gets skipped.

Example usage:

```python
    from rest_api import scylla_inject_error
    with scylla_inject_error(cql, "injection_name", one_shot=True):
        # do something here
        ...
```

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12264

(cherry picked from commit 6d2e146aa6)
2023-03-24 13:44:36 +04:00
Nadav Har'El
fcab382e37 test/cql-pytest: add simple tests for USE statement
This patch adds a couple of simple tests for the USE statement: that
without USE one cannot create a table without explicitly specifying
a keyspace name, and with USE, it is possible.

Beyond testing these specific feature, this patch also serves as an
example of how to write more tests that need to control the effective USE
setting. Specifically, it adds a "new_cql" function that can be used to
create a new connection with a fresh USE setting. This is necessary
in such tests, because if multiple tests use the same cql fixture
and its single connection, they will share their USE setting and there
is no way to undo or reset it after being set.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #11741

(cherry picked from commit ef0da14d6f)
2023-03-24 13:44:36 +04:00
Anna Stuchlik
be6c3022ce doc: fix the service name in upgrade guides
Fixes https://github.com/scylladb/scylladb/issues/13207

This commit fixes the service and package names in
the upgrade guides 5.0-to-2022.1 and 5.1-to-2022.2.
Service name: scylla-server
Package name: scylla-enterprise

Previous PRs to fix the same issue in other
upgrade guides:
https://github.com/scylladb/scylladb/pull/12679
https://github.com/scylladb/scylladb/pull/12698

This commit must be backported to branch-5.1 and branch 5.2.

Closes #13225

(cherry picked from commit 922f6ba3dd)
2023-03-22 10:37:34 +02:00
Botond Dénes
6fa78b90b5 db/view/view_update_check: check_needs_view_update_path(): filter out non-member hosts
We currently don't clean up the system_distributed.view_build_status
table after removed nodes. This can cause false-positive check for
whether view update generation is needed for streaming.
The proper fix is to clean up this table, but that will be more
involved, and even when done, it might not be immediate. So until then,
and to be on the safe side, filter out entries belonging to unknown
hosts from said table.

Fixes: #11905
Refs: #11836

Closes #11860

(cherry picked from commit 84a69b6adb)
2023-03-22 09:08:37 +02:00
Takuya ASADA
8b5a342a92 docker: prevent hostname -i failure when server address is specified
On some docker instance configuration, hostname resolution does not
work, so our script will fail on startup because we use hostname -i to
construct cqlshrc.
To prevent the error, we can use --rpc-address or --listen-address
for the address, since it should be the same.

Fixes #12011

Closes #12115

(cherry picked from commit 642d035067)
2023-03-21 17:54:42 +02:00
Kamil Braun
6905f5056f service: storage_proxy: sequence CDC preimage select with Paxos learn
`paxos_response_handler::learn_decision` was calling
`cdc_service::augment_mutation_call` concurrently with
`storage_proxy::mutate_internal`. `augment_mutation_call` was selecting
rows from the base table in order to create the preimage, while
`mutate_internal` was writing rows to the table. It was therefore
possible for the preimage to observe the update that it accompanied,
which doesn't make any sense, because the preimage is supposed to show
the state before the update.

Fix this by performing the operations sequentially. We can still perform
the CDC mutation write concurrently with the base mutation write.

`cdc_with_lwt_test` was sometimes failing in debug mode due to this bug
and was marked flaky. Unmark it.

Fixes #12098

(cherry picked from commit 1ef113691a)
2023-03-21 17:47:13 +02:00
Anna Mikhlin
c90807d29f release: prepare for 5.1.8 2023-03-19 15:17:44 +02:00
Pavel Emelyanov
c468c61ddc Merge '[backport] reader_concurrency_semaphore:: clear_inactive_reads(): defer evicting to evict() ' from Botond Dénes
This PR backports 2f4a793457 to branch-5.1. Said patch depends on some other patches that are not part of any release yet.

Closes #13224

* github.com:scylladb/scylladb:
  reader_concurrency_semaphore:: clear_inactive_reads(): defer evicting to evict()
  reader_permit: expose operator<<(reader_permit::state)
  reader_permit: add get_state() accessor
2023-03-17 14:10:41 +03:00
Botond Dénes
05a3e97077 reader_concurrency_semaphore:: clear_inactive_reads(): defer evicting to evict()
Instead of open-coding the same logic in an incomplete way.
clear_inactive_reads() does incomplete eviction in several ways:
* it doesn't decrement _stats.inactive_reads
* it doesn't set the permit to evicted state
* it doesn't cancel the ttl timer (if any)
* it doesn't call the eviction notifier on the permit (if there is one)

The list goes on. We already have an evict() method that does all this
correctly, so use that instead of the current badly open-coded alternative.

This patch also enhances the existing test for clear_inactive_reads()
and adds a new one specifically for `stop()` being called while having
inactive reads.

Fixes: #13048

Closes #13049

(cherry picked from commit 2f4a793457)
2023-03-17 04:49:19 -04:00
Botond Dénes
41e93b2e69 reader_permit: expose operator<<(reader_permit::state)
(cherry picked from commit ec1c615029)
2023-03-17 04:48:06 -04:00
Botond Dénes
9f00af9395 reader_permit: add get_state() accessor
(cherry picked from commit 397266f420)
2023-03-17 04:48:06 -04:00
Pavel Emelyanov
7e78b29609 Update seastar submodule (cancellable rpc queue)
* seastar 328edb2b...acdf7dca (1):
  > rpc: Keep dummy frame in the outgoing queue until negotiated

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

refs: #11507
refs: #11609
2023-03-14 14:27:58 +03:00
Nadav Har'El
7007c94f5d Merge '[branch-5.1] - minimal fix for crash caused by empty primary key range in LWT update' from Jan Ciołek
This is another attempt to fix #13001 on `branch-5.1`.

In #13001 we found a test case which causes a crash on `branch-5.1` because it didn't handle `UNSET_VALUE` properly:

```python3
def test_unset_insert_where(cql, table2):
    p = unique_key_int()
    stmt = cql.prepare(f'INSERT INTO {table2} (p, c) VALUES ({p}, ?)')
    with pytest.raises(InvalidRequest, match="unset"):
        cql.execute(stmt, [UNSET_VALUE])

def test_unset_insert_where_lwt(cql, table2):
    p = unique_key_int()
    stmt = cql.prepare(f'INSERT INTO {table2} (p, c) VALUES ({p}, ?) IF NOT EXISTS')
    with pytest.raises(InvalidRequest, match="unset"):
        cql.execute(stmt, [UNSET_VALUE])
```

This problem has been fixed on `master` by PR #12517. I tried to backport it to `branch-5.1` (#13029), but this didn't go well - it was a big change that touched a lot of components. It's hard to make sure that it won't cause some unexpected issues.

Then I made a simpler fix  for `branch-5.1`, which achieves the same effect as the original PR (#13057).
The problem is that this effect includes backwards incompatible changes - it bans UNSET_VALUE in some places that `branch-5.1` used to allow.

Breaking changes are bad, so I made this PR, which does an absolutely minimal change to fix the crash.
It adds a check the moment before the crash would happen.

To make sure that everything works correctly, and to detect any possible breaking changes, I wrote a bunch of tests that validate the current behavior.
I also ported some tests from the `master` branch, at least the ones that were in line with the behavior on `branch-5.1`.

Closes #13133

* github.com:scylladb/scylladb:
  cql-pytest/test_unset: port some tests from master branch
  cql-pytest/test_unset: test unset value in UPDATEs with LWT conditions
  cql-pytest/test_unset: test unset value in UPDATEs with IF EXISTS
  cql-pytest/test_unset: test unset value in UPDATE statements
  cql-pytest/test_unset: test unset value in INSERTs with IF NOT EXISTS
  cql-pytest/test_unset: test unset value in INSERT statements
  cas_request: fix crash on unset value in primary key with LWT
2023-03-12 10:17:37 +02:00
Beni Peled
5c5a9633ea release: prepare for 5.1.7 2023-03-12 08:25:57 +02:00
Jan Ciolek
c75359d664 cql-pytest/test_unset: port some tests from master branch
I copied cql-pytest tests from the master branch,
at least the ones that were compatible with branch-5.1.

Some of them were expecting an InvalidRequest exception
in case of UNSET VALUES being present in places that
branch-5.1 allows, so I skipped these tests.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-03-09 16:22:46 +01:00
Jan Ciolek
24f76f40b7 cql-pytest/test_unset: test unset value in UPDATEs with LWT conditions
Test what happens when an UNSET_VALUE is passed to
an UPDATE statement with an LWT condition.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-03-09 16:18:34 +01:00
Jan Ciolek
3f133cfa87 cql-pytest/test_unset: test unset value in UPDATEs with IF EXISTS
Test what happens when an UNSET_VALUE is passed to
an UPDATE statement with IF EXISTS condition.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-03-09 16:18:34 +01:00
Jan Ciolek
d66e23b265 cql-pytest/test_unset: test unset value in UPDATE statements
Test what happens when an UNSET_VALUE is passed to
an UPDATE statement.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-03-09 16:18:34 +01:00
Jan Ciolek
378e8761b9 cql-pytest/test_unset: test unset value in INSERTs with IF NOT EXISTS
Add tests which test INSERT statements with IF NOT EXISTS,
when an UNSET_VALUE is passed for some column.
The tests are similar to the previous ones done for simple
INSERTs without IF NOT EXISTS.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-03-09 16:18:34 +01:00
Jan Ciolek
fc26f6b850 cql-pytest/test_unset: test unset value in INSERT statements
Add some tests which test what happens when an UNSET_VALUE
is passed to an INSERT statement.

Passing it for partition key column is impossible
because python driver doesn't allow it.

Passing it for a clustering key column causes Scylla
to silently ignore the INSERT.

Passing it for a regular or static column
causes this column to remain unchanged,
as expected.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-03-09 16:18:33 +01:00
Jan Ciolek
7663dc31b8 cas_request: fix crash on unset value in primary key with LWT
Doing an LWT INSERT/UPDATE and passing UNSET_VALUE
for the primary key column used to cause a crash.

This is a minimal fix for this crash.

The crash backtrace pointed to a place where
we tried doing .front() on an empty vector
of primary key ranges.

I added a check that the vector isn't empty.
If it is empty, we throw an error
and mention that it's most likely
caused by an unset value.

This has been fixed on master,
but the PR that fixed it introduced
breaking changes, which I don't want
to add to branch-5.1.

This fix is absolutely minimal
- it performs the check at the
last moment before a crash.

It's not the prettiest, but it works
and can't introduce breaking changes,
because the new code gets activated
only in cases that would've caused
a crash.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-03-09 16:12:18 +01:00
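The shape of that minimal fix can be sketched as a guard before the unchecked front() call. The names below are illustrative stand-ins, not the actual cas_request code:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Illustrative stand-in for a primary-key range.
struct pk_range { std::string start, end; };

// Before the fix: calling ranges.front() on an empty vector is undefined
// behavior (the crash). The minimal fix: check emptiness first and throw
// a descriptive error instead.
const pk_range& first_range_checked(const std::vector<pk_range>& ranges) {
    if (ranges.empty()) {
        throw std::invalid_argument(
            "No primary key ranges; most likely caused by an unset value "
            "in a primary key column");
    }
    return ranges.front();
}
```

Because the guard only fires on a vector that would previously have crashed, it cannot change behavior in any working case.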
Jan Ciolek
53b6e720f6 cql3: preserve binary_operator.order in search_and_replace
There was a bug in `expr::search_and_replace`:
it didn't preserve the `order` field of binary_operator.

The `order` field is used to mark relations created
using the SCYLLA_CLUSTERING_BOUND.
It is a CQL feature used for internal queries inside Scylla.
It means that we should handle the restriction as a raw
clustering bound, not as an expression in the CQL language.

Losing the SCYLLA_CLUSTERING_BOUND marker could cause issues:
the database could end up selecting the wrong clustering ranges.

Fixes: #13055

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>

Closes #13056

(cherry picked from commit aa604bd935)
2023-03-09 12:52:50 +02:00
Botond Dénes
46d6145b37 sstables/sstable: validate_checksums(): force-check EOF
EOF is only guaranteed to be set if one tried to read past the end of the
file. So when checking for EOF, also try to read some more. This
should force the EOF flag into a correct value. We can then check that
the read yielded 0 bytes.
This should ensure that `validate_checksums()` will not falsely declare
the validation to have failed.

Fixes: #11190

Closes #12696

(cherry picked from commit 693c22595a)
2023-03-09 12:31:00 +02:00
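The idea generalizes beyond Seastar files: an EOF flag is typically only set after a read has attempted to go past the end, so a robust end-of-file check performs one more read and verifies it returned zero bytes. A minimal sketch using a plain stream, not the actual Seastar file API:

```cpp
#include <cassert>
#include <istream>
#include <sstream>

// eof() alone is not reliable: it is only guaranteed to be set after a
// read has tried to go past the end. Force it into a correct value by
// attempting one more read, then check that zero bytes came back.
bool at_end_forced(std::istream& in) {
    char buf[64];
    in.read(buf, sizeof(buf));  // attempt to read past the end
    return in.gcount() == 0 && in.eof();
}
```

Checking `gcount() == 0` (rather than `eof()` alone) is what prevents a false "validation failed" verdict when the previous read stopped exactly at the last byte.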
Wojciech Mitros
772ac59299 functions: initialize aggregates on scylla start
Currently, UDAs can't be reused if Scylla has been
restarted since they were created. This is
caused by the missing initialization of saved
UDAs that should have inserted them into the
cql3::functions::functions::_declared map, that
should store all (user-)created functions and
aggregates.

This patch adds the missing implementation in a way
that's analogous to the method of inserting UDFs into
the _declared map.

Fixes #11309

(cherry picked from commit e558c7d988)
2023-03-09 12:21:07 +02:00
Tomasz Grabiec
1f334e48b2 row_cache: Destroy coroutine under region's allocator
The reason is an alloc-dealloc mismatch of position_in_partition objects
allocated by cursors inside the coroutine object stored in the update
variable in row_cache::do_update().

It is allocated under the cache region, but in case of an exception it will
be destroyed under the standard allocator. If the update is successful, it
will be cleared under the region allocator, so there is no problem in the
normal case.

Fixes #12068

Closes #12233

(cherry picked from commit 992a73a861)
2023-03-08 20:53:24 +02:00
Gleb Natapov
389050e421 lwt: do not destroy capture in upgrade_if_needed lambda since the lambda is used more than once
If the capture is destroyed on the first call, the second call may crash.

Fixes: #12958

Message-Id: <Y/sks73Sb35F+PsC@scylladb.com>
(cherry picked from commit 1ce7ad1ee6)
2023-03-08 18:52:01 +02:00
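The bug pattern here is general C++: a lambda that moves from (or otherwise destroys) one of its captures is only safe to invoke once. A minimal illustration of the safe form, with hypothetical names rather than the actual LWT code:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// BAD (the bug pattern): a lambda body doing `auto local = std::move(data);`
// consumes the capture on the first call; a second call dereferences a
// moved-from (null) pointer.
// GOOD: read the capture without consuming it, so the lambda stays reusable.
int call_twice() {
    auto data = std::make_shared<std::string>("abc");
    auto reusable = [data]() -> std::size_t {
        return data->size();  // no move: safe to call repeatedly
    };
    return static_cast<int>(reusable() + reusable());
}
```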
Anna Stuchlik
2152b765a6 doc: Update the documentation landing page
This commit makes the following changes to the docs landing page:

- Adds the ScyllaDB enterprise docs as one of three tiles.

- Modifies the three tiles to reflect the three flavors of ScyllaDB.

- Moves the "New to ScyllaDB? Start here!" under the page title.

- Renames "Our Products" to "Other Products" to list the products other
  than ScyllaDB itself. In addition, the boxes are enlarged to
  large-4 to look better.

The major purpose of this commit is to expose the ScyllaDB
documentation.

docs: fix the link
(cherry picked from commit 27bb8c2302)

Closes #13086
2023-03-06 14:19:30 +02:00
Pavel Emelyanov
d43b6db152 azure_snitch: Handle empty zone returned from IMDS
The Azure metadata API may sometimes return an empty zone. If that happens,
shard-0 gets an empty string as its rack, but propagates UNKNOWN_RACK to
other shards.

An empty zone response should be handled regardless.

refs: #12185

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #12274

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-02 09:16:02 +03:00
Pavel Emelyanov
a514e60e65 snitch: Check http response codes to be OK
Several snitch drivers make HTTP requests to get
region/dc/zone/rack/whatever from the cloud provider. They blindly rely
on the response being successful and read the response body to parse the
data they need.

That's not nice; add checks that requests finish with HTTP OK statuses.

refs: #12185

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #12287

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-02 09:16:02 +03:00
Anna Stuchlik
6eb70caba9 doc: fixes https://github.com/scylladb/scylladb/issues/12954, adds the minimal version from which the 2021.1-to-2022.1 upgrade is supported for Ubuntu, Debian, and image
Closes #12974

(cherry picked from commit 91b611209f)
2023-02-28 13:03:05 +02:00
Botond Dénes
dd094f1230 types: unserialize_value for multiprecision_int,bool: don't read uninitialized memory
Check the first fragment before dereferencing it, the fragment might be
empty, in which case move to the next one.
Found by running range scan tests with random schema and random data.

Fixes: #12821
Fixes: #12823
Fixes: #12708

Closes #12824

(cherry picked from commit ef548e654d)
2023-02-23 22:38:24 +02:00
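The pattern behind the fix: in a fragmented buffer, individual fragments may be empty, so a reader must skip empty fragments before dereferencing the front byte. A simplified model using a vector of strings as "fragments", not Scylla's actual fragmented-buffer type:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Return the first byte of a fragmented value, skipping empty fragments.
// Dereferencing fragments.front().front() without the emptiness check
// reads invalid memory when the first fragment happens to be empty.
char first_byte(const std::vector<std::string>& fragments) {
    for (const auto& frag : fragments) {
        if (!frag.empty()) {
            return frag.front();
        }
    }
    throw std::out_of_range("value is empty");
}
```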
Yaron Kaikov
530600a646 release: prepare for 5.1.6 2023-02-23 14:28:29 +02:00
Kefu Chai
c8e5f8c66b tools/schema_loader: do not return ref to a local variable
We should never return a reference to a local variable,
so in this change a reference to a static variable is returned
instead. This should address the following warning from Clang 17:

```
/home/kefu/dev/scylladb/tools/schema_loader.cc:146:16: error: returning reference to local temporary object [-Werror,-Wreturn-stack-address]
        return {};
               ^~
```

Fixes #12875
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12876

(cherry picked from commit 6eab8720c4)
2023-02-22 22:02:51 +02:00
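The warning and the fix can be reproduced in isolation. This is an illustrative reduction, not the schema_loader code itself:

```cpp
#include <cassert>
#include <vector>

// BAD (what Clang's -Wreturn-stack-address flags):
//     const std::vector<int>& bad() { return {}; }  // dangling reference
// The braced temporary dies at the end of the return statement.
//
// GOOD: return a reference to an object with static storage duration,
// which outlives every caller and is the same object on every call.
const std::vector<int>& empty_vec() {
    static const std::vector<int> empty;
    return empty;
}
```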
Gleb Natapov' via ScyllaDB development
ca7a13cad2 lwt: upgrade stored mutations to the latest schema during prepare
Currently they are upgraded during learn on a replica. There are two
problems with this. First, the column mapping may not exist on a replica
if it missed this particular schema (because it was down, for instance)
and the mapping history is not part of the schema. In this case "Failed
to look up column mapping for schema version" will be thrown. Second, the
LWT request coordinator may not have the schema for the mutation either
(because it was freed from the registry already), and when a replica
tries to retrieve the schema from the coordinator the retrieval will fail,
causing the whole request to fail with "Schema version XXXX not found".

Both of those problems can be fixed by upgrading stored mutations
during prepare on the node they are stored at. To upgrade the mutation its
column mapping is needed, and it is guaranteed to be present
at the node the mutation is stored at, since it is a prerequisite for
storing it that the corresponding schema is available. After that the
mutation is processed using the latest schema, which will be available on
all nodes.

Fixes #10770

Message-Id: <Y7/ifraPJghCWTsq@scylladb.com>
(cherry picked from commit 15ebd59071)
2023-02-22 21:58:30 +02:00
Tomasz Grabiec
d401551020 db: Fix trim_clustering_row_ranges_to() for non-full keys and reverse order
trim_clustering_row_ranges_to() is broken for non-full keys in reverse
mode. It will trim the range to
position_in_partition_view::after_key(full_key) instead of
position_in_partition_view::before_key(key), hence it will include the
key in the resulting range rather than exclude it.

Fixes #12180
Refs #1446

(cherry picked from commit 536c0ab194)
2023-02-22 21:52:46 +02:00
Tomasz Grabiec
d952bf4035 types: Fix comparison of frozen sets with empty values
A frozen set can be part of the clustering key, and with compact
storage, the corresponding key component can have an empty value.

Comparison was not prepared for this, the iterator attempts to
deserialize the item count and will fail if the value is empty.

Fixes #12242

(cherry picked from commit 232ce699ab)
2023-02-22 21:44:37 +02:00
Michał Chojnowski
b346136e98 utils: config_file: fix handling of workdir,W in the YAML file
Option names given in db/config.cc are handled for the command line by passing
them to boost::program_options, and by YAML by comparing them with YAML
keys.
boost::program_options has logic for understanding the
long_name,short_name syntax, so for a "workdir,W" option both --workdir and -W
worked, as intended. But our YAML config parsing doesn't have this logic
and expected "workdir,W" verbatim, which is obviously not intended. Fix that.

Fixes #7478
Fixes #9500
Fixes #11503

Closes #11506

(cherry picked from commit af7ace3926)
2023-02-22 21:33:04 +02:00
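A sketch of the normalization needed on the YAML side: split the boost::program_options-style "long_name,short_name" spec and compare YAML keys against the long name only. The helper names are hypothetical, not the actual config_file code:

```cpp
#include <cassert>
#include <string>
#include <string_view>

// "workdir,W" -> "workdir": strip the optional ",X" short-name suffix
// that boost::program_options understands but YAML keys never carry.
std::string long_name(std::string_view spec) {
    auto comma = spec.find(',');
    return std::string(spec.substr(0, comma));  // npos -> whole string
}

bool yaml_key_matches(std::string_view spec, std::string_view yaml_key) {
    return long_name(spec) == yaml_key;
}
```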
Takuya ASADA
87e267213d scylla_coredump_setup: fix coredump timeout settings
We currently configure only TimeoutStartSec, but that's probably not
enough to prevent coredump timeouts, since TimeoutStartSec is the maximum
waiting time for service startup, and there is another directive to
specify maximum service running time (RuntimeMaxSec).

To fix the problem, we should specify RuntimeMaxSec and TimeoutSec (the
latter configures both TimeoutStartSec and TimeoutStopSec).

Fixes #5430

Closes #12757

(cherry picked from commit bf27fdeaa2)
2023-02-19 21:13:59 +02:00
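The described change corresponds to a systemd unit drop-in along these lines. The directive names are systemd's own; the file path and the `infinity` values are illustrative assumptions, not taken from the patch:

```ini
# Illustrative drop-in path, not necessarily the one the patch writes:
# /etc/systemd/system/scylla-save-coredump.service.d/timeout.conf
[Service]
# TimeoutSec sets both TimeoutStartSec and TimeoutStopSec.
TimeoutSec=infinity
# RuntimeMaxSec bounds how long the service may keep *running* once
# started, which is the limit that matters for a long coredump capture.
RuntimeMaxSec=infinity
```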
Botond Dénes
5bda9356d5 Merge 'doc: fix the service name from "scylla-enterprise-server" "to "scylla-server"' from Anna Stuchlik
Related https://github.com/scylladb/scylladb/issues/12658.

This issue fixes the bug in the upgrade guides for the released versions.

Closes #12679

* github.com:scylladb/scylladb:
  doc: fix the service name in the upgrade guide for patch releases versions 2022
  doc: fix the service name in the upgrade guide from 2021.1 to 2022.1

(cherry picked from commit 325246ab2a)
2023-02-17 12:20:26 +02:00
Botond Dénes
f9fe48ad89 Merge 'Backport compaction-backlog-tracker fixes to branch-5.1' from Raphael "Raph" Carvalho
Both patches are important to fix inefficiencies when updating the backlog tracker, which can manifest as a reactor stall on special events like a schema change.

A simple conflict was resolved in the first patch, since master has compaction groups. It was very easy to resolve.

Regression since 1d9f53c881, which is present in 5.1 onwards. So probably it merits a backport to 5.2 too.

Closes #12769

* github.com:scylladb/scylladb:
  compaction: Fix inefficiency when updating LCS backlog tracker
  table: Fix quadratic behavior when inserting sstables into tracker on schema change
2023-02-15 07:26:00 +02:00
Raphael S. Carvalho
0c9a0faf0d compaction: Fix inefficiency when updating LCS backlog tracker
LCS backlog tracker uses STCS tracker for L0. Turns out LCS tracker
is calling STCS tracker's replace_sstables() with empty arguments
even when higher levels (> 0) *only* had sstables replaced.
This unnecessary call to STCS tracker will cause it to recompute
the L0 backlog, yielding the same value as before.

As LCS has a fragment size of 0.16G on higher levels, we may be
updating the tracker multiple times during incremental compaction,
which operates on SSTables on higher levels.

Inefficiency is fixed by only updating the STCS tracker if any
L0 sstable is being added or removed from the table.

This may be fixing a quadratic behavior during boot or refresh,
as new sstables are loaded one by one.
Higher levels have a substantially higher number of sstables;
therefore, updating the STCS tracker only when level 0 changes
significantly reduces the number of times the L0 backlog is recomputed.

Refs #12499.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #12676

(cherry picked from commit 1b2140e416)
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-02-07 12:45:34 -03:00
Raphael S. Carvalho
47dcfd866c table: Fix quadratic behavior when inserting sstables into tracker on schema change
Each time the backlog tracker is informed about a new or old sstable, it
will recompute the static part of the backlog, whose complexity is
proportional to the total number of sstables.
On schema change, we're calling backlog_tracker::replace_sstables()
for each existing sstable, therefore producing O(N^2) complexity.

Fixes #12499.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #12593

(cherry picked from commit 87ee547120)
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-02-07 12:43:30 -03:00
Beni Peled
5c9ecd5604 release: prepare for 5.1.5 2023-02-06 14:47:49 +02:00
Anna Stuchlik
1a37b85d14 docs: fix the option name from compaction to compression on the Data Definition page
Fixes the option name in the "Other table options" table on the Data Definition page.

Fixes #12334

Closes #12382

(cherry picked from commit ea7e23bf92)
2023-02-05 20:06:31 +02:00
Botond Dénes
5070ddb723 sstables: track decompressed buffers
Convert decompressed temporary buffers into tracked buffers just before
returning them to the upper layer. This ensures these buffers are known
to the reader concurrency semaphore and it has an accurate view of the
actual memory consumption of reads.

Fixes: #12448

Closes #12454

(cherry picked from commit c4688563e3)
2023-02-05 20:06:31 +02:00
Tomasz Grabiec
7480af58e5 row_cache: Fix violation of "oldest versions are evicted first" when evicting last dummy
Consider the following MVCC state of a partition:

   v2: ==== <7> [entry2] ==== <9> ===== <last dummy>
   v1: ================================ <last dummy> [entry1]

Where === means a continuous range and --- means a discontinuous range.

After two LRU items are evicted (entry1 and entry2), we will end up with:

   v2: ---------------------- <9> ===== <last dummy>
   v1: ================================ <last dummy> [entry1]

This will cause readers to incorrectly think there are no rows before
entry <9>, because the range is continuous in v1, and continuity of a
snapshot is a union of continuous intervals in all versions. The
cursor will see the interval before <9> as continuous and the reader
will produce no rows.

This is only temporary, because current MVCC merging rules are such
that the flag on the latest entry wins, so we'll end up with this once
v1 is no longer needed:

   v2: ---------------------- <9> ===== <last dummy>

...and the reader will go to sstables to fetch the evicted rows before
entry <9>, as expected.

The bug is in rows_entry::on_evicted(), which treats the last dummy
entry in a special way: it doesn't evict it, and it doesn't clear the
continuity by omission.

The situation is not easy to trigger because it requires certain
eviction pattern concurrent with multiple reads of the same partition
in different versions, so across memtable flushes.

Closes #12452

(cherry-picked from commit f97268d8f2)

Fixes #12451.
2023-02-05 20:06:31 +02:00
Raphael S. Carvalho
43d46a241f compaction: LCS: don't reshape all levels if only a single breaks disjointness
LCS reshape is compacting all levels if a single one breaks
disjointness. That's unnecessary work because rewriting that single
level is enough to restore disjointness. If multiple levels break
disjointness, they'll each be reshaped in their own iteration,
reducing operation time for each step and the disk space requirement,
as input files can be released incrementally.
Incremental compaction is not applied to reshape yet, so we need to
avoid "major compaction" to avoid the space overhead.
But space overhead is not the only problem: the inefficiency
in deciding what to reshape when overlapping is detected also
motivated this patch.

Fixes #12495.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #12496

(cherry picked from commit f2f839b9cc)
2023-02-05 20:06:31 +02:00
Wojciech Mitros
3807020a7b forward_service: prevent heap use-after-free of forward_aggregates
Currently, we create `forward_aggregates` inside a function that
returns the result of a future lambda that captures these aggregates
by reference. As a result, the aggregates may be destructed before
the lambda finishes, resulting in a heap use-after-free.

To prolong the lifetime of these aggregates, we cannot use a move
capture, because the lambda is wrapped in a with_thread_if_needed()
call on these aggregates. Instead, we fix this by wrapping the
entire return statement in a do_with().

Fixes #12528

Closes #12533

(cherry picked from commit 5f45b32bfa)
2023-02-05 20:06:31 +02:00
Botond Dénes
d6fb20f30e types: is_tuple(): handle reverse types
Currently reverse types match the default case (false), even though they
might be wrapping a tuple type. One user-visible effect of this is that
a schema, which has a reversed<frozen<UDT>> clustering key component,
will have this component incorrectly represented in the schema cql dump:
the UDT will lose the frozen attribute. When attempting to recreate
this schema based on the dump, it will fail, as only frozen UDTs are
allowed in primary key components.

Fixes: #12576

Closes #12579

(cherry picked from commit ebc100f74f)
2023-02-05 20:06:31 +02:00
Calle Wilund
33c20eebe6 alternator::streams: Sort tables in list_streams to ensure no duplicates
Fixes #12601 (maybe?)

Sort the set of tables on ID. This should ensure we never
generate duplicates in a paged listing here. Can obviously miss things if they
are added between paged calls and end up with a "smaller" UUID/ARN, but that
is to be expected.

(cherry picked from commit da8adb4d26)
2023-02-05 20:06:28 +02:00
Benny Halevy
f3a6af663d view: row_lock: lock_ck: find or construct row_lock under partition lock
Since we're potentially searching the row_lock in parallel to acquiring
the read_lock on the partition, we're racing with row_locker::unlock
that may erase the _row_locks entry for the same clustering key, since
there is no lock to protect it up until the partition lock has been
acquired and the lock_partition future is resolved.

This change moves the code to search for or allocate the row lock
_after_ the partition lock has been acquired to make sure we're
synchronously starting the read/write lock function on it, without
yielding, to prevent this use-after-free.

This adds an allocation for copying the clustering key in advance
even if a row_lock entry already exists, which wasn't needed before.
It only slows us down (a bit) when there is contention and the lock
already existed when we want to go locking. In the fast path there
is no contention, and then the code already had to create the lock
and copy the key. In any case, the penalty of copying the key once
is tiny compared to the rest of the work that view updates are doing.

This is required on top of 5007ded2c1 as
seen in https://github.com/scylladb/scylladb/issues/12632
which is closely related to #12168 but demonstrates a different race
causing use-after-free.

Fixes #12632

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 4b5e324ecb)
2023-02-05 17:38:29 +02:00
Anna Stuchlik
1685af9829 docs: fixes https://github.com/scylladb/scylladb/issues/12654, update the links to the Download Center
Closes #12655

(cherry picked from commit 64cc4c8515)
2023-02-05 17:20:45 +02:00
Anna Stuchlik
bb880c7658 doc: fixes https://github.com/scylladb/scylladb/issues/12672, fix the redirects to the Cloud docs
Closes #12673

(cherry picked from commit 2be131da83)
2023-02-05 17:17:46 +02:00
Kefu Chai
f952d397e8 cql3/selection: construct string_view using char* not size
before this change, we construct an sstring from a comma expression,
which evaluates to the return value of `name.size()`, but what we
expect is `sstring(const char*, size_t)`.

in this change

* instead of passing the size of the string_view,
  both its address and size are used
* `std::string_view` is constructed instead of sstring, for better
  performance, as we don't need to perform a deep copy

the issue is reported by GCC-13:

```
In file included from cql3/selection/selectable.cc:11:
cql3/selection/field_selector.hh:83:60: error: ignoring return value of function declared with 'nodiscard' attribute [-Werror,-Wunused-result]
        auto sname = sstring(reinterpret_cast<const char*>(name.begin(), name.size()));
                                                           ^~~~~~~~~~
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12666

(cherry picked from commit 186ceea009)

Fixes #12739.

(cherry picked from commit b588b19620)
2023-02-05 13:51:22 +02:00
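The bug is easy to reproduce: a comma expression inside the cast's parentheses discards the pointer and yields the size. A minimal before/after with standard types rather than Scylla's sstring:

```cpp
#include <cassert>
#include <string>
#include <string_view>

std::string_view field_name(const std::string& name) {
    // BAD (the original bug):
    //   reinterpret_cast<const char*>(name.data(), name.size())
    // The comma expression evaluates name.data(), discards it, and casts
    // name.size() -- an integer -- to a pointer.
    //
    // GOOD: pass pointer and size as two separate constructor arguments.
    return std::string_view(name.data(), name.size());
}
```

Returning `std::string_view` instead of an owning string also avoids the deep copy, mirroring the performance point in the commit message.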
Michał Chojnowski
5e88421360 commitlog: fix total_size_on_disk accounting after segment file removal
Currently, segment file removal first calls `f.remove_file()` and
does `total_size_on_disk -= f.known_size()` later.
However, `remove_file()` resets `known_size` to 0, so in effect
the freed space is not accounted for.

`total_size_on_disk` is not just a metric. It is also responsible
for deciding whether a segment should be recycled -- it is recycled
only if `total_size_on_disk - known_size < max_disk_size`.
Therefore this bug has dire performance consequences:
if `total_size_on_disk - known_size` ever exceeds `max_disk_size`,
the recycling of commitlog segments will stop permanently, because
`total_size_on_disk - known_size` will never go back below
`max_disk_size` due to the accounting bug. All new segments from this
point will be allocated from scratch.

The bug was uncovered by a QA performance test. It isn't easy to trigger --
it took the test 7 hours of constant high load to step into it.
However, the fact that the effect is permanent, and degrades the
performance of the cluster silently, makes the bug potentially quite severe.

The bug can be easily spotted with Prometheus as infinitely rising
`commitlog_total_size_on_disk` on the affected shards.

Fixes #12645

Closes #12646

(cherry picked from commit fa7e904cd6)
2023-02-01 21:54:52 +02:00
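The fix is an ordering problem: capture the size before the removal resets it. A minimal model of the accounting, with hypothetical names rather than the actual commitlog code:

```cpp
#include <cassert>
#include <cstdint>

struct segment {
    uint64_t known_size = 0;
    void remove_file() { known_size = 0; }  // removal resets the size
};

// BAD ordering (the bug): call f.remove_file() first, then subtract
// f.known_size -- which is now 0 -- so total_size_on_disk never shrinks.
// GOOD ordering: remember the size, remove the file, then subtract.
uint64_t remove_segment(segment& f, uint64_t total_size_on_disk) {
    const uint64_t freed = f.known_size;
    f.remove_file();
    return total_size_on_disk - freed;
}
```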
Kamil Braun
1945102ca0 docs: fix problems with Raft documentation
Fix some problems in the documentation, e.g. it is not possible to
enable Raft in an existing cluster in 5.0, but the documentation claimed
that it is.

(cherry picked from commit 1cc68b262e)

Cherry-pick note: the original commit added a lot of new stuff like
describing the Raft upgrade procedure, but also fixed problems with the
existing documentation. In this backport we include only the latter.

Closes #12582
2023-01-24 13:35:24 +02:00
Anna Mikhlin
be3f6f8c7b release: prepare for 5.1.4 2023-01-22 15:29:48 +02:00
Nadav Har'El
94735f63a3 Merge 'doc: add the upgrade guide for ScyllaDB 5.1 to ScyllaDB Enterprise 2022.2' from Anna Stuchlik
Fix https://github.com/scylladb/scylladb/issues/12315

This PR adds the upgrade guide from ScyllaDB 5.1 to ScyllaDB Enterprise 2022.2.
Instead of adding separate guides per platform, I've merged the information to create one platform-agnostic guide, similar to what we did for [OSS->OSS](https://docs.scylladb.com/stable/upgrade/upgrade-opensource/upgrade-guide-from-5.0-to-5.1/) and [Enterprise->Enterprise ](https://github.com/scylladb/scylladb/pull/12339)guides.

Closes #12450

* github.com:scylladb/scylladb:
  doc: add the new upgrade guide to the toctree and fix its name
  docs: add the upgrade guide from ScyllaDB 5.1 to ScyllaDB Enterprise 2022.2

(cherry picked from commit 7192283172)
2023-01-20 14:29:59 +01:00
Michał Sala
b0d28919c0 forward_service: fix timeout support in parallel aggregates
`forward_request` verb carried information about timeouts using
`lowres_clock::time_point` (that came from local steady clock
`seastar::lowres_clock`). The time point was produced on one node and
later compared against another node's `lowres_clock`. That behavior
was wrong (`lowres_clock::time_point`s produced with different
`lowres_clock`s cannot be compared) and could lead to delayed or
premature timeout.

To fix this issue, `lowres_clock::time_point` was replaced with
`lowres_system_clock::time_point` in `forward_request` verb.
Representation to which both time point types serialize is the same
(64-bit integer denoting the count of elapsed nanoseconds), so it was
possible to do an in-place switch of those types using logic suggested
by @avikivity:
    - using steady_clock is just broken, so we aren't taking anything
        from users by breaking it further
    - once all nodes are upgraded, it magically starts to work

Closes #12529

(cherry picked from commit bbbe12af43)

Fixes #12458
2023-01-18 14:09:37 +02:00
Anna Mikhlin
addc4666d5 release: prepare for 5.1.3 2023-01-12 15:51:01 +02:00
Botond Dénes
a14ffbd5e2 Merge 'Backport 5.1 cleanup compaction flush memtable' from Benny Halevy
This is a backport of 9fa1783892 (#11902) to branch-5.1

Flush the memtable before cleaning up the table so as not to leave any disowned tokens in the memtable,
as they might be resurrected if left there.

Refs #1239

Closes #12490

* github.com:scylladb/scylladb:
  table: perform_cleanup_compaction: flush memtable
  table: add perform_cleanup_compaction
  api: storage_service: add logging for compaction operations et al
2023-01-11 08:03:35 +02:00
Benny Halevy
ea56ecace0 table: perform_cleanup_compaction: flush memtable
We don't explicitly clean up the memtable, while
it might hold tokens disowned by the current node.

Flush the memtable before performing cleanup compaction
to make sure all tokens in the memtable are cleaned up.

Note that non-owned ranges are invalidated in the cache
in compaction_group::update_main_sstable_list_on_compaction_completion
using desc.ranges_for_cache_invalidation.

Fixes #1239

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit eb3a94e2bc)
2023-01-11 07:55:43 +02:00
Benny Halevy
fe8d8f97e2 table: add perform_cleanup_compaction
Move the integration with compaction_manager
from the api layer to the tabel class so
it can also make sure the memtable is cleaned up in the next patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit fc278be6c4)
2023-01-11 07:51:29 +02:00
Benny Halevy
44e920cbb0 api: storage_service: add logging for compaction operations et al
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from 85523c45c0)
2023-01-11 07:46:09 +02:00
Michał Chojnowski
b114551d53 configure: don't reduce parsers' optimization level to 1 in release
The line modified in this patch was supposed to increase the
optimization levels of parsers in debug mode to 1, because they
were too slow otherwise. But as a side effect, it also reduced the
optimization level in release mode to 1. This is not a problem
for the CQL frontend, because statement preparation is not
performance-sensitive, but it is a serious performance problem
for Alternator, where it lies in the hot path.

Fix this by applying -O1 only to debug modes.

Fixes #12463

Closes #12460

(cherry picked from commit 08b3a9c786)
2023-01-08 01:34:56 +02:00
Nadav Har'El
099145fe9a materialized view: fix bug in some large modifications to base partitions
Sometimes a single modification to a base partition requires updates to
a large number of view rows. A common example is deletion of a base
partition containing many rows. A large BATCH is also possible.

To avoid large allocations, we split the large amount of work into
batches of 100 (max_rows_for_view_updates) rows each. The existing code
assumed an empty result from one of these batches meant that we are
done. But this assumption was incorrect: There are several cases when
a base-table update may not need a view update to be generated (see
can_skip_view_updates()) so if all 100 rows in a batch were skipped,
the view update stopped prematurely. This patch includes two tests
showing when this bug can happen - one test using a partition deletion
with a USING TIMESTAMP causing the deletion to not affect the first
100 rows, and a second test using a specially-crafted large BATCH.
These use cases are fairly esoteric, but in fact hit a user in the
wild, which led to the discovery of this bug.

The fix is fairly simple: To detect when build_some() is done it is no
longer enough to check if it returned zero view-update rows; Rather,
it explicitly returns whether or not it is done as an std::optional.

The patch includes several tests for this bug, which pass on Cassandra,
failed on Scylla before this patch, and pass with this patch.

Fixes #12297.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12305

(cherry picked from commit 92d03be37b)
2023-01-04 10:05:18 +02:00
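The control-flow fix generalizes: when work proceeds in fixed-size batches and a batch can legitimately produce zero output, "empty result" cannot mean "done"; completion has to be signalled explicitly, e.g. via std::optional. A minimal sketch of that shape, not the actual view-building code (the even-number filter stands in for can_skip_view_updates()):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

struct batch_result {
    std::vector<int> out;  // may be empty even though work remains
    std::size_t next;      // where the next batch starts
};

// Process up to `batch` items starting at `pos`; skipped items produce no
// output. Returns std::nullopt only when the input is exhausted -- never
// infer "done" from an empty `out`.
std::optional<batch_result> build_some(const std::vector<int>& in,
                                       std::size_t pos, std::size_t batch) {
    if (pos >= in.size()) {
        return std::nullopt;  // done: explicit, not "empty output"
    }
    batch_result r{{}, std::min(pos + batch, in.size())};
    for (std::size_t i = pos; i < r.next; ++i) {
        if (in[i] % 2 == 0) {  // stand-in for "this row needs a view update"
            r.out.push_back(in[i]);
        }
    }
    return r;  // possibly empty, but not "done"
}
```

With the buggy "stop on empty result" rule, a batch where every row is skipped would end the loop prematurely; with the std::optional signal, the caller keeps going until std::nullopt.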
Avi Kivity
0bdbce90f4 Merge 'reader_concurrency_semaphore: fix waiter/inactive race' from Botond Dénes
We recently (in 7fbad8de87) made sure all admission paths can trigger the eviction of inactive reads. As reader eviction happens in the background, a mechanism was added to make sure only a single eviction fiber was running at any given time. This mechanism, however, had a preemption point between stopping the fiber and releasing the evict lock. This gave an opportunity for new waiters or inactive readers to be added without the fiber acting on them. Since it still held the lock, it also prevented other eviction fibers from starting. This could create a situation where the semaphore could admit new reads by evicting inactive ones, but it still has waiters. Since an empty waitlist is also an admission criterion, once one waiter is wrongly added, many more can accumulate.
This series fixes this by ensuring the lock is released in the instant the fiber decides there is no more work to do.
It also fixes the assert failure on recursive eviction and adds detection of the inactive/waiter contradiction.

Fixes: #11923
Refs: #11770

Closes #12026

* github.com:scylladb/scylladb:
  reader_concurrency_semaphore: do_wait_admission(): detect admission-waiter anomaly
  reader_concurrency_semaphore: evict_readers_in_the_background(): eliminate blind spot
  reader_concurrency_semaphore: do_detach_inactive_read(): do a complete detach

(cherry picked from commit 15ee8cfc05)
2023-01-03 16:45:51 +02:00
Botond Dénes
8ffdb8546b reader_concurrency_semaphore: unify admission logic across all paths
The semaphore currently has two admission paths: the
obtain_permit()/with_permit() methods, which admit permits on user
request (the front door), and maybe_admit_waiters(), which admits
permits based on internal events like memory resources being returned
(the back door). The two paths used their own admission conditions
and naturally this means that they diverged in time. Notably,
maybe_admit_waiters() did not look at inactive readers assuming that if
there are waiters there cannot be inactive readers. This is not true
however since we merged the execution-stage into the semaphore. Waiters
can queue up even when there are inactive reads and thus
maybe_admit_waiters() has to consider evicting some of them to see if
this would allow for admitting new reads.
To avoid such divergence in the future, the admission logic was moved
into a new method can_admit_read() which is now shared between the two
method families. This method now checks for the possibility of evicting
inactive readers as well.
The admission logic was tuned slightly to only consider evicting
inactive readers if there is a real possibility that this will result
in admissions: notably, before this patch, resource availability was
checked before the stall condition (used permits == blocked permits),
so we could evict readers even when this couldn't help.
Because now eviction can be started from maybe_admit_waiters(), which is
also downstream from eviction, we added a flag to avoid recursive
evict -> maybe admit -> evict ... loops.
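The recursion guard described above follows a common pattern. Here is a minimal, hypothetical Python sketch of it (names like `can_admit_by_evicting` are invented for illustration; the real implementation is the semaphore's C++ code):

```python
class AdmissionControl:
    """Sketch of the flag that breaks evict -> admit -> evict loops."""

    def __init__(self):
        self._evicting = False   # the recursion guard
        self.evictions = 0

    def maybe_admit_waiters(self):
        # Only start an eviction if one isn't already in progress.
        if self.can_admit_by_evicting() and not self._evicting:
            self.evict_readers()

    def can_admit_by_evicting(self):
        return True              # pretend eviction could always help

    def evict_readers(self):
        self._evicting = True
        try:
            self.evictions += 1
            # Evicting an inactive read returns resources, which calls
            # back into maybe_admit_waiters(); the flag stops recursion.
            self.maybe_admit_waiters()
        finally:
            self._evicting = False
```

A single call to `maybe_admit_waiters()` performs exactly one eviction instead of recursing indefinitely.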

Fixes: #11770

Closes #11784

(cherry picked from commit 7fbad8de87)
2023-01-03 16:45:17 +02:00
Takuya ASADA
db382697f1 scylla_setup: fix incorrect type definition on --online-discard option
The --online-discard option was defined as a string parameter, since it
doesn't specify "action=", yet it had a boolean default value
(default=True). This broke "provisioning in a similar environment",
since the code assumed a boolean value implies "action='store_true'",
but it doesn't.

We should change the type of the option to int, and also specify
"choices=[0, 1]" just like --io-setup does.
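The fixed option definition can be sketched with argparse as follows (the default value shown here is illustrative, not taken from scylla_setup):

```python
import argparse

parser = argparse.ArgumentParser()
# An int-typed option restricted to 0/1, as the commit describes
# (matching the style of --io-setup), instead of a string option
# with a boolean default.
parser.add_argument('--online-discard', type=int, choices=[0, 1],
                    default=1,
                    help='enable/disable online discard (default: 1)')

args = parser.parse_args(['--online-discard', '0'])
assert args.online_discard == 0          # explicit value is an int
assert parser.parse_args([]).online_discard == 1   # default is an int too
```

Because the value is always an int, downstream code no longer has to guess whether it received a string or a boolean.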

Fixes #11700

Closes #11831

(cherry picked from commit acc408c976)
2022-12-28 20:44:02 +02:00
Petr Gusev
3c02e5d263 cql: batch statement, inserting a row with a null key column should be forbidden
Regular INSERT statements with null values for primary key
components are rejected by Scylla since #9286 and #9314.
Batch statements missed a similar check, this patch
fixes it.
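The missing check amounts to validating every statement in the batch the same way standalone INSERTs are validated. A hypothetical Python sketch (the statement representation here is invented; the real check lives in Scylla's C++ batch-statement code):

```python
class InvalidRequest(Exception):
    """Stand-in for the CQL invalid-request error."""

def check_batch(statements):
    # Every statement in a BATCH must bind all primary-key components
    # to non-null values, matching standalone INSERT behaviour.
    for stmt in statements:
        for col, value in stmt["primary_key"].items():
            if value is None:
                raise InvalidRequest(
                    f"Invalid null value for key column {col}")

# A batch inserting a row with a null clustering key is rejected:
try:
    check_batch([{"primary_key": {"pk": 1, "ck": None}}])
    rejected = False
except InvalidRequest:
    rejected = True
assert rejected
```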

Fixes: #12060
(cherry picked from commit 7730c4718e)
2022-12-28 18:15:40 +02:00
Anna Mikhlin
4c0f7ea098 release: prepare for 5.1.2 2022-12-25 20:53:22 +02:00
Botond Dénes
c14a0340ca mutation_compactor: reset stop flag on page start
When the mutation compactor has all the rows it needs for a page, it
saves the decision to stop in a member flag: _stop.
For single partition queries, the mutation compactor is kept alive
across pages and so it has a method, start_new_page() to reset its state
for the next page. This method didn't clear the _stop flag. This meant
that the value set at the end of the previous page could cause the new
page - and subsequently the entire query - to be stopped prematurely.
This can happen if the new page starts with a row that is covered by a
higher level tombstone and is completely empty after compaction.
Reset the _stop flag in start_new_page() to prevent this.
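The bug and the fix can be modeled with a few lines of Python (a hypothetical sketch of the compactor's paging state, not the real C++ class):

```python
class MutationCompactor:
    """Toy model of the per-page stop decision."""

    def __init__(self, rows_per_page):
        self.rows_per_page = rows_per_page
        self._stop = False
        self._rows = 0

    def consume_row(self, survives_compaction=True):
        if survives_compaction:
            self._rows += 1
        if self._rows >= self.rows_per_page:
            self._stop = True
        return not self._stop       # False means: stop this page

    def start_new_page(self):
        self._rows = 0
        self._stop = False          # the fix: without this reset, a new
                                    # page opening with a row that is
                                    # compacted away entirely would see
                                    # the stale _stop and end the query
```

Without the reset, a page that begins with a tombstone-covered, fully compacted row never increments `_rows`, so the stale `_stop` from the previous page ends the query immediately.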

This commit also adds a unit test which reproduces the bug.

Fixes: #12361

Closes #12384

(cherry picked from commit b0d95948e1)
2022-12-25 09:45:30 +02:00
Botond Dénes
aa523141f9 Merge 'Backport Alternator TTL tests' from Nadav Har'El
This series backports several patches which add or enable tests for Alternator TTL. The series does not touch the code - just tests.
The goal of backporting more tests is to get the code - which is already in branch 5.1 - tested. It wasn't a good idea to backport code without backporting the tests for it.

Closes #12200
Fixes #11374

* github.com:scylladb/scylladb:
  test/alternator: increase timeout on TTL tests
  test/alternator: fix timeout in flaky test test_ttl_stats
  test/alternator: test Alternator TTL metrics
  test/alternator: skip fewer Alternator TTL tests
2022-12-22 09:51:46 +02:00
Michał Chojnowski
86240d6344 sstables: index_reader: always evict the local cache gently
Due to an oversight, the local index cache isn't evicted gently
when _upper_bound exists. This is a source of reactor stalls.
Fix that.

Fixes #12271

Closes #12364

(cherry picked from commit d9269abf5b)
2022-12-21 13:42:54 +02:00
Nadav Har'El
95a94a2687 Merge 'doc: fix the CQL version in the Interfaces table' from Anna Stuchlik
Fix https://github.com/scylladb/scylla-doc-issues/issues/816
Fix https://github.com/scylladb/scylla-docs/issues/1613

This PR fixes the CQL version in the Interfaces page, so that it is the same as in other places across the docs and in sync with the version reported by the ScyllaDB (see https://github.com/scylladb/scylla-doc-issues/issues/816#issuecomment-1173878487).

To make sure the same CQL version is used across the docs, we should use the `|cql-version|` variable rather than hardcode the version number on several pages.
The variable is specified in the conf.py file:
```
rst_prolog = """
.. |cql-version| replace:: 3.3.1
"""
```

Closes #11320

* github.com:scylladb/scylladb:
  doc: add the Cassandra version on which the tools are based
  doc: fix the version number
  doc: update the Enterprise version where the ME format was introduced
  doc: add the ME format to the Cassandra Compatibility page
  doc: replace Scylla with ScyllaDB
  doc: rewrite the Interfaces table to the new format to include more information about CQL support
  doc: remove the CQL version from pages other than Cassandra compatibility
  doc: fix the CQL version in the Interfaces table

(cherry picked from commit ee606a5d52)
2022-12-21 09:51:14 +02:00
Benny Halevy
9173a3d808 view: row_lock: lock_ck: serialize partition and row locking
The problematic scenario this patch fixes might happen due to
unfortunate serialization of locks/unlocks between lock_pk and lock_ck,
as follows:

    1. lock_pk acquires an exclusive lock on the partition.
    2.a lock_ck attempts to acquire shared lock on the partition
        and any lock on the row. both cases currently use a fiber
        returning a future<rwlock::holder>.
    2.b since the partition is locked, the lock_partition times out
        returning an exceptional future.  lock_row has no such problem
        and succeeds, returning a future holding a rwlock::holder,
        pointing to the row lock.
    3.a the lock_holder previously returned by lock_pk is destroyed,
        calling `row_locker::unlock`
    3.b row_locker::unlock sees that the partition is not locked
        and erases it, including the row locks it contains.
    4.a when_all_succeeds continuation in lock_ck runs.  Since
        the lock_partition future failed, it destroys both futures.
    4.b the lock_row future is destroyed with the rwlock::holder value.
    4.c ~holder attempts to return the semaphore units to the row rwlock,
        but the latter was already destroyed in 3.b above.

Acquiring the partition lock and row lock in parallel
doesn't help anything, but it complicates error handling,
as seen above.

This patch serializes acquiring the row lock in lock_ck
after locking the partition to prevent the above race.

This way, erasing the unlocked partition is never expected
to happen while any of its row locks is held.
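The serialized acquisition can be sketched with asyncio locks (a hypothetical model; the real code uses Seastar rwlocks and timeouts). Taking the partition lock first means a timeout there leaves the row lock completely untouched, so no holder can outlive its rwlock:

```python
import asyncio

async def lock_ck(partition_lock, row_lock, timeout):
    # Take the partition lock first; only touch the row lock afterwards.
    await asyncio.wait_for(partition_lock.acquire(), timeout)
    try:
        await row_lock.acquire()
    except BaseException:
        partition_lock.release()
        raise
    return partition_lock, row_lock

async def scenario():
    pk_lock, row_lock = asyncio.Lock(), asyncio.Lock()
    await pk_lock.acquire()             # simulate lock_pk holding it
    try:
        await lock_ck(pk_lock, row_lock, timeout=0.01)
    except asyncio.TimeoutError:
        pass                            # timed out on the partition lock
    return row_lock.locked()            # False: row lock never acquired

row_lock_leaked = asyncio.run(scenario())
assert row_lock_leaked is False
```

Contrast this with the parallel version, where the row-lock future could succeed and hand out a holder even though the partition-lock future had already failed.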

Fixes #12168

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #12208

(cherry picked from commit 5007ded2c1)
2022-12-13 14:51:44 +02:00
Botond Dénes
7942041b95 Merge 'doc: update the 5.1 upgrade guide with the mode-related information' from Anna Stuchlik
This PR adds the link to the KB article about updating the mode after the upgrade to the 5.1 upgrade guide.
In addition, I have:
- updated the KB article to include the versions affected by that change.
- fixed the broken link to the page about metric updates (it is not related to the KB article, but I fixed it in the same PR to limit the number of PRs that need to be backported).

Related: https://github.com/scylladb/scylladb/pull/11122

Closes #12148

* github.com:scylladb/scylladb:
  doc: update the releases in the KB about updating the mode after upgrade
  doc: fix the broken link in the 5.1 upgrade guide
  doc: add the link to the 5.1-related KB article to the 5.1 upgrade guide

(cherry picked from commit 897b501ba3)
2022-12-09 07:27:50 +02:00
Anna Mikhlin
1cfedc5b59 release: prepare for 5.1.1 2022-12-08 09:41:33 +02:00
Botond Dénes
606ed61263 Merge '[branch 5.1 backport] doc: fix the notes on the OS Support by Platform and Version page' from Anna Stuchlik
This is a backport of https://github.com/scylladb/scylladb/pull/11783.

Closes #12229

* github.com:scylladb/scylladb:
  doc: replace Scylla with ScyllaDB
  doc: add a comment to remove in future versions any information that refers to previous releases
  doc: rewrite the notes to improve clarity
  doc: remove the repetitions from the notes
2022-12-07 14:10:14 +02:00
Anna Stuchlik
796e4c39f8 doc: replace Scylla with ScyllaDB
(cherry picked from commit 09b0e3f63e)
2022-12-07 13:01:34 +01:00
Anna Stuchlik
960434f784 doc: add a comment to remove in future versions any information that refers to previous releases
(cherry picked from commit 9e2b7e81d3)
2022-12-07 12:55:56 +01:00
Anna Stuchlik
05aed0417a doc: rewrite the notes to improve clarity
(cherry picked from commit fc0308fe30)
2022-12-07 12:55:17 +01:00
Anna Stuchlik
9a1fc200e1 doc: remove the repetitions from the notes
(cherry picked from commit 1bd0bc00b3)
2022-12-07 12:54:36 +01:00
Tomasz Grabiec
c7e9bbc377 Merge 'raft: server: handle aborts when waiting for config entry to commit' from Kamil Braun
Changing configuration involves two entries in the log: a 'joint
configuration entry' and a 'non-joint configuration entry'. We use
`wait_for_entry` to wait on the joint one. To wait on the non-joint one,
we use a separate promise field in `server`. This promise wasn't
connected to the `abort_source` passed into `set_configuration`.

The call could get stuck if the server got removed from the
configuration and lost leadership after committing the joint entry but
before committing the non-joint one, waiting on the promise. Aborting
wouldn't help. Fix this by subscribing to the `abort_source` and
resolving the promise exceptionally.

Furthermore, make sure that two `set_configuration` calls don't step on
each other's toes by one setting the other's promise. To do that, reset
the promise field at the end of `set_configuration` and check that it's
not engaged at the beginning.
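The two parts of the fix - waking the waiter from the abort source, and resetting the promise slot so concurrent calls can't clobber each other - can be sketched as follows (hypothetical Python; names like `start_set_configuration` are invented, and the real code uses Seastar promises and `abort_source` subscriptions):

```python
class AbortSource:
    def __init__(self):
        self.subscribers = []

    def subscribe(self, cb):
        self.subscribers.append(cb)

    def request_abort(self):
        for cb in self.subscribers:
            cb()

class RaftServer:
    def __init__(self):
        self._non_joint_commit = None   # promise-like slot

    def start_set_configuration(self, abort_source):
        # Guard: two set_configuration() calls must not share the slot.
        assert self._non_joint_commit is None
        self._non_joint_commit = {"status": "waiting"}
        # The fix: connect the waiter to the abort source.
        abort_source.subscribe(self._on_abort)
        return self._non_joint_commit

    def _on_abort(self):
        if self._non_joint_commit and \
                self._non_joint_commit["status"] == "waiting":
            self._non_joint_commit["status"] = "aborted"
            self._non_joint_commit = None   # reset for the next caller

    def finish_set_configuration(self):
        if self._non_joint_commit is not None:
            self._non_joint_commit["status"] = "committed"
            self._non_joint_commit = None
```

An abort now resolves the waiter instead of leaving it stuck, and the slot is always empty by the time the next `set_configuration` begins.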

Fixes #11288.

Closes #11325

* github.com:scylladb/scylladb:
  test: raft: randomized_nemesis_test: additional logging
  raft: server: handle aborts when waiting for config entry to commit

(cherry picked from commit 83850e247a)
2022-12-06 17:12:32 +01:00
Tomasz Grabiec
a78dac7ae9 Merge 'raft: server: drop waiters in applier_fiber instead of io_fiber' from Kamil Braun
When `io_fiber` fetched a batch with a configuration that does not
contain this node, it would send the entries committed in this batch to
`applier_fiber` and then proceed to drop the waiters for any remaining
entries (if the node was no longer a leader).

If there were waiters for entries committed in this batch, it could
either happen that `applier_fiber` received and processed those entries
first, notifying the waiters that the entries were committed and/or
applied, or it could happen that `io_fiber` reaches the dropping waiters
code first, causing the waiters to be resolved with
`commit_status_unknown`.

The second scenario is undesirable. For example, when a follower tries
to remove the current leader from the configuration using
`modify_config`, if the second scenario happens, the follower will get
`commit_status_unknown` - this can happen even though there are no node
or network failures. In particular, this caused
`randomized_nemesis_test.remove_leader_with_forwarding_finishes` to fail
from time to time.

Fix it by serializing the notifying and dropping of waiters in a single
fiber - `applier_fiber`. We decided to move all management of waiters
into `applier_fiber`, because most of that management was already there
(there was already one `drop_waiters` call, and two `notify_waiters`
calls). Now, when `io_fiber` observes that we've been removed from the
config and no longer a leader, instead of dropping waiters, it sends a
message to `applier_fiber`. `applier_fiber` will drop waiters when
receiving that message.

Improve an existing test to reproduce this scenario more frequently.

Fixes #11235.

Closes #11308

* github.com:scylladb/scylladb:
  test: raft: randomized_nemesis_test: more chaos in `remove_leader_with_forwarding_finishes`
  raft: server: drop waiters in `applier_fiber` instead of `io_fiber`
  raft: server: use `visit` instead of `holds_alternative`+`get`

(cherry picked from commit 9c4e32d2e2)
2022-12-06 17:12:03 +01:00
Nadav Har'El
0debb419f7 Merge 'alternator: fix wrong 'where' condition for GSI range key' from Marcin Maliszkiewicz
Contains the fixes requested in the issue (and some tiny extras), together with an analysis of why they don't affect the users (see commit messages).

Fixes [#11800](https://github.com/scylladb/scylladb/issues/11800)

Closes #11926

* github.com:scylladb/scylladb:
  alternator: add maybe_quote to secondary indexes 'where' condition
  test/alternator: correct xfail reason for test_gsi_backfill_empty_string
  test/alternator: correct indentation in test_lsi_describe
  alternator: fix wrong 'where' condition for GSI range key

(cherry picked from commit ce7c1a6c52)
2022-12-05 20:18:39 +02:00
Nadav Har'El
0cfb950569 cql: fix column-name aliases in SELECT JSON
The SELECT JSON statement, just like SELECT, allows the user to rename
selected columns using an "AS" specification. E.g., "SELECT JSON v AS foo".
This specification was not honored: We simply forgot to look at the
alias in SELECT JSON's implementation (we did it correctly in regular
SELECT). So this patch fixes this bug.
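What "honoring the alias" means can be shown with a minimal sketch (hypothetical Python, not Scylla's C++ result-set code): when building each JSON row, the key must come from the `AS` alias when one exists, falling back to the column name:

```python
import json

def row_to_json(columns, aliases, values):
    # Use the alias as the JSON key when present, else the column name.
    keys = [alias if alias is not None else col
            for col, alias in zip(columns, aliases)]
    return json.dumps(dict(zip(keys, values)))

# SELECT JSON v AS foo FROM t  ->  key must be "foo", not "v"
assert row_to_json(['v'], ['foo'], [42]) == '{"foo": 42}'
# SELECT JSON v FROM t         ->  no alias, key stays "v"
assert row_to_json(['v'], [None], [42]) == '{"v": 42}'
```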

We had two tests in cassandra_tests/validation/entities/json_test.py
that reproduced this bug. The checks in those tests now pass, but these
two tests still continue to fail after this patch because of two other
unrelated bugs that were discovered by the same tests. So in this patch
I also add a new test just for this specific issue - to serve as a
regression test.

Fixes #8078

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12123

(cherry picked from commit c5121cf273)
2022-12-05 20:12:44 +02:00
Nadav Har'El
eebe77b5b8 materialized views: fix view writes after base table schema change
When we write to a materialized view, we need to know some information
defined in the base table such as the columns in its schema. We have
a "view_info" object that tracks each view and its base.

This view_info object has a couple of mutable attributes which are
used to lazily-calculate and cache the SELECT statement needed to
read from the base table. If the base-table schema ever changes -
and the code calls set_base_info() at that point - we need to forget
this cached statement. If we don't (as before this patch), the SELECT
will use the wrong schema and writes will no longer work.

This patch also includes a reproducing test that failed before this
patch, and passes afterwards. The test creates a base table with a
view that has a non-trivial SELECT (it has a filter on one of the
base-regular columns), makes a benign modification to the base table
(just a silly addition of a comment), and then tries to write to the
view - and before this patch it fails.

Fixes #10026
Fixes #11542

(cherry picked from commit 2f2f01b045)
2022-12-05 20:09:15 +02:00
Nadav Har'El
aa206a6b6a test/alternator: increase timeout on TTL tests
Some of the tests in test/alternator/test_ttl.py need an expiration scan
pass to complete and expire items. In development builds on developer
machines, this usually takes less than a second (our scanning period is
set to half a second). However, in debug builds on Jenkins each scan
often takes up to 100 (!) seconds (this is the record we've seen so far).
This is why we set the tests' timeout to 120.

But recently we saw another test run failing. I think the problem is
that in some cases we need not one, but *two* scanning passes to
complete before the timeout: it is possible that the test writes an
item right after the current scan passed it, so it doesn't get expired,
and then a second scan starts at a random position, possibly making
that item one of the last items to be considered - so in total we need
to wait for two scanning periods, not one, for the item to expire.

So this patch increases the timeout from 120 seconds to 240 seconds -
more than twice the highest scanning time we ever saw (100 seconds).

Note that this timeout is just a timeout, it's not the typical test
run time: The test can finish much more quickly, as little as one
second, if items expire quickly on a fast build and machine.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12106

(cherry picked from commit 6bc3075bbd)
2022-12-05 14:21:22 +02:00
Nadav Har'El
9baf72b049 test/alternator: fix timeout in flaky test test_ttl_stats
The test `test_metrics.py::test_ttl_stats` tests the metrics associated
with Alternator TTL expiration events. It normally finishes in less than a
second (the TTL scanning is configured to run every 0.5 seconds), so we
arbitrarily set a 60 second timeout for this test to allow for extremely
slow test machines. But in some extreme cases even this was not enough -
in one case we measured the TTL scan to take 63 seconds.

So in this patch we increase the timeout in this test from 60 seconds
to 120 seconds. We already did the same change in other Alternator TTL
tests in the past - in commit 746c4bd.

Fixes #11695

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #11696

(cherry picked from commit 3a30fbd56c)
2022-12-05 14:21:22 +02:00
Nadav Har'El
8e62405117 test/alternator: test Alternator TTL metrics
This patch adds a test for the metrics generated by the background
expiration thread run for Alternator's TTL feature.

We test three of the four metrics: scylla_expiration_scan_passes,
scylla_expiration_scan_table and scylla_expiration_items_deleted.
The fourth metric, scylla_expiration_secondary_ranges_scanned, counts the
number of times that this node took over another node's expiration duty,
so it requires a multi-node cluster to test, and we can't test it in
the single-node cluster test framework.

To see TTL expiration in action this test may need to wait up to the
setting of alternator_ttl_period_in_seconds. For a setting of 1
second (the default set by test/alternator/run), this means this
test can take up to 1 second to run. If alternator_ttl_period_in_seconds
is set higher, the test is skipped unless --runveryslow is requested.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
(cherry picked from commit 297109f6ee)
2022-12-05 14:21:16 +02:00
Nadav Har'El
15421e45a0 test/alternator: skip fewer Alternator TTL tests
Most of the Alternator TTL tests are extremely slow on DynamoDB because
item expiration may be delayed up to 24 hours (!), and in practice for
10 to 30 minutes. Because of this, we marked most of these tests
with the "veryslow" mark, causing them to be skipped by default - unless
pytest is given the "--runveryslow" option.

The result was that the TTL tests were not run in the normal test runs,
which can allow regressions to be introduced (luckily, this hasn't happened).

However, this "veryslow" mark was excessive. Many of the tests are very
slow only on DynamoDB, but aren't very slow on Scylla. In particular,
many of the tests involve waiting for an item to expire, something that
happens after the configurable alternator_ttl_period_in_seconds, which
is just one second in our tests.

So in this patch, we remove the "veryslow" mark from 6 tests of Alternator TTL
tests, and instead use two new fixtures - waits_for_expiration and
veryslow_on_aws - to only skip the test when running on DynamoDB or
when alternator_ttl_period_in_seconds is high - but in our usual test
environment they will not get skipped.

Because 5 of these 6 tests wait for an item to expire, they take one
second each and this patch adds 5 seconds to the Alternator test
runtime. This is unfortunate (it's more than 25% of the total Alternator
test runtime!) but not a disaster, and we plan to reduce this 5 second
time futher in the following patch, but decreasing the TTL scanning
period even further.

This patch also increases the timeout of several of these tests, to 120
seconds from the previous 10 seconds. As mentioned above, normally,
these tests should always finish in alternator_ttl_period_in_seconds
(1 second) with a single scan taking less than 0.2 seconds, but in
extreme cases of debug builds on overloaded test machines, we saw even
60 seconds being passed, so let's increase the maximum. I also needed
to make the sleep time between retries smaller, not a function of the
new (unrealistic) timeout.

4 more tests remain "veryslow" (and won't run by default) because they
take 5-10 seconds each (e.g., a test which waits to see that an item
does *not* get expired, and a test involving writing a lot of data).
We should reconsider this in the future - to perhaps run these tests in
our normal test runs - but even for now, the 6 extra tests that we
start running are a much better protection against regressions than what
we had until now.

Fixes #11374

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

(cherry picked from commit 746c4bd9eb)
2022-12-05 13:07:16 +02:00
Benny Halevy
cf6bcffc1b configure: add --perf-tests-debuginfo option
Provides separate control over debuginfo for perf tests,
since enabling --tests-debuginfo currently affects both,
causing the Jenkins archives of perf-test binaries to
inflate considerably.

Refs https://github.com/scylladb/scylla-pkg/issues/3060

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 48021f3ceb)

Fixes #12191
2022-12-04 17:19:59 +02:00
Petr Gusev
a50b7f3d6a modification_statement: fix LWT insert crash if clustering key is null
PR #9314 fixed a similar issue with regular insert statements
but missed the LWT code path.

It's the expected behaviour of
modification_statement::create_clustering_ranges to return an
empty range in this case, since the possible_lhs_values it
uses explicitly returns empty_value_set if it evaluates the rhs
to null, and it has a comment about it ("All NULL
comparisons fail; no column values match.") On the other hand,
all components of the primary key are required to be set;
this is checked at the prepare phase, in
modification_statement::process_where_clause. So the only
problem was that modification_statement::execute_with_condition
was not expecting an empty clustering_range in the case of
a null clustering key.

Fixes: #11954
(cherry picked from commit 0d443dfd16)
2022-12-04 15:45:44 +02:00
Pavel Emelyanov
0e06025487 distributed_loader: Use coroutine::lambda in sleeping coroutine
According to seastar/doc/lambda-coroutine-fiasco.md, a lambda that
co_awaits loses its capture frame after the first suspension. In
distributed_loader there is at least one lambda of that kind.

fixes: #12175

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #12170

(cherry picked from commit 71179ff5ab)
2022-12-04 15:45:44 +02:00
Avi Kivity
820b79d56e Update seastar submodule (coroutine lambda fiasco)
* seastar 3aa91b4d2d...328edb2bce (1):
  > coroutine: explain and mitigate the lambda coroutine fiasco

Ref #12175
2022-12-04 15:45:31 +02:00
Botond Dénes
fde4a6e92d Merge 'doc: document the procedure for updating the mode after upgrade' from Anna Stuchlik
Fix https://github.com/scylladb/scylla-docs/issues/4126

Closes #11122

* github.com:scylladb/scylladb:
  doc: add info about the time-consuming step due to resharding
  doc: add the new KB to the toctree
  doc: doc: add a KB about updating the mode in perftune.yaml after upgrade

(cherry picked from commit e9fec761a2)
2022-11-30 08:49:59 +02:00
Nadav Har'El
34b8d4306c Merge 'doc: add the links to the per-partition rate limit extension ' from Anna Stuchlik
Release 5.1 introduced a new CQL extension that applies to the CREATE TABLE and ALTER TABLE statements. The ScyllaDB-specific extensions are described on a separate page, so the CREATE TABLE and ALTER TABLE sections should include links to that page and section.

Note: CQL extensions are described with Markdown, while the Data Definition page is RST. Currently, there's no way to link from an RST page to an MD subsection (using a section heading or anchor), so a URL is used as a temporary solution.

Related: https://github.com/scylladb/scylladb/pull/9810

Closes #12070

* github.com:scylladb/scylladb:
  doc: move the info about per-partition rate limit for the ALTER TABLE statement from the paragraph to the list
  doc: add the links to the per-partition rate limit extension to the CREATE TABLE and ALTER TABLE sections

(cherry picked from commit 6e9f739f19)
2022-11-28 08:52:28 +02:00
Yaron Kaikov
69e8fb997c release: prepare for 5.1.0 2022-11-27 14:36:37 +02:00
Botond Dénes
fbfd91e02f Merge '[branch 5.1 backport] doc: add ScyllaDB image upgrade guides for patch releases' from Anna Stuchlik
This is a backport of https://github.com/scylladb/scylladb/pull/11460.

Closes #12079

* github.com:scylladb/scylladb:
  doc: update the commands to upgrade the ScyllaDB image
  doc: fix the filename in the index to resolve the warnings and fix the link
  doc: apply feedback by adding the step to load the new repo and fixing the links
  doc: fix the version name in file upgrade-guide-from-2021.1-to-2022.1-image.rst
  doc: rename the upgrade-image file to upgrade-image-opensource and update all the links to that file
  doc: update the Enterprise guide to include the Enterprise-only image file
  doc: update the image files
  doc: split the upgrade-image file to separate files for Open Source and Enterprise
  doc: clarify the alternative upgrade procedures for the ScyllaDB image
  doc: add the upgrade guide for ScyllaDB Image from 2022.x.y. to 2022.x.z
  doc: add the upgrade guide for ScyllaDB Image from 5.x.y. to 5.x.z
2022-11-25 13:27:21 +02:00
Anna Stuchlik
4c19b48495 doc: update the commands to upgrade the ScyllaDB image
(cherry picked from commit db75adaf9a)
2022-11-25 11:10:33 +01:00
Anna Stuchlik
9deca4250f doc: fix the filename in the index to resolve the warnings and fix the link
(cherry picked from commit e5c9f3c8a2)
2022-11-25 11:09:58 +01:00
Anna Stuchlik
97ab2a4eb3 doc: apply feedback by adding the step to load the new repo and fixing the links
(cherry picked from commit 338b45303a)
2022-11-25 11:08:49 +01:00
Anna Stuchlik
16a941db3b doc: fix the version name in file upgrade-guide-from-2021.1-to-2022.1-image.rst
(cherry picked from commit 54d6d8b8cc)
2022-11-25 11:08:11 +01:00
Anna Stuchlik
8f60a464a7 doc: rename the upgrade-image file to upgrade-image-opensource and update all the links to that file
(cherry picked from commit 6ccc838740)
2022-11-25 11:07:21 +01:00
Anna Stuchlik
1e72f9cb5e doc: update the Enterprise guide to include the Enterprise-only image file
(cherry picked from commit 22317f8085)
2022-11-25 11:06:40 +01:00
Anna Stuchlik
d91da87313 doc: update the image files
(cherry picked from commit 593f987bb2)
2022-11-25 11:05:51 +01:00
Anna Stuchlik
c73d59c1cb doc: split the upgrade-image file to separate files for Open Source and Enterprise
(cherry picked from commit 42224dd129)
2022-11-25 11:04:13 +01:00
Anna Stuchlik
9a6c0a89a0 doc: clarify the alternative upgrade procedures for the ScyllaDB image
(cherry picked from commit 64a527e1d3)
2022-11-25 11:01:27 +01:00
Anna Stuchlik
5b7dd00b14 doc: add the upgrade guide for ScyllaDB Image from 2022.x.y. to 2022.x.z
(cherry picked from commit 5136d7e6d7)
2022-11-25 11:00:32 +01:00
Anna Stuchlik
237df3b935 doc: add the upgrade guide for ScyllaDB Image from 5.x.y. to 5.x.z
(cherry picked from commit f1ef6a181e)
2022-11-25 10:59:43 +01:00
Botond Dénes
993d0371d9 Merge '[branch 5.1 backport] doc: remove the Operator documentation pages from the core ScyllaDB docs' from Anna Stuchlik
This is a backport of https://github.com/scylladb/scylladb/pull/11154.

Closes #12061

* github.com:scylladb/scylladb:
  add the redirect to the Operator
  doc: remove the Operator docs from the core documentation
2022-11-25 07:49:10 +02:00
Anna Stuchlik
04167eba68 add the redirect to the Operator
(cherry picked from commit 792d1412d6)
2022-11-24 18:57:09 +01:00
Anna Stuchlik
ddf8eaba04 doc: remove the Operator docs from the core documentation
(cherry picked from commit 966c3423ad)
2022-11-24 18:54:10 +01:00
Botond Dénes
d5e5d27929 Merge '[branch 5.1 backport] doc: add the upgrade guide from 5.0 to 2022.1' from Anna Stuchlik
This is a backport of https://github.com/scylladb/scylladb/pull/11108.

Closes #12063

* github.com:scylladb/scylladb:
  doc: apply feedback about scylla-enterprise-machine-image
  doc: update the note about installing scylla-enterprise-machine-image
  update the info about installing scylla-enterprise-machine-image during upgrade
  doc: add the requirement to install scylla-enterprise-machine-image if the previous version was installed with an image
  doc: update the info about metrics in 2022.1 compared to 5.0
  doc: minor formatting and language fixes
  doc: add the new guide to the toctree
  doc: add the upgrade guide from 5.0 to 2022.1
2022-11-23 14:53:13 +02:00
Anna Stuchlik
bb69ece13d doc: update the note about installing scylla-enterprise-machine-image
(cherry picked from commit 9fe7aa5c9a)
2022-11-23 14:53:13 +02:00
Anna Stuchlik
d4268863cd update the info about installing scylla-enterprise-machine-image during upgrade
(cherry picked from commit 4b0ec11136)
2022-11-23 14:53:13 +02:00
Anna Stuchlik
8b05c67226 doc: add the requirement to install scylla-enterprise-machine-image if the previous version was installed with an image
(cherry picked from commit da7f6cdec4)
2022-11-23 14:53:13 +02:00
Anna Stuchlik
3afc58de7d doc: add the upgrade guide from 5.0 to 2022.1
(cherry picked from commit c9b0c6fdbf)
2022-11-23 13:08:17 +01:00
Anna Stuchlik
496696140b docs: backport upgrade guide improvements from #11577 to 5.1
PR #11577 added the 5.0->5.1 upgrade guide. At the same time, it
improved some of the common `.rst` files that were used in other
upgrade guides; e.g. the `docs/upgrade/_common/upgrade-guide-v4-rpm.rst`
file is used in the 4.6->5.0 upgrade guide.

The 5.0->5.1 upgrade guide was then refactored. The refactored version
was already backported to the 5.1 branch (#12034). But we should still
backport the improvements done in #11577. This commit contains these
improvements.

(cherry picked from commit 2513497f9a)

Closes #12055
2022-11-23 13:51:54 +02:00
Anna Stuchlik
3a070380a7 doc: update the links to Manager and Operator
Closes #11196

(cherry picked from commit 532aa6e655)
2022-11-23 13:47:58 +02:00
Botond Dénes
7b551d8ce4 Merge '[branch 5.1 backport] doc: update the OS support for versions 2022.1 and 2022.2' from Anna Stuchlik
This is a backport of https://github.com/scylladb/scylladb/pull/11461.

Closes #12044

* github.com:scylladb/scylladb:
  doc: remove support for Debian 9 from versions 2022.1 and 2022.2
  doc: remove support for Ubuntu 16.04 from versions 2022.1 and 2022.2
  backport 11461 doc: add support for Debian 11 to versions 2022.1 and 2022.2
2022-11-22 08:30:04 +02:00
Anna Stuchlik
2018b8fcfd doc: remove support for Debian 9 from versions 2022.1 and 2022.2
(cherry picked from commit 4c7aa5181e)
2022-11-22 08:30:04 +02:00
Anna Stuchlik
20b5aa938e doc: remove support for Ubuntu 16.04 from versions 2022.1 and 2022.2
(cherry picked from commit dfc7203139)
2022-11-22 08:30:04 +02:00
Anna Stuchlik
f519580252 doc: add support for Debian 11 to versions 2022.1 and 2022.2
(cherry picked from commit dd4979ffa8)
2022-11-22 08:30:00 +02:00
Takuya ASADA
46c0a1cc0a scylla_raid_setup: run uuidpath existence check only after mount failed
We added UUID device file existance check on #11399, we expect UUID
device file is created before checking, and we wait for the creation by
"udevadm settle" after "mkfs.xfs".

However, we actually getting error which says UUID device file missing,
it probably means "udevadm settle" doesn't guarantee the device file created,
on some condition.

To avoid the error, use var-lib-scylla.mount to wait for UUID device
file is ready, and run the file existance check when the service is
failed.

Fixes #11617

Closes #11666

(cherry picked from commit a938b009ca)
2022-11-21 21:26:48 +02:00
Takuya ASADA
a822282fde scylla_raid_setup: prevent mount failed for /var/lib/scylla
Just like 4a8ed4c, we also need to wait for udev event completion to
create /dev/disk/by-uuid/$UUID for a newly formatted disk, so we can
mount the disk right after formatting.

Fixes #11359

(cherry picked from commit 8835a34ab6)
2022-11-21 21:26:48 +02:00
Takuya ASADA
bc5af9fdea scylla_raid_setup: check uuid and device path are valid
Added code to check that the uuid and the uuid-based device path are valid.

(cherry picked from commit 40134efee4)

Ref #11617, #11359 (prerequisite).
2022-11-21 21:26:11 +02:00
Nadav Har'El
c8bb147f84 Merge 'cql3: don't ignore other restrictions when a multi column restriction is present during filtering' from Jan Ciołek
When filtering with a multi-column restriction present, all other restrictions were ignored.
So a query like:
`SELECT * FROM WHERE pk = 0 AND (ck1, ck2) < (0, 0) AND regular_col = 0 ALLOW FILTERING;`
would ignore the restriction `regular_col = 0`.

This was caused by a bug in the filtering code:
2779a171fc/cql3/selection/selection.cc (L433-L449)

When multi-column restrictions were detected, the code checked whether they were satisfied and returned immediately.
This is fixed by returning early only when these restrictions are not satisfied. When they are satisfied, the other restrictions are checked as well to ensure all of them hold.

This code was introduced back in 2019, when fixing #3574.
Perhaps back then it was impossible to mix multi-column and regular-column restrictions, and this approach was correct.

Fixes: #6200
Fixes: #12014

Closes #12031

* github.com:scylladb/scylladb:
  cql-pytest: add a reproducer for #12014, verify that filtering multi column and regular restrictions works
  boost/restrictions-test: uncomment part of the test that passes now
  cql-pytest: enable test for filtering combined multi column and regular column restrictions
  cql3: don't ignore other restrictions when a multi column restriction is present during filtering

(cherry picked from commit 2d2034ea28)
2022-11-21 14:02:33 +02:00
Kamil Braun
dc92ec4c8b docs: a single 5.0 -> 5.1 upgrade guide
There were 4 different pages for upgrading Scylla 5.0 to 5.1 (and the
same is true for other version pairs, but I digress) for different
environments:
- "ScyllaDB Image for EC2, GCP, and Azure"
- Ubuntu
- Debian
- RHEL/CentOS

The Ubuntu and Debian pages used a common template:
```
.. include:: /upgrade/_common/upgrade-guide-v5-ubuntu-and-debian-p1.rst
.. include:: /upgrade/_common/upgrade-guide-v5-ubuntu-and-debian-p2.rst
```
with different variable substitutions.

The "Image" page used a similar template, with some extra content in the
middle:
```
.. include:: /upgrade/_common/upgrade-guide-v5-ubuntu-and-debian-p1.rst
.. include:: /upgrade/_common/upgrade-image-opensource.rst
.. include:: /upgrade/_common/upgrade-guide-v5-ubuntu-and-debian-p2.rst
```

The RHEL/CentOS page used a different template:
```
.. include:: /upgrade/_common/upgrade-guide-v4-rpm.rst
```

This was an unmaintainable mess. Most of the content was "the same" for
each of these options. The only content that must actually be different
is the part with package installation instructions (e.g. calls to `yum`
vs `apt-get`). The rest of the content was logically the same - the
differences were mistakes, typos, and updates/fixes to the text that
were made in some of these docs but not others.

In this commit I prepare a single page that covers the upgrade and
rollback procedures for each of these options. The section dependent on
the system was implemented using Sphinx Tabs.

I also fixed and changed some parts:

- In the "Gracefully stop the node" section:
Ubuntu/Debian/Images pages had:

```rst
.. code:: sh

   sudo service scylla-server stop
```

RHEL/CentOS pages had:
```rst
.. code:: sh

.. include:: /rst_include/scylla-commands-stop-index.rst
```

the stop-index file contained this:
```rst
.. tabs::

   .. group-tab:: Supported OS

      .. code-block:: shell

         sudo systemctl stop scylla-server

   .. group-tab:: Docker

      .. code-block:: shell

         docker exec -it some-scylla supervisorctl stop scylla

      (without stopping *some-scylla* container)
```

So the RHEL/CentOS version had two tabs: one for Scylla installed
directly on the system, one for Scylla running in Docker - which is
interesting, because nothing anywhere else in the upgrade documents
mentions Docker.  Furthermore, the RHEL/CentOS version used `systemctl`
while the Ubuntu/Debian/Image version used `service` to stop/start
scylla-server.  Both work on modern systems.

The Docker option is completely out of place - the rest of the upgrade
procedure does not mention Docker. So I decided it doesn't make sense to
include it. Docker documentation could be added later if we actually
decide to write upgrade documentation when using Docker...  Between
`systemctl` and `service` I went with `service` as it's a bit
higher-level.

- Similar change for "Start the node" section, and corresponding
  stop/start sections in the Rollback procedure.

- To reuse text for Ubuntu and Debian, when referencing "ScyllaDB deb
  repo" in the Debian/Ubuntu tabs, I provide two separate links: to
  Debian and Ubuntu repos.

- the link to rollback procedure in the RPM guide (in 'Download and
  install the new release' section) pointed to rollback procedure from
  3.0 to 3.1 guide... Fixed to point to the current page's rollback
  procedure.

- in the rollback procedure steps summary, the RPM version missed the
  "Restore system tables" step.

- in the rollback procedure, the repository links were pointing to the
  new versions, while they should point to the old versions.

There are some other pre-existing problems I noticed that need fixing:

- EC2/GCP/Azure option has no corresponding coverage in the rollback
  section (Download and install the old release) as it has in the
  upgrade section. There is no guide for rolling back 3rd party and OS
  packages, only Scylla. I left a TODO in a comment.
- the repository links assume certain Debian and Ubuntu versions (Debian
  10 and Ubuntu 20), but there are more available options (e.g. Ubuntu
  22). Not sure how to deal with this problem. Maybe a separate section
  with links? Or just a generic link without choice of platform/version?

Closes #11891

(cherry picked from commit 0c7ff0d2cb)

Backport notes:
Funnily, the 5.1 branch did not have the upgrade guide to 5.1 at all. It
was only in `master`. So the backport does not remove files, only adds
new ones.
I also had to add:
- an additional link in the upgrade-opensource index to the 5.1 upgrade
  page (it was already in upstream `master` when the cherry-picked commit
  was added)
- the list of new metrics, which was also completely missing in
  branch-5.1.

Closes #12034
2022-11-21 13:58:41 +02:00
Tzach Livyatan
8f7e3275a2 Update Alternator Markdown file to use automatic link notation
Closes #11335

(cherry picked from commit 8fc58300ea)
2022-11-21 09:56:10 +02:00
Yaron Kaikov
40a1905a2d release: prepare for 5.1.0-rc5 2022-11-19 13:41:28 +02:00
Avi Kivity
4e2c436222 Merge 'doc: add the upgrade guide from 5.0 to 2022.1 on Ubuntu 20.04' from Anna Stuchlik
Ubuntu 22.04 is supported by both ScyllaDB Open Source 5.0 and Enterprise 2022.1.

Closes #11227

* github.com:scylladb/scylladb:
  doc: add the redirects from Ubuntu version specific to version generic pages
  doc: remove version-specific content for Ubuntu and add the generic page to the toctree
  doc: rename the file to include Ubuntu
  doc: remove the version number from the document and add the link to Supported Versions
  doc: add a generic page for Ubuntu
  doc: add the upgrade guide from 5.0 to 2022.1 on Ubuntu 2022.1

(cherry picked from commit d4c986e4fa)
2022-11-18 17:06:00 +02:00
Botond Dénes
68be369f93 Merge 'doc: add the upgrade guide for ScyllaDB image from 2021.1 to 2022.1' from Anna Stuchlik
This PR is related to  https://github.com/scylladb/scylla-docs/issues/4124 and https://github.com/scylladb/scylla-docs/issues/4123.

**New Enterprise Upgrade Guide from 2021.1 to 2022.2**
I've added the upgrade guide for the ScyllaDB Enterprise image. It consists of 3 files:
/upgrade/_common/upgrade-guide-v2022-ubuntu-and-debian-p1.rst
upgrade/_common/upgrade-image.rst
 /upgrade/_common/upgrade-guide-v2022-ubuntu-and-debian-p2.rst

**Modified Enterprise Upgrade Guides 2021.1 to 2022.2**
I've modified the existing guides for Ubuntu and Debian to use the same files as above, but exclude the image-related information:
/upgrade/_common/upgrade-guide-v2022-ubuntu-and-debian-p1.rst + /upgrade/_common/upgrade-guide-v2022-ubuntu-and-debian-p2.rst = /upgrade/_common/upgrade-guide-v2022-ubuntu-and-debian.rst

To make things simpler and remove duplication, I've replaced the guides for Ubuntu 18 and 20 with a generic Ubuntu guide.

**Modified Enterprise Upgrade Guides from 4.6 to 5.0**
These guides included a bug: they included the image-related information (about updating OS packages), because a file that includes that information was included by mistake. What's worse, it was duplicated. After the includes were removed, image-related information is no longer included in the Ubuntu and Debian guides (this fixes https://github.com/scylladb/scylla-docs/issues/4123).

I've modified the index file to be in sync with the updates.

Closes #11285

* github.com:scylladb/scylladb:
  doc: reorganize the content to list the recommended way of upgrading the image first
  doc: update the image upgrade guide for ScyllaDB image to include the location of the manifest file
  doc: fix the upgrade guides for Ubuntu and Debian by removing image-related information
  doc: update the guides for Ubuntu and Debian to remove image information and the OS version number
  doc: add the upgrade guide for ScyllaDB image from 2021.1 to 2022.1

(cherry picked from commit dca351c2a6)
2022-11-18 17:05:14 +02:00
Botond Dénes
0f7adb5f47 Merge 'doc: change the tool names to "Scylla SStable" and "Scylla Types"' from Anna Stuchlik
Fix https://github.com/scylladb/scylladb/issues/11393

- Rename the tool names across the docs.
- Update the examples to replace `scylla-sstable` and `scylla-types` with `scylla sstable` and `scylla types`, respectively.

Closes #11432

* github.com:scylladb/scylladb:
  doc: update the tool names in the toctree and reference pages
  doc: rename the scylla-types tool as Scylla Types
  doc: rename the scylla-sstable tool as Scylla SStable

(cherry picked from commit 2c46c24608)
2022-11-18 17:04:37 +02:00
Avi Kivity
82dc8357ef Merge 'Docs: document how scylla-sstable obtains its schema' from Botond Dénes
This is a very important aspect of the tool that was completely missing from the document before. Also add a comparison with SStableDump.
Fixes: https://github.com/scylladb/scylladb/issues/11363

Closes #11390

* github.com:scylladb/scylladb:
  docs: scylla-sstable.rst: add comparison with SStableDump
  docs: scylla-sstable.rst: add section about providing the schema

(cherry picked from commit 2ab5cbd841)
2022-11-18 17:01:17 +02:00
Anna Stuchlik
12a58957e2 doc: fix the upgrade version in the upgrade guide for RHEL and CentOS
Closes #11477

(cherry picked from commit 0dee507c48)
2022-11-18 16:59:44 +02:00
Botond Dénes
3423ad6e38 Merge 'doc: update the default SStable format' from Anna Stuchlik
The purpose of this PR is to update the information about the default SStable format.
Closes #11431

* github.com:scylladb/scylladb:
  doc: simplify the information about default formats in different versions
  doc: update the SSTables 3.0 Statistics File Format to add the UUID host_id option of the ME format
  doc: add the information regarding the ME format to the SSTables 3.0 Data File Format page
  doc: fix additional information regarding the ME format on the SStable 3.x page
  doc: add the ME format to the table
  add a comment to remove the information when the documentation is versioned (in 5.1)
  doc: replace Scylla with ScyllaDB
  doc: fix the formatting and language in the updated section
  doc: fix the default SStable format

(cherry picked from commit a0392bc1eb)
2022-11-18 16:58:14 +02:00
Anna Stuchlik
64001719fa doc: remove the section about updating OS packages during upgrade from upgrade guides for Ubuntu and Debian (from 4.5 to 4.6)
Closes #11629

(cherry picked from commit c5285bcb14)
2022-11-18 16:56:29 +02:00
AdamStawarz
cc3d368bc8 Update tombstones-flush.rst
change syntax:

nodetool compact <keyspace>.<mytable>;
to
nodetool compact <keyspace> <mytable>;

Closes #11904

(cherry picked from commit 6bc455ebea)
2022-11-18 16:52:06 +02:00
Botond Dénes
d957b0044b Merge 'doc: improve the documentation landing page ' from Anna Stuchlik
This PR introduces the following changes to the documentation landing page:

- The " New to ScyllaDB? Start here!" box is added.
- The "Connect your application to Scylla" box is removed.
- Some wording has been improved.
- "Scylla" has been replaced with "ScyllaDB".

Closes #11896

* github.com:scylladb/scylladb:
  Update docs/index.rst
  doc: replace Scylla with ScyllaDB on the landing page
  doc: improve the wording on the landing page
  doc: add the link to the ScyllaDB Basics page to the documentation landing page

(cherry picked from commit 2b572d94f5)
2022-11-18 16:51:26 +02:00
Botond Dénes
d4ed67bd47 Merge 'doc: cql-extensions.md: improve description of synchronous views' from Nadav Har'El
It was pointed out to me that our description of the synchronous_updates
materialized-view option does not make it clear enough what is the
default setting, or why a user might want to use this option.

This patch changes the description to (I hope) better address these
issues.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #11404

* github.com:scylladb/scylladb:
  doc: cql-extensions.md: replace "Scylla" by "ScyllaDB"
  doc: cql-extensions.md: improve description of synchronous views

(cherry picked from commit b9fc504fb2)
2022-11-18 16:44:38 +02:00
Nadav Har'El
0cd6341cae Merge 'doc: document user defined functions (UDFs)' from Anna Stuchlik
This PR is V2 of the [PR created by @psarna](https://github.com/scylladb/scylladb/pull/11560).
I have:
- copied the content.
- applied the suggestions left by @nyh.
- made minor improvements, such as replacing "Scylla" with "ScyllaDB", fixing punctuation, and fixing the RST syntax.

Fixes https://github.com/scylladb/scylladb/issues/11378

Closes #11984

* github.com:scylladb/scylladb:
  doc: label user-defined functions as Experimental
  doc: restore the note for the Count function (removed by mistake)
  doc: document user defined functions (UDFs)

(cherry picked from commit 7cbb0b98bb)
2022-11-18 16:43:53 +02:00
Nadav Har'El
23d8852a82 Merge 'doc: update the "Counting all rows in a table is slow" page' from Anna Stuchlik
Fix https://github.com/scylladb/scylladb/issues/11373

- Updated the information on the "Counting all rows in a table is slow" page.
- Added COUNT to the list of selectors of the SELECT statement (somehow it was missing).
- Added the note to the description of the COUNT() function with a link to the KB page for troubleshooting if necessary. This will allow the users to easily find the KB page.

Closes #11417

* github.com:scylladb/scylladb:
  doc: add a comment to remove the note in version 5.1
  doc: update the information on the Counting all rows page and add the recommendation to upgrade ScyllaDB
  doc: add a note to the description of COUNT with a reference to the KB article
  doc: add COUNT to the list of acceptable selectors of the SELECT statement

(cherry picked from commit 22bb35e2cb)
2022-11-18 16:28:43 +02:00
Aleksandra Martyniuk
88016de43e compaction: request abort only once in compaction_data::stop
compaction_manager::task (and thus compaction_data) can be stopped
for many different reasons. Thus, abort can be requested more than
once on the compaction_data abort source, causing a crash.

To prevent this, before each request_abort() we check whether an abort
was already requested.

Closes #12004

(cherry picked from commit 7ead1a7857)

Fixes #12002.
2022-11-17 19:15:43 +02:00
Asias He
bdecf4318a gossip: Improve get_live_token_owners and get_unreachable_token_owners
The get_live_token_owners returns the nodes that are part of the ring
and live.

The get_unreachable_token_owners returns the nodes that are part of the ring
and are not alive.

The token_metadata::get_all_endpoints returns nodes that are part of the
ring.

The patch changes both functions to use the more authoritative source to
get the nodes that are part of the ring and call is_alive to check if
the node is up or down. So that the correctness does not depend on
any derived information.

This patch fixes a truncate issue in storage_proxy::truncate_blocking
where it calls get_live_token_owners and get_unreachable_token_owners to
decide the nodes to talk with for truncate operation. The truncate
failed because incorrect nodes were returned.

Fixes #10296
Fixes #11928

Closes #11952

(cherry picked from commit 16bd9ec8b1)
2022-11-17 14:30:43 +02:00
Eliran Sinvani
72bf244ad1 cql: Fix crash upon use of the word empty for service level name
Wrong access to an uninitialized token instead of the actual
generated string caused the parser to crash. This wasn't
detected by the ANTLR3 compiler because all the temporary
variables defined in the ANTLR3 statements are global in the
generated code. This essentially caused a null dereference.

Tests: 1. The fixed issue scenario from github.
       2. Unit tests in release mode.

Fixes #11774

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <20190612133151.20609-1-eliransin@scylladb.com>

Closes #11777

(cherry picked from commit ab7429b77d)
2022-11-10 20:42:59 +02:00
Botond Dénes
ee82323599 db/view/view_builder: don't drop partition and range tombstones when resuming
The view builder builds the views from a given base table in
view_builder::batch_size batches of rows. After processing this many
rows, it suspends so the view builder can switch to building views for
other base tables in the name of fairness. When resuming the build step
for a given base table, it reuses the reader used previously (also
serving the role of a snapshot, pinning sstables read from). The
compactor however is created anew. As the reader can be in the middle of
a partition, the view builder injects a partition start into the
compactor to prime it for continuing the partition. This however only
included the partition-key, crucially missing any active tombstones:
the partition tombstone or -- since the v2 transition -- an active range
tombstone. This can result in base rows covered by either of these being
resurrected and the view builder generating view updates for them.
This patch solves this by using the detach-state mechanism of the
compactor which was explicitly developed for situations like this (in
the range scan code) -- resuming a read with the readers kept but the
compactor recreated.
Also included are two test cases reproducing the problem, one with a
range tombstone, the other with a partition tombstone.

Fixes: #11668

Closes #11671

(cherry picked from commit 5621cdd7f9)
2022-11-07 11:45:37 +02:00
Alexander Turetskiy
2f78df92ab Alternator: Projection field added to return from DescribeTable which describes GSIs and LSIs.
The return from DescribeTable which describes GSIs and LSIs is missing
the Projection field. We do not yet support all the settings Projection
(see #5036), but the default which we support is ALL, and DescribeTable
should return that in its description.

Fixes #11470

Closes #11693

(cherry picked from commit 636e14cc77)
2022-11-07 10:36:04 +02:00
Takuya ASADA
e2809674d2 locator::ec2_snitch: Retry HTTP request to EC2 instance metadata service
EC2 instance metadata service can be busy; let's retry the connection with
an interval, just like we do in scylla-machine-image.

Fixes #10250

Signed-off-by: Takuya ASADA <syuu@scylladb.com>

Closes #11688

(cherry picked from commit 6b246dc119)
2022-11-06 15:43:06 +02:00
Yaron Kaikov
0295d0c5c8 release: prepare for 5.1.0-rc4 2022-11-06 14:49:29 +02:00
Botond Dénes
fa94222662 Merge 'Alternator, MV: fix bug in some view updates which set the view key to its existing value' from Nadav Har'El
As described in issue #11801, we saw cases in Alternator where, when a GSI has both partition and sort keys that are non-key attributes in the base table, updating the GSI-sort-key attribute to the same value it already had caused the entire GSI row to be deleted.

This series fixes the bug (it was a bug in our materialized views implementation) and adds a reproducing test (plus a few more tests for similar situations which worked before the patch, and continue to work after it).

Fixes #11801

Closes #11808

* github.com:scylladb/scylladb:
  test/alternator: add test for issue 11801
  MV: fix handling of view update which reassign the same key value
  materialized views: inline used-once and confusing function, replace_entry()

(cherry picked from commit e981bd4f21)
2022-11-01 13:14:21 +02:00
Pavel Emelyanov
dff7f3c5ba compaction_manager: Swallow ENOSPCs in ::stop()
When being stopped compaction manager may step on ENOSPC. This is not a
reason to fail stopping process with abort, better to warn this fact in
logs and proceed as if nothing happened

refs: #11245

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-13 15:36:44 +03:00
Pavel Emelyanov
3723713130 exceptions: Mark storage_io_error::code() with noexcept
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-13 15:36:38 +03:00
Pavel Emelyanov
03f8411e38 table: Handle storage_io_error's ENOSPC when flushing
Commit a9805106 (table: seal_active_memtable: handle ENOSPC error)
made the memtable flushing code withstand ENOSPC and continue flushing
in the hope that the node administrator would provide some free space.

However, it looks like the IO code may report back ENOSPC with an
exception type this code doesn't expect. This patch tries to fix that.
refs: #11245

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-13 15:35:26 +03:00
Pavel Emelyanov
0e391d67d1 table: Rewrap retry loop
The existing loop is very branchy in its attempts to find out whether or
not to abort. The "allowed_retries" count can be a good indicator of the
decision taken. This makes the code notably shorter and easier to extend

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-13 15:35:25 +03:00
Benny Halevy
f76989285e table: seal_active_memtable: handle ENOSPC error
Aborting too soon on ENOSPC is too harsh, leading to loss of
availability of the node for reads, while restarting it won't
solve the ENOSPC condition.

Fixes #11245

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #11246
2022-10-13 15:35:19 +03:00
Beni Peled
9deeeb4db1 release: prepare for 5.1.0-rc3 2022-10-09 08:36:06 +03:00
Avi Kivity
1f3196735f Update tools/java submodule (cqlsh permissions)
* tools/java ad6764b506...b3959948dd (1):
  > install.sh is using wrong permissions for install cqlsh files

Fixes #11584.
2022-10-04 18:02:03 +03:00
Nadav Har'El
abb6817261 cql: validate bloom_filter_fp_chance up-front
Scylla's Bloom filter implementation has a minimal false-positive rate
that it can support (6.71e-5). When setting bloom_filter_fp_chance any
lower than that, the compute_bloom_spec() function, which writes the bloom
filter, throws an exception. However, this is too late - it only happens
while flushing the memtable to disk, and a failure at that point causes
Scylla to crash.

Instead, we should refuse the table creation with the unsupported
bloom_filter_fp_chance. This is also what Cassandra did six years ago -
see CASSANDRA-11920.

This patch also includes a regression test, which crashes Scylla before
this patch but passes after the patch (and also passes on Cassandra).

Fixes #11524.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #11576

(cherry picked from commit 4c93a694b7)
2022-10-04 16:21:48 +03:00
Nadav Har'El
d3fd090429 alternator: return ProvisionedThroughput in DescribeTable
DescribeTable is currently hard-coded to return PAY_PER_REQUEST billing
mode. Nevertheless, even in PAY_PER_REQUEST mode, the DescribeTable
operation must return a ProvisionedThroughput structure, listing both
ReadCapacityUnits and WriteCapacityUnits as 0. This requirement is not
stated in some of the DynamoDB documentation but is explicitly mentioned in
https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_ProvisionedThroughput.html
Also, empirically, DynamoDB returns ProvisionedThroughput with zeros
even in PAY_PER_REQUEST mode. We even had an xfailing test to confirm this.

The ProvisionedThroughput structure being missing was a problem for
applications like DynamoDB connectors for Spark, if they implicitly
assume that ProvisionedThroughput is returned by DescribeTable, and
fail (as described in issue #11222) if it's outright missing.

So this patch adds the missing ProvisionedThroughput structure, and
the xfailing test starts to pass.

Note that this patch doesn't change the fact that attempting to set
a table to PROVISIONED billing mode is ignored: DescribeTable continues
to always return PAY_PER_REQUEST as the billing mode and zero as the
provisioned capacities.

Fixes #11222

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #11298

(cherry picked from commit 941c719a23)
2022-10-03 14:26:55 +03:00
Pavel Emelyanov
3e7c57d162 cross-shard-barrier: Capture shared barrier in complete
When cross-shard barrier is abort()-ed it spawns a background fiber
that will wake-up other shards (if they are sleeping) with exception.

This fiber is implicitly waited by the owning sharded service .stop,
because barrier usage is like this:

    sharded<service> s;
    co_await s.invoke_on_all([] {
        ...
        barrier.abort();
    });
    ...
    co_await s.stop();

If abort happens, the invoke_on_all() will only resolve _after_ it
queues up the waking lambdas into smp queues, thus the subsequent stop
will queue its stopping lambdas after the barrier's ones.

However, in debug mode the queue can be shuffled, so the owning service
can suddenly be freed from under the barrier's feet causing use after
free. Fortunately, this can be easily fixed by capturing the shared
pointer on the shared barrier instead of a regular pointer on the
shard-local barrier.

fixes: #11303

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #11553
2022-10-03 13:20:28 +03:00
Tomasz Grabiec
f878a34da3 test: lib: random_mutation_generator: Don't generate mutations with marker uncompacted with shadowable tombstone
The generator was first setting the marker then applied tombstones.

The marker was set like this:

  row.marker() = random_row_marker();

Later, when shadowable tombstones were applied, they were compacted
with the marker as expected.

However, the key for the row was chosen randomly in each iteration and
there are multiple keys set, so there was a possibility of a key clash
with an earlier row. This could override the marker without applying
any tombstones, which is conditional on random choice.

This could generate rows with markers uncompacted with shadowable tombstones.

This broke row_cache_test::test_concurrent_reads_and_eviction on
comparison between expected and read mutations. The latter was
compacted because it went through an extra merge path, which compacts
the row.

Fix by making sure there are no key clashes.

Closes #11663

(cherry picked from commit 5268f0f837)
2022-10-02 16:44:57 +03:00
Raphael S. Carvalho
eaded57b2e compaction: Properly handle stop request for off-strategy
If user stops off-strategy via API, compaction manager can decide
to give up on it completely, so data will sit unreshaped in
maintenance set, preventing it from being compacted with data
in the main set. That's problematic because it will probably lead
to a significant increase in read and space amplification until
off-strategy is triggered again, which cannot happen anytime
soon.

Let's handle it by moving data in maintenance set into main one,
even if unreshaped. Then regular compaction will be able to
continue from where off-strategy left off.

Fixes #11543.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #11545

(cherry picked from commit a04047f390)
2022-10-02 14:20:17 +03:00
Tomasz Grabiec
25d2da08d1 db: range_tombstone_list: Avoid quadratic behavior when applying
Range tombstones are kept in memory (cache/memtable) in
range_tombstone_list. It keeps them deoverlapped, so applying a range
tombstone which covers many range tombstones will erase existing range
tombstones from the list. This operation needs to be exception-safe,
so range_tombstone_list maintains an undo log. This undo log will
receive a record for each range tombstone which is removed. For
exception safety reasons, before pushing an undo log entry, we reserve
space in the log by calling std::vector::reserve(size() + 1). This is
O(N) where N is the number of undo log entries. Therefore, the whole
application is O(N^2).

This can cause reactor stalls and availability issues when replicas
apply such deletions.

This patch avoids the problem by reserving an exponentially increasing
amount of space. Also, to avoid large allocations, it switches the
container to chunked_vector.

Fixes #11211

Closes #11215

(cherry picked from commit 7f80602b01)
2022-09-30 00:01:26 +03:00
Botond Dénes
9b1a570f6f sstables: crawling mx-reader: make on_out_of_clustering_range() no-op
Said method currently emits a partition-end. This method is only called
when the last fragment in the stream is a range tombstone change with a
position after all clustered rows. The problem is that
consume_partition_end() is also called unconditionally, resulting in two
partition-end fragments being emitted. The fix is simple: make this
method a no-op, there is nothing to do there.

Also add two tests: one targeted to this bug and another one testing the
crawling reader with random mutations generated for random schema.

Fixes: #11421

Closes #11422

(cherry picked from commit be9d1c4df4)
2022-09-29 23:42:01 +03:00
Piotr Dulikowski
426d045249 exception: fix the error code used for rate_limit_exception
Per-partition rate limiting added a new error type which should be
returned when Scylla decides to reject an operation due to per-partition
rate limit being exceeded. The new error code requires drivers to
negotiate support for it, otherwise Scylla will report the error as
`Config_error`. The existing error code override logic works properly,
however due to a mistake Scylla will report the `Config_error` code even
if the driver correctly negotiated support for it.

This commit fixes the problem by specifying the correct error code in
`rate_limit_exception`'s constructor.

Tested manually with a modified version of the Rust driver which
negotiates support for the new error. Additionally, tested what happens
when the driver doesn't negotiate support (Scylla properly falls back to
`Config_error`).

Branches: 5.1
Fixes: #11517

Closes #11518

(cherry picked from commit e69b44a60f)
2022-09-29 23:39:25 +03:00
Botond Dénes
86dbbf12cc shard_reader: do_fill_buffer(): only update _end_of_stream after buffer is copied
Commit 8ab57aa added a yield to the buffer-copy loop, which means that
the copy can yield before it is done, and the multishard reader might see
the half-copied buffer and consider the reader done (because
`_end_of_stream` is already set), resulting in dropping the remaining
part of the buffer and in an invalid stream if the last copied fragment
wasn't a partition-end.

Fixes: #11561
(cherry picked from commit 0c450c9d4c)
2022-09-29 19:11:52 +03:00
Pavel Emelyanov
b05903eddd messaging_service: Fix gossiper verb group
When configuring tcp-nodelay unconditionally, messaging service thinks
gossiper uses group index 1, though it had changed some time ago and now
those verbs belong to group 0.

fixes: #11465

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
(cherry picked from commit 2c74062962)
2022-09-29 19:05:11 +03:00
Piotr Sarna
26ead53304 Merge 'Fix mutation commutativity with shadowable tombstone'
from Tomasz Grabiec

This series fixes lack of mutation associativity which manifests as
sporadic failures in
row_cache_test.cc::test_concurrent_reads_and_eviction due to differences
in mutations applied and read.

No known production impact.

Refs https://github.com/scylladb/scylladb/issues/11307

Closes #11312

* github.com:scylladb/scylladb:
  test: mutation_test: Add explicit test for mutation commutativity
  test: random_mutation_generator: Workaround for non-associativity of mutations with shadowable tombstones
  db: mutation_partition: Drop unnecessary maybe_shadow()
  db: mutation_partition: Maintain shadowable tombstone invariant when applying a hard tombstone
  mutation_partition: row: make row marker shadowing symmetric

(cherry picked from commit 484004e766)
2022-09-20 23:21:58 +02:00
Tomasz Grabiec
f60bab9471 test: row_cache: Use more narrow key range to stress overlapping reads more
This makes catching issues related to concurrent access of same or
adjacent entries more likely. For example, catches #11239.

Closes #11260

(cherry picked from commit 8ee5b69f80)
2022-09-20 23:21:54 +02:00
Yaron Kaikov
66f34245fc release: prepare for 5.1.0-rc2 2022-09-19 14:35:28 +03:00
Michał Chojnowski
4047528bd9 db: commitlog: don't print INFO logs on shutdown
The intention was for these logs to be printed during the
database shutdown sequence, but it was overlooked that it's not
the only place where commitlog::shutdown is called.
Commitlogs are started and shut down periodically by hinted handoff.
When that happens, these messages spam the log.

Fix that by adding INFO commitlog shutdown logs to database::stop,
and change the level of the commitlog::shutdown log call to DEBUG.

Fixes #11508

Closes #11536

(cherry picked from commit 9b6fc553b4)
2022-09-18 13:33:05 +03:00
Michał Chojnowski
1a82c61452 sstables: add a flag for disabling long-term index caching
Long-term index caching in the global cache, as introduced in 4.6, is a major
pessimization for workloads where accesses to the index are (spatially) sparse.
We want to have a way to disable it for the affected workloads.

There is already infrastructure in place for disabling it for BYPASS CACHE
queries. One way of solving the issue is hijacking that infrastructure.

This patch adds a global flag (and a corresponding CLI option) which controls
index caching. Setting the flag to `false` causes all index reads to behave
like they would in BYPASS CACHE queries.

Consequences of this choice:

- The per-SSTable partition_index_cache is unused. Every index_reader has
  its own, and they die together. Independent reads can no longer reuse the
  work of other reads which hit the same index pages. This is not crucial,
  since partition accesses have no (natural) spatial locality. Note that
  the original reason for partition_index_cache -- the ability to share
  reads for the lower and upper bound of the query -- is unaffected.
- The per-SSTable cached_file is unused. Every index_reader has its own
  (uncached) input stream from the index file, and every
  bsearch_clustered_cursor has its own cached_file, which dies together with
  the cursor. Note that the cursor still can perform its binary search with
  caching. However, it won't be able to reuse the file pages read by
  index_reader. In particular, if the promoted index is small, and fits inside
  the same file page as its index_entry, that page will be re-read.
  It can also happen that index_reader will read the same index file page
  multiple times. When the summary is so dense that multiple index pages fit in
  one index file page, advancing the upper bound, which reads the next index
  page, will read the same index file page. Since summary:disk ratio is 1:2000,
  this is expected to happen for partitions with size greater than 2000
  partition keys.
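
The gating logic described above can be sketched roughly as follows (hypothetical names; the real implementation reuses Scylla's existing BYPASS CACHE plumbing):

```cpp
#include <cassert>

// Sketch: a single global flag piggybacks on the BYPASS CACHE code path.
// When long-term index caching is disabled, every index read behaves as
// if the query had specified BYPASS CACHE.
enum class index_cache_mode { cached, bypass };

index_cache_mode effective_index_cache_mode(bool cache_index_pages_flag,
                                            bool query_bypasses_cache) {
    if (!cache_index_pages_flag || query_bypasses_cache) {
        return index_cache_mode::bypass;
    }
    return index_cache_mode::cached;
}
```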

Fixes #11202

(cherry picked from commit cdb3e71045)
2022-09-18 13:27:46 +03:00
Avi Kivity
3d9800eb1c logalloc: don't crash while reporting reclaim stalls if --abort-on-seastar-bad-alloc is specified
The logger is proof against allocation failures, except if
--abort-on-seastar-bad-alloc is specified. If it is, it will crash.

The reclaim stall report is likely to be called in low memory conditions
(reclaim's job is to alleviate these conditions, after all), so we're
likely to crash here if we're reclaiming under a very low memory
condition and have a large stall at the same time (AND we're running in
a debug environment).

Prevent all this by disabling --abort-on-seastar-bad-alloc temporarily.
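
The save/clear/restore approach can be sketched as a small RAII guard (the real code uses seastar's memory layer; `g_abort_on_bad_alloc` is a stand-in for the real global):

```cpp
#include <cassert>

// Stand-in for the real abort-on-bad-alloc setting.
inline bool g_abort_on_bad_alloc = true;

// RAII sketch: save the flag, clear it for the duration of the stall
// report, restore it on scope exit (even if the report throws).
struct disable_abort_on_bad_alloc_temporarily {
    bool saved = g_abort_on_bad_alloc;
    disable_abort_on_bad_alloc_temporarily() { g_abort_on_bad_alloc = false; }
    ~disable_abort_on_bad_alloc_temporarily() { g_abort_on_bad_alloc = saved; }
};
```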

Fixes #11549

Closes #11555

(cherry picked from commit d3b8c0c8a6)
2022-09-18 13:24:21 +03:00
Karol Baryła
c48e9b47dd transport/server.cc: Return correct size of decompressed lz4 buffer
An incorrect size is returned from the function, which could lead to
crashes or undefined behavior. Fix by erroring out in these cases.
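
The shape of the check can be sketched as follows (a hedged illustration, not the actual transport code; `decompress_result` stands in for the return value of the real LZ4 call, which is the number of bytes written, or negative on malformed input):

```cpp
#include <cstddef>
#include <stdexcept>

// Sketch: never trust the caller-declared uncompressed size. Validate
// the decompressor's actual result and error out on failure or mismatch
// instead of returning a bogus size.
std::size_t checked_decompressed_size(int decompress_result,
                                      std::size_t expected_size) {
    if (decompress_result < 0) {
        throw std::runtime_error("LZ4 decompression failure");
    }
    if (static_cast<std::size_t>(decompress_result) != expected_size) {
        throw std::runtime_error("decompressed size mismatch");
    }
    return static_cast<std::size_t>(decompress_result);
}
```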

Fixes #11476

(cherry picked from commit 1c2eef384d)
2022-09-07 10:58:30 +03:00
Avi Kivity
2eadaad9f7 Merge 'database: evict all inactive reads for table when detaching table' from Botond Dénes
Currently, when detaching a table from the database, we force-evict all queriers for said table. This series broadens the scope of this force-evict to include all inactive reads registered at the semaphore. This ensures that any regular inactive read "forgotten" in the semaphore for any reason will not end up accessing a dangling table reference when destroyed later.

Fixes: https://github.com/scylladb/scylladb/issues/11264

Closes #11273

* github.com:scylladb/scylladb:
  querier: querier_cache: remove now unused evict_all_for_table()
  database: detach_column_family(): use reader_concurrency_semaphore::evict_inactive_reads_for_table()
  reader_concurrency_semaphore: add evict_inactive_reads_for_table()

(cherry picked from commit afa7960926)
2022-09-02 10:41:22 +03:00
Yaron Kaikov
d10aee15e7 release: prepare for 5.1.0-rc1 2022-09-02 06:15:05 +03:00
Avi Kivity
9e017cb1e6 Update seastar submodule (tls error handling)
* seastar f9f5228b74...3aa91b4d2d (1):
  > Merge 'tls: vec_push: handle async errors rather than throwing on_internal_error' from Benny Halevy

Fixes #11252
2022-09-01 13:10:13 +03:00
Avi Kivity
b8504cc9b2 .gitmodules: switch seastar to scylla-seastar.git
This allows us to backport seastar patches to branch-5.1 on
scylla-seastar.git.
2022-09-01 13:08:22 +03:00
Avi Kivity
856703a85e Merge 'row_cache: Fix missing row if upper bound of population range is evicted and has adjacent dummy' from Tomasz Grabiec
Scenario:

cache = [
    row(pos=2, continuous=false),
    row(pos=after(2), dummy=true)
]

Scanning read starts, starts populating [-inf, before(2)] from sstables.

row(pos=2) is evicted.

cache = [
    row(pos=after(2), dummy=true)
]

Scanning read finishes reading from sstables.

Refreshes cache cursor via
partition_snapshot_row_cursor::maybe_refresh(), which calls
partition_snapshot_row_cursor::advance_to() because iterators are
invalidated. This advances the cursor to
after(2). no_clustering_row_between(2, after(2)) returns true, so
advance_to() returns true, and maybe_refresh() returns true. This is
interpreted by the cache reader as "the cursor has not moved forward",
so it marks the range as complete, without emitting the row with
pos=2. Also, it marks row(pos=after(2)) as continuous, so later reads
will also miss the row.

The bug is in advance_to(), which is using
no_clustering_row_between(a, b) to determine its result, which by
definition excludes the starting key.

Discovered by row_cache_test.cc::test_concurrent_reads_and_eviction
with reduced key range in the random_mutation_generator (1024 -> 16).
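
The boundary bug can be illustrated with (key, is-after-key) pairs standing in for clustering positions, where after(k) sorts just after k (hypothetical minimal types, not the actual cursor code):

```cpp
#include <cassert>
#include <tuple>

// Sketch of the fix: the buggy check asked "is there no clustering row
// strictly between the requested and landed positions?", which is also
// true when the cursor skipped an evicted row at `requested` and landed
// on the adjacent dummy after(requested). The correct "cursor has not
// moved" test is positional equality.
struct pos { int key; bool after; };
inline bool operator==(pos a, pos b) {
    return std::tie(a.key, a.after) == std::tie(b.key, b.after);
}

bool advanced_past_requested(pos requested, pos landed) {
    return !(landed == requested);  // moved forward iff positions differ
}
```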

Fixes #11239

Closes #11240

* github.com:scylladb/scylladb:
  test: mvcc: Fix illegal use of maybe_refresh()
  tests: row_cache_test: Add test_eviction_of_upper_bound_of_population_range()
  tests: row_cache_test: Introduce one_shot mode to throttle
  row_cache: Fix missing row if upper bound of population range is evicted and has adjacent dummy
2022-08-11 16:51:59 +02:00
Yaron Kaikov
86a6c1fb2b release: prepare for 5.1.0-rc0 2022-08-09 18:48:43 +03:00
242 changed files with 6671 additions and 1431 deletions

.gitmodules vendored

@@ -1,6 +1,6 @@
 [submodule "seastar"]
 path = seastar
-url = ../seastar
+url = ../scylla-seastar
 ignore = dirty
 [submodule "swagger-ui"]
 path = swagger-ui


@@ -60,7 +60,7 @@ fi
 # Default scylla product/version tags
 PRODUCT=scylla
-VERSION=5.1.0-dev
+VERSION=5.1.13
 if test -f version
 then


@@ -34,6 +34,7 @@
#include "expressions.hh"
#include "conditions.hh"
#include "cql3/constants.hh"
#include "cql3/util.hh"
#include <optional>
#include "utils/overloaded_functor.hh"
#include <seastar/json/json_elements.hh>
@@ -438,6 +439,11 @@ future<executor::request_return_type> executor::describe_table(client_state& cli
rjson::add(table_description, "BillingModeSummary", rjson::empty_object());
rjson::add(table_description["BillingModeSummary"], "BillingMode", "PAY_PER_REQUEST");
rjson::add(table_description["BillingModeSummary"], "LastUpdateToPayPerRequestDateTime", rjson::value(creation_date_seconds));
// In PAY_PER_REQUEST billing mode, provisioned capacity should return 0
rjson::add(table_description, "ProvisionedThroughput", rjson::empty_object());
rjson::add(table_description["ProvisionedThroughput"], "ReadCapacityUnits", 0);
rjson::add(table_description["ProvisionedThroughput"], "WriteCapacityUnits", 0);
rjson::add(table_description["ProvisionedThroughput"], "NumberOfDecreasesToday", 0);
std::unordered_map<std::string,std::string> key_attribute_types;
// Add base table's KeySchema and collect types for AttributeDefinitions:
@@ -460,6 +466,11 @@ future<executor::request_return_type> executor::describe_table(client_state& cli
rjson::add(view_entry, "IndexArn", generate_arn_for_index(*schema, index_name));
// Add indexes's KeySchema and collect types for AttributeDefinitions:
describe_key_schema(view_entry, *vptr, key_attribute_types);
// Add projection type
rjson::value projection = rjson::empty_object();
rjson::add(projection, "ProjectionType", "ALL");
// FIXME: we have to get ProjectionType from the schema when it is added
rjson::add(view_entry, "Projection", std::move(projection));
// Local secondary indexes are marked by an extra '!' sign occurring before the ':' delimiter
rjson::value& index_array = (delim_it > 1 && cf_name[delim_it-1] == '!') ? lsi_array : gsi_array;
rjson::push_back(index_array, std::move(view_entry));
@@ -917,9 +928,10 @@ static future<executor::request_return_type> create_table_on_shard0(tracing::tra
if (!range_key.empty() && range_key != view_hash_key && range_key != view_range_key) {
add_column(view_builder, range_key, attribute_definitions, column_kind::clustering_key);
}
-sstring where_clause = "\"" + view_hash_key + "\" IS NOT NULL";
+sstring where_clause = format("{} IS NOT NULL", cql3::util::maybe_quote(view_hash_key));
 if (!view_range_key.empty()) {
-    where_clause = where_clause + " AND \"" + view_hash_key + "\" IS NOT NULL";
+    where_clause = format("{} AND {} IS NOT NULL", where_clause,
+        cql3::util::maybe_quote(view_range_key));
 }
where_clauses.push_back(std::move(where_clause));
view_builders.emplace_back(std::move(view_builder));
@@ -974,9 +986,10 @@ static future<executor::request_return_type> create_table_on_shard0(tracing::tra
// Note above we don't need to add virtual columns, as all
// base columns were copied to view. TODO: reconsider the need
// for virtual columns when we support Projection.
-sstring where_clause = "\"" + view_hash_key + "\" IS NOT NULL";
+sstring where_clause = format("{} IS NOT NULL", cql3::util::maybe_quote(view_hash_key));
 if (!view_range_key.empty()) {
-    where_clause = where_clause + " AND \"" + view_range_key + "\" IS NOT NULL";
+    where_clause = format("{} AND {} IS NOT NULL", where_clause,
+        cql3::util::maybe_quote(view_range_key));
 }
where_clauses.push_back(std::move(where_clause));
view_builders.emplace_back(std::move(view_builder));


@@ -142,19 +142,24 @@ future<alternator::executor::request_return_type> alternator::executor::list_str
 auto table = find_table(_proxy, request);
 auto db = _proxy.data_dictionary();
 auto cfs = db.get_tables();
-auto i = cfs.begin();
-auto e = cfs.end();
 if (limit < 1) {
     throw api_error::validation("Limit must be 1 or more");
 }
-// TODO: the unordered_map here is not really well suited for partial
-// querying - we're sorting on local hash order, and creating a table
-// between queries may or may not miss info. But that should be rare,
-// and we can probably expect this to be a single call.
+// # 12601 (maybe?) - sort the set of tables on ID. This should ensure we never
+// generate duplicates in a paged listing here. Can obviously miss things if they
+// are added between paged calls and end up with a "smaller" UUID/ARN, but that
+// is to be expected.
+std::sort(cfs.begin(), cfs.end(), [](const data_dictionary::table& t1, const data_dictionary::table& t2) {
+    return t1.schema()->id() < t2.schema()->id();
+});
+auto i = cfs.begin();
+auto e = cfs.end();
 if (streams_start) {
-    i = std::find_if(i, e, [&](data_dictionary::table t) {
+    i = std::find_if(i, e, [&](const data_dictionary::table& t) {
         return t.schema()->id() == streams_start


@@ -610,6 +610,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
if (column_families.empty()) {
column_families = map_keys(ctx.db.local().find_keyspace(keyspace).metadata().get()->cf_meta_data());
}
apilog.debug("force_keyspace_compaction: keyspace={} tables={}", keyspace, column_families);
return ctx.db.invoke_on_all([keyspace, column_families] (replica::database& db) -> future<> {
auto table_ids = boost::copy_range<std::vector<utils::UUID>>(column_families | boost::adaptors::transformed([&] (auto& cf_name) {
return db.find_uuid(keyspace, cf_name);
@@ -634,6 +635,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
if (column_families.empty()) {
column_families = map_keys(ctx.db.local().find_keyspace(keyspace).metadata().get()->cf_meta_data());
}
apilog.info("force_keyspace_cleanup: keyspace={} tables={}", keyspace, column_families);
return ss.local().is_cleanup_allowed(keyspace).then([&ctx, keyspace,
column_families = std::move(column_families)] (bool is_cleanup_allowed) mutable {
if (!is_cleanup_allowed) {
@@ -653,7 +655,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
// as a table can be dropped during loop below, let's find it before issuing the cleanup request.
for (auto& id : table_ids) {
replica::table& t = db.find_column_family(id);
-co_await cm.perform_cleanup(owned_ranges_ptr, t.as_table_state());
+co_await t.perform_cleanup_compaction(owned_ranges_ptr);
}
co_return;
}).then([]{
@@ -663,6 +665,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
});
ss::perform_keyspace_offstrategy_compaction.set(r, wrap_ks_cf(ctx, [] (http_context& ctx, std::unique_ptr<request> req, sstring keyspace, std::vector<sstring> tables) -> future<json::json_return_type> {
apilog.info("perform_keyspace_offstrategy_compaction: keyspace={} tables={}", keyspace, tables);
co_return co_await ctx.db.map_reduce0([&keyspace, &tables] (replica::database& db) -> future<bool> {
bool needed = false;
for (const auto& table : tables) {
@@ -676,6 +679,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
ss::upgrade_sstables.set(r, wrap_ks_cf(ctx, [] (http_context& ctx, std::unique_ptr<request> req, sstring keyspace, std::vector<sstring> column_families) {
bool exclude_current_version = req_param<bool>(*req, "exclude_current_version", false);
apilog.info("upgrade_sstables: keyspace={} tables={} exclude_current_version={}", keyspace, column_families, exclude_current_version);
return ctx.db.invoke_on_all([=] (replica::database& db) {
auto owned_ranges_ptr = compaction::make_owned_ranges_ptr(db.get_keyspace_local_ranges(keyspace));
return do_for_each(column_families, [=, &db](sstring cfname) {
@@ -691,6 +695,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
ss::force_keyspace_flush.set(r, [&ctx](std::unique_ptr<request> req) -> future<json::json_return_type> {
auto keyspace = validate_keyspace(ctx, req->param);
auto column_families = parse_tables(keyspace, ctx, req->query_parameters, "cf");
apilog.info("perform_keyspace_flush: keyspace={} tables={}", keyspace, column_families);
auto &db = ctx.db.local();
if (column_families.empty()) {
co_await db.flush_on_all(keyspace);
@@ -702,6 +707,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
ss::decommission.set(r, [&ss](std::unique_ptr<request> req) {
apilog.info("decommission");
return ss.local().decommission().then([] {
return make_ready_future<json::json_return_type>(json_void());
});
@@ -717,6 +723,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
ss::remove_node.set(r, [&ss](std::unique_ptr<request> req) {
auto host_id = req->get_query_param("host_id");
std::vector<sstring> ignore_nodes_strs= split(req->get_query_param("ignore_nodes"), ",");
apilog.info("remove_node: host_id={} ignore_nodes={}", host_id, ignore_nodes_strs);
auto ignore_nodes = std::list<gms::inet_address>();
for (std::string n : ignore_nodes_strs) {
try {
@@ -789,6 +796,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
});
ss::drain.set(r, [&ss](std::unique_ptr<request> req) {
apilog.info("drain");
return ss.local().drain().then([] {
return make_ready_future<json::json_return_type>(json_void());
});
@@ -822,12 +830,14 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
});
ss::stop_gossiping.set(r, [&ss](std::unique_ptr<request> req) {
apilog.info("stop_gossiping");
return ss.local().stop_gossiping().then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
ss::start_gossiping.set(r, [&ss](std::unique_ptr<request> req) {
apilog.info("start_gossiping");
return ss.local().start_gossiping().then([] {
return make_ready_future<json::json_return_type>(json_void());
});
@@ -930,6 +940,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
ss::rebuild.set(r, [&ss](std::unique_ptr<request> req) {
auto source_dc = req->get_query_param("source_dc");
apilog.info("rebuild: source_dc={}", source_dc);
return ss.local().rebuild(std::move(source_dc)).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
@@ -966,6 +977,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
// FIXME: We should truncate schema tables if more than one node in the cluster.
auto& sp = service::get_storage_proxy();
auto& fs = sp.local().features();
apilog.info("reset_local_schema");
return db::schema_tables::recalculate_schema_version(sys_ks, sp, fs).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
@@ -973,6 +985,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
ss::set_trace_probability.set(r, [](std::unique_ptr<request> req) {
auto probability = req->get_query_param("probability");
apilog.info("set_trace_probability: probability={}", probability);
return futurize_invoke([probability] {
double real_prob = std::stod(probability.c_str());
return tracing::tracing::tracing_instance().invoke_on_all([real_prob] (auto& local_tracing) {
@@ -1010,6 +1023,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
auto ttl = req->get_query_param("ttl");
auto threshold = req->get_query_param("threshold");
auto fast = req->get_query_param("fast");
apilog.info("set_slow_query: enable={} ttl={} threshold={} fast={}", enable, ttl, threshold, fast);
try {
return tracing::tracing::tracing_instance().invoke_on_all([enable, ttl, threshold, fast] (auto& local_tracing) {
if (threshold != "") {
@@ -1036,6 +1050,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
auto keyspace = validate_keyspace(ctx, req->param);
auto tables = parse_tables(keyspace, ctx, req->query_parameters, "cf");
apilog.info("enable_auto_compaction: keyspace={} tables={}", keyspace, tables);
return set_tables_autocompaction(ctx, keyspace, tables, true);
});
@@ -1043,6 +1058,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
auto keyspace = validate_keyspace(ctx, req->param);
auto tables = parse_tables(keyspace, ctx, req->query_parameters, "cf");
apilog.info("disable_auto_compaction: keyspace={} tables={}", keyspace, tables);
return set_tables_autocompaction(ctx, keyspace, tables, false);
});


@@ -55,6 +55,7 @@ future<bool> default_role_row_satisfies(
return qp.execute_internal(
query,
db::consistency_level::ONE,
internal_distributed_query_state(),
{meta::DEFAULT_SUPERUSER_NAME},
cql3::query_processor::cache_internal::yes).then([&qp, &p](::shared_ptr<cql3::untyped_result_set> results) {
if (results->empty()) {


@@ -80,8 +80,10 @@ struct compaction_data {
}
 void stop(sstring reason) {
-    stop_requested = std::move(reason);
-    abort.request_abort();
+    if (!abort.abort_requested()) {
+        stop_requested = std::move(reason);
+        abort.request_abort();
+    }
 }
};


@@ -15,6 +15,7 @@
#include <seastar/core/coroutine.hh>
#include <seastar/coroutine/switch_to.hh>
#include <seastar/coroutine/parallel_for_each.hh>
#include <seastar/coroutine/maybe_yield.hh>
#include "sstables/exceptions.hh"
#include "locator/abstract_replication_strategy.hh"
#include "utils/fb_utilities.hh"
@@ -637,6 +638,7 @@ sstables::compaction_stopped_exception compaction_manager::task::make_compaction
compaction_manager::compaction_manager(config cfg, abort_source& as)
: _cfg(std::move(cfg))
, _compaction_submission_timer(compaction_sg().cpu, compaction_submission_callback())
, _compaction_controller(make_compaction_controller(compaction_sg(), static_shares(), [this] () -> float {
_last_backlog = backlog();
auto b = _last_backlog / available_memory();
@@ -670,6 +672,7 @@ compaction_manager::compaction_manager(config cfg, abort_source& as)
compaction_manager::compaction_manager()
: _cfg(config{ .available_memory = 1 })
, _compaction_submission_timer(compaction_sg().cpu, compaction_submission_callback())
, _compaction_controller(make_compaction_controller(compaction_sg(), 1, [] () -> float { return 1.0; }))
, _backlog_manager(_compaction_controller)
, _throughput_updater(serialized_action([this] { return update_throughput(throughput_mbs()); }))
@@ -726,38 +729,46 @@ void compaction_manager::register_metrics() {
void compaction_manager::enable() {
assert(_state == state::none || _state == state::disabled);
_state = state::enabled;
-    _compaction_submission_timer.arm(periodic_compaction_submission_interval());
-    postponed_compactions_reevaluation();
+    _compaction_submission_timer.arm_periodic(periodic_compaction_submission_interval());
+    _waiting_reevalution = postponed_compactions_reevaluation();
}
std::function<void()> compaction_manager::compaction_submission_callback() {
return [this] () mutable {
for (auto& e: _compaction_state) {
-        submit(*e.first);
+        postpone_compaction_for_table(e.first);
}
reevaluate_postponed_compactions();
};
}
-void compaction_manager::postponed_compactions_reevaluation() {
-    _waiting_reevalution = repeat([this] {
-        return _postponed_reevaluation.wait().then([this] {
-            if (_state != state::enabled) {
-                _postponed.clear();
-                return stop_iteration::yes;
-            }
-            auto postponed = std::move(_postponed);
-            try {
-                for (auto& t : postponed) {
-                    auto s = t->schema();
-                    cmlog.debug("resubmitting postponed compaction for table {}.{} [{}]", s->ks_name(), s->cf_name(), fmt::ptr(t));
-                    submit(*t);
-                }
-            } catch (...) {
-                _postponed = std::move(postponed);
-            }
-            return stop_iteration::no;
-        });
-    });
-}
+future<> compaction_manager::postponed_compactions_reevaluation() {
+    while (true) {
+        co_await _postponed_reevaluation.when();
+        if (_state != state::enabled) {
+            _postponed.clear();
+            co_return;
+        }
+        // A task_state being reevaluated can re-insert itself into postponed list, which is the reason
+        // for moving the list to be processed into a local.
+        auto postponed = std::exchange(_postponed, {});
+        try {
+            for (auto it = postponed.begin(); it != postponed.end();) {
+                compaction::table_state* t = *it;
+                it = postponed.erase(it);
+                // skip reevaluation of a table_state that became invalid post its removal
+                if (!_compaction_state.contains(t)) {
+                    continue;
+                }
+                auto s = t->schema();
+                cmlog.debug("resubmitting postponed compaction for table {}.{} [{}]", s->ks_name(), s->cf_name(), fmt::ptr(t));
+                submit(*t);
+                co_await coroutine::maybe_yield();
+            }
+        } catch (...) {
+            _postponed.insert(postponed.begin(), postponed.end());
+        }
+    }
+}
void compaction_manager::reevaluate_postponed_compactions() noexcept {
@@ -842,6 +853,20 @@ future<> compaction_manager::really_do_stop() {
cmlog.info("Stopped");
}
template <typename Ex>
requires std::is_base_of_v<std::exception, Ex> &&
requires (const Ex& ex) {
{ ex.code() } noexcept -> std::same_as<const std::error_code&>;
}
auto swallow_enospc(const Ex& ex) noexcept {
if (ex.code().value() != ENOSPC) {
return make_exception_future<>(std::make_exception_ptr(ex));
}
cmlog.warn("Got ENOSPC on stop, ignoring...");
return make_ready_future<>();
}
void compaction_manager::do_stop() noexcept {
if (_state == state::none || _state == state::stopped) {
return;
@@ -849,7 +874,10 @@ void compaction_manager::do_stop() noexcept {
try {
_state = state::stopped;
-        _stop_future = really_do_stop();
+        _stop_future = really_do_stop()
+            .handle_exception_type([] (const std::system_error& ex) { return swallow_enospc(ex); })
+            .handle_exception_type([] (const storage_io_error& ex) { return swallow_enospc(ex); })
+            ;
} catch (...) {
cmlog.error("Failed to stop the manager: {}", std::current_exception());
}
@@ -1050,7 +1078,7 @@ public:
bool performed() const noexcept {
return _performed;
}
private:
future<> run_offstrategy_compaction(sstables::compaction_data& cdata) {
// This procedure will reshape sstables in maintenance set until it's ready for
// integration into main set.
@@ -1083,6 +1111,7 @@ public:
return desc.sstables.size() ? std::make_optional(std::move(desc)) : std::nullopt;
};
std::exception_ptr err;
while (auto desc = get_next_job()) {
desc->creator = [this, &new_unused_sstables, &t] (shard_id dummy) {
auto sst = t.make_sstable();
@@ -1091,7 +1120,16 @@ public:
};
auto input = boost::copy_range<std::unordered_set<sstables::shared_sstable>>(desc->sstables);
-        auto ret = co_await sstables::compact_sstables(std::move(*desc), cdata, t);
+        sstables::compaction_result ret;
+        try {
+            ret = co_await sstables::compact_sstables(std::move(*desc), cdata, t);
+        } catch (sstables::compaction_stopped_exception&) {
+            // If off-strategy compaction stopped on user request, let's not discard the partial work.
+            // Therefore, both un-reshaped and reshaped data will be integrated into main set, allowing
+            // regular compaction to continue from where off-strategy left off.
+            err = std::current_exception();
+            break;
+        }
_performed = true;
// update list of reshape candidates without input but with output added to it
@@ -1128,6 +1166,9 @@ public:
for (auto& sst : sstables_to_remove) {
sst->mark_for_deletion();
}
if (err) {
co_await coroutine::return_exception_ptr(std::move(err));
}
}
protected:
virtual future<compaction_stats_opt> do_run() override {
@@ -1390,14 +1431,18 @@ protected:
co_return std::nullopt;
}
private:
-    // Releases reference to cleaned files such that respective used disk space can be freed.
-    void release_exhausted(std::vector<sstables::shared_sstable> exhausted_sstables) {
-        _compacting.release_compacting(exhausted_sstables);
-    }
     future<> run_cleanup_job(sstables::compaction_descriptor descriptor) {
         co_await coroutine::switch_to(_cm.compaction_sg().cpu);
+        // Releases reference to cleaned files such that respective used disk space can be freed.
+        auto release_exhausted = [this, &descriptor] (std::vector<sstables::shared_sstable> exhausted_sstables) mutable {
+            auto exhausted = boost::copy_range<std::unordered_set<sstables::shared_sstable>>(exhausted_sstables);
+            std::erase_if(descriptor.sstables, [&] (const sstables::shared_sstable& sst) {
+                return exhausted.contains(sst);
+            });
+            _compacting.release_compacting(exhausted_sstables);
+        };
for (;;) {
compaction_backlog_tracker user_initiated(std::make_unique<user_initiated_backlog_tracker>(_cm._compaction_controller.backlog_of_shares(200), _cm.available_memory()));
_cm.register_backlog_tracker(user_initiated);
@@ -1405,8 +1450,7 @@ private:
std::exception_ptr ex;
try {
setup_new_compaction(descriptor.run_identifier);
-            co_await compact_sstables_and_update_history(descriptor, _compaction_data,
-                std::bind(&cleanup_sstables_compaction_task::release_exhausted, this, std::placeholders::_1));
+            co_await compact_sstables_and_update_history(descriptor, _compaction_data, release_exhausted);
finish_compaction();
_cm.reevaluate_postponed_compactions();
co_return; // done with current job


@@ -279,10 +279,10 @@ private:
std::function<void()> compaction_submission_callback();
// all registered tables are reevaluated at a constant interval.
// Submission is a NO-OP when there's nothing to do, so it's fine to call it regularly.
-    timer<lowres_clock> _compaction_submission_timer = timer<lowres_clock>(compaction_submission_callback());
     static constexpr std::chrono::seconds periodic_compaction_submission_interval() { return std::chrono::seconds(3600); }
     config _cfg;
+    timer<lowres_clock> _compaction_submission_timer;
compaction_controller _compaction_controller;
compaction_backlog_manager _backlog_manager;
optimized_optional<abort_source::subscription> _early_abort_subscription;
@@ -330,7 +330,7 @@ private:
// table still exists and compaction is not disabled for the table.
inline bool can_proceed(compaction::table_state* t) const;
-    void postponed_compactions_reevaluation();
+    future<> postponed_compactions_reevaluation();
void reevaluate_postponed_compactions() noexcept;
// Postpone compaction for a table that couldn't be executed due to ongoing
// similar-sized compaction.


@@ -409,7 +409,9 @@ public:
l0_old_ssts.push_back(std::move(sst));
}
}
-        _l0_scts.replace_sstables(std::move(l0_old_ssts), std::move(l0_new_ssts));
+        if (l0_old_ssts.size() || l0_new_ssts.size()) {
+            _l0_scts.replace_sstables(std::move(l0_old_ssts), std::move(l0_new_ssts));
+        }
}
};


@@ -200,10 +200,8 @@ leveled_compaction_strategy::get_reshaping_job(std::vector<shared_sstable> input
auto [disjoint, overlapping_sstables] = is_disjoint(level_info[level], tolerance(level));
if (!disjoint) {
-        auto ideal_level = ideal_level_for_input(input, max_sstable_size_in_bytes);
-        leveled_manifest::logger.warn("Turns out that level {} is not disjoint, found {} overlapping SSTables, so compacting everything on behalf of {}.{}", level, overlapping_sstables, schema->ks_name(), schema->cf_name());
-        // Unfortunately no good limit to limit input size to max_sstables for LCS major
-        compaction_descriptor desc(std::move(input), iop, ideal_level, max_sstable_size_in_bytes);
+        leveled_manifest::logger.warn("Turns out that level {} is not disjoint, found {} overlapping SSTables, so the level will be entirely compacted on behalf of {}.{}", level, overlapping_sstables, schema->ks_name(), schema->cf_name());
+        compaction_descriptor desc(std::move(level_info[level]), iop, level, max_sstable_size_in_bytes);
desc.options = compaction_type_options::make_reshape();
return desc;
}


@@ -10,88 +10,6 @@
#pragma once
#include "dht/i_partitioner.hh"
#include <optional>
#include <variant>
// Wraps ring_position_view so it is compatible with old-style C++: default
// constructor, stateless comparators, yada yada.
class compatible_ring_position_view {
const ::schema* _schema = nullptr;
// Optional to supply a default constructor, no more.
std::optional<dht::ring_position_view> _rpv;
public:
constexpr compatible_ring_position_view() = default;
compatible_ring_position_view(const schema& s, dht::ring_position_view rpv)
: _schema(&s), _rpv(rpv) {
}
const dht::ring_position_view& position() const {
return *_rpv;
}
const ::schema& schema() const {
return *_schema;
}
friend std::strong_ordering tri_compare(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
return dht::ring_position_tri_compare(*x._schema, *x._rpv, *y._rpv);
}
friend bool operator<(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
return tri_compare(x, y) < 0;
}
friend bool operator<=(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
return tri_compare(x, y) <= 0;
}
friend bool operator>(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
return tri_compare(x, y) > 0;
}
friend bool operator>=(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
return tri_compare(x, y) >= 0;
}
friend bool operator==(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
return tri_compare(x, y) == 0;
}
friend bool operator!=(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
return tri_compare(x, y) != 0;
}
};
// Wraps ring_position so it is compatible with old-style C++: default
// constructor, stateless comparators, yada yada.
class compatible_ring_position {
schema_ptr _schema;
// Optional to supply a default constructor, no more.
std::optional<dht::ring_position> _rp;
public:
constexpr compatible_ring_position() = default;
compatible_ring_position(schema_ptr s, dht::ring_position rp)
: _schema(std::move(s)), _rp(std::move(rp)) {
}
dht::ring_position_view position() const {
return *_rp;
}
const ::schema& schema() const {
return *_schema;
}
friend std::strong_ordering tri_compare(const compatible_ring_position& x, const compatible_ring_position& y) {
return dht::ring_position_tri_compare(*x._schema, *x._rp, *y._rp);
}
friend bool operator<(const compatible_ring_position& x, const compatible_ring_position& y) {
return tri_compare(x, y) < 0;
}
friend bool operator<=(const compatible_ring_position& x, const compatible_ring_position& y) {
return tri_compare(x, y) <= 0;
}
friend bool operator>(const compatible_ring_position& x, const compatible_ring_position& y) {
return tri_compare(x, y) > 0;
}
friend bool operator>=(const compatible_ring_position& x, const compatible_ring_position& y) {
return tri_compare(x, y) >= 0;
}
friend bool operator==(const compatible_ring_position& x, const compatible_ring_position& y) {
return tri_compare(x, y) == 0;
}
friend bool operator!=(const compatible_ring_position& x, const compatible_ring_position& y) {
return tri_compare(x, y) != 0;
}
};
// Wraps ring_position or ring_position_view so either is compatible with old-style C++: default
// constructor, stateless comparators, yada yada.
@@ -99,37 +17,22 @@ public:
// on callers to keep ring position alive, allow lookup on containers that don't support different
// key types, and also avoiding unnecessary copies.
class compatible_ring_position_or_view {
// Optional to supply a default constructor, no more.
std::optional<std::variant<compatible_ring_position, compatible_ring_position_view>> _crp_or_view;
schema_ptr _schema;
lw_shared_ptr<dht::ring_position> _rp;
dht::ring_position_view_opt _rpv; // optional only for default ctor, nothing more
public:
constexpr compatible_ring_position_or_view() = default;
compatible_ring_position_or_view() = default;
explicit compatible_ring_position_or_view(schema_ptr s, dht::ring_position rp)
: _crp_or_view(compatible_ring_position(std::move(s), std::move(rp))) {
: _schema(std::move(s)), _rp(make_lw_shared<dht::ring_position>(std::move(rp))), _rpv(dht::ring_position_view(*_rp)) {
}
explicit compatible_ring_position_or_view(const schema& s, dht::ring_position_view rpv)
: _crp_or_view(compatible_ring_position_view(s, rpv)) {
: _schema(s.shared_from_this()), _rpv(rpv) {
}
dht::ring_position_view position() const {
struct rpv_accessor {
dht::ring_position_view operator()(const compatible_ring_position& crp) {
return crp.position();
}
dht::ring_position_view operator()(const compatible_ring_position_view& crpv) {
return crpv.position();
}
};
return std::visit(rpv_accessor{}, *_crp_or_view);
const dht::ring_position_view& position() const {
return *_rpv;
}
friend std::strong_ordering tri_compare(const compatible_ring_position_or_view& x, const compatible_ring_position_or_view& y) {
struct schema_accessor {
const ::schema& operator()(const compatible_ring_position& crp) {
return crp.schema();
}
const ::schema& operator()(const compatible_ring_position_view& crpv) {
return crpv.schema();
}
};
return dht::ring_position_tri_compare(std::visit(schema_accessor{}, *x._crp_or_view), x.position(), y.position());
return dht::ring_position_tri_compare(*x._schema, x.position(), y.position());
}
friend bool operator<(const compatible_ring_position_or_view& x, const compatible_ring_position_or_view& y) {
return tri_compare(x, y) < 0;
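The hunk above replaces a `std::variant` of the two wrappers, which needed a `std::visit` accessor struct per operation, with a design that always stores the shared owner plus a cheap view. A simplified sketch of the before/after shapes, using illustrative `int`-based stand-ins rather than the real schema and ring-position types:

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <variant>

// Old shape: a variant forces a visitor for every accessor.
struct old_style {
    std::variant<int, const int*> _v;
    int value() const {
        return std::visit([](auto&& x) -> int {
            if constexpr (std::is_pointer_v<std::decay_t<decltype(x)>>) {
                return *x;  // viewing case
            } else {
                return x;   // owning case
            }
        }, _v);
    }
};

// New shape: always keep a view; optionally also keep ownership.
struct new_style {
    std::shared_ptr<int> _owned;     // set only in the owning case
    const int* _view = nullptr;      // always valid once constructed
    static new_style owning(int v) {
        new_style s;
        s._owned = std::make_shared<int>(v);
        s._view = s._owned.get();
        return s;
    }
    static new_style viewing(const int& r) {
        new_style s;
        s._view = &r;
        return s;
    }
    int value() const { return *_view; }
};
```

Accessors on the new shape are a plain dereference, with no visitor boilerplate and no branch per call site.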


@@ -630,6 +630,8 @@ arg_parser.add_argument('--static-yaml-cpp', dest='staticyamlcpp', action='store
help='Link libyaml-cpp statically')
arg_parser.add_argument('--tests-debuginfo', action='store', dest='tests_debuginfo', type=int, default=0,
help='Enable(1)/disable(0) compiler debug information generation for tests')
arg_parser.add_argument('--perf-tests-debuginfo', action='store', dest='perf_tests_debuginfo', type=int, default=0,
help='Enable(1)/disable(0) compiler debug information generation for perf tests')
arg_parser.add_argument('--python', action='store', dest='python', default='python3',
help='Python3 path')
arg_parser.add_argument('--split-dwarf', dest='split_dwarf', action='store_true', default=False,
@@ -1423,6 +1425,7 @@ linker_flags = linker_flags(compiler=args.cxx)
dbgflag = '-g -gz' if args.debuginfo else ''
tests_link_rule = 'link' if args.tests_debuginfo else 'link_stripped'
perf_tests_link_rule = 'link' if args.perf_tests_debuginfo else 'link_stripped'
# Strip if debuginfo is disabled, otherwise we end up with partial
# debug info from the libraries we static link with
@@ -1954,7 +1957,8 @@ with open(buildfile, 'w') as f:
# So we strip the tests by default; The user can very
# quickly re-link the test unstripped by adding a "_g"
# to the test name, e.g., "ninja build/release/testname_g"
f.write('build $builddir/{}/{}: {}.{} {} | {} {}\n'.format(mode, binary, tests_link_rule, mode, str.join(' ', objs), seastar_dep, seastar_testing_dep))
link_rule = perf_tests_link_rule if binary.startswith('test/perf/') else tests_link_rule
f.write('build $builddir/{}/{}: {}.{} {} | {} {}\n'.format(mode, binary, link_rule, mode, str.join(' ', objs), seastar_dep, seastar_testing_dep))
f.write(' libs = {}\n'.format(local_libs))
f.write('build $builddir/{}/{}_g: {}.{} {} | {} {}\n'.format(mode, binary, regular_link_rule, mode, str.join(' ', objs), seastar_dep, seastar_testing_dep))
f.write(' libs = {}\n'.format(local_libs))
@@ -2070,7 +2074,8 @@ with open(buildfile, 'w') as f:
f.write('build {}: cxx.{} {} || {}\n'.format(obj, mode, cc, ' '.join(serializers)))
if cc.endswith('Parser.cpp'):
# Unoptimized parsers end up using huge amounts of stack space and overflowing their stack
flags = '-O1'
flags = '-O1' if modes[mode]['optimization-level'] in ['0', 'g', 's'] else ''
if has_sanitize_address_use_after_scope:
flags += ' -fno-sanitize-address-use-after-scope'
f.write(' obj_cxxflags = %s\n' % flags)


@@ -1396,7 +1396,7 @@ serviceLevelOrRoleName returns [sstring name]
std::transform($name.begin(), $name.end(), $name.begin(), ::tolower); }
| t=STRING_LITERAL { $name = sstring($t.text); }
| t=QUOTED_NAME { $name = sstring($t.text); }
| k=unreserved_keyword { $name = sstring($t.text);
| k=unreserved_keyword { $name = k;
std::transform($name.begin(), $name.end(), $name.begin(), ::tolower);}
| QMARK {add_recognition_error("Bind variables cannot be used for service levels or role names");}
;


@@ -10,6 +10,7 @@
#include "cql3/attributes.hh"
#include "cql3/column_identifier.hh"
#include <optional>
namespace cql3 {
@@ -56,16 +57,16 @@ int64_t attributes::get_timestamp(int64_t now, const query_options& options) {
}
}
int32_t attributes::get_time_to_live(const query_options& options) {
std::optional<int32_t> attributes::get_time_to_live(const query_options& options) {
if (!_time_to_live.has_value())
return 0;
return std::nullopt;
cql3::raw_value tval = expr::evaluate(*_time_to_live, options);
if (tval.is_null()) {
throw exceptions::invalid_request_exception("Invalid null value of TTL");
}
if (tval.is_unset_value()) {
return 0;
return std::nullopt;
}
int32_t ttl;
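The signature change above replaces the sentinel value 0 ("no TTL") with an empty `std::optional`, so callers can tell "no TTL given on the statement" apart from an explicit value and apply the table default themselves. A sketch of the caller-side pattern, with `effective_ttl` as an illustrative helper (not ScyllaDB code):

```cpp
#include <cassert>
#include <optional>

// An empty optional means the statement supplied no TTL, so the table's
// default applies; any engaged value, including 0, is used as-is.
int effective_ttl(std::optional<int> statement_ttl, int table_default) {
    return statement_ttl.value_or(table_default);
}
```

With the old sentinel encoding, an explicit `TTL 0` and "no TTL" were indistinguishable; the optional makes that distinction explicit in the type.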


@@ -42,7 +42,7 @@ public:
int64_t get_timestamp(int64_t now, const query_options& options);
int32_t get_time_to_live(const query_options& options);
std::optional<int32_t> get_time_to_live(const query_options& options);
db::timeout_clock::duration get_timeout(const query_options& options) const;


@@ -1457,7 +1457,7 @@ expression search_and_replace(const expression& e,
};
},
[&] (const binary_operator& oper) -> expression {
return binary_operator(recurse(oper.lhs), oper.op, recurse(oper.rhs));
return binary_operator(recurse(oper.lhs), oper.op, recurse(oper.rhs), oper.order);
},
[&] (const column_mutation_attribute& cma) -> expression {
return column_mutation_attribute{cma.kind, recurse(cma.column)};


@@ -22,6 +22,7 @@
#include "db/config.hh"
#include "data_dictionary/data_dictionary.hh"
#include "hashers.hh"
#include "utils/error_injection.hh"
namespace cql3 {
@@ -599,6 +600,14 @@ query_processor::get_statement(const sstring_view& query, const service::client_
std::unique_ptr<raw::parsed_statement>
query_processor::parse_statement(const sstring_view& query) {
try {
{
const char* error_injection_key = "query_processor-parse_statement-test_failure";
utils::get_local_injector().inject(error_injection_key, [&]() {
if (query.find(error_injection_key) != sstring_view::npos) {
throw std::runtime_error(error_injection_key);
}
});
}
auto statement = util::do_with_parser(query, std::mem_fn(&cql3_parser::CqlParser::query));
if (!statement) {
throw exceptions::syntax_exception("Parsing failed");


@@ -81,7 +81,7 @@ public:
virtual sstring assignment_testable_source_context() const override {
auto&& name = _type->field_name(_field);
auto sname = sstring(reinterpret_cast<const char*>(name.begin(), name.size()));
auto sname = std::string_view(reinterpret_cast<const char*>(name.data()), name.size());
return format("{}.{}", _selected, sname);
}


@@ -435,7 +435,7 @@ bool result_set_builder::restrictions_filter::do_filter(const selection& selecti
clustering_key_prefix ckey = clustering_key_prefix::from_exploded(clustering_key);
// FIXME: push to upper layer so it happens once per row
auto static_and_regular_columns = expr::get_non_pk_values(selection, static_row, row);
return expr::is_satisfied_by(
bool multi_col_clustering_satisfied = expr::is_satisfied_by(
clustering_columns_restrictions,
expr::evaluation_inputs{
.partition_key = &partition_key,
@@ -444,6 +444,9 @@ bool result_set_builder::restrictions_filter::do_filter(const selection& selecti
.selection = &selection,
.options = &_options,
});
if (!multi_col_clustering_satisfied) {
return false;
}
}
auto static_row_iterator = static_row.iterator();


@@ -261,6 +261,10 @@ future<shared_ptr<cql_transport::messages::result_message>> batch_statement::do_
if (options.getSerialConsistency() == null)
throw new InvalidRequestException("Invalid empty serial consistency level");
#endif
for (size_t i = 0; i < _statements.size(); ++i) {
_statements[i].statement->validate_primary_key_restrictions(options.for_statement(i));
}
if (_has_conditions) {
++_stats.cas_batches;
_stats.statements_in_cas_batches += _statements.size();


@@ -119,6 +119,9 @@ std::optional<mutation> cas_request::apply(foreign_ptr<lw_shared_ptr<query::resu
const update_parameters::prefetch_data::row* cas_request::find_old_row(const cas_row_update& op) const {
static const clustering_key empty_ckey = clustering_key::make_empty();
if (_key.empty()) {
throw exceptions::invalid_request_exception("partition key ranges empty - probably caused by an unset value");
}
const partition_key& pkey = _key.front().start()->value().key().value();
// We must ignore statement clustering column restriction when
// choosing a row to check the conditions. If there is no
@@ -130,6 +133,9 @@ const update_parameters::prefetch_data::row* cas_request::find_old_row(const cas
// CREATE TABLE t(p int, c int, s int static, v int, PRIMARY KEY(p, c));
// INSERT INTO t(p, s) VALUES(1, 1);
// UPDATE t SET v=1 WHERE p=1 AND c=1 IF s=1;
if (op.ranges.empty()) {
throw exceptions::invalid_request_exception("clustering key ranges empty - probably caused by an unset value");
}
const clustering_key& ckey = op.ranges.front().start() ? op.ranges.front().start()->value() : empty_ckey;
auto row = _rows.find_row(pkey, ckey);
if (row == nullptr && !ckey.is_empty() &&


@@ -20,6 +20,7 @@
#include "tombstone_gc.hh"
#include "db/per_partition_rate_limit_extension.hh"
#include "db/per_partition_rate_limit_options.hh"
#include "utils/bloom_calculations.hh"
#include <boost/algorithm/string/predicate.hpp>
@@ -152,6 +153,16 @@ void cf_prop_defs::validate(const data_dictionary::database db, sstring ks_name,
throw exceptions::configuration_exception(KW_MAX_INDEX_INTERVAL + " must be greater than " + KW_MIN_INDEX_INTERVAL);
}
if (get_simple(KW_BF_FP_CHANCE)) {
double bloom_filter_fp_chance = get_double(KW_BF_FP_CHANCE, 0/*not used*/);
double min_bloom_filter_fp_chance = utils::bloom_calculations::min_supported_bloom_filter_fp_chance();
if (bloom_filter_fp_chance <= min_bloom_filter_fp_chance || bloom_filter_fp_chance > 1.0) {
throw exceptions::configuration_exception(format(
"{} must be larger than {} and less than or equal to 1.0 (got {})",
KW_BF_FP_CHANCE, min_bloom_filter_fp_chance, bloom_filter_fp_chance));
}
}
speculative_retry::from_sstring(get_string(KW_SPECULATIVE_RETRY, speculative_retry(speculative_retry::type::NONE, 0).to_sstring()));
}
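The validation added above requires `bloom_filter_fp_chance` to lie in `(min_supported, 1.0]`. A minimal sketch of that range check; `validate_bf_fp_chance` is an illustrative helper, and the minimum passed in the usage below is an assumed placeholder, not the value `utils::bloom_calculations` actually computes:

```cpp
#include <stdexcept>

// Reject a false-positive chance at or below the supported minimum,
// or above 1.0 (a probability), mirroring the check in this hunk.
void validate_bf_fp_chance(double chance, double min_supported) {
    if (chance <= min_supported || chance > 1.0) {
        throw std::invalid_argument("bloom_filter_fp_chance out of range");
    }
}
```

Note the lower bound is strict: a chance exactly equal to the minimum is rejected, matching `bloom_filter_fp_chance <= min_bloom_filter_fp_chance` in the diff.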


@@ -15,6 +15,7 @@
#include "cql3/util.hh"
#include "validation.hh"
#include "db/consistency_level_validations.hh"
#include <optional>
#include <seastar/core/shared_ptr.hh>
#include <boost/range/adaptor/transformed.hpp>
#include <boost/range/adaptor/map.hpp>
@@ -92,8 +93,9 @@ bool modification_statement::is_timestamp_set() const {
return attrs->is_timestamp_set();
}
gc_clock::duration modification_statement::get_time_to_live(const query_options& options) const {
return gc_clock::duration(attrs->get_time_to_live(options));
std::optional<gc_clock::duration> modification_statement::get_time_to_live(const query_options& options) const {
std::optional<int32_t> ttl = attrs->get_time_to_live(options);
return ttl ? std::make_optional<gc_clock::duration>(*ttl) : std::nullopt;
}
future<> modification_statement::check_access(query_processor& qp, const service::client_state& state) const {
@@ -109,9 +111,6 @@ future<> modification_statement::check_access(query_processor& qp, const service
future<std::vector<mutation>>
modification_statement::get_mutations(query_processor& qp, const query_options& options, db::timeout_clock::time_point timeout, bool local, int64_t now, service::query_state& qs) const {
if (_restrictions->range_or_slice_eq_null(options)) { // See #7852 and #9290.
throw exceptions::invalid_request_exception("Invalid null value in condition for a key column");
}
auto cl = options.get_consistency();
auto json_cache = maybe_prepare_json_cache(options);
auto keys = build_partition_keys(options, json_cache);
@@ -250,6 +249,12 @@ modification_statement::execute_without_checking_exception_message(query_process
return modify_stage(this, seastar::ref(qp), seastar::ref(qs), seastar::cref(options));
}
void modification_statement::validate_primary_key_restrictions(const query_options& options) const {
if (_restrictions->range_or_slice_eq_null(options)) { // See #7852 and #9290.
throw exceptions::invalid_request_exception("Invalid null value in condition for a key column");
}
}
future<::shared_ptr<cql_transport::messages::result_message>>
modification_statement::do_execute(query_processor& qp, service::query_state& qs, const query_options& options) const {
if (has_conditions() && options.get_protocol_version() == 1) {
@@ -260,6 +265,8 @@ modification_statement::do_execute(query_processor& qp, service::query_state& qs
inc_cql_stats(qs.get_client_state().is_internal());
validate_primary_key_restrictions(options);
if (has_conditions()) {
return execute_with_condition(qp, qs, options);
}


@@ -128,7 +128,7 @@ public:
bool is_timestamp_set() const;
gc_clock::duration get_time_to_live(const query_options& options) const;
std::optional<gc_clock::duration> get_time_to_live(const query_options& options) const;
virtual future<> check_access(query_processor& qp, const service::client_state& state) const override;
@@ -229,6 +229,8 @@ public:
// True if this statement needs to read only static column values to check if it can be applied.
bool has_only_static_column_conditions() const { return !_has_regular_column_conditions && _has_static_column_conditions; }
void validate_primary_key_restrictions(const query_options& options) const;
virtual future<::shared_ptr<cql_transport::messages::result_message>>
execute(query_processor& qp, service::query_state& qs, const query_options& options) const override;


@@ -1499,7 +1499,7 @@ parallelized_select_statement::do_execute(
command->slice.options.set<query::partition_slice::option::allow_short_read>();
auto timeout_duration = get_timeout(state.get_client_state(), options);
auto timeout = db::timeout_clock::now() + timeout_duration;
auto timeout = lowres_system_clock::now() + timeout_duration;
auto reductions = _selection->get_reductions();
query::forward_request req = {
@@ -1571,11 +1571,16 @@ void select_statement::maybe_jsonize_select_clause(data_dictionary::database db,
std::vector<data_type> selector_types;
std::vector<const column_definition*> defs;
selector_names.reserve(_select_clause.size());
selector_types.reserve(_select_clause.size());
auto selectables = selection::raw_selector::to_selectables(_select_clause, *schema);
selection::selector_factories factories(selection::raw_selector::to_selectables(_select_clause, *schema), db, schema, defs);
auto selectors = factories.new_instances();
for (size_t i = 0; i < selectors.size(); ++i) {
selector_names.push_back(selectables[i]->to_string());
if (_select_clause[i]->alias) {
selector_names.push_back(_select_clause[i]->alias->to_string());
} else {
selector_names.push_back(selectables[i]->to_string());
}
selector_types.push_back(selectors[i]->get_type());
}


@@ -93,7 +93,7 @@ public:
};
// Note: value (mutation) only required to contain the rows we are interested in
private:
const gc_clock::duration _ttl;
const std::optional<gc_clock::duration> _ttl;
// For operations that require a read-before-write, stores prefetched cell values.
// For CAS statements, stores values of conditioned columns.
// Is a reference to an outside prefetch_data container since a CAS BATCH statement
@@ -106,7 +106,7 @@ public:
const query_options& _options;
update_parameters(const schema_ptr schema_, const query_options& options,
api::timestamp_type timestamp, gc_clock::duration ttl, const prefetch_data& prefetched)
api::timestamp_type timestamp, std::optional<gc_clock::duration> ttl, const prefetch_data& prefetched)
: _ttl(ttl)
, _prefetched(prefetched)
, _timestamp(timestamp)
@@ -127,11 +127,7 @@ public:
}
atomic_cell make_cell(const abstract_type& type, const raw_value_view& value, atomic_cell::collection_member cm = atomic_cell::collection_member::no) const {
auto ttl = _ttl;
if (ttl.count() <= 0) {
ttl = _schema->default_time_to_live();
}
auto ttl = this->ttl();
return value.with_value([&] (const FragmentedView auto& v) {
if (ttl.count() > 0) {
@@ -143,11 +139,7 @@ public:
};
atomic_cell make_cell(const abstract_type& type, const managed_bytes_view& value, atomic_cell::collection_member cm = atomic_cell::collection_member::no) const {
auto ttl = _ttl;
if (ttl.count() <= 0) {
ttl = _schema->default_time_to_live();
}
auto ttl = this->ttl();
if (ttl.count() > 0) {
return atomic_cell::make_live(type, _timestamp, value, _local_deletion_time + ttl, ttl, cm);
@@ -169,7 +161,7 @@ public:
}
gc_clock::duration ttl() const {
return _ttl.count() > 0 ? _ttl : _schema->default_time_to_live();
return _ttl.value_or(_schema->default_time_to_live());
}
gc_clock::time_point expiry() const {


@@ -204,7 +204,7 @@ keyspace_metadata::keyspace_metadata(std::string_view name,
std::move(strategy_options),
durable_writes,
std::move(cf_defs),
user_types_metadata{},
std::move(user_types),
storage_options{}) { }
keyspace_metadata::keyspace_metadata(std::string_view name,


@@ -2031,7 +2031,7 @@ future<> db::commitlog::segment_manager::shutdown() {
}
}
co_await _shutdown_promise->get_shared_future();
clogger.info("Commitlog shutdown complete");
clogger.debug("Commitlog shutdown complete");
}
void db::commitlog::segment_manager::add_file_to_dispose(named_file f, dispose_mode mode) {
@@ -2094,6 +2094,9 @@ future<> db::commitlog::segment_manager::do_pending_deletes() {
clogger.debug("Discarding segments {}", ftd);
for (auto& [f, mode] : ftd) {
// `f.remove_file()` resets known_size to 0, so remember the size here,
// in order to subtract it from total_size_on_disk accurately.
size_t size = f.known_size();
try {
if (f) {
co_await f.close();
@@ -2110,7 +2113,6 @@ future<> db::commitlog::segment_manager::do_pending_deletes() {
}
}
auto size = f.known_size();
auto usage = totals.total_size_on_disk;
auto next_usage = usage - size;
@@ -2144,7 +2146,7 @@ future<> db::commitlog::segment_manager::do_pending_deletes() {
// or had such an exception that we consider the file dead
// anyway. In either case we _remove_ the file size from
// footprint, because it is no longer our problem.
totals.total_size_on_disk -= f.known_size();
totals.total_size_on_disk -= size;
}
// #8376 - if we had an error in recycling (disk rename?), and no elements


@@ -899,6 +899,8 @@ db::config::config(std::shared_ptr<db::extensions> exts)
"Ignore truncation record stored in system tables as if tables were never truncated.")
, force_schema_commit_log(this, "force_schema_commit_log", value_status::Used, false,
"Use separate schema commit log unconditionally rather than after restart following discovery of cluster-wide support for it.")
, cache_index_pages(this, "cache_index_pages", liveness::LiveUpdate, value_status::Used, true,
"Keep SSTable index pages in the global cache after a SSTable read. Expected to improve performance for workloads with big partitions, but may degrade performance for workloads with small partitions.")
, default_log_level(this, "default_log_level", value_status::Used)
, logger_log_level(this, "logger_log_level", value_status::Used)
, log_to_stdout(this, "log_to_stdout", value_status::Used)


@@ -379,6 +379,8 @@ public:
named_value<bool> ignore_truncation_record;
named_value<bool> force_schema_commit_log;
named_value<bool> cache_index_pages;
seastar::logging_settings logging_settings(const log_cli::options&) const;
const db::extensions& extensions() const;


@@ -2020,6 +2020,33 @@ std::vector<shared_ptr<cql3::functions::user_function>> create_functions_from_sc
return ret;
}
std::vector<shared_ptr<cql3::functions::user_aggregate>> create_aggregates_from_schema_partition(
replica::database& db, lw_shared_ptr<query::result_set> result, lw_shared_ptr<query::result_set> scylla_result) {
std::unordered_multimap<sstring, const query::result_set_row*> scylla_aggs;
if (scylla_result) {
for (const auto& scylla_row : scylla_result->rows()) {
auto scylla_agg_name = scylla_row.get_nonnull<sstring>("aggregate_name");
scylla_aggs.emplace(scylla_agg_name, &scylla_row);
}
}
std::vector<shared_ptr<cql3::functions::user_aggregate>> ret;
for (const auto& row : result->rows()) {
auto agg_name = row.get_nonnull<sstring>("aggregate_name");
auto agg_args = read_arg_types(db, row, row.get_nonnull<sstring>("keyspace_name"));
const query::result_set_row *scylla_row_ptr = nullptr;
for (auto [it, end] = scylla_aggs.equal_range(agg_name); it != end; ++it) {
auto scylla_agg_args = read_arg_types(db, *it->second, it->second->get_nonnull<sstring>("keyspace_name"));
if (agg_args == scylla_agg_args) {
scylla_row_ptr = it->second;
break;
}
}
ret.emplace_back(create_aggregate(db, row, scylla_row_ptr));
}
return ret;
}
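The lookup above matches each aggregate row to its scylla-specific counterpart by name *and* argument types, since aggregates may be overloaded by name; `equal_range` on the multimap yields all candidates for a name, and the argument-type comparison picks the right overload. A simplified sketch with illustrative stand-in types for the result-set rows:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using arg_types = std::vector<std::string>;

// Each entry pairs an overload's argument types with its payload
// (an int here, a result_set_row pointer in the real code).
const int* find_matching(const std::unordered_multimap<std::string, std::pair<arg_types, int>>& m,
                         const std::string& name, const arg_types& args) {
    for (auto [it, end] = m.equal_range(name); it != end; ++it) {
        if (it->second.first == args) {
            return &it->second.second;  // first overload with matching signature
        }
    }
    return nullptr;  // no scylla-specific row for this overload
}
```

Returning a null pointer when no signature matches mirrors the diff, where `scylla_row_ptr` stays `nullptr` and `create_aggregate` handles the absence.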
/*
* User type metadata serialization/deserialization
*/


@@ -209,6 +209,8 @@ std::vector<user_type> create_types_from_schema_partition(keyspace_metadata& ks,
std::vector<shared_ptr<cql3::functions::user_function>> create_functions_from_schema_partition(replica::database& db, lw_shared_ptr<query::result_set> result);
std::vector<shared_ptr<cql3::functions::user_aggregate>> create_aggregates_from_schema_partition(replica::database& db, lw_shared_ptr<query::result_set> result, lw_shared_ptr<query::result_set> scylla_result);
std::vector<mutation> make_create_function_mutations(shared_ptr<cql3::functions::user_function> func, api::timestamp_type timestamp);
std::vector<mutation> make_drop_function_mutations(shared_ptr<cql3::functions::user_function> func, api::timestamp_type timestamp);


@@ -3234,11 +3234,11 @@ mutation system_keyspace::make_group0_history_state_id_mutation(
using namespace std::chrono;
assert(*gc_older_than >= gc_clock::duration{0});
auto ts_millis = duration_cast<milliseconds>(microseconds{ts});
auto gc_older_than_millis = duration_cast<milliseconds>(*gc_older_than);
assert(gc_older_than_millis < ts_millis);
auto ts_micros = microseconds{ts};
auto gc_older_than_micros = duration_cast<microseconds>(*gc_older_than);
assert(gc_older_than_micros < ts_micros);
auto tomb_upper_bound = utils::UUID_gen::min_time_UUID(ts_millis - gc_older_than_millis);
auto tomb_upper_bound = utils::UUID_gen::min_time_UUID(ts_micros - gc_older_than_micros);
// We want to delete all entries with IDs smaller than `tomb_upper_bound`
// but the deleted range is of the form (x, +inf) since the schema is reversed.
auto range = query::clustering_range::make_starting_with({
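The fix above keeps the tombstone-bound arithmetic in microseconds: casting the microsecond timestamp down to milliseconds first truncates up to 999 µs before the subtraction. A sketch of the precision difference, with illustrative helper names:

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

using namespace std::chrono;

// Fixed behaviour: subtract at full microsecond precision.
microseconds tomb_bound_us(int64_t ts_us, microseconds gc_older_than) {
    return microseconds{ts_us} - gc_older_than;
}

// Old behaviour: truncate both operands to milliseconds first,
// losing the sub-millisecond part of each.
milliseconds tomb_bound_ms(int64_t ts_us, microseconds gc_older_than) {
    return duration_cast<milliseconds>(microseconds{ts_us})
         - duration_cast<milliseconds>(gc_older_than);
}
```

For a timestamp of 1,000,500 µs and a GC bound of 400 µs, the microsecond version yields 1,000,100 µs, while the millisecond version yields 1,000,000 µs, so the resulting `min_time_UUID` bound drifts.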


@@ -197,7 +197,7 @@ public:
streamed_mutation::forwarding fwd,
mutation_reader::forwarding fwd_mr) {
return flat_mutation_reader_v2(std::make_unique<build_progress_reader>(
std::move(s),
s,
std::move(permit),
_db.find_column_family(s->ks_name(), system_keyspace::v3::SCYLLA_VIEWS_BUILDS_IN_PROGRESS),
range,


@@ -10,8 +10,6 @@
#include "log.hh"
#include "utils/latency.hh"
#include <seastar/core/when_all.hh>
static logging::logger mylog("row_locking");
row_locker::row_locker(schema_ptr s)
@@ -76,35 +74,32 @@ row_locker::lock_pk(const dht::decorated_key& pk, bool exclusive, db::timeout_cl
future<row_locker::lock_holder>
row_locker::lock_ck(const dht::decorated_key& pk, const clustering_key_prefix& cpk, bool exclusive, db::timeout_clock::time_point timeout, stats& stats) {
mylog.debug("taking shared lock on partition {}, and {} lock on row {} in it", pk, (exclusive ? "exclusive" : "shared"), cpk);
auto ck = cpk;
// Create a two-level lock entry for the partition if it doesn't exist already.
auto i = _two_level_locks.try_emplace(pk, this).first;
// The two-level lock entry we've just created is guaranteed to be kept alive as long as it's locked.
// Initiating read locking in the background below ensures that even if the two-level lock is currently
// write-locked, releasing the write-lock will synchronously engage any waiting
// locks and will keep the entry alive.
future<lock_type::holder> lock_partition = i->second._partition_lock.hold_read_lock(timeout);
auto j = i->second._row_locks.find(cpk);
if (j == i->second._row_locks.end()) {
// Not yet locked, need to create the lock. This makes a copy of cpk.
try {
j = i->second._row_locks.emplace(cpk, lock_type()).first;
} catch(...) {
// If this emplace() failed, e.g., out of memory, we fail. We
// could do nothing - the partition lock we already started
// taking will be unlocked automatically after being locked.
// But it's better form to wait for the work we started, and it
// will also allow us to remove the hash-table row we added.
return lock_partition.then([ex = std::current_exception()] (auto lock) {
// The lock is automatically released when "lock" goes out of scope.
// TODO: unlock (lock = {}) now, search for the partition in the
// hash table (we know it's still there, because we held the lock until
// now) and remove the unused lock from the hash table if still unused.
return make_exception_future<row_locker::lock_holder>(std::current_exception());
});
}
}
single_lock_stats &single_lock_stats = exclusive ? stats.exclusive_row : stats.shared_row;
single_lock_stats.operations_currently_waiting_for_lock++;
utils::latency_counter waiting_latency;
waiting_latency.start();
future<lock_type::holder> lock_row = exclusive ? j->second.hold_write_lock(timeout) : j->second.hold_read_lock(timeout);
return when_all_succeed(std::move(lock_partition), std::move(lock_row))
.then_unpack([this, pk = &i->first, cpk = &j->first, exclusive, &single_lock_stats, waiting_latency = std::move(waiting_latency)] (auto lock1, auto lock2) mutable {
return lock_partition.then([this, pk = &i->first, row_locks = &i->second._row_locks, ck = std::move(ck), exclusive, &single_lock_stats, waiting_latency = std::move(waiting_latency), timeout] (auto lock1) mutable {
auto j = row_locks->find(ck);
if (j == row_locks->end()) {
// Not yet locked, need to create the lock.
j = row_locks->emplace(std::move(ck), lock_type()).first;
}
auto* cpk = &j->first;
auto& row_lock = j->second;
// Similar to the two-level lock entry above, the row_lock entry we've just created
// is guaranteed to be kept alive as long as it's locked.
// Initiating read/write locking in the background below ensures that.
auto lock_row = exclusive ? row_lock.hold_write_lock(timeout) : row_lock.hold_read_lock(timeout);
return lock_row.then([this, pk, cpk, exclusive, &single_lock_stats, waiting_latency = std::move(waiting_latency), lock1 = std::move(lock1)] (auto lock2) mutable {
// FIXME: indentation
lock1.release();
lock2.release();
waiting_latency.stop();
@@ -112,6 +107,7 @@ row_locker::lock_ck(const dht::decorated_key& pk, const clustering_key_prefix& c
single_lock_stats.lock_acquisitions++;
single_lock_stats.operations_currently_waiting_for_lock--;
return lock_holder(this, pk, cpk, exclusive);
});
});
}
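The restructured `lock_ck()` above keeps the two-level discipline: take the partition lock in shared mode first, then create the per-row lock on demand and take it shared or exclusive. A simplified, synchronous sketch of that scheme using standard mutexes (the real code chains seastar futures and relies on the shard being single-threaded; the types here are illustrative):

```cpp
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>

struct two_level {
    std::shared_mutex partition_lock;
    std::map<std::string, std::shared_mutex> row_locks;

    // Level 1: shared partition lock. Level 2: per-row lock, created on
    // demand, shared or exclusive as requested. f runs under both.
    template <typename F>
    auto with_row_lock(const std::string& row, bool exclusive, F f) {
        std::shared_lock plock(partition_lock);
        auto& rlock = row_locks[row];  // default-constructed on first use
        if (exclusive) {
            std::unique_lock r(rlock);
            return f();
        }
        std::shared_lock r(rlock);
        return f();
    }
};
```

Note the map insertion under only the shared partition lock is safe here solely under the single-thread assumption stated above; the point of the sketch is the lock ordering, which matches the hunk: partition before row, never the reverse.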


@@ -123,6 +123,9 @@ const column_definition* view_info::view_column(const column_definition& base_de
void view_info::set_base_info(db::view::base_info_ptr base_info) {
_base_info = std::move(base_info);
// Forget the cached objects which may refer to the base schema.
_select_statement = nullptr;
_partition_slice = std::nullopt;
}
// A constructor for a base info that can facilitate reads and writes from the materialized view.
@@ -868,13 +871,18 @@ void view_updates::generate_update(
bool same_row = true;
for (auto col_id : col_ids) {
auto* after = update.cells().find_cell(col_id);
// Note: multi-cell columns can't be part of the primary key.
auto& cdef = _base->regular_column_at(col_id);
if (existing) {
auto* before = existing->cells().find_cell(col_id);
// Note that this cell is necessarily atomic, because col_ids are
// view key columns, and keys must be atomic.
if (before && before->as_atomic_cell(cdef).is_live()) {
if (after && after->as_atomic_cell(cdef).is_live()) {
auto cmp = compare_atomic_cell_for_merge(before->as_atomic_cell(cdef), after->as_atomic_cell(cdef));
// We need to compare just the values of the keys, not
// metadata like the timestamp. This is because below,
// if the old and new view row have the same key, we need
// to be sure to reach the update_entry() case.
auto cmp = compare_unsigned(before->as_atomic_cell(cdef).value(), after->as_atomic_cell(cdef).value());
if (cmp != 0) {
same_row = false;
}
@@ -894,7 +902,13 @@ void view_updates::generate_update(
if (same_row) {
update_entry(base_key, update, *existing, now);
} else {
replace_entry(base_key, update, *existing, now);
// This code doesn't work if the old and new view row have the
// same key, because if they do we get both data and tombstone
// for the same timestamp (now) and the tombstone wins. This
// is why we need the "same_row" case above - it's not just a
// performance optimization.
delete_old_entry(base_key, *existing, update, now);
create_entry(base_key, update, now);
}
} else {
delete_old_entry(base_key, *existing, update, now);
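The comment explains why delete-plus-create at the same timestamp cannot replace `update_entry()`: in a last-write-wins cell model, a tombstone beats data on a timestamp tie. A minimal sketch of that rule (not Scylla's actual merge code):

```python
def merge(a, b):
    """a, b are (timestamp, value) cells; value None is a tombstone.
    Higher timestamp wins; on a timestamp tie the tombstone wins."""
    if a[0] != b[0]:
        return a if a[0] > b[0] else b
    return a if a[1] is None else b

now = 42
data = (now, "new-view-row")
tomb = (now, None)
# delete_old_entry + create_entry both stamped with `now`:
assert merge(tomb, data) == tomb  # the row stays deleted either way
assert merge(data, tomb) == tomb
```

So when the old and new view rows share a key, issuing both a tombstone and data at timestamp `now` would delete the row; the `same_row` branch is a correctness requirement, not just an optimization.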
@@ -938,8 +952,12 @@ future<stop_iteration> view_update_builder::stop() const {
return make_ready_future<stop_iteration>(stop_iteration::yes);
}
future<utils::chunked_vector<frozen_mutation_and_schema>> view_update_builder::build_some() {
future<std::optional<utils::chunked_vector<frozen_mutation_and_schema>>> view_update_builder::build_some() {
(void)co_await advance_all();
if (!_update && !_existing) {
// Tell the caller there is no more data to build.
co_return std::nullopt;
}
bool do_advance_updates = false;
bool do_advance_existings = false;
if (_update && _update->is_partition_start()) {
@@ -1462,6 +1480,8 @@ future<> view_builder::start(service::migration_manager& mm) {
(void)_build_step.trigger();
return make_ready_future<>();
});
}).handle_exception_type([] (const seastar::sleep_aborted& e) {
vlogger.debug("start aborted: {}", e.what());
}).handle_exception([] (std::exception_ptr eptr) {
vlogger.error("start failed: {}", eptr);
return make_ready_future<>();
@@ -2056,15 +2076,20 @@ public:
// Called in the context of a seastar::thread.
void view_builder::execute(build_step& step, exponential_backoff_retry r) {
gc_clock::time_point now = gc_clock::now();
auto consumer = compact_for_query_v2<view_builder::consumer>(
auto compaction_state = make_lw_shared<compact_for_query_state_v2>(
*step.reader.schema(),
now,
step.pslice,
batch_size,
query::max_partitions,
view_builder::consumer{*this, step, now});
consumer.consume_new_partition(step.current_key); // Initialize the state in case we're resuming a partition
query::max_partitions);
auto consumer = compact_for_query_v2<view_builder::consumer>(compaction_state, view_builder::consumer{*this, step, now});
auto built = step.reader.consume_in_thread(std::move(consumer));
if (auto ds = std::move(*compaction_state).detach_state()) {
if (ds->current_tombstone) {
step.reader.unpop_mutation_fragment(mutation_fragment_v2(*step.reader.schema(), step.reader.permit(), std::move(*ds->current_tombstone)));
}
step.reader.unpop_mutation_fragment(mutation_fragment_v2(*step.reader.schema(), step.reader.permit(), std::move(ds->partition_start)));
}
_as.check();
@@ -2146,24 +2171,28 @@ update_backlog node_update_backlog::add_fetch(unsigned shard, update_backlog bac
return std::max(backlog, _max.load(std::memory_order_relaxed));
}
future<bool> check_view_build_ongoing(db::system_distributed_keyspace& sys_dist_ks, const sstring& ks_name, const sstring& cf_name) {
return sys_dist_ks.view_status(ks_name, cf_name).then([] (std::unordered_map<utils::UUID, sstring>&& view_statuses) {
return boost::algorithm::any_of(view_statuses | boost::adaptors::map_values, [] (const sstring& view_status) {
return view_status == "STARTED";
future<bool> check_view_build_ongoing(db::system_distributed_keyspace& sys_dist_ks, const locator::token_metadata& tm, const sstring& ks_name,
const sstring& cf_name) {
using view_statuses_type = std::unordered_map<utils::UUID, sstring>;
return sys_dist_ks.view_status(ks_name, cf_name).then([&tm] (view_statuses_type&& view_statuses) {
return boost::algorithm::any_of(view_statuses, [&tm] (const view_statuses_type::value_type& view_status) {
// Only consider status of known hosts.
return view_status.second == "STARTED" && tm.get_endpoint_for_host_id(view_status.first);
});
});
}
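The filtered any-of above can be sketched in Python (the host/status shapes here are assumptions for illustration):

```python
def view_build_ongoing(view_statuses, known_hosts):
    """A view build counts as ongoing only if a STARTED status comes
    from a host the token metadata still knows about."""
    return any(status == "STARTED" and host in known_hosts
               for host, status in view_statuses.items())

statuses = {"host-1": "STARTED", "host-2": "SUCCESS"}
assert view_build_ongoing(statuses, known_hosts={"host-1"})
# A STARTED status left behind by a removed node no longer blocks:
assert not view_build_ongoing(statuses, known_hosts={"host-2"})
```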
future<bool> check_needs_view_update_path(db::system_distributed_keyspace& sys_dist_ks, const replica::table& t, streaming::stream_reason reason) {
future<bool> check_needs_view_update_path(db::system_distributed_keyspace& sys_dist_ks, const locator::token_metadata& tm, const replica::table& t,
streaming::stream_reason reason) {
if (is_internal_keyspace(t.schema()->ks_name())) {
return make_ready_future<bool>(false);
}
if (reason == streaming::stream_reason::repair && !t.views().empty()) {
return make_ready_future<bool>(true);
}
return do_with(t.views(), [&sys_dist_ks] (auto& views) {
return do_with(t.views(), [&sys_dist_ks, &tm] (auto& views) {
return map_reduce(views,
[&sys_dist_ks] (const view_ptr& view) { return check_view_build_ongoing(sys_dist_ks, view->ks_name(), view->cf_name()); },
[&sys_dist_ks, &tm] (const view_ptr& view) { return check_view_build_ongoing(sys_dist_ks, tm, view->ks_name(), view->cf_name()); },
false,
std::logical_or<bool>());
});

View File

@@ -154,10 +154,7 @@ private:
void delete_old_entry(const partition_key& base_key, const clustering_row& existing, const clustering_row& update, gc_clock::time_point now);
void do_delete_old_entry(const partition_key& base_key, const clustering_row& existing, const clustering_row& update, gc_clock::time_point now);
void update_entry(const partition_key& base_key, const clustering_row& update, const clustering_row& existing, gc_clock::time_point now);
void replace_entry(const partition_key& base_key, const clustering_row& update, const clustering_row& existing, gc_clock::time_point now) {
create_entry(base_key, update, now);
delete_old_entry(base_key, existing, update, now);
}
void update_entry_for_computed_column(const partition_key& base_key, const clustering_row& update, const std::optional<clustering_row>& existing, gc_clock::time_point now);
};
class view_update_builder {
@@ -188,7 +185,15 @@ public:
}
view_update_builder(view_update_builder&& other) noexcept = default;
future<utils::chunked_vector<frozen_mutation_and_schema>> build_some();
// build_some() works on batches of 100 (max_rows_for_view_updates)
// updated rows, but can_skip_view_updates() can decide that some of
// these rows do not affect the view, and as a result build_some() can
// return fewer than 100 rows - in extreme cases even zero (see issue #12297).
// So we can't use an empty returned vector to signify that the view
// update building is done - and we wrap the return value in an
// std::optional, which is disengaged when the iteration is done.
future<std::optional<utils::chunked_vector<frozen_mutation_and_schema>>> build_some();
future<> close() noexcept;
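The contract change the comment describes, wrapping the batch in an optional so that an empty batch no longer means "done", can be sketched in a toy Python analogue (names are illustrative, not Scylla's API; `None` plays the role of a disengaged `std::optional`):

```python
def build_batches(rows, batch_size=100, skip=lambda r: False):
    """Yield batches of view updates. A batch may be empty when every
    row in it was skippable, so exhaustion is signalled with None."""
    it = iter(rows)
    while True:
        batch = []
        took_any = False
        for _ in range(batch_size):
            try:
                row = next(it)
            except StopIteration:
                break
            took_any = True
            if not skip(row):
                batch.append(row)
        if not took_any:
            yield None  # no more input: iteration is done
            return
        yield batch     # possibly empty, but not "done"

# Skipping even rows produces an empty batch mid-stream:
out = list(build_batches(range(5), batch_size=2, skip=lambda r: r % 2 == 0))
assert out == [[1], [3], [], None]
```

A caller therefore loops until it receives `None`, never treating an empty batch as the end of the stream.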

View File

@@ -22,9 +22,13 @@ class system_distributed_keyspace;
}
namespace locator {
class token_metadata;
}
namespace db::view {
future<bool> check_view_build_ongoing(db::system_distributed_keyspace& sys_dist_ks, const sstring& ks_name, const sstring& cf_name);
future<bool> check_needs_view_update_path(db::system_distributed_keyspace& sys_dist_ks, const replica::table& t, streaming::stream_reason reason);
future<bool> check_needs_view_update_path(db::system_distributed_keyspace& sys_dist_ks, const locator::token_metadata& tm, const replica::table& t,
streaming::stream_reason reason);
}

View File

@@ -84,10 +84,10 @@ future<> view_update_generator::start() {
service::get_local_streaming_priority(),
nullptr,
::mutation_reader::forwarding::no);
auto close_sr = deferred_close(staging_sstable_reader);
inject_failure("view_update_generator_consume_staging_sstable");
auto result = staging_sstable_reader.consume_in_thread(view_updating_consumer(s, std::move(permit), *t, sstables, _as, staging_sstable_reader_handle));
staging_sstable_reader.close().get();
if (result == stop_iteration::yes) {
break;
}

View File

@@ -11,6 +11,7 @@
#include <seastar/core/shared_ptr.hh>
#include <seastar/core/sstring.hh>
#include <seastar/util/optimized_optional.hh>
#include "types.hh"
#include "keys.hh"
#include "utils/managed_bytes.hh"
@@ -317,6 +318,9 @@ class ring_position_view {
const dht::token* _token; // always not nullptr
const partition_key* _key; // Can be nullptr
int8_t _weight;
private:
ring_position_view() noexcept : _token(nullptr), _key(nullptr), _weight(0) { }
explicit operator bool() const noexcept { return bool(_token); }
public:
using token_bound = ring_position::token_bound;
struct after_key_tag {};
@@ -404,9 +408,11 @@ public:
after_key is_after_key() const { return after_key(_weight == 1); }
friend std::ostream& operator<<(std::ostream&, ring_position_view);
friend class optimized_optional<ring_position_view>;
};
using ring_position_ext_view = ring_position_view;
using ring_position_view_opt = optimized_optional<ring_position_view>;
//
// Represents position in the ring of partitions, where partitions are ordered

View File

@@ -478,7 +478,15 @@ static future<bool> ping_with_timeout(pinger::endpoint_id id, clock::timepoint_t
auto f = pinger.ping(id, timeout_as);
auto sleep_and_abort = [] (clock::timepoint_t timeout, abort_source& timeout_as, clock& c) -> future<> {
co_await c.sleep_until(timeout, timeout_as);
co_await c.sleep_until(timeout, timeout_as).then_wrapped([&timeout_as] (auto&& f) {
// Avoid throwing if sleep was aborted.
if (f.failed() && timeout_as.abort_requested()) {
// Expected (if ping() resolved first or we were externally aborted).
f.ignore_ready_future();
return make_ready_future<>();
}
return std::move(f);
});
if (!timeout_as.abort_requested()) {
// We resolved before `f`. Abort the operation.
timeout_as.request_abort();
@@ -501,8 +509,6 @@ static future<bool> ping_with_timeout(pinger::endpoint_id id, clock::timepoint_t
// Wait on the sleep as well (it should return shortly, being aborted) so we don't discard the future.
try {
co_await std::move(sleep_and_abort);
} catch (const sleep_aborted&) {
// Expected (if `f` resolved first or we were externally aborted).
} catch (...) {
// There should be no other exceptions, but just in case... log it and discard,
// we want to propagate exceptions from `f`, not from sleep.

View File

@@ -42,7 +42,8 @@ if __name__ == '__main__':
if systemd_unit.available('systemd-coredump@.service'):
dropin = '''
[Service]
TimeoutStartSec=infinity
RuntimeMaxSec=infinity
TimeoutSec=infinity
'''[1:-1]
os.makedirs('/etc/systemd/system/systemd-coredump@.service.d', exist_ok=True)
with open('/etc/systemd/system/systemd-coredump@.service.d/timeout.conf', 'w') as f:

View File

@@ -36,6 +36,9 @@ if __name__ == '__main__':
except:
pass
if cpuset != args.cpuset or smp != args.smp:
if os.path.exists('/etc/scylla.d/perftune.yaml'):
os.remove('/etc/scylla.d/perftune.yaml')
cfg.set('CPUSET', '{cpuset}{smp}'.format( \
cpuset='--cpuset {} '.format(args.cpuset) if args.cpuset else '', \
smp='--smp {} '.format(args.smp) if args.smp else '' \

View File

@@ -25,7 +25,7 @@ if __name__ == '__main__':
run('dd if=/dev/zero of=/var/tmp/kernel-check.img bs=1M count=128', shell=True, check=True, stdout=DEVNULL, stderr=DEVNULL)
run('mkfs.xfs /var/tmp/kernel-check.img', shell=True, check=True, stdout=DEVNULL, stderr=DEVNULL)
run('mount /var/tmp/kernel-check.img /var/tmp/mnt -o loop', shell=True, check=True, stdout=DEVNULL, stderr=DEVNULL)
ret = run('iotune --fs-check --evaluation-directory /var/tmp/mnt', shell=True).returncode
ret = run('iotune --fs-check --evaluation-directory /var/tmp/mnt --default-log-level error', shell=True).returncode
run('umount /var/tmp/mnt', shell=True, check=True)
shutil.rmtree('/var/tmp/mnt')
os.remove('/var/tmp/kernel-check.img')

View File

@@ -16,41 +16,72 @@ import distro
from scylla_util import *
from subprocess import run
def get_mode_cpuset(nic, mode):
mode_cpu_mask = out('/opt/scylladb/scripts/perftune.py --tune net --nic {} --mode {} --get-cpu-mask-quiet'.format(nic, mode))
return hex2list(mode_cpu_mask)
def get_cur_cpuset():
cfg = sysconfig_parser('/etc/scylla.d/cpuset.conf')
cpuset = cfg.get('CPUSET')
return re.sub(r'^--cpuset (.+)$', r'\1', cpuset).strip()
def get_tune_mode(nic):
def cpu_mask_is_zero(cpu_mask):
"""
The cpu_mask is a comma-separated list of 32-bit hex values with possibly omitted zero components,
e.g. 0xffff,,0xffff
We want to check whether the whole mask is all zeros.
:param cpu_mask: hwloc-calc generated CPU mask
:return: True if mask is zero, False otherwise
"""
for cur_cpu_mask in cpu_mask.split(','):
if cur_cpu_mask and int(cur_cpu_mask, 16) != 0:
return False
return True
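The mask format the docstring describes, comma-separated 32-bit hex words where an empty component stands for zero, can be exercised standalone (the check is re-implemented here as a sketch):

```python
def cpu_mask_is_zero(cpu_mask):
    # Empty components (",,") stand for an all-zero 32-bit word.
    return all(not word or int(word, 16) == 0
               for word in cpu_mask.split(','))

assert cpu_mask_is_zero('0x0,,0x0')
assert not cpu_mask_is_zero('0xffff,,0xffff')
```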
def get_irq_cpu_mask():
"""
Return an irq_cpu_mask corresponding to a value written in cpuset.conf
Let's use the "CPU masks invariant": irq_cpu_mask | compute_cpu_mask == cpu_mask.
This function is called when we are generating a perftune.yaml, meaning that no restrictions on
cpu_mask are defined.
This means that in the context of this call cpu_mask is "all CPUs", or in hwloc-calc lingo - 'all'.
(For any "special" value of a cpu_mask a user needs to write their own perftune.yaml.)
Therefore, to calculate an irq_cpu_mask that corresponds to a compute_cpu_mask defined
using --cpuset in cpuset.conf with cpu_mask == 'all', we need to invert the bits of the compute_cpu_mask
within the 'all' mask.
This can be achieved by running the following hwloc-calc command:
hwloc-calc --pi all ~PU:X ~PU:Y ~PU:Z ...
where X,Y,Z,... are either a single CPU index or a CPU range.
For example, if we have the following cpuset:
0,2-7,17-24,35
to get irq_cpu_mask we want to run the following command:
hwloc-calc --pi all ~PU:0 ~PU:2-7 ~PU:17-24 ~PU:35
"""
if not os.path.exists('/etc/scylla.d/cpuset.conf'):
raise Exception('/etc/scylla.d/cpuset.conf not found')
cur_cpuset = get_cur_cpuset()
mq_cpuset = get_mode_cpuset(nic, 'mq')
sq_cpuset = get_mode_cpuset(nic, 'sq')
sq_split_cpuset = get_mode_cpuset(nic, 'sq_split')
if cur_cpuset == mq_cpuset:
return 'mq'
elif cur_cpuset == sq_cpuset:
return 'sq'
elif cur_cpuset == sq_split_cpuset:
return 'sq_split'
else:
raise Exception('tune mode not found')
hwloc_cmd = "/opt/scylladb/bin/hwloc-calc --pi all {}".\
format(" ".join(['~PU:{}'.format(c) for c in cur_cpuset.split(",")]))
def config_updated():
perftune_mtime = os.path.getmtime('/etc/scylla.d/perftune.yaml')
cpuset_mtime = os.path.getmtime('/etc/scylla.d/cpuset.conf')
sysconfig_mtime = os.path.getmtime(sysconfdir_p() / 'scylla-server')
print("perftune_mtime < cpuset_mtime:{}".format(perftune_mtime < cpuset_mtime))
print("perftune_mtime < sysconfig_mtime:{}".format(perftune_mtime < sysconfig_mtime))
if perftune_mtime < cpuset_mtime or perftune_mtime < sysconfig_mtime:
return True
return False
irq_cpu_mask = out(hwloc_cmd).strip()
# If the generated mask turns out to be all-zeros then it means that all present CPUs are used in cpuset.conf.
# In such a case irq_cpu_mask has to be all-CPUs too, a.k.a. MQ mode.
if cpu_mask_is_zero(irq_cpu_mask):
irq_cpu_mask = out("/opt/scylladb/bin/hwloc-calc all").strip()
return irq_cpu_mask
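Constructing the inversion command from a cpuset string, exactly as the docstring's `0,2-7,17-24,35` example describes, can be sketched as:

```python
def build_hwloc_cmd(cur_cpuset, hwloc='/opt/scylladb/bin/hwloc-calc'):
    # Invert the compute CPUs out of the 'all' mask: one ~PU:<range>
    # argument per comma-separated CPU index or range.
    negations = ' '.join('~PU:{}'.format(c) for c in cur_cpuset.split(','))
    return '{} --pi all {}'.format(hwloc, negations)

assert build_hwloc_cmd('0,2-7,17-24,35', hwloc='hwloc-calc') == \
    'hwloc-calc --pi all ~PU:0 ~PU:2-7 ~PU:17-24 ~PU:35'
```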
def create_perftune_conf(cfg):
"""
@@ -65,8 +96,10 @@ def create_perftune_conf(cfg):
nic = cfg.get('IFNAME')
if not nic:
nic = 'eth0'
mode = get_tune_mode(nic)
params += '--tune net --nic "{nic}" --mode {mode}'.format(nic=nic, mode=mode)
irq_cpu_mask = get_irq_cpu_mask()
# Note that 'irq_cpu_mask' is a comma-separated list of 32-bit wide masks.
# Therefore, we need to put it in quotes.
params += '--tune net --nic "{nic}" --irq-cpu-mask "{irq_cpu_mask}"'.format(nic=nic, irq_cpu_mask=irq_cpu_mask)
if cfg.has_option('SET_CLOCKSOURCE') and cfg.get('SET_CLOCKSOURCE') == 'yes':
params += ' --tune system --tune-clock'
@@ -75,7 +108,7 @@ def create_perftune_conf(cfg):
params += ' --write-back-cache=false'
if len(params) > 0:
if os.path.exists('/etc/scylla.d/perftune.yaml') and not config_updated():
if os.path.exists('/etc/scylla.d/perftune.yaml'):
return True
params += ' --dump-options-file'

View File

@@ -16,7 +16,7 @@ import stat
import distro
from pathlib import Path
from scylla_util import *
from subprocess import run
from subprocess import run, SubprocessError
if __name__ == '__main__':
if os.getuid() > 0:
@@ -137,7 +137,9 @@ if __name__ == '__main__':
# stalling. The minimum block size for crc enabled filesystems is 1024,
# and it also cannot be smaller than the sector size.
block_size = max(1024, sector_size)
run('udevadm settle', shell=True, check=True)
run(f'mkfs.xfs -b size={block_size} {fsdev} -f -K', shell=True, check=True)
run('udevadm settle', shell=True, check=True)
if is_debian_variant():
confpath = '/etc/mdadm/mdadm.conf'
@@ -153,6 +155,11 @@ if __name__ == '__main__':
os.makedirs(mount_at, exist_ok=True)
uuid = out(f'blkid -s UUID -o value {fsdev}')
if not uuid:
raise Exception(f'Failed to get UUID of {fsdev}')
uuidpath = f'/dev/disk/by-uuid/{uuid}'
after = 'local-fs.target'
wants = ''
if raid and args.raid_level != '0':
@@ -169,7 +176,7 @@ After={after}{wants}
DefaultDependencies=no
[Mount]
What=/dev/disk/by-uuid/{uuid}
What={uuidpath}
Where={mount_at}
Type=xfs
Options=noatime{opt_discard}
@@ -191,8 +198,16 @@ WantedBy=multi-user.target
systemd_unit.reload()
if args.raid_level != '0':
md_service.start()
mount = systemd_unit(mntunit_bn)
mount.start()
try:
mount = systemd_unit(mntunit_bn)
mount.start()
except SubprocessError as e:
if not os.path.exists(uuidpath):
print(f'\nERROR: {uuidpath} is not found\n')
elif not stat.S_ISBLK(os.stat(uuidpath).st_mode):
print(f'\nERROR: {uuidpath} is not a block device\n')
raise e
if args.enable_on_nextboot:
mount.enable()
uid = pwd.getpwnam('scylla').pw_uid

View File

@@ -214,7 +214,7 @@ if __name__ == '__main__':
help='skip raid setup')
parser.add_argument('--raid-level-5', action='store_true', default=False,
help='use RAID5 for RAID volume')
parser.add_argument('--online-discard', default=True,
parser.add_argument('--online-discard', default=1, choices=[0, 1], type=int,
help='Configure XFS to discard unused blocks as soon as files are deleted')
parser.add_argument('--nic',
help='specify NIC')
@@ -458,7 +458,7 @@ if __name__ == '__main__':
args.no_raid_setup = not raid_setup
if raid_setup:
level = '5' if raid_level_5 else '0'
run_setup_script('RAID', f'scylla_raid_setup --disks {disks} --enable-on-nextboot --raid-level={level} --online-discard={int(online_discard)}')
run_setup_script('RAID', f'scylla_raid_setup --disks {disks} --enable-on-nextboot --raid-level={level} --online-discard={online_discard}')
coredump_setup = interactive_ask_service('Do you want to enable coredumps?', 'Yes - sets up coredump to allow a post-mortem analysis of the Scylla state just prior to a crash. No - skips this step.', coredump_setup)
args.no_coredump_setup = not coredump_setup

View File

@@ -68,7 +68,12 @@ class ScyllaSetup:
def cqlshrc(self):
home = os.environ['HOME']
hostname = subprocess.check_output(['hostname', '-i']).decode('ascii').strip()
if self._rpcAddress:
hostname = self._rpcAddress
elif self._listenAddress:
hostname = self._listenAddress
else:
hostname = subprocess.check_output(['hostname', '-i']).decode('ascii').strip()
with open("%s/.cqlshrc" % home, "w") as cqlshrc:
cqlshrc.write("[connection]\nhostname = %s\n" % hostname)

View File

@@ -7,7 +7,7 @@ Group: Applications/Databases
License: AGPLv3
URL: http://www.scylladb.com/
Source0: %{reloc_pkg}
Requires: %{product}-server = %{version} %{product}-conf = %{version} %{product}-python3 = %{version} %{product}-kernel-conf = %{version} %{product}-jmx = %{version} %{product}-tools = %{version} %{product}-tools-core = %{version} %{product}-node-exporter = %{version}
Requires: %{product}-server = %{version}-%{release} %{product}-conf = %{version}-%{release} %{product}-python3 = %{version}-%{release} %{product}-kernel-conf = %{version}-%{release} %{product}-jmx = %{version}-%{release} %{product}-tools = %{version}-%{release} %{product}-tools-core = %{version}-%{release} %{product}-node-exporter = %{version}-%{release}
Obsoletes: scylla-server < 1.1
%global _debugsource_template %{nil}
@@ -54,7 +54,7 @@ Group: Applications/Databases
Summary: The Scylla database server
License: AGPLv3
URL: http://www.scylladb.com/
Requires: %{product}-conf = %{version} %{product}-python3 = %{version}
Requires: %{product}-conf = %{version}-%{release} %{product}-python3 = %{version}-%{release}
Conflicts: abrt
AutoReqProv: no

View File

@@ -1,58 +1,10 @@
### a dictionary of redirections
#old path: new path
# removing the old Monitoring Stack documentation from the ScyllaDB docs
/stable/operating-scylla/monitoring/index.html: https://monitoring.docs.scylladb.com/stable/
/stable/upgrade/upgrade-monitor/index.html: https://monitoring.docs.scylladb.com/stable/upgrade/index.html
/stable/upgrade/upgrade-monitor/upgrade-guide-from-monitoring-1.x-to-monitoring-2.x.html: https://monitoring.docs.scylladb.com/stable/upgrade/upgrade-guide-from-monitoring-1.x-to-monitoring-2.x.html
/stable/upgrade/upgrade-monitor/upgrade-guide-from-monitoring-2.x-to-monitoring-2.y.html: https://monitoring.docs.scylladb.com/stable/upgrade/upgrade-guide-from-monitoring-2.x-to-monitoring-2.y.html
/stable/upgrade/upgrade-monitor/upgrade-guide-from-monitoring-2.x-to-monitoring-3.y.html: https://monitoring.docs.scylladb.com/stable/upgrade/upgrade-guide-from-monitoring-2.x-to-monitoring-3.y.html
/stable/upgrade/upgrade-monitor/upgrade-guide-from-monitoring-3.x-to-monitoring-3.y.html: https://monitoring.docs.scylladb.com/stable/upgrade/upgrade-guide-from-monitoring-3.x-to-monitoring-3.y.html
# removing the old Operator documentation from the ScyllaDB docs
/stable/operating-scylla/scylla-operator/index.html: https://operator.docs.scylladb.com/stable/
### removing the old Scylla Manager documentation from the ScyllaDB docs
/stable/operating-scylla/manager/index.html: https://manager.docs.scylladb.com/
/stable/upgrade/upgrade-manager/index.html: https://manager.docs.scylladb.com/stable/upgrade/index.html
/stable/upgrade/upgrade-manager/upgrade-guide-maintenance-1.x.y-to-1.x.z/index.html: https://manager.docs.scylladb.com/stable/upgrade/index.html
/stable/upgrade/upgrade-manager/upgrade-guide-maintenance-1.x.y-to-1.x.z/upgrade-guide-from-manager-1.x.y-to-1.x.z-CentOS.html: https://manager.docs.scylladb.com/stable/upgrade/index.html
/stable/upgrade/upgrade-manager/upgrade-guide-maintenance-1.x.y-to-1.x.z/upgrade-guide-from-manager-1.x.y-to-1.x.z-ubuntu.html: https://manager.docs.scylladb.com/stable/upgrade/index.html
/stable/upgrade/upgrade-manager/upgrade-guide-from-manager-1.0.x-to-1.1.x.html: https://manager.docs.scylladb.com/stable/upgrade/index.html
/stable/upgrade/upgrade-manager/upgrade-guide-from-manager-1.1.x-to-1.2.x.html: https://manager.docs.scylladb.com/stable/upgrade/index.html
/stable/upgrade/upgrade-manager/upgrade-guide-from-1.2-to-1.3/index.html: https://manager.docs.scylladb.com/stable/upgrade/index.html
/stable/upgrade/upgrade-manager/upgrade-guide-from-1.2-to-1.3/upgrade-guide-from-manager-1.2.x-to-1.3.x-CentOS.html: https://manager.docs.scylladb.com/stable/upgrade/index.html
/stable/upgrade/upgrade-manager/upgrade-guide-from-1.2-to-1.3/upgrade-guide-from-manager-1.2.x-to-1.3.x-ubuntu.html: https://manager.docs.scylladb.com/stable/upgrade/index.html
/stable/upgrade/upgrade-manager/upgrade-guide-from-1.2-to-1.3/manager-metric-update-1.2-to-1.3.html: https://manager.docs.scylladb.com/stable/upgrade/index.html
/stable/upgrade/upgrade-manager/upgrade-guide-from-1.3-to-1.4/index.html: https://manager.docs.scylladb.com/stable/upgrade/index.html
/stable/upgrade/upgrade-manager/upgrade-guide-from-1.3-to-1.4/upgrade-guide-from-manager-1.3.x-to-1.4.x-CentOS.html: https://manager.docs.scylladb.com/stable/upgrade/index.html
/stable/upgrade/upgrade-manager/upgrade-guide-from-1.3-to-1.4/upgrade-guide-from-manager-1.3.x-to-1.4.x-ubuntu.html: https://manager.docs.scylladb.com/stable/upgrade/index.html
/stable/upgrade/upgrade-manager/upgrade-guide-from-1.3-to-1.4/manager-metric-update-1.3-to-1.4.html: https://manager.docs.scylladb.com/stable/upgrade/index.html
/stable/upgrade/upgrade-manager/upgrade-guide-from-1.4-to-2.0/index.html: https://manager.docs.scylladb.com/stable/upgrade/index.html
/stable/upgrade/upgrade-manager/upgrade-guide-from-1.4-to-2.0/upgrade-guide-from-manager-1.4.x-to-2.0.x.html: https://manager.docs.scylladb.com/stable/upgrade/index.html
/stable/upgrade/upgrade-manager/upgrade-guide-from-1.4-to-2.0/manager-metric-update-1.4-to-2.0.html: https://manager.docs.scylladb.com/stable/upgrade/index.html
/stable/upgrade/upgrade-manager/upgrade-guide-from-2.x.a-to-2.y.b/index.html: https://manager.docs.scylladb.com/stable/upgrade/index.html
/stable/upgrade/upgrade-manager/upgrade-guide-from-2.x.a-to-2.y.b/upgrade-2.x.a-to-2.y.b.html: https://manager.docs.scylladb.com/stable/upgrade/index.html
/stable/upgrade/upgrade-manager/upgrade-guide-from-2.x.a-to-2.y.b/upgrade-row-level-repair.html: https://www.scylladb.com/2019/08/13/scylla-open-source-3-1-efficiently-maintaining-consistency-with-row-level-repair/
stable/operating-scylla/manager/2.1/index.html: https://manager.docs.scylladb.com/
/stable/operating-scylla/manager/2.1/architecture.html: https://manager.docs.scylladb.com/
/stable/operating-scylla/manager/2.1/install.html: https://manager.docs.scylladb.com/stable/install-scylla-manager.html
/stable/operating-scylla/manager/2.1/install-agent.html: https://manager.docs.scylladb.com/stable/install-scylla-manager-agent.html
/stable/operating-scylla/manager/2.1/add-a-cluster.html: https://manager.docs.scylladb.com/stable/add-a-cluster.html
/stable/operating-scylla/manager/2.1/repair.html: https://manager.docs.scylladb.com/stable/repair/index.html
/stable/operating-scylla/manager/2.1/backup.html: https://manager.docs.scylladb.com/stable/backup/index.html
/stable/operating-scylla/manager/2.1/extract-schema-from-backup.html: https://manager.docs.scylladb.com/stable/sctool/backup.html
/stable/operating-scylla/manager/2.1/restore-a-backup.html: https://manager.docs.scylladb.com/stable/restore/index.html
/stable/operating-scylla/manager/2.1/health-check.html: https://manager.docs.scylladb.com/stable/health-check.html
/stable/operating-scylla/manager/2.1/sctool.html: https://manager.docs.scylladb.com/stable/sctool/index.html
/stable/operating-scylla/manager/2.1/monitoring-manager-integration.html: https://manager.docs.scylladb.com/stable/scylla-monitoring.html
/stable/operating-scylla/manager/2.1/use-a-remote-db.html: https://manager.docs.scylladb.com/
/stable/operating-scylla/manager/2.1/configuration-file.html: https://manager.docs.scylladb.com/stable/config/scylla-manager-config.html
/stable/operating-scylla/manager/2.1/agent-configuration-file.html: https://manager.docs.scylladb.com/stable/config/scylla-manager-agent-config.html
### moving the CQL reference files to the new cql folder
/stable/getting-started/ddl.html: /stable/cql/ddl.html
@@ -1108,14 +1060,14 @@ tls-ssl/index.html: /stable/operating-scylla/security
/using-scylla/integrations/integration_kairos/index.html: /stable/using-scylla/integrations/integration-kairos
/upgrade/ami_upgrade/index.html: /stable/upgrade/ami-upgrade
/scylla-cloud/cloud-setup/gcp-vpc-peering/index.html: /stable/scylla-cloud/cloud-setup/GCP/gcp-vpc-peering
/scylla-cloud/cloud-setup/GCP/gcp-vcp-peering/index.html: /stable/scylla-cloud/cloud-setup/GCP/gcp-vpc-peering
/scylla-cloud/cloud-setup/gcp-vpc-peering/index.html: https://cloud.docs.scylladb.com/stable/cloud-setup/gcp-vpc-peering.html
/scylla-cloud/cloud-setup/GCP/gcp-vcp-peering/index.html: https://cloud.docs.scylladb.com/stable/cloud-setup/gcp-vpc-peering.html
# move scylla cloud for AWS to dedicated directory
/scylla-cloud/cloud-setup/aws-vpc-peering/index.html: /stable/scylla-cloud/cloud-setup/AWS/aws-vpc-peering
/scylla-cloud/cloud-setup/cloud-prom-proxy/index.html: /stable/scylla-cloud/cloud-setup/AWS/cloud-prom-proxy
/scylla-cloud/cloud-setup/outposts/index.html: /stable/scylla-cloud/cloud-setup/AWS/outposts
/scylla-cloud/cloud-setup/scylla-cloud-byoa/index.html: /stable/scylla-cloud/cloud-setup/AWS/scylla-cloud-byoa
/scylla-cloud/cloud-setup/aws-vpc-peering/index.html: https://cloud.docs.scylladb.com/stable/cloud-setup/aws-vpc-peering.html
/scylla-cloud/cloud-setup/cloud-prom-proxy/index.html: https://cloud.docs.scylladb.com/stable/monitoring/cloud-prom-proxy.html
/scylla-cloud/cloud-setup/outposts/index.html: https://cloud.docs.scylladb.com/stable/cloud-setup/outposts.html
/scylla-cloud/cloud-setup/scylla-cloud-byoa/index.html: https://cloud.docs.scylladb.com/stable/cloud-setup/scylla-cloud-byoa.html
/scylla-cloud/cloud-services/scylla_cloud_costs/index.html: /stable/scylla-cloud/cloud-services/scylla-cloud-costs
/scylla-cloud/cloud-services/scylla_cloud_managin_versions/index.html: /stable/scylla-cloud/cloud-services/scylla-cloud-managin-versions
/scylla-cloud/cloud-services/scylla_cloud_support_alerts_sla/index.html: /stable/scylla-cloud/cloud-services/scylla-cloud-support-alerts-sla

View File

@@ -134,7 +134,7 @@ isolation policy for a specific table can be overridden by tagging the table
This section provides only a very brief introduction to Alternator's
design. A much more detailed document about the features of the DynamoDB
API and how they are, or could be, implemented in Scylla can be found in:
https://docs.google.com/document/d/1i4yjF5OSAazAY_-T8CBce9-2ykW4twx_E_Nt2zDoOVs
<https://docs.google.com/document/d/1i4yjF5OSAazAY_-T8CBce9-2ykW4twx_E_Nt2zDoOVs>
Almost all of Alternator's source code (except some initialization code)
can be found in the alternator/ subdirectory of Scylla's source code.

View File

@@ -26,7 +26,7 @@ request for this single URL to many different backend nodes. Such a
load-balancing setup is *not* included inside Alternator. You should either
set one up, or configure the client library to do the load balancing itself.
Instructions for doing this can be found in:
https://github.com/scylladb/alternator-load-balancing/
<https://github.com/scylladb/alternator-load-balancing/>
## Write isolation policies
@@ -125,7 +125,7 @@ All of this is not yet implemented in Alternator.
Scylla has an advanced and extensive monitoring framework for inspecting
and graphing hundreds of different metrics of Scylla's usage and performance.
Scylla's monitoring stack, based on Grafana and Prometheus, is described in
https://docs.scylladb.com/operating-scylla/monitoring/.
<https://docs.scylladb.com/operating-scylla/monitoring/>.
This monitoring stack is different from DynamoDB's offering - but Scylla's
is significantly more powerful and gives the user better insight into
the internals of the database and its performance.
@@ -160,7 +160,7 @@ experimental:
One thing that this implementation is still missing is that expiration
events appear in the Streams API as normal deletions - without the
distinctive marker on deletions which are really expirations.
https://github.com/scylladb/scylla/issues/5060
<https://github.com/scylladb/scylla/issues/5060>
* The DynamoDB Streams API for capturing change is supported, but still
considered experimental so needs to be enabled explicitly with the
@@ -172,12 +172,12 @@ experimental:
* While in DynamoDB data usually appears in the stream less than a second
after it was written, in Alternator Streams there is currently a 10
second delay by default.
https://github.com/scylladb/scylla/issues/6929
<https://github.com/scylladb/scylla/issues/6929>
* Some events are represented differently in Alternator Streams. For
example, a single PutItem is represented by a REMOVE + MODIFY event,
instead of just a single MODIFY or INSERT.
https://github.com/scylladb/scylla/issues/6930
https://github.com/scylladb/scylla/issues/6918
<https://github.com/scylladb/scylla/issues/6930>
<https://github.com/scylladb/scylla/issues/6918>
## Unimplemented API features
@@ -189,18 +189,18 @@ they should be easy to detect. Here is a list of these unimplemented features:
* Currently in Alternator, a GSI (Global Secondary Index) can only be added
to a table at table creation time, unlike DynamoDB, which also allows adding
a GSI (but not an LSI) to an existing table using an UpdateTable operation.
https://github.com/scylladb/scylla/issues/5022
<https://github.com/scylladb/scylla/issues/5022>
* GSI (Global Secondary Index) and LSI (Local Secondary Index) may be
configured to project only a subset of the base-table attributes to the
index. This option is not yet respected by Alternator - all attributes
are projected. This wastes some disk space when it is not needed.
https://github.com/scylladb/scylla/issues/5036
<https://github.com/scylladb/scylla/issues/5036>
* DynamoDB's new multi-item transaction feature (TransactWriteItems,
TransactGetItems) is not supported. Note that the older single-item
conditional updates feature is fully supported.
https://github.com/scylladb/scylla/issues/5064
<https://github.com/scylladb/scylla/issues/5064>
* Alternator does not yet support the DynamoDB API calls that control which
table is available in which data center (DC): CreateGlobalTable,
@@ -211,19 +211,19 @@ they should be easy to detect. Here is a list of these unimplemented features:
If a DC is added after a table is created, the table won't be visible from
the new DC and changing that requires a CQL "ALTER TABLE" statement to
modify the table's replication strategy.
https://github.com/scylladb/scylla/issues/5062
<https://github.com/scylladb/scylla/issues/5062>
* Recently, in addition to the DynamoDB Streams API, DynamoDB added support
for the similar Kinesis Streams API. Alternator does not yet support it,
nor the related operations DescribeKinesisStreamingDestination,
DisableKinesisStreamingDestination, and EnableKinesisStreamingDestination.
https://github.com/scylladb/scylla/issues/8786
<https://github.com/scylladb/scylla/issues/8786>
* The on-demand backup APIs are not supported: CreateBackup, DescribeBackup,
DeleteBackup, ListBackups, RestoreTableFromBackup.
For now, users can use Scylla's existing backup solutions such as snapshots
or Scylla Manager.
https://github.com/scylladb/scylla/issues/5063
<https://github.com/scylladb/scylla/issues/5063>
* Continuous backup (the ability to restore any point in time) is also not
supported: UpdateContinuousBackups, DescribeContinuousBackups,
@@ -237,28 +237,28 @@ they should be easy to detect. Here is a list of these unimplemented features:
BillingMode option is ignored by Alternator, and a provisioned throughput,
if specified, is also ignored. Requests which are asked to return the amount
of provisioned throughput used by the request do not return it in Alternator.
https://github.com/scylladb/scylla/issues/5068
<https://github.com/scylladb/scylla/issues/5068>
* DAX (DynamoDB Accelerator), an in-memory cache for DynamoDB, is not
available for Alternator. However, it should not be necessary - Scylla's
internal cache is already rather advanced, and there is no need to place
another cache in front of it. We wrote more about this here:
https://www.scylladb.com/2017/07/31/database-caches-not-good/
<https://www.scylladb.com/2017/07/31/database-caches-not-good/>
* The DescribeTable is missing information about the creation date and size
estimates, and also part of the information about indexes enabled on
the table.
https://github.com/scylladb/scylla/issues/5013
https://github.com/scylladb/scylla/issues/5026
https://github.com/scylladb/scylla/issues/7550
https://github.com/scylladb/scylla/issues/7551
<https://github.com/scylladb/scylla/issues/5013>
<https://github.com/scylladb/scylla/issues/5026>
<https://github.com/scylladb/scylla/issues/7550>
<https://github.com/scylladb/scylla/issues/7551>
* The recently-added PartiQL syntax (SQL-like SELECT/UPDATE/INSERT/DELETE
expressions) and the new operations ExecuteStatement, BatchExecuteStatement
and ExecuteTransaction are not yet supported.
A user who is interested in an SQL-like syntax can consider using Scylla's
CQL protocol instead.
https://github.com/scylladb/scylla/issues/8787
<https://github.com/scylladb/scylla/issues/8787>
* As mentioned above, Alternator has its own powerful monitoring framework,
which is different from AWS's. In particular, the operations
@@ -266,8 +266,8 @@ they should be easy to detect. Here is a list of these unimplemented features:
UpdateContributorInsights that configure Amazon's "CloudWatch Contributor
Insights" are not yet supported. Scylla has different ways to retrieve the
same information, such as which items were accessed most often.
https://github.com/scylladb/scylla/issues/8788
<https://github.com/scylladb/scylla/issues/8788>
* Alternator does not support the new DynamoDB feature "export to S3",
and its operations DescribeExport, ExportTableToPointInTime, ListExports.
https://github.com/scylladb/scylla/issues/8789
<https://github.com/scylladb/scylla/issues/8789>

View File

@@ -4,70 +4,65 @@ Raft Consensus Algorithm in ScyllaDB
Introduction
--------------
ScyllaDB was originally designed, following Apache Cassandra, to use gossip for topology and schema updates and the Paxos consensus algorithm for
strong data consistency (:doc:`LWT </using-scylla/lwt>`). To achieve stronger consistency without performance penalty, ScyllaDB 5.0 is turning to Raft - a consensus algorithm designed as an alternative to both gossip and Paxos.
ScyllaDB was originally designed, following Apache Cassandra, to use gossip for topology and schema updates and the Paxos consensus algorithm for
strong data consistency (:doc:`LWT </using-scylla/lwt>`). To achieve stronger consistency without performance penalty, ScyllaDB 5.x has turned to Raft - a consensus algorithm designed as an alternative to both gossip and Paxos.
Raft is a consensus algorithm that implements a distributed, consistent, replicated log across members (nodes). Raft implements consensus by first electing a distinguished leader, then giving the leader complete responsibility for managing the replicated log. The leader accepts log entries from clients, replicates them on other servers, and tells servers when it is safe to apply log entries to their state machines.
Raft uses a heartbeat mechanism to trigger a leader election. All servers start as followers and remain in the follower state as long as they receive valid RPCs (heartbeat) from a leader or candidate. A leader sends periodic heartbeats to all followers to maintain its authority (leadership). Suppose a follower receives no communication over a period called the election timeout. In that case, it assumes no viable leader and begins an election to choose a new leader.
Leader selection is described in detail in the `raft paper <https://raft.github.io/raft.pdf>`_.
Leader selection is described in detail in the `Raft paper <https://raft.github.io/raft.pdf>`_.
Scylla 5.0 uses Raft to maintain schema updates in every node (see below). Any schema update, like ALTER, CREATE or DROP TABLE, is first committed as an entry in the replicated Raft log, and, once stored on most replicas, applied to all nodes **in the same order**, even in the face of a node or network failures.
ScyllaDB 5.x may use Raft to maintain schema updates in every node (see below). Any schema update, like ALTER, CREATE or DROP TABLE, is first committed as an entry in the replicated Raft log, and, once stored on most replicas, applied to all nodes **in the same order**, even in the face of a node or network failures.
Following Scylla 5.x releases will use Raft to guarantee consistent topology updates similarly.
Following ScyllaDB 5.x releases will use Raft to guarantee consistent topology updates similarly.
.. _raft-quorum-requirement:
Quorum Requirement
-------------------
Raft requires at least a quorum of nodes in a cluster to be available. If multiple nodes fail
and the quorum is lost, the cluster is unavailable for schema updates. See :ref:`Handling Failures <raft-handliing-failures>`
Raft requires at least a quorum of nodes in a cluster to be available. If multiple nodes fail
and the quorum is lost, the cluster is unavailable for schema updates. See :ref:`Handling Failures <raft-handling-failures>`
for information on how to handle failures.
Upgrade Considerations for ScyllaDB 5.0 and Later
==================================================
Note that when you have a two-DC cluster with the same number of nodes in each DC, the cluster will lose the quorum if one
Note that when you have a two-DC cluster with the same number of nodes in each DC, the cluster will lose the quorum if one
of the DCs is down.
**We recommend configuring three DCs per cluster to ensure that the cluster remains available and operational when one DC is down.**
Enabling Raft
---------------
Enabling Raft in ScyllaDB 5.0
===============================
Enabling Raft in ScyllaDB 5.0 and 5.1
=====================================
.. note::
In ScyllaDB 5.0:
.. warning::
In ScyllaDB 5.0 and 5.1, Raft is an experimental feature.
* Raft is an experimental feature.
* Raft implementation only covers safe schema changes. See :ref:`Safe Schema Changes with Raft <raft-schema-changes>`.
It is not possible to enable Raft in an existing cluster in ScyllaDB 5.0 and 5.1.
In order to have a Raft-enabled cluster in these versions, you must create a new cluster with Raft enabled from the start.
If you are creating a new cluster, add ``raft`` to the list of experimental features in your ``scylla.yaml`` file:
.. warning::
.. code-block:: yaml
experimental_features:
- raft
**Do not** use Raft in production clusters in ScyllaDB 5.0 and 5.1. Such clusters won't be able to correctly upgrade to ScyllaDB 5.2.
If you upgrade to ScyllaDB 5.0 from an earlier version, perform a :doc:`rolling restart </operating-scylla/procedures/config-change/rolling-restart/>`
updating the ``scylla.yaml`` file for **each node** in the cluster to enable the experimental Raft feature:
.. code-block:: yaml
experimental_features:
- raft
When all the nodes in the cluster are updated and restarted, the cluster will begin to use Raft for schema changes.
Use Raft only for testing and experimentation in clusters which can be thrown away.
.. warning::
Once enabled, Raft cannot be disabled on your cluster. The cluster nodes will fail to restart if you remove the Raft feature.
Verifying that Raft Is Enabled
When creating a new cluster, add ``raft`` to the list of experimental features in your ``scylla.yaml`` file:
.. code-block:: yaml
experimental_features:
- raft
Verifying that Raft is enabled
===============================
You can verify that Raft is enabled on your cluster in one of the following ways:
@@ -100,23 +95,23 @@ Safe Schema Changes with Raft
-------------------------------
In ScyllaDB, schema is based on :doc:`Data Definition Language (DDL) </cql/ddl>`. In earlier ScyllaDB versions, schema changes were tracked via the gossip protocol, which might lead to schema conflicts if the updates are happening concurrently.
Implementing Raft eliminates schema conflicts and allows full automation of DDL changes under any conditions, as long as a quorum
Implementing Raft eliminates schema conflicts and allows full automation of DDL changes under any conditions, as long as a quorum
of nodes in the cluster is available. The following examples illustrate how Raft provides the solution to problems with schema changes.
* A network partition may lead to a split-brain case, where each subset of nodes has a different version of the schema.
With Raft, after a network split, the majority of the cluster can continue performing schema changes, while the minority needs to wait until it can rejoin the majority. Data manipulation statements on the minority can continue unaffected, provided the :ref:`quorum requirement <raft-quorum-requirement>` is satisfied.
* Two or more conflicting schema updates are happening at the same time. For example, two different columns with the same definition are simultaneously added to the cluster. There is no effective way to resolve the conflict - the cluster will employ the schema with the most recent timestamp, but changes related to the shadowed table will be lost.
* Two or more conflicting schema updates are happening at the same time. For example, two different columns with the same definition are simultaneously added to the cluster. There is no effective way to resolve the conflict - the cluster will employ the schema with the most recent timestamp, but changes related to the shadowed table will be lost.
With Raft, concurrent schema changes are safe.
With Raft, concurrent schema changes are safe.
In summary, Raft makes schema changes safe, but it requires that a quorum of nodes in the cluster is available.
.. _raft-handliing-failures:
.. _raft-handling-failures:
Handling Failures
------------------
@@ -175,7 +170,7 @@ Examples
* - 1-4 nodes
- Schema updates are possible and safe.
- Try restarting the nodes. If the nodes are dead, :doc:`replace them with new nodes </operating-scylla/procedures/cluster-management/replace-dead-node-or-more/>`.
* - 1 DC
* - 1 DC
- Schema updates are possible and safe.
- When the DC comes back online, try restarting the nodes in the cluster. If the nodes are dead, :doc:`add 3 new nodes in a new region </operating-scylla/procedures/cluster-management/add-dc-to-existing-dc/>`.
* - 2 DCs

View File

@@ -1,20 +1,25 @@
:term:`Sorted Strings Table (SSTable)<SSTable>` is the persistent file format used by Scylla and Apache Cassandra. SSTable is saved as a persistent, ordered, immutable set of files on disk.
:term:`Sorted Strings Table (SSTable)<SSTable>` is the persistent file format used by ScyllaDB and Apache Cassandra. SSTable is saved as a persistent, ordered, immutable set of files on disk.
Immutable means SSTables are never modified; they are created by a MemTable flush and are deleted by a compaction.
The location of Scylla SSTables is specified in scylla.yaml ``data_file_directories`` parameter (default location: ``/var/lib/scylla/data``).
The location of ScyllaDB SSTables is specified in scylla.yaml ``data_file_directories`` parameter (default location: ``/var/lib/scylla/data``).
SSTable 3.0 (mc format) is more efficient and requires less disk space than the SSTable 2.x. SSTable version support is as follows:
SSTable 3.x is more efficient and requires less disk space than the SSTable 2.x.
SSTable Version Support
------------------------
.. list-table::
:widths: 33 33 33
:header-rows: 1
* - SSTable Version
- Scylla Enterprise Version
- Scylla Open Source Version
- ScyllaDB Enterprise Version
- ScyllaDB Open Source Version
* - 3.x ('me')
- 2022.2
- 5.1 and above
* - 3.x ('md')
- 2021.1
- 4.3 and above
- 4.3, 4.4, 4.5, 4.6, 5.0
* - 3.0 ('mc')
- 2019.1, 2020.1
- 3.x, 4.1, 4.2

View File

@@ -1,5 +1,5 @@
Scylla SSTable - 3.x
====================
ScyllaDB SSTable - 3.x
=======================
.. toctree::
:hidden:
@@ -12,21 +12,24 @@ Scylla SSTable - 3.x
.. include:: ../_common/sstable_what_is.rst
* In Scylla 3.1 and above, mc format is enabled by default.
* In ScyllaDB 5.1 and above, the ``me`` format is enabled by default.
* In ScyllaDB 4.3 to 5.0, the ``md`` format is enabled by default.
* In ScyllaDB 3.1 to 4.2, the ``mc`` format is enabled by default.
* In ScyllaDB 3.0, the ``mc`` format is disabled by default. You can enable it by adding the ``enable_sstables_mc_format`` parameter set to ``true`` in the ``scylla.yaml`` file. For example:
.. code-block:: shell
enable_sstables_mc_format: true
* In Scylla 3.0, mc format is disabled by default and can be enabled by adding the ``enable_sstables_mc_format`` parameter as 'true' in ``scylla.yaml`` file.
.. REMOVE IN FUTURE VERSIONS - Remove the note above in version 5.2.
For example:
Additional Information
-------------------------
.. code-block:: shell
enable_sstables_mc_format: true
For more information on Scylla 3.x SSTable formats, see below:
For more information on ScyllaDB 3.x SSTable formats, see below:
* :doc:`SSTable 3.0 Data File Format <sstables-3-data-file-format>`
* :doc:`SSTable 3.0 Statistics <sstables-3-statistics>`
* :doc:`SSTable 3.0 Summary <sstables-3-summary>`
* :doc:`SSTable 3.0 Index <sstables-3-index>`
* :doc:`SSTable 3.0 Format in Scylla <sstable-format>`
* :doc:`SSTable 3.0 Format in ScyllaDB <sstable-format>`

View File

@@ -28,8 +28,13 @@ Table of contents mc-1-big-TOC.txt
This document focuses on the data file format but also refers to other components in parts where information stored in them affects the way we read/write the data file.
Note that the file on-disk format applies both to the "mc" and "md" SSTable format versions.
The "md" format only fixed the semantics of the (min|max)_clustering_key fields in the SSTable Statistics file, which are now valid for describing the accurate range of clustering prefixes present in the SSTable.
Note that the file on-disk format applies to all "m*" SSTable format versions ("mc", "md", and "me").
* The "md" format only fixed the semantics of the ``(min|max)_clustering_key`` fields in the SSTable Statistics file,
which are now valid for describing the accurate range of clustering prefixes present in the SSTable.
* The "me" format added the ``host_id`` of the host writing the SSTable to the SSTable Statistics file.
It is used to qualify the commit log replay position that is also stored in the SSTable Statistics file.
See :doc:`SSTables 3.0 Statistics File Format </architecture/sstable/sstable3/sstables-3-statistics>` for more details.
Overview

View File

@@ -175,6 +175,13 @@ Whole entry
// It contains only one commit log position interval - [lower bound of commit log, upper bound of commit log].
array<be32<int32_t>, commit_log_interval> commit_log_intervals;
// Versions MC and MD of SSTable 3.x format end here.
// UUID of the host that wrote the SSTable.
// Qualifies all commitlog positions in the SSTable Statistics file.
UUID host_id;
}
using clustering_bound = array<be32<int32_t>, clustering_column>;
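The ``host_id`` field holds 16 raw bytes. As an illustrative sketch (not part of ScyllaDB's code, and assuming the bytes are stored in network/big-endian order as in RFC 4122), it can be decoded with Python's standard ``uuid`` module:

```python
import uuid

def read_host_id(buf: bytes, offset: int = 0) -> uuid.UUID:
    # The host_id is 16 raw bytes; uuid.UUID(bytes=...) interprets them
    # in big-endian (RFC 4122) order.
    if len(buf) < offset + 16:
        raise ValueError("truncated statistics buffer")
    return uuid.UUID(bytes=buf[offset:offset + 16])

raw = bytes(range(16))
print(read_host_id(raw))  # 00010203-0405-0607-0809-0a0b0c0d0e0f
```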

View File

@@ -21,8 +21,6 @@
Appendices
----------
.. include:: /rst_include/cql-version-index.rst
.. _appendix-A:
Appendix A: CQL Keywords

View File

@@ -1,6 +1,6 @@
# ScyllaDB CQL Extensions
Scylla extends the CQL language to provide a few extra features. This document
ScyllaDB extends the CQL language to provide a few extra features. This document
lists those extensions.
## BYPASS CACHE clause
@@ -109,7 +109,7 @@ Storage options can be inspected by checking the new system schema table: `syste
A special statement is dedicated for pruning ghost rows from materialized views.
A ghost row is an inconsistency issue which manifests itself as rows
in a materialized view which do not correspond to any base table rows.
Such inconsistencies should be prevented altogether and Scylla is striving to avoid
Such inconsistencies should be prevented altogether and ScyllaDB is striving to avoid
them, but *if* they happen, this statement can be used to restore a materialized view
to a fully consistent state without rebuilding it from scratch.
@@ -133,21 +133,35 @@ token ranges.
## Synchronous materialized views
Materialized view updates can be applied synchronously (with errors propagated
back to the user) or asynchronously, in the background. Historically, in order
to use synchronous updates, the materialized view had to be local,
which could be achieved e.g. by using the same partition key definition
as the one present in the base table.
Scylla also allows explicitly marking the view as synchronous, which forces
all its view updates to be updated synchronously. Such views tend to reduce
observed availability of the base table, because a base table write would only
succeed if all synchronous view updates also succeed. On the other hand,
failed view updates would be detected immediately, and appropriate action
can be taken (e.g. pruning the materialized view, as mentioned in the paragraph
above).
Usually, when a table with materialized views is updated, the update to the
views happens _asynchronously_, i.e., in the background. This means that
the user cannot know when the view updates have all finished - or even be
sure that they succeeded.
In order to mark a materialized view as synchronous, one can use the following
syntax:
However, there are circumstances where ScyllaDB does view updates
_synchronously_ - i.e., the user's write returns only after the views
were updated. This happens when the materialized-view replica is on the
same node as the base-table replica. For example, if the base table and
the view have the same partition key. Note that only ScyllaDB guarantees
synchronous view updates in this case - they are asynchronous in Cassandra.
ScyllaDB also allows explicitly marking a view as synchronous. When a view
is marked synchronous, base-table updates will wait for that view to be
updated before returning. A base table may have multiple views marked
synchronous, and will wait for all of them. The consistency level of a
write applies to synchronous views as well as to the base table: For
example, writing with QUORUM consistency level returns only after a
quorum of the base-table replicas were updated *and* also a quorum of
each synchronous view table was also updated.
Synchronous views tend to reduce the observed availability of the base table,
because a base-table write would only succeed if enough synchronous view
updates also succeed. On the other hand, failed view updates would be
detected immediately, and appropriate action can be taken, such as retrying
the write or pruning the materialized view (as mentioned in the previous
section). This can improve the consistency of the base table with its views.
To create a new materialized view with synchronous updates, use:
```cql
CREATE MATERIALIZED VIEW main.mv
@@ -157,12 +171,18 @@ CREATE MATERIALIZED VIEW main.mv
WITH synchronous_updates = true;
```
To make an existing materialized view synchronous, use:
```cql
ALTER MATERIALIZED VIEW main.mv WITH synchronous_updates = true;
```
Synchronous updates can also be dynamically turned off by setting
the value of `synchronous_updates` to `false`.
To return a materialized view to the default behavior (which, as explained
above, _usually_ means asynchronous updates), use:
```cql
ALTER MATERIALIZED VIEW main.mv WITH synchronous_updates = false;
```
### Synchronous global secondary indexes
@@ -261,7 +281,7 @@ that the rate of requests exceeds configured limit, the cluster will start
rejecting some of them in order to bring the throughput back to the configured
limit. Rejected requests are less costly which can help reduce overload.
_NOTE_: Due to Scylla's distributed nature, tracking per-partition request rates
_NOTE_: Due to ScyllaDB's distributed nature, tracking per-partition request rates
is not perfect, and the actual rate of accepted requests may be higher, by up to
a factor of the keyspace's `RF`. This feature should not be used to enforce precise
limits, but rather as an overload protection feature.
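As a sketch of what setting such limits looks like with this extension (the table name and limit values are illustrative):

```cql
ALTER TABLE ks.tbl WITH per_partition_rate_limit = {
    'max_reads_per_second': 100,
    'max_writes_per_second': 50
};
```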

View File

@@ -3,9 +3,6 @@
Data Definition
===============
.. include:: /rst_include/cql-version-index.rst
CQL stores data in *tables*, whose schema defines the layout of said data in the table, and those tables are grouped in
*keyspaces*. A keyspace defines a number of options that apply to all the tables it contains, the most prominent of
which is the replication strategy used by the keyspace. An application can have only one keyspace. However, it is also possible to
@@ -634,7 +631,7 @@ A table supports the following options:
- map
- see below
- :ref:`Compaction options <cql-compaction-options>`
* - ``compaction``
* - ``compression``
- map
- see below
- :ref:`Compression options <cql-compression-options>`
@@ -749,9 +746,7 @@ CDC options
.. versionadded:: 3.2 Scylla Open Source
The following options are to be used with Change Data Capture. Available as an experimental feature from Scylla Open Source 3.2.
To use this feature, you must enable the :ref:`experimental tag <yaml_enabling_experimental_features>` in the scylla.yaml.
The following options can be used with Change Data Capture.
+---------------------------+-----------------+------------------------------------------------------------------------------------------------------------------------+
| option | default | description |
@@ -863,6 +858,18 @@ Other considerations:
- Adding new columns (see ``ALTER TABLE`` below) is a constant time operation. There is thus no need to try to
anticipate future usage when creating a table.
.. _ddl-per-parition-rate-limit:
Limiting the rate of requests per partition
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You can limit the read rates and writes rates into a partition by applying
a ScyllaDB CQL extension to the CREATE TABLE or ALTER TABLE statements.
See `Per-partition rate limit <https://docs.scylladb.com/stable/cql/cql-extensions.html#per-partition-rate-limit>`_
for details.
.. REMOVE IN FUTURE VERSIONS - Remove the URL above (temporary solution) and replace it with a relative link (once the solution is applied).
.. _alter-table-statement:
ALTER TABLE
@@ -921,6 +928,7 @@ The ``ALTER TABLE`` statement can:
The same note applies to the set of ``compression`` sub-options.
- Change or add any of the ``Encryption options`` above.
- Change or add any of the :ref:`CDC options <cdc-options>` above.
- Change or add per-partition rate limits. See :ref:`Limiting the rate of requests per partition <ddl-per-parition-rate-limit>`.
.. warning:: Dropping a column assumes that the timestamps used for the value of this column are "real" timestamp in
microseconds. Using "real" timestamps in microseconds is the default and is **strongly** recommended, but as
@@ -930,7 +938,6 @@ The ``ALTER TABLE`` statement can:
.. warning:: Once a column is dropped, it is allowed to re-add a column with the same name as the dropped one
**unless** the type of the dropped column was a (non-frozen) collection (due to an internal technical limitation).
.. _drop-table-statement:
DROP TABLE

View File

@@ -22,8 +22,6 @@
Definitions
-----------
.. include:: /rst_include/cql-version-index.rst
.. _conventions:
Conventions

View File

@@ -3,8 +3,6 @@
Data Manipulation
-----------------
.. include:: /rst_include/cql-version-index.rst
This section describes the statements supported by CQL to insert, update, delete, and query data.
:ref:`SELECT <select-statement>`
@@ -99,11 +97,12 @@ alternatively, of the wildcard character (``*``) to select all the columns defin
Selectors
`````````
A :token:`selector` can be one of:
A :token:`selector` can be one of the following:
- A column name of the table selected to retrieve the values for that column.
- A casting, which allows you to convert a nested selector to a (compatible) type.
- A function call, where the arguments are selectors themselves.
- A call to the :ref:`COUNT function <count-function>`, which counts all non-null results.
Aliases
```````

View File

@@ -21,7 +21,6 @@
.. _cql-functions:
.. Need some intro for UDF and native functions in general and point those to it.
.. _udfs:
.. _native-functions:
Functions
@@ -33,13 +32,15 @@ CQL supports two main categories of functions:
- The :ref:`aggregate functions <aggregate-functions>`, which are used to aggregate multiple rows of results from a
``SELECT`` statement.
.. In both cases, CQL provides a number of native "hard-coded" functions as well as the ability to create new user-defined
.. functions.
In both cases, CQL provides a number of native "hard-coded" functions as well as the ability to create new user-defined
functions.
.. .. note:: By default, the use of user-defined functions is disabled by default for security concerns (even when
.. enabled, the execution of user-defined functions is sandboxed and a "rogue" function should not be allowed to do
.. evil, but no sandbox is perfect so using user-defined functions is opt-in). See the ``enable_user_defined_functions``
.. in ``scylla.yaml`` to enable them.
.. note:: Although user-defined functions are sandboxed, protecting the system from a "rogue" function, user-defined functions are disabled by default for extra security.
See the ``enable_user_defined_functions`` in ``scylla.yaml`` to enable them.
Additionally, user-defined functions are still experimental and need to be explicitly enabled by adding ``udf`` to the list of
``experimental_features`` configuration options in ``scylla.yaml``, or turning on the ``experimental`` flag.
See :ref:`Enabling Experimental Features <yaml_enabling_experimental_features>` for details.
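For example, a minimal ``scylla.yaml`` fragment enabling the feature (a sketch; keep any other experimental features you already list):

.. code-block:: yaml

   experimental_features:
     - udf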
.. A function is identifier by its name:
@@ -60,11 +61,11 @@ Native functions
Cast
````
Supported starting from Scylla version 2.1
Supported starting from ScyllaDB version 2.1
The ``cast`` function can be used to convert one native datatype to another.
The following table describes the conversions supported by the ``cast`` function. Scylla will silently ignore any cast converting a cast datatype into its own datatype.
The following table describes the conversions supported by the ``cast`` function. ScyllaDB will silently ignore any cast converting a datatype into its own datatype.
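For example, formatting a numeric column as text (table and column names are illustrative)::

    SELECT CAST(temperature AS text) FROM readings;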
=============== =======================================================================================================
From To
@@ -228,6 +229,65 @@ A number of functions are provided to “convert” the native types into binary
takes a 64-bit ``blob`` argument and converts it to a ``bigint`` value. For example, ``bigintAsBlob(3)`` is
``0x0000000000000003`` and ``blobAsBigint(0x0000000000000003)`` is ``3``.
.. _udfs:
User-defined functions :label-caution:`Experimental`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
User-defined functions (UDFs) execute user-provided code in ScyllaDB. Supported languages are currently Lua and WebAssembly.
UDFs are part of the ScyllaDB schema and are automatically propagated to all nodes in the cluster.
UDFs can be overloaded, so that multiple UDFs with different argument types can have the same function name, for example::
CREATE FUNCTION sample ( arg int ) ...;
CREATE FUNCTION sample ( arg text ) ...;
When calling a user-defined function, arguments can be literals or terms. Prepared statement placeholders can be used, too.
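For example, with the two overloaded ``sample`` functions above, the overload is selected by the argument's type (table and column names are illustrative)::

    SELECT sample(an_int_column) FROM ks.t;   -- calls sample(arg int)
    SELECT sample(a_text_column) FROM ks.t;   -- calls sample(arg text)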
CREATE FUNCTION statement
`````````````````````````
Creating a new user-defined function uses the ``CREATE FUNCTION`` statement. For example::
CREATE OR REPLACE FUNCTION div(dividend double, divisor double)
RETURNS NULL ON NULL INPUT
RETURNS double
LANGUAGE LUA
AS 'return dividend/divisor;';
``CREATE FUNCTION`` with the optional ``OR REPLACE`` keywords creates either a function
or replaces an existing one with the same signature. A ``CREATE FUNCTION`` without ``OR REPLACE``
fails if a function with the same signature already exists. If the optional ``IF NOT EXISTS``
keywords are used, the function will be created only if another function with the same
signature does not exist. ``OR REPLACE`` and ``IF NOT EXISTS`` cannot be used together.
Behavior for null input values must be defined for each function:
* ``RETURNS NULL ON NULL INPUT`` declares that the function will always return null (without being executed) if any of the input arguments is null.
* ``CALLED ON NULL INPUT`` declares that the function will always be executed.
Function Signature
``````````````````
Signatures are used to distinguish individual functions. The signature consists of the fully-qualified function name, of the form ``<keyspace>.<function_name>``, and a concatenated list of all the argument types.
Note that keyspace names, function names and argument types are subject to the default naming conventions and case-sensitivity rules.
Functions belong to a keyspace; if no keyspace is specified, the current keyspace is used. User-defined functions are not allowed in the system keyspaces.
DROP FUNCTION statement
```````````````````````
Dropping a function uses the ``DROP FUNCTION`` statement. For example::
DROP FUNCTION myfunction;
DROP FUNCTION mykeyspace.afunction;
DROP FUNCTION afunction ( int );
DROP FUNCTION afunction ( text );
You must specify the argument types of the function, the arguments_signature, in the drop command if there are multiple overloaded functions with the same name but different signatures.
``DROP FUNCTION`` with the optional ``IF EXISTS`` keywords drops a function if it exists, but does not throw an error if it doesn't.
.. _aggregate-functions:
Aggregate functions
@@ -261,6 +321,10 @@ It also can be used to count the non-null value of a given column::
SELECT COUNT (scores) FROM plays;
.. note::
Counting all rows in a table may be time-consuming and exceed the default timeout. In such a case,
see :doc:`Counting all rows in a table is slow </kb/count-all-rows>` for instructions.
Max and Min
```````````
@@ -286,6 +350,59 @@ instance::
.. _user-defined-aggregates-functions:
User-defined aggregates (UDAs) :label-caution:`Experimental`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
User-defined aggregates allow the creation of custom aggregate functions. User-defined aggregates can be used in a ``SELECT`` statement.
Each aggregate requires an initial state of type ``STYPE`` defined with the ``INITCOND`` value (default value: ``null``). The first argument of the state function must have type STYPE. The remaining arguments of the state function must match the types of the user-defined aggregate arguments. The state function is called once for each row, and the value returned by the state function becomes the new state. After all rows are processed, the optional FINALFUNC is executed with the last state value as its argument.
The ``STYPE`` value is mandatory in order to distinguish possibly overloaded versions of the state and/or final function, since an overload can be added after the aggregate is created.
A complete working example for user-defined aggregates (assuming that a keyspace has been selected using the ``USE`` statement)::
CREATE FUNCTION accumulate_len(acc tuple<bigint,bigint>, a text)
RETURNS NULL ON NULL INPUT
RETURNS tuple<bigint,bigint>
LANGUAGE lua as 'return {acc[1] + 1, acc[2] + #a}';
CREATE OR REPLACE FUNCTION present(res tuple<bigint,bigint>)
RETURNS NULL ON NULL INPUT
RETURNS text
LANGUAGE lua as
'return "The average string length is " .. res[2]/res[1] .. "!"';
CREATE OR REPLACE AGGREGATE avg_length(text)
SFUNC accumulate_len
STYPE tuple<bigint,bigint>
FINALFUNC present
INITCOND (0,0);
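Once defined, the aggregate is called like a built-in aggregate function. Assuming, for illustration, a table ``plays`` with a ``text`` column ``player``::

    SELECT avg_length(player) FROM plays;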
CREATE AGGREGATE statement
``````````````````````````
The ``CREATE AGGREGATE`` command with the optional ``OR REPLACE`` keywords creates either an aggregate or replaces an existing one with the same signature. A ``CREATE AGGREGATE`` without ``OR REPLACE`` fails if an aggregate with the same signature already exists. The ``CREATE AGGREGATE`` command with the optional ``IF NOT EXISTS`` keywords creates an aggregate if it does not already exist. The ``OR REPLACE`` and ``IF NOT EXISTS`` phrases cannot be used together.
The ``STYPE`` value defines the type of the state value and must be specified. The optional ``INITCOND`` defines the initial state value for the aggregate; the default value is null. A non-null ``INITCOND`` must be specified for state functions that are declared with ``RETURNS NULL ON NULL INPUT``.
The ``SFUNC`` value references an existing function to use as the state-modifying function. The first argument of the state function must have type ``STYPE``. The remaining arguments of the state function must match the types of the user-defined aggregate arguments. The state function is called once for each row, and the value returned by the state function becomes the new state. State is not updated for state functions declared with ``RETURNS NULL ON NULL INPUT`` and called with null. After all rows are processed, the optional ``FINALFUNC`` is executed with the last state value as its argument. It must take only one argument of type ``STYPE``, but the return type of the ``FINALFUNC`` may be a different type. A final function declared with ``RETURNS NULL ON NULL INPUT`` means that the aggregate's return value will be null if the last state is null.
If no ``FINALFUNC`` is defined, the overall return type of the aggregate function is ``STYPE``. If a ``FINALFUNC`` is defined, it is the return type of that function.
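As a sketch of the no-``FINALFUNC`` case, the aggregate below returns its final state directly. The names are hypothetical, and ``sum_int`` is assumed to be an existing state function taking two ``int`` arguments::

    CREATE AGGREGATE sum_all(int)
        SFUNC sum_int
        STYPE int
        INITCOND 0;

Because no ``FINALFUNC`` is defined, the return type of ``sum_all`` is its ``STYPE``, that is, ``int``.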
DROP AGGREGATE statement
````````````````````````
Dropping a user-defined aggregate uses the ``DROP AGGREGATE`` statement. For example::
DROP AGGREGATE myAggregate;
DROP AGGREGATE myKeyspace.anAggregate;
DROP AGGREGATE someAggregate ( int );
DROP AGGREGATE someAggregate ( text );
The ``DROP AGGREGATE`` statement removes an aggregate created using ``CREATE AGGREGATE``. You must specify the argument types of the aggregate to drop if there are multiple overloaded aggregates with the same name but a different signature.
The ``DROP AGGREGATE`` command with the optional ``IF EXISTS`` keywords drops an aggregate if it exists, and does nothing if a function with the signature does not exist.
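For example, this hypothetical statement drops the ``text`` overload if it exists and does nothing otherwise::

    DROP AGGREGATE IF EXISTS someAggregate ( text );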
.. include:: /rst_include/apache-cql-return-index.rst
.. include:: /rst_include/apache-copyrights.rst


Materialized Views
------------------
Production ready in Scylla Open Source 3.0 and Scylla Enterprise 2019.1.x
.. include:: /rst_include/cql-version-index.rst
Materialized views names are defined by:
.. code-block:: cql


Data Types
==========

.. _UUID: https://en.wikipedia.org/wiki/Universally_unique_identifier
.. include:: /rst_include/cql-version-index.rst
CQL is a typed language and supports a rich set of data types, including :ref:`native types <native-types>` and
:ref:`collection types <collections>`.


install-scylla/index
configure
requirements
Migrate to ScyllaDB </using-scylla/migrate-scylla>
Integration Solutions </using-scylla/integrations/index>
tutorials
.. panel-box::
:title: ScyllaDB Requirements
:id: "getting-started"
:class: my-panel
* :doc:`ScyllaDB System Requirements Guide</getting-started/system-requirements/>`
* :doc:`OS Support by Platform and Version</getting-started/os-support/>`
.. panel-box::
:title: Install and Configure ScyllaDB
:id: "getting-started"
:class: my-panel
* `Install ScyllaDB (Binary Packages, Docker, or EC2) <https://www.scylladb.com/download/#core>`_ - Links to the ScyllaDB Download Center
* :doc:`Configure ScyllaDB </getting-started/system-configuration/>`
* :doc:`Run ScyllaDB in a Shared Environment </getting-started/scylla-in-a-shared-environment>`
* :doc:`Create a ScyllaDB Cluster - Single Data Center (DC) </operating-scylla/procedures/cluster-management/create-cluster/>`
* :doc:`Create a ScyllaDB Cluster - Multi Data Center (DC) </operating-scylla/procedures/cluster-management/create-cluster-multidc/>`
.. panel-box::
:title: Develop Applications for ScyllaDB
:id: "getting-started"
:class: my-panel
* :doc:`ScyllaDB Drivers</using-scylla/drivers/index>`
* `Get Started Lesson on ScyllaDB University <https://university.scylladb.com/courses/scylla-essentials-overview/lessons/quick-wins-install-and-run-scylla/>`_
* :doc:`CQL Reference </cql/index>`
* :doc:`cqlsh - the CQL shell </cql/cqlsh/>`
.. panel-box::
:title: Use ScyllaDB with Third-party Solutions
:id: "getting-started"
:class: my-panel
* :doc:`Migrate to ScyllaDB </using-scylla/migrate-scylla>` - How to migrate your current database to Scylla
* :doc:`Integrate with ScyllaDB </using-scylla/integrations/index>` - Integration solutions with Scylla
.. panel-box::


Keep your versions up-to-date. The two latest versions are supported. Always install the latest patches for your version.
* Download and install ScyllaDB Server, Drivers and Tools in `ScyllaDB Download Center <https://www.scylladb.com/download/#core>`_
* :doc:`ScyllaDB Web Installer for Linux <scylla-web-installer>`
* :doc:`Scylla Unified Installer (relocatable executable) <unified-installer>`
* :doc:`Air-gapped Server Installation <air-gapped-install>`


ScyllaDB Web Installer is a platform-agnostic installation script you can run with ``curl`` to install ScyllaDB on Linux.
See `ScyllaDB Download Center <https://www.scylladb.com/download/#core>`_ for information on manually installing ScyllaDB with platform-specific installation packages.
Prerequisites
--------------


OS Support by Platform and Version
==================================
The following matrix shows which Operating Systems, Platforms, and Containers / Instance Engines are supported with which versions of ScyllaDB.
ScyllaDB requires a fix to the XFS append introduced in kernel 3.15 (back-ported to 3.10 in RHEL/CentOS).
ScyllaDB will not run with earlier kernel versions. Details in `ScyllaDB issue 885 <https://github.com/scylladb/scylla/issues/885>`_.
.. REMOVE IN FUTURE VERSIONS - Remove information about versions from the notes below in version 5.2.
.. note::
**Supported Architecture**
ScyllaDB Open Source supports x86_64 for all versions and AArch64 starting from ScyllaDB 4.6 and nightly build. In particular, aarch64 support includes AWS EC2 Graviton.
ScyllaDB Open Source
----------------------
.. note::
Recommended OS and ScyllaDB AMI/Image OS for ScyllaDB Open Source:
- Ubuntu 20.04 for versions 4.6 and later.
- CentOS 7 for versions earlier than 4.6.
+----------------------------+----------------------------------+-----------------------------+---------+-------+
| Platform | Ubuntu | Debian | CentOS /| Rocky/|
| | | | RHEL | RHEL |
+----------------------------+------+------+------+------+------+------+------+-------+-------+---------+-------+
| ScyllaDB Version / Version | 14.04| 16.04| 18.04|20.04 |22.04 | 8 | 9 | 10 | 11 | 7 | 8 |
+============================+======+======+======+======+======+======+======+=======+=======+=========+=======+
| 5.1 | |x| | |x| | |v| | |v| | |v| | |x| | |x| | |v| | |v| | |v| | |v| |
+----------------------------+------+------+------+------+------+------+------+-------+-------+---------+-------+
| 5.0 | |x| | |x| | |v| | |v| | |v| | |x| | |x| | |v| | |v| | |v| | |v| |
+----------------------------+------+------+------+------+------+------+------+-------+-------+---------+-------+
| 4.6 | |x| | |v| | |v| | |v| | |x| | |x| | |v| | |v| | |x| | |v| | |v| |
+----------------------------+------+------+------+------+------+------+------+-------+-------+---------+-------+
| 4.5 | |x| | |v| | |v| | |v| | |x| | |x| | |v| | |v| | |x| | |v| | |v| |
+----------------------------+------+------+------+------+------+------+------+-------+-------+---------+-------+
| 4.4 | |x| | |v| | |v| | |v| | |x| | |x| | |v| | |v| | |x| | |v| | |v| |
+----------------------------+------+------+------+------+------+------+------+-------+-------+---------+-------+
| 4.3 | |x| | |v| | |v| | |v| | |x| | |x| | |v| | |v| | |x| | |v| | |v| |
+----------------------------+------+------+------+------+------+------+------+-------+-------+---------+-------+
| 4.2 | |x| | |v| | |v| | |x| | |x| | |x| | |v| | |v| | |x| | |v| | |v| |
+----------------------------+------+------+------+------+------+------+------+-------+-------+---------+-------+
| 4.1 | |x| | |v| | |v| | |x| | |x| | |x| | |v| | |v| | |x| | |v| | |v| |
+----------------------------+------+------+------+------+------+------+------+-------+-------+---------+-------+
| 4.0 | |x| | |v| | |v| | |x| | |x| | |x| | |v| | |x| | |x| | |v| | |x| |
+----------------------------+------+------+------+------+------+------+------+-------+-------+---------+-------+
| 3.x | |x| | |v| | |v| | |x| | |x| | |x| | |v| | |x| | |x| | |v| | |x| |
+----------------------------+------+------+------+------+------+------+------+-------+-------+---------+-------+
| 2.3 | |v| | |v| | |v| | |x| | |x| | |v| | |v| | |x| | |x| | |v| | |x| |
+----------------------------+------+------+------+------+------+------+------+-------+-------+---------+-------+
| 2.2 | |v| | |v| | |x| | |x| | |x| | |v| | |x| | |x| | |x| | |v| | |x| |
+----------------------------+------+------+------+------+------+------+------+-------+-------+---------+-------+
All releases are available as a Docker container, EC2 AMI, and a GCP image (GCP image from version 4.3).
ScyllaDB Enterprise
--------------------
.. note::
Recommended OS and ScyllaDB AMI/Image OS for ScyllaDB Enterprise:
- Ubuntu 20.04 for versions 2021.1 and later.
- CentOS 7 for versions earlier than 2021.1.
+----------------------------+-----------------------------------+---------------------------+--------+-------+
| Platform | Ubuntu | Debian | CentOS/| Rocky/|
| | | | RHEL | RHEL |
+----------------------------+------+------+------+------+-------+------+------+------+------+--------+-------+
| ScyllaDB Version / Version | 14.04| 16.04| 18.04| 20.04| 22.04 | 8 | 9 | 10 | 11 | 7 | 8 |
+============================+======+======+======+======+=======+======+======+======+======+========+=======+
| 2022.2 | |x| | |x| | |v| | |v| | |v| | |x| | |x| | |v| | |v| | |v| | |v| |
+----------------------------+------+------+------+------+-------+------+------+------+------+--------+-------+
| 2022.1 | |x| | |x| | |v| | |v| | |v| | |x| | |x| | |v| | |v| | |v| | |v| |
+----------------------------+------+------+------+------+-------+------+------+------+------+--------+-------+
| 2021.1 | |x| | |v| | |v| | |v| | |x| | |x| | |v| | |v| | |x| | |v| | |v| |
+----------------------------+------+------+------+------+-------+------+------+------+------+--------+-------+
| 2020.1 | |x| | |v| | |v| | |x| | |x| | |x| | |v| | |v| | |x| | |v| | |v| |
+----------------------------+------+------+------+------+-------+------+------+------+------+--------+-------+
| 2019.1 | |x| | |v| | |v| | |x| | |x| | |x| | |v| | |x| | |x| | |v| | |x| |
+----------------------------+------+------+------+------+-------+------+------+------+------+--------+-------+
| 2018.1 | |v| | |v| | |x| | |x| | |v| | |x| | |x| | |x| | |x| | |v| | |x| |
+----------------------------+------+------+------+------+-------+------+------+------+------+--------+-------+
All releases are available as a Docker container, EC2 AMI, and a GCP image (GCP image from version 2021.1).


=====================
ScyllaDB Requirements
=====================
.. toctree::
:maxdepth: 2
</div>
<div class="medium-9 columns">
* :doc:`ScyllaDB System Requirements Guide</getting-started/system-requirements/>`
* :doc:`OS Support by Platform and Version</getting-started/os-support/>`
* :doc:`Running ScyllaDB in a Shared Environment </getting-started/scylla-in-a-shared-environment>`
.. raw:: html


:image: /_static/img/mascots/scylla-docs.svg
:search_box:
New to ScyllaDB? Start `here <https://cloud.docs.scylladb.com/stable/scylladb-basics/>`_!
.. raw:: html
<div class="grid-x grid-margin-x hs">
.. topic-box::
:title: ScyllaDB Cloud
:link: https://cloud.docs.scylladb.com
:class: large-4
:anchor: ScyllaDB Cloud Documentation
Simplify application development with ScyllaDB Cloud - a fully managed database-as-a-service.
.. topic-box::
:title: Manage your own DB
:title: ScyllaDB Enterprise
:link: https://enterprise.docs.scylladb.com
:class: large-4
:anchor: ScyllaDB Enterprise Documentation
Deploy and manage ScyllaDB's most stable enterprise-grade database with premium features and 24/7 support.
.. topic-box::
:title: ScyllaDB Open Source
:link: getting-started
:class: large-4
:anchor: ScyllaDB Open Source Documentation
Deploy and manage your database in your environment.
.. topic-box::
:title: Connect your application to Scylla
:link: using-scylla/drivers
:class: large-4
:anchor: Choose a Driver
Use high performance Scylla drivers to connect your application to a Scylla cluster.
.. raw:: html
<div class="topics-grid topics-grid--products">
<h2 class="topics-grid__title">Other Products</h2>
<div class="grid-container full">
<div class="grid-x grid-margin-x">
.. topic-box::
:title: ScyllaDB Alternator
:link: https://docs.scylladb.com/stable/alternator/alternator.html
:image: /_static/img/mascots/scylla-alternator.svg
:class: topic-box--product,large-4,small-6
Open source Amazon DynamoDB-compatible API.
.. topic-box::
:title: ScyllaDB Monitoring Stack
:link: https://monitoring.docs.scylladb.com
:image: /_static/img/mascots/scylla-monitor.svg
:class: topic-box--product,large-4,small-6
Complete open source monitoring solution for your ScyllaDB clusters.
.. topic-box::
:title: ScyllaDB Manager
:link: https://manager.docs.scylladb.com
:image: /_static/img/mascots/scylla-manager.svg
:class: topic-box--product,large-4,small-6
Hassle-free ScyllaDB NoSQL database management for scale-out clusters.
.. topic-box::
:title: ScyllaDB Drivers
:link: https://docs.scylladb.com/stable/using-scylla/drivers/
:image: /_static/img/mascots/scylla-drivers.svg
:class: topic-box--product,large-4,small-6
Shard-aware drivers for superior performance.
.. topic-box::
:title: ScyllaDB Operator
:link: https://operator.docs.scylladb.com
:image: /_static/img/mascots/scylla-enterprise.svg
:class: topic-box--product,large-4,small-6
Easily run and manage your ScyllaDB cluster on Kubernetes.
.. raw:: html
<div class="topics-grid">
<h2 class="topics-grid__title">Learn More About ScyllaDB</h2>
<p class="topics-grid__text"></p>
<div class="grid-container full">
<div class="grid-x grid-margin-x">
.. topic-box::
:title: Attend ScyllaDB University
:link: https://university.scylladb.com/
:image: /_static/img/mascots/scylla-university.png
:class: large-6,small-12
:anchor: Find a Class
| Register to take a *free* class at ScyllaDB University.
| There are several learning paths to choose from.
.. topic-box::
architecture/index
troubleshooting/index
kb/index
ScyllaDB University <https://university.scylladb.com/>
faq
Contribute to ScyllaDB <contribute>
glossary
alternator/alternator


Counting all rows in a table is slow
====================================
**Audience: ScyllaDB users**
Trying to count all rows in a table using
SELECT COUNT(1) FROM ks.table;
may fail with the **ReadTimeout** error.
COUNT() runs a full-scan query on all nodes, which might take a long time to finish. As a result, the count time may be greater than the ScyllaDB query timeout.
One way to prevent that issue in Scylla 4.4 or later is to increase the timeout for the query using the :ref:`USING TIMEOUT <using-timeout>` directive, for example:
.. code-block:: cql
SELECT COUNT(1) FROM ks.table USING TIMEOUT 120s;
You can also get an *estimation* of the number **of partitions** (not rows) with :doc:`nodetool tablestats </operating-scylla/nodetool-commands/tablestats>`.
.. note::
ScyllaDB 5.1 includes improvements to speed up the execution of SELECT COUNT(*) queries.
To increase the count speed, we recommend upgrading to ScyllaDB 5.1 or later.
.. REMOVE IN FUTURE VERSIONS - Remove the note above in version 5.1.


* :doc:`Map CPUs to Scylla Shards </kb/map-cpu>` - Mapping between CPUs and Scylla shards
* :doc:`Recreate RAID devices </kb/raid-device>` - How to recreate your RAID devices without running scylla-setup
* :doc:`Configure Scylla Networking with Multiple NIC/IP Combinations </kb/yaml-address>` - examples for setting the different IP addresses in scylla.yaml
* :doc:`Updating the Mode in perftune.yaml After a ScyllaDB Upgrade </kb/perftune-modes-sync>`
* :doc:`Kafka Sink Connector Quickstart </using-scylla/integrations/kafka-connector>`
* :doc:`Kafka Sink Connector Configuration </using-scylla/integrations/sink-config>`


==============================================================
Updating the Mode in perftune.yaml After a ScyllaDB Upgrade
==============================================================
In versions 5.1 (ScyllaDB Open Source) and 2022.2 (ScyllaDB Enterprise), we improved ScyllaDB's performance by `removing the rx_queues_count from the mode
condition <https://github.com/scylladb/seastar/pull/949>`_. As a result, ScyllaDB operates in
the ``sq_split`` mode instead of the ``mq`` mode (see :doc:`Seastar Perftune </operating-scylla/admin-tools/perftune>` for information about the modes).
If you upgrade from an earlier version of ScyllaDB, your cluster's existing nodes may use the ``mq`` mode,
while new nodes will use the ``sq_split`` mode. As using different modes across one cluster is not recommended,
you should change the configuration to ensure that the ``sq_split`` mode is used on all nodes.
This section describes how to update the ``perftune.yaml`` file to configure the ``sq_split`` mode on all nodes.
Procedure
------------
The examples below assume that you are using the default locations for storing data and the ``scylla.yaml`` file,
and that your NIC is ``eth5``.
#. Back up your old configuration.
.. code-block:: console
sudo mv /etc/scylla.d/cpuset.conf /etc/scylla.d/cpuset.conf.old
sudo mv /etc/scylla.d/perftune.yaml /etc/scylla.d/perftune.yaml.old
#. Create a new configuration.
.. code-block:: console
sudo scylla_sysconfig_setup --nic eth5 --homedir /var/lib/scylla --confdir /etc/scylla
A new ``/etc/scylla.d/cpuset.conf`` will be generated as output.
#. Compare the contents of the newly generated ``/etc/scylla.d/cpuset.conf`` with ``/etc/scylla.d/cpuset.conf.old`` you created in step 1.
- If they are exactly the same, rename ``/etc/scylla.d/perftune.yaml.old`` you created in step 1 back to ``/etc/scylla.d/perftune.yaml`` and continue to the next node.
- If they are different, move on to the next steps.
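The comparison in step 3 can also be scripted with ``diff``, which exits with status 0 only when the files are identical. This is a sketch assuming the default file locations used above:

.. code-block:: console

    if diff -u /etc/scylla.d/cpuset.conf.old /etc/scylla.d/cpuset.conf > /dev/null; then
        # Identical: restore the old perftune.yaml and continue to the next node.
        sudo mv /etc/scylla.d/perftune.yaml.old /etc/scylla.d/perftune.yaml
    else
        echo "cpuset.conf changed; continue with the remaining steps"
    fi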
#. Restart the ``scylla-server`` service.
.. code-block:: console
nodetool drain
sudo systemctl restart scylla-server
#. Wait for the service to become up and running (similarly to how it is done during a :doc:`rolling restart </operating-scylla/procedures/config-change/rolling-restart>`). It may take a considerable amount of time before the node is in the UN state due to resharding.
#. Continue to the next node.


.. code-block:: sh
nodetool compact <keyspace> <mytable>;
5. Alter the table and change the grace period back to the original ``gc_grace_seconds`` value.


* :doc:`REST - Scylla REST/HTTP Admin API</operating-scylla/rest>`.
* :doc:`Tracing </using-scylla/tracing>` - a ScyllaDB tool for debugging and analyzing internal flows in the server.
* :doc:`SSTableloader </operating-scylla/admin-tools/sstableloader>` - Bulk load the sstables found in the directory to a Scylla cluster
* :doc:`Scylla SStable </operating-scylla/admin-tools/scylla-sstable>` - Validates and dumps the content of SStables, generates a histogram, dumps the content of the SStable index.
* :doc:`Scylla Types </operating-scylla/admin-tools/scylla-types/>` - Examines raw values obtained from SStables, logs, coredumps, etc.
* :doc:`cassandra-stress </operating-scylla/admin-tools/cassandra-stress/>` - A tool for benchmarking and load testing Scylla and Cassandra clusters.
* :doc:`SSTabledump - Scylla 3.0, Scylla Enterprise 2019.1 and newer versions </operating-scylla/admin-tools/sstabledump>`
* :doc:`SSTable2JSON - Scylla 2.3 and older </operating-scylla/admin-tools/sstable2json>`


CQLSh </cql/cqlsh>
REST </operating-scylla/rest>
Tracing </using-scylla/tracing>
Scylla SStable </operating-scylla/admin-tools/scylla-sstable/>
Scylla Types </operating-scylla/admin-tools/scylla-types/>
sstableloader
cassandra-stress </operating-scylla/admin-tools/cassandra-stress/>
sstabledump

View File

@@ -1,4 +1,4 @@
scylla-sstable
Scylla SStable
==============
.. versionadded:: 5.0
@@ -9,7 +9,17 @@ Introduction
This tool allows you to examine the content of SStables by performing operations such as dumping the content of SStables,
generating a histogram, validating the content of SStables, and more. See `Supported Operations`_ for the list of available operations.
Run ``scylla-sstable --help`` for additional information about the tool and the operations.
Run ``scylla sstable --help`` for additional information about the tool and the operations.
This tool is similar to SStableDump_, with notable differences:
* Built on the ScyllaDB C++ codebase, it supports all SStable formats and components that ScyllaDB supports.
* Expanded scope: this tool supports much more than dumping SStable data components (see `Supported Operations`_).
* More flexible on how schema is obtained and where SStables are located: SStableDump_ only supports dumping SStables located in their native data directory. To dump an SStable, one has to clone the entire ScyllaDB data directory tree, including system table directories and even config files. ``scylla sstable`` can dump sstables from any path with multiple choices on how to obtain the schema, see Schema_.
Currently, SStableDump_ works better on production systems, as it automatically loads the schema from the system tables, unlike ``scylla sstable``, which must be provided with the schema explicitly. On the other hand, ``scylla sstable`` works better for offline investigations, as it can be used with as little as a schema definition file and a single SStable. In the future, we plan to close this gap by adding support for automatic schema loading to ``scylla sstable`` too, and to completely supplant SStableDump_ with ``scylla sstable``.
.. _SStableDump: /operating-scylla/admin-tools/sstabledump
Usage
------
@@ -21,11 +31,82 @@ The command syntax is as follows:
.. code-block:: console
scylla-sstable <operation> <path to SStable>
scylla sstable <operation> <path to SStable>
You can specify more than one SStable.
Schema
^^^^^^
All operations need a schema to interpret the SStables with.
Currently, there are two ways to obtain the schema:
* ``--schema-file FILENAME`` - Read the schema definition from a file.
* ``--system-schema KEYSPACE.TABLE`` - Use the known definition of built-in tables (only works for system tables).
By default, the tool uses the first method: ``--schema-file schema.cql``; i.e. it assumes there is a schema file named ``schema.cql`` in the working directory.
If this fails, it will exit with an error.
The schema file should contain all definitions needed to interpret data belonging to the table.
Example ``schema.cql``:
.. code-block:: cql
CREATE KEYSPACE ks WITH replication = {'class': 'NetworkTopologyStrategy', 'mydc1': 1, 'mydc2': 4};
CREATE TYPE ks.mytype (
f1 int,
f2 text
);
CREATE TABLE ks.cf (
pk int,
ck text,
v1 int,
v2 mytype,
PRIMARY KEY (pk, ck)
);
Note:
* In addition to the table itself, the definition also has to include any user-defined types the table uses.
* The keyspace definition is optional; if it is missing, one will be auto-generated.
* The schema file doesn't have to be called ``schema.cql``; that is just the default name. Any file name (with any extension) is supported.
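As a concrete sketch, the following creates a minimal schema file and points the tool at it explicitly. The file path, table definition, and SStable path are illustrative assumptions, and the final invocation requires a ScyllaDB installation:

```shell
# Write a minimal schema definition for the table under inspection.
# The file name is arbitrary; --schema-file tells the tool where to find it.
cat > /tmp/my_schema.cql <<'EOF'
CREATE TABLE ks.cf (
    pk int,
    ck text,
    v1 int,
    PRIMARY KEY (pk, ck)
);
EOF

# Hypothetical invocation; requires a ScyllaDB installation and a real SStable:
# scylla sstable dump-data --schema-file /tmp/my_schema.cql /path/to/md-123456-big-Data.db
```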
Dropped columns
***************
The examined SStable might have columns that were dropped from the schema definition. In this case, providing the up-to-date schema is not enough; the tool will fail when attempting to process a cell of a dropped column.
Dropped columns can be provided to the tool in the schema definition file, in the form of INSERT statements into the ``system_schema.dropped_columns`` system table. Example:
.. code-block:: cql
INSERT INTO system_schema.dropped_columns (
keyspace_name,
table_name,
column_name,
dropped_time,
type
) VALUES (
'ks',
'cf',
'v1',
1631011979170675,
'int'
);
CREATE TABLE ks.cf (pk int PRIMARY KEY, v2 int);
System tables
*************
If the examined table is a system table, i.e. it belongs to one of the system keyspaces (``system``, ``system_schema``, ``system_distributed``, or ``system_distributed_everywhere``), you can tell the tool to use the known built-in definition of that table with the ``--system-schema`` flag. Example:
.. code-block:: console
scylla sstable dump-data --system-schema system.local ./path/to/md-123456-big-Data.db
Supported Operations
^^^^^^^^^^^^^^^^^^^^^^^
The ``dump-*`` operations output JSON. For ``dump-data``, you can specify another output format.
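Since the ``dump-*`` operations emit JSON, their output can be piped into standard text tools. The sketch below counts partition keys in a mock dump; the exact JSON shape here is an assumption, so check the output of your tool version before scripting against it:

```shell
# Mock dump-data output; on a real system you would pipe:
#   scylla sstable dump-data /path/to/md-123456-big-Data.db | ...
DUMP='{"sstables":{"md-123456-big-Data.db":[{"key":{"value":"1"}},{"key":{"value":"2"}}]}}'

# Count the partition keys in the dump
printf '%s' "$DUMP" | grep -o '"key"' | wc -l
```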
@@ -56,17 +137,17 @@ Dumping the content of the SStable:
.. code-block:: console
scylla-sstable dump-data /path/to/md-123456-big-Data.db
scylla sstable dump-data /path/to/md-123456-big-Data.db
Dumping the content of two SStables as a unified stream:
.. code-block:: console
scylla-sstable dump-data --merge /path/to/md-123456-big-Data.db /path/to/md-123457-big-Data.db
scylla sstable dump-data --merge /path/to/md-123456-big-Data.db /path/to/md-123457-big-Data.db
Validating the specified SStables:
.. code-block:: console
scylla-sstable validate /path/to/md-123456-big-Data.db /path/to/md-123457-big-Data.db
scylla sstable validate /path/to/md-123456-big-Data.db /path/to/md-123457-big-Data.db

View File

@@ -1,4 +1,4 @@
scylla-types
Scylla Types
==============
.. versionadded:: 5.0
@@ -26,7 +26,7 @@ The command syntax is as follows:
* Provide the values in the hex form without a leading 0x prefix.
* You must specify the type of the provided values. See :ref:`Specifying the Value Type <scylla-types-type>`.
* The number of provided values depends on the operation. See :ref:`Supported Operations <scylla-types-operations>` for details.
* The scylla-types operations come with additional options. See :ref:`Additional Options <scylla-types-options>` for the list of options.
* The ``scylla types`` operations come with additional options. See :ref:`Additional Options <scylla-types-options>` for the list of options.
.. _scylla-types-type:

View File

@@ -4,8 +4,10 @@ SSTabledump
This tool allows you to convert SSTables into a JSON format file.
SSTabledump is supported in Scylla 3.0, Scylla Enterprise 2019.1, and newer versions.
In older versions, the tool is named SSTable2json_.
If you need more flexibility or want to dump more than just the data-component, see scylla-sstable_.
.. _SSTable2json: /operating-scylla/admin-tools/sstable2json
.. _scylla-sstable: /operating-scylla/admin-tools/scylla-sstable
Use the full path to the data file when executing the command.

View File

@@ -9,12 +9,9 @@ Scylla for Administrators
Procedures <procedures/index>
security/index
admin-tools/index
manager/index
ScyllaDB Monitoring Stack <https://monitoring.docs.scylladb.com/>
ScyllaDB Operator <https://operator.docs.scylladb.com/>
ScyllaDB Manager <https://manager.docs.scylladb.com/>
Scylla Monitoring Stack <monitoring/index>
Scylla Operator <scylla-operator/index>
Upgrade Procedures </upgrade/index>
System Configuration <system-configuration/index>
benchmarking-scylla
@@ -36,15 +33,9 @@ Scylla for Administrators
:class: my-panel
* :doc:`Scylla Tools </operating-scylla/admin-tools/index>` - Tools for Administrating and integrating with Scylla
* `ScyllaDB Manager <https://manager.docs.scylladb.com/>`_ - Tool for cluster administration and automation
* `ScyllaDB Monitoring Stack <https://monitoring.docs.scylladb.com/stable/>`_ - Tool for cluster monitoring and alerting
* `ScyllaDB Operator <https://operator.docs.scylladb.com>`_ - Tool to run Scylla on Kubernetes
* :doc:`Scylla Logs </getting-started/logging/>`
.. panel-box::

View File

@@ -22,22 +22,4 @@ For example:
``nodetool refresh nba player_stats``
Load and Stream
---------------
.. versionadded:: 4.6
.. code::
nodetool refresh <my_keyspace> <my_table> [--load-and-stream | -las]
The Load and Stream feature extends ``nodetool refresh``. The ``--load-and-stream`` (``-las``) option loads arbitrary SStables that do not belong to the node into the cluster: it reads the SStables from disk, calculates which nodes own the data, and streams the data to them automatically.
For example, if the old cluster has 6 nodes and the new cluster has 3 nodes, you can copy the SStables from the old cluster to any of the new nodes and trigger the load-and-stream process.
Load and Stream makes restores and migrations much easier:
* You can place SStables from any node on any node.
* There is no need to run ``nodetool cleanup`` to remove unused data.
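To make the flow concrete, the sketch below locates a table's ``upload`` directory, where copied SStables are typically placed before refreshing. The data directory path, keyspace, table name, and table UUID are mock values for illustration:

```shell
# Mock data directory; the real one is /var/lib/scylla/data
DATA_DIR=/tmp/mock_scylla_data
mkdir -p "$DATA_DIR/nba/player_stats-5e6b7310/upload"

# Find the upload directory for the table
UPLOAD_DIR=$(find "$DATA_DIR/nba" -maxdepth 2 -type d -name upload | head -n 1)
echo "copy the sstables into: $UPLOAD_DIR"

# Then, on a live node:
# nodetool refresh nba player_stats --load-and-stream
```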
.. include:: nodetool-index.rst

View File

@@ -41,14 +41,6 @@ Scylla nodetool repair command supports the following options:
nodetool repair -et 90874935784
nodetool repair --end-token 90874935784
- ``-seq``, ``--sequential`` Use *-seq* to carry out a sequential repair.
For example, a sequential repair of all keyspaces on a node:
::
nodetool repair -seq
- ``-hosts`` ``--in-hosts`` syncs the **repair master** data subset only between a list of nodes, using host ID or Address. The list *must* include the **repair master**.

View File

@@ -102,4 +102,4 @@ Cluster Management Procedures
Procedures for handling failures and practical examples of different scenarios.
* :ref:`Handling Failures<raft-handliing-failures>`
* :ref:`Handling Failures<raft-handling-failures>`

View File

@@ -198,7 +198,7 @@ By default ScyllaDB will try to use cache, but since the data wont be used ag
As a consequence it can lead to bad latency on operational workloads due to increased rate of cache misses.
To prevent this problem, queries from analytical workloads can bypass the cache using the bypass cache option.
:ref:`Bypass Cache <select-statement>` is only available with Scylla Enterprise.
See :ref:`Bypass Cache <bypass-cache>` for more information.
Batching
========

View File

@@ -1,2 +0,0 @@
.. Note:: This reference covers CQL specification version |cql-version|

View File

@@ -68,7 +68,7 @@ Gracefully stop the node
.. code:: sh
sudo service scylla-enterprise-server stop
sudo service scylla-server stop
Download and install the new release
------------------------------------
@@ -92,13 +92,13 @@ Start the node
.. code:: sh
sudo service scylla-enterprise-server start
sudo service scylla-server start
Validate
--------
1. Check cluster status with ``nodetool status`` and make sure **all** nodes, including the one you just upgraded, are in UN status.
2. Use ``curl -X GET "http://localhost:10000/storage_service/scylla_release_version"`` to check the ScyllaDB version.
3. Check scylla-enterprise-server log (by ``journalctl _COMM=scylla``) and ``/var/log/syslog`` to validate there are no errors.
3. Check scylla-server log (by ``journalctl _COMM=scylla``) and ``/var/log/syslog`` to validate there are no errors.
4. Check again after 2 minutes to validate no new issues are introduced.
Once you are sure the node upgrade is successful, move to the next node in the cluster.
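The REST endpoint returns the release version as a quoted JSON string. A small sketch of extracting it for comparison, with a mock response standing in for the live ``curl`` call:

```shell
# Mock response; on a live node:
#   RESPONSE=$(curl -s -X GET "http://localhost:10000/storage_service/scylla_release_version")
RESPONSE='"2022.1.3"'

# Strip the surrounding JSON quotes
VERSION=$(printf '%s' "$RESPONSE" | tr -d '"')
echo "running version: $VERSION"
```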
@@ -130,7 +130,7 @@ Gracefully shutdown ScyllaDB
.. code:: sh
nodetool drain
sudo service scylla-enterprise-server stop
sudo service scylla-server stop
Downgrade to the previous release
----------------------------------
@@ -164,7 +164,7 @@ Start the node
.. code:: sh
sudo service scylla-enterprise-server start
sudo service scylla-server start
Validate
--------

View File

@@ -7,9 +7,7 @@ This document is a step-by-step procedure for upgrading from ScyllaDB Open Sourc
Applicable Versions
===================
This guide covers upgrading ScyllaDB from version 5.0.x to ScyllaDB Enterprise version 2022.1.y on the following platform:
* |OS|
This guide covers upgrading ScyllaDB from version 5.0.x to ScyllaDB Enterprise version 2022.1.y on |OS|. See :doc:`OS Support by Platform and Version </getting-started/os-support>` for information about supported |OS| versions.
Upgrade Procedure
=================
@@ -30,7 +28,7 @@ Apply the following procedure **serially** on each node. Do not move to the next
**During** the rolling upgrade, it is highly recommended:
* Not to use new 2022.1 features.
* Not to run administration functions, like repairs, refresh, rebuild or add or remove nodes. See :doc:`here </operating-scylla/manager/2.1/sctool>` for suspending Scylla Manager's scheduled or running repairs.
* Not to run administration functions, like repairs, refresh, rebuild or add or remove nodes. See `sctool <https://manager.docs.scylladb.com/stable/sctool/>`_ for suspending Scylla Manager's scheduled or running repairs.
* Not to apply schema changes.
@@ -116,7 +114,7 @@ New io.conf format was introduced in ScyllaDB 2.3 and 2019.1. If your io.conf do
.. code:: sh
sudo service scylla-enterprise-server start
sudo service scylla-server start
Validate
--------
@@ -156,7 +154,7 @@ Gracefully shutdown ScyllaDB
.. code:: sh
nodetool drain
sudo service scylla-enterprise-server stop
sudo service scylla-server stop
Download and install the old release
------------------------------------

View File

@@ -0,0 +1,73 @@
======================================================================
Upgrade Guide - |SCYLLA_NAME| |SRC_VERSION| to |NEW_VERSION| for |OS|
======================================================================
This document is a step-by-step procedure for upgrading from |SCYLLA_NAME| |FROM| to |SCYLLA_NAME| |TO|, and rollback to |FROM| if required.
Applicable versions
===================
This guide covers upgrading |SCYLLA_NAME| from version |FROM| to version |TO| on |OS|. See :doc:`OS Support by Platform and Version </getting-started/os-support>` for information about supported versions.
Upgrade Procedure
=================
.. include:: /upgrade/upgrade-enterprise/_common/enterprise_2022.1_warnings.rst
A ScyllaDB Enterprise upgrade is a rolling procedure which does **not** require a full cluster shutdown.
For each of the nodes in the cluster, you will:
* Drain the node and backup the data
* Check your current release
* Backup the configuration file
* Stop ScyllaDB
* Download and install the new ScyllaDB packages
* Start ScyllaDB
* Validate that the upgrade was successful
Apply the following procedure **serially** on each node. Do not move to the next node before validating that the node that you upgraded is up and running the new version.
**During** the rolling upgrade, it is highly recommended:
* Not to use new 2022.x.z features.
* Not to run administration functions, like repairs, refresh, rebuild or add or remove nodes.
* Not to apply schema changes.
Upgrade Steps
=============
Drain the node and backup the data
------------------------------------
Before any major procedure, like an upgrade, it is recommended to backup all the data to an external device. In ScyllaDB, backup is done using the ``nodetool snapshot`` command. For **each** node in the cluster, run the following command:
.. code:: sh
nodetool drain
nodetool snapshot
Take note of the directory name that nodetool gives you, and copy all the directories having this name under ``/var/lib/scylla`` to a backup device.
When the upgrade is completed on all nodes, the snapshot should be removed with the ``nodetool clearsnapshot -t <snapshot>`` command, or you risk running out of space.
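The copy step above can be sketched as follows. Mock paths are used here; on a real node the data lives under ``/var/lib/scylla`` and the tag is the one printed by ``nodetool snapshot``:

```shell
# Mock layout standing in for /var/lib/scylla
DATA_DIR=/tmp/mock_scylla
BACKUP_DIR=/tmp/scylla_backup
SNAPSHOT_TAG=1687000000000   # tag reported by 'nodetool snapshot'

mkdir -p "$DATA_DIR/data/ks/cf-uuid/snapshots/$SNAPSHOT_TAG"
echo data > "$DATA_DIR/data/ks/cf-uuid/snapshots/$SNAPSHOT_TAG/md-1-big-Data.db"

# Copy every snapshot directory carrying the tag to the backup device,
# preserving the directory structure (GNU cp --parents)
mkdir -p "$BACKUP_DIR"
cd "$DATA_DIR"
find . -type d -path "*/snapshots/$SNAPSHOT_TAG" \
  -exec cp -r --parents {} "$BACKUP_DIR" \;
```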
Backup the configuration file
-------------------------------
.. code:: sh
sudo cp -a /etc/scylla/scylla.yaml /etc/scylla/scylla.yaml.backup-2022.x.z
Back up additional configuration files.
.. code:: sh
for conf in $(cat /var/lib/dpkg/info/scylla-*server.conffiles /var/lib/dpkg/info/scylla-*conf.conffiles /var/lib/dpkg/info/scylla-*jmx.conffiles | grep -v init ); do sudo cp -v $conf $conf.backup-2.1; done
Gracefully stop the node
------------------------
.. code:: sh
sudo service scylla-server stop
Download and install the new release
------------------------------------
Before upgrading, check what version you are running now using ``dpkg -s scylla-enterprise-server``. You should use the same version in case you want to |ROLLBACK|_ the upgrade. If you are not running a 2022.x.y version, stop right here! This guide only covers 2022.x.y to 2022.x.z upgrades.

View File

@@ -0,0 +1,95 @@
**To upgrade ScyllaDB:**
#. Update the |APT|_ to **2022.x**.
#. Install:
.. code:: sh
sudo apt-get clean all
sudo apt-get update
sudo apt-get dist-upgrade scylla-enterprise
Answer y to the first two questions.
Start the node
--------------
.. code:: sh
sudo service scylla-server start
Validate
--------
#. Check cluster status with ``nodetool status`` and make sure **all** nodes, including the one you just upgraded, are in UN status.
#. Use ``curl -X GET "http://localhost:10000/storage_service/scylla_release_version"`` to check the ScyllaDB version.
#. Check scylla-server log (by ``journalctl _COMM=scylla``) and ``/var/log/syslog`` to validate there are no errors.
#. Check again after 2 minutes to validate no new issues are introduced.
Once you are sure the node upgrade is successful, move to the next node in the cluster.
Rollback Procedure
==================
.. include:: /upgrade/_common/warning_rollback.rst
The following procedure describes a rollback from ScyllaDB Enterprise release 2022.x.z to 2022.x.y. Apply this procedure if an upgrade from 2022.x.y to 2022.x.z failed before completing on all nodes. Use this procedure only for nodes you upgraded to 2022.x.z.
ScyllaDB rollback is a rolling procedure which does **not** require a full cluster shutdown.
For each of the nodes you roll back to 2022.x.y, you will:
* Gracefully shutdown ScyllaDB
* Downgrade to the previous release
* Restore the configuration file
* Restart ScyllaDB
* Validate the rollback success
Apply the following procedure **serially** on each node. Do not move to the next node before validating that the node is up and running the new version.
Rollback Steps
==============
Gracefully shutdown ScyllaDB
-----------------------------
.. code:: sh
nodetool drain
sudo service scylla-server stop
Downgrade to the previous release
----------------------------------
Install:
.. code:: sh
sudo apt-get install scylla-enterprise=2022.x.y\* scylla-enterprise-server=2022.x.y\* scylla-enterprise-jmx=2022.x.y\* scylla-enterprise-tools=2022.x.y\* scylla-enterprise-tools-core=2022.x.y\* scylla-enterprise-kernel-conf=2022.x.y\* scylla-enterprise-conf=2022.x.y\* scylla-enterprise-python3=2022.x.y\*
sudo apt-get install scylla-enterprise-machine-image=2022.x.y\* # only execute on AMI instance
Answer y to the first two questions.
Restore the configuration file
------------------------------
.. code:: sh
sudo rm -rf /etc/scylla/scylla.yaml
sudo cp -a /etc/scylla/scylla.yaml.backup-2022.x.z /etc/scylla/scylla.yaml
Restore the additional configuration files.
.. code:: sh
for conf in $(cat /var/lib/dpkg/info/scylla-*server.conffiles /var/lib/dpkg/info/scylla-*conf.conffiles /var/lib/dpkg/info/scylla-*jmx.conffiles | grep -v init ); do sudo cp -v $conf.backup-2.1 $conf; done
sudo systemctl daemon-reload
Start the node
--------------
.. code:: sh
sudo service scylla-server start
Validate
--------
Check the upgrade instruction above for validation. Once you are sure the node rollback is successful, move to the next node in the cluster.

View File

@@ -0,0 +1,2 @@
.. include:: /upgrade/_common/upgrade-guide-v2022-patch-ubuntu-and-debian-p1.rst
.. include:: /upgrade/_common/upgrade-guide-v2022-patch-ubuntu-and-debian-p2.rst

View File

@@ -0,0 +1,79 @@
=============================================================================
Upgrade Guide - |SCYLLA_NAME| |SRC_VERSION| to |NEW_VERSION| for |OS|
=============================================================================
This document is a step-by-step procedure for upgrading from ScyllaDB Enterprise 2021.1 to ScyllaDB Enterprise 2022.1, and rollback to 2021.1 if required.
Applicable Versions
===================
This guide covers upgrading ScyllaDB Enterprise from version **2021.1.8** or later to ScyllaDB Enterprise version 2022.1.y on |OS|. See :doc:`OS Support by Platform and Version </getting-started/os-support>` for information about supported versions.
Upgrade Procedure
=================
.. include:: /upgrade/upgrade-enterprise/_common/enterprise_2022.1_warnings.rst
A ScyllaDB upgrade is a rolling procedure that does **not** require a full cluster shutdown.
For each of the nodes in the cluster, you will:
* Check the cluster schema
* Drain the node and backup the data
* Backup the configuration file
* Stop ScyllaDB
* Download and install the new ScyllaDB packages
* Start ScyllaDB
* Validate that the upgrade was successful
Apply the following procedure **serially** on each node. Do not move to the next node before validating the node that you upgraded is up and running the new version.
**During** the rolling upgrade, it is highly recommended:
* Not to use new 2022.1 features.
* Not to run administration functions, like repairs, refresh, rebuild, or add or remove nodes. See `sctool <https://manager.docs.scylladb.com/stable/sctool/index.html>`_ for suspending ScyllaDB Manager's scheduled or running repairs.
* Not to apply schema changes.
.. include:: /upgrade/_common/upgrade_to_2022_warning.rst
Upgrade Steps
=============
Check the cluster schema
-------------------------
Make sure that all nodes have the schema synced before the upgrade. The upgrade will fail if there is a schema disagreement between nodes.
.. code:: sh
nodetool describecluster
Drain the nodes and backup the data
-------------------------------------
Before any major procedure, like an upgrade, it is recommended to backup all the data to an external device. In ScyllaDB, backup is done using the ``nodetool snapshot`` command. For **each** node in the cluster, run the following command:
.. code:: sh
nodetool drain
nodetool snapshot
Take note of the directory name that nodetool gives you, and copy all the directories having this name under ``/var/lib/scylla`` to a backup device.
When the upgrade is completed on all nodes, the snapshot should be removed with the ``nodetool clearsnapshot -t <snapshot>`` command to prevent running out of space.
Backup the configuration file
------------------------------
.. code:: sh
sudo cp -a /etc/scylla/scylla.yaml /etc/scylla/scylla.yaml.backup-2021.1
Gracefully stop the node
------------------------
.. code:: sh
sudo service scylla-server stop
Download and install the new release
------------------------------------
Before upgrading, check what version you are running now using ``dpkg -l scylla\*server``. You should use the same version in case you want to |ROLLBACK|_ the upgrade. If you are not running a 2021.1.x version, stop right here! This guide only covers 2021.1.x to 2022.1.y upgrades.

View File

@@ -0,0 +1,127 @@
**To upgrade ScyllaDB:**
#. Update the |APT|_ to **2022.1** and enable scylla/ppa repo.
.. code:: sh
Ubuntu 16:
sudo add-apt-repository -y ppa:scylladb/ppa
#. Configure Java 1.8, which is requested by ScyllaDB Enterprise 2022.1.
.. code:: sh
sudo apt-get update
sudo apt-get install -y openjdk-8-jre-headless
sudo update-java-alternatives -s java-1.8.0-openjdk-amd64
#. Install:
.. code:: sh
sudo apt-get clean all
sudo apt-get update
sudo apt-get dist-upgrade scylla-enterprise
Answer y to the first two questions.
Start the node
--------------
A new io.conf format was introduced in Scylla 2.3 and 2019.1. If your io.conf doesn't contain the ``--io-properties-file`` option, it is still in the old format, and you need to re-run the I/O setup to generate a new io.conf.
.. code:: sh
sudo scylla_io_setup
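A quick way to check for the old format is to look for the ``--io-properties-file`` option in ``io.conf``. A mock file is used below; the real file path (typically ``/etc/scylla.d/io.conf``) and its old-format content are assumptions:

```shell
# Mock io.conf in the old format; substitute the real path on a node
IO_CONF=/tmp/io.conf
echo 'SEASTAR_IO="--max-io-requests=64"' > "$IO_CONF"

if ! grep -q -- '--io-properties-file' "$IO_CONF"; then
    echo "old io.conf format detected: re-run scylla_io_setup"
fi
```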
.. code:: sh
sudo service scylla-server start
Validate
--------
#. Check cluster status with ``nodetool status`` and make sure **all** nodes, including the one you just upgraded, are in UN status.
#. Use ``curl -X GET "http://localhost:10000/storage_service/scylla_release_version"`` to check the ScyllaDB version.
#. Check scylla-server log (by ``journalctl _COMM=scylla``) and ``/var/log/syslog`` to validate there are no errors.
#. Check again after two minutes to validate no new issues are introduced.
Once you are sure the node upgrade is successful, move to the next node in the cluster.
See :doc:`Scylla Metrics Update - Scylla Enterprise 2021.1 to 2022.1<metric-update-2021.1-to-2022.1>` for more information.
Rollback Procedure
==================
.. include:: /upgrade/_common/warning_rollback.rst
The following procedure describes a rollback from ScyllaDB Enterprise release 2022.1 to 2021.1. Apply this procedure if an upgrade from 2021.1 to 2022.1 failed before completing on all nodes. Use this procedure only for nodes you upgraded to 2022.1.
ScyllaDB rollback is a rolling procedure that does **not** require a full cluster shutdown.
For each of the nodes you roll back to 2021.1, you will:
* Drain the node and stop ScyllaDB
* Retrieve the old Scylla packages
* Restore the configuration file
* Restart ScyllaDB
* Validate the rollback success
Apply the following procedure **serially** on each node. Do not move to the next node before validating the node is up and running with the new version.
Rollback Steps
==============
Gracefully shutdown ScyllaDB
----------------------------
.. code:: sh
nodetool drain
sudo service scylla-server stop
Download and install the old release
------------------------------------
#. Remove the old repo file.
.. code:: sh
sudo rm -rf /etc/apt/sources.list.d/scylla.list
#. Update the |APT|_ to **2021.1**.
#. Install:
.. code:: sh
sudo apt-get clean all
sudo apt-get update
sudo apt-get remove scylla\* -y
sudo apt-get install scylla-enterprise
Answer y to the first two questions.
Restore the configuration file
------------------------------
.. code:: sh
sudo rm -rf /etc/scylla/scylla.yaml
sudo cp -a /etc/scylla/scylla.yaml.backup-2021.1 /etc/scylla/scylla.yaml
Restore system tables
---------------------
Restore all tables of **system** and **system_schema** from the previous snapshot, as 2022.1 uses a different set of system tables. Refer to :doc:`Restore from a Backup and Incremental Backup </operating-scylla/procedures/backup-restore/restore/>`.
.. code:: sh
cd /var/lib/scylla/data/keyspace_name/table_name-UUID/snapshots/<snapshot_name>/
sudo cp -r * /var/lib/scylla/data/keyspace_name/table_name-UUID/
sudo chown -R scylla:scylla /var/lib/scylla/data/keyspace_name/table_name-UUID/
Start the node
--------------
.. code:: sh
sudo service scylla-server start
Validate
--------
Check the upgrade instructions above for validation. Once you are sure the node rollback is successful, move to the next node in the cluster.

View File

@@ -0,0 +1,2 @@
.. include:: /upgrade/_common/upgrade-guide-v2022-ubuntu-and-debian-p1.rst
.. include:: /upgrade/_common/upgrade-guide-v2022-ubuntu-and-debian-p2.rst

View File

@@ -2,89 +2,84 @@
Upgrade Guide - |SCYLLA_NAME| |SRC_VERSION| to |NEW_VERSION| for |OS|
=============================================================================================================
This document is a step by step procedure for upgrading from |SCYLLA_NAME| |SRC_VERSION| to |SCYLLA_NAME| |NEW_VERSION|, and rollback to |SRC_VERSION| if required.
This document is a step by step procedure for upgrading from |SCYLLA_NAME| |SRC_VERSION| to |SCYLLA_NAME| |NEW_VERSION|, and rollback to version |SRC_VERSION| if required.
Applicable versions
Applicable Versions
===================
This guide covers upgrading Scylla from the following versions: |SRC_VERSION|.x or later to |SCYLLA_NAME| version |NEW_VERSION|.y, on the following platforms:
* |OS|
This guide covers upgrading |SCYLLA_NAME| |SRC_VERSION| to |SCYLLA_NAME| |NEW_VERSION| on |OS|.
See :doc:`OS Support by Platform and Version </getting-started/os-support>` for information about supported versions.
Upgrade Procedure
=================
Upgrading your Scylla version is a rolling procedure that does not require a full cluster shutdown. For each of the nodes in the cluster, serially (i.e. one at a time), you will:
Upgrading your ScyllaDB version is a rolling procedure that does not require a full cluster shutdown. For each of the nodes in the cluster, serially (i.e. one at a time), you will:
* Check the cluster's schema
* Drain the node and backup the data
* Backup the configuration file
* Stop the Scylla service
* Download and install new Scylla packages
* Start the Scylla service
* Stop ScyllaDB
* Download and install new ScyllaDB packages
* Start ScyllaDB
* Validate that the upgrade was successful
Apply the following procedure **serially** on each node. Do not move to the next node before validating the node is up and running with the new version.
Apply the following procedure **serially** on each node. Do not move to the next node before validating the node is up and running the new version.
**During** the rolling upgrade, it is highly recommended:
* Not to use new |NEW_VERSION| features
* Not to run administration functions, like repairs, refresh, rebuild or add or remove nodes. See `sctool <https://manager.docs.scylladb.com/stable/sctool/index.html>`_ for suspending Scylla Manager (only available Scylla Enterprise) scheduled or running repairs.
* Not to use the new |NEW_VERSION| features
* Not to run administration functions, like repairs, refresh, rebuild or add or remove nodes. See `sctool <https://manager.docs.scylladb.com/stable/sctool/index.html>`_ for suspending ScyllaDB Manager (only available for ScyllaDB Enterprise) scheduled or running repairs.
* Not to apply schema changes
.. note:: Before upgrading, make sure to use the latest `Scylla Montioring <https://monitoring.docs.scylladb.com/>`_ stack.
.. note:: Before upgrading, make sure to use the latest `ScyllaDB Monitoring <https://monitoring.docs.scylladb.com/>`_ stack.
Upgrade steps
Upgrade Steps
=============
Check the cluster schema
------------------------
Make sure that all nodes have the schema synched prior to upgrade as any schema disagreement between the nodes causes the upgrade to fail.
Make sure that all nodes have the schema synced before the upgrade. The upgrade will fail if there is a schema disagreement between nodes.
.. code:: sh
nodetool describecluster
Drain node and backup the data
------------------------------
Before any major procedure, like an upgrade, it is **highly recommended** to backup all the data to an external device. In Scylla, backup is done using the ``nodetool snapshot`` command. For **each** node in the cluster, run the following command:
Drain the nodes and backup the data
--------------------------------------
Before any major procedure, like an upgrade, it is recommended to backup all the data to an external device. In ScyllaDB, backup is done using the ``nodetool snapshot`` command. For **each** node in the cluster, run the following command:
.. code:: sh
nodetool drain
nodetool snapshot
Take note of the directory name that nodetool gives you, and copy all the directories having this name under ``/var/lib/scylla`` to an external backup device.
Take note of the directory name that nodetool gives you, and copy all the directories having that name under ``/var/lib/scylla`` to an external backup device.
When the upgrade is complete (for all nodes), remove the snapshot by running ``nodetool clearsnapshot -t <snapshot>``, or you risk running out of disk space.
When the upgrade is completed on all nodes, remove the snapshot with the ``nodetool clearsnapshot -t <snapshot>`` command to prevent running out of space.
Backup configuration file
-------------------------
Backup the configuration file
------------------------------
.. code:: sh
sudo cp -a /etc/scylla/scylla.yaml /etc/scylla/scylla.yaml.backup-src
Stop Scylla
-----------
Stop ScyllaDB
---------------
.. include:: /rst_include/scylla-commands-stop-index.rst
Download and install the new release
------------------------------------
Before upgrading, check what Scylla version you are currently running with ``rpm -qa | grep scylla-server``. You should use the same version as this version in case you want to :ref:`rollback <rollback-procedure>` the upgrade. If you are not running a |SRC_VERSION|.x version, stop right here! This guide only covers |SRC_VERSION|.x to |NEW_VERSION|.y upgrades.
Before upgrading, check what version you are running now using ``rpm -qa | grep scylla-server``. You should use the same version as this version in case you want to :ref:`rollback <rollback-procedure>` the upgrade. If you are not running a |SRC_VERSION|.x version, stop right here! This guide only covers |SRC_VERSION|.x to |NEW_VERSION|.y upgrades.
To upgrade:
#. Update the |SCYLLA_REPO|_ to |NEW_VERSION|.
#. Install the new ScyllaDB version:
.. code:: sh

   sudo yum clean all
   sudo yum update scylla\* -y
Start the node
--------------

.. include:: /rst_include/scylla-commands-start-index.rst
Validate
--------
#. Check cluster status with ``nodetool status`` and make sure **all** nodes, including the one you just upgraded, are in UN status.
#. Use ``curl -X GET "http://localhost:10000/storage_service/scylla_release_version"`` to check the ScyllaDB version. Validate that the version matches the one you upgraded to.
#. Use ``journalctl _COMM=scylla`` to check there are no new errors in the log.
#. Check again after two minutes, to validate no new issues are introduced.
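The version check in step 2 can be scripted. A sketch, assuming the REST endpoint returns the release version as a JSON-quoted string (the value is stubbed here; replace the stub with the ``curl`` call on a live node):

```shell
expected="5.1.13"   # the version you upgraded to (illustrative value)

# On a live node:
#   reported=$(curl -s -X GET "http://localhost:10000/storage_service/scylla_release_version")
reported='"5.1.13"'

reported=${reported//\"/}   # strip the JSON quotes
if [ "$reported" = "$expected" ]; then
  echo "version OK"
else
  echo "version mismatch: got $reported, expected $expected" >&2
fi
```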
Once you are sure the node upgrade was successful, move to the next node in the cluster.
See |Scylla_METRICS|_ for more information.
Rollback Procedure
==================
.. include:: /upgrade/_common/warning_rollback.rst
The following procedure describes a rollback from |SCYLLA_NAME| release |NEW_VERSION|.x to |SRC_VERSION|.y. Apply this procedure if an upgrade from |SRC_VERSION| to |NEW_VERSION| failed before completing on all nodes. Use this procedure only for the nodes that you upgraded to |NEW_VERSION|
The following procedure describes a rollback from |SCYLLA_NAME| release |NEW_VERSION|.x to |SRC_VERSION|.y. Apply this procedure if an upgrade from |SRC_VERSION| to |NEW_VERSION| failed before completing on all nodes. Use this procedure only for the nodes that you upgraded to |NEW_VERSION|.
ScyllaDB rollback is a rolling procedure that does **not** require a full cluster shutdown.
For each of the nodes you roll back to |SRC_VERSION|, you will:
* Drain the node and stop ScyllaDB
* Retrieve the old ScyllaDB packages
* Restore the configuration file
* Reload the systemd configuration
* Restart ScyllaDB
* Validate the rollback success
Apply the following procedure **serially** on each node. Do not move to the next node before validating the rollback was successful and that the node is up and running the old version.
Rollback Steps
==============
Gracefully shutdown ScyllaDB
-----------------------------
.. code:: sh

   nodetool drain
.. include:: /rst_include/scylla-commands-stop-index.rst
Download and install the old release
------------------------------------
#. Remove the old repo file.
.. code:: sh

   sudo rm -rf /etc/yum.repos.d/scylla.repo
#. Update the |SCYLLA_REPO|_ to |SRC_VERSION|.
#. Install:
.. parsed-literal::

   \ sudo yum clean all
   \ sudo rm -rf /var/cache/yum
   \ sudo yum remove scylla\\*tools-core
   \ sudo yum downgrade scylla\\* -y
   \ sudo yum install |PKG_NAME|
Restore the configuration file
------------------------------
.. code:: sh

   sudo rm -rf /etc/scylla/scylla.yaml
   sudo cp -a /etc/scylla/scylla.yaml.backup-src /etc/scylla/scylla.yaml
Restore system tables
---------------------
Restore all tables of **system** and **system_schema** from previous snapshot because |NEW_VERSION| uses a different set of system tables. See :doc:`Restore from a Backup and Incremental Backup </operating-scylla/procedures/backup-restore/restore/>` for details.
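The restore step amounts to copying each table's snapshot contents back into the table's data directory. A self-contained sketch of that copy only (a temporary tree stands in for ``/var/lib/scylla/data``, and the table and snapshot names are illustrative; see the linked restore procedure for the full flow):

```shell
# Fake a system table directory whose snapshot holds the pre-upgrade files.
root=$(mktemp -d)
tbl="$root/system/local-abc123"
mkdir -p "$tbl/snapshots/pre-upgrade"
echo sstable > "$tbl/snapshots/pre-upgrade/me-1-big-Data.db"

# Copy the snapshot contents back into the table directory.
cp -a "$tbl/snapshots/pre-upgrade/." "$tbl/"
ls "$tbl"
```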
Reload systemd configuration
---------------------------------
You must reload the unit file if the systemd unit file is changed.
.. code:: sh

   sudo systemctl daemon-reload
Start the node
--------------
.. include:: /rst_include/scylla-commands-start-index.rst
Validate
--------
Check the upgrade instructions above for validation. Once you are sure the node rollback is successful, move to the next node in the cluster.

Alternator users upgrading from Scylla 4.0 to 4.1 need to set the :doc:`default isolation level </upgrade/upgrade-opensource/upgrade-guide-from-4.0-to-4.1/alternator>`.
Update 3rd party and OS packages
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. versionadded:: Scylla 5.0
.. versionadded:: Scylla Enterprise 2021.1.10
This step is optional. It is recommended if you run a Scylla official image (EC2 AMI, GCP, and Azure images) based on Ubuntu 20.04.
Run the following command:
.. code:: sh

   cat scylla-packages-xxx-x86_64.txt | sudo xargs -n1 apt-get install -y
Where xxx is the relevant Scylla version (|NEW_VERSION|). The file is included in the Scylla packages downloaded in the previous step.
For example:
.. code:: sh

   cat scylla-packages-5.1.2-x86_64.txt | sudo xargs -n1 apt-get install -y
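The ``xargs`` pattern above runs the package manager once per line of the package list. A self-contained demo of the same pattern, with ``echo`` standing in for ``apt-get install -y`` and a made-up two-line list:

```shell
# Build a small package list, then run one command per line of it,
# exactly as the step above does with apt-get (echo is a stand-in here).
printf '%s\n' scylla-server scylla-tools > /tmp/packages-demo.txt
cat /tmp/packages-demo.txt | xargs -n1 echo would-install
rm /tmp/packages-demo.txt
```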
Start the node
--------------

Upgrade Guide - |SCYLLA_NAME| |SRC_VERSION| to |NEW_VERSION| for |OS|
======================================================================
This document is a step-by-step procedure for upgrading from |SCYLLA_NAME| |FROM| to |SCYLLA_NAME| |TO|, and for rolling back to 2021.1 if required.
Applicable Versions
------------------------
This guide covers upgrading |SCYLLA_NAME| from version |FROM| to version |TO| on |OS|. See :doc:`OS Support by Platform and Version </getting-started/os-support>` for information about supported versions.
Upgrade Procedure
----------------------------
.. note::

   Apply the following procedure **serially** on each node. Do not move to the next node before validating the node is up and running the new version.
A ScyllaDB upgrade is a rolling procedure which does **not** require full cluster shutdown.
For each of the nodes in the cluster, you will:
* Drain node and backup the data.
* Check your current release.
* Backup configuration file.
* Stop ScyllaDB.
* Download and install new ScyllaDB packages.
* Start ScyllaDB.
* Validate that the upgrade was successful.
**During** the rolling upgrade it is highly recommended:
* Not to use new |TO| features.
* Not to run administration functions, such as repair, refresh, or rebuild, and not to add or remove nodes.
* Not to apply schema changes.
Upgrade steps
-------------------------------
Drain node and backup the data
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Before any major procedure, like an upgrade, it is recommended to backup all the data to an external device. In ScyllaDB, backup is done using the ``nodetool snapshot`` command. For **each** node in the cluster, run the following command:
.. code:: sh

   nodetool drain
   nodetool snapshot
Take note of the directory name that nodetool gives you, and copy all the directories having this name under ``/var/lib/scylla`` to a backup device.
When the upgrade is complete (all nodes), the snapshot should be removed by ``nodetool clearsnapshot -t <snapshot>``, or you risk running out of space.
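Copying the snapshot directories can be scripted with ``find``. A self-contained sketch in a temporary tree (on a real node ``SRC`` would be ``/var/lib/scylla`` and ``DEST`` the mounted backup device; the tag and table names here are illustrative):

```shell
# Self-contained demo in a temporary tree.
SRC=$(mktemp -d); DEST=$(mktemp -d)
TAG="1687443300000"

# Fake a table directory with one snapshot file.
mkdir -p "$SRC/data/ks1/t1-uuid/snapshots/$TAG"
echo sstable > "$SRC/data/ks1/t1-uuid/snapshots/$TAG/me-1-big-Data.db"

# Copy every snapshot directory matching the tag, preserving the layout.
find "$SRC" -type d -path "*/snapshots/$TAG" | while read -r dir; do
  rel=${dir#"$SRC"/}
  mkdir -p "$DEST/$rel"
  cp -a "$dir/." "$DEST/$rel/"
done
ls "$DEST/data/ks1/t1-uuid/snapshots/$TAG"
```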
Backup configuration file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. code:: sh

   sudo cp -a /etc/scylla/scylla.yaml /etc/scylla/scylla.yaml.backup-5.x.z
Gracefully stop the node
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. code:: sh

   sudo service scylla-server stop
Download and install the new release
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Before upgrading, check what version you are running now using ``dpkg -s scylla-server``. You should use the same version in case you want to |ROLLBACK|_ the upgrade. If you are not running a |FROM| version, stop right here! This guide only covers |FROM| to |TO| upgrades.

**To upgrade ScyllaDB:**
#. Update the |APT|_ to |NEW_VERSION|.
#. Install:
.. code:: sh

   sudo apt-get update
   sudo apt-get dist-upgrade scylla
Answer y to the first two questions.
Start the node
^^^^^^^^^^^^^^^^
.. code:: sh

   sudo service scylla-server start
Validate
^^^^^^^^^^^^^^^^
#. Check cluster status with ``nodetool status`` and make sure **all** nodes, including the one you just upgraded, are in UN status.
#. Use ``curl -X GET "http://localhost:10000/storage_service/scylla_release_version"`` to check the ScyllaDB version.
#. Check the scylla-server log (execute ``journalctl _COMM=scylla``) and ``/var/log/syslog`` to validate there are no errors.
#. Check again after 2 minutes, to validate no new issues are introduced.
Once you are sure the node upgrade is successful, move to the next node in the cluster.
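The log check in the validation steps can be scripted by counting ERROR lines. A sketch over a captured excerpt (the two log lines below are made up; on a live node the excerpt would come from ``journalctl _COMM=scylla``):

```shell
# Stand-in log excerpt; on a live node you would capture:
#   log=$(journalctl _COMM=scylla)
log="INFO  2023-06-22 compaction - Compacted 2 sstables
ERROR 2023-06-22 storage_proxy - Exception when communicating with peer"

# Count lines that start with ERROR.
errors=$(printf '%s\n' "$log" | grep -c '^ERROR')
echo "new errors: $errors"
```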
Rollback Procedure
-----------------------
.. include:: /upgrade/_common/warning_rollback.rst
The following procedure describes a rollback from ScyllaDB release |TO| to |FROM|. Apply this procedure if an upgrade from |FROM| to |TO| failed before completing on all nodes. Use this procedure only for nodes you upgraded to |TO|.
ScyllaDB rollback is a rolling procedure which does **not** require full cluster shutdown.
For each of the nodes you roll back to |FROM|, you will:
* Drain the node and stop ScyllaDB.
* Downgrade to previous release.
* Restore the configuration file.
* Restart ScyllaDB.
* Validate the rollback success.
Apply the following procedure **serially** on each node. Do not move to the next node before validating the node is up and running with the old version.
Rollback steps
------------------------
Gracefully shutdown ScyllaDB
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. code:: sh

   nodetool drain
   sudo service scylla-server stop
Downgrade to previous release
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Install:
.. code:: sh

   sudo apt-get install scylla=5.x.y\* scylla-server=5.x.y\* scylla-jmx=5.x.y\* scylla-tools=5.x.y\* scylla-tools-core=5.x.y\* scylla-kernel-conf=5.x.y\* scylla-conf=5.x.y\*
Answer y to the first two questions.
Restore the configuration file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. code:: sh

   sudo rm -rf /etc/scylla/scylla.yaml
   sudo cp -a /etc/scylla/scylla.yaml.backup-5.x.z /etc/scylla/scylla.yaml
Start the node
^^^^^^^^^^^^^^^^^^^
.. code:: sh

   sudo service scylla-server start
Validate
^^^^^^^^^^^^^^^^^^
Check the upgrade instructions above for validation. Once you are sure the node rollback is successful, move to the next node in the cluster.

.. include:: /upgrade/_common/upgrade-guide-v5-patch-ubuntu-and-debian-p1.rst
.. include:: /upgrade/_common/upgrade-guide-v5-patch-ubuntu-and-debian-p2.rst
