This commit replaces the link to the OSS-only page
(the 5.2-to-5.4 upgrade guide not present in
the Enterprise docs) on the Raft page.
While providing the link to the specific upgrade
guide is more user-friendly, it causes build failures
of the Enterprise documentation. I've replaced
it with the link to the general Upgrade section.
The ".. only:: opensource" directive used to wrap
the OSS-only content correctly excludes the content
form the Enterprise docs - but it doesn't prevent
build warnings.
This commit must be backported to branch-5.4 to
prevent errors in all versions.
Closesscylladb/scylladb#16176
(cherry picked from commit 24d5dbd66f)
Fixes#16153
* jmx 166599f...f45067f (3):
> ColumnFamilyStore: only quote table names if necessary
> APIBuilder: allow quoted scope names
> ColumnFamilyStore: don't fail if there is a table with ":" in its name
* java dfbf3726ee...3764ae94db (1):
> NodeProbe: allow addressing table name with colon in it
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#16294
This commit adds a short paragraph to the Raft
page to explain how to enable consistent
topology updates with Raft - an experimental
feature in version 5.4.
The paragraph should satisfy the requirements
for version 5.4. The Raft page will be
rewritten in the next release when consistent
topology changes with Raft will be GA.
Fixes https://github.com/scylladb/scylladb/issues/15080
Requires backport to branch-5.4.
Closesscylladb/scylladb#16273
(cherry picked from commit 409e20e5ab)
This commit fixes the rollback procedure in
the 4.6-to-5.0 upgrade guide:
- The "Restore system tables" step is removed.
- The "Restore the configuration file" command
is fixed.
- The "Gracefully shutdown ScyllaDB" command
is fixed.
In addition, there are the following updates
to be in sync with the tests:
- The "Backup the configuration file" step is
extended to include a command to backup
the packages.
- The Rollback procedure is extended to restore
the backup packages.
- The Reinstallation section is fixed for RHEL.
Refs https://github.com/scylladb/scylladb/issues/11907
This commit must be backported to branch-5.4, branch-5.2, and branch-5.1
Closesscylladb/scylladb#16155
(cherry picked from commit 1e80bdb440)
This commit fixes the rollback procedure in
the 5.0-to-5.1 upgrade guide:
- The "Restore system tables" step is removed.
- The "Restore the configuration file" command
is fixed.
- The "Gracefully shutdown ScyllaDB" command
is fixed.
In addition, there are the following updates
to be in sync with the tests:
- The "Backup the configuration file" step is
extended to include a command to backup
the packages.
- The Rollback procedure is extended to restore
the backup packages.
- The Reinstallation section is fixed for RHEL.
Also, I've the section removed the rollback
section for images, as it's not correct or
relevant.
Refs https://github.com/scylladb/scylladb/issues/11907
This commit must be backported to branch-5.4, branch-5.2, and branch-5.1
Closesscylladb/scylladb#16154
(cherry picked from commit 7ad0b92559)
This commit fixes the rollback procedure in
the 5.1-to-5.2 upgrade guide:
- The "Restore system tables" step is removed.
- The "Restore the configuration file" command
is fixed.
- The "Gracefully shutdown ScyllaDB" command
is fixed.
In addition, there are the following updates
to be in sync with the tests:
- The "Backup the configuration file" step is
extended to include a command to backup
the packages.
- The Rollback procedure is extended to restore
the backup packages.
- The Reinstallation section is fixed for RHEL.
Also, I've the section removed the rollback
section for images, as it's not correct or
relevant.
Refs https://github.com/scylladb/scylladb/issues/11907
This commit must be backported to branch-5.4 and branch-5.2.
Closesscylladb/scylladb#16152
(cherry picked from commit 91cddb606f)
Currently the cache updaters aren't exception safe
yet they are intended to be.
Instead of allowing exceptions from
`external_updater::execute` escape `row_cache::update`,
abort using `on_fatal_internal_error`.
Future changes should harden all `execute` implementations
to effectively make them `noexcept`, then the pure virtual
definition can be made `noexcept` to cement that.
\Fixes scylladb/scylladb#15576
\Closes scylladb/scylladb#15577
* github.com:scylladb/scylladb:
row_cache: abort on exteral_updater::execute errors
row_cache: do_update: simplify _prev_snapshot_pos setup
(cherry picked from commit 4a0f16474f)
Closesscylladb/scylladb#16256
The copy assignment operator of _ck can throw
after _type and _bound_weight have already been changed.
This leaves position_in_partition in an inconsistent state,
potentially leading to various weird symptoms.
The problem was witnessed by test_exception_safety_of_reads.
Specifically: in cache_flat_mutation_reader::add_to_buffer,
which requires the assignment to _lower_bound to be exception-safe.
The easy fix is to perform the only potentially-throwing step first.
Fixes#15822Closesscylladb/scylladb#15864
(cherry picked from commit 93ea3d41d8)
This commit adds information on how to enable
object storage for a keyspace.
The "Keyspace storage options" section already
existed in the doc, but it was not valid as
the support was only added in version 5.4
The scope of this commit:
- Update the "Keyspace storage options" section.
- Add the information about object storage support
to the Data Definition> CREATE KEYSPACE section
* Marked as "Experimental".
* Excluded from the Enterprise docs with the
.. only:: opensource directive.
This commit must be backported to branch-5.4,
as support for object storage was added
in version 5.4.
Closesscylladb/scylladb#16081
(cherry picked from commit bfe19c0ed2)
Update node_exporter to 1.7.0.
The previous version (1.6.1) was flagged by security scanners (such as
Trivy) with HIGH-severity CVE-2023-39325. 1.7.0 release fixed that
problem.
[Botond: regenerate frozen toolchain]
Fixes#16085Closesscylladb/scylladb#16086Closesscylladb/scylladb#16090
(cherry picked from commit 321459ec51)
[avi: regenerate frozen toolchain]
* ./tools/jmx 8d15342e...9a03d4fa (1):
> Merge "scylla-apiclient: update several Java dependencies" from Piotr Grabowski
* ./tools/java 3c09ab97...dfbf3726 (1):
> Merge 'build: update several dependencies' from Piotr Grabowski
Update build dependencies which were flagged by security scanners.
Refs: scylladb/scylla-jmx#220
Refs: scylladb/scylla-tools-java#351Closesscylladb/scylladb#16149
This commit fixes the rollback procedure in
the 5.2-to-5.4 upgrade guide:
- The "Restore system tables" step is removed.
- The "Restore the configuration file" command
is fixed.
- The "Gracefully shutdown ScyllaDB" command
is fixed.
In addition, there are the following updates
to be in sync with the tests:
- The "Backup the configuration file" step is
extended to include a command to backup
the packages.
- The Rollback procedure is extended to restore
the backup packages.
- The Reinstallation section is fixed for RHEL.
Also, I've removed the optional step to enable
consistent schema management from the list of
steps - the appropriate section has already
been removed, but it remained in the procedure
description, which was misleading.
Refs https://github.com/scylladb/scylladb/issues/11907
This commit must be backported to branch-5.4
Closesscylladb/scylladb#16114
(cherry picked from commit 3751acce42)
This miniset addresses two potential conversions to `global_schema_ptr` of incomplete materialized views schemas.
One of them was completely unnecessary and also is a "chicken and an egg" problem where on the sync schema procedure itself a view schema was converted to `global_schema_ptr` solely for the purposes of logging. This can create a
"hickup" in the materialized views updates if they are comming from a node with a different mv schema.
The reason why sometimes a synced schema can have no base info is because of deactivision and reactivision of the schema inside the `schema_registry` which doesn't restore the base information due to lack of context.
When a schema is synced the problem becomes easy since we can just use the latest base information from the database.
Fixes#14011Closesscylladb/scylladb#14861
* github.com:scylladb/scylladb:
migration manager: fix incomplete mv schemas returned from get_schema_for_write
migration_manager: do not globalize potentially incomplete schema
(cherry picked from commit 5752dc875b)
allow_mutation_read_page_without_live_row is a new option in the
partition_slice::option option set. In a mixed clusters, old nodes
possibly don't know this new option, so its usage must be protected by a
cluster feature. This patch does just that.
Fixes: #15795Closesscylladb/scylladb#15890
(cherry picked from commit f53961248d)
Currently, it is started/stopped in the streaming/maintenance sg, which
is what the API itself runs in.
Starting the native transport in the streaming sg, will lead to severely
degraded performance, as the streaming sg has significantly less
CPU/disk shares and reader concurrency semaphore resources.
Furthermore, it will lead to multi-paged reads possibly switching
between scheduling groups mid-way, triggering an internal error.
To fix, use `with_scheduling_group()` for both starting and stopping
native transport. Technically, it is only strictly necessary for
starting, but I added it for stop as well for consistency.
Also apply the same treatment to RPC (Thrift). Although no one uses it,
best to fix it, just to be on the safe side.
I think we need a more systematic approach for solving this once and for
all, like passing the scheduling group to the protocol server and have
it switch to it internally. This allows the server to always run on the
correct scheduling group, not depending on the caller to remember using
it. However, I think this is best done in a follow-up, to keep this
critical patch small and easily backportable.
Fixes: #15485Closesscylladb/scylladb#16019
(cherry picked from commit dfd7981fa7)
$ID_LIKE = "rhel" works only on RHEL compatible OSes, not for RHEL
itself.
To detect RHEL correctly, we also need to check $ID = "rhel".
Fixes#16040Closesscylladb/scylladb#16041
(cherry picked from commit 338a9492c9)
There are two tests, test_read_all and test_read_with_partition_row_limits, which asserts on every page as well
as at the end that there are no misses whatsoever. This is incorrect, because it is possible that on a given page, not all shards participate and thus there won't be a saved reader on every shard. On the subsequent page, a shard without a reader may produce a miss. This is fine. Refine the asserts, to check that we have only as much misses, as many
shards we have without readers on them.
Fixes: https://github.com/scylladb/scylladb/issues/14087Closesscylladb/scylladb#15806
* github.com:scylladb/scylladb:
test/boost/multishard_mutation_query_test: fix querier cache misses expectations
test/lib/test_utils: add require_* variants for all comparators
(cherry picked from commit 457d170078)
When base write triggers mv write and it needs to be send to another
shard it used the same service group and we could end up with a
deadlock.
This fix affects also alternator's secondary indexes.
Testing was done using (yet) not committed framework for easy alternator
performance testing: https://github.com/scylladb/scylladb/pull/13121.
I've changed hardcoded max_nonlocal_requests config in scylla from 5000 to 500 and
then ran:
./build/release/scylla perf-alternator-workloads --workdir /tmp/scylla-workdir/ --smp 2 \
--developer-mode 1 --alternator-port 8000 --alternator-write-isolation forbid --workload write_gsi \
--duration 60 --ring-delay-ms 0 --skip-wait-for-gossip-to-settle 0 --continue-after-error true --concurrency 2000
Without the patch when scylla is overloaded (i.e. number of scheduled futures being close to max_nonlocal_requests) after couple seconds
scylla hangs, cpu usage drops to zero, no progress is made. We can confirm we're hitting this issue by seeing under gdb:
p seastar::get_smp_service_groups_semaphore(2,0)._count
$1 = 0
With the patch I wasn't able to observe the problem, even with 2x
concurrency. I was able to make the process hang with 10x concurrency
but I think it's hitting different limit as there wasn't any depleted
smp service group semaphore and it was happening also on non mv loads.
Fixes https://github.com/scylladb/scylladb/issues/15844Closesscylladb/scylladb#15845
(cherry picked from commit 020a9c931b)
This commit adds the .. only:: opensource directive
to the Raft page to exclude the link to the 5.2-to-5.4
upgrade guide from the Enterprise documentation.
The Raft page belongs to both OSS and Enterprise
documentation sets, while the upgrade guide
is OSS-only. This causes documentation build
issues in the Enterprise repository, for example,
https://github.com/scylladb/scylla-enterprise/pull/3242.
As a rule, all OSS-only links should be provided
by using the .. only:: opensource directive.
This commit must be backported to branch-5.4
to prevent errors in the documentation for
ScyllaDB Enterprise 2024.1
(backport)
Closesscylladb/scylladb#16064
(cherry picked from commit ca22de4843)
Currently, when said feature is enabled, we recalcuate the schema
digest. But this feature also influences how table versions are
calculated, so it has to trigger a recalculation of all table versions,
so that we can guarantee correct versions.
Before, this used to happen by happy accident. Another feature --
table_digest_insensitive_to_expiry -- used to take care of this, by
triggering a table version recalulation. However this feature only takes
effect if digest_insensitive_to_expiry is also enabled. This used to be
the case incidently, by the time the reload triggered by
table_digest_insensitive_to_expiry ran, digest_insensitive_to_expiry was
already enabled. But this was not guaranteed whatsoever and as we've
recently seen, any change to the feature list, which changes the order
in which features are enabled, can cause this intricate balance to
break.
This patch makes digest_insensitive_to_expiry also kick off a schema
reload, to eliminate our dependence on (unguaranteed) feature order, and
to guarantee that table schemas have a correct version after all features
are enabled. In fact, all schema feature notification handlers now kick
off a full schema reload, to ensure bugs like this don't creep in, in
the future.
Fixes: #16004Closesscylladb/scylladb#16013
(cherry picked from commit 22381441b0)
`system.raft` was using the "user memory pool", i.e. the
`dirty_memory_manager` for this table was set to
`database::_dirty_memory_manager` (instead of
`database::_system_dirty_memory_manager`).
This meant that if a write workload caused memory pressure on the user
memory pool, internal `system.raft` writes would have to wait for
memtables of user tables to get flushed before the write would proceed.
This was observed in SCT longevity tests which ran a heavy workload on
the cluster and concurrently, schema changes (which underneath use the
`system.raft` table). Raft would often get stuck waiting many seconds
for user memtables to get flushed. More details in issue #15622.
Experiments showed that moving Raft to system memory fixed this
particular issue, bringing the waits to reasonable levels.
Currently `system.raft` stores only one group, group 0, which is
internally used for cluster metadata operations (schema and topology
changes) -- so it makes sense to keep use system memory.
In the future we'd like to have other groups, for strongly consistent
tables. These groups should use the user memory pool. It means we won't
be able to use `system.raft` for them -- we'll just have to use a
separate table.
Fixes: scylladb/scylladb#15622Closesscylladb/scylladb#15972
(cherry picked from commit f094e23d84)
When topology coordinator tries to fence the previous coordinator it
performs a group0 operation. The current topology coordinator might be
aborted in the meantime, which will result in a `raft::request_aborted`
exception being thrown. After the fix to scylladb/scylladb#15728 was
merged, the exception is caught, but then `sleep_abortable` is called
which immediately throws `abort_requested_exception` as it uses the same
abort source as the group0 operation. The `fence_previous_coordinator`
function which does all those things is not supposed to throw
exceptions, if it does - it causes `raft_state_monitor_fiber` to exit,
completely disabling the topology coordinator functionality on that
node.
Modify the code in the following way:
- Catch `abort_requested_exception` thrown from `sleep_abortable` and
exit the function if it happens. In addition to the described issue,
it will also handle the case when abort is requested while
`sleep_abortable` happens,
- Catch `raft::request_aborted` thrown from group0 operation, log the
exception with lower verbosity and exit the function explicitly.
Finally, wrap both `fence_previous_coordinator` and `run` functions in a
`try` block with `on_fatal_internal_error` in the catch handler in order
to implement the behavior that adding `noexcept` was originally supposed
to introduce.
Fixes: scylladb/scylladb#15747Closesscylladb/scylladb#15948
* github.com:scylladb/scylladb:
raft topology: catch and abort on exceptions from topology_coordinator::run
Revert "storage_service: raft topology: mark topology_coordinator::run function as noexcept"
raft topology: don't print an error when fencing previous coordinator is aborted
raft topology: handle abort exceptions from sleeping in fence_previous_coordinator
(cherry picked from commit 07e9522d6c)
This commit updates the Repair-Based Node
Operations page. In particular:
- Information about RBNO enabled for all
node operations is added (before 5.4, RBNO
was enabled for the replace operation, while
it was experimental for others).
- The content is rewritten to remove redundant
information about previous versions.
The improvement is part of the 5.4 release.
This commit must be backported to branch-5.4
Closesscylladb/scylladb#16015
(cherry picked from commit 8a4a8f077a)
since we use `sphinx_multiversion` for building multiple versions of document. and in #15860, we changed the way how options are rendered, so the same change should be applied to the branch which includes the option list.
to address the conflicts, in addition to #15860, the depended PRs are also backported. so, in this pull request, following PRs are backported:
- #15827
- #15782
- #15860Closesscylladb/scylladb#16030
* github.com:scylladb/scylladb:
docs: add divider using CSS
docs: extract _clean_description as a filter
docs: render option with role
docs: update cofig params design
docs: parse source files right into rst
instead of hardwiring the formatting in the html code, do this using
CSS, more flexible this way.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
(cherry picked from commit ff12f1f678)
would be better to split the parser from the formatter. in future,
we can apply more filter on top of the exiting one.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
(cherry picked from commit 1694a7addc)
so we can cross-reference them with the syntax like
:confval:`alternator_timeout_in_ms`.
or even render an option like:
.. confval:: alternator_timeout_in_ms
in order to make the headerlink of the option visible,
a new CSS rule is added.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
(cherry picked from commit 9ddc639237)
This commit fixes the information about
Raft-based consistent cluster management
in the 5.2-to-5.4 upgrade guide.
This a follow-up to https://github.com/scylladb/scylladb/pull/15880 and must be backported to branch-5.4.
In addition, it adds information about removing
DateTieredCompactionStrategy to the 5.2-to-5.4
upgrade guide, including the guideline to
migrate to TimeWindowCompactionStrategy.
Closesscylladb/scylladb#15988
(cherry picked from commit ca0f5f39b5)
This commit updates the cqlsh compatibility
with Python to Python 3.
In addition it:
- Replaces "Cassandra" with "ScyllaDB" in
the description of cqlsh.
The previous description was outdated, as
we no longer can talk about using cqlsh
released with Cassandra.
- Replaces occurrences of "Scylla" with "ScyllaDB".
- Adds additional locations of cqlsh (Docker Hub
and PyPI), as well as the link to the scylla-cqlsh
repository.
Closesscylladb/scylladb#16016
(cherry picked from commit 8d618bbfc6)
We have observed do_repair_ranges() receiving tens of thousands of
ranges to repairs on occasion. do_repair_ranges() repairs all ranges in
parallel, with parallel_for_each(). This is normally fine, as the lambda
inside parallel_for_each() takes a semaphore and this will result in
limited concurrency.
However, in some instances, it is possible that most of these ranges are
skipped. In this case the lambda will become synchronous, only logging a
message. This can cause stalls beacuse there are no opportunities to
yield. Solve this by adding an explicit yield to prevent this.
Fixes: #14330Closesscylladb/scylladb#15879
(cherry picked from commit 90a8489809)
This PR adds the 5.2-5.4 upgrade guide.
In addition, it removes the redundant upgrade guide from 5.2 to 5.3 (as 5.3 was skipped), as well as some mentions of version 5.3.
This PR must be backported to branch-5.4.
Closesscylladb/scylladb#15880
* github.com:scylladb/scylladb:
doc: add the upgrade guide from 5.2 to 5.4
doc: remove version "5.3" from the docs
doc: remove the 5.2-to-5.3 upgrade guide
(cherry picked from commit 74f68a472f)
These APIs may return stale or simply incorrect data on shards
other than 0. Newer versions of Scylla are better at maintaining
cross-shard consistency, but we need a simple fix that can be easily and
without risk be backported to older versions; this is the fix.
Add a simple test to check that the `failure_detector/endpoints`
API returns nonzero generation.
Fixes: scylladb/scylladb#15816Closesscylladb/scylladb#15970
* github.com:scylladb/scylladb:
test: rest_api: test that generation is nonzero in `failure_detector/endpoints`
api: failure_detector: fix indentation
api: failure_detector: invoke on shard 0
(cherry picked from commit 9443253f3d)
This commit updates the package installation
instructions in version 5.4.
- It updates the variables to include "5.4"
as the version name.
- It adds the information for the newly supported
Rocky/RHEL 9 - a new EPEL download link is required.
Closesscylladb/scylladb#15963
(cherry picked from commit 1e0cbfe522)
This commit adds OS support information
in version 5.4 (removing the non-released
version 5.3).
In particular, it adds support for Oracle Linux
and Amazon Linux.
Also, it removes support for outdated versions.
Closesscylladb/scylladb#15923
(cherry picked from commit 3756705520)