Fixes#27077
Multiple points can be clarified relating to:
* Names of each sub-procedure could be clearer
* Requirements of each sub-procedure could be clearer
* Clarify which keyspaces are relevant and how to check them
* Fix typos in keyspace name
Closesscylladb/scylladb#26855
(cherry picked from commit a0734b8605)
Closesscylladb/scylladb#27148
This commit removes the now redundant driver pages from
the Scylla DB documentation. Instead, the link to the pages
where we moved the diver information is added.
Also, the links are updated across the ScyllaDB manual.
Redirections are added for all the removed pages.
Fixes https://github.com/scylladb/scylladb/issues/26871Closesscylladb/scylladb#27277
(cherry picked from commit c5580399a8)
Closesscylladb/scylladb#27436
This series allows an operator to reset 'cleanup needed' flag if he already cleaned up the node, so that automatic cleanup will not do it again. We also change 'nodetool cleanup' back to run cleanup on one node only (and reset 'cleanup needed' flag in the end), but the new '--global' option allows to run cleanup on all nodes that needed it simultaneously.
Fixes https://github.com/scylladb/scylladb/issues/26866
Backport to all supported version since automatic cleanup behaviour as it is now may create unexpected by the operator load during cluster resizing.
- (cherry picked from commit e872f9cb4e)
- (cherry picked from commit 0f0ab11311)
Parent PR: #26868Closesscylladb/scylladb#27089
* github.com:scylladb/scylladb:
cleanup: introduce "nodetool cluster cleanup" command to run cleanup on all dirty nodes in the cluster
cleanup: Add RESTful API to allow reset cleanup needed flag
97ab3f6622 changed "nodetool cleanup" (without arguments) to run
cleanup on all dirty nodes in the cluster. This was somewhat unexpected,
so this patch changes it back to run cleanup on the target node only (and
reset "cleanup needed" flag afterwards) and it adds "nodetool cluster
cleanup" command that runs the cleanup on all dirty nodes in the
cluster.
(cherry picked from commit 0f0ab11311)
If the user connects to Scylla via the maintenance socket, it may happen
that `auth_integration` has not been registered in the service level
controller yet. One example is maintenance mode when that will never
happen; another when the connection occurs before Scylla is fully
initialized.
To avoid unnecessary crashes, we add new branches if the passed user is
absent or if it corresponds to the anonymous role. Since the role
corresponding to a connection via the maintenance socket is the anonymous
role, that solves the problem.
In those cases, we completely circumvent any calls to `auth_integration`
and handle them separately. The modified methods are:
* `get_user_scheduling_group`,
* `with_user_service_level`,
* `describe_service_levels`.
For the first two, the new behavior is in line with the previous
implementation of those functions. The last behaves differently now,
but since it's a soft error, crashing the node is not necessary anyway.
We throw an exception instead, whose error message should give the user
a hint of what might be wrong.
The other uses of `auth_integration` within the service level controller
are not problematic:
* `find_effective_service_level`,
* `find_cached_effective_service_level`.
They take the name of a role as their argument. Since the anonymous role
doesn't have a name, it's not possible to call them with it.
Fixesscylladb/scylladb#26816
(cherry picked from commit c0f7622d12)
Capacity based balancing was introduced in 2025.1. It computes balance
based on a node's capacity: the number of tablets located on a node
should be directly proportional to that node's storage capacity.
This change adds this explanation to the docs.
Fixes: #25686Closesscylladb/scylladb#25687
(cherry picked from commit de5dab8429)
Closesscylladb/scylladb#26105
This commit:
- Extends the Drivers support table with information on which driver supports tablets
and since which version.
- Adds the driver support policy to the Drivers page.
- Reorganizes the Drivers page to accommodate the updates.
In addition:
- The CPP-over-Rust driver is added to the table.
- The information about Serverless (which we don't support) is removed
and replaced with tablets to correctly describe the contents of the table.
Fixes https://github.com/scylladb/scylladb/issues/19471
Refs https://github.com/scylladb/scylladb-docs-homepage/issues/69Closesscylladb/scylladb#24635
(cherry picked from commit 18b4d4a77c)
Closesscylladb/scylladb#25247
Although valid for compact tables, non-full (or empty) clustering key prefixes are not handled for row keys when writing sstables. Only the present components are written, consequently if the key is empty, it is omitted entirely.
When parsing sstables, the parsing code unconditionally parses a full prefix.
This mis-match results in parsing failures, as the parser parses part of the row content as a key resulting in a garbage key and subsequent mis-parsing of the row content and maybe even subsequent partitions.
Introduce a new system table: `system.corrupt_data` and infrastructure similar to `large_data_handler`: `corrupt_data_handler` which abstracts how corrupt data is handled. The sstable writer now passes rows such corrupt keys to the corrupt data handler. This way, we avoid corrupting the sstables beyond parsing and the rows are also kept around in system.corrupt_data for later inspection and possible recovery.
Add a full-stack test which checks that rows with bad keys are correctly handled.
Fixes: https://github.com/scylladb/scylladb/issues/24489
The bug is present in all versions, has to be backported to all supported versions.
- (cherry picked from commit 92b5fe8983)
- (cherry picked from commit 0753643606)
- (cherry picked from commit b0d5462440)
- (cherry picked from commit 093d4f8d69)
- (cherry picked from commit 678deece88)
- (cherry picked from commit 64f8500367)
- (cherry picked from commit b931145a26)
- (cherry picked from commit 3e1c50e9a7)
- (cherry picked from commit 46ff7f9c12)
- (cherry picked from commit ebd9420687)
- (cherry picked from commit aae212a87c)
- (cherry picked from commit 592ca789e2)
- (cherry picked from commit edc2906892)
Parent PR: #24492Closesscylladb/scylladb#24740
* github.com:scylladb/scylladb:
test/boost/sstable_datafile_test: add test for corrupt data
sstables/mx/writer: handler rows with empty keys
test/lib/cql_assertions: introduce columns_assertions
sstables: add corrupt_data_handler to sstables::sstables
tools/scylla-sstable: make large_data_handler a local
db: introduce corrupt_data_handler
mutation: introduce frozen_mutation_fragment_v2
mutation/mutation_partition_view: read_{clustering,static}_row(): return row type
mutation/mutation_partition_view: extract de-ser of {clustering,static} row
idl-compiler.py: generate skip() definition for enums serializers
idl: extract full_position.idl from position_in_partition.idl
db/system_keyspace: add apply_mutation()
db/system_keyspace: introduce the corrupt_data table
ScyllaDB supports non-frozen UDTs since 3.2, no need to keep referencing
this limitation in the current docs. Replace the description of the
limitation with general description of frozen semantics for UDTs.
Fixes: #22929Closesscylladb/scylladb#24763
(cherry picked from commit 37ef9efb4e)
Closesscylladb/scylladb#24779
To serve as a place to store corrupt mutation fragments. These fragments
cannot be written to sstables, as they would be spread around by
compaction and/or repair. They even might make parsing the sstable
impossible. So they are stored in this special table instead, kept
around to be inspected later and possibly restored if possible.
(cherry picked from commit 92b5fe8983)
This change resolves an issue where selecting a version from the multiversion dropdown on Markdown pages (e.g. https://docs.scylladb.com/manual/stable/alternator/getting-started.html) incorrectly redirected users to the main page instead of the corresponding versioned page.
The underlying cause was that the `multiversion` extension relies on `source_suffix` to identify available pages for URL mapping. Without this configuration, proper redirection fails for `.md` files.
This fix should be backported to `2025.1` to ensure correct behavior. Otherwise, the fix will only take effect in future releases.
Testing locally is non-trivial: clone the repository, apply the changes to each relevant branch, set `smv_remote_whitelist` to "", then run `make multiversionpreview`. Afterward, switch between versions in the dropdown to verify behavior. I've tested it locally, so the best next step is to merge and confirm that it works as expected in the live environment.
Closesscylladb/scylladb#23957
(cherry picked from commit 4ba7182515)
Closesscylladb/scylladb#24010
Currently, when we rebuild a tablet, we stream data from all
replicas. This creates a lot of redundancy, wastes bandwidth
and CPU resources.
In this series, we split the streaming stage of tablet rebuild into
two phases: first we stream tablet's data from only one replica
and then repair the tablet.
Fixes: https://github.com/scylladb/scylladb/issues/17174.
Needs backport to 2025.1 to prevent out of space during streaming
- (cherry picked from commit b80e957a40)
- (cherry picked from commit ed7b8bb787)
- (cherry picked from commit 5d6041617b)
- (cherry picked from commit 4a847df55c)
- (cherry picked from commit eb17af6143)
- (cherry picked from commit acd32b24d3)
- (cherry picked from commit 372b562f5e)
Parent PR: #23187Closesscylladb/scylladb#23682
* github.com:scylladb/scylladb:
test: add test for rebuild with repair
locator: service: move to rebuild_v2 transition if cluster is upgraded
locator: service: add transition to rebuild_repair stage for rebuild_v2
locator: service: add rebuild_repair tablet transition stage
locator: add maybe_get_primary_replica
locator: service: add rebuild_v2 tablet transition kind
gms: add REPAIR_BASED_TABLET_REBUILD cluster feature
Modify write_both_read_old and streaming stages in rebuild_v2 transition
kind: write_both_read_old moves to rebuild_repair stage and streaming stage
streams data only from one replica.
(cherry picked from commit eb17af6143)
Currently, in the streaming stage of rebuild tablet transition,
we stream tablet data from all replicas.
This patch series splits the streaming stage into two phases:
- repair phase, where we repair the tablet;
- streaming phase, where we stream tablet data from one replica.
To differentiate the two streaming methods, a new tablet transition
kind - rebuild_v2 - is added.
The transtions and stages for rebuild_v2 transition kind will be
added in the following patches.
(cherry picked from commit ed7b8bb787)
Add a new nodetool cluster super-command. Add nodetool
cluster repair command to repair tablet keyspaces.
It uses the new /storage_service/tablets/repair API.
The nodetool cluster repair command allows you to specify
the keyspace and tables to be repaired. A cluster repair of many
tables will request /storage_service/tablets/repair and wait for
the result synchronously for each table.
The nodetool repair command, which was previously used to repair
keyspaces of any type, now repairs only vnode keyspaces.
Fixes: https://github.com/scylladb/scylladb/issues/22409.
Needs backport to 2025.1 that introduces the new tablet repair API
- (cherry picked from commit cbde835792)
- (cherry picked from commit b81c81c7f4)
- (cherry picked from commit aa3973c850)
- (cherry picked from commit 8bbc5e8923)
- (cherry picked from commit 02fb71da42)
- (cherry picked from commit 9769d7a564)
Parent PR: #22905Closesscylladb/scylladb#23672
* github.com:scylladb/scylladb:
docs: nodetool: update repair and add tablet-repair docs
test: nodetool: add tests for cluster repair command
nodetool: add cluster repair command
nodetool: repair: extract getting hosts and dcs to functions
nodetool: repair: warn about repairing tablet keyspaces
nodetool: repair: move keyspace_uses_tablets function
Fixes#22688
If we set a dc rf to zero, the options map will still retain a dc=0 entry.
If this dc is decommissioned, any further alters of keyspace will fail,
because the union of new/old options will now contained an unknown keyword.
Change alter ks options processing to simply remove any dc with rf=0 on
alter, and treat this as an implicit dc=0 in nw-topo strategy.
This means we change the reallocate_tablets routine to not rely on
the strategy objects dc mapping, but the full replica topology info
for dc:s to consider for reallocation. Since we verify the input
on attribute processing, the amount of rf/tablets moved should still
be legal.
v2:
* Update docs as well.
v3:
* Simplify dc processing
* Reintroduce options empty check, but do early in ks_prop_defs
* Clean up unit test some
Closesscylladb/scylladb#22693
(cherry picked from commit 342df0b1a8)
Closesscylladb/scylladb#22877
`tablets_mode_for_new_keyspaces=enforced` enables tablets by default for
new keyspaces, like `tablets_mode_for_new_keyspaces=enabled`.
However, it does not allow to opt-out when creating
new keyspaces by setting `tablets = {'enabled': false}`.
Refs scylladb/scylla-enterprise#4355
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 62aeba759b)