nodetool cluster repair without additional params repairs all tablet
keyspaces in a cluster. Currently, if a table is dropped while
the command is running, all tables are repaired but the command finishes
with a failure.
Modify nodetool cluster repair. If a table wasn't specified
(i.e. all tables are repaired), the command finishes successfully
even if a table was dropped.
If a table was specified and it does not exist (e.g. because it was
dropped before the repair was requested), then the behavior remains
unchanged.
Fixes: SCYLLADB-568.
Closesscylladb/scylladb#28739
(cherry picked from commit 2e68f48068)
Closesscylladb/scylladb#29006Closesscylladb/scylladb#29038
In PR 5b6570be52 we introduced the config option
`sstable_compression_user_table_options` to allow adjusting the default
compression settings for user tables. However, the new option was hooked
into the CQL layer and applied only to CQL base tables, not to the whole
spectrum of user tables: CQL auxiliary tables (materialized views,
secondary indexes, CDC log tables), Alternator base tables, Alternator
auxiliary tables (GSIs, LSIs, Streams).
Fix this by moving the logic into the `schema_builder` via a schema
initializer. This ensures that the default compression settings are
applied uniformly regardless of how the table is created, while also
keeping the logic in a central place.
Register the initializer at startup in all executables where schemas are
being used (`scylla_main()`, `scylla_sstable_main()`, `cql_test_env`).
Finally, remove the ad-hoc logic from `create_table_statement`
(redundant as of this patch), remove the xfail markers from the relevant
tests and adjust `test_describe_cdc_log_table_create_statement` to
expect LZ4WithDicts as the default compressor.
Fixes#26914.
Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
(cherry picked from commit 1e37781d86)
Before this commit, when the underlying materialized view was created,
it didn't have the property `tombstone_gc` set to any value. That
was a bug and we fix it now.
Two reproducer tests is added for validation. They reproduce the problem
and don't pass before this commit.
Fixesscylladb/scylladb#26542
This PR extends the restore API so that it accepts primary_replica_only as parameter and it combines the concepts of primary-replica-only with scoped streaming so that with:
- `scope=all primary_replica_only=true` The restoring node will stream to the global primary replica only
- `scope=dc primary_replica_only=true` The restoring node will stream to the local primary replica only.
- `scope=rack primary_replica_only=true` The restoring node will stream only to the primary replica from within its own rack (with rf=#racks, the restoring node will stream only to itself)
- `scope=node primary_replica_only=true` is not allowed, the restoring node will always stream only to itself so the primary_replica_only parameter wouldn't make sense.
The PR also adjusts the `nodetool refresh` restriction on running restore with both primary_replica_only and scope, it adds primary_replica_only to `nodetool restore` and it adds cluster tests for primary replica within scope.
Fixes#26584
- (cherry picked from commit 965a16ce6f)
- (cherry picked from commit 136b45d657)
- (cherry picked from commit 83aee954b4)
- (cherry picked from commit c1b3fe30be)
- (cherry picked from commit d4e43bd34c)
- (cherry picked from commit 817fdadd49)
- (cherry picked from commit a04ebb829c)
Parent PR: #26609Closesscylladb/scylladb#27011
* github.com:scylladb/scylladb:
Add cluster tests for checking scoped primary_replica_only streaming
Improve choice distribution for primary replica
Refactor cluster/object_store/test_backup
nodetool restore: add primary-replica-only option
nodetool refresh: Enable scope={all,dc,rack} with primary_replica_only
Enable scoped primary replica only streaming
Support primary_replica_only for native restore API
97ab3f6622 changed "nodetool cleanup" (without arguments) to run
cleanup on all dirty nodes in the cluster. This was somewhat unexpected,
so this patch changes it back to run cleanup on the target node only (and
reset "cleanup needed" flag afterwards) and it adds "nodetool cluster
cleanup" command that runs the cleanup on all dirty nodes in the
cluster.
(cherry picked from commit 0f0ab11311)
Add --primary-replica-only and update docs page for
nodetool restore.
The relationship with the scope parameter is:
- scope=all primary_replica_only=true gets the global primary replica
- scope=dc primary_replica_only=true gets the local primary replica
- scope=rack primary_replica_only=true is like a noop, it gets the only
replica in the rack (rf=#racks)
- scope=node primary_replica_only=node is not allowed
Fixes#26584
Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>
(cherry picked from commit c1b3fe30be)
So far it was not allowed to pass a scope when using
the primary_replica_only option, this patch enables
it because the concepts are now combined so that:
- scope=all primary_replica_only=true gets the global primary replica
- scope=dc primary_replica_only=true gets the local primary replica
- scope=rack primary_replica_only=true is like a noop, it gets the only
replica in the rack (rf=#racks)
- scope=node primary_replica_only=node is not allowed
Fixes#26584
Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>
(cherry picked from commit 83aee954b4)
This patch fixes 2 issues at one go:
First, Currently sstables::load clears the sharding metadata
(via open_data()), and so scylla-sstable always prints
an empty array for it.
Second, printing token values would generate invalid json
as they are currently printed as binary bytes, and they
should be printed simply as numbers, as we do elsewhere,
for example, for the first and last keys.
Fixes#26982
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closesscylladb/scylladb#26991
(cherry picked from commit f9ce98384a)
Closesscylladb/scylladb#27042
Some tools commands have links to online documentation in their help output. These links were left behind in the source-available change, they still point to the old opensource docs. Furthermore, the links in the scylla-sstable help output always point to the latest stable release's documentation, instead of the appropriate one for the branch the tool was built from. Fix both of these.
Fixes: scylladb/scylladb#26320
Broken documentation link fix for the tool help output, needs backport to all live source-available versions.
- (cherry picked from commit 5a69838d06)
- (cherry picked from commit 15a4a9936b)
- (cherry picked from commit fe73c90df9)
Parent PR: #26322Closesscylladb/scylladb#26390
* github.com:scylladb/scylladb:
tools/scylla-sstable: fix doc links
release: adjust doc_link() for the post source-available world
tools/scylla-nodetool: remove trailing " from doc urls
Using the name regular as the incremental mode could be confusing, since
regular might be interpreted as the non-incremental repair. It is better
to use incremental directly.
Before:
- regular (standard incremental repair)
- full (full incremental repair)
- disabled (incremental repair disabled)
After:
- incremental (standard incremental repair)
- full (full incremental repair)
- disabled (incremental repair disabled)
Fixes#26503Closesscylladb/scylladb#26504
(cherry picked from commit 13dd88b010)
Closesscylladb/scylladb#26521
The doc links in scylla-sstable help output are static, so they always
point to the documentation of the latest stable release, not to the
documentation of the release the tool binary is from. On top of that,
the links point to old open-source documentation, which is now EOL.
Fix both problems: point link at the new source-available documentation
pages and make them version aware.
(cherry picked from commit fe73c90df9)
Extend the `sstable_format` config enum with a "ms" value,
and, if it's enabled (in the config and in cluster features),
use it for new sstables on the node.
(Before this commit, writing `ms` sstables should only be possible
in unit tests, via internal APIs. After this commit, the format
can be enabled in the config and the database will write it during
normal operation).
As of this commit, the new format is not the default yet.
(But it will become the default in a later commit in the same series).
Introduce `ms` -- a new sstable format version which
is a hybrid of Cassandra's `me` and `da`.
It is based on `me`, but with the index components
(Summary.db and Index.db) replaced with the index
components of `da` (Partitions.db and Rows.db).
As of this patch, the version is never chosen
anywhere for writing sstables yet. It is only introduced.
We will add it to unit tests in a later commit,
and expose it to users in yet later commit.
Later in this patch series we will introduce `ms` as the new highest
format, but we won't be able to make it the default within the same
series due to some dtest incompatibilities.
Until `ms` is the default, we don't `scylla sstable` to default to
it, even though it's the highest. Let's choose the default
version in `scylla sstable` using the same method which is
used by Scylla in general: by letting the `sstable_manager` choose.
An offline, scylla-sstable variant of nodetool upgradesstables command.
Applies latest (or selected) sstable version and latest schema.
Closesscylladb/scylladb#26109
Sstables store a basic schema in the statistics component. The scylla-sstable tool uses this to be able to read and dump sstables in a self-contained manner, without requiring an external schema source.
The problem is that the schema stored int he statistics component is incomplete: it doesn't store column names for key columns, so these have placeholder names in dump outputs where column names are visible.
This is not a disaster but it is confusing and it can cause errors in scripts which want to check the content of sstables, while also knowing the schema and expecting the proper names for key columns.
To make sstables truly self-contained w.r.t. the schema, add a complete schema to the scylla component. This schema contains the names and types of all columns, as well as some basic information about the schema: keyspace name, table name, id and version.
When available, scylla-sstable's schema loader will use this new more complete schema and fall-back to the old method of loading the (incomplete) schema from the statistics component otherwise.
New feature, no backport required.
Closesscylladb/scylladb#24187
* github.com:scylladb/scylladb:
test/boost/schema_loader_test: add specific test with interesting types
test/lib/random_schema: add random_schema(schema_ptr) constructor
test/boost/schema_loader_test: test_load_schema_from_sstable: add fall-back test
tools/schema_loader: add support for loading from scylla-metadata
tools/schema_loader: extract code which load schema from statistics
sstables: scylla_metadata: add schema member
Some files in compaction/ have using namespace {compaction,sstables}
clauses, some even in headers. This is considered bad practice and
muddies the namespace use. Remove them.
The namespace usage in this directory is very inconsistent, with files
and classes scattered in:
* global namespace
* namespace compaction
* namespace sstables
With cases, where all three used in the same file. This code used to
live in sstables/ and some of it still retains namespace sstables as a
heritage of that time. The mismatch between the dir (future module) and
the namespace used is confusing, so finish the migration and move all
code in compaction/ to namespace compaction too.
This patch, although large, is mechanic and only the following kind of
changes are made:
* replace namespace sstable {} with namespace compaction {}
* add namespace compaction {}
* drop/add sstables::
* drop/add compaction::
* move around forward-declarations so they are in the correct namespace
context
This refactoring revealed some awkward leftover coupling between
sstables and compaction, in sstables/sstable_set.cc, where the
make_sstable_set() methods of compaction strategies are implemented.
When available, load the schema from the Scylla component, where the
column names of keys are also available. Fall-back to loading the schema
from the Statistics component otherwise (previous behaviour).
To store the most important schema fields, like id, version, keyspace
name, table name and the list of all columns, along with their kind,
name and type. This will serve as alternative schema source to the one
stored in statistics component. This latter one doesn't store any of the
metatada and neither does primary key names (just the types), so it is
leads to confusion when it is used as schema source for scylla-sstable.
This new schema stored in the scylla-metadata component is not intended
to be a full-schema, equivalent to the one stored in the schema tables,
it is intended to be good enough for scylla-sstable being able to parse
sstables in a self-sufficient manner.
Acecssing this member directly is deprecated, migrate code to use {get,set}_query_param() and friends instead.
Fixes: https://github.com/scylladb/scylladb/issues/26023
Preparation for seastar update, no backport required.
Closesscylladb/scylladb#26024
* github.com:scylladb/scylladb:
treewide: move away from accessing httpd::request::query_parameters
test/pylib/s3_server_mock.py: better handle empty query params
Passing `0` as the `initial_tablets` argument causes `schema_loaders`'s
placeholder keyspace to be a tablet keyspace.
This causes `scylla sstable` to reject some table schemas which
are legitimate in this context. For example, `scylla sstable`
refuses to work with sstables which contains `counter` columns,
because tablets don't support counters.
This is undesirable. Let's make `schema_loader`'s keyspace
a non-tablet keyspace.
Closesscylladb/scylladb#26192
`purgeable.lua` was written for a specific investigation a few years ago.
`writetime-histogram.lua` is an sstable script transcription of the former scylla-sstable writetime-histogram command. This was also written for an investigation (before script command existed) and is too specific to be a native command, so was removed by edaf67edcb.
Add both scripts to the sample script library, they can be useful, either for a future investigation, or as samples to copy+edit to write new scripts (and train AI).
New sstable scripts, no backport
Closesscylladb/scylladb#26137
* github.com:scylladb/scylladb:
tools/scylla-sstable-scripts: introduce writetime-histogram.lua
tools/scylla-sstable-scripts: introduce purgable.lua
Produces a histogram with the writetime (timestamp) of the data in the
sstable(s). The histogram is printed to the output, along with general
stats about the processed data.
Collects and prints statistics about how much data is purgeable in an
sstable. Works only with tombstone_gc = {'mode': 'timeout'};
Can help diagnosing the efficiency (or lack of) tombstone-gc.
when cluster repair is run for an entire keyspace, nodetool makes a
repair api request for each table.
if the keyspace contains colocated tables, then the api request for the
colocated tables will fail, because currently scylla doesn't allow making
repair requests for specific colocated tables, but only for base tables.
if the request is to repair an entire keyspace then we can ignore this,
because we will make a repair request for all base tables, and this in
turn will repair also all the colocated tables in the keyspace.
however if specific tables are requested and some of them are colocated
then we should propagate the error to let the user know the request is
invalid.
Refs https://github.com/scylladb/scylladb/issues/24816
no backport - no colocated tablets in previous releases
Closesscylladb/scylladb#26051
* github.com:scylladb/scylladb:
nodetool: ignore repair request error of colocated tables
storage_service: improve error message on repair of colocated tables
The `json_mutation_stream_parser.cc` file was not included in the
`scylla-tools` CMake target. This could lead to "undefined reference"
linker errors when building with CMake.
This commit adds the missing source file to the target's source list.
Closesscylladb/scylladb#26108
This command was written for an investigation and was used exactly once.
This would have been a perfect candidate for the (also rarely used)
scylla-sstable script command, but it didn't exist yet.
Drop this command from the tool, such super-specific commands should be
written as sstable-scripts nowadays, which is what we will do if we ever
need this again.
Closesscylladb/scylladb#26062
when cluster repair is run for an entire keyspace, nodetool makes a
repair api request for each table.
if the keyspace contains colocated tables, then the api request for the
colocated tables will fail, because currently scylla doesn't allow making
repair requests for specific colocated tables, but only for base tables.
if the request is to repair an entire keyspace then we can ignore this,
because we will make a repair request for all base tables, and this in
turn will repair also all the colocated tables in the keyspace.
however if specific tables are requested and some of them are colocated
then we should propagate the error to let the user know the request is
invalid.
Refs scylladb/scylladb#24816
tools/scylla-sstable.cc has 3.5k SLOC, out of which this class alone is 1K. Extract into own hh and cc. Since this class was already using pimpl, the header remains nice and small.
Code cleanup, no backport needed.
Closesscylladb/scylladb#26064
* github.com:scylladb/scylladb:
tools: extract json_mtuation_stream_parser to its own hh,cc files
tools/scylla-sstable: fix indentation
tools/scylla-sstable: prepare for extracting json_mutation_stream_parser
tools/scylla-sstable.cc has 3.5k SLOC, out of which this class alone is
1K. Extract into own hh and cc, just a copy-paste after the preparation
commit.
Make methods out-of-line, so class declaration stands on its own,
without definition of impl.
Move auxiliary structures, used only by impl, out of the class scope.
Move parser to tools namespace, and auxiliaries to anonymous namespace
within the tools one.
Pass down logger ref to parser impl and below, to prepare for sst_log
not being available in scope.
Add comment to parser class explaining what it does.
The `--incremental-mode` option specifies the incremental repair mode.
Can be 'disabled', 'regular', or 'full'.
'regular': The incremental repair logic is enabled. Unrepaired sstables
will be included for repair. Repaired sstables will be skipped. The
incremental repair states will be updated after repair.
'full': The incremental repair logic is enabled. Both repaired and
unrepaired sstables will be included for repair. The incremental repair
states will be updated after repair.
'disabled': The incremental repair logic is disabled completely. The
incremental repair states, e.g., repaired_at in sstables and
sstables_repaired_at in the system.tablets table, will not be updated
after repair.
When the option is not provided, it defaults to regular.
Fixes#25931Closesscylladb/scylladb#25969
We are moving away from integer generations, so stop using them.
Also drop the --generation command-line parameter, UUID generations
don't have be provided by the caller, because random UUIDs will not
collide with each other. To help the caller still know what generation
the output sstable has (previously they provided it via --generation),
print the generation to stdout.
Closesscylladb/scylladb#25166
Enabling the configuration option should have no negative impact on how the tool
behaves. There is no topology and we do not create any keyspaces (except for
trivial ones using `SimpleStrategy` and RF=1), only their metadata. Thanks to
that, we don't go through validation logic that could fail in presence of an
RF-rack-invalid keyspace.
On the other hand, enabling `rf_rack_valid_keyspaces` lets the tool access code
hidden behind that option. While that might not be of any consequence right now,
in the future it might be crucial (for instance, see: scylladb/scylladb#23030).
Note that other tools don't need an adjustment:
* scylla-types: it uses schema_builder, but it doesn't reuse any other
relevant part of Scylla.
* nodetool: it manages Scylla instances but is not an instance itself, and it
does not reuse any codepaths.
* local-file-key-generator: it has nothing to do with Scylla's logic.
Other files in the `tools` directory are auxiliary and are instructed with an
already created instance of `db::config`. Hence, no need to modify them either.
Fixesscylladb/scylladb#25792Closesscylladb/scylladb#25794
The memory usage is tracked with the help of a semaphore, so just export
its "consumed" units.
One tricky place here is the need to skip metrics registration for
scylla-sstable tool. The thing is that the tools starts the storage
manager and sstables manager on start and then some of tool's operations
may want to start both managers again (via cql environment) causing
double metrics registration exception.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#25769
Add precompiled header support to CMakeLists.txt and configure.py -
it improves compilation time by approximately 10%.
New header `stdafx.hh` is added, don't include it manually -
the compiler will include it for you. The header contains includes from
external libraries used by Scylla - seastar, standard library,
linux headers and zlib.
The feature is enabled by default, use CMake option `Scylla_USE_PRECOMPILED_HEADER`
or configure.py --disable-precompiled-header to disable.
The feature should be disabled, when trying to check headers - otherwise
you might get false negatives on missing includes from seastar / abseil and so on.
Note: following configuration needs to be added to ccache.conf:
sloppiness = pch_defines,time_macros
Closes#25182
To avoid dependency proliferation, switch to forward declarations.
In one case, we introduce indirection via std::unique_ptr and
deinline the constructor and destructor.
Ref #1Closesscylladb/scylladb#25584
The central idea of incremental repair is to allow repair participants
to select and repair only a portion of the dataset to speed up the
repair process. All repair participants must utilize an identical
selection method to repair and synchronize the same selected dataset.
There are two primary selection methods: time-based and file-based. The
time-based method selects data within a specified time frame. It is
versatile but it is less efficient because it requires reading all of
the dataset and omitting data beyond the time frame. The file-based
method selects data from unrepaired SSTables and is more efficient
because it allows the entire SSTable to be omitted. This document patch
implements the file-based selection method.
Incremental repair will only be supported for tablet tables; it will not
be supported for vnode tables. On one hand, the legacy vnode is less
important to support. On the other hand, the incremental repair for
vnode is much harder to implement. With vnodes, a SSTalbe could contain
data for multiple vnode ranges. When a given vnode range is repaired,
only a portion of the SSTable is repaired. This complicates the
manipulation of SSTables significantly during both repair and
compaction. With tablets, an entire tablet is repaired so that a
sstable is either fully repaired or not repaired which is a huge
simplification.
This patch uses the repaired_at from sstables::statistics component to
mark a sstable as repaired. It uses a virtual clock as the repair
timestamp, i.e., using a monotonically increasing number for the
repaired_at field of a SSTable and sstables_repaired_at column in
system.tablets table. Notice that when a sstable is not repaired, the
repaired_at field will be set to the default value 0 by default. The
being_repaired in memory field of a SSTable is used to explicitly mark
that a SSTable is being selected. The following variables are used for
incremental repair:
The repaired_at on disk field of a SSTable is used.
- A 64-bit number increases sequentially
The sstables_repaired_at is added to the system.tablets table.
- repaired_at <= sstables_repaired_at means the sstable is repaired
The being_repaired in memory field of a SSTable is added.
- A repair UUID tells which sstable has participated in the repair
Initial test results:
1) Medium dataset results
Node amount: 3
Instance type: i4i.2xlarge
Disk usage per node: ~500GB
Cluster pre-populated with ~500GB of data before starting repairs job.
Results for Repair Timings:
The regular repair run took 210 mins.
Incremental repair 1st run took 183 mins, 2nd and 3rd runs took around 48s
The speedup is: 183 mins / 48s = 228X
2) Small dataset results
Node amount: 3
Instance type: i4i.2xlarge
Disk usage per node: ~167GB
Cluster pre-populated with ~167GB of data before starting the repairs job.
Regular repair 1st run took 110s, 2nd and 3rd runs took 110s.
Incremental repair 1st run took 110 seconds, 2nd and 3rd run took 1.5 seconds.
The speedup is: 110s / 1.5s = 73X
3) Large dataset results
Node amount: 6
Instance type: i4i.2xlarge, 3 racks
50% of base load, 50% read/write
Dataset == Sum of data on each node
Dataset Non-incremental repair (minutes)
1.3 TiB 31:07
3.5 TiB 25:10
5.0 TiB 19:03
6.3 TiB 31:42
Dataset Incremental repair (minutes)
1.3 TiB 24:32
3.0 TiB 13:06
4.0 TiB 5:23
4.8 TiB 7:14
5.6 TiB 3:58
6.3 TiB 7:33
7.0 TiB 6:55
Fixes#22472Closesscylladb/scylladb#24291
* github.com:scylladb/scylladb:
replica: Introduce get_compaction_reenablers_and_lock_holders_for_repair
compaction: Move compaction_reenabler to compaction_reenabler.hh
topology_coordinator: Make rpc::remote_verb_error to warning level
repair: Add metrics for sstable bytes read and skipped from sstables
test.py: Disable incremental for test_tombstone_gc_for_streaming_and_repair
test.py: Add tests for tablet incremental repair
repair: Add tablet incremental repair support
compaction: Add tablet incremental repair support
feature_service: Add TABLET_INCREMENTAL_REPAIR feature
tablet_allocator: Add tablet_force_tablet_count_increase and decrease
repair: Add incremental helpers
sstable: Add being_repaired to sstable
sstables: Add set_repaired_at to metadata_collector
mutation_compactor: Introduce add operator to compaction_stats
tablet: Add sstables_repaired_at to system.tablets table
test: Fix drain api in task_manager_client.py
This patch addes incremental_repair support in compaction.
- The sstables are split into repaired and unrepaired set.
- Repaired and unrepaired set compact sperately.
- The repaired_at from sstable and sstables_repaired_at from
system.tablets table are used to decide if a sstable is repaired or
not.
- Different compactions tasks, e.g., minor, major, scrub, split, are
serialized with tablet repair.