This reverts commit 8192f45e84.
The merge exposed a bug where truncate (via drop) fails and causes Raft
errors, leading to schema inconsistencies across nodes. This results in
test_table_drop_with_auto_snapshot failures with 'Keyspace test does not exist'
errors.
The specific problematic change was in commit 19b6207f which modified
truncate_table_on_all_shards to set use_sstable_identifier = true. This
causes exceptions during truncate that are not properly handled, leading
to Raft applier fiber stopping and nodes losing schema synchronization.
To be used for naming sstables in the snapshot by their
sstable identifiers rather than their generation, to
facilitate global deduplication of sstables in backup.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
This series allows an operator to reset 'cleanup needed' flag if he already cleaned up the node, so that automatic cleanup will not do it again. We also change 'nodetool cleanup' back to run cleanup on one node only (and reset 'cleanup needed' flag in the end), but the new '--global' option allows to run cleanup on all nodes that needed it simultaneously.
Fixes https://github.com/scylladb/scylladb/issues/26866
Backport to all supported version since automatic cleanup behaviour as it is now may create unexpected by the operator load during cluster resizing.
Closesscylladb/scylladb#26868
* https://github.com/scylladb/scylladb:
cleanup: introduce "nodetool cluster cleanup" command to run cleanup on all dirty nodes in the cluster
cleanup: Add RESTful API to allow reset cleanup needed flag
97ab3f6622 changed "nodetool cleanup" (without arguments) to run
cleanup on all dirty nodes in the cluster. This was somewhat unexpected,
so this patch changes it back to run cleanup on the target node only (and
reset "cleanup needed" flag afterwards) and it adds "nodetool cluster
cleanup" command that runs the cleanup on all dirty nodes in the
cluster.
Cleaning up a node using per keyspace/table interface does not reset cleanup
needed flag in the topology. The assumption was that running cleanup on
already clean node does nothing and completes quickly. But due to
https://github.com/scylladb/scylladb/issues/12215 (which is closed as
WONTFIX) this is not the case. This patch provides the ability to reset
the flag in the topology if operator cleaned up the node manually
already.
Current native restore does not support primary_replica_only, it is
hard-coded disabled and this may lead to data amplification issues.
This patch extends the restore REST API to accept a
primary_replica_only parameter and propagates it to
sstables_loader so it gets correctly passed to
load_and_stream.
Fixes#26584
Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>
Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>
If a node is dead and cannot be brought back, tablet migrations are
stuck, until the node is explicitly marked as "permanently dead" /
"ignored node" / "excluded" (name differs in different contexts).
Currently, this is done during removenode and replace operations but
it should be possible to only mark the node as dead, for the purpose
of unblocking migrations or other topology operations, without doing
the actual removenode, because full removal might be currently
impossible, or not desirable due to lack of capacity or priorities.
This patch introduces this kind of API:
nodetool excludenode <host-id> [ ... <host-id> ]
Having this kind of API is an improvement in user experience in
several cases. For example, when we lose a rack, the only viable
option for recovery is to run removenode with an extra
--ignore-dead-nodes option. This removenode will fail in the tablet
draining phase, as there is no live node in the rack to rebuild
replicas in. This is confusing to the operator. But necessary before
ALTER KEYSPACE can proceed in order to change replication options to
drop the rack from RF.
Having this API allows operators to have more unified procedures,
where "nodetool excludenode" is always the first step of recovery,
which unblocks further topology operations, both those which restore
capacity, but also auto-scaling, tablet split/merge, load balancing,
etc.
Fixes#21281
This patch adds a new flag `drop-unfixable-sstables` to the scrub operation
in segregate mode, allowing to automatically drop SSTables that
cannot be fixed during scrub. It also includes API support of the 'drop_unfixable_sstables'
paramater and validation to ensure this flag is not enabled in other modes rather than segragate.
Seastar httpd recommended users to stop using contiguous requet.content string and read body they need from request's input_stream instead. However, "official" deprecation of request content had been only made recently.
This PR patches REST API server to turn this feature on and patches few handlers that mess with request bodies to read them from request stream.
Using newer seastar API, no need to backport
Closesscylladb/scylladb#26418
* github.com:scylladb/scylladb:
api: Switch to request content streaming
api: Fix indentation after previous patch
api: Coroutinize set_relabel_config handler
api: Coroutinize set_error_injection handler
There are three handler that need to be patched all at once with the
server itself being marked with set_content_streaming
For two simple handler just get the content string with
read_entire_stream_contiguous helper. This is what httpd server did
anyway.
The "start_restore" handler used the contiguous contents to parse json
from using rjson utility. This handler is patched to use
read_entire_stream() that returns a vector of temporary buffers. The
rjson parser has a helper to pars from that vector, so the change is
also optimization.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The handler effectively works with the view_builder and should be
registerd in the block that has this service captured.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Before, the `nodetool getendpoints` expected the key as one string separated by : (for example 1:val:ue). This caused errors if any part of the key had a colon because it was unclear whether a colon was a separator or part of the key.
This change adds a new API endpoint, `/storage_service/natural_endpoints/v2/{keyspace}`, which accepts composite partition keys as multiple key_component query parameters (e.g., ?key_component=1&key_component=val:ue). The `nodetool getendpoints` command was updated to support a new `--key-components` option, allowing users to pass key components as an array. The client and test infrastructure were extended to support multiple values for a query parameter, and tests were added to verify correct behavior with composite keys.
The previous method of passing partition keys as colon-separated strings is preserved for backward compatibility.
Backport is not required, since this change relies on recent Seastar updates
Fixes#16596Closesscylladb/scylladb#26169
* github.com:scylladb/scylladb:
docs: document --key-components option for getendpoints
test/nodetool/test_getendpoints: add coverage for --key-components param in getendpoints
nodetool: Introduce new option --key-components to specify compound partition keys as array
rest_api/test_storage_service: add v2 natural_endpoints test for composite key with multiple components
api/storage_service: add GET 'natural_endpoints' v2 to support composite keys with ':'
rest_api_mock: support duplicate query parameters
test/rest_api: support multiple query values per key in RestApiSession.send()
nodetool: add support of new seastar query_parameters_type to scylla_rest_client
The original `/storage_service/natural_endpoints` endpoint uses colon-separated strings for composite keys,
which causes ambiguity when key components contained colons.
This commits adds a new `/storage_service/natural_endpoints/v2/{keyspace}` endpoint that accepts partition key components
via repeated `key_component` query parameters to avoid this issue.
The handler in question when called for tablets-enabled keyspace, returns ranges that are inconsistent with those from system.tablets. Like this:
system.tablets:
```
TabletReplicas(last_token=-4611686018427387905, replicas=[('e43ce450-2834-4137-92b7-379bb37684d1', 0), ('67c82fc2-8ef9-4dd9-8cf6-c7f9372ce207', 0)])
TabletReplicas(last_token=-1, replicas=[('22c84cba-d8d0-4d20-8d46-eb90865bb612', 0), ('67c82fc2-8ef9-4dd9-8cf6-c7f9372ce207', 1)])
TabletReplicas(last_token=4611686018427387903, replicas=[('22c84cba-d8d0-4d20-8d46-eb90865bb612', 1), ('67c82fc2-8ef9-4dd9-8cf6-c7f9372ce207', 1)])
TabletReplicas(last_token=9223372036854775807, replicas=[('e43ce450-2834-4137-92b7-379bb37684d1', 1), ('22c84cba-d8d0-4d20-8d46-eb90865bb612', 0)])
```
range_to_endpoint_map:
```
{'key': ['-9069053676502949657', '-8925522303269734226'], 'value': ['127.110.40.2', '127.110.40.3']}
{'key': ['-8925522303269734226', '-8868737574445419305'], 'value': ['127.110.40.2', '127.110.40.3']}
...
{'key': ['-337928553869203886', '-288500562444694340'], 'value': ['127.110.40.1', '127.110.40.3']}
{'key': ['-288500562444694340', '105026475358661740'], 'value': ['127.110.40.1', '127.110.40.3']}
{'key': ['105026475358661740', '611365860935890281'], 'value': ['127.110.40.1', '127.110.40.3']}
...
{'key': ['8307064440200319556', '9117218379311179096'], 'value': ['127.110.40.2', '127.110.40.1']}
{'key': ['9117218379311179096', '9125431458286674075'], 'value': ['127.110.40.2', '127.110.40.1']}
```
Not only the number of ranges differs, but also separating tokens do not match (e.g. tokens -2 and 0 belong to different tablets according to system.tablets, but fall into the same "range" in the API result).
The source of confusion is that despite storage_service::get_range_to_address_map() is given correct e.r.m. pointer from the table, it still uses token_metadata::sorted_token() to work with. The fix is -- when the e.r.m. is per-table, the tokens should be get from token_metadata's tablet_map (e.g. compare this to storage_service::effective_ownership() -- it grabs tokens differently for vnodes/tables cases).
This PR fixes the mentioned problem and adds validation test. The test also checks /storage_service/describe_ring endpoint that happens to return correct set of values.
The API is very ancient, so the bug is present in all versions with tablets
Fixes#26331Closesscylladb/scylladb#26231
* github.com:scylladb/scylladb:
test: Add validation of data returned by /storage_service endpoints
test,lib: Add range_to_endpoint_map() method to rest client
api: Indentation fix after previous patches
storage_service: Get tablet tokens if e.r.m. is per-table
storage_service,api: Get e.r.m. inside get_range_to_address_map()
storage_service: Calculate tokens on stack
There is still some compaction related code left in `sstables/`, move this to `compaction/` to make the compaction module more self-contained.
Code cleanup, no backport.
Closesscylladb/scylladb#26277
* github.com:scylladb/scylladb:
sstables,compaction: move make_sstable_set() implementations to compactions/
sstables,compaction: move compaction exceptions to compaction/
Seastar API level 8 changes a function type from std::function to
noncopyable_function. Apply those changes in tree and update the build
configuration.
Closesscylladb/scylladb#26006
sstables/exceptions.hh still hosts some compaction specific exception
types. Move them over to the new compaction/exceptions.hh, to make the
compaction module more self-contained.
Some files in compaction/ have using namespace {compaction,sstables}
clauses, some even in headers. This is considered bad practice and
muddies the namespace use. Remove them.
The namespace usage in this directory is very inconsistent, with files
and classes scattered in:
* global namespace
* namespace compaction
* namespace sstables
With cases, where all three used in the same file. This code used to
live in sstables/ and some of it still retains namespace sstables as a
heritage of that time. The mismatch between the dir (future module) and
the namespace used is confusing, so finish the migration and move all
code in compaction/ to namespace compaction too.
This patch, although large, is mechanic and only the following kind of
changes are made:
* replace namespace sstable {} with namespace compaction {}
* add namespace compaction {}
* drop/add sstables::
* drop/add compaction::
* move around forward-declarations so they are in the correct namespace
context
This refactoring revealed some awkward leftover coupling between
sstables and compaction, in sstables/sstable_set.cc, where the
make_sstable_set() methods of compaction strategies are implemented.
Now it's the caller (API handler) that gets e.r.m. from keyspace or
table, and this patch moves this selection into the callee.
This is preparational change. Next patch will need to pass optional
table_id to get_range_to_address_map(), and to make this table_id
presense consistent with whether e.r.m. is per table or not, it's
simpler to move e.r.m. evaluation into the latter method as well.
(indentation in API handler is deliberately left broken)
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
currently repair requests can't be added or deleted on non-base
colocated tables. improve the error message and comments to be more
clear and detailed.
The handler uses database service, not storage_service, and should
belong to the corresponding API module from column_family.cc
Once moved, the handler can use captured sharded<database> reference and
forget about http_context::db.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#25834
This patch introduces a new `incremental_mode` parameter to the tablet
repair REST API, providing more fine-grained control over the
incremental repair process.
Previously, incremental repair was on and could not be turned off. This
change allows users to select from three distinct modes:
- `regular`: This is the default mode. It performs a standard
incremental repair, processing only unrepaired sstables and skipping
those that are already repaired. The repair state (`repaired_at`,
`sstables_repaired_at`) is updated.
- `full`: This mode forces the repair to process all sstables, including
those that have been previously repaired. This is useful when a full
data validation is needed without disabling the incremental repair
feature. The repair state is updated.
- `disabled`: This mode completely disables the incremental repair logic
for the current repair operation. It behaves like a classic
(pre-incremental) repair, and it does not update any incremental
repair state (`repaired_at` in sstables or `sstables_repaired_at` in
the system.tablets table).
The implementation includes:
- Adding the `incremental_mode` parameter to the
`/storage_service/repair/tablet` API endpoint.
- Updating the internal repair logic to handle the different modes.
- Adding a new test case to verify the behavior of each mode.
- Updating the API documentation and developer documentation.
Fixes#25605Closesscylladb/scylladb#25693
Parsiong scrub options may throw after a snapshot is taken thus leaving
it on disk even though an operation reported as "failed". Not, probably,
critical, but not nice either.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The handler validates if the given ks:cf pair exists in the database,
then finds the table id to process further. There's a helper that does
both.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#25669
POSTing on the same URL launches storage_service::drain(), so GETing on
it should (not that it's restriced somehow, but still) work on the same
service. This changes removes one more user of http_context::database
which in turn will allow removding database reference from context
eventually.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#25677
Both handlers need database to proceed and thus need to be registered
(and unregistered) in a group that captures database for its handlers.
Once moved, the used get_cf_stats() method can be marked local to
column_family.cc file.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#25671
Now it accepts http context and immediately gets the database from it to
pass to map_reduce_cf. Callers are updated to pass database from where
the context they already have.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Derive both vnode_effective_replication_map
and local_effective_replication_map from
static_effective_replication_map as both are static and per-keyspace.
However, local_effective_replication_map does not need vnodes
for the mapping of all tokens to the local node.
Refs #22733
* No backport required
Closesscylladb/scylladb#25222
* github.com:scylladb/scylladb:
locator: abstract_replication_strategy: implement local_replication_strategy
locator: vnode_effective_replication_map: convert clone_data_gently to clone_gently
locator: abstract_replication_map: rename make_effective_replication_map
locator: abstract_replication_map: rename calculate_effective_replication_map
replica: database: keyspace: rename {create,update}_effective_replication_map
locator: effective_replication_map_factory: rename create_effective_replication_map
locator: abstract_replication_strategy: rename vnode_effective_replication_map_ptr et. al
locator: abstract_replication_strategy: rename global_vnode_effective_replication_map
keyspace: rename get_vnode_effective_replication_map
dht: range_streamer: use naked e_r_m pointers
storage_service: use naked e_r_m pointers
alternator: ttl: use naked e_r_m pointers
locator: abstract_replication_strategy: define is_local
to get_static_effective_replication_map, in preparation
for separating local_effective_replication_map from
vnode_effective_replication_map (both are per-keyspace).
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Prefer for specializing the local replication strategy,
local effective replication map, et. al byt defining
an is_local() predicate, similar to uses_tablets().
Note that is_vnode_based() still applies to local replication
strategy.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Added a new POST endpoint `/storage_service/drop_quarantined_sstables` to the REST API.
This endpoint allows dropping all quarantined SSTables either globally or
for a specific keyspace and tables.
Optional query parameters `keyspace` and `tables` (comma-separated table names) can be
provided to limit the scope of the operation.
Fixesscylladb/scylladb#19061
Backport is not required, it is new functionality
Closesscylladb/scylladb#25063
* github.com:scylladb/scylladb:
docs: Add documentation for the nodetool dropquarantinedsstables command
nodetool: add command for dropping quarantine sstables
rest_api: add endpoint which drops all quarantined sstables
Instead of using lambda, pass pointer to struct member. The result is
the same, but the code is nicer.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#25123