Commit Graph

321 Commits

Author SHA1 Message Date
Aleksandra Martyniuk
f48b57e7b9 compaction: use table_info in compaction tasks
Task manager compaction tasks need table names for logs.
Thus, compaction tasks store table infos instead of table ids.

get_table_ids function is deleted as it isn't used anywhere.
2023-05-30 09:58:55 +02:00
Aleksandra Martyniuk
4206139e5a api: move table_info to schema/schema_fwd.hh
table_info is moved from api/storage_service.hh to schema/schema_fwd.hh
so that it could be used in task manager's tasks.
2023-05-30 09:57:21 +02:00
Kefu Chai
2fbcbc09b0 api: specialize fmt::formatter<api::table_info>
this is a part of a series to migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print `api::table_info` without the help of `operator<<`.

but the corresponding `operator<<()` is preserved in this change, as we
still have lots of callers relying on this << operator instorage_service.cc
where std::vector<table_info> is formatted using operator<<(ostream&, const Range&)
defined in to_string.hh. we could have used fmt/ranges.h to print the
std::vector<table_info>. but the combination of operator<<(ostream&, const Range&)
and FMT_DEPRECATED_OSTREAM renders this impossible. because
unlike the builtin range formatter specializations, the fallback formatter
synthesized from the operator<< does not have brackets defined for
the range printer. the brackets are used as the left and right marks
of the range, for instance, the array-alike containers are printed
like [1,2,3], while the tuple-alike containers are printed like
(1,2,3). once we are allowed to remove FMT_DEPRECATED_OSTREAM, we
should be able to use the builtin range formatter, and remove the
operator<< for api::table_info by then.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13975
2023-05-24 09:49:44 +03:00
Tomasz Grabiec
9d4bca26cc Merge 'raft topology: implement check_and_repair_cdc_streams API' from Kamil Braun
`check_and_repair_cdc_streams` is an existing API which you can use when the
current CDC generation is suboptimal, e.g. after you decommissioned a node the
current generation has more stream IDs than you need. In that case you can do
`nodetool checkAndRepairCdcStreams` to create a new generation with fewer
streams.

It also works when you change number of shards on some node. We don't
automatically introduce a new generation in that case but you can use
`checkAndRepairCdcStreams` to create a new generation with restored
shard-colocation.

This PR implements the API on top of raft topology, it was originally
implemented using gossiper.  It uses the `commit_cdc_generation` topology
transition state and a new `publish_cdc_generation` state to create new CDC
generations in a cluster without any nodes changing their `node_state`s in the
process.

Closes #13683

* github.com:scylladb/scylladb:
  docs: update topology-over-raft.md
  test: topology_experimental_raft: test `check_and_repair_cdc` API
  raft topology: implement `check_and_repair_cdc_streams` API
  raft topology: implement global request handling
  raft topology: introduce `prepare_new_cdc_generation_data`
  raft_topology: `get_node_to_work_on_opt`: return guard if no node found
  raft topology: remove `node_to_work_on` from `commit_cdc_generation` transition
  raft topology: separate `publish_cdc_generation` state
  raft topology: non-node-specific `exec_global_command`
  raft topology: introduce `start_operation()`
  raft topology: non-node-specific `topology_mutation_builder`
  topology_state_machine: introduce `global_topology_request`
  topology_state_machine: use `uint16_t` for `enum_class`es
  raft topology: make `new_cdc_generation_data_uuid` topology-global
2023-05-22 11:33:58 +02:00
Kefu Chai
b112a3b78a api: storage_service: use string for generation
in this change, the type of the "generation" field of "sstable" in the
return value of RESTful API entry point at
"/storage_service/sstable_info" is changed from "long" to "string".

this change depends on the corresponding change on tools/jmx submodule,
so we have to include the submodule change in this very commit.

this API is used by our JMX exporter, which in turn exposes the
SSTable information via the "StorageService.getSSTableInfo" mBean
operation, which returns the retrieved SSTable info as a list of
CompositeData. and "generation" is a field of an element in the
CompositeData. in general, the scylla JMX exporter is consumed
by the nodetool, which prints out returned SSTable info list with
a pretty formatted table, see
tools/java/src/java/org/apache/cassandra/tools/nodetool/SSTableInfo.java.
the nodetool's formatter is not aware of the schema or type of the
SSTables to be printed, neither does it enforce the type -- it just
tries it best to pretty print them as a tabular.

But the fields in CompositeData is typed, when the scylla JMX exporter
translates the returned SSTables from the RESTful API, it sets the
typed fields of every `SSTableInfo` when constructing `PerTableSSTableInfo`.
So, we should be consistent on the type of "generation" field on both
the JMX and the RESTful API sides. because we package the same version
of scylla-jmx and nodetool in the same precompiled tarball, and enforce
the dependencies on exactly same version when shipping deb and rpm
packages, we should be safe when it comes to interoperability of
scylla-jmx and scylla. also, as explained above, nodetool does not care
about the typing, so it is not a problem on nodetool's front.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13834
2023-05-15 20:33:48 +03:00
Raphael S. Carvalho
abc1eae1c2 Add API to disable tombstone GC in compaction
Adding new APIs /column_family/tombstone_gc and
/storage_service/tombstone_gc.

Mimicks existing APIs /column_family/autocompaction and
/storage_service/autocompaction.

column_family variant must specify a single table only,
following existing convention.

whereas the storage_service one can specify an entire
keyspace, or a subset of a tables in a keyspace.

column_family API usage
-----

The table name must be in keyspace:name format

Get status:
curl -s -X GET "http://127.0.0.1:10000/column_family/tombstone_gc/ks:cf"

Enable GC
curl -s -X POST "http://127.0.0.1:10000/column_family/tombstone_gc/ks:cf"

Disable GC
curl -s -X DELETE "http://127.0.0.1:10000/column_family/tombstone_gc/ks:cf"

storage_service API usage
-----

Tables can be specified using a comma-separated list.

Enable GC on keyspace
curl -s -X POST "http://127.0.0.1:10000/storage_service/tombstone_gc/ks"

Disable GC on keyspace
curl -s -X DELETE "http://127.0.0.1:10000/storage_service/tombstone_gc/ks"

Enable GC on a subset of tables
curl -s -X POST
"http://127.0.0.1:10000/storage_service/tombstone_gc/ks?cf=table1,table2"

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-05-12 10:34:38 -03:00
Raphael S. Carvalho
07104393af api: storage_service: restore indentation
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-05-12 10:34:36 -03:00
Raphael S. Carvalho
501b5a9408 api: storage_service: extract code to set attribute for a set of tables
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-05-12 10:33:50 -03:00
Botond Dénes
bb62038119 Merge 'Scrub compaction task' from Aleksandra Martyniuk
Task manager's tasks covering scrub compaction on top,
shard and table level.

For this levels we have common scrub tasks for each scrub
mode since they share code. Scrub modes will be differentiated
on compaction group level.

Closes #13694

* github.com:scylladb/scylladb:
  test: extend test_compaction_task.py to test scrub compaction
  compaction: add table_scrub_sstables_compaction_task_impl
  compaction: add shard_scrub_sstables_compaction_task_impl
  compaction: add scrub_sstables_compaction_task_impl
  api: get rid of unnecessary std::optional in scrub
  compaction: rename rewrite_sstables_compaction_task_impl
2023-05-10 14:18:20 +03:00
Aleksandra Martyniuk
8d32579fe6 compaction: add scrub_sstables_compaction_task_impl
Implementation of task_manager's task covering scrub sstables
compaction.
2023-05-09 11:13:57 +02:00
Aleksandra Martyniuk
79c39e4ea7 api: get rid of unnecessary std::optional in scrub
In scrub lambdas returning std::optional<compaction_stats> cannot
return empty value. Hence, std::optional wrapper isn't needed.
2023-05-09 10:31:44 +02:00
Kamil Braun
372a06f735 raft topology: implement check_and_repair_cdc_streams API
The original API is gossiper-based. Since we're moving CDC generations
handling to Raft-based topology, we need to implement this API as well.

For now the API creates a new generation unconditionally, in a follow-up
I'll introduce a check to skip the creation if the current generation is
optimal.
2023-05-08 16:49:01 +02:00
Kefu Chai
9b35faf485 treewide: replace generation_type::value() with generation_type::as_int()
* replace generation_type::value() with generation_type::as_int()
* drop generation_value()

because we will switch over to UUID based generation identifier, the member
function or the free function generation_value() cannot fulfill the needs
anymore. so, in this change, they are consolidated and are replaced by
"as_int()", whose name is more specific, and will also work and won't be
misleading even after switching to UUID based generation identifier. as
`value()` would be confusing by then: it could be an integer or a UUID.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-05-06 18:24:45 +08:00
Kamil Braun
30cc07b40d Merge 'Introduce tablets' from Tomasz Grabiec
This PR introduces an experimental feature called "tablets". Tablets are
a way to distribute data in the cluster, which is an alternative to the
current vnode-based replication. Vnode-based replication strategy tries
to evenly distribute the global token space shared by all tables among
nodes and shards. With tablets, the aim is to start from a different
side. Divide resources of replica-shard into tablets, with a goal of
having a fixed target tablet size, and then assign those tablets to
serve fragments of tables (also called tablets). This will allow us to
balance the load in a more flexible manner, by moving individual tablets
around. Also, unlike with vnode ranges, tablet replicas live on a
particular shard on a given node, which will allow us to bind raft
groups to tablets. Those goals are not yet achieved with this PR, but it
lays the ground for this.

Things achieved in this PR:

  - You can start a cluster and create a keyspace whose tables will use
    tablet-based replication. This is done by setting `initial_tablets`
    option:

    ```
        CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy',
                        'replication_factor': 3,
                        'initial_tablets': 8};
    ```

    All tables created in such a keyspace will be tablet-based.

    Tablet-based replication is a trait, not a separate replication
    strategy. Tablets don't change the spirit of replication strategy, it
    just alters the way in which data ownership is managed. In theory, we
    could use it for other strategies as well like
    EverywhereReplicationStrategy. Currently, only NetworkTopologyStrategy
    is augmented to support tablets.

  - You can create and drop tablet-based tables (no DDL language changes)

  - DML / DQL work with tablet-based tables

    Replicas for tablet-based tables are chosen from tablet metadata
    instead of token metadata

Things which are not yet implemented:

  - handling of views, indexes, CDC created on tablet-based tables
  - sharding is done using the old method, it ignores the shard allocated in tablet metadata
  - node operations (topology changes, repair, rebuild) are not handling tablet-based tables
  - not integrated with compaction groups
  - tablet allocator piggy-backs on tokens to choose replicas.
    Eventually we want to allocate based on current load, not statically

Closes #13387

* github.com:scylladb/scylladb:
  test: topology: Introduce test_tablets.py
  raft: Introduce 'raft_server_force_snapshot' error injection
  locator: network_topology_strategy: Support tablet replication
  service: Introduce tablet_allocator
  locator: Introduce tablet_aware_replication_strategy
  locator: Extract maybe_remove_node_being_replaced()
  dht: token_metadata: Introduce get_my_id()
  migration_manager: Send tablet metadata as part of schema pull
  storage_service: Load tablet metadata when reloading topology state
  storage_service: Load tablet metadata on boot and from group0 changes
  db, migration_manager: Notify about tablet metadata changes via migration_listener::on_update_tablet_metadata()
  migration_notifier: Introduce before_drop_keyspace()
  migration_manager: Make prepare_keyspace_drop_announcement() return a future<>
  test: perf: Introduce perf-tablets
  test: Introduce tablets_test
  test: lib: Do not override table id in create_table()
  utils, tablets: Introduce external_memory_usage()
  db: tablets: Add printers
  db: tablets: Add persistence layer
  dht: Use last_token_of_compaction_group() in split_token_range_msb()
  locator: Introduce tablet_metadata
  dht: Introduce first_token()
  dht: Introduce next_token()
  storage_proxy: Improve trace-level logging
  locator: token_metadata: Fix confusing comment on ring_range()
  dht, storage_proxy: Abstract token space splitting
  Revert "query_ranges_to_vnodes_generator: fix for exclusive boundaries"
  db: Exclude keyspace with per-table replication in get_non_local_strategy_keyspaces_erms()
  db: Introduce get_non_local_vnode_based_strategy_keyspaces()
  service: storage_proxy: Avoid copying keyspace name in write handler
  locator: Introduce per-table replication strategy
  treewide: Use replication_strategy_ptr as a shorter name for abstract_replication_strategy::ptr_type
  locator: Introduce effective_replication_map
  locator: Rename effective_replication_map to vnode_effective_replication_map
  locator: effective_replication_map: Abstract get_pending_endpoints()
  db: Propagate feature_service to abstract_replication_strategy::validate_options()
  db: config: Introduce experimental "TABLETS" feature
  db: Log replication strategy for debugging purposes
  db: Log full exception on error in do_parse_schema_tables()
  db: keyspace: Remove non-const replication strategy getter
  config: Reformat
2023-04-27 09:40:18 +02:00
Tomasz Grabiec
dc04da15ec db: Introduce get_non_local_vnode_based_strategy_keyspaces()
It's meant to be used in places where currently
get_non_local_strategy_keyspaces() is used, but work only with
keyspaces which use vnode-based replication strategy.
2023-04-24 10:49:36 +02:00
Benny Halevy
4cdad8bc8b gms: heart_beat_state: use generation_type and version_type
Define default constructor as heart_beat_state(gms::generation_type(0))

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-23 08:48:01 +03:00
Kefu Chai
ecb5380638 treewide: s/boost::lexical_cast<std::string>/fmt::to_string()/
this change replaces all occurrences of `boost::lexical_cast<std::string>`
in the source tree with `fmt::to_string()`. for couple reasons:

* `boost::lexical_cast<std::string>` is longer than `fmt::to_string()`,
  so the latter is easier to parse and read.
* `boost::lexical_cast<std::string>` creates a stringstream under the
  hood, so it can use the `operator<<` to stringify the given object.
  but stringstream is known to be less performant than fmtlib.
* we are migrating to fmtlib based formatting, see #13245. so
  using `fmt::to_string()` helps us to remove yet another dependency
  on `operator<<`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13611
2023-04-21 09:43:53 +03:00
Pavel Emelyanov
ece731301c api: Use ctx.sp in storage service handler
Similarly to previous patch, but from another routes group. The storage
service API calls mainly use storage service, but one place needs proxy
to call recalculate_schema_version() with

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-20 13:14:52 +03:00
Aleksandra Martyniuk
c4098df4ec compaction: create task manager's task for rewrite sstables keyspace compaction
Implementation of task_manager's task covering rewrite sstables keyspace
compaction that can be started through storage_service api.
2023-04-11 11:04:21 +02:00
Aleksandra Martyniuk
73860b7c9d compaction: create task manager's task for offstrategy keyspace compaction
Implementation of task_manager's task covering offstrategy keyspace compaction
that can be started through storage_service api.
2023-03-30 10:44:56 +02:00
Botond Dénes
60240e6d91 Merge 'bytes, gms: replace operator<<(..) with fmt formatter' from Kefu Chai
this is a part of a series migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print `bytes` and `gms::inet_address` without using ostream<<. also, this change removes all existing callers of `operator<<(ostream, const bytes &)` and `operator<<(ostream, const gms::inet_address&)`.

`gms::inet_address` related changes are included here in hope to demonstrate the usage of delimiter specifier of `fmt_hex` 's formatter.

Refs #13245

Closes #13275

* github.com:scylladb/scylladb:
  gms/inet_address: implement operator<< using fmt::formatter
  treewide: use fmtlib to format gms::inet_address
  gms/inet_address: specialize fmt::formatter<gms::inet_address>
  bytes: implement formatting helpers using formatter
  bytes: specialize fmt::formatter<bytes>
  bytes: specialize fmt::formatter<fmt_hex>
  bytes: mark fmt_hex::v `const`
2023-03-28 08:25:41 +03:00
Kefu Chai
8dbaef676d treewide: use fmtlib to format gms::inet_address
the goal of this change is to reduce the dependency on
`operator<<(ostream&, const gms::inet_address&)`.

this is not an exhaustive search-and-replace change, as in some
caller sites we have other dependencies to yet-converted ostream
printer, we cannot fix them all, this change only updates some
caller of `operator<<(ostream&, const gms::inet_address&)`.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-27 20:06:45 +08:00
Botond Dénes
b5afdf56c3 Merge 'Cleanup keyspace compaction task' from Aleksandra Martyniuk
Task manager task implementations of classes that cover
cleanup keyspace compaction which can be started through
/storage_service/keyspace_compaction/ api.

Top level task covers the whole compaction and creates child
tasks on each shard.

Closes #12712

* github.com:scylladb/scylladb:
  test: extend test_compaction_task.py to test cleanup compaction
  compaction: create task manager's task for cleanup keyspace compaction on one shard
  compaction: create task manager's task for cleanup keyspace compaction
  api: add get_table_ids to get table ids from table infos
  compaction: create cleanup_compaction_task_impl
2023-03-27 11:52:51 +03:00
Kefu Chai
94c6df0a08 treewide: use fmtlib when printing UUID
this change tries to reduce the number of callers using operator<<()
for printing UUID. they are found by compiling the tree after commenting
out `operator<<(std::ostream& out, const UUID& uuid)`. but this change
alone is not enough to drop all callers, as some callers are using
`operator<<(ostream&, const unordered_map&)` and other overloads to
print ranges whose elements contain UUID. so in order to limit the
 scope of the change, we are not changing them here.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-20 15:38:45 +08:00
Aleksandra Martyniuk
7dd27205f6 compaction: create task manager's task for cleanup keyspace compaction
Implementation of task_manager's task covering cleanup keyspace compaction
that can be started through storage_service api.
2023-03-13 16:35:39 +01:00
Aleksandra Martyniuk
4a5752d0d0 api: add get_table_ids to get table ids from table infos 2023-03-13 16:35:39 +01:00
Kefu Chai
063b3be8a7 api: reference httpd::* symbols like 'httpd::*'
it turns out we have `using namespace httpd;` in seastar's
`request_parser.rl`, and we should not rely on this statement to
expose the symbols in `seatar::httpd` to `seastar` namespace.
in this change,

* api/*.hh: all httpd symbols are referenced by `httpd::*`
  instead of being referenced as if they are in `seastar`.
* api/*.cc: add `using namespace seastar::httpd`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-07 18:21:03 +08:00
Kefu Chai
5522080f80 api: s/request/http::request/
seastar::httpd::request was deprecated in favor of `seastar::http::request`
since bdd5d929891d2cb821eca25896e25ed4ff658b7a.
so let's use the latter. this change also silences the warning of:

```
/home/kefu/dev/scylladb/api/authorization_cache.cc: In function ‘void api::set_authorization_cache(http_context&, seastar::httpd::routes&, seastar::sharded<auth::service>&)’:
/home/kefu/dev/scylladb/api/authorization_cache.cc:19:104: error: ‘using seastar::httpd::request = struct seastar::http::request’ is deprecated: Use http::request instead [-Werror=deprecated-declarations]
   19 |     httpd::authorization_cache_json::authorization_cache_reset.set(r, [&auth_service] (std::unique_ptr<request> req) -> future<json::json_return_type> {
      |                                                                                                        ^~~~~~~
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-07 14:03:42 +08:00
Aleksandra Martyniuk
159e603ac4 compaction: create task manager's task for major keyspace compaction
Implementation of task_manager's task covering major keyspace compaction
that can be started through storage_service api.
2023-02-23 15:48:05 +01:00
Kefu Chai
0cb842797a treewide: do not define/capture unused variables
these warnings are found by Clang-17 after removing
`-Wno-unused-lambda-capture` and '-Wno-unused-variable' from
the list of disabled warnings in `configure.py`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-15 22:57:18 +02:00
Raphael S. Carvalho
640436e72a api: storage_service: Run maintenance compactions on all compaction groups
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-12-19 11:16:15 -03:00
Benny Halevy
ec5707a4a8 api: storage_service: fixup indentation 2022-11-20 09:14:45 +02:00
Benny Halevy
cc63719782 api: storage_service: add run_on_existing_tables
Gracefully skip tables that were removed
in the background.

Fixes #12007

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-20 09:14:29 +02:00
Benny Halevy
9ef9b9d1d9 api: storage_service: add parse_table_infos
The table UUIDs are the same on all shards
so we might as well get them on shard 0
(as we already do) and reuse them on other shards.

It is more efficient and accurate to lookup the table
eventually on the shard using its uuid rather than
its name.  If the table was dropped and recreated
using the same name in the background, the new
table will have a new uuid and do the api function
does not apply to it anymore.

A following change will handle the no_such_column_family
cases.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-20 09:14:21 +02:00
Benny Halevy
9b4a9b2772 api: storage_service: log errors from compaction related handlers
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-20 09:03:25 +02:00
Benny Halevy
a47f96bc05 api: storage_service: coroutinize compaction related handlers
Before we improve parsing tables lists
and handling of no_such_column_family
errors.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-20 09:03:25 +02:00
Benny Halevy
fc278be6c4 table: add perform_cleanup_compaction
Move the integration with compaction_manager
from the api layer to the tabel class so
it can also make sure the memtable is cleaned up in the next patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-06 19:41:33 +02:00
Benny Halevy
85523c45c0 api: storage_service: add logging for compaction operations et al
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-06 19:41:31 +02:00
Benny Halevy
9ef2631ec2 api, service: storage_service: removenode: allow passing ignore_nodes as uuid:s
Currently the api is inconsistent: requiring a uuid for the
host_id of the node to be removed, while the ignored nodes list
is given as comma-separated ip addresses.

Instead, support identifying the ignored_nodes either
by their host_id (uuid) or ip address.

Also, require all ignore_nodes to be of the same kind:
either UUIDs or ip addresses, as a mix of the 2 is likely
indicating a user error.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-10-28 07:49:03 +03:00
Benny Halevy
340a5a0c94 api: storage_service: remove_node: validate host_id
The node to be removed must be identified by its host_id.
Validate that at the api layer and pass the parsed host_id
down to storage_service::removenode.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-10-28 07:38:13 +03:00
Pavel Emelyanov
5fba0a7f65 api: Move update_snitch endpoint
It's now living in storage_service.cc, but non-global snitch is
available in endpoint_snitch.cc so move the endpoint handler there

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-20 12:33:20 +03:00
Benny Halevy
d32c497cd9 database: automatically take snapshot of base table views
The logic to reject explicit snapshot of views/indexes
was improved in aa127a2dbb.
However, we never implemented auto-snapshot of
view/indexes when taking a snapshot of the base table.

This is implemented in this patch.

The implementation is built on top of
ba42852b0e
so it would be hard to backport to 5.1 or earlier
releases.

Fixes #11612

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-09-26 11:02:54 +03:00
Benny Halevy
55b0b8fe2c api: storage_service: reject snapshot of views in api layer
Rather than pushing the check to
`snapshot_ctl::take_column_family_snapshot`, just check
that explcitly when taking a snapshot of a particular
table by name over the api.

Other paths that call snapshot_ctl::take_column_family_snapshot
are internal and use it to snap views already.

With that, we can get rid of the allow_view_snapshots flag
that was introduced in aab4cd850c.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-09-26 10:44:56 +03:00
Avi Kivity
be44fd63f9 Merge 'Make get_range_addresses async and hold effective_replication_map_ptr around it' from Benny Halevy
This series converts the synchronous `effective_replication_map::get_range_addresses` to async
by calling the replication strategy async entry point with the same name, as its callers are already async
or can be made so easily.

To allow it to yield and work on a coherent view of the token_metadata / topology / replication_map,
let the callers of this patch hold a effective_replication_map per keyspace and pass it down
to the (now asynchronous) functions that use it (making affected storage_service methods static where possible
if they no longer depend on the storage_service instance).

Also, the repeated calls to everywhere_replication_strategy::calculate_natural_endpoints
are optimized in this series by introducing a virtual abstract_replication_strategy::has_static_natural_endpoints predicate
that is true for local_strategy and everywhere_replication_strategy, and is false otherwise.
With it, functions repeatedly calling calculate_natural_endpoints in a loop, for every token, will call it only once since it will return the same result every time anyhow.

Refs #11005

Doesn't fix the issue as the large allocation still remains until we make change dht::token_range_vector chunked (chunked_vector cannot be used as is at the moment since we require the ability to push also to the front when unwrapping)

Closes #11009

* github.com:scylladb/scylladb:
  effective_replication_map: make get_range_addresses asynchronous
  range_streamer: add_ranges and friends: get erm as param
  storage_service: get_new_source_ranges: get erm as param
  storage_service: get_changed_ranges_for_leaving: get erm as param
  storage_service: get_ranges_for_endpoint: get erm as param
  repair: use get_non_local_strategy_keyspaces_erms
  database: add get_non_local_strategy_keyspaces_erms
  database: add get_non_local_strategy_keyspaces
  storage_service: coroutinize update_pending_ranges
  effective_replication_map: add get_replication_strategy
  effective_replication_map: get_range_addresses: use the precalculated replication_map
  abstract_replication_strategy: get_pending_address_ranges: prevent extra vector copies
  abstract_replication_strategy: reindent
  utils: sequenced_set: expose set and `contains` method
  abstract_replication_strategy: calculate_natural_endpoints: return endpoint_set
  utils: sequenced_set: templatize VectorType
  utils: sanitize sequenced_set
  utils: sequenced_set: delete mutable get_vector method
2022-08-09 13:25:53 +03:00
Benny Halevy
7ee6048255 database: add get_non_local_strategy_keyspaces
For node operations, we currently call get_non_system_keyspaces
but really want to work on all keyspace that have non-local
replication strategy as they are replicated on other nodes.

Reflect that in the replica::database function name.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 17:31:01 +03:00
Benny Halevy
257d74bb34 schema, everywhere: define and use table_id as a strong type
Define table_id as a distinct utils::tagged_uuid modeled after raft
tagged_id, so it can be differentiated from other uuid-class types,
in particular from table_schema_version.

Fixes #11207

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 08:09:41 +03:00
Benny Halevy
d96b56fee2 database: rename {flush,snapshot}_on_all and make static
Follow the convention of drop_table_on_all_shards.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
14faa3b6f4 compaction_manager: perform_cleanup, perform_sstable_upgrade: use a lw_shared_ptr for owned token ranges
And completely get rid of the dependency on replica::database.

Also, add respective rest_api tests.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-02 08:08:11 +03:00
Aleksandra Martyniuk
6ea5bc96d7 scrub compaction: return status indicating aborted operations
over the rest api

Performing compaction scrub user did not know whether an operation
was aborted.

If compaction scrub is aborted, return status the user gets over
rest api is set to 1.
2022-07-29 09:35:20 +02:00
Aleksandra Martyniuk
f1980f8dc6 scrub compaction: count validation errors and return status over the rest api
Performing compaction scrub user did not know whether any validation
errors were encountered.

The number of validation errors per given compaction scrub is gathered
and summed from each shard. Basing on that value return status over
the rest api is set to 3 if any validation errors were encountered.
2022-07-29 09:35:20 +02:00