It now lives in storage_service.cc, but the non-global snitch is
available in endpoint_snitch.cc, so move the endpoint handler there.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The logic to reject explicit snapshot of views/indexes
was improved in aa127a2dbb.
However, we never implemented auto-snapshot of
view/indexes when taking a snapshot of the base table.
This is implemented in this patch.
The implementation is built on top of
ba42852b0e
so it would be hard to backport to 5.1 or earlier
releases.
Fixes #11612
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Rather than pushing the check down to
`snapshot_ctl::take_column_family_snapshot`, just check
it explicitly when taking a snapshot of a particular
table by name over the api.
Other paths that call snapshot_ctl::take_column_family_snapshot
are internal and already use it to snapshot views.
With that, we can get rid of the allow_view_snapshots flag
that was introduced in aab4cd850c.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
This series converts the synchronous `effective_replication_map::get_range_addresses` to async
by calling the replication strategy async entry point with the same name, as its callers are already async
or can be made so easily.
To allow it to yield and work on a coherent view of the token_metadata / topology / replication_map,
let the callers of this patch hold an effective_replication_map per keyspace and pass it down
to the (now asynchronous) functions that use it (making affected storage_service methods static where possible
if they no longer depend on the storage_service instance).
Also, the repeated calls to everywhere_replication_strategy::calculate_natural_endpoints
are optimized in this series by introducing a virtual abstract_replication_strategy::has_static_natural_endpoints predicate
that is true for local_strategy and everywhere_replication_strategy, and is false otherwise.
With it, functions that repeatedly call calculate_natural_endpoints in a loop, once per token, will call it only once, since it returns the same result every time anyway.
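The optimization above can be sketched as follows. This is a minimal, hypothetical illustration, not Scylla's actual class hierarchy: the real `abstract_replication_strategy` and `endpoint_set` types are simplified to standard containers, and only the `has_static_natural_endpoints` dispatch pattern is shown.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-in for abstract_replication_strategy: strategies whose
// calculate_natural_endpoints() result does not depend on the token
// (local and everywhere replication) report static endpoints.
struct replication_strategy {
    virtual ~replication_strategy() = default;
    virtual bool has_static_natural_endpoints() const { return false; }
    virtual std::vector<std::string> calculate_natural_endpoints(long token) const = 0;
};

struct everywhere_strategy : replication_strategy {
    std::vector<std::string> nodes;
    bool has_static_natural_endpoints() const override { return true; }
    std::vector<std::string> calculate_natural_endpoints(long) const override {
        return nodes; // same answer for every token
    }
};

// Caller-side optimization: compute the endpoint set once when it is
// token-independent, instead of recomputing it per token in the loop.
std::vector<std::vector<std::string>> endpoints_for_tokens(
        const replication_strategy& rs, const std::vector<long>& tokens) {
    std::vector<std::vector<std::string>> out;
    if (rs.has_static_natural_endpoints() && !tokens.empty()) {
        auto eps = rs.calculate_natural_endpoints(tokens.front());
        out.assign(tokens.size(), eps);
    } else {
        for (auto t : tokens) {
            out.push_back(rs.calculate_natural_endpoints(t));
        }
    }
    return out;
}
```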
Refs #11005
Doesn't fix the issue, as the large allocation remains until we make dht::token_range_vector chunked (chunked_vector cannot be used as-is at the moment, since we also require the ability to push to the front when unwrapping).
Closes #11009
* github.com:scylladb/scylladb:
effective_replication_map: make get_range_addresses asynchronous
range_streamer: add_ranges and friends: get erm as param
storage_service: get_new_source_ranges: get erm as param
storage_service: get_changed_ranges_for_leaving: get erm as param
storage_service: get_ranges_for_endpoint: get erm as param
repair: use get_non_local_strategy_keyspaces_erms
database: add get_non_local_strategy_keyspaces_erms
database: add get_non_local_strategy_keyspaces
storage_service: coroutinize update_pending_ranges
effective_replication_map: add get_replication_strategy
effective_replication_map: get_range_addresses: use the precalculated replication_map
abstract_replication_strategy: get_pending_address_ranges: prevent extra vector copies
abstract_replication_strategy: reindent
utils: sequenced_set: expose set and `contains` method
abstract_replication_strategy: calculate_natural_endpoints: return endpoint_set
utils: sequenced_set: templatize VectorType
utils: sanitize sequenced_set
utils: sequenced_set: delete mutable get_vector method
For node operations, we currently call get_non_system_keyspaces,
but we really want to work on all keyspaces that have a non-local
replication strategy, as they are replicated to other nodes.
Reflect that in the replica::database function name.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Define table_id as a distinct utils::tagged_uuid modeled after raft
tagged_id, so it can be differentiated from other uuid-class types,
in particular from table_schema_version.
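The tagged-UUID idea can be sketched roughly as below. This is a hedged sketch, not the real `utils::tagged_uuid`: the underlying UUID is modeled as a plain string, and the tag types are illustrative. The point is that `table_id` and `table_schema_version` become distinct, non-interchangeable types even though both wrap the same representation.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// A thin wrapper parameterized on an empty tag type. Two instantiations
// with different tags are unrelated types, so they cannot be mixed up.
template <typename Tag>
class tagged_uuid {
    std::string _uuid; // stand-in for the real UUID type
public:
    explicit tagged_uuid(std::string u) : _uuid(std::move(u)) {}
    const std::string& uuid() const { return _uuid; }
    bool operator==(const tagged_uuid& o) const { return _uuid == o._uuid; }
};

struct table_id_tag {};
struct table_schema_version_tag {};
using table_id = tagged_uuid<table_id_tag>;
using table_schema_version = tagged_uuid<table_schema_version_tag>;

// Accidentally passing a table_id where a table_schema_version is
// expected (or vice versa) is now a compile error.
static_assert(!std::is_convertible_v<table_id, table_schema_version>,
              "distinct tags must not convert into each other");
```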
Fixes #11207
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
over the rest api
When performing a compaction scrub, the user did not know whether the
operation was aborted.
If a compaction scrub is aborted, the return status the user gets over
the rest api is now set to 1.
When performing a compaction scrub, the user did not know whether any
validation errors were encountered.
The number of validation errors for a given compaction scrub is gathered
and summed across shards. Based on that value, the return status over
the rest api is set to 3 if any validation errors were encountered.
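The status mapping described above can be sketched as a small pure function. The status numbers (1 for aborted, 3 for validation errors) come from the text; the function name and shape are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Map a scrub outcome to the status returned over the rest api:
// 1 if the scrub was aborted, 3 if any shard reported validation
// errors, 0 otherwise. Error counts are summed across shards.
int scrub_exit_status(bool aborted, const std::vector<uint64_t>& per_shard_validation_errors) {
    if (aborted) {
        return 1;
    }
    uint64_t total = std::accumulate(per_shard_validation_errors.begin(),
                                     per_shard_validation_errors.end(),
                                     uint64_t{0});
    return total > 0 ? 3 : 0;
}
```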
And add calls to maybe_yield to prevent stalls in this path
as seen in performance testing.
Also, add a respective rest_api test.
Fixes #11114
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
It is only needed for the "storage_service/describe_ring" api
and service/storage_service shouldn't bother with it.
It's an api sugar coating.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Previously, storage_service/keyspaces?type=user returned, along with the
user keyspaces, keyspaces that were internal but non-system.
Now the list of keyspaces for the user option
(storage_service/keyspaces?type=user) contains neither system nor
internal keyspaces, only user keyspaces.
Fixes: #11042
Closes #11049
`generation_type` is (supposed to be) conceptually different from
`int64_t` (even if physically they are the same), but at present
Scylla code still largely treats them interchangeably.
In addition to using `generation_type` in more places, we
provide (no-op) `generation_value()` and `generation_from_value()`
operations to make the smoke-and-mirrors more believable.
The churn is considerable, but all mechanical. To avoid even
more (way, way more) churn, unit test code is left untreated for
now, except where it uses the affected core APIs directly.
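A minimal sketch of the wrapper idea follows. The helper signatures mirror the names in the text (`generation_value`, `generation_from_value`), but the bodies are assumptions: the real type likely carries more operations, and comparisons are shown hand-written here.

```cpp
#include <cassert>
#include <cstdint>

// generation_type wraps int64_t so the two stop being silently
// interchangeable; construction and extraction must go through
// explicit (currently no-op) conversion points.
class generation_type {
    int64_t _value;
public:
    explicit constexpr generation_type(int64_t v) : _value(v) {}
    constexpr int64_t value() const { return _value; }
    constexpr bool operator==(const generation_type& o) const { return _value == o._value; }
    constexpr bool operator<(const generation_type& o) const { return _value < o._value; }
};

// The named conversion points: no-ops today, but they mark every
// site where a raw value crosses the type boundary.
constexpr int64_t generation_value(generation_type g) { return g.value(); }
constexpr generation_type generation_from_value(int64_t v) { return generation_type(v); }
```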
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
Previously, any attempt to take a materialized view or secondary index
snapshot was considered a mistake and caused the snapshot operation to
abort, with a suggestion to snapshot the base table instead.
But an automatic pre-scrub snapshot of a view cannot be attributed to
user error, so the operation should not be aborted in that case.
(It is an open question whether the more correct thing to do during
pre-scrub snapshot would be to silently ignore views. Or perhaps they
should be ignored in all cases except when the user explicitly asks to
snapshot them, by name)
Closes #10760
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
Snapshot operations over the api are rare,
but they leave significant state on disk in the
form of sstables hard-linked to the snapshot directories.
Also, we've seen snapshot operations hang in the field,
requiring a core dump to analyse the issue,
while there were no records in the log indicating
when previous snapshot operations were last executed.
This change promotes logging to info level
when take_snapshot and del_snapshot start,
and logs errors in case they fail.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Seastar is an external library from Scylla's point of view so
we should use the angle bracket #include style. Most of the source
follows this, this patch fixes a few stragglers.
Also fix cases of #include which reached into seastar's directory
tree directly, via #include "seastar/include/seastar/...", to
just refer to <seastar/...>.
Closes#10433
Currently, snitch drivers register themselves in the class-registry with
all sorts of construction options. All those different constructors
are in fact "config options".
When the snitch later declares its dependencies (gossiper and system
keyspace), it would require patching all these registrations, which is
very inconvenient.
This patch introduces the snitch_config struct and replaces all the
snitch constructors with the snitch_driver(snitch_config cfg) one.
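The shape of the change can be sketched as below. The field names in this `snitch_config` are illustrative assumptions, not the actual struct: the point is that every former constructor parameter becomes one field, so future dependencies are added in one place instead of at every registration site.

```cpp
#include <cassert>
#include <string>
#include <utility>

// All per-driver construction options collapse into a single config
// struct with sensible defaults; registrations then share one
// constructor signature: snitch_driver(snitch_config).
struct snitch_config {
    std::string name = "SimpleSnitch";
    std::string conf_file;                 // used by file-based snitches only
    bool broadcast_rpc_address_set = false;
    // future dependencies (gossiper, system keyspace) slot in here
};

class snitch_driver {
    snitch_config _cfg;
public:
    explicit snitch_driver(snitch_config cfg) : _cfg(std::move(cfg)) {}
    const std::string& name() const { return _cfg.name; }
};
```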
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
"
There's a static global sharded<local_cache> variable in system keyspace
that keeps several bits on board that other subsystems need to get from
the system keyspace, but want to have in a future<>-less manner.
Some time ago the system_keyspace became a classical sharded<> service
that references the qctx and the local cache. This set removes the global
cache variable and makes its instances be unique_ptr's sitting on the
system keyspace instances.
The biggest obstacle on this route is the local_host_id that was cached,
but at some point was copied onto db::config to simplify getting the value
from sstables manager (there's no system keyspace at hand there at all).
So the first thing this set does is remove the cached host_id and make
all the users get it from the db::config.
(There's a BUG with the config copy of the host id -- replace node doesn't
update it. This set also fixes that.)
De-globalizing the cache is a prerequisite for untangling the
snitch-messaging-gossiper-system_keyspace knot. Currently the cache is
initialized too late -- when main calls system_keyspace.start() on all
shards -- but before that time messaging should already have access to it
to store its preferred IP mappings.
tests: unit(dev), dtest.simple_boot_shutdown(dev)
"
* 'br-trade-local-hostid-for-global-cache' of https://github.com/xemul/scylla:
system_keyspace: Make set_local_host_id non-static
system_keyspace: Make load_local_host_id non-static
system_keyspace: Remove global cache instance
system_keyspace: Make it peering service
system_keyspace,snitch: Make load_dc_rack_info non-static
system_keyspace,cdc,storage_service: Make bootstrap manipulations non-static
system_keyspace: Coroutinize set_bootstrap_state
gossiper: Add system keyspace dependency
cdc_generation_service: Add system keyspace dependency
system_keyspace: Remove local host id from local cache
storage_service: Update config.host_id on replace
storage_service: Indentation fix after previous patch
storage_service: Coroutinize prepare_replacement_info()
system_distributed_keyspace: Indentation fix after previous patch
code,system_keyspace: Relax system_keyspace::load_local_host_id() usage
code,system_keyspace: Remove system_keyspace::get_local_host_id()
This reverts commit 37dc31c429. There is no
reason to suppose that compacting different tables concurrently on
different shards reduces space requirements, apart from
non-deterministically pausing random shards.
However, when data is badly distributed and there are many tables, it will
slow down major compaction considerably. Consider a case where there are
100 tables, each with a 2GB large partition on some shard. This extra
200GB will be compacted on just one shard. With a compaction rate of 40 MB/s,
this adds more than an hour to the process. With the existing code, these
compactions would overlap if the badly distributed data was not all in one
shard.
It also runs counter to tablets, where data is intentionally not equally
distributed.
Closes #10246
The method is nowadays called from several places:
- API
- sys.dist.ks. (to update view building info)
- storage service prepare_to_join()
- set up in main
All of them, except the last, can use the db::config cached value, because
it is loaded earlier than any of them (except the last -- that is the
loading path itself).
Once patched, the load_local_host_id() can avoid checking the cache
for that value -- it will not be there for sure.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
All of its (indirect) callers have been patched to have it, so now it is
possible to add the argument to it. The next patch will make use of it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
To make major compaction more resilient to low-disk-space
conditions, 342bfbd65a
sorted the tables based on their live disk space used.
However, each shard still makes progress at its own pace.
This change serializes major compaction between tables
so we still compact in parallel on all shards, but one
(distributed) table at a time.
As a follow-up, we can consider serializing even at the single shard
level when disk space is critically low, so we can't even risk
parallel compaction across all shards.
Refs scylladb/scylla-dtest#2653
Test: unit(dev)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220313153814.2203660-1-bhalevy@scylladb.com>
This bit is hairy. First, it indicates that the storage service
entered the init_server() method. But, once the node is up and
running, it also indicates whether the gossiper is enabled or not
via the API call.
To rely on the operation mode instead, first, the NONE mode is introduced,
at which the server starts. Then in init_server() it switches to
STARTING.
The second change is to stop using the bit in the enable/disable gossiper
API call, and instead check gossiper.is_enabled() itself.
To keep the is_initialized API call compatible, when the operation
mode is NORMAL it would return true/false according to the status
of the gossiper. This change is simple because storage service API
handlers already have the gossiper instance hanging around.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The is_joined() status can be obtained with get_operation_mode(). Since
it indicates that the operation mode is JOINING, NORMAL or anything
above, the operation mode enum class should be reordered to allow
the simple >= comparison.
Another needed change is to set the mode a few steps earlier than it
happens now, to cover the non-bootstrap startup case.
And the third change is to partially revert d49aa7ab, which made
the .is_joined() method future-less. Nowadays is_joined() is
called only from the API, which is happy with it being future-full,
like all other storage service state checks.
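The enum-ordering trick described above can be sketched as follows. The mode names beyond those mentioned in the text (JOINING, NORMAL, NONE, STARTING) are assumptions; what matters is that modes are declared in lifecycle order, so "joined" reduces to one comparison instead of listing every post-join state.

```cpp
#include <cassert>

// Modes declared in lifecycle order: everything at or after JOINING
// counts as "joined". Reordering the enum is what makes the single
// >= comparison valid.
enum class operation_mode {
    NONE,      // server constructed, init_server() not yet entered
    STARTING,  // inside init_server()
    JOINING,
    NORMAL,
    LEAVING,
    DRAINING,
    DRAINED,
};

bool is_joined(operation_mode m) {
    return m >= operation_mode::JOINING;
}

// Same trick for is_starting(): <= STARTING covers both the NONE
// and STARTING modes, as the text describes.
bool is_starting(operation_mode m) {
    return m <= operation_mode::STARTING;
}
```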
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This is a trivial change, since the only user is in the API, where the
get_operation_mode + mode values are at hand.
One thing to pay attention to: the new method checks the mode to
be <= STARTING, not for equality. For now this is an equivalent change,
but the next patch will introduce a NONE mode that should be reported
by is_starting() too.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently it reports back the formatted mode. For future convenience it
needs to return the raw value, all the more so since the mode enum class
is already public.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Throwing std::runtime_error results in
http status 500 (internal_server_error), but the problem
is with the request parameters, not with the server.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Perform offstrategy compaction via the REST API with
a new `keyspace_offstrategy_compaction` option.
This is useful for performing offstrategy compaction
post repair, after repairing all token ranges.
Otherwise, offstrategy compaction will only be
auto-triggered after a 5-minute idle timeout.
Like major compaction, the api call returns the offstrategy
compaction task future, so it's waited on.
The `long` result counts the number of tables that required
offstrategy compaction.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes were applied mechanically with a script, except to
licenses/README.md.
Closes #9937
These functions are called from the api layer.
Continue to hide the repair tracker from the caller
but use the repair_service already available
at the api layer to invoke the respective high-level
methods without requiring `the_repair_tracker()`.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Move replica-oriented classes to the replica namespace. The main
classes moved are ::database, ::keyspace, and ::table, but a few
ancillary classes are also moved. There are certainly classes that
should be moved but aren't (like distributed_loader) but we have
to start somewhere.
References are adjusted treewide. In many cases, it is obvious that
a call site should not access the replica (but the data_dictionary
instead), but that is left for separate work.
scylla-gdb.py is adjusted to look for both the new and old names.
The database, keyspace, and table classes represent the replica-only
part of the objects after which they are named. Reading from a table
doesn't give you the full data, just the replica's view, and it is not
consistent since reconciliation is applied on the coordinator.
As a first step in acknowledging this, move the related files to
a replica/ subdirectory.
Split and validate the cf parameter, containing an optional
comma-separated list of table names.
If any table is not found and a no_such_column_family
exception is thrown, wrap it in a `bad_param_exception`
so it will translate to `reply::status_type::bad_request`
rather than `reply::status_type::internal_server_error`.
With that, hide the split_cf function from api/api.hh
since it was used only from api/storage_service
and new use sites should use validate_tables instead.
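A sketch of the validation-and-wrapping idea follows. The exception and function names mirror the text (`no_such_column_family`, `bad_param_exception`, `validate_tables`), but the bodies and signatures are assumptions; the real code uses Scylla's own types, not plain std exceptions.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

struct no_such_column_family : std::runtime_error {
    using std::runtime_error::runtime_error;
};
// The api layer maps this type to HTTP 400 (bad_request) instead of
// letting the generic handler return 500 (internal_server_error).
struct bad_param_exception : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Split the comma-separated cf parameter and validate each table name,
// wrapping the not-found error in bad_param_exception.
std::vector<std::string> validate_tables(const std::set<std::string>& existing,
                                         const std::string& cf) {
    std::vector<std::string> tables;
    std::istringstream in(cf);
    for (std::string name; std::getline(in, name, ',');) {
        if (existing.count(name) == 0) {
            try {
                throw no_such_column_family("table " + name + " not found");
            } catch (const no_such_column_family& e) {
                throw bad_param_exception(e.what());
            }
        }
        tables.push_back(name);
    }
    return tables;
}
```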
Fixes #9754
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
This series extends `compaction_manager::stop_ongoing_compaction` so it can be used from the api layer for:
- table::disable_auto_compaction
- compaction_manager::stop_compaction
Fixes #9313
Fixes #9695
Test: unit(dev)
Closes #9699
* github.com:scylladb/scylla:
compaction_manager: stop_compaction: wait for ongoing compactions to stop
compaction_manager: stop_ongoing_compactions: log Stopping 0 tasks at debug level
compaction_manager: unify stop_ongoing_compactions implementations
compaction_manager: stop_ongoing_compactions: add compaction_type option
compaction_manager: get_compactions: get a table* parameter
table: disable_auto_compaction: stop ongoing compactions
compaction_manager: make stop_ongoing_compactions public
table: futurize disable_auto_compactions
It's not uncommon for cleanup to be issued against an entire keyspace,
which may be composed of tons of tables. To increase chances of success
if low on space, cleanup will now start from smaller tables first, such
that bigger tables will have more space available, once they're reached,
to satisfy their space requirement.
parallel_for_each() is dropped; it wasn't needed given that the manager
performs per-shard serialization of cleanup jobs.
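The ordering policy can be sketched as a simple ascending sort on live disk usage. The `table_info` struct and function name here are assumptions for illustration; the real code orders Scylla's table objects inside the compaction manager.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct table_info {
    std::string name;
    uint64_t live_disk_space_used;
};

// Sort cleanup candidates smallest-first: cleaning small tables early
// frees space, so the bigger tables have more headroom when reached.
void order_for_cleanup(std::vector<table_info>& tables) {
    std::sort(tables.begin(), tables.end(),
              [](const table_info& a, const table_info& b) {
                  return a.live_disk_space_used < b.live_disk_space_used;
              });
}
```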
Refs #9504.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211130133712.64517-1-raphaelsc@scylladb.com>