The method starts a task that uploads all files from the given
keyspace's snapshot to the requested endpoint/bucket. The task runs in
the background, its task_id is returned from the method once it's
spawned and it should be used via /task_manager API to track the task
execution and completion (hint: it's good to have non-zero TTL value to
make sure fast backups don't finish before the caller manages to call
wait_task API).
If snapshot doesn't exist, nothing happens (FIXME, need to return back
an error in that case).
If endpoint is not configured locally, the API call resolves with
bad-request instantly.
Sstables components are scanned for all tables in the keyspace and are
uploaded into the /bucket/${cf_name}/${snapshot_name}/ path.
Task is not abortable (FIXME -- to be added) and doesn't really report
its progress other than running/done state (FIXME -- to be added too).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Pretty much all services in Scylla have their own config. Add one to
snapshot-ctl too, it will be populated later.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The storage_manager maintains set of clients to configured object
storage(s). The snapshot ctl is going to spawn tasks that will talk to
those storages, thus it needs the storage manager to get the clients
from.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This service is going to start tasks managed by task manager. For that,
it should have its module set up and registered.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This helper will be used by a code from another .cc file, so the
template needs to be in header for smooth instantiation
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
in theory, std::result_of_t should have been removed in C++20. and
std::invoke_result_t is available since C++17. thanks to libstdc++,
the tree is compiling. but we should not rely on this.
so, in this change, we replace all `std::result_of_t` with
`std::invoke_result_t`. actually, clang + libstdc++ is already warning
us like:
```
In file included from /home/runner/work/scylladb/scylladb/multishard_mutation_query.cc:9:
In file included from /home/runner/work/scylladb/scylladb/schema/schema_registry.hh:11:
In file included from /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/unordered_map:38:
Warning: /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/type_traits:2624:5: warning: 'result_of<void (noop_compacted_fragments_consumer::*(noop_compacted_fragments_consumer &))()>' is deprecated: use 'std::invoke_result' instead [-Wdeprecated-declarations]
2624 | using result_of_t = typename result_of<_Tp>::type;
| ^
/home/runner/work/scylladb/scylladb/mutation/mutation_compactor.hh:518:43: note: in instantiation of template type alias 'result_of_t' requested here
518 | if constexpr (std::is_same_v<std::result_of_t<decltype(&GCConsumer::consume_end_of_stream)(GCConsumer&)>, void>) {
|
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18835
Walking per-table snapshot directory without lock is racy. There's
snapshot-ctl locking that's used to get db-wide snapshot details, it
should be used to get per-table snapshot details too
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Previous patches broke indentation in this method. Fix it by shortening
the summation loop with the help of std::accumulate()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently database::get_snapshot_details() returns a collection of
snapshots. The snapshot_ctl converts this collection into similarly
looking one with slightly different structures inside. The resulting
collection is converted one more time on the API layer into another
similarly looking map.
This patch removes the intermediate conversion.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
So that it's in-sync with table::get_snapshot_details(). Next patches
will improve this place even further.
Also, there can be many snapshots and vector can grow large, but that's
less of an issue here.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
run_snapshot_list_operation() takes a continuation, so passing it
a lambda coroutine without protection is dangerous.
Protect the coroutine with coroutine::lambda so it doesn't lost its
contents.
Fixes#12192.
Closes#12193
The logic to reject explicit snapshot of views/indexes
was improved in aa127a2dbb.
However, we never implemented auto-snapshot of
view/indexes when taking a snapshot of the base table.
This is implemented in this patch.
The implementation is built on top of
ba42852b0e
so it would be hard to backport to 5.1 or earlier
releases.
Fixes#11612
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Rather than pushing the check to
`snapshot_ctl::take_column_family_snapshot`, just check
that explcitly when taking a snapshot of a particular
table by name over the api.
Other paths that call snapshot_ctl::take_column_family_snapshot
are internal and use it to snap views already.
With that, we can get rid of the allow_view_snapshots flag
that was introduced in aab4cd850c.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Previously, any attempt to take a materialized view or secondary index
snapshot was considered a mistake and caused the snapshot operation to
abort, with a suggestion to snapshot the base table instead.
But an automatic pre-scrub snapshot of a view cannot be attributed to
user error, so the operation should not be aborted in that case.
(It is an open question whether the more correct thing to do during
pre-scrub snapshot would be to silently ignore views. Or perhaps they
should be ignored in all cases except when the user explicitly asks to
snapshot them, by name)
Closes#10760.
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
And move the logic from snapshot-ctl down to the
replica::database layer.
A following patch will move the flush phase
from the replica::table::snapshot layer
out to the caller.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Detecting a secondary index by checking for a dot
in the table name is wrong as tables generated by Alternator
may contain a dot in their name.
Instead detect bot hmaterialized view and secondary indexes
using the schema()->is_view() method.
Fixes#10526
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Snapshot-ctl methods fetch information about snapshots from
column family objects. The problem with this is that we get rid
of these objects once the table gets dropped, while the snapshots
might still be present (the auto_snapshot option is specifically
made to create this kind of situation). This commit switches from
relying on column family interface to scanning every datadir
that the database knows of in search for "snapshots" folders.
Fixes#3463Closes#7122Closes#9884
Signed-off-by: Piotr Wojtczak <piotr.m.wojtczak@gmail.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
This pull request fixes two preexisting issues related to snapshot_ctl::true_snapshots_size
https://github.com/scylladb/scylla/issues/9897https://github.com/scylladb/scylla/issues/9898
And adds a couple unit tests to tests the snapshot_ctl functionality.
Test: unit(dev), database_test.{test_snapshot_ctl_details,test_snapshot_ctl_true_snapshots_size}(debug)
Closes#9899
* github.com:scylladb/scylla:
table: get_snapshot_details: count allocated_size
snapshot_ctl: cleanup true_snapshots_size
snpashot_ctl: true_snapshots_size: do not map_reduce across all shards
snapshot_ctl uses map_reduce over all database shards,
each counting the size of the snapshots directory,
which is shared, not per-shard.
So the total live size returned by it is multiples by the number of shards.
Add a unit test to test that.
Fixes#9897
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes we applied mechanically with a script, except to
licenses/README.md.
Closes#9937
Move replica-oriented classes to the replica namespace. The main
classes moved are ::database, ::keyspace, and ::table, but a few
ancillary classes are also moved. There are certainly classes that
should be moved but aren't (like distributed_loader) but we have
to start somewhere.
References are adjusted treewide. In many cases, it is obvious that
a call site should not access the replica (but the data_dictionary
instead), but that is left for separate work.
scylla-gdb.py is adjusted to look for both the new and old names.
The database, keyspace, and table classes represent the replica-only
part of the objects after which they are named. Reading from a table
doesn't give you the full data, just the replica's view, and it is not
consistent since reconciliation is applied on the coordinator.
As a first step in acknowledging this, move the related files to
a replica/ subdirectory.
The option is provided by nodetool snapshot
https://docs.scylladb.com/operating-scylla/nodetool-commands/snapshot/
```
nodetool [(-h <host> | --host <host>)] [(-p <port> | --port <port>)]
[(-pp | --print-port)] [(-pw <password> | --password <password>)]
[(-pwf <passwordFilePath> | --password-file <passwordFilePath>)]
[(-u <username> | --username <username>)] snapshot
[(-cf <table> | --column-family <table> | --table <table>)]
[(-kc <kclist> | --kc.list <kclist>)]
[(-sf | --skip-flush)] [(-t <tag> | --tag <tag>)] [--] [<keyspaces...>]
-sf / –skip-flush Do not flush memtables before snapshotting (snapshot will not contain unflushed data)
```
But is currently ignored by scylla-jmx (scylladb/scylla-jmx#167)
and not supported at the api level.
This patch adds support for the option in advance
from the api service level down via snapshot_ctl
to the table class and snapshot implementation.
In addition, a corresponding unit test was added to verify
that taking a snapshot with `skip_flush` does not flush the memtable
(at the table::snapshot level).
Refs #8725Closes#8726
* github.com:scylladb/scylla:
test: database_test: add snapshot_skip_flush_works
api: storage_service/snapshots: support skip-flush option
snapshot: support skip_flush option
table: snapshot: add skip_flush option
api: storage_service/snapshots: add sf (skip_flush) option
Every time db/config.hh is modified (e.g., to add a new configuration
option), 110 source files need to be recompiled. Many of those 110 didn't
really care about configuration options, and just got the dependency
accidentally by including some other header file.
In this patch, I remove the include of "db/config.hh" from all header
files. It is only needed in source files - and header files only
need forward declarations. In some cases, source files were missing
certain includes which they got incidentally from db/config.hh, so I
had to add these includes explicitly.
After this patch, the number of source files that get recompiled after a
change to db/config.hh goes down from 110 to 45.
It also means that 65 source files now compile faster because they don't
include db/config.hh and whatever it included.
Additionally, this patch also eliminates a few unnecessary inclusions
of database.hh in other header files, which can use a forward declaration
or database_fwd.hh. Some of the source files including one of those
header files relied on one of the many header files brought in by
database.hh, so we need to include those explicitly.
In view_update_generator.hh something interesting happened - it *needs*
database.hh because of code in the header file, but only included
database_fwd.hh, and the only reason this worked was that the files
including view_update_generator.hh already happened to unnecessarily
include database.hh. So we fix that too.
Refs #1
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210505102111.955470-1-nyh@scylladb.com>
Fixes#8212
Some snapshotting operations call in on a single table at a time.
When checking for existing snapshots in this case, we should not
bother with snapshots in other tables. Add an optional "filter"
to check routine, which if non-empty includes tables to check.
Use case is "scrub" which calls with a limited set of tables
to snapshot.
Closes#8240
C++17 introduced try_emplace for maps to replace a pattern:
if(element not in a map) {
map.emplace(...)
}
try_emplace is more efficient and results in a more concise code.
This commit introduces usage of try_emplace when it's appropriate.
Tests: unit(dev)
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <4970091ed770e233884633bf6d46111369e7d2dd.1597327358.git.piotr@scylladb.com>
C++20 introduced `contains` member functions for maps and sets for
checking whether an element is present in the collection. Previously
`count` function was often used in various ways.
`contains` does not only express the intend of the code better but also
does it in more unified way.
This commit replaces all the occurences of the `count` with the
`contains`.
Tests: unit(dev)
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <b4ef3b4bc24f49abe04a2aba0ddd946009c9fcb2.1597314640.git.piotr@scylladb.com>
Now when the snapshot stopping is correctly handled, we may pull the database
reference all the way down to the schema::describe().
One tricky place is in table::napshot() -- the local db reference is pulled
through an smp::submit_to call, but thanks to the shard checks in the place
where it is needed the db is still "local"
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
For this de-static run_snapshot_*_operation (because we no longer have
the static global to get the lock from) and make the snapshot_ctl be
peering_sharded_service to call invoke_on.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This includes
- rename namespace in snapshot-ctl.[cc|hh]
- move methods from storage_service to snapshot_ctl
- move snapshot_details struct
- temporarily make storage_service._snapshot_lock and ._snapshot_ops public
- replace two get_local_storage_service() occurrences with this._db
The latter is not 100% clear as the code that does this references "this"
from another shard, but the _db in question is the distributed object, so
they are all the same on all instances.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This is plain move, no other modifications are made, even the
"service" namespace is kept, only few broken indentation fixes.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
A placeholder for snapshotting code that will be moved into it
from the storage_service.
Also -- pass it through the API for future use.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>