Commit Graph

183 Commits

Author SHA1 Message Date
Avi Kivity
5342d79461 Merge "Preparatory work in sstable_set for the upcoming compound_sstable_set_impl" from Raphael
* 'preparatory_work_for_compound_set' of github.com:raphaelsc/scylla:
  sstable_set: move all() implementation into sstable_set_impl
  sstable_set: preparatory work to change sstable_set::all() api
  sstables: remove bag_sstable_set
2021-03-10 19:19:26 +02:00
Raphael S. Carvalho
05b07c7161 sstable_set: preparatory work to change sstable_set::all() api
users of sstable_set::all() rely on the set itself keeping a reference
to the returned list, so user can iterate through the list assuming
that it is alive all the way through.

this will change in the future though, because there will be a
compound set impl which will have to merge the all() of multiple
managed sets, and the result is a temporary value.

so even range-based loops on all() have to keep a ref to the returned
list, to avoid the list from being prematurely destroyed.

so the following code
	for (auto& sst : *sstable_set.all()) { ...}
becomes
	for (auto sstables = sstable_set.all(); auto& sst : *sstables) { ... }

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-03-10 12:02:12 -03:00
Asias He
61ac8d03b9 repair: Add ignore_nodes option
In some cases, user may want to repair the cluster, ignoring the node
that is down. For example, run repair before run removenode operation to
remove a dead node.

Currently, repair will ignore the dead node and keep running repair
without the dead node but report the repair is partial and report the
repair is failed. It is hard to tell if the repair is failed only due to
the dead node is not present or some other errors.

In order to exclude the dead node, one can use the hosts option. But it
is hard to understand and use, because one needs to list all the "good"
hosts including the node itself. It will be much simpler, if one can
just specify the node to exclude explicitly.

In addition, we support ignore nodes option in other node operations
like removenode. This change makes the interface to ignore a node
explicitly more consistent.

Refs: #7806

Closes #8233
2021-03-09 16:03:13 +01:00
Avi Kivity
5f4bf18387 Revert "Merge 'sstables: add versioning to the sstable_set ' from Wojciech Mitros"
This reverts commit 31909515b3, reversing
changes made to ef97adc72a. It shows many
serious regressions in dtest.

Fixes #8197.
2021-03-02 13:21:22 +02:00
Avi Kivity
31909515b3 Merge 'sstables: add versioning to the sstable_set ' from Wojciech Mitros
Currently, the sstable_set in a table is copied before every change
to allow accessing the unchanged version by existing sstable readers.
This patch changes the sstable_set to a structure that keeps all its
versions that are referenced somewhere and provides a way of getting
a reference to an immutable version of the set.
Each sstable in the set is associated with the versions it is alive in,
and is removed when all such versions don't have references anymore.
To avoid copying, the object holding all sstables in the set version is
changed to a new structure, sstable_list, which was previously an alias
for std::unordered_set<shared_sstable>, and which implements most of the
methods of an unordered_set, but its iterator uses the actual set with
all sstables from all referenced versions and iterates over those
sstables that belong to the captured version.
The methods that modify the sets contents give strong exception guarantee
by trying to insert new sstables to its containers, and erasing them in
the case of an caught exception.
To release shared_sstables as soon as possible (i.e. when all references
to versions that contain them die), each time a version is removed, all
sstables that were referenced exclusively by this version are erased. We
are able to find these sstables efficiently by storing, for each version,
all sstables that were added and erased in it, and, when a version is
removed, merging it with the next one. When a version that adds an sstable
gets merged with a version that removes it, this sstable is erased.

Fixes #2622

Signed-off-by: Wojciech Mitros wojciech.mitros@scylladb.com

Closes #8111

* github.com:scylladb/scylla:
  sstables: add test for checking the latency of updating the sstable_set in a table
  sstables: move column_family_test class from test/boost to test/lib
  sstables: use fast copying of the sstable_set instead of rebuilding it
  sstables: replace the sstable_set with a versioned structure
  sstables: remove potential ub
  sstables: make sstable_set constructor less error-prone
2021-03-01 14:16:36 +02:00
Kamil Braun
e2f03e4aba cdc: move (most of) CDC generation management code to the new service
Currently all management of CDC generations happens in storage_service,
which is a big ball of mud that does many unrelated things.

Previous commits have introduced a new service for managing CDC
generations. This code moves most of the relevant code to this new
service.

However, some part still remains in storage_service: the bootstrap
procedure, which happens inside storage_service, must also do some
initialization regarding CDC generations, for example: on restart it
must retrieve the latest known generation timestamp from disk; on
bootstrap it must create a new generation and announce it to other
nodes. The order of these operations w.r.t the rest of the startup
procedure is important, hence the startup procedure is the only right
place for them.

Still, what remains in storage_service is a small part of the entire
CDC generation management logic; most of it has been moved to the
new service. This includes listening for generation changes and
updating the data structures for performing CDC log writes (cdc::metadata).
Furthermore these functions now return futures (and are internally
coroutines), where previously they required a seastar::async context.
2021-02-26 12:06:12 +01:00
Wojciech Mitros
48153a1e2c sstables: remove potential ub
If the range expression in a range based for loop returns a temporary,
its lifetime is extended until the end of the loop. The same can't be said
about temporaries created within the range expression. In our case,
*t->get_sstables_including_compacted_undeleted() returns a reference to a
const sstable_list, but the t->get_sstables_including_compacted_undeleted()
is a temporary lw_shared_ptr, so its lifetime may not be prolonged until the
end of the loop, and it may be the sole owner of the referenced sstable_list,
so the referenced sstable_list may be already deleted inside the loop too.
Fix by creating a local copy of the lw_shared_ptr, and get reference from it
in the loop.

Fixes #7605

Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>
2021-02-11 11:02:55 +01:00
Asias He
4d32d03172 storage_service: Introduce load_and_stream
=== Introduction ===

This feature extends the nodetool refresh to allow loading arbitrary sstables
that do not belong to a node into the cluster. It loads the sstables from disk
and calculates the owning nodes of the data and streams to the owners
automatically.

From example, say the old cluster has 6 nodes and the new cluster has 3 nodes.
We can copy the sstables from the old cluster to any of the new nodes and
trigger the load and stream process.

This can make restores and migrations much easier.

=== Performance ===

I managed to get 40MB/s per shard on my build machine.
CPU: AMD Ryzen 7 1800X Eight-Core Processor
DISK: Samsung SSD 970 PRO 512GB

Assume 1TB sstables per node, each shard can do 40MB/s, each node has 32
shards, we can finish the load and stream 1TB of data in 13 mins on each
node.

1TB / 40 MB per shard * 32 shard / 60 s = 13 mins

=== Tests ===

backup_restore_tests.py:TestBackupRestore.load_and_stream_to_new_cluster_test
which creates a cluster with 4 nodes and inserts data, then use
load_and_stream to restore to a 2 nodes cluster.

=== Usage ===

curl -X POST "http://{ip}:10000/storage_service/sstables/{keyspace}?cf={table}&load_and_stream=true

=== Notes ===

Btw, with the old nodetool refresh, the node will not pick up the data
that does not belong to this node but it will not delete it either. One
has to run nodetool cleanup to remove those data manually which is a
surprise to me and probably to users as well. With load and stream, the
process will delete the sstables once it finishes stream, so no nodetool
cleanup is needed.

The name of this feature load and stream follows load and store in CPU world.

Fixes #7831
2021-01-18 16:32:33 +08:00
Asias He
829b4c1438 repair: Make removenode safe by default
Currently removenode works like below:

- The coordinator node advertises the node to be removed in
  REMOVING_TOKEN status in gossip

- Existing nodes learn the node in REMOVING_TOKEN status

- Existing nodes sync data for the range it owns

- Existing nodes send notification to the coordinator

- The coordinator node waits for notification and announce the node in
  REMOVED_TOKEN

Current problems:

- Existing nodes do not tell the coordinator if the data sync is ok or failed.

- The coordinator can not abort the removenode operation in case of error

- Failed removenode operation will make the node to be removed in
  REMOVING_TOKEN forever.

- The removenode runs in best effort mode which may cause data
  consistency issues.

  It means if a node that owns the range after the removenode
  operation is down during the operation, the removenode node operation
  will continue to succeed without requiring that node to perform data
  syncing. This can cause data consistency issues.

  For example, Five nodes in the cluster, RF = 3, for a range, n1, n2,
  n3 is the old replicas, n2 is being removed, after the removenode
  operation, the new replicas are n1, n5, n3. If n3 is down during the
  removenode operation, only n1 will be used to sync data with the new
  owner n5. This will break QUORUM read consistency if n1 happens to
  miss some writes.

Improvements in this patch:

- This patch makes the removenode safe by default.

We require all nodes in the cluster to participate in the removenode operation and
sync data if needed. We fail the removenode operation if any of them is down or
fails.

If the user want the removenode operation to succeed even if some of the nodes
are not available, the user has to explicitly pass a list of nodes that can be
skipped for the operation.

$ nodetool removenode --ignore-dead-nodes <list_of_dead_nodes_to_ignore> <host_id>

Example restful api:

$ curl -X POST "http://127.0.0.1:10000/storage_service/remove_node/?host_id=7bd303e9-4c7b-4915-84f6-343d0dbd9a49&ignore_nodes=127.0.0.3,127.0.0.5"

- The coordinator can abort data sync on existing nodes

For example, if one of the nodes fails to sync data. It makes no sense for
other nodes to continue to sync data because the whole operation will
fail anyway.

- The coordinator can decide which nodes to ignore and pass the decision
  to other nodes

Previously, there is no way for the coordinator to tell existing nodes
to run in strict mode or best effort mode. Users will have to modify
config file or run a restful api cmd on all the nodes to select strict
or best effort mode. With this patch, the cluster wide configuration is
eliminated.

Fixes #7359

Closes #7626
2020-12-10 10:14:39 +02:00
Tomasz Grabiec
d3a5814f4f api: Connect nodetool resetlocalschema to schema version recalculation
It doesn't really do what the nodetool command is docuemented to do,
which is to truncate local schema tables, but it is still an
improvement.

Message-Id: <1605740190-30332-1-git-send-email-tgrabiec@scylladb.com>
2020-11-19 13:55:09 +02:00
Benny Halevy
29ed59f8c4 main: start a shared_token_metadata
And use it to get a token_metadata& compatible
with current usage, until the services are converted to
use token_metadata_ptr.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Amnon Heiman
48c3c94aa6 api/storage_service.cc: Add the get_range_to_endpoint_map
The get_range_to_endpoint_map method, takes a keyspace and returns a map
between the token ranges and the endpoint.

It is used by some external tools for repair.

Token ranges are codes as size-2 array, if start or end are empty, they will be
added as an empty string.

The implementation uses get_range_to_address_map and re-pack it
accordingly.

The use of stream_range_as_array it to reduce the risk of large
allocations and stalls.

Relates to scylladb/scylla-jmx#36

Signed-off-by: Amnon Heiman <amnon@scylladb.com>

Closes #7329
2020-10-08 12:09:09 +03:00
Pavel Emelyanov
8333fed8aa compaction: Keep database reference on upgrade options
The only place that creates them is the API upgrade_sstables call.

The created options object doesn't over-survive the returned
future, so it's safe to keep this reference there.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-08-21 14:58:40 +03:00
Pavel Emelyanov
285648620b repair: Keep sharded messaging service reference on repair_info
This reference comes from the API that already has it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-08-19 20:50:53 +03:00
Pavel Emelyanov
8b4820b520 repair: Keep sharded messaging service in API
The reference will be needed in repair_start, so prepare one in advance

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-08-19 20:50:53 +03:00
Pavel Emelyanov
126dac8ad1 repair: Unset API endpoints on stop
This unset the roll-back of the correpsonding _set-s. The messaging
service will be (already is, but implicitly) used in repair API
callbacks, so make sure they are unset before the messaging service
is stopped.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-08-19 20:50:53 +03:00
Pavel Emelyanov
fe2c479c04 repair: Setup API endpoints in separate helper
There will be the unset part soon, this is the preparation. No functional
changes in api/storage_server.cc, just move the code.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-08-19 20:50:52 +03:00
Piotr Jastrzebski
c001374636 codebase wide: replace count with contains
C++20 introduced `contains` member functions for maps and sets for
checking whether an element is present in the collection. Previously
`count` function was often used in various ways.

`contains` does not only express the intend of the code better but also
does it in more unified way.

This commit replaces all the occurences of the `count` with the
`contains`.

Tests: unit(dev)

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <b4ef3b4bc24f49abe04a2aba0ddd946009c9fcb2.1597314640.git.piotr@scylladb.com>
2020-08-15 20:26:02 +03:00
Asias He
271fac56a3 repair: Add synchronous API to query repair status
This new api blocks until the repair job is either finished or failed or timeout.

E.g.,

- Without timeout
curl -X GET http://127.0.0.1:10000/storage_service/repair_status/?id=123

- With timeout
curl -X GET http://127.0.0.1:10000/storage_service/repair_status/?id=123&timeout=5

The timeout is in second.

The current asynchronous api returns immediately even if the repair is in progress.

E.g., curl -X GET http://127.0.0.1:10000/storage_service/repair_async/ks?id=123

User can use the new synchronous API to avoid keep sending the query to
poll if the repair job is finished.

Fixes #6445
2020-07-14 11:20:15 +03:00
Asias He
07e253542d compaction_manager: Avoid stall in perform_cleanup
The following stall was seen during a cleanup operation:

scylla: Reactor stalled for 16262 ms on shard 4.

| std::_MakeUniq<locator::tokens_iterator_impl>::__single_object std::make_unique<locator::tokens_iterator_impl, locator::tokens_iterator_impl&>(locator::tokens_iterator_impl&) at /usr/include/fmt/format.h:1158
|  (inlined by) locator::token_metadata::tokens_iterator::tokens_iterator(locator::token_metadata::tokens_iterator const&) at ./locator/token_metadata.cc:1602
| locator::simple_strategy::calculate_natural_endpoints(dht::token const&, locator::token_metadata&) const at simple_strategy.cc:?
|  (inlined by) locator::simple_strategy::calculate_natural_endpoints(dht::token const&, locator::token_metadata&) const at ./locator/simple_strategy.cc:56
| locator::abstract_replication_strategy::get_ranges(gms::inet_address, locator::token_metadata&) const at /usr/include/fmt/format.h:1158
| locator::abstract_replication_strategy::get_ranges(gms::inet_address) const at /usr/include/fmt/format.h:1158
| service::storage_service::get_ranges_for_endpoint(seastar::basic_sstring<char, unsigned int, 15u, true> const&, gms::inet_address const&) const at /usr/include/fmt/format.h:1158
| service::storage_service::get_local_ranges(seastar::basic_sstring<char, unsigned int, 15u, true> const&) const at /usr/include/fmt/format.h:1158
|  (inlined by) operator() at ./sstables/compaction_manager.cc:691
|  (inlined by) _M_invoke at /usr/include/c++/9/bits/std_function.h:286
| std::function<std::vector<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > > (table const&)>::operator()(table const&) const at /usr/include/fmt/format.h:1158
|  (inlined by) compaction_manager::rewrite_sstables(table*, sstables::compaction_options, std::function<std::vector<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > > (table const&)>) at ./sstables/compaction_manager.cc:604
| compaction_manager::perform_cleanup(table*) at /usr/include/fmt/format.h:1158

To fix, we furturize the function to get local ranges and sstables.

In addition, this patch removes the dependency to global storage_service object.

Fixes #6662
2020-07-01 15:03:50 +08:00
Pavel Emelyanov
d0d2da6ccb api: Remove excessive capture
The "result" in this lambda is already not used and can be removed

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-06-29 19:08:59 +03:00
Pavel Emelyanov
4f5ffa980d api: Fix indentation after previous patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-06-29 19:08:59 +03:00
Pavel Emelyanov
d99969e0e0 api: Fix wrongly captured map of snapshots
The results of get_snapshot_details() is saved in do_with, then is
captured on the json callback by reference, then the do_with's
future returns, so by the time callback is called the map is already
free and empty.

Fix by capturing the result directly on the callback.
Fixes recently merged b6086526.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-06-29 19:08:21 +03:00
Pavel Emelyanov
d674baacef snapshot: Move all code into db::snapshot_ctl class
This includes
- rename namespace in snapshot-ctl.[cc|hh]
- move methods from storage_service to snapshot_ctl
- move snapshot_details struct
- temporarily make storage_service._snapshot_lock and ._snapshot_ops public
- replace two get_local_storage_service() occurrences with this._db

The latter is not 100% clear as the code that does this references "this"
from another shard, but the _db in question is the distributed object, so
they are all the same on all instances.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-06-26 19:59:53 +03:00
Pavel Emelyanov
d989d9c1c7 snapshots: Initial skeleton
A placeholder for snapshotting code that will be moved into it
from the storage_service.

Also -- pass it through the API for future use.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-06-26 19:54:14 +03:00
Pavel Emelyanov
9a8a1635b7 snapshots: Properly shutdown API endpoints
Now with the seastar httpd routes unset() at hands we
can shut down individual API endpoints. Do this for
snapshot calls, this will make snapshot controller stop
safe.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-06-26 17:27:45 +03:00
Pavel Emelyanov
b608652622 api: Rewrap set_server_snapshot lambda
The lambda calls the core snapshot method deep inside the
json marshalling callback. This will bring problems with
stopping the snapshot controller in the next patches.

To prepare for this -- call the .get_snapshot_details()
first, then keep the result in do_with() context. This
change doesn't affect the issue the lambde in question is
about to solve as the whole result set is anyway kept in
memory while being streamed outside.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-06-26 17:27:45 +03:00
Glauber Costa
bb07678346 api: do not allow user to meddle with auto compaction too early
We are about to use the auto compaction property during the
populate/reshard process. If the user toggles it, the database can be
left in a bad state.

There should be no reason why a user would want to set that up this
early. So we'll disallow it.

To do that property, it is better if the check of whether or not
the storage service is ready to accomodate this request is local
to the storage service itself. We then move the logic of set_tables_autocompaction
from api to the storage service. The API layer now merely translates
the table names and pass it along.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2020-06-18 09:00:25 -04:00
Nadav Har'El
86a4dfcd29 merge: api: Command to check and repair cdc streams
Merged pull request https://github.com/scylladb/scylla/pull/6551
from Juliusz Stasiewicz:

The command regenerates streams when:

    generations corresponding to a gossiped timestamp cannot be
    fetched from system_distributed table,
    or when generation token ranges do not align with token metadata.

In such case the streams are regenerated and new timestamp is
gossiped around. The returned JSON is always empty, regardless of
whether streams needed regeneration or not.

Fixes #6498
Accompanied by: scylladb/scylla-jmx#109, scylladb/scylla-tools-java#172
2020-06-15 14:17:35 +03:00
Pavel Emelyanov
a1df24621c thrift_controller: Switch on standalone
Remove the on-storage_service instance and make everybody use
th standalone one.

Stopping the thrift is done by registering the controller in
client service shutdown hooks. This automatically wires the
stopping into drain, decommission and isolation codes.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-06-12 22:14:33 +03:00
Pavel Emelyanov
c26943e7b5 thrift_controller: Pass one through management API
The goal is to make the relevant endpoints work on standalone
thrift controller instead of the storage_service's one, so
prepare this controller (dummy for now) and pass it all the
way down the API code.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-06-12 22:14:33 +03:00
Pavel Emelyanov
1d5cdfe3c6 cql_controller: Switch on standalone
Remove the on-storage_service instance and make everybody use
th standalone one.

Stopping the server is done by registering the controller in
client service shutdown hooks. This automatically wires the
stopping into drain, decommission and isolation codes.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-06-12 22:14:09 +03:00
Pavel Emelyanov
7ebe44f33d cql_controller: Pass one through management API
The goal is to make the relevant endpoints work on standalone
cql controller instead of the storage_service's one, so
prepare this controller (dummy for now) and pass it all the
way down the API code.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-06-12 22:14:09 +03:00
Pavel Emelyanov
6a89c987e4 api: Tune reg/unreg of client services control endpoints
Currntly API endpoints to start and stop cql_server and thrift
are registered right after the storage service is started, but
much earlier than those services are. In between these two
points a lot of other stuff gets initialized. This opens a small
window  during which cql_server and thrift can be started by
hand too early.

The most obvious problem is -- the storage_service::join_cluster()
may not yet be called, the auth service is thus not started, but
starting cql/thrift needs auth.

Another problem is those endpoints are not unregistered on stop,
thus creating another way to start cql/thrif at wrong time.

Also the endpoints registration change helps further patching.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-06-12 18:47:24 +03:00
Juliusz Stasiewicz
0ad50013ff storage_service: Implementation of API call to repair CDC streams
The command regenerates streams when:
- generations corresponding to a gossiped timestamp cannot be
fetched from `system_distributed` table,
- or when generation token ranges do not align with token metadata.

In such case the streams are regenerated and new timestamp is
gossiped around. The returned JSON is always empty, regardless of
whether streams needed regeneration or not.
2020-06-06 16:52:21 +02:00
Juliusz Stasiewicz
aadd2ffa6a api: Added command /storage_service/cdc_streams_check_and_repair
This commit introduces a placeholder for HTTP POST request at
`/storage_service/cdc_streams_check_and_repair`.
2020-05-29 12:23:08 +02:00
Avi Kivity
513faa5c71 Merge 'Use http Stream for describe ring' from Amnon
"
This series changes the describe_ring API to use HTTP stream instead of serializing the results and send it as a single buffer.

While testing the change I hit a 4-year-old issue inside service/storage_proxy.cc that causes a use after free, so I fixed it along the way.

Fixes #6297
"

* amnonh-stream_describe_ring:
  api/storage_service.cc: stream result of token_range
  storage_service: get_range_to_address_map prevent use after free
2020-05-17 14:05:26 +03:00
Amnon Heiman
7c4562d532 api/storage_service.cc: stream result of token_range
The get token range API can become big which can cause large allocation
and stalls.

This patch replace the implementation so it would stream the results
using the http stream capabilities instead of serialization and sending
one big buffer.

Fixes #6297

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2020-05-17 13:56:05 +03:00
Avi Kivity
f1fde537a9 Merge 'Support Snapshot of multiple tables' from Amnon
This series adds support for taking a snapshot of multiple tables.

Fixes #6333

* amnonh-snapshot_keyspace_table:
  api/storage_service.cc: Snapshot, support multiple tables
  service/storage_service: Take snapshot of multiple tables
2020-05-12 11:34:09 +03:00
Ivan Prisyazhnyy
84e25e8ba4 api: support table auto compaction control
The patch implements:

- /storage_service/auto_compaction API endpoint
- /column_family/autocompaction/{name} API endpoint

Those APIs allow to control and request the status of background
compaction jobs for the existing tables.

The implementation introduces the table::_compaction_disabled_by_user.
Then the CompactionManager checks if it can push the background
compaction job for the corresponding table.

New members
===

    table::enable_auto_compaction();
    table::disable_auto_compaction();
    bool table::is_auto_compaction_disabled_by_user() const

Test
===
Tests: unit(sstable_datafile_test autocompaction_control_test), manual

    $ ninja build/dev/test/boost/sstable_datafile_test
    $ ./build/dev/test/boost/sstable_datafile_test --run_test=autocompaction_control_test -- -c1 -m2G --overprovisioned --unsafe-bypass-fsync 1 --blocked-reactor-notify-ms 2000000

The test tries to submit a compaction job after playing
with autocompaction control table switch. However, there is
no reliable way to hook pending compaction task. The code
assumed that with_scheduling_group() closure will never
preempt execution of the stats check.

Revert
===
Reverts commit c8247ac. In previous version the execution
sometimes resulted into the following error:

    test/boost/sstable_datafile_test.cc(1076): fatal error: in "autocompaction_control_test":
    critical check cm->get_stats().pending_tasks == 1 || cm->get_stats().active_tasks == 1 has failed

This version adds a few sstables to the cf, starts
the compaction and awaits until it is finished.

API change
===

- `/column_family/autocompaction/` always returned `true` while answering to the question: if the autocompaction disabled (see https://github.com/scylladb/scylla-jmx/blob/master/src/main/java/org/apache/cassandra/db/ColumnFamilyStore.java#L321). now it answers to the question: if the autocompaction for specific table is enabled. The question logic is inverted. The patch to the JMX is required. However, the change is decent because all old values were invalid (it always reported all compactions are disabled).
- `/column_family/autocompaction/` got support for POST/DELETE per table

Fixes
===

Fixes #1488
Fixes #1808
Fixes #440

Signed-off-by: Ivan Prisyazhnyy <ivan@scylladb.com>
Reviewed-by: Glauber Costa <glauber@scylladb.com>
2020-05-07 16:23:38 +03:00
Amnon Heiman
ee7b40e31b api/storage_service.cc: Snapshot, support multiple tables
It is sometimes useful to take a snapshot of multiple tables inside a
keyspace.

This patch add support for multiple tables names when taking a snapshot.

The change consist of splitting the table (column family) name and use
the array of table instead of just one.

After this patch this will be supported:
curl -X POST 'http://localhost:10000/storage_service/snapshots?tag=snapshottag&kn=system&cf=range_xfers,large_partitions'

Fixes #6333

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2020-05-05 12:55:36 +03:00
Raphael S. Carvalho
02e046608f api/service: fix segfault when taking a snapshot without keyspace specified
If no keyspace is specified when taking snapshot, there will be a segfault
because keynames is unconditionally dereferenced. Let's return an error
because a keyspace must be specified when column families are specified.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200427195634.99940-1-raphaelsc@scylladb.com>
2020-04-27 23:37:00 +03:00
Pekka Enberg
c8247aced6 Revert "api: support table auto compaction control"
This reverts commit 1c444b7e1e. The test
it adds sometimes fails as follows:

  test/boost/sstable_datafile_test.cc(1076): fatal error: in "autocompaction_control_test":
  critical check cm->get_stats().pending_tasks == 1 || cm->get_stats().active_tasks == 1 has failed

Ivan is working on a fix, but let's revert this commit to avoid blocking
next promotion failing from time to time.
2020-04-11 17:56:02 +03:00
Ivan Prisyazhnyy
1c444b7e1e api: support table auto compaction control
This patch adds API endpoint /column_family/autocompaction/{name}
that listen to GET and POST requests to pick and control table
background compactions.

To implement that the patch introduces "_compaction_disabled_by_user"
flag that affects if CompactionManager is allowed to push background
compactions jobs into the work.

It introduces

    table::enable_auto_compaction();
    table::disable_auto_compaction();
    bool table::is_auto_compaction_disabled_by_user() const

to control auto compaction state.

Fixes #1488
Fixes #1808
Fixes #440
Tests: unit(sstable_datafile_test autocompaction_control_test), manual
2020-04-08 21:18:38 +03:00
Rafael Ávila de Espíndola
8da235e440 everywhere: Use futurize_invoke instead of futurize<T>::invoke
No functionality change, just simpler.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200330165308.52383-1-espindola@scylladb.com>
2020-04-03 15:53:35 +02:00
Rafael Ávila de Espíndola
eca0ac5772 everywhere: Update for deprecated apply functions
Now apply is only for tuples, for varargs use invoke.

This depends on the seastar changes adding invoke.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200324163809.93648-1-espindola@scylladb.com>
2020-03-25 08:49:53 +02:00
Pavel Emelyanov
7363d56946 sstables: Move get_highest_supported_format
The global get_highest_supported_format helper and its declaration
are scattered all over the code, so clean this up and prepare the
ground for moving _sstables_format from the storage_service onto
the sstables_manager (not this set).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-02-25 14:31:45 +03:00
Amnon Heiman
6b020e67ce api/storage_service: Support specifying a table when deleting a snapshot
This patch adds an optional parameter to DELETE /storage_service/snapshots

After this patch the following will be supported:

If a keyspace called keyspace1 and a table called standard1 exists.

curl -X POST 'http://localhost:10000/storage_service/snapshots?tag=am1&kn=keyspace1'

curl -X DELETE --header 'Accept: application/json' 'http://localhost:10000/storage_service/snapshots?tag=am1&kn=keyspace1&cf=standard1'

Fixes #5658

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2020-02-18 16:34:10 +02:00
Amnon Heiman
c3260bad25 storage_service: Add optional table name to clear snapshot
There are cases when it is useful to delete specific table from a
snapshot.

An example is when a snapshot is used for backup. Backup can take a long
period of time, during that time, each of the tables can be deleted once
it was backup without waiting for the entire backup process to
completed.

This patch adds such an option to the database and to the storage_service
wrapping method that calls it.

If a table is specified a filter function is created that filter only
the column family with that given name.

This is similar to the filtering at the keyspace level.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2020-02-18 16:34:10 +02:00
Tomasz Grabiec
76d1dd7ec6 Merge "nodetool scrub: implement validation and the skip-corrupted flag
" from Botond

Nodetool scrub rewrites all sstables, validating their data. If corrupt
data is found the scrub is aborted. If the skip-corrupted flag is set,
corrupt data is instead logged (just the keys) and skipped.

The scrubbing algorithm itself is fairly simple, especially that we
already have a mutation stream validator that we can use to validate the
data. However currently scrub is piggy-backed on top of cleanup
compaction. To implement this flag, we have to make scrub a separate
compaction type and propagate down the flag. This required some
massaging of the code:
* Add support for more than two (cleanup or not) compaction types.
* Allow passing custom options for each compaction type.
* Allow stopping a compaction without the manager retrying it later.

Additionally the validator itself needed some changes to allow different
ways to handle errors, as needed by the scrub.

Fixes: #5487

* https://github.com/denesb/nodetool-scrub-skip-corrupted/v7:
  table: cleanup_sstables(): only short-circuit on actual cleanup
  compaction: compaction_type: add Upgrade
  compaction: introduce compaction_options
  compaction: compaction_descriptor: use compaction options instead of
    cleanup flag
  compaction_manager: collect all cleanup related logic in
    perform_cleanup()
  sstables: compaction_stop_exception: add retry flag
  mutation_fragment_stream_validator: split into low-level and
    high-level API
  compaction: introduce scrub_compaction
  compaction_manager: scrub: don't piggy-back on upgrade_sstables()
  test: sstable_datafile_test: add scrub unit test
2020-02-17 15:28:07 +02:00