This implements support for triggering major compactions through the REST
API. Please note that "split_output" is not supported and Glauber Costa
confirmed that this is fine:
"We don't support splits, nor do I think we should."
Signed-off-by: Ivan Prisyazhnyy <ivan@scylladb.com>
Simple REST API for error injection is implemented.
The API allows the following operations:
* injecting an error at a given injection name
* listing injections
* disabling an injection
* disabling all injections
Currently the API enables/disables on all shards.
Closes #3295
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
If sstring is made an alias to std::string, ADL causes std::make_shared
to be found. Explicitly ask for ::make_shared.
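
The lookup problem can be reproduced outside the codebase. The sketch below
is only an illustration: the global `make_shared` template and the `named`
type are hypothetical stand-ins, not the project's actual definitions.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a project-level make_shared that is visible
// in the global namespace.
template <typename T, typename... Args>
std::shared_ptr<T> make_shared(Args&&... args) {
    return std::shared_ptr<T>(new T(std::forward<Args>(args)...));
}

// Hypothetical type constructed from a std::string.
struct named {
    std::string name;
    explicit named(std::string n) : name(std::move(n)) {}
};

std::shared_ptr<named> make_named(const std::string& n) {
    // An unqualified make_shared<named>(n) would also consider
    // std::make_shared through ADL, because n is a std::string, and the
    // two candidates would be ambiguous. Qualifying the name with ::
    // disables ADL and selects the global template.
    return ::make_shared<named>(n);
}
```

When the argument type lives outside namespace std, the unqualified call
resolves without ambiguity, which is why the issue only shows up once the
string type becomes std::string.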
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Presently lightweight transactions piggyback the old
row value on the prepare round response. If one of the participants
did not provide the old value or the values from peers don't match,
we perform a full read round which will repair the Paxos table and the
base table, if necessary, at all participants.
Capture the fact that read optimization has failed in a metric.
Message-Id: <20200304192955.84208-2-kostja@scylladb.com>
The global get_highest_supported_format helper and its declaration
are scattered all over the code, so clean this up and prepare the
ground for moving _sstables_format from the storage_service onto
the sstables_manager (not this set).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This series adds an option to the API that supports deleting
a specific table from a snapshot.
The implementation works in a similar way to the option
to specify specific keyspaces when deleting a snapshot.
The motivation is to allow reducing disk-space when using
the snapshot for backup. A dtest PR is sent to the dtest
repository.
Fixes #5658
Original PR #5805
Tests: (database_test) (dtest snapshot_test.py:TestSnapshot.test_cleaning_snapshot_by_cf)
* amnonh/delete_table_snapshot:
test/boost/database_test: adopt new clear_snapshot signature
api/storage_service: Support specifying a table when deleting a snapshot
storage_service: Add optional table name to clear snapshot
The set_config registers lambdas that need db.local(), so
these routes must be registered after the database is started.
Fixes: #5849
Tests: unit(dev), manual wget on API
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200219130654.24259-1-xemul@scylladb.com>
There are cases when it is useful to delete a specific table from a
snapshot.
An example is when a snapshot is used for backup. A backup can take a
long time; during that time, each table can be deleted once it has been
backed up, without waiting for the entire backup process to complete.
This patch adds such an option to the database and to the storage_service
wrapping method that calls it.
If a table is specified, a filter function is created that selects only
the column family with the given name.
This is similar to the filtering at the keyspace level.
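
As a rough illustration of the filtering idea, the sketch below builds a
predicate from an optional table name. The names `table_filter` and
`make_table_filter`, and the convention that an empty name means "all
tables", are assumptions made for this example and are not taken from the
patch.

```cpp
#include <functional>
#include <string>

// Hypothetical predicate type: returns true for column families whose
// snapshot should be cleared.
using table_filter = std::function<bool(const std::string&)>;

// Assumption for this sketch: an empty table name means no restriction.
table_filter make_table_filter(const std::string& table_name) {
    if (table_name.empty()) {
        return [](const std::string&) { return true; };
    }
    // Copy the name into the lambda so the filter owns it.
    return [table_name](const std::string& cf_name) {
        return cf_name == table_name;
    };
}
```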
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
" from Botond
Nodetool scrub rewrites all sstables, validating their data. If corrupt
data is found the scrub is aborted. If the skip-corrupted flag is set,
corrupt data is instead logged (just the keys) and skipped.
The scrubbing algorithm itself is fairly simple, especially since we
already have a mutation stream validator that we can use to validate the
data. However, currently scrub is piggy-backed on top of cleanup
compaction. To implement this flag, we have to make scrub a separate
compaction type and propagate down the flag. This required some
massaging of the code:
* Add support for more than two (cleanup or not) compaction types.
* Allow passing custom options for each compaction type.
* Allow stopping a compaction without the manager retrying it later.
Additionally the validator itself needed some changes to allow different
ways to handle errors, as needed by the scrub.
Fixes: #5487
* https://github.com/denesb/nodetool-scrub-skip-corrupted/v7:
table: cleanup_sstables(): only short-circuit on actual cleanup
compaction: compaction_type: add Upgrade
compaction: introduce compaction_options
compaction: compaction_descriptor: use compaction options instead of
cleanup flag
compaction_manager: collect all cleanup related logic in
perform_cleanup()
sstables: compaction_stop_exception: add retry flag
mutation_fragment_stream_validator: split into low-level and
high-level API
compaction: introduce scrub_compaction
compaction_manager: scrub: don't piggy-back on upgrade_sstables()
test: sstable_datafile_test: add scrub unit test
Now that we have the necessary infrastructure to do actual scrubbing,
don't rely on `upgrade_sstables()` anymore behind the scenes, instead do
an actual scrub.
Also, use the skip-corrupted flag.
The list_snapshot API uses an HTTP stream to stream the result to the
caller.
It needs to keep all objects and the stream alive until the stream is closed.
This patch adds do_with to hold these objects during the lifetime of the
function.
Fixes #5752
get_snapshot should use an HTTP stream to reduce memory allocation and
stalls.
This patch changes the implementation so that it streams each
snapshot object instead of building a single response and returning it.
Fixes #5468
Depends on scylladb/seastar#723
This commit builds on top of the introduced per scheduling group
statistics template and employs it to provide per scheduling
group statistics in storage_proxy.
Some of the statistics are also meaningful as global, per
shard statistics. Those are the ones used to decide whether to
throttle a write request. This was handled by creating a
global stats struct that holds those stats and by changing
the stat update to also update the global one.
One complication is an already existing aggregation
over the per shard stats, which have now become per scheduling group,
per shard stats; the aggregation therefore becomes two-dimensional.
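
The two-dimensional aggregation can be pictured with a small standalone
sketch. The `write_stats` struct and the nested-vector layout
(`stats[shard][group]`) are assumptions chosen for illustration; the real
code aggregates across shards with the sharded service machinery rather
than a plain vector.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical subset of the statistics tracked per scheduling group.
struct write_stats {
    std::uint64_t writes;
    std::uint64_t bytes;
};

// Sum over both dimensions: the outer index is the shard, the inner index
// is the scheduling group on that shard.
write_stats aggregate(const std::vector<std::vector<write_stats>>& per_shard_per_group) {
    write_stats total{0, 0};
    for (const auto& shard : per_shard_per_group) {
        for (const auto& group : shard) {
            total.writes += group.writes;
            total.bytes += group.bytes;
        }
    }
    return total;
}
```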
One thing this commit doesn't handle is validating that an individual
statistic didn't "cross a scheduling group boundary". Such validation
is possible and can easily be added in the future. There is a
subtlety to doing so: if an operation does cross into another
scheduling group, two connected statistics can become unbalanced,
for example written bytes and completed write transactions.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
In storage_service's snapshot code there are checks for
_operation_mode being _not_ JOINING to proceed. The intention
is apparently to allow for snapshots only after the cluster
join. However, here's what the start-up code looks like:
1. _operation_mode = STARTING in storage_service::constructor
2. snapshot API registered in api::set_server_storage_service
3. _operation_mode = JOINING in storage_service::join_token_ring
So in between steps 2 and 3 snapshots can be taken.
Although there's a quick and simple fix for that (check for the
_operation_mode to be not STARTING either) I think it's better
to register the snapshot API later instead. This will help
greatly to de-bloat the storage_service, in particular -- to
encapsulate the _operation_mode properly.
Note, though the check for _operation_mode is made only for
taking snapshot, I move all snapshot ops registration to the
later phase.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This is preparation for the next patch -- the lambda in
question (and the used type) will be needed in two
functions, so make the lambda a "real" function.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There's a lonely get_load_map() call on storage_service that
needs only load broadcaster, always runs on shard 0 and that's it.
The next patch will move this whole stuff into its own helper no-shard
container, and this patch is preparation for that.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The option in question apparently does not work: several sharded objects
are start()-ed (and thus instantiated) in join_token_ring, while the
instances of these objects are used during init of other stuff.
This leads to a failed seastar local_is_initialized assertion on sys_dist_ks,
but reading the code shows more examples, e.g. the auth_service is started
on join, but is used for thrift and cql servers initialization.
The suggestion is to remove the option instead of fixing it. The is_joined
logic is kept since on-start joining still can take some time and it's safer
to report real status from the API.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20191203140717.14521-1-xemul@scylladb.com>
In swagger 1.2 int is defined as int32.
We originally used int following the JMX definition; in practice,
internally we use uint and int64 in many places.
While the API formats the type correctly, an external system that uses
a swagger-based code generator can face a type mismatch.
This patch replaces all uses of int in a return type with long, which is defined as int64.
Changing the return type has no impact on the system, but it does help
external systems that use a code generator based on swagger.
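
To make the mismatch concrete, the sketch below checks whether a 64-bit
value survives being declared as a 32-bit int, which is what a generated
client would assume for a swagger 1.2 `int`. The helper name
`fits_in_int32` is hypothetical and exists only for this illustration.

```cpp
#include <cstdint>
#include <limits>

// Returns true when the value can be represented as a 32-bit signed
// integer, i.e. when a client generated from an "int" definition would
// read it back unchanged.
bool fits_in_int32(std::int64_t v) {
    return v >= std::numeric_limits<std::int32_t>::min()
        && v <= std::numeric_limits<std::int32_t>::max();
}
```

Counters such as byte totals routinely exceed the 32-bit range, so
declaring them as long (int64) lets generated clients pick a wide enough
type.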
Fixes #5347
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
From Shlomi:
4 node cluster Node A, B, C, D (Node A: seed)
cassandra-stress write n=10000000 -pop seq=1..10000000 -node <seed-node>
cassandra-stress read duration=10h -pop seq=1..10000000 -node <seed-node>
while read is progressing
Node D: nodetool decommission
Node A: nodetool status node - wait for UL
Node A: nodetool cleanup (while decommission progresses)
I get the error on c-s once decommission ends
java.io.IOException: Operation x0 on key(s) [383633374d31504b5030]: Data returned was not validated
The problem is that when a node gets new ranges (e.g. a bootstrapping node, or
the existing nodes after a node is removed or decommissioned), nodetool cleanup will
remove data within the new ranges, which the node has just received from other nodes.
To fix this, we should reject nodetool cleanup when there are pending ranges on that node.
Note that rejecting nodetool cleanup is not full protection, because new ranges
can be assigned to the node while cleanup is still in progress. However, it is
a good start until we have a full protection solution.
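
The guard described above amounts to a precondition check before cleanup
starts. The sketch below is a simplified stand-in: the function name
`check_cleanup_allowed`, the count parameter, and the exception type are
assumptions, not the actual code in the patch.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical precondition: refuse to start cleanup while the node still
// has pending ranges, since cleanup could delete data that was just
// streamed in for those ranges.
void check_cleanup_allowed(std::size_t pending_ranges) {
    if (pending_ranges > 0) {
        throw std::runtime_error(
            "cleanup rejected: node has pending ranges");
    }
}
```

As the note above says, a check made once at the start does not cover
ranges that become pending while cleanup is already running.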
Refs: #5045
Merged patch series by Amnon Heiman:
This patch fixes a bug where a map is held on the stack and then used
by a future.
Instead, the map is now moved to the relevant lambda function.
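
The lifetime issue can be shown without futures by using any deferred
callable. In the sketch below, `std::function` stands in for the
continuation, and `deferred_size` is a hypothetical name; the point is only
that the callable owns the map instead of referring to a stack variable.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

std::function<std::size_t()> deferred_size() {
    std::map<std::string, int> m{{"a", 1}, {"b", 2}};
    // Capturing m by reference would leave the callable with a dangling
    // reference once this function returns. Moving the map into the
    // capture makes the callable own it for as long as it lives.
    return [owned = std::move(m)] { return owned.size(); };
}
```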
Fixes #4824
The sum_ratio struct is a helper struct that is used when calculating
ratio over multiple shards.
Originally it was created with the thought that it might need to use a future; in
practice the future was never used and was ignored.
This patch removes the future from the implementation and removes an
unhandled future warning from the compilation.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
This patch adds an implementation of the get built indexes API and removes a
FIXME.
The API returns the list of built secondary indexes that belong to a column family.
Example:
CREATE KEYSPACE scylla_demo WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};
CREATE TABLE scylla_demo.mytableID ( uid uuid, text text, time timeuuid, PRIMARY KEY (uid, time) );
CREATE index on scylla_demo.mytableID (time);
$ curl -X GET 'http://localhost:10000/column_family/built_indexes/scylla_demo%3Amytableid'
["mytableid_time_idx"]
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
This patch fixes a bug where a map is held on the stack and then used
by a future.
Instead, the map is now wrapped with do_with.
Fixes #4824
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Assembles information and attributes of sstables in one or more
column families.
v2:
* Use (not really legal) nested "type" in json
* Rename "table" param to "cf" for consistency
* Some comments on data sizes
* Stream result to avoid huge string allocations on final json
There is a gcc9 bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90415
that makes it impossible to pass std::any through a seastar::future<T>.
Fortunately, there is only one user of seastar::future<std::any> in
Scylla and it is not performance-critical. This patch avoids the gcc9
bug by using seastar::future<std::unique_ptr<std::any>>.
Fixes #4525
req_param uses boost::lexical_cast to convert text->var.
However, lexical_cast does not handle textual booleans,
thus param=true causes not only wrong values, but
exceptions.
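
One way to handle textual booleans is to parse them explicitly before
falling back to a generic conversion. The sketch below uses only the
standard library; the function name `parse_bool_param` and the set of
accepted spellings are assumptions for illustration and may differ from
what the patch accepts.

```cpp
#include <algorithm>
#include <cctype>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts a request parameter to bool, accepting
// textual forms in any letter case as well as 1/0.
bool parse_bool_param(std::string text) {
    std::transform(text.begin(), text.end(), text.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    if (text == "true" || text == "yes" || text == "1") {
        return true;
    }
    if (text == "false" || text == "no" || text == "0") {
        return false;
    }
    throw std::invalid_argument("not a boolean: " + text);
}
```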
Message-Id: <20190610140511.15478-1-calle@scylladb.com>
Now that named_value::value_as_json() exists, make use of it to report the
current value of a configuration variable via the REST API, instead of
_make_config_values().
ignore_ready_future in load_new_ss_tables broke
migration_test:TestMigration_with_*.migrate_sstable_with_counter_test_expect_fail dtests.
The java.io.NotSerializableException in nodetool was caused by exceptions that
were too long.
This fix prints the problematic file names to the node system log
and includes the cause in the resulting exception so as to provide the user
with information about the nature of the error.
Fixes #4375
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190331154006.12808-1-bhalevy@scylladb.com>
Fixes #4245
Implemented as a compaction barrier (forcing previous compactions to
finish) + parameterized "cleanup", with sstable list based on
parameters.
get_compaction_history can return a big chunk of data.
To prevent a large memory allocation, get_compaction_history now reads
each compaction_history record and uses the HTTP stream to send it.
Fixes #4152
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Use std::function instead of a template parameter. Likely doesn't gain
anything, because the template was always instantiated with the same type
(the result of std::bind() with the same signatures), but still good practice.
std::function was used instead of noncopyable_function because
sharded::map_reduce0() copies the input function.
Use noncopyable_function instead of a template parameter. Likely doesn't gain
anything, because the template was always instantiated with the same type
(the result of std::bind() with the same signatures), but still good practice.
This renames some variables and functions to make it clear that they
refer to partitions and not rows.
Old versions of sstablemetadata used to refer to a row histogram, but
current versions now mention a partition histogram instead.
This patch doesn't change the exposed API names.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20181229223311.4184-2-espindola@scylladb.com>
This header, which is easily replaced with a forward declaration,
introduces a dependency on database.hh everywhere. Remove it and scatter
includes of database.hh in source files that really need it.