C++20 introduced the `contains` member function for maps and sets, which
checks whether an element is present in the collection. Previously the
`count` function was often used in various ways.
`contains` not only expresses the intent of the code better but also
does it in a more unified way.
This commit replaces all occurrences of `count` with `contains`.
Tests: unit(dev)
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <b4ef3b4bc24f49abe04a2aba0ddd946009c9fcb2.1597314640.git.piotr@scylladb.com>
The following stall was seen during a cleanup operation:
scylla: Reactor stalled for 16262 ms on shard 4.
| std::_MakeUniq<locator::tokens_iterator_impl>::__single_object std::make_unique<locator::tokens_iterator_impl, locator::tokens_iterator_impl&>(locator::tokens_iterator_impl&) at /usr/include/fmt/format.h:1158
| (inlined by) locator::token_metadata::tokens_iterator::tokens_iterator(locator::token_metadata::tokens_iterator const&) at ./locator/token_metadata.cc:1602
| locator::simple_strategy::calculate_natural_endpoints(dht::token const&, locator::token_metadata&) const at simple_strategy.cc:?
| (inlined by) locator::simple_strategy::calculate_natural_endpoints(dht::token const&, locator::token_metadata&) const at ./locator/simple_strategy.cc:56
| locator::abstract_replication_strategy::get_ranges(gms::inet_address, locator::token_metadata&) const at /usr/include/fmt/format.h:1158
| locator::abstract_replication_strategy::get_ranges(gms::inet_address) const at /usr/include/fmt/format.h:1158
| service::storage_service::get_ranges_for_endpoint(seastar::basic_sstring<char, unsigned int, 15u, true> const&, gms::inet_address const&) const at /usr/include/fmt/format.h:1158
| service::storage_service::get_local_ranges(seastar::basic_sstring<char, unsigned int, 15u, true> const&) const at /usr/include/fmt/format.h:1158
| (inlined by) operator() at ./sstables/compaction_manager.cc:691
| (inlined by) _M_invoke at /usr/include/c++/9/bits/std_function.h:286
| std::function<std::vector<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > > (table const&)>::operator()(table const&) const at /usr/include/fmt/format.h:1158
| (inlined by) compaction_manager::rewrite_sstables(table*, sstables::compaction_options, std::function<std::vector<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > > (table const&)>) at ./sstables/compaction_manager.cc:604
| compaction_manager::perform_cleanup(table*) at /usr/include/fmt/format.h:1158
To fix, we futurize the function that gets local ranges and sstables.
In addition, this patch removes the dependency on the global storage_service object.
Fixes #6662
The result of get_snapshot_details() is saved in do_with, then captured
in the json callback by reference; then the do_with future returns, so
by the time the callback is called the map has already been freed and
is empty.
Fix by capturing the result directly in the callback.
Fixes recently merged b6086526.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This includes
- rename namespace in snapshot-ctl.[cc|hh]
- move methods from storage_service to snapshot_ctl
- move snapshot_details struct
- temporarily make storage_service._snapshot_lock and ._snapshot_ops public
- replace two get_local_storage_service() occurrences with this._db
The latter is not 100% clear as the code that does this references "this"
from another shard, but the _db in question is the distributed object, so
they are all the same on all instances.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
A placeholder for snapshotting code that will be moved into it
from the storage_service.
Also -- pass it through the API for future use.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Now, with the seastar httpd routes unset() at hand, we
can shut down individual API endpoints. Do this for the
snapshot calls; this will make stopping the snapshot
controller safe.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The lambda calls the core snapshot method deep inside the
json marshalling callback. This will cause problems with
stopping the snapshot controller in the next patches.
To prepare for this, call .get_snapshot_details()
first, then keep the result in a do_with() context. This
change doesn't affect the issue the lambda in question
solves, as the whole result set is kept in memory anyway
while being streamed out.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
We are about to use the auto compaction property during the
populate/reshard process. If the user toggles it, the database can be
left in a bad state.
There should be no reason why a user would want to set that up this
early, so we disallow it.
To do that properly, it is better if the check of whether or not
the storage service is ready to accommodate this request is local
to the storage service itself. We therefore move the set_tables_autocompaction
logic from the api to the storage service. The API layer now merely
translates the table names and passes them along.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Merged pull request https://github.com/scylladb/scylla/pull/6551
from Juliusz Stasiewicz:
The command regenerates streams when:
- generations corresponding to a gossiped timestamp cannot be
fetched from the system_distributed table,
- or when generation token ranges do not align with token metadata.
In such a case the streams are regenerated and a new timestamp is
gossiped around. The returned JSON is always empty, regardless of
whether streams needed regeneration or not.
Fixes #6498
Accompanied by: scylladb/scylla-jmx#109, scylladb/scylla-tools-java#172
Remove the on-storage_service instance and make everybody use
the standalone one.
Stopping the thrift server is done by registering the controller in
client service shutdown hooks. This automatically wires the
stopping into drain, decommission and isolation codes.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The goal is to make the relevant endpoints work on standalone
thrift controller instead of the storage_service's one, so
prepare this controller (dummy for now) and pass it all the
way down the API code.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Remove the on-storage_service instance and make everybody use
the standalone one.
Stopping the server is done by registering the controller in
client service shutdown hooks. This automatically wires the
stopping into drain, decommission and isolation codes.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The goal is to make the relevant endpoints work on standalone
cql controller instead of the storage_service's one, so
prepare this controller (dummy for now) and pass it all the
way down the API code.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently the API endpoints to start and stop cql_server and thrift
are registered right after the storage service is started, but
much earlier than those services themselves are. In between these two
points a lot of other stuff gets initialized. This opens a small
window during which cql_server and thrift can be started by
hand too early.
The most obvious problem: storage_service::join_cluster()
may not yet have been called, so the auth service is not started, but
starting cql/thrift needs auth.
Another problem is that those endpoints are not unregistered on stop,
creating another way to start cql/thrift at the wrong time.
Also, the endpoint registration change helps further patching.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The command regenerates streams when:
- generations corresponding to a gossiped timestamp cannot be
fetched from `system_distributed` table,
- or when generation token ranges do not align with token metadata.
In such a case the streams are regenerated and a new timestamp is
gossiped around. The returned JSON is always empty, regardless of
whether streams needed regeneration or not.
"
This series changes the describe_ring API to use an HTTP stream instead of serializing the results and sending them as a single buffer.
While testing the change I hit a 4-year-old issue inside service/storage_proxy.cc that causes a use-after-free, so I fixed it along the way.
Fixes #6297
"
* amnonh-stream_describe_ring:
api/storage_service.cc: stream result of token_range
storage_service: get_range_to_address_map prevent use after free
The get token range API response can become big, which can cause large
allocations and stalls.
This patch replaces the implementation so it streams the results
using the http stream capabilities instead of serializing and sending
one big buffer.
Fixes #6297
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
This series adds support for taking a snapshot of multiple tables.
Fixes #6333
* amnonh-snapshot_keyspace_table:
api/storage_service.cc: Snapshot, support multiple tables
service/storage_service: Take snapshot of multiple tables
The patch implements:
- /storage_service/auto_compaction API endpoint
- /column_family/autocompaction/{name} API endpoint
These APIs allow controlling and querying the status of background
compaction jobs for the existing tables.
The implementation introduces table::_compaction_disabled_by_user.
The CompactionManager then checks it to decide whether it can push
the background compaction job for the corresponding table.
New members
===
table::enable_auto_compaction();
table::disable_auto_compaction();
bool table::is_auto_compaction_disabled_by_user() const
Test
===
Tests: unit(sstable_datafile_test autocompaction_control_test), manual
$ ninja build/dev/test/boost/sstable_datafile_test
$ ./build/dev/test/boost/sstable_datafile_test --run_test=autocompaction_control_test -- -c1 -m2G --overprovisioned --unsafe-bypass-fsync 1 --blocked-reactor-notify-ms 2000000
The test tries to submit a compaction job after playing
with the autocompaction control table switch. However, there is
no reliable way to hook a pending compaction task. The code
assumed that the with_scheduling_group() closure would never
preempt execution of the stats check.
Revert
===
Reverts commit c8247ac. In the previous version the execution
sometimes resulted in the following error:
test/boost/sstable_datafile_test.cc(1076): fatal error: in "autocompaction_control_test":
critical check cm->get_stats().pending_tasks == 1 || cm->get_stats().active_tasks == 1 has failed
This version adds a few sstables to the cf, starts
the compaction and waits until it is finished.
API change
===
- `/column_family/autocompaction/` always returned `true` when answering the question "is the autocompaction disabled?" (see https://github.com/scylladb/scylla-jmx/blob/master/src/main/java/org/apache/cassandra/db/ColumnFamilyStore.java#L321). Now it answers the question "is the autocompaction for the specific table enabled?", i.e. the logic of the question is inverted. A patch to the JMX is required. However, the change is acceptable because all the old values were invalid (it always reported that all compactions are disabled).
- `/column_family/autocompaction/` got support for POST/DELETE per table
Fixes
===
Fixes #1488
Fixes #1808
Fixes #440
Signed-off-by: Ivan Prisyazhnyy <ivan@scylladb.com>
Reviewed-by: Glauber Costa <glauber@scylladb.com>
If no keyspace is specified when taking a snapshot, there will be a segfault
because keynames is unconditionally dereferenced. Let's return an error
instead, because a keyspace must be specified when column families are specified.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200427195634.99940-1-raphaelsc@scylladb.com>
This reverts commit 1c444b7e1e. The test
it adds sometimes fails as follows:
test/boost/sstable_datafile_test.cc(1076): fatal error: in "autocompaction_control_test":
critical check cm->get_stats().pending_tasks == 1 || cm->get_stats().active_tasks == 1 has failed
Ivan is working on a fix, but let's revert this commit to avoid the
next promotion failing from time to time.
This patch adds an API endpoint /column_family/autocompaction/{name}
that listens to GET and POST requests to query and control table
background compactions.
To implement that, the patch introduces the "_compaction_disabled_by_user"
flag, which controls whether the CompactionManager is allowed to push
background compaction jobs.
It introduces
table::enable_auto_compaction();
table::disable_auto_compaction();
bool table::is_auto_compaction_disabled_by_user() const
to control auto compaction state.
Fixes #1488
Fixes #1808
Fixes #440
Tests: unit(sstable_datafile_test autocompaction_control_test), manual
The global get_highest_supported_format helper and its declaration
are scattered all over the code, so clean this up and prepare the
ground for moving _sstables_format from the storage_service onto
the sstables_manager (not this set).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are cases when it is useful to delete a specific table from a
snapshot.
An example is when a snapshot is used for backup. A backup can take a long
period of time; during that time, each of the tables can be deleted once
it has been backed up, without waiting for the entire backup process to
complete.
This patch adds such an option to the database and to the storage_service
wrapping method that calls it.
If a table is specified, a filter function is created that filters only
the column family with that given name.
This is similar to the filtering at the keyspace level.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
" from Botond
Nodetool scrub rewrites all sstables, validating their data. If corrupt
data is found the scrub is aborted. If the skip-corrupted flag is set,
corrupt data is instead logged (just the keys) and skipped.
The scrubbing algorithm itself is fairly simple, especially since we
already have a mutation stream validator that we can use to validate the
data. However, currently scrub is piggy-backed on top of cleanup
compaction. To implement this flag, we have to make scrub a separate
compaction type and propagate the flag down. This required some
massaging of the code:
* Add support for more than two (cleanup or not) compaction types.
* Allow passing custom options for each compaction type.
* Allow stopping a compaction without the manager retrying it later.
Additionally the validator itself needed some changes to allow different
ways to handle errors, as needed by the scrub.
Fixes: #5487
* https://github.com/denesb/nodetool-scrub-skip-corrupted/v7:
table: cleanup_sstables(): only short-circuit on actual cleanup
compaction: compaction_type: add Upgrade
compaction: introduce compaction_options
compaction: compaction_descriptor: use compaction options instead of
cleanup flag
compaction_manager: collect all cleanup related logic in
perform_cleanup()
sstables: compaction_stop_exception: add retry flag
mutation_fragment_stream_validator: split into low-level and
high-level API
compaction: introduce scrub_compaction
compaction_manager: scrub: don't piggy-back on upgrade_sstables()
test: sstable_datafile_test: add scrub unit test
Now that we have the necessary infrastructure to do actual scrubbing,
don't rely on `upgrade_sstables()` behind the scenes anymore; instead,
do an actual scrub.
Also, use the skip-corrupted flag.
The list_snapshot API uses an http stream to stream the result to the
caller.
It needs to keep all objects and the stream alive until the stream is closed.
This patch adds do_with to hold these objects during the lifetime of the
function.
Fixes #5752
get_snapshot should use an http stream to reduce memory allocation and
stalls.
This patch changes the implementation so it streams each of the
snapshot objects instead of creating a single response and returning it.
Fixes #5468
Depends on scylladb/seastar#723
In storage_service's snapshot code there are checks for
_operation_mode being _not_ JOINING to proceed. The intention
is apparently to allow snapshots only after the cluster
join. However, here's how the start-up code looks:
- _operation_mode = STARTING in storage_service's constructor
- snapshot API registered in api::set_server_storage_service
- _operation_mode = JOINING in storage_service::join_token_ring
So in between steps 2 and 3 snapshots can be taken.
Although there's a quick and simple fix for that (check that the
_operation_mode is not STARTING either) I think it's better
to register the snapshot API later instead. This will help
greatly to de-bloat the storage_service, in particular to
encapsulate the _operation_mode properly.
Note, though the check for _operation_mode is made only for
taking a snapshot, I move all snapshot ops registration to the
later phase.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This is preparation for the next patch -- the lambda in
question (and the used type) will be needed in two
functions, so make the lambda a "real" function.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There's a lonely get_load_map() call on storage_service that
needs only the load broadcaster, always runs on shard 0, and that's it.
The next patch will move this whole thing into its own non-sharded
helper container, and this is preparation for that.
The option in question apparently does not work: several sharded objects
are start()-ed (and thus instantiated) in join_token_ring, while instances
of these objects are used during the init of other stuff.
This leads to a broken seastar local_is_initialized assertion on sys_dist_ks,
but reading the code shows more examples, e.g. the auth_service is started
on join, but is used for thrift and cql server initialization.
The suggestion is to remove the option instead of fixing it. The is_joined
logic is kept, since on-start joining can still take some time and it's safer
to report the real status from the API.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20191203140717.14521-1-xemul@scylladb.com>
From Shlomi:
4 node cluster Node A, B, C, D (Node A: seed)
cassandra-stress write n=10000000 -pop seq=1..10000000 -node <seed-node>
cassandra-stress read duration=10h -pop seq=1..10000000 -node <seed-node>
while read is progressing
Node D: nodetool decommission
Node A: nodetool status node - wait for UL
Node A: nodetool cleanup (while decommission progresses)
I get the error on c-s once decommission ends
java.io.IOException: Operation x0 on key(s) [383633374d31504b5030]: Data returned was not validated
The problem is that when a node gets new ranges (e.g. the bootstrapping node,
or the existing nodes after a node is removed or decommissioned), nodetool
cleanup will remove data within the new ranges which the node has just
received from other nodes.
To fix, we should reject nodetool cleanup when there are pending ranges on that node.
Note, rejecting nodetool cleanup is not full protection, because new ranges
can be assigned to the node while cleanup is still in progress. However, it is
a good start to reject until we have a full protection solution.
Refs: #5045
Assembles information and attributes of sstables in one or more
column families.
v2:
* Use (not really legal) nested "type" in json
* Rename "table" param to "cf" for consistency
* Some comments on data sizes
* Stream result to avoid huge string allocations on final json
ignore_ready_future in load_new_ss_tables broke
migration_test:TestMigration_with_*.migrate_sstable_with_counter_test_expect_fail dtests.
The java.io.NotSerializableException in nodetool was caused by exceptions that
were too long.
This fix prints the problematic file names into the node's system log
and includes the cause in the resulting exception, so as to provide the user
with information about the nature of the error.
Fixes #4375
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190331154006.12808-1-bhalevy@scylladb.com>
Fixes #4245
Implemented as a compaction barrier (forcing previous compactions to
finish) plus a parameterized "cleanup", with the sstable list based on
parameters.
* seastar d59fcef...b924495 (2):
> build: Fix protobuf generation rules
> Merge "Restructure files" from Jesse
Includes fixup patch from Jesse:
"
Update Seastar `#include`s to reflect restructure
All Seastar header files are now prefixed with "seastar" and the
configure script reflects the new locations of files.
Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <5d22d964a7735696fb6bb7606ed88f35dde31413.1542731639.git.jhaberku@scylladb.com>
"