We want to start tracking the memory consumption of mutation fragments.
For this we need schema and permit during construction, and on each
modification, so the memory consumption can be recalculated and passed to
the permit.
In this patch we just add the new parameters and go through the insane
churn of updating all call sites. They will be used in the next patch.
The permit is not used yet; this patch does all the churn of propagating
a permit to each impl.
In the next patch we will use it to track the memory
consumption of `_buffer`.
The verb is sent by repair code, so it should be registered
in the same place, not in main. Also -- the verb should be
unregistered on stop.
The global messaging service instance is made similarly to the
row-level one, as there's no ready to use repair service.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The goal is to make it possible to reg/unreg not only row-level
verbs. While at it -- equip the init call with sharded<database>&
argument, it will be needed by the next patch.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
"
This series follows the suggestion from https://github.com/scylladb/scylla/pull/7203#issuecomment-689499773 discussion and deprecates a number of cluster features. The deprecation does not remove any features from the strings sent via gossip to other nodes, but it removes all checks for these features from code, assuming that the checks are always true. This assumption is quite safe for features introduced over 2 years ago, because the official upgrade path only allows upgrading from a previous official release, and these feature bits were introduced many release cycles ago.
All deprecated features were picked from a `git blame` output which indicated that they date from 2018 or earlier:
```git
e46537b7d3 2016-05-31 11:44:17 +0200 RANGE_TOMBSTONES_FEATURE = "RANGE_TOMBSTONES";
85c092c56c 2016-07-11 10:59:40 +0100 LARGE_PARTITIONS_FEATURE = "LARGE_PARTITIONS";
02bc0d2ab3 2016-12-09 22:09:30 +0100 MATERIALIZED_VIEWS_FEATURE = "MATERIALIZED_VIEWS";
67ca6959bd 2017-01-30 19:50:13 +0000 COUNTERS_FEATURE = "COUNTERS";
815c91a1b8 2017-04-12 10:14:38 +0300 INDEXES_FEATURE = "INDEXES";
d2a2a6d471 2017-08-03 10:53:22 +0300 DIGEST_MULTIPARTITION_READ_FEATURE = "DIGEST_MULTIPARTITION_READ";
ecd2bf128b 2017-09-01 09:55:02 +0100 CORRECT_COUNTER_ORDER_FEATURE = "CORRECT_COUNTER_ORDER";
713d75fd51 2017-09-14 19:15:41 +0200 SCHEMA_TABLES_V3 = "SCHEMA_TABLES_V3";
2f513514cc 2017-11-29 11:57:09 +0000 CORRECT_NON_COMPOUND_RANGE_TOMBSTONES = "CORRECT_NON_COMPOUND_RANGE_TOMBSTONES";
0be3bd383b 2017-12-04 13:55:36 +0200 WRITE_FAILURE_REPLY_FEATURE = "WRITE_FAILURE_REPLY";
0bab3e59c2 2017-11-30 00:16:34 +0000 XXHASH_FEATURE = "XXHASH";
fbc97626c4 2018-01-14 21:28:58 -0500 ROLES_FEATURE = "ROLES";
802be72ca6 2018-03-18 06:25:52 +0100 LA_SSTABLE_FEATURE = "LA_SSTABLE_FORMAT";
71e22fe981 2018-05-25 10:37:54 +0800 STREAM_WITH_RPC_STREAM = "STREAM_WITH_RPC_STREAM";
```
Tests: unit(dev)
manual(verifying with cqlsh that the feature strings are indeed still set)
"
Closes #7234.
* psarna-clean_up_features:
gms: add comments for deprecated features
gms: remove unused feature bits
streaming: drop checks for RPC stream support
roles: drop checks for roles schema support
service: drop checks for xxhash support
service: drop checks for write failure reply support
sstables: drop checks for non-compound range tombstones support
service: drop checks for v3 schema support
repair: drop checks for large partitions support
service: drop checks for digest multipartition read support
sstables: drop checks for correct counter order support
cql3: drop checks for materialized views support
cql3: drop checks for counters support
cql3: drop checks for indexing support
from Asias.
This series follows "repair: Add progress metrics for node ops #6842"
and adds the metrics for the remaining node operations,
i.e., replace, decommission and removenode.
Fixes #1244, #6733
* asias-repair_progress_metrics_replace_decomm_removenode:
repair: Add progress metrics for removenode ops
repair: Add progress metrics for decommission ops
repair: Add progress metrics for replace ops
Change 94995acedb added yielding to abstract_replication_strategy::do_get_ranges.
And 07e253542d used get_ranges_in_thread in compaction_manager.
However, there is nothing to prevent token_metadata, and in particular its
`_sorted_tokens` from changing while iterating over them in do_get_ranges if the latter yields.
Therefore copy the replication strategy `_token_metadata` in `get_ranges_in_thread(inet_address ep)`.
If the caller provides `token_metadata` to get_ranges_in_thread, then the caller
must make sure that we can safely yield while accessing token_metadata (like
in `do_rebuild_replace_with_repair`).
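The idea of iterating over a private copy can be sketched as follows. This is a minimal illustration, not the actual Scylla code: `token_ring`, `collect_tokens` and the `yield_point` callback are hypothetical stand-ins, and a plain callback plays the role of a point where the fiber may yield and other code may modify the shared state.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for the shared token metadata.
struct token_ring {
    std::vector<long> sorted_tokens;
};

// Iterate over a snapshot taken up front, so that changes made to `ring`
// at a yield point cannot invalidate the iteration.
std::vector<long> collect_tokens(const token_ring& ring,
                                 const std::function<void()>& yield_point) {
    const std::vector<long> snapshot = ring.sorted_tokens; // private copy
    std::vector<long> seen;
    for (long token : snapshot) {
        seen.push_back(token);
        yield_point(); // the shared ring may change here
    }
    return seen;
}
```

Even if the callback clears or replaces `ring.sorted_tokens`, the loop still walks the tokens that were present when the snapshot was taken. The price is one copy of the token list per call.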
Fixes #7044
Test: unit(dev)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200915074555.431088-1-bhalevy@scylladb.com>
Large partitions have been supported for over 2 years and upgrades are only
allowed from versions which already have the support, so the checks
are hereby dropped.
The method extracts an element from the list, constructs
a desired object from it and frees the element. This is a common usage
of range_tombstone_list. Having a helper helps encapsulate
the exact collection inside the class.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The next patch will change the way range tombstones are
fed into the hasher. To make sure the code flow doesn't
become exception-unsafe, mark the relevant methods as
non-throwing.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The following metric is added:
scylla_node_maintenance_operations_removenode_finished_percentage{shard="0",type="gauge"} 0.650000
It is the finished percentage of the removenode operation so
far.
Fixes #1244, #6733
The following metric is added:
scylla_node_maintenance_operations_decommission_finished_percentage{shard="0",type="gauge"} 0.650000
It is the finished percentage of the decommission operation so
far.
Fixes #1244, #6733
The following metric is added:
scylla_node_maintenance_operations_replace_finished_percentage{shard="0",type="gauge"} 0.650000
It is the finished percentage of the replace operation so far.
Fixes #1244, #6733
"
This series adds the scylla repairs command to help debug repair.
Fixes #7103
"
* asias-repair_help_debug_scylla_repairs_cmd:
scylla-gdb.py: Add scylla repairs command
repair: Add repair_state to track repair states
scylla-gdb.py: Print the pointers of elements in boost_intrusive_list_printer
scylla-gdb.py: Add printer for gms::inet_address
scylla-gdb.py: Fix a typo in boost_intrusive_list
repair: Fix the incorrect comments for _all_nodes
repair: Add row_level_repair object pointer in repair_meta
repair: Add counter for reads issued and finished for repair_reader
We copy a list, which was reported to generate a 15ms stall.
This is easily fixed by moving it instead, which is safe since this is
the last use of the variable.
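A minimal sketch of the difference, using a hypothetical `row_batch` holder rather than the actual code touched by this patch: copying a `std::list` allocates and copies a node per element, while moving it only takes over the existing nodes.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <utility>

// Hypothetical sink type; stands in for whatever consumes the list.
struct row_batch {
    std::list<std::string> rows;
};

// Copying: walks the whole list and allocates a node per element (O(n)).
row_batch make_batch_by_copy(const std::list<std::string>& rows) {
    return row_batch{rows};
}

// Moving: takes over the list's nodes in O(1). Only safe when the caller
// does not rely on the contents of `rows` afterwards.
row_batch make_batch_by_move(std::list<std::string>&& rows) {
    return row_batch{std::move(rows)};
}
```

With a large list, the copying variant does work proportional to the number of elements in one go, which is what shows up as a stall; the moving variant does not touch the elements at all.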
Fixes #7115.
Use repair_state to track the major state of repair from the beginning
to the end of repair.
With this patch, we can easily know which state both the repair
master and followers are in. It is very helpful when debugging a repair
hang issue.
Refs #7103
It is useful to distinguish if the repair is a regular repair or used
for node operations.
In addition, log the keyspace and tables being repaired.
Fixes #7086
Storage service and repair code have identical helpers to get local
ranges for keyspace. Move this helper's code onto database, later it
will be reused by one more place.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are 4 places that call this helper:
- storage proxy. Callers are rpc verb handlers and already have the proxy
at hand, from which they can get the messaging service instance
- repair. There's a local-global messaging instance at hand, and the caller
is in a verb handler too
- streaming. The caller is a verb handler, which is unregistered on stop, so
the messaging service instance can be captured
- migration manager itself. The caller already uses "this", so the messaging
service instance can be obtained from it
The better approach would be to make get_schema_definition a method of
migration_manager, but the manager is stopped for real on shutdown, thus
referencing it from the callers might not be safe and needs revisiting. At
the same time the messaging service is always alive, so using its reference
is safe.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Now all the users of messaging service have the needed reference.
Again, the messaging service is not really stopped at the end, so its usage
is safe regardless of whether repair stuff itself leaks on stop or not.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The reference comes from repair_info and storage_service calls, both
had been already patched for that.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The row-level repair keeps its statics for needed services, same as the
streaming does. Treat the messaging service the same way to stop using
the global one in the next patches.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This function needs the messaging service inside, but the closest place where it
can get one from is the storage_service API handlers. Temporarily move the call for
global messaging service into storage service, its turn for this cleanup will
come later.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The db.invoke_on_all's lambda tries to get the sharded db reference via
the global storage service. This can be done in a much nicer way.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
C++20 introduced `contains` member functions for maps and sets for
checking whether an element is present in the collection. Previously
`count` function was often used in various ways.
`contains` not only expresses the intent of the code better but also
does it in a more unified way.
This commit replaces all the occurrences of `count` with
`contains`.
Tests: unit(dev)
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <b4ef3b4bc24f49abe04a2aba0ddd946009c9fcb2.1597314640.git.piotr@scylladb.com>
"
This series adds progress metrics for the node operations. Metrics for bootstrap and rebuild progress are added as a starter. I will add more for the remaining operations after getting feedback.
With this the Scylla Monitor and Scylla Manager can know the progress of the bootstrap and other node operations. E.g.,
scylla_node_ops_bootstrap_nr_ranges_finished{shard="0",type="derive"} 50
scylla_node_ops_bootstrap_nr_ranges_total{shard="0",type="derive"} 1040
Fixes #1244, #6733
"
* 'repair_progress_metrics_v3' of github.com:asias/scylla:
repair: Add progress metrics for repair ops
repair: Add progress metrics for rebuild ops
repair: Add progress metrics for bootstrap ops
"
It is pretty hard to find the repair_meta object when debugging a core.
This patch makes it easier by putting the repair_meta objects created by
both repair follower and master into a map.
Fixes #7009
"
* asias-repair_make_debug_eaiser_track_all_repair_metas:
repair: Add repair_meta_tracker to track repair_meta for followers and masters
repair: Move thread local object _repair_metas out of the function
It is pretty hard to find the repair_meta object when debugging a core.
This patch makes it easier by putting the repair_meta objects created by
both repair follower and master into a boost intrusive list.
Fixes #7009
"
This patch set fixes stalls in repair that are caused by std::list merge and clear operations during the test_latency_read_with_nemesis test.
Fixes #6940
Fixes #6975
Fixes #6976
"
* 'fix_repair_list_stall_merge_clear_v2' of github.com:asias/scylla:
repair: Fix stall in apply_rows_on_master_in_thread and apply_rows_on_follower
repair: Use clear_gently in get_sync_boundary to avoid stall
utils: Add clear_gently
repair: Use merge_to_gently to merge two lists
utils: Add merge_to_gently
The row_diff list in apply_rows_on_master_in_thread and
apply_rows_on_follower can be large. Modify do_apply_rows to remove the
row from the list when the row is consumed to avoid a stall when the list
is destroyed.
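The consume-and-release pattern can be sketched like this. It is only an illustration of the idea, not the actual do_apply_rows code: `consume_and_release` and its `consume` callback are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// Consume rows front to back and free each list node right after it has
// been used, instead of leaving all nodes for the list destructor.
template <typename Row, typename Consume>
std::size_t consume_and_release(std::list<Row>& rows, Consume consume) {
    std::size_t consumed = 0;
    while (!rows.empty()) {
        consume(rows.front());
        rows.pop_front(); // frees this node now, not at destruction time
        ++consumed;
    }
    return consumed;
}
```

Because the list is already empty when it goes out of scope, its destructor has nothing left to free, so the cost of releasing the rows is spread over the loop, where the real code has the chance to yield between rows.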
Fixes #6975
C++20 introduced `contains` member functions for maps and sets for
checking whether an element is present in the collection. Previously
the code pattern looked like:
    <collection>.find(<element>) != <collection>.end()
In C++20 the same can be expressed with:
    <collection>.contains(<element>)
This is not only more concise but also expresses the intent of the code
more clearly.
This commit replaces all the occurrences of the old pattern with the new
approach.
Tests: unit(dev)
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <f001bbc356224f0c38f06ee2a90fb60a6e8e1980.1597132302.git.piotr@scylladb.com>
The following metric is added:
scylla_node_maintenance_operations_repair_finished_percentage{shard="0",type="gauge"} 0.650000
It is the finished percentage of all ongoing repair operations.
When all ongoing repair operations finish, the percentage stays at 100%.
Fixes #1244, #6733
The following metric is added:
scylla_node_maintenance_operations_rebuild_finished_percentage{shard="0",type="gauge"} 0.650000
It is the finished percentage of the rebuild operation so far.
Fixes #1244, #6733
The following metric is added:
scylla_node_maintenance_operations_bootstrap_finished_percentage{shard="0",type="gauge"} 0.850000
It is the finished percentage of the bootstrap operation so far.
Fixes #1244, #6733
We saw scylla hit use after free in repair with the following procedure during tests:
- n1 and n2 in the cluster
- n2 ran decommission
- n2 sent data to n1 using repair
- n2 was forcibly killed
- n1 tried to remove repair_meta for n1
- n1 hit use after free on repair_meta object
This was what happened on n1:
1) data was received -> do_apply_rows was called -> yield before create_writer() was called
2) repair_meta::stop() was called -> wait_for_writer_done() / do_wait_for_writer_done was called
with _writer_done[node_idx] not engaged
3) step 1 resumed, create_writer() was called and _repair_writer object was referenced
4) repair_meta::stop() finished, repair_meta object and its member _repair_writer was destroyed
5) The fiber created by create_writer() at step 3 hit use after free on _repair_writer object
To fix, we should call wait_for_writer_done() after all pending
operations protected by repair_meta::_gate are done. This
prevents wait_for_writer_done() from finishing while the writer is
still in the process of being created.
Fixes: #6853
Fixes: #6868
Backports: 4.0, 4.1, 4.2
In some cases the estimated number of partitions can be 0, which, albeit a
legit estimation result, breaks much of the low-level sstable writer code, so
some of it has assertions to ensure estimated partitions is > 0.
To avoid hitting this assert all users of the sstable writers do the
clamping, to ensure estimated partitions is at least 1. However leaving
this to the callers is error prone, as #6913 has shown. As this
clamping is standard practice, it is better to do it in the writers
themselves, avoiding this problem altogether. This is exactly what this
patch does. It also adds two unit tests, one that reproduces the crash
in #6913, and another one that ensures all sstable writers are fine with
estimated partitions being 0 now. Call sites previously doing the
clamping are changed to not do it, it is unnecessary now as the writer
does it itself.
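Clamping inside the writer can be sketched as below. `sketch_writer` is a hypothetical class used only to show where the clamp lives; it does not mirror the real sstable writer interfaces.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical writer; only the handling of the estimate is illustrated.
class sketch_writer {
    std::uint64_t _estimated_partitions;
public:
    // Clamp once, at construction, so no caller has to remember to do it.
    explicit sketch_writer(std::uint64_t estimated_partitions)
        : _estimated_partitions(std::max<std::uint64_t>(estimated_partitions, 1)) {
        // Invariant the rest of the writer may rely on.
        assert(_estimated_partitions > 0);
    }

    std::uint64_t estimated_partitions() const { return _estimated_partitions; }
};
```

A caller can now pass an estimate of 0 and the writer will treat it as 1, while any larger estimate is passed through unchanged.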
Fixes #6913
Tests: unit(dev)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200724120227.267184-1-bdenes@scylladb.com>
5 services register handlers in messaging, but not all of them
have clear unregistration methods.
Summary:
migration_manager: everything is in place, no changes
gossiper: ditto
proxy: unregistration of some verbs is missing
repair: no unregistration at all
streaming: ditto
This patch adds the needed unregistration methods.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
We recently saw a weird log message:
WARN 2020-07-19 10:22:46,678 [shard 0] repair - repair id [id=4,
uuid=0b1092a1-061f-4691-b0ac-547b281ef09d] failed: std::runtime_error
({shard 0: fmt::v6::format_error (invalid type specifier), shard 1:
fmt::v6::format_error (invalid type specifier)})
It turned out we have:
throw std::runtime_error(format("repair id {:d} on shard {:d} failed to
repair {:d} sub ranges", id, shard, nr_failed_ranges));
in the code, but we changed the id from integer to repair_uniq_id class.
We do not really need to specify the format specifiers for numbers.
Fixes #6874
They can all live with a forward declaration of the f._m._r. plus a
seastar header in commitlog code.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The schema_tables.hh -> migration_manager.hh couple seems to work as a
"single header for everything", creating big bloat for many seemingly
unrelated .hh's.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently, repair uses an integer to identify a repair job. The repair
id starts from 1 after each node restart. As a result, different repair jobs
will have the same id across restarts.
To make the id unique across restarts, we can use a uuid in
addition to the integer id. We cannot drop the use of the integer id
completely since the http api and nodetool use it.
Fixes #6786
Consider a cluster with two nodes:
- n1 (dc1)
- n2 (dc2)
A third node is bootstrapped:
- n3 (dc2)
n3 fails to bootstrap as follows:
[shard 0] init - Startup failed: std::runtime_error
(bootstrap_with_repair: keyspace=system_distributed,
range=(9183073555191895134, 9196226903124807343], no existing node in
local dc)
The system_distributed keyspace is using SimpleStrategy with RF 3. For
a keyspace that does not use NetworkTopologyStrategy, we should not
require the source node to be in the same DC.
Fixes: #6744
Backports: 4.0, 4.1, 4.2