520 Commits

Author SHA1 Message Date
Botond Dénes
6ca0464af5 mutation_fragment: add schema and permit
We want to start tracking the memory consumption of mutation fragments.
For this we need schema and permit during construction, and on each
modification, so the memory consumption can be recalculated and passed to
the permit.

In this patch we just add the new parameters and go through the insane
churn of updating all call sites. They will be used in the next patch.
2020-09-28 11:27:23 +03:00
Botond Dénes
3fab83b3a1 flat_mutation_reader: impl: add reader_permit parameter
Not used yet, this patch does all the churn of propagating a permit
to each impl.

In the next patch we will use it to track the memory
consumption of `_buffer`.
2020-09-28 10:53:48 +03:00
Piotr Sarna
f7a7931377 streaming: drop checks for RPC stream support
Streaming with RPC stream has been supported for over 2 years and upgrades
are only allowed from versions which already have the support,
so the checks are hereby dropped.
2020-09-14 12:18:13 +02:00
Pavel Emelyanov
812eed27fe code: Force formatting of pointer in .debug and .trace
... and tests. Printing a pointer in logs is considered to be a bad practice,
so the proposal is to keep this explicit (with fmt::ptr) and allow it for
.debug and .trace cases.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-08-26 20:44:11 +03:00
Pavel Emelyanov
78f2193956 streaming: Do not reveal raw pointer in info message
Showing raw pointer values in logs is not considered to be good
practice. However, for debugging/tracing this might be helpful.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-08-26 20:44:11 +03:00
Asias He
e86881be99 repair: Print repair reason in repair stats log
It is useful to distinguish if the repair is a regular repair or used
for node operations.

In addition, log the keyspace and tables that are repaired.

Fixes #7086
2020-08-25 11:05:47 +03:00
Pavel Emelyanov
24eaf827c0 migration_manager: Add messaging service as argument to get_schema_definition
There are 4 places that call this helper:

- storage proxy. Callers are rpc verb handlers and already have the proxy
  at hand, from which they can get the messaging service instance
- repair. There's a local-global messaging instance at hand, and the caller
  is in a verb handler too
- streaming. The caller is a verb handler, which is unregistered on stop, so
  the messaging service instance can be captured
- migration manager itself. The caller already uses "this", so the messaging
  service instance can be obtained from it

The better approach would be to make get_schema_definition a method of
migration_manager, but the manager is stopped for real on shutdown, thus
referencing it from the callers might not be safe and needs revisiting. At
the same time the messaging service is always alive, so using its reference
is safe.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-08-19 20:50:53 +03:00
Pavel Emelyanov
d2c475f27c streaming: Keep messaging service on send_info
And use it in send_mutation_fragments.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-08-19 20:50:52 +03:00
Pavel Emelyanov
a6888e3ce3 streaming: Keep reference on messaging
Streaming uses messaging, init it with its own reference.

Nowadays the whole streaming subsystem uses global static references on the
needed services.  This is not nice, but still better than just using code-wide
globals, so treat the messaging service here the same way.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-08-19 20:50:52 +03:00
Pavel Emelyanov
163d615dc3 streaming: Use local ms() on ::start
This is just a cleanup to avoid explicit global call.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-08-19 20:50:52 +03:00
Piotr Sarna
bd2d48e99c streaming: make stream_plan::abort noexcept
Aborting a stream plan is used in deinitialization code
run in a noexcept environment, so it should be noexcept itself.
Tested on a not-merged-yet Seastar patch with hardened noexcept
checks for abort_source.

Message-Id: <6eada033bb394d725b83a7e0f92381cb792ef6a1.1596446857.git.sarna@scylladb.com>
2020-08-03 14:00:19 +03:00
Botond Dénes
fe127a2155 sstables: clamp estimated_partitions to [1, +inf) in writers
In some cases the estimated number of partitions can be 0, which, albeit a
legitimate estimation result, breaks much low-level sstable writer code, so
some of these have assertions to ensure estimated partitions is > 0.
To avoid hitting this assert all users of the sstable writers do the
clamping, to ensure estimated partitions is at least 1. However leaving
this to the callers is error prone, as #6913 has shown. As this
clamping is standard practice, it is better to do it in the writers
themselves, avoiding this problem altogether. This is exactly what this
patch does. It also adds two unit tests, one that reproduces the crash
in #6913, and another one that ensures all sstable writers are fine with
estimated partitions being 0 now. Call sites previously doing the
clamping are changed to not do it, it is unnecessary now as the writer
does it itself.
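
The clamping described above can be sketched as follows. This is a minimal illustration, not the code from the patch: `clamp_estimated_partitions` and `sketch_writer` are hypothetical names, and the real sstable writers take many more parameters.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper (name and signature are illustrative, not Scylla's).
// Clamps an estimate into [1, +inf) so that a legitimate estimate of 0
// never reaches code that assumes at least one partition.
inline std::uint64_t clamp_estimated_partitions(std::uint64_t estimated) {
    return std::max<std::uint64_t>(estimated, 1);
}

// Hypothetical writer that clamps in its constructor, so that callers
// no longer have to remember to do it themselves.
class sketch_writer {
    std::uint64_t _estimated_partitions;
public:
    explicit sketch_writer(std::uint64_t estimated)
        : _estimated_partitions(clamp_estimated_partitions(estimated)) {}

    std::uint64_t estimated_partitions() const { return _estimated_partitions; }
};
```

Doing the clamp inside the constructor means a caller passing 0 gets a writer that behaves as if 1 had been passed, which is the property the new unit tests are said to check.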

Fixes #6913

Tests: unit(dev)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200724120227.267184-1-bdenes@scylladb.com>
2020-07-27 09:19:37 +02:00
Pavel Emelyanov
5060063cd6 messaging: Add missing per-service unregistering methods
5 services register handlers in messaging, but not all of them
have clear unregistration methods.

Summary:

migration_manager: everything is in place, no changes
gossiper: ditto
proxy: some verbs unregistration is missing
repair: no unregistration at all
streaming: ditto

This patch adds the needed unregistration methods.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-07-22 16:34:00 +03:00
Pavel Emelyanov
08e36ca77c streaming: Do not use db->invoke_on_all in vain
The db instance is not needed to initialize messages, so use plain smp::invoke_on_all

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-07-22 16:31:57 +03:00
Rafael Ávila de Espíndola
af44684418 messaging_service: Don't return variadic futures from make_sink_and_source_for_*
2020-06-29 16:50:45 -07:00
Avi Kivity
e5be3352cf database, streaming, messaging: drop streaming memtables
Before Scylla 3.0, we used to send streaming mutations using
individual RPC requests and flush them together using dedicated
streaming memtables. This mechanism is no longer in use and all
versions that use it have long reached end-of-life.

Remove this code.
2020-06-25 15:25:54 +02:00
Rafael Ávila de Espíndola
64c8164e6c everywhere: Update to seastar api v4 (when_all_succeed returning a tuple)
We now just need to replace a few calls to then with then_unpack.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200618172100.111147-1-espindola@scylladb.com>
2020-06-23 19:40:18 +03:00
Avi Kivity
de38091827 priority_manager: merge streaming_read and streaming_write classes into one class
Streaming is handled by just one group for CPU scheduling, so
separating it into read and write classes for I/O is artificial, and
inflates the resources we allow for streaming if both reads and writes
happen at the same time.

Merge both classes into one class ("streaming") and adjust callers. The
merged class has 200 shares, so it reduces streaming bandwidth if both
directions are active at the same time (which is rare; I think it only
happens in view building).
2020-06-22 15:09:04 +03:00
Avi Kivity
9afd599d7c Merge 'range_streamer: Handle table of RF 1 in get_range_fetch_map' from Asias
"
After "Make replacing node take writes" series, with repair based node
operations disabled, we saw the replace operation fail like:

```
[shard 0] init - Startup failed: std::runtime_error (unable to find
sufficient sources for streaming range (9203926935651910749, +inf) in
keyspace system_auth)
```
The reason is the system_auth keyspace has default RF of 1. It is
impossible to find a source node to stream from for the ranges owned by
the replaced node.

In the past, the replace operation with a keyspace of RF 1 passed, because
the replacing node calls token_metadata.update_normal_tokens(tokens,
ip_of_replacing_node) before streaming. We saw:

```
[shard 0] range_streamer - Bootstrap : keyspace system_auth range
(-9021954492552185543, -9016289150131785593] exists on {127.0.0.6}
```

Node 127.0.0.6 is the replacing node 127.0.0.5. The source node check in
range_streamer::get_range_fetch_map will pass if the source is the node
itself. However, it will not stream from the node itself. As a result,
the system_auth keyspace will not get any data.

After the "Make replacing node take writes" series, the replacing node
calls token_metadata.update_normal_tokens(tokens, ip_of_replacing_node)
after the streaming finishes. We saw:

```
[shard 0] range_streamer - Bootstrap : keyspace system_auth range
(-9049647518073030406, -9048297455405660225] exists on {127.0.0.5}
```

Since 127.0.0.5 was dead, the source node check failed, and so did the
bootstrap operation.

To fix, we ignore the table of RF 1 when it is unable to find a source
node to stream.

Fixes #6351
"

* asias-fix_bootstrap_with_rf_one_in_range_streamer:
  range_streamer: Handle table of RF 1 in get_range_fetch_map
  streaming: Use separate streaming reason for replace operation
2020-06-10 16:03:13 +03:00
Asias He
a521c429e1 streaming: Do not send end of stream in case of error
Currently, the sender sends stream_mutation_fragments_cmd::end_of_stream to
the receiver when an error is received from a peer node. To be safe, send
stream_mutation_fragments_cmd::error instead of
stream_mutation_fragments_cmd::end_of_stream to prevent end_of_stream from
being written into the sstable when a partition is not closed yet.

In addition, use mutation_fragment_stream_validator to validate the
mutation fragments emitted from the reader, e.g., check if
partition_start and partition_end are paired when the reader is done. If
not, fail the stream session and send
stream_mutation_fragments_cmd::error instead of
stream_mutation_fragments_cmd::end_of_stream to isolate the problematic
sstables on the sender node.
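
The pairing check can be illustrated with a small state machine. This is only a sketch under stated assumptions: `fragment_kind`, `pairing_validator`, `sketch_cmd` and `final_command` are hypothetical names, and the real mutation_fragment_stream_validator interface is not shown in this log.

```cpp
#include <vector>

// Hypothetical, simplified fragment kinds (the real stream has more).
enum class fragment_kind { partition_start, row, partition_end };

// Hypothetical stand-in for the command sent at the end of a stream.
enum class sketch_cmd { end_of_stream, error };

// Tracks whether a partition is currently open and rejects fragments
// that break the partition_start / partition_end pairing.
class pairing_validator {
    bool _in_partition = false;
public:
    // Returns false if the fragment violates the pairing rule.
    bool operator()(fragment_kind k) {
        switch (k) {
        case fragment_kind::partition_start:
            if (_in_partition) { return false; }
            _in_partition = true;
            return true;
        case fragment_kind::partition_end:
            if (!_in_partition) { return false; }
            _in_partition = false;
            return true;
        case fragment_kind::row:
            return _in_partition;
        }
        return false;
    }

    // The stream may only end when no partition is left open.
    bool on_end_of_stream() const { return !_in_partition; }
};

// Decides which command the sender should emit once the reader is done.
inline sketch_cmd final_command(const std::vector<fragment_kind>& fragments) {
    pairing_validator validate;
    for (auto k : fragments) {
        if (!validate(k)) { return sketch_cmd::error; }
    }
    return validate.on_end_of_stream() ? sketch_cmd::end_of_stream : sketch_cmd::error;
}
```

A stream that stops after a row, without a closing partition_end, yields `sketch_cmd::error` here, so the receiver never sees an end_of_stream for an unclosed partition.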

Refs: #6478
2020-06-09 18:46:12 +03:00
Pavel Emelyanov
67d5fad65f storage_service: Remove some inclusions of its header
GC pass over .cc files. Some really do not need it, some need for features/gossiper

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-06-01 09:08:40 +03:00
Pavel Emelyanov
07add9767b streaming: Get local db with own helper
There's a static global instance of needed services and helpers
for it in streaming code. Using them is not great, but at
least this change unifies different pieces of streaming code and
removes the storage_service.hh from streaming_session.cc (the
streaming_session.hh doesn't include it either).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-06-01 09:08:40 +03:00
Pavel Emelyanov
428ef9c9ac streaming: Fix indentation after previous patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-06-01 09:08:40 +03:00
Pavel Emelyanov
5db04fcf30 streaming: Do not explicitly switch sched group
This is a continuation of ac998e95 -- the sched group is
switched by messaging service for a verb, no need to do
it by hand.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-06-01 09:08:40 +03:00
Asias He
fa9ee234a0 streaming: Use separate streaming reason for replace operation
Currently, replace and bootstrap share the same streaming reason,
stream_reason::bootstrap, because they share most of the code
in boot_strapper.

In order to distinguish the two, we need to introduce a new stream
reason, stream_reason::replace. It is safe to do so in a mixed cluster
because current code only checks if the stream_reason is
stream_reason::repair.

Refs: #6351
2020-05-22 09:30:52 +08:00
Piotr Jastrzebski
e72696a8e6 sharding_info: rename the class to sharder
Also rename all variables that were named si or sinfo
to sharder.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-03-30 18:42:33 +02:00
Piotr Jastrzebski
94ff653b99 selective_token_range_sharder: replace i_partitioner with sharding_info
The class does not depend on partitioning logic but only uses
sharding logic. This means it is possible and desirable to limit its
dependency to only sharding_info.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-03-30 09:36:22 +02:00
Rafael Ávila de Espíndola
c5795e8199 everywhere: Replace engine().cpu_id() with this_shard_id()
This is a bit simpler and might allow removing a few includes of
reactor.hh.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200326194656.74041-1-espindola@scylladb.com>
2020-03-27 11:40:03 +03:00
Botond Dénes
e0284bb9ee treewide: add missing headers and/or forward declarations
2020-03-23 09:29:45 +02:00
Rafael Ávila de Espíndola
c0072eab30 everywhere: Be more explicit that we don't want std::make_shared
If sstring is made an alias to std::string ADL causes std::make_shared
to be found. Explicitly ask for ::make_shared.
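
The lookup effect can be reproduced in a simplified analogue. The names below (`vendor`, `text`, `make_box`) are hypothetical stand-ins, with `vendor` playing the role of `std` and `text` the role of a string type that lives there; the real case involves function templates with explicit template arguments, which this sketch leaves out.

```cpp
namespace vendor {

struct text {};

// Lives in the same namespace as `text`, so argument-dependent lookup
// (ADL) considers it whenever an argument of type `text` is passed.
inline int make_box(const text&) { return 1; }

} // namespace vendor

// A global helper with the same name, visible by ordinary lookup.
template <typename T>
int make_box(const T&) { return 0; }

// Unqualified call: ADL adds vendor::make_box to the overload set, and the
// non-template overload is preferred over the equally good template.
inline int unqualified_call() { return make_box(vendor::text{}); }

// Qualified call: ADL is disabled, so only the global helper is considered.
inline int qualified_call() { return ::make_box(vendor::text{}); }
```

In this sketch `unqualified_call()` selects `vendor::make_box` while `qualified_call()` selects the global helper, which is why spelling out `::` pins the choice regardless of where the argument type lives.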

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-03-10 13:13:48 -07:00
Avi Kivity
906784639d Merge "Clean sstables from using global objects" from Pavel E
"
This set cleans sstable_writer_config and surrounding sstables
code from using global storage_ and feature_ service-s and database
by moving the configuration logic onto sstables_manager (that
was supposed to do it since eebc3701a5).

Most of the complexity is hidden around sstable_writer_config
creation, this set makes the sstables_manager create this object
with an explicit call. All the rest are consequences of this change.

Tests: unit(debug), manual start-stop
"

* 'br-clean-sstables-manager-2' of https://github.com/xemul/scylla:
  sstables: Move get_highest_supported_format
  sstables: Remove global get_config() helper
  sstables: Use manager's config() in .new_sstable_component_file()
  sstable_writer_config: Extend with more db::config stuff
  sstables_manager: Don't use global helper to generate writer config
  sstable_writer_config: Sanitize out some features fields initialization
  sstable_writer_config: Factor out some field initialization
  sstables: Generate writer config via manager only
  sstables: Keep reference on manager
  test: Re-use existing global sstables_manager
  table: Pass sstable_writer_config into write_memtable_to_sstable
2020-03-03 18:33:01 +02:00
Raphael S. Carvalho
40e75fb109 streaming/stream_transfer_task: avoid pointless iterations in has_relevant_range_on_this_shard()
When has_relevant_range_on_this_shard() finds a relevant range, it unnecessarily
keeps iterating to the end. Verified manually that this could be thousands of pointless
iterations when streaming data to a node just added. The relevant code could be
simplified by de-futurizing it but I think it remains so to allow the task scheduler
to preempt it if necessary.
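
The early exit can be sketched in a synchronous form. This is an illustration only: `token_range` and `has_relevant_range` are hypothetical names, and the real function stays futurized so that it can be preempted, which this sketch does not model.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified range type.
struct token_range {
    std::int64_t start;
    std::int64_t end;
};

// Returns true as soon as one range satisfies the predicate, instead of
// visiting the remaining ranges after the answer is already known.
template <typename Pred>
bool has_relevant_range(const std::vector<token_range>& ranges, Pred is_relevant) {
    for (const auto& r : ranges) {
        if (is_relevant(r)) {
            return true; // stop here: further iterations cannot change the result
        }
    }
    return false;
}
```

With thousands of ranges and a match on the first one, the predicate runs once rather than once per range.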

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200220224048.28804-2-raphaelsc@scylladb.com>
2020-02-28 07:57:12 +02:00
Raphael S. Carvalho
8a986bc23b streaming/stream_transfer_task: avoid unecessary copies of ranges
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200220224048.28804-1-raphaelsc@scylladb.com>
2020-02-28 07:57:12 +02:00
Pavel Emelyanov
5adce3390c sstables: Generate writer config via manager only
The sstable_writer_config creation looks simple (just declare
the struct instance) but behind the scenes references storage
and feature services, messes with database config, etc.

This patch teaches the sstables_manager to generate the writer
config and makes the rest of the code use it. For future
safety, creating the sstable_writer_config by hand is
prohibited.

The manager is referenced through table-s and sstable-s, but
the two existing sstables_managers live on the database object, and
table-s and sstable-s both live shorter than the database, so
this reference is safe.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-02-25 14:31:04 +03:00
Raphael S. Carvalho
56f66cff9f dht: Extract to_partition_ranges() from streaming to allow reuse
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2020-02-20 10:53:01 -03:00
Piotr Jastrzebski
9494da2102 distribute_reader_and_consume_on_shards: don't take partitioner
This function already takes schema so it can get partitioner
using schema::get_partitioner.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-02-17 10:59:15 +01:00
Piotr Jastrzebski
db19a76b1f selective_token_range_sharder: stop calling global_partitioner()
This requires a change in a repair that uses
selective_token_range_sharder.

Repair performs operation on a set of tables. We will have to
make sure that all of those tables use the same partitioner.

This is achieved by adding a check to a repair_info constructor.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-02-17 10:19:15 +01:00
Piotr Jastrzebski
dd1120454b dht: move sharders to a separate header
i_partitioner.hh is widely included while sharders are used
only in 6 places so there's no need to include them in
the whole codebase.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-02-17 10:19:02 +01:00
Pavel Emelyanov
b11cf6e950 cql3/query_processor.hh: Debloat from other headers
This gives ~30% less (251 jobs -> 181 jobs) recompile when touching it

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200212225828.3374-1-xemul@scylladb.com>
2020-02-16 11:22:30 +02:00
Pavel Emelyanov
abe588888d database: Use feature service
Keep local feature_service reference on database. This relaxes the
circular storage_service <-> database reference, but does not remove it
completely.

This needs some args tossing in apply_to_builder, but it's
rather straightforward, so comes in the same patch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-02-03 15:16:23 +03:00
Asias He
145fd0313a streaming: Fix map access in stream_manager::get_progress
When the progress is queried, e.g., by nodetool netstats,
the progress info might not be updated yet.

Fix it by checking before accessing the map to avoid errors like:

std::out_of_range (_Map_base::at)
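
The difference between the throwing and the checked lookup can be sketched as follows. `progress_info` and `find_progress` are hypothetical names, and the key type is an assumption; the actual map in stream_manager is not shown in this log.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical, simplified progress record.
struct progress_info {
    std::uint64_t bytes_sent = 0;
};

// Looks the peer up with find() and reports "no entry yet" as an empty
// optional. Calling m.at(peer) instead would throw std::out_of_range
// when the entry has not been created.
inline std::optional<progress_info> find_progress(
        const std::map<std::string, progress_info>& m, const std::string& peer) {
    auto it = m.find(peer);
    if (it == m.end()) {
        return std::nullopt;
    }
    return it->second;
}
```

A caller can then treat a missing entry as zero progress rather than as an error.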

Fixes: #5437
Tests: nodetool_additional_test.py:TestNodetool.netstats_test
2020-01-06 10:31:15 +02:00
Asias He
6b7344f6e5 streaming: Fix typo in stream_result_future::maybe_complete
s/progess/progress/

Refs: #5437
2019-12-16 11:12:03 +02:00
Pavel Solodovnikov
2f442f28af treewide: add const qualifiers throughout the code base
2019-11-26 02:24:49 +03:00
Asias He
b89ced4635 streaming: Do not open rpc stream connection if reader has no data
We can use the reader::peek() to check if the reader contains any data.
If not, do not open the rpc stream connection. It helps to reduce the
port usage.
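
The idea of peeking before connecting can be sketched like this. All names (`sketch_reader`, `sketch_connections`, `send_all`) are hypothetical, and the real reader and RPC stream are asynchronous, which this synchronous sketch does not model.

```cpp
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical reader over an in-memory list of items.
class sketch_reader {
    std::vector<int> _items;
    std::size_t _pos = 0;
public:
    explicit sketch_reader(std::vector<int> items) : _items(std::move(items)) {}

    // Returns the next item without consuming it, or nullptr if there is none.
    const int* peek() const { return _pos < _items.size() ? &_items[_pos] : nullptr; }

    // Consumes and returns the next item, or an empty optional at the end.
    std::optional<int> next() {
        if (_pos >= _items.size()) { return std::nullopt; }
        return _items[_pos++];
    }
};

// Hypothetical stand-in that only counts how many connections were opened.
struct sketch_connections {
    int opened = 0;
    void open() { ++opened; }
};

// Opens a connection only if the reader has something to send.
// Returns the number of items sent.
inline std::size_t send_all(sketch_reader& reader, sketch_connections& conns) {
    if (!reader.peek()) {
        return 0; // nothing to send: no connection, no port used
    }
    conns.open();
    std::size_t sent = 0;
    while (reader.next()) {
        ++sent;
    }
    return sent;
}
```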

Refs: #4943
2019-10-08 10:31:02 +02:00
Botond Dénes
783277fb02 stream_session: STREAM_MUTATION_FRAGMENTS: print errors in receive and distribute phase
Currently when an error happens during the receive and distribute phase
it is swallowed and we just return a -1 status to the remote. We only
log errors that happen during responding with the status. This means
that when streaming fails, we only know that something went wrong, but
the node on which the failure happened doesn't log anything.

Fix by also logging errors happening in the receive and distribute
phase. Also mention the phase in which the error happened in both error
log messages.

Refs: #4901
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190903115735.49915-1-bdenes@scylladb.com>
2019-09-05 13:43:00 +02:00
Botond Dénes
136fc856c5 treewide: silence discarded future warnings for questionable discards
This patch silences the remaining discarded future warnings, those
where it cannot be determined with reasonable confidence that this was
indeed the actual intent of the author, or that the discarding of the
future could lead to problems. For all those places a FIXME is added,
with the intent that these will be soon followed-up with an actual fix.
I deliberately haven't fixed any of these, even if the fix seems
trivial. It is too easy to overlook a bad fix mixed in with so many
mechanical changes.
2019-08-26 19:28:43 +03:00
Botond Dénes
fddd9a88dd treewide: silence discarded future warnings for legit discards
This patch silences those future discard warnings where it is clear that
discarding the future was actually the intent of the original author,
*and* they took the necessary precautions (handling errors). The patch
also adds some trivial error handling (logging the error) in some
places, which were lacking this, but otherwise look ok. No functional
changes.
2019-08-26 18:54:44 +03:00
Asias He
49a73aa2fc streaming: Move stream_mutation_fragments_cmd to a new file (#4812)
Avoid including the lengthy stream_session.hh in messaging_service.

More importantly, fix the build because currently messaging_service.cc
and messaging_service.hh do not include stream_mutation_fragments_cmd.
I am not sure why it builds on my machine. Spotted this when backporting
the "streaming: Send error code from the sender to receiver" to 3.0
branch.

Refs: #4789
2019-08-07 14:59:46 +02:00
Asias He
288371ce75 streaming: Do not call rpc stream flush in send_mutation_fragments
The stream close() guarantees the data sent will be flushed. No need to
call the stream flush() since the stream is not reused.

Follow up fix for commit bac987e32a (streaming: Send error code from
the sender to receiver).

Refs #4789
2019-08-07 14:31:17 +02:00
Asias He
bac987e32a streaming: Send error code from the sender to receiver
In case of error on the sender side, the sender does not propagate the
error to the receiver. The sender will close the stream. As a result,
the receiver will get nullopt from the source in
get_next_mutation_fragment and pass mutation_fragment_opt with no value
to the generating_reader. In turn, the generating_reader generates end
of stream. However, the last element that the generating_reader has
generated can be any type of mutation_fragment. This makes the sstable
that consumes the generating_reader violate the mutation_fragment
stream rule.

To fix, we need to propagate the error. However, RPC streaming does not
support propagating errors in the framework. The user has to send an error
code explicitly.
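
A receiver that distinguishes an explicit error from a normal end of stream might look like the sketch below. The names are hypothetical (`sketch_cmd`, `message`, `receive_all`), the wire is modelled as a plain vector, and treating a stream that closes without any final command as an error is an assumption of this sketch, not something the commit states.

```cpp
#include <stdexcept>
#include <vector>

// Hypothetical command tag sent along with every message.
enum class sketch_cmd { data, end_of_stream, error };

struct message {
    sketch_cmd cmd;
    int payload;
};

// Collects payloads until an explicit end_of_stream arrives.
// Throws if the sender reports an error, or (assumption) if the
// stream simply ends without saying why.
inline std::vector<int> receive_all(const std::vector<message>& wire) {
    std::vector<int> received;
    for (const auto& m : wire) {
        switch (m.cmd) {
        case sketch_cmd::data:
            received.push_back(m.payload);
            break;
        case sketch_cmd::end_of_stream:
            return received;
        case sketch_cmd::error:
            throw std::runtime_error("sender reported an error");
        }
    }
    throw std::runtime_error("stream closed without end_of_stream");
}
```

Because the failure surfaces as an exception, the consumer never mistakes a truncated stream for a complete one.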

Fixes: #4789
2019-08-06 16:54:56 +02:00