Commit Graph

162 Commits

Author SHA1 Message Date
Asias He
0ba4e73068 streaming: Introduce the failed parameter for complete message
Use this flag to notify the peer that the session is failed so that the
peer can close the failed session more quickly.

The flag is passed as an rpc::optional so it stays compatible with the
old version of the verb.
2017-07-18 11:24:31 +08:00
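
A minimal sketch of the compatibility idea in 0ba4e73068, assuming the optional flag behaves like std::optional on the receiving side. The names handle_complete_message and session_state are hypothetical and are not taken from the Scylla sources; the real verb uses rpc::optional.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for the outcome of a COMPLETE message.
enum class session_state { complete, failed };

// A sender running the old version of the verb omits the flag, so the
// receiver sees std::nullopt and treats it as "not failed", which matches
// the old behavior. A new sender can set the flag to report a failure.
inline session_state handle_complete_message(std::optional<bool> failed) {
    return failed.value_or(false) ? session_state::failed
                                  : session_state::complete;
}
```

Defaulting the missing value to false is the design choice that keeps mixed-version clusters working: only peers that understand the flag ever report a failed session through it.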
Tomasz Grabiec
07ed512060 migration_manager: Give empty response to schema pulls from incompatible nodes
The old nodes which are still using v2 schema tables will fail to
apply our response, with error messages complaining about not being
able to locate schema of certain versions (new schema tables). This
change inhibits such errors by responding with an empty mutation list.
2017-07-07 19:09:57 +02:00
Avi Kivity
c4ae2206c7 messaging: respect inter_dc_tcp_nodelay configuration parameter
We respect it partially (client side only) for now.

Fixes #6.
Message-Id: <20170623172048.23103-1-avi@scylladb.com>
2017-06-24 21:49:27 +02:00
Gleb Natapov
23c51b3e57 messaging_service: connection drop notifier
Allow registering callbacks that will be called when connection is going
down.
2017-06-13 09:57:14 +03:00
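
The callback registration described in 23c51b3e57 could look roughly like the sketch below. The class connection_drop_registry and its member names are assumptions made for illustration; they are not the messaging_service API.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical registry of callbacks to run when a connection goes down.
class connection_drop_registry {
    std::vector<std::function<void(const std::string&)>> _callbacks;
public:
    // Store a callback; callbacks run in registration order.
    void register_callback(std::function<void(const std::string&)> cb) {
        _callbacks.push_back(std::move(cb));
    }
    // Invoke every registered callback with the address of the dropped peer.
    void notify_drop(const std::string& peer) const {
        for (const auto& cb : _callbacks) {
            cb(peer);
        }
    }
};
```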
Gleb Natapov
69c5526301 messaging_service: return cache hit ratio as part of data read 2017-06-13 09:57:14 +03:00
Avi Kivity
ebaeefa02b Merge seastar upstream (seastar namespace)
 - introduced "seastarx.hh" header, which does a "using namespace seastar";
 - 'net' namespace conflicts with seastar::net, renamed to 'netw'.
 - 'transport' namespace conflicts with seastar::transport, renamed to
   cql_transport.
 - "logger" global variables now conflict with logger global type, renamed
   to xlogger.
 - other minor changes
2017-05-21 12:26:15 +03:00
Paweł Dziepak
bf60b7844b messaging_service: add COUNTER_MUTATION verb
This verb is going to be used for coordinator<->leader communication
during counter updates.
2017-02-02 10:35:14 +00:00
Amnon Heiman
45b6070832 Merge seastar upstream
* seastar 397685c...c1dbd89 (13):
  > lowres_clock: drop cache-line alignment for _timer
  > net/packet: add missing include
  > Merge "Adding histogram and description support" from Amnon
  > reactor: Fix the error: cannot bind 'std::unique_ptr' lvalue to 'std::unique_ptr&&'
  > Set the option '--server' of tests/tcp_sctp_client to be required
  > core/memory: Remove superfluous assignment
  > core/memory: Remove dead code
  > core/reactor: Use logger instead of cerr
  > fix inverted logic in overprovision parameter
  > rpc: fix timeout checking condition
  > rpc: use lowres_clock instead of high resolution one
  > semaphore: make semaphore's clock configurable
  > rpc: detect timedout outgoing packets earlier

Includes treewide change to accommodate rpc changing its timeout clock
to lowres_clock.

Includes fixup from Amnon:

collectd api should use the metrics getters

As part of the preparation for the change in the metrics layer, this
changes the way the collectd api uses the metrics value, to use the
getters instead of calling the member directly.

This will be important when the internal implementation is changed
from union to variant.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1485457657-17634-1-git-send-email-amnon@scylladb.com>
2017-02-01 14:39:08 +02:00
Gleb Natapov
0a2dd39c75 messaging_service: move MUTATION_DONE messages to separate connection
If a node gets more MUTATION requests than it can handle via RPC it will
stop reading from this RPC connection, but this will prevent it from
getting MUTATION_DONE responses for requests it coordinates because
currently MUTATION and MUTATION_DONE messages share the same connection.

To solve this problem this patch moves MUTATION_DONE messages to a
separate connection.

Fixes: #1843

Message-Id: <20161201155942.GC11581@scylladb.com>
2016-12-21 11:10:15 +02:00
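
One way to picture the split made in 0a2dd39c75 is a mapping from verb to connection index, so that a stalled request connection cannot block responses. The enum values and the index numbers below are illustrative assumptions, not the actual assignment in messaging_service.

```cpp
#include <cassert>

// Hypothetical subset of verbs, for illustration only.
enum class verb { mutation, mutation_done, read_data, gossip_echo };

// Hypothetical mapping: MUTATION_DONE gets its own connection (index 1),
// everything else shares the default connection (index 0).
constexpr unsigned connection_index(verb v) {
    switch (v) {
    case verb::mutation_done:
        return 1;
    default:
        return 0;
    }
}
```

With this layout, back-pressure applied to connection 0 (too many MUTATION requests) leaves connection 1 readable, so the node keeps receiving MUTATION_DONE for the writes it coordinates.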
Asias He
937f28d2f1 Convert to use dht::partition_range_vector and dht::token_range_vector 2016-12-19 14:08:50 +08:00
Asias He
e5485f3ea6 Get rid of query::partition_range
Use dht::partition_range instead
2016-12-19 08:09:25 +08:00
Asias He
85034c1b57 Convert to use dht::partition_range 2016-12-19 08:04:30 +08:00
Asias He
d1178fa299 Convert to use dht::token_range 2016-12-19 08:04:29 +08:00
Avi Kivity
18078bea9b storage_proxy: avoid calculating digest when only one replica is contacted
If we're talking to just one replica, the digest is not going to be used,
so better not to calculate it at all.  The optimization helps with
LOCAL_ONE queries where the result is large, but does not contain large
blobs (many small rows).

This patch adds a digest_algorithm parameter to the READ_DATA verb that
can take on two values: none and MD5 (default), and sets it to none when
we're reading from one replica.

In the future we may add other values for more hardware-friendly digest
algorithms.
Message-Id: <1479380600-19206-1-git-send-email-avi@scylladb.com>
2016-11-17 13:04:30 +02:00
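
The selection rule from 18078bea9b can be sketched as follows. The two enumerators come from the commit message; the function name choose_digest_algorithm and its parameter are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>

// The two values named in the commit message.
enum class digest_algorithm { none, MD5 };

// Hypothetical helper: with a single replica there is nothing to compare
// the digest against, so skip computing it.
constexpr digest_algorithm choose_digest_algorithm(std::size_t replicas_contacted) {
    return replicas_contacted <= 1 ? digest_algorithm::none
                                   : digest_algorithm::MD5;
}
```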
Avi Kivity
a35136533d Convert ring_position and token ranges to be nonwrapping
Wrapping ranges are a pain, so we are moving wrap handling to the edges.

Since cql can't generate wrapping ranges, this means thrift and the ring
maintenance code; also range->ring transformations need to merge the first
and last ranges.

Message-Id: <1478105905-31613-1-git-send-email-avi@scylladb.com>
2016-11-02 21:04:11 +02:00
Avi Kivity
c94fb1bf12 build: reduce inclusions of messaging_service.hh
Remove inclusions from header files (primary offender is fb_utilities.hh)
and introduce new messaging_service_fwd.hh to reduce rebuilds when the
messaging service changes.

Message-Id: <1475584615-22836-1-git-send-email-avi@scylladb.com>
2016-10-05 11:46:49 +03:00
Gleb Natapov
26ae8e8365 implement listen_on_broadcast_address option
When using multiple physical network interfaces, set this to true to
listen on broadcast_address in addition to the listen_address, allowing
nodes to communicate in both interfaces.  Ignore this property if the
network configuration automatically routes between the public and
private networks such as EC2.

Message-Id: <20160921094810.GA28654@scylladb.com>
2016-09-26 08:49:54 +03:00
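
As a rough illustration of the option in 26ae8e8365, the set of addresses to listen on could be computed as below. The function name and the use of plain strings for addresses are assumptions; skipping a duplicate when both addresses are equal is also an assumption of this sketch, not something the commit message states.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: always listen on listen_address, and additionally on
// broadcast_address when the option is enabled.
inline std::vector<std::string> listen_addresses(
        const std::string& listen_address,
        const std::string& broadcast_address,
        bool listen_on_broadcast_address) {
    std::vector<std::string> out{listen_address};
    // Assumption: avoid binding the same address twice.
    if (listen_on_broadcast_address && broadcast_address != listen_address) {
        out.push_back(broadcast_address);
    }
    return out;
}
```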
Gleb Natapov
a2cdddb795 storage_proxy: forward mutation write with correct timeout value
Now that mutation handler knows how much time is left for mutation
write to be handled it can use this knowledge to set correct timeout
for forwarded mutations.

Message-Id: <20160828080637.GE9243@scylladb.com>
2016-08-29 13:06:36 +03:00
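
The timeout propagation in a2cdddb795 amounts to forwarding only the time that is still left. A minimal sketch with std::chrono, assuming the handler knows an absolute deadline; remaining_timeout and clock_type are hypothetical names, and the real code uses Seastar clocks rather than std::chrono::steady_clock.

```cpp
#include <cassert>
#include <chrono>

using clock_type = std::chrono::steady_clock;

// Hypothetical helper: time left until the deadline, clamped at zero so an
// already expired write is forwarded with no extra budget.
inline clock_type::duration remaining_timeout(clock_type::time_point deadline,
                                              clock_type::time_point now) {
    return deadline > now ? deadline - now : clock_type::duration::zero();
}
```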
Vlad Zolotarov
4c16df9e4c service: instrument MUTATE flow with tracing
Store the trace state in the abstract_write_response_handler.
Instrument send_mutation RPC to receive an additional
rpc::optional parameter that will contain optional<trace_info>
value.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:58 +03:00
Paweł Dziepak
7e06499458 repair: convert hashing to streamed_mutations
This patch makes hashing for repair calculate checksums in a way that
doesn't require rebuilding the whole mutation.
Unfortunately, such checksums are incompatible with the old ones so the
old way for computing checksums is preserved for compatibility reasons.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-13 09:51:23 +01:00
Gleb Natapov
726b79ea91 messaging_service: enable internode_compression option
Use LZ4 for internode compression if enabled.

Message-Id: <20160711141734.GZ18455@scylladb.com>
2016-07-11 18:30:21 +03:00
Paweł Dziepak
32a5de7a1f db: handle receiving fragmented mutations
If mutations are fragmented during streaming, special care must be
taken so that isolation guarantees are not broken.

Mutations received with flag "fragmented" set are applied to a memtable
that is used only by that particular streaming task and the sstables
created by flushing such memtables are not made visible until the task
is complete. Also, if the streaming fails, all data is dropped.

This means that fragmented mutations cannot benefit from coalescing of
writes from multiple streaming plans, hence separate way of handling
them so that there is no loss of performance for small partitions.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-07 12:18:35 +01:00
Asias He
b36d3be5d4 messaging_service: Fix messaging_service::stop
There are two problems:

1. _server_tls is not stopped

2. _server and _server_tls might not be created if
messaging_service::start_listen is not called yet.
2016-06-08 11:13:36 +08:00
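
The two problems listed in b36d3be5d4 can be illustrated with a small model: stop both listeners, and tolerate listeners that were never created. All types and member names here are hypothetical stand-ins, not the real messaging_service members.

```cpp
#include <cassert>
#include <initializer_list>
#include <memory>

// Hypothetical stand-in for an RPC server.
struct fake_server {
    bool stopped = false;
    void stop() { stopped = true; }
};

struct fake_messaging_service {
    std::unique_ptr<fake_server> server;      // plain listener
    std::unique_ptr<fake_server> server_tls;  // TLS listener

    // Listeners only exist after this call.
    void start_listen() {
        server = std::make_unique<fake_server>();
        server_tls = std::make_unique<fake_server>();
    }

    // Stops whichever listeners exist and returns how many were stopped.
    int stop() {
        int stopped = 0;
        for (fake_server* s : {server.get(), server_tls.get()}) {
            if (s) {
                s->stop();
                ++stopped;
            }
        }
        return stopped;
    }
};
```

Calling stop() before start_listen() is then a harmless no-op, and calling it afterwards stops the TLS listener as well as the plain one.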
Asias He
f27e5d2a68 messaging_service: Delay listening ms during boot up
When a node starts up, a peer node can send a gossip syn message to it
before the gossip message handlers are registered in messaging_service.

We can see:

  scylla[123]:  [shard 0] rpc - client a.b.c.d: unknown verb exception 6 ignored

To fix, we delay the listening of messaging_service to the point when
gossip message handlers are registered.
Message-Id: <9b20d85e199ef0e44cdcde2920123a301a88f3d7.1464254400.git.asias@scylladb.com>
2016-05-31 12:28:11 +03:00
Gleb Natapov
12cf60c302 messaging_service: add timestamp of last modification to READ_DIGEST verb return value 2016-05-24 13:27:34 +03:00
Calle Wilund
58f7edb04f messaging_service: Change tls init to use credentials_builder
To simplify init of msg service, use credentials_builder
to encapsulate tls options so actual credentials can be
more easily created in each shard.

Message-Id: <1462283265-27051-2-git-send-email-calle@scylladb.com>
2016-05-09 14:12:53 +03:00
Pekka Enberg
38a54df863 Fix pre-ScyllaDB copyright statements
People keep tripping over the old copyrights and copy-pasting them to
new files. Search and replace "Cloudius Systems" with "ScyllaDB".

Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>
2016-04-08 08:12:47 +03:00
Pekka Enberg
16f947dcb3 message/messaging_service: Remove init_messaging_service() declaration
The function no longer exists so drop the function declaration.
Message-Id: <1457694134-25600-1-git-send-email-penberg@scylladb.com>
2016-03-14 13:54:53 +02:00
Gleb Natapov
2d092bbd32 storage_proxy: send read requests with timeout
No need to wait for replies long after request is timed out.
Message-Id: <1457351304-28721-2-git-send-email-gleb@scylladb.com>
2016-03-07 14:00:11 +01:00
Paweł Dziepak
b92f8a6d2b messaging_service: add SCHEMA_CHECK verb
SCHEMA_CHECK is used to get node schema version.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-03-02 12:49:54 +00:00
Asias He
697b16414a gossip: Make gossip message handling async
In each gossip round, i.e., gossiper::run(), we do:

1) send syn message
2)                           peer node: receive syn message, send back ack message
3) process ack message in handle_ack_msg
   apply_state_locally
     mark_alive
       send_gossip_echo
     handle_major_state_change
       on_restart
       mark_alive
         send_gossip_echo
       mark_dead
         on_dead
       on_join
     apply_new_states
       do_on_change_notifications
          on_change
4) send back ack2 message
5)                            peer node: process ack2 message
                              apply_state_locally

At the moment, syn is a "wait" message that times out in 3 seconds. In
step 3, all the registered gossip callbacks are called, which might take
a significant amount of time to complete.

In order to reduce the gossip round latency, we make syn "no-wait" and
do not run handle_ack_msg inside gossiper::run(). As a result, we
will not get an ack message as the return value of a syn message any
more, so a GOSSIP_DIGEST_ACK message verb is introduced.

With this patch, the gossip message exchange is now async. It is useful
when some nodes are down in the cluster. We will not delay the gossip
round, which is supposed to run every second, 3*n seconds (n = 1-3,
since it talks to 1-3 peer nodes in each gossip round) or even
longer (considering the time to run gossip callbacks).

Later, we can make talking to the 1-3 peer nodes in parallel to reduce
latency even more.

Refs: #900
2016-02-24 19:33:39 +08:00
Asias He
63df54b368 messaging_service: Add GOSSIP_DIGEST_ACK
We will soon switch to use no-wait message for gossip. GOSSIP_DIGEST_SYN
will no longer return GOSSIP_DIGEST_ACK message. So we need a standalone
verb for GOSSIP_DIGEST_ACK.
2016-02-24 19:31:14 +08:00
Gleb Natapov
2ae1ae2d18 Cleanup messaging_service.hh includes a bit.
Forward declare some classes instead.

Message-Id: <1454496142-14537-2-git-send-email-gleb@scylladb.com>
2016-02-04 13:22:24 +02:00
Asias He
46c392eb17 messaging_service: Stop retrying if messaging_service is being shutdown
If we are shutting down the messaging_service, we should not retry the
message again.

Refs #862

Message-Id: <7c3afb646ba8254eca69096d80dd5ea007e416a7.1454418053.git.asias@scylladb.com>
2016-02-02 19:50:54 +02:00
Gleb Natapov
116ad5a603 Use net::messaging_service::current_version for serialization format versioning
Message-Id: <1454421603-13080-2-git-send-email-gleb@scylladb.com>
2016-02-02 17:08:53 +01:00
Gleb Natapov
19067db642 remove old serializer 2016-02-02 12:15:50 +02:00
Asias He
fbf796b812 messaging_service: Use standalone connection for stream verbs
In streaming, the amount of data that needs to be streamed to peer nodes
might be large.

In order to keep streaming from overwhelming the TCP connection used by
user CQL verbs and starving the user CQL queries, we use a standalone TCP
connection for streaming verbs.
2016-02-01 11:01:56 +08:00
Gleb Natapov
6a581bb8b6 messaging_service: replace rpc::type with boost::type
RPC moved to boost::type to make serializers less rpc centric. Scylla
should follow.

Message-Id: <20160126164450.GA11706@scylladb.com>
2016-01-27 11:57:45 +02:00
Asias He
e8b8b454df streaming: Flatten streaming messages class namespace
There are only two messages: prepare_message and outgoing_file_message.
Actually only the prepare_message is the message we send on wire.
Flatten the namespace.
2016-01-26 13:04:29 +08:00
Avi Kivity
5c5207f122 Merge "Another round of streaming cleanup" from Asias
"- Merge stream_init_message and stream_prepare_message
- Drop session_index / keep_ss_table_level / file_message_header"
2016-01-25 12:54:30 +02:00
Asias He
77684a5d4c messaging_service: Drop STREAM_INIT_MESSAGE
The verb is not used anymore.
Message-Id: <1453719054-29584-1-git-send-email-asias@scylladb.com>
2016-01-25 12:53:08 +02:00
Asias He
53c6cd7808 gossip: Rename echo verb to gossip_echo
It is used by gossip only. I really could not bring myself to live with
this inconsistency. Change it while we still can.
Message-Id: <1453719054-29584-2-git-send-email-asias@scylladb.com>
2016-01-25 12:53:07 +02:00
Asias He
ad4a096b80 streaming: Get rid of stream_init_message
Unlike streaming in c*, scylla does not need to open tcp connections in
streaming service for both incoming and outgoing messages, seastar::rpc
does the work. There is no need for a standalone stream_init_message
message in the streaming negotiation stage, we can merge the
stream_init_message into stream_prepare_message.
2016-01-25 16:24:16 +08:00
Gleb Natapov
067bdb23cd Move reconcilable_result and frozen_mutation to idl 2016-01-24 12:45:41 +02:00
Gleb Natapov
18dff5ebc8 Move smart pointer serialization helpers to .cc file.
They are not used outside of the .cc file, so should not be in the
header.
2016-01-24 12:45:41 +02:00
Gleb Natapov
93da9b2725 Remove redundant vector serialization code.
IDL serializer has the code to serialize vectors, so use it instead.
2016-01-24 12:45:41 +02:00
Avi Kivity
d5050e4c6a storage_proxy: make MUTATION and MUTATION_DONE verbs synchronous at the server side
While MUTATION and MUTATION_DONE are asynchronous by nature (when a MUTATION
completes, it sends a MUTATION_DONE message instead of responding
synchronously), we still want them to be synchronous at the server side
wrt. the RPC server itself.  This is because RPC accounts for resources
consumed by the handler only while the handler is executing; if we return
immediately, and let the code execute asynchronously, RPC believes no
resources are consumed and can instantiate more handlers than the shard
has resources for.

Fix by changing the return type of the handlers to future<no_wait_type>
(from a plain no_wait_type), and making that future complete when local
processing is over.

Ref #596.
Message-Id: <1453048967-5286-1-git-send-email-avi@scylladb.com>
2016-01-18 09:59:34 +02:00
Tomasz Grabiec
e88f41fb3f messaging_service: Move REPAIR_CHECKSUM_RANGE verb out of the streaming verbs group
Message-Id: <1452620321-17223-1-git-send-email-tgrabiec@scylladb.com>
2016-01-12 20:17:08 +02:00
Tomasz Grabiec
e1e8858ed1 service: Fetch and sync schema 2016-01-11 10:34:53 +01:00
Tomasz Grabiec
cdca20775f messaging_service: Introduce get_source() 2016-01-11 10:34:53 +01:00