Commit Graph

232 Commits

Author SHA1 Message Date
Gleb Natapov
646e400918 Provide available memory size to messaging_service object during creation 2018-06-11 15:34:13 +03:00
Avi Kivity
dd12214628 messaging_service: move msg_addr into its own header file
Make it possible to use msg_addr without depending on messaging_service.hh.
2018-03-12 20:05:23 +02:00
Avi Kivity
cd668061fc storage_service: remove system_keyspace.hh include
Redistribute the include among the files that really need it.
2018-03-11 18:53:49 +02:00
Duarte Nunes
440ea56010 message/messaging_service: Specify algorithm when requesting digest
While not strictly needed, specify which algorithm to use when requesting
a digest from a remote node. This is more flexible than relying on a
cluster-wide feature, although that's what we'll do in subsequent
patches. It also makes the verb more consistent with the data request.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-02-01 01:02:50 +00:00
Glauber Costa
08a0c3714c allow request-specific read timeouts in storage proxy reads
Timeouts are a global property. However, for tables in keyspaces like
the system keyspace, we don't want to uphold that timeout--in fact, we
want no timeout there at all.

We already apply such configuration for requests waiting in the sstable
read queue: system keyspace requests won't be removed. However, the
storage proxy will insert its own timeouts in those requests, causing
them to fail.

This patch changes the storage proxy read layer so that the timeout is
applied based on the column family configuration, which is in turn
inherited from the keyspace configuration. This matches our usual
way of passing db parameters down.

In terms of implementation, we can either move the timeout inside the
abstract read executor or keep it external. The former is a bit cleaner,
but the latter has the nice property that all executors generated will
share the exact same timeout point. In this patch, we chose the latter.

We are also careful to propagate the timeout information to the replica.
So even if we are talking about the local replica, when we add the
request to the concurrency queue, we will do it in accordance with the
timeout specified by the storage proxy layer.

After this patch, Scylla is able to start just fine with very low
timeouts--since read timeouts in the system keyspace are now ignored.

Fixes #2462

Implementation notes, and general comments about open discussion in 2462:

* Because we are not bypassing the timeout, just setting it high enough,
  I consider the concerns about the batchlog moot: if we fail for any
  other reason, that failure will be propagated. As a last resort, because
  the timeout is per-CF, we could do what we do for the dirty memory
  manager and move the batchlog alone to a different timeout setting.

* Storage proxy likes specifying its timeouts as a time_point, whereas
  once we get low enough to deal with the read_concurrency_config,
  we are talking about deltas. So at some point we need to convert time_points
  to durations. We do that in the database query functions.

v2:
- use per-request instead of per-table timeouts.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2018-01-12 07:43:21 -05:00
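The time_point-to-duration conversion mentioned in the implementation notes can be sketched as below. This is a minimal illustration, assuming std::chrono::steady_clock as the clock; remaining_until() is a hypothetical helper, not the function used in the database query path.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper (not Scylla's code): the coordinator carries an
// absolute deadline, while a queue that wants a relative delay needs a duration.
using clock_type = std::chrono::steady_clock;

// Time left until `deadline`, clamped to zero once the deadline has passed.
clock_type::duration remaining_until(clock_type::time_point deadline,
                                     clock_type::time_point now) {
    return deadline > now ? deadline - now : clock_type::duration::zero();
}
```

Because every executor derives its delay from the same time_point, they all share one deadline, which is the property the commit cites for keeping the timeout outside the executor.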
Vlad Zolotarov
be6f8be9cb messaging_service: fix multi-NIC support
Don't enforce the outgoing connections from the 'listen_address'
interface only.

If 'local_address' is given to connect(), it forces the connection to
originate from that particular interface, even if the destination should
be reached through a different interface. If we don't specify
'local_address', the source interface is chosen according to the
routing configuration.

Fixes #3066

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <1513372688-21595-1-git-send-email-vladz@scylladb.com>
2017-12-17 10:51:20 +02:00
Gleb Natapov
16964de1f3 storage_proxy: fail read/write requests early if they cannot be completed due to errors
If errors make reaching the CL impossible, a request can be aborted early
without waiting for the timeout.
2017-12-05 16:46:25 +02:00
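The early-abort condition can be sketched as a simple count: once the replicas that have not failed can no longer supply the required number of replies, waiting for the timeout is pointless. The response_tracker class below is hypothetical, not Scylla's response handler.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch, not Scylla's response handler: a request goes to
// `targets` replicas and needs `required` successful replies (the CL).
class response_tracker {
public:
    response_tracker(std::size_t targets, std::size_t required)
        : _targets(targets), _required(required) {
        assert(required <= targets);
    }
    void on_success() { ++_successes; }
    void on_failure() { ++_failures; }

    bool succeeded() const { return _successes >= _required; }
    // Even if every replica that has not failed yet succeeds, fewer than
    // `required` replies can arrive: fail now instead of at the timeout.
    bool cannot_succeed() const { return _targets - _failures < _required; }

private:
    std::size_t _targets;
    std::size_t _required;
    std::size_t _successes = 0;
    std::size_t _failures = 0;
};
```

With three replicas and a CL needing two replies, a single failure still leaves room for success, but a second failure makes the request hopeless.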
Duarte Nunes
1fbe9dc851 message/messaging_service: Close all server sockets
We were stopping the loop prematurely.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20171127181417.8167-1-duarte@scylladb.com>
2017-11-28 11:08:08 +02:00
Asias He
8fa35d6ddf messaging_service: Get rid of timeout and retry logic for streaming verb
With the "Use range_streamer everywhere" (7217b7ab36) series, all
users of streaming now stream relatively small ranges and
can retry streaming at a higher level.

There are problems with timeout and retry at RPC verb level in streaming:
1) Timeout can be false negative.
2) We cannot cancel send operations that have already been issued. When
the user aborts streaming, the retry logic keeps running for a long
time.

This patch removes all the timeout and retry logic for streaming verbs.
After this, timeouts are left to TCP and retries to the
upper layer.

Message-Id: <df20303c1fa728dcfdf06430417cf2bd7a843b00.1503994267.git.asias@scylladb.com>
2017-08-29 17:20:00 +03:00
Avi Kivity
3edec66903 Revert "repair: Make send_repair_checksum_range timeout"
This reverts commit 98757069a5. We have the
failure detector which will detect an unresponsive node and fail the RPC.
Adding a timeout can just introduce false positives.
2017-08-06 13:09:36 +03:00
Asias He
98757069a5 repair: Make send_repair_checksum_range timeout
If the verb never returns, the repair will hang forever. Make it use the
timeout version of send_message.

Fixes #2662
2017-08-02 21:41:50 +08:00
Duarte Nunes
85e85ec72e Don't catch polymorphic exceptions by value
It makes gcc a very sad compiler.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170726172053.5639-2-duarte@scylladb.com>
2017-07-27 09:39:58 +03:00
Asias He
0ba4e73068 streaming: Introduce the failed parameter for complete message
Use this flag to notify the peer that the session has failed so that the
peer can close the failed session more quickly.

The flag is used as an rpc::optional so it is compatible with the old
version of the verb.
2017-07-18 11:24:31 +08:00
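The compatibility trick is that a trailing optional parameter simply arrives as "absent" when an old peer sends the verb without it. The sketch below uses std::optional as a stand-in for seastar's rpc::optional; session_failed() is a hypothetical handler fragment.

```cpp
#include <cassert>
#include <optional>

// std::optional stands in for seastar's rpc::optional here: a trailing
// parameter that old-version peers never send arrives as "absent".
bool session_failed(std::optional<bool> failed) {
    // Old peers don't send the flag; treat that as "not failed".
    return failed.value_or(false);
}
```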
Tomasz Grabiec
07ed512060 migration_manager: Give empty response to schema pulls from incompatible nodes
The old nodes which are still using v2 schema tables will fail to
apply our response, with error messages complaining about not being
able to locate schema of certain versions (new schema tables). This
change inhibits such errors by responding with an empty mutation list.
2017-07-07 19:09:57 +02:00
Avi Kivity
c4ae2206c7 messaging: respect inter_dc_tcp_nodelay configuration parameter
We respect it partially (client side only) for now.

Fixes #6.
Message-Id: <20170623172048.23103-1-avi@scylladb.com>
2017-06-24 21:49:27 +02:00
Gleb Natapov
23c51b3e57 messaging_service: connection drop notifier
Allow registering callbacks that will be called when connection is going
down.
2017-06-13 09:57:14 +03:00
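A connection drop notifier amounts to a list of registered callbacks invoked when a connection goes down. The class and method names below are hypothetical, and the real implementation works with seastar futures rather than plain synchronous calls.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch of a connection-drop notifier; names are illustrative.
class connection_drop_notifier {
public:
    using callback = std::function<void(const std::string& peer)>;

    void register_callback(callback cb) { _callbacks.push_back(std::move(cb)); }

    // Called by the transport when the connection to `peer` goes down.
    void notify_dropped(const std::string& peer) const {
        for (const auto& cb : _callbacks) {
            cb(peer);
        }
    }

private:
    std::vector<callback> _callbacks;
};
```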
Gleb Natapov
69c5526301 messaging_service: return cache hit ratio as part of data read 2017-06-13 09:57:14 +03:00
Avi Kivity
ebaeefa02b Merge seastar upstream (seastar namespace)
 - introduced "seastarx.hh" header, which does a "using namespace seastar";
 - 'net' namespace conflicts with seastar::net, renamed to 'netw'.
 - 'transport' namespace conflicts with seastar::transport, renamed to
   cql_transport.
 - "logger" global variables now conflict with logger global type, renamed
   to xlogger.
 - other minor changes
2017-05-21 12:26:15 +03:00
Calle Wilund
d5f57bd047 messaging_service: Move log printout to actual listen start
Fixes #1845
The log printout happened before we had evaluated which endpoint
to create, so it never included SSL info.
Message-Id: <1487766738-27797-1-git-send-email-calle@scylladb.com>
2017-02-22 17:08:21 +01:00
Paweł Dziepak
bf60b7844b messaging_service: add COUNTER_MUTATION verb
This verb is going to be used for coordinator<->leader communication
during counter updates.
2017-02-02 10:35:14 +00:00
Amnon Heiman
45b6070832 Merge seastar upstream
* seastar 397685c...c1dbd89 (13):
  > lowres_clock: drop cache-line alignment for _timer
  > net/packet: add missing include
  > Merge "Adding histogram and description support" from Amnon
  > reactor: Fix the error: cannot bind 'std::unique_ptr' lvalue to 'std::unique_ptr&&'
  > Set the option '--server' of tests/tcp_sctp_client to be required
  > core/memory: Remove superfluous assignment
  > core/memory: Remove dead code
  > core/reactor: Use logger instead of cerr
  > fix inverted logic in overprovision parameter
  > rpc: fix timeout checking condition
  > rpc: use lowres_clock instead of high resolution one
  > semaphore: make semaphore's clock configurable
  > rpc: detect timedout outgoing packets earlier

Includes treewide change to accommodate rpc changing its timeout clock
to lowres_clock.

Includes fixup from Amnon:

collectd api should use the metrics getters

As part of the preparation for the change in the metrics layer, this changes
the way the collectd API uses metric values: it now goes through the getters
instead of accessing the members directly.

This will be important when the internal implementation changes
from a union to a variant.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1485457657-17634-1-git-send-email-amnon@scylladb.com>
2017-02-01 14:39:08 +02:00
Paweł Dziepak
1a52569f7d storage_proxy: pass maximum result size to replicas
We may want to change the default individual result size limit in the
future. If it is provided by the coordinator and not hardcoded in the
replicas this can be done without causing data query digest mismatches
or wasteful mutation query results.
2016-12-22 17:16:23 +01:00
Gleb Natapov
0a2dd39c75 messaging_service: move MUTATION_DONE messages to separate connection
If a node gets more MUTATION requests than it can handle via RPC, it will
stop reading from this RPC connection, but this will prevent it from
getting MUTATION_DONE responses for requests it coordinates, because
currently MUTATION and MUTATION_DONE messages share the same connection.

To solve this problem, this patch moves MUTATION_DONE messages to a
separate connection.

Fixes: #1843

Message-Id: <20161201155942.GC11581@scylladb.com>
2016-12-21 11:10:15 +02:00
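One way to picture the fix is a mapping from verb to connection index, so that responses never queue behind requests on a stalled connection. The reduced verb set and index values below are hypothetical and are not Scylla's actual mapping.

```cpp
#include <cassert>

// Hypothetical, reduced verb set; not Scylla's actual verb list or mapping.
enum class verb { MUTATION, MUTATION_DONE, READ_DATA };

// Each index selects a distinct connection to the same peer.
unsigned connection_index(verb v) {
    switch (v) {
    case verb::MUTATION_DONE:
        return 1;  // responses get their own connection
    case verb::MUTATION:
    case verb::READ_DATA:
        return 0;
    }
    return 0;  // unreachable for valid enum values
}
```

If the receiver stops reading connection 0 because it is overloaded with MUTATION requests, MUTATION_DONE replies on connection 1 still get through.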
Asias He
937f28d2f1 Convert to use dht::partition_range_vector and dht::token_range_vector 2016-12-19 14:08:50 +08:00
Asias He
e5485f3ea6 Get rid of query::partition_range
Use dht::partition_range instead
2016-12-19 08:09:25 +08:00
Asias He
85034c1b57 Convert to use dht::partition_range 2016-12-19 08:04:30 +08:00
Asias He
d1178fa299 Convert to use dht::token_range 2016-12-19 08:04:29 +08:00
Avi Kivity
18078bea9b storage_proxy: avoid calculating digest when only one replica is contacted
If we're talking to just one replica, the digest is not going to be used,
so better not to calculate it at all.  The optimization helps with
LOCAL_ONE queries where the result is large, but does not contain large
blobs (many small rows).

This patch adds a digest_algorithm parameter to the READ_DATA verb that
can take on two values: none and MD5 (default), and sets it to none when
we're reading from one replica.

In the future we may add other values for more hardware-friendly digest
algorithms.
Message-Id: <1479380600-19206-1-git-send-email-avi@scylladb.com>
2016-11-17 13:04:30 +02:00
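The coordinator-side choice described above fits in a few lines. The two enum values follow the commit message; choose_digest() is a hypothetical name for the decision, not a function from the patch.

```cpp
#include <cassert>
#include <cstddef>

// Values as described in the commit; the enum layout itself is illustrative.
enum class digest_algorithm { none, MD5 };

// Hypothetical helper: with a single replica there is nothing to compare
// a digest against, so skip computing it.
digest_algorithm choose_digest(std::size_t replicas_contacted) {
    return replicas_contacted == 1 ? digest_algorithm::none
                                   : digest_algorithm::MD5;
}
```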
Avi Kivity
a35136533d Convert ring_position and token ranges to be nonwrapping
Wrapping ranges are a pain, so we are moving wrap handling to the edges.

Since cql can't generate wrapping ranges, this means thrift and the ring
maintenance code; also range->ring transformations need to merge the first
and last ranges.

Message-Id: <1478105905-31613-1-git-send-email-avi@scylladb.com>
2016-11-02 21:04:11 +02:00
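Moving wrap handling to the edges means a wrapping range (start > end on the ring) is split into two nonwrapping pieces before it reaches the core code. The token_range type below is a hypothetical simplification with int64 tokens and inclusive bounds; Scylla's real range types also track bound inclusiveness.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical simplified range: inclusive [start, end] over int64 tokens.
struct token_range {
    std::int64_t start;
    std::int64_t end;
};

// A range with start > end wraps around the ring; split it into two
// nonwrapping pieces so downstream code never sees wrap-around.
std::vector<token_range> unwrap(token_range r) {
    if (r.start <= r.end) {
        return {r};
    }
    return {{std::numeric_limits<std::int64_t>::min(), r.end},
            {r.start, std::numeric_limits<std::int64_t>::max()}};
}
```

The reverse transformation mentioned in the commit, merging the first and last ranges of a ring, undoes this split.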
Avi Kivity
c94fb1bf12 build: reduce inclusions of messaging_service.hh
Remove inclusions from header files (primary offender is fb_utilities.hh)
and introduce new messaging_service_fwd.hh to reduce rebuilds when the
messaging service changes.

Message-Id: <1475584615-22836-1-git-send-email-avi@scylladb.com>
2016-10-05 11:46:49 +03:00
Gleb Natapov
c95df8f053 messaging_service: use correct value for listen_to_bc_address in a constructor used by tests
Also make sure not to listen on the exact same address twice in case
listen_address == broadcast_address. Scylla configuration code does not
allow such a thing to be configured, but better to be safe.

Message-Id: <20160927102316.GO32178@scylladb.com>
2016-09-27 11:27:23 +01:00
Gleb Natapov
26ae8e8365 implement listen_on_broadcast_address option
When using multiple physical network interfaces, set this to true to
listen on broadcast_address in addition to the listen_address, allowing
nodes to communicate in both interfaces.  Ignore this property if the
network configuration automatically routes between the public and
private networks such as EC2.

Message-Id: <20160921094810.GA28654@scylladb.com>
2016-09-26 08:49:54 +03:00
Gleb Natapov
a2cdddb795 storage_proxy: forward mutation write with correct timeout value
Now that mutation handler knows how much time is left for mutation
write to be handled it can use this knowledge to set correct timeout
for forwarded mutations.

Message-Id: <20160828080637.GE9243@scylladb.com>
2016-08-29 13:06:36 +03:00
Vlad Zolotarov
4c16df9e4c service: instrument MUTATE flow with tracing
Store the trace state in the abstract_write_response_handler.
Instrument send_mutation RPC to receive an additional
rpc::optional parameter that will contain optional<trace_info>
value.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:58 +03:00
Paweł Dziepak
7e06499458 repair: convert hashing to streamed_mutations
This patch makes hashing for repair calculate checksums in a way that
doesn't require rebuilding whole mutation.
Unfortunately, such checksums are incompatible with the old ones so the
old way for computing checksums is preserved for compatibility reasons.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-13 09:51:23 +01:00
Gleb Natapov
726b79ea91 messaging_service: enable internode_compression option
Use LZ4 for internode compression if enabled.

Message-Id: <20160711141734.GZ18455@scylladb.com>
2016-07-11 18:30:21 +03:00
Paweł Dziepak
32a5de7a1f db: handle receiving fragmented mutations
If mutations are fragmented during streaming, special care must be
taken so that isolation guarantees are not broken.

Mutations received with the "fragmented" flag set are applied to a memtable
that is used only by that particular streaming task, and the sstables
created by flushing such memtables are not made visible until the task
is complete. Also, in case the streaming fails, all data is dropped.

This means that fragmented mutations cannot benefit from coalescing of
writes from multiple streaming plans, hence the separate way of handling
them, so that there is no loss of performance for small partitions.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-07 12:18:35 +01:00
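The isolation rule for fragmented mutations can be modeled as a per-plan staging area that is published only on success and discarded on failure. This is a heavily simplified, hypothetical model: strings stand in for mutations and a vector stands in for data readers can see. Memtables, sstables, and flushing are not modeled.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical model: `_visible` stands in for data readers can see.
class streaming_sink {
public:
    void apply(int plan_id, const std::string& m, bool fragmented) {
        if (fragmented) {
            _pending[plan_id].push_back(m);  // private to this plan
        } else {
            _visible.push_back(m);           // stands in for the regular path
        }
    }
    // Publish a plan's fragmented data on success; drop it on failure.
    void finish(int plan_id, bool success) {
        auto it = _pending.find(plan_id);
        if (it == _pending.end()) {
            return;
        }
        if (success) {
            _visible.insert(_visible.end(), it->second.begin(), it->second.end());
        }
        _pending.erase(it);
    }
    const std::vector<std::string>& visible() const { return _visible; }

private:
    std::map<int, std::vector<std::string>> _pending;
    std::vector<std::string> _visible;
};
```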
Asias He
b36d3be5d4 messaging_service: Fix messaging_service::stop
There are two problems:

1. _server_tls is not stopped

2. _server and _server_tls might not be created if
messaging_service::start_listen is not called yet.
2016-06-08 11:13:36 +08:00
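Both stop-path fixes come down to treating each server as possibly absent. The sketch below models that with null unique_ptrs. The server and messaging_sketch types are hypothetical, and the real stop() returns seastar futures instead of running synchronously.

```cpp
#include <cassert>
#include <memory>

struct server {  // hypothetical stand-in for an RPC server
    bool stopped = false;
    void stop() { stopped = true; }
};

struct messaging_sketch {
    std::unique_ptr<server> _server;      // null until start_listen()
    std::unique_ptr<server> _server_tls;  // null unless TLS is configured

    void start_listen(bool tls) {
        _server = std::make_unique<server>();
        if (tls) {
            _server_tls = std::make_unique<server>();
        }
    }
    void stop() {
        // Either server may be absent if start_listen() never ran.
        if (_server) {
            _server->stop();
        }
        if (_server_tls) {
            _server_tls->stop();
        }
    }
};
```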
Asias He
f7d25e6bae messaging_service: Handle _server not being created in foreach_server_connection_stats
It is possible that _server has not been created yet when
foreach_server_connection_stats is called. Handle this case.
2016-06-08 11:13:35 +08:00
Vlad Zolotarov
4c17a422e0 cql3: instrument a SELECT query to send tracing info
Instrument a coordinator of a SELECT query to send tracing session
info to the corresponding replica nodes.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-06-01 20:17:25 +03:00
Asias He
f27e5d2a68 messaging_service: Delay listening ms during boot up
When a node starts up, a peer node can send a gossip syn message to it
before the gossip message handlers are registered in messaging_service.

We can see:

  scylla[123]:  [shard 0] rpc - client a.b.c.d: unknown verb exception 6 ignored

To fix this, we delay messaging_service listening until the
gossip message handlers are registered.
Message-Id: <9b20d85e199ef0e44cdcde2920123a301a88f3d7.1464254400.git.asias@scylladb.com>
2016-05-31 12:28:11 +03:00
Avi Kivity
3f6ecb9f28 Merge "cancel cross DC read repair if non matching data was recently modified" from Gleb 2016-05-29 15:58:55 +03:00
Gleb Natapov
32c9a06faf messaging_service: abort retrying send during exit
Fixes #862

Message-Id: <1463579574-15789-3-git-send-email-gleb@scylladb.com>
2016-05-29 11:39:36 +03:00
Gleb Natapov
12cf60c302 messaging_service: add timestamp of last modification to READ_DIGEST verb return value 2016-05-24 13:27:34 +03:00
Calle Wilund
58f7edb04f messaging_service: Change tls init to use credentials_builder
To simplify init of msg service, use credentials_builder
to encapsulate tls options so actual credentials can be
more easily created in each shard.

Message-Id: <1462283265-27051-2-git-send-email-calle@scylladb.com>
2016-05-09 14:12:53 +03:00
Duarte Nunes
dada385826 rpc: Secure connection attempts can be cancelled
This patch adds support for secure connection attempts to be
cancellable.

Fixes #862

Includes seastar upstream merge:

* seastar f1a3520...7782ad4 (1):
  > Merge "rpc: Allow client connections to be cancelled" from Duarte

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1462783335-10731-1-git-send-email-duarte@scylladb.com>
2016-05-09 11:44:53 +03:00
Calle Wilund
d8ea85cd90 messaging_service: Add logging to match origin
To announce rpc port + ssl if on.

Message-Id: <1462368016-32394-1-git-send-email-calle@scylladb.com>
2016-05-05 10:26:01 +03:00
Pekka Enberg
38a54df863 Fix pre-ScyllaDB copyright statements
People keep tripping over the old copyrights and copy-pasting them to
new files. Search and replace "Cloudius Systems" with "ScyllaDB".

Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>
2016-04-08 08:12:47 +03:00
Gleb Natapov
1e6352e398 messaging: do not admit new requests during messaging service shutdown.
Sending a message may open a new client connection, which will never be
closed if the messaging service is already shutting down.

Fixes #1059

Message-Id: <1458639452-29388-3-git-send-email-gleb@scylladb.com>
2016-03-22 13:00:18 +02:00
Gleb Natapov
357c91a076 messaging: do not delete client during messaging service shutdown
Messaging service stop() method calls stop() on all clients. If
remove_rpc_client_one() is called while those stops are running,
client::stop() will be called twice, which is not supposed to happen. Fix it
by ignoring client remove requests during messaging service shutdown.

Fixes #1059

Message-Id: <1458639452-29388-2-git-send-email-gleb@scylladb.com>
2016-03-22 13:00:18 +02:00
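The two shutdown fixes for #1059 can be sketched together with a single "shutting down" flag. After stop() begins, new sends are refused and per-client removals are ignored, so no client is stopped twice. The client and client_registry types are hypothetical, and Scylla's real shutdown path is asynchronous.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins; Scylla's real stop() paths are asynchronous.
struct client {
    int stop_calls = 0;
    void stop() { ++stop_calls; }
};

class client_registry {
public:
    // Get or create the client for `peer`; refused once shutdown has begun,
    // because a connection opened now would never be closed.
    client& get_client(const std::string& peer) {
        if (_shutting_down) {
            throw std::runtime_error("messaging service is shutting down");
        }
        return _clients[peer];
    }
    // Ignored during shutdown: stop() already stops every client, and
    // stopping one twice must not happen.
    void remove_client(const std::string& peer) {
        if (_shutting_down) {
            return;
        }
        auto it = _clients.find(peer);
        if (it != _clients.end()) {
            it->second.stop();
            _clients.erase(it);
        }
    }
    void stop() {
        _shutting_down = true;
        for (auto& entry : _clients) {
            entry.second.stop();
        }
    }
    int stop_calls(const std::string& peer) const { return _clients.at(peer).stop_calls; }

private:
    std::map<std::string, client> _clients;
    bool _shutting_down = false;
};
```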