scylladb

Author	SHA1	Message	Date
Asias He	8fa35d6ddf	messaging_service: Get rid of timeout and retry logic for streaming verb With the "Use range_streamer everywhere" (`7217b7ab36`) series, all the users of streaming now do streaming with relatively small ranges and can retry streaming at a higher level. There are problems with timeout and retry at RPC verb level in streaming: 1) Timeout can be false negative. 2) We can not cancel the send operations which are already called. When user aborts the streaming, the retry logic keeps running for a long time. This patch removes all the timeout and retry logic for streaming verbs. After this, the timeout is the job of TCP, the retry is the job of the upper layer. Message-Id: <df20303c1fa728dcfdf06430417cf2bd7a843b00.1503994267.git.asias@scylladb.com>	2017-08-29 17:20:00 +03:00
Avi Kivity	3edec66903	Revert "repair: Make send_repair_checksum_range timeout" This reverts commit `98757069a5`. We have the failure detector which will detect an unresponsive node and fail the RPC. Adding a timeout can just introduce false positives.	2017-08-06 13:09:36 +03:00
Asias He	98757069a5	repair: Make send_repair_checksum_range timeout If the verb never returns the repair will hang forever. Make it use the timeout version of the send_message. Fixes #2662	2017-08-02 21:41:50 +08:00
Duarte Nunes	85e85ec72e	Don't catch polymorphic exceptions by value It makes gcc a very sad compiler. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20170726172053.5639-2-duarte@scylladb.com>	2017-07-27 09:39:58 +03:00
Asias He	0ba4e73068	streaming: Introduce the failed parameter for complete message Use this flag to notify the peer that the session is failed so that the peer can close the failed session more quickly. The flag is used as a rpc::optional so it is compatible use old version of the verb.	2017-07-18 11:24:31 +08:00
Tomasz Grabiec	07ed512060	migration_manager: Give empty response to schema pulls from incompatible nodes The old nodes which are still using v2 schema tables will fail to apply our response, with error messages complaining about not being able to locate schema of certain versions (new schema tables). This change inhibits such errors by responding with an empty mutation list.	2017-07-07 19:09:57 +02:00
Avi Kivity	c4ae2206c7	messaging: respect inter_dc_tcp_nodelay configuration parameter We respect it partially (client side only) for now. Fixes #6. Message-Id: <20170623172048.23103-1-avi@scylladb.com>	2017-06-24 21:49:27 +02:00
Gleb Natapov	23c51b3e57	messaging_service: connection drop notifier Allow registering callbacks that will be called when connection is going down.	2017-06-13 09:57:14 +03:00
Gleb Natapov	69c5526301	messaging_service: return cache hit ratio as part of data read	2017-06-13 09:57:14 +03:00
Avi Kivity	ebaeefa02b	Merge seastar upstream (seastar namespace) - introduced "seastarx.hh" header, which does a "using namespace seastar"; - 'net' namespace conflicts with seastar::net, renamed to 'netw'. - 'transport' namespace conflicts with seastar::transport, renamed to cql_transport. - "logger" global variables now conflict with logger global type, renamed to xlogger. - other minor changes	2017-05-21 12:26:15 +03:00
Calle Wilund	d5f57bd047	messaging_service: Move log printout to actual listen start Fixes #1845 Log printout was before we actually had evaluated endpoint to create, thus never included SSL info. Message-Id: <1487766738-27797-1-git-send-email-calle@scylladb.com>	2017-02-22 17:08:21 +01:00
Paweł Dziepak	bf60b7844b	messaging_service: add COUNTER_MUTATION verb This verb is going to be used for coordinator<->leader communication during counter updates.	2017-02-02 10:35:14 +00:00
Amnon Heiman	45b6070832	Merge seastar upstream * seastar 397685c...c1dbd89 (13): > lowres_clock: drop cache-line alignment for _timer > net/packet: add missing include > Merge "Adding histogram and description support" from Amnon > reactor: Fix the error: cannot bind 'std::unique_ptr' lvalue to 'std::unique_ptr&&' > Set the option '--server' of tests/tcp_sctp_client to be required > core/memory: Remove superfluous assignment > core/memory: Remove dead code > core/reactor: Use logger instead of cerr > fix inverted logic in overprovision parameter > rpc: fix timeout checking condition > rpc: use lowres_clock instead of high resolution one > semaphore: make semaphore's clock configurable > rpc: detect timedout outgoing packets earlier Includes treewide change to accommodate rpc changing its timeout clock to lowres_clock. Includes fixup from Amnon: collectd api should use the metrics getters As part of the preparation for the change in the metrics layer, this changes the way the collectd api uses the metrics value to use the getters instead of calling the member directly. This will be important when the internal implementation changes from union to variant. Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <1485457657-17634-1-git-send-email-amnon@scylladb.com>	2017-02-01 14:39:08 +02:00
Paweł Dziepak	1a52569f7d	storage_proxy: pass maximum result size to replicas We may want to change the default individual result size limit in the future. If it is provided by the coordinator and not hardcoded in the replicas this can be done without causing data query digest mismatches or wasteful mutation query results.	2016-12-22 17:16:23 +01:00
Gleb Natapov	0a2dd39c75	messaging_service: move MUTATION_DONE messages to separate connection If a node gets more MUTATION requests than it can handle via RPC it will stop reading from this RPC connection, but this will prevent it from getting MUTATION_DONE responses for requests it coordinates because currently MUTATION and MUTATION_DONE messages share the same connection. To solve this problem this patch moves MUTATION_DONE messages to a separate connection. Fixes: #1843 Message-Id: <20161201155942.GC11581@scylladb.com>	2016-12-21 11:10:15 +02:00
Asias He	937f28d2f1	Convert to use dht::partition_range_vector and dht::token_range_vector	2016-12-19 14:08:50 +08:00
Asias He	e5485f3ea6	Get rid of query::partition_range Use dht::partition_range instead	2016-12-19 08:09:25 +08:00
Asias He	85034c1b57	Convert to use dht::partition_range	2016-12-19 08:04:30 +08:00
Asias He	d1178fa299	Convert to use dht::token_range	2016-12-19 08:04:29 +08:00
Avi Kivity	18078bea9b	storage_proxy: avoid calculating digest when only one replica is contacted If we're talking to just one replica, the digest is not going to be used, so better not to calculate it at all. The optimization helps with LOCAL_ONE queries where the result is large, but does not contain large blobs (many small rows). This patch adds a digest_algorithm parameter to the READ_DATA verb that can take on two values: none and MD5 (default), and sets it to none when we're reading from one replica. In the future we may add other values for more hardware-friendly digest algorithms. Message-Id: <1479380600-19206-1-git-send-email-avi@scylladb.com>	2016-11-17 13:04:30 +02:00
Avi Kivity	a35136533d	Convert ring_position and token ranges to be nonwrapping Wrapping ranges are a pain, so we are moving wrap handling to the edges. Since cql can't generate wrapping ranges, this means thrift and the ring maintenance code; also range->ring transformations need to merge the first and last ranges. Message-Id: <1478105905-31613-1-git-send-email-avi@scylladb.com>	2016-11-02 21:04:11 +02:00
Avi Kivity	c94fb1bf12	build: reduce inclusions of messaging_service.hh Remove inclusions from header files (primary offender is fb_utilities.hh) and introduce new messaging_service_fwd.hh to reduce rebuilds when the messaging service changes. Message-Id: <1475584615-22836-1-git-send-email-avi@scylladb.com>	2016-10-05 11:46:49 +03:00
Gleb Natapov	c95df8f053	messaging_service: use correct value for listen_to_bc_address is a constructor used by tests Also make sure to not listen on the same exact address twice in case listen_address == broadcast_address. Scylla configuration code does not allow such thing to be configured, but better to be safe. Message-Id: <20160927102316.GO32178@scylladb.com>	2016-09-27 11:27:23 +01:00
Gleb Natapov	26ae8e8365	implement listen_on_broadcast_address option When using multiple physical network interfaces, set this to true to listen on broadcast_address in addition to the listen_address, allowing nodes to communicate in both interfaces. Ignore this property if the network configuration automatically routes between the public and private networks such as EC2. Message-Id: <20160921094810.GA28654@scylladb.com>	2016-09-26 08:49:54 +03:00
Gleb Natapov	a2cdddb795	storage_proxy: forward mutation write with correct timeout value Now that mutation handler knows how much time is left for mutation write to be handled it can use this knowledge to set correct timeout for forwarded mutations. Message-Id: <20160828080637.GE9243@scylladb.com>	2016-08-29 13:06:36 +03:00
Vlad Zolotarov	4c16df9e4c	service: instrument MUTATE flow with tracing Store the trace state in the abstract_write_response_handler. Instrument send_mutation RPC to receive an additional rpc::optional parameter that will contain optional<trace_info> value. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-19 18:21:58 +03:00
Paweł Dziepak	7e06499458	repair: convert hashing to streamed_mutations This patch makes hashing for repair calculate checksums in a way that doesn't require rebuilding whole mutation. Unfortunately, such checksums are incompatible with the old ones so the old way for computing checksums is preserved for compatibility reasons. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-13 09:51:23 +01:00
Gleb Natapov	726b79ea91	messaging_service: enable internode_compression option Use LZ4 for internode compression if enabled. Message-Id: <20160711141734.GZ18455@scylladb.com>	2016-07-11 18:30:21 +03:00
Paweł Dziepak	32a5de7a1f	db: handle receiving fragmented mutations If mutations are fragmented during streaming, special care must be taken so that isolation guarantees are not broken. Mutations received with flag "fragmented" set are applied to a memtable that is used only by that particular streaming task and the sstables created by flushing such memtables are not made visible until the task is complete. Also, in case the streaming fails all data is dropped. This means that fragmented mutations cannot benefit from coalescing of writes from multiple streaming plans, hence separate way of handling them so that there is no loss of performance for small partitions. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-07 12:18:35 +01:00
Asias He	b36d3be5d4	messaging_service: Fix messaging_service::stop There are two problems: 1. _server_tls is not stopped 2. _server and _server_tls might not be created if messaging_service::start_listen is not called yet.	2016-06-08 11:13:36 +08:00
Asias He	f7d25e6bae	messaging_service: Handle _server is not created in foreach_server_connection_stats It is possible _server is not created yet when foreach_server_connection_stats is called. Handle this case.	2016-06-08 11:13:35 +08:00
Vlad Zolotarov	4c17a422e0	cql3: instrument a SELECT query to send tracing info Instrument a coordinator of a SELECT query to send tracing session info to the corresponding replica Nodes. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-06-01 20:17:25 +03:00
Asias He	f27e5d2a68	messaging_service: Delay listening ms during boot up When a node starts up, peer node can send gossip syn message to it before the gossip message handlers are registered in messaging_service. We can see: scylla[123]: [shard 0] rpc - client a.b.c.d: unknown verb exception 6 ignored To fix, we delay the listening of messaging_service to the point when gossip message handlers are registered. Message-Id: <9b20d85e199ef0e44cdcde2920123a301a88f3d7.1464254400.git.asias@scylladb.com>	2016-05-31 12:28:11 +03:00
Avi Kivity	3f6ecb9f28	Merge "cancel cross DC read repair if non matching data was recently modified" from Gleb	2016-05-29 15:58:55 +03:00
Gleb Natapov	32c9a06faf	messaging_service: abort retrying send during exit Fixes #862 Message-Id: <1463579574-15789-3-git-send-email-gleb@scylladb.com>	2016-05-29 11:39:36 +03:00
Gleb Natapov	12cf60c302	messaging_service: add timestamp of last modification to READ_DIGEST verb return value	2016-05-24 13:27:34 +03:00
Calle Wilund	58f7edb04f	messaging_service: Change tls init to use credentials_builder To simplify init of msg service, use credentials_builder to encapsulate tls options so actual credentials can be more easily created in each shard. Message-Id: <1462283265-27051-2-git-send-email-calle@scylladb.com>	2016-05-09 14:12:53 +03:00
Duarte Nunes	dada385826	rpc: Secure connection attempts can be cancelled This patch adds support for secure connection attempts to be cancellable. Fixes #862 Includes seastar upstream merge: * seastar f1a3520...7782ad4 (1): > Merge "rpc: Allow client connections to be cancelled" from Duarte Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <1462783335-10731-1-git-send-email-duarte@scylladb.com>	2016-05-09 11:44:53 +03:00
Calle Wilund	d8ea85cd90	messaging_service: Add logging to match origin To announce rpc port + ssl if on. Message-Id: <1462368016-32394-1-git-send-email-calle@scylladb.com>	2016-05-05 10:26:01 +03:00
Pekka Enberg	38a54df863	Fix pre-ScyllaDB copyright statements People keep tripping over the old copyrights and copy-pasting them to new files. Search and replace "Cloudius Systems" with "ScyllaDB". Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>	2016-04-08 08:12:47 +03:00
Gleb Natapov	1e6352e398	messaging: do not admit new requests during messaging service shutdown. Sending a message may open new client connection which will never be closed in case messaging service is shutting down already. Fixes #1059 Message-Id: <1458639452-29388-3-git-send-email-gleb@scylladb.com>	2016-03-22 13:00:18 +02:00
Gleb Natapov	357c91a076	messaging: do not delete client during messaging service shutdown Messaging service stop() method calls stop() on all clients. If remove_rpc_client_one() is called while those stops are running client::stop() will be called twice, which is not supposed to happen. Fix it by ignoring client remove request during messaging service shutdown. Fixes #1059 Message-Id: <1458639452-29388-2-git-send-email-gleb@scylladb.com>	2016-03-22 13:00:18 +02:00
Asias He	b8abd88841	messaging_service: Take reference of ms in send_message_timeout_and_retry Take a reference of messaging_service object inside send_message_timeout_and_retry to make sure it is not freed during the life time of send_message_timeout_and_retry operation.	2016-03-22 12:32:19 +02:00
Gleb Natapov	e228ef1bd9	messaging: enable keepalive tcp option for inter-node communication Some network equipment that does TCP session tracking tend to drop TCP sessions after a period of inactivity. Use keepalive mechanism to prevent this from happening for our inter-node communication. Message-Id: <20160314173344.GI31837@scylladb.com>	2016-03-14 19:39:39 +02:00
Pekka Enberg	16f947dcb3	message/messaging_service: Remove init_messaging_service() declaration The function no longer exists so drop the function declaration. Message-Id: <1457694134-25600-1-git-send-email-penberg@scylladb.com>	2016-03-14 13:54:53 +02:00
Asias He	bcdd3dbb3e	messaging_service: Add missed throw It is missed somehow. Message-Id: <1457684884-4776-1-git-send-email-asias@scylladb.com>	2016-03-11 11:01:24 +02:00
Asias He	bf3507d093	messaging_service: Stop retrying if node is removed from gossip - Start a node - Inject data - Start another node to bootstrap - Before the second node finishes streaming, kill the second node - After a while the node will be removed from the cluster because it does not manage to join the cluster. - At this time, messaging_service might keep retrying the stream_mutations unnecessarily. To fix, check if the peer node is still a known node in the gossip.	2016-03-09 07:35:20 +08:00
Gleb Natapov	2d092bbd32	storage_proxy: send read requests with timeout No need to wait for replies long after request is timed out. Message-Id: <1457351304-28721-2-git-send-email-gleb@scylladb.com>	2016-03-07 14:00:11 +01:00
Paweł Dziepak	b92f8a6d2b	messaging_service: add SCHEMA_CHECK verb SCHEMA_CHECK is used to get node schema version. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-03-02 12:49:54 +00:00
Asias He	697b16414a	gossip: Make gossip message handling async In each gossip round, i.e., gossiper::run(), we do: 1) send syn message 2) peer node: receive syn message, send back ack message 3) process ack message in handle_ack_msg apply_state_locally mark_alive send_gossip_echo handle_major_state_change on_restart mark_alive send_gossip_echo mark_dead on_dead on_join apply_new_states do_on_change_notifications on_change 4) send back ack2 message 5) peer node: process ack2 message apply_state_locally At the moment, syn is "wait" message, it times out in 3 seconds. In step 3, all the registered gossip callbacks are called which might take significant amount of time to complete. In order to reduce the gossip round latency, we make syn "no-wait" and do not run the handle_ack_msg inside the gossip::run(). As a result, we will not get an ack message as the return value of a syn message any more, so a GOSSIP_DIGEST_ACK message verb is introduced. With this patch, the gossip message exchange is now async. It is useful when some nodes are down in the cluster. We will not delay the gossip round, which is supposed to run every second, 3*n seconds (n = 1-3, since it talks to 1-3 peer nodes in each gossip round) or even longer (considering the time to run gossip callbacks). Later, we can make talking to the 1-3 peer nodes in parallel to reduce latency even more. Refs: #900	2016-02-24 19:33:39 +08:00


224 Commits