scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-21 00:50:35 +00:00

Author	SHA1	Message	Date
Tomasz Grabiec	e88f41fb3f	messaging_service: Move REPAIR_CHECKSUM_RANGE verb out of the streaming verbs group Message-Id: <1452620321-17223-1-git-send-email-tgrabiec@scylladb.com>	2016-01-12 20:17:08 +02:00
Vlad Zolotarov	9232ad927f	messaging_service::get_rpc_client(): fix the encryption logic According to specification (here https://wiki.apache.org/cassandra/InternodeEncryption) when the internode encryption is set to `dc` the data passed between DCs should be encrypted and similarly, when it's set to `rack` the inter-rack traffic should be encrypted. Currently Scylla would encrypt the traffic inside a local DC in the first case and inside the local RACK in the latter one. This patch fixes the encryption logic to follow the specification above. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> Message-Id: <1452501794-23232-1-git-send-email-vladz@cloudius-systems.com>	2016-01-12 16:22:26 +02:00
Tomasz Grabiec	e1e8858ed1	service: Fetch and sync schema	2016-01-11 10:34:53 +01:00
Tomasz Grabiec	cdca20775f	messaging_service: Introduce get_source()	2016-01-11 10:34:53 +01:00
Tomasz Grabiec	da3a453003	service: Add GET_SCHEMA_VERSION remote call The verb belongs to a separate client to avoid potential deadlocks should the throttling on connection level be introduced in the future. Another reason is to reduce latency for version requests as it can potentially block many requests.	2016-01-11 10:34:52 +01:00
Asias He	2345cda42f	messaging_service: Rename shard_id to msg_addr Using shard_id as the destination of the messaging_service is confusing, since shard_id is used in the context of cpu id. Message-Id: <8c9ef193dc000ef06f8879e6a01df65cf24635d8.1452155241.git.asias@scylladb.com>	2016-01-07 10:36:35 +02:00
Nadav Har'El	f5b2135a80	repair: repair_checksum_range message This patch adds a new type of message, "REPAIR_CHECKSUM_RANGE" to scylla's "messaging_service" RPC mechanism, for the use of repair: With this message the repair's master host tells a slave host to calculate the checksum of a column-family's partitions in a given token range, and return that checksum. The implementation of this message uses the checksum_range() function defined in the previous patch. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2016-01-05 15:38:40 +02:00
Gleb Natapov	fae98f5d67	Revert "messaging_service: wait for outstanding requests" This reverts commit `9661d8936b`. Message-Id: <1450690729-22551-3-git-send-email-gleb@scylladb.com>	2016-01-03 16:06:39 +02:00
Gleb Natapov	de0771f1d1	Revert "messaging_service: restore indentation" This reverts commit `dcbba2303e`. Message-Id: <1450690729-22551-2-git-send-email-gleb@scylladb.com>	2016-01-03 16:06:38 +02:00
Asias He	1b3d2dee8f	streaming: Drop src_cpu_id parameter Now that we can get the src_cpu_id from rpc::client_info, there is no need to pass it as a verb parameter.	2015-12-31 11:25:09 +01:00
Asias He	3ae21e06b5	messaging_service: Add src_cpu_id to CLIENT_ID verb It is useful to figure out which shard to send messages back to the sender.	2015-12-31 11:25:09 +01:00
Asias He	22d0525bc0	streaming: Get rid of the _from_ parameter Get this from cinfo.retrieve_auxiliary inside the rpc handler.	2015-12-31 11:25:08 +01:00
Asias He	89b79d44de	streaming: Get rid of the _connecting_ parameter messaging_service will use private ip address automatically to connect a peer node if possible. There is no need for the upper level like streaming to worry about it. Dropping it simplifies things a bit.	2015-12-31 11:25:08 +01:00
Gleb Natapov	2bcfe02ee6	messaging: remove unused verbs	2015-12-30 15:06:35 +01:00
Gleb Natapov	f0e8b8805c	messaging: constify some handlers	2015-12-30 15:06:35 +01:00
Calle Wilund	d1badfa108	messaging_service: Optionally create SSL endpoints * Accept port + credentials + option for what to encrypt * If set, enable a SSL listener at ssl_port * Check outgoing connections by IP to determine if they should go to SSL/normal endpoint Requires seastar RPC patch Note: currently, the connections created by messaging service do _not_ do certificate name verification. While DNS lookup is probably not that expensive here, I am not 100% sure it is the desired behaviour. Normal trust is however verified.	2015-12-28 10:10:35 +00:00
Avi Kivity	827a4d0010	Merge "streaming: Invalidate cache upon receiving of stream" from Asias "When a node gains or regains responsibility for certain token ranges, streaming will be performed; upon receiving the stream data, the row cache is invalidated for that range. Refs #484."	2015-12-28 10:24:46 +02:00
Avi Kivity	2b22772e3c	Merge "Introduce keep alive timer for stream_session" from Asias "Fixes stream_session hangs: 1) if the sending node is gone, the receiving peer will wait forever 2) if the node which should send COMPLETE_MESSAGE to the peer node is gone, the peer node will wait forever"	2015-12-27 16:56:32 +02:00
Avi Kivity	f3980f1fad	Merge seastar upstream * seastar 51154f7...8b2171e (9): > memcached: avoid a collision of an expiration with time_point(-1). > tutorial: minor spelling corrections etc. > tutorial: expand semaphores section > Merge "Use steady_clock where monotonic clock is required" from Vlad > Merge "TLS fixes + RPC adaption" from Calle > do_with() optimization > tutorial: explain limiting parallelism using semaphores > submit_io: change pending flushes criteria > apps: remove defunct apps/seastar Adjust code to use steady_clock instead of high_resolution_clock.	2015-12-27 14:40:20 +02:00
Asias He	f527e07be6	streaming: Get stream_session in STREAM_MUTATION handler Get from address from cinfo. It is needed to figure out which stream session this mutation belongs to, since we need to update the keep alive timer for this stream session.	2015-12-24 20:34:44 +08:00
Asias He	bd276fd087	streaming: Increase retry timeout Currently, if the node is actually down, although the streaming_timeout is 10 seconds, the sending of the verb will return rpc_closed error immediately, so we give up in 20 * 5 = 100 seconds. After this change, we give up in 10 * 30 = 300 seconds at least, and 10 * (30 + 30) = 600 seconds at most.	2015-12-24 20:34:44 +08:00
Asias He	eaea09ee71	streaming: Retransmit COMPLETE_MESSAGE message It is oneway message at the moment. If a COMPLETE_MESSAGE is lost, no one will close the session. The first step to fix the issue is to try to retransmit the message.	2015-12-24 20:34:44 +08:00
Asias He	2d32195c32	streaming: Invalidate cache upon receiving of stream When a node gains or regains responsibility for certain token ranges, streaming will be performed; upon receiving the stream data, the row cache is invalidated for that range. Refs #484.	2015-12-21 14:44:13 +08:00
Paweł Dziepak	dcbba2303e	messaging_service: restore indentation Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-12-17 14:06:41 +01:00
Paweł Dziepak	9661d8936b	messaging_service: wait for outstanding requests Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-12-17 14:06:41 +01:00
Avi Kivity	b34a1f6a84	Merge "Preliminary changes for handling of schema changes" from Tomasz "I extracted some less controversial changes on which the schema changes series will depend to somewhat reduce the noise in the main series."	2015-12-16 19:08:22 +02:00
Tomasz Grabiec	872bfadb3d	messaging_service: Remove unused parameters from send_migration_request()	2015-12-16 18:06:54 +01:00
Avi Kivity	e27a5d97f6	Merge "background mutation throttling" from Gleb Fixes the case where background activity needed to complete CL=ONE writes is queued up in the storage proxy, and the client adds new work faster than it can be cleared.	2015-12-16 18:08:12 +02:00
Gleb Natapov	de63b3a824	storage_proxy: provide timeout for send_mutation verb Providing timeout for send_mutation verb allows rpc to drop packets that sit in outgoing queue for too long.	2015-12-16 10:13:46 +02:00
Nadav Har'El	63c0906b16	messaging_service: drop unnecessary explicit templates The previous patch added message_service read()/write() support for all types which know how to serialize themselves through our "old" serialization API (serialize()/deserialize()/serialized_size()). So we no longer need the almost 200 lines of repetitive code in messaging_service.{cc,hh} which defined these read/write templates separately for a dozen different types using their *serialize() methods. We also no longer need the helper functions read_gms()/write_gms(), which are basically the same code as that in the template functions added in the previous patch. Compilation is not significantly slowed down by this patch, because it merely replaces a dozen templates by one template that covers them all - it does not add new template complexity, and these templates are anyway instantiated only in messaging_service.cc (other code only calls specific functions defined in messaging_service.cc, and does not use these templates). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2015-12-15 19:07:05 +02:00
Nadav Har'El	438f6b79f7	messaging_service: allow any self-serializing type Currently, messaging_service only supports sending types for which a read/write function has been explicitly implemented in messaging_service.hh/cc. Some types already have serialization/deserialization methods inside them, and those could have been used for the serialization without having to write new functions for each of these types. Many of these types were already supported explicitly in messaging_service.{cc,hh}, but some were forgotten - for example, dht::token. So this patch adds a default implementation of messaging_service write()/read() which will work for any type which has these serialization methods. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2015-12-15 19:07:05 +02:00
Asias He	66938ac129	streaming: Add retransmit logic for streaming verbs Retransmit streaming related verbs and give up in 5 minutes. Tested with: lein test :only cassandra.batch-test/batch-halves-decommission Fixes #568.	2015-12-09 15:12:36 +02:00
Tomasz Grabiec	d64db98943	query: Convert serialization of query::result to use db::serializer<> That's what we're trying to standardize on. This patch also fixes an issue with current query::result::serialize() not being const-qualified, because it modifies the buffer. messaging_service did a const cast to work this around, which is not safe.	2015-12-03 09:19:11 +01:00
Gleb Natapov	8c02ad0e9e	messaging: log connection dropping event	2015-11-30 19:42:04 +02:00
Gleb Natapov	33e5097090	messaging: do not kill live connection needlessly Messaging service closes connection in rpc call continuation on closed_error, but the code runs for each outstanding rpc call on the connection, so first continuation may destroy genuinely closed connection, then connection is reopened and next continuation that handles previous error kills now perfectly healthy connection. Fix this by closing connection only in error state.	2015-11-23 20:16:28 +02:00
Gleb Natapov	eb220507ce	storage_proxy: use correct endpoint address for mutation acks processing Write handler keeps track of all endpoints that have not yet acked mutation verb. It uses broadcast address as an endpoint id, but if local address is different from broadcast address for local endpoints acknowledgements will come from different address, so socket address cannot be used as an acknowledgement source. Origin solves this by sending "from" in each message, it looks like an overhead, solve this by providing endpoint's broadcast address in rpc client_info and use that instead.	2015-11-16 10:29:47 +01:00
Amnon Heiman	d5d0653210	messaging_service: Add a function that goes over all the server stats The API needs to get the stats from the rpc server, that is hidden from the messaging service API. This patch adds a foreach function that goes over all the server stats without exposing the server implementation. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2015-11-02 16:15:52 +02:00
Asias He	2c8867c348	config: Enable storage_port option	2015-10-29 08:58:41 +08:00
Vlad Zolotarov	d8de1099eb	message::messaging_service: introduce _preferred_ip_cache This map will contain the (internal) IPs corresponding to specific Nodes. The mapping is also stored in the system.peers table. So, instead of always connecting to external IP messaging_service::get_rpc_client() will query _preferred_ip_cache and only if there is no entry for a given Node will connect to the external IP. We will call for init_local_preferred_ip_cache() at the end of system table init. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> New in v2: - Improved the _preferred_ip_cache description. - Code styling issues. New in v3: - Make get_internal_ip() public. - get_rpc_client(): return a get_preferred_ip() usage dropped in v2 by mistake during rebase.	2015-10-26 14:09:26 +02:00
Vlad Zolotarov	f896f9a908	message::messaging_service: added remove_rpc_client(shard_id) This function erases shard_info objects from all _clients maps. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> New in v2: - Use remove_rpc_client_one() instead of direct map::erase().	2015-10-26 14:09:26 +02:00
Vlad Zolotarov	e9789dd68c	message::messaging_service: fixes in rpc_protocol_client_wrapper shut down - Ensure messaging_service::stop() blocks until all rpc_protocol::client::stop() are over. - Remove the async code from rpc_protocol_client_wrapper destructor - call for stop() everywhere it's needed instead. Ensure that rpc_protocol_client_wrapper is always "stopped" when its destructor is called. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> New in v3: - Code style fixes. - Killed rpc_protocol_client_wrapper::_stopped. - Killed rpc_protocol_client_wrapper::~rpc_protocol_client_wrapper(). - Use std::move() for saving shared pointer before erasing the entry from _clients in remove_rpc_client_one() in order to avoid extra ref count bumping.	2015-10-26 14:09:26 +02:00
Vlad Zolotarov	842b13325d	message::messaging_service: make _clients to be std::array This makes code cleaner. Also it would allow less changes if we decide to increase _clients size in the future. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2015-10-26 14:09:26 +02:00
Asias He	1965e8751b	messaging_service: Add REPLICATION_FINISHED verb It is used to send replication finished message by storage_service when removing a node from a cluster.	2015-10-21 16:11:33 +08:00
Tomasz Grabiec	19d7d30e67	Replace references to 'urchin' with 'scylla'	2015-10-19 11:08:05 +03:00
Calle Wilund	37131fcc05	messaging_service: TRUNCATE verb methods	2015-09-30 09:09:42 +02:00
Gleb Natapov	140641689b	messaging: do not use rpc client in error state Using rpc client in error state will result in a message loss. Try to reconnect instead.	2015-09-24 17:50:51 +02:00
Avi Kivity	d5cf0fb2b1	Add license notices	2015-09-20 10:43:39 +03:00
Asias He	eead846712	messaging_service: Make gossip use standalone tcp connection For unknown reasons, I saw gossip syn message got rpc timeout errors when the cluster is under heavy cassandra-stress stress. Using a standalone tcp connection seems to fix the issue.	2015-09-19 10:17:42 +03:00
Asias He	0f5df4476c	gossip: Make the timeout longer for gossip syn and echo message When the cluster is under heavy load, the time to exchange a gossip message might take longer than 1s. Let's make the timeout longer for now before we can solve the large delay of gossip message issue.	2015-09-17 11:35:31 +03:00
Asias He	1e7d883ae1	messaging_service: Fix shard_id We should ignore equal and less than operators for shard_id as well. Within a 3 nodes cluster, each node has 4 cpus, on first node Before: [fedora@ip-172-30-0-99 ~]$ netstat -nt|grep 100:7000 tcp 0 0 172.30.0.99:36998 172.30.0.100:7000 ESTABLISHED tcp 0 0 172.30.0.99:36772 172.30.0.100:7000 ESTABLISHED tcp 0 0 172.30.0.99:40125 172.30.0.100:7000 ESTABLISHED tcp 0 0 172.30.0.99:60182 172.30.0.100:7000 ESTABLISHED tcp 0 0 172.30.0.99:38013 172.30.0.100:7000 ESTABLISHED tcp 0 0 172.30.0.99:51997 172.30.0.100:7000 ESTABLISHED tcp 0 0 172.30.0.99:56532 172.30.0.100:7000 ESTABLISHED After: [fedora@ip-172-30-0-99 ~]$ netstat -nt|grep 100:7000 tcp 0 0 172.30.0.99:45661 172.30.0.100:7000 ESTABLISHED tcp 0 0 172.30.0.99:57395 172.30.0.100:7000 ESTABLISHED tcp 0 0 172.30.0.99:37807 172.30.0.100:7000 ESTABLISHED tcp 0 36 172.30.0.99:50567 172.30.0.100:7000 ESTABLISHED Each shard of a node is supposed to have 1 connection to a peer node, thus each node will have #cpu connections to a peer node. With this patch, the cluster is much more stable than before on AWS. So far, I see no timeout in the gossip syn message exchange.	2015-09-16 08:44:47 +02:00
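
The rule described in commit 9232ad927f (encrypt traffic that crosses a DC boundary when the mode is `dc`, and traffic that crosses a rack boundary when the mode is `rack`) can be sketched roughly as below. This is an illustration only, not the code from the patch: `encrypt_what`, `endpoint_location` and `must_encrypt` are hypothetical names, and treating racks in different DCs as different racks is an assumption made for this sketch.

```cpp
#include <string>

// Hypothetical names for illustration; these are not the actual Scylla types.
enum class encrypt_what { none, all, dc, rack };

struct endpoint_location {
    std::string dc;
    std::string rack;
};

// Returns true when traffic from `local` to `peer` should be encrypted.
// Traffic that stays inside the local DC (mode `dc`) or inside the local
// rack (mode `rack`) is left unencrypted; traffic that leaves it is encrypted.
bool must_encrypt(encrypt_what mode,
                  const endpoint_location& local,
                  const endpoint_location& peer) {
    switch (mode) {
    case encrypt_what::none:
        return false;
    case encrypt_what::all:
        return true;
    case encrypt_what::dc:
        // Encrypt only inter-DC traffic.
        return local.dc != peer.dc;
    case encrypt_what::rack:
        // Assumption: a rack in another DC always counts as a different rack.
        return local.dc != peer.dc || local.rack != peer.rack;
    }
    return false;
}
```

With mode `dc`, two nodes in the same DC but different racks would talk in the clear, while with mode `rack` the same pair would use an encrypted connection. The bug fixed by the commit corresponds to the two comparisons being inverted.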


137 Commits