scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-24 10:30:38 +00:00

Author	SHA1	Message	Date
Asias He	53f95abd96	virtio: Fix feature setup This fixes a big tcp_server rx regression. Before: ========== rxrx ============ Server: 192.168.66.123:10000 Connections: 100 Bytes Sent(MiB): 10000 Total Time(Secs): 85.074086675 --->> big regression!!! Bandwidth(MiB/Sec): 117.54460601148733 After: ========== rxrx ============ Server: 192.168.66.123:10000 Connections: 100 Bytes Sent(MiB): 10000 Total Time(Secs): 9.905637754 Bandwidth(MiB/Sec): 1009.5261151622362	2014-12-10 11:01:54 +02:00
Avi Kivity	b87a76412c	packet: avoid hand-rolled deleter chaining, use deleter::append instead The hand-rolled deleter chaining in packet::append was invalidated by the make_free_deleter() optimization, since deleter->_next is no longer guaranteed to be valid (and deleter::operator->() is still exposed, despite that). Switch to deleter::append(), which does the right thing. Fixes a memory leak in tcp_server.	2014-12-09 20:37:17 +02:00
Gleb Natapov	8bb82512a1	net: enable RSS for V4 IP/UDP/TCP	2014-12-09 18:55:19 +02:00
Gleb Natapov	73f6d943e1	net: separate device initialization from queues initialization This patch adds new class distributed_device which is responsible for initializing HW device and it is shared between all cpus. Old device class responsibility becomes managing rx/tx queue pair and it is local per cpu. Each cpu has to call distributed_device::init_local_queue() to create its own device. The logic to distribute cpus between available queues (in case there are not enough queues for each cpu) is in the distributed_device currently and not really implemented yet, so only one queue or queues == cpus scenarios are supported currently, but this can be fixed later. The plan is to rename "distributed_device" to "device" and "device" to "queue_pair" in later patches.	2014-12-09 18:55:14 +02:00
Gleb Natapov	2fb3dc03f6	net: remove unused opts parameter from proxy_net_device constructor	2014-12-09 18:55:05 +02:00
Asias He	9a9297c89d	ip: Implement fragment timeout and memory usage limit	2014-12-09 09:59:44 +02:00
Asias He	89c8c6148f	net: Add packet::memory Add packet::memory() which estimates the memory load (by adding sizeof packet::impl). Note it will only be accurate after linearize/compact.	2014-12-09 09:59:44 +02:00
Asias He	c03e356873	net: Improve packet::linearize Free the original memory earlier if copied all of them.	2014-12-09 09:59:43 +02:00
Nadav Har'El	3f2ea82e6d	dpdk: rx checksum offloading If the card supports this (and usually, it does), enable rx checksum offloading by the card, and avoid calculating the checksums ourselves. With rx checksum offloading, the card checks in incoming packets the IP header checksum and the L4 (TCP or UDP) checksum, and gives us a flag when one of them is wrong, meaning that we do not need to do these calculations ourselves. This patch improves memcached performance on my setup by almost 3%. Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>	2014-12-08 20:41:31 +02:00
Avi Kivity	f4d7bd7e00	reactor: register pollers using a RAII class Avoids leaking a poller.	2014-12-07 17:36:44 +02:00
Vlad Zolotarov	5bc89b974a	dpdk: First proper offload features initialization - Query the port for its caps. - Properly adjust the queue numbers according to the caps. - Enable RSS only if the final queues number is greater than 1. - Enable Rx VLAN stripping. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2014-12-07 17:32:36 +02:00
Vlad Zolotarov	5cc8785b96	packet: Added HW VLAN stripping option. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2014-12-07 17:32:36 +02:00
Vlad Zolotarov	2d10018870	dpdk: separate the EAL initialization from port initialization - Create a new class dpdk_eal that initializes DPDK EAL. - Get rid of portmask crap and provide a port index to a dpdk::net_device constructor. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2014-12-07 17:31:12 +02:00
Avi Kivity	a2016bc1dd	ip: fix smp fragment reassembly ipv4::handle_on_cpu() did not properly convert from network byte order, so it saw any packets with DF=1 as fragmented. Fix by applying the proper conversion.	2014-12-07 12:01:31 +02:00
Avi Kivity	2ee0239a4a	Merge branch 'tgrabiec/zero-copy-2' of github.com:cloudius-systems/seastar-dev Zero-copy memcached get from Tomasz: "I've measured memcached on muninn/huginn to be 7.5% better with this on vhost stack."	2014-12-04 16:31:04 +02:00
Tomasz Grabiec	c4335c49f6	core: convert output APIs to work on packets This way zero-copy supporting code can put data directly to packet object and pass it through all layers efficiently.	2014-12-04 13:51:26 +01:00
Tomasz Grabiec	72b0794759	packet: add constructor for appending temporary_buffers	2014-12-04 13:37:35 +01:00
Tomasz Grabiec	3a2d74e3d3	packet: add reserve() method	2014-12-04 13:37:35 +01:00
Tomasz Grabiec	f3dada6f1d	packet: add constructor for appending deleters Deleters not always come with fragments. When multiple fragments share a deleter, first fragments are appended and then one deleter for all of them.	2014-12-04 13:37:35 +01:00
Tomasz Grabiec	8ffcdac455	packet: move lambdas rather than copy them Some lambdas are not copyable.	2014-12-04 13:37:35 +01:00
Tomasz Grabiec	2650c68824	packet: add more constructor variants	2014-12-04 13:37:35 +01:00
Avi Kivity	3e4842a2a1	Merge branch 'asias/ip' of github.com:cloudius-systems/seastar-dev IP fragment reassembly from Asias.	2014-12-03 16:03:18 +02:00
Asias He	59aa280f0d	ip: Add IPv4 reassembly support If a TCP or UDP IP datagram is fragmented, only the first fragment will contain the port information. When a fragment without port information is received, we have no idea which "stream" this fragment belongs to, thus we have no idea how to forward this packet. To solve this problem, we use "forward twice" method. When IP datagram which needs fragmentation is received, we forward it using the frag_id(src_ip, dst_ip, identification, protocol) hash. When all the fragments are received, we forward it using the connection_id(src_ip, src_port, dst_ip, dst_port) hash.	2014-12-03 21:40:49 +08:00
Gleb Natapov	4d3b6497ea	reactor: rework poll infrastructure Move idle state management out from smp poller back to generic code. Each poller returns if it did any useful work and generic code decided if it should go idle based on that. If a poller requires constant polling it should always return true.	2014-12-03 14:37:33 +02:00
Tomasz Grabiec	f556172619	temporary_buffer: make an empty buffer not need malloc()	2014-12-03 13:15:09 +01:00
Tomasz Grabiec	76a8908b21	virtio: fix indentation	2014-12-03 13:15:09 +01:00
Asias He	2702af5e7d	net: Add helper packet_merger This can be used for both TCP out-of-order and IP fragmentation merging.	2014-12-03 17:47:30 +08:00
Asias He	8335787268	net: Expose interface::forward This can be used with ipv4 fragmentation.	2014-12-03 17:47:29 +08:00
Asias He	7ca33fdd72	ip: Add helper for fragmentation	2014-12-03 17:47:29 +08:00
Gleb Natapov	7dbc333da6	core: Allow forwarding from/to any cpu	2014-12-03 17:47:29 +08:00
Gleb Natapov	bf46f9c948	net: Change how networking devices are created Currently each cpu creates network device as part of native networking stack creation and all cpus create native networking stack independently, which makes it impossible to use data initialized by one cpu in another cpu's networking device initialization. For multiqueue devices often some parts of an initialization have to be handled by one cpu and all other cpus should wait for the first one before creating their network devices. Even without multiqueue proxy devices should be created after master device is created so that proxy device may get a pointer to the master at creation time (existing code uses global per cpu device pointer and assume that master device is created on cpu 0 to compensate for the lack of ordering). This patch makes it possible to delay native networking stack creation until network device is created. It allows one cpu to be responsible for creation of network devices on multiple cpus. Single queue device initialize master device on one cpu and call other cpus with a pointer to master device and its cpu id which are used in proxy device creation. This removes the need for per cpu device pointer and "master on cpu 0" assumption from the code since now master device and slave devices know about each other and can communicate directly.	2014-11-30 18:10:08 +02:00
Vlad Zolotarov	12caa3afe4	net: add option to use a dpdk PMD networking backend - Added "dpdk-pmd" option: - Defaulted to FALSE. - When TRUE - use DPDK PMD drivers. - Call for dpdk net_device creation function if dpdk-poll option is given - Added DPDK networking backend options to all options list Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2014-11-30 12:14:56 +02:00
Vlad Zolotarov	5cd984b5cc	dpdk: Initial commit - Currently only a single port and a single queue are supported. - All DPDK EAL configuration is hard-coded in the dpdk_net_device constructor instead of coming from the app parameters. - No offload features are enabled. - Tx: will spin in the dpdk_net_device::send() till there is a place in the HW ring to place a current packet. - Tx: copy data from the `packet` frags into the rte_mbuf's data. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2014-11-30 12:13:52 +02:00
Asias He	88a1a37a88	ip: Support IP fragmentation in TX path Tested with UDP sending large datagrams with ufo off.	2014-11-30 10:16:38 +02:00
Glauber Costa	b3c163e603	xen: fix typo in event channel detection Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2014-11-28 14:15:36 +01:00
Glauber Costa	c3ae30b760	xen: delete event channel as well If we don't have split channels, we need to delete the relevant property. because xs_rm() returns true if the feature does not exist, it won't affect the transaction if we just delete all of them. Therefore we don't need to do any conditional test. Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2014-11-27 18:00:35 +01:00
Glauber Costa	3848130f2f	xen: only add features to feature array We are adding everything we read into the features array. Because in the destructor we will remove everything in the features list, we'll end up removing more than we should. Things like the mac address, handle, etc, should never be deleted. This is not a problem for OSv because usually, after the destructor is called, the whole guest is down. But for userspace, the network card is left there, but will cease to work if we delete too much. After we do that with the _features array - its original intent, it becomes redundant with features nack. Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2014-11-27 18:00:35 +01:00
Glauber Costa	bd8a18c178	xen: unmask event channels when setup is ready This is not required for OSv, but is required for userspace operation. It won't work without it. Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2014-11-27 18:00:35 +01:00
Avi Kivity	861957e5ba	Merge branch 'glommer/xen' of github.com:cloudius-systems/seastar-dev Glauber says: "This patch yields a small performance boost. It is not complete, since the rest of the performance work is still missing since half of that is in OSv. But more importantly, it now works on AWS."	2014-11-26 18:30:26 +02:00
Glauber Costa	b56a89d5c9	xen: translate feature name When the backend advertises "feature-rx-copy", the frontend should register for "request-rx-copy". The local hypervisor seems to be forgiving about it, but the one in AWS, it is not, and doubly so. First, it doesn't recognize these as the same. And second, it refuses to connect the backend if this feature is not advertised by the frontend. Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2014-11-26 17:22:58 +01:00
Glauber Costa	a9a79e3ba6	xen: ring unification The ring processing is almost the same for both rx and tx, with the exception with the core of the action. We can actually unify them nicely with some use of template programming. Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2014-11-26 17:21:09 +01:00
Glauber Costa	e7c9aeb8a5	xen: interrupt mitigation There are two things we can do that will lead to less interrupts being sent. The first, is to read the new rsp_cons value at the end of every interaction. If the backend produces more frames in the mean time, we'll be able to process in the same round, without getting another interrupt. The other, is to set the rsp_event only after all the frames are processed. As a matter of fact, both the tx and rx rings did one of them, but not the same one. The next patch will unify the ring code to avoid problems like that in the future. Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2014-11-26 17:17:45 +01:00
Gleb Natapov	4f4731c37b	net: delay network stack creation Network device has to be available when network stack is created, but sometimes network device creation should wait for device initialization by another cpu. This patch makes it possible to delay network stack creation until network device is available.	2014-11-26 16:46:04 +02:00
Avi Kivity	87fdf52205	Merge branch 'clang'	2014-11-26 15:01:14 +02:00
Avi Kivity	e8894227bc	xen: declare nr_ents higher to satisfy clang	2014-11-26 15:00:13 +02:00
Avi Kivity	8ce9697401	dhcp: wrap initializers with braces to prevent ambiguity	2014-11-26 14:59:49 +02:00
Asias He	1a1ff2a22a	tcp: Fix get_isn It should be microseconds instead of milliseconds. Signed-off-by: Asias He <asias@cloudius-systems.com>	2014-11-26 13:26:54 +02:00
Asias He	fecf47b50a	tcp: Defending against sequence number attacks This patch implements initial sequence number generation algorithm per RFC6528.	2014-11-26 12:34:16 +02:00
Gleb Natapov	cee8eb3121	net: remove unused function from net/native-stack.hh	2014-11-26 12:19:47 +02:00
Avi Kivity	9eea1752b0	Merge branch 'asias/tcp' of github.com:cloudius-systems/seastar-dev TCP improvements from Asias.	2014-11-25 11:58:47 +02:00
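
The "forward twice" scheme described in commit 59aa280f0d can be sketched roughly as follows. This is only an illustration under stated assumptions: the struct layouts, the mixing function, and the names `hash_of`, `cpu_for_fragment` and `cpu_for_connection` are hypothetical and are not taken from the Seastar source. The point it shows is that every fragment of one datagram carries the same (src_ip, dst_ip, identification, protocol) tuple, so hashing that tuple sends all fragments to one cpu for reassembly; only after reassembly are the ports known, so the connection hash can pick the final cpu.

```cpp
#include <cstdint>

// Hypothetical, simplified key types for illustration only.
struct frag_id {
    uint32_t src_ip;
    uint32_t dst_ip;
    uint16_t identification;
    uint8_t protocol;
};

struct connection_id {
    uint32_t src_ip;
    uint16_t src_port;
    uint32_t dst_ip;
    uint16_t dst_port;
};

// Assumed mixing step; any reasonable hash combiner would serve here.
inline uint64_t mix(uint64_t seed, uint64_t v) {
    return seed ^ (v + 0x9e3779b97f4a7c15ULL + (seed << 6) + (seed >> 2));
}

inline uint64_t hash_of(const frag_id& f) {
    uint64_t h = 0;
    h = mix(h, f.src_ip);
    h = mix(h, f.dst_ip);
    h = mix(h, f.identification);
    h = mix(h, f.protocol);
    return h;
}

inline uint64_t hash_of(const connection_id& c) {
    uint64_t h = 0;
    h = mix(h, c.src_ip);
    h = mix(h, c.src_port);
    h = mix(h, c.dst_ip);
    h = mix(h, c.dst_port);
    return h;
}

// First forward: fragments lack port information, so choose the
// reassembling cpu from fields present in every fragment's IP header.
inline unsigned cpu_for_fragment(const frag_id& f, unsigned ncpus) {
    return static_cast<unsigned>(hash_of(f) % ncpus);
}

// Second forward: once the datagram is reassembled the ports are known,
// so choose the cpu that owns the connection.
inline unsigned cpu_for_connection(const connection_id& c, unsigned ncpus) {
    return static_cast<unsigned>(hash_of(c) % ncpus);
}
```

Because both functions are pure functions of their key, any cpu that receives a fragment computes the same reassembly target, and the cpu that completes reassembly computes the same owner that an unfragmented packet of that connection would reach, provided the unfragmented path uses the same connection hash.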

336 Commits