scylladb

Author	SHA1	Message	Date
Gleb Natapov	13c1324d45	net: provide some statistics via collectd Provide batching and overall send/received packet stats.	2015-01-08 17:41:26 +02:00
Gleb Natapov	77bd21c387	net: implement bulk sending interface for proxy queue Take advantage of the bulk interface to send several packets simultaneously with one submit_to() to the remote cpu.	2015-01-06 15:24:10 +02:00
Gleb Natapov	12bce3f4fc	net: make interface get packets from l3 Instead of l3 (arp/ipv4) pushing packets into interface's queue, make them register functions that interface can use to ask l3 for packets.	2015-01-06 15:24:10 +02:00
Gleb Natapov	e5d0adb339	net: make qp poll for tx packets from networking stack Packets are accumulated in interface's packet queue. The queue is polled by qp to see if there is something to send.	2015-01-06 15:24:10 +02:00
Gleb Natapov	865d95c0f1	net: provide bulk sending interface for qp Implement it as calls to send() in a loop for now. Each device will get proper implementation later.	2015-01-06 15:24:10 +02:00
Gleb Natapov	6ad9114c0b	reactor: add at_destroy() function to the reactor and use it Unfortunately at_exit() cannot be used to delete objects, because when it runs the reactor is still active and a deleted object may still be in use. We need another API that runs its tasks after the reactor has already stopped; at_destroy() is such an API.	2014-12-30 15:21:10 +02:00
Gleb Natapov	a445b8174e	net: wait for link to be ready before creating network stack	2014-12-29 13:06:10 +02:00
Gleb Natapov	510171d083	net: add function to map packet's rss hash to a cpu Provide a function that maps a packet's rss hash to the cpu that should handle it. This function is needed to find an appropriate src port for outgoing tcp/udp connections. Also use it to forward defragmented ip packets, which avoids one extra hop.	2014-12-23 17:36:40 +02:00
Gleb Natapov	c8189157ed	net: use RSS hash key calculated by HW if available Some (all?) RSS-capable HW provides us with the hash that was used to select the rx queue the packet was delivered to. If such a hash is available, it is better to use it to forward the packet instead of calculating the hash ourselves and suffering cache misses.	2014-12-16 10:53:41 +02:00
Gleb Natapov	d8ddaeb104	net: forward reassembled ip packet to correct queue To figure out which cpu should handle a reassembled TCP packet, the RSS redirection table has to be consulted.	2014-12-16 10:53:41 +02:00
Gleb Natapov	fbef83beb0	net: support for num of cpus > num of queues This patch introduces logic to divide cpus between the available hw queue pairs. Each cpu with a hw qp gets a set of cpus to distribute traffic to. The algorithm doesn't take topology into account yet.	2014-12-16 10:53:41 +02:00
Gleb Natapov	7ac3ba901c	net: rework packet forwarding logic Instead of having forward() decide the packet's destination, make it collect input for the RSS hash function depending on the packet type. After the data is collected, use the Toeplitz hash function to calculate the packet's destination.	2014-12-16 10:53:41 +02:00
Gleb Natapov	649210b5b6	net: rename net::distributed_device to net::device	2014-12-11 13:06:32 +02:00
Gleb Natapov	0e70ba69cf	net: rename net::device to net::qp	2014-12-11 13:06:27 +02:00
Gleb Natapov	8ff89f7f01	net: remove unused device_placement struct	2014-12-11 13:06:22 +02:00
Nadav Har'El	3d874892a7	dpdk: enable transmit-side checksumming offload This patch uses the NIC's capability to calculate the IP, TCP and UDP checksums of outgoing packets in hardware, instead of doing this on the sending CPU. This can save us quite a bit of computation (especially for the TCP/UDP checksum of full-sized packets) and avoids cache pollution on the CPU when sending cold data. On my setup this patch improves the performance of a single-cpu memcached by 6%. Together with the recent patch for receive-side checksum offloading, the total improvement is 10%. This patch is somewhat complicated by the fact that we have so many different combinations of checksum-offloading capabilities: while virtio can only offload layer-4 checksumming (tcp/udp), dpdk lets us offload both ip and layer-4 checksums. Moreover, some packets are IP but not TCP/UDP (e.g., ICMP), and some packets are not even IP (e.g., ARP), so this patch modifies a few of the hardware-feature flags and the per-packet offload-information flags to fit our new needs. Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>	2014-12-10 18:05:02 +02:00
Gleb Natapov	73f6d943e1	net: separate device initialization from queues initialization This patch adds a new class, distributed_device, which is responsible for initializing the HW device and is shared between all cpus. The old device class becomes responsible for managing an rx/tx queue pair and is local to each cpu. Each cpu has to call distributed_device::init_local_queue() to create its own device. The logic to distribute cpus between the available queues (in case there are not enough queues for each cpu) currently lives in distributed_device and is not really implemented yet, so only the one-queue and queues == cpus scenarios are supported for now; this can be fixed later. The plan is to rename "distributed_device" to "device" and "device" to "queue_pair" in later patches.	2014-12-09 18:55:14 +02:00
Asias He	8335787268	net: Expose interface::forward This can be used with ipv4 fragmentation.	2014-12-03 17:47:29 +08:00
Gleb Natapov	7dbc333da6	core: Allow forwarding from/to any cpu	2014-12-03 17:47:29 +08:00
Gleb Natapov	bf46f9c948	net: Change how networking devices are created Currently each cpu creates its network device as part of native networking stack creation, and all cpus create their native networking stacks independently, which makes it impossible to use data initialized by one cpu in another cpu's networking device initialization. For multiqueue devices, some parts of the initialization often have to be handled by one cpu, and all other cpus should wait for the first one before creating their network devices. Even without multiqueue, proxy devices should be created after the master device is created, so that a proxy device can get a pointer to the master at creation time (existing code uses a global per-cpu device pointer and assumes that the master device is created on cpu 0 to compensate for the lack of ordering). This patch makes it possible to delay native networking stack creation until the network device is created. It allows one cpu to be responsible for creating network devices on multiple cpus. A single-queue device initializes the master device on one cpu and calls the other cpus with a pointer to the master device and its cpu id, which are used in proxy device creation. This removes the need for the per-cpu device pointer and the "master on cpu 0" assumption from the code, since the master device and slave devices now know about each other and can communicate directly.	2014-11-30 18:10:08 +02:00
Gleb Natapov	136a56859f	net: limit the number of packets that are waiting to be sent to another cpu If packets arrive faster than they can be forwarded, we can run out of memory.	2014-11-09 18:06:22 +02:00
Asias He	2a582fd1a6	net: fix tso maximum packet size With TSO enabled, we can see an Ethernet frame larger than 64K on the tap device, which Wireshark cannot handle. It complains: The capture file appears to be damaged or corrupt. (pcapng_read_packet_block: cap_len 65549 is larger than WTAP_MAX_PACKET_SIZE 65535.)	2014-11-06 14:50:11 +02:00
Avi Kivity	31078be7f7	net: initialize interface::_proto_map early If the driver starts pushing packets early, we need this field to be initialized so they can be properly ignored.	2014-11-04 10:54:44 +02:00
Asias He	2625dd5944	net: Introduce eth_protocol_num	2014-10-13 11:37:56 +08:00
Asias He	a111b2fa1e	net: TCP segment offload and UDP fragmentation offload support Only the tx path is added for now. The rx path will be enabled once the merge receive buffer feature is supported in virtio-net. UDP fragmentation offload is not hooked up yet; that will be done in a separate patch.	2014-10-09 19:52:55 +03:00
Gleb Natapov	4e7d8a8506	Introduce packet classification mechanism The classifier returns which cpu a packet should be processed on. It may return a special broadcast identifier. The patch includes classifiers for tcp, udp and arp. The arp classifier broadcasts arp replies to all cpus. The default classifier does not forward packets.	2014-10-07 11:03:57 +03:00
Gleb Natapov	0b59abafa7	Add net::device::l2inject function Will need it later to handle forwarded packets. Also save the net::device pointer in a thread-local variable to get to the device instance easily. When we have more than one device per cpu, we will have to change this to something more sophisticated.	2014-10-07 11:03:52 +03:00
Asias He	236418d262	net: Support TCP checksum offload It gives ~5% httpd improvements on monster. csum-offload option is added, e.g., to disable: ./httpd --network-stack native --csum-offload off	2014-09-24 11:03:39 +03:00
Avi Kivity	313768654a	net: remove queuing from l2->l3 rx path Use a subscription instead. Queueing should be implemented at the highest possible level (e.g. tcp), to avoid double-queueing.	2014-09-22 11:28:35 +03:00
Avi Kivity	4738f3f05c	net: switch device rx to stream<packet> Still have that internal rx queue.	2014-09-22 11:27:47 +03:00
Avi Kivity	812ac77d2f	net: split out packet class into its own files	2014-09-16 10:13:09 +03:00
Avi Kivity	4d28e910db	net: queue packets at the L3 protocol level If an L3 packet receiver is not able to register itself as a packet receiver after processing a packet, or if it is simply not dispatched quickly enough, then we will drop packets. Add a queue at the protocol layer to buffer those packets.	2014-09-14 16:00:07 +03:00
Avi Kivity	509e4e2768	net: add packet move assignment operator	2014-09-10 10:42:05 +03:00
Avi Kivity	b91389b1d5	core: extract packet::deleter into a core class Useful everywhere zero-copy can be used.	2014-09-04 13:28:51 +03:00
Avi Kivity	6a53e41053	net: packet sharing support Add a share() method that enables reference counting for the packet and returns a clone. The packet's deleter will only be invoked after all clones are destroyed. This is useful for tcp, which keeps a packet in the unacknowledged transmit queue while sending it lower down the stack.	2014-09-04 09:13:00 +03:00
Avi Kivity	ee3c7a0c1d	net: add packet::append(packet&&) Allows constructing a mega-packet out of several input packets.	2014-09-03 11:55:07 +03:00
Avi Kivity	b65e054604	net: optimize prepending headers to empty packets	2014-09-03 09:56:56 +03:00
Avi Kivity	1fbe325f63	net: add a helper to allocate a header in an existing packet Use in IP and ethernet layers.	2014-09-02 23:29:43 +03:00
Avi Kivity	7473b13275	net: fix packet::trim_front() when trimming entire packet When trimming the tcp header from a tcp packet without any data (such as a SYN), nothing remains. trim_front() did not account for this, and crashed. Fix by checking for the condition.	2014-09-02 18:24:34 +03:00
Avi Kivity	c77f77ee3f	build: organize files into a directory structure	2014-08-31 21:29:13 +03:00

40 Commits
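Commits 7ac3ba901c, d8ddaeb104 and 510171d083 describe collecting RSS hash input per packet type, hashing it with the Toeplitz function, and consulting the RSS redirection table to map the hash to a cpu. The following is a minimal C++ sketch of that idea under stated assumptions, not the code from these commits: `toeplitz_hash`, `ipv4_tuple` and `hash_to_cpu` are hypothetical names, the 40-byte key is the example key from Microsoft's RSS specification, and the input layout (source address, destination address, source port, destination port) follows that specification rather than anything stated in the log.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <vector>

// Example 40-byte RSS key from Microsoft's RSS specification (assumption:
// the real stack may use a different key). Supports inputs up to 36 bytes.
const std::vector<std::uint8_t> default_rss_key = {
    0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2, 0x41, 0x67,
    0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0, 0xd0, 0xca, 0x2b, 0xcb,
    0xae, 0x7b, 0x30, 0xb4, 0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30,
    0xf2, 0x0c, 0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
};

// Toeplitz hash: for every set bit of the input (most significant bit
// first), XOR in the 32-bit window of the key starting at that bit offset.
std::uint32_t toeplitz_hash(const std::vector<std::uint8_t>& key,
                            const std::vector<std::uint8_t>& data) {
    assert(key.size() >= data.size() + 4);
    std::uint32_t window = (std::uint32_t(key[0]) << 24) | (std::uint32_t(key[1]) << 16) |
                           (std::uint32_t(key[2]) << 8) | std::uint32_t(key[3]);
    std::uint32_t hash = 0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        for (int bit = 7; bit >= 0; --bit) {
            if (data[i] & (1u << bit)) {
                hash ^= window;
            }
            // Slide the window by one bit, shifting in the next key bit.
            window <<= 1;
            if (key[i + 4] & (1u << bit)) {
                window |= 1;
            }
        }
    }
    return hash;
}

// Hypothetical helper: serialize an IPv4 4-tuple in network byte order.
std::vector<std::uint8_t> ipv4_tuple(std::uint32_t src, std::uint32_t dst,
                                     std::uint16_t sport, std::uint16_t dport) {
    std::vector<std::uint8_t> out;
    for (std::uint32_t addr : {src, dst}) {
        for (int shift = 24; shift >= 0; shift -= 8) {
            out.push_back(std::uint8_t(addr >> shift));
        }
    }
    for (std::uint16_t port : {sport, dport}) {
        out.push_back(std::uint8_t(port >> 8));
        out.push_back(std::uint8_t(port));
    }
    return out;
}

// Map a hash to a cpu through a redirection table, indexing by the low-order
// bits; the table size is assumed to be a power of two.
unsigned hash_to_cpu(std::uint32_t hash, const std::vector<unsigned>& reta) {
    assert(!reta.empty() && (reta.size() & (reta.size() - 1)) == 0);
    return reta[hash & (reta.size() - 1)];
}
```

Because the same key and tuple always yield the same hash, software can predict which queue (and therefore which cpu) the NIC steers a flow to. One plausible reading of 510171d083 is that this determinism is what allows choosing a source port whose replies land on the sending cpu.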
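Commit fbef83beb0 divides cpus between the available hardware queue pairs when there are more cpus than queues, giving each cpu that owns a queue a set of cpus to spread traffic to, without regard to topology. A round-robin assignment is one simple way to do this. The sketch below assumes that cpus 0 to nqueues-1 own the queues; `distribute_cpus` and `pick_cpu` are hypothetical names, not the commit's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Round-robin assignment: cpu c is attached to queue (c % nqueues), so
// cpus 0..nqueues-1 each own one queue and the remaining cpus are spread
// over them. groups[q] lists the cpus that queue q's owner forwards to,
// the owner itself included. Topology is ignored, as in the commit.
std::vector<std::vector<unsigned>> distribute_cpus(unsigned ncpus, unsigned nqueues) {
    assert(ncpus > 0 && nqueues > 0);
    nqueues = std::min(nqueues, ncpus);  // surplus queues would have no owner
    std::vector<std::vector<unsigned>> groups(nqueues);
    for (unsigned cpu = 0; cpu < ncpus; ++cpu) {
        groups[cpu % nqueues].push_back(cpu);
    }
    return groups;
}

// Choose the destination cpu for a packet received on queue q by spreading
// flows (identified by their hash) over that queue's group.
unsigned pick_cpu(const std::vector<std::vector<unsigned>>& groups, unsigned q,
                  std::uint32_t hash) {
    const auto& group = groups.at(q);
    return group[hash % group.size()];
}
```

With 6 cpus and 4 queues, queue 0 serves cpus 0 and 4, queue 1 serves cpus 1 and 5, and queues 2 and 3 serve only their owners, so the load is uneven whenever the cpu count is not a multiple of the queue count.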
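Commit 6a53e41053 adds a `share()` method that makes the packet reference-counted and returns a clone, so the packet's deleter runs only after all clones are destroyed. The sketch below models only that ownership rule on top of `std::shared_ptr`; the `packet` class here is a hypothetical, heavily simplified stand-in (no fragments, no header prepending), not the real class.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical simplified packet: a view of a buffer plus a deleter.
// Ownership is held through a std::shared_ptr, so the deleter runs exactly
// once, when the original and every clone made by share() are gone.
class packet {
public:
    packet(const char* data, std::size_t len, std::function<void()> deleter)
        : _data(data), _len(len),
          _owner(std::make_shared<owner>(std::move(deleter))) {}

    // Packets are move-only; the only way to copy is share().
    packet(packet&&) noexcept = default;
    packet& operator=(packet&&) noexcept = default;

    // Clone that refers to the same buffer and bumps the reference count.
    packet share() const { return packet(*this); }

    const char* data() const { return _data; }
    std::size_t size() const { return _len; }
    long use_count() const { return _owner.use_count(); }

private:
    packet(const packet&) = default;  // used only by share()

    struct owner {
        std::function<void()> deleter;
        explicit owner(std::function<void()> d) : deleter(std::move(d)) {}
        ~owner() {
            if (deleter) {
                deleter();  // invoked when the last reference is dropped
            }
        }
    };

    const char* _data;
    std::size_t _len;
    std::shared_ptr<owner> _owner;
};
```

This is the property the commit message cites for tcp: the copy kept in the unacknowledged transmit queue keeps the buffer alive after the clone passed down the stack has been sent and released.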
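Commits 136a56859f and 77bd21c387 cap the number of packets waiting to be forwarded to another cpu and hand several packets to the remote cpu in one submission. The sketch below combines both ideas in a hypothetical `forward_queues` container: a per-destination queue that drops packets beyond a fixed limit and drains them in batches. The fixed limit and the drop-on-full policy are assumptions; the log does not say how the real code bounds the queue.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <utility>
#include <vector>

// One queue per destination cpu for packets awaiting cross-cpu forwarding.
// Each queue holds at most `limit` packets; packets past the cap are dropped
// so that a burst cannot exhaust memory.
template <typename Packet>
class forward_queues {
public:
    forward_queues(unsigned ncpus, std::size_t limit) : _queues(ncpus), _limit(limit) {
        assert(limit > 0);
    }

    // Queue a packet for dst_cpu; returns false (and counts a drop) if full.
    bool enqueue(unsigned dst_cpu, Packet p) {
        auto& q = _queues.at(dst_cpu);
        if (q.size() >= _limit) {
            ++_dropped;
            return false;
        }
        q.push_back(std::move(p));
        return true;
    }

    // Remove up to `max` packets, oldest first, for delivery to dst_cpu in a
    // single cross-cpu submission.
    std::vector<Packet> drain(unsigned dst_cpu, std::size_t max) {
        auto& q = _queues.at(dst_cpu);
        std::vector<Packet> batch;
        while (!q.empty() && batch.size() < max) {
            batch.push_back(std::move(q.front()));
            q.pop_front();
        }
        return batch;
    }

    std::size_t dropped() const { return _dropped; }

private:
    std::vector<std::deque<Packet>> _queues;
    std::size_t _limit;
    std::size_t _dropped = 0;
};
```

A caller would drain a batch and pass the whole batch to one cross-cpu call (submit_to() in the log) instead of issuing one call per packet, which amortizes the cost of the cross-cpu message.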