Commit Graph

40 Commits

Author SHA1 Message Date
Gleb Natapov
13c1324d45 net: provide some statistics via collectd
Provide batching and overall send/received packet stats.
2015-01-08 17:41:26 +02:00
Gleb Natapov
77bd21c387 net: implement bulk sending interface for proxy queue
Take advantage of the bulk interface to send several packets simultaneity
with one submit_to() to remote cpu.
2015-01-06 15:24:10 +02:00
Gleb Natapov
12bce3f4fc net: make interface get packets from l3
Instead of l3 (arp/ipv4) pushing packets into interface's queue, make
them register functions that interface can use to ask l3 for packets.
2015-01-06 15:24:10 +02:00
Gleb Natapov
e5d0adb339 net: make qp poll for tx packets from networking stack
Packets are accumulated in interface's packet queue. The queue is polled
by qp to see if there is something to send.
2015-01-06 15:24:10 +02:00
Gleb Natapov
865d95c0f1 net: provide bulk sending interface for qp
Implement it as calls to send() in a loop for now. Each device will
get proper implementation later.
2015-01-06 15:24:10 +02:00
Gleb Natapov
6ad9114c0b reactor: add at_destroy() function to the reactor and use it
Unfortunately at_exit() cannot be used to delete objects since when
it runs the reactor is still active and deleted object may still been used.
We need another API that runs its task after reactor is already stopped.
at_destroy() will be such api.
2014-12-30 15:21:10 +02:00
Gleb Natapov
a445b8174e net: wait for link to be ready before creating network stack 2014-12-29 13:06:10 +02:00
Gleb Natapov
510171d083 net: add function to map packet's rss hash to a cpu
Provide a function that maps packet's rss hash to a cpu that should handle
it. This function is needed to find appropriate src port for outgoing
tcp/udp connection. Use this function to forward de-fragmented ip packet
to avoid one extra hop too.
2014-12-23 17:36:40 +02:00
Gleb Natapov
c8189157ed net: use RSS hash key calculated by HW if available
Some (all?) RSS capable HW provides us with a hash that was used to
select rx queue the packet was delivered to. If such hash is available
it is better to use it to forward packet instead of calculating hash
ourself and suffering cache missed.
2014-12-16 10:53:41 +02:00
Gleb Natapov
d8ddaeb104 net: forward reassembled ip packet to correct queue
To figure out a cpu that should handle reassembled TCP packet RSS
redirection table have to be consulted.
2014-12-16 10:53:41 +02:00
Gleb Natapov
fbef83beb0 net: support for num of cpus > num of queues
This patch introduce a logic to divide cpus between available hw queue
pairs. Each cpu with hw qp gets a set of cpus to distribute traffic
to. The algorithm doesn't take any topology considerations into account yet.
2014-12-16 10:53:41 +02:00
Gleb Natapov
7ac3ba901c net: rework packet forwarding logic
Instead of forward() deciding packet destination make it collect input
for RSS hash function depending on packet type. After data is collected
use toeplitz hash function to calculate packet's destination.
2014-12-16 10:53:41 +02:00
Gleb Natapov
649210b5b6 net: rename net::distributed_device to net::device 2014-12-11 13:06:32 +02:00
Gleb Natapov
0e70ba69cf net: rename net::device to net::qp 2014-12-11 13:06:27 +02:00
Gleb Natapov
8ff89f7f01 net: remove unused device_placement struct 2014-12-11 13:06:22 +02:00
Nadav Har'El
3d874892a7 dpdk: enable transmit-side checksumming offload
This patch uses the NIC's capability to calculate in hardware the IP, TCP
and UDP checksums on outgoing packets, instead of us doing this on the
sending CPU. This can save us quite a bit of calculations (especially for
the TCP/UDP checksum of full-sized packets), and avoid cache-polution on
the CPU when sending cold data.

On my setup this patch improves the performance of a single-cpu memcached
by 6%. Together with the recent patch for receive-side checksum offloading,
the total improvement  is 10%.

This patch is somewhat complicated by the fact we have so many different
combinations of checksum-offloading capabilities; While virtio can only
offload layer-4 checksumming (tcp/udp), dpdk lets us offload both ip and
layer-4 checksum. Moreover, some packets are just IP but not TCP/UDP
(e.g., ICMP), and some packets are not even IP (e.g., ARP), so this
patch modifies a few of the hardware-features flags and the per-packet
offload-information flags to fit our new needs.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2014-12-10 18:05:02 +02:00
Gleb Natapov
73f6d943e1 net: separate device initialization from queues initialization
This patch adds new class distributed_device which is responsible for
initializing HW device and it is shared between all cpus. Old device
class responsibility becomes managing rx/tx queue pair and it is local
per cpu. Each cpu have to call distributed_device::init_local_queue() to
create its own device. The logic to distribute cpus between available
queues (in case there is no enough queues for each cpu) is in the
distributed_device currently and not really implemented yet, so only one
queue or queues == cpus scenarios are supported currently, but this can
be fixed later.

The plan is to rename "distributed_device" to "device" and "device"
to "queue_pair" in later patches.
2014-12-09 18:55:14 +02:00
Asias He
8335787268 net: Expose interface::forward
This can be used with ipv4 fragmentation.
2014-12-03 17:47:29 +08:00
Gleb Natapov
7dbc333da6 core: Allow forwarding from/to any cpu 2014-12-03 17:47:29 +08:00
Gleb Natapov
bf46f9c948 net: Change how networking devices are created
Currently each cpu creates network device as part of native networking
stack creation and all cpus create native networking stack independently,
which makes it impossible to use data initialized by one cpu in another
cpu's networking device initialization. For multiqueue devices often some
parts of an initialization have to be handled by one cpu and all other
cpus should wait for the first one before creating their network devices.
Even without multiqueue proxy devices should be created after master
device is created so that proxy device may get a pointer to the master
at creation time (existing code uses global per cpu device pointer and
assume that master device is created on cpu 0 to compensate for the lack
of ordering).

This patch makes it possible to delay native networking stack creation
until network device is created. It allows one cpu to be responsible
for creation of network devices on multiple cpus. Single queue device
initialize master device on one cpu and call other cpus with a pointer
to master device and its cpu id which are used in proxy device creation.
This removes the need for per cpu device pointer and "master on cpu 0"
assumption from the code since now master device and slave devices know
about each other and can communicate directly.
2014-11-30 18:10:08 +02:00
Gleb Natapov
136a56859f net: limit the number of packets that are waiting to be sent to another cpu
If packet arrive faster than they can be forwarded we can run out of
memory.
2014-11-09 18:06:22 +02:00
Asias He
2a582fd1a6 net: fix tso maximum packet size
With TSO enabled, we can see a Ethernet frame larger than 64K on tap
device. This makes wireshark unable to handle. It complains:

   The capture file appears to be damaged or corrupt.
   (pcapng_read_packet_block: cap_len 65549 is larger than
   WTAP_MAX_PACKET_SIZE 65535.)
2014-11-06 14:50:11 +02:00
Avi Kivity
31078be7f7 net: initialize interface::_proto_map early
If the driver starts pushing packets early, we need this field to be
initialized so they can be properly ignored.
2014-11-04 10:54:44 +02:00
Asias He
2625dd5944 net: Introduce eth_protocol_num 2014-10-13 11:37:56 +08:00
Asias He
a111b2fa1e net: TCP segment offload and UDP fragmentation offload support
Only tx path is added for now. Will enable rx path when merge receive
buffer feature is supported in virtio-net.

UDP fragmentation offload is not hooked up, will do in a separate patch.
2014-10-09 19:52:55 +03:00
Gleb Natapov
4e7d8a8506 Introduce packet classification mechanism
Classifier returns what cpu a packets should be processed on. It may
return special broadcast identifier. The patch includes classifier for
tcp, udp and arp. Arp classifier broadcasts arp reply to all cpus. Default
classifier does not forward packet.
2014-10-07 11:03:57 +03:00
Gleb Natapov
0b59abafa7 Add net::device::l2inject function
Will need it later to handle forwarded packets. Also save net::device
pointer in thread local variable to get to device instance easily. When
we ill have more then one device per cpu we will have to change to
something more sophisticated.
2014-10-07 11:03:52 +03:00
Asias He
236418d262 net: Support TCP checksum offload
It gives ~5% httpd improvements on monster.

csum-offload option is added, e.g., to disable:

./httpd --network-stack native --csum-offload off
2014-09-24 11:03:39 +03:00
Avi Kivity
313768654a net: remove queuing from l2->l3 rx path
Use a subscription instead.  Queueing should be implemented at the highest
possible level (e.g. tcp), to avoid double-queueing.
2014-09-22 11:28:35 +03:00
Avi Kivity
4738f3f05c net: switch device rx to stream<packet>
Still have that internal rx queue.
2014-09-22 11:27:47 +03:00
Avi Kivity
812ac77d2f net: spit out packet class into its own files 2014-09-16 10:13:09 +03:00
Avi Kivity
4d28e910db net: queue packets at the L3 protocol level
If an L3 packet receiver is not able to register itself as a packet receiver
after processing a packet, or if it is simply not dispatched quickly enough,
then we will drop packets.

Add a queue at the protocol layer to buffer those packets.
2014-09-14 16:00:07 +03:00
Avi Kivity
509e4e2768 net: add packet move assignment operator 2014-09-10 10:42:05 +03:00
Avi Kivity
b91389b1d5 core: extract packet::deleter into a core class
Useful everywhere zero-copy can be used.
2014-09-04 13:28:51 +03:00
Avi Kivity
6a53e41053 net: packet sharing support
Add a share() method that enables reference counting for the packet and
returns a clone.  The packet's deleter will only be invoked after all clones
are destroyed.

This is useful for tcp, which keeps a packet in the unacknowledged transmit
queue while sending it lower down the stack.
2014-09-04 09:13:00 +03:00
Avi Kivity
ee3c7a0c1d net: add packet::append(packet&&)
Allows constructing a mega-packet out of several input packets.
2014-09-03 11:55:07 +03:00
Avi Kivity
b65e054604 net: optimize prepending headers to empty packets 2014-09-03 09:56:56 +03:00
Avi Kivity
1fbe325f63 net: add a helper to allocate a header in an existing packet
Use in IP and ethernet layers.
2014-09-02 23:29:43 +03:00
Avi Kivity
7473b13275 net: fix packet::trim_front() when trimming entire packet
When trimming the tcp header from a tcp packet without any data (such
as a SYN), nothing remains.  trim_front() did not account for this,
and crashed.

Fix by checking for the condition.
2014-09-02 18:24:34 +03:00
Avi Kivity
c77f77ee3f build: organize files into a directory structure 2014-08-31 21:29:13 +03:00