Commit Graph

50 Commits

Author SHA1 Message Date
Gleb Natapov
32b42af49f net: register l3 poller for tcp connections
This patch changes tcp to register a poller so that l3 can poll tcp for
a packet instead of tcp pushing packets to ipv4. This moves the
networking tx path inversion a little closer to the application.
2015-01-11 10:48:32 +02:00
Gleb Natapov
d5c309c74e net: provide poller registration API between l3 and l4
Both push and pull methods will be supported between l3 and l4 after
this patch.
2015-01-11 10:17:48 +02:00
Gleb Natapov
2b340b80ce net: unfuturize packet fragmentation
Since sending of a single packet does not involve futures anymore we can
simplify this code.
2015-01-11 10:17:48 +02:00
Gleb Natapov
b824790798 net: move udp_v4 from network_stack into ipv4 class
ipv4 class manages tcp and icmp, but for some reason udp is managed by
network_stack. Fix this and make all L4 protocol handling the same.
2015-01-08 11:33:19 +02:00
Gleb Natapov
0fd014fc35 net: add completion callback between l3 and l4
L4 will provide the callback to be called by L3 after the packet is
handed to lower layers for transmission. L4 will know that it can queue
more data from the user at this point. The patch also changes the send
function, which can no longer block, to return void instead of future<>.
2015-01-06 15:24:10 +02:00
Gleb Natapov
e80fa4af7d net: drop top level 'remaining' from ipv4::send()
It is not needed.
2015-01-06 15:24:10 +02:00
Gleb Natapov
12bce3f4fc net: make interface get packets from l3
Instead of l3 (arp/ipv4) pushing packets into interface's queue, make
them register functions that interface can use to ask l3 for packets.
2015-01-06 15:24:10 +02:00
Avi Kivity
87f63f7b90 shared_ptr: rename to lw_shared_ptr (for light-weight)
The current shared_ptr implementation is efficient, but does not support
polymorphic types.

Rename it in order to make room for a polymorphic shared_ptr.
2015-01-04 22:38:49 +02:00
Gleb Natapov
510171d083 net: add function to map packet's rss hash to a cpu
Provide a function that maps packet's rss hash to a cpu that should handle
it. This function is needed to find appropriate src port for outgoing
tcp/udp connection. Use this function to forward de-fragmented ip packet
to avoid one extra hop too.
2014-12-23 17:36:40 +02:00
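A minimal sketch of the hash-to-cpu mapping described in 510171d083, assuming a power-of-two redirection table whose slots hold cpu ids. The names rss_redirection and hash_to_cpu are hypothetical and are not taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a NIC's RSS redirection table: each slot names
// the cpu that owns packets whose hash falls into that slot.
struct rss_redirection {
    std::vector<unsigned> table;  // size assumed to be a power of two

    unsigned hash_to_cpu(uint32_t rss_hash) const {
        assert(!table.empty() && (table.size() & (table.size() - 1)) == 0);
        // The low bits of the hash select the slot.
        return table[rss_hash & (table.size() - 1)];
    }
};
```

Under these assumptions, choosing a source port for an outgoing connection amounts to trying candidate ports until the hash of the resulting tuple maps back to the local cpu.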
Avi Kivity
3e4c53300d Merge branch 'mq' of ssh://github.com/cloudius-systems/seastar-dev
Multiqueue support for #cpu != #q, from Gleb.
2014-12-16 11:11:22 +02:00
Gleb Natapov
d8ddaeb104 net: forward reassembled ip packet to correct queue
To figure out which cpu should handle a reassembled TCP packet, the RSS
redirection table has to be consulted.
2014-12-16 10:53:41 +02:00
Gleb Natapov
7ac3ba901c net: rework packet forwarding logic
Instead of forward() deciding the packet's destination, make it collect
input for the RSS hash function depending on packet type. After the data
is collected, use the Toeplitz hash function to calculate the packet's
destination.
2014-12-16 10:53:41 +02:00
Gleb Natapov
c13adb9c12 net: rework how dhcp handles dhcp packet.
Currently dhcp assumes that cpu 0 gets all the packets and redistributes
them by itself. With multiqueue this is not necessarily the case, so the
current trick of disabling forwarding by installing a special dhcp
forward() function will not work. Rework it by installing a packet filter
on all cpus before running dhcp and forwarding all dhcp packets to cpu 0.
2014-12-15 17:31:25 +02:00
Asias He
0790266be0 ip: Switch to use lowres_clock 2014-12-15 19:39:33 +08:00
Nadav Har'El
3d874892a7 dpdk: enable transmit-side checksumming offload
This patch uses the NIC's capability to calculate in hardware the IP, TCP
and UDP checksums on outgoing packets, instead of us doing this on the
sending CPU. This can save us quite a bit of calculations (especially for
the TCP/UDP checksum of full-sized packets), and avoid cache pollution on
the CPU when sending cold data.

On my setup this patch improves the performance of a single-cpu memcached
by 6%. Together with the recent patch for receive-side checksum offloading,
the total improvement is 10%.

This patch is somewhat complicated by the fact we have so many different
combinations of checksum-offloading capabilities: while virtio can only
offload layer-4 checksumming (tcp/udp), dpdk lets us offload both ip and
layer-4 checksum. Moreover, some packets are just IP but not TCP/UDP
(e.g., ICMP), and some packets are not even IP (e.g., ARP), so this
patch modifies a few of the hardware-features flags and the per-packet
offload-information flags to fit our new needs.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2014-12-10 18:05:02 +02:00
Asias He
9a9297c89d ip: Implement fragment timeout and memory usage limit 2014-12-09 09:59:44 +02:00
Avi Kivity
a2016bc1dd ip: fix smp fragment reassembly
ipv4::handle_on_cpu() did not properly convert from network byte order, so
it saw any packets with DF=1 as fragmented.

Fix by applying the proper conversion.
2014-12-07 12:01:31 +02:00
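A minimal sketch of why the missing conversion in a2016bc1dd made DF=1 packets look fragmented. It assumes the standard IPv4 header layout (3 flag bits followed by a 13-bit fragment offset, big-endian on the wire). The function names are illustrative and not taken from ipv4::handle_on_cpu().

```cpp
#include <cstdint>

constexpr uint16_t ip_flag_mf = 0x2000;      // "more fragments" bit
constexpr uint16_t ip_offset_mask = 0x1fff;  // 13-bit fragment offset

// Decode the big-endian flags/fragment-offset field explicitly, so the
// result is the same on any host byte order.
inline bool is_fragment(const uint8_t field[2]) {
    uint16_t host = static_cast<uint16_t>((field[0] << 8) | field[1]);
    return (host & ip_flag_mf) != 0 || (host & ip_offset_mask) != 0;
}

// What a little-endian host computes if it skips the conversion: the DF
// bit (0x40 in the first wire byte) lands inside the offset mask.
inline bool is_fragment_unconverted_le(const uint8_t field[2]) {
    uint16_t raw = static_cast<uint16_t>(field[0] | (field[1] << 8));
    return (raw & ip_flag_mf) != 0 || (raw & ip_offset_mask) != 0;
}
```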
Asias He
59aa280f0d ip: Add IPv4 reassembly support
If a TCP or UDP IP datagram is fragmented, only the first fragment will
contain the port information. When a fragment without port information
is received, we have no idea which "stream" this fragment belongs to,
thus we have no idea how to forward this packet.

To solve this problem, we use a "forward twice" method. When a fragment
of an IP datagram is received, we forward it using the
frag_id(src_ip, dst_ip, identification, protocol) hash. When all the
fragments are received, we forward the reassembled datagram using the
connection_id(src_ip, src_port, dst_ip, dst_port) hash.
2014-12-03 21:40:49 +08:00
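The "forward twice" idea from 59aa280f0d can be sketched as two cpu selections keyed on different tuples. This is an illustration under stated assumptions: the field layouts mirror the tuples named in the commit message, but the hash combiner and the function names cpu_for_fragment and cpu_for_connection are hypothetical, and the real stack may hash differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <initializer_list>

struct frag_id {
    uint32_t src_ip; uint32_t dst_ip; uint16_t identification; uint8_t protocol;
};
struct connection_id {
    uint32_t src_ip; uint16_t src_port; uint32_t dst_ip; uint16_t dst_port;
};

// Illustrative hash combiner (an assumption, not the stack's hash).
inline std::size_t combine(std::initializer_list<uint32_t> fields) {
    std::size_t h = 0;
    for (uint32_t f : fields) {
        h ^= std::hash<uint32_t>{}(f) + 0x9e3779b9u + (h << 6) + (h >> 2);
    }
    return h;
}

// First hop: every fragment of one datagram carries the same frag_id, so
// all fragments meet on one cpu even though only the first has ports.
inline unsigned cpu_for_fragment(const frag_id& id, unsigned ncpus) {
    assert(ncpus > 0);
    return static_cast<unsigned>(
        combine({id.src_ip, id.dst_ip, id.identification, id.protocol}) % ncpus);
}

// Second hop: once reassembled, the ports are known and the datagram can
// go to the cpu that owns the connection.
inline unsigned cpu_for_connection(const connection_id& id, unsigned ncpus) {
    assert(ncpus > 0);
    return static_cast<unsigned>(
        combine({id.src_ip, id.src_port, id.dst_ip, id.dst_port}) % ncpus);
}
```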
Asias He
88a1a37a88 ip: Support IP fragmentation in TX path
Tested with UDP sending large datagrams with ufo off.
2014-11-30 10:16:38 +02:00
Gleb Natapov
d698811bdd fix smp broadcast packet handling
Some packets, like arp replies, are broadcast to all cpus for handling,
but only packet structure is copied for each cpu, the actual packet data
is the same for all of them. Currently the networking stack mangles
packet data during its travel up the stack while doing ntoh()
translations, which obviously cannot work for broadcast packets. This
patch fixes the code to not modify packet data while doing ntoh(), but
to do it in a stack-allocated copy of the data instead.
2014-11-06 10:30:30 +02:00
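A minimal sketch of the approach taken in d698811bdd: convert into a stack-allocated copy and leave the shared wire bytes untouched. The ports struct and read_ports function are hypothetical names used only for illustration.

```cpp
#include <cstdint>

struct ports {      // values in host byte order
    uint16_t src;
    uint16_t dst;
};

// Build a host-order copy on the stack. `wire` is only read, so other
// cpus sharing the same packet data still see the original bytes.
inline ports read_ports(const uint8_t* wire) {
    ports p;
    p.src = static_cast<uint16_t>((wire[0] << 8) | wire[1]);
    p.dst = static_cast<uint16_t>((wire[2] << 8) | wire[3]);
    return p;
}
```

Because the buffer is never written, decoding the same bytes on several cpus gives the same answer each time, which an in-place ntoh() cannot guarantee.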
Calle Wilund
bd263b3b4e net: Add "packet filter" functionality + accessors + "raw" packet send function
Perhaps not the best way to enable "hijacking" the ip stack (for DHCP
querying), but considering the options seems the least intrusive.

Signed-off-by: Calle Wilund <calle@cloudius-systems.com>
2014-11-05 14:50:28 +02:00
Asias He
c33270105b net: Handle extra bytes contained in Ethernet frame.
The Ethernet frame might contain extra bytes after the IP packet for
padding. Trim the extra bytes in order not to confuse TCP.

E.g., when establishing a TCP connection:

1) <SYN>
2) <SYN,ACK>
3) <ACK>

Packet 3) should be 14 + 20 + 20 = 54 bytes, but the sender might send a
packet of 60 bytes, containing 6 extra bytes of padding.

Fix httpd on ran/sif.
2014-11-04 10:41:41 +02:00
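The trimming rule from c33270105b can be sketched as a length comparison. This assumes the IPv4 total-length field has already been converted to host byte order. trailing_padding is a hypothetical helper, not a function from the patch.

```cpp
#include <cstddef>
#include <cstdint>

// Returns how many trailing bytes of the L3 payload are link-layer padding.
// l3_len:       bytes that follow the Ethernet header in the received frame
// ip_total_len: the IPv4 header's total-length field, in host byte order
inline std::size_t trailing_padding(std::size_t l3_len, uint16_t ip_total_len) {
    // A frame shorter than the IP length is truncated, not padded; that
    // case has nothing to trim and is left to the caller to reject.
    return l3_len > ip_total_len ? l3_len - ip_total_len : 0;
}
```

For the <ACK> example above, a 60-byte frame leaves 46 bytes after the 14-byte Ethernet header, the IP total length is 40, and 6 bytes are trimmed.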
Avi Kivity
7a1f84a556 reactor: replace references to reactor::_id by its accessor cpu_id() 2014-11-01 17:34:43 +02:00
Asias He
2625dd5944 net: Introduce eth_protocol_num 2014-10-13 11:37:56 +08:00
Asias He
5cf3f200c5 net: Introduce ip_protocol_num
We use this in all the places where the ip protocol number is used.
2014-10-13 11:37:56 +08:00
Asias He
05c72b0808 net: UDP checksum offload and UDP fragmentation offload 2014-10-13 11:37:56 +08:00
Gleb Natapov
4e7d8a8506 Introduce packet classification mechanism
The classifier returns which cpu a packet should be processed on. It may
return a special broadcast identifier. The patch includes classifiers for
tcp, udp and arp. The arp classifier broadcasts arp replies to all cpus.
The default classifier does not forward the packet.
2014-10-07 11:03:57 +03:00
Tomasz Grabiec
05aece51dc virtio: remove intermediate queue
Currently the send path buffers packets in the (unbounded) _tx_queue and
in the virtio ring. One queue would suffice though.

This change also propagates the back pressure resulting from the
queue-full condition up the send path. This is needed because otherwise,
if senders are faster than the network, we will eventually run out of
memory. This would also cause a "buffer bloat" effect, which hurts
latency-sensitive workloads.
2014-10-04 11:27:23 +02:00
Tomasz Grabiec
076d3b2682 ip: connect send() action with L3's send() action
So that back-pressure or failure from the lower layers is transferred.
2014-10-04 11:27:23 +02:00
Tomasz Grabiec
04b53b7498 ip: make send() composable
This allows the caller to compose it with other actions when send() is
done or when it fails.
2014-10-01 13:45:28 +02:00
Asias He
cff8cb353a net: Add netmask option 2014-09-28 10:06:08 +03:00
Asias He
7ab735d3c7 net: Gateway support 2014-09-28 10:05:58 +03:00
Asias He
c5d623265d net: Support ping 2014-09-25 17:49:38 +03:00
Asias He
4bd9f4d49d net: Always check IP header checksum
virtio-net does TCP checksum offload, not IP header checksum offload.

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
2014-09-25 12:56:53 +03:00
Asias He
236418d262 net: Support TCP checksum offload
It gives ~5% httpd improvements on monster.

csum-offload option is added, e.g., to disable:

./httpd --network-stack native --csum-offload off
2014-09-24 11:03:39 +03:00
Avi Kivity
907792fe26 net: ipv4: remove unused field to please clang 2014-09-22 17:19:15 +03:00
Avi Kivity
abbc62588b net: checksum incoming ipv4 packets 2014-09-22 15:46:35 +03:00
Avi Kivity
313768654a net: remove queuing from l2->l3 rx path
Use a subscription instead.  Queueing should be implemented at the highest
possible level (e.g. tcp), to avoid double-queueing.
2014-09-22 11:28:35 +03:00
Tomasz Grabiec
53ce24c850 ip: add method to register L4 protocol handlers 2014-09-16 18:48:14 +03:00
Avi Kivity
37c90fe54e net: make packet data members private
This will assist in future refactoring.
2014-09-16 11:24:13 +03:00
Avi Kivity
5e2f1f0bc6 net: split out ip checksum routines into their own file 2014-09-16 10:26:44 +03:00
Avi Kivity
89ec8f2ae7 net: fix IP checksum overflow during reduction
When reducing the checksum from a 32-bit or 64-bit intermediate,
we can get an overflow after the first overflow handling step:

0000_8000_8000_ffff
-> 10_ffff
->  1_000f
->    0010

Since we lacked the second step, we got an off-by-one in the checksum.
2014-09-14 16:58:27 +03:00
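A minimal sketch of the reduction fixed in 89ec8f2ae7, written as a loop so that a carry produced by one folding step is folded again. fold_checksum is a hypothetical name, and the loop form is an assumption rather than the patch's actual code.

```cpp
#include <cstdint>

// Reduce a wide ones'-complement accumulator to 16 bits. Adding the high
// part into the low 16 bits can itself carry, so keep folding until
// nothing is left above bit 15.
inline uint16_t fold_checksum(uint64_t sum) {
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return static_cast<uint16_t>(sum);
}
```

With an intermediate of 0x10ffff, a single fold gives 0x1000f, which truncates to 0x000f. The second fold adds the carry back and yields 0x0010, matching the last two steps of the trace in the commit message.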
Avi Kivity
1fbe325f63 net: add a helper to allocate a header in an existing packet
Use in IP and ethernet layers.
2014-09-02 23:29:43 +03:00
Avi Kivity
1396459085 net: integrate tcp into ipv4
Define the traits class used to communicate address types and pseudo header
to tcp, and a few glue classes.
2014-09-02 20:39:12 +03:00
Avi Kivity
673dd21c8b net: fix ip tx
- checksum
- total length
- endianness

were all wrong.
2014-09-02 20:34:19 +03:00
Avi Kivity
c6412f23fc net: fix ip netmask checks 2014-09-02 20:33:49 +03:00
Avi Kivity
0a45d4d73b net: implement IPv4 L3->L4 dispatching 2014-09-01 15:19:17 +03:00
Avi Kivity
b2b24031e9 net: generalize IP checksummer
Allow it to checksum packets and fragments.
2014-09-01 15:17:27 +03:00
Avi Kivity
0fbda7c1ec Fix IP checksum for odd lengths
Since the data is in network byte order, we must pad the last word to the right.
2014-08-31 23:37:34 +03:00
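A minimal sketch of the odd-length rule from 0fbda7c1ec, following the Internet checksum definition in RFC 1071: the data is summed as big-endian 16-bit words, and a trailing odd byte becomes the high byte of a final word whose low byte is zero. ip_checksum is an illustrative byte-at-a-time version, not the project's optimized routine.

```cpp
#include <cstddef>
#include <cstdint>

inline uint16_t ip_checksum(const uint8_t* data, std::size_t len) {
    uint64_t sum = 0;
    std::size_t i = 0;
    // Sum complete 16-bit words in network (big-endian) order.
    for (; i + 1 < len; i += 2) {
        sum += (static_cast<uint32_t>(data[i]) << 8) | data[i + 1];
    }
    // Odd length: pad the last word on the right, so the leftover byte
    // occupies the high half.
    if (i < len) {
        sum += static_cast<uint32_t>(data[i]) << 8;
    }
    // Fold carries until the sum fits in 16 bits.
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return static_cast<uint16_t>(~sum);
}
```

A buffer of odd length therefore checksums the same as that buffer with one zero byte appended.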
Avi Kivity
c77f77ee3f build: organize files into a directory structure 2014-08-31 21:29:13 +03:00