209 Commits

Author SHA1 Message Date
Glauber Costa
6bb8d687d0 native stack: support more than virtio
Support xenfront as well, when we are in a Xen domain.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2014-11-05 15:09:03 +02:00
Glauber Costa
72abe62c4e xenfront basic support
This is the basic support for xenfront. It can be used in domU, provided there
is a network interface to be hijacked.

The code that follows is just the mechanics of managing the grants, event
channels, etc.

However, it does not yet work: I can't see netback injecting any data into it.
I am still debugging the protocol, but I wanted to flush the current state.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2014-11-05 15:09:03 +02:00
Gleb Natapov
d77ee625bd virtio: signal availability of a virtio buffer in a vring after sending packet
Currently there is an implicit unbounded queue between the virtio driver
and the networking stack, where packets may accumulate if they are received
faster than the networking stack can handle them. The queuing happens because
virtio buffer availability is signaled immediately after the received buffer's
promise is fulfilled, but promise fulfilment does not mean that the buffer has
been processed, only that the task that will process it has been placed on a
task queue.

The patch fixes the problem by making a virtio buffer available only after
the previous buffer's completion task has executed. This bounds the
aforementioned implicit queue between the virtio driver and the networking
stack by the virtio ring size.
2014-11-04 15:19:27 +02:00
Gleb Natapov
99941f0c16 virtio: remove feedback from virtio_net_device::queue_rx_packet()
Instead of providing back pressure towards the NIC, which would cause the NIC
to slow down and drop packets, the network stack should itself drop packets it
cannot handle. Otherwise one slow receiver may cause drops for all
others. Our native network stack correctly drops packets instead of
providing feedback, so it is safe to just remove feedback from the API.
2014-11-04 15:19:13 +02:00
Avi Kivity
174cc6b876 packet: add linearize()
This is helpful for net devices that do not support scatter/gather.
2014-11-04 10:55:04 +02:00
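As an illustration of what commit 174cc6b876 (packet: add linearize()) is about, the sketch below copies a scattered packet into one contiguous buffer, which is what a device without scatter/gather support needs. The `fragment` alias and the free function are assumptions made for this example; they are not Seastar's packet API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one packet fragment.
using fragment = std::vector<std::uint8_t>;

// Copy every fragment, in order, into a single contiguous buffer.
fragment linearize(const std::vector<fragment>& frags) {
    std::size_t total = 0;
    for (const auto& f : frags) {
        total += f.size();
    }
    fragment out;
    out.reserve(total);  // one allocation for the whole packet
    for (const auto& f : frags) {
        out.insert(out.end(), f.begin(), f.end());
    }
    return out;
}
```

The cost is one extra copy of the payload, so a real implementation would presumably skip the work when the packet already consists of a single fragment.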
Avi Kivity
31078be7f7 net: initialize interface::_proto_map early
If the driver starts pushing packets early, we need this field to be
initialized so they can be properly ignored.
2014-11-04 10:54:44 +02:00
Asias He
c33270105b net: Handle extra bytes contained in Ethernet frame.
The Ethernet frame might contain extra bytes after the IP packet for
padding. Trim the extra bytes in order not to confuse TCP.

E.g., when establishing a TCP connection:

1) <SYN>
2) <SYN,ACK>
3) <ACK>

Packet 3) should be 14 + 20 + 20 = 54 bytes, but the sender might send a
packet of 60 bytes, containing 6 extra bytes of padding.

Fix httpd on ran/sif.
2014-11-04 10:41:41 +02:00
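The arithmetic in commit c33270105b (14-byte Ethernet header, then an IP packet whose own length field says where it ends) can be sketched as follows. This is a minimal example that assumes an untagged Ethernet II frame carrying IPv4, without the trailing FCS; `trim_eth_padding` is a hypothetical helper, not the function the commit added.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Length of an untagged Ethernet II header (assumption: no VLAN tag).
constexpr std::size_t eth_hdr_len = 14;
// Minimal IPv4 header length.
constexpr std::size_t min_ip_hdr_len = 20;

// Hypothetical helper: remove bytes that follow the IPv4 packet inside
// `frame`. Returns the number of padding bytes that were trimmed.
std::size_t trim_eth_padding(std::vector<std::uint8_t>& frame) {
    if (frame.size() < eth_hdr_len + min_ip_hdr_len) {
        return 0;  // too short to hold an IPv4 header; leave untouched
    }
    // IPv4 "total length" is a big-endian 16-bit field at offset 2
    // of the IP header.
    std::size_t ip_len = (std::size_t(frame[eth_hdr_len + 2]) << 8)
                       | std::size_t(frame[eth_hdr_len + 3]);
    if (ip_len < min_ip_hdr_len) {
        return 0;  // bogus length field; do not trust it
    }
    std::size_t wanted = eth_hdr_len + ip_len;
    if (wanted >= frame.size()) {
        return 0;  // no padding (or a truncated frame)
    }
    std::size_t extra = frame.size() - wanted;
    frame.resize(wanted);
    return extra;
}
```

For the 60-byte frame in the commit message, the IP total length is 40, so the frame is cut back to 54 bytes and the 6 padding bytes never reach TCP.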
Asias He
345d3a3628 net: Add trim_back to packet
2014-11-04 10:13:36 +02:00
Avi Kivity
7a1f84a556 reactor: replace references to reactor::_id by its accessor cpu_id()
2014-11-01 17:34:43 +02:00
Asias He
b4544a3c76 tcp: Retransmission support
Manage the RTO Timer using the algorithm in rfc6298.
2014-10-31 12:11:20 +02:00
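For readers unfamiliar with the RFC 6298 rules that commit b4544a3c76 refers to, here is a minimal sketch of the RTO computation. The struct and member names are illustrative, and the 1 ms clock granularity and the 60 s cap are assumptions; none of this is taken from the Seastar source.

```cpp
#include <algorithm>
#include <chrono>

// Sketch of the RFC 6298 retransmission timeout estimator.
struct rto_estimator {
    using ms = std::chrono::milliseconds;

    ms srtt{0};        // smoothed round-trip time
    ms rttvar{0};      // round-trip time variation
    ms rto{1000};      // initial RTO of 1 second
    bool have_sample = false;

    // Feed one round-trip time measurement R.
    void on_rtt_sample(ms r) {
        const ms granularity{1};  // assumed clock granularity G
        const ms min_rto{1000};   // RTO is rounded up to at least 1 second
        const ms max_rto{60000};  // assumed cap; the RFC allows one >= 60 s
        if (!have_sample) {
            // First measurement: SRTT = R, RTTVAR = R / 2.
            srtt = r;
            rttvar = r / 2;
            have_sample = true;
        } else {
            // RTTVAR is updated first, using the old SRTT (beta = 1/4),
            // then SRTT is updated (alpha = 1/8).
            ms diff = srtt > r ? srtt - r : r - srtt;
            rttvar = (3 * rttvar + diff) / 4;
            srtt = (7 * srtt + r) / 8;
        }
        ms candidate = srtt + std::max(granularity, 4 * rttvar);
        rto = std::min(std::max(candidate, min_rto), max_rto);
    }

    // Retransmission timer expired: back off exponentially.
    void on_timeout() {
        const ms max_rto{60000};
        rto = std::min(rto * 2, max_rto);
    }
};
```

Integer millisecond arithmetic is used here for simplicity, so results are truncated; a real stack may keep finer-grained state.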
Tomasz Grabiec
95fd885996 virito: fix typo
2014-10-30 19:50:58 +02:00
Gleb Natapov
af91a11b2b net: implement virtual packet queue in net::proxy::send()
Currently net::proxy::send() waits for the previous packet to be sent to
another cpu before accepting the next packet. This slows down the sender too
much. This patch implements a virtual queue that allows many packets to be
sent simultaneously, but it starts to drop packets when the number of
outstanding packets goes over the limit; if we tried to queue them, we would
run out of memory whenever a sender generates packets faster than they can be
sent. It also tries to generate back pressure by returning a future
that will become ready when the queue goes under the limit, but it returns
it only to the first sender that goes over it. Doing this for all callers
may be an option too, but it is not clear which ones, and how many, we should
wake when the queue goes under the limit again.
2014-10-30 18:36:08 +02:00
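The policy described in commit af91a11b2b (count outstanding packets, drop above a limit, wake only the first sender that hit the limit) might look roughly like the sketch below. It is an assumption-laden illustration: the class is hypothetical, a plain callback stands in for the future, and the exact accept/drop boundary in the real code may differ.

```cpp
#include <cstddef>
#include <functional>
#include <utility>

// Hypothetical model of a "virtual queue": no packets are stored here,
// only a count of packets handed to another cpu and not yet completed.
class virtual_queue {
    std::size_t _limit;
    std::size_t _outstanding = 0;
    std::size_t _dropped = 0;
    std::function<void()> _on_drain;  // held for the first blocked sender only
public:
    explicit virtual_queue(std::size_t limit) : _limit(limit) {}

    // Returns true if the packet is accepted, false if it is dropped.
    // `on_drain` is remembered only if no other sender is already waiting.
    bool try_send(std::function<void()> on_drain = nullptr) {
        if (_outstanding >= _limit) {
            ++_dropped;
            if (!_on_drain && on_drain) {
                _on_drain = std::move(on_drain);
            }
            return false;
        }
        ++_outstanding;
        return true;
    }

    // Called when the other cpu has finished sending one packet.
    void complete_one() {
        if (_outstanding == 0) {
            return;
        }
        --_outstanding;
        if (_outstanding < _limit && _on_drain) {
            auto cb = std::move(_on_drain);
            _on_drain = nullptr;  // reset explicitly after the move
            cb();                 // wake the single waiting sender
        }
    }

    std::size_t outstanding() const { return _outstanding; }
    std::size_t dropped() const { return _dropped; }
};
```

Dropping instead of queuing keeps memory use bounded by the limit, at the price of relying on the upper layers (for example TCP retransmission) to recover lost packets.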
Nadav Har'El
f497299f44 virtio: support virtio ring assigned from OSv
As a second option beyond running on Linux with vhost, this patch
allows Seastar to run in OSv with the virtio network device "assigned"
to the application (i.e., we use the virtio rings directly, with no OSv
involvement beyond the initial setup).

To use this feature, one needs to compile Seastar with the "HAVE_OSV"
flag, the osv::assigned_virtio::get() symbol needs to be available
(which means we run under OSv), and it should return a non-null object
(which means OSv was run with --assign-net).

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2014-10-30 16:45:08 +02:00
Nadav Har'El
4b44968e86 virtio: expose notifier's wake_wait
The wake_wait() method is only available for the notifier. Expose it
from the vring holding this notifier, and from the rx or tx queue holding
this vring.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2014-10-30 16:45:07 +02:00
Nadav Har'El
5db5f7622a virtio: make virtio_net_device an abstract class
Make virtio_net_device an abstract class, and move the vhost-specific
code to a subclass, virtio_net_device_vhost.

In a subsequent patch, we'll have a second subclass, for a virtio
device assigned from OSv.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2014-10-30 16:45:07 +02:00
Nadav Har'El
8326f43ded virtio: make virt_to_phys a virtual function
In the existing code, virt_to_phys() was a fixed do-nothing function.
This is good for vhost, but not good enough in OSv, where converting
virtual addresses to physical ones requires an actual calculation.

The solution in this patch, using a virtual function, is not optimal
and should probably be replaced with a template later.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2014-10-30 16:45:06 +02:00
Nadav Har'El
db16e4f634 virtio: separate notification from vring
Currently, the "vring" class is hardcoded to do guest-host notifications
via eventfd. This patch switches to a general "notification object" with
two virtual functions - host_notify(), which unconditionally notifies the
host, and host_wait() which returns a future<> on which one can wait for
the host to notify us.

This patch provides one implementation of this notification object, using
eventfd as before, as needed when using vhost. We'll later provide a
different implementation for running under OSv.

This patch uses pointers and virtual functions. This adds a bit of
overhead to every notification, but it is small compared to the other
costs of these notifications. Nevertheless, we can change it in the
future to make the notification object a template parameter instead of
an abstract class.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2014-10-30 16:45:06 +02:00
Avi Kivity
fa7ea4f86e Revert "tcp: Retransmission support"
This reverts commit 71ecf7650a - it leaks
memory like crazy.
2014-10-30 13:46:10 +02:00
Tomasz Grabiec
c5b7bbf37f net: udp: do not use packet data in native_datagram's methods
It's too easy to shoot yourself in the foot when trying to call
get_src() after packet data was moved.

Reported-by: Asias He <asias@cloudius-systems.com>
2014-10-30 12:40:15 +02:00
Tomasz Grabiec
95975151f6 virtio: change descriptor free list to FIFO instead of LIFO
Based on the observation that, with packets comprised of multiple fragments,
vhost_get_vq_desc() goes higher in the CPU profile. Avi suggested that the
current LIFO handling of free descriptors causes contention on cache
lines between seastar and vhost.

Gives 6-10% boost depending on hardware.
2014-10-29 19:19:54 +02:00
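To make the LIFO versus FIFO distinction in commit 95975151f6 concrete, the sketch below shows a FIFO free list: a freed descriptor goes to the back and is reused only after all older free descriptors, so a descriptor the host may still be touching is not handed out again immediately. The class is hypothetical and uses a `std::deque`; the actual virtio code is likely structured differently.

```cpp
#include <cstdint>
#include <deque>

// Hypothetical FIFO free list of descriptor indices.
class desc_free_list {
    std::deque<std::uint16_t> _free;
public:
    // Start with descriptors 0 .. n-1 all free.
    explicit desc_free_list(std::uint16_t n) {
        for (std::uint16_t i = 0; i < n; ++i) {
            _free.push_back(i);
        }
    }

    bool empty() const { return _free.empty(); }

    // Take the descriptor that has been free the longest.
    // Precondition: !empty().
    std::uint16_t allocate() {
        std::uint16_t d = _free.front();
        _free.pop_front();
        return d;
    }

    // A released descriptor is queued behind all other free ones.
    // (A LIFO list would push and pop at the same end instead.)
    void release(std::uint16_t d) { _free.push_back(d); }
};
```

With four descriptors, allocating 0 and 1 and then releasing 0 makes the next allocations return 2, 3 and only then 0, whereas a LIFO list would return 0 right away.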
Asias He
c16384a9fd tcp: Delete connection when <RST> is received
wrk might send <RST> instead of <FIN> to close a connection.
2014-10-29 10:03:34 +02:00
Asias He
71ecf7650a tcp: Retransmission support
Use a very simple algorithm as a starter.
2014-10-29 10:03:32 +02:00
Gleb Natapov
1c827805bc virtio: Use correct eventfd for virtio rx queue
It is nice to be able to actually kick rx queue from time to time.

Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
2014-10-28 16:47:26 +02:00
Asias He
04dd72efd3 tcp: Support close initiated on server side for posix stack
Fix hang with ab test on posix stack:

ab -n 1000 http://127.0.0.1/

Fixes #3
2014-10-28 12:42:47 +02:00
Avi Kivity
929f714e4c net: allow constructing a packet from const data
Since we're copying the data, we can call const_cast<> without fear.
2014-10-28 11:01:37 +02:00
Avi Kivity
d5675c32a7 net: add ostream support for packet
print your packets with

   print("got packet: %s\n", p);

!
2014-10-28 11:01:37 +02:00
Avi Kivity
ae7c071a01 net: fix packet::append with internal data
2014-10-28 11:01:37 +02:00
Asias He
fd56c6345c net: Remove leftover code in packet.hh
2014-10-28 10:42:18 +02:00
Avi Kivity
6dcf24f98d Move contents of async-action.hh into future-util.hh
2014-10-27 19:28:10 +02:00
Avi Kivity
9fbd13175b net: move mechanics of listening to a tcp connection to tcp.cc
Removes an include of tcp.hh.
2014-10-24 22:18:54 +03:00
Avi Kivity
e18b77d5cd udp: add missing include
2014-10-24 22:18:54 +03:00
Avi Kivity
04db837450 net: move native stack implementation classes to new header file
This will allow us to instantiate them for tcp in tcp.cc, reducing
compile times.
2014-10-24 22:18:54 +03:00
Avi Kivity
332cd6424b ip: use indirection to access tcp
This reduces the number of files that include tcp.hh.
2014-10-24 22:18:46 +03:00
Avi Kivity
ec7b5eeed2 tcp: move ipv4_tcp implementation into tcp.cc
First step in isolating tcp from the rest of the stack.
2014-10-24 21:45:20 +03:00
Asias He
c08879edea tcp: Do not advertise zero window when ACK remote FIN
Follow Linux's behavior.
2014-10-24 16:27:00 +08:00
Asias He
fb0123ec61 tcp: Ack data and FIN in a single packet
2014-10-24 16:25:35 +08:00
Asias He
cbc5e9392f tcp: Send <ACK> packet to ack data only when data is present
2014-10-24 16:24:06 +08:00
Asias He
6018b27bab tcp: Add comments for SYN
2014-10-24 16:23:25 +08:00
Asias He
4717d0bc48 net: Rename stack -> native-stack
2014-10-24 09:14:16 +08:00
Asias He
d251f33123 net: Remove unnecessary include of "stack.hh"
2014-10-24 09:10:23 +08:00
Asias He
e2c2580e81 tcp: Fix ACK with both data and FIN
local  send: <FIN>
remote send: <ACK>

In response to local <FIN> packet, remote can send <ACK> packet acking
both data and FIN.

Avoid consuming extra one byte in data handling.
2014-10-23 12:57:33 +03:00
Asias He
d12e495653 tcp: Support close initiated on server side
2014-10-23 12:57:32 +03:00
Tomasz Grabiec
cec8b6c5de core: fix SIGSEGV in packet::packet(fragment frag, packet&& x)
We move from x._impl in the initializer list, so it will hold nullptr
later.
2014-10-23 11:31:03 +03:00
Asias He
aa06198f0a tcp: Remove tcb from tcbs when connection is closed
Currently, a tcb is inserted into tcbs but never removed.

This patch also removes an unnecessary <ACK> packet (packet #5 below) to
the client.

1) client: <FIN>
2) server: <ACK>
3) server: <FIN>
4) client: <ACK>
5) server: <ACK>
2014-10-22 16:39:49 +03:00
Calle Wilund
40db2c0ba1 Collectd 'daemon' module
Per-cpu value list registry with polling -> udp send

- Allows registration of metric values associated with
  collectd id path (plugin/[plugin-inst/]type[/type-instance]).
- Values are broadcast/sent at periodic intervals. (config)
- Config through seastar.conf / app-template.
- Value registration can be revoked safely, either manually or
  through anchor.

Signed-off-by: Calle Wilund <calle@cloudius-systems.com>
2014-10-22 12:01:16 +03:00
Asias He
6561bde964 net: Add TCP option support
The maximum segment size and window scale options are currently supported.
2014-10-22 10:28:06 +03:00
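The two options named in commit 6561bde964 have fixed encodings in the TCP header: maximum segment size is kind 2 with length 4, and window scale is kind 3 with length 3. The sketch below parses just those two from the options area (the bytes after the fixed 20-byte header) and skips anything else. The struct and function names are made up for this example and do not reflect the Seastar implementation.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical result type holding the two supported options.
struct tcp_opts {
    bool has_mss = false;
    std::uint16_t mss = 0;
    bool has_wscale = false;
    std::uint8_t wscale = 0;
};

// Parse `len` bytes of TCP options starting at `p`.
// Returns false if the option list is malformed.
bool parse_tcp_options(const std::uint8_t* p, std::size_t len, tcp_opts& out) {
    std::size_t i = 0;
    while (i < len) {
        std::uint8_t kind = p[i];
        if (kind == 0) {          // end of option list
            break;
        }
        if (kind == 1) {          // no-operation, a single padding byte
            ++i;
            continue;
        }
        if (i + 1 >= len) {
            return false;         // length byte is missing
        }
        std::uint8_t olen = p[i + 1];
        if (olen < 2 || i + olen > len) {
            return false;         // length too small or runs past the end
        }
        if (kind == 2 && olen == 4) {
            out.has_mss = true;   // MSS is a big-endian 16-bit value
            out.mss = std::uint16_t((p[i + 2] << 8) | p[i + 3]);
        } else if (kind == 3 && olen == 3) {
            out.has_wscale = true;
            out.wscale = p[i + 2];
        }
        i += olen;                // unknown options are skipped
    }
    return true;
}
```

Both options are only meaningful on SYN segments, so a caller would normally run this parser during connection setup.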
Avi Kivity
91782ac6a2 virtio: optimize single-buffer packet deleter
Instead of allocating a vector to store the buffers to be destroyed, in the
case of a single buffer, use an ordinary free deleter.

This doesn't currently help much because the packet is share()d later on,
but we may be able to eliminate the sharing one day.
2014-10-21 11:27:05 +03:00
Avi Kivity
61782fcc05 packet: add a vectored constructor with a deleter
Add packet(Iterator, Iterator, deleter).

(Unfortunately, we have both a template version with a template parameter
named Deleter and a non-template version with a parameter called deleter.
We need to sort the naming out.)
2014-10-21 11:24:27 +03:00
Avi Kivity
7ac12f4839 Merge branch 'virtio'
Remove allocations from the virtio receive and transmit paths.
2014-10-19 10:58:30 +03:00
Glauber Costa
79b9053751 net: parse ethernet address from string
Given a string, return the corresponding ethernet address. This is
especially useful for xen, where we read the mac address from the xenstore.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2014-10-19 10:55:26 +03:00
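A string-to-address conversion of the kind commit 79b9053751 describes could be sketched as below. It assumes the input uses the common `xx:xx:xx:xx:xx:xx` form with exactly two hex digits per byte; the `mac_bytes` alias and `parse_mac` function are hypothetical names, not the ones the commit introduced.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical representation of a 6-byte ethernet address.
using mac_bytes = std::array<std::uint8_t, 6>;

// Value of one hex digit, or -1 if `c` is not a hex digit.
inline int hex_val(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Parse "xx:xx:xx:xx:xx:xx"; throws std::invalid_argument otherwise.
mac_bytes parse_mac(const std::string& s) {
    // Six two-digit groups plus five ':' separators.
    if (s.size() != 17) {
        throw std::invalid_argument("bad ethernet address: " + s);
    }
    mac_bytes out{};
    for (std::size_t i = 0; i < 6; ++i) {
        int hi = hex_val(s[3 * i]);
        int lo = hex_val(s[3 * i + 1]);
        bool sep_ok = (i == 5) || (s[3 * i + 2] == ':');
        if (hi < 0 || lo < 0 || !sep_ok) {
            throw std::invalid_argument("bad ethernet address: " + s);
        }
        out[i] = std::uint8_t(hi * 16 + lo);
    }
    return out;
}
```

Rejecting anything that is not exactly 17 characters keeps the parser strict; a more lenient version could accept single-digit groups or `-` separators if the source of the string required it.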