This is the basic support for xenfront. It can be used in domU, provided there
is a network interface to be hijacked.
The code that follows is just the mechanics of managing the grants, event
channels, etc.
However, it does not yet work: I can't see netback injecting any data into it.
I am still debugging the protocol, but I wanted to flush the current state.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Currently there is an implicit unbounded queue between the virtio driver
and the networking stack where packets may accumulate if they are received
faster than the networking stack can handle them. The queuing happens because
virtio buffer availability is signaled immediately after the received-buffer
promise is fulfilled, but promise fulfilment does not mean that the buffer is
processed, only that the task that will process it has been placed on a task
queue.
The patch fixes the problem by making a virtio buffer available only after
the previous buffer's completion task has executed. This bounds the
aforementioned implicit queue between the virtio driver and the networking
stack by the virtio ring size.
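The recycling discipline above can be sketched as follows. This is a minimal, illustrative model (the `rx_ring` class and its method names are invented for this sketch, not Seastar's API): a descriptor is returned to the device only from the completion task, so at most ring-size packets can ever sit between the driver and the stack.

```cpp
#include <cassert>
#include <cstddef>

// Sketch: descriptors go back to the device only after the completion
// task ran, so in-flight packets are bounded by the ring size.
class rx_ring {
    std::size_t _size;   // ring size = hard bound on queued packets
    std::size_t _avail;  // descriptors currently owned by the device
public:
    explicit rx_ring(std::size_t size) : _size(size), _avail(size) {}

    // the device consumes one descriptor per received packet
    bool take_buffer() {
        if (_avail == 0) {
            return false;  // ring exhausted: the NIC must wait (or drop)
        }
        --_avail;
        return true;
    }

    // called from the *completion task*, not from promise fulfilment
    void recycle_buffer() {
        assert(_avail < _size);
        ++_avail;
    }

    std::size_t available() const { return _avail; }
};
```

If `recycle_buffer()` were instead called as soon as the promise was fulfilled, `take_buffer()` would never fail and packets could pile up without bound in the task queue.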
Instead of providing back pressure towards the NIC, which will cause the NIC
to slow down and drop packets, the network stack should drop packets it
cannot handle by itself. Otherwise one slow receiver may cause drops for all
others. Our native network stack correctly drops packets instead of
providing feedback, so it is safe to simply remove the feedback from the API.
The Ethernet frame might contain extra bytes after the IP packet for
padding. Trim the extra bytes in order not to confuse TCP.
E.g. when establishing a TCP connection:
1) <SYN>
2) <SYN,ACK>
3) <ACK>
Packet 3) should be 14 + 20 + 20 = 54 bytes, but the sender might send a
packet of 60 bytes, containing 6 extra bytes of padding.
Fix httpd on ran/sif.
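The trim amount follows directly from the IPv4 total-length field. A minimal sketch of the calculation (the function name `padding_to_trim` is illustrative; sizes assume IPv4 with no options):

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t eth_hdr_len = 14;  // Ethernet header, no VLAN tag

// frame_len:    bytes received from the wire (may include padding)
// ip_total_len: value of the IPv4 "total length" header field
// returns the number of trailing pad bytes to trim before handing
// the packet to TCP
std::size_t padding_to_trim(std::size_t frame_len, std::size_t ip_total_len) {
    std::size_t expected = eth_hdr_len + ip_total_len;
    return frame_len > expected ? frame_len - expected : 0;
}
```

For the <ACK> above, the IP total length is 40 (20 IP + 20 TCP), so a 60-byte frame yields 60 - (14 + 40) = 6 bytes to trim.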
Currently net::proxy::send() waits for the previous packet to be sent to
another cpu before accepting the next packet. This slows down the sender too
much.
This patch implements a virtual queue that allows many packets to be sent
simultaneously, but starts to drop packets when the number of outstanding
packets goes over the limit; if we tried to queue them instead, we would run
out of memory whenever a sender generates packets faster than they can be
sent. It also tries to generate back pressure by returning a future that
will become ready when the queue goes back under the limit, but it returns
one only to the first sender that goes over it. Doing this for all callers
may be an option too, but it is not clear which ones, and how many, we
should wake when the queue goes under the limit again.
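The drop-over-limit policy with a single recorded waiter can be sketched like this (a simplified model, not the actual net::proxy code; a plain callback stands in for the future, and all names are illustrative):

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <utility>

// Sketch: accept packets up to a limit, drop beyond it, and record a
// "wake me" callback only for the first sender that hits the limit.
class send_queue {
    std::size_t _limit;
    std::size_t _outstanding = 0;
    std::optional<std::function<void()>> _waiter;  // first blocked sender
public:
    explicit send_queue(std::size_t limit) : _limit(limit) {}

    // returns true if the packet was queued, false if it was dropped
    bool send(std::function<void()> on_ready) {
        if (_outstanding >= _limit) {
            if (!_waiter) {
                _waiter = std::move(on_ready);  // only the first sender
            }
            return false;  // packet dropped
        }
        ++_outstanding;
        return true;
    }

    // called when a packet has been delivered to the other cpu
    void complete() {
        --_outstanding;
        if (_waiter && _outstanding < _limit) {
            auto wake = std::move(*_waiter);
            _waiter.reset();
            wake();  // wake the single recorded sender
        }
    }
};
```

Waking only one sender avoids the open question from the commit message: with many recorded waiters we would have to decide how many to release before the queue overflows again.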
As a second option beyond running on Linux with vhost, this patch
allows Seastar to run in OSv with the virtio network device "assigned"
to the application (i.e., we use the virtio rings directly, with no OSv
involvement beyond the initial setup).
To use this feature, one needs to compile Seastar with the "HAVE_OSV"
flag, the osv::assigned_virtio::get() symbol needs to be available
(which means we are running under OSv), and it should return a non-null
object (which means OSv was run with --assign-net).
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
The wake_wait() method is only available for the notifier. Expose it
from the vring holding this notifier, and from the rx or tx queue holding
this vring.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Make virtio_net_device an abstract class, and move the vhost-specific
code to a subclass, virtio_net_device_vhost.
In a subsequent patch, we'll have a second subclass, for a virtio
device assigned from OSv.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
In the existing code, virt_to_phys() was a fixed do-nothing function.
This is good for vhost, but not good enough in OSv, where converting
virtual addresses to physical ones requires an actual calculation.
The solution in this patch, using a virtual function, is not optimal
and should probably be replaced with a template later.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
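The shape of the virtual-function solution can be sketched as follows (class names and the fixed-offset OSv translation are illustrative stand-ins, not the real code; OSv's actual translation would consult its memory layout):

```cpp
#include <cassert>
#include <cstdint>

// Sketch: vhost can treat guest-virtual addresses as-is, while OSv
// must translate; a virtual function keeps the common vring code
// generic across both cases.
struct phys_translator {
    virtual ~phys_translator() = default;
    virtual std::uintptr_t virt_to_phys(const void* p) const = 0;
};

// vhost: the host maps our whole address space, so identity suffices
struct vhost_translator : phys_translator {
    std::uintptr_t virt_to_phys(const void* p) const override {
        return reinterpret_cast<std::uintptr_t>(p);
    }
};

// OSv: an actual calculation is needed; a fixed offset stands in for
// whatever the real lookup would compute
struct osv_translator : phys_translator {
    std::uintptr_t offset;
    explicit osv_translator(std::uintptr_t off) : offset(off) {}
    std::uintptr_t virt_to_phys(const void* p) const override {
        return reinterpret_cast<std::uintptr_t>(p) - offset;
    }
};
```

As the commit notes, the virtual call adds per-translation overhead; making the translator a template parameter would remove it at the cost of duplicated instantiations.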
Currently, the "vring" class is hardcoded to do guest-host notifications
via eventfd. This patch switches to a general "notification object" with
two virtual functions - host_notify(), which unconditionally notifies the
host, and host_wait() which returns a future<> on which one can wait for
the host to notify us.
This patch provides one implementation of this notification object, using
eventfd as before, as needed when using vhost. We'll later provide a
different implementation for running under OSv.
This patch uses pointers and virtual functions; this adds a bit of
overhead to every notification, but it is small compared to the other
costs of these notifications. Nevertheless, we can change it in the
future to make the notification object a template parameter instead of
an abstract class.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
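The notification-object interface described above can be sketched as follows (a simplified model with illustrative names; the real host_wait() returns a future<> rather than blocking, and the real implementation uses actual eventfd read/write syscalls, modeled here by a counter):

```cpp
#include <cassert>

// Sketch of the abstract notification object: host_notify() kicks the
// host unconditionally; host_wait() waits for the host to notify us.
struct notifier {
    virtual ~notifier() = default;
    virtual void host_notify() = 0;
    virtual void host_wait() = 0;
};

// eventfd-based implementation used with vhost; a counter stands in
// for the eventfd's internal count
struct eventfd_notifier : notifier {
    int pending = 0;
    void host_notify() override {
        ++pending;  // real code: write() to the kick eventfd
    }
    void host_wait() override {
        // real code: read() on the call eventfd (blocking, or via a
        // future); here we just consume one pending event
        if (pending > 0) {
            --pending;
        }
    }
};
```

An OSv-specific subclass would implement the same two virtual functions against the assigned virtio device's interrupt mechanism instead of eventfds.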
It's too easy to shoot yourself in the foot when trying to call
get_src() after the packet's data was moved.
Reported-by: Asias He <asias@cloudius-systems.com>
Based on the observation that, with packets comprised of multiple fragments,
vhost_get_vq_desc() climbs higher in the CPU profile. Avi suggested that the
current LIFO handling of free descriptors causes contention on cache
lines between seastar and vhost.
Gives a 6-10% boost depending on hardware.
local send: <FIN>
remote send: <ACK>
In response to local <FIN> packet, remote can send <ACK> packet acking
both data and FIN.
Avoid consuming one extra byte in data handling.
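The off-by-one comes from the fact that a FIN occupies one sequence number of its own. A minimal sketch of the accounting (the function name and parameters are illustrative, not the stack's actual code):

```cpp
#include <cassert>
#include <cstdint>

// snd_una:   oldest unacknowledged sequence number
// ack:       acknowledgment number from the peer's <ACK>
// fin_acked: true when our FIN falls within the acked range
// returns the number of *data* bytes acknowledged
std::uint32_t data_bytes_acked(std::uint32_t snd_una, std::uint32_t ack,
                               bool fin_acked) {
    std::uint32_t acked = ack - snd_una;  // wraps correctly mod 2^32
    if (fin_acked && acked > 0) {
        --acked;  // the FIN's sequence number is not a data byte
    }
    return acked;
}
```

When the peer acks 10 data bytes together with our FIN, the ack advances by 11, but only 10 bytes of the send buffer should be released.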
Currently tcbs are inserted into, but never removed from, the tcbs list.
This patch also removes an unnecessary <ACK> packet (packet #5 below) to
the client.
1) client: <FIN>
2) server: <ACK>
3) server: <FIN>
4) client: <ACK>
5) server: <ACK>
Per-cpu value list registry with polling -> udp send
- Allows registration of metric values associated with a
  collectd id path (plugin/[plugin-inst/]type[/type-instance]).
- Values are broadcast/sent at periodic intervals (configurable).
- Config through seastar.conf / app-template.
- Value registration can be revoked safely, either manually or
through anchor.
Signed-off-by: Calle Wilund <calle@cloudius-systems.com>
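The registration model can be sketched like this (an illustrative stand-in, not the actual Seastar collectd API: values are registered under an id path, polled at each interval, and registrations can be revoked at any time):

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Sketch: per-cpu registry of polled metric values keyed by a
// collectd-style id path.
class metrics_registry {
    std::map<std::string, std::function<double()>> _values;
public:
    // register a polled value under plugin/[inst/]type[/type-inst]
    void add(const std::string& id, std::function<double()> poll) {
        _values[id] = std::move(poll);
    }

    // revoke a registration (safe to call at any time)
    void remove(const std::string& id) {
        _values.erase(id);
    }

    // called at each periodic interval to gather values for sending
    // (the real code would then serialize them into collectd UDP packets)
    std::map<std::string, double> poll_all() const {
        std::map<std::string, double> out;
        for (const auto& [id, fn] : _values) {
            out[id] = fn();
        }
        return out;
    }
};
```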
Instead of allocating a vector to store the buffers to be destroyed, in the
case of a single buffer, use an ordinary free deleter.
This doesn't currently help much because the packet is share()d later on,
but it may pay off if we are able to eliminate the sharing one day.
Add packet(Iterator, Iterator, deleter).
(Unfortunately we have both a template version with a template parameter
named Deleter, and a non-template version with a parameter called deleter.
The naming needs to be sorted out.)
Given a string, return the corresponding ethernet address. This is
especially useful for xen, where we read the mac address from the xenstore.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
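The conversion can be sketched as follows (a minimal version; the function name `parse_mac` is illustrative and error handling is reduced to returning a zeroed address on malformed input):

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Sketch: parse "aa:bb:cc:dd:ee:ff" into 6 bytes, as when reading the
// mac address out of the xenstore.
std::array<std::uint8_t, 6> parse_mac(const std::string& s) {
    std::array<std::uint8_t, 6> mac{};
    unsigned b[6];
    if (std::sscanf(s.c_str(), "%x:%x:%x:%x:%x:%x",
                    &b[0], &b[1], &b[2], &b[3], &b[4], &b[5]) == 6) {
        for (int i = 0; i < 6; ++i) {
            mac[i] = static_cast<std::uint8_t>(b[i]);
        }
    }
    return mac;
}
```

Xen guest mac addresses conventionally start with the 00:16:3e OUI, so e.g. "00:16:3e:01:02:03" parses to {0x00, 0x16, 0x3e, 0x01, 0x02, 0x03}.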