
83 Commits

Author SHA1 Message Date
Gleb Natapov
13c1324d45 net: provide some statistics via collectd
Provide batching and overall send/received packet stats.
2015-01-08 17:41:26 +02:00
Gleb Natapov
51fb18aba0 net: remove unused variable from virtio 2015-01-08 16:45:01 +02:00
Gleb Natapov
aae617f9f5 net: revert whatever is left from "virtio: batch transmitted packets" commit.
Revert remains of commit 503f1bf4 since there is no need to batch
packets inside virtio any more. Upper layer does it already.
2015-01-06 15:24:10 +02:00
Gleb Natapov
72324f02e2 net: implement bulk sending interface for virtio 2015-01-06 15:24:10 +02:00
Gleb Natapov
d329a0a614 net: remove non polling mode from virtio-net 2014-12-28 14:54:43 +02:00
Avi Kivity
ebf89ac560 virtio: use make_object_deleter 2014-12-16 14:55:02 +02:00
Gleb Natapov
fbef83beb0 net: support for num of cpus > num of queues
This patch introduces logic to divide cpus between the available hw queue
pairs. Each cpu with a hw qp gets a set of cpus to distribute traffic
to. The algorithm doesn't take any topology considerations into account yet.
2014-12-16 10:53:41 +02:00
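The commit does not show its code. Below is a minimal C++ sketch of one possible cpu-to-queue-pair split, assuming cpu i owns queue pair i when i < nqueues and the remaining cpus are spread round-robin over the queue owners. The function name assign_cpus_to_queues is hypothetical, and like the commit's algorithm it ignores topology.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: for each hardware queue pair q (owned by cpu q),
// returns the cpus that q's owner distributes received traffic to.
// Assumes nqueues <= ncpus and ignores cpu topology entirely.
std::vector<std::vector<unsigned>> assign_cpus_to_queues(unsigned ncpus, unsigned nqueues) {
    assert(nqueues > 0 && nqueues <= ncpus);
    std::vector<std::vector<unsigned>> groups(nqueues);
    for (unsigned cpu = 0; cpu < ncpus; ++cpu) {
        // cpus 0..nqueues-1 land in their own group (they own a queue pair);
        // every other cpu is attached round-robin to one of the owners.
        groups[cpu % nqueues].push_back(cpu);
    }
    return groups;
}
```

With 6 cpus and 4 queue pairs, the owners of queue pairs 0 and 1 each forward to one extra cpu (4 and 5), while the other two owners serve only themselves.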
Avi Kivity
a3f08c32de virtio: rename misleading _deleters field
It's just a set of buffers (albeit maintained as unique_ptrs for their
destructors), not the 'deleter' type.
2014-12-15 11:42:33 +02:00
Avi Kivity
38b1398750 virtio: remove outdated TODO re single-fragment packet
We already special case single fragment packets on the receive path.
2014-12-15 11:39:00 +02:00
Avi Kivity
508322c7da virtio: de-futurize receive
Move completion handling (destroy packet, adjust descriptor count) to
a completion function rather than a future. Reduces allocations and tasks
executed.
2014-12-14 18:49:01 +02:00
Avi Kivity
1ee959d3e2 virtio: de-futurize transmit
Move completion handling (destroy packet, adjust descriptor count) to
a completion function rather than a future. Reduces allocations and tasks
executed.
2014-12-14 18:49:01 +02:00
Avi Kivity
c7c0aebf07 virtio: abstract vring request completions
Currently vring request completions are handled by fulfilling a promise
contained in the request.  While promises are very flexible, this comes
at a cost (allocating and executing a task), and this flexibility is unneeded
when request handling is very regular (such as in virtio-net rx and tx
completion handling).

Make vring more flexible by allowing the completion function to be specified
as a template parameter.  No changes to the actual users - they now specify
the completion function as fulfilling the same promise as vring previously
did.
2014-12-14 18:49:01 +02:00
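A simplified C++ sketch of the idea in c7c0aebf07: the completion handler becomes a template parameter of the ring, so completing a request is a direct call instead of fulfilling a per-request promise and scheduling a task. The vring and request types here are hypothetical stand-ins, not Seastar's actual classes, and requests are assumed to complete in the order they were posted.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified ring: the completion callable is fixed at
// compile time, so no promise, allocation or task is needed per request.
template <typename Completion>
class vring {
public:
    struct request {
        std::size_t id;  // stand-in for a request's descriptor chain
    };
    explicit vring(Completion complete) : _complete(std::move(complete)) {}
    void post(request r) { _in_flight.push_back(r); }
    // Called when the host reports n used requests (assumed FIFO here);
    // the completion function is invoked directly for each of them.
    void process_used(std::size_t n) {
        assert(n <= _in_flight.size());
        for (std::size_t i = 0; i < n; ++i) {
            _complete(_in_flight[i]);
        }
        _in_flight.erase(_in_flight.begin(),
                         _in_flight.begin() + static_cast<std::ptrdiff_t>(n));
    }
    std::size_t in_flight() const { return _in_flight.size(); }
private:
    Completion _complete;
    std::vector<request> _in_flight;
};
```

A user such as an rx or tx queue would pass a lambda that, for example, destroys the packet and adjusts its descriptor count; the compiler can inline it into process_used().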
Avi Kivity
a86faf0209 virtio: de-virtualize virt_to_phys
It is not a device property, but a system property.
2014-12-14 18:49:01 +02:00
Avi Kivity
f3d2908757 virtio: move buffer and config out of vring class
Prior to templating it, best to get the common elements out.
2014-12-14 18:49:01 +02:00
Avi Kivity
fcbcc19231 virtio: remove buffer_chain class
It's a concept that is instantiated by its users, not a true class.
2014-12-14 18:49:01 +02:00
Avi Kivity
5c4ae7a726 virtio: minor code movement 2014-12-14 18:49:01 +02:00
Avi Kivity
d14da53171 virtio: move into 'namespace virtio' 2014-12-14 18:49:01 +02:00
Avi Kivity
ea2cfbbcd8 virtio: fix indentation 2014-12-14 10:28:48 +02:00
Avi Kivity
503f1bf4d0 virtio: batch transmitted packets
Instead of placing packets directly into the virtio ring, add them to
a temporary queue, and flush it when we are polled.  This reduces
cross-cpu writes and kicks.
2014-12-11 19:20:50 +02:00
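The batching described in 503f1bf4d0 can be sketched as follows, assuming a std::string stands in for a packet, a vector for the ring, and a counter for host kicks; batching_tx_queue is a hypothetical name. Per aae617f9f5, this batching was later left to the upper layer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical transmit queue: send() only queues the packet locally;
// poll() moves the whole batch into the ring and kicks the host once.
class batching_tx_queue {
public:
    void send(std::string packet) { _pending.push_back(std::move(packet)); }
    void poll() {
        if (_pending.empty()) {
            return;  // nothing to flush, so no kick
        }
        for (auto& p : _pending) {
            _ring.push_back(std::move(p));  // stands in for filling descriptors
        }
        _pending.clear();
        ++_kicks;  // a single notification for the whole batch
    }
    std::size_t kicks() const { return _kicks; }
    std::size_t ring_size() const { return _ring.size(); }
private:
    std::vector<std::string> _pending;
    std::vector<std::string> _ring;
    std::size_t _kicks = 0;
};
```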
Avi Kivity
97dff83461 virtio: don't try to complete after posting a buffer, if in poll mode
We will poll for it soon anyway, and completing too soon simply reduces
batching.
2014-12-11 19:15:46 +02:00
Avi Kivity
4e653081a4 virtio: poll mode support
With the new --virtio-poll-mode option, poll queues instead of waiting for an
interrupt.

Increases httpd throughput by about 12%.
2014-12-11 19:15:46 +02:00
Gleb Natapov
649210b5b6 net: rename net::distributed_device to net::device 2014-12-11 13:06:32 +02:00
Gleb Natapov
0e70ba69cf net: rename net::device to net::qp 2014-12-11 13:06:27 +02:00
Nadav Har'El
3d874892a7 dpdk: enable transmit-side checksumming offload
This patch uses the NIC's capability to calculate in hardware the IP, TCP
and UDP checksums on outgoing packets, instead of us doing this on the
sending CPU. This can save us quite a bit of calculations (especially for
the TCP/UDP checksum of full-sized packets), and avoid cache pollution on
the CPU when sending cold data.

On my setup this patch improves the performance of a single-cpu memcached
by 6%. Together with the recent patch for receive-side checksum offloading,
the total improvement is 10%.

This patch is somewhat complicated by the fact that we have so many different
combinations of checksum-offloading capabilities: while virtio can only
offload layer-4 checksumming (tcp/udp), dpdk lets us offload both ip and
layer-4 checksum. Moreover, some packets are just IP but not TCP/UDP
(e.g., ICMP), and some packets are not even IP (e.g., ARP), so this
patch modifies a few of the hardware-features flags and the per-packet
offload-information flags to fit our new needs.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2014-12-10 18:05:02 +02:00
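A C++ sketch of the per-packet decision 3d874892a7 describes: given which transmit checksums the hardware can offload and what kind of packet is being sent, work out which checksums the sending cpu must still compute. The struct, enum and field names are hypothetical, not Seastar's actual flags; ICMP's own checksum is outside the scope of this sketch.

```cpp
#include <cassert>

// Hypothetical capability flags (illustrative names only).
struct hw_features {
    bool tx_ip_csum_offload = false;  // e.g. dpdk: hardware fills the IPv4 header checksum
    bool tx_l4_csum_offload = false;  // virtio and dpdk: hardware fills TCP/UDP checksums
};

enum class packet_kind { non_ip, ip_other, ip_tcp_udp };  // e.g. ARP, ICMP, TCP/UDP

struct sw_checksum_work {
    bool ip_header = false;  // software must compute the IPv4 header checksum
    bool l4 = false;         // software must compute the TCP/UDP checksum
};

// Decide which checksums remain for the sending cpu.
sw_checksum_work checksums_in_software(const hw_features& hw, packet_kind kind) {
    sw_checksum_work w;
    if (kind == packet_kind::non_ip) {
        return w;  // no IP or TCP/UDP checksum at all
    }
    w.ip_header = !hw.tx_ip_csum_offload;
    if (kind == packet_kind::ip_tcp_udp) {
        w.l4 = !hw.tx_l4_csum_offload;
    }
    return w;
}
```

With virtio-like capabilities (layer-4 offload only), a TCP packet still needs its IP header checksum computed in software; with dpdk-like capabilities, neither checksum does.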
Asias He
53f95abd96 virtio: Fix feature setup
This fixes a big tcp_server rx regression.

Before:
========== rxrx ============
Server:  192.168.66.123:10000
Connections:  100
Bytes Sent(MiB):  10000
Total Time(Secs):  85.074086675           --->> big regression!!!
Bandwidth(MiB/Sec):  117.54460601148733

After:
========== rxrx ============
Server:  192.168.66.123:10000
Connections:  100
Bytes Sent(MiB):  10000
Total Time(Secs):  9.905637754
Bandwidth(MiB/Sec):  1009.5261151622362
2014-12-10 11:01:54 +02:00
Gleb Natapov
73f6d943e1 net: separate device initialization from queues initialization
This patch adds a new class, distributed_device, which is responsible for
initializing the HW device and is shared between all cpus. The old device
class's responsibility becomes managing an rx/tx queue pair, and it is local
per cpu. Each cpu has to call distributed_device::init_local_queue() to
create its own device. The logic to distribute cpus between the available
queues (in case there are not enough queues for each cpu) currently lives in
distributed_device and is not really implemented yet, so only the single-queue
and queues == cpus scenarios are supported for now, but this can
be fixed later.

The plan is to rename "distributed_device" to "device" and "device"
to "queue_pair" in later patches.
2014-12-09 18:55:14 +02:00
Avi Kivity
2ee0239a4a Merge branch 'tgrabiec/zero-copy-2' of github.com:cloudius-systems/seastar-dev
Zero-copy memcached get from Tomasz:

"I've measured memcached on muninn/huginn to be 7.5% better with this on vhost
stack."
2014-12-04 16:31:04 +02:00
Tomasz Grabiec
76a8908b21 virtio: fix indentation 2014-12-03 13:15:09 +01:00
Gleb Natapov
7dbc333da6 core: Allow forwarding from/to any cpu 2014-12-03 17:47:29 +08:00
Gleb Natapov
bf46f9c948 net: Change how networking devices are created
Currently each cpu creates a network device as part of native networking
stack creation, and all cpus create the native networking stack independently,
which makes it impossible to use data initialized by one cpu in another
cpu's networking device initialization. For multiqueue devices, some parts
of the initialization often have to be handled by one cpu, and all other
cpus should wait for the first one before creating their network devices.
Even without multiqueue, proxy devices should be created after the master
device is created, so that a proxy device may get a pointer to the master
at creation time (existing code uses a global per-cpu device pointer and
assumes that the master device is created on cpu 0 to compensate for the lack
of ordering).

This patch makes it possible to delay native networking stack creation
until the network device is created. It allows one cpu to be responsible
for creating network devices on multiple cpus. A single-queue device
initializes the master device on one cpu and calls the other cpus with a pointer
to the master device and its cpu id, which are used in proxy device creation.
This removes the need for the per-cpu device pointer and the "master on cpu 0"
assumption, since the master and slave devices now know
about each other and can communicate directly.
2014-11-30 18:10:08 +02:00
Asias He
88a1a37a88 ip: Support IP fragmentation in TX path
Tested with UDP sending large datagrams with ufo off.
2014-11-30 10:16:38 +02:00
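The commit gives no detail on the fragmentation code. Below is a minimal sketch of how an IPv4 payload can be split for transmission, assuming a 20-byte header without options. It relies on the standard IPv4 rules that the fragment offset field counts 8-byte units and that the MF (more fragments) flag is set on every fragment except the last; fragment_payload and ip_fragment are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct ip_fragment {
    std::size_t offset_bytes;   // where this fragment's data starts in the payload
    std::size_t length;         // payload bytes carried by this fragment
    std::uint16_t frag_offset;  // IPv4 fragment-offset field value (8-byte units)
    bool more_fragments;        // MF flag
};

// Split payload_len bytes into fragments that fit in mtu, assuming a 20-byte
// IPv4 header. All fragments but the last carry a multiple of 8 bytes.
std::vector<ip_fragment> fragment_payload(std::size_t payload_len, std::size_t mtu) {
    constexpr std::size_t ip_hdr_len = 20;
    assert(mtu >= ip_hdr_len + 8);
    const std::size_t max_data = (mtu - ip_hdr_len) & ~std::size_t(7);  // round down to 8
    std::vector<ip_fragment> frags;
    std::size_t off = 0;
    while (off < payload_len) {
        std::size_t len = payload_len - off;
        const bool more = len > max_data;
        if (more) {
            len = max_data;
        }
        frags.push_back({off, len, static_cast<std::uint16_t>(off / 8), more});
        off += len;
    }
    return frags;
}
```

For example, a 3000-byte payload with a 1500-byte MTU yields fragments of 1480, 1480 and 40 bytes with offset field values 0, 185 and 370.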
Avi Kivity
88b38bfbdf Revert "virtio: Lazy interrupts"
This reverts commit 817023f91741e43731823e72d60800016cbf2633; causes hangs
and throughput problems.
2014-11-24 09:28:41 +02:00
Vlad Zolotarov
1238807d98 net: implement a few proper constructors for ethernet_address
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2014-11-23 23:26:54 +02:00
Asias He
817023f917 virtio: Lazy interrupts
Tell host to interrupt less. This is useful for tx queue completion
since we do not care much when the tx is completed exactly.

Passed test with memcached and tcp_server.
2014-11-18 10:17:38 +02:00
Nadav Har'El
5b24dd78e2 virtio: don't use file eventfd for OSv notifications
Now that our reactor supports non-file-descriptor notification
mechanisms, switch to using one instead of eventfd when notifying
of virtio interrupts.

This will allow us to change the OSv enable_interrupt() code to
run the handler directly, not in a separate thread, because it
no longer needs to do sleepable write() to an eventfd file descriptor.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2014-11-13 22:24:38 +02:00
Gleb Natapov
d77ee625bd virtio: signal availability of a virtio buffer in a vring after sending packet
Currently there is an implicit unbounded queue between the virtio driver
and the networking stack where packets may accumulate if they are received
faster than the networking stack can handle them. The queuing happens because
virtio buffer availability is signaled immediately after the received-buffer
promise is fulfilled, but promise fulfilment does not mean that the buffer is
processed, only that the task that will process it is placed on a task queue.

The patch fixes the problem by making a virtio buffer available only after
the previous buffer's completion task is executed. This bounds the aforementioned
implicit queue between the virtio driver and the networking stack by the virtio
ring size.
2014-11-04 15:19:27 +02:00
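A toy C++ model of the bound introduced by d77ee625bd, assuming the host simply cannot deliver while no receive buffer is posted: a buffer is handed back to the host only after the stack has processed the packet in it, so unprocessed packets never exceed the ring size. rx_ring_model is a hypothetical name, and integers stand in for packets.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical model: ring_size receive buffers; a buffer is re-posted
// only when the packet it carried has been fully processed.
class rx_ring_model {
public:
    explicit rx_ring_model(std::size_t ring_size) : _available(ring_size) {}
    // Host tries to deliver a packet; fails when no buffer is posted.
    bool host_deliver(int packet) {
        if (_available == 0) {
            return false;
        }
        --_available;
        _pending.push_back(packet);
        return true;
    }
    // The stack finishes one packet; only now does its buffer become available.
    bool process_one() {
        if (_pending.empty()) {
            return false;
        }
        _pending.pop_front();
        ++_available;
        return true;
    }
    std::size_t pending() const { return _pending.size(); }
private:
    std::size_t _available;
    std::deque<int> _pending;
};
```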
Gleb Natapov
99941f0c16 virtio: remove feedback from virtio_net_device::queue_rx_packet()
Instead of providing back pressure towards NIC, which will cause NIC to
slow down and drop packets, network stack should drop packets it cannot
handle by itself. Otherwise one slow receiver may cause drops for all
others. Our native network stack correctly drops packets instead of
providing feedback, so it is safe to just remove feedback from the API.
2014-11-04 15:19:13 +02:00
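A hypothetical sketch of the policy in 99941f0c16: each receiver gets a bounded queue, and a packet for a full queue is dropped and counted rather than reported back to the driver, so a slow receiver only loses its own packets. rx_dispatcher, deliver() and the per-queue limit are illustrative assumptions, not the real queue_rx_packet() signature.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical dispatcher with one bounded queue per receiver.
class rx_dispatcher {
public:
    rx_dispatcher(std::size_t receivers, std::size_t per_queue_limit)
        : _queues(receivers), _drops(receivers, 0), _limit(per_queue_limit) {}
    // Returns nothing: the driver gets no back-pressure signal.
    void deliver(std::size_t receiver, int packet) {
        auto& q = _queues.at(receiver);
        if (q.size() >= _limit) {
            ++_drops.at(receiver);  // drop locally; other receivers are unaffected
            return;
        }
        q.push_back(packet);
    }
    std::size_t queued(std::size_t r) const { return _queues.at(r).size(); }
    std::size_t drops(std::size_t r) const { return _drops.at(r); }
private:
    std::vector<std::deque<int>> _queues;
    std::vector<std::size_t> _drops;
    std::size_t _limit;
};
```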
Tomasz Grabiec
95fd885996 virtio: fix typo 2014-10-30 19:50:58 +02:00
Nadav Har'El
f497299f44 virtio: support virtio ring assigned from OSv
As a second option beyond running on Linux with vhost, this patch
allows Seastar to run in OSv with the virtio network device "assigned"
to the application (i.e., we use the virtio rings directly, with no OSv
involvement beyond the initial setup).

To use this feature, one needs to compile Seastar with the "HAVE_OSV"
flag, the osv::assigned_virtio::get() symbol needs to be available
(which means we run under OSv), and it should return a non-null object
(which means OSv was run with --assign-net).

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2014-10-30 16:45:08 +02:00
Nadav Har'El
4b44968e86 virtio: expose notifier's wake_wait
The wake_wait() method is only available for the notifier. Expose it
from the vring holding this notifier, and from the rx or tx queue holding
this vring.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2014-10-30 16:45:07 +02:00
Nadav Har'El
5db5f7622a virtio: make virtio_net_device an abstract class
Make virtio_net_device an abstract class, and move the vhost-specific
code to a subclass, virtio_net_device_vhost.

In a subsequent patch, we'll have a second subclass, for a virtio
device assigned from OSv.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2014-10-30 16:45:07 +02:00
Nadav Har'El
8326f43ded virtio: make virt_to_phys a virtual function
In the existing code, virt_to_phys() was a fixed do-nothing function.
This is good for vhost, but not good enough in OSv, where converting
virtual addresses to physical ones requires an actual calculation.

The solution in this patch, using a virtual function, is not optimal
and should probably be replaced with a template later.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2014-10-30 16:45:06 +02:00
Nadav Har'El
db16e4f634 virtio: separate notification from vring
Currently, the "vring" class is hardcoded to do guest-host notifications
via eventfd. This patch switches to a general "notification object" with
two virtual functions - host_notify(), which unconditionally notifies the
host, and host_wait() which returns a future<> on which one can wait for
the host to notify us.

This patch provides one implementation of this notification object, using
eventfd as before, as needed when using vhost. We'll later provide a
different implementation for running under OSv.

This patch uses pointers and virtual functions; this adds a bit of
overhead to every notification, but it is small compared to the other
costs of these notifications. Nevertheless, we can change it in the
future to make the notification object a template parameter instead of
an abstract class.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2014-10-30 16:45:06 +02:00
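A C++ sketch of the notification object from db16e4f634, with a callback standing in for the future<> that host_wait() returns in Seastar. The notifier interface mirrors the two functions the commit names; counting_notifier is a hypothetical test double in place of the eventfd (vhost) or OSv implementations, and this vring is a stripped-down stand-in.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Abstract notification object: how the guest and host signal each other.
class notifier {
public:
    virtual ~notifier() = default;
    virtual void host_notify() = 0;  // unconditionally notify the host
    // Seastar's version returns future<>; this sketch takes a callback instead.
    virtual void host_wait(std::function<void()> on_notified) = 0;
};

// Hypothetical test double: counts kicks and lets the caller fire the host signal.
class counting_notifier final : public notifier {
public:
    void host_notify() override { ++kicks; }
    void host_wait(std::function<void()> on_notified) override {
        _waiter = std::move(on_notified);
    }
    void simulate_host_interrupt() {
        if (_waiter) {
            auto w = std::move(_waiter);
            _waiter = nullptr;
            w();
        }
    }
    int kicks = 0;
private:
    std::function<void()> _waiter;
};

// The ring depends only on the interface, not on eventfd or OSv details.
class vring {
public:
    explicit vring(notifier& n) : _notifier(n) {}
    void kick() { _notifier.host_notify(); }
    void wait_for_host(std::function<void()> cb) { _notifier.host_wait(std::move(cb)); }
private:
    notifier& _notifier;
};
```

Because vring only sees notifier&, each notification costs one virtual call; the alternative the commit mentions is to make the notifier type a template parameter of vring.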
Tomasz Grabiec
95975151f6 virtio: change descriptor free list to FIFO instead of LIFO
Based on the observation that with packets composed of multiple fragments,
vhost_get_vq_desc() rises higher in the CPU profile. Avi suggested that the
current LIFO handling of free descriptors causes contention on cache
lines between seastar and vhost.

Gives 6-10% boost depending on hardware.
2014-10-29 19:19:54 +02:00
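A minimal sketch of a FIFO descriptor free list, assuming descriptors are 16-bit indices into the ring; descriptor_free_list is a hypothetical name. Freed descriptors go to the back and are handed out last, whereas a LIFO stack would reuse the most recently freed descriptor immediately, which per the commit caused cache-line contention with vhost.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <optional>

// Hypothetical FIFO free list of descriptor indices.
class descriptor_free_list {
public:
    explicit descriptor_free_list(std::uint16_t ring_size) {
        for (std::uint16_t i = 0; i < ring_size; ++i) {
            _free.push_back(i);
        }
    }
    // Hand out the descriptor that has been free the longest.
    std::optional<std::uint16_t> allocate() {
        if (_free.empty()) {
            return std::nullopt;
        }
        std::uint16_t d = _free.front();
        _free.pop_front();
        return d;
    }
    // A freed descriptor goes to the back of the queue.
    void release(std::uint16_t d) { _free.push_back(d); }
private:
    std::deque<std::uint16_t> _free;
};
```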
Gleb Natapov
1c827805bc virtio: Use correct eventfd for virtio rx queue
It is nice to be able to actually kick rx queue from time to time.

Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
2014-10-28 16:47:26 +02:00
Avi Kivity
6dcf24f98d Move contents of async-action.hh into future-util.hh 2014-10-27 19:28:10 +02:00
Avi Kivity
91782ac6a2 virtio: optimize single-buffer packet deleter
Instead of allocating a vector to store the buffers to be destroyed, in the
case of a single buffer, use an ordinary free deleter.

This doesn't currently help much because the packet is share()d later on,
but we may be able to eliminate the sharing one day.
2014-10-21 11:27:05 +03:00
Avi Kivity
e6834b9fb3 virtio: remove allocations from transmit path
Instead of allocating a buffer vector, construct a "virtual vector"
that transforms packet fragments as needed.
2014-10-15 17:17:01 +03:00
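The "virtual vector" from e6834b9fb3 can be sketched as a view that converts each packet fragment into a descriptor-ready buffer only when it is indexed, so the transmit path never allocates a std::vector of buffers. The fragment and vring_buffer layouts and the transformed_view and make_view names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct fragment { char* base; std::size_t size; };               // hypothetical packet fragment
struct vring_buffer { std::uint64_t addr; std::uint32_t len; };  // hypothetical descriptor input

// Read-only view: operator[] applies the transform on demand, no allocation.
template <typename Transform>
class transformed_view {
public:
    transformed_view(const fragment* frags, std::size_t n, Transform t)
        : _frags(frags), _n(n), _t(t) {}
    std::size_t size() const { return _n; }
    auto operator[](std::size_t i) const { return _t(_frags[i]); }
private:
    const fragment* _frags;
    std::size_t _n;
    Transform _t;
};

template <typename Transform>
transformed_view<Transform> make_view(const fragment* frags, std::size_t n, Transform t) {
    return {frags, n, t};
}
```

A caller would pass a lambda that applies virt_to_phys() to f.base and narrows f.size to the descriptor length; under vhost that address mapping was the identity.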
Avi Kivity
ba5447871b virtio: switch to allocating virtio descriptors front-to-back
Simplifies requirements on callers.
2014-10-15 17:17:01 +03:00
Avi Kivity
a331b5a129 virtio: move vring::buffer::completed to vring::buffer_chain
We aren't interested in completion of a buffer, just a buffer_chain (aka
request).  Move it there to simplify things.
2014-10-15 17:17:01 +03:00