Add a space after "Checking link status" to prevent it from
merging with "done" if the link is up immediately.
For instance, this is the case for a VF
of a PF with an already established link (e.g. on AWS).
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
We assume that the Rx IPv4, TCP and UDP checksum offload features are either all
supported or all unsupported. The same applies to the Tx UDP and TCP checksum
offload.
Add an assert that checks this assumption.
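A minimal sketch of such a check, assuming the capabilities arrive as bitmasks.
The flag names and values below are hypothetical stand-ins, not the DPDK
definitions:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical capability bits, for illustration only.
constexpr uint32_t rx_ipv4_cksum = 1u << 0;
constexpr uint32_t rx_udp_cksum  = 1u << 1;
constexpr uint32_t rx_tcp_cksum  = 1u << 2;
constexpr uint32_t tx_udp_cksum  = 1u << 3;
constexpr uint32_t tx_tcp_cksum  = 1u << 4;

// True if the bits selected by `mask` are either all set or all clear.
constexpr bool all_or_none(uint32_t caps, uint32_t mask) {
    return (caps & mask) == mask || (caps & mask) == 0;
}

// Asserts the all-together assumption for a port's Rx and Tx capability masks.
inline void check_cksum_caps(uint32_t rx_caps, uint32_t tx_caps) {
    assert(all_or_none(rx_caps, rx_ipv4_cksum | rx_udp_cksum | rx_tcp_cksum));
    assert(all_or_none(tx_caps, tx_udp_cksum | tx_tcp_cksum));
}
```

A port that reports, say, only the IPv4 bit on Rx would trip the first assert.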
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Even if a port has a single queue we still want the RSS feature to be
available in order to make the HW calculate the RSS hash for us.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
DPDK 1.8 provides per-device default Tx and Rx queue configurations in the output
of rte_eth_dev_info_get(). Use them instead of the ixgbe-tuned hardcoded values.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
- Rename: init_port() -> init_port_start().
- Add a function init_port_fini() that contains the code originally inlined in
init_local_queue().
- Move the link state check to init_port_fini() since the link state should
be checked after the port has been started.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
- Move the smp::dpdk_eal_init() code into dpdk::eal::init(), where it belongs.
- Remove the unused "opts" parameter of the dpdk::dpdk_device constructor - all its
usage has been moved to dpdk::eal::init().
- Cleanup in reactor.cc: #if HAVE_DPDK -> #ifdef HAVE_DPDK, since we pass the
-DHAVE_DPDK option to the compiler.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
DPDK initialization creates its own threads and assumes that the application
uses them, otherwise things do not work correctly (rte_lcore_id()
returns an incorrect value, for instance). This patch uses DPDK threads to
run the seastar main loop, making DPDK APIs work as expected.
Some (all?) RSS capable HW provides us with the hash that was used to
select the rx queue the packet was delivered to. If such a hash is available
it is better to use it to forward the packet instead of calculating the hash
ourselves and suffering cache misses.
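A minimal sketch of the idea, assuming a hypothetical packet type that carries
the hash only when the NIC supplied one. The names packet, forward_dst and
sw_hash are illustrative and not taken from the patch:

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical packet: `rss_hash` is meaningful only if `has_rss_hash` is set.
struct packet {
    std::vector<uint8_t> data;
    bool has_rss_hash;
    uint32_t rss_hash;
};

// Picks the destination cpu. The software fallback is passed in as a callable
// so that it runs (and touches packet data) only when no HW hash is available.
inline unsigned forward_dst(const packet& p, unsigned ncpus,
        const std::function<uint32_t(const packet&)>& sw_hash) {
    uint32_t h = p.has_rss_hash ? p.rss_hash : sw_hash(p);
    return h % ncpus;
}
```

With a HW hash present the packet data is never read on the forwarding cpu,
which is where the cache misses are avoided.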
This patch introduces logic to divide cpus between the available hw queue
pairs. Each cpu with a hw qp gets a set of cpus to distribute traffic
to. The algorithm doesn't take any topology considerations into account yet.
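One topology-unaware way to do such a division is plain round-robin. The sketch
below assumes that cpus 0..nqueues-1 own the hw queue pairs and that
1 <= nqueues <= ncpus; both the function name and the round-robin policy are
assumptions for illustration, not necessarily what the patch does:

```cpp
#include <vector>

// For each cpu that owns a hw queue pair (cpus 0..nqueues-1 in this sketch),
// returns the set of cpus it distributes traffic to, itself included.
inline std::vector<std::vector<unsigned>> divide_cpus(unsigned ncpus, unsigned nqueues) {
    std::vector<std::vector<unsigned>> sets(nqueues);
    for (unsigned cpu = 0; cpu < ncpus; ++cpu) {
        sets[cpu % nqueues].push_back(cpu);
    }
    return sets;
}
```

For 5 cpus and 2 queue pairs, cpu 0 serves cpus {0, 2, 4} and cpu 1 serves
cpus {1, 3}.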
This patch uses the NIC's capability to calculate in hardware the IP, TCP
and UDP checksums on outgoing packets, instead of us doing this on the
sending CPU. This can save us quite a bit of calculation (especially for
the TCP/UDP checksum of full-sized packets), and avoid cache pollution on
the CPU when sending cold data.
On my setup this patch improves the performance of a single-cpu memcached
by 6%. Together with the recent patch for receive-side checksum offloading,
the total improvement is 10%.
This patch is somewhat complicated by the fact we have so many different
combinations of checksum-offloading capabilities: while virtio can only
offload layer-4 checksumming (tcp/udp), dpdk lets us offload both ip and
layer-4 checksum. Moreover, some packets are just IP but not TCP/UDP
(e.g., ICMP), and some packets are not even IP (e.g., ARP), so this
patch modifies a few of the hardware-features flags and the per-packet
offload-information flags to fit our new needs.
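The combinations above can be sketched as a small decision function. The
struct, enum and function names are hypothetical and only illustrate which
checksums remain for the CPU. Only the IP header and TCP/UDP checksums are
covered, so e.g. the ICMP checksum is out of scope here:

```cpp
// Hypothetical hardware feature flags (what the device can offload on Tx).
struct hw_features {
    bool tx_csum_ip_offload;
    bool tx_csum_l4_offload;
};

// Hypothetical classification of an outgoing packet.
enum class proto { not_ip, ip_other, tcp, udp };

// Which checksums are still left for the CPU to compute for one packet.
struct sw_work {
    bool ip_csum;
    bool l4_csum;
};

inline sw_work software_checksums(const hw_features& hw, proto p) {
    bool is_ip = p != proto::not_ip;                  // ARP etc. have no IP header
    bool is_l4 = p == proto::tcp || p == proto::udp;  // ICMP etc. are IP only
    return sw_work{is_ip && !hw.tx_csum_ip_offload,
                   is_l4 && !hw.tx_csum_l4_offload};
}
```

A virtio-like device (layer-4 offload only) leaves the IP header checksum of a
TCP packet to software, while a device offloading both leaves nothing.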
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
This patch adds a new class, distributed_device, which is responsible for
initializing the HW device and is shared between all cpus. The old device
class becomes responsible for managing an rx/tx queue pair and is local
to each cpu. Each cpu has to call distributed_device::init_local_queue() to
create its own device. The logic to distribute cpus between the available
queues (in case there are not enough queues for each cpu) currently lives in
distributed_device and is not really implemented yet, so only the single
queue and queues == cpus scenarios are supported for now, but this can
be fixed later.
The plan is to rename "distributed_device" to "device" and "device"
to "queue_pair" in later patches.
If the card supports this (and usually, it does), enable rx checksum
offloading by the card, and avoid calculating the checksums ourselves.
With rx checksum offloading, the card checks the IP header checksum and
the L4 (TCP or UDP) checksum of incoming packets, and gives us a
flag when one of them is wrong, meaning that we do not need to do these
calculations ourselves.
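To illustrate what is being skipped, here is a sketch of the rx-side check for
an IP header. The one's-complement sum follows RFC 1071; the rx_info struct and
the function names are hypothetical:

```cpp
#include <cstddef>
#include <cstdint>

// One's-complement sum of 16-bit big-endian words (RFC 1071), folded to 16 bits.
inline uint16_t ones_complement_sum(const uint8_t* data, size_t len) {
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2) {
        sum += (uint32_t(data[i]) << 8) | data[i + 1];
    }
    if (len & 1) {
        sum += uint32_t(data[len - 1]) << 8;  // pad the odd trailing byte
    }
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);   // fold the carries back in
    }
    return uint16_t(sum);
}

// Hypothetical per-packet rx info filled in by the driver.
struct rx_info {
    bool hw_checked;   // the card verified the checksum
    bool hw_bad_csum;  // the card flagged it as wrong
};

// A header whose checksum field is valid sums to 0xffff.
inline bool ip_header_ok(const uint8_t* hdr, size_t len, rx_info info) {
    if (info.hw_checked) {
        return !info.hw_bad_csum;  // no need to touch the header bytes
    }
    return ones_complement_sum(hdr, len) == 0xffff;
}
```

With the offload enabled only the flag is consulted. The software loop runs
only as a fallback.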
This patch improves memcached performance on my setup by almost 3%.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
- Query the port for its caps.
- Properly adjust the queue numbers according to the caps.
- Enable RSS only if the final queues number is greater than 1.
- Enable Rx VLAN stripping.
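A sketch of the queue number adjustment, assuming the final number is capped by
the port's Rx and Tx limits and by the number of cpus. The port_caps and
queue_config structs are hypothetical, and the exact policy in the patch may
differ:

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical subset of the capabilities reported by the port.
struct port_caps {
    uint16_t max_rx_queues;
    uint16_t max_tx_queues;
};

struct queue_config {
    uint16_t num_queues;
    bool enable_rss;
};

// Caps the queue count by what the port offers and by the number of cpus,
// and enables RSS only when more than one queue remains.
inline queue_config choose_queues(port_caps caps, uint16_t ncpus) {
    uint16_t n = std::min({caps.max_rx_queues, caps.max_tx_queues, ncpus});
    return queue_config{n, n > 1};
}
```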
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
- Create a new class dpdk_eal that initializes DPDK EAL.
- Get rid of portmask crap and provide a port index to a dpdk::net_device
constructor.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Move idle state management out of the smp poller and back into generic code. Each
poller returns whether it did any useful work, and generic code decides whether it
should go idle based on that. If a poller requires constant polling it
should always return true.
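A minimal sketch of this contract, with a hypothetical poller type and function
name (the real reactor code is more involved):

```cpp
#include <functional>
#include <vector>

// Each poller reports whether it did useful work. One that needs constant
// polling simply always returns true.
using poller = std::function<bool()>;

// Runs every poller once and returns true if the loop may go idle.
inline bool poll_once_and_check_idle(const std::vector<poller>& pollers) {
    bool did_work = false;
    for (const auto& p : pollers) {
        did_work |= p();  // no short-circuit: every poller must run
    }
    return !did_work;
}
```

The generic loop goes idle only when every poller reported no work in the same
pass.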
Currently each cpu creates a network device as part of native networking
stack creation, and all cpus create the native networking stack independently,
which makes it impossible to use data initialized by one cpu in another
cpu's networking device initialization. For multiqueue devices, some
parts of the initialization often have to be handled by one cpu, and all other
cpus should wait for the first one before creating their network devices.
Even without multiqueue, proxy devices should be created after the master
device is created so that a proxy device may get a pointer to the master
at creation time (the existing code uses a global per-cpu device pointer and
assumes that the master device is created on cpu 0 to compensate for the lack
of ordering).
This patch makes it possible to delay native networking stack creation
until the network device is created. It allows one cpu to be responsible
for the creation of network devices on multiple cpus. A single queue device
initializes the master device on one cpu and calls the other cpus with a pointer
to the master device and its cpu id, which are used in proxy device creation.
This removes the need for the per-cpu device pointer and the "master on cpu 0"
assumption from the code, since now the master device and the slave devices know
about each other and can communicate directly.
- Currently only a single port and a single queue are supported.
- All DPDK EAL configuration is hard-coded in the dpdk_net_device constructor instead
of coming from the app parameters.
- No offload features are enabled.
- Tx: spins in dpdk_net_device::send() until there is room in the HW ring for
the current packet.
- Tx: copies data from the `packet` frags into the rte_mbuf's data.
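The two Tx points above can be sketched without DPDK as follows. The fragment
struct, linearize() and spin_send() are hypothetical stand-ins: the contiguous
buffer plays the role of the rte_mbuf data area, and try_send stands for a
non-blocking enqueue that returns false while the HW ring is full:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical view of one packet fragment.
struct fragment {
    const uint8_t* base;
    size_t size;
};

// Copies all fragments, in order, into one contiguous buffer.
inline std::vector<uint8_t> linearize(const std::vector<fragment>& frags) {
    std::vector<uint8_t> out;
    for (const auto& f : frags) {
        out.insert(out.end(), f.base, f.base + f.size);
    }
    return out;
}

// Spins until try_send accepts the buffer; returns the number of failed tries.
template <typename TrySend>
unsigned spin_send(const std::vector<fragment>& frags, TrySend try_send) {
    std::vector<uint8_t> buf = linearize(frags);
    unsigned spins = 0;
    while (!try_send(buf)) {
        ++spins;
    }
    return spins;
}
```

Note that the loop never gives up, so it relies on the ring eventually draining.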
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>