992 Commits

Author SHA1 Message Date
Avi Kivity
ea2cfbbcd8 virtio: fix indentation 2014-12-14 10:28:48 +02:00
Avi Kivity
535b447343 circular_buffer: get rid of {pre|post}_push_{front|back}
As Nadav suggests, with the simplified circular_buffer implementation they
no longer provide any value and only obfuscate the code.
2014-12-14 10:00:43 +02:00
Avi Kivity
94a1cdd6e4 Merge branch 'circular_buffer'
circular_buffer simplifications and enhancements.
2014-12-13 18:45:47 +02:00
Avi Kivity
209e0958d2 Merge branch 'nettx'
More virtio and smp batching.
2014-12-13 18:45:25 +02:00
Avi Kivity
9de1b10724 circular_buffer: add unsafe array access method
By allowing access-past-the-end, we can prefetch ahead of the queue without
checking the current queue size.
2014-12-11 22:20:50 +02:00
Avi Kivity
ec0fb398fb circular_buffer: optimize by using masking instead of tests
Since we control the capacity, we can force it to be a power of two,
and use masking instead of tests to handle wraparound.

A side benefit is that we don't have to allocate an extra element.
2014-12-11 22:14:02 +02:00
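The masking idea in the commit above can be sketched as follows. This is a minimal illustration and not Seastar's circular_buffer: the class name `masked_ring`, the `int` element type, and the free-running index scheme are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified ring of ints; not Seastar's circular_buffer.
class masked_ring {
    std::vector<int> _storage;
    std::size_t _begin = 0;  // free-running index of the first element
    std::size_t _end = 0;    // free-running index one past the last element
public:
    // The capacity must be a power of two so that (capacity - 1) is a bit mask.
    explicit masked_ring(std::size_t capacity) : _storage(capacity) {
        assert(capacity != 0 && (capacity & (capacity - 1)) == 0);
    }
    std::size_t size() const { return _end - _begin; }
    bool empty() const { return _begin == _end; }
    bool full() const { return size() == _storage.size(); }
    void push_back(int v) {
        assert(!full());
        _storage[mask(_end++)] = v;   // wraparound handled by masking, no branch
    }
    int pop_front() {
        assert(!empty());
        return _storage[mask(_begin++)];
    }
private:
    std::size_t mask(std::size_t idx) const { return idx & (_storage.size() - 1); }
};
```

Because the indices are never wrapped themselves, `_end - _begin` distinguishes a full ring from an empty one, so in this sketch no spare slot is needed.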
Avi Kivity
aaf9884064 circular_buffer: fix pop_front(), pop_back()
These methods should destroy the objects they are popping.

We probably haven't seen any leaks since we usually move() the item
before popping it.
2014-12-11 21:55:09 +02:00
Avi Kivity
746dfae355 circular_buffer: add array dereference operator
Useful for prefetching.
2014-12-11 21:32:56 +02:00
Avi Kivity
8a5a8192e4 Merge branch 'hugepages' of ../seastar
Allow backing seastar memory with hugetlbfs files.

Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
2014-12-11 19:25:06 +02:00
Avi Kivity
d11803d1b9 smp: batch request processing
We're currently using boost::lockfree::consume_all() to consume
smp requests, but this has two problems:

 1. consume_all() calls consume_one() internally, which means it accesses
    the ring index once per message
 2. we interleave calling the request function with accessing the ring, which
    allows the other side to access the ring again, bouncing ring cache lines.

Fix by copying all available items in one shot, using pop(array), and then
processing them afterwards.
2014-12-11 19:20:50 +02:00
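The copy-then-process pattern described above can be sketched as follows. `toy_ring` is a hypothetical single-threaded stand-in for the lock-free ring, with a `pop(pointer, count)` member that mimics the shape of an array pop; the batch size of 16 and the `index_accesses` counter are assumptions added for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

using work_item = std::function<void()>;

// Hypothetical stand-in for the cross-cpu ring; counts index accesses.
struct toy_ring {
    std::deque<work_item> items;
    std::size_t index_accesses = 0;

    // Moves up to max items into out and returns how many were moved.
    std::size_t pop(work_item* out, std::size_t max) {
        ++index_accesses;   // one access to the shared index per batch
        std::size_t n = 0;
        while (n < max && !items.empty()) {
            out[n++] = std::move(items.front());
            items.pop_front();
        }
        return n;
    }
};

// Drain first, then run the callbacks, so the ring is not touched
// while user code executes.
inline std::size_t process_batch(toy_ring& ring) {
    constexpr std::size_t batch_size = 16;
    work_item local[batch_size];
    std::size_t n = ring.pop(local, batch_size);
    for (std::size_t i = 0; i < n; ++i) {
        assert(local[i] != nullptr);
        local[i]();
    }
    return n;
}
```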
Avi Kivity
5855f0c82a smp: batch completion processing
We're currently using boost::lockfree::consume_all() to consume
smp completions, but this has two problems:

 1. consume_all() calls consume_one() internally, which means it accesses
    the ring index once per message
 2. we interleave calling the request function with accessing the ring, which
    allows the other side to access the ring again, bouncing ring cache lines.

Fix by copying all available items in one shot, using pop(array), and then
processing them afterwards.
2014-12-11 19:20:50 +02:00
Avi Kivity
04488eebea smp: batch messages across smp request/response queues
Instead of incurring the overhead of pushing a message down the queue (two
cache line misses), amortize it over 16 messages (3/4 cache line misses per
batch).

Batch size is limited by poll frequency, so we should adjust that
dynamically.
2014-12-11 19:20:50 +02:00
Avi Kivity
2717ac3c37 smp: improve _pending_fifo flushing
Instead of flushing pending items one by one, flush them all at once,
amortizing the write to the index.
2014-12-11 19:20:50 +02:00
Avi Kivity
b6485bcb7c smp: initialize _pending_fifo on sending cpu
If it needs to be resized, it will cause a deallocation on the wrong cpu,
so initialize it on the sending cpu.

Does not break with circular_buffer<>, but it's not going to be a
circular_buffer<> for long.
2014-12-11 19:20:50 +02:00
Avi Kivity
503f1bf4d0 virtio: batch transmitted packets
Instead of placing packets directly into the virtio ring, add them to
a temporary queue, and flush it when we are polled.  This reduces
cross-cpu writes and kicks.
2014-12-11 19:20:50 +02:00
Avi Kivity
97dff83461 virtio: don't try to complete after posting a buffer, if in poll mode
We will poll for it soon anyway, and completing too soon simply reduces
batching.
2014-12-11 19:15:46 +02:00
Avi Kivity
4e653081a4 virtio: poll mode support
With a new --virtio-poll-mode, poll queues instead of waiting for an
interrupt.

Increases httpd throughput by about 12%.
2014-12-11 19:15:46 +02:00
Pekka Enberg
0a12cb6d65 README: Add libpciaccess-devel package to pre-requisites
It's needed on Fedora to build Seastar.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2014-12-11 14:15:18 +02:00
Gleb Natapov
da53dcff80 net: simplify calculation of number of queues 2014-12-11 13:06:38 +02:00
Gleb Natapov
649210b5b6 net: rename net::distributed_device to net::device 2014-12-11 13:06:32 +02:00
Gleb Natapov
0e70ba69cf net: rename net::device to net::qp 2014-12-11 13:06:27 +02:00
Gleb Natapov
8ff89f7f01 net: remove unused device_placement struct 2014-12-11 13:06:22 +02:00
Avi Kivity
db88632456 reactor: wire up hugetlbfs support 2014-12-11 12:25:31 +02:00
Avi Kivity
4453fd1d6a memory: add support for allocating memory via hugetlbfs
This is a little tricky, since we only know we want hugetlbfs after memory
has been initialized, so we start up in anonymous memory, and later
switch to hugetlbfs by copying it to hugetlb-backed memory and mremap()ing
it back into place.
2014-12-11 12:25:31 +02:00
Avi Kivity
ca2c7d8767 memory: abstract mmap() call
To support hugepages, we will need a different mmap() call, so abstract
it out.
2014-12-11 12:25:31 +02:00
Avi Kivity
0043c1a994 memory: drop duplicate madvise() call 2014-12-11 12:25:31 +02:00
Avi Kivity
38443e2c4c posix: change file_desc mmap API to return an mmap_area
An mmap_area munmap()s itself when destroyed, reclaiming memory.
2014-12-11 12:25:31 +02:00
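A minimal sketch of a self-unmapping region, assuming an anonymous private mapping on a POSIX system. The class name `mapped_region` and its interface are hypothetical and do not reproduce the mmap_area type from the commit; note that failure is detected by comparing against MAP_FAILED rather than nullptr.

```cpp
#include <sys/mman.h>

#include <cassert>
#include <cerrno>
#include <cstddef>
#include <system_error>
#include <utility>

// Hypothetical RAII wrapper; munmap()s the region when destroyed.
class mapped_region {
    void* _addr = nullptr;
    std::size_t _size = 0;
public:
    explicit mapped_region(std::size_t size) : _size(size) {
        assert(size > 0);   // mmap rejects zero-length mappings
        void* p = ::mmap(nullptr, size, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) {   // mmap(2) signals errors with MAP_FAILED
            throw std::system_error(errno, std::system_category(), "mmap");
        }
        _addr = p;
    }
    mapped_region(mapped_region&& o) noexcept
        : _addr(std::exchange(o._addr, nullptr)), _size(std::exchange(o._size, 0)) {}
    mapped_region(const mapped_region&) = delete;
    mapped_region& operator=(const mapped_region&) = delete;
    ~mapped_region() {
        if (_addr) {
            ::munmap(_addr, _size);
        }
    }
    void* get() const { return _addr; }
    std::size_t size() const { return _size; }
};
```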
Avi Kivity
158c61063b posix: allow providing the hint/addr parameter to mmap 2014-12-11 12:25:31 +02:00
Avi Kivity
fe8785fb6a posix: allow specifying mmap flags
Change 'shared' to a flags parameter so that we can specify flags other
than MAP_PRIVATE or MAP_SHARED.
2014-12-11 12:25:31 +02:00
Avi Kivity
ee339bb6ea posix: fix file_desc::map() flags parameter name
It's actually protection, not flags, so change to align with the syscall
to avoid confusion.
2014-12-11 12:25:31 +02:00
Avi Kivity
2e0035dac8 posix: fix file_desc::map() error checking
mmap(2) returns MAP_FAILED on error, not nullptr.
2014-12-11 12:25:31 +02:00
Avi Kivity
c95927f223 posix: add file_desc::size() 2014-12-11 12:25:31 +02:00
Avi Kivity
160907bf05 posix: add support for ftruncate() 2014-12-10 20:04:13 +02:00
Avi Kivity
91dc788a33 posix: add support for creating temporary files 2014-12-10 20:04:13 +02:00
Nadav Har'El
3d874892a7 dpdk: enable transmit-side checksumming offload
This patch uses the NIC's capability to calculate in hardware the IP, TCP
and UDP checksums on outgoing packets, instead of us doing this on the
sending CPU. This can save us quite a bit of calculations (especially for
the TCP/UDP checksum of full-sized packets), and avoid cache-pollution on
the CPU when sending cold data.

On my setup this patch improves the performance of a single-cpu memcached
by 6%. Together with the recent patch for receive-side checksum offloading,
the total improvement is 10%.

This patch is somewhat complicated by the fact we have so many different
combinations of checksum-offloading capabilities; while virtio can only
offload layer-4 checksumming (tcp/udp), dpdk lets us offload both ip and
layer-4 checksum. Moreover, some packets are just IP but not TCP/UDP
(e.g., ICMP), and some packets are not even IP (e.g., ARP), so this
patch modifies a few of the hardware-features flags and the per-packet
offload-information flags to fit our new needs.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2014-12-10 18:05:02 +02:00
Asias He
53f95abd96 virtio: Fix feature setup
This fixes a big tcp_server rx regression.

Before:
========== rxrx ============
Server:  192.168.66.123:10000
Connections:  100
Bytes Sent(MiB):  10000
Total Time(Secs):  85.074086675           --->> big regression!!!
Bandwidth(MiB/Sec):  117.54460601148733

After:
========== rxrx ============
Server:  192.168.66.123:10000
Connections:  100
Bytes Sent(MiB):  10000
Total Time(Secs):  9.905637754
Bandwidth(MiB/Sec):  1009.5261151622362
2014-12-10 11:01:54 +02:00
Avi Kivity
fa5c61d4e4 temporary_buffer: fix wrong oom check
malloc(0) is allowed to return nullptr, so don't throw an exception in
that case.
2014-12-10 10:33:29 +02:00
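A minimal sketch of an out-of-memory check that tolerates a null result for zero-sized requests. The helper name `checked_malloc` is hypothetical and is not part of temporary_buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical helper: throws only when a non-empty allocation fails.
inline void* checked_malloc(std::size_t size) {
    void* p = std::malloc(size);
    // malloc(0) may legitimately return nullptr, so a null result only
    // means out-of-memory when size is non-zero.
    if (!p && size != 0) {
        throw std::bad_alloc();
    }
    assert(p != nullptr || size == 0);
    return p;
}
```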
Avi Kivity
9aadcb7718 Merge branch 'deleter'
Fix a memory leak in packet and bugs in the deleter class that make it likely.
2014-12-10 09:53:59 +02:00
Avi Kivity
441331f158 temporary_buffer: fix missing exception
Since we switched temporary_buffer to malloc(), it no longer throws
an exception after running out of memory, which leads to a segfault
when referencing a null buffer.
2014-12-10 09:53:37 +02:00
Avi Kivity
9ae2075d54 deleter: remove bad/unused interfaces 2014-12-09 20:37:44 +02:00
Avi Kivity
b87a76412c packet: avoid hand-rolled deleter chaining, use deleter::append instead
The hand-rolled deleter chaining in packet::append was invalidated
by the make_free_deleter() optimization, since deleter->_next is no longer
guaranteed to be valid (and deleter::operator->() is still exposed, despite
that).

Switch to deleter::append(), which does the right thing.

Fixes a memory leak in tcp_server.
2014-12-09 20:37:17 +02:00
Avi Kivity
7708627144 deleter: improve make_free_deleter() with null input
While make_free_deleter(nullptr) will function correctly,
deleter::operator bool() on the result will not.

Fix by checking for null, and avoiding the free deleter optimization in
that case -- it doesn't help anyway.
2014-12-09 20:37:16 +02:00
Avi Kivity
15dd8ed1bb deleter: mark as final class
Prevent accidental inheritance.
2014-12-09 20:24:35 +02:00
Gleb Natapov
8bb82512a1 net: enable RSS for V4 IP/UDP/TCP 2014-12-09 18:55:19 +02:00
Gleb Natapov
73f6d943e1 net: separate device initialization from queues initialization
This patch adds new class distributed_device which is responsible for
initializing HW device and it is shared between all cpus. Old device
class responsibility becomes managing rx/tx queue pair and it is local
per cpu. Each cpu has to call distributed_device::init_local_queue() to
create its own device. The logic to distribute cpus between available
queues (in case there are not enough queues for each cpu) is in the
distributed_device currently and not really implemented yet, so only one
queue or queues == cpus scenarios are supported currently, but this can
be fixed later.

The plan is to rename "distributed_device" to "device" and "device"
to "queue_pair" in later patches.
2014-12-09 18:55:14 +02:00
Gleb Natapov
2fb3dc03f6 net: remove unused opts parameter from proxy_net_device constructor 2014-12-09 18:55:05 +02:00
Gleb Natapov
34a8744fd3 smp: wait for all cpus before signaling start promise
If the start promise on the initial cpu is signaled before the other cpus
have their networking stack constructed, collectd initialization crashes,
since it tries to create a UDP socket on all available cpus when the initial
one is ready.
2014-12-09 18:54:56 +02:00
Avi Kivity
7dfd7de8cd future: optimize data-less future<>
A future that does not carry any data (future<>) and its sibling (promise<>)
are heavily used in the code.  We can optimize them by overlaying the
future's payload, which in this case can only be an std::exception_ptr,
with the future state, as a pointer and an enum have disjoint values.

This of course depends on std::exception_ptr being implemented as a pointer,
but as it happens, it is.

With this, sizeof(future<>) is reduced from 24 bytes to 16 bytes.
2014-12-09 10:08:48 +02:00
Asias He
20acb6db9c xen: Fix mismatched signature
Found with clang:

[46/68] CXX build/release/core/xen/evtchn.o
FAILED: clang -MMD -MT build/release/core/xen/evtchn.o -MF
build/release/core/xen/evtchn.o.d -std=gnu++1y   -Wall -Werror
-fvisibility=hidden -pthread -I.  -Wno-mismatched-tags -DHAVE_XEN
-DHAVE_HWLOC -DHAVE_NUMA -O2 -I build/release/gen -c -o
build/release/core/xen/evtchn.o core/xen/evtchn.cc
core/xen/evtchn.cc:83:18: error: 'xen::userspace_evtchn::umask' hides
overloaded virtual function [-Werror,-Woverloaded-virtual]
    virtual void umask(int *port, unsigned count);
                 ^
core/xen/evtchn.hh:38:18: note: hidden overloaded virtual function
'xen::evtchn::umask' declared here: type mismatch at 2nd parameter
('int' vs 'unsigned int')
    virtual void umask(int *port, int count) {};
                 ^
1 error generated.
2014-12-09 09:59:46 +02:00
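The kind of mismatch reported above can be caught at the declaration itself with `override`. The sketch below uses hypothetical names (`channel_base`, `channel_user`, `unmask`) and returns an `int` purely so the example is easy to exercise; it is not the code from evtchn.cc.

```cpp
#include <cassert>

// Hypothetical base class, loosely modelled on the diagnostic above.
struct channel_base {
    virtual ~channel_base() = default;
    // Returns the number of ports handled; the default handles none.
    virtual int unmask(int* port, int count) {
        (void)port;
        (void)count;
        return 0;
    }
};

struct channel_user final : channel_base {
    // With override, the compiler rejects this declaration if its
    // parameter types drift from the base (for example `unsigned count`),
    // instead of silently declaring a new function that hides the base one.
    int unmask(int* port, int count) override {
        assert(port != nullptr);
        return count;
    }
};
```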
Asias He
9a9297c89d ip: Implement fragment timeout and memory usage limit 2014-12-09 09:59:44 +02:00