969 Commits

Author SHA1 Message Date
Avi Kivity
db88632456 reactor: wire up hugetlbfs support 2014-12-11 12:25:31 +02:00
Avi Kivity
4453fd1d6a memory: add support for allocating memory via hugetlbfs
This is a little tricky, since we only know we want hugetlbfs after memory
has been initialized, so we start up in anonymous memory, and later
switch to hugetlbfs by copying it to hugetlb-backed memory and mremap()ing
it back into place.
2014-12-11 12:25:31 +02:00
Avi Kivity
ca2c7d8767 memory: abstract mmap() call
To support hugepages, we will need a different mmap() call, so abstract
it out.
2014-12-11 12:25:31 +02:00
Avi Kivity
0043c1a994 memory: drop duplicate madvise() call 2014-12-11 12:25:31 +02:00
Avi Kivity
38443e2c4c posix: change file_desc mmap API to return an mmap_area
An mmap_area munmap()s itself when destroyed, reclaiming memory.
2014-12-11 12:25:31 +02:00
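A minimal sketch of the idea behind an area that unmaps itself, assuming a Linux/POSIX host. The class name mmap_area_sketch and its members are hypothetical and are not taken from the actual posix header; the real mmap_area may be built differently.

```cpp
#include <sys/mman.h>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical RAII owner of a mapping: munmap() runs in the destructor.
class mmap_area_sketch {
    void* _addr;
    std::size_t _size;
public:
    mmap_area_sketch(void* addr, std::size_t size) : _addr(addr), _size(size) {}
    // Moving transfers ownership; the moved-from object owns nothing.
    mmap_area_sketch(mmap_area_sketch&& o) noexcept
        : _addr(std::exchange(o._addr, MAP_FAILED))
        , _size(std::exchange(o._size, 0)) {}
    mmap_area_sketch(const mmap_area_sketch&) = delete;
    mmap_area_sketch& operator=(const mmap_area_sketch&) = delete;
    ~mmap_area_sketch() {
        if (_addr != MAP_FAILED) {
            ::munmap(_addr, _size);  // reclaim the memory
        }
    }
    void* get() const { return _addr; }
    std::size_t size() const { return _size; }
};
```

With this shape, a caller cannot forget to unmap: the mapping lives exactly as long as the object that owns it.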
Avi Kivity
158c61063b posix: allow providing the hint/addr parameter to mmap 2014-12-11 12:25:31 +02:00
Avi Kivity
fe8785fb6a posix: allow specifying mmap flags
Change 'shared' to a flags parameter so that we can specify flags other
than MAP_PRIVATE or MAP_SHARED.
2014-12-11 12:25:31 +02:00
Avi Kivity
ee339bb6ea posix: fix file_desc::map() flags parameter name
It's actually protection, not flags, so change to align with the syscall
to avoid confusion.
2014-12-11 12:25:31 +02:00
Avi Kivity
2e0035dac8 posix: fix file_desc::map() error checking
mmap(2) returns MAP_FAILED on error, not nullptr.
2014-12-11 12:25:31 +02:00
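The check described above can be illustrated roughly as follows. map_anonymous is a hypothetical helper written for this illustration, not the actual file_desc::map() code; only mmap(2) and MAP_FAILED come from the system headers.

```cpp
#include <sys/mman.h>
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <system_error>

// Hypothetical helper: map anonymous memory, throwing on failure.
inline void* map_anonymous(std::size_t size) {
    void* p = ::mmap(nullptr, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    // mmap(2) reports errors with MAP_FAILED, i.e. (void*)-1.
    // Comparing against nullptr would let the error go unnoticed.
    if (p == MAP_FAILED) {
        throw std::system_error(errno, std::system_category(), "mmap");
    }
    return p;
}
```

On Linux, a zero-length request is one easy way to trigger the failure path, since mmap() rejects it with EINVAL.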
Avi Kivity
c95927f223 posix: add file_desc::size() 2014-12-11 12:25:31 +02:00
Avi Kivity
160907bf05 posix: add support for ftruncate() 2014-12-10 20:04:13 +02:00
Avi Kivity
91dc788a33 posix: add support for creating temporary files 2014-12-10 20:04:13 +02:00
Asias He
53f95abd96 virtio: Fix feature setup
This fixes a big tcp_server rx regression.

Before:
========== rxrx ============
Server:  192.168.66.123:10000
Connections:  100
Bytes Sent(MiB):  10000
Total Time(Secs):  85.074086675           --->> big regression!!!
Bandwidth(MiB/Sec):  117.54460601148733

After:
========== rxrx ============
Server:  192.168.66.123:10000
Connections:  100
Bytes Sent(MiB):  10000
Total Time(Secs):  9.905637754
Bandwidth(MiB/Sec):  1009.5261151622362
2014-12-10 11:01:54 +02:00
Avi Kivity
fa5c61d4e4 temporary_buffer: fix wrong oom check
malloc(0) is allowed to return nullptr, so don't throw an exception in
that case.
2014-12-10 10:33:29 +02:00
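A sketch of what such a check might look like, assuming the buffer is obtained with malloc(). allocate_buffer is a hypothetical stand-in, not the temporary_buffer constructor itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical allocation helper: a null result only means out-of-memory
// when a non-zero size was requested, because malloc(0) may legally
// return nullptr.
inline char* allocate_buffer(std::size_t size) {
    auto* p = static_cast<char*>(std::malloc(size));
    if (size != 0 && p == nullptr) {
        throw std::bad_alloc();
    }
    return p;
}
```

A zero-sized request therefore never throws, whichever result the C library chooses to return for it.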
Avi Kivity
9aadcb7718 Merge branch 'deleter'
Fix a memory leak in packet and bugs in the deleter class that make it likely.
2014-12-10 09:53:59 +02:00
Avi Kivity
441331f158 temporary_buffer: fix missing exception
Since we switched temporary_buffer to malloc(), it no longer throws
an exception after running out of memory, which leads to a segfault
when referencing a null buffer.
2014-12-10 09:53:37 +02:00
Avi Kivity
9ae2075d54 deleter: remove bad/unused interfaces 2014-12-09 20:37:44 +02:00
Avi Kivity
b87a76412c packet: avoid hand-rolled deleter chaining, use deleter::append instead
The hand-rolled deleter chaining in packet::append was invalidated
by the make_free_deleter() optimization, since deleter->_next is no longer
guaranteed to be valid (and deleter::operator->() is still exposed, despite
that).

Switch to deleter::append(), which does the right thing.

Fixes a memory leak in tcp_server.
2014-12-09 20:37:17 +02:00
Avi Kivity
7708627144 deleter: improve make_free_deleter() with null input
While make_free_deleter(nullptr) will function correctly,
deleter::operator bool() on the result will not.

Fix by checking for null, and avoiding the free deleter optimization in
that case -- it doesn't help anyway.
2014-12-09 20:37:16 +02:00
Avi Kivity
15dd8ed1bb deleter: mark as final class
Prevent accidental inheritance.
2014-12-09 20:24:35 +02:00
Gleb Natapov
8bb82512a1 net: enable RSS for V4 IP/UDP/TCP 2014-12-09 18:55:19 +02:00
Gleb Natapov
73f6d943e1 net: separate device initialization from queues initialization
This patch adds a new class, distributed_device, which is responsible for
initializing the HW device and is shared between all cpus. The old device
class becomes responsible for managing an rx/tx queue pair and is local
to each cpu. Each cpu has to call distributed_device::init_local_queue() to
create its own device. The logic to distribute cpus between available
queues (in case there are not enough queues for each cpu) lives in
distributed_device and is not really implemented yet, so only the one-queue
and queues == cpus scenarios are currently supported, but this can
be fixed later.

The plan is to rename "distributed_device" to "device" and "device"
to "queue_pair" in later patches.
2014-12-09 18:55:14 +02:00
Gleb Natapov
2fb3dc03f6 net: remove unused opts parameter from proxy_net_device constructor 2014-12-09 18:55:05 +02:00
Gleb Natapov
34a8744fd3 smp: wait for all cpus before signaling start promise
If the start promise on the initial cpu is signaled before the other cpus
have their networking stack constructed, collectd initialization crashes,
since it tries to create a UDP socket on all available cpus as soon as the
initial one is ready.
2014-12-09 18:54:56 +02:00
Avi Kivity
7dfd7de8cd future: optimize data-less future<>
A future that does not carry any data (future<>) and its sibling (promise<>)
are heavily used in the code.  We can optimize them by overlaying the
future's payload, which in this case can only be an std::exception_ptr,
with the future state, as a pointer and an enum have disjoint values.

This of course depends on std::exception_ptr being implemented as a pointer,
but as it happens, it is.

With this, sizeof(future<>) is reduced from 24 bytes to 16 bytes.
2014-12-09 10:08:48 +02:00
Asias He
20acb6db9c xen: Fix mismatched signature
Found with clang:

[46/68] CXX build/release/core/xen/evtchn.o
FAILED: clang -MMD -MT build/release/core/xen/evtchn.o -MF
build/release/core/xen/evtchn.o.d -std=gnu++1y   -Wall -Werror
-fvisibility=hidden -pthread -I.  -Wno-mismatched-tags -DHAVE_XEN
-DHAVE_HWLOC -DHAVE_NUMA -O2 -I build/release/gen -c -o
build/release/core/xen/evtchn.o core/xen/evtchn.cc
core/xen/evtchn.cc:83:18: error: 'xen::userspace_evtchn::umask' hides
overloaded virtual function [-Werror,-Woverloaded-virtual]
    virtual void umask(int *port, unsigned count);
                 ^
core/xen/evtchn.hh:38:18: note: hidden overloaded virtual function
'xen::evtchn::umask' declared here: type mismatch at 2nd parameter
('int' vs 'unsigned int')
    virtual void umask(int *port, int count) {};
                 ^
1 error generated.
2014-12-09 09:59:46 +02:00
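The warning above fires because the derived declaration differs from the base one in its second parameter type, so it hides the base function instead of overriding it. A reduced illustration with matching signatures follows; the class names and the unmasked counter are hypothetical and are not the real xen classes.

```cpp
#include <cassert>

// Hypothetical reduction of the base class from the diagnostic.
struct evtchn_sketch {
    virtual ~evtchn_sketch() = default;
    virtual void umask(int* port, int count) {}
};

// Both declarations now use 'int count'. Marking the function 'override'
// turns any future signature drift into a hard compile error.
struct userspace_evtchn_sketch : evtchn_sketch {
    int unmasked = 0;  // counter added only so the call can be observed
    void umask(int* port, int count) override { unmasked += count; }
};
```

Calling umask() through a reference to the base class now reaches the derived implementation, which the mismatched version would not have done.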
Asias He
9a9297c89d ip: Implement fragment timeout and memory usage limit 2014-12-09 09:59:44 +02:00
Asias He
89c8c6148f net: Add packet::memory
Add packet::memory() which estimates the memory load (by adding sizeof
packet::impl). Note it will only be accurate after linearize/compact.
2014-12-09 09:59:44 +02:00
Asias He
c03e356873 net: Improve packet::linearize
Free the original memory earlier if all of it was copied.
2014-12-09 09:59:43 +02:00
Nadav Har'El
3f2ea82e6d dpdk: rx checksum offloading
If the card supports this (and usually, it does), enable rx checksum
offloading by the card, and avoid calculating the checksums ourselves.

With rx checksum offloading, the card checks in incoming packets the
IP header checksum and the L4 (TCP or UDP) checksum, and gives us a
flag when one of them is wrong, meaning that we do not need to do these
calculations ourselves.

This patch improves memcached performance on my setup by almost 3%.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2014-12-08 20:41:31 +02:00
Shlomi Livne
2f5644db1b Update README with additional instruction for running DPDK 2014-12-08 16:11:22 +02:00
Avi Kivity
30143fe18d reactor: destroy network_stack after timer infrastructure
The network stack contains a timer, so it must be constructed after the
timer infrastructure and destroyed before it.

Fixes a segfault on shutdown.
2014-12-07 17:37:13 +02:00
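The ordering rule in the message body relies on C++ destroying data members in reverse order of declaration. A small sketch of that rule, with hypothetical tracer and reactor_order_sketch types that only record when they are constructed and destroyed:

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper that logs its own construction and destruction.
struct tracer {
    std::vector<std::string>& log;
    std::string name;
    tracer(std::vector<std::string>& l, std::string n)
        : log(l), name(std::move(n)) { log.push_back("+" + name); }
    ~tracer() { log.push_back("-" + name); }
};

// Declaring the timers first means they are constructed first and
// destroyed last, so the network stack's timer never outlives them.
struct reactor_order_sketch {
    tracer timers;
    tracer network_stack;
    explicit reactor_order_sketch(std::vector<std::string>& log)
        : timers(log, "timers"), network_stack(log, "network_stack") {}
};
```

The dependent member goes after the thing it depends on; no explicit teardown code is needed to get the order right.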
Avi Kivity
674076c7bd smp: fix indentation 2014-12-07 17:37:13 +02:00
Avi Kivity
f4d7bd7e00 reactor: register pollers using a RAII class
Avoids leaking a poller.
2014-12-07 17:36:44 +02:00
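One way to picture registration through a RAII class is sketched here. poller_registry and poller_guard are hypothetical names invented for the illustration; the reactor's real poller interface is not shown in this log.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for the part of the reactor that tracks pollers.
class poller_registry {
    std::vector<std::function<bool()>*> _pollers;
public:
    void add(std::function<bool()>* p) { _pollers.push_back(p); }
    void remove(std::function<bool()>* p) {
        _pollers.erase(std::remove(_pollers.begin(), _pollers.end(), p),
                       _pollers.end());
    }
    std::size_t count() const { return _pollers.size(); }
    // Run every registered poller once; true if any reported work.
    bool poll_once() {
        bool work = false;
        for (auto* p : _pollers) {
            work |= (*p)();
        }
        return work;
    }
};

// Registers in the constructor and unregisters in the destructor, so a
// poller cannot be left behind when its owner goes away.
class poller_guard {
    poller_registry& _registry;
    std::function<bool()> _fn;
public:
    poller_guard(poller_registry& r, std::function<bool()> fn)
        : _registry(r), _fn(std::move(fn)) { _registry.add(&_fn); }
    ~poller_guard() { _registry.remove(&_fn); }
    poller_guard(const poller_guard&) = delete;
    poller_guard& operator=(const poller_guard&) = delete;
};
```

The guard is deliberately non-copyable, since the registry stores the address of the function object held inside it.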
Avi Kivity
5b7ebc0f6f build: disable string literal warnings when building with dpdk 2014-12-07 17:34:41 +02:00
Vlad Zolotarov
5bc89b974a dpdk: First proper offload features initialization
- Query the port for its caps.
- Properly adjust the queue numbers according to the caps.
- Enable RSS only if the final queues number is greater than 1.
- Enable Rx VLAN stripping.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2014-12-07 17:32:36 +02:00
Vlad Zolotarov
5cc8785b96 packet: Added HW VLAN stripping option.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2014-12-07 17:32:36 +02:00
Vlad Zolotarov
2d10018870 dpdk: separate the EAL initialization from port initialization
- Create a new class dpdk_eal that initializes DPDK EAL.
- Get rid of portmask crap and provide a port index to a dpdk::net_device
  constructor.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2014-12-07 17:31:12 +02:00
Gleb Natapov
4ade76a182 reactor: add missing std::forward in at_exit() 2014-12-07 16:45:53 +02:00
Avi Kivity
a2016bc1dd ip: fix smp fragment reassembly
ipv4::handle_on_cpu() did not properly convert from network byte order, so
it saw any packets with DF=1 as fragmented.

Fix by applying the proper conversion.
2014-12-07 12:01:31 +02:00
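To see why the missing conversion matters: in the 16-bit IPv4 flags/fragment-offset field, DF is 0x4000, MF is 0x2000, and the offset is the low 13 bits. Read without byte swapping on a little-endian host, a DF-only field shows up as 0x0040, which looks like a non-zero fragment offset. A sketch of the corrected test follows; is_fragment and the constant names are hypothetical and are not the actual ipv4::handle_on_cpu() code.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cstdint>

constexpr std::uint16_t mf_flag = 0x2000;      // "more fragments" bit
constexpr std::uint16_t offset_mask = 0x1fff;  // 13-bit fragment offset

// Hypothetical check: takes the flags/offset field exactly as it appears
// on the wire and converts it to host order before testing any bits.
inline bool is_fragment(std::uint16_t frag_field_net) {
    const std::uint16_t host = ntohs(frag_field_net);
    return (host & mf_flag) != 0 || (host & offset_mask) != 0;
}
```

A packet with only DF set is then correctly treated as unfragmented, while MF or a non-zero offset still marks a fragment.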
Avi Kivity
2ee0239a4a Merge branch 'tgrabiec/zero-copy-2' of github.com:cloudius-systems/seastar-dev
Zero-copy memcached get from Tomasz:

"I've measured memcached on muninn/huginn to be 7.5% better with this on vhost
stack."
2014-12-04 16:31:04 +02:00
Tomasz Grabiec
8bfca6f740 memcached: convert 'get' to use zero-copy send. 2014-12-04 13:51:35 +01:00
Tomasz Grabiec
e831884c13 tests: add zero copy UDP test
It listens for requests on port 10000 and sends responses comprised of
three chunks of data in one packet. The chunk sizes are specified via
the --chunk-size argument.

The request can be anything; its content is ignored.

You can switch to equivalent copying version by passing --copy
argument.
2014-12-04 13:51:35 +01:00
Tomasz Grabiec
c4335c49f6 core: convert output APIs to work on packets
This way zero-copy supporting code can put data directly to packet
object and pass it through all layers efficiently.
2014-12-04 13:51:26 +01:00
Tomasz Grabiec
ba0ac1c2b8 core: simplify write_all()
The only case when write_all() does not write all the data is when the
fiber fails at some point, in which case the resulting future is
failed too.
2014-12-04 13:37:36 +01:00
Tomasz Grabiec
cd3ba33ead core: introduce scattered_message
It's a builder class for creating messages comprised of multiple
fragments.
2014-12-04 13:37:35 +01:00
Tomasz Grabiec
a2ca556836 sstring: introduce release()
Releases ownership of the data and gives it away as a
temporary_buffer. This way we can avoid an allocation when putting an rvalue
sstring if it's already using external storage. Except we need to
allocate a deleter which uses delete[], but this can be fixed later.
2014-12-04 13:37:35 +01:00
Tomasz Grabiec
72b0794759 packet: add constructor for appending temporary_buffers 2014-12-04 13:37:35 +01:00
Tomasz Grabiec
3a2d74e3d3 packet: add reserve() method 2014-12-04 13:37:35 +01:00
Tomasz Grabiec
f3dada6f1d packet: add constructor for appending deleters
Deleters do not always come with fragments. When multiple fragments share
a deleter, the fragments are appended first and then one deleter for all
of them.
2014-12-04 13:37:35 +01:00