43077 Commits

Author SHA1 Message Date
Avi Kivity
1a7fd983ac memory: fix buffer overrun
We store spans in freelist i if the span's size >= 2^i.  However, when
picking a span to satisfy an allocation, we must use the next larger list
if the size is not a power of two, so that we can be sure that all spans on
that list can satisfy that request.

The current code doesn't do that, so it under-allocates, leading to memory
corruption.
2014-11-15 11:52:39 -08:00
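The two index computations described in 1a7fd983ac can be sketched as follows. This is a minimal illustration, not the allocator's code: the names store_index() and alloc_index() are hypothetical, and sizes are assumed to be nonzero span sizes.

```cpp
#include <cstddef>

// Hypothetical helper: largest i with 2^i <= n (n > 0).
// A free span of size n is stored on freelist i.
inline unsigned store_index(std::size_t n) {
    unsigned i = 0;
    while (n >>= 1) {
        ++i;
    }
    return i;
}

// Hypothetical helper: smallest i with 2^i >= n (n > 0).
// Every span on freelist i has size >= 2^i >= n, so any of them
// can satisfy a request of size n.
inline unsigned alloc_index(std::size_t n) {
    unsigned i = store_index(n);
    return (std::size_t(1) << i) == n ? i : i + 1;
}
```

For a request of size 9, store_index() gives list 3, which may hold a span of size 8; alloc_index() gives list 4, where every span is at least 16.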
Nadav Har'El
5b24dd78e2 virtio: don't use file eventfd for OSv notifications
Now that our reactor supports non-file-descriptor notification
mechanisms, switch to using one instead of eventfd when notifying
of virtio interrupts.

This will allow us to change the OSv enable_interrupt() code to
run the handler directly, not in a separate thread, because it
no longer needs to do a sleepable write() to an eventfd file descriptor.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2014-11-13 22:24:38 +02:00
Tomasz Grabiec
c262060d92 memcache: avoid vprintf()
Improves memaslap UDP posix throughput on my laptop by 40% (from 73k to 105k).

When an item is created, we cache the flags and size part of the response so
that there's no need to call expensive string formatting in get(). The
downside is that this pollutes the "item" object with a protocol-specific
field, but since ASCII is the only protocol supported for now and
we can fix this later, I think it's fine.
2014-11-13 22:22:07 +02:00
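The caching idea in c262060d92 can be sketched like this. The item type and value_reply() below are hypothetical stand-ins, not the memcache code; the only assumption taken from the memcached ASCII protocol is the reply shape "VALUE <key> <flags> <bytes>\r\n<data>\r\n".

```cpp
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical item: the " <flags> <bytes>\r\n" part of the reply is
// formatted once, when the item is created.
struct item {
    std::string key;
    std::string data;
    std::string ascii_suffix;  // cached, protocol-specific field

    item(std::string k, std::uint32_t flags, std::string d)
        : key(std::move(k))
        , data(std::move(d))
        , ascii_suffix(" " + std::to_string(flags) + " " +
                       std::to_string(data.size()) + "\r\n") {}
};

// The get() path only concatenates prebuilt pieces; it does no
// printf-style formatting.
inline std::string value_reply(const item& it) {
    std::string out;
    out.reserve(6 + it.key.size() + it.ascii_suffix.size() + it.data.size() + 2);
    out += "VALUE ";
    out += it.key;
    out += it.ascii_suffix;
    out += it.data;
    out += "\r\n";
    return out;
}
```

The cost is the extra protocol-specific member on every item, which is the trade-off the commit message accepts.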
Tomasz Grabiec
627e14c2e4 sstring: introduce make_sstring()
It concatenates multiple string-like entities in one go and returns
an sstring. It does at most one allocation for the final sstring and
one copy per string. Works with heterogeneous arguments: both
sstrings and constant strings are supported; string_views are planned.
2014-11-13 22:22:05 +02:00
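The "one allocation, one copy per piece" approach of 627e14c2e4 can be sketched with std::string. The concat() helper below is hypothetical and is not make_sstring(); it assumes C++17 and that every argument converts to std::string_view.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: sum the lengths first, reserve once, then copy
// each piece exactly once into the result.
template <typename... Parts>
std::string concat(const Parts&... parts) {
    const std::size_t total =
        (std::size_t(0) + ... + std::string_view(parts).size());
    std::string out;
    out.reserve(total);                          // the single allocation
    (out.append(std::string_view(parts)), ...);  // one copy per piece
    return out;
}
```

A call such as concat(std::string("VALUE "), "key", std::string_view(" 0")) mixes owning strings, string literals and views in one expression.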
Tomasz Grabiec
42b20cdad1 test.py: print output from test on error 2014-11-13 22:22:01 +02:00
Nadav Har'El
405f3ea8c3 reactor: refactor main loop for epoll and OSv
The reactor is currently designed around the concept of file descriptors
and polling them. Every source of events is a file descriptor, and those
which are not, like timers, signals and inter-thread notifications, are
"converted" to file-descriptor events using timerfd, signalfd and eventfd
respectively.

But for running OSv with a directly assigned virtio device, we don't want
to use file descriptors for notifications: When we need each interrupt
to signal an eventfd, this is slow, and also problematic because file
descriptors contain locks so we can't signal an eventfd at interrupt
time, causing the existing code to use an extra thread to do this.

So this patch refactors the reactor to allow the main loop to be based
not just on file descriptors, but on a different type of abstraction.
We have a reactor_backend (with epoll and osv implementations), to which we
don't add "file descriptors" but rather more abstract notions like
timer, signal or "notifier" (similar to eventfd). The Linux epoll
implementation indeed uses file descriptors internally (with timer
using a timerfd, signal using signalfd and notifier using eventfd)
but the OSv implementation does not use file descriptors.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2014-11-12 18:15:59 +02:00
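The shape of the abstraction in 405f3ea8c3 can be sketched as an interface that exposes notifiers instead of file descriptors. Everything below is hypothetical (the backend interface, in_memory_backend, and the integer notifier ids) and only illustrates the idea of a backend that needs no file descriptors; it is not the reactor_backend from the patch.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical interface: the main loop registers abstract notifiers
// and never sees a file descriptor.
class backend {
public:
    virtual ~backend() = default;
    // Registers a callback; returns an id to pass to signal().
    virtual int make_notifier(std::function<void()> on_signal) = 0;
    virtual void signal(int id) = 0;
    // Runs the callbacks of all notifiers signalled since the last poll.
    virtual void poll() = 0;
};

// A backend with no file descriptors at all. An epoll-based backend
// could implement the same interface on top of eventfd.
class in_memory_backend final : public backend {
    std::vector<std::function<void()>> _callbacks;
    std::vector<bool> _pending;
public:
    int make_notifier(std::function<void()> on_signal) override {
        _callbacks.push_back(std::move(on_signal));
        _pending.push_back(false);
        return static_cast<int>(_callbacks.size()) - 1;
    }
    void signal(int id) override {
        _pending.at(static_cast<std::size_t>(id)) = true;
    }
    void poll() override {
        for (std::size_t i = 0; i < _callbacks.size(); ++i) {
            if (_pending[i]) {
                _pending[i] = false;
                auto cb = _callbacks[i];  // copy: cb may register notifiers
                cb();
            }
        }
    }
};
```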
Calle Wilund
bfbdbdf29c dhcp: fix assert/crash in DHCP renew cycle.
Must not signal "_config" promise on renew. Also not needed.

Signed-off-by: Calle Wilund <calle@cloudius-systems.com>
2014-11-11 14:04:00 +02:00
Avi Kivity
067112a319 Merge branch 'tgrabiec/smp'
From Tomasz:

"There will now be a separate DB per core, each serving a subset of the key
space (sharding). From the outside it appears to behave as one DB."
2014-11-11 13:52:59 +02:00
Tomasz Grabiec
6913079927 tests: memcache: do not constrain tests to 1 CPU 2014-11-11 13:52:23 +02:00
Tomasz Grabiec
b0dd9e736c memcached: SMP support
There is a separate DB per core, each serving a subset of the key
space. From the outside it appears to behave as one DB.

item_key type was changed to include the hash so that we calculate the
hash only once. The same hash is used for sharding and hashing. No
need for store_hash<> option on unordered_set<> any more.

Some seastar-specific and hashtable-specific stats were moved from the
general "stats" command into "stats hash", which shows per-core
statistics.
2014-11-11 13:52:23 +02:00
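The "hash once, use it for both sharding and hashing" idea from b0dd9e736c can be sketched as below. The types are hypothetical simplifications (std::string keys, std::hash, modulo sharding) and are not the memcached code.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>
#include <utility>

// Hypothetical key type: the hash is computed once, at construction.
class item_key {
    std::string _key;
    std::size_t _hash;
public:
    explicit item_key(std::string key)
        : _key(std::move(key)), _hash(std::hash<std::string>()(_key)) {}
    const std::string& key() const { return _key; }
    std::size_t hash() const { return _hash; }
    bool operator==(const item_key& o) const {
        return _hash == o._hash && _key == o._key;
    }
};

// Hasher that reuses the stored value, so the container never rehashes
// the key bytes.
struct item_key_hash {
    std::size_t operator()(const item_key& k) const { return k.hash(); }
};

// The same stored hash selects the core that owns the key (assumed policy:
// plain modulo over the core count).
inline unsigned shard_of(const item_key& k, unsigned core_count) {
    return static_cast<unsigned>(k.hash() % core_count);
}
```

A request for a key is routed to shard_of(key, core_count), and that core's std::unordered_set<item_key, item_key_hash> is then probed with the hash already in hand.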
Tomasz Grabiec
a82b2beb32 core: add shutdown hook registration facility
Use like this:

  engine.at_exit([] {
     std::cout << "so long!\n";
     return make_ready_future<>();
  });

All lambdas will be executed when the reactor is stopped, in order, on the
same CPU on which they were registered.
2014-11-11 13:52:23 +02:00
Tomasz Grabiec
95e09be799 net: add has_per_core_namespace() attribute to network stack
The POSIX stack does not allow binding more than one socket to a given
port. The native stack, on the other hand, does. The way services are set up
depends on that. For instance, on the native stack one might want to start
the service on all cores, but on the POSIX stack only on one of them.
2014-11-11 13:52:23 +02:00
Tomasz Grabiec
b647bb5746 smp: introduce distributed::start_single()
Services which create UDP sockets on the same port on POSIX stack can
have only one instance. This decision needs to be made at run-time.
2014-11-11 13:52:23 +02:00
Tomasz Grabiec
618cbd5729 smp: introduce foreign_ptr<>
A smart pointer wrapper which deletes the pointer on the CPU on which
it was wrapped.
2014-11-11 13:52:23 +02:00
Tomasz Grabiec
0b4ee2ff60 core: advertise element type in shared_ptr<>
Other smart pointers also do that. Will help foreign_ptr<>.
2014-11-11 13:52:23 +02:00
Tomasz Grabiec
a77ecbeeef smp: introduce distributed::invoke_on_all() overload for void-returning functions 2014-11-11 13:52:23 +02:00
Tomasz Grabiec
c71f762f59 smp: introduce distributed::local() 2014-11-11 13:52:23 +02:00
Tomasz Grabiec
79982a8545 smp: add distributed::invoke_on() overload for void-returning functions 2014-11-11 13:52:23 +02:00
Tomasz Grabiec
8bbe285004 smp: improve forwarding of arguments in distributed::invoke_on()
It is now capable of moving r-values rather than copying them.
2014-11-11 13:52:23 +02:00
Tomasz Grabiec
1988748885 smp: introduce distributed::map_reduce() 2014-11-11 13:52:23 +02:00
Tomasz Grabiec
7e25d70392 core: introduce map_reduce() utility
It spawns async mapping action in parallel and reduces the results as
they come.
2014-11-11 13:52:23 +02:00
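The map_reduce() pattern from 7e25d70392 can be approximated with standard futures. This is a sketch under loud assumptions: it uses std::async threads rather than Seastar futures, the name map_reduce_sketch() is hypothetical, and it reduces results in submission order rather than as they arrive.

```cpp
#include <future>
#include <utility>
#include <vector>

// Hypothetical helper: start every mapping action before waiting on any
// of them, then fold the results into the accumulator.
template <typename Iterator, typename Mapper, typename T, typename Reducer>
T map_reduce_sketch(Iterator begin, Iterator end, Mapper mapper,
                    T initial, Reducer reduce) {
    using mapped = decltype(mapper(*begin));
    std::vector<std::future<mapped>> pending;
    for (auto it = begin; it != end; ++it) {
        // Each mapping runs on its own thread, in parallel with the others.
        pending.push_back(std::async(std::launch::async, mapper, *it));
    }
    for (auto& f : pending) {
        // Simplification: waits in submission order, not completion order.
        initial = reduce(std::move(initial), f.get());
    }
    return initial;
}
```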
Tomasz Grabiec
6df3a03c0a core: make submit_to() accept functions which return non-futures
This adds an overload which will automatically wrap a non-future
non-void result in a ready future. Pro: less boilerplate code at call
sites.
2014-11-11 13:52:23 +02:00
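The wrapping behaviour described in 6df3a03c0a can be illustrated with std::future. This is only an analogy: call_as_future() and is_future are hypothetical names, std::future stands in for Seastar's future type, and void results are not handled, matching the "non-future non-void" case in the message. C++17 is assumed.

```cpp
#include <future>
#include <type_traits>

template <typename T> struct is_future : std::false_type {};
template <typename T> struct is_future<std::future<T>> : std::true_type {};

// Hypothetical helper: if func() already returns a future, pass it
// through; otherwise wrap the plain value in an already-ready future.
template <typename Func>
auto call_as_future(Func&& func) {
    using result = decltype(func());
    if constexpr (is_future<result>::value) {
        return func();
    } else {
        std::promise<result> p;
        p.set_value(func());
        return p.get_future();
    }
}
```

Call sites can then return plain values from the submitted function and still receive a future.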
Tomasz Grabiec
c2fbfe8e84 core: destroy network stack before destroying timer lists.
Fixes assert failure during ^C:
   #0  0x0000003e134348c7 in raise () from /lib64/libc.so.6
   #1  0x0000003e1343652a in abort () from /lib64/libc.so.6
   #2  0x0000003e1342d46d in __assert_fail_base () from /lib64/libc.so.6
   #3  0x0000003e1342d522 in __assert_fail () from /lib64/libc.so.6
   #4  0x0000000000409a7c in boost::intrusive::list_impl<boost::intrusive::mhtraits<timer, boost::intrusive::list_
        at /usr/include/boost/intrusive/list.hpp:1263
   #5  0x00000000004881cc in iterator_to (this=<optimized out>, value=...) at core/timer-set.hh:71
   #6  reactor::del_timer (this=<optimized out>, tmr=tmr@entry=0x60000005cda8) at core/reactor.cc:287
   #7  0x00000000004682a5 in ~timer (this=0x60000005cda8, __in_chrg=<optimized out>) at ./core/reactor.hh:974
   #8  ~resolution (this=0x60000005cd90, __in_chrg=<optimized out>) at net/arp.hh:86
   #9  ~pair (this=0x60000005cd88, __in_chrg=<optimized out>) at /usr/include/c++/4.9.2/bits/stl_pair.h:96
2014-11-11 13:52:23 +02:00
Calle Wilund
c3ba7a73bb dhcp: actually ensure that packets are processed on cpu 0
Previous code (or lack thereof) hoped to achieve this.
Not quite successfully.

Signed-off-by: Calle Wilund <calle@cloudius-systems.com>
2014-11-10 17:09:27 +02:00
Nadav Har'El
63fb31a8be README: another missing package
We use "-lpciaccess", so need to install libpciaccess-dev on Ubuntu

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2014-11-10 16:39:15 +02:00
Nadav Har'El
4298ad2a3c README: explain how to install missing pieces on Ubuntu 12.04
Say which prerequisites to install on Ubuntu 12.04, and how to set up
gcc 4.9 side-by-side with the existing gcc 4.8 (without harming the
existing gcc 4.8 installation).

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2014-11-10 16:06:46 +02:00
Gleb Natapov
c908d5508e smp: do not reorder tasks submitted to smp queue
Currently a semaphore is used to keep track of free space in the smp queue,
but our semaphore does not guarantee that the order in which tasks call wait()
is the order in which they get access to the resource. This may cause
packet reordering in smp, which is not desirable for TCP performance. This
patch replaces the semaphore with a simple counter and another queue to
hold items that cannot be placed into the smp queue due to lack of space.
2014-11-10 15:58:48 +02:00
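The ordering scheme from c908d5508e can be sketched as a bounded queue with an overflow queue behind it. The class below is hypothetical and single-threaded; a std::deque stands in for the fixed-size smp queue, its size plays the role of the counter, and the capacity is assumed to be greater than zero.

```cpp
#include <cstddef>
#include <deque>
#include <utility>

template <typename T>
class ordered_bounded_queue {
    std::deque<T> _queue;     // stands in for the fixed-size smp queue
    std::deque<T> _overflow;  // items waiting for space, in submission order
    std::size_t _capacity;
public:
    explicit ordered_bounded_queue(std::size_t capacity) : _capacity(capacity) {}

    void submit(T item) {
        // Once anything is waiting, newer items must queue behind it,
        // even if space has opened up in the meantime.
        if (_overflow.empty() && _queue.size() < _capacity) {
            _queue.push_back(std::move(item));
        } else {
            _overflow.push_back(std::move(item));
        }
    }

    // Consumer side: takes the oldest item, then refills the freed slot
    // from the front of the overflow queue.
    bool pop(T& out) {
        if (_queue.empty()) {
            return false;
        }
        out = std::move(_queue.front());
        _queue.pop_front();
        if (!_overflow.empty()) {
            _queue.push_back(std::move(_overflow.front()));
            _overflow.pop_front();
        }
        return true;
    }
};
```

Because both queues are FIFO and new items never bypass the overflow, items come out in exactly the order they were submitted.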
Asias He
e2b1186cca net: Add more tcp and ip header const
net::tcp_hdr_len_min
net::ipv4_hdr_len_min
net::ipv6_hdr_len_min

InetTraits::ip_hdr_len_min is added to handle both ipv4 and ipv6.
2014-11-10 10:17:49 +02:00
Asias He
7260d7b9de tcp: Out of order input support
Tested with emulated packet reordering using tc and tcp_server rx test:

sudo tc qdisc add dev tap0 root netem delay 100ms reorder 25% 50%
2014-11-10 10:01:06 +02:00
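What out-of-order input support (7260d7b9de) has to do can be sketched as a small reassembly buffer. This is a hypothetical simplification, not the TCP code: it ignores sequence-number wraparound, overlapping segments and window limits, and it drops anything that starts before the next expected byte.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

class reassembly_buffer {
    std::uint32_t _next;                         // next sequence number expected
    std::map<std::uint32_t, std::string> _held;  // early segments, by start seq
    std::string _delivered;                      // in-order byte stream

    void deliver(const std::string& payload) {
        _delivered += payload;
        _next += static_cast<std::uint32_t>(payload.size());
    }
public:
    explicit reassembly_buffer(std::uint32_t initial_seq) : _next(initial_seq) {}

    void receive(std::uint32_t seq, std::string payload) {
        if (seq != _next) {
            if (seq > _next) {
                _held.emplace(seq, std::move(payload));  // arrived early: hold
            }
            return;  // old duplicates are dropped
        }
        deliver(payload);
        // Drain held segments that have become contiguous.
        for (auto it = _held.find(_next); it != _held.end();
             it = _held.find(_next)) {
            std::string held = std::move(it->second);
            _held.erase(it);
            deliver(held);
        }
    }

    const std::string& delivered() const { return _delivered; }
};
```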
Asias He
ead391491d net: Add rx test in tcp_server 2014-11-10 10:01:05 +02:00
Gleb Natapov
2a56c52fcb net: distribute udp packets according to address pair 2014-11-09 18:17:54 +02:00
Gleb Natapov
c64e1e27fb net: move connid out of tcp to be reused for udp 2014-11-09 18:17:44 +02:00
Gleb Natapov
25da340e07 net: remove rx feedback from proxy net device
99941f0c16 did that for virtio, do the
same for proxy here.
2014-11-09 18:07:14 +02:00
Gleb Natapov
136a56859f net: limit the number of packets that are waiting to be sent to another cpu
If packets arrive faster than they can be forwarded, we can run out of
memory.
2014-11-09 18:06:22 +02:00
Nadav Har'El
fcce304908 collectd: Don't use the network stack before it is set up
The current code (this will change soon with my reactor patches)
constructs a default (Posix) network stack before reactor::configure()
reassigns it to the requested network stack.

It turns out there is one place we use the network stack before calling
reactor::configure(), which ends up using the Posix stack even though
we want the native stack - this is both silly and plainly doesn't work on
the OSv setup.

The problem is that app_template.hh tries to configure scollectd before
the engine is started. This calls scollectd::impl::start() which calls
engine.net().make_udp_channel(). When this happens this early, it creates
a Posix socket...

This patch moves the scollectd configuration to after the engine is
started. It makes sense to me: As far as I understand, scollectd is all
about sending packets (diagnostic packets), and it's kind of silly to
start sending packets before starting the machinery which allows us to
send packets.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
[avi: use customary indentation, remove unneeded make_ready_future()]
2014-11-09 17:46:09 +02:00
Tomasz Grabiec
39a688d173 memcache: udp: remove dead code 2014-11-09 17:03:37 +02:00
Tomasz Grabiec
6611160db1 smp: fix distributed::stop()
One problem was that 'inst' was const, another was that the vector was
not cleared, which made ~distributed() complain.
2014-11-09 16:47:16 +02:00
Tomasz Grabiec
48d57a6cd9 core: make distributed::start() capable of forwarding references 2014-11-09 16:34:14 +02:00
Tomasz Grabiec
b6511ce3f4 core: add future::discard_result()
Use when you don't care about the result and just want to
return a future<>.

The current implementation may not be the optimal way to do it,
but it can be improved later if there's need.
2014-11-09 16:33:34 +02:00
Tomasz Grabiec
761d6119ef posix: simplify uses of setsockopt 2014-11-09 16:33:33 +02:00
Tomasz Grabiec
bf774e1b92 posix: make setsockopt accept value via universal reference
It's more convenient for users that way. If the caller wants to pass a
reference, we use a reference. If the caller passes an r-value, we accept it
and use the parameter, which is an l-value, instead.
2014-11-09 16:33:33 +02:00
Gleb Natapov
2ac24ced66 smp: smp queues idle polling
This patch adds "smp queue polling before going idle" to the reactor.
It avoids the signalfd overhead when the receiver thread is not idle
at the time a message is sent. With this patch on top of two other patches from
me that are still waiting to be committed, I see 450120 Requests/sec with
wrk and "httpd -c 2 --network-stack native". With one
cpu the result is 316002, so the second cpu adds around 40%. The bottleneck
in this test is cpu 0, which takes 100% cpu time.
2014-11-09 16:26:27 +02:00
Avi Kivity
f265fe5ecd xen: allow disabling the split-event-channel feature for debugging 2014-11-09 16:19:37 +02:00
Avi Kivity
59a7eeeea0 dhcp: retry
Some bridges delay forwarding until some time has passed, which requires
DHCP retries.
2014-11-09 16:13:25 +02:00
Avi Kivity
adc97c0162 dhcp: filter out DHCP failures
If we don't, we start the system before we have an IP address, and when
we actually do get the IP address, we fail an assert on the _config promise,
which was already fulfilled.
2014-11-09 15:03:07 +02:00
Avi Kivity
5bb13601fe xen: wrap in "xen" namespace
Names like "port" are too generic for the global namespace.
2014-11-09 14:41:01 +02:00
Avi Kivity
fede31896c xen: mark port's constructor as explicit
Prevent accidental construction.
2014-11-09 14:36:06 +02:00
Avi Kivity
14968812fe xen: remove port::operator int()
It's dangerous as it can be invoked in unexpected places.
2014-11-09 14:34:25 +02:00
Avi Kivity
16b0013c6b xen: add port destructor
De-register from the port list.

Add a FIXME for unbinding the port from Xen.
2014-11-09 13:34:30 +02:00
Avi Kivity
46aac42704 xen: make 'port' a value object
Makes it easier for users to manage its lifetime.
2014-11-09 13:30:52 +02:00