Commit Graph

53948 Commits

Author SHA1 Message Date
Gleb Natapov
af91a11b2b net: implement virtual packet queue in net::proxy::send()
Currently net::proxy::send() waits for the previous packet to be sent to
another cpu before accepting the next packet.  This slows down the sender
too much. This patch implements a virtual queue that allows many packets to
be sent simultaneously, but it starts to drop packets when the number of
outstanding packets goes over the limit: if we tried to queue them, we would
run out of memory whenever a sender generates packets faster than they can
be sent. It also tries to generate back pressure by returning a future
that will become ready when the queue goes under the limit, but it returns
it only for the first sender that goes over it. Doing this for all callers
may be an option too, but it is not clear which ones, and how many, we
should wake when the queue goes under the limit again.
2014-10-30 18:36:08 +02:00
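The drop-over-limit and single-waiter back pressure described in af91a11b2b can be sketched roughly as follows. This is a standalone illustration, not the code from the patch: the class name `virtual_queue`, the `result` enum, and the use of a plain callback in place of a Seastar future are all assumptions made so the example compiles without Seastar.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>

// Hypothetical stand-in for the virtual queue. It only counts packets in
// flight instead of handing them to another CPU.
class virtual_queue {
public:
    enum class result { accepted, accepted_at_limit, dropped };

    explicit virtual_queue(std::size_t limit) : _limit(limit) {}

    // Accepts a packet while fewer than `limit` are outstanding. The sender
    // whose packet fills the queue has `on_ready` stored and called once the
    // queue drains below the limit; packets arriving after that are dropped.
    result send(std::function<void()> on_ready) {
        if (_outstanding >= _limit) {
            ++_dropped;
            return result::dropped;
        }
        ++_outstanding;
        if (_outstanding == _limit) {
            _waiter = std::move(on_ready);
            return result::accepted_at_limit;
        }
        return result::accepted;
    }

    // Called when the other CPU has finished with one packet.
    void complete() {
        assert(_outstanding > 0);
        --_outstanding;
        if (_waiter) {
            auto wake = std::move(_waiter);
            _waiter = nullptr;
            wake();  // the queue is below the limit again
        }
    }

    std::size_t outstanding() const { return _outstanding; }
    std::size_t dropped() const { return _dropped; }

private:
    std::size_t _limit;
    std::size_t _outstanding = 0;
    std::size_t _dropped = 0;
    std::function<void()> _waiter;
};
```

Only one waiter is ever stored, which mirrors the open question in the commit message about which callers to wake if several were given a pending future.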
Avi Kivity
d9992ee98c Merge branch 'osv'
Nadav writes:

"This patch set allows Seastar's "native" stack to work over OSv, with OSv
"assigning" the virtio queue directly to Seastar's control - as well as
keeping the existing support for vhost (for running Seastar in a Linux
host or guest).

When Seastar is compiled with the "HAVE_OSV" flag, it uses the api in
<osv/virtio-assign.hh> (so don't forget the appropriate "-I" as well)
to make use of a virtio device assigned to it by OSv. At run time,
Seastar uses either this OSv interface, or the Linux vhost interface,
depending on what's available.

The current code works, but for the sake of quickly producing something
working, I made two compromises which will need to be fixed later:

1. The virt_to_phys() function has become a virtual function, slowing
   it down. We need to measure how much this matters, and if it does,
   switch to templates...

2. The host-to-guest notification is done in a very inefficient manner:
   We catch the host's interrupt, wake up a thread which then wakes
   up an eventfd which is noticed by Seastar's epoll event loop.
   We need that silly extra thread because eventfd signalling is not
   lock-free so cannot be done by an interrupt handler."
2014-10-30 16:47:39 +02:00
Nadav Har'El
b1964be121 build: add "--with-osv=..." configuration option
Add a "--with-osv=<path>" option to configure.py as a shortcut to
the long list of options needed to compile Seastar for OSv (as
explained in README-OSv)

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2014-10-30 16:47:24 +02:00
Nadav Har'El
f497299f44 virtio: support virtio ring assigned from OSv
As a second option beyond running on Linux with vhost, this patch
allows Seastar to run in OSv with the virtio network device "assigned"
to the application (i.e., we use the virtio rings directly, with no OSv
involvement beyond the initial setup).

To use this feature, one needs to compile Seastar with the "HAVE_OSV"
flag, the osv::assigned_virtio::get() symbol needs to be available
(which means we run under OSv), and it should return a non-null object
(which means the OSv was run with --assign-net).

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2014-10-30 16:45:08 +02:00
Nadav Har'El
4b44968e86 virtio: expose notifier's wake_wait
The wake_wait() method is only available for the notifier. Expose it
from the vring holding this notifier, and from the rx or tx queue holding
this vring.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2014-10-30 16:45:07 +02:00
Nadav Har'El
5db5f7622a virtio: make virtio_net_device an abstract class
Make virtio_net_device an abstract class, and move the vhost-specific
code to a subclass, virtio_net_device_vhost.

In a subsequent patch, we'll have a second subclass, for a virtio
device assigned from OSv.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2014-10-30 16:45:07 +02:00
Nadav Har'El
8326f43ded virtio: make virt_to_phys a virtual function
In the existing code, virt_to_phys() was a fixed do-nothing function.
This is good for vhost, but not good enough in OSv where, to convert
virtual addresses to physical, we need an actual calculation.

The solution in this patch, using a virtual function, is not optimal
and should probably be replaced with a template later.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2014-10-30 16:45:06 +02:00
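The trade-off mentioned in 8326f43ded (virtual function now, template later) might look like the sketch below. All type names are hypothetical, and `offset_translator` is only a placeholder for "some real calculation"; it is not a description of how OSv translates addresses.

```cpp
#include <cassert>
#include <cstdint>

// Virtual-function form: one indirect call per translation.
struct translator {
    virtual ~translator() = default;
    virtual std::uint64_t virt_to_phys(const void* p) const = 0;
};

// The "do-nothing" translation: the address is used as-is.
struct identity_translator final : translator {
    std::uint64_t virt_to_phys(const void* p) const override {
        return reinterpret_cast<std::uintptr_t>(p);
    }
};

// Purely hypothetical translation that subtracts a fixed offset.
struct offset_translator final : translator {
    std::uint64_t offset;
    explicit offset_translator(std::uint64_t o) : offset(o) {}
    std::uint64_t virt_to_phys(const void* p) const override {
        return reinterpret_cast<std::uintptr_t>(p) - offset;
    }
};

// Template form: the translation is chosen at compile time, so the call
// can be inlined and no vtable lookup is needed.
struct identity_policy {
    static std::uint64_t virt_to_phys(const void* p) {
        return reinterpret_cast<std::uintptr_t>(p);
    }
};

template <typename Policy>
struct toy_ring {
    std::uint64_t descriptor_address(const void* buffer) const {
        return Policy::virt_to_phys(buffer);
    }
};
```

The template form costs one instantiation of the ring code per policy, which is the usual price for removing the indirect call.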
Nadav Har'El
db16e4f634 virtio: separate notification from vring
Currently, the "vring" class is hardcoded to do guest-host notifications
via eventfd. This patch switches to a general "notification object" with
two virtual functions - host_notify(), which unconditionally notifies the
host, and host_wait() which returns a future<> on which one can wait for
the host to notify us.

This patch provides one implementation of this notification object, using
eventfd as before, as needed when using vhost. We'll later provide a
different implementation for running under OSv.

This patch uses pointers and virtual functions; this adds a bit of
overhead to every notification, but it is small compared to the other
costs of these notifications. Nevertheless, we can change it in the
future to make the notification object a template parameter instead of
an abstract class.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2014-10-30 16:45:06 +02:00
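A rough picture of the "notification object" from db16e4f634, written without Seastar: `host_notify()` and `host_wait()` are the names from the commit message, but here `host_wait()` takes a callback instead of returning a future<>, and `fake_notifier` and `toy_vring` are hypothetical test doubles rather than the eventfd implementation.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Abstract notification object: the ring no longer knows how the host
// is notified or how host notifications arrive.
class notifier {
public:
    virtual ~notifier() = default;
    // Unconditionally notify the host.
    virtual void host_notify() = 0;
    // Register interest in the next notification from the host
    // (a callback stands in for the returned future<>).
    virtual void host_wait(std::function<void()> on_notified) = 0;
};

// In-memory implementation, useful only for illustration and tests.
class fake_notifier final : public notifier {
public:
    void host_notify() override { ++kicks; }
    void host_wait(std::function<void()> on_notified) override {
        _waiter = std::move(on_notified);
    }
    // Simulates the host signalling the guest.
    void signal_from_host() {
        if (_waiter) {
            auto cb = std::move(_waiter);
            _waiter = nullptr;
            cb();
        }
    }
    int kicks = 0;

private:
    std::function<void()> _waiter;
};

// A ring that depends only on the abstract interface.
class toy_vring {
public:
    explicit toy_vring(notifier& n) : _notifier(n) {}
    void post_buffer() { _notifier.host_notify(); }
    void wait_for_completion() {
        _notifier.host_wait([this] { ++_completions; });
    }
    int completions() const { return _completions; }

private:
    notifier& _notifier;
    int _completions = 0;
};
```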
Avi Kivity
6f48e27bfd httpd: simplify connection termination
The current implementation uses a sort of "manual reference counting"; any
place which may be the last one using the connection checks if it is indeed
the last one, and if so, deletes the connection object.

With recent changes this has become unwieldy, as there are too many cases
to track.

To fix, we separate the connection into two streams: a read() stream that
is internally serialized (only one request is parsed at a time) and that
returns when there are no more requests to parse, and a respond() stream
that is also internally serialized, and terminates when the last response
has been written.  The caller then waits on the two streams with when_all().
2014-10-30 14:09:47 +02:00
Avi Kivity
79e8497e1d queue: add size() accessor
2014-10-30 14:08:23 +02:00
Avi Kivity
3d414111eb future: make .rescue() require an rvalue reference for its future
This makes it harder to misuse.
2014-10-30 14:07:42 +02:00
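One common C++ way to require an rvalue for a member call, which is presumably what 3d414111eb relies on, is an `&&` ref-qualifier. The sketch below uses a hypothetical `toy_future` and a simplified `rescue()` signature; it shows only the language mechanism, not Seastar's actual declaration.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical future-like type holding a plain int.
struct toy_future {
    int value = 0;

    // The trailing && means this can only be called on an rvalue, for
    // example std::move(f).rescue(...), signalling that f is consumed.
    template <typename Func>
    void rescue(Func&& func) && {
        func(value);
    }
};

// Detects at compile time whether rescue() is callable on a T.
template <typename T, typename = void>
struct can_rescue : std::false_type {};

template <typename T>
struct can_rescue<
    T,
    std::void_t<decltype(std::declval<T>().rescue(
        std::declval<void (*)(int)>()))>> : std::true_type {};

static_assert(can_rescue<toy_future>::value, "rvalue call is allowed");
static_assert(!can_rescue<toy_future&>::value, "lvalue call is rejected");
```

With this qualifier, writing `f.rescue(...)` on a named future fails to compile, so accidental reuse of a consumed future is caught early.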
Avi Kivity
7f91f1b937 future: add when_all(future...)
when_all(f1, f2) returns a future that becomes ready when all input futures
are ready.  The return value is a tuple with all input futures, so the values
and exceptions can be accessed.
2014-10-30 13:59:17 +02:00
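The behaviour described in 7f91f1b937 (ready when all inputs are ready, result is a tuple of the input futures) can be imitated with a small single-threaded toy. Everything here is an assumption for illustration: `toy_future`, the `fut` alias, and a `when_all` that takes a completion callback instead of returning a future.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <functional>
#include <memory>
#include <optional>
#include <stdexcept>
#include <tuple>
#include <utility>

// Toy future: resolved later with a value or an exception, and runs one
// callback at that point. Not Seastar's future<>.
template <typename T>
class toy_future {
public:
    bool ready() const { return _value.has_value() || failed(); }
    bool failed() const { return static_cast<bool>(_error); }
    void set_value(T v) { _value = std::move(v); fire(); }
    void set_exception(std::exception_ptr e) { _error = std::move(e); fire(); }
    void on_ready(std::function<void()> cb) {
        if (ready()) { cb(); } else { _callback = std::move(cb); }
    }
    // Call only once ready(); rethrows a stored exception.
    T get() const {
        if (_error) { std::rethrow_exception(_error); }
        return *_value;
    }

private:
    void fire() {
        if (_callback) {
            auto cb = std::move(_callback);
            _callback = nullptr;
            cb();
        }
    }
    std::optional<T> _value;
    std::exception_ptr _error;
    std::function<void()> _callback;
};

template <typename T>
using fut = std::shared_ptr<toy_future<T>>;

// Calls done(tuple_of_futures) once every input has resolved, whether
// with a value or with an exception.
template <typename Done, typename... Ts>
void when_all(Done done, fut<Ts>... futs) {
    auto remaining = std::make_shared<std::size_t>(sizeof...(Ts));
    auto all = std::make_tuple(futs...);
    auto step = [remaining, all, done]() mutable {
        if (--*remaining == 0) { done(all); }
    };
    (futs->on_ready(step), ...);
}
```

Because the callback receives the futures themselves, a failed input does not hide the values of the inputs that succeeded.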
Avi Kivity
c4bc67414e future: add then_wrapped()
Unlike future::then(), which unwraps the value, then_wrapped() keeps it
wrapped in a future<>, so if it is exceptional, it can still be accessed.

This is similar to the proposed std::future::then(), so we should later
rename it to match (and rename the existing future::then() to future::next()).
2014-10-30 13:55:31 +02:00
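The difference between then() and then_wrapped() in c4bc67414e can be shown with an already-resolved toy future. The class below is a simplified assumption (int-only, synchronous, continuations that do not throw) and is not the Seastar implementation.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <utility>

// Toy future that is always already resolved, with a value or an error.
class toy_future {
public:
    explicit toy_future(int v) : _value(v) {}
    explicit toy_future(std::exception_ptr e) : _error(std::move(e)) {}

    bool failed() const { return static_cast<bool>(_error); }
    int get() const {
        if (_error) { std::rethrow_exception(_error); }
        return _value;
    }

    // then(): the continuation receives the unwrapped value. If this
    // future failed, the continuation is skipped and the error propagates.
    template <typename Func>
    toy_future then(Func func) const {
        if (_error) { return toy_future(_error); }
        return toy_future(func(_value));
    }

    // then_wrapped(): the continuation always runs and receives the future
    // itself, so it can inspect or handle the exception.
    template <typename Func>
    toy_future then_wrapped(Func func) const {
        return toy_future(func(*this));
    }

private:
    int _value = 0;
    std::exception_ptr _error;
};

// Example: turn a failure into a fallback value.
inline int value_or(const toy_future& f, int fallback) {
    return f.then_wrapped([fallback](const toy_future& w) {
        return w.failed() ? fallback : w.get();
    }).get();
}
```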
Avi Kivity
fa7ea4f86e Revert "tcp: Retransmission support"
This reverts commit 71ecf7650a - it leaks
memory like crazy.
2014-10-30 13:46:10 +02:00
Tomasz Grabiec
c5b7bbf37f net: udp: do not use packet data in native_datagram's methods
It's too easy to shoot yourself in the foot when trying to call
get_src() after packet data was moved.

Reported-by: Asias He <asias@cloudius-systems.com>
2014-10-30 12:40:15 +02:00
Tomasz Grabiec
ca85016556 tests: memcache: fix regex
Some versions of python do not tolerate this regex:

   r'(\w*)?'

ERROR: test_incr (__main__.TestCommands)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests/memcached/test_memcached.py", line 464, in test_incr
    self.assertRegexpMatches(call('incr key 1\r\n').decode(), r'0(\w*)?\r\n')
  File "/usr/lib64/python3.3/unittest/case.py", line 1178, in deprecated_func
    return original_func(*args, **kwargs)
  File "/usr/lib64/python3.3/unittest/case.py", line 1153, in assertRegex
    expected_regex = re.compile(expected_regex)
  File "/usr/lib64/python3.3/re.py", line 214, in compile
    return _compile(pattern, flags)
  File "/usr/lib64/python3.3/re.py", line 281, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/usr/lib64/python3.3/sre_compile.py", line 498, in compile
    code = _code(p, flags)
  File "/usr/lib64/python3.3/sre_compile.py", line 483, in _code
    _compile(code, p.data, flags)
  File "/usr/lib64/python3.3/sre_compile.py", line 75, in _compile
    elif _simple(av) and op is not REPEAT:
  File "/usr/lib64/python3.3/sre_compile.py", line 362, in _simple
    raise error("nothing to repeat")
sre_constants.error: nothing to repeat
2014-10-30 10:17:52 +02:00
Calle Wilund
4b4f33c1ba collectd: add a few counters to reactor
Might be slightly useful for monitoring, and might also serve as an
example.

Signed-off-by: Calle Wilund <calle@cloudius-systems.com>
2014-10-29 19:23:42 +02:00
Tomasz Grabiec
95975151f6 virtio: change descriptor free list to FIFO instead of LIFO
Based on the observation that, with packets comprised of multiple fragments,
vhost_get_vq_desc() goes higher in the CPU profile. Avi suggested that the
current LIFO handling of free descriptors causes contention on cache
lines between seastar and vhost.

Gives 6-10% boost depending on hardware.
2014-10-29 19:19:54 +02:00
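To see why the order of the free list matters in 95975151f6, the toy below records which descriptor ids get reused under each policy. The `free_list` class and `ids_used` helper are hypothetical; the real free list lives inside the virtio ring code.

```cpp
#include <cassert>
#include <deque>
#include <vector>

enum class policy { lifo, fifo };

// Toy free list of descriptor ids 0..n-1.
class free_list {
public:
    free_list(int n, policy p) : _policy(p) {
        for (int i = 0; i < n; ++i) { _free.push_back(i); }
    }
    int allocate() {
        assert(!_free.empty());
        int id;
        if (_policy == policy::lifo) {
            id = _free.back();   // most recently freed id first
            _free.pop_back();
        } else {
            id = _free.front();  // least recently freed id first
            _free.pop_front();
        }
        return id;
    }
    void release(int id) { _free.push_back(id); }

private:
    std::deque<int> _free;
    policy _policy;
};

// Allocates and immediately releases one descriptor per round and
// returns the sequence of ids that were handed out.
inline std::vector<int> ids_used(policy p, int descriptors, int rounds) {
    free_list fl(descriptors, p);
    std::vector<int> used;
    for (int i = 0; i < rounds; ++i) {
        int id = fl.allocate();
        used.push_back(id);
        fl.release(id);
    }
    return used;
}
```

With four descriptors and four rounds, LIFO hands out id 3 every time while FIFO cycles through 0, 1, 2, 3. Under LIFO both sides keep touching the same descriptor, which fits the cache-line contention hypothesis in the commit message.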
Asias He
c16384a9fd tcp: Delete connection when <RST> is received
wrk might send <RST> instead of <FIN> to close a connection.
2014-10-29 10:03:34 +02:00
Asias He
71ecf7650a tcp: Retransmission support
Use a very simple algorithm as a starter.
2014-10-29 10:03:32 +02:00
Gleb Natapov
1c827805bc virtio: Use correct eventfd for virtio rx queue
It is nice to be able to actually kick rx queue from time to time.

Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
2014-10-28 16:47:26 +02:00
Asias He
04dd72efd3 tcp: Support close initiated on server side for posix stack
Fix hang with ab test on posix stack:

ab -n 1000 http://127.0.0.1/

Fixes #3
2014-10-28 12:42:47 +02:00
Tomasz Grabiec
57861d39ce memcache: add 'stats hash' command
Prints some details about hashtable usage.
2014-10-28 12:33:00 +02:00
Calle Wilund
1891a7ab1f collectd: add gauge (absolute value) counter as well for #registered
Signed-off-by: Calle Wilund <calle@cloudius-systems.com>
2014-10-28 11:09:15 +02:00
Calle Wilund
66c2a62259 collectd: use low-res timestamps
Since hi-res timestamps seem to work poorly, at least on my Fedora, and are
a bit of overkill anyway.

Signed-off-by: Calle Wilund <calle@cloudius-systems.com>
2014-10-28 11:09:09 +02:00
Calle Wilund
774e48e42e collectd: enforce encoding as double/uint64_t depending on data_type
Signed-off-by: Calle Wilund <calle@cloudius-systems.com>
2014-10-28 11:09:04 +02:00
Calle Wilund
030709a3c9 collectd: fix bug with function value binding
Remove unneeded code in send loop (values don't need to be copied).

Signed-off-by: Calle Wilund <calle@cloudius-systems.com>
2014-10-28 11:08:57 +02:00
Calle Wilund
abb15db28f collectd: ensure the protocol writer is consistent ref. daemon
Also make it less talkative (lower byte overhead) by keeping track of IDs
sent.

Signed-off-by: Calle Wilund <calle@cloudius-systems.com>
2014-10-28 11:08:45 +02:00
Calle Wilund
8619bf2ba7 collectd: Modify the scollectd module's own counters to use well-known types
So it can be consumed by unmodified collectd server.

Signed-off-by: Calle Wilund <calle@cloudius-systems.com>
2014-10-28 11:08:19 +02:00
Calle Wilund
b42ec4caea collectd - typo in value list composition
Signed-off-by: Calle Wilund <calle@cloudius-systems.com>
2014-10-28 11:07:52 +02:00
Avi Kivity
b1ec66900f memcached: reindent ascii parser test
2014-10-28 11:04:45 +02:00
Avi Kivity
90cb9376ab memcached: enhance test to check for differently scattered packets
Use funky indentation to reduce diff size; can be adjusted later.
2014-10-28 11:01:37 +02:00
Avi Kivity
929f714e4c net: allow constructing a packet from const data
Since we're copying the data, we can call const_cast<> without fear.
2014-10-28 11:01:37 +02:00
Avi Kivity
d5675c32a7 net: add ostream support for packet
print your packets with

   print("got packet: %s\n", p);

!
2014-10-28 11:01:37 +02:00
Avi Kivity
ae7c071a01 net: fix packet::append with internal data
2014-10-28 11:01:37 +02:00
Asias He
fd56c6345c net: Remove leftover code in packet.hh
2014-10-28 10:42:18 +02:00
Avi Kivity
6dcf24f98d Move contents of async-action.hh into future-util.hh
2014-10-27 19:28:10 +02:00
Tomasz Grabiec
c6545bf2df tests: add another test case for future::forward_to()
2014-10-27 15:58:59 +02:00
Tomasz Grabiec
eb84a3b78b core: fix future::forward_to()
It did not properly handle the case when the target promise's future is
destroyed without installing a callback, or when the future was never
installed. The mishandling of the former case was causing httpd to abort on SMP.
2014-10-27 15:58:57 +02:00
Gleb Natapov
0aee62c4ae smp: forward exception thrown by RPC callback back to a caller
If an exception is left uncaught here, it prevents smp_message_queue::listen()
from tail-calling itself, so further RPC stops working.
2014-10-27 10:55:18 +02:00
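The idea in 0aee62c4ae, capturing a callback's exception and sending it back to the caller so the processing loop keeps running, can be sketched with std::exception_ptr. The `reply` struct and `process_all` function are hypothetical and single-threaded; they stand in for the cross-CPU message queue.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <vector>

// What the caller gets back: a value, or the exception the callback threw.
struct reply {
    int value = 0;
    std::exception_ptr error;
};

// Runs every queued request. An exception thrown by one callback is
// stored in that request's reply instead of escaping, so the requests
// queued after it are still processed.
inline std::vector<reply> process_all(
        const std::vector<std::function<int()>>& requests) {
    std::vector<reply> replies;
    for (const auto& request : requests) {
        reply r;
        try {
            r.value = request();
        } catch (...) {
            r.error = std::current_exception();
        }
        replies.push_back(r);
    }
    return replies;
}
```

The caller can later call std::rethrow_exception(r.error) to observe the failure in its own context.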
Gleb Natapov
bfb1d17843 reactor: remove unused local variables
2014-10-26 18:35:56 +02:00
Avi Kivity
8818af1c23 core: move semaphore class into its own file
2014-10-26 15:52:01 +02:00
Avi Kivity
5fef739544 Merge branch 'distributed'
Infrastructure for services distributed across cpus.
2014-10-26 14:35:19 +02:00
Avi Kivity
3e4e2344b8 Merge branch 'semaphore'
Semaphore speedups.
2014-10-26 14:35:07 +02:00
Avi Kivity
5e4f649a57 semaphore: switch list -> circular_buffer
circular_buffer is much more efficient, since allocations are amortized.
2014-10-26 14:34:56 +02:00
Avi Kivity
0d745abf69 future: sprinkle noexcept everywhere
When used correctly, noexcept allows containers to optimize their reallocation
code.
2014-10-26 14:34:56 +02:00
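The container optimisation that 0d745abf69 refers to can be observed with std::vector, which relocates elements by move only when the move constructor is noexcept (and falls back to copying otherwise, if a copy constructor exists). The two element types and the `grow` helper below are illustrative names, not code from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct counts {
    int copies = 0;
    int moves = 0;
};

// Move constructor is NOT noexcept: vector copies on reallocation.
struct careful {
    inline static counts c;
    careful() = default;
    careful(const careful&) { ++c.copies; }
    careful(careful&&) { ++c.moves; }
};

// Move constructor is noexcept: vector moves on reallocation.
struct fast {
    inline static counts c;
    fast() = default;
    fast(const fast&) { ++c.copies; }
    fast(fast&&) noexcept { ++c.moves; }
};

// Appends elements until the vector reallocates once, and returns how
// the existing elements were relocated.
template <typename T>
counts grow() {
    std::vector<T> v;
    v.emplace_back();
    const std::size_t cap = v.capacity();
    T::c = counts{};
    while (v.capacity() == cap) {
        v.emplace_back();  // constructs in place: no copy or move of its own
    }
    return T::c;
}
```

For `careful` the reallocation shows up as copies and no moves; for `fast` it is the other way round.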
Avi Kivity
4af8036677 http: convert to use distributed<> infrastructure
2014-10-26 13:34:31 +02:00
Avi Kivity
e7ce27ea32 smp: add distributed<>, infrastructure for distributed service
Summary:

  distributed<my_service> dist;

  dist.start() - constructs my_service on all cpus
  dist.stop() - destroys previously constructed instances
  dist.invoke(cpu, &my_service::method, args...) - run method on one cpu
  dist.invoke_on_all(&my_service::method, args...) - run method on all cpus
2014-10-26 13:34:31 +02:00
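The shape of the distributed<> interface summarised in e7ce27ea32 can be mimicked in a single thread, with one service instance per simulated cpu. This is a toy under stated assumptions: `toy_distributed` takes the cpu count as an explicit argument to start(), runs everything synchronously, and its invoke_on_all() takes no cpu argument; the real class dispatches to other cpus and returns futures.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Single-threaded imitation: one Service instance per simulated cpu.
template <typename Service>
class toy_distributed {
public:
    // Constructs one instance per cpu (cpu count is an assumption here).
    template <typename... Args>
    void start(std::size_t cpus, Args... args) {
        for (std::size_t i = 0; i < cpus; ++i) {
            _instances.push_back(std::make_unique<Service>(args...));
        }
    }

    // Destroys the previously constructed instances.
    void stop() { _instances.clear(); }

    // Runs a method on the instance belonging to one cpu.
    template <typename Ret, typename... Params, typename... Args>
    Ret invoke(std::size_t cpu, Ret (Service::*method)(Params...),
               Args&&... args) {
        return ((*_instances.at(cpu)).*method)(std::forward<Args>(args)...);
    }

    // Runs a method on every instance.
    template <typename... Params, typename... Args>
    void invoke_on_all(void (Service::*method)(Params...), Args... args) {
        for (auto& instance : _instances) {
            ((*instance).*method)(args...);
        }
    }

    std::size_t size() const { return _instances.size(); }

private:
    std::vector<std::unique_ptr<Service>> _instances;
};

// Hypothetical service used only to exercise the toy.
struct counter_service {
    int hits = 0;
    void add(int n) { hits += n; }
    int get() { return hits; }
};
```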
Avi Kivity
82321d435f future: add parallel_for_each() helper
Runs functions in parallel, and returns a future<> that becomes ready
when all are complete.
2014-10-26 13:34:31 +02:00
Avi Kivity
2639c284f3 reactor: add cpu_id() accessor
2014-10-26 11:25:16 +02:00