Currently net::proxy::send() waits for the previous packet to be sent to
another cpu before accepting the next packet. This slows down the sender too much.
This patch implements a virtual queue that allows many packets to be sent
simultaneously, but it starts to drop packets when the number of outstanding
packets goes over the limit; if we tried to queue them, we would run
out of memory whenever a sender generates packets faster than they can be
sent. It also tries to generate back pressure by returning a future
that will become ready when the queue goes under the limit, but it returns
it only for the first sender that goes over it. Doing this for all callers
may be an option too, but it is not clear which ones and how many we should
wake when the queue goes under the limit again.
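The drop-and-back-pressure policy described above can be sketched roughly
as follows. This is a single-threaded illustration, not the actual
net::proxy code: the class name bounded_send_queue, the callback standing
in for the returned future, and the choice to register the sender that
fills the queue as the one to wake are all assumptions.

```cpp
#include <cstddef>
#include <functional>
#include <utility>

// Hypothetical stand-in for the proxy's send path (not Seastar code).
class bounded_send_queue {
    std::size_t _limit;
    std::size_t _outstanding = 0;
    std::size_t _dropped = 0;
    // Stand-in for the future handed to the one sender that hit the limit.
    std::function<void()> _waiter;
public:
    explicit bounded_send_queue(std::size_t limit) : _limit(limit) {}

    // Returns true if the packet was accepted, false if it was dropped.
    // on_room is kept only when this send fills the queue; it is called
    // once the queue goes under the limit again.
    bool send(std::function<void()> on_room) {
        if (_outstanding >= _limit) {
            ++_dropped;               // queue is full: drop instead of buffering
            return false;
        }
        if (++_outstanding == _limit) {
            _waiter = std::move(on_room);
        }
        return true;
    }

    // Called when the other cpu has finished with one packet.
    void completed() {
        if (_outstanding == 0) {
            return;
        }
        --_outstanding;
        if (_waiter) {
            auto wake = std::move(_waiter);
            _waiter = nullptr;
            wake();                   // only a single sender is woken
        }
    }

    std::size_t outstanding() const { return _outstanding; }
    std::size_t dropped() const { return _dropped; }
};
```

Senders that are dropped get no wake-up at all in this sketch, which
mirrors the open question above about which callers should be woken.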
Nadav writes:
"This patch set allows Seastar's "native" stack to work over OSv, with OSv
"assigning" the virtio queue directly to Seastar's control - as well as
keeping the existing support for vhost (for running Seastar in a Linux
host or guest).
When Seastar is compiled with the "HAVE_OSV" flag, it uses the api in
<osv/virtio-assign.hh> (so don't forget the appropriate "-I" as well)
to make use of a virtio device assigned to it by OSv. At run time,
Seastar uses either this OSv interface, or the Linux vhost interface,
depending on what's available.
The current code works, but for the sake of quickly producing something
working, I made two compromises which will need to be fixed later:
1. The virt_to_phys() function has become a virtual function, slowing
it down. We need to measure how much this matters, and if it does,
switch to templates...
2. The host-to-guest notification is done in a very inefficient manner:
We catch the host's interrupt, wake up a thread which then wakes
up an eventfd which is noticed by Seastar's epoll event loop.
We need that silly extra thread because eventfd signalling is not
lock-free so cannot be done by an interrupt handler."
Add a "--with-osv=<path>" option to configure.py as a shortcut to
the long list of options needed to compile Seastar for OSv (as
explained in README-OSv).
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
As a second option beyond running on Linux with vhost, this patch
allows Seastar to run in OSv with the virtio network device "assigned"
to the application (i.e., we use the virtio rings directly, with no OSv
involvement beyond the initial setup).
To use this feature, one needs to compile Seastar with the "HAVE_OSV"
flag, the osv::assigned_virtio::get() symbol needs to be available
(which means we run under OSv), and it should return a non-null object
(which means OSv was run with --assign-net).
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
The wake_wait() method is only available for the notifier. Expose it
from the vring holding this notifier, and from the rx or tx queue holding
this vring.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Make virtio_net_device an abstract class, and move the vhost-specific
code to a subclass, virtio_net_device_vhost.
In a subsequent patch, we'll have a second subclass, for a virtio
device assigned from OSv.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
In the existing code, virt_to_phys() was a fixed do-nothing function.
This is good for vhost, but not good enough in OSv, where converting
virtual addresses to physical ones requires an actual calculation.
The solution in this patch, using a virtual function, is not optimal
and should probably be replaced with a template later.
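As a rough illustration of the trade-off, the sketch below shows a
virtual virt_to_phys() next to a template-based caller. The names
address_translator, identity_translator, offset_translator and
describe_buffer are hypothetical, and the fixed-offset calculation is
only a placeholder for whatever OSv really does.

```cpp
#include <cstdint>

// Hypothetical interface: one virtual call per address translation.
struct address_translator {
    virtual ~address_translator() = default;
    virtual std::uint64_t virt_to_phys(const void* p) const = 0;
};

// vhost case: the address is passed through unchanged.
struct identity_translator final : address_translator {
    std::uint64_t virt_to_phys(const void* p) const override {
        return reinterpret_cast<std::uintptr_t>(p);
    }
};

// Placeholder for a real calculation; the fixed offset is an assumption.
struct offset_translator final : address_translator {
    std::uint64_t base;
    explicit offset_translator(std::uint64_t b) : base(b) {}
    std::uint64_t virt_to_phys(const void* p) const override {
        return reinterpret_cast<std::uintptr_t>(p) - base;
    }
};

// Template alternative: the translator type is known at compile time,
// so the compiler is free to inline the call.
template <typename Translator>
std::uint64_t describe_buffer(const Translator& t, const void* p) {
    return t.virt_to_phys(p);
}
```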
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Currently, the "vring" class is hardcoded to do guest-host notifications
via eventfd. This patch switches to a general "notification object" with
two virtual functions - host_notify(), which unconditionally notifies the
host, and host_wait() which returns a future<> on which one can wait for
the host to notify us.
This patch provides one implementation of this notification object, using
eventfd as before, as needed when using vhost. We'll later provide a
different implementation for running under OSv.
This patch uses pointers and virtual functions; this adds a bit of
overhead to every notification, but it is small compared to the other
costs of these notifications. Nevertheless, we can change it in the
future to make the notification object a template parameter instead of
an abstract class.
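The shape of such a notification object might look like the sketch
below. It is not the code from this patch: host_wait() takes a callback
here as a stand-in for returning a future<>, and fake_notifier and
vring_sketch are hypothetical names used only for illustration.

```cpp
#include <functional>
#include <memory>
#include <utility>

// Abstract notification object with the two operations described above.
struct notifier {
    virtual ~notifier() = default;
    // Unconditionally notify the host.
    virtual void host_notify() = 0;
    // Register interest in the next host notification. A callback is
    // used here in place of a future<> (an assumption of this sketch).
    virtual void host_wait(std::function<void()> on_notified) = 0;
};

// Hypothetical in-memory implementation; a real one would use eventfd.
class fake_notifier final : public notifier {
    int _kicks = 0;
    std::function<void()> _waiter;
public:
    void host_notify() override { ++_kicks; }
    void host_wait(std::function<void()> on_notified) override {
        _waiter = std::move(on_notified);
    }
    // Simulates the host signalling the guest.
    void host_interrupt() {
        if (_waiter) {
            auto wake = std::move(_waiter);
            _waiter = nullptr;
            wake();
        }
    }
    int kicks() const { return _kicks; }
};

// The ring only sees the abstract interface.
class vring_sketch {
    std::unique_ptr<notifier> _notifier;
public:
    explicit vring_sketch(std::unique_ptr<notifier> n) : _notifier(std::move(n)) {}
    void kick() { _notifier->host_notify(); }
    void wait(std::function<void()> cb) { _notifier->host_wait(std::move(cb)); }
};
```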
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
The current implementation uses a sort of "manual reference counting"; any
place which may be the last one using the connection checks if it is indeed
the last one, and if so, deletes the connection object.
With recent changes this has become unwieldy, as there are too many cases
to track.
To fix, we separate the connection into two streams: a read() stream that
is internally serialized (only one request is parsed at a time) and that
returns when there are no more requests to parse, and a respond() stream
that is also internally serialized, and terminates when the last response
has been written. The caller then waits on the two streams with when_all().
when_all(f1, f2) returns a future that becomes ready when all input futures
are ready. The return value is a tuple with all input futures, so the values
and exceptions can be accessed.
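The join behaviour can be illustrated with a small single-threaded
sketch. It does not use Seastar's future<> at all: when_both and outcome
are hypothetical names, and each input is reduced to a
std::exception_ptr that is null on success.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <tuple>
#include <utility>

// Outcome of one input: a null exception_ptr means success.
using outcome = std::exception_ptr;

// Hypothetical two-input join: the callback runs once, after both
// inputs have completed, and receives both outcomes as a tuple.
class when_both {
    int _remaining = 2;
    std::tuple<outcome, outcome> _results;
    std::function<void(std::tuple<outcome, outcome>)> _done;
public:
    explicit when_both(std::function<void(std::tuple<outcome, outcome>)> done)
        : _done(std::move(done)) {}

    void first_ready(outcome o = nullptr) {
        std::get<0>(_results) = std::move(o);
        finish_one();
    }
    void second_ready(outcome o = nullptr) {
        std::get<1>(_results) = std::move(o);
        finish_one();
    }
private:
    void finish_one() {
        if (--_remaining == 0) {
            _done(std::move(_results));
        }
    }
};
```

Because the callback sees both outcomes, a failure in one input does not
hide the result of the other.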
Unlike future::then(), which unwraps the value, then_wrapped() keeps it
wrapped in a future<>, so if it is exceptional, it can still be accessed.
This is similar to the proposed std::future::then(), so we should later
rename it to match (and rename the existing future::then() to future::next()).
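The difference between the two can be shown with a drastically
simplified, already-resolved stand-in for a future. The type resolved<T>
below is hypothetical and is not Seastar's future<>; it only holds a
value or an exception.

```cpp
#include <exception>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical already-resolved "future": a value or an exception.
template <typename T>
class resolved {
    T _value{};
    std::exception_ptr _ex;
public:
    explicit resolved(T v) : _value(std::move(v)) {}
    explicit resolved(std::exception_ptr e) : _ex(std::move(e)) {}

    bool failed() const { return bool(_ex); }
    T get() const {
        if (_ex) {
            std::rethrow_exception(_ex);
        }
        return _value;
    }

    // then(): f gets the unwrapped value; if this object is exceptional,
    // f is skipped and the exception is passed on.
    template <typename Func>
    auto then(Func f) const -> resolved<decltype(f(std::declval<T>()))> {
        using R = decltype(f(std::declval<T>()));
        if (_ex) {
            return resolved<R>(_ex);
        }
        return resolved<R>(f(_value));
    }

    // then_wrapped(): f gets the whole object, so it runs even when
    // exceptional and can inspect the exception itself.
    template <typename Func>
    auto then_wrapped(Func f) const
            -> resolved<decltype(f(std::declval<const resolved&>()))> {
        using R = decltype(f(std::declval<const resolved&>()));
        return resolved<R>(f(*this));
    }
};
```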
It's too easy to shoot yourself in the foot when trying to call
get_src() after packet data was moved.
Reported-by: Asias He <asias@cloudius-systems.com>
Some versions of python do not tolerate this regex:
r'(\w*)?'
ERROR: test_incr (__main__.TestCommands)
----------------------------------------------------------------------
Traceback (most recent call last):
File "tests/memcached/test_memcached.py", line 464, in test_incr
self.assertRegexpMatches(call('incr key 1\r\n').decode(), r'0(\w*)?\r\n')
File "/usr/lib64/python3.3/unittest/case.py", line 1178, in deprecated_func
return original_func(*args, **kwargs)
File "/usr/lib64/python3.3/unittest/case.py", line 1153, in assertRegex
expected_regex = re.compile(expected_regex)
File "/usr/lib64/python3.3/re.py", line 214, in compile
return _compile(pattern, flags)
File "/usr/lib64/python3.3/re.py", line 281, in _compile
p = sre_compile.compile(pattern, flags)
File "/usr/lib64/python3.3/sre_compile.py", line 498, in compile
code = _code(p, flags)
File "/usr/lib64/python3.3/sre_compile.py", line 483, in _code
_compile(code, p.data, flags)
File "/usr/lib64/python3.3/sre_compile.py", line 75, in _compile
elif _simple(av) and op is not REPEAT:
File "/usr/lib64/python3.3/sre_compile.py", line 362, in _simple
raise error("nothing to repeat")
sre_constants.error: nothing to repeat
Based on the observation that, with packets comprised of multiple fragments,
vhost_get_vq_desc() goes higher in the CPU profile. Avi suggested that the
current LIFO handling of free descriptors causes contention on cache
lines between seastar and vhost.
Gives 6-10% boost depending on hardware.
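A minimal sketch of the idea, assuming the change is to reuse free
descriptors in FIFO order so that a just-released descriptor is not
handed out again immediately. The class free_descriptors is hypothetical
and does not reflect the actual vring data structures.

```cpp
#include <cstdint>
#include <deque>

// Hypothetical free list that hands out descriptors in FIFO order.
class free_descriptors {
    std::deque<std::uint16_t> _free;
public:
    explicit free_descriptors(std::uint16_t count) {
        for (std::uint16_t i = 0; i < count; ++i) {
            _free.push_back(i);
        }
    }
    bool empty() const { return _free.empty(); }

    // Take the descriptor that has been free the longest.
    // The caller must check empty() first.
    std::uint16_t allocate() {
        std::uint16_t id = _free.front();
        _free.pop_front();
        return id;
    }

    // A released descriptor goes to the back of the queue; a LIFO list
    // would put it at the front and reuse it right away.
    void release(std::uint16_t id) { _free.push_back(id); }
};
```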
It did not properly handle the case when the target promise's future is destroyed
without installing a callback, or the case when the future was never installed. The
mishandling of the former case was causing httpd to abort on SMP.
Summary:
distributed<my_service> dist;
dist.start() - constructs my_service on all cpus
dist.stop() - destroys previously constructed instances
dist.invoke(cpu, &my_service::method, args...) - run method on one cpu
dist.invoke_on_all(&my_service::method, args...) - run method on all cpus
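To make the summary concrete, here is a single-threaded sketch of the
same interface shape. It is not the real distributed<> template:
distributed_sketch keeps all instances in one vector instead of one per
cpu, its calls run synchronously rather than returning futures, and
invoke_on_all is assumed to take no cpu argument. my_service and its
methods are made up for the example.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in: one instance per "cpu", all in one thread.
template <typename Service>
class distributed_sketch {
    std::size_t _cpus;
    std::vector<std::unique_ptr<Service>> _instances;
public:
    explicit distributed_sketch(std::size_t cpus) : _cpus(cpus) {}

    // Construct one Service per cpu.
    template <typename... Args>
    void start(Args&&... args) {
        for (std::size_t i = 0; i < _cpus; ++i) {
            _instances.push_back(std::make_unique<Service>(args...));
        }
    }

    // Destroy the previously constructed instances.
    void stop() { _instances.clear(); }

    // Run a method on the instance belonging to one cpu.
    template <typename Ret, typename... FArgs, typename... Args>
    Ret invoke(std::size_t cpu, Ret (Service::*method)(FArgs...), Args&&... args) {
        return ((*_instances.at(cpu)).*method)(std::forward<Args>(args)...);
    }

    // Run a method on every instance.
    template <typename... FArgs, typename... Args>
    void invoke_on_all(void (Service::*method)(FArgs...), Args&&... args) {
        for (auto& inst : _instances) {
            ((*inst).*method)(args...);
        }
    }

    std::size_t size() const { return _instances.size(); }
};

// Example service, invented for this sketch.
struct my_service {
    int hits = 0;
    void hit(int n) { hits += n; }
    int count() { return hits; }
};
```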