Commit Graph

148 Commits

Author SHA1 Message Date
Takuya ASADA
902b5b00a4 Make pollable_fd::_s a private variable
We can use pollable_fd::writeable/readable instead.

Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
2015-01-08 12:07:16 +02:00
Avi Kivity
c95e452a3a Merge branch 'directory'
Directory listing support, using subscription<sstring> to represent the
stream of file names produced by the directory lister running in parallel
with the directory consumer.
2015-01-08 11:14:52 +02:00
Avi Kivity
3be04e7009 reactor: implement open_directory(), list_directory()
open_directory() is similar to open_file_dma() with just the O_ flags adjusted.

list_directory() returns a subscription(), so that both the producer and
the consumer can be asynchronous.
2015-01-08 11:09:25 +02:00
Takuya ASADA
b9a2541c7e Add reactor::connect(), client_socket definition and network stack stub code
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
2015-01-08 01:26:36 +09:00
Takuya ASADA
7730ef29cd Add readable()/writeable() method on pollable_fd 2015-01-08 01:26:36 +09:00
Takuya ASADA
24820543ff Add reactor::posix_connect()
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
2015-01-08 01:26:30 +09:00
Gleb Natapov
6ad9114c0b reactor: add at_destroy() function to the reactor and use it
Unfortunately, at_exit() cannot be used to delete objects: when it runs,
the reactor is still active and a deleted object may still be in use.
We need another API that runs its tasks after the reactor has stopped.
at_destroy() is that API.
2014-12-30 15:21:10 +02:00
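The ordering that 6ad9114c0b relies on can be illustrated with a small model. This is a sketch only: `mini_engine`, `stop()` and `hook_order()` are hypothetical names, and the model assumes that exit hooks run while the engine is still active and destroy hooks run after it has stopped, as the message describes. It is not Seastar's reactor code.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical miniature engine; it models only the hook ordering.
class mini_engine {
    bool _running = true;
    std::vector<std::function<void()>> _exit_tasks;
    std::vector<std::function<void()>> _destroy_tasks;
public:
    bool running() const { return _running; }
    void at_exit(std::function<void()> f) { _exit_tasks.push_back(std::move(f)); }
    void at_destroy(std::function<void()> f) { _destroy_tasks.push_back(std::move(f)); }

    // Exit tasks run while the engine is still active.
    void stop() {
        for (auto& f : _exit_tasks) f();
        _exit_tasks.clear();
        _running = false;
    }

    // Destroy tasks run only after the engine has stopped.
    ~mini_engine() {
        if (_running) stop();
        for (auto& f : _destroy_tasks) f();
    }
};

// Records what each kind of hook observes about the engine state.
std::vector<std::string> hook_order() {
    std::vector<std::string> log;
    {
        mini_engine e;
        e.at_exit([&] { log.push_back(e.running() ? "exit:running" : "exit:stopped"); });
        e.at_destroy([&] { log.push_back(e.running() ? "destroy:running" : "destroy:stopped"); });
    }
    return log;
}
```

In this model an object freed from an exit hook could still be reached by work the engine runs before it stops, which is why cleanup belongs in the destroy hook.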
Gleb Natapov
efd6c33af0 reactor: remove no longer needed thread_pool function 2014-12-28 18:19:20 +02:00
Gleb Natapov
3eefc1ada2 reactor: replace thread_pool poller with signal notification
Drops one poller since now signal poller is used to process thread_pool
completions.
2014-12-28 18:19:18 +02:00
Gleb Natapov
367bbd75f3 reactor: move epoll to its own poller
Enable it only if there is an fd to poll.
2014-12-28 14:54:43 +02:00
Gleb Natapov
889cc69a28 reactor: remove non polling mode 2014-12-28 14:54:43 +02:00
Gleb Natapov
7d3fb282c5 reactor: poll thread pool for completion instead of using eventfd 2014-12-28 14:54:43 +02:00
Gleb Natapov
a57e75ab9f reactor: move timers to use signals instead of timerfd to signal a completion 2014-12-28 14:54:43 +02:00
Gleb Natapov
51b56d90f2 reactor: poll for signals completion instead of using signalfd 2014-12-28 14:54:43 +02:00
Gleb Natapov
3d374110c6 reactor: poll for aio completion instead of using eventfd
io_getevents() avoids a system call if the timeout is zero and there is no
completed event.
2014-12-28 13:39:59 +02:00
Gleb Natapov
466acedcb2 timer: cancel all timers during reactor destruction
If a timer is not canceled it will try to cancel itself during
destruction which may happen after engine is already destroyed.
2014-12-25 09:14:42 +02:00
Vlad Zolotarov
ddf239a943 dpdk: Move the scattered DPDK EAL initialization into the dpdk::eal.
- Move the smp::dpdk_eal_init() code into dpdk::eal::init(), where it belongs.
- Remove the unused "opts" parameter of the dpdk::dpdk_device constructor; all its usage
  has been moved to dpdk::eal::init().
- Clean up reactor.cc: #if HAVE_DPDK -> #ifdef HAVE_DPDK, since we pass the -DHAVE_DPDK
  option to the compiler.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2014-12-22 17:36:49 +02:00
Gleb Natapov
b958a44304 smp: create seastar threads using DPDK when compiled with DPDK support
DPDK initialization creates its own threads and assumes that application
uses them, otherwise things do not work correctly (rte_lcore_id()
returns incorrect value for instance). This patch uses DPDK threads to
run seastar main loop making DPDK APIs work as expected.
2014-12-18 14:43:37 +02:00
Avi Kivity
c09694d76c reactor: fix corruption in _pollers
register_poller() (and unregister_poller()) adjusts _pollers, but it may be
called while iterating it, and since std::vector<> mutations invalidate
iterators, corruption occurs.

Fix by deferring manipulation of _pollers into a task, which is executed at
a time where _pollers is not touched.
2014-12-17 16:56:20 +02:00
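The fix described in c09694d76c, deferring changes to the poller list until nothing is iterating over it, can be sketched roughly as follows. `poller_registry`, `poll_once()` and `run_deferred()` are hypothetical names, and integer ids stand in for real poller objects. This is an illustration of the idea, not the reactor's code.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for the reactor's poller list.
class poller_registry {
    std::vector<int> _pollers;                     // registered poller ids
    std::vector<std::function<void()>> _deferred;  // mutations queued as tasks
public:
    // Queue the mutation instead of touching _pollers directly.
    void register_poller(int id) {
        _deferred.push_back([this, id] { _pollers.push_back(id); });
    }
    void unregister_poller(int id) {
        _deferred.push_back([this, id] {
            _pollers.erase(std::remove(_pollers.begin(), _pollers.end(), id),
                           _pollers.end());
        });
    }
    // Apply queued mutations at a point where _pollers is not being iterated.
    void run_deferred() {
        auto tasks = std::move(_deferred);
        _deferred.clear();
        for (auto& t : tasks) t();
    }
    // Visit every poller; fn may call register/unregister without
    // invalidating the iteration, because _pollers is not modified here.
    template <typename Fn>
    void poll_once(Fn&& fn) {
        for (int id : _pollers) fn(id);
        run_deferred();
    }
    std::size_t size() const { return _pollers.size(); }
};
```

The cost of this design is that a registration only takes effect on the next iteration, which is acceptable for a loop that runs continuously.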
Avi Kivity
481080feb8 reactor: refactor reactor::poller
Currently, reactor::_pollers holds reactor::poller pointers; since these
are movable types, it's hard to maintain _pollers, as the pointers can keep
changing.

Refactor poller so that _pollers points at an internal type, which does not
move when a reactor::poller moves.  This requires getting rid of
std::function, since it lacks a comparison operator.
2014-12-17 16:20:52 +02:00
Asias He
42c4085f29 timer: Introduce lowres_clock 2014-12-15 19:39:33 +08:00
Asias He
0242d402b7 timer: Drop Clock template parameter in time_set 2014-12-15 19:39:33 +08:00
Asias He
62fff15e54 timer: Make timer a template 2014-12-15 19:39:33 +08:00
Avi Kivity
c56dcaf17a smp: fix cross-cpu access in poll mode
We accidentally look at _poll mode in another cpu's cache, as part of
the peer->idle() call.

Fix by looking at our own _poll variable first; they should all be the same.
2014-12-15 11:30:06 +02:00
Avi Kivity
4ab36be8c9 reactor: fix pointless allocation in wait_and_process()
wait_and_process() expects an std::function<>, but we pass it a lambda,
forcing it to allocate.

Prepare the std::function<> in advance, so it can pass by reference.
2014-12-14 15:58:56 +02:00
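The effect behind 4ab36be8c9 can be demonstrated with a callable that counts its own copies. This is a sketch under assumptions: the `wait_and_process()` signature below is modeled on the message rather than taken from the source, and `counting_pred` and the two helper functions are hypothetical. Copies are counted as a proxy, because whether a conversion actually allocates depends on the standard library's small-object optimization.

```cpp
#include <functional>

int pred_copies = 0;  // counts how often the callable is copied

// A callable that records its own copies (stand-in for a capturing lambda).
struct counting_pred {
    counting_pred() = default;
    counting_pred(const counting_pred&) { ++pred_copies; }
    bool operator()() const { return true; }
};

// Hypothetical signature: takes a std::function by const reference.
bool wait_and_process(const std::function<bool()>& pred) { return pred(); }

// Passing the raw callable builds a temporary std::function on every call.
int copies_when_converting(int calls) {
    counting_pred p;
    pred_copies = 0;
    for (int i = 0; i < calls; ++i) wait_and_process(p);
    return pred_copies;
}

// Building the std::function once lets every call bind to the same object.
int copies_when_prepared(int calls) {
    counting_pred p;
    std::function<bool()> prepared = p;
    pred_copies = 0;
    for (int i = 0; i < calls; ++i) wait_and_process(prepared);
    return pred_copies;
}
```

With the prepared wrapper the per-call cost drops to binding a reference, while the converting version pays for at least one copy per call.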
Avi Kivity
04488eebea smp: batch messages across smp request/response queues
Instead of incurring the overhead of pushing a message down the queue (two
cache line misses), amortize it over 16 messages (3/4 cache line misses per
batch).

Batch size is limited by poll frequency, so we should adjust that
dynamically.
2014-12-11 19:20:50 +02:00
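The batching idea in 04488eebea can be sketched with a sender that collects messages locally and hands them to the shared queue in one step. Everything here is illustrative: `batching_sender` is a hypothetical name, a `std::deque` stands in for the cross-cpu queue, and the flush counter stands in for the cost that gets amortized. Only the batch size of 16 comes from the message.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical sender that amortizes queue pushes over a batch.
class batching_sender {
    static constexpr std::size_t batch_size = 16;  // figure from the commit message
    std::vector<int> _pending;   // filled locally, cheap to append to
    std::deque<int>& _queue;     // stand-in for the cross-cpu queue
    std::size_t _flushes = 0;    // number of times the shared queue was touched
public:
    explicit batching_sender(std::deque<int>& q) : _queue(q) {}

    void submit(int msg) {
        _pending.push_back(msg);
        if (_pending.size() >= batch_size) flush();
    }
    // Move all pending messages to the shared queue at once.
    void flush() {
        if (_pending.empty()) return;
        _queue.insert(_queue.end(), _pending.begin(), _pending.end());
        _pending.clear();
        ++_flushes;
    }
    std::size_t flushes() const { return _flushes; }
};
```

A partial batch still has to be flushed explicitly, for example once per poll iteration, which is why the message ties the effective batch size to poll frequency.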
Avi Kivity
2717ac3c37 smp: improve _pending_fifo flushing
Instead of flushing pending items one by one, flush them all at once,
amortizing the write to the index.
2014-12-11 19:20:50 +02:00
Avi Kivity
b6485bcb7c smp: initialize _pending_fifo on sending cpu
If it needs to be resized, it will cause a deallocation on the wrong cpu,
so initialize it on the sending cpu.

Does not break with circular_buffer<>, but it's not going to be a
circular_buffer<> for long.
2014-12-11 19:20:50 +02:00
Gleb Natapov
34a8744fd3 smp: wait for all cpus before signaling start promise
If the start promise on the initial cpu is signaled before the other cpus have
constructed their networking stacks, collectd initialization crashes, since it
tries to create a UDP socket on all available cpus as soon as the initial one is
ready.
2014-12-09 18:54:56 +02:00
Avi Kivity
30143fe18d reactor: destroy network_stack after timer infrastructure
The network stack contains a timer, so it must be constructed after the
timer infrastructure and destroyed before it.

Fixes a segfault on shutdown.
2014-12-07 17:37:13 +02:00
Avi Kivity
674076c7bd smp: fix indentation 2014-12-07 17:37:13 +02:00
Avi Kivity
f4d7bd7e00 reactor: register pollers using a RAII class
Avoids leaking a poller.
2014-12-07 17:36:44 +02:00
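Registering through a RAII class, as f4d7bd7e00 does, ties the lifetime of a registration to an object. A minimal sketch, with `poller_list` and `scoped_poller` as hypothetical names and integer ids in place of real pollers:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical registry standing in for the reactor's poller list.
struct poller_list {
    std::vector<int> ids;
};

// Registers on construction and unregisters on destruction,
// so a registration cannot outlive its owner.
class scoped_poller {
    poller_list* _list;
    int _id;
public:
    scoped_poller(poller_list& l, int id) : _list(&l), _id(id) {
        l.ids.push_back(id);
    }
    scoped_poller(scoped_poller&& o) noexcept : _list(o._list), _id(o._id) {
        o._list = nullptr;  // the moved-from object no longer owns the registration
    }
    scoped_poller(const scoped_poller&) = delete;
    scoped_poller& operator=(const scoped_poller&) = delete;
    ~scoped_poller() {
        if (!_list) return;  // moved-from: nothing to unregister
        auto& v = _list->ids;
        v.erase(std::remove(v.begin(), v.end(), _id), v.end());
    }
};
```

Copying is disabled so that exactly one object is responsible for each registration.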
Gleb Natapov
4ade76a182 reactor: add missing std::forward in at_exit() 2014-12-07 16:45:53 +02:00
Avi Kivity
2ee0239a4a Merge branch 'tgrabiec/zero-copy-2' of github.com:cloudius-systems/seastar-dev
Zero-copy memcached get from Tomasz:

"I've measured memcached on muninn/huginn to be 7.5% better with this on vhost
stack."
2014-12-04 16:31:04 +02:00
Tomasz Grabiec
c4335c49f6 core: convert output APIs to work on packets
This way zero-copy supporting code can put data directly to packet
object and pass it through all layers efficiently.
2014-12-04 13:51:26 +01:00
Tomasz Grabiec
ba0ac1c2b8 core: simplify write_all()
The only case when write_all() does not write all the data is when the
fiber fails at some point, in which case the resulting future is
failed too.
2014-12-04 13:37:36 +01:00
Tomasz Grabiec
bcea3a67ca output_stream: support for output packet trimming
For UDP memcached we cannot generate arbitrarily large chunks, we need
to trim to datagram size. It's most efficient to split in the
output_stream.
2014-12-03 20:02:21 +01:00
Tomasz Grabiec
4b7c42a5c7 output_stream: fix bug in write()
When coalescing a large buffer with buffered data, _end was not updated,
so flush() would yield a shorter packet.
2014-12-03 20:02:21 +01:00
Tomasz Grabiec
6ae5177c2c output_stream: do not allocate on flush()
In UDP memcached, flush() is always the last operation on
output_stream, so that allocation is wasted.
2014-12-03 20:02:21 +01:00
Gleb Natapov
4fd3313e3e reactor: add "--poll" command line switch
If the switch is used reactor never goes idle.
2014-12-03 14:37:49 +02:00
Gleb Natapov
d151763967 reactor: move memory barrier to idle() accessors 2014-12-03 14:37:41 +02:00
Gleb Natapov
4d3b6497ea reactor: rework poll infrastructure
Move idle state management out of the smp poller and back into generic code. Each
poller returns whether it did any useful work, and generic code decides whether it
should go idle based on that. If a poller requires constant polling, it
should always return true.
2014-12-03 14:37:33 +02:00
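The rule from 4d3b6497ea, where each poller reports whether it did useful work and generic code decides about idling, fits in a few lines. `poller_fn` and `may_go_idle()` are hypothetical names chosen for this sketch; the real main loop does considerably more.

```cpp
#include <functional>
#include <vector>

// A poller returns true if it did useful work in this iteration.
using poller_fn = std::function<bool()>;

// One iteration of a hypothetical main loop: run every poller and
// report whether the loop may go idle afterwards.
bool may_go_idle(const std::vector<poller_fn>& pollers) {
    bool did_work = false;
    for (const auto& p : pollers) {
        did_work |= p();  // no short-circuit: every poller runs each iteration
    }
    return !did_work;
}
```

A poller that needs constant polling simply returns true every time, which keeps `may_go_idle()` false for as long as it is registered.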
Avi Kivity
e9432e9254 reactor: move collectd initialization out of reactor::run()
It's complicated enough without it.
2014-11-30 14:24:19 +02:00
Vlad Zolotarov
47b3721ccf reactor: added a "pollers" abstraction
Each "poller" registers a non-blocking callback which is then called in
every iteration of a reactor's main loop.

Each "poller"'s callback returns a boolean: if TRUE, the main loop is allowed to block
(e.g. in epoll()).

If any of the registered "pollers" returns FALSE, the reactor's main loop is forbidden to block
in the current iteration.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2014-11-30 12:12:39 +02:00
Gleb Natapov
4f4731c37b net: delay network stack creation
Network device has to be available when network stack is created, but
sometimes network device creation should wait for device initialization
by another cpu. This patch makes it possible to delay network stack
creation until network device is available.
2014-11-26 16:46:04 +02:00
Tomasz Grabiec
f458117b83 core: avoid recursion in keep_doing()
Recursion takes up space on stack which takes up space in caches which
means less room for useful data.

In addition to that, a limit on iteration count can be larger than the
limit on recursion, because we're not limited by stack size here.

Also, recursion makes flame-graphs really hard to analyze because
keep_doing() frames appear at different levels of nesting in the
profile leading to many short "towers" instead of one big tower.

This change reuses the same counter for limiting iterations as is used
to limit the number of tasks executed by the reactor before polling.

There was a run-time parameter added for controlling task quota.
2014-11-20 11:16:09 +02:00
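The approach in f458117b83, looping instead of recursing and sharing one counter between loop iterations and task execution, can be modeled without futures. This is a simplified sketch: `mini_scheduler` is hypothetical, the action returns a bool instead of a future, and the point where real code would poll is only marked by a comment.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical miniature scheduler with a shared task quota.
struct mini_scheduler {
    std::deque<std::function<void()>> tasks;
    unsigned task_quota;     // tasks or iterations allowed per batch
    unsigned remaining = 0;  // the counter shared by run() and keep_doing()

    explicit mini_scheduler(unsigned quota) : task_quota(quota) { assert(quota > 0); }

    // Repeat action while it returns true, iteratively and within the quota.
    void keep_doing(std::function<bool()> action) {
        while (remaining > 0) {
            --remaining;
            if (!action()) return;  // the action asked to stop
        }
        // Quota exhausted: continue later as a fresh task, not a nested call.
        tasks.push_back([this, action = std::move(action)] { keep_doing(action); });
    }

    // Run queued tasks in batches of task_quota; returns the batch count.
    unsigned run() {
        unsigned batches = 0;
        while (!tasks.empty()) {
            ++batches;
            remaining = task_quota;
            while (remaining > 0 && !tasks.empty()) {
                auto t = std::move(tasks.front());
                tasks.pop_front();
                --remaining;
                t();
            }
            // A real reactor would poll for events here, between batches.
        }
        return batches;
    }
};
```

Stack depth stays constant however long the loop runs, because a continuation is always started from `run()` rather than from inside the previous iteration.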
Tomasz Grabiec
b8344e31e0 output_stream: coalesce large buffers with data already in the buffer
Assuming the output_stream size is set to 8K, a sequence of writes of
lengths: 128B, 8K, 128B would yield three fragments of exactly those
sizes. This is not optimal as one could fit those in just 2 fragments
of up to 8K size. This change makes the output_stream yield 8K and
256B fragments for this case.
2014-11-15 11:58:10 -08:00
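The 128B, 8K, 128B example in b8344e31e0 can be reproduced with a model that tracks only sizes. `fragment_model` is a hypothetical class that copies no data and is not the output_stream implementation. It assumes the simplest coalescing policy that matches the message: fill the buffer, emit a full fragment, and keep the remainder buffered.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Size-only model of a coalescing output buffer (hypothetical).
class fragment_model {
    std::size_t _size;          // maximum fragment size
    std::size_t _buffered = 0;  // bytes currently waiting in the buffer
    std::vector<std::size_t> _fragments;
public:
    explicit fragment_model(std::size_t size) : _size(size) { assert(size > 0); }

    void write(std::size_t n) {
        _buffered += n;
        while (_buffered >= _size) {  // emit full fragments, keep the remainder
            _fragments.push_back(_size);
            _buffered -= _size;
        }
    }
    void flush() {
        if (_buffered) {
            _fragments.push_back(_buffered);
            _buffered = 0;
        }
    }
    const std::vector<std::size_t>& fragments() const { return _fragments; }
};
```

With a size of 8192, writes of 128, 8192 and 128 bytes followed by flush() produce two fragments of 8192 and 256 bytes in this model, matching the figures in the message.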
Tomasz Grabiec
b1208d6501 output_stream: simplify flush()
output_stream can be used by only one fiber at a time, so from a
correctness point of view it doesn't matter if we set _end before or
after put(), but setting it before allows us to have one fewer
future, which is a win.
2014-11-15 11:58:09 -08:00
Nadav Har'El
405f3ea8c3 reactor: refactor main loop for epoll and OSv
The reactor is currently designed around the concept of file descriptors
and polling them. Every source of events is a file descriptor, and those
which are not, like timers, signals and inter-thread notifications, are
"converted" to file-descriptor events using timerfd, signalfd and eventfd
respectively.

But for running OSv with a directly assigned virtio device, we don't want
to use file descriptors for notifications: When we need each interrupt
to signal an eventfd, this is slow, and also problematic because file
descriptors contain locks so we can't signal an eventfd at interrupt
time, causing the existing code to use an extra thread to do this.

So this patch refactors the reactor to allow the main loop to be based
not just on file descriptors, but on a different type of abstraction.
We have a reactor_backend (with epoll and OSv implementations), to which we
don't add "file descriptors" but rather more abstract notions like
timer, signal or "notifier" (similar to eventfd). The Linux epoll
implementation indeed uses file descriptors internally (with timer
using a timerfd, signal using signalfd and notifier using eventfd)
but the OSv implementation does not use file descriptors.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2014-11-12 18:15:59 +02:00
Avi Kivity
067112a319 Merge branch 'tgrabiec/smp'
From Tomasz:

"There will be now a separate DB per core, each serving a subset of the key
space (sharding). From the outside it appears to behave as one DB."
2014-11-11 13:52:59 +02:00