Directory listing support, using subscription<sstring> to represent the
stream of file names produced by the directory lister running in parallel
with the directory consumer.
open_directory() is similar to open_file_dma() with just the O_ flags adjusted.
list_directory() returns a subscription(), so that both the producer and
the consumer can be asynchronous.
Unfortunately at_exit() cannot be used to delete objects, since when
it runs the reactor is still active and the deleted objects may still be in use.
We need another API that runs its tasks after the reactor has already stopped.
at_destroy() will be such an API.
- Move the smp::dpdk_eal_init() code into dpdk::eal::init(), where it belongs.
- Remove the unused "opts" parameter of the dpdk::dpdk_device constructor - all its usage
has been moved to dpdk::eal::init().
- Cleanup in reactor.cc: #if HAVE_DPDK -> #ifdef HAVE_DPDK, since we give a -DHAVE_DPDK
option to the compiler.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
DPDK initialization creates its own threads and assumes that the application
uses them; otherwise things do not work correctly (rte_lcore_id()
returns an incorrect value, for instance). This patch uses DPDK threads to
run the seastar main loop, making DPDK APIs work as expected.
register_poller() (and unregister_poller()) adjusts _pollers, but it may be
called while iterating it, and since std::vector<> mutations invalidate
iterators, corruption occurs.
Fix by deferring manipulation of _pollers into a task, which is executed at
a time when _pollers is not being touched.
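A minimal single-threaded sketch of the deferred-mutation idea (the names reactor_sketch, register_poller_deferred, and poll_once are hypothetical, not Seastar's actual types):

```cpp
#include <functional>
#include <utility>
#include <vector>

// Sketch only: mutating a std::vector while iterating it invalidates
// iterators, so registrations requested during a poll are queued as tasks
// and applied only after the iteration has finished.
struct reactor_sketch {
    std::vector<std::function<bool()>> pollers;
    std::vector<std::function<void()>> deferred;   // tasks run after polling

    void register_poller_deferred(std::function<bool()> p) {
        deferred.push_back([this, p = std::move(p)]() mutable {
            pollers.push_back(std::move(p));       // safe: no iteration active
        });
    }

    void poll_once() {
        for (auto& p : pollers) {                  // iteration: no mutation allowed
            p();
        }
        for (auto& t : deferred) {                 // now it is safe to mutate
            t();
        }
        deferred.clear();
    }
};
```

A poller registered during an iteration only starts running on the next iteration, which is exactly the price of deferring the mutation.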
Currently, reactor::_pollers holds reactor::poller pointers; since these
are movable types, it's hard to maintain _pollers, as the pointers can keep
changing.
Refactor poller so that _pollers points at an internal type, which does not
move when a reactor::poller moves. This requires getting rid of
std::function, since it lacks a comparison operator.
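A sketch of the stable-address idea under the assumptions above (pollfn, registry, and poller here are illustrative, not the actual Seastar code). A virtual pollfn also sidesteps std::function's missing comparison operator, since raw pointers compare with ==:

```cpp
#include <algorithm>
#include <memory>
#include <vector>

// Sketch only: the registry stores pointers to a heap-allocated internal
// pollfn, which stays at a fixed address even when the owning poller
// handle is moved around.
struct pollfn {
    virtual ~pollfn() = default;
    virtual bool poll() = 0;       // plain virtual call instead of std::function
};

std::vector<pollfn*> registry;     // global for the sketch only

struct poller {
    std::unique_ptr<pollfn> fn;
    explicit poller(std::unique_ptr<pollfn> f) : fn(std::move(f)) {
        registry.push_back(fn.get());
    }
    poller(poller&&) = default;    // moving the handle does not move *fn
    ~poller() {
        if (fn) {                  // moved-from handles own nothing
            registry.erase(std::remove(registry.begin(), registry.end(), fn.get()),
                           registry.end());
        }
    }
};
```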
We accidentally look at the _poll mode in another cpu's cache, as part of
the peer->idle() call.
Fix by looking at our own _poll variable first; they should all be the same.
wait_and_process() expects an std::function<>, but we pass it a lambda,
forcing it to allocate.
Prepare the std::function<> in advance, so we can pass it by reference.
Instead of incurring the overhead of pushing a message down the queue (two
cache line misses), amortize it over 16 messages (3/4 cache line misses per
batch).
Batch size is limited by poll frequency, so we should adjust that
dynamically.
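A single-threaded stand-in illustrating the amortization (the real code uses a lock-free cross-cpu queue; batched_queue and the batch handling here are simplified):

```cpp
#include <cstddef>
#include <vector>

// Sketch only: items are staged locally and published once per batch of 16,
// so the cost of touching the shared queue is paid per batch, not per item.
struct batched_queue {
    static constexpr std::size_t batch = 16;
    std::vector<int> pending;      // staged on the sending cpu
    std::vector<int> shared;       // stand-in for the expensive shared queue
    std::size_t publishes = 0;     // how many times we touched `shared`

    void push(int v) {
        pending.push_back(v);
        if (pending.size() == batch) {
            flush();
        }
    }
    void flush() {                 // publish whatever is staged
        if (pending.empty()) {
            return;
        }
        shared.insert(shared.end(), pending.begin(), pending.end());
        pending.clear();
        ++publishes;
    }
};
```

As the commit notes, the batch size is bounded by how often the queue is polled, so a partial batch still needs a flush path.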
If it needs to be resized, it will cause a deallocation on the wrong cpu,
so initialize it on the sending cpu.
Does not break with circular_buffer<>, but it's not going to be a
circular_buffer<> for long.
If the start promise on the initial cpu is signaled before the other cpus
have their networking stacks constructed, collectd initialization crashes,
since it tries to create a UDP socket on all available cpus when the
initial one is ready.
Move idle state management out from the smp poller back to generic code. Each
poller returns whether it did any useful work, and the generic code decides if it
should go idle based on that. If a poller requires constant polling it
should always return true.
Each "poller" registers a non-blocking callback which is then called in
every iteration of a reactor's main loop.
Each "poller"'s callback returns a boolean: if TRUE, the main loop is allowed to block
(e.g. in epoll()).
If any registered "poller" returns FALSE, the reactor's main loop is forbidden to block
in the current iteration.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
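The blocking decision described above can be sketched as follows (a simplified stand-alone model, not the actual reactor loop):

```cpp
#include <functional>
#include <vector>

// Sketch only: run every poller callback each iteration; the loop may block
// (e.g. in epoll()) only if all of them returned TRUE. Note that every
// poller is invoked even after one returns FALSE -- no short-circuiting.
bool may_block(std::vector<std::function<bool()>>& pollers) {
    bool block = true;
    for (auto& p : pollers) {
        bool allows = p();         // TRUE: this poller permits blocking
        block = block && allows;   // any FALSE forbids blocking this iteration
    }
    return block;
}
```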
The network device has to be available when the network stack is created, but
sometimes network device creation must wait for device initialization
by another cpu. This patch makes it possible to delay network stack
creation until network device is available.
Recursion takes up space on the stack, which takes up space in caches, which
means less room for useful data.
In addition to that, a limit on iteration count can be larger than the
limit on recursion, because we're not limited by stack size here.
Also, recursion makes flame-graphs really hard to analyze because
keep_doing() frames appear at different levels of nesting in the
profile leading to many short "towers" instead of one big tower.
This change reuses the same counter for limiting iterations as is used
to limit the number of tasks executed by the reactor before polling.
A run-time parameter was added for controlling the task quota.
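The iterative scheme can be sketched like this (hypothetical names; the real keep_doing() works with futures, which are omitted here):

```cpp
#include <functional>

// Sketch only: instead of each completed step recursively scheduling the
// next (growing the stack), loop in place and stop once the shared task
// quota is exhausted; the remaining work would then be rescheduled.
// step() returns false when the repetition should end early.
int run_with_quota(const std::function<bool()>& step, int& tasks_remaining) {
    int iterations = 0;
    while (tasks_remaining > 0) {
        --tasks_remaining;   // the same counter that limits reactor tasks
        ++iterations;
        if (!step()) {
            break;           // the action asked to stop
        }
    }
    return iterations;
}
```

Because this is a plain loop, the iteration limit can be far larger than any safe recursion depth, and profiles show one flat frame instead of nested towers.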
Assuming the output_stream size is set to 8K, a sequence of writes of
lengths: 128B, 8K, 128B would yield three fragments of exactly those
sizes. This is not optimal as one could fit those in just 2 fragments
of up to 8K size. This change makes the output_stream yield 8K and
256B fragments for this case.
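A simplified model of the packing behaviour described above (not the real output_stream; it only tracks fragment sizes):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sketch only: writes are packed into fixed-size fragments, so a
// 128B + 8K + 128B sequence yields one full 8K fragment plus a 256B tail
// instead of three oddly sized fragments.
struct packer {
    std::size_t frag_size;
    std::vector<std::size_t> fragments;   // completed fragment sizes
    std::size_t current = 0;              // bytes in the fragment being filled

    void write(std::size_t n) {
        while (n > 0) {
            std::size_t take = std::min(n, frag_size - current);
            current += take;
            n -= take;
            if (current == frag_size) {   // fragment full: emit it
                fragments.push_back(current);
                current = 0;
            }
        }
    }
    void flush() {                        // emit the partial tail, if any
        if (current > 0) {
            fragments.push_back(current);
            current = 0;
        }
    }
};
```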
output_stream can be used by only one fiber at a time, so from a
correctness point of view it doesn't matter if we set _end before or
after put(), but setting it before allows us to have one future
less, which is a win.
The reactor is currently designed around the concept of file descriptors
and polling them. Every source of events is a file descriptor, and those
which are not, like timers, signals and inter-thread notifications, are
"converted" to file-descriptor events using timerfd, signalfd and eventfd
respectively.
But for running OSv with a directly assigned virtio device, we don't want
to use file descriptors for notifications: having each interrupt
signal an eventfd is slow, and also problematic because file
descriptors contain locks, so we can't signal an eventfd at interrupt
time, forcing the existing code to use an extra thread to do this.
So this patch refactors the reactor to allow the main loop to be based
not just on file descriptors, but on a different type of abstraction.
We have a reactor_backend (with epoll and OSv implementations), to which
we add not "file descriptors" but more abstract notions like
timer, signal or "notifier" (similar to eventfd). The Linux epoll
implementation indeed uses file descriptors internally (with timer
using a timerfd, signal using signalfd and notifier using eventfd)
but the OSv implementation does not use file descriptors.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
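The abstraction can be sketched as an interface like the following (member names are illustrative; the real reactor_backend differs):

```cpp
#include <memory>
#include <string>

// Sketch only: the reactor talks to a backend interface instead of raw file
// descriptors. An epoll backend would map timers/signals/notifiers onto
// timerfd/signalfd/eventfd; an OSv backend need not use fds at all.
struct reactor_backend {
    virtual ~reactor_backend() = default;
    virtual void arm_timer() = 0;           // epoll: timerfd; OSv: native timer
    virtual void notify() = 0;              // eventfd-like cross-thread wakeup
    virtual std::string name() const = 0;
};

struct epoll_backend : reactor_backend {
    void arm_timer() override { /* would call timerfd_settime() here */ }
    void notify() override { /* would write() to an eventfd here */ }
    std::string name() const override { return "epoll"; }
};
```

The main loop only ever sees the reactor_backend interface, so swapping in an OSv implementation requires no file-descriptor plumbing in generic code.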
From Tomasz:
"There will be now a separate DB per core, each serving a subset of the key
space (sharding). From the outside it appears to behave as one DB."