Recursion takes up space on the stack, which takes up space in caches,
leaving less room for useful data.
In addition, the limit on iteration count can be larger than the limit
on recursion depth, because iteration is not constrained by stack size.
Also, recursion makes flame graphs really hard to analyze, because
keep_doing() frames appear at different levels of nesting in the
profile, leading to many short "towers" instead of one big tower.
This change reuses the same counter for limiting iterations as is used
to limit the number of tasks executed by the reactor before polling.
A run-time parameter was added for controlling the task quota.
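A minimal sketch of the idea, with hypothetical names (not the actual reactor code): run steps in a loop bounded by the same quota counter the reactor uses for task execution, instead of recursing once per step.

```cpp
#include <cassert>
#include <functional>

// Hypothetical sketch: a loop keeps stack depth constant, so the
// iteration limit (the shared task quota) can far exceed any safe
// recursion depth. Returns the number of steps actually executed.
int run_until_quota(std::function<bool()> step, int& tasks_remaining) {
    int executed = 0;
    // Each completed step consumes one unit of the shared quota.
    while (tasks_remaining > 0 && step()) {
        --tasks_remaining;
        ++executed;
    }
    return executed;
}
```

When the quota is exhausted the loop returns to the caller, which in the real reactor would be the point where polling happens.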
Assuming the output_stream size is set to 8K, a sequence of writes of
lengths: 128B, 8K, 128B would yield three fragments of exactly those
sizes. This is suboptimal, as the same data could fit in just two
fragments of up to 8K each. This change makes the output_stream yield
8K and 256B fragments for this case.
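A minimal model of the new splitting behavior (names hypothetical, not the actual output_stream implementation): a write first tops up the current 8K buffer, and only a full buffer is emitted as a fragment, so 128B + 8K + 128B yields one 8192B fragment plus a 256B trailer on close.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical model of fragment coalescing in an 8K-buffered stream.
struct frag_model {
    size_t buf_size;               // e.g. 8192
    size_t buffered = 0;           // bytes in the current partial buffer
    std::vector<size_t> fragments; // sizes of emitted fragments

    void write(size_t len) {
        while (len) {
            size_t take = std::min(len, buf_size - buffered);
            buffered += take;
            len -= take;
            if (buffered == buf_size) {   // buffer full: emit a fragment
                fragments.push_back(buffered);
                buffered = 0;
            }
        }
    }
    void close() {                        // flush the partial trailer
        if (buffered) {
            fragments.push_back(buffered);
            buffered = 0;
        }
    }
};
```

With the old behavior each write became its own fragment; here the 8K write is split so its first 8064 bytes complete the buffer started by the 128B write.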
output_stream can be used by only one fiber at a time, so from a
correctness point of view it doesn't matter whether we set _end before
or after put(). Setting it before, however, lets us create one fewer
future, which is a win.
We store spans in freelist i if the span's size >= 2^i. However, when
picking a span to satisfy an allocation, we must use the next larger list
if the size is not a power of two, so that we can be sure that all spans on
that list can satisfy that request.
The current code doesn't do that, so it under-allocates, leading to memory
corruption.
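The index rule can be sketched as follows (hypothetical helper names, not the allocator's actual code): a span of `size` pages is stored on list floor(log2(size)), but an allocation whose size is not a power of two must search the next list up, where every span is guaranteed large enough.

```cpp
#include <cassert>
#include <cstddef>

// floor(log2(size)) for size >= 1.
inline unsigned log2floor(size_t size) {
    unsigned i = 0;
    while (size >>= 1) ++i;
    return i;
}

// Storing: list i holds spans with size >= 2^i, so use the floor.
inline unsigned index_for_store(size_t size) {
    return log2floor(size);
}

// Allocating: a list may hold spans as small as 2^i, so a non-power-of-
// two request must round up to the next list, whose smallest span
// (2^(i+1) pages) is still large enough.
inline unsigned index_for_alloc(size_t size) {
    unsigned i = log2floor(size);
    if (size & (size - 1)) ++i;   // not a power of two: round up
    return i;
}
```

The bug described above corresponds to using `index_for_store` on the allocation path: list 2 may contain a 6-page span and a 4-page span, and a 6-page request served from it can land on the 4-page span.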
It concatenates multiple string-like entities in one go and returns an
sstring. It performs at most one allocation for the final sstring and
one copy per input string. It works with heterogeneous arguments: both
sstrings and constant strings are supported, and string_views are
planned.
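A sketch of the technique using std::string in place of sstring (the names here are illustrative, not the actual API): sum the lengths first, reserve once, then copy each piece exactly once.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Length of each supported piece type.
inline size_t piece_len(const std::string& s) { return s.size(); }
inline size_t piece_len(const char* s) { return std::strlen(s); }

// One allocation (the reserve), one copy per argument, heterogeneous
// argument types. Requires C++17 fold expressions.
template <typename... Parts>
std::string concat(const Parts&... parts) {
    std::string out;
    out.reserve((piece_len(parts) + ... + 0));  // single allocation
    (out.append(parts), ...);                   // one copy per piece
    return out;
}
```

Because the total length is known before any copying starts, `append` never triggers a reallocation.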
The reactor is currently designed around the concept of file descriptors
and polling them. Every source of events is a file descriptor, and those
which are not, like timers, signals and inter-thread notifications, are
"converted" to file-descriptor events using timerfd, signalfd and eventfd
respectively.
But when running on OSv with a directly assigned virtio device, we
don't want to use file descriptors for notifications: having each
interrupt signal an eventfd is slow, and also problematic because file
descriptors involve locks, so we can't signal an eventfd at interrupt
time; the existing code works around this with an extra thread.
So this patch refactors the reactor to allow the main loop to be based
not just on file descriptors, but on a different type of abstraction.
We have a reactor_backend (with epoll and OSv implementations), to
which we don't add "file descriptors" but rather more abstract notions
like a timer, a signal, or a "notifier" (similar to eventfd). The Linux
epoll
implementation indeed uses file descriptors internally (with timer
using a timerfd, signal using signalfd and notifier using eventfd)
but the OSv implementation does not use file descriptors.
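A minimal sketch of the shape of this abstraction (names illustrative, not Seastar's actual interface): the main loop calls into a virtual backend, and each platform supplies its own implementation.

```cpp
#include <cassert>

// Hypothetical backend interface: the reactor loop knows only this.
struct reactor_backend {
    virtual ~reactor_backend() = default;
    // Block until events arrive, dispatch them, return the count.
    virtual int wait_and_process() = 0;
};

// A toy implementation standing in for epoll/OSv backends: it "has"
// a fixed number of pending events and drains them all in one call.
struct fake_backend : reactor_backend {
    int pending;
    explicit fake_backend(int n) : pending(n) {}
    int wait_and_process() override {
        int n = pending;
        pending = 0;
        return n;
    }
};

// The main loop is written against the interface only.
int run_loop(reactor_backend& b) {
    int total = 0, n;
    while ((n = b.wait_and_process()) > 0) total += n;
    return total;
}
```

The point is that nothing in `run_loop` mentions file descriptors; timerfd/signalfd/eventfd become private details of the epoll implementation.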
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
From Tomasz:
"There will be now a separate DB per core, each serving a subset of the key
space (sharding). From the outside it appears to behave as one DB."
Use like this:
engine.at_exit([] {
    std::cout << "so long!\n";
    return make_ready_future<>();
});
All lambdas will be executed when the reactor is stopped, in order, on
the same CPU on which they were registered.
The POSIX stack does not allow binding more than one socket to a given
port. The native stack, on the other hand, does. The way services are
set up depends on this: for instance, on the native stack one might
want to start the service on all cores, but on the POSIX stack only on
one of them.
Fixes assert failure during ^C:
#0 0x0000003e134348c7 in raise () from /lib64/libc.so.6
#1 0x0000003e1343652a in abort () from /lib64/libc.so.6
#2 0x0000003e1342d46d in __assert_fail_base () from /lib64/libc.so.6
#3 0x0000003e1342d522 in __assert_fail () from /lib64/libc.so.6
#4 0x0000000000409a7c in boost::intrusive::list_impl<boost::intrusive::mhtraits<timer, boost::intrusive::list_
at /usr/include/boost/intrusive/list.hpp:1263
#5 0x00000000004881cc in iterator_to (this=<optimized out>, value=...) at core/timer-set.hh:71
#6 reactor::del_timer (this=<optimized out>, tmr=tmr@entry=0x60000005cda8) at core/reactor.cc:287
#7 0x00000000004682a5 in ~timer (this=0x60000005cda8, __in_chrg=<optimized out>) at ./core/reactor.hh:974
#8 ~resolution (this=0x60000005cd90, __in_chrg=<optimized out>) at net/arp.hh:86
#9 ~pair (this=0x60000005cd88, __in_chrg=<optimized out>) at /usr/include/c++/4.9.2/bits/stl_pair.h:96
Currently a semaphore is used to keep track of free space in the smp
queue, but our semaphore does not guarantee that the order in which
tasks call wait() is the order in which they gain access to the
resource. This may cause packet reordering in smp, which is undesirable
for TCP performance. This patch replaces the semaphore with a simple
counter and another queue to hold items that cannot be placed into the
smp queue due to lack of space.
The current code (this will change soon with my reactor patches)
constructs a default (Posix) network stack before reactor::configure()
reassigns it to the requested network stack.
It turns out there is one place where we use the network stack before
calling reactor::configure(), which ends up using the Posix stack even
though we want the native stack - this is both silly and plainly
doesn't work on the OSv setup.
The problem is that app_template.hh tries to configure scollectd before
the engine is started. This calls scollectd::impl::start() which calls
engine.net().make_udp_channel(). When this happens this early, it creates
a Posix socket...
This patch moves the scollectd configuration to after the engine is
started. It makes sense to me: As far as I understand, scollectd is all
about sending (diagnostic) packets, and it's kind of silly to start
sending packets before starting the machinery that allows us to send
packets.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
[avi: use customary indentation, remove unneeded make_ready_future()]
Use it when you don't care about the result and just want to return a
future<>.
The current implementation may not be optimal, but it can be improved
later if the need arises.
It's more convenient for users that way. If someone wants to pass a
reference, we use a reference. If they pass an r-value, we accept it
and use the parameter as an l-value instead.
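This is the common by-value-parameter idiom; a minimal illustration with hypothetical names:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical example: accepting by value means an lvalue argument is
// copied and an rvalue argument is moved; inside the function, the
// parameter is an ordinary lvalue either way.
struct holder {
    std::string stored;
    void set(std::string name) {      // by-value parameter
        stored = std::move(name);     // move from our own lvalue copy
    }
};
```

Callers who still own their string pass it as-is and keep it; callers done with theirs can `std::move` it in and pay no copy.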
This patch adds "smp queue polling before going idle" to the reactor.
It avoids the signalfd overhead when the receiver thread is not idle at
the time a message is sent. With this patch on top of two other patches
from me that are still waiting to be committed, I see 450120
Requests/sec with wrk and "httpd -c 2 --network-stack native" on the
native stack. With one cpu the result is 316002, so we have around 40%
scaling. The bottleneck in this test is cpu 0, which takes 100% cpu
time.
This is useful for features that are provided incrementally, so may not
be present on all hypervisors. If the value is not present, return a
user-provided default, which also has a system-provided default (0).
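A small sketch of the lookup-with-default pattern described (hypothetical names and storage; the real code reads hypervisor-provided values):

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical: read an optionally-present value; absent keys fall
// back to a caller-supplied default, which itself defaults to 0.
inline int read_value(const std::map<std::string, int>& store,
                      const std::string& key, int def = 0) {
    auto it = store.find(key);
    return it != store.end() ? it->second : def;
}
```

Callers on older hypervisors that lack the key get `def` (or 0) instead of an error, which is what makes incremental feature rollout painless.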
We currently have one port per event channel. We need to have a list of
semaphores that will all be made ready when an interrupt kicks in. This is
useful in the case where both tx and rx are bound to the same event channel.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
If we do that, plus make it an instance method, we should be able to use
make_ready_port. This is consistent with the userspace implementation and
from that point any changes there will be propagated to both.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
The representation of an event channel as an integer poses a problem, in which
waiting on an integer port doesn't work well when the same event channel is
assigned for both tx and rx. The future will be ready for one of the sides, but
we won't process the other.
One alternative is to have conditions in the future processing, and in case the
event channels are bound to the same port, process both events. But a better
solution is to use a class to represent the bound ports, and instances of those
classes will have their own pending methods.
Infrastructure will be written in a following patch to make sure that all
listeners to the same port will be made ready when an interrupt kicks
in.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Instead of returning a reference to a grant that is already present in an
array, defer the initialization. This is how the OSv driver handles it, and I
honestly am not sure if this is really needed: it seems to me we should be able
to just reuse the old grants. I need to check in the backend code if we can be
any smarter than this.
However, right now we need to do something to recycle the buffers, and just
re-doing the refs would lead to inconsistencies. So the best option for
now is to close and reopen the grants, and then later rework this in a
way that works for both the initial setup and the recycle.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Enhance gntref with some useful operations. Also provide a default object that
represents an invalid grant.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Some packets, like arp replies, are broadcast to all cpus for handling,
but only the packet structure is copied for each cpu; the actual packet
data is shared by all of them. Currently the networking stack mangles
the packet data during its travel up the stack while doing ntoh()
translations, which obviously cannot work for broadcast packets. This
patch fixes the code to not modify the packet data while doing ntoh(),
but to do it on a stack-allocated copy of the data instead.
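The fix can be sketched like this (a simplified two-field header, not the real ARP layout): copy the header out of the shared buffer, then byte-swap only the stack copy.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical, simplified header for illustration only.
struct arp_hdr {
    uint16_t htype;
    uint16_t ptype;
};

// Convert on a stack-allocated copy: the shared packet data that other
// cpus will also parse is never modified.
inline arp_hdr read_arp_hdr(const char* shared_data) {
    arp_hdr h;
    std::memcpy(&h, shared_data, sizeof(h));  // shared buffer untouched
    h.htype = ntohs(h.htype);
    h.ptype = ntohs(h.ptype);
    return h;
}
```

The old, broken pattern was the in-place equivalent (`hdr->htype = ntohs(hdr->htype)` directly on the shared buffer), which corrupts the data for every cpu that parses it afterwards.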
The Xen code registers a function that calls semaphore::signal as an
interrupt handler. However, that function is not smp safe and may
crash, and the events it generates are likely to be ignored, since they
are just appended to the reactor queue without any real wakeup of the
reactor thread.
Switch to using an eventfd. That's still unsafe, but a little better, since
its signalling is smp safe, and will cause the reactor thread to wake up
in case it was asleep.
With this, we are able to receive multiple packets.
We used gnttab_grant_foreign_access() instead of
gnttab_grant_foreign_access_ref(). While the two functions have similar
enough signatures, they do very different things.
With the change, we are able to receive packets from Xen, though we crash
immediately.