This is the basic support for xenfront. It can be used in domU, provided there
is a network interface to be hijacked.
The code that follows is just the mechanics of managing grants, event
channels, etc.
However, it does not yet work: I can't see netback injecting any data into it.
I am still debugging the protocol, but I wanted to flush the current state.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
This patch enables xen event channels. It creates the placeholder for the
kernel evtchns when we move to OSv as well.
The main problem with this patch is that evtchn::pending can return more than
one evtchn, so what I am doing here is technically wrong. We should probably
call keep_doing() in pending() itself, and have it store references to
futures corresponding to the possible event channels, which would then be made
ready.
I am, however, having a bit of a hard time coding this, since it's still
unclear how, once the future is consumed, we would generate the next one.
Please note: all of this is moot if we disable "split event channels", which
can be done by masking that feature in case it is even available. In that
case, only one event channel will be notified, and when it is, we process
both tx and rx. This is yet another reason why I haven't insisted so much on
fixing this properly.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
This patch creates a seastar enabled version of the xen gntalloc device.
TODO: grow the table dynamically, and fix the index selection algorithm; it
shouldn't just always bump the index by 1.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Currently there is an implicit unbounded queue between the virtio driver
and the networking stack, where packets may accumulate if they are received
faster than the networking stack can handle them. The queuing happens because
virtio buffer availability is signaled immediately after the received-buffer
promise is fulfilled, but promise fulfilment does not mean that the buffer has
been processed, only that the task that will process it has been placed on a
task queue.
The patch fixes the problem by making a virtio buffer available only after
the previous buffer's completion task has executed. This bounds the
aforementioned implicit queue between the virtio driver and the networking
stack by the virtio ring size.
Instead of providing back pressure towards the NIC, which would cause the
NIC to slow down and drop packets, the network stack should drop packets it
cannot handle by itself. Otherwise one slow receiver may cause drops for all
others. Our native network stack correctly drops packets instead of
providing feedback, so it is safe to just remove the feedback from the API.
The Ethernet frame might contain extra bytes after the IP packet for
padding. Trim the extra bytes in order not to confuse TCP.
For example, when establishing a TCP connection:
1) <SYN>
2) <SYN,ACK>
3) <ACK>
Packet 3) should be 14 + 20 + 20 = 54 bytes, but the sender might send a
60-byte packet containing 6 extra bytes of padding.
Fix httpd on ran/sif.
Currently, it is a ping/pong server: when the client sends a ping, the
server responds with a pong. We can add more tests to extend it.
This is useful for testing TCP_RR.
This patch enables interaction with xenstore. Since OSv now fakes the
presence of libxenstore, the code is the same for userspace and the kernel.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Reorder the file so as to allow the use of command line switches in the
construction of the compiled objects.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Rewrite http connection termination management to support the
various cases dictated by the HTTP spec: client-side connection
close, server-side connection close, and header specified connection
close.
It's required for instantiating an sstring with the constructor
basic_sstring(initialized_later, size_t size).
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Currently net::proxy::send() waits for the previous packet to be sent to
another cpu before accepting the next packet. This slows down the sender too
much.
This patch implements a virtual queue that allows many packets to be sent
simultaneously, but starts to drop packets when the number of outstanding
packets goes over the limit; if we tried to queue them instead, we would run
out of memory whenever a sender generates packets faster than they can be
sent. It also tries to generate back pressure by returning a future that
becomes ready when the queue goes under the limit, but it returns it only for
the first sender that goes over it. Doing this for all callers may be an
option too, but it is not clear which ones, and how many, we should wake when
the queue goes under the limit again.
Nadav writes:
"This patch set allows Seastar's "native" stack to work over OSv, with OSv
"assigning" the virtio queue directly to Seastar's control - as well as
keeping the existing support for vhost (for running Seastar in a Linux
host or guest).
When Seastar is compiled with the "HAVE_OSV" flag, it uses the api in
<osv/virtio-assign.hh> (so don't forget the appropriate "-I" as well)
to make use of a virtio device assigned to it by OSv. At run time,
Seastar uses either this OSv interface, or the Linux vhost interface,
depending on what's available.
The current code works, but for the sake of quickly producing something
working, I made two compromises which will need to be fixed later:
1. The virt_to_phys() function has become a virtual function, slowing
it down. We need to measure how much this matters, and if it does,
switch to templates...
2. The host-to-guest notification is done in a very inefficient manner:
we catch the host's interrupt and wake up a thread, which then wakes
up an eventfd which is noticed by Seastar's epoll event loop.
We need that silly extra thread because eventfd signalling is not
lock-free, so it cannot be done by an interrupt handler."
Add a "--with-osv=<path>" option to configure.py as a shortcut to
the long list of options needed to compile Seastar for OSv (as
explained in README-OSv)
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
As a second option beyond running on Linux with vhost, this patch
allows Seastar to run in OSv with the virtio network device "assigned"
to the application (i.e., we use the virtio rings directly, with no OSv
involvement beyond the initial setup).
To use this feature, one needs to compile Seastar with the "HAVE_OSV"
flag, the osv::assigned_virtio::get() symbol needs to be available
(which means we run under OSv), and it should return a non-null object
(which means OSv was run with --assign-net).
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
The wake_wait() method is only available for the notifier. Expose it
from the vring holding this notifier, and from the rx or tx queue holding
this vring.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Make virtio_net_device an abstract class, and move the vhost-specific
code to a subclass, virtio_net_device_vhost.
In a subsequent patch, we'll have a second subclass, for a virtio
device assigned from OSv.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
In the existing code, virt_to_phys() was a fixed do-nothing function.
This is good for vhost, but not good enough in OSv, where converting
virtual addresses to physical ones requires an actual calculation.
The solution in this patch, using a virtual function, is not optimal
and should probably be replaced with a template later.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Currently, the "vring" class is hardcoded to do guest-host notifications
via eventfd. This patch switches to a general "notification object" with
two virtual functions - host_notify(), which unconditionally notifies the
host, and host_wait() which returns a future<> on which one can wait for
the host to notify us.
This patch provides one implementation of this notification object, using
eventfd as before, as needed when using vhost. We'll later provide a
different implementation for running under OSv.
This patch uses pointers and virtual functions; this adds a bit of
overhead to every notification, but it is small compared to the other
costs of these notifications. Nevertheless, we can change it in the
future to make the notification object a template parameter instead of
an abstract class.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
The current implementation uses a sort of "manual reference counting"; any
place which may be the last one using the connection checks if it is indeed
the last one, and if so, deletes the connection object.
With recent changes this has become unwieldy, as there are too many cases
to track.
To fix, we separate the connection into two streams: a read() stream that
is internally serialized (only one request is parsed at a time) and that
returns when there are no more requests to parse, and a respond() stream
that is also internally serialized, and terminates when the last response
has been written. The caller then waits on the two streams with when_all().
when_all(f1, f2) returns a future that becomes ready when all input futures
are ready. The return value is a tuple with all input futures, so the values
and exceptions can be accessed.
Unlike future::then(), which unwraps the value, then_wrapped() keeps it
wrapped in a future<>, so if it is exceptional, it can still be accessed.
This is similar to the proposed std::future::then(), so we should later
rename it to match (and rename the existing future::then() to future::next()).
It's too easy to shoot yourself in the foot when trying to call
get_src() after packet data was moved.
Reported-by: Asias He <asias@cloudius-systems.com>
Some versions of Python do not tolerate this regex:
r'(\w*)?'
ERROR: test_incr (__main__.TestCommands)
----------------------------------------------------------------------
Traceback (most recent call last):
File "tests/memcached/test_memcached.py", line 464, in test_incr
self.assertRegexpMatches(call('incr key 1\r\n').decode(), r'0(\w*)?\r\n')
File "/usr/lib64/python3.3/unittest/case.py", line 1178, in deprecated_func
return original_func(*args, **kwargs)
File "/usr/lib64/python3.3/unittest/case.py", line 1153, in assertRegex
expected_regex = re.compile(expected_regex)
File "/usr/lib64/python3.3/re.py", line 214, in compile
return _compile(pattern, flags)
File "/usr/lib64/python3.3/re.py", line 281, in _compile
p = sre_compile.compile(pattern, flags)
File "/usr/lib64/python3.3/sre_compile.py", line 498, in compile
code = _code(p, flags)
File "/usr/lib64/python3.3/sre_compile.py", line 483, in _code
_compile(code, p.data, flags)
File "/usr/lib64/python3.3/sre_compile.py", line 75, in _compile
elif _simple(av) and op is not REPEAT:
File "/usr/lib64/python3.3/sre_compile.py", line 362, in _simple
raise error("nothing to repeat")
sre_constants.error: nothing to repeat
Based on the observation that with packets comprised of multiple fragments,
vhost_get_vq_desc() goes higher in the CPU profile. Avi suggested that the
current LIFO handling of free descriptors causes contention on cache
lines between seastar and vhost.
Gives 6-10% boost depending on hardware.