scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-03 05:26:58 +00:00

Author	SHA1	Message	Date
Gleb Natapov	8a754386c2	net: remove unused variable in native_network_stack	2014-11-25 09:54:44 +02:00
Asias He	e14674ff3c	tcp: Improve merge_out_of_order In case of seg_beg > _rcv.need, we can stop looking since seg_beg can grow only.	2014-11-25 15:44:59 +08:00
Asias He	2a3ce92b19	tcp: Reduce maximum delayed timer The maximum delayed ack timer allowed by RFC1122 is 500ms, most implementations use 200ms by default, including Windows and Linux.	2014-11-25 15:39:33 +08:00
Glauber Costa	dd8c5a3521	xen: fix index calculation The xen protocol works by filling positions in a circular ring. The indexes become free to be used again when they are processed by the other side. There is a problem, however: those indexes must be sequential, because all the sides share is a produced / consumed index. But there are situations in which we call get_index() - which produces an index X, but the .then() clause schedules some other caller of send() to run in our place. That one, in turn, can call get_index(), then create a packet with index X + 1 that will be put in the ring before the packet with index X. If the other end processes this packet very fast, it will respond saying "I have processed packets up to X + 1". We will act on it as marking X as processed as well - since it comes before X + 1, and when X is really processed, chaos will ensue. The solution for that is to just have the semaphore count how many spaces we have in the ring. Once we guarantee that the current caller has space, we then compute get_index() inside the .then() clause. This works well because the indexes are all sequential anyway. For the same reason, we are actually able to remove the queue, and resort to a simple counter. Once we know there is room, we just get the next index, whatever it may be. Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2014-11-24 17:01:14 +01:00
Glauber Costa	3f67c12925	xen: make idx method static It does not depend on any instance member. Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2014-11-24 16:44:05 +01:00
Glauber Costa	3c195d25e6	xen: useful assert We can't reach this place with a negative ref id, so let's assert to make sure we're fine. Helps catch some bugs. Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2014-11-24 16:42:55 +01:00
Glauber Costa	fa252087c4	xen: use the right index The index in the ring and the packet id tend to be the same. But they don't have to be. There are some situations where the backend and the frontend get out of sync with this, and this is totally valid. One example is when the backend skb already has enough room to hold all of the data being transmitted (netback.c, line 1611 @3.16). The netback will respond immediately, even though there are other pending packets that are not yet fully processed. The ring index, then, must come from the rsp value, not from the req/rsp id. Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2014-11-24 16:38:38 +01:00
Asias He	e0df395124	Add make_free_deleter	2014-11-24 18:16:25 +08:00
Asias He	cfd8a1f997	Revert "core: special-case deleter for raw memory" This reverts commit `f75d1822cc`.	2014-11-24 18:16:25 +08:00
Asias He	5eaecc8805	Use default allocator	2014-11-24 18:16:25 +08:00
Asias He	35186f659a	tcp: Fix transmission When a bulk of data is passed from the user application, the TCP layer calls output only once to send data. This slows TX a lot, because output will send at most MSS size of data while we might have way more than MSS to send. We will send again only after the remote acks the data we just sent. This slowness can be seen easily with tso turned off. To fix, we should send as much as we are allowed to. This patch boosts TX bandwidth from 0.N MiB/Sec to hundreds of MiB/Sec. Before: [asias@hjpc pingpong]$ go run client-txtx.go Server: 192.168.66.123:10000 Connections: 1 Bytes Received(MiB): 10 Total Time(Secs): 76.217338072 Bandwidth(MiB/Sec): 0.13120374252054473 After: [asias@hjpc pingpong]$ go run client-txtx.go Server: 192.168.66.123:10000 Connections: 1 Bytes Received(MiB): 100 Total Time(Secs): 0.5105951040000001 Bandwidth(MiB/Sec): 195.84989988466475	2014-11-24 11:54:33 +02:00
Asias He	18565277f3	tcp: Add congestion control support This patch adds congestion control to our TCP according to RFC5681. These four algorithms: slow start, congestion avoidance, fast retransmit, and fast recovery, are added. Reviewed-by: Pekka Enberg <penberg@cloudius-systems.com>	2014-11-24 11:54:19 +02:00
Avi Kivity	347b135b78	memory: rename statistics member functions to be more readable Requested by Gleb.	2014-11-24 11:53:13 +02:00
Avi Kivity	f32a8be723	memory: make statistics thread local Noticed by Gleb.	2014-11-24 11:50:07 +02:00
Avi Kivity	3b69d76f06	Merge branch 'stats' Add statistics for memory allocations and httpd.	2014-11-24 10:00:34 +02:00
Avi Kivity	88b38bfbdf	Revert "virtio: Lazy interrupts" This reverts commit 817023f91741e43731823e72d60800016cbf2633; causes hangs and throughput problems.	2014-11-24 09:28:41 +02:00
Vlad Zolotarov	1238807d98	net: implement a few proper constructors for ethernet_address Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2014-11-23 23:26:54 +02:00
Vlad Zolotarov	80396b5153	packet: Use const_cast The original cast gave an error with -Werror=cast-qual Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2014-11-23 23:25:38 +02:00
Avi Kivity	0903e36179	httpd: export statistics via collectd	2014-11-23 19:28:54 +02:00
Avi Kivity	a7f14fa13e	core: export memory statistics via collectd	2014-11-23 19:28:26 +02:00
Avi Kivity	5cd200831b	memory: add allocation statistics collection	2014-11-23 19:28:07 +02:00
Glauber Costa	5f82f7296f	xen: debug functions This is being very helpful for my local debugging. Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2014-11-20 16:57:16 +01:00
Glauber Costa	bc617f8340	make xen work again. After the latest reactor rework from Nadav, it is no longer allowed to use eventfds in the reactor for OSv. Change the code to use the reactor notifier instead. We could just use that instead of semaphores altogether. But because the semaphore is per listener, we need a translation anyway. So let's keep this one doing the interrupt processing, and the semaphores doing the rest of the work. Signed-off-by: Glauber Costa <glommer@cloudius-systems.com> Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>	2014-11-20 14:59:25 +02:00
Glauber Costa	af01032450	make xen work again. After the latest reactor rework from Nadav, it is no longer allowed to use eventfds in the reactor for OSv. Change the code to use the reactor notifier instead. We could just use that instead of semaphores altogether. But because the semaphore is per listener, we need a translation anyway. So let's keep this one doing the interrupt processing, and the semaphores doing the rest of the work. Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2014-11-20 13:41:20 +01:00
Tomasz Grabiec	e85371112b	memcached: optimize handle_get() We don't really need to copy keys as the parser is not reused until we're done. Also, in case of a single key we don't use map_reduce() which saves us one allocation (reducer).	2014-11-20 13:25:07 +02:00
Tomasz Grabiec	7737871ea0	memcached: move data rather than copy it	2014-11-20 13:25:06 +02:00
Avi Kivity	37bf4898e3	allocator_test: limit runtime to 5 seconds	2014-11-20 13:04:03 +02:00
Avi Kivity	cabdf1dcbc	allocator_test: reindent	2014-11-20 12:37:18 +02:00
Avi Kivity	3d0724c4c1	Merge branch 'xen'	2014-11-20 12:33:45 +02:00
Avi Kivity	57fdff17ff	xen: fix non-split event channel xenbus notification The xenbus name for non-split event channels is different.	2014-11-20 12:25:13 +02:00
Avi Kivity	62693f884f	xen: bind event channels after determining split event channel support	2014-11-20 12:25:13 +02:00
Avi Kivity	3118c5b464	xen: positively acknowledge supported features Instead of just nacking unsupported features, respond to all features, with a NACK for unsupported ones and ACK for supported ones.	2014-11-20 12:25:13 +02:00
Avi Kivity	ce03a3a9c0	xen: add evtchn port default constructor and move assignment operator	2014-11-20 12:17:29 +02:00
Avi Kivity	c588eb66d4	xen: fix evtchn port destructor and move constructor w/ uninitialized port	2014-11-20 12:16:40 +02:00
Gleb Natapov	4fe794a4e6	smp: fix barrier usage in smp initialization code inited variable may be destroyed while still in use by other threads. Fix it by making its scope static.	2014-11-20 12:07:57 +02:00
Tomasz Grabiec	f458117b83	core: avoid recursion in keep_doing() Recursion takes up space on stack which takes up space in caches which means less room for useful data. In addition to that, a limit on iteration count can be larger than the limit on recursion, because we're not limited by stack size here. Also, recursion makes flame-graphs really hard to analyze because keep_doing() frames appear at different levels of nesting in the profile leading to many short "towers" instead of one big tower. This change reuses the same counter for limiting iterations as is used to limit the number of tasks executed by the reactor before polling. There was a run-time parameter added for controlling task quota.	2014-11-20 11:16:09 +02:00
Asias He	8cb9185cb6	tcp: Set retransmission timer dynamically Set RTO (retransmission timer) according to RFC6298. Now, we have a dynamic RTO instead of the hard coded 3 seconds, and an exponential back-off timer for retransmission.	2014-11-20 10:50:53 +02:00
Tomasz Grabiec	9c9f3d21bf	tests: make option defaults be effective in tests	2014-11-20 10:40:03 +02:00
Avi Kivity	f80b2a6554	hwloc: fix leaking topology object	2014-11-18 10:28:23 +02:00
Avi Kivity	d222ea6ceb	util: add defer(), a function that defers work until the end of scope	2014-11-18 10:27:51 +02:00
Asias He	817023f917	virtio: Lazy interrupts Tell host to interrupt less. This is useful for tx queue completion since we do not care much when the tx is completed exactly. Passed test with memcached and tcp_server.	2014-11-18 10:17:38 +02:00
Asias He	e386b72638	tcp: Fix ACK on closed channel In case of local: Send Data + FIN remote: Ack Data + FIN We should strip 1 byte off in data ACK only if we have sent out FIN. Otherwise, we will think there is 1 bytes that remote hasn't acked and retransmit. This patch fixes the unnecessary retransmission of the last memcache get response. Found this issue when looking at TCP flow in memaslap testing. Before: 38811 1.000117000 192.168.66.100 -> 192.168.66.123 MEMCACHE 124 get 38812 1.000593000 192.168.66.123 -> 192.168.66.100 MEMCACHE 1164 VALUE 38813 1.000624000 192.168.66.100 -> 192.168.66.123 TCP 54 59708 > 11211 [FIN, ACK] Seq=2217067730 Ack=20399459 Win=185856 Len=0 38814 1.000769000 192.168.66.123 -> 192.168.66.100 TCP 54 11211 > 59708 [ACK] Seq=20399459 Ack=2217067731 Win=3737600 Len=0 38815 4.000883000 192.168.66.123 -> 192.168.66.100 MEMCACHE 1164 [TCP Retransmission] VALUE 38816 4.000934000 192.168.66.100 -> 192.168.66.123 TCP 54 [TCP Dup ACK 38813#1] 59708 > 11211 [ACK] Seq=2217067731 Ack=20399459 Win=185856 Len=0 38817 4.001054000 192.168.66.123 -> 192.168.66.100 TCP 54 11211 > 59708 [FIN, ACK] Seq=20399459 Ack=2217067731 Win=3737600 Len=0 38818 4.001094000 192.168.66.100 -> 192.168.66.123 TCP 54 59708 > 11211 [ACK] Seq=2217067731 Ack=20399460 Win=185856 Len=0 After: 38547 1.000224000 192.168.66.100 -> 192.168.66.123 MEMCACHE 124 get 38548 1.000264000 192.168.66.123 -> 192.168.66.100 MEMCACHE 1164 VALUE 38549 1.000292000 192.168.66.100 -> 192.168.66.123 TCP 54 59717 > 11211 [FIN, ACK] Seq=1862323816 Ack=20267265 Win=185856 Len=0 38550 1.000441000 192.168.66.123 -> 192.168.66.100 TCP 54 11211 > 59717 [ACK] Seq=20267265 Ack=1862323817 Win=3737600 Len=0 38551 1.000602000 192.168.66.123 -> 192.168.66.100 TCP 54 11211 > 59717 [FIN, ACK] Seq=20267265 Ack=1862323817 Win=3737600 Len=0 38552 1.000626000 192.168.66.100 -> 192.168.66.123 TCP 54 59717 > 11211 [ACK] Seq=1862323817 Ack=20267266 Win=185856 Len=0	2014-11-18 10:17:33 +02:00
Asias He	6e9521b86b	tests: Increase bytes transferred in tx test From 10MiB to 100MiB, stress more.	2014-11-18 10:16:38 +02:00
Asias He	ee023f4f84	tcp: Fix delayed ack When doing tcp rx testing, I saw a lot of retransmission because of the delayed ACK. Our current delayed ACK algorithm does not comply with what RFC 1122 suggests. As described in RFC 1122, a host may delay sending an ACK response by up to 500 ms. Additionally, with a stream of full-sized incoming segments, ACK responses must be sent for every second segment. === Before === [asias@hjpc pingpong]$ go run client-rxrx.go Bytes Sent(MiB): 100 Total Time(Secs): 322.620879376 Bandwidth(MiB/Sec): 0.30996133974160595 78 2.412385 192.168.66.100 -> 192.168.66.123 TCP 32174 37672 > 10000 [ACK] Seq=2149425323 Ack=1000001 Win=229 Len=32120 79 2.612985 192.168.66.100 -> 192.168.66.123 TCP 1514 [TCP Retransmission] 37672 > 10000 [ACK] Seq=2149425323 Ack=1000001 Win=229 Len=1460 80 2.613131 192.168.66.123 -> 192.168.66.100 TCP 54 10000 > 37672 [ACK] Seq=1000001 Ack=2149457443 Win=29200 Len=0 === After === [asias@hjpc pingpong]$ go run client-rxrx.go Bytes Sent(MiB): 100 Total Time(Secs): 0.244951095 Bandwidth(MiB/Sec): 408.2447559583271 No retransmission is seen.	2014-11-17 11:50:51 +02:00
Avi Kivity	8e47ed8b06	tests: whitelist allocator_test	2014-11-15 12:19:37 -08:00
Tomasz Grabiec	05d89f1ab9	tests: add output_stream_test	2014-11-15 12:11:11 -08:00
Tomasz Grabiec	b8344e31e0	output_stream: coalesce large buffers with data already in the buffer Assuming the output_stream size is set to 8K, a sequence of writes of lengths: 128B, 8K, 128B would yield three fragments of exactly those sizes. This is not optimal as one could fit those in just 2 fragments of up to 8K size. This change makes the output_stream yield 8K and 256B fragments for this case.	2014-11-15 11:58:10 -08:00
Tomasz Grabiec	b1208d6501	output_stream: simplify flush() output_stream can be used by only one fiber at a time so from correctness point of view it doesn't matter if we set _end before or after put(), but setting it before it allows us to have one future less, which is a win.	2014-11-15 11:58:09 -08:00
Tomasz Grabiec	825b3608a4	tests: configure reactor for tests Commit `405f3ea8c3` changed reactor so that _network_stack is no longer default initialized to POSIX but to nullptr. This caused tests to segfault, because they are not using the application template which takes care of configuration. The fix is to call configure() so that the network stack will be set to POSIX.	2014-11-15 11:58:07 -08:00
Avi Kivity	c52c56ce7b	tests: add memory allocation test	2014-11-15 11:56:16 -08:00

43077 Commits