Commit Graph

43077 Commits

Author SHA1 Message Date
Gleb Natapov
8a754386c2 net: remove unused variable in native_network_stack 2014-11-25 09:54:44 +02:00
Asias He
e14674ff3c tcp: Improve merge_out_of_order
If seg_beg > _rcv.need, we can stop looking, since seg_beg can
only grow.
2014-11-25 15:44:59 +08:00
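The early exit described in the merge_out_of_order commit above can be illustrated with a minimal sketch. It assumes out-of-order segments are kept in a container ordered by starting offset; `rx_state`, its fields, and the use of plain 64-bit offsets (no sequence-number wraparound) are hypothetical simplifications, not the project's actual types.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical receive-side state; names are illustrative only.
struct rx_state {
    uint64_t need = 0;                    // next byte offset we expect
    std::string data;                     // in-order bytes delivered so far
    std::map<uint64_t, std::string> ooo;  // out-of-order segments keyed by start offset
};

// Merge every queued segment that is contiguous with `need`.
// Returns how many map entries were examined.
inline int merge_out_of_order(rx_state& rx) {
    int examined = 0;
    for (auto it = rx.ooo.begin(); it != rx.ooo.end();) {
        ++examined;
        uint64_t seg_beg = it->first;
        uint64_t seg_end = seg_beg + it->second.size();
        if (seg_beg > rx.need) {
            // The map is ordered, so every later segment starts even
            // further away; there is nothing more to merge.
            break;
        }
        if (seg_end > rx.need) {
            // Append only the part that has not been delivered yet.
            rx.data.append(it->second, rx.need - seg_beg, std::string::npos);
            rx.need = seg_end;
        }
        it = rx.ooo.erase(it);
    }
    return examined;
}
```

With segments queued at offsets 3, 4, 10 and 20 and `need == 3`, the loop merges the first two, looks at the third, and never touches the fourth.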
Asias He
2a3ce92b19 tcp: Reduce maximum delayed timer
The maximum delayed ack timer allowed by RFC1122 is 500ms; most
implementations use 200ms by default, including Windows and Linux.
2014-11-25 15:39:33 +08:00
Glauber Costa
dd8c5a3521 xen: fix index calculation
The xen protocol works by filling positions in a circular ring. The
indexes become free to be used again when they are processed by the other side.

There is a problem, however: those indexes must be sequential, because all the
sides share is a produced / consumed index. But there are situations in which
we call get_index() - which produces an index X, but the .then() clause
schedules some other caller of send() to run in our place. That one, in turn,
can call get_index(), then create a packet with index X + 1 that will be put in
the ring before the packet with index X.

If the other end processes this packet very fast, it will respond saying "I
have processed packets up to X + 1". We will act on it by marking X as
processed as well, since it comes before X + 1, and when X is really
processed, chaos will ensue.

The solution is to have the semaphore count how many free slots we
have in the ring. Once we guarantee that the current caller has space, we then
compute get_index() inside the .then() clause. This works well because the
indexes are all sequential anyway.

For the same reason, we are actually able to remove the queue, and resort to a
simple counter. Once we know there is room, we just get the next index,
whatever it may be.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2014-11-24 17:01:14 +01:00
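The scheme in the index calculation fix above (reserve space first, take the index only at the moment the packet is placed) might look roughly like the following synchronous model. `ring_slots` and its methods are hypothetical names; the real code uses a semaphore and futures, which this sketch replaces with a plain counter.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the producer side of a shared ring.
class ring_slots {
    unsigned _size;      // number of entries in the ring
    unsigned _free;      // plays the role of the semaphore: free entries
    uint32_t _next = 0;  // next sequential producer index
public:
    explicit ring_slots(unsigned size) : _size(size), _free(size) {
        assert(size > 0);
    }
    // Step 1: reserve space. Callers may be granted space in any order.
    bool try_reserve() {
        if (_free == 0) {
            return false;
        }
        --_free;
        return true;
    }
    // Step 2: called only after a successful reservation, at the point
    // where the packet is actually placed, so indexes stay sequential.
    uint32_t take_index() {
        return _next++ % _size;
    }
    // The other side reported that it consumed n entries.
    void complete(unsigned n) {
        assert(_free + n <= _size);
        _free += n;
    }
};
```

Because the index is a simple counter taken after the reservation succeeds, no queue of free indexes is needed.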
Glauber Costa
3f67c12925 xen: make idx method static
It does not depend on any instance member.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2014-11-24 16:44:05 +01:00
Glauber Costa
3c195d25e6 xen: useful assert
We can't reach this place with a negative ref id, so let's assert to make sure
we're fine. Helps catch some bugs.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2014-11-24 16:42:55 +01:00
Glauber Costa
fa252087c4 xen: use the right index
The index in the ring and the packet id tend to be the same, but they don't
have to be. There are some situations where the backend and the frontend get out
of sync with this, and this is totally valid.

One example is when the backend skb already has enough room to hold all of the
data being transmitted (netback.c, line 1611 @3.16). The netback will respond
immediately, even though there are other pending packets that are not yet fully
processed.

The ring index, then, must come from the rsp value, not from the req/rsp id.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2014-11-24 16:38:38 +01:00
Asias He
e0df395124 Add make_free_deleter 2014-11-24 18:16:25 +08:00
Asias He
cfd8a1f997 Revert "core: special-case deleter for raw memory"
This reverts commit f75d1822cc.
2014-11-24 18:16:25 +08:00
Asias He
5eaecc8805 Use default allocator 2014-11-24 18:16:25 +08:00
Asias He
35186f659a tcp: Fix transmission
When a bulk of data is passed from the user application, the TCP layer calls
output only once to send data. This slows TX a lot, because
output sends at most MSS bytes of data while we might have far more
than MSS to send. We send again only after the remote acks the data we
just sent. This slowness can be seen easily with tso turned off.

To fix, we should send as much as we are allowed to. This patch boosts
TX bandwidth from 0.N MiB/Sec to hundreds of MiB/Sec.

Before:
[asias@hjpc pingpong]$ go run client-txtx.go
Server:  192.168.66.123:10000
Connections:  1
Bytes Received(MiB):  10
Total Time(Secs):  76.217338072
Bandwidth(MiB/Sec):  0.13120374252054473

After:
[asias@hjpc pingpong]$ go run client-txtx.go
Server:  192.168.66.123:10000
Connections:  1
Bytes Received(MiB):  100
Total Time(Secs):  0.5105951040000001
Bandwidth(MiB/Sec):  195.84989988466475
2014-11-24 11:54:33 +02:00
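The idea behind the transmission fix above, sending as many MSS-sized segments as the window allows instead of a single one, can be sketched as follows. The function name and the way the allowed window is passed in are assumptions made for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the sizes of the segments emitted in one output pass.
// `unsent` is the number of queued bytes, `window` the number of bytes
// we are currently allowed to send (hypothetical inputs).
inline std::vector<std::size_t> output_all(std::size_t unsent,
                                           std::size_t window,
                                           std::size_t mss) {
    assert(mss > 0);
    std::vector<std::size_t> segments;
    std::size_t can_send = std::min(unsent, window);
    // Keep emitting segments until the window or the data runs out,
    // rather than stopping after the first MSS-sized segment.
    while (can_send > 0) {
        std::size_t len = std::min(can_send, mss);
        segments.push_back(len);
        can_send -= len;
    }
    return segments;
}
```

For 4000 queued bytes, an MSS of 1460 and a large window, one pass emits three segments (1460, 1460, 1080) instead of one.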
Asias He
18565277f3 tcp: Add congestion control support
This patch adds congestion control to our TCP according to RFC5681.
It adds four algorithms: slow start, congestion avoidance, fast
retransmit, and fast recovery.

Reviewed-by: Pekka Enberg <penberg@cloudius-systems.com>
2014-11-24 11:54:19 +02:00
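For readers unfamiliar with RFC5681, the four algorithms named in the congestion control commit above can be sketched in a few lines. This is a simplified model of the RFC's window arithmetic, not the patch itself; `cc_state` and the function names are hypothetical, and timeout handling is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical per-connection congestion state (all sizes in bytes).
struct cc_state {
    uint32_t smss = 1460;      // sender maximum segment size
    uint32_t cwnd = 2920;      // congestion window
    uint32_t ssthresh = 65535; // slow start threshold
    uint32_t dupacks = 0;      // consecutive duplicate ACKs seen
    bool in_recovery = false;  // inside fast recovery
};

// An ACK that acknowledges new data.
inline void on_new_ack(cc_state& s, uint32_t acked_bytes) {
    s.dupacks = 0;
    if (s.in_recovery) {
        // Leaving fast recovery: deflate the window.
        s.cwnd = s.ssthresh;
        s.in_recovery = false;
        return;
    }
    if (s.cwnd < s.ssthresh) {
        // Slow start: grow by at most one SMSS per ACK.
        s.cwnd += std::min(acked_bytes, s.smss);
    } else {
        // Congestion avoidance: roughly one SMSS per round trip.
        uint64_t inc = uint64_t(s.smss) * s.smss / s.cwnd;
        s.cwnd += std::max<uint32_t>(1, uint32_t(inc));
    }
}

// A duplicate ACK. Returns true when the caller should fast-retransmit.
inline bool on_dup_ack(cc_state& s, uint32_t flight_size) {
    assert(s.smss > 0);
    ++s.dupacks;
    if (s.dupacks == 3 && !s.in_recovery) {
        s.ssthresh = std::max<uint32_t>(flight_size / 2, 2 * s.smss);
        s.cwnd = s.ssthresh + 3 * s.smss;
        s.in_recovery = true;
        return true;
    }
    if (s.in_recovery) {
        // Each further duplicate ACK means a segment left the network.
        s.cwnd += s.smss;
    }
    return false;
}
```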
Avi Kivity
347b135b78 memory: rename statistics member functions to be more readable
Requested by Gleb.
2014-11-24 11:53:13 +02:00
Avi Kivity
f32a8be723 memory: make statistics thread local
Noticed by Gleb.
2014-11-24 11:50:07 +02:00
Avi Kivity
3b69d76f06 Merge branch 'stats'
Add statistics for memory allocations and httpd.
2014-11-24 10:00:34 +02:00
Avi Kivity
88b38bfbdf Revert "virtio: Lazy interrupts"
This reverts commit 817023f91741e43731823e72d60800016cbf2633; causes hangs
and throughput problems.
2014-11-24 09:28:41 +02:00
Vlad Zolotarov
1238807d98 net: implement a few proper constructors for ethernet_address
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2014-11-23 23:26:54 +02:00
Vlad Zolotarov
80396b5153 packet: Use const_cast
The original cast gave an error with -Werror=cast-qual

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2014-11-23 23:25:38 +02:00
Avi Kivity
0903e36179 httpd: export statistics via collectd 2014-11-23 19:28:54 +02:00
Avi Kivity
a7f14fa13e core: export memory statistics via collectd 2014-11-23 19:28:26 +02:00
Avi Kivity
5cd200831b memory: add allocation statistics collection 2014-11-23 19:28:07 +02:00
Glauber Costa
5f82f7296f xen: debug functions
These have been very helpful for my local debugging.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2014-11-20 16:57:16 +01:00
Glauber Costa
bc617f8340 make xen work again.
After the latest reactor rework from Nadav, it is no longer allowed to use eventfds
in the reactor for OSv. Change the code to use the reactor notifier instead.

We could just use that instead of semaphores altogether. But because the semaphore is
per listener, we need a translation anyway. So let's keep this one doing the interrupt
processing, and the semaphores doing the rest of the work.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
2014-11-20 14:59:25 +02:00
Glauber Costa
af01032450 make xen work again.
After the latest reactor rework from Nadav, it is no longer allowed to use eventfds
in the reactor for OSv. Change the code to use the reactor notifier instead.

We could just use that instead of semaphores altogether. But because the semaphore is
per listener, we need a translation anyway. So let's keep this one doing the interrupt
processing, and the semaphores doing the rest of the work.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2014-11-20 13:41:20 +01:00
Tomasz Grabiec
e85371112b memcached: optimize handle_get()
We don't really need to copy keys as the parser is not reused until
we're done.

Also, in case of a single key we don't use map_reduce() which
saves us one allocation (reducer).
2014-11-20 13:25:07 +02:00
Tomasz Grabiec
7737871ea0 memcached: move data rather than copy it 2014-11-20 13:25:06 +02:00
Avi Kivity
37bf4898e3 allocator_test: limit runtime to 5 seconds 2014-11-20 13:04:03 +02:00
Avi Kivity
cabdf1dcbc allocator_test: reindent 2014-11-20 12:37:18 +02:00
Avi Kivity
3d0724c4c1 Merge branch 'xen' 2014-11-20 12:33:45 +02:00
Avi Kivity
57fdff17ff xen: fix non-split event channel xenbus notification
The xenbus name for non-split event channels is different.
2014-11-20 12:25:13 +02:00
Avi Kivity
62693f884f xen: bind event channels after determining split event channel support 2014-11-20 12:25:13 +02:00
Avi Kivity
3118c5b464 xen: positively acknowledge supported features
Instead of just nacking unsupported features, respond to all features,
with a NACK for unsupported ones and ACK for supported ones.
2014-11-20 12:25:13 +02:00
Avi Kivity
ce03a3a9c0 xen: add evtchn port default constructor and move assignment operator 2014-11-20 12:17:29 +02:00
Avi Kivity
c588eb66d4 xen: fix evtchn port destructor and move constructor w/ uninitialized port 2014-11-20 12:16:40 +02:00
Gleb Natapov
4fe794a4e6 smp: fix barrier usage in smp initialization code
The inited variable may be destroyed while still in use by other threads.
Fix it by making it static.
2014-11-20 12:07:57 +02:00
Tomasz Grabiec
f458117b83 core: avoid recursion in keep_doing()
Recursion takes up space on the stack, which takes up space in caches, which
means less room for useful data.

In addition to that, a limit on iteration count can be larger than the
limit on recursion, because we're not limited by stack size here.

Also, recursion makes flame-graphs really hard to analyze because
keep_doing() frames appear at different levels of nesting in the
profile leading to many short "towers" instead of one big tower.

This change reuses the same counter for limiting iterations as is used
to limit the number of tasks executed by the reactor before polling.

A run-time parameter was added for controlling the task quota.
2014-11-20 11:16:09 +02:00
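The shape of the keep_doing() change above, a loop bounded by a shared quota instead of recursion, can be illustrated with a synchronous model. The real function works with futures; here the action simply returns `false` to stop, and all names are hypothetical.

```cpp
#include <cassert>

// Why the loop returned to its caller.
enum class loop_result { finished, out_of_quota };

// Run `action` repeatedly without recursion. `task_quota` is shared with
// the caller (standing in for the reactor's task counter), so the loop
// yields once the quota is used up instead of growing the stack.
template <typename Action>
loop_result keep_doing_iteratively(Action action, unsigned& task_quota) {
    while (task_quota > 0) {
        --task_quota;
        if (!action()) {
            return loop_result::finished;
        }
    }
    // Out of quota: the caller is expected to reschedule the loop later.
    return loop_result::out_of_quota;
}
```

The stack depth stays constant regardless of the iteration count, which is also what keeps the frames at a single level in a profile.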
Asias He
8cb9185cb6 tcp: Set retransmission timer dynamically
Set RTO (retransmission timer) according to RFC6298. Now, we have a
dynamic RTO instead of the hard-coded 3 seconds, and an exponential
back-off timer for retransmission.
2014-11-20 10:50:53 +02:00
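The RFC6298 computation referred to in the retransmission timer commit above can be sketched as below. Times are in milliseconds; the struct name, the 1 ms clock granularity, and the 1 s / 60 s bounds are assumptions taken from the RFC's recommendations, not from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical RTO estimator following RFC6298 (all values in ms).
struct rto_estimator {
    double srtt = 0;           // smoothed round-trip time
    double rttvar = 0;         // round-trip time variation
    double rto = 1000;         // initial RTO before any measurement
    double min_rto = 1000;     // lower bound recommended by the RFC
    double max_rto = 60000;    // upper bound (the RFC requires >= 60 s if used)
    double granularity = 1;    // assumed clock granularity G
    bool has_sample = false;

    // Feed one round-trip time measurement.
    void sample(double r) {
        assert(r >= 0);
        if (!has_sample) {
            srtt = r;
            rttvar = r / 2;
            has_sample = true;
        } else {
            // beta = 1/4, alpha = 1/8; RTTVAR is updated before SRTT.
            rttvar = 0.75 * rttvar + 0.25 * std::fabs(srtt - r);
            srtt = 0.875 * srtt + 0.125 * r;
        }
        double raw = srtt + std::max(granularity, 4 * rttvar);
        rto = std::min(std::max(raw, min_rto), max_rto);
    }

    // Exponential back-off when the retransmission timer expires.
    void backoff() {
        rto = std::min(rto * 2, max_rto);
    }
};
```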
Tomasz Grabiec
9c9f3d21bf tests: make option defaults be effective in tests 2014-11-20 10:40:03 +02:00
Avi Kivity
f80b2a6554 hwloc: fix leaking topology object 2014-11-18 10:28:23 +02:00
Avi Kivity
d222ea6ceb util: add defer(), a function that defers work until the end of scope 2014-11-18 10:27:51 +02:00
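A helper like the defer() added above is usually a small scope guard. The following is a minimal sketch of that general technique, with a hypothetical `deferred_action` class; it is not claimed to match the project's implementation.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Runs the stored callable when the guard goes out of scope.
template <typename Func>
class deferred_action {
    Func _func;
    bool _armed = true;
public:
    explicit deferred_action(Func f) : _func(std::move(f)) {}
    // Moving transfers responsibility for running the callable.
    deferred_action(deferred_action&& o)
        : _func(std::move(o._func)), _armed(o._armed) {
        o._armed = false;
    }
    deferred_action(const deferred_action&) = delete;
    deferred_action& operator=(const deferred_action&) = delete;
    ~deferred_action() {
        if (_armed) {
            _func();
        }
    }
};

template <typename Func>
deferred_action<Func> defer(Func f) {
    return deferred_action<Func>(std::move(f));
}

// Usage: the cleanup runs at the end of the inner scope, after the work.
inline std::string defer_demo() {
    std::string log;
    {
        auto guard = defer([&log] { log += "cleanup"; });
        log += "work,";
    }
    return log;
}
```

This pattern is handy for releasing C-style resources on every exit path of a function.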
Asias He
817023f917 virtio: Lazy interrupts
Tell host to interrupt less. This is useful for tx queue completion
since we do not care much when the tx is completed exactly.

Passed test with memcached and tcp_server.
2014-11-18 10:17:38 +02:00
Asias He
e386b72638 tcp: Fix ACK on closed channel
In case of

local:  Send Data + FIN
remote: Ack Data + FIN

We should strip 1 byte off in data ACK only if we have sent out
FIN. Otherwise, we will think there is 1 byte that the remote hasn't
acked and retransmit.

This patch fixes the unnecessary retransmission of the last memcache get
response. Found this issue when looking at TCP flow in memaslap testing.

Before:
38811 1.000117000 192.168.66.100 -> 192.168.66.123 MEMCACHE 124 get
38812 1.000593000 192.168.66.123 -> 192.168.66.100 MEMCACHE 1164 VALUE
38813 1.000624000 192.168.66.100 -> 192.168.66.123 TCP 54 59708 > 11211
      [FIN, ACK] Seq=2217067730 Ack=20399459 Win=185856 Len=0
38814 1.000769000 192.168.66.123 -> 192.168.66.100 TCP 54 11211 > 59708
      [ACK] Seq=20399459 Ack=2217067731 Win=3737600 Len=0
38815 4.000883000 192.168.66.123 -> 192.168.66.100 MEMCACHE 1164
      [TCP Retransmission] VALUE
38816 4.000934000 192.168.66.100 -> 192.168.66.123 TCP 54
      [TCP Dup ACK 38813#1] 59708 > 11211
      [ACK] Seq=2217067731 Ack=20399459 Win=185856 Len=0
38817 4.001054000 192.168.66.123 -> 192.168.66.100 TCP 54 11211 > 59708
      [FIN, ACK] Seq=20399459 Ack=2217067731 Win=3737600 Len=0
38818 4.001094000 192.168.66.100 -> 192.168.66.123 TCP 54 59708 > 11211
      [ACK] Seq=2217067731 Ack=20399460 Win=185856 Len=0

After:
38547 1.000224000 192.168.66.100 -> 192.168.66.123 MEMCACHE 124 get
38548 1.000264000 192.168.66.123 -> 192.168.66.100 MEMCACHE 1164 VALUE
38549 1.000292000 192.168.66.100 -> 192.168.66.123 TCP 54 59717 > 11211
      [FIN, ACK] Seq=1862323816 Ack=20267265 Win=185856 Len=0
38550 1.000441000 192.168.66.123 -> 192.168.66.100 TCP 54 11211 > 59717
      [ACK] Seq=20267265 Ack=1862323817 Win=3737600 Len=0
38551 1.000602000 192.168.66.123 -> 192.168.66.100 TCP 54 11211 > 59717
      [FIN, ACK] Seq=20267265 Ack=1862323817 Win=3737600 Len=0
38552 1.000626000 192.168.66.100 -> 192.168.66.123 TCP 54 59717 > 11211
      [ACK] Seq=1862323817 Ack=20267266 Win=185856 Len=0
2014-11-18 10:17:33 +02:00
Asias He
6e9521b86b tests: Increase bytes transferred in tx test
From 10MiB to 100MiB, to stress it more.
2014-11-18 10:16:38 +02:00
Asias He
ee023f4f84 tcp: Fix delayed ack
When doing tcp rx testing, I saw a lot of retransmission because of the
delayed ACK.  Our current delayed ACK algorithm does not comply with
what RFC 1122 suggests.

As described in RFC 1122, a host may delay sending an ACK response by up
to 500 ms. Additionally, with a stream of full-sized incoming segments,
ACK responses must be sent for every second segment.

=== Before ===
[asias@hjpc pingpong]$ go run client-rxrx.go
Bytes Sent(MiB):  100
Total Time(Secs):  322.620879376
Bandwidth(MiB/Sec):  0.30996133974160595

78 2.412385 192.168.66.100 -> 192.168.66.123 TCP 32174 37672 > 10000
   [ACK] Seq=2149425323 Ack=1000001 Win=229 Len=32120
79 2.612985 192.168.66.100 -> 192.168.66.123 TCP 1514 [TCP Retransmission]
   37672 > 10000 [ACK] Seq=2149425323 Ack=1000001 Win=229 Len=1460
80 2.613131 192.168.66.123 -> 192.168.66.100 TCP 54 10000 > 37672
   [ACK] Seq=1000001 Ack=2149457443 Win=29200 Len=0

=== After ===
[asias@hjpc pingpong]$ go run client-rxrx.go
Bytes Sent(MiB):  100
Total Time(Secs):  0.244951095
Bandwidth(MiB/Sec):  408.2447559583271

No retransmission is seen.
2014-11-17 11:50:51 +02:00
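The RFC 1122 rule quoted in the delayed ack commit above (ACK at least every second full-sized segment, otherwise delay) can be sketched as a tiny decision function. The type and member names are hypothetical, and the timer itself is left to the caller.

```cpp
#include <cassert>
#include <cstddef>

// What the receiver should do after accepting a segment.
enum class ack_action { send_now, arm_timer };

// Hypothetical delayed-ACK bookkeeping for one connection.
struct delayed_ack {
    unsigned pending_full = 0;  // full-sized segments not yet acknowledged

    ack_action on_segment(std::size_t len, std::size_t mss) {
        assert(mss > 0);
        if (len >= mss && ++pending_full == 2) {
            // Second full-sized segment in a row: ACK immediately.
            pending_full = 0;
            return ack_action::send_now;
        }
        // Otherwise the caller arms (or keeps) the delayed-ACK timer,
        // which must fire within the RFC's 500 ms limit.
        return ack_action::arm_timer;
    }

    // Called when an ACK goes out for any other reason (timer, piggyback).
    void on_ack_sent() {
        pending_full = 0;
    }
};
```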
Avi Kivity
8e47ed8b06 tests: whitelist allocator_test 2014-11-15 12:19:37 -08:00
Tomasz Grabiec
05d89f1ab9 tests: add output_stream_test 2014-11-15 12:11:11 -08:00
Tomasz Grabiec
b8344e31e0 output_stream: coalesce large buffers with data already in the buffer
Assuming the output_stream size is set to 8K, a sequence of writes of
lengths: 128B, 8K, 128B would yield three fragments of exactly those
sizes. This is not optimal as one could fit those in just 2 fragments
of up to 8K size. This change makes the output_stream yield 8K and
256B fragments for this case.
2014-11-15 11:58:10 -08:00
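The fragment sizes in the coalescing example above can be reproduced with a small model that tracks byte counts only. `stream_model` is a hypothetical stand-in: it always fills the current buffer before emitting, which matches the 128B, 8K, 128B case but may differ from how the real output_stream treats other write patterns.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a buffered output stream; records fragment sizes.
class stream_model {
    std::size_t _size;                    // buffer capacity in bytes
    std::size_t _buffered = 0;            // bytes waiting in the buffer
    std::vector<std::size_t> _fragments;  // sizes of emitted fragments
public:
    explicit stream_model(std::size_t size) : _size(size) {
        assert(size > 0);
    }
    void write(std::size_t len) {
        while (len > 0) {
            // Top up the buffer with as much of this write as fits.
            std::size_t n = std::min(len, _size - _buffered);
            _buffered += n;
            len -= n;
            if (_buffered == _size) {
                _fragments.push_back(_size);
                _buffered = 0;
            }
        }
    }
    void flush() {
        if (_buffered > 0) {
            _fragments.push_back(_buffered);
            _buffered = 0;
        }
    }
    const std::vector<std::size_t>& fragments() const {
        return _fragments;
    }
};
```

Writing 128, 8192 and 128 bytes into an 8192-byte model and flushing yields two fragments, 8192 and 256 bytes.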
Tomasz Grabiec
b1208d6501 output_stream: simplify flush()
output_stream can be used by only one fiber at a time, so from a
correctness point of view it doesn't matter whether we set _end before or
after put(), but setting it before allows us to have one future
less, which is a win.
2014-11-15 11:58:09 -08:00
Tomasz Grabiec
825b3608a4 tests: configure reactor for tests
Commit 405f3ea8c3 changed the reactor so
that _network_stack is no longer default initialized to POSIX but to
nullptr. This caused tests to segfault, because they are not using the
application template, which takes care of configuration.

The fix is to call configure() so that the network stack will be set to
POSIX.
2014-11-15 11:58:07 -08:00
Avi Kivity
c52c56ce7b tests: add memory allocation test 2014-11-15 11:56:16 -08:00