Commit Graph

43077 Commits

Author SHA1 Message Date
Avi Kivity
f5a2dcd9ec xen: simplify evtchn port management
By switching from a map of a list of semaphores to a multimap of ports,
we have less indirection and things become more straightforward.
2014-11-09 13:14:23 +02:00
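The multimap arrangement described in f5a2dcd9ec can be sketched roughly as follows. This is an illustration only, not the actual Seastar code: `port`, `evtchn_registry`, and the `pending` counter are hypothetical names, and the real driver signals semaphores rather than bumping a counter.

```cpp
#include <map>

// Hypothetical stand-in for a bound event-channel port.
struct port {
    int channel;       // event channel number this port is bound to
    int pending = 0;   // notifications delivered but not yet consumed
    explicit port(int c) : channel(c) {}
};

// Hypothetical registry: one multimap entry per bound port, keyed by channel.
class evtchn_registry {
    std::multimap<int, port*> _ports;
public:
    void bind(port& p) { _ports.emplace(p.channel, &p); }

    void unbind(port& p) {
        auto range = _ports.equal_range(p.channel);
        for (auto it = range.first; it != range.second; ++it) {
            if (it->second == &p) {
                _ports.erase(it);
                return;
            }
        }
    }

    // Mark every port bound to the channel as ready; returns how many were notified.
    int make_ready(int channel) {
        int notified = 0;
        auto range = _ports.equal_range(channel);
        for (auto it = range.first; it != range.second; ++it) {
            ++it->second->pending;
            ++notified;
        }
        return notified;
    }
};
```

With tx and rx bound to the same channel, a single `make_ready()` call reaches both listeners without an intermediate list per channel.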
Avi Kivity
8857412365 xen: fix explicitly-disabled split event channel feature
In case the hypervisor supports the split event channel feature, but
advertises it as disabled, we must not assume it works.
2014-11-09 12:08:49 +02:00
Avi Kivity
b2af728f0e xen: provide xenstore::read_or_default()
This is useful for features that are provided incrementally, so may not
be present on all hypervisors.  If the value is not present, return a
user-provided default, which also has a system-provided default (0).
2014-11-09 12:07:31 +02:00
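A minimal sketch of the read_or_default() idea from b2af728f0e, assuming an in-memory map in place of the real xenstore. `fake_xenstore` and its `write()` helper are hypothetical, and falling back to the default on an empty or unparsable value is an assumption of this sketch, not something the commit message states.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical in-memory stand-in for the xenstore; the real one queries the hypervisor.
class fake_xenstore {
    std::map<std::string, std::string> _entries;
public:
    void write(const std::string& path, const std::string& value) {
        _entries[path] = value;
    }

    // Returns the parsed value at path, or deflt if the key is missing.
    // deflt itself defaults to T(), which is 0 for integer types.
    template <typename T = int>
    T read_or_default(const std::string& path, T deflt = T()) const {
        auto it = _entries.find(path);
        if (it == _entries.end() || it->second.empty()) {
            return deflt;   // assumption: an empty value counts as "not present"
        }
        std::istringstream in(it->second);
        T value;
        if (!(in >> value)) {
            return deflt;   // assumption: unparsable values also fall back
        }
        return value;
    }
};
```

A caller probing an optional feature can then write `xs.read_or_default("some-feature")` and treat 0 as "not supported" without handling an exception.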
Glauber Costa
9a8cde5170 xen: have a list of semaphores per event channel
We currently have one port per event channel. We need to have a list of
semaphores that will all be made ready when an interrupt kicks in. This is
useful in the case where both tx and rx are bound to the same event channel.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2014-11-09 11:54:10 +02:00
Glauber Costa
7ae3ab57d4 xen: change process_interrupts to take a port, rather than a semaphore
If we do that, plus make it an instance method, we should be able to use
make_ready_port. This is consistent with the userspace implementation and
from that point any changes there will be propagated to both.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2014-11-09 11:54:10 +02:00
Glauber Costa
ab3d02e347 xen: use a port class instead of an integer to represent an event channel
The representation of an event channel as an integer poses a problem, in which
waiting on an integer port doesn't work well when the same event channel is
assigned for both tx and rx. The future will be ready for one of the sides, but
we won't process the other.

One alternative is to have conditions in the future processing, and in case the
event channels are bound to the same port, process both events. But a better
solution is to use a class to represent the bound ports, and instances of those
classes will have their own pending methods.

Infrastructure will be written in a following patch to make sure that all
listeners to the same port will be made ready when an interrupt kicks in.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2014-11-09 11:54:08 +02:00
Glauber Costa
4dcc48c306 evtchn: allow to retrieve instance without parameters
gntalloc already has a method like this, code it for evtchn as well.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2014-11-09 11:54:07 +02:00
Glauber Costa
9cd9fda570 xen: don't take split feature for granted
The backend may be completely silent about the existence of the split channels feature.
In that case, trying to read through the template directly would cause an exception,
since we can't convert the empty string.

The backend-id, OTOH, is guaranteed to exist and wasn't using the template signature.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2014-11-09 11:54:06 +02:00
Avi Kivity
467ff9cbe2 xen: fix grant reference leaks
We copy our grant reference into a temporary, so free_ref() does not
clear the real entry, causing an assert() to trigger later on.

Fix by capturing the grant reference entry by reference.

With this, the xen network driver survives multiple trips around the ring.
2014-11-07 14:26:00 +02:00
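The copy-versus-reference bug described in 467ff9cbe2 can be reproduced in a few lines. The sketch below is illustrative: this `gntref` layout, the `-1` sentinel, and both `release_*` helpers are assumptions, not the driver's actual definitions.

```cpp
#include <cstddef>
#include <vector>

struct gntref {
    int id = -1;   // -1 marks "no grant held" in this sketch
};

// Clears whichever object it is handed.
void free_ref(gntref& ref) { ref.id = -1; }

// Buggy variant: free_ref() clears a temporary copy, the table entry keeps its id.
bool release_by_copy(std::vector<gntref>& entries, std::size_t i) {
    auto ref = entries[i];    // copy of the entry
    free_ref(ref);
    return entries[i].id == -1;
}

// Fixed variant: the entry is captured by reference, so the real slot is cleared.
bool release_by_reference(std::vector<gntref>& entries, std::size_t i) {
    auto& ref = entries[i];   // alias of the entry
    free_ref(ref);
    return entries[i].id == -1;
}
```

Each helper returns whether the table entry was actually cleared, which makes the difference between the two variants directly observable.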
Avi Kivity
ee7ec972eb xen: request tx notification on tx completion
Otherwise, we never learn that transmission has completed and never
recycle ring entries.

This is still a little lame as we don't do any batching.
2014-11-07 14:26:00 +02:00
Avi Kivity
8a29d4a78a xen: replenish rx ring entries
Keep recycling free ring entries back into the receive ring so we can
receive more than 256 packets.

The code is a little lame at the moment since it writes the index and
notifies the host for every frame, but that can be adjusted later.
2014-11-07 14:25:58 +02:00
Avi Kivity
8dff32eea5 xen: simplify free grant table ref id management
There is no reason to wait when pushing back a free id - there is nothing
that could possibly block there.

Switch from a queue<> to an std::queue<> and use a semaphore to guard
popping from the queue.
2014-11-07 13:09:24 +02:00
Asias He
dbfd636a0b net: Fix proxy_net_device option parse
Running the tcp stream test with --smp > 1, the server sometimes sends TSO
frames and sometimes does not. With --smp = 1, the server always
sends TSO frames. This is because the proxy device does not parse all the
features in the opts. We should copy the _hw_features from the real
device but it is not easy. For now, we simply duplicate the parse code.
2014-11-07 11:17:33 +02:00
Asias He
ff674d3e0e tcp: Avoid unnecessary ACK
E.g. Avoid Dup ACK in packet #981

979 4.115432000 192.168.66.123 -> 192.168.66.100 TCP 20406
    [TCP Window Full] 10000 > 50112 [ACK] Seq=10675905 Ack=801512443 Win=3737600 Len=20352

980 4.119002000 192.168.66.100 -> 192.168.66.123 TCP 54
    [TCP ZeroWindow] 50112 > 10000 [ACK] Seq=801512443 Ack=10696257 Win=0 Len=0

981 4.119063000 192.168.66.123 -> 192.168.66.100 TCP 54
    [TCP Dup ACK 979#1] 10000 > 50112 [ACK] Seq=10696257 Ack=801512443 Win=3737600 Len=0

982 4.137244000 192.168.66.100 -> 192.168.66.123 TCP 54
    [TCP Window Update] 50112 > 10000 [ACK] Seq=801512443 Ack=10696257 Win=40704 Len=0
2014-11-07 11:17:33 +02:00
Asias He
5b994fb4f0 tcp: Fix _data_received_promise and _all_data_acked_promise
We should clear it right after we set value, otherwise we might set
value more than once.
2014-11-07 11:17:31 +02:00
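The "clear it right after we set value" rule from 5b994fb4f0 can be sketched with a one-shot notification held in an optional. This is a simplified stand-in: `receiver` is hypothetical, and a `std::function` callback plays the role that a Seastar promise plays in the real code.

```cpp
#include <functional>
#include <optional>
#include <utility>

// Hypothetical receiver with at most one outstanding waiter.
class receiver {
    std::optional<std::function<void()>> _data_received;   // stand-in for a promise
public:
    void wait_for_data(std::function<void()> on_ready) {
        _data_received = std::move(on_ready);
    }

    // Called for every incoming segment. Returns true if a waiter was notified.
    bool on_data() {
        if (!_data_received) {
            return false;            // nobody waiting, nothing to fulfil
        }
        auto notify = std::move(*_data_received);
        _data_received.reset();      // clear first, so a second call cannot fire again
        notify();
        return true;
    }
};
```

Because the slot is emptied as part of fulfilling it, a burst of segments notifies the waiter exactly once.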
Raphael S. Carvalho
3878d387f0 memcache: udp: allocate conversation state
Fix UDP for memcache with native stack

memcached: apps/memcached/memcached.cc:807:
void memcache::assert_resolved(future<>): Assertion `f.available()'
failed.

Tomek writes:
UDP path relied on the fact that handle() could not block, because
the output stream does not block, and passed references to variables
which live on stack. Since you now can block in handle_get(), this
no longer holds. We should change that, i.e. allocate conversation
state like we do in TCP.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2014-11-06 18:21:56 +02:00
Asias He
784bf7a7e2 tests: Rename class http_server to tcp_server 2014-11-06 17:06:10 +02:00
Tomasz Grabiec
26361873bc test_ascii_parser: fix potential use-after-free error 2014-11-06 15:48:31 +02:00
Asias He
1e40660248 net: Switch to optional for _data_received 2014-11-06 14:50:16 +02:00
Asias He
14130ab1e8 net: Fix TCP sending of bulk data
Fix tcp_server tx test. We still have more to do.

Native stack:
$ go run client-txtx.go
Bytes Received(MiB):  1000
Total Time(Secs):  1.567927562
Bandwidth(MiB/Sec):  637.7845662234746

Posix stack:
$ go run client-txtx.go
Bytes Received(MiB):  1000
Total Time(Secs):  1.014354958
Bandwidth(MiB/Sec):  985.8481906291427

Note: client-txtx uses 100 concurrent connections.
2014-11-06 14:50:12 +02:00
Asias He
2a582fd1a6 net: fix tso maximum packet size
With TSO enabled, we can see an Ethernet frame larger than 64K on the tap
device. Wireshark cannot handle such a frame and complains:

   The capture file appears to be damaged or corrupt.
   (pcapng_read_packet_block: cap_len 65549 is larger than
   WTAP_MAX_PACKET_SIZE 65535.)
2014-11-06 14:50:11 +02:00
Asias He
2e9366ba24 tests: Add send test to tcp_server
$ printf "txtx" | nc 192.168.66.123 10000

Server will send a large amount of data to client for TCP tx testing.
Currently, it sends 10MB of char 'X'.
2014-11-06 14:49:53 +02:00
Avi Kivity
4df81e0fba Merge branch 'glommer/xen' of github.com:cloudius-systems/seastar-dev
From Glauber:

"This is all the xen work I have. There is still improvements to be made with
the ring management, memory allocation, and other areas."
2014-11-06 12:45:30 +02:00
Glauber Costa
6c0aaa126c xen: grant recycle
Handle buffer recycling. Right now it is very simple: allocate a new receive
buffer after a successful receive, and mark the tx spot free when we get the tx
event notification.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2014-11-06 11:22:15 +01:00
Glauber Costa
3d0f2de8bb xen: method to end a grant operation
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2014-11-06 11:21:30 +01:00
Glauber Costa
0a1f5f9e73 xen: defer grant table operations
Instead of returning a reference to a grant that is already present in an
array, defer the initialization. This is how the OSv driver handles it, and I
honestly am not sure if this is really needed: it seems to me we should be able
to just reuse the old grants. I need to check in the backend code if we can be
any smarter than this.

However, right now we need to do something to recycle the buffers, and just
re-doing the refs would lead to inconsistencies. So the best option for now is to close
and reopen the grants, and then later on rework this in a way that works for
both the initial setup and the recycle.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2014-11-06 11:21:30 +01:00
Glauber Costa
722926d545 xen: factor out allocation of a single rx entry
I'll need this code later to refill the buffer, so factor this out

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2014-11-06 11:21:30 +01:00
Glauber Costa
01c861fba4 xen: don't increment producer index in receive path
Right now, we allocate the whole index, and notify the backend that we have
produced nr_ents indexes. If we do that, we cannot increment the producer index
when we receive a new packet. This would make the index overflow, and
it is responsible for the biggest part of the slowdown we are
seeing.

Before this patch, we're seeing 2s RTT for pings. After the patch:

64 bytes from 192.168.100.79: icmp_seq=1 ttl=64 time=0.437 ms
64 bytes from 192.168.100.79: icmp_seq=2 ttl=64 time=0.431 ms
64 bytes from 192.168.100.79: icmp_seq=3 ttl=64 time=0.475 ms

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2014-11-06 11:21:29 +01:00
Glauber Costa
ae1122bfc8 xen: manage index list
Aside from managing the grant references, we also need to manage the positional
indexes in the array. We need to keep track of which indexes are free, and
which are used. Because we need the actual position number to fill xen's data
structures, I figured we could use a queue and then fill it up with all the
integers in our range. The queue is already futurized, so that's easy.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2014-11-06 11:21:29 +01:00
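The free-index bookkeeping described in ae1122bfc8 amounts to a queue pre-filled with every slot number. The sketch below is synchronous and uses `std::queue`, whereas the commit relies on a futurized queue; `index_pool` and its method names are hypothetical.

```cpp
#include <cstddef>
#include <queue>

// Hypothetical pool of ring positions: everything in [0, n) starts out free.
class index_pool {
    std::queue<unsigned> _free;
public:
    explicit index_pool(unsigned n) {
        for (unsigned i = 0; i < n; ++i) {
            _free.push(i);
        }
    }

    // Hands out a free position; returns false when none is left.
    bool get(unsigned& idx) {
        if (_free.empty()) {
            return false;
        }
        idx = _free.front();
        _free.pop();
        return true;
    }

    // Returns a position to the pool once its ring entry has been consumed.
    void put(unsigned idx) { _free.push(idx); }

    std::size_t available() const { return _free.size(); }
};
```

The value handed out is the actual position number, which is what has to be written into Xen's data structures.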
Glauber Costa
ee172e36c1 xen: enhance gntref
Enhance gntref with some useful operations. Also provide a default object that
represents an invalid grant.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2014-11-06 11:21:29 +01:00
Gleb Natapov
d698811bdd fix smp broadcast packet handling
Some packets, like arp replies, are broadcast to all cpus for handling,
but only the packet structure is copied for each cpu; the actual packet data
is shared by all of them. Currently the networking stack mangles the
packet data on its way up the stack while doing ntoh()
translations, which obviously cannot work for broadcast packets. This
patch fixes the code to not modify packet data while doing ntoh(), but
to do it in a stack-allocated copy of the data instead.
2014-11-06 10:30:30 +02:00
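The approach in d698811bdd, converting byte order in a stack copy instead of in the shared buffer, might look like the sketch below. `udp_hdr` and `read_header()` are hypothetical and only stand in for the stack's real header types.

```cpp
#include <arpa/inet.h>
#include <cstdint>
#include <cstring>

// Hypothetical 8-byte header with all fields in network byte order on the wire.
struct udp_hdr {
    uint16_t src_port;
    uint16_t dst_port;
    uint16_t len;
    uint16_t cksum;
};

// Returns a host-order copy of the header; the shared wire bytes are never written.
udp_hdr read_header(const unsigned char* wire) {
    udp_hdr h;
    std::memcpy(&h, wire, sizeof(h));   // copy onto the stack first
    h.src_port = ntohs(h.src_port);     // convert only the private copy
    h.dst_port = ntohs(h.dst_port);
    h.len = ntohs(h.len);
    h.cksum = ntohs(h.cksum);
    return h;
}
```

Taking the buffer as `const` lets the compiler reject any in-place conversion, so every cpu that receives the same broadcast data sees it unmodified.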
Pekka Enberg
86aa399482 net: Fix build when Xen support is disabled
Fixes the following link errors when Xen support is disabled:

build/release/net/native-stack.o: In function `net::add_native_net_options_description(boost::program_options::options_description&)':
/seastar/net/native-stack.cc:101: undefined reference to `get_xenfront_net_options_description()'
build/release/net/native-stack.o: In function `net::create_native_net_device(boost::program_options::variables_map)':
/seastar/net/native-stack.cc:93: undefined reference to `create_xenfront_net_device(boost::program_options::variables_map, bool)'
collect2: error: ld returned 1 exit status

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2014-11-06 10:24:03 +02:00
Glauber Costa
63c8db870f xen: remove debug printfs
As packet flow is working reasonably now, most of the prints can go.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2014-11-05 22:30:25 +01:00
Glauber Costa
73b8f98318 xen: use nr_ents instead of numeric constant in netfront header
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2014-11-05 21:41:52 +01:00
Avi Kivity
5052d34d23 Merge branch 'xen'
Partial Xen support.
2014-11-05 15:31:23 +02:00
Avi Kivity
369f31d4c5 xen: simplify front_ring constructor 2014-11-05 15:09:04 +02:00
Avi Kivity
2d14053e6e xen: make gntref more readable
Convert it from std::pair with meaningless .first and .second fields to
a proper struct.
2014-11-05 15:09:04 +02:00
Avi Kivity
0a0dc6eb90 xen: provide correct checksum offload flags to the host
Tell Xen when we've computed the checksum ourselves, and when we have a
partial checksum filled.
2014-11-05 15:09:04 +02:00
Avi Kivity
c52b4fdc47 xen: partial support for checksum offload
Checksum offload cannot be disabled in Xen (or at least, I haven't figured
out how).  Advertise it as enabled, so that tcp doesn't drop packets as
failing their checksum.

Still need to flesh out the transmit path.

With this, seastar sends SYN/ACK packets in response to connection requests.
2014-11-05 15:09:04 +02:00
Avi Kivity
6581de0fa7 xen: nack features we don't support yet
Pretending to support a feature we don't can lead to protocol failures.
2014-11-05 15:09:04 +02:00
Avi Kivity
2fdaac3132 xen: linearize packet before transmitting
Since we haven't negotiated the scatter/gather capability yet, and we
don't support the scatter protocol, linearize the packet before sending it.
2014-11-05 15:09:03 +02:00
Avi Kivity
a9a87c8dbd xen: fix low-level interrupt handling with osv
The Xen code registers a function that calls semaphore::signal as
an interrupt handler, however that function is not smp safe and may crash,
and the events it generates are likely to be ignored, since they are just
appended to the reactor queue without any real wakeup to the reactor thread.

Switch to using an eventfd.  That's still unsafe, but a little better, since
its signalling is smp safe, and will cause the reactor thread to wake up
in case it was asleep.

With this, we are able to receive multiple packets.
2014-11-05 15:09:03 +02:00
Avi Kivity
6e193b2874 xen: fix memory barrier when writing rx buffer ring
The barrier must separate writing the ring data from the ring index,
otherwise the other side may see unwritten ring data.
2014-11-05 15:09:03 +02:00
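The ordering requirement from 6e193b2874 (ring data must be visible before the ring index) can be illustrated with C++ atomics. This is a sketch under assumptions: `rx_ring`, `publish()`, and `consume()` are hypothetical, and the real Xen ring lives in memory shared with the host and uses explicit barriers rather than `std::atomic`.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical single-producer ring; req_prod is the index the other side polls.
struct rx_ring {
    static constexpr uint32_t size = 256;
    uint32_t slots[size] = {};
    std::atomic<uint32_t> req_prod{0};
};

// Producer: write the slot, then publish the index with release ordering,
// so the slot contents cannot become visible after the index update.
void publish(rx_ring& ring, uint32_t value) {
    uint32_t prod = ring.req_prod.load(std::memory_order_relaxed);
    ring.slots[prod % rx_ring::size] = value;
    ring.req_prod.store(prod + 1, std::memory_order_release);
}

// Consumer: the acquire load of the index pairs with the release store above.
bool consume(rx_ring& ring, uint32_t& cons, uint32_t& out) {
    if (cons == ring.req_prod.load(std::memory_order_acquire)) {
        return false;   // nothing new has been published
    }
    out = ring.slots[cons % rx_ring::size];
    ++cons;
    return true;
}
```

If the index were stored before the slot, or without ordering, the consumer could observe the new index and still read stale slot contents.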
Avi Kivity
9f5a4e90d1 xen: fix misaccounting of prepared rx buffers
We prepared N buffers, but only told the host about one.  This meant the host
stopped forwarding received packets almost immediately.

Fix by writing the Xen-visible ring index correctly.
2014-11-05 15:09:03 +02:00
Avi Kivity
80c8337eef xen: don't receive packets before we've created a subscription
Or the code falls over on a null _sub.
2014-11-05 15:09:03 +02:00
Avi Kivity
a769737faa xen: fix another bad grant operation
We used gnttab_grant_foreign_access() instead of
gnttab_grant_foreign_access_ref().  While the two functions have similar
enough signatures, they do very different things.

With the change, we are able to receive packets from Xen, though we crash
immediately.
2014-11-05 15:09:03 +02:00
Avi Kivity
afbe788235 xen: fix bad grant operation
We used gnttab_grant_foreign_access() instead of
gnttab_grant_foreign_access_ref().  While the two functions have similar
enough signatures, they do very different things.

With the change, we are able to transmit packets through Xen.
2014-11-05 15:09:03 +02:00
Avi Kivity
6269fe2bdf xen: fix virt_to_mfn()
Need to shift by 12 to get to a frame number.  With this the host accepts
the guest interface.
2014-11-05 15:09:03 +02:00
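The shift mentioned in 6269fe2bdf follows from 4 KiB pages: a frame number is an address with the low 12 bits dropped. The helpers below are hypothetical and cover only that arithmetic; the real virt_to_mfn() also has to translate a virtual address to a machine address, which is not shown.

```cpp
#include <cstdint>

constexpr unsigned page_shift = 12;   // 4 KiB pages: 1 << 12 == 4096

// Address -> frame number: discard the offset within the page.
constexpr uint64_t addr_to_frame(uint64_t addr) {
    return addr >> page_shift;
}

// Frame number -> address of the first byte of that frame.
constexpr uint64_t frame_to_addr(uint64_t frame) {
    return frame << page_shift;
}
```

For example, address 0x12345678 lies in frame 0x12345 at offset 0x678.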
Glauber Costa
6bb8d687d0 native stack: support more than virtio
Support xenfront as well, when we are in a Xen domain.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2014-11-05 15:09:03 +02:00
Glauber Costa
72abe62c4e xenfront basic support
This is the basic support for xenfront. It can be used in domU, provided there
is a network interface to be hijacked.

The code that follows is just the mechanics of managing the grants, event
channels, etc.

However, it does not yet work: I can't see netback injecting any data into it.
I am still debugging the protocol, but I wanted to flush the current state.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2014-11-05 15:09:03 +02:00