Commit Graph

1250 Commits

Author SHA1 Message Date
Asias He
6a468dfd3d packet: Linearize more in packet_merger::merge
This fixes the tcp_server rxrx test on DPDK. The problem is that when we
receive out-of-order packets, we hold them in the ooo queue.
We linearize the incoming packet, which copies the packet and
thus frees the old one. However, we missed one case where we need to
linearize. As a result, the original packet is held in the ooo
queue. In DPDK, the rx pool has a fixed number of buffers. When all the
DPDK buffers are in the ooo queue, we cannot receive further packets,
so rx hangs and even ping does not work.
2015-02-05 17:52:32 +08:00
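A toy model of why the missed linearization hangs rx, assuming a fixed-size buffer pool; `rx_pool`, `held_packet` and `receive_out_of_order()` are hypothetical names, not seastar's packet_merger API. Holding the original rx buffer in the ooo queue pins it, while linearizing copies the payload and returns the buffer to the pool:

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical fixed-size receive pool, standing in for the DPDK rx mempool.
struct rx_pool {
    std::size_t free_buffers;
    bool take() {
        if (free_buffers == 0) {
            return false;  // pool exhausted: the NIC cannot receive more packets
        }
        --free_buffers;
        return true;
    }
    void give_back() { ++free_buffers; }
};

// A held packet either still owns its rx buffer or owns a private heap copy.
struct held_packet {
    bool owns_rx_buffer;
    std::vector<char> copy;
};

// Receive up to `n` out-of-order packets and park them in the ooo queue.
// Returns how many packets were received before the pool ran dry.
std::size_t receive_out_of_order(rx_pool& pool, std::deque<held_packet>& ooo,
                                 std::size_t n, bool linearize) {
    std::size_t received = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (!pool.take()) {
            break;  // rx hangs: no buffer to receive into
        }
        ++received;
        if (linearize) {
            // Copy the payload and return the rx buffer to the pool at once.
            ooo.push_back(held_packet{false, std::vector<char>(64, 'x')});
            pool.give_back();
        } else {
            // The queue keeps the rx buffer itself, pinning it in the pool.
            ooo.push_back(held_packet{true, {}});
        }
    }
    return received;
}
```

With `linearize == false`, an 8-buffer pool stops receiving after 8 out-of-order packets, which is the hang described above; with linearization every buffer goes back to the pool.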
Asias He
4f21d500cb tcp: Do nothing if already in CLOSED state when close
This fixes the following:

Server side:
$ tcp_server

Client side:
$ go run client.go -host 192.168.66.123 -conn 10 -test txtx
$ control-c

At this time, the connection in tcp_server is in CLOSED state (reset by
the remote). tcp_server then calls tcp::tcb::close() and waits in
wait_for_all_data_acked(), but no one signals it. Thus we leak tons of
connections in CLOSED state.
2015-02-05 17:52:32 +08:00
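A minimal sketch of the guard, assuming a trimmed-down control block; `tcp_state` and the `waiting_for_ack` flag are illustrative stand-ins for seastar's tcb state and the wait in wait_for_all_data_acked(), not its real members:

```cpp
#include <cassert>

enum class tcp_state { established, fin_wait_1, closed };

// Hypothetical, trimmed-down control block: only what the fix needs.
struct tcb {
    tcp_state state = tcp_state::established;
    bool waiting_for_ack = false;

    void close() {
        if (state == tcp_state::closed) {
            // Already reset by the peer: nothing will ever be acked again,
            // so waiting here would leak the connection. Do nothing.
            return;
        }
        // Active close: send FIN, then wait until all data is acked.
        state = tcp_state::fin_wait_1;
        waiting_for_ack = true;
    }
};
```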
Asias He
dd741d11b8 tcp: Fix FIN is not sent in some cases
We call output_one() to make sure a packet with FIN is actually generated
and then sent out. If we only call output() and _packetq is not empty,
tcp::tcb::get_packet() will not generate the packet with FIN, so we
never send out a FIN.

This can happen when retransmitted packets have been queued into
_packetq, then an ACK arrives that acknowledges all of the unacked data,
and then the application calls close() to close the connection.
2015-02-05 17:52:32 +08:00
Asias He
f600e3c902 tcp: Add queued_len
Take the amount of queued data into account when checking whether all
the data has been sent.
2015-02-05 17:52:32 +08:00
Asias He
fca74f9563 tcp: Implement RFC6582 NewReno
We currently implement RFC5681, a.k.a. Reno TCP, as the congestion
control algorithms: slow start, congestion avoidance, fast retransmit,
and fast recovery. RFC6582 describes a specific algorithm for responding
to partial acknowledgments, referred to as NewReno, which improves on Reno.
2015-02-05 17:45:48 +08:00
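A sketch of NewReno's ACK handling during fast recovery as described in RFC 6582 section 3.2, not the seastar code itself; the struct and field names are hypothetical, the full-ACK branch uses the RFC's first window-deflation option, and sequence-number wraparound is ignored for brevity:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical sender state; plain uint32_t sequence numbers, no wraparound.
struct newreno {
    uint32_t smss;
    uint32_t cwnd;
    uint32_t ssthresh;
    uint32_t snd_una;        // oldest unacknowledged sequence number
    uint32_t recover;        // highest sequence number sent when recovery began
    bool in_fast_recovery = false;
    int retransmissions = 0; // counts "retransmit first unacked segment" actions

    // Called for an ACK that advances snd_una while in fast recovery.
    void on_ack(uint32_t ack, uint32_t flight_size) {
        uint32_t newly_acked = ack - snd_una;
        snd_una = ack;
        if (ack > recover) {
            // Full acknowledgment: exit fast recovery and deflate the window.
            cwnd = std::min(ssthresh, std::max(flight_size, smss) + smss);
            in_fast_recovery = false;
        } else {
            // Partial acknowledgment: another segment of the window was lost.
            ++retransmissions;                    // retransmit first unacked segment
            cwnd -= std::min(cwnd, newly_acked);  // deflate by the amount acked
            if (newly_acked >= smss) {
                cwnd += smss;                     // add back one SMSS
            }
            // Remain in fast recovery.
        }
    }
};
```

A partial ACK keeps the sender in fast recovery and triggers an immediate retransmission, instead of waiting for more duplicate ACKs or a retransmission timeout as plain Reno would.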
Asias He
426938f4ed tcp: Add Limited Transfer per RFC3042 and RFC5681
When RFC3042 is in use, additional data sent in limited transmit MUST
NOT be included in the ssthresh calculation (RFC5681 equation 4) used
to update _snd.ssthresh.
2015-02-05 17:05:00 +08:00
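The calculation in question is RFC5681 equation (4), ssthresh = max(FlightSize / 2, 2 * SMSS). A minimal sketch, assuming the sender tracks how many bytes it sent under limited transmit; the function and parameter names are hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// ssthresh on the third duplicate ACK (RFC 5681, equation 4), excluding
// bytes sent under Limited Transmit (RFC 3042) from FlightSize.
uint32_t ssthresh_on_loss(uint32_t flight_size, uint32_t limited_transmit_bytes,
                          uint32_t smss) {
    uint32_t effective_flight =
        flight_size - std::min(flight_size, limited_transmit_bytes);
    return std::max(effective_flight / 2, 2 * smss);
}
```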
Asias He
2289b03354 httpd: Fix RST handling
I found that wrk sometimes sends an RST instead of a FIN to close a
connection. In this case, we reset the connection and go to the CLOSED
state. However, httpd does not delete it, so we leak connections in the
CLOSED state.

Fix by handling the exception and sending an empty response, as we do
in the EOF case. We do not pass the exception to the upper layer again
here; otherwise httpd would be very noisy.
2015-02-05 16:57:58 +08:00
Gleb Natapov
89763c95c9 core: optimise timer completions vs periodic timers
The way periodic timers are rearmed during timer completion causes
timer_settime() to be called twice for each periodic timer completion:
once during rearm and a second time by enable_fn(). Fix it by providing
another function that only re-adds the timer into the timers container
but does not call timer_settime().
2015-01-29 12:43:28 +02:00
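A simplified model of the fix, assuming every timer is periodic with the same period; `timer_set`, `add_and_arm()`, `add_only()` and the `settime_calls` counter (standing in for timer_settime()) are hypothetical. Completion re-adds periodic timers without touching the OS timer, then programs it once at the end:

```cpp
#include <cassert>
#include <set>

// Hypothetical model of the reactor's timer bookkeeping.
struct timer_set {
    std::multiset<long> expirations;
    int settime_calls = 0;  // stands in for timer_settime() system calls

    // Normal arming path: insert and reprogram the OS timer.
    void add_and_arm(long when) {
        expirations.insert(when);
        ++settime_calls;
    }
    // Completion path for periodic timers: insert only, no system call.
    void add_only(long when) { expirations.insert(when); }

    // Fire all timers due at `now`, rearm them one period later, then
    // program the OS timer once for the earliest remaining expiration.
    void complete(long now, long period) {
        while (!expirations.empty() && *expirations.begin() <= now) {
            long fired = *expirations.begin();
            expirations.erase(expirations.begin());
            add_only(fired + period);
        }
        if (!expirations.empty()) {
            ++settime_calls;
        }
    }
};
```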
Avi Kivity
94e01e6d0e tests: exit after timertest ends 2015-01-29 12:24:03 +02:00
Avi Kivity
070eb7d496 tests: serialize timer tests
Otherwise the output gets interspersed.
2015-01-29 12:20:39 +02:00
Avi Kivity
59c0d7e893 smp: fix work item deletion
Delete it after completion, not after responding.
2015-01-29 12:14:05 +02:00
Gleb Natapov
bcae5f2538 smp: fix memory leak in smp queue
Delete completed items. Fixes regression from ff4aca2ee0.
2015-01-29 11:49:24 +02:00
Avi Kivity
42bc73a25d dpdk: initialize _tx_burst_idx
Should fix a random segfault.
2015-01-29 11:18:54 +02:00
Asias He
0ab01d06ac tcp: Rework segment arrival handling
Follow RFC793 section "SEGMENT ARRIVES".

There are 4 major cases:

1) If the state is CLOSED
2) If the state is LISTEN
3) If the state is SYN-SENT
4) If the state is any other state

Note:

- This change is significant (more than 10 pages in RFC793 describing
  this segment arrival handling).
- More testing is needed. The good news is that, so far, the tcp_server
  (ping/txtx/rxrx) tests and httpd work fine.
2015-01-29 10:59:31 +02:00
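A sketch of the top-level dispatch in RFC793's "SEGMENT ARRIVES" section; the function name is hypothetical, and each branch only summarizes, in order, the checks the RFC prescribes for that state rather than performing them:

```cpp
#include <cassert>
#include <string>

enum class tcp_state { closed, listen, syn_sent, syn_received, established,
                       fin_wait_1, fin_wait_2, close_wait, closing, last_ack,
                       time_wait };

// Hypothetical dispatcher: returns a summary of the RFC 793 processing order.
std::string on_segment_arrives(tcp_state state) {
    switch (state) {
    case tcp_state::closed:
        return "discard; send RST unless the segment carries RST";
    case tcp_state::listen:
        return "check RST, then ACK, then SYN";
    case tcp_state::syn_sent:
        return "check ACK, RST, security, then SYN";
    default:
        // All synchronized states share one long sequence of checks.
        return "check sequence number, RST, security, SYN, ACK, URG, text, FIN";
    }
}
```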
Tomasz Grabiec
661bb3d478 tests: Use test_runner to run boost tests 2015-01-29 10:30:14 +02:00
Tomasz Grabiec
a1fecad8cb tests: Introduce test_runner class
It uses app_template to launch the seastar framework and can be used
from outside threads to inject tasks.
2015-01-29 10:30:14 +02:00
Tomasz Grabiec
8ad50d6614 core: Add exchanger class 2015-01-29 10:30:13 +02:00
Avi Kivity
b3dd1c8285 Merge branch 'signal' of ../seastar
Simplify signal handling.
2015-01-29 10:08:27 +02:00
Takuya ASADA
9de86ed651 tests: Support tcp_server tests(ping,txtx,rxrx) on tcp_client
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
2015-01-28 16:26:46 +02:00
Tomasz Grabiec
7a55f21b29 core: Move _timer to an instance field
So that the callback which is set on it, and which is allocated on CPU
0, is destroyed on CPU 0 when the clock dies. Otherwise we may attempt
to delete it after the CPU thread is gone if CPU 0 != main thread.
2015-01-28 16:18:55 +02:00
Tomasz Grabiec
8a126b9088 core: Fix use-after-free error on _threads
When smp::configure() is called from a non-main thread, the global
state it allocates is destroyed after the reactor is destroyed, because
the global state is destroyed from the main thread, while the reactor
is destroyed together with the thread that called smp::configure().
This results in SIGSEGV when the allocator tries to free the _threads
vector across CPU threads, because the target CPU was already freed.
See issue #10.

To fix this, I introduced the smp::cleanup() method, which cleans up
all global state and should be called in the same thread in which
smp::configure() was called.

I need to call smp::configure() from a non-main thread for integration
with the boost unit testing framework.
2015-01-28 16:18:53 +02:00
Tomasz Grabiec
555977f5e6 core: drop BSD license text from resource.hh 2015-01-28 16:18:50 +02:00
Gleb Natapov
ada48a5213 net: use iterators to iterate over circular_buffer in dpdk 2015-01-28 13:49:09 +02:00
Gleb Natapov
bb072fc5c9 core: add iterator for circular_buffer container 2015-01-28 13:49:09 +02:00
Takuya ASADA
be5568ae31 distributed: handle invoke_on with void return value
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
2015-01-28 13:48:18 +02:00
Avi Kivity
38839293f9 reactor: simplify timer handling
Instead of scheduling timer processing to happen in the future,
process timers in the context of the poller (the signal poller for high
resolution timers, the lowres time poller for low resolution timers).

This both reduces timer latency (not very important) and removes a
use of promise/future in a repetitive task, which is error prone.
2015-01-28 11:29:16 +02:00
Avi Kivity
770214ea63 reactor: simplify signal handling
Signals can be repetitive, and therefore are not a good match for
promise/future, which are single shot.

Replace with plain callbacks.
2015-01-28 11:07:14 +02:00
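A minimal sketch of callback-based signal delivery, assuming a hypothetical `signal_dispatcher` class rather than the reactor's actual interface. A std::function can be invoked on every delivery, whereas a promise can be fulfilled only once:

```cpp
#include <cassert>
#include <csignal>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical dispatcher: one plain callback per signal number.
class signal_dispatcher {
    std::unordered_map<int, std::function<void()>> _handlers;
public:
    void handle_signal(int signo, std::function<void()> handler) {
        _handlers[signo] = std::move(handler);
    }
    // Called from the reactor's poller for each pending signal; the same
    // callback runs again on every delivery.
    void dispatch(int signo) {
        auto it = _handlers.find(signo);
        if (it != _handlers.end()) {
            it->second();
        }
    }
};
```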
Avi Kivity
24d5c319a3 httpd: return Server and Date headers
Required by some benchmarks.
2015-01-27 18:57:59 +02:00
Avi Kivity
a9c0fbd8f7 timer: add constructor from callback 2015-01-27 18:57:59 +02:00
Gleb Natapov
5454c79613 core: allocate reactors on each cpu instead of using thread_local variable
I see the TLS init function for engine high in the cache miss profile.
And yes, this patch has a #define.
2015-01-27 14:46:49 +02:00
Gleb Natapov
7a92efe8d1 core: add local engine accessor function
Do not use the thread local engine variable directly; use an accessor
instead.
2015-01-27 14:46:49 +02:00
Gleb Natapov
18d212b04e core: do not use separate thread_local variable to track pending signals
Access to a thread_local variable goes through a helper function.
2015-01-27 12:33:10 +02:00
Gleb Natapov
74f9f1fdd2 core: prefetch different amount of work items for different queues
Incoming item processing usually takes more work than completion
item processing. Prefetch more completion items to make sure they are
ready before access.
2015-01-25 17:51:21 +02:00
Gleb Natapov
0383459b93 core: prefetch only valid addresses
Prefetching an unmapped address incurs address translation cost.
2015-01-25 17:51:21 +02:00
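A sketch of bounded prefetching over a batch of work items, assuming GCC or Clang for `__builtin_prefetch`; `work_item`, `process_batch()` and the `distance` parameter are hypothetical. The bounds check keeps every prefetch inside the batch, so no unmapped address is touched:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct work_item { int payload = 0; };

// Process a batch, prefetching the item `distance` slots ahead of the
// current one, but only if that slot exists in the batch.
int process_batch(const std::vector<work_item*>& items, std::size_t distance) {
    int sum = 0;
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (i + distance < items.size()) {
            __builtin_prefetch(items[i + distance]);  // GCC/Clang builtin
        }
        sum += items[i]->payload;  // stand-in for real item processing
    }
    return sum;
}
```

Following the commit above, a completion queue would be processed with a larger `distance` than an incoming queue; the actual prefetch counts are not stated in this log.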
Gleb Natapov
ff4aca2ee0 core: prefetch work items before processing 2015-01-25 14:48:30 +02:00
Gleb Natapov
b9554219dc core: add prefetch functions 2015-01-25 14:48:30 +02:00
Vlad Zolotarov
85b62d8132 memory: hugetlbfs mapping may not be invalid
Turn a condition into an assert(), since an invalid mapping can only
mean that we have a bug.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-01-25 13:22:11 +02:00
Tomasz Grabiec
24227fd933 shared_ptr: Add helpers for hashing shared_ptr<> by value 2015-01-25 13:12:04 +02:00
Raphael S. Carvalho
be7dbcbf50 core: improve reactor::receive_signal()
receive_signal() uses the unordered map _signal_handlers (signo mapped to
signal_handler) to either register a signal or find an existing one, and
from there gets a future from the promise associated with that signal.
The problem is that _signal_handlers.emplace() is called unconditionally,
so the signal_handler constructor is always called and needlessly
re-registers the same handler, even when the signo is already in the map.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-01-25 13:10:55 +02:00
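One way to avoid the redundant construction, sketched with C++17 `std::unordered_map::try_emplace`, which constructs the mapped value only when the key is absent; this is not necessarily how the commit fixed it, and the `registrations` counter on the hypothetical `signal_handler` exists only to make re-registration visible:

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical handler whose constructor "registers" the signal.
struct signal_handler {
    static int registrations;
    explicit signal_handler(int /*signo*/) { ++registrations; }
};
int signal_handler::registrations = 0;

std::unordered_map<int, signal_handler> handlers;

signal_handler& get_handler(int signo) {
    // try_emplace constructs signal_handler only if signo is not yet in the
    // map; plain emplace may construct the value before finding the key.
    return handlers.try_emplace(signo, signo).first->second;
}
```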
Gleb Natapov
b250b53004 core: remove reference to smp_message_queue from async_work_item
It is unused.
2015-01-22 13:25:30 +02:00
Gleb Natapov
d83e4c49fc smp: put fields used by different cores on different cache lines 2015-01-22 13:25:30 +02:00
Avi Kivity
a8054e5aae memory: init virt to phys mapping after binding memory to node
Binding may cause memory to move, so initialize the page map after it is done.
2015-01-22 12:38:22 +02:00
Avi Kivity
d0ec99317d net: move some device and qp methods out-of-line 2015-01-22 09:44:44 +02:00
Avi Kivity
5678a0995e net: use a redirection table to forward packets to proxy queues
Build a 128-entry redirection table to select which cpu services which
packet, when we have more cores than queues (and thus need to dispatch
internally).

Add a --hw-queue-weight option to control the relative weight of the hardware queue.
With a weight of 0, the core that services the hardware queue will not
process any packets; with a weight of 1 (default) it will process an equal
share of packets, compared to proxy queues.
2015-01-22 09:36:04 +02:00
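A sketch of a weighted 128-entry redirection table, assuming a single hardware queue serviced by cpu 0 and proxy queues on cpus 1 to n-1; the round-robin fill and the function names are illustrative assumptions, not the seastar implementation:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Build a 128-entry table mapping a packet hash to the cpu that services it.
// cpu 0 (hardware queue) gets `hw_weight` shares; each proxy cpu gets one.
std::vector<unsigned> build_redirection_table(unsigned ncpus, unsigned hw_weight) {
    constexpr std::size_t table_size = 128;
    std::vector<unsigned> shares;  // one entry per share
    for (unsigned i = 0; i < hw_weight; ++i) {
        shares.push_back(0);
    }
    for (unsigned cpu = 1; cpu < ncpus; ++cpu) {
        shares.push_back(cpu);
    }
    assert(!shares.empty());  // at least one cpu must receive packets
    std::vector<unsigned> table(table_size);
    for (std::size_t i = 0; i < table_size; ++i) {
        table[i] = shares[i % shares.size()];  // spread shares round-robin
    }
    return table;
}

// Pick the servicing cpu for a packet from its (e.g. RSS) hash.
unsigned select_cpu(const std::vector<unsigned>& table, unsigned hash) {
    return table[hash % table.size()];
}
```

With a weight of 0 the hardware-queue cpu never appears in the table; with the default weight of 1 it receives the same share as each proxy cpu.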
Avi Kivity
285d4af077 memory: adjust pagemap parsing
The pfn is in bits 0:54 inclusive; we missed the high bit.

Should have no effect in systems with less than a few exabytes of memory.
2015-01-21 19:06:30 +02:00
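A minimal sketch of the corrected mask; per the kernel's pagemap documentation, bits 0-54 of each 64-bit /proc/self/pagemap entry hold the PFN and bit 63 is the "page present" flag. The helper names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// 55-bit mask covering bits 0..54; (1 << 54) - 1 would drop bit 54.
constexpr uint64_t pfn_mask = (uint64_t(1) << 55) - 1;

uint64_t pagemap_pfn(uint64_t entry) {
    return entry & pfn_mask;
}

bool pagemap_present(uint64_t entry) {
    return (entry >> 63) & 1;
}
```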
Asias He
71ac2b5b24 tcp: Rename tcp::send()
Unlike tcp::tcb::send() and tcp::connection::send(), which send tcp
packets associated with a tcb, tcp::send() only sends packets that are
not associated with any tcb. We have a bunch of send() functions; rename
it to make the code more readable.
2015-01-21 13:22:40 +02:00
Asias He
917247455c tcp: Use set_exception instead of set_value to notify user on rst 2015-01-21 11:20:06 +02:00
Asias He
8ce7cfd64b tcp: Fix listener port
It is supposed to zero the origin's port.
2015-01-21 11:20:05 +02:00
Avi Kivity
0af3af9d8d Merge branch 'asias/syn_fin_retransmit' of github.com:cloudius-systems/seastar-dev
TCP fixes from Asias, adding SYN/FIN retransmits and handling timeouts around
them.
2015-01-21 10:30:09 +02:00
Asias He
0c09a6bd7a tcp: Return a future for tcp::connect() 2015-01-21 16:20:39 +08:00