scylladb

Author	SHA1	Message	Date
Gleb Natapov	32b42af49f	net: register l3 poller for tcp connections This patch changes tcp to register a poller so that l3 can poll tcp for a packet instead of pushing packets from tcp to ipv4. This pushes networking tx path inversion a little bit closer to an application.	2015-01-11 10:48:32 +02:00
Gleb Natapov	d5c309c74e	net: provide poller registration API between l3 and l4 Both push and pull methods will be supported between l3 and l4 after this patch.	2015-01-11 10:17:48 +02:00
Gleb Natapov	2b340b80ce	net: unfuturize packet fragmentation Since sending of a single packet does not involve futures anymore we can simplify this code.	2015-01-11 10:17:48 +02:00
Gleb Natapov	b824790798	net: move udp_v4 from network_stack into ipv4 class ipv4 class manages tcp and icmp, but for some reason udp is managed by network_stack. Fix this and make all L4 protocol handling the same.	2015-01-08 11:33:19 +02:00
Gleb Natapov	0fd014fc35	net: add completion callback between l3 and l4 L4 will provide the callback to be called by L3 after the packet is handed to lower layers for transmission. L4 will know that it can queue more data from user at this point. The patch also changes the send function, which can no longer block, to return void instead of future<>.	2015-01-06 15:24:10 +02:00
Gleb Natapov	e80fa4af7d	net: drop top level 'remaining' from ipv4::send() It is not needed.	2015-01-06 15:24:10 +02:00
Gleb Natapov	12bce3f4fc	net: make interface get packets from l3 Instead of l3 (arp/ipv4) pushing packets into interface's queue, make them register functions that interface can use to ask l3 for packets.	2015-01-06 15:24:10 +02:00
Avi Kivity	87f63f7b90	shared_ptr: rename to lw_shared_ptr (for light-weight) The current shared_ptr implementation is efficient, but does not support polymorphic types. Rename it in order to make room for a polymorphic shared_ptr.	2015-01-04 22:38:49 +02:00
Gleb Natapov	510171d083	net: add function to map packet's rss hash to a cpu Provide a function that maps packet's rss hash to a cpu that should handle it. This function is needed to find appropriate src port for outgoing tcp/udp connection. Use this function to forward de-fragmented ip packet to avoid one extra hop too.	2014-12-23 17:36:40 +02:00
Avi Kivity	3e4c53300d	Merge branch 'mq' of ssh://github.com/cloudius-systems/seastar-dev Multiqueue support for #cpu != #q, from Gleb.	2014-12-16 11:11:22 +02:00
Gleb Natapov	d8ddaeb104	net: forward reassembled ip packet to correct queue To figure out which cpu should handle a reassembled TCP packet, the RSS redirection table has to be consulted.	2014-12-16 10:53:41 +02:00
Gleb Natapov	7ac3ba901c	net: rework packet forwarding logic Instead of forward() deciding packet destination make it collect input for RSS hash function depending on packet type. After data is collected use toeplitz hash function to calculate packet's destination.	2014-12-16 10:53:41 +02:00
Gleb Natapov	c13adb9c12	net: rework how dhcp handles dhcp packet. Currently dhcp assumes that cpu 0 gets all the packets and redistributes them by itself. With multiqueue this is not necessarily the case, so the current trick to disable forwarding by installing a special dhcp forward() function will not work. Rework it by installing a packet filter on all cpus before running dhcp and forwarding all dhcp packets to cpu 0.	2014-12-15 17:31:25 +02:00
Asias He	0790266be0	ip: Switch to use lowres_clock	2014-12-15 19:39:33 +08:00
Nadav Har'El	3d874892a7	dpdk: enable transmit-side checksumming offload This patch uses the NIC's capability to calculate in hardware the IP, TCP and UDP checksums on outgoing packets, instead of us doing this on the sending CPU. This can save us quite a bit of calculations (especially for the TCP/UDP checksum of full-sized packets), and avoid cache pollution on the CPU when sending cold data. On my setup this patch improves the performance of a single-cpu memcached by 6%. Together with the recent patch for receive-side checksum offloading, the total improvement is 10%. This patch is somewhat complicated by the fact we have so many different combinations of checksum-offloading capabilities: while virtio can only offload layer-4 checksumming (tcp/udp), dpdk lets us offload both ip and layer-4 checksum. Moreover, some packets are just IP but not TCP/UDP (e.g., ICMP), and some packets are not even IP (e.g., ARP), so this patch modifies a few of the hardware-features flags and the per-packet offload-information flags to fit our new needs. Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>	2014-12-10 18:05:02 +02:00
Asias He	9a9297c89d	ip: Implement fragment timeout and memory usage limit	2014-12-09 09:59:44 +02:00
Avi Kivity	a2016bc1dd	ip: fix smp fragment reassembly ipv4::handle_on_cpu() did not properly convert from network byte order, so it saw any packets with DF=1 as fragmented. Fix by applying the proper conversion.	2014-12-07 12:01:31 +02:00
Asias He	59aa280f0d	ip: Add IPv4 reassembly support If a TCP or UDP IP datagram is fragmented, only the first fragment will contain the port information. When a fragment without port information is received, we have no idea which "stream" this fragment belongs to, thus we have no idea how to forward this packet. To solve this problem, we use the "forward twice" method. When a fragment of an IP datagram is received, we forward it using the frag_id(src_ip, dst_ip, identification, protocol) hash. When all the fragments are received, we forward the reassembled datagram using the connection_id(src_ip, src_port, dst_ip, dst_port) hash.	2014-12-03 21:40:49 +08:00
Asias He	88a1a37a88	ip: Support IP fragmentation in TX path Tested with UDP sending large datagrams with ufo off.	2014-11-30 10:16:38 +02:00
Gleb Natapov	d698811bdd	fix smp broadcast packet handling Some packets, like arp replies, are broadcast to all cpus for handling, but only the packet structure is copied for each cpu; the actual packet data is the same for all of them. Currently the networking stack mangles packet data during its travel up the stack while doing ntoh() translations, which obviously cannot work for broadcast packets. This patch fixes the code to not modify packet data while doing ntoh(), but to do it in a stack-allocated copy of the data instead.	2014-11-06 10:30:30 +02:00
Calle Wilund	bd263b3b4e	net: Add "packet filter" functionality + accessors + "raw" packet send function Perhaps not the best way to enable "hijacking" the ip stack (for DHCP querying), but considering the options seems the least intrusive. Signed-off-by: Calle Wilund <calle@cloudius-systems.com>	2014-11-05 14:50:28 +02:00
Asias He	c33270105b	net: Handle extra bytes contained in Ethernet frame. The Ethernet frame might contain extra bytes after the IP packet for padding. Trim the extra bytes in order not to confuse TCP. E.g. When doing TCP connection: 1) <SYN> 2) <SYN,ACK> 3) <ACK> Packet 3) should be 14 + 20 + 20 = 54 bytes, the sender might send a packet of size 60 bytes, containing 6 extra bytes for padding. Fix httpd on ran/sif.	2014-11-04 10:41:41 +02:00
Avi Kivity	7a1f84a556	reactor: replace references to reactor::_id by its accessor cpu_id()	2014-11-01 17:34:43 +02:00
Asias He	2625dd5944	net: Introduce eth_protocol_num	2014-10-13 11:37:56 +08:00
Asias He	5cf3f200c5	net: Introduce ip_protocol_num We use this in all the places where the ip protocol number is used.	2014-10-13 11:37:56 +08:00
Asias He	05c72b0808	net: UDP checksum offload and UDP fragmentation offload	2014-10-13 11:37:56 +08:00
Gleb Natapov	4e7d8a8506	Introduce packet classification mechanism Classifier returns which cpu a packet should be processed on. It may return a special broadcast identifier. The patch includes classifiers for tcp, udp and arp. Arp classifier broadcasts arp reply to all cpus. Default classifier does not forward packet.	2014-10-07 11:03:57 +03:00
Tomasz Grabiec	05aece51dc	virtio: remove intermediate queue Currently the send path buffers packets in (unbounded) _tx_queue and in virtio ring. One queue would suffice though. This change also propagates the back pressure resulting from queue-full condition up the send path. This is needed, because otherwise if senders are faster than the network we will eventually run out of memory. This would also cause a "buffer bloat" effect, which hurts latency-sensitive workloads.	2014-10-04 11:27:23 +02:00
Tomasz Grabiec	076d3b2682	ip: connect send() action with L3's send() action So that back-pressure or failure from the lower layers is transferred.	2014-10-04 11:27:23 +02:00
Tomasz Grabiec	04b53b7498	ip: make send() composable This allows the caller to compose it with other actions when send() is done or when it fails.	2014-10-01 13:45:28 +02:00
Asias He	cff8cb353a	net: Add netmask option	2014-09-28 10:06:08 +03:00
Asias He	7ab735d3c7	net: Gateway support	2014-09-28 10:05:58 +03:00
Asias He	c5d623265d	net: Support ping	2014-09-25 17:49:38 +03:00
Asias He	4bd9f4d49d	net: Always check IP header checksum virtio-net does TCP checksum offload, not IP header checksum offload. Signed-off-by: Avi Kivity <avi@cloudius-systems.com>	2014-09-25 12:56:53 +03:00
Asias He	236418d262	net: Support TCP checksum offload It gives ~5% httpd improvements on monster. csum-offload option is added, e.g., to disable: ./httpd --network-stack native --csum-offload off	2014-09-24 11:03:39 +03:00
Avi Kivity	907792fe26	net: ipv4: remove unused field to please clang	2014-09-22 17:19:15 +03:00
Avi Kivity	abbc62588b	net: checksum incoming ipv4 packets	2014-09-22 15:46:35 +03:00
Avi Kivity	313768654a	net: remove queuing from l2->l3 rx path Use a subscription instead. Queueing should be implemented at the highest possible level (e.g. tcp), to avoid double-queueing.	2014-09-22 11:28:35 +03:00
Tomasz Grabiec	53ce24c850	ip: add method to register L4 protocol handlers	2014-09-16 18:48:14 +03:00
Avi Kivity	37c90fe54e	net: make packet data members private This will assist in future refactoring.	2014-09-16 11:24:13 +03:00
Avi Kivity	5e2f1f0bc6	net: split out ip checksum routines into their own file	2014-09-16 10:26:44 +03:00
Avi Kivity	89ec8f2ae7	net: fix IP checksum overflow during reduction When reducing the checksum from a 32-bit or 64-bit intermediate, we can get an overflow after the first overflow handling step: 0000_8000_8000_ffff -> 10_ffff -> 1_000f -> 0010 Since we lacked the second step, we got an off-by-one in the checksum.	2014-09-14 16:58:27 +03:00
Avi Kivity	1fbe325f63	net: add a helper to allocate a header in an existing packet Use in IP and ethernet layers.	2014-09-02 23:29:43 +03:00
Avi Kivity	1396459085	net: integrate tcp into ipv4 Define the traits class used to communicate address types and pseudo header to tcp, and a few glue classes.	2014-09-02 20:39:12 +03:00
Avi Kivity	673dd21c8b	net: fix ip tx - checksum - total length - endianness were all wrong.	2014-09-02 20:34:19 +03:00
Avi Kivity	c6412f23fc	net: fix ip netmask checks	2014-09-02 20:33:49 +03:00
Avi Kivity	0a45d4d73b	net: implement IPv4 L3->l4 dispatching	2014-09-01 15:19:17 +03:00
Avi Kivity	b2b24031e9	net: generalize IP checksummer Allow it to checksum packets and fragments.	2014-09-01 15:17:27 +03:00
Avi Kivity	0fbda7c1ec	Fix IP checksum for odd lengths Since the data is in network byte order, we must pad the last word to the right.	2014-08-31 23:37:34 +03:00
Avi Kivity	c77f77ee3f	build: organize files into a directory structure	2014-08-31 21:29:13 +03:00

50 Commits
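
Two of the checksum fixes in this log (89ec8f2ae7 and 0fbda7c1ec) describe arithmetic that is easier to follow in code. The following is a minimal C++ sketch of the idea, not the code from those commits: the names fold_checksum and ip_checksum are hypothetical, and the sketch assumes the input bytes are already in network byte order. It shows the repeated end-around-carry reduction (a single folding step can itself overflow 16 bits) and the right-padding of a trailing odd byte.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: reduce a wide intermediate sum to 16 bits.
// One fold is not enough: 0x10ffff -> 0x1000f still exceeds 16 bits,
// so keep folding until no carry remains (0x1000f -> 0x0010).
inline uint16_t fold_checksum(uint64_t sum) {
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return static_cast<uint16_t>(sum);
}

// Hypothetical helper: Internet checksum over bytes in network order.
// The result is returned as a host-order integer value.
inline uint16_t ip_checksum(const uint8_t* data, size_t len) {
    uint64_t sum = 0;
    size_t i = 0;
    // Sum complete 16-bit big-endian words.
    for (; i + 1 < len; i += 2) {
        sum += (static_cast<uint64_t>(data[i]) << 8) | data[i + 1];
    }
    // Odd length: the last byte is the high half of a word whose
    // low half is zero padding, i.e. the word is padded on the right.
    if (i < len) {
        sum += static_cast<uint64_t>(data[i]) << 8;
    }
    return static_cast<uint16_t>(~fold_checksum(sum));
}
```

With only a single folding step, 0x10ffff would truncate to 0x000f rather than 0x0010, which is the off-by-one described in 89ec8f2ae7.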