scylladb

Author	SHA1	Message	Date
Gleb Natapov	13c1324d45	net: provide some statistics via collectd Provide batching and overall send/received packet stats.	2015-01-08 17:41:26 +02:00
Gleb Natapov	8d4e6b832a	net: implement bulk sending interface for dpdk	2015-01-06 15:24:10 +02:00
Gleb Natapov	f0cdc47a3a	net: do not sleep while waiting for link in dpdk Use promise and seastar timers instead.	2014-12-29 13:06:10 +02:00
Vlad Zolotarov	db50b480a3	dpdk: check_port_link_status(): Cosmetic fix of the printouts. Add a space after the "Checking link status" to prevent it from merging with "done" if the link is up immediately. For instance this is going to be the case for a VF of a PF with already established link (e.g. on AWS). Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2014-12-23 16:55:50 +02:00
Vlad Zolotarov	1a6474d6cc	dpdk: added the asserts to check the assumptions regarding CSUM features We assume that the Rx IPv4, TCP and UDP checksum offload features are either all supported or all unsupported. The same holds for the Tx UDP and TCP checksum offload. Add an assert that checks this assumption. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2014-12-23 16:55:44 +02:00
Vlad Zolotarov	38781639ef	dpdk: Use all available parser options for RSS. Don't limit ourselves to just IPV4, TCP and UDP even if it's all we currently care about. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2014-12-23 16:55:38 +02:00
Vlad Zolotarov	02dd7a3e24	packet: Change the type of offload_info.vlan_tci to std::experimental::optional Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2014-12-23 16:51:05 +02:00
Vlad Zolotarov	c9e0e7aff8	dpdk: Set RSS mode: enable RSS if seastar is configured with more than 1 CPU. Even if port has a single queue we still want the RSS feature to be available in order to make HW calculate RSS hash for us. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2014-12-23 16:50:28 +02:00
Vlad Zolotarov	15e432715a	dpdk: Use DPDK provided default configurations for Rx and Tx queues parameters. DPDK 1.8 provides per-device default Tx and Rx queues configurations in the output of rte_eth_dev_info_get(). Use them instead of ixgbe tuned hardcoded values. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2014-12-23 16:48:31 +02:00
Vlad Zolotarov	51bb90a397	dpdk: Don't print the MAC address from the hw_address() method. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2014-12-22 17:37:18 +02:00
Vlad Zolotarov	2b4f9f69f8	dpdk: Make the port initialization stages more pronounced - Rename: init_port() -> init_port_start(). - Added a function init_port_fini() that has a code originally found flat in init_local_queue(). - Moved the link state check to init_port_fini() since the link state should be checked after the port has been started. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2014-12-22 17:37:13 +02:00
Vlad Zolotarov	59403f0774	dpdk: First version that supports both 1.7.x and 1.8.x (current git master) DPDK versions. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2014-12-22 17:37:05 +02:00
Vlad Zolotarov	ddf239a943	dpdk: Move the scattered DPDK EAL initialization into the dpdk::eal. - Move the smp::dpdk_eal_init() code into the dpdk::eal::init() where it belongs. - Removed the unused "opts" parameter of dpdk::dpdk_device constructor - all its usage has been moved to dpdk::eal::init(). - Cleanup in reactor.cc: #if HAVE_DPDK -> #ifdef HAVE_DPDK; since we give a -DHAVE_DPDK option to a compiler. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2014-12-22 17:36:49 +02:00
Vlad Zolotarov	7ec062e222	dpdk: Move dpdk_eal class into a separate file - Make its methods static. - Rename dpdk::dpdk_eal -> dpdk::eal Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2014-12-22 17:36:42 +02:00
Gleb Natapov	b958a44304	smp: create seastar threads using DPDK when compiled with DPDK support DPDK initialization creates its own threads and assumes that application uses them, otherwise things do not work correctly (rte_lcore_id() returns incorrect value for instance). This patch uses DPDK threads to run seastar main loop making DPDK APIs work as expected.	2014-12-18 14:43:37 +02:00
Gleb Natapov	c8189157ed	net: use RSS hash key calculated by HW if available Some (all?) RSS capable HW provides us with a hash that was used to select rx queue the packet was delivered to. If such hash is available it is better to use it to forward packet instead of calculating hash ourselves and suffering cache misses.	2014-12-16 10:53:41 +02:00
Gleb Natapov	d796487976	net: use our RSS key instead of letting DPDK select one	2014-12-16 10:53:41 +02:00
Gleb Natapov	d8ddaeb104	net: forward reassembled ip packet to correct queue To figure out a cpu that should handle reassembled TCP packet RSS redirection table has to be consulted.	2014-12-16 10:53:41 +02:00
Gleb Natapov	64adef7def	net: copy RSS redirection table from a device We will need it in later patch.	2014-12-16 10:53:41 +02:00
Gleb Natapov	fbef83beb0	net: support for num of cpus > num of queues This patch introduces logic to divide cpus between available hw queue pairs. Each cpu with hw qp gets a set of cpus to distribute traffic to. The algorithm doesn't take any topology considerations into account yet.	2014-12-16 10:53:41 +02:00
Gleb Natapov	da53dcff80	net: simplify calculation of number of queues	2014-12-11 13:06:38 +02:00
Gleb Natapov	649210b5b6	net: rename net::distributed_device to net::device	2014-12-11 13:06:32 +02:00
Gleb Natapov	0e70ba69cf	net: rename net::device to net::qp	2014-12-11 13:06:27 +02:00
Nadav Har'El	3d874892a7	dpdk: enable transmit-side checksumming offload This patch uses the NIC's capability to calculate in hardware the IP, TCP and UDP checksums on outgoing packets, instead of us doing this on the sending CPU. This can save us quite a bit of calculations (especially for the TCP/UDP checksum of full-sized packets), and avoid cache pollution on the CPU when sending cold data. On my setup this patch improves the performance of a single-cpu memcached by 6%. Together with the recent patch for receive-side checksum offloading, the total improvement is 10%. This patch is somewhat complicated by the fact we have so many different combinations of checksum-offloading capabilities; while virtio can only offload layer-4 checksumming (tcp/udp), dpdk lets us offload both ip and layer-4 checksum. Moreover, some packets are just IP but not TCP/UDP (e.g., ICMP), and some packets are not even IP (e.g., ARP), so this patch modifies a few of the hardware-features flags and the per-packet offload-information flags to fit our new needs. Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>	2014-12-10 18:05:02 +02:00
Gleb Natapov	8bb82512a1	net: enable RSS for V4 IP/UDP/TCP	2014-12-09 18:55:19 +02:00
Gleb Natapov	73f6d943e1	net: separate device initialization from queues initialization This patch adds new class distributed_device which is responsible for initializing HW device and it is shared between all cpus. Old device class responsibility becomes managing rx/tx queue pair and it is local per cpu. Each cpu has to call distributed_device::init_local_queue() to create its own device. The logic to distribute cpus between available queues (in case there are not enough queues for each cpu) is in the distributed_device currently and not really implemented yet, so only one queue or queues == cpus scenarios are supported currently, but this can be fixed later. The plan is to rename "distributed_device" to "device" and "device" to "queue_pair" in later patches.	2014-12-09 18:55:14 +02:00
Nadav Har'El	3f2ea82e6d	dpdk: rx checksum offloading If the card supports this (and usually, it does), enable rx checksum offloading by the card, and avoid calculating the checksums ourselves. With rx checksum offloading, the card checks in incoming packets the IP header checksum and the L4 (TCP or UDP) checksum, and gives us a flag when one of them is wrong, meaning that we do not need to do these calculations ourselves. This patch improves memcached performance on my setup by almost 3%. Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>	2014-12-08 20:41:31 +02:00
Avi Kivity	f4d7bd7e00	reactor: register pollers using a RAII class Avoids leaking a poller.	2014-12-07 17:36:44 +02:00
Vlad Zolotarov	5bc89b974a	dpdk: First proper offload features initialization - Query the port for its caps. - Properly adjust the queue numbers according to the caps. - Enable RSS only if the final queues number is greater than 1. - Enable Rx VLAN stripping. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2014-12-07 17:32:36 +02:00
Vlad Zolotarov	2d10018870	dpdk: separate the EAL initialization from port initialization - Create a new class dpdk_eal that initializes DPDK EAL. - Get rid of portmask crap and provide a port index to a dpdk::net_device constructor. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2014-12-07 17:31:12 +02:00
Avi Kivity	3e4842a2a1	Merge branch 'asias/ip' of github.com:cloudius-systems/seastar-dev IP fragment reassembly from Asias.	2014-12-03 16:03:18 +02:00
Gleb Natapov	4d3b6497ea	reactor: rework poll infrastructure Move idle state management out from smp poller back to generic code. Each poller returns whether it did any useful work and generic code decides if it should go idle based on that. If a poller requires constant polling it should always return true.	2014-12-03 14:37:33 +02:00
Gleb Natapov	7dbc333da6	core: Allow forwarding from/to any cpu	2014-12-03 17:47:29 +08:00
Gleb Natapov	bf46f9c948	net: Change how networking devices are created Currently each cpu creates network device as part of native networking stack creation and all cpus create native networking stack independently, which makes it impossible to use data initialized by one cpu in another cpu's networking device initialization. For multiqueue devices often some parts of an initialization have to be handled by one cpu and all other cpus should wait for the first one before creating their network devices. Even without multiqueue proxy devices should be created after master device is created so that proxy device may get a pointer to the master at creation time (existing code uses global per cpu device pointer and assume that master device is created on cpu 0 to compensate for the lack of ordering). This patch makes it possible to delay native networking stack creation until network device is created. It allows one cpu to be responsible for creation of network devices on multiple cpus. Single queue device initialize master device on one cpu and call other cpus with a pointer to master device and its cpu id which are used in proxy device creation. This removes the need for per cpu device pointer and "master on cpu 0" assumption from the code since now master device and slave devices know about each other and can communicate directly.	2014-11-30 18:10:08 +02:00
Vlad Zolotarov	5cd984b5cc	dpdk: Initial commit - Currently only a single port and a single queue are supported. - All DPDK EAL configuration is hard-coded in the dpdk_net_device constructor instead of coming from the app parameters. - No offload features are enabled. - Tx: will spin in the dpdk_net_device::send() till there is a place in the HW ring to place a current packet. - Tx: copy data from the `packet` frags into the rte_mbuf's data. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2014-11-30 12:13:52 +02:00

35 Commits
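
Commit fbef83beb0 above describes dividing cpus between the available hw queue pairs, with each queue-owning cpu getting a set of cpus to distribute traffic to. The log does not show the algorithm itself, so the following is only a minimal sketch of one way such a division could look, assuming a simple round-robin assignment and assuming the queue-owning cpu picks a target from its set using the packet's RSS hash. The names `divide_cpus` and `forward_target` are hypothetical and are not taken from the Seastar source.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not Seastar's actual code): assign every cpu to one
// of n_queues hardware queue pairs in round-robin order. Queue q is assumed
// to be owned by cpu q; groups[q] lists the cpus that queue q serves.
// Like the commit, this ignores topology entirely.
std::vector<std::vector<unsigned>> divide_cpus(unsigned n_cpus, unsigned n_queues) {
    assert(n_queues > 0 && n_queues <= n_cpus);
    std::vector<std::vector<unsigned>> groups(n_queues);
    for (unsigned cpu = 0; cpu < n_cpus; ++cpu) {
        groups[cpu % n_queues].push_back(cpu);
    }
    return groups;
}

// Hypothetical helper: pick the cpu that should process a packet that
// arrived on a queue, using the packet's RSS hash to spread the load
// over the cpus in that queue's group.
unsigned forward_target(const std::vector<unsigned>& group, unsigned rss_hash) {
    assert(!group.empty());
    return group[rss_hash % group.size()];
}
```

With 5 cpus and 2 queues, `divide_cpus(5, 2)` yields the groups {0, 2, 4} and {1, 3}, so the cpu owning queue 0 forwards each received packet to cpu 0, 2 or 4 depending on its hash.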