This patch adds new class distributed_device which is responsible for
initializing HW device and it is shared between all cpus. Old device
class responsibility becomes managing rx/tx queue pair and it is local
per cpu. Each cpu have to call distributed_device::init_local_queue() to
create its own device. The logic to distribute cpus between available
queues (in case there is no enough queues for each cpu) is in the
distributed_device currently and not really implemented yet, so only one
queue or queues == cpus scenarios are supported currently, but this can
be fixed later.
The plan is to rename "distributed_device" to "device" and "device"
to "queue_pair" in later patches.
If start promise on initial cpu is signaled before other cpus have
networking stack constructed collected initialization crashes since it
tries to create a UDP socket on all available cpus when initial one is
ready.
A future that does not carry any data (future<>) and its sibling (promise<>)
are heavily used in the code. We can optimize them by overlaying the
future's payload, which in this case can only be an std::exception_ptr,
with the future state, as a pointer and an enum have disjoint values.
This of course depends on std::exception_ptr being implemented as a pointer,
but as it happens, it is.
With this, sizeof(future<>) is reduced from 24 bytes to 16 bytes.
If the card supports this (and usually, it does), enable rx checksum
offloading by the card, and avoid calculating the checksums ourselves.
With rx checksum offloading, the card checks in incoming packets the
IP header checksum and the L4 (TCP or UDP) checksum, and gives us a
flag when one of them is wrong, meaning that we do not need to do these
calculations ourselves.
This patch improves memcached performance on my setup by almost 3%.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
- Query the port for its caps.
- Properly adjust the queue numbers according to the caps.
- Enable RSS only if the final queues number is greater than 1.
- Enable Rx VLAN stripping.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
- Create a new class dpdk_eal that initializes DPDK EAL.
- Get rid of portmask crap and provide a port index to a dpdk::net_device
constructor.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
ipv4::handle_on_cpu() did not properly convert from network byte order, so
it saw any packets with DF=1 as fragmented.
Fix by applying the proper conversion.
It listens for requests on port 10000 and sends responses comprised of
three chunks of data in one packet. The chunk sizes are specified via
the --chunk-size argument.
The reqest can be anything, its content is ignored.
You can switch to equivalent copying version by passing --copy
argument.
Releases owenrship of the data and gives it away as
temporary_buffer. This way we can avoid allocation when putting rvalue
sstring if it's already using external storage. Except we need to
allocate a deleter which uses delete[], but this can be fixed later.
Seastar's allocator has an assertion which checks that the memory
block is freed on the same CPU on which it was allocated. This is
reasonable because all allocations in seastar need to be CPU-local.
The problem is that boost libraries (program_options,
unit_testing_framework) make heavy use of lazily-allocated static
variables, for instance:
const variable_value&
variables_map::get(const std::string& name) const
{
static variable_value empty;
// ...
}
Such variable will be allocated in in the current threads but freed in
the main thread when exit handlers are executed.
This results in:
output_stream_test: core/memory.cc:415: void memory::cpu_pages::free(void*):
Assertion `((reinterpret_cast<uintptr_t>(ptr) >> cpu_id_shift) & 0xff) == cpu_id' failed.
This change works around the problem by forcing initialization in the
main thread.
See issue #10.
This function belongs to a group of functions for associating futures
with time points. Currently there's only now(), which servers as a shorthand
for make_ready_future<>().
If a TCP or UDP IP datagram is fragmented, only the first fragment will
contain the port information. When a fragment without port information
is received, we have no idea which "stream" this fragment belongs to,
thus we no idea how to forward this packet.
To solve this problem, we use "forward twice" method. When IP datagram
which needs fragmentation is received, we forward it using the
frag_id(src_ip, dst_ip, identification, protocol) hash. When all the
fragments are received, we forward it using the connection_id(src_ip,
src_port, dst_ip, dst_port) hash.
Move idle state management out from smp poller back to generic code. Each
poller returns if it did any useful work and generic code decided if it
should go idle based on that. If a poller requires constant polling it
should always return true.