1) Make the --dpdk-pmd parameter a flag instead of a (key, value) pair.
2) Fall back to the default DPDK hugetlbfs settings when --dpdk-pmd is set
and --hugepages is not given.
This makes for a friendlier user experience in general, and in particular when
one doesn't want to provide a --hugepages parameter.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
- Move the smp::dpdk_eal_init() code into dpdk::eal::init(), where it belongs.
- Remove the unused "opts" parameter of the dpdk::dpdk_device constructor; all its
uses have been moved to dpdk::eal::init().
- Clean up reactor.cc: #if HAVE_DPDK -> #ifdef HAVE_DPDK, since we pass the
-DHAVE_DPDK option to the compiler.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
This patchset adds support for an asynchronous thrift service serving the
Cassandra interface.
The implementation uses the "continuation object style" thrift code generation
option, which can be readily adapted to the promise/future interface.
With this, one can connect with a client and see this:
$ ./cassandra-stress insert -node localhost
Exception in thread "main" java.lang.RuntimeException: org.apache.thrift.TApplicationException: sorry, not implemented
at org.apache.cassandra.stress.settings.StressSettings.getRawThriftClient(StressSettings.java:139)
at org.apache.cassandra.stress.settings.StressSettings.getRawThriftClient(StressSettings.java:109)
at org.apache.cassandra.stress.settings.SettingsSchema.createKeySpacesThrift(SettingsSchema.java:112)
at org.apache.cassandra.stress.settings.SettingsSchema.createKeySpaces(SettingsSchema.java:60)
at org.apache.cassandra.stress.settings.StressSettings.maybeCreateKeyspaces(StressSettings.java:200)
at org.apache.cassandra.stress.StressAction.run(StressAction.java:57)
at org.apache.cassandra.stress.Stress.main(Stress.java:109)
Caused by: org.apache.thrift.TApplicationException: sorry, not implemented
at org.apache.thrift.TApplicationException.read(TApplicationException.java:111)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71)
at org.apache.cassandra.thrift.Cassandra$Client.recv_set_cql_version(Cassandra.java:1896)
at org.apache.cassandra.thrift.Cassandra$Client.set_cql_version(Cassandra.java:1883)
at org.apache.cassandra.stress.settings.StressSettings.getRawThriftClient(StressSettings.java:128)
... 6 more
As thrift does not support pipelining, the server is very simple. It
implements the thrift framed transport, where each message is preceded
by a four-byte message size header.
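The framing described above can be sketched roughly as follows. This is a standalone illustration, not the code from this patchset: frame_message() and try_unframe() are hypothetical helper names, and the header is assumed to be a big-endian (network order) 32-bit length, as used by thrift's framed transport.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Prepend a four-byte big-endian length header to a serialized message.
// (Hypothetical helper, for illustration only.)
std::string frame_message(const std::string& payload) {
    assert(payload.size() <= UINT32_MAX);
    const uint32_t n = static_cast<uint32_t>(payload.size());
    std::string out;
    out.reserve(4 + payload.size());
    out.push_back(static_cast<char>((n >> 24) & 0xff));
    out.push_back(static_cast<char>((n >> 16) & 0xff));
    out.push_back(static_cast<char>((n >> 8) & 0xff));
    out.push_back(static_cast<char>(n & 0xff));
    out += payload;
    return out;
}

// Try to extract one complete frame from the front of `buf`.
// Returns false (leaving `buf` untouched) if the header or the body is
// still incomplete; otherwise stores the body in `payload`, removes the
// frame from `buf` and returns true.
bool try_unframe(std::string& buf, std::string& payload) {
    if (buf.size() < 4) {
        return false;
    }
    uint32_t n = 0;
    for (int i = 0; i < 4; ++i) {
        n = (n << 8) | static_cast<uint8_t>(buf[i]);
    }
    if (buf.size() - 4 < n) {
        return false;
    }
    payload = buf.substr(4, n);
    buf.erase(0, 4 + static_cast<std::size_t>(n));
    return true;
}
```

Since there is no pipelining, a server built this way only ever needs to buffer one frame at a time: read until try_unframe() succeeds, process the message, send the framed reply.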
Where possible, throw an exception instead of returning an uninitialized
value.
Where not possible (if the method does not throw), return a "dummy" string.
Support adding a thrift file as a source. Since thrift generates multiple
output files, whose names cannot be trivially derived from the source file
name, we have to specify it as an object containing the source file name
and any additional information needed to derive the generated file names
(in this case, the generated thrift services).
The generic thrift headers bring in a #define (yuch) named VERSION, while the
Cassandra interface also defines a symbol with the same name.
Rename the symbol to avoid a compile conflict.
DPDK initialization creates its own threads and assumes that the application
uses them; otherwise things do not work correctly (rte_lcore_id() returns
an incorrect value, for instance). This patch uses the DPDK threads to
run the seastar main loop, making DPDK APIs work as expected.
register_poller() (and unregister_poller()) adjust _pollers, but they may be
called while it is being iterated, and since std::vector<> mutations invalidate
iterators, corruption occurs.
Fix by deferring manipulation of _pollers into a task, which is executed at
a time when _pollers is not being touched.
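A minimal sketch of the deferral idea, using a toy reactor rather than the real one. toy_reactor, its members and run_one_iteration() are made-up names for illustration; the real reactor schedules an ordinary task instead of keeping a separate pending list.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

class toy_reactor {
public:
    using poll_fn = std::function<bool()>;

    // Neither function touches _pollers directly; both only queue a task.
    void register_poller(poll_fn* p) {
        _pending_tasks.push_back([this, p] { _pollers.push_back(p); });
    }
    void unregister_poller(poll_fn* p) {
        _pending_tasks.push_back([this, p] {
            _pollers.erase(std::remove(_pollers.begin(), _pollers.end(), p),
                           _pollers.end());
        });
    }

    void run_one_iteration() {
        // Pollers may call register_poller()/unregister_poller() here;
        // that is safe because _pollers itself is not modified.
        for (poll_fn* p : _pollers) {
            (*p)();
        }
        // The iteration is over, so the vector can now be mutated.
        std::vector<std::function<void()>> tasks;
        tasks.swap(_pending_tasks);
        for (auto& t : tasks) {
            t();
        }
    }

    std::size_t poller_count() const { return _pollers.size(); }

private:
    std::vector<poll_fn*> _pollers;
    std::vector<std::function<void()>> _pending_tasks;
};
```

The cost of this approach is that a newly registered poller is first polled one iteration later, which is harmless for pollers.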
Currently, reactor::_pollers holds reactor::poller pointers; since these
are movable types, it's hard to maintain _pollers, as the pointers can keep
changing.
Refactor poller so that _pollers points at an internal type, which does not
move when a reactor::poller moves. This requires getting rid of
std::function, since it lacks a comparison operator.
When we have an object acting as a resource guard for memory, we can convert
it into a deleter using
make_deleter([obj = std::move(obj)] {})
Introduce a simpler interface
make_object_deleter(std::move(obj))
for doing the same thing.
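The relationship between the two interfaces can be sketched like this. toy_deleter and run_on_destroy are simplified stand-ins invented for this example (the real deleter type is not reproduced here); only the make_deleter()/make_object_deleter() shapes come from the description above.

```cpp
#include <memory>
#include <utility>

// Hypothetical stand-in for the real deleter: it owns a type-erased
// object and destroys it when the last copy of the deleter goes away.
class toy_deleter {
public:
    toy_deleter() = default;
    explicit toy_deleter(std::shared_ptr<void> held) : _held(std::move(held)) {}
private:
    std::shared_ptr<void> _held;
};

// Helper that runs a callable when it is destroyed.
template <typename Func>
struct run_on_destroy {
    Func func;
    ~run_on_destroy() { func(); }
};

// Build a deleter that invokes `func` when it is destroyed.
template <typename Func>
toy_deleter make_deleter(Func func) {
    return toy_deleter(std::shared_ptr<void>(
        new run_on_destroy<Func>{std::move(func)}));
}

// Build a deleter that keeps `obj` alive and destroys it together with
// the deleter. The lambda body is empty: destroying the captured object
// is the whole point.
template <typename Object>
toy_deleter make_object_deleter(Object obj) {
    return make_deleter([obj = std::move(obj)] {});
}
```

With this, a caller holding some guard object g can write make_object_deleter(std::move(g)) and does not need to spell out the empty lambda.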
Some (all?) RSS-capable HW provides us with the hash that was used to
select the rx queue the packet was delivered to. If such a hash is available,
it is better to use it to forward the packet instead of calculating the hash
ourselves and suffering cache misses.
This patch introduces logic to divide cpus between the available hw queue
pairs. Each cpu with a hw qp gets a set of cpus to distribute traffic
to. The algorithm doesn't take any topology considerations into account yet.
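One topology-unaware way to do such a division is a plain round-robin split, sketched below. This is only an assumption about what the division could look like, not the algorithm from the patch; divide_cpus() is a hypothetical name, and the sketch assumes that cpus 0..nqueues-1 are the ones owning a hw queue pair.

```cpp
#include <cassert>
#include <vector>

// For each cpu that owns a hw queue pair, return the set of cpus
// (including itself) that it distributes traffic to.
std::vector<std::vector<unsigned>> divide_cpus(unsigned ncpus, unsigned nqueues) {
    assert(nqueues > 0 && nqueues <= ncpus);
    std::vector<std::vector<unsigned>> targets(nqueues);
    for (unsigned cpu = 0; cpu < ncpus; ++cpu) {
        // Round-robin: cpu N is served by the queue owner N % nqueues.
        targets[cpu % nqueues].push_back(cpu);
    }
    return targets;
}
```

With 5 cpus and 2 queue pairs this gives cpu 0 the set {0, 2, 4} and cpu 1 the set {1, 3}.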
Instead of having forward() decide the packet destination, make it collect the
input for the RSS hash function depending on the packet type. After the data is
collected, use the Toeplitz hash function to calculate the packet's destination.
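For reference, a Toeplitz hash over a collected input can be sketched as below. The function names are hypothetical and this is not the code from the patch. The IPv4/TCP input layout (source address, destination address, source port, destination port, all in network byte order) follows the usual RSS convention and is an assumption here, as is picking the destination with a simple modulo.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toeplitz hash: for every set bit of the input (most significant bit
// first), XOR in the 32-bit window of the key that starts at that bit.
uint32_t toeplitz_hash(const std::vector<uint8_t>& key,
                       const std::vector<uint8_t>& data) {
    assert(key.size() >= data.size() + 4);
    uint32_t window = (uint32_t(key[0]) << 24) | (uint32_t(key[1]) << 16) |
                      (uint32_t(key[2]) << 8) | uint32_t(key[3]);
    uint32_t hash = 0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        const uint8_t next = key[i + 4];  // key byte that slides into the window
        for (int bit = 7; bit >= 0; --bit) {
            if (data[i] & (1u << bit)) {
                hash ^= window;
            }
            window = (window << 1) | ((next >> bit) & 1u);
        }
    }
    return hash;
}

// Collect the hash input for an IPv4/TCP packet (assumed layout, see above).
std::vector<uint8_t> rss_input_ipv4_tcp(uint32_t src_ip, uint32_t dst_ip,
                                        uint16_t src_port, uint16_t dst_port) {
    std::vector<uint8_t> in;
    for (uint32_t addr : {src_ip, dst_ip}) {
        for (int shift = 24; shift >= 0; shift -= 8) {
            in.push_back(static_cast<uint8_t>(addr >> shift));
        }
    }
    for (uint16_t port : {src_port, dst_port}) {
        in.push_back(static_cast<uint8_t>(port >> 8));
        in.push_back(static_cast<uint8_t>(port & 0xff));
    }
    return in;
}

// Map a hash value to a destination cpu (simple modulo, for illustration).
unsigned destination_cpu(uint32_t hash, unsigned ncpus) {
    assert(ncpus > 0);
    return hash % ncpus;
}
```

Other packet types would only differ in which fields are collected; the hash and the cpu selection stay the same.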
Instead of returning a special value from forward() to broadcast an arp reply,
call arp.learn() on all cpus at the arp protocol level. The ability of
forward() to return a special value will be removed by later patches.
Currently dhcp assumes that cpu 0 gets all the packets and redistributes
them by itself. With multiqueue this is not necessarily the case, so the
current trick of disabling forwarding by installing a special dhcp forward()
function will not work. Rework it by installing a packet filter on all
cpus before running dhcp and forwarding all dhcp packets to cpu 0.
From Asias:
"Add a low resolution clock source in addition to what std::chrono provides.
With it we can reduce the expensive std::chrono::high_resolution_clock::now()
calls."
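The general shape of such a clock can be sketched as follows: now() only reads a cached value, and something else (for instance a periodic timer) refreshes the cache. toy_lowres_clock, its millisecond granularity and the update() hook are assumptions made for this illustration, not the interface added by the patchset.

```cpp
#include <atomic>
#include <chrono>

class toy_lowres_clock {
public:
    using duration = std::chrono::milliseconds;
    using rep = duration::rep;
    using period = duration::period;
    using time_point = std::chrono::time_point<toy_lowres_clock, duration>;
    static constexpr bool is_steady = true;

    // Cheap: a single relaxed atomic load, no system call.
    static time_point now() {
        return time_point(duration(cached().load(std::memory_order_relaxed)));
    }

    // Expensive part, meant to be called periodically (e.g. from a timer).
    static void update() {
        auto since_epoch = std::chrono::steady_clock::now().time_since_epoch();
        cached().store(std::chrono::duration_cast<duration>(since_epoch).count(),
                       std::memory_order_relaxed);
    }

private:
    static std::atomic<rep>& cached() {
        static std::atomic<rep> value{0};
        return value;
    }
};
```

The trade-off is accuracy: now() can be stale by up to one update period, which is acceptable for things like coarse timeouts.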
We accidentally look at _poll mode in another cpu's cache, as part of
the peer->idle() call.
Fix by looking at our own _poll variable first; they should all be the same.
Futures are great for complicated asynchronous operations, but for a
synchronous operation like destroying a packet after transmit, or
converting a buffer to a packet during receive, they're overkill.
This patchset fixes those two cases in virtio, in which futures
are used as an abstraction layer between vring and the transmit/receive
queues, by converting vring into a template, so that the completion function
can be adjusted for the transmit or receive case during compile time instead
of at run time.
10% improvement on httpd with --smp 1, >20% with --smp 3.
Move completion handling (destroy packet, adjust descriptors count) to
a completion function rather than a future. Reduces allocations and tasks
executed.
Currently vring request completions are handled by fulfilling a promise
contained in the request. While promises are very flexible, this comes
at a cost (allocating and executing a task), and this flexibility is unneeded
when request handling is very regular (such as in virtio-net rx and tx
completion handling).
Make vring more flexible by allowing the completion function to be specified
as a template parameter. No changes to the actual users - they now specify
the completion function as fulfilling the same promise as vring previously
did.
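A rough sketch of a ring whose completion handling is chosen at compile time. toy_vring, tx_completion and rx_completion are invented for this example and leave out everything about real descriptors; they only show how the completion becomes a template parameter that is called directly, without allocating or scheduling a task.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Toy ring: the completion function is a template parameter, so the call
// in process_completions() is resolved (and can be inlined) at compile time.
template <typename Completion>
class toy_vring {
public:
    explicit toy_vring(Completion c) : _complete(std::move(c)) {}

    // Simulates the host marking a buffer of `len` bytes as used.
    void host_completed(std::size_t len) { _used.push_back(len); }

    // Invoke the completion for every used buffer; returns how many ran.
    std::size_t process_completions() {
        const std::size_t n = _used.size();
        for (std::size_t len : _used) {
            _complete(len);
        }
        _used.clear();
        return n;
    }

private:
    Completion _complete;
    std::vector<std::size_t> _used;
};

// tx: the packet is done, just give the descriptor back.
struct tx_completion {
    std::size_t* free_descriptors;
    void operator()(std::size_t) const { ++*free_descriptors; }
};

// rx: hand the received length on to whoever builds the packet.
struct rx_completion {
    std::vector<std::size_t>* received;
    void operator()(std::size_t len) const { received->push_back(len); }
};
```

A user that still wants the old behaviour can pass a completion type whose operator() fulfils a promise, which is what keeps existing users unchanged.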