It is based on tcp_client and works with our httpd server.
1) timer based, to run the test for 10 seconds
$ http_client --server 192.168.66.100:10000 --conn 100 --duration 10 --smp 2
========== http_client ============
Server: 192.168.66.100:10000
Connections: 100
Requests/connection: dynamic (timer based)
Requests on cpu 0: 33400
Requests on cpu 1: 33368
Total cpus: 2
Total requests: 66768
Total time: 10.011478
Requests/sec: 6669.145442
========== done ============
2) nr of reqs per connection based, to run the test with 100 connections
each has to run 1000 reqs
$ http_client --server 192.168.66.100:10000 --conn 100 --reqs 1000 --smp 2
========== http_client ============
Server: 192.168.66.100:10000
Connections: 100
Requests/connection: 1000
Requests on cpu 0: 50000
Requests on cpu 1: 50000
Total cpus: 2
Total requests: 100000
Total time: 15.002731
Requests/sec: 6665.453192
========== done ============
This patch is based on Shlomi's initial version.
Signed-off-by: Shlomi Livne <shlomi@cloudius-systems.com>
Signed-off-by: Asias He <asias@cloudius-systems.com>
Directory listing support, using subscription<sstring> to represent the
stream of file names produced by the directory lister running in parallel
with the directory consumer.
The current shared_ptr implementation is efficient, but does not support
polymorphic types.
Rename it in order to make room for a polymorphic shared_ptr.
Terminate a running memcached instance with a SIGTERM instead of a
SIGKILL to allow the process to close in a cleaner manner
Signed-off-by: Shlomi Livne <shlomi@cloudius-systems.com>
This patch introduce a logic to divide cpus between available hw queue
pairs. Each cpu with hw qp gets a set of cpus to distribute traffic
to. The algorithm doesn't take any topology considerations into account yet.
Instead of forward() deciding packet destination make it collect input
for RSS hash function depending on packet type. After data is collected
use toeplitz hash function to calculate packet's destination.
This patch adds new class distributed_device which is responsible for
initializing HW device and it is shared between all cpus. Old device
class responsibility becomes managing rx/tx queue pair and it is local
per cpu. Each cpu have to call distributed_device::init_local_queue() to
create its own device. The logic to distribute cpus between available
queues (in case there is no enough queues for each cpu) is in the
distributed_device currently and not really implemented yet, so only one
queue or queues == cpus scenarios are supported currently, but this can
be fixed later.
The plan is to rename "distributed_device" to "device" and "device"
to "queue_pair" in later patches.
It listens for requests on port 10000 and sends responses comprised of
three chunks of data in one packet. The chunk sizes are specified via
the --chunk-size argument.
The reqest can be anything, its content is ignored.
You can switch to equivalent copying version by passing --copy
argument.
Seastar's allocator has an assertion which checks that the memory
block is freed on the same CPU on which it was allocated. This is
reasonable because all allocations in seastar need to be CPU-local.
The problem is that boost libraries (program_options,
unit_testing_framework) make heavy use of lazily-allocated static
variables, for instance:
const variable_value&
variables_map::get(const std::string& name) const
{
static variable_value empty;
// ...
}
Such variable will be allocated in in the current threads but freed in
the main thread when exit handlers are executed.
This results in:
output_stream_test: core/memory.cc:415: void memory::cpu_pages::free(void*):
Assertion `((reinterpret_cast<uintptr_t>(ptr) >> cpu_id_shift) & 0xff) == cpu_id' failed.
This change works around the problem by forcing initialization in the
main thread.
See issue #10.
Currently each cpu creates network device as part of native networking
stack creation and all cpus create native networking stack independently,
which makes it impossible to use data initialized by one cpu in another
cpu's networking device initialization. For multiqueue devices often some
parts of an initialization have to be handled by one cpu and all other
cpus should wait for the first one before creating their network devices.
Even without multiqueue proxy devices should be created after master
device is created so that proxy device may get a pointer to the master
at creation time (existing code uses global per cpu device pointer and
assume that master device is created on cpu 0 to compensate for the lack
of ordering).
This patch makes it possible to delay native networking stack creation
until network device is created. It allows one cpu to be responsible
for creation of network devices on multiple cpus. Single queue device
initialize master device on one cpu and call other cpus with a pointer
to master device and its cpu id which are used in proxy device creation.
This removes the need for per cpu device pointer and "master on cpu 0"
assumption from the code since now master device and slave devices know
about each other and can communicate directly.
- Fixed the IP addresses swapping.
- Added cmdline parameters to choose between virtio and DPDK tests.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>