Commit Graph

53948 Commits

Author SHA1 Message Date
Tomasz Grabiec
f556172619 temporary_buffer: avoid malloc() for an empty buffer 2014-12-03 13:15:09 +01:00
Tomasz Grabiec
1c49669f59 temporary_buffer: introduce operator bool()
It's used as a test for emptiness by convention.

Allows for things like:

  if (buf) {
     // not empty
  }
2014-12-03 13:15:09 +01:00
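A minimal sketch of the emptiness convention described in the two temporary_buffer commits above. The class name `tiny_buffer` and its members are hypothetical stand-ins for illustration, not Seastar's actual temporary_buffer:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for temporary_buffer; not Seastar's actual class.
class tiny_buffer {
    char* _data = nullptr;
    std::size_t _size = 0;
public:
    // An empty buffer owns no memory, so no malloc() is needed.
    tiny_buffer() = default;

    // _data is declared before _size, so it is initialized first.
    explicit tiny_buffer(std::size_t n)
        : _data(n ? static_cast<char*>(std::malloc(n)) : nullptr)
        , _size(_data ? n : 0) {}

    tiny_buffer(const tiny_buffer&) = delete;
    tiny_buffer& operator=(const tiny_buffer&) = delete;

    ~tiny_buffer() { std::free(_data); }   // free(nullptr) is a no-op

    std::size_t size() const { return _size; }

    char& operator[](std::size_t i) {
        assert(i < _size);
        return _data[i];
    }

    // By convention, "true" means "not empty".
    explicit operator bool() const { return _size != 0; }
};
```

Marking the conversion `explicit` still allows `if (buf)`, but keeps the buffer from silently converting to an integer in arithmetic.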
Tomasz Grabiec
cbe6169d36 test.py: speed up allocator test when running tests in fast mode 2014-12-03 13:15:09 +01:00
Tomasz Grabiec
76a8908b21 virtio: fix indentation 2014-12-03 13:15:09 +01:00
Asias He
2702af5e7d net: Add helper packet_merger
This can be used for both TCP out-of-order and IP fragmentation merging.
2014-12-03 17:47:30 +08:00
Asias He
8335787268 net: Expose interface::forward
This can be used with ipv4 fragmentation.
2014-12-03 17:47:29 +08:00
Asias He
7ca33fdd72 ip: Add helper for fragmentation 2014-12-03 17:47:29 +08:00
Gleb Natapov
7dbc333da6 core: Allow forwarding from/to any cpu 2014-12-03 17:47:29 +08:00
Asias He
6c097fe2e9 tests: Make udp_server SMP aware 2014-12-03 11:03:00 +02:00
Avi Kivity
1e572a3248 app-template: add missing app-template.cc 2014-12-01 18:01:12 +02:00
Nadav Har'El
8827eb3b27 Clean up link line with DPDK (v2)
The command line for linking with DPDK's libraries looked like a cross between
a random character generator and black magic. Reading a bit on the DPDK
mailing list, it turns out there is method in this madness (flawed method,
but method nonetheless):

1. Instead of using "-l..." they used "-Wl,-l..." everywhere. Turns out
   they did this ugliness to "hide" this option from libtool.

   We don't use libtool, and don't need to hide anything from it.

2. They used "--start-group ... --end-group" to avoid having to figure
   out the right link order.

   It was easy to figure out the right link order and avoid this option.

3. They used "--whole-archive" on all the DPDK libraries. Unfortunately,
   this option *is* needed, because the way DPDK is written, it is not
   suited to be compiled into a (non-shared) library: Each of the DPDK
   drivers ("librte_pmd_*") has a constructor function which needs to
   run to register itself. This works fine with shared libraries (whose
   constructors are run on load) but with a ".a" library, the whole
   library is left out because nothing from the outside refers to any
   of its symbols.

   So what we should do is to use --whole-archive only on the PMD drivers,
   and all will be fully compiled into the generated program. The rest of
   the DPDK libraries will be linked normally, and hopefully because we
   don't use large parts of DPDK, big chunks will not be compiled in.

   If we don't add this "--whole-archive", none of the drivers will be
   compiled into the program, the initialization will not be able to
   find any driver, and just complain there are no ethernet ports.

After this patch, Seastar with DPDK still compiles, and runs.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Reviewed-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2014-12-01 18:00:14 +02:00
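The self-registration pattern that makes "--whole-archive" necessary (point 3 of the commit above) can be sketched in a single file. All names here (`driver_registry`, `driver_registrar`, `example_pmd`) are hypothetical and are not DPDK's real symbols:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical registry, standing in for the list of known drivers.
std::vector<std::string>& driver_registry() {
    static std::vector<std::string> drivers;   // constructed on first use
    return drivers;
}

// A driver registers itself from the constructor of a static object.
struct driver_registrar {
    explicit driver_registrar(const char* name) {
        assert(name != nullptr);
        driver_registry().push_back(name);
    }
};

// In a real build this object would live in the driver's own archive member.
// Nothing outside refers to it, so when it sits in a ".a" library the linker
// has no reason to pull that member in, and the constructor never runs.
static driver_registrar example_pmd_registrar{"example_pmd"};

// Lookup used by initialization code to find a driver by name.
bool driver_available(const std::string& name) {
    for (const auto& d : driver_registry()) {
        if (d == name) {
            return true;
        }
    }
    return false;
}
```

In this single translation unit the registrar always runs before main(). The failure mode only appears once the registrar moves into a static library, which is why GNU ld's "--whole-archive" (closed with "--no-whole-archive") is wrapped around just the driver archives.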
Avi Kivity
6b5973af70 app-template: don't alias boost::program_options as bpo in a header file
We only have one global namespace, let's work together to keep it free
of pollution.
2014-12-01 17:56:34 +02:00
Avi Kivity
78691fc72f app-template: move to a .cc file
Reduce compile loads.
2014-12-01 17:48:18 +02:00
Avi Kivity
1820c8eaf6 blkdiscard_test: add missing include
For keep_doing().
2014-12-01 17:47:57 +02:00
Avi Kivity
e1397038d4 future-util.hh: add missing include
'task_quota' needs reactor.hh
2014-12-01 17:47:28 +02:00
Avi Kivity
256d1823c6 app-template: warn on debug mode 2014-12-01 17:33:47 +02:00
Avi Kivity
7619c01941 Merge branch 'flashcache' of github.com:cloudius-systems/seastar-dev
Flashcache fixes from Raphael.
2014-12-01 14:53:22 +02:00
Avi Kivity
be0ae4f5dc memory: Un-hide standard allocator functions
With -fvisibility=hidden, all executable symbols are hidden from shared
objects, allowing more optimizations (especially with -flto).  However, hiding
the allocator symbols means that memory allocated in the executable cannot
be freed in a library, since they will use different allocators.

Fix by exposing these symbols with default visibility.

Fixes crash loading some dpdk libraries.
2014-12-01 14:49:04 +02:00
Raphael S. Carvalho
0653a9f3f7 flashcache: fix _total_mem_disk accounting when erasing mem-disk items
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2014-12-01 10:30:23 -02:00
Raphael S. Carvalho
9de9f34423 flashcache: fix erase on disk-based items
Fixed by adding missing break statement.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2014-12-01 10:28:42 -02:00
Gleb Natapov
c90e56e4fb memory: dynamically search for memory level in a topology
Current code assumes that memory is at node level, but on non-NUMA
machines there is no node level at all. Instead of assuming the memory
location in a topology, search for it dynamically.
2014-12-01 14:09:36 +02:00
Nadav Har'El
99e18901c1 Fix build with dpdk
With gcc 4.9.2, building with DPDK enabled breaks with an error like:

../dpdk-1.7.1/x86_64-native-linuxapp-gcc/include/rte_pci.h:99:37:
warning: invalid suffix on literal; C++11 requires a space between literal
and string macro [-Wliteral-suffix]
 #define PCI_SHORT_PRI_FMT "%.2"PRIx8":%.2"PRIx8".%"PRIx8

The problem is that C++11, breaking decades of proud C-preprocessor tradition,
outlawed a macro placed directly after a string literal with no space in
between. But this is used in DPDK's header files, so we need to turn this
error into a warning (let's keep the warning, hopefully it will disappear in
newer versions of DPDK).

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2014-11-30 19:02:10 +02:00
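For reference, the C++11-clean spelling of such a format macro only needs a space between each string literal and the macro that follows it. A small sketch; the macro name `SHORT_FMT_EXAMPLE` and the helper `format_short` are made up for illustration and are not part of DPDK:

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// With the spaces, PRIx8 is expanded as a macro instead of being parsed
// as a user-defined literal suffix attached to the string.
#define SHORT_FMT_EXAMPLE "%.2" PRIx8 ":%.2" PRIx8 ".%" PRIx8

// Formats three 8-bit values using the macro above.
std::string format_short(std::uint8_t bus, std::uint8_t dev, std::uint8_t fn) {
    char buf[16];
    int n = std::snprintf(buf, sizeof buf, SHORT_FMT_EXAMPLE,
                          static_cast<unsigned>(bus),
                          static_cast<unsigned>(dev),
                          static_cast<unsigned>(fn));
    assert(n > 0 && static_cast<std::size_t>(n) < sizeof buf);
    return std::string(buf);
}
```

Since the offending header belongs to DPDK rather than Seastar, the commit downgrades the diagnostic instead of editing the macro.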
Nadav Har'El
e1887713be Add missing Ubuntu package to README
We now also need the "libcrypto++-dev" package to compile
Seastar (libcrypto++ is used by the TCP sequence number randomization...).

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2014-11-30 18:25:32 +02:00
Gleb Natapov
bf46f9c948 net: Change how networking devices are created
Currently each cpu creates a network device as part of native networking
stack creation, and all cpus create their native networking stacks independently,
which makes it impossible to use data initialized by one cpu in another
cpu's networking device initialization. For multiqueue devices, some
parts of the initialization often have to be handled by one cpu, and all other
cpus should wait for the first one before creating their network devices.
Even without multiqueue, proxy devices should be created after the master
device is created so that a proxy device may get a pointer to the master
at creation time (existing code uses a global per-cpu device pointer and
assumes that the master device is created on cpu 0 to compensate for the lack
of ordering).

This patch makes it possible to delay native networking stack creation
until the network device is created. It allows one cpu to be responsible
for creation of network devices on multiple cpus. A single-queue device
initializes the master device on one cpu and calls the other cpus with a pointer
to the master device and its cpu id, which are used in proxy device creation.
This removes the need for the per-cpu device pointer and the "master on cpu 0"
assumption from the code, since now the master device and slave devices know
about each other and can communicate directly.
2014-11-30 18:10:08 +02:00
Gleb Natapov
a38f189f5a memory: handle hwloc cousin lists not circular
Use hwloc_get_next_obj_by_type() instead of directly following cousin
list and handle list wrap around. Also fixed use of uninitialized
variable (I wonder why compiler did not complain).
2014-11-30 16:20:53 +02:00
Avi Kivity
e9432e9254 reactor: move collectd initialization out of reactor::run()
It's complicated enough without it.
2014-11-30 14:24:19 +02:00
Gleb Natapov
cbc2f40680 memory: fix numa memory initialization
Current code crashes on an assert while dividing memory between cpus if the
number of cpus seastar is configured to use is smaller than the number of
available numa nodes. The reason is that seastar tries to use all available
memory, but considers only one numa node while dividing it. This patch makes
memory division a two-phase process: first, each cpu tries to grab as
much memory from its local node as it can; second, all free memory that
was left is divided between all cpus. The algorithm works like that to
prevent one cpu from stealing local memory from another cpu.
2014-11-30 12:50:23 +02:00
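The two-phase division described in the commit above can be sketched as follows. This is an illustration under simplifying assumptions, not the code from the patch: `divide_memory` is a hypothetical name, memory is counted in abstract units, and each cpu is capped at an equal share of the total.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// node_free[n] is the free memory on NUMA node n.
// cpu_node[c] is the node that cpu c is local to.
// Returns the amount of memory assigned to each cpu.
std::vector<std::size_t> divide_memory(std::vector<std::size_t> node_free,
                                       const std::vector<std::size_t>& cpu_node) {
    assert(!cpu_node.empty());
    const std::size_t total =
        std::accumulate(node_free.begin(), node_free.end(), std::size_t{0});
    // Equal share per cpu; any division remainder stays unassigned here.
    const std::size_t target = total / cpu_node.size();
    std::vector<std::size_t> got(cpu_node.size(), 0);

    // Phase 1: every cpu takes as much as it can from its own node only.
    for (std::size_t c = 0; c < cpu_node.size(); ++c) {
        std::size_t& local = node_free.at(cpu_node[c]);
        const std::size_t take = std::min(local, target);
        got[c] = take;
        local -= take;
    }

    // Phase 2: whatever is still free on any node tops up cpus below target.
    for (std::size_t c = 0; c < cpu_node.size(); ++c) {
        for (std::size_t& free_mem : node_free) {
            const std::size_t take = std::min(free_mem, target - got[c]);
            got[c] += take;
            free_mem -= take;
        }
    }
    return got;
}
```

With one cpu and two nodes holding 100 and 60 units, the cpu ends up with all 160 units instead of tripping over the second node, which is the case the commit message describes. Running the local pass to completion before any remote allocation is what keeps one cpu from taking another cpu's local memory.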
Avi Kivity
987b6ca3eb Merge branch 'dpdk'
dpdk support from Vlad:

"- Currently only a single port and a single queue are supported.
 - All DPDK EAL configuration is hard-coded in the dpdk_net_device constructor instead
   of coming from the app parameters.
 - No offload features are enabled.
 - Tx: will spin in the dpdk_net_device::send() till there is a place in the HW ring to
       place a current packet.
 - Tx: copy data from the `packet` frags into the rte_mbuf's data."
2014-11-30 12:19:04 +02:00
Vlad Zolotarov
a0769ae189 echotest: Added support for DPDK PMD backend
- Fixed the IP addresses swapping.
- Added cmdline parameters to choose between virtio and DPDK tests.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2014-11-30 12:14:58 +02:00
Vlad Zolotarov
93f7cc434d tests: rename virtiotest -> echotest
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2014-11-30 12:14:58 +02:00
Vlad Zolotarov
857719556a README.md: Added a DPDK related chapter
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2014-11-30 12:14:58 +02:00
Vlad Zolotarov
12caa3afe4 net: add option to use a dpdk PMD networking backend
- Added "dpdk-pmd" option:
  - Defaulted to FALSE.
  - When TRUE - use DPDK PMD drivers.
- Call for dpdk net_device creation function if dpdk-poll option is given
- Added DPDK networking backend options to all options list

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2014-11-30 12:14:56 +02:00
Vlad Zolotarov
5cd984b5cc dpdk: Initial commit
- Currently only a single port and a single queue are supported.
- All DPDK EAL configuration is hard-coded in the dpdk_net_device constructor instead
  of coming from the app parameters.
- No offload features are enabled.
- Tx: will spin in the dpdk_net_device::send() till there is a place in the HW ring to
      place a current packet.
- Tx: copy data from the `packet` frags into the rte_mbuf's data.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2014-11-30 12:13:52 +02:00
Vlad Zolotarov
47b3721ccf reactor: added a "pollers" abstraction
Each "poller" registers a non-blocking callback which is then called in
every iteration of a reactor's main loop.

Each "poller"'s callback returns a boolean: if TRUE then the main loop is allowed to block
(e.g. in epoll()).

If any of the registered "pollers" returns FALSE then the reactor's main loop is forbidden to block
in the current iteration.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2014-11-30 12:12:39 +02:00
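A minimal sketch of the poller contract described in the commit above. The class `mini_reactor` and its methods are hypothetical names for illustration and do not reflect the reactor's real interface:

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical miniature of a reactor that holds pollers.
class mini_reactor {
    std::vector<std::function<bool()>> _pollers;
public:
    // Registers a non-blocking callback; empty callbacks are rejected.
    void register_poller(std::function<bool()> poller) {
        assert(static_cast<bool>(poller));
        _pollers.push_back(std::move(poller));
    }

    // One main-loop iteration: call every poller, and report whether the
    // loop may block afterwards. A single FALSE forbids blocking, but the
    // remaining pollers are still called.
    bool poll_once() {
        bool may_block = true;
        for (auto& poller : _pollers) {
            if (!poller()) {
                may_block = false;
            }
        }
        return may_block;
    }
};
```

A caller would block (for example in epoll()) only when `poll_once()` returns true, and otherwise go straight into the next iteration.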
Asias He
88a1a37a88 ip: Support IP fragmentation in TX path
Tested with UDP sending large datagrams with ufo off.
2014-11-30 10:16:38 +02:00
Avi Kivity
f4daca803d Merge branch 'glommer/xen' of github.com:cloudius-systems/seastar-dev
Xen fixes (userspace + osv) from Glauber.
2014-11-29 14:06:27 +02:00
Glauber Costa
2cf187590f xen: fix userspace interrupts
The local variable used to read the ports won't be valid after we return from
the function. Moving it to be an instance member is not ideal, but it works if
we don't unmask the ports until we're done signaling them all.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2014-11-28 14:23:14 +01:00
Glauber Costa
b3c163e603 xen: fix typo in event channel detection
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2014-11-28 14:15:36 +01:00
Glauber Costa
c3ae30b760 xen: delete event channel as well
If we don't have split channels, we need to delete the relevant property.
Because xs_rm() returns true if the feature does not exist, it won't affect the
transaction if we just delete all of them. Therefore we don't need to do any
conditional test.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2014-11-27 18:00:35 +01:00
Glauber Costa
a4667c48e6 xen: fix gntalloc for userspace
It broke when we changed things to accommodate OSv's functions. The following
code works.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2014-11-27 18:00:35 +01:00
Glauber Costa
f06233695c xenstore: bail on error
If there is some error opening the xenstore - for instance, if we run
without privileges, we should bail out or we will segfault later.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2014-11-27 18:00:35 +01:00
Glauber Costa
3848130f2f xen: only add features to feature array
We are adding everything we read into the features array. Because in the
destructor we will remove everything in the features list, we'll end up
removing more than we should. Things like the mac address, handle, etc, should
never be deleted.

This is not a problem for OSv because usually, after the destructor is called,
the whole guest is down. But for userspace, the network card is left there,
but will cease to work if we delete too much.

Once the _features array is used only for its original intent, it becomes
redundant with the features nack.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2014-11-27 18:00:35 +01:00
Glauber Costa
bd8a18c178 xen: unmask event channels when setup is ready
This is not required for OSv, but is required for userspace operation.
It won't work without it.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2014-11-27 18:00:35 +01:00
Avi Kivity
861957e5ba Merge branch 'glommer/xen' of github.com:cloudius-systems/seastar-dev
Glauber says:

"This patch yields a small performance boost. It is not complete, since the rest
of the performance work is still missing since half of that is in OSv.

But more importantly, it now works on AWS."
2014-11-26 18:30:26 +02:00
Glauber Costa
b56a89d5c9 xen: translate feature name
When the backend advertises "feature-rx-copy", the frontend should register for
"request-rx-copy". The local hypervisor seems to be forgiving about it, but the
one in AWS is not, and doubly so.

First, it doesn't recognize these as the same. And second, it refuses to
connect the backend if this feature is not advertised by the frontend.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2014-11-26 17:22:58 +01:00
Glauber Costa
a9a79e3ba6 xen: ring unification
The ring processing is almost the same for both rx and tx, with the exception
of the core of the action. We can actually unify them nicely with some use of
template programming.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2014-11-26 17:21:09 +01:00
Glauber Costa
e7c9aeb8a5 xen: interrupt mitigation
There are two things we can do that will lead to fewer interrupts being sent.
The first is to read the new rsp_cons value at the end of every interaction.
If the backend produces more frames in the meantime, we'll be able to process
them in the same round, without getting another interrupt.

The other is to set the rsp_event only after all the frames are processed.

As a matter of fact, both the tx and rx rings did one of them, but not the same
one. The next patch will unify the ring code to avoid problems like that in the
future.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2014-11-26 17:17:45 +01:00
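The two mitigation steps from the commit above can be sketched together. This is a simplified illustration, not the driver code: `fake_ring` and `process_responses` are hypothetical names, the producer index is modeled as a plain field `rsp_prod`, and the memory barriers and final re-check that a real shared ring needs are left out.

```cpp
#include <cassert>
#include <functional>

// Hypothetical, single-threaded model of the shared response ring.
struct fake_ring {
    unsigned rsp_prod = 0;    // advanced by the backend as it produces frames
    unsigned rsp_event = 1;   // backend interrupts when rsp_prod reaches this
};

// Consumes every available response and returns how many were handled.
unsigned process_responses(fake_ring& ring, unsigned& rsp_cons,
                           const std::function<void(unsigned)>& handle) {
    assert(static_cast<bool>(handle));
    unsigned processed = 0;
    for (;;) {
        // Step 1: re-read the producer index at the end of every round, so
        // frames produced in the meantime are handled without an interrupt.
        const unsigned prod = ring.rsp_prod;
        if (rsp_cons == prod) {
            break;
        }
        while (rsp_cons != prod) {
            handle(rsp_cons++);
            ++processed;
        }
    }
    // Step 2: ask for the next interrupt only after all frames are processed.
    ring.rsp_event = rsp_cons + 1;
    return processed;
}
```

Sharing one routine like this between rx and tx is what the follow-up ring unification patch aims for, so both paths apply both steps.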
Gleb Natapov
4f4731c37b net: delay network stack creation
Network device has to be available when network stack is created, but
sometimes network device creation should wait for device initialization
by another cpu. This patch makes it possible to delay network stack
creation until network device is available.
2014-11-26 16:46:04 +02:00
Avi Kivity
87fdf52205 Merge branch 'clang' 2014-11-26 15:01:14 +02:00
Avi Kivity
e8894227bc xen: declare nr_ents higher to satisfy clang 2014-11-26 15:00:13 +02:00