Currently we require that memory be freed on the same cpu on which it was
allocated.
This does not impose difficulties on the user code, since our code is already
smp-unsafe, and so must use message-passing to run the destructor on the
origin cpu, so memory is naturally freed there as well.
However, library code does not run under our assumptions. In particular,
std::exception_ptr, which we do transport across cores, can end up being
freed on a cpu other than the one that allocated it.
To support this use case, add low-performance support for cross-cpu frees,
using an atomic singly linked list per core.
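A minimal sketch of the per-core list described above, assuming a lock-free
push from any cpu and a drain that only the owning cpu performs. The names
cpu_free_queue, free_node, push and drain are hypothetical and are not taken
from the actual allocator.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical link type: the freed object's own storage holds the pointer,
// so queueing a cross-cpu free needs no extra allocation.
struct free_node {
    free_node* next;
};

// Hypothetical per-cpu queue of objects freed by other cpus.
struct cpu_free_queue {
    std::atomic<free_node*> head{nullptr};

    // May be called from any cpu: push the object onto the owner's list.
    void push(void* obj) {
        assert(obj != nullptr);
        auto* n = static_cast<free_node*>(obj);
        free_node* old = head.load(std::memory_order_relaxed);
        do {
            n->next = old;
        } while (!head.compare_exchange_weak(old, n,
                                             std::memory_order_release,
                                             std::memory_order_relaxed));
    }

    // Called only by the owning cpu: detach the whole list in one atomic
    // exchange, then free each object through the normal local path.
    template <typename LocalFree>
    std::size_t drain(LocalFree local_free) {
        free_node* n = head.exchange(nullptr, std::memory_order_acquire);
        std::size_t count = 0;
        while (n) {
            free_node* next = n->next;  // read the link before freeing
            local_free(n);
            n = next;
            ++count;
        }
        return count;
    }
};
```

Because the consumer takes the entire list at once instead of popping single
nodes, the sketch avoids the ABA problem of a general lock-free stack. The
cost is one atomic read-modify-write per remote free, which fits the
"low-performance" intent.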

Backing memory with hugetlbfs is a little tricky, since we only know we want
hugetlbfs after memory has been initialized. So we start up in anonymous
memory, and later switch to hugetlbfs by copying the contents to
hugetlb-backed memory and mremap()ing that memory back into place.
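The copy-then-mremap() step can be sketched as follows (Linux only). To keep
the example runnable without a hugetlbfs mount, both the original and the
replacement mapping are anonymous here; in the scheme described above the
replacement would be hugetlb-backed instead. map_anonymous and switch_backing
are hypothetical helper names.

```cpp
#include <sys/mman.h>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: map `size` bytes of private anonymous memory.
// Returns nullptr on failure.
inline void* map_anonymous(std::size_t size) {
    void* p = ::mmap(nullptr, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
}

// Hypothetical helper: give [base, base + size) a new backing while keeping
// its address and contents. `replacement` is a separate mapping of the same
// page-aligned size (stand-in for the hugetlb-backed memory).
inline bool switch_backing(void* base, std::size_t size, void* replacement) {
    // 1. Copy the live contents into the new backing.
    std::memcpy(replacement, base, size);
    // 2. Move the new mapping over the old address. MREMAP_FIXED unmaps
    //    whatever was mapped at `base` and requires MREMAP_MAYMOVE.
    void* r = ::mremap(replacement, size, size,
                       MREMAP_MAYMOVE | MREMAP_FIXED, base);
    return r == base;
}
```

Pointers into the region stay valid across the switch, since the address
range is unchanged and only the pages behind it are replaced. The sketch
assumes nothing else touches the region between the copy and the remap.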

Allow memory users to declare methods of reclaiming memory (reclaimers),
and allow the main loop to declare a safe point for calling these reclaimers.
The memory manager will then schedule calls to reclaimers when memory runs
low.
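A minimal sketch of this arrangement, assuming the allocator only records
that a reclaim is wanted and the main loop runs the callbacks at its safe
point. reclaimer_registry and its member names are hypothetical, not the
actual interface.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical registry of reclaimers.
class reclaimer_registry {
    std::vector<std::function<void()>> _reclaimers;
    bool _pending = false;
public:
    // A memory user declares a way to give memory back.
    void add_reclaimer(std::function<void()> r) {
        _reclaimers.push_back(std::move(r));
    }

    // Called by the memory manager when memory runs low. It only records
    // the request, because running user code from inside the allocator
    // would not be safe.
    void request_reclaim() { _pending = true; }

    // Called by the main loop at its declared safe point. Runs every
    // reclaimer if a request is pending; returns how many were run.
    std::size_t run_pending() {
        if (!_pending) {
            return 0;
        }
        _pending = false;
        for (auto& r : _reclaimers) {
            r();
        }
        return _reclaimers.size();
    }
};
```

Deferring the calls to a safe point means a reclaimer may itself allocate or
free memory without re-entering the allocator in the middle of an operation.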