swapcontext() is expensive because it makes system calls to save and
restore the signal mask. Replace it with setjmp()/longjmp(). We still
use setcontext() for the initial switch onto a new stack, since that's
the most reasonable portable method of setting up a stack.

Context switch time (measured by thread_context_switch) drops from
450ns to 120ns; what remains is dominated by inefficiencies in the
test itself and in future<>.
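
A minimal sketch of the switching scheme, assuming Linux/glibc: the
stack is set up once with makecontext()/setcontext(), and every later
switch is a setjmp()/longjmp() pair. The names switch_to, start_coro,
coro_main and run_demo are hypothetical and are not taken from the
patch. Jumping between stacks with longjmp() is outside what the C and
C++ standards guarantee, so treat this as an illustration only.

```cpp
// glibc's fortified longjmp() aborts on jumps between separate stacks, so
// disable fortification for this file. This must precede every #include.
#undef _FORTIFY_SOURCE
#include <ucontext.h>
#include <setjmp.h>
#include <cassert>
#include <vector>

static jmp_buf main_jb;         // saved state of the main stack
static jmp_buf coro_jb;         // saved state of the second stack
static ucontext_t coro_uc;      // used only for the very first switch
static std::vector<int> trace;  // records the order of execution

// Save the current state in `from`, then resume `to`. No system call is made.
static void switch_to(jmp_buf from, jmp_buf to) {
    if (setjmp(from) == 0) {
        longjmp(to, 1);
    }
    // Reached when someone later longjmp()s back to `from`.
}

// Entry point running on the second stack. It must never return.
static void coro_main() {
    trace.push_back(1);
    switch_to(coro_jb, main_jb);  // yield to main
    trace.push_back(3);
    switch_to(coro_jb, main_jb);  // yield again
    trace.push_back(5);
    longjmp(main_jb, 1);          // final switch; this stack is finished
}

// One-time setup: makecontext() builds the stack, setcontext() enters it.
static void start_coro() {
    static char stack[64 * 1024];
    getcontext(&coro_uc);
    coro_uc.uc_stack.ss_sp = stack;
    coro_uc.uc_stack.ss_size = sizeof(stack);
    coro_uc.uc_link = nullptr;
    makecontext(&coro_uc, coro_main, 0);
    if (setjmp(main_jb) == 0) {
        setcontext(&coro_uc);     // does not return on success
    }
}

std::vector<int> run_demo() {
    trace.clear();
    start_coro();                 // runs coro_main up to its first yield
    trace.push_back(2);
    switch_to(main_jb, coro_jb);
    trace.push_back(4);
    switch_to(main_jb, coro_jb);
    trace.push_back(6);
    return trace;
}
```

Calling run_demo() interleaves the two stacks and returns the values 1
through 6 in order. Only start_coro() touches the ucontext API; the
steady-state path in switch_to() stays in user space.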

Add a thread class that can be used to launch a blockable thread of
execution. Within a thread, future<>::get() can be called on an
unavailable future, in which case it blocks until the future becomes
ready.
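
To illustrate the blocking behaviour described above, here is a toy
model, assuming Linux/glibc. toy_thread and toy_future are hypothetical
stand-ins and not the real thread and future<> classes, and the sketch
uses swapcontext() for every switch purely to keep it short.

```cpp
#include <ucontext.h>
#include <cassert>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

struct toy_thread;
static toy_thread* current = nullptr;  // thread now running, if any

// Hypothetical stand-in for a blockable thread of execution.
struct toy_thread {
    ucontext_t caller{}, self{};
    std::vector<char> stack = std::vector<char>(64 * 1024);
    std::function<void()> body;
    bool done = false;

    explicit toy_thread(std::function<void()> f) : body(std::move(f)) {
        getcontext(&self);
        self.uc_stack.ss_sp = stack.data();
        self.uc_stack.ss_size = stack.size();
        self.uc_link = &caller;  // where to go when entry() returns
        makecontext(&self, &toy_thread::entry, 0);
        resume();                // run until the body blocks or finishes
    }
    toy_thread(const toy_thread&) = delete;  // contexts hold self-pointers

    static void entry() { current->body(); current->done = true; }
    void resume() {
        toy_thread* prev = current;
        current = this;
        swapcontext(&caller, &self);
        current = prev;
    }
    void yield() { swapcontext(&self, &caller); }
};

// Hypothetical single-value future; get() may only block inside a toy_thread.
struct toy_future {
    std::optional<int> value;
    toy_thread* waiter = nullptr;

    int get() {
        while (!value) {         // unavailable: suspend the calling thread
            assert(current && "blocking get() needs a thread");
            waiter = current;
            current->yield();
        }
        return *value;
    }
    void set(int v) {
        value = v;
        if (toy_thread* w = std::exchange(waiter, nullptr)) {
            w->resume();         // wake the thread blocked in get()
        }
    }
};

int run_demo() {
    toy_future f;
    int seen = 0;
    toy_thread t([&] { seen = f.get() + 1; });  // blocks inside get()
    assert(!t.done);
    f.set(41);                   // makes the future ready, resumes t
    assert(t.done);
    return seen;
}
```

Here run_demo() returns 42: the thread suspends inside get(), control
returns to the caller, and set() resumes the thread with the value.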