Since we control the capacity, we can force it to be a power of two,
and use masking instead of tests to handle wraparound.
A side benefit is that we don't have to allocate an extra element
to tell a full buffer from an empty one.
Since we have lots of queues, we need an efficient queue structure,
especially for movable types. libstdc++'s std::deque is quite hairy,
and boost's circular_buffer_space_optimized uses assignments instead of
constructors; assignment is both slower and less widely available than
construction.
This patch implements a growable circular buffer for these needs.
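
For illustration only, here is a minimal sketch of the idea, not the
code in this patch. The class name RingQueue, the initial capacity of 4
and the doubling policy are all assumptions. It uses free-running
unsigned indices that are masked only on access, which is one way a
power-of-two capacity lets every slot be used. Elements are
move-constructed into raw storage, never assigned. The sketch assumes
that T's move constructor does not throw and that T is not over-aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <utility>

// Hypothetical sketch of a growable circular buffer; not the patch's class.
template <typename T>
class RingQueue {
public:
    RingQueue() = default;
    RingQueue(const RingQueue&) = delete;
    RingQueue& operator=(const RingQueue&) = delete;

    ~RingQueue() {
        while (!empty()) pop_front();
        ::operator delete(data_);
    }

    // head_ and tail_ are free-running counters, so their difference is
    // the element count and "full" needs no spare slot.
    std::size_t size() const { return tail_ - head_; }
    std::size_t capacity() const { return cap_; }
    bool empty() const { return head_ == tail_; }

    void push_back(T value) {
        if (size() == cap_) grow();
        // Construct in place; T only needs a move constructor.
        ::new (static_cast<void*>(slot(tail_))) T(std::move(value));
        ++tail_;
    }

    T& front() {
        assert(!empty());
        return *slot(head_);
    }

    void pop_front() {
        assert(!empty());
        slot(head_)->~T();
        ++head_;
    }

private:
    // Capacity is a power of two, so wraparound is a single AND.
    T* slot(std::size_t counter) const {
        return data_ + (counter & (cap_ - 1));
    }

    void grow() {
        const std::size_t n = size();
        const std::size_t new_cap = cap_ ? cap_ * 2 : 4;  // stays a power of two
        T* fresh = static_cast<T*>(::operator new(new_cap * sizeof(T)));
        // Move elements to the front of the new storage in queue order.
        for (std::size_t i = 0; i < n; ++i) {
            T* old = slot(head_ + i);
            ::new (static_cast<void*>(fresh + i)) T(std::move(*old));
            old->~T();
        }
        ::operator delete(data_);
        data_ = fresh;
        cap_ = new_cap;
        head_ = 0;
        tail_ = n;
    }

    T* data_ = nullptr;
    std::size_t cap_ = 0;
    std::size_t head_ = 0;
    std::size_t tail_ = 0;
};
```

With a move-only element type such as std::unique_ptr<int>, push_back
takes an rvalue and pop_front destroys the element in place, so no
assignment operator is ever required of T.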