There was a redundant work in split_fragmented(): value_length_if_fixed() was
called inside the loop (N virtual calls), and no reserve() was done
on the output vector causing repeated reallocations.
This patch reserves the output vector to _dimension and
caches value_length_if_fixed() before the loop.
Additionally, split read_vector_element() into two specialized functions:
read_vector_element_fixed() and read_vector_element_variable(), and hoist
the branch on fixed_len outside the loop in split_fragmented() and
deserialize_loop(). This avoids a conditional branch per element in the
hot path.
Benchmark results (1024-dim float vector, release build, -O3 -flto):
10.34 us -> 7.45 us (1.39x, 28% faster)