We intend to share immutable sstable components among shards to
reduce excessive memory usage when resharding shared sstables.
This change groups those components into a structure and uses
foreign_ptr to ensure that the structure is deleted by whichever
shard created it.
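A rough sketch of the shape (member list and names hypothetical,
using today's Seastar header layout):

    #include <seastar/core/shared_ptr.hh>
    #include <seastar/core/sharded.hh>

    // The immutable, shard-agnostic parts grouped into one structure.
    struct shareable_components {
        // filter, summary, statistics, ...
    };

    // foreign_ptr records the owning shard and runs the destructor
    // there, so whichever shard created the structure also frees it.
    using components_ptr =
        seastar::foreign_ptr<seastar::lw_shared_ptr<shareable_components>>;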
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
The entire SSTable read path can now take an io_priority. The public
functions take a default parameter, which is Seastar's default priority.
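A sketch of what a public signature looks like (the function name is
hypothetical; io_priority_class and default_priority_class are
Seastar's, though their header layout has changed across versions):

    #include <seastar/core/future.hh>
    #include <seastar/core/io_priority_class.hh>

    // Callers that don't care about I/O priority get Seastar's
    // default; internal variants take pc explicitly.
    seastar::future<> read_toc(
        const seastar::io_priority_class& pc = seastar::default_priority_class());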
Signed-off-by: Glauber Costa <glauber@scylladb.com>
All variants of write_component now take an io_priority. The public
interfaces default to Seastar's default priority.
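One way to express the plumbing is the usual default-forwarding
pattern; a minimal sketch with hypothetical names:

    #include <seastar/core/future.hh>
    #include <seastar/core/io_priority_class.hh>

    // Internal variant takes the priority class explicitly...
    seastar::future<> write_components(const seastar::io_priority_class& pc);

    // ...while the public interface defaults it.
    inline seastar::future<> write_components() {
        return write_components(seastar::default_priority_class());
    }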
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Bloom filter loading and saving are slow with single-bit access to the
bitmap, causing latency spikes of ~100ms for 20MB sstables. Larger
sstables will be much worse.
Fix by using the newly introduced large_bitmap bulk load/save methods. With
this, the maximum observed task latency was 16ms.
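The gist of the bulk path, as a standalone sketch (the real
large_bitmap API differs, and byte-order handling is elided):

    #include <cstdint>
    #include <cstring>
    #include <vector>

    struct bitmap_sketch {
        std::vector<uint64_t> words;

        // One bulk copy per load instead of one test-and-set per bit,
        // i.e. 64x fewer operations over the whole bitmap.
        void load(const uint8_t* buf, size_t nwords) {
            words.resize(nwords);
            std::memcpy(words.data(), buf, nwords * sizeof(uint64_t));
        }
    };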
Fixes #299 (partially, at least; larger bitmaps may still require more work).
The current filter tracker uses a distributed mechanism, even though
the values for all CPUs but one are usually zero. This is because I
wrongly assumed that, when using legacy sstables, the same sstable
would be serving keys for multiple shards, making the map-reduce a
necessary operation in this case.
However, Avi correctly points out that:
"It is and it isn't [the case]. Yes the sstable will be loaded on multiple cores, but
each core will have its own independent sstable object (only the files on disk
are shared).
So to aggregate statistics on such shared sstables, you have to match them by
name (and the sharded<filter_tracker> is useless)."
Avi is correct in his remarks. The code is hereby simplified to keep
only local counters; the map-reduce operation will happen at a higher
level.
Also, because users of the get methods go through the sstable, we can
simply move those methods there. The counters in the filter itself can
then stay private to the external world.
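The simplified shape, roughly (names hypothetical):

    #include <cstdint>

    // Plain per-shard counters: no sharded<> instance and no
    // map_reduce here. Cross-shard totals, when wanted, are computed
    // at a higher level by matching sstables by name.
    struct filter_tracker {
        uint64_t memory_size = 0;
    };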
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Large vectors require contiguous storage, which may not be available (or may
be expensive to obtain). Switch to deque<> instead, which allocates
discontiguous storage.
Allocation problems were observed with the summary and with the bloom
filter bitmaps.
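The substitution itself is mechanical; for example:

    #include <cstdint>
    #include <deque>

    // deque grows in fixed-size chunks, so a multi-megabyte summary
    // or filter bitmap never needs one large contiguous allocation.
    std::deque<uint64_t> buckets;   // was: std::vector<uint64_t>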
The sstables write path has been partially de-futurized, but it now
creates a ton of threads, yet does not exploit them since everything
is serialized.
Remove those extra threads and futures and use a single thread to write
everything. If needed, we'll employ write-behind in output_stream to
increase parallelism.
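A sketch of the single-threaded shape (component names hypothetical):

    #include <seastar/core/future.hh>
    #include <seastar/core/thread.hh>

    // One seastar::async thread writes every component in order; no
    // per-component threads or futures.
    seastar::future<> write_all_components() {
        return seastar::async([] {
            // write_toc(); write_summary(); write_filter(); ...
            // each call may block this single thread on I/O.
        });
    }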
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
All sharded services "should" define a stop method, and calling it is
also good practice. For this one specifically, though, we will not call
stop: we lack a good way to attach a deleter to a shared_ptr class, and
that would be the only reliable way to tie into its lifetime.
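For reference, the conventional stop is trivial; a minimal sketch:

    #include <seastar/core/future.hh>

    struct filter_tracker {
        seastar::future<> stop() {
            return seastar::make_ready_future<>();
        }
    };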
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Use the provided filter instead of always returning true. For existing
tables, it is loaded from the bloom filter file. We don't yet fully
write a bloom filter file.
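The change in the read path amounts to something like this (types and
names hypothetical):

    #include <string_view>

    struct filter_sketch {
        // The real filter hashes the key and tests bits in the bitmap.
        bool is_present(std::string_view key) const;
    };

    bool filter_has_key(const filter_sketch& f, std::string_view key) {
        return f.is_present(key);   // previously: return true;
    }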
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>