"
With this series the mutation compactor can now consume a v2 stream. On
the output side it still uses v1, so it can now act as an online
v2->v1 converter. This allows us to push out v2->v1 conversion to as far
as the compactor, usually the next to last component in a read pipeline,
just before the final consumer. For reads this is as far as we can go,
as the intra-node ABI and hence the result-sets built are v1. For
compaction we could go further and eliminate conversion altogether, but
this requires some further work on both the compactor and the sstable
writer and so it is left to be done later.
To summarize, this patchset enables a v2 input for the compactor and it
updates compaction and single partition reads to use it.
"
* 'mutation-compactor-consume-v2/v1' of https://github.com/denesb/scylla:
table: add make_reader_v2()
querier: convert querier_cache and {data,mutation}_querier to v2
compaction: upgrade compaction::make_interposer_consumer() to v2
mutation_reader: remove unecessary stable_flattened_mutations_consumer
compaction/compaction_strategy: convert make_interposer_consumer() to v2
mutation_writer: migrate timestamp_based_splitting_writer to v2
mutation_writer: migrate shard_based_splitting_writer to v2
mutation_writer: add v2 clone of feed_writer and bucket_writer
flat_mutation_reader_v2: add reader_consumer_v2 typedef
mutation_reader: add v2 clone of queue_reader
compact_mutation: make start_new_page() independent of mutation_fragment version
compact_mutation: add support for consuming a v2 stream
compact_mutation: extract range tombstone consumption into own method
range_tombstone_assembler: add get_range_tombstone_change()
range_tombstone_assembler: add get_current_tombstone()
seastar::later() was recently deprecated and replaced with two
alternatives: a cheap seastar::yield() and an expensive (but more
powerful) seastar::check_for_io_immediately(), that corresponds to
the original later().
This patch replaces all later() calls with the weaker yield(). In
all cases except one, it's unambiguously correct. In one case
(test/perf scheduling_latency_measurer::stop()) it's not so ambiguous,
since check_for_io_immediately() will additionally force a poll and
so will cause more work to be done (but no additional tasks to be
executed). However, I think that any measurement that relies on
the measuring the work on the last tick to be inaccurate (you need
thousands of ticks to get any amount of confidence in the
measurement) that in the end it doesn't matter what we pick.
Tests: unit (dev)
Closes#9904
Split off of #9835.
The series removes extraneous includes of lister.hh from header files
and adds a unit test for lister::scan_dir to test throwing an exception
from the walker function passed to `scan_dir`.
Test: unit(dev)
Closes#9885
* github.com:scylladb/scylla:
test: add lister_list
lister: add more overloads of fs::path operator/ for std::string and string_view
resource_manager: remove unnecessary include of lister.hh from header file
sstables: sstable_directory: remove unncessary include of lister.hh from header file
Test the lister class.
In particular the ability to abort the lister
when the walker function throws an exception.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
"
The first patch introduces evictable_reader_v2, and the second one
further simplifies it. We clone instead of converting because there
is at least one downstream (by way of multishard_combining_reader) use
that is not itself straightforward to convert at the moment
(multishard_mutation_query), and because evictable_reader instances
cannot be {up,down}graded (since users also access the undelying
buffers). This also means that shard_reader, reader_lifecycle_policy
and multishard_combining_reader have to be cloned.
"
* tag 'clone-evictable-reader-to-v2/v3' of https://github.com/cmm/scylla:
convert make_multishard_streaming_reader() to flat_mutation_reader_v2
convert table::make_streaming_reader() to flat_mutation_reader_v2
convert make_flat_multi_range_reader() to flat_mutation_reader_v2
view_update_generator: remove unneeded call to downgrade_to_v1()
introduce multishard_combining_reader_v2
introduce shard_reader_v2
introduce the reader_lifecycle_policy_v2 abstract base
evictable_reader_v2: further code simplifications
introduce evictable_reader_v2 & friends
After the rewrite of the test/scylla-gdb, the test for "scylla fiber"
was disabled - and this patch brings it back.
For the "scylla fiber" operation to do something interesting (and not just
print an error message and seem to succeed...) it needs a real task pointer.
The old code interrupted Scylla in a breakpoint and used get_local_tasks(),
but in the new test framework we attach to Scylla while it's idle, so
there are no ready tasks. So in this patch we use the find_vptrs()
function to find a continuation from http_server::do_accept_one() - it
has an interesting fiber of 5 continuations.
After this patch all 33 tests in test/scylla-gdb/test_misc.py pass.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220110211813.581807-1-nyh@scylladb.com>
distributed_loader is replica-side thing, so it belongs in the
replica module ("distributed" refers to its ability to load
sstables in their correct shards). So move it to the replica
module.
The change exposes a dependency on the construction
order of static variables (which isn't defined), so we remove
the dependency in the first two patches.
Closes#9891
* github.com:scylladb/scylla:
replica: move distributed_loader into replica module
tracing: make sure keyspace and table names are available to static constructors
auth: make sure keyspace and table names are available to static constructors
distributed_loader is replica-side thing, so it belongs in the
replica module ("distributed" refers to its ability to load
sstables in their correct shards). So move it to the replica
module.
As already noted in commit eac6fb8, many of the scylla-gdb tests fail on
aarch64 for various reasons. The solution used in that commit was to
have test/scylla-gdb/run pretend to succeed - without testing anything -
when not running on x86_64. This workaround was accidentally lost when
scylla-gdb/run was recently rewritten.
This patch brings this workaround back, but in a slightly different form -
Instead of the run script not doing anything, the tests do get called, but
the "gdb" fixture in test/scylla-gdb/conftest.py causes each individual
test to be skipped.
The benefit of this approach is that it can easily be improved in the
future to only skip (or xfail) specific tests which are known to fail on
aarch64, instead of all of them - as half of the tests do pass on aarch64.
Fixes#9892.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220109152630.506088-1-nyh@scylladb.com>
Said wrapper was conceived to make unmovable `compact_mutation` because
readers wanted movable consumers. But `compact_mutation` is movable for
years now, as all its unmovable bits were moved into an
`lw_shared_ptr<>` member. So drop this unnecessary wrapper and its
unnecessary usages.
Move replica-oriented classes to the replica namespace. The main
classes moved are ::database, ::keyspace, and ::table, but a few
ancillary classes are also moved. There are certainly classes that
should be moved but aren't (like distributed_loader) but we have
to start somewhere.
References are adjusted treewide. In many cases, it is obvious that
a call site should not access the replica (but the data_dictionary
instead), but that is left for separate work.
scylla-gdb.py is adjusted to look for both the new and old names.
The database, keyspace, and table classes represent the replica-only
part of the objects after which they are named. Reading from a table
doesn't give you the full data, just the replica's view, and it is not
consistent since reconciliation is applied on the coordinator.
As a first step in acknowledging this, move the related files to
a replica/ subdirectory.
This patch is an almost complete rewrite of the test/scylla-gdb
framework for testing Scylla's gdb commands.
The goals of this rewrite are described in issue #9864. In short, the
goals are:
1. Use pytest to define individual test cases instead one long Python
script. This will make it easier to add more tests, to run only
individual tests (e.g., test/scylla-gdb/run somefile.py::sometest),
to understand which test failed when it fails - and a lot of other
pytest conveniences.
2. Instead of an ad-hoc shell script to run Scylla, gdb, and the test,
use the same Python code which is used in other test suites (alternator,
cql-pytest, redis, and more). The resulting handling of the temporary
resources (processes, directories, IP address) is more robust, and
interrupting test/scylla-gdb/run will correctly kill its child
processes (both Scylla and gdb).
All existing gdb tests (except one - more on this below...) were
easily rewritten in the new framework.
The biggest change in this patch is who starts what. Before this patch,
"run" starts gdb, which in turn starts Scylla, stops it on a breakpoint,
and then runs various tests. After this patch, "run" starts Scylla on
its own (like it does in test/cql-pytest/run, et al.), and then gdb runs
pytest - and in a pytest fixture attaches to the running Scylla process.
The biggest benefit of this approach is that "run" is aware of both gdb
and Scylla, and can kill both with abruptly with SIGKILL to end the test.
But there's also a downside to this change: One of the tests (of "scylla
fiber") needs access to some task object. Before this patch, Scylla was
stopped on a breakpoint, and a task was available at that point. After
this patch, we attach gdb to an idle Scylla, and the test cannot find
any task to use. So the test_fiber() test fails for now.
One way we could perhaps fix it is to add a breakpoint and "continue"
Scylla a bit more after attaching to it. However, I could find the right
breakpoint - and we may also need to send a request to Scylla to
get it to reach that breakpoint. I'm still looking for a better way
to have access to some "task" object we can test on.
Fixes#9864.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220102221534.1096659-1-nyh@scylladb.com>
The header file <seastar/net/ip.hh> is a large collection of unrelated stuff, and according to ClangBuildAnalyzer, takes 2 seconds to compile for every source file that included it - and unfortunately virtually all Scylla source files included it - through either "types.hh" or "gms/inet_address.hh". That's 2*300 CPU seconds wasted.
In this two-patch series we completely eliminate the inclusion of <seastar/net/ip.hh> from Scylla. We still need the ipv4_address, ipv6_address types (e.g., gms/inet_address.hh uses it to hold a node's IP address) so those were split (in a Seastar patch that is already in) from ip.hh into separate small header files that we can include.
This patch reduces the entire build time (of build/dev/scylla) by 4% - reducing almost 10 sCPU minutes (!) from the build.
Closes#9875
* github.com:scylladb/scylla:
build performance: do not include <seastar/net/ip.hh>
build performance: speed up inclusion of <gm/inet_address.hh>
In a previous patch, we noticed that the header file <gm/inet_address.hh>,
which is included, directly or indirectly, by most source files,
includes <seastar/net/ip.hh> which is very slow to compile, and
replaced it by the much faster-to-include <seastar/net/ipv[46]_address.hh>.
However, we also included <seastar/net/ip.hh> in types.hh - and that
too is included by almost every file, so the actual saving from the
above patch was minimal. So in this patch we replace this include too.
After this patch Scylla does not include <seastar/net/ip.hh> at all.
According to ClangBuildAnalyzer, this reduces the average time to include
types.hh (multiply this by 312 times!) from 4 seconds to 1.8 seconds,
and reduces total build time (dev mode) by about 3%.
Some of the source files were now missing some include directives, that
were previously included in ip.hh - so we need to add those explicitly.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The test case assumed int32 partition key, but
scylla_bench_large_part_ds1 has int64 partition key. This resulted in
no results to be returned by the reader.
Fixs by introducing a partition key factory on the data source level.
Message-Id: <20220105150550.67951-1-tgrabiec@scylladb.com>
We already have multiple tests for the unimplemented "Projection" feature
of GSI and LSI (see issue #5036). This patch adds seven more test cases,
focusing on various types of errors conditions (e.g., trying to project
the same attribute twice), esoteric corner cases (it's fine to list a key in
NonKeyAttributes!), and corner cases that I expect we will have in our
implementation (e.g., a projected attribute may either be a real Scylla
column or just an element in a map column).
All new tests pass on DynamoDB and fail on Alternator (due to #5036), so
marked with "xfail".
Refs #5036.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211228193748.688060-1-nyh@scylladb.com>
"
Like flat_mutation_reader_from_fragments, this reader is also heavily
used by tests to compose a specific workload for readers above it. So
instead of converting it, we add a v2 variant and leave the v1 variant
in place.
The v2 variant was written from scratch to have built-in support for
reading in reverse. It is built-on `mutation::consume()` to avoid
duplicating the logic of consuming the contents of the mutation. To
avoid stalls, `mutation::consume()` gets support for pausing and
resuming consuming a mutation.
Tests: unit(dev)
"
* 'flat_mutation_reader_from_mutations_v2/v2' of https://github.com/denesb/scylla:
flat_mutation_reader: convert make_flat_mutation_reader_from_mutation() v2
flat_mutation_reader: extract mutation slicing into a function
mutation: consume(): make it pausable/resumable
mutation: consume(): restructure clustering iterator initialization
test/boost/mutation_test: add rebuild test for mutation::consume()
This patch adds a new cql-pytest test file - test_ttl.py - with
currently just a couple of tests for the "with default_time_to_live"
feature. One is a basic test, and second reproduces issue #9842 -
that "using ttl 0" should override the default time to live, but
doesn't.
The test for #9842, test_default_ttl_0_override, fails on Scylla and
passes on Cassandra, and is marked "xfail".
Refs #9842.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211227091502.553577-1-nyh@scylladb.com>
Since this reader is also heavily used by tests to compose a specific
workload for readers above it, we just add a v2 variant, instead of
changing the existing v1 one.
The v2 variant was written from scratch to have built-in support for
reading in reverse. It is built-on `mutation::consume()` to avoid
duplicating the logic of consuming the contents of the mutation.
A v2 native unit test is also added.
The gc_grace_seconds is a very fragile and broken design inherited from
Cassandra. Deleted data can be resurrected if cluster wide repair is not
performed within gc_grace_seconds. This design pushes the job of making
the database consistency to the user. In practice, it is very hard to
guarantee repair is performed within gc_grace_seconds all the time. For
example, repair workload has the lowest priority in the system which can
be slowed down by the higher priority workload, so that there is no
guarantee when a repair can finish. A gc_grace_seconds value that is
used to work might not work after data volume grows in a cluster. Users
might want to avoid running repair during a specific period where
latency is the top priority for their business.
To solve this problem, an automatic mechanism to protect data
resurrection is proposed and implemented. The main idea is to remove the
tombstone only after the range that covers the tombstone is repaired.
In this patch, a new table option tombstone_gc is added. The option is
used to configure tombstone gc mode. For example:
1) GC a tombstone after gc_grace_seconds
cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'timeout'} ;
This is the default mode. If no tombstone_gc option is specified by the
user. The old gc_grace_seconds based gc will be used.
2) Never GC a tombstone
cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'disabled'};
3) GC a tombstone immediately
cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'immediate'};
4) GC a tombstone after repair
cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'repair'};
In addition to the 'mode' option, another option 'propagation_delay_in_seconds'
is added. It defines the max time a write could possibly delay before it
eventually arrives at a node.
A new gossip feature TOMBSTONE_GC_OPTIONS is added. The new tombstone_gc
option can only be used after the whole cluster supports the new
feature. A mixed cluster works with no problem.
Tests: compaction_test.py, ninja test
Fixes#3560
[avi: resolve conflicts vs data_dictionary]
In the next patches we will refactor mutation::consume(). Before doing
that add another test, which rebuilds the consumed mutation, comparing
it with the original.
test/rest_api has a "--ssl" option to use encrypted CQL. It's not clear
to me why this is useful (it doesn't actually test encryption of the
REST API!), but as long as we have such an option, it should work.
And it didn't work because of a typo - we set a "check_cql" variable to the
right function, but then forgot to use it and used run.check_cql instead
(which is just for unencrypted cql).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220102123202.1052930-1-nyh@scylladb.com>
This change makes row cache support reverse reads natively so that reversing wrappers are not needed when reading from cache and thus the read can be executed efficiently, with similar cost as the forward-order read.
The database is serving reverse reads from cache by default after this. Before, it was bypassing cache by default after 703aed3277.
Refs: #1413
Tests:
- unit [dev]
- manual query with build/dev/scylla and cache tracing on
Closes#9454
* github.com:scylladb/scylla:
tests: row_cache: Extend test_concurrent_reads_and_eviction to run reverse queries
row_cache: partition_snapshot_row_cursor: Print more details about the current version vector
row_cache: Improve trace-level logging
config: Use cache for reversed reads by default
config: Adjust reversed_reads_auto_bypass_cache description
row_cache: Support reverse reads natively
mvcc: partition_snapshot: Support slicing range tombstones in reverse
test: flat_mutation_reader_assertions: Consume expected range tombstones before end_of_partition
row_cache: Log produced range tombstones
test: Make produces_range_tombstone() report ck_ranges
tests: lib: random_mutation_generator: Extract make_random_range_tombstone()
partition_snapshot_row_cursor: Support reverse iteration
utils: immutable-collection: Make movable
intrusive_btree: Make default-initialized iterator cast to false
This completes the batch_ and modification_statement rework.
Also touch the private batch_statement::read_command while at it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Otherwise, it may trip an assertion when the nuderlying
file is closed, as seen in e.g.:
https://jenkins.scylladb.com/view/master/job/scylla-master/job/next/4318/artifact/testlog/x86_64_release/sstable_3_x_test.test_read_rows_only_index.4174.log
```
test/boost/sstable_3_x_test.cc(0): Entering test case "test_read_rows_only_index"
sstable_3_x_test: ./seastar/src/core/fstream.cc:205: virtual seastar::file_data_source_impl::~file_data_source_impl(): Assertion `_reads_in_progress == 0' failed.
Aborting on shard 0.
Backtrace:
0x22557e8
0x2286842
0x7f2799e99a1f
/lib64/libc.so.6+0x3d2a1
/lib64/libc.so.6+0x268a3
/lib64/libc.so.6+0x26788
/lib64/libc.so.6+0x35a15
0x222c53d
0x222c548
0xb929cc
0xc0b23b
0xa84bbf
0x24d0111
```
Decoded:
```
__GI___assert_fail at :?
~file_data_source_impl at ./build/release/seastar/./seastar/src/core/fstream.cc:205
~file_data_source_impl at ./build/release/seastar/./seastar/src/core/fstream.cc:202
std::default_delete<seastar::data_source_impl>::operator()(seastar::data_source_impl*) const at /usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/unique_ptr.h:85
(inlined by) ~unique_ptr at /usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/unique_ptr.h:361
(inlined by) ~data_source at ././seastar/include/seastar/core/iostream.hh:55
(inlined by) ~input_stream at ././seastar/include/seastar/core/iostream.hh:254
(inlined by) ~continuous_data_consumer at ././sstables/consumer.hh:484
(inlined by) ~index_consume_entry_context at ././sstables/index_reader.hh:116
(inlined by) std::default_delete<sstables::index_consume_entry_context<sstables::index_consumer> >::operator()(sstables::index_consume_entry_context<sstables::index_consumer>*) const at /usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/unique_ptr.h:85
(inlined by) ~unique_ptr at /usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/unique_ptr.h:361
(inlined by) ~index_bound at ././sstables/index_reader.hh:395
(inlined by) ~index_reader at ././sstables/index_reader.hh:435
std::default_delete<sstables::index_reader>::operator()(sstables::index_reader*) const at /usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/unique_ptr.h:85
(inlined by) ~unique_ptr at /usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/unique_ptr.h:361
(inlined by) ~index_reader_assertions at ././test/lib/index_reader_assertions.hh:31
(inlined by) operator() at ./test/boost/sstable_3_x_test.cc:4630
```
Test: unit(dev), sstable_3_x_test.test_read_rows_only_index(release X 10000)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20211222132858.2155227-1-bhalevy@scylladb.com>
"
The second patch in this series is a mechanical conversion of
reader_concurrency_semaphore to flat_mutation_reader_v2, and caller
updates.
The first patch is needed to pass the test suite, since without it a
real reader version conversion would happen on every entry to and exit
from reader_concurrency_semaphore, which is stressful (for example:
mutation_reader_test.test_multishard_streaming_reader reaches 8191
conversions for a couple of readers, which somehow causes it to catch
SIGSEGV in diverse and seemingly-random places).
Note that in a real workload it is unreasonable to expect readers being
parked in a reader_concurrency_semaphore to be pristine, so
short-circuiting their version conversions will be impossible and this
workaround will not really help.
"
* tag 'rcs-v2-v4' of https://github.com/cmm/scylla:
reader_concurrency_semaphore: convert to flat_mutation_reader_v2
short-circuit flat mutation reader upgrades and downgrades
Make sure that major will compact data in all sstables and memtable,
as tombstones sitting in memtable could shadow data in sstables.
For example, a tombstone in memtable deleting a large partition could
be missed in major, so space wouldn't be saved as expected.
Additionally, write amplification is reduced as data in memtable
won't have to travel through tiers once flushed.
Fixes#9514.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211217160055.96693-2-raphaelsc@scylladb.com>
"
TWCS perform STCS on a window as long as it's the most recent one.
From there on, TWCS will compact all files in the past window into
a single file. With some moderate write load, it could happen that
there's still some compaction activity in that past window, meaning
that per-window major may miss some files being currently compacted.
As a result, a past window may contain more than 1 file after all
compaction activity is done on its behalf, which may increase read
amplification. To avoid that, TWCS will now make sure that per-window
major is serialized, to make sure no files are missed.
Fixes#9553.
tests: unit(dev).
"
* 'fix_twcs_per_window_major_v3' of https://github.com/raphaelsc/scylla:
TWCS: Make sure major on past window is done on all its sstables
TWCS: remove needless param for STCS options
TWCS: kill unused param in newest_bucket()
compaction: Implement strategy control and wire it
compaction: Add interface to control strategy behavior.
"
Users are adjusted by sprinkling `upgrade_to_v2()` and
`downgrade_to_v1()` where necessary (or removing any of these where
possible). No attempt was made to optimize and reduce the amount of
v1<->v2 conversions. This is left for follow-up patches to keep this set
small.
The combined reader is composed of 3 layers:
1. fragment producer - pop fragments from readers, return them in batches
(each fragment in a batch having the same type and pos).
2. fragment merger - merge fragment batches into single fragments
3. reader implementation glue-code
Converting layers (1) and (3) was mostly mechanical. The logic of
merging range tombstone changes is implemented at layer (2), so the two
different producer (layer 1) implementations we have share this logic.
Tests: unit(dev)
"
* 'combined-reader-v2/v4' of https://github.com/denesb/scylla:
test/boost/mutation_reader_test: add test_combined_reader_range_tombstone_change_merging
mutation_reader: convert make_clustering_combined_reader() to v2
mutation_reader: convert position_reader_queue to v2
mutation_reader: convert make_combined_reader() overloads to v2
mutation_reader: combined_reader: convert reader_selector to v2
mutation_reader: convert combined reader to v2
mutation_reader: combined_reader: attach stream_id to mutation_fragments
flat_mutation_reader_v2: add v2 version of empty reader
test/boost/mutation_reader_test: clustering_combined_reader_mutation_source_test: fix end bound calculation
Allow stopping compaction by type on a given keyspace and list of tables.
Also add api unit test suite that tests the existing `stop_compaction` api
and the new `stop_keyspace_compaction` api.
Fixes#9700Closes#9746
* github.com:scylladb/scylla:
api: storage_service: validate_keyspace: improve exception error message
api: compaction_manager: add stop_keyspace_compaction
api: storage_service: expose validate_keyspace and parse_tables
api: compaction_manager: stop_compaction: fix type description
compaction_manager: stop_compaction: expose optional table*
test: api: add basic compaction_manager test
Some implementation notes below.
When iterating in reverse, _last_row is after the current entry
(_next_row) in table schema order, not before like in the forward
mode.
Since there is no dummy row before all entries, reverse iteration must
be now prepared for the fact that advancing _next_row may land not
pointing at any row. The partition_snapshot_row_cursor maintains
continuity() correctly in this case, and positions the cursor before
all rows, so most of the code works unchanged. The only excpetion is
in move_to_next_entry(), which now cannot assume that failure to
advance to an entry means it can end a read.
maybe_drop_last_entry() is not implemented in reverse mode, which may
expose reverse-only workload to the problem of accumulating dummy
entries.
ensure_population_lower_bound() was not updating _last_row after
inserting the entry in latets version. This was not a problem for
forward reads because they do not modify the row in the partition
snapshot represented by _last_row. They only need the row to be there
in the latest version after the call. It's different for reveresed
reads, which change the continuity of the entry represented by
_last_row, hence _last_row needs to have the iterator updated to point
to the entry from the latest version, otherwise we'd set the
continuity of the previous version entry which would corrupt the
continuity.
There may be unconsumed but expected fragments in the stream at the
time of the call to produces_partition_end().
Call check_rts() sooner to avoid failures.