
Michał Radwański 1fbf433966 mutation{,_consumer,_partition}: remove consume_in_reverse::legacy_half_reverse

This commit removes consume_in_reverse::legacy_half_reverse, an option
once used to indicate that the given key ranges are sorted in descending
order by the clustering key of the start of each range, and that the
range tombstones inside a partition are sorted (descending, like all
the other mutation fragments) according to their end bound (while each
range tombstone is still stored according to its start bound).

As it turns out, mutation::consume, when called with the
legacy_half_reverse option, produces an invalid fragment stream, one
where all the row tombstone changes come after all the clustering rows.
This was not an issue, because when constructing results for a query,
Scylla does not pass the tombstones to the client, but compacts the data
beforehand.

In this commit, the consume_in_reverse::legacy_half_reverse is removed,
along with all the uses.

As for the swap out in mutation_partition.cc in query_mutation and
to_data_query_result:

The downstream was not prepared to deal with legacy_half_reverse.
mutation::consume contains

```
    if (reverse == consume_in_reverse::yes) {
        while (!(stop_opt = consume_clustering_fragments<consume_in_reverse::yes>(_ptr->_schema, partition, consumer, cookie, is_preemptible::yes))) {
            co_await yield();
        }
    } else {
        while (!(stop_opt = consume_clustering_fragments<consume_in_reverse::no>(_ptr->_schema, partition, consumer, cookie, is_preemptible::yes))) {
            co_await yield();
        }
    }
```

So why did it work at all? to_data_query_result deals with a single slice.
The consumer used (compact_for_query_v2) compacts away the range tombstone
changes, and thus the only difference between consume_in_reverse::no
and consume_in_reverse::yes was that the first was ordered by increasing
clustering key and the second by decreasing clustering key. This property
is maintained if we switch to the consume_in_reverse::yes format.

2023-01-05 18:48:55 +01:00


Reverse reads

A read is called reverse when it reads with reverse clustering order (compared to that of the schema). Example:

CREATE TABLE mytable (
    pk int,
    ck int,
    s int STATIC,
    v int,
    PRIMARY KEY (pk, ck)
) WITH
    CLUSTERING ORDER BY (ck ASC);

-- Forward read (using table's native order)
SELECT * FROM mytable WHERE pk = 1;
-- Explicit forward order
SELECT * FROM mytable WHERE pk = 1 ORDER BY ck ASC;

-- Reverse read
SELECT * FROM mytable WHERE pk = 1 ORDER BY ck DESC;

If the table's native clustering order is DESC, then a read with ASC order is considered reverse.

Legacy format

The legacy format is how Scylla handled reverse queries internally. We are in the process of migrating to the native reverse format.

Request

The query::partition_slice::options::reversed flag is set. Clustering ranges in both query::partition_slice::_row_ranges and query::specific_ranges::_ranges (query::partition_slice::_specific_ranges) are half-reversed: they are ordered in reverse, but when they are compared to other mutation-fragments, their end bound is used as position, instead of the start bound as usual. When compared to other clustering ranges the end bound is used as the start bound and vice-versa. Example:

For the clustering keys (ASC order): ck1, ck2, ck3, ck4, ck5, ck6. A _row_ranges field of a slice might contain this:

[ck1, ck2], [ck4, ck5]

The legacy reversed version would look like this:

[ck4, ck5], [ck1, ck2]

Note how the ranges themselves are the same (bounds not reversed), it is just the range vector itself that is reversed.
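The half-reversal of the range vector can be sketched as follows. This is only an illustration, not Scylla's implementation: ck_range and legacy_half_reverse are hypothetical names, and clustering keys are modeled as plain integers (ck1 as 1, ck2 as 2, and so on).

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a clustering range with inclusive bounds.
// Keys are plain ints here (ck1 -> 1, ck2 -> 2, ...).
struct ck_range {
    int start;
    int end;
};

// Legacy half-reverse: only the order of the ranges in the vector is
// reversed; the bounds of each individual range are left untouched.
std::vector<ck_range> legacy_half_reverse(std::vector<ck_range> ranges) {
    for (const auto& r : ranges) {
        assert(r.start <= r.end); // input is expected in forward form
    }
    std::reverse(ranges.begin(), ranges.end());
    return ranges;
}
```

Applied to [ck1, ck2], [ck4, ck5], this sketch yields [ck4, ck5], [ck1, ck2], matching the example above.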

Result

Results are ordered with the reversed clustering order with the caveat that range-tombstones are ordered by their end bound, using the native schema's comparators. For example given the following partition:

ps{pk1}, sr{}, cr{ck1}, rt{[ck2, ck4)}, cr{ck2}, cr{ck3}, cr{ck4}, cr{ck5}, pe{}

The legacy reverse format equivalent of this looks like the following:

ps{pk1}, sr{}, cr{ck5}, rt{[ck2, ck4)}, cr{ck4}, cr{ck3}, cr{ck2}, cr{ck1}, pe{}

Note:

Only clustering elements change;
Range tombstone's bounds are not reversed;
Range tombstones can be ordered off-by-one because the native schema's comparators are used: in the example, rt{[ck2, ck4)} is emitted before cr{ck4}, although it should be ordered after it.

Legend:

ps = partition-start
sr = static-row
cr = clustering-row
rt = range-tombstone
pe = partition-end

Native format

The native format uses an ordering equivalent to that of a table with reversed clustering order. Using mytable as an example, the native reverse format corresponds to an otherwise identical table, my_reverse_table, which uses CLUSTERING ORDER BY (ck DESC). This allows middle layers in a read pipeline to just use a schema with reversed clustering order and process the reverse stream as normal.

Request

The query::partition_slice::options::reversed flag is set as in the legacy format. Clustering ranges in both query::partition_slice::_row_ranges and query::specific_ranges::_ranges (query::partition_slice::_specific_ranges) are fully-reversed: they are ordered in reverse, with their bounds swapped as well. Example:

For the clustering keys (ASC order): ck1, ck2, ck3, ck4, ck5, ck6. A _row_ranges field of a slice might contain this:

[ck1, ck2], [ck4, ck5]

The native reversed version would look like this:

[ck5, ck4], [ck2, ck1]

In addition to this, the schema is reversed on the replica, at the start of the read, so all the reverse-capable and intermediate readers in the stack get a reversed schema to work with.
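The full reversal of the range vector can be sketched as follows. This is an illustration only, not Scylla's implementation: ck_range and native_reverse are hypothetical names, and clustering keys are modeled as plain integers (ck1 as 1, ck2 as 2, and so on).

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for a clustering range with inclusive bounds.
// Keys are plain ints here (ck1 -> 1, ck2 -> 2, ...).
struct ck_range {
    int start;
    int end;
};

// Native (full) reverse: the order of the ranges is reversed AND the
// bounds of every range are swapped, so each range reads "backwards".
std::vector<ck_range> native_reverse(std::vector<ck_range> ranges) {
    for (const auto& r : ranges) {
        assert(r.start <= r.end); // input is expected in forward form
    }
    std::reverse(ranges.begin(), ranges.end());
    for (auto& r : ranges) {
        std::swap(r.start, r.end);
    }
    return ranges;
}
```

Applied to [ck1, ck2], [ck4, ck5], this sketch yields [ck5, ck4], [ck2, ck1]. The result is exactly what a forward read would use on a table declared with the opposite clustering order.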

Result

Results are ordered with the reversed clustering order with the bounds of range-tombstones swapped. For example, given the same partition that was used in the legacy format example, the native reverse version would look like this:

ps{pk1}, sr{}, cr{ck5}, cr{ck4}, rt{(ck4, ck2]}, cr{ck3}, cr{ck2}, cr{ck1}, pe{}
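
One way to see why the range tombstone lands after cr{ck4} is to model the reversal on a toy stream. The sketch below is not Scylla code: bound, element, row, tombstone, reversed_position, native_reverse and render are all hypothetical names, clustering keys are plain integers, and only the clustering elements are modeled (ps, sr and pe do not move). Each tombstone has its bounds swapped, and the elements are then sorted by descending position, where a tombstone sits at its new start bound.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified model; keys are ints (ck1 -> 1, ck2 -> 2, ...).
struct bound {
    int ck;
    bool inclusive;
};

struct element {
    bool is_tombstone;
    int ck;       // row key; unused for tombstones
    bound start;  // tombstone bounds; unused for rows
    bound end;
};

element row(int ck) {
    return element{false, ck, bound{0, true}, bound{0, true}};
}

element tombstone(bound start, bound end) {
    return element{true, 0, start, end};
}

// Position in the reversed stream as (key, weight); larger sorts first.
// A tombstone is positioned at its (already swapped) start bound: an
// inclusive bound covers the row with the same key, so it sorts just
// ahead of that row; an exclusive bound sorts just behind it.
std::pair<int, int> reversed_position(const element& e) {
    if (!e.is_tombstone) {
        return {e.ck, 0};
    }
    return {e.start.ck, e.start.inclusive ? 1 : -1};
}

std::vector<element> native_reverse(std::vector<element> elements) {
    for (auto& e : elements) {
        if (e.is_tombstone) {
            assert(e.start.ck <= e.end.ck); // forward form expected
            std::swap(e.start, e.end);      // bounds travel with their inclusiveness
        }
    }
    std::stable_sort(elements.begin(), elements.end(),
        [](const element& a, const element& b) {
            return reversed_position(a) > reversed_position(b);
        });
    return elements;
}

// Renders an element in the notation used in this document.
std::string render(const element& e) {
    if (!e.is_tombstone) {
        return "cr{ck" + std::to_string(e.ck) + "}";
    }
    return std::string("rt{") + (e.start.inclusive ? "[" : "(")
        + "ck" + std::to_string(e.start.ck) + ", ck" + std::to_string(e.end.ck)
        + (e.end.inclusive ? "]" : ")") + "}";
}
```

Feeding it cr{ck1}, rt{[ck2, ck4)}, cr{ck2}, cr{ck3}, cr{ck4}, cr{ck5} produces cr{ck5}, cr{ck4}, rt{(ck4, ck2]}, cr{ck3}, cr{ck2}, cr{ck1}: the tombstone does not cover ck4, so it starts only after that row has been emitted.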
