When merging multiple query-results, we use the last-position of the
last result in the combined one as the combined result's last position.
This only works however if said last result was included fully.
Otherwise we have to discard the last-position included with the result
and the pager will use the position of the last row in the combined
result as the last position.
The commit introducing the above logic mistakenly discarded the last
position when the result is a short read or a page is not full. This is
not necessary and even harmful as it can result in an empty combined
result being delivered to the pager, without a last-position.
query_result was the wrong place to put last position into. It is only
included in data-responses, but not on digest-responses. If we want to
support empty pages from replicas, both data and digest responses have
to include the last position. So hoist up the last position to the
parent structure: query::result. This is a breaking change inter-node
ABI wise, but it is fine: the current code wasn't released yet.
Closes#11072
Enables parallelization of UDA and native aggregates. The way the
query is parallelized is the same as in #9209. Separate reduction
type for `COUNT(*)` is left for compatibility reason.
Use the recently introduced query-result facility to have the replica
set the position where the query should continue from. For now this is
the same as what the implicit position would have been previously (last
row in result), but it opens up the possibility to stop the query at a
dead row.
To be used to allow the replica to specify the last position in the
stream, where the query was left off. Currently this is always
the same as the implicit position -- the last row in the result-set --
but this requires only stopping the read on a live row, which is a
requirement we want to lift: we want to be able to stop on a tombstone.
As tombstones are not included in the query result, we have to allow the
replica to overwrite the last seen position explicitly.
This patch introduces the new field in the query-result IDL but it is
not written to yet, nor is it read, that is left for the next patches.
It was done to show more context in case of forward_result::merge
arguments size mismatch and also to prevent aborts caused by another
nodes sending malformed data.
Except for the verb addition, this commit also defines forward_request
and forward_result structures, used as an argument and result of the new
rpc. forward_request is used to forward information about select
statement that does count(*) (or other aggregating functions such as
max, min, avg in the future). Due to the inability to serialize
cql3::statements::select_statement, I chose to include
query::read_command, dht::partition_range_vector and some configuration
options in forward_request. They can be serialized and are sufficient
enough to allow creation of service::pager::query_pagers::pager.
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes we applied mechanically with a script, except to
licenses/README.md.
Closes#9937
Most of the machinery was already implemented since it was used when
jumping between clustering ranges of a query slice. We need only perform
one additional thing when performing an index skip during
fast-forwarding: reset the stored range tombstone in the consumer (which
may only be stored in fast-forwarding mode, so it didn't matter that it
wasn't reset earlier). Comments were added to explain the details.
As a preparation for the change, we extend the sstable reversing reader
random schema test with a fast-forwarding test and include some minor
fixes.
Fixes#9427.
Closes#9484
* github.com:scylladb/scylla:
query-request: add comment about clustering ranges with non-full prefix key bounds
sstables: mx: enable position fast-forwarding in reverse mode
test: sstable_conforms_to_mutation_source_test: extend `test_sstable_reversing_reader_random_schema` with fast-forwarding
test: sstable_conforms_to_mutation_source_test: fix `vector::erase` call
test: mutation_source_test: extract `forwardable_reader_to_mutation` function
test: random_schema: fix clustering column printing in `random_schema::cql`
The test would check whether the forward and reverse readers returned
consistent results when created in non-forwarding mode with slicing.
Do the same but using fast-forwarding instead of slicing.
To do this we require a vector of `position_range`s. We also need a
vector of `clustering_range`s for the existing test. We modify the
existing `random_ranges` function to return `position_range`s instead of
`clustering_range`s since `position_range`s are easier to reason about,
especially when we consider non-full clustering key prefixes. A function
is introduced to convert a `position_range` to a `clustering_range` for
the existing test.
Currently, we cannot select more than 2^32 rows from a table because we are limited by types of
variables containing the numbers of rows. This patch changes these types and sets new limits.
The new limits take effect while selecting all rows from a table - custom limits of rows in a result
stay the same (2^32-1).
In classes which are being serialized and used in messaging, in order to be able to process queries
originating from older nodes, the top 32 bits of new integers are optional and stay at the end
of the class - if they're absent we assume they equal 0.
The backward compatibility was tested by querying an older node for a paged selection, using the
received paging_state with the same select statement on an upgraded node, and comparing the returned
rows with the result generated for the same query by the older node, additionally checking if the
paging_state returned by the upgraded node contained new fields with correct values. Also verified
if the older node simply ignores the top 32 bits of the remaining rows number when handling a query
with a paging_state originating from an upgraded node by generating and sending such a query to
an older node and checking the paging_state in the reply(using python driver).
Fixes#5101.
Since it contains a precise set of columns, it's more
accurate to call it a set, not a mask. Besides, the name
column_mask is already used for column options on storage
level.
Non-full prefix keys are currently not handled correctly as all keys
are treated as if they were full prefixes, and therefore they represent
a point in the key space. Non-full prefixes however represent a
sub-range of the key space and therefore require null extending before
they can be treated as a point.
As a quick reminder, `key` is used to trim the clustering ranges such
that they only cover positions >= then key. Thus,
`trim_clustering_row_ranges_to()` does the equivalent of intersecting
each range with (key, inf). When `key` is a prefix, this would exclude
all positions that are prefixed by key as well, which is not desired.
Fixes: #4839
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190819134950.33406-1-bdenes@scylladb.com>
Allow expressing `pos` in term of a `position_in_partition_view`, which
allows finer control of the exact position, allowing specifying position
before, at or after a certain key.
The previous overload is kept for backward compatibility, invoking the
new overload behind the curtains.
This algorithm was already duplicated in two places
(service/pager/query_pagers.cc and mutation_reader.cc). Soon it will be
used in a third place. Instead of triplicating, move it into a function
that everybody can use.
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().
Mechanically converted with https://github.com/avikivity/unsprint.
query::full_slice doesn't select any regular or static columns, which
is at odds with the expectations of its users. This patch replaces it
with the schema::full_slice() version.
Refs #2885
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1507732800-9448-2-git-send-email-duarte@scylladb.com>
Now that range queries go through the normal digest path, we rely on
query::result::calculate_counts() to count the amount of partitions
and rows returned. This patch makes it a bit faster.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch adds the may_be_affected_by() function to the view class,
which is responsible to determine whether an update to a base class
affects one of its views.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Original IDL generated code was hardcoded to always use bytes_ostream.
This patch makes the output stream a template parameter so that any
valid output stream can be used.
Unfortunately, making IDL writers generic requires updates in the code
that uses them, this is fixed in C++17 which would be able to deduce the
parameter in most cases.
This patch makes the row limit enforced by the storage_proxy layer.
It adds a row limit to the query_result_merger, useful when merging
results for concurrent queries.
More importantly, it provides guarantees that upper layers may be
relying on implicitly (e.g., the paging code).
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch ensures we keep track of how many partitions we've queried
so we don't ask for more than the number we need.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch adds a partition count to query::result, filled by the
query::result::builder. The partition count is present whenever the
result carries data, being absent only for the case where the result
contains only a digest.
We also ensure that counts are present for an empty query::result.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch adds a partition limit to the query_result_merger, useful
when merging results for concurrent queries. This change also makes
the partition limit enforced by the storage_proxy layer, no changes
being needed by the upper layers, namely the Thrift interface.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch introduces an infrastrucutre for limiting result size.
There is a shard-local limit which makes sure that all results combined
do not use more than 10% of the shard memory.
There is also an invidual limit which restricts a result to 4 MB.
In order
In order to avoid sending tiny results there is minimum guaranteed size
(4 kB), which the query needs to reserve before it starts producing the
result.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
When paging is used the cluster is allowed to return less rows than the
client asked for. However, if such possibility is used we need a way of
telling that to the coordinator and the paging implementation so that
they can differentiate between short reads caused by the replica running
out of data to sent and short reads caused by any other means.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
That will be important for sstable code that will rule out a sstable
if it doesn't cover a given clustering key range.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
query::full_slice is a partiton slice which has full clustering row
ranges for all partition keys and no per-partition row limit.
Options and columns are not set.
It is used as a helper object in cases when a reference to
partition_slice is needed but the user code needs just all data there is
(an example of such case would be sstable compaction).
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
This patch as a per-partition row limit. It ensures both local
queries and the reconciliation logic abide by this limit.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
The patch calculates row count during result building and while merging.
If one of results that are being merged does not have row count the
merged result will not have one either.