Add a consume() overload which takes a range tombstone change and drops
it just like the existing range tombstone overload does: query results
don't care about range tombstones.
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes we applied mechanically with a script, except to
licenses/README.md.
Closes#9937
Following ad48d8b43c, fix a similar problem which popped up
with higher inlining thresholds in query-result-writer.hh. Since
idl/query depends on idl/keys, it must follow in definition order.
Closes#7384
Currently, we cannot select more than 2^32 rows from a table because we are limited by types of
variables containing the numbers of rows. This patch changes these types and sets new limits.
The new limits take effect while selecting all rows from a table - custom limits of rows in a result
stay the same (2^32-1).
In classes which are being serialized and used in messaging, in order to be able to process queries
originating from older nodes, the top 32 bits of new integers are optional and stay at the end
of the class - if they're absent we assume they equal 0.
The backward compatibility was tested by querying an older node for a paged selection, using the
received paging_state with the same select statement on an upgraded node, and comparing the returned
rows with the result generated for the same query by the older node, additionally checking if the
paging_state returned by the upgraded node contained new fields with correct values. Also verified
if the older node simply ignores the top 32 bits of the remaining rows number when handling a query
with a paging_state originating from an upgraded node by generating and sending such a query to
an older node and checking the paging_state in the reply(using python driver).
Fixes#5101.
Use the digester class instead of md5_hasher to encapsulate the
decision of which hash algorithm to use.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Introduce class result_options to carry result options through the
request pipeline, which at this point mean the result type and the
digest algorithm. This class allows us to encapsulate the concrete
digest algorithm to use.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
"Now that range queries go through the normal digest path, we rely on
query::result::calculate_counts() to count the amount of partitions
and rows returned.
This series optimizes it, in case it is needed, and also changes the
result message to include the partition and row counts, avoiding the
calculation altogether."
* 'calculate-counts/v3' of github.com:duarten/scylla:
query-result: Send row and partition count over the wire
query::result: Optimize calculate_counts()
Original IDL generated code was hardcoded to always use bytes_ostream.
This patch makes the output stream a template parameter so that any
valid output stream can be used.
Unfortunately, making IDL writers generic requires updates in the code
that uses them, this is fixed in C++17 which would be able to deduce the
parameter in most cases.
This patch adds a partition count to query::result, filled by the
query::result::builder. The partition count is present whenever the
result carries data, being absent only for the case where the result
contains only a digest.
We also ensure that counts are present for an empty query::result.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch adds a partition count to the query::result::builder. It is
intended to be incremented by users, and later used to build a
query::result.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
When paging is used the cluster is allowed to return less rows than the
client asked for. However, if such possibility is used we need a way of
telling that to the coordinator and the paging implementation so that
they can differentiate between short reads caused by the replica running
out of data to sent and short reads caused by any other means.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
If we're talking to just one replica, the digest is not going to be used,
so better not to calculate it at all. The optimization helps with
LOCAL_ONE queries where the result is large, but does not contain large
blobs (many small rows).
This patch adds a digest_algorithm parameter to the READ_DATA verb that
can take on two values: none and MD5 (default), and sets it to none when
we're reading from one replica.
In the future we may add other values for more hardware-friendly digest
algorithms.
Message-Id: <1479380600-19206-1-git-send-email-avi@scylladb.com>
The patch calculates row count during result building and while merging.
If one of results that are being merged does not have row count the
merged result will not have one either.
Query result digest is used to verify that all replicas have the same
data. Therefore, it needs to contain more information than the query
result itself in order to ensure proper detection of disagreements.
Generally, adding clustering keys to the digest regardless of whether
the client asked for them will guarantee correctness. However, adding
tombstones as well improves the chances of early detection of nodes
containing stale data.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
The query result footprint for cassandra-stress mutation as reported
by tests/memory-footprint increased by 18% from 285 B to 337 B.
perf_simple_query shows slight regression in throughput (-8%):
build/release/tests/perf/perf_simple_query -c4 -m1G --partitions 100000
Before: ~433k tps
After: ~400k tps
We want the format of query results to be eventually defined in the
IDL and be independent of the format we use in memory to represent
collections. This change is a step in this direction.
The change decouples format of collection cells in query results from
our in-memory representation. We currently use collection_mutation_view,
after the change we will use CQL binary protocol format. We use that because
it requires less transformations on the coordinator side.
One complication is that some list operations need to retrieve keys
used in list cells, not only values. To satisfy this need, new query
option was added called "collections_as_maps" which will cause lists
and sets to be reinterpreted as maps matching their underlying
representation. This allows the coordinator to generate mutations
referencing existing items in lists.
Allows for having more than one clustering row range set, depending on
PK queried (although right now limited to one - which happens to be exactly
the number of mutiplexing paging needs... What a coincidence...)
Encapsulates the row_ranges member in a query function, and if needed holds
ranges outside the default one in an extra object.
Query result::builder::add_partition now fetches the correct row range for
the partition, and this is the range used in subsequent iteration.
When partition has no live regular rows, but has some data live in the
static row, then it should appear in the results, even though we
didn't select any static column.
To reproduce:
create table cf (k blob, c blob, v blob, s1 blob static, primary key (k, c));
update cf set s1 = 0x01 where k = 0x01;
update cf set s1 = 0x02 where k = 0x02;
select k from cf;
The "select" statement should return 2 rows, but was returning 0.
The following query worked fine, because static columns were included:
select * from cf;
The data query should contain only live data, so we shouldn't write a
partition entry if it's supposed to be absent from the results. We
can'r tell that though until we've processed all the data. To solve
this problem, query result writer is using an optimistic approach,
where the partition header will be retracted from the buffer
(cheaply), if it turns out there's no live data in it.
Deleted cells store deletion time not expiry time. This change makes
expiry() valid only for live cells with TTL and adds deletion_time(),
which is inteded to be used with deleted cells.