Introduce `read_from_collection_cell_view()` which reads a `collection_mutation` directly from the IDL representation of a collection (`ser::collection_cell_view`). This cuts down the number of allocations required drastically compared to the current method of:
IDL -> collection_mutation_description -> collection_mutation
This reduces the number of allocations needed to unfreeze a collection from O(collection_cell_count) to O(1) (in practice, due to buffer fragmentation, it is O(collection_size)).
The new method is used when unfreezing frozen mutations and frozen mutation fragments. This is on the hot path: all writes with collections benefit.
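To illustrate why skipping the intermediate representation saves allocations, here is a minimal standalone sketch of a two-pass "compute size, then write" serializer. It is not the code from this series: `cell_view`, `serialized_size()` and `serialize_once()` are hypothetical names, and the one-byte length prefixes are a toy encoding chosen only to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for a cell inside a serialized collection.
// It only points into existing bytes, so it owns no memory itself.
struct cell_view {
    std::string_view key;
    std::string_view value;
};

// Pass 1: compute the exact output size without allocating anything.
// Toy encoding (an assumption): each field has a one-byte length prefix.
size_t serialized_size(const std::vector<cell_view>& cells) {
    size_t n = 0;
    for (const auto& c : cells) {
        n += 2 + c.key.size() + c.value.size();
    }
    return n;
}

// Pass 2: write every cell into one pre-sized buffer, so the output
// needs a single allocation regardless of the number of cells.
std::string serialize_once(const std::vector<cell_view>& cells) {
    std::string out;
    out.reserve(serialized_size(cells));
    for (const auto& c : cells) {
        assert(c.key.size() < 256 && c.value.size() < 256);
        out.push_back(static_cast<char>(c.key.size()));
        out.append(c.key);
        out.push_back(static_cast<char>(c.value.size()));
        out.append(c.value);
    }
    return out;
}
```

Building one owning object per cell first, and only then copying into the final buffer, is what makes the allocation count grow with the number of cells.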
Add a `--collection` flag to `perf-simple-query` to allow measuring the performance improvement of this PR.
With `dbuild -it -- build/release/scylla perf-simple-query --collection=16 -c1 -m2G --default-log-level=error --write` the number of allocations drops from ~123 to 102, which is a significant reduction.
Refs: https://github.com/scylladb/scylladb/issues/3602 (solves one use-case out of the many listed therein)
Fixes: SCYLLADB-1046
Fixes: SCYLLADB-1077
Backport: this is an optimization so normally not a backport candidate, but we may have to backport to relieve certain customers
Closes scylladb/scylladb#29033
* github.com:scylladb/scylladb:
test/perf/perf_simple_query: add --collection=N
test/boost/frozen_mutation_test: add freeze/unfreeze test for large collections
mutation/mutation_partition_view: use read_from_collection_cell_view() to read collections
mutation/collection_mutation: introduce read_from_collection_cell_view()
mutation/atomic_cell: atomic_cell_type: add write*() and *serialized_size()
mutation/collection_mutation: generalize serialize_collection_mutation
mutation/mutation_partition_view: avoid copying collection
mutation/mutation_partition_view: accept collection_mutation in the consume API
partition_builder: add move variant of accept_*_cell() collection overloads
Reads a collection_mutation directly from the IDL representation of a
collection. This cuts down the number of allocations required
drastically compared to the current method of:
IDL -> collection_mutation_description -> collection_mutation
Intended to be used in frozen_mutation::unfreeze() and similar use-cases.
In commit 727f68e0f5 we added the ability to SELECT:
* Individual elements of a map: `SELECT map_col[key]`.
* Individual elements of a set: `SELECT set_col[key]` returns key if the key exists in the set, or null if it doesn't, which makes it possible to check whether the element exists in the set.
* Individual pieces of a UDT: `SELECT udt_col.field`.
But at the time, we didn't provide any way to retrieve the **meta-data** for this value, namely its timestamp and TTL. We did not support `SELECT TIMESTAMP(collection[key])`, or `SELECT TIMESTAMP(udt.field)`.
Users requested to support such SELECTs in the past (see issue #15427), and Cassandra 5.0 added support for this feature - for both maps and sets and udts - so we also need this feature for compatibility. This feature was also requested recently by vector-search developers, who wanted to read Alternator columns - stored as map elements, not individual columns - with their WRITETIME information.
The first four patches in this series add the feature (in four smaller patches instead of one big one), the fifth and sixth patches add tests (cqlpy and boost tests, respectively). The seventh patch adds documentation.
All the new tests pass on Cassandra 5, failed on Scylla before the present fix, and pass with it.
The fix was surprisingly difficult. Our existing implementation (from 727f68e0f5 building on earlier machinery) doesn't just "read" `map_col[key]` and allow us to return just its timestamp. Rather, the implementation reads the entire map, serializes it in some temporary format that does **not** include the timestamps and ttls, and then takes the subscript key, at which point we no longer have the timestamp or ttl of the element. So the fix had to cross all these layers of the implementation.
While adding support for UDT fields in a pre-existing grammar nonterminal "subscriptExpr", we unintentionally added support for UDT fields also in LWT expressions (which used this nonterminal). LWT missing support for UDT fields was a long-time known compatibility issue (#13624) so we unintentionally fixed it :-) Actually, to completely fix it we needed another small change in the expression implementation, so the eighth patch in this series does this.
Fixes #15427
Fixes #13624
Closes scylladb/scylladb#29134
* github.com:scylladb/scylladb:
cql3: support UDT fields in LWT expressions
cql3: document WRITETIME() and TTL() for elements of map, set or UDT
test/boost: test WRITETIME() and TTL() on map collection elements
test/cqlpy: test WRITETIME() and TTL() on element of map, set or UDT
cql3: prepare and evaluate WRITETIME/TTL on collection elements and UDT fields
cql3: parse per-element timestamps/TTLs in the selection layer
cql3: add extended wire format for per-element timestamps and TTLs
cql3: extend WRITETIME/TTL grammar to accept collection and UDT elements
Introduce the infrastructure needed to transport per-element timestamps
and TTL expiry times from replicas to coordinators, required for
WRITETIME(col[key]) / TTL(col[key]) and WRITETIME(col.field) /
TTL(col.field).
* Add a 'writetime_ttl_individual_element' cluster feature flag that
guards usage of the new wire format during rolling upgrades: the
extended format is only emitted and consumed when every node in the
cluster supports it.
* Implement serialize_for_cql_with_timestamps() (types/types.cc), a
variant of serialize_for_cql() that appends a per-element section to
the regular CQL bytes, listing each live element's serialized key,
timestamp, and expiry. The format is:
[uint32 cql_len][cql bytes]
[int32 entry_count]
[per entry: (int32 key_len)(key bytes)(int64 timestamp)(int64 expiry)]
expiry is -1 when the element has no TTL.
* Add partition_slice::option::send_collection_timestamps and modify
write_cell() (mutation_partition.cc) to use the new function
serialize_for_cql_with_timestamps() when this option is available.
This commit stands alone with no user-visible effect: nothing yet sets
the new partition-slice option. The next patch adds the selection-layer
code that sets the option and parses the extended response.
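As an illustration of the layout described above, here is a minimal standalone sketch of an encoder and decoder for it. This is not the code from the patch: `element_meta`, `extended_value`, `encode()` and `decode()` are hypothetical names, `std::string` stands in for a byte buffer, and big-endian integer encoding is an assumption (the commit message does not state the byte order).

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical types for illustration; not ScyllaDB's actual classes.
struct element_meta {
    std::string key;    // serialized element key
    int64_t timestamp;  // write timestamp of the element
    int64_t expiry;     // -1 when the element has no TTL
};
struct extended_value {
    std::string cql_bytes;               // regular CQL serialization
    std::vector<element_meta> elements;  // per-element section
};

// Assumption: integers are written big-endian.
template <typename T>
void put_be(std::string& out, T v) {
    auto u = static_cast<std::make_unsigned_t<T>>(v);
    for (int shift = static_cast<int>(sizeof(T) - 1) * 8; shift >= 0; shift -= 8) {
        out.push_back(static_cast<char>((u >> shift) & 0xff));
    }
}

template <typename T>
T get_be(const std::string& in, size_t& pos) {
    if (pos > in.size() || in.size() - pos < sizeof(T)) {
        throw std::out_of_range("truncated integer");
    }
    using U = std::make_unsigned_t<T>;
    U u = 0;
    for (size_t i = 0; i < sizeof(T); ++i) {
        u = static_cast<U>((u << 8) | static_cast<unsigned char>(in[pos++]));
    }
    return static_cast<T>(u);
}

std::string get_bytes(const std::string& in, size_t& pos, size_t len) {
    if (pos > in.size() || in.size() - pos < len) {
        throw std::out_of_range("truncated bytes");
    }
    std::string s = in.substr(pos, len);
    pos += len;
    return s;
}

// [uint32 cql_len][cql bytes][int32 entry_count][entries...]
std::string encode(const extended_value& v) {
    std::string out;
    put_be<uint32_t>(out, static_cast<uint32_t>(v.cql_bytes.size()));
    out += v.cql_bytes;
    put_be<int32_t>(out, static_cast<int32_t>(v.elements.size()));
    for (const auto& e : v.elements) {
        put_be<int32_t>(out, static_cast<int32_t>(e.key.size()));
        out += e.key;
        put_be<int64_t>(out, e.timestamp);
        put_be<int64_t>(out, e.expiry);
    }
    return out;
}

extended_value decode(const std::string& in) {
    size_t pos = 0;
    extended_value v;
    auto cql_len = get_be<uint32_t>(in, pos);
    v.cql_bytes = get_bytes(in, pos, cql_len);
    auto count = get_be<int32_t>(in, pos);
    if (count < 0) {
        throw std::invalid_argument("negative entry count");
    }
    for (int32_t i = 0; i < count; ++i) {
        auto key_len = get_be<int32_t>(in, pos);
        if (key_len < 0) {
            throw std::invalid_argument("negative key length");
        }
        element_meta e;
        e.key = get_bytes(in, pos, static_cast<size_t>(key_len));
        e.timestamp = get_be<int64_t>(in, pos);
        e.expiry = get_be<int64_t>(in, pos);
        v.elements.push_back(std::move(e));
    }
    return v;
}
```

Because the regular CQL bytes come first and are length-prefixed, a reader that only needs the value can take the first section and skip the per-element entries.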
As requested in #22104, moved the files and fixed the affected includes and the build system.
Moved files:
- combine.hh
- collection_mutation.hh
- collection_mutation.cc
- converting_mutation_partition_applier.hh
- converting_mutation_partition_applier.cc
- counters.hh
- counters.cc
- timestamp.hh
Fixes: #22104
This is a cleanup, no need to backport
Closes scylladb/scylladb#25085