Commit Graph

6 Commits

Author SHA1 Message Date
Nadav Har'El
e277f747bd Merge 'Make collection unfreezing more efficient' from Botond Dénes
Introduce `read_from_collection_cell_view()`, which reads a `collection_mutation` directly from the IDL representation of a collection (`ser::collection_cell_view`). This drastically reduces the number of allocations compared to the current method of:

    IDL -> collection_mutation_description -> collection_mutation

Reduces the number of allocations needed to unfreeze a collection from O(collection_cell_count) to O(1). Strictly speaking, because buffers are fragmented, it is O(collection_size).
The new method is used when unfreezing frozen mutations and frozen mutation fragments. This is on the hot path: all writes with collections benefit.
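
A minimal sketch of the general idea behind going from one allocation per cell to a single allocation: measure the total serialized size first, allocate one buffer, then write every cell directly into place. This is an illustration in Python, not the ScyllaDB implementation. The cell layout, the byte order, and the names `cell_serialized_size`, `write_cell`, and `serialize_collection` are all hypothetical.

```python
import struct

# Hypothetical cell layout, for illustration only (not ScyllaDB's real format):
# [int32 key_len][key bytes][int64 timestamp][int32 value_len][value bytes]

def cell_serialized_size(key: bytes, value: bytes) -> int:
    """Exact number of bytes write_cell() will produce for this cell."""
    return 4 + len(key) + 8 + 4 + len(value)

def write_cell(buf: bytearray, offset: int, key: bytes, timestamp: int, value: bytes) -> int:
    """Write one cell into buf at offset; return the offset just past it."""
    struct.pack_into(">i", buf, offset, len(key))
    offset += 4
    buf[offset:offset + len(key)] = key
    offset += len(key)
    struct.pack_into(">qi", buf, offset, timestamp, len(value))
    offset += 12
    buf[offset:offset + len(value)] = value
    return offset + len(value)

def serialize_collection(cells) -> bytearray:
    """cells: iterable of (key, timestamp, value). Returns one contiguous buffer."""
    cells = list(cells)
    # Pass 1: compute the total size so that a single buffer allocation suffices.
    total = 4 + sum(cell_serialized_size(key, value) for key, _, value in cells)
    buf = bytearray(total)
    # Pass 2: write the cell count, then each cell, directly into the buffer.
    struct.pack_into(">i", buf, 0, len(cells))
    offset = 4
    for key, timestamp, value in cells:
        offset = write_cell(buf, offset, key, timestamp, value)
    assert offset == total
    return buf
```

The size/write split loosely mirrors the `write*()` and `*serialized_size()` helpers named in the patch list below, but how those helpers are actually used is not described in this message.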

Add a `--collection` flag to `perf-simple-query` to allow measuring the performance improvement of this PR.
With `dbuild -it -- build/release/scylla perf-simple-query --collection=16 -c1 -m2G --default-log-level=error --write`, the number of allocations drops from ~123 to 102, a reduction of roughly 17%.

Refs: https://github.com/scylladb/scylladb/issues/3602 (solves one use-case out of the many listed therein)
Fixes: SCYLLADB-1046
Fixes: SCYLLADB-1077

Backport: this is an optimization so normally not a backport candidate, but we may have to backport to relieve certain customers

Closes scylladb/scylladb#29033

* github.com:scylladb/scylladb:
  test/perf/perf_simple_query: add --collection=N
  test/boost/frozen_mutation_test: add freeze/unfreeze test for large collections
  mutation/mutation_partition_view: use read_from_collection_cell_view() to read collections
  mutation/collection_mutation: introduce read_from_collection_cell_view()
  mutation/atomic_cell: atomic_cell_type: add write*() and *serialized_size()
  mutation/collection_mutation: generalize serialize_collection_mutation
  mutation/mutation_partition_view: avoid copying collection
  mutation/mutation_partition_view: accept collection_mutation in the consume API
  partition_builder: add move variant of accept_*_cell() collection overloads
2026-05-10 20:39:08 +03:00
Botond Dénes
1bb04824a8 mutation/collection_mutation: introduce read_from_collection_cell_view()
Reads a collection_mutation directly from the IDL representation of a
collection. This cuts down the number of allocations required
drastically compared to the current method of:

    IDL -> collection_mutation_description -> collection_mutation

Intended to be used in frozen_mutation::unfreeze() and similar use-cases.
2026-04-15 09:46:54 +03:00
Piotr Dulikowski
9fc2c65d18 Merge 'cql3: implement WRITETIME() and TTL() of individual elements of map, set, and UDT' from Nadav Har'El
In commit 727f68e0f5 we added the ability to SELECT:

* Individual elements of a map: `SELECT map_col[key]`.
* Individual elements of a set: `SELECT set_col[key]` returns the key if it exists in the set, or null if it doesn't, which makes it possible to check whether an element is in the set.
* Individual pieces of a UDT: `SELECT udt_col.field`.

But at the time, we didn't provide any way to retrieve the **meta-data** for this value, namely its timestamp and TTL. We did not support `SELECT WRITETIME(collection[key])` or `SELECT WRITETIME(udt.field)`, nor the `TTL()` equivalents.

Users have requested such SELECTs in the past (see issue #15427), and Cassandra 5.0 added support for this feature for maps, sets, and UDTs, so we also need it for compatibility. This feature was also requested recently by vector-search developers, who wanted to read Alternator columns, which are stored as map elements rather than individual columns, with their WRITETIME information.

The first four patches in this series add the feature (in four smaller patches instead of one big one), the fifth and sixth patches add tests (cqlpy and boost tests, respectively). The seventh patch adds documentation.

All the new tests pass on Cassandra 5, failed on Scylla before the present fix, and pass with it.

The fix was surprisingly difficult. Our existing implementation (from 727f68e0f5, building on earlier machinery) doesn't just "read" `map_col[key]` in a way that would let us return its timestamp. Rather, the implementation reads the entire map, serializes it in a temporary format that does **not** include the timestamps and TTLs, and only then applies the subscript key, at which point we no longer have the timestamp or TTL of the element. So the fix had to cross all these layers of the implementation.
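
The data flow described above can be modelled with a toy sketch. This is not Scylla code: the `Cell` type, the dictionaries, and the function names are hypothetical and exist only to show why metadata that is dropped during serialization cannot be recovered after subscripting, and how a per-element side table avoids that.

```python
from typing import NamedTuple, Optional

class Cell(NamedTuple):
    value: str
    timestamp: int       # write timestamp of this element
    ttl: Optional[int]   # TTL of this element, None if it has no TTL

# A toy "map column": each element is its own cell with its own metadata.
map_col = {"a": Cell("x", 1000, None), "b": Cell("y", 2000, 60)}

def serialize_values_only(col):
    """Models the old intermediate format: values survive, metadata does not."""
    return {key: cell.value for key, cell in col.items()}

def serialize_with_metadata(col):
    """Models an extended format that also carries per-element metadata."""
    values = {key: cell.value for key, cell in col.items()}
    metadata = {key: (cell.timestamp, cell.ttl) for key, cell in col.items()}
    return values, metadata

def subscript(values, key):
    """Models map_col[key]; None stands in for a CQL null on a missing key."""
    return values.get(key)

old = serialize_values_only(map_col)
value_b = subscript(old, "b")   # the value is still reachable here ...
# ... but `old` holds no timestamp or TTL for "b" any more.

values, metadata = serialize_with_metadata(map_col)
writetime_b, ttl_b = metadata["b"]   # metadata survives alongside the values
```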

While adding support for UDT fields in a pre-existing grammar nonterminal "subscriptExpr", we unintentionally added support for UDT fields also in LWT expressions (which used this nonterminal). LWT missing support for UDT fields was a long-time known compatibility issue (#13624) so we unintentionally fixed it :-) Actually, to completely fix it we needed another small change in the expression implementation, so the eighth patch in this series does this.

Fixes #15427
Fixes #13624

Closes scylladb/scylladb#29134

* github.com:scylladb/scylladb:
  cql3: support UDT fields in LWT expressions
  cql3: document WRITETIME() and TTL() for elements of map, set or UDT
  test/boost: test WRITETIME() and TTL() on map collection elements
  test/cqlpy: test WRITETIME() and TTL() on element of map, set or UDT
  cql3: prepare and evaluate WRITETIME/TTL on collection elements and UDT fields
  cql3: parse per-element timestamps/TTLs in the selection layer
  cql3: add extended wire format for per-element timestamps and TTLs
  cql3: extend WRITETIME/TTL grammar to accept collection and UDT elements
2026-04-14 12:35:46 +02:00
Avi Kivity
0ae22a09d4 LICENSE: Update to version 1.1
Updated terms of non-commercial use (must be a never-customer).
2026-04-12 19:46:33 +03:00
Nadav Har'El
bb63db34e5 cql3: add extended wire format for per-element timestamps and TTLs
Introduce the infrastructure needed to transport per-element timestamps
and TTL expiry times from replicas to coordinators, required for
WRITETIME(col[key]) / TTL(col[key]) and WRITETIME(col.field) /
TTL(col.field).

* Add a 'writetime_ttl_individual_element' cluster feature flag that
  guards usage of the new wire format during rolling upgrades: the
  extended format is only emitted and consumed when every node in the
  cluster supports it.

* Implement serialize_for_cql_with_timestamps() (types/types.cc), a
  variant of serialize_for_cql() that appends a per-element section to
  the regular CQL bytes, listing each live element's serialized key,
  timestamp, and expiry.  The format is:
    [uint32 cql_len][cql bytes]
    [int32  entry_count]
    [per entry: (int32 key_len)(key bytes)(int64 timestamp)(int64 expiry)]
  expiry is -1 when the element has no TTL.

* Add partition_slice::option::send_collection_timestamps and modify
  write_cell() (mutation_partition.cc) to use the new function
  serialize_for_cql_with_timestamps() when this option is available.
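
The layout listed above can be sketched as a small encoder and decoder. This is a hedged illustration in Python, not the code in types/types.cc. Big-endian byte order is an assumption (the message does not state it), and the names `encode_with_timestamps` and `decode_with_timestamps` are hypothetical.

```python
import struct

NO_TTL = -1  # per the message above: expiry is -1 when the element has no TTL

def encode_with_timestamps(cql_bytes: bytes, entries) -> bytes:
    """entries: iterable of (key_bytes, timestamp, expiry) for live elements."""
    entries = list(entries)
    out = bytearray()
    out += struct.pack(">I", len(cql_bytes))          # [uint32 cql_len]
    out += cql_bytes                                   # [cql bytes]
    out += struct.pack(">i", len(entries))             # [int32 entry_count]
    for key, timestamp, expiry in entries:
        out += struct.pack(">i", len(key))             # (int32 key_len)
        out += key                                     # (key bytes)
        out += struct.pack(">qq", timestamp, expiry)   # (int64 timestamp)(int64 expiry)
    return bytes(out)

def decode_with_timestamps(data: bytes):
    """Inverse of encode_with_timestamps(); returns (cql_bytes, entries)."""
    (cql_len,) = struct.unpack_from(">I", data, 0)
    offset = 4
    cql_bytes = data[offset:offset + cql_len]
    offset += cql_len
    (count,) = struct.unpack_from(">i", data, offset)
    offset += 4
    entries = []
    for _ in range(count):
        (key_len,) = struct.unpack_from(">i", data, offset)
        offset += 4
        key = data[offset:offset + key_len]
        offset += key_len
        timestamp, expiry = struct.unpack_from(">qq", data, offset)
        offset += 16
        entries.append((key, timestamp, expiry))
    return cql_bytes, entries
```

Because the regular CQL bytes come first behind a length prefix, a reader can recover them unchanged and treat the per-element section as a trailer.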

This commit stands alone with no user-visible effect: nothing yet sets
the new partition-slice option.  The next patch adds the selection-layer
code that sets the option and parses the extended response.
2026-04-12 11:49:06 +03:00
Ernest Zaslavsky
5ba5aec1f8 treewide: Move mutation related files to a mutation directory
As requested in #22104, moved the files and fixed the affected includes and the build system.

Moved files:
 - combine.hh
 - collection_mutation.hh
 - collection_mutation.cc
 - converting_mutation_partition_applier.hh
 - converting_mutation_partition_applier.cc
 - counters.hh
 - counters.cc
 - timestamp.hh

Fixes: #22104

This is a cleanup, no need to backport

Closes scylladb/scylladb#25085
2025-09-24 13:23:38 +03:00