scylladb/mutation/collection_mutation.hh
Piotr Dulikowski 9fc2c65d18 Merge 'cql3: implement WRITETIME() and TTL() of individual elements of map, set, and UDT' from Nadav Har'El
In commit 727f68e0f5 we added the ability to SELECT:

* Individual elements of a map: `SELECT map_col[key]`.
* Individual elements of a set: `SELECT set_col[key]` returns the key if it exists in the set, or null if it doesn't, which makes it possible to check whether an element is in the set.
* Individual pieces of a UDT: `SELECT udt_col.field`.

But at the time, we didn't provide any way to retrieve the **metadata** for such a value, namely its timestamp and TTL. We did not support `SELECT WRITETIME(collection[key])` or `SELECT WRITETIME(udt.field)`, nor the corresponding `TTL()` forms.

Users have requested support for such SELECTs in the past (see issue #15427), and Cassandra 5.0 added this feature for maps, sets, and UDTs, so we also need it for compatibility. The feature was also requested recently by vector-search developers, who wanted to read Alternator columns (which are stored as map elements, not individual columns) together with their WRITETIME information.

The first four patches in this series add the feature (split into four smaller patches instead of one big one), the fifth and sixth patches add tests (cqlpy and boost tests, respectively), and the seventh patch adds documentation.

All the new tests pass on Cassandra 5, failed on Scylla before the present fix, and pass with it.

The fix was surprisingly difficult. Our existing implementation (from 727f68e0f5, building on earlier machinery) doesn't simply "read" `map_col[key]` in a way that would let us return its timestamp instead. Rather, the implementation reads the entire map, serializes it in a temporary format that does **not** include the timestamps and TTLs, and only then applies the subscript key, at which point the timestamp and TTL of the element are no longer available. So the fix had to cross all these layers of the implementation.
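
The layering problem described above can be illustrated with a minimal C++ sketch. It is not the ScyllaDB code: the types and functions (`element`, `stored_map`, `flatten_values`, `subscript_value`, `subscript_writetime`) are hypothetical stand-ins chosen for illustration only. It shows that once the map has been flattened to values only, a later subscript step has nothing to return a timestamp from, whereas a subscript over a form that still carries metadata can.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical, simplified stand-in for one live map element as stored:
// the value plus its write timestamp and an optional TTL.
struct element {
    std::string value;
    std::int64_t timestamp;
    std::optional<std::int32_t> ttl;
};

using stored_map = std::map<std::string, element>;
using value_only_map = std::map<std::string, std::string>;

// Models the temporary format: values only. The metadata is dropped
// here, so no later step can recover it.
value_only_map flatten_values(const stored_map& m) {
    value_only_map out;
    for (const auto& [key, e] : m) {
        out.emplace(key, e.value);
    }
    return out;
}

// Subscript applied after flattening: it can yield the value (or null),
// but there is no timestamp left to return.
std::optional<std::string> subscript_value(const value_only_map& flat,
                                           const std::string& key) {
    auto it = flat.find(key);
    if (it == flat.end()) {
        return std::nullopt;
    }
    return it->second;
}

// Subscript applied to a form that still carries per-element metadata:
// returns the element's timestamp, or null if the key is absent.
std::optional<std::int64_t> subscript_writetime(const stored_map& m,
                                                const std::string& key) {
    auto it = m.find(key);
    if (it == m.end()) {
        return std::nullopt;
    }
    return it->second.timestamp;
}
```

In this sketch, answering a WRITETIME query on an element requires the metadata to survive every step up to the subscript, which is why a change confined to a single layer would not have been enough.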

While adding support for UDT fields to the pre-existing grammar nonterminal "subscriptExpr", we unintentionally added support for UDT fields in LWT expressions as well, since they use this nonterminal. The lack of UDT field support in LWT was a long-known compatibility issue (#13624), so we unintentionally fixed it :-) Actually, fixing it completely required one more small change in the expression implementation, which the eighth patch in this series makes.

Fixes #15427
Fixes #13624

Closes scylladb/scylladb#29134

* github.com:scylladb/scylladb:
  cql3: support UDT fields in LWT expressions
  cql3: document WRITETIME() and TTL() for elements of map, set or UDT
  test/boost: test WRITETIME() and TTL() on map collection elements
  test/cqlpy: test WRITETIME() and TTL() on element of map, set or UDT
  cql3: prepare and evaluate WRITETIME/TTL on collection elements and UDT fields
  cql3: parse per-element timestamps/TTLs in the selection layer
  cql3: add extended wire format for per-element timestamps and TTLs
  cql3: extend WRITETIME/TTL grammar to accept collection and UDT elements
2026-04-14 12:35:46 +02:00


/*
 * Copyright (C) 2019-present ScyllaDB
 */
/*
 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
 */
#pragma once
#include "utils/chunked_vector.hh"
#include "schema/schema_fwd.hh"
#include "gc_clock.hh"
#include "mutation/atomic_cell.hh"
#include "mutation/compact_and_expire_result.hh"
#include "compaction/compaction_garbage_collector.hh"
#include <iosfwd>
#include <forward_list>
class abstract_type;
class compaction_garbage_collector;
class row_tombstone;
class collection_mutation;
// An auxiliary struct used to (de)construct collection_mutations.
// Unlike collection_mutation which is a serialized blob, this struct allows to inspect logical units of information
// (tombstone and cells) inside the mutation easily.
struct collection_mutation_description {
    tombstone tomb;
    // FIXME: use iterators?
    // we never iterate over `cells` more than once, so there is no need to store them in memory.
    // In some cases instead of constructing the `cells` vector, it would be more efficient to provide
    // a one-time-use forward iterator which returns the cells.
    utils::chunked_vector<std::pair<bytes, atomic_cell>> cells;
    // Expires cells based on query_time. Expires tombstones based on max_purgeable and gc_before.
    // Removes cells covered by tomb or this->tomb.
    compact_and_expire_result compact_and_expire(column_id id, row_tombstone tomb, gc_clock::time_point query_time,
            can_gc_fn&, gc_clock::time_point gc_before, compaction_garbage_collector* collector = nullptr);
    // Packs the data to a serialized blob.
    collection_mutation serialize(const abstract_type&) const;
};
// Similar to collection_mutation_description, except that it doesn't store the cells' data, only observes it.
struct collection_mutation_view_description {
    tombstone tomb;
    // FIXME: use iterators? See the fixme in collection_mutation_description; the same considerations apply here.
    utils::chunked_vector<std::pair<bytes_view, atomic_cell_view>> cells;
    // Copies the observed data, storing it in a collection_mutation_description.
    collection_mutation_description materialize(const abstract_type&) const;
    // Packs the data to a serialized blob.
    collection_mutation serialize(const abstract_type&) const;
};
class collection_mutation_input_stream {
    std::forward_list<bytes> _linearized;
    managed_bytes_view _src;
public:
    collection_mutation_input_stream(const managed_bytes_view& src) : _src(src) {}
    template <Trivial T>
    T read_trivial() {
        return ::read_simple<T>(_src);
    }
    bytes_view read_linearized(size_t n);
    managed_bytes_view read_fragmented(size_t n);
    bool empty() const;
};
// Given a collection_mutation_view, returns an auxiliary struct allowing the inspection of each cell.
// The function needs to be given the type of stored data to reconstruct the structural information.
collection_mutation_view_description deserialize_collection_mutation(const abstract_type&, collection_mutation_input_stream&);
class collection_mutation_view {
public:
    managed_bytes_view data;
    // Is this a noop mutation?
    bool is_empty() const;
    // Is any of the stored cells live (not deleted nor expired) at the time point `tp`,
    // given the later of the tombstones `t` and the one stored in the mutation (if any)?
    // Requires a type to reconstruct the structural information.
    bool is_any_live(const abstract_type&, tombstone t = tombstone(), gc_clock::time_point tp = gc_clock::time_point::min()) const;
    // The maximum of timestamps of the mutation's cells and tombstone.
    api::timestamp_type last_update(const abstract_type&) const;
    // Given a function that operates on a collection_mutation_view_description,
    // calls it on the corresponding description of `this`.
    template <typename F>
    inline decltype(auto) with_deserialized(const abstract_type& type, F f) const {
        collection_mutation_input_stream stream(data);
        return f(deserialize_collection_mutation(type, stream));
    }
    class printer {
        const abstract_type& _type;
        const collection_mutation_view& _cmv;
    public:
        printer(const abstract_type& type, const collection_mutation_view& cmv)
            : _type(type), _cmv(cmv) {}
        friend fmt::formatter<printer>;
    };
};
// A serialized mutation of a collection of cells.
// Used to represent mutations of collections (lists, maps, sets) or non-frozen user defined types.
// It contains a sequence of cells, each representing a mutation of a single entry (element or field) of the collection.
// Each cell has an associated 'key' (or 'path'). The meaning of each (key, cell) pair is:
// for sets: the key is the serialized set element, the cell contains no data (except liveness information),
// for maps: the key is the serialized map element's key, the cell contains the serialized map element's value,
// for lists: the key is a timeuuid identifying the list entry, the cell contains the serialized value,
// for user types: the key is an index identifying the field, the cell contains the value of the field.
// The mutation may also contain a collection-wide tombstone.
class collection_mutation {
public:
    managed_bytes _data;
    collection_mutation() {}
    collection_mutation(const abstract_type&, collection_mutation_view);
    collection_mutation(const abstract_type&, managed_bytes);
    operator collection_mutation_view() const;
};
collection_mutation merge(const abstract_type&, collection_mutation_view, collection_mutation_view);
collection_mutation difference(const abstract_type&, collection_mutation_view, collection_mutation_view);
// Serializes the given collection of cells to a sequence of bytes ready to be sent over the CQL protocol.
bytes_ostream serialize_for_cql(const abstract_type&, collection_mutation_view);
// Like serialize_for_cql, but uses an extended format that embeds per-element
// timestamps and expiries, for use with WRITETIME(col[key]) / TTL(col[key])
// and WRITETIME(col.field) / TTL(col.field) selectors.
// The format is: [cql-bytes-length as uint32][regular CQL bytes][count as int32]
// [per-element: (key-len as int32)(key bytes)(timestamp as int64)(expiry as int64 in gc_clock ticks, -1 if none)]
bytes_ostream serialize_for_cql_with_timestamps(const abstract_type&, collection_mutation_view);
template <>
struct fmt::formatter<collection_mutation_view::printer> : fmt::formatter<string_view> {
    auto format(const collection_mutation_view::printer&, fmt::format_context& ctx) const
        -> decltype(ctx.out());
};
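
The extended format documented in the comment on `serialize_for_cql_with_timestamps` can be sketched as a standalone encoder and decoder. This is a rough illustration only, not the ScyllaDB implementation: it uses `std::string` as a byte buffer instead of `bytes_ostream`, all names (`element_meta`, `extended_value`, `put_be`, `get_be`, `get_bytes`, `encode`, `decode`) are hypothetical, and big-endian byte order is an assumption, since the header comment does not state the byte order.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-element record: key bytes, timestamp, expiry (-1 if none).
struct element_meta {
    std::string key;
    std::int64_t timestamp;
    std::int64_t expiry;
};

// Hypothetical decoded form: the regular CQL bytes plus the metadata trailer.
struct extended_value {
    std::string cql_bytes;
    std::vector<element_meta> elements;
};

// ASSUMPTION: integers are written big-endian; the source does not say.
template <typename U>
void put_be(std::string& out, U v) {
    for (int shift = static_cast<int>(sizeof(U) - 1) * 8; shift >= 0; shift -= 8) {
        out.push_back(static_cast<char>((v >> shift) & 0xff));
    }
}

template <typename U>
U get_be(const std::string& in, std::size_t& pos) {
    if (in.size() - pos < sizeof(U)) {
        throw std::runtime_error("truncated integer");
    }
    U v = 0;
    for (std::size_t i = 0; i < sizeof(U); ++i) {
        v = static_cast<U>((v << 8) | static_cast<unsigned char>(in[pos++]));
    }
    return v;
}

std::string get_bytes(const std::string& in, std::size_t& pos, std::size_t n) {
    if (in.size() - pos < n) {
        throw std::runtime_error("truncated bytes");
    }
    std::string s = in.substr(pos, n);
    pos += n;
    return s;
}

// [cql-bytes-length][regular CQL bytes][count][per-element: key-len, key, timestamp, expiry]
std::string encode(const extended_value& v) {
    std::string out;
    put_be<std::uint32_t>(out, static_cast<std::uint32_t>(v.cql_bytes.size()));
    out += v.cql_bytes;
    put_be<std::uint32_t>(out, static_cast<std::uint32_t>(v.elements.size()));
    for (const auto& e : v.elements) {
        put_be<std::uint32_t>(out, static_cast<std::uint32_t>(e.key.size()));
        out += e.key;
        put_be<std::uint64_t>(out, static_cast<std::uint64_t>(e.timestamp));
        put_be<std::uint64_t>(out, static_cast<std::uint64_t>(e.expiry));
    }
    return out;
}

extended_value decode(const std::string& in) {
    std::size_t pos = 0;
    extended_value v;
    const std::size_t cql_len = get_be<std::uint32_t>(in, pos);
    v.cql_bytes = get_bytes(in, pos, cql_len);
    const auto count = static_cast<std::int32_t>(get_be<std::uint32_t>(in, pos));
    if (count < 0) {
        throw std::runtime_error("negative element count");
    }
    for (std::int32_t i = 0; i < count; ++i) {
        element_meta e;
        const auto key_len = static_cast<std::int32_t>(get_be<std::uint32_t>(in, pos));
        if (key_len < 0) {
            throw std::runtime_error("negative key length");
        }
        e.key = get_bytes(in, pos, static_cast<std::size_t>(key_len));
        e.timestamp = static_cast<std::int64_t>(get_be<std::uint64_t>(in, pos));
        e.expiry = static_cast<std::int64_t>(get_be<std::uint64_t>(in, pos));
        v.elements.push_back(std::move(e));
    }
    return v;
}
```

Because the regular CQL bytes come first with a length prefix, a reader in this sketch can take the ordinary value unchanged and then consult the trailer by key to answer a per-element WRITETIME or TTL query.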