When evaluating an LWT condition that involves both static and non-static
cells and matches no regular row, the static row must be used, UNLESS
the condition is IF EXISTS/IF NOT EXISTS, in which case special rules
apply.
Before this fix, Scylla assumed a row doesn't exist if there is no
matching primary key. In Cassandra, if the partition has a
non-empty static row, a regular row is synthesized from the static
row's cell values in this case, and that row is then used to evaluate
the condition.
This problem was reported as gh-10081.
Scylla behaved this way because, when implementing LWT, I tried to
converge Cassandra's data model (or lack thereof) with a relational
data model, and assumed a static row is a "shared" portion of a regular
row, i.e. a storage-level concept intended to save space, with no
independent existence.
This was an oversimplification.
This patch fixes gh-10081, making Scylla's semantics match those of
Cassandra.
For cataloguing purposes, here are other known examples in which a
static row has an independent existence as part of a table.
SELECT * from a partition which has a partition key
and a static cell set returns 1 row. If a regular row is later added
to the partition, the SELECT still returns 1 row: the static row
disappears, and the regular row appears instead.
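This behaviour can be pictured with a toy model (a Python sketch of the
semantics with made-up structures, not Scylla code): the static row is
materialized as a result row only when the partition has no regular rows.

```python
def select_all(partition):
    """partition: {'static': dict or None, 'rows': list of dicts}."""
    if partition["rows"]:
        # Each regular row carries the static cells alongside its own.
        return [{**(partition["static"] or {}), **r} for r in partition["rows"]]
    if partition["static"]:
        # No regular rows: the static row itself shows up, once.
        return [partition["static"]]
    return []

p = {"static": {"s": 1}, "rows": []}
assert select_all(p) == [{"s": 1}]          # one row: the static row
p["rows"].append({"c": 1})
assert select_all(p) == [{"s": 1, "c": 1}]  # still one row, now a regular one
```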
Another example showing that a static row has an independent existence:
CREATE TABLE t (p int, c int, s int static, PRIMARY KEY(p, c));
INSERT INTO t (p, c) VALUES(1, 1);
INSERT INTO t (p, s) VALUES(1, 1) IF NOT EXISTS;
In Cassandra (and Scylla), IF NOT EXISTS evaluates to TRUE, even though both
the regular row and the partition exist. Since the insert provides
only a partition key and the static cells are not set, the database
assumes the insert operates on the static row.
It would be wrong, however, to assume that a static row exists whenever
the partition key exists:
INSERT INTO t (p, c, s) VALUES(1, 1, 1) IF NOT EXISTS;
[applied] | p | c | s
-----------+---+---+------
False | 1 | 1 | null
Here IF NOT EXISTS evaluates to False, i.e. the regular row does exist when p and c exist.
Now issue:
CREATE TABLE t (p INT, c INT, r INT, s INT static, PRIMARY KEY(p, c))
INSERT INTO t (p, s) VALUES (1, 1);
UPDATE t SET s=2, r=1 WHERE p=1 AND c=1 IF s=1 and r=null;
In this case, even though the regular row doesn't exist, the static
row does, and it should be used for condition evaluation.
In other words, IF EXISTS/IF NOT EXISTS have contextual semantics:
they apply to the regular row if a clustering key is used in the WHERE
clause; otherwise they apply to the static row.
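This contextual rule can be sketched as a small Python model (hypothetical
names; it mirrors the row-selection logic of find_old_row in this patch,
not actual Scylla code):

```python
def row_for_condition(regular_rows, static_row, ckey, if_exists_form):
    """Pick the old row an LWT condition is evaluated against.

    regular_rows:   prefetched rows, keyed by clustering key
    static_row:     the static row, or None if no static cell is set
    ckey:           clustering key from the WHERE clause, or None
    if_exists_form: True for IF EXISTS / IF NOT EXISTS conditions
    """
    if ckey is not None and ckey in regular_rows:
        return regular_rows[ckey]   # exact match: condition checks the regular row
    if ckey is not None and if_exists_form:
        return None                 # IF (NOT) EXISTS with a ckey targets the regular row only
    return static_row               # otherwise fall back to the static row

# UPDATE ... WHERE p=1 AND c=1 IF s=1: no regular row, so the static row is checked
assert row_for_condition({}, {"s": 1}, ckey=1, if_exists_form=False) == {"s": 1}
# INSERT ... (p, c, s) IF NOT EXISTS: the static row must NOT stand in for the regular row
assert row_for_condition({}, {"s": 1}, ckey=1, if_exists_form=True) is None
# INSERT ... (p, s) IF NOT EXISTS: no ckey, so the condition targets the static row itself
assert row_for_condition({1: {"c": 1}}, None, ckey=None, if_exists_form=True) is None
```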
One analogy for a static row is a static member of a C++ or
Java class. It's an attribute of the class (class = partition)
which is accessible through every object of the class (object = regular
row). It is also present when there are no objects of the class but the
class itself exists: a partition could have no regular rows but
some static cells set, in which case it has a static row.
*Unlike C++/Java static class members*, a static row is an optional
attribute of the partition. A partition may exist while the static row
is absent (e.g. no static cell is set). If the static row does exist,
all regular rows share its contents, *even rows that do not exist*.
A regular row exists when its clustering key is present
in the table. A static row exists when at least one static cell is set.
Tests are updated because, when no matching row is found for the
update, we now show the values of the static row as the previous
values, instead of those of a non-matching clustering row.
Changes in v2:
- reworded the commit message
- added select tests
Closes #10711
/*
 * Copyright (C) 2019-present ScyllaDB
 *
 * Modified by ScyllaDB
 */

/*
 * SPDX-License-Identifier: (AGPL-3.0-or-later and Apache-2.0)
 */

#include "mutation.hh"
#include "modification_statement.hh"
#include "cas_request.hh"
#include <seastar/core/sleep.hh>
#include "cql3/result_set.hh"
#include "transport/messages/result_message.hh"
#include "types/map.hh"
#include "service/storage_proxy.hh"
#include "cql3/query_processor.hh"

namespace cql3::statements {

using namespace std::chrono;

void cas_request::add_row_update(const modification_statement& stmt_arg,
        std::vector<query::clustering_range> ranges_arg,
        modification_statement::json_cache_opt json_cache_arg,
        const query_options& options_arg) {
    // TODO: reserve updates array for batches
    _updates.emplace_back(cas_row_update{
        .statement = stmt_arg,
        .ranges = std::move(ranges_arg),
        .json_cache = std::move(json_cache_arg),
        .options = options_arg});
}

std::optional<mutation> cas_request::apply_updates(api::timestamp_type ts) const {
    // We're working with a single partition, so there will be only one element
    // in the vector. A vector is used since this is a conventional format
    // to pass a mutation onward.
    std::optional<mutation> mutation_set;
    for (const cas_row_update& op: _updates) {
        update_parameters params(_schema, op.options, ts, op.statement.get_time_to_live(op.options), _rows);

        std::vector<mutation> statement_mutations = op.statement.apply_updates(_key, op.ranges, params, op.json_cache);
        // Append all mutations (in fact only one) to the consolidated one.
        for (mutation& m : statement_mutations) {
            if (mutation_set.has_value() == false) {
                mutation_set.emplace(std::move(m));
            } else {
                mutation_set->apply(std::move(m));
            }
        }
    }

    return mutation_set;
}

lw_shared_ptr<query::read_command> cas_request::read_command(query_processor& qp) const {

    column_set columns_to_read(_schema->all_columns_count());
    std::vector<query::clustering_range> ranges;

    for (const cas_row_update& op : _updates) {
        if (op.statement.has_conditions() == false && op.statement.requires_read() == false) {
            // No point in pre-fetching the old row if the statement doesn't check it in a CAS and
            // doesn't use it to apply updates.
            continue;
        }
        columns_to_read.union_with(op.statement.columns_to_read());
        if (op.statement.has_only_static_column_conditions() && !op.statement.requires_read()) {
            // If a statement has only static column conditions and doesn't have operations that
            // require read, it doesn't matter what clustering key range to query - any partition
            // row will do for the check.
            continue;
        }
        ranges.reserve(op.ranges.size());
        std::copy(op.ranges.begin(), op.ranges.end(), std::back_inserter(ranges));
    }
    uint64_t max_rows = query::partition_max_rows;
    if (ranges.empty()) {
        // With only a static condition, we still want to make the distinction between
        // a non-existing partition and one that exists (has some live data) but has no
        // static content. So we query the first live row of the partition.
        ranges.emplace_back(query::clustering_range::make_open_ended_both_sides());
        max_rows = 1;
    } else {
        ranges = query::clustering_range::deoverlap(std::move(ranges), clustering_key::tri_compare(*_schema));
    }
    auto options = update_parameters::options;
    options.set(query::partition_slice::option::always_return_static_content);
    query::partition_slice ps(std::move(ranges), *_schema, columns_to_read, options);
    ps.set_partition_row_limit(max_rows);
    return make_lw_shared<query::read_command>(_schema->id(), _schema->version(), std::move(ps), qp.proxy().get_max_result_size(ps));
}

bool cas_request::applies_to() const {
    for (const cas_row_update& op: _updates) {
        if (!op.statement.has_conditions()) {
            continue;
        }
        // No need to check subsequent conditions as we have already failed the current one.
        if (!op.statement.applies_to(find_old_row(op), op.options)) {
            return false;
        }
    }
    return true;
}

std::optional<mutation> cas_request::apply(foreign_ptr<lw_shared_ptr<query::result>> qr,
        const query::partition_slice& slice, api::timestamp_type ts) {
    _rows = update_parameters::build_prefetch_data(_schema, *qr, slice);
    if (applies_to()) {
        return apply_updates(ts);
    } else {
        return {};
    }
}

const update_parameters::prefetch_data::row* cas_request::find_old_row(const cas_row_update& op) const {
    static const clustering_key empty_ckey = clustering_key::make_empty();
    const partition_key& pkey = _key.front().start()->value().key().value();
    // We must ignore the statement clustering column restriction when
    // choosing a row to check the conditions. If there is no
    // exact match, choose the static row to check if the statement
    // applies.
    // For example, the following update must successfully apply (effectively
    // turn into INSERT), because, although the table doesn't have any regular rows matching the
    // statement clustering column restriction, the static row matches the statement condition:
    //   CREATE TABLE t(p int, c int, s int static, v int, PRIMARY KEY(p, c));
    //   INSERT INTO t(p, s) VALUES(1, 1);
    //   UPDATE t SET v=1 WHERE p=1 AND c=1 IF s=1;
    const clustering_key& ckey = op.ranges.front().start() ? op.ranges.front().start()->value() : empty_ckey;
    auto row = _rows.find_row(pkey, ckey);
    if (row == nullptr && !ckey.is_empty() &&
            !op.statement.has_if_exist_condition() && !op.statement.has_if_not_exist_condition()) {
        row = _rows.find_row(pkey, empty_ckey);
    }
    return row;
}

seastar::shared_ptr<cql_transport::messages::result_message>
cas_request::build_cas_result_set(seastar::shared_ptr<cql3::metadata> metadata,
        const column_set& columns,
        bool is_applied) const {
    const partition_key& pkey = _key.front().start()->value().key().value();
    const clustering_key empty_ckey = clustering_key::make_empty();
    auto result_set = std::make_unique<cql3::result_set>(metadata);

    for (const cas_row_update& op: _updates) {
        // Construct the result set row
        std::vector<bytes_opt> rs_row;
        rs_row.reserve(metadata->value_count());
        rs_row.emplace_back(boolean_type->decompose(is_applied));
        // Get the old row from prefetched data for the row update
        const auto* old_row = find_old_row(op);
        if (!old_row) {
            // In case there is no old row, leave all other columns null
            // so that we can infer whether the update attempts to insert a
            // non-existing row.
            rs_row.resize(metadata->value_count());
            result_set->add_row(std::move(rs_row));
            continue;
        }
        // Fill in the cells from prefetch data (old row) into the result set row
        for (ordinal_column_id id = columns.find_first(); id != column_set::npos; id = columns.find_next(id)) {
            const auto it = old_row->cells.find(id);
            if (it == old_row->cells.end()) {
                rs_row.emplace_back(bytes_opt{});
                continue;
            }
            const data_value& cell = it->second;
            const abstract_type& cell_type = *cell.type();
            const abstract_type& column_type = *_rows.schema->column_at(id).type;

            if (column_type.is_listlike() && cell_type.is_map()) {
                // Lists/sets are fetched as maps, but need to be stored as sets.
                const listlike_collection_type_impl& list_type = static_cast<const listlike_collection_type_impl&>(column_type);
                const map_type_impl& map_type = static_cast<const map_type_impl&>(cell_type);
                rs_row.emplace_back(list_type.serialize_map(map_type, cell));
            } else {
                rs_row.emplace_back(cell_type.decompose(cell));
            }
        }
        result_set->add_row(std::move(rs_row));
    }
    cql3::result result(std::move(result_set));
    return seastar::make_shared<cql_transport::messages::result_message::rows>(std::move(result));
}

} // end of namespace "cql3::statements"