Cassandra generally does not allow empty strings as partition keys (note, by the way, that empty strings are allowed as clustering keys, as well as in individual components of a compound partition key).
However, Cassandra does allow empty strings in _regular_ columns - and those regular columns can be indexed by a secondary index, or become an empty partition-key column in a materialized view. As noted in issues #9375 and #9364 and verified in a few xfailing cql-pytest tests, Scylla didn't allow these cases - and this patch series fixes that.
Before the last patch in this series finally enables empty-string partition keys in materialized views, we first need to solve a couple of bugs in our code related to handling empty partition keys:
The first patch fixes issue #10178 - a bug in `key_view::tri_compare()` where comparing two empty keys returned a random result instead of "equal".
The second patch fixes issue #9352: our tokenizer has an inconsistency where for an empty string key, two variants of the same function return different results:
1. One variant `murmur3_partitioner::get_token(bytes_view key)` returned `minimum_token()` for the empty string.
2. Another variant `murmur3_partitioner::get_token(const schema& s, partition_key_view key)` did not have this special case, and called the normal hash-function calculation on the empty string (the resulting token is 0).
Variant 2 was an unintentional bug, because Cassandra always does what variant does 1. So the "obvious" fix here would be to fix variant 2 to do what variant 1 does. Nevertheless, we decided to do the opposite: Change variant 1 to match variant 2. The reasoning is as follows:
The `minimum_token()` is `token{token::kind::before_all_keys, 0 }` - it's not a real token. Since we intend in this patch allow real data to exist with the empty key, we need this real data to have a real token. For example, this token needs to be located on the token ring (so the empty-key partition will have replicas) and also belong to one of the shards, and it's not clear that `minimum_token()` will be handled correctly in this context.
After changing the token of the empty string to 0, we note that some places in the code assume that `dht::decorated_key(dh
t::minimum_token(), partition_key::make_empty())` is a legal decorated key. However, as far as I can tell, none of these places actually assume that the partition-key part (the `make_empty()`) really matches the token - this decorated key is only used to start an iteration (ignoring this key itself) or to indicate a non-existent key (in modern code `std::optional` should be used for that).
While normally changing the token of a key is a big faux-pas, which can result in old data no longer being readable, in this case this change is safe because:
1. Scylla previously disallowed empty partition keys (in both base tables and views), so we cannot have had such a partition key saved in any sstable.
3. Cassandra does allow empty partition keys in _views_ and _secondary indexes_, but we do not support migrating sstables of those into Scylla - users are expected to only migrate the base table and then re-create the view or index. So however Cassandra writes those empty-key partitions, we don't care.
The third patch finally fixes the materialized views implementation to not drop view rows with an empty-string partition key (#9375). This means we basically revert commit ec8960df45 - which fixed#3262 by disallowing empty partition keys in views, whereas this patch fixes the same problem by handling the empty partition keys correctly.
The fix for the secondary index bug (#9364) comes "for free" because it is based on materialized views.
We already had xfailing test cases for empty strings in materialized views and indexes, and after this series they begin to pass so the "xfail" mark is removed. The series also adds additional test cases that validate additional corner cases discovered during the debugging.
Fixes#9352Fixes#9364Fixes#9375Fixes#10178Closes#10170
* github.com:scylladb/scylla:
compound_compat.hh: add missing methods of iterator
materialized views: allow empty strings in views and indexes
murmur3: fix inconsistent token for empty partition key
compound_compat.hh: fix bug iterating on empty singular key