The semantics of Scylla's materialized views may vary depending on how their
primary keys correspond to the base table's one. One of the differences is
how we handle writes to columns in the base table that are not selected by
a view:
* Case 1: The view's PK is a permutation of the base table's PK:
Since the view's primary key cannot be changed in an update, a row in
the view remains alive as long as the corresponding row in the base table
is alive.
The tricky part comes when the base table has columns that are NOT selected
by the view. CQL3 used to not allow for defining a table that didn't have
any other columns besides its primary key. Also, when inserting a row into
a table, it was mandatory to provide at least one value aside from the
primary key. At some point it changed [1] and the implementation of the
solution relied on the notion of the row marker.
Putting the details aside, consider the following scenario:
(i) the base table has a primary key consisting of columns
c_1, ..., c_k, and it has regular columns rc_1, ..., rc_n,
(ii) the primary key of an MV defined on that table consists of
a permutation of c_1, ..., c_k. The MV doesn't select at least
one of the regular columns of the base table. Without loss of
generality, let that unselected column be rc_1.
(iii) the base table has a row R whose only non-null value is the one
in the regular column rc_1.
Now, what will R correspond to in the MV? The base table doesn't have a row
marker, but all of its regular columns in the MV will be NULLs. That's NOT
allowed.
To solve that problem, all unselected columns have corresponding virtual
columns in the MV; the only information they provide is whether there is
a value in the base table or not. This way, the MV knows if a row is still
alive or not.
For that reason, we send view updates to virtual columns in the following
cases:
(i) the value in the column changes from NULL to a value, i.e. it's
created,
(ii) the value in the column exists, but its TTL has been updated.
* Case 2: The view's PK has one more column that the base table's one:
Since the primary key of the view has a regular column C from the base
table, it is guaranteed that if there's a row in the MV, the corresponding
row in the base table can remain alive: since C is part of the view's PK,
it must have a value, so the row in the base table has a value in C too.
The problem with virtual columns from the previous case doesn't manifest
in this one. The liveness of the cell in C determines the liveness of
the whole row in the view.
The semantics gets more complex, but the conclusion is this: in case 1,
virtual columns exist and we may need to generate view updates for them,
while in case 2 virtual columns do NOT exist and so we don't generate
view updates for them.
What changes in this patch is we adjust the code to it. If a view has
a regular column from the base table as part of its primary key, we
no longer emit view updates when we change a column unselected by that
view. It is purely an OPTIMIZATION change.
[1]: https://issues.apache.org/jira/browse/CASSANDRA-4361
Fixes scylladb/scylladb#21652
Closes scylladb/scylladb#21653
Scylla unit tests using C++ and the Boost test framework
The source files in this directory are Scylla unit tests written in C++ using the Boost.Test framework. These unit tests come in three flavors:
-
Some simple tests that check stand-alone C++ functions or classes use Boost's
BOOST_AUTO_TEST_CASE. -
Some tests require Seastar features, and need to be declared with Seastar's extensions to Boost.Test, namely
SEASTAR_TEST_CASE. -
Even more elaborate tests require not just a functioning Seastar environment but also a complete (or partial) Scylla environment. Those tests use the
do_with_cql_env()ordo_with_cql_env_thread()function to set up a mostly-functioning environment behaving like a single-node Scylla, in which the test can run.
While we have many tests of the third flavor, writing new tests of this type should be reserved to white box tests - tests where it is necessary to inspect or control Scylla internals that do not have user-facing APIs such as CQL. In contrast, black-box tests - tests that can be written only using user-facing APIs, should be written in one of newer test frameworks that we offer - such as test/cqlpy or test/alternator (in Python, using the CQL or DynamoDB APIs respectively) or test/cql (using textual CQL commands), or - if more than one Scylla node is needed for a test - using the test/topology* framework.
Running tests
Because these are C++ tests, they need to be compiled before running.
To compile a single test executable row_cache_test, use a command like
ninja build/dev/test/boost/row_cache_test
You can also use ninja dev-test to build all C++ tests, or use
ninja deb-build to build the C++ tests and also the full Scylla executable
(however, note that full Scylla executable isn't needed to run Boost tests).
Replace "dev" by "debug" or "release" in the examples above and below to use the "debug" build mode (which, importantly, compiles the test with ASAN and UBSAN enabling on and helps catch difficult-to-catch use-after-free bugs) or the "release" build mode (optimized for run speed).
To run an entire test file row_cache_test, including all its test
functions, use a command like:
build/dev/test/boost/row_cache_test -- -c1 -m1G
to run a single test function test_reproduce_18045() from the longer test
file, use a command like:
build/dev/test/boost/row_cache_test -t test_reproduce_18045 -- -c1 -m1G
In these command lines, the parameters before the -- are passed to
Boost.Test, while the parameters after the -- are passed to the test code,
and in particular to Seastar. In this example Seastar is asked to run on one
CPU (-c1) and use 1G of memory (-m1G) instead of hogging the entire
machine. The Boost.Test option -t test_reproduce_18045 asks it to run just
this one test function instead of all the test functions in the executable.
Unfortunately, interrupting a running test with control-C while doesn't
work. This is a known bug (#5696). Kill a test with SIGKILL (-9) if you
need to kill it while it's running.
Boost tests can also be run using test.py - which is a script that provides a uniform way to run all tests in scylladb.git - C++ tests, Python tests, etc.
Writing tests
Because of the large build time and build size of each separate test executable, it is recommended to put test functions into relatively large source files. But not too large - to keep compilation time of a single source file (during development) at reasonable levels.
When adding new source files in test/boost, don't forget to list the new source file in configure.py and also in CMakeLists.txt. The former is needed by our CI, but the latter is preferred by some developers.