This will make it easier to access table proprties in places which
only have schema_ptr. This is in particular useful when replacing
dht::shard_of() uses with s->table().shard_of(), now that sharding is
no longer static, but table-specific.
Also, it allows us to install a guard which catches invalid uses of
schema::get_sharder() on tablet-based tables.
It will be helpful for other uses as well. For example, we can now get
rid of the static_props hack.
This PR fixes some problems found after the PR was merged:
* missed `node_to_work_on` assignment in `handle_topology_transition`;
* change error reporting in `update_fence_version` from `on_internal_error` to regular exceptions, since that exceptions can happen during normal operation.
* `update_fence_version` has beed moved after `group0_service.setup_group0_if_exist` in `main.cc`, otherwise we use uninitialized `token_metadata::version` and get an error.
Fixes: #14303Closes#14292
* github.com:scylladb/scylladb:
main.cc: move update_fence_version after group0_service.setup_group0_if_exist
shared_token_metadata: update_fence_version: on_internal_error -> throw
storage_service: handle_topology_transition: fix missed node assignment
Currently, when two cells have the same write timestamp
and both are alive or expiring, we compare their value first,
before checking if either of them is expiring
and if both are expiring, comparing their expiration time
and ttl value to determine which of them will expire
later or was written later.
This was based on an early version of Cassandra.
However, the Cassandra implementation rightfully changed in
e225c88a65 ([CASSANDRA-14592](https://issues.apache.org/jira/browse/CASSANDRA-14592)),
where the cell expiration is considered before the cell value.
To summarize, the motivation for this change is three fold:
1. Cassandra compatibility
2. Prevent an edge case where a null value is returned by select query when an expired cell has a larger value than a cell with later expiration.
3. A generalization of the above: value-based reconciliation may cause select query to return a mixture of upserts, if multiple upserts use the same timeastamp but have different expiration times. If the cell value is considered before expiration, the select result may contain cells from different inserts, while reconciling based the expiration times will choose cells consistently from either upserts, as all cells in the respective upsert will carry the same expiration time.
Fixes#14182
Also, this series:
- updates dml documentation
- updates internal documentation
- updates and adds unit tests and cql pytest reproducing #14182Closes#14183
* github.com:scylladb/scylladb:
docs: dml: add update ordering section
cql-pytest: test_using_timestamp: add tests for rewrites using same timestamp
mutation_partition: compare_row_marker_for_merge: consider ttl in case expiry is the same
atomic_cell: compare_atomic_cell_for_merge: update and add documentation
compare_atomic_cell_for_merge: compare value last for live cells
mutation_test: test_cell_ordering: improve debuggability
Otherwise, the validation
new_fence_version <= token_metadata::version
inside update_fence_version will use an uninitialized
token_metadata::version == 0
and we will get an error.
The test_topology_ops was improved to
catch this problem.
Fixes: #14303
It is currently located in query_class_config.hh, which is named after a
now defunct struct. This arrangement is unintuitive and there is no
upside to it. The main user of max_result_size is query_comand, so
colocate it next to the latter.
Closes#14268
Add reproducers for #14182:
test_rewrite_different_values_using_same_timestamp verifies
expiration-based cell reconciliation.
test_rewrite_different_values_using_same_timestamp_and_expiration
is a scylla_only test, verifying that when
two cells with same timestamp and same expiration
are compared, the one with the lesser ttl prevails.
test_rewrite_using_same_timestamp_select_after_expiration
reproduces the specific issue hit in #14182
where a cell is selected after it expires since
it has a lexicographically larger value than
the other cell with later expiration.
test_rewrite_multiple_cells_using_same_timestamp verifies
atomicity of inserts of multiple columns, with a TTL.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Currently, when two cells have the same write timestamp
and both are alive or expiring, we compare their value first,
before checking if either of them is expiring
and if both are expiring, comparing their expiration time
and ttl value to determine which of them will expire
later or was written later.
This was changed in CASSANDRA-14592
for consistency with the preference for dead cells over live cells,
as expiring cells will become tombstones at a future time
and then they'd win over live cells with the same timestamp,
hence they should win also before expiration.
In addition, comparing the cell value before expiration
can lead to unintuitive corner cases where rewriting
a cell using the same timestamp but different TTL
may cause scylla to return the cell with null value
if it expired in the meanwhile.
Also, when multiple columns are written using two upserts
using the same write timestamp but with different expiration,
selecting cells by their value may return a mixed result
where each cell is selected individually from either upsert,
by picking the cells with the largest values for each column,
while using the expiration time to break tie will lead
to a more consistent results where a set of cell from
only one of the upserts will be selected.
Fixesscylladb/scylladb#14182
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Currently, it is hard to tell which of the many sub-cases
fail in this unit test, in case any of them fails.
This change uses logging in debug and trace level
to help with that by reproducing the error
with --logger-log-level testlog=trace
(The cases are deterministic so reproducing should not
be a problem)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
This patch adds some minimal tests for the "with compression = {..}" table
configuration. These tests reproduce three known bugs:
Refs #6442: Always print all schema parameters (including default values)
Scylla doesn't return the default chunk_length_in_kb, but Cassandra
does.
Refs #8948: Cassandra 3.11.10 uses "class" instead of "sstable_compression"
for compression settings by default
Cassandra switched, long ago, the "sstable_compression" attribute's
name to "class". This can break Cassandra applications that create
tables (where we won't understand the "class" parameter) and applications
that inquire about the configuration of existing tables. This patch adds
tests for both problems.
Refs #9933: ALTER TABLE with "chunk_length_kb" (compression) of 1MB caused a
core dump on all nodes
Our test for this issue hangs Scylla (or crashes, depending on the test
environment configuration), when a huge allocation is attempted during
memtable flush. So this test is marked "skip" instead of xfail.
The tests included here also uncovered a new minor/insignificant bug,
where Scylla allows floating point numbers as chunk_length_in_kb - this
number is truncated to an integer, and allowed, unlike Cassandra or
common sense.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#14261
Since `mvcc: make schema upgrades gentle` (51e3b9321b),
rows pointed to by the cursor can have different (older) schema
than the schema of the cursor's snapshot.
However, one place in the code wasn't updated accordingly,
causing a row to be processed with the wrong schema in the right
circumstances.
This passed through unit testing because it requires
a digest-computing cache read after a schema change,
and no test exercised this.
This series fixes the bug and adds a unit test which reproduces the issue.
Fixes#14110Closes#14305
* github.com:scylladb/scylladb:
test: boost/row_cache_test: add a reproducer for #14110
cache_flat_mutation_reader: use the correct schema in prepare_hash
mutation: mutation_cleaner: add pause()
Exposing scrub compaction to the command-line. Allows for offline scrub of sstables, in cases where online scrubbing (via scylla itself) is not possible or not desired. One such case recently was an sstable from a backup which turned out to be corrupt, `nodetool refresh --load-and-stream` refusing to load it.
Fixes: #14203Closes#14260
* github.com:scylladb/scylladb:
docs/operating-scylla/admin-tools: scylla-sstable: document scrub operation
test/cql-pytest: test_tools.py: add test for scylla sstable scrub
tools/scylla-sstable: add scrub operation
tools/scylla-sstable: write operation: add none to valid validation levels
tools/scylla-sstable: handle errors thrown by the operation
test/cql-pytest: add option to omit scylla's output from the test output
tools/scylla-sstable: s/option/operation_option/
tool/scylla-sstable: add missing comments
This mini-series updates the expected errors in `test/cql-pytest/test-timestamp.py`
to the ones changed in b7bbcdd178.
Then, it renamed the test to `test_using_timestamp.py` so it would
run automatically with `test.py`.
Closes#14293
* github.com:scylladb/scylladb:
cql-pytest: rename test-timestamp.py to test_using_timestamp.py
cql-pytest: test-timestamp: test_key_writetime: update expected errors
There are tons of wrappers that help test cases make sstables for their needs. And lots of code duplication in test cases that do parts of those helpers' work on their own. This set cleans some bits of those
Closes#14280
* github.com:scylladb/scylladb:
test/utils: Generalize making memtable from vector<mutation>
test/util: Generalize make_sstable_easy()-s
test/sstable_mutation: Remove useless helper
test/sstable_mutation: Make writer config in make_sstable_mutation_source()
test/utils: De-duplicate make_sstable_containing-s
test/sstable_compaction: Remove useless one-line local lambda
test/sstable_compaction: Simplify sstable making
test/sstables*: Make sstable from vector of mutations
test/mutation_reader: Remove create_sstable() helper from test
It's excessive, test case that needs it can get storage prefix without
this fancy wrapper-helper
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes#14273
Commit 4e205650 (test: Verify correctness of sstable::bytes_on_disk())
added a test to verify that sstable::bytes_on_disk() is equal to the
real size of real files. The same test case makes sense for S3-backed
sstables as well.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes#14272
Followup to 9bfa63fe37. Like in
`test_downgrade_after_successful_upgrade_fails`, the test
`test_joining_old_node_fails` also restarts all nodes at once and is prone
to a bug in the Python driver which can prevent the session from
reconnecting to any of the nodes. This commit applies the same
workaround to the other test (manual reconnect by recreating the Python
driver session).
Closes#14291
Another node can stop after it joined the group0 but before it
advertised itself in gossip. `get_inet_addrs` will try to resolve all
IPs and `wait_for_peers_to_enter_synchronize_state` will loop
indefinitely.
But `wait_for_peers_to_enter_synchronize_state` can return early if one
of the nodes confirms that the upgrade procedure has finished. For that,
it doesn't need the IPs of all group 0 members - only the IP of some
nodes which can do the confirmation.
This pr restructures the code so that IPs of nodes are resolved inside
the `max_concurrent_for_each` that
`wait_for_peers_to_enter_synchronize_state` performs. Then, even if some
IPs won't be resolved, but one of the nodes confirms a successful
upgrade, we can continue.
Fixes#13543Closes#14046
* github.com:scylladb/scylladb:
raft topology: test: check if aborted node replacing blocks bootstrap
raft topology: `wait_for_peers_to_enter_synchronize_state` doesn't need to resolve all IPs
1. Otherwise test.py doesn't recognize it.
2. As it represents what the test does in a better way.
3. Following `test_using_timeout.py` naming convention.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The error messages were changed in
b7bbcdd178.
Extend the `match` regular expression param
to pytest.raises to include both old and new message
to remain backward compatible also with Cassandra,
as this test is run against both Cassandra and Scylla.
Note that the test didn't run automatically
since it's named `test-timestamp.py` and test.py
looks up only test scripts beginning with `test_`.
The test will be renamed in the next patch.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
In preparation for converting selectors to evaluate expressions,
add support for evaluating column_mutation_attribute (representing
the WRITETIME/TTL pseudo-functions).
A unit test is added.
Fixes#12906Closes#14287
* github.com:scylladb/scylladb:
test: expr: test evaluation of column_mutation_attribute
test: lib: enhance make_evaluation_inputs() with support for ttls/timestamps
cql3: expr: evaluate() column_mutation_attribute
This reverts commit d1dc579062, reversing
changes made to 3a73048bc9.
Said commit caused regressions in dtests. We need to investigate and fix
those, but in the meanwhile let's revert this to reduce the disruption
to our workflows.
Refs: #14283
Initialization of `system_keyspace` is now done in a single place instead of
being spread out through the entire procedure. `system_keyspace` is also
available for queries much earlier which allows, for example, to load our Host
ID before we initialize any of the distributed services (like gossiper,
messaging_service etc.) This is doable because `query_processor` is now
available early. A couple of FIXMEs have been resolved.
Refs: #14202Closes#14285
* github.com:scylladb/scylladb:
main, cql_test_env: simplify `system_keyspace` initialization
db: system_keyspace: take simpler service references in `make`
db: system_keyspace: call `initialize_virtual_tables` from `main`
db: system_keyspace: refactor virtual tables creation
db: system_keyspace: remove `system_keyspace_make`
db: system_keyspace: refactor local system table creation code
replica: database: remove `is_bootstrap` argument from create_keyspace
replica: database: write a comment for `parse_system_tables`
replica: database: remove redundant `keyspace::get_erm_factory()` getter
db: system_keyspace: don't take `sharded<>` references
There's no way to evaluate a column_mutation_attribute via CQL
yet (the only user uses old-style cql3::selection::selector), so
we only supply a unit test.
While remaining backwards compatible, allow supplying custom timestamp/ttl
with each fake column value.
Note: I tried to use a formatter<> for the new data structure, but
got entangled in a template loop.
Initialization of `system_keyspace` is now all done at once instead of
being spread out through the entire procedure. This is doable because
`query_processor` is now available early. A couple of FIXMEs have been
resolved.
Take references to services which are initialized earlier. The
references to `gossiper`, `storage_service` and `raft_group0_registry`
are no longer needed.
This will allow us to move the `make` step right after starting
`system_keyspace`.
`initialize_virtual_tables` was called from `system_keyspace::make`,
which caused this `make` function to take a bunch of references to
late-initialized services (`gossiper`, `storage_service`).
Call it from `main`/`cql_test_env` instead.
Note: `system_keyspace::make` is called from
`distributed_loader::init_system_keyspace`. The latter function contains
additional steps: populate the system keyspaces (with data from
sstables) and mark their tables ready for writes.
None of these steps apply to virtual tables.
There exists at least one writable virtual table, but writes into
virtual tables are special and the implementation of writes is
virtual-table specific. The existing writable virtual table
(`db_config_table`) only updates in-memory state when written to. If a
virtual table would like to create sstables, or populate itself with
sstable data on startup, it will have to handle this in its own
initialization function.
Separating `initialize_virtual_tables` like this will allow us to
simplify `system_keyspace` initialization, making it independent of
services used for distributed communication.
Implement `expr:valuate()` for `expr::field_selection`.
`field_selection` is used to represent access to a struct field.
For example, with a UDT value:
```
CREATE TYPE my_type (a int, b int);
```
The expression `my_type_value.a` would be represented as a `field_selection`, which selects the field `a`.
Evaluating such an expression consists of finding the right element's value in a serialized UDT value and returning it.
Note that it's still not possible to use `field_selection` inside the `WHERE` clause. Enabling it would require changes to the grammar, as well as query planning, Current `statement_restrictions` just reacts with `on_internal_error` when it encounters a `field_selection`.
Nonetheless it's a step towards relaxing the grammar, and now it's finally possible to evaluate all kinds of prepared expressions (#12906)
Fixes: https://github.com/scylladb/scylladb/issues/12906Closes#14235
* github.com:scylladb/scylladb:
boost/expr_test: test evaluate(field_selection)
cql3/expr: fix printing of field_selection
cql3/expression: implement evaluate(field_selection)
types/user: modify idx_of_field to use bytes_view
column_identifer: add column_identifier_raw::text()
types: add read_nth_user_type_field()
types: add read_nth_tuple_element()
Both, make_sstable_easy() and make_sstable_containing() prepare memtable
by allocating it and applying mutations from vector. Make a local
helper. Many test cases can, probably, benefit from it too, but they
often do more stuff before applying mutation to memtable, so this is
left for future patching
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are two of them, one making sstable from memtable and the other
one doing the same from a custom reader. The former can just call the
latter with memtable's flat reader
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are two make_sstable_mutation_source() helpers that call one
another and test cases only need one of them, so leave just one that's
in use.
Also don't pass env's tempdir to make_sstable() util call, it can get
env's tempdir on its own.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
These local helpers accept writer config which's made the same way by
callers, so the helpers can do it on their own
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The function that prepares memtable from mutations vector can call its
overload that writes this memtable into an sstable
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The get_usable_sst() wrapper lambda is not needed, calling the
make_sstable_containing() is shorter
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There's a temporary memtable and on-stack lambda that makes the
mutation. Both are overkill, make_sstable_containing() can work on just
plan on-stack-constructed mutation
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are many cases that want to call make_sstable_containing() with
the vector of mutations at hand. For that they apply it to a temporary
memtable, but sstable-utils can work with the mutations vector as well
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The test `test_downgrade_after_successful_upgrade_fails` shuts down the whole cluster, reconfigures the nodes and then restarts. Apparently, the python driver sometimes does not handle this correctly; in one test run we observed that the driver did not manage to reconnect to any of the nodes, even though the nodes managed to start successfully.
More context can be found on the python driver issue.
This PR works around this issue by using the existing `reconnect_driver` function (which is a workaround for a _different_ python driver issue already) to help the driver reconnect after the full cluster restart.
Refs: scylladb/python-driver#230
Closes#14276
* github.com:scylladb/scylladb:
tests/topology: work around python driver issue in cluster feature tests
test/topology{_raft_disabled}: move reconnect_driver to topology utils
In https://github.com/scylladb/scylladb/pull/14231 we split `storage_proxy` initialization into two phases: for local and remote parts. Here we do the same with `query_processor`. This allows performing queries for local tables early in the Scylla startup procedure, before we initialize services used for cluster communication such as `messaging_service` or `gossiper`.
Fixes: #14202
As a follow-up we will simplify `system_keyspace` initialization, making it available earlier as well.
Closes#14256
* github.com:scylladb/scylladb:
main, cql_test_env: start `query_processor` early
cql3: query_processor: split `remote` initialization step
cql3: query_processor: move `migration_manager&`, `forwarder&`, `group0_client&` to a `remote` object
cql3: query_processor: make `forwarder()` private
cql3: query_processor: make `get_group0_client()` private
cql3: strongly_consistent_modification_statement: fix indentation
cql3: query_processor: make `get_migration_manager` private
tracing: remove `qp.get_migration_manager()` calls
table_helper: remove `qp.get_migration_manager()` calls
thrift: handler: move implementation of `execute_schema_command` to `query_processor`
data_dictionary: add `get_version`
cql3: statements: schema_altering_statement: move `execute0` to `query_processor`
cql3: statements: pass `migration_manager&` explicitly to `prepare_schema_mutations`
main: add missing `supervisor::notify` message
The test `test_downgrade_after_successful_upgrade_fails` stops all
nodes, reconfigures them to support the test-only feature and restarts
them. Unfortunately, it looks like python driver sometimes does not
handle this properly and might not reconnect after all nodes are shut
down.
This commit adds a workaround for scylladb/python-driver#230 - the test
re-creates python driver session right after nodes are restarted.
The `reconnect_driver` function will be useful outside the
`topology_raft_disabled` test suite - namely, for cluster feature tests
in `topology`. The best course of action for this function would be to
put it into pylib utils; however, the function depends on ManagerClient
which is defined in `test.pylib.manager_client` that depends on
`test.pylib.utils` - therefore we cannot put it there as it would cause
an import cycle. The `topology.utils` module sounds like the next best
thing.
In addition, the docstring comment is updated to reflect that this
function will now be used to work around another issue as well.
Pass `migration_manager&`, `forward_service&` and `raft_group0_client&`
in the remote init step which happens after the constructor.
Add a corresponding uninit remote step.
Make sure that any use of the `remote` services is finished before we
destroy the `remote` object by using a gate.
Thanks to this in a later commit we'll be able to move the construction
of `query_processor` earlier in the Scylla initialization procedure.
Scylla's output is often unnecessary to debug a failed test, or even
detrimental because one has to scroll back in the terminal after each
test run, to see the actual test's output. Add an option,
--omit-scylla-output, which when present on the command line of `run`,
the output of scylla will be omitted from the test output.
Also, to help discover this option (and others), don't run the tests
when either -h or --help is present on the command line. Just invoke
pytest (with said option) and exit.
Scenario:
1. Start a cluster with nodes node1, node2, node3
2. Start node4 replacing node node2
3. Stop node node4 after it joined group0 but before it advertised itself in gossip
4. Start node5 replacing node node2
Test simulates the behavior described in #13543.
Test passes only if `wait_for_peers_to_enter_synchronize_state` doesn't need to
resolve all IPs to return early. If not, node5 will hang trying to resolve the
IP of node4:
```
raft_group0_upgrade - : failed to resolve IP addresses of some of the cluster members ([node4's host ID])
```