Currently, when two cells have the same write timestamp
and both are alive or expiring, we compare their value first,
before checking if either of them is expiring
and if both are expiring, comparing their expiration time
and ttl value to determine which of them will expire
later or was written later.
This was changed in CASSANDRA-14592
for consistency with the preference for dead cells over live cells,
as expiring cells will become tombstones at a future time
and then they'd win over live cells with the same timestamp,
hence they should win also before expiration.
In addition, comparing the cell value before expiration
can lead to unintuitive corner cases where rewriting
a cell using the same timestamp but different TTL
may cause scylla to return the cell with null value
if it expired in the meanwhile.
Also, when multiple columns are written using two upserts
using the same write timestamp but with different expiration,
selecting cells by their value may return a mixed result
where each cell is selected individually from either upsert,
by picking the cells with the largest values for each column,
while using the expiration time to break tie will lead
to a more consistent results where a set of cell from
only one of the upserts will be selected.
Fixesscylladb/scylladb#14182
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Currently, it is hard to tell which of the many sub-cases
fail in this unit test, in case any of them fails.
This change uses logging in debug and trace level
to help with that by reproducing the error
with --logger-log-level testlog=trace
(The cases are deterministic so reproducing should not
be a problem)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Exposing scrub compaction to the command-line. Allows for offline scrub of sstables, in cases where online scrubbing (via scylla itself) is not possible or not desired. One such case recently was an sstable from a backup which turned out to be corrupt, `nodetool refresh --load-and-stream` refusing to load it.
Fixes: #14203Closes#14260
* github.com:scylladb/scylladb:
docs/operating-scylla/admin-tools: scylla-sstable: document scrub operation
test/cql-pytest: test_tools.py: add test for scylla sstable scrub
tools/scylla-sstable: add scrub operation
tools/scylla-sstable: write operation: add none to valid validation levels
tools/scylla-sstable: handle errors thrown by the operation
test/cql-pytest: add option to omit scylla's output from the test output
tools/scylla-sstable: s/option/operation_option/
tool/scylla-sstable: add missing comments
This mini-series updates the expected errors in `test/cql-pytest/test-timestamp.py`
to the ones changed in b7bbcdd178.
Then, it renamed the test to `test_using_timestamp.py` so it would
run automatically with `test.py`.
Closes#14293
* github.com:scylladb/scylladb:
cql-pytest: rename test-timestamp.py to test_using_timestamp.py
cql-pytest: test-timestamp: test_key_writetime: update expected errors
There are tons of wrappers that help test cases make sstables for their needs. And lots of code duplication in test cases that do parts of those helpers' work on their own. This set cleans some bits of those
Closes#14280
* github.com:scylladb/scylladb:
test/utils: Generalize making memtable from vector<mutation>
test/util: Generalize make_sstable_easy()-s
test/sstable_mutation: Remove useless helper
test/sstable_mutation: Make writer config in make_sstable_mutation_source()
test/utils: De-duplicate make_sstable_containing-s
test/sstable_compaction: Remove useless one-line local lambda
test/sstable_compaction: Simplify sstable making
test/sstables*: Make sstable from vector of mutations
test/mutation_reader: Remove create_sstable() helper from test
It's excessive, test case that needs it can get storage prefix without
this fancy wrapper-helper
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes#14273
Commit 4e205650 (test: Verify correctness of sstable::bytes_on_disk())
added a test to verify that sstable::bytes_on_disk() is equal to the
real size of real files. The same test case makes sense for S3-backed
sstables as well.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes#14272
Followup to 9bfa63fe37. Like in
`test_downgrade_after_successful_upgrade_fails`, the test
`test_joining_old_node_fails` also restarts all nodes at once and is prone
to a bug in the Python driver which can prevent the session from
reconnecting to any of the nodes. This commit applies the same
workaround to the other test (manual reconnect by recreating the Python
driver session).
Closes#14291
Another node can stop after it joined the group0 but before it
advertised itself in gossip. `get_inet_addrs` will try to resolve all
IPs and `wait_for_peers_to_enter_synchronize_state` will loop
indefinitely.
But `wait_for_peers_to_enter_synchronize_state` can return early if one
of the nodes confirms that the upgrade procedure has finished. For that,
it doesn't need the IPs of all group 0 members - only the IP of some
nodes which can do the confirmation.
This pr restructures the code so that IPs of nodes are resolved inside
the `max_concurrent_for_each` that
`wait_for_peers_to_enter_synchronize_state` performs. Then, even if some
IPs won't be resolved, but one of the nodes confirms a successful
upgrade, we can continue.
Fixes#13543Closes#14046
* github.com:scylladb/scylladb:
raft topology: test: check if aborted node replacing blocks bootstrap
raft topology: `wait_for_peers_to_enter_synchronize_state` doesn't need to resolve all IPs
1. Otherwise test.py doesn't recognize it.
2. As it represents what the test does in a better way.
3. Following `test_using_timeout.py` naming convention.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The error messages were changed in
b7bbcdd178.
Extend the `match` regular expression param
to pytest.raises to include both old and new message
to remain backward compatible also with Cassandra,
as this test is run against both Cassandra and Scylla.
Note that the test didn't run automatically
since it's named `test-timestamp.py` and test.py
looks up only test scripts beginning with `test_`.
The test will be renamed in the next patch.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
In preparation for converting selectors to evaluate expressions,
add support for evaluating column_mutation_attribute (representing
the WRITETIME/TTL pseudo-functions).
A unit test is added.
Fixes#12906Closes#14287
* github.com:scylladb/scylladb:
test: expr: test evaluation of column_mutation_attribute
test: lib: enhance make_evaluation_inputs() with support for ttls/timestamps
cql3: expr: evaluate() column_mutation_attribute
This reverts commit d1dc579062, reversing
changes made to 3a73048bc9.
Said commit caused regressions in dtests. We need to investigate and fix
those, but in the meanwhile let's revert this to reduce the disruption
to our workflows.
Refs: #14283
Initialization of `system_keyspace` is now done in a single place instead of
being spread out through the entire procedure. `system_keyspace` is also
available for queries much earlier which allows, for example, to load our Host
ID before we initialize any of the distributed services (like gossiper,
messaging_service etc.) This is doable because `query_processor` is now
available early. A couple of FIXMEs have been resolved.
Refs: #14202Closes#14285
* github.com:scylladb/scylladb:
main, cql_test_env: simplify `system_keyspace` initialization
db: system_keyspace: take simpler service references in `make`
db: system_keyspace: call `initialize_virtual_tables` from `main`
db: system_keyspace: refactor virtual tables creation
db: system_keyspace: remove `system_keyspace_make`
db: system_keyspace: refactor local system table creation code
replica: database: remove `is_bootstrap` argument from create_keyspace
replica: database: write a comment for `parse_system_tables`
replica: database: remove redundant `keyspace::get_erm_factory()` getter
db: system_keyspace: don't take `sharded<>` references
There's no way to evaluate a column_mutation_attribute via CQL
yet (the only user uses old-style cql3::selection::selector), so
we only supply a unit test.
While remaining backwards compatible, allow supplying custom timestamp/ttl
with each fake column value.
Note: I tried to use a formatter<> for the new data structure, but
got entangled in a template loop.
Initialization of `system_keyspace` is now all done at once instead of
being spread out through the entire procedure. This is doable because
`query_processor` is now available early. A couple of FIXMEs have been
resolved.
Take references to services which are initialized earlier. The
references to `gossiper`, `storage_service` and `raft_group0_registry`
are no longer needed.
This will allow us to move the `make` step right after starting
`system_keyspace`.
`initialize_virtual_tables` was called from `system_keyspace::make`,
which caused this `make` function to take a bunch of references to
late-initialized services (`gossiper`, `storage_service`).
Call it from `main`/`cql_test_env` instead.
Note: `system_keyspace::make` is called from
`distributed_loader::init_system_keyspace`. The latter function contains
additional steps: populate the system keyspaces (with data from
sstables) and mark their tables ready for writes.
None of these steps apply to virtual tables.
There exists at least one writable virtual table, but writes into
virtual tables are special and the implementation of writes is
virtual-table specific. The existing writable virtual table
(`db_config_table`) only updates in-memory state when written to. If a
virtual table would like to create sstables, or populate itself with
sstable data on startup, it will have to handle this in its own
initialization function.
Separating `initialize_virtual_tables` like this will allow us to
simplify `system_keyspace` initialization, making it independent of
services used for distributed communication.
Implement `expr:valuate()` for `expr::field_selection`.
`field_selection` is used to represent access to a struct field.
For example, with a UDT value:
```
CREATE TYPE my_type (a int, b int);
```
The expression `my_type_value.a` would be represented as a `field_selection`, which selects the field `a`.
Evaluating such an expression consists of finding the right element's value in a serialized UDT value and returning it.
Note that it's still not possible to use `field_selection` inside the `WHERE` clause. Enabling it would require changes to the grammar, as well as query planning, Current `statement_restrictions` just reacts with `on_internal_error` when it encounters a `field_selection`.
Nonetheless it's a step towards relaxing the grammar, and now it's finally possible to evaluate all kinds of prepared expressions (#12906)
Fixes: https://github.com/scylladb/scylladb/issues/12906Closes#14235
* github.com:scylladb/scylladb:
boost/expr_test: test evaluate(field_selection)
cql3/expr: fix printing of field_selection
cql3/expression: implement evaluate(field_selection)
types/user: modify idx_of_field to use bytes_view
column_identifer: add column_identifier_raw::text()
types: add read_nth_user_type_field()
types: add read_nth_tuple_element()
Both, make_sstable_easy() and make_sstable_containing() prepare memtable
by allocating it and applying mutations from vector. Make a local
helper. Many test cases can, probably, benefit from it too, but they
often do more stuff before applying mutation to memtable, so this is
left for future patching
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are two of them, one making sstable from memtable and the other
one doing the same from a custom reader. The former can just call the
latter with memtable's flat reader
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are two make_sstable_mutation_source() helpers that call one
another and test cases only need one of them, so leave just one that's
in use.
Also don't pass env's tempdir to make_sstable() util call, it can get
env's tempdir on its own.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
These local helpers accept writer config which's made the same way by
callers, so the helpers can do it on their own
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The function that prepares memtable from mutations vector can call its
overload that writes this memtable into an sstable
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The get_usable_sst() wrapper lambda is not needed, calling the
make_sstable_containing() is shorter
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There's a temporary memtable and on-stack lambda that makes the
mutation. Both are overkill, make_sstable_containing() can work on just
plan on-stack-constructed mutation
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are many cases that want to call make_sstable_containing() with
the vector of mutations at hand. For that they apply it to a temporary
memtable, but sstable-utils can work with the mutations vector as well
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The test `test_downgrade_after_successful_upgrade_fails` shuts down the whole cluster, reconfigures the nodes and then restarts. Apparently, the python driver sometimes does not handle this correctly; in one test run we observed that the driver did not manage to reconnect to any of the nodes, even though the nodes managed to start successfully.
More context can be found on the python driver issue.
This PR works around this issue by using the existing `reconnect_driver` function (which is a workaround for a _different_ python driver issue already) to help the driver reconnect after the full cluster restart.
Refs: scylladb/python-driver#230
Closes#14276
* github.com:scylladb/scylladb:
tests/topology: work around python driver issue in cluster feature tests
test/topology{_raft_disabled}: move reconnect_driver to topology utils
In https://github.com/scylladb/scylladb/pull/14231 we split `storage_proxy` initialization into two phases: for local and remote parts. Here we do the same with `query_processor`. This allows performing queries for local tables early in the Scylla startup procedure, before we initialize services used for cluster communication such as `messaging_service` or `gossiper`.
Fixes: #14202
As a follow-up we will simplify `system_keyspace` initialization, making it available earlier as well.
Closes#14256
* github.com:scylladb/scylladb:
main, cql_test_env: start `query_processor` early
cql3: query_processor: split `remote` initialization step
cql3: query_processor: move `migration_manager&`, `forwarder&`, `group0_client&` to a `remote` object
cql3: query_processor: make `forwarder()` private
cql3: query_processor: make `get_group0_client()` private
cql3: strongly_consistent_modification_statement: fix indentation
cql3: query_processor: make `get_migration_manager` private
tracing: remove `qp.get_migration_manager()` calls
table_helper: remove `qp.get_migration_manager()` calls
thrift: handler: move implementation of `execute_schema_command` to `query_processor`
data_dictionary: add `get_version`
cql3: statements: schema_altering_statement: move `execute0` to `query_processor`
cql3: statements: pass `migration_manager&` explicitly to `prepare_schema_mutations`
main: add missing `supervisor::notify` message
The test `test_downgrade_after_successful_upgrade_fails` stops all
nodes, reconfigures them to support the test-only feature and restarts
them. Unfortunately, it looks like python driver sometimes does not
handle this properly and might not reconnect after all nodes are shut
down.
This commit adds a workaround for scylladb/python-driver#230 - the test
re-creates python driver session right after nodes are restarted.
The `reconnect_driver` function will be useful outside the
`topology_raft_disabled` test suite - namely, for cluster feature tests
in `topology`. The best course of action for this function would be to
put it into pylib utils; however, the function depends on ManagerClient
which is defined in `test.pylib.manager_client` that depends on
`test.pylib.utils` - therefore we cannot put it there as it would cause
an import cycle. The `topology.utils` module sounds like the next best
thing.
In addition, the docstring comment is updated to reflect that this
function will now be used to work around another issue as well.
Pass `migration_manager&`, `forward_service&` and `raft_group0_client&`
in the remote init step which happens after the constructor.
Add a corresponding uninit remote step.
Make sure that any use of the `remote` services is finished before we
destroy the `remote` object by using a gate.
Thanks to this in a later commit we'll be able to move the construction
of `query_processor` earlier in the Scylla initialization procedure.
Scylla's output is often unnecessary to debug a failed test, or even
detrimental because one has to scroll back in the terminal after each
test run, to see the actual test's output. Add an option,
--omit-scylla-output, which when present on the command line of `run`,
the output of scylla will be omitted from the test output.
Also, to help discover this option (and others), don't run the tests
when either -h or --help is present on the command line. Just invoke
pytest (with said option) and exit.
Scenario:
1. Start a cluster with nodes node1, node2, node3
2. Start node4 replacing node node2
3. Stop node node4 after it joined group0 but before it advertised itself in gossip
4. Start node5 replacing node node2
Test simulates the behavior described in #13543.
Test passes only if `wait_for_peers_to_enter_synchronize_state` doesn't need to
resolve all IPs to return early. If not, node5 will hang trying to resolve the
IP of node4:
```
raft_group0_upgrade - : failed to resolve IP addresses of some of the cluster members ([node4's host ID])
```
memory_footprint_test fails with:
`sstable - writing sstables with too old format`
because it attempts to write the obsolete sstables formats,
for which the writer code has been long removed.
Fix that.
Closes#14265
Add a unit test which tests evaluating field selections.
Alas at the moment it's impossible to add a cql-pytest,
as the grammar and query planning doesn't handle field
selections inside the WHERE clause.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
CQL statements carry expressions in many contexts: the SELECT, WHERE, SET, and IF clauses, plus various attributes. Previously, each of these contexts had its own representation for an expression, and another one for the same expression but before preparation. We have been gradually moving towards a uniform representation of expressions.
This series tackles SELECT clause elements (selectors), in their unprepared phase. It's relatively simple since there are only five types of expression components (column references, writetime/ttl modifiers, function calls, casts, and field selections). Nevertheless, there isn't much commonality with previously converted expression elements so quite a lot of code is involved.
After the series, we are still left with a custom post-prepare representation of expressions. It's quite complicated since it deals with two passes, for aggregation, so it will be left for another series.
Closes#14219
* github.com:scylladb/scylladb:
cql3: seletor: drop inheritance from assignment_testable
cql3: selection: rely on prepared expressions
cql3: selection: prepare selector expressions
cql3: expr: match counter arguments to function parameters expecting bigint
cql3: expr: avoid function constant-folding if a thread is needed
cql3: add optional type annotation to assignment_testable
cql3: expr: wire unresolved_identifier to test_assignment()
cql3: expr: support preparing column_mutation_attribute
cql3: expr: support preparing SQL-style casts
cql3: expr: support preparing field_selection expressions
cql3: expr: make the two styles of cast expressions explicit
cql3: error injection functions: mark enabled_injections() as impure
cql3: eliminate dynamic_cast<selector> from functions::get()
cql3: test_assignment: pass optional schema everywhere
cql3: expr: prepare_expr(): allow aggregate functions
cql3: add checks for aggregation functions after prepare
cql3: expr: add verify_no_aggregate_functions() helper
test: add regression test for rejection of aggregates in the WHERE clause
cql3: expr: extract column_mutation_attribute_type
cql3: expr: add fmt formatter for column_mutation_attribute_kind
cql3: statements: select_statement: reuse to_selectable() computation in SELECT JSON
this series adds an option named "uuid_sstable_identifier_enabled", and the related cluster feature bit, which is set once all nodes in this cluster set this option to "true". and the sstable subsystem will start using timeuuid instead plain integer for the identifier of sstables. timeuuid should be a better choice for identifiers as we don't need to worry about the id conflicts anymore. but we still have quite a few tests using static sstables with integer in their names, these tests are not changed in this series. we will create some tests to exercise the sstable subsystem with this option set.
a very simple inter-op test with Cassandra 4.1.1 was also performed to verify that the generated sstables can be read by the Cassandra:
1. start scylla, and connect it with cqlsh, run following commands, and stop it
```
cqlsh> CREATE KEYSPACE ks WITH REPLICATION = { 'class' : 'SimpleStrategy','replication_factor':1} ;
cqlsh> CREATE TABLE ks.cf ( name text primary key, value text );
cqlsh> INSERT INTO ks.cf (name, value) VALUES ('1', 'one');
cqlsh> SELECT * FROM ks.cf;
```
2. enable Cassandra's `uuid_sstable_identifiers_enabled`, and start Cassandra 4.1.1, and connect it with cqlsh, run following commands, and stop it
```
cqlsh> CREATE KEYSPACE ks WITH REPLICATION = { 'class' : 'SimpleStrategy','replication_factor':1} ;
cqlsh> CREATE TABLE ks.cf ( name text primary key, value text );
cqlsh> INSERT INTO ks.cf (name, value) VALUES ('1', 'one');
cqlsh> SELECT * FROM ks.cf;
```
2. move away the sstables generated by Cassandra, and replace it with the sstables generated by scylladb:
```console
$ mv ~/cassandra/data/data/ks/cf-b29d23a009d911eeb5fed163c4d0af49 /tmp
$ mv ~/scylla/ks/cf-db47a12009d611eea6b8b179df3a2d5d ~/cassandra/data/data/ks/cf-b29d23a009d911eeb5fed163c4d0af49
```
3. start Cassandra 4.1.1 again, and connect it with cqlsh, run following commands
```
cqlsh> SELECT * FROM ks.cf;
name | value
------+-------
1 | one
```
Fixes https://github.com/scylladb/scylladb/issues/10459Closes#13932
* github.com:scylladb/scylladb:
replica,sstable: introduce invalid generation id
sstables, replica: pass uuid_sstable_identifiers to generation generator
gms/feature_service: introduce UUID_SSTABLE_IDENTIFIERS cluster feature
db: config: add uuid_sstable_identifiers_enabled option
sstables, replica: support UUID in generation_type
the invalid sstable id is the NULL of a sstable identifier. with
this concept, it would be a lot simpler to find/track the greatest
generation. the complexity is hidden in the generation_type, which
compares the a) integer-based identifiers b) uuid-based identifiers
c) invalid identitifer in different ways.
so, in this change
* the default constructor generation_type is
now public.
* we don't check for empty generation anymore when loading
SSTables or enumerating them.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we assume that generation is always integer based.
in order to enable the UUID-based generation identifier if the related
option is set, we should populate this option down to generation generator.
because we don't have access to the cluster features in some places where
a new generation is created, a new accessor exposing feature_service from
sstable manager is added.
Fixes#10459
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
this change generalize the value of generation_type so it also
supports UUID based identifier.
* sstables/generation_type.h:
- add formatter and parse for UUID. please note, Cassandra uses
a different format for formatting the SSTable identifier. and
this formatter suits our needs as it uses underscore "_" as the
delimiter, as the file name of components uses dash "-" as the
delimiter. instead of reinventing the formatting or just use
another delimiter in the stringified UUID, we choose to use the
Cassandra's formatting.
- add accessors for accessing the type and value of generation_type
- add constructor for constructing generation_type with UUID and
string.
- use hash for placing sstables with uuid identifiers into shards
for more uniformed distrbution of tables in shards.
* replica/table.cc:
- only update the generator if the given generation contains an
integer
* test/boost:
- add a simple test to verify the generation_type is able to
parse and format
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
This patch adds an xfailing test reproducing the bug in issue #12762:
When a SELECT uses a secondary index to list rows, if there is also a
PER PARTITION LIMIT given, Scylla forgets to apply it.
The test shows that the PER PARTITION LIMIT is correctly applied when
the index doesn't exist, but forgotten when the index is added.
In contrast, both cases work correctly in Cassandra.
This patch also adds a second variant of this test, which adds filtering
to the mix, and ensures that PER PARTITION LIMIT 1 doesn't give up on the
first row of each partition - but rather looks for the first row that
passes the filter, and only then moves on to the next partition.
Refs #12762.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#14248
Move the initialization of `storage_proxy` early in the startup procedure, before starting
`system_keyspace`, `messaging_service`, `gossiper`, `storage_service` and more.
As a follow-up, we'll be able to move initialization of `query_processor` right
after `storage_proxy` (but this requires a bit of refactoring in
`query_processor` too).
Local queries through `storage_proxy` can be done after the early initialization step.
In a follow-up, when we do a similar thing for `query_processor`, we'll be able
to perform local CQL queries early as well. (Before starting `gossiper` etc.)
Closes#14231
* github.com:scylladb/scylladb:
main, cql_test_env: initialize `storage_proxy` early
main, cql_test_env: initialize `database` early
storage_proxy: rename `init_messaging_service` to `start_remote`
storage_proxy: don't pass `gossiper&` and `messaging_service&` during initialization
storage_proxy: prepare for missing `remote`
storage_proxy: don't access `remote` during local queries in `query_partition_key_range_concurrent`
db: consistency_level: remove overload of `filter_for_query`
storage_proxy: don't access `remote` when calculating target replicas for local queries
storage_proxy: introduce const version of `remote()`
replica: table: introduce `get_my_hit_rate`
storage_proxy: `endpoint_filter`: remove gossiper dependency