group0 members to own file
Move slow test for removenode with nodes not present in group0 to a
server after a sudden stop to a separate file.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
... move them to their own file.
Schema verification tests for restart, add, and hard stop of server can
be done with the same cluster. Merge them in the same test case.
While there, move them to a separate file to be run independently as
this is a slow test.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Force snapshot with schema changes while server down. Then verify schema when bringing back up the server.
Closes#12726
* github.com:scylladb/scylladb:
pytest/topology: check snapshot transfer
raft conf error injection for snapshot
test/pylib: one-shot error injection helper
The helping wrapper facilitates the usage of sharded<sstable_directory> for several test cases and the helper and its callers had deserved some cleanup over time.
Closes#12791
* github.com:scylladb/scylladb:
sstable_directory_test: Reindent and de-multiline
sstable_directory_test: Enlighten and rename sstable_from_existing_file
sstable_directory_test: Remove constant parallelizm parameter
Merging rows from different partition versions should preserve the LRU link of
the entry from the newer version. We need this in case we're merging two last
dummy entries where the older dummy is already unlinked from the LRU. The
newer dummy could be the last entry which is still holding the partition
entry linked in the LRU.
The mutation_partition_v2 merging didn't take the LRU link from the newer
entry, and we could end up with the partition entry not having any entries
linked in the LRU.
Introduced in f73e2c992f.
Fixes#12778Closes#12785
System keyspace is a keyspace with local replication strategy and thus
it does not need to be repaired. It is possible to invoke repair
of this keyspace through the api, which leads to runtime error since
peer_events and scylla_table_schema_history have different sharding logic.
For keyspaces with local replication strategy repair_service::do_repair_start
returns immediately.
Closes#12459
* github.com:scylladb/scylladb:
test: rest_api: check if repair of system keyspace returns before corresponding task is created
repair: finish repair immediately on local keyspaces
Many tests using sstable directory wrapper have broken indentation with
previous patching. Fix it. No functional changes.
Also, while at it, convert multiline wrapper calls into one-line, after
previous patch these are short enough for that.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It used to be the sstable maker for sstable::test_env / cql_test_env,
now sstables for tests are made via sstables manager explicitly, so the
guy can be remaned to something more relevant to its current status.
Also, de-mark its constructors as explicit to make callers look shorter.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Now it's scattered between dist. loader and sstable directory code making the latter quite bloated. Keeping everything in distributed loader makes the sstable_directory code compact and easier to patch to support object storage backend.
Closes#12771
* github.com:scylladb/scylladb:
sstable_directory: Rename remove_input_sstables_from_reshaping()
sstable_directory: Make use of remove_sstables() helper
sstable_directory: Merge output sstables collecting methods
distributed_loader: Remove max_compaction_threshold argument from reshard()
distributed_loader: Remove compaction_manager& argument from reshard()
sstable_directory: Move the .reshard() to distributed_loader
sstable_directory: Add helper to load foreign sstable
sstable_directory: Add io-prio argument to .reshard()
sstable_directory: Move reshard() to distributed_loader.cc
distributed_loader: Remove compaction_manager& argument from reshape()
sstable_directory: Move the .reshape() to distributed loader
sstable_directory: Add helper to retrive local sstables
sstable_directory: Add io-prio argument to .reshape()
sstable_directory: Move reshape() to distributed_loader.cc
CQL transport sever error handling fixes and improvements:
* log failed requests with `DEBUG` level for easier debugging;
* in case of unhandled errors, deliver them to the client as `SERVER_ERROR`'s
* fix for `protocol_error`'s in case of shedded big requests;
* explicit tests have been written for the error handling problems above.
Closes#11949
* github.com:scylladb/scylladb:
transport server: fix "request size too large" handling
transport server: log failed requests with debug level
transport server: fix unexpected server errors handling
transport server: log client errors with debug level
We have a cql3::expr::expression::printer wrapper that annotates
an expression with a debug_mode boolean prior to formatting. The
fmt library, however, provides a much simpler alterantive: a custom
format specifier. With this, we can write format("{:user}", expr) for
user-oriented prints, or format("{:debug}", expr) for debug-oriented
prints (if nothing is specified, the default remains debug).
This is done by implementing fmt::formatter::parse() for the
expression type, can using expression::printer internally.
Since sometimes we pass expression element types rather than
the expression variant, we also provide a custom formatter for all
ExpressionElement Types.
Uses for expression::printer are updated to use the nicer syntax. In
one place we eliminate a temporary that is no longer needed since
ExpressionElement:s can be formatted directly.
Closes#12702
Calling _read_buf.close() doesn't imply eof(), some data
may have already been read into kernel or client buffers
and will be returned next time read() is called.
When the _server._max_request_size limit was exceeded
and the _read_buf was closed, the process_request method
finished and we started processing the next request in
connection::process. The unread data from _read_buf was
treated as the header of the next request frame, resulting
in "Invalid or unsupported protocol version" error.
The existing test_shed_too_large_request was adjusted.
It was originally written with the assumption that the data
of a large query would simply be dropped from the socket
and the connection could be used to handle the
next requests. This behaviour was changed in scylladb#8800,
now the connection is closed on the Scylla side and
can no longer be used. To check there are no errors
in this case, we use Scylla metrics, getting them
from the Scylla Prometheus API.
If request processing ended with an error, it is worth
sending the error to the client through
make_error/write_response. Previously in this case we
just wrote a message to the log and didn't handle the
client connection in any way. As a result, the only
thing the client got in this case was timeout error.
A new test_batch_with_error is added. It is quite
difficult to reproduce error condition in a test,
so we use error injection instead. Passing injection_key
in the body of the request ensures that the exception
will be thrown only for this test request and
will not affect other requests that
the driver may send in the background.
Closes: scylladb#12104
Now it gets one from this-> but the method is becoming static one in
distributed_loader which only has it as an argument. That's not big deal
as the current IO class is going to be derived from current sched group,
so this extra arg will go away at all some day.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Test snapshot transfer by reducing the snapshot threshold on initial
servers (3 and 1 trailing).
Then creates a table, and does 3 extra schema changes (add column),
triggering at least 2 snapshots.
Then brings a new server to the cluster, which will get the schema
through a snapshot.
Then the test stops the initial servers and verifies the table
schema is up to date on the new server.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Let the initial range passed to query_partition_key_range
be [1, 2) where 2 is the successor of 1 in terms
of ring_position order and 1 is equal to vnode.
Then query_ranges_to_vnodes_generator() -> [[1, 1], (1, 2)],
so we get an empty range (1,2) and subsequently will
make a data request with this empty range in
storage_proxy::query_partition_key_range_concurrent,
which will be redundant.
The patch adds a check for this condition after
making a split in the main loop in process_one_range.
The patch does not attempt to handle cases where the
original ranges were empty, since this check is the
responsibility of the caller. We only take care
not to add empty ranges to the result as an
unintentional artifact of the algorithm in
query_ranges_to_vnodes_generator.
A test case is added in test_get_restricted_ranges.
The helper lambda check is changed so that not to limit
the number of ranges to the length of expected
ranges, otherwise this check passes without
the change in process_one_range.
Fixes: #12566Closes#12755
This patch fixes#12475, where an aggregation (e.g., COUNT(*), MIN(v))
of absolutely no partitions (e.g., "WHERE p = null" or "WHERE p in ()")
resulted in an internal error instead of the "zero" result that each
aggregator expects (e.g., 0 for COUNT, null for MIN).
The problem is that normally our aggregator forwarder picks the nodes
which hold the relevant partition(s), forwards the request to each of
them, and then combines these results. When there are no partitions,
the query is sent to no node, and we end up with an empty result set
instead of the "zero" results. So in this patch we recognize this
case and build those "zero" results (as mentioned above, these aren't
always 0 and depend on the aggregation function!).
The patch also adds two tests reproducing this issue in a fairly general
way (e.g., several aggregators, different aggregation functions) and
confirming the patch fixes the bug.
The test also includes two additional tests for COUNT aggregation, which
uncovered an incompatibility with Cassandra which is still not fixed -
so these tests are marked "xfail":
Refs #12477: Combining COUNT with GROUP by results with empty results
in Cassandra, and one result with empty count in Scylla.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#12715
This patch adds three additional tests for empty (e.g., empty string)
clustering keys.
The first test disproves a worry that was raised in #12561 that perhaps
empty clustering keys only seem work, but they don't get written to
sstables. The new test verifies that there is no bug - they are written
and can be read correctly.
The second and third test reproduce issue #12749 - an empty clustering
should be allowed in a COMPACT STORAGE table only if there is a compound
(multi-column) clustering key. But as the tests demonstrate, 1. if there
is just one clustering column, Scylla gives the wrong error message, and
2. if there is a compound clustering key, Scylla doesn't allow an empty
key as it should.
As usual, all tests pass on Cassandra. The last two tests fail on
Scylla, so are marked xfail.
Refs #12561
Refs #12749
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#12750
We currently have two method families to generate partition keys:
* make_keys() in test/lib/simple_schema.hh
* token_generation_for_shard() in test/lib/sstable_utils.hh
Both work only for schemas with a single partition key column of `text` type and both generate keys of fixed size.
This is very restrictive and simplistic. Tests, which wanted anything more complicated than that had to rely on open-coded key generation.
Also, many tests started to rely on the simplistic nature of these keys, in particular two tests started failing because the new key generation method generated keys of varying size:
* sstable_compaction_test.sstable_run_based_compaction_test
* sstable_mutation_test.test_key_count_estimation
These two tests seems to depend on generated keys all being of the same size. This makes some sense in the case of the key count estimation test, but makes no sense at all to me in the case of the sstable run test.
Closes#12657
* github.com:scylladb/scylladb:
test/lib/sstable_utils: remove now unused token_generation_for_shard() and friends
test/lib/simple_schema: remove now unused make_keys() and friends
test: migrate to tests::generate_partition_key[s]()
test/lib/test_services: add table_for_tests::make_default_schema()
test/lib: add key_utils.hh
test/lib/random_schema.hh: value_generator: add min_size_in_bytes
This patch fixes 2 issues with checking whether UDFs are used in UDAs:
1) UDFs types are not considered during the check, which prevents us from dropping a UDF that isn't used in any UDAs, but shares its name with one of them.
2) the REDUCEFUNC is not considered during the check, which allows dropping a UDF even though it's used in a UDA as the REDUCEFUNC.
Additionally, tests for these issues are added
Closes#12681
* github.com:scylladb/scylladb:
udf: also check reducefunc to confirm that a UDF is not used in a UDA
udf: fix dropping UDFs that share names with other UDFs used in UDAs
pytest: add optional argument for new_function argument types
Fixes: https://github.com/scylladb/scylladb/issues/12309Closes#12720
* github.com:scylladb/scylladb:
service/raft: raft_group_registry: use recent_entries_map to store rate_limits in pinger. Fixes#12309
utils: introduce recent_entries_map datatype to track least recent visited entries.
When dropping a UDF we're checking if it's not begin used in any UDAs
and fail otherwise. However, we're only checking its state function
and final function, and it may also be used as its reduce function.
This patch adds the missing checks and a test for them.
Currently, when dropping a function, we only check if there exist
an aggregate that uses a function with the same name as its state
function or final function. This may cause the drop to fail even
when it's just another UDF with the same name that's used in the
aggregate, even when the actual dropped function is not used there.
This patch fixes this by checking whether not only the name of the
UDA's sfunc and finalfunc, but also their argument types.
When multiple functions with the same name but different argument types
are created, the default drop statement for these functions will fail
because it does not include the argument types.
With this change, this problem can be worked around by specifying
argument types when creating the function, as this will cause the drop
statement to include them.
The system_keyspace defines several auxiliary methods to help view_builder update system.scylla_views_builds_in_progress and system.built_views tables. All use global qctx thing.
It only takes adding view_builder -> system_keyspace dependency in order to de-static all those helpers and let them use query-processor from it, not the qctx.
Closes#12728
* github.com:scylladb/scylladb:
system_keysace: De-static calls that update view-building tables
storage_service: Coroutinize mark_existing_views_as_built()
api: Unset column_famliy endpoints
api: Carry sharded<db::system_keyspace> reference over
view_builder: Add system_keyspace dependency
Since 97bb2e47ff (storage_service: Enable Repair Based Node Operations (RBNO) by default for replace), RBNO was enabled by default for replace ops.
After more testing, we decided to enable repair based node operations by default for all node operations.
Closes#12173
* github.com:scylladb/scylladb:
storage_service: Enable Repair Based Node Operations (RBNO) by default for all node ops
test: Increase START_TIMEOUT
test: Increase max-networking-io-control-blocks
storage_service: Check node has left in node_ops_cmd::decommission_done
repair: Use remote dc neighbors for everywhere strategy
Introduces task manager's compaction module. That's an initial
part of integration of compaction with task manager.
When fully integrated, task manager will allow user to track compaction
operations, check status and progress of each individual one. It will help
with creating an asynchronous version of rest api that forces any compaction.
Currently, users can see with /task_manager/list_modules api call that
compaction is one of the modules accessible through task manager.
They won't get any additional information though, since compaction
tasks are not created yet.
A shared_ptr to compaction module is kept in compaction manager.
Closes#12635
* github.com:scylladb/scylladb:
compaction: test: pass task_manager to compaction_manager in test environment
compaction: create and register task manager's module for compaction
tasks: add task_manager constructor without arguments
Makes sstable_set::all() interface robust, and introduces sstable_set::size() to avoid copies when retrieving set size.
Closes#12716
* github.com:scylladb/scylladb:
treewide: Use new sstable_set::size() wherever possible
sstables: Introduce sstable_set::size()
sstables: Fix fragility of sstable_set::all() interface
This series switches memtable and cache to use a new representation for mutation data,
called `mutation_partition_v2`. In this representation, range tombstone information is stored
in the same tree as rows, attached to row entries. Each entry has a new tombstone field,
which represents range tombstone part which applies to the interval between this entry and
the previous one. See docs/dev/mvcc.md for more details about the format.
The transient mutation object still uses the old model in order to avoid work needed to adapt
old code to the new model. It may also be a good idea to live with two models, since the
transient mutation has different requirements and thus different trade-offs can be made.
Transient mutation doesn't need to support eviction and strong exception guarantees,
so its algorithms and in-memory representation can be simpler.
This allows us to incrementally evict range tombstone information. Before this series,
range tombstones were accumulated and evicted only when the whole partition entry was evicted. This
could lead to inefficient use of cache memory.
Another advantage of the new representation is that reads don't have to lookup
range tombstone information in a different tree while reading. This leads to simpler
and more efficient readers.
There are several disadvantages too. Firstly, rows_entry is now larger by 16 bytes.
Secondly, update algorithms are more complex because they need to deoverlap range tombstone
information. Also, to handle preemption and provide strong exception guarantees, update
algorithms may need to allocate sentinel entries, which adds complexity and reduces performance.
The memtable reader was changed to use the same cursor implementation
which cache uses, for improved code reuse and reducing risk of bugs
due to discrepancy of algorithms which deal with MVCC.
Remaining work:
- performance optimizations to apply_monotonically() to avoid regressions
- performance testing
- preemption support in apply_to_incomplete (cache update from memtable)
Fixes#2578Fixes#3288Fixes#10587Closes#12048
* github.com:scylladb/scylladb:
test: mvcc: Extend some scenarios with exhaustive consistency checks on eviction
test: mvcc: Extract mvcc_container::allocate_in_region()
row_cache, lru: Introduce evict_shallow()
test: mvcc: Avoid copies of mutation under failure injection
test: mvcc: Add missing logalloc::reclaim_lock to test_apply_is_atomic
mutation_partition_v2: Avoid full scan when applying mutation to non-evictable
Pass is_evictable to apply()
tests: mutation_partition_v2: Introduce test_external_memory_usage_v2 mirroring the test for v1
tests: mutation: Fix test_external_memory_usage() to not measure mutation object footprint
tests: mutation_partition_v2: Add test for exception safety of mutation merging
tests: Add tests for the mutation_partition_v2 model
mutation_partition_v2: Implement compact()
cache_tracker: Extract insert(mutation_partition_v2&)
mvcc, mutation_partition: Document guarantees in case merging succeeds
mutation_partition_v2: Accept arbitrary preemption source in apply_monotonically()
mutation_partition_v2: Simplify get_continuity()
row_cache: Distinguish dummy insertion site in trace log
db: Use mutation_partition_v2 in mvcc
range_tombstone_change_merger: Introduce peek()
readers: Extract range_tombstone_change_merger
mvcc: partition_snapshot_row_cursor: Handle non-evictable snapshots
mvcc: partition_snapshot_row_cursor: Support digest calculation
mutation_partition_v2: Store range tombstones together with rows
db: Introduce mutation_partition_v2
doc: Introduce docs/dev/mvcc.md
db: cache_tracker: Introduce insert() variant which positions before existing entry in the LRU
db: Print range_tombstone bounds as position_in_partition
test: memtable_test: Relax test_segment_migration_during_flush
test: cache_flat_mutation_reader: Avoid timestamp clash
test: cache_flat_mutation_reader_test: Use monotonic timestamps when inserting rows
test: mvcc: Fix sporadic failures due to compact_for_compaction()
test: lib: random_mutation_generator: Produce partition tombstone less often
test: lib: random_utils: Introduce with_probability()
test: lib: Improve error message in has_same_continuity()
test: mvcc: mvcc_container: Avoid UB in tracker() getter when there is no tracker
test: mvcc: Insert entries in the tracker
test: mvcc_test: Do not set dummy::no on non-clustering rows
mutation_partition: Print full position in error report in append_clustered_row()
db: mutation_cleaner: Extract make_region_space_guard()
position_in_partition: Optimize equality check
mvcc: Fix version merging state resetting
mutation_partition: apply_resume: Mark operator bool() as explicit
There's a bunch of them used by mainly view_builder and also by the API
and storage_service. All use global qctx to make its job, now when the
callers have main-local sharded<system_keysace> references they can be
made non-static.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The view builder updates system.scylla_views_builds_in_progress and
.built_views tables and thus needs the system keyspace instance.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Each instance of compaction manager should have compaction module pointer
initialized. All contructors get task_manager reference with which
the module is created.
Preferred aternative to sstable_set->all()->size(), which may
involve of copy elements from a single set or multiple ones
if compound_sstable_set is used.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
New test/lib/scylla_test_case.hh, introduced in "tests: Add command line options for Scylla unit tests",
allows extension of the command line options provided by Seastar testing framework.
It allows all boost tests to process additional options without changing a single line of code.
Patch "test: Add x-log2-compaction-groups to Scylla test command line options" builds on that, allowing
all test cases to run with N compaction groups. Again, without changing a line of code in the tests.
Now all you have to do is:
./build/dev/test/boost/sstable_compaction_test -- --smp 1 --x-log2-compaction-groups 1
./test.py --mode=dev --x-log2-compaction-groups 1 --verbose
And it will run the test cases with as many groups as you wish.
./test.py passes successfully with parameter --x-log2-compaction-groups 1.
Closes#12369
* github.com:scylladb/scylladb:
test.py: Add option to run scylla tests with multiple compaction groups
test: Add x-log2-compaction-groups to Scylla test command line options
test: Enable Scylla test command line options for boost tests
tests: Add command line options for Scylla unit tests
replica: table: Add debug log for number of compaction groups
test: sstable_compaction_test: Fix indentation
test: sstable_compaction_test: Make it work with compaction groups
test: test_bloom_filter: Fix it with multiple compaction groups
test: memtable_test: Fix it with multiple compaction groups
Existing helper with async context manager only worked for non one-shot
error injections. Fix it and add another helper for one-shot without a
context manager.
Fix tests using the previous helper.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
`evaluation_inputs` is a struct which contains data needed to evaluate expressions - values of columns, bind variables and other data.
`is_on_of()` is a function used to to evaluate `IN` restrictions. It checks whether the LHS is one of elements on the RHS list.
Generally when evaluating expressions we get the `evaluation_inputs` as an argument and we should pass them along to any functions that evaluate subexpressions.
`is_one_of()` got the inputs as an argument, but didn't pass them along to `equal()`, instead it creates new empty `evaluation_inputs{}` and gives that to `equal()`.
At first [I thought this was a bug](https://github.com/scylladb/scylladb/pull/12356#discussion_r1084300969) - with missing information there could be a crash if `equal()` tried to evaluate an expression with a `bind_variable`.
It turns out that in this particular case `equal()` won't use the `evaluation_inputs` at all. The LHS and RHS passed to it are just constant values, which were already evaluated to serialized bytes before calling `evaluate()`, so there is no bug.
It's still better to pass the inputs argument along if possible. If in the future `equal()` required these inputs for some reason, missing inputs could lead to an unexpected crash.
I couldn't find any tests that would detect this case, so such a bug could stay undetected until an unhappy user finds it because their cluster crashed.
I added some tests to make sure that it's covered from now on.
Closes#12701
* github.com:scylladb/scylladb:
cql-pytest: test filtering using list with bind variable
test/expr_test: test <int_value> IN (123, ?, 456)
cql3: expr: don't pass empty evaluation_inputs in is_one_of
The first patch in this series enables a previously-skipped test for what happens with Limit=0. The test passes.
The second patch adds an xfailng test for very large Limit.
Closes#12625
* github.com:scylladb/scylladb:
test/alternator: xfailing test for huge Limit in ListStreams
alternator/test: un-skip test of zero Limit in ListStreams
The tests can now optionally run with multiple groups via option
--x-log2-compaction-groups.
This includes boost tests and the ones which run against either
one (e.g. cql) or many instances (e.g. topology).
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Now any boost test can run with multiple compaction groups by default,
without any change in the boost test cases whatsoever.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>