It's only used by the single test and apparently exists since the times
seastar was missing the future::discard_result() sugar
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes#12803
Move long running topology tests out of `test_topology.py` and into their own files, so they can be run in parallel.
While there, merge simple schema tests.
Closes#12804
* github.com:scylladb/scylladb:
test/topology: rename topology test file
test/topology: lint and type for topology tests
test/topology: move topology ip tests to own file
test/topology: move topology test remove garbaje...
test/topology: move topology rejoin test to own file
test/topology: merge topology schema tests and...
test/topology: isolate topology smp params test
test/topology: move topology helpers to common file
group0 members to own file
Move slow test for removenode with nodes not present in group0 to a
server after a sudden stop to a separate file.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
`paxos_response_handler::learn_decision` was calling
`cdc_service::augment_mutation_call` concurrently with
`storage_proxy::mutate_internal`. `augment_mutation_call` was selecting
rows from the base table in order to create the preimage, while
`mutate_internal` was writing rows to the table. It was therefore
possible for the preimage to observe the update that it accompanied,
which doesn't make any sense, because the preimage is supposed to show
the state before the update.
Fix this by performing the operations sequentially. We can still perform
the CDC mutation write concurrently with the base mutation write.
`cdc_with_lwt_test` was sometimes failing in debug mode due to this bug
and was marked flaky. Unmark it.
Also fix a comment in `cdc_with_lwt_test`.
Fixes#12098Closes#12768
* github.com:scylladb/scylladb:
test/cql-pytest: test_cdc: regression test for #12098
test/cql: cdc_with_lwt_test: fix comment
service: storage_proxy: sequence CDC preimage select with Paxos learn
... move them to their own file.
Schema verification tests for restart, add, and hard stop of server can
be done with the same cluster. Merge them in the same test case.
While there, move them to a separate file to be run independently as
this is a slow test.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Improve logging by printing the cluster at the end of each test.
Stop performing operations like attempting queries or dropping keyspaces on dirty clusters. Dirty clusters might be completely dead and these operations would only cause more "errors" to happen after a failed test, making it harder to find the real cause of failure.
Mark cluster as dirty when a test that uses it fails - after a failed test, we shouldn't assume that the cluster is in a usable state, so we shouldn't reuse it for another test.
Rely on the `is_dirty` flag in `PythonTest`s and `CQLApprovalTest`s, similarly to what `TopologyTest`s do.
Closes#12652
* github.com:scylladb/scylladb:
test.py: rely on ScyllaCluster.is_dirty flag for recycling clusters
test/topology: don't drop random_tables keyspace after a failed test
test/pylib: mark cluster as dirty after a failed test
test: pylib, topology: don't perform operations after test on a dirty cluster
test/pylib: print cluster at the end of test
Force snapshot with schema changes while server down. Then verify schema when bringing back up the server.
Closes#12726
* github.com:scylladb/scylladb:
pytest/topology: check snapshot transfer
raft conf error injection for snapshot
test/pylib: one-shot error injection helper
Perform multiple LWT inserts to different keys ensuring none of them
observes a preimage.
On my machine this test reproduces the problem more than 50% of the time
in debug mode.
The helping wrapper facilitates the usage of sharded<sstable_directory> for several test cases and the helper and its callers had deserved some cleanup over time.
Closes#12791
* github.com:scylladb/scylladb:
sstable_directory_test: Reindent and de-multiline
sstable_directory_test: Enlighten and rename sstable_from_existing_file
sstable_directory_test: Remove constant parallelizm parameter
Merging rows from different partition versions should preserve the LRU link of
the entry from the newer version. We need this in case we're merging two last
dummy entries where the older dummy is already unlinked from the LRU. The
newer dummy could be the last entry which is still holding the partition
entry linked in the LRU.
The mutation_partition_v2 merging didn't take the LRU link from the newer
entry, and we could end up with the partition entry not having any entries
linked in the LRU.
Introduced in f73e2c992f.
Fixes#12778Closes#12785
System keyspace is a keyspace with local replication strategy and thus
it does not need to be repaired. It is possible to invoke repair
of this keyspace through the api, which leads to runtime error since
peer_events and scylla_table_schema_history have different sharding logic.
For keyspaces with local replication strategy repair_service::do_repair_start
returns immediately.
Closes#12459
* github.com:scylladb/scylladb:
test: rest_api: check if repair of system keyspace returns before corresponding task is created
repair: finish repair immediately on local keyspaces
Many tests using sstable directory wrapper have broken indentation with
previous patching. Fix it. No functional changes.
Also, while at it, convert multiline wrapper calls into one-line, after
previous patch these are short enough for that.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It used to be the sstable maker for sstable::test_env / cql_test_env,
now sstables for tests are made via sstables manager explicitly, so the
guy can be remaned to something more relevant to its current status.
Also, de-mark its constructors as explicit to make callers look shorter.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This reverts commit e7d5e508bc. It ends up
failing continuous integration tests randomly. We don't know if it's
uncovering an existing bug, or if RBNO itself is broken, but for now we
need to revert it to unblock progress.
Now it's scattered between dist. loader and sstable directory code making the latter quite bloated. Keeping everything in distributed loader makes the sstable_directory code compact and easier to patch to support object storage backend.
Closes#12771
* github.com:scylladb/scylladb:
sstable_directory: Rename remove_input_sstables_from_reshaping()
sstable_directory: Make use of remove_sstables() helper
sstable_directory: Merge output sstables collecting methods
distributed_loader: Remove max_compaction_threshold argument from reshard()
distributed_loader: Remove compaction_manager& argument from reshard()
sstable_directory: Move the .reshard() to distributed_loader
sstable_directory: Add helper to load foreign sstable
sstable_directory: Add io-prio argument to .reshard()
sstable_directory: Move reshard() to distributed_loader.cc
distributed_loader: Remove compaction_manager& argument from reshape()
sstable_directory: Move the .reshape() to distributed loader
sstable_directory: Add helper to retrive local sstables
sstable_directory: Add io-prio argument to .reshape()
sstable_directory: Move reshape() to distributed_loader.cc
CQL transport sever error handling fixes and improvements:
* log failed requests with `DEBUG` level for easier debugging;
* in case of unhandled errors, deliver them to the client as `SERVER_ERROR`'s
* fix for `protocol_error`'s in case of shedded big requests;
* explicit tests have been written for the error handling problems above.
Closes#11949
* github.com:scylladb/scylladb:
transport server: fix "request size too large" handling
transport server: log failed requests with debug level
transport server: fix unexpected server errors handling
transport server: log client errors with debug level
It unlinks unshared sstables filtering some of them out. Name it
according to what it does without mentioning reshape/reshard.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently it's called remove_input_sstables_from_resharding() but it's
just unlinks sstables in parallel from the given list. So rename it not
to mention reshard and also make use of this "new" helper in the
remove_input_sstables_from_reshaping(), it needs exactly the same
functionality.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are two of them collecting sstables from resharding and reshaping.
Both doing the same job except for the latter doesn't expect the list to
contain remote sstables.
This patch merges them together with the help of an extra sanity boolean
to check for the remote sstable not in the list. And renames the method
not to mention reshape/reshard.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
We have a cql3::expr::expression::printer wrapper that annotates
an expression with a debug_mode boolean prior to formatting. The
fmt library, however, provides a much simpler alterantive: a custom
format specifier. With this, we can write format("{:user}", expr) for
user-oriented prints, or format("{:debug}", expr) for debug-oriented
prints (if nothing is specified, the default remains debug).
This is done by implementing fmt::formatter::parse() for the
expression type, can using expression::printer internally.
Since sometimes we pass expression element types rather than
the expression variant, we also provide a custom formatter for all
ExpressionElement Types.
Uses for expression::printer are updated to use the nicer syntax. In
one place we eliminate a temporary that is no longer needed since
ExpressionElement:s can be formatted directly.
Closes#12702
Calling _read_buf.close() doesn't imply eof(), some data
may have already been read into kernel or client buffers
and will be returned next time read() is called.
When the _server._max_request_size limit was exceeded
and the _read_buf was closed, the process_request method
finished and we started processing the next request in
connection::process. The unread data from _read_buf was
treated as the header of the next request frame, resulting
in "Invalid or unsupported protocol version" error.
The existing test_shed_too_large_request was adjusted.
It was originally written with the assumption that the data
of a large query would simply be dropped from the socket
and the connection could be used to handle the
next requests. This behaviour was changed in scylladb#8800,
now the connection is closed on the Scylla side and
can no longer be used. To check there are no errors
in this case, we use Scylla metrics, getting them
from the Scylla Prometheus API.
If request processing ended with an error, it is worth
sending the error to the client through
make_error/write_response. Previously in this case we
just wrote a message to the log and didn't handle the
client connection in any way. As a result, the only
thing the client got in this case was timeout error.
A new test_batch_with_error is added. It is quite
difficult to reproduce error condition in a test,
so we use error injection instead. Passing injection_key
in the body of the request ensures that the exception
will be thrown only for this test request and
will not affect other requests that
the driver may send in the background.
Closes: scylladb#12104
Since the whole reshard() is local to dist. loader code now, the caller
of the reshard helper may let this method get the threshold itself
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Now all the reshading logic is accumulated in distributed loader and the
sstable_directory is just the place where sstables are collected.
The changes summary is:
- add sstable_directory as argument (used to be "this")
- replace all "this" captures with &dir ones
- remove temporary namespace gap and declaration from sst-dir class
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This is to generalize the code duplication between .reshard() and
existing .load_foreign_sstables() (plural form).
Make it coroutinized right at once.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Now it gets one from this-> but the method is becoming static one in
distributed_loader which only has it as an argument. That's not big deal
as the current IO class is going to be derived from current sched group,
so this extra arg will go away at all some day.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Now all the reshaping logic is accumulated in distributed loader and the
sstable_directory is just the place where sstables are collected.
The changes summary is:
- add sstable_directory as argument (used to be "this")
- replace all "this" captures with &dir ones
- remove temporary namespace gap and declaration from sst-dir class
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are methods to retrive shared local sstables and foreign sstables,
so here's one more to the family
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Now it gets one from this-> but the method is becoming static one in
distributed_loader which only has it as an argument. That's not big deal
as the current IO class is going to be derived from current sched group,
so this extra arg will go away at all some day.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
`paxos_response_handler::learn_decision` was calling
`cdc_service::augment_mutation_call` concurrently with
`storage_proxy::mutate_internal`. `augment_mutation_call` was selecting
rows from the base table in order to create the preimage, while
`mutate_internal` was writing rows to the table. It was therefore
possible for the preimage to observe the update that it accompanied,
which doesn't make any sense, because the preimage is supposed to show
the state before the update.
Fix this by performing the operations sequentially. We can still perform
the CDC mutation write concurrently with the base mutation write.
`cdc_with_lwt_test` was sometimes failing in debug mode due to this bug
and was marked flaky. Unmark it.
Fixes#12098
Test snapshot transfer by reducing the snapshot threshold on initial
servers (3 and 1 trailing).
Then creates a table, and does 3 extra schema changes (add column),
triggering at least 2 snapshots.
Then brings a new server to the cluster, which will get the schema
through a snapshot.
Then the test stops the initial servers and verifies the table
schema is up to date on the new server.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Let the initial range passed to query_partition_key_range
be [1, 2) where 2 is the successor of 1 in terms
of ring_position order and 1 is equal to vnode.
Then query_ranges_to_vnodes_generator() -> [[1, 1], (1, 2)],
so we get an empty range (1,2) and subsequently will
make a data request with this empty range in
storage_proxy::query_partition_key_range_concurrent,
which will be redundant.
The patch adds a check for this condition after
making a split in the main loop in process_one_range.
The patch does not attempt to handle cases where the
original ranges were empty, since this check is the
responsibility of the caller. We only take care
not to add empty ranges to the result as an
unintentional artifact of the algorithm in
query_ranges_to_vnodes_generator.
A test case is added in test_get_restricted_ranges.
The helper lambda check is changed so that not to limit
the number of ranges to the length of expected
ranges, otherwise this check passes without
the change in process_one_range.
Fixes: #12566Closes#12755
request_controller_timeout_exception_factory::timeout() creates an
instance of `request_controller_timed_out_error` whose ctor is
default-created by compiler from that of timed_out_error, which is
in turn default-created from the one of `std::exception`. and
`std::exception::exception` does not throw. so it's safe to
mark this factory method `noexcept`.
with this specifier, we don't need to worry about the exception thrown
by it, and don't need to handle them if any in `seastar::semaphore`,
where `timeout()` is called for the customized exception.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#12759
- main: consider EDQUOT as environmental failure also
- main: use defer_verbose_shutdown() to shutdown compaction manager
- replica/table: extract should_retry() int with_retry
- replica/table: retry on EDQUOT when flushing memtable
Fixes#12626Closes#12653
* github.com:scylladb/scylladb:
replica/table: retry on EDQUOT when flushing memtable
replica/table: extract should_retry() int with_retry
main: use defer_verbose_shutdown() to shutdown compaction manager
main: consider EDQUOT as environmental failure also