Load the command line arguments, if any, from suite.yaml, rather
than keep them hard-coded in test.py.
This is allows operations team to have easier access to these.
Note I had to sacrifice dynamic smp count for mutation_reader_test
(the new smp count is fixed at 3) since this is part
of test configuration now.
This way we can avoid iterating over all tests
to handle --repeat.
Besides, going forward the tests will be stored
in two places: in the global list of all tests,
for the runner, and per suite, for suite-based
reporting, so it's easier if TestSuite
if fully responsible for finding and adding tests.
Scan entire test/ for folders that contain suite.yaml,
and load tests from these folders. Skip the rest.
Each folder with a suite.yaml is expected to have a valid
suite configuration in the yaml file.
A suite is a folder with test of the same type. E.g.
it can be a folder with unit tests, boost tests, or CQL
tests.
The harness will use suite.yaml to create an appropriate
suite test driver, to execute tests in different formats.
It reduces the number of configurations to re-test when test.py is
modified. and simplifies usage of test.py in build tools, since you no
longer need to bother with extra arguments.
Going forward I'd like to make terminal output brief&tabular,
but some test details are necessary to preserve so that a failure
is easy to debug. This information now goes to the log file.
- open and truncate the log file on each harness start
- log options of each invoked test in the log, so that
a failure is easy to reproduce
- log test result in the log
Since tests are run concurrently, having an exact
trace of concurrent execution also helps
debugging flaky tests.
The storage_service struct is a collection of diverse things,
most of them requiring only on start and on stop and/or runing
on shard 0 (but is nonetheless sharded).
As a part of clearing this structure and generated by it inter-
-componenes dependencies, here's the sanitation of load_broadcaster.
Fixes#5582
... but only populate log on shard 0.
Migration manager callbacks are slightly assymetric. Notifications
for pre-create/update mutations are sent only on initiating shard
(neccesary, because we consider the mutations mutable).
But "created" callbacks are sent on all shards (immutable).
We must subscribe on all shards, but still do population of cdc table
only once, otherwise we can either miss table creat or populate
more than once.
v2:
- Add test case
Message-Id: <20200113140524.14890-1-calle@scylladb.com>
* seastar 36cf5c5ff0...3f3e117de3 (16):
> memcached: don't use C++17-only std::optional
> reactor: Comment why _backend is assigned in constructor body
> log: restore --log-to-stdout for backward compatibility
> used_size.hh: Include missing headers
> core: Move some code from reactor.cc to future.cc
> future-util: move parallel_for_each to future-util.cc
> task: stop wrapping tasks with unique_ptr
> Merge "Setup timer signal handler in backend constructor" from Pavel
Fixes#5524
> future: avoid a branch in future's move constructor if type is trivial
> utils: Expose used_size
> stream: Call get_future early
> future-util: Move parallel_for_each_state code to a .cc
> memcached: log exceptions
> stream: Delete dead code
> core: Turn pollable_fd into a simple proxy over pollable_fd_state.
> Merge "log to std::cerr" from Benny
This is the part of de-bloating storage_service.
The field in question is used to temporary keep the _token_metadata
value during shard-wide replication. There's no need to have it as
class member, any "local" copy is enough.
Also, as the size of token_metadata is huge, and invoke_on_all()
copies the function for each shard, keep one local copy of metadata
using do_with() and pass it into the invoke_on_all() by reference.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Reviewed-by: Asias He <asias@scylladb.com>
Message-Id: <20200113171657.10246-1-xemul@scylladb.com>
The query option always_return_static_content was added for lightweight
transations in commits e0b31dd273 (infrastructure) and 65b86d155e
(actual use). However, the flag was added unconditionally to
update_parameters::options. This caused it to be set for list
read-modify-write operations, not just for lightweight transactions.
This is a little wasteful, and worse, it breaks compatibility as old
nodes do not understand the always_return_static_content flag and
complain when they see it.
To fix, remove the always_return_static_content from
update_parameters::options and only set it from compare-and-swap
operations that are used to implement lightweight transactions.
Fixes#5593.
Reviewed-by: Gleb Natapov <gleb@scylladb.com>
Message-Id: <20200114135133.2338238-1-avi@scylladb.com>
The drain_in_progress variable here is the future that's set by the
drain() operation itself. Its promise is set when the drain() finishes.
The check for this future in the beginning of drain() is pointless.
No two drain()-s can run in parallels because of run_with_api_lock()
protection. Doing the 2nd drain after successfull 1st one is also
impossible due to the _operation_mode check. The 2nd drain after
_exceptioned_ (and thus incomplete) 1st one will deadlock, after
this patch will try to drain for the 2nd time, but that should by ok.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200114094724.23876-1-xemul@scylladb.com>
This change introduces system.clients table, which provides
information about CQL clients connected.
PK is the client's IP address, CK consists of outgoing port number
and client_type (which will be extended in future to thrift/alternator/redis).
Table supplies also shard_id and username. Other columns,
like connection_stage, driver_name, driver_version...,
are currently empty but exist for C* compatibility and future use.
This is an ordinary table (i.e. non-virtual) and it's updated upon
accepting connections. This is also why C*'s column request_count
was not introduced. In case of abrupt DB stop, the table should not persist,
so it's being truncated on startup.
Resolves#4820
"
Most of the code in `cell` and the `imr` infrastructure it is built on
is `noexcept`. This means that extra care must be taken to avoid rouge
exceptions as they will bring down the node. The changes introduced by
0a453e5d3a did just that - introduced rouge `std::bad_alloc` into this
code path by violating an undocumented and unvalidated assumption --
that fragment ranges passed to `cell::make_collection()` are nothrow
copyable and movable.
This series refactors `cell::make_collection()` such that it does not
have this assumption anymore and is safe to use with any range.
Note that the unit test included in this series, that was used to find
all the possible exception sources will not be currently run in any of
our build modes, due to `SEASTAR_ENABLE_ALLOC_FAILURE_INJECTION` not
being set. I plan to address this in a followup because setting this
flags fails other tests using the failure injection mechanism. This is
because these tests are normally run with the failure injection disabled
so failures managed to lurk in without anyone noticing.
Fixes: #5575
Refs: #5341
Tests: unit(dev, debug)
"
* 'data-cell-make-collection-exception-safety/v2' of https://github.com/denesb/scylla:
test: mutation_test: add exception safety test for large collection serialization
data/cell.hh: avoid accidental copies of non-nothrow copiable ranges
utils/fragment_range.hh: introduce fragment_range_view
We do not yet support the ScanIndexForward=false option for reversing
the sort order of a Query operation, as reported in issue #5153.
But even before implementing this feature, it is important that we
produce an error if a user attempts to use it - instead of outright
ignoring this parameter and giving the user wrong results. This is
what this patch does.
Before this patch, the reverse-order query in the xfailing test
test_query.py::test_query_reverse seems to succeed - yet gives
results in the wrong order. With this patch, the query itself fails -
stating that the ScanIndexForward=false argument is not supported.
Refs #5153
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200105113719.26326-1-nyh@scylladb.com>
Here's another theoretical problem, that involves 3 sequential calls
to respectively removenode, force_removenode and some other operation.
Let's walk through them
First goes the removenode:
run_with_api_lock
_operation_in_progress = "removenode"
storage_service::remove_node
sleep in replicating_nodes.empty() loop
Now the force_removenode can run:
run_with_no_api_lock
storage_service::force_removenode
check _operation_in_progress (not empty)
_force_remove_completion = true
sleep in _operation_in_progress.empty loop
Now the 1st call wakes up and:
if _force_remove_completion == true
throw <some exception>
.finally() handler in run_with_api_lock
_operation_in_progress = <empty>
At this point some other operation may start. Say, drain:
run_with_api_lock
_operation_in_progress = "drain"
storage_service::drain
...
go to sleep somewhere
No let's go back to the 1st op that wakes up from its sleep.
The code it executes is
while (!ss._operation_in_progress.empty()) {
sleep_abortable()
}
and while the drain is running it will never exit.
However (! and this is the core of the race) should the drain
operation happen _before_ the force_removenode, another check
for _operation_in_progress would have made the latter exit with
the "Operation drain is in progress, try again" message.
Fix this inconsistency by making the check for current operation
every wake-up from the sleep_abortable.
Fixes#5591
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Here's a theoretical problem, that involves 3 sequential calls
to respectively removenode, force_removenode and removenode (again)
operations. Let's walk through them
First goes the removenode:
run_with_api_lock
_operation_in_progress = "removenode"
storage_service::remove_node
sleep in replicating_nodes.empty() loop
Now the force_removenode can run:
run_with_no_api_lock
storage_service::force_removenode
check _operation_in_progress (not empty)
_force_remove_completion = true
sleep in _operation_in_progress.empty loop
Now the 1st call wakes up and:
if _force_remove_completion == true
_force_remove_completion = false
throw <some exception>
.finally() handler in run_with_api_lock
_operation_in_progress = <empty>
! at this point we have _force_remove_completion = false and
_operation_in_progress = <empty>, which opens the following
opportunity for the 3d removenode:
run_with_api_lock
_operation_in_progress = "removenode"
storage_service::remove_node
sleep in replicating_nodes.empty() loop
Now here's what we have in 2nd and 3rd ops:
1. _operation_in_progress = "removenode" (set by 3rd) prevents the
force_removenode from exiting its loop
2. _force_remove_completion = false (set by 1st on exit) prevents
the removenode from waiting on replicating_nodes list
One can start the 4th call with force_removenode, it will proceed and
wake up the 3rd op, but after it we'll have two force_removenode-s
running in parallel and killing each other.
I propose not to set _force_remove_completion to false in removenode,
but just exit and let the owner of this flag unset it once it gets
the control back.
Fixes#5590
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Other types do not have a wider accumulator at the moment.
And static_cast<accumulator_type>(ret) != _sum evaluates as
false for NaN/Inf floating point values.
Fixes#5586
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200112183436.77951-1-bhalevy@scylladb.com>
Now that atomic_cell_view and collection_mutation_view have
type-aware printers, we can use them in the type-aware atomic_cell_or_collection
printer.
Message-Id: <20191231142832.594960-1-avi@scylladb.com>
Merged pull request https://github.com/scylladb/scylla/pull/5533
from Avi Kivity:
canonical_mutation objects are used for schema reconciliation, which is a
fragile area and thus deserves some debugging help.
This series makes canonical_mutation objects printable.