Coverage handling is divided into 3 steps:
1. Generation of profiling data from a run of an instrumented file
(which this patch doesn't cover)
2. Processing of profiling data, which involves indexing the profile and
producing the data in some format that can be manipulated and
unified.
3. Generate some reporting based on this data.
The following patch is aiming to deal with the last two steps by providing a
cli and a library for this end.
This patch adds two libraries:
1. `coverage_utils.py` which is a library for manipulating coverage
data, it also contains a cli for the (assumed) most common operations
that are needed in order to eventually generate coverage reporting.
2. `lcov_utils.py` - which is a library to deal with lcov format data,
which is a textual form containing a source dependant coverage data.
An example of such manipulation can be `coverage diff` operation
which produces a set like difference operation. cov_a - cov_b = diff
where diff is an lcov formated file containing coverage data for code
cov_a that is not covered at all in cov_b.
The libraries and cli main goal is to provide a unified way to handle
coverage data in a way that can be easily scriptable and extensible.
This will pave the way for automating the coverage reporting and
processing in test.py and in jenkins piplines (for example to also
process dtest or sct coverage reporting)
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
We aim to support code coverage reporting as part of our development
process, to this end, we will need the ability to "route" the dumped
profiles from scylla and unit test to a predetermined location.
We can consider profile data as logged data that should persist after
tests have been run.
For this we add two supported options to test.py:
--coverage - which means that all suits on all modes will participate in
coverage.
--coverage-mode - which can be used to "turn on" coverage support only
for some of the modes in this run.
The strategy chosen is to save the profile data in
`tmpdir`/mode/coverage/%m.profraw (ref:
https://clang.llvm.org/docs/SourceBasedCodeCoverage.html#running-the-instrumented-program)
This means that for every suite the profiling data of each object is
going to be merged into the same file (llvm claims to lock the file so
concurrency is fine).
More resolution than the suite level seems to not give us anything
useful (at least not at the moment). Moreover, it can also be achieved
by running a single test.
Data in the suite level will help us to detect suits that don't generate
coverage data at all and to fix this or to skip generating the profiles
for them.
Also added support of 'coverage' parameter in the `suite.yaml` file,
which can be used to disable coverage for a specific suite, this
parameter defaults to True but if a suite is known to not generate
profiles or the suite profile data is not needed or obfuscate the result
it can be set to false in order to cancel profiles routing and
processing for this suite.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
We already have a dedicated coverage build, however, this build is
dedicated mostly for coverage in boost and standalone unit tests.
This added configuration option will compile every configured
build mode with coverage profiling support (excluding 'coverage' mode).
It also does targeted profiling that is narrowed down only to ScyllaDB
code and doesn't instrument seastar and testing code, this should give
a more accurate coverage reporting and also impact performance less, as
one example, the reactor loop in seastar will not be profiled (along
with everything else).
The targeted profiling is done with the help of the newly added
`coverage_sources.list` file which excludes all seastar sub directories
from the profiling.
Also an extra measure is taken to make sure that the seastar
library will not be linked with the coverage framework
(so it will not dump confusing empty profiles).
Some of the seastar headers are still going to be included in the
profile since they are indirectly included by profiled source files in
order to remove them from the final report a processing step on the
resulting profile will need to take place.
A note about expected performance impact:
It is expected to have minimal impact on performance since the
instrumentation adds counter increments without locking.
Ref: https://clang.llvm.org/docs/UsersManual.html#cmdoption-fprofile-update
This means that the numbers themselves are less reliable but all covered
lines are guarantied to have at least non-zero value.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
If the joining node fails while handling the response from the
topology coordinator, it hangs even though it knows the join
operation has failed. Therefore, we ensure it shuts down in
this patch.
Additionally, we ensure that if the first join request response
was a rejection or the node failed while handling it, the
following acceptances by the (possibly different) coordinator
don't succeed. The node considers the join operation as failed.
We shouldn't add it to the cluster.
Fixesscylladb/scylladb#16333Closesscylladb/scylladb#16650
* github.com:scylladb/scylladb:
topology_coordinator: clarify warnings
raft topology: join: allow only the first response to be a succesful acceptance
storage_service: join_node_response_handler: fix indentation
raft topology: join: shut down a node on error in response handler
Service level controller updates itself in interval. However the interval time is hardcoded in main to 10 seconds and it leads to long sleeps in some of the tests.
This patch moves this value to `service_levels_interval_ms` command line option and sets this value to 0.5s in cql-pytest.
Closesscylladb/scylladb#16394
* github.com:scylladb/scylladb:
test:cql-pytest: change service levels intervals in tests
configure service levels interval
Refs #16757
Allows waiting for all previous and pending segment deletes to finish.
Useful if a caller of `discard_completed_segments` (i.e. a memtable
flush target) not only wants to ensure segments are clean and released,
but thoroughly deleted/recycled, and hence no treat to resurrecting
data on crash+restart.
Test included.
Closesscylladb/scylladb#16801
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define a formatter for auth::role_or_anonymous,
and remove their operator<<().
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16812
Currently to figure out if a topology request is complete a submitter
checks the topology state and tries to figure out from that the status
of the request. This is not exact. Lets look at rebuild handling for
instance. To figure out if request is completed the code waits for
request object to disappear from the topology, but if another rebuild
starts between the end of the previous one and the code noticing that
it completed the code will continue waiting for the next rebuild.
Another problem is that in case of operation failure there is no way to
pass an error back to the initiator.
This series solves those problems by assigning an id for each request and
tracking the status of each request in a separate table. The initiator
can query the request status from the table and see if the request was
completed successfully or if it failed with an error, which is also
evadable from the table.
The schema for the table is:
CREATE TABLE system.topology_requests (
id timeuuid PRIMARY KEY,
initiating_host uuid,
start_time timestamp,
done boolean,
error text,
end_time timestamp,
);
and all entries have TTL of one month.
The sstables replay_position in stats_metadata is
valid only on the originating node and shard.
Therefore, validate the originating host and shard
before using it in compaction or table truncate.
Fixes#10080
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closesscylladb/scylladb#16550
And call on_internal_error if process_upload_dir
is called for tablets-enabled keyspace as it isn't
supported at the moment (maybe it could be in the future
if we make sure that the sstables are confined to tablets
boundaries).
Refs #12775Fixes#16743
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closesscylladb/scylladb#16788
Now that we have explicit status for each request we may use it to
replace shutdown notification rpc. During a decommission, in
left_token_ring state, we set done to true after metadata barrier
that waits for all request to the decommissioning node to complete
and notify the decommissioning node with a regular barrier. At this
point the node will see that the request is complete and exit.
Instead of trying to guess if a request completed by looking into the
topology state (which is sometimes can be error prone) look at the
request status in the new topology_requests. If request failed report
a reason for the failure from the table.
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define a formatter for replica::memtable and
replica::memtable_entry, and remove their operator<<().
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16793
The goal of this PR is fix Scylla so that the dtest test_mvs_populating_from_existing_data, which starts to fail when enabling tablets, will pass.
The main fix (the second patch) is reverting code which doesn't work with tablets, and I explain why I think this code was not necessary in the first place.
Fixes#16598Closesscylladb/scylladb#16670
* github.com:scylladb/scylladb:
view: revert cleanup filter that doesn't work with tablets
mv: sleep a bit before view-update-generator restart
Local keyspaces do not need cleanup, and
keyspaces configured with tablets, where their
replication strategy is per-table do not support
cleanup.
In both cases, just skip their cleanup via the api.
Fixes#16738
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closesscylladb/scylladb#16785
Provide a unique ID for each topology request and store it the topology
state machine. It will be used to index new topology requests table in
order to retrieve request status.
The table has the following schema and will be managed by raft:
CREATE TABLE system.topology_requests (
id timeuuid PRIMARY KEY,
initiating_host uuid,
start_time timestamp,
done boolean,
error text,
end_time timestamp,
);
In case of an request completing with an error the "error" filed will be non empty when "done" is set to true.
To enable tablets replication one needs to turn on the (experimental) feature and specify the `initial_tablets: N` option when creating a keyspace. We want tablets to become default in the future and allow users to explicitly opt it out if they want to.
This PR solves this by changing the CREATE KEYSPACE syntax wrt tablets options. Now there's a new TABLETS options map and the usage is
* `CREATE KEYSPACE ...` will turn tablets on or off based on cluster feature being enabled/disabled
* `CREATE KEYSPACE ... WITH TABLETS = { 'enabled': false }` will turn tablets off regardless of what
* `CREATE KEYSPACE ... WITH TABLETS = { 'enabled': true }` will try to enable tablets with default configuration
* `CREATE KEYSPACE ... WITH TABLETS = { 'initial': <int> }` is now the replacement for `REPLICATION = { ... 'initial_tablets': <int> }` thing
fixes: #16319Closesscylladb/scylladb#16364
* github.com:scylladb/scylladb:
code: Enable tablets if cluster feature is enabled
test: Turn off tablets feature by default
test: Move test_tablet_drain_failure_during_decommission to another suite
test/tablets: Enable tables for real on test keyspace
test/tablets: Make timestamp local
cql3: Add feature service to as_ks_metadata_update()
cql3: Add feature service to ks_prop_defs::as_ks_metadata()
cql3: Add feature service to get_keyspace_metadata()
cql: Add tablets on/off switch to CREATE KEYSPACE
cql: Move initial_tablets from REPLICATION to TABLETS in DDL
network_topology_strategy: Estimate initial_tablets if 0 is set
keyspace objects are heavyweight and copies are immediately our-of-date,
so copying them is bad.
Fix by deleting the copy constructor and copy assignment operator. One
call site is fixed. This call site is safe since the it's only used
for accessing a few attributes (introduced in f70c4127c6).
Closesscylladb/scylladb#16782
When the reader is currently paused, it is resumed, fast-forwarded, then
paused again. The fast forwarding part can throw and this will lead to
destroying the reader without it being closed first.
Add a try-catch surrounding this part in the code. Also mark
`maybe_pause()` and `do_pause()` as noexcept, to make it clear why
that part doesn't need to be in the try-catch.
Fixes: #16606Closesscylladb/scylladb#16630
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define a formatter for service::cleanup_status,
and remove its operator<<().
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16778
This commit removes support for CentOS 7
from the docs.
The change applies to version 5.4,so it
must be backported to branch-5.4.
Refs https://github.com/scylladb/scylla-enterprise/issues/3502
In addition, this commit removes the information
about Amazon Linux and Oracle Linux, unnecessarily added
without request, and there's no clarity over which versions
should be documented.
Closesscylladb/scylladb#16279
Tablet keyspaces have per/table range ownership, which cannot currently
be expressed in a DESC CLUSTER statement, which describes range
ownership in the current keyspace (if set). Until we figure out how to
represent range ownership (tablets) of all tables of a keyspace, we
disable range ownership for tablet keyspaces.
Fixes: #16483Closesscylladb/scylladb#16713
This change is intended to remove the dependency to
operator<<(std::ostream&, const std::unordered_set<seastar::sstring>&)
from test/boost/cql_auth_query_test.cc.
It prepares the test for removal of the templated helpers.
Such removal is one of goals of the referenced issue that is linked below.
Refs: #13245
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closesscylladb/scylladb#16758
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we
* define a formatter for `auth::resource` and friends,
* update their callers of `operator<<` to use `fmt::print()`.
* drop `operator<<`, as they are not used anymore.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16765
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define a formatter for data_value, but its
its operator<<() is preserved as we are still using the generic
homebrew formatter for formatting std::vector, which in turn uses
operator<< of the element type.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16767
because the CMake-generated build.ninja is located under build/,
and it puts the `scylla` executable at build/$CMAKE_BUILD_TYPE/scylla,
instead of at build/$scylla_build_mode/scylla, so let's adapt to this
change accordingly.
we will promote this change to a shared place if we have similar
needs in other tests as well.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16775
If the TABLETS map is missing in the CREATE KEYSPACE statement the
tablets are anyway enabled if the respective cluster feature is enabled.
To opt-out keyspaces one may use TABLETS = { 'enabled': false } syntax.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Next patches will make per-keyspace initial_tables option really
optional and turn tablets ON when the feature is ON. This will break all
other tests' assumptions, that they are testing vnodes replication. So
turn the feature off by default, tests that do need tables will need to
explicitly enable this feature on their own
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
In its current location it will be started with 3 pre-created scylla
nodes with default features ON. Next patch will exclude `tablets` from
the default list, so the test needs to create servers on its own
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When started cql_test_env creates a test keyspace. Some tablets test
cases create a table in this keyspace, but misuse the whole feature. The
thing is that while tablets feature is ON in those test cases, the
keyspace itself doesn _not_ have the initial_tables option and thus
tablets are not enabled for the ks' table for real. Currently test cases
work just because this table is only used as a transparent table ID
placeholder. If turning on tablets for the keyspace, several test cases
would get broken for two reasons.
First, the tables map will no longer be empty on test start.
Second, applying changes to tablet metadata may not be visible, becase
test case uses "ranom" timestamp, that can be less that the initial
metadata mutations' timestamp.
This patch fixes all three places:
1. enables tables for the test keyspace
2. removes assumption that the initial metadata is empty
3. uses large enough timestamp for subsequent mutations
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Now the user can do
CREATE KEYSPACE ... WITH TABLETS = { 'enabled': false }
to turn tablets off. It will be useful in the future to opt-out keyspace
from tablets when they will be turned on by default based on cluster
features only.
Also one can do just
CREATE KEYSPACE ... WITH TABLETS = { 'enabled': true }
and let Scylla select the initial tablets value by its own
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This patch changes the syntax of enabling tablets from
CREATE KEYSPACE ... WITH REPLICATION = { ..., 'initial_tablets': <int> }
to be
CREATE KEYSPACE ... WITH TABLETS = { 'initial': <int> }
and updates all tests accordingly.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>