Commit Graph

26669 Commits

Avi Kivity
e8e4456ec7 Merge 'Introduce per-service-level workload types and their first use-case - shedding in interactive workloads' from Piotr Sarna
This draft extends and obsoletes #8123 by introducing a way of determining the workload type from service level parameters, and then using this context to qualify requests for shedding.

The rough idea is that when the admission queue in the CQL server is hit, it might make more sense to start shedding surplus requests instead of accumulating them on the semaphore. The assumption is that interactive workloads are more interested in the success rate of as many requests as possible, and hanging on a semaphore reduces the chances for a request to succeed. Thus, it may make sense to shed some requests to reduce the load on this coordinator and let the existing requests finish.

It's a draft, because I only performed local guided tests. #8123 was followed by some experiments on a multinode cluster which I want to rerun first.

Closes #8680

* github.com:scylladb/scylla:
  test: add a case for conflicting workload types
  cql-pytest: add basic tests for service level workload types
  docs: describe workload types for service levels
  sys_dist_ks: fix redundant parsing in get_service_level
  sys_dist_ks: make get_service_level exception-safe
  transport: start shedding requests during potential overload
  client_state: hook workload type from service levels
  cql3: add listing service level workload type
  cql3: add persisting service level workload type
  qos: add workload_type service level parameter
2021-05-27 17:01:56 +03:00
Avi Kivity
f3e8e625c0 Update tools/java submodule (toppartitions single jmx call)
* tools/java fd92603b99...599b2368d6 (1):
  > toppartitions: Fix toppartitions to only jmx once

Ref #8459.
2021-05-27 16:57:57 +03:00
Konstantin Osipov
52f7ff4ee4 raft: (testing) update copyright
Incorrect copyright information was copy-pasted
from another test file.

Message-Id: <20210525183919.1395607-1-kostja@scylladb.com>
2021-05-27 15:47:49 +03:00
Nadav Har'El
92b7a84e90 secondary index: in error message, call UDT as UDT
It is forbidden to create a secondary index on a column which includes,
in any way, the "duration" type. This includes a UDT which includes a
duration. In this case our code attempted to print the message
"Secondary indexes are not supported on UDTs containing durations" -
but because we tested for tuples first, and UDTs are also tuples - we
got the message about tuples.

By changing the order of the tests, we get the most specific (and
useful) error message.
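
The ordering fix can be sketched in Python; this is a hypothetical illustration of the check order, not Scylla's actual validation code, and the type model (a UDT counting as a tuple) is assumed from the description above:

```python
def index_error_message(kinds):
    """Return the most specific error for a type containing a duration.

    Hypothetical sketch: in the type model assumed here a UDT *is* also
    a tuple, so the UDT check must run before the tuple check to get
    the most specific message.
    """
    if "udt" in kinds:    # most specific check first
        return "Secondary indexes are not supported on UDTs containing durations"
    if "tuple" in kinds:
        return "Secondary indexes are not supported on tuples containing durations"
    return "Secondary indexes are not supported on duration columns"

# A UDT is also a tuple; with the UDT check first we get the UDT message.
assert "UDTs" in index_error_message({"tuple", "udt"})
assert "tuples" in index_error_message({"tuple"})
```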

Fixes #8724.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210526201042.642550-1-nyh@scylladb.com>
2021-05-27 15:46:30 +03:00
Piotr Sarna
99f356d764 test: add a case for conflicting workload types
The test case verifies that if several workload types are effective
for a single role, the conflict resolution is well defined.
2021-05-27 14:31:36 +02:00
Piotr Sarna
01b7e445f9 cql-pytest: add basic tests for service level workload types
The test cases check whether it's possible to declare a workload
type for a service level and whether its input is validated.
2021-05-27 14:31:36 +02:00
Piotr Sarna
54a5d4516c docs: describe workload types for service levels
A paragraph about workload types is added to docs/service_levels.md
2021-05-27 14:31:36 +02:00
Piotr Sarna
d45574ed28 sys_dist_ks: fix redundant parsing in get_service_level
The routine used for getting service level information already
operates on the service level name, but the same information is
also parsed once more from a row from an internal table.
This parsing is redundant, so it's hereby removed.
2021-05-27 14:31:26 +02:00
Piotr Sarna
7faba19605 sys_dist_ks: make get_service_level exception-safe
In order to avoid killing the node if a parsing error occurs,
the routine which fetches service level information is made
exception-safe.
2021-05-27 14:31:25 +02:00
Pavel Emelyanov
d2442a1bb3 tests: Ditch storage_service_for_tests
The purpose of the class in question is to start the sharded storage
service to make its global instance alive. I don't know when exactly
it happened, but no code that instantiates this wrapper really needs
the global storage service any more.

Ref: #2795
tests: unit(dev), perf_sstable(dev)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20210526170454.15795-1-xemul@scylladb.com>
2021-05-27 14:39:13 +03:00
Piotr Sarna
cb27ebe61d transport: start shedding requests during potential overload
This commit implements the following overload-prevention heuristic:
if the admission queue becomes full, a timer is armed for 50ms.
If any of the ongoing requests finishes, the timer is disarmed,
but if that doesn't happen, the server goes into shedding mode,
which means that it reads new requests from the socket and immediately
drops them until one of the ongoing requests finishes.
This heuristic is not recommended for OLAP workloads,
so it is applied only if the session declared itself as
interactive (via the service level's workload_type parameter).
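
The timer-based state machine described above can be sketched roughly as follows; the names and wall-clock polling are illustrative only (the real server uses a Seastar timer):

```python
class ShedState:
    """Hypothetical sketch of the overload heuristic described above."""
    GRACE = 0.050  # 50 ms grace period before shedding starts

    def __init__(self):
        self.deadline = None   # armed when the admission queue fills up
        self.shedding = False

    def on_queue_full(self, now):
        if self.deadline is None and not self.shedding:
            self.deadline = now + self.GRACE      # arm the timer

    def on_request_finished(self):
        self.deadline = None                      # disarm the timer
        self.shedding = False                     # leave shedding mode

    def should_shed(self, now):
        if self.deadline is not None and now >= self.deadline:
            self.shedding = True                  # timer fired: start dropping
            self.deadline = None
        return self.shedding

s = ShedState()
s.on_queue_full(now=0.00)
assert not s.should_shed(now=0.01)  # still within the 50 ms grace period
assert s.should_shed(now=0.06)      # timer expired: drop new requests
s.on_request_finished()
assert not s.should_shed(now=0.07)  # an ongoing request finished: stop
```
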
2021-05-27 13:02:22 +02:00
Piotr Sarna
409c67b1b4 client_state: hook workload type from service levels
The client state is now aware of its workload type derived
from its attached service level.
2021-05-27 13:02:22 +02:00
Piotr Sarna
762e2f48f2 cql3: add listing service level workload type
The workload type information is now presented in the output
of LIST SERVICE LEVEL and LIST ALL SERVICE LEVELS statements.
2021-05-27 13:02:22 +02:00
Piotr Sarna
4816678eb6 cql3: add persisting service level workload type
The workload type information can now be set via CQL
and it's persisted in the distributed system table.
2021-05-27 13:02:22 +02:00
Piotr Sarna
578543603d qos: add workload_type service level parameter
The workload type is currently one of three values:
 - unspecified
 - interactive
 - batch

By defining the workload type, the service level makes it easier
for other components to decide what to do in overload scenarios.
E.g. if the workload is interactive, requests can be shed earlier,
while if it's batch (or unspecified), shedding does not take place.
Conversely, batch workloads can accept long full-scan operations.
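
A minimal sketch of the resulting decision, with hypothetical names (the real qos code is C++):

```python
from enum import Enum

class WorkloadType(Enum):
    UNSPECIFIED = "unspecified"
    INTERACTIVE = "interactive"
    BATCH = "batch"

def may_shed(workload):
    # Per the description above, only interactive sessions opt in to
    # early shedding; batch and unspecified workloads queue instead.
    return workload is WorkloadType.INTERACTIVE

assert may_shed(WorkloadType.INTERACTIVE)
assert not may_shed(WorkloadType.BATCH)
assert not may_shed(WorkloadType.UNSPECIFIED)
```
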
2021-05-27 13:02:22 +02:00
Dejan Mircevski
b54872fd95 auth: Remove const from role_manager methods
Some subclasses want to maintain state, which constness needlessly precludes.

Tests: unit (dev)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>

Closes #8721
2021-05-27 11:27:38 +03:00
Nadav Har'El
97e827e3e1 secondary index: fix regression in CREATE INDEX IF NOT EXISTS
The recent commit 0ef0a4c78d added helpful
error messages for the case where an index cannot be created because the
intended name of its materialized view is already taken - but accidentally
broke the "CREATE INDEX IF NOT EXISTS" feature.

The checking code was correct, but in the wrong place: we need to first
check maybe the index already exists and "IF NOT EXISTS" was chosen -
and only do this new error checking if this is not the case.
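
The corrected ordering can be sketched as follows; the function, the view-naming convention, and the schema shape are all hypothetical:

```python
def create_index(schema, index_name, if_not_exists):
    """Hypothetical sketch: the IF NOT EXISTS short-circuit must run
    before the 'view name already taken' validation, otherwise
    CREATE INDEX IF NOT EXISTS on an existing index raises instead of
    being a no-op."""
    if index_name in schema["indexes"]:
        if if_not_exists:
            return "noop"            # index exists, and that's fine
        raise ValueError(f"Index {index_name} already exists")
    view_name = index_name + "_index"   # assumed naming convention
    if view_name in schema["views"]:
        raise ValueError(f"View name {view_name} is already taken")
    schema["indexes"].add(index_name)
    return "created"

schema = {"indexes": {"idx1"}, "views": {"idx1_index"}}
assert create_index(schema, "idx1", if_not_exists=True) == "noop"
assert create_index(schema, "idx2", if_not_exists=False) == "created"
```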

This patch also includes a cql-pytest test for reproducing this bug.
The bug is also reproduced by the translated Cassandra unit tests
    cassandra_tests/validation/entities/secondary_index_test.py::
    testCreateAndDropIndex
and this is how I found this bug. After this patch, all these tests
pass.

Fixes #8717.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210526143635.624398-1-nyh@scylladb.com>
2021-05-27 09:10:41 +02:00
Asias He
72cc596842 repair: Wire off-strategy compaction for regular repair
We have enabled off-strategy compaction for bootstrap, replace,
decommission and removenode operations when repair based node operation
is enabled. Unlike node operations like replace or decommission, it is
harder to know when the repair of a table is finished because users can
send multiple repair requests one after another, each request repairing
a few token ranges.

This patch wires off-strategy compaction for regular repair by adding
a timeout-based automatic off-strategy compaction trigger mechanism.
If there is no repair activity for some time, off-strategy compaction
will be triggered for that table automatically.
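
The idle-based trigger can be sketched like this; the class name and the 5-second idle window are made up for illustration:

```python
class OffStrategyTrigger:
    """Hypothetical sketch of the timeout-based trigger described above."""
    IDLE = 5.0  # illustrative idle window, in seconds

    def __init__(self):
        self.last_activity = None

    def on_repair_activity(self, now):
        self.last_activity = now   # each repair request resets the clock

    def should_compact(self, now):
        # Fire only once the table has seen no repair work for IDLE seconds.
        return (self.last_activity is not None
                and now - self.last_activity >= self.IDLE)

t = OffStrategyTrigger()
t.on_repair_activity(now=0.0)
assert not t.should_compact(now=3.0)  # users may still send more requests
t.on_repair_activity(now=3.0)         # another ranged repair request
assert not t.should_compact(now=7.0)  # the clock was reset at t=3
assert t.should_compact(now=8.0)      # idle long enough: trigger compaction
```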

Fixes #8677

Closes #8678
2021-05-26 11:41:27 +03:00
Konstantin Osipov
ac43941f17 rpc: don't include an unused header (raft_services.hh)
Message-Id: <20210525183919.1395607-7-kostja@scylladb.com>
2021-05-26 11:07:44 +03:00
Konstantin Osipov
7ca4ffc309 system_keyspace: coroutinize db::system_keyspace::setup()
Message-Id: <20210525183919.1395607-19-kostja@scylladb.com>
2021-05-26 11:06:21 +03:00
Avi Kivity
e2e723cc4c build: enable -Wrange-loop-construct warning
This warning triggers when a range-for loop ("for (auto x : range)")
causes non-trivial copies, prompting the developer to iterate by
reference instead. A few minor violations in the test suite are
corrected.

Closes #8699
2021-05-26 10:32:56 +03:00
Avi Kivity
3896e35897 Merge 'storage_service: Respect --enable-repair-based-node-ops flag during removenode' from Asias He
In commit 829b4c1 (repair: Make removenode safe by default), removenode
was changed to use repair based node operations unconditionally. Since
repair based node operations are not enabled by default, we should
respect the flag and use streaming to sync data if the flag is false.

Fixes #8700

Closes #8701

* github.com:scylladb/scylla:
  storage_service: Add removenode_add_ranges helper
  storage_service: Respect --enable-repair-based-node-ops flag during removenode
2021-05-26 10:32:56 +03:00
Avi Kivity
e9c940dbbc Merge 'Commitlog: Handle disk usage and disk footprint discrepancies, ensuring we flush when needed' from Calle Wilund
Fixes #8270

If we have an allocation pattern where we leave large parts of segments "wasted" (typically because the segment has empty space, but cannot hold the mutation being added), we can have a disk usage that is below the threshold, yet still get a disk _footprint_ that is over the limit, causing new segment allocation to stall.

We need to take a few things into account:
1.) Need to include wasted space in the threshold check. Whether or not disk is actually used does not matter here.
2.) If we stall a segment alloc, we should just flush immediately. No point in waiting for the timer task.
3.) Need to adjust the thresholds a bit. Depending on sizes, we should probably consider start flushing once we've used up space enough to be in the last available segment, so a new one is hopefully available by the time we hit the limit.

Also fix an edge case (for tests), when we have too few segments to have an active one (i.e. we need to flush everything).
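
The adjusted check can be sketched numerically; the function name and the sizes below are hypothetical, not the commitlog's actual code:

```python
def should_flush(active_bytes, wasted_bytes, segment_size, max_disk_size):
    """Hypothetical sketch: the disk *footprint* is live data plus the
    slack left unused at the end of each segment. Start flushing once
    the footprint reaches the last allowable segment, so a fresh
    segment is hopefully recycled before the hard limit is hit."""
    footprint = active_bytes + wasted_bytes
    return footprint >= max_disk_size - segment_size

MB = 1024 * 1024
# Live usage alone (24 MB of a 64 MB limit) looks fine...
assert not should_flush(24 * MB, 0, 32 * MB, 64 * MB)
# ...but 24 MB of slack pushes the footprint into the last 32 MB
# segment, so a flush must be issued.
assert should_flush(24 * MB, 24 * MB, 32 * MB, 64 * MB)
```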

Closes #8695

* github.com:scylladb/scylla:
  commitlog_test: Add test case for usage/disk size threshold mismatch
  commitlog: Flush all segments if we only have one.
  commitlog: Always force flush if segment allocation is waiting
  commitlog: Include segment wasted (slack) size in footprint check
  commitlog: Adjust (lower) usage threshold
2021-05-25 18:34:29 +03:00
Calle Wilund
a96433c684 commitlog_test: Add test case for usage/disk size threshold mismatch
Refs #8270

Tries to simulate a case where segment usage mismatches the actual
disk footprint and we fail to flush enough to allow segment recycling
2021-05-25 12:43:12 +00:00
Calle Wilund
bf0a91b566 commitlog: Flush all segments if we only have one.
Handle test cases with borked config so we don't deadlock
in cases where we only have one segment in a commitlog
2021-05-25 12:43:12 +00:00
Calle Wilund
8ce836209b commitlog: Always force flush if segment allocation is waiting
Refs #8270

If segment allocation is blocked, we should bypass all thresholds
and issue a flush of as much as possible.
2021-05-25 12:43:12 +00:00
Calle Wilund
e34ed30178 commitlog: Include segment wasted (slack) size in footprint check
Refs #8270

Since segment allocation looks at the actual disk footprint, not active
usage, the threshold check in the timer task should include slack space
so we don't mistake sparse usage for space left.
2021-05-25 12:43:12 +00:00
Calle Wilund
ec40207e7f commitlog: Adjust (lower) usage threshold
Refs #8270

Try to ensure we issue a flush as soon as we are allocating in the
last allowable segment, instead of "halfway through". This will make
flushing a little more eager, but should reduce latencies created
by waiting for segment delete/recycle under heavy usage.
2021-05-25 12:43:12 +00:00
Benny Halevy
6144656b25 table: seal_active_memtable: update stats also on the error path
Currently the pending (memtable) flushes stats are adjusted back
only on success, so they "leak" on error. Use a .then_wrapped
clause to always update the stats.

Note that _commitlog->discard_completed_segments is still called
only on success, and so is returning the previous_flush future.
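
In Python terms, the .then_wrapped change is roughly a try/finally around the flush; this analogy uses hypothetical names and stands in for the Seastar future chain:

```python
def seal_active_memtable(do_flush, stats):
    """Hypothetical analogy of the fix: the pending-flushes counter is
    adjusted back on both the success and the error path, like a
    .then_wrapped continuation."""
    stats["pending_flushes"] += 1
    try:
        do_flush()
    finally:
        stats["pending_flushes"] -= 1   # runs on error too: no "leak"

stats = {"pending_flushes": 0}
def failing_flush():
    raise IOError("disk error")

try:
    seal_active_memtable(failing_flush, stats)
except IOError:
    pass
assert stats["pending_flushes"] == 0   # no leak on the error path
```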

Test: unit(dev)
DTest: alternator_tests.py:AlternatorTest.test_batch_with_auto_snapshot_false(debug)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210525055336.1190029-2-bhalevy@scylladb.com>
2021-05-25 12:51:54 +02:00
Benny Halevy
d46958d3ce phased_barrier: advance_and_await: abort on allocation failure
Currently, advance_and_await() allocates a new gate,
which might fail.  Rather than returning this failure
as an exceptional future - which would require its callers
to handle that failure - keep the function noexcept and
let an exception from make_lw_shared<gate>() terminate the program.

This makes the function "fail-free" to its callers,
in particular, when called from the table::stop() path where
we can't do much about these failures and we require close/stop
functions to always succeed.

The alternative of making the allocation of a new gate optional
and recovering from it in start() is possible, but was deemed not
worth it, as it would add complexity and cost to start(), which is
called on the common, hot path.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210525055336.1190029-1-bhalevy@scylladb.com>
2021-05-25 12:50:59 +02:00
Avi Kivity
e391e4a398 test: serialized_action_test: prevent false-positive timeout in test_phased_barrier_reassignment
test_phased_barrier_reassignment has a timeout to prevent the test from
hanging on failure, but it occasionally triggers in debug mode since
the timeout is quite low (1ms). Increase the timeout to prevent false
positives. Since the timeout only expires if the test fails, it will
have no impact on execution time.

Ref #8613

Closes #8692
2021-05-25 11:20:18 +02:00
Benny Halevy
3ad0f156b9 memtable_list: request_flush: wait on pending flushes also when empty()
In https://github.com/scylladb/scylla/issues/8609,
table::stop() that is called from database::drop_column_family
is expected to wait on outstanding flushes by calling
_memtable->request_flush(), but the memtable_list is considered
empty() at this point, as it has a single empty memtable,
so request_flush() returns a ready future without waiting
on outstanding flushes.

Fix that by either returning the _flush_coalescing future
that resolves when the memtable is sealed, if available,
or going through the get_flush_permit and
_dirty_memory_manager->flush_one song and dance, even though
the memtable is empty(), as the latter waits on pending flushes.

Fixes #8609

Test: unit(dev)
DTest: alternator_tests.py:AlternatorTest.test_batch_with_auto_snapshot_false(debug)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210524143438.1056014-1-bhalevy@scylladb.com>
2021-05-25 11:19:51 +02:00
Kamil Braun
d71513d814 abstract_replication_strategy: avoid reactor stalls in get_address_ranges and friends
The algorithm used in `get_address_ranges` and `get_range_addresses`
calls `calculate_natural_endpoints` in a loop; the loop iterates over
all tokens in the token ring. If the complexity of a particular
implementation of `calculate_natural_endpoints` is large - say `θ(n)`,
where `n` is the number of tokens - this results in an `θ(n^2)`
algorithm (or worse). This case happens for `Everywhere` replication strategy.

For small clusters this doesn't matter that much, but if `n` is, say, `20*255`,
this may result in huge reactor stalls, as observed in practice.

We avoid these stalls by inserting tactical yields. We hope that
some day someone actually implements a subquadratic algorithm here.
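
The shape of the fix can be sketched with asyncio standing in for the Seastar reactor; the function names, the yield period, and the endpoint calculation are all assumptions:

```python
import asyncio

async def get_address_ranges(tokens, calculate_natural_endpoints):
    """Hypothetical sketch: the worst case stays quadratic, but
    yielding every few hundred iterations keeps any single reactor
    task short, avoiding stalls."""
    result = {}
    for i, token in enumerate(tokens):
        result[token] = calculate_natural_endpoints(token)
        if i % 256 == 255:          # tactical yield
            await asyncio.sleep(0)  # let other tasks run
    return result

ranges = asyncio.run(get_address_ranges(range(1000), lambda t: {t % 3}))
assert len(ranges) == 1000
```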

The commit also adds a comment on
`abstract_replication_strategy::calculate_natural_endpoints` explaining
that the interface does not give a complexity guarantee (at this point);
the different implementations have different complexities.

For example, `Everywhere` implementation always iterates over all tokens
in the token ring, so it has `θ(n)` worst and best case complexity.
On the other hand, `NetworkTopologyStrategy` implementation usually
finishes after visiting a small part of the token ring (specifically,
as soon as it finds a token for each node in the ring) and performs
a constant number of operations for each visited token on average,
but theoretically its worst case complexity is actually `O(n + k^2)`,
where `n` is the number of all tokens and `k` is the number of endpoints
(the `k^2` appears since for each endpoint we must perform finds and
inserts on `unordered_set` of size `O(k)`; `unordered_set` operations
have `O(1)` average complexity but `O(size of the set)` worst case
complexity).

Therefore it's not easy to put any complexity guarantee in the interface
at this point. Instead, we say that:
- some implementations may yield - if their complexities force us to do so
- but in general, there is no guarantee that the implementation may
  yield - e.g. the `Everywhere` implementation does not yield.

Fixes #8555.

Closes #8647
2021-05-25 11:53:28 +03:00
Raphael S. Carvalho
ee39eb9042 sstables: Fix slow off-strategy compaction on STCS tables
Off-strategy compaction on a table using STCS is slow because of
the needless write amplification of 2. That's because STCS reshape
isn't taking advantage of the fact that sstables produced by
a repair-based operation are disjoint. So the ~256 input sstables
were compacted (in batches of 32) into larger sstables, which in
turn were compacted into even larger ones. That write amp is very
significant on large data sets, making the whole operation 2x
slower.

Fixes #8449.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210524213426.196407-1-raphaelsc@scylladb.com>
2021-05-25 11:24:42 +03:00
Asias He
70147dcb5a storage_service: Add removenode_add_ranges helper
Share the code between restore_replica_count and
removenode_with_stream to reduce duplication.

Refs #8700
2021-05-25 10:44:31 +08:00
Asias He
a285bd28e2 storage_service: Respect --enable-repair-based-node-ops flag during removenode
In commit 829b4c1 (repair: Make removenode safe by default), removenode
was changed to use repair based node operations unconditionally. Since
repair based node operations are not enabled by default, we should
respect the flag and use streaming to sync data if the flag is false.

Fixes #8700
2021-05-25 10:42:58 +08:00
Avi Kivity
948e2c0b36 utils: config_file: delete unneeded template instantation of operator<<()
config_file.cc instantiates std::istream& std::operator>>(std::istream&,
std::unordered_map<seastar::sstring, seastar::sstring>&), but that
instantiation is ignored since config_file_impl.hh specializes
that signature. -Winstantiation-after-specialization warns about it,
so re-enable it now that the code base is clean.

Also remove the matching "extern template" declaration, which has no
definition any more.

Closes #8696
2021-05-24 18:34:45 +03:00
Avi Kivity
60fb224171 Update seastar submodule
* seastar 28dddd2683...f0f28d07e1 (4):
  > httpd: allow handler to not read an empty content
Fixes #8691.
  > compat: source_location: implement if no std or experimental are available
  > compat: source_location: declare using in seastar::compat namespace
  > perftune.py: fix a bug in mlx4 IRQs names matching pattern
2021-05-24 17:44:08 +03:00
Piotr Sarna
95c6ec1528 Merge 'test/cql-pytest: clean up tests to run on Cassandra' from Nadav Har'El
To keep our cql-pytest tests "correct", we should strive for them to pass on
Cassandra - unless they are testing a Scylla-only feature or a deliberate
difference between Scylla and Cassandra - in which case they should be marked
"scylla-only" and cause such tests to be skipped when running on Cassandra.

The following few small patches fix a few cases where our tests were
failing on Cassandra. In one case this even found a bug in the test
(a trivial Python mistake, but still).

Closes #8694

* github.com:scylladb/scylla:
  test/cql-pytest: fix python mistake in an xfailing test
  test/cql-pytest: mark some tests with scylla-only
  test/cql-pytest: clean up test_create_large_static_cells_and_rows
2021-05-24 16:42:01 +02:00
Avi Kivity
789757a692 Merge 'cql3: represent lists as chunked_vector instead of std::vector' from Michał Chojnowski
The cql3 layer manipulates lists as `std::vector`s (of `managed_bytes_opt`). Since lists can be arbitrarily large, let's use chunked vectors there to prevent potentially large contiguous allocations.

Closes #8668

* github.com:scylladb/scylla:
  cql3: change the internal type of tuples::in_value from std::vector to chunked_vector
  cql3: change the internal type of lists::value from std::vector to chunked_vector
  cql3: in multi_item_terminal, return the vector of items by value
2021-05-24 17:19:45 +03:00
Nadav Har'El
edc2c65552 Merge 'Fix service level negative timeouts' from Piotr Sarna
This series fixes a minor validation issue with service level timeouts - negative values were not checked. This bug is benign because negative timeouts act just like a 0s timeout, but the original series claimed to validate against negative values, so it's hereby fixed.
More importantly however, this series follows by enabling cql-pytest to run service level tests and provides a first batch of them, including a missing test case for negative timeouts.
The idea is similar to what we already have in alternator test suite - authentication is unconditionally enabled, which doesn't affect any existing tests, but at the same time allows writing test cases which rely on authentication - e.g. service levels.
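
The validation fix itself is small; a hypothetical sketch of the tightened check:

```python
def validate_service_level_timeout(timeout_ms):
    """Hypothetical sketch: a negative timeout behaves just like a 0s
    one, but should still be rejected up front as originally claimed."""
    if timeout_ms is not None and timeout_ms < 0:
        raise ValueError("service level timeout must be non-negative")
    return timeout_ms

assert validate_service_level_timeout(500) == 500
assert validate_service_level_timeout(None) is None
try:
    validate_service_level_timeout(-1)
except ValueError:
    rejected = True
else:
    rejected = False
assert rejected
```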

Closes #8645

* github.com:scylladb/scylla:
  cql-pytest: introduce service level test suite
  cql-pytest: add enabling authentication by default
  qos: fix validating service level timeouts for negative values
2021-05-24 16:30:13 +03:00
Tomasz Grabiec
b1821c773f Merge "raft: basic RPC module testing" from Pavel Solodovnikov
Now RPC module has some basic testing coverage to
make sure RPC configuration is updated appropriately
on configuration changes (i.e. `add_server` and
`remove_server` are called when appropriate).

The test suite currenty consists of the following
test-cases:
 * Loading server instance with configuration from a snapshot.
 * Loading server instance with configuration from a log.
 * Configuration changes (remove + add node).
 * Leader elections don't lead to RPC configuration changes.
 * Voter <-> learner node transitions also don't change RPC
   configuration.
 * Reverting uncommitted configuration changes updates
   RPC configuration accordingly (two cases: revert to
   snapshot config or committed state from the log).

A few more refactorings are made along the way to be
able to reuse some existing functions from
`replication_test` in `rpc_test` implementation.

Please note, though, that there are still some functions
that are borrowed from `replication_test` but not yet
extracted to common helpers.

This is mostly because the RPC tests don't need all
the complexity that `replication_test` has; thus,
some helpers are copied in a reduced form.

It would take some effort to refactor these bits to
fit both `replication_test` and `rpc_test` without
sacrificing convenience.
This will probably be addressed in another series later.

* manmanson/raft-rpc-tests-v9-alt3:
  raft: add tests for RPC module
  test: add CHECK_EVENTUALLY_EQUAL utility macro
  raft: replication_test: reset test rpc network between test runs
  raft: replication_test: extract tickers initialization into a separate func
  raft: replication_test: support passing custom `apply_fn` to `change_configuration()`
  raft: replication_test: introduce `test_server` aggregate struct
  raft: replication_test: support voter<->learner configuration changes
  raft: remove duplicate `create_command` function from `replication_test`
  raft: avoid 'using' statements in raft testing helpers header
2021-05-24 14:44:37 +02:00
Benny Halevy
56d3cb514a sstables: parse statistics: improve error handling
Properly return malformed_sstable_exception if the
statistics file fails to parse.

Test: unit(dev)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210524113808.973951-1-bhalevy@scylladb.com>
2021-05-24 15:12:48 +03:00
Nadav Har'El
5da0ad2ebc Merge branch 'coverage-py-missing-features/v1' of https://github.com/denesb/scylla into next
This patchset adds the missing features noted by the patchset
introducing it, namely:
* The ability to run a test through `coverage.py`, automating the entire
  process of setting up the environment, running the test and generating
  the report. This is possible with the new `--run` command line
  argument. It supports either generating a report immediately after
  running the provided test or just doing the running part, allowing the
  user to generate the report after having run all the tests they wanted
  to.
* A tweakable verbosity level.

It is also possible to specify a subset of the profiling data as input
for the report.
The documentation was also completed, with examples for all the
intended use-cases.
With these changes, `coverage.py` is considered mature, the remaining
rough edges being located in other scripts (`tests.py` and
`configure.py`).
It is now possible to generate a coverage report for any test desired.

Also on: https://github.com/denesb/scylla.git
coverage-py-missing-features/v1

Botond Dénes (5):
  scripts/coverage.py: allow specifying the input files to generate the
    report from
  scripts/coverage.py: add capability of running a test directly
  scripts/coverage.py: add --verbose parameter
  scripts/coverage.py: document intended use-cases
  HACKING.md: redirect to ./coverage.py for more details

 scripts/coverage.py | 143 +++++++++++++++++++++++++++++++++++++++-----
 HACKING.md          |  19 +-----
 2 files changed, 129 insertions(+), 33 deletions(-)
2021-05-24 14:54:28 +03:00
Avi Kivity
50f3bbc359 Merge "treewide: various header cleanups" from Pavel S
"
The patch set is an assorted collection of header cleanups, e.g:
* Reduce number of boost includes in header files
* Switch to forward declarations in some places

A quick measurement was performed to see if these changes
provide any improvement in build times (ccache cleaned and
existing build products wiped out).

The results are posted below (`/usr/bin/time -v ninja dev-build`)
for 24 cores/48 threads CPU setup (AMD Threadripper 2970WX).

Before:

	Command being timed: "ninja dev-build"
	User time (seconds): 28262.47
	System time (seconds): 824.85
	Percent of CPU this job got: 3979%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 12:10.97
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 2129888
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 1402838
	Minor (reclaiming a frame) page faults: 124265412
	Voluntary context switches: 1879279
	Involuntary context switches: 1159999
	Swaps: 0
	File system inputs: 0
	File system outputs: 11806272
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

After:

	Command being timed: "ninja dev-build"
	User time (seconds): 26270.81
	System time (seconds): 767.01
	Percent of CPU this job got: 3905%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 11:32.36
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 2117608
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 1400189
	Minor (reclaiming a frame) page faults: 117570335
	Voluntary context switches: 1870631
	Involuntary context switches: 1154535
	Swaps: 0
	File system inputs: 0
	File system outputs: 11777280
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

The observed improvement is about 5% of total wall clock time
for `dev-build` target.

Also, all commits make sure that headers stay self-sufficient,
which would help to further improve the situation in the future.
"

* 'feature/header_cleanups_v1' of https://github.com/ManManson/scylla:
  transport: remove extraneous `qos/service_level_controller` includes from headers
  treewide: remove evidently unneeded storage_proxy includes from some places
  service_level_controller: remove extraneous `service/storage_service.hh` include
  sstables/writer: remove extraneous `service/storage_service.hh` include
  treewide: remove extraneous database.hh includes from headers
  treewide: reduce boost headers usage in scylla header files
  cql3: remove extraneous includes from some headers
  cql3: various forward declaration cleanups
  utils: add missing <limits> header in `extremum_tracking.hh`
2021-05-24 14:24:20 +03:00
Yaron Kaikov
dd453ffe6a install.sh: Setup aio-max-nr upon installation
This is a follow up change to #8512.

Let's add aio conf file during scylla installation process and make sure
we also remove this file when uninstall Scylla

As per Avi Kivity's suggestion, let's set aio value as static
configuration, and make it large enough to work with 500 cpus.

Closes #8650
2021-05-24 14:24:20 +03:00
Takuya ASADA
3d307919c3 scylla_raid_setup: use /dev/disk/by-uuid to specify filesystem
Currently, var-lib-scylla.mount may fail because it can start before
the MDRAID volume is initialized.
We may be able to add "After=dev-disk-by\x2duuid-<uuid>.device" to wait
for the device to become available, but the systemd manual says it
automatically configures the dependency for a mount unit when we specify
the filesystem path by an "absolute path of a device node".

So we need to replace What=UUID=<uuid> with What=/dev/disk/by-uuid/<uuid>.
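
As a config sketch, the resulting mount unit looks along these lines (the UUID stays a placeholder and the filesystem type is assumed):

```ini
# var-lib-scylla.mount (sketch)
[Mount]
# Before: What=UUID=<uuid> -- no device dependency is derived.
# After: an absolute device-node path makes systemd add an implicit
# dependency on the corresponding .device unit, so the mount waits
# for the MDRAID volume to appear.
What=/dev/disk/by-uuid/<uuid>
Where=/var/lib/scylla
Type=xfs
```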

Fixes #8279

Closes #8681
2021-05-24 14:24:08 +03:00
Nadav Har'El
5206665b15 test/cql-pytest: fix python mistake in an xfailing test
The xfailing test cassandra_tests/validation/entities/collections_test.py::
testSelectionOfEmptyCollections had a Python mistake (using {} instead
of set() for an empty set), which resulted in its failure when run
against Cassandra. After this patch it passes on Cassandra and fails on
Scylla - as expected (this is why it is marked xfail).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2021-05-24 13:14:54 +03:00
Nadav Har'El
f26b31e950 test/cql-pytest: mark some tests with scylla-only
Tests which are known to test a Scylla-only feature (such as CDC)
or to rely on a known and deliberate difference between Scylla and Cassandra
should be marked "scylla-only", so they are skipped when running
the tests against Cassandra (test/cql-pytest/run-cassandra) instead
of reporting errors.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2021-05-24 13:03:48 +03:00
Nadav Har'El
c8117584e3 test/cql-pytest: clean up test_create_large_static_cells_and_rows
The test test_create_large_static_cells_and_rows had its own
implementation of "nodetool flush" using Scylla's REST API.
Now that we have a nodetool.flush() function for general use in
cql-pytest, let's use it and save a bit of duplication.

Another benefit is that now this test can be run (and pass) against
Cassandra.

To allow this test to run on Cassandra, I had to remove a
"USING TIMEOUT" which wasn't necessary for this test, and is
not a feature supported by Cassandra.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2021-05-24 12:31:51 +03:00