In that level no io_priority_class-es exist. Instead, all the IO happens
in the context of current sched-group. File API no longer accepts prio
class argument (and makes io_intent arg mandatory to impls).
So the change consists of
- removing all usage of io_priority_class
- patching file_impl's inheritants to updated API
- priority manager goes away altogether
- IO bandwidth update is performed on respective sched group
- tune-up scylla-gdb.py io_queues command
The first change is huge and was made semi-autimatically by:
- grep io_priority_class | default_priority_class
- remove all calls, found methods' args and class' fields
Patching file_impl-s is smaller, but also mechanical:
- replace io_priority_class& argument with io_intent* one
- pass intent to lower file (if applicatble)
Dropping the priority manager is:
- git-rm .cc and .hh
- sed out all the #include-s
- fix configure.py and cmakefile
The scylla-gdb.py update is a bit hairry -- it needs to use task queues
list for IO classes names and shares, but to detect it should it checks
for the "commitlog" group is present.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes#13963
As described in https://github.com/scylladb/scylladb/issues/8638,
we're moving away from `SimpleStrategy`, in the future
it will become deprecated.
We should remove all uses of it and replace them
with `NetworkTopologyStrategy`.
This change replaces `SimpleStrategy` with
`NetworkTopologyStrategy` in all unit tests,
or at least in the ones where it was reasonable to do so.
Some of the tests were written explicitly to test the
`SimpleStrategy` strategy, or changing the keyspace from
`SimpleStrategy` to `NetworkTopologyStrategy`.
These tests were left intact.
It's still a feature that is supported,
even if it's slowly getting deprecated.
The typical way to use `NetworkTopologyStrategy` is
to specify a replication factor for each datacenter.
This could be a bit cumbersome, we would have to fetch
the list of datacenters, set the repfactors, etc.
Luckily there is another way - we can just specify
a replication factor to use for or each existing
datacenter, like this:
```cql
CREATE KEYSPACE {} WITH REPLICATION =
{'class' : 'NetworkTopologyStrategy', 'replication_factor' : 1};
```
This makes the change rather straightforward - just replace all
instances of `'SimpleStrategy'', with `'NetworkTopologyStrategy'`.
Refs: https://github.com/scylladb/scylladb/issues/8638
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Closes#13990
Into reset_to() and reset_to_zero(). The latter replaces `reset()` with
the default 0 resources argument, which was often called from noexcept
contexts. Splitting it out from `reset()` allows for a specialized
implementation that is guaranteed to be `noexcept` indeed and thus
peace of mind.
The forward_service.hh and raft_group0_client.hh can be replaced with
forward declarations. Few other files need their previously indirectly
included headers back.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes#13384
And propagate it down to where it is created. This will be used to add
trace points for semaphore related events, but this will come in the
next patches.
We have enabled the command line options without changing a
single line of code, we only had to replace old include
with scylla_test_case.hh.
Next step is to add x-log-compaction-groups options, which will
determine the number of compaction groups to be used by all
instantiations of replica::table.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
These methods will soon be retired (made private) so migrate away from
them. Consume memory through a permit instead. It is also safer this
way: all memory consumed through the permit is guaranteed to be released
when the permit is destroyed at the latest.
It is not type safe: has multiple limits passed to it as raw ints, as
well as other types that ints implicitly convert to. Furthermore the row
limit is passed in two separate fields (lower 32 bits and upper 32
bits). All this make this constructor a minefield for humans to use. We
have a safer constructor for some time but some users of the old one
remain. Move them to the safe one.
Soon, the currently two distinct types of queriers will be merged, as
the template parameter differentiating them will be gone. This will make
using type based overload for insert() impossible, as 2 out of the 3
types will be the same. Use different names instead.
"
Namely the query result writer and the reconcilable result builder, used
for building results for regular queries and mutation queries (used in
read repair) respectively.
With this, there are no users left for the v1 output of the compactor,
so we remove that, making the compactor v2 all-the-way (and simpler).
This means that for regular queries, a downgrade phase is eliminated
completely, as regular queries don't store range tombstone in their
result, so no need to convert them.
Tests: unit(dev, release, debug)
"
* 'result-builders-v2/v1' of https://github.com/denesb/scylla:
reconcilable_result_builder: remove v1 support
query_result_builder: remove v1 support
mutation_compactor: drop v1 related code-paths
mutation_compactor: drop v1 support altogether from the API
tree: migrate to the v2 consumer APIs
test/boost/mutation_test: remove v1 specific test code
querier: switch to v2 compactor output
reconcilable_result_builder: add v2 support
query_result_writer: add v2 support
query_result_builder: make consume(range_tombstone) noop
The flat_mutation_reader files were conflated and contained multiple
readers, which were not strictly necessary. Splitting optimizes both
iterative compilation times, as touching rarely used readers doesn't
recompile large chunks of codebase. Total compilation times are also
improved, as the size of flat_mutation_reader.hh and
flat_mutation_reader_v2.hh have been reduced and those files are
included by many file in the codebase.
With changes
real 29m14.051s
user 168m39.071s
sys 5m13.443s
Without changes
real 30m36.203s
user 175m43.354s
sys 5m26.376s
Closes#10194
The change is mostly mechanical: update all compactor instances to the
_v2 variant and update all call-sites, of which there is not that many.
As a consequence of this patch, queries -- both single-partition and
range-scans -- now do the v2->v1 conversion in the consumers, instead of
in the compactor.
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes we applied mechanically with a script, except to
licenses/README.md.
Closes#9937
Move replica-oriented classes to the replica namespace. The main
classes moved are ::database, ::keyspace, and ::table, but a few
ancillary classes are also moved. There are certainly classes that
should be moved but aren't (like distributed_loader) but we have
to start somewhere.
References are adjusted treewide. In many cases, it is obvious that
a call site should not access the replica (but the data_dictionary
instead), but that is left for separate work.
scylla-gdb.py is adjusted to look for both the new and old names.
Add schema parameter so that:
* Caller has better control over schema -- especially relevant for
reverse reads where it is not possible to follow the convention of
passing the query schema which is reversed compared to that of the
mutations.
* Now that we don't depend on the mutations for the schema, we can lift
the restriction on mutations not being empty: this leads to safer
code. When the mutations parameter is empty, an empty reader is
created.
Add "make_" prefix to follow convention of similar reader factory
functions.
Tests: unit(dev)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20211115155614.363663-1-bdenes@scylladb.com>
It's easy to forget about supplying the correct value for a parameter
when it has a default value specified. It's safer if 'production code'
is forced to always supply these parameters manually.
The default values were mostly useful in tests, where some parameters
didn't matter that much and where the majority of uses of the class are.
Without default values adding a new parameter is a pain, forcing one to
modify every usage in the tests - and there are a bunch of them. To
solve this, we introduce a new constructor which requires passing the
`for_tests` tag, marking that the constructor is only supposed to be
used in tests (and the constructor has an appropriate comment). This
constructor uses default values, but the other constructors - used in
'production code' - do not.
"
The patch set is an assorted collection of header cleanups, e.g:
* Reduce number of boost includes in header files
* Switch to forward declarations in some places
A quick measurement was performed to see if these changes
provide any improvement in build times (ccache cleaned and
existing build products wiped out).
The results are posted below (`/usr/bin/time -v ninja dev-build`)
for 24 cores/48 threads CPU setup (AMD Threadripper 2970WX).
Before:
Command being timed: "ninja dev-build"
User time (seconds): 28262.47
System time (seconds): 824.85
Percent of CPU this job got: 3979%
Elapsed (wall clock) time (h:mm:ss or m:ss): 12:10.97
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 2129888
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 1402838
Minor (reclaiming a frame) page faults: 124265412
Voluntary context switches: 1879279
Involuntary context switches: 1159999
Swaps: 0
File system inputs: 0
File system outputs: 11806272
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
After:
Command being timed: "ninja dev-build"
User time (seconds): 26270.81
System time (seconds): 767.01
Percent of CPU this job got: 3905%
Elapsed (wall clock) time (h:mm:ss or m:ss): 11:32.36
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 2117608
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 1400189
Minor (reclaiming a frame) page faults: 117570335
Voluntary context switches: 1870631
Involuntary context switches: 1154535
Swaps: 0
File system inputs: 0
File system outputs: 11777280
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
The observed improvement is about 5% of total wall clock time
for `dev-build` target.
Also, all commits make sure that headers stay self-sufficient,
which would help to further improve the situation in the future.
"
* 'feature/header_cleanups_v1' of https://github.com/ManManson/scylla:
transport: remove extraneous `qos/service_level_controller` includes from headers
treewide: remove evidently unneded storage_proxy includes from some places
service_level_controller: remove extraneous `service/storage_service.hh` include
sstables/writer: remove extraneous `service/storage_service.hh` include
treewide: remove extraneous database.hh includes from headers
treewide: reduce boost headers usage in scylla header files
cql3: remove extraneous includes from some headers
cql3: various forward declaration cleanups
utils: add missing <limits> header in `extremum_tracking.hh`
Currently `with_cql_test_env()` is equivalent to
`with_cql_test_env_thread()`, which resulted in many tests using the
former while really needing the latter and getting away with it. This
equivalence is incidental and will go away soon, so make sure all cql
test env using tests that expect to be run in a thread use the
appropriate variant.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20210514141614.128213-1-bdenes@scylladb.com>
This commit conceptually reverts 4c8ab10. Said commit was meant to
prevent the scenario where memory-only permits -- those that don't pass
admission but still consume memory -- completely prevent the admission
of reads, possibly even causing a deadlock because a permit might even
blocks its own admission. The protection introduced by said commit
however proved to be very problematic. It made the status of resources
on the permit very hard to reason about and created loopholes via which
permits could accumulate without tracking or they could even leak
resources. Instead of continuing to patch this broken system, this
commit does away with this "protection" based on the observation that
deadlocks are now prevented anyway by the admission criteria introduced
by 0fe75571d9, which admits a read anyway when all the initial count
resources are available (meaning no admitted reader is alive),
regardless of availability of memory.
The benefits of this revert is that the semaphore now knows about all
the resources and is able to do its job better as it is not "lied to"
about resource by the permits. Furthermore the status of a permit's
resources is much simpler to reason about, there are no more loopholes
in unexpected state transitions to swallow/leak resources.
To prove that this revert is indeed safe, in the next commit we add
robust tests that stress test admission on a highly contested semaphore.
This patch also does away with the registered/admitted differentiation
of permits, as this doesn't make much sense anymore, instead these two
are unified into a single "active" state. One can always tell whether a
permit was admitted or not from whether it owns count resources anyway.
Close the _closing_gate to wait on background
close of dropped queries, and close all remaining queriers.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
In addition to clear_inactive_reads, that's currently called when
the database object is destroyed, introduce a stop() method that will:
1. wait on all background closes of inactive_reads.
2. close all present inactive_reads and waits on their close.
3. signal waiters on the wait_list via broken() with a proper
exception indicating that the semaphore was closed.
In addition, assert in the semaphore's destructor
that it has no remaining inactive reads.
Stop must be called from whoever owns the r_c_s.
Mainly, from database::stop.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
`reader_concurrency_semaphore::register_inactive_read()` drops the
registered inactive read immediately if there is a resource shortage.
This is in effect a resource based eviction, so account it as such in
`querier::insert()`.
Register the inactive reader first with no
evict_notify_handler and ttl.
Those can be set later, only if registration succeeded.
Otherwise, as in the querier example, there is no need
to to place the querier in the index and erase it
on eviction.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Calling unregister_inactive_read on the wrong semaphore is a blatant
bug so better call on_internal_error so it'd be easier to catch and fix.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
"
Currently inactive readers are stored in two different places:
* reader concurrency semaphore
* querier cache
With the latter registering its inactive readers with the former. This
is an unnecessarily complex (and possibly surprising) setup that we want
to move away from. This series solves this by moving the responsibility
if storing of inactive reads solely to the reader concurrency semaphore,
including all supported eviction policies. The querier cache is now only
responsible for indexing queriers and maintaining relevant stats.
This makes the ownership of the inactive readers much more clear,
hopefully making Benny's work on introducing close() and abort() a
little bit easier.
Tests: unit(release, debug:v1)
"
* 'unify-inactive-readers/v2' of https://github.com/denesb/scylla:
reader_concurrency_semaphore: store inactive readers directly
querier_cache: store readers in the reader concurrency semaphore directly
querier_cache: retire memory based cache eviction
querier_cache: delegate expiry to the reader_concurrency_semaphore
reader_concurrency_semaphore: introduce ttl for inactive reads
querier_cache: use new eviction notify mechanism to maintain stats
reader_concurrency_semaphore: add eviction notification facility
reader_concurrency_semaphore: extract evict code into method evict()
Use the thread_local seastar::testing::local_random_engine
in all seastar tests so they can be reproduced using
the --random-seed option.
Test: unit(dev)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210112103713.578301-2-bhalevy@scylladb.com>
Clang does not yet implement p1091r3, which allows lambdas
to capture structured bindings. To accomodate it, don't
use structured bindings for variables that are later
captured.
Require a schema and an operation name to be given to each permit when
created. The schema is of the table the read is executed against, and
the operation name, which is some name identifying the operation the
permit is part of. Ideally this should be different for each site the
permit is created at, to be able to discern not only different kind of
reads, but different code paths the read took.
As not all read can be associated with one schema, the schema is allowed
to be null.
The name will be used for debugging purposes, both for coredump
debugging and runtime logging of permit-related diagnostics.
The test currently uses a single permit shared between two simulated
reads (to wait admission twice). This is not a supported way of using a
permit and will stop working soon as we make the states the permit is in
more pronounced.
In the mutation source, which creates the reader for this test, the
global test semaphore's permit was passed to the created reader
(`tests::make_permit()`). This caused reader resources to be accounted
on the global test semaphore, instead of the local one the test creates.
Just forward the permit passed to the mutation sources to the reader to
fix this.
Not used yet, this patch does all the churn of propagating a permit
to each impl.
In the next patch we will use it to track to track the memory
consumption of `_buffer`.