Commit Graph

307 Commits

Benny Halevy
4476800493 flat_mutation_reader: get rid of timeout parameter
Now that the timeout is taken from the reader_permit.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-08-24 16:30:51 +03:00
Benny Halevy
fe479aca1d reader_permit: add timeout member
To replace the timeout parameter passed
to flat_mutation_reader methods.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-08-24 14:29:44 +03:00
Asias He
040b626235 table: Fix is_shared assert for load and stream
The reader is used by load and stream to read sstables from the upload
directory which are not guaranteed to belong to the local shard.

Use make_range_sstable_reader instead of
make_local_shard_sstable_reader.

Tests:

backup_restore_tests.py:TestBackupRestore.load_and_stream_using_snapshot_test
backup_restore_tests.py:TestBackupRestore.load_and_stream_to_new_cluster_2_test
backup_restore_tests.py:TestBackupRestore.load_and_stream_to_new_cluster_1_test
migration_test.py:TestLoadAndStream.load_and_stream_asymmetric_cluster_test
migration_test.py:TestLoadAndStream.load_and_stream_decrease_cluster_test
migration_test.py:TestLoadAndStream.load_and_stream_frozen_pk_test
migration_test.py:TestLoadAndStream.load_and_stream_increase_cluster_test
migration_test.py:TestLoadAndStream.load_and_stream_primary_replica_only_test

Fixes #9173

Closes #9185
2021-08-11 12:18:40 +03:00
Asias He
4ae6eae00a table: Get rid of table::run_compaction helper
table::run_compaction is a trivial wrapper for
table::compact_sstables.

We have lots of similar {start, trigger, run}_compaction functions.
Dropping the run_compaction wrapper to reduce confusion.

Closes #9161
2021-08-09 14:02:54 +03:00
Tomasz Grabiec
c3ada1a145 Merge "count row (sstables/row cache/memtables) and range (memtables) tombstone reads" from Michael
Fixes #7749.
2021-08-01 23:13:18 +02:00
Michael Livshin
64dca1fef9 memtables: count read row tombstones
Refs #7749.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2021-08-01 19:41:11 +03:00
Michael Livshin
2ee9f1b951 memtables: add metric and accounter for range tombstone reads
Refs #7749.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2021-08-01 19:41:11 +03:00
Nadav Har'El
6c27000b98 Merge 'Propagate exceptions without throwing' from Piotr Sarna
NOTE: this series depends on a Seastar submodule update, currently queued in next: 0ed35c6af052ab291a69af98b5c13e023470cba3

In order to avoid needless throwing, exceptions are passed
directly wherever possible. Two mechanisms which help with that are:
 1. `make_exception_future<>` for futures
 2. `co_return coroutine::exception(...)` for coroutines
    which return `future<T>` (the mechanism does not work for `future<>`
    without parameters, unfortunately)

Tests: unit(release)

Closes #9079

* github.com:scylladb/scylla:
  system_keyspace: pass exceptions without throwing
  sstables: pass exceptions without throwing
  storage_proxy: pass exceptions without throwing
  multishard_mutation_query: pass exceptions without throwing
  client_state: pass exceptions without throwing
  flat_mutation_reader: pass exceptions without throwing
  table: pass exceptions without throwing
  commitlog: pass exceptions without throwing
  compaction: pass exceptions without throwing
  database: pass exceptions without throwing
2021-08-01 16:47:47 +03:00
Raphael S. Carvalho
eb16268768 table: Guarantee serialization of all sstable set updates
Continuing the work from e4eb7df1a1, let's guarantee
serialization of sstable set updates by making all sites acquire
the mutation permit. The table then no longer relies on the
serialization mechanism of row cache's update functions.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210728174740.78826-1-raphaelsc@scylladb.com>
2021-07-29 10:42:18 +03:00
Avi Kivity
df4d77e857 table: simplify generate_and_propagate_view_updates exception handling
We have both try/catch and handle_exception() to ignore exceptions.
Try/catch is enough, so remove handle_exception().

Closes #9011
2021-07-27 14:08:30 +02:00
Piotr Sarna
26ae74524a table: pass exceptions without throwing
In order to avoid needless throwing, exceptions are passed
directly wherever possible. Two mechanisms which help with that are:
 1. make_exception_future<> for futures
 2. co_return coroutine::exception(...) for coroutines
    which return future<T> (the mechanism does not work for future<>
    without parameters, unfortunately)
2021-07-26 17:04:18 +02:00
Raphael S. Carvalho
e4eb7df1a1 table: Make correctness of concurrent sstable list update robust
Today, table relies on row_cache::invalidate() serialization for
concurrent sstable list updates to produce correct results.
That's very error prone because table is relying on an implementation
detail of invalidate() to get things right.
Instead, let's make table itself take care of serialization on
concurrent updates.
To achieve that, sstable_list_builder is introduced. Only one
builder can be alive for a given table, so serialization is guaranteed
as long as the builder is kept alive throughout the update procedure.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210721001716.210281-1-raphaelsc@scylladb.com>
2021-07-21 16:45:30 +03:00
Raphael S. Carvalho
aad72289e2 table: Kill load_sstable()
That function is dangerously used by distributed loader, as the latter
was responsible for invalidating the cache for the new sstable.
load_sstable() is an unsafe alternative to
add_sstable_and_update_cache() that should never have been used by
the outside world. Instead, let's kill it and make loader use
the safe alternative instead.
This will also make it easier to make sure that all concurrent updates
to sstable set are properly serialized.

Additionally, this may reduce the amount of data evicted
from the cache when the sstables being imported have a narrow range,
like high-level sstables imported from an LCS table. Unlikely, but
possible.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210721131949.26899-1-raphaelsc@scylladb.com>
2021-07-21 16:21:42 +03:00
Raphael S. Carvalho
841e9227f9 table: Document the serialization requirement on sstable set rebuild
In order to avoid data loss bugs that could arise from lack of
serialization when using the preemptable build_new_sstable_list(),
let's document the serialization requirement.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210714201301.188622-1-raphaelsc@scylladb.com>
2021-07-17 18:09:00 +03:00
Piotr Sarna
3d816b7c16 Merge 'Move the reader concurrency semaphore in front of the cache' from Botond
This patchset combines two important changes to the way reader permits
are created and admitted:
1) It switches admission to be up-front.
2) It changes the admission algorithm.

(1) Currently permits are created before the read is started, but they
only wait for admission when going to disk. This leaves the resource
consumption of cache and memtable reads unbounded, possibly leading to
OOM (rare, but it happens). This series changes that: permits are
admitted at the moment they are created, making admission up-front
-- at least for those reads that pass admission at all (some don't).

(2) Admission is currently based on availability of resources. We have a
certain amount of memory available, which is derived from the memory
available to the shard, as well as a hardcoded count resource. Reads are
admitted when a count and a certain amount (base cost) of memory are
available. This patchset adds a new aspect to this admission process
beyond the existing resource availability: the number of used/blocked
reads. Namely, it only admits new reads if, in addition to the necessary
amount of resources being available, all currently used readers are
blocked. In other words, we only admit new reads if all currently
admitted reads require something other than CPU to progress. They are
either waiting on I/O, a remote shard, or attention from their consumers
(not used currently).

The reason for making these two changes at the same time is that
up-front admission means cache reads now need to obtain a permit too.
For cache reads the optimal concurrency is 1. Anything above that just
increases latency (without increasing throughput). So we want to make
sure that if a cache read hits, it doesn't get any competition for CPU
and can run to completion. We admit new reads only if the read misses
and has to go to disk.

A side effect of these changes is that the execution stages from the
replica-side read path are replaced with the reader concurrency
semaphore as an execution stage. This is necessary due to bad
interaction between said execution stages and up-front admission. This
has an important consequence: read timeouts are more strictly enforced,
because the execution stage doesn't have a timeout and so could execute
already timed-out reads, while the semaphore's queue drops timed-out
reads. Another consequence is that data and mutation reads now share
the same execution stage, which increases its effectiveness; on the
other hand, system and user reads no longer do.

Fixes: #4758
Fixes: #5718

Tests: unit(dev, release, debug)

* 'reader-concurrency-semaphore-in-front-of-the-cache/v5.3' of https://github.com/denesb/scylla: (54 commits)
  test/boost/reader_concurrency_semaphore_test: add used/blocked test
  test/boost/reader_concurrency_semaphore_test: add admission test
  reader_permit: add operator<< for reader_resources
  reader_concurrency_semaphore: add reads_{admitted,enqueued} stats
  table: make_sstable_reader(): fix indentation
  table: clean up make_sstable_reader()
  database: remove now unused query execution stages
  mutation_reader: remove now unused restricting_reader
  sstables: sstable_set: remove now unused make_restricted_range_sstable_reader()
  reader_permit: remove now unused wait_admission()
  reader_concurrency_semaphore: remove now unused obtain_permit_nowait()
  reader_concurrency_semaphore: admission: flip the switch
  database: increase semaphore max queue size
  test: index_with_paging_test: increase semaphore's queue size
  reader_concurrency_semaphore: add set_max_queue_size()
  test: mutation_reader_test: remove restricted reader tests
  reader_concurrency_semaphore: remove now unused make_permit()
  test: reader_concurrency_semaphore_test: move away from make_permit()
  test: move away from make_permit()
  treewide: use make_tracking_only_permit()
  ...
2021-07-14 16:22:56 +02:00
Botond Dénes
46c9106bdf table: make_sstable_reader(): fix indentation 2021-07-14 17:19:02 +03:00
Botond Dénes
7ddde9107e table: clean up make_sstable_reader()
Remove all the now unneeded mutation sources.
2021-07-14 17:19:02 +03:00
Botond Dénes
16d3cb4777 mutation_reader: remove now unused restricting_reader
Move the now orphaned new_reader_base_cost constant to
database.hh/table.cc, as its main user is now
`table::estimate_read_memory_cost()`.
2021-07-14 17:19:02 +03:00
Botond Dénes
1b7eea0f52 reader_concurrency_semaphore: admission: flip the switch
This patch flips two "switches":
1) It switches admission to be up-front.
2) It changes the admission algorithm.

(1) By now all permits are obtained up-front, so this patch just yanks
out the restricted reader from all reader stacks and simultaneously
switches all `obtain_permit_nowait()` calls to `obtain_permit()`. By
doing this, admission is now waited on when creating the permit.

(2) We switch to an admission algorithm that adds a new aspect to the
existing resource-availability check: the number of used/blocked reads.
Namely, it only admits new reads if, in addition to the necessary amount
of resources being available, all currently used readers are blocked. In
other words, we only admit new reads if all currently admitted reads
require something other than CPU to progress. They are either waiting
on I/O, a remote shard, or attention from their consumers (not used
currently).

We flip these two switches at the same time because up-front admission
means cache reads now need to obtain a permit too. For cache reads the
optimal concurrency is 1. Anything above that just increases latency
(without increasing throughput). So we want to make sure that if a cache
reader hits it doesn't get any competition for CPU and it can run to
completion. We admit new reads only if the read misses and has to go to
disk.

Another change made to accommodate this switch is the replacement of the
replica-side read execution stages with the reader concurrency
semaphore acting as an execution stage. This replacement is needed
because with the introduction of up-front admission, reads are not
independent of each other anymore. One read being executed can influence
whether later reads will be admitted or not, and execution stages
require independent operations to work well. By moving the execution
stage into the semaphore, we have an execution stage which is in control
of both admission and running the operations in batches, avoiding the
bad interaction between the two.
2021-07-14 17:19:02 +03:00
Botond Dénes
7bfa40a2f1 treewide: use make_tracking_only_permit()
For all those reads that don't (won't or can't) pass through admission
currently.
2021-07-14 17:19:02 +03:00
Botond Dénes
7f2813e3fa database: mutation_query(): handle querier lookup/save on the database level
Instead of passing down the querier_cache_ctx to table::mutation_query(),
handle the querier lookup/save on the level where the cache exists.

The real motivation behind this change however is that we need to move
the lookup outside the execution stage, because the current execution
stage will soon be replaced by the one provided by the semaphore and to
use that properly we need to know if we have a saved permit or not.
2021-07-14 16:48:43 +03:00
Botond Dénes
d2f5393a43 database: query(): handle querier lookup/save on the database level
Instead of passing down the querier_cache_ctx to table::query(),
handle the querier lookup/save on the level where the cache exists.

The real motivation behind this change however is that we need to move
the lookup outside the execution stage, because the current execution
stage will soon be replaced by the one provided by the semaphore and to
use that properly we need to know if we have a saved permit or not.
2021-07-14 16:48:43 +03:00
Botond Dénes
999169e535 database: make_streaming_reader(): require permit
As a preparation for up-front admission, add a permit parameter to
`make_streaming_reader()`, which will be the admitted permit once we
switch to up-front admission. For now it has to be a non-admitted
permit.
A nice side-effect of this patch is that now permits will have a
use-case specific description, instead of the generic "streaming" one.
2021-07-14 16:48:43 +03:00
Botond Dénes
a6b59f0d89 table: add estimate_read_memory_cost()
To be used for determining the base cost of reads used in admission. For
now it just returns the already used constant. This is a forward looking
change, to when this will be a real estimation, not just a hardcoded
number.
2021-07-14 16:48:43 +03:00
Piotr Sarna
73f7702a69 table: stop ignoring view generation errors on write path
When the generate-and-propagate-view-updates routine was rewritten
to allow partial results, one important validation got lost:
previously, an error which occurred during update *generation*
was propagated to the user - as an example, the indexed column
value must be smaller than 64kB, otherwise it cannot act as a primary
key part in the underlying view. Errors on view update *propagation*
are however ignored in this layer, because it becomes a background
process.
During the rewrite these two got mixed up and so it was possible
to ignore an error that should have been propagated.
This behavior is now fixed.

Fixes #9013
2021-07-13 17:20:38 +02:00
Piotr Sarna
a1813c9b34 db,view,table: drop unneeded time point parameter
Now that restriction checking is translated to the partition-slice-style
interface, checking the partition/clustering key restrictions for views
can be performed without the time point parameter.
The parameter is dropped from all relevant call sites.
2021-07-13 10:40:08 +02:00
Avi Kivity
222ef17305 build, treewide: enable -Wredundant-move
Returning a function parameter is already treated as a move (copy
elision does not apply to parameters), so it does not require a
std::move(). Enable -Wredundant-move to warn us that the move is
unneeded, and gain slightly more readable code. A few violations are
trivially adjusted.

Closes #9004
2021-07-11 12:53:02 +03:00
Avi Kivity
4f1e21ceac Merge "reader_concurrency_semaphore: get rid of global semaphores" from Botond
"
When obtaining a valid permit was made mandatory, code which now had to
create reader permits but didn't have a semaphore handy suddenly found
itself in a difficult situation. Many places and most prominently tests
solved the problem by creating a thread-local semaphore to source
permits from. This was fine at the time but as usual, globals came back
to haunt us when `reader_concurrency_semaphore::stop()` was
introduced, as these global semaphores had no easy way to be stopped
before being destroyed. This patch-set cleans up this wart, by getting
rid of all global semaphores, replacing them with appropriately scoped
local semaphores, that are stopped after being used. With that, the
FIXME in `~reader_concurrency_semaphore()` can be resolved and we can
finally `assert()` that the semaphore was stopped before being
destroyed.

This series is another preparatory one for the series which moves the
semaphore in front of the cache.

tests: unit(dev)
"

* 'reader-concurrency-semaphore-mandatory-stop/v2' of https://github.com/denesb/scylla: (26 commits)
  reader_concurrency_semaphore: assert(_stopped) in the destructor
  test/lib: remove now unused reader_permit.{hh,cc}
  test/boost: migrate off the global test reader semaphore
  test/manual: migrate off the global test reader semaphore
  test/unit: migrate off the global test reader semaphore
  test/perf: migrate off the global test reader semaphore
  test/perf: perf.hh: add reader_concurrency_semaphore_wrapper
  test/lib: migrate off the global test reader semaphore
  test/lib/simple_schema: migrate off the global test reader semaphore
  test/lib/sstable_utils: migrate off the global test reader semaphore
  test/lib/test_services: migrate off the global test reader semaphore
  test/lib/sstable_test_env: add reader_concurrency_semaphore member
  test/lib/cql_test_env: add make_reader_permit()
  test/lib: add reader_concurrency_semaphore.hh
  test/boost/sstable_test: migrate row counting tests to seastar thread
  test/boost/sstable_test: test_using_reusable_sst(): pass env to func
  test/lib/reader_lifecycle_policy: add permit parameter to factory function
  test/boost/mutation_reader_test: share permit between readers in a read
  memtable: migrate off the global reader concurrency semaphore
  mutation_writer: multishard_writer: migrate off the global reader concurrency semaphore
  ...
2021-07-08 17:28:13 +03:00
Botond Dénes
0f36e5c498 memtable: migrate off the global reader concurrency semaphore
Require the caller of `create_flush_reader()` to pass a permit instead.
2021-07-08 12:31:36 +03:00
Piotr Sarna
6a461d00c6 table: elaborate on why exceptions are ignored for view updates
The generate_and_propagate_view_updates() function explicitly
ignores exceptions reported from the underlying view update
propagation layer. This decision is now explained in the comment.
2021-07-08 11:21:55 +02:00
Piotr Sarna
bf0777e97a view: generate view updates in smaller parts
In order to avoid large allocations and too large mutations
generated from large view updates, granularity of the process
is broken down from per-partition to smaller chunks.
The view update builder now produces partial updates, no more
than 100 view rows at a time.
2021-07-08 11:17:27 +02:00
Piotr Sarna
1000d52cfa table: coroutinize generating view updates
... which will make the incoming changes easier to review.
2021-07-08 11:17:27 +02:00
Raphael S. Carvalho
1924e8d2b6 treewide: Move compaction code into a new top-level compaction dir
Since compaction is layered on top of sstables, let's move all compaction code
into a new top-level directory.
This change will give me extra motivation to remove all layer violations, like
sstable calling compaction-specific code, and compaction entanglement with
other components like table and storage service.

Next steps:
- remove all layer violations
- move compaction code in sstables namespace into a new one for compaction.
- move compaction unit tests into its own file

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210707194058.87060-1-raphaelsc@scylladb.com>
2021-07-07 23:21:51 +03:00
Calle Wilund
fdb5801704 table: Always use explicit commitlog discard + clear out rp_set
Fixes #8733

If a memtable flush is still pending when we call table::clear(),
we can end up doing a "discard-all" call to commitlog, followed
by a per-segment-count (using rp_set) _later_. This will foobar
our internal usage counts and quite probably cause assertion
failures.
Fixed by always doing an explicit per-memtable discard call. But to
ensure this works, since a memtable being flushed remains on
memtable list for a while (why?), we must also ensure we clear
out the rp_set on discard.

v3:
* Fix table::clear to discard rp_sets before memtables

Closes #8894
2021-06-21 14:53:54 +03:00
Nadav Har'El
45c2442f49 Merge 'Avoid large allocs in mv update code' from Piotr Sarna
This series addresses #8852 by:
 * migrating to chunked_vector in view update generation code to avoid large allocations
 * reducing the number of futures kept in mutate_MV, tracking how many view updates were already sent

Combined with #8853 I was able to observe only large partition warnings in the logs for the reproducing code, without crashes, large allocations or reactor stall warnings. The reproducing code itself is not part of cql-pytest because I haven't yet figured out how to make it fast and robust.

Tests: unit(release)
Refs  #8852

Closes #8856

* github.com:scylladb/scylla:
  db,view: limit the number of simultaneous view update futures
  db,view: use chunked_vector for view updates
2021-06-17 14:01:38 +03:00
Avi Kivity
00ff3c1366 Merge 'treewide: add support for snapshot skip-flush option' from Benny Halevy
The option is provided by nodetool snapshot
https://docs.scylladb.com/operating-scylla/nodetool-commands/snapshot/
```
nodetool [(-h <host> | --host <host>)] [(-p <port> | --port <port>)]
         [(-pp | --print-port)] [(-pw <password> | --password <password>)]
         [(-pwf <passwordFilePath> | --password-file <passwordFilePath>)]
         [(-u <username> | --username <username>)] snapshot
         [(-cf <table> | --column-family <table> | --table <table>)]
         [(-kc <kclist> | --kc.list <kclist>)]
         [(-sf | --skip-flush)] [(-t <tag> | --tag <tag>)] [--] [<keyspaces...>]

-sf / --skip-flush    Do not flush memtables before snapshotting (snapshot will not contain unflushed data)
```

But it is currently ignored by scylla-jmx (scylladb/scylla-jmx#167)
and not supported at the API level.

This patch adds support for the option in advance
from the api service level down via snapshot_ctl
to the table class and snapshot implementation.

In addition, a corresponding unit test was added to verify
that taking a snapshot with `skip_flush` does not flush the memtable
(at the table::snapshot level).

Refs #8725

Closes #8726

* github.com:scylladb/scylla:
  test: database_test: add snapshot_skip_flush_works
  api: storage_service/snapshots: support skip-flush option
  snapshot: support skip_flush option
  table: snapshot: add skip_flush option
  api: storage_service/snapshots: add sf (skip_flush) option
2021-06-17 13:32:23 +03:00
Piotr Sarna
a7f7716ecf db,view: use chunked_vector for view updates
The number of view updates can grow large, especially in corner
cases like removing large base partitions. Chunked vector
prevents large allocations.
2021-06-17 10:15:17 +02:00
Piotr Sarna
f832a30388 db,view,table: futurize calculating affected ranges
In order to avoid stalls on large inputs, calculating
affected ranges is now able to yield.
2021-06-16 09:51:31 +02:00
Piotr Sarna
e3fa0246a1 table: coroutinize do_push_view_replica_updates
Makes the code cleaner, but more importantly it will make it easier
to futurize calculate_affected_clustering_ranges in the near future.
2021-06-16 09:51:30 +02:00
Avi Kivity
a55b434a2b treewide: extend copyright statements to present day 2021-06-06 19:18:49 +03:00
Benny Halevy
f081e651b3 memtable_list: rename request_flush to just flush
Now that it returns a future that always waits on
pending flushes there is no point in calling it `request_flush`.
`flush()` is simpler and better describes its function.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-06-06 09:21:23 +03:00
Benny Halevy
948a9da832 table: do_apply: verify that _async_gate is open
Applying changes to the memtable after table::stop
is prohibited. Verify that by making sure that
the _async_gate is still open in `do_apply`.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210601055042.41380-1-bhalevy@scylladb.com>
2021-06-06 09:21:23 +03:00
Calle Wilund
131da30856 table: Always use explicit commitlog discard + clear out rp_set
Fixes #8733

If a memtable flush is still pending when we call table::clear(),
we can end up doing a "discard-all" call to commitlog, followed
by a per-segment-count (using rp_set) _later_. This will foobar
our internal usage counts and quite probably cause assertion
failures.
Fixed by always doing an explicit per-memtable discard call. But to
ensure this works, since a memtable being flushed remains on
memtable list for a while (why?), we must also ensure we clear
out the rp_set on discard.

Closes #8766
2021-06-06 09:21:23 +03:00
Benny Halevy
52fd2b71b7 table: snapshot: add skip_flush option
skip_flush is false by default.

Also, log a debug message when starting the snapshot.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-06-02 17:20:21 +03:00
Benny Halevy
1c0769d789 table: clear: make exception safe
It is currently possible that _memtables->add_memtable()
will throw after _memtables->clear(), leaving the memtables
list completely empty.  However, we do rely on always
having at least one allocated in the memtables list
as active_memtable() references a lw_shared_ptr<memtable>
at the back of the memtables vector, and it expected
to always be allocated via add_memtable() upon construction
and after clear().

This change moves the implementation of this convention
to memtable_list::clear() and makes the latter exception safe
by first allocating the to-be-added empty memtable and
only then clearing the vector.

Refs #8749

Test: unit(dev)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210530100232.2104051-1-bhalevy@scylladb.com>
2021-05-30 13:22:52 +03:00
Asias He
72cc596842 repair: Wire off-strategy compaction for regular repair
We have enabled off-strategy compaction for bootstrap, replace,
decommission and removenode operations when repair based node operation
is enabled. Unlike node operations such as replace or decommission, it is
harder to know when the repair of a table is finished, because users can
send multiple repair requests one after another, each request repairing
a few token ranges.

This patch wires off-strategy compaction for regular repair by adding
a timeout based automatic off-strategy compaction trigger mechanism.
If there is no repair activity for some time, off-strategy compaction
will be triggered for that table automatically.

Fixes #8677

Closes #8678
2021-05-26 11:41:27 +03:00
Benny Halevy
6144656b25 table: seal_active_memtable: update stats also on the error path
Currently the pending (memtables) flushes stats are adjusted back
only on success, therefore they will "leak" on error, so use a
.then_wrapped clause to always update the stats.

Note that _commitlog->discard_completed_segments is still called
only on success, and so is returning the previous_flush future.

Test: unit(dev)
DTest: alternator_tests.py:AlternatorTest.test_batch_with_auto_snapshot_false(debug)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210525055336.1190029-2-bhalevy@scylladb.com>
2021-05-25 12:51:54 +02:00
Avi Kivity
50f3bbc359 Merge "treewide: various header cleanups" from Pavel S
"
The patch set is an assorted collection of header cleanups, e.g:
* Reduce number of boost includes in header files
* Switch to forward declarations in some places

A quick measurement was performed to see if these changes
provide any improvement in build times (ccache cleaned and
existing build products wiped out).

The results are posted below (`/usr/bin/time -v ninja dev-build`)
for 24 cores/48 threads CPU setup (AMD Threadripper 2970WX).

Before:

	Command being timed: "ninja dev-build"
	User time (seconds): 28262.47
	System time (seconds): 824.85
	Percent of CPU this job got: 3979%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 12:10.97
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 2129888
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 1402838
	Minor (reclaiming a frame) page faults: 124265412
	Voluntary context switches: 1879279
	Involuntary context switches: 1159999
	Swaps: 0
	File system inputs: 0
	File system outputs: 11806272
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

After:

	Command being timed: "ninja dev-build"
	User time (seconds): 26270.81
	System time (seconds): 767.01
	Percent of CPU this job got: 3905%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 11:32.36
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 2117608
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 1400189
	Minor (reclaiming a frame) page faults: 117570335
	Voluntary context switches: 1870631
	Involuntary context switches: 1154535
	Swaps: 0
	File system inputs: 0
	File system outputs: 11777280
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

The observed improvement is about 5% of total wall clock time
for `dev-build` target.

Also, all commits make sure that headers stay self-sufficient,
which would help to further improve the situation in the future.
"

* 'feature/header_cleanups_v1' of https://github.com/ManManson/scylla:
  transport: remove extraneous `qos/service_level_controller` includes from headers
  treewide: remove evidently unneeded storage_proxy includes from some places
  service_level_controller: remove extraneous `service/storage_service.hh` include
  sstables/writer: remove extraneous `service/storage_service.hh` include
  treewide: remove extraneous database.hh includes from headers
  treewide: reduce boost headers usage in scylla header files
  cql3: remove extraneous includes from some headers
  cql3: various forward declaration cleanups
  utils: add missing <limits> header in `extremum_tracking.hh`
2021-05-24 14:24:20 +03:00
Avi Kivity
1d508106be table: drop unused field database_sstable_write_monitor::_compaction_manager 2021-05-21 21:04:20 +03:00
Pavel Solodovnikov
fff7ef1fc2 treewide: reduce boost headers usage in scylla header files
`dev-headers` target is also ensured to build successfully.

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
2021-05-20 01:33:18 +03:00