Commit Graph

23751 Commits

Author SHA1 Message Date
Botond Dénes
2ff326a41a test/manual/sstable_scan_footprint_test: document sstable related command line arguments 2020-09-28 11:27:49 +03:00
Botond Dénes
ceb308411c mutation_fragment_test: add exception safety test for mutation_fragment::mutate_as_*() 2020-09-28 11:27:49 +03:00
Botond Dénes
ceb0b02ee8 test: simple_schema: add make_static_row() 2020-09-28 11:27:49 +03:00
Botond Dénes
63578bf0a7 reader_permit: reader_resources: add operator== 2020-09-28 11:27:49 +03:00
Botond Dénes
256140a033 mutation_fragment: memory_usage(): remove unused schema parameter
The memory usage is now maintained and updated on each change to the
mutation fragment, so it need not be recalculated on a call to
`memory_usage()`, hence the schema parameter is unused and can be
removed.
2020-09-28 11:27:47 +03:00
Botond Dénes
041d71bd6f mutation_fragment: track memory usage through the reader_permit
The memory usage of mutation fragments is now tracked throughout their
lifetime via a reader permit. This was the last major (to my current
knowledge) untracked piece of the reader pipeline.
2020-09-28 11:27:29 +03:00
Botond Dénes
52662f17ea reader_permit: resource_units: add permit() and resources() accessors 2020-09-28 11:27:29 +03:00
Botond Dénes
6ca0464af5 mutation_fragment: add schema and permit
We want to start tracking the memory consumption of mutation fragments.
For this we need schema and permit during construction, and on each
modification, so the memory consumption can be recalculated and passed to
the permit.

In this patch we just add the new parameters and go through the insane
churn of updating all call sites. They will be used in the next patch.
2020-09-28 11:27:23 +03:00
Botond Dénes
54357221f0 partition_snapshot_row_cursor: row(): return clustering_row instead of mutation_fragment
It is what its callers want anyway.
2020-09-28 10:53:56 +03:00
Botond Dénes
1e6285d776 mutation_fragment: remove as_mutable_end_of_partition()
There is nothing to mutate on a partition_end fragment.
2020-09-28 10:53:56 +03:00
Botond Dénes
5079b9ccf1 mutation_fragment: s/as_mutable_partition_start/mutate_as_partition_start/
We will soon want to update the memory consumption of mutation fragment
after each modification done to it, to do that safely we have to forbid
direct access to the underlying data and instead have callers pass a
lambda doing their modifications.

Uses where this method was just used to move the fragment away are
converted to use `as_partition_start() &&`.
2020-09-28 10:53:56 +03:00
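The lambda-based access described in these rename patches can be sketched roughly as follows. This is a minimal illustration, not Scylla's code: `row_sketch`, `fragment_sketch`, `as_row()` and `mutate_as_row()` are hypothetical stand-ins, and the size formula is an assumption made only so the example has something to recalculate.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in for the payload of a fragment; not Scylla's real type.
struct row_sketch {
    std::string value;
    // Assumed size formula, for illustration only.
    std::size_t memory_usage() const { return sizeof(row_sketch) + value.capacity(); }
};

// Hypothetical fragment wrapper that caches its memory usage.
class fragment_sketch {
    row_sketch _row;
    std::size_t _memory_usage;
public:
    explicit fragment_sketch(row_sketch r)
        : _row(std::move(r)), _memory_usage(_row.memory_usage()) {}

    // Read-only access cannot change the size, so it stays direct.
    const row_sketch& as_row() const & { return _row; }

    // Moving the payload away consumes the fragment; no accounting is needed.
    row_sketch as_row() && { return std::move(_row); }

    // Every modification goes through a callable, so the cached size can be
    // refreshed right after the caller's change.
    template <typename Func>
    void mutate_as_row(Func&& func) {
        std::forward<Func>(func)(_row);
        _memory_usage = _row.memory_usage();
    }

    std::size_t memory_usage() const { return _memory_usage; }
};
```

A caller would write something like `f.mutate_as_row([] (row_sketch& r) { r.value = "new"; });`, and `std::move(f).as_row()` where it only wants to take the payload.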
Botond Dénes
72a88e0257 mutation_fragment: s/as_mutable_range_tombstone/mutate_as_range_tombstone/
We will soon want to update the memory consumption of mutation fragment
after each modification done to it, to do that safely we have to forbid
direct access to the underlying data and instead have callers pass a
lambda doing their modifications.

Uses where this method was just used to move the fragment away are
converted to use `as_range_tombstone() &&`.
2020-09-28 10:53:56 +03:00
Botond Dénes
4f5ccf82cb mutation_fragment: s/as_mutable_clustering_row/mutate_as_clustering_row/
We will soon want to update the memory consumption of mutation fragment
after each modification done to it, to do that safely we have to forbid
direct access to the underlying data and instead have callers pass a
lambda doing their modifications.

Uses where this method was just used to move the fragment away are
converted to use `as_clustering_row() &&`.
2020-09-28 10:53:56 +03:00
Botond Dénes
f2b9cad4c6 mutation_fragment: s/as_mutable_static_row/mutate_as_static_row/
We will soon want to update the memory consumption of mutation fragment
after each modification done to it, to do that safely we have to forbid
direct access to the underlying data and instead have callers pass a
lambda doing their modifications.

Uses where this method was just used to move the fragment away are
converted to use `as_static_row() &&`.
2020-09-28 10:53:56 +03:00
Botond Dénes
0518571e56 flat_mutation_reader: make _buffer a tracked buffer
Via a tracked_allocator. Although the memory allocations made by the
_buffer shouldn't dominate the memory consumption of the read itself,
they can still be a significant portion that scales with the number of
readers in the read.
2020-09-28 10:53:56 +03:00
Botond Dénes
77ea44cb73 mutation_reader: extract the two fill_buffer_result into a single one
Currently we have two, nearly identical definitions of said struct.
Extract it to a common definition and rename it to
`remote_fill_buffer_result`.
2020-09-28 10:53:56 +03:00
Botond Dénes
3fab83b3a1 flat_mutation_reader: impl: add reader_permit parameter
Not used yet, this patch does all the churn of propagating a permit
to each impl.

In the next patch we will use it to track the memory
consumption of `_buffer`.
2020-09-28 10:53:48 +03:00
Botond Dénes
c1215592da reader_permit: introduce tracking_allocator
This can be used with standard containers and other containers that use
the std::allocator interface to track the allocations made by them via a
reader_permit.
2020-09-28 08:46:22 +03:00
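As an illustration of the idea behind a tracking allocator, here is a minimal sketch of a `std::allocator`-style allocator that reports every allocation to a permit-like counter. `permit_sketch` and `tracking_allocator_sketch` are hypothetical names; the real `reader_permit` interface is not reproduced here.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a reader permit: it only counts bytes.
struct permit_sketch {
    std::size_t consumed = 0;
    void consume(std::size_t n) { consumed += n; }
    void signal(std::size_t n) { consumed -= n; }
};

// Minimal allocator satisfying the standard Allocator requirements.
template <typename T>
class tracking_allocator_sketch {
    permit_sketch* _permit;
    template <typename U> friend class tracking_allocator_sketch;
public:
    using value_type = T;

    explicit tracking_allocator_sketch(permit_sketch& p) noexcept : _permit(&p) {}

    // Rebinding copy: containers may allocate node or proxy types.
    template <typename U>
    tracking_allocator_sketch(const tracking_allocator_sketch<U>& o) noexcept
        : _permit(o._permit) {}

    T* allocate(std::size_t n) {
        T* p = std::allocator<T>().allocate(n);
        _permit->consume(n * sizeof(T)); // account only after success
        return p;
    }

    void deallocate(T* p, std::size_t n) noexcept {
        std::allocator<T>().deallocate(p, n);
        _permit->signal(n * sizeof(T));
    }

    template <typename U>
    bool operator==(const tracking_allocator_sketch<U>& o) const noexcept {
        return _permit == o._permit;
    }
    template <typename U>
    bool operator!=(const tracking_allocator_sketch<U>& o) const noexcept {
        return _permit != o._permit;
    }
};
```

With this, a `std::vector<int, tracking_allocator_sketch<int>>` constructed from an allocator bound to a permit reports its buffer to that permit for as long as the buffer is alive.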
Botond Dénes
f10abf6e35 reader_permit: reader_resources: add with_memory() factory function
To make creating reader resources with just memory more convenient and
more readable at the same time.
2020-09-28 08:46:22 +03:00
Botond Dénes
4c8ab10563 reader_permit: only forward resource consumption to semaphore after admission
In the next patches we plan to start tracking the memory consumption of
the actual allocations made by the circular_buffer<mutation_fragment>,
as well as the memory consumed by the mutation fragments.
This means that readers will start consuming memory off the permit right
after being constructed. Ironically this can prevent the reader from
being admitted, due to its own pre-admission memory consumption. To
prevent this hold on forwarding the memory consumption to the semaphore,
until the permit is actually admitted.
2020-09-28 08:46:22 +03:00
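The deferred forwarding described above can be sketched as follows, under simplifying assumptions: memory is the only resource, everything is synchronous, and `semaphore_sketch`, `deferred_permit_sketch` and `on_admission()` are hypothetical names rather than Scylla's API.

```cpp
// Hypothetical semaphore that only tracks available memory. The value is
// signed so that over-consumption shows up as a negative balance.
struct semaphore_sketch {
    long long available;
    void consume(long long n) { available -= n; }
    void signal(long long n) { available += n; }
};

// Hypothetical permit: it always records consumption locally, but forwards
// it to the semaphore only once the permit has been admitted.
class deferred_permit_sketch {
    semaphore_sketch& _sem;
    long long _consumed = 0;
    bool _admitted = false;
public:
    explicit deferred_permit_sketch(semaphore_sketch& s) : _sem(s) {}

    void consume(long long n) {
        _consumed += n;
        if (_admitted) {
            _sem.consume(n);
        }
    }

    void signal(long long n) {
        _consumed -= n;
        if (_admitted) {
            _sem.signal(n);
        }
    }

    // On admission, everything consumed so far is forwarded in one go.
    void on_admission() {
        _admitted = true;
        _sem.consume(_consumed);
    }

    long long consumed() const { return _consumed; }
};
```

Before `on_admission()` is called, the permit's own buffers do not reduce what the semaphore sees as available, so they cannot block the permit's own admission.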
Botond Dénes
e1eee0dc34 reader_permit: track resource consumed through permit
Track all resources consumed through the permit inside the permit. This
allows querying how much memory each read is consuming (as there should
be one read per permit). Although this might be interesting, especially
when debugging OOM cores, the real reason we are doing this is to be
able to forward resource consumption to the semaphore only post-admission.
More on this in the patch introducing this.

Another advantage of tracking resources consumed through the permit is
that now we can detect resource leaks in the permit destructor and
report them. Even if it is just a case of the holder of the resources
wanting to release the resources later, with the permit destroyed it
will cause use-after-free.
2020-09-28 08:46:22 +03:00
Botond Dénes
cd953a36fd reader_permit: move internals to impl
In the next patches the reader permit will gain members that are shared
across all instances of the same permit. To facilitate this move all
internals into an impl class, of which the permit stores a shared
pointer. We use a shared_ptr to avoid defining `impl` in the header.

This is how the reader permit started in the beginning. We've done a
full circle. :)
2020-09-28 08:46:22 +03:00
Botond Dénes
12372731cb reader_permit: add consume()/signal()
And do all consuming and signalling through these methods. These
operations will soon be more involved than the simple forwarding they do
today, so we want to centralize them to a single method pair.
2020-09-28 08:46:22 +03:00
Botond Dénes
375815e650 reader_permit::resource_units: store permit instead of semaphore
In the next patches we want to introduce per-permit resource tracking --
that is, have each permit track the amount of resource consumed through
it. For this, we need all consumption to happen through a permit, and
not directly with the semaphore.
2020-09-28 08:46:22 +03:00
Botond Dénes
04d83f6678 reader_permit: move resource_units declaration outside the reader_permit class
In the next patch we want to store a `reader_permit` instance inside
`resource_units` so a full definition of the former must be available.
2020-09-28 08:46:22 +03:00
Botond Dénes
0fe75571d9 reader_concurrency_semaphore: admit one read if no reader is active
To ensure progress at all times. This is due to evictable readers, who
still hold on to a buffer even when their underlying reader is evicted.
As we are introducing buffer and mutation fragment tracking in the next
patches, these readers will hold on to memory even in this state, so it
may theoretically happen that even though no readers are admitted (all
count resources are available) no reader can be admitted due to lack of
memory. To prevent such deadlocks we now always admit one reader if all
count resources are available.
2020-09-28 08:46:22 +03:00
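A minimal sketch of the admission rule described above, assuming a semaphore with one count resource and one memory resource. The types and the exact shape of `may_proceed()` here are hypothetical; only the rule itself (always admit one reader when all count resources are available) comes from the commit message.

```cpp
// Hypothetical resource pair: number of reads and bytes of memory.
struct resources_sketch {
    int count;
    long long memory;
};

class admission_semaphore_sketch {
    resources_sketch _initial;
    resources_sketch _available;
public:
    explicit admission_semaphore_sketch(resources_sketch r)
        : _initial(r), _available(r) {}

    void consume(resources_sketch r) {
        _available.count -= r.count;
        _available.memory -= r.memory;
    }

    void signal(resources_sketch r) {
        _available.count += r.count;
        _available.memory += r.memory;
    }

    bool may_proceed(resources_sketch needed) const {
        const bool has_resources = _available.count >= needed.count
                && _available.memory >= needed.memory;
        // All count resources available means no reader is currently
        // admitted, so let one through regardless of memory to ensure progress.
        const bool no_reader_admitted = _available.count == _initial.count;
        return has_resources || no_reader_admitted;
    }
};
```

In this model an evicted reader that still holds buffers consumes memory but no count, which is exactly the state in which the second condition matters.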
Botond Dénes
ef0b279c80 reader_concurrency_semaphore: move may_proceed() out-of-line
They are only used in the .cc anyway.
2020-09-28 08:46:22 +03:00
Botond Dénes
d692993bdc mutation_reader_test: test_multishard_combining_reader_non_strictly_monotonic_positions: reset size between buffer fills
Current code uses a single counter to produce multiple buffers' worth of
data. This uses carry-over from one buffer to the other, which happens to
work with the current memory accounting but is very fragile. Account
each buffer separately, resetting the counter between them.
2020-09-28 08:46:22 +03:00
Botond Dénes
7e909671f4 view_build_test: test_view_update_generator_deadlock: release semaphore resources
The test consumes all resources off the semaphore, leaving just enough
to admit a single reader. However this amount is calculated based on the
base cost of readers, but as we are going to track reader buffers as
well, the amount of memory consumed will be much less predictable.
So to make sure background readers can finish during shutdown, release
all the consumed resources before leaving scope.
2020-09-28 08:46:22 +03:00
Botond Dénes
122ab1aabd view_build_test: test_view_update_generator_buffering: fail the test early on exceptions
No point in continuing processing the entire buffer once a failure was
found, especially since an early failure might introduce conditions that
are not handled in the normal flow-path. We could handle these but there
is no point in this added complexity, at this point the test is failed
anyway.
2020-09-28 08:46:22 +03:00
Botond Dénes
99388590da querier_cache_test: test_resources_based_cache_eviction: use semaphore::consume() to drain semaphore
It is much more reliable and simple this way, than playing with
`reader_permit::wait_for_admission()`.
2020-09-28 08:46:22 +03:00
Botond Dénes
3c73cc2a4e tests: prepare for permit forwarding consumption post admission
Some tests rely on `consume*()` calls on the permit to take effect
immediately. Soon this will only be true once the permit has been
admitted, so make sure the permit is admitted in these tests.
2020-09-28 08:46:22 +03:00
Botond Dénes
5e5c94b064 test/lib/reader_lifecycle_policy: don't destroy reader context eagerly
Currently per-shard reader contexts are cleaned up as soon as the reader
itself is destroyed. This causes two problems:
* Continuations attached to the reader destroy future might rely on
  stuff in the context being kept alive -- like the semaphore.
* Shard 0's semaphore is special as it will be used to account buffers
  allocated by the multishard reader itself, so it has to be alive until
  after all readers are destroyed.

This patch changes this so that contexts are destroyed only when the
lifecycle policy itself is destroyed.
2020-09-28 08:46:22 +03:00
Takuya ASADA
8366d2231d scylla_ntp_setup: use chrony on all distributions
To simplify scylla_ntp_setup, use chrony on all distributions.
2020-09-27 12:30:02 +03:00
Rafael Ávila de Espíndola
2093efceab build: Upgrade to seastar API level 5
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200923202424.216444-1-espindola@scylladb.com>
2020-09-26 11:07:49 +03:00
Avi Kivity
36d93f586a Update seastar submodule
* seastar e215023c7...292ba734b (4):
  > future: Fix move of futures of reference type
  > doc: fix hyper link to tutorial.html
  > tutorial: fix formatting of code block
  > README.md: fix the formatting of table
2020-09-25 21:54:44 +03:00
Tomasz Grabiec
97c99ea9f3 Merge "evictable_reader: validate buffer on reader recreation" from Botond
The reader recreation mechanism is a very delicate and error-prone one,
as proven by the countless bugs it had. Most of these bugs were related
to the recreated reader not continuing the read from the expected
position, inserting out-of-order fragments into the stream.
This patch adds a defense mechanism against such bugs by validating the
start position of the recreated reader.
The intent is to prevent corrupt data from getting into the system as
well as to help catch these bugs as close to the source as possible.

Fixes: #7208

Tests: unit(dev), mutation_reader_test:debug (v4)

* botond/evictable-reader-validate-buffer/v5:
  mutation_reader_test: add unit test for evictable reader self-validation
  evictable_reader: validate buffer after recreating the underlying
  evictable_reader: update_next_position(): only use peek'd position on partition boundary
  mutation_reader_test: add unit test for evictable reader range tombstone trimming
  evictable_reader: trim range tombstones to the read clustering range
  position_in_partition_view: add position_in_partition_view before_key() overload
  flat_mutation_reader: add buffer() accessor
2020-09-25 17:02:51 +02:00
Takuya ASADA
eae2aa58fa dist/common/scripts: move back get_set_nic_and_disks_config_value to scylla_util.py
The function was mistakenly moved to scylla_sysconfig_setup, but it is also
referenced from scylla_prepare, so move it back to scylla_util.py.

Fixes #7276

Closes #7280
2020-09-25 13:05:43 +03:00
Botond Dénes
076c27318b mutation_reader_test: add unit test for evictable reader self-validation
Add both positive (where the validation should succeed) and negative
(where the validation should fail) tests, covering all validation cases.
2020-09-25 12:09:01 +03:00
Botond Dénes
0b0ae18a14 evictable_reader: validate buffer after recreating the underlying
The reader recreation mechanism is a very delicate and error-prone one,
as proven by the countless bugs it had. Most of these bugs were related
to the recreated reader not continuing the read from the expected
position, inserting out-of-order fragments into the stream.
This patch adds a defense mechanism against such bugs by validating the
start position of the recreated reader. Several things are checked:
* The partition is the expected one -- the one we were in the middle of
  or the next if we stopped at partition boundaries.
* The partition is in the read range.
* The first fragment in the partition is the expected one -- has an
  equal or larger position than the next expected fragment.
* The fragment is in the clustering range as defined by the slice.

As these validations are only done on the slow-path of recreating an
evicted reader, no performance impact is expected.
2020-09-25 12:09:00 +03:00
Botond Dénes
91020eef73 evictable_reader: update_next_position(): only use peek'd position on partition boundary
`evictable_reader::update_next_position()` is used to record the position the
reader will continue from, in the next buffer fill. This position is used to
create the partition slice when the underlying reader is evicted and has
to be recreated. There is an optimization in this method -- if the
underlying's buffer is not empty we peek at the first fragment in it and
use it as the next position. This is however problematic for buffer
validation on reader recreation (introduced in the next patch), because
using the next row's position as the next pos will allow for range
tombstones to be emitted with before_key(next_pos.key()), which will
trigger the validation. Instead of working around this, just drop this
optimization for mid-partition positions, it is inconsequential anyway.
We keep it for where it is important, when we detect that we are at a
partition boundary. In this case we can avoid reading the current
partition altogether when recreating the reader.
2020-09-25 12:09:00 +03:00
Botond Dénes
d1b0573e1c mutation_reader_test: add unit test for evictable reader range tombstone trimming 2020-09-25 12:09:00 +03:00
Botond Dénes
4f2e7a18e2 evictable_reader: trim range tombstones to the read clustering range
Currently mutation sources are allowed to emit range tombstones that are
outside the clustering read range if they are relevant to it. For example
a read of a clustering range [ck100, +inf), might start with:

    range_tombstone{start={ck1, -1}, end={ck200, 1}},
    clustering_row{ck100}

The range tombstone is relevant to the range and the first row of the
range so it is emitted first, but its position (start) is outside the
read range. This is normally fine, but it poses a problem for evictable
reader. When the underlying reader is evicted and has to be recreated
from a certain clustering position, this results in out-of-order
mutation fragments being inserted into the middle of the stream. This is
not fine anymore as the monotonicity guarantee of the stream is
violated. The real solution would be to require all mutation sources to
trim range tombstones to their read range, but this is a lot of work.
Until that is done, as a workaround we do this trimming in the evictable
reader itself.
2020-09-25 12:09:00 +03:00
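The trimming itself amounts to intersecting the tombstone with the read range. The sketch below models clustering positions as plain integers and ranges as half-open intervals, which is an assumption made for brevity; `interval_sketch` and `trim_to_read_range()` are hypothetical names and the real code works on partition positions, not integers.

```cpp
#include <algorithm>
#include <optional>

// Hypothetical half-open interval [start, end) over integer positions.
struct interval_sketch {
    int start;
    int end;
};

// Clamp a range tombstone to the read range. Returns std::nullopt when the
// tombstone does not overlap the read range at all.
inline std::optional<interval_sketch> trim_to_read_range(interval_sketch tombstone,
                                                         interval_sketch read_range) {
    tombstone.start = std::max(tombstone.start, read_range.start);
    tombstone.end = std::min(tombstone.end, read_range.end);
    if (tombstone.start >= tombstone.end) {
        return std::nullopt;
    }
    return tombstone;
}
```

In the example from the message, a tombstone covering positions 1 to 200 read with a range starting at 100 would be emitted as starting at 100, so it no longer sorts before the position the recreated reader resumes from.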
Botond Dénes
d7d93aef49 position_in_partition_view: add position_in_partition_view before_key() overload 2020-09-25 12:09:00 +03:00
Avi Kivity
f1fcf4f139 Update seastar submodule
* seastar 9ae33e67e1...e215023c78 (4):
  > future: Make futures non variadic
  > on_internal_error: add noexcept variant
  > Convert another std::result_of to std::invoke_result
  > reactor: remove unused declaration abort_on_error()
2020-09-24 20:04:03 +03:00
Tomasz Grabiec
14fdd2f501 Merge "Gossip echo message improvement" from Asias
This series improves gossip echo message handling in a loaded cluster.

Refs: #7197

* git://github.com/asias/scylla.git gossip_echo_improve_7197:
  gossiper: Handle echo message on any shard
  gossiper: Increase echo message timeout
  gossiper: Remove unused _last_processed_message_at
2020-09-24 15:13:55 +02:00
Pekka Enberg
84a0aca666 configure.py: Rename "mode" to "checkheaders_mode"
The "mode" variable name is used everywhere, usually in a loop.
Therefore, rename the global "mode" to "checkheaders_mode" so that if
your code block happens to be outside of a loop, you don't accidentally
use the globally visible "mode" and spend hours debugging why it's
always "dev".

Spotted by Yaron Kaikov.

Message-Id: <20200924112237.315817-1-penberg@scylladb.com>
2020-09-24 15:00:49 +03:00
Nadav Har'El
e1c42f2bb3 scripts/pull_github_pr.sh: show titles of more than 20 patches
The script pull_github_pr.sh uses git merge's "--log" option to put in
the merge commit the list of titles of the individual patches being
merged in. This list is useful when later searching the log for the merge
which introduced a specific feature.

Unfortunately, "--log" defaults to cutting off the list of commit titles
at 20 lines. For most merges involving fewer than 20 commits, this makes
no difference. But some merges include more than 20 commits, and get
a truncated list, for no good reason. If someone worked hard to create a
patch set with 40 patches, the last thing we should be worried about is
that the merge commit message will be 20 lines longer.

Unfortunately, there appears to be no way to tell "--log" to not limit
the length at all. So I chose an arbitrary limit of 1000. I don't think
we ever had a patch set in Scylla which exceeded that limit. Yet :-)

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200924114403.817893-1-nyh@scylladb.com>
2020-09-24 14:51:58 +03:00
Piotr Dulikowski
39771967bb hinted handoff: fix race - decommission vs. endpoint mgr init
This patch fixes a race between two methods in hints manager: drain_for
and store_hint.

The first method is called when a node leaves the cluster, and it
'drains' end point hints manager for that node (sends out all hints for
that node). If this method is called when the local node is being
decommissioned or removed, it instead drains hints managers for all
endpoints.

In the case of decommission/remove, drain_for first calls
parallel_for_each on all current ep managers and tells them to drain
their hints. Then, after all of them complete, _ep_managers.clear() is
called.

End point hints managers are created lazily and inserted into
_ep_managers map the first time a hint is stored for that node. If
this happens between parallel_for_each and _ep_managers.clear()
described above, the clear operation will destroy the new ep manager
without draining it first. This is a bug and will trigger an assert in
ep manager's destructor.

To solve this, a new flag for the hints manager is added which is set
when it drains all ep managers on removenode/decommission, and prevents
further hints from being written.

Fixes #7257

Closes #7278
2020-09-24 14:51:24 +03:00
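The fix can be illustrated with a small synchronous model. Everything here is a hypothetical simplification: `hints_manager_sketch`, `drain_all()` and the `between_drain_and_clear` hook (which stands in for a `store_hint` call arriving between the drain and the clear) are not Scylla's API, and the real code is asynchronous.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

class hints_manager_sketch {
    // Per-endpoint managers, created lazily on the first stored hint.
    std::map<std::string, std::vector<std::string>> _ep_managers;
    bool _draining_all = false;
public:
    // Returns false if the hint was rejected because a full drain started.
    bool store_hint(const std::string& endpoint, std::string hint) {
        if (_draining_all) {
            return false;
        }
        _ep_managers[endpoint].push_back(std::move(hint));
        return true;
    }

    void drain_all(const std::function<void()>& between_drain_and_clear = [] {}) {
        _draining_all = true; // set before draining, so no new manager appears
        for (auto& entry : _ep_managers) {
            entry.second.clear(); // stands in for sending out the hints
        }
        between_drain_and_clear(); // simulated concurrent activity
        _ep_managers.clear();
    }

    std::size_t manager_count() const { return _ep_managers.size(); }
};
```

Without the flag, a hint stored from inside the hook would create a fresh, undrained manager that the final `clear()` then destroys, which is the situation the commit message describes.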
Nadav Har'El
a5369881b3 Merge 'sstables: make sstable_manager control the lifetime of the sstables it manages' from Avi Kivity
Currently, sstable_manager is used to create sstables, but it loses track
of them immediately afterwards. This series makes an sstable's life fully
contained within its sstable_manager.

The first practical impact (implemented in this series) is that file removal
stops being a background job; instead it is tracked by the sstable_manager,
so when the sstable_manager is stopped, you know that all of its sstable
activity is complete.

Later, we can make use of this to track the data size on disk, but this is not
implemented here.

Closes #7253

* github.com:scylladb/scylla:
  sstables: remove background_jobs(), await_background_jobs()
  sstables: make sstables_manager take charge of closing sstables
  test: test_env: hold sstables_manager with a unique_ptr
  test: drop test_sstable_manager
  test: sstables::test_env: take ownership of manager
  test: broken_sstable_test: prepare for asynchronously closed sstables_manager
  test: sstable_utils: close test_env after use
  test: sstable_test:  dont leak shared_sstable outside its test_env's lifetime
  test: sstables::test_env: close self in do_with helpers
  test: perf/perf_sstable.hh: prepare for asynchronously closed sstables_manager
  test: view_build_test: prepare for asynchronously closed sstables_manager
  test: sstable_resharding_test: prepare for asynchronously closed sstables_manager
  test: sstable_mutation_test: prepare for asynchronously closed sstables_manager
  test: sstable_directory_test: prepare for asynchronously closed sstables_manager
  test: sstable_datafile_test: prepare for asynchronously closed sstables_manager
  test: sstable_conforms_to_mutation_source_test: remove references to test_sstables_manager
  test: sstable_3_x_test: remove test_sstables_manager references
  test: schema_changes_test: drop use of test_sstables_manager
  mutation_test: adjust for column_family_test_config accepting an sstables_manager
  test: lib: sstable_utils: stop using test_sstables_manager
  test: sstables test_env: introduce manager() accessor
  test: sstables test_env: introduce do_with_async_sharded()
  test: sstables test_env: introduce  do_with_async_returning()
  test: lib: sstable test_env: prepare for life as a sharded<> service
  test: schema_changes_test: properly close sstables::test_env
  test: sstable_mutation_test: avoid constructing temporary sstables::test_env
  test: mutation_reader_test: avoid constructing temporary sstables::test_env
  test: sstable_3_x_test: avoid constructing temporary sstables::test_env
  test: lib: test_services: pass sstables_manager to column_family_test_config
  test: lib: sstables test_env: implement tests_env::manager()
  test: sstable_test: detemplate write_and_validate_sst()
  test: sstable_test_env: detemplate do_with_async()
  test: sstable_datafile_test: drop bad 'return'
  table: clear sstable set when stopping
  table: prevent table::stop() race with table::query()
  database: close sstable_manager:s
  sstables_manager: introduce a stub close()
  sstable_directory_test: fix threading confusion in make_sstable_directory_for*() functions
  test: sstable_datafile_test: reorder table stop in compaction_manager_test
  test: view_build_test: test_view_update_generator_register_semaphore_unit_leak: do not discard future in timer
  test: view_build_test: fix threading in test_view_update_generator_register_semaphore_unit_leak
  view: view_update_generator: drop references to sstables when stopping
2020-09-24 13:54:38 +03:00