Commit Graph

1788 Commits

Author SHA1 Message Date
Piotr Jastrzebski
01ea159fde codebase wide: use try_emplace when appropriate
C++17 introduced try_emplace for maps to replace a pattern:
if(element not in a map) {
    map.emplace(...)
}

try_emplace is more efficient and results in more concise code.

This commit introduces usage of try_emplace when it's appropriate.
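
The pattern described above can be sketched as follows. This is a minimal illustration, not code from the commit; `register_name` and the map layout are hypothetical names chosen for the example.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical helper (not from the Scylla tree): registers a name for an id
// only if the id is not present yet. Returns true when an entry was inserted.
inline bool register_name(std::map<int, std::string>& names, int id, std::string name) {
    // Old pattern, two lookups:
    //   if (names.find(id) == names.end()) { names.emplace(id, std::move(name)); }
    // try_emplace performs a single lookup and does not move from `name`
    // when the key already exists.
    return names.try_emplace(id, std::move(name)).second;
}
```

Calling `register_name` twice with the same id keeps the first value and reports `false` on the second call.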

Tests: unit(dev)

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <4970091ed770e233884633bf6d46111369e7d2dd.1597327358.git.piotr@scylladb.com>
2020-08-16 14:41:09 +03:00
Piotr Jastrzebski
c001374636 codebase wide: replace count with contains
C++20 introduced `contains` member functions for maps and sets for
checking whether an element is present in the collection. Previously
the `count` function was often used for this in various ways.

`contains` not only expresses the intent of the code better but also
does so in a more uniform way.

This commit replaces all the occurrences of `count` with
`contains`.

Tests: unit(dev)

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <b4ef3b4bc24f49abe04a2aba0ddd946009c9fcb2.1597314640.git.piotr@scylladb.com>
2020-08-15 20:26:02 +03:00
Nadav Har'El
8135647906 merge: Add metrics to semaphores
Merged pull request https://github.com/scylladb/scylla/pull/7018
by Piotr Sarna:

This series addresses various issues with metrics and semaphores - it mainly adds missing metrics, which makes it possible to see the length of the queues attached to the semaphores. In the case of view building and view update generation, metrics were not present in these services at all, so a first, basic implementation is added.

More precise semaphore metrics would ease the testing and development of load shedding and admission control.

	view_builder: add metrics
	db, view: add view update generator metrics
	hints: track resource_manager sending queue length
	hints: add drain queue length to metrics
	table: add metrics for sstable deletion semaphore
	database: remove unused semaphore
2020-08-12 12:39:59 +03:00
Piotr Sarna
5086a5ca32 view_builder: add metrics
The view builder service lacked metrics, so a basic set of them
is added.
2020-08-11 17:43:53 +02:00
Piotr Sarna
e4d78b60ff db, view: add view update generator metrics
The view update generator completely lacked metrics, so a basic set
of them is now exposed.
2020-08-11 17:43:53 +02:00
Piotr Sarna
180a1505fd hints: track resource_manager sending queue length
The number of tasks waiting for a hint to be sent is now tracked.
2020-08-11 17:43:53 +02:00
Piotr Sarna
58a9fa7d2e hints: add drain queue length to metrics
The number of tasks waiting for a drain is now tracked.
2020-08-11 17:43:53 +02:00
Avi Kivity
d36601a838 Merge 'Make commitlog respect disk limit better' from Calle
"
Refs #6148

Separates disk usage into two cases: allocated and used.
This is needed because we use both reserve and recycled
segments, neither of which is actually filled with anything
while it is waiting.

Also refuses to recycle segments or increase reserve size
if our current disk footprint exceeds threshold.

And finally uses some initial heuristics to determine when
we should suggest flushing, based on disk limit, segment
size, and current usage. Right now, that is when we only have
half a segment left before hitting used == max.

Some initial tests show an improved adherence to limit
though it will still be exceeded, because we do _not_
force waiting for segments to become cleared or similar
if we need to add data, thus slow flushing can still make
usage create extra segments. We will however attempt to
shrink disk usage when load is lighter.

Somewhat unclear how much this impacts performance
with tight limits, and how much this matters.
"

* elcallio-calle/commitlog_size:
  commitlog: Make commitlog respect disk limit better
  commitlog: Demote buffer write log messages to trace
2020-08-11 15:03:32 +03:00
Calle Wilund
5d044ab74e commitlog: Make commitlog respect disk limit better
Refs #6148

Separates disk usage into two cases: allocated and used.
This is needed because we use both reserve and recycled
segments, neither of which is actually filled with anything
while it is waiting.

Also refuses to recycle segments or increase reserve size
if our current disk footprint exceeds threshold.

And finally uses some initial heuristics to determine when
we should suggest flushing, based on disk limit, segment
size, and current usage. Right now, that is when we only have
half a segment left before hitting used == max.
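
One possible reading of this heuristic, sketched under assumptions: the function name, the parameter names and the use of `>=` are all hypothetical, and the actual commitlog code may compute the threshold differently.

```cpp
#include <cstdint>

// Hypothetical sketch (not the commitlog implementation): suggest a flush
// once at most half a segment of headroom remains before used == max.
inline bool should_suggest_flush(std::uint64_t used, std::uint64_t max_disk, std::uint64_t segment_size) {
    const std::uint64_t headroom = segment_size / 2;
    return used + headroom >= max_disk;
}
```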

Some initial tests show an improved adherence to limit
though it will still be exceeded, because we do _not_
force waiting for segments to become cleared or similar
if we need to add data, thus slow flushing can still make
usage create extra segments. We will however attempt to
shrink disk usage when load is lighter.

Somewhat unclear how much this impacts performance
with tight limits, and how much this matters.

v2:
* Add some comments/explanations
v3:
* Made disk footprint subtract happen post delete (non-optimistic)
2020-08-11 10:40:56 +00:00
Avi Kivity
3530e80ce1 Merge "Support md format" from Benny
"
This series adds support for the "md" sstable format.

Support is based on the following:

* Do not use clustering-based filtering in the presence
  of a static row or tombstones.
* Disable min/max column names in the metadata for
  formats older than "md".
* When updating the metadata, reset and disable min/max
  in the presence of range tombstones (like Cassandra does
  and until we process them accurately).
* Fix the way we maintain min/max column names by:
  keeping whole clustering key prefixes as min/max
  rather than calculating min/max independently for
  each component, like Cassandra does in the "md" format.

Fixes #4442

Tests: unit(dev), cql_query_test -t test_clustering_filtering* (debug)
md migration_test dtest from git@github.com:bhalevy/scylla-dtest.git migration_test-md-v1
"
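
To illustrate the difference between the two ways of maintaining min/max mentioned in the last bullet, here is a small sketch. It models a clustering key prefix as a vector of integers, which is an assumption made for the example only, and it assumes a non-empty input where all keys have the same length.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a clustering key prefix.
using clustering_prefix = std::vector<int>;

struct bounds {
    clustering_prefix min;
    clustering_prefix max;
};

// Whole-prefix bounds: prefixes are compared lexicographically as a unit,
// so min and max are always keys that actually occur in the input.
inline bounds whole_prefix_bounds(const std::vector<clustering_prefix>& keys) {
    auto [lo, hi] = std::minmax_element(keys.begin(), keys.end());
    return {*lo, *hi};
}

// Per-component bounds: each component is minimised/maximised independently,
// which can produce a min or max that matches no real key.
inline bounds per_component_bounds(const std::vector<clustering_prefix>& keys) {
    bounds b{keys.front(), keys.front()};
    for (const auto& k : keys) {
        for (std::size_t i = 0; i < k.size(); ++i) {
            b.min[i] = std::min(b.min[i], k[i]);
            b.max[i] = std::max(b.max[i], k[i]);
        }
    }
    return b;
}
```

For the keys `{1, 9}` and `{2, 0}`, the whole-prefix bounds are `{1, 9}` and `{2, 0}`, while the per-component bounds are `{1, 0}` and `{2, 9}`.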

* tag 'md-format-v4' of github.com:bhalevy/scylla: (27 commits)
  config: enable_sstables_md_format by default
  test: cql_query_test: add test_clustering_filtering unit tests
  table: filter_sstable_for_reader: allow clustering filtering md-format sstables
  table: create_single_key_sstable_reader: emit partition_start/end for empty filtered results
  table: filter_sstable_for_reader: adjust to md-format
  table: filter_sstable_for_reader: include non-scylla sstables with tombstones
  table: filter_sstable_for_reader: do not filter if static column is requested
  table: filter_sstable_for_reader: refactor clustering filtering conditional expression
  features: add MD_SSTABLE_FORMAT cluster feature
  config: add enable_sstables_md_format
  database: add set_format_by_config
  test: sstable_3_x_test: test both mc and md versions
  test: Add support for the "md" format
  sstables: mx/writer: use version from sstable for write calls
  sstables: mx/writer: update_min_max_components for partition tombstone
  sstables: metadata_collector: support min_max_components for range tombstones
  sstable: validate_min_max_metadata: drop outdated logic
  sstables: rename mc folder to mx
  sstables: may_contain_rows: always true for old formats
  sstables: add may_contain_rows
  ...
2020-08-11 13:29:11 +03:00
Piotr Jastrzebski
80e3923b3c codebase wide: replace find(...) != end() with contains
C++20 introduced `contains` member functions for maps and sets for
checking whether an element is present in the collection. Previously
the code pattern looked like:

<collection>.find(<element>) != <collection>.end()

In C++20 the same can be expressed with:

<collection>.contains(<element>)

This is not only more concise but also expresses the intent of the code
more clearly.

This commit replaces all the occurrences of the old pattern with the new
approach.

Tests: unit(dev)

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <f001bbc356224f0c38f06ee2a90fb60a6e8e1980.1597132302.git.piotr@scylladb.com>
2020-08-11 13:28:50 +03:00
Calle Wilund
9167d1ac76 commitlog: Demote buffer write log messages to trace
Because they become very plentiful and annoying when
one tries to analyze segment behaviour. More so in
batch mode.
2020-08-11 09:18:23 +00:00
Benny Halevy
e2340d0684 config: enable_sstables_md_format by default
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-10 19:19:32 +03:00
Benny Halevy
e8d7744040 features: add MD_SSTABLE_FORMAT cluster feature
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-10 18:53:04 +03:00
Benny Halevy
65239a6e50 config: add enable_sstables_md_format
MD format is disabled by default at this point.

The option extends enable_sstables_mc_format
so that both are needed to be set for supporting
the md format.

The MD_FORMAT cluster feature will be added in
a following patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-10 18:53:04 +03:00
Piotr Jastrzebski
52ec0c683e codebase wide: replace erase + remove_if with erase_if
C++20 introduced std::erase_if which simplifies removal of elements
from the collection. Previously the code pattern looked like:

<collection>.erase(
        std::remove_if(<collection>.begin(), <collection>.end(), <predicate>),
        <collection>.end());

In C++20 the same can be expressed with:

std::erase_if(<collection>, <predicate>);

This commit replaces all the occurrences of the old pattern with the new
approach.

Tests: unit(dev)

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <6ffcace5cce79793ca6bd65c61dc86e6297233fd.1597064990.git.piotr@scylladb.com>
2020-08-10 18:17:38 +03:00
Dejan Mircevski
df20854963 cql3: Move expressions to their own namespace
Move the classes representing CQL expressions (and utility functions
on them) from the `restrictions` namespace to a new namespace `expr`.

Most of the restriction.hh content was moved verbatim to
expression.hh.  Similarly, all expression-related code was moved from
statement_restrictions.cc verbatim to expression.cc.

As suggested in #5763 feedback
https://github.com/scylladb/scylla/pull/5763#discussion_r443210498

Tests: dev (unit)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2020-08-08 21:03:26 +03:00
Avi Kivity
1572b9e41c Merge 'transport: Added listener with port-based load balancing' from Juliusz
"
This is inspired by #6781. The idea is to make Scylla listen for CQL connections on port 9042 (where both old shard-aware and shard-unaware clients can still connect the traditional way). On top of that I added a new port, where everything works the same way, except that the port from the client's socket is used to determine the shard No. to connect to. The desired shard No. is the result of `clientside_port % num_shards`.

The new port is configurable from scylla.yaml and defaults to 19042 (unencrypted, unless user configures encryption options and omits `native_shard_aware_transport_port_ssl` in DB config).

Two "SUPPORTED" tags are added: "SCYLLA_SHARD_AWARE_PORT" and "SCYLLA_SHARD_AWARE_PORT_SSL". For compatibility, "SCYLLA_SHARDING_ALGORITHM" is still kept.

Fixes #5239
"
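
The shard selection rule quoted above, `clientside_port % num_shards`, can be sketched as follows. Both function names are hypothetical; the second one shows how a client might pick a local port for a desired shard, assuming `shard < num_shards`, and leaves range checks on the resulting port to the caller.

```cpp
#include <cstdint>

// Shard that a connection from the given client-side port lands on,
// following the rule stated in the commit message.
inline unsigned shard_for_client_port(std::uint16_t clientside_port, unsigned num_shards) {
    return clientside_port % num_shards;
}

// Hypothetical client-side helper: the smallest port >= first_port that maps
// to the requested shard. Assumes shard < num_shards.
inline std::uint32_t next_port_for_shard(std::uint32_t first_port, unsigned shard, unsigned num_shards) {
    std::uint32_t port = first_port;
    while (port % num_shards != shard) {
        ++port;
    }
    return port;
}
```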

* jul-stas-shard-aware-listener:
  docs: Info about shard-aware listeners in protocol-extensions
  transport: Added listener with port-based load balancing
2020-08-03 19:23:28 +03:00
Calle Wilund
30a700c5b0 system_keyspace: Remove support for legacy truncation records
Fixes #6341

Since scylla no longer supports upgrading from a version without the
"new" (dedicated) truncation record table, we can remove support for these
and the migration thereof.

Make sure the above holds wherever this is committed.

Note that this does not remove the "truncated_at" field in
system.local.
2020-08-03 17:16:26 +03:00
Eliran Sinvani
779502ab11 Revert "schema: take into account features when converting a table creation to"
This reverts commit b97f466438.

It turns out that the schema mechanism has a lot of nuances.
After this change, for an unknown reason, it was empirically
shown that the amount of cross-shard activity on an upgraded node
increased significantly under steady stress traffic. It
was so significant that the node appeared unavailable to
the coordinators, because all of the requests started to fail
on the smp_service_group semaphore.

This revert will bring back a caveat in Scylla: creating
a table in a mixed cluster **might**, under certain
conditions, cause a schema mismatch on the newly created table. This
makes the table essentially unusable until the whole cluster has
a uniform version (rolling upgrade or rollback completion).

Fixes #6893.
2020-08-03 12:51:16 +03:00
Avi Kivity
257c17a87a Merge "Don't depend on seastar::make_(lw_)?shared idiosyncrasies" from Rafael
"
While working on another patch I was getting odd compiler errors
saying that a call to ::make_shared was ambiguous. The reason was that
seastar has both:

template <typename T, typename... A>
shared_ptr<T> make_shared(A&&... a);

template <typename T>
shared_ptr<T> make_shared(T&& a);

The second variant doesn't exist in std::make_shared.

This series drops the dependency in scylla, so that a future change
can make seastar::make_shared a bit more like std::make_shared.
"

* 'espindola/make_shared' of https://github.com/espindola/scylla:
  Everywhere: Explicitly instantiate make_lw_shared
  Everywhere: Add a make_shared_schema helper
  Everywhere: Explicitly instantiate make_shared
  cql3: Add a create_multi_column_relation helper
  main: Return a shared_ptr from defer_verbose_shutdown
2020-08-02 19:51:24 +03:00
Juliusz Stasiewicz
1c11d8f4c4 transport: Added listener with port-based load balancing
The new port is configurable from scylla.yaml and defaults to 19042
(unencrypted, unless client configures encryption options and omits
`native_shard_aware_transport_port_ssl`).

Two "SUPPORTED" tags are added: "SCYLLA_SHARD_AWARE_PORT" and
"SCYLLA_SHARD_AWARE_PORT_SSL". For compatibility,
"SCYLLA_SHARDING_ALGORITHM" is still kept.

Fixes #5239
2020-07-31 13:02:13 +02:00
Tomasz Grabiec
3486eba1ce commitlog: Fix use-after-free on mutation object during replay
The mutation object may be freed prematurely during commitlog replay
in the schema upgrading path. We will hit the problem if the memtable
is full and apply_in_memory() needs to defer.

This will typically manifest as a segfault.

Fixes #6953

Introduced in 79935df

Tests:
  - manual using scylla binary. Reproduced the problem then verified the fix makes it go away

Message-Id: <1596044010-27296-1-git-send-email-tgrabiec@scylladb.com>
2020-07-29 20:58:15 +03:00
Avi Kivity
fea5067dfa Merge "Limit non-paged query memory consumption" from Botond
"
Non-paged queries completely ignore the query result size limiter
mechanism. They consume all the memory they want. With sufficiently
large datasets this can easily lead to a handful or even a single
unpaged query producing an OOM.

This series continues the work started by 134d5a5f7, by introducing a
configurable pair of soft/hard limits (defaulting to 1MB/100MB) that is
applied to otherwise unlimited queries, like reverse and unpaged ones.
When an unlimited query reaches the soft limit a warning is logged. This
should give users some heads-up to adjust their application. When the
hard limit is reached the query is aborted. The idea is to not greet
users with failing queries after an upgrade while at the same time
protect the database from the really bad queries. The hard limit should
be decreased from time to time gradually approaching the desired goal of
1MB.

We don't want to limit internal queries, we trust ourselves to either
use another form of memory usage control, or read only small datasets.
So the limit is selected according to the query class. User reads use
the `max_memory_for_unlimited_query_{soft,hard}_limit` configuration
items, while internal reads are not limited. The limit is obtained by
the coordinator, who passes it down to replicas using the existing
`max_result_size` parameter (which is now a special type containing the
two limits), which is now passed on every verb, instead of once per
connection. This ensures that all replicas work with the same limits.
For normal paged queries `max_result_size` is set to the usual
`query::result_memory_limiter::maximum_result_size`. For queries that can
consume unlimited amount of memory -- unpaged and reverse queries --
this is set to the value of the aforementioned
`max_memory_for_unlimited_query_{soft,hard}_limit` configuration item,
but only for user reads, internal reads are not limited.

This has the side-effect that reverse reads now send entire
partitions in a single page, but this is not that bad. The data was
already read, and its size was below the limit, the replica might as well
send it all.

Fixes: #5870
"

* 'nonpaged-query-limit/v5' of https://github.com/denesb/scylla: (26 commits)
  test: database_test: add test for enforced max result limit
  mutation_partition: abort read when hard limit is exceeded for non-paged reads
  query-result.hh: move the definition of short_read to the top
  test: cql_test_env: set the max_memory_unlimited_query_{soft,hard}_limit
  test: set the allow_short_read slice option for paged queries
  partition_slice_builder: add with_option()
  result_memory_accounter: remove default constructor
  query_*(): use the coordinator specified memory limit for unlimited queries
  storage_proxy: use read_command::max_result_size to pass max result size around
  query: result_memory_limiter: use the new max_result_size type
  query: read_command: add max_result_size
  query: read_command: use tagged ints for limit ctor params
  query: read_command: add separate convenience constructor
  service: query_pager: set the allow_short_read flag
  result_memory_accounter: check(): use _maximum_result_size instead of hardcoded limit
  storage_proxy: add get_max_result_size()
  result_memory_limiter: add unlimited_result_size constant
  database: add get_statement_scheduling_group()
  database: query_mutations(): obtain the memory accounter inside
  query: query_class_config: use max_result_size for the max_memory_for_unlimited_query field
  ...
2020-07-29 13:41:53 +03:00
Botond Dénes
9eab5bca27 query_*(): use the coordinator specified memory limit for unlimited queries
It is important that all replicas participating in a read use the same
memory limits to avoid artificial differences due to different amount of
results. The coordinator now passes down its own memory limit for reads,
in the form of max_result_size (or max_size). For unpaged or reverse
queries this has to be used now instead of the locally set
max_memory_unlimited_query configuration item.

To avoid the replicas accidentally using the local limit contained in
the `query_class_config` returned from
`database::make_query_class_config()`, we refactor the latter into
`database::get_reader_concurrency_semaphore()`. Most of its callers were
interested only in the semaphore anyway, and those that were
interested in the limit as well should get it from the coordinator
instead, so this refactoring is a win-win.
2020-07-28 18:00:29 +03:00
Botond Dénes
159d37053d storage_proxy: use read_command::max_result_size to pass max result size around
Use the recently added `max_result_size` field of `query::read_command`
to pass the max result size around, including passing it to remote
nodes. This means that the max result size will be sent along each read,
instead of once per connection.
As we want to select the appropriate `max_result_size` based on the type
of the query as well as based on the query class (user or internal) the
previous method won't do anymore. If the remote doesn't fill this
field, the old per-connection value is used.
2020-07-28 18:00:29 +03:00
Botond Dénes
92a7b16cba query: read_command: add max_result_size
This field will replace max size which is currently passed once per
established rpc connection via the CLIENT_ID verb and stored as an
auxiliary value on the client_info. For now it is unused, but we update
all sites creating a read command to pass the correct value to it. In the
next patch we will phase out the old max size and use this field to pass
max size on each verb instead.
2020-07-28 18:00:29 +03:00
Botond Dénes
8992bcd1f8 query: read_command: use tagged ints for limit ctor params
The convenience constructor of read_command now has two integer
parameters next to each other. In the next patch we intend to add another
one. This is a recipe for disaster, so to avoid mistakes this patch
converts these parameters to tagged integers. This makes sure callers
pass what they meant to pass. As a matter of fact, while fixing up
call-sites, I already found several ones passing `query::max_partitions`
to the `row_limit` parameter. No harm done yet, as
`query::max_partitions` == `query::max_rows` but this shows just how
easy it is to mix up parameters with the same type.
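
The tagged-integer idea can be sketched like this. The type and member names below are hypothetical and are not the ones used in `query::read_command`; the point is only that two limits with distinct tag types cannot be swapped silently.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical tagged integer: the Tag parameter makes each alias a distinct type.
template <typename Tag>
struct tagged_uint {
    std::uint64_t value;
    explicit constexpr tagged_uint(std::uint64_t v) : value(v) {}
};

using row_limit = tagged_uint<struct row_limit_tag>;
using partition_limit = tagged_uint<struct partition_limit_tag>;

// Hypothetical stand-in for a command taking two limits of the same underlying type.
struct read_limits {
    std::uint64_t rows;
    std::uint64_t partitions;
    read_limits(row_limit r, partition_limit p) : rows(r.value), partitions(p.value) {}
};

// Swapped or untagged arguments no longer compile.
static_assert(!std::is_constructible_v<read_limits, partition_limit, row_limit>);
static_assert(!std::is_constructible_v<read_limits, int, int>);
```

A call site then has to spell out its intent, for example `read_limits(row_limit(100), partition_limit(10))`.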
2020-07-28 18:00:29 +03:00
Botond Dénes
46d5b651eb db/config: introduce max_memory_for_unlimited_query_soft_limit and max_memory_for_unlimited_query_hard_limit
This pair of limits replaces the old max_memory_for_unlimited_query one,
which remains as an alias to the hard limit. The soft limit inherits the
previous value of the limit (1MB); when this limit is reached a warning
will be logged, allowing users to adjust their client code without
downtime. The hard limit starts out with a more permissive default of
100MB. When this is reached queries are aborted, the same behaviour as
with the previous single limit.

The idea is to allow clients a grace period for fixing their code, while
at the same time protecting the database from the really bad queries.
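
A sketch of how a soft/hard pair like this might be evaluated. All names here are hypothetical, and treating "reached" as `>=` is an assumption; only the 1MB and 100MB defaults come from the commit message.

```cpp
#include <cstddef>

// Hypothetical outcome of checking a result size against the two limits.
enum class limit_verdict { ok, warn, abort_query };

// Defaults taken from the commit message: 1MB soft, 100MB hard.
struct unlimited_query_limits {
    std::size_t soft = std::size_t(1) << 20;
    std::size_t hard = std::size_t(100) << 20;
};

// Hypothetical check: warn once the soft limit is reached, abort once the
// hard limit is reached.
inline limit_verdict check_result_size(std::size_t bytes, const unlimited_query_limits& limits) {
    if (bytes >= limits.hard) {
        return limit_verdict::abort_query;
    }
    if (bytes >= limits.soft) {
        return limit_verdict::warn;
    }
    return limit_verdict::ok;
}
```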
2020-07-28 18:00:29 +03:00
Botond Dénes
9faaf46d4b utils: config_src::add_command_line_options(): drop name and desc args
Now that there are no ad-hoc aliases needing to overwrite the name and
description parameter of this method, we can drop these and have each
config item just use `name()` and `desc()` to access these.
2020-07-28 18:00:29 +03:00
Botond Dénes
dc23736d0c db/config: replace ad-hoc aliases with alias mechanism
We already use aliases for some configuration items, although these are
created with an ad-hoc mechanism that only registers them on the command
line. Replace this with the built-in alias mechanism introduced in the previous
patch, which has the benefit of conflict resolution and also working
with YAML.
2020-07-28 18:00:29 +03:00
Piotr Sarna
ee35c4c3d6 db: handle errors when loading view build progress
Currently, encountering an error when loading view build progress
would result in view builder refusing to start - which also means
that future views would not be built until the server restarts.
A more user-friendly solution would be to log an error message,
but continue to boot the view builder as if no views are currently
in progress, which would at least allow future views to be built
correctly.
The test case is also amended, since now it expects the call
to return that "no view builds are in progress" instead of
an exception.

Fixes #6934
Tests: unit(dev)
Message-Id: <9f26de941d10e6654883a919fd43426066cee89c.1595922374.git.sarna@scylladb.com>
2020-07-28 11:32:09 +03:00
Nadav Har'El
f488eaebaf merge: db/view: view_update_generator: make staging reader evictable
Merged patch set by Botond Dénes:

The view update generation process creates two readers. One is used to
read the staging sstables, the data which needs view updates to be
generated for, and another reader for each processed mutation, which
reads the current value (pre-image) of each row in said mutation. The
staging reader is created first and is kept alive until all staging data
is processed. The pre-image reader is created separately for each
processed mutation. The staging reader is not restricted, meaning it
does not wait for admission on the relevant reader concurrency
semaphore, but it does register its resource usage on it. The pre-image
reader however *is* restricted. This creates a situation, where the
staging reader possibly consumes all resources from the semaphore,
leaving none for the later created pre-image reader, which will not be
able to start reading. This will block the view building process meaning
that the staging reader will not be destroyed, causing a deadlock.

This patch solves this by making the staging reader restricted and
making it evictable. To prevent thrashing -- evicting the staging reader
after reading only a really small partition -- we only make the staging
reader evictable after we have read at least 1MB worth of data from it.

  test/boost: view_build_test: add test_view_update_generator_buffering
  test/boost: view_build_test: add test test_view_update_generator_deadlock
  reader_permit: reader_resources: add operator- and operator+
  reader_concurrency_semaphore: add initial_resources()
  test: cql_test_env: allow overriding database_config
  mutation_reader: expose new_reader_base_cost
  db/view: view_updating_consumer: allow passing custom update pusher
  db/view: view_update_generator: make staging reader evictable
  db/view: view_updating_consumer: move implementation from table.cc to view.cc
  database: add make_restricted_range_sstable_reader()

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
---
 db/view/view_updating_consumer.hh | 51 ++++++++++++++++++++++++++++---
 db/view/view.cc                   | 39 +++++++++++++++++------
 db/view/view_update_generator.cc  | 19 +++++++++---
 3 files changed, 91 insertions(+), 18 deletions(-)
2020-07-27 09:19:37 +02:00
Avi Kivity
39db54a758 Merge "Use seastar::with_file_close_on_failure in commitlog" from Benny
"
`close_on_failure` was committed to seastar so use
the library version.

This requires making the lambda function passed to
it nothrow move constructible, so this series also
makes db::commitlog::descriptor move constructor noexcept
and changes allocate_segment_ex and segment::segment
to get a descriptor by value rather than by reference.

Test: unit(dev), commitlog_test(debug)
"

* tag 'commit-log-use-with_file_close_on_failure-v1' of github.com:bhalevy/scylla:
  commitlog: use seastar::with_file_close_on_failure
  commitlog: descriptor: make nothrow move constructible
  commitlog: allocate_segment_ex, segment: pass descriptor by value
  commitlog: allocate_segment_ex: filename capture is unused
2020-07-23 19:23:23 +03:00
Rafael Ávila de Espíndola
e15c8ee667 Everywhere: Explicitly instantiate make_lw_shared
seastar::make_lw_shared has a constructor taking a T&&. There is no
such constructor in std::make_shared:

https://en.cppreference.com/w/cpp/memory/shared_ptr/make_shared

This means that we have to move from

    make_lw_shared(T(...)

to

    make_lw_shared<T>(...)

If we don't want to depend on the idiosyncrasies of
seastar::make_lw_shared.
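
The portable form can be shown with `std::make_shared`, which only offers the explicitly instantiated spelling. This sketch uses the standard library rather than seastar, and `schema_stub` and `make_stub` are hypothetical names.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical type standing in for an object built through make_shared.
struct schema_stub {
    std::string ks;
    std::string table;
    schema_stub(std::string k, std::string t) : ks(std::move(k)), table(std::move(t)) {}
};

inline std::shared_ptr<schema_stub> make_stub(std::string ks, std::string table) {
    // Name T explicitly and forward the constructor arguments. The deduced
    // spelling, make_shared(schema_stub(...)), has no std:: counterpart.
    return std::make_shared<schema_stub>(std::move(ks), std::move(table));
}
```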

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-07-21 10:33:49 -07:00
Rafael Ávila de Espíndola
efeaded427 Everywhere: Add a make_shared_schema helper
This replaces a lot of make_lw_shared(schema(...)) with
make_shared_schema(...).

This makes it easier to drop a dependency on the differences between
seastar::make_shared and std::make_shared.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-07-21 10:33:49 -07:00
Botond Dénes
566e31a5ac db/view: view_updating_consumer: allow passing custom update pusher
So that tests can test the `view_update_consumer` in isolation, without
having to set up the whole database machinery. In addition to less
infrastructure setup, this allows more direct checking of mutations
pushed for view generation.
2020-07-20 11:23:39 +03:00
Botond Dénes
0166f97096 db/view: view_update_generator: make staging reader evictable
The view update generation process creates two readers. One is used to
read the staging sstables, the data which needs view updates to be
generated for, and another reader for each processed mutation, which
reads the current value (pre-image) of each row in said mutation. The
staging reader is created first and is kept alive until all staging data
is processed. The pre-image reader is created separately for each
processed mutation. The staging reader is not restricted, meaning it
does not wait for admission on the relevant reader concurrency
semaphore, but it does register its resource usage on it. The pre-image
reader however *is* restricted. This creates a situation, where the
staging reader possibly consumes all resources from the semaphore,
leaving none for the later created pre-image reader, which will not be
able to start reading. This will block the view building process meaning
that the staging reader will not be destroyed, causing a deadlock.

This patch solves this by making the staging reader restricted and
making it evictable. To prevent thrashing -- evicting the staging reader
after reading only a really small partition -- we only make the staging
reader evictable after we have read at least 1MB worth of data from it.
2020-07-20 11:23:39 +03:00
Botond Dénes
84357f0722 db/view: view_updating_consumer: move implementation from table.cc to view.cc
table.cc is a very counter-intuitive place for view related stuff,
especially if the declarations reside in `db/view/`.
2020-07-20 11:23:39 +03:00
Avi Kivity
5371be71e9 Merge "Reduce fanout of some mutation-related headers" from Pavel E
"
The set's goal is to reduce the indirect fanout of 3 headers only,
but likely affects more. The measured improvement rates are

flat_mutation_reader.hh: -80%
mutation.hh            : -70%
mutation_partition.hh  : -20%

tests: dev-build, 'checkheaders' for changed headers (the tree-wide
       fails on master)
"

* 'br-debloat-mutation-headers' of https://github.com/xemul/scylla:
  headers:: Remove flat_mutation_reader.hh from several other headers
  migration_manager: Remove db/schema_tables.hh inclustion into header
  storage_proxy: Remove frozen_mutation.hh inclustion
  storage_proxy: Move paxos/*.hh inclusions from .hh to .cc
  storage_proxy: Move hint_wrapper from .hh to .cc
  headers: Remove mutation.hh from trace_state.hh
2020-07-19 19:47:59 +03:00
Eliran Sinvani
b97f466438 schema: take into account features when converting a table creation to
schema_mutations

When upgrading from a version that lacks some schema features,
there is a transition period during which we have a mixed cluster.
During that period, schema digests are calculated without taking
into account the features supported by the mixed cluster.
Every node calculates the digest as if the whole cluster supported
its own supported features.
Scylla already has a mechanism of redaction to the lowest common
denominator, but it hasn't been used in this context.

This commit is using the redaction mechanism when calculating the digest on
the newly added table so it will match the supported features of the
whole cluster.

Tests: Manual upgrading - upgraded to a version with an additional
feature and additional schema column and validated that the digest
of the tables schema is identical on every node on the mixed cluster.
2020-07-19 10:30:51 +03:00
Pavel Emelyanov
92f58f62f2 headers:: Remove flat_mutation_reader.hh from several other headers
They can all live with a forward declaration of flat_mutation_reader,
plus a seastar header in the commitlog code.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-07-17 17:54:47 +03:00
Pavel Emelyanov
8618a02815 migration_manager: Remove db/schema_tables.hh inclustion into header
The schema_tables.hh -> migration_manager.hh couple seems to work as a
"single header for everything", creating a lot of bloat for many seemingly
unrelated .hh's.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-07-17 17:54:43 +03:00
Benny Halevy
3ab1d9fe1d commitlog: use seastar::with_file_close_on_failure
`close_on_failure` was committed to seastar so use the
library version.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-07-16 20:32:32 +03:00
Benny Halevy
742298fa2a commitlog: descriptor: make nothrow move constructible
Inherit the guarantee from sstring's nothrow move constructor.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-07-16 20:32:22 +03:00
Benny Halevy
54c5583b8d commitlog: allocate_segment_ex, segment: pass descriptor by value
Besides being more robust than passing const descriptor&
to continuations, this helps simplify making allocate_segment_ex's
continuations nothrow_move_constructible, which is needed for using
seastar::with_file_close_on_failure().
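
The property in question can be checked at compile time. In this sketch `descriptor_stub` and `make_continuation` are hypothetical, and `std::string` stands in for `sstring`; it shows that a lambda capturing such a value by move is itself nothrow move constructible.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a descriptor: its implicit move constructor is
// noexcept because every member's move constructor is noexcept.
struct descriptor_stub {
    std::uint64_t id = 0;
    std::string filename;
};
static_assert(std::is_nothrow_move_constructible_v<descriptor_stub>);

// Capturing the descriptor by value (moved in) keeps the closure type
// nothrow move constructible as well.
inline auto make_continuation(descriptor_stub d) {
    return [d = std::move(d)]() -> std::size_t { return d.filename.size(); };
}
static_assert(std::is_nothrow_move_constructible_v<decltype(make_continuation(descriptor_stub{}))>);
```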

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-07-16 20:31:12 +03:00
Benny Halevy
22c384c2e9 commitlog: allocate_segment_ex: filename capture is unused
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-07-16 20:23:57 +03:00
Pavel Solodovnikov
5ff5df1afd storage_proxy: un-hardcode force sync flag for mutate_locally(mutation) overload
Corresponding overload of `storage_proxy::mutate_locally`
was hardcoded to pass `db::commitlog::force_sync::no` to the
`database::apply`. Unhardcode it and pass `force_sync::no`
at all existing call sites (preserving the previous behaviour).

`force_sync::yes` will be used later for paxos learn writes
when trying to apply mutations upgraded from an obsolete
schema version (similar to the current case when applying
locally a `frozen_mutation` stored in accepted proposal).

Tests: unit(dev)

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20200716124915.464789-1-pa.solodovnikov@scylladb.com>
2020-07-16 16:38:48 +03:00
Avi Kivity
0c7c255f94 Merge "compaction uuid for log and compaction_history" from Benny
"
We'd like to use the same uuid both for printing compaction log
messages and to update compaction_history.

Generate one when starting compaction and keep it in
compaction_info.  Then use it by convention in all
compaction log messages, along with compaction type,
and keyspace.table information.  Finally, use the
same uuid to update compaction_history.

Fixes #6840
"

* tag 'compaction-uuid-v1' of github.com:bhalevy/scylla:
  compaction: print uuid in log messages
  compaction: report_(start|finish): just return description
  compaction: move compaction uuid generation to compaction_info
2020-07-16 16:38:48 +03:00
Benny Halevy
e39fbe1849 compaction: move compaction uuid generation to compaction_info
We'd like to use the same uuid both for printing compaction log
messages and to update compaction_history.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-07-16 13:55:23 +03:00