Commit Graph

1832 Commits

Avi Kivity
fd1dd0eac7 Merge "Track the memory consumption of reader buffers" from Botond
"
The last major untracked area of the reader pipeline is the reader
buffers. These scale with the number of readers as well as with the size
and shape of data, so their memory consumption is unpredictable and varies
wildly. For example, many small rows will trigger larger buffers
allocated within the `circular_buffer<mutation_fragment>`, while few
larger rows will consume a lot of external memory.

This series covers this area by tracking the memory consumption of both
the buffer and its content. This is achieved by passing a tracking
allocator to `circular_buffer<mutation_fragment>` so that each
allocation it makes is tracked. Additionally, we now track the memory
consumption of each and every mutation fragment through its whole
lifetime. Initially I contemplated just tracking the `_buffer_size` of
`flat_mutation_reader::impl`, but concluded that as our reader trees are
typically quite deep, this would result in a lot of unnecessary
`signal()`/`consume()` calls, which scale with the number of mutation
fragments and hence add to the already considerable per-mutation-fragment
overhead. The solution chosen in this series is to instead
track the memory consumption of the individual mutation fragments, with
the observation that these are almost always moved and very rarely
copied, so the number of `signal()`/`consume()` calls will be minimal.
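The tracking-allocator idea above can be sketched with standard containers; `memory_tracker` and `tracking_allocator` are illustrative stand-ins, not Scylla's actual `reader_permit` machinery:

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for the reader permit: a plain byte counter.
struct memory_tracker {
    std::size_t consumed = 0;
};

// Minimal tracking allocator: forwards to std::allocator and reports every
// allocation/deallocation to the tracker, so a container's own storage is
// accounted for. Names are illustrative, not Scylla's API.
template <typename T>
struct tracking_allocator {
    using value_type = T;
    memory_tracker* tracker;

    explicit tracking_allocator(memory_tracker& t) : tracker(&t) {}
    template <typename U>
    tracking_allocator(const tracking_allocator<U>& o) : tracker(o.tracker) {}

    T* allocate(std::size_t n) {
        tracker->consumed += n * sizeof(T);
        return std::allocator<T>{}.allocate(n);
    }
    void deallocate(T* p, std::size_t n) {
        tracker->consumed -= n * sizeof(T);
        std::allocator<T>{}.deallocate(p, n);
    }
    bool operator==(const tracking_allocator& o) const { return tracker == o.tracker; }
    bool operator!=(const tracking_allocator& o) const { return !(*this == o); }
};
```

Passing such an allocator to the buffer type makes every internal allocation visible to the semaphore without extra `signal()`/`consume()` calls per fragment.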

This additional tracking introduces an interesting dilemma however:
readers will now have significant memory on their account even before
being admitted. So it may happen that they can prevent their own
admission via this memory consumption. To prevent this, memory
consumption is only forwarded to the semaphore upon admission. This
might be solved when the semaphore is moved to the front -- before the
cache.
Another consequence of this additional, more complete tracking is that
evictable readers now consume memory even when the underlying reader is
evicted. So it may happen that even though no reader is currently
admitted, all memory is consumed from the semaphore. To prevent any such
deadlocks, the semaphore now admits a reader unconditionally if no
reader is admitted -- that is, if all count resources are available.

Refs: #4176

Tests: unit(dev, debug, release)
"

* 'track-reader-buffers/v2' of https://github.com/denesb/scylla: (37 commits)
  test/manual/sstable_scan_footprint_test: run test body in statement sched group
  test/manual/sstable_scan_footprint_test: move test main code into separate function
  test/manual/sstable_scan_footprint_test: sprinkle some thread::maybe_yield():s
  test/manual/sstable_scan_footprint_test: make clustering row size configurable
  test/manual/sstable_scan_footprint_test: document sstable related command line arguments
  mutation_fragment_test: add exception safety test for mutation_fragment::mutate_as_*()
  test: simple_schema: add make_static_row()
  reader_permit: reader_resources: add operator==
  mutation_fragment: memory_usage(): remove unused schema parameter
  mutation_fragment: track memory usage through the reader_permit
  reader_permit: resource_units: add permit() and resources() accessors
  mutation_fragment: add schema and permit
  partition_snapshot_row_cursor: row(): return clustering_row instead of mutation_fragment
  mutation_fragment: remove as_mutable_end_of_partition()
  mutation_fragment: s/as_mutable_partition_start/mutate_as_partition_start/
  mutation_fragment: s/as_mutable_range_tombstone/mutate_as_range_tombstone/
  mutation_fragment: s/as_mutable_clustering_row/mutate_as_clustering_row/
  mutation_fragment: s/as_mutable_static_row/mutate_as_static_row/
  flat_mutation_reader: make _buffer a tracked buffer
  mutation_reader: extract the two fill_buffer_result into a single one
  ...
2020-09-29 16:08:16 +03:00
Eliran Sinvani
925cdc9ae1 consistency level: fix wrong quorum calculation when RF = 0
We used to calculate the number of endpoints for quorum and local_quorum
unconditionally as ((rf / 2) + 1). This formula doesn't take into
account the corner case where RF = 0; in that situation the quorum should
also be 0.
This commit adds the missing corner case.
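The corrected calculation is simple enough to sketch in a few lines (function name is illustrative):

```cpp
#include <cstdint>

// Quorum size for a replication factor: ((rf / 2) + 1), except that an
// empty replica set (rf == 0) needs no acknowledgements at all.
std::int32_t quorum_for(std::int32_t rf) {
    return rf == 0 ? 0 : rf / 2 + 1;
}
```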

Tests: Unit Tests (dev)
Fixes #6905

Closes #7296
2020-09-29 13:25:41 +03:00
Piotr Sarna
4b856cf62d transport: make max_concurrent_requests_per_shard reloadable
This configuration entry is expected to be used as a quick fix
for an overloaded node, so it should be possible to reload this value
without having to restart the server.
2020-09-29 10:11:36 +02:00
Piotr Sarna
b4db6d2598 transport,config: add a param for max request concurrency
The newly introduced parameter - max_concurrent_requests_per_shard
- can be used to limit the number of in-flight requests a single
coordinator shard can handle. Each surplus request will be
immediately refused with an OverloadedException error returned to the client.
The default value for this parameter is large enough to never
actually shed any requests.
Currently, the limit is only applied to CQL requests - other frontends
like alternator and redis are not throttled yet.
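A minimal model of this kind of admission check (names are illustrative; the real implementation lives in Scylla's CQL transport layer):

```cpp
#include <cstddef>

// Hypothetical sketch of per-shard request admission: a plain counter
// checked against a (reloadable) limit; surplus requests are refused
// immediately rather than queued.
struct request_limiter {
    std::size_t limit;            // max_concurrent_requests_per_shard
    std::size_t in_flight = 0;

    bool try_admit() {
        if (in_flight >= limit) {
            return false;         // caller maps this to OverloadedException
        }
        ++in_flight;
        return true;
    }
    void complete() { --in_flight; }
};
```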
2020-09-29 09:59:30 +02:00
Botond Dénes
6ca0464af5 mutation_fragment: add schema and permit
We want to start tracking the memory consumption of mutation fragments.
For this we need schema and permit during construction, and on each
modification, so the memory consumption can be recalculated and passed to
the permit.

In this patch we just add the new parameters and go through the insane
churn of updating all call sites. They will be used in the next patch.
2020-09-28 11:27:23 +03:00
Botond Dénes
4f5ccf82cb mutation_fragment: s/as_mutable_clustering_row/mutate_as_clustering_row/
We will soon want to update the memory consumption of a mutation fragment
after each modification done to it. To do that safely we have to forbid
direct access to the underlying data and instead have callers pass a
lambda doing their modifications.

Call sites that used this method merely to move the fragment away are
converted to use `as_clustering_row() &&`.
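The mutate-through-a-lambda pattern can be sketched as follows; `tracked_row` is a hypothetical stand-in for `mutation_fragment`:

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Illustrative sketch of the mutate_as_*() idea: the wrapped value is only
// modifiable through a lambda, so the holder can recompute and re-report
// memory usage after every modification. Not Scylla's actual types.
struct tracked_row {
    std::string data;
    std::size_t tracked_bytes = 0;

    template <typename Func>
    void mutate(Func&& f) {
        std::forward<Func>(f)(data);   // caller's modification
        tracked_bytes = data.size();   // usage refreshed afterwards
    }
};
```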
2020-09-28 10:53:56 +03:00
Botond Dénes
3fab83b3a1 flat_mutation_reader: impl: add reader_permit parameter
Not used yet, this patch does all the churn of propagating a permit
to each impl.

In the next patch we will use it to track the memory
consumption of `_buffer`.
2020-09-28 10:53:48 +03:00
Piotr Dulikowski
39771967bb hinted handoff: fix race - decomission vs. endpoint mgr init
This patch fixes a race between two methods in hints manager: drain_for
and store_hint.

The first method is called when a node leaves the cluster, and it
'drains' end point hints manager for that node (sends out all hints for
that node). If this method is called when the local node is being
decommissioned or removed, it instead drains hints managers for all
endpoints.

In the case of decommission/remove, drain_for first calls
parallel_for_each on all current ep managers and tells them to drain
their hints. Then, after all of them complete, _ep_managers.clear() is
called.

End point hints managers are created lazily and inserted into
_ep_managers map the first time a hint is stored for that node. If
this happens between parallel_for_each and _ep_managers.clear()
described above, the clear operation will destroy the new ep manager
without draining it first. This is a bug and will trigger an assert in
ep manager's destructor.

To solve this, a new flag for the hints manager is added which is set
when it drains all ep managers on removenode/decommission, and prevents
further hints from being written.
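A rough sketch of the flag-based fix, with heavily simplified stand-in types:

```cpp
#include <map>
#include <string>

// Sketch of the race fix (illustrative names): once a full drain starts, a
// flag rejects new per-endpoint managers, so none can be created between
// draining the existing managers and clearing the map.
struct hints_manager {
    std::map<std::string, int> ep_managers;  // endpoint -> manager (stubbed)
    bool draining_all = false;

    bool store_hint(const std::string& endpoint) {
        if (draining_all) {
            return false;  // refuse: drain in progress, don't create a manager
        }
        ep_managers.try_emplace(endpoint, 0);
        return true;
    }
    void drain_all() {
        draining_all = true;
        // ... drain each existing manager in parallel (elided) ...
        ep_managers.clear();
    }
};
```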

Fixes #7257

Closes #7278
2020-09-24 14:51:24 +03:00
Avi Kivity
844b675520 view: view_update_generator: drop references to sstables when stopping
sstable_manager will soon wait for all sstables under its
control to be deleted (if so marked), but that can't happen
if someone is holding on to references to those sstables.

To allow sstables_manager::stop() to work, drop remaining
queued work when terminating.
2020-09-23 20:55:02 +03:00
Avi Kivity
a0ffcabd66 view: use nonwrapping_interval instead of nonwrapping_range to avoid clang deduction failure
We use class template argument deduction (CTAD) in a few places, but it appears
not to work for alias templates in clang. While it looks like a clang bug, using
the class name is an improvement, so let's do that.
2020-09-21 16:32:53 +03:00
Pavel Solodovnikov
6e10f2b530 schema_registry: make grace period configurable
Introduce new database config option `schema_registry_grace_period`
describing the amount of time in seconds after which unused schema
versions will be cleaned up from the schema registry cache.

Default value is 1 second, the same value as was hardcoded before.

Tests: unit(debug)
Refs: #7225

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20200915131957.446455-1-pa.solodovnikov@scylladb.com>
2020-09-15 17:53:27 +02:00
Avi Kivity
dcaf4ea4dd Merge "Fix race in schema version recalculation leading to stale schema version in gossip" from Tomasz
"
Migration manager installs several cluster feature change listeners.
The listeners will call update_schema_version_and_announce() when cluster
features are enabled, which does this:

    return update_schema_version(proxy, features).then([] (utils::UUID uuid) {
        return announce_schema_version(uuid);
    });

It first updates the schema version and then publishes it via
gossip in announce_schema_version(). It is possible that the
announce_schema_version() part of the first schema change will be
deferred and will execute after the other four calls to
update_schema_version_and_announce(). It will install the old schema
version in gossip instead of the more recent one.

The fix is to serialize schema digest calculation and publishing.
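The essence of the fix, computing and publishing under one critical section, can be modeled with a plain mutex (illustrative types; the real code serializes futures rather than blocking threads):

```cpp
#include <mutex>
#include <vector>

// Minimal model: hold one lock across both the version computation and the
// announcement, so a delayed announce can never publish a version older
// than one already published. Names are illustrative.
struct schema_announcer {
    std::mutex m;
    int latest = 0;
    std::vector<int> gossiped;

    void update_and_announce(int version) {
        std::lock_guard<std::mutex> g(m);  // calc + publish as one unit
        latest = version;
        gossiped.push_back(latest);        // "announce" under the same lock
    }
};
```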

Refs #7200

This problem also drew my attention to the initialization code, which could be
prone to the same problem.

The storage service computes gossiper states before it starts the
gossiper. Among them, node's schema version. There are two problems with that.

First is that computing the schema version and publishing it is not
atomic, so it is not safe against concurrent schema changes or schema
version recalculations. It does not mutually exclude with
recalculate_schema_version() calls, and we could end up with the old
(and incorrect) schema version being advertised in gossip.

Second problem is that we should not allow the database layer to call
into the gossiper layer before it is fully initialized, as this may
produce undefined behavior.

Maybe we're not doing concurrent schema changes/recalculations now,
but it is easy to imagine that this could change for whatever reason
in the future.

The solution for both problems is to break the cyclic dependency
between the database layer and the storage_service layer by having the
database layer not use the gossiper at all. The database layer
publishes schema version inside the database class and allows
installing listeners on changes. The storage_service layer asks the
database layer for the current version when it initializes, and only
after that installs a listener which will update the gossiper.
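The observable-value idea can be sketched as follows (hypothetical `observable_value`, loosely modeled on the `utils::updateable_value_source` mentioned in the commit list):

```cpp
#include <functional>
#include <utility>
#include <vector>

// Sketch of breaking the layering cycle: the database side publishes its
// schema version as an observable value; the storage_service side reads the
// current value, then subscribes for future changes and is the only layer
// that talks to the gossiper. Names are illustrative.
template <typename T>
struct observable_value {
    T value{};
    std::vector<std::function<void(const T&)>> observers;

    const T& get() const { return value; }
    void observe(std::function<void(const T&)> f) {
        observers.push_back(std::move(f));
    }
    void set(T v) {
        value = std::move(v);
        for (auto& f : observers) f(value);  // notify listeners
    }
};
```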

Tests:

  - unit (dev)
  - manual (3 node ccm)
"

* tag 'fix-schema-digest-calculation-race-v1' of github.com:tgrabiec/scylla:
  db, schema: Hide update_schema_version_and_announce()
  db, storage_service: Do not call into gossiper from the database layer
  db: Make schema version observable
  utils: updateable_value_source: Introduce as_observable()
  schema: Fix race in schema version recalculation leading to stale schema version in gossip
2020-09-14 12:37:46 +03:00
Tomasz Grabiec
691009bc1e db, schema: Hide update_schema_version_and_announce() 2020-09-11 14:42:48 +02:00
Tomasz Grabiec
9f58dcc705 db, storage_service: Do not call into gossiper from the database layer
The storage service computes gossiper states before it starts the
gossiper. Among them, node's schema version. There are two problems with that.

First is that computing the schema version and publishing it is not
atomic, so it is not safe against concurrent schema changes or schema
version recalculations. It does not mutually exclude with
recalculate_schema_version() calls, and we could end up with the old
(and incorrect) schema version being advertised in gossip.

Second problem is that we should not allow the database layer to call
into the gossiper layer before it is fully initialized, as this may
produce undefined behavior.

The solution for both problems is to break the cyclic dependency
between the database layer and the storage_service layer by having the
database layer not use the gossiper at all. The database layer
publishes schema version inside the database class and allows
installing listeners on changes. The storage_service layer asks the
database layer for the current version when it initializes, and only
after that installs a listener which will update the gossiper.

This also allows us to drop unsafe functions like update_schema_version().
2020-09-11 14:42:41 +02:00
Tomasz Grabiec
1a57d641d1 schema: Fix race in schema version recalculation leading to stale schema version in gossip
Migration manager installs several feature change listeners:

    if (this_shard_id() == 0) {
        _feature_listeners.push_back(_feat.cluster_supports_view_virtual_columns().when_enabled(update_schema));
        _feature_listeners.push_back(_feat.cluster_supports_digest_insensitive_to_expiry().when_enabled(update_schema));
        _feature_listeners.push_back(_feat.cluster_supports_cdc().when_enabled(update_schema));
        _feature_listeners.push_back(_feat.cluster_supports_per_table_partitioners().when_enabled(update_schema));
    }

They will call update_schema_version_and_announce() when features are enabled, which does this:

    return update_schema_version(proxy, features).then([] (utils::UUID uuid) {
        return announce_schema_version(uuid);
    });

So it first updates the schema version and then publishes it via
gossip in announce_schema_version(). It is possible that the
announce_schema_version() part of the first schema change will be
deferred and will execute after the other four calls to
update_schema_version_and_announce(). It will install the old schema
version in gossip instead of the more recent one.

The fix is to serialize schema digest calculation and publishing.

Refs #7200
2020-09-11 14:40:28 +02:00
Piotr Grabowski
462d12f555 db: Propagate enable_cache to system keyspaces
Make enable_cache configuration option also
affect caching of system keyspaces. Fixes #2909.
2020-09-07 17:54:46 +03:00
Avi Kivity
64c7c81bac Merge "Update log messages to {fmt} rules" from Pavel E
"
Before seastar is updated with the {fmt} engine under the
logging hood, some changes are to be made in scylla to
conform to {fmt} standards.

Compilation and tests checked against both -- old (current)
and new seastar-s.

tests: unit(dev), manual
"

* 'br-logging-update' of https://github.com/xemul/scylla:
  code: Force formatting of pointer in .debug and .trace
  code: Format { and } as {fmt} needs
  streaming: Do not reveal raw pointer in info message
  mp_row_consumer: Provide hex-formatting wrapper for bytes_view
  heat_load_balance: Include fmt/ranges.h
2020-09-03 15:10:09 +03:00
Kamil Braun
ff78a3c332 cdc: rename CDC description tables... again
Commit a6ad70d3da changed the format of
stream IDs: the lower 8 bytes were previously generated randomly, now
some of them have semantics. In particular, the least significant byte
contains a version (stream IDs might evolve with further releases).

This is a backward-incompatible change: the code won't properly handle
stream IDs with all lower 8 bytes generated randomly. To protect us from
subtle bugs, the code has an assertion that checks the stream ID's
version.

This means that if an experimental user used CDC before the change and
then upgraded, they might hit the assertion when a node attempts to
retrieve a CDC generation with old stream IDs from the CDC description
tables and then decode it.
In effect, the user won't even be able to start a node.

Similarly to the case described in
d89b7a0548, the simplest fix is to rename
the tables. This fix must get merged in before CDC goes out of
experimental.

Now, if the user upgrades their cluster from a pre-rename version, the
node will simply complain that it can't obtain the CDC generation
instead of preventing the cluster from working. The user will be able to
use CDC after running checkAndRepairCDCStreams.

Since a new table is added to the system_distributed keyspace, the
cluster's schema has changed, so sstables and digests need to be
regenerated for schema_digest_test.
2020-08-31 11:33:14 +03:00
Rafael Ávila de Espíndola
d18af34205 everywhere: Use future::get0 when appropriate
This works with current seastar and clears most of the way for
updating to a version that doesn't use std::tuple in futures.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200826231947.1145890-1-espindola@scylladb.com>
2020-08-27 15:05:51 +03:00
Piotr Sarna
ca9422ca73 Merge 'Fix view_builder lockup and crash on shutdown' from Pavel
The lockup:

When view_builder starts, all shards at some point reach a
barrier waiting for each other to pass. If any shard misses
this checkpoint, all the others are stuck forever. As this barrier
lives inside the _started future, which in turn is waited
on during stop, the stop gets stuck as well.

Reasons to miss the barrier -- exception in the middle of the
fun^w start or explicit abort request while waiting for the
schema agreement.

Fix the "exception" case by unlocking the barrier promise with
exception and fix the "abort request" case by turning it into
an exception.

The bug can be reproduced by hand by making one shard never
see the schema agreement and keep looping until the abort
request.

The crash:

If the background start up fails, then the _started future is
resolved into exception. The view_builder::stop then turns this
future into a real exception caught-and-rethrown by main.cc.

It seems wrong that a failure in a background fiber aborts
the regular shutdown that may proceed otherwise.

tests: unit(dev), manual start-stop
branch: https://github.com/xemul/scylla/tree/br-view-builder-shutdown-fix-3
fixes: #7077

Patch #5 leaves the seastar::async() in the first phase of
start(), although it could also be tuned not to produce a thread.
However, there's one more (painless) issue with the _sem usage,
so this change appears too large for the part of the bug-fix
and will come as a followup.

* 'br-view-builder-shutdown-fix-3' of git://github.com/xemul/scylla:
  view_builder: Add comment about builder instances life-times
  view_builder: Do sleep abortable
  view_builder: Wakeup barrier on exception
  view_builder: Always resolve started future to success
  view_builder: Re-futurize start
  view_builder: Split calculate_shard_build_step into two
  view_builder: Populate the view_builder_init_state
  view_builder: Fix indentation after previous patch
  view_builder: Introduce view_builder_init_state
2020-08-27 11:51:46 +02:00
Pavel Emelyanov
812eed27fe code: Force formatting of pointer in .debug and .trace
... and tests. Printing a pointer in logs is considered bad practice,
so the proposal is to keep this explicit (with fmt::ptr) and allow it for
.debug and .trace cases.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-08-26 20:44:11 +03:00
Pavel Emelyanov
366b4e8a8f code: Format { and } as {fmt} needs
There are two places that want to print "{<text>}" strings, but do not format
the curly braces the {fmt}-way.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-08-26 20:44:11 +03:00
Pavel Emelyanov
fe33e3ed78 heat_load_balance: Include fmt/ranges.h
To provide vector<> formatter for {fmt}

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-08-26 20:44:08 +03:00
Avi Kivity
3daa49f098 Merge "materialized views: Fix undefined behavior on base table schema changes" from Tomasz
"
The view_info object, which is attached to the schema object of the
view, contains a data structure called
"base_non_pk_columns_in_view_pk". This data structure contains column
ids of the base table so is valid only for a particular version of the
base table schema. This data structure is used by materialized view
code to interpret mutations of the base table, those coming from base
table writes, or reads of the base table done as part of view updates
or view building.

The base table schema version of that data structure must match the
schema version of the mutation fragments, otherwise we hit undefined
behavior. This may include aborts, exceptions, segfaults, or data
corruption (e.g. writes landing in the wrong column in the view).

Before this patch, we could get schema version mismatch here after the
base table was altered. That's because the view schema did not change
when the base table was altered.

Another problem was that view building was using the current table's schema
to interpret the fragments and invoke view building. That's incorrect for two
reasons. First, fragments generated by a reader must be accessed only using
the reader's schema. Second, base_non_pk_columns_in_view_pk of the recorded
view ptrs may no longer match the current base table schema, which is used
to generate the view updates.

Part of the fix is to extract base_non_pk_columns_in_view_pk into a
third entity called base_dependent_view_info, which changes both on
base table schema changes and view schema changes.

It is managed by a shared pointer so that we can take immutable
snapshots of it, just like with schema_ptr. When starting the view
update, the base table schema_ptr and the corresponding
base_dependent_view_info have to match. So we must obtain them
atomically, and base_dependent_view_info cannot change during update.

Also, whenever the base table schema changes, we must update
base_dependent_view_infos of all attached views (atomically) so that
it matches the base table schema.

Fixes #7061.

Tests:

  - unit (dev)
  - [v1] manual (reproduced using scylla binary and cqlsh)
"

* tag 'mv-schema-mismatch-fix-v2' of github.com:tgrabiec/scylla:
  db: view: Refactor view_info::initialize_base_dependent_fields()
  tests: mv: Test dropping columns from base table
  db: view: Fix incorrect schema access during view building after base table schema changes
  schema: Call on_internal_error() when out of range id is passed to column_at()
  db: views: Fix undefined behavior on base table schema changes
  db: views: Introduce has_base_non_pk_columns_in_view_pk()
2020-08-26 17:37:52 +03:00
Pavel Emelyanov
cf1cb4d145 view_builder: Add comment about builder instances life-times
The barrier passing is tricky and deserves a description
about objects' life-times.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-08-26 15:56:38 +03:00
Pavel Emelyanov
643c431ce4 view_builder: Do sleep abortable
If one shard delays in seeing the schema agreement and returns on
abort request, other shards may get stuck waiting for it on the
status read barrier. Luckily with the previous patch the barrier
is exception-proof, so we may abort the waiting loop with exception
and handle the lock-up.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-08-26 15:56:38 +03:00
Pavel Emelyanov
c36bbc37c9 view_builder: Wakeup barrier on exception
If an exception pops up during the view_builder::start while some
shards wait for the status-read barrier, these shards are not woken
up, thus causing the shutdown to get stuck.

Fix this by setting exception on the barrier promise, resolving all
pending and on-going futures.
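The wake-everyone-via-exception mechanism can be modeled with a standard promise/shared_future pair standing in for the barrier:

```cpp
#include <future>
#include <stdexcept>
#include <string>

// Sketch: all shards wait on one shared future acting as the barrier; on a
// start-up error the promise is resolved with an exception instead of being
// abandoned, so every waiter wakes up and can unwind cleanly.
std::string wait_on_barrier(std::shared_future<void> barrier) {
    try {
        barrier.get();
        return "passed";
    } catch (const std::exception& e) {
        return e.what();  // waiter woken by the exception; shutdown proceeds
    }
}
```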

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-08-26 15:56:38 +03:00
Pavel Emelyanov
8f8ed625ab view_builder: Always resolve started future to success
If the view builder background start fails, the _started future resolves
to an exceptional state. In turn, stopping the view builder keeps this state
through .finally() and aborts the shutdown very early, while it may and
should proceed.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-08-26 15:56:38 +03:00
Pavel Emelyanov
60e21bb59a view_builder: Re-futurize start
Step two in turning the view_builder::start() into a chain of lambdas --
rewrite (most of) the seastar::async()'s lambda into a more "classical"
form.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-08-26 15:56:38 +03:00
Pavel Emelyanov
77c7d94f85 view_builder: Split calculate_shard_build_step into two
The calculate_shard_build_step() has a cross-shard barrier in the middle and
passing the barrier is broken wrt exceptions that may happen before it. The
intention is to prepare this barrier passing for exception handling by turning
the view_builder::start() into a dedicated continuation lambda.

Step one in this campaign -- split the calculate_shard_build_step() into
steps called by view_builder::start():

 - before the barrier
 - barrier
 - after the barrier

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-08-26 15:56:38 +03:00
Pavel Emelyanov
fe0326b75b view_builder: Populate the view_builder_init_state
Keep the internal calculate_shard_build_step()'s stuff on the init helper
struct, as the method in question is about to be split into a chain of
continuation lambdas.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-08-26 15:56:35 +03:00
Pavel Emelyanov
2d2d04c6b7 view_builder: Fix indentation after previous patch
No functional changes.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-08-26 15:46:36 +03:00
Pavel Emelyanov
d0393d92a2 view_builder: Introduce view_builder_init_state
This is the helper initialization struct that will carry the needed
objects across continuation lambdas.

The indentation in ::start() will be fixed in the next patch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-08-26 15:45:15 +03:00
Botond Dénes
0a8cc4c2b5 db/size_estimates_virtual_reader: remove redundant _schema member
This reader was probably created in ancient times, when readers didn't
yet have a _schema member of their own. But now that they do, it is not
necessary to store the schema in the reader implementation, there is one
available in the parent class.

While at it also move the schema into the class when calling the
constructor.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Reviewed-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200821070358.33937-1-bdenes@scylladb.com>
2020-08-22 20:47:49 +03:00
Tomasz Grabiec
c44455d514 Merge "Miscellaneous schema code cleanups" from Rafael 2020-08-20 15:19:42 +02:00
Rafael Ávila de Espíndola
33669bd21d commitlog: Use try_with_gate
Now that we have try_with_gate, we can use it instead of futurize_invoke
and with_gate.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200819191334.74108-1-espindola@scylladb.com>
2020-08-20 15:19:42 +02:00
Tomasz Grabiec
cf12b5e537 db: view: Refactor view_info::initialize_base_dependent_fields()
It is no longer called only once for a given view_info, so the name
"initialize" is no longer appropriate.

This patch splits the "initialize" method into the "make" part, which
makes a new base_info object, and the "set" part, which changes the
current base_info object attached to the view.
2020-08-20 14:53:07 +02:00
Tomasz Grabiec
f8df214836 db: view: Fix incorrect schema access during view building after base table schema changes
The view building process was accessing mutation fragments using
current table's schema. This is not correct, fragments must be
accessed using the schema of the generating reader.

This could lead to undefined behavior when the column set of the base
table changes. out_of_range exceptions could be observed, or data in
the view ending up in the wrong column.

Refs #7061.

The fix has two parts. First, we always use the reader's schema to
access fragments generated by the reader.

Second, when calling populate_views() we upgrade the fragment-wrapping
reader's schema to the base table schema so that it matches the base
table schema of view_and_base snapshots passed to populate_views().
2020-08-20 14:53:07 +02:00
Tomasz Grabiec
3a6ec9933c db: views: Fix undefined behavior on base table schema changes
The view_info object, which is attached to the schema object of the
view, contains a data structure called
"base_non_pk_columns_in_view_pk". This data structure contains column
ids of the base table so is valid only for a particular version of the
base table schema. This data structure is used by materialized view
code to interpret mutations of the base table, those coming from base
table writes, or reads of the base table done as part of view updates
or view building.

The base table schema version of that data structure must match the
schema version of the mutation fragments, otherwise we hit undefined
behavior. This may include aborts, exceptions, segfaults, or data
corruption (e.g. writes landing in the wrong column in the view).

Before this patch, we could get schema version mismatch here after the
base table was altered. That's because the view schema does not change
when the base table is altered.

Part of the fix is to extract base_non_pk_columns_in_view_pk into a
third entity called base_dependent_view_info, which changes both on
base table schema changes and view schema changes.

It is managed by a shared pointer so that we can take immutable
snapshots of it, just like with schema_ptr. When starting the view
update, the base table schema_ptr and the corresponding
base_dependent_view_info have to match. So we must obtain them
atomically, and base_dependent_view_info cannot change during update.

Also, whenever the base table schema changes, we must update
base_dependent_view_infos of all attached views (atomically) so that
it matches the base table schema.

Refs #7061.
2020-08-20 14:53:07 +02:00
Tomasz Grabiec
dc18117b82 db: views: Introduce has_base_non_pk_columns_in_view_pk()
In preparation for pushing _base_non_pk_columns_in_view_pk deeper.
2020-08-20 14:53:07 +02:00
Rafael Ávila de Espíndola
6363716799 schema: Pass an rvalue to set_compaction_strategy_options
This produces less code and makes sure every caller moves the value.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-08-19 14:02:35 -07:00
Pavel Emelyanov
78298ec776 init: Use local messaging reference in main
There are a few places that initialize db and system_ks and need the
messaging service. Pass the reference to it from main instead of
using the global helpers.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-08-19 13:08:12 +03:00
Botond Dénes
22a6493716 view_update_generator: fix race between registering and processing sstables
fea83f6 introduced a race between processing (and hence removing)
sstables from `_sstables_with_tables` and registering new ones. It
manifested as follows: sstables that were registered concurrently with the
processing of a batch for the same sstables were dropped, and the semaphore
units associated with them were never returned. This resulted in repairs being
blocked indefinitely as the units of the semaphore were effectively
leaked.

This patch fixes this by moving the contents of `_sstables_with_tables`
to a local variable before starting the processing. A unit test
reproducing the problem is also added.
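The move-out-then-process pattern can be sketched with `std::exchange` (illustrative types; the actual code deals with sstables and semaphore units):

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Sketch of the race fix: take the pending set out with std::exchange
// before processing, so registrations that arrive concurrently land in the
// (now empty) member and are picked up by the next batch, never dropped
// mid-processing.
struct generator {
    std::map<std::string, int> sstables_with_tables;  // pending work
    std::vector<std::string> processed;

    void process_batch() {
        auto batch = std::exchange(sstables_with_tables, {});
        for (auto& [name, table] : batch) {
            (void)table;
            processed.push_back(name);  // new arrivals never mutate `batch`
        }
    }
};
```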

Fixes: #6892

Tests: unit(dev)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200817160913.2296444-1-bdenes@scylladb.com>
2020-08-18 10:22:35 +03:00
Pavel Solodovnikov
9aa4712270 lwt: introduce paxos_grace_seconds per-table option to set paxos ttl
Previously system.paxos TTL was set as max(3h, gc_grace_seconds).

Introduce new per-table option named `paxos_grace_seconds` to set
the number of seconds used to TTL data in paxos tables
when using LWT queries against the base table.

Default value is equal to `DEFAULT_GC_GRACE_SECONDS`,
which is 10 days.

This change makes it easy to test various issues related to paxos TTL.

Fixes #6284

Tests: unit (dev, debug)

Co-authored-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20200816223935.919081-1-pa.solodovnikov@scylladb.com>
2020-08-17 16:44:14 +02:00
Piotr Jastrzebski
01ea159fde codebase wide: use try_emplace when appropriate
C++17 introduced try_emplace for maps to replace the pattern:

    if (map.find(key) == map.end()) {
        map.emplace(key, value);
    }

try_emplace is more efficient and results in more concise code.

This commit introduces usage of try_emplace when it's appropriate.
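A self-contained sketch of the difference (hypothetical `demo` function):

```cpp
#include <map>
#include <string>

// try_emplace inserts only if the key is absent, and never constructs the
// mapped value when the key already exists, replacing the check-then-emplace
// pattern with a single call.
int demo() {
    std::map<std::string, int> m;
    m.try_emplace("x", 1);                         // inserted
    auto [it, inserted] = m.try_emplace("x", 2);   // key exists: no-op
    return inserted ? -1 : it->second;             // original value survives
}
```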

Tests: unit(dev)

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <4970091ed770e233884633bf6d46111369e7d2dd.1597327358.git.piotr@scylladb.com>
2020-08-16 14:41:09 +03:00
Piotr Jastrzebski
c001374636 codebase wide: replace count with contains
C++20 introduced `contains` member functions for maps and sets for
checking whether an element is present in the collection. Previously
`count` function was often used in various ways.

`contains` not only expresses the intent of the code better but also
does it in a more unified way.

This commit replaces all the occurrences of `count` with
`contains`.

Tests: unit(dev)

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <b4ef3b4bc24f49abe04a2aba0ddd946009c9fcb2.1597314640.git.piotr@scylladb.com>
2020-08-15 20:26:02 +03:00
Nadav Har'El
8135647906 merge: Add metrics to semaphores
Merged pull request https://github.com/scylladb/scylla/pull/7018
by Piotr Sarna:

This series addresses various issues with metrics and semaphores - it mainly adds missing metrics, which make it possible to see the length of the queues attached to the semaphores. In the case of view building and view update generation, metrics were not present in these services at all, so a first, basic implementation is added.

More precise semaphore metrics would ease the testing and development of load shedding and admission control.

	view_builder: add metrics
	db, view: add view update generator metrics
	hints: track resource_manager sending queue length
	hints: add drain queue length to metrics
	table: add metrics for sstable deletion semaphore
	database: remove unused semaphore
2020-08-12 12:39:59 +03:00
Piotr Sarna
5086a5ca32 view_builder: add metrics
The view builder service lacked metrics, so a basic set of them
is added.
2020-08-11 17:43:53 +02:00
Piotr Sarna
e4d78b60ff db, view: add view update generator metrics
The view update generator completely lacked metrics, so a basic set
of them is now exposed.
2020-08-11 17:43:53 +02:00
Piotr Sarna
180a1505fd hints: track resource_manager sending queue length
The number of tasks waiting for a hint to be sent is now tracked.
2020-08-11 17:43:53 +02:00