Commit Graph

197 Commits

Author SHA1 Message Date
Botond Dénes
566e31a5ac db/view: view_updating_consumer: allow passing custom update pusher
So that tests can exercise the `view_updating_consumer` in isolation,
without having to set up the whole database machinery. Besides requiring
less infrastructure setup, this allows more direct checking of the
mutations pushed for view update generation.
2020-07-20 11:23:39 +03:00
Botond Dénes
0166f97096 db/view: view_update_generator: make staging reader evictable
The view update generation process creates two readers. One is used to
read the staging sstables (the data for which view updates need to be
generated), and another is created for each processed mutation, reading
the current value (pre-image) of each row in said mutation. The
staging reader is created first and is kept alive until all staging data
is processed. The pre-image reader is created separately for each
processed mutation. The staging reader is not restricted, meaning it
does not wait for admission on the relevant reader concurrency
semaphore, but it does register its resource usage on it. The pre-image
reader, however, *is* restricted. This creates a situation where the
staging reader may consume all resources from the semaphore, leaving
none for the later-created pre-image reader, which will then be unable
to start reading. This blocks the view update generation process,
meaning the staging reader will never be destroyed, causing a deadlock.

This patch solves this by making the staging reader restricted and
making it evictable. To prevent thrashing -- evicting the staging reader
after reading only a really small partition -- we only make the staging
reader evictable after we have read at least 1MB worth of data from it.
2020-07-20 11:23:39 +03:00
Botond Dénes
84357f0722 db/view: view_updating_consumer: move implementation from table.cc to view.cc
table.cc is a very counter-intuitive place for view-related code,
especially when the declarations reside in `db/view/`.
2020-07-20 11:23:39 +03:00
Pavel Emelyanov
8618a02815 migration_manager: Remove db/schema_tables.hh inclusion from header
The schema_tables.hh -> migration_manager.hh pair effectively acts as a
"single header for everything", creating significant bloat for many
seemingly unrelated .hh files.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-07-17 17:54:43 +03:00
Amnon Heiman
ea8d52b11c row_locking: replace estimated histogram with time_estimated_histogram
This patch changes the row locking latencies to use
time_estimated_histogram.

The change consists of updating the histogram definition and changing how
values are inserted into the histogram.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2020-07-14 11:17:43 +03:00
Avi Kivity
b0698dfb38 Merge 'Rewrite CQL3 restriction representation' from dekimir
"
This is the first stage of replacing the existing restrictions code with a new representation. It adds a new class `expression` to replace the existing class `restriction`. Lots of the old code is deleted, though not all -- that will come in subsequent stages.

Tests: unit (dev, debug restrictions_test), dtest (next-gating)
"

* dekimir-restrictions-rewrite:
  cql3/restrictions: Drop dead code
  cql3/restrictions: Use free functions instead of methods
  cql3/restrictions: Create expression objects
  cql3/restrictions: Add free functions over new classes
  cql3/restrictions: Add new representation
2020-07-08 10:22:17 +03:00
Dejan Mircevski
37ebe521e3 cql3/restrictions: Use free functions instead of methods
Instead of `restriction` class methods, use the new free functions.
Specific replacement actions are listed below.

Note that class `restrictions` (plural) remains intact -- both its
methods and its type hierarchy remain intact for now.

Ensure full test coverage of the replacement code with new file
test/boost/restrictions_test.cc and some extra testcases in
test/cql/*.

Drop some existing tests because they codify buggy behaviour
(reference #6369, #6382).  Drop others because they forbid relation
combinations that are now allowed (eg, mixing equality and
inequality, comparing to NULL, etc.).

Here are some specific categories of what was replaced:

- restriction::is_foo predicates are replaced by using the free
  function find_if; sometimes it is used transitively (see, eg,
  has_slice)

- restriction::is_multi_column is replaced by dynamic casts (recall
  that the `restrictions` class hierarchy still exists)

- utility methods is_satisfied_by, is_supported_by, to_string, and
  uses_function are replaced by eponymous free functions; note that
  restrictions::uses_function still exists

- restriction::apply_to is replaced by free function
  replace_column_def

- when checking infinite_bound_range_deletions, the has_bound is
  replaced by local free function bounded_ck

- restriction::bounds and restriction::value are replaced by the more
  general free function possible_lhs_values

- using free functions allows us to simplify the
  multi_column_restriction and token_restriction hierarchies; their
  methods merge_with and uses_function became identical in all
  subclasses, so they were moved to the base class

- single_column_primary_key_restrictions<clustering_key>::needs_filtering
  was changed to reuse num_prefix_columns_that_need_not_be_filtered,
  which uses free functions

Fixes #5799.
Fixes #6369.
Fixes #6371.
Fixes #6372.
Fixes #6382.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2020-07-07 23:08:09 +02:00
Botond Dénes
5ebe2c28d1 db/view: view_update_generator: re-balance wait/signal on the register semaphore
The view update generator has a semaphore to limit concurrency. This
semaphore is waited on in `register_staging_sstable()` and later the
unit is returned after the sstable is processed in the loop inside
`start()`.
This was broken by 4e64002, which changed the loop inside `start()` to
process sstables in per-table batches, but didn't change the `signal()`
call to return a number of units matching the number of sstables
processed. This can cause the semaphore units to dry up, as the loop can
process multiple sstables per table yet return just a single unit. It
can also block callers of `register_staging_sstable()` indefinitely:
under the right circumstances the units on the semaphore can permanently
go below 0, so some waiters will never be released.
In addition to this, 4e64002 introduced another bug: table entries are
never removed from `_sstables_with_tables`, so they are processed on
every turn. If the sstable list is empty, no update is generated, but
due to the unconditional `signal()` described above, the units on the
semaphore can grow without bound, allowing future staging sstable
producers to register a huge number of sstables, causing memory problems
due to the number of sstable readers that have to be opened (#6603,
#6707).
Both outcomes are equally bad. This patch fixes both issues and modifies
the `test_view_update_generator` unit test to reproduce them and hence
to verify that this doesn't happen in the future.

Fixes: #6774
Refs: #6707
Refs: #6603

Tests: unit(dev)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200706135108.116134-1-bdenes@scylladb.com>
2020-07-07 08:53:00 +02:00
Wojciech Mitros
76038b8d8e view: differentiate identical error messages and change them to warnings
Modified the log message in view_builder::calculate_shard_build_step to make it distinct from the one in view_builder::execute, and changed both messages' logging level to warning, since we continue even when we handle an exception.

Fixes #4600
2020-07-06 20:50:34 +03:00
Botond Dénes
62c6859b69 db/view: view_update_generator: use partitioned sstable set
And pass it to `make_range_sstable_reader()` when creating the reader,
thus allowing the incremental selector created therein to exploit the
fact that staging sstables are disjoint (in the case of repair and
streaming at least). This should reduce the memory consumption of the
staging reader considerably when reading from a lot of sstables.
2020-07-06 13:38:23 +03:00
Rafael Ávila de Espíndola
64c8164e6c everywhere: Update to seastar api v4 (when_all_succeed returning a tuple)
We now just need to replace a few calls to `then` with `then_unpack`.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200618172100.111147-1-espindola@scylladb.com>
2020-06-23 19:40:18 +03:00
Avi Kivity
de38091827 priority_manager: merge streaming_read and streaming_write classes into one class
Streaming is handled by just one group for CPU scheduling, so
separating it into read and write classes for I/O is artificial, and
inflates the resources we allow for streaming if both reads and writes
happen at the same time.

Merge both classes into one class ("streaming") and adjust callers. The
merged class has 200 shares, so it reduces streaming bandwidth if both
directions are active at the same time (which is rare; I think it only
happens in view building).
2020-06-22 15:09:04 +03:00
Rafael Ávila de Espíndola
f6e407ecd2 everywhere: Prepare for seastar api v4 (when_all_succeed return value)
The seastar api v4 changes the return type of when_all_succeed. This
patch adds discard_result where that is the best way to handle the
change.

This doesn't do the actual update to v4 since there are still a few
issues left to fix in seastar. A patch doing just the update will
follow.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200617233150.918110-1-espindola@scylladb.com>
2020-06-18 15:13:56 +03:00
Piotr Sarna
3458bd2e32 db,view: fix outdated comments
Some comments still referred to variable names which are no longer
up-to-date.

Follow-up for #6560.
Message-Id: <2b857ccc900dd64f0d9379f5d6c87fd3aaa5d902.1591594042.git.sarna@scylladb.com>
2020-06-08 09:02:10 +03:00
Nadav Har'El
d6626c217a merge: add error injection to mv
Merged pull request https://github.com/scylladb/scylla/pull/6516 from
Piotr Sarna:

This series adds error injection points to materialized view paths:

	view update generation from staging sstables;
	view building;
	generating view updates from user writes.

This series comes with a corresponding dtest pull request which adds some
test cases based on error injection.

Fixes #6488
2020-06-07 19:23:23 +03:00
Piotr Sarna
b3a6a33487 db,view: ensure that local updates are applied locally
In the current mutate_MV() code it's possible for a local endpoint
to become a target for a network operation. That's the source of
occasional benign `broken promise` error messages: the mutation is
actually applied locally, so there's no point in creating a write
response handler, as the node will not send a response to itself over
the network.
While at it, the code is deduplicated a little bit - with the paths
simplified, it's easier to ensure that a local endpoint is never
listed as a target for remote network operations.

Fixes #5459
Tests: unit(dev),
       dtest(materialized_views_test.TestMaterializedViews.add_dc_during_mv_insert_test)
2020-06-07 19:10:03 +03:00
Piotr Sarna
76e89efc1a db,view: add error injection points to view building
... in order to be able to test scenarios with failures.
2020-06-05 09:39:58 +02:00
Piotr Sarna
9d524a7a7e db,view: add error injection points to view update generator
... in order to be able to test scenarios with failures.
2020-06-05 09:39:58 +02:00
Avi Kivity
0c6bbc84cd Merge "Classify queries based on their initiator, rather than their target" from Botond
"
Currently we classify queries as "system" or "user" based on the table
they target. The class of a query determines how the query is treated,
currently: timeout, limits for reverse queries and the concurrency
semaphore. The catch is that users are also allowed to query system
tables and when doing so they will bypass the limits intended for user
queries. This has caused performance problems in the past, yet the
reason we decided to finally address this is that we want to introduce a
memory limit for unpaged queries. Internal (system) queries are all
unpaged and we don't want to impose the same limit on them.

This series uses scheduling groups to distinguish user and system
workloads, based on the assumption that user workloads will run in the
statement scheduling group, while system workloads will run in the main
(or default) scheduling group, or perhaps something else, but in any
case not in the statement one. Currently the scheduling group of reads
and writes is lost when going through the messaging service, so to be
able to use scheduling groups to distinguish user and system reads this
series refactors the messaging service to retain this distinction across
verb calls. Furthermore, we execute some system reads/writes as part of
user reads/writes, such as auth and schema sync. These processes are
tagged to run in the main group.
This series also centralises query classification on the replica and
moves it to a higher level. More specifically, queries are now
classified -- the scheduling group they run in is translated to the
appropriate query class specific configuration -- on the database level
and the configuration is propagated down to the lower layers.
Currently this query class specific configuration consists of the reader
concurrency semaphore and the max memory limit for otherwise unlimited
queries. A corollary of the semaphore being selected on the database
level is that the read permit is now created before the read starts. A
valid permit is now available during all stages of the read, enabling
tracking the memory consumption of e.g. the memtable and cache readers.
This change aligns nicely with the needs of more accurate reader memory
tracking, which also wants a valid permit that is available in every layer.

The series can be divided roughly into the following distinct patch
groups:
* 01-02: Give system read concurrency a boost during startup.
* 03-06: Introduce user/system statement isolation to messaging service.
* 07-13: Various infrastructure changes to prepare for using read
  permits in all stages of reads.
* 14-19: Propagate the semaphore and the permit from database to the
  various table methods that currently create the permit.
* 20-23: Migrate away from using the reader concurrency semaphore for
  waiting for admission, use the permit instead.
* 24: Introduce `database::make_query_config()` and switch the database
  methods needing such a config to use it.
* 25-31: Get rid of all uses of `no_reader_permit()`.
* 32-33: Ban empty permits for good.
* 34: querier_cache: use the queriers' permits to obtain the semaphore.

Fixes: #5919

Tests: unit(dev, release, debug),
dtest(bootstrap_test.py:TestBootstrap.start_stop_test_node), manual
testing with a 2 node mixed cluster with extra logging.
"
* 'query-class/v6' of https://github.com/denesb/scylla: (34 commits)
  querier_cache: get semaphore from querier
  reader_permit: forbid empty permits
  reader_permit: fix reader_resources::operator bool
  treewide: remove all uses of no_reader_permit()
  database: make_multishard_streaming_reader: pass valid permit to multi range reader
  sstables: pass valid permits to all internal reads
  compaction: pass a valid permit to sstable reads
  database: add compaction read concurrency semaphore
  view: use valid permits for reads from the base table
  database: use valid permit for counter read-before-write
  database: introduce make_query_class_config()
  reader_concurrency_semaphore: remove wait_admission and consume_resources()
  test: move away from reader_concurrency_semaphore::wait_admission()
  reader_permit: resource_units: introduce add()
  mutation_reader: restricted_reader: work in terms of reader_permit
  row_cache: pass a valid permit to underlying read
  memtable: pass a valid permit to the delegate reader
  table: require a valid permit to be passed to most read methods
  multishard_mutation_query: pass a valid permit to shard mutation sources
  querier: add reader_permit parameter and forward it to the mutation_source
  ...
2020-05-29 10:11:44 +03:00
Piotr Sarna
77e943e9a3 db,views: unify time points used for update generation
Until now, view updates were generated with a bunch of random
time points, because the interface was not adjusted for passing
a single time point. The time points were used to determine
whether cells were alive (e.g. because of TTL), so it's better
to unify the process:
1. when generating view updates from user writes, a single time point
   is used for the whole operation
2. when generating view updates via the view building process,
   a single time point is used for each build step

NOTE: I don't see any reliable and deterministic way of writing
      test scenarios which trigger problems with the old code.
      After #6488 is resolved and error injection is integrated
      into view.cc, tests can be added.

Fixes #6429
Tests: unit(dev)
Message-Id: <f864e965eb2e27ffc13d50359ad1e228894f7121.1590070130.git.sarna@scylladb.com>
2020-05-28 12:56:09 +03:00
Botond Dénes
992e697dd5 view: use valid permits for reads from the base table
View update generation involves reading existing values from the base
table, which will soon require a valid permit to be passed to it, so
make sure we create and pass a valid permit to these reads.
We use `database::make_query_class_config()` to obtain the semaphore for
the read which selects the appropriate user/system semaphore based on
the scheduling group the base table write is running in.
2020-05-28 11:34:35 +03:00
Botond Dénes
cc5137ffe3 table: require a valid permit to be passed to most read methods
Now that the most prevalent users (range scan and single partition
reads) all pass valid permits we require all users to do so and
propagate the permit down towards `make_sstable_reader()`. The plan is
to use this permit for restricting the sstable readers, instead of the
semaphore the table is configured with. The various
`make_streaming_*reader()` overloads keep using the internal semaphores
as before, but they also create the permit before the read starts and
pass it to `make_sstable_reader()`.
2020-05-28 11:34:35 +03:00
Piotr Sarna
18a37d0cb1 db,view: add tracing to view update generation path
In order to improve materialized views' debuggability,
tracing points are added to view update generation path.

Sample info of an insert statement which resulted in producing
local view updates which require read-before-write:

 activity                                                                                                                           | timestamp                  | source    | source_elapsed | client
------------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------+-----------
                                                                                                                 Execute CQL3 query | 2020-04-19 12:02:48.420000 | 127.0.0.1 |              0 | 127.0.0.1
                                                                                                      Parsing a statement [shard 0] | 2020-04-19 12:02:48.420674 | 127.0.0.1 |             -- | 127.0.0.1
                                                                                                   Processing a statement [shard 0] | 2020-04-19 12:02:48.420753 | 127.0.0.1 |             79 | 127.0.0.1
                                  Creating write handler for token: -6715243485458697746 natural: {127.0.0.1} pending: {} [shard 0] | 2020-04-19 12:02:48.420815 | 127.0.0.1 |            141 | 127.0.0.1
                                                                   Creating write handler with live: {127.0.0.1} dead: {} [shard 0] | 2020-04-19 12:02:48.420824 | 127.0.0.1 |            149 | 127.0.0.1
                                                                                             Executing a mutation locally [shard 0] | 2020-04-19 12:02:48.420830 | 127.0.0.1 |            155 | 127.0.0.1
                                          View updates for ks.t1 require read-before-write - base table reader is created [shard 0] | 2020-04-19 12:02:48.420862 | 127.0.0.1 |            188 | 127.0.0.1
                                                                                        Generated 2 view update mutations [shard 0] | 2020-04-19 12:02:48.420910 | 127.0.0.1 |            235 | 127.0.0.1
 Locally applying view update for ks.t1_v_idx_index; base token = -6715243485458697746; view token = -4156302194539278891 [shard 0] | 2020-04-19 12:02:48.420918 | 127.0.0.1 |            243 | 127.0.0.1
                                              Successfully applied local view update for 127.0.0.1 and 0 remote endpoints [shard 0] | 2020-04-19 12:02:48.420971 | 127.0.0.1 |            297 | 127.0.0.1
                                                                     View updates for ks.t1 were generated and propagated [shard 0] | 2020-04-19 12:02:48.420973 | 127.0.0.1 |            299 | 127.0.0.1
                                                                                           Got a response from /127.0.0.1 [shard 0] | 2020-04-19 12:02:48.420988 | 127.0.0.1 |            314 | 127.0.0.1
                                                             Delay decision due to throttling: do not delay, resuming now [shard 0] | 2020-04-19 12:02:48.420990 | 127.0.0.1 |            315 | 127.0.0.1
                                                                                          Mutation successfully completed [shard 0] | 2020-04-19 12:02:48.420994 | 127.0.0.1 |            320 | 127.0.0.1
                                                                                     Done processing - preparing a result [shard 0] | 2020-04-19 12:02:48.421000 | 127.0.0.1 |            326 | 127.0.0.1
                                                                                                                   Request complete | 2020-04-19 12:02:48.420330 | 127.0.0.1 |            330 | 127.0.0.1

Sample info for remote updates:

 activity                                                                                                                                                           | timestamp                  | source    | source_elapsed | client
--------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------+-----------
                                                                                                                                                 Execute CQL3 query | 2020-04-26 16:19:47.691000 | 127.0.0.1 |              0 | 127.0.0.1
                                                                                                                                      Parsing a statement [shard 1] | 2020-04-26 16:19:47.691590 | 127.0.0.1 |              6 | 127.0.0.1
                                                                                                                                   Processing a statement [shard 1] | 2020-04-26 16:19:47.692368 | 127.0.0.1 |            783 | 127.0.0.1
                                                       Creating write handler for token: -3248873570005575792 natural: {127.0.0.3, 127.0.0.2} pending: {} [shard 1] | 2020-04-26 16:19:47.694186 | 127.0.0.1 |           2598 | 127.0.0.1
                                                                                        Creating write handler with live: {127.0.0.2, 127.0.0.3} dead: {} [shard 1] | 2020-04-26 16:19:47.694283 | 127.0.0.1 |           2699 | 127.0.0.1
                                                                                                                         Sending a mutation to /127.0.0.2 [shard 1] | 2020-04-26 16:19:47.694591 | 127.0.0.1 |           3006 | 127.0.0.1
                                                                                                                         Sending a mutation to /127.0.0.3 [shard 1] | 2020-04-26 16:19:47.694862 | 127.0.0.1 |           3277 | 127.0.0.1
                                                                                                                         Message received from /127.0.0.1 [shard 1] | 2020-04-26 16:19:47.696358 | 127.0.0.3 |             40 | 127.0.0.1
                                                                                                                         Message received from /127.0.0.1 [shard 1] | 2020-04-26 16:19:47.696442 | 127.0.0.2 |             32 | 127.0.0.1
                                                                           View updates for ks.t require read-before-write - base table reader is created [shard 1] | 2020-04-26 16:19:47.697762 | 127.0.0.3 |           1444 | 127.0.0.1
                                                                           View updates for ks.t require read-before-write - base table reader is created [shard 1] | 2020-04-26 16:19:47.698120 | 127.0.0.2 |           1710 | 127.0.0.1
                                                                                                                        Generated 1 view update mutations [shard 1] | 2020-04-26 16:19:47.699107 | 127.0.0.3 |           2789 | 127.0.0.1
 Sending view update for ks.t_v2_idx_index to 127.0.0.4, with pending endpoints = {}; base token = -3248873570005575792; view token = 1634052884888577606 [shard 1] | 2020-04-26 16:19:47.699345 | 127.0.0.3 |           3027 | 127.0.0.1
                                                                                                                         Sending a mutation to /127.0.0.4 [shard 1] | 2020-04-26 16:19:47.699614 | 127.0.0.3 |           3296 | 127.0.0.1
                                                                                                                        Generated 1 view update mutations [shard 1] | 2020-04-26 16:19:47.699824 | 127.0.0.2 |           3414 | 127.0.0.1
                                  Locally applying view update for ks.t_v2_idx_index; base token = -3248873570005575792; view token = 1634052884888577606 [shard 1] | 2020-04-26 16:19:47.700012 | 127.0.0.2 |           3603 | 127.0.0.1
                                                                                                      View updates for ks.t were generated and propagated [shard 1] | 2020-04-26 16:19:47.700059 | 127.0.0.3 |           3741 | 127.0.0.1
                                                                                                                         Message received from /127.0.0.3 [shard 1] | 2020-04-26 16:19:47.700958 | 127.0.0.4 |             37 | 127.0.0.1
                                                                              Successfully applied local view update for 127.0.0.2 and 0 remote endpoints [shard 1] | 2020-04-26 16:19:47.701522 | 127.0.0.2 |           5112 | 127.0.0.1
                                                                                                      View updates for ks.t were generated and propagated [shard 1] | 2020-04-26 16:19:47.701615 | 127.0.0.2 |           5206 | 127.0.0.1
                                                                                                                      Sending mutation_done to /127.0.0.1 [shard 1] | 2020-04-26 16:19:47.701913 | 127.0.0.3 |           5595 | 127.0.0.1
                                                                                                                                Mutation handling is done [shard 1] | 2020-04-26 16:19:47.702489 | 127.0.0.3 |           6171 | 127.0.0.1
                                                                                                                           Got a response from /127.0.0.3 [shard 1] | 2020-04-26 16:19:47.702667 | 127.0.0.1 |          11082 | 127.0.0.1
                                                                                             Delay decision due to throttling: do not delay, resuming now [shard 1] | 2020-04-26 16:19:47.702689 | 127.0.0.1 |          11105 | 127.0.0.1
                                                                                                                          Mutation successfully completed [shard 1] | 2020-04-26 16:19:47.702784 | 127.0.0.1 |          11200 | 127.0.0.1
                                                                                                                      Sending mutation_done to /127.0.0.1 [shard 1] | 2020-04-26 16:19:47.703016 | 127.0.0.2 |           6606 | 127.0.0.1
                                                                                                                     Done processing - preparing a result [shard 1] | 2020-04-26 16:19:47.703054 | 127.0.0.1 |          11470 | 127.0.0.1
                                                                                                                      Sending mutation_done to /127.0.0.3 [shard 1] | 2020-04-26 16:19:47.703720 | 127.0.0.4 |           2800 | 127.0.0.1
                                                                                                                                Mutation handling is done [shard 1] | 2020-04-26 16:19:47.704527 | 127.0.0.4 |           3607 | 127.0.0.1
                                                                                                                           Got a response from /127.0.0.4 [shard 1] | 2020-04-26 16:19:47.704580 | 127.0.0.3 |           8262 | 127.0.0.1
                                                                                             Delay decision due to throttling: do not delay, resuming now [shard 1] | 2020-04-26 16:19:47.704606 | 127.0.0.3 |           8288 | 127.0.0.1
                                                                                    Successfully applied view update for 127.0.0.4 and 1 remote endpoints [shard 1] | 2020-04-26 16:19:47.704853 | 127.0.0.3 |           8535 | 127.0.0.1
                                                                                                                                Mutation handling is done [shard 1] | 2020-04-26 16:19:47.706092 | 127.0.0.2 |           9682 | 127.0.0.1
                                                                                                                           Got a response from /127.0.0.2 [shard 1] | 2020-04-26 16:19:47.709933 | 127.0.0.1 |          18348 | 127.0.0.1
                                                                                                                                                   Request complete | 2020-04-26 16:19:47.702582 | 127.0.0.1 |          11582 | 127.0.0.1

Tests: unit(dev, debug)
2020-05-18 16:05:23 +02:00
Piotr Sarna
92aadb94e5 treewide: propagate trace state to write path
In order to add tracing to places where it can be useful,
e.g. materialized view updates and hinted handoff, tracing state
is propagated to all applicable call sites.
2020-05-18 16:05:23 +02:00
Piotr Sarna
f48e414eab db, view: remove duplicate entries from pending endpoints
When generating view updates, an endpoint can appear both
as a primary paired endpoint for the view update, and as a pending
endpoint (due to range movements). In order not to generate
the same update twice for the same endpoint, the paired endpoint
is removed from the list of pending endpoints if present.

Fixes #5459
Tests: unit(dev),
       dtest(TestMaterializedViews.add_dc_during_mv_insert_test)
2020-05-06 16:42:56 +03:00
Glauber Costa
1f9c37fb5e view_updating_consumer: move reference to a pointer
It is currently not possible to wrap the view_updating_consumer in an
std::optional. I intend to do so, to allow compactions to optionally
generate view updates.

The reason is that view_updating_consumer has a reference member, which
prevents the move assignment operator from being implicitly generated.

This patch fixes that by keeping a pointer instead of a reference.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20200421123648.8328-1-glauber@scylladb.com>
2020-04-22 10:05:35 +03:00
Glauber Costa
4e6400293e staging: potentially read many SSTables at the same time
There is no reason to read a single SSTable at a time from the staging
directory. Moving SSTables from staging directory essentially involves
scanning input SSTables and creating new SSTables (albeit in a different
directory).

We have a mechanism that does that: compactions. In a follow up patch, I
will introduce a new specialization of compaction that moves SSTables
from staging (potentially compacting them if there are plenty).

In preparation for that, some signatures have to be changed and the
view_updating_consumer has to be made more compaction friendly, meaning:
- operating on an sstable vector;
- taking a table reference, not a database.

Because this code is a bit fragile and the reviewer set is fundamentally
different from anything compaction related, I am sending this separately.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
2020-04-15 11:26:44 -04:00
Piotr Sarna
1a9083b342 db,view: guard view builder startup with a semaphore
The startup routine performs some bookkeeping operations on views,
and so do these events:
 - on_create_view;
 - on_drop_view;
 - on_update_view.
Since the above events are guarded with a semaphore, the startup
routine should also take the same semaphore - in order to ensure
that all bookkeeping operations are serialized.

Refs #6094
2020-04-05 11:41:26 +02:00
Piotr Sarna
8da4a5b78c db,view: nitpick: change & operator to && for booleans
Although it's technically correct to use the bitwise and operator
on booleans as well, it's slightly confusing for the reader.
2020-04-05 11:41:25 +02:00
Piotr Sarna
e49805b7b8 db,view: remove unneeded implicit capture-by-reference
The lambda does not use any other captures, so it does not need to
implicitly capture anything by reference.
2020-04-05 11:41:25 +02:00
Piotr Sarna
3f19865493 db,view: fix waiting for a view building future
The future was marked with a `FIXME: discarded future`, but there's really
no reason not to wait for it, and it was probably meant to be waited for
since its implementation.
2020-04-05 11:41:25 +02:00
Botond Dénes
240b5e0594 frozen_schema: key() remove unused schema parameter
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200402092249.680210-1-bdenes@scylladb.com>
2020-04-02 14:43:35 +02:00
Rafael Ávila de Espíndola
c5795e8199 everywhere: Replace engine().cpu_id() with this_shard_id()
This is a bit simpler and might allow removing a few includes of
reactor.hh.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200326194656.74041-1-espindola@scylladb.com>
2020-03-27 11:40:03 +03:00
Pavel Solodovnikov
adc6a98b59 cql3: return raw::parsed_statement as unique_ptr
Change CQL parsing routine to return std::unique_ptr
instead of seastar::shared_ptr.

This can help reduce redundant shared_ptr copies even further.

Make some supplementary changes necessary for this transition:
 * Remove enabled_shared_from_this base class from the following
   classes: truncate_statement, authorization_statement,
   authentication_statement: these were previously constructing
   prepared_statement instance in `prepare` method using
   `shared_from_this`.
   Make the `prepare` method implementations of inheriting classes
   mirror the implementation in other statements (i.e.
   create a shallow copy of the object when preparing into
   `prepared_statement`; this could be further refactored
   to avoid copies as much as possible).
 * Remove unused fields in create_role_statement which led to
   errors when using the compiler-generated copy ctor (copying
   uninitialized bool values via the ctor).

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
2020-03-23 23:19:21 +03:00
Botond Dénes
e0284bb9ee treewide: add missing headers and/or forward declarations 2020-03-23 09:29:45 +02:00
Nadav Har'El
7922b9eb8f materialized views: reduce recompilation when db/view/view.hh changes.
Before this patch, when db/view/view.hh was modified, 89 source files had to
be recompiled. After this patch, this number is down to 5.

Most of the irrelevant source files got view.hh by including database.hh,
which included view.hh just for the definition of statistics. So in this
patch we split the view statistics to a separate header file, view_stats.hh,
and database.hh only includes that. A few source files which included
only database.hh and also needed view.hh (for materialized-view related
functions) now need to include view.hh explicitly.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200319121031.540-1-nyh@scylladb.com>
2020-03-19 15:46:14 +02:00
Piotr Sarna
0c11e07faf view,table: fix waiting for view updates during building
View updates sent as part of the view building process should never
be ignored, but fd49fd7 introduced a bug which may cause exactly that:
the updates are mistakenly sent to the background, so the view builder
will not receive negative feedback if an update failed, which will
in turn not cause a retry. Consequently, view building may report
that it "finished" building a view, while some of the updates were
lost. A simple fix is to restore previous behaviour - all updates
triggered by view building are now waited for.

Fixes #6038
Tests: unit(dev),
dtest: interrupt_build_process_with_resharding_low_to_half_test
2020-03-19 10:50:54 +02:00
Nadav Har'El
635e6d887c materialized views: fix corner case of view updates used by Alternator
While CQL does not allow creation of a materialized view with more than one
base regular column in the view's key, in Alternator we do allow this - both
partition and clustering key may be a base regular column. We had a bug in
the logic handling this case:

If the new base row is missing a value for *one* of the view key columns,
we shouldn't create a view row. Similarly, if the existing base row was
missing a value for *one* of the view key columns, a view row does not
exist and doesn't need to be deleted.  This was done incorrectly, and made
decisions based on just one of the key columns, and the logic is now
fixed (and I think, simplified) in this patch.

With this patch, the Alternator test which previously failed because of
this problem now passes. The patch also includes new tests in the existing
C++ unit test test_view_with_two_regular_base_columns_in_key. This test
was already supposed to be testing various cases of two-new-key-columns
updates, but missed the cases explained above. These new tests failed
badly before this patch - some of them had clean write errors, others
caused crashes. With this patch, they pass.

Fixes #6008.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200312162503.8944-1-nyh@scylladb.com>
2020-03-15 07:57:33 +01:00
Piotr Sarna
2061e6a9cc db,view: perform local view updates synchronously
Local view updates (updates applied to a local node,
without remote communication) are from now on performed
synchronously - which adds consistency guarantees, as a local
write failure will be returned to the client instead of being
silently ignored.
2020-03-11 09:05:56 +01:00
Piotr Sarna
fd49fd773c db,view: move putting view updates to background to mutate_MV
Currently, launching view updates as an asynchronous background job
is done via not waiting for mutate_MV() future in
table::generate_and_propagate_view_updates. That has a big downside,
since mutate_MV() handles *all* view updates for *all* views of a table,
so it's not possible to wait for each view independently.
Per-view granularity is required in order to implement synchronous
view updates of local views - because then we'll synchronously
wait for all views that write to a local node (due to having a matching
partition key with the base), while remote view updates will still
be sent asynchronously.
In order to do that, we now wait for mutate_MV properly, and launch
the asynchronous, unwaited-for futures inside mutate_MV itself.
Effectively that means no changes for view updates so far - all updates
will be fired in the background. Later, another patch will introduce
a way to wait for selected updates to finish.
2020-03-11 09:05:56 +01:00
Piotr Sarna
3b3659e8cd db,view: drop default parameter for mutate_MV::allow_hints
Default parameters are considered harmful, and as part of a cleanup
before editing view.cc code, the default value for the allow_hints
parameter is removed.
2020-03-11 09:05:56 +01:00
Pavel Emelyanov
4fa12f2fb8 header: De-bloat schema.hh
The header sits in many other headers, but there's a handy
schema_fwd.hh that's tiny and contains needed declarations
for other headers. So replace schema.hh with schema_fwd.hh
in most of the headers (and remove completely from some).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200303102050.18462-1-xemul@scylladb.com>
2020-03-03 11:34:00 +01:00
Piotr Jastrzebski
76d154dbac view: stop calling global_partitioner()
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-02-17 10:59:15 +01:00
Piotr Sarna
e93c54e837 db,view: fix generating view updates for partition tombstones
The update generation path must track and apply all tombstones,
both from the existing base row (if read-before-write was needed)
and for the new row. One such path contained an error, because
it assumed that if the existing row is empty, then the update
can be simply generated from the new row. However, lack of the
existing row can also be the result of a partition/range tombstone.
If that's the case, it needs to be applied, because it's entirely
possible that this partition row also hides the new row.
Without taking the partition tombstone into account, creating
a future tombstone and inserting an out-of-order write before it
in the base table can result in ghost rows in the view table.
This patch comes with a test which was proven to fail before the
changes.

Branches 3.1,3.2,3.3
Fixes #5793

Tests: unit(dev)
Message-Id: <8d3b2abad31572668693ab585f37f4af5bb7577a.1581525398.git.sarna@scylladb.com>
2020-02-12 23:16:30 +02:00
Avi Kivity
dcab666d52 cql3: query_processor: reduce #includes
query_processor is a central class, so reducing its includes
can reduce dependencies treewide. This patch removes includes
for parsed_statement, cf_statement, and untyped_result_set and
fixes up the rest of the tree to include what it lacks as a result
of these removals.
2020-02-09 12:24:24 +02:00
Pavel Emelyanov
e2ec5eecf6 view_update: Do not need storage_proxy
The view_update_generator accepts (and keeps) database and storage_proxy;
the latter is only needed to initialize the view_updating_consumer which,
in turn, only needs it to get the database (to find the column family).

This can be relaxed by providing the database from _generator to _consumer
directly, without using the storage_proxy in between.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200207112427.18419-1-xemul@scylladb.com>
2020-02-07 13:30:01 +02:00
Eliran Sinvani
8cfc2aad57 internalize storage proxy statistics metric registration
The storage proxy statistics structure did not contain
a method for registering the statistics for metric
groups, instead, each user had to register some
of the metrics by itself. There is no real reason
for separating the metrics registration from
the statistics data. There is even less justification
for doing this only for part of the stats as is
the case for those statistics.
This commit internalizes the metrics registration
in the storage_proxy stats structures.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2020-01-30 15:01:40 +01:00
Avi Kivity
17eaf552f0 Merge "Improve the accuracy of reader memory tracking" from Botond
"
Grab the lowest hanging fruits.

This patch-set makes three important changes:
* Consume the memory for I/O operations on tracked files, *before* they
  are forwarded to the underlying file.
* Track memory consumed by buffers created for parsing in
  `continuous_data_consumer`. As this is the basis for the data, index
  and promoted index parsers, all three are covered now in this regard.
* Track the index file.

The remaining, not-so-low hanging fruits in order of
gain/cost(performance) ratio:
* Track in-memory index lists.
* Track in-memory promoted index blocks.
* Track reader buffer memory.

Note that this ordering might change based on the workload and other
environmental factors.

Also included in this series is an infrastructure refactoring to make
tracking memory easier and involve including lighter headers, as well as
a manual test designed to allow testing and experimenting with the
effects of changes to the accuracy of the tracking of reader memory
consumption.

Refs: #4176
Refs: #2778

Tests: unit(dev), manual(sstable_scan_footprint_test)

The latter was run as:
build/dev/test/manual/sstable_scan_footprint_test -c1 -m2G --reads=4000
--read-concurrency=1 --logger-log-level test=trace --collect-stats
--stats-period-ms=20

This will trickle reads until the semaphore blocks, then wait until the
wait queue drains before sending new reads. This way we are not testing
the effectiveness of the pre-admission estimation (which is terribly
optimistic) and instead check that with slowly ramping up read load the
semaphore will block on memory preventing OOM.
This now runs to completion without a single `std::bad_alloc`. The read
concurrency semaphore allows between 15-30 reads, and is always blocked
on memory.
"

* 'more-accurate-reader-resource-tracking/v1' of ssh://github.com/denesb/scylla:
  test/manual/sstable_scan_footprint_test: improve memory consumption diagnostics
  tests/manual/sstable_scan_footprint_test: use the semaphore to determine read rate
  tests/manual: Add test measuring memory demand of concurrent sstable reads
  index_reader: make the index file tracked
  sstables/continuous_data_consumer: track buffers used for parsing
  reader_concurrency_semaphore: tracking_file_impl: consume memory speculatively
  reader_concurrency_semaphore: bye reader_resource_tracker
  treewide: replace reader_resource_tracer with reader_permit
  reader_permit: expose make_tracked_temporary_buffer()
  reader_permit: introduce make_tracked_file()
  reader_permit: introduce memory_units
  reader_concurrency_semaphore: mv reader_resources and reader_permit to reader_permit.hh
  reader_concurrency_semaphore: reader_permit: make it a value type
  reader_concurrency_semaphore: s/resources/reader_resources/
  reader_concurrency_semaphore::reader_permit: move methods out-of-line
2020-01-29 00:11:17 +02:00
Botond Dénes
dfc8b2fc45 treewide: replace reader_resource_tracer with reader_permit
The former was never really more than a reader_permit with one
additional method. Currently using it doesn't even save one from any
includes. Now that readers will be using reader_permit we would have to
pass down both to mutation_source. Instead get rid of
reader_resource_tracker and just use reader_permit. Instead of making it
a last and optional parameter that is easy to ignore, make it a
first class parameter, right after schema, to signify that permits are
now a prominent part of the reader API.

This -- mostly mechanical -- patch essentially refactors mutation_source
to ask for the reader_permit instead of reader_resource_tracking and
updates all usage sites.
2020-01-28 08:13:16 +02:00
Dejan Mircevski
90b54c8c42 view_info: Drop partition_ranges()
The method view_info::partition_ranges() is unused.

Also drop the now-dead _partition_ranges data member.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2020-01-26 12:02:32 +02:00