67 Commits

Author SHA1 Message Date
Pavel Emelyanov
837fde84b1 view: Carry data_dictionary arg through standalone helpers
There's a bunch of functions in view.{hh|cc} that don't belong to any
class and perform view-related calculations for view updates. Lots of
them eventually call view_info::select_statement() which will later need
the dictionary.

By now all those methods' callers have data dictionary at hand and can
share it via argument.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-20 11:17:46 +03:00
Pavel Emelyanov
1301a99ba3 view_updates: Carry data_dictionary argument through methods
The goal is to have the dictionary at places that later wrap calls to
view_info::select_statement(). This graph of calls starts at the only
public view_updates::generate_update() method which, in turn, is called
from view_update_builder that already has data dictionary at hand.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-20 11:17:46 +03:00
Pavel Emelyanov
9d3d533561 view_update_builder: Construct with data dictionary
The caller is the table, which has the view-update-generator at hand (it
calls mutate_MV on it). The builder here is a temporary object that is
destroyed once the caller coroutine co_return-s, so keeping the database
obtained from the view-update-generator is safe.

Later the v.u.b. object will propagate its data dictionary down the
callstacks.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-20 11:17:38 +03:00
Pavel Emelyanov
821c8b19a6 view: Carry backing-secondary-index bit via view builder
When view builder constructs it populates itself with view updates.
Later the updates may instantiate the value_getter-s which, in turn,
would need to check if the view is backing secondary index.

Good news is that when view builder constructs it has all the
information at hand needed to evaluate this "backing" bit. It's then
propagated down to value_getter via corresponding view_updates.

The getter's _view field becomes unused after this change and is
(void)-ed to make this patch compile.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-13 16:48:36 +03:00
Pavel Emelyanov
7cabdc54a6 view: Make mutate_MV() method of view_update_generator
Nowadays it's a static helper, but internally it depends on storage
proxy, so it grabs its global instance. Making it a method of view
update generator makes it possible to use the proxy dependency from the
generator.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-29 18:48:14 +03:00
Avi Kivity
7bb717d2f9 treewide: prevent redefining names
gcc dislikes a member name that matches a type name, as it changes
the type name retroactively. Fix by fully-qualifying the type name,
so it is not changed by the newly-introduced member.
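
A minimal sketch of the pattern this message describes. The names `schema` and `table_info` are hypothetical and chosen only for illustration; they are not taken from the patch. The assumption is that gcc may reject the unqualified form `schema schema;` because the member declaration changes what `schema` means inside the class.

```cpp
// A type declared at namespace scope (hypothetical example type).
struct schema {
    int version = 0;
};

struct table_info {
    // Declaring this member as `schema schema;` is what gcc dislikes: the
    // new member would change the meaning of the name `schema` in the class.
    // Qualifying the type name keeps it pointing at the global type.
    ::schema schema;

    // Inside the class, the unqualified name now refers to the member.
    int schema_version() const { return schema.version; }
};
```

Outside the class, `schema` still names the type, so existing users of the type are unaffected.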
2023-03-21 13:42:49 +02:00
Avi Kivity
69a385fd9d Introduce schema/ module
Schema related files are moved there. This excludes schema files that
also interact with mutations, because the mutation module depends on
the schema. Those files will have to go into a separate module.

Closes #12858
2023-02-15 11:01:50 +02:00
Avi Kivity
c5e4bf51bd Introduce mutation/ module
Move mutation-related files to a new mutation/ directory. The names
are kept in the global namespace to reduce churn; the names are
unambiguous in any case.

mutation_reader remains in the readers/ module.

mutation_partition_v2.cc was missing from CMakeLists.txt; it's added in this
patch.

This is a step forward towards librarization or modularization of the
source base.

Closes #12788
2023-02-14 11:19:03 +02:00
Nadav Har'El
92d03be37b materialized view: fix bug in some large modifications to base partitions
Sometimes a single modification to a base partition requires updates to
a large number of view rows. A common example is deletion of a base
partition containing many rows. A large BATCH is also possible.

To avoid large allocations, we split the large amount of work into
batches of 100 (max_rows_for_view_updates) rows each. The existing code
assumed an empty result from one of these batches meant that we are
done. But this assumption was incorrect: There are several cases when
a base-table update may not need a view update to be generated (see
can_skip_view_updates()) so if all 100 rows in a batch were skipped,
the view update stopped prematurely. This patch includes two tests
showing when this bug can happen - one test using a partition deletion
with a USING TIMESTAMP causing the deletion to not affect the first
100 rows, and a second test using a specially-crafted large BATCH.
These use cases are fairly esoteric, but in fact hit a user in the
wild, which led to the discovery of this bug.

The fix is fairly simple: To detect when build_some() is done it is no
longer enough to check if it returned zero view-update rows; Rather,
it explicitly returns whether or not it is done as an std::optional.
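
The difference between "an empty batch" and "done" can be sketched as follows. This is a simplified, synchronous illustration and not the actual Scylla signature: `base_row`, `batch_builder` and `count_all_updates` are hypothetical names, and view updates are modeled as plain integers.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for a base-table row; `skipped` models a row for
// which no view update needs to be generated.
struct base_row {
    int key;
    bool skipped;
};

// Hypothetical batch builder that consumes up to `max_rows` rows per call.
class batch_builder {
    const std::vector<base_row>& _rows;
    std::size_t _pos = 0;
    std::size_t _max_rows;
public:
    batch_builder(const std::vector<base_row>& rows, std::size_t max_rows)
        : _rows(rows), _max_rows(max_rows) {
        assert(max_rows > 0);
    }

    // Returns std::nullopt once the input is exhausted. An engaged but
    // empty vector means "this batch produced nothing, keep going".
    std::optional<std::vector<int>> build_some() {
        if (_pos == _rows.size()) {
            return std::nullopt;
        }
        std::vector<int> updates;
        for (std::size_t n = 0; n < _max_rows && _pos < _rows.size(); ++n, ++_pos) {
            if (!_rows[_pos].skipped) {
                updates.push_back(_rows[_pos].key);
            }
        }
        return updates;
    }
};

// Drains the builder and returns the total number of generated updates.
// The loop stops on std::nullopt, not on an empty batch.
std::size_t count_all_updates(batch_builder& b) {
    std::size_t total = 0;
    while (auto batch = b.build_some()) {
        total += batch->size();
    }
    return total;
}
```

A caller that stopped on the first empty batch would miss every update that follows a fully skipped batch, which is the failure mode described above.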

The patch includes several tests for this bug, which pass on Cassandra,
failed on Scylla before this patch, and pass with this patch.

Fixes #12297.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12305
2022-12-14 14:50:38 +02:00
Piotr Dulikowski
6ab41d76e6 replica/table: adjust the view read-before-write to return static rows when needed
Adjusts the read-before-write query issued in
`table::do_push_view_replica_updates` so that, when needed, requests
static columns and makes sure that the static row is present.
2022-12-06 11:21:16 +01:00
Piotr Dulikowski
18be90b1e6 db/view: process static rows in view_update_builder::on_results
The `view_update_builder::on_results()` function is changed to react to
static rows when comparing read-before-write results with the base table
mutation.
2022-12-06 11:21:16 +01:00
Piotr Dulikowski
2dd95d76f1 db/view: adjust existing view update generation path to use clustering_or_static_row
The view update path is modified to use `clustering_or_static_row`
instead of just `clustering_row`.
2022-12-06 11:21:16 +01:00
Piotr Dulikowski
986ab6034c db/view: add clustering_or_static_row
Adds a `clustering_or_static_row`, which is a common, immutable
representation of either a static or clustering row. It allows view
update generation based on static or clustering rows to be handled in a
uniform way.
2022-12-06 11:21:16 +01:00
Piotr Dulikowski
f7b7724eaf db/view: base_dependent_view_info: split non-pk columns into regular and static
Currently, `base_dependent_view_info::_base_non_pk_columns_in_view_pk`
field keeps a list of non-primary-key columns from the base table which
are a part of the view's primary key. Because the current code does not
allow indexes on static columns yet, the columns kept in the
aforementioned field are always assumed to be regular columns of the
base table and are kept as `column_id`s which do not contain information
about the column kind.

This commit splits the `_base_non_pk_columns_in_view_pk` field into two,
one for regular columns and the other for static columns, so that it is
possible to keep both kinds of columns in `base_dependent_view_info` and
the structure can be used for secondary indexes on static columns.
2022-12-06 11:21:16 +01:00
Nadav Har'El
e1f8cb6521 materialized views: inline used-once and confusing function, replace_entry()
The replace_entry() function is nothing more than a convenience for
calling delete_old_entry() and then create_entry(). But it is only used
once in the code, and we can just open-code the two calls instead of
the one.

The reason I want to change it now is that the shortcut replace_entry()
helped hide a bug (#11801) - replace_entry() works incorrectly if the
old and new row have the same key, because if they do we get a deletion
and creation of the same row with the same timestamp - and the deletion
wins. Having the two calls not hidden by a convenience function makes
this potential problem more apparent.
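
The tie-breaking behaviour can be illustrated with a heavily simplified model. `view_row` here is a hypothetical type with at most one tombstone and one row marker; it only borrows the function names from the message and is not the real implementation.

```cpp
#include <cstdint>
#include <optional>

using timestamp = std::int64_t;

// Hypothetical, heavily simplified view row: at most one tombstone and one
// row marker, each carrying a write timestamp.
struct view_row {
    std::optional<timestamp> tombstone;
    std::optional<timestamp> marker;

    // Records a deletion at `ts`, keeping the newest tombstone.
    void delete_old_entry(timestamp ts) {
        if (!tombstone || *tombstone < ts) {
            tombstone = ts;
        }
    }

    // Records a creation at `ts`, keeping the newest marker.
    void create_entry(timestamp ts) {
        if (!marker || *marker < ts) {
            marker = ts;
        }
    }

    // A marker is live only if it is strictly newer than the tombstone, so
    // on a timestamp tie the deletion wins.
    bool is_live() const {
        if (!marker) {
            return false;
        }
        return !tombstone || *marker > *tombstone;
    }
};
```

With the two calls written out at the call site, it is easier to see that deleting and re-creating the same key with one timestamp leaves the row dead.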

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-10-19 13:25:34 +03:00
Benny Halevy
6fb4b5555d db: view: get_tombstone_gc_state from compaction_manager
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-09-06 23:05:39 +03:00
Benny Halevy
71ede6124a db: view: pass base table to view_update_builder
To be used by generate_update() for getting the
tombstone_gc_state via the table's compaction_manager.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-09-06 23:04:23 +03:00
Nadav Har'El
5d556115a1 cql, view: rename and explain bytes_with_action
The structure "bytes_with_action" was very hard to understand because of
its mysterious and general-sounding name, and no comments.

In this patch I add a large comment explaining its purpose, and rename
it to a more suitable name, view_key_and_action, which suggests that
each such object is about one view key (where to add a view row), and
an additional "action" that we need to take beyond adding the view row.

This is the best I can do to make this code easier to understand without
completely reorganizing it.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-08-14 10:29:52 +03:00
Michał Radwański
32289d681f db/view/view.cc: compute view_updates for views over collections
For collection indexes, the logic of computing values for each column
needed to change, since a single column might produce more than one
value as a result.

The liveness info from individual cells of the collection impacts the
liveness info of resulting rows. Therefore it is needed to rewrite the
control flow - instead of functions getting a row from get_view_row and
later computing row markers and applying it, they compute these values
by themselves.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-08-14 10:29:49 +03:00
Michał Radwański
ebc4ad4713 column_computation.hh, schema.cc: collection_column_computation
This type of column computation will be used for creating updates to
materialized views that are indexes over collections.

This type features an additional function, compute_values_with_action,
which, depending on an (optional) old row and a new row (the update to
the base table), returns multiple bytes_with_action: a vector of pairs
(computed value, some action), where the action signifies whether a row
with a specific key needs to be deleted or created.
2022-08-14 10:29:13 +03:00
Pavel Emelyanov
62d95f09de view: De-futurize make_view_update_builder()
It doesn't sleep, just returns ready future with builder

tests: https://jenkins.scylladb.com/job/releng/job/Scylla-CI/1384
       it's red because e-mail notification is broken (scylla-pkg#2988)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20220718132529.30751-1-xemul@scylladb.com>
2022-07-18 17:15:48 +03:00
Benny Halevy
6454c8d67f db: view_updates: coroutinize move_to
And allow yielding in-between freezing each update mutation.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-04-08 11:29:25 +03:00
Botond Dénes
909be0b9d7 db/view: convert view_update_builder interface to v2
The constructor and the make_ factory method now take v2 readers.
Immediate users are patched, with conversions if needed.
2022-03-17 10:50:50 +02:00
Botond Dénes
0740019e4d db/view: migrate view_update_builder to v2
To avoid noise, the interface is left as v1 and inbound readers are
converted in the constructor.
2022-03-17 10:47:55 +02:00
Mikołaj Sielużycki
1d84a254c0 flat_mutation_reader: Split readers by file and remove unnecessary includes.
The flat_mutation_reader files were conflated and contained multiple
readers, which were not strictly necessary. Splitting optimizes both
iterative compilation times, as touching rarely used readers doesn't
recompile large chunks of codebase. Total compilation times are also
improved, as the size of flat_mutation_reader.hh and
flat_mutation_reader_v2.hh have been reduced and those files are
included by many files in the codebase.

With changes

real	29m14.051s
user	168m39.071s
sys	5m13.443s

Without changes

real	30m36.203s
user	175m43.354s
sys	5m26.376s

Closes #10194
2022-03-14 13:20:25 +02:00
Avi Kivity
fcb8d040e8 treewide: use Software Package Data Exchange (SPDX) license identifiers
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.

Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.

The changes were applied mechanically with a script, except to
licenses/README.md.

Closes #9937
2022-01-18 12:15:18 +01:00
Avi Kivity
bbad8f4677 replica: move ::database, ::keyspace, and ::table to replica namespace
Move replica-oriented classes to the replica namespace. The main
classes moved are ::database, ::keyspace, and ::table, but a few
ancillary classes are also moved. There are certainly classes that
should be moved but aren't (like distributed_loader) but we have
to start somewhere.

References are adjusted treewide. In many cases, it is obvious that
a call site should not access the replica (but the data_dictionary
instead), but that is left for separate work.

scylla-gdb.py is adjusted to look for both the new and old names.
2022-01-07 12:04:38 +02:00
Botond Dénes
f02632aeb0 range_tombstone_accumulator: drop _reversed flag
2021-09-09 15:42:15 +03:00
Piotr Sarna
a1813c9b34 db,view,table: drop unneeded time point parameter
Now that restriction checking is translated to the partition-slice-style
interface, checking the partition/clustering key restrictions for views
can be performed without the time point parameter.
The parameter is dropped from all relevant call sites.
2021-07-13 10:40:08 +02:00
Piotr Sarna
bf0777e97a view: generate view updates in smaller parts
In order to avoid large allocations and too large mutations
generated from large view updates, granularity of the process
is broken down from per-partition to smaller chunks.
The view update builder now produces partial updates, no more
than 100 view rows at a time.
2021-07-08 11:17:27 +02:00
Piotr Sarna
679dc4d824 db,view: move view_update_builder to the header
The builder is going to be used directly by the callers,
which requires making its definition public.
No semantic changes were intended.
2021-07-08 11:17:27 +02:00
Piotr Sarna
a7f7716ecf db,view: use chunked_vector for view updates
The number of view updates can grow large, especially in corner
cases like removing large base partitions. Chunked vector
prevents large allocations.
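
A minimal sketch of the idea behind a chunked container. `tiny_chunked_vector` is a hypothetical class written for this illustration and is not the container used in the patch; it assumes `T` is default-constructible and copy-assignable.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Elements live in fixed-size chunks, so no single element allocation
// exceeds ChunkSize elements. Only the small array of chunk pointers is
// contiguous.
template <typename T, std::size_t ChunkSize>
class tiny_chunked_vector {
    static_assert(ChunkSize > 0, "chunks must hold at least one element");
    std::vector<std::unique_ptr<std::array<T, ChunkSize>>> _chunks;
    std::size_t _size = 0;
public:
    void push_back(const T& v) {
        if (_size % ChunkSize == 0) {
            // All existing chunks are full (or there are none yet).
            _chunks.push_back(std::make_unique<std::array<T, ChunkSize>>());
        }
        (*_chunks[_size / ChunkSize])[_size % ChunkSize] = v;
        ++_size;
    }

    const T& operator[](std::size_t i) const {
        assert(i < _size);
        return (*_chunks[i / ChunkSize])[i % ChunkSize];
    }

    std::size_t size() const { return _size; }
    std::size_t chunk_count() const { return _chunks.size(); }
};
```

Growing such a container never reallocates or copies existing elements, which is what keeps individual allocations small when the number of view updates is large.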
2021-06-17 10:15:17 +02:00
Piotr Sarna
f832a30388 db,view,table: futurize calculating affected ranges
In order to avoid stalls on large inputs, calculating
affected ranges is now able to yield.
2021-06-16 09:51:31 +02:00
Piotr Sarna
88d4a66e90 db,view: pass base token by value to mutate_MV
The base token is passed cross-continuations, so the current way
of passing it by const reference probably only works because the token
copying is cheap enough to optimize the reference out.
Fix by explicitly taking the token by value.
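
The lifetime concern can be sketched without any asynchronous framework. In this illustration `token`, `task_queue`, `schedule_update` and `enqueue_from_temporary` are hypothetical names, and a vector of `std::function` stands in for continuations that run after the caller's frame is gone.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for a partition token.
struct token {
    std::int64_t value;
};

// Hypothetical queue of deferred work, standing in for continuations.
using task_queue = std::vector<std::function<std::int64_t()>>;

// Taking `base_token` by value and capturing it by copy gives the deferred
// task its own token, so it stays valid however late the task runs.
void schedule_update(task_queue& q, token base_token) {
    q.push_back([base_token] { return base_token.value; });
}

// Builds the token in a local scope that ends before the task executes.
task_queue enqueue_from_temporary(std::int64_t v) {
    task_queue q;
    {
        token t{v};
        schedule_update(q, t);
    } // `t` is destroyed here; the queued task still owns a copy.
    return q;
}
```

Had the task captured a reference to `t` instead, running it after the inner scope ends would read a destroyed object.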
2021-06-14 09:30:38 +02:00
Pavel Solodovnikov
76bea23174 treewide: reduce header interdependencies
Use forward declarations wherever possible.

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>

Closes #8813
2021-06-07 15:58:35 +03:00
Avi Kivity
a55b434a2b treewide: extend copyright statements to present day
2021-06-06 19:18:49 +03:00
Piotr Sarna
35887bf88b view: add printing missing base column on errors
When an out-of-sync view is attempted to be used in a write operation,
the whole operation needs to be aborted with an error. After this patch,
the error contains more context - namely, the missing column.
2020-10-31 12:22:07 +01:00
Piotr Sarna
ef3470fa34 view: simplify creating base-dependent info for reads only
The code which created base-dependent info for materialized views
can be expressed with fewer branches. Also, the constructor
which takes a single parameter is made explicit.
2020-10-31 12:22:07 +01:00
Eliran Sinvani
70e04c1123 view info: support partial match between base and view for
only reading from view.

The current implementation of materialized views does
not keep the base schema version to which a specific version of the
materialized view schema corresponds. This complicates things, especially
for old view versions that the schema doesn't support anymore. However,
the views, being also independent tables, should allow reading from
them as long as they exist, even if the base table changed since then.
For the reading purpose, we don't need to know the exact composition
of view primary key columns that are not part of the base primary
key, we only need to know that there are any, and this is a much
looser constraint on the schema.
We can rely on table invariants, such as the fact that pk columns are
not going to disappear in newer versions of the table.
This means that if we don't find a view column in the base table, it is
not a part of the base table primary key.
This information is enough for us to perform read on the view.
This commit adds support for being able to rely on such partial
information along with a validation that it is not going to be used for
writes. If it is, we simply abort since this means that our schema
integrity is compromised.
2020-10-21 15:20:43 +03:00
Tomasz Grabiec
3a6ec9933c db: views: Fix undefined behavior on base table schema changes
The view_info object, which is attached to the schema object of the
view, contains a data structure called
"base_non_pk_columns_in_view_pk". This data structure contains column
ids of the base table so is valid only for a particular version of the
base table schema. This data structure is used by materialized view
code to interpret mutations of the base table, those coming from base
table writes, or reads of the base table done as part of view updates
or view building.

The base table schema version of that data structure must match the
schema version of the mutation fragments, otherwise we hit undefined
behavior. This may include aborts, exceptions, segfaults, or data
corruption (e.g. writes landing in the wrong column in the view).

Before this patch, we could get schema version mismatch here after the
base table was altered. That's because the view schema does not change
when the base table is altered.

Part of the fix is to extract base_non_pk_columns_in_view_pk into a
third entity called base_dependent_view_info, which changes both on
base table schema changes and view schema changes.

It is managed by a shared pointer so that we can take immutable
snapshots of it, just like with schema_ptr. When starting the view
update, the base table schema_ptr and the corresponding
base_dependent_view_info have to match. So we must obtain them
atomically, and base_dependent_view_info cannot change during update.
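
The snapshot idea can be sketched with plain shared pointers. Everything below is a hypothetical simplification: `base_schema`, `base_dependent_info` and `view_state` are illustration-only types, and concurrency is not modeled, so "atomically" here only means that both pointers are replaced and read together.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins: a base table schema and the view info that is
// only valid for that particular schema version.
struct base_schema { int version; };
struct base_dependent_info { int base_version; };

class view_state {
    std::shared_ptr<const base_schema> _schema;
    std::shared_ptr<const base_dependent_info> _info;
public:
    // An immutable pair that keeps both objects alive for one update.
    struct snapshot {
        std::shared_ptr<const base_schema> schema;
        std::shared_ptr<const base_dependent_info> info;
    };

    explicit view_state(int version) { on_base_schema_change(version); }

    // Both objects are replaced together, never one without the other.
    void on_base_schema_change(int new_version) {
        _schema = std::make_shared<base_schema>(base_schema{new_version});
        _info = std::make_shared<base_dependent_info>(base_dependent_info{new_version});
    }

    // Both pointers are read together, so the versions always match.
    snapshot take_snapshot() const {
        assert(_schema->version == _info->base_version);
        return snapshot{_schema, _info};
    }
};
```

A snapshot taken before a schema change keeps pointing at the old, mutually consistent pair, while later snapshots see the new one.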

Also, whenever the base table schema changes, we must update
base_dependent_view_infos of all attached views (atomically) so that
it matches the base table schema.

Refs #7061.
2020-08-20 14:53:07 +02:00
Piotr Sarna
77e943e9a3 db,views: unify time points used for update generation
Until now, view updates were generated with a bunch of random
time points, because the interface was not adjusted for passing
a single time point. The time points were used to determine
whether cells were alive (e.g. because of TTL), so it's better
to unify the process:
1. when generating view updates from user writes, a single time point
   is used for the whole operation
2. when generating view updates via the view building process,
   a single time point is used for each build step
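
A small sketch of why one shared time point matters. `cell`, `is_live` and `count_live` are hypothetical names; the only assumption is that liveness is decided by comparing an expiry time against the supplied time point.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

using time_point = std::chrono::system_clock::time_point;

// Hypothetical cell that expires at a fixed point in time (e.g. due to TTL).
struct cell {
    time_point expiry;
};

// Liveness is judged against the time point supplied by the caller.
bool is_live(const cell& c, time_point now) {
    return c.expiry > now;
}

// The same `now` is used for every cell, so all liveness decisions within
// one operation are mutually consistent.
std::size_t count_live(const std::vector<cell>& cells, time_point now) {
    std::size_t n = 0;
    for (const auto& c : cells) {
        if (is_live(c, now)) {
            ++n;
        }
    }
    return n;
}
```

If each check sampled the clock on its own, two cells with the same expiry could be judged differently within a single operation.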

NOTE: I don't see any reliable and deterministic way of writing
      test scenarios which trigger problems with the old code.
      After #6488 is resolved and error injection is integrated
      into view.cc, tests can be added.

Fixes #6429
Tests: unit(dev)
Message-Id: <f864e965eb2e27ffc13d50359ad1e228894f7121.1590070130.git.sarna@scylladb.com>
2020-05-28 12:56:09 +03:00
Piotr Sarna
92aadb94e5 treewide: propagate trace state to write path
In order to add tracing to places where it can be useful,
e.g. materialized view updates and hinted handoff, tracing state
is propagated to all applicable call sites.
2020-05-18 16:05:23 +02:00
Nadav Har'El
7922b9eb8f materialized views: reduce recompilation when db/view/view.hh changes.
Before this patch, when db/view/view.hh was modified, 89 source files had to
be recompiled. After this patch, this number is down to 5.

Most of the irrelevant source files got view.hh by including database.hh,
which included view.hh just for the definition of statistics. So in this
patch we split the view statistics to a separate header file, view_stats.hh,
and database.hh only includes that. A few source files which included
only database.hh and also needed view.hh (for materialized-view related
functions) now need to include view.hh explicitly.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200319121031.540-1-nyh@scylladb.com>
2020-03-19 15:46:14 +02:00
Piotr Sarna
0c11e07faf view,table: fix waiting for view updates during building
View updates sent as part of the view building process should never
be ignored, but fd49fd7 introduced a bug which may cause exactly that:
the updates are mistakenly sent to background, so the view builder
will not receive negative feedback if an update failed, which will
in turn not cause a retry. Consequently, view building may report
that it "finished" building a view, while some of the updates were
lost. A simple fix is to restore previous behaviour - all updates
triggered by view building are now waited for.

Fixes #6038
Tests: unit(dev),
dtest: interrupt_build_process_with_resharding_low_to_half_test
2020-03-19 10:50:54 +02:00
Piotr Sarna
3b3659e8cd db,view: drop default parameter for mutate_MV::allow_hints
Default parameters are considered harmful, and as part of a cleanup
before editing view.cc code, a default value for allow_hints parameter
is removed.
2020-03-11 09:05:56 +01:00
Pavel Emelyanov
4fa12f2fb8 header: De-bloat schema.hh
The header sits in many other headers, but there's a handy
schema_fwd.hh that's tiny and contains needed declarations
for other headers. So replace schema.hh with schema_fwd.hh
in most of the headers (and remove completely from some).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200303102050.18462-1-xemul@scylladb.com>
2020-03-03 11:34:00 +01:00
Eliran Sinvani
8cfc2aad57 internalize storage proxy statistics metric registration
The storage proxy statistics structure did not contain
a method for registering the statistics for metric
groups, instead, each user had to register some
of the metrics by itself. There is no real reason
for separating the metrics registration from
the statistics data. There is even less justification
for doing this only for part of the stats as is
the case for those statistics.
This commit internalizes the metrics registration
in the storage_proxy stats structures.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2020-01-30 15:01:40 +01:00
Piotr Sarna
a7602bd2f1 database: add global view update stats
Currently view update metrics are only per-table, but per-table metrics
are not always enabled. In order to be able to see the number of
generated view updates in all cases, global stats are added.

Fixes #4221
Message-Id: <e94c27c530b2d7d262f76d03937e7874d674870a.1552552016.git.sarna@scylladb.com>
2019-03-14 12:04:18 +00:00
Piotr Sarna
e30cf22956 db,view: add allow_hints parameter to mutate_MV
The mutate_MV function now accepts a parameter that controls whether
hints are allowed when sending mutations to endpoints.
2019-01-28 09:38:42 +01:00
Duarte Nunes
fa2b0384d2 Replace std::experimental types with C++17 std version.
Replace stdx::optional and stdx::string_view with the C++ std
counterparts.

Some instances of boost::variant were also replaced with std::variant,
namely those that called seastar::visit.
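
For readers unfamiliar with the C++17 types named here, a short sketch of their use. `value`, `describe` and `suffix_after` are hypothetical names, and `std::visit` is used in place of `seastar::visit` to keep the example free of external dependencies.

```cpp
#include <optional>
#include <string>
#include <string_view>
#include <type_traits>
#include <variant>

// A value that is either an integer or a string.
using value = std::variant<int, std::string>;

// Dispatches on the alternative currently held by the variant.
std::string describe(const value& v) {
    return std::visit([](const auto& x) -> std::string {
        using T = std::decay_t<decltype(x)>;
        if constexpr (std::is_same_v<T, int>) {
            return "int:" + std::to_string(x);
        } else {
            return "string:" + x;
        }
    }, v);
}

// Returns the part of `s` after the first `sep`, or std::nullopt if `sep`
// does not occur. The result is a view into `s`, not a copy.
std::optional<std::string_view> suffix_after(std::string_view s, char sep) {
    auto pos = s.find(sep);
    if (pos == std::string_view::npos) {
        return std::nullopt;
    }
    return s.substr(pos + 1);
}
```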

Scylla now requires GCC 8 to compile.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20190108111141.5369-1-duarte@scylladb.com>
2019-01-08 13:16:36 +02:00