scylladb

Author	SHA1	Message	Date
Benny Halevy	6454c8d67f	db: view_updates: coroutinize move_to And allow yielding in-between freezing each update mutation. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-04-08 11:29:25 +03:00
Botond Dénes	909be0b9d7	db/view: convert view_update_builder interface to v2 The constructor and the make_ factory method now take v2 readers. Immediate users are patched, with conversions if needed.	2022-03-17 10:50:50 +02:00
Botond Dénes	0740019e4d	db/view: migrate view_update_builder to v2 To avoid noise, the interface is left as v1 and inbound readers are converted in the constructor.	2022-03-17 10:47:55 +02:00
Mikołaj Sielużycki	1d84a254c0	flat_mutation_reader: Split readers by file and remove unnecessary includes. The flat_mutation_reader files were conflated and contained multiple readers, which were not strictly necessary. Splitting improves iterative compilation times, as touching rarely used readers no longer recompiles large chunks of the codebase. Total compilation times are also improved, as the sizes of flat_mutation_reader.hh and flat_mutation_reader_v2.hh have been reduced and those files are included by many files in the codebase. With changes: real 29m14.051s user 168m39.071s sys 5m13.443s. Without changes: real 30m36.203s user 175m43.354s sys 5m26.376s. Closes #10194	2022-03-14 13:20:25 +02:00
Avi Kivity	fcb8d040e8	treewide: use Software Package Data Exchange (SPDX) license identifiers Instead of lengthy blurbs, switch to single-line, machine-readable standardized (https://spdx.dev) license identifiers. The Linux kernel switched long ago, so there is strong precedent. Three cases are handled: AGPL-only, Apache-only, and dual licensed. For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0), reasoning that our changes are extensive enough to apply our license. The changes were applied mechanically with a script, except to licenses/README.md. Closes #9937	2022-01-18 12:15:18 +01:00
Avi Kivity	bbad8f4677	replica: move ::database, ::keyspace, and ::table to replica namespace Move replica-oriented classes to the replica namespace. The main classes moved are ::database, ::keyspace, and ::table, but a few ancillary classes are also moved. There are certainly classes that should be moved but aren't (like distributed_loader) but we have to start somewhere. References are adjusted treewide. In many cases, it is obvious that a call site should not access the replica (but the data_dictionary instead), but that is left for separate work. scylla-gdb.py is adjusted to look for both the new and old names.	2022-01-07 12:04:38 +02:00
Botond Dénes	f02632aeb0	range_tombstone_accumulator: drop _reversed flag	2021-09-09 15:42:15 +03:00
Piotr Sarna	a1813c9b34	db,view,table: drop unneeded time point parameter Now that restriction checking is translated to the partition-slice-style interface, checking the partition/clustering key restrictions for views can be performed without the time point parameter. The parameter is dropped from all relevant call sites.	2021-07-13 10:40:08 +02:00
Piotr Sarna	bf0777e97a	view: generate view updates in smaller parts In order to avoid large allocations and too large mutations generated from large view updates, granularity of the process is broken down from per-partition to smaller chunks. The view update builder now produces partial updates, no more than 100 view rows at a time.	2021-07-08 11:17:27 +02:00
Piotr Sarna	679dc4d824	db,view: move view_update_builder to the header The builder is going to be used directly by the callers, which requires making its definition public. No semantic changes were intended.	2021-07-08 11:17:27 +02:00
Piotr Sarna	a7f7716ecf	db,view: use chunked_vector for view updates The number of view updates can grow large, especially in corner cases like removing large base partitions. Chunked vector prevents large allocations.	2021-06-17 10:15:17 +02:00
Piotr Sarna	f832a30388	db,view,table: futurize calculating affected ranges In order to avoid stalls on large inputs, calculating affected ranges is now able to yield.	2021-06-16 09:51:31 +02:00
Piotr Sarna	88d4a66e90	db,view: pass base token by value to mutate_MV The base token is passed cross-continuations, so the current way of passing it by const reference probably only works because the token copying is cheap enough to optimize the reference out. Fix by explicitly taking the token by value.	2021-06-14 09:30:38 +02:00
Pavel Solodovnikov	76bea23174	treewide: reduce header interdependencies Use forward declarations wherever possible. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Closes #8813	2021-06-07 15:58:35 +03:00
Avi Kivity	a55b434a2b	treewide: extend copyright statements to present day	2021-06-06 19:18:49 +03:00
Piotr Sarna	35887bf88b	view: add printing missing base column on errors When an out-of-sync view is attempted to be used in a write operation, the whole operation needs to be aborted with an error. After this patch, the error contains more context - namely, the missing column.	2020-10-31 12:22:07 +01:00
Piotr Sarna	ef3470fa34	view: simplify creating base-dependent info for reads only The code which created base-dependent info for materialized views can be expressed with fewer branches. Also, the constructor which takes a single parameter is made explicit.	2020-10-31 12:22:07 +01:00
Eliran Sinvani	70e04c1123	view info: support partial match between base and view for only reading from view. The current implementation of materialized views does not keep the version to which a specific version of the materialized view schema corresponds. This complicates things, especially for old view versions that the schema doesn't support anymore. However, the views, being also independent tables, should allow reading from them as long as they exist, even if the base table changed since then. For the reading purpose, we don't need to know the exact composition of view primary key columns that are not part of the base primary key; we only need to know that there are any, and this is a much looser constraint on the schema. We can rely on table invariants such as the fact that pk columns are not going to disappear in newer versions of the table. This means that if we don't find a view column in the base table, it is not a part of the base table primary key. This information is enough for us to perform reads on the view. This commit adds support for being able to rely on such partial information along with a validation that it is not going to be used for writes. If it is, we simply abort since this means that our schema integrity is compromised.	2020-10-21 15:20:43 +03:00
Tomasz Grabiec	3a6ec9933c	db: views: Fix undefined behavior on base table schema changes The view_info object, which is attached to the schema object of the view, contains a data structure called "base_non_pk_columns_in_view_pk". This data structure contains column ids of the base table so is valid only for a particular version of the base table schema. This data structure is used by materialized view code to interpret mutations of the base table, those coming from base table writes, or reads of the base table done as part of view updates or view building. The base table schema version of that data structure must match the schema version of the mutation fragments, otherwise we hit undefined behavior. This may include aborts, exceptions, segfaults, or data corruption (e.g. writes landing in the wrong column in the view). Before this patch, we could get schema version mismatch here after the base table was altered. That's because the view schema does not change when the base table is altered. Part of the fix is to extract base_non_pk_columns_in_view_pk into a third entity called base_dependent_view_info, which changes both on base table schema changes and view schema changes. It is managed by a shared pointer so that we can take immutable snapshots of it, just like with schema_ptr. When starting the view update, the base table schema_ptr and the corresponding base_dependent_view_info have to match. So we must obtain them atomically, and base_dependent_view_info cannot change during update. Also, whenever the base table schema changes, we must update base_dependent_view_infos of all attached views (atomically) so that it matches the base table schema. Refs #7061.	2020-08-20 14:53:07 +02:00
Piotr Sarna	77e943e9a3	db,views: unify time points used for update generation Until now, view updates were generated with a bunch of random time points, because the interface was not adjusted for passing a single time point. The time points were used to determine whether cells were alive (e.g. because of TTL), so it's better to unify the process: 1. when generating view updates from user writes, a single time point is used for the whole operation 2. when generating view updates via the view building process, a single time point is used for each build step NOTE: I don't see any reliable and deterministic way of writing test scenarios which trigger problems with the old code. After #6488 is resolved and error injection is integrated into view.cc, tests can be added. Fixes #6429 Tests: unit(dev) Message-Id: <f864e965eb2e27ffc13d50359ad1e228894f7121.1590070130.git.sarna@scylladb.com>	2020-05-28 12:56:09 +03:00
Piotr Sarna	92aadb94e5	treewide: propagate trace state to write path In order to add tracing to places where it can be useful, e.g. materialized view updates and hinted handoff, tracing state is propagated to all applicable call sites.	2020-05-18 16:05:23 +02:00
Nadav Har'El	7922b9eb8f	materialized views: reduce recompilation when db/view/view.hh changes. Before this patch, when db/view/view.hh was modified, 89 source files had to be recompiled. After this patch, this number is down to 5. Most of the irrelevant source files got view.hh by including database.hh, which included view.hh just for the definition of statistics. So in this patch we split the view statistics to a separate header file, view_stats.hh, and database.hh only includes that. A few source files which included only database.hh and also needed view.hh (for materialized-view related functions) now need to include view.hh explicitly. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200319121031.540-1-nyh@scylladb.com>	2020-03-19 15:46:14 +02:00
Piotr Sarna	0c11e07faf	view,table: fix waiting for view updates during building View updates sent as part of the view building process should never be ignored, but `fd49fd7` introduced a bug which may cause exactly that: the updates are mistakenly sent to background, so the view builder will not receive negative feedback if an update failed, which will in turn not cause a retry. Consequently, view building may report that it "finished" building a view, while some of the updates were lost. A simple fix is to restore previous behaviour - all updates triggered by view building are now waited for. Fixes #6038 Tests: unit(dev), dtest: interrupt_build_process_with_resharding_low_to_half_test	2020-03-19 10:50:54 +02:00
Piotr Sarna	3b3659e8cd	db,view: drop default parameter for mutate_MV::allow_hints Default parameters are considered harmful, and as part of a cleanup before editing view.cc code, a default value for allow_hints parameter is removed.	2020-03-11 09:05:56 +01:00
Pavel Emelyanov	4fa12f2fb8	header: De-bloat schema.hh The header sits in many other headers, but there's a handy schema_fwd.hh that's tiny and contains needed declarations for other headers. So replace schema.hh with schema_fwd.hh in most of the headers (and remove completely from some). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200303102050.18462-1-xemul@scylladb.com>	2020-03-03 11:34:00 +01:00
Eliran Sinvani	8cfc2aad57	internalize storage proxy statistics metric registration The storage proxy statistics structure did not contain a method for registering the statistics for metric groups; instead, each user had to register some of the metrics by itself. There is no real reason for separating the metrics registration from the statistics data. There is even less justification for doing this only for part of the stats as is the case for those statistics. This commit internalizes the metrics registration in the storage_proxy stats structures. Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>	2020-01-30 15:01:40 +01:00
Piotr Sarna	a7602bd2f1	database: add global view update stats Currently view update metrics are only per-table, but per-table metrics are not always enabled. In order to be able to see the number of generated view updates in all cases, global stats are added. Fixes #4221 Message-Id: <e94c27c530b2d7d262f76d03937e7874d674870a.1552552016.git.sarna@scylladb.com>	2019-03-14 12:04:18 +00:00
Piotr Sarna	e30cf22956	db,view: add allow_hints parameter to mutate_MV Mutating MV function can now accept a parameter whether hints should be allowed during sending mutations to endpoints.	2019-01-28 09:38:42 +01:00
Duarte Nunes	fa2b0384d2	Replace std::experimental types with C++17 std version. Replace stdx::optional and stdx::string_view with the C++ std counterparts. Some instances of boost::variant were also replaced with std::variant, namely those that called seastar::visit. Scylla now requires GCC 8 to compile. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20190108111141.5369-1-duarte@scylladb.com>	2019-01-08 13:16:36 +02:00
Duarte Nunes	a3d30ea99a	db/view: Propagate acquired semaphore units to mutate_MV() Propagate acquired semaphore units to mutate_MV() to allow the semaphore to be incrementally signalled as view updates are processed by view replicas. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:29 +00:00
Duarte Nunes	2753cfee88	db/view: Generate view updates as frozen_mutations Working in terms of frozen_mutations allows us to account more precisely the memory pending view updates consume at the storage_proxy layer. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:29 +00:00
Nadav Har'El	30f721afab	Materialized Views: add unselected columns as virtual columns When a view's partition key contains only columns from the base's partition key (and not an additional one), the liveness (existence or disappearance) of a view-table row is tied to the liveness of the base table row - and that depends not only on selected columns (base-table columns SELECTed to also appear in the view) but also on unselected columns. This means that we may need to keep a view row alive even without data, just because some unselected column is alive in the base table. Before this patch we tried to build a single "row marker" in the view column which summarizes the liveness information in all unselected columns, but this proved unworkable, as explained in issue #3362 and as will be demonstrated in unit tests in a later patch. Because we can't replace several unselected cells by one row marker, what we do in this patch is to add, for each unselected cell, a "virtual cell" which contains the cell's liveness information (timestamp, deletion, ttl) but not its value. For collections, we can't represent the entire collection by one virtual cell, and rather need a collection of virtual cells. This patch just adds the virtual columns to the view schema. Code in the previous patch, when it notices the virtual columns in the view's schema, adds the appropriate content into these columns. We may need to add virtual columns to a view when first created, but also when an unselected column is added to the base table with "ALTER TABLE", so both are supported in this patch. Fixes #3362. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2018-08-16 15:42:22 +03:00
Piotr Sarna	3792bed3ed	view: adapt view_stats to act as write stats This commit adapts view_stats structure so it can be passed to storage_proxy as write stats. Thanks to that, mv replica updates will not interfere with user write metrics. As a side effect it also provides more stats to replica view updates. Closes #3385 Closes #3416	2018-05-22 16:52:58 +02:00
Piotr Sarna	49bebcfa25	view: add view metrics This commit introduces view statistics: - updates pushed to local/remote replicas - updates failed to be pushed to local/remote replicas Metrics are kept on a per-table basis, i.e. updates_pushed_remote shows the total number of updates (mutations) pushed to all paired mv replicas that this particular table has. Every single update is taken into consideration, so if a view update requires removing a row from one view and adding a row to another, it will be counted as 2 updates. References #3385 References #3416	2018-05-22 16:52:58 +02:00
Duarte Nunes	dc44a08370	db/view: Return a future when sending view updates While we now send view mutations asynchronously in the normal view write path, other processes interested in sending view updates, such as streaming or view building, may wish to do it synchronously. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-03-27 01:20:10 +01:00
Piotr Jastrzebski	96c97ad1db	Rename streamed_mutation* files to mutation_fragment* Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:56:49 +01:00
Piotr Jastrzebski	4c74b8c7e7	Migrate materialized views to flat_mutation_reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-18 07:32:35 +01:00
Duarte Nunes	983af595e9	database: Read existing base mutations When generating updates for a materialized view we need to read the existing base row, to be able to determine the primary key of the view row the new base update will supplant, in case the view includes a base non-primary key column in its own primary key. That old view row will be tombstoned or updated, if it exists, depending on the difference between the new base row and the existing one, if any. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-05-17 10:33:19 +02:00
Duarte Nunes	8a77bfe35b	db/view: Calculate clustering ranges for MV read-before-write query Introduce the calculate_affected_clustering_ranges() function to calculate the smallest subset of affected clustering ranges that we need to query for. The update_requires_read_before_write() function checks whether a view is potentially affected by the base update. The patch also cleans up the may_be_affected_by() function. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-05-17 10:33:19 +02:00
Duarte Nunes	bfb8a3c172	materialized views: Replace db::view::view class The write path uses a base schema at a particular version, and we want it to use the materialized views at the corresponding version. To achieve this, we need to map the state currently in db::view::view to a particular schema version, which this patch does by introducing the view_info class to hold the state previously in db::view::view, and by having a view schema directly point to it. The changes in the patch are thus: 1) Introduce view_info to hold the extra view state; 2) Point to the view_info from the schema; 3) Make the functions in the now stateless db::view::view non-member; 4) Remove the db::view::view class. All changes are structural and don't affect current behavior. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-03-15 15:50:05 +01:00
Nadav Har'El	3ae73164a4	materialized views: partial mutate_MV This adds a function mutate_MV() which takes view mutations and sends them to the appropriate nodes (this may be the current node, or a remote node). This is only a partial implementation - we still don't do the local batch log (to survive reboots and failures) and some other stuff which is left commented out. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-02-06 13:36:45 +01:00
Duarte Nunes	d5a61a8c48	view: Add view_update_builder class This patch adds the view_update_builder class, which is responsible for calculating the mutations to apply to a column family's materialized views, given a streamed_mutation representing an update to the base table and a streamed_mutation representing the pre-existing rows which the update covers. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-02-06 13:36:45 +01:00
Duarte Nunes	082ef56df1	view: Store pk view column that's non-pk in the base To help calculate the view mutations from a base update, we store in the view class the column that's part of the view's primary key but not part of the base's, if such column exists. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-02-06 13:35:30 +01:00
Duarte Nunes	734ad80390	view: Add matches_view_filter() function This patch adds the matches_view_filter() function which specifies whether a given base row matches the view filter. Unlike may_be_affected_by(), this function has no false positives. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-02-06 13:35:30 +01:00
Duarte Nunes	21d1bbb527	view: Add may_be_affected_by function This patch adds the may_be_affected_by() function to the view class, which is responsible for determining whether an update to a base table affects one of its views. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-02-06 13:35:30 +01:00
Duarte Nunes	7818339791	materialized views: Add view class This patch adds the view class, which will contain functions related to populating a view, either from the base table's write path or from the view building mechanism which copies over already existing data in the base table. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-12-20 13:06:11 +00:00

46 Commits