scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-27 20:05:10 +00:00

Author	SHA1	Message	Date
Piotr Sarna	5f85a7a821	db,view: fix virtual columns liveness checks When looking for optimization paths, columns selected in a view are checked against multiple conditions; unfortunately, virtual columns were erroneously skipped from that check, which resulted in their TTLs being ignored. That can lead to over-optimizing and omitting vital liveness info from view rows, which can then result in rows disappearing too early.	2019-02-28 10:47:19 +01:00
Piotr Sarna	bd52e05ae2	view: minimize generated view updates for unselected columns In some cases, generating view updates for columns that were not selected in the CREATE VIEW statement is redundant: this is the case when the update will not influence row liveness in any way. Currently, these cases are optimized out: - the row marker is live and only unselected columns were updated; - the row marker is not live, only unselected columns were updated, nothing was created or deleted in the process, and no TTL was involved.	2019-02-20 14:05:27 +01:00
Piotr Sarna	dbe8491655	view: cache is_index for view pointer It's detrimental to query the index manager every time to check whether a view is backing a secondary index, so this value is cached at construction time. At the same time, this value is not simply passed to view_info when it is created in the secondary index manager, in order to decouple materialized view logic from secondary indexes as much as possible (the mere existence of is_index() is bad enough).	2019-02-20 12:52:32 +01:00
Nadav Har'El	05db7d8957	Materialized views: name the "batch_memory_max" constant Give the constant 1024*1024 introduced in an earlier commit a name, "batch_memory_max", and move it from view.cc to view_builder.hh. It now resides next to the pre-existing constant that controlled how many rows were read in each build step, "batch_size". Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190217100222.15673-1-nyh@scylladb.com>	2019-02-17 13:28:16 +00:00
Nadav Har'El	fec562ec8f	Materialized views: limit size of row batching during bulk view building The bulk materialized-view building process (when adding a materialized view to a table with existing data) currently reads the base table in batches of 128 (view_builder::batch_size) rows. This is clearly better than reading entire partitions (which may be huge), but 128 rows may still grow pretty large when rows contain large strings or blobs, and there is no real reason to buffer 128 rows when they are large. Instead, when the rows we have read so far exceed some size threshold (in this patch, 1MB), we can operate on them immediately instead of waiting for 128. As a side effect, this patch also solves another bug: in the worst case, all the base rows of one batch may be written into one output view partition, in one mutation. But there is a hard limit on the size of one mutation (commitlog_segment_size_in_mb, by default 32MB), so we cannot allow the batch size to exceed this limit. By not batching further after 1MB, we avoid reaching this limit when individual rows do not reach it but 128 of them would. Fixes #4213. This patch also includes a unit test reproducing #4213, and demonstrating that it is now solved. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190214093424.7172-1-nyh@scylladb.com>	2019-02-14 12:04:40 +02:00
Piotr Sarna	9a6261ca27	db,view: add updating view_building_paused statistics Each time view building is paused because of a connection failure, the view_building_paused metric is bumped.	2019-01-28 09:38:42 +01:00
Piotr Sarna	e30cf22956	db,view: add allow_hints parameter to mutate_MV The mutate_MV function can now accept a parameter specifying whether hints should be allowed when sending mutations to endpoints.	2019-01-28 09:38:42 +01:00
Piotr Sarna	e0fe9ce2c0	storage_proxy: add allow_hints parameter to send_to_endpoint With hints allowed, send_to_endpoint will leverage consistency level ANY to send data. Otherwise, it will use the default, cl::ONE.	2019-01-28 09:38:41 +01:00
Piotr Sarna	02d88de082	db,view: add consuming units in staging table registration The view update generator service can accept sstables even before it starts, but it should still acknowledge the number of waiters in the semaphore. Reported-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <fcaa0f2884ebb4d34d1716e9e1cfed0642b4b85d.1547661048.git.sarna@scylladb.com>	2019-01-16 18:05:17 +00:00
Duarte Nunes	04a14b27e4	Merge 'Add handling staging sstables to /upload dir' from Piotr " This series adds generating view updates from sstables added through /upload directory if their tables have accompanying materialized views. Said sstables are left in /upload directory until updates are generated from them and are treated just like staging sstables from /staging dir. If there are no views for a given table, sstables are simply moved from /upload dir to datadir without any changes. Tests: unit (release) " * 'add_handling_staging_sstables_to_upload_dir_5' of https://github.com/psarna/scylla: all: rename view_update_from_staging_generator distributed_loader: fix indentation service: add generating view updates from uploaded sstables init: pass view update generator to storage service sstables: treat sstables in upload dir as needing view build sstables,table: rename is_staging to requires_view_building distributed_loader: use proper directory for opening SSTable db,view: make throttling optional for view_update_generator	2019-01-15 18:19:27 +00:00
Piotr Sarna	0eb703dc80	all: rename view_update_from_staging_generator The new name, view_update_generator, is both more concise and correct, since we now generate from directories other than "/staging".	2019-01-15 17:31:47 +01:00
Piotr Sarna	beb4836726	db,view: make throttling optional for view_update_generator Currently, registering new view updates is throttled by a semaphore, which makes sense during stream sessions in order to avoid overloading the queue. However, registration also occurs during initialization, where it makes little sense to wait on a semaphore, since the view update generator might not have started at all yet.	2019-01-15 16:47:01 +01:00
Piotr Sarna	b9203ec4f8	view: wait for stream sessions to finish before view building During streaming, there's a race between streamed sstables and view creation, which might result in some sstables not being used to generate view updates, even though they should be. That happens when the decision about the view update path for a table is made before view creation, but after some sstables have already been received via streaming. These will not be used in view building even though they should be. Hence, a phaser is used to make the view builder wait for all ongoing stream sessions for a table to finish before proceeding with build steps. Refs #4032	2019-01-15 09:36:55 +01:00
Duarte Nunes	fa2b0384d2	Replace std::experimental types with C++17 std version. Replace stdx::optional and stdx::string_view with the C++ std counterparts. Some instances of boost::variant were also replaced with std::variant, namely those that called seastar::visit. Scylla now requires GCC 8 to compile. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20190108111141.5369-1-duarte@scylladb.com>	2019-01-08 13:16:36 +02:00
Piotr Sarna	9d46715613	streaming,view: move view update checks to separate file Checking if view update path should be used for sstables is going to be reused in row level repair code, so relevant functions are moved to a separate header.	2019-01-03 08:31:40 +01:00
Duarte Nunes	f41d13f38c	db/view/view_update_from_staging_generator: Break semaphore on stop() This avoids having fibers waiting on _registration_sem without ever being notified. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-29 12:55:04 +00:00
Duarte Nunes	4974addc5c	db/view/view_update_from_staging_generator: Restore formatting Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-29 12:55:02 +00:00
Duarte Nunes	201196130d	db/view/view_update_from_staging_generator: Avoid creating more than one fiber If view_update_from_staging_generator::maybe_generate_view_updates() is called before view_update_from_staging_generator::start(), as can happen in main.cc, then we can potentially create more than one fiber, which leads to corrupted state and conflicting operations. To avoid this, use just one fiber and be explicit about notifying it that more work is needed, by leveraging a condition-variable. Fixes #4021 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-29 12:52:51 +00:00
Avi Kivity	0c0cc66ee7	system_keyspace, view: reduce interdependencies system_keyspace is an implementation detail for most of its users, not part of the interface, as it's only used to store internal data. Therefore, including it in a header file causes unneeded dependencies. This patch removes a dependency between views and system_keyspace.hh by moving view_name and view_build_progress into a separate header file, and using forward declarations where possible. This allows us to remove an inclusion of system_keyspace.hh from a header file (the last one), so that further changes to system_keyspace.hh will cause fewer recompilations. Message-Id: <20181228215736.11493-1-avi@scylladb.com>	2018-12-29 12:12:15 +00:00
Duarte Nunes	2bd76f8fc5	db/view: Introduce node_update_backlog class This class is an atomic view update backlog representation, safe to update from multiple shards. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:30 +00:00
Duarte Nunes	12ce517242	db/view: Add view_update_backlog The view update backlog represents the pending view data that a base replica maintains. It is the maximum of the memory backlog - how much memory pending view updates are consuming - and the disk backlog - how much view hints are consuming. The size of a backlog is relative to its maximum size. We will use this class to represent a base replica's view update backlog at the coordinator. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:30 +00:00
Duarte Nunes	a3d30ea99a	db/view: Propagate acquired semaphore units to mutate_MV() Propagate acquired semaphore units to mutate_MV() to allow the semaphore to be incrementally signalled as view updates are processed by view replicas. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:29 +00:00
Duarte Nunes	2753cfee88	db/view: Generate view updates as frozen_mutations Working in terms of frozen_mutations allows us to account more precisely for the memory that pending view updates consume at the storage_proxy layer. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:29 +00:00
Duarte Nunes	715da6fd6b	db/view: Reserve vector space in mutate_MV() Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:29 +00:00
Duarte Nunes	5d011eb61f	db/view: Cleanup mutate_MV() In particular, extract out the logic updating the stats in case of a failed update. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:29 +00:00
Botond Dénes	1865e5da41	treewide: remove include database.hh from headers where possible Many headers don't really need to include database.hh, the include can be replaced by forward declarations and/or including the actually needed headers directly. Some headers don't need this include at all. Each header was verified to be compilable on its own after the change, by including it into an empty `.cc` file and compiling it. `.cc` files that used to get `database.hh` through headers that no longer include it were changed to include it themselves.	2018-12-14 08:03:57 +02:00
Paweł Dziepak	9024187222	partition_slice: use small_vector for column_ids	2018-12-06 14:21:04 +00:00
Duarte Nunes	6fbf792777	db/view/view_builder: Don't timeout waiting for view to be built Remove the timeout argument to db::view::view_builder::wait_until_built(), a test-only function to wait until a given materialized view has finished building. This change is motivated by the fact that some tests running on slow environments will timeout. Instead of incrementally increasing the timeout, remove it completely since tests are already run under an exterior timeout. Fixes #3920 Tests: unit release(view_build_test, view_schema_test) Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181115173902.19048-1-duarte@scylladb.com>	2018-11-15 19:41:43 +02:00
Piotr Sarna	fc7267c797	db/view: add view_update_from_staging_generator service A shardable service for generating mv updates after restarts is added.	2018-11-13 15:01:52 +01:00
Piotr Sarna	ed05d91adc	db/view: add view updating consumer This consumer is used to generate and push view replica updates from read mutations.	2018-11-13 14:54:39 +01:00
Avi Kivity	d77e044cde	db: convert sprint() to format() sprint() recently became more strict, throwing on sprint("%s", 5). Replace with the more modern format(). Mechanically converted with https://github.com/avikivity/unsprint.	2018-11-01 13:16:17 +00:00
Nadav Har'El	b8337f8c9d	Materialized views: fix race condition in resharding while view building When a node reshards (i.e., restarts with a different number of CPUs), and is in the middle of building a view for a pre-existing table, the view building needs to find the right token from which to start building on all shards. We ran the same code on all shards, hoping they would all make the same decision on which token to continue from. But in some cases, one shard might make the decision, start building, and make progress, all before a second shard goes to make the decision, which will now be different. This resulted, in some rare cases, in the new materialized view missing a few rows when the build was interrupted with a resharding. The fix is to add the missing synchronization: all shards should make the same decision on whether and how to reshard, and only then start building the view. Fixes #3890 Fixes #3452 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20181028140549.21200-1-nyh@scylladb.com>	2018-10-28 17:20:10 +00:00
Duarte Nunes	f3a5ec0fd9	db/view: Don't copy keyspace name Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181022104527.14555-1-duarte@scylladb.com>	2018-10-22 13:00:00 +02:00
Nadav Har'El	1d5f8d0015	materialized views: update stats.write statistics in all cases mutate_MV usually calls send_to_endpoint() to push view updates to remote view replicas. This function gets passed a statistics object, service::storage_proxy_stats::write_stats and, in particular, updates its "writes" statistic, which counts the number of ongoing writes. When the paired view replica happens to be the same node, we avoid calling send_to_endpoint() and call mutate_locally() instead. That function does not take a write_stats object, so the "writes" statistic doesn't get incremented for the duration of the write. So we should do this explicitly. Co-authored-by: Nadav Har'El <nyh@scylladb.com> Co-authored-by: Duarte Nunes <duarte@scylladb.com>	2018-10-02 20:44:58 +01:00
Botond Dénes	eb357a385d	flat_mutation_reader: make timeout opt-out rather than opt-in Currently timeout is opt-in, that is, all methods that even have it default it to `db::no_timeout`. This means that ensuring timeout is used where it should be is completely up to the author and the reviewers of the code. As humans are notoriously prone to mistakes, this has resulted in very inconsistent usage of timeout, with many clients of `flat_mutation_reader` passing the timeout only to some members and only at certain call sites. This is small wonder considering that some core operations like `operator()()` only recently received a timeout parameter and others like `peek()` didn't even have one until this patch. Both of these methods call `fill_buffer()`, which potentially talks to the lower layers and is supposed to propagate the timeout. All this makes the `flat_mutation_reader`'s timeout effectively useless. To bring order to this chaos, make the timeout parameter a mandatory one on all `flat_mutation_reader` methods that need it. This ensures that humans now get a reminder from the compiler when they forget to pass the timeout. Clients can still opt out of passing a timeout by passing `db::no_timeout` (the previous default value), but this will now be explicit and developers should think before typing it. There were surprisingly few core call sites to fix up. Where a timeout was available nearby I propagated it to be able to pass it to the reader; where I couldn't, I passed `db::no_timeout`. Authors of the latter kind of code (view, streaming and repair are some of the notable examples) should maybe consider propagating down a timeout if needed. In the test code (the vast majority of the changes) I just used `db::no_timeout` everywhere. Tests: unit(release, debug) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <1edc10802d5eb23de8af28c9f48b8d3be0f1a468.1536744563.git.bdenes@scylladb.com>	2018-09-20 11:31:24 +02:00
Nadav Har'El	16a6f76873	materialized views: simplify do_delete_old_entry() In previous patches, we gave up on an old (and broken) attempt to track the timestamps of many unselected base-table columns through one row marker in the view table, and replaced it with "virtual cells", one per unselected cell. The do_delete_old_entry() function still contains old code which maintained that row marker, and is no longer needed. That old code is not only no longer needed, it also no longer did anything, because all columns now appear in the view (as virtual columns), so the code ignored them when calculating the row marker. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20180829131914.16042-1-nyh@scylladb.com>	2018-08-29 14:33:41 +01:00
Nadav Har'El	6c00341383	Materialized Views: no need for elaborate row marker calculations Now that we have separate virtual cells to represent unselected columns in a materialized view, we no longer need the elaborate row-marker liveness calculations which aimed (but failed) to do the same thing. So that code can be removed. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2018-08-16 15:45:41 +03:00
Nadav Har'El	30f721afab	Materialized Views: add unselected columns as virtual columns When a view's partition key contains only columns from the base's partition key (and not an additional one), the liveness (existence or disappearance) of a view-table row is tied to the liveness of the base table row, and that depends not only on selected columns (base-table columns SELECTed to also appear in the view) but also on unselected columns. This means that we may need to keep a view row alive even without data, just because some unselected column is alive in the base table. Before this patch we tried to build a single "row marker" in the view column which summarizes the liveness information in all unselected columns, but this proved unworkable, as explained in issue #3362 and as will be demonstrated in unit tests in a later patch. Because we can't replace several unselected cells by one row marker, what we do in this patch is to add, for each unselected cell, a "virtual cell" which contains the cell's liveness information (timestamp, deletion, ttl) but not its value. For collections, we can't represent the entire collection by one virtual cell, and instead need a collection of virtual cells. This patch just adds the virtual columns to the view schema. Code in the previous patch, when it notices the virtual columns in the view's schema, adds the appropriate content into these columns. We may need to add virtual columns to a view when first created, but also when an unselected column is added to the base table with "ALTER TABLE", so both are supported in this patch. Fixes #3362. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2018-08-16 15:42:22 +03:00
Nadav Har'El	782baa44ef	Materialized Views: fill virtual columns The add_cells_to_view() function usually adds selected cells from the base table to the view mutation. For issue #3362, we sometimes want to also add unselected cells as "virtual" cells - truncated versions of the base-table cells just without the values. This patch contains the code to fill the virtual columns' data using the regular columns from the base table. This patch does not yet actually add any virtual columns to the schema, so until that is done (in the next patch), this patch will not yet cause any behavior change. This is important for bisectability. Refs #3362. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2018-08-16 15:38:27 +03:00
Tomasz Grabiec	894961006b	Merge "db/view/view_builder: Fixes to bookkeeping" from Duarte This series contains a couple of fixes to the bookkeeping of the view build process, which could cause data to be left behind in the system tables. * git@github.com:duarten/scylla.git materialized-views/view-build-fixes/v1: Duarte Nunes (3): db/system_keyspace: Add function to remove view build status of a shard db/view: Don't have shard 0 clear other shard's status on drop db/view: Restrict writes to the distributed system keyspace to shard 0	2018-07-17 18:01:28 +02:00
Duarte Nunes	55caaec411	db/view/build_progress_virtual_reader: Also adjust end RT bound Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-07-11 23:28:31 +01:00
Duarte Nunes	eda6b88b0e	db/view/build_progress_virtual_reader: Fix full ck detection As an optimization, the virtual reader doesn't change the underlying key if it is not full, and hence doesn't include the extra clustering key. However, this detection is broken because it checked for 3 clustering columns, instead of 2. This patch fixes that by obtaining the clustering key size from the underlying schema instead of hardcoding the size. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-07-11 23:28:31 +01:00
Duarte Nunes	ff3a0d437a	db/view/build_progress_virtual_reader: Use correct schema to adjust ck The virtual reader adjusts clustering keys obtained from the underlying, scylla-specific schema, and potentially sheds the extra clustering key that's absent from the Cassandra-compatible schema. This patch ensures we use the correct schema to iterate over the key. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-07-11 23:28:31 +01:00
Duarte Nunes	df66d7db59	db/view: Restrict writes to the distributed system keyspace to shard 0 Writing to the distributed system keyspace should be confined to a single shard of each host, namely shard 0. We were violating this constraint by having all shards set the host status to "started". This could be problematic when the build finishes quickly or there's a concurrent view drop, such that a write done by shard 0 can have a smaller timestamp than one done by some other shard. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-07-11 21:45:26 +01:00
Duarte Nunes	e683c1367f	db/view: Don't have shard 0 clear other shard's status on drop Shard 0 can clear the in-progress build status of all shards when a view finishes building, because it is guaranteed that all writes to the system table have completed with earlier timestamps. This is not the case when dropping a view. A drop can happen concurrently with the build, in which case shard 0 may process the notification before another shard receives it, and before that shard writes to the system table. Fix this by ensuring each shard clears its own status on drop. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-07-11 21:45:26 +01:00
Piotr Sarna	d5e7b5507b	view: add handling of a token column for secondary indexes In order to ensure token order on secondary index queries, first clustering column for each view that backs a secondary index is going to store a token computed from base's partition keys. After this commit, if there exists a column that is not present in base schema, it will be filled with computed token.	2018-06-05 18:59:25 +02:00
Piotr Sarna	06eee0f525	view: add is_index method is_index method returns true if view that owns it is backing a secondary index.	2018-06-05 11:10:24 +02:00
Paweł Dziepak	aa25f0844f	atomic_cell: introduce fragmented buffer value interface In preparation for the switch to the new cell representation, this patch changes the type returned by atomic_cell_view::value() to one that requires explicit linearisation of the cell value. Even though the value is still implicitly linearised (and only when managed by the LSA), the new interface is the same as the target one, so that no more changes to its users will be needed.	2018-05-31 15:51:11 +01:00
Paweł Dziepak	27014a23d7	treewide: require type info for copying atomic_cell_or_collection	2018-05-31 15:51:11 +01:00
Paweł Dziepak	93130e80fb	atomic_cell: require column_definition for creating atomic_cell views	2018-05-31 15:51:11 +01:00
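Commits fec562ec8f and 05db7d8957 describe the bulk view-building batching rule: process the rows read so far once there are 128 of them (`batch_size`) or once their total size exceeds 1MB (`batch_memory_max`). The following is a minimal C++ sketch of that rule, not Scylla's actual code. Only the two constant names and their values come from the log; the `row` type, the `row_batcher` class, and the `batch_row_counts` helper are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Constant names follow the commit messages; values are the ones they quote.
constexpr std::size_t batch_size = 128;               // max rows per build step
constexpr std::size_t batch_memory_max = 1024 * 1024; // 1MB size threshold

// Hypothetical row type: only its payload size matters for batching.
struct row {
    std::string payload;
    std::size_t memory_usage() const { return payload.size(); }
};

// Hypothetical batcher: buffers rows until either limit is reached.
class row_batcher {
    std::vector<row> _rows;
    std::size_t _bytes = 0;
public:
    // Adds a row; returns true when the batch should be processed now.
    bool push(row r) {
        _bytes += r.memory_usage();
        _rows.push_back(std::move(r));
        return _rows.size() >= batch_size || _bytes >= batch_memory_max;
    }
    bool empty() const { return _rows.empty(); }
    // Hands the buffered rows to the caller and resets the byte counter.
    std::vector<row> flush() {
        _bytes = 0;
        return std::exchange(_rows, {});
    }
};

// Hypothetical helper: number of rows in each batch produced for the given row sizes.
std::vector<std::size_t> batch_row_counts(const std::vector<std::size_t>& row_bytes) {
    row_batcher batcher;
    std::vector<std::size_t> counts;
    for (std::size_t n : row_bytes) {
        if (batcher.push(row{std::string(n, 'x')})) {
            counts.push_back(batcher.flush().size());
        }
    }
    if (!batcher.empty()) {
        counts.push_back(batcher.flush().size()); // trailing partial batch
    }
    return counts;
}
```

With 300 small rows the row limit dominates (batches of 128, 128 and 44). With three 600KB rows the size limit triggers after the second row, so no single batch can grow toward the per-mutation limit mentioned in fec562ec8f.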


118 Commits