scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-12 19:02:12 +00:00

Author	SHA1	Message	Date
Piotr Jastrzebski	76d154dbac	view: stop calling global_partitioner() Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Sarna	e93c54e837	db,view: fix generating view updates for partition tombstones The update generation path must track and apply all tombstones, both from the existing base row (if read-before-write was needed) and for the new row. One such path contained an error, because it assumed that if the existing row is empty, then the update can be simply generated from the new row. However, lack of the existing row can also be the result of a partition/range tombstone. If that's the case, it needs to be applied, because it's entirely possible that this partition row also hides the new row. Without taking the partition tombstone into account, creating a future tombstone and inserting an out-of-order write before it in the base table can result in ghost rows in the view table. This patch comes with a test which was proven to fail before the changes. Branches 3.1,3.2,3.3 Fixes #5793 Tests: unit(dev) Message-Id: <8d3b2abad31572668693ab585f37f4af5bb7577a.1581525398.git.sarna@scylladb.com>	2020-02-12 23:16:30 +02:00
Avi Kivity	dcab666d52	cql3: query_processor: reduce #includes query_processor is a central class, so reducing its includes can reduce dependencies treewide. This patch removes includes for parsed_statement, cf_statement, and untyped_result_set and fixes up the rest of the tree to include what it lacks as a result of these removals.	2020-02-09 12:24:24 +02:00
Pavel Emelyanov	e2ec5eecf6	view_update: Do not need storage_proxy The view_update_generator accepts (and keeps) database and storage_proxy, the latter is only needed to initialize the view_updating_consumer which, in turn, only needs it to get database from (to find column family). This can be relaxed by providing the database from _generator to _consumer directly, without using the storage_proxy in between. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200207112427.18419-1-xemul@scylladb.com>	2020-02-07 13:30:01 +02:00
Eliran Sinvani	8cfc2aad57	internalize storage proxy statistics metric registration The storage proxy statistics structure did not contain a method for registering the statistics for metric groups, instead, each user had to register some of the metrics by itself. There is no real reason for separating the metrics registration from the statistics data. There is even less justification for doing this only for part of the stats as is the case for those statistics. This commit internalizes the metrics registration in the storage_proxy stats structures. Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>	2020-01-30 15:01:40 +01:00
Avi Kivity	17eaf552f0	Merge "Improve the accuracy of reader memory tracking" from Botond " Grab the lowest hanging fruits. This patch-set makes three important changes: * Consume the memory for I/O operations on tracked files, before they are forwarded to the underlying file. * Track memory consumed by buffers created for parsing in `continuous_data_consumer`. As this is the basis for the data, index and promoted index parsers, all three are covered now in this regard. * Track the index file. The remaining, not-so-low hanging fruits in order of gain/cost(performance) ratio: * Track in-memory index lists. * Track in-memory promoted index blocks. * Track reader buffer memory. Note that this ordering might change based on the workload and other environmental factors. Also included in this series is an infrastructure refactoring to make tracking memory easier and involve including lighter headers, as well as a manual test designed to allow testing and experimenting with the effects of changes to the accuracy of the tracking of reader memory consumption. Refs: #4176 Refs: #2778 Tests: unit(dev), manual(sstable_scan_footprint_test) The latter was run as: build/dev/test/manual/sstable_scan_footprint_test -c1 -m2G --reads=4000 --read-concurrency=1 --logger-log-level test=trace --collect-stats --stats-period-ms=20 This will trickle reads until the semaphore blocks, then wait until the wait queue drains before sending new reads. This way we are not testing the effectiveness of the pre-admission estimation (which is terribly optimistic) and instead check that with slowly ramping up read load the semaphore will block on memory preventing OOM. This now runs to completion without a single `std::bad_alloc`. The read concurrency semaphore allows between 15-30 reads, and is always blocked on memory. 
" * 'more-accurate-reader-resource-tracking/v1' of ssh://github.com/denesb/scylla: test/manual/sstable_scan_footprint_test: improve memory consumption diagnostics tests/manual/sstable_scan_footprint_test: use the semaphore to determine read rate tests/manual: Add test measuring memory demand of concurrent sstable reads index_reader: make the index file tracked sstables/continuous_data_consumer: track buffers used for parsing reader_concurrency_semaphore: tracking_file_impl: consume memory speculatively reader_concurrency_semaphore: bye reader_resource_tracker treewide: replace reader_resource_tracer with reader_permit reader_permit: expose make_tracked_temporary_buffer() reader_permit: introduce make_tracked_file() reader_permit: introduce memory_units reader_concurrency_semaphore: mv reader_resources and reader_permit to reader_permit.hh reader_concurrency_semaphore: reader_permit: make it a value type reader_concurrency_semaphore: s/resources/reader_resources/ reader_concurrency_semaphore::reader_permit: move methods out-of-line	2020-01-29 00:11:17 +02:00
Botond Dénes	dfc8b2fc45	treewide: replace reader_resource_tracer with reader_permit The former was never really more than a reader_permit with one additional method. Currently using it doesn't even save one from any includes. Now that readers will be using reader_permit we would have to pass down both to mutation_source. Instead get rid of reader_resource_tracker and just use reader_permit. Instead of making it a last and optional parameter that is easy to ignore, make it a first class parameter, right after schema, to signify that permits are now a prominent part of the reader API. This -- mostly mechanical -- patch essentially refactors mutation_source to ask for the reader_permit instead of reader_resource_tracking and updates all usage sites.	2020-01-28 08:13:16 +02:00
Dejan Mircevski	90b54c8c42	view_info: Drop partition_ranges() The method view_info::partition_ranges() is unused. Also drop the now-dead _partition_ranges data member. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-01-26 12:02:32 +02:00
Piotr Sarna	9b379e3d63	db,view: fix checking for secondary index special columns A mistake in handling legacy checks for special 'idx_token' column resulted in not recognizing materialized views backing secondary indexes properly. The mistake is really a typo, but with bad consequences - instead of checking the view schema for being an index, we asked for the base schema, which is definitely not an index of itself. Branches 3.1,3.2 (asap) Fixes #5621 Fixes #4744	2020-01-21 22:32:04 +02:00
Rafael Ávila de Espíndola	27bd3fe203	service: Add a lock around migration_notifier::_listeners Before this patch the iterations over migration_notifier::_listeners could race with listeners being added and removed. The addition side is not modified, since it is common to add a listener during construction and it would require a fairly big refactoring. Instead, the iteration is modified to use indexes instead of iterators so that it is still valid if another listener is added concurrently. For removal we use a rw lock, since removing an element invalidates indexes too. There are only a few places that needed refactoring to handle unregister_listener returning a future<>, so this is probably OK. Fixes #5541. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200120192819.136305-1-espindola@scylladb.com>	2020-01-20 22:14:02 +02:00
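
The index-based iteration described in the commit above can be illustrated with a minimal, standard-library-only sketch. `listener_list` and its members are hypothetical names chosen for illustration, not Scylla's `migration_notifier`; removal (which the commit guards with a rw lock) is deliberately left out.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a listener registry (assumption, not Scylla code).
class listener_list {
    std::vector<std::function<void()>> _listeners;
public:
    void add(std::function<void()> l) { _listeners.push_back(std::move(l)); }
    std::size_t size() const { return _listeners.size(); }

    // Iterate by index and re-read size() on every step. push_back() may
    // reallocate the vector and invalidate iterators, but an index stays
    // meaningful as long as nothing is removed.
    void notify_all() {
        for (std::size_t i = 0; i < _listeners.size(); ++i) {
            // Copy the callable so it survives a reallocation triggered
            // by a listener that registers another listener.
            auto listener = _listeners[i];
            listener();
        }
    }
};
```

In this sketch a listener added while `notify_all()` is running is also invoked, because the loop bound is re-evaluated on each step. Whether that matches the real notifier's behavior is not stated in the commit message.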
Tomasz Grabiec	36d90e637e	Merge "Relax migration manager dependencies" from Pavel Emelyanov The set makes dependencies between mm and other services cleaner, in particular, after the set: - the query processor no longer needs migration manager (which doesn't need query processor either) - the database no longer needs migration manager, thus the mutual dependency between these two is dropped, only migration manager -> database is left - the migration manager -> storage_service dependency is relaxed, one more patchset will be needed to remove it, thus dropping one more mutual dependency between them, only the storage_service -> migration manager will be left - the migration manager is stopped on drain, but several more services need it on stop, thus causing use after free problems, in particular there's a caught bug when view builder crashes when unregistering from notifier list on stop. Fixed. Tests: unit(dev) Fixes: #5404	2020-01-16 12:12:25 +01:00
Pavel Emelyanov	28f1250b8b	view_builder: Use migration notifier The migration manager itself is still needed on start to wait for schema agreement, but there's no longer the need for the life-time reference on it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:28:21 +03:00
Gleb Natapov	51672e5990	paxos: immediately sync commitlog entries for writes made by paxos learn stage	2020-01-15 12:15:42 +02:00
Piotr Sarna	155a47cc55	view: handle multiple regular base columns in view pk Previous assumption was that there can only be one regular base column in the view key. The assumption is still correct for tables created via CQL, but it's internally possible to create a view with multiple such columns - the new assumption is that if there are multiple columns, they share their liveness. This patch is vital for indexing to work properly on alternator, so it would be best to solve the issue upstream. I strived to leave the existing semantics intact as long as only up to one regular column is part of the materialized view primary key, which is the case for Scylla's materialized views. For alternator it may not be true, but all regular columns in alternator share liveness info (since alternator does not support per-column TTL), which is sufficient to compute view updates in a consistent way. Fixes #5006 Tests: unit(dev), alternator(test_gsi_update_second_regular_base_column, tic-tac-toe demo) Message-Id: <c9dec243ce903d3a922ce077dc274f988bcf5d57.1567604945.git.sarna@scylladb.com>	2020-01-07 12:18:39 +01:00
Piotr Sarna	54315f89cd	db,view: fix checking if partition key is empty Previous implementation did not take into account that a column in a partition key might exist in a mutation, but in a DEAD state - if it's deleted. There are no regressions for CQL, while for alternator and its capability of having two regular base columns in a view key, this additional check must be performed.	2020-01-07 12:05:36 +01:00
Benny Halevy	4b3243f5b9	table: move_sstables_from_staging_in_thread with _sstable_deletion_sem Hold the _sstable_deletion_sem while moving sstables from the staging directory so not to move them under the feet of table::snapshot. Fixes #5340 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-12-17 12:20:20 +02:00
Benny Halevy	0446ce712a	view_update_generator::start: use variable binding Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-12-17 12:20:20 +02:00
Benny Halevy	5d7c80c148	view_update_generator::start: fix indentation Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-12-17 12:20:20 +02:00
Benny Halevy	02784f46b9	view_update_generator: handle errors when processing sstable Consumer may throw, in this case, break from the loop and retry. move_sstable_from_staging_in_thread may theoretically throw too, ignore the error in this case since the sstable was already processed, individual move failures are already ignored and moving from staging will be retried upon restart. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-12-17 12:20:20 +02:00
Benny Halevy	0d2a7111b2	view_update_generator: sstable_with_table: std::move constructor args Just a small optimization. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-12-17 12:19:55 +02:00
Piotr Sarna	9c5a5a5ac2	treewide: add names to semaphores By default, semaphore exceptions bring along very little context: either that a semaphore was broken or that it timed out. In order to make debugging easier without introducing significant runtime costs, a notion of named semaphore is added. A named semaphore is simply a semaphore with statically defined name, which is present in its errors, bringing valuable context. A semaphore defined as: auto sem = semaphore(0); will present the following message when it breaks: "Semaphore broken" However, a named semaphore: auto named_sem = named_semaphore(0, named_semaphore_exception_factory{"io_concurrency_sem"}); will present a message with at least some debugging context: "Semaphore broken: io_concurrency_sem" It's not much, but it would really help in pinpointing bugs without having to inspect core dumps. At the same time, it does not incur any costs for normal semaphore operations (except for its creation), but instead only uses more CPU in case an error is actually thrown, which is considered rare and not to be on the hot path. Refs #4999 Tests: unit(dev), manual: hardcoding a failure in view building code	2019-11-26 15:14:21 +02:00
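
The idea of a named semaphore from the commit above can be sketched without Seastar as a semaphore parameterized by an exception factory. Everything here (`toy_semaphore`, `named_exception_factory`, `broken_semaphore_error`) is a hypothetical simplification for illustration; it is not the Seastar or Scylla API, and it does not block or return futures.

```cpp
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical error type for this sketch.
struct broken_semaphore_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Default factory: no context in the message.
struct default_exception_factory {
    broken_semaphore_error broken() const {
        return broken_semaphore_error("Semaphore broken");
    }
};

// Named factory: the statically chosen name is appended to the message.
struct named_exception_factory {
    std::string name;
    broken_semaphore_error broken() const {
        return broken_semaphore_error("Semaphore broken: " + name);
    }
};

// Toy synchronous semaphore; the factory is consulted only on the error path,
// so normal operations pay nothing for the name.
template <typename ExceptionFactory = default_exception_factory>
class toy_semaphore {
    long _count;
    bool _broken = false;
    ExceptionFactory _factory;
public:
    explicit toy_semaphore(long count, ExceptionFactory factory = ExceptionFactory{})
        : _count(count), _factory(std::move(factory)) {}

    void break_sem() { _broken = true; }

    // Returns false if not enough units are available; throws if broken.
    bool try_acquire(long n = 1) {
        if (_broken) {
            throw _factory.broken();
        }
        if (_count < n) {
            return false;
        }
        _count -= n;
        return true;
    }
};
```

The design point carried over from the commit message is that the name is only used when an exception is constructed, which is expected to be rare and off the hot path.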
Kamil Braun	2ada219f2c	view: generalize create_virtual_column and maybe_make_virtual to UDTs.	2019-10-25 12:04:44 +02:00
Kamil Braun	bbdb438d89	collection_mutation: easier (de)serialization of collection_mutation(s). `collection_type_impl::serialize_mutation_form` became `collection_mutation(_view)_description::serialize`. Previously callers had to cast their data_type down to collection_type to use serialize_mutation_form. Now it's done inside `serialize`. In the future `serialize` will be generalized to handle UDTs. `collection_type_impl::deserialize_mutation_form` became a free standing function `deserialize_collection_mutation` with similar benefits. Actually, no one needs to call this function manually because of the next paragraph. A common pattern consisting of linearizing data inside a `collection_mutation_view` followed by calling `deserialize_mutation_form` has been abstracted out as a `with_deserialized` method inside collection_mutation_view. serialize_mutation_form_only_live was removed, because it hadn't been used anywhere.	2019-10-25 10:42:58 +02:00
Kamil Braun	b1d16c1601	types: move collection_type_impl::mutation(_view) out of collection_type_impl. collection_type_impl::mutation became collection_mutation_description. collection_type_impl::mutation_view became collection_mutation_view_description. These classes now reside inside collection_mutation.hh. Additional documentation has been written for these classes. Related function implementations were moved to collection_mutation.cc. This makes it easier to generalize these classes to non-frozen UDTs in future commits. The new names (together with documentation) better describe their purpose.	2019-10-25 10:19:45 +02:00
Piotr Sarna	9e98b51aaa	view: fix view_info select statement for local indexes Calculating the select statement for given view_info structure used to work fine, but once local indexes were introduced, a subtle bug appeared: the legacy token column does not exist in local indexes and a valid clustering key column was omitted instead. That results in potentially incorrect partition slices being used later in read-before-write. There's a long term plan for removing select_statement from view info altogether, but nonetheless the bug needs to be fixed first.	2019-10-14 17:14:19 +02:00
Kamil Braun	ef9d5750c8	view: fix bug in virtual columns. When creating a virtual column of non-frozen map type, the wrong type was used for the map's keys. Fixes #5165.	2019-10-11 20:47:06 +03:00
Piotr Sarna	feec3825aa	view: degrade shutdown bookkeeping update failures log to warn Currently, if updating bookkeeping operations for view building fails, we log the error message and continue. However, during shutdown, some errors are more likely to happen due to existing issues like #4384. To differentiate actual errors from semi-expected errors during shutdown, the latter are now logged with a warning level instead of error. Fixes #4954	2019-09-16 10:13:06 +03:00
Piotr Sarna	23c891923e	main: make sure view_builder doesn't propagate semaphore errors Stopping services which occurs in a destructor of deferred_action should not throw, or it will end the program with terminate(). View builder breaks a semaphore during its shutdown, which results in propagating a broken_semaphore exception, which in turn results in throwing an exception during stop().get(). In order to fix that issue, semaphore exceptions are explicitly ignored, since they're expected to appear during shutdown. Fixes #4875	2019-09-01 11:59:57 +03:00
Botond Dénes	136fc856c5	treewide: silence discarded future warnings for questionable discards This patch silences the remaining discarded future warnings, those where it cannot be determined with reasonable confidence that this was indeed the actual intent of the author, or that the discarding of the future could lead to problems. For all those places a FIXME is added, with the intent that these will be soon followed-up with an actual fix. I deliberately haven't fixed any of these, even if the fix seems trivial. It is too easy to overlook a bad fix mixed in with so many mechanical changes.	2019-08-26 19:28:43 +03:00
Botond Dénes	fddd9a88dd	treewide: silence discarded future warnings for legit discards This patch silences those future discard warnings where it is clear that discarding the future was actually the intent of the original author, and they did the necessary precautions (handling errors). The patch also adds some trivial error handling (logging the error) in some places, which were lacking this, but otherwise look ok. No functional changes.	2019-08-26 18:54:44 +03:00
Piotr Sarna	3cc5a04301	db,view: wrap view update generation in stream scheduling group Generating view updates is used by streaming, so the service itself should also run under the matching scheduling group.	2019-08-20 00:24:50 +02:00
Piotr Sarna	3c5dd94306	view: remove unused token_for function The function was only used once in code removed in this series.	2019-07-19 11:58:42 +02:00
Piotr Sarna	6a6871aa0e	view: check for computed columns in view Currently, having a 'computed' column in view update generation indicates that token value needs to be generated and assigned to it.	2019-07-19 11:58:42 +02:00
Piotr Sarna	85a3a4b458	view: ignore duplicated key entries in progress virtual reader Build progress virtual reader uses Scylla-specific scylla_views_builds_in_progress table in order to represent legacy views_builds_in_progress rows. The Scylla-specific table contains additional cpu_id clustering key part, which is trimmed before returning it to the user. That may cause duplicated clustering row fragments to be emitted by the reader, which may cause undefined behaviour in consumers. The solution is to keep track of previous clustering keys for each partition and drop fragments that would cause duplication. That way if any shard is still building a view, its progress will be returned, and if many shards are still building, the returned value will indicate the progress of a single arbitrary shard. Fixes #4524 Tests: unit(dev) + custom monotonicity checks from <tgrabiec@scylladb.com>	2019-06-11 13:01:31 +02:00
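
The deduplication described in the commit above can be sketched as follows, assuming the rows arrive sorted by partition, then clustering key, then `cpu_id`, so that duplicates created by trimming `cpu_id` are always adjacent. The struct and field names are hypothetical and do not reflect the actual table schema or reader interface.

```cpp
#include <string>
#include <vector>

// Hypothetical row of the Scylla-specific table, including the cpu_id part.
struct progress_row {
    std::string partition;
    std::string clustering_key;
    int cpu_id;
    std::string progress;
};

// Hypothetical legacy-shaped row, with cpu_id trimmed.
struct legacy_row {
    std::string partition;
    std::string clustering_key;
    std::string progress;
};

// Trim cpu_id and drop any row whose (partition, clustering_key) equals
// that of the previously emitted row. Input is assumed to be sorted.
std::vector<legacy_row> trim_and_dedup(const std::vector<progress_row>& rows) {
    std::vector<legacy_row> out;
    for (const auto& r : rows) {
        if (!out.empty() &&
            out.back().partition == r.partition &&
            out.back().clustering_key == r.clustering_key) {
            continue; // would duplicate the previous clustering row
        }
        out.push_back(legacy_row{r.partition, r.clustering_key, r.progress});
    }
    return out;
}
```

With this approach the progress that survives for a given key is whichever shard's row comes first in the input, which corresponds to the "single arbitrary shard" wording in the commit message.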
Piotr Sarna	cf8d2a5141	Revert "view: cache is_index for view pointer" This reverts commit `dbe8491655`. Caching the value was not done in a correct manner, which resulted in longevity tests failures. Fixes #4478 Branches: 3.1 Message-Id: <762ca9db618ca2ed7702372fbafe8ecd193dcf4d.1557129652.git.sarna@scylladb.com>	2019-05-06 11:45:46 +03:00
Duarte Nunes	ded9221187	db/view: Apply tracked tombstones for new updates When generating view updates for base mutations when no pre-existing data exists, we were forgetting to apply the tracked tombstones. Fixes #4321 Tests: unit(dev) Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2019-03-27 12:01:39 +00:00
Piotr Sarna	a7602bd2f1	database: add global view update stats Currently view update metrics are only per-table, but per-table metrics are not always enabled. In order to be able to see the number of generated view updates in all cases, global stats are added. Fixes #4221 Message-Id: <e94c27c530b2d7d262f76d03937e7874d674870a.1552552016.git.sarna@scylladb.com>	2019-03-14 12:04:18 +00:00
Piotr Sarna	5f85a7a821	db,view: fix virtual columns liveness checks When looking for optimization paths, columns selected in a view are checked against multiple conditions - unfortunately virtual columns were erroneously skipped from that check, which resulted in ignoring their TTLs. That can lead to overoptimizing and not including vital liveness info into view rows, which can then result in row disappearing too early.	2019-02-28 10:47:19 +01:00
Piotr Sarna	bd52e05ae2	view: minimize generated view updates for unselected columns In some cases generating view updates for columns that were not selected in CREATE VIEW statement is redundant - it is the case when the update will not influence row liveness in any way. Currently, these cases are optimized out: - row marker is live and only unselected columns were updated; - row marker is not live and only unselected columns were updated, and in the process nothing was created or deleted and there was no TTL involved;	2019-02-20 14:05:27 +01:00
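
The two optimized-out cases listed in the commit above can be written as a single predicate. This is only a sketch of the stated conditions; the struct, its fields, and the function name are hypothetical and say nothing about how the real code derives these facts from a mutation.

```cpp
// Hypothetical summary of one base-table update (assumed inputs).
struct unselected_update_info {
    bool only_unselected_columns_updated;
    bool row_marker_live;
    bool anything_created_or_deleted;
    bool ttl_involved;
};

// Returns true when, per the commit message, the view update is redundant.
bool can_skip_view_update(const unselected_update_info& u) {
    if (!u.only_unselected_columns_updated) {
        return false; // selected columns changed, an update is needed
    }
    if (u.row_marker_live) {
        return true; // case 1: live marker, only unselected columns updated
    }
    // case 2: dead marker, nothing created or deleted, no TTL involved
    return !u.anything_created_or_deleted && !u.ttl_involved;
}
```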
Piotr Sarna	dbe8491655	view: cache is_index for view pointer It's detrimental to keep querying index manager whether a view is backing a secondary index every time, so this value is cached at construct time. At the same time, this value is not simply passed to view_info when being created in secondary index manager, in order to decouple materialized view logic from secondary indexes as much as possible (the sole existence of is_index() is bad enough).	2019-02-20 12:52:32 +01:00
Nadav Har'El	05db7d8957	Materialized views: name the "batch_memory_max" constant Give the constant 1024*1024 introduced in an earlier commit a name, "batch_memory_max", and move it from view.cc to view_builder.hh. It now resides next to the pre-existing constant that controlled how many rows were read in each build step, "batch_size". Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190217100222.15673-1-nyh@scylladb.com>	2019-02-17 13:28:16 +00:00
Nadav Har'El	fec562ec8f	Materialized views: limit size of row batching during bulk view building The bulk materialized-view building processes (when adding a materialized view to a table with existing data) currently reads the base table in batches of 128 (view_builder::batch_size) rows. This is clearly better than reading entire partitions (which may be huge), but still, 128 rows may grow pretty large when we have rows with large strings or blobs, and there is no real reason to buffer 128 rows when they are large. Instead, when the rows we read so far exceed some size threshold (in this patch, 1MB), we can operate on them immediately instead of waiting for 128. As a side-effect, this patch also solves another bug: At worst case, all the base rows of one batch may be written into one output view partition, in one mutation. But there is a hard limit on the size of one mutation (commitlog_segment_size_in_mb, by default 32MB), so we cannot allow the batch size to exceed this limit. By not batching further after 1MB, we avoid reaching this limit when individual rows do not reach it but 128 of them did. Fixes #4213. This patch also includes a unit test reproducing #4213, and demonstrating that it is now solved. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190214093424.7172-1-nyh@scylladb.com>	2019-02-14 12:04:40 +02:00
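
The batching rule from the commit above (flush after 128 rows or once roughly 1MB has been buffered, whichever comes first) can be sketched like this. The constant names `batch_size` and `batch_memory_max` come from the commit messages; modelling rows as strings, the function `split_into_batches`, and the use of `>=` for the size comparison are assumptions made for illustration.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

constexpr std::size_t batch_size = 128;               // rows per build step
constexpr std::size_t batch_memory_max = 1024 * 1024; // 1MB of buffered rows

// Splits rows into batches, closing a batch as soon as either the row
// count or the accumulated byte size reaches its limit.
std::vector<std::vector<std::string>> split_into_batches(
        const std::vector<std::string>& rows) {
    std::vector<std::vector<std::string>> batches;
    std::vector<std::string> current;
    std::size_t current_bytes = 0;
    for (const auto& row : rows) {
        current.push_back(row);
        current_bytes += row.size();
        if (current.size() >= batch_size || current_bytes >= batch_memory_max) {
            batches.push_back(std::move(current));
            current.clear(); // moved-from vector: reset to a known empty state
            current_bytes = 0;
        }
    }
    if (!current.empty()) {
        batches.push_back(std::move(current));
    }
    return batches;
}
```

Small rows still batch at 128, while large rows are handed off early, which keeps a single batch well below the mutation size limit mentioned in the commit message as long as no individual row approaches it.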
Piotr Sarna	9a6261ca27	db,view: add updating view_building_paused statistics Each time view building is paused because of connection failure, the view_building_paused metric is bumped.	2019-01-28 09:38:42 +01:00
Piotr Sarna	e30cf22956	db,view: add allow_hints parameter to mutate_MV Mutating MV function can now accept a parameter whether hints should be allowed during sending mutations to endpoints.	2019-01-28 09:38:42 +01:00
Piotr Sarna	e0fe9ce2c0	storage_proxy: add allow_hints parameter to send_to_endpoint With hints allowed, send_to_endpoint will leverage consistency level ANY to send data. Otherwise, it will use the default - cl::ONE.	2019-01-28 09:38:41 +01:00
Piotr Sarna	02d88de082	db,view: add consuming units in staging table registration View update generator service can accept sstables even before it starts, but it should still acknowledge the number of waiters in the semaphore. Reported-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <fcaa0f2884ebb4d34d1716e9e1cfed0642b4b85d.1547661048.git.sarna@scylladb.com>	2019-01-16 18:05:17 +00:00
Duarte Nunes	04a14b27e4	Merge 'Add handling staging sstables to /upload dir' from Piotr " This series adds generating view updates from sstables added through /upload directory if their tables have accompanying materialized views. Said sstables are left in /upload directory until updates are generated from them and are treated just like staging sstables from /staging dir. If there are no views for a given table, sstables are simply moved from /upload dir to datadir without any changes. Tests: unit (release) " * 'add_handling_staging_sstables_to_upload_dir_5' of https://github.com/psarna/scylla: all: rename view_update_from_staging_generator distributed_loader: fix indentation service: add generating view updates from uploaded sstables init: pass view update generator to storage service sstables: treat sstables in upload dir as needing view build sstables,table: rename is_staging to requires_view_building distributed_loader: use proper directory for opening SSTable db,view: make throttling optional for view_update_generator	2019-01-15 18:19:27 +00:00
Piotr Sarna	0eb703dc80	all: rename view_update_from_staging_generator The new name, view_update_generator, is both more concise and correct, since we now generate from directories other than "/staging".	2019-01-15 17:31:47 +01:00
Piotr Sarna	beb4836726	db,view: make throttling optional for view_update_generator Currently registering new view updates is throttled by a semaphore, which makes sense during stream sessions in order to avoid overloading the queue. Still, registration also occurs during initialization, where it makes little sense to wait on a semaphore, since view update generator might not have started at all yet.	2019-01-15 16:47:01 +01:00
Piotr Sarna	b9203ec4f8	view: wait for stream sessions to finish before view building During streaming, there's a race between streamed sstables and view creation, which might result in some tables not being used to generate view updates, even though they should. That happens when the decision about view update path for a table is done before view creation, but after already receiving some sstables via streaming. These will not be used in view building even though they should. Hence, a phaser is used to make the view builder wait for all ongoing stream sessions for a table to finish before proceeding with build steps. Refs #4032	2019-01-15 09:36:55 +01:00


155 Commits