scylladb

Author	SHA1	Message	Date
Calle Wilund	e4d6c8904f	untyped_result_set: Do not copy data from input store (retain fragmented views) Refs #7961 Fixes #8014 Instead of doing a deep copy of input, we assume ownership and build rows of the views therein, potentially retaining fragmented data as-is avoiding premature linearization. Note that this is not all sugar and flowers though. Any data access will by nature be more expensive, and the view collections we create are potentially just as expensive as copying for small cells. Otoh, it allows writing code using this that avoids data copying, depending on destination. v2: * Fixed wrong collection reserved in visitor * Changed row index from shared ptr to ref * Moved typedef * Removed non-existing constructors * Added const ref to index build * Fixed raft usage after rebase v3: * Changed shared_ptr to unique	2021-03-03 10:19:46 +00:00
Kamil Braun	e2f03e4aba	cdc: move (most of) CDC generation management code to the new service Currently all management of CDC generations happens in storage_service, which is a big ball of mud that does many unrelated things. Previous commits have introduced a new service for managing CDC generations. This code moves most of the relevant code to this new service. However, some part still remains in storage_service: the bootstrap procedure, which happens inside storage_service, must also do some initialization regarding CDC generations, for example: on restart it must retrieve the latest known generation timestamp from disk; on bootstrap it must create a new generation and announce it to other nodes. The order of these operations w.r.t the rest of the startup procedure is important, hence the startup procedure is the only right place for them. Still, what remains in storage_service is a small part of the entire CDC generation management logic; most of it has been moved to the new service. This includes listening for generation changes and updating the data structures for performing CDC log writes (cdc::metadata). Furthermore these functions now return futures (and are internally coroutines), where previously they required a seastar::async context.	2021-02-26 12:06:12 +01:00
Kamil Braun	022d7773f4	cdc: coroutinize make_new_cdc_generation	2021-02-22 12:47:44 +01:00
Kamil Braun	26ca9d6c33	cdc: coroutinize update_streams_description	2021-02-22 12:46:53 +01:00
Kamil Braun	d4937daaea	cdc: introduce cdc::generation_service This commit introduces a new service crafted to handle CDC generation management: listening and reacting to generation changes in the cluster. The implementation is a stub for now, the service reacts to generation changes by simply logging the event. The commit plugs the service in, initializing it in main and test code, passing a reference to storage_service and having storage_service start the service (using the `after_join` method): the service only starts doing its job after the node joins the token ring (either on bootstrap or restart).	2021-02-22 12:45:43 +01:00
Avi Kivity	90a7f76fb6	Merge 'cdc: log: fix a use-after-free in process_bytes_visitor' from Michał Chojnowski Due to small value optimization used in `bytes`, views to `bytes` stored in `vector` can be invalidated when the vector resizes, resulting in use-after-free and data corruption. Fix that. Closes #8105 * github.com:scylladb/scylla: cdc: log: avoid an unnecessary copy cdc: log: fix use-after-free in process_bytes_visitor	2021-02-18 20:23:41 +02:00
Michał Chojnowski	96c22cf3f8	cdc: log: avoid an unnecessary copy There is no need to copy `bytes_view` into `bytes` here.	2021-02-18 14:08:18 +01:00
Michał Chojnowski	8cc4f39472	cdc: log: fix use-after-free in process_bytes_visitor Due to small value optimization used in `bytes`, views to `bytes` stored in `vector` can be invalidated when the vector resizes, resulting in use-after-free and data corruption. Fix that. Fixes #8117	2021-02-18 14:08:17 +01:00
Kamil Braun	841f07e9b7	cdc: add config option to disable streams rewriting Rewriting stream descriptions is a long, expensive, and prone-to-failure operation. Due to #8061 it may consume a lot of memory. In general, it may keep failing (and being retried) endlessly, straining the cluster. As a backdoor we add this flag for potential future needs of admins or field engineers. I don't expect it will ever be used, but it won't hurt and may save us some work in the worst case scenario.	2021-02-18 11:44:59 +01:00
Kamil Braun	9bdd000e97	cdc: rewrite streams to the new description table Nodes automatically ensure that the latest CDC generation's list of streams is present in the streams description table. When a new generation appears, we only need to update the table for this generation; old generations are already inserted. However, we've changed the description table (from `cdc_streams_descriptions` to `cdc_streams_descriptions_v2`). The existing mechanism only ensures that the latest generation appears in the new description table. This commit adds an additional procedure that rewrites the older generations as well, if we find that it is necessary to do so (i.e. when some CDC log tables may contain data in these generations).	2021-02-18 11:44:59 +01:00
Kamil Braun	7c91894ddf	cdc: introduce no_generation_data_exception exception type	2021-02-18 11:44:59 +01:00
Kamil Braun	44aab61aea	cdc: coroutinize do_update_streams_description	2021-02-18 11:44:59 +01:00
Kamil Braun	67d4e5576d	sys_dist_ks: split CDC streams table partitions into clustered rows Until now, the lists of streams in the `cdc_streams_descriptions` table for a given generation were stored in a single collection. This solution has multiple problems when dealing with large clusters (which produce large lists of streams): 1. large allocations 2. reactor stalls 3. mutations too large to even fit in commitlog segments This commit changes the schema of the table as described in issue #7993. The streams are grouped according to token ranges, each token range being represented by a separate clustering row. Rows are inserted in reasonably large batches for efficiency. The table is renamed to enable easy upgrade. On upgrade, the latest CDC generation's list of streams will be (re-)inserted into the new table. Yet another table is added: one that contains only the generation timestamps clustered in a single partition. This makes it easy for CDC clients to learn about new generations. It also enables an elegant two-phase insertion procedure of the generation description: first we insert the streams; only after ensuring that a quorum of replicas contains them, we insert the timestamp. Thus, if any client observes a timestamp in the timestamps table (even using a ONE query), it means that a quorum of replicas must contain the list of streams.	2021-02-18 11:44:59 +01:00
Kamil Braun	ba920361b3	cdc: use chunked_vector for streams in streams_version The vector may get quite long (say... 1.6M stream IDs). We prevent a large allocation by using utils::chunked_vector.	2021-02-18 11:44:59 +01:00
Kamil Braun	9ae4467970	cdc: remove `streams_version::expired` field This field was not used anywhere.	2021-02-18 11:44:59 +01:00
Avi Kivity	c63e26e26f	Merge 'cdc: Limit size of topology description' from Piotr Jastrzębski Currently, whole topology description for CDC is stored in a single row. This means that for a large cluster of strong machines (say 100 nodes 64 cpus each), the size of the topology description can reach 32MB. This causes multiple problems. First of all, there's a hard limit on mutation size that can be written to Scylla. It's related to commit log block size which is 16MB by default. Mutations bigger than that can't be saved. Moreover, such big partitions/rows cause reactor stalls and negatively influence latency of other requests. This patch limits the size of topology description to about 4MB. This is done by reducing the number of CDC streams per vnode and can lead to CDC data not being fully colocated with Base Table data on shards. It can impact performance and consistency of data. This is just a quick fix to make it easily backportable. A full solution to the problem is under development. For more details see #7961, #7993 and #7985. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Closes #8048 * github.com:scylladb/scylla: cdc: Limit size of topology description cdc: Extract create_stream_ids from topology_description_generator	2021-02-17 15:43:53 +02:00
Piotr Jastrzebski	649f254863	cdc: Limit size of topology description Currently, whole topology description for CDC is stored in a single row. This means that for a large cluster of strong machines (say 100 nodes 64 cpus each), the size of the topology description can reach 32MB. This causes multiple problems. First of all, there's a hard limit on mutation size that can be written to Scylla. It's related to commit log block size which is 16MB by default. Mutations bigger than that can't be saved. Moreover, such big partitions/rows cause reactor stalls and negatively influence latency of other requests. This patch limits the size of topology description to about 4MB. This is done by reducing the number of CDC streams per vnode and can lead to CDC data not being fully colocated with Base Table data on shards. It can impact performance and consistency of data. This is just a quick fix to make it easily backportable. A full solution to the problem is under development. For more details see #7961, #7993 and #7985. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2021-02-17 13:24:40 +01:00
Botond Dénes	ba7a9d2ac3	imr: switch back to open-coded description of structures Commit `aab6b0ee27` introduced the controversial new IMR format, which relied on a very template-heavy infrastructure to generate serialization and deserialization code via template meta-programming. The promise was that this new format, beyond solving the problems the previous open-coded representation had (working on linearized buffers), will speed up migrating other components to this IMR format, as the IMR infrastructure reduces code bloat, makes the code more readable via declarative type descriptions as well as safer. However, the results were almost the opposite. The template meta-programming used by the IMR infrastructure proved very hard to understand. Developers don't want to read or modify it. Maintainers don't want to see it being used anywhere else. In short, nobody wants to touch it. This commit does a conceptual revert of `aab6b0ee27`. A verbatim revert is not possible because related code evolved a lot since the merge. Also, going back to the previous code would mean we regress as we'd revert the move to fragmented buffers. So this revert is only conceptual, it changes the underlying infrastructure back to the previous open-coded one, but keeps the fragmented buffers, as well as the interface of the related components (to the extent possible). Fixes: #5578	2021-02-16 23:43:07 +01:00
Piotr Jastrzebski	390cef6a96	cdc: Extract create_stream_ids from topology_description_generator This new function will be used in the following patches in additional places. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2021-02-10 10:24:06 +01:00
Konstantin Osipov	b4f875f08e	uuid: reduce code dependency on UUID_gen.hh Do not include UUID_gen.hh in trace_state.hh and lists.hh to reduce header level dependency on it. Message-Id: <20210127173114.725761-2-kostja@scylladb.com>	2021-01-27 20:08:29 +02:00
Benny Halevy	c60da2e90d	cdc: remove _token_metadata from db_context 1. It's unused since `cbe510d1b8` 2. It's unsafe to keep a reference to token_metadata& potentially across yield points. The higher-level motivation is to make storage_service::get_token_metadata() private so we can control better how it's used. For cdc, if the token_metadata is going to be needed in the future, it'd be better to get it from db_context::_proxy.get_token_metadata_ptr(). Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20201213162351.52224-2-bhalevy@scylladb.com>	2020-12-13 18:32:17 +02:00
Pavel Emelyanov	89fd524c5a	schema-tables: Add database argument to make_update_table_mutations There are 3 callers of this helper (cdc, migration manager and tests) and all of them already have the database object at hands. The argument will be used by next patch to remove call for global storage proxy instance from make_update_indices_mutations. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-12-11 21:21:22 +03:00
Kamil Braun	2da723b9c8	cdc: produce postimage when inserting with no regular columns When a row was inserted into a table with no regular columns, and no such row existed in the first place, postimage would not be produced. Fix this. Fixes #7716. Closes #7723	2020-12-01 18:01:23 +02:00
Piotr Sarna	5a9dc6a3cc	Merge 'Cleanup CDC tests after CDC became GA' from Piotr Jastrzębski Now that CDC is GA, it should be enabled in all the tests by default. To achieve that the PR adds a special db::config::add_cdc_extension() helper which is used in cql_test_env to make sure CDC is usable in all the tests that use cql_test_env. As a result, cdc_tests can be simplified. Finally, some trailing whitespaces are removed from cdc_tests. Tests: unit(dev) Closes #7657 * github.com:scylladb/scylla: cdc: Remove trailing whitespaces from cdc_tests cdc: Remove mk_cdc_test_config from tests config: Add add_cdc_extension function for testing cdc: Add missing includes to cdc_extension.hh	2020-11-20 13:56:29 +01:00
Piotr Jastrzebski	89f4298670	cdc: Add missing includes to cdc_extension.hh Without those additional includes, a .cc file that includes cdc_extension.hh won't compile. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-11-19 16:11:33 +01:00
Piotr Jastrzebski	3024795507	cdc: Change for_testing to add_delay in make_new_cdc_generation The meaning of the parameter changes from defining whether the function is called in testing environment to deciding whether a delay should be added to a timestamp of a newly created CDC generation. This is a preparation for improvement in the following patch that does not always add delay to every node but only to non-first node. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-11-19 12:19:42 +01:00
Piotr Jastrzebski	6b1167ea0d	cdc: Remove std::iterator from collection_iterator std::iterator is deprecated since C++17 so define all the required iterator_traits directly. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-11-17 16:53:20 +01:00
Piotr Jastrzebski	2091408478	cdc: Make it possible for CDC generation creation to fail Following patch enables CDC by default and this means CDC has to work with all the clusters now. There is a problematic case when existing cluster with no CDC support is stopped, all the binaries are updated to newer version with CDC enabled by default. In such case, nodes know that they are already members of the cluster but they can't find any CDC generation so they will try to create one. This creation may fail due to lack of QUORUM for the write. Before this patch such situation would lead to node failing to start. After the change, the node will start but CDC generation will be missing. This will mean CDC won't be able to work on such cluster before nodetool checkAndRepairCdcStreams is run to fix the CDC generation. We still fail to bootstrap if the creation of CDC generation fails. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-11-12 12:29:31 +01:00
Benny Halevy	7697c0f129	cdc: generation: use token_metadata_ptr So it could be safely held across continuations. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Calle Wilund	46ea8c9b8b	cdc: Add an "end-of-record" column to Fixes #7435 Adds an "eor" (end-of-record) column to cdc log. This is non-null only on last-in-timestamp group rows, i.e. end of a singular source "event". A client can use this as a shortcut to knowing whether or not he has a full cdc "record" for a given source mutation (single row change). Closes #7436	2020-10-26 09:39:27 +02:00
Avi Kivity	d3c0b4c555	cdc: log: don't capture structured bindings in lambdas Clang does not yet implement p1091r3, which allows lambdas to capture structured bindings. To accommodate it, don't use structured bindings for variables that are later captured.	2020-10-16 15:23:16 +03:00
Calle Wilund	04deacd7e7	alternator::streams: Improve paging and fix parent-child calculation Fixes #7345 Fixes #7346 Do a more efficient collection skip when doing paging, instead of iterating the full sets. Ensure some semblance of sanity in the parent-child relationship between shards by ensuring token order sorting and finding the apparent previous ID covering the approximate range of new gen. Fix endsequencenumber generation by looking at whether we are last gen or not, instead of the (not filled in) 'expired' column.	2020-10-07 08:43:39 +00:00
Avi Kivity	186c6cef57	cdc: sprinkle parentheses in EntryContainer concept Due to a bug, clang does not decay a type to a reference, failing the concept evaluation on correct input. Add parentheses to force it to decay the type.	2020-09-21 16:32:53 +03:00
Calle Wilund	d29d676955	cdc: Add setter for delta mode	2020-09-07 14:14:04 +00:00
Kamil Braun	ff78a3c332	cdc: rename CDC description tables... again Commit `a6ad70d3da` changed the format of stream IDs: the lower 8 bytes were previously generated randomly, now some of them have semantics. In particular, the least significant byte contains a version (stream IDs might evolve with further releases). This is a backward-incompatible change: the code won't properly handle stream IDs with all lower 8 bytes generated randomly. To protect us from subtle bugs, the code has an assertion that checks the stream ID's version. This means that if an experimental user used CDC before the change and then upgraded, they might hit the assertion when a node attempts to retrieve a CDC generation with old stream IDs from the CDC description tables and then decode it. In effect, the user won't even be able to start a node. Similarly as with the case described in `d89b7a0548`, the simplest fix is to rename the tables. This fix must get merged in before CDC goes out of experimental. Now, if the user upgrades their cluster from a pre-rename version, the node will simply complain that it can't obtain the CDC generation instead of preventing the cluster from working. The user will be able to use CDC after running checkAndRepairCDCStreams. Since a new table is added to the system_distributed keyspace, the cluster's schema has changed, so sstables and digests need to be regenerated for schema_digest_test.	2020-08-31 11:33:14 +03:00
Calle Wilund	70a282ced2	cdc: Remove post-filterings for keys-only/off cdc delta generation Refs #7095 CDC delta!=full both relied on post-filtering to remove generated log row and/or cells. This is inefficient. Instead, simply check if the data should be created in the visitors. v2: * Fixed delta logs rows created (empty) even when delta == off v3: * Killed delta == off v4: * Move checks into (const) member var(s)	2020-08-31 07:59:43 +00:00
Calle Wilund	78236c015a	cdc: Remove cdc delta_mode::off Fixes #7128 CDC logs are not useful without at least delta_mode==keys, since pre/post image data has no info on _what_ was actually done to base table in source mutation.	2020-08-31 07:59:40 +00:00
Calle Wilund	e50911e5b0	cdc: Do not generate pre/post image for non-existent rows Fixes #7119 Fixes #7120 If preimage select came up empty - i.e. the row did not exist, either due to never having been created, or once deleted, we should not bother creating a log preimage row for it. Esp. since it makes it harder to interpret the cdc log. If an operation in a cdc batch did a row delete (ranged, ck, etc), do not generate postimage data, since the row no longer exists. Note that we differentiate deleting all (non-pk/ck) columns from actual row delete.	2020-08-26 18:14:09 +00:00
Calle Wilund	5ed3d6892d	cdc: Remove stored (postimage) data when doing row delete Fixes #6900 Clustered range deletes did not clear out the "row_states" data associated with affected rows (might be many). Adds a sweep through and erases relevant data. Since we do pre- and postimage in "order", this should only affect postimage.	2020-08-25 12:27:18 +03:00
Piotr Jastrzebski	f01ce1458f	cdc: Preserve metadata columns when getting only keys for delta Fixes #7095 Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-08-25 10:41:54 +03:00
Benny Halevy	2f7c529c1c	storage_service: separate get_mutable_token_metadata Use a different getter for a token_metadata& that may be changed so we can better synchronize readers and writers of token_metadata and eventually allow them to yield in asynchronous loops. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-20 16:20:34 +03:00
Kamil Braun	0d3779e3e6	cdc: rewrite process_changes using inspect_mutation	2020-08-17 15:51:33 +02:00
Kamil Braun	9067f1a4e2	cdc: move some functions out of `cdc::transformer` Preparing them to be used outside of `transformer`.	2020-08-17 15:51:33 +02:00
Kamil Braun	4533f62f54	cdc: rewrite extract_changes using inspect_mutation	2020-08-17 15:51:33 +02:00
Kamil Braun	e9192a6108	cdc: rewrite should_split using inspect_mutation	2020-08-17 15:51:33 +02:00
Kamil Braun	ee87f4026e	cdc: rewrite find_timestamp using inspect_mutation	2020-08-17 15:51:33 +02:00
Kamil Braun	694714796f	cdc: introduce a ,,change visitor'' abstraction This is an abstraction for walking over mutations created by a write coordinator, deconstructing them into ,,atomic'' pieces (,,changes''), and consuming these pieces. Read the big comment in cdc/change_visitor.hh for more details.	2020-08-17 15:51:30 +02:00
Nadav Har'El	7e01ae089e	cdc: avoid including cdc/cdc_options.hh everywhere Before this patch, modifying cdc/cdc_options.hh required recompiling 264 source files. This is because this header file was included by a couple other header files - most notably schema.hh, where a forward declaration would have been enough. Only the handful of source files which really need to access the CDC options should include "cdc/cdc_options.hh" directly. After this patch, modifying cdc/cdc_options.hh requires only 6 source files to be recompiled. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200813070631.180192-1-nyh@scylladb.com>	2020-08-16 14:41:47 +03:00
Piotr Jastrzebski	c001374636	codebase wide: replace count with contains C++20 introduced `contains` member functions for maps and sets for checking whether an element is present in the collection. Previously `count` function was often used in various ways. `contains` does not only express the intent of the code better but also does it in a more unified way. This commit replaces all the occurrences of the `count` with the `contains`. Tests: unit(dev) Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <b4ef3b4bc24f49abe04a2aba0ddd946009c9fcb2.1597314640.git.piotr@scylladb.com>	2020-08-15 20:26:02 +03:00
Calle Wilund	2eb4522fef	cdc: Make pre image optionally "full" (include all columns) Makes the "preimage" option for cdc non-binary, i.e. it can now be "true"/"on", "false"/"off" or "full". The two former behaving like previously, the latter obviously including all columns in pre image.	2020-08-12 16:03:06 +00:00
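
Commit 8cc4f39472 describes views into `bytes` values being invalidated when the `vector` holding them resizes. The following is a minimal sketch of that hazard and one way around it, not necessarily the fix that was applied: `std::string` and `std::string_view` stand in for `bytes` and `bytes_view`, and `view_collector` is a hypothetical name. It keeps the owning values in a `std::deque`, whose `push_back` does not invalidate references to existing elements, so views taken earlier stay valid as more values arrive.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical helper: std::string stands in for `bytes`,
// std::string_view for `bytes_view`.
class view_collector {
    // With std::vector<std::string> here, a reallocation would move every
    // string; short strings keep their characters inline (small-string
    // optimization), so views into them would dangle after the move.
    // std::deque::push_back leaves references to existing elements valid.
    std::deque<std::string> _storage;
    std::vector<std::string_view> _views;
public:
    view_collector() = default;
    // Copying would leave the copied views pointing into the source object.
    view_collector(const view_collector&) = delete;
    view_collector& operator=(const view_collector&) = delete;

    // Takes ownership of `value` and returns a view to the stored copy.
    std::string_view add(std::string value) {
        _storage.push_back(std::move(value));
        _views.emplace_back(_storage.back());
        return _views.back();
    }

    const std::vector<std::string_view>& views() const { return _views; }
};

// Self-check: a view taken early must still be readable after many insertions.
inline void check_views_survive_growth() {
    view_collector c;
    std::string_view first = c.add("ab");
    for (int i = 0; i < 1000; ++i) {
        c.add("x" + std::to_string(i));
    }
    assert(first == "ab");
    assert(c.views().front().data() == first.data());
}
```

The trade-off in this sketch is one extra container: the deque owns the data, while the vector of views stays cheap to pass around.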

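
Commit 6b1167ea0d replaces the `std::iterator` base class, deprecated since C++17, with directly defined iterator traits. A minimal sketch of that pattern follows; `counting_iterator` and `collect` are hypothetical names made up for illustration and are not the `collection_iterator` touched by the commit.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <type_traits>
#include <vector>

// Hypothetical input iterator that yields consecutive integers.
// Instead of inheriting from std::iterator<...>, it declares the five
// member types that std::iterator_traits looks for.
class counting_iterator {
    int _value;
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = int;
    using difference_type = std::ptrdiff_t;
    using pointer = const int*;
    using reference = const int&;

    explicit counting_iterator(int value = 0) : _value(value) {}

    reference operator*() const { return _value; }
    pointer operator->() const { return &_value; }
    counting_iterator& operator++() { ++_value; return *this; }
    counting_iterator operator++(int) { auto old = *this; ++_value; return old; }
    bool operator==(const counting_iterator& o) const { return _value == o._value; }
    bool operator!=(const counting_iterator& o) const { return _value != o._value; }
};

// The traits are picked up from the member types, no base class needed.
static_assert(std::is_same_v<
    std::iterator_traits<counting_iterator>::iterator_category,
    std::input_iterator_tag>);

// Copies the half-open range [first, last) into a vector.
inline std::vector<int> collect(counting_iterator first, counting_iterator last) {
    return std::vector<int>(first, last);
}

// Self-check of the sketch.
inline void check_counting_iterator() {
    assert(collect(counting_iterator(1), counting_iterator(4)).size() == 3);
}
```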

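
Commit d3c0b4c555 avoids capturing structured bindings in lambdas because Clang did not yet implement P1091R3 at the time. A minimal sketch of that workaround, with `lookup` and `make_describer` as hypothetical names: the pair is unpacked into ordinary local variables, which a lambda can capture without relying on P1091R3.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical data source returning two values at once.
inline std::pair<std::string, int> lookup() {
    return {"stream", 7};
}

inline std::function<std::string()> make_describer() {
    // Relies on P1091R3 and is therefore avoided here:
    //   auto [name, id] = lookup();
    //   return [name, id] { ... };
    // Ordinary local variables can always be captured.
    auto result = lookup();
    std::string name = std::move(result.first);
    int id = result.second;
    return [name, id] { return name + ":" + std::to_string(id); };
}

// Self-check of the sketch.
inline void check_describer() {
    assert(make_describer()() == "stream:7");
}
```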
220 Commits