scylladb

Author	SHA1	Message	Date
Avi Kivity	186c6cef57	cdc: sprinkle parentheses in EntryContainer concept Due to a bug, clang does not decay a type to a reference, failing the concept evaluation on correct input. Add parentheses to force it to decay the type.	2020-09-21 16:32:53 +03:00
Calle Wilund	d29d676955	cdc: Add setter for delta mode	2020-09-07 14:14:04 +00:00
Kamil Braun	ff78a3c332	cdc: rename CDC description tables... again Commit `a6ad70d3da` changed the format of stream IDs: the lower 8 bytes were previously generated randomly, now some of them have semantics. In particular, the least significant byte contains a version (stream IDs might evolve with further releases). This is a backward-incompatible change: the code won't properly handle stream IDs with all lower 8 bytes generated randomly. To protect us from subtle bugs, the code has an assertion that checks the stream ID's version. This means that if an experimental user used CDC before the change and then upgraded, they might hit the assertion when a node attempts to retrieve a CDC generation with old stream IDs from the CDC description tables and then decode it. In effect, the user won't even be able to start a node. As with the case described in `d89b7a0548`, the simplest fix is to rename the tables. This fix must get merged in before CDC goes out of experimental. Now, if the user upgrades their cluster from a pre-rename version, the node will simply complain that it can't obtain the CDC generation instead of preventing the cluster from working. The user will be able to use CDC after running checkAndRepairCDCStreams. Since a new table is added to the system_distributed keyspace, the cluster's schema has changed, so sstables and digests need to be regenerated for schema_digest_test.	2020-08-31 11:33:14 +03:00
Calle Wilund	70a282ced2	cdc: Remove post-filterings for keys-only/off cdc delta generation Refs #7095 CDC delta != full modes both relied on post-filtering to remove generated log rows and/or cells. This is inefficient. Instead, simply check in the visitors whether the data should be created. v2: * Fixed delta log rows being created (empty) even when delta == off v3: * Killed delta == off v4: * Move checks into (const) member var(s)	2020-08-31 07:59:43 +00:00
Calle Wilund	78236c015a	cdc: Remove cdc delta_mode::off Fixes #7128 CDC logs are not useful without at least delta_mode==keys, since pre/post image data has no info on _what_ was actually done to the base table in the source mutation.	2020-08-31 07:59:40 +00:00
Calle Wilund	e50911e5b0	cdc: Do not generate pre/post image for non-existent rows Fixes #7119 Fixes #7120 If the preimage select came up empty, i.e. the row did not exist, either because it was never created or because it was deleted, we should not bother creating a log preimage row for it, especially since it makes the cdc log harder to interpret. If an operation in a cdc batch did a row delete (ranged, ck, etc), do not generate postimage data, since the row no longer exists. Note that we differentiate deleting all (non-pk/ck) columns from an actual row delete.	2020-08-26 18:14:09 +00:00
Calle Wilund	5ed3d6892d	cdc: Remove stored (postimage) data when doing row delete Fixes #6900 Clustered range deletes did not clear out the "row_states" data associated with affected rows (might be many). Adds a sweep through and erases relevant data. Since we do pre- and postimage in "order", this should only affect postimage.	2020-08-25 12:27:18 +03:00
Piotr Jastrzebski	f01ce1458f	cdc: Preserve metadata columns when getting only keys for delta Fixes #7095 Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-08-25 10:41:54 +03:00
Benny Halevy	2f7c529c1c	storage_service: separate get_mutable_token_metadata Use a different getter for a token_metadata& that may be changed so we can better synchronize readers and writers of token_metadata and eventually allow them to yield in asynchronous loops. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-20 16:20:34 +03:00
Kamil Braun	0d3779e3e6	cdc: rewrite process_changes using inspect_mutation	2020-08-17 15:51:33 +02:00
Kamil Braun	9067f1a4e2	cdc: move some functions out of `cdc::transformer` Preparing them to be used outside of `transformer`.	2020-08-17 15:51:33 +02:00
Kamil Braun	4533f62f54	cdc: rewrite extract_changes using inspect_mutation	2020-08-17 15:51:33 +02:00
Kamil Braun	e9192a6108	cdc: rewrite should_split using inspect_mutation	2020-08-17 15:51:33 +02:00
Kamil Braun	ee87f4026e	cdc: rewrite find_timestamp using inspect_mutation	2020-08-17 15:51:33 +02:00
Kamil Braun	694714796f	cdc: introduce a "change visitor" abstraction This is an abstraction for walking over mutations created by a write coordinator, deconstructing them into "atomic" pieces ("changes"), and consuming these pieces. Read the big comment in cdc/change_visitor.hh for more details.	2020-08-17 15:51:30 +02:00
Nadav Har'El	7e01ae089e	cdc: avoid including cdc/cdc_options.hh everywhere Before this patch, modifying cdc/cdc_options.hh required recompiling 264 source files. This is because this header file was included by a couple of other header files, most notably schema.hh, where a forward declaration would have been enough. Only the handful of source files which really need to access the CDC options should include "cdc/cdc_options.hh" directly. After this patch, modifying cdc/cdc_options.hh requires only 6 source files to be recompiled. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200813070631.180192-1-nyh@scylladb.com>	2020-08-16 14:41:47 +03:00
Piotr Jastrzebski	c001374636	codebase wide: replace count with contains C++20 introduced `contains` member functions for maps and sets for checking whether an element is present in the collection. Previously the `count` function was often used in various ways. `contains` not only expresses the intent of the code better but also does so in a more unified way. This commit replaces all the occurrences of `count` with `contains`. Tests: unit(dev) Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <b4ef3b4bc24f49abe04a2aba0ddd946009c9fcb2.1597314640.git.piotr@scylladb.com>	2020-08-15 20:26:02 +03:00
Calle Wilund	2eb4522fef	cdc: Make pre image optionally "full" (include all columns) Makes the "preimage" option for cdc non-binary, i.e. it can now be "true"/"on", "false"/"off" or "full". The former two behave as before; the latter includes all columns in the pre image.	2020-08-12 16:03:06 +00:00
Piotr Jastrzebski	80e3923b3c	codebase wide: replace find(...) != end() with contains C++20 introduced `contains` member functions for maps and sets for checking whether an element is present in the collection. Previously the code pattern looked like: <collection>.find(<element>) != <collection>.end() In C++20 the same can be expressed with: <collection>.contains(<element>) This is not only more concise but also expresses the intent of the code more clearly. This commit replaces all the occurrences of the old pattern with the new approach. Tests: unit(dev) Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <f001bbc356224f0c38f06ee2a90fb60a6e8e1980.1597132302.git.piotr@scylladb.com>	2020-08-11 13:28:50 +03:00
Calle Wilund	a6ad70d3da	cdc:stream_id: Encode format version + vnode grouping/index in id Fixes #6948 Changes the stream_id format from <token:64>:<rand:64> to <token:64>:<rand:38><index:22><version:4> The code will attempt to assert version match when presented with a stored id (i.e. construct from bytes). This means that IDs created by previous (experimental) versions will break. Moves the ID encoding fully into the ID class, and makes the code path private for the topology generation code path. Removes some superfluous accessors but adds accessors for token, version and index (for alternator etc).	2020-08-11 12:48:04 +03:00
Nadav Har'El	936cf4cce0	merge: Increase row limits Merged pull request https://github.com/scylladb/scylla/pull/6910 by Wojciech Mitros: This patch enables selecting more than 2^32 rows from a table. The change becomes active after upgrading the whole cluster; until then the old limits are used. Tested reading 4.5*10^9 rows from a virtual table, manually upgrading a cluster with ccm and performing cql SELECT queries during the upgrade, ran unit tests in dev mode and cql and paging dtests. tests: add large paging state tests; increase the maximum size of query results to 2^64	2020-08-04 19:52:30 +03:00
Kamil Braun	b5f3aef900	cdc: add an abstraction for building log mutations This commit takes out some responsibilities of `cdc::transformer` (which is currently a big ball of mud) into a separate class. This class is a simple abstraction for creating entries in a CDC log mutation. Low-level calls to the mutation API (such as `set_cell`) inside `cdc::transformer` were replaced by higher-level calls to the builder abstraction, removing some duplication of logic.	2020-08-04 19:37:03 +03:00
Calle Wilund	05851578d4	alternator::streams: Report streams as not ready until CDC stream id:s are available Refs #6864 When booting a clean scylla, CDC stream IDs will not be available until a ring delay time period has passed. Before this, writing to a CDC-enabled table will fail hard. For alternator (and its tests), we can report the stream(s) for tables as not yet available (ENABLING) until such time as the IDs are computed. v2: Keep storage service ref in executor	2020-08-03 20:34:15 +03:00
Wojciech Mitros	45215746fe	increase the maximum size of query results to 2^64 Currently, we cannot select more than 2^32 rows from a table because we are limited by the types of variables containing the numbers of rows. This patch changes these types and sets new limits. The new limits take effect while selecting all rows from a table; custom limits of rows in a result stay the same (2^32-1). In classes which are serialized and used in messaging, in order to be able to process queries originating from older nodes, the top 32 bits of the new integers are optional and stay at the end of the class; if they're absent we assume they equal 0. The backward compatibility was tested by querying an older node for a paged selection, using the received paging_state with the same select statement on an upgraded node, and comparing the returned rows with the result generated for the same query by the older node, additionally checking that the paging_state returned by the upgraded node contained new fields with correct values. Also verified that the older node simply ignores the top 32 bits of the remaining rows number when handling a query with a paging_state originating from an upgraded node, by generating and sending such a query to an older node and checking the paging_state in the reply (using the Python driver). Fixes #5101.	2020-08-03 17:32:49 +02:00
Nadav Har'El	2dcb6294da	merge: cdc: New delta modes: `off`, `keys`, `full` Merged pull request https://github.com/scylladb/scylla/pull/6914 by Juliusz Stasiewicz: The goal is to have finer control over CDC "delta" rows, i.e.: disable them totally (mode off); record only base PK+CK columns (mode keys); make them behave as usual (mode full, default). The editing of log rows is performed at the stage of finishing CDC mutation. Fixes #6838 tests: Added CQL test for `delta mode`; cdc: Implementations of `delta_mode::off/keys`; cdc: Infrastructure for controlling `delta_mode`	2020-08-03 14:10:15 +03:00
Botond Dénes	92a7b16cba	query: read_command: add max_result_size This field will replace max size which is currently passed once per established rpc connection via the CLIENT_ID verb and stored as an auxiliary value on the client_info. For now it is unused, but we update all sites creating a read command to pass the correct value to it. In the next patch we will phase out the old max size and use this field to pass max size on each verb instead.	2020-07-28 18:00:29 +03:00
Botond Dénes	8992bcd1f8	query: read_command: use tagged ints for limit ctor params The convenience constructor of read_command now has two integer parameters next to each other. In the next patch we intend to add another one. This is a recipe for disaster, so to avoid mistakes this patch converts these parameters to tagged integers. This makes sure callers pass what they meant to pass. As a matter of fact, while fixing up call-sites, I already found several ones passing `query::max_partitions` to the `row_limit` parameter. No harm done yet, as `query::max_partitions` == `query::max_rows` but this shows just how easy it is to mix up parameters with the same type.	2020-07-28 18:00:29 +03:00
Juliusz Stasiewicz	9e4247090f	cdc: Implementations of `delta_mode::off/keys` At the stage of `finish`ing CDC mutation, deltas are removed (mode `off`) or edited to keep only PK+CK of the base table (mode `keys`). Fixes #6838	2020-07-27 19:05:47 +02:00
Juliusz Stasiewicz	c05128d217	cdc: Infrastructure for controlling `delta_mode` The goal is to have finer control over CDC "delta" rows, i.e.: - disable them totally (mode `off`); - record only PK+CK (mode `keys`); - make them behave as usual (mode `full`, default). This commit adds the necessary infrastructure to `cdc_options`.	2020-07-27 19:00:06 +02:00
Kamil Braun	12e2891c60	cdc: if ring_delay == 0, don't add delay to newly created generation If ring_delay == 0, something fishy is going on, e.g. single-node tests are being performed. In this case we want the CDC generation to start operating immediately. There is no need to wait until it propagates to the cluster. You should not use ring_delay == 0 in production. Fixes https://github.com/scylladb/scylla/issues/6864.	2020-07-22 16:06:09 +03:00
Pavel Emelyanov	757a7145b9	headers: Remove mutation.hh from trace_state.hh Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-17 17:40:23 +03:00
Piotr Dulikowski	e2462bce3b	cdc: fix a corner case inside get_base_table It is legal for a user to create a table with a name that has a _scylla_cdc_log suffix. In such a case, the table won't be treated as a cdc log table, and does not require a corresponding base table to exist. During refactoring done as a part of the initial implementation of Alternator streams (#6694), `is_log_for_some_table` started throwing when trying to check a name like `X_scylla_cdc_log` when there was no table with name `X`. Previously, it just returned false. The exception originates inside `get_base_table`, which tries to return the base table schema without checking for its existence, which may throw. It makes more sense for this function to return nullptr in such a case (it already does when the provided log table name does not have the cdc log suffix), so this patch adds an explicit check and returns nullptr when necessary. A similar oversight happened before (see #5987), so this patch also adds a comment which explains why existence of `X_scylla_cdc_log` does not imply existence of `X`. Fixes: #6852 Refs: #5724, #5987	2020-07-16 16:38:48 +03:00
Calle Wilund	3376209718	cdc::schema: Make extensions explicitly settable from builder To make non-cql cdc schema options a reality.	2020-07-15 08:21:34 +00:00
Calle Wilund	0158f6473b	cdc: Add stream ids structure with time and expiration For reading the topology tables from within scylla.	2020-07-15 08:10:23 +00:00
Calle Wilund	331aa7c501	cdc: Add "is_cdc_metacolumn_name" predicate To sift column names	2020-07-15 08:10:23 +00:00
Calle Wilund	8a728ce618	cdc: Add get_base_table helper	2020-07-15 08:10:23 +00:00
Calle Wilund	8f462e8606	CDC::log: Add `base_name` helper To extract base table name from CDC log table name.	2020-07-15 08:10:23 +00:00
Piotr Dulikowski	ad811a48bf	cdc: fix indentation	2020-07-08 15:36:41 +02:00
Piotr Dulikowski	20b236d27d	cdc: don't update partition state when not needed In some cases, tracking the state of processed rows inside `transformer` is not needed at all. We don't need to do it if either: - Preimage and postimage are disabled for the table, - Only preimage is enabled and we are processing the last timestamp. This commit disables updating the state in the cases listed above.	2020-07-08 15:36:41 +02:00
Piotr Dulikowski	246f8da6f6	cdc: implement pre/postimage persistence Moves responsibility for generating pre/postimage rows from the "process_change" method to "produce_preimage" and "produce_postimage". This commit actually affects the contents of generated CDC log mutations. Added a unit test that verifies more complicated cases with CQL BATCH.	2020-07-08 15:36:41 +02:00
Piotr Dulikowski	24b50ffbc8	cdc: add interface for producing pre/postimages Introduces new methods to the change_processor interface that will cause it to produce pre/postimage rows for requested clustering key, or for static row. Introduces logic in split.cc responsible for calling pre/postimage methods of the change_processor interface. This does not have any effect on generated CDC log mutations yet, because the transformer class has empty implementations in place of those methods.	2020-07-08 15:36:41 +02:00
Piotr Dulikowski	761c59d92a	cdc: load preimage query result into partition state fields Instead of looking up preimage data directly from the raw preimage query results, use the raw results to populate current partition state data, and read directly from the current partition state.	2020-07-08 15:36:41 +02:00
Piotr Dulikowski	946354ee74	cdc: introduce fields for keeping partition state Introduces data structures that will be used for keeping the current state of processed rows: _clustering_row_states, and _static_row_state.	2020-07-08 15:36:41 +02:00
Piotr Dulikowski	bb587a93be	cdc: rename set_pk_columns -> allocate_new_log_row The new name better describes what this function does.	2020-07-08 15:36:41 +02:00
Piotr Dulikowski	82ddeb1992	cdc: track batch_no inside transformer Move tracking of batch_no inside the transformer.	2020-07-08 15:36:41 +02:00
Piotr Dulikowski	7b47f84965	cdc: move cdc$time generation to transformer Generate the timeuuid on the transformer side, which allows to simplify the change_processor interface.	2020-07-08 15:36:41 +02:00
Piotr Dulikowski	7691568b0a	cdc: move find_timestamp to split.cc The function is no longer used in log.cc, so instead it is moved to split.cc. Removed declaration of the function from the log.hh header, because it is not used elsewhere, apart from testing code, which already declares find_timestamp in the cdc_test.cc file.	2020-07-08 15:36:40 +02:00
Piotr Dulikowski	51d97be0b3	cdc: introduce change_processor interface This allows for a more refined use of the transformer by the for_each_change function (now named "process_changes_with_splitting"). The change_processor interface exposes two methods so far: begin_timestamp, and process_change (previously named "transform"). By separating those two and exposing them, process_changes_with_splitting can cause the transformer to generate fewer CDC log mutations: only one for each timestamp in the batch.	2020-07-08 15:36:40 +02:00
Piotr Dulikowski	f907cab156	cdc: remove redundant schema arguments from cdc functions A `mutation` object already has a reference to its schema. It does not make sense to call functions changed in this commit with a different schema.	2020-07-08 15:36:40 +02:00
Piotr Dulikowski	fa00ea996a	cdc: move management of generated mutations inside transformer CDC log mutations are now stored inside `transformer`, and only moved to the final set of mutations at the end of `transformer`'s lifetime.	2020-07-08 15:36:40 +02:00
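Commit a6ad70d3da packs three fields into the lower 64 bits of a stream ID: <rand:38><index:22><version:4>. The following C++ sketch shows one way that layout could be encoded and decoded. The struct name, the accessor names, the current-version value of 1, and the use of an exception instead of an assertion are all assumptions for illustration, not Scylla's actual cdc::stream_id code.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical sketch of the low-64-bit layout <rand:38><index:22><version:4>.
// NOT Scylla's cdc::stream_id; names and version value are assumed.
struct stream_id_sketch {
    static constexpr unsigned version_bits = 4;
    static constexpr unsigned index_bits = 22;
    static constexpr unsigned rand_bits = 38;
    static constexpr uint64_t version_mask = (uint64_t(1) << version_bits) - 1;
    static constexpr uint64_t index_mask = (uint64_t(1) << index_bits) - 1;
    static constexpr uint64_t rand_mask = (uint64_t(1) << rand_bits) - 1;
    static constexpr uint64_t current_version = 1; // assumed value

    int64_t token; // high 64 bits: the token
    uint64_t low;  // low 64 bits: rand | index | version

    // Build a fresh ID, truncating the random part and index to their widths.
    static stream_id_sketch make(int64_t tok, uint64_t random_part, uint32_t idx) {
        uint64_t bits = ((random_part & rand_mask) << (index_bits + version_bits))
                      | ((uint64_t(idx) & index_mask) << version_bits)
                      | current_version;
        return {tok, bits};
    }

    // Decode a stored ID and reject any other version. The commit describes
    // an assertion; this sketch throws instead so the failure is catchable.
    static stream_id_sketch from_stored(int64_t tok, uint64_t bits) {
        if ((bits & version_mask) != current_version) {
            throw std::runtime_error("unsupported stream_id version");
        }
        return {tok, bits};
    }

    uint64_t version() const { return low & version_mask; }
    uint32_t index() const {
        return static_cast<uint32_t>((low >> version_bits) & index_mask);
    }
    uint64_t random_bits() const { return low >> (index_bits + version_bits); }
};
```

The version sits in the least significant 4 bits, so a decoder can check it before interpreting anything else. An ID written in the earlier, fully random format will therefore usually fail the check, which is the situation commit ff78a3c332 works around.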

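Commit 8992bcd1f8 replaces adjacent integer constructor parameters with tagged integers so that arguments cannot be silently swapped. Below is a minimal C++17 sketch of the idea. It assumes a hypothetical `tagged_uint64` template, illustrative `row_limit`/`partition_limit` aliases, and a stand-in `read_command_sketch` type. None of these are the types Scylla actually uses.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical tagged integer: each Tag yields a distinct type, and the
// explicit constructor blocks silent conversion from a raw integer.
template <typename Tag>
class tagged_uint64 {
    uint64_t _value;
public:
    explicit constexpr tagged_uint64(uint64_t v) : _value(v) {}
    constexpr uint64_t value() const { return _value; }
};

// Illustrative aliases, not the actual query:: types.
using row_limit = tagged_uint64<struct row_limit_tag>;
using partition_limit = tagged_uint64<struct partition_limit_tag>;

// Hypothetical stand-in for a convenience constructor taking two limits.
struct read_command_sketch {
    uint64_t rows;
    uint64_t partitions;
    read_command_sketch(row_limit r, partition_limit p)
        : rows(r.value()), partitions(p.value()) {}
};

// Swapped arguments or raw integers are rejected at compile time.
static_assert(!std::is_constructible_v<read_command_sketch, partition_limit, row_limit>);
static_assert(!std::is_constructible_v<read_command_sketch, uint64_t, uint64_t>);
```

With this design, the mix-up described in the commit (passing a partition limit where a row limit is expected) becomes a compile error instead of a latent bug.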

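Commit 45215746fe keeps serialized classes compatible with older nodes. It appends the top 32 bits of each widened counter as an optional trailing field, and a missing field is read as 0. The sketch below models that scheme with `std::optional`. The struct and function names are hypothetical, and Scylla's real messaging serialization is not shown.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical wire struct: a mandatory low 32-bit half plus an optional
// high 32-bit half kept at the end, which older senders never set.
struct remaining_rows_wire {
    uint32_t low;
    std::optional<uint32_t> high;
};

// Upgraded node: always sends both halves.
remaining_rows_wire encode_remaining_rows(uint64_t rows) {
    return {static_cast<uint32_t>(rows), static_cast<uint32_t>(rows >> 32)};
}

// Upgraded node: a missing high half is treated as 0.
uint64_t decode_remaining_rows(const remaining_rows_wire& w) {
    return (static_cast<uint64_t>(w.high.value_or(0)) << 32) | w.low;
}

// Older node: ignores the high half and sees only the low 32 bits.
uint64_t decode_remaining_rows_legacy(const remaining_rows_wire& w) {
    return w.low;
}
```

A value below 2^32 round-trips identically through both decoders. Larger values are truncated by an older node, which matches the commit's note that old nodes simply ignore the top 32 bits.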
188 Commits