scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-02 06:05:53 +00:00

Author	SHA1	Message	Date
Takuya ASADA	e6e4359414	scylla_raid_setup: switch to systemd mount unit Since we already use systemd unit file for coredump bind mount and swapfile, we should move to systemd mount unit for data partition as well.	2020-07-13 17:14:44 +03:00
Pekka Enberg	c807c903ab	pull_github_pr.sh: Use "cherry-pick" for single-commit pull requests Improve the "pull_github_pr.sh" to detect the number of commits in a pull request, and use "git cherry-pick" to merge single-commit pull requests. Message-Id: <20200713093044.96764-1-penberg@scylladb.com>	2020-07-13 17:14:44 +03:00
Avi Kivity	d74582fbc5	move jmx/tools submodules to tools directory Move all package repositories to tools directory.	2020-07-13 17:14:14 +03:00
Avi Kivity	06341d2528	dist: fix debian generated files for non-default PRODUCT setting There are a bunch of renames that are done if PRODUCT is not the default, but the Python code for them is incorrect. Path.glob() is not a static method, and Path does not support .endswith(). Fix by constructing a Path object, and later casting to str.	2020-07-13 11:51:31 +03:00
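The two `pathlib` pitfalls named in 06341d2528 can be shown with a minimal sketch. The file names, the `acme` product name, and the `rename_for_product` helper are hypothetical and are not taken from the dist scripts; only the `Path` usage pattern (construct an object, then cast to `str`) follows the commit message.

```python
import tempfile
from pathlib import Path


def rename_for_product(debian_dir, product):
    """Rename 'scylla-*.service' files to '<product>-*.service'.

    Hypothetical helper; it only illustrates the Path usage.
    """
    renamed = []
    # Path.glob() is an instance method, so a Path object is built first.
    # sorted() materializes the matches before any file is renamed.
    for p in sorted(Path(debian_dir).glob("scylla-*")):
        # Path has no .endswith(); cast to str for string checks.
        if str(p).endswith(".service"):
            target = p.with_name(p.name.replace("scylla-", product + "-", 1))
            p.rename(target)
            renamed.append(target.name)
    return renamed


with tempfile.TemporaryDirectory() as d:
    for name in ("scylla-server.service", "scylla-server.install"):
        (Path(d) / name).touch()
    result = rename_for_product(d, "acme")
    print(result)
```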
Pekka Enberg	f2b4c1a212	scylla_prepare: Improve error message on missing CPU features Let's report each missing CPU feature individually, and improve the error message a bit. For example, if the "clmul" instruction is missing, the report looks as follows: ERROR: You will not be able to run Scylla on this machine because its CPU lacks the following features: pclmulqdq If this is a virtual machine, please update its CPU feature configuration or upgrade to a newer hypervisor. Fixes #6528	2020-07-13 11:39:29 +03:00
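A per-feature check like the one described in f2b4c1a212 might look roughly like the sketch below. It assumes the features are read from the `flags` line of `/proc/cpuinfo`-style text; the function name and the required-feature list in the usage line are illustrative assumptions, not the actual scylla_prepare code.

```python
def missing_cpu_features(cpuinfo_text, required):
    """Return the required features absent from the first 'flags' line."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
            break
    # Report each missing feature individually, in the order requested.
    return [feature for feature in required if feature not in flags]


sample = "processor\t: 0\nflags\t\t: fpu sse4_2 avx\n"
# The required list here is an assumption made for the example.
missing = missing_cpu_features(sample, ["sse4_2", "pclmulqdq"])
if missing:
    print("CPU lacks the following features: " + " ".join(missing))
```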
Pekka Enberg	bc053b3cfa	README.md: Add links to mailing lists and Slack Add links to the users and developers mailing lists, and the Slack channel in README.md to make them more discoverable. Message-Id: <20200713074654.90204-1-penberg@scylladb.com>	2020-07-13 10:48:55 +03:00
Pekka Enberg	df6a0ec5e5	README.md: Update build and run instructions Simplify the build and run instructions by splitting the text in three sections (prerequisites, building, and running) and streamlining the steps a bit. Message-Id: <20200713065910.84582-1-penberg@scylladb.com>	2020-07-13 10:04:12 +03:00
Pekka Enberg	5476efabb3	configure.py: Make output less verbose by default The configure.py script outputs the Seastar build command it executes: ['./cooking.sh', '-i', 'dpdk', '-d', '../build/release/seastar', '--', '-DCMAKE_BUILD_TYPE=RelWithDebInfo', '-DCMAKE_C_COMPILER=gcc', '-DCMAKE_CXX_COMPILER=g++', '-DCMAKE_EXPORT_NO_PACKAGE_REGISTRY=ON', '-DSeastar_CXX_FLAGS=;-Wno-error=stack-usage=-ffile-prefix-map=/home/penberg/src/scylla/scylla=.;-march=westmere;-O3;-Wstack-usage=13312;--param;inline-unit-growth=300', '-DSeastar_LD_FLAGS=-Wl,--build-id=sha1,--dynamic-linker=/////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////lib64/ld-linux-x86-64.so.2 ', '-DSeastar_CXX_DIALECT=gnu++20', '-DSeastar_API_LEVEL=4', '-DSeastar_UNUSED_RESULT_ERROR=ON', '-DSeastar_DPDK=ON', '-DSeastar_DPDK_MACHINE=wsm'] The output is mostly useful for debugging the build process itself, so hide it behind a "--verbose" flag, and make it more human-readable while at it: ./cooking.sh \ -i \ dpdk \ -d \ ../build/release/seastar \ -- \ -DCMAKE_BUILD_TYPE=RelWithDebInfo \ -DCMAKE_C_COMPILER=gcc \ -DCMAKE_CXX_COMPILER=g++ \ -DCMAKE_EXPORT_NO_PACKAGE_REGISTRY=ON \ -DSeastar_CXX_FLAGS=;-Wno-error=stack-usage=-ffile-prefix-map=/home/penberg/src/scylla/scylla=.;-march=westmere;-O3;-Wstack-usage=13312;--param;inline-unit-growth=300 \ 
-DSeastar_LD_FLAGS=-Wl,--build-id=sha1,--dynamic-linker=/////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////lib64/ld-linux-x86-64.so.2 \ -DSeastar_CXX_DIALECT=gnu++20 \ -DSeastar_API_LEVEL=4 \ -DSeastar_UNUSED_RESULT_ERROR=ON \ -DSeastar_DPDK=ON \ -DSeastar_DPDK_MACHINE=wsm Message-Id: <20200713065509.83184-1-penberg@scylladb.com>	2020-07-13 09:57:38 +03:00
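The output change in 5476efabb3 amounts to printing an argv list one argument per line with shell continuations, and only when verbosity is requested. The following is a minimal sketch of that idea; `format_command` and `maybe_print_command` are hypothetical names and the indentation is an arbitrary choice, so this is not the configure.py implementation.

```python
def format_command(args):
    """Join argv-style arguments, one per line, with backslash continuations."""
    return " \\\n    ".join(args)


def maybe_print_command(args, verbose=False):
    # The command is mostly useful for debugging the build, so it is
    # only shown when the caller asks for verbose output.
    if verbose:
        print(format_command(args))


maybe_print_command(["./cooking.sh", "-i", "dpdk"], verbose=True)
```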
Botond Dénes	ef2c8f563b	scylla-gdb.py: scylla fiber: add suggestion for further investigation scylla fiber often fails to really unwind the entire fiber, stopping sooner than expected. This is expected as scylla fiber only recognizes the most standard continuations but can drop the ball as soon as there is an unusual transmission. This commit adds a message below the found tasks explaining that the list might not be exhaustive and prints a command which can be used to explain why the unwinding stopped at the last task. While at it also rephrase an out-of-date comment. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200710120813.100009-1-bdenes@scylladb.com>	2020-07-12 15:43:21 +03:00
Dejan Mircevski	29fccd76ea	cql/restrictions: Rename find_if to find_atom As requested in #5763 feedback, rename to avoid clashes with std::find_if and boost::find_if. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-07-12 14:12:30 +03:00
Dejan Mircevski	9dac9a25e5	cql/restrictions: Constrain find_if and count_if As requested in #5763 feedback, require that Fn be callable with binary_operator in the functions mentioned above. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-07-12 14:11:39 +03:00
Pavel Emelyanov	1331623465	test.py: Don't feed fail-on-abandoned-failed-futures to unit tests The problem is that this option is defined in seastar testing wrapper, while no unit tests use it, all just start themselves with app.run() and would complain on unknown option. "Would", because nowadays every single test in it declares its own options in suite.yaml, that override test.py's defaults. Once an option-less unit test is added (B+ tree ones) it will complain. The proposal is to remove this option from defaults, if any unit test will use the seastar testing wrappers and will need this option, it can add one to the suite.yaml. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200709084602.8386-1-xemul@scylladb.com>	2020-07-10 16:21:14 +02:00
Tomasz Grabiec	883ac4a78c	Merge "Some selective noexcept bombing" from Pavel E. The goal is to make the lambdas, that are fed into partition cache's clear_and_dispose() and erase_in_dispose(), to be noexcept. This is to satisfy B+, which strictly requires those to be noexcept (currently used collections don't care). The set covers not only the strictly required minimum, but also some other methods that happened to be nearby. * https://github.com/xemul/scylla/tree/br-noexcepts-over-the-row-cache: row_cache: Mark invalidation lambda as noexcept cache_tracker: Mark methods noexcept cache_entry: Mark methods noexcept region: Mark trivial noexcept methods as such allocation_strategy: Mark returning lambda as noexcept allocation_strategy: Mark trivial noexcept methods as such dht: Mark noexcept methods	2020-07-10 15:02:52 +02:00
Nadav Har'El	f549d147ea	alternator: fix Expected's "NULL" operator with missing AttributeValueList The "NULL" operator in Expected (old-style conditional operations) doesn't have any parameters, so we insisted that the AttributeValueList be empty. However, we forgot to allow it to also be missing - a possibility which DynamoDB allows. This patch adds a test to reproduce this case (the test passes on DynamoDB, fails on Alternator before this patch, and succeeds after this patch), and a fix. Fixes #6816. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200709161254.618755-1-nyh@scylladb.com>	2020-07-10 07:45:02 +02:00
Benny Halevy	3ce86a7160	test: restrictions_test: set_contains: uncomment check depending on #6797 Now that #6797 is fixed. Refs #5763 Cc: Dejan Mircevski <dejan@scylladb.com> Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Test: restrictions_test(debug) Message-Id: <20200709123703.955897-1-bhalevy@scylladb.com>	2020-07-09 17:56:09 +03:00
Benny Halevy	ec77777bda	bytes: compare_unsigned: do not pass nullptr to memcmp If any of the compared bytes_view's is empty consider the empty prefix is same and proceed to compare the size of the suffix. A similar issue exists in legacy_compound_view::tri_comparator::operator(). It too must not pass nullptr to memcmp if any of the compared byte_view's is empty. Fixes #6797 Refs #6814 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Test: unit(dev) Branches: all Message-Id: <20200709123453.955569-1-bhalevy@scylladb.com>	2020-07-09 17:54:46 +03:00
Nadav Har'El	9042161ba3	merge: cdc: better pre/postimages for complicated batches Merged pull request https://github.com/scylladb/scylla/pull/6741 by Piotr Dulikowski: This PR changes the algorithm used to generate preimages and postimages in CDC log. While its behavior is the same for non-batch operations (with one exception described later), it generates pre/postimages that are organized more nicely, and account for multiple updates to the same row in one CQL batch. Fixes #6597, #6598 Tests: - unit(dev), for each consecutive commit - unit(debug), for the last commit Previous method The previous method worked on a per delta row basis. First, the base table is queried for the current state of the rows being modified in the processed mutation (this is called the "preimage query"). Then, for each delta row (representing a modification of a row): If preimage is enabled and the row was already present in the table, a corresponding preimage row is inserted before the delta row. The preimage row contains data taken directly from the preimage query result. Only columns that are modified by the delta are included in the preimage. If postimage is enabled, then a postimage row is inserted after the delta row. The postimage row contains data which was a result of taking row data directly from the preimage query result and applying the change the corresponding delta row represented. All columns of the row are included in the postimage. The above works well for simple cases such like singular CQL INSERT, UPDATE, DELETE, or simple CQL BATCH-es. An example: cqlsh:ks> BEGIN UNLOGGED BATCH INSERT INTO tbl (pk, ck, v) VALUES (0, 1, 111); INSERT INTO tbl (pk, ck, v) VALUES (0, 2, 222); APPLY BATCH; cqlsh:ks> SELECT "cdc$batch_seq_no", "cdc$operation", "cdc$ttl", pk, ck, v from ks.tbl_scylla_cdc_log ; cdc$batch_seq_no \| cdc$operation \| cdc$ttl \| pk \| ck \| v ------------------+---------------+---------+----+----+----- ...snip... 
0 \| 0 \| null \| 0 \| 1 \| 100 1 \| 2 \| null \| 0 \| 1 \| 111 2 \| 9 \| null \| 0 \| 1 \| 111 3 \| 0 \| null \| 0 \| 2 \| 200 4 \| 2 \| null \| 0 \| 2 \| 222 5 \| 9 \| null \| 0 \| 2 \| 222 Preimage rows are represented by cdc operation 0, and postimage by 9. Please note that all rows presented above share the same value of cdc$time column, which was not shown here for brevity. Problems with previous approach This simple algorithm has some conceptual and implementational problems which arise when processing more complicated CQL BATCH-es. Consider the following example: cqlsh:ks> BEGIN UNLOGGED BATCH INSERT INTO tbl (pk, ck, v1) VALUES (0, 0, 1) USING TTL 1000; INSERT INTO tbl (pk, ck, v2) VALUES (0, 0, 2) USING TTL 2000; APPLY BATCH; cqlsh:ks> SELECT "cdc$batch_seq_no", "cdc$operation", "cdc$ttl", pk, ck, v1, v2 FROM tbl_scylla_cdc_log; cdc$batch_seq_no \| cdc$operation \| cdc$ttl \| pk \| ck \| v1 \| v2 ------------------+---------------+---------+----+----+------+------ ...snip... 0 \| 0 \| null \| 0 \| 0 \| null \| 0 1 \| 2 \| 2000 \| 0 \| 0 \| null \| 2 2 \| 9 \| null \| 0 \| 0 \| 0 \| 2 3 \| 0 \| null \| 0 \| 0 \| 0 \| null 4 \| 1 \| 1000 \| 0 \| 0 \| 1 \| null 5 \| 9 \| null \| 0 \| 0 \| 1 \| 0 A single cdc group (corresponding to rows sharing the same cdc$time) might have more than one delta that modify the same row. For example, this happens when modifying two columns of the same row with different TTLs - due to our choice of CDC log schema, we must represent such change with two delta rows. It does not make sense to present a postimage after the first delta and preimage before the second - both deltas are applied simultaneously by the same CQL BATCH, so the middle "image" is purely imaginary and does not appear at any point in the table. Moreover, in this example, the last postimage is wrong - v1 is updated, but v2 is not. None of the postimages presented above represent the final state of the row. 
New algorithm The new algorithm now works on a per cdc group basis, not per delta row. When starting processing a CQL BATCH: Load preimage query results into a data structure representing current state of the affected rows. For each cdc group: For each row modified within the group, a preimage is produced, regardless of whether the row was present in the table. The preimage is calculated based on the current state. Only include columns that are modified for this row within the group. For each delta, produce a delta row and update the current state accordingly. Produce postimages in the same way as preimages - but include all columns for each row in the postimage. The new algorithm produces the postimage correctly when multiple deltas affect one row, because the state of the row is updated on the fly. This algorithm moves preimage and postimage rows to the beginning and the end of the cdc group, respectively. This solves the problem of imaginary preimages and postimages appearing inside a cdc group. Unfortunately, it is possible for one CQL BATCH to contain changes that use multiple timestamps. This will result in one CQL BATCH creating multiple cdc groups, with different cdc$time. As it is impossible, with our choice of schema, to tell that those cdc groups were created from one CQL BATCH, instead we pretend as if those groups were separate CQL operations. By tracking the state of the affected rows, we make sure that preimage in later groups will reflect changes introduced in previous groups. One more thing - this algorithm should have the same results for singular CQL operations and simple CQL BATCH-es, with one exception. Previously, a preimage was not produced if a row was not present in the table. Now, the preimage row will appear unconditionally - it will have nulls in place of column values. 
* 'cdc-pre-postimage-persistence' of github.com:piodul/scylla: cdc: fix indentation cdc: don't update partition state when not needed cdc: implement pre/postimage persistence cdc: add interface for producing pre/postimages cdc: load preimage query result into partition state fields cdc: introduce fields for keeping partition state cdc: rename set_pk_columns -> allocate_new_log_row cdc: track batch_no inside transformer cdc: move cdc$time generation to transformer cdc: move find_timestamp to split.cc cdc: introduce change_processor interface cdc: remove redundant schema arguments from cdc functions cdc: move management of generated mutations inside transformer cdc: move preimage result set into a field of transformer cdc: keep ts and tuuid inside transformer cdc: track touched parts of mutations inside transformer cdc: always include preimage for affected rows	2020-07-09 16:55:55 +03:00
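The per-group algorithm described in merge 9042161ba3 can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the CDC implementation: rows are plain dictionaries, `process_batch` and its argument shapes are hypothetical, and `cdc$batch_seq_no`, `cdc$time` and TTLs are not modeled. The operation codes 0 (preimage) and 9 (postimage) are the ones quoted in the commit message; delta codes are passed through from the caller.

```python
PREIMAGE = 0   # cdc$operation value for preimage rows (from the commit message)
POSTIMAGE = 9  # cdc$operation value for postimage rows (from the commit message)


def process_batch(preimage_state, groups, all_columns):
    """Return log rows as (operation, row_key, columns) tuples.

    preimage_state: {row_key: {column: value}} from the preimage query.
    groups: list of cdc groups; each group is a list of
            (row_key, operation, {column: value}) deltas.
    """
    # Current state of the affected rows, updated as deltas are applied.
    state = {key: dict(cols) for key, cols in preimage_state.items()}
    log = []
    for deltas in groups:
        # Columns modified per row within this group, in first-seen order.
        touched = {}
        for key, _op, changes in deltas:
            cols = touched.setdefault(key, [])
            for col in changes:
                if col not in cols:
                    cols.append(col)
        # Preimages first: only modified columns, None if the row is absent.
        for key, cols in touched.items():
            current = state.get(key, {})
            log.append((PREIMAGE, key, {c: current.get(c) for c in cols}))
        # Then the delta rows, updating the tracked state on the fly.
        for key, op, changes in deltas:
            log.append((op, key, dict(changes)))
            state.setdefault(key, {}).update(changes)
        # Postimages last: all columns of each touched row.
        for key in touched:
            current = state[key]
            log.append((POSTIMAGE, key, {c: current.get(c) for c in all_columns}))
    return log


# The two-TTL example from the commit message: one group, two deltas, one row.
log = process_batch(
    {(0, 0): {"v1": 0, "v2": 0}},
    [[((0, 0), 1, {"v1": 1}), ((0, 0), 2, {"v2": 2})]],
    ["v1", "v2"],
)
for row in log:
    print(row)
```

In this model the group yields a single preimage and a single postimage around both deltas, and the postimage reflects both updates, which is the behavior the commit message describes for the new algorithm.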
Pavel Emelyanov	bb32cff23d	row_cache: Mark invalidation lambda as noexcept It calls noexcept functions inside and handles the exception from throwing one itself Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-09 14:46:38 +03:00
Pavel Emelyanov	1346289151	cache_tracker: Mark methods noexcept All but few are trivially such. The clear_continuity() calls cache_entry::set_continuous() that had become noexcept a patch ago. The allocator() calls region.allocator() which had been marked noexcept few patches back. The on_partition_erase() calls allocator().invalidate_references(), both had been marked noexcept few patches back. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-09 14:44:17 +03:00
Pavel Emelyanov	d4ef845136	cache_entry: Mark methods noexcept All but one are trivially such, the position() one calls is_dummy_entry() which has become noexcept right now. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-09 14:41:43 +03:00
Pavel Emelyanov	3237796e00	region: Mark trivial noexcept methods as such Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-09 14:41:37 +03:00
Pavel Emelyanov	2c4a94aeab	allocation_strategy: Mark returning lambda as noexcept It just calls current_allocator().destroy() which is noexcept Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-09 14:41:23 +03:00
Pavel Emelyanov	a497dfdd0b	allocation_strategy: Mark trivial noexcept methods as such Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-09 14:41:03 +03:00
Pavel Emelyanov	6d7ae4ead1	dht: Mark noexcept methods These are either trivially noexcept already, or call each-other, thus becoming noexcept too Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-09 14:41:03 +03:00
Piotr Sarna	7ae3b25d8e	alternator: cleanup raw GetString() calls Instead of using raw GetString() from rapidjson, it's neater to use a helper for creating string views: rjson::to_string_view(). Message-Id: <3afda97403d4601c9600f6838f2028bfabd2f2f9.1594289250.git.sarna@scylladb.com>	2020-07-09 13:58:40 +03:00
Piotr Sarna	75dbaa0834	test: add alternator test for incorrect numeric values The test case is put inside test_manual_requests suite, because boto3 validates numeric inputs and does not allow passing arbitrary incorrect values. Tests: unit(dev), alternator(local, remote) Message-Id: <ac2baedc2ea61f0d857e7c01839f34cd15f7e02d.1594289250.git.sarna@scylladb.com>	2020-07-09 13:58:33 +03:00
Piotr Sarna	96426df72e	alternator: translate number errors to ValidationException In order to be consistent with returned error types, marshaling exceptions thrown from parsing big decimals are translated to ValidationException. Message-Id: <1446878cd63ad8291327a399cf700e4f402d108c.1594289250.git.sarna@scylladb.com>	2020-07-09 13:58:25 +03:00
Dejan Mircevski	d956233a80	cql_query_test: Drop get() on cquery_nofail result cquery_nofail returns the query result, not a future. Invoking .get() on its result is unnecessary. This just happened to compile because shared_ptr has a get() method with the same signature as future::get. Tests: cql_query_test unit test (dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-07-09 13:52:52 +03:00
Nadav Har'El	8b3dac040a	alternator: add request headers to trace-level logging When "trace"-level logging is enabled for Alternator, we log every request, but currently only the request's body. For debugging, it is sometimes useful to also see the headers - which are important to debug authentication, for example. So let's print the headers as well. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200709103414.599883-1-nyh@scylladb.com>	2020-07-09 12:38:45 +02:00
Asias He	67f6da6466	repair: Switch to btree_set for repair_hash. In one of the longevity tests, we observed 1.3s reactor stall which came from repair_meta::get_full_row_hashes_source_op. It traced back to a call to std::unordered_set::insert() which triggered big memory allocation and reclaim. I measured std::unordered_set, absl::flat_hash_set, absl::node_hash_set and absl::btree_set. The absl::btree_set was the only one that seastar oversized allocation checker did not warn in my tests where around 300K repair hashes were inserted into the container. - unordered_set: hash_sets=295634, time=333029199 ns - flat_hash_set: hash_sets=295634, time=312484711 ns - node_hash_set: hash_sets=295634, time=346195835 ns - btree_set: hash_sets=295634, time=341379801 ns The btree_set is a bit slower than unordered_set but it does not have huge memory allocation. I do not measure real difference of total time to finish repair of the same dataset with unordered_set and btree_set. To fix, switch to absl btree_set container. Fixes #6190	2020-07-09 11:35:18 +03:00
Nadav Har'El	9ff9cd37c3	alternator test: tests for the number type We had some tests for the number type in Alternator and how it can be stored, retrieved, calculated and sorted, but only had rudimentary tests for the allowed magnitude and precision of numbers. This patch creates a new test file, test_number.py, with tests aiming to check exactly the supported magnitudes and precision of numbers. These tests verify two things: 1. That Alternator's number type supports the full precision and magnitude that DynamoDB's number type supports. We don't want to see precision or magnitude lost when storing and retrieving numbers, or when doing calculations on them. 2. That Alternator's number type does not have better precision or magnitude than DynamoDB does. If it did, users may be tempted to rely on that implementation detail. The three tests of the first type pass; But all four tests of the second type xfail: Alternator currently stores numbers using big_decimal which has unlimited precision and almost-unlimited magnitude, and is not yet limited by the precision and magnitude allowed by DynamoDB. This is a known issue - Refs #6794 - and these four new xfailing tests can be used to reproduce that issue. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200707204824.504877-1-nyh@scylladb.com>	2020-07-09 07:38:36 +02:00
Piotr Sarna	91a616968c	Update seastar submodule * seastar cbf88f59...5632cf21 (1): > Merge "Handle or avoid a few std::bad_alloc" from Rafael	2020-07-08 21:22:31 +02:00
Hagit Segev	aec910278f	build-deb.sh: fix rm to erase only python While building unified-deb we first use scylla/reloc/build_deb.sh to create the scylla core package, and after that scylla/reloc/python3/build_deb.sh to create python3. On 058da69#diff-4a42abbd0ed654a1257c623716804c82 a new rm -rf command was added. It causes python3 process to erase Scylla-core process. Set python3 to erase its own dir scylla-python3-package only.	2020-07-08 17:58:38 +03:00
Piotr Dulikowski	ad811a48bf	cdc: fix indentation	2020-07-08 15:36:41 +02:00
Piotr Dulikowski	20b236d27d	cdc: don't update partition state when not needed In some cases, tracking the state of processed rows inside `transformer` is not needed at all. We don't need to do it if either: - Preimage and postimage are disabled for the table, - Only preimage is enabled and we are processing the last timestamp. This commit disables updating the state in the cases listed above.	2020-07-08 15:36:41 +02:00
Piotr Dulikowski	246f8da6f6	cdc: implement pre/postimage persistence Moves responsibility for generating pre/postimage rows from the "process_change" method to "produce_preimage" and "produce_postimage". This commit actually affects the contents of generated CDC log mutations. Added a unit test that verifies more complicated cases with CQL BATCH.	2020-07-08 15:36:41 +02:00
Piotr Dulikowski	24b50ffbc8	cdc: add interface for producing pre/postimages Introduces new methods to the change_processor interface that will cause it to produce pre/postimage rows for requested clustering key, or for static row. Introduces logic in split.cc responsible for calling pre/postimage methods of the change_processor interface. This does not have any effect on generated CDC log mutations yet, because the transformer class has empty implementations in place of those methods.	2020-07-08 15:36:41 +02:00
Piotr Dulikowski	761c59d92a	cdc: load preimage query result into partition state fields Instead of looking up preimage data directly from the raw preimage query results, use the raw results to populate current partition state data, and read directly from the current partition state.	2020-07-08 15:36:41 +02:00
Piotr Dulikowski	946354ee74	cdc: introduce fields for keeping partition state Introduces data structures that will be used for keeping the current state of processed rows: _clustering_row_states, and _static_row_state.	2020-07-08 15:36:41 +02:00
Piotr Dulikowski	bb587a93be	cdc: rename set_pk_columns -> allocate_new_log_row The new name better describes what this function does.	2020-07-08 15:36:41 +02:00
Piotr Dulikowski	82ddeb1992	cdc: track batch_no inside transformer Move tracking of batch_no inside the transformer.	2020-07-08 15:36:41 +02:00
Piotr Dulikowski	7b47f84965	cdc: move cdc$time generation to transformer Generate the timeuuid on the transformer side, which allows to simplify the change_processor interface.	2020-07-08 15:36:41 +02:00
Piotr Dulikowski	7691568b0a	cdc: move find_timestamp to split.cc The function is no longer used in log.cc, so instead it is moved to split.cc. Removed declaration of the function from the log.hh header, because it is not used elsewhere - apart from testing code, but it already declared find_timestamp in the cdc_test.cc file.	2020-07-08 15:36:40 +02:00
Piotr Dulikowski	51d97be0b3	cdc: introduce change_processor interface This allows for a more refined use of the transformer by the for_each_change function (now named "process_changes_with_splitting"). The change_processor interface exposes two methods so far: begin_timestamp, and process_change (previously named "transform"). By separating those two and exposing them, process_changes_with_splitting can cause the transformer to generate less CDC log mutations - only one for each timestamp in the batch.	2020-07-08 15:36:40 +02:00
Piotr Dulikowski	f907cab156	cdc: remove redundant schema arguments from cdc functions A `mutation` object already has a reference to its schema. It does not make sense to call functions changed in this commit with a different schema.	2020-07-08 15:36:40 +02:00
Piotr Dulikowski	fa00ea996a	cdc: move management of generated mutations inside transformer CDC log mutations are now stored inside `transformer`, and only moved to the final set of mutations at the end of `transformer`'s lifetime.	2020-07-08 15:36:40 +02:00
Piotr Dulikowski	76a323a02d	cdc: move preimage result set into a field of transformer Instead of passing the preimage result set in each `transform` call, it is now assigned to a field, and `transform` uses that field.	2020-07-08 15:36:40 +02:00
Piotr Dulikowski	79eabc04a8	cdc: keep ts and tuuid inside transformer Adds a `begin_timestamp` method which tells the `transformer` to start using the following timestamp and timeuuid when generating new log row mutations.	2020-07-08 15:36:40 +02:00
Piotr Dulikowski	3c01b3c41d	cdc: track touched parts of mutations inside transformer Moves tracking of the "touched parts" statistics inside the transformer class. This commit is the first of multiple commits in this series which move parts of the state used in CDC log row generation inside the `transformer` class. There is a lot of state being passed to `transformer` each time its methods are called, which could be as well tracked by the `transformer` itself. This will result in a nicer interface and will allow us to generate less CDC log mutations which give the same result.	2020-07-08 15:36:40 +02:00
Piotr Dulikowski	027d20c654	cdc: always include preimage for affected rows This changes the current algorithm so that the preimage row will not be skipped if the corresponding row was not present in preimage query results.	2020-07-08 15:36:40 +02:00


22745 Commits