scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-23 18:10:39 +00:00

Author	SHA1	Message	Date
Pavel Solodovnikov	fff7ef1fc2	treewide: reduce boost headers usage in scylla header files `dev-headers` target is also ensured to build successfully. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-05-20 01:33:18 +03:00
Avi Kivity	b1f9df279a	Merge "Untie cdc, storage service and migration notifier knot" from Pavel E " Storage service needs migration notifier reference to pass it to cdc service via get_local_storage_service(). This set removes - get_local_storage_service from cdc - migration notifier from storage service - db_context::builder from cdc (released nuclear binding energy) tests: unit(dev) " * 'br-cdc-no-storage-service' of https://github.com/xemul/scylla: storage_service: Remove migration notifier dependency cdc: Remove db_context::builder cdc: Provide migration notifier right at once cdc: Remove db_context::builder::with_migration_notifier	2021-05-11 18:39:10 +03:00
Piotr Grabowski	cd6154e8bf	cdc: log: assert post_image is always in full mode Add an assertion that checks that post_image can never be in non-full mode.	2021-05-04 12:33:15 +02:00
Pavel Emelyanov	cc813ef0dd	cdc: Remove db_context::builder Right now the builder is just an opaque transfer between cdc_service constructor args and cdc_service's db_context constructor args. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-04-29 22:46:57 +03:00
Pavel Emelyanov	3a7ca647af	cdc: Provide migration notifier right at once The only way db_context's migration notifier reference is set up is via cdc_service->db_context::builder->.build chain of calls. Since the builder's notifier optional reference is always disengaged (the .with_migration_notifier is removed by previous patch) the only possible notifier reference there is from the storage service which, in turn, is the same as in main.cc. Said that -- push the notifier reference onto db_context directly. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-04-29 22:40:24 +03:00
Pavel Emelyanov	421a514c30	cdc: Remove db_context::builder::with_migration_notifier It's unused and removing it makes next patch's life simpler Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-04-29 22:39:12 +03:00
Piotr Grabowski	b1650114eb	cdc: log: fill cdc$deleted_ columns in pre-images Before this change, cdc$deleted_ columns were all NULL in pre-images. Lack of such information made it hard to correctly interpret the pre-image rows, for example: INSERT INTO tbl(pk, ck, v, v2) VALUES (1, 1, null, 1); INSERT INTO tbl(pk, ck, v2) VALUES (1, 1, 1); For this example, pre-image generated for the second operation would look like this (in both 'true' and 'full' pre-image mode): pk=1, ck=1, v=NULL, cdc$deleted_v=NULL, v2=1 v=NULL has two meanings: 1. If pre-image was in 'true' mode, v=NULL describes that v was not affected (affected columns: pk, ck, v2). 2. If pre-image was in 'full' mode, v=NULL describes that v was equal to NULL in the pre-image. Therefore, to properly decode pre-images you would need to know in which mode pre-image was configured on the CDC-enabled table at the moment this CDC log row was inserted. There is no way to determine such information (you can only check a current mode of pre-image). A solution to this problem is to fill in the cdc$deleted_ columns for pre-images. After this change, for the INSERT described above, CDC now generates the following log row: If in pre-image 'true' mode: pk=1, ck=1, v=NULL, cdc$deleted_v=NULL, v2=1 If in pre-image 'full' mode: pk=1, ck=1, v=NULL, cdc$deleted_v=true, v2=1 A client library now can properly decode a pre-image row. If it sees a NULL value, it can now check the cdc$deleted_ column to determine if this NULL value was a part of pre-image or it was omitted due to not being an affected column in the delta operation. No such change is necessary for the post-image rows, as those images are always generated in the 'full' mode. Additional example of trouble decoding pre-images before this change. tbl2 - 'true' pre-image mode, tbl3 - 'full' pre-image mode: INSERT INTO tbl2(pk, ck, v, v2) VALUES (1, 1, 5, 1); INSERT INTO tbl3(pk, ck, v, v2) VALUES (1, 1, null, 1); INSERT INTO tbl2(pk, ck, v2) VALUES (1, 1, 1); generated pre-image: pk=1, ck=1, v=NULL, cdc$deleted_v=NULL, v2=1 INSERT INTO tbl3(pk, ck, v2) VALUES (1, 1, 1); generated pre-image: pk=1, ck=1, v=NULL, cdc$deleted_v=NULL, v2=1 Both pre-images look the same, but: 1. v=NULL in tbl2 describes v being omitted from the pre-image. 2. v=NULL in tbl3 describes v being NULL in the pre-image.	2021-04-29 18:04:07 +02:00
Avi Kivity	daeddda7cc	treewide: remove inclusions of storage_proxy.hh from headers storage_proxy.hh is huge and includes many headers itself, so remove its inclusions from headers and re-add smaller headers where needed (and storage_proxy.hh itself in source files that need it). Ref #1.	2021-04-20 21:23:00 +03:00
Piotr Grabowski	61c8e196be	cdc: improve exception message of invalid "ttl" Improve the exception message of providing invalid "ttl" value to the table. Previously, if you executed a CREATE TABLE query with invalid "ttl" value, you would get a non-descriptive error message: CREATE TABLE ks.t(pk int, PRIMARY KEY(pk)) WITH cdc = {'enabled': true, 'ttl': 'invalid'}; ServerError: stoi This commit adds more descriptive exception messages: CREATE TABLE ks.t(pk int, PRIMARY KEY(pk)) WITH cdc = {'enabled': true, 'ttl': 'kgjhfkjd'}; ConfigurationException: Invalid value for CDC option "ttl": kgjhfkjd CREATE TABLE ks.t(pk int, PRIMARY KEY(pk)) WITH cdc = {'enabled': true, 'ttl': '75747885787487'}; ConfigurationException: Invalid CDC option: ttl too large	2021-04-14 17:40:23 +02:00
Piotr Grabowski	10390afc10	cdc: add validation of "enable" and "postimage" Add validation of "enable" and "postimage" CDC options. Both options are boolean options, but previously they were not validated, meaning you could issue a query: CREATE TABLE ks.t(pk int, PRIMARY KEY(pk)) WITH cdc = {'enabled': 'dsfdsd'}; and it would be executed without any errors, silently interpreting "dsfdsd" as false. This commit narrows possible values of those boolean CDC options to false, true, 0, 1. After applying this change, issuing the query above would result in this error message: ConfigurationException: Invalid value for CDC option "enabled": dsfdsd	2021-04-14 17:36:38 +02:00
Piotr Sarna	d77eb39076	Merge 'cdc: log: avoid linearizations' from Michał Chojnowski CDC log uses `bytes` to deal with cells and their values, and linearizes all values indiscriminately. This series makes a switch from `bytes` to `managed_bytes` to avoid that linearization. Fixes #7506. Closes #8429 * github.com:scylladb/scylla: cdc: log: change yet another occurrence of `bytes` to `managed_bytes` cdc: log: switch the remaining usages of `bytes` to `managed_bytes` in collection_visitor cdc: log: change `deleted_elements` in log_mutation_builder from bytes to managed_bytes cdc: log: rewrite collection merge to use managed_bytes instead of bytes cdc: log: don't linearize collections in get_preimage_col_value cdc: log: change return type of get_preimage_col_value to managed_bytes cdc: log: remove an unnecessary copy in process_row_visitor::live_atomic_cell cdc: log: switch cell_map from bytes to managed_bytes cdc: log: change the argument of log_mutation_builder::set_value to managed_bytes_view cdc: log: don't linearize the primary key in log_mutation_builder atomic_cell: add yet another variant of make_live for managed_bytes_view compound: add explode_fragmented	2021-04-12 10:56:12 +02:00
Michał Chojnowski	6b31f73987	cdc: log: change yet another occurrence of `bytes` to `managed_bytes`	2021-04-08 10:16:21 +02:00
Michał Chojnowski	061f72166c	cdc: log: switch the remaining usages of `bytes` to `managed_bytes` in collection_visitor	2021-04-08 10:16:21 +02:00
Michał Chojnowski	2760382a68	cdc: log: change `deleted_elements` in log_mutation_builder from bytes to managed_bytes	2021-04-08 10:16:21 +02:00
Michał Chojnowski	ba53c85829	cdc: log: rewrite collection merge to use managed_bytes instead of bytes	2021-04-08 10:16:21 +02:00
Michał Chojnowski	42acdc4d09	cdc: log: don't linearize collections in get_preimage_col_value	2021-04-08 10:16:21 +02:00
Michał Chojnowski	70a2bed70b	cdc: log: change return type of get_preimage_col_value to managed_bytes	2021-04-08 10:16:21 +02:00
Michał Chojnowski	4214e74678	cdc: log: remove an unnecessary copy in process_row_visitor::live_atomic_cell	2021-04-08 10:16:11 +02:00
Michał Chojnowski	c2b43c8daf	cdc: log: switch cell_map from bytes to managed_bytes	2021-04-08 10:05:30 +02:00
Michał Chojnowski	4e8eb07de4	cdc: log: change the argument of log_mutation_builder::set_value to managed_bytes_view	2021-04-08 10:05:00 +02:00
Michał Chojnowski	f18b74eee5	cdc: log: don't linearize the primary key in log_mutation_builder	2021-04-08 10:04:31 +02:00
Konstantin Osipov	c83cf1f965	uuid: switch the API to use std::chrono A follow up for the patch for #7611. This change was requested during review and moved out of #7611 to reduce its scope. The patch switches UUID_gen API from using plain integers to hold time units to units from std::chrono. For one, we plan to switch the entire code base to std::chrono units, to ensure type safety. Secondly, using std::chrono units allows increasing code reuse with template metaprogramming and removing a few UUID_gen functions that became redundant as a result. * switch get_time_UUID(), unix_timestamp(), get_time_UUID_raw(), switch min_time_UUID(), max_time_UUID(), create_time_safe() to std::chrono * remove unused variant of from_unix_timestamp() * remove unused get_time_UUID_bytes(), create_time_unsafe(), redundant get_adjusted_timestamp() * inline get_raw_UUID_bytes() * collapse to similar implementations of get_time_UUID() * switch internal constants to std::chrono * remove unnecessary unique_ptr from UUID_gen::_instance Message-Id: <20210406130152.3237914-2-kostja@scylladb.com>	2021-04-06 17:12:54 +03:00
Wojciech Mitros	daa31be37f	types: replace buffers in tuple_deserializing_iterator with fragmented ones In preparation for removing linearization from abstract_type::compare, add options to avoid linearization in tuple_deserializing_iterator. Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>	2021-03-31 06:35:09 +02:00
Calle Wilund	e4d6c8904f	untyped_result_set: Do not copy data from input store (retain fragmented views) Refs #7961 Fixes #8014 Instead of doing a deep copy of input, we assume ownership and build rows of the views therein, potentially retaining fragmented data as-is and avoiding premature linearization. Note that this is not all sugar and flowers though. Any data access will by nature be more expensive, and the view collections we create are potentially just as expensive as copying for small cells. Otoh, it allows writing code using this that avoids data copying, depending on destination. v2: * Fixed wrong collection reserved in visitor * Changed row index from shared ptr to ref * Moved typedef * Removed non-existing constructors * Added const ref to index build * Fixed raft usage after rebase v3: * Changed shared_ptr to unique	2021-03-03 10:19:46 +00:00
Kamil Braun	e2f03e4aba	cdc: move (most of) CDC generation management code to the new service Currently all management of CDC generations happens in storage_service, which is a big ball of mud that does many unrelated things. Previous commits have introduced a new service for managing CDC generations. This code moves most of the relevant code to this new service. However, some part still remains in storage_service: the bootstrap procedure, which happens inside storage_service, must also do some initialization regarding CDC generations, for example: on restart it must retrieve the latest known generation timestamp from disk; on bootstrap it must create a new generation and announce it to other nodes. The order of these operations w.r.t the rest of the startup procedure is important, hence the startup procedure is the only right place for them. Still, what remains in storage_service is a small part of the entire CDC generation management logic; most of it has been moved to the new service. This includes listening for generation changes and updating the data structures for performing CDC log writes (cdc::metadata). Furthermore these functions now return futures (and are internally coroutines), where previously they required a seastar::async context.	2021-02-26 12:06:12 +01:00
Michał Chojnowski	96c22cf3f8	cdc: log: avoid an unnecessary copy There is no need to copy `bytes_view` into `bytes` here.	2021-02-18 14:08:18 +01:00
Michał Chojnowski	8cc4f39472	cdc: log: fix use-after-free in process_bytes_visitor Due to small value optimization used in `bytes`, views to `bytes` stored in `vector` can be invalidated when the vector resizes, resulting in use-after-free and data corruption. Fix that. Fixes #8117	2021-02-18 14:08:17 +01:00
Botond Dénes	ba7a9d2ac3	imr: switch back to open-coded description of structures Commit `aab6b0ee27` introduced the controversial new IMR format, which relied on a very template-heavy infrastructure to generate serialization and deserialization code via template meta-programming. The promise was that this new format, beyond solving the problems the previous open-coded representation had (working on linearized buffers), will speed up migrating other components to this IMR format, as the IMR infrastructure reduces code bloat, makes the code more readable via declarative type descriptions as well as safer. However, the results were almost the opposite. The template meta-programming used by the IMR infrastructure proved very hard to understand. Developers don't want to read or modify it. Maintainers don't want to see it being used anywhere else. In short, nobody wants to touch it. This commit does a conceptual revert of `aab6b0ee27`. A verbatim revert is not possible because related code evolved a lot since the merge. Also, going back to the previous code would mean we regress as we'd revert the move to fragmented buffers. So this revert is only conceptual, it changes the underlying infrastructure back to the previous open-coded one, but keeps the fragmented buffers, as well as the interface of the related components (to the extent possible). Fixes: #5578	2021-02-16 23:43:07 +01:00
Konstantin Osipov	b4f875f08e	uuid: reduce code dependency on UUID_gen.hh Do not include UUID_gen.hh in trace_state.hh and lists.hh to reduce header level dependency on it. Message-Id: <20210127173114.725761-2-kostja@scylladb.com>	2021-01-27 20:08:29 +02:00
Benny Halevy	c60da2e90d	cdc: remove _token_metadata from db_context 1. It's unused since `cbe510d1b8` 2. It's unsafe to keep a reference to token_metadata& potentially across yield points. The higher-level motivation is to make storage_service::get_token_metadata() private so we can control better how it's used. For cdc, if the token_metadata is going to be needed in the future, it'd be better to get it from db_context::_proxy.get_token_metadata_ptr(). Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20201213162351.52224-2-bhalevy@scylladb.com>	2020-12-13 18:32:17 +02:00
Pavel Emelyanov	89fd524c5a	schema-tables: Add database argument to make_update_table_mutations There are 3 callers of this helper (cdc, migration manager and tests) and all of them already have the database object at hands. The argument will be used by next patch to remove call for global storage proxy instance from make_update_indices_mutations. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-12-11 21:21:22 +03:00
Kamil Braun	2da723b9c8	cdc: produce postimage when inserting with no regular columns When a row was inserted into a table with no regular columns, and no such row existed in the first place, postimage would not be produced. Fix this. Fixes #7716. Closes #7723	2020-12-01 18:01:23 +02:00
Piotr Jastrzebski	6b1167ea0d	cdc: Remove std::iterator from collection_iterator std::iterator is deprecated since C++17 so define all the required iterator_traits directly. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-11-17 16:53:20 +01:00
Calle Wilund	46ea8c9b8b	cdc: Add an "end-of-record" column to Fixes #7435 Adds an "eor" (end-of-record) column to cdc log. This is non-null only on last-in-timestamp group rows, i.e. end of a singular source "event". A client can use this as a shortcut to knowing whether or not he has a full cdc "record" for a given source mutation (single row change). Closes #7436	2020-10-26 09:39:27 +02:00
Avi Kivity	d3c0b4c555	cdc: log: don't capture structured bindings in lambdas Clang does not yet implement p1091r3, which allows lambdas to capture structured bindings. To accommodate it, don't use structured bindings for variables that are later captured.	2020-10-16 15:23:16 +03:00
Calle Wilund	70a282ced2	cdc: Remove post-filterings for keys-only/off cdc delta generation Refs #7095 CDC delta!=full both relied on post-filtering to remove generated log row and/or cells. This is inefficient. Instead, simply check if the data should be created in the visitors. v2: * Fixed delta logs rows created (empty) even when delta == off v3: * Killed delta == off v4: * Move checks into (const) member var(s)	2020-08-31 07:59:43 +00:00
Calle Wilund	78236c015a	cdc: Remove cdc delta_mode::off Fixes #7128 CDC logs are not useful without at least delta_mode==keys, since pre/post image data has no info on _what_ was actually done to base table in source mutation.	2020-08-31 07:59:40 +00:00
Calle Wilund	e50911e5b0	cdc: Do not generate pre/post image for non-existent rows Fixes #7119 Fixes #7120 If preimage select came up empty - i.e. the row did not exist, either due to never having been created, or having been deleted, we should not bother creating a log preimage row for it. Esp. since it makes it harder to interpret the cdc log. If an operation in a cdc batch did a row delete (ranged, ck, etc), do not generate postimage data, since the row no longer exists. Note that we differentiate deleting all (non-pk/ck) columns from actual row delete.	2020-08-26 18:14:09 +00:00
Calle Wilund	5ed3d6892d	cdc: Remove stored (postimage) data when doing row delete Fixes #6900 Clustered range deletes did not clear out the "row_states" data associated with affected rows (might be many). Adds a sweep through and erases relevant data. Since we do pre- and postimage in "order", this should only affect postimage.	2020-08-25 12:27:18 +03:00
Piotr Jastrzebski	f01ce1458f	cdc: Preserve metadata columns when getting only keys for delta Fixes #7095 Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-08-25 10:41:54 +03:00
Benny Halevy	2f7c529c1c	storage_service: separate get_mutable_token_metadata Use a different getter for a token_metadata& that may be changed so we can better synchronize readers and writers of token_metadata and eventually allow them to yield in asynchronous loops. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-20 16:20:34 +03:00
Kamil Braun	0d3779e3e6	cdc: rewrite process_changes using inspect_mutation	2020-08-17 15:51:33 +02:00
Kamil Braun	9067f1a4e2	cdc: move some functions out of `cdc::transformer` Preparing them to be used outside of `transformer`.	2020-08-17 15:51:33 +02:00
Nadav Har'El	7e01ae089e	cdc: avoid including cdc/cdc_options.hh everywhere Before this patch, modifying cdc/cdc_options.hh required recompiling 264 source files. This is because this header file was included by a couple other header files - most notably schema.hh, where a forward declaration would have been enough. Only the handful of source files which really need to access the CDC options should include "cdc/cdc_options.hh" directly. After this patch, modifying cdc/cdc_options.hh requires only 6 source files to be recompiled. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200813070631.180192-1-nyh@scylladb.com>	2020-08-16 14:41:47 +03:00
Calle Wilund	2eb4522fef	cdc: Make pre image optionally "full" (include all columns) Makes the "preimage" option for cdc non-binary, i.e. it can now be "true"/"on", "false"/"off" or "full". The two former behave as before; the latter includes all columns in the pre-image.	2020-08-12 16:03:06 +00:00
Piotr Jastrzebski	80e3923b3c	codebase wide: replace find(...) != end() with contains C++20 introduced `contains` member functions for maps and sets for checking whether an element is present in the collection. Previously the code pattern looked like: <collection>.find(<element>) != <collection>.end() In C++20 the same can be expressed with: <collection>.contains(<element>) This is not only more concise but also expresses the intent of the code more clearly. This commit replaces all the occurrences of the old pattern with the new approach. Tests: unit(dev) Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <f001bbc356224f0c38f06ee2a90fb60a6e8e1980.1597132302.git.piotr@scylladb.com>	2020-08-11 13:28:50 +03:00
Nadav Har'El	936cf4cce0	merge: Increase row limits Merged pull request https://github.com/scylladb/scylla/pull/6910 by Wojciech Mitros: This patch enables selecting more than 2^32 rows from a table. The change becomes active after upgrading whole cluster - until then old limits are used. Tested reading 4.5*10^9 rows from a virtual table, manually upgrading a cluster with ccm and performing cql SELECT queries during the upgrade, ran unit tests in dev mode and cql and paging dtests. tests: add large paging state tests increase the maximum size of query results to 2^64	2020-08-04 19:52:30 +03:00
Kamil Braun	b5f3aef900	cdc: add an abstraction for building log mutations This commit takes out some responsibilities of `cdc::transformer` (which is currently a big ball of mud) into a separate class. This class is a simple abstraction for creating entries in a CDC log mutation. Low-level calls to the mutation API (such as `set_cell`) inside `cdc::transformer` were replaced by higher-level calls to the builder abstraction, removing some duplication of logic.	2020-08-04 19:37:03 +03:00
Wojciech Mitros	45215746fe	increase the maximum size of query results to 2^64 Currently, we cannot select more than 2^32 rows from a table because we are limited by types of variables containing the numbers of rows. This patch changes these types and sets new limits. The new limits take effect while selecting all rows from a table - custom limits of rows in a result stay the same (2^32-1). In classes which are being serialized and used in messaging, in order to be able to process queries originating from older nodes, the top 32 bits of new integers are optional and stay at the end of the class - if they're absent we assume they equal 0. The backward compatibility was tested by querying an older node for a paged selection, using the received paging_state with the same select statement on an upgraded node, and comparing the returned rows with the result generated for the same query by the older node, additionally checking if the paging_state returned by the upgraded node contained new fields with correct values. Also verified if the older node simply ignores the top 32 bits of the remaining rows number when handling a query with a paging_state originating from an upgraded node by generating and sending such a query to an older node and checking the paging_state in the reply(using python driver). Fixes #5101.	2020-08-03 17:32:49 +02:00
Nadav Har'El	2dcb6294da	merge: cdc: New delta modes: `off`, `keys`, `full` Merged pull request https://github.com/scylladb/scylla/pull/6914 by Juliusz Stasiewicz: The goal is to have finer control over CDC "delta" rows, i.e.: disable them totally (mode off); record only base PK+CK columns (mode keys); make them behave as usual (mode full, default). The editing of log rows is performed at the stage of finishing CDC mutation. Fixes #6838 tests: Added CQL test for `delta mode` cdc: Implementations of `delta_mode::off/keys` cdc: Infrastructure for controlling `delta_mode`	2020-08-03 14:10:15 +03:00

146 Commits