scylladb

Author	SHA1	Message	Date
Kefu Chai	6c06751640	cdc: not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16725	2024-01-11 09:13:37 +02:00
Petr Gusev	7b55ccbd8e	token_metadata: drop the template Replace token_metadata2 ->token_metadata, make token_metadata back non-template. No behavior changes, just compilation fixes.	2023-12-12 23:19:54 +04:00
Petr Gusev	63f64f3303	token_metadata: make it a template with NodeId=inet_address/host_id NodeId is used in all internal token_metadata data structures, that previously used inet_address. We choose topology::key_kind based on the value of the template parameter. generic_token_metadata::update_topology overload with host_id parameter is added to make update_topology_change_info work, it now uses NodeId as a parameter type. topology::remove_endpoint(host_id) is added to make generic_token_metadata::remove_endpoint(NodeId) work. pending_endpoints_for and endpoints_for_reading are just removed - they are not used and not implemented. The declarations were left by mistake from a refactoring in which these methods were moved to erm. generic_token_metadata_base is extracted to contain declarations, common to both token_metadata versions. Templates are explicitly instantiated inside token_metadata.cc, since implementation part is also a template and it's not exposed to the header. There are no other behavioral changes in this commit, just syntax fixes to make token_metadata a template.	2023-12-11 12:51:34 +04:00
Botond Dénes	f8a8fe41d6	cdc/log.hh: expose is_log_name() Allow outside code to use it to determine whether a table is cdc or not. This is currently the most reliable method if the custom partitioner is not set on the schema of the investigated table.	2022-06-10 10:57:12 +03:00
Avi Kivity	fcb8d040e8	treewide: use Software Package Data Exchange (SPDX) license identifiers Instead of lengthy blurbs, switch to single-line, machine-readable standardized (https://spdx.dev) license identifiers. The Linux kernel switched long ago, so there is strong precedent. Three cases are handled: AGPL-only, Apache-only, and dual licensed. For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0), reasoning that our changes are extensive enough to apply our license. The changes we applied mechanically with a script, except to licenses/README.md. Closes #9937	2022-01-18 12:15:18 +01:00
Avi Kivity	bbad8f4677	replica: move ::database, ::keyspace, and ::table to replica namespace Move replica-oriented classes to the replica namespace. The main classes moved are ::database, ::keyspace, and ::table, but a few ancillary classes are also moved. There are certainly classes that should be moved but aren't (like distributed_loader) but we have to start somewhere. References are adjusted treewide. In many cases, it is obvious that a call site should not access the replica (but the data_dictionary instead), but that is left for separate work. scylla-gdb.py is adjusted to look for both the new and old names.	2022-01-07 12:04:38 +02:00
Pavel Emelyanov	0fd00d7016	cdc: Add database argument to is_log_for_some_table All callers have been patched already. This argument can now be used to replace get_local_storage_proxy().get_db().local() call. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-08-27 14:07:26 +03:00
Avi Kivity	a55b434a2b	treewide: extend copyright statements to present day	2021-06-06 19:18:49 +03:00
Pavel Emelyanov	cc813ef0dd	cdc: Remove db_context::builder Right now the builder is just an opaque transfer between cdc_service constructor args and cdc_service's db_context constructor args. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-04-29 22:46:57 +03:00
Pavel Emelyanov	3a7ca647af	cdc: Provide migration notifier right at once The only way db_context's migration notifier reference is set up is via cdc_service->db_context::builder->.build chain of calls. Since the builder's notifier optional reference is always disengaged (the .with_migration_notifier is removed by previous patch) the only possible notifier reference there is from the storage service which, in turn, is the same as in main.cc. Said that -- push the notifier reference onto db_context directly. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-04-29 22:40:24 +03:00
Pavel Emelyanov	421a514c30	cdc: Remove db_context::builder::with_migration_notifier It's unused and removing it makes next patch's life simpler Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-04-29 22:39:12 +03:00
Kamil Braun	e2f03e4aba	cdc: move (most of) CDC generation management code to the new service Currently all management of CDC generations happens in storage_service, which is a big ball of mud that does many unrelated things. Previous commits have introduced a new service for managing CDC generations. This code moves most of the relevant code to this new service. However, some part still remains in storage_service: the bootstrap procedure, which happens inside storage_service, must also do some initialization regarding CDC generations, for example: on restart it must retrieve the latest known generation timestamp from disk; on bootstrap it must create a new generation and announce it to other nodes. The order of these operations w.r.t the rest of the startup procedure is important, hence the startup procedure is the only right place for them. Still, what remains in storage_service is a small part of the entire CDC generation management logic; most of it has been moved to the new service. This includes listening for generation changes and updating the data structures for performing CDC log writes (cdc::metadata). Furthermore these functions now return futures (and are internally coroutines), where previously they required a seastar::async context.	2021-02-26 12:06:12 +01:00
Benny Halevy	c60da2e90d	cdc: remove _token_metadata from db_context 1. It's unused since `cbe510d1b8` 2. It's unsafe to keep a reference to token_metadata& potentially across yield points. The higher-level motivation is to make storage_service::get_token_metadata() private so we can control better how it's used. For cdc, if the token_metadata is going to be needed in the future, it'd be better to get it from db_context::_proxy.get_token_metadata_ptr(). Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20201213162351.52224-2-bhalevy@scylladb.com>	2020-12-13 18:32:17 +02:00
Benny Halevy	2f7c529c1c	storage_service: separate get_mutable_token_metadata Use a different getter for a token_metadata& that may be changed so we can better synchronize readers and writers of token_metadata and eventually allow them to yield in asynchronous loops. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-20 16:20:34 +03:00
Nadav Har'El	7e01ae089e	cdc: avoid including cdc/cdc_options.hh everywhere Before this patch, modifying cdc/cdc_options.hh required recompiling 264 source files. This is because this header file was included by a couple other header files - most notably schema.hh, where a forward declaration would have been enough. Only the handful of source files which really need to access the CDC options should include "cdc/cdc_options.hh" directly. After this patch, modifying cdc/cdc_options.hh requires only 6 source files to be recompiled. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200813070631.180192-1-nyh@scylladb.com>	2020-08-16 14:41:47 +03:00
Pavel Emelyanov	757a7145b9	headers: Remove mutation.hh from trace_state.hh Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-17 17:40:23 +03:00
Calle Wilund	331aa7c501	cdc: Add "is_cdc_metacolumn_name" predicate To sift column names	2020-07-15 08:10:23 +00:00
Calle Wilund	8a728ce618	cdc: Add get_base_table helper	2020-07-15 08:10:23 +00:00
Calle Wilund	8f462e8606	CDC::log: Add `base_name` helper To extract base table name from CDC log table name.	2020-07-15 08:10:23 +00:00
Kamil Braun	013330199d	cdc/storage_proxy: keep cdc_service alive in storage_proxy operations storage_proxy is never deinitialized, so it may have still used cdc_service after its destructor was called. This fixes the problem by cdc_service inheriting from async_sharded_service and storage_proxy calling shared_from_this on the service whenever it uses it. cdc_service inherits from async_sharded_service and not simply from enable_shared_from_this, because there might be other services that cdc_service depends on. Assuming that these services are deinitialized after cdc_service (as they should), i.e. after stop() is called on cdc_service, making cdc_service async_sharded_service will keep their deinitialization code from being called until all references to cdc_service disappear (async_sharded_service keeps stop() from returning until this happens). Some more improvements should be possible through some refactoring: 1. Make augment_mutation_call a free function, not a member of cdc_service: it doesn't need any state that cdc_service has. db_context can be passed down from storage_proxy when it calls the function. 2. Remove the storage_proxy -> cdc_service reference. storage_proxy only needs augment_mutation_call, which would not be a part of the service. This would also get rid of the proxy -> cdc -> proxy reference cycle that we have now, and would allow storage_proxy to be safely deinitialized after cdc_service. 3. Maybe we could even remove the cdc_service -> storage_proxy reference. Is it really needed?	2020-06-08 13:25:51 +03:00
Juliusz Stasiewicz	c70311f73e	cdc: CL for preimage select is calculated from base write CL CL of LOCAL_QUORUM used to be hardcoded into CDC preimage query and led to an error when number of replicas was lower than CL would require. The solution here is to link the CLs of writes to base table with the CLs of CDC reads, so the client will get the (limited) control over the consistency of preimage SELECTs (instead of getting error every time). The algorithm is as follows: 1. If write that caused CDC activity was done with CL = ANY, then do preimage read with CL = ONE. 2. If write that caused CDC activity was done with CL = ALL, then do preimage read with CL = QUORUM. 3. SERIAL and LOCAL_SERIAL writes cause preimage read with QUORUM and LOCAL_QUORUM, respectively. 4. In other cases do preimage read with the same CL as base write.	2020-04-21 14:33:36 +02:00
Piotr Dulikowski	5a5cc57878	cdc: create an operation_result_tracker object An `operation_result_tracker` object is now returned as a second return value from the `augment_mutation_call` function.	2020-03-23 14:05:25 +01:00
Piotr Dulikowski	59727fb34b	cdc: remove result_callback The `result_callback` was a callback returned by `augment_mutation_call` that was supposed to be used in the CDC postimage implementation. Because CDC postimage was implemented without using this callback, and currently a no-op function is always returned, this callback can safely be removed.	2020-03-19 14:55:07 +02:00
Nadav Har'El	35d95d6887	merge: Add postimage implementation Merged pull request https://github.com/scylladb/scylla/pull/5996 from Calle Wilund: Fixes #4992 Implements post-image support by synthesizing it from pre-image + delta. Post-image data differs from the delta data in two ways: 1.) It merges non-atomics into an actual result value 2.) It contains all columns of the row, not just those affected by the update. For a non-atomic field, the post-image value of a column is either the pre-image or the delta (maybe null) Tested by adding post-image checks to pre-image test and collection/udt tests	2020-03-16 13:42:07 +02:00
Calle Wilund	0a3383c090	cdc: Add postimage implementation Fixes #4992 Implements post-image support by synthesizing it from pre-image + delta. Post-image data differs from the delta data in two ways: 1.) It merges non-atomics into an actual result value 2.) It contains _all_ columns of the row, not just those affected by the update. For a non-atomic field, the post-image value of a column is either the pre-image or the delta (maybe null) Tested by adding post-image checks to pre-image test and collection/udt tests	2020-03-16 09:21:06 +00:00
Piotr Dulikowski	b1e8170bf9	cdc: add tracing Adds information about the stages of CDC mutation augmentation to tracing sessions.	2020-03-15 11:54:10 +01:00
Kamil Braun	3200d415da	cdc: use a single timeuuid value for a batch of changes If a batch update is performed with a sequence of changes with a single timestamp, they will now show up in CDC with a single timeuuid in the `time` column, distinguished by different `batch_seq_no` values. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-05 12:32:57 +01:00
Calle Wilund	ed0d1c5fe2	cdc: Break up data column tuple According to "new" spec: Data column is now pure frozen original type. If column is deleted (set to null), a metadata column cdc$deleted_<name> is set to true, to distinguish null column == not involved in row operation For non-atomic collections, a cdc$deleted_elements_<name> column is added, and when removing elements from collection this is where they are shown. For non-atomic assign, the "cdc$deleted_<name>" is true, and <name> is set to new value. column_op removed.	2020-03-03 08:52:20 +00:00
Calle Wilund	1085860c62	cdc: Rename metadata and data columns according to new spec Also use transformation methods for names in all code + tests to make switching again easier	2020-03-02 09:34:51 +00:00
Juliusz Stasiewicz	cf24ae86f3	cdc: distinguishing update from insert When incoming mutation contains live row marker the `operation` is described as "insert", not as an "update". Also, I extended the test case "test_row_delete" with one insert, which is expected to log different value of `operation` than update or delete. Renamed the test case accordingly. Test cases that relied on "update" being the same as "insert" are updated accordingly (`test_pre_image_logging`, `test_cdc_across_shards`, `test_add_columns`). Fixes #5723	2020-03-01 17:50:08 +02:00
Piotr Dulikowski	82a2bdf39f	cdc: distinguish open and closed ranges for range delete This patch causes inclusive and exclusive range deletes to be distinguished in cdc log. Previously, operations `range_delete_start` and `range_delete_end` were used for both inclusive and exclusive bounds in range deletes. Now, old operations were renamed to `range_delete_*_inclusive`, and for exclusive deletes, new operations `range_delete_*_exclusive` are used. Tests: unit(dev)	2020-02-20 11:39:06 +01:00
Piotr Dulikowski	6fe4f9ded8	cdc: restrict permissions on _scylla_cdc_log tables Disallows DROP permission on CDC log tables.	2020-02-10 15:40:48 +01:00
Piotr Jastrzebski	97262bec82	cdc: remove partitioner from db_context partitioner from cdc::db_context is no longer used so it can be removed. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-06 08:00:01 +01:00
Kamil Braun	bd42b10df1	cdc: rename cdc/cdc.{hh,cc} to cdc/log.{hh,cc} To increase modularity, making it easier to find what is where and maintain. The 'log' module (cdc/log.{hh,cc}) is responsible for updating CDC log tables when base table writes are performed. The 'generation' module (cdc/generation.{hh,cc}) handles stream generation changes in response to topology change events. cdc/metadata.{hh,cc} contains a helper class which holds the currently used generation of streams. It is used by both aforementioned modules: 'log' queries it, while 'generation' updates it.	2020-01-30 11:10:39 +01:00
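
The four rules listed in commit c70311f73e (preimage read CL derived from the base write CL) can be sketched as a small lookup table. This is an illustrative Python sketch, not ScyllaDB's C++ code: the `ConsistencyLevel` enum and the `preimage_read_cl` function are hypothetical names introduced here, and only the mapping itself is taken from the commit message.

```python
from enum import Enum


class ConsistencyLevel(Enum):
    """Standard CQL consistency levels (hypothetical enum for this sketch)."""
    ANY = "ANY"
    ONE = "ONE"
    TWO = "TWO"
    THREE = "THREE"
    QUORUM = "QUORUM"
    ALL = "ALL"
    LOCAL_QUORUM = "LOCAL_QUORUM"
    EACH_QUORUM = "EACH_QUORUM"
    SERIAL = "SERIAL"
    LOCAL_SERIAL = "LOCAL_SERIAL"
    LOCAL_ONE = "LOCAL_ONE"


# Rules 1-3 from the commit message: write CLs whose preimage read CL differs.
_REMAPPED = {
    ConsistencyLevel.ANY: ConsistencyLevel.ONE,
    ConsistencyLevel.ALL: ConsistencyLevel.QUORUM,
    ConsistencyLevel.SERIAL: ConsistencyLevel.QUORUM,
    ConsistencyLevel.LOCAL_SERIAL: ConsistencyLevel.LOCAL_QUORUM,
}


def preimage_read_cl(write_cl: ConsistencyLevel) -> ConsistencyLevel:
    """Return the CL for the preimage SELECT given the base write CL.

    Rule 4: any write CL not listed in _REMAPPED is reused unchanged.
    """
    return _REMAPPED.get(write_cl, write_cl)
```

For example, `preimage_read_cl(ConsistencyLevel.LOCAL_QUORUM)` falls through to rule 4 and returns the write CL unchanged, while `preimage_read_cl(ConsistencyLevel.ANY)` returns `ConsistencyLevel.ONE`.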

34 Commits
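
Several commits above add name helpers (`is_log_name()` in f8a8fe41d6, `base_name` in 8f462e8606), and 6fe4f9ded8 refers to log tables as `_scylla_cdc_log` tables. A minimal Python sketch of how such suffix-based helpers could fit together follows. It assumes the log table name is the base table name plus the `_scylla_cdc_log` suffix; the function bodies and signatures are hypothetical and do not reproduce the real declarations in cdc/log.hh, which this log does not show.

```python
# Assumption: a CDC log table is named "<base table>" + this suffix.
LOG_SUFFIX = "_scylla_cdc_log"


def log_name(base_table: str) -> str:
    """Build the (assumed) CDC log table name for a base table."""
    return base_table + LOG_SUFFIX


def is_log_name(table: str) -> bool:
    """Check whether a table name looks like a CDC log table name."""
    # Require a non-empty base part in front of the suffix.
    return table.endswith(LOG_SUFFIX) and len(table) > len(LOG_SUFFIX)


def base_name(log_table: str) -> str:
    """Extract the base table name from a CDC log table name."""
    if not is_log_name(log_table):
        raise ValueError(f"not a CDC log table name: {log_table!r}")
    return log_table[: -len(LOG_SUFFIX)]
```

Note that a name check alone cannot tell a real log table from an ordinary table that happens to end with the suffix, which may be why 0fd00d7016 passes a database argument to `is_log_for_some_table`.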