Commit Graph

34 Commits

Author SHA1 Message Date
Kefu Chai
6c06751640 cdc: not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16725
2024-01-11 09:13:37 +02:00
Petr Gusev
7b55ccbd8e token_metadata: drop the template
Replace token_metadata2 -> token_metadata,
making token_metadata a non-template again.

No behavior changes, just compilation fixes.
2023-12-12 23:19:54 +04:00
Petr Gusev
63f64f3303 token_metadata: make it a template with NodeId=inet_address/host_id
NodeId is used in all internal token_metadata data structures, that
previously used inet_address. We choose topology::key_kind based
on the value of the template parameter.

generic_token_metadata::update_topology overload with host_id
parameter is added to make update_topology_change_info work,
it now uses NodeId as a parameter type.

topology::remove_endpoint(host_id) is added to make
generic_token_metadata::remove_endpoint(NodeId) work.

pending_endpoints_for and endpoints_for_reading are just removed - they
are not used and not implemented. The declarations were left by mistake
from a refactoring in which these methods were moved to erm.

generic_token_metadata_base is extracted to contain declarations, common
to both token_metadata versions.

Templates are explicitly instantiated inside token_metadata.cc, since
the implementation is also a template and is not exposed in the header.

There are no other behavioral changes in this commit, just syntax
fixes to make token_metadata a template.
2023-12-11 12:51:34 +04:00
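The template-plus-explicit-instantiation arrangement described in the commit above can be sketched roughly as follows. Every name ending in `_sketch` is a hypothetical stand-in, and the way `key_kind` is derived from the template parameter is an assumption based on the message, not the actual Scylla code.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-ins for the two node identifier types.
struct inet_address_sketch { std::string value; };
struct host_id_sketch { std::string value; };

enum class key_kind { inet_address, host_id };

// "Header" part: only declarations are visible to users of the class.
template <typename NodeId>
class generic_token_metadata_sketch {
    static_assert(std::is_same_v<NodeId, inet_address_sketch> ||
                  std::is_same_v<NodeId, host_id_sketch>,
                  "NodeId must be one of the two supported identifier types");
    std::map<long, NodeId> _token_to_endpoint;  // internal data uses NodeId
public:
    // Assumed rule: the key kind follows the template parameter.
    static constexpr key_kind kind = std::is_same_v<NodeId, host_id_sketch>
        ? key_kind::host_id : key_kind::inet_address;
    void update_normal_token(long token, NodeId owner);
    const NodeId* get_endpoint(long token) const;
};

// "Source file" part: member definitions that the header does not expose.
template <typename NodeId>
void generic_token_metadata_sketch<NodeId>::update_normal_token(long token, NodeId owner) {
    _token_to_endpoint.insert_or_assign(token, std::move(owner));
}

template <typename NodeId>
const NodeId* generic_token_metadata_sketch<NodeId>::get_endpoint(long token) const {
    auto it = _token_to_endpoint.find(token);
    return it == _token_to_endpoint.end() ? nullptr : &it->second;
}

// Explicit instantiations: the only two versions that other files can link against.
template class generic_token_metadata_sketch<inet_address_sketch>;
template class generic_token_metadata_sketch<host_id_sketch>;

inline void usage_example_token_metadata() {
    generic_token_metadata_sketch<host_id_sketch> tm;
    tm.update_normal_token(1, host_id_sketch{"h1"});
    assert(tm.get_endpoint(1)->value == "h1");
    assert(tm.get_endpoint(2) == nullptr);
}
```

Keeping the definitions out of the header means that any third instantiation would fail at link time, which is the usual trade-off of this pattern.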
Botond Dénes
f8a8fe41d6 cdc/log.hh: expose is_log_name()
Allow outside code to use it to determine whether a table is cdc or not.
This is currently the most reliable method if the custom partitioner is
not set on the schema of the investigated table.
2022-06-10 10:57:12 +03:00
Avi Kivity
fcb8d040e8 treewide: use Software Package Data Exchange (SPDX) license identifiers
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.

Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.

The changes were applied mechanically with a script, except to
licenses/README.md.

Closes #9937
2022-01-18 12:15:18 +01:00
Avi Kivity
bbad8f4677 replica: move ::database, ::keyspace, and ::table to replica namespace
Move replica-oriented classes to the replica namespace. The main
classes moved are ::database, ::keyspace, and ::table, but a few
ancillary classes are also moved. There are certainly classes that
should be moved but aren't (like distributed_loader) but we have
to start somewhere.

References are adjusted treewide. In many cases, it is obvious that
a call site should not access the replica (but the data_dictionary
instead), but that is left for separate work.

scylla-gdb.py is adjusted to look for both the new and old names.
2022-01-07 12:04:38 +02:00
Pavel Emelyanov
0fd00d7016 cdc: Add database argument to is_log_for_some_table
All callers have been patched already. This argument can now
be used to replace get_local_storage_proxy().get_db().local()
call.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-08-27 14:07:26 +03:00
Avi Kivity
a55b434a2b treewide: extend copyright statements to present day 2021-06-06 19:18:49 +03:00
Pavel Emelyanov
cc813ef0dd cdc: Remove db_context::builder
Right now the builder is just an opaque transfer between cdc_service
constructor args and cdc_service's db_context constructor args.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-04-29 22:46:57 +03:00
Pavel Emelyanov
3a7ca647af cdc: Provide migration notifier right at once
The only way db_context's migration notifier reference is set up
is via cdc_service->db_context::builder->.build chain of calls.
Since the builder's notifier optional reference is always
disengaged (the .with_migration_notifier is removed by previous
patch) the only possible notifier reference there is from the
storage service which, in turn, is the same as in main.cc.

That said, push the notifier reference onto db_context directly.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-04-29 22:40:24 +03:00
Pavel Emelyanov
421a514c30 cdc: Remove db_context::builder::with_migration_notifier
It's unused, and removing it makes the next patch simpler.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-04-29 22:39:12 +03:00
Kamil Braun
e2f03e4aba cdc: move (most of) CDC generation management code to the new service
Currently all management of CDC generations happens in storage_service,
which is a big ball of mud that does many unrelated things.

Previous commits have introduced a new service for managing CDC
generations. This code moves most of the relevant code to this new
service.

However, some part still remains in storage_service: the bootstrap
procedure, which happens inside storage_service, must also do some
initialization regarding CDC generations, for example: on restart it
must retrieve the latest known generation timestamp from disk; on
bootstrap it must create a new generation and announce it to other
nodes. The order of these operations w.r.t the rest of the startup
procedure is important, hence the startup procedure is the only right
place for them.

Still, what remains in storage_service is a small part of the entire
CDC generation management logic; most of it has been moved to the
new service. This includes listening for generation changes and
updating the data structures for performing CDC log writes (cdc::metadata).
Furthermore these functions now return futures (and are internally
coroutines), where previously they required a seastar::async context.
2021-02-26 12:06:12 +01:00
Benny Halevy
c60da2e90d cdc: remove _token_metadata from db_context
1. It's unused since cbe510d1b8
2. It's unsafe to keep a reference to token_metadata&
potentially across yield points.

The higher-level motivation is to make
storage_service::get_token_metadata() private so we
can control better how it's used.

For cdc, if the token_metadata is going to be needed
in the future, it'd be better to get it from
db_context::_proxy.get_token_metadata_ptr().

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20201213162351.52224-2-bhalevy@scylladb.com>
2020-12-13 18:32:17 +02:00
Benny Halevy
2f7c529c1c storage_service: separate get_mutable_token_metadata
Use a different getter for a token_metadata& that
may be changed so we can better synchronize readers
and writers of token_metadata and eventually allow
them to yield in asynchronous loops.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-20 16:20:34 +03:00
Nadav Har'El
7e01ae089e cdc: avoid including cdc/cdc_options.hh everywhere
Before this patch, modifying cdc/cdc_options.hh required recompiling 264
source files. This is because this header file was included by a couple
other header files - most notably schema.hh, where a forward declaration
would have been enough. Only the handful of source files which really
need to access the CDC options should include "cdc/cdc_options.hh" directly.

After this patch, modifying cdc/cdc_options.hh requires only 6 source files
to be recompiled.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200813070631.180192-1-nyh@scylladb.com>
2020-08-16 14:41:47 +03:00
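The forward-declaration technique mentioned in the commit above can be illustrated in a single file as follows. `schema_sketch` and the shape of `cdc::options` shown here are hypothetical; the point is only that a class that stores or passes a pointer to `cdc::options` does not need its full definition.

```cpp
#include <cassert>

// --- What a widely included header needs ---
namespace cdc { class options; }  // forward declaration instead of an #include

class schema_sketch {  // hypothetical stand-in, not the real schema class
    const cdc::options* _cdc_options = nullptr;
public:
    // Pointers to an incomplete type are fine in declarations and members.
    void set_cdc_options(const cdc::options* opts) { _cdc_options = opts; }
    const cdc::options* cdc_options() const { return _cdc_options; }
};

// --- What only a handful of source files need: the full definition ---
namespace cdc {
class options {  // hypothetical contents
    bool _enabled;
public:
    explicit options(bool enabled) : _enabled(enabled) {}
    bool enabled() const { return _enabled; }
};
}  // namespace cdc

inline void usage_example_forward_declaration() {
    cdc::options opts(true);
    schema_sketch s;
    s.set_cdc_options(&opts);
    assert(s.cdc_options()->enabled());  // member access requires the full type
}
```

Only code that calls members of `cdc::options`, such as the usage function, has to see the complete definition, which is why the set of files to recompile shrinks.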
Pavel Emelyanov
757a7145b9 headers: Remove mutation.hh from trace_state.hh
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-07-17 17:40:23 +03:00
Calle Wilund
331aa7c501 cdc: Add "is_cdc_metacolumn_name" predicate
To sift column names
2020-07-15 08:10:23 +00:00
Calle Wilund
8a728ce618 cdc: Add get_base_table helper 2020-07-15 08:10:23 +00:00
Calle Wilund
8f462e8606 CDC::log: Add base_name helper
To extract base table name from CDC log table name.
2020-07-15 08:10:23 +00:00
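A minimal sketch of name helpers in the spirit of `base_name` and `is_log_name()` follows. It assumes that a log table is named by appending the `_scylla_cdc_log` suffix (which appears elsewhere in this log) to the base table name; the signatures and the `log_name` helper are hypothetical, not the actual cdc/log.hh API.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

// Suffix as it appears in the "_scylla_cdc_log tables" commit in this log.
constexpr std::string_view log_suffix = "_scylla_cdc_log";

// Hypothetical helper: name of the log table for a given base table.
inline std::string log_name(std::string_view base_table) {
    std::string name(base_table);
    name.append(log_suffix.data(), log_suffix.size());
    return name;
}

// True when `table_name` is a non-empty base name followed by the suffix.
constexpr bool is_log_name(std::string_view table_name) {
    return table_name.size() > log_suffix.size()
        && table_name.substr(table_name.size() - log_suffix.size()) == log_suffix;
}

// Base table name for a log table name, or nullopt if it is not a log name.
constexpr std::optional<std::string_view> base_name(std::string_view table_name) {
    if (!is_log_name(table_name)) {
        return std::nullopt;
    }
    return table_name.substr(0, table_name.size() - log_suffix.size());
}

inline void usage_example_log_names() {
    assert(log_name("events") == "events_scylla_cdc_log");
    assert(is_log_name("events_scylla_cdc_log"));
    assert(base_name("events_scylla_cdc_log") == std::string_view("events"));
    assert(!base_name("events").has_value());
}
```

A purely name-based check cannot tell a real log table from an ordinary table that happens to use the same suffix, so it is best treated as a heuristic.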
Kamil Braun
013330199d cdc/storage_proxy: keep cdc_service alive in storage_proxy operations
storage_proxy is never deinitialized, so it may have still used cdc_service
after its destructor was called.

This fixes the problem by cdc_service inheriting from
async_sharded_service and storage_proxy calling shared_from_this on
the service whenever it uses it.

cdc_service inherits from async_sharded_service and not simply from
enable_shared_from_this, because there might be other services that
cdc_service depends on. Assuming that these services are
deinitialized after cdc_service (as they should be), i.e. after stop() is
called on cdc_service, making cdc_service async_sharded_service will
keep their deinitialization code from being called until all references
to cdc_service disappear (async_sharded_service keeps stop() from
returning until this happens).

Some more improvements should be possible through some refactoring:
1. Make augment_mutation_call a free function, not a member of
   cdc_service: it doesn't need any state that cdc_service has.
   db_context can be passed down from storage_proxy when it calls the
   function.
2. Remove the storage_proxy -> cdc_service reference. storage_proxy
   only needs augment_mutation_call, which would not be a part of the
   service. This would also get rid of the proxy -> cdc -> proxy
   reference cycle that we have now, and would allow storage_proxy to be
   safely deinitialized after cdc_service.
3. Maybe we could even remove the cdc_service -> storage_proxy
   reference. Is it really needed?
2020-06-08 13:25:51 +03:00
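The keep-alive idea in the commit above can be illustrated with the standard library alone. This sketch uses `std::enable_shared_from_this` rather than Seastar's `async_sharded_service`, so it shows only the reference-holding part of the fix, not the deferred `stop()`; `service_sketch` and `start_operation` are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Hypothetical service whose lifetime must cover every in-flight operation.
class service_sketch : public std::enable_shared_from_this<service_sketch> {
public:
    int augment(int value) const { return value * 2; }
};

// The caller only has a plain reference, as a proxy would. The operation
// takes its own shared reference, so the service outlives the operation.
inline std::function<int()> start_operation(service_sketch& service, int value) {
    auto keep_alive = service.shared_from_this();
    return [keep_alive, value] { return keep_alive->augment(value); };
}

inline void usage_example_keep_alive() {
    auto service = std::make_shared<service_sketch>();
    std::weak_ptr<service_sketch> watcher = service;
    auto op = start_operation(*service, 21);
    service.reset();             // the owner lets go of the service
    assert(!watcher.expired());  // still alive: the pending operation holds it
    assert(op() == 42);
    op = nullptr;                // operation finished and released
    assert(watcher.expired());   // now the service is destroyed
}
```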
Juliusz Stasiewicz
c70311f73e cdc: CL for preimage select is calculated from base write CL
A CL of LOCAL_QUORUM used to be hardcoded into the CDC preimage query,
which led to an error when the number of replicas was lower than the CL
required. The solution here is to link the CLs of writes
to the base table with the CLs of CDC reads, so the client gets
(limited) control over the consistency of preimage SELECTs
(instead of getting an error every time).

The algorithm is as follows:
1. If write that caused CDC activity was done with CL = ANY,
  then do preimage read with CL = ONE.
2. If write that caused CDC activity was done with CL = ALL,
  then do preimage read with CL = QUORUM.
3. SERIAL and LOCAL_SERIAL writes cause preimage read with QUORUM
  and LOCAL_QUORUM, respectively.
4. In other cases do preimage read with the same CL as base write.
2020-04-21 14:33:36 +02:00
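The four rules listed in the commit above translate directly into a small mapping function. The enum below is a stand-in for illustration and is not Scylla's own consistency level type; `preimage_read_cl` is a hypothetical name.

```cpp
// Stand-in enum; only the levels named in the commit message matter here.
enum class consistency_level {
    any, one, two, three, quorum, all,
    local_quorum, each_quorum, serial, local_serial, local_one,
};

// Maps the CL of the base table write to the CL of the preimage read.
constexpr consistency_level preimage_read_cl(consistency_level write_cl) {
    switch (write_cl) {
    case consistency_level::any:          return consistency_level::one;           // rule 1
    case consistency_level::all:          return consistency_level::quorum;        // rule 2
    case consistency_level::serial:       return consistency_level::quorum;        // rule 3
    case consistency_level::local_serial: return consistency_level::local_quorum;  // rule 3
    default:                              return write_cl;                         // rule 4
    }
}

static_assert(preimage_read_cl(consistency_level::any) == consistency_level::one,
              "ANY writes read the preimage with ONE");
static_assert(preimage_read_cl(consistency_level::two) == consistency_level::two,
              "other levels are passed through unchanged");
```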
Piotr Dulikowski
5a5cc57878 cdc: create an operation_result_tracker object
An `operation_result_tracker` object is now returned as a second return
value from the `augment_mutation_call` function.
2020-03-23 14:05:25 +01:00
Piotr Dulikowski
59727fb34b cdc: remove result_callback
The `result_callback` was a callback returned by `augment_mutation_call`
that was supposed to be used in the CDC postimage implementation.
Because CDC postimage was implemented without using this callback, and
currently a no-op function is always returned, this callback can safely
be removed.
2020-03-19 14:55:07 +02:00
Nadav Har'El
35d95d6887 merge: Add postimage implementation
Merged pull request https://github.com/scylladb/scylla/pull/5996 from
Calle Wilund:

Fixes #4992

Implements post-image support by synthesizing it from
pre-image + delta.

Post-image data differs from the delta data in two ways:

1.) It merges non-atomics into an actual result value
2.) It contains all columns of the row, not just
those affected by the update.

For a non-atomic field, the post-image value of a column
is either the pre-image or the delta (maybe null)

Tested by adding post-image checks to pre-image test
and collection/udt tests
2020-03-16 13:42:07 +02:00
Calle Wilund
0a3383c090 cdc: Add postimage implementation
Fixes #4992

Implements post-image support by synthesizing it from
pre-image + delta.

Post-image data differs from the delta data in two ways:

1.) It merges non-atomics into an actual result value
2.) It contains _all_ columns of the row, not just
    those affected by the update.

For a non-atomic field, the post-image value of a column
is either the pre-image or the delta (maybe null)

Tested by adding post-image checks to pre-image test
and collection/udt tests
2020-03-16 09:21:06 +00:00
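As a rough illustration of "pre-image + delta", the sketch below overlays a delta on a pre-image for atomic columns only. The `row_image` representation is hypothetical, the pre-image is assumed to already hold all columns of the row, and the merging of non-atomic values mentioned in the message is deliberately not modeled.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical representation: column name -> value, where nullopt means null.
using cell = std::optional<std::string>;
using row_image = std::map<std::string, cell>;

// The delta holds only the columns touched by the update. The post-image
// starts from the pre-image and takes the delta value for every touched
// column, so untouched columns are carried over unchanged.
inline row_image make_postimage(const row_image& preimage, const row_image& delta) {
    row_image postimage = preimage;
    for (const auto& [column, value] : delta) {
        postimage[column] = value;  // may be null if the column was deleted
    }
    return postimage;
}

inline void usage_example_postimage() {
    row_image pre{{"a", cell("1")}, {"b", cell("2")}};
    row_image delta{{"b", cell()}, {"c", cell("3")}};
    row_image post = make_postimage(pre, delta);
    assert(post.size() == 3);
    assert(post.at("a") == cell("1"));   // untouched column kept
    assert(!post.at("b").has_value());   // set to null by the update
    assert(post.at("c") == cell("3"));   // newly written column
}
```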
Piotr Dulikowski
b1e8170bf9 cdc: add tracing
Adds information about the stages of CDC mutation augmentation to
tracing sessions.
2020-03-15 11:54:10 +01:00
Kamil Braun
3200d415da cdc: use a single timeuuid value for a batch of changes
If a batch update is performed with a sequence of changes with a single
timestamp, they will now show up in CDC with a single timeuuid in the
`time` column, distinguished by different `batch_seq_no` values.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-03-05 12:32:57 +01:00
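The grouping behaviour described above might look like the following sketch. Plain integers stand in for timeuuid values, and both the zero-based numbering of `batch_seq_no` and the function name are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

struct log_row_id {
    std::uint64_t time;  // stand-in for the timeuuid in the `time` column
    int batch_seq_no;    // position within the group that shares `time`
};

// Takes the write timestamps of the changes in a batch, in order, and gives
// changes with equal timestamps the same `time` and increasing batch_seq_no.
inline std::vector<log_row_id> assign_row_ids(const std::vector<std::int64_t>& timestamps) {
    struct group { std::uint64_t time; int next_seq; };
    std::map<std::int64_t, group> groups;
    std::uint64_t next_time = 1;  // stand-in for a timeuuid generator
    std::vector<log_row_id> ids;
    ids.reserve(timestamps.size());
    for (std::int64_t ts : timestamps) {
        auto [it, inserted] = groups.try_emplace(ts, group{next_time, 0});
        if (inserted) {
            ++next_time;  // a new timestamp consumed a fresh `time` value
        }
        ids.push_back({it->second.time, it->second.next_seq++});
    }
    return ids;
}

inline void usage_example_batch_ids() {
    auto ids = assign_row_ids({10, 10, 20});
    assert(ids[0].time == ids[1].time && ids[1].batch_seq_no == 1);
    assert(ids[2].time != ids[0].time && ids[2].batch_seq_no == 0);
}
```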
Calle Wilund
ed0d1c5fe2 cdc: Break up data column tuple
According to "new" spec:

Data column is now pure frozen original type.

If column is deleted (set to null), a metadata column
cdc$deleted_<name> is set to true, to distinguish
null column == not involved in row operation

For non-atomic collections, a cdc$deleted_elements_<name>
column is added, and when removing elements from collection
this is where they are shown.

For non-atomic assign, the "cdc$deleted_<name>" is true,
and <name> is set to new value.

column_op removed.
2020-03-03 08:52:20 +00:00
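The column naming described in the commit above can be sketched as a few string helpers. The `cdc$deleted_` and `cdc$deleted_elements_` prefixes come from the message; the helper names and the prefix-based predicate are hypothetical and only show how metadata columns could be told apart from data columns.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Assumed common prefix of CDC metadata columns, inferred from the names above.
constexpr std::string_view meta_prefix = "cdc$";

// Name of the boolean column that marks <column> as deleted (set to null).
inline std::string deleted_column_name(std::string_view column) {
    return std::string("cdc$deleted_").append(column.data(), column.size());
}

// Name of the column that lists elements removed from a non-atomic collection.
inline std::string deleted_elements_column_name(std::string_view column) {
    return std::string("cdc$deleted_elements_").append(column.data(), column.size());
}

// Hypothetical predicate: treats every name starting with "cdc$" as metadata.
constexpr bool has_cdc_prefix(std::string_view name) {
    return name.substr(0, meta_prefix.size()) == meta_prefix;
}

inline void usage_example_column_names() {
    assert(deleted_column_name("v") == "cdc$deleted_v");
    assert(deleted_elements_column_name("tags") == "cdc$deleted_elements_tags");
    assert(has_cdc_prefix(deleted_column_name("v")));
    assert(!has_cdc_prefix("v"));
}
```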
Calle Wilund
1085860c62 cdc: Rename metadata and data columns according to new spec
Also use transformation methods for names in all code + tests
to make switching again easier
2020-03-02 09:34:51 +00:00
Juliusz Stasiewicz
cf24ae86f3 cdc: distinguishing update from insert
When an incoming mutation contains a live row marker, the `operation` is
described as "insert", not as "update".

Also, I extended the test case "test_row_delete" with one insert,
which is expected to log a different value of `operation` than update
or delete. Renamed the test case accordingly.

Test cases that relied on "update" being the same as "insert" are
updated accordingly (`test_pre_image_logging`, `test_cdc_across_shards`,
`test_add_columns`).

Fixes #5723
2020-03-01 17:50:08 +02:00
Piotr Dulikowski
82a2bdf39f cdc: distinguish open and closed ranges for range delete
This patch causes inclusive and exclusive range deletes to be
distinguished in cdc log. Previously, operations `range_delete_start`
and `range_delete_end` were used for both inclusive and exclusive bounds
in range deletes. Now, the old operations are renamed to
`range_delete_*_inclusive`, and for exclusive deletes, the new operations
`range_delete_*_exclusive` are used.

Tests: unit(dev)
2020-02-20 11:39:06 +01:00
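A small sketch of how the four operation names could be selected follows. The enumerator names expand the `range_delete_*_inclusive` and `range_delete_*_exclusive` patterns from the message with `start` and `end`; the enum layout, `bound_side`, and the selector function are hypothetical.

```cpp
// Hypothetical enum covering only the range delete operations.
enum class operation {
    range_delete_start_inclusive,
    range_delete_start_exclusive,
    range_delete_end_inclusive,
    range_delete_end_exclusive,
};

enum class bound_side { start, end };

// Picks the operation logged for one bound of a range delete.
constexpr operation range_delete_operation(bound_side side, bool inclusive) {
    if (side == bound_side::start) {
        return inclusive ? operation::range_delete_start_inclusive
                         : operation::range_delete_start_exclusive;
    }
    return inclusive ? operation::range_delete_end_inclusive
                     : operation::range_delete_end_exclusive;
}

// Example: a delete restricted to `ck > 1 AND ck <= 5` has an exclusive
// start bound and an inclusive end bound.
static_assert(range_delete_operation(bound_side::start, false)
              == operation::range_delete_start_exclusive, "ck > 1");
static_assert(range_delete_operation(bound_side::end, true)
              == operation::range_delete_end_inclusive, "ck <= 5");
```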
Piotr Dulikowski
6fe4f9ded8 cdc: restrict permissions on _scylla_cdc_log tables
Disallows DROP permission on CDC log tables.
2020-02-10 15:40:48 +01:00
Piotr Jastrzebski
97262bec82 cdc: remove partitioner from db_context
partitioner from cdc::db_context is no longer used
so it can be removed.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-02-06 08:00:01 +01:00
Kamil Braun
bd42b10df1 cdc: rename cdc/cdc.{hh,cc} to cdc/log.{hh,cc}
To increase modularity, making it easier to find what is where and
maintain.

The 'log' module (cdc/log.{hh,cc}) is responsible for updating CDC log
tables when base table writes are performed.

The 'generation' module (cdc/generation.{hh,cc}) handles stream
generation changes in response to topology change events.

cdc/metadata.{hh,cc} contains a helper class which holds the currently
used generation of streams. It is used by both aforementioned modules:
'log' queries it, while 'generation' updates it.
2020-01-30 11:10:39 +01:00