Commit Graph

165 Commits

Author SHA1 Message Date
Calle Wilund
05851578d4 alternator::streams: Report streams as not ready until CDC stream id:s are available
Refs #6864

When booting a clean scylla, CDC stream ID:s will not be available until
a n*ring delay time period has passed. Before this, writing to a CDC
enabled table will fail hard.
For alternator (and its tests), we can report the stream(s) for tables as not yet
available (ENABLING) until such time as id:s are
computed.

v2:
* Keep storage service ref in executor
2020-08-03 20:34:15 +03:00
Nadav Har'El
2dcb6294da merge: cdc: New delta modes: off, keys, full
Merged pull request https://github.com/scylladb/scylla/pull/6914
by Juliusz Stasiewicz:

The goal is to have finer control over CDC "delta" rows, i.e.:

    disable them totally (mode off);
    record only base PK+CK columns (mode keys);
    make them behave as usual (mode full, default).

The editing of log rows is performed at the stage of finishing CDC mutation.

Fixes #6838

  tests: Added CQL test for `delta mode`
  cdc: Implementations of `delta_mode::off/keys`
  cdc: Infrastructure for controlling `delta_mode`
2020-08-03 14:10:15 +03:00
Botond Dénes
92a7b16cba query: read_command: add max_result_size
This field will replace max size which is currently passed once per
established rpc connection via the CLIENT_ID verb and stored as an
auxiliary value on the client_info. For now it is unused, but we update
all sites creating a read command to pass the correct value to it. In the
next patch we will phase out the old max size and use this field to pass
max size on each verb instead.
2020-07-28 18:00:29 +03:00
Botond Dénes
8992bcd1f8 query: read_command: use tagged ints for limit ctor params
The convenience constructor of read_command now has two integer
parameters next to each other. In the next patch we intend to add another
one. This is a recipe for disaster, so to avoid mistakes this patch
converts these parameters to tagged integers. This makes sure callers
pass what they meant to pass. As a matter of fact, while fixing up
call-sites, I already found several ones passing `query::max_partitions`
to the `row_limit` parameter. No harm done yet, as
`query::max_partitions` == `query::max_rows` but this shows just how
easy it is to mix up parameters with the same type.
2020-07-28 18:00:29 +03:00
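
The tagged-integer idea from 8992bcd1f8 can be illustrated with a minimal sketch. The original change is in C++, where passing the wrong tag is a compile-time error; the Python below only approximates that with a runtime check. The names `RowLimit`, `PartitionLimit` and `ReadCommand` are hypothetical stand-ins, not the actual Scylla types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RowLimit:
    """Hypothetical tagged wrapper around a row limit."""
    value: int


@dataclass(frozen=True)
class PartitionLimit:
    """Hypothetical tagged wrapper around a partition limit."""
    value: int


class ReadCommand:
    """Toy stand-in for read_command; only the two limits are modelled."""

    def __init__(self, row_limit: RowLimit, partition_limit: PartitionLimit):
        # Reject swapped or untagged arguments instead of silently accepting
        # two plain integers in the wrong order.
        if not isinstance(row_limit, RowLimit):
            raise TypeError("row_limit must be a RowLimit")
        if not isinstance(partition_limit, PartitionLimit):
            raise TypeError("partition_limit must be a PartitionLimit")
        self.row_limit = row_limit.value
        self.partition_limit = partition_limit.value
```

With plain integers, `ReadCommand(10, 100)` and `ReadCommand(100, 10)` would both be accepted; with tags, a swapped call fails immediately.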
Juliusz Stasiewicz
9e4247090f cdc: Implementations of delta_mode::off/keys
At the stage of `finish`ing CDC mutation, deltas are removed (mode
`off`) or edited to keep only PK+CK of the base table (mode `keys`).

Fixes #6838
2020-07-27 19:05:47 +02:00
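
A simplified sketch of what the three delta modes in 9e4247090f do to a single delta row. It assumes a row can be modelled as a plain dict of column name to value and ignores the CDC metadata columns that real log rows carry; `DeltaMode` and `apply_delta_mode` are hypothetical names.

```python
from enum import Enum
from typing import Optional


class DeltaMode(Enum):
    OFF = "off"
    KEYS = "keys"
    FULL = "full"


def apply_delta_mode(delta_row: dict, key_columns: set,
                     mode: DeltaMode) -> Optional[dict]:
    """Return the delta row as it would be kept under the given mode."""
    if mode is DeltaMode.OFF:
        # Mode off: the delta row is removed entirely.
        return None
    if mode is DeltaMode.KEYS:
        # Mode keys: keep only the base table's PK and CK columns.
        return {c: v for c, v in delta_row.items() if c in key_columns}
    # Mode full (default): the row is left untouched.
    return dict(delta_row)
```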
Juliusz Stasiewicz
c05128d217 cdc: Infrastructure for controlling delta_mode
The goal is to have finer control over CDC "delta" rows, i.e.:
- disable them totally (mode `off`);
- record only PK+CK (mode `keys`);
- make them behave as usual (mode `full`, default).

This commit adds the necessary infrastructure to `cdc_options`.
2020-07-27 19:00:06 +02:00
Kamil Braun
12e2891c60 cdc: if ring_delay == 0, don't add delay to newly created generation
If ring_delay == 0, something fishy is going on, e.g. single-node tests
are being performed. In this case we want the CDC generation to start
operating immediately. There is no need to wait until it propagates to
the cluster.

You should not use ring_delay == 0 in production.

Fixes https://github.com/scylladb/scylla/issues/6864.
2020-07-22 16:06:09 +03:00
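
The rule in 12e2891c60 can be sketched as below. How much delay is normally added is not stated in the commit message, so the `delay_factor` multiplier is purely an assumption for illustration, as are the function name and the millisecond units.

```python
def generation_start_timestamp(now_ms: int, ring_delay_ms: int,
                               delay_factor: int = 2) -> int:
    """Pick the timestamp at which a new CDC generation starts operating."""
    if ring_delay_ms == 0:
        # Single-node tests and similar setups: start operating immediately.
        return now_ms
    # Otherwise leave time for the generation to propagate to the cluster.
    # The multiplier is an assumed placeholder, not the real formula.
    return now_ms + delay_factor * ring_delay_ms
```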
Pavel Emelyanov
757a7145b9 headers: Remove mutation.hh from trace_state.hh
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-07-17 17:40:23 +03:00
Piotr Dulikowski
e2462bce3b cdc: fix a corner case inside get_base_table
It is legal for a user to create a table with name that has a
_scylla_cdc_log suffix. In such case, the table won't be treated as a
cdc log table, and does not require a corresponding base table to exist.

During refactoring done as a part of initial implementation of
Alternator streams (#6694), `is_log_for_some_table` started throwing
when trying to check a name like `X_scylla_cdc_log` when there was no
table with name `X`. Previously, it just returned false.

The exception originates inside `get_base_table`, which tries to return
the base table schema, not checking for its existence - which may throw.
It makes more sense for this function to return nullptr in such case (it
already does when provided log table name does not have the cdc log
suffix), so this patch adds an explicit check and returns nullptr when
necessary.

A similar oversight happened before (see #5987), so this patch also adds
a comment which explains why existence of `X_scylla_cdc_log` does not
imply existence of `X`.

Fixes: #6852
Refs: #5724, #5987
2020-07-16 16:38:48 +03:00
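
The corner case fixed in e2462bce3b can be modelled roughly as follows. The schema registry is assumed to be a plain dict from table name to schema object, and the real `is_log_for_some_table` may check more than is shown here; only the `_scylla_cdc_log` suffix and the function names come from the commit messages.

```python
from typing import Optional

CDC_LOG_SUFFIX = "_scylla_cdc_log"


def base_name(log_name: str) -> Optional[str]:
    """Strip the CDC log suffix, or return None if it is absent."""
    if not log_name.endswith(CDC_LOG_SUFFIX):
        return None
    return log_name[: -len(CDC_LOG_SUFFIX)]


def get_base_table(tables: dict, log_name: str):
    """Return the base table schema, or None when there is none."""
    name = base_name(log_name)
    if name is None:
        return None
    # Existence of X_scylla_cdc_log does not imply existence of X,
    # so look the base table up without raising when it is missing.
    return tables.get(name)


def is_log_for_some_table(tables: dict, log_name: str) -> bool:
    return get_base_table(tables, log_name) is not None
```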
Calle Wilund
3376209718 cdc::schema: Make extensions explicitly settable from builder
To make non-cql cdc schema options a reality.
2020-07-15 08:21:34 +00:00
Calle Wilund
0158f6473b cdc: Add stream ids structure with time and expiration
For reading the topology tables from within scylla.
2020-07-15 08:10:23 +00:00
Calle Wilund
331aa7c501 cdc: Add "is_cdc_metacolumn_name" predicate
To sift column names
2020-07-15 08:10:23 +00:00
Calle Wilund
8a728ce618 cdc: Add get_base_table helper
2020-07-15 08:10:23 +00:00
Calle Wilund
8f462e8606 CDC::log: Add base_name helper
To extract base table name from CDC log table name.
2020-07-15 08:10:23 +00:00
Piotr Dulikowski
ad811a48bf cdc: fix indentation
2020-07-08 15:36:41 +02:00
Piotr Dulikowski
20b236d27d cdc: don't update partition state when not needed
In some cases, tracking the state of processed rows inside `transformer`
is not needed at all. We don't need to do it if either:

- Preimage and postimage are disabled for the table,
- Only preimage is enabled and we are processing the last timestamp.

This commit disables updating the state in the cases listed above.
2020-07-08 15:36:41 +02:00
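
The two conditions listed in 20b236d27d reduce to a small predicate. This is a sketch of the decision only; the function and parameter names are hypothetical.

```python
def needs_partition_state(preimage: bool, postimage: bool,
                          is_last_timestamp: bool) -> bool:
    """Decide whether the processed-row state must be kept up to date."""
    if not preimage and not postimage:
        # Neither image is generated, so nothing ever reads the state.
        return False
    if preimage and not postimage and is_last_timestamp:
        # No later timestamp remains that would need an updated preimage.
        return False
    return True
```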
Piotr Dulikowski
246f8da6f6 cdc: implement pre/postimage persistence
Moves responsibility for generating pre/postimage rows from the
"process_change" method to "produce_preimage" and "produce_postimage".
This commit actually affects the contents of generated CDC log
mutations.

Added a unit test that verifies more complicated cases with CQL BATCH.
2020-07-08 15:36:41 +02:00
Piotr Dulikowski
24b50ffbc8 cdc: add interface for producing pre/postimages
Introduces new methods to the change_processor interface that will cause
it to produce pre/postimage rows for requested clustering key, or for
static row.

Introduces logic in split.cc responsible for calling pre/postimage
methods of the change_processor interface. This does not have any effect
on generated CDC log mutations yet, because the transformer class has
empty implementations in place of those methods.
2020-07-08 15:36:41 +02:00
Piotr Dulikowski
761c59d92a cdc: load preimage query result into partition state fields
Instead of looking up preimage data directly from the raw preimage query
results, use the raw results to populate current partition state data,
and read directly from the current partition state.
2020-07-08 15:36:41 +02:00
Piotr Dulikowski
946354ee74 cdc: introduce fields for keeping partition state
Introduces data structures that will be used for keeping the current
state of processed rows: _clustering_row_states, and _static_row_state.
2020-07-08 15:36:41 +02:00
Piotr Dulikowski
bb587a93be cdc: rename set_pk_columns -> allocate_new_log_row
The new name better describes what this function does.
2020-07-08 15:36:41 +02:00
Piotr Dulikowski
82ddeb1992 cdc: track batch_no inside transformer
Move tracking of batch_no inside the transformer.
2020-07-08 15:36:41 +02:00
Piotr Dulikowski
7b47f84965 cdc: move cdc$time generation to transformer
Generate the timeuuid on the transformer side, which allows simplifying
the change_processor interface.
2020-07-08 15:36:41 +02:00
Piotr Dulikowski
7691568b0a cdc: move find_timestamp to split.cc
The function is no longer used in log.cc, so instead it is moved to
split.cc.

Removed declaration of the function from the log.hh header, because it
is not used elsewhere - apart from testing code, but it already
declared find_timestamp in the cdc_test.cc file.
2020-07-08 15:36:40 +02:00
Piotr Dulikowski
51d97be0b3 cdc: introduce change_processor interface
This allows for a more refined use of the transformer by the
for_each_change function (now named "process_changes_with_splitting").

The change_processor interface exposes two methods so far:
begin_timestamp, and process_change (previously named "transform").
By separating those two and exposing them, process_changes_with_splitting
can cause the transformer to generate fewer CDC log mutations
- only one for each timestamp in the batch.
2020-07-08 15:36:40 +02:00
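
The two-method interface from 51d97be0b3 might look like this in outline. Changes are assumed to be `(timestamp, payload)` pairs, and `RecordingProcessor` is a hypothetical implementation that collects one bucket per timestamp; the real transformer builds CDC log mutations instead.

```python
from abc import ABC, abstractmethod
from itertools import groupby


class ChangeProcessor(ABC):
    @abstractmethod
    def begin_timestamp(self, ts: int) -> None:
        """Start using ts for the changes that follow."""

    @abstractmethod
    def process_change(self, change) -> None:
        """Handle one change under the current timestamp."""


def process_changes_with_splitting(changes, processor: ChangeProcessor) -> None:
    # Sort by timestamp (stable, so order within a timestamp is preserved),
    # then announce each timestamp once before its changes.
    ordered = sorted(changes, key=lambda c: c[0])
    for ts, group in groupby(ordered, key=lambda c: c[0]):
        processor.begin_timestamp(ts)
        for _, payload in group:
            processor.process_change(payload)


class RecordingProcessor(ChangeProcessor):
    """Hypothetical processor: one output bucket per timestamp."""

    def __init__(self):
        self.mutations = {}
        self._current = None

    def begin_timestamp(self, ts: int) -> None:
        self._current = self.mutations.setdefault(ts, [])

    def process_change(self, change) -> None:
        self._current.append(change)
```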
Piotr Dulikowski
f907cab156 cdc: remove redundant schema arguments from cdc functions
A `mutation` object already has a reference to its schema. It does not
make sense to call functions changed in this commit with a different
schema.
2020-07-08 15:36:40 +02:00
Piotr Dulikowski
fa00ea996a cdc: move management of generated mutations inside transformer
CDC log mutations are now stored inside `transformer`, and only moved to
the final set of mutations at the end of `transformer`'s lifetime.
2020-07-08 15:36:40 +02:00
Piotr Dulikowski
76a323a02d cdc: move preimage result set into a field of transformer
Instead of passing the preimage result set in each `transform` call, it
is now assigned to a field, and `transform` uses that field.
2020-07-08 15:36:40 +02:00
Piotr Dulikowski
79eabc04a8 cdc: keep ts and tuuid inside transformer
Adds a `begin_timestamp` method which tells the `transformer` to start
using the following timestamp and timeuuid when generating new log row
mutations.
2020-07-08 15:36:40 +02:00
Piotr Dulikowski
3c01b3c41d cdc: track touched parts of mutations inside transformer
Moves tracking of the "touched parts" statistics inside the transformer
class.

This commit is the first of multiple commits in this series which move
parts of the state used in CDC log row generation inside the
`transformer` class. There is a lot of state being passed to
`transformer` each time its methods are called, which could be as well
tracked by the `transformer` itself. This will result in a nicer
interface and will allow us to generate less CDC log mutations which
give the same result.
2020-07-08 15:36:40 +02:00
Piotr Dulikowski
027d20c654 cdc: always include preimage for affected rows
This changes the current algorithm so that the preimage row will not be
skipped if the corresponding row was not present in preimage query
results.
2020-07-08 15:36:40 +02:00
Piotr Sarna
4cb79f04b0 treewide: replace libjsoncpp usage with rjson
In order to eventually switch to a single JSON library,
most of the libjsoncpp usage is dropped in favor of rjson.
Unfortunately, one usage still remains:
test/utils/test_repl utility heavily depends on the *exact textual*
format of its output JSON files, so replacing a library results
in all tests failing because of differences in formatting.
It is possible to force rjson to print its documents in the exact
matching format, but that's left for later, since the issue is not
critical. It would be nice though if our test suite compared
JSON documents with a real JSON parser, since there are more
differences - e.g. libjsoncpp keeps children of the object
sorted, while rapidjson uses an unordered data structure.
This change should cause no change in semantics, it strives
just to replace all usage of libjsoncpp with rjson.
2020-07-03 10:27:23 +02:00
Juliusz Stasiewicz
8628ede009 cdc: Fix segfault when stream ID key is too short
When a token is calculated for stream_id, we check that the key is
exactly 16 bytes long. If it's not - `minimum_token` is returned
and client receives empty result.

This used to be the expected behavior for empty keys; now it's
extended to keys of any incorrect length.

Fixes #6570
2020-06-17 18:19:37 +03:00
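
A minimal sketch of the length check described in 8628ede009. The 16-byte requirement comes from the commit message; the layout (two big-endian signed 64-bit integers with the token first) and the `MINIMUM_TOKEN` sentinel are assumptions made for illustration.

```python
import struct

# Hypothetical sentinel standing in for dht's minimum_token.
MINIMUM_TOKEN = None


def token_from_bytes(key: bytes):
    """Extract the token from a stream ID key, tolerating bad lengths."""
    if len(key) != 16:
        # Empty or otherwise malformed keys map to the minimum token,
        # so the client simply receives an empty result.
        return MINIMUM_TOKEN
    # Assumed layout: two big-endian int64 values, token first.
    token, _ = struct.unpack(">qq", key)
    return token
```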
Nadav Har'El
86a4dfcd29 merge: api: Command to check and repair cdc streams
Merged pull request https://github.com/scylladb/scylla/pull/6551
from Juliusz Stasiewicz:

The command regenerates streams when:

    generations corresponding to a gossiped timestamp cannot be
    fetched from system_distributed table,
    or when generation token ranges do not align with token metadata.

In such case the streams are regenerated and new timestamp is
gossiped around. The returned JSON is always empty, regardless of
whether streams needed regeneration or not.

Fixes #6498
Accompanied by: scylladb/scylla-jmx#109, scylladb/scylla-tools-java#172
2020-06-15 14:17:35 +03:00
Calle Wilund
5105e9f5e1 cdc::log: Missing "preimage" check in row deletion pre-image
Fixes #6561

Pre-image generation in row deletion case only checked if we had a pre-image
result set row. But that can be from post-image. Also check actual existence
of the pre-image CK.
Message-Id: <20200608132804.23541-1-calle@scylladb.com>
2020-06-09 10:56:41 +03:00
Kamil Braun
013330199d cdc/storage_proxy: keep cdc_service alive in storage_proxy operations
storage_proxy is never deinitialized, so it may have still used cdc_service
after its destructor was called.

This fixes the problem by cdc_service inheriting from
async_sharded_service and storage_proxy calling shared_from_this on
the service whenever it uses it.

cdc_service inherits from async_sharded_service and not simply from
enable_shared_from_this, because there might be other services that
cdc_service depends on. Assuming that these services are
deinitialized after cdc_service (as they should), i.e. after stop() is
called on cdc_service, making cdc_service async_sharded_service will
keep their deinitialization code from being called until all references
to cdc_service disappear (async_sharded_service keeps stop() from
returning until this happens).

Some more improvements should be possible through some refactoring:
1. Make augment_mutation_call a free function, not a member of
   cdc_service: it doesn't need any state that cdc_service has.
   db_context can be passed down from storage_proxy when it calls the
   function.
2. Remove the storage_proxy -> cdc_service reference. storage_proxy
   only needs augment_mutation_call, which would not be a part of the
   service. This would also get rid of the proxy -> cdc -> proxy
   reference cycle that we have now, and would allow storage_proxy to be
   safely deinitialized after cdc_service.
3. Maybe we could even remove the cdc_service -> storage_proxy
   reference. Is it really needed?
2020-06-08 13:25:51 +03:00
Kamil Braun
a1e235b1a4 CDC: Don't split collection tombstone away from base update
Overwriting a collection cell using timestamp T is a process with
following steps:
1. inserting a row marker (if applicable) with timestamp T;
2. writing a collection tombstone with timestamp T-1;
3. writing the new collection value with timestamp T.
Since CDC does clustering of the operations by timestamp, this
would result in 3 separate calls to `transform` (in case of
INSERT, or 2 - in the case of UPDATE), which seems excessive,
especially when pre-/postimage is enabled. This patch treats
collection tombstones as if they had the same TS as
the base write and thus they are processed in one call to `transform`
(as long as TTLs are not used).

Also, `cdc_test` had to be updated in places that relied on former
splitting strategy.

Fixes #6084
2020-06-07 17:09:05 +03:00
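
The grouping change in a1e235b1a4 can be sketched as follows. Changes are modelled as `(kind, timestamp)` pairs with hypothetical kind names, TTLs are ignored, and a collection tombstone at T-1 is folded into T only when some other write at T exists.

```python
from collections import defaultdict


def group_by_timestamp(changes):
    """Group changes by timestamp, keeping collection tombstones
    together with the base write they belong to."""
    base_timestamps = {ts for kind, ts in changes
                       if kind != "collection_tombstone"}
    groups = defaultdict(list)
    for kind, ts in changes:
        if kind == "collection_tombstone" and ts + 1 in base_timestamps:
            # Treat the tombstone as if it had the base write's timestamp.
            ts = ts + 1
        groups[ts].append(kind)
    return dict(groups)
```

For an INSERT that overwrites a collection at timestamp 10, this yields a single group instead of separate groups for timestamps 9 and 10.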
Kamil Braun
d89b7a0548 cdc: rename CDC description tables
Commit 968177da04 has changed the schema
of cdc_topology_description and cdc_description tables in the
system_distributed keyspace.

Unfortunately this was a backwards-incompatible change: these tables
would always be created, irrespective of whether or not "experimental"
was enabled. They just wouldn't be populated with experimental=off.

If the user now tries to upgrade Scylla from a version before this change
to a version after this change, it will work as long as CDC is protected
by the experimental flag and the flag is off.

However, if we drop the flag, or if the user turns experimental on,
weird things will happen, such as nodes refusing to start because they
try to populate cdc_topology_description while assuming a different schema
for this table.

The simplest fix for this problem is to rename the tables. This fix must
get merged in before CDC goes out of experimental.
If the user upgrades his cluster from a pre-rename version, he will simply
have two garbage tables that he is free to delete after upgrading.

sstables and digests need to be regenerated for schema_digest_test since
this commit effectively adds new tables to the system_distributed keyspace.
This doesn't result in schema disagreement because the table is
announced to all nodes through the migration manager.
2020-06-05 09:59:16 +02:00
Piotr Sarna
9a4394327a Merge 'CDC: Disallowed CDC for tables with counter column(s)'
from Juliusz.

CDC for counters is unimplemented as of now,
therefore any attempt to enable CDC log on counter
table needs to be clearly disallowed. This patch does
exactly this.

The check whether schema has counter columns
is performed in `cdc_service::impl` in:
- `on_before_create_column_family`,
- `on_before_update_column_family`
and, if so, results in `invalid_request_exception` thrown.

Fixes #6553

* jul-stas-6553-disallow-cdc-for-counters:
  test/cql: Check that CDC for counters is disallowed
  CDC: Disallowed CDC for tables with counter column(s)
2020-06-05 07:46:53 +02:00
Juliusz Stasiewicz
3a079cf21b CDC: Disallowed CDC for tables with counter column(s)
Until we get implementation of CDC for counters, we explicitly
disallow it. The check is performed in `cdc_service::impl` in:
- `on_before_create_column_family`,
- `on_before_update_column_family`
and results in `invalid_request_exception` thrown.
2020-06-03 18:29:36 +02:00
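
The check added in 3a079cf21b amounts to something like the sketch below, run before a table is created or updated. The schema is assumed to be a dict of column name to type name; the exception class and its message are hypothetical stand-ins for `invalid_request_exception`.

```python
class InvalidRequestError(Exception):
    """Hypothetical stand-in for invalid_request_exception."""


def check_cdc_allowed(column_types: dict, cdc_enabled: bool) -> None:
    """Raise if CDC is requested for a table with counter columns."""
    if cdc_enabled and any(t == "counter" for t in column_types.values()):
        raise InvalidRequestError(
            "CDC is not supported for tables with counter columns")
```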
Piotr Dulikowski
97cb2892b2 cdc: include information about all PKs in trace
This fixes a bug in CDC mutation augmentation logic. A lambda that is
called for each partition key in a batch captures a trace state pointer,
but moves it out after being called for the first time. This caused CDC
tracing information to be included only for one of the partition keys
of the batch.

Fixes #6575
2020-06-03 11:07:57 +02:00
Juliusz Stasiewicz
f2cedbc228 cdc: Remove assert that bootstrap_tokens is nonempty
2020-05-29 12:23:08 +02:00
Kamil Braun
7a98db2ab3 cdc: set ttl column in log rows which update only collections
2020-05-27 08:40:05 +03:00
Piotr Jastrzebski
cd33b9f406 cdc: Tune expired sstables check frequency
CDC Log is a time series which uses time window compaction with some
time window. Data is TTLed with the same value. This means that sstable
won't become fully expired more often than once per time window
duration.

This patch sets expired_sstable_check_frequency_seconds compaction
strategy parameter to half of the time window. Default value of this
parameter is 10 minutes which in most cases won't be a good fit.
By default, we set TTL to 24h and time window to 1h. This means that
with a default value of the parameter we would be checking every 10
minutes but new expired sstable would appear only every 60 minutes.

The parameter is set to half of the time window duration because it's
the expected time we have to wait for sstable to become fully expired.
Half of the time we will wait longer and half of the time we will wait
shorter.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-05-18 16:49:19 +03:00
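
The arithmetic behind cd33b9f406 can be worked through in a short sketch. It assumes the window is the TTL divided into at most 24 windows (matching the "no more than 24 sstables" goal and the 24h/1h defaults in this log) and that the check frequency is half a window; the function name, rounding and returned dict shape are illustrative assumptions.

```python
def cdc_log_compaction_params(ttl_seconds: int, max_windows: int = 24):
    """Derive time-window compaction settings for the CDC log table."""
    if ttl_seconds == 0:
        # Data never expires: time window compaction is not used.
        return None
    # Ceiling division, so that at most max_windows windows cover the TTL.
    window_seconds = -(-ttl_seconds // max_windows)
    return {
        "window_seconds": window_seconds,
        # Half a window is the expected wait for an sstable to fully expire.
        "expired_sstable_check_frequency_seconds": window_seconds // 2,
    }
```

With the default 24h TTL this gives a 1h window and a check every 30 minutes, rather than the generic 10-minute default.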
Nadav Har'El
62c00a3f17 merge: Use time window compaction strategy for CDC Log table
Merged pull request https://github.com/scylladb/scylla/pull/6427
by Piotr Jastrzębski:

CDC Log is a time series so it makes sense to use time window compaction
strategy for it.
Our support for time series is limited so we make sure that we don't create
more than 24 sstables.
If TTL is configured to 0, meaning data does not expire, we don't use time
window compaction strategy.

This PR also sets gc_grace_seconds to 0 when TTL is not set to 0.
2020-05-13 14:36:43 +03:00
Piotr Jastrzebski
49b6010cb4 cdc: Use time window compaction strategy for CDC Log table
CDC Log is a time series with data TTLed by default to 24 hours so
it makes sense to use for it a time window compaction.

A window size is adjusted to the TTL configured for CDC Log so that
no more than 24 sstables will be created.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-05-12 07:53:40 +02:00
Piotr Jastrzebski
0cd0775a27 cdc: Set CDC Log gc_grace_seconds to 0
Data in CDC Log is TTLed and we want to remove it as soon as it expires.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-05-11 17:59:52 +02:00
Avi Kivity
76d21a0c22 Merge 'Make it possible to turn caching off per table and stop caching CDC Log' from Piotr J.
"
We inherited from Origin a `caching` table parameter. It's a map of named caching parameters. Before this PR two caching parameters were expected: `keys` and `rows_per_partition`. So far we have been ignoring them. This PR adds a new caching parameter called `enabled` which can be set to `true` or `false` and controls the usage of the cache for the table. By default, it's set to `true` which reflects Scylla behavior before this PR.

This new capability is used to disable caching for CDC Log table. It is desirable because CDC Log entries are not expected to be read often. They also put much more pressure on memory than entries in Base Table. This is caused by the fact that some writes to Base Table can override previous writes. Every write to CDC Log is unique and does not invalidate any previous entry.

Fixes #6098
Fixes #6146

Tests: unit(dev, release), manual
"

* haaawk-dont_cache_cdc:
  cdc: Don't cache CDC Log table
  table: invalidate disabled cache on memtable flush
  table: Add cache_enabled member function
  cf_prop_defs: persist caching_options in schema
  property_definitions: add get that returns variant
  feature: add PER_TABLE_CACHING feature
  caching_options: add enabled parameter
2020-05-10 15:39:42 +03:00
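
How the new `enabled` caching parameter from 76d21a0c22 might be interpreted, as a rough sketch. It assumes the caching options arrive as a map of strings, as in the CQL `caching` property; the helper name mirrors the `cache_enabled` member function mentioned in the merge but is not its actual implementation.

```python
def cache_enabled(caching_options: dict) -> bool:
    """Return whether the cache should be used for a table."""
    # Missing "enabled" means true, which matches the behavior
    # before the parameter was introduced.
    return caching_options.get("enabled", "true").lower() == "true"
```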
Avi Kivity
6f1a8cfeea Merge 'Use special partitioner for CDC Log' from Piotr
"
CDC has to create CDC streams that are co-located with corresponding BaseTable data. This is not always easy. Especially for small vnodes. This PR introduces new partitioner which allows us to easily find such stream ids that the stream belongs to a given vnode and shard.

The idea is that a partitioner accepts only keys that are a blob composed of two int64 numbers. The first number is the token of the key.

Tests: unit(dev), dtests(CDC)
"

* haaawk-cdc_partitioner:
  cdc: use CDCPartitioner for CDC Log
  dht: Add find_first_token_for_shard
  dht: use long_token in token::to_int64
  cdc: add CDCPartitioner
  stream_id: add token_from_bytes static function
  i_partitioner: Stop distinguishing whether keys order is preserved
2020-05-06 20:29:27 +03:00
Piotr Jastrzebski
e3dd78b68f cdc: Don't cache CDC Log table
CDC writes are not expected to be read multiple times so it makes little sense
to cache them. Moreover, CDC Log puts much bigger pressure on memory usage than
Base Table because some updates to the Base Table override existing data while
related CDC Log updates are always a new entry in a memtable.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-05-06 18:39:01 +02:00