Commit Graph

20911 Commits

Author SHA1 Message Date
Avi Kivity
cdecb21b78 Update seastar submodule
* seastar 65980a9b30...30185fd901 (12):
  > sstring: resize: NulTerminate when downsizing
  > reactor: make open_flags::dsync respect --unsafe-bypass-fsync
  > json/json_elements: Use double quotes around element name
  > Revert "reactor: make open_flags::dsync respect --unsafe-bypass-fsync"
  > Merge "smp: reduce allocations in work_item::process" from Avi
  > task: optimize destruction by making destructor non-virtual
  > reactor: make open_flags::dsync respect --unsafe-bypass-fsync
  > Revert "sstring: resize: NulTerminate when downsizing"
  > sstring: resize: NulTerminate when downsizing
  > tests: Rename unix domain socket test for consistency
  > resource: downgrade cgroupsv2 message.
  > Merge "Simplify the stream/subscription implementation" from Rafael
2020-02-04 10:20:29 +02:00
Nadav Har'El
3de09042bb CDC topology change support
Merged pull request https://github.com/scylladb/scylla/pull/5485
by Kamil Braun:

This series introduces the notion of CDC generations: sets of CDC streams
used by the cluster to choose partition keys for CDC log writes.
Each CDC generation begins operating at a specific time point, called the
generation's timestamp (cdc_streams_timestamp in the code).
It continues being used by all nodes in the cluster to generate log writes
until superseded by a new generation.

Generations are chosen so that CDC log writes are colocated with their
corresponding base table writes, i.e. their partition keys (which are CDC
stream identifiers picked from the generation operating at time of making
the write) fall into the same vnode and shard as the corresponding base
table write partition keys. Currently this is probabilistic and not 100%
of log writes will be colocated - this will change in future commits,
after per-table partitioners are implemented.

CDC generations are a global property of the cluster -- they don't depend
on any particular table's configuration. Therefore the old "CDC stream
description tables", which were specific to each CDC-enabled table,
were removed and replaced by a new, global description table inside the
system_distributed keyspace.

A new generation is introduced and supersedes the previous one whenever
we insert new tokens into the token ring, which breaks the colocation
property of the previous generation. The new generation is chosen to
account for the new tokens and restore colocation. This happens when a
new node joins the cluster.

The joining node is responsible for creating and informing other nodes
about the new CDC generation. It does that by serializing it and inserting
into an internal distributed table ("CDC topology description table").
If it fails the insert, it fails the joining process. It then announces
the generation to other nodes through gossip using the generation's
timestamp, which is the partition key of the inserted distributed table
entry.

Nodes that learn about the new generation through gossip attempt to
retrieve it from the distributed table. This might fail - for example,
if the node is partitioned away from all replicas that hold this
generation's table entry. In that case the node might stop accepting
writes, since it knows that it should send log entries to a new generation
of streams, but it doesn't know what the generation is. The node will keep
trying to retrieve the data in the background until it succeeds or sees
that it is no longer necessary (e.g., because yet another generation
superseded this one). So we give up some availability to achieve safety.
However, this solution is not completely safe (might break consistency
properties): if a node learns about a new generation too late (if gossip
doesn't reach this node in time), the node might send writes to the wrong
(old) generation. In the future we will introduce a transaction-based
approach where we will always make sure that all nodes receive the new
generation before any of them starts using it (and if it's impossible
e.g. due to a network partition, we will fail the bootstrap attempt).
In practice, if the admin makes sure that the cluster works correctly
before bootstrapping a new node, and a network partition doesn't start
in the few seconds window where a new generation is announced, everything
will work as it should.

After the learning node retrieves the generation, it inserts it into an
in-memory data structure called "CDC metadata". This structure is then
used when performing writes to the CDC log -- given the timestamp of the
written mutation, the data structure will return the CDC generation
operating at this time point. CDC metadata might reject the query for
two reasons: if the timestamp belongs to an earlier generation, which
most probably doesn't have the colocation property anymore, or if it is
picked too far away into the future, where we don't know if the current
generation won't be superseded by a different one (so we don't yet know
the set of streams that this log write should be sent to). If the client
uses server-generated timestamps, the query will never be rejected.
Clients can also use client-generated timestamps, but they must make sure
that their clocks are not too desynchronized with the database --
otherwise some or all of their writes to CDC-enabled tables will be
rejected.
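
The lookup described above can be sketched as follows. This is only an
illustration under stated assumptions: the class name CdcMetadata, the
GenerationLookupError exception, and the max_future_skew tolerance are
hypothetical and are not taken from the patch series.

```python
import bisect


class GenerationLookupError(Exception):
    """Raised when no usable generation exists for a write timestamp."""


class CdcMetadata:
    # Hypothetical sketch: names and the skew tolerance are illustrative only.
    def __init__(self, max_future_skew):
        self._timestamps = []  # sorted generation start timestamps
        self._streams = []     # stream sets, parallel to _timestamps
        self._max_future_skew = max_future_skew

    def insert(self, timestamp, streams):
        """Remember a generation that starts operating at `timestamp`."""
        i = bisect.bisect_left(self._timestamps, timestamp)
        if i < len(self._timestamps) and self._timestamps[i] == timestamp:
            return  # generation already known
        self._timestamps.insert(i, timestamp)
        self._streams.insert(i, streams)

    def get_streams(self, write_ts, now):
        """Return the streams of the generation operating at `write_ts`."""
        if write_ts > now + self._max_future_skew:
            raise GenerationLookupError("write timestamp too far in the future")
        # Index of the generation operating now, and of the one at write_ts.
        current = bisect.bisect_right(self._timestamps, now) - 1
        at_write = bisect.bisect_right(self._timestamps, write_ts) - 1
        if at_write < 0 or at_write < current:
            raise GenerationLookupError("write timestamp precedes the current generation")
        return self._streams[at_write]
```

A write stamped with the server's own clock always has write_ts equal to
now, so in this sketch it is never rejected once a generation is known.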

In the case of rolling upgrade, where we restart nodes that were
previously running without CDC, we act a bit differently - there is no
naturally selected joining node which must propose a new generation.
We have to select such a node using other means. For this we use a bully
approach: every node compares its host id with host ids of other nodes
and if it finds that it has the greatest host id, it becomes responsible
for creating the first generation.
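
A minimal sketch of that selection rule, assuming host ids can be compared
as plain UUID values (the function name is hypothetical, and the ordering
used by the real code may differ):

```python
import uuid


def should_create_first_generation(my_host_id, known_host_ids):
    """True if this node has the greatest host id among all known nodes."""
    # Hypothetical helper: compares host ids with Python's UUID ordering.
    return all(my_host_id >= other for other in known_host_ids)


hosts = [uuid.UUID(int=1), uuid.UUID(int=7), uuid.UUID(int=3)]
winners = [h for h in hosts if should_create_first_generation(h, hosts)]
```

As long as every node sees the same set of host ids, exactly one node
considers itself responsible.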

This change also fixes the way of choosing values of the "time" column
of CDC log writes: the timeuuid is chosen in a way which preserves
ordering of corresponding base table mutations (the timestamp of this
timeuuid is equal to the base table mutation timestamp).
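
To illustrate the idea, the sketch below builds a version 1 UUID whose
time field encodes a given microsecond timestamp. The helper names and the
default clock_seq/node values are assumptions made for the example; the
patch may fill the remaining bits differently.

```python
import uuid

# Number of 100 ns intervals between the UUID epoch (1582-10-15) and the
# Unix epoch (1970-01-01).
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def timeuuid_from_micros(ts_us, clock_seq=0, node=0):
    """Build a version 1 UUID whose timestamp equals `ts_us` microseconds."""
    t = ts_us * 10 + UUID_EPOCH_OFFSET           # 100 ns units, UUID epoch
    time_low = t & 0xFFFFFFFF
    time_mid = (t >> 32) & 0xFFFF
    time_hi_version = ((t >> 48) & 0x0FFF) | 0x1000          # version 1
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80  # RFC 4122 variant
    clock_seq_low = clock_seq & 0xFF
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))


def micros_from_timeuuid(u):
    """Recover the microsecond timestamp stored in a version 1 UUID."""
    return (u.time - UUID_EPOCH_OFFSET) // 10
```

Because the timestamp round-trips exactly, ordering log rows by the
timeuuid's timestamp reproduces the order of the base table mutations.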

Warning: if you were running a previous CDC version (without topology
change support), make sure to disable CDC on all tables before performing
the upgrade. This will drop the log data -- back it up if needed.

TODO in future patchset: expire CDC generations. Currently, each inserted
CDC generation will stay in the distributed tables forever (until
manually removed by the administrator). When a generation is superseded,
it should become "expired", and 24 hours after expiration, it should be
removed. The distributed tables (cdc_topology_description and
cdc_description) both have an "expired" column which can be used for
this purpose.

Unit tests: dev, debug, release
dtests (dev): https://jenkins.scylladb.com/job/scylla-master/job/byo/job/byo_build_tests_dtest/907/
2020-02-04 10:20:29 +02:00
Gleb Natapov
2876482373 lwt: account for cases where LWT request were moved to another shard in statistics
Now that we bounce lwt requests to the correct shard before calling into
storage_proxy, the cross-shard op accounting does not account for bounced
lwt statements. Fix that by increasing the corresponding counter when
returning a "bounce" reply.
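
The accounting can be pictured roughly as below. Every name here (Bounce,
Stats, execute_on_shard, process, and the key-modulo shard mapping) is
hypothetical and only serves to show where the counter is bumped.

```python
from dataclasses import dataclass


@dataclass
class Bounce:
    target_shard: int  # shard that owns the key


class Stats:
    def __init__(self):
        self.cross_shard_ops = 0


def execute_on_shard(shard, key, num_shards, stats, run):
    """Run the request locally, or ask the caller to bounce it."""
    owner = key % num_shards  # illustrative shard mapping only
    if owner != shard:
        stats.cross_shard_ops += 1  # counted when the bounce reply is returned
        return Bounce(owner)
    return run(key)


def process(shard, key, num_shards, stats, run):
    """Execute a request, retrying once on the owning shard if bounced."""
    result = execute_on_shard(shard, key, num_shards, stats, run)
    if isinstance(result, Bounce):
        result = execute_on_shard(result.target_shard, key, num_shards, stats, run)
    return result
```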

Message-Id: <20200203122011.GH26048@scylladb.com>
2020-02-04 10:20:28 +02:00
Nadav Har'El
37f2f6112e cql3::util::maybe_quote: avoid stack overflow and fix quote doubling
Merged patch series from Benny Halevy:

The function was reimplemented to solve the following issues.
The custom implementation also improved its performance by
close to 19%.

Using regex_match("[a-z][a-z0-9_]*") may cause stack overflow on long input strings
as found with the limits_test.py:TestLimits.max_key_length_test dtest.

std::regex_replace does not replace in-place so no doubling of
quotes was actually done.

Add unit test that reproduces the crash without this fix
and tests various string patterns for correctness.

Note that defining the regex with std::regex::optimize
still ended up with stack overflow.
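
A rough sketch of a regex-free approach, written in Python for brevity.
It only covers the two rules mentioned above (the [a-z][a-z0-9_]* pattern
and quote doubling); any other rule the real function applies, such as
keyword handling, is not modelled here.

```python
def maybe_quote(identifier):
    """Quote an identifier unless it matches [a-z][a-z0-9_]*."""

    def is_plain(s):
        # Linear scan instead of a regex: no recursion, so input length
        # cannot exhaust the stack.
        if not s or not ("a" <= s[0] <= "z"):
            return False
        return all(("a" <= c <= "z") or ("0" <= c <= "9") or c == "_"
                   for c in s[1:])

    if is_plain(identifier):
        return identifier
    # str.replace returns a new string; the result must be used, which is
    # the same pitfall as ignoring the return value of std::regex_replace.
    return '"' + identifier.replace('"', '""') + '"'
```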

Fixes #5671

* cql3::util::maybe_quote: avoid stack overflow and fix quote doubling
* cql3::util::maybe_quote: further optimize quote doubling
2020-02-04 10:20:28 +02:00
Nadav Har'El
6e91f159fe LWT: handle bounce_to_shard result for batch statements
Merged patch series from Gleb Natapov:
Batch statements can also execute LWT and hence need to handle
the bounce_to_shard result.

* transport: handle bounce_to_shard for batch statement
* transport: consolidate bounce_to_shard handling between all three verbs that handle it
2020-02-04 10:20:28 +02:00
Takuya ASADA
1446fe930b dist/redhat: install specified version of scylla-conf on meta package (#5599)
To install specified version of scylla-conf package, we need to add it on Requires.

Fixes #5639
2020-02-04 10:20:28 +02:00
Benny Halevy
f45fabab73 gossiper: do_stop_gossiping: copy live endpoints vector
It can be resized asynchronously by mark_dead.
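
The hazard has a close analogue in Python, shown here as a sketch: the
container shrinks while it is being iterated, so the loop walks over a
snapshot instead. The function and callback names are hypothetical, and
the mutation is synchronous here rather than asynchronous.

```python
def stop_gossiping(live_endpoints, send_shutdown):
    """Notify every live endpoint, tolerating removals during the loop."""
    # Iterate over a copy: send_shutdown may remove entries from
    # live_endpoints (as mark_dead does in the commit message above).
    for endpoint in list(live_endpoints):
        send_shutdown(endpoint)


live = ["n1", "n2", "n3"]
notified = []


def send_and_mark_dead(endpoint):
    notified.append(endpoint)
    live.remove(endpoint)  # mutates the original container mid-iteration


stop_gossiping(live, send_and_mark_dead)
```

Without the copy, removing the current element would make the loop skip
the next one.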

Fixes #5701

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200203091344.229518-1-bhalevy@scylladb.com>
2020-02-04 10:20:28 +02:00
Avi Kivity
501b24cad3 test.py: use command line option in preference to environment variable when calling a test
Command line options are printed out, so if a user cuts-and-pastes a
command line they will get a run that is more similar to the one that
the test executed.
Message-Id: <20200202133209.209608-1-avi@scylladb.com>
2020-02-04 10:20:28 +02:00
Gleb Natapov
9c75a25e9f transport: consolidate bounce_to_shard handling between all three verbs that handle it
All three verbs that need to handle bounce_to_shard have almost
identical process_*() and process_*_on_shard() functions. Consolidate
them into one to reuse the code.
2020-02-03 14:27:50 +02:00
Gleb Natapov
dd793098fa transport: handle bounce_to_shard for batch statement
Batch statements can also execute LWT and hence need to handle
the bounce_to_shard result.

Fixes: #5644
2020-02-03 14:27:30 +02:00
Kamil Braun
4b3754ff94 docs: add documentation about CDC generations 2020-02-03 10:57:31 +01:00
Kamil Braun
b130b76274 test: disable CDC flag by default
When CDC flag is on, the node startup procedure takes a few seconds
longer (we have to generate CDC streams). This is not necessary in
non-CDC tests.
2020-02-03 10:57:31 +01:00
Kamil Braun
0d41e2c1fe test: add cdc::generate_timeuuid tests 2020-02-03 10:57:31 +01:00
Kamil Braun
5fb5925fb4 test: add cdc::find_timestamp tests 2020-02-03 10:57:31 +01:00
Kamil Braun
7cb6ac33f5 storage_service: check if we know other nodes' tokens when joining ring
If we are a seed node (but not the only one) or we set
auto_bootstrap=off, it might happen due to misconfiguration or a network
partition that we don't know other nodes' tokens at the end of the
join_token_ring function, when we go into the NORMAL status, finishing
the joining process.

CDC however requires that we know other nodes' tokens at this point:
we need them to correctly create a new CDC generation.

This commit adds a check which prevents the node from starting if that's
not the case. If the check fails, the node first tries waiting a bit until
it learns about the tokens or times out.
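
The "wait a bit, then give up" behaviour can be sketched as a bounded
polling loop. The helper name, its parameters, and the injectable clock
and sleep functions are assumptions made for the example.

```python
import time


def wait_until(predicate, timeout_s, poll_interval_s=0.1,
               clock=time.monotonic, sleep=time.sleep):
    """Poll `predicate` until it holds or `timeout_s` elapses."""
    deadline = clock() + timeout_s
    while True:
        if predicate():
            return True   # e.g. other nodes' tokens are now known
        if clock() >= deadline:
            return False  # caller should refuse to start
        sleep(poll_interval_s)
```
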
2020-02-03 10:57:28 +01:00
Avi Kivity
2816404f57 test.py: documented exit code value
Document our chosen exit failure code value and its relationship
to git bisect.
Message-Id: <20200202134223.210578-1-avi@scylladb.com>
2020-02-03 00:58:58 +02:00
Avi Kivity
541893e69a Merge "Fix conversion of lua nil to cql null" from Rafael
"
The fix itself is fairly simple, but looking at the code I found that
our code base was not cleanly distinguishing null and empty values and
was treating null and missing values differently, but that distinction
was dead since a null is represented as a dead cell.
"

* 'espindola/lua-fix-null-v6' of https://github.com/espindola/scylla:
  lua: Handle nil returns correctly
  types: Return bytes_opt from data_value::serialize
  query-result-set: Assert that we don't have null values
  types: Fix comparison of empty and null data_values
  Revert "tests: Handle null and not present values differently"
  query-result-set: Avoid a copy during construction
  types: Move operator== for data_value out-of-line
2020-02-02 15:43:24 +02:00
Avi Kivity
c8890eb124 Merge "Simplify usage of stream subscriptions" from Rafael
"
In a few places, the only use we had for a subscription was calling
done(). With this series we now call done() early and store the
future<> instead.
"

* 'espindola/stream-cleanup' of https://github.com/espindola/scylla:
  sstable_test: Store a future<> instead of a subscription
  commitlog: Store a future instead of a subscription in db::commitlog::segment_manager::list_descriptors::helper
  lister: Store a future<> instead of a subscription
2020-02-02 14:49:00 +02:00
Rafael Ávila de Espíndola
5dfb658e77 build: Add two missing dependencies
With this change we always rebuild seastar/libseastar_testing.a for
the same reason we always rebuild seastar/libseastar.a: We have no
idea what its dependencies are, we have to recurse to seastar to find
out.

The other missing dependency is that we have to rebuild build.ninja
when seastar/CMakeLists.txt changes. A change in
seastar/CMakeLists.txt can cause seastar.pc to change which can change
the command lines used.

That is incomplete, as changes to other seastar files can have the same
impact, but it is better than nothing.

It is not sufficient to put a dependency in the seastar.pc file as
that file will be modified when cmake is run and the scylla ninja
process doesn't see the CMakeLists.txt to seastar.pc edge.

Fixes: #5687

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200201001126.458992-1-espindola@scylladb.com>
2020-02-01 21:08:26 +02:00
Pavel Emelyanov
4839ca8491 storage_service: Unregister from gossiper notifications ... at all
This unregistration doesn't happen currently, but doesn't seem to
cause any problems in general, as on stop gossiper is stopped and
nothing from it hits the storage_service.

However (!) if an exception pops up between the point where the
storage_service subscribes to gossiper and the point where the
drain_on_shutdown defer action is set up, then we _may_ get into the
following situation:

- main's stuff gets rolled back
- gossiper is not stopped (drain_on_shutdown defer is not set up)
- migration manager is stopped (with deferred action in main)
- a notification comes from gossiper
    -> storage_service::on_change might want to pull schema with
       the help of local migration manager
    -> assert(local_is_initialized) strikes

Fix this by registering storage_service to gossiper a bit earlier
(both are already initialized by that time) and setting up unregister
defer right afterwards.

Test: unit(dev), manual start-stop
Bug: #5628

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200130190343.25656-1-xemul@scylladb.com>
2020-01-31 14:02:18 +01:00
Avi Kivity
ec5b721db7 test: make eventually() more patient
We use eventually() in tests to wait for eventually consistent data
to become consistent. However, we see spurious failures indicating
that we wait too little.

Increasing the timeout has a negative side effect in that tests that
fail will now take longer to do so. However, this negative side effect
is negligible compared to false-positive failures, since they throw away
large test efforts and sometimes require a person to investigate the
problem, only to conclude it is a false positive.

This patch therefore makes eventually() more patient, by a factor of
32.
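
As an illustration only: if the helper retries with a delay that doubles
after every failed attempt (an assumption, as are the parameter names
below), then allowing five more attempts makes the longest wait 2^5 = 32
times longer.

```python
import time


def eventually(check, max_attempts, first_sleep_s=0.001, sleep=time.sleep):
    """Retry `check` with exponentially growing delays until it passes."""
    delay = first_sleep_s
    for attempt in range(max_attempts):
        try:
            return check()
        except AssertionError:
            if attempt == max_attempts - 1:
                raise  # out of patience: report the real failure
            sleep(delay)
            delay *= 2
```
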

Fixes #4707.
Message-Id: <20200130162745.45569-1-avi@scylladb.com>
2020-01-31 14:02:18 +01:00
Dejan Mircevski
6661ed7de4 cql3: Drop restrictions::values() method
No-one seems to invoke this method.  Instead, clients invoke
restriction::values (note singular "restriction").  Most subclasses of
restrictions also inherit from restriction, so values() still exists
in their public interface.

Tests: unit (dev)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2020-01-31 13:05:51 +01:00
Avi Kivity
985e00efa6 Merge "Fix the serialization of negative varint values" from Rafael
"
Benny pointed out that we could avoid a branch inside a loop in the
old serialization code. That got me looking at the logic and I found
that it would also produce an unnecessary 0xff prefix for some
negative numbers.

This patch series fixes the serialization and optimizes it. It now
does no extra copies for positive numbers and only one extra copy for
negative numbers, which I think is optimal since cpp_int uses sign
magnitude and we want the two's complement representation.
"

* 'espindola/serialize_varint-improvements-v2' of https://github.com/espindola/scylla:
  types: Use a fancy iterator to avoid a temporary buffer
  types: Use export_bits to serialize cpp_int
  types: Avoid a branch in a loop
  types: Fix encoding of negative varint
  types: Replace "num.sign() < 0" with "num < 0"
2020-01-30 20:35:54 +02:00
Rafael Ávila de Espíndola
cc81ba3432 types: Use a fancy iterator to avoid a temporary buffer
By using a fancy iterator we can avoid calling export_bits with a
temporary buffer before copying the result to the output.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-01-30 10:26:39 -08:00
Rafael Ávila de Espíndola
7e67ce0bdb types: Use export_bits to serialize cpp_int
This avoids a copy when serializing positive numbers.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-01-30 10:26:39 -08:00
Rafael Ávila de Espíndola
27a67f1a2c types: Avoid a branch in a loop
Thanks to Benny for the suggestion.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-01-30 10:26:39 -08:00
Rafael Ávila de Espíndola
c89c90d07f types: Fix encoding of negative varint
We would sometimes produce an unnecessary extra 0xff prefix byte.

The new encoding matches what cassandra does.

This was both an efficiency and correctness issue, as using varint in a
key could produce different tokens.
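
For reference, a minimal-length big-endian two's complement encoding can
be sketched in a few lines of Python. The function names are made up for
this example; the sketch shows the target encoding, not the C++ code.

```python
def encode_varint(n):
    """Encode an integer as minimal-length big-endian two's complement."""
    # For negative n, ~n is non-negative and needs the same number of
    # value bits; one more bit is always required for the sign.
    magnitude = n if n >= 0 else ~n
    length = magnitude.bit_length() // 8 + 1
    return n.to_bytes(length, "big", signed=True)


def decode_varint(data):
    """Decode big-endian two's complement bytes back into an integer."""
    return int.from_bytes(data, "big", signed=True)
```

With this encoding -1 is the single byte 0xff and -128 is the single byte
0x80. A longer form such as 0xff 0xff decodes to the same value but
hashes differently, which is why a redundant prefix in a key matters.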

Fixes #5656

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-01-30 10:25:09 -08:00
Rafael Ávila de Espíndola
ed747122aa types: Replace "num.sign() < 0" with "num < 0"
Surprisingly, this produces better code with cpp_int.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-01-30 10:24:03 -08:00
Rafael Ávila de Espíndola
cc9495d4d3 sstable_test: Store a future<> instead of a subscription
The only use we had for the subscription was calling done, may as well
call it early and store the future<>.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-01-30 08:31:28 -08:00
Rafael Ávila de Espíndola
da984f1f33 commitlog: Store a future instead of a subscription in db::commitlog::segment_manager::list_descriptors::helper
The only use we had for the subscription was calling done, may as well
call it early and store the future<>.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-01-30 08:31:28 -08:00
Rafael Ávila de Espíndola
b88f6edee0 lister: Store a future<> instead of a subscription
The only use we had for the subscription was calling done, may as well
call it early and store the future<>.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-01-30 08:31:28 -08:00
Gleb Natapov
b08679e1d3 db/system_keyspace: use user memory limits for local.paxos table
Treat writes to local.paxos as user memory, as the number of writes is
dependent on the amount of user data written with LWT.

Fixes #5682

Message-Id: <20200130150048.GW26048@scylladb.com>
2020-01-30 17:07:27 +02:00
Piotr Sarna
b783d40aaf Merge 'Add per scheduling groups statistics' from Eliran
This set implements support for per scheduling group statistics in
storage proxy and tables view statistics (although tables view per
scheduling group stats are not actively applied in this series).
Having those statistics per scheduling group can help in finding operations
that are performed outside their context. Another advantage is that
it lays the groundwork for supporting per service level statistics for the
workload prioritization enterprise feature.
At some point there was a thought to add those stats per role, but
for now it is not feasible:
1. The number of roles/users is unbounded so it is dangerous to
hold stats (in memory) for all of them.
2. We will need a proper design of how to deal with the hierarchical
nature of roles in the stats.

Besides these reasons and regardless, it is beneficial to look at
resource related stats per scheduling group. Looking at resources
per user or role will not necessarily give insights since resources
are divided per sg and not role, so it can lead to false conclusions
if more than one role is attached to the same service level.

Tests:
unit tests (Dev, Debug)
validating the stats with monitor

* es/per_sg_stats/v6:
  storage proxy: migrate to per scheduling group statistics
  internalize storage proxy statistics metric registration
2020-01-30 15:02:33 +01:00
Eliran Sinvani
971711a546 storage proxy: migrate to per scheduling group statistics
This commit builds on top of the introduced per scheduling group
statistics template and employs it for achieving a per scheduling
group statistics in storage_proxy.

Some of the statistics were also meaningful as global, per-shard
ones: those used to decide whether to throttle a write request.
This was handled by creating a global stats struct that holds
those stats and by changing the stat update to also include
the global one.

One point that complicated it is an already existing aggregation
over the per shard stats that now became a per scheduling group
per shard stats, converting the aggregation to a two-dimensional
aggregation.
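
The shape of the data can be pictured with the sketch below. ShardStats,
record_write, and total_writes are hypothetical names; the real structures
hold many more counters.

```python
from collections import Counter


class ShardStats:
    """Per-shard statistics, split by scheduling group."""

    def __init__(self):
        self.per_group = Counter()  # scheduling group name -> write count
        self.global_writes = 0      # shard-wide view, e.g. for throttling

    def record_write(self, group):
        # Each update touches both the per-group and the global counter.
        self.per_group[group] += 1
        self.global_writes += 1


def total_writes(shards):
    """Aggregate over both dimensions: shards and scheduling groups."""
    return sum(count
               for shard in shards
               for count in shard.per_group.values())
```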

One thing this commit doesn't handle is validating that an individual
statistic didn't "cross a scheduling group boundary". Such validation
is possible and can easily be added in the future. There is a
subtlety to doing so: if the operation did cross into another
scheduling group, two connected statistics can go out of balance,
for example written bytes and completed write transactions.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2020-01-30 15:01:44 +01:00
Eliran Sinvani
8cfc2aad57 internalize storage proxy statistics metric registration
The storage proxy statistics structure did not contain
a method for registering the statistics for metric
groups. Instead, each user had to register some
of the metrics by itself. There is no real reason
for separating the metrics registration from
the statistics data. There is even less justification
for doing this only for part of the stats, as is
the case for those statistics.
This commit internalizes the metrics registration
in the storage_proxy stats structures.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2020-01-30 15:01:40 +01:00
Gleb Natapov
c138dfd33e lwt: introduce LWT gossiper feature
Do not allow lwt operations if LWT is not enabled by the entire cluster.
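
The gating rule amounts to "every node must advertise the feature". A
sketch with hypothetical names (FeatureNotEnabled, cluster_supports,
execute_lwt, and the features_by_node mapping):

```python
class FeatureNotEnabled(Exception):
    """Raised when an operation needs a feature the cluster lacks."""


def cluster_supports(feature, features_by_node):
    """True only if every known node advertises `feature`."""
    return bool(features_by_node) and all(
        feature in features for features in features_by_node.values())


def execute_lwt(statement, features_by_node, run):
    """Run an LWT statement only once the whole cluster supports LWT."""
    if not cluster_supports("LWT", features_by_node):
        raise FeatureNotEnabled("LWT is not enabled by the entire cluster")
    return run(statement)
```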

Message-Id: <20200130120912.GV26048@scylladb.com>
2020-01-30 15:12:56 +02:00
Benny Halevy
606db0d412 cql3::util::maybe_quote: further optimize quote doubling
Avoid string copies when doubling quotes in the string
by counting them when scanning the input string and
reserving the required space when making the result std::string.

This showed a performance improvement of ~1.8% when
running the maybe_quote unit test in tight loop
(w/ the shorter strings only)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-01-30 14:55:51 +02:00
Rafael Ávila de Espíndola
a16cb00719 configure: Don't use -Wno-error when building seastar
This depends on the recent patches to avoid warnings in seastar.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200127210833.200410-1-espindola@scylladb.com>
2020-01-30 14:10:18 +02:00
Avi Kivity
09e2556541 Update seastar submodule
* seastar 44cf127ee9...65980a9b30 (2):
  > io_tester: fix the fix for lack of file closing
  > cmake: Disable broken gcc warning -Warray-bounds
2020-01-30 14:10:18 +02:00
Avi Kivity
b01f0cab60 utils: add missing include for ssize_t
gcc 10 tightened its C++ includes to no longer provide ssize_t,
so we must get it from a C header instead.
Message-Id: <20200129205912.21139-1-avi@scylladb.com>
2020-01-30 14:10:18 +02:00
Avi Kivity
adb64dc72f treewide: tighten concepts syntax
gcc 10 requires a semicolon after every compound requirement,
as per the standard. Add missing semicolons where necessary.
Message-Id: <20200129205805.20928-1-avi@scylladb.com>
2020-01-30 14:10:18 +02:00
Rafael Ávila de Espíndola
4b4efcf302 types: Remove collection_type_impl::serialize
The rest of the serialize api has been devirtualized some time ago,
but this auxiliary function stayed virtual.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200129203916.20460-1-espindola@scylladb.com>
2020-01-30 14:10:18 +02:00
Kamil Braun
bd42b10df1 cdc: rename cdc/cdc.{hh,cc} to cdc/log.{hh,cc}
To increase modularity, making it easier to find what is where and
maintain.

The 'log' module (cdc/log.{hh,cc}) is responsible for updating CDC log
tables when base table writes are performed.

The 'generation' module (cdc/generation.{hh,cc}) handles stream
generation changes in response to topology change events.

cdc/metadata.{hh,cc} contains a helper class which holds the currently
used generation of streams. It is used by both aforementioned modules:
'log' queries it, while 'generation' updates it.
2020-01-30 11:10:39 +01:00
Kamil Braun
1a56310687 locator: remove get_shard_count and get_ignore_msb_bits from snitch
Snitch forms a class hierarchy which get_shard_count and
get_ignore_msb_bits ignore (their returned values only depend on the
gossiper's state).

Besides, these functions just don't belong there.
Snitch has nothing to do with shard_count or ignore_msb_bits.
2020-01-30 11:10:08 +01:00
Kamil Braun
e91af78cf5 cdc: update streams description table
Inform CDC users about newly generated streams.
2020-01-30 11:10:08 +01:00
Kamil Braun
cbe510d1b8 cdc: use stream generations
Change the CDC code to use the global CDC stream generations.

The per-base-table CDC description table was removed. The code instead
uses cdc::metadata which is updated on gossip events.

The per-table description tables were replaced by a global description
table to be used by clients when searching for streams.
2020-01-30 11:10:08 +01:00
Kamil Braun
8f4a2ba0b9 storage_service: learn about CDC stream generations.
When a node learns that another node joins the cluster (or begins
the joining process, i.e. bootstrap), it will read the CDC generation
timestamp proposed by that node, use it to retrieve the generation from the
distributed generations table, and save it in its local generation queue
to be used for writing to the CDC log when its local clock crosses
the generation's timestamp.

The CDC generation is saved in the queue before tokens are saved in
token_metadata. This is important so that when the node becomes
a coordinator of a write, it will already have all the necessary
information required to generate a corresponding CDC log mutation.

After joining, nodes should keep gossiping their proposed stream
generation timestamps forever, until they learn about a newer timestamp,
in which case they'll start gossiping the new timestamp.

There is one case where a node won't gossip any such generation timestamp:
if it's upgrading from a non-CDC version.
In this situation we make one of the nodes begin the first generation.
2020-01-30 11:10:08 +01:00
Kamil Braun
834c2ca997 cdc: add cdc::metadata class
The class stores a queue of CDC generations to be used for choosing
streams when writing to the CDC log.

This data structure will be updated on some gossip events (when a new node
joins the cluster and proposes a new generation of CDC streams).
2020-01-30 11:10:08 +01:00
Kamil Braun
86af2a63ec clocks: add printing functions
For debugging and logging.
2020-01-30 11:10:08 +01:00
Kamil Braun
34e4ce275d storage_service: restore CDC streams timestamp when replacing a node
When a node is replacing another node it will keep gossiping its CDC
streams generation timestamp.
2020-01-30 11:10:08 +01:00