scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-12 19:02:12 +00:00

Author	SHA1	Message	Date
Juliusz Stasiewicz	f2cedbc228	cdc: Remove assert that bootstrap_tokens is nonempty	2020-05-29 12:23:08 +02:00
Avi Kivity	6f1a8cfeea	Merge 'Use special partitioner for CDC Log' from Piotr: "CDC has to create CDC streams that are co-located with the corresponding BaseTable data. This is not always easy, especially for small vnodes. This PR introduces a new partitioner that lets us easily find stream ids such that the stream belongs to a given vnode and shard. The idea is that the partitioner accepts only keys that are a blob composed of two int64 numbers. The first number is the token of the key. Tests: unit(dev), dtests(CDC)" * haaawk-cdc_partitioner: cdc:use CDCPartitioner for CDC Log; dht: Add find_first_token_for_shard; dht: use long_token in token::to_int64; cdc: add CDCPartitioner; stream_id: add token_from_bytes static function; i_partitioner: Stop distinguishing whether keys order is preserved	2020-05-06 20:29:27 +03:00
Piotr Jastrzebski	0416d70c9f	cdc:use CDCPartitioner for CDC Log This will allow deterministic stream_id generation and would remove the risk of not being able to generate a stream id for some vnode. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-04-22 18:25:51 +02:00
Piotr Jastrzebski	330cd162f0	stream_id: add token_from_bytes static function This function will be used by CDCPartitioner to extract token from partition key. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-04-21 15:50:22 +02:00
Kamil Braun	113384b6f8	gms: move TOKENS string deserialization code into versioned_value And do the same with CDC_STREAMS_TIMESTAMP. The code that took a list of tokens represented as a string inside versioned_value (for gossiping) and deserialized it into an `unordered_set<dht::token>` lived in the storage_service module, while the code that did the serializing (set -> string) lived in versioned_value. There was a similar situation with the CDC generation timestamp. To increase maintainability and reusability, the deserialization code is now placed next to the serialization code in versioned_value. Furthermore, the `make_full_token_string`, `make_token_string`, and `make_cdc_streams_timestamp_string` (serialization functions) are moved out of versioned_value::factory and made static methods of versioned_value instead.	2020-04-20 12:57:13 +02:00
Piotr Jastrzebski	57cfe6d0e1	cdc: store stream_ids as blobs in internal tables In new CDC Log format stream_id is represented by a single blob column so it makes sense to store it in the same form everywhere - including internal CDC tables. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-05 11:31:22 +01:00
Piotr Jastrzebski	b2acdc9307	cdc: improve do_update_streams_description Use std::set::insert that takes range instead of looping through elements and adding them one by one. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-05 11:31:22 +01:00
Piotr Jastrzebski	446722d6ed	cdc: Fix generate_topology_description In new CDC Log format we store only a single stream_id column. This means generate_topology_description has to use appropriate schema for generating tokens for stream_ids. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-05 11:31:22 +01:00
Piotr Jastrzebski	9a212dcaef	cdc: add stream_id::operator< Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-05 11:31:21 +01:00
Piotr Jastrzebski	f317a659d9	cdc: change stream_id representation New CDC Log format stores stream ids as blobs. It makes sense to keep them internally in the same form. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-05 11:30:10 +01:00
Piotr Jastrzebski	354e3c34c8	cdc log: merge stream_id columns into a single column Previously we had stream_id_1 and stream_id_2 columns of type long each. They were forming a partition key. In a new format we want a single stream_id column that forms a partition key. To be able to still store two longs, the new column will have type blob and its value will be concatenated bytes of two longs that partition key is composed of. We still want partition key to logically be two longs because those two values will be used by a custom partitioner later once we implement it. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-04 13:27:48 +02:00
Piotr Jastrzebski	f0f6e220ea	cdc: stop using partitioners CDC can get all it needs from a config and does not need partitioner. For base table specific operations CDC is using partitioner from that table (obtained with schema::get_partitioner). Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	2d7532f87f	dht: add dht::get_token and replace all calls to dht::global_partitioner().get_token dht::get_token is better because it takes schema and uses it to obtain partitioner instead of using a global partitioner. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	50cfe81331	murmur3: move sharding logic to token and i_partitioner Since token representation is fixed now, all the partitioners will share the sharding logic. It makes sense now to keep the logic in common super class and separate header that's included only in i_partitioner.cc. shard_of and token_for_next_shard are now implemented in i_partitioner. They would be non-virtual but we have to keep them virtual because one test is overriding them to enforce some specific sharding. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-05 09:31:32 +01:00
Kamil Braun	1a56310687	locator: remove get_shard_count and get_ignore_msb_bits from snitch Snitch forms a class hierarchy which get_shard_count and get_ignore_msb_bits ignore (their returned values only depend on the gossiper's state). Besides, these functions just don't belong there. Snitch has nothing to do with shard_count or ignore_msb_bits.	2020-01-30 11:10:08 +01:00
Kamil Braun	e91af78cf5	cdc: update streams description table Inform CDC users about newly generated streams.	2020-01-30 11:10:08 +01:00
Kamil Braun	a6e62dba95	cdc: add get_streams_timestamp_for(endpoint) method In future commits this will be used by nodes learning about other nodes entering NORMAL status. The joining node proposes a new generation of streams, whose timestamp is gossiped by the node.	2020-01-30 11:10:08 +01:00
Kamil Braun	19f23c6de1	cdc: add cdc-related node startup functions	2020-01-30 11:10:08 +01:00
Piotr Jastrzebski	9fa18c03c1	cdc: add generate_topology_description cdc::topology_description describes a mapping of tokens to CDC streams. The cdc::generate_topology_description function is given: 1. a set of tokens which split the token ring into token ranges (vnodes), 2. information on how each token range is distributed among its owning node's shards and tries to generate a set of CDC stream identifiers such that for each shard and vnode pair there exists a stream whose token falls into this vnode and is owned by this shard. It then builds a cdc::topology_description which maps tokens to these found stream identifiers, such that if token T is owned by shard S in vnode V, it gets mapped to the stream identifier generated for (S, V).	2020-01-30 11:10:07 +01:00
Piotr Jastrzebski	a3748f942e	cdc: add topology_description class This is a class that will be used for storing information required to perform CDC operations, i.e. assignment of token ranges to CDC streams. It is serializable to bytes and will be stored in such a form in a distributed table accessible by all nodes.	2020-01-30 11:10:07 +01:00

20 Commits
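
Several commits above describe a stream id as a single blob made of the concatenated bytes of two int64 values, with the first value being the token of the key. The following is a minimal sketch of that layout, not the code from the repository. The names `stream_id_blob`, `make_stream_id`, and `sketch_token_from_bytes` are hypothetical, and the big-endian byte order is an assumption made for illustration only.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for a CDC stream id: a 16-byte blob holding two
// int64 values back to back. Big-endian order is an assumption of this sketch.
using stream_id_blob = std::array<std::uint8_t, 16>;

// Write a 64-bit value into 8 bytes, most significant byte first.
inline void put_be64(std::uint8_t* out, std::uint64_t v) {
    for (int i = 7; i >= 0; --i) {
        out[i] = static_cast<std::uint8_t>(v & 0xff);
        v >>= 8;
    }
}

// Read 8 bytes, most significant byte first, into a 64-bit value.
inline std::uint64_t get_be64(const std::uint8_t* in) {
    std::uint64_t v = 0;
    for (int i = 0; i < 8; ++i) {
        v = (v << 8) | in[i];
    }
    return v;
}

// Build the blob: the first 8 bytes carry the token, the last 8 bytes carry
// the second value of the logical two-long partition key.
inline stream_id_blob make_stream_id(std::int64_t token, std::int64_t second) {
    stream_id_blob blob{};
    put_be64(blob.data(), static_cast<std::uint64_t>(token));
    put_be64(blob.data() + 8, static_cast<std::uint64_t>(second));
    return blob;
}

// Recover the token from the first 8 bytes, which is what a partitioner that
// accepts only such blobs would need to do (hypothetical helper).
inline std::int64_t sketch_token_from_bytes(const stream_id_blob& blob) {
    return static_cast<std::int64_t>(get_be64(blob.data()));
}
```

Because the token is embedded in the key itself, a partitioner of this kind never has to hash the key: choosing a token inside a given vnode and shard directly yields a stream id that lands there.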