scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-29 04:37:00 +00:00

Author	SHA1	Message	Date
Avi Kivity	42a76567b7	dht: use nonwrapping_ranges in ring_position_range_sharder It was the observation that ring_position_range_sharder doesn't support wrapping ranges that started the nonwrapping_range madness, but that class still has some leftover wrapping ranges. Close the circle by removing them. Message-Id: <20161123153113.8944-1-avi@scylladb.com> (cherry picked from commit `8686a59ea5`)	2016-12-27 19:16:26 +02:00
Avi Kivity	a1d463900f	storage_proxy: handle range scans of sparsely populated tables When murmur3_partitioner_ignore_msb_bits = 12 (which we'd like to be the default), a scan range can be split into a large number of subranges, each going to a separate shard. With the current implementation, subranges were queried sequentially, resulting in very long latency when the table was empty or nearly empty. Switch to an exponential retry mechanism, where the number of subranges queried doubles each time, dropping the latency from O(number of subranges) to O(log(number of subranges)). If, during an iteration of a retry, we read at most one range from each shard, then partial results are merged by concatenation. This optimizes for the dense(r) case, where few partial results are required. If, during an iteration of a retry, we need more than one range per shard, then we collapse all of a shard's ranges into just one range, and merge partial results by sorting decorated keys. This reduces the number of sstable read creations we need to make, and optimizes for the sparse table case, where we need many partial results, most of which are empty. We don't merge subranges that come from different partition ranges, because those need to be sorted in request order, not decorated key order. [tgrabiec: trivial conflicts] Message-Id: <20161220170532.25173-1-avi@scylladb.com> (cherry picked from commit `a1cafed370`)	2016-12-27 16:57:18 +02:00
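The doubling scheme described in the commit above can be sketched as follows. This is a minimal synchronous illustration, not the storage_proxy code: `scan_with_doubling`, `scan_result`, and the `query_subrange` callback are hypothetical names, subranges are identified by index, and rows are plain integers. A real implementation would issue each batch concurrently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical result type: the rows collected and the number of rounds used.
struct scan_result {
    std::vector<int> rows;
    std::size_t rounds = 0;
};

// Query subranges in batches of 1, 2, 4, ... until the row limit is met or
// the subranges are exhausted. query_subrange(i) returns the rows found in
// subrange i (an assumption made for this sketch).
scan_result scan_with_doubling(
        std::size_t n_subranges, std::size_t row_limit,
        const std::function<std::vector<int>(std::size_t)>& query_subrange) {
    scan_result res;
    std::size_t next = 0;   // first subrange not yet queried
    std::size_t batch = 1;  // number of subranges to query this round
    while (next < n_subranges && res.rows.size() < row_limit) {
        const std::size_t end = std::min(n_subranges, next + batch);
        for (; next < end; ++next) {  // a real system would run these in parallel
            const std::vector<int> part = query_subrange(next);
            res.rows.insert(res.rows.end(), part.begin(), part.end());
        }
        ++res.rounds;
        batch *= 2;  // double the batch size for the next round
    }
    if (res.rows.size() > row_limit) {
        res.rows.resize(row_limit);  // trim any overshoot from the last batch
    }
    return res;
}
```

With 1000 empty subranges the loop finishes in 10 rounds (1 + 2 + ... + 512 = 1023 covers them all) rather than 1000 sequential queries, which is the latency reduction the commit message describes.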
Paweł Dziepak	b86a826baf	dht: describe split_range[s]_to_shards() guarantees We are going to require these functions to return sorted and disjoint ranges. They already do so (provided that the input ranges are sorted and disjoint), but if the guarantee is not explicitly stated it may disappear some day.	2016-12-15 13:07:32 +00:00
Asias He	463cc4fbde	dht: Introduce split_ranges_to_shards Split ranges into a shard ranges map with the ring_position_range_sharder helper.	2016-12-12 09:04:21 +08:00
Asias He	044c4ff44c	dht: Introduce split_range_to_shards Split a range into shard ranges map with ring_position_range_sharder helper.	2016-12-12 09:04:21 +08:00
Duarte Nunes	ada2f1092e	dht: Make i_partitioner::tri_compare pure virtual This patch makes the i_partitioner::tri_compare() function pure virtual as it is overridden by all partitioners. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20161211172037.16496-1-duarte@scylladb.com>	2016-12-11 19:29:37 +02:00
Duarte Nunes	bb66b051ed	dht: Make i_partitioner::tri_compare memory safe This patch fixes a typo in i_partitioner::tri_compare() where we were using std::max instead of std::min, thus avoiding accessing random memory and getting random results. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20161211165043.17816-1-duarte@scylladb.com>	2016-12-11 18:58:10 +02:00
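The std::min versus std::max issue in the commit above can be illustrated with a small byte-wise three-way compare. This is a sketch under assumptions: `token_bytes` and `tri_compare_bytes` are hypothetical names, not ScyllaDB's actual types or signatures.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a token's raw bytes.
using token_bytes = std::vector<std::uint8_t>;

// Three-way compare returning -1, 0, or 1.
int tri_compare_bytes(const token_bytes& a, const token_bytes& b) {
    // Only the common prefix may be read. Using std::max here would make
    // memcmp read past the end of the shorter buffer.
    const std::size_t common = std::min(a.size(), b.size());
    if (common > 0) {
        const int r = std::memcmp(a.data(), b.data(), common);
        if (r != 0) {
            return r < 0 ? -1 : 1;
        }
    }
    // Equal prefixes: the shorter sequence orders first.
    if (a.size() == b.size()) {
        return 0;
    }
    return a.size() < b.size() ? -1 : 1;
}
```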
Avi Kivity	28857e42e7	Merge " Virtualize size_estimates system table" from Duarte "We currently write the size_estimates system table for every schema on a periodic basis, currently set to 5 minutes, which can interfere with an ongoing workload. This patchset virtualizes it such that queries are intercepted and we calculate the results on the fly, only for the ranges the caller is interested in. Fixes #1616" * 'virtual-estimates/v4' of github.com:duarten/scylla: size_estimates_virtual_reader: Add unit test db: Delete size_estimates_recorder size_estimates: Add virtual reader column_family: Add support for virtual readers storage_service: get_local_tokens() returns a future nonwrapping_range: Add slice() function range: Find a sequence's lower and upper bounds system_keyspace: Build mutations for size estimates size_estimates: Store the token range as bytes range_estimates: Add schema murmur3_partitioner: Convert maximum_token to sstring	2016-11-28 10:12:59 +02:00
Avi Kivity	07d5a20bae	Wire up sharding ignore msb parameter to configuration We might have used a fancy map<sstring, any> to pass the parameters, but that's overkill for now.	2016-11-22 22:40:47 +02:00
Avi Kivity	8b1d689de8	partitioner: add ignore_msb parameters to byte ordered and random partitioners Ignored; doesn't make sense on byte ordered, and random is deprecated.	2016-11-22 21:56:42 +02:00
Avi Kivity	af16c0fac4	murmur3_partitioner: shard on the middle token bits, not most significant bits Sharding on the most significant token bits aliases with the vnode mechanism, which also uses the most significant bits; this requires a huge number of vnodes to achieve good sharding. This patch teaches the murmur3 partitioner to ignore the most significant N bits when calculating a token's shard, so we use token bits which still have some entropy. In effect, this changes the token range layout from shard 0 shard 1 ... shard S-1 to shard 0 shard 1 ... shard S-1 shard 0 shard 1 ... shard S-1 ... shard 0 shard 1 ... shard S-1 Where the number of repetitions of the block is 2^(ignored msb bits). For compatibility, the default is zero ignored bits, matching the pre-patch state, until we wire things up.	2016-11-22 21:56:42 +02:00
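The layout change described in the commit above can be sketched by shifting the ignored bits out of the token before scaling it to a shard number. The function name `shard_of_ignoring_msb`, its signature, and the use of an unsigned, zero-based 64-bit token are assumptions for illustration; this is not the partitioner's actual code.

```cpp
#include <cassert>
#include <cstdint>

// Map an unsigned 64-bit token to a shard, ignoring the top ignore_msb_bits.
// Assumes ignore_msb_bits < 64 and shards >= 1.
unsigned shard_of_ignoring_msb(std::uint64_t token, std::uint32_t shards,
                               unsigned ignore_msb_bits) {
    // Dropping the most significant bits makes the shard pattern repeat
    // 2^ignore_msb_bits times across the token ring.
    const std::uint64_t t = token << ignore_msb_bits;
    // Compute (t * shards) >> 64 without a 128-bit type by splitting t
    // into 32-bit halves; the intermediate sum cannot overflow 64 bits.
    const std::uint64_t hi = t >> 32;
    const std::uint64_t lo = t & 0xffffffffu;
    return static_cast<unsigned>((hi * shards + ((lo * shards) >> 32)) >> 32);
}
```

With 4 shards and zero ignored bits, the ring is cut into four contiguous quarters. With one ignored bit, each half of the ring contains all four shards in order, so the block `shard 0 ... shard 3` appears twice.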
Duarte Nunes	01815ecd24	murmur3_partitioner: Convert maximum_token to sstring This patch ensures we can convert the maximum_token to an sstring. For Cassandra, the minimum and maximum tokens have the same representation. So, we use the string representation of the minimum_token for the maximum_token. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-11-21 10:56:32 +00:00
Duarte Nunes	66f6a367a4	ring_position_range_sharder: Avoid copying eagerly Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20161104115632.15974-1-duarte@scylladb.com>	2016-11-13 11:42:23 +02:00
Avi Kivity	7202b94183	dht: introduce a sharder for vectors of partition ranges Building on the single-range sharder, add a sharder for vectors of partition ranges. This helps with wrapped ranges, which are translated into a vector containing two shards.	2016-11-03 19:10:20 +02:00
Avi Kivity	43a2380899	dht: add a generator for shard/range pairs Divides a ring_position range into a sequence of shard/range pairs. This allows sequential iteration over shards in ring order. The current multi-partition query executes on all shards in parallel, but this is very wasteful, as most of the data will be thrown away if it is not included in the page. With the generator, we can switch to sequential execution.	2016-11-03 19:10:17 +02:00
Avi Kivity	1f88d103a8	partitioner: add i_partitioner::token_for_next_shard() When performing a range query, we want to iterate over shards, running the query on each shard in order until the query range is exhausted or we have the right number of rows. To be able to do this, introduce token_for_next_shard(), which allows us to determine the boundary between shards. It is a sort-of inverse to shard_of(), in that shard_of(token_for_next_shard(t)) == shard_of(t) + 1	2016-11-03 19:09:23 +02:00
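The inverse relationship stated in the commit above can be checked on a toy ring. The sketch below assumes a 32-bit unsigned token space with contiguous shard ranges (no ignored bits); `shard_of_token32`, `next_shard_start`, and `ring_size` are hypothetical names, and the real partitioner works on a different token type.

```cpp
#include <cassert>
#include <cstdint>

// One past the largest token of the toy 32-bit ring.
constexpr std::uint64_t ring_size = std::uint64_t(1) << 32;

// Shard owning a token: scale the token into [0, shards).
unsigned shard_of_token32(std::uint32_t token, std::uint32_t shards) {
    return static_cast<unsigned>((std::uint64_t(token) * shards) >> 32);
}

// First token owned by the shard after the one owning `token`, or ring_size
// if `token` already belongs to the last shard (end of the ring).
std::uint64_t next_shard_start(std::uint32_t token, std::uint32_t shards) {
    const std::uint64_t next = std::uint64_t(shard_of_token32(token, shards)) + 1;
    if (next >= shards) {
        return ring_size;
    }
    // Smallest t with (t * shards) >> 32 >= next, i.e. ceil(next * 2^32 / shards).
    return (next * ring_size + shards - 1) / shards;
}
```

For any token outside the last shard, `shard_of_token32(next_shard_start(t, n), n)` equals `shard_of_token32(t, n) + 1`, and the token just before the returned boundary still belongs to the original shard.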
Avi Kivity	6c45b0bae8	partitioner: make comparators public The public comparison operators depend on global_partitioner(), and are therefore less useful for tests.	2016-11-03 11:27:40 +02:00
Avi Kivity	6320181b97	partitioner: const correctness for comparators	2016-11-03 11:27:40 +02:00
Avi Kivity	470826d127	partitioner: change partitioners to have shard counts independent from smp::count Useful for testing.	2016-11-03 11:27:40 +02:00
Avi Kivity	a35136533d	Convert ring_position and token ranges to be nonwrapping Wrapping ranges are a pain, so we are moving wrap handling to the edges. Since cql can't generate wrapping ranges, this means thrift and the ring maintenance code; also range->ring transformations need to merge the first and last ranges. Message-Id: <1478105905-31613-1-git-send-email-avi@scylladb.com>	2016-11-02 21:04:11 +02:00
Duarte Nunes	862f51cddf	partitioner: Parse token from bytes This patch adds the from_bytes() function to the i_partitioner class, whose purpose is to parse a particular token and explicitly handle the case when the minimum token is specified. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-09-30 11:17:02 +00:00
Avi Kivity	4fcebd4ca6	random_partitioner: fix overflow in shard_of() uint128_t will overflow if smp::count > 2. Replace with a larger type. Message-Id: <1471188765-30142-1-git-send-email-avi@scylladb.com>	2016-08-15 09:41:54 +03:00
Asias He	2f4cd86809	random_partitioner: Implement random_partitioner Cassandra 1.x clusters often use RandomPartitioner. Supporting RandomPartitioner will allow easier migration to Scylla Tests are added to make sure scylla generates the same token as Cassandra does for the same partition key. Fixes #1438 Message-Id: <3bc8b7f06fad16d59aaaa96e2827198ce74214c6.1469166766.git.asias@scylladb.com>	2016-07-24 16:25:25 +03:00
Duarte Nunes	aaa76d58ba	query: Move to_partition_range to dht namespace This patch moves to_partition_range, from the query namespace to the dht namespace, where it is a more natural fit. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <1468498060-19251-1-git-send-email-duarte@scylladb.com>	2016-07-15 10:41:52 +02:00
Asias He	f4389349e4	config: Enable partitioner option Enable --partitioner option so that the user can choose a partitioner other than the default Murmur3Partitioner. Currently, only Murmur3Partitioner and ByteOrderedPartitioner are supported. When an unsupported partitioner is specified, an error will be propagated to the user.	2016-07-08 17:44:55 +08:00
Asias He	9c27b5c46e	byte_ordered_partitioner: Implement missing describe_ownership and midpoint In order to support ByteOrderedPartitioner, we need to implement the missing describe_ownership and midpoint functions in the byte_ordered_partitioner class. As a starter, this patch uses a simple node token distance based method to calculate ownership. C* uses a complicated key samples based method. We can switch to what C* does later. Tests are added to tests/partitioner_test.cc. Fixes #1378	2016-07-08 17:44:55 +08:00
Asias He	f6a2672be0	storage_service: Modify log to match config option of scylla We currently log as follows: May 9 00:09:13 node3.nl scylla[2546]: [shard 0] storage_service - This node was decommissioned and will not rejoin the ring unless cassandra.override_decommission=true has been set,or all existing data is removed and the node is bootstrapped again However, the user should use override_decommission:true instead of cassandra.override_decommission:true in scylla.yaml where the cassandra prefix is stripped. Fixes #1240 Message-Id: <b0c9424c6922431ad049ab49391771e07ca6fbde.1467079190.git.asias@scylladb.com>	2016-07-04 10:47:49 +02:00
Piotr Jastrzebski	27575a0528	Fix previous_entry_is_continuous Rename it to check_previous_entry. Remove unnecessary test. Make sure ring_position always has working relation_to_keys method. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <6bc790d492ba9b5c302a50218f3e26b924f657d0.1467101754.git.piotr@scylladb.com>	2016-06-28 10:27:08 +02:00
Asias He	ee0585cee9	dht: Add default constructor for token It is needed to put token in to a boost interval_map in the following patch.	2016-05-17 17:32:15 +08:00
Pekka Enberg	38a54df863	Fix pre-ScyllaDB copyright statements People keep tripping over the old copyrights and copy-pasting them to new files. Search and replace "Cloudius Systems" with "ScyllaDB". Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>	2016-04-08 08:12:47 +03:00
Gleb Natapov	775cc93880	remove unused range and token serializers	2016-02-02 12:15:49 +02:00
Asias He	bdd6a69af7	streaming: Drop unused parameters - int connections_per_host Scylla does not create connections per stream_session, instead it uses rpc, thus connections_per_host is not relevant to scylla. - bool keep_ss_table_level - int repaired_at Scylla does not stream sstable files. They are not relevant to scylla.	2016-01-25 11:38:13 +08:00
Gleb Natapov	043d132ba9	Remove no longer used serializers.	2016-01-24 12:45:41 +02:00
Gleb Natapov	49ce2b83df	Add ring_position constructor needed by serializer.	2016-01-24 12:45:41 +02:00
Asias He	89b79d44de	streaming: Get rid of the _connecting_ parameter messaging_service will use private ip address automatically to connect a peer node if possible. There is no need for the upper level like streaming to worry about it. Dropping it simplifies things a bit.	2015-12-31 11:25:08 +01:00
Nadav Har'El	f0b27671a2	murmur3 partitioner: remove outdated comment, and code Since commit `16596385ee`, long_token() is already checking t.is_minimum(), so the comment which explains why it does not (for performance) is no longer relevant. And we no longer need to check t._kind before calling long_token (the check we do here is the same as is_minimum). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2015-12-30 10:01:29 +02:00
Nadav Har'El	06ab43a7ee	murmur3 partitioner: fix midpoint() algorithm The midpoint() algorithm to find a token between two tokens doesn't work correctly in case of wraparound. The code tried to handle this case, but did it wrong. So this patch fixes the midpoint() algorithm, and adds clearer comments about why the fixed algorithm is correct. This patch also modifies two midpoint() tests in partitioner_test, which were incorrect - they verified that midpoint() returns some expected values, but expected values were wrong! We also add to the test a more fundamental test of midpoint() correctness, which doesn't check the midpoint against a known value (which is easy to get wrong, like indeed happened); rather we simply check that the midpoint is really inside the range (according to the token ordering operator). This simple test failed with the old implementation of midpoint() and passes with the new one. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2015-12-24 17:19:49 +02:00
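The "midpoint must lie inside the range" check described in the commit above can be sketched on a ring of unsigned 64-bit positions. This is an illustration under assumptions, not the murmur3 partitioner's algorithm: `ring_midpoint` and `in_ring_range` are hypothetical names, tokens are treated as unsigned offsets on the ring, and both range ends are taken as inclusive.

```cpp
#include <cassert>
#include <cstdint>

// Midpoint of the ring segment that starts at `left` and runs forward to
// `right`. Unsigned subtraction is modular, so the distance is correct even
// when the segment wraps past the end of the ring (left > right).
std::uint64_t ring_midpoint(std::uint64_t left, std::uint64_t right) {
    const std::uint64_t distance = right - left;  // modulo 2^64
    return left + distance / 2;                   // may wrap, as intended
}

// True if `t` lies on the segment from `left` forward to `right`.
bool in_ring_range(std::uint64_t left, std::uint64_t right, std::uint64_t t) {
    if (left <= right) {
        return left <= t && t <= right;  // ordinary range
    }
    return t >= left || t <= right;      // wrapping range
}
```

Checking `in_ring_range(l, r, ring_midpoint(l, r))` mirrors the property-style test the commit message prefers over comparing against hand-computed expected values.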
Pekka Enberg	e56bf8933f	Improve not implemented errors Print out the function name where we're throwing the exception from to make it easier to debug such exceptions.	2015-12-18 10:51:37 +01:00
Tomasz Grabiec	a78f4656e8	Introduce ring_position_less_comparator	2015-12-15 18:00:55 +01:00
Asias He	0af7fb5509	range_streamer: Kill FIXME in use_strict_consistency for consistent_rangemovement	2015-11-30 09:15:42 +08:00
Asias He	f80e3d7859	range_streamer: Simplify multiple_map to map conversion in add_ranges	2015-11-30 09:15:42 +08:00
Asias He	21882f5122	range_streamer: Kill one leftover comment	2015-11-30 09:15:42 +08:00
Asias He	6b258f1247	range_streamer: Kill FIXME for is_replacing	2015-11-30 09:15:42 +08:00
Asias He	6aa5bfe59f	range_streamer: Add virtual destructor to i_source_filter Found by debug build ==10190==ERROR: AddressSanitizer: new-delete-type-mismatch on 0x602000084430 in thread T0: object passed to delete has wrong type: size of the allocated type: 16 bytes; size of the deallocated type: 8 bytes. #0 0x7fe244add512 in operator delete(void, unsigned long) (/lib64/libasan.so.2+0x9a512) #1 0x3c674fe in std::default_delete<dht::range_streamer::i_source_filter>::operator()(dht::range_streamer::i_source_filter) const /usr/include/c++/5.1.1/bits/unique_ptr.h:76 #2 0x3c60584 in std::unique_ptr<dht::range_streamer::i_source_filter, std::default_delete<dht::range_streamer::i_source_filter> >::~unique_ptr() /usr/include/c++/5.1.1/bits/unique_ptr.h:236 #3 0x3c7ac22 in void __gnu_cxx::new_allocator<std::unique_ptr<dht::range_streamer::i_source_filter, std::default_delete<dht::range_streamer::i_source_filter> > >::destroy<std::unique_ptr<dht::range_streamer::i_source_filter, std::default_delete<dht::range_streamer::i_source_filter> > >(std::unique_ptr<dht::range_streamer::i_source_filter, std::default_delete<dht::range_streamer::i_source_filter> >*) /usr/include/c++/5.1.1/ext/new_allocator.h:124 ...	2015-11-12 11:19:22 +02:00
Asias He	87292d6a16	range_streamer: Simplify unordered_multimap_to_unordered_map operator[] is our friend, it creates map[x] if x is not in the map.	2015-11-09 08:43:04 +08:00
Asias He	a54989cd65	range_streamer: Fix get_all_ranges_with_strict_sources_for std::set_difference requires the container to be sorted which is not true here, use remove_if. Do not use assert, use throw instead so that we can recover from this error.	2015-11-09 08:43:04 +08:00
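The point in the commit above is that std::set_difference has a precondition (both inputs sorted) that unsorted containers violate, while an erase/remove_if pass has no such requirement. A minimal sketch, using a hypothetical helper `without` on plain integers rather than the range_streamer's endpoint types:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Return `from` with every element that appears in `excluded` removed.
// Works on unsorted input and preserves the order of the surviving elements.
std::vector<int> without(std::vector<int> from, const std::vector<int>& excluded) {
    from.erase(
        std::remove_if(from.begin(), from.end(),
                       [&excluded](int x) {
                           return std::find(excluded.begin(), excluded.end(), x)
                                  != excluded.end();
                       }),
        from.end());
    return from;
}
```

The linear search makes this quadratic in the worst case, which is usually acceptable for short lists such as a node's candidate endpoints; sorting both inputs first would be the alternative if std::set_difference were preferred.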
Asias He	d166b0f3fa	range_streamer: Add get_work_map	2015-11-09 08:43:04 +08:00
Asias He	ed313160c2	storage_service: Add initial_token config option support	2015-11-04 10:42:17 +08:00
Asias He	16596385ee	token: Handle minimum token correctly in long_token Fixes: Exiting on unhandled exception of type 'runtime_exception': runtime error: Invalid token. Should have size 8, has size 0	2015-11-04 09:01:06 +08:00
Amnon Heiman	b77ec2bd6a	Importing token_range and endpoint_details from origin The storage server uses the token_range in origin to return information about the ring. This imports the structures. The functionality in origin is redundant in this case and was not imported. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2015-11-03 10:17:05 +02:00


152 Commits