scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-26 11:30:36 +00:00

Author	SHA1	Message	Date
Asias He	d23dafa7ac	dht: Remove column_families parameter in add_rx_ranges and add_tx_ranges In `4b1034b` (storage_service: Remove the stream_hints), we removed the only user of the api with the column_families parameter. std::vector column_families = { db::system_keyspace::HINTS }; streamer->add_tx_ranges(keyspace, std::move(ranges_per_endpoint), column_families); We can simplify the code range_streamer a bit by removing it. Fixes #3476 Tests: dtest update_cluster_layout_tests.py Message-Id: <c81d79c5e6dbc8dd78c1242837de892e39d6abd2.1528356342.git.asias@scylladb.com>	2018-06-10 14:53:40 +03:00
Glauber Costa	250d9332dc	partitioner: export the name of the algorithm used to do intra-node sharding We will export this on system tables. To avoid hard-coding it in the system table level, keep it at least in the dht layer where it belongs. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-06-04 11:25:58 -04:00
Avi Kivity	9eb7c0c65b	Merge "Remove (some) reactor stalls in the SSTable code" from Glauber " This is an improvement on my latest series. Instead of just dealing with the problem of destroying the Summary that I have identified in a previous test, I have tried to find other sources of stalls. Some of them are on readers and would affect early processes and operations like nodetool refresh. Others are on writers, which can affect any SSTable being written. Two of those stalls (on large filter, on summary read), I saw in a synthetic benchmark where I used very small values + nodetool compact to generate one SSTable with many keys. They were 80ms and 20ms respectively, and now they are totally gone. For others, I just tried to be safe (for instance, if we know reading/writing large vectors can be costly, just always insert preemption points in them). With all of these patches applied, I no longer see stalls coming from the SSTable code in those tests (although given enough time, I am sure I can find more). Tests: unit (release) Fixes: #3282, Fixes #3281, Fixes #3269 " * 'sstables-stalls-v3-updated' of github.com:glommer/scylla: large_bitset/bloom filter: add preemption points in loops sstables: read filter in a thread abstract summary entry version of the token with a token view add a token_view sstables: rework summary entries reading sstables: avoid calls to resize for vectors sstables: replace potentially large for loop with do_until summary_entry: do not store key bytes in each summary entry tests: change tests to make summary non-copyable chunked_vector: do not iterate to destruct trivially destructible types	2018-03-16 09:43:36 +01:00
Glauber Costa	dddc7e1676	add a token_view Ideally we would like tokens to be trivially destructible, so that we can easily dispose of giant vectors holding them. While that is hard to do with our current infrastructure, we can introduce a token_view, which holds a bytes_view instead of the real data, making it trivially destructible. The comparators are then changed to take a token_view, and an implicit conversion function is provided from tokens so they get compared. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-03-15 12:24:09 -04:00
Asias He	9b5585ebd5	range_streamer: Stream 10% of ranges instead of 10 ranges per time If there are a lot of ranges, e.g., num_tokens=2048, 10 ranges per stream plan will cause tons of stream plans to be created to stream data, each carrying very little data. This gives each stream plan low transfer bandwidth, so that the total time to complete the streaming increases. It makes more sense to send a percentage of the total ranges per stream plan than a fixed number of ranges. Here is an example to stream a keyspace with 513 ranges in total, 10 ranges vs. 10% of ranges: Before: [shard 0] range_streamer - Bootstrap with 127.0.0.1 for keyspace=system_traces, 510 out of 513 ranges: ranges = 51 [shard 0] range_streamer - Bootstrap with ks for keyspace=127.0.0.1 succeeded, took 107 seconds After: [shard 0] range_streamer - Bootstrap with 127.0.0.1 for keyspace=system_traces, 510 out of 513 ranges: ranges = 10 [shard 0] range_streamer - Bootstrap with ks for keyspace=127.0.0.1 succeeded, took 22 seconds Message-Id: <a890b84fbac0f3c3cc4021e30dbf4cdf135b93ea.1520992228.git.asias@scylladb.com>	2018-03-14 10:12:12 +02:00
Asias He	73d8e2743f	dht: Fix log in range_streamer The address and keyspace should be swapped. Before: range_streamer - Bootstrap with ks3 for keyspace=127.0.0.1 succeeded, took 56 seconds After: range_streamer - Bootstrap with 127.0.0.1 for keyspace=ks3 succeeded, took 56 seconds Message-Id: <5c49646f1fbe45e3a1e7545b8470e04b166922c4.1520416042.git.asias@scylladb.com>	2018-03-07 11:49:58 +02:00
Raphael S. Carvalho	19d994cfff	dht: make it easier to create ring_position_view from token that's done by adding a separate explicit constructor Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-01-03 15:26:26 -02:00
Raphael S. Carvalho	68ac0832b7	dht: introduce is_min/max for ring_position Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-01-03 15:26:25 -02:00
Paweł Dziepak	8c3b7fea81	Merge "Introduce new API and converters from/to old mutation_reader" from Piotr "This changeset is the first step to flatten mutation_reader. Then it introduces new mutation_fragment types for partition header and end of partition. Using those a new flat_mutation_reader is defined. Finally it introduces converters between new flat_mutation_reader and old mutation_reader." * 'haaawk/flattened_mutation_reader_v12' of github.com:scylladb/seastar-dev: Add tests for flat_mutation_reader Introduce conversion from flat_mutation_reader to mutation_reader Introduce conversion from mutation_reader to flat_mutation_reader Introduce flat_mutation_reader Extract FlattenedConsumer concept using GCC6_CONCEPT Introduce partition_end mutation_fragment Introduce a position for end of partition Introduce partition_start mutation_fragment Introduce FragmentConsumer Introduce a position for partition start streamed_mutation: Extract concepts using GCC6_CONCEPT macro	2017-10-16 12:14:23 +01:00
Duarte Nunes	2210d10552	gms/gossiper: Cleanup is_alive() Make it use get_endpoint_state_for_endpoint_ptr(), check if gossiper is enabled, mark it as const, and have some callers use it instead of open coding the logic. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-10-11 10:02:32 +01:00
Piotr Jastrzebski	2516b42752	Introduce partition_start mutation_fragment This type of mutation_fragment will be used in new mutation_reader to signal the beginning of the next partition. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-10-10 16:15:59 +02:00
Duarte Nunes	ceebbe14cc	gossiper: Avoid endpoint_state copies gossiper::get_endpoint_state_for_endpoint() returns a copy of endpoint_state, which we've seen can be very expensive. This patch adds a similar function which returns a pointer instead, and changes the call sites where using the pointer-returning variant is deemed safe (the pointer neither escapes the function, nor crosses any defer point). Fixes #764 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-10-10 13:48:02 +01:00
Tomasz Grabiec	741ec61269	streaming: Fix streaming not streaming all ranges It skipped one sub-range in each 10-range batch, and tried to access the range vector using the end() iterator. Fixes sporadic failures of update_cluster_layout_tests.py:TestUpdateClusterLayout.simple_add_node_1_test. Message-Id: <1505848902-16734-1-git-send-email-tgrabiec@scylladb.com>	2017-09-20 10:33:59 +03:00
Botond Dénes	a980ff6463	Use abort() instead of assert + throw in unreachable code Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <393c3730111dfe090c44d8fc2e31602956a7d008.1504022425.git.bdenes@scylladb.com>	2017-09-03 11:07:27 +03:00
Botond Dénes	d1209c548a	Fix -Wreturn-type warnings Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <99f7a006daaa78eb87720ac51c394093398bc868.1504013915.git.bdenes@scylladb.com>	2017-08-29 16:41:09 +03:00
Tomasz Grabiec	2ca99be27d	ring_position_view: Print token instead of token pointer Broken in `e989d65539`. Message-Id: <1503667158-7544-1-git-send-email-tgrabiec@scylladb.com>	2017-08-25 14:25:21 +01:00
Avi Kivity	81a33df25d	dht: reduce split_range_to_single_shard contiguous memory demand split_range_to_single_shard() returns a vector of size 4096, with each element (a partition_range) of size 100. The total of 400k can cause defragmentation if memory is fragmented. Fix by using a deque. Fixes #2707. Message-Id: <20170819141017.28287-1-avi@scylladb.com>	2017-08-21 14:25:45 +02:00
Duarte Nunes	ec75eac37d	ring_position_exponential_vector_sharder: Take ranges by rvalue Avoids some copies. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20170814093310.29200-1-duarte@scylladb.com>	2017-08-14 12:55:43 +03:00
Asias He	f239b11a84	storage_service: Use the new range_streamer interface for bootstrap So that bootstrap operation will now stream small ranges at a time and restream the failed ranges.	2017-08-07 16:31:47 +08:00
Asias He	6810031ba7	dht: Extend range_streamer interface After this patch and the following patches to use the new range_streamer interface, all the following cluster operations: - bootstrap - rebuild - decommission - removenode will use the same code to do the streaming. The range_streamer is now extended to support both fetching from and pushing to a peer node. Another big change is that the range_streamer will now stream fewer ranges at a time, so less data, per stream_plan, and range_streamer will remember which ranges failed to stream and can retry them later. The retry policy is very simple at the moment: it retries at most 5 times and sleeps 1 minute, 1.5^2 minutes, 1.5^3 minutes, .... Later, we can introduce an api for the user to decide when to stop retrying and the retry interval. The benefits: - All the cluster operations share the same code to stream - We can know the operation progress, e.g., we can know the total number of ranges that need to be streamed and the number of ranges finished in bootstrap, decommission, etc. - All the cluster operations can survive a peer node going down during the operation, which usually takes a long time to complete, e.g., when adding a new node, currently if any of the existing nodes which stream data to the new node has an issue sending data to the new node, the whole bootstrap process will fail. After this patch, we can fix the problematic node and restart it, and the joining node will retry streaming from the node again. - We can fail streaming early, time out early, and retry less because all the operations that use streaming can survive the failure of a single stream_plan. It is not that important for now to make a single stream_plan successful. Note, another user of streaming, repair, is now using small stream_plans as well and can rerun the repair for the failed ranges too. This is one step closer to supporting the resumable add/remove node operations.	2017-08-07 16:31:47 +08:00
Paweł Dziepak	68e57a742f	ring_position_comparator: drop unused overloads	2017-07-26 14:36:37 +01:00
Paweł Dziepak	fe7eba7f06	ring_position_comparator: accept sstables::decorated_key_view ring_position_comparator has overloads for comparing ring_positions as well as sstables::key_view. In the case of the latter it needs to compute the token of the key. However, the sstable layer could cache some tokens so let's allow the comparator callers to provide it directly.	2017-07-26 14:36:36 +01:00
Tomasz Grabiec	60678f0e8a	ring_position: Optimize construction from r-value references of decorated_key Message-Id: <1500650171-26291-1-git-send-email-tgrabiec@scylladb.com>	2017-07-24 10:25:14 +03:00
Asias He	d835cf2748	dht: Add selective_token_range_sharder It is like ring_position_range_sharder but it works with dht::token_range. This sharder will return the ranges that belong to a selected shard.	2017-07-04 18:46:19 +08:00
Tomasz Grabiec	e989d65539	dht: Make ring_position_view copyable dht::token needs to be stored as a pointer now and not a reference so that validity of old pointers is not impacted by in-place object construction which would occur in the copy-assignment operator. [1] says that old pointers can be used to access the new object only if the type "does not contain any non-static data member whose type is const-qualified or a reference type". [1] http://en.cppreference.com/w/cpp/language/lifetime#Storage_reuse	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	81e7b561da	dht: Add ring_position min()/max()	2017-06-24 18:06:11 +02:00
Avi Kivity	f9f2f18145	dht: fix bad to_sstring() call to_sstring() is part of seastar, not the global namespace.	2017-06-22 17:51:27 +03:00
Avi Kivity	ebaeefa02b	Merge seastar upstream (seastar namespace) - introduced "seastarx.hh" header, which does a "using namespace seastar"; - 'net' namespace conflicts with seastar::net, renamed to 'netw'. - 'transport' namespace conflicts with seastar::transport, renamed to cql_transport. - "logger" global variables now conflict with logger global type, renamed to xlogger. - other minor changes	2017-05-21 12:26:15 +03:00
Calle Wilund	6ca07f16c1	scylla: fix compilation errors on gcc 5 Message-Id: <1495030581-2138-1-git-send-email-calle@scylladb.com>	2017-05-17 17:56:06 +03:00
Avi Kivity	68034604e1	dht: murmur3_partitioner: simplify moving to and from the zero-based token range	2017-05-17 13:50:30 +03:00
Avi Kivity	76f12a8842	dht: add split_range_to_single_shard() Intersects a shard's owning range with a ring position range, and returns the sorted result.	2017-05-17 13:50:27 +03:00
Avi Kivity	a65e8bd215	dht: add a ring-position-range-vector variant of the exponential sharder The "exponentiality" is not carried over from one range to another, because we expect one or two ranges (two ranges result from a wrapped around thrift token range).	2017-05-17 13:18:52 +03:00
Avi Kivity	f671ac13b4	dht: add an exponential ring_position range sharder Like the regular sharder, the exponential sharder divides a range into subranges owned by individual shards. Unlike the regular sharder, it generates ever-increasing subranges, spanning more and more shards, and eventually returns several subranges per shard. To avoid using exponential cpu and memory, subranges belonging to a single shard are merged, and a flag is set to indicate the subranges are not ordered wrt. each other.	2017-05-17 13:18:49 +03:00
Avi Kivity	025c6b45b2	dht: extend i_partitioner::next_token_for_shard() Right now, next_token_for_shard() only allows iterating linearly in shard order. Add the ability to select a specific shard to skip to (in case we're only interested in a single shard), and to select larger ranges (so that exponential increases are not implemented by iteration).	2017-05-17 12:30:03 +03:00
Avi Kivity	7156ea8804	dht: make ring_position_range_sharder more independent of global_partitioner Useful for testing.	2017-05-17 12:30:03 +03:00
Avi Kivity	302fec8293	dht: make i_partitioner::name() const	2017-05-17 12:30:03 +03:00
Avi Kivity	f462c4327e	dht: make i_partitioner keep track of the number of shards it was configured with Useful for testing classes layered on top of the partitioner (the sharders).	2017-05-17 12:30:03 +03:00
Avi Kivity	04b16ae8ec	dht: fix partitioner initialization for tests The partitioners now depend on smp::count to be initialized correctly, but smp::count isn't available at static initialization time. The scylla executable isn't affected because it calls set_global_partitioner() after smp::count has been initialized. Fix by deferring initialization to the first global_partitioner() call.	2017-05-17 12:30:03 +03:00
Tomasz Grabiec	7db83fa3fe	sstables: index_reader: Optimize advancing to extreme positions	2017-04-20 10:54:38 +02:00
Tomasz Grabiec	c7b9c5dfd3	dht: ring_position_view: Add key getter	2017-04-20 10:54:38 +02:00
Tomasz Grabiec	5b71e0b9ab	dht: ring_position_view: Add constructor and factory from ring_position_view	2017-04-20 10:54:38 +02:00
Avi Kivity	af118ab52b	murmur3_partitioner: fix build on clang Don't know what the root cause is, but the fix is harmless.	2017-04-17 23:03:15 +03:00
Avi Kivity	c05f60387b	i_partitioner: remove unused function Found by clang.	2017-04-17 23:03:15 +03:00
Avi Kivity	a496ec7f5b	byte_ordered_partitioner: fix bad operator precedence Found by clang.	2017-04-17 23:03:15 +03:00
Tomasz Grabiec	d4b6e430ed	dht: Introduce ring_position_view	2017-03-28 18:10:39 +02:00
Tomasz Grabiec	55a7cceef5	dht: Move comparison logic from ring_position::tri_compare() to ring_position_comparator It will soon define common ordering for many objects, not just ring_position.	2017-03-28 18:10:39 +02:00
Tomasz Grabiec	65a8920b25	dht: Make min/max tokens capturable by reference So that they can be later used in views.	2017-03-28 18:10:39 +02:00
Avi Kivity	54b8acdd9f	dht: add hashing and comparison helpers to dht::decorated_key An std::hash specialization, and an equality comparator.	2017-01-20 11:24:14 +02:00
Avi Kivity	141048e0e5	dht: improve token hash function For a small token, we can just return it, since it already is a hash. We hash large tokens using murmur3, which is supposedly a good hash.	2017-01-20 11:24:14 +02:00
Avi Kivity	8686a59ea5	dht: use nonwrapping_ranges in ring_position_range_sharder It was the observation that ring_position_range_sharder doesn't support wrapping ranges that started the nonwrapping_range madness, but that class still has some leftover wrapping ranges. Close the circle by removing them. Message-Id: <20161123153113.8944-1-avi@scylladb.com>	2016-12-22 14:40:30 +01:00

207 Commits