scylladb

Author	SHA1	Message	Date
Benny Halevy	57ff3f240f	dht: optimize subtract_ranges Take advantage of the fact that both ranges and ranges_to_subtract are deoverlapped and sorted, to reduce the calculation complexity from quadratic to linear. Fixes #11922 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-21 15:48:28 +02:00
Benny Halevy	8b81635d95	compaction: refactor dht::subtract_ranges out of get_ranges_for_invalidation The algorithm is generic and can be used elsewhere. Add a unit test for the function before it gets optimized in the following patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-21 15:48:26 +02:00
Benny Halevy	10f8f13b90	db: view_update_generator: always clean up staging sstables Since they are currently not cleaned up by cleanup compaction, filter their tokens, processing only tokens owned by the current node (based on the keyspace replication strategy). Refs #9559 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-09 07:38:22 +02:00
Benny Halevy	fd3e66b0cc	compaction: extract incremental_owned_ranges_checker out to dht It is currently used by cleanup_compaction partition filter. Factor it out so it can be used to filter staging sstables in the next patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-09 07:32:56 +02:00
Botond Dénes	169a8a66f2	compatible_ring_position_or_view: make it cheap to copy This class exists for one purpose only: to serve as glue code between dht::ring_position and boost::icl::interval_map. The latter requires that keys in its intervals are: * default constructible * copyable * have standalone compare operations For this reason we have to wrap `dht::ring_position` in a class, together with a schema to provide all this. This is `compatible_ring_position`. There is one further requirement by code using the interval map: it wants to do lookups without copying the lookup key(s). To solve this, we came up with `compatible_ring_position_or_view` which is a union of a key or a key view + schema. As we recently found out, boost::icl copies its keys a lot. It seems to assume these keys are cheap to copy and carelessly copies them around even when iterating over the map. But `compatible_ring_position_or_view` is not cheap to copy as it copies a `dht::ring_position` which allocates, and it does that via an `std::optional` and `std::variant` to add insult to injury. This patch makes said class cheap to copy by getting rid of the variant and storing the `dht::ring_position` via a shared pointer. The view is stored separately and either points to the ring position stored in the shared pointer or to an outside ring position (for lookups). Fixes: #11669 Closes #11670	2022-10-04 12:00:21 +03:00
Asias He	9ed401c4b2	streaming: Add finished percentage metrics for node ops using streaming We have added the finished percentage for repair-based node operations. This patch adds the finished percentage for node ops using the old streaming. Example output: scylla_streaming_finished_percentage{ops="bootstrap",shard="0"} 1.000000 scylla_streaming_finished_percentage{ops="decommission",shard="0"} 1.000000 scylla_streaming_finished_percentage{ops="rebuild",shard="0"} 0.561945 scylla_streaming_finished_percentage{ops="removenode",shard="0"} 1.000000 scylla_streaming_finished_percentage{ops="repair",shard="0"} 1.000000 scylla_streaming_finished_percentage{ops="replace",shard="0"} 1.000000 In addition to the metrics, the percentage is also added to the log: [shard 0] range_streamer - Finished 2698 out of 2817 ranges for rebuild, finished percentage=0.95775646 Fixes #11600 Closes #11601	2022-09-22 14:19:34 +03:00
Pavel Emelyanov	b6fdea9a79	code: Call sort_endpoints_by_proximity() via topology The method is about to be moved from snitch to topology, this patch prepares the rest of the code to use the latter to call it. The topology's method just calls snitch, but it's going to change in the next patch. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-09-05 15:14:01 +03:00
Pavel Emelyanov	4184091f1c	snitch, code: Remove get_sorted_list_by_proximity() There are two sorting methods in snitch -- one sorts the list of addresses in place, the other one creates a sorted copy of the passed const list (in fact -- the passed reference is not const, but it's not modified by the method). However, both callers of the latter anyway create their own temporary list of addresses, so they don't really benefit from snitch generating another copy. So this patch leaves just one sorting method -- the in-place one. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-09-05 15:11:37 +03:00
Pavel Emelyanov	6dedc69608	topology: Do not add bootstrapping nodes to topology A recent change in topology (commit `4cbe6ee9` titled "topology: Require entry in the map for update_normal_tokens()") made token_metadata::update_normal_tokens() require the entry's presence in the embedded topology object. Accordingly, the commit in question equipped most callers of update_normal_tokens() with a preceding topology update call to satisfy the requirement. However, tokens are put into token_metadata not only for the normal state, but also for bootstrapping, and one place that added bootstrapping tokens erroneously got a topology update. This is wrong -- a node must not be present in the topology until it switches into the normal state. As a result, several tests with bootstrapping nodes started to fail. The fix removes the topology update for bootstrapping nodes, but this change reveals a few other places that piggy-backed on this mistaken update, so now _they_ need to update topology themselves. tests: https://jenkins.scylladb.com/job/releng/job/Scylla-CI/2040/ update_cluster_layout_tests.py::test_simple_add_new_node_while_schema_changes_with_repair update_cluster_layout_tests.py::test_simple_kill_new_node_while_bootstrapping_with_parallel_writes_in_multidc repair_based_node_operations_test.py::test_lcs_reshape_efficiency Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20220902082753.17827-1-xemul@scylladb.com>	2022-09-04 13:53:38 +03:00
Pavel Emelyanov	7305061674	replication_strategy: Accept dc-rack as get_pending_address_ranges argument The method creates a copy of token metadata and pushes an endpoint (with some tokens) into it. Next patches will require providing dc/rack info together with the endpoint, this patch prepares for that. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-08-26 09:39:44 +03:00
Pavel Emelyanov	360c4f8608	dht: Carry dc-rack over boot_strapper and range_streamer Both classes may populate (temporary clones of) the token metadata object with endpoint:tokens pairs for the endpoint they work with. Next patches will require that the endpoint comes with the dc/rack info. This patch makes sure dht classes have the necessary information at hand (for now it's just an empty pair of strings). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-08-26 09:37:02 +03:00
Benny Halevy	91ab8ee1c3	effective_replication_map: make get_range_addresses asynchronous So it may yield, preventing reactor stalls as seen in #11005. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 17:31:01 +03:00
Benny Halevy	9b2af3f542	range_streamer: add_ranges and friends: get erm as param Rather than getting it in the callee, let the caller (e.g. storage_service) hold the erm and pass it down to potentially multiple async functions. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 17:31:01 +03:00
Benny Halevy	7ee6048255	database: add get_non_local_strategy_keyspaces For node operations, we currently call get_non_system_keyspaces but really want to work on all keyspaces that have a non-local replication strategy, as they are replicated on other nodes. Reflect that in the replica::database function name. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 17:31:01 +03:00
Kamil Braun	ff4ecfa182	dht: boot_strapper: check if keyspace still exists in `bootstrap` While we're iterating over the fetched keyspace names, some of these keyspaces may get dropped. Handle that by checking if the keyspace still exists. Also, when retrieving the replication strategy from the keyspace, store the pointer (which is an `lw_shared_ptr`) to the strategy to keep it alive, in case the keyspace that was holding it gets dropped. Closes #10861	2022-06-27 19:13:46 +02:00
Pavel Emelyanov	5e2fa32c8c	range_streamer: Get rack/datacenter from topology It's needed in source filter classes so range-streamer passes the topology reference into its methods. Nice side effect -- snitch header goes away from range-streamer one. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-06-22 11:47:26 +03:00
Asias He	1f8b529e08	range_streamer: Disable restream logic Consider: - n1 and n2 in the cluster - n3 bootstraps to join - n1 does not hear gossip update from n3 due to network issue - n1 removes n3 from gossip and pending node list - stream between n1 and n3 fails - n1 and n3 network issue is fixed - n3 retries the stream with n1 - n3 finishes the stream with n1 - n3 advertises normal to join the cluster The problem is that n1 will not treat n3 as the pending node, so writes will not route to n3 once n1 removes n3. Another problem is that when n1 gets the normal gossip status update from n3, the gossip listener will fail because n1 has removed n3, so n1 cannot find the host id for n3. This will cause n1 to abort. To fix, disable the retry logic in range_streamer so that once a stream with an existing node fails, the bootstrap fails. The downside is that we lose the ability to restream after a temporary network issue, but since we have repair-based node operations, we can use them to resume previously failed node operations. Fixes: #9805 Closes #9806	2022-05-24 11:24:25 +03:00
Avi Kivity	5937b1fa23	treewide: remove empty comments in top-of-files After `fcb8d040` ("treewide: use Software Package Data Exchange (SPDX) license identifiers"), many dual-licensed files were left with empty comments on top. Remove them to avoid visual noise. Closes #10562	2022-05-13 07:11:58 +02:00
Mikołaj Sielużycki	1d84a254c0	flat_mutation_reader: Split readers by file and remove unnecessary includes. The flat_mutation_reader files were conflated and contained multiple readers, which were not strictly necessary. Splitting improves iterative compilation times, as touching rarely used readers no longer recompiles large chunks of the codebase. Total compilation times are also improved, as the sizes of flat_mutation_reader.hh and flat_mutation_reader_v2.hh have been reduced and those files are included by many files in the codebase. With changes: real 29m14.051s user 168m39.071s sys 5m13.443s Without changes: real 30m36.203s user 175m43.354s sys 5m26.376s Closes #10194	2022-03-14 13:20:25 +02:00
Nadav Har'El	bc4d0fd5ad	murmur3: fix inconsistent token for empty partition key Traditionally in Scylla and in Cassandra, an empty partition key is mapped to minimum_token() instead of the empty key's usual hash function (0). The reasons for this are unknown (to me), but one possibility is that having one known key that maps to the minimal token is useful for various iterations. In murmur3_partitioner.cc we have two variants of the token calculation function - the first is get_token(bytes_view) and the second is get_token(schema, partition_key_view). The first includes that empty-key special case, but the second was missing this special case! As Kamil first noted in #9352, the second variant is used when looking up partitions in the index file - so if a partition with an empty-string key is saved under one token, it will be looked up under a different token and not found. I reproduced exactly this problem when fixing issues #9364 and #9375 (empty-string keys in materialized views and indexes) - where a partition with an empty key was visible in a full-table scan but couldn't be found by looking up its key because of the wrong index lookup. I also tried an alternative fix - changing both implementations to return minimum_token (and not 0) for the empty key. But this is undesirable - minimum_token is not supposed to be a valid token, so the tokenizer and sharder may not return a valid replica or shard for it, so we shouldn't store data under such a token. We also have code (such as an increasing-key sanity check in the flat mutation reader) which assumes that no real key in the data can be minimum_token, and our plan is to start allowing data with an empty key (at least for materialized views). This patch does not risk a backward-incompatible disk format change, for two reasons: 1. In the current Scylla, there was no valid case where an empty partition key could appear. CQL and Thrift forbid such keys, and materialized views and indexes also (incorrectly - see #9364, #9375) drop such rows. 2. Although Cassandra does allow empty partition keys, they are only allowed in materialized views and indexes - and we don't support reading materialized views generated by Cassandra (the user must re-generate them in Scylla). When #9364 and #9375 are fixed by the next patch, empty partition keys will start appearing in Scylla (in materialized views and in the materialized view backing a secondary index), and this fix will become important. Fixes #9352 Refs #9364 Refs #9375 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-03-08 14:15:03 +02:00
Avi Kivity	6572b297a2	treewide: clean up stray license blurbs After the mechanical change in `fcb8d040e8` ("treewide: use Software Package Data Exchange (SPDX) license identifiers"), a few stray license blurbs or fragments thereof remain. In two cases these were extra blurbs in code generators intended for the generated code, in others they were just missed by the script. Clean them up, adding an SPDX license identifier where needed. Closes #10072	2022-02-13 14:16:16 +02:00
Pavel Emelyanov	469ded71a9	bootstrapper: Get 'is-replacing' via argument too This also removes the only usage of this helper outside of the storage service. The place that needs it is the use_strict_sources_for_ranges() checker, and all of its callers know whether a replace is happening or not. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-02-07 12:41:02 +03:00
Pavel Emelyanov	9770f54789	bootstrapper: Get replace address via argument This removes the only usage of db.get_replace_address outside of storage service. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-02-07 12:39:51 +03:00
Pavel Emelyanov	1525c04db3	dht: Use db::config to generate initial tookens The replica::database is passed into the helper just to get the config from. Better to use config directly without messing with the database. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-01-27 16:41:29 +03:00
Pavel Emelyanov	77532a6a36	database, dht: Move get_initial_tokens() The helper in question has nothing to do with replica/database and is only used by dht to convert config option to a set of tokens. It sounds like the helper deserves living where it's needed. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-01-27 16:41:29 +03:00
Pavel Emelyanov	50170366ea	storage_service: Factor out random/config tokens generation There's a place in normal node start that parses the initial_token option or generates num_tokens random tokens. This code has been used almost unchanged since it was ported from its Java version. Later, dht::get_bootstrap_token() appeared with the same internal logic. This patch generalizes these two places. Logging messages are unified too (dtests seem not to check them). The change also improves a corner case: the normal node startup code doesn't check whether initial_token is empty and num_tokens is 0, and so generates an empty bootstrap_tokens set. It fails later with an obscure 'remove_endpoint should be used instead' message. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-01-27 16:41:29 +03:00
Avi Kivity	fcb8d040e8	treewide: use Software Package Data Exchange (SPDX) license identifiers Instead of lengthy blurbs, switch to single-line, machine-readable standardized (https://spdx.dev) license identifiers. The Linux kernel switched long ago, so there is strong precedent. Three cases are handled: AGPL-only, Apache-only, and dual licensed. For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0), reasoning that our changes are extensive enough to apply our license. The changes were applied mechanically with a script, except to licenses/README.md. Closes #9937	2022-01-18 12:15:18 +01:00
Avi Kivity	bbad8f4677	replica: move ::database, ::keyspace, and ::table to replica namespace Move replica-oriented classes to the replica namespace. The main classes moved are ::database, ::keyspace, and ::table, but a few ancillary classes are also moved. There are certainly classes that should be moved but aren't (like distributed_loader) but we have to start somewhere. References are adjusted treewide. In many cases, it is obvious that a call site should not access the replica (but the data_dictionary instead), but that is left for separate work. scylla-gdb.py is adjusted to look for both the new and old names.	2022-01-07 12:04:38 +02:00
Avi Kivity	ae3a360725	database: Move database, keyspace, table classes to replica/ directory The database, keyspace, and table classes represent the replica-only part of the objects after which they are named. Reading from a table doesn't give you the full data, just the replica's view, and it is not consistent since reconciliation is applied on the coordinator. As a first step in acknowledging this, move the related files to a replica/ subdirectory.	2022-01-06 17:07:30 +02:00
Nadav Har'El	6012f6f2b6	build performance: do not include <seastar/net/ip.hh> In a previous patch, we noticed that the header file <gm/inet_address.hh>, which is included, directly or indirectly, by most source files, includes <seastar/net/ip.hh> which is very slow to compile, and replaced it by the much faster-to-include <seastar/net/ipv[46]_address.hh>. However, we also included <seastar/net/ip.hh> in types.hh - and that too is included by almost every file, so the actual saving from the above patch was minimal. So in this patch we replace this include too. After this patch Scylla does not include <seastar/net/ip.hh> at all. According to ClangBuildAnalyzer, this reduces the average time to include types.hh (multiply this by 312 times!) from 4 seconds to 1.8 seconds, and reduces total build time (dev mode) by about 3%. Some of the source files were now missing some include directives, that were previously included in ip.hh - so we need to add those explicitly. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-01-05 17:29:21 +02:00
Pavel Emelyanov	831f18e392	dht: Pass gossiper to range_streamer::add_ranges A continuation of the previous patch. The range_streamer needs gossiper too, and is called from boot_strapper and storage_service. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-11-25 10:54:16 +03:00
Pavel Emelyanov	6a2f6068cb	dht: Pass gossiper argument to bootstrap The boot_strapper::bootstrap needs gossiper and is called only from the storage_service code that has it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-11-25 10:53:56 +03:00
Pavel Emelyanov	3087422d4d	stream_plan: Keep stream_manager onboard The plan itself doesn't need it, but it creates some lower level objects that do. Next patches will use this reference. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-11-24 12:17:37 +03:00
Pavel Emelyanov	c593f8624d	dht: Keep stream_manager on board This is the preparation for the future patching. The stream_plan creation will need the manager reference, so keep one on dht object in advance. These are only created from the storage service bootstrap code. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-11-24 12:17:37 +03:00
Pavel Emelyanov	5877b84a1a	range_streamer: Remove stream_plan from The streamer creates stream_plan "on demand" and doesn't use the on-board one. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20211112180335.27831-1-xemul@scylladb.com>	2021-11-12 19:38:45 +01:00
Benny Halevy	17296cba4b	effective_replication_map: add get_range_addresses Equivalent to abstract_replication_strategy get_range_addresses, yet synchronous, as it uses the precalculated map. Call it from storage_service::get_new_source_ranges and range_streamer::get_all_ranges_with_sources_for. Consequently, get_new_source_ranges and removenode_add_ranges can become synchronous too. Unfortunately we can't entirely get rid of abstract_replication_strategy::get_range_addresses as it's still needed by range_streamer::get_all_ranges_with_strict_sources_for. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-10-13 16:10:06 +03:00
Benny Halevy	4d2561ff75	abstract_replication_strategy: precacluate get_replication_factor for effective_replication_map Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-10-13 16:10:06 +03:00
Benny Halevy	cbe58345b9	abstract_replication_strategy: futurize get_*address_ranges Remaining callers of get_address_ranges and get_pending_address_ranges are all either from a seastar thread or from a coroutine so we can make the methods always async and drop the can_yield param. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-10-13 16:10:06 +03:00
Benny Halevy	91581ba23a	abstract_replication_strategy: futurize get_range_addresses All remaining use sites are called in a seastar thread so we drop the can_yield param and make get_range_addresses always async. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-10-13 16:10:06 +03:00
Benny Halevy	d96a67eb57	abstract_replication_strategy: use shared_ptr in registry Enable creating shared_ptr<BaseClass> in nonstatic_class_registry using BaseClass::ptr_type and use that for abstract_replication_strategy. While at it, also clean up compressor in that respect, defining compressor::ptr_type as shared_ptr<compressor> and thus simplifying compressor_registry. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-10-13 12:39:36 +03:00
Benny Halevy	7498ac4869	dht: boot_strapper: bootstrap: fixup indentation Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210923144206.1690576-2-bhalevy@scylladb.com>	2021-09-26 11:09:01 +03:00
Benny Halevy	798aee6747	dht: boot_strapper: coroutinize bootstrap Prepare for futurizing get_pending_address_ranges. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210923144206.1690576-1-bhalevy@scylladb.com>	2021-09-26 11:09:01 +03:00
Avi Kivity	daf028210b	build: enable -Winconsistent-missing-override warning This warning can catch a virtual function that thinks it overrides another, but doesn't, because the two functions have different signatures. This isn't very likely since most of our virtual functions override pure virtuals, but it's still worth having. Enable the warning and fix numerous violations. Closes #9347	2021-09-15 12:55:54 +03:00
Avi Kivity	0909e3c17d	treewide: remove redundant "x <=> 0" compares If x is of type std::strong_ordering, then "x <=> 0" is equivalent to x. These no-ops were inserted during #1449 fixes, but are now unnecessary. They have potential for harm, since they can hide an accidental change of the type of x to an arithmetic type, so remove them. Ref #1449.	2021-07-28 13:30:32 +03:00
Avi Kivity	8a80e455fb	sstables: keys: convert trichotomic comparisons to std::strong_ordering Prevent accidental conversions to bool from yielding the wrong results. Unprepared users (that converted to bool, or assigned to int) are adjusted. Ref #1449 Test: unit (dev) Closes #9088	2021-07-26 19:09:19 +03:00
Juliusz Stasiewicz	a8b741efe2	endpoint_details: store `_host` as `gms::inet_address` In an upcoming commit I will add "system.describe_ring" table which uses endpoint's inet address as a part of CK and, therefore, needs to keep them sorted with `inet_addr_type::less`.	2021-07-20 14:00:54 +02:00
Tomasz Grabiec	06e373e272	sstables: index_reader: Keep index objects under LSA In preparation for caching index objects, manage them under LSA. Implementation notes: key_view was changed to be a view on managed_bytes_view instead of bytes, so it now can be fragmented. Old users of key_view now have to linearize it. Actual linearization should be rare since partition keys are typically small. The index parser no longer constructs the index_entry directly, but produces value objects which live in the standard allocator space: class parsed_promoted_index_entry; class parsed_partition_index_entry; This change was needed to support consumers which don't populate the partition index cache and don't use LSA, e.g. sstable::generate_summary(). It's now the consumer's responsibility to allocate index_entry out of parsed_partition_index_entry.	2021-07-02 19:02:14 +02:00
Avi Kivity	0048c404d2	Merge 'dht: token: make some cosmetic changes' from Michał Chojnowski This is a set of a few cosmetic changes in dht/token. Mostly some comments and a simplification of `midpoint()`. Closes #8803 * github.com:scylladb/scylla: dht: token: add a comment excusing the `const bytes&` constructor dht: token: simplify midpoint() dht: token: add a comment to normalize() dht: token: use {read,write}_unaligned instead of std::copy_n dht: token-sharding: fix a typo in a comment	2021-06-07 15:41:15 +03:00
Avi Kivity	a55b434a2b	treewide: extent copyright statements to present day	2021-06-06 19:18:49 +03:00
Pavel Solodovnikov	e0749d6264	treewide: some random header cleanups Eliminate not used includes and replace some more includes with forward declarations where appropriate. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-06-06 19:18:49 +03:00

419 Commits
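Commit 57ff3f240f makes subtract_ranges linear by relying on both inputs being sorted and deoverlapped. Below is a minimal sketch of that two-cursor idea. It assumes a hypothetical `interval` type with half-open integer bounds and no wrap-around. This is a simplification of dht token ranges, not Scylla's implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a token range: half-open [start, end), no wrap-around.
struct interval {
    int64_t start;
    int64_t end;
    bool operator==(const interval& o) const { return start == o.start && end == o.end; }
};

// Returns `ranges` minus `to_subtract`. Both inputs must be sorted by start and
// non-overlapping. The cursor into `to_subtract` only moves forward (a subtrahend
// that spills past the current range is examined once more for the next range),
// so the cost is O(n + m) rather than the O(n * m) of comparing every pair.
std::vector<interval> subtract_ranges(const std::vector<interval>& ranges,
                                      const std::vector<interval>& to_subtract) {
    std::vector<interval> result;
    std::size_t j = 0;
    for (const auto& r : ranges) {
        int64_t cur = r.start;  // start of the part of r not yet emitted or removed
        // Drop subtrahends that end before r begins; later ranges start even later.
        while (j < to_subtract.size() && to_subtract[j].end <= cur) {
            ++j;
        }
        while (cur < r.end && j < to_subtract.size() && to_subtract[j].start < r.end) {
            const interval& s = to_subtract[j];
            if (s.start > cur) {
                result.push_back({cur, s.start});  // the gap before this subtrahend survives
            }
            cur = std::max(cur, s.end);
            if (s.end > r.end) {
                break;  // s also covers the start of later ranges, so keep it
            }
            ++j;  // s ends inside r and cannot affect later ranges
        }
        if (cur < r.end) {
            result.push_back({cur, r.end});  // tail of r after the last subtrahend
        }
    }
    return result;
}
```

For example, subtracting {[2,4), [8,22), [25,26)} from {[0,10), [20,30)} leaves [0,2), [4,8), [22,25) and [26,30).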
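Commit fd3e66b0cc moves incremental_owned_ranges_checker into dht so that both cleanup compaction and staging-sstable filtering can use it. Here is a sketch of what "incremental" plausibly means, under two assumptions: partitions arrive in token order, and the owned ranges are sorted. A single forward-moving cursor then answers each query in amortized constant time. The class name, its method name and the half-open integer ranges are all hypothetical and do not reflect the real class's interface.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical sketch: owned ranges are half-open [first, second), sorted and
// non-overlapping; tokens must be queried in non-decreasing order.
class owned_ranges_checker {
    std::vector<std::pair<int64_t, int64_t>> _ranges;
    std::size_t _idx = 0;  // first range that may still contain a future token
public:
    explicit owned_ranges_checker(std::vector<std::pair<int64_t, int64_t>> ranges)
        : _ranges(std::move(ranges)) {}

    // The cursor never moves backwards, so checking k tokens against n ranges
    // costs O(k + n) in total.
    bool belongs_to_current_node(int64_t token) {
        while (_idx < _ranges.size() && _ranges[_idx].second <= token) {
            ++_idx;  // this range ends before `token`, hence before every later token
        }
        return _idx < _ranges.size() && _ranges[_idx].first <= token;
    }
};
```

Compared with a binary search per token, this design avoids a logarithmic factor when scanning a whole sstable. The cost is that it is only correct for monotonic input.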
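Commit 169a8a66f2 makes compatible_ring_position_or_view cheap to copy. It stores the owned ring position behind a shared pointer and keeps a separate view that points either into that allocation or at an external key used for lookups. The sketch below reproduces that layout with a hypothetical `heavy_key` standing in for dht::ring_position. The schema is omitted and keys are compared by string value. The class also meets the three interval-map key requirements the message lists: default constructible, copyable, and having a standalone comparison.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for dht::ring_position: copying it allocates.
struct heavy_key {
    std::string bytes;
};

// Sketch of the "owned key or view" layout. A copy costs one reference-count
// increment plus a pointer copy, never a heavy_key copy.
class key_or_view {
    std::shared_ptr<const heavy_key> _owned;  // non-null only when we own the key
    const heavy_key* _view;                   // always points at the key in use

    static const heavy_key& empty_key() {
        static const heavy_key k{};
        return k;
    }
public:
    // Default constructible, as interval-map keys must be; views an empty key.
    key_or_view() : _view(&empty_key()) {}

    // Owning: allocate once; every copy shares the same allocation.
    explicit key_or_view(heavy_key k)
        : _owned(std::make_shared<heavy_key>(std::move(k))), _view(_owned.get()) {}

    // Non-owning, for lookups: the caller must keep `k` alive.
    static key_or_view view_of(const heavy_key& k) {
        key_or_view r;
        r._view = &k;
        return r;
    }

    const heavy_key& key() const { return *_view; }
    bool owns() const { return _owned != nullptr; }
    long use_count() const { return _owned.use_count(); }

    // Standalone comparison, as required for interval-map keys.
    friend bool operator<(const key_or_view& a, const key_or_view& b) {
        return a.key().bytes < b.key().bytes;
    }
};
```

The implicitly generated copy constructor is enough. The copied raw pointer still refers to the same shared allocation, or to the same external key, so it stays valid.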
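The log line quoted in commit 9ed401c4b2 ("Finished 2698 out of 2817 ranges ... finished percentage=0.95775646") is consistent with the value being a fraction in [0, 1]. The value is finished ranges divided by total ranges: 2698 / 2817 ≈ 0.957756. The helper below is a hypothetical illustration of that arithmetic, not Scylla's code. Reporting 1.0 for an operation with no ranges is this sketch's own assumption.

```cpp
#include <cstddef>

// Hypothetical helper: fraction of a node operation's ranges streamed so far,
// as a value in [0, 1] like the metric samples quoted in the commit message.
double finished_fraction(std::size_t finished_ranges, std::size_t total_ranges) {
    if (total_ranges == 0) {
        return 1.0;  // assumption: an operation with nothing to stream counts as done
    }
    return static_cast<double>(finished_ranges) / static_cast<double>(total_ranges);
}
```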
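Commit bc4d0fd5ad describes two token functions that disagreed on the empty partition key. One returned minimum_token and the other returned the regular hash, so index lookups missed such partitions. The message rejects "return minimum_token from both" as an alternative fix. From that, the adopted direction appears to be having both variants use the regular hash. The sketch below shows the structural remedy: one variant delegates to the other, so the two cannot diverge. The hash here is a placeholder (64-bit FNV-1a, not murmur3), and `partition_key` is a hypothetical single-component key type.

```cpp
#include <cstdint>
#include <limits>
#include <string>
#include <string_view>

// Modeled minimum token; real data should never be stored under it.
constexpr int64_t minimum_token = std::numeric_limits<int64_t>::min();

// Placeholder hash: 64-bit FNV-1a, NOT murmur3. Used only so the sketch runs.
int64_t placeholder_hash(std::string_view bytes) {
    uint64_t h = 14695981039346656037ull;
    for (unsigned char c : bytes) {
        h ^= c;
        h *= 1099511628211ull;
    }
    return static_cast<int64_t>(h);
}

// Hypothetical single-component partition key.
struct partition_key {
    std::string component;
};

// Variant 1: token from the raw key bytes, with no empty-key special case.
int64_t get_token(std::string_view key_bytes) {
    return placeholder_hash(key_bytes);
}

// Variant 2: token from a structured key. Delegating to variant 1 guarantees
// both paths agree for every key, including the empty one.
int64_t get_token(const partition_key& key) {
    return get_token(std::string_view(key.component));
}
```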
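Commit 50170366ea unifies the two places that either parse initial_token or generate num_tokens random tokens. It notes that an empty initial_token combined with num_tokens == 0 used to yield an empty token set that failed only much later. Here is a hypothetical sketch of that logic which rejects the empty case up front. The function name, the comma-separated format handling, the choice of std::invalid_argument and the exclusion of INT64_MIN (treated here as a reserved minimum token) are all assumptions of this sketch.

```cpp
#include <cstdint>
#include <limits>
#include <random>
#include <set>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: tokens come from a comma-separated initial_token string if
// one is given, otherwise num_tokens random tokens are generated.
std::set<int64_t> generate_tokens(const std::string& initial_token, unsigned num_tokens,
                                  std::mt19937_64& rng) {
    std::set<int64_t> tokens;
    if (!initial_token.empty()) {
        std::istringstream in(initial_token);
        std::string item;
        while (std::getline(in, item, ',')) {
            tokens.insert(std::stoll(item));  // std::stoll skips leading whitespace
        }
        return tokens;
    }
    if (num_tokens == 0) {
        // Fail early with a clear message instead of returning an empty set.
        throw std::invalid_argument("initial_token is empty and num_tokens is 0");
    }
    // Assumption: skip INT64_MIN, which this sketch treats as a reserved minimum token.
    std::uniform_int_distribution<int64_t> dist(std::numeric_limits<int64_t>::min() + 1,
                                                std::numeric_limits<int64_t>::max());
    while (tokens.size() < num_tokens) {
        tokens.insert(dist(rng));  // std::set drops duplicates, so loop until enough
    }
    return tokens;
}
```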