scylladb

Author	SHA1	Message	Date
Kefu Chai	57b14220ce	tree: remove unused "#include"s these unused includes were identified by clang-include-cleaner. after auditing these source files, all of the reports have been confirmed. in which, instead of using `seastarx.hh`, `readers/mutation_reader.hh`, use `using seastar::future` to include `future` in the global namespace, this makes `readers/mutation_reader.hh` a header exposing `future<>`, but this is not a good practice, because, unlike `seastarx.hh` or `seastar/core/future.hh`, `reader/mutation_reader.hh` is not responsible for exposing seastar declarations. so, we trade the using statement for `#include "seastarx.hh"` in that file to decouple the source files including it from this header because of this statement. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#22439	2025-01-28 14:12:06 +03:00
Gleb Natapov	315db647dd	consistency_level: drop templates since the same types of ranges are used by all the callers	2025-01-16 16:37:06 +02:00
Avi Kivity	f3eade2f62	treewide: relicense to ScyllaDB-Source-Available-1.0 Drop the AGPL license in favor of a source-available license. See the blog post [1] for details. [1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/	2024-12-18 17:45:13 +02:00
Gleb Natapov	6116751e44	db: consistency_level: change is_sufficient_live_nodes to work on host ids It is called from storage proxy which works on host ids now.	2024-12-02 10:31:12 +02:00
Gleb Natapov	ccbfabb858	db: consistency_level: move filter_for_query to host id It is called from storage proxy which works on host ids now.	2024-12-02 10:31:12 +02:00
Gleb Natapov	12937aeb7f	storage_proxy: move to addressing nodes by host ids instead of ips In this rather large path we mode to address nodes in storage proxy by host ids instead of ips. Some subsystems storage proxy calls to are not yet converted to host ids, so we translate back and forth when we interact with them.	2024-12-02 10:31:11 +02:00
Kefu Chai	6ead5a4696	treewide: move log.hh into utils/log.hh the log.hh under the root of the tree was created keep the backward compatibility when seastar was extracted into a separate library. so log.hh should belong to `utils` directory, as it is based solely on seastar, and can be used all subsystems. in this change, we move log.hh into utils/log.hh to that it is more modularized. and this also improves the readability, when one see `#include "utils/log.hh"`, it is obvious that this source file needs the logging system, instead of its own log facility -- please note, we do have two other `log.hh` in the tree. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-10-22 06:54:46 +03:00
Kefu Chai	ee36358a60	db: remove unused includes these unused includes are identified by clang-include-cleaner. after auditing the source files, all of the reports have been confirmed. please note, since we have `using seastar::shared_ptr` in `seastarx.h`, this renders `#include <seastar/core/shared_ptr.hh>` unnecessary if we don't need the full definition of `seastar::shared_ptr`. so, in this change, all the unused includes are removed. but there are some headers which are actually used, while still being identified by this tool. these includes are marked with "IWYU pragma: keep". Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-10-04 20:48:18 +08:00
Kamil Braun	0e36377f56	db: consistency_level: remove overload of `filter_for_query` Not used anymore after the previous commit.	2023-06-14 11:41:36 +02:00
Pavel Emelyanov	64c9359443	storage_proxy: Don't use default-initialized endpoint in get_read_executor() After calling filter_for_query() the extra_replica to speculate to may be left default-initialized which is :0 ipv6 address. Later below this address is used as-is to check if it belongs to the same DC or not which is not nice, as :0 is not an address of any existing endpoint. Recent move of dc/rack data onto topology made this place reveal itself by emitting the internal error due to :0 not being present on the topology's collection of endpoints. Prior to this move the dc filter would count :0 as belonging to "default_dc" datacenter which may or may not match with the dc of the local node. The fix is to explicitly tell set extra_replica from unset one. fixes: #11825 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #11833	2022-10-25 09:16:50 +03:00
Avi Kivity	46bd0b1e62	consistency_level: accept effective_replication_map as parameter, rather than keyspace A keyspace is a mutable object that can change from time to time. An effective_replication_map captures the state of a keyspace at a point in time and can therefore be consistent (with care from the caller). Change consistency_level's functions to accept an effective_replication_map. This allows the caller to ensure that separate calls use the same information and are consistent with each other. Current callers are likely correct since they are called from one continuation, but it's better to be sure.	2022-08-11 17:58:42 +03:00
Botond Dénes	fbbe2529c1	Merge "Remove global snitch usage from consistency_level.cc" from Pavel Emelyanov " There are several helpers in this .cc file that need to get datacenter for endpoints. For it they use global snitch, because there's no other place out there to get that data from. The whole dc/rack info is now moving to topology, so this set patches the consistency_level.cc to get the topology. This is done two ways. First, the helpers that have keyspace at hand may get the topology via ks's effective_replication_map. Two difficult cases are db::is_local() and db.count_local_endpoints() because both have just inet_address at hand. Those are patched to be methods of topology itself and all their callers already mess with token metadata and can get topology from it. " * 'br-consistency-level-over-topology' of https://github.com/xemul/scylla: consistency_level: Remove is_local() and count_local_endpoints() storage_proxy: Use topology::local_endpoints_count() storage_proxy: Use proxy's topology for DC checks storage_proxy: Keep shared_ptr<proxy> on digest_read_resolver storage_proxy: Use topology local_dc_filter in its methods storage_proxy: Mark some digest_read_resolver methods private forwarding_service: Use topology local_dc_filter storage_service: Use topology local_dc_filter consistency_level: Use topology local_dc_filter consitency-level: Call count_local_endpoints from topology consistency_level: Get datacenter from topology replication_strategy: Remove hold snitch reference effective_replication_map: Get datacenter from topology topology: Add local-dc detection shugar	2022-08-05 13:31:55 +03:00
Pavel Emelyanov	c3718b7a6e	consistency_level: Remove is_local() and count_local_endpoints() No code uses them now -- switched to use topology -- so thse two can be dropped together with their calls for global snitch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-08-05 12:19:48 +03:00
Kamil Braun	a9fd156a1b	db: consistency_level: `filter_for_query`: take `const gossiper&`	2022-08-04 12:19:38 +02:00
Avi Kivity	5937b1fa23	treewide: remove empty comments in top-of-files After `fcb8d040` ("treewide: use Software Package Data Exchange (SPDX) license identifiers"), many dual-licensed files were left with empty comments on top. Remove them to avoid visual noise. Closes #10562	2022-05-13 07:11:58 +02:00
Pavel Emelyanov	11c99fc41b	table: Don't use global gossiper The table::get_hit_rate needs gossiper to get hitrates state from. There's no way to carry gossiper reference on the table itself, so it's up to the callers of that method to provide it. Fortunately, there's only one caller -- the proxy -- but the call chain to carry the reference it not very short ... oh, well. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-05-03 10:33:08 +03:00
Avi Kivity	fcb8d040e8	treewide: use Software Package Data Exchange (SPDX) license identifiers Instead of lengthy blurbs, switch to single-line, machine-readable standardized (https://spdx.dev) license identifiers. The Linux kernel switched long ago, so there is strong precedent. Three cases are handled: AGPL-only, Apache-only, and dual licensed. For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0), reasoning that our changes are extensive enough to apply our license. The changes we applied mechanically with a script, except to licenses/README.md. Closes #9937	2022-01-18 12:15:18 +01:00
Avi Kivity	bbad8f4677	replica: move ::database, ::keyspace, and ::table to replica namespace Move replica-oriented classes to the replica namespace. The main classes moved are ::database, ::keyspace, and ::table, but a few ancillary classes are also moved. There are certainly classes that should be moved but aren't (like distributed_loader) but we have to start somewhere. References are adjusted treewide. In many cases, it is obvious that a call site should not access the replica (but the data_dictionary instead), but that is left for separate work. scylla-gdb.py is adjusted to look for both the new and old names.	2022-01-07 12:04:38 +02:00
Avi Kivity	ae3a360725	database: Move database, keyspace, table classes to replica/ directory The database, keyspace, and table classes represent the replica-only part of the objects after which they are named. Reading from a table doesn't give you the full data, just the replica's view, and it is not consistent since reconciliation is applied on the coordinator. As a first step in acknowledging this, move the related files to a replica/ subdirectory.	2022-01-06 17:07:30 +02:00
Avi Kivity	4d70f3baee	storage_proxy: change unordered_set<inet_address> to small_vector in write path The write paths in storage_proxy pass replica sets as std::unordered_set<gms::inet_address>. This is a complex type, with N+1 allocations for N members, so we change it to a small_vector (via inet_address_vector_replica_set) which requires just one allocation, and even zero when up to three replicas are used. This change is more nuanced than the corresponding change to the read path `abe3d7d7` ("Merge 'storage_proxy: use small_vector for vectors of inet_address' from Avi Kivity"), for two reasons: - there is a quadratic algorithm in abstract_write_response_handler::response(): it searches for a replica and erases it. Since this happens for every replica, it happens N^2/2 times. - replica sets for writes always include all datacenters, while reads usually involve just one datacenter. So, a write to a keyspace that has 5 datacenters will invoke 15*(15-1)/2 =105 compares. We could remove this by sending the index of the replica in the replica set to the replica and ask it to include the index in the response, but I think that this is unnecessary. Those 105 compares need to be only 105/15 = 7 times cheaper than the corresponding unordered_set operation, which they surely will. Handling a response after a cross-datacenter round trip surely involves L3 cache misses, and a small_vector reduces these to a minimum compared to an unordered_set with its bucket table, linked list walking and managent, and table rehashing. Tests using perf_simple_query --write --smp 1 --operations-per-shard 1000000 --task-quota-ms show two allocations removed (as expected) and a nice reduction in instructions executed. before: median 204842.54 tps ( 54.2 allocs/op, 13.2 tasks/op, 49890 insns/op) after: median 206077.65 tps ( 52.2 allocs/op, 13.2 tasks/op, 49138 insns/op) Closes #8847	2021-06-17 13:46:40 +03:00
Pavel Solodovnikov	76bea23174	treewide: reduce header interdependencies Use forward declarations wherever possible. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Closes #8813	2021-06-07 15:58:35 +03:00
Avi Kivity	a55b434a2b	treewide: extent copyright statements to present day	2021-06-06 19:18:49 +03:00
Pavel Solodovnikov	e0749d6264	treewide: some random header cleanups Eliminate not used includes and replace some more includes with forward declarations where appropriate. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-06-06 19:18:49 +03:00
Avi Kivity	c71d007797	consistency_level: deinline assure_sufficient_live_nodes() assure_sufficient_live_nodes() is a huge template calling other huge templates, and requires "network_topology_strategy.hh". It is inlined in consistency_level.hh. This increases compile time and recompiles. Move the template out-of-line and use "extern template" to instantiate it. This is not ideal as new callers would require updates to the instantiated signatures, but I think our goal should be to de-template it completely instead. Meanwhile, this reduces some pain. Ref #1. Closes #8637	2021-05-19 15:03:51 +03:00
Avi Kivity	cea5493cb7	storage_proxy, treewide: introduce names for vectors of inet_address storage_proxy works with vectors of inet_addresses for replica sets and for topology changes (pending endpoints, dead nodes). This patch introduces new names for these (without changing the underlying type - it's still std::vector<gms::inet_address>). This is so that the following patch, that changes those types to utils::small_vector, will be less noisy and highlight the real changes that take place.	2021-05-05 18:36:48 +03:00
Gleb Natapov	0b84b04f97	consistency_level: make it more const correct Message-Id: <20190214122631.GF19055@scylladb.com>	2019-02-14 14:52:51 +02:00
Avi Kivity	2c08bff8d5	Split consistency_level.hh header It has two unrelated users: cql for validation, and storage_proxy for complicated calculations. Split the simple stuff into a new header to reduce dependencies.	2018-11-27 13:32:10 +02:00
Botond Dénes	6e59cee244	db::consistency_level::filter_for_query() add preferred_endpoints To the second overload (the one without read-repair related params) too.	2018-09-03 10:31:44 +03:00
Botond Dénes	aaf67bcbaa	Consider preferred replicas when choosing endpoints for query_singular() Propagate the preferred_replicas to db::filter_for_query() and consider them when selecting the endpoints. The algoritm for selecting the endpoints is as follows: * Compute the intersection of the endpoint candidates and the preferred endpoints. * If this yields a set of endpoints that already satisfies the CL requirements use this set. * Otherwise select the remaining endpoints according to the load-balancing strategy, just like before.	2018-03-13 10:34:34 +02:00
Gleb Natapov	357c77a333	consistency_level: constify quorum_for() and local_quorum_for()	2017-12-05 13:01:20 +02:00
Gleb Natapov	739dd878e3	consistency_level: report less live endpoints in Unavailable exception if there are pending nodes DowngradingConsistencyRetryPolicy uses live replicas count from Unavailable exception to adjust CL for retry, but when there are pending nodes CL is increased internally by a coordinator and that may prevent retried query from succeeding. Adjust live replica count in case of pending node presence so that retried query will be able to proceed. Fixes #2535 Message-Id: <20170710085238.GY2324@scylladb.com>	2017-07-11 16:51:56 +03:00
Gleb Natapov	87094849fa	storage_proxy: load balance read requests according to cache hit rates This patch makes storage proxy to choose replicas to read from base on their cache hit rates. Replicas with higher cache hit rates will see more requests while replicas with lower hit rates will see less. Local node has a special bonus and will get more requests even if another node has slightly higher cache hit rate (same goes for local vs remote DC), but after the patch it is no longer guarantied that a coordinator node will be chosen as a replica for the read (if the feature is enabled).	2017-06-13 09:57:14 +03:00
Gleb Natapov	bc8aa1b4ee	choose extra replica for speculation in filter_for_query() Currently storage proxy has to loop over remaining replicas to search for suitable extra replica, but doing it in filter_for_query() is extremely easy, so do it there instead.	2017-06-13 09:57:14 +03:00
Gleb Natapov	8437ea3b99	consistency_level: drop filter_for_query_dc_local function Merge filter_for_query_dc_local() functionality into filter_for_query(). This is more efficient since filter_for_query_dc_local() partitions endpoints into 'local' and 'remote' set but filter_for_query() already does it for CL=LOCAL so for such queries we needlessly do it twice.	2017-06-13 09:57:14 +03:00
Tomasz Grabiec	ddfee57c97	Replace iostream include with iosfwd in headers Message-Id: <1484656119-8386-4-git-send-email-tgrabiec@scylladb.com>	2017-01-17 14:52:44 +02:00
Gleb Natapov	dbb1217896	cl: enable logging for insufficient LOCAL_QUORUM consistency Message-Id: <1460549369-29523-2-git-send-email-gleb@scylladb.com>	2016-04-14 14:56:58 +03:00
Pekka Enberg	38a54df863	Fix pre-ScyllaDB copyright statements People keep tripping over the old copyrights and copy-pasting them to new files. Search and replace "Cloudius Systems" with "ScyllaDB". Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>	2016-04-08 08:12:47 +03:00
Gleb Natapov	f59415b3c6	Take pending endpoints into account while checking for sufficient live nodes During bootstrapping additional copies of data has to be made to ensure that CL level is met (see CASSANDRA-833 for details). Our code does that, but it does not take into account that bootstraping node can be dead which may cause request to proceed even though there is no enough live nodes for it to be completed. In such a case request neither completes nor timeouts, so it appear to be stuck from CQL layer POV. The patch fixes this by taking into account pending nodes while checking that there are enough sufficient live nodes for operation to proceed. Fixes #965 Message-Id: <20160303165250.GG2253@scylladb.com>	2016-03-07 13:30:13 +01:00
Avi Kivity	d5cf0fb2b1	Add license notices	2015-09-20 10:43:39 +03:00
Gleb Natapov	17e54d0604	add logger for consistency level calculation	2015-09-13 11:59:17 +03:00
Pekka Enberg	0b8c67ed79	exceptions: Move unavailable_exception to exceptions.hh Move unavailable_exception to exceptions.hh where other CQL transport level exceptions are defined in. Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>	2015-07-28 10:06:18 +03:00
Pekka Enberg	055e25ed43	db/consistency_level: Move enum to separate header Move 'consistency_level' enumeration to a separate header file to fix dependency issues that arise when we move 'unavailable_exception' to exceptions.hh. Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>	2015-07-28 10:06:18 +03:00
Pekka Enberg	7fc1311d4a	db/consistency_level: Move implementation to .cc file Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>	2015-07-28 10:06:18 +03:00
Pekka Enberg	32378708d0	db/consistency_level: Remove ifdef'd code Cleanup consistency_level.hh by removing untranslated code that's been sitting in the tree for a while. Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>	2015-07-28 09:36:36 +03:00
Pekka Enberg	826f21643f	transport/server: Fix UNAVAILABLE error encoding This fixes UNAVAILABLE error encoding to follow the CQL binary protocol spec. Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>	2015-07-28 09:27:52 +03:00
Gleb Natapov	f122ee39b9	storage_proxy: return proper error codes to transport layer Transport layer expects to get error code in an exception of type exceptions::cassandra_exception. Fix code to use it as a base for all user visible exceptions and put correct error code there.	2015-07-23 12:32:21 +03:00
Vlad Zolotarov	45ce351f60	db: consistency_level.hh: added is_sufficient_live_nodes() Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2015-07-05 17:34:56 +03:00
Vlad Zolotarov	501737cb84	db: consistency_level.hh: Complete the implementation of filter_for_query() Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> New in v2: - Use std::partition_copy() and boost::range::algorithm::partition(). - Don't use std::move() when returning a local vector variable.	2015-07-05 17:34:50 +03:00
Vlad Zolotarov	a9a3bd1927	db: consistency_level.hh: Styling in filter_for_query() - Make live_endpoints.erase() call more readable. - Adjust the comments to our naming. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2015-07-05 17:15:23 +03:00
Vlad Zolotarov	77c50dc013	db: consistency_level.hh: complete assure_sufficient_live_nodes() Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> New in v2: - Use static_cast instead of a dynamic_cast.	2015-07-02 16:00:17 +03:00

1 2

61 Commits