scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-22 01:20:39 +00:00

Author	SHA1	Message	Date
Pavel Emelyanov	11c99fc41b	table: Don't use global gossiper The table::get_hit_rate needs gossiper to get the hit-rate state from. There's no way to carry a gossiper reference on the table itself, so it's up to the callers of that method to provide it. Fortunately, there's only one caller -- the proxy -- but the call chain to carry the reference is not very short ... oh, well. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-05-03 10:33:08 +03:00
Avi Kivity	fcb8d040e8	treewide: use Software Package Data Exchange (SPDX) license identifiers Instead of lengthy blurbs, switch to single-line, machine-readable standardized (https://spdx.dev) license identifiers. The Linux kernel switched long ago, so there is strong precedent. Three cases are handled: AGPL-only, Apache-only, and dual licensed. For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0), reasoning that our changes are extensive enough to apply our license. The changes were applied mechanically with a script, except to licenses/README.md. Closes #9937	2022-01-18 12:15:18 +01:00
Avi Kivity	bbad8f4677	replica: move ::database, ::keyspace, and ::table to replica namespace Move replica-oriented classes to the replica namespace. The main classes moved are ::database, ::keyspace, and ::table, but a few ancillary classes are also moved. There are certainly classes that should be moved but aren't (like distributed_loader) but we have to start somewhere. References are adjusted treewide. In many cases, it is obvious that a call site should not access the replica (but the data_dictionary instead), but that is left for separate work. scylla-gdb.py is adjusted to look for both the new and old names.	2022-01-07 12:04:38 +02:00
Avi Kivity	ae3a360725	database: Move database, keyspace, table classes to replica/ directory The database, keyspace, and table classes represent the replica-only part of the objects after which they are named. Reading from a table doesn't give you the full data, just the replica's view, and it is not consistent since reconciliation is applied on the coordinator. As a first step in acknowledging this, move the related files to a replica/ subdirectory.	2022-01-06 17:07:30 +02:00
Avi Kivity	4d70f3baee	storage_proxy: change unordered_set<inet_address> to small_vector in write path The write paths in storage_proxy pass replica sets as std::unordered_set<gms::inet_address>. This is a complex type, with N+1 allocations for N members, so we change it to a small_vector (via inet_address_vector_replica_set) which requires just one allocation, and even zero when up to three replicas are used. This change is more nuanced than the corresponding change to the read path `abe3d7d7` ("Merge 'storage_proxy: use small_vector for vectors of inet_address' from Avi Kivity"), for two reasons: - there is a quadratic algorithm in abstract_write_response_handler::response(): it searches for a replica and erases it. Since this happens for every replica, it happens N^2/2 times. - replica sets for writes always include all datacenters, while reads usually involve just one datacenter. So, a write to a keyspace that has 5 datacenters (15 replicas at 3 per datacenter) will invoke 15*(15-1)/2 = 105 compares. We could remove this by sending the index of the replica in the replica set to the replica and asking it to include the index in the response, but I think that this is unnecessary. Those 105 compares need to be only 105/15 = 7 times cheaper than the corresponding unordered_set operation, which they surely will be. Handling a response after a cross-datacenter round trip surely involves L3 cache misses, and a small_vector reduces these to a minimum compared to an unordered_set with its bucket table, linked list walking and management, and table rehashing. Tests using perf_simple_query --write --smp 1 --operations-per-shard 1000000 --task-quota-ms show two allocations removed (as expected) and a nice reduction in instructions executed. before: median 204842.54 tps ( 54.2 allocs/op, 13.2 tasks/op, 49890 insns/op) after: median 206077.65 tps ( 52.2 allocs/op, 13.2 tasks/op, 49138 insns/op) Closes #8847	2021-06-17 13:46:40 +03:00
Pavel Solodovnikov	76bea23174	treewide: reduce header interdependencies Use forward declarations wherever possible. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Closes #8813	2021-06-07 15:58:35 +03:00
Avi Kivity	a55b434a2b	treewide: extend copyright statements to present day	2021-06-06 19:18:49 +03:00
Pavel Solodovnikov	e0749d6264	treewide: some random header cleanups Eliminate unused includes and replace some more includes with forward declarations where appropriate. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-06-06 19:18:49 +03:00
Avi Kivity	c71d007797	consistency_level: deinline assure_sufficient_live_nodes() assure_sufficient_live_nodes() is a huge template calling other huge templates, and requires "network_topology_strategy.hh". It is inlined in consistency_level.hh. This increases compile time and recompiles. Move the template out-of-line and use "extern template" to instantiate it. This is not ideal as new callers would require updates to the instantiated signatures, but I think our goal should be to de-template it completely instead. Meanwhile, this reduces some pain. Ref #1. Closes #8637	2021-05-19 15:03:51 +03:00
Avi Kivity	cea5493cb7	storage_proxy, treewide: introduce names for vectors of inet_address storage_proxy works with vectors of inet_addresses for replica sets and for topology changes (pending endpoints, dead nodes). This patch introduces new names for these (without changing the underlying type - it's still std::vector<gms::inet_address>). This is so that the following patch, that changes those types to utils::small_vector, will be less noisy and highlight the real changes that take place.	2021-05-05 18:36:48 +03:00
Gleb Natapov	0b84b04f97	consistency_level: make it more const correct Message-Id: <20190214122631.GF19055@scylladb.com>	2019-02-14 14:52:51 +02:00
Avi Kivity	2c08bff8d5	Split consistency_level.hh header It has two unrelated users: cql for validation, and storage_proxy for complicated calculations. Split the simple stuff into a new header to reduce dependencies.	2018-11-27 13:32:10 +02:00
Botond Dénes	6e59cee244	db::consistency_level::filter_for_query() add preferred_endpoints To the second overload (the one without read-repair related params) too.	2018-09-03 10:31:44 +03:00
Botond Dénes	aaf67bcbaa	Consider preferred replicas when choosing endpoints for query_singular() Propagate the preferred_replicas to db::filter_for_query() and consider them when selecting the endpoints. The algorithm for selecting the endpoints is as follows: * Compute the intersection of the endpoint candidates and the preferred endpoints. * If this yields a set of endpoints that already satisfies the CL requirements use this set. * Otherwise select the remaining endpoints according to the load-balancing strategy, just like before.	2018-03-13 10:34:34 +02:00
Gleb Natapov	357c77a333	consistency_level: constify quorum_for() and local_quorum_for()	2017-12-05 13:01:20 +02:00
Gleb Natapov	739dd878e3	consistency_level: report fewer live endpoints in Unavailable exception if there are pending nodes DowngradingConsistencyRetryPolicy uses the live replica count from the Unavailable exception to adjust the CL for retry, but when there are pending nodes the CL is increased internally by the coordinator, and that may prevent the retried query from succeeding. Adjust the live replica count when pending nodes are present so that the retried query will be able to proceed. Fixes #2535 Message-Id: <20170710085238.GY2324@scylladb.com>	2017-07-11 16:51:56 +03:00
Gleb Natapov	87094849fa	storage_proxy: load balance read requests according to cache hit rates This patch makes storage proxy choose replicas to read from based on their cache hit rates. Replicas with higher cache hit rates will see more requests, while replicas with lower hit rates will see fewer. The local node has a special bonus and will get more requests even if another node has a slightly higher cache hit rate (the same goes for local vs. remote DC), but after the patch it is no longer guaranteed that the coordinator node will be chosen as a replica for the read (if the feature is enabled).	2017-06-13 09:57:14 +03:00
Gleb Natapov	bc8aa1b4ee	choose extra replica for speculation in filter_for_query() Currently storage proxy has to loop over remaining replicas to search for suitable extra replica, but doing it in filter_for_query() is extremely easy, so do it there instead.	2017-06-13 09:57:14 +03:00
Gleb Natapov	8437ea3b99	consistency_level: drop filter_for_query_dc_local function Merge filter_for_query_dc_local() functionality into filter_for_query(). This is more efficient since filter_for_query_dc_local() partitions endpoints into 'local' and 'remote' set but filter_for_query() already does it for CL=LOCAL so for such queries we needlessly do it twice.	2017-06-13 09:57:14 +03:00
Tomasz Grabiec	ddfee57c97	Replace iostream include with iosfwd in headers Message-Id: <1484656119-8386-4-git-send-email-tgrabiec@scylladb.com>	2017-01-17 14:52:44 +02:00
Gleb Natapov	dbb1217896	cl: enable logging for insufficient LOCAL_QUORUM consistency Message-Id: <1460549369-29523-2-git-send-email-gleb@scylladb.com>	2016-04-14 14:56:58 +03:00
Pekka Enberg	38a54df863	Fix pre-ScyllaDB copyright statements People keep tripping over the old copyrights and copy-pasting them to new files. Search and replace "Cloudius Systems" with "ScyllaDB". Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>	2016-04-08 08:12:47 +03:00
Gleb Natapov	f59415b3c6	Take pending endpoints into account while checking for sufficient live nodes During bootstrapping, additional copies of data have to be made to ensure that the CL is met (see CASSANDRA-833 for details). Our code does that, but it does not take into account that the bootstrapping node can be dead, which may cause a request to proceed even though there are not enough live nodes for it to complete. In such a case the request neither completes nor times out, so it appears to be stuck from the CQL layer's POV. The patch fixes this by taking pending nodes into account when checking that there are sufficient live nodes for the operation to proceed. Fixes #965 Message-Id: <20160303165250.GG2253@scylladb.com>	2016-03-07 13:30:13 +01:00
Avi Kivity	d5cf0fb2b1	Add license notices	2015-09-20 10:43:39 +03:00
Gleb Natapov	17e54d0604	add logger for consistency level calculation	2015-09-13 11:59:17 +03:00
Pekka Enberg	0b8c67ed79	exceptions: Move unavailable_exception to exceptions.hh Move unavailable_exception to exceptions.hh where other CQL transport level exceptions are defined in. Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>	2015-07-28 10:06:18 +03:00
Pekka Enberg	055e25ed43	db/consistency_level: Move enum to separate header Move 'consistency_level' enumeration to a separate header file to fix dependency issues that arise when we move 'unavailable_exception' to exceptions.hh. Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>	2015-07-28 10:06:18 +03:00
Pekka Enberg	7fc1311d4a	db/consistency_level: Move implementation to .cc file Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>	2015-07-28 10:06:18 +03:00
Pekka Enberg	32378708d0	db/consistency_level: Remove ifdef'd code Cleanup consistency_level.hh by removing untranslated code that's been sitting in the tree for a while. Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>	2015-07-28 09:36:36 +03:00
Pekka Enberg	826f21643f	transport/server: Fix UNAVAILABLE error encoding This fixes UNAVAILABLE error encoding to follow the CQL binary protocol spec. Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>	2015-07-28 09:27:52 +03:00
Gleb Natapov	f122ee39b9	storage_proxy: return proper error codes to transport layer Transport layer expects to get error code in an exception of type exceptions::cassandra_exception. Fix code to use it as a base for all user visible exceptions and put correct error code there.	2015-07-23 12:32:21 +03:00
Vlad Zolotarov	45ce351f60	db: consistency_level.hh: added is_sufficient_live_nodes() Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2015-07-05 17:34:56 +03:00
Vlad Zolotarov	501737cb84	db: consistency_level.hh: Complete the implementation of filter_for_query() Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> New in v2: - Use std::partition_copy() and boost::range::algorithm::partition(). - Don't use std::move() when returning a local vector variable.	2015-07-05 17:34:50 +03:00
Vlad Zolotarov	a9a3bd1927	db: consistency_level.hh: Styling in filter_for_query() - Make live_endpoints.erase() call more readable. - Adjust the comments to our naming. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2015-07-05 17:15:23 +03:00
Vlad Zolotarov	77c50dc013	db: consistency_level.hh: complete assure_sufficient_live_nodes() Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> New in v2: - Use static_cast instead of a dynamic_cast.	2015-07-02 16:00:17 +03:00
Vlad Zolotarov	a4a6c0d69e	db: consistency_level.hh: implement is_local() Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2015-07-02 15:59:40 +03:00
Vlad Zolotarov	ff770a61a5	db: consistency_level.hh: complete block_for() function Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> New in v2: - Use static_cast instead of a dynamic_cast.	2015-07-02 15:58:50 +03:00
Vlad Zolotarov	6b609d5b35	db: consistency_level.hh: implement local_quorum_for() Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> New in v2: - Use static_cast instead of a dynamic_cast.	2015-07-02 15:56:56 +03:00
Gleb Natapov	4b9661c608	initial read clustering code Works only if all replicas (participating in CL) have the same live data. Does not detect mismatches in tombstones (no infrastructure yet). Does not report timeout yet.	2015-07-01 13:36:30 +03:00
Gleb Natapov	969134280a	initial mutation clustering code	2015-06-15 12:53:10 +03:00
Tomasz Grabiec	731a63e371	schema: Embed raw_schema inside schema Public fields got encapsulated.	2015-04-24 18:01:01 +02:00
Tomasz Grabiec	2902395129	Relax includes	2015-03-30 09:01:59 +02:00
Tomasz Grabiec	ac61d7526e	db: Take keyspace name by const&	2015-03-30 09:01:59 +02:00
Avi Kivity	30c3348702	db: add ostream support to consistency_level	2015-03-26 09:34:49 +02:00
Tomasz Grabiec	d4b6f7abc3	cql3: Convert more of ConsistencyLevel	2015-01-29 19:40:07 +01:00
Tomasz Grabiec	612f68b869	db: Convert ConsistencyLevel to C++	2015-01-23 18:45:28 +01:00

46 Commits
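
The endpoint-selection rule described in commit aaf67bcbaa (preferred replicas in filter_for_query()) can be sketched roughly as below. This is a hypothetical illustration, not ScyllaDB's implementation: `endpoint`, `select_endpoints`, and `block_for` are illustrative names, the candidates are assumed to arrive already ordered by the load-balancing strategy, and trimming an oversized intersection down to `block_for` is an assumption the commit message does not spell out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for gms::inet_address.
using endpoint = std::string;

// Sketch of the selection rule described in commit aaf67bcbaa.
// Assumption: `candidates` is already ordered by the load-balancing strategy.
// `block_for` is the number of replicas the consistency level requires.
std::vector<endpoint> select_endpoints(const std::vector<endpoint>& candidates,
                                       const std::vector<endpoint>& preferred,
                                       std::size_t block_for) {
    assert(block_for <= candidates.size());
    auto is_preferred = [&preferred](const endpoint& e) {
        return std::find(preferred.begin(), preferred.end(), e) != preferred.end();
    };

    // 1. Intersect candidates with preferred endpoints, keeping candidate order.
    std::vector<endpoint> selected;
    for (const auto& e : candidates) {
        if (is_preferred(e)) {
            selected.push_back(e);
        }
    }

    // 2. If the intersection already satisfies the CL, use it
    //    (trimmed to block_for in this sketch).
    if (selected.size() >= block_for) {
        selected.resize(block_for);
        return selected;
    }

    // 3. Otherwise fill the remaining slots from the non-preferred
    //    candidates, in load-balancing order.
    for (const auto& e : candidates) {
        if (selected.size() == block_for) {
            break;
        }
        if (!is_preferred(e)) {
            selected.push_back(e);
        }
    }
    return selected;
}
```

For example, with candidates {a, b, c, d}, preferred {d}, and block_for = 2, the sketch returns {d, a}: the preferred replica first, then the best remaining candidate.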
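
The compare count behind commit 4d70f3baee can be checked with a small model of the search-and-erase pattern it describes for abstract_write_response_handler::response(). Everything here is a hypothetical sketch: `response_tracker` and `on_response` are illustrative names, replicas are plain ints, and std::vector stands in for utils::small_vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of the search-and-erase pattern described in commit
// 4d70f3baee. std::vector stands in for utils::small_vector; replicas are ints.
struct response_tracker {
    std::vector<int> pending;  // replicas that have not answered yet
    std::size_t compares = 0;  // equality comparisons performed so far

    // Linear search for the responding replica, then erase it.
    void on_response(int replica) {
        auto it = pending.begin();
        for (; it != pending.end(); ++it) {
            ++compares;
            if (*it == replica) {
                break;
            }
        }
        assert(it != pending.end() && "response from a replica not in the set");
        if (it != pending.end()) {
            pending.erase(it);
        }
    }
};
```

With 15 replicas, the worst case (every response matches the last remaining entry) costs 15 + 14 + ... + 1 = 120 equality checks, of the same order as the commit's estimate of 15*(15-1)/2 = 105, which counts only the non-matching compares; responses that arrive in list order cost just 15.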
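
The live-node check touched by commits f59415b3c6 and 739dd878e3 can be sketched as follows, under stated assumptions: `check_live_nodes` and `unavailable_error` are hypothetical names (the real exception is exceptions::unavailable_exception, not modeled here), `live` is assumed to count live replicas including live pending ones, and reporting `block_for` as the required count is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical simplified error; fields mirror what a client retry policy reads.
struct unavailable_error : std::runtime_error {
    std::size_t required;
    std::size_t alive;
    unavailable_error(std::size_t req, std::size_t al)
        : std::runtime_error("not enough live replicas"), required(req), alive(al) {}
};

// Assumptions: `block_for` is what the CL needs without topology changes,
// `live` counts live replicas including live pending (bootstrapping) ones,
// and `pending` is the number of pending endpoints.
void check_live_nodes(std::size_t block_for, std::size_t live, std::size_t pending) {
    assert(block_for > 0 && "every CL in this sketch needs at least one replica");
    // f59415b3c6: each pending endpoint also needs a copy, so it raises the bar.
    const std::size_t need = block_for + pending;
    if (live < need) {
        // 739dd878e3: subtract pending endpoints from the reported live count,
        // so a downgrading retry asks for a CL that the coordinator can meet.
        const std::size_t reported_alive = live >= pending ? live - pending : 0;
        throw unavailable_error(block_for, reported_alive);
    }
}
```

For example, with QUORUM at RF=3 (block_for = 2) and one pending node, two live replicas fail the check and are reported as one live replica; a client that then downgrades to ONE needs 1 + 1 = 2 live replicas, which passes.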