Commit Graph

61 Commits

Author SHA1 Message Date
Kefu Chai
57b14220ce tree: remove unused "#include"s
these unused includes were identified by clang-include-cleaner. after
auditing these source files, all of the reports have been confirmed.

in which, instead of using `seastarx.hh`, `readers/mutation_reader.hh`,
use `using seastar::future` to include `future` in the global namespace,
this makes `readers/mutation_reader.hh` a header exposing `future<>`,
but this is not a good practice, because, unlike `seastarx.hh` or
`seastar/core/future.hh`, `reader/mutation_reader.hh`  is not
responsible for exposing seastar declarations. so, we trade the
using statement for `#include "seastarx.hh"` in that file to decouple
the source files including it from this header because of this statement.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22439
2025-01-28 14:12:06 +03:00
Gleb Natapov
315db647dd consistency_level: drop templates since the same types of ranges are used by all the callers 2025-01-16 16:37:06 +02:00
Avi Kivity
f3eade2f62 treewide: relicense to ScyllaDB-Source-Available-1.0
Drop the AGPL license in favor of a source-available license.
See the blog post [1] for details.

[1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/
2024-12-18 17:45:13 +02:00
Gleb Natapov
6116751e44 db: consistency_level: change is_sufficient_live_nodes to work on host ids
It is called from storage proxy which works on host ids now.
2024-12-02 10:31:12 +02:00
Gleb Natapov
ccbfabb858 db: consistency_level: move filter_for_query to host id
It is called from storage proxy which works on host ids now.
2024-12-02 10:31:12 +02:00
Gleb Natapov
12937aeb7f storage_proxy: move to addressing nodes by host ids instead of ips
In this rather large path we mode to address nodes in storage proxy by
host ids instead of ips. Some subsystems storage proxy calls to are
not yet converted to host ids, so we translate back and forth when we
interact with them.
2024-12-02 10:31:11 +02:00
Kefu Chai
6ead5a4696 treewide: move log.hh into utils/log.hh
the log.hh under the root of the tree was created keep the backward
compatibility when seastar was extracted into a separate library.
so log.hh should belong to `utils` directory, as it is based solely
on seastar, and can be used all subsystems.

in this change, we move log.hh into utils/log.hh to that it is more
modularized. and this also improves the readability, when one see
`#include "utils/log.hh"`, it is obvious that this source file
needs the logging system, instead of its own log facility -- please
note, we do have two other `log.hh` in the tree.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-10-22 06:54:46 +03:00
Kefu Chai
ee36358a60 db: remove unused includes
these unused includes are identified by clang-include-cleaner.
after auditing the source files, all of the reports have been
confirmed.

please note, since we have `using seastar::shared_ptr` in
`seastarx.h`, this renders `#include <seastar/core/shared_ptr.hh>`
unnecessary if we don't need the full definition of `seastar::shared_ptr`.

so, in this change, all the unused includes are removed. but there are
some headers which are actually used, while still being identified by
this tool. these includes are marked with "IWYU pragma: keep".

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-10-04 20:48:18 +08:00
Kamil Braun
0e36377f56 db: consistency_level: remove overload of filter_for_query
Not used anymore after the previous commit.
2023-06-14 11:41:36 +02:00
Pavel Emelyanov
64c9359443 storage_proxy: Don't use default-initialized endpoint in get_read_executor()
After calling filter_for_query() the extra_replica to speculate to may
be left default-initialized which is :0 ipv6 address. Later below this
address is used as-is to check if it belongs to the same DC or not which
is not nice, as :0 is not an address of any existing endpoint.

Recent move of dc/rack data onto topology made this place reveal itself
by emitting the internal error due to :0 not being present on the
topology's collection of endpoints. Prior to this move the dc filter
would count :0 as belonging to "default_dc" datacenter which may or may
not match with the dc of the local node.

The fix is to explicitly tell set extra_replica from unset one.

fixes: #11825

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #11833
2022-10-25 09:16:50 +03:00
Avi Kivity
46bd0b1e62 consistency_level: accept effective_replication_map as parameter, rather than keyspace
A keyspace is a mutable object that can change from time to time. An
effective_replication_map captures the state of a keyspace at a point in
time and can therefore be consistent (with care from the caller).

Change consistency_level's functions to accept an effective_replication_map.
This allows the caller to ensure that separate calls use the same
information and are consistent with each other.

Current callers are likely correct since they are called from one
continuation, but it's better to be sure.
2022-08-11 17:58:42 +03:00
Botond Dénes
fbbe2529c1 Merge "Remove global snitch usage from consistency_level.cc" from Pavel Emelyanov
"
There are several helpers in this .cc file that need to get datacenter
for endpoints. For it they use global snitch, because there's no other
place out there to get that data from.

The whole dc/rack info is now moving to topology, so this set patches
the consistency_level.cc to get the topology. This is done two ways.
First, the helpers that have keyspace at hand may get the topology via
ks's effective_replication_map.

Two difficult cases are db::is_local() and db.count_local_endpoints()
because both have just inet_address at hand. Those are patched to be
methods of topology itself and all their callers already mess with
token metadata and can get topology from it.
"

* 'br-consistency-level-over-topology' of https://github.com/xemul/scylla:
  consistency_level: Remove is_local() and count_local_endpoints()
  storage_proxy: Use topology::local_endpoints_count()
  storage_proxy: Use proxy's topology for DC checks
  storage_proxy: Keep shared_ptr<proxy> on digest_read_resolver
  storage_proxy: Use topology local_dc_filter in its methods
  storage_proxy: Mark some digest_read_resolver methods private
  forwarding_service: Use topology local_dc_filter
  storage_service: Use topology local_dc_filter
  consistency_level: Use topology local_dc_filter
  consitency-level: Call count_local_endpoints from topology
  consistency_level: Get datacenter from topology
  replication_strategy: Remove hold snitch reference
  effective_replication_map: Get datacenter from topology
  topology: Add local-dc detection shugar
2022-08-05 13:31:55 +03:00
Pavel Emelyanov
c3718b7a6e consistency_level: Remove is_local() and count_local_endpoints()
No code uses them now -- switched to use topology -- so thse two can be
dropped together with their calls for global snitch

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-08-05 12:19:48 +03:00
Kamil Braun
a9fd156a1b db: consistency_level: filter_for_query: take const gossiper& 2022-08-04 12:19:38 +02:00
Avi Kivity
5937b1fa23 treewide: remove empty comments in top-of-files
After fcb8d040 ("treewide: use Software Package Data Exchange
(SPDX) license identifiers"), many dual-licensed files were
left with empty comments on top. Remove them to avoid visual
noise.

Closes #10562
2022-05-13 07:11:58 +02:00
Pavel Emelyanov
11c99fc41b table: Don't use global gossiper
The table::get_hit_rate needs gossiper to get hitrates state from.
There's no way to carry gossiper reference on the table itself, so it's
up to the callers of that method to provide it. Fortunately, there's
only one caller -- the proxy -- but the call chain to carry the
reference it not very short ... oh, well.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-05-03 10:33:08 +03:00
Avi Kivity
fcb8d040e8 treewide: use Software Package Data Exchange (SPDX) license identifiers
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.

Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.

The changes we applied mechanically with a script, except to
licenses/README.md.

Closes #9937
2022-01-18 12:15:18 +01:00
Avi Kivity
bbad8f4677 replica: move ::database, ::keyspace, and ::table to replica namespace
Move replica-oriented classes to the replica namespace. The main
classes moved are ::database, ::keyspace, and ::table, but a few
ancillary classes are also moved. There are certainly classes that
should be moved but aren't (like distributed_loader) but we have
to start somewhere.

References are adjusted treewide. In many cases, it is obvious that
a call site should not access the replica (but the data_dictionary
instead), but that is left for separate work.

scylla-gdb.py is adjusted to look for both the new and old names.
2022-01-07 12:04:38 +02:00
Avi Kivity
ae3a360725 database: Move database, keyspace, table classes to replica/ directory
The database, keyspace, and table classes represent the replica-only
part of the objects after which they are named. Reading from a table
doesn't give you the full data, just the replica's view, and it is not
consistent since reconciliation is applied on the coordinator.

As a first step in acknowledging this, move the related files to
a replica/ subdirectory.
2022-01-06 17:07:30 +02:00
Avi Kivity
4d70f3baee storage_proxy: change unordered_set<inet_address> to small_vector in write path
The write paths in storage_proxy pass replica sets as
std::unordered_set<gms::inet_address>. This is a complex type, with
N+1 allocations for N members, so we change it to a small_vector (via
inet_address_vector_replica_set) which requires just one allocation, and
even zero when up to three replicas are used.

This change is more nuanced than the corresponding change to the read path
abe3d7d7 ("Merge 'storage_proxy: use small_vector for vectors of
inet_address' from Avi Kivity"), for two reasons:

 - there is a quadratic algorithm in
   abstract_write_response_handler::response(): it searches for a replica
   and erases it. Since this happens for every replica, it happens N^2/2
   times.
 - replica sets for writes always include all datacenters, while reads
   usually involve just one datacenter.

So, a write to a keyspace that has 5 datacenters will invoke 15*(15-1)/2
=105 compares.

We could remove this by sending the index of the replica in the replica
set to the replica and ask it to include the index in the response, but
I think that this is unnecessary. Those 105 compares need to be only
105/15 = 7 times cheaper than the corresponding unordered_set operation,
which they surely will. Handling a response after a cross-datacenter round
trip surely involves L3 cache misses, and a small_vector reduces these
to a minimum compared to an unordered_set with its bucket table, linked
list walking and managent, and table rehashing.

Tests using perf_simple_query --write --smp 1 --operations-per-shard 1000000
 --task-quota-ms show two allocations removed (as expected) and a nice
reduction in instructions executed.

before: median 204842.54 tps ( 54.2 allocs/op,  13.2 tasks/op,   49890 insns/op)
after:  median 206077.65 tps ( 52.2 allocs/op,  13.2 tasks/op,   49138 insns/op)

Closes #8847
2021-06-17 13:46:40 +03:00
Pavel Solodovnikov
76bea23174 treewide: reduce header interdependencies
Use forward declarations wherever possible.

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>

Closes #8813
2021-06-07 15:58:35 +03:00
Avi Kivity
a55b434a2b treewide: extent copyright statements to present day 2021-06-06 19:18:49 +03:00
Pavel Solodovnikov
e0749d6264 treewide: some random header cleanups
Eliminate not used includes and replace some more includes
with forward declarations where appropriate.

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
2021-06-06 19:18:49 +03:00
Avi Kivity
c71d007797 consistency_level: deinline assure_sufficient_live_nodes()
assure_sufficient_live_nodes() is a huge template calling other
huge templates, and requires "network_topology_strategy.hh". It is
inlined in consistency_level.hh. This increases compile time and recompiles.

Move the template out-of-line and use "extern template" to instantiate it.
This is not ideal as new callers would require updates to the
instantiated signatures, but I think our goal should be to de-template
it completely instead. Meanwhile, this reduces some pain.

Ref #1.

Closes #8637
2021-05-19 15:03:51 +03:00
Avi Kivity
cea5493cb7 storage_proxy, treewide: introduce names for vectors of inet_address
storage_proxy works with vectors of inet_addresses for replica sets
and for topology changes (pending endpoints, dead nodes). This patch
introduces new names for these (without changing the underlying
type - it's still std::vector<gms::inet_address>). This is so that
the following patch, that changes those types to utils::small_vector,
will be less noisy and highlight the real changes that take place.
2021-05-05 18:36:48 +03:00
Gleb Natapov
0b84b04f97 consistency_level: make it more const correct
Message-Id: <20190214122631.GF19055@scylladb.com>
2019-02-14 14:52:51 +02:00
Avi Kivity
2c08bff8d5 Split consistency_level.hh header
It has two unrelated users: cql for validation, and storage_proxy for
complicated calculations. Split the simple stuff into a new header to reduce
dependencies.
2018-11-27 13:32:10 +02:00
Botond Dénes
6e59cee244 db::consistency_level::filter_for_query() add preferred_endpoints
To the second overload (the one without read-repair related params) too.
2018-09-03 10:31:44 +03:00
Botond Dénes
aaf67bcbaa Consider preferred replicas when choosing endpoints for query_singular()
Propagate the preferred_replicas to db::filter_for_query() and consider
them when selecting the endpoints. The algoritm for selecting the
endpoints is as follows:
* Compute the intersection of the endpoint candidates and the
preferred endpoints.
* If this yields a set of endpoints that already satisfies the CL
requirements use this set.
* Otherwise select the remaining endpoints according to the
load-balancing strategy, just like before.
2018-03-13 10:34:34 +02:00
Gleb Natapov
357c77a333 consistency_level: constify quorum_for() and local_quorum_for() 2017-12-05 13:01:20 +02:00
Gleb Natapov
739dd878e3 consistency_level: report less live endpoints in Unavailable exception if there are pending nodes
DowngradingConsistencyRetryPolicy uses live replicas count from
Unavailable exception to adjust CL for retry, but when there are pending
nodes CL is increased internally by a coordinator and that may prevent
retried query from succeeding. Adjust live replica count in case of
pending node presence so that retried query will be able to proceed.

Fixes #2535

Message-Id: <20170710085238.GY2324@scylladb.com>
2017-07-11 16:51:56 +03:00
Gleb Natapov
87094849fa storage_proxy: load balance read requests according to cache hit rates
This patch makes storage proxy to choose replicas to read from base on
their cache hit rates. Replicas with higher cache hit rates will see
more requests while replicas with lower hit rates will see less. Local
node has a special bonus and will get more requests even if another node
has slightly higher cache hit rate (same goes for local vs remote DC),
but after the patch it is no longer guarantied that a coordinator node
will be chosen as a replica for the read (if the feature is enabled).
2017-06-13 09:57:14 +03:00
Gleb Natapov
bc8aa1b4ee choose extra replica for speculation in filter_for_query()
Currently storage proxy has to loop over remaining replicas to search
for suitable extra replica, but doing it in filter_for_query() is
extremely easy, so do it there instead.
2017-06-13 09:57:14 +03:00
Gleb Natapov
8437ea3b99 consistency_level: drop filter_for_query_dc_local function
Merge filter_for_query_dc_local() functionality into filter_for_query().
This is more efficient since filter_for_query_dc_local() partitions
endpoints into 'local' and 'remote' set but filter_for_query() already
does it for CL=LOCAL so for such queries we needlessly do it twice.
2017-06-13 09:57:14 +03:00
Tomasz Grabiec
ddfee57c97 Replace iostream include with iosfwd in headers
Message-Id: <1484656119-8386-4-git-send-email-tgrabiec@scylladb.com>
2017-01-17 14:52:44 +02:00
Gleb Natapov
dbb1217896 cl: enable logging for insufficient LOCAL_QUORUM consistency
Message-Id: <1460549369-29523-2-git-send-email-gleb@scylladb.com>
2016-04-14 14:56:58 +03:00
Pekka Enberg
38a54df863 Fix pre-ScyllaDB copyright statements
People keep tripping over the old copyrights and copy-pasting them to
new files. Search and replace "Cloudius Systems" with "ScyllaDB".

Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>
2016-04-08 08:12:47 +03:00
Gleb Natapov
f59415b3c6 Take pending endpoints into account while checking for sufficient live nodes
During bootstrapping additional copies of data has to be made to ensure
that CL level is met (see CASSANDRA-833 for details). Our code does
that, but it does not take into account that bootstraping node can be
dead which may cause request to proceed even though there is no
enough live nodes for it to be completed. In such a case request neither
completes nor timeouts, so it appear to be stuck from CQL layer POV. The
patch fixes this by taking into account pending nodes while checking
that there are enough sufficient live nodes for operation to proceed.

Fixes #965

Message-Id: <20160303165250.GG2253@scylladb.com>
2016-03-07 13:30:13 +01:00
Avi Kivity
d5cf0fb2b1 Add license notices 2015-09-20 10:43:39 +03:00
Gleb Natapov
17e54d0604 add logger for consistency level calculation 2015-09-13 11:59:17 +03:00
Pekka Enberg
0b8c67ed79 exceptions: Move unavailable_exception to exceptions.hh
Move unavailable_exception to exceptions.hh where other CQL transport
level exceptions are defined in.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-07-28 10:06:18 +03:00
Pekka Enberg
055e25ed43 db/consistency_level: Move enum to separate header
Move 'consistency_level' enumeration to a separate header file to fix
dependency issues that arise when we move 'unavailable_exception' to
exceptions.hh.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-07-28 10:06:18 +03:00
Pekka Enberg
7fc1311d4a db/consistency_level: Move implementation to .cc file
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-07-28 10:06:18 +03:00
Pekka Enberg
32378708d0 db/consistency_level: Remove ifdef'd code
Cleanup consistency_level.hh by removing untranslated code that's been
sitting in the tree for a while.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-07-28 09:36:36 +03:00
Pekka Enberg
826f21643f transport/server: Fix UNAVAILABLE error encoding
This fixes UNAVAILABLE error encoding to follow the CQL binary protocol
spec.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-07-28 09:27:52 +03:00
Gleb Natapov
f122ee39b9 storage_proxy: return proper error codes to transport layer
Transport layer expects to get error code in an exception of type
exceptions::cassandra_exception. Fix code to use it as a base for
all user visible exceptions and put correct error code there.
2015-07-23 12:32:21 +03:00
Vlad Zolotarov
45ce351f60 db: consistency_level.hh: added is_sufficient_live_nodes()
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-07-05 17:34:56 +03:00
Vlad Zolotarov
501737cb84 db: consistency_level.hh: Complete the implementation of filter_for_query()
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>

New in v2:
   - Use std::partition_copy() and boost::range::algorithm::partition().
   - Don't use std::move() when returning a local vector variable.
2015-07-05 17:34:50 +03:00
Vlad Zolotarov
a9a3bd1927 db: consistency_level.hh: Styling in filter_for_query()
- Make live_endpoints.erase() call more readable.
   - Adjust the comments to our naming.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-07-05 17:15:23 +03:00
Vlad Zolotarov
77c50dc013 db: consistency_level.hh: complete assure_sufficient_live_nodes()
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>

New in v2:
   - Use static_cast instead of a dynamic_cast.
2015-07-02 16:00:17 +03:00