
51 Commits

Author SHA1 Message Date
Raphael S. Carvalho
012ba25b5b service: fix indentation in dispatch()
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-05-15 16:30:06 -03:00
Raphael S. Carvalho
0a9e073154 service: fix reactor stall with large tablet count
With a large tablet count, e.g. 128k, forward_service::dispatch() can
potentially stall when grouping ranges per endpoint.

Reactor stalled for 4 ms on shard 1. Backtrace: 0x5eb15ea 0x5eb09f5 0x5eb1daf 0x3dbaf 0x2d01e57 0x33f7d1e 0x348255f 0x2d005d4 0x2d3d017 0x2d3d58c 0x2d3d225 0x5e59622 0x5ec328f 0x5ec4577 0x5ee84e0 0x5e8394a 0x8c946 0x11296f
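The usual remedy for this kind of stall in Seastar code is to add preemption points inside the long loop, for example `co_await seastar::coroutine::maybe_yield()` in a coroutine. The standard-C++ sketch below only models that idea. `owned_range`, `group_ranges_by_endpoint` and the `yield` callback are hypothetical stand-ins, not the actual dispatch() code, and ranges are simplified to integers.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified model: a "range" is an int that already knows
// which endpoint should serve it.
struct owned_range {
    std::string endpoint;
    int range;
};

// Groups ranges per endpoint and calls `yield` after every `yield_every`
// items. In a Seastar coroutine that call would be a preemption point such as
// `co_await seastar::coroutine::maybe_yield()`; here it is a plain callback.
std::map<std::string, std::vector<int>>
group_ranges_by_endpoint(const std::vector<owned_range>& ranges,
                         std::size_t yield_every,
                         const std::function<void()>& yield) {
    assert(yield_every > 0);
    std::map<std::string, std::vector<int>> by_endpoint;
    std::size_t processed = 0;
    for (const auto& r : ranges) {
        by_endpoint[r.endpoint].push_back(r.range);
        if (++processed % yield_every == 0) {
            yield();  // let the reactor run other tasks before continuing
        }
    }
    return by_endpoint;
}
```

With 128k tablets, the yield interval bounds how long a single task can run before other work gets a chance, at the cost of some scheduling overhead.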

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-05-15 16:30:06 -03:00
Raphael S. Carvalho
f7659b357c service: avoid potential expensive copies in forward_service::dispatch()
Each partition_range_vector might grow to ~9600 elements, assuming
96-shard nodes, each with 100 tablets.

~9600 elements, each 120 bytes (sizeof(partition_range)), can result
in a vector with a capacity of ~2 MB (16384 elements) due to a growth
factor of 2.
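The arithmetic can be checked at compile time with the short sketch below. It assumes a vector that starts at capacity 1 and doubles on every reallocation; libstdc++'s `std::vector` behaves this way, but the growth factor is implementation-defined. The 120-byte element size is taken from the message.

```cpp
#include <cassert>
#include <cstddef>

// Capacity reached after pushing `n` elements one by one into a vector that
// starts with capacity 1 and doubles whenever it is full (assumed policy).
constexpr std::size_t doubled_capacity(std::size_t n) {
    std::size_t cap = 1;
    while (cap < n) {
        cap *= 2;
    }
    return cap;
}

constexpr std::size_t element_size = 120;   // sizeof(partition_range), per the message
constexpr std::size_t elements = 96 * 100;  // 96 shards x 100 tablets = 9600

constexpr std::size_t used_bytes = elements * element_size;                         // 1,152,000
constexpr std::size_t allocated_bytes = doubled_capacity(elements) * element_size;  // 16384 * 120

static_assert(doubled_capacity(elements) == 16384);
static_assert(allocated_bytes == 1'966'080);  // roughly 2 MB in one contiguous block
```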

We're copying each range three times in dispatch(), and we can easily
avoid it.
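A minimal sketch of the copy-avoidance idea follows. It uses a hypothetical `partition_range` struct, not Scylla's `dht::partition_range`. Moving each per-endpoint vector into its destination transfers the heap buffer instead of copying every element.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a partition range.
struct partition_range {
    long start;
    long end;
};
using partition_range_vector = std::vector<partition_range>;

// Takes ownership of the per-endpoint vectors and hands them on without
// copying: std::move transfers each vector's buffer to the new map entry.
std::map<std::string, partition_range_vector>
collect(std::map<std::string, partition_range_vector>&& grouped) {
    std::map<std::string, partition_range_vector> requests;
    for (auto& [endpoint, ranges] : grouped) {
        requests.emplace(endpoint, std::move(ranges));  // O(1), no element copies
    }
    return requests;
}
```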

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-05-15 16:30:06 -03:00
Raphael S. Carvalho
f9d2b9a83b service: coroutinize forward_service::dispatch()
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-05-15 16:30:06 -03:00
Kefu Chai
2dbf044b91 cql3: do not include unused headers
These unused includes were identified by clangd. See
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16791
2024-01-16 16:43:17 +02:00
Kefu Chai
ece2bd2f6e service: do not include unused headers
These unused includes were identified by clangd. See
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16764
2024-01-15 13:29:33 +02:00
Benny Halevy
860b2d38c6 forward_service: use messaging rather than fb_utilities
Use _forwarder._messaging to get to the broadcast address
rather than the global fb_utilities.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 09:48:12 +02:00
Pavel Emelyanov
0e0f9a57c6 forward_service: Remove .shutdown() method
It is now empty and serves no purpose.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-26 10:39:22 +03:00
Pavel Emelyanov
a251b9893f forward_service: Set _shutdown in abort-source subscription
Currently the bit is set in the .shutdown() method, which is called early on
stop. After the patch, the bit is set in the abort-source subscription
callback, which is also called early on stop.
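In Seastar, `abort_source::subscribe()` registers a callback that runs when an abort is requested. The standard-C++ sketch below models the pattern with a hypothetical `tiny_abort_source` class. It is not Seastar's API and ignores subscription lifetimes.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical minimal abort source: runs registered callbacks once, on request.
class tiny_abort_source {
    std::vector<std::function<void()>> _subscribers;
    bool _aborted = false;
public:
    void subscribe(std::function<void()> cb) { _subscribers.push_back(std::move(cb)); }
    void request_abort() {
        if (_aborted) {
            return;
        }
        _aborted = true;
        for (auto& cb : _subscribers) {
            cb();
        }
    }
};

// Hypothetical service that sets its shutdown bit from the subscription
// callback instead of from a dedicated shutdown() method. It captures `this`,
// so it must not be moved after construction.
class service {
    bool _shutdown = false;
public:
    explicit service(tiny_abort_source& as) {
        as.subscribe([this] { _shutdown = true; });
    }
    bool shutting_down() const { return _shutdown; }
};
```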

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-26 10:38:34 +03:00
Avi Kivity
66c47d40e6 cql3: selection: drop selector_factories, selectables, and selectors
The whole class hierarchy is no longer used by anything and we can just
delete it.
2023-07-03 19:45:17 +03:00
Avi Kivity
7c3ceb6473 cql3: select_statement: use prepared selectors
Change one more layer of processing to work on prepared
rather than raw selectors. This moves the call to prepare
the selectors early in select_statement processing. In turn
this changes maybe_jsonize_select_clause() and forward_service's
mock_selection() to work in the prepared realm as well.

This moves us one step closer to using evaluate() to process
the select clause, as the prepared selectors are now available
in select_statement. We can't use them yet since we can't evaluate
aggregations.
2023-07-03 19:45:17 +03:00
Botond Dénes
e1c2de4fb8 Merge 'forward_service: fix forgetting case-sensitivity in aggregates' from Jan Ciołek
There was a bug that caused aggregates to fail when used on case-sensitive columns.

For example:
```cql
SELECT SUM("SomeColumn") FROM ks.table;
```
would fail, with a message saying that there is no column "somecolumn".

This is because the case-sensitivity got lost on the way.

For case-insensitive column names we convert them to lowercase, but for case-sensitive names we have to preserve the name as originally written.

The problem was in `forward_service`: we took a column name and created a case-insensitive `column_identifier` out of it.
This converted the name to lowercase, and later such a column couldn't be found.

To fix it, let's make the `column_identifier` case-sensitive.
It will preserve the name, without converting it to lowercase.
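The normalization rule can be sketched as follows. `make_identifier` is a hypothetical helper that plays the role of the boolean case-sensitivity argument. It is not the actual `cql3::column_identifier` code.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: case-insensitive identifiers are normalized to
// lowercase, case-sensitive (quoted) ones keep their original spelling.
std::string make_identifier(std::string raw_text, bool case_sensitive) {
    if (!case_sensitive) {
        std::transform(raw_text.begin(), raw_text.end(), raw_text.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    }
    return raw_text;
}
```

For `SUM("SomeColumn")`, creating the identifier with `case_sensitive = false` yields `somecolumn`, which is the failing lookup described above. Passing `true` preserves `SomeColumn`.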

Fixes: https://github.com/scylladb/scylladb/issues/14307

Closes #14340

* github.com:scylladb/scylladb:
  service/forward_service.cc: make case-sensitivity explicit
  cql-pytest/test_aggregate: test case-sensitive column name in aggregate
  forward_service: fix forgetting case-sensitivity in aggregates
2023-06-22 08:25:33 +03:00
Jan Ciołek
16c21d7252 service/forward_service.cc: make case-sensitivity explicit
Make it explicit that the boolean argument determines case-sensitivity. It emphasizes its importance.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-06-21 16:02:41 +02:00
Jan Ciolek
7fca350075 forward_service: fix forgetting case-sensitivity in aggregates
There was a bug that caused aggregates to fail when
used on case-sensitive columns.

For example:
```cql
SELECT SUM("SomeColumn") FROM ks.table;
```
would fail, with a message saying that there
is no column "somecolumn".

This is because the case-sensitivity got lost on the way.

For case-insensitive column names we convert them to lowercase,
but for case-sensitive names we have to preserve the name
as originally written.

The problem was in `forward_service`: we took a column name
and created a case-insensitive `column_identifier` out of it.
This converted the name to lowercase, and later such a column
couldn't be found.

To fix it, let's make the `column_identifier` case-sensitive.
It will preserve the name, without converting it to lowercase.

Fixes: https://github.com/scylladb/scylladb/issues/14307

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-06-21 14:37:42 +02:00
Tomasz Grabiec
d4497a058e forward_service: Use table sharder
schema::get_sharder() does not return the correct sharder for tablet-based tables.
Code which is supposed to work with all kinds of tables should use erm::get_sharder().
2023-06-21 00:58:24 +02:00
Avi Kivity
1040589828 cql3: selection: prepare selector expressions
Call prepare_expression() on selector expressions to resolve types. This
leaves us with just one way to move from the unprepared domain to the
prepared domain.

The change is somewhat awkward since do_prepare_selectable() is re-doing
work that is done by prepare_expression(), but somehow it all works. The
next patch will tear down the unnecessary double-preparation.
2023-06-13 21:04:49 +03:00
Michał Sala
e0855b1de2 forward_service: introduce shutdown checks
This commit introduces a new boolean flag, `shutdown`, to the
forward_service, along with a corresponding shutdown method. It also
adds checks throughout the forward_service to verify the value of the
shutdown flag before retrying or invoking functions that might use the
messaging service under the hood.

The flag is set before messaging service shutdown, by invoking
forward_service::shutdown in main. By checking the flag before each call
that potentially involves the messaging service, we can ensure that the
messaging service is still operational. If the flag is false, indicating
that the messaging service is still active, we can proceed with the
call. If the messaging service is shut down during the
call, appropriate exceptions should be thrown somewhere down in the called
functions, avoiding potential hangs.

This fix should resolve the issue where forward_service retries could
block the shutdown.
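The check-before-call pattern looks like this in standard C++. `retry_unless_shutdown`, `shutdown_in_progress`, the `attempt` callback and the retry limit are hypothetical. The real service works with Seastar futures and the messaging service.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>

// Hypothetical: thrown instead of retrying once shutdown has begun.
struct shutdown_in_progress : std::runtime_error {
    shutdown_in_progress() : std::runtime_error("service is shutting down") {}
};

// Calls `attempt` until it succeeds or `max_attempts` is exhausted, checking
// the shutdown flag before every call so that retries cannot block shutdown.
int retry_unless_shutdown(const bool& shutdown,
                          const std::function<bool()>& attempt,
                          int max_attempts) {
    for (int i = 1; i <= max_attempts; ++i) {
        if (shutdown) {
            throw shutdown_in_progress();
        }
        if (attempt()) {
            return i;  // number of attempts used
        }
    }
    throw std::runtime_error("all attempts failed");
}
```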

Fixes #12604

Closes #13922
2023-06-13 13:44:33 +03:00
Avi Kivity
26c8470f65 treewide: use #include <seastar/...> for seastar headers
We treat Seastar as an external library, so fix the few places
that didn't do so to use angle brackets.

Closes #14037
2023-06-06 08:36:09 +03:00
Avi Kivity
42a1ced73b cql3: result_set: switch cell data type from bytes_opt to managed_bytes_opt
The expression system uses managed_bytes_opt for values, but result_set
uses bytes_opt. This means that processing values from the result set
in expressions requires a copy.

Out of the two, managed_bytes_opt is the better choice, since it prevents
large contiguous allocations for large blobs. So we switch result_set
to use managed_bytes_opt. Users of the result_set API are adjusted.

The db::function interface is not modified to limit churn; instead we
convert the types on entry and exit. This will be adjusted in a following
patch.
2023-05-07 17:17:36 +03:00
Tomasz Grabiec
e4865bd4d1 dht, storage_proxy: Abstract token space splitting
Currently, scans are splitting partition ranges around tokens. This
will have to change with tablets, where we should split at tablet
boundaries.

This patch introduces token_range_splitter which abstracts this
task. It is provided by effective_replication_map implementation.
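Here is a rough sketch of the abstraction. It assumes tokens are plain `int64_t` values and ranges are half-open `[start, end)`. The `token_range_splitter` interface, `boundary_splitter` and `split_range` are hypothetical simplifications, not the interface added by the patch. Vnode tokens and tablet boundaries would simply be different boundary lists.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

using token = std::int64_t;
using token_range = std::pair<token, token>;  // half-open [first, second)

// Hypothetical interface: reports the split points that fall inside a range.
struct token_range_splitter {
    virtual ~token_range_splitter() = default;
    virtual std::vector<token> boundaries_in(token_range r) const = 0;
};

// One implementation, driven by a sorted list of boundaries.
struct boundary_splitter : token_range_splitter {
    std::vector<token> boundaries;  // sorted ascending
    explicit boundary_splitter(std::vector<token> b) : boundaries(std::move(b)) {}
    std::vector<token> boundaries_in(token_range r) const override {
        std::vector<token> out;
        for (token t : boundaries) {
            if (t > r.first && t < r.second) {
                out.push_back(t);
            }
        }
        return out;
    }
};

// Splits `r` at every boundary the splitter reports; the scan code only
// depends on the interface, not on how boundaries are chosen.
std::vector<token_range> split_range(const token_range_splitter& s, token_range r) {
    std::vector<token_range> parts;
    token start = r.first;
    for (token t : s.boundaries_in(r)) {
        parts.emplace_back(start, t);
        start = t;
    }
    parts.emplace_back(start, r.second);
    return parts;
}
```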
2023-04-24 10:49:36 +02:00
Tomasz Grabiec
9b17ad3771 locator: Introduce per-table replication strategy
Will be used by tablet-based replication strategies, for which
effective replication map is different per table.

Also, this patch adapts existing users of effective replication map to
use the per-table effective replication map.

For simplicity, every table has an effective replication map, even if
the erm is per keyspace. This way the client code can be uniform and
doesn't have to check whether replication strategy is per table.

Not all users of per-keyspace get_effective_replication_map() are
adapted yet to work per-table. Those algorithms will throw an
exception when invoked on a keyspace which uses per-table replication
strategy.
2023-04-24 10:49:36 +02:00
Avi Kivity
6977df5539 cql3/selection, forward_service: use stateless_aggregate_function directly
Now that stateless_aggregate_function is directly exposed by
aggregate_function, we can use it directly, avoiding the intermediary
aggregate_function::aggregate, which is removed.
2023-03-28 23:49:34 +03:00
Michał Jadwiszczak
68d2e1fff8 service:forward_service: use long type when column is counter
Previously, aggregations on counter columns failed because
function mocking looked for a function with a counter argument,
which doesn't exist.
2023-02-24 10:24:16 +01:00
Avi Kivity
69a385fd9d Introduce schema/ module
Schema related files are moved there. This excludes schema files that
also interact with mutations, because the mutation module depends on
the schema. Those files will have to go into a separate module.

Closes #12858
2023-02-15 11:01:50 +02:00
Nadav Har'El
3ba011c2be cql: fix empty aggregation, and add more tests
This patch fixes #12475, where an aggregation (e.g., COUNT(*), MIN(v))
of absolutely no partitions (e.g., "WHERE p = null" or "WHERE p in ()")
resulted in an internal error instead of the "zero" result that each
aggregator expects (e.g., 0 for COUNT, null for MIN).

The problem is that normally our aggregator forwarder picks the nodes
which hold the relevant partition(s), forwards the request to each of
them, and then combines these results. When there are no partitions,
the query is sent to no node, and we end up with an empty result set
instead of the "zero" results. So in this patch we recognize this
case and build those "zero" results (as mentioned above, these aren't
always 0 and depend on the aggregation function!).
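The idea can be sketched in standard C++. Results are assumed to be `std::optional<long>`, with `std::nullopt` standing for CQL null. The `aggregate_kind` enum, `zero_result` and `empty_aggregation_row` are hypothetical, and only the COUNT and MIN/MAX cases are modelled.

```cpp
#include <cassert>
#include <optional>
#include <vector>

enum class aggregate_kind { count, min, max };  // hypothetical subset

// The "zero" result each aggregate produces over no rows: COUNT yields 0,
// MIN and MAX yield null (std::nullopt here).
std::optional<long> zero_result(aggregate_kind k) {
    switch (k) {
    case aggregate_kind::count:
        return 0;
    case aggregate_kind::min:
    case aggregate_kind::max:
        return std::nullopt;
    }
    return std::nullopt;  // not reached for valid enum values
}

// When no partitions match, no node is queried, so build one row of
// "zero" results instead of returning an empty result set.
std::vector<std::optional<long>> empty_aggregation_row(const std::vector<aggregate_kind>& kinds) {
    std::vector<std::optional<long>> row;
    row.reserve(kinds.size());
    for (auto k : kinds) {
        row.push_back(zero_result(k));
    }
    return row;
}
```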

The patch also adds two tests reproducing this issue in a fairly general
way (e.g., several aggregators, different aggregation functions) and
confirming the patch fixes the bug.

The test also includes two additional tests for COUNT aggregation, which
uncovered an incompatibility with Cassandra that is still not fixed,
so these tests are marked "xfail":

Refs #12477: Combining COUNT with GROUP BY results in an empty result in
             Cassandra, but in one result with an empty count in Scylla.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12715
2023-02-07 12:28:42 +02:00
Wojciech Mitros
5f45b32bfa forward_service: prevent heap use-after-free of forward_aggregates
Currently, we create `forward_aggregates` inside a function that
returns the result of a future lambda that captures these aggregates
by reference. As a result, the aggregates may be destructed before
the lambda finishes, resulting in a heap use-after-free.

To prolong the lifetime of these aggregates, we cannot use a move
capture, because the lambda is wrapped in a with_thread_if_needed()
call on these aggregates. Instead, we fix this by wrapping the
entire return statement in a do_with().
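The same bug class shows up in plain C++. This is an analogy, not the Seastar code. A deferred continuation that captures a local by reference dangles once the creating function returns. Shared ownership keeps the object alive, which is roughly what `seastar::do_with()` provides until the returned future resolves. `deferred` and `make_sum_task` are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <numeric>
#include <vector>

// A task that runs later, after the function that created it has returned.
using deferred = std::function<int()>;

// Broken variant, for contrast only: capturing a local `std::vector<int>`
// by reference (`[&aggregates] { ... }`) dangles once this function returns.
//
// Fixed variant: the task shares ownership of the data, so it outlives the
// creating scope for as long as the task itself exists.
deferred make_sum_task() {
    auto aggregates = std::make_shared<std::vector<int>>(std::vector<int>{1, 2, 3});
    return [aggregates] {
        return std::accumulate(aggregates->begin(), aggregates->end(), 0);
    };
}
```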

Fixes #12528

Closes #12533
2023-01-17 13:25:57 +02:00
Michał Sala
bbbe12af43 forward_service: fix timeout support in parallel aggregates
`forward_request` verb carried information about timeouts using
`lowres_clock::time_point` (that came from local steady clock
`seastar::lowres_clock`). The time point was produced on one node and
later compared against another node's `lowres_clock`. That behavior
was wrong (`lowres_clock::time_point`s produced with different
`lowres_clock`s cannot be compared) and could lead to delayed or
premature timeout.

To fix this issue, `lowres_clock::time_point` was replaced with
`lowres_system_clock::time_point` in `forward_request` verb.
The representation to which both time point types serialize is the same
(64-bit integer denoting the count of elapsed nanoseconds), so it was
possible to do an in-place switch of those types using logic suggested
by @avikivity:
    - using steady_clock is just broken, so we aren't taking anything
        from users by breaking it further
    - once all nodes are upgraded, it magically starts to work
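A deadline sent as a 64-bit nanosecond count means the same thing on both nodes only if it is relative to a shared epoch, such as the wall clock's. A steady clock instead counts from an arbitrary per-machine origin. The sketch uses `std::chrono::system_clock` as a stand-in for Seastar's `lowres_system_clock`. `encode_deadline` and `decode_deadline` are hypothetical helpers, and node clocks are assumed to be synchronized.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

using wall_clock = std::chrono::system_clock;

// Sender side: nanoseconds since the system clock's epoch (the Unix epoch),
// a value every node interprets the same way.
std::int64_t encode_deadline(wall_clock::time_point tp) {
    return std::chrono::duration_cast<std::chrono::nanoseconds>(tp.time_since_epoch()).count();
}

// Receiver side: rebuild a time point on the local wall clock.
wall_clock::time_point decode_deadline(std::int64_t ns) {
    return wall_clock::time_point(
        std::chrono::duration_cast<wall_clock::duration>(std::chrono::nanoseconds(ns)));
}
```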

Closes #12529
2023-01-16 12:08:13 +02:00
Avi Kivity
2739ac66ed treewide: drop cql_serialization_format
Now that we don't accept cql protocol version 1 or 2, we can
drop cql_serialization format everywhere, except when in the IDL
(since it's part of the inter-node protocol).

A few functions had duplicate versions, one with and one without
a cql_serialization_format parameter. They are deduplicated.

Care is taken that `partition_slice`, which communicates
the cql_serialization_format across nodes, still presents
a valid cql_serialization_format to other nodes when
transmitting itself and rejects protocol 1 and 2 serialization
format when receiving. The IDL is unchanged.

One test checking the 16-bit serialization format is removed.
2023-01-03 19:54:13 +02:00
Michał Jadwiszczak
8e64e18b80 forward_service: add debug logs
Adds a few debug logs to see what is happening in https://github.com/scylladb/scylladb/issues/11684

Wrapped `forward_result::printer` in `seastar::value_of` to evaluate
the printer lazily.
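Lazy evaluation means the expensive formatting runs only if the debug message is actually emitted. The standard-C++ sketch below uses a hypothetical `lazy` wrapper and `log_debug` function. Seastar's own helper is `seastar::value_of`, whose exact interface is not reproduced here.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical wrapper: stores a callable and invokes it only when streamed.
template <typename Func>
struct lazy {
    Func f;
    friend std::ostream& operator<<(std::ostream& os, const lazy& l) {
        return os << l.f();
    }
};
template <typename Func>
lazy(Func) -> lazy<Func>;

// Hypothetical logger: formats its argument only when debug logging is on.
template <typename T>
std::string log_debug(bool enabled, const T& value) {
    if (!enabled) {
        return {};
    }
    std::ostringstream os;
    os << value;
    return os.str();
}
```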

Closes #12113
2022-11-30 12:15:26 +02:00
Avi Kivity
f1b0e3d58e storage_proxy: convert get_live{,_sorted}_endpoints() to accept an effective_replication_map
Allow callers to use consistent effective_replication_map:s across calls
by letting the caller select the object to use.
2022-08-11 17:58:42 +03:00
Raphael S. Carvalho
337390d374 forward_service: execute_on_this_shard: avoid reallocation and copy
Avoid about log2(256)=8 reallocations when pushing partition ranges to
be fetched. Additionally, avoid copying each range into the ranges
container. After being moved from, current_range no longer holds the
last range, but it is still engaged by the end of the loop, allowing
the next iteration to happen as expected.
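Both effects can be obtained in standard C++, as sketched below. Calling `reserve()` up front avoids the repeated reallocations of a growing vector. Moving out of a `std::optional` leaves it engaged, holding a moved-from value. `range`, `max_ranges` and `collect_ranges` are hypothetical simplifications, not the actual execute_on_this_shard code.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using range = std::string;  // hypothetical stand-in for a partition range

constexpr std::size_t max_ranges = 256;

// Collects up to `max_ranges` ranges produced by `next`, which returns
// std::nullopt when there are no more ranges.
template <typename Next>
std::vector<range> collect_ranges(Next next) {
    std::vector<range> ranges;
    ranges.reserve(max_ranges);  // single allocation for the whole batch
    std::optional<range> current = next();
    while (current) {
        ranges.push_back(std::move(*current));  // move, don't copy
        // `current` is still engaged here; it just holds a moved-from value.
        if (ranges.size() == max_ranges) {
            break;
        }
        current = next();
    }
    return ranges;
}
```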

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #11242
2022-08-09 09:08:53 +02:00
Botond Dénes
fbbe2529c1 Merge "Remove global snitch usage from consistency_level.cc" from Pavel Emelyanov
"
There are several helpers in this .cc file that need to get the datacenter
for endpoints. For that they use the global snitch, because there is no
other place to get that data from.

The whole dc/rack info is now moving to topology, so this set patches
consistency_level.cc to get the topology. This is done in two ways.
First, the helpers that have a keyspace at hand may get the topology via
the keyspace's effective_replication_map.

Second, the two difficult cases, db::is_local() and db.count_local_endpoints(),
have just an inet_address at hand. Those are patched to be methods of
topology itself, and all their callers already work with token metadata
and can get the topology from it.
"

* 'br-consistency-level-over-topology' of https://github.com/xemul/scylla:
  consistency_level: Remove is_local() and count_local_endpoints()
  storage_proxy: Use topology::local_endpoints_count()
  storage_proxy: Use proxy's topology for DC checks
  storage_proxy: Keep shared_ptr<proxy> on digest_read_resolver
  storage_proxy: Use topology local_dc_filter in its methods
  storage_proxy: Mark some digest_read_resolver methods private
  forwarding_service: Use topology local_dc_filter
  storage_service: Use topology local_dc_filter
  consistency_level: Use topology local_dc_filter
  consistency_level: Call count_local_endpoints from topology
  consistency_level: Get datacenter from topology
  replication_strategy: Remove hold snitch reference
  effective_replication_map: Get datacenter from topology
  topology: Add local-dc detection sugar
2022-08-05 13:31:55 +03:00
Pavel Emelyanov
9a19414c62 forwarding_service: Use topology local_dc_filter
The service needs to filter out non-local endpoints. It carries a
token metadata pointer and can get the topology from it to achieve
this.
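Filtering endpoints down to the local datacenter amounts to a predicate over the topology. The sketch below uses a hypothetical `topology` struct holding an endpoint-to-DC map. Scylla's `locator::topology` and its `local_dc_filter` have their own, richer interface.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical topology: maps each endpoint to its datacenter.
struct topology {
    std::string local_dc;
    std::unordered_map<std::string, std::string> dc_of;

    bool is_local(const std::string& endpoint) const {
        auto it = dc_of.find(endpoint);
        return it != dc_of.end() && it->second == local_dc;
    }
};

// Keeps only endpoints in the local datacenter; unknown endpoints are dropped.
std::vector<std::string> local_endpoints(const topology& topo,
                                         const std::vector<std::string>& endpoints) {
    std::vector<std::string> out;
    std::copy_if(endpoints.begin(), endpoints.end(), std::back_inserter(out),
                 [&topo](const std::string& ep) { return topo.is_local(ep); });
    return out;
}
```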

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-08-05 12:19:47 +03:00
Piotr Sarna
dd2417618e forward_service: limit the number of partition ranges fetched
The forward service uses a vector of ranges owned by a particular
shard in order to split and delegate the work. The number can
grow large though, which can cause large allocations.
This commit limits the number of ranges handled at a time to 256.
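Limiting how many ranges are handled at once is a batching loop. This minimal sketch assumes ranges are plain integers produced lazily, and `for_each_batch` is a hypothetical helper. Only the limit of 256 comes from the message.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <vector>

constexpr std::size_t max_ranges_per_batch = 256;

// Pulls ranges from `next` (std::nullopt when exhausted) and hands them to
// `process` in batches of at most max_ranges_per_batch, so no allocation
// ever has to hold all of the shard's ranges at once. Returns the batch count.
std::size_t for_each_batch(const std::function<std::optional<int>()>& next,
                           const std::function<void(const std::vector<int>&)>& process) {
    std::vector<int> batch;
    batch.reserve(max_ranges_per_batch);
    std::size_t batches = 0;
    while (std::optional<int> r = next()) {
        batch.push_back(*r);
        if (batch.size() == max_ranges_per_batch) {
            process(batch);
            batch.clear();  // keeps the capacity for the next batch
            ++batches;
        }
    }
    if (!batch.empty()) {
        process(batch);
        ++batches;
    }
    return batches;
}
```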

Fixes #10725

Closes #11182
2022-08-01 17:36:34 +03:00
Piotr Sarna
abc5a7b7ec forward_service: remove redundant optional from forward_service
This commit refactors the code to get rid of unnecessary
std::optional usage in forward_result, since now it's possible
to merge empty results with each other, both ways (#11064).
2022-07-26 12:02:55 +02:00
Piotr Sarna
626fb75949 forward_service: open-code running a Seastar thread
Previous interface forced the caller to allocate forward_aggregates
in order to be able to conditionally run the merging code inside
a Seastar thread, which is suboptimal. By open-coding the condition,
it's possible to drop the do_with, saving an allocation.
2022-07-26 08:10:47 +02:00
Piotr Sarna
e8f2565371 forward_service: add requires_thread helper
It will be needed later to be able to decide if seastar thread
is needed for merging forward service results.
2022-07-26 08:10:47 +02:00
Piotr Sarna
c195ce1b82 query: allow merging non-empty forward_result with an empty one
Merging empty results was already allowed, but in one way only:

empty.merge(nonempty, r); // was permitted
nonempty.merge(empty, r); // not permitted

With this commit, both methods are permitted.
In order to remove copying, the other result is now taken
by rvalue reference, with all call sites being updated
accordingly.
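A sketch of the symmetric merge follows. It assumes a hypothetical `forward_result` holding one partial value per aggregate, empty before anything was merged, and a summing reducer. The real `query::forward_result` and its reduction metadata look different.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical simplified result: one partial value per aggregate.
struct forward_result {
    std::vector<long> query_results;  // empty == no partial results yet

    // Merges `other` into *this. Either side may be empty; `other` is taken
    // by rvalue reference so its storage can be stolen instead of copied.
    void merge(forward_result&& other) {
        if (other.query_results.empty()) {
            return;  // nonempty.merge(empty)
        }
        if (query_results.empty()) {
            query_results = std::move(other.query_results);  // empty.merge(nonempty)
            return;
        }
        assert(query_results.size() == other.query_results.size());
        for (std::size_t i = 0; i < query_results.size(); ++i) {
            query_results[i] += other.query_results[i];  // assumed reducer: sum
        }
    }
};
```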

Fixes #10446
Fixes #10174

Closes #11064
2022-07-25 18:06:28 +03:00
Jadw1
29a0be75da forward_service: support UDA and native aggregate parallelization
Enables parallelization of UDA and native aggregates. The way the
query is parallelized is the same as in #9209. A separate reduction
type for `COUNT(*)` is kept for compatibility reasons.
2022-07-18 15:25:41 +02:00
Pavel Emelyanov
282a1880a5 forward service: Re-use proxy's helper with duplicated code
The get_live_endpoints matches the same method on the proxy side. Since
the forward service carries proxy reference, it can use its method
(which needs to be made public for that sake).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-05-03 10:34:51 +03:00
Avi Kivity
582802825a treewide: use system-#include (angle brackets) for seastar
Seastar is an external library from Scylla's point of view so
we should use the angle bracket #include style. Most of the source
follows this; this patch fixes a few stragglers.

Also fix cases of #include which reached out to seastar's directory
tree directly, via #include "seastar/include/sesatar/..." to
just refer to <seastar/...>.

Closes #10433
2022-04-26 14:46:42 +03:00
Avi Kivity
e55f5fab53 service: forward_service: avoid using deprecated std::bind1st and std::not1
Switch to the newer alternatives std::bind_front and std::not_fn.
2022-04-18 12:27:18 +03:00
Michał Sala
28970389bc forward_service: uncoroutinize dispatch method
Done to mitigate potential miscompilations.
2022-04-06 15:01:31 +02:00
Michał Sala
edc32a7118 forward_service: uncoroutinize retrying_dispatcher
Done to mitigate potential miscompilations.
2022-04-06 14:52:59 +02:00
Michał Sala
59ff51c824 forward_service: retry a failed forwarder call
Failed-to-forward sub-queries will be executed locally (on a
super-coordinator). This local execution is meant as a fallback for
forward_requests that could not be sent to their destined coordinator
(e.g. due to the gossiper not reacting fast enough). Local execution was
chosen as the safest option: it does not require sending data to another
coordinator.
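The fallback policy can be sketched as follows. `dispatch_with_fallback`, `send_remote` and `execute_locally` are hypothetical names, and a thrown exception stands in for a failed forward.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

// Tries to forward a sub-query to its coordinator; if that fails, executes
// it on this node (the super-coordinator) instead, which requires no further
// data transfer to another coordinator.
template <typename Result>
Result dispatch_with_fallback(const std::function<Result()>& send_remote,
                              const std::function<Result()>& execute_locally) {
    try {
        return send_remote();
    } catch (const std::exception&) {
        return execute_locally();
    }
}
```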
2022-04-06 14:44:55 +02:00
Michał Sala
e170961b4d forward_service: copy arguments/captured vars to local variables
Copying captured variables into local variables (that live in a
coroutine's frame) is a mitigation of suspected lifetime issues.
Arguments of forward_service::dispatch are also copied (to prevent
potential undefined behavior or miscompilation triggered by
referencing the arguments in a capture list of a lambda that produces a
coroutine).
2022-04-04 16:58:08 +02:00
Michał Sala
c8413631af forward_service: change implicit lambda capture list to explicit one
Changing the capture list of a lambda in
forward_service::execute_on_this_shard from [&] to an explicit one
improves readability and prevents potential bugs.

Closes #10191
2022-03-10 17:30:06 +02:00
Michał Sala
e6e9553b4a forward_service: add metrics
Introduces metrics for `forward_service`. Three counters were added, which
allow checking how many requests have been dispatched or executed.
2022-02-01 21:14:41 +01:00
Michał Sala
354f7a1c34 forward_service: parallelize execution across shards
Coordinators processed each vnode sequentially on shards when executing
a `forward_request` sent by super-coordinator. This commit changes this
behavior and parallelizes execution of `forward_request` across shards.

It does that by adding an additional layer of dispatching to
`forward_service`. When a coordinator receives a `forward_request`, it
forwards it to each of its shards. Shards slice `forward_request`'s
partition ranges so that they will only query data that is owned by
them. Implementation of slicing partition ranges was based on @nyh's
`token_ranges_owned_by_this_shard` from `alternator/ttl.cc`.
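Slicing a range down to the part a shard owns is an interval intersection. The sketch below assumes a hypothetical sharder that splits a small token space into equal contiguous segments per shard. Scylla's real sharder distributes tokens differently, and `owned_segment` and `slice_for_shard` are illustrative names only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>

using token_range = std::pair<std::int64_t, std::int64_t>;  // half-open [first, second)

// Hypothetical sharder: tokens in [0, token_space) split into `shards` equal
// contiguous segments (assumes token_space is divisible by shards).
token_range owned_segment(unsigned shard, unsigned shards, std::int64_t token_space) {
    std::int64_t width = token_space / shards;
    return {width * shard, width * (shard + 1)};
}

// Part of `r` owned by `shard`, or std::nullopt if the shard owns none of it.
std::optional<token_range> slice_for_shard(token_range r, unsigned shard,
                                           unsigned shards, std::int64_t token_space) {
    auto seg = owned_segment(shard, shards, token_space);
    std::int64_t lo = std::max(r.first, seg.first);
    std::int64_t hi = std::min(r.second, seg.second);
    if (lo >= hi) {
        return std::nullopt;
    }
    return token_range{lo, hi};
}
```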
2022-02-01 21:14:41 +01:00
Michał Sala
aec96be553 forward_service: add tracing
2022-02-01 21:14:41 +01:00