With a large tablet count, e.g. 128k, forward_service::dispatch() can
potentially stall when grouping ranges per endpoint.
Reactor stalled for 4 ms on shard 1. Backtrace: 0x5eb15ea 0x5eb09f5 0x5eb1daf 0x3dbaf 0x2d01e57 0x33f7d1e 0x348255f 0x2d005d4 0x2d3d017 0x2d3d58c 0x2d3d225 0x5e59622 0x5ec328f 0x5ec4577 0x5ee84e0 0x5e8394a 0x8c946 0x11296f
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Each partition_range_vector might grow to ~9600 elements, assuming
96-shard nodes, each with 100 tablets.
~9600 elements of 120 bytes each (sizeof(partition_range)) is about
1.15 MB of payload, but the vector's growth factor of 2 can overshoot
that to a capacity of ~2 MB.
We're copying each range 3x in dispatch(), and we can easily avoid
it.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Currently the bit is set in the .shutdown() method, which is called early on
stop. After the patch the bit is set in the abort-source subscription
callback, which is also called early on stop.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Change one more layer of processing to work on prepared
rather than raw selectors. This moves the call to prepare
the selectors early in select_statement processing. In turn
this changes maybe_jsonize_select_clause() and forward_service's
mock_selection() to work in the prepared realm as well.
This moves us one step closer to using evaluate() to process
the select clause, as the prepared selectors are now available
in select_statement. We can't use them yet since we can't evaluate
aggregations.
There was a bug that caused aggregates to fail when used on case-sensitive columns.
For example:
```cql
SELECT SUM("SomeColumn") FROM ks.table;
```
would fail, with a message saying that there is no column "somecolumn".
This is because the case-sensitivity got lost on the way.
For non-case-sensitive column names we convert them to lowercase, but for case-sensitive names we have to preserve the name as originally written.
The problem was in `forward_service` - we took a column name and created a non-case-sensitive `column_identifier` out of it.
This converted the name to lowercase, and later such a column couldn't be found.
To fix it, let's make the `column_identifier` case-sensitive.
It will preserve the name, without converting it to lowercase.
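As a self-contained illustration of the pattern (a hypothetical helper with a named boolean, not the actual `column_identifier` API):
```cpp
#include <algorithm>
#include <cctype>
#include <string>

constexpr bool keep_case = true;   // case-sensitive: keep the name as written

std::string make_identifier(std::string name, bool case_sensitive) {
    if (!case_sensitive) {
        // Non-case-sensitive identifiers are folded to lowercase, which is
        // exactly what lost "SomeColumn" before the fix.
        std::transform(name.begin(), name.end(), name.begin(),
                       [] (unsigned char c) { return std::tolower(c); });
    }
    return name;
}

// make_identifier("SomeColumn", keep_case) -> "SomeColumn"
// make_identifier("SomeColumn", false)     -> "somecolumn"
```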
Fixes: https://github.com/scylladb/scylladb/issues/14307
Closes #14340
* github.com:scylladb/scylladb:
service/forward_service.cc: make case-sensitivity explicit
cql-pytest/test_aggregate: test case-sensitive column name in aggregate
forward_service: fix forgetting case-sensitivity in aggregates
Make it explicit that the boolean argument determines case-sensitivity. It emphasizes its importance.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
There was a bug that caused aggregates to fail when
used on case-sensitive columns.
For example:
```
SELECT SUM("SomeColumn") FROM ks.table;
```
would fail, with a message saying that there
is no column "somecolumn".
This is because the case-sensitivity got lost on the way.
For non-case-sensitive column names we convert them to lowercase,
but for case-sensitive names we have to preserve the name
as originally written.
The problem was in `forward_service` - we took a column name
and created a non-case-sensitive `column_identifier` out of it.
This converted the name to lowercase, and later such a column
couldn't be found.
To fix it, let's make the `column_identifier` case-sensitive.
It will preserve the name, without converting it to lowercase.
Fixes: https://github.com/scylladb/scylladb/issues/14307
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
schema::get_sharder() does not return the correct sharder for tablet-based tables.
Code which is supposed to work with all kinds of tables should use erm::get_sharder().
Call prepare_expression() on selector expressions to resolve types. This
leaves us with just one way to move from the unprepared domain to the
prepared domain.
The change is somewhat awkward since do_prepare_selectable() is re-doing
work that is done by prepare_expression(), but somehow it all works. The
next patch will tear down the unnecessary double-preparation.
This commit introduces a new boolean flag, `shutdown`, to the
forward_service, along with a corresponding shutdown method. It also
adds checks throughout the forward_service to verify the value of the
shutdown flag before retrying or invoking functions that might use the
messaging service under the hood.
The flag is set before messaging service shutdown, by invoking
forward_service::shutdown in main. By checking the flag before each call
that potentially involves the messaging service, we can ensure that the
messaging service is still operational. If the flag is false, indicating
that the messaging service is still active, we can proceed with the
call. In the event that the messaging service is shut down during the
call, appropriate exceptions should be thrown somewhere down in the called
functions, avoiding potential hangs.
This fix should resolve the issue where forward_service retries could
block the shutdown.
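A minimal sketch of the guard pattern, with hypothetical names rather than the real forward_service code:
```cpp
#include <seastar/core/coroutine.hh>
#include <seastar/core/future.hh>
#include <seastar/core/sleep.hh>
#include <chrono>

class forwarder {
    bool _shutdown = false;
public:
    void shutdown() { _shutdown = true; }   // invoked in main before messaging shuts down

    seastar::future<> send_with_retry() {
        while (!_shutdown) {
            try {
                co_await do_send();         // may throw once messaging shuts down mid-call
                co_return;
            } catch (...) {
                // co_await is not allowed inside a catch handler,
                // so the retry back-off happens below
            }
            if (_shutdown) {
                co_return;                  // give up instead of retrying forever
            }
            co_await seastar::sleep(std::chrono::milliseconds(100));
        }
    }
private:
    seastar::future<> do_send() {           // placeholder for the RPC to another coordinator
        return seastar::make_ready_future<>();
    }
};
```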
Fixes #12604
Closes #13922
The expression system uses managed_bytes_opt for values, but result_set
uses bytes_opt. This means that processing values from the result set
in expressions requires a copy.
Out of the two, managed_bytes_opt is the better choice, since it prevents
large contiguous allocations for large blobs. So we switch result_set
to use managed_bytes_opt. Users of the result_set API are adjusted.
The db::function interface is not modified to limit churn; instead we
convert the types on entry and exit. This will be adjusted in a following
patch.
Currently, scans are splitting partition ranges around tokens. This
will have to change with tablets, where we should split at tablet
boundaries.
This patch introduces token_range_splitter which abstracts this
task. It is provided by effective_replication_map implementation.
It will be used by tablet-based replication strategies, for which the
effective replication map is different per table.
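A hypothetical sketch of what such an abstraction can look like (illustrative names and types, not the actual interface):
```cpp
#include <cstdint>
#include <memory>
#include <optional>

using token = int64_t;   // stand-in for dht::token

class token_range_splitter {
public:
    virtual ~token_range_splitter() = default;
    // Position the splitter at the beginning of the range being scanned.
    virtual void reset(token range_start) = 0;
    // The next split point after the current position, or nullopt if the
    // remainder of the range needs no further splitting.
    virtual std::optional<token> next_boundary() = 0;
};

// A vnode-based strategy would return shard/ring boundaries here, while a
// tablet-based strategy would return the scanned table's tablet boundaries.
std::unique_ptr<token_range_splitter> make_splitter_for_table(/* const erm& */);
```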
Also, this patch adapts existing users of effective replication map to
use the per-table effective replication map.
For simplicity, every table has an effective replication map, even if
the erm is per keyspace. This way the client code can be uniform and
doesn't have to check whether replication strategy is per table.
Not all users of per-keyspace get_effective_replication_map() are
adapted yet to work per-table. Those algorithms will throw an
exception when invoked on a keyspace which uses per-table replication
strategy.
Now that stateless_aggregate_function is directly exposed by
aggregate_function, we can use it directly, avoiding the intermediary
aggregate_function::aggregate, which is removed.
Schema related files are moved there. This excludes schema files that
also interact with mutations, because the mutation module depends on
the schema. Those files will have to go into a separate module.
Closes #12858
This patch fixes#12475, where an aggregation (e.g., COUNT(*), MIN(v))
of absolutely no partitions (e.g., "WHERE p = null" or "WHERE p in ()")
resulted in an internal error instead of the "zero" result that each
aggregator expects (e.g., 0 for COUNT, null for MIN).
The problem is that normally our aggregator forwarder picks the nodes
which hold the relevant partition(s), forwards the request to each of
them, and then combines these results. When there are no partitions,
the query is sent to no node, and we end up with an empty result set
instead of the "zero" results. So in this patch we recognize this
case and build those "zero" results (as mentioned above, these aren't
always 0 and depend on the aggregation function!).
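For illustration, with a hypothetical table ks.t (partition key p, int column v), the expected post-fix behavior is:
```cql
SELECT COUNT(*) FROM ks.t WHERE p IN ();   -- one row, count = 0
SELECT MIN(v)   FROM ks.t WHERE p = null;  -- one row, min = null
```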
The patch also adds two tests reproducing this issue in a fairly general
way (e.g., several aggregators, different aggregation functions) and
confirming the patch fixes the bug.
The patch also includes two additional tests for COUNT aggregation, which
uncovered an incompatibility with Cassandra that is still not fixed -
so these tests are marked "xfail":
Refs #12477: Combining COUNT with GROUP BY yields empty results
in Cassandra, but one row with an empty count in Scylla.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes #12715
Currently, we create `forward_aggregates` inside a function that
returns the result of a future lambda that captures these aggregates
by reference. As a result, the aggregates may be destructed before
the lambda finishes, resulting in a heap use-after-free.
To prolong the lifetime of these aggregates, we cannot use a move
capture, because the lambda is wrapped in a with_thread_if_needed()
call on these aggregates. Instead, we fix this by wrapping the
entire return statement in a do_with().
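A minimal sketch of the lifetime fix, using hypothetical names:
```cpp
#include <seastar/core/do_with.hh>
#include <seastar/core/future.hh>
#include <vector>

struct aggregate_state {};                 // stand-in for one per-column aggregate

seastar::future<> merge_in_thread(std::vector<aggregate_state>& aggrs) {
    // placeholder for the (possibly thread-wrapped) merging logic
    return seastar::make_ready_future<>();
}

seastar::future<> finish(std::vector<aggregate_state> aggrs) {
    return seastar::do_with(std::move(aggrs), [] (std::vector<aggregate_state>& aggrs) {
        // do_with() keeps `aggrs` alive until the future returned by the
        // lambda resolves, unlike a plain by-reference capture.
        return merge_in_thread(aggrs);
    });
}
```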
Fixes #12528
Closes #12533
The `forward_request` verb carried information about timeouts using
`lowres_clock::time_point` (that came from the local steady clock,
`seastar::lowres_clock`). The time point was produced on one node and
later compared against another node's `lowres_clock`. That behavior
was wrong (`lowres_clock::time_point`s produced by different
`lowres_clock`s cannot be compared) and could lead to delayed or
premature timeouts.
To fix this issue, `lowres_clock::time_point` was replaced with
`lowres_system_clock::time_point` in `forward_request` verb.
The representation to which both time point types serialize is the same
(a 64-bit integer denoting the count of elapsed nanoseconds), so it was
possible to do an in-place switch of those types using the logic suggested
by @avikivity:
- using steady_clock is just broken, so we aren't taking anything
from users by breaking it further
- once all nodes are upgraded, it magically starts to work
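A sketch of the idea, with illustrative helpers rather than the actual IDL code:
```cpp
#include <seastar/core/lowres_clock.hh>
#include <chrono>
#include <cstdint>

// Serialize a deadline as a count of nanoseconds since the epoch...
int64_t serialize_deadline(seastar::lowres_system_clock::time_point deadline) {
    return std::chrono::duration_cast<std::chrono::nanoseconds>(
            deadline.time_since_epoch()).count();
}

// ...and reconstruct it on the receiving node against the same (wall-clock)
// epoch, which keeps its meaning across nodes, unlike a steady-clock reading.
seastar::lowres_system_clock::time_point deserialize_deadline(int64_t ns) {
    return seastar::lowres_system_clock::time_point(
            std::chrono::duration_cast<seastar::lowres_system_clock::duration>(
                    std::chrono::nanoseconds(ns)));
}
```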
Closes #12529
Now that we don't accept cql protocol version 1 or 2, we can
drop cql_serialization_format everywhere, except in the IDL
(since it's part of the inter-node protocol).
A few functions had duplicate versions, one with and one without
a cql_serialization_format parameter. They are deduplicated.
Care is taken that `partition_slice`, which communicates
the cql_serialization_format across nodes, still presents
a valid cql_serialization_format to other nodes when
transmitting itself and rejects protocol 1 and 2 serialization
format when receiving. The IDL is unchanged.
One test checking the 16-bit serialization format is removed.
Avoid about log2(256) = 8 reallocations when pushing partition ranges to
be fetched. Additionally, avoid copying each range into the ranges
container. After being moved from, current_range will no longer contain
the last range, but it will still be engaged by the end of the loop,
allowing the next iteration to happen as expected.
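A minimal sketch of the two micro-optimizations, using hypothetical names:
```cpp
#include <optional>
#include <utility>
#include <vector>

struct partition_range { /* ~120 bytes in the real code */ };

void collect_ranges(std::vector<partition_range>& ranges, size_t expected_count) {
    ranges.reserve(expected_count);                   // one allocation instead of ~log2(N) doublings
    std::optional<partition_range> current_range;
    for (size_t i = 0; i < expected_count; ++i) {
        current_range.emplace();
        // ... extend or finalize current_range ...
        ranges.push_back(std::move(*current_range));  // move instead of copy;
        // the optional stays engaged (holding a moved-from value), so the
        // next iteration can reuse it as before.
    }
}
```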
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes #11242
"
There are several helpers in this .cc file that need to get the datacenter
for endpoints. For that they use the global snitch, because there's no other
place out there to get that data from.
The whole dc/rack info is now moving to topology, so this set of patches
changes consistency_level.cc to get the topology. This is done in two ways.
First, the helpers that have a keyspace at hand may get the topology via the
keyspace's effective_replication_map.
Two difficult cases are db::is_local() and db::count_local_endpoints(),
because both have just an inet_address at hand. Those are patched to be
methods of topology itself, and all their callers already mess with
token metadata and can get the topology from it.
"
* 'br-consistency-level-over-topology' of https://github.com/xemul/scylla:
consistency_level: Remove is_local() and count_local_endpoints()
storage_proxy: Use topology::local_endpoints_count()
storage_proxy: Use proxy's topology for DC checks
storage_proxy: Keep shared_ptr<proxy> on digest_read_resolver
storage_proxy: Use topology local_dc_filter in its methods
storage_proxy: Mark some digest_read_resolver methods private
forwarding_service: Use topology local_dc_filter
storage_service: Use topology local_dc_filter
consistency_level: Use topology local_dc_filter
consistency-level: Call count_local_endpoints from topology
consistency_level: Get datacenter from topology
replication_strategy: Remove hold snitch reference
effective_replication_map: Get datacenter from topology
topology: Add local-dc detection sugar
The service needs to filter out non-local endpoints. It carries a
token metadata pointer and can get the topology from it to fulfill
this goal.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The forward service uses a vector of ranges owned by a particular
shard in order to split and delegate the work. The number of ranges can
grow large, though, which can cause large allocations.
This commit limits the number of ranges handled at a time to 256.
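A sketch of the batching idea, with hypothetical names (only the 256 limit is from this commit):
```cpp
#include <seastar/core/coroutine.hh>
#include <seastar/core/future.hh>
#include <algorithm>
#include <utility>
#include <vector>

struct partition_range {};
constexpr size_t max_ranges_per_batch = 256;

seastar::future<> process_batch(std::vector<partition_range> batch) {
    return seastar::make_ready_future<>();      // placeholder for the real work
}

seastar::future<> process_all(std::vector<partition_range> all_ranges) {
    std::vector<partition_range> batch;
    batch.reserve(std::min(all_ranges.size(), max_ranges_per_batch));
    for (auto& r : all_ranges) {
        batch.push_back(std::move(r));
        if (batch.size() == max_ranges_per_batch) {
            // Handle a fixed-size batch, so no single allocation grows with
            // the total number of ranges.
            co_await process_batch(std::exchange(batch, {}));
        }
    }
    if (!batch.empty()) {
        co_await process_batch(std::move(batch));
    }
}
```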
Fixes #10725
Closes #11182
This commit refactors the code to get rid of unnecessary
std::optional usage in forward_result, since now it's possible
to merge empty results with each other, both ways (#11064).
The previous interface forced the caller to allocate forward_aggregates
in order to be able to conditionally run the merging code inside
a Seastar thread, which is suboptimal. By open-coding the condition,
it's possible to drop the do_with, saving an allocation.
Merging empty results was already allowed, but in one way only:
empty.merge(nonempty, r); // was permitted
nonempty.merge(empty, r); // not permitted
With this commit, both directions are permitted.
In order to remove copying, the other result is now taken
by rvalue reference, with all call sites being updated
accordingly.
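A sketch of the resulting shape, with an illustrative payload rather than the real forward_result:
```cpp
#include <utility>
#include <vector>

struct forward_result {
    std::vector<long> query_results;   // stand-in for the real payload

    void merge(forward_result&& other /*, const reduction_types& r */) {
        if (query_results.empty()) {
            query_results = std::move(other.query_results);  // empty.merge(nonempty, r)
            return;
        }
        if (other.query_results.empty()) {
            return;                                          // nonempty.merge(empty, r)
        }
        // ... otherwise combine element-wise with the reduction functions ...
    }
};
```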
Fixes #10446
Fixes #10174
Closes #11064
Enables parallelization of UDA and native aggregates. The way the
query is parallelized is the same as in #9209. Separate reduction
type for `COUNT(*)` is left for compatibility reasons.
The get_live_endpoints matches the same method on the proxy side. Since
the forward service carries a proxy reference, it can use the proxy's method
(which needs to be made public for that sake).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Seastar is an external library from Scylla's point of view, so
we should use the angle-bracket #include style. Most of the source
follows this; this patch fixes a few stragglers.
Also fix cases of #include that reached into seastar's directory
tree directly, via #include "seastar/include/seastar/...", to
just refer to <seastar/...>.
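For example (illustrative header path):
```cpp
// Before: reaching into the source tree directly
//   #include "seastar/include/seastar/core/future.hh"
// After: external-library style, resolved via include paths
#include <seastar/core/future.hh>
```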
Closes #10433
Failed-to-forward sub-queries will be executed locally (on the
super-coordinator). This local execution is meant as a fallback for
forward_requests that could not be sent to their destined coordinator
(e.g. due to the gossiper not reacting fast enough). Local execution was
chosen as the safest option - it does not require sending data to another
coordinator.
Copying captured variables into local variables (that live in a
coroutine's frame) is a mitigation of suspected lifetime issues.
Arguments of forward_service::dispatch are also copied (to prevent
potential undefined behavior or mis-compilation triggered by
referencing the arguments in a capture list of a lambda that produces a
coroutine).
Changing the capture list of a lambda in
forward_service::execute_on_this_shard from [&] to an explicit one
improves readability and prevents potential bugs.
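A minimal sketch of the pitfall being mitigated, with hypothetical names:
```cpp
#include <seastar/core/coroutine.hh>
#include <seastar/core/future.hh>
#include <seastar/core/sleep.hh>
#include <chrono>
#include <string>

seastar::future<size_t> run(std::string name) {
    return [name] () -> seastar::future<size_t> {
        // A coroutine lambda's captures live in the lambda object, not in the
        // coroutine frame, so copy what the body needs into a frame-local
        // variable before the first suspension.
        std::string local_name = name;
        co_await seastar::sleep(std::chrono::milliseconds(1));
        // Reading `name` here would touch the possibly already-destroyed
        // lambda object; `local_name` is safe because it lives in the frame.
        co_return local_name.size();
    }();
}
```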
Closes #10191
Coordinators processed each vnode sequentially on shards when executing
a `forward_request` sent by the super-coordinator. This commit changes this
behavior and parallelizes execution of `forward_request` across shards.
It does that by adding an additional layer of dispatching to
`forward_service`. When a coordinator receives a `forward_request`, it
forwards it to each of its shards. Each shard slices the `forward_request`'s
partition ranges so that it only queries data that it owns. The
implementation of partition-range slicing was based on @nyh's
`token_ranges_owned_by_this_shard` from `alternator/ttl.cc`.
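A sketch of the extra dispatch layer, with hypothetical names and a trivial COUNT-style reduction:
```cpp
#include <seastar/core/coroutine.hh>
#include <seastar/core/future.hh>
#include <seastar/core/smp.hh>
#include <utility>
#include <vector>

struct forward_request { std::vector<int> pranges; };   // stand-in for the real request
struct forward_result  { long value = 0; };

forward_request slice_to_this_shard(forward_request req) {
    // placeholder: keep only the ranges whose tokens are owned by this shard
    return req;
}

seastar::future<forward_result> execute_on_this_shard(forward_request req) {
    return seastar::make_ready_future<forward_result>(forward_result{});
}

seastar::future<forward_result> dispatch_to_shards(forward_request req) {
    std::vector<seastar::future<forward_result>> partials;
    partials.reserve(seastar::smp::count);
    for (unsigned shard = 0; shard < seastar::smp::count; ++shard) {
        // Launch on every shard up front so the per-shard work runs in parallel.
        partials.push_back(seastar::smp::submit_to(shard, [req] {
            return execute_on_this_shard(slice_to_this_shard(req));
        }));
    }
    forward_result merged;
    for (auto& f : partials) {
        auto partial = co_await std::move(f);
        merged.value += partial.value;          // e.g. COUNT-style reduction
    }
    co_return merged;
}
```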