scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-02 06:05:53 +00:00

Author	SHA1	Message	Date
Kefu Chai	0cb842797a	treewide: do not define/capture unused variables these warnings are found by Clang-17 after removing `-Wno-unused-lambda-capture` and '-Wno-unused-variable' from the list of disabled warnings in `configure.py`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-15 22:57:18 +02:00
Avi Kivity	c5e4bf51bd	Introduce mutation/ module Move mutation-related files to a new mutation/ directory. The names are kept in the global namespace to reduce churn; the names are unambiguous in any case. mutation_reader remains in the readers/ module. mutation_partition_v2.cc was missing from CMakeLists.txt; it's added in this patch. This is a step forward towards librarization or modularization of the source base. Closes #12788	2023-02-14 11:19:03 +02:00
Pavel Emelyanov	d021aaf34d	system_keyspace: De-static calls that update view-building tables There's a bunch of them, used mainly by view_builder and also by the API and storage_service. All use the global qctx to do their job; now that the callers have main-local sharded<system_keyspace> references, they can be made non-static. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-03 21:56:54 +03:00
Pavel Emelyanov	bbbeba103b	view_builder: Add system_keyspace dependency The view builder updates system.scylla_views_builds_in_progress and .built_views tables and thus needs the system keyspace instance. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-03 18:55:58 +03:00
Botond Dénes	84a69b6adb	db/view/view_update_check: check_needs_view_update_path(): filter out non-member hosts We currently don't clean up the system_distributed.view_build_status table after removed nodes. This can cause a false-positive check for whether view update generation is needed for streaming. The proper fix is to clean up this table, but that will be more involved, and even when done, it might not be immediate. So until then and to be on the safe side, filter out entries belonging to unknown hosts from said table. Fixes: #11905 Refs: #11836 Closes #11860	2023-01-27 17:12:45 +03:00
Avi Kivity	eced91b575	Revert "view: coroutinize maybe_mark_view_as_built" This reverts commit `ac2e2f8883`. It causes a regression ("std::bad_variant_access in load_view_build_progress"). Commit `2978052113` (a reindent) is also reverted as part of the process. Fixes #12395	2022-12-28 15:36:05 +02:00
Nadav Har'El	92d03be37b	materialized view: fix bug in some large modifications to base partitions Sometimes a single modification to a base partition requires updates to a large number of view rows. A common example is deletion of a base partition containing many rows. A large BATCH is also possible. To avoid large allocations, we split the large amount of work into batches of 100 (max_rows_for_view_updates) rows each. The existing code assumed an empty result from one of these batches meant that we are done. But this assumption was incorrect: There are several cases when a base-table update may not need a view update to be generated (see can_skip_view_updates()) so if all 100 rows in a batch were skipped, the view update stopped prematurely. This patch includes two tests showing when this bug can happen - one test using a partition deletion with a USING TIMESTAMP causing the deletion to not affect the first 100 rows, and a second test using a specially-crafted large BATCH. These use cases are fairly esoteric, but in fact hit a user in the wild, which led to the discovery of this bug. The fix is fairly simple: To detect when build_some() is done it is no longer enough to check if it returned zero view-update rows; Rather, it explicitly returns whether or not it is done as an std::optional. The patch includes several tests for this bug, which pass on Cassandra, failed on Scylla before this patch, and pass with this patch. Fixes #12297. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12305	2022-12-14 14:50:38 +02:00
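The termination bug described in 92d03be37b can be illustrated with a minimal sketch. Everything here is hypothetical (`base_row`, `batch_builder`, `build_all`, `build_all_buggy`); only the name `build_some()`, the batch size of 100, and the idea of signalling completion through `std::optional` come from the commit message. The real function's signature and return payload are not shown in this log and are likely different.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for a base-table row; not Scylla's real type.
struct base_row {
    int key;
    bool needs_view_update;  // false models a row skipped by can_skip_view_updates()
};

constexpr std::size_t max_rows_for_view_updates = 100;

class batch_builder {
    const std::vector<base_row>& _rows;
    std::size_t _pos = 0;
public:
    explicit batch_builder(const std::vector<base_row>& rows) : _rows(rows) {}

    // Returns std::nullopt only when the input is exhausted. Otherwise returns
    // the number of view updates produced by this batch, which may be zero.
    std::optional<std::size_t> build_some() {
        if (_pos == _rows.size()) {
            return std::nullopt;
        }
        const std::size_t end = std::min(_pos + max_rows_for_view_updates, _rows.size());
        std::size_t generated = 0;
        for (; _pos < end; ++_pos) {
            if (_rows[_pos].needs_view_update) {
                ++generated;
            }
        }
        return generated;
    }
};

// Fixed loop: stops only on the explicit "done" signal (an empty optional).
std::size_t build_all(const std::vector<base_row>& rows) {
    batch_builder b(rows);
    std::size_t total = 0;
    while (auto n = b.build_some()) {  // an optional holding 0 still tests true
        total += *n;
    }
    return total;
}

// Pre-fix behaviour: treats an empty batch as the end of the work.
std::size_t build_all_buggy(const std::vector<base_row>& rows) {
    batch_builder b(rows);
    std::size_t total = 0;
    for (;;) {
        auto n = b.build_some();
        if (!n || *n == 0) {
            break;  // stops early if a whole batch was skipped
        }
        total += *n;
    }
    return total;
}
```

With 150 rows of which the first 100 are all skipped, the buggy loop stops after the first batch and produces nothing, while the fixed loop goes on to process the remaining 50 rows.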
Piotr Dulikowski	86dad30b66	db/view: fetch and process static rows when building indexes This commit modifies the view builder and its consumer so that static rows are always fetched and properly processed during view build. Currently, the view builder will always fetch both static and clustering rows, regardless of the type of indexes being built. For indexes on static columns this is wasteful and could be improved so that only the types of rows relevant to indexes being built are fetched - however, doing this sounds a bit complicated and I would rather start with something simpler which has a better chance of working.	2022-12-06 11:21:16 +01:00
Piotr Dulikowski	6ab41d76e6	replica/table: adjust the view read-before-write to return static rows when needed Adjusts the read-before-write query issued in `table::do_push_view_replica_updates` so that, when needed, requests static columns and makes sure that the static row is present.	2022-12-06 11:21:16 +01:00
Piotr Dulikowski	18be90b1e6	db/view: process static rows in view_update_builder::on_results The `view_update_builder::on_results()` function is changed to react to static rows when comparing read-before-write results with the base table mutation.	2022-12-06 11:21:16 +01:00
Piotr Dulikowski	2dd95d76f1	db/view: adjust existing view update generation path to use clustering_or_static_row The view update path is modified to use `clustering_or_static_row` instead of just `clustering_row`.	2022-12-06 11:21:16 +01:00
Piotr Dulikowski	b0a31bb7a7	column_computation: adjust to use clustering_or_static_row Adjusts the column_computation interface so that it is able to accept both clustering and static rows through the common db::view::clustering_or_static_row interface.	2022-12-06 11:21:16 +01:00
Piotr Dulikowski	986ab6034c	db/view: add clustering_or_static_row Adds a `clustering_or_static_row`, which is a common, immutable representation of either a static or clustering row. It will allow to handle view update generation based on static or clustering rows in a uniform way.	2022-12-06 11:21:16 +01:00
Piotr Dulikowski	27c81432cd	view_info: adjust view_column to accept column_kind The `view_info::view_column()` and `view_column` in view.cc allow to get a view's column definition which corresponds to given base table's column. They currently assume that the given column id corresponds to a regular column. In preparation for secondary indexes based on static columns, this commit adjusts those functions so that they accept other kinds of columns, including static columns.	2022-12-06 11:21:16 +01:00
Piotr Dulikowski	f7b7724eaf	db/view: base_dependent_view_info: split non-pk columns into regular and static Currently, `base_dependent_view_info::_base_non_pk_columns_in_view_pk` field keeps a list of non-primary-key columns from the base table which are a part of the view's primary key. Because the current code does not allow indexes on static columns yet, the columns kept in the aforementioned field are always assumed to be regular columns of the base table and are kept as `column_id`s which do not contain information about the column kind. This commit splits the `_base_non_pk_columns_in_view_pk` field into two, one for regular columns and the other for static columns, so that it is possible to keep both kinds of columns in `base_dependent_view_info` and the structure can be used for secondary indexes on static columns.	2022-12-06 11:21:16 +01:00
Avi Kivity	2978052113	view: reindent maybe_mark_view_as_built Several indentation levels were harmed during the preparation of this patch.	2022-12-01 22:09:21 +02:00
Avi Kivity	ac2e2f8883	view: coroutinize maybe_mark_view_as_built Somewhat simplifies complicated logic.	2022-12-01 22:04:51 +02:00
Benny Halevy	94f2e95a2f	view: get_view_natural_endpoint: get topology from erm Get the topology for the effective replication map rather than from the storage_proxy to ensure its synchronized with the natural endpoints. Since there's no preemption between the two calls currently there is no issue, so this is merely a clean up of the code and not supposed to fix anything. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 07:56:34 +02:00
Nadav Har'El	2f2f01b045	materialized views: fix view writes after base table schema change When we write to a materialized view, we need to know some information defined in the base table such as the columns in its schema. We have a "view_info" object that tracks each view and its base. This view_info object has a couple of mutable attributes which are used to lazily-calculate and cache the SELECT statement needed to read from the base table. If the base-table schema ever changes - and the code calls set_base_info() at that point - we need to forget this cached statement. If we don't (as before this patch), the SELECT will use the wrong schema and writes will no longer work. This patch also includes a reproducing test that failed before this patch, and passes afterwards. The test creates a base table with a view that has a non-trivial SELECT (it has a filter on one of the base-regular columns), makes a benign modification to the base table (just a silly addition of a comment), and then tries to write to the view - and before this patch it fails. Fixes #10026 Fixes #11542	2022-11-16 13:58:21 +02:00
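The cache-invalidation pattern described in 2f2f01b045 can be sketched as follows. This is an illustration under assumptions, not the real `view_info` class: `view_info_sketch`, `base_info`, and the string-based "statement" are hypothetical stand-ins, and only `set_base_info()` and the idea of a lazily computed, mutable cached SELECT come from the commit message.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for the information a view keeps about its base table.
struct base_info {
    std::string columns;
};

class view_info_sketch {
    base_info _base;
    // Lazily computed and cached, hence mutable (as the commit message describes).
    mutable std::optional<std::string> _select_statement;
public:
    explicit view_info_sketch(base_info b) : _base(std::move(b)) {}

    const std::string& select_statement() const {
        if (!_select_statement) {
            _select_statement = "SELECT " + _base.columns + " FROM base";
        }
        return *_select_statement;
    }

    // When the base schema changes, drop the cached statement so that the
    // next call rebuilds it against the new schema.
    void set_base_info(base_info b) {
        _base = std::move(b);
        _select_statement.reset();
    }
};
```

Without the `reset()` call, `select_statement()` would keep returning the statement built from the old base schema, which is the failure mode the commit describes.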
Nadav Har'El	8f4243b875	MV: fix handling of view update which reassign the same key value When a materialized view has a key (in Alternator, this can be two keys) which was a regular column in the base table, and a base update modifies that regular column, there are two distinct cases: 1. If the old and new key values are different, we need to delete the old view row, and create a new view row (with the different key). 2. If the old and new key values are the same, we just need to update the pre-existing row. It's important not to confuse the two cases: If we try to delete and create the same view row in the same timestamp, the result will be that the row will be deleted (a tombstone wins over data if they have the same timestamp) instead of updated. This is what we saw in issue #11801. We had a bug that was seen when an update set the view key column to the old value it already had: To compare the old and new key values we used the function compare_atomic_cell_for_merge(), but this compared not just the values but also, incorrectly, the metadata such as the timestamp. Because setting a column to the same value changes its timestamp, we wrongly concluded that these were different view keys and used the delete-and-create code for this case, resulting in the view row being deleted (as explained above). The simple fix is to compare just the key values - not looking at the metadata. See tests reproducing this bug and confirming its fix in the next patch. Fixes #11801 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-10-19 13:43:12 +03:00
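The two cases from 8f4243b875 can be sketched with a toy cell type. All names below (`cell`, `view_action`, `choose_action`, `choose_action_buggy`) are hypothetical; the sketch does not call `compare_atomic_cell_for_merge()` and only models the distinction the commit message draws between comparing values and comparing values plus metadata.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical cell: a value plus write metadata (only the timestamp is modelled).
struct cell {
    std::string value;
    std::int64_t timestamp;
};

enum class view_action { update_existing_row, delete_and_create };

// Pre-fix comparison: metadata takes part, so rewriting the same value
// with a newer timestamp looks like a key change.
view_action choose_action_buggy(const cell& old_key, const cell& new_key) {
    const bool same = old_key.value == new_key.value
                   && old_key.timestamp == new_key.timestamp;
    return same ? view_action::update_existing_row : view_action::delete_and_create;
}

// Post-fix comparison: only the key values decide which case applies.
view_action choose_action(const cell& old_key, const cell& new_key) {
    return old_key.value == new_key.value ? view_action::update_existing_row
                                          : view_action::delete_and_create;
}
```

Choosing `delete_and_create` for an unchanged key is harmful because the deletion and the creation then target the same view row with the same timestamp, and the tombstone wins.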
Nadav Har'El	e1f8cb6521	materialized views: inline used-once and confusing function, replace_entry() The replace_entry() function is nothing more than a convenience for calling delete_old_entry() and then create_entry(). But it is only used once in the code, and we can just open-code the two calls instead of the one. The reason I want to change it now is that the shortcut replace_entry() helped hide a bug (#11801) - replace_entry() works incorrectly if the old and new row have the same key, because if they do we get a deletion and creation of the same row with the same timestamp - and the deletion wins. Having the two calls not hidden by a convenience function makes this potential problem more apparent. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-10-19 13:25:34 +03:00
Botond Dénes	5621cdd7f9	db/view/view_builder: don't drop partition and range tombstones when resuming The view builder builds the views from a given base table in view_builder::batch_size batches of rows. After processing this many rows, it suspends so the view builder can switch to building views for other base tables in the name of fairness. When resuming the build step for a given base table, it reuses the reader used previously (also serving the role of a snapshot, pinning sstables read from). The compactor however is created anew. As the reader can be in the middle of a partition, the view builder injects a partition start into the compactor to prime it for continuing the partition. This however only included the partition-key, crucially missing any active tombstones: partition tombstone or -- since the v2 transition -- active range tombstone. This can result in base rows covered by either of this to be resurrected and the view builder to generate view updates for them. This patch solves this by using the detach-state mechanism of the compactor which was explicitly developed for situations like this (in the range scan code) -- resuming a read with the readers kept but the compactor recreated. Also included are two test cases reproducing the problem, one with a range tombstone, the other with a partition tombstone. Fixes: #11668 Closes #11671	2022-10-03 11:28:22 +03:00
Botond Dénes	681e6ae77f	db/view: view_builder::execute(): only inject partition-start if needed When resuming a build-step, the view builder injects the partition-start fragment of the last processed partition, to bring the consumer (compactor) into the correct state before it starts to consume the remainder of the partition content. This results in an invalid fragment stream when the partition was actually over or there is nothing left for the build step. Make the inject conditional on when the reader contains more data for the partition. Fixes: #11607	2022-09-22 13:54:36 +03:00
Benny Halevy	6fb4b5555d	db: view: get_tombstone_gc_state from compaction_manager Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-06 23:05:39 +03:00
Benny Halevy	71ede6124a	db: view: pass base table to view_update_builder To be used by generate_update() for getting the tombstone_gc_state via the table's compaction_manager. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-06 23:04:23 +03:00
Benny Halevy	5dd15aa3c8	tombstone_gc: introduce tombstone_gc_state and use it to access the repair history maps. At this introductory patch, we use default-constructed tombstone_gc_state to access the thread-local maps temporarily and those use sites will be replaced in following patches that will gradually pass the tombstone_gc_state down from the compaction_manager to where it's used. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-06 23:02:54 +03:00
Nadav Har'El	5d556115a1	cql, view: rename and explain bytes_with_action The structure "bytes_with_action" was very hard to understand because of its mysterious and general-sounding name, and no comments. In this patch I add a large comment explaining its purpose, and rename it to a more suitable name, view_key_and_action, which suggests that each such object is about one view key (where to add a view row), and an additional "action" that we need to take beyond adding the view row. This is the best I can do to make this code easier to understand without completely reorganizing it. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-08-14 10:29:52 +03:00
Michał Radwański	32289d681f	db/view/view.cc: compute view_updates for views over collections For collection indexes, logic of computing values for each of the column needed to change, since a single particular column might produce more than one value as a result. The liveness info from individual cells of the collection impacts the liveness info of resulting rows. Therefore it is needed to rewrite the control flow - instead of functions getting a row from get_view_row and later computing row markers and applying it, they compute these values by themselves. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-08-14 10:29:49 +03:00
Michał Radwański	112086767c	view info: has_computed_column_depending_on_base_non_primary_key In case of secondary indexes, if an index does not contain any column from the base which makes up for the primary key, then it is assumed that during update, a change to some cells from the base table cannot cause that we're dealing with a different row in the view. This however doesn't take into account the possibility of computed columns which in fact do depend on some non-primary-key columns. Introduce additional property of an index, has_computed_column_depending_on_base_non_primary_key.	2022-08-14 10:29:14 +03:00
Michał Radwański	2babee2cdc	column_computation.hh, schema.cc: compute_value interface refactor The compute_value function of column_computation previously had the following signature: virtual bytes_opt compute_value(const schema& schema, const partition_key& key, const clustering_row& row) const override; This is superfluous, since never in the history of Scylla, the last parameter (row) was used in any implementation, and never did it happen that it returned bytes_opt. The absurdity of this interface can be seen especially when looking at call sites like following, where dummy empty row was created: ``` token_column.get_computation().compute_value( *_schema, pkv_linearized, clustering_row(clustering_key_prefix::make_empty())); ```	2022-08-14 10:29:13 +03:00
Benny Halevy	d295d8e280	everywhere: define locator::host_id as a strong tagged_uuid type So it can be distinguished from other uuid-based identifiers in the system. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #11276	2022-08-12 06:01:44 +03:00
Botond Dénes	7730419f5c	query-result-writer: stop when tombstone-limit is reached The query result writer now counts tombstones and cuts the page (marking it as a short one) when the tombstone limit is reached. This is to avoid timing out on large span of tombstones, especially prefixes. In the case of unpaged queries, we fail the read instead, similarly to how we do with max result size. If the limit is 0, the previous behaviour is used: tombstones are not taken into consideration at all.	2022-08-10 06:03:38 +03:00
Botond Dénes	d1d53f1b84	query: add tombstone-limit to read-command Propagate the tombstone-limit from coordinator to replicas, to make sure all is using the same limit.	2022-08-10 06:01:47 +03:00
Benny Halevy	257d74bb34	schema, everywhere: define and use table_id as a strong type Define table_id as a distinct utils::tagged_uuid modeled after raft tagged_id, so it can be differentiated from other uuid-class types, in particular from table_schema_version. Fixes #11207 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 08:09:41 +03:00
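The strong-type idea behind `table_id` (257d74bb34) and `locator::host_id` (d295d8e280) can be sketched with a tag-parameterised wrapper. This is a simplified illustration, not `utils::tagged_uuid` itself: `raw_uuid`, the tag struct names, and `same_table` are assumptions made for the example.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for a UUID value.
using raw_uuid = std::array<std::uint8_t, 16>;

// A UUID wrapper whose type depends on a tag, so identifiers with different
// meanings cannot be mixed up by accident.
template <typename Tag>
class tagged_uuid {
    raw_uuid _id{};
public:
    tagged_uuid() = default;
    explicit tagged_uuid(raw_uuid id) : _id(id) {}
    const raw_uuid& uuid() const { return _id; }
    friend bool operator==(const tagged_uuid& a, const tagged_uuid& b) {
        return a._id == b._id;
    }
};

using table_id = tagged_uuid<struct table_id_tag>;
using host_id = tagged_uuid<struct host_id_tag>;

// The two identifier types are unrelated, and a raw UUID does not
// convert implicitly to either of them.
static_assert(!std::is_same_v<table_id, host_id>);
static_assert(!std::is_convertible_v<host_id, table_id>);
static_assert(!std::is_convertible_v<raw_uuid, table_id>);

// Accepts only table identifiers; passing a host_id does not compile.
bool same_table(const table_id& a, const table_id& b) {
    return a == b;
}
```

The design choice is that mistakes such as passing a host identifier where a table identifier is expected become compile-time errors instead of silent runtime bugs.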
Michał Sala	d573ab0b58	db: view: react to synchronous updates tag Code that waited for all remote view updates was already there. This commit modifies the conditions of this wait to take into account the "synchronous mode" (enabled when db::SYNCHRONOUS_VIEW_UPDATES_TAG_KEY is set).	2022-07-25 09:53:33 +02:00
Avi Kivity	13a64d8ab2	Merge 'Remove all remaining restrictions classes' from Jan Ciołek This PR removes all code that used classes `restriction`, `restrictions` and their children. There were two fields in `statement_restrictions` that needed to be dealt with: `_clustering_columns_restrictions` and `_nonprimary_key_restrictions`. Each function was reimplemented to operate on the new expression representaiion and eventually these fields weren't needed anymore. After that the restriction classes weren't used anymore and could be deleted as well. Now all of the code responsible for analyzing WHERE clause and planning a query works on expressions. Closes #11069 * github.com:scylladb/scylla: cql3: Remove all remaining restrictions code cql3: Move a function from restrictions class to the test cql3: Remove initial_key_restrictions cql3: expr: Remove convert_to_restriction cql3: Remove _new from _new_nonprimary_key_restrictions cql3: Remove _nonprimary_key_restrictions field cql3: Reimplement uses of _nonprimary_key_restrictions using expression cql3: Keep a map of single column nonprimary key restrictions cql3: Remove _new from _new_clustering_columns_restrictions cql3: Remove _clustering_columns_restrictions from statement_restrictions cql3: Use a variable instead of dynamic cast cql3: Use the new map of single column clustering restrictions cql3: Keep a map of single column clustering key restrictions cql3: Return an expression in get_clustering_columns_restrctions() cql3: Reimplement _clustering_columns_restrictions->has_supporting_index() cql3: Don't create single element conjunction cql3: Add expr::index_supports_some_column cql3: Reimplement has_unrestricted_components() cql3: Reimplement _clustering_columns_restrictions->need_filtering() cql3: Reimplement num_prefix_columns_that_need_not_be_filtered cql3: Use the new clustering restrictions field instead of ->expression cql3: Reimplement _clustering_columns_restrictions->size() using expressions cql3: Reimplement 
_clustering_columns_restrictions->get_column_defs() using expressions cql3: Reimplement _clustering_columns_restrictions->is_all_eq() using expressions cql3: expr: Add has_only_eq_binops function cql3: Reimplement _clustering_columns_restrictions->empty() using expressions	2022-07-20 18:01:15 +03:00
Jan Ciolek	9d1ba07471	cql3: Reimplement uses of _nonprimary_key_restrictions using expression All parts of the code that use _nonprimary_key_restrictions are changed to use _new_nonprimary_key_restrictions instead. I decided not to split this into multiple commits, as there isn't a lot of changes and they are analogous to the ones done before for partition and clustering columns. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-07-20 09:10:30 +02:00
Jan Ciolek	2b7ffd57fb	cql3: Return an expression in get_clustering_columns_restrctions() get_clustering_columns_restrctions() used to return a shared pointer to the clustering_restrictions class. Now everything is being converted to expression, so it should return an expression as well. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-07-19 16:02:01 +02:00
Pavel Emelyanov	62d95f09de	view: De-futurize make_view_update_builder() It doesn't sleep, just returns ready future with builder tests: https://jenkins.scylladb.com/job/releng/job/Scylla-CI/1384 it's red because e-mail notification is broken (scylla-pkg#2988) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20220718132529.30751-1-xemul@scylladb.com>	2022-07-18 17:15:48 +03:00
Botond Dénes	4d2ce5c304	mutation_compactor: remove emit_only_live_rows template parameter Now that we use emit_only_live_rows::no everywhere we can remove this template parameters. Only the template parameter is removed, the internal logic around it is left in place (will be removed in a next patch), by hard-wiring `only_live()`.	2022-07-12 08:43:49 +03:00
Botond Dénes	bedc82e52c	tree: use emit_only_live_rows::no emit_only_live_rows is a convenience so downstream consumers of the mutation compactors don't have to check the `bool is_live` already passed to them. This convenience however causes a template parameter and additional logic for the compactor. As the most prominent of these consumers (the query result builder) will soon have to switch to emit_only_live_rows::no for other reasons anyway (it will want to count tombstones), we take the opportunity to switch everybody to ::no. This can be done with very little additional complexity to these consumer -- basically an additional if or two. This prepares the ground for removing this template parameter and the associate logic from the compactor.	2022-07-12 08:41:51 +03:00
Pavel Emelyanov	5526738794	view: Fix trace-state pointer use after move It's moved into .mutate_locally() but it is captured and used in its continuation. It works well just because moved-from pointer looks like nullptr and all the tracing code checks for it to be non-null. tests: https://jenkins.scylladb.com/job/releng/job/Scylla-CI/1266/ (CI job failed on post-actions thus it's red) Fixes #11015 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20220711134152.30346-1-xemul@scylladb.com>	2022-07-11 17:20:51 +03:00
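The use-after-move in 5526738794 can be modelled with `std::shared_ptr`, whose moved-from state is guaranteed to be empty. The types and functions here (`trace_state`, `trace`, `write_buggy`, `write_fixed`) are hypothetical, the real code uses continuations rather than sequential calls, and passing a copy is only one possible fix; the log does not show how the actual patch resolved it.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a tracing session.
struct trace_state {
    std::string log;
};
using trace_state_ptr = std::shared_ptr<trace_state>;

// Like the tracing code described in the commit, this silently does
// nothing when handed a null pointer.
void trace(const trace_state_ptr& p, const std::string& msg) {
    if (p) {
        p->log += msg;
    }
}

// Takes ownership of its argument, as a by-value sink parameter.
void mutate_locally(trace_state_ptr tr) {
    trace(tr, "applied;");
}

// Buggy: the pointer is moved away and then used again. No crash occurs,
// but the later trace message is silently lost.
void write_buggy(trace_state_ptr tr) {
    mutate_locally(std::move(tr));
    trace(tr, "done;");  // tr is empty here
}

// One possible fix: pass a copy and keep the original for later use.
void write_fixed(trace_state_ptr tr) {
    mutate_locally(tr);
    trace(tr, "done;");
}
```

The sketch shows why the bug went unnoticed: nothing fails, the trace output is simply incomplete.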
Jan Ciolek	1339ff1c79	cql3: Use expression instead of _partition_key_restrictions in the remaining code There are still some places that use partition_key_restrictions instead of _new_partition_key_restrictions in statement_restrictions. Change them to use the new representation Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-07-01 16:29:10 +02:00
Avi Kivity	dab56b82fa	Merge 'Per-partition rate limiting' from Piotr Dulikowski Due to its sharded and token-based architecture, Scylla works best when the user workload is more or less uniformly balanced across all nodes and shards. However, a common case when this assumption is broken is the "hot partition" - suddenly, a single partition starts getting a lot more reads and writes in comparison to other partitions. Because the shards owning the partition have only a fraction of the total cluster capacity, this quickly causes latency problems for other partitions within the same shard and vnode. This PR introduces per-partition rate limiting feature. Now, users can choose to apply per-partition limits to their tables of choice using a schema extension: ``` ALTER TABLE ks.tbl WITH per_partition_rate_limit = { 'max_writes_per_second': 100, 'max_reads_per_second': 200 }; ``` Reads and writes which are detected to go over that quota are rejected to the client using a new RATE_LIMIT_ERROR CQL error code - existing error codes didn't really fit well with the rate limit error, so a new error code is added. This code is implemented as a part of a CQL protocol extension and returned to clients only if they requested the extension - if not, the existing CONFIG_ERROR will be used instead. Limits are tracked and enforced on the replica side. If a write fails with some replicas reporting rate limit being reached, the rate limit error is propagated to the client. Additionally, the following optimization is implemented: if the coordinator shard/node is also a replica, we account the operation into the rate limit early and return an error in case of exceeding the rate limit before sending any messages to other replicas at all. The PR covers regular, non-batch writes and single-partition reads. LWT and counters are not covered here. 
Results of `perf_simple_query --smp=1 --operations-per-shard=1000000`: - Write mode: ``` `8f690fdd47` (PR base): 129644.11 tps ( 56.2 allocs/op, 13.2 tasks/op, 49785 insns/op) This PR: 125564.01 tps ( 56.2 allocs/op, 13.2 tasks/op, 49825 insns/op) ``` - Read mode: ``` `8f690fdd47` (PR base): 150026.63 tps ( 63.1 allocs/op, 12.1 tasks/op, 42806 insns/op) This PR: 151043.00 tps ( 63.1 allocs/op, 12.1 tasks/op, 43075 insns/op) ``` Manual upgrade test: - Start 3 nodes, 4 shards each, Scylla version `8f690fdd47` - Create a keyspace with scylla-bench, RF=3 - Start reading and writing with scylla-bench with CL=QUORUM - Manually upgrade nodes one by one to the version from this PR - Upgrade succeeded, apart from a small number of operations which failed when each node was being put down all reads/writes succeeded - Successfully altered the scylla-bench table to have a read and write limit and those limits were enforced as expected Fixes: #4703 Closes #9810 * github.com:scylladb/scylla: storage_proxy: metrics for per-partition rate limiting of reads storage_proxy: metrics for per-partition rate limiting of writes database: add stats for per partition rate limiting tests: add per_partition_rate_limit_test config: add add_per_partition_rate_limit_extension function for testing cf_prop_defs: guard per-partition rate limit with a feature query-request: add allow_limit flag storage_proxy: add allow rate limit flag to get_read_executor storage_proxy: resultize return type of get_read_executor storage_proxy: add per partition rate limit info to read RPC storage_proxy: add per partition rate limit info to query_result_local(_digest) storage_proxy: add allow rate limit flag to mutate/mutate_result storage_proxy: add allow rate limit flag to mutate_internal storage_proxy: add allow rate limit flag to mutate_begin storage_proxy: choose the right per partition rate limit info in write handler storage_proxy: resultize return types of write handler creation path storage_proxy: add per 
partition rate limit to mutation_holders storage_proxy: add per partition rate limit info to write RPC storage_proxy: add per partition rate limit info to mutate_locally database: apply per-partition rate limiting for reads/writes database: move and rename: classify_query -> classify_request schema: add per_partition_rate_limit schema extension db: add rate_limiter storage_proxy: propagate rate_limit_exception through read RPC gms: add TYPED_ERRORS_IN_READ_RPC cluster feature storage_proxy: pass rate_limit_exception through write RPC replica: add rate_limit_exception and a simple serialization framework docs: design doc for per-partition rate limiting transport: add rate_limit_error	2022-06-24 01:32:13 +03:00
Botond Dénes	080ed590bf	Merge "Obtain dc/rack from topology, not snitch" from Pavel Emelyanov " The way dc/rack info is maintained is very intricate. The dc/rack strings originate at snitch, get propagated via gossiper, get notified to storage service which, in turn, stores them into the system keyspace and token metadata. Code that needs to get dc/rack for a given endpoint calls snitch which tries to get the data from gossiper and if failed goes and loads it from system keyspace cache. Also there's "internal IP" thing hanging arond that loops messaging service in both -- updating and getting the info. The plan is to make topology (that currently sits on token metadata) stay the only "source of truth" regarding the endpoints' dc/rack and internal IP info. The dc/rack mappings are put into topology already, but it cannot yet fully replace snitch for two reasons: - it doesn't map internal IP to endpoint - it doesn't get data stored in system keyspace So what this patch set does is patches most of the dc/rack getters to call topology methods. The topology is temporarily patched to just call the respective snitch methods. This removes a big portion of calls for global snitch instance. 
After the set the places that still explicitly rely on snitch to provide dc/rack are - messaging service: needs internal IP knowledge on topology - db/consistency_level: is all "global", needs heavier patching - tests: just later " * 'br-get-dc-rack-from-topology-2' of https://github.com/xemul/scylla: proxy stats: Get rack/datacenter from topology proxy stats: Push topology arg to get_ep_stats api: Get rack/datacenter from topology hints: Remove snitch dependency hints: Get rack/datacenter from topology alternator: Get rack/datacenter from topology range_streamer: Get rack/datacenter from topology repair: Get rack/datacenter from topology view: Get rack/datacenter from topology storage_service: Get rack/datacenter from topology proxy: Get rack/datacenter from topology topology: Add get_rack/_datacenter methods	2022-06-23 10:01:36 +03:00
Piotr Dulikowski	e6beab3106	storage_proxy: add allow rate limit flag to mutate/mutate_result Now, mutate/mutate_result accept a flag which decides whether the write should be rate limited or not. The new parameter is mandatory and all call sites were updated.	2022-06-22 20:16:49 +02:00
Piotr Sarna	bc3a635c42	view: exclude using static columns in the view filter The code which applied view filtering (i.e. a condition placed on a view column, e.g. "WHERE v = 42") erroneously used a wildcard selection, which also assumes that static columns are needed, if the base table contains any such columns. The filtering code currently assumes that no such columns are fetched, so the selection is amended to only ask for regular columns (primary key columns are sent anyway, because they are enabled via slice options, so no need to ask for them explicitly). Fixes #10851 Closes #10855	2022-06-22 15:55:45 +03:00
Pavel Emelyanov	17128eb54b	view: Get rack/datacenter from topology The view code already gets token metadata from global proxy instance. Do the same to get topology object. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-06-22 11:47:26 +03:00
Avi Kivity	c80999fab4	cql3: expr: push is_satisfied_by regular and static column extraction to callers is_satisfied_by() rearranges the static and regular columns from query::result_row_view form (which is a use-once iterator) to std::vector<managed_bytes_opt> (which uses the standard value representation, and allows random access which expression evaluation needs). Doing it in is_satisfied_by() means that it is done every time an expression is evaluated, which is wasteful. It's also done even if the expression doesn't need it at all. Push it out to callers, which already eliminates some calls. We still pass cql3::expr::selection, which is a layering violation, but that is left to another time. Note that in view.cc's check_if_matches(), we should have been able to move static_and_regular_columns calculation outside the loop. However, we get crashes if we do. This is likely due to a preexisting bug (which the zero iterations loop avoids). However, in selection.cc, we are able to avoid the computation when the code claims it is only handling partition keys or clustering keys.	2022-06-12 16:12:41 +03:00
Avi Kivity	4b715226fe	cql3: expr: convert is_satisfied_by() signature to evaluation_inputs Callers are converted, but the internals are kept using the old conventions until more APIs are converted. Although the new API allows passing no query_options, the view code keeps passing dummy query_options and improvement is left as a FIXME.	2022-06-12 12:53:44 +03:00


312 Commits