Commit Graph

102 Commits

Author SHA1 Message Date
Avi Kivity
8ab20bae68 Merge 'prepared_statements: Invalidate batch statement too' from Eliran Sinvani
It seems that batch prepared statements always return false for
depends_on_keyspace and depends_on_column_family, which in turn
makes the criterion for removal from the cache always false,
so the queries are never evicted.
Here we change the functions to return the true state, meaning
they will return true if any of the sub-queries depends on
the keyspace or column family.

In this fix we first make the API more coherent and then use this new API to implement
the batch statement's dependency test.
Fixes #10129

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>

Closes #10132

* github.com:scylladb/scylla:
  prepared_statements: Invalidate batch statement too
  cql3 statements: Change dependency test API to express better it's purpose
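
A minimal sketch of the dependency test described above, assuming a simplified statement interface. The types `statement`, `table_statement` and `batch` are hypothetical stand-ins, not the actual Scylla classes. The point is only that a batch reports a dependency when any of its sub-statements does:

```cpp
#include <algorithm>
#include <memory>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical, simplified interface; the real Scylla classes differ.
struct statement {
    virtual ~statement() = default;
    virtual bool depends_on(std::string_view ks,
                            std::optional<std::string_view> cf) const = 0;
};

// A statement that touches exactly one table.
struct table_statement : statement {
    std::string ks;
    std::string cf;
    table_statement(std::string k, std::string c)
        : ks(std::move(k)), cf(std::move(c)) {}
    bool depends_on(std::string_view k,
                    std::optional<std::string_view> c) const override {
        return ks == k && (!c || cf == *c);
    }
};

// A batch depends on whatever any of its sub-statements depends on.
struct batch : statement {
    std::vector<std::unique_ptr<statement>> subs;
    bool depends_on(std::string_view k,
                    std::optional<std::string_view> c) const override {
        return std::any_of(subs.begin(), subs.end(),
                           [&](const auto& s) { return s->depends_on(k, c); });
    }
};
```

With this shape, an empty batch depends on nothing, and a batch with one statement on `ks1.t1` is invalidated when either `ks1` or `ks1.t1` is dropped.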
2022-03-07 14:00:05 +02:00
Nadav Har'El
fa7a302130 cross-tree: split coordinator_result from exceptions.hh
Recently, coordinator_result was introduced as an alternative for
exceptions. It was placed in the main "exceptions/exceptions.hh" header,
which virtually every single source file in Scylla includes.
But unfortunately, it brings in some heavy header files and templates,
leading to a lot of wasted build time - ClangBuildAnalyzer measured that
we include exceptions.hh in 323 source files, taking almost two seconds
each on average.

In this patch, we split the coordinator_result feature into a separate
header file, "exceptions/coordinator_result.hh", and only the few places
which need it include the header file. Unfortunately, some of these
few places are themselves headers, so the new header file ends up being
included in 100 source files - but 100 is still much less than 323 and
perhaps we can reduce this number later.

After this patch, the total Scylla object-file size is reduced by 6.5%
(the object size is a proxy for build time, which I didn't directly
measure). ClangBuildAnalyzer reports that now each of the 323 includes
of exceptions.hh only takes 80ms, coordinator_result.hh is only included
100 times, and virtually all the cost to include it comes from Boost's
result.hh (400ms per inclusion).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220228204323.1427012-1-nyh@scylladb.com>
2022-03-02 10:12:57 +02:00
Eliran Sinvani
bf50dbd35b cql3 statements: Change dependency test API to express better it's
purpose

CQL statements used to have two API functions, depends_on_keyspace and
depends_on_column_family. The latter took only a table name as a
parameter, which makes no sense: there could be multiple tables with the
same name, each in a different keyspace, and it doesn't make sense to
generalize the test, i.e. to ask "Does a statement depend on any table
named XXX?"
In this change we unify the two calls into one, depends_on, which takes a
keyspace name and optionally also a table name, so that every logical
dependency test that makes sense is supported by a single API call.
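
As an illustration only, the unified call could look like the sketch below. `table_ref` is a hypothetical stand-in for a statement bound to one table, and the exact parameter types are an assumption. The keyspace is mandatory and the table name is optional, so a bare table name can no longer be queried:

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical stand-in for a statement bound to a single table.
struct table_ref {
    std::string keyspace;
    std::string table;

    // Single dependency test: keyspace is required, table is optional.
    bool depends_on(std::string_view ks,
                    std::optional<std::string_view> tbl = std::nullopt) const {
        if (keyspace != ks) {
            return false;
        }
        return !tbl || table == *tbl;
    }
};
```

Asking about `ks2.users` for a statement on `ks1.users` returns false, which the old table-name-only test could not express.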
2022-02-27 11:48:03 +02:00
Piotr Dulikowski
ddf049738d indexed_table_select_statement: return some exceptions as exception messages
Adjusts the indexed_table_select_statement so that it uses the
result-aware methods in storage_proxy and propagates failed results as
result_message::exception.
2022-02-22 16:25:21 +01:00
Piotr Dulikowski
3a4d3f3175 select_statement: implement execute_without_checking_exception_message
The select_statement will be able to propagate coordinator failures
without throwing, so it's important to override the default
implementations of execute and execute_without... so that the first
calls the latter and not the other way around.
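
A rough sketch of the call direction described above. The `message` type, the base class and its default forwarding behaviour are all assumptions made for illustration. The real interfaces return futures and richer message types:

```cpp
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical result: either row data or an error carried as a value.
struct message {
    std::optional<std::string> error;  // set when the coordinator failed
    int rows = 0;
};

struct statement_base {
    virtual ~statement_base() = default;
    // Assumed default: the non-throwing variant forwards to execute().
    virtual message execute_without_checking_exception_message() const {
        return execute();
    }
    virtual message execute() const = 0;
};

struct select_like : statement_base {
    bool fail = false;

    // The real work lives here; failures are returned, not thrown.
    message execute_without_checking_exception_message() const override {
        if (fail) {
            return message{std::string("read timeout"), 0};
        }
        return message{std::nullopt, 3};
    }

    // execute() calls the non-throwing variant and converts a failure
    // into an exception, not the other way around.
    message execute() const override {
        message m = execute_without_checking_exception_message();
        if (m.error) {
            throw std::runtime_error(*m.error);
        }
        return m;
    }
};
```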
2022-02-22 16:25:21 +01:00
Piotr Dulikowski
df7668797b select_statement: introduce helpers for working with failed results
Adds:

- Includes for result-related helper methods (to be used in later
  commits),
- Alias for coordinator_result,
- The wrap_result_to_error_message function - a bit similar to
  utils::result_wrap. Adapts a callable T -> shared_ptr<result_message>
  to take result<T> -> shared_ptr<result_message>. If the result is
  failed, it converts it into result_message::exception and returns.
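
A simplified sketch of the adapter idea, not the actual helper. Here `result<T>` is modelled with `std::variant` and `result_message` is a plain struct, both of which are assumptions for illustration:

```cpp
#include <memory>
#include <string>
#include <utility>
#include <variant>

// Hypothetical stand-ins; the real result<T> and result_message differ.
struct result_message {
    std::string text;
    bool is_exception = false;
};

template <typename T>
using result = std::variant<T, std::string>;  // value, or an error description

// Adapts f : T -> shared_ptr<result_message>
// into     result<T> -> shared_ptr<result_message>.
template <typename T, typename F>
auto wrap_result_to_error_message(F f) {
    return [f = std::move(f)](result<T> r) -> std::shared_ptr<result_message> {
        if (auto* err = std::get_if<std::string>(&r)) {
            // Failed result: turn it into an exception message and return.
            return std::make_shared<result_message>(result_message{*err, true});
        }
        return f(std::get<T>(std::move(r)));
    };
}
```

In this sketch `T` must not be `std::string`, since the variant would then hold two alternatives of the same type.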
2022-02-22 16:25:21 +01:00
Avi Kivity
fcb8d040e8 treewide: use Software Package Data Exchange (SPDX) license identifiers
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.

Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.

The changes were applied mechanically with a script, except to
licenses/README.md.

Closes #9937
2022-01-18 12:15:18 +01:00
Pavel Emelyanov
da4c29105d select_statement: Replace all proxy-s with query_processor
This is the largest user of proxy argument. Fix them all and
their callers (all sit in the same .cc file).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-12-23 10:54:28 +03:00
Pavel Emelyanov
bce2ed9c6c cql3: Make execution stages carry query_processor over
The batch_, modification_ and select_ statements get proxy from
query processor just to push it through execution stage. Simplify
that by pushing the query processor itself.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-12-23 10:53:44 +03:00
Pavel Emelyanov
b990ca5550 cql3: Make .validate() and .check_access() accept query_processor
This is mostly a sed script that replaces methods' first argument
plus fixes of compiler-generated errors.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-12-23 10:53:44 +03:00
Avi Kivity
d768e9fac5 cql3, related: switch to data_dictionary
Stop using database (and including database.hh) for schema related
purposes and use data_dictionary instead.

data_dictionary::database::real_database() is called from several
places, for these reasons:

 - calling yet-to-be-converted code
 - callers with a legitimate need to access data (e.g. system_keyspace)
   but with the ::database accessor removed from query_processor.
   We'll need to find another way to supply system_keyspace with
   data access.
 - to gain access to the wasm engine for testing whether
   user-defined functions compile. We'll have to find another way
   to do this as well.

The change is a straightforward replacement. One case in
modification_statement had to change a capture, but everything else
was just a search-and-replace.

Some files that lost "database.hh" gained "mutation.hh", which they
previously had access to through "database.hh".
2021-12-15 13:54:23 +02:00
Pavel Emelyanov
b0a8c153f7 select_statement: Remove unused proxy args and captures
The generate_view_paging_state_from_base_query_results() has unused
proxy argument that's carried over quite a long stack for nothing.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20211210175203.26197-1-xemul@scylladb.com>
2021-12-10 20:39:55 +02:00
Jan Ciolek
075b3a45fd select_statement: Store whether restrictions need filtering in a variable
Instead of calculating _restrictions->need_filtering()
we can calculate it only once and then use this computed variable.

It turns out that _restrictions->need_filtering() is called
during execution of prepared statements and it has to scan through the whole AST,
so doing it only once gives us a performance gain.
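
A small sketch of the caching idea, with hypothetical types. `restrictions` here just counts how often its scan runs, and `select_sketch` stores the answer once at construction (standing in for prepare time):

```cpp
#include <utility>
#include <vector>

// Hypothetical restrictions object that counts how often the scan runs.
struct restrictions {
    std::vector<bool> needs;  // one flag per AST node (illustrative)
    mutable int scans = 0;

    bool need_filtering() const {  // walks the nodes on every call
        ++scans;
        for (bool n : needs) {
            if (n) {
                return true;
            }
        }
        return false;
    }
};

class select_sketch {
    restrictions _restrictions;
    bool _restrictions_need_filtering;  // computed once, reused on execute
public:
    explicit select_sketch(restrictions r)
        : _restrictions(std::move(r))
        , _restrictions_need_filtering(_restrictions.need_filtering()) {}

    bool execute_needs_filtering() const { return _restrictions_need_filtering; }
    int scans() const { return _restrictions.scans; }
};
```

However many times the prepared statement is executed, the scan in this sketch runs exactly once.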

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2021-12-03 17:01:09 +01:00
Jan Ciolek
a24d06c195 cql3: Remove term in select_statement
Replace all uses of term with expression in cql3/statements/select_statement

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2021-10-28 20:55:09 +02:00
Avi Kivity
daf028210b build: enable -Winconsistent-missing-override warning
This warning can catch a virtual function that thinks it
overrides another, but doesn't, because the two functions
have different signatures. This isn't very likely since most
of our virtual functions override pure virtuals, but it's
still worth having.
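
For illustration, here is the kind of mismatch that consistent use of `override` lets the compiler catch. The class names are made up. In the first derived class a missing `const` silently creates a new virtual function instead of overriding the base one:

```cpp
struct base {
    virtual ~base() = default;
    virtual int cost(int n) const { return n; }
};

// Intended as an override, but the missing `const` makes this a new,
// unrelated virtual function that merely hides base::cost.
struct derived_without_keyword : base {
    virtual int cost(int n) { return 2 * n; }
};

// With `override`, the compiler checks that the signature really matches;
// dropping `const` here would be a compile error.
struct derived_with_keyword : base {
    int cost(int n) const override { return 2 * n; }
};
```

Calling `cost` through a `const base&` reaches the base implementation for the first class and the derived one for the second.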

Enable the warning and fix numerous violations.

Closes #9347
2021-09-15 12:55:54 +03:00
Piotr Sarna
7506f44c77 cql3: use existing constant for max result in indexed statements
Original code which introduced enforcing page limits for indexed
statements created a new constant for max result size in bytes.
Botond reported that we already have such a constant, so it's now
used instead of reinventing it from scratch.

Closes #8839
2021-06-10 11:08:54 +03:00
Avi Kivity
3e3003fcc1 Merge 'cql3: limit the concurrency of indexed statements' from Piotr Sarna
Indexed select statements fetch primary key information from
their internal materialized views and then use it to query
the base table. Unfortunately, the current mechanism for retrieving
base table rows makes it easy to overwhelm the replicas with unbounded
concurrency - the number of concurrent ops is increased exponentially
until a short read is encountered, but it's not enough to cap the
concurrency - if data is fetched row-by-row, then short reads usually
don't occur and as a result it's easy to see concurrency of 1M or
higher. In order to avoid overloading the replicas, the concurrency
of indexed queries is now capped at 4096 and additionally throttled
if enough results are already fetched. For paged queries it means that
the query returns as soon as 1MB of data is ready, and for unpaged ones
the concurrency will no longer be doubled as soon as the previous
iteration fetched 1MB of results.

The fixed 4096 value can be subject to debate; its reasoning is as follows:
for 2KiB rows, which are moderately large but not huge, it results in
fetching 10MB of data, which is the granularity used by replicas.
For 200B rows, which is rather small, the result would still be
around 1MB.
At the same time, 4096 separate tasks also means 4096 allocations,
so increasing the number also strains the allocator.
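
The schedule described above can be sketched roughly as follows. The function and constant names are hypothetical; only the 4096 cap and the 1MB threshold come from the message:

```cpp
#include <algorithm>
#include <cstddef>

// Values taken from the commit message; the names are hypothetical.
constexpr std::size_t max_concurrency = 4096;
constexpr std::size_t result_threshold_bytes = 1024 * 1024;  // 1MB

// Returns the concurrency to use for the next iteration.
constexpr std::size_t next_concurrency(std::size_t current,
                                       std::size_t bytes_fetched_last_iteration) {
    if (bytes_fetched_last_iteration >= result_threshold_bytes) {
        return current;  // enough data already: stop doubling
    }
    return std::min(current * 2, max_concurrency);  // double, but cap
}
```

Starting from 1, the concurrency in this sketch reaches the cap after 12 doublings and then stays there.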

Fixes #8799

Tests: unit(release),
       manual: observing metrics of modified index_paging_test

Closes #8814

* github.com:scylladb/scylla:
  cql3: limit the transitional result size for indexed queries
  cql3: return indexed pages after 1MB worth of data
  cql3: limit the concurrency of indexed statements
2021-06-07 18:00:51 +03:00
Piotr Sarna
60e55b6c7f cql3: return indexed pages after 1MB worth of data
Currently there's no practical limit on the resulting page size
for an indexed query, because it simply translates a page worth
of base primary keys into base rows. In order to avoid sending
overly large pages, the result is returned after hitting a 1MB limit.
2021-06-07 16:05:50 +02:00
Piotr Sarna
8eeac10ded cql3: limit the concurrency of indexed statements
Indexed select statements fetch primary key information from
their internal materialized views and then use it to query
the base table. Unfortunately, the current mechanism for retrieving
base table rows makes it easy to overwhelm the replicas with unbounded
concurrency - the number of concurrent ops is increased exponentially
until a short read is encountered, but it's not enough to cap the
concurrency - if data is fetched row-by-row, then short reads usually
don't occur and as a result it's easy to see concurrency of 1M or
higher. In order to avoid overloading the replicas, the concurrency
of indexed queries is now capped at 4096.
The number can be subject to debate; its reasoning is as follows:
for 2KiB rows, which are moderately large but not huge, it results in
fetching 10MB of data, which is the granularity used by replicas.
For 200B rows, which is rather small, the result would still be
around 1MB.
At the same time, 4096 separate tasks also means 4096 allocations,
so increasing the number also strains the allocator.
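
The sizing argument can be checked with a few lines of arithmetic. The helper name is hypothetical, and the row sizes are the ones quoted above:

```cpp
#include <cstdint>

constexpr std::uint64_t concurrency_cap = 4096;

// Bytes fetched by one full round of concurrent single-row reads.
constexpr std::uint64_t bytes_for(std::uint64_t row_size) {
    return concurrency_cap * row_size;
}

static_assert(bytes_for(2048) == 8'388'608);  // 8 MiB, near the quoted 10MB
static_assert(bytes_for(200) == 819'200);     // about 0.8 MB, near the quoted 1MB
```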

Fixes #8799

Tests: unit(release),
       manual: observing metrics of modified index_paging_test
2021-06-07 15:56:15 +02:00
Pavel Solodovnikov
76bea23174 treewide: reduce header interdependencies
Use forward declarations wherever possible.

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>

Closes #8813
2021-06-07 15:58:35 +03:00
Avi Kivity
a55b434a2b treewide: extent copyright statements to present day
2021-06-06 19:18:49 +03:00
Pavel Emelyanov
26c115f379 cql3: Change execute()'s 1st arg to query_processor
Currently the statement's execute() method accepts storage
proxy as the first argument. This is enough for all of them
but schema altering ones, because the latter need to call
migration manager's announce.

To provide the migration manager to those who need it, we
need some higher-level service than the proxy. The
query processor seems to be a good candidate for it.

That said, all the .execute()s now accept the query
processor instead of the proxy and get the proxy itself from
the query processor.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-03-15 19:00:33 +03:00
Piotr Sarna
b71665efe8 cql3: use timeout config from client state instead of query options
... in select statement, in order to be able to remove the timeout
from query options later.
2021-02-25 17:20:27 +01:00
Piotr Sarna
157be33b89 cql3: add per-query timeout to select statement
First of all, select statement is extended with an 'attrs' field,
which keeps the per-query attributes. Currently, only TIMEOUT
parameter is legal to use, since TIMESTAMP and TTL bear no meaning
for reads.

Secondly, if TIMEOUT attribute is set, it will be used as the effective
timeout for a particular query.
2020-12-14 07:50:40 +01:00
Piotr Grabowski
2342b386f4 secondary_index: use new token_column_computation
Switches token column computation to (new) token_column_computation,
which fixes #7443, because new token column will be compared using
signed comparisons, not the previous unsigned comparison of CQL bytes
type.
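
To see why the comparison type matters, consider the sketch below. It assumes the token is serialized as a big-endian 64-bit integer (an assumption made for illustration) and contrasts an unsigned byte-wise comparison with a signed integer comparison:

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Big-endian serialization of a 64-bit token (illustrative helper).
inline std::array<unsigned char, 8> to_big_endian(std::int64_t token) {
    std::array<unsigned char, 8> out{};
    auto u = static_cast<std::uint64_t>(token);
    for (int i = 7; i >= 0; --i) {
        out[i] = static_cast<unsigned char>(u & 0xff);
        u >>= 8;
    }
    return out;
}

// Bytes-type ordering: unsigned lexicographic comparison.
inline bool less_as_bytes(std::int64_t a, std::int64_t b) {
    auto x = to_big_endian(a);
    auto y = to_big_endian(b);
    return std::memcmp(x.data(), y.data(), x.size()) < 0;
}

// Token ordering: signed integer comparison.
inline bool less_as_token(std::int64_t a, std::int64_t b) {
    return a < b;
}
```

The two orderings agree for non-negative tokens but disagree as soon as a negative token is involved, since its leading byte is 0xFF.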

This column computation type is only set if cluster supports
correct_idx_token_in_secondary_index feature to make sure that all nodes
will be able to compute (new) token_column_computation. Also old
indexes will need to be rebuilt to take advantage of this fix, as new
token column computation type is only set for new indexes.
2020-11-04 12:02:42 +01:00
Dejan Mircevski
df3ea2443b cql3: Drop all uses_function methods
No one seems to call them except for other uses_function methods.

Tests: unit (dev)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2020-09-04 17:27:30 +02:00
Wojciech Mitros
45215746fe increase the maximum size of query results to 2^64
Currently, we cannot select more than 2^32 rows from a table because we are limited by types of
variables containing the numbers of rows. This patch changes these types and sets new limits.

The new limits take effect while selecting all rows from a table - custom limits of rows in a result
stay the same (2^32-1).

In classes which are being serialized and used in messaging, in order to be able to process queries
originating from older nodes, the top 32 bits of new integers are optional and stay at the end
of the class - if they're absent we assume they equal 0.
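
A minimal sketch of that compatibility scheme, using a hypothetical `wire_row_count` struct in place of the real serialized classes: the low 32 bits are always present, and the high 32 bits are a trailing optional field that defaults to 0 when absent.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical wire representation: older nodes never send high_bits.
struct wire_row_count {
    std::uint32_t low_bits;
    std::optional<std::uint32_t> high_bits;
};

// Missing high bits are treated as 0.
inline std::uint64_t decode(const wire_row_count& w) {
    return (static_cast<std::uint64_t>(w.high_bits.value_or(0)) << 32) | w.low_bits;
}

inline wire_row_count encode(std::uint64_t rows) {
    return {static_cast<std::uint32_t>(rows),
            static_cast<std::uint32_t>(rows >> 32)};
}
```

An older reader that only looks at `low_bits` sees the count truncated to 32 bits, which matches the behaviour the message describes for older nodes.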

The backward compatibility was tested by querying an older node for a paged selection, using the
received paging_state with the same select statement on an upgraded node, and comparing the returned
rows with the result generated for the same query by the older node, additionally checking that the
paging_state returned by the upgraded node contained new fields with correct values. We also verified
that the older node simply ignores the top 32 bits of the remaining rows number when handling a query
with a paging_state originating from an upgraded node, by generating and sending such a query to
an older node and checking the paging_state in the reply (using the Python driver).

Fixes #5101.
2020-08-03 17:32:49 +02:00
Botond Dénes
92a7b16cba query: read_command: add max_result_size
This field will replace max size which is currently passed once per
established rpc connection via the CLIENT_ID verb and stored as an
auxiliary value on the client_info. For now it is unused, but we update
all sites creating a read command to pass the correct value to it. In the
next patch we will phase out the old max size and use this field to pass
max size on each verb instead.
2020-07-28 18:00:29 +03:00
Rafael Ávila de Espíndola
abb36cc7d1 cql3: Don't use variadic futures in select_statement
2020-06-29 16:49:41 -07:00
Pavel Emelyanov
6892dbdde7 cql3: Add storage_proxy argument to .check_access method
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-02-24 11:17:19 +03:00
Pavel Solodovnikov
8efb02146f cql3: const cleanups and API de-pointerization
 * Pass raw::select_statement::parameters as lw_shared_ptr
 * Some more const cleanups here and there
 * lists,maps,sets::equals now accept const-ref to *_type_impl
   instead of shared_ptr
 * Remove unused `get_column_for_condition` from modification_statement.hh
 * More methods now accept const-refs instead of shared_ptr

Every call site where a shared_ptr was required as an argument
has been inspected to be sure that no dangling references are
possible.

Tests: unit(dev, debug)

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20200220153204.279940-1-pa.solodovnikov@scylladb.com>
2020-02-20 18:14:49 +02:00
Alejo Sanchez
45a6cc5d53 cql3: single metric for range scan and full scan
Combining both range and full table scans in a single metric as
"partition range scans are used to implement full scans in scylla deployments."
Requested by @bdenes and @avi

Refs: #5209

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Message-Id: <20200211101221.690031-2-alejo.sanchez@scylladb.com>
2020-02-18 16:16:20 +02:00
Pavel Solodovnikov
d64fd52ae5 paging_state: switch from shared_ptr to lw_shared_ptr
Change the way `service::pager::paging_state` is passed around
from `shared_ptr` to `lw_shared_ptr`. It's safe since
`paging_state` is final.

Tests: unit(dev, debug)

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
2020-02-16 17:23:36 +03:00
Alejo Sanchez
936cae6069 Range scan query counter
Fixes #5209

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2020-01-24 15:02:58 +01:00
Alejo Sanchez
f57513a809 Counter of queries doing full scan.
In scope of #5209

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2020-01-24 14:25:19 +01:00
Pavel Solodovnikov
412f1f946a cql: remove "mutable" on _opts in select_statement
_opts initialization can be safely done in the constructor, hence no need to make it mutable.
2019-11-26 17:55:10 +03:00
Konstantin Osipov
90346236ac cql: propagate const property through prepared statement tree.
cql_statement is a class representing a prepared statement in Scylla.
It is used concurrently during execution, so it is important that its
state is not changed by execution.

Add const qualifier to the execution methods family, throughout the
cql hierarchy.

Mark a few places which do mutate prepared statement state during
execution as mutable. While these are not affecting production today,
as code ages, they may become a source of latent bugs and should be
moved out of the prepared state or evaluated at prepare eventually:

cf_property_defs::_compaction_strategy_class
list_permissions_statement::_resource
permission_altering_statement::_resource
property_definitions::_properties
select_statement::_opts
2019-11-26 14:18:17 +03:00
Nadav Har'El
b38c3f1288 Merge "Add separate counters for accesses to system tables"
Merged patch series from Juliusz Stasiewicz:

Welcome to my first PR to Scylla!
The task was intended as a warm-up ("noob") exercise; its description is
here: #4182. Sorry, I also couldn't help it and did some scouting: edited
descriptions of some metrics and shortened a few annoyingly long LoC.
2019-11-19 15:21:56 +02:00
Juliusz Stasiewicz
1cfa458409 metrics: separate counters for `system' KS accesses
Resolves #4182. Metrics per system tables are accumulated separately,
depending on the origin of query (DB internals vs clients).
2019-11-14 13:14:39 +01:00
Rafael Ávila de Espíndola
d9337152f3 Use threads when executing user functions
This adds a requires_thread predicate to functions and propagates that
up until we get to code that already returns futures.

We can then use the predicate to decide if we need to use
seastar::async.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Piotr Sarna
fe18638de3 cql3: make DEFAULT_COUNT_PAGE_SIZE constant public
The constant will be later used in test scenarios.
2019-06-24 13:21:37 +02:00
Piotr Sarna
7a8b243ce4 cql3: split execute_base_query implementation
In order to handle aggregation queries correctly, the function that
returns base query results is split into two, so it's possible to
access raw query results, before they're converted into end-user
CQL message.
2019-06-24 12:57:03 +02:00
Dejan Mircevski
c3929aee3a Propagate GROUP BY indices to result_set_builder
Ensure that the indices recorded in select_statement are passed to
result_set_builder when one is created for processing the cell values.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-05-08 10:10:10 -04:00
Dejan Mircevski
274a77f45e Process GROUP BY columns into select_statement
Validate raw GROUP BY identifiers and translate them into
a select_statement member.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-05-08 10:10:10 -04:00
Piotr Sarna
87f6e37caa cql3: move finding index restrictions to prepare stage
Index restrictions that match a given index were recomputed
during execution stage, which is redundant and prone to errors.
Now, used index restrictions are cached in a prepare statement.
2019-03-20 10:20:22 +01:00
Piotr Sarna
75dd964751 cql3: add handling partition slices for local indexes
For local indexes, a slice will consist of the indexed column
followed by base clustering columns.
2019-03-20 10:20:01 +01:00
Piotr Sarna
b12162c8f5 cql3: add returning correct partition ranges for local indexes
Local indexes always share the partition range with their base.
2019-03-20 09:51:46 +01:00
Piotr Sarna
da8e8f18b3 cql3: make read_posting_list a member function
It already accepts several arguments that can be extracted from 'this',
and more will be added in the future.
New parameters include lambdas prepared during prepare stage
that define how to extract partition/clustering key ranges depending
on which index is used, so keeping it a static function will result
in unbounded number of parameters with complex types, which will
in turn make the function header almost illegible for a reader.
Hence, read_posting_list becomes a member function with easy access
to any data prepared during prepare stage.
2019-03-20 09:51:46 +01:00
Piotr Sarna
c743617236 cql3: unify max value for row limit and per-partition limit
Limits are stored as uint32_t everywhere, but in some places
int32_t was used, which created inconsistencies when comparing
the value to std::numeric_limits<Type>::max().
In order to solve inconsistencies, the types are unified to uint32_t,
and instead of explicitly calling numeric limit max,
an already existing constant value query::max_rows is utilized.
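
The inconsistency can be illustrated as follows. The `max_rows` constant below is a hypothetical stand-in for query::max_rows, and it is assumed to be the unsigned 32-bit maximum. A "no limit" check written against the signed maximum disagrees with one written against the shared constant:

```cpp
#include <cstdint>
#include <limits>

// Hypothetical constant playing the role of query::max_rows.
constexpr std::uint32_t max_rows = std::numeric_limits<std::uint32_t>::max();

// A check written against the signed maximum...
constexpr bool is_unlimited_signed(std::uint32_t limit) {
    return limit ==
           static_cast<std::uint32_t>(std::numeric_limits<std::int32_t>::max());
}

// ...and one written against the shared unsigned constant.
constexpr bool is_unlimited(std::uint32_t limit) {
    return limit == max_rows;
}
```

A limit that one site treats as "unlimited" is an ordinary finite limit for the other, which is why unifying on a single constant removes the problem.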

Fixes #4253

Message-Id: <4234712ff61a0391821acaba63455a34844e489b.1550683120.git.sarna@scylladb.com>
2019-02-21 13:56:02 +02:00
Piotr Sarna
41b466246e cql3: add get_per_partition_limit
2019-02-18 10:29:34 +01:00