68 Commits

Author SHA1 Message Date
Avi Kivity
257c17a87a Merge "Don't depend on seastar::make_(lw_)?shared idiosyncrasies" from Rafael
"
While working on another patch I was getting odd compiler errors
saying that a call to ::make_shared was ambiguous. The reason was that
seastar has both:

template <typename T, typename... A>
shared_ptr<T> make_shared(A&&... a);

template <typename T>
shared_ptr<T> make_shared(T&& a);

The second variant doesn't exist in std::make_shared.

This series drops the dependency in scylla, so that a future change
can make seastar::make_shared a bit more like std::make_shared.
"

* 'espindola/make_shared' of https://github.com/espindola/scylla:
  Everywhere: Explicitly instantiate make_lw_shared
  Everywhere: Add a make_shared_schema helper
  Everywhere: Explicitly instantiate make_shared
  cql3: Add a create_multi_column_relation helper
  main: Return a shared_ptr from defer_verbose_shutdown
2020-08-02 19:51:24 +03:00
Botond Dénes
92a7b16cba query: read_command: add max_result_size
This field will replace max size which is currently passed once per
established rpc connection via the CLIENT_ID verb and stored as an
auxiliary value on the client_info. For now it is unused, but we update
all sites creating a read command to pass the correct value to it. In the
next patch we will phase out the old max size and use this field to pass
max size on each verb instead.
2020-07-28 18:00:29 +03:00
Rafael Ávila de Espíndola
ad6d65dbbd Everywhere: Explicitly instantiate make_shared
seastar::make_shared has an overload taking a T&&. There is no such
overload of std::make_shared:

https://en.cppreference.com/w/cpp/memory/shared_ptr/make_shared

This means that we have to move from

    make_shared(T(...))

to

    make_shared<T>(...)

if we don't want to depend on the idiosyncrasies of
seastar::make_shared.
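
The move described above can be sketched with the standard library alone. This is a minimal illustration, not code from the patch: the `relation` type and the `make_relation` helper are hypothetical names, and `std::make_shared` stands in for whichever `make_shared` is in scope.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical type used only for this illustration.
struct relation {
    std::string column;
    int value;
    relation(std::string c, int v) : column(std::move(c)), value(v) {}
};

// The old form relies on an overload taking T&&, which std::make_shared
// does not provide:
//     auto r = make_shared(relation("a", 1));
// The portable form names T explicitly and forwards constructor arguments.
inline std::shared_ptr<relation> make_relation(std::string column, int value) {
    return std::make_shared<relation>(std::move(column), value);
}
```

Naming the type explicitly also avoids constructing a temporary that is then moved into the allocated object.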

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-07-21 10:33:49 -07:00
Alejo Sanchez
d1521e6721 lwt: validate before constructing metadata
LWT batch conditions can't span multiple tables.
This was detected in batch_statement::validate() called in ::prepare().
But ::cas_result_set_metadata() was built in the constructor,
causing a bitset assert/crash in a reported scenario.
This patch moves validate() to the constructor before building metadata.

Closes #6332

Tested with https://github.com/scylladb/scylla-dtest/pull/1465

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2020-05-18 10:40:21 +02:00
Alejo Sanchez
74edb3f20b lwt: consistent exception message case
Fix case Batch -> BATCH to match similar exception in same file

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2020-05-18 10:40:06 +02:00
Pavel Solodovnikov
f6e765b70f cql3: pass column_specification via lw_shared_ptr
`column_specification` class is marked as "final": it's safe
to use non-polymorphic pointer "lw_shared_ptr" instead of a
more generic "shared_ptr".

tests: unit(dev, debug)

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20200427084016.26068-1-pa.solodovnikov@scylladb.com>
2020-04-27 12:47:42 +03:00
Konstantin Osipov
18b9bb57ac lwt: rename metrics to match accepted terminology
Rename inherited metrics cas_propose and cas_commit
to cas_accept and cas_learn respectively.

A while ago we made a decision to stick to widely accepted
terms for Paxos rounds: prepare, accept, learn. The rest
of the code is using these terms, so rename the metrics
to avoid confusion/technical debt.

While at it, rename a few internal methods and functions.

Fixes #6169

Message-Id: <20200414213537.129547-1-kostja@scylladb.com>
2020-04-15 12:20:30 +02:00
Rafael Ávila de Espíndola
c5795e8199 everywhere: Replace engine().cpu_id() with this_shard_id()
This is a bit simpler and might allow removing a few includes of
reactor.hh.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200326194656.74041-1-espindola@scylladb.com>
2020-03-27 11:40:03 +03:00
Rafael Ávila de Espíndola
c0072eab30 everywhere: Be more explicit that we don't want std::make_shared
If sstring is made an alias to std::string, ADL causes std::make_shared
to be found. Explicitly ask for ::make_shared.
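
The lookup issue can be reproduced in a small sketch. Everything here is an assumption made for illustration: `mylib` stands in for a project namespace pulled in with a using-directive, and `named` and `make_named` are hypothetical.

```cpp
#include <memory>
#include <string>
#include <utility>

namespace mylib {
// Counts calls so a caller can check which make_shared was selected.
inline int& call_count() {
    static int n = 0;
    return n;
}

// Hypothetical stand-in for a project-local make_shared.
template <typename T, typename... A>
std::shared_ptr<T> make_shared(A&&... a) {
    ++call_count();
    return std::shared_ptr<T>(new T(std::forward<A>(a)...));
}
} // namespace mylib

using namespace mylib;

struct named {
    std::string name;
    explicit named(std::string n) : name(std::move(n)) {}
};

inline std::shared_ptr<named> make_named(const std::string& n) {
    // An unqualified make_shared<named>(n) would also consider
    // std::make_shared through ADL, because n is a std::string, and the
    // call would be ambiguous. Qualifying with :: disables ADL.
    return ::make_shared<named>(n);
}
```

The qualified name is looked up in the global namespace, which finds `mylib::make_shared` through the using-directive and never considers namespace `std`.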

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-03-10 13:13:48 -07:00
Pavel Emelyanov
a0a0d40267 cql3: Use proxy arg in batch_statement::verify_batch_size
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-02-24 11:17:47 +03:00
Pavel Emelyanov
6892dbdde7 cql3: Add storage_proxy argument to .check_access method
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-02-24 11:17:19 +03:00
Pavel Solodovnikov
a46f235092 cql3: prefer passing schema as const ref instead of shared_ptr
De-pointerize cql3 code APIs further: change some call sites
to pass `schema` as const-ref instead of `shared_ptr`.

The affected functions are known to always expect a non-null
pointer to schema and don't store or pass the pointer anywhere
else, so it's safe to give them just a reference.

Tests: unit(dev, debug)

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20200218142338.69824-1-pa.solodovnikov@scylladb.com>
2020-02-18 20:13:10 +02:00
Piotr Jastrzebski
abd76e566f dht::shard_of: stop calling global_partitioner()
Take const schema& as a parameter of shard_of and
use it to obtain partitioner instead of calling
global_partitioner().

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-02-17 10:23:16 +01:00
Konstantin Osipov
d4866c1a28 cql3: remove prepared alias for prepared_statement
cql3 has cql_statement, parsed_statement and prepared_statement
classes, which, largely, stand for the same thing. prepared was
an alias for prepared_statement which only required an extra
tag jump in IDE and carried no meaning.
2020-02-12 16:44:43 +03:00
Gleb Natapov
2876482373 lwt: account for cases where LWT requests were moved to another shard in statistics
Now that we bounce lwt requests to the correct shard before calling into
storage_proxy, the cross-shard op accounting does not account for bounced
lwt statements. Fix that by incrementing the corresponding counter when
returning a "bounce" reply.

Message-Id: <20200203122011.GH26048@scylladb.com>
2020-02-04 10:20:28 +02:00
Pavel Solodovnikov
e1b22b6a4c cql3: get rid of lw_shared_ptr for variable_specifications
`parsed_statement::get_bound_variables` is assumed to always
return a nonnull pointer to `variable_specifications` instance.

In this case using a pointer is superfluous and can be safely
replaced by a plain reference.

Also add a default ctor and a utility method `set_bound_variables`
to the `variable_specifications` class to actually reset the
contents of the class instance.

Tests: unit(dev, debug)

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20200120195839.164296-1-pa.solodovnikov@scylladb.com>
2020-01-22 12:51:02 +02:00
Gleb Natapov
d28dd4957b lwt: Process lwt request on an owning shard
LWT is much more efficient if a request is processed on a shard that owns
a token for the request. This is because otherwise the processing will
bounce to an owning shard multiple times. The patch proposes a way to
move the request to the correct shard before running lwt. It works by
returning an error from lwt code if the shard is not the correct one,
specifying the shard the request should be moved to. The error is processed
by transport code that jumps to the correct shard and re-processes the
incoming message there.
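
The bounce mechanism described above can be sketched as follows. This is a simplified, synchronous model under stated assumptions: the types `bounce_to_shard` and `done`, the modulo-based `owning_shard`, and the `bounces` counter are all hypothetical and do not reflect the actual transport or storage_proxy interfaces.

```cpp
#include <cstdint>
#include <string>
#include <variant>

// Reply telling the caller to retry on another shard (hypothetical type).
struct bounce_to_shard { unsigned shard; };
// Reply carrying the statement result (hypothetical type).
struct done { std::string result; };
using reply = std::variant<done, bounce_to_shard>;

// Assumed token-to-shard mapping, for illustration only.
inline unsigned owning_shard(std::uint64_t token, unsigned shard_count) {
    return static_cast<unsigned>(token % shard_count);
}

// LWT code: refuses to run on a non-owning shard and names the owner.
inline reply execute_lwt(std::uint64_t token, unsigned this_shard,
                         unsigned shard_count) {
    unsigned owner = owning_shard(token, shard_count);
    if (owner != this_shard) {
        return bounce_to_shard{owner};
    }
    return done{"applied"};
}

// Transport code: on a bounce reply, counts it and re-processes the
// request on the shard named in the reply.
inline std::string process_request(std::uint64_t token, unsigned this_shard,
                                   unsigned shard_count, unsigned& bounces) {
    reply r = execute_lwt(token, this_shard, shard_count);
    if (auto* b = std::get_if<bounce_to_shard>(&r)) {
        ++bounces;
        r = execute_lwt(token, b->shard, shard_count);
    }
    return std::get<done>(r).result;
}
```

With four shards, a request for token 5 arriving on shard 0 is bounced once to shard 1, while a request for token 4 runs in place.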
2020-01-13 10:26:02 +02:00
Avi Kivity
f7d69b0428 Revert "Merge "bouncing lwt request to an owning shard" from Gleb"
This reverts commit 64cade15cc, reversing
changes made to 9f62a3538c.

This commit is suspected of corrupting the response stream.

Fixes #5479.
2019-12-17 11:06:10 +02:00
Gleb Natapov
964c532c4f lwt: Process lwt request on an owning shard
LWT is much more efficient if a request is processed on a shard that owns
a token for the request. This is because otherwise the processing will
bounce to an owning shard multiple times. The patch proposes a way to
move the request to the correct shard before running lwt. It works by
returning an error from lwt code if the shard is not the correct one,
specifying the shard the request should be moved to. The error is processed
by transport code that jumps to the correct shard and re-processes the
incoming message there.
2019-12-11 14:41:31 +02:00
Konstantin Osipov
90346236ac cql: propagate const property through prepared statement tree.
cql_statement is a class representing a prepared statement in Scylla.
It is used concurrently during execution, so it is important that its
state is not changed by execution.

Add const qualifier to the execution methods family, throughout the
cql hierarchy.

Mark a few places which do mutate prepared statement state during
execution as mutable. While these are not affecting production today,
as code ages, they may become a source of latent bugs and should be
moved out of the prepared state or evaluated at prepare eventually:

cf_property_defs::_compaction_strategy_class
list_permissions_statement::_resource
permission_altering_statement::_resource
property_definitions::_properties
select_statement::_opts
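
A minimal sketch of the pattern, assuming a hypothetical `statement_sketch` class rather than any of the classes listed above: the execution method is const, and the one piece of state that execution still touches is marked `mutable` so that it stands out.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical prepared-statement-like class, for illustration only.
class statement_sketch {
    std::string _query;
    // State still mutated during execution; marking it mutable makes the
    // exception to const-correctness visible at the declaration.
    mutable std::size_t _executions = 0;
public:
    explicit statement_sketch(std::string q) : _query(std::move(q)) {}

    // Execution cannot modify _query because the method is const.
    std::string execute() const {
        ++_executions;
        return _query;
    }

    std::size_t executions() const { return _executions; }
};
```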
2019-11-26 14:18:17 +03:00
Nadav Har'El
b38c3f1288 Merge "Add separate counters for accesses to system tables"
Merged patch series from Juliusz Stasiewicz:

Welcome to my first PR to Scylla!
The task was intended as a warm-up ("noob") exercise; its description is
here: #4182. Sorry, I also couldn't help it and did some scouting: edited
descriptions of some metrics and shortened a few annoyingly long LoC.
2019-11-19 15:21:56 +02:00
Juliusz Stasiewicz
1cfa458409 metrics: separate counters for `system' KS accesses
Resolves #4182. Metrics per system tables are accumulated separately,
depending on the origin of query (DB internals vs clients).
2019-11-14 13:14:39 +01:00
Vladimir Davydov
25aeefd6f3 cql: fix CAS consistency level validation
This patch resurrects Cassandra's code validating a consistency level
for CAS requests. Basically, it makes CAS requests use a special
function instead of validate_for_write to make error messages more
coherent.

Note, we don't need to resurrect requireNetworkTopologyStrategy as
EACH_QUORUM should work just fine for both CAS and non-CAS writes.
Looks like it is just an artefact of a rebase in the Cassandra
repository.
2019-11-14 12:15:39 +01:00
Konstantin Osipov
6159c012db schema: pre-allocate the bitset of column_set
The number of columns is usually small, and avoiding
a resize speeds up bit manipulation functions.
2019-11-13 11:41:51 +03:00
Vladimir Davydov
f0075ba845 cql: account cas requests separately
This patch adds "type" label to the following CQL metrics:

  inserts
  updates
  deletes
  batches
  statements_in_batches

The label is set to "cas" for conditional statements and "non-cas" for
unconditional statements.

Note, for a batch to be accounted as CAS, it is enough to have just one
conditional statement. In this case all statements within the batch are
accounted as CAS as well.
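
The labeling rule above can be sketched in a few lines. The `statement` struct and `batch_type_label` function are hypothetical names for this illustration; only the "cas" and "non-cas" label values come from the commit message.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical statement descriptor: only the conditional flag matters.
struct statement {
    bool has_conditions;
};

// A batch is labeled "cas" if at least one statement is conditional;
// the same label then applies to every statement in the batch.
inline std::string batch_type_label(const std::vector<statement>& batch) {
    bool cas = std::any_of(batch.begin(), batch.end(),
                           [](const statement& s) { return s.has_conditions; });
    return cas ? "cas" : "non-cas";
}
```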
2019-10-30 13:44:35 +03:00
Konstantin Osipov
e555dc502e lwt: implement basic lightweight transactions support
Support single-statement conditional updates as well as batches.

This patch almost fully rewrites column_condition.cc, implementing
is_satisfied_by().

Most of the remaining complications in column_condition implementation
come from the need to properly handle frozen and multi-cell
collection in predicates - up until now it was not possible
to compare entire collection values between each other. This is further
complicated since multi-cell lists and sets are returned as maps.

We can no longer assume that the columns fetched by prefetch operation
are non-frozen collections. The IF EXISTS/IF NOT EXISTS condition
fetches all columns; besides, a column may be needed to check another
condition.

When fetching the old row for LWT or to apply updates on list/columns,
we now calculate precisely the list of columns to fetch.

The primary key columns are also included in CAS batch result set,
and are thus also prefetched (the user needs them to figure out which
statements failed to apply).

The patch is cross-checked for compatibility with cassandra-3.11.4-1545-g86812fa502
but does deviate from the origin in handling of conditions on static
row cells. This is addressed in future series.
2019-10-27 23:42:49 +03:00
Konstantin Osipov
df28985295 lwt: introduce cql_statement_opt_metadata
cql_statement_opt_metadata is an interim node
in cql (prepared) statement hierarchy parenting
modification_statement and batch_statement. If there
is IF condition in such statements, they return a result set,
and thus have a result set metadata.

The metadata itself is filled in a subsequent patch.
2019-10-27 23:42:03 +03:00
Konstantin Osipov
e8c13efb41 lwt: move mutation hashers to mutation.hh
Prepare mutation hashers for reuse in CAS implementation.
Message-Id: <20190930202409.40561-2-kostja@scylladb.com>
2019-10-01 19:49:31 +02:00
Konstantin Osipov
6cde985946 lwt: remove code that no longer serves as a reference
Remove ifdef'ed Java code, since LWT implementation
is based on the current state of the origin.
Message-Id: <20190930201022.40240-2-kostja@scylladb.com>
2019-10-01 19:46:15 +02:00
Gleb Natapov
e72a105b5e lwt: Pass client_state reference all the way to storage_proxy::query
client_state holds a state to generate monotonically increasing unique
timestamp. Queries with a SERIAL consistency level need it to generate
a paxos round.
2019-09-26 11:44:00 +03:00
Gleb Natapov
6a4207f202 Pass service permit to storage_proxy
Current cql transport code acquires a permit before processing a query and
releases it when the query gets a reply, but some queries leave work behind.
If the work is allowed to accumulate without any limit, a server may
eventually run out of memory. To prevent that, the permit system should
account for the background work as well. The patch is a first step in
this direction. It passes a permit down to storage proxy, where it will
later be held by background work.
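
The idea of a permit that outlives the reply can be sketched with shared ownership. This is only an illustration under assumptions: `limiter`, `permit_units`, and `make_permit` are hypothetical names, and the `service_permit` alias here is a plain `std::shared_ptr`, not the actual type.

```cpp
#include <memory>

// Hypothetical accounting object: tracks how many permits are outstanding.
struct limiter {
    int in_flight = 0;
};

// RAII unit: takes a slot on construction and returns it on destruction.
class permit_units {
    limiter& _l;
public:
    explicit permit_units(limiter& l) : _l(l) { ++_l.in_flight; }
    ~permit_units() { --_l.in_flight; }
    permit_units(const permit_units&) = delete;
    permit_units& operator=(const permit_units&) = delete;
};

// Copyable handle: the slot is released only when the last copy goes away,
// so background work can keep it alive after the reply has been sent.
using service_permit = std::shared_ptr<permit_units>;

inline service_permit make_permit(limiter& l) {
    return std::make_shared<permit_units>(l);
}
```

Copying the handle into whatever object represents the background work keeps the slot accounted for until that work finishes.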
2019-08-12 10:20:43 +03:00
Duarte Nunes
fa2b0384d2 Replace std::experimental types with C++17 std version.
Replace stdx::optional and stdx::string_view with the C++ std
counterparts.

Some instances of boost::variant were also replaced with std::variant,
namely those that called seastar::visit.

Scylla now requires GCC 8 to compile.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20190108111141.5369-1-duarte@scylladb.com>
2019-01-08 13:16:36 +02:00
Botond Dénes
1865e5da41 treewide: remove include database.hh from headers where possible
Many headers don't really need to include database.hh, the include can
be replaced by forward declarations and/or including the actually needed
headers directly. Some headers don't need this include at all.

Each header was verified to be compilable on its own after the change,
by including it into an empty `.cc` file and compiling it. `.cc` files
that used to get `database.hh` through headers that no longer include it
were changed to include it themselves.
2018-12-14 08:03:57 +02:00
Avi Kivity
cb7ee5c765 cql3: convert sprint() to format()
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().

Mechanically converted with https://github.com/avikivity/unsprint.
2018-11-01 13:16:17 +00:00
Avi Kivity
512baf536f storage_proxy: implement write timeouts
Require a timeout parameter for storage_proxy::mutate_begin() and
all its callers (all the way to thrift and cql modification_statement
and batch_statement).

This should fix spurious debug-mode test failures, where overcommit
and general debug slowness result in the default timeouts being
exceeded. Since the tests use infinite timeouts, they should not
time out any more.

Tests: unit (release), with an extra patch that aborts
    when a non-infinite timeout is detected.
Message-Id: <20180707204424.17116-1-avi@scylladb.com>
2018-07-08 10:27:03 +01:00
Paweł Dziepak
e55034a33e cql3: batch_statement: use external_memory_usage() to get mutation size
batch_statement::verify_batch_size() verifies that the total size of
mutations generated by the batch statement is smaller than certain
configurable thresholds. This is done by a custom mutation_partition
visitor, which violates atomic_cell_view::value() preconditions by
calling it even for dead cells.

The simplest solution is to use
mutation_partition::external_memory_usage() instead.

Message-Id: <20180619131405.12601-1-pdziepak@scylladb.com>
2018-06-19 16:26:52 +03:00
Avi Kivity
9479d3f345 cql: make batch_statement execution_stage scheduling aware
Inherit scheduling from the caller, preventing a fall back into the main group.
2018-06-18 18:30:21 +03:00
Paweł Dziepak
aa25f0844f atomic_cell: introduce fragmented buffer value interface
As a preparation for the switch to the new cell representation this
patch changes the type returned by atomic_cell_view::value() to one that
requires explicit linearisation of the cell value. Even though the value
is still implicitly linearised (and only when managed by the LSA) the
new interface is the same as the target one so that no more changes to
its users will be needed.
2018-05-31 15:51:11 +01:00
Avi Kivity
b70febe246 cql: cql_statement: remove execute_internal()
With no callers, it can be safely removed.
2018-05-27 12:40:27 +03:00
Vlad Zolotarov
9723988926 cql3::statements::batch_statement: introduce a single_statement class
This is a helper class needed to control the handling process of a single
statement in the current batch. In particular it has the boolean defining
if the authorization is needed for this statement.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-05-22 20:15:03 -04:00
Avi Kivity
49fdf01b5d cql3: define and populate timeout_config_selector
Determine which timeout we need to apply at prepare time. We
don't know the numerical value (since it depends on whoever is
executing the query, not just the statement type), but we know
which member of timeout_config we need, so determine and remember
that.
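
One way to remember "which member of timeout_config we need" is a pointer to data member, sketched below. The field names, the use of `std::chrono::milliseconds`, and the `prepared_sketch` type are assumptions for this illustration.

```cpp
#include <chrono>

// Per-caller timeouts; the field names are assumptions for this sketch.
struct timeout_config {
    std::chrono::milliseconds read_timeout;
    std::chrono::milliseconds write_timeout;
    std::chrono::milliseconds cas_timeout;
};

// Selects one field of timeout_config without knowing its value yet.
using timeout_config_selector = std::chrono::milliseconds timeout_config::*;

// Hypothetical prepared statement: the selector is fixed at prepare time.
struct prepared_sketch {
    timeout_config_selector selector;
};

// At execution time the caller's configuration supplies the actual value.
inline std::chrono::milliseconds timeout_for(const prepared_sketch& s,
                                             const timeout_config& cfg) {
    return cfg.*(s.selector);
}
```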
2018-04-30 13:19:49 +03:00
Avi Kivity
f7b102238a cql3: change cql_statement methods to accept a local storage_proxy
The storage_proxy represents the entire cluster, so there's never a need
to access it on a remote shard; the local shard instance will contact
remote shard or remote nodes as needed.

Simplify the API by passing storage_proxy references instead of
seastar::sharded<storage_proxy> references. query_processor and
other callers are adjusted to call seastar::sharded::local() first.
Message-Id: <20180415142656.25370-2-avi@scylladb.com>
2018-04-16 10:18:28 +02:00
Piotr Jastrzebski
05b56fcfb0 mutation_partition: Add support for specifying continuity
This will allow expressing lack of information about certain ranges of
rows (including the static row), which will be used in cache to
determine if information in cache is complete or not.

Continuity is represented internally using flags on row entries. The
key range between two consecutive entries is continuous iff
rows_entry::continuous() is true for the later entry. The range
starting after the last entry is assumed to be continuous. The range
corresponding to the key of the entry is continuous iff
rows_entry::dummy() is false.
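
The continuity rules above can be sketched over a sorted vector. This is a simplified model, not the real data structure: `entry` uses integer keys and plain bool flags, and `is_continuous_at` is a hypothetical helper.

```cpp
#include <algorithm>
#include <vector>

// Simplified row entry: integer key plus the two flags described above.
struct entry {
    int key;
    bool continuous;  // covers the range between the previous entry and this one
    bool dummy;       // a dummy entry marks a position but holds no row
};

// Returns whether information about `key` is complete.
// `entries` must be sorted by key.
inline bool is_continuous_at(const std::vector<entry>& entries, int key) {
    auto it = std::lower_bound(entries.begin(), entries.end(), key,
                               [](const entry& e, int k) { return e.key < k; });
    if (it == entries.end()) {
        return true;            // after the last entry: assumed continuous
    }
    if (it->key == key) {
        return !it->dummy;      // the entry's own key: continuous iff not dummy
    }
    return it->continuous;      // in a gap: decided by the later entry
}
```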

[tgrabiec:
  - based on the following commits:
     4a5bf75 - Piotr Jastrzebski : mutation_partition: introduce dummy rows_entry
     773070e - Piotr Jastrzebski : mutation_partition: add continuity flag to rows_entry
  - documented that partition tombstone is always complete
  - require specifying the partition tombstone when creating an incomplete entry
  - replaced rows_entry(dummy_tag, ...) constructor with more general
    rows_entry(position_in_partition, ...)
  - documented continuity semantics on mutation_partition
  - fixed _static_row_cached being lost by mutation_partition copy constructors
  - fixed conversion to streamed_mutation to ignore dummy entries
  - fixed mutation_partition serializer to drop dummy entries
  - documented semantics of continuity on mutation_partition level
  - dropped assumptions that dummy entries can be only at the last position
  - changed equality to ignore continuity completely, rather than
    partially (it was not ignoring dummy entries, but ignoring
    continuity flag)
  - added printout of continuity information in mutation_partition
  - fixed handling of empty entries in apply_reversibly() with regards
    to continuity; we no longer can remove empty entries before
    merging, since that may affect continuity of the right-hand
    mutation. Added _erased flag.
  - fixed mutation_partition::clustered_row() with dummy==true to not ignore the key
  - fixed partition_builder to not ignore continuity
  - renamed dummy_tag_t to dummy_tag. _t suffix is reserved.
  - standardized all APIs on is_dummy and is_continuous bool_class:es
  - replaced add_dummy_entry() with ensure_last_dummy() with safer semantics
  - dropped unused remove_dummy_entry()
  - simplified and inlined cache_entry::add_dummy_entry()
  - fixed mutation_partition(incomplete_tag) constructor to mark all row ranges as discontinuous
  ]
2017-06-24 18:06:11 +02:00
Avi Kivity
ebaeefa02b Merge seastar upstream (seastar namespace)
 - introduced "seastarx.hh" header, which does a "using namespace seastar";
 - 'net' namespace conflicts with seastar::net, renamed to 'netw'.
 - 'transport' namespace conflicts with seastar::transport, renamed to
   cql_transport.
 - "logger" global variables now conflict with logger global type, renamed
   to xlogger.
 - other minor changes
2017-05-21 12:26:15 +03:00
Pekka Enberg
dfee4d2bb0 cql3: Fix partition key bind indices for prepared statements
Fix the CQL front-end to populate the partition key bind index array in
result message prepared metadata, which is needed for CQL binary
protocol v4 to function correctly.

Fixes #2355.

Message-Id: <1494247871-3148-1-git-send-email-penberg@scylladb.com>
2017-05-08 16:33:17 +03:00
Duarte Nunes
4e693383f7 mutation_partition: Use row_tombstone
This patch replaces the current row tombstone representation by a
row_tombstone.

The intent of the patch is thus to reify the idea of shadowable
tombstones, which up until now we considered all materialized view row
tombstones to be.

We need to distinguish shadowable from non-shadowable row tombstones
to support scenarios such as, when inserting to a table with a
materialized view:

1. insert into base (p, v1, v2) values (3, 1, 3) using timestamp 1
2. delete from base using timestamp 2 where p = 3
3. insert into base (p, v1) values (3, 1) using timestamp 3

These should yield a view row where v2 is definitely null, but with
the current implementation, v2 will pop back with its value v2=3@TS=1,
even though it's dead in the base row. This is because the row
tombstone inserted at 2) is a shadowable one.

This patch only addresses the memory representation of such
row_tombstones.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-04-25 11:46:33 +02:00
Vlad Zolotarov
75fbc7c558 cql3::statements::batch_statement: add a constructor that doesn't receive the "bound_terms" value
This constructor should be used when we know that there are no bound terms in the current
batch statement.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2017-04-12 12:24:08 -04:00
Vlad Zolotarov
ff55b76562 cql3::query_processor: use weak_ptr for passing the prepared statements around
Use seastar::checked_ptr<weak_ptr<prepared_statement>> instead of shared_ptr for passing prepared statements around.
This allows an easy tracking and handling of statements invalidation.

This implementation will throw an exception every time an invalidated
statement reference is dereferenced.
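
The throw-on-invalidated behavior can be approximated with the standard library. This sketch uses `std::weak_ptr` rather than the seastar types named above, and `checked_weak_ptr` and its error message are hypothetical.

```cpp
#include <memory>
#include <stdexcept>

// Hypothetical wrapper: access through it throws once the owner has
// dropped the object, instead of yielding a null pointer.
template <typename T>
class checked_weak_ptr {
    std::weak_ptr<T> _p;
public:
    explicit checked_weak_ptr(const std::shared_ptr<T>& p) : _p(p) {}

    // Returns a live reference or throws if the object was invalidated.
    std::shared_ptr<T> get() const {
        auto sp = _p.lock();
        if (!sp) {
            throw std::runtime_error("statement has been invalidated");
        }
        return sp;
    }
};
```

The owner invalidates a statement simply by dropping its owning pointer; every outstanding handle then fails loudly on its next use.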

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2017-04-12 12:24:03 -04:00
Avi Kivity
27c42359bc Merge seastar upstream
* seastar 6b21197...2ebe842 (6):
  > Merge "Various improvements to execution stages" from Paweł
  > app-template: allow apps to specify a name for help message
  > bool_class: avoid initializing object of incomplete type
  > app-template: make sure we can still get help with required options
  > prometheus: Http handler that returns prometheus 0.4 protobuf or text format
  > Update DPDK to 17.02

Includes patch from Pawel to adjust to updated execution_stage interface.
2017-03-26 10:50:21 +03:00
Gleb Natapov
d34f3a0440 batchlog: introduce batch_size_fail_threshold_in_kb option
Add batch_size_fail_threshold_in_kb to prevent a huge batch from being
applied and causing trouble. Also do not warn or fail if only one
partition is affected.
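
The resulting check can be sketched as below. The function name, the `batch_verdict` enum, and the separate warn threshold parameter are assumptions for this illustration; only the fail threshold option and the single-partition exemption come from the commit message.

```cpp
#include <cstddef>

enum class batch_verdict { ok, warn, fail };

// Compares the batch size against thresholds given in kilobytes.
// Batches touching at most one partition are never warned about or failed.
inline batch_verdict check_batch_size(std::size_t size_bytes,
                                      std::size_t partitions,
                                      std::size_t warn_threshold_kb,
                                      std::size_t fail_threshold_kb) {
    if (partitions <= 1) {
        return batch_verdict::ok;
    }
    if (size_bytes > fail_threshold_kb * 1024) {
        return batch_verdict::fail;
    }
    if (size_bytes > warn_threshold_kb * 1024) {
        return batch_verdict::warn;
    }
    return batch_verdict::ok;
}
```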

Fixes: #2128

Message-Id: <20170309111247.GE8197@scylladb.com>
2017-03-09 12:20:17 +01:00