Commit Graph

57 Commits

Author SHA1 Message Date
Vlad Zolotarov
baa6496816 service::storage_proxy: READ instrumentation: store trace state object in abstract_read_executor
Having a trace_state_ptr in the storage_proxy level is needed to trace code bits in this level.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:59 +03:00
Vlad Zolotarov
54a758dfff cql3::select_statement: simplify the tracing code by using a tracing::make_trace_info() helper
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:58 +03:00
Vlad Zolotarov
a5022a09a4 tracing: use 'write' instead of 'flush' and 'store' for consistency with seastar's API
In names of functions and variables:
s/flush_/write_/
s/store_/write_/

In a i_tracing_backend_helper:
s/flush()/kick()/

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:57 +03:00
Duarte Nunes
f013425bb5 query: Ensure timestamp is last param in read_command
Since the timestamp is not serialized, it must always be the last
parameter of query::read_command. This patch reorders it with the
partition_limit parameters and updates callers that specified a
timestamp argument.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1468312334-10623-1-git-send-email-duarte@scylladb.com>
2016-07-12 10:41:54 +01:00
Pekka Enberg
f64c25a495 cql3/statements/select_statement: Unify coding style
The coding style in select_statement.cc is very inconsistent which makes
the code hard to read. Clean that up.
Message-Id: <1464871790-21031-1-git-send-email-penberg@scylladb.com>
2016-06-02 16:17:21 +02:00
Vlad Zolotarov
4c17a422e0 cql3: instrument a SELECT query to send tracing info
Instrument a coordinator of a SELECT query to send tracing session
info to the corresponding replica Nodes.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-06-01 20:17:25 +03:00
Vlad Zolotarov
6e26909b02 query::read_command: add an optional trace_info field
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-06-01 20:17:19 +03:00
Avi Kivity
0135b4d5cd cql3: constify metadata users
Metadata usually doesn't change after it is created; make that visible in
the code, allowing further optimizations to be applied later.
Message-Id: <1464334638-7971-3-git-send-email-avi@scylladb.com>
2016-05-31 09:12:11 +03:00
Avi Kivity
25b3d74f45 cql3: Split select_statement::raw_statement into raw namespace
cql3::select_statement::raw_statement
    -> cql3::raw::select_statement
Message-Id: <1464609556-3756-4-git-send-email-avi@scylladb.com>
2016-05-31 09:09:30 +03:00
Avi Kivity
caf8d4f0e6 cql3: separate parsed_statement and parsed_statment::prepared
cql3::statements::parsed_statement
    -> cql3::statements::raw::parsed_statement
  cql3::statements::parsed_statement::prepared
    -> cql3::statements::prepared_statement
Message-Id: <1464609556-3756-2-git-send-email-avi@scylladb.com>
2016-05-31 09:09:10 +03:00
Gleb Natapov
7f6b12c97a query: add user provided timestamp to read_command
If read query supplies timestamp  move it to read_command to be
used later otherwise get local timestamp.
2016-05-24 15:19:35 +03:00
Calle Wilund
3906dc9f0d cql3::statements: Change check_access to future<> + implement 2016-04-19 11:49:05 +00:00
Pekka Enberg
38a54df863 Fix pre-ScyllaDB copyright statements
People keep tripping over the old copyrights and copy-pasting them to
new files. Search and replace "Cloudius Systems" with "ScyllaDB".

Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>
2016-04-08 08:12:47 +03:00
Tomasz Grabiec
1ecf9a7427 query: result_view: Introduce do_with()
Encapsulates linearization. Abstracts away the fact that result_view
can't work with discontiguous storage yet.
2016-02-26 12:26:13 +01:00
Pekka Enberg
dfcc48d82a transport: Add result metadata to PREPARED message
The gocql driver assumes that there's a result metadata section in the
PREPARED message. Technically, Scylla is not at fault here as the CQL
specification explicitly states in Section 4.2.5.4. ("Prepared") that the
section may be empty:

   - <result_metadata> is defined exactly as <metadata> but correspond to the
      metadata for the resultSet that execute this query will yield. Note that
      <result_metadata> may be empty (have the No_metadata flag and 0 columns, See
      section 4.2.5.2) and will be for any query that is not a Select. There is
      in fact never a guarantee that this will non-empty so client should protect
      themselves accordingly. The presence of this information is an

However, Cassandra always populates the section so lets do that as well.

Fixes #912.

Message-Id: <1456317082-31688-1-git-send-email-penberg@scylladb.com>
2016-02-24 14:43:24 +02:00
Tomasz Grabiec
09dc79f245 cql3: select_statement: Set desired serialization format 2016-02-15 17:05:55 +01:00
Tomasz Grabiec
9d11968ad8 Rename serialization_format to cql_serialization_format 2016-02-15 16:53:56 +01:00
Erich Keane
4197ceeedb raw_statement::is_reversed rewrite to avoid VLA
The is_reversed function uses a variable length array, which isn't
spec-abiding C++.  Additionally, the Clang compiler doesn't allow them
with non-POD types, so this function wouldn't compile.

After reading through the function it seems that the array wasn't
necessary as the check could be calculated inline rather than
separately.  This version should be more performant (since it no longer
requires the VLA lookup performance hit) while taking up less memory in
all but the smallest of edge-cases (when the clustering_key_size *
sizeof(optional<bool>) < sizeof(size_type) - sizeof(uint32_t) +
sizeof(bool).

This patch uses  relation_order_unsupported it assure that the exception
order is consistent with the preivous version.  The throw would
otherwise be moved into the initial for-loop.

There are two derrivations in behavior:
The first is the initial assert.  It however should not change the apparent
behavior besides causing orderings() to be looked up 2x in debug
situations.

The second is the conversion of is_reversed_ from an optional to a bool.
The result is that the final return value is now well-defined to be
false in the release-condition where orderings().size() == 0, rather
than be the ill-defined *is_reversed_ that was there previously.

Signed-off-by: Erich Keane <erich.keane@verizon.net>
Message-Id: <1454546285-16076-4-git-send-email-erich.keane@verizon.net>
2016-02-07 10:38:17 +02:00
Calle Wilund
e935c9cd34 select_statement: Make sure all aggregate queries use paging
Mainly to make sure we respect row limits. Since normal result
generation does not for aggregates.

Fixes #752 

Message-Id: <1452681048-30171-2-git-send-email-calle@scylladb.com>
2016-01-14 19:03:37 +02:00
Tomasz Grabiec
04eb58159a query: Add schema_version field to read_command 2016-01-11 10:34:51 +01:00
Pekka Enberg
d7db5e91b6 cql3: Move select_statement implementation to source file 2015-12-18 12:59:22 +02:00
Calle Wilund
fdc549cd47 select_statement: Handle aggregate queries
Fixes #549.

Being clinically absent-minded, aggregate query support (i.e. count(...))
was left out of the "paging" change set.

This adds repeated paged querying to do aggregate queries (similar to
origin). Uses "batched" paging.
2015-11-11 18:41:47 +01:00
Calle Wilund
ecd7674867 select_statement: Paging
Check if paging might be needed, and if so, use a paging object.
Similar to origin, but without all the filters.
2015-11-10 13:16:06 +01:00
Calle Wilund
4a1a17defc cql3::selection: Move result set building visitor to result_set_builder
Allows its use (and partial override - hint hint) in more place than
one.
2015-11-10 13:12:33 +01:00
Avi Kivity
2c3591cbd9 data_value de-any-fication
We use boost::any to convert to and from database values (stored in
serlialized form) and native C++ values.  boost::any captures information
about the data type (how to copy/move/delete etc.) and stores it inside
the boost::any instance.  We later retrieve the real value using
boost::any_cast.

However, data_value (which has a boost::any member) already has type
information as a data_type instance.  By teaching data_type intances about
the corresponding native type, we can elimiante the use of boost::any.

While boost::any is evil and eliminating it improves efficiency somewhat,
the real goal is growing native type support in data_type.  We will use that
later to store native types in the cache, enabling O(log n) access to
collections, O(1) access to tuples, and more efficient large blob support.
2015-10-30 17:38:51 +01:00
Pekka Enberg
1890d276b9 cql3: Add depends_on_{keyspace|column_family} helper to cql_statement
Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-10-15 09:18:52 +03:00
Avi Kivity
d5cf0fb2b1 Add license notices 2015-09-20 10:43:39 +03:00
Paweł Dziepak
1402125bd8 cql3: reverse order of bounds for reversed selects
Because of the reverse flag in partition slice rows inside bounds will
be returned in reversed order, however, we still have to make sure
that the bounds are in the expected order.

Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
2015-08-13 11:08:20 +02:00
Paweł Dziepak
7a7919a62e cql3: set properly partition_slice for SELECT DISTINCT
Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
2015-08-04 15:39:54 +02:00
Avi Kivity
bb3aef7fd9 Merge "Basic schema handling of compact strategy" from Glauber
"With this patchset, cqlsh's describe table command now work"
2015-07-23 16:47:29 +03:00
Glauber Costa
d1496944d9 sstables: handle compaction strategy
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-07-23 00:02:11 -04:00
Paweł Dziepak
281a8d97a5 cql3: make sure enough rows is returned in queries with IN clause
In case of queries with IN on partition key, ORDER BY and LIMIT it is
not known until after post-query sort which rows should be included in
the result set. To make sure that the output is correct each partition
specified in IN clause is queried for LIMIT rows and the excess data is
trimmed after the results are sorted.

Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
2015-07-23 02:38:33 +02:00
Paweł Dziepak
b14aa8760a cql3: support reversed order in select statements
When partition_slice option reversed is set the query will already
return rows in desired (i.e. reversed) order. That's not true, however,
for statements using IN restriction on partition keys. In such case
post-query ordering is still needed.

Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
2015-07-23 02:38:30 +02:00
Gleb Natapov
c58aea07c3 remove storage_proxy::query_local()
Use storage_proxy::query() instead.
2015-07-20 12:06:00 +02:00
Paweł Dziepak
f829cf373e cql3: selection::raw_selector::alias may be null
Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
2015-07-14 19:34:27 +02:00
Tomasz Grabiec
ad99e84505 storage_proxy: Take schema_ptr in query()
It will be needed for reconciliation.
2015-07-12 12:54:38 +02:00
Tomasz Grabiec
9724b84bb3 db: Fix query of partitions with no live clustered rows
When partition has no live regular rows, but has some data live in the
static row, then it should appear in the results, even though we
didn't select any static column.

To reproduce:

  create table cf (k blob, c blob, v blob, s1 blob static, primary key (k, c));
  update cf set s1 = 0x01 where k = 0x01;
  update cf set s1 = 0x02 where k = 0x02;
  select k from cf;

The "select" statement should return 2 rows, but was returning 0.

The following query worked fine, because static columns were included:

  select * from cf;

The data query should contain only live data, so we shouldn't write a
partition entry if it's supposed to be absent from the results. We
can'r tell that though until we've processed all the data. To solve
this problem, query result writer is using an optimistic approach,
where the partition header will be retracted from the buffer
(cheaply), if it turns out there's no live data in it.
2015-07-09 19:55:00 +02:00
Calle Wilund
333af5b61a Implement select_statement::execute_internal 2015-07-06 08:21:15 +02:00
Paweł Dziepak
290a7ca1bf query: add timestamp to read_command
Read command needs a timestamp in order to determine which cells have
already expired.

Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
2015-07-02 17:01:19 +02:00
Gleb Natapov
a338407e29 make storage_proxy object distributed
storage_proxy holds per cpu state now to track clustering, so it has to
be distributed otherwise smp setup does not work.
2015-06-17 15:14:06 +02:00
Gleb Natapov
b7155ad862 pass partitions_ranges separately from from read_command
partitions_ranges will be manipulated upon to be split for different
destination, so provide it separately from read_command to not copy the
later for each destination.
2015-06-11 15:18:07 +03:00
Calle Wilund
1631ce132e Add "storage_proxy&" argument to cql_statement::validate
To make db, schemas etc reachable
2015-06-03 10:13:52 +02:00
Calle Wilund
1d30b85ac6 Unify column_definition::column_kind and ::column_kind enums 2015-06-02 11:22:41 +02:00
Avi Kivity
db46ced43c cql3: fix shared_ptr misuse in select_statement
A shared_ptr is mutable, so it must be thread_local, not static.
2015-06-01 17:34:00 +02:00
Tomasz Grabiec
731a63e371 schema: Embed raw_schema inside schema
Public fields got encapsulated.
2015-04-24 18:01:01 +02:00
Avi Kivity
3d38708434 cql3: pass a database& instance to most foo::raw::prepare() variants
To prepare a user-defined type, we need to look up its name in the keyspace.
While we get the keyspace name as an argument to prepare(), it is useless
without the database instance.

Fix the problem by passing a database reference along with the keyspace.
This precolates through the class structure, so most cql3 raw types end up
receiving this treatment.

Origin gets along without it by using a singleton.  We can't do this due
to sharding (we could use a thread-local instance, but that's ugly too).

Hopefully the transition to a visitor will clean this up.
2015-04-20 16:15:34 +03:00
Tomasz Grabiec
ee906471ab cql3: Move method implementations to .cc 2015-04-15 20:44:59 +02:00
Tomasz Grabiec
00f99cefd4 db: split query.hh to reduce header dependencies 2015-04-15 20:44:59 +02:00
Tomasz Grabiec
878a740b9d db: Write query results in serialized form
This gives about 30% increase in tps in:

  build/release/tests/perf/perf_simple_query -c1 --query-single-key

This patch switches query result format from a structured one to a
serialized one. The problems with structured format are:

  - high level of indirection (vector of vectors of vectors of blobs), which
    is not CPU cache friendly

  - high allocation rate due to fine-grained object structure

On replica side, the query results are probably going to be serialized
in the transport layer anyway, so this change only subtracts
work. There is no processing of the query results on replica other
than concatenation in case of range queries. If query results are
collected in serialized form from different cores, we can concatenate
them without copying by simply appending the fragments into the
packet. This optimization is not implemented yet.

On coordinator side, the query results would have to be parsed from
the transport layer buffers anyway, so this also doesn't add work, but
again saves allocations and copying. The CQL server doesn't need
complex data structures to process the results, it just goes over it
linearly consuming it. This patch provides views, iterators and
visitors for consuming query results in serialized form. Currently the
iterators assume that the buffer is contiguous but we could easily
relax this in future so that we can avoid linearization of data
received from seastar sockets.

The coordinator side could be optimized even further for CQL queries
which do not need processing (eg. select * from cf where ...)  we
could make the replica send the query results in the format which is
expected by the CQL binary protocol client. So in the typical case the
coordinator would just pass the data using zero-copy to the client,
prepending a header.

We do need structure for prefetched rows (needed by list
manipulations), and this change adds query result post-processing
which converts serialized query result into a structured one, tailored
particularly for prefetched rows needs.

This change also introduces partition_slice options. In some queries
(maybe even in typical ones), we don't need to send partition or
clustering keys back to the client, because they are already specified
in the query request, and not queried for. The query results hold now
keys as optional elements. Also, meta-data like cell timestamp and
ttl is now also optional. It is only needed if the query has
writetime() or ttl() functions in it, which it typically won't have.
2015-04-15 20:44:50 +02:00
Tomasz Grabiec
f9afd82231 cql3: Remove unnecessary indirection to result_set_builder 2015-04-15 20:33:49 +02:00