57 Commits

Author SHA1 Message Date
Paweł Dziepak
e95f4eaee4 Merge "partition_limit: Don't count dead partitions" from Duarte
"This patch series ensures we don't count dead partitions (i.e.,
partitions with no live rows) towards the partition_limit. We also
enforce the partition limit at the storage_proxy level, so that
limits with smp > 1 work correctly."

(cherry picked from commit 5f11a727c9)
2016-08-03 12:44:32 +03:00
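The counting rule described in the merge above can be sketched roughly as follows. This is an illustration only, not the code from the series: `partition_result`, `live_rows`, and `apply_partition_limit` are hypothetical names, and dropping dead partitions from the output (rather than just not counting them) is an assumption made to keep the example small.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical summary of one partition in a query result.
struct partition_result {
    std::string key;
    std::size_t live_rows; // 0 means the partition is dead (no live rows)
};

// Keep at most partition_limit live partitions. Dead partitions are skipped
// without consuming any of the limit (assumed behaviour for this sketch).
std::vector<partition_result> apply_partition_limit(
        const std::vector<partition_result>& partitions,
        std::size_t partition_limit) {
    std::vector<partition_result> out;
    for (const auto& p : partitions) {
        if (out.size() == partition_limit) {
            break; // limit reached, counting live partitions only
        }
        if (p.live_rows == 0) {
            continue; // dead partition: not returned, not counted
        }
        out.push_back(p);
    }
    return out;
}
```

With a limit of 2 and the input a(dead), b, c(dead), d, e, the sketch returns b and d; counting dead partitions would have stopped after b.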
Vlad Zolotarov
c1bb4d147d query::read_command: std::move() std::experimental::optional when initializing trace_info
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:58 +03:00
Duarte Nunes
aaa76d58ba query: Move to_partition_range to dht namespace
This patch moves to_partition_range, from the query namespace
to the dht namespace, where it is a more natural fit.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1468498060-19251-1-git-send-email-duarte@scylladb.com>
2016-07-15 10:41:52 +02:00
Duarte Nunes
21d0a2c764 query: Optionally send cell ttl
This patch adds support to send a cell's ttl as part of a query's
result. This is needed for thrift support.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-14 15:36:23 +02:00
Duarte Nunes
f013425bb5 query: Ensure timestamp is last param in read_command
Since the timestamp is not serialized, it must always be the last
parameter of query::read_command. This patch reorders it with the
partition_limit parameters and updates callers that specified a
timestamp argument.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1468312334-10623-1-git-send-email-duarte@scylladb.com>
2016-07-12 10:41:54 +01:00
Avi Kivity
28fab55e6e Merge "Convert sstable writes to streamed mutations" from Paweł
"This series converts sstable writers (including compaction) to streamed
mutations and makes them use consumer-style interface.

Code related to sstable writes and compaction is converted to consumers
that can be used with consume_flattened_in_thread() (which is a variant
of consume_flattened() intended to be run inside a thread).
compact_for_query is improved so that it can be reused by sstable
compaction."
2016-07-04 15:07:47 +03:00
Paweł Dziepak
3c08ffb275 query: add full_slice
query::full_slice is a partition slice which has full clustering row
ranges for all partition keys and no per-partition row limit.
Options and columns are not set.

It is used as a helper object in cases when a reference to
partition_slice is needed but the user code needs just all data there is
(an example of such case would be sstable compaction).

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-30 11:37:54 +01:00
Duarte Nunes
0ae6eafadd query: Make partition_limit last parameter
The partition_limit should have been added to the end of the ctor
argument list, as its current placement causes some callers to pass it
the timestamp instead of the limit.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1467239360-6853-3-git-send-email-duarte@scylladb.com>
2016-06-30 12:31:11 +02:00
Duarte Nunes
69798df95e query: Limit number of partitions returned
This is required to implement a thrift verb.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-22 09:48:13 +02:00
Duarte Nunes
01b18063ea query: Add per-partition row limit
This patch adds a per-partition row limit. It ensures both local
queries and the reconciliation logic abide by this limit.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-22 09:46:51 +02:00
Vlad Zolotarov
6e26909b02 query::read_command: add an optional trace_info field
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-06-01 20:17:19 +03:00
Gleb Natapov
7f6b12c97a query: add user provided timestamp to read_command
If a read query supplies a timestamp, move it to read_command to be
used later; otherwise use the local timestamp.
2016-05-24 15:19:35 +03:00
Piotr Jastrzebski
8307681975 Introduce clustering_ranges type.
It will be used to slice data returned by mutation_readers.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2016-05-16 11:46:09 +02:00
Pekka Enberg
38a54df863 Fix pre-ScyllaDB copyright statements
People keep tripping over the old copyrights and copy-pasting them to
new files. Search and replace "Cloudius Systems" with "ScyllaDB".

Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>
2016-04-08 08:12:47 +03:00
Tomasz Grabiec
63006e5dd2 query: Serialize collection cells using CQL format
We want the format of query results to be eventually defined in the
IDL and be independent of the format we use in memory to represent
collections. This change is a step in this direction.

The change decouples format of collection cells in query results from
our in-memory representation. We currently use collection_mutation_view;
after the change we will use the CQL binary protocol format. We use that because
it requires fewer transformations on the coordinator side.

One complication is that some list operations need to retrieve keys
used in list cells, not only values. To satisfy this need, a new query
option called "collections_as_maps" was added, which will cause lists
and sets to be reinterpreted as maps matching their underlying
representation. This allows the coordinator to generate mutations
referencing existing items in lists.
2016-02-15 17:05:55 +01:00
Tomasz Grabiec
5f756fcbe5 query: Add cql_format property to partition_slice
It will specify in which format CQL values should be serialized. Will
allow for rolling out new CQL binary protocol versions without
stalling reads.
2016-02-15 17:05:55 +01:00
Tomasz Grabiec
916a91c913 query: Split send_timestamp_and_expiry into two separate options
It's cleaner that way. They don't need to come together.
2016-02-15 16:53:56 +01:00
Tomasz Grabiec
945ae5d1ea Move std::hash<range<T>> definition to range.hh
Message-Id: <1454008052-5152-1-git-send-email-tgrabiec@scylladb.com>
2016-01-31 20:11:30 +02:00
Gleb Natapov
043d132ba9 Remove no longer used serializers.
2016-01-24 12:45:41 +02:00
Gleb Natapov
6cc5b15a9c Fix read_command constructor to not copy parameters.
2016-01-24 12:45:41 +02:00
Gleb Natapov
7357b1ddfe Move specific_ranges to .hh and un-nest it.
Serializer requires class to be defined, so it has to be in .h file. It
also does not support nested types yet, so move it outside of containing
class.
2016-01-24 12:45:41 +02:00
Gleb Natapov
9ae7dc70da Prepare partition_slice to be used by serializer.
Add missing _specific_ranges getter and setter.
2016-01-24 12:45:41 +02:00
Calle Wilund
8de95cdee8 paging bugfix: Allow reset/removal of "specific ck range"
Refs #752

Paged aggregate queries will re-use the partition_slice object,
thus when setting a specific ck range for "last pk", we will hit
an exception case.
Allow removing entries (actually only the one), and overwriting
(using schema equality for keys), so we maintain the interface
while allowing the pager code to re-set the ck range for previous
page pass.

[tgrabiec: commit log cleanup, fixed issue ref]

Message-Id: <1452616259-23751-1-git-send-email-calle@scylladb.com>
2016-01-12 17:45:57 +01:00
Tomasz Grabiec
4e5a52d6fa db: Make read interface schema version aware
The intent is to make data returned by queries always conform to a
single schema version, which is requested by the client. For CQL
queries, for example, we want to use the same schema which was used to
compile the query. The other node expects to receive data conforming
to the requested schema.

Interface on shard level accepts schema_ptr, across nodes we use
table_schema_version UUID. To transfer schema_ptr across shards, we
use global_schema_ptr.

Because schema is identified with UUID across nodes, requestors must
be prepared for being queried for the definition of the schema. They
must hold a live schema_ptr around the request. This guarantees that
schema_registry will always know about the requested version. This is
not an issue because for queries the requestor needs to hold on to the
schema anyway to be able to interpret the results. But care must be
taken to always use the same schema version for making the request and
parsing the results.

Schema requesting across nodes is currently stubbed (throws runtime
exception).
2016-01-11 10:34:52 +01:00
Tomasz Grabiec
04eb58159a query: Add schema_version field to read_command
2016-01-11 10:34:51 +01:00
Calle Wilund
284b10cabe Make partition_slice::row_ranges multiplex on partition
Allows for having more than one clustering row range set, depending on
PK queried (although right now limited to one - which happens to be exactly
the number of multiplexing paging needs... What a coincidence...)

Encapsulates the row_ranges member in a query function, and if needed holds
ranges outside the default one in an extra object.

Query result::builder::add_partition now fetches the correct row range for
the partition, and this is the range used in subsequent iteration.
2015-11-10 13:12:33 +01:00
Avi Kivity
d5cf0fb2b1 Add license notices
2015-09-20 10:43:39 +03:00
Avi Kivity
987294a412 Add missing copyrights
2015-09-20 10:16:11 +03:00
Tomasz Grabiec
cda31eccf7 db: Use LSA to allocate data inside memtable
2015-08-06 14:05:16 +02:00
Paweł Dziepak
71e7d3bc20 partition_slice: add distinct option
Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
2015-08-04 15:38:36 +02:00
Avi Kivity
98ec451d6a Extract range<> into its own header
It's not just for queries any more.
2015-08-02 16:07:42 +03:00
Nadav Har'El
e24d6c21d9 range: Use std::declval in std::hash<>
Use C++11's std::declval<T>() instead of my ad-hoc scary-looking
idiom *(T*)nullptr.

Both techniques produce an object of type T which is only useful for
unevaluated contexts, only inspecting an object's type and not its value.
For example, in decltype() expressions.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-07-27 16:46:16 +03:00
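The two idioms mentioned in the commit above can be compared side by side. This is a minimal sketch, not the code from range.hh: the alias names `hash_result_old` and `hash_result_new` are hypothetical, and both are used only inside decltype(), so nothing is ever evaluated or dereferenced.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <type_traits>
#include <utility>

// Result type of hashing a T, written with the ad-hoc null-pointer idiom.
// The dereference is never executed because decltype() does not evaluate
// its operand.
template <typename T>
using hash_result_old = decltype(std::hash<T>()(*(T*)nullptr));

// The same type, written with std::declval<T>().
template <typename T>
using hash_result_new = decltype(std::hash<T>()(std::declval<T>()));

static_assert(std::is_same<hash_result_old<std::string>, std::size_t>::value,
              "null-pointer idiom yields std::size_t");
static_assert(std::is_same<hash_result_new<std::string>, std::size_t>::value,
              "std::declval yields std::size_t");
```

Both aliases name the same type; std::declval simply states the intent without a cast that looks like undefined behaviour.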
Nadav Har'El
cc64e46425 Add equality operator for range
The operator== is needed when actually using a hash table - the hash
function is not enough.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-07-27 15:55:42 +03:00
Nadav Har'El
afa3d8c2c8 Fix errors in hash function for range
Amazing how many errors a short piece of code can have, without the
compiler complaining at all. The magic of templates :-)

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-07-27 15:55:41 +03:00
Nadav Har'El
1399087753 Allow range<T> as hash-table key
Some methods in storage_service.cc want to return an
unordered_set<query::range<dht::token>>. This patch adds the missing
hash function for a query::range<T> to make it usable as a hash-table key.

The hash function we used is a trivial linear combination of the range's
start and end hash function - the same function used by Cassandra's
AbstractBounds.hashCode() so it is probably "good enough".

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-07-27 14:44:33 +03:00
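A hash of the kind described in the commit above might look like the sketch below. It is an illustration under assumptions, not the code that was committed: `simple_range` and `simple_range_hash` are hypothetical stand-ins for query::range<T>, the coefficient 31 is assumed, and hashing a missing bound as 0 is a choice made here. The equality operator is included because an unordered container needs it in addition to the hash.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <unordered_set>

// Hypothetical, simplified stand-in for query::range<T>.
template <typename T>
struct simple_range {
    std::optional<T> start; // empty means unbounded
    std::optional<T> end;   // empty means unbounded

    bool operator==(const simple_range& o) const {
        return start == o.start && end == o.end;
    }
};

// Linear combination of the hashes of the two bounds.
template <typename T>
struct simple_range_hash {
    std::size_t operator()(const simple_range<T>& r) const {
        std::hash<T> h;
        std::size_t hs = r.start ? h(*r.start) : 0; // assumed: no bound hashes to 0
        std::size_t he = r.end ? h(*r.end) : 0;
        return 31 * hs + he; // coefficient 31 is an assumption of this sketch
    }
};

// Convenience alias for a set keyed by ranges.
template <typename T>
using simple_range_set = std::unordered_set<simple_range<T>, simple_range_hash<T>>;
```

With both pieces in place, inserting the same range twice into a `simple_range_set<int>` leaves a single element.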
Tomasz Grabiec
5e03dea65d range: Fix is_wrap_around()
In Origin, dht.Range() with equal values is considered a full wrap
around. Make our range<> recognize this.

So we have:

 ]x; x] - wrap around, full ring
 [x; x[ - wrap around, full ring
 ]x; x[ - wrap around, excluding x
 [x; x] - not wrap around, only x included
2015-07-24 16:08:41 +02:00
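The four cases listed in the commit above can be expressed as a small predicate. This is a sketch only: `bound` and this free function `is_wrap_around` are hypothetical, both bounds are assumed to be present, and plain integer comparison stands in for whatever comparator the real range<> uses.

```cpp
#include <cassert>

// Hypothetical bound: a value plus whether the bound includes it.
struct bound {
    int value;
    bool inclusive;
};

// A range wraps around when its end sorts before its start. When both bounds
// have the same value, only the fully inclusive form [x; x] is a normal
// (singular) range; every other combination is treated as a wrap around.
bool is_wrap_around(const bound& start, const bound& end) {
    if (start.value != end.value) {
        return end.value < start.value;
    }
    return !(start.inclusive && end.inclusive);
}
```

For example, `is_wrap_around({5, false}, {5, true})` corresponds to ]x; x] and is reported as a wrap around, while `is_wrap_around({5, true}, {5, true})` corresponds to [x; x] and is not.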
Tomasz Grabiec
1b7ab4f639 range: Introduce unwrap()
2015-07-24 16:08:41 +02:00
Tomasz Grabiec
1c95f646ae range: Make before() and after() public
2015-07-24 16:08:41 +02:00
Tomasz Grabiec
4d06c2aa1d Move to_partition_range() adaptor to global scope
It should be moved to i_partitioner.hh, but to do that range<> has to
be first moved out of query-request.hh to break cyclic dependency.
I didn't want to cause conflicts with in-flight patches to range<>.
2015-07-24 16:08:41 +02:00
Tomasz Grabiec
e5feff5d71 dht: ring_position: Switch to total ordering
range::is_wrap_around() and range::contains() rely on total ordering
on values to work properly. Current ring_position_comparator was only
imposing a weak ordering (token positions equal to all key positions
with that token).

range::before() and range::after() can't work for weak ordering. If
the bound is exclusive, we don't know if user-provided token position
is inside or outside.

Also, is_wrap_around() can't properly detect wrap around in all
cases. Consider this case:

 (1) ]A; B]
 (2) [A; B]

For A = (tok1) and B = (tok1, key1), (1) is a wrap around and (2) is
not. Without total ordering between A and B, range::is_wrap_around() can't
tell that.

I think the simplest solution is to define a total ordering on
ring_position by making token positions positioned either before or
after all keys with that token.
2015-07-24 16:08:41 +02:00
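One way to picture the total ordering proposed in the commit above is the comparator sketched below. It is not the real dht::ring_position: `ring_pos`, its `side` field, and `compare` are hypothetical, tokens are plain integers, and keys are compared as strings. The only point illustrated is that a token-only position is placed either before or after every key with the same token, so it never compares equal to a key.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical ring position.
struct ring_pos {
    long token;
    std::optional<std::string> key; // empty for a token-only position
    int side; // token-only: -1 = before all keys, +1 = after; ignored when key is set
};

// Three-way comparison: negative if a < b, zero if equal, positive if a > b.
int compare(const ring_pos& a, const ring_pos& b) {
    if (a.token != b.token) {
        return a.token < b.token ? -1 : 1;
    }
    if (a.key && b.key) {
        return (*a.key > *b.key) - (*a.key < *b.key);
    }
    // At least one side is token-only: keys weigh 0, token-only positions
    // weigh -1 or +1, which puts them strictly before or after every key.
    int wa = a.key ? 0 : a.side;
    int wb = b.key ? 0 : b.side;
    return (wa > wb) - (wa < wb);
}
```

With A = (tok1) and B = (tok1, key1), A now sorts strictly before or strictly after B depending on its side, which is the information a wrap-around check needs.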
Paweł Dziepak
0f9a740012 query: add partition_slice::option::reversed
Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
2015-07-23 02:38:19 +02:00
Tomasz Grabiec
6373987704 partition_range: Introduce is_wrap_around() helper
2015-07-22 13:13:38 +02:00
Tomasz Grabiec
0b0ea04958 range: Remove start_value() and end_value()
It's easy to miss that they may be undefined. start() and end(), which
return optional<bound> const&, make it clear.
2015-07-22 10:27:47 +02:00
Tomasz Grabiec
93d607aac6 range: Make singular ranges transparent
Singular range is a range of the form [x; x]. Internally such ranges
are optimized to avoid storing the value twice, so _end is empty.

Client shouldn't have to special case for it if it's not interested,
so make end() and split() work for singular ranges too.
2015-07-18 11:10:31 +02:00
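The idea in the commit above, storing a singular range's value once while still answering end() normally, can be sketched as follows. `mini_range` is a hypothetical class written for this illustration; bound inclusiveness and split() are left out, so only the end() part of the change is shown.

```cpp
#include <cassert>
#include <optional>
#include <utility>

// Hypothetical, simplified range with a compact singular form.
template <typename T>
class mini_range {
    std::optional<T> _start;
    std::optional<T> _end; // deliberately left empty when _singular is true
    bool _singular;

    mini_range(std::optional<T> s, std::optional<T> e, bool singular)
        : _start(std::move(s)), _end(std::move(e)), _singular(singular) {}

public:
    static mini_range make(std::optional<T> s, std::optional<T> e) {
        return mini_range(std::move(s), std::move(e), false);
    }
    // [v; v]: the value is stored once, in _start.
    static mini_range make_singular(T v) {
        return mini_range(std::optional<T>(std::move(v)), std::nullopt, true);
    }

    bool is_singular() const { return _singular; }
    const std::optional<T>& start() const { return _start; }
    // For a singular range the end is the start, so callers need no special case.
    const std::optional<T>& end() const { return _singular ? _start : _end; }
};
```

A caller that only looks at start() and end() sees `make_singular(7)` as a range from 7 to 7 and never has to ask whether it is singular.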
Gleb Natapov
92a016fe9f add split() for range<ring_position>
2015-07-15 12:41:38 +03:00
Avi Kivity
b89c54df9d query::range: fix call to internal serialized_size() template
query::range::serialized_size() calls a template method (also
called serialized_size()) within a lambda, however it is called with
a const 'this' while the template is non-const-qualified.  gcc 5
rightly rejects it (nicely annotating the template as a near miss).

Fix by making the template static; it doesn't need 'this' anyway.
2015-07-12 21:18:21 +03:00
Paweł Dziepak
290a7ca1bf query: add timestamp to read_command
Read command needs a timestamp in order to determine which cells have
already expired.

Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
2015-07-02 17:01:19 +02:00
Tomasz Grabiec
1b4a72de14 query: Introduce query::max_rows
2015-07-02 13:25:46 +02:00
Gleb Natapov
730170ff1a serialize data structures needed for read clustering
2015-07-01 13:36:28 +03:00
Tomasz Grabiec
9525464f74 Move query::ring_position to dht::ring_position
2015-06-25 18:45:12 +02:00