Commit Graph

11716 Commits

Author SHA1 Message Date
Duarte Nunes
42242273f6 schema_tables: Create views from mutations
This patch enables views to be created from their low-level,
mutation-based representation.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-12-20 13:06:11 +00:00
Duarte Nunes
888a8923c7 read_table_mutations: Support other schemas
This patch changes read_table_mutations() so that it can now
read schemas from other tables besides the column families
schema table.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-12-20 13:06:11 +00:00
Duarte Nunes
93458f314c migration_manager: Notify of view schema changes
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-12-20 13:06:11 +00:00
Duarte Nunes
22d8aa9bb6 migration_listener: Listen for view schema changes
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-12-20 13:06:11 +00:00
Duarte Nunes
b9cf25c4dd schema_tables: Add views schema table
This patch adds the views schema table, containing the definition of
views in a keyspace.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-12-20 13:06:11 +00:00
Duarte Nunes
e41494996f thrift: Skip materialized views
This patch ensures we don't provide access to materialized views over
thrift. This includes preventing updates but also omitting them when
describing a keyspace.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-12-20 13:06:11 +00:00
Duarte Nunes
2b231f22b8 keyspace_metadata: Add tables() and views() functions
This patch adds utility functions to keyspace_metadata to select only
the tables or only the views out of all the schemas.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-12-20 13:06:11 +00:00
Duarte Nunes
7818339791 materialized views: Add view class
This patch adds the view class, which will contain functions related
to populating a view, either from the base table's write path or from
the view building mechanism which copies over already existing data in
the base table.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-12-20 13:06:11 +00:00
Duarte Nunes
d0ed8fa29b schema: Add view_ptr class
The view_ptr class contains a schema_ptr known to represent a
materialized view. It is intended to be used by functions that require
such a schema, and thus obviate the need for the function to check for
schema::is_view().

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-12-20 13:06:11 +00:00
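The idea behind view_ptr can be sketched as a thin wrapper that validates once, at construction, that the wrapped schema is a view. This is a minimal illustration only: schema_sketch, view_info_sketch, view_ptr_sketch and base_table_of are hypothetical stand-ins invented here, not the project's actual classes or signatures.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-ins for illustration; not the real classes.
struct view_info_sketch {
    std::string base_table;
};

struct schema_sketch {
    std::string name;
    // Present only when the schema describes a materialized view.
    std::optional<view_info_sketch> view_info;
    bool is_view() const { return view_info.has_value(); }
};

using schema_ptr_sketch = std::shared_ptr<const schema_sketch>;

// Wraps a schema pointer that is known to represent a view.
class view_ptr_sketch {
    schema_ptr_sketch _schema;
public:
    explicit view_ptr_sketch(schema_ptr_sketch s) : _schema(std::move(s)) {
        // The check happens once here, so callees never repeat it.
        if (!_schema || !_schema->is_view()) {
            throw std::invalid_argument("schema is not a materialized view");
        }
    }
    const schema_sketch* operator->() const { return _schema.get(); }
    const schema_ptr_sketch& get() const { return _schema; }
};

// A function that requires a view can say so in its signature.
inline const std::string& base_table_of(const view_ptr_sketch& v) {
    assert(v->is_view());  // invariant established by the constructor
    return v->view_info->base_table;
}
```

With a wrapper like this, passing a plain table schema fails at the point where the view_ptr_sketch is built rather than deep inside view-specific code.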
Duarte Nunes
82ce8eedbd schema: Add view_info field
This patch adds an optional view_info field to the schema. Its
presence indicates the schema represents a materialized view.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-12-20 13:06:11 +00:00
Duarte Nunes
4b3ac42914 materialized views: Add view_info class
The view_info class is meant to augment a schema with
fields relevant for materialized views.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-12-20 13:06:11 +00:00
Duarte Nunes
d7e607ff51 query_pagers: Fix over-counting of rows
This patch fixes a regression introduced in 0518895, where we counted
one extra row per partition when it contained live, non-static rows.

We also simplify the visitor logic further, since now we don't need to
count rows one by one. Also remove a bunch of unused fields.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1482234083-2447-1-git-send-email-duarte@scylladb.com>
2016-12-20 11:58:37 +00:00
Tomasz Grabiec
0e487b3499 db: Compute key hash once in partition_presence_checker
I measured reduction of cache update time by 20% for 6 sstables and by
40% for 16.

Refs #1943.
2016-12-19 14:20:58 +01:00
Tomasz Grabiec
ab5c77fcf1 bloom_filter: Allow checking presence using pre-hashed key
Will allow us to calculate the hash once and use it on many filters
instead of calculating the hash for each filter separately.

Another change made is to avoid precomputing all indexes during filter
operations, and have for_each_index() template instead which invokes a
functor.
2016-12-19 14:20:58 +01:00
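A rough sketch of the pre-hashed key idea: the key is hashed once, and the same hash pair is then checked against any number of filters, with bit indexes generated lazily through a for_each_index() template that invokes a functor. Everything here is an assumption made for illustration (hashed_key_sketch, make_hashed_key, bloom_filter_sketch, the use of std::hash and double hashing); only the for_each_index() name comes from the commit message, and its real signature may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical pre-hashed key: two base hashes computed once per key.
struct hashed_key_sketch {
    uint64_t h1;
    uint64_t h2;
};

// Assumption: std::hash stands in for the real hash function.
inline hashed_key_sketch make_hashed_key(const std::string& key) {
    uint64_t h1 = std::hash<std::string>{}(key);
    uint64_t h2 = std::hash<std::string>{}(key + "#salt") | 1;  // odd step
    return {h1, h2};
}

class bloom_filter_sketch {
    std::vector<bool> _bits;
    unsigned _hash_count;

    // Generates indexes one at a time instead of precomputing them all.
    // The functor returns false to stop early.
    template <typename Func>
    void for_each_index(const hashed_key_sketch& k, Func&& f) const {
        for (unsigned i = 0; i < _hash_count; ++i) {
            if (!f(static_cast<size_t>((k.h1 + i * k.h2) % _bits.size()))) {
                return;
            }
        }
    }
public:
    bloom_filter_sketch(size_t bits, unsigned hash_count)
        : _bits(bits), _hash_count(hash_count) {
        assert(bits > 0);
    }
    void add(const hashed_key_sketch& k) {
        for_each_index(k, [this](size_t idx) { _bits[idx] = true; return true; });
    }
    bool is_present(const hashed_key_sketch& k) const {
        bool present = true;
        // Stops at the first unset bit.
        for_each_index(k, [&](size_t idx) { present = _bits[idx]; return present; });
        return present;
    }
};
```

A caller would compute make_hashed_key() once per partition key and pass the result to is_present() on each sstable's filter, so the cost of hashing no longer grows with the number of filters.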
Tomasz Grabiec
78844fa2e5 db: Use incremental selector in partition_presence_checker
This reduces the number of sstables we need to check to only those
whose token range overlaps with the key. Reduces cache update
time. Especially effective with leveled compaction strategy.

Refs #1943.

Incremental selector works with an immutable sstable set, so cache
updates need to be serialized. Otherwise we could mispopulate due to
stale presence information.

Presence checker interface was changed to accept decorated key in
order to gain easy access to the token, which is required by
the incremental selector.
2016-12-19 14:20:58 +01:00
Avi Kivity
b740aff777 tests: adjust mutation_query_test for partition and row limits
Won't build otherwise.
2016-12-19 11:37:25 +02:00
Avi Kivity
f3c8cbbac5 Merge "Introduce dht::token_range an dht::partition_range" from Asias
"nonwrapping_range<ring_position> and nonwrapping_range<token> are used
in many places. Let's make an alias for them to make it less verbose.

Also there is a query::partition_range in query-request.hh which is the alias of
nonwrapping_range<ring_position>. query::partition_range is used in
places not related to query at all. Let's unify the usage project wide."

* tag 'asias/repair_dht_token_range/v2' of github.com:cloudius-systems/seastar-dev:
  Convert to use dht::partition_range_vector and dht::token_range_vector
  dht: Introduce dht::partition_range_vector and dht::token_range_vector
  Get rid of query::partition_range
  Convert to use dht::partition_range
  Convert to use dht::token_range
  dht: Rename token_range to token_range_endpoints
  dht: Introduce dht::token_range an dht::partition_range
2016-12-19 10:59:52 +02:00
Asias He
937f28d2f1 Convert to use dht::partition_range_vector and dht::token_range_vector 2016-12-19 14:08:50 +08:00
Asias He
7a446986fa dht: Introduce dht::partition_range_vector and dht::token_range_vector
std::vector<dht::partition_range> and std::vector<dht::token_range> are
used in a lot of places, so introduce dht::partition_range_vector and
dht::token_range_vector as aliases for them.
2016-12-19 08:09:28 +08:00
Asias He
e5485f3ea6 Get rid of query::partition_range
Use dht::partition_range instead
2016-12-19 08:09:25 +08:00
Asias He
85034c1b57 Convert to use dht::partition_range 2016-12-19 08:04:30 +08:00
Asias He
d1178fa299 Convert to use dht::token_range 2016-12-19 08:04:29 +08:00
Asias He
1f06eedb58 dht: Rename token_range to token_range_endpoints
It is a helper class used in storage_service only. Rename it so we can
use it for the real dht::token_range.
2016-12-19 08:04:29 +08:00
Asias He
264b6ee69e dht: Introduce dht::token_range an dht::partition_range
nonwrapping_range<ring_position> and nonwrapping_range<token> are used
in many places. Let's make an alias for them to make it less verbose.

Also there is a query::partition_range in query-request.hh which is the alias of
nonwrapping_range<ring_position>. query::partition_range is used in
places not related to query at all. Let's unify the usage project wide.
2016-12-19 08:04:29 +08:00
Avi Kivity
32fb4c3661 Merge "repair: Reduce unnecessary streaming traffic even more" from Asias
"In 7c873f0d (repair: Reduce unnecessary streaming traffic), we optimize
the cases where 1) all the remote nodes have the same checksum and 2) the local node
has a zero checksum.

In this series, we make the optimization more generic and cover more cases."

* tag 'asias/repair/node_reducer/v3' of github.com:cloudius-systems/seastar-dev:
  repair: Reduce unnecessary streaming traffic even more
  repair: Add hash specialization for partition_checksum
2016-12-18 16:53:39 +02:00
Avi Kivity
3421ebe8be Merge "storage_proxy: Enforce row limit" from Duarte
"This patchset ensures the row limit is enforced at
the storage_proxy level. Upper layers like the pager may
already be depending on this behavior."

* 'enforce-row-limit/v3' of https://github.com/duarten/scylla:
  query_pagers: Don't trim returned rows
  select_statement: Don't always trim result set
  query_result_merger: Limit rows
  mutation_query: to_data_query_result enforces row limit
2016-12-18 08:15:51 +02:00
Avi Kivity
6bb875bdb7 Merge "storage_proxy: Enforce partition limit" from Duarte
"This patchset ensures the partition limit is enforced at
the storage_proxy level. To achieve this, we add the partition
count to query::result, and allow the result_merger to trim
excess partitions."

* 'enforce-partition-limit/v3' of https://github.com/duarten/scylla:
  storage_proxy: Decrease limits when retrying command
  storage_proxy: Don't fetch superfluous partitions
  query::result: Add partition count
  column_family: Use counters in query::result::builder
  query_result_builder: Use the underlying counters
  mutation_partition: Count partitions in query_compacted
  mutation_partition: Remove tabs in query_compacted
  query::result::builder: Add partition count
  query_result_merger: Limit partitions
2016-12-16 13:57:37 +02:00
Glauber Costa
7133583797 track streaming and system virtual dirty memory
A case could be made that we should have counters for them no matter
what, since it can help us reason about the distribution of memory among
the groups. But with the hierarchy being broken in 1.5 it becomes even
more important. Now by looking solely at dirty, we will have no idea
about how much memory we are using in those groups.

After this patch, the dirty_memory_manager will register its metrics
for the 3 groups that we have, and the legacy names will be used to show
totals.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <0d04ca4c7e8472097f16a5dc950b77c73766049e.1481831644.git.glauber@scylladb.com>
2016-12-16 10:59:40 +02:00
Avi Kivity
293876c72f Merge "Limit number of readers streaming uses" from Paweł
"Original, naive db::make_streaming_reader() implementation created a set
of memtable and sstable readers for every partition range. This caused
bad interaction with the code limiting sstable readers concurrency and
was suboptimal.

This series introduces a multi range mutation reader that takes a mutation
source and a sorted, disjoint vector of ranges. It creates only a single
set of memtable and sstable readers and fast forwards it to the next
range once the current one is completed."

* 'pdziepak/multi-range-reader/v1' of github.com:cloudius-systems/seastar-dev:
  db: use multi range reader for streaming readers
  dht: describe split_range[s]_to_shards() guarantees
  repair: remove outdated fixme
  test/mutation_reader_test: add multi_range_reader test
  tests/mutation_reader: extract key creation code
  mutation_reader: add multi_range_reader
2016-12-15 17:48:31 +02:00
Paweł Dziepak
cf679a413c db: use multi range reader for streaming readers
A naive approach was to create a set of readers for each range and pass
them all to a combining reader. This, however, performed badly if the number
of ranges was high.

The solution is to use a multi range reader, which uses only a single set
of readers and fast forwards from range to range when necessary. This
adds another requirement: the ranges passed to
make_streaming_reader() must be sorted and disjoint.
2016-12-15 13:54:43 +00:00
Paweł Dziepak
b86a826baf dht: describe split_range[s]_to_shards() guarantees
We are going to require these functions to return sorted and disjoint
ranges. They already do so (provided that the input ranges are sorted
and disjoint), but if the guarantee is not explicitly stated it may
disappear some day.
2016-12-15 13:07:32 +00:00
Paweł Dziepak
5287417136 repair: remove outdated fixme 2016-12-15 13:07:32 +00:00
Paweł Dziepak
5b0cf20f75 test/mutation_reader_test: add multi_range_reader test 2016-12-15 13:07:32 +00:00
Paweł Dziepak
787a976c2b tests/mutation_reader: extract key creation code 2016-12-15 13:07:32 +00:00
Paweł Dziepak
52a4e79210 mutation_reader: add multi_range_reader
So far, the only way to combine the outputs of multiple readers was to use
a combining reader. It is very general and, in particular, supports the case
where the readers emit mutations from overlapping ranges.

However, we have cases (e.g. streaming) where we need to read from
several disjoint ranges. A combining reader is a suboptimal solution, as it
requires creating a reader for each range and ignores the fact that
they do not overlap.

This patch introduces multi_range_mutation_reader which takes a
mutation_source and a sorted set of disjoint ranges. Internally, it uses
mutation_reader::fast_forward_to() to move to the next range once the
current one is completed.
2016-12-15 13:07:31 +00:00
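The mechanism described above can be sketched with integer keys standing in for partitions. This is a simplified, synchronous illustration under stated assumptions: range_sketch, source_reader_sketch, multi_range_reader_sketch and read_all are hypothetical names, ranges are half-open, and the real reader works on mutations and futures rather than ints.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

struct range_sketch { int start; int end; };  // half-open [start, end)

// Hypothetical underlying reader over sorted keys. It only moves forward.
class source_reader_sketch {
    const std::vector<int>& _keys;  // must be sorted and outlive the reader
    size_t _pos = 0;
    int _end;
public:
    source_reader_sketch(const std::vector<int>& keys, range_sketch r)
        : _keys(keys), _end(r.end) {
        fast_forward_to(r);
    }
    // Skips ahead to the first key of the new range; never rewinds.
    void fast_forward_to(range_sketch r) {
        _pos = std::lower_bound(_keys.begin() + _pos, _keys.end(), r.start) - _keys.begin();
        _end = r.end;
    }
    std::optional<int> next() {
        if (_pos < _keys.size() && _keys[_pos] < _end) {
            return _keys[_pos++];
        }
        return std::nullopt;
    }
};

// One underlying reader for all ranges, instead of one reader per range.
class multi_range_reader_sketch {
    std::vector<range_sketch> _ranges;
    size_t _current = 0;
    std::optional<source_reader_sketch> _reader;
public:
    multi_range_reader_sketch(const std::vector<int>& keys, std::vector<range_sketch> ranges)
        : _ranges(std::move(ranges)) {
        // Precondition from the commit message: sorted and disjoint ranges.
        for (size_t i = 1; i < _ranges.size(); ++i) {
            assert(_ranges[i - 1].end <= _ranges[i].start);
        }
        if (!_ranges.empty()) {
            _reader.emplace(keys, _ranges[0]);
        }
    }
    std::optional<int> next() {
        while (_reader) {
            if (auto k = _reader->next()) {
                return k;
            }
            if (++_current >= _ranges.size()) {
                _reader.reset();  // all ranges consumed
                break;
            }
            _reader->fast_forward_to(_ranges[_current]);
        }
        return std::nullopt;
    }
};

inline std::vector<int> read_all(const std::vector<int>& keys, std::vector<range_sketch> ranges) {
    multi_range_reader_sketch reader(keys, std::move(ranges));
    std::vector<int> out;
    while (auto k = reader.next()) {
        out.push_back(*k);
    }
    return out;
}
```

Because the ranges are sorted and disjoint, the underlying reader only ever needs to move forward, which is what makes a single reader sufficient.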
Duarte Nunes
0518895f5b query_pagers: Don't trim returned rows
Since storage_proxy::query() now respects the read_command limits, we
can remove the trimming logic from query_pagers.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-12-15 11:00:46 +00:00
Duarte Nunes
7ce859799b select_statement: Don't always trim result set
Trimming the result set is only needed when the query contains an "IN"
relation, an ORDER BY clause, and defines a limit, which is the case
where we query different ranges concurrently. We don't use the
result_merger to trim since we first need to reorder the rows.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-12-15 11:00:46 +00:00
Duarte Nunes
fee0b7fa48 query_result_merger: Limit rows
This patch makes the row limit enforced by the storage_proxy layer.
It adds a row limit to the query_result_merger, useful when merging
results for concurrent queries.

More importantly, it provides guarantees that upper layers may be
relying on implicitly (e.g., the paging code).

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-12-15 11:00:36 +00:00
Duarte Nunes
efc986d548 mutation_query: to_data_query_result enforces row limit
This patch changes mutation_query::to_data_query_result() so that it
enforces the row limit alongside the partition limit and the
per-partition limit.

In the following patch, we'll enforce the row limit in an upper layer,
but this lets us optimize the case where only one replica replies.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-12-15 10:56:40 +00:00
Duarte Nunes
c2072c7dc9 storage_proxy: Decrease limits when retrying command
This patch changes a read_command's limits when retrying it, so that
we don't ask for more rows than necessary.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-12-15 10:41:06 +00:00
Duarte Nunes
9572c19dc6 storage_proxy: Don't fetch superfluous partitions
This patch ensures we keep track of how many partitions we've queried
so we don't ask for more than the number we need.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-12-15 10:27:46 +00:00
Duarte Nunes
93be8d7cef query::result: Add partition count
This patch adds a partition count to query::result, filled by the
query::result::builder. The partition count is present whenever the
result carries data, being absent only for the case where the result
contains only a digest.

We also ensure that counts are present for an empty query::result.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-12-15 10:27:46 +00:00
Duarte Nunes
781cd82cb8 column_family: Use counters in query::result::builder
This patch changes column_family::query() to use the counters in the
builder to determine how many partitions and rows to ask for and also
to implement the stop condition. This saves a continuation to do the
bookkeeping, and allows us to remove data_query_result.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-12-15 10:27:46 +00:00
Duarte Nunes
05b2ef4fa2 query_result_builder: Use the underlying counters
This patch changes the query_result_builder to use the counters
provided by the query::result::builder. It also ensures they are kept
current.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-12-15 10:27:46 +00:00
Duarte Nunes
f5cf7f7921 mutation_partition: Count partitions in query_compacted
This patch changes mutation_partition::query_compacted() to count the
number of partitions written to the underlying writer.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-12-15 10:27:46 +00:00
Duarte Nunes
f21dfb8217 mutation_partition: Remove tabs in query_compacted
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-12-15 10:27:46 +00:00
Duarte Nunes
2409b6b250 query::result::builder: Add partition count
This patch adds a partition count to the query::result::builder. It is
intended to be incremented by users, and later used to build a
query::result.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-12-15 10:27:46 +00:00
Duarte Nunes
108011a839 query_result_merger: Limit partitions
This patch adds a partition limit to the query_result_merger, useful
when merging results for concurrent queries. This change also makes
the partition limit enforced by the storage_proxy layer, no changes
being needed by the upper layers, namely the Thrift interface.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-12-15 10:27:41 +00:00
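As a rough illustration of limiting while merging, the sketch below concatenates partial results in order and stops once the partition limit or the row limit is reached (the row limit is the subject of the separate "query_result_merger: Limit rows" commit in this log). The types partition_sketch, result_sketch and result_merger_sketch are hypothetical and much simpler than the real query::result, which is a serialized buffer rather than a vector of rows.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplified result types, for illustration only.
struct partition_sketch {
    std::string key;
    std::vector<std::string> rows;
};

struct result_sketch {
    std::vector<partition_sketch> partitions;
    size_t partition_count() const { return partitions.size(); }
    size_t row_count() const {
        size_t n = 0;
        for (const auto& p : partitions) {
            n += p.rows.size();
        }
        return n;
    }
};

class result_merger_sketch {
    size_t _max_rows;
    size_t _max_partitions;
    std::vector<result_sketch> _partial;
public:
    result_merger_sketch(size_t max_rows, size_t max_partitions)
        : _max_rows(max_rows), _max_partitions(max_partitions) {}

    // Collects the result of one of the concurrent sub-queries.
    void add(result_sketch r) { _partial.push_back(std::move(r)); }

    // Merges partial results in order, trimming excess partitions and rows.
    result_sketch get() const {
        result_sketch merged;
        size_t rows_left = _max_rows;
        for (const auto& r : _partial) {
            for (const auto& p : r.partitions) {
                if (merged.partitions.size() == _max_partitions || rows_left == 0) {
                    return merged;
                }
                size_t take = std::min(rows_left, p.rows.size());
                assert(take <= rows_left);
                partition_sketch trimmed{p.key, {}};
                trimmed.rows.assign(p.rows.begin(), p.rows.begin() + take);
                rows_left -= take;
                merged.partitions.push_back(std::move(trimmed));
            }
        }
        return merged;
    }
};
```

Enforcing both limits at this one point is what lets callers above the merger rely on the limits without trimming the result themselves.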
Pekka Enberg
06c5216c9d Merge "Improve gossip feature logging" from Asias 2016-12-15 10:36:54 +02:00
Asias He
e578e65103 gossip: Log feature enabled message on shard zero only
Features are per node, so there is no need to log each one once per shard.
2016-12-15 16:33:11 +08:00