11734 Commits

Author SHA1 Message Date
Tomasz Grabiec
bedd0ab6f9 database: Pass partition_range to single_key_sstable_reader to avoid copies and decorating 2017-04-20 10:54:38 +02:00
Tomasz Grabiec
0b5ba13230 sstables: index_reader: Introduce advance_to_next_partition() 2017-04-20 10:54:38 +02:00
Tomasz Grabiec
4b81844d2e sstables: index_reader: Introduce advance_and_check_if_present() 2017-04-20 10:54:38 +02:00
Tomasz Grabiec
b92f095bf0 sstables: index_reader: Introduce advance_past() 2017-04-20 10:54:38 +02:00
Tomasz Grabiec
6780756258 sstables: index_reader: Make copyable 2017-04-20 10:54:38 +02:00
Tomasz Grabiec
7db83fa3fe sstables: index_reader: Optimize advancing to extreme positions 2017-04-20 10:54:38 +02:00
Tomasz Grabiec
f66443c01c sstables: index_reader: Keep two last pages alive
The idea behind caching is that when we have two index readers where
one is catching up with the other, each page will be read only
once. Currently that's not always the case. There is a case when
advance_to() may need to read two pages. That's when the target
position is not found in the first page as determined by the summary
index. The second reader which catches up would have to read the first
page as well, but it would not be in cache any more. To avoid this
extra I/O let's keep a reference to the two last pages touched by the
index.
2017-04-20 10:54:38 +02:00
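The caching argument in f66443c01c can be illustrated with a small model. This is only a sketch under stated assumptions, not Scylla's index_reader: `page`, `page_cache` and `reader` are hypothetical names, the cache is assumed to hold non-owning references so that a page stays loaded only while some reader pins it, and `reads` merely counts simulated disk reads.

```cpp
#include <cstddef>
#include <map>
#include <memory>

// Hypothetical stand-ins; none of these names come from Scylla.
struct page {
    std::size_t id;
};

// Cache that does not own pages: an entry expires once no reader pins it.
class page_cache {
    std::map<std::size_t, std::weak_ptr<page>> _pages;
public:
    std::size_t reads = 0; // number of simulated disk reads

    std::shared_ptr<page> get(std::size_t id) {
        if (auto p = _pages[id].lock()) {
            return p; // still pinned by some reader, no I/O needed
        }
        ++reads; // simulate reading the page from disk
        auto p = std::make_shared<page>(page{id});
        _pages[id] = p;
        return p;
    }
};

// Reader that keeps the last `Keep` pages it touched alive.
template <std::size_t Keep>
class reader {
    static_assert(Keep > 0, "a reader must pin at least one page");
    page_cache& _cache;
    std::shared_ptr<page> _pinned[Keep];
    std::size_t _next = 0;
public:
    explicit reader(page_cache& c) : _cache(c) {}

    void touch(std::size_t id) {
        _pinned[_next] = _cache.get(id); // replaces the oldest pin
        _next = (_next + 1) % Keep;
    }
};
```

In this model, when a leading `reader<1>` touches pages 0 and 1 and a trailing `reader<1>` then asks for page 0, the page has already expired and is read again. With `reader<2>` the leader still pins both pages, so the trailing reader causes no extra read.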
Tomasz Grabiec
c7b9c5dfd3 dht: ring_position_view: Add key getter 2017-04-20 10:54:38 +02:00
Tomasz Grabiec
5b71e0b9ab dht: ring_position_view: Add constructor and factory from ring_position_view 2017-04-20 10:54:38 +02:00
Tomasz Grabiec
3e8795494e sstables: mutation_reader: Advance to next partition using index in some cases
To produce a streamed_mutation for the next partition, we need to read
its key and the tombstone. Currently we always do that by consuming
the partition header from the data file. In some cases that may cause
unnecessary IO.

It's better to obtain partition information from the index if we
already have it. We can save on IO if the user will skip past the
front of partition immediately after.

It is also better to pay the cost of reading the index if we know that
we will need to use the index anyway soon. This patch predicts that by
checking if there are any clustering restrictions. If there are any,
we will almost surely need_skip() and use the index anyway.

This change also lays the ground for unification of multi- and
single-partition queries without loss of performance.
2017-04-20 10:54:37 +02:00
Tomasz Grabiec
e35fe7492c sstables: index_reader: Expose access to partition key and tombstone 2017-04-20 10:54:37 +02:00
Tomasz Grabiec
ae72c159b1 sstables: index_reader: Introduce promoted_index_view
So that we have a nice way of extracting the tombstone out of it. We do
not always need a fully parsed index.
2017-04-20 10:54:37 +02:00
Tomasz Grabiec
0ef33b7f29 sstables: mutation_reader: Move _index_in_current to sstable_data_source
sstable_data_source holds a shared state between mutation_reader and
streamed_mutation for sstables. The information whether index is in
current partition will have to be accessed by both in the following
patches.
2017-04-20 10:54:37 +02:00
Tomasz Grabiec
885f53d905 sstables: mutation_reader: Avoid resetting the walker
Before the change, the following scenario was happening:

  1) we try to skip based on clustering restrictions
  2) we find the page and fast forward to it, recording walker's
     lower bound counter
  3) we read the first fragment, it's not a tombstone, so we reset the walker,
     and its lower bound counter too
  4) the fragment is not in range (the range starts in the middle of the page)
  5) needs_skip() is true, we redo the index lookup, which wastes some CPU

This change fixes the problem by avoiding resetting the walker. We can
do that because leading tombstones are checked with a non-mutable
contains_tombstone().
2017-04-20 10:54:37 +02:00
Tomasz Grabiec
bf21aa3a1f clustering_ranges_walker: Introduce contains_tombstone() 2017-04-20 10:54:37 +02:00
Tomasz Grabiec
b030ce693d sstables: mutation_reader: Don't try to read index to skip to static row
The static row is always at the beginning, so there's no point in doing
index lookups.
2017-04-20 10:54:37 +02:00
Tomasz Grabiec
3e060659f1 sstables: mutation_reader: Don't try to read static row if table doesn't have any 2017-04-20 10:54:37 +02:00
Tomasz Grabiec
b1860a8a24 clustering_ranges_walker: Allow excluding the static row 2017-04-20 10:54:37 +02:00
Tomasz Grabiec
77d3e30239 sstables: mutation_reader: Use index to skip across clustering restrictions
Improves scans with clustering restrictions. Before the change such
scans would scan the whole partition.

Below are results of a test case from perf_fast_forward which selects
a few rows from a large partition using query restrictions (not fast forwarding).

Before:

  stride  rows      time [s]     frags     frag/s    aio      [KiB] blocked dropped  idx hit idx miss  idx blk    cpu
  1000000 1         0.000609         1       1642      3        152       2       1        0        1        1  38.0%
  500000  2         0.242255         2          8    511      64152     398       4        0        1        1  98.6%
  250000  4         0.281592         4         14    749      95832     564       4        0        1        1  98.4%
  125000  8         0.328056         8         24    873     111704     657       4        0        1        1  98.4%
  62500   16        0.306700        16         52    935     119640     751       4        0        1        1  99.4%

After:

  stride  rows      time [s]     frags     frag/s    aio      [KiB] blocked dropped  idx hit idx miss  idx blk    cpu
  1000000 1         0.000711         1       1406      3        152       2       1        0        1        1  42.1%
  500000  2         0.000910         2       2197      5        216       3       2        0        1        1  39.2%
  250000  4         0.001384         4       2891      9        344       5       4        0        1        1  35.3%
  125000  8         0.003197         8       2502     21        728      13       8        0        1        1  53.1%
  62500   16        0.006664        16       2401     41       1368      25      16        0        1        1  58.2%
2017-04-20 10:54:37 +02:00
Tomasz Grabiec
05a1f92cbc clustering_ranges_walker: Introduce lower_bound_change_counter()
Allows detecting changes of lower_bound().

Result of advance_to() is not enough. When we get false from
advance_to() twice in a row, lower bound may or may not have changed.
2017-04-20 10:54:37 +02:00
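The point made in 05a1f92cbc, that two consecutive `false` results from advance_to() do not tell the caller whether the lower bound moved, can be shown with a simplified model. This is a sketch only: `ranges_walker` here is a hypothetical class over sorted, disjoint half-open integer ranges, not the real clustering_ranges_walker, and its counter is assumed to increase whenever a range is left behind.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical simplified walker over sorted, disjoint ranges [first, second).
class ranges_walker {
    std::vector<std::pair<int, int>> _ranges;
    std::size_t _current = 0;
    std::size_t _change_counter = 0;
public:
    explicit ranges_walker(std::vector<std::pair<int, int>> r)
        : _ranges(std::move(r)) {}

    // Returns true if pos falls inside one of the remaining ranges.
    // Positions are expected in non-decreasing order.
    bool advance_to(int pos) {
        while (_current < _ranges.size() && pos >= _ranges[_current].second) {
            ++_current;        // this range is entirely behind pos
            ++_change_counter; // so the lower bound has moved
        }
        return _current < _ranges.size() && pos >= _ranges[_current].first;
    }

    bool out_of_range() const { return _current == _ranges.size(); }

    // Start of the first range not yet passed.
    int lower_bound() const {
        assert(!out_of_range());
        return _ranges[_current].first;
    }

    std::size_t lower_bound_change_counter() const { return _change_counter; }
};
```

With ranges [10, 20) and [30, 40), advancing to 5 and then to 25 returns `false` both times, yet the lower bound moves from 10 to 30 in between. Comparing the counter before and after the call is what reveals the change.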
Tomasz Grabiec
461f2af0a1 sstables: mutation_reader: Avoid index lookups when out of range 2017-04-20 10:54:37 +02:00
Tomasz Grabiec
10c92d37d1 sstables: mutation_reader: Simplify fast_forward_to() 2017-04-20 10:54:37 +02:00
Tomasz Grabiec
bfb6858e55 sstables: mutation_reader: Let clustering_ranges_walker handle the _fwd_range start
Simplifies the code a bit, but also will make it easier to calculate
the next position we should skip to after forwarding, taking into
consideration both the position forwarded to as well as clustering
ranges of the query. That will be just calling
_ck_ranges_walker->lower_bound() after it is trimmed.
2017-04-20 10:54:37 +02:00
Tomasz Grabiec
d056d9c31b sstables: mutation_reader: Let mp_row_consumer decide about position passed to the index
In general mp_row_consumer has better information about the next
position to read. It could be after the position we forward to if
there are clustering restrictions. This will be exploited in the
following patches.
2017-04-20 10:54:37 +02:00
Tomasz Grabiec
a37712e9ae sstables: mutation_reader: Move mp_row_consumer::fast_forward_to() out of line 2017-04-20 10:54:37 +02:00
Tomasz Grabiec
bb3e683783 clustering_ranges_walker: Support trimming
Makes implementing fast_forward_to() easier. mp_row_consumer emulates
this currently. This change will allow simplifying this.
2017-04-20 10:54:37 +02:00
Tomasz Grabiec
652d04e78a clustering_ranges_walker: Generalize to work on position ranges
It will include the static row by default. This will allow simplifying
users, which work with position ranges already.
2017-04-20 10:54:36 +02:00
Tomasz Grabiec
c85fe3183c position_range: Allow stealing of bounds 2017-04-20 10:54:36 +02:00
Tomasz Grabiec
503c68de44 position_in_partition: Add more factory methods 2017-04-20 10:54:36 +02:00
Tomasz Grabiec
6c1dc642ee sstables: mutation_reader: Create index on-demand 2017-04-20 10:54:36 +02:00
Tomasz Grabiec
434fda3577 sstables: mutation_reader: Keep priority_class by reference
To indicate that it is not optional.
2017-04-20 10:54:36 +02:00
Tomasz Grabiec
a8c126c82a sstables: Expose get_index_reader() 2017-04-20 10:54:36 +02:00
Tomasz Grabiec
e1af5a406d sstables: Make sstable::get_index_reader() return unique_ptr<>
Makes callers a bit simpler
2017-04-20 10:54:36 +02:00
Tomasz Grabiec
7dc3fe7d3f tests: perf_fast_forward: Add test case for forwarding with clustering restrictions in a large partition 2017-04-20 10:54:36 +02:00
Tomasz Grabiec
eed864690b tests: perf_fast_forward: Add test case for slicing of large partition using a single-partition reader 2017-04-20 10:54:36 +02:00
Tomasz Grabiec
81fc7977a4 tests: perf_fast_forward: Add test for selecting few rows from large partition 2017-04-20 10:54:36 +02:00
Tomasz Grabiec
02da3ba316 tests: perf_fast_forward: Fix use-after-free in scan_with_stride_partitions()
partition_range must live as long as the reader is used.
2017-04-19 08:37:56 +02:00
Raphael S. Carvalho
e78db43b79 compaction_manager: fix crash when dropping a resharding column family
The problem is that the column family field of the task wasn't being set
for resharding, so the column family wasn't being properly removed from
the compaction manager. In addition to fixing this issue, we'll also
interrupt ongoing compactions when dropping a column family, exactly like
we do with shutdown.

Fixes #2291.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170418125807.7712-1-raphaelsc@scylladb.com>
2017-04-18 17:39:27 +03:00
Duarte Nunes
af37a3fdbf logalloc: Fix compilation error
This patch moves a function using the region_impl type after the type
has been defined.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170418124551.25369-1-duarte@scylladb.com>
2017-04-18 15:56:26 +03:00
Raphael S. Carvalho
11b74050a1 partitioned_sstable_set: fix quadratic space complexity
Streaming generates lots of small sstables with a large token range,
which triggers O(N^2) space usage in the interval map.
Level 0 sstables will now be stored in a structure that has O(N)
space complexity and which will be included for every read.

Fixes #2287.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170417185509.6633-1-raphaelsc@scylladb.com>
2017-04-18 13:04:38 +03:00
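The space argument in 11b74050a1 can be checked with a rough count. The sketch below assumes an interval map that splits the key space at every range boundary and stores, for each resulting segment, every range covering it. It models token ranges as half-open integer pairs, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>
#include <utility>
#include <vector>

using range = std::pair<int, int>; // half-open [first, second)

// Counts the (segment, range) pairs a segment-splitting interval map stores.
std::size_t interval_map_entries(const std::vector<range>& ranges) {
    std::set<int> bounds;
    for (const auto& r : ranges) {
        assert(r.first < r.second);
        bounds.insert(r.first);
        bounds.insert(r.second);
    }
    std::size_t entries = 0;
    for (auto it = bounds.begin(); it != bounds.end(); ++it) {
        auto next = std::next(it);
        if (next == bounds.end()) {
            break;
        }
        // Segment [*it, *next) holds one entry per range that covers it.
        for (const auto& r : ranges) {
            if (r.first <= *it && *next <= r.second) {
                ++entries;
            }
        }
    }
    return entries;
}

// A flat list stores each range exactly once.
std::size_t flat_list_entries(const std::vector<range>& ranges) {
    return ranges.size();
}
```

For four nested ranges [0, 8), [1, 7), [2, 6), [3, 5), the segment-splitting layout stores 1 + 2 + 3 + 4 + 3 + 2 + 1 = 16 entries, while the flat list stores 4. The cost of the flat list is that every read has to consider all of its members, which matches the commit's remark that the level 0 structure "will be included for every read".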
Takuya ASADA
86e464ab26 dist/offline_installer: support Ubuntu/Debian
Moved the existing script to dist/offline_installer/redhat and added a .deb
version in dist/offline_installer/debian.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1492474821-9907-1-git-send-email-syuu@scylladb.com>
2017-04-18 10:56:50 +03:00
Pekka Enberg
b31c45d8af Merge "clang fixes (part 1)" from Avi
"This series fixes some errors found by clang, with the aim of enabling
clang/zapcc as a supported compiler.  A few more fixes are needed to produce
a binary."

* tag 'clang/1/v1' of https://github.com/avikivity/scylla:
  logalloc: avoid auto in function argument declaration
  thrift: avoid auto in function argument declaration
  streamed_mutation: fix non-POD argument to C-style variadic function
  mutation_partition_serializer: avoid auto in function argument declaration
  date: use correct casts for years
  streaming: avoid auto in function argument declaration
  repair: avoid auto in function argument declaration
  gms: expose gms::inet_address streaming operator
  murmur3_partitioner: fix build on clang
  i_partitioner: remove unused function
  byte_ordered_partitioner: fix bad operator precedence
  result_set: pass comparator by reference to std::sort()
  to_string: move standard container overloads of to_string to std:: namespace
  cql_type: fix bad enum syntax on clang
  build: disable more warnings for clang
  build: fix detection of unsupported warnings on clang
2017-04-18 08:49:25 +03:00
Avi Kivity
844529fe33 logalloc: avoid auto in function argument declaration
'auto' in a non-lambda function argument is not legal C++, and is hard
to read besides.  Replace with the right type.

Since the right type is private, add some friendship.
2017-04-17 23:18:44 +03:00
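A minimal sketch of the kind of change 844529fe33 describes, assuming a class with a private nested type. The names `region`, `impl` and `used_space` are hypothetical and not taken from logalloc. At the time, `auto` in an ordinary function parameter was not part of standard C++; C++20 later added it as abbreviated function templates.

```cpp
#include <cstddef>

class region {
    struct impl {            // private nested type
        std::size_t used = 0;
    };
    impl _impl;

    // The free functions below name or access private members,
    // so they need friendship.
    friend std::size_t used_space(const impl& i);
    friend std::size_t used_space(const region& r);
public:
    void allocate(std::size_t n) { _impl.used += n; }
};

// Before (non-standard in C++14):
//     std::size_t used_space(const auto& i) { return i.used; }
// After: the parameter type is spelled out.
std::size_t used_space(const region::impl& i) {
    return i.used;
}

std::size_t used_space(const region& r) {
    return used_space(r._impl);
}
```

Spelling out the type also documents what the function accepts, which is the readability point the commit message makes.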
Avi Kivity
54add19ca2 thrift: avoid auto in function argument declaration
'auto' in a non-lambda function argument is not legal C++, and is hard
to read besides.  Replace with the right type.
2017-04-17 23:18:44 +03:00
Avi Kivity
f0c25fc20f streamed_mutation: fix non-POD argument to C-style variadic function
Clang warns that passing a non-POD to a C-style variadic function will
result in an abort().  That happens to be exactly what we want, but to
silence the warning, use a template instead.  Since templates aren't
allowed in local classes, move the containing class to namespace scope.
2017-04-17 23:18:44 +03:00
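The replacement described in f0c25fc20f could look roughly like the following. It is a sketch under assumptions: `row`, `tombstone` and `row_counter` are hypothetical names, and the original code is only assumed to have used a `(...)` catch-all overload that aborts.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical fragment types.
struct row {
    int cells;
};
struct tombstone {
    std::string origin; // the std::string member makes this a non-POD type
};

// Kept at namespace scope: a local class cannot declare a member template.
struct row_counter {
    int rows = 0;

    void operator()(const row&) { ++rows; }

    // Catch-all for any other argument type. The older form would be
    //     void operator()(...) { std::abort(); }
    // which clang flags when a non-POD object such as tombstone is passed.
    template <typename T>
    void operator()(const T&) { std::abort(); }
};
```

Overload resolution prefers the non-template `operator()(const row&)` for a `row` argument, so only unexpected types reach the aborting template.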
Avi Kivity
635c32eb32 mutation_partition_serializer: avoid auto in function argument declaration
'auto' in a non-lambda function argument is not legal C++, and is hard
to read besides.  Replace with the right type.
2017-04-17 23:18:44 +03:00
Avi Kivity
a0858dda3e date: use correct casts for years
Our date implementation uses int64_t for years, but some of the code was
not changed; clang complains, so use the correct casts to make it happy.
2017-04-17 23:03:15 +03:00
Avi Kivity
ca69a04969 streaming: avoid auto in function argument declaration
'auto' in a non-lambda function argument is not legal C++, and is hard
to read besides.  Replace with the right type.
2017-04-17 23:03:15 +03:00
Avi Kivity
ae7d7ae20f repair: avoid auto in function argument declaration
'auto' in a non-lambda function argument is not legal C++, and is hard
to read besides.  Replace with the right type.
2017-04-17 23:03:15 +03:00
Avi Kivity
c885c468a9 gms: expose gms::inet_address streaming operator
The standard says, and clang enforces, that declaring a function via
a friend declaration is not sufficient for ADL to kick in.  Add a namespace
level declaration so ADL works.
2017-04-17 23:03:15 +03:00
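The commit message for c885c468a9 does not show the failing call site, so the following is only an illustration of one case where a friend-only declaration is not enough: a call that names the operator through its namespace. `net::address` and `to_text` are hypothetical names.

```cpp
#include <ostream>
#include <sstream>
#include <string>

namespace net {

class address {
    unsigned _raw;
public:
    explicit address(unsigned raw) : _raw(raw) {}
    // On its own, this friend declaration does not make the operator
    // visible to ordinary or qualified name lookup.
    friend std::ostream& operator<<(std::ostream& os, const address& a);
};

// Namespace-level declaration: makes the operator a visible member of net.
std::ostream& operator<<(std::ostream& os, const address& a);

} // namespace net

std::string to_text(const net::address& a) {
    std::ostringstream os;
    net::operator<<(os, a); // qualified call, needs the declaration above
    return os.str();
}

namespace net {

std::ostream& operator<<(std::ostream& os, const address& a) {
    return os << a._raw;
}

} // namespace net
```

Removing the namespace-level declaration makes the qualified call in `to_text` fail to compile, because at that point the operator has only been introduced by the friend declaration.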