Commit Graph

10230 Commits

Author SHA1 Message Date
Pekka Enberg
fde8677f1a cql3/query_processor: Clean up code formatting
Currently, query_processor.cc code formatting is all over the place,
which makes the file hard to read. Apply some formatting magic to make
it prettier.

Message-Id: <1470832486-26020-2-git-send-email-penberg@scylladb.com>
2016-08-16 10:39:15 +03:00
Pekka Enberg
ce07822f49 cql3/query_processor: Use type deduction to make code more readable
Use the 'auto' specifier for variables and lambda parameters to make the
code more readable.

Message-Id: <1470832486-26020-1-git-send-email-penberg@scylladb.com>
2016-08-16 10:39:11 +03:00
Avi Kivity
b1f9688432 Merge "range: Add nonwrapping_range" from Duarte
"Ranges that wrap around are a source of complexity and bugs. This patchset
adds a nonwrapping_range class, which specifies the range can't wrap around.
It is the user of the nonwrapping_range that is required to enforce this
constraint.

The idea is to incrementaly disallow ranges that wrap around. We do it
for query::clustering_range in this patchset, and it can be done similarly
for other ranges. This moves the burden of unwrapping ranges to the edges.

Fixes #1544"
2016-08-16 10:08:24 +03:00
Duarte Nunes
5161ea283f query: query::clustering_range can't wrap around
This patch changes the type of query::clustering_range to express that
ranges that wrap around are not allowed, and ranges that have the
start bound after the end bound are considered empty.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-08-15 14:50:20 +00:00
Duarte Nunes
3275fabe53 storage_proxy: Short circuit query without clustering ranges
This patch makes the storage_proxy return an empty result when the
query doesn't define any clustering ranges (default or specific).

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-08-15 14:48:57 +00:00
Duarte Nunes
56f10abce3 thrift: Don't always validate clustering range
This patch makes make_clustering_range not enforce that the range be
non-wrapping, so that it can be validated differently if needed. A
make_clustering_range_and_validate function is introduced that keeps
the old behavior.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-08-15 14:48:57 +00:00
Duarte Nunes
be4adf212a nonwrapping_range: Add unit tests
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-08-15 14:48:57 +00:00
Duarte Nunes
bb16e194bc range: Add nonwrapping_range class
This patch introduces the nonwrapping_range class. This class is
intended to be used by code that requires non wrapping ranges.
Internally, it uses a wrapping_range. Users are responsible for
ensuring the bounds are correct when creating a nonwrapping_range.

The path proposed here is to incrementally replace usages of
wrapping_range/range by nonwrapping_range, pushing usages of wrapping
ranges as further to the edges as possible.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-08-15 14:48:57 +00:00
Duarte Nunes
2bb428973a range: Rename to wrapping_range
This patch renames range to wrapping_range in preparation for adding a
new range type, nonwrapping_range.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-08-15 14:48:57 +00:00
Duarte Nunes
2c0b049176 clustering_key_filter: Don't forward declare range
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-08-15 14:48:57 +00:00
Paweł Dziepak
5cae44114f partition_version: handle errors during version merge
Currently, partition snapshot destructor can throw which is a big no-no.
The solution is to ignore the exception and leave versions unmerged and
hope that subsequent reads will succeed at merging.

However, another problem is that the merge doesn't use allocating
sections which means that memory won't be reclaimed to satisfy its
needs. If the cache is full this may result in partition versions not
being merged for a very long time.

This patch introduces partition_snapshot::merge_partition_versions()
which contains all the version merging logic that was previously present
in the snapshot destructor. This function may throw so that it can be
used with allocating sections.

The actual merging and handling of potential erros is done from
partition_snapshot_reader destructor. It tries to merge versions under
the allocating section. Only if that fails it gives up and leaves them
unmerged.

Fixes #1578
Fixes #1579.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1471265544-23579-1-git-send-email-pdziepak@scylladb.com>
2016-08-15 15:56:53 +03:00
Asias He
ef782f0335 gossip: Add heart_beat_version to collectd
$ tools/scyllatop/scyllatop.py '*gossip*'

node-1/gossip-0/gauge-heart_beat_version 1.0
node-2/gossip-0/gauge-heart_beat_version 1.0
node-3/gossip-0/gauge-heart_beat_version 1.0

Gossip heart beat version changes every second. If everyting is working
correctly, the gauge-heart_beat_version output should be 1.0. If not,
the gauge-heart_beat_version output should be less than 1.0.

Message-Id: <cbdaa1397cdbcd0dc6a67987f8af8038fd9b2d08.1470712861.git.asias@scylladb.com>
2016-08-15 12:32:00 +03:00
Nadav Har'El
0d00da7f7f sstables: don't forget to read static row
[v2: fix check for static column (don't check if the schema is not compound)
 and move want-static-columns flag inside the filtering context to avoid
 changing all the callers.]

When a CQL request asks to read only a range of clustering keys inside
a partition, we actually need to read not just these clustering rows, but
also the static columns and add them to the response (as explained by Tomek
in issue #1568).

With the current code, that CQL request is translated into an
sstable::read_row() with a clustering-key filter. But this currently
only reads the requested clustering keys - NOT the static columns.

We don't want sstable::read_row() to unconditionally read the from disk
the static columns because if, for example, they are already cached, we
might not want to read them from disk. We don't have such partial-partition
cache yet, but we are likely to have one in the future.

This patch adds in the clustering key filter object a flag of whether we
need to read the static columns (actually, it's function, returning this
flag per partition, to match the API for the clustering-key filtering).

When sstable::read_row() sees the flag for this partition is true, it also
request to read the static columns.
Currently, the code always passes "true" for this flag - because we don't
have the logic to cache partially-read partitions.

The current find_disk_ranges() code does not yet support returning a non-
contiguous byte range, so this patch, if it notices that this partition
really has static columns in addition to the range it needs to read,
falls back to reading the entire partition. This is a correct solution
(and fixes #1568) but not the most efficient solution. Because static
columns are relatively rare, let's start with this solution (correct
by less efficient when there are static columns) and providing the non-
contiguous reading support is left as a FIXME.

Fixes #1568

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1471124536-19471-1-git-send-email-nyh@scylladb.com>
2016-08-15 12:30:19 +03:00
Avi Kivity
4fcebd4ca6 random_partitioner: fix overflow in shard_of()
uint128_t will overflow if smp::count > 2.  Replace with a larger type.

Message-Id: <1471188765-30142-1-git-send-email-avi@scylladb.com>
2016-08-15 09:41:54 +03:00
Amnon Heiman
612f677283 scylla.spec: conditionally include the housekeeping.cfg in the conf package
When the housekeeping configuration name was changed from conf to cfg it
was no longer included as part of the conf rpm.

This change adds a macro that determines of if the file should be
included or not and use that marco to conditionally add the
configuration file to the rpm.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1471169042-19099-1-git-send-email-amnon@scylladb.com>
2016-08-14 13:25:59 +03:00
Tomasz Grabiec
1b2ea14d0e partition_version: Add missing linearization context
Snapshot removal merges partitions, and cell merging must be done
inside linearization context.

Fixes #1574

Reviewed-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1471010625-18019-1-git-send-email-tgrabiec@scylladb.com>
2016-08-12 17:55:23 +03:00
Piotr Jastrzebski
f212a6cfcb Fix after free access bug in storage proxy
Due to speculative reads we can't guarantee that all
fibers started by storage_proxy::query will be finished
by the time the method returns a result.

We need to make sure that no parameter passed to this
method ever changes.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <31952e323e599905814b7f378aafdf779f7072b8.1471005642.git.piotr@scylladb.com>
2016-08-12 16:34:43 +02:00
Duarte Nunes
918a2939ff docker: If set, broadcast address is seed
This patch configures the broadcast address to be the seed if it is
configured, otherwise Scylla complains about it and aborts.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1470863058-1011-1-git-send-email-duarte@scylladb.com>
2016-08-12 11:46:50 +03:00
Avi Kivity
21392cf5fd Merge seastar upstream
* seastar 7fd8d49...823a404 (1):
  > io_priority_class: remove non-explicit operator unsigned
2016-08-11 17:20:23 +03:00
Avi Kivity
65aa9135a1 Merge seastar upstream
* seastar 59613e7...7fd8d49 (1):
  > reactor: Do not test for poll mode default
2016-08-11 14:46:45 +03:00
Amnon Heiman
5a4fc9c503 scylla-housekeeping: rename configuration file from conf to cfg
Files with a conf extension are run by the scylla_prepare on the AMI.
The scylla-housekeeping configuration file is not a bash script and
should not be run.

This patch changes its extension to cfg which is more python like.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1470896759-22651-2-git-send-email-amnon@scylladb.com>
2016-08-11 14:44:56 +03:00
Tomasz Grabiec
f1c2481040 sstables: Fix bug in promoted index generation
maybe_flush_pi_block, which is called for each cell, assumes that
block_first_colname will be empty when the first cell is encountered
for each partition.

This didn't hold after writing partition which generated no index
entry, because block_first_colname was cleared only when there way any
data written into the promoted index. Fix by always clearing the name.

The effect was that the promoted index entry for the next partition
would be flushed sooner than necessary (still counting since the start
of the previous partition) and with offset pointing to the start of
the current partition. This will cause parsing error when such sstable
is read through promoted index entry because the offset is assumed to
point to a cell not to partition start.

Fixes #1567

Message-Id: <1470909915-4400-1-git-send-email-tgrabiec@scylladb.com>
2016-08-11 13:08:48 +03:00
Amnon Heiman
a24941cc5f build_deb: Add dist flag
The dist flag mark the debian package as distributed package.
As such the housekeeping configuration file will be included in the
package and will not need to be created by the scylla_setup.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1470907208-502-2-git-send-email-amnon@scylladb.com>
2016-08-11 12:25:07 +03:00
Pekka Enberg
d1a052237d dist/docker: Fix typo in "--overprovisioned" help text
Reported by Mathias Bogaert (@analytically).
Message-Id: <1470904395-4614-1-git-send-email-penberg@scylladb.com>
2016-08-11 11:38:03 +03:00
Nadav Har'El
7409688356 README.md: add another required package
I tried to compile scylladb on a new Fedora 24 system, and the "-lsystemd"
library was missing during like. We need the systemd-devel package for that.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1470865063-12871-1-git-send-email-nyh@scylladb.com>
2016-08-11 10:21:21 +02:00
Avi Kivity
42d8701121 Merge seastar upstream
* seastar 64ae228...59613e7 (20):
  > reactor: fix I/O queue pending requests collectd metric
  > simple_input_stream: introduce copy_to() member function
  > scollectd: Fix merge skew between "disabled" and "descriped" metrics patches
  > Merge "Write-behind for XFS"
  > Merge "Fix the SMP queue poller" from Tomasz
  > Merge "collectd syntactical sugar & descriptions" from Calle
  > Fix build failure introduced by 5b1051ce0de5b772f22444abe6a1d97076b49b1f
  > Add --abort-on-seastar-bad-alloc option
  > Merge "Make arp comply with C++ strict aliasing rules"
  > doc: use install-dependencies.sh on docs
  > add execute bit on install-dependencies.sh
  > core/reactor: Fix use-after-free on io_event's promise
  > tcp: write option length correctly
  > tcp: make tcp options comply with strict aliasing rules
  > tcp: comply with strict aliasing rules
  > add libdl to library list
  > reactor: add exception counter
  > install-dependencies.sh: remove unnecessary sudo
  > install-dependencies.sh: install add-apt-repository when it's not installed
  > install-dependencies.sh: add protobuf to dependencies, for newly added prometheus API support

Fixes #1558.
2016-08-10 15:14:31 +03:00
Tomasz Grabiec
d7f8ce7722 Merge branch 'raphael/fix_min_max_metadata_v2' from git@github.com:raphaelsc/scylla.git
Fix for generation of sstables min/max clustering metadata from Raphael.
2016-08-10 10:43:35 +02:00
Pekka Enberg
6a5ab6bff4 dist/docker: Add '--smp', '--memory', and '--overprovisoned' options
Add '--smp', '--memory', and '--overprovisioned' options to the Docker
image. The options are written to /etc/scylla.d/docker.conf file, which
is picked up by the Scylla startup scripts.

You can now, for example, restrict your Docker container to 1 CPU and 1
GB of memory with:

   $ docker run --name some-scylla penberg/scylla --smp 1 --memory 1G --overprovisioned 1

Needed by folks who want to run Scylla on Docker in production.

Cc: Sasha Levin <alexander.levin@verizon.com>
Message-Id: <1470680445-25731-1-git-send-email-penberg@scylladb.com>
2016-08-10 11:34:08 +03:00
Raphael S. Carvalho
8deb1ca19d tests: add test to check sstables's min and max clustering values
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-08-09 15:54:40 -03:00
Raphael S. Carvalho
ef6ddf2398 sstables: fix tracking of min and max clustering components
Scylla was tracking min and max column names instead. Min and max
clustering components are tracked to optimize reads that use a
clustering filter. For more details:
https://issues.apache.org/jira/browse/CASSANDRA-5514

Also fix potential bug if clustering value is empty.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-08-09 15:01:30 -03:00
Nadav Har'El
c2e4f5ba16 Avoid some warnings in debug build
The sanitizer of the debug build warns when a "bool" variable is read when
containing a value not 0 or 1. In particular, if a class has an
uninitialized bool field, which class logic allows to only be set later,
then "move"ing such an object will read the uninitialized value and produce
this warning.

This patch fixes four of these warnings seen in sstable_test by initializing
some bool fields to false, even though the code doesn't strictly need this
initialization.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1470744318-10230-1-git-send-email-nyh@scylladb.com>
2016-08-09 13:21:45 +01:00
Duarte Nunes
0ed19ec64d thrift: Set default validator
This patch sets the default validator for dynamic column families.
Doing so has no consequences in terms of behavior, but it causes the
correct type to be shown when describing the column family through
cassandra-cli.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1470739773-30497-1-git-send-email-duarte@scylladb.com>
2016-08-09 13:55:07 +02:00
Avi Kivity
da4d33802e Merge "Add configuration file to scylla-housekeeping" from Amnon
"The series adds an optional configuration file to the scylla-housekeeping. The
file act as a way to prevent the scylla-housekeeping to run. A missing
configuration file, will make the scylla-housekeeping immediately.

The series adds a flag to the build_rpm that differentiate between public
distributions that would contain the configuration file and private
distributions that will not contain it which will cause the setup script to
create it."
2016-08-09 14:52:19 +03:00
Nadav Har'El
e005762271 sstable: avoid copying non-existant value
The promoted-index reading code contained a bug where it copied the value
of an disengaged optional (this non-value was never used, but it was still
copied ). Fix it by keeping the optional<> as such longer.

This bug caused tests/sstable_test in the debug build to crash (the release
build somehow worked).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1470742418-8813-1-git-send-email-nyh@scylladb.com>
2016-08-09 14:35:18 +03:00
Duarte Nunes
f63886b32e thrift: Send empty col metadata when describing ks
This patch ensures we always send the column metadata, even when the
column family is dynamic and the metadata is empty, as some clients
like cassandra-cli always assume its presence.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1470740971-31169-1-git-send-email-duarte@scylladb.com>
2016-08-09 14:33:18 +03:00
Pekka Enberg
9ff242d339 cql3: Filter compaction strategy class from compaction options
Cassandra 2.x does not store the compaction strategy class in compaction
options so neither should we to avoid confusing the drivers.

Fixes #1538.
Message-Id: <1470722615-29106-1-git-send-email-penberg@scylladb.com>
2016-08-09 10:38:37 +02:00
Pekka Enberg
c23acbe5e6 Update scylla-ami submodule
* dist/ami/files/scylla-ami 2e599a3...14c1666 (1):
  > setup coredump on first startup
2016-08-09 11:09:24 +03:00
Nadav Har'El
bce020efbd Fix failing tests
Commit 0d8463aba5 broke some of the tests with an assertion
failure about local_is_initialized(). It turns out that there is more than
one level of local_is_initialized() we need to check... For some tests,
neither locals were initialized, but for others, one was and the other
wasn't, and the wrong one was tested.

With this patch, all unit tests except "flush_queue_test.cc" pass on my
machine. I doubt this test is relevant to the promoted index patches,
but I'll continue to investigate it.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1470695199-32649-1-git-send-email-nyh@scylladb.com>
2016-08-09 10:51:40 +03:00
Calle Wilund
1593f4254b flush_queue_test: Start semaphore in propagation tests not initialized
Somehow, all my local runs timed ok anyway, but obviously not on all
machines.
Message-Id: <1470727968-1759-1-git-send-email-calle@scylladb.com>
2016-08-09 09:35:28 +02:00
Asias He
d8bff4f745 gossip: Fix debug log in wait_for_gossip_to_settle
There is an extra '{}' in the logger format string.

Fixes:

gossip - Gossip looks settled. 8 gossip round completed: ???

Message-Id: <1470278008-29914-2-git-send-email-asias@scylladb.com>
2016-08-08 16:38:21 +03:00
Takuya ASADA
60ce16cd54 dist/common/scripts: mkdir -p /var/lib/scylla/coredump before symlinking
We are creating this dir in scylla_raid_setup, but user may create XFS volume w/o using the command, scylla_coredump_setup should work on such condition.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1470638615-17262-1-git-send-email-syuu@scylladb.com>
2016-08-08 16:02:55 +03:00
Pekka Enberg
3b31d500c8 dist/docker: Document data volume and cpuset configuration
Message-Id: <1470649675-5648-1-git-send-email-penberg@scylladb.com>
2016-08-08 15:50:30 +03:00
Pekka Enberg
4372da426c dist/docker: Add '--broadcast-rpc-address' command line option
We already have a '--broadcat-address' command line option so let's add
the same thing for RPC broadcast address configuration.

Message-Id: <1470656449-11038-1-git-send-email-penberg@scylladb.com>
2016-08-08 15:46:42 +03:00
Avi Kivity
98226a14ac Merge "Exception propagation writers in commitlog batch"
"
While periodic mode is a all-bets-off crap-shoot as far as knowing if
data actually reached disk or not, batch mode is supposed to be
somewhat more reliable/deterministic.

Thus, if we get an exception writing/flushing the current buffer,
we should propagate exceptions to all execution paths involved
in this buffer.

Flush queue can now (optionally) propagate exceptions to all clients, and
commit log uses this to ensure that commit log writers in batch mode
all generate exceptions on disk errors.

Also includes some rudimentary tests for flush queue mechanisms.

Note: other main user, sstable flushing, is not affected, as default
mode is still to keep exceptions to individual worker continuations,
not waiters."
2016-08-08 15:33:26 +03:00
Avi Kivity
700feda0db Merge "promoted index for reading partial partitions" from Nadav
"The goal of this patch series is to support reading and writing of a
"promoted index" - the Cassandra 2.* SSTable feature which allows reading
only a part of the partition without needing to read an entire partition
when it is very long. To make a long story short, a "promoted index" is
a sample of each partition's column names, written to the SSTable Index
file with that partition's entry. See a longer explanation of the index
file format, and the promoted index, here:

     https://github.com/scylladb/scylla/wiki/SSTables-Index-File

There are two main features in this series - first enabling reading of
parts of partitions (using the promoted index stored in an sstable),
and then enable writing promoted indexes to new sstables. These two
features are broken up into smaller stand-alone pieces to facilitate the
review.

Three features are still missing from this series and are planned to be
developed later:

1. When we fail to parse a partition's promoted index, we silently fall back
   to reading the entire partition. We should log (with rate limiting) and
   count these errors, to help in debugging sstable problems.

2. The current code only uses the promoted index when looking for a single
   contiguous clustering-key range. If the ck range is non-contiguous, we
   fall back to reading the entire partition. We should use the promoted
   index in that case too.

3. The current code only uses the promoted index when reading a single
   partition, via sstable::read_row(). When scanning through all or a
   range of partitions (read_rows() or read_range_rows()), we do not yet
   use the promoted index; We read contiguously from data file (we do not
   even read from the index file, so unsurprisingly we can't use it)."
2016-08-07 17:53:17 +03:00
Avi Kivity
bbaebb39a8 Update scylla-ami submodule
* dist/ami/files/scylla-ami 863cc45...2e599a3 (1):
  > Do not set developer-mode on unsupported instance types
2016-08-07 17:51:24 +03:00
Nadav Har'El
fc063ae62d tests: add test for promoted index writing
In this unit test, we create using Scylla C++ code, the same large
partition with 13520 CQL rows as we previously imported from Cassandra
for the large partition test. We then verify that the sstable index file
we just wrote is byte-for-byte identical to the one previously created by
Cassandra. They should indeed be identical, because the data file has the
same layout (even if timestamps are different) and our default promoted-
index block size is the same (64K) so the sample of columns should be
identical.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2016-08-07 17:47:15 +03:00
Nadav Har'El
0d8463aba5 sstables: promoted index write support
This patch adds writing of promoted index to sstables.

The promoted index is basically a sample of columns and their positions
for large partitions: The promoted index appears in the sstable's index
file for partitions which are larger than 64 KB, and divides the partition
to 64 KB blocks (as in Cassandra, this interval is configurable through
the column_index_size_in_kb config parameter). Beyond modifying the index
file, having a promoted index may also modify the data file: Since each
of blocks may be read independently, we need to add in the beginning of
each block the list of range tombstones that are still open at that
position.

See also https://github.com/scylladb/scylla/wiki/SSTables-Index-File

Fixes #959

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2016-08-07 17:47:12 +03:00
Nadav Har'El
09ede5f333 range tombstone accumulator: add method
Add to the range_tombstone_accumulator a range_tombstones_for_row(ck)
method.

Just like the existing tombstone_for_row(ck), this function drops from
the accumulator tombstones that end before ck. But while the existing
function returned just a single tombstone affecting the given row (the
most recent tombstone), the new function range_tombstones_for_row(ck)
returns all the accumulated range tombstones which cover ck.

This function will be useful for the promoted-index writing code later,
which divides a partition into blocks which may be read independently,
so each block needs to start with a repeat of the earlier tombstones
which still cover the first row in the new block.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2016-08-07 17:47:10 +03:00
Nadav Har'El
022a69caea sstables: promoted index read support
This patch adds support more efficiently reading small parts of a large
partition, without reading the entire partition as we had to do so far.
This is done using the "promoted index".

The "promoted index" is stored in the sstable index file, and provides
for each large sstable row ("partition" in CQL nomenclature) a sample of
the column names at (for example) 64KB intervals. This means that when we
read a slice of columns (e.g., cql rows), or page through a large partition,
we do not have to read the entire partition from disk.

This patch only implements the read side of promoted index - a later patch
will add the write-side support (i.e., writing the promoted index to the
index file while saving the sstable). Nevertheless this patch can already
be tested by reading existing sstables from Cassandra which include a
promoted index - such as the one included in the test in the previous patch.

The use of the promoted index currently has two limitations:

1. It is only used when reading a single partition with sstable::read_row(),
not when scanning through many partitions with sstable::read_range_rows()
or sstable::read_rows().

2. It is only used when filtering a single clustering-key range, rather
than a list of disjoint ranges. A single range is the common case.

These two issues will be improved later. In the meantime, in those
unsupported cases we simply continue to read entire partitions, so we're not
worse-off than before.

Also note that this patch only helps when sstable::read_row() is used with
a clustering-key prefix (i.e., a slice). Our higher-level request handling
code may decide to read an entire partition into the cache, and not use
a clustering-key prefix at all when reading. We will need to indepdently
improve the high-level code to use read_row()'s slicing capabilities
when paging through large partitions, for example.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2016-08-07 17:47:09 +03:00