Commit Graph

11716 Commits

Author SHA1 Message Date
Duarte Nunes
822a315dfa thrift: Implement get_multi_slice verb
The get_multi_slice verb is used to perform multiple slices on a
single row key in one operation. It takes a set of column_slices,
which we normalize to not contain any overlapping ranges.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-14 18:20:54 +02:00
Duarte Nunes
9792a77266 range: Add deoverlap function
This patch adds the deoverlap function to range.hh, which takes in a
vector of possibly overlapping ranges and returns a vector of
non-overlapping ranges covering the same values.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-14 18:20:41 +02:00
Duarte Nunes
c910a4639c thrift: Implement get_paged_slice verb
The get_paged_slice verb is similar to the get_range_slices verb,
except that it doesn't take a SlicePredicate. Instead, it takes a
column from which to start the query.

For dynamic CFs, we use the partition_slice::specific_ranges to single
out the first partition, and query starting from the start_column row.
For static CFs, we issue an initial query to fetch the remainder of
columns from the first partition, and at least one more query to fetch
the subsequent columns until the limit is reached. This implies a
performance penalty for static CFs.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-14 15:36:23 +02:00
Duarte Nunes
370572884c thrift: Implement get_range_slices verb
The get_range_slices verb is similar to the multiget_slice verb,
except that it operates on a range of partition keys (or tokens).

In origin, empty partitions are returned as part of the KeySlice, for
which the key will be filled in but the columns vector will be empty.
Since in our case we don't return empty partitions, we don't know which
partition keys in the specified range we should return back to the client.
So for now, our behavior differs from Origin.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-14 15:36:23 +02:00
Duarte Nunes
b872db55bd thrift: Implement get_count verb on top of multiget_count
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-14 15:36:23 +02:00
Duarte Nunes
a42b7ba3f7 thrift: Implement multiget_count verb
This patch implements the multiget_count verb in a similar fashion as
multiget_slice, but using an accumulator that counts the returned
columns instead of create thrift ColumnOrSuperColumn objects.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-14 15:36:23 +02:00
Duarte Nunes
a44561870a thrift: Implement get verb in terms of get_slice
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-14 15:36:23 +02:00
Duarte Nunes
db4c26d5b8 thrift: Implement get_slice in terms of multiget_slice
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-14 15:36:23 +02:00
Duarte Nunes
cd3a12535e thrift: Implement multiget_slice verb
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-14 15:36:23 +02:00
Duarte Nunes
acd39d871f thrift: Validate column names
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-14 15:36:23 +02:00
Duarte Nunes
4e9af0dc8e thrift: Make read_command from SlicePredicate
This patch build a query::read_command from a SlicePredicate,
for both dynamic and static column families.

For dynamic CFs, restrictions on the clustering columns are added, and
for static CFs, limits and ordering is defined inline by selecting the
correct regular columns.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-14 15:36:23 +02:00
Duarte Nunes
21d0a2c764 query: Optionally send cell ttl
This patch adds support to send a cell's ttl as part of a query's
result. This is needed for thrift support.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-14 15:36:23 +02:00
Duarte Nunes
eb8f5fafb2 thrift: Add partition key validation
This patch validates whether the specified partition key is not empty
and under the size limit.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-14 15:36:23 +02:00
Duarte Nunes
f57136f2f3 thrift: Make key_from_thrift take schema ref
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-14 15:36:23 +02:00
Duarte Nunes
e2b4cc4849 types: Add to_bytes_view function
This patch adds a function that converts a reference to an std::string
to a bytes_view.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-14 15:36:23 +02:00
Duarte Nunes
a647fea30b schema: Add is_dynamic to thrift_schema
This patch adds the is_dynamic() function to thrift_schema, which
tells whether the underlying column family is dynamic or not,
according to thrift rules.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-14 15:36:23 +02:00
Duarte Nunes
527ec2ab59 thrift: Support composite keys
This patch adds support for composite comparators (which, for dynamic
column families, it means composite clustering keys) and for composite
keys (composite partition keys).

Support for composite column names and regular columns is deferred,
which will entail making compound_type an abstract_type.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-14 15:36:23 +02:00
Duarte Nunes
7f5ec71b1f thrift: Extract ttl calculation
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-14 15:36:23 +02:00
Duarte Nunes
324b776c1b thrift: Add lookup_schema function
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-14 15:36:23 +02:00
Avi Kivity
4ef2b1b25f Merge seastar upstream
* seastar e660d54...5e97d5f (7):
  > util::lazy_eval: add an implicit cast operator overload
  > rpc: consolidate read_(request|response)_frame logic
  > rpc: handle lz4 compressor errors
  > iotune: provide a status dump if we can't calculate a proper number of io_queues
  > rpc: adjust lz4 compression for older lz4.h
  > Fix chunked_fifo move assignment
  > rpc: add missing header file protectors
2016-07-14 16:27:23 +03:00
Avi Kivity
32d670a792 Merge "Scylla-housekeeping check version" from Amnon
"This series replaces the original scylla-help.py

It contains only a basic script that checks daily for version and report if a
newer version matched.

The script is added as a service and will be started and shutdown with
scylla-server."
2016-07-14 14:58:33 +03:00
Avi Kivity
1048e1071b db: do not create column family directories belonging to foreign keyspaces
Currently, for any column family, we create a directory for it in all
keyspace directories.  This is incredibly awkward.

Fix by iterating over just the keyspace's column families, not all
column families in existence.

Fixes #1457.
Message-Id: <1468495182-18424-1-git-send-email-avi@scylladb.com>
2016-07-14 14:31:05 +03:00
Amnon Heiman
260761f2dd rules.in: Add the scylla-timer to ubuntu
This adds a rule to install the scylla-timer as part of the ubuntu
package.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-07-14 12:46:47 +03:00
Amnon Heiman
3be9ab38e2 ubuntu.in: Add dependency to python3-requests
The check version script uses the python requests package, this add the
dependency to the ubuntu package.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-07-14 12:46:47 +03:00
Amnon Heiman
3b9db378ac scylla-server.install.in: Pack the scyll-housekeeping on ubuntu
This adds the scylla-housekeeping to the ubuntu packging.
2016-07-14 12:46:47 +03:00
Amnon Heiman
948140bec3 Adding a timer service for ubuntu scylla-housekeeping
Ununtu 14.4 upstart does not support timers for recurrent operations.
The upstart cookbook suggest a way to mimic this functionality here:
http://upstart.ubuntu.com/cookbook/#run-a-job-periodically

This patch adds a service that runs the house-keeping daily.

Setting it as a service insure that it would start and stop with
scylla-server service.
2016-07-14 12:46:39 +03:00
Avi Kivity
23edc1861a db: estimate queued read size more conservatively
There are plenty of continuations involved, so don't assume it fits in 1k.
Message-Id: <1468429516-4591-1-git-send-email-avi@scylladb.com>
2016-07-14 11:42:24 +02:00
Avi Kivity
d3c87975b0 db: don't over-allocate memory for mutation_reader
column_family::make_reader() doesn't deal with sstables directly, so it
doesn't need to reserve memory for them.

Fixes #1453.
Message-Id: <1468429143-4354-1-git-send-email-avi@scylladb.com>
2016-07-14 10:01:42 +02:00
Paweł Dziepak
10c144ffd4 types: fix type aliasing violation
Any pointer can be casted to char*, but not the other way around. This
causes GCC6 to misoptimize timestamp_type_impl::from_string().

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1468413349-27267-1-git-send-email-pdziepak@scylladb.com>
2016-07-13 17:22:16 +03:00
Tomasz Grabiec
c97871d95c migration_manager: Uncomment logging for keysapce drop
Message-Id: <1468413673-6899-1-git-send-email-tgrabiec@scylladb.com>
2016-07-13 13:42:23 +01:00
Gleb Natapov
9cc076c9f3 storage_proxy: preserve endpoint's order while filtering local nodes for query
filter_for_query() gets sorted by preference list of endpoints and
should preserve that order after filtering out non local endpoints for
local query. partition() does not guaranty this while stable_partition()
does, so use it instead.

Fixes #1450.
Message-Id: <20160713100909.GM10767@scylladb.com>
2016-07-13 13:17:28 +03:00
Tomasz Grabiec
7227c537ce Merge branch 'pdziepak/streamed-mutations-hashing/v5' from seastar-dev.git
From Paweł:

This is another episode in the "convert X to streamed mutations" series.
Hashing mutations (mainly for repair) is converted so that it doesn't
need to rebuild whole mutation.

The first part of the series changes the way streamed mutations deal
with range tombstones. Since it is not necessary to make sure we write
disjoint tombstones to sstables there is no need anymore for streamed
mutations to produce disjoint tombstones and, consequently, no need for
range tombstones to be split into range_tombstone_begin and
range_tombstone_end.

The second part is the actual hashing implementation. However, to ensure
that the hash depends only on the contents of the mutation and no the
way it is stored in different data sources range tombstones have to be
made disjoint before they are hashed.

This series also ensures that any changes caused by streamed mutations
to hashing and streaming do not break repair during upgrade.
2016-07-13 11:24:00 +02:00
Duarte Nunes
674afc52bc compound_test: Test singular composite_view::explode()
This patch adds a test case for composite_view::explode() called on a
non-compound composite.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1468353393-3074-1-git-send-email-duarte@scylladb.com>
2016-07-13 11:23:24 +02:00
Paweł Dziepak
3fe1aec29d streaming: avoid word "ERROR" in non-error messages
Some tools (e.g. ccm) get confused and consider messages containing word
"ERROR" as error level messagess irrespectively of their actual severity
level.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1468399752-5228-1-git-send-email-pdziepak@scylladb.com>
2016-07-13 12:06:33 +03:00
Paweł Dziepak
eb88181347 repair: ask for streamed checksums if cluster supports them
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-13 09:51:23 +01:00
Paweł Dziepak
7e06499458 repair: convert hashing to streamed_mutations
This patch makes hashing for repair calculate checksums in a way that
doesn't require rebuilding whole mutation.
Unfortunately, such checksums are incompatible with the old ones so the
old way for computing checksums is preserved for compatibility reasons.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-13 09:51:23 +01:00
Paweł Dziepak
e779e2f0c9 streaming: do not fragment mutations in mixed cluster
The receiving side needs to handle fragmented mutations properly so that
isolation guarantees are not broken. If the receiving node may be an old
one do not fragment mutations.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-13 09:51:23 +01:00
Paweł Dziepak
85c092c56c storage_service: add LARGE_PARTITIONS_FEATURE
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-13 09:51:23 +01:00
Paweł Dziepak
c5662919df tests/streamed_mutation: test hashing
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-13 09:51:23 +01:00
Paweł Dziepak
fe172484bd streamed_mutation: add mutation_hasher
mutation_hasher is a consumer of streamed_mutation that feeds its data
to a specified hasher.
It is not compatible with hashing_partition_visitor.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-13 09:51:23 +01:00
Paweł Dziepak
eb1dcf08e7 tests/streamed_mutation: add test for range_tombstones_stream
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-13 09:51:23 +01:00
Paweł Dziepak
93cc4454a6 streamed_mutation: emit range_tombstones directly
Originally, streamed_mutations guaranteed that emitted tombstones are
disjoint. In order to achieve that two separate objects were produced
for each range tombstone: range_tombstone_begin and range_tombstone_end.

Unfortunately, this forced sstable writer to accumulate all clustering
rows between range_tombstone_begin and range_tombstone_end.

However, since there is no need to write disjoint tombstones to sstables
(see #1153 "Write range tombstones to sstables like Cassandra does") it
is also not necessary for streamed_mutations to produce disjoint range
tombstones.

This patch changes that by making streamed_mutation produce
range_tombstone objects directly.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-13 09:51:18 +01:00
Paweł Dziepak
c3a8539074 streamed_mutation: add more comparators to position_in_partition
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-13 09:50:08 +01:00
Paweł Dziepak
27fea7bf2c mutation_partition: add non-cons rows and tombstones accessors
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-13 09:50:07 +01:00
Paweł Dziepak
2208d4b53e range_tombstone_list: add non-const begin() and end()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-13 09:50:07 +01:00
Paweł Dziepak
5a790a9b49 range_tombstone: add flip()
range_tombstone::flip() flips range bounds. This is necessary in order
to use range tombstone in reversed mutation fragment streams.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-13 09:50:07 +01:00
Paweł Dziepak
e1d306fa0d range_tombstone: add memory_usage()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-13 09:50:07 +01:00
Paweł Dziepak
91a866501d range_tombstone: add range_tombstone_accumulator
range_tombstone_accumulator is a helper class that allows determining
tombstone for a clustering row when range tombstones and clustering rows
are streamed from streamed_mutation.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-13 09:50:07 +01:00
Paweł Dziepak
cd7937d33b range_tombstone: add apply()
range_tombstone::apply() allows merging two, possibly overlapping, range
tombstones with the same start bound and produces one or two disjoint
range tombstones as a result.

It is intended to be used for merging tombstones coming from different
sources.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-13 09:50:07 +01:00
Nadav Har'El
aec90a22da sstable parsing: assert we do not lose clustering rows
The sstable parsing code calls mp_row_consumer::flush() after every
clustering row has been read, and this puts the now complete row in a single
field "_ready". The assumption is that at this point parsing will stop, the
consumer will move out this _ready (mp_row_consumer::get_mutation_fragment())
and when flush() is later called again, _ready will be empty again.

This assumption is correct in our code, but is based on an intricate
combination of estoreric parts of the code, such as:

 1. In data_consume_row_context we stop parsing after reading the parition's
    header, before reading any clustering rows, giving the caller the chance
    to call sstable_streamed_mutation::read_next() to be prepared for the
    incoming mutations.

 2. In mp_row_consumer::flush_if_needed(), we stop the parser after each
    individual clustering row.

It is easy to break this assumption, and I did this in one of my code changes,
and the result was silent loss of clustering rows, as "_ready" got silently
overwritten before the reader had a chance to move it out.

What this patch does is to add an assertion: If a clustering row is silently
lost before being transferred to the mutation fragment reader, we croak.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1468389955-24600-1-git-send-email-nyh@scylladb.com>
2016-07-13 09:42:48 +01:00