Commit Graph

12867 Commits

Author SHA1 Message Date
Vladimir Krivopalov
003e8cf250 Use python2 explicitly as an interpreter for Python v2 scripts
Signed-off-by: Vladimir Krivopalov <vladimir.krivopalov@gmail.com>
Message-Id: <20170811032712.4362-1-vladimir.krivopalov@gmail.com>
2017-08-11 18:08:11 +03:00
Duarte Nunes
20337053ad Don't use literal lambdas
These are only available in C++17. Fixes the build after b5460c2.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-08-11 13:08:42 +02:00
Duarte Nunes
b5460c2990 Merge "Support duration type" from Jesse
"This patch series adds support for the `duration` type in CQL, which
was added to Cassandra in 3.10.

As part of this work, it was necessary also to add support for the
`vint` and `unsigned vint` types to the native protocol implementation,
which are part of v5 of the specification.

To test interactively, it is necessary to use cqlsh distributed with
Cassandra, as the version we distribute does not yet support the
duration type."

* 'jhk/duration_protocol/v5' of https://github.com/hakuch/scylla:
  Support `duration` CQL native type
  CQL native protocol: Add support for `vint` serialization
  duration_test.cc: Add test for printing zero duration
  duration.cc: Remove nop `const` qualifier on return type
  Change `const` qualifier declaration order for `duration`
  duration.cc: Simplify range checking
  Rename `duration` to `cql_duration`
2017-08-11 10:56:55 +01:00
Duarte Nunes
bcf21aacc2 storage_proxy: Directly call query_nonsingular_mutations_locally
Instead of duplicating the branch.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170811001559.25788-1-duarte@scylladb.com>
2017-08-11 09:06:01 +03:00
Duarte Nunes
a3ee99554b service/storage_proxy: Remove out of date comment
Now that we don't go directly to reconciliation for range queries, the
result isn't required to have the row and partition counts calculated
(we no longer transform a reconciled_result to a query::result).

Furthermore, this line was causing a lot of dtests to fail on account
of them not expecting an error line in the logs.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170810225351.12610-1-duarte@scylladb.com>
2017-08-11 09:04:23 +03:00
Jesse Haber-Kucharsky
509626fe08 Support duration CQL native type
`duration` is a new native type that was introduced in Cassandra 3.10 [1].

Support for parsing and the internal representation of the type was added in
8fa47b74e8.

Important note: The version of cqlsh distributed with Scylla does not have
support for durations included (it was added to Cassandra in [2]). To test this
change, you can use cqlsh distributed with Cassandra.

Duration types are useful when working with time-series tables, because they can
be used to manipulate date-time values in relative terms.

Two interesting applications are:

- Aggregation by time intervals [3]:

`SELECT * FROM my_table GROUP BY floor(time, 3h)`

- Querying on changes in date-times:

`SELECT ... WHERE last_heartbeat_time < now() - 3h`

(Note: neither of these is currently supported, though columns with duration
values are.)

Internally, durations are represented as three signed counters: one for months,
for days, and for nanoseconds. Each of these counters is serialized using a
variable-length encoding which is described in version 5 of the CQL native
protocol specification.

The representation of a duration as three counters means that a semantic
ordering on durations doesn't exist: Is `1mo` greater than `1mo1d`? We cannot
know, because some months have more days than others. Durations can only have a
concrete absolute value when they are "attached" to absolute date-time
references. For example, `2015-04-31 at 12:00:00 + 1mo`.

That duration values are not comparable presents some difficulties for the
implementation, because most CQL types are. Like in Cassandra's implementation
[2], I adopted a similar strategy to the way restrictions on the `counter` type
are checked. A type "references" a duration if it is either a duration or it
contains a duration (like a `tuple<..., duration, ...>`, or a UDT with a
duration member).

The following restrictions apply on durations. Note that some of these contexts
are either experimental features (materialized views), or not currently
supported at run-time (though support exists in the parser and code, so it is
prudent to add the restrictions now):

- Durations cannot appear in any part of a primary key, either for tables or
  materialized views.

- Durations cannot be directly used as the element type of a `set`, nor can they
  be used as the key type of a `map`. Because internal ordering on durations is
  based on a byte-level comparison, this property of Cassandra was intended to
  help avoid user confusion around ordering of collection elements.

- Secondary indexes on durations are not supported.

- "Slice" relations (<=, <, >=, >) are not supported on durations with `WHERE`
   restrictions (like `SELECT ... WHERE span <= 3d`). Multi-column restrictions
   only work with clustering columns, which cannot be `duration` due to the
   first rule.

- "Slice" relations are not supported on durations with query conditions (like
  `UPDATE my_table ... IF span > 5us`).

Backwards incompatibility note:

As described in the documentation [4], duration literals take one of two
forms: either ISO 8601 formats (there are three), or a "standard" format. The ISO
8601 formats start with "P" (like "P5W"). Therefore, identifiers that have this
form are no longer supported.

Fixes #2240.

[1] https://issues.apache.org/jira/browse/CASSANDRA-11873

[2] bfd57d13b7

[3] https://issues.apache.org/jira/browse/CASSANDRA-11871

[4] http://cassandra.apache.org/doc/latest/cql/types.html#working-with-durations
2017-08-10 15:01:10 -04:00
Jesse Haber-Kucharsky
91dab1d998 CQL native protocol: Add support for vint serialization
Version 5 of the native protocol for CQL [1] adds the `vint` and `unsigned vint`
types.

An unsigned integer encoded as a `vint` has a variable size based on the
magnitude of the value. The first byte indicates the total number of bytes.

For signed integers, a "zig-zag" encoding scheme ensures that small negative
values are encoded as short-length `vint`s (0 -> 0, -1 -> 1, 1 -> 2, 2 -> 3, -2
-> 4, etc).

[1] https://github.com/apache/cassandra/blob/trunk/doc/native_protocol_v5.spec
2017-08-10 14:11:30 -04:00
Jesse Haber-Kucharsky
77489f843f duration_test.cc: Add test for printing zero duration
It's somewhat counter-intuitive, but Cassandra also formats zero-valued
duration values as an empty string.
2017-08-10 14:11:30 -04:00
Jesse Haber-Kucharsky
d9c027c2dd duration.cc: Remove nop const qualifier on return type
These have no effect according to the Clang static analyzer.
2017-08-10 14:11:30 -04:00
Jesse Haber-Kucharsky
54c3cf0201 Change const qualifier declaration order for duration
The vast majority of the code-base is written in left-`const` style, and
consistency is important.
2017-08-10 14:11:30 -04:00
Jesse Haber-Kucharsky
1889b036b1 duration.cc: Simplify range checking 2017-08-10 14:11:23 -04:00
Avi Kivity
301358e440 Merge "Optimize combined_mutation_reader for disjoint sstable ranges" from Botond
"sstables will sometimes have narrow/disjont ranges (e.g. LCS L1+).
This can be exploited when reading from a range of sstables by opening
sstables on-demand thus saving memory, processing and potentially I/O.

To achieve this combined_mutation_reader is refactored such that the
reader selection logic is moved-out into a reader_selector class.
combined_mutation_reader now takes a reader_selector instance in its
constructor and asks it for new readers for the current ring position
on every call to operator()().

At the moment two specializations of reader_selector are provided:
* list_reader_selector which implements the current logic, that is using
    a provided mutation_reader list, and
* incremental_reader_selector which implements the on-demand opening
    logic discussed above.

Fixes #1935"

* 'bdenes/optimize_combined_reader-v6' of https://github.com/denesb/scylla:
  Add combined_mutation_reader_test unit test
  Remove range_sstable_reader
  Add incremental_reader_selector
  Add reader_selector to combined_mutation_reader
  sstable_set::incremental_selector: select() now returns a selection
2017-08-10 15:16:30 +03:00
Botond Dénes
9ee9988097 Add combined_mutation_reader_test unit test 2017-08-10 12:38:10 +03:00
Botond Dénes
3e97a5cd6b Remove range_sstable_reader
range_sstable_reader is replaced with combined_mutation_reader, using
the incremental_reader_selector.
2017-08-10 12:38:10 +03:00
Botond Dénes
bfc74f1312 Add incremental_reader_selector
incremental_reader_selector is a specialization of reader_selector for
the case when sstables have narrow and/or disjoint token ranges. To
exploit this it creates new readers on-demand when their sstable's
token range intersects with the current ring position.
2017-08-10 12:38:02 +03:00
Botond Dénes
a6b9186cab Add reader_selector to combined_mutation_reader
combined_mutation_reader now accepts as a constructor argument a
reader_selector instance whoose task is to create new readers on
each call to operator()() if needed and possible.
This way it is possible to control how readers are created through
different specializations of reader_selector.

The previous logic is refactored into list_reader_selector which
is using a pre-provided mutation_reader list and forwards all of them to
combined_mutation_reader at once.
2017-08-10 12:37:40 +03:00
Takuya ASADA
1cb0fff146 dist/common/scripts/scylla_raid_setup: handle '--disks' parameter correctly when disk list is end with ','
We should handle parameters correctly even it's malformed.
Fixes #2402

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1499266239-27551-1-git-send-email-syuu@scylladb.com>
2017-08-10 11:42:33 +03:00
Takuya ASADA
8e115d69a9 dist/debian: append postfix '~DISTRIBUTION' to scylla package version
We are moving to aptly to release .deb packages, that requires debian repository
structure changes.
After the change, we will share 'pool' directory between distributions.
However, our .deb package name on specific release is exactly same between
distributions, so we have file name confliction.
To avoid the problem, we need to append distribution name on package version.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1502312935-22348-1-git-send-email-syuu@scylladb.com>
2017-08-10 10:53:56 +03:00
Vlad Zolotarov
1b4594b03a transport::server::process_prepare() don't ignore errors on other shards
If storing of the statement fails on any shard we should fail the whole PREPARE
request.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <1502325392-31169-13-git-send-email-vladz@scylladb.com>
2017-08-10 10:32:37 +03:00
Jesse Haber-Kucharsky
352e9f60ba Rename duration to cql_duration
`std::chrono::duration` is a prolific enough name that it's best to
disambiguate.
2017-08-09 15:15:20 -04:00
Botond Dénes
94fc550e68 sstable_set::incremental_selector: select() now returns a selection
A seletion contains - in addition to the list of sstables - a next_token
which is a hint as to what is the next best token to call select() with.
This should be the smallest token such that at the next call to
select() the least number of new sstables will be returned, without
skipping any.
2017-08-09 16:27:33 +03:00
Takuya ASADA
3077416ecc dist/debian: Backport scalability fix of _Unwind_Find_FDE to out gcc for Debian 8
Since we provide custom build gcc only for Debian 8, the fix is not apply to
Ubuntu/Debian 9.

Fixes #2646

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1502239191-12649-1-git-send-email-syuu@scylladb.com>
2017-08-09 12:19:52 +03:00
Avi Kivity
7217b7ab36 Merge "Use range_streamer everywhere" from Asias
"With this series, all the following cluster operations:

- bootstrap
- rebuild
- decommission
- removenode

will use the same code to do the streaming.

The range_streamer is now extended to support both fetch from and push
to peer node. Another big change is now the range_streamer will stream
less ranges at a time, so less data, per stream_plan and range_streamer
will remember which ranges are failed to stream and can retry later.

The retry policy is very simple at the moment it retries at most 5 times
and sleep 1 minutes, 1.5^2 minutes, 1.5^3 minutes ....

Later, we can introduce api for user to decide when to stop retrying and
the retry interval.

The benefits:

 - All the cluster operation shares the same code to stream
 - We can know the operation progress, e.g., we can know total number of
   ranges need to be streamed and number of ranges finished in
   bootstrap, decommission and etc.
 - All the cluster operation can survive peer node down during the
   operation which usually takes long time to complete, e.g., when adding
   a new node, currently if any of the existing node which streams data to
   the new node had issue sending data to the new node, the whole bootstrap
   process will fail. After this patch, we can fix the problematic node
   and restart it, the joining node will retry streaming from the node
   again.
 - We can fail streaming early and timeout early and retry less because
   all the operations use stream can survive failure of a single
   stream_plan. It is not that important for now to have to make a single
   stream_plan successful. Note, another user of streaming, repair, is now
   using small stream_plan as well and can rerun the repair for the
   failed ranges too.

This is one step closer to supporting the resumable add/remove node
opeartions."

* tag 'asias/use_range_streamer_everywhere_v4' of github.com:cloudius-systems/seastar-dev:
  storage_service: Use the new range_streamer interface for removenode
  storage_service: Use the new range_streamer interface for decommission
  storage_service: Use the new range_streamer interface for rebuild
  storage_service: Use the new range_streamer interface for bootstrap
  dht: Extend range_streamer interface
2017-08-09 10:00:25 +03:00
Takuya ASADA
98fc7b376d dist/redhat: install mdadm/xfsprogs on package install time
We experienced 'Constructing RAID volume...' takes too much time on some AMIs,
this is because setup script stuck at 'yum -y install mdadm xfsprogs'.
We don't have to install these packages on AMI startup time, we should
preinstall them on AMI creating time.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1502192796-21040-1-git-send-email-syuu@scylladb.com>
2017-08-09 09:10:34 +03:00
Piotr Jastrzebski
4137517cdc Check arguments of table_helper::setup_keyspace
to make sure all table helpers passed as arguments are
for the right keyspace.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <10edacd509880bb18180f13e8c28593d068c5c7b.1501688729.git.piotr@scylladb.com>
2017-08-08 15:55:06 +03:00
Piotr Jastrzebski
2d8a80f211 Make table_helper constructor safer
by taking keyspace name by value and storing it inside the object.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <a5dab41647348ae311e023fe5592aec650c6e32a.1501688729.git.piotr@scylladb.com>
2017-08-08 15:55:06 +03:00
Daniel Fiala
06089474c9 Print warning if user uses default cluster_name
* Configuration for cluster_name is commented-out in config file.
* Default value set to empty string and if not rewritten by user then
  warning is printed and value is reset to "ScyllaDB Cluster".

Fixes #2648.

Message-Id: <20170808113322.9313-1-daniel@scylladb.com>
2017-08-08 14:47:17 +03:00
Avi Kivity
a71138fc84 config: mark column_index_size_in_kb as Used
Fixes #2681
Message-Id: <20170808100415.16296-1-avi@scylladb.com>
2017-08-08 11:08:00 +01:00
Ultrabug
2022da2405 Add overall python code QA and guidelines with flake8
ScyllaDB loves python & python loves ScyllaDB.

It would benefit the project to start enforcing some code guidelines
and basic QA with a linter along a PEP8 respect thanks to flake8.

This patch adds a tox config to at least start with an assessment
of the work to be done on all .py files in the code base.

To reduce its noise, tests on long lines (> 80char) are ignored
for now.

Signed-off-by: Ultrabug <ultrabug@gentoo.org>
Message-Id: <20170726134242.8927-1-ultrabug@gentoo.org>
2017-08-08 11:15:45 +03:00
Raphael S. Carvalho
dddbd34b52 sstables: close index file when sstable writer fails
index's file output stream uses write behind but it's not closed
when sstable write fails and that may lead to crash.
It happened before for data file (which is obviously easier to
reproduce for it) and was fixed by 0977f4fdf8.

Fixes #2673.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170807171146.10243-1-raphaelsc@scylladb.com>
2017-08-08 09:53:14 +03:00
Asias He
49360992d9 storage_service: Use the new range_streamer interface for removenode
So that removenode operation will now stream small ranges at a time and
restream the failed ranges.
2017-08-07 16:31:48 +08:00
Asias He
6b8dc85f12 storage_service: Use the new range_streamer interface for decommission
So that decommission operation will now stream small ranges at a time and
restream the failed ranges.
2017-08-07 16:31:48 +08:00
Asias He
24584b8509 storage_service: Use the new range_streamer interface for rebuild
So that rebuild operation will now stream small ranges at a time and
restream the failed ranges.
2017-08-07 16:31:47 +08:00
Asias He
f239b11a84 storage_service: Use the new range_streamer interface for bootstrap
So that bootstrap operation will now stream small ranges at a time and
restream the failed ranges.
2017-08-07 16:31:47 +08:00
Asias He
6810031ba7 dht: Extend range_streamer interface
After this patch and the following patches to use the new
range_streamder interface, all the following cluster operations:

- bootstrap
- rebuild
- decommission
- removenode

will use the same code to do the streaming.

The range_streamer is now extended to support both fetch from and push
to peer node. Another big change is now the range_streamer will stream
less ranges at a time, so less data, per stream_plan and range_streamer
will remember which ranges are failed to stream and can retry later.

The retry policy is very simple at the moment it retries at most 5 times
and sleep 1 minutes, 1.5^2 minutes, 1.5^3 minutes ....

Later, we can introduce api for user to decide when to stop retrying and
the retry interval.

The benefits:

- All the cluster operation shares the same code to stream

- We can know the operation progress, e.g., we can know total number of
  ranges need to be streamed and number of ranges finished in
  bootstrap, decommission and etc.
- All the cluster operation can survive peer node down during the
  operation which usually takes long time to complete, e.g., when adding
  a new node, currently if any of the existing node which streams data to
  the new node had issue sending data to the new node, the whole bootstrap
  process will fail. After this patch, we can fix the problematic node
  and restart it, the joining node will retry streaming from the node
  again.
- We can fail streaming early and timeout early and retry less because
  all the operations use stream can survive failure of a single
  stream_plan. It is not that important for now to have to make a single
  stream_plan successful. Note, another user of streaming, repair, is now
  using small stream_plan as well and can rerun the repair for the
  failed ranges too.

This is one step closer to supporting the resumable add/remove node
opeartions.
2017-08-07 16:31:47 +08:00
Avi Kivity
86de6cc7fb Merge seastat upstream
* seastar f14d2a3...7a49ae5 (8):
  > sharded: improve support for cooperating sharded<> services
  > sharded: support for peer services
  > semaphore: add a version of with_semaphore that takes a duration timeout
  > scripts: perftune.py: fix the CPU mask generation for more than 64 CPUs
  > Revert "future-utils: make when_all() (vector variant) exception safe"
  > Revert "future-utils: fix gross compilation errors in when_all()"
  > future-utils: fix gross compilation errors in when_all()
  > future-utils: make when_all() (vector variant) exception safe

Includes change to batchlog_manager constructor to adapt it to
seastar::sharded::start() change.
2017-08-06 17:47:47 +03:00
Avi Kivity
3edec66903 Revert "repair: Make send_repair_checksum_range timeout"
This reverts commit 98757069a5. We have the
failure detector which will detect an unresponsive node and fail the RPC.
Adding a timeout can just introduce false positives.
2017-08-06 13:09:36 +03:00
Avi Kivity
621926d914 dist: debian: escape "$" character for make 2017-08-05 16:51:03 +03:00
Avi Kivity
a471851bf1 dist: debian: add /opt/scylladb/bin to PATH so antlr can be found 2017-08-05 15:46:58 +03:00
Avi Kivity
8bdc0dd471 dist: debian: search for libaries in /opt/scylladb/lib 2017-08-05 13:18:14 +03:00
Takuya ASADA
2ff3bdba5c dist/debian: switch Ubuntu 3rdparty packages to external build service
Switch Ubuntu to launchpad ppa:
https://launchpad.net/~scylladb/+archive/ubuntu/ppa/+packages

Since switching 3rdparty on Debian is not ready yet, keep them to use scylla
3rdparty repo, also keep --rebuild-dep option and dist/debian/dep.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1501866678-4922-1-git-send-email-syuu@scylladb.com>
2017-08-05 11:29:13 +03:00
Glauber Costa
4a911879a3 add active streaming reads metric
In commit f38e4ff3f, we have separated streaming reads from normal reads
for the purpose of determining the maximum number of reads going on.
However, we'll now be totally unaware of how many reads will be
happening on behalf of streaming and that can be important information
when debugging issues.

This patch adds this metric so we don't fly blind.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <1501909973-32519-1-git-send-email-glauber@scylladb.com>
2017-08-05 11:06:37 +03:00
Duarte Nunes
587b6be089 dirty_memory_manager: Add missing include
Allows tests/memory_footprint to build on Ubuntu 14.04.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-08-04 10:15:23 +02:00
Avi Kivity
4f12068e50 dist: re-add --rebuild-dep to build_rpm.sh
For compatibility with existing scripts; ignored.
2017-08-04 07:10:18 +03:00
Takuya ASADA
b5e83ebd94 dist/redhat: switch 3rdparty packages to external build service
Drop existing 3rdparty build script/3rdparty repo, switch to Fedora Copr
https://copr.fedorainfracloud.org/coprs/scylladb/scylla-3rdparty/packages/

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20170803110754.22152-1-syuu@scylladb.com>
2017-08-04 06:40:09 +03:00
Pekka Enberg
90872ffa1f docker: Disable stall detector
Fixes #2162

Message-Id: <1501759957-4380-1-git-send-email-penberg@scylladb.com>
2017-08-03 14:52:49 +03:00
Takuya ASADA
91ade1a660 dist/debian: check scylla user/group existance before adding them
To prevent install failing on the environment which already has scylla
user/group, existance check is needed.

Fixes #2389

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1495023805-14905-1-git-send-email-syuu@scylladb.com>
2017-08-03 13:01:18 +03:00
Takuya ASADA
6ac254fbcb dist: change nomerges=1 on block devices during fstrim execution
We have problem to run fstrim with nomerges=2, so we need to change
the parameter to 1 during fstrim execution.
To do this, this fix changes follow things:
 - revert dropping scylla_fstrim on Ubuntu 16.04/CentOS
 - disable distribution provided fstrim script
 - enable scylla_fstrim on all distributions
 - introduce --set-nomerges on scylla-blocktune
 - scylla_fstrim call scylla-blocktune by following order:
   - 'scylla-blocktune --set-nomerges 1'
   - 'fstrim' for each devices
   - 'scylla-blocktune --set-nomerges 2'

Fixes #2649

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1501531393-21109-1-git-send-email-syuu@scylladb.com>
2017-08-03 13:00:34 +03:00
Botond Dénes
0b7ac01f0f Add QtCreator project file and .gdbinit to .gitignore
Message-Id: <ff662910fe1156cdde2bda4aa5bb9cfc45bddda9.1501752340.git.bdenes@scylladb.com>
2017-08-03 12:58:35 +03:00
Avi Kivity
f38e4ff3f9 database: prevent streaming reads from blocking normal reads
Streaming reads and normal reads share a semaphore, so if a bunch of
streaming reads use all available slots, no normal reads can proceed.

Fix by assigning streaming reads their own semaphore; they will compete
with normal reads once issued, and the I/O scheduler will determine the
winner.

Fixes #2663.
Message-Id: <20170802153107.939-1-avi@scylladb.com>
2017-08-03 10:23:01 +01:00