Compare commits

1314 Commits

Author SHA1 Message Date
Pekka Enberg
c4bd4e89ae release: prepare for 1.3.5 2016-11-29 09:47:38 +02:00
Raphael S. Carvalho
c5f43ecd0e main: fix exception handling when initializing data or commitlog dirs
Exception handling was broken because after io checker, storage_io_error
exception is wrapped around system error exceptions. Also the message
when handling exception wasn't precise enough for all cases. For example,
lack of permission to write to existing data directory.

Fixes #883.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <b2dc75010a06f16ab1b676ce905ae12e930a700a.1478542388.git.raphaelsc@scylladb.com>
(cherry picked from commit 9a9f0d3a0f)
2016-11-16 15:13:44 +02:00
Paweł Dziepak
c923b6e20c row_cache: touch entries read during range queries
Fixes #1847.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1479230809-27547-1-git-send-email-pdziepak@scylladb.com>
(cherry picked from commit 999dafbe57)
2016-11-16 13:08:41 +00:00
Amnon Heiman
e9811897d2 API: cache_capacity should use uint for summing
Using int as the type for the map_reduce sum causes numeric overflow.

Fixes #1801

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1479299425-782-1-git-send-email-amnon@scylladb.com>
(cherry picked from commit a4be7afbb0)
2016-11-16 15:04:24 +02:00
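The overflow fixed by e9811897d2 can be illustrated with a small sketch. This is not the Scylla API code; total_capacity and the per-shard vector are hypothetical stand-ins for the map_reduce over shards. The point is that the accumulator type, not the element type, decides whether the sum overflows:

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper: sums per-shard capacities given in bytes.
// std::accumulate takes its accumulator type from the initial value, so
// passing a plain 0 would sum in int and overflow for large totals.
// Passing uint64_t{0} keeps the whole sum in 64-bit unsigned arithmetic.
inline uint64_t total_capacity(const std::vector<uint64_t>& per_shard) {
    return std::accumulate(per_shard.begin(), per_shard.end(), uint64_t{0});
}
```

With two shards of 3,000,000,000 bytes each, the total no longer fits in a 32-bit int, but it is summed correctly here.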
Paweł Dziepak
54c338d785 partition_version: make sure that snapshot is destroyed under LSA
Snapshot destructor may free some objects managed by the LSA. That's why
partition_snapshot_reader destructor explicitly destroys the snapshot it
uses. However, it was possible for an exception thrown by _read_section
to prevent that from happening, so that the snapshot was destroyed implicitly
without the current allocator being set to LSA.

Refs #1831.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1478778570-2795-1-git-send-email-pdziepak@scylladb.com>
(cherry picked from commit f16d6f9c40)
2016-11-16 12:54:16 +00:00
Glauber Costa
f705be3518 histogram: moving averages: fix inverted parameters
moving_averages constructor is defined like this:

    moving_average(latency_counter::duration interval, latency_counter::duration tick_interval)

But when it is time to initialize them, we do this:

	... {tick_interval(), std::chrono::minutes(1)} ...

As it can be seen, the interval and tick interval are inverted. This
leads to the metrics being assigned bogus values.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <d83f09eed20ea2ea007d120544a003b2e0099732.1478798595.git.glauber@scylladb.com>
(cherry picked from commit d3f11fbabf)
2016-11-11 10:16:14 +02:00
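Swapped arguments like the ones fixed in f705be3518 compile silently because both parameters have the same duration type. One possible guard, sketched below, is to give each parameter its own wrapper type. All names here (window_interval, tick_interval_arg, moving_average_sketch) are hypothetical and are not part of the Scylla code:

```cpp
#include <chrono>

// Hypothetical wrapper types: each duration gets a distinct type, so
// passing them in the wrong order becomes a compile error.
struct window_interval { std::chrono::seconds value; };
struct tick_interval_arg { std::chrono::seconds value; };

class moving_average_sketch {
    std::chrono::seconds _interval;
    std::chrono::seconds _tick;
public:
    moving_average_sketch(window_interval interval, tick_interval_arg tick)
        : _interval(interval.value), _tick(tick.value) {}

    std::chrono::seconds interval() const { return _interval; }
    std::chrono::seconds tick() const { return _tick; }
};
```

A call such as moving_average_sketch{tick_interval_arg{...}, window_interval{...}} would be rejected by the compiler instead of producing bogus metrics.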
Calle Wilund
16a9027552 auth::password_authenticator: Ensure exceptions are processed in continuation
Fixes #1718 (even more)
Message-Id: <1475497389-27016-1-git-send-email-calle@scylladb.com>

(cherry picked from commit 5b815b81b4)
2016-11-07 09:25:29 +02:00
Calle Wilund
23e792c9ea auth::password_authenticator: "authenticate" should not throw undeclared excpt
Fixes #1718

Message-Id: <1475487331-25927-1-git-send-email-calle@scylladb.com>
(cherry picked from commit d24d0f8f90)
2016-11-07 09:25:24 +02:00
Pekka Enberg
5294dd9eb2 Merge seastar submodule
* seastar b62d7a5...5adb964 (2):
  > file: make close() more robust against concurrent calls
  > rpc: Do not close client connection on error response for a timed out request
2016-11-07 09:25:02 +02:00
Raphael S. Carvalho
11323582d6 lcs: fix starvation at higher levels
When max sstable size is increased, higher levels are suffering from
starvation because we decide to compact a given level if the following
calculation results in a number greater than 1.001:
level_size(L) / max_size_for_level_l(L)

Fixes #1720.

For this backport, I needed to add schema as parameter to sstable
functions that return first and last decorated keys.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit a8ab4b8f37)
2016-11-05 11:26:01 +02:00
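The compaction trigger quoted in 11323582d6 can be written out as a small sketch. The function name and the byte-based parameters are assumptions made for illustration; only the ratio and the 1.001 threshold come from the commit message:

```cpp
#include <cstdint>

// Hypothetical helper: a level is a compaction candidate when
// level_size(L) / max_size_for_level_l(L) is greater than 1.001.
// Precondition: max_bytes_for_level > 0.
inline bool level_needs_compaction(uint64_t level_bytes, uint64_t max_bytes_for_level) {
    const double score = static_cast<double>(level_bytes) /
                         static_cast<double>(max_bytes_for_level);
    return score > 1.001;
}
```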
Raphael S. Carvalho
69948778c6 lcs: fix broken token range distribution at higher levels
Uniform token range distribution across sstables in a level > 1 was broken,
because we were only choosing sstable with lowest first key, when compacting
a level > 0. This resulted in performance problem because L1->L2 may have a
huge overlap over time, for example.
Last compacted key will now be stored for each level to ensure sort of
"round robin" selection of sstables for compactions at level >= 1.
That's also done by C*, and they were once affected by it as described in
https://issues.apache.org/jira/browse/CASSANDRA-6284.

Fixes #1719.

For this backport, I added schema parameter to compaction_strategy::
notify_completion() because sstable doesn't store schema here.
Most conflicts were that some interfaces take schema parameter at
this version.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit a3bf7558f2)
2016-11-05 11:26:01 +02:00
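The "round robin" selection described in 69948778c6 can be sketched as follows. This is a simplified model, not the Scylla implementation: sstables of one level are represented only by their first key as an integer, sorted in ascending order, and pick_candidate is a hypothetical name:

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper: returns the index of the next sstable to compact.
// first_keys must be non-empty and sorted ascending. The candidate is the
// first sstable whose first key is greater than the last compacted key;
// if there is none (or nothing was compacted yet), wrap to the start.
inline std::size_t pick_candidate(const std::vector<int64_t>& first_keys,
                                  std::optional<int64_t> last_compacted) {
    if (last_compacted) {
        for (std::size_t i = 0; i < first_keys.size(); ++i) {
            if (first_keys[i] > *last_compacted) {
                return i;
            }
        }
    }
    return 0; // wrap around to the sstable with the lowest first key
}
```

Always returning index 0, which is what choosing the lowest first key amounts to, would compact the same part of the token range over and over.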
Pekka Enberg
ef4332ab09 release: prepare for 1.3.4 2016-11-03 12:09:13 +02:00
Pekka Enberg
18a99c00b0 cql3: Fix selecting same column multiple times
Under the hood, the selectable::add_and_get_index() function
deliberately filters out duplicate columns. This causes
simple_selector::get_output_row() to return a row with all duplicate
columns filtered out, which triggers an assertion because of row
mismatch with metadata (which contains the duplicate columns).

The fix is rather simple: just make selection::from_selectors() use
selection_with_processing if the number of selectors and column
definitions doesn't match -- like Apache Cassandra does.

Fixes #1367
Message-Id: <1477989740-6485-1-git-send-email-penberg@scylladb.com>

(cherry picked from commit e1e8ca2788)
2016-11-01 09:34:49 +00:00
Pekka Enberg
cc3a4173f6 release: prepare for 1.3.3 2016-10-28 09:54:41 +03:00
Pekka Enberg
4dc196164d auth: Fix resource level handling
We use `data_resource` class in the CQL parser, which lets users refer
to a table resource without specifying a keyspace. This asserts out in
get_level() for no good reason as we already know the intended level
based on the constructor. Therefore, change `data_resource` to track the
level like upstream Cassandra does and use that.

Fixes #1790

Message-Id: <1477599169-2945-1-git-send-email-penberg@scylladb.com>
(cherry picked from commit b54870764f)
2016-10-27 23:38:01 +03:00
Glauber Costa
31ba6325ef auth: always convert string to upper case before comparing
We store all auth perm strings in upper case, but the user might very
well pass this in lower case.

We could use a standard key comparator / hash here, but since the
strings tend to be small, the new sstring will likely be allocated in
the stack here and this approach yields significantly less code.

Fixes #1791.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <51df92451e6e0a6325a005c19c95eaa55270da61.1477594199.git.glauber@scylladb.com>
(cherry picked from commit ef3c7ab38e)
2016-10-27 22:10:04 +03:00
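The approach taken in 31ba6325ef, converting the input to upper case before the lookup rather than installing a custom comparator or hash, can be sketched like this. The helper names and the permission set are illustrative assumptions, not the Scylla auth code:

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <unordered_set>

// Hypothetical helper: returns an upper-cased copy of the input.
inline std::string to_upper_copy(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return s;
}

// Hypothetical lookup: stored names are upper case, so the user-supplied
// name is normalized once before comparing.
inline bool is_known_permission(const std::string& name) {
    static const std::unordered_set<std::string> known = {"SELECT", "MODIFY", "ALTER"};
    return known.count(to_upper_copy(name)) > 0;
}
```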
Tomasz Grabiec
dcd8b87eb8 Merge seastar upstream
* seastar b62d7a5...0fd8792 (1):
  > rpc: Do not close client connection on error response for a timed out request

Refs #1778
2016-10-25 13:56:36 +02:00
Tomasz Grabiec
6b53aad9fb partition_version: Fix corruption of partition_version list
The move constructor of partition_version was not invoking move
constructor of anchorless_list_base_hook. As a result, when
partition_version objects were moved, e.g. during LSA compaction, they
were unlinked from their lists.

This can make readers return invalid data, because not all versions
will be reachable.

It also causes leaks of the versions which are not directly attached
to memtable entry. This will trigger assertion failure in LSA region
destructor. This assertion triggers with row cache disabled. With cache
enabled (default) all segments are merged into the cache region, which
currently is not destroyed on shutdown, so this problem would go
unnoticed. With cache disabled, memtable region is destroyed after
memtable is flushed and after all readers stop using that memtable.

Fixes #1753.
Message-Id: <1476778472-5711-1-git-send-email-tgrabiec@scylladb.com>

(cherry picked from commit fe387f8ba0)
2016-10-18 11:00:19 +02:00
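The class of bug fixed in 6b53aad9fb can be reproduced with a much simplified intrusive hook. The types below are stand-ins written for illustration and do not mirror anchorless_list_base_hook or partition_version. A user-written move constructor that does not name the base class default-constructs the base, so the links are lost:

```cpp
#include <utility>

// Simplified intrusive list hook: moving a hook relinks its neighbours
// so that they point at the new object.
struct hook {
    hook* prev = nullptr;
    hook* next = nullptr;
    hook() = default;
    hook(hook&& o) noexcept : prev(o.prev), next(o.next) {
        if (prev) prev->next = this;
        if (next) next->prev = this;
        o.prev = o.next = nullptr;
    }
};

struct broken_version : hook {
    int payload = 0;
    broken_version() = default;
    // Bug: the base subobject is default-constructed here, so the moved-to
    // object is not linked and the neighbours still point at the old one.
    broken_version(broken_version&& o) noexcept : payload(o.payload) {}
};

struct fixed_version : hook {
    int payload = 0;
    fixed_version() = default;
    // Fix: explicitly invoke the base move constructor.
    fixed_version(fixed_version&& o) noexcept
        : hook(std::move(o)), payload(o.payload) {}
};
```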
Pekka Enberg
762b156809 database: Fix io_priority_class related compilation error
Commit e6ef49e ("db: Do not timeout streaming readers") breaks compilation of database.cc:

  database.cc: In lambda function:
  database.cc:282:62: error: ‘const class io_priority_class’ has no member named ‘id’
               if (service::get_local_streaming_read_priority().id() == pc.id()) {
                                                              ^~
  database.cc:282:73: error: ‘const class io_priority_class’ has no member named ‘id’
               if (service::get_local_streaming_read_priority().id() == pc.id()) {

...because we don't have Seastar commit 823a404 ("io_priority_class:
remove non-explicit operator unsigned") backported.

Fix the issue by using the non-explicit operator instead of explicit id().

Acked-by: Tomasz Grabiec <tgrabiec@scylladb.com>
Message-Id: <1476425276-17171-1-git-send-email-penberg@scylladb.com>
2016-10-14 13:28:32 +03:00
Pekka Enberg
f3ed1b4763 release: prepare for 1.3.2 2016-10-13 20:34:46 +03:00
Paweł Dziepak
6618236fff query_pagers: fix clustering key range calculation
Paging code assumes that clustering row range [a, a] contains only one
row which may not be true. Another problem is that it tries to use
range<> interface for dealing with clustering key ranges which doesn't
work because of the lack of correct comparator.

Refs #1446.
Fixes #1684.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1475236805-16223-1-git-send-email-pdziepak@scylladb.com>
(cherry picked from commit eb1fcf3ecc)
2016-10-10 16:08:09 +03:00
Asias He
ecb0e44480 gossip: Switch to use system_clock
The expire time which is used to decide when to remove a node from
gossip membership is gossiped around the cluster. We switched to steady
clock in the past. In order to have a consistent time_point in all the
nodes in the cluster, we have to use wall clock. Switch to use
system_clock for gossip.

Fixes #1704

(cherry picked from commit f0d3084c8b)
2016-10-09 18:12:46 +03:00
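Why a wall clock is needed for a gossiped expire time (ecb0e44480) can be sketched as follows: a system_clock time point has a well-defined epoch and can be sent as a plain integer, whereas a steady_clock time point is only meaningful on the machine that produced it. The helper names and the millisecond wire format are assumptions for illustration, not Scylla's gossip encoding:

```cpp
#include <chrono>
#include <cstdint>

using wall_clock = std::chrono::system_clock;

// Hypothetical encoding: milliseconds since the system_clock epoch.
inline int64_t to_wire(wall_clock::time_point tp) {
    return std::chrono::duration_cast<std::chrono::milliseconds>(
        tp.time_since_epoch()).count();
}

// Hypothetical decoding on the receiving node.
inline wall_clock::time_point from_wire(int64_t ms) {
    return wall_clock::time_point(std::chrono::milliseconds(ms));
}
```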
Tomasz Grabiec
e6ef49e366 db: Do not timeout streaming readers
There is a limit to concurrency of sstable readers on each shard. When
this limit is exhausted (currently 100 readers) readers queue. There
is a timeout after which queued readers are failed, equal to
read_request_timeout_in_ms (5s by default). The reason we have the
timeout here is primarily because the readers created for the purpose
of serving a CQL request no longer need to execute after waiting
longer than read_request_timeout_in_ms. The coordinator no longer
waits for the result so there is no point in proceeding with the read.

This timeout should not apply for readers created for streaming. The
streaming client currently times out after 10 minutes, so we could
wait at least that long. Timing out sooner makes streaming unreliable,
which under high load may prevent streaming from completing.

The change sets no timeout for streaming readers at replica level,
similarly to what we do for system tables readers.

Fixes #1741.

Message-Id: <1475840678-25606-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 2a5a90f391)
2016-10-09 10:33:39 +03:00
Tomasz Grabiec
7d24a3ed56 transport: Extend request memory footprint accounting to also cover execution
CQL server is supposed to throttle requests so that they don't
overflow memory. The problem is that it currently accounts for
request's memory only around reading of its frame from the connection
and not actual request execution. As a result too many requests may be
allowed to execute and we may run out of memory.

Fixes #1708.
Message-Id: <1475149302-11517-1-git-send-email-tgrabiec@scylladb.com>

(cherry picked from commit 7e25b958ac)
2016-10-03 14:15:21 +03:00
Avi Kivity
ae2a1158d7 Update seastar submodule
* seastar 9b541ef...b62d7a5 (1):
  > semaphore: Introduce get_units()
2016-10-03 14:11:26 +03:00
Asias He
f110c2456a gossip: Do not remove failure_detector history on remove_endpoint
Otherwise a node could wrongly think the decommissioned node is still
alive and not evict it from the gossip membership.

Backport: CASSANDRA-10371

7877d6f Don't remove FailureDetector history on removeEndpoint

Fixes #1714
Message-Id: <f7f6f1eec2aab1b97a2e568acfd756cca7fc463a.1475112303.git.asias@scylladb.com>

(cherry picked from commit 511f8aeb91)
2016-09-29 13:27:34 +03:00
Raphael S. Carvalho
37d27b1144 api: implement api to return sstable count per level
'nodetool cfstats' wasn't showing per-level sstable count because
the API wasn't implemented.

Fixes #1119.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <0dcdf9196eaec1692003fcc8ef18c77d0834b2c6.1474410770.git.raphaelsc@scylladb.com>
(cherry picked from commit 67343798cf)
2016-09-26 09:51:33 +03:00
Tomasz Grabiec
ff674391c2 Merge seastar upstream
Refs #1622
Refs #1690

* seastar b58a287...9b541ef (2):
  > input_stream: Fix possible infinite recursion in consume()
  > iostream: Fix stack overflow in output_stream::split_and_put()
2016-09-22 14:58:44 +02:00
Asias He
55c9279354 gossip: Fix std::out_of_range in setup_collectd
It is possible that endpoint_state_map does not contain the entry for
the node itself when collectd accesses it.

Fixes the issue:

Sep 18 11:33:16 XXX scylla[19483]: [shard 0] seastar - Exceptional
future ignored: std::out_of_range (_Map_base::at)

Fixes #1656

Message-Id: <8ffe22a542ff71e8c121b06ad62f94db54cc388f.1474377722.git.asias@scylladb.com>
(cherry picked from commit aa47265381)
2016-09-20 21:10:55 +03:00
Tomasz Grabiec
ef7b4c61ff tests: Add test for UUID type ordering
Message-Id: <1473956716-5209-2-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 2282599394)
2016-09-20 12:22:03 +02:00
Tomasz Grabiec
b9e169ead9 types: fix uuid_type_impl::less
timeuuid_type_impl::compare_bytes is a "trichotomic" comparator (-1,
0, 1) while less() is a "less" comparator (false, true). The code
incorrectly returns c1 instead of c1 < 0 which breaks the ordering.

Fixes #1196.
Message-Id: <1473956716-5209-1-git-send-email-tgrabiec@scylladb.com>

(cherry picked from commit 804fe50b7f)
2016-09-20 12:22:00 +02:00
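The comparator mix-up fixed in b9e169ead9 is easy to reproduce in isolation. In the sketch below plain integers stand in for UUID bytes and all function names are hypothetical. Returning the three-way result directly converts any non-zero value to true, so "greater" is also reported as "less":

```cpp
#include <algorithm>
#include <vector>

// Three-way ("trichotomic") comparator: returns -1, 0 or 1.
inline int compare3(int a, int b) { return (a > b) - (a < b); }

// Buggy "less": both -1 and 1 convert to true.
inline bool less_buggy(int a, int b) { return compare3(a, b); }

// Correct "less": true only when the three-way result is negative.
inline bool less_fixed(int a, int b) { return compare3(a, b) < 0; }

// Sorting with the correct comparator gives ascending order.
inline std::vector<int> sorted_copy(std::vector<int> v) {
    std::sort(v.begin(), v.end(), less_fixed);
    return v;
}
```

The buggy version is not a strict weak ordering, so passing it to std::sort or using it as a container comparator has undefined behaviour.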
Shlomi Livne
dabbadcf39 release: prepare for 1.3.1
Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
2016-09-18 13:18:20 +03:00
Duarte Nunes
c9cb14e160 thrift: Correctly detect clustering range wrap around
This patch uses the clustering bounds comparator to correctly detect
wrap around of a clustering range in the thrift handler.

Refs #1446

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1473952155-14886-1-git-send-email-duarte@scylladb.com>
(cherry picked from commit bc3cbb7009)
2016-09-15 16:52:43 +01:00
Shlomi Livne
985352298d ami: Fix instructions how to run scylla_io_setup on non ephemeral instances
On instances different than i2/m3/c3 we provide instructions to run
scylla_io_setup. Running scylla_io_setup requires access to
/var/lib/scylla to create a temporary file. To gain access to that
directory the user should run 'sudo scylla_io_setup'.

refs: #1645

Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
Message-Id: <4ce90ca1ba4da8f07cf8aa15e755675463a22933.1473935778.git.shlomi@scylladb.com>
(cherry picked from commit acb83073e2)
2016-09-15 15:27:02 +01:00
Paweł Dziepak
c2d347efc7 Merge "Fix abort when querying with contradicting clustering restrictions" from Tomek
"This series backports fixes for #1670 on top of 1.3 branch.

Fixes abort when querying with contradicting clustering column
restrictions, for example:

   SELECT * FROM test WHERE k = 0 AND ck < 1 and ck > 2"
2016-09-15 14:55:28 +01:00
Tomasz Grabiec
78c7408927 Fix abort when querying with contradicting clustering restrictions
Example of affected query:

  SELECT * FROM test WHERE k = 0 AND ck < 1 and ck > 2

Refs #1670.

This commit brings back the backport of "Don't allow CK wrapping
ranges" by Duarte by reverting commit 11d7f83d52.

It also has the following fix, which is introduced by the
aforementioned commit, squashed to improve bisectability:

"cql3: Consider bound type when detecting wrap around

 This patch uses the clustering bounds comparator to correctly detect
 wrap around of a clustering range. This fixes a manifestation of #1446,
 introduced by b1f9688432, where a query
 such as select * from cf where k = 0x00 and c0 = 0x02 and c1 > 0x02
 would result in a range containing a clustering key and a prefix,
 incorrectly ordered by the prefix equality or lexicographical
 comparators.

 Refs #1446

 Signed-off-by: Duarte Nunes <duarte@scylladb.com>
 (cherry picked from commit ee2694e27d)"
2016-09-14 19:50:49 +02:00
Duarte Nunes
5716decf60 bounds_view: Create from nonwrapping_range
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
(cherry picked from commit 084b931457)
2016-09-14 18:28:55 +02:00
Duarte Nunes
f60eb3958a range_tombstone: Extract out bounds_view
This patch extracts bounds_view from range_tombstone so its comparator
can be reused elsewhere.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
(cherry picked from commit 878927d9d2)
2016-09-14 18:28:55 +02:00
Tomasz Grabiec
7848781e5f database: Ignore spaces in initial_token list
Currently we get a boost::lexical_cast error on startup if initial_token has a
list which contains spaces after commas, e.g.:

  initial_token: -1100081313741479381, -1104041856484663086, ...

Fixes #1664.
Message-Id: <1473840915-5682-1-git-send-email-tgrabiec@scylladb.com>

(cherry picked from commit a498da1987)
2016-09-14 12:03:06 +03:00
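Tolerating spaces in a comma-separated token list, as 7848781e5f does for initial_token, amounts to trimming each item before converting it. The following is a standalone sketch using only the standard library; parse_tokens is a hypothetical helper and not the code in database.cc:

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits on commas, trims spaces and tabs around
// each item, and parses the remainder as a signed 64-bit token.
inline std::vector<int64_t> parse_tokens(const std::string& list) {
    std::vector<int64_t> tokens;
    std::stringstream ss(list);
    std::string item;
    while (std::getline(ss, item, ',')) {
        const auto first = item.find_first_not_of(" \t");
        if (first == std::string::npos) {
            continue; // skip items that are empty or whitespace only
        }
        const auto last = item.find_last_not_of(" \t");
        tokens.push_back(std::stoll(item.substr(first, last - first + 1)));
    }
    return tokens;
}
```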
Pekka Enberg
195994ea4b Update scylla-ami submodule
* dist/ami/files/scylla-ami 14c1666...e1e3919 (1):
  > scylla_ami_setup: remove scylla_cpuset_setup
2016-09-07 21:05:33 +03:00
Takuya ASADA
183910b8b4 dist/common/scripts/scylla_sysconfig_setup: sync cpuset parameters with rps_cpus settings when posix_net_conf.sh is enabled and NIC is single queue
In posix_net_conf.sh's single queue NIC mode (which means RPS enabled mode), cpu0 and its sibling are excluded from the network stack processing CPUs, and the NIC IRQ is assigned to cpu0.
Since the network stack never runs on cpu0 and its sibling, we need to exclude these CPUs from Scylla too to get better performance.
To do this, we get the RPS CPU mask from posix_net_conf.sh and pass it to scylla_cpuset_setup to construct /etc/scylla.d/cpuset.conf when scylla_setup is executed.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1472544875-2033-2-git-send-email-syuu@scylladb.com>
(cherry picked from commit 533dc0485d)
2016-09-07 21:00:30 +03:00
Takuya ASADA
f4746d2a46 dist/common/scripts/scylla_prepare: drop unnecessary multiqueue NIC detection code on scylla_prepare
Right now scylla_prepare passes the -mq option to posix_net_conf.sh when the number of RX queues > 1, but posix_net_conf.sh sets the NIC mode to sq when queues < ncpus / 2.
So the logic differs, and posix_net_conf.sh no longer needs -sq/-mq to be specified, since it autodetects the queue mode.
So we drop the detection logic from scylla_prepare and let posix_net_conf.sh detect it.

Fixes #1406

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1472544875-2033-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit 0c3bb2ee63)
2016-09-07 20:59:10 +03:00
Pekka Enberg
fa7f990407 Update seastar submodule
* seastar e6571c4...b58a287 (3):
  > scripts/posix_net_conf.sh: suppress 'ls: cannot access /sys/class/net/<NIC>/device/msi_irqs/' error message
  > scripts/posix_net_conf.sh: fix 'command not found' error when specifies --cpu-mask
  > scripts/posix_net_conf.sh: add support --cpu-mask mode
2016-09-07 20:58:09 +03:00
Pekka Enberg
86dbcf093b systemd: Don't start Scylla service until network is up
Alexandr Porunov reports that Scylla fails to start up after reboot as follows:

  Aug 25 19:44:51 scylla1 scylla[637]: Exiting on unhandled exception of type 'std::system_error': Error system:99 (Cannot assign requested address)

The problem is that because there's no dependency to network service,
Scylla simply attempts to start up too soon in the boot sequence and
fails.

Fixes #1618.

Message-Id: <1472212447-21445-1-git-send-email-penberg@scylladb.com>
(cherry picked from commit 2d3aee73a6)
2016-08-29 13:26:11 +03:00
Takuya ASADA
db89811fcc dist/common/scripts/scylla_setup: support enabling services on Ubuntu 15.10/16.04
Right now it ignores Ubuntu, but we share the .service files between Fedora/CentOS and Ubuntu >= 15.10, so support it.

Fixes #1556.

Message-Id: <1471932814-17347-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit 74d994f6a1)
2016-08-29 13:26:08 +03:00
Duarte Nunes
9ec939f6a3 thrift: Avoid always recording size estimates
Size estimates for a particular column family are recorded every 5
minutes. However, when a user calls the describe_splits(_ex) verbs,
they may want to see estimates for a recently created and updated
column family; this is legitimate and common in testing. However, a
client may also call describe_splits(_ex) very frequently and
recording the estimates on every call is wasteful and, worse, can
cause clients to give up. This patch fixes this by only recording
estimates if the first attempt to query them produces no results.

Refs #1139

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1471900595-4715-1-git-send-email-duarte@scylladb.com>
(cherry picked from commit 440c1b2189)
2016-08-29 12:08:53 +03:00
Pekka Enberg
d40f586839 dist/docker: Clean up Scylla description for Docker image
Message-Id: <1472145307-3399-1-git-send-email-penberg@scylladb.com>
(cherry picked from commit c5e5e7bb40)
2016-08-29 10:49:09 +03:00
Raphael S. Carvalho
d55c55efec api: use estimation of pending tasks in compaction manager too
We have API for getting pending compaction tasks both in column
family and compaction manager. Column family is already returning
pending tasks properly.
Compaction manager's one is used by 'nodetool compactionstats', and
was returning a value which doesn't reflect pending compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <a20b88938ad39e95f98bfd7f93e4d1666d1c6f95.1471641211.git.raphaelsc@scylladb.com>
(cherry picked from commit d8be32d93a)
2016-08-29 10:20:20 +03:00
Raphael S. Carvalho
dc79761c17 sstables: Fix estimation of pending tasks for leveled strategy
There were two underflow bugs.

1) in variable i, causing get_level() to see an invalid level and
throw an exception as a result.
2) when estimating number of pending tasks for a level.

Fixes #1603.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <cce993863d9de4d1f49b3aabe981c475700595fc.1471636164.git.raphaelsc@scylladb.com>
(cherry picked from commit 77d4cd21d7)
2016-08-29 10:19:30 +03:00
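The underflows fixed in dc79761c17 are of the usual unsigned-subtraction kind: when the right-hand side is larger, the result wraps around to a huge value instead of going negative. The sketch below shows one way to guard such an estimate. The function names and the exact formula are assumptions made for illustration, not the leveled strategy's actual code:

```cpp
#include <cstdint>

// Hypothetical helper: subtraction that clamps at zero instead of wrapping.
inline uint64_t saturating_sub(uint64_t a, uint64_t b) {
    return a > b ? a - b : 0;
}

// Hypothetical estimate: bytes above the level's target size, divided by
// the maximum sstable size and rounded up.
// Precondition: max_sstable_bytes > 0.
inline uint64_t estimated_tasks(uint64_t level_bytes, uint64_t max_level_bytes,
                                uint64_t max_sstable_bytes) {
    const uint64_t excess = saturating_sub(level_bytes, max_level_bytes);
    return (excess + max_sstable_bytes - 1) / max_sstable_bytes;
}
```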
Paweł Dziepak
bbffb51811 mutation_partition: fix iterator invalidation in trim_rows
Reversed iterators are adaptors for 'normal' iterators. These underlying
iterators point to different objects than the reversed iterators
themselves.

The consequence of this is that removing an element pointed to by a
reversed iterator may invalidate a reversed iterator which points to a
completely different object.

This is what happens in trim_rows for reversed queries. Erasing a row
can invalidate end iterator and the loop would fail to stop.

The solution is to introduce
reversal_traits::erase_dispose_and_update_end() function which erases and
disposes the object pointed to by a given iterator but also takes a
reference to an end iterator and updates it if necessary to make sure
that it stays valid.

Fixes #1609.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1472080609-11642-1-git-send-email-pdziepak@scylladb.com>
(cherry picked from commit 6012a7e733)
2016-08-25 17:40:37 +03:00
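The iterator relationship behind bbffb51811 can be shown with std::list. A reverse iterator rit stores a forward iterator rit.base() that points one element past *rit, and rend() stores begin(). Erasing the first element therefore invalidates a cached rend(). The sketch below is a generic illustration with hypothetical helper names, not reversal_traits::erase_dispose_and_update_end(). It erases through a reverse iterator and re-reads rend() on every iteration:

```cpp
#include <iterator>
#include <list>

// Erases the element that rit refers to and returns a valid reverse
// iterator to the next element in reverse order.
template <typename List>
typename List::reverse_iterator erase_reverse(List& l, typename List::reverse_iterator rit) {
    // rit.base() is one past *rit, so the element itself is std::prev(rit.base()).
    auto next_fwd = l.erase(std::prev(rit.base()));
    return typename List::reverse_iterator(next_fwd);
}

// Walks the list in reverse and drops odd values. rend() is evaluated on
// each iteration and never cached, so it stays valid after erasures.
inline std::list<int> drop_odd_reversed(std::list<int> l) {
    auto it = l.rbegin();
    while (it != l.rend()) {
        if (*it % 2 != 0) {
            it = erase_reverse(l, it);
        } else {
            ++it;
        }
    }
    return l;
}
```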
Amnon Heiman
ec3ace5aa3 housekeeping: Silently ignore check version if Scylla is not available
Normally, the check version should start and stop with the scylla-server
service.

If it fails to find scylla server, there is no need to check the
version, nor to report it, so it can stop silently.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
(cherry picked from commit 2b98335da4)
2016-08-23 18:11:44 +03:00
Amnon Heiman
1f2d1012be housekeeping: Use curl instead of Python's libraries
There is a problem with Python's SSL support in Ubuntu 14.04:

  ubuntu@ip-10-81-165-156:~$ /usr/lib/scylla/scylla-housekeeping -q version
  Traceback (most recent call last):
    File "/usr/lib/scylla/scylla-housekeeping", line 94, in <module>
      args.func(args)
    File "/usr/lib/scylla/scylla-housekeeping", line 71, in check_version
      latest_version = get_json_from_url(version_url + "?version=" + current_version)["version"]
    File "/usr/lib/scylla/scylla-housekeeping", line 50, in get_json_from_url
      response = urllib2.urlopen(req)
    File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
      return _opener.open(url, data, timeout)
    File "/usr/lib/python2.7/urllib2.py", line 404, in open
      response = self._open(req, data)
    File "/usr/lib/python2.7/urllib2.py", line 422, in _open
      '_open', req)
    File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
      result = func(*args)
    File "/usr/lib/python2.7/urllib2.py", line 1222, in https_open
      return self.do_open(httplib.HTTPSConnection, req)
    File "/usr/lib/python2.7/urllib2.py", line 1184, in do_open
      raise URLError(err)
  urllib2.URLError: <urlopen error [Errno 1] _ssl.c:510: error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure>

Instead of using Python libraries to connect to the check version
server, we will use curl for that.

Fixes #1600

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
(cherry picked from commit 4598674673)
2016-08-23 18:11:38 +03:00
Amnon Heiman
9c8cfd3c0e housekeeping: Add curl as a dependency
To work around an SSL problem with Python on Ubuntu 14.04, we need to
use curl. Add it as a dependency so that it's available on the host.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
(cherry picked from commit 91944b736e)
2016-08-23 18:11:13 +03:00
Takuya ASADA
d9e4ab38f6 dist/ubuntu: support scylla-housekeeping service on all Ubuntu versions
The current scylla-housekeeping support on Ubuntu has a bug: it does not install the .service/.timer files for Ubuntu 16.04.
So fix it to make it work.

Fixes #1502

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Tested-by: Amos Kong <amos@scylladb.com>
Message-Id: <1471607903-14889-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit 80f7449095)
2016-08-23 13:50:43 +03:00
Takuya ASADA
5d72e96ccc dist/common/systemd: don't use .in for scylla-housekeeping.*, since these are not template file
.in is the suffix for template files which need to be rewritten at build time, but these systemd unit files do not require rewriting, so don't name them .in and reference them directly from .spec.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1471607533-3821-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit aac60082ae)
2016-08-23 13:50:36 +03:00
Paweł Dziepak
a072df0e09 sstables: do not call consume_end_partition() after proceed::no
After state_processor().process_state() returns proceed::no the upper
layer should have a chance to act before more data is pushed to the
consumer. This means that in case of proceed::no verify_end_state()
should not be called immediately since it may invoke
consume_end_partition().

Fixes #1605.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1471943032-7290-1-git-send-email-pdziepak@scylladb.com>
(cherry picked from commit 5feed84e32)
2016-08-23 12:24:59 +03:00
Pekka Enberg
8291ec13fa dist/docker: Separate supervisord config files
Move scylla-server and scylla-jmx supervisord config files to separate
files and make the main supervisord.conf scan /etc/supervisord.conf.d/
directory. This makes it easier for people to extend the Docker image
and add their own services.

Message-Id: <1471588406-25444-1-git-send-email-penberg@scylladb.com>
(cherry picked from commit 9d1d8baf37)
2016-08-23 11:56:24 +03:00
Vlad Zolotarov
c485551488 tracing::trace_state: fix a compilation error with gcc 4.9
See #1602.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1471774784-26266-1-git-send-email-vladz@cloudius-systems.com>
2016-08-21 16:39:50 +03:00
Paweł Dziepak
b85164bc1d sstables: optimise clustering rows filtering
Clustering rows in the sstables are sorted in the ascending order so we
can use that to minimise number of comparisons when checking if a row is
in the requested range.

Refs #1544.

Paweł further explains the backport rationale for 1.3:

"Apart from making sense on its own, this patch has a very curious
property of working around #1544 in a way that doesn't make #1446 hit
us harder than usual.
So, in the branch-1.3 we can:
 - revert 85376ce555
   'Merge "Don't allow CK wrapping ranges" from Duarte' -- previous,
   insufficient workaround for #1544
 - apply this patch
 - rejoice as cql_query_test passes and #1544 is no longer a problem

The scenario above assumes that this patch doesn't introduce any
regressions."

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Reviewed-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <1471608921-30818-1-git-send-email-pdziepak@scylladb.com>
(cherry picked from commit e60bb83688)
2016-08-19 18:11:49 +03:00
Pekka Enberg
11d7f83d52 Revert "Merge "Don't allow CK wrapping ranges" from Duarte"
This reverts commit 85376ce555, reversing
changes made to 3f54e0c28e.

The change breaks CQL range queries.
2016-08-19 18:11:31 +03:00
Pekka Enberg
5c4a24c1c0 dist/docker: Use Scylla mascot as the logo
Glauber "eagle eyes" Costa pointed out that the Scylla logo used in our
Docker image documentation looks broken because it's missing the Scylla
text.

Fix the problem by using the Scylla mascot instead.
Message-Id: <1471525154-2800-1-git-send-email-penberg@scylladb.com>

(cherry picked from commit 2bf5e8de6e)
2016-08-19 12:50:46 +03:00
Pekka Enberg
306eeedf3e dist/docker: Fix bug tracker URL in the documentation
The bug tracker URL in our Docker image documentation is not clickable
because the URL Markdown extracts automatically is broken.

Fix that and add some more links on how to get help and report issues.
Message-Id: <1471524880-2501-1-git-send-email-penberg@scylladb.com>

(cherry picked from commit 4d90e1b4d4)
2016-08-19 12:50:42 +03:00
Pekka Enberg
9eada540d9 release: prepare for 1.3.0 2016-08-18 16:20:21 +03:00
Yoav Kleinberger
a662765087 docker: extend supervisor capabilities
Allow the user to use the `supervisorctl` program to start and stop
services. `exec` needed to be added to the scylla and scylla-jmx starter
scripts - otherwise supervisord loses track of the actual process we
want to manage.

Signed-off-by: Yoav Kleinberger <yoav@scylladb.com>
Message-Id: <1471442960-110914-1-git-send-email-yoav@scylladb.com>
(cherry picked from commit 25fb5e831e)
2016-08-18 15:41:08 +03:00
Pekka Enberg
192f89bc6f dist/docker: Documentation cleanups
- Fix invisible characters to be space so that Markdown to PDF
  conversion works.

- Fix formatting of examples to be consistent.

- Spellcheck.

Message-Id: <1471514924-29361-1-git-send-email-penberg@scylladb.com>
(cherry picked from commit 1553bec57a)
2016-08-18 13:14:21 +03:00
Pekka Enberg
b16bb0c299 dist/docker: Document image command line options
This patch documents all the command line options Scylla's Docker image supports.

Message-Id: <1471513755-27518-1-git-send-email-penberg@scylladb.com>
(cherry picked from commit 4ca260a526)
2016-08-18 13:14:16 +03:00
Amos Kong
cd9d967c44 systemd: have the first housekeeping check right after start
Issue: https://github.com/scylladb/scylla/issues/1594

Currently systemd runs the first housekeeping check at the end of the
first timer period. We expected it to be run right after start.

This patch makes systemd consistent with upstart.

Signed-off-by: Amos Kong <amos@scylladb.com>
Message-Id: <4cc880d509b0a7b283278122a70856e21e5f1649.1471433388.git.amos@scylladb.com>
(cherry picked from commit 9d53305475)
2016-08-17 16:02:25 +03:00
Avi Kivity
236b089b03 Merge "Fixes for streamed_mutation_from_mutation" from Paweł
"This series contains fixes for two memory leaks in
streamed_mutation_from_mutation.

Fixes #1557."

(cherry picked from commit 4871b19337)
2016-08-17 13:26:25 +03:00
Avi Kivity
9d54b33644 Merge 2016-08-17 13:25:49 +03:00
Benoit Canet
4ef6c3155e systemd: Remove WorkingDirectory directive
The WorkingDirectory directive does not support environment variables in
the systemd version shipped with Ubuntu 16.04. Fortunately, not
setting WorkingDirectory implicitly sets it to the user's home directory,
which is the same thing (i.e. /var/lib/scylla).

Fixes #1319

Signed-off-by: Benoit Canet <benoit@scylladb.com>
Message-Id: <1470053876-1019-1-git-send-email-benoit@scylladb.com>
(cherry picked from commit 90ef150ee9)
2016-08-17 12:34:44 +03:00
Takuya ASADA
fe529606ae dist/common/scripts: mkdir -p /var/lib/scylla/coredump before symlinking
We create this directory in scylla_raid_setup, but a user may create the XFS volume without using that command; scylla_coredump_setup should work in that case too.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1470638615-17262-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit 60ce16cd54)
2016-08-16 12:35:15 +03:00
Avi Kivity
85376ce555 Merge "Don't allow CK wrapping ranges" from Duarte
"This patchset ensures user-specified clustering key ranges are never
wrapping, as those types of ranges are not defined for CQL3.

Fixes #1544"
2016-08-16 10:09:31 +03:00
Duarte Nunes
5e8ac82614 cql3: Discard wrap around ranges.
Wrapping ranges are not supported in CQL3. If one is specified,
this patch converts it to an empty range.

Fixes #1544

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-08-15 15:22:44 +00:00
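The conversion described in commit 5e8ac82614 above can be illustrated with a minimal Python sketch. It is not the Scylla implementation (which is C++): `Range`, `is_wrap_around`, and `discard_wrap_around` are hypothetical names, bounds are modeled as plain integers with inclusive ends, and `None` stands in for an empty range.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Range:
    """Hypothetical clustering-key range with inclusive integer bounds."""
    start: int
    end: int

    def is_wrap_around(self) -> bool:
        # A range wraps around when its start sorts after its end.
        return self.start > self.end


def discard_wrap_around(r: Range) -> Optional[Range]:
    """Return the range unchanged, or None (an empty range) if it wraps."""
    return None if r.is_wrap_around() else r


print(discard_wrap_around(Range(1, 5)))  # kept as-is
print(discard_wrap_around(Range(5, 1)))  # converted to an empty range
```

Returning an empty range rather than raising an error means a query with a wrapping range simply yields no rows, which matches the behavior the commit message describes.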
Duarte Nunes
22c8520d61 storage_proxy: Short circuit query without clustering ranges
This patch makes the storage_proxy return an empty result when the
query doesn't define any clustering ranges (default or specific).

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-08-15 15:05:23 +00:00
Duarte Nunes
e7355c9b60 thrift: Don't always validate clustering range
This patch makes make_clustering_range not enforce that the range be
non-wrapping, so that it can be validated differently if needed. A
make_clustering_range_and_validate function is introduced that keeps
the old behavior.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-08-15 15:05:18 +00:00
Paweł Dziepak
3f54e0c28e partition_version: handle errors during version merge
Currently, partition snapshot destructor can throw which is a big no-no.
The solution is to ignore the exception and leave versions unmerged and
hope that subsequent reads will succeed at merging.

However, another problem is that the merge doesn't use allocating
sections which means that memory won't be reclaimed to satisfy its
needs. If the cache is full this may result in partition versions not
being merged for a very long time.

This patch introduces partition_snapshot::merge_partition_versions()
which contains all the version merging logic that was previously present
in the snapshot destructor. This function may throw so that it can be
used with allocating sections.

The actual merging and handling of potential errors is done from
partition_snapshot_reader destructor. It tries to merge versions under
the allocating section. Only if that fails it gives up and leaves them
unmerged.

Fixes #1578
Fixes #1579.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1471265544-23579-1-git-send-email-pdziepak@scylladb.com>
(cherry picked from commit 5cae44114f)
2016-08-15 15:57:10 +03:00
Asias He
4c6f8f9d85 gossip: Add heart_beat_version to collectd
$ tools/scyllatop/scyllatop.py '*gossip*'

node-1/gossip-0/gauge-heart_beat_version 1.0
node-2/gossip-0/gauge-heart_beat_version 1.0
node-3/gossip-0/gauge-heart_beat_version 1.0

Gossip heart beat version changes every second. If everything is working
correctly, the gauge-heart_beat_version output should be 1.0. If not,
the gauge-heart_beat_version output should be less than 1.0.

Message-Id: <cbdaa1397cdbcd0dc6a67987f8af8038fd9b2d08.1470712861.git.asias@scylladb.com>
(cherry picked from commit ef782f0335)
2016-08-15 12:32:17 +03:00
Nadav Har'El
7a76157cb9 sstables: don't forget to read static row
[v2: fix check for static column (don't check if the schema is not compound)
 and move want-static-columns flag inside the filtering context to avoid
 changing all the callers.]

When a CQL request asks to read only a range of clustering keys inside
a partition, we actually need to read not just these clustering rows, but
also the static columns and add them to the response (as explained by Tomek
in issue #1568).

With the current code, that CQL request is translated into an
sstable::read_row() with a clustering-key filter. But this currently
only reads the requested clustering keys - NOT the static columns.

We don't want sstable::read_row() to unconditionally read the static
columns from disk because if, for example, they are already cached, we
might not want to read them from disk. We don't have such a partial-partition
cache yet, but we are likely to have one in the future.

This patch adds to the clustering key filter object a flag indicating whether
we need to read the static columns (actually, it's a function returning this
flag per partition, to match the API for the clustering-key filtering).

When sstable::read_row() sees the flag for this partition is true, it also
requests reading the static columns.
Currently, the code always passes "true" for this flag - because we don't
have the logic to cache partially-read partitions.

The current find_disk_ranges() code does not yet support returning a non-
contiguous byte range, so this patch, if it notices that this partition
really has static columns in addition to the range it needs to read,
falls back to reading the entire partition. This is a correct solution
(and fixes #1568) but not the most efficient solution. Because static
columns are relatively rare, let's start with this solution (correct
but less efficient when there are static columns); providing non-
contiguous reading support is left as a FIXME.

Fixes #1568

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1471124536-19471-1-git-send-email-nyh@scylladb.com>
(cherry picked from commit 0d00da7f7f)
2016-08-15 12:30:36 +03:00
Amnon Heiman
b2e6a52461 scylla.spec: conditionally include the housekeeping.cfg in the conf package
When the housekeeping configuration name was changed from conf to cfg it
was no longer included as part of the conf rpm.

This change adds a macro that determines whether the file should be
included and uses that macro to conditionally add the
configuration file to the rpm.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1471169042-19099-1-git-send-email-amnon@scylladb.com>
(cherry picked from commit 612f677283)
2016-08-14 13:26:25 +03:00
Tomasz Grabiec
b1376fef9b partition_version: Add missing linearization context
Snapshot removal merges partitions, and cell merging must be done
inside linearization context.

Fixes #1574

Reviewed-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1471010625-18019-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 1b2ea14d0e)
2016-08-12 17:56:21 +03:00
Piotr Jastrzebski
23f4813a48 Fix after free access bug in storage proxy
Due to speculative reads we can't guarantee that all
fibers started by storage_proxy::query will be finished
by the time the method returns a result.

We need to make sure that no parameter passed to this
method ever changes.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <31952e323e599905814b7f378aafdf779f7072b8.1471005642.git.piotr@scylladb.com>
(cherry picked from commit f212a6cfcb)
2016-08-12 16:35:45 +02:00
Duarte Nunes
c16c3127fe docker: If set, broadcast address is seed
This patch configures the broadcast address to be the seed if it is
configured, otherwise Scylla complains about it and aborts.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1470863058-1011-1-git-send-email-duarte@scylladb.com>
(cherry picked from commit 918a2939ff)
2016-08-12 11:47:08 +03:00
Tomasz Grabiec
48fdeb47e2 Merge branch 'raphael/fix_min_max_metadata_v2' from git@github.com:raphaelsc/scylla.git
Fix for generation of sstables min/max clustering metadata from Raphael.

(cherry picked from commit d7f8ce7722)
2016-08-11 17:53:01 +03:00
Avi Kivity
9ef4006d67 Update seastar submodule
* seastar 36a8ebe...e6571c4 (1):
  > reactor: Do not test for poll mode default
2016-08-11 14:45:52 +03:00
Amnon Heiman
75c53e4f24 scylla-housekeeping: rename configuration file from conf to cfg
Files with a conf extension are run by the scylla_prepare on the AMI.
The scylla-housekeeping configuration file is not a bash script and
should not be run.

This patch changes its extension to cfg which is more python like.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1470896759-22651-2-git-send-email-amnon@scylladb.com>
(cherry picked from commit 5a4fc9c503)
2016-08-11 14:45:11 +03:00
Tomasz Grabiec
66292c0ef0 sstables: Fix bug in promoted index generation
maybe_flush_pi_block, which is called for each cell, assumes that
block_first_colname will be empty when the first cell is encountered
for each partition.

This didn't hold after writing a partition which generated no index
entry, because block_first_colname was cleared only when there was any
data written into the promoted index. Fix by always clearing the name.

The effect was that the promoted index entry for the next partition
would be flushed sooner than necessary (still counting since the start
of the previous partition) and with offset pointing to the start of
the current partition. This will cause a parsing error when such an sstable
is read through a promoted index entry, because the offset is assumed to
point to a cell, not to the partition start.

Fixes #1567

Message-Id: <1470909915-4400-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit f1c2481040)
2016-08-11 13:09:05 +03:00
Amnon Heiman
84f7d9a49c build_deb: Add dist flag
The dist flag marks the Debian package as a distributed package.
As such, the housekeeping configuration file will be included in the
package and will not need to be created by scylla_setup.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1470907208-502-2-git-send-email-amnon@scylladb.com>
(cherry picked from commit a24941cc5f)
2016-08-11 12:25:28 +03:00
Pekka Enberg
f0535eae9b dist/docker: Fix typo in "--overprovisioned" help text
Reported by Mathias Bogaert (@analytically).
Message-Id: <1470904395-4614-1-git-send-email-penberg@scylladb.com>

(cherry picked from commit d1a052237d)
2016-08-11 11:49:42 +03:00
Avi Kivity
4f096b60df Update seastar submodule
* seastar de789f1...36a8ebe (1):
  > reactor: fix I/O queue pending requests collectd metric

Fixes #1558.
2016-08-10 15:28:09 +03:00
Pekka Enberg
4ac160f2fe release: prepare for 1.3.rc3 2016-08-10 13:53:53 +03:00
Avi Kivity
395edc4361 Merge 2016-08-10 13:34:48 +03:00
Avi Kivity
e2c9feafa3 Merge "Add configuration file to scylla-housekeeping" from Amnon
"The series adds an optional configuration file to scylla-housekeeping. The
file acts as a way to prevent scylla-housekeeping from running. A missing
configuration file, will make the scylla-housekeeping immediately.

The series adds a flag to build_rpm that differentiates between public
distributions, which contain the configuration file, and private
distributions, which do not contain it; in the latter case the setup script
creates it."

(cherry picked from commit da4d33802e)
2016-08-10 13:34:04 +03:00
Avi Kivity
f4dea17c19 Merge "housekeeping: Switch to python2 and handle version" from Amnon
This series handles two issues:
* Moving to python2: though python3 is supported, there are modules that we
  need that are not rpm installable. python3 will have to wait until it is
  more mature.

* Check version should send the current version when it checks for a new one,
  and a simple string compare is wrong.

(cherry picked from commit ec62f0d321)
2016-08-10 13:31:50 +03:00
Amnon Heiman
a45b72b66f scylla-housekeeping: check version should use the current version
This patch handles two issues with check version:
* When checking for a version, the script sends the current version.
* Instead of a string compare, it uses parse_version to compare the
  versions.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
(cherry picked from commit 406fa11cc5)
2016-08-10 13:29:53 +03:00
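To illustrate why commit a45b72b66f above replaces the plain string compare, here is a minimal, self-contained Python sketch. The commit itself uses parse_version; `version_key` and `is_newer` below are hypothetical stand-ins written for this example only, and the rule that non-numeric parts such as `rc3` sort before numeric parts is an assumption of the sketch, not something stated in the commit.

```python
def version_key(version):
    """Turn a dotted version string into a tuple that compares numerically."""
    key = []
    for part in version.split("."):
        if part.isdigit():
            # Numeric parts compare as integers, so 10 sorts after 3.
            key.append((1, int(part)))
        else:
            # Assumption: non-numeric parts (e.g. "rc3") sort before numbers.
            key.append((0, part))
    return tuple(key)


def is_newer(latest, current):
    """Return True if `latest` is a newer version than `current`."""
    return version_key(latest) > version_key(current)


# A plain string compare gets multi-digit components wrong:
print("1.10.0" > "1.3.5")           # False
print(is_newer("1.10.0", "1.3.5"))  # True
```

The string compare fails because it compares character by character, so "1.10.0" sorts before "1.3.5" even though 10 is greater than 3.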
Amnon Heiman
be1c2a875b scylla-housekeeping: Switching to python2
There is a problem with python module installation in python3,
especially on CentOS. Though python34 has a normal package, a lot of the
modules are missing a yum installation and can only be installed by pip.

This patch switches the scylla-housekeeping implementation to use
python2. We should switch back to python3 when CentOS python3 is
more mature.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
(cherry picked from commit 641e5dc57c)
2016-08-10 13:23:47 +03:00
Nadav Har'El
0b9f83c6b6 sstable: avoid copying non-existent value
The promoted-index reading code contained a bug where it copied the value
of a disengaged optional (this non-value was never used, but it was still
copied). Fix it by keeping the optional<> as such longer.

This bug caused tests/sstable_test in the debug build to crash (the release
build somehow worked).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1470742418-8813-1-git-send-email-nyh@scylladb.com>
(cherry picked from commit e005762271)
2016-08-10 13:14:49 +03:00
Pekka Enberg
0d77615b80 cql3: Filter compaction strategy class from compaction options
Cassandra 2.x does not store the compaction strategy class in compaction
options so neither should we to avoid confusing the drivers.

Fixes #1538.
Message-Id: <1470722615-29106-1-git-send-email-penberg@scylladb.com>

(cherry picked from commit 9ff242d339)
2016-08-10 12:44:50 +03:00
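The filtering described in commit 0d77615b80 above amounts to dropping one entry from the compaction options before they are exposed to drivers. A minimal Python sketch follows; the real change is in Scylla's C++ cql3 code, and both the function name `filter_compaction_options` and the key name `class` are assumptions made for illustration.

```python
def filter_compaction_options(options):
    """Return a copy of the options without the strategy class entry.

    Assumption: the strategy class is stored under the key "class".
    """
    return {key: value for key, value in options.items() if key != "class"}


options = {"class": "SizeTieredCompactionStrategy", "min_threshold": "4"}
print(filter_compaction_options(options))
```

The input mapping is left untouched, so the strategy class remains available to code that still needs it.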
Pekka Enberg
8771220745 dist/docker: Add '--smp', '--memory', and '--overprovisioned' options
Add '--smp', '--memory', and '--overprovisioned' options to the Docker
image. The options are written to /etc/scylla.d/docker.conf file, which
is picked up by the Scylla startup scripts.

You can now, for example, restrict your Docker container to 1 CPU and 1
GB of memory with:

   $ docker run --name some-scylla penberg/scylla --smp 1 --memory 1G --overprovisioned 1

Needed by folks who want to run Scylla on Docker in production.

Cc: Sasha Levin <alexander.levin@verizon.com>
Message-Id: <1470680445-25731-1-git-send-email-penberg@scylladb.com>
(cherry picked from commit 6a5ab6bff4)
2016-08-10 11:54:01 +03:00
Avi Kivity
f552a62169 Update seastar submodule
* seastar ee1ecc5...de789f1 (1):
  > Merge "Fix the SMP queue poller" from Tomasz

Fixes #1553.
2016-08-10 09:54:15 +03:00
Avi Kivity
696a978611 Update seastar submodule
* seastar 0b53ab2...ee1ecc5 (1):
  > byteorder: add missing cpu_to_be(), be_to_cpu() functions

Fixes build failure.
2016-08-10 09:51:35 +03:00
Nadav Har'El
0475a98de1 Avoid some warnings in debug build
The sanitizer of the debug build warns when a "bool" variable is read when
containing a value not 0 or 1. In particular, if a class has an
uninitialized bool field, which class logic allows to only be set later,
then "move"ing such an object will read the uninitialized value and produce
this warning.

This patch fixes four of these warnings seen in sstable_test by initializing
some bool fields to false, even though the code doesn't strictly need this
initialization.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1470744318-10230-1-git-send-email-nyh@scylladb.com>
(cherry picked from commit c2e4f5ba16)
2016-08-09 17:54:54 +03:00
Nadav Har'El
0b69e37065 Fix failing tests
Commit 0d8463aba5 broke some of the tests with an assertion
failure about local_is_initialized(). It turns out that there is more than
one level of local_is_initialized() we need to check... For some tests,
neither locals were initialized, but for others, one was and the other
wasn't, and the wrong one was tested.

With this patch, all unit tests except "flush_queue_test.cc" pass on my
machine. I doubt this test is relevant to the promoted index patches,
but I'll continue to investigate it.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1470695199-32649-1-git-send-email-nyh@scylladb.com>
(cherry picked from commit bce020efbd)
2016-08-09 17:54:49 +03:00
Avi Kivity
dc6be68852 Merge "promoted index for reading partial partitions" from Nadav
"The goal of this patch series is to support reading and writing of a
"promoted index" - the Cassandra 2.* SSTable feature which allows reading
only a part of the partition without needing to read an entire partition
when it is very long. To make a long story short, a "promoted index" is
a sample of each partition's column names, written to the SSTable Index
file with that partition's entry. See a longer explanation of the index
file format, and the promoted index, here:

     https://github.com/scylladb/scylla/wiki/SSTables-Index-File

There are two main features in this series - first enabling reading of
parts of partitions (using the promoted index stored in an sstable),
and then enable writing promoted indexes to new sstables. These two
features are broken up into smaller stand-alone pieces to facilitate the
review.

Three features are still missing from this series and are planned to be
developed later:

1. When we fail to parse a partition's promoted index, we silently fall back
   to reading the entire partition. We should log (with rate limiting) and
   count these errors, to help in debugging sstable problems.

2. The current code only uses the promoted index when looking for a single
   contiguous clustering-key range. If the ck range is non-contiguous, we
   fall back to reading the entire partition. We should use the promoted
   index in that case too.

3. The current code only uses the promoted index when reading a single
   partition, via sstable::read_row(). When scanning through all or a
   range of partitions (read_rows() or read_range_rows()), we do not yet
   use the promoted index; we read contiguously from the data file (we do not
   even read from the index file, so unsurprisingly we can't use it)."

(cherry picked from commit 700feda0db)
2016-08-09 17:54:15 +03:00
Avi Kivity
8c20741150 Revert "sstables: promoted index write support"
This reverts commit c0e387e1ac.  The full
patchset needs to be backported instead.
2016-08-09 17:53:24 +03:00
Avi Kivity
3e3eaa693c Revert "Fix failing tests"
This reverts commit 8d542221eb.  It is needed,
but prevents another revert from taking place.  Will be reinstated later
2016-08-09 17:52:57 +03:00
Avi Kivity
03ef0a9231 Revert "Avoid some warnings in debug build"
This reverts commit 47bf8181af.  It is needed,
but prevents another revert from taking place.  Will be reinstated later.
2016-08-09 17:52:09 +03:00
Nadav Har'El
47bf8181af Avoid some warnings in debug build
The sanitizer of the debug build warns when a "bool" variable is read when
containing a value not 0 or 1. In particular, if a class has an
uninitialized bool field, which class logic allows to only be set later,
then "move"ing such an object will read the uninitialized value and produce
this warning.

This patch fixes four of these warnings seen in sstable_test by initializing
some bool fields to false, even though the code doesn't strictly need this
initialization.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1470744318-10230-1-git-send-email-nyh@scylladb.com>
(cherry picked from commit c2e4f5ba16)
2016-08-09 16:58:27 +03:00
Nadav Har'El
8d542221eb Fix failing tests
Commit 0d8463aba5 broke some of the tests with an assertion
failure about local_is_initialized(). It turns out that there is more than
one level of local_is_initialized() we need to check... For some tests,
neither locals were initialized, but for others, one was and the other
wasn't, and the wrong one was tested.

With this patch, all unit tests except "flush_queue_test.cc" pass on my
machine. I doubt this test is relevant to the promoted index patches,
but I'll continue to investigate it.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1470695199-32649-1-git-send-email-nyh@scylladb.com>
(cherry picked from commit bce020efbd)
2016-08-09 16:58:27 +03:00
Nadav Har'El
c0e387e1ac sstables: promoted index write support
This patch adds writing of promoted index to sstables.

The promoted index is basically a sample of columns and their positions
for large partitions: The promoted index appears in the sstable's index
file for partitions which are larger than 64 KB, and divides the partition
into 64 KB blocks (as in Cassandra, this interval is configurable through
the column_index_size_in_kb config parameter). Beyond modifying the index
file, having a promoted index may also modify the data file: Since each
of the blocks may be read independently, we need to add at the beginning of
each block the list of range tombstones that are still open at that
position.

See also https://github.com/scylladb/scylla/wiki/SSTables-Index-File

Fixes #959

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
(cherry picked from commit 0d8463aba5)
2016-08-09 16:58:27 +03:00
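The block sampling described in commit c0e387e1ac above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the sstable writer: `IndexBlock` and `build_promoted_index` are hypothetical names, each cell is modeled as a `(column_name, size_in_bytes)` pair, and range tombstones and the on-disk encoding are ignored.

```python
from typing import List, NamedTuple, Tuple

BLOCK_SIZE = 64 * 1024  # 64 KB, the default interval named in the commit


class IndexBlock(NamedTuple):
    """Hypothetical promoted-index entry for one block of a partition."""
    first_name: str
    last_name: str
    offset: int  # byte offset of the block within the partition
    width: int   # size of the block in bytes


def build_promoted_index(cells: List[Tuple[str, int]],
                         block_size: int = BLOCK_SIZE) -> List[IndexBlock]:
    blocks = []
    offset = 0
    block_start = 0
    first_name = None  # reset for every block (and every call/partition)
    last_name = None
    for name, size in cells:
        if first_name is None:
            first_name = name
            block_start = offset
        last_name = name
        offset += size
        if offset - block_start >= block_size:
            # The block is full: record it and start a new one.
            blocks.append(
                IndexBlock(first_name, last_name, block_start, offset - block_start))
            first_name = None
    if first_name is not None:
        # Record the trailing, partially filled block.
        blocks.append(
            IndexBlock(first_name, last_name, block_start, offset - block_start))
    # Partitions that do not exceed one block get no promoted index.
    return blocks if offset > block_size else []


cells = [("c%03d" % i, 16 * 1024) for i in range(10)]  # a 160 KB partition
for block in build_promoted_index(cells):
    print(block)
```

A reader looking for a clustering range can then seek to the first block whose name range overlaps the request instead of scanning the partition from its start.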
Duarte Nunes
57d3dc5c66 thrift: Set default validator
This patch sets the default validator for dynamic column families.
Doing so has no consequences in terms of behavior, but it causes the
correct type to be shown when describing the column family through
cassandra-cli.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1470739773-30497-1-git-send-email-duarte@scylladb.com>
(cherry picked from commit 0ed19ec64d)
2016-08-09 13:56:43 +02:00
Duarte Nunes
2daee0b62d thrift: Send empty col metadata when describing ks
This patch ensures we always send the column metadata, even when the
column family is dynamic and the metadata is empty, as some clients
like cassandra-cli always assume its presence.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1470740971-31169-1-git-send-email-duarte@scylladb.com>
(cherry picked from commit f63886b32e)
2016-08-09 14:34:14 +03:00
Pekka Enberg
3eddf5ac54 dist/docker: Document data volume and cpuset configuration
Message-Id: <1470649675-5648-1-git-send-email-penberg@scylladb.com>
(cherry picked from commit 3b31d500c8)
2016-08-09 11:15:57 +03:00
Pekka Enberg
42d6f389f9 dist/docker: Add '--broadcast-rpc-address' command line option
We already have a '--broadcast-address' command line option so let's add
the same thing for RPC broadcast address configuration.

Message-Id: <1470656449-11038-1-git-send-email-penberg@scylladb.com>
(cherry picked from commit 4372da426c)
2016-08-09 11:15:53 +03:00
Pekka Enberg
1a6f6f1605 Update scylla-ami submodule
* dist/ami/files/scylla-ami 2e599a3...14c1666 (1):
  > setup coredump on first startup
2016-08-09 11:10:20 +03:00
Avi Kivity
8d8e997f5a Update scylla-ami submodule
* dist/ami/files/scylla-ami 863cc45...2e599a3 (1):
  > Do not set developer-mode on unsupported instance types
2016-08-07 17:52:13 +03:00
Takuya ASADA
50ee889679 dist/ami/files: add a message for privacy policy agreement on login prompt
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1470212054-351-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit 3d45d6579b)
2016-08-07 17:40:56 +03:00
Duarte Nunes
325f917d8a system_keyspace: Correctly deal with wrapped ranges
This patch ensures we correctly deal with ranges that wrap around when
querying the size_estimates system table.

Ref #693

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1470412433-7767-1-git-send-email-duarte@scylladb.com>
(cherry picked from commit e0a43a82c6)
2016-08-07 17:21:58 +03:00
Takuya ASADA
b088dd7d9e dist/ami/files: show warning message for unsupported instance types
Notify users to run scylla_io_setup before launching scylla on unsupported instance types.

Fixes #1511

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1470090415-8632-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit bd1ab3a0ad)
2016-08-05 09:51:27 +03:00
Takuya ASADA
a42b2bb0d6 dist/ami: Install scylla metapackage and debuginfo on AMI
Install the scylla metapackage and debuginfo on the AMI to make reporting bugs from the AMI easier.
Fixes #1496

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1469635071-16821-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit 9b59bb59f2)
2016-08-05 09:48:19 +03:00
Takuya ASADA
aecda01f8e dist/common/scripts: disable coredump compression by default, add an argument to enable compression on scylla_coredump_setup
On large-memory machines compression takes too long, so disable it by default.
Also provide a way to enable it again.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1469706934-6280-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit 89b790358e)
2016-08-05 09:47:17 +03:00
Takuya ASADA
f9b0a29def dist/ami: setup correct repository when --localrpm specified
There was no way to set up the correct repo when the AMI is built with the --localrpm option, since the AMI does not have access to the 'version' file, and we didn't pass the repo URL to the AMI.
So detect the optimal repo path when starting to build the AMI, pass the repo URL to the AMI, and set it up correctly.

Note: this changes the behavior of build_ami.sh/scylla_install_pkg's --repo option.
It was a repository URL, but now it is a .repo/.list file URL.
This is optimal for distributions which require 3rd-party packages to install scylla, like CentOS7.
Existing shell scripts which invoke build_ami.sh, such as our Jenkins jobs, need to be changed accordingly.

Fixes #1414

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1469636377-17828-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit d3746298ae)
2016-08-05 09:45:22 +03:00
Pekka Enberg
192e935832 dist/docker: Use Scylla 1.3 RPM repository 2016-08-05 09:13:08 +03:00
Avi Kivity
436ff3488a Merge "Docker image fixes" from Pekka
"Kubernetes is unhappy with our Docker image because we start systemd
under the hood. Fix that by switching to use "supervisord" to manage the
two processes -- "scylla" and "scylla-jmx":

  http://blog.kunicki.org/blog/2016/02/12/multiple-entrypoints-in-docker/

While at it, fix up "docker logs" and "docker exec cqlsh" to work
out-of-the-box, and update our documentation to match what we have.

Further work is needed to ensure Scylla production configuration works
as expected and is documented accordingly."

(cherry picked from commit 28ee2bdbd2)
2016-08-05 09:10:51 +03:00
Benoît Canet
b91712fc36 docker: Add documentation page for Docker Hub
Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1466438296-5593-1-git-send-email-benoit@scylladb.com>
(cherry picked from commit 4ce7bced27)
2016-08-05 09:10:48 +03:00
Yoav Kleinberger
be954ccaec docker: bring docker image closer to a more 'standard' scylla installation
Previously, the Docker image could only be run interactively, which is
not conducive for running clusters. This patch makes the docker image
run in the background (using systemd). This makes the docker workflow
similar to working with virtual machines, i.e. the user launches a
container, and once it is running they can connect to it with

       docker exec -it <container_name> bash

and immediately use `cqlsh` to control it.

In addition, the configuration of scylla is done using established
scripts, such as `scylla_dev_mode_setup`, `scylla_cpuset_setup` and
`scylla_io_setup`, whereas previously code from these scripts was
duplicated into the docker startup file.

To specify seeds for making a cluster, use the --seeds command line
argument, e.g.

    docker run -d --privileged scylladb/scylla
    docker run -d --privileged scylladb/scylla --seeds 172.17.0.2

other options include --developer-mode, --cpuset, --broadcast-address

The --developer-mode option is on by default, so that we don't fail users
who just want to play with this.

The Dockerfile entrypoint script was rewritten as a few Python modules.
The move to Python is merited because:

    * Using `sed` to manipulate YAML is fragile
    * Lack of proper command line parsing resulted in introducing ad-hoc environment variables
    * Shell scripts don't throw exceptions, and it's easy to forget to check exit codes for every single command

I've made an effort to make the entrypoint `go` script very simple and readable.
The gory details are hidden inside the other Python modules.

Signed-off-by: Yoav Kleinberger <yoav@scylladb.com>
Message-Id: <1468938693-32168-1-git-send-email-yoav@scylladb.com>
(cherry picked from commit d1d1be4c1a)
2016-08-05 09:10:35 +03:00
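The point about proper command line parsing in commit be954ccaec above can be illustrated with Python's standard argparse module. This is a hedged sketch rather than the actual entrypoint code: the option names come from the commit message, while `parse_args`, the value formats, and the defaults (other than developer mode being on by default) are assumptions.

```python
import argparse


def parse_args(argv):
    """Parse the Docker entrypoint options named in the commit message."""
    parser = argparse.ArgumentParser(
        description="Hypothetical Scylla Docker entrypoint options")
    # Assumption: seeds are passed as a single comma-separated string.
    parser.add_argument("--seeds", default=None)
    # Developer mode is on by default, per the commit message;
    # the "0"/"1" encoding is an assumption of this sketch.
    parser.add_argument("--developer-mode", default="1", choices=["0", "1"])
    parser.add_argument("--cpuset", default=None)
    parser.add_argument("--broadcast-address", default=None)
    return parser.parse_args(argv)


args = parse_args(["--seeds", "172.17.0.2"])
print(args.seeds, args.developer_mode)
```

Unknown options and invalid values make argparse exit with a usage message, which replaces the ad-hoc environment variables and unchecked exit codes that the commit message criticizes in the shell version.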
Glauber Costa
2bffa8af74 logalloc: make sure allocations in release_requests don't recurse back into the allocator
Calls like later() and with_gate() may allocate memory, although that is not
very common. This can create a problem in the sense that it will potentially
recurse and bring us back to the allocator during free - which is the very thing
we are trying to avoid with the call to later().

This patch wraps the relevant calls in the reclaimer lock. This does mean that the
allocation may fail if we are under severe pressure - which includes having
exhausted all reserved space - but at least we won't recurse back to the
allocator.

To make sure we do this as early as possible, we just fold both release_requests
and do_release_requests into a single function

Thanks Tomek for the suggestion.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <980245ccc17960cf4fcbbfedb29d1878a98d85d8.1470254846.git.glauber@scylladb.com>
(cherry picked from commit fe6a0d97d1)
2016-08-04 11:17:54 +02:00
Glauber Costa
4a6d0d503f logalloc: make sure blocked requests memory allocations are served from the standard allocator
Issue 1510 describes a scenario in which, under load, we allocate memory within
release_requests() leading to a reentry into an invalid state in our
blocked requests' shared_promise.

This is not easy to trigger since not all allocations will actually get to the
point in which they need a new segment, let alone have that happening during
another allocator call.

Having those kinds of reentry is something we have always sought to avoid with
release_requests(): this is the reason why most of the actual routine is
deferred after a call to later().

However, that is a trick we cannot use for updating the state of the blocked
requests' shared_promise: we can't guarantee when that is going to run, and we
always need a valid shared_promise, in a valid state, waiting for new requests
to hook into.

The solution employed by this patch is to make sure that no allocation
operations whatsoever happen during the initial part of release_requests on
behalf of the shared promise.  Allocation is now deferred to first use, which
relieves release_requests() from all allocation duties. All it needs to do is
free the old object and signal to its user that an allocation is needed (by
storing {} into the shared_promise).

Fixes #1510

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <49771e51426f972ddbd4f3eeea3cdeef9cc3b3c6.1470238168.git.glauber@scylladb.com>
(cherry picked from commit ad58691afb)
2016-08-04 11:17:49 +02:00
Avi Kivity
fa81385469 conf: synchronize internode_compression between scylla.yaml and code
Our default is "none", to give reasonable performance, so have scylla.yaml
reflect that.

(cherry picked from commit 9df4ac53e5)
2016-08-04 12:10:03 +03:00
Duarte Nunes
93981aaa93 schema_builder: Ensure dense tables have compact col
This patch ensures that when the schema is dense, regardless of
compact_storage being set, the single regular column is translated
into a compact column.

This fixes an issue where Thrift dynamic column families are
translated to a dense schema with a regular column, instead of a
compact one.

Since a compact column is also a regular column (e.g., for purposes of
querying), no further changes are required.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1470062410-1414-1-git-send-email-duarte@scylladb.com>
(cherry picked from commit 5995aebf39)

Fixes #1535.
2016-08-03 13:49:51 +02:00
Duarte Nunes
89b40f54db schema: Dense schemas are correctly upgraded
When upgrading a dense schema, we would drop the cells of the regular
(compact) column. This patch fixes this by making the regular and
compact column kinds compatible.

Fixes #1536

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1470172097-7719-1-git-send-email-duarte@scylladb.com>
2016-08-03 13:37:57 +02:00
Paweł Dziepak
99dfbedf36 sstables: extend sstable life until reader is fully closed
data_consume_rows_context needs to have close() called and the returned
future waited for before it can be destroyed. data_consume_context::impl
does that in the background upon its destruction.

However, it is possible that the sstable is removed before
data_consume_rows_context::close() completes in which case EBADF may
happen. The solution is to make data_consume_context::impl keep a
reference to the sstable and extend its lifetime until closing of
data_consume_rows_context (which is performed in the background)
completes.

A side effect of this change is that data_consume_context no longer
requires its user to make sure that the sstable exists as long as it is
in use, since it owns its own reference to it.

Fixes #1537.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1470222225-19948-1-git-send-email-pdziepak@scylladb.com>
(cherry picked from commit 02ffc28f0d)
2016-08-03 13:19:50 +02:00
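The lifetime extension described in the sstables commit above can be sketched as follows. This is a minimal model, not Scylla's code: `sstable`, `task_queue` and `consume_context` are hypothetical stand-ins, and a plain vector of callbacks plays the role of work that completes in the background.

```cpp
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for an sstable with an open file descriptor.
struct sstable {
    bool file_open = true;
};

// Hypothetical stand-in for work that finishes "in the background".
using task_queue = std::vector<std::function<void()>>;

class consume_context {
    std::shared_ptr<sstable> _sst;  // owning reference, so the user need not keep one
    task_queue& _background;
public:
    consume_context(std::shared_ptr<sstable> sst, task_queue& background)
        : _sst(std::move(sst)), _background(background) {}

    ~consume_context() {
        // The close completes later. The shared_ptr captured by the callback
        // keeps the sstable alive until the close has actually run.
        _background.push_back([sst = std::move(_sst)] { sst->file_open = false; });
    }
};
```

The owning reference moves into the deferred callback, so the sstable outlives both the context and any external owner until the queued close has run and the callback is destroyed.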
Paweł Dziepak
e95f4eaee4 Merge "partition_limit: Don't count dead partitions" from Duarte
"This patch series ensures we don't count dead partitions (i.e.,
partitions with no live rows) towards the partition_limit. We also
enforce the partition limit at the storage_proxy level, so that
limits with smp > 1 work correctly."

(cherry picked from commit 5f11a727c9)
2016-08-03 12:44:32 +03:00
Avi Kivity
2570da2006 Update seastar submodule
* seastar f603f88...0b53ab2 (2):
  > reactor: limit task backlog
  > reactor: make sure a poll cycle always happens when later is called

Fix runaway task queue growth on cpu-bound loads.
2016-08-03 12:33:54 +03:00
Tomasz Grabiec
b224ff6ede Merge 'pdziepak/row-cache-wide-entries/v4' from seastar-dev.git
This series adds the ability for the partition cache to record whether a
partition's size makes it uncacheable. During reads, these entries save us
IO operations since we already know that the partition is too big to be put
in the cache.

First part of the patchset makes all mutation_readers allow the
streamed_mutations they produce to outlive them, which is a guarantee
used later by the code handling reading large partitions.

(cherry picked from commit d2ed75c9ff)
2016-08-02 20:24:29 +02:00
Piotr Jastrzebski
6960fce9b2 Use continuity flag correctly with concurrent invalidations
Invalidations can happen between reading a cache entry and actually
using it, so we have to check whether the flag was cleared; if it was,
we need to read the entry again.

Fixes #1464.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <7856b0ded45e42774ccd6f402b5ee42175bd73cf.1469701026.git.piotr@scylladb.com>
(cherry picked from commit fdfd1af694)
2016-08-02 20:24:22 +02:00
Avi Kivity
a556265ccd checked_file: preserve DMA alignment
Inherit the alignment parameters from the underlying file instead of
defaulting to 4096.  This gives better read performance on disks with 512-byte
sectors.

Fixes #1532.
Message-Id: <1470122188-25548-1-git-send-email-avi@scylladb.com>

(cherry picked from commit 9f35e4d328)
2016-08-02 12:22:37 +03:00
Duarte Nunes
8243d3d1e0 storage_service: Fix get_range_to_address_map_in_local_dc
This patch fixes a couple of bugs in
get_range_to_address_map_in_local_dc.

Fixes #1517

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1469782666-21320-1-git-send-email-duarte@scylladb.com>
(cherry picked from commit 7d1b7e8da3)
2016-07-29 11:24:12 +02:00
Pekka Enberg
2665bfdc93 Update seastar submodule
* seastar 103543a...f603f88 (1):
  > iotune: Fix SIGFPE with some executions
2016-07-29 11:11:56 +03:00
Tomasz Grabiec
3a1e8fffde Merge branch 'sstables/static-1.3/v1' from git@github.com:duarten/scylla.git into branch-1.3
The current code assumes cell names are always compound and may
wrongly report a non-static row as such. This patch addresses this
and adds a test case to catch regressions.

Backports the fix to #1495.
2016-07-28 15:07:41 +02:00
Gleb Natapov
23c340bed8 api: fix use after free in sum_sstable
get_sstables_including_compacted_undeleted() may return a temporary
shared_ptr, which will be destroyed before the loop if not stored locally.

Fixes #1514

Message-Id: <20160728100504.GD2502@scylladb.com>
(cherry picked from commit 3531dd8d71)
2016-07-28 14:28:25 +03:00
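The pattern behind the sum_sstable fix above can be sketched as follows. `get_sizes_snapshot()` and `sum_sizes()` are hypothetical names used only for illustration; the point is that the returned shared_ptr is stored in a local before iterating.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

using sstable_sizes = std::vector<std::uint64_t>;

// Hypothetical stand-in: builds a fresh list on every call, so the returned
// shared_ptr is the only owner of the data.
inline std::shared_ptr<sstable_sizes> get_sizes_snapshot() {
    return std::make_shared<sstable_sizes>(sstable_sizes{10, 20, 30});
}

inline std::uint64_t sum_sizes() {
    // Unsafe under the pre-C++23 rules:
    //     for (auto s : *get_sizes_snapshot()) { ... }
    // The temporary shared_ptr is destroyed once the range expression has
    // been evaluated, so the loop would walk freed memory.
    auto snapshot = get_sizes_snapshot();  // keep the owner alive for the whole loop
    assert(snapshot);
    std::uint64_t total = 0;
    for (auto s : *snapshot) {
        total += s;
    }
    return total;
}
```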
Duarte Nunes
ff8a795021 sstables: Validate static cell is on static column
This patch enforces compatibility between a cell and the
corresponding column definition with regards to them being
static.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-28 12:11:46 +02:00
Duarte Nunes
d11b0cac3b sstable_mutation_test: Test non-compound cell name
This patch adds a test case for reading non-compound cell names,
validating that such a cell is not incorrectly marked as static.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1469616205-4550-5-git-send-email-duarte@scylladb.com>
2016-07-28 12:11:37 +02:00
Duarte Nunes
5ad0448cc9 sstables: Don't assume cell name is compound
The current code assumes cell names are always compound and may
wrongly report a non-static row as such, since it looks at the first
bytes of the name assuming they are the component's length.

Tables with compact storage (which cannot contain static rows) may not
have a compound comparator, so we check for the table's compoundness
before checking for the static marker. We do this by delegating to
composite_view::is_static.

Fixes #1495

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1469616205-4550-4-git-send-email-duarte@scylladb.com>
2016-07-28 12:11:27 +02:00
Duarte Nunes
35ab2cadc2 sstables: Remove duplication in extract_clustering_key
This patch removes some duplicated code in extract_clustering_key(),
which is already handled in composite_view.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1469397806-8067-1-git-send-email-duarte@scylladb.com>
2016-07-28 12:11:22 +02:00
Duarte Nunes
a1cee9f97c sstables: Remove superfluous call to check_static()
When building a column we're calling check_static() two times;
refactor things a bit so that this doesn't happen and we reuse the
previous calculation.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1469397748-7987-1-git-send-email-duarte@scylladb.com>
2016-07-28 12:11:15 +02:00
Duarte Nunes
0ae7347d8e composite: Use operator[] instead of at()
Since we already do bounds checking on is_static(), we can use
bytes_view::operator[] instead of bytes_view::at() to avoid repeating
the bounds checking.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1469616205-4550-3-git-send-email-duarte@scylladb.com>
2016-07-28 12:10:14 +02:00
Duarte Nunes
b04168c015 composite_view: Fix is_static
composite_view's is_static function is wrong because:

1) It doesn't guard against the composite not being a compound;
2) It doesn't deal with widening due to integral promotions and
   consequent sign extension.

This patch fixes this by ensuring there's only one correct
implementation of is_static, to avoid code duplication and
enforce test coverage.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1469616205-4550-2-git-send-email-duarte@scylladb.com>
2016-07-28 12:10:06 +02:00
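The two problems named in the composite_view commit above can be sketched as follows. This is a simplified model under stated assumptions: bytes are stored as signed 8-bit values, the static marker is assumed to be the two-byte prefix 0xFFFF, and `bytes`, `first_byte_is_ff_buggy` and this `is_static` are illustrative names rather than Scylla's actual definitions.

```cpp
#include <cstdint>
#include <vector>

// Assumption: serialized bytes are signed 8-bit values.
using bytes = std::vector<std::int8_t>;

// Buggy comparison: int8_t(-1) is promoted to int(-1), which never equals
// 0xff (255), so the marker byte is never recognized.
inline bool first_byte_is_ff_buggy(const bytes& b) {
    return !b.empty() && b[0] == 0xff;
}

// Sketch of a guarded check. Only compound composites carry the length
// prefix in which the static marker (assumed here to be 0xFFFF) is encoded.
inline bool is_static(const bytes& b, bool is_compound) {
    if (!is_compound || b.size() < 2) {
        return false;
    }
    // Cast to unsigned before comparing to avoid sign extension.
    return static_cast<std::uint8_t>(b[0]) == 0xff
        && static_cast<std::uint8_t>(b[1]) == 0xff;
}
```

Because the size check comes first, the indexed accesses need no further bounds checking.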
Duarte Nunes
4e13853cbc compound_compat: Only compound values can be static
If a composite is not a compound, then it doesn't carry a length
prefix where static information is encoded. In its absence, a
non-compound composite can never be static.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1469397561-7748-1-git-send-email-duarte@scylladb.com>
2016-07-28 12:09:59 +02:00
Pekka Enberg
503f6c6755 release: prepare for 1.3.rc2 2016-07-28 10:57:11 +03:00
Tomasz Grabiec
7d73599acd tests: lsa_async_eviction_test: Use chunked_fifo<>
To protect against large reallocations during push() which are done
under reclaim lock and may fail.
2016-07-28 09:43:51 +02:00
Piotr Jastrzebski
bf27379583 Add tests for wide partition handling in cache.
They shouldn't be cached.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
(cherry picked from commit 7d29cdf81f)
2016-07-27 14:09:45 +03:00
Piotr Jastrzebski
02cf5a517a Add collectd counter for uncached wide partitions.
Keep track of every read of wide partition that's
not cached.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
(cherry picked from commit 37a7d49676)
2016-07-27 14:09:40 +03:00
Piotr Jastrzebski
ec3d59bf13 Add flag to configure
max size of a cached partition.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
(cherry picked from commit 636a4acfd0)
2016-07-27 14:09:34 +03:00
Piotr Jastrzebski
30c72ef3b4 Try to read whole streamed_mutation up to limit
If limit is exceeded then return the streamed_mutation
and don't cache it.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
(cherry picked from commit 98c12dc2e2)
2016-07-27 14:09:29 +03:00
Piotr Jastrzebski
15e69a32ba Implement mutation_from_streamed_mutation_with_limit
If the mutation is bigger than this limit,
it won't be read and mutation_from_streamed_mutation
will return an empty optional.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
(cherry picked from commit 0d39bb1ad0)
2016-07-27 14:09:23 +03:00
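The size-limited read described in the commit above can be sketched as follows. This is a hypothetical model in which a streamed mutation is a list of string fragments; `collect_with_limit` is an illustrative name, and std::optional (C++17) stands in for the std::experimental::optional used at the time.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model: the "mutation" is the concatenation of all fragments.
// Returns an empty optional as soon as the accumulated size would exceed
// the limit, so an oversized partition is never fully materialized.
inline std::optional<std::string>
collect_with_limit(const std::vector<std::string>& fragments, std::size_t limit) {
    std::string whole;
    for (const auto& fragment : fragments) {
        if (whole.size() + fragment.size() > limit) {
            return std::nullopt;  // too big: caller should not cache it
        }
        whole += fragment;
    }
    return whole;
}
```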
Paweł Dziepak
4e43cb84ff tests/sstables: test reading sstable with duplicated range tombstones
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
(cherry picked from commit b405ff8ad2)
2016-07-27 14:09:02 +03:00
Paweł Dziepak
07d5e939be sstables: avoid recursion in sstable_streamed_mutation::read_next()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
(cherry picked from commit 04f2c278c2)
2016-07-27 14:06:03 +03:00
Paweł Dziepak
a2a5a22504 sstables: protect against duplicated range tombstones
Promoted index may cause sstable to have range tombstones duplicated
several times. These duplicates appear in the "wrong" place since they
are smaller than the entity preceding them.

This patch ignores such duplicates by skipping range tombstones that are
smaller than previously read ones.

Moreover, these duplicated range tombstones may appear in the middle of
a clustering row, so the sstable reader has also gained the ability to
merge parts of the row in such cases.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
(cherry picked from commit 08032db269)
2016-07-27 14:05:58 +03:00
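The skipping rule from the range tombstone commit above can be sketched as follows. Positions are modelled as plain integers for illustration; real range tombstones compare by clustering position, and `drop_out_of_order` is a hypothetical name.

```cpp
#include <vector>

// Keeps only entries that are not smaller than the last entry kept so far.
// An entry that is smaller than its predecessor is treated as a duplicate
// that was emitted in the "wrong" place, and is skipped.
inline std::vector<int> drop_out_of_order(const std::vector<int>& positions) {
    std::vector<int> kept;
    for (int p : positions) {
        if (!kept.empty() && p < kept.back()) {
            continue;  // smaller than what was already read: ignore
        }
        kept.push_back(p);
    }
    return kept;
}
```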
Paweł Dziepak
a39bec0e24 tests: extract streamed_mutation assertions
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
(cherry picked from commit 50469e5ef3)
2016-07-27 14:05:43 +03:00
Duarte Nunes
f0af5719d5 thrift: Preserve partition order when accumulating
This patch changes the column_visitor so that it preserves the order
of the partitions it visits when building the accumulation result.

This is required by verbs such as get_range_slice, on top of which
users can implement paging. In such cases, the last key returned by
the query will be the start of the range for the next query. If
that key is not actually the last in the partitioner's order, then
the new request will likely result in duplicate values being sent.

Ref #693

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1469568135-19644-1-git-send-email-duarte@scylladb.com>
(cherry picked from commit 5aaf43d1bc)
2016-07-27 12:11:41 +03:00
Avi Kivity
0523000af5 size_estimates_recorder: unwrap ranges before searching for sstables
column_family::select_sstables() requires unwrapped ranges, so unwrap
them.  Fixes crash with Leveled Compaction Strategy.

Fixes #1507.

Reviewed-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1469563488-14869-1-git-send-email-avi@scylladb.com>
(cherry picked from commit 64d0cf58ea)
2016-07-27 10:07:13 +03:00
Paweł Dziepak
69a0e6e002 sstables: fix skipping partitions with no rows
If a partition contains no static or clustering rows and no range
tombstones, mp_row_consumer will return a disengaged mutation_fragment_opt
with the is_mutation_end flag set to mark the end of this partition.

Currently, the mutation_reader::impl code incorrectly treats a disengaged
mutation fragment as the end of the stream of all mutations. This patch
fixes that by using the is_mutation_end flag to determine whether the end
of the partition or the end of the stream was reached.

Fixes #1503.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1469525449-15525-1-git-send-email-pdziepak@scylladb.com>
(cherry picked from commit efa690ce8c)
2016-07-26 13:10:31 +03:00
Amos Kong
58d4de295c scylla-housekeeping: fix typo of script path
I tried to start scylla-housekeeping service by:
 # sudo systemctl restart scylla-housekeeping.service

But it failed because of a wrong script path. Error details:
 systemd[5605]: Failed at step EXEC spawning
 /usr/lib/scylla/scylla-Housekeeping: No such file or directory

The right script name is 'scylla-housekeeping'

Signed-off-by: Amos Kong <amos@scylladb.com>
Message-Id: <c11319a3c7d3f22f613f5f6708699be0aa6bd740.1469506477.git.amos@scylladb.com>
(cherry picked from commit 64530e9686)
2016-07-26 09:19:15 +03:00
Vlad Zolotarov
026061733f tracing: set a default TTL for system_traces tables when they are created
Fixes #1482

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1469104164-4452-1-git-send-email-vladz@cloudius-systems.com>
(cherry picked from commit 4647ad9d8a)
2016-07-25 13:50:43 +03:00
Vlad Zolotarov
1d7ed190f8 SELECT tracing instrumentation: improve inter-nodes communication stages messages
Add/fix "sending to"/"received from" messages.

With this patch, the single-key select trace with data on an external node
looks as follows:

Tracing session: 65dbfcc0-4f51-11e6-8dd2-000000000001

 activity                                                                                                                        | timestamp                  | source    | source_elapsed
---------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------
                                                                                                              Execute CQL3 query | 2016-07-21 17:42:50.124000 | 127.0.0.2 |              0
                                                                                                   Parsing a statement [shard 1] | 2016-07-21 17:42:50.124127 | 127.0.0.2 |             --
                                                                                                Processing a statement [shard 1] | 2016-07-21 17:42:50.124190 | 127.0.0.2 |             64
 Creating read executor for token 2309717968349690594 with all: {127.0.0.1} targets: {127.0.0.1} repair decision: NONE [shard 1] | 2016-07-21 17:42:50.124229 | 127.0.0.2 |            103
                                                                            read_data: sending a message to /127.0.0.1 [shard 1] | 2016-07-21 17:42:50.124234 | 127.0.0.2 |            108
                                                                           read_data: message received from /127.0.0.2 [shard 1] | 2016-07-21 17:42:50.124358 | 127.0.0.1 |             14
                                                          read_data handling is done, sending a response to /127.0.0.2 [shard 1] | 2016-07-21 17:42:50.124434 | 127.0.0.1 |             89
                                                                               read_data: got response from /127.0.0.1 [shard 1] | 2016-07-21 17:42:50.124662 | 127.0.0.2 |            536
                                                                                  Done processing - preparing a result [shard 1] | 2016-07-21 17:42:50.124695 | 127.0.0.2 |            569
                                                                                                                Request complete | 2016-07-21 17:42:50.124580 | 127.0.0.2 |            580

Fixes #1481

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1469112271-22818-1-git-send-email-vladz@cloudius-systems.com>
(cherry picked from commit 57b58cad8e)
2016-07-25 13:50:39 +03:00
Raphael S. Carvalho
2d66a4621a compaction: do not convert timestamp resolution to uppercase
C* only allows timestamp resolution in uppercase, so we shouldn't
be forgiving about it, otherwise migration to C* will not work.
Timestamp resolution is stored in compaction strategy options of
schema BTW.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <d64878fc9bbcf40fd8de3d0f08cce9f6c2fde717.1469133851.git.raphaelsc@scylladb.com>
(cherry picked from commit c4f34f5038)
2016-07-25 13:47:23 +03:00
Duarte Nunes
aaa9b5ace8 system_keyspace: Add query_size_estimates() function
The query_size_estimates() function queries the size_estimates system
table for a given keyspace and table, filtering out the token ranges
according to the specified tokens.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
(cherry picked from commit ecfa04da77)
2016-07-25 13:43:16 +03:00
Duarte Nunes
8d491e9879 size_estimates_recorder: Fix stop()
This patch fixes stop() by checking the current CPU instead of
whether the service is active (which it won't be at the time stop() is
called).

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
(cherry picked from commit d984cc30bf)
2016-07-25 13:43:08 +03:00
Duarte Nunes
b63c9fb84b system_keyspace: Avoid pointers in range_estimates
This patch makes range_estimates a proper struct, where tokens are
represented as dht::tokens rather than dht::ring_position*.

We also pass other arguments to update_ and clear_size_estimates by
copy, since one will already be required.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
(cherry picked from commit e16f3f2969)
2016-07-25 13:42:53 +03:00
Duarte Nunes
b229f03198 thrift: Fail when creating mixed CF
This patch ensures we fail when creating a mixed column family, either
when adding columns to a dynamic CF through updated_column_family() or
when adding a dynamic column upon insertion.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1469378658-19853-1-git-send-email-duarte@scylladb.com>
(cherry picked from commit 5c4a2044d5)
2016-07-25 13:42:05 +03:00
Duarte Nunes
6caa59560b thrift: Correctly translate no_such_column_family
The no_such_column_family exception is translated to
InvalidRequestException instead of to NotFoundException.

8991d35231 exposed this problem.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1469376674-14603-1-git-send-email-duarte@scylladb.com>
(cherry picked from commit 560cc12fd7)
2016-07-25 13:41:58 +03:00
Duarte Nunes
79196af9fb thrift: Implement describe_splits verb
This patch implements the describe_splits verb on top of
describe_splits_ex.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
(cherry picked from commit ab08561b89)
2016-07-25 13:41:54 +03:00
Duarte Nunes
afe09da858 thrift: Implement describe_splits_ex verb
This patch implements the describe_splits_ex verb by querying the
size_estimates system table for all the estimates in the specified
token range.

If the keys_per_split argument is bigger than the
estimated partitions count, then we merge ranges until keys_per_split
is met. Note that since the tokens can't be split any further,
keys_per_split might be less than the reported number of keys in one
or more ranges.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
(cherry picked from commit 472c23d7d2)
2016-07-25 13:41:46 +03:00
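The merging step described in the describe_splits_ex commit above can be sketched as follows. This is an illustrative model only: `range_estimate` and `merge_to_splits` are hypothetical names, tokens are plain integers, and the estimates are assumed to be adjacent and already in token order.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical model of one row of the size_estimates table.
struct range_estimate {
    std::int64_t start;
    std::int64_t end;
    std::uint64_t keys;
};

// Merges consecutive ranges until each split holds at least keys_per_split
// keys. Ranges are never split, so a single range with more keys than
// keys_per_split is returned unchanged.
inline std::vector<range_estimate>
merge_to_splits(const std::vector<range_estimate>& estimates, std::uint64_t keys_per_split) {
    std::vector<range_estimate> splits;
    for (const auto& e : estimates) {
        if (!splits.empty() && splits.back().keys < keys_per_split) {
            splits.back().end = e.end;     // extend the current split
            splits.back().keys += e.keys;
        } else {
            splits.push_back(e);           // start a new split
        }
    }
    return splits;
}
```

In this sketch the last split may end up with fewer keys than requested, because there is nothing left to merge it with.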
Duarte Nunes
d6cb41ff24 thrift: Handle and convert invalid_request_exception
This patch converts an exceptions::invalid_request_exception
into a Thrift InvalidRequestException instead of into a generic one.

This makes TitanDB work correctly, which expects an
InvalidRequestException when setting a non-existent keyspace.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1469362086-1013-1-git-send-email-duarte@scylladb.com>
(cherry picked from commit 2be45c4806)
2016-07-24 16:46:18 +03:00
Duarte Nunes
6bf77c7b49 thrift: Use database::find_schema directly
This patch changes lookup_schema() so it directly calls
database::find_schema() instead of going through
database::find_column_family(). It also drops conversion of the
no_such_column_family exception, as that is already handled at a higher
layer.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
(cherry picked from commit 8991d35231)
2016-07-24 16:46:05 +03:00
Duarte Nunes
6d34b4dab7 thrift: Remove hardcoded version constant
...and use the one in thrift_server.hh instead.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
(cherry picked from commit 038d42c589)
2016-07-24 16:45:46 +03:00
Duarte Nunes
d367f1e9ab thrift: Remove unused with_cob_dereference function
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
(cherry picked from commit 8bb43d09b1)
2016-07-24 16:45:22 +03:00
Avi Kivity
75a36ae453 bloom_filter: fix overflow for large filters
We use ::abs(), which has an int parameter, on long arguments, resulting
in incorrect results.

Switch to std::abs() instead, which has the correct overloads.

Fixes #1494.

Message-Id: <1469347802-28933-1-git-send-email-avi@scylladb.com>
(cherry picked from commit 900639915d)
2016-07-24 11:32:28 +03:00
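The overflow described in the bloom_filter commit above can be sketched as follows. Whether a plain call to `::abs()` narrows depends on which headers and overloads are visible, so this sketch models the narrowing with an explicit cast instead; `abs_via_int` and `abs_via_overload` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <limits>

// Models calling an abs() that takes an int: the 64-bit argument is
// narrowed to 32 bits first, so the high bits are silently discarded.
inline std::int64_t abs_via_int(std::int64_t v) {
    auto narrowed = static_cast<std::int32_t>(v);
    return std::abs(narrowed);
}

// std::abs() has overloads for long and long long, so nothing is narrowed.
inline std::int64_t abs_via_overload(std::int64_t v) {
    assert(v != std::numeric_limits<std::int64_t>::min());  // |min| is not representable
    return std::abs(v);
}
```

For a value such as -(2^32 + 5), the narrowed version yields 5, while the overloaded version yields the full magnitude.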
Tomasz Grabiec
35c1781913 schema_tables: Fix hang during keyspace drop
Fixes #1484.

We drop tables as part of keyspace drop. Table drop starts with
creating a snapshot on all shards. All shards must use the same
snapshot timestamp which, among other things, is part of the snapshot
name. The timestamp is generated using supplied timestamp generating
function (joinpoint object). The joinpoint object will wait for all
shards to arrive and then generate and return the timestamp.

However, we drop tables in parallel, using the same joinpoint
instance. So joinpoint may be contacted by snapshotting shards of
tables A and B concurrently, generating timestamp t1 for some shards
of table A and some shards of table B. Later the remaining shards of
table A will get a different timestamp. As a result, different shards
may use different snapshot names for the same table. The snapshot
creation will never complete because the sealing fiber waits for all
shards to signal it, on the same name.

The fix is to give each table a separate joinpoint instance.

Message-Id: <1469117228-17879-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 5e8f0efc85)
2016-07-22 15:36:45 +02:00
Vlad Zolotarov
1489b28ffd cql_server::connection::process_prepare(): don't std::move() a shared_ptr captured by reference in value_of() lambda
A seastar::value_of() lambda used in a trace point was doing the unthinkable:
it called std::move() on a value captured by reference. Not only did it compile(!!!),
but it also actually std::move()ed the shared_ptr before it was used in a make_result(),
which naturally caused a SIGSEGV crash.

Fixes #1491

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1469193763-27631-1-git-send-email-vladz@cloudius-systems.com>
(cherry picked from commit 9423c13419)
2016-07-22 16:33:17 +03:00
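The bug described in the process_prepare() commit above can be sketched as follows. A plain lambda stands in for the lazily evaluated seastar::value_of() argument, and `trace_arg_buggy` and `trace_arg_fixed` are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Buggy: the lambda captures `stmt` by reference and then moves from it,
// which empties the caller's pointer as a side effect of "just tracing".
inline std::string trace_arg_buggy(std::shared_ptr<std::string>& stmt) {
    auto lazy = [&stmt] {
        auto stolen = std::move(stmt);  // compiles, but steals the caller's object
        return *stolen;
    };
    return lazy();
}

// Fixed: the lambda only reads through the reference.
inline std::string trace_arg_fixed(const std::shared_ptr<std::string>& stmt) {
    assert(stmt);
    auto lazy = [&stmt] { return *stmt; };
    return lazy();
}
```

After the buggy version runs, any later dereference of the caller's pointer is a null dereference, which matches the crash the commit describes.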
Avi Kivity
f975653c94 Update seastar submodule to point at scylla-seastar 2016-07-21 12:31:09 +03:00
Duarte Nunes
96f5cbb604 thrift: Omit regular columns for dynamic CFs
This patch skips adding the auto-generated regular column when
describing a dynamic Column family for the describe_keyspace(s) verbs.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1469091720-10113-1-git-send-email-duarte@scylladb.com>
(cherry picked from commit a436cf945c)
2016-07-21 12:06:29 +03:00
Raphael S. Carvalho
66ebef7d10 tests: add new test for date tiered strategy
This test sets the time window to 1 hour and checks that the strategy
works accordingly.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit cf54af9e58)
2016-07-21 12:00:26 +03:00
Raphael S. Carvalho
789fb0db97 compaction: implement date tiered compaction strategy options
Now date tiered compaction strategy will take into account the
strategy options which are defined in the schema.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit eaa6e281a2)
2016-07-21 12:00:18 +03:00
Pekka Enberg
af7c0f6433 Revert "Merge seastar upstream"
This reverts commit aaf6786997.

We should backport the iotune fixes for 1.3 and not pull everything.
2016-07-21 11:19:50 +03:00
Pekka Enberg
aaf6786997 Merge seastar upstream
* seastar 103543a...9d1db3f (8):
  > reactor: limit task backlog
  > iotune: Fix SIGFPE with some executions
  > Merge "Preparation for protobuf" from Amnon
  > byteorder: add missing cpu_to_be(), be_to_cpu() functions
  > rpc: fix gcc-7 compilation error
  > reactor: Register the smp metrics disabled
  > scollectd: Allow creating metric that is disabled
  > Merge "Propagate timeout to a server" from Gleb
2016-07-21 11:04:31 +03:00
Pekka Enberg
e8cb163cdf db/config: Start Thrift server by default
We have Thrift support now so start the server by default.
Message-Id: <1469002000-26767-1-git-send-email-penberg@scylladb.com>

(cherry picked from commit aff8cf319d)
2016-07-20 11:29:24 +03:00
Duarte Nunes
2d7c322805 thrift: Actually concatenate strings
This patch fixes concatenating a char[] with an int by using sprint
instead of just increasing the pointer.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1468971542-9600-1-git-send-email-duarte@scylladb.com>
(cherry picked from commit 64dff69077)
2016-07-20 11:09:15 +03:00
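The pointer-arithmetic mistake fixed by the thrift commit above can be sketched as follows. The commit uses sprint; this sketch substitutes std::to_string so that it needs only the standard library, and `name_buggy` and `name_fixed` are hypothetical names.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Buggy: adding an int to a char pointer advances the pointer by n
// characters instead of appending the number.
inline std::string name_buggy(const char* prefix, int n) {
    assert(n >= 0 && static_cast<std::size_t>(n) <= std::strlen(prefix));
    return std::string(prefix + n);
}

// Fixed: format the number and concatenate the two strings.
inline std::string name_fixed(const char* prefix, int n) {
    return std::string(prefix) + std::to_string(n);
}
```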
Tomasz Grabiec
13f18c6445 database: Add table name to log message about sealing
Message-Id: <1468917744-2539-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 0d26294fac)
2016-07-20 10:13:32 +03:00
Tomasz Grabiec
9c430c2cff schema_tables: Add more logging
Message-Id: <1468917771-2592-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit a0832f08d2)
2016-07-20 10:13:28 +03:00
Pekka Enberg
c84e030fe9 release: prepare for 1.3.rc1 2016-07-19 20:15:26 +03:00
Avi Kivity
dc50b845b4 Merge seastar upstream
* seastar 823bc05...103543a (1):
  > core: add a seastar::format()
2016-07-19 19:09:42 +03:00
Avi Kivity
1a1b7fe3f2 Merge "CQL Tracing patch bomb" from Vlad
"This series includes the following:
   - Introduction of a formatted message support in trace().
   - Major rename: s/flush_/write_/, s/flush()/kick()/, s/store_/write_/.
   - Some cosmetic fixes found on the way.
   - Fix a bug in a shutdown flow.
   - Instrumentation to MUTATE, PREPARE, EXECUTE and BATCH flow and some
     related changes.
   - A patch that aligns the QUERY tracing format with the Origin.
   - Methods and functions description in tracing/trace_state.hh."
2016-07-19 18:46:59 +03:00
Vlad Zolotarov
7c590295ef SELECT instrumentation: add a nice trace point
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:59 +03:00
Vlad Zolotarov
a197323b47 tracing::trace_state.hh: Add descriptions for main methods and functions
Add a proper description to a tracing::trace() that clarifies
that the tracing message string and the positional parameters
are going to be copied if tracing state is initialized.

Add a description for trace_state::begin() methods and for a
tracing::begin() helper function.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:59 +03:00
Vlad Zolotarov
b36b69c1d6 service::storage_proxy: remove a default value for a tracing::trace_state_ptr parameter
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:59 +03:00
Vlad Zolotarov
baa6496816 service::storage_proxy: READ instrumentation: store trace state object in abstract_read_executor
Having a trace_state_ptr at the storage_proxy level is needed to trace code bits at this level.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:59 +03:00
Vlad Zolotarov
b0a39f210d transport: CQL tracing: QUERY instrumentation: align the session creation parameters with origin
   - Don't put the query name as a 'request' but rather save it as one of the entries in a
     'params' map.
   - Save some additional query parameters in 'params'.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:58 +03:00
Vlad Zolotarov
962bddf8fe transport: CQL tracing: instrument a BATCH command
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:58 +03:00
Vlad Zolotarov
d21eaabcfe transport: CQL tracing: instrument EXECUTE command
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:58 +03:00
Vlad Zolotarov
89a49c346c tracing::trace_state: add begin() overload for seastar::value_of given as a "request" parameter.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:58 +03:00
Vlad Zolotarov
1f9b858d83 cql3: prepared_statement: add raw_cql_statement field
This field will contain an original statement given to a PREPARE
command.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:58 +03:00
Vlad Zolotarov
147dd72517 transport: CQL tracing: instrument a PREPARE command
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:58 +03:00
Vlad Zolotarov
be88074f47 service::query_state: get rid of begin_tracing()
Use tracing::begin() directly.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:58 +03:00
Vlad Zolotarov
982d301178 service::client_state: add a const version of get_trace_state()
tracing::begin() requires a non-const version, tracing::trace()
requires a const version.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:58 +03:00
Vlad Zolotarov
da56aa4256 service::client_state: rename: trace_state_ptr() -> get_trace_state()
Rename the method for consistency with other classes methods returning
the same value.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:58 +03:00
Vlad Zolotarov
4c16df9e4c service: instrument MUTATE flow with tracing
Store the trace state in the abstract_write_response_handler.
Instrument send_mutation RPC to receive an additional
rpc::optional parameter that will contain optional<trace_info>
value.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:58 +03:00
Vlad Zolotarov
54a758dfff cql3::select_statement: simplify the tracing code by using a tracing::make_trace_info() helper
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:58 +03:00
Vlad Zolotarov
952dc8a3d4 query_state: add get_trace_state() method
Adding this method allows using the tracing helper functions
and removing the no longer needed accessors in the query_state.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:58 +03:00
Vlad Zolotarov
0552ffcd17 service/storage_proxy: tracing: adjust the existing SELECT instrumentation with the new trace() interface
From now on trace_state::trace() is able to receive the sprint-ready
format string with the arguments that will be applied only during
the flush event.

This patch also optimizes the way the source address is evaluated -
do it only once instead of twice if tracing is requested.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:58 +03:00
Vlad Zolotarov
c1bb4d147d query::read_command: std::move() std::experimental::optional when initializing trace_info
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:58 +03:00
Vlad Zolotarov
0689843e79 tracing::trace_state: add method to set the session's "params" map entries
Sometimes we want to be able to set "params" map after we
started a tracing session, e.g. when the parameter values,
like a consistency level parsed from the "options" part of a binary frame,
are available only after some heavy part of a flow we would like
to trace.

This patch includes the following changes:

   - No longer pass a map to the begin().
   - Limit the parameters to the known set.
   - Define a method to set each such parameter and save its
     value till the final sstring->sstring map is created.
   - Construct the final sstring->sstring map in the destructor of the trace_state
     object in order to defer all the formatting to be after the traced flow.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:58 +03:00
Vlad Zolotarov
9c0a725c56 tracing: add a _local_tracing to an i_tracing_backend_helper
A backend helper has to constantly communicate with the corresponding
tracing::tracing instance. Saving a reference to the tracing::tracing instance
saves us a lot of tracing::get_local_tracing_instance() calls and thus
a lot of dereferencing.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:58 +03:00
Vlad Zolotarov
2bb054748e tracing: record events' time stamps
   - Extend the i_tracing_backend_helper interface to accept the event
     record timestamp.
   - Grab the current timestamp when the event record is taken.
   - Add the instrumentation to the trace_keyspace_helper to create a unique time-UUID
     from a given std::chrono::duration object.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:58 +03:00
Vlad Zolotarov
f64f27beb9 utils: add get_time_UUID(system_clock::time_point)
Creates a type 1 UUID (time-based UUID) with the given system_clock::time_point

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:58 +03:00
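A type 1 UUID carries its timestamp as a count of 100-nanosecond intervals since 1582-10-15. The sketch below shows only that conversion step, assuming system_clock counts from the Unix epoch (guaranteed since C++20). The name uuid_v1_timestamp is hypothetical and is not the function added by the commit above.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: number of 100 ns intervals between the UUID epoch
// (1582-10-15 00:00:00 UTC) and the given time point.
inline std::int64_t uuid_v1_timestamp(std::chrono::system_clock::time_point tp) {
    using namespace std::chrono;
    // 100 ns intervals between 1582-10-15 and 1970-01-01.
    constexpr std::int64_t unix_epoch_offset = 0x01B21DD213814000LL;
    using ticks_100ns = duration<std::int64_t, std::ratio<1, 10000000>>;
    return duration_cast<ticks_100ns>(tp.time_since_epoch()).count() + unix_epoch_offset;
}
```

A full time-UUID additionally needs the version bits, a clock sequence and a node identifier, which this sketch leaves out.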
Vlad Zolotarov
06d4221382 tracing: add tracing::make_trace_info() helper
This helper returns an std::experimental::optional<trace_info>
which is initialized or not initialized depending on whether
a given trace_state_ptr is initialized or not.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:57 +03:00
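The idea behind such a helper can be sketched as follows. This is an illustration only: the trace_state and trace_info structs are simplified stand-ins with a made-up session_id field, and std::optional (C++17) is used in place of std::experimental::optional.

```cpp
#include <cassert>
#include <memory>
#include <optional>

// Simplified stand-ins; the real types carry much more state.
struct trace_state { int session_id; };
struct trace_info { int session_id; };
using trace_state_ptr = std::shared_ptr<trace_state>;

// Returns an engaged optional only when the pointer is initialized.
inline std::optional<trace_info> make_trace_info(const trace_state_ptr& state) {
    if (state) {
        return trace_info{state->session_id};
    }
    return std::nullopt;
}
```

Callers can then forward the optional unconditionally instead of branching on whether tracing is enabled.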
Vlad Zolotarov
7a5fc9fcdc tracing::trace_state: add const qualifiers to a trace_state_ptr parameter
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:57 +03:00
Vlad Zolotarov
b0673aabd5 tracing: fix a logger name
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:57 +03:00
Vlad Zolotarov
da4836becc tracing::trace_state: add support for a formatted message in trace()
Add support for passing a format string plus positional parameters
for creation of a trace point message.

Format string should be given in a fmt library native format described
here: http://fmtlib.net/latest/syntax.html#syntax .

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:57 +03:00
Vlad Zolotarov
ee0e986e96 tracing: make the service shutdown stages stricter
kick() backend during shutdown and restrict accessing a backend
after that.

Flush pending records when service is being shut down.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:57 +03:00
Vlad Zolotarov
6e38133f82 tracing: prevent a destruction of a tracing::tracing while it's used
Prevent the destruction of tracing::tracing instances while there
are still tracing::trace_state objects that are using it:

   - Make tracing::tracing inherit from seastar::async_sharded_service<tracing::tracing>.
   - Grab a tracing::tracing.shared_from_this() in each
     tracing::trace_state object using it.
   - Use a saved pointer to the local tracing::tracing instance in a destructor
     instead of accessing it via tracing::get_local_tracing_instance()
     to avoid "local is not initialized" assert when sessions are
     being destroyed after the service was stopped.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:57 +03:00
Vlad Zolotarov
a5022a09a4 tracing: use 'write' instead of 'flush' and 'store' for consistency with seastar's API
In names of functions and variables:
s/flush_/write_/
s/store_/write_/

In i_tracing_backend_helper:
s/flush()/kick()/

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:57 +03:00
Pekka Enberg
cbf5283a93 Merge "Populate size_estimates table" from Duarte
"This patchset implements the size_estimates_recorder, which periodically
writes estimations for all the non-system column families in the
size_estimates system table. This table is updated per schema with a set
of token ranges and the associated estimations of how many partitions
there are and their mean size.

Fixes #352"
2016-07-19 14:31:12 +03:00
Duarte Nunes
9ffdf4a5cd db: Implement size_estimates_recorder
This patch implements the size_estimates_recorder, which periodically
writes estimations for all the non-system column families in the
size_estimates system table. The size_estimates_recorder class
corresponds to the one in Cassandra's SizeEstimatesRecorder.java.

Estimation is carried out by shard 0. Since we're estimating based on
data in shared sstables, having multiple shards doing this would skew
the results.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-19 09:44:58 +00:00
Avi Kivity
86661db178 Merge seastar upstream
* seastar ceb0c94...823bc05 (1):
  > Revert "util::lazy_eval: add an implicit cast operator overload"
2016-07-19 12:02:44 +03:00
Duarte Nunes
f8f61cf246 system_keyspace: Record and clear size estimates
This patch implements functions that allow the size_estimates system
table to be updated and cleared. The size_estimates table is updated
per schema with a set of token ranges and the associated estimations
of how many partitions there are and their mean size.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-18 23:58:31 +00:00
Duarte Nunes
3518db531e database: Get non-system column_families
This patch adds a utility function that allows fetching the set of
column_families that do not belong to the system keyspace.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-18 23:58:31 +00:00
Duarte Nunes
4bc00c2055 database: Expose selection of sstables by a range
This patch allows a set of a column_family's sstables to be
selected according to a range of ring_positions.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-18 23:58:31 +00:00
Duarte Nunes
d7ae25c572 range: Make transform template arguments deducible
This patch makes it so that the template arguments of
range<T>::transform are more easily deducible by the compiler.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-18 23:58:31 +00:00
Duarte Nunes
3c05ea2f80 types: Add to_bytes_view for sstrings
This patch adds an overload of to_bytes_view for sstrings

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-18 23:58:31 +00:00
Tomasz Grabiec
ce768858f5 types: Fix update_types()
We should replace the old type, not insert the new type before the old type.

Fixes #1465

Message-Id: <1468861076-20397-1-git-send-email-tgrabiec@scylladb.com>
2016-07-18 20:14:22 +03:00
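The difference between replacing and inserting can be illustrated with a plain vector. This is a hypothetical sketch, not the update_types() code: type_entry and replace_type are invented names.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Invented stand-in for a user-defined type descriptor.
struct type_entry {
    std::string name;
    int version;
};

// Overwrites the entry with the same name in place; returns false if absent.
inline bool replace_type(std::vector<type_entry>& types, const type_entry& updated) {
    auto it = std::find_if(types.begin(), types.end(),
                           [&](const type_entry& t) { return t.name == updated.name; });
    if (it == types.end()) {
        return false;
    }
    // Assigning through the iterator replaces the old entry.
    // types.insert(it, updated) would instead leave the stale entry behind it.
    *it = updated;
    return true;
}
```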
Avi Kivity
f886f7a2f5 Merge seastar upstream
* seastar a45823a...ceb0c94 (2):
  > print: switch to fmtlib
  > logging: simplify stringer array building
2016-07-18 19:37:34 +03:00
Avi Kivity
d261927fa3 logalloc: change sprint() of a pointer to use void* explicitly
Otherwise, fmtlib dislikes it.
2016-07-18 19:37:16 +03:00
Avi Kivity
1d1b03a7cb cql3: change sprint() of a pointer to use void* explicitly
Otherwise, fmtlib dislikes it.
2016-07-18 19:36:35 +03:00
Raphael S. Carvalho
7b9cf528ad tests: fix occasional failure in date tiered test
That was a bug in the test itself. It could happen that an sstable would
incorrectly belong to the next time window if the current minute is
approaching its end. The fix is to give all sstables that we want in
the same time window the same min/max timestamp.

Fixes #1448.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <ee25d49e7ed12b4cf7d018a08163404c3d122e56.1468782787.git.raphaelsc@scylladb.com>
2016-07-18 15:18:29 +02:00
Paweł Dziepak
4497204b7d streamed_mutation: do not leave mutation in an invalid state
This patch avoids moving entries from range tombstones and clustering
rows sets in streamed_mutation_from_mutation(). Such action breaks these
sets as the entries will be left in some unknown state.

Instead, the sets are being broken in a supported and predictable way
using unlink_leftmost_without_rebalance().

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1468843205-18852-1-git-send-email-pdziepak@scylladb.com>
2016-07-18 15:14:21 +02:00
Avi Kivity
9d1b813f45 Merge seastar upstream
* seastar d699205...a45823a (5):
  > rpc: do not call shutdown function on already closed fd
  > log: Do not crash if logger is invoked from non-reactor thread
  > rpc: remove unaligned_cast and reinterpret_cast uses
  > unaligned: note unaligned_cast<> is deprecated
  > byteorder: add unaligned read/write helpers

Fixes #1463.
2016-07-18 15:24:43 +03:00
Avi Kivity
60491476e3 Merge "thrift: Add authentication and authorization" from Duarte
"This patchset implements the login verb to enable authentication
in the thrift API, and it adds access control to the already
implemented verbs."
2016-07-18 11:32:32 +03:00
Duarte Nunes
b6663f050d thrift: Add authorization for DML verbs
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-17 17:38:23 +00:00
Duarte Nunes
63354320b8 thrift: Add authorization to thrift DDL verbs
This patch adds authorization to the DDL thrift verbs. Since checking
for authorization is asynchronous, we now need to copy the verb
arguments so they can be accessed from the continuations.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-17 17:38:23 +00:00
Duarte Nunes
3c389ba871 client_state: Add has_schema_access function
This function is similar to has_column_family_access, but skips
validating if the specified keyspace and column family names map to a
valid schema, as it already takes one as an argument.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-17 17:38:23 +00:00
Duarte Nunes
dbbf4b3cc2 thrift: Group mutation map by column family
This patch transforms the mutation map, a map of keys to a map of column
families to mutations, into a map of column families to a map of keys
to mutations. This is a more natural organization, as things
like checking access permissions are done by column family.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-17 17:38:23 +00:00
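The regrouping described above amounts to swapping the two levels of a nested map. A minimal sketch with placeholder types (strings for keys, column family names and mutations), which are assumptions and not the thrift handler's actual types:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using mutation = std::string;  // placeholder for a real mutation object
// Used for both layouts: key -> cf -> mutations, and cf -> key -> mutations.
using nested_map = std::map<std::string, std::map<std::string, std::vector<mutation>>>;

// Turns key -> column family -> mutations into column family -> key -> mutations.
inline nested_map group_by_column_family(const nested_map& by_key) {
    nested_map by_cf;
    for (const auto& [key, cfs] : by_key) {
        for (const auto& [cf, muts] : cfs) {
            auto& dst = by_cf[cf][key];
            dst.insert(dst.end(), muts.begin(), muts.end());
        }
    }
    return by_cf;
}
```

With this layout a permission check can run once per outer entry instead of once per key.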
Duarte Nunes
f14628dc49 thrift: Introduce with_schema function
This is a wrapper around with_cob, which fetches a schema and forwards
it to a supplied function.

The patch also removes superfluous return instructions.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-17 17:38:23 +00:00
Duarte Nunes
09a5560b1b thrift: Validate login
This patch validates that a user is correctly logged in (if
authentication is required) for the required verbs.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-17 17:38:23 +00:00
Duarte Nunes
a3e507eb1c thrift: Implement login verb
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-17 17:38:23 +00:00
Amnon Heiman
d096a6762a scylla_setup: Ask whether to start scylla-housekeeping
The scylla-server.service will try to start the scylla-housekeeping.

This patch adds a question to scylla_setup asking whether to enable the version
check. If the answer is no, scylla-housekeeping will be masked.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1468741129-1977-1-git-send-email-amnon@scylladb.com>
2016-07-17 17:57:12 +03:00
Nadav Har'El
c647d917e0 sstables: move to_bytes_view to header file
Move the to_bytes_view(temporary_buffer<char>) function from source file
to header file where it can be used in more places.

This saves one use of reinterpret_cast (which we are now re-evaluating),
and moreover, we want to use this function also in the promoted index
code (to return a bytes_view from the promoted index which was saved as a
temporary_buffer).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1468761437-27046-1-git-send-email-nyh@scylladb.com>
2016-07-17 16:29:26 +03:00
Avi Kivity
b6b35a986a Merge seastar upstream
* seastar 5e97d5f...d699205 (3):
  > rpc: fix race between send loop and expiration timer
  > rpc: fix cancellable type move operations
  > reactor: create new files with a more reasonable default mode
2016-07-17 13:27:23 +03:00
Paweł Dziepak
81e4952c78 row_cache: fix marking last entry as continuous
Range queries need to take special care when transitioning between
ranges that are read from sstables and ranges that are already in the
cache.

Original code in such case just started a secondary reader and told it
to unconditionally mark the last entry as continuous (primary reader has
already returned an element that immediately follows the range that is
going to be read from sstables).

However, that information may get stale. For instance, by the time the
secondary reader finishes reading its range, the element immediately
following it may get evicted from the cache, thus causing the continuity flag
to be incorrectly set.

The solution is to ensure that the element immediately after the range
read from sstables is still in the cache.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1468586893-15266-1-git-send-email-pdziepak@scylladb.com>
2016-07-15 15:15:02 +02:00
Tomasz Grabiec
7328a8eff8 cql: modification_statement: Avoid copying keyspace and table names
Message-Id: <1468574135-4701-1-git-send-email-tgrabiec@scylladb.com>
2016-07-15 10:36:53 +01:00
Duarte Nunes
aaa76d58ba query: Move to_partition_range to dht namespace
This patch moves to_partition_range, from the query namespace
to the dht namespace, where it is a more natural fit.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1468498060-19251-1-git-send-email-duarte@scylladb.com>
2016-07-15 10:41:52 +02:00
Tomasz Grabiec
32937f354e Merge branch 'duarten/thrift/dml/v9' from git@github.com:duarten/scylla.git
From Duarte:

This patchset adds support for the data manipulation verbs. It defers support
for super columns and mixed CFs (a static CF treated as dynamic) to later
patchsets.

Everything is done on top of storage_proxy; it was only necessary to modify the
layers below to add support for different kinds of limits: per partition row
limit, which corresponds to limiting the number of columns returned when
querying a dynamic CF, and limit on the number of partitions returned, so that
we can emulate the one thrift row per key model when querying dynamic CFs.

Ref #399
2016-07-14 18:26:07 +02:00
Duarte Nunes
df1234d86a thrift: Mark static CFs as non-compound
By default, the schema is marked as compound regardless of the
comparator. Since a composite comparator for static CFs is currently
unsupported (otherwise thrift column families would be
indistinguishable from CQL ones), just mark them as non-compound.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-14 18:20:55 +02:00
Duarte Nunes
901d4d1628 thrift: Skip CQL3 column families
This patch prevents CQL3 column families from being returned to
clients or subject to updates from thrift.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-14 18:20:54 +02:00
Duarte Nunes
92adbaab0a thrift: Warn about unimplemented features
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-14 18:20:54 +02:00
Duarte Nunes
a924f14441 thrift: Validate thrift Columns
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-14 18:20:54 +02:00
Duarte Nunes
7c1bf41b0d thrift: Implement truncate verb
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-14 18:20:54 +02:00
Duarte Nunes
4f440217e5 thrift: Implement remove verb
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-14 18:20:54 +02:00
Duarte Nunes
237e3b28d6 thrift: Implement insert verb
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-14 18:20:54 +02:00
Duarte Nunes
5c5056e4f9 thrift: Implement atomic_batch_mutate verb
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-14 18:20:54 +02:00
Duarte Nunes
f237b5ff19 thrift: Implement batch_mutate on top of storage_proxy
So that the specified consistency level can be respected.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-14 18:20:54 +02:00
Duarte Nunes
12dca9fdc9 thrift: Convert thrift Mutation to internal one
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-14 18:20:54 +02:00
Duarte Nunes
822a315dfa thrift: Implement get_multi_slice verb
The get_multi_slice verb is used to perform multiple slices on a
single row key in one operation. It takes a set of column_slices,
which we normalize to not contain any overlapping ranges.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-14 18:20:54 +02:00
Duarte Nunes
9792a77266 range: Add deoverlap function
This patch adds the deoverlap function to range.hh, which takes in a
vector of possibly overlapping ranges and returns a vector of
non-overlapping ranges covering the same values.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-14 18:20:41 +02:00
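The general technique is to sort the ranges by start and merge neighbours that touch. The sketch below uses closed integer intervals as an assumption; the real function in range.hh works on range<T> with inclusive and exclusive bounds and a caller-supplied comparator, which this example does not model.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

using interval = std::pair<int, int>;  // closed interval [first, second]

// Returns sorted, non-overlapping intervals covering the same values.
inline std::vector<interval> deoverlap(std::vector<interval> ranges) {
    std::sort(ranges.begin(), ranges.end());
    std::vector<interval> out;
    for (const auto& r : ranges) {
        if (!out.empty() && r.first <= out.back().second) {
            // Overlaps the previous interval: extend it.
            out.back().second = std::max(out.back().second, r.second);
        } else {
            out.push_back(r);
        }
    }
    return out;
}
```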
Duarte Nunes
c910a4639c thrift: Implement get_paged_slice verb
The get_paged_slice verb is similar to the get_range_slices verb,
except that it doesn't take a SlicePredicate. Instead, it takes a
column from which to start the query.

For dynamic CFs, we use the partition_slice::specific_ranges to single
out the first partition, and query starting from the start_column row.
For static CFs, we issue an initial query to fetch the remainder of
columns from the first partition, and at least one more query to fetch
the subsequent columns until the limit is reached. This implies a
performance penalty for static CFs.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-14 15:36:23 +02:00
Duarte Nunes
370572884c thrift: Implement get_range_slices verb
The get_range_slices verb is similar to the multiget_slice verb,
except that it operates on a range of partition keys (or tokens).

In Origin, empty partitions are returned as part of the KeySlice, for
which the key will be filled in but the columns vector will be empty.
Since in our case we don't return empty partitions, we don't know which
partition keys in the specified range we should return back to the client.
So for now, our behavior differs from Origin.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-14 15:36:23 +02:00
Duarte Nunes
b872db55bd thrift: Implement get_count verb on top of multiget_count
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-14 15:36:23 +02:00
Duarte Nunes
a42b7ba3f7 thrift: Implement multiget_count verb
This patch implements the multiget_count verb in a similar fashion as
multiget_slice, but using an accumulator that counts the returned
columns instead of creating thrift ColumnOrSuperColumn objects.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-14 15:36:23 +02:00
Duarte Nunes
a44561870a thrift: Implement get verb in terms of get_slice
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-14 15:36:23 +02:00
Duarte Nunes
db4c26d5b8 thrift: Implement get_slice in terms of multiget_slice
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-14 15:36:23 +02:00
Duarte Nunes
cd3a12535e thrift: Implement multiget_slice verb
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-14 15:36:23 +02:00
Duarte Nunes
acd39d871f thrift: Validate column names
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-14 15:36:23 +02:00
Duarte Nunes
4e9af0dc8e thrift: Make read_command from SlicePredicate
This patch builds a query::read_command from a SlicePredicate,
for both dynamic and static column families.

For dynamic CFs, restrictions on the clustering columns are added, and
for static CFs, limits and ordering are defined inline by selecting the
correct regular columns.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-14 15:36:23 +02:00
Duarte Nunes
21d0a2c764 query: Optionally send cell ttl
This patch adds support to send a cell's ttl as part of a query's
result. This is needed for thrift support.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-14 15:36:23 +02:00
Duarte Nunes
eb8f5fafb2 thrift: Add partition key validation
This patch validates that the specified partition key is not empty
and is under the size limit.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-14 15:36:23 +02:00
Duarte Nunes
f57136f2f3 thrift: Make key_from_thrift take schema ref
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-14 15:36:23 +02:00
Duarte Nunes
e2b4cc4849 types: Add to_bytes_view function
This patch adds a function that converts a reference to an std::string
to a bytes_view.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-14 15:36:23 +02:00
Duarte Nunes
a647fea30b schema: Add is_dynamic to thrift_schema
This patch adds the is_dynamic() function to thrift_schema, which
tells whether the underlying column family is dynamic or not,
according to thrift rules.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-14 15:36:23 +02:00
Duarte Nunes
527ec2ab59 thrift: Support composite keys
This patch adds support for composite comparators (which, for dynamic
column families, means composite clustering keys) and for composite
keys (composite partition keys).

Support for composite column names and regular columns is deferred,
which will entail making compound_type an abstract_type.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-14 15:36:23 +02:00
Duarte Nunes
7f5ec71b1f thrift: Extract ttl calculation
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-14 15:36:23 +02:00
Duarte Nunes
324b776c1b thrift: Add lookup_schema function
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-14 15:36:23 +02:00
Avi Kivity
4ef2b1b25f Merge seastar upstream
* seastar e660d54...5e97d5f (7):
  > util::lazy_eval: add an implicit cast operator overload
  > rpc: consolidate read_(request|response)_frame logic
  > rpc: handle lz4 compressor errors
  > iotune: provide a status dump if we can't calculate a proper number of io_queues
  > rpc: adjust lz4 compression for older lz4.h
  > Fix chunked_fifo move assignment
  > rpc: add missing header file protectors
2016-07-14 16:27:23 +03:00
Avi Kivity
32d670a792 Merge "Scylla-housekeeping check version" from Amnon
"This series replaces the original scylla-help.py

It contains only a basic script that checks daily for a new version and reports
if a newer version is found.

The script is added as a service and will be started and shutdown with
scylla-server."
2016-07-14 14:58:33 +03:00
Avi Kivity
1048e1071b db: do not create column family directories belonging to foreign keyspaces
Currently, for any column family, we create a directory for it in all
keyspace directories.  This is incredibly awkward.

Fix by iterating over just the keyspace's column families, not all
column families in existence.

Fixes #1457.
Message-Id: <1468495182-18424-1-git-send-email-avi@scylladb.com>
2016-07-14 14:31:05 +03:00
Amnon Heiman
260761f2dd rules.in: Add the scylla-timer to ubuntu
This adds a rule to install the scylla-timer as part of the ubuntu
package.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-07-14 12:46:47 +03:00
Amnon Heiman
3be9ab38e2 ubuntu.in: Add dependency to python3-requests
The check version script uses the python requests package; this adds the
dependency to the ubuntu package.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-07-14 12:46:47 +03:00
Amnon Heiman
3b9db378ac scylla-server.install.in: Pack the scylla-housekeeping on ubuntu
This adds the scylla-housekeeping to the ubuntu packaging.
2016-07-14 12:46:47 +03:00
Amnon Heiman
948140bec3 Adding a timer service for ubuntu scylla-housekeeping
Ubuntu 14.04 upstart does not support timers for recurrent operations.
The upstart cookbook suggests a way to mimic this functionality here:
http://upstart.ubuntu.com/cookbook/#run-a-job-periodically

This patch adds a service that runs the housekeeping daily.

Setting it up as a service ensures that it starts and stops with the
scylla-server service.
2016-07-14 12:46:39 +03:00
Avi Kivity
23edc1861a db: estimate queued read size more conservatively
There are plenty of continuations involved, so don't assume it fits in 1k.
Message-Id: <1468429516-4591-1-git-send-email-avi@scylladb.com>
2016-07-14 11:42:24 +02:00
Avi Kivity
d3c87975b0 db: don't over-allocate memory for mutation_reader
column_family::make_reader() doesn't deal with sstables directly, so it
doesn't need to reserve memory for them.

Fixes #1453.
Message-Id: <1468429143-4354-1-git-send-email-avi@scylladb.com>
2016-07-14 10:01:42 +02:00
Paweł Dziepak
10c144ffd4 types: fix type aliasing violation
Any pointer can be cast to char*, but not the other way around. This
causes GCC6 to misoptimize timestamp_type_impl::from_string().

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1468413349-27267-1-git-send-email-pdziepak@scylladb.com>
2016-07-13 17:22:16 +03:00
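A common way to avoid this class of aliasing violation is to copy the bytes with std::memcpy rather than dereferencing a casted pointer. The helper below is a hypothetical illustration of that technique (read_int64 is an invented name) and does not claim to be the change made in this commit.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reads an int64_t from a char buffer without violating strict aliasing.
// Dereferencing reinterpret_cast<const std::int64_t*>(buf) would be undefined
// behaviour when buf does not actually point to an int64_t object.
inline std::int64_t read_int64(const char* buf) {
    std::int64_t value;
    std::memcpy(&value, buf, sizeof(value));
    return value;
}
```

Compilers typically turn the fixed-size memcpy into a single load, so the safe form usually costs nothing at runtime.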
Tomasz Grabiec
c97871d95c migration_manager: Uncomment logging for keyspace drop
Message-Id: <1468413673-6899-1-git-send-email-tgrabiec@scylladb.com>
2016-07-13 13:42:23 +01:00
Gleb Natapov
9cc076c9f3 storage_proxy: preserve endpoint's order while filtering local nodes for query
filter_for_query() gets a list of endpoints sorted by preference and
should preserve that order after filtering out non-local endpoints for a
local query. partition() does not guarantee this while stable_partition()
does, so use it instead.

Fixes #1450.
Message-Id: <20160713100909.GM10767@scylladb.com>
2016-07-13 13:17:28 +03:00
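The ordering guarantee can be seen in a small standalone example. The endpoint struct and filter_local function are invented for illustration and are not the storage_proxy code.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Invented stand-in for an endpoint with a locality flag.
struct endpoint {
    std::string addr;
    bool local;
};

// Keeps only local endpoints, preserving their relative (preference) order.
inline std::vector<endpoint> filter_local(std::vector<endpoint> endpoints) {
    // std::stable_partition keeps the relative order within each group;
    // std::partition makes no such promise.
    auto first_remote = std::stable_partition(
        endpoints.begin(), endpoints.end(),
        [](const endpoint& e) { return e.local; });
    endpoints.erase(first_remote, endpoints.end());
    return endpoints;
}
```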
Tomasz Grabiec
7227c537ce Merge branch 'pdziepak/streamed-mutations-hashing/v5' from seastar-dev.git
From Paweł:

This is another episode in the "convert X to streamed mutations" series.
Hashing mutations (mainly for repair) is converted so that it doesn't
need to rebuild whole mutation.

The first part of the series changes the way streamed mutations deal
with range tombstones. Since it is not necessary to make sure we write
disjoint tombstones to sstables there is no need anymore for streamed
mutations to produce disjoint tombstones and, consequently, no need for
range tombstones to be split into range_tombstone_begin and
range_tombstone_end.

The second part is the actual hashing implementation. However, to ensure
that the hash depends only on the contents of the mutation and not the
way it is stored in different data sources, range tombstones have to be
made disjoint before they are hashed.

This series also ensures that any changes caused by streamed mutations
to hashing and streaming do not break repair during upgrade.
2016-07-13 11:24:00 +02:00
Duarte Nunes
674afc52bc compound_test: Test singular composite_view::explode()
This patch adds a test case for composite_view::explode() called on a
non-compound composite.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1468353393-3074-1-git-send-email-duarte@scylladb.com>
2016-07-13 11:23:24 +02:00
Paweł Dziepak
3fe1aec29d streaming: avoid word "ERROR" in non-error messages
Some tools (e.g. ccm) get confused and consider messages containing the word
"ERROR" as error level messages irrespective of their actual severity
level.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1468399752-5228-1-git-send-email-pdziepak@scylladb.com>
2016-07-13 12:06:33 +03:00
Paweł Dziepak
eb88181347 repair: ask for streamed checksums if cluster supports them
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-13 09:51:23 +01:00
Paweł Dziepak
7e06499458 repair: convert hashing to streamed_mutations
This patch makes hashing for repair calculate checksums in a way that
doesn't require rebuilding whole mutation.
Unfortunately, such checksums are incompatible with the old ones so the
old way for computing checksums is preserved for compatibility reasons.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-13 09:51:23 +01:00
Paweł Dziepak
e779e2f0c9 streaming: do not fragment mutations in mixed cluster
The receiving side needs to handle fragmented mutations properly so that
isolation guarantees are not broken. If the receiving node may be an old
one do not fragment mutations.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-13 09:51:23 +01:00
Paweł Dziepak
85c092c56c storage_service: add LARGE_PARTITIONS_FEATURE
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-13 09:51:23 +01:00
Paweł Dziepak
c5662919df tests/streamed_mutation: test hashing
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-13 09:51:23 +01:00
Paweł Dziepak
fe172484bd streamed_mutation: add mutation_hasher
mutation_hasher is a consumer of streamed_mutation that feeds its data
to a specified hasher.
It is not compatible with hashing_partition_visitor.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-13 09:51:23 +01:00
Paweł Dziepak
eb1dcf08e7 tests/streamed_mutation: add test for range_tombstones_stream
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-13 09:51:23 +01:00
Paweł Dziepak
93cc4454a6 streamed_mutation: emit range_tombstones directly
Originally, streamed_mutations guaranteed that emitted tombstones are
disjoint. In order to achieve that two separate objects were produced
for each range tombstone: range_tombstone_begin and range_tombstone_end.

Unfortunately, this forced sstable writer to accumulate all clustering
rows between range_tombstone_begin and range_tombstone_end.

However, since there is no need to write disjoint tombstones to sstables
(see #1153 "Write range tombstones to sstables like Cassandra does") it
is also not necessary for streamed_mutations to produce disjoint range
tombstones.

This patch changes that by making streamed_mutation produce
range_tombstone objects directly.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-13 09:51:18 +01:00
Paweł Dziepak
c3a8539074 streamed_mutation: add more comparators to position_in_partition
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-13 09:50:08 +01:00
Paweł Dziepak
27fea7bf2c mutation_partition: add non-const rows and tombstones accessors
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-13 09:50:07 +01:00
Paweł Dziepak
2208d4b53e range_tombstone_list: add non-const begin() and end()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-13 09:50:07 +01:00
Paweł Dziepak
5a790a9b49 range_tombstone: add flip()
range_tombstone::flip() flips range bounds. This is necessary in order
to use range tombstone in reversed mutation fragment streams.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-13 09:50:07 +01:00
Paweł Dziepak
e1d306fa0d range_tombstone: add memory_usage()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-13 09:50:07 +01:00
Paweł Dziepak
91a866501d range_tombstone: add range_tombstone_accumulator
range_tombstone_accumulator is a helper class that allows determining
tombstone for a clustering row when range tombstones and clustering rows
are streamed from streamed_mutation.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-13 09:50:07 +01:00
Paweł Dziepak
cd7937d33b range_tombstone: add apply()
range_tombstone::apply() allows merging two, possibly overlapping, range
tombstones with the same start bound and produces one or two disjoint
range tombstones as a result.

It is intended to be used for merging tombstones coming from different
sources.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-13 09:50:07 +01:00
Nadav Har'El
aec90a22da sstable parsing: assert we do not lose clustering rows
The sstable parsing code calls mp_row_consumer::flush() after every
clustering row has been read, and this puts the now complete row in a single
field "_ready". The assumption is that at this point parsing will stop, the
consumer will move out this _ready (mp_row_consumer::get_mutation_fragment())
and when flush() is later called again, _ready will be empty again.

This assumption is correct in our code, but is based on an intricate
combination of esoteric parts of the code, such as:

 1. In data_consume_row_context we stop parsing after reading the partition's
    header, before reading any clustering rows, giving the caller the chance
    to call sstable_streamed_mutation::read_next() to be prepared for the
    incoming mutations.

 2. In mp_row_consumer::flush_if_needed(), we stop the parser after each
    individual clustering row.

It is easy to break this assumption, and I did this in one of my code changes,
and the result was silent loss of clustering rows, as "_ready" got silently
overwritten before the reader had a chance to move it out.

What this patch does is to add an assertion: If a clustering row is silently
lost before being transferred to the mutation fragment reader, we croak.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1468389955-24600-1-git-send-email-nyh@scylladb.com>
2016-07-13 09:42:48 +01:00
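Note on aec90a22da: a minimal sketch of the kind of assertion described, assuming a single-slot consumer. The class single_slot_consumer and its members are hypothetical and only model the "_ready" field from the message; they are not the real mp_row_consumer interface.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Hypothetical model of a consumer that can hold exactly one completed row.
class single_slot_consumer {
    std::optional<std::string> _ready;
public:
    // Stores a completed row. Fails loudly if the previous row was never
    // moved out, instead of silently overwriting (and losing) it.
    void flush(std::string row) {
        assert(!_ready && "previous row was not consumed before flush()");
        _ready = std::move(row);
    }

    // Moves the stored row out and leaves the slot empty.
    std::optional<std::string> take() {
        return std::exchange(_ready, std::nullopt);
    }

    bool has_ready() const { return _ready.has_value(); }
};
```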
Duarte Nunes
4eca7632ec sstables: Replace composite fields with raw bytes
This patch fixes a regression introduced in
f81329be60, which made keys compound by
default when using a particular ctor, in turn leading to mismatches
when comparing the same key built with functions that properly
consider compoundness.

As a temporary fix, the sstable::key and sstable::key_view classes
store raw bytes instead of a composite.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1468339295-3924-1-git-send-email-duarte@scylladb.com>
2016-07-12 18:08:04 +02:00
Duarte Nunes
f013425bb5 query: Ensure timestamp is last param in read_command
Since the timestamp is not serialized, it must always be the last
parameter of query::read_command. This patch reorders it with the
partition_limit parameters and updates callers that specified a
timestamp argument.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1468312334-10623-1-git-send-email-duarte@scylladb.com>
2016-07-12 10:41:54 +01:00
Amnon Heiman
41546747d8 scylla-server.service: Start the scylla-housekeeping
This makes scylla-server try to start the scylla-housekeeping service.

Failing to start the service will not interfere with the scylla-server
start.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-07-12 12:32:52 +03:00
Amnon Heiman
0eba2b8fd5 scylla.spec.in: Pack the scylla-housekeeping service
This change packs and installs the scylla-housekeeping service on
Red Hat-like systems.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-07-12 12:32:48 +03:00
Tomasz Grabiec
c5e3c9bc35 Merge branch 'duarten/composite-v7' from git@github.com:duarten/scylla.git
From Duarte:

This patchset adds a representation of a legacy composite
value to compound_compat.hh and replaces the one in
sstables/key.hh. This patchset is needed for the thrift series.
2016-07-12 10:49:02 +02:00
Amnon Heiman
6d5049d90b Adding the scylla-housekeeping service
The scylla-housekeeping service is responsible for recurrent tasks.

It is currently set to run daily and report if the version is correct.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-07-12 11:47:04 +03:00
Amnon Heiman
30efdabf55 Introducing the scylla-housekeeping script
scylla-housekeeping is a script that checks for and reports hardware and software issues.

In its first phase, it checks for a newer version and reports if the
installed version is old.

To see the available options run
scylla-housekeeping help
2016-07-12 11:12:43 +03:00
Glauber Costa
73a70e6d0a config: Use Scylla in user visible options
We have imported most of our data about config options from Cassandra.  Due to
that, many options that mention the database by name are still using
"Cassandra".

Especially for the user-visible options, which is something that a user sees, we
should really be using Scylla here.

This patch was created by automatically replacing every occurrence of "Cassandra"
with "Scylla" and then later on discarding the ones in which the change didn't
make sense (such as unused options and mentions of the Cassandra documentation).

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <1423e1d7e36874a1f46bd091aec96dcb4d8482d9.1468267193.git.glauber@scylladb.com>
2016-07-12 09:18:17 +03:00
Duarte Nunes
f81329be60 sstables: sstables::key delegates to composite
The sstables::key class now delegates much of its functionality
to the composite class. All existing behavior is preserved.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-11 23:37:33 +02:00
Gleb Natapov
726b79ea91 messaging_service: enable internode_compression option
Use LZ4 for internode compression if enabled.

Message-Id: <20160711141734.GZ18455@scylladb.com>
2016-07-11 18:30:21 +03:00
Avi Kivity
201f585ab6 Merge seastar upstream
* seastar e7a7d41...e660d54 (1):
  > rpc: add factory class for lz4 compressor
2016-07-11 18:29:43 +03:00
Glauber Costa
f7706d51d1 scyllatop: fix typo
Keyborad -> Keyboard

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <349f20fd69be2f2e05ae0b7800e34a336cd2472b.1468248179.git.glauber@scylladb.com>
2016-07-11 18:27:49 +03:00
Duarte Nunes
ad8ff1df7e sstables: Replace composite class
This patch replaces the sstables::composite class with the one in
compound_compat.hh.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-11 16:55:11 +02:00
Duarte Nunes
0b87d16699 composite: Add unit tests
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-11 16:55:11 +02:00
Duarte Nunes
b179d8d378 compound_compat: Parse legacy compound values
This patch adds support for parsing legacy compound values by
introducing the composite class, a wrapper around a sequence of bytes
serialized in the legacy format for compounds. Compound values can be
sent through the thrift API.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-11 16:55:07 +02:00
Avi Kivity
9b08ddb639 Merge seastar upstream
* seastar 9267dfa...e7a7d41 (3):
  > Merge "Compression support for RPC" from Gleb
  > reactor: allow sleeping while disk aio is pending
  > sstring: add resize method
2016-07-11 16:23:29 +03:00
Calle Wilund
4ab03e98cf commitlog: Ensure we don't end up in a loop when we must wait for alloc
Continuation reordering could cause us to repeatedly see the
segment-local flag var even though actual write/sync ops are done.
Can cause wild recursion without actual delayed continuation ->
SOE.

Fix by also checking queue status, since this is the wait object.

Message-Id: <1468234873-13581-1-git-send-email-calle@scylladb.com>
2016-07-11 14:12:38 +03:00
Avi Kivity
f126efd7f2 transport: encode user-defined type metadata
Right now we fall back to tuples, which confuses the client.

Fixes #1443.

Reviewed-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1468167120-1945-1-git-send-email-avi@scylladb.com>
2016-07-11 08:51:17 +03:00
Takuya ASADA
d2caa486ba dist/redhat/centos_dep: disable the Go and Ada languages in the scylla-gcc package, since ScyllaDB never uses them
The centos-master Jenkins job failed while building libgo, but we don't need the Go language, so let's disable it in the scylla-gcc package.
We never use Ada either, so disable it too.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1468166660-23323-1-git-send-email-syuu@scylladb.com>
2016-07-10 19:12:52 +03:00
Avi Kivity
24e3026e32 Merge "compaction manager refactoring" from Raphael
2016-07-10 17:16:23 +03:00
Tomasz Grabiec
6a1f9a9b97 db: Improve logging
Message-Id: <1467997671-16570-1-git-send-email-tgrabiec@scylladb.com>
2016-07-10 16:15:03 +03:00
Avi Kivity
b5bef73ad2 Merge "Avoiding checking bloom filters during compaction" from Tomasz
"Checking bloom filters of sstables to compute max purgeable timestamp
for compaction is expensive in terms of CPU time. We can avoid
calculating it if we're not about to GC any tombstone.

This patch changes compacting functions to accept a function instead
of ready value for max_purgeable.

I verified that bloom filter operations no longer appear on flame
graphs during compaction-heavy workload (without tombstones).

Refs #1322."
2016-07-10 11:33:41 +03:00
Tomasz Grabiec
8c4b5e4283 db: Avoiding checking bloom filters during compaction
Checking bloom filters of sstables to compute max purgeable timestamp
for compaction is expensive in terms of CPU time. We can avoid
calculating it if we're not about to GC any tombstone.

This patch changes compacting functions to accept a function instead
of ready value for max_purgeable.

I verified that bloom filter operations no longer appear on flame
graphs during compaction-heavy workload (without tombstones).

Refs #1322.
2016-07-10 09:54:20 +02:00
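Note on 8c4b5e4283: a minimal sketch of passing a function instead of a ready value, so that the expensive computation runs only when a tombstone is actually seen. The types and names here (cell, compact_cells, get_max_purgeable) are hypothetical, and the purge rule is reduced to a single timestamp comparison, which is an assumption.

```cpp
#include <cstdint>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical, heavily simplified cell.
struct cell {
    std::int64_t timestamp;
    bool is_tombstone;
};

// Drops tombstones older than the max purgeable timestamp. The (expensive)
// timestamp is obtained lazily, at most once, and only if a tombstone appears.
std::vector<cell> compact_cells(const std::vector<cell>& in,
                                const std::function<std::int64_t()>& get_max_purgeable) {
    std::optional<std::int64_t> max_purgeable;
    std::vector<cell> out;
    for (const cell& c : in) {
        if (c.is_tombstone) {
            if (!max_purgeable) {
                max_purgeable = get_max_purgeable(); // first tombstone: pay the cost now
            }
            if (c.timestamp < *max_purgeable) {
                continue; // garbage-collect this tombstone
            }
        }
        out.push_back(c);
    }
    return out;
}
```

With an input that contains no tombstones, get_max_purgeable is never called, which matches the stated goal of keeping bloom filter checks off the tombstone-free path.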
Tomasz Grabiec
c0233c877d db: Avoid out-of-memory when flushing cannot keep up
memtable_list::seal_on_overflow() is called on each mutation to check
if current memtable should be flushed. It will call
memtable_list::seal_active_memtable() when that is the case.

The number of concurrent seals is guarded by a semaphore, starting
from commit 0f64eb7e7d, and allows
at most 4 of them.

If there are 4 flushes already pending, every incoming mutation will
enqueue a new flush task on the semaphore's wait list, without waiting
for it. The wait queue can grow without bounds, eventually leading to
out-of-memory.

The fix is to seal the memtable immediately to satisfy should_flush()
condition, but limit concurrency of actual flushes. This way the wait
queue size on the semaphore is limited by memtables pending a flush,
which is fairly limited.

Message-Id: <1467997652-16513-1-git-send-email-tgrabiec@scylladb.com>
2016-07-10 10:53:51 +03:00
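Note on c0233c877d: a synchronous model of the fix, assuming that sealing is immediate and only the flush itself is limited. The class memtable_list_model and its members are hypothetical; real flushes are asynchronous and gated by a semaphore, which this sketch replaces with plain counters.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical model: seal right away, but limit how many flushes run at once.
class memtable_list_model {
    std::size_t _active_bytes = 0;
    std::size_t _seal_threshold;
    unsigned _max_concurrent_flushes;
    unsigned _flushes_in_flight = 0;
    std::deque<std::size_t> _sealed_waiting; // sealed memtables waiting for a flush slot
public:
    memtable_list_model(std::size_t seal_threshold, unsigned max_concurrent_flushes)
        : _seal_threshold(seal_threshold)
        , _max_concurrent_flushes(max_concurrent_flushes) {}

    // Called per mutation. At most one waiter is added per sealed memtable,
    // not one per mutation, so the wait queue stays small.
    void apply(std::size_t mutation_bytes) {
        _active_bytes += mutation_bytes;
        if (_active_bytes >= _seal_threshold) {
            _sealed_waiting.push_back(_active_bytes);
            _active_bytes = 0; // a fresh active memtable takes over immediately
            start_flushes();
        }
    }

    // Called when one flush completes; frees a slot for a waiting memtable.
    void flush_finished() {
        assert(_flushes_in_flight > 0);
        --_flushes_in_flight;
        start_flushes();
    }

    std::size_t waiting() const { return _sealed_waiting.size(); }
    unsigned in_flight() const { return _flushes_in_flight; }
private:
    void start_flushes() {
        while (_flushes_in_flight < _max_concurrent_flushes && !_sealed_waiting.empty()) {
            _sealed_waiting.pop_front();
            ++_flushes_in_flight;
        }
    }
};
```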
Tomasz Grabiec
74ff30a31a mutation_reader: Introduce stable_flattened_mutations_consumer adaptor
Needed to make compact_mutation class non-movable later. It is used in
do_with, so needs to be movable. Will be solved by using this adaptor.
2016-07-09 22:31:28 +02:00
Tomasz Grabiec
fb44f895b2 mutation_reader: Name template parameters after concepts
With so many consumer concepts out there, it is confusing to name
parameters using the generic "Consumer" name; let's name them after
(already defined) concepts: CompactedMutationsConsumer, FlattenedConsumer.
2016-07-09 22:31:27 +02:00
Raphael S. Carvalho
ed5e7e6842 compaction: refactor compaction manager
Previously, same function was used to handle both regular compaction
and cleanup requests. That's bad because a lot of conditions were
added for both compaction types to live in the same function.
Now, cleanup and regular compaction will live in different functions.
They share a lot of code, so helper functions were introduced.
This change is also important for user-initiated compaction that
will go through compaction manager in the future.
Code is also a lot easier to read now.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-07-08 16:37:53 -03:00
Raphael S. Carvalho
da6a2b429d compaction: add functions to register and deregister compacting sstables
Reviewed-by: Nadav Har'El <nyh@scylladb.com>
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-07-08 16:00:51 -03:00
Raphael S. Carvalho
4d6dce8ec9 compaction: add helper function to get candidates for strategy
Reviewed-by: Nadav Har'El <nyh@scylladb.com>
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-07-08 15:06:14 -03:00
Raphael S. Carvalho
e38f66c6fe database: make certain column family functions const qualified
Reviewed-by: Nadav Har'El <nyh@scylladb.com>
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-07-08 15:05:22 -03:00
Raphael S. Carvalho
bfc5376548 compaction: remove gate from compaction manager task
There is no longer a need to use gate for regular termination of
fiber that runs compaction. Now, we only set task->stopping to
true, ask for compaction termination, and wait for its future to
resolve. Code is simplified a lot with this change.

Reviewed-by: Nadav Har'El <nyh@scylladb.com>
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-07-08 15:05:10 -03:00
Paweł Dziepak
cba996a3ea Merge "Implement missing functions for byte_ordered_partitioner" from Asias
2016-07-08 10:49:25 +01:00
Asias He
f4389349e4 config: Enable partitioner option
Enable the --partitioner option so that the user can choose a partitioner
other than the default Murmur3Partitioner. Currently, only Murmur3Partitioner
and ByteOrderedPartitioner are supported. When an unsupported partitioner
is specified, an error is propagated to the user.
2016-07-08 17:44:55 +08:00
Asias He
9c27b5c46e byte_ordered_partitioner: Implement missing describe_ownership and midpoint
In order to support ByteOrderedPartitioner, we need to implement the
missing describe_ownership and midpoint function in
byte_ordered_partitioner class.

As a start, this patch uses a simple method based on node token distance
to calculate ownership. C* uses a more complicated method based on key samples.
We can switch to what C* does later.

Tests are added to tests/partitioner_test.cc.

Fixes #1378
2016-07-08 17:44:55 +08:00
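Note on 9c27b5c46e: a minimal sketch of a token-distance ownership calculation, assuming integer tokens on a ring of known size. Real ByteOrderedPartitioner tokens are byte strings, so the function describe_ownership_by_distance and its integer ring are illustrative assumptions, not the code added by the commit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Each token owns the ring segment (previous token, token]. Its ownership is
// the length of that segment divided by the ring size. Tokens are assumed to
// be distinct and to lie in [0, ring_size).
std::map<std::uint64_t, double>
describe_ownership_by_distance(std::vector<std::uint64_t> tokens, std::uint64_t ring_size) {
    assert(ring_size > 0);
    std::map<std::uint64_t, double> ownership;
    if (tokens.empty()) {
        return ownership;
    }
    std::sort(tokens.begin(), tokens.end());
    std::uint64_t prev = tokens.back(); // predecessor of the first token wraps around
    for (std::uint64_t t : tokens) {
        // Clockwise distance from prev to t; a single token owns the whole ring.
        std::uint64_t distance = (t > prev) ? t - prev : ring_size - (prev - t);
        ownership[t] = static_cast<double>(distance) / static_cast<double>(ring_size);
        prev = t;
    }
    return ownership;
}
```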
Asias He
e0949a8f4f storage_service: Exit shadow round state if it fails
If a node fails to talk to any seed node, shadow round will fail. We
should exit shadow round state before we continue.

This issue is spotted by
consistency_test.TestConsistency.data_query_digest_test dtest.
Message-Id: <ba0613532a69bac369ca316ab61d907b320c8e68.1467963674.git.asias@scylladb.com>
2016-07-08 10:05:07 +01:00
Avi Kivity
8dab93a853 sstables: fix low disk utilization with compression and small chunk lengths
As Nadav notes we use the chunk length as the buffer size for the compressed
stream too.

Fix by using it only for the outer (uncompressed) stream; the inner
(compressed) stream uses the sstable buffer size, 128 kiB.

Fixes #1402.
Message-Id: <1467910556-5759-1-git-send-email-avi@scylladb.com>
Reviewed-by: Nadav Har'El <nyh@scylladb.com>
2016-07-07 18:13:30 +01:00
Vlad Zolotarov
f2bf453be2 database: revive mutation retry in case of replay_position_reordered_exception
The logic that would retry applying a mutation in case of
a replay_position_reordered_exception error was broken by
a commit 0c31f3e626
Author: Glauber Costa <glauber@scylladb.com>
Date:   Wed Apr 20 19:09:21 2016 -0400

    database: move memtable throttler to the LSA throttler

This patch makes it work again.

Fixes #1439

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1467893342-30559-1-git-send-email-vladz@cloudius-systems.com>
2016-07-07 15:00:35 +02:00
Tomasz Grabiec
de429d6a53 Merge branch 'dev/pdziepak/streamed-mutations-streaming/v3'
Support for streaming of large partitions from Paweł:

This series converts streaming to streaming_mutations so that there
is no need to store the full mutation in memory in order to send or receive
it.

The first several patches add a way of estimating mutation fragment
memory usage and introduce fragment_and_freeze() which produces
a stream of reasonably sized frozen mutations from a single streamed
mutation.

The second part of this patchset makes sure that streaming mutations
in fragments doesn't break isolation guarantees. This is achieved by
delaying visibility of sstables produced by streaming until the
streaming is completed. However, our current receiving code merges
mutations from all streaming plans together thus making it impossible
to track which data was received from a particular streaming plan.
The solution to that problem is to introduce an additional flag to
STREAM_MUTATION verb which informs the receiver whether the mutation
is fragmented and care must be taken to preserve isolation. Small
mutations behave as before, with writes from different stream
plans coalesced, while big mutations are handled separately for each
streaming task.
2016-07-07 13:23:39 +02:00
Paweł Dziepak
d9eb4d8028 streaming: use fragment_and_freeze() to send mutations
Commit 206955e4 "streaming: Reduce memory usage when sending mutations"
moved streaming mutation limiter from do_send_mutations() to
send_mutations(). The reason for that was that send_mutation() did full
mutation copies. That's no longer the case and streaming limiter should
be moved back to do_send_mutation() in order to provide back pressure to
fragment_and_freeze().

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-07 12:18:36 +01:00
Paweł Dziepak
32a5de7a1f db: handle receiving fragmented mutations
If mutations are fragmented during streaming, special care must be
taken so that isolation guarantees are not broken.

Mutations received with flag "fragmented" set are applied to a memtable
that is used only by that particular streaming task and the sstables
created by flushing such memtables are not made visible until the task
is complete. Also, in case the streaming fails all data is dropped.

This means that fragmented mutations cannot benefit from coalescing of
writes from multiple streaming plans, hence separate way of handling
them so that there is no loss of performance for small partitions.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-07 12:18:35 +01:00
Paweł Dziepak
f2ae31711e streaming: inform CF when streaming fails
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-07 12:18:35 +01:00
Paweł Dziepak
4031c0ed8f streaming: pass plan_id to column family for apply and flush
plan_id is needed to keep track of the origin of mutations so that if
they are fragmented all fragments are made visible at the same time,
when that particular streaming plan_id completes.

Basically, each streaming plan that sends big (fragmented) mutations is
going to have its own memtables and a list of sstables which will get
flushed and made visible when that plan completes (or dropped if it
fails).

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-07 12:18:35 +01:00
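Note on 4031c0ed8f: a minimal sketch of deferring visibility per streaming plan, assuming an integer plan id and sstables represented by name. The class streaming_visibility_model is hypothetical; the real code tracks memtables and sstables per plan_id inside the column family.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: sstables written for a plan stay hidden until the plan
// completes, and are dropped if it fails.
class streaming_visibility_model {
    std::map<int, std::vector<std::string>> _pending; // plan id -> hidden sstables
    std::vector<std::string> _visible;
public:
    void add_sstable(int plan_id, std::string name) {
        _pending[plan_id].push_back(std::move(name));
    }

    // Publishes all sstables of the plan at the same time.
    void plan_completed(int plan_id) {
        auto it = _pending.find(plan_id);
        if (it == _pending.end()) {
            return;
        }
        _visible.insert(_visible.end(), it->second.begin(), it->second.end());
        _pending.erase(it);
    }

    // Discards everything the failed plan produced.
    void plan_failed(int plan_id) {
        _pending.erase(plan_id);
    }

    const std::vector<std::string>& visible() const { return _visible; }
};
```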
Paweł Dziepak
51ec7a7285 db: wait for ongoing flushes at end of streaming
When flush_streaming_mutations() is called at the end of streaming it is
supposed to flush all data and then invalidate cache ranges. However, if
there are already some memtable flushes in progress it won't wait for them.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-07 12:18:35 +01:00
Paweł Dziepak
5bc51821fe sstables: allow writing unsealed sstables
The purpose of this patch is to split the actions of writing sstable and
sealing it. As long as the sstable is unsealed it is considered
incomplete and is going to be removed on reboot.

Such functionality is needed in order to defer visibility of sstables
created during streaming until the streaming is complete.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-07 12:18:35 +01:00
Paweł Dziepak
a7b6c1110f sstables: do not require seal_sstable() to be run in thread
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-07 12:18:35 +01:00
Paweł Dziepak
4e34bd4e8a tests/streamed_mutation: test fragment_and_freeze()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-07 12:18:35 +01:00
Paweł Dziepak
19629e95e2 frozen_mutation: add fragment_and_freeze()
fragment_and_freeze() produces a stream of frozen mutations from a
single streamed_mutation.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-07 12:18:30 +01:00
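Note on 19629e95e2: a minimal sketch of the batching idea, assuming each fragment is represented only by its estimated memory usage. The function batch_fragments and the size limit are hypothetical; the real fragment_and_freeze() operates on a streamed_mutation and its signature is not shown in this log.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Groups consecutive fragments into batches whose total size stays within
// `limit`. A single fragment larger than `limit` gets a batch of its own.
std::vector<std::vector<std::size_t>>
batch_fragments(const std::vector<std::size_t>& fragment_sizes, std::size_t limit) {
    assert(limit > 0);
    std::vector<std::vector<std::size_t>> batches;
    std::size_t current = 0; // total size of the batch being filled
    for (std::size_t size : fragment_sizes) {
        if (batches.empty() || current + size > limit) {
            batches.emplace_back(); // start a new batch
            current = 0;
        }
        batches.back().push_back(size);
        current += size;
    }
    return batches;
}
```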
Paweł Dziepak
820bd6c9bc streamed_mutation: add mutation_fragment::memory_usage()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-07 12:17:25 +01:00
Paweł Dziepak
23d0bfd065 mutation_partition: add row::memory_usage()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-07 12:17:25 +01:00
Paweł Dziepak
1d54327afd atomic_cell_or_collection: add memory_usage()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-07 12:17:25 +01:00
Paweł Dziepak
d0ee750cec keys: add memory_usage()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-07 12:17:25 +01:00
Paweł Dziepak
cfa581b426 utils/managed_vector: add memory_usage()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-07 12:17:25 +01:00
Paweł Dziepak
703509a1c7 utils/managed_bytes: add memory_usage()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-07 12:17:25 +01:00
Paweł Dziepak
a289816b31 streamed_mutation: fix mutation_fragment::consume() return type
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-07 12:17:25 +01:00
Paweł Dziepak
37bd7230bc streamed_mutation: add mutation fragment visitor
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-07 12:17:25 +01:00
Glauber Costa
54ce6221a7 allow the dirty memory manager to be used without a database object
Some of our tests don't provide a database object to a CF. Create a default
dirty memory manager object that can be used without a database for them.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <872f8c9232ff87d788e271b1db86c814d7a75d9f.1467832713.git.glauber@scylladb.com>
2016-07-07 10:00:43 +01:00
Raphael S. Carvalho
0772d20c60 fix compilation in debug mode
build/debug/sstables/compaction_strategy.o: In function
`date_tiered_manifest::date_tiered_manifest(std::map<basic_sstring<char, unsigned int, 15u>,
basic_sstring<char, unsigned int, 15u>, std::less<basic_sstring<char, unsigned int, 15u> >,
std::allocator<std::pair<basic_sstring<char, unsigned int, 15u> const, basic_sstring<char,
unsigned int, 15u> > > > const&)':
/home/centos/scylla/sstables/date_tiered_compaction_strategy.hh:67: undefined reference to
`date_tiered_manifest::DEFAULT_BASE_TIME_SECONDS'

That's fixed by moving definition of static constexpr outside the class.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20c16ad71f64900aa5591018bc4e976406cfebb3.1467870383.git.raphaelsc@scylladb.com>
2016-07-07 11:52:37 +03:00
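Note on 0772d20c60: a minimal sketch of the language rule behind this kind of link error, using a hypothetical manifest_example type rather than the real date_tiered_manifest. Before C++17, a static constexpr data member that is odr-used (for example, bound to a const reference) needs a definition outside the class. Optimized builds often fold the constant and hide the missing definition, which is a plausible reason the failure appeared only in debug mode.

```cpp
#include <algorithm>

struct manifest_example {
    // In-class declaration with initializer.
    static constexpr long default_base_time_seconds = 60;

    long base_time;

    // std::max takes its arguments by const reference, so the static member
    // is odr-used here and the linker needs a definition of it.
    explicit manifest_example(long requested)
        : base_time(std::max(requested, default_base_time_seconds)) {}
};

// Out-of-class definition. Required before C++17 when the member is odr-used;
// redundant (but still accepted) from C++17 on, where the member is implicitly inline.
constexpr long manifest_example::default_base_time_seconds;
```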
Avi Kivity
9a8788019d row_cache: fix visitor for boost <= 1.55
Older boosts can't return a future from a visitor (likely lacking support
for move-only objects).  Supply a dirty hackaround.

Message-Id: <1467822548-25940-1-git-send-email-avi@scylladb.com>
2016-07-06 19:55:51 +03:00
Avi Kivity
21031d276b Merge seastar upstream
* seastar c82c36f...9267dfa (6):
  > app_template: Make run() wait for func when reactor exit is triggered externally
  > core: Introduce futurize_apply() helper
  > rpc: make unexpected eof messages more informative
  > Fix boost version check
  > reactor: more fix for smp poll with older boost
  > reactor: fix build on older boost due to spsc_queue::read_available()
2016-07-06 18:14:13 +03:00
Avi Kivity
02530faeb2 compaction: fix tombstones not being garbage collected during compaction
2a46410f4a changed sstable_list from a map
to a set, so it is no longer sorted by generation.  The code for finding
the list of sstables not being compacted relied on this sort order, and
now broke, returning a longer list than needed (including some of the
sstables being compacted).  As a result, the compaction code preserved
the tombstones, incorrectly thinking there was still live data they
referenced.

Fix by sorting the set explicitly.

Fixes #1429.
Message-Id: <1467793026-6571-1-git-send-email-avi@scylladb.com>
2016-07-06 10:22:31 +02:00
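Note on 02530faeb2: a minimal sketch of why sort order matters when computing "all sstables minus the ones being compacted". It assumes sstables are identified by an integer generation and uses std::set_difference, which requires both inputs to be sorted. Whether the real code uses this particular algorithm is an assumption, and not_being_compacted is a hypothetical name.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Returns the generations present in `all` but not in `compacting`.
// Both inputs are sorted explicitly first, because std::set_difference
// gives wrong results on unsorted ranges.
std::vector<int> not_being_compacted(std::vector<int> all, std::vector<int> compacting) {
    std::sort(all.begin(), all.end());
    std::sort(compacting.begin(), compacting.end());
    std::vector<int> result;
    std::set_difference(all.begin(), all.end(),
                        compacting.begin(), compacting.end(),
                        std::back_inserter(result));
    return result;
}
```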
Asias He
0c56bbe793 gossip: Make get_supported_features and wait_for_feature_on{_all}_node private
They are used only inside gossiper itself. Also make the helper
get_supported_features(std::unordered_map<gms::inet_address, sstring>) static.

Message-Id: <f434c145ad9138084708b60c1d959b84360e47b2.1467775291.git.asias@scylladb.com>
2016-07-06 09:54:56 +03:00
Avi Kivity
ab279a4752 Merge "Add support to date tiered compaction strategy" from Raphael
"After this patchset, date tiered compaction strategy is supported by Scylla.

For those who don't know what it is about, the following article may help:
https://labs.spotify.com/2014/12/18/date-tiered-compaction/

It's also nicely explained here by our wiki page:
https://github.com/scylladb/scylla/wiki/SSTable-compaction#date-tiered-compaction

Basically, date tiered strategy was developed to help the database perform better
when facing a time series workload. Date tiered strategy will work to keep data
written at nearly the same time together, such that the number of relevant sstables
for a time-based query is relatively low. We still lack support to filter out
sstables based on time parameters of a query, but that feature should come ASAP.

The following dtests now pass:
compaction_test.py:TestCompaction_with_DateTieredCompactionStrategy.compaction_delete_test
compaction_test.py:TestCompaction_with_DateTieredCompactionStrategy.compaction_strategy_switching_test

Used cassandra-stress with the parameter '-schema compaction\(strategy=DateTieredCompactionStrategy\)'
to check stability.

Fixes #511."
2016-07-06 09:51:12 +03:00
Avi Kivity
7438c9de5c Merge "Fix database freeze with load for multiple CFs" from Glauber
"Issue 1195 describes a scenario with a fairly easy reproducer in which
we can freeze the database. That involves writing simultaneously to
multiple CFs, such that the sum of all the memory they are using is larger
than the dirty memory limit, without any of them individually being
larger than the memtable size.

This patchset rewrites the throttling code, including now active flushes
so that this situation cannot happen.

Fixes #1195"
2016-07-06 09:48:13 +03:00
Raphael S. Carvalho
b5ec4d46c6 tests: add test for date tiered compaction strategy
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-07-06 02:11:47 -03:00
Raphael S. Carvalho
b699ef2de3 compaction: wire up date tiered compaction strategy
After this commit, date tiered compaction strategy is supported
on Scylla.

To understand how it works, take a look at our wiki page:
https://github.com/scylladb/scylla/wiki/SSTable-compaction#date-tiered-compaction

Fixes #511.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-07-06 02:11:47 -03:00
Raphael S. Carvalho
e5cc0cc6c4 compaction: implement date tiered compaction strategy
This commit is basically about converting Java to C++.
Date tiered compaction strategy isn't wired yet.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-07-06 02:11:47 -03:00
Raphael S. Carvalho
cab2892866 tests: add test for sstables::get_fully_expired_sstables
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-07-06 02:11:47 -03:00
Raphael S. Carvalho
e9076f39be compaction: implement function to get fully expired sstables
Strongly based on org.apache.cassandra.db.compaction.
CompactionController.getFullyExpiredSSTables.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-07-06 02:11:47 -03:00
Raphael S. Carvalho
69b3860662 tests: add test for leveled_manifest::overlapping
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-07-06 02:11:45 -03:00
Raphael S. Carvalho
92848efc42 sstables: make overlapping functions static
That's needed for a function that will get overlapping sstables to
get fully expired ones.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-07-06 01:34:34 -03:00
Raphael S. Carvalho
8d38fa49d4 sstables: move code to get uncompacting sstables to a function
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-07-06 01:33:55 -03:00
Raphael S. Carvalho
1118cfc51a tests: test that sstable max_local_deletion_time is properly updated
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-07-06 01:13:34 -03:00
Raphael S. Carvalho
cc6c383249 sstables: properly keep track of max local deletion time
We weren't updating max local deletion time for cells that contain
ttl, or for tombstone cells.
If there is a live cell with no ttl, then max local deletion time
is supposed to store maximum value, which means that the sstable
will not be fully expired later on.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-07-06 01:13:24 -03:00
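Note on cc6c383249: a minimal sketch of the tracking rule described, assuming a signed 32-bit deletion time. The class max_local_deletion_time_tracker and its method names are hypothetical, and the fully_expired() check is a simplified assumption about how the value is later used.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

// Hypothetical tracker for the largest local deletion time seen in an sstable.
class max_local_deletion_time_tracker {
    std::int32_t _max = std::numeric_limits<std::int32_t>::min();
public:
    // For tombstones and for cells with a TTL: record their deletion/expiry time.
    void update_dead_or_expiring(std::int32_t local_deletion_time) {
        _max = std::max(_max, local_deletion_time);
    }

    // A live cell without TTL never expires, so the maximum value is stored.
    void update_live_without_ttl() {
        _max = std::numeric_limits<std::int32_t>::max();
    }

    std::int32_t get() const { return _max; }

    // Simplified: everything in the sstable was deleted or expired before gc_before.
    bool fully_expired(std::int32_t gc_before) const { return _max < gc_before; }
};
```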
Raphael S. Carvalho
1ecd9bdefc sstables: fix type of max_local_deletion_time
max_local_deletion_time was incorrectly using an unsigned type
instead of a signed one.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-07-06 01:13:13 -03:00
Raphael S. Carvalho
f9ab94d266 compaction: import DateTieredCompactionStrategy.java
File can be found at the following C* directory:
src/java/org/apache/cassandra/db/compaction

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-07-06 01:12:49 -03:00
Glauber Costa
b0932ceb04 database: act on LSA pressure notification
Issue 1195 describes a scenario with a fairly easy reproducer in which we can
freeze the database. That involves writing simultaneously to multiple CFs, such
that the sum of all the memory they are using is larger than the dirty memory
limit, without any of them individually being larger than the memtable size.
Because we will never reach the individual memtable seal size for any of them,
none of them will initiate a flush leading the database to a halt.

The LSA has now gained infrastructure that allows us to be notified when pressure
conditions mount. What we will do in this case is initiate a flush ourselves.

Fixes #1195

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-07-05 17:46:28 -04:00
Glauber Costa
7169b727ea move system tables to its own region
In the spirit of what we are doing for the read semaphore, this patch moves
system writes to its own dirty memory manager. Not only will it make sure that
system tables will not be serialized by its own semaphore, but it will also put
system tables in its own region group.

Moving system tables to its own region group has the advantage that system
requests won't be waiting during throttle behind a potentially big queue of user
requests, since requests are tended to in FIFO order within the same region
group. However, system tables being more controlled and predictable, we can
actually go a step further and give them some extra reservation so they may not
necessarily block even if under pressure (up to 10 MB more).

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-07-05 17:46:28 -04:00
Glauber Costa
c358947284 database: wrap semaphore and region group into a new dirty memory manager
We currently have a semaphore in the column family level that protects us against
multiple concurrent sstable flushes. However, storing that semaphore into the CF,
not the database, was a (implementation, not design) mistake.

One comment in particular makes it quite clear:

   // Ideally, we'd allow one memtable flush per shard (or per database object), and write-behind
   // would take care of the rest. But that still has issues, so we'll limit parallelism to some
   // number (4), that we will hopefully reduce to 1 when write behind works.

So I aimed for the shard, but ended up coding it into the CF because that's closer to the
flush point - my bad.

This patch fixes this while paving the way for active reclaim to take place. It wraps the semaphore
and the region group in a new structure, the dirty_memory_manager. The immediate benefit is that we
don't need to be passing both the semaphore and the region group downwards in the DB -> CF path. The
long term benefit is that we now have one unified structure that can hold shared flush data in all
of the CFs.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-07-05 15:29:04 -04:00
Glauber Costa
d41fcd45d1 memtables: make memtable inherit from region
The LSA memory pressure mechanism will let us know which region is the best
candidate for eviction when under pressure. We need to somehow then translate
region -> memtable -> column family.

The easiest way to convert from region to memtable is to have memtable inherit
from region. Although this requires multiple inheritance, which
always raises a flag, the other class we inherit from is
enable_shared_from_this, which has a very simple and well-defined interface. So
I think it is worth doing.

Once we have the memtable, grabbing the column family is easy provided we have a
database object. We can grab it from the schema.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-07-05 15:05:29 -04:00
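Note on d41fcd45d1: a minimal sketch of the region -> memtable conversion that inheritance makes possible. The types region_example and memtable_example are hypothetical stand-ins; the real logalloc::region and memtable classes have much larger interfaces.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for an LSA region.
struct region_example {
    std::size_t occupancy = 0;
};

// The memtable is a region, and can also hand out shared pointers to itself.
class memtable_example : public std::enable_shared_from_this<memtable_example>,
                         public region_example {
    std::string _table_name;
public:
    explicit memtable_example(std::string table_name)
        : _table_name(std::move(table_name)) {}

    const std::string& table_name() const { return _table_name; }

    // Valid only when `r` really is the base subobject of a memtable_example.
    static memtable_example& from_region(region_example& r) {
        return static_cast<memtable_example&>(r);
    }
};
```

Given a region picked for eviction, from_region() recovers the owning memtable, and shared_from_this() can then keep it alive while a flush is in progress.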
Glauber Costa
0c31f3e626 database: move memtable throttler to the LSA throttler
The LSA infrastructure, through the use of its region groups, now has
a throttler mechanism built-in. This patch converts the current throttlers
so that the LSA throttler is used instead.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-07-05 15:05:19 -04:00
Yoav Kleinberger
0ad940bfc3 tools/scyllatop: fix crash due to mouse events
Due to an urwid-library technicality, some mouse events like scroll or
click would crash scyllatop. This patch fixes this problem.

Closes issue #1396.

Signed-off-by: Yoav Kleinberger <yoav@scylladb.com>
Message-Id: <1467294117-19218-1-git-send-email-yoav@scylladb.com>
2016-07-05 19:08:55 +03:00
Avi Kivity
cb59e724ee Merge "Fix enabling sstable read ahead" from Paweł
"This series contains remaining changes necessary to safely enable read
ahead of sstables. Basically, it makes sure that input_streams are
always properly closed (even in case of exception during read)."
2016-07-05 19:04:19 +03:00
Raphael S. Carvalho
e688fc9550 api: provide estimation of pending compaction
Use compaction_strategy::estimated_pending_compaction() to provide the
user with an estimate of the number of compactions needed for the strategy
to be fully satisfied.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <39b7d91f2525ca38fb2ce9d8885d0c2e727de7ed.1467667054.git.raphaelsc@scylladb.com>
2016-07-05 19:03:12 +03:00
Raphael S. Carvalho
43926026c3 compaction: introduce compaction strategy method to estimate pending compaction
At the moment, it's not possible to know how many compactions are needed for
the compaction strategy to be satisfied. It's not possible to know exactly the
number of pending compactions, but the strategy can provide an estimate.

For size tiered, it's based on the number of sstables in each bucket. By dividing
bucket size by max threshold, we get the number of compactions needed to compact
that single bucket.

For leveled, it's about the number of sstables that exceed the limit in
each level.
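
The two estimates can be sketched as follows. This is an illustration under stated assumptions, not the patch's code: the function names are hypothetical, rounding the division up is an assumption, and so is skipping buckets smaller than min_threshold. For leveled, the sketch only counts sstables over the limit; how that count becomes a number of compactions is not specified above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// bucket_sizes[i] is the number of sstables in size-tiered bucket i.
// Assumption: buckets below min_threshold need no compaction yet.
inline std::size_t estimate_size_tiered(const std::vector<std::size_t>& bucket_sizes,
                                        std::size_t min_threshold,
                                        std::size_t max_threshold) {
    assert(max_threshold > 0);
    std::size_t n = 0;
    for (std::size_t size : bucket_sizes) {
        if (size >= min_threshold) {
            n += (size + max_threshold - 1) / max_threshold;  // bucket size / max threshold, rounded up
        }
    }
    return n;
}

// level_sizes[i] and level_limits[i] are the sstable count and the allowed
// count for level i. Returns the total number of sstables over the limit.
inline std::size_t sstables_over_limit(const std::vector<std::size_t>& level_sizes,
                                       const std::vector<std::size_t>& level_limits) {
    std::size_t excess = 0;
    const std::size_t levels = std::min(level_sizes.size(), level_limits.size());
    for (std::size_t i = 0; i < levels; ++i) {
        if (level_sizes[i] > level_limits[i]) {
            excess += level_sizes[i] - level_limits[i];
        }
    }
    return excess;
}
```

With buckets of 3, 40 and 70 sstables, min_threshold 4 and max_threshold 32, the first bucket is skipped and the other two contribute 2 and 3 compactions.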

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <e209e52f6159ee274a8358b69961a7c0ce357f7d.1467667054.git.raphaelsc@scylladb.com>
2016-07-05 19:03:11 +03:00
Avi Kivity
76cc6408cd Merge "feature check for seed node" from Asias
""This series implemnts feature check for seed node.
2016-07-05 19:01:01 +03:00
Asias He
6f69963ef9 system_keyspace: Simplify load_host_ids implementation
- Use plain loop instead of do_for_each

- Use row.get_as() instead of row.template get_as()
Message-Id: <3e108d3a6258c0caaf569eb9c79532d9789ea411.1467703722.git.asias@scylladb.com>
2016-07-05 09:47:21 +02:00
Asias He
3f31be58b6 system_keyspace: Simplify load_tokens implementation
- Use plain loop instead of do_for_each

- Use row.get_as() instead of row.template get_as()
Message-Id: <f959ace4f30078695d383c849ed4520169228f97.1467703722.git.asias@scylladb.com>
2016-07-05 09:47:21 +02:00
Asias He
5236e7a379 storage_service: Implement feature check for seed node
Checking features for a seed node is a bit more complicated than for a
non-seed node, because a non-seed node can always talk to at least one seed
node, while a seed node may not be able to.

In this patch, we distinguish a new cluster from an existing cluster by
checking if the system table is empty. We relax the feature check for a
new cluster because the feature check is mostly useful when upgrading an
existing cluster, to prevent an old node from joining a new cluster.

When talking to a seed node fails during the check, we fall back to the
check using features stored in the system table. This makes it possible to
restart a seed node when no other seed node is up (no other seed node at
all, or other seed node is not up yet).

I tested the following scenarios.

1) start a completely new seed node in a new cluster
* system table is empty, skip the check.

2) start a cluster, restart one seed node, at least one other seed node
is up
* system table is not empty, check with shadow round, shadow round will
  succeed.

3) start a cluster, restart one seed node, no other seed node is up
* system table is not empty, check with shadow round, shadow round will
  fail, fall back to system table check.

4) start a cluster, shut down all the nodes, start one seed node with new
ip address, seed list in yaml is updated with new ip address
* system table is not empty, check with shadow round, shadow round will
  fail, fall back to system table check.
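
The four scenarios above reduce to a small decision, sketched here. The enum and function names are hypothetical and only model which source of features the check ends up using.

```cpp
#include <cassert>

// Which source the seed node's feature check ends up using.
enum class check_source { skip, shadow_round, system_table };

// system_table_empty: true for a brand new cluster.
// shadow_round_ok:    true if another seed node answered the shadow round.
inline check_source choose_feature_check(bool system_table_empty, bool shadow_round_ok) {
    if (system_table_empty) {
        return check_source::skip;  // scenario 1: new cluster, check is relaxed
    }
    return shadow_round_ok ? check_source::shadow_round   // scenario 2
                           : check_source::system_table;  // scenarios 3 and 4
}
```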
2016-07-05 10:09:54 +08:00
Asias He
bb80362c3f gossip: Insert with result.end() in get_supported_features
It is faster than result.begin(), suggested by Avi.
2016-07-05 10:09:54 +08:00
Asias He
72cb4a228b gossip: Add to_feature_set helper
To convert a "," split feature string to a feature set.
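
A minimal sketch of such a helper, assuming std::string and std::set in place of the project's own types. Skipping empty items is an assumption of the sketch.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Split a comma-separated feature string into a set of feature names.
inline std::set<std::string> to_feature_set(const std::string& features) {
    std::set<std::string> result;
    std::istringstream in(features);
    std::string item;
    while (std::getline(in, item, ',')) {
        if (!item.empty()) {  // assumption: empty items are ignored
            result.insert(item);
        }
    }
    return result;
}
```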
2016-07-05 10:09:54 +08:00
Asias He
1d6c57fb40 gossip: Reduce timeout in shadow round
In 3a36ec33db (gossip: Wait longer for seed node during boot up), we
increased the timeout by a factor of 60, i.e., ring_delay * 60 = 5
seconds * 60 = 5 minutes.

In 57ee9676c2 (storage_service: Fix default ring_delay time), we fixed
the default ring_delay to 30 seconds. Now the timeout is 30 * 60 seconds
= 30 minutes, which is too long.

Make it 5 minutes.
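
The arithmetic above, written out with std::chrono. The function name is hypothetical, and the sketch only reproduces the old computation; it does not show how the patch arrives at the new 5 minute value.

```cpp
#include <cassert>
#include <chrono>

// Old behaviour: shadow round timeout = ring_delay * factor.
inline std::chrono::seconds shadow_round_timeout(std::chrono::seconds ring_delay, int factor) {
    return ring_delay * factor;
}
```

With a ring_delay of 5 seconds the result equals 5 minutes; with the corrected 30 second default it grows to 30 minutes.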
2016-07-05 10:09:54 +08:00
Asias He
88f0bb3a7b gossip: Add check_knows_remote_features
To check if this node knows features in
std::unordered_map<inet_address, sstring> peer_features_string
2016-07-05 10:09:54 +08:00
Asias He
2b53c50c15 gossip: Add get_supported_features
To get features supported by all the nodes listed in the
address/feature map.
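
One way to compute the features common to all nodes is a running set intersection, sketched below with standard containers. The names are hypothetical and addresses are plain strings here. Because std::set_intersection emits its output in sorted order, inserting with result.end() as the hint always hits the right position.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <map>
#include <set>
#include <string>
#include <utility>

using feature_set = std::set<std::string>;

// peers maps a node address to the features that node advertises.
// Returns the features advertised by every node in the map.
inline feature_set common_features(const std::map<std::string, feature_set>& peers) {
    feature_set common;
    bool first = true;
    for (const auto& peer : peers) {
        const feature_set& features = peer.second;
        if (first) {
            common = features;
            first = false;
            continue;
        }
        feature_set result;
        std::set_intersection(common.begin(), common.end(),
                              features.begin(), features.end(),
                              std::inserter(result, result.end()));
        common = std::move(result);
    }
    return common;
}
```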
2016-07-05 10:09:53 +08:00
Asias He
31df4e5316 system_keyspace: Introduce load_peer_features
To get the peer features stored in the system.peers table.
2016-07-05 10:09:53 +08:00
Paweł Dziepak
4acf77d755 sstables: drop unused data_stream_at()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-04 18:17:43 +01:00
Paweł Dziepak
2cdf498bbd sstables: close input stream in sstable::data_read()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-04 18:17:42 +01:00
Paweł Dziepak
8931b939a1 sstables: use finally() to close input streams
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-04 18:17:42 +01:00
Paweł Dziepak
e6ececce7f Merge seastar upstream
Submodule seastar a47f893..c82c36f:
  > reactor: fix build error
  > util: lazy_eval: fix compilation errors related to operator<<()s
    definitions
2016-07-04 18:14:05 +01:00
Duarte Nunes
41843b32c5 thrift: Correctly mark a CF as dense
And store whether the comparator is a composite type in the case of
dynamic CFs.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1467307688-11059-1-git-send-email-duarte@scylladb.com>
2016-07-04 17:40:53 +02:00
Nadav Har'El
c4e871ea2d Work around unexpected data_value constructor
If someone tried to naively use utf8_type->decompose("18wX"), this would
mysteriously fail, returning an empty key.

decompose takes a data_value, so the compiler looked for an implicit
conversion from the string constant (const char*) to data_value. We did
not have such a conversion, only conversion from sstring. But the compiler
chose (backed by the C++ standard, no doubt) to implicitly convert the
const char* to a bool (!), and then use data_value(bool). It did not
convert the const char* to an sstring, nor did it warn about the possible
ambiguity.

So this patch adds a data_value(const char*) constructor, so people will
not fall into the same trap that I fell into...
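
The trap can be reproduced with two small stand-in types. Both structs are hypothetical and std::string stands in for sstring. The pointer-to-bool conversion is a standard conversion, while const char* to a string class is a user-defined one, so the bool constructor is selected unless a const char* overload exists.

```cpp
#include <cassert>
#include <string>

// Stand-in for data_value before the patch: no const char* constructor.
struct value_before {
    std::string kind;
    value_before(bool) : kind("bool") {}
    value_before(const std::string&) : kind("string") {}
};

// Stand-in for data_value after the patch.
struct value_after {
    std::string kind;
    value_after(bool) : kind("bool") {}
    value_after(const std::string&) : kind("string") {}
    value_after(const char*) : kind("string") {}  // the added overload
};

// Both take their argument by value, like a function taking a data_value.
inline std::string kind_before(value_before v) { return v.kind; }
inline std::string kind_after(value_after v) { return v.kind; }
```

Calling kind_before("18wX") selects the bool constructor, while kind_after("18wX") selects the const char* one.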

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1467643462-6349-1-git-send-email-nyh@scylladb.com>
2016-07-04 17:50:53 +03:00
Avi Kivity
e22517bafc Merge "Optimize reads from leveled sstables"
In a leveled column family, there can be many thousands of sstables, since
each sstable is limited to a relatively small size (160M by default).
With the current approach of reading from all sstables in parallel, cpu
quickly becomes a bottleneck as we need to check the bloom filter for each
of these sstables.

This patch addresses the problem by introducing a
compaction-strategy-specific data structure for holding sstables.  This
data structure has a method to obtain the sstables used for a read.

For leveled compaction strategy, this data structure is an interval map,
which can be efficiently used to select the right sstables.
2016-07-04 16:00:35 +03:00
Asias He
610a0f7ef0 storage_service: Skip feature check for seed node for now
When a seed node boots up with more than one node in the seed list, it
will fail to talk to the other seed node which is not up yet.
This fails the feature check, so the seed node will not boot.

Skip the feature check for seed node for now, until we have a proper solution.

Fixes recent dtest failure caused by the seed node failing to boot.

Message-Id: <e1d4110f96817e45f81dc0bc948dd14600fc5333.1467251799.git.asias@scylladb.com>
2016-07-04 15:09:57 +03:00
Avi Kivity
28fab55e6e Merge "Convert sstable writes to streamed mutations" from Paweł
"This series converts sstable writers (including compaction) to streamed
mutations and makes them use consumer-style interface.

Code related to sstable writes and compaction is converted to consumers
that can be used with consume_flattened_in_thread() (which is a variant
of consume_flattened() intended to be run inside a thread).
compact_for_query is improved so that it can be reused by sstable
compaction."
2016-07-04 15:07:47 +03:00
Avi Kivity
171054e87b Merge seastar upstream
* seastar d4d9e16...a47f893 (1):
  > Merge "overprovisioning support"
2016-07-04 13:46:03 +03:00
Paweł Dziepak
5d0de2179a Merge "Adding scylla version API" from Amnon
Amnon says:

The API that returns the version currently returns the compatibility
version (i.e. the compatible origin version, currently 2.1.8).

The check version functionality needs to know the currently running
version of scylla.  For that, a new API was added that returns the current
version.

The result is equivalent to running scylla --version.

After this series, a call to the new API returns:

$ curl -X GET "http://localhost:10000/storage_service/scylla_release_version"
"666.development-20160703.72f0d4d"

Which is the json representation of:

$ ./build/release/scylla --version
666.development-20160703.72f0d4d
2016-07-04 10:52:44 +01:00
Asias He
f6a2672be0 storage_service: Modify log to match config option of scylla
We currently log as follows:

May  9 00:09:13 node3.nl scylla[2546]:  [shard 0] storage_service - This
node was decommissioned and will not rejoin the ring unless
cassandra.override_decommission=true has been set,or all existing data
is removed and the node is bootstrapped again

However, the user should use

   override_decommission:true

instead of

   cassandra.override_decommission:true

in scylla.yaml where the cassandra prefix is stripped.

Fixes #1240
Message-Id: <b0c9424c6922431ad049ab49391771e07ca6fbde.1467079190.git.asias@scylladb.com>
2016-07-04 10:47:49 +02:00
Avi Kivity
76cc0c0cf9 auth: fix performance problem when looking up permissions
data_resource lookup uses data_resource::name(), which uses sprint(), which
uses (indirectly) locale, which takes a global lock.  This is a bottleneck
on large machines.

Fix by not using name() during lookup.

Fixes #1419
Message-Id: <1467616296-17645-1-git-send-email-avi@scylladb.com>
2016-07-04 10:26:18 +02:00
Yoav Kleinberger
49cba035ea tools/scyllatop: leave terminal in a functioning state when user quits with CTRL-C
closes issue #1417.

Signed-off-by: Yoav Kleinberger <yoav@scylladb.com>
Message-Id: <1467556769-11851-1-git-send-email-yoav@scylladb.com>
2016-07-03 17:43:46 +03:00
Amnon Heiman
e66a1cd705 API: Add implementation for the scylla release version
This adds the implementation to the scylla release version API.

After this patch a call to:

curl -X GET "http://localhost:10000/storage_service/scylla_release_version"

Will return the current scylla release version.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-07-03 16:29:09 +03:00
Amnon Heiman
56ea8c943e API: add scylla release version API
This adds a definition for the scylla release version API. The API already
returns the compatibility version (i.e. the compatible origin version).

This definition returns the scylla version; a call to the API should
return the same result as running scylla --version.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-07-03 16:26:21 +03:00
Avi Kivity
68e613b313 Rebuild _column_family::_sstables when changing compaction_strategy
The concrete sstable_set type depends on the compaction strategy, so
ask the compaction_strategy to create a new sstable_set object and populate
it.
2016-07-03 13:42:10 +03:00
Avi Kivity
44a6cef4e1 sstable mutation readers: use sstable_set::select()
Apply compaction strategy specific logic to narrow down the set of sstables
used for a query; can speed up reads using LeveledCompactionStrategy
significantly.

Fixes #1185.
2016-07-03 10:50:58 +03:00
Avi Kivity
4cb7618601 Convert column_family::_sstables to sstable_set
Using sstable_set will allow us to filter sstables during a query before
actually creating a reader (this is left to the next patch; here we just
convert the users of the _sstables field).
2016-07-03 10:32:27 +03:00
Avi Kivity
c8237fc262 compaction_strategy: introduce make_sstable_set()
Allow compaction_strategy to create a container for sstables that is
optimized for the strategy.

Most compaction_strategies return bag_sstable_set; leveled compaction
returns the specialized partitioned_sstable_set.
2016-07-03 10:27:01 +03:00
Avi Kivity
168696c558 Introduce partitioned_sstable_set
partitioned_sstable_set assumes that sstables are mostly partitioned along
the token range: only a few sstables will be needed to access a particular
token.  It is implemented as an interval_map.
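
The effect of such a selection can be sketched with a plain range filter. This is a linear-scan stand-in for the interval_map, not the actual data structure, and the struct, its fields and the function name are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical description of one sstable: the token range it covers.
struct sstable_range {
    std::string name;
    std::int64_t first_token;
    std::int64_t last_token;  // inclusive
};

// Return the names of the sstables whose token range contains the token.
// An interval map answers the same question without scanning every entry.
inline std::vector<std::string> select_for_token(const std::vector<sstable_range>& sstables,
                                                 std::int64_t token) {
    std::vector<std::string> selected;
    for (const auto& s : sstables) {
        if (s.first_token <= token && token <= s.last_token) {
            selected.push_back(s.name);
        }
    }
    return selected;
}
```

When the sstables are mostly disjoint along the token range, the selected list stays short even if the full set holds thousands of entries.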
2016-07-03 10:27:00 +03:00
Avi Kivity
64e4357461 Introduce bag_sstable_set
bag_sstable_set is a generic sstable_set implementation: it assumes nothing
about the sstables.  It is implemented as a vector, and any select will
return the entire sstable set.
2016-07-03 10:27:00 +03:00
Avi Kivity
85e9cf4616 Introduce sstable_set
sstable_set abstracts the notion of a container of sstables, allowing
different compaction strategies to supply their own implementation.  The
intended user is leveled compaction strategy; since it partitions sstables,
it can quickly restrict the number of sstables that participate in a query
by looking at the min/max partition key.

sstable_set also maintains an internal lw_shared_ptr<sstable_list>,
in parallel with the abstract container.  This is to support
column_family::get_sstable(), which returns a lw_shared_ptr<sstable_list>
which must be anchored somewhere if it is not saved at the caller side,
as it isn't in most current callers.
2016-07-03 10:27:00 +03:00
Avi Kivity
c1815abd15 Introduce compatible_ring_position
ring_position is built for modern code that does not require default
constructors or stateless comparators.  But not all code is modern, so
supply a compatible_ring_position that works with old code, at the cost
of some extra storage.  Intended user is boost's interval container
library.
2016-07-03 10:27:00 +03:00
Avi Kivity
2a46410f4a Change sstable_list from a map to a set
sstable_list is now a map<generation, sstable>; change it to a set
in preparation for replacing it with sstable_set.  The change simplifies
a lot of code; the only casualty is the code that computes the highest
generation number.
2016-07-03 10:26:57 +03:00
Duarte Nunes
386c0dd4b2 storage_proxy: Correctly calculate new limit
This patch fixes a bug where we would always return query::max_rows
when calculating the new limit for a retry read command.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1467289746-18177-1-git-send-email-duarte@scylladb.com>
2016-06-30 14:49:56 +02:00
Paweł Dziepak
b150720361 sstable: enable read ahead
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-30 13:18:24 +01:00
Paweł Dziepak
4513f8b52c sstables: add compressed_file_data_source_impl::close()
compressed_file_data_source_impl should close the underlying data source
properly when asked to.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-30 13:07:07 +01:00
Paweł Dziepak
55a6911d7a sstables: close input_stream<> properly
If read ahead is going to be enabled it is important to close
input_stream<> properly (and wait for completion) before destroying it.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-30 11:39:01 +01:00
Paweł Dziepak
e44e12c74a sstables: drop no longer needed code
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-30 11:39:01 +01:00
Paweł Dziepak
c2f0ee9b5f sstables: add consumer-style sstable compactor
This patch moves compaction logic to a consumer that can be used with
consume_flattened_in_thread(). Internally, sstable_writer is used to
write individual sstables.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-30 11:39:01 +01:00
Paweł Dziepak
18a9ee105f sstables: add consumer-style sstable writer
sstable_writer encapsulates all logic related to writing sstable.
Previously introduced component_writer is used to write actual
mutations. sstable_writer is intended to be used with
consume_flattened_in_thread(). Its purpose is to be used by higher-level
consumer that needs to write possibly more than one sstable (sstable
compaction is an example of such consumer).

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-30 11:39:01 +01:00
Paweł Dziepak
0e8b8463ba sstables: introduce consumer-style components writer
This patch rewrites do_write_components() so that it can use
consume_flattened_in_thread(). All components-writing code is moved to a
new consumer: component_writer.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-30 11:39:01 +01:00
Paweł Dziepak
0287e0c9ac mutation_reader: add consume_flattened_in_thread()
This is a version of consume_flattened() intended to be run inside a
thread. All consumer code is going to be invoked in the same thread
context.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-30 11:39:01 +01:00
Paweł Dziepak
7a95847014 mutation_compactor: prepare for sstable compaction
compact_mutation code is going to be shared among queries and sstable
compaction. There are some differences though. Queries don't provide
_max_purgeable and sstable compaction doesn't need any limits.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-30 11:39:01 +01:00
Paweł Dziepak
00bcc05d36 mutation_compactor: _max_purgeable depends on the decorated key
_max_purgeable can be different for each partition, since it is computed
using sstables in which that partition is present (or likely to be
present).

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-30 11:39:01 +01:00
Paweł Dziepak
4133cc7a53 mutation_reader: make consume_flattened() produce decorated keys
Since decorated keys are already computed it is better to pass more
information than less. Consumers interested just in partition key can
just drop token and the ones requiring full decorated key don't need to
recompute it.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-30 11:39:00 +01:00
Paweł Dziepak
fe4b739828 mutation_compactor: rename compact_for_query to compact_mutation
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-30 11:37:54 +01:00
Paweł Dziepak
3e86f9ab73 mutation_partition: extract compact_for_query to a separate header
The compacting logic inside compact_for_query is going to be shared with
sstable compaction.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-30 11:37:54 +01:00
Paweł Dziepak
9b14c93677 streamed_mutation: return reference to decorated key
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-30 11:37:54 +01:00
Paweł Dziepak
3c08ffb275 query: add full_slice
query::full_slice is a partition slice which has full clustering row
ranges for all partition keys and no per-partition row limit.
Options and columns are not set.

It is used as a helper object in cases when a reference to
partition_slice is needed but the user code needs just all data there is
(an example of such case would be sstable compaction).

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-30 11:37:54 +01:00
Paweł Dziepak
599ed7f1ed sstables: restore indentation
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-30 11:37:54 +01:00
Paweł Dziepak
e7ff20b3bb sstables: run compaction code inside a thread
Currently, each sstable write has its separate thread. However, the goal
is to have compaction use consume_flattened() with a consumer that
creates and writes the sstables. consume_flattened() needs to be executed
inside a thread, since sstable writer may defer.

This patch is a first step in preparations and it just makes whole
compaction logic run inside a thread. That makes little sense now, since
all sstable writes spawn their own threads but that's going to change
in the following patches.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-30 11:37:54 +01:00
Duarte Nunes
0ae6eafadd query: Make partition_limit last parameter
The partition_limit should have been added to the end of the ctor
argument list, as its current placement causes some callers to pass it
the timestamp instead of the limit.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1467239360-6853-3-git-send-email-duarte@scylladb.com>
2016-06-30 12:31:11 +02:00
Gleb Natapov
8bf82cc31c put additional info into cql timeout exception
Fixes #1397

Message-Id: <20160628101829.GR14658@scylladb.com>
2016-06-30 12:03:48 +02:00
Paweł Dziepak
b70bf086b7 frozen_mutation: handle reversed streams properly
Freezing streamed_mutations assumed that mutation fragments are streamed
in the order they appear in the frozen mutation. That is not true for
reversed streams.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1467277069-18702-1-git-send-email-pdziepak@scylladb.com>
2016-06-30 11:26:45 +02:00
Avi Kivity
9ac730dcc9 mutation_reader: make restricting_mutation_reader even more restricting
While limiting the number of concurrently executing sstable readers reduces
our memory load, the queued readers, although consuming a small amount of
memory, can still grow without bounds.

To limit the damage, add two limits on the queue:
 - a timeout, which is equal to the read timeout
 - a queue length limit, which is equal to 2% of the shard memory divided
   by an estimate of the queued request size (1kb)

Together, these limits bound the amount of memory needed by queued disk
requests in case the disk can't keep up.
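
A sketch of the queue length limit and of the admission decision it implies. The names are hypothetical, the timeout is left out, and the model is single-threaded (one gate per shard). The 2% and 1 KB figures come from the description above.

```cpp
#include <cassert>
#include <cstddef>

// Queue length limit: 2% of shard memory divided by an estimated 1 KB
// per queued request.
inline std::size_t max_queued_reads(std::size_t shard_memory_bytes) {
    const std::size_t estimated_request_bytes = 1024;
    return shard_memory_bytes / 50 / estimated_request_bytes;  // 2% == 1/50
}

enum class admission { run, queued, rejected };

// Counts running and queued reads; rejects once the queue is full.
class read_gate {
public:
    read_gate(std::size_t max_active, std::size_t max_queued)
        : _max_active(max_active), _max_queued(max_queued) {}

    admission enter() {
        if (_active < _max_active) { ++_active; return admission::run; }
        if (_queued < _max_queued) { ++_queued; return admission::queued; }
        return admission::rejected;
    }

    // A running read finished: hand its slot to a queued read if there is one.
    void leave() {
        assert(_active > 0);
        if (_queued > 0) { --_queued; } else { --_active; }
    }

    std::size_t active() const { return _active; }
    std::size_t queued() const { return _queued; }

private:
    std::size_t _max_active;
    std::size_t _max_queued;
    std::size_t _active = 0;
    std::size_t _queued = 0;
};
```

For a shard with 1 GiB of memory the queue limit works out to 20971 requests.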
Message-Id: <1467206055-30769-1-git-send-email-avi@scylladb.com>
2016-06-29 15:17:35 +02:00
Raphael S. Carvalho
85cb2a6d35 database: trigger compaction on boot
At the moment, we only trigger compaction after creating a new
sstable as a result of memtable flush, or some other event such
as changing compaction strategy of a column family.
However, it's important to trigger compaction on boot too.
That will happen after loading all column families.

Fixes #1404.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <54d38a418157454eec97aaba6b8a6b6e51484db4.1467135349.git.raphaelsc@scylladb.com>
2016-06-29 13:47:42 +03:00
Amnon Heiman
610fe274fd services: Make scylla-jmx service depends on scylla-server
scylla-jmx no longer shuts itself down. A better setup would be for
it to be started when scylla-server starts and to
shut down when scylla-server shuts down.

This patch does the scylla-server part of the change.

The scylla-server definition would Want the scylla-jmx.service so there
is no need to enable the scylla-jmx.service.

A patch to scylla-jmx would cause it to shut down when scylla-server
shuts down.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1467184502-4358-1-git-send-email-amnon@scylladb.com>
2016-06-29 11:36:04 +03:00
Avi Kivity
2c4501f317 Merge seastar upstream
* seastar c15055c...d4d9e16 (4):
  > semaphore: switch to chunked_fifo
  > fair_queue: add missing include
  > chunked_fifo: implement back()
  > Chunked FIFO queue
2016-06-28 19:30:29 +03:00
Avi Kivity
1b448877d7 Merge " thrift: Implement CQL over thrift" from Duarte
"This patchset implements the CQL over thrift verbs. Only CQL3 is supported,
and the CQL2 verbs are disabled."
2016-06-28 13:36:12 +03:00
Piotr Jastrzebski
59d0d9e666 Fix cache_tracker::clear
Make sure that artificial entries for
all column families are set to non continuous.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <f9e517fe40482c05f6c388faab7d6b9eca6b159e.1467103548.git.piotr@scylladb.com>
2016-06-28 11:18:23 +02:00
Piotr Jastrzebski
27575a0528 Fix previous_entry_is_continuous
Rename it to check_previous_entry.
Remove unnecessary test.
Make sure ring_position always has working relation_to_keys method.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <6bc790d492ba9b5c302a50218f3e26b924f657d0.1467101754.git.piotr@scylladb.com>
2016-06-28 10:27:08 +02:00
Piotr Jastrzebski
68e5a199e9 Clean continuous flag of cache entry
preceding invalidated decorated key even
when it's not found.

Add test.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <c7b8f4df37256363bf304e0396f84b5f37921b81.1467059472.git.piotr@scylladb.com>
2016-06-28 10:26:02 +02:00
Piotr Jastrzebski
cd9f3f94c4 Fix row_cache::update
Clear continuous flag on the last cache entry
with key smaller than a partition being dropped from
memtable on flush and not saved in cache.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <0b5293cc0bf8bb858e62aa8dd00ae7fe7a484380.1467059472.git.piotr@scylladb.com>
2016-06-28 10:25:38 +02:00
Piotr Jastrzebski
eb959a8b81 Change check for artificial entry
in cache_entry destructor from
_key.has_key() to _lru_link.is_linked()

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <f6d3d1bc49d9f6dd5b67a10cbe862466047b039d.1467059472.git.piotr@scylladb.com>
2016-06-28 10:24:29 +02:00
Nadav Har'El
164c760324 Switch compression chunk default from 64 KB to 4 KB
Following Cassandra, our default sstable compression chunk size is 64 KB.
The big downside of this default size is that small reads need to read
and uncompress a large chunk, around 32 KB (if compression halves the data
size). In this patch we switch the default chunk size to 4 KB, which allows
faster small reads (the report in issue #1337 was of a 60-fold speedup...).

Since commit 2f56577, large reads will not be significantly slowed down by
changing to a small chunk size. The remaining potential downside of this
change is lowering of the compression ratio because of the smaller chunks
individually compressed. However, experimentation shows that the compression
ratio is hurt somewhat, but not dramatically, by lowering the chunk size:
A recent survey of Cassandra compression in
https://www.percona.com/blog/2016/03/09/evaluating-database-compression-methods/
reports a compression ratio of 2 for 64 KB chunks, vs. 1.75 for 4 KB chunks.
My own test on a cassandra-stress workload (whose data is relatively hard
to compress), showed compression ratio 1.25 for 64 KB chunk, vs. 1.23 for
4 KB chunks.
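
The figures above can be checked with a little arithmetic. The helper names are hypothetical; the inputs are the chunk sizes and compression ratios quoted in this message.

```cpp
#include <cassert>

// Compressed bytes that must be read and uncompressed to serve a small read
// that falls inside a single chunk.
inline double compressed_bytes_per_chunk(double chunk_bytes, double compression_ratio) {
    return chunk_bytes / compression_ratio;
}

// On-disk size with the new ratio relative to the old one
// (1.0 means unchanged, larger means more disk space used).
inline double relative_disk_usage(double old_ratio, double new_ratio) {
    return old_ratio / new_ratio;
}
```

A 64 KB chunk at ratio 2 means reading 32 KB per small read. Going from ratio 2 to 1.75 costs roughly 14% more disk space, and going from 1.25 to 1.23 costs under 2%.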

Also remember that if a user wants to control the chunk length for a
particular table, he can - the 64 KB or 4 KB sizes are just the default.

Fixes #1337

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1467063335-12096-1-git-send-email-nyh@scylladb.com>
2016-06-28 08:50:24 +03:00
Tomasz Grabiec
6108d91362 scylla-gdb: Introduce scylla ptr
Helps in identifying pointers allocated through the seastar
allocator. Shows to which thread the pointer belongs, to which size
class, whether it's live or free, and the offset relative to the
live object.

Example:

  (gdb) scylla ptr 0x6040abe88170
  thread 1, small (size <= 320), live (0x6040abe88140 +48)

Message-Id: <1467047215-1763-1-git-send-email-tgrabiec@scylladb.com>
2016-06-27 20:11:56 +03:00
Avi Kivity
22ec25b1b3 Merge seastar upstream
* seastar 3029ebe...c15055c (5):
  > memory: add option to mlock() all memory
  > reactor: run idle poll handler with a pure poll function
  > ignore all but one failed futures in map_reduce
  > tutorial: more general exception printout on startup
  > resource: don't abort on too-high io queue count

Fixes #1395.
Fixes #1400.
2016-06-27 19:24:04 +03:00
Tomasz Grabiec
85a37cb379 Merge tag '1398/v3' from https://github.com/avikivity/scylla
From Avi:

Both the cql binary transport and the rpc server have protection against
too many concurrent requests overwhelming the database due to transient
allocations.  There work by estimating the amount of memory a request
requires, and accounting that against a semaphore.  When the semaphore
blocks, we stop dequeing requests from the tcp connection.

Unfortunately, this doesn't work for reads, because we can't estimate the
required memory size.  A small read request can require many sstables to be
read, perhaps concurrently, and a large response to be generated.

Fix by limiting the number of concurrent reads in a shard to 100.  This
is more than enough concurrency for any reasonable disk, and there is no
network communication at this level, so we're safe from high network
latency requiring high concurrency.

Fixes #1398.
2016-06-27 18:04:33 +02:00
Avi Kivity
f03cd6e913 db: add statistics about queued reads
2016-06-27 17:25:08 +03:00
Avi Kivity
edeef03b34 db: restrict replica read concurrency
Since reading mutations can consume a large amount of memory, which, moreover,
is not predictable at the time the read is initiated, restrict the number
of reads to 100 per shard.  This is more than enough to saturate the disk,
and hopefully enough to prevent allocation failures.

Restriction is applied in column_family::make_sstable_reader(), which is
called either on a cache miss or if the cache is disabled.  This allows
cached reads to proceed without restriction, since their memory usage is
supposedly low.

Reads from the system keyspace use a separate semaphore, to prevent
user reads from blocking system reads.  Perhaps we should select the
semaphore based on the source of the read rather than the keyspace,
but for now using the keyspace is sufficient.
2016-06-27 17:17:56 +03:00
Avi Kivity
bea7d7ee94 mutation_reader: introduce restricting_reader
A restricting_reader wraps a mutation_reader, and restricts its concurrency
using a provided semaphore; this allows controlling read concurrency, which
is important since reads can consume a lot of resources ((number of
participating sstables) * 128k after we have streaming mutations, and a lot
more before).
2016-06-27 17:17:52 +03:00
Duarte Nunes
d31b52a07b thrift: Disable CQL2 verbs
And make set_cql_version a no-op.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-27 15:39:33 +02:00
Duarte Nunes
60094f4033 thrift: Implement execute_prepared_cql3_query verb
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-27 15:39:28 +02:00
Duarte Nunes
96068084ca thrift: Implement prepare_cql3_query verb
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-27 15:39:22 +02:00
Duarte Nunes
c8afb4cc46 query_processor: Support thrift prepared statements
This patch adds support for thrift prepared statements. It specializes
the result_message::prepared into two types:
result_message::prepared::cql and result_message::prepared::thrift, as
their identifiers have different types.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-27 15:39:02 +02:00
Paweł Dziepak
1addbb9c1d thrift: implement execute_cql3_query
Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-27 15:38:52 +02:00
Duarte Nunes
2e7cb32601 query_options: Adjust value_views after prepare()
query_options::prepare() changes the values array, but this is not the
one used by query_options internally (e.g., in get_value_at). So we
need to also recalculate the value_views after prepare() is called.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-27 15:24:27 +02:00
Duarte Nunes
2683a49c69 query_options: Remove value_views arg from ctor
Having both the values and value_views arguments in the query_options
ctor is confusing, since query_options uses only the value_views field
but that is not communicated to the caller.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-27 15:24:27 +02:00
Duarte Nunes
62cfc4ab55 thrift: Add with_exn_cob helper function
Similarly to the with_cob functions, this one takes the exn_cob
function and ensures it is called in case of an exception. This
is useful when the return type of the thrift verb is not nothrow
move constructible; by holding on to the cob inside the verb and
calling it directly when we have the result we avoid having to
wrap it in a smart pointer.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-27 15:24:27 +02:00
Duarte Nunes
b74ee6fdea thrift: Add consistency level conversion
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-27 15:24:27 +02:00
Paweł Dziepak
0c441378f2 client_state: support thrift clients
Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-27 15:24:27 +02:00
Paweł Dziepak
002d2bc353 thrift: pass query_processor to the thrift handler
Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-27 15:24:27 +02:00
Duarte Nunes
225c5be78e thrift: Add query_state to thrift_handler
This patch adds a query_state object to the thrift handler,
as it is required for CQL3 operations.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-27 15:24:27 +02:00
Avi Kivity
f96e5d7c1b managed_bytes: fix build with gcc 6
gcc 6 complains that deleting a managed_bytes::external isn't defined
because the size isn't known.  I'm not sure it's correct, but there's no
way to tell because flexible arrays aren't standardized.

Fix by using an array of zero size.
Message-Id: <1466715187-4125-1-git-send-email-avi@scylladb.com>
2016-06-27 10:54:10 +02:00
Avi Kivity
056b427855 range_tombstone_list: use non-template lambda for cloning tombstones
Using a template lambda invokes a bug in Fedora 24's boost where the
lambda's parameter is an internal boost type rather than a range_tombstone.

Constraining the parameter with an explicit type avoids the problem.
Message-Id: <1466844211-17298-1-git-send-email-avi@scylladb.com>
2016-06-27 10:48:59 +02:00
Amnon Heiman
a439a6b8d3 API: Add the collectd enable/disable implementation
This adds the implementation to the enable and disable of the collectd
metrics.

An example for disabling all collectd metrics that have write in their
type_instance part:

curl -X POST --header "Content-Type: application/json" --header "Accept: application/json" \
"http://localhost:10000/collectd/.*?instance=.*&type=.*&type_instance=.*write.*&enable=false"

After that a call to:
curl -X GET "http://localhost:10000/collectd/"

Would return those metrics with the enable set to "false"

An example of enabling all the metrics in cache whose type starts
with byt:

curl -X POST --header "Content-Type: application/json" --header "Accept: application/json" \
"http://localhost:10000/collectd/cache?type=byt.*&enable=true"
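
The request URLs above can also be assembled programmatically. The following is a minimal sketch in Python; the helper name collectd_toggle_url and its parameters are hypothetical and only reproduce the URL shape shown in the curl examples. It builds the string and does not send a request.

```python
from urllib.parse import urlencode


def collectd_toggle_url(plugin, enable, base="http://localhost:10000", **filters):
    """Build the POST URL that enables or disables matching collectd metrics.

    `plugin` and the values in `filters` may be regular expressions, as in the
    commit message; '.' and '*' are kept unescaped so they stay readable.
    """
    query = dict(filters, enable="true" if enable else "false")
    return f"{base}/collectd/{plugin}?{urlencode(query, safe='.*')}"


# Same URLs as the two curl examples in the commit message.
disable_writes = collectd_toggle_url(
    ".*", False, instance=".*", type=".*", type_instance=".*write.*")
enable_cache_bytes = collectd_toggle_url("cache", True, type="byt.*")
```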

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1466932139-19264-3-git-send-email-amnon@scylladb.com>
2016-06-26 12:26:50 +03:00
Amnon Heiman
4d7837af40 API Definition: collectd to support enable disable
This adds to the definition of the collectd API the ability to turn on
and off specific collectd metrics.

For the GET endpoint, a POST option was added that allows enabling or
disabling a metric.

The general GET endpoint now returns the enable flag, which indicates whether
the metric is enabled.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1466932139-19264-2-git-send-email-amnon@scylladb.com>
2016-06-26 12:26:48 +03:00
Duarte Nunes
dfbf68cd24 commitlog: Define operator<< in namespace db
Needed for compilation with gcc6.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1466852874-8448-1-git-send-email-duarte@scylladb.com>
2016-06-26 10:05:28 +03:00
Avi Kivity
5b81448ed6 main: add scylla --version option
Fixes #1384.
Message-Id: <1466691517-29964-1-git-send-email-avi@scylladb.com>
2016-06-23 16:24:03 +02:00
Duarte Nunes
1ffae6e6ee database_test: Add test case for row limit
This patch introduces database_test and adds a test case to ensure
the row limit is respected when querying multiple partition ranges.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20160623111723.17523-1-duarte@scylladb.com>
2016-06-23 14:20:34 +02:00
Avi Kivity
e647ec1c4a Merge "thrift: Implement describe verbs" from Duarte
"This patchset implements the thrift describe verbs:
    - describe_keyspace
    - describe_keyspaces
    - describe_cluster_name
    - describe_version
    - describe_ring
    - describe_local_ring
    - describe_token_map
    - describe_partitioner
    - describe_snitch
    - describe_schema_versions

The verbs describe_splits and describe_splits_ex are not implemented
because they are marked as experimental (Origin's thrift interface has
this to say about them: "experimental API for hadoop/parallel query
support. may change violently and without warning."). Some drivers have
moved away from depending on this verb (SPARKC-94). The correct way to
implement the verbs for us would be to use the size_estimates system table
(CASSANDRA-7688). However, we currently don't populate size_estimates, which
is done by SizeEstimatesRecorder.java in Origin."
2016-06-23 13:30:39 +03:00
Duarte Nunes
b291c22e39 thrift: Complete describe_keyspace verb
This patch completes the describe_keyspace verb by setting the
remaining fields.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-23 12:02:47 +02:00
Duarte Nunes
febc48166d thrift: Type name is already based on Origin
This patch removes a conversion function from an internal type
name to Origin's naming, which isn't needed because the
abstract_type hierarchy already keeps that mapping.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-23 12:02:47 +02:00
Duarte Nunes
8b00fe3989 thrift: Add explanatory note about describe_splits
We don't implement describe_splits, and this patch explains why that
is. In a nutshell, to properly implement this, we would need something
like Origin's SizeEstimatesRecorder.java, but as the verb is marked as
experimental, we don't do it for now.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-23 12:02:46 +02:00
Duarte Nunes
b175204cfe thrift: Implement describe_snitch verb
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-23 11:53:47 +02:00
Duarte Nunes
9e6ab878d6 thrift: Implement describe_partitioner verb
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-23 11:53:47 +02:00
Duarte Nunes
358b03c409 thrift: Implement describe_token_map verb
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-23 11:53:47 +02:00
Duarte Nunes
1ea7102d9f thrift: Implement describe_ring verbs
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-23 11:53:42 +02:00
Duarte Nunes
8377264226 thrift: Implement describe_version verb
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-23 11:53:10 +02:00
Duarte Nunes
8370450dcb thrift: Implement describe_cluster_name verb
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-23 11:53:09 +02:00
Duarte Nunes
2a898743c6 thrift: Implement describe_schema_versions verb
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-23 11:53:07 +02:00
Pekka Enberg
c464e85a3f Merge "thrift: Implement DDL verbs" from Duarte
"This patchset implements the thrift DDL verbs:
    - system_add_column_family
    - system_drop_column_family
    - system_update_column_family
    - system_add_keyspace
    - system_drop_keyspace
    - system_update_keyspace"
2016-06-23 12:46:58 +03:00
Duarte Nunes
3c02af083c thrift: Implement system_update_keyspace verb
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-23 11:41:29 +02:00
Duarte Nunes
aa16c303ca thrift: Implement system_drop_keyspace verb
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-23 11:41:29 +02:00
Duarte Nunes
8ff3fbe916 thrift: Implement system_drop_column_family verb
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-23 11:41:29 +02:00
Duarte Nunes
f6fab027c6 thrift: Implement system_update_column_family verb
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-23 11:41:28 +02:00
Duarte Nunes
de46653036 thrift: Implement system_add_column_family verb
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-23 11:41:28 +02:00
Duarte Nunes
25a8ffb09a thrift: Extract keyspace_from_thrift function
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-23 11:41:28 +02:00
Duarte Nunes
74cb796de7 thrift: Extract schema_from_thrift function
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-23 11:41:28 +02:00
Duarte Nunes
9d85ea6304 thrift: Complete system_add_keyspace verb
This patch completes the system_add_keyspace verb by setting all
relevant options on the new schemas.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-23 11:41:28 +02:00
Duarte Nunes
05f7a6d63e thrift: Add basic support for dynamic CF
In thrift, a static column family is one where all columns are
defined upon schema creation. It maps to a CQL table with a singular
partition key and a set of regular columns.

On the other hand, a dynamic column family is one which allows columns
to be dynamically added by insertion requests. It maps to a CQL table
with a partition key and a clustering key, which will hold the names of
the dynamic columns, and a regular column, which will hold the respective
values. If the thrift comparator type is composite, then there will be a
clustering column for each of the composite's components.
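
The mapping described above can be sketched roughly as follows. This is an illustration in Python, not the patch's code; the function cql_layout and the column names key, column1..N and value are assumptions chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class CqlLayout:
    partition_key: list
    clustering_key: list = field(default_factory=list)
    regular_columns: list = field(default_factory=list)


def cql_layout(declared_columns, comparator_components=1):
    """Map a thrift column family onto a CQL-style table layout.

    declared_columns: column names fixed at schema creation; an empty list
    means the column family is dynamic.
    comparator_components: number of components in the thrift comparator
    (more than one for a composite comparator).
    """
    if declared_columns:
        # Static CF: one partition key plus the declared regular columns.
        return CqlLayout(["key"], [], list(declared_columns))
    # Dynamic CF: clustering columns hold the dynamic column names (one per
    # comparator component) and a single regular column holds the values.
    clustering = [f"column{i + 1}" for i in range(comparator_components)]
    return CqlLayout(["key"], clustering, ["value"])
```

Mixed column families, which the message leaves as future work, are not covered by this sketch.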

There can also be mixed column families; supporting those is future
work.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-23 11:41:28 +02:00
Duarte Nunes
49b8bff21c thrift: Extract make_exception to common header
This patch moves the make_exception function from thrift/handler.cc to
the new header file thrift/utils.hh.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-23 11:40:52 +02:00
Benoît Canet
e37b18b231 scylla_ntp_setup: Define an ntp server on ubuntu if there is none
The pool directive from ntp.conf is not recognized by ntpdate.
Strip it and put the ubuntu server in place.

Fixes: #1345

Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1466607457-14029-1-git-send-email-benoit@scylladb.com>
2016-06-23 12:40:13 +03:00
Benoît Canet
8b6bb0251d README.md: Fix markdown formatting
I suspect wrong formatting causes us trouble in the
docker hub descriptions.

Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1466603787-13423-1-git-send-email-benoit@scylladb.com>
2016-06-23 12:39:04 +03:00
Duarte Nunes
aacc7193f2 schema: Replace keyspace's schema_ptr on CF update
This patch ensures we replace the schema_ptr held by its respective
keyspace object when a column family is being updated.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20160623085710.26168-1-duarte@scylladb.com>
2016-06-23 11:11:52 +02:00
Tzach Livyatan
3fa7bb1292 scylla_setup: Ignore case in prompt responses
Fix #1376 by converting each response to lowercase.

Signed-off-by: Tzach Livyatan <tzach@scylladb.com>
Message-Id: <1466672539-5625-1-git-send-email-tzach@scylladb.com>
2016-06-23 12:08:26 +03:00
Glauber Costa
e08fa7dafa fix potential stale data in cache update
We currently have a problem in update_cache that can be triggered by ordering
issues related to memtable flush termination (not initiation) and/or
update_cache() call duration.

That issue is described in #1364 and, in short, happens if a call to
update_cache starts before an ongoing call finishes. There is now a new SSTable
that should be consulted by the presence checker but is not.

The partition checker operates on a stale list because we need to make sure the
SSTable we just wrote is excluded from it.  This patch changes the partition
checker so that all SSTables currently in use are consulted, except for the one
we have just flushed. That provides both the guarantee that we won't check our
own SSTable and access to the most up-to-date SSTable list.
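
A minimal sketch of the idea, in Python and under simplifying assumptions: an SSTable is modelled as an object with a set of keys, and make_presence_checker and current_sstables are hypothetical names, not Scylla's API. The point is that the list is looked up at check time, so it is never stale, while the just-flushed SSTable is skipped.

```python
class SSTable:
    def __init__(self, name, keys):
        self.name = name
        self.keys = set(keys)


def make_presence_checker(current_sstables, just_flushed):
    """Return a predicate telling whether `key` may exist in another SSTable.

    current_sstables: callable returning the live SSTable list when invoked.
    just_flushed: the SSTable we have just written, which must be ignored.
    """
    def may_be_present(key):
        return any(key in sst.keys
                   for sst in current_sstables()
                   if sst is not just_flushed)
    return may_be_present


old = SSTable("old", ["k1"])
flushed = SSTable("flushed", ["k2"])
live = [old, flushed]
check = make_presence_checker(lambda: live, flushed)
# An SSTable added by a concurrent flush after the checker was created.
live.append(SSTable("new", ["k3"]))
```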

Fixes #1364

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <fa1cee672bba8e21725c6847353552791225295f.1466534499.git.glauber@scylladb.com>
2016-06-23 10:54:44 +02:00
Pekka Enberg
bcba45f546 Merge "Prevent old node to join new cluster" from Asias
Fixes #1253
2016-06-23 10:25:38 +03:00
Piotr Jastrzebski
9b011bff18 row_cache: add contiguity flag to cache entry to reduce disk IO during scans
Add contiguity flag to cache entry and set it in scanning reader.
Partitions fetched during scanning are continuous
and we know there's nothing between them.

Clear contiguity flag on cache entries
when the succeeding entry is removed.

Use the contiguity flag in range queries.
Don't go to disk if we know that there's nothing
between two entries we have in cache. We know that
when the contiguity flag of the first one is set to true.
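
The behaviour described above can be illustrated with a small sketch. It is written in Python with hypothetical names (CacheEntry, scan, remove) and integer keys; the real cache structures are not shown in this message.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    key: int
    # True means nothing exists on disk between this entry and the next one.
    continuous: bool = False


def scan(cache, read_disk_between):
    """Return all keys in order, reading disk only for gaps of unknown content.

    cache: list of CacheEntry sorted by key.
    read_disk_between(lo, hi): sorted keys on disk strictly between lo and hi.
    """
    out = []
    for cur, nxt in zip(cache, cache[1:]):
        out.append(cur.key)
        if not cur.continuous:
            out.extend(read_disk_between(cur.key, nxt.key))
    if cache:
        out.append(cache[-1].key)
    return out


def remove(cache, index):
    """Evict an entry; its predecessor can no longer claim an empty gap."""
    if index > 0:
        cache[index - 1].continuous = False
    del cache[index]
```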

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <72bae432717037e95d1ac9465deaccfa7c7da707.1466627603.git.piotr@scylladb.com>
2016-06-23 09:43:15 +03:00
Avi Kivity
5af22f6cb1 main: handle exceptions during startup
If we don't, std::terminate() causes a core dump, even though an
exception is sort-of-expected here and can be handled.

Add an exception handler to fix.

Fixes #1379.
Message-Id: <1466595221-20358-1-git-send-email-avi@scylladb.com>
2016-06-23 09:25:33 +03:00
Avi Kivity
a192c80377 gdb: fully-qualify type names
gdb gets confused if a non-fully-qualified class name is used when
we are in some namespace context.  Help it out by adding a :: prefix.
Message-Id: <1466587895-8690-1-git-send-email-avi@scylladb.com>
2016-06-22 12:04:17 +02:00
Avi Kivity
9dacd4fb80 Merge "query: Add new limits" from Duarte
This patchset adds two new types of query limits:

 - Per partition row limit, which limits how many rows
   a given partition may return; needed both for thrift
   and for future CQL features;
 - Limit on the number of partitions returned, needed
   by thrift.
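
How the two new limits could combine with an overall row limit is sketched below. This is an assumption-laden Python illustration (apply_limits and its parameter names are hypothetical), not the query code itself.

```python
def apply_limits(partitions, row_limit, per_partition_limit, partition_limit):
    """Trim a result to respect all three limits.

    partitions: iterable of (partition_key, rows) in query order.
    row_limit: maximum number of rows overall.
    per_partition_limit: maximum number of rows taken from one partition.
    partition_limit: maximum number of partitions returned.
    """
    result = []
    rows_left = row_limit
    for key, rows in partitions:
        if len(result) >= partition_limit or rows_left <= 0:
            break
        taken = rows[:min(per_partition_limit, rows_left)]
        if not taken:
            continue  # an empty partition does not count against the limit
        rows_left -= len(taken)
        result.append((key, taken))
    return result


data = [("a", [1, 2, 3]), ("b", [4, 5]), ("c", [6])]
```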
2016-06-22 11:03:13 +03:00
Duarte Nunes
82dbf5bff3 storage_proxy: Trace when retrying a query
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-22 09:48:15 +02:00
Duarte Nunes
69798df95e query: Limit number of partitions returned
This is required to implement a thrift verb.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-22 09:48:13 +02:00
Duarte Nunes
594e43a60a compact_query: Rename partition_limit
This patch renames compact_query::_partition_limit to
_current_partition_limit for clarity, as the next patch adds
a partition limit that limits the number of partitions.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-22 09:47:29 +02:00
Duarte Nunes
e9ebd87991 compact_query: Rename limit to row_limit
This patch renames compact_query::_limit to _row_limit for
clarity, as a subsequent patch introduces yet another limit.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-22 09:47:28 +02:00
Duarte Nunes
01b18063ea query: Add per-partition row limit
This patch adds a per-partition row limit. It ensures both local
queries and the reconciliation logic abide by this limit.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-22 09:46:51 +02:00
Duarte Nunes
20d9813a89 storage_proxy: Fetch last replica row just in time
This patch changes the way we fetch each replica's last row to
determine if we got incomplete information from any of them. Instead
of fetching the last rows up front, we fetch them on demand only if we
actually trigger the code that needs them. We now get the last row from
the versions vector of vectors.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-22 00:15:38 +02:00
Duarte Nunes
4ce9fc24cb storage_proxy: Extract finding last row
This patch extracts to a function the code that actually determines
the last row of a partition based on the direction of the query.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-22 00:15:38 +02:00
Takuya ASADA
73ba4ac337 dist: drop sudoers.d from .rpm, since systemd moved to PermissionsStartOnly
Since systemd moved to PermissionsStartOnly, only upstart uses sudoers.
So move common/sudoers.d to dist/ubuntu, drop them from .rpm.
Also, Ubuntu 15.10/16.04 do not require sudoers since they use systemd.
So copy sudoers only for 14.04.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1466536491-9860-1-git-send-email-syuu@scylladb.com>
2016-06-21 22:59:18 +03:00
Glauber Costa
4e81f19ab5 LSA: fix typo in region merge
There are many potentially tricky things about referring to different regions
from the LSA perspective. Madness, however, is not one of them. I can only
assume we meant made?

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <8eb81f35de4b208a494e43cb392eea07b87b2bf1.1466534798.git.glauber@scylladb.com>
2016-06-21 22:58:44 +03:00
Benoît Canet
8e4dee0bd1 scylla_setup: Hide /dev/loop*
The user probably doesn't want to use
/dev/loop* as RAID devices.

Fixes: #1259

Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1466520602-7888-1-git-send-email-benoit@scylladb.com>
2016-06-21 19:27:40 +03:00
Tzach Livyatan
27b99f47e8 scylla_setup: improve the wording of disk setup phase.
Fix #1197 by adding XFS related info to the interactive prompt

Signed-off-by: Tzach Livyatan <tzach@scylladb.com>
Message-Id: <1466504625-28926-1-git-send-email-tzach@scylladb.com>
2016-06-21 19:26:31 +03:00
Avi Kivity
96ebc4e7b5 Merge seastar upstream
* seastar 401c333...3029ebe (3):
  > util: add a seastar::value_of() helper function
  > rpc: force closing listen fd on server stop
  > reactor: fix I/O priority class id assignment
2016-06-21 15:11:26 +03:00
Tomasz Grabiec
597cbbdedc Merge branch 'pdziepak/streamed-mutations/v5' from seastar-dev.git
From Paweł:

This series introduces streamed_mutations, which allow mutations to be
streamed between the producers and the consumers as a series of
mutation_fragments. Because of that the mutation streaming interface
works well with partitions larger than available memory provided that
actual producer and consumer implementations can support this as well.

mutation_fragments are the basic objects that are emitted by
streamed_mutations. They can represent a static row, a clustering row, or
the beginning or the end of a range tombstone. They are ordered by their
clustering keys (with static rows being always the first emitted mutation
fragment). The beginning of range tombstone is emitted before any
clustering row affected by that tombstone and the end of range tombstone
is emitted after the last clustering row affected by it. Range tombstones
are disjoint.

In this series all producers are converted to fully support the new
interface, that includes cache, memtables and sstables. Mutation queries
and data queries are the only consumers converted so far.

To minimize the per-mutation_fragment overhead streamed_mutations use
batching. The actual producer implementation fills a buffer until
it is full (currently, buffer size is 16, the limit should, however,
be changed to depend on the actual size in memory of the stored elements)
or end of stream is reached.

In order to guarantee isolation of writes reads from cache and memtable
use MVCC. When a reader is created it takes a snapshot of the particular
cache or memtable entry. The snapshot is immutable and if there happen
to be any incoming writes while the read is active a new version of
partition is created. When the snapshot is destroyed partition versions
are merged together as much as possible.

Performance results with perf_simple_query (median of results with
duration 15):

         before        after          diff
write    618652.70     618047.58      -0.10%
read     661712.44     608070.49      -8.11%
2016-06-21 12:15:21 +02:00
Pekka Enberg
11dd20d640 Revert "ami: Change type from EBS to Instance"
This reverts commit 2d7f8f4a47.

Avi sayeth:

"Isn't this the other way round?  EBS is persistent."

and

"The patch is wrong too.  Instance store takes 5 minutes to boot
compared to 1 minute for EBS."
2016-06-21 12:41:30 +03:00
Tomasz Grabiec
e783b58e3b Merge branch 'glommer/LSA-throttler-v6' from git@github.com:glommer/scylla.gi
From Glauber:

This is my new take on the "Move throttler to the LSA" series, except
this one doesn't actually move anything anywhere: I am leaving all
memtable conversion out, and instead I am sending just the LSA bits +
LSA active reclaim. This should help us see where we are going, and
then we can discuss all memtable changes in a series on its own,
logically separated (and hopefully already integrated with virtual
dirty).

[tgrabiec: trivial merge conflicts in logalloc.cc]
2016-06-21 10:22:26 +02:00
Calle Wilund
2b812a392a commitlog_replayer: Fix calculation of global min pos per shard
If a CF does not have any sstables at all, we should treat it
as having a replay position of zero. However, since we also
must deal with potential re-sharding, we cannot just set
shard->uuid->zero initially, because we don't know what shards
existed.

Go through all CF:s post map-reduce, and for every shard where
a CF does not have an RP-mapping (no sstables found), set the
global min pos (for shard) to zero.
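
The post map-reduce step can be sketched as follows. This is a Python illustration with hypothetical names; replay positions are modelled as plain integers, which is a simplification.

```python
def global_min_positions(rp_by_shard, all_cfs):
    """Compute the per-shard global minimum replay position.

    rp_by_shard: {shard: {cf_uuid: replay_position}} as gathered from the
    existing sstables (the result of the map-reduce).
    all_cfs: every known column family uuid.
    A shard with no mapping for some CF (no sstables found) gets position 0.
    """
    result = {}
    for shard, per_cf in rp_by_shard.items():
        if any(cf not in per_cf for cf in all_cfs):
            result[shard] = 0
        else:
            result[shard] = min(per_cf.values(), default=0)
    return result


positions = global_min_positions(
    {0: {"cf1": 10, "cf2": 7}, 1: {"cf1": 5}}, ["cf1", "cf2"])
```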

Fixes #1372

Message-Id: <1465991864-4211-1-git-send-email-calle@scylladb.com>
2016-06-21 10:05:05 +03:00
Benoît Canet
2d7f8f4a47 ami: Change type from EBS to Instance
Instance types do not have ephemeral drives that disappear on reboot.

Fixes #1229

Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1466443232-5898-1-git-send-email-benoit@scylladb.com>
2016-06-21 09:56:26 +03:00
Calle Wilund
88ffe60138 batchlog_manager: Change replay mutation CL to ALL
Try to emulate the origin behaviour for batch replay. They use an
explicit write handler, combining

1.) Hinting to all known dead endpoints
2.) Sending to all presumed live, requiring ack from all
3.) Hinting to endpoint to which send failed.

We don't have hints, so try to work around by doing send with
cl=ALL, and if send fails (wholly or partially), retain the
batch in the log.
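
The workaround amounts to the loop below. It is a Python sketch only; replay_batches and send_with_cl_all are hypothetical names, and the sender is assumed to raise when any replica fails to acknowledge.

```python
def replay_batches(batchlog, send_with_cl_all):
    """Replay every batch with cl=ALL and return the batches to retain.

    send_with_cl_all(batch) is assumed to raise if the send fails wholly or
    partially; such batches stay in the log for a later replay attempt.
    """
    retained = []
    for batch in batchlog:
        try:
            send_with_cl_all(batch)
        except Exception:
            retained.append(batch)
    return retained


def flaky_send(batch):
    # Stand-in for a send in which one replica did not acknowledge.
    if batch == "b2":
        raise RuntimeError("not all replicas acknowledged")
```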

This is still a slight behavioural difference, and we also risk
filling up the batch log in extreme cases. (Though probably not
in any real environment).

Refs #1222

Message-Id: <1466444170-23797-1-git-send-email-calle@scylladb.com>
2016-06-21 09:41:09 +03:00
Glauber Costa
7f29cb8aba tests: add logalloc tests for pressure notification
Tests to make sure various scenarios of pressure notification for active
asynchronous reclaim work.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-06-20 18:58:39 -04:00
Glauber Costa
8f5047fc5f tests: add tests to new region_group throttle interface
Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-06-20 18:51:00 -04:00
Glauber Costa
579d121db8 LSA: export largest region
We now keep the regions sorted by size, and the children region groups as well.
Internally, the LSA has all information it needs to make size-based reclaim
decisions. However, we don't do reclaim internally, but rather warn our user
that a pressure situation is mounting.

The user of a region_group doesn't need to evict the largest region in case of
pressure and is free to do whatever it chooses - including nothing. But more
likely than not, taking into account which region is the largest makes sense.

This patch puts together this last missing piece of the puzzle, and exports the
information we have internally to the user.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-06-20 18:51:00 -04:00
Glauber Costa
35f8a2ce2c LSA: add a backpointer to the region from its private data
Region is implemented using the pimpl pattern (region_impl), and all its
relevant data is present in a private structure instead of the region itself.

That private structure is the one that the other parts of the LSA will refer to,
the region_group being the prime example. To allow classes such as the
region_group to externally export a particular region, we will introduce a
backpointer region_impl -> region.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-06-20 18:50:59 -04:00
Glauber Costa
38a402307d LSA: enhance region_group reclaimer
We are currently just allowing the region_group to specify a throttle_threshold,
that triggers throttling when a certain amount of memory is reached. We would
like to notify the callers that such condition is reached, so that the callers
can do something to alleviate it - like triggering flushes of their structures.

The approach we are taking here is to pass a reclaimer instance. Any user of a
region_group can specialize its methods start_reclaiming and stop_reclaiming
that will be called when the region_group comes under pressure or ceases to
be, respectively.
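
A rough sketch of the callback shape, in Python: the method names start_reclaiming and stop_reclaiming come from the message above, while the update method, the byte accounting and the ">=" comparison are assumptions made for illustration.

```python
class Reclaimer:
    """Hooks a region_group user overrides; both default to doing nothing."""

    def __init__(self, throttle_threshold):
        self.throttle_threshold = throttle_threshold

    def start_reclaiming(self):
        pass

    def stop_reclaiming(self):
        pass


class RegionGroup:
    def __init__(self, reclaimer):
        self._reclaimer = reclaimer
        self._used = 0
        self._under_pressure = False

    def update(self, delta):
        """Account for `delta` bytes and notify on pressure transitions only."""
        self._used += delta
        over = self._used >= self._reclaimer.throttle_threshold
        if over and not self._under_pressure:
            self._reclaimer.start_reclaiming()
        elif not over and self._under_pressure:
            self._reclaimer.stop_reclaiming()
        self._under_pressure = over


class RecordingReclaimer(Reclaimer):
    """Example user: a real one might trigger flushes instead of recording."""

    def __init__(self, throttle_threshold):
        super().__init__(throttle_threshold)
        self.events = []

    def start_reclaiming(self):
        self.events.append("start")

    def stop_reclaiming(self):
        self.events.append("stop")
```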

Now that we have such facility, it makes more sense to move the
throttle_threshold here than having it separately.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-06-20 18:50:59 -04:00
Glauber Costa
6404028c6a LSA: move subgroups to a heap as well
When we decide to evict from a specific region_group due to excessive memory
usage, we must also consider looking at each of their children (subgroups). It
could very well be that most of memory is used by one of the subgroups, and
we'll have to evict from there.

We also want to make sure we are evicting from the biggest region of all, and
not the biggest region in the biggest region_group. To understand why this is
important, consider the case in which the regions are memtables associated with
dirty region groups. It could be that a very big memtable was recently flushed,
and a fairly small one took its place. That region group is still quite large
because the memtable hasn't finished flushing yet, but that doesn't mean we
should evict from it.

To allow us to efficiently pick which region is the largest, each root of each
subtree will keep track of its maximal score, defined as the maximum between our
largest region total_space and the maximum maximal score of subtrees.
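
The definition of the maximal score can be written down directly. The Python sketch below recomputes it recursively for clarity, whereas the patch keeps it up to date incrementally; the class shape and field names are assumptions.

```python
class RegionGroup:
    def __init__(self, region_sizes=(), subgroups=()):
        self.region_sizes = list(region_sizes)  # total_space of each region
        self.subgroups = list(subgroups)

    def maximal_score(self):
        """Largest region in this group or in any group below it."""
        own = max(self.region_sizes, default=0)
        return max([own] + [g.maximal_score() for g in self.subgroups])


# The largest region (70) sits two levels down, in a small subgroup.
root = RegionGroup([10], [
    RegionGroup([5, 40]),
    RegionGroup([], [RegionGroup([70])]),
])
```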

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-06-20 18:50:13 -04:00
Glauber Costa
e1eab5c845 LSA: store regions in a heap for regions_group
Currently, the regions in a region group are organized in a simple vector.
We can do better by using a binomial heap, as we do for segments, and then
updating when there is change. Internally to the LSA, we are in good position
to always know when change happens, so that's really the best way to do it.

The end game here is to easily call for the reclaim of the largest offending
region (potentially asynchronously). Because of that, we aren't really interested
in the region occupancy, but in the region reclaimable occupancy instead: that's
simply equal to the occupancy if the region is reclaimable, and 0 otherwise. Doing
that effectively lists all non-reclaimable regions at the end of the heap, in no
particular order.
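
The ordering key can be illustrated as follows. Note the substitutions: this Python sketch uses the standard library's binary heap (heapq) rather than a binomial heap, rebuilds the heap instead of updating it on change, and models regions as dictionaries with hypothetical field names.

```python
import heapq


def reclaimable_occupancy(region):
    """Occupancy if the region can be reclaimed, 0 otherwise."""
    return region["occupancy"] if region["reclaimable"] else 0


def eviction_order(regions):
    """Region names from the largest reclaimable occupancy downwards."""
    # heapq is a min-heap, so the key is negated to pop the largest first.
    heap = [(-reclaimable_occupancy(r), i) for i, r in enumerate(regions)]
    heapq.heapify(heap)
    order = []
    while heap:
        _, i = heapq.heappop(heap)
        order.append(regions[i]["name"])
    return order


regions = [
    {"name": "a", "occupancy": 10, "reclaimable": True},
    {"name": "b", "occupancy": 99, "reclaimable": False},
    {"name": "c", "occupancy": 50, "reclaimable": True},
]
```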

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-06-20 18:50:13 -04:00
Glauber Costa
54d4d46cf7 LSA: move throttling code to LSA.
The database code uses a throttling function to make sure that memory
used for the dirty region never is over the limit. We track that with
a region group, so it makes sense to move this as generic functionality
into LSA.

This patch implements the LSA-side functionality and a later patch will
convert the current memtable throttler to use it.

Unlike the current throttling mechanism, we'll not use a timer-based
mechanism here. Aside from being more generic and friendlier towards
other users, this is a good change for current memtable by itself.

The constants - 10ms and 1MB chosen by the current throttler are arbitrary, and we
would be better off without them. Let's discuss the merits of each separately:

1) 10ms timer: If we are throttling, we expect somebody to flush the memtables
for memory to be released. Since we are in position to know exactly when a memtable
was written, thus releasing memory, we can just call unthrottle at that point, instead
of using a timer.

2) 1MB release threshold: we do that because we have no idea how much memory a request
will use, so we put the cut somewhere. However, because of 1) we don't call unthrottle
through a timer anymore, and do it directly instead. This means that we can just execute
the request and see how much memory it has used, with no need to guess. So we'll call
unthrottle at the end of every request that was previously throttled.

Writing the code this way also has the advantage that we need one less continuation in
the common case of the database not being throttled.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-06-20 18:34:19 -04:00
Paweł Dziepak
6f25533f4e mutation_query: drop querying_reader
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:31:52 +01:00
Paweł Dziepak
ed12c164f8 mutation_query: make mutation queries streaming-friendly
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:31:28 +01:00
Paweł Dziepak
0828c88b25 mutation_partition: implement streaming-friendly data_query()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:31:19 +01:00
Paweł Dziepak
67ae9457e3 mutation_partition: introduce mutation_querier
mutation_querier is a streamed_mutation consumer that adds the mutation
content to query::result.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:53 +01:00
Paweł Dziepak
f54e604a16 mutation_partition: introduce compact_for_query
compact_for_query is an intermediate stage used to compact data in a
flattened stream of mutations before they are consumed by query building
consumers.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:53 +01:00
Paweł Dziepak
2b7e62599d mutation_reader: add consume_flattened()
Mutation reader produces a stream of streamed_mutations. Each
streamed_mutation itself is a stream so basically we are dealing here
with a stream of streams.

consume_flattened() flattens such stream of streams making all its
elements consumable by a single consumer. It also allows reversing
the mutations before consumption using reverse_streamed_mutation().
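
Flattening a stream of streams can be sketched with a generator. This Python illustration uses hypothetical event tags and a hypothetical flatten function; the real consume_flattened() drives a consumer object rather than yielding tuples.

```python
def flatten(reader, reverse=None):
    """Turn (partition_key, fragments) pairs into one flat event stream.

    reader: iterable of (partition_key, iterable_of_fragments).
    reverse: optional callable applied to each partition's fragments before
    they are consumed, standing in for reverse_streamed_mutation().
    """
    for key, fragments in reader:
        yield ("partition_start", key)
        if reverse is not None:
            fragments = reverse(fragments)
        for fragment in fragments:
            yield ("fragment", fragment)
        yield ("partition_end", key)


events = list(flatten([("p1", ["r1", "r2"]), ("p2", ["r3"])]))
```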
2016-06-20 21:29:52 +01:00
Paweł Dziepak
5566d23180 streamed_mutation: add reverse_streamed_mutation()
reverse_streamed_mutation() is an inefficient way of reversing
streamed_mutations. First, it collects all mutation_fragments and then
it emits them in reverse order (except the static row, which is always
the first element; it also flips the bounds of range tombstones).
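
The reversal described above can be sketched like this. The Python below is an illustration under assumed data shapes: fragments are tuples, and the bound-kind names in FLIP are invented for the example.

```python
# Start and end swap; inclusiveness is preserved.
FLIP = {
    "incl_start": "incl_end", "excl_start": "excl_end",
    "incl_end": "incl_start", "excl_end": "excl_start",
}


def flip_bound_kind(kind):
    return FLIP[kind]


def reverse_fragments(fragments):
    """Reverse a partition's fragments, keeping the static row first.

    Everything is buffered before anything is emitted, which is the
    inefficiency the commit message mentions.
    """
    frags = list(fragments)
    out = [f for f in frags if f[0] == "static_row"]
    rest = [f for f in frags if f[0] != "static_row"]
    for f in reversed(rest):
        if f[0] == "range_tombstone_bound":
            out.append((f[0], f[1], flip_bound_kind(f[2])))
        else:
            out.append(f)
    return out


forward = [
    ("static_row", "s"),
    ("range_tombstone_bound", 1, "incl_start"),
    ("clustering_row", 2),
    ("range_tombstone_bound", 3, "excl_end"),
]
```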

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:52 +01:00
Paweł Dziepak
f676d1779b range_tombstone: add flip_bound_kind()
flip_bound_kind() changes start bound to end bound and vice versa while
preserving the inclusiveness.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:52 +01:00
Paweł Dziepak
a3423bac38 tests/streamed_mutation: test freezing streamed_mutations
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:52 +01:00
Paweł Dziepak
6e68f0931e frozen_mutation: freeze streamed_mutations
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:52 +01:00
Paweł Dziepak
349905d0fd range_tombstone_list: add clear()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:52 +01:00
Paweł Dziepak
494c6fa9c1 tests/mutation_query_test: make sure mutations are sliced properly
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:52 +01:00
Paweł Dziepak
8dfabf2790 mutation_reader: support slicing in make_reader_returning_many()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:52 +01:00
Paweł Dziepak
6871bd5fa0 memtable: fully support streamed_mutations
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:52 +01:00
Paweł Dziepak
983321f194 tests/mutation: do not create memtable on stack
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:51 +01:00
Paweł Dziepak
4a5a9148e3 tests/row_cache: test slicing mutation reader
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:51 +01:00
Paweł Dziepak
e1a8d94542 tests/row_cache: test mvcc
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:51 +01:00
Paweł Dziepak
b2c37429e7 row_cache: drop slicing_reader
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:51 +01:00
Paweł Dziepak
f605499aec row_cache: fully support streamed_mutations
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:51 +01:00
Paweł Dziepak
e4ae7894d4 tests/mutation: test slicing mutations
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:51 +01:00
Paweł Dziepak
f95c5542dc mutation_partition: allow slicing moved mutation_partition
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:51 +01:00
Paweł Dziepak
db5ea591ad add mvcc implementation for mutation_partitions
To ensure isolation of operation when streaming a mutation from a
mutable source (such as cache or memtable) MVCC is used.

Each entry in memtable or cache is actually a list of used versions of
that entry. Incoming writes are either applied directly to the last
version (if it wasn't being read by anyone) or prepended to the list
(if the former head was being read by someone). When a reader finishes, it
tries to squash versions together, provided there is no other reader that
could prevent this.
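The scheme can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the types `version` and `entry`, the reader counter, and the squash-everything policy are hypothetical and much cruder than partition_version.cc, and a real reader would combine its version with all older ones.

```cpp
#include <cassert>
#include <list>
#include <map>
#include <string>
#include <utility>

// Toy model: a version is a set of cell writes plus a count of active readers.
struct version {
    std::map<std::string, int> cells;
    int readers = 0;
};

struct entry {
    std::list<version> versions = std::list<version>(1); // front() is the newest

    // Apply in place if nobody reads the newest version, else prepend a new one.
    void apply(const std::string& column, int value) {
        if (versions.front().readers > 0) {
            versions.emplace_front();
        }
        versions.front().cells[column] = value;
    }

    // Pin the newest version; std::list keeps the reference stable.
    version& start_read() {
        ++versions.front().readers;
        return versions.front();
    }

    // Unpin, then squash all versions if no reader is left.
    // The reference passed in must not be used after this call.
    void finish_read(version& v) {
        --v.readers;
        for (const auto& ver : versions) {
            if (ver.readers > 0) return; // another reader prevents squashing
        }
        version merged;
        // Merge oldest to newest so that newer writes win.
        for (auto it = versions.rbegin(); it != versions.rend(); ++it) {
            for (const auto& kv : it->cells) merged.cells[kv.first] = kv.second;
        }
        versions.clear();
        versions.push_back(std::move(merged));
    }
};
```

A write that arrives while a reader holds the head goes into a new version, so the reader keeps seeing the old value until it finishes.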

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:51 +01:00
Paweł Dziepak
b2e6e95de7 clustering_key_filter: always return ranges in ascending order
Originally, ranges for reversed queries were in descending order and
ranges for forward queries in ascending order. However,
streamed_mutations require them to always be in ascending order.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:51 +01:00
Paweł Dziepak
2ab1a73efa memtable: rename partition_entry to memtable_entry
partition_entry is going to be a more general object used by both
cache and memtable entries.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:51 +01:00
Paweł Dziepak
4992ea9949 tests: add test for anchorless_list
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:51 +01:00
Paweł Dziepak
dfa827161d utils: add anchorless list
The main user of this list is MVCC implementation in partition_version.cc.
The reason why boost::intrusive::list<> cannot be used is that there is no
single owner of the list who could keep boost::intrusive::list<> object
alive. In the MVCC case there is at least one partition_entry object and
possibly multiple partition_snapshot objects whose lifetimes are independent
and the list must remain in a valid state as long as at least one of them
is alive.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:51 +01:00
Paweł Dziepak
f991a2deb5 tests/row_cache_alloc_stress: use another memtable for underlying storage
It is incorrect to update row_cache with a memtable that is also its
underlying storage. The reason for that is that after memtable is merged
into row_cache they share lsa region. Then when there is a cache miss
it asks underlying storage for data. This results in the memtable
reader running under a row_cache allocation section. Since memtable reader
also uses allocation section the result is an assertion fault since
allocation sections from the same lsa region cannot be nested.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:51 +01:00
Paweł Dziepak
5a5c519fa0 tests/row_cache_alloc_stress: use large cells instead of many rows
With streamed_mutations a partition with many small rows doesn't stress
the cache as much as the test expects. Use large clustering rows instead.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:50 +01:00
Paweł Dziepak
71e961427a test/sstables: test reading sstables with incorrect ordering
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:50 +01:00
Paweł Dziepak
2ee69860d2 sstables: make sstable reader produce streamed_mutations
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:50 +01:00
Paweł Dziepak
e82cc68196 streamed_mutation: add range_tombstone_stream
range_tombstone_stream encapsulates logic responsible for turning
range_tombstone_list into a stream of mutation_fragments and merging
that stream with a stream of clustering rows.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:50 +01:00
Paweł Dziepak
a200189541 range_tombstone_list: mark apply() argument as const
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:50 +01:00
Paweł Dziepak
5a60f6d1ec range_tombstone: extract is_single_clustering_row_tombstone()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:50 +01:00
Paweł Dziepak
b6f78a8e2f sstable: make sstable reads return streamed_mutation
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:50 +01:00
Paweł Dziepak
9e8db53c46 sstables: allow row consumer to stop at any point
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:50 +01:00
Paweł Dziepak
125c4e20e2 tests/sstables: add test for sliced mutation reads
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:50 +01:00
Paweł Dziepak
71088b4f4a sstables: fix partition slicing for row markers and collections
Row markers and collections weren't filtered out even if they belonged
to a clustering row that shouldn't be in the result. The check whether
to include cell or not was done only for live and dead atomic cells.

This patch adds appropriate checks for collections and row markers.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:50 +01:00
Paweł Dziepak
575daea897 sstables: make deletion_time to tombstone cast safer
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:50 +01:00
Paweł Dziepak
7074b439d8 mutation_reader: do not ask for mutation before current is consumed
mutation_reader and streamed_mutation may use the same stream as a source of
mutation_fragments and of mutations themselves (this happens in sstable reader).
In such a case, asking for the next streamed_mutation from mutation_reader would
invalidate all other streamed_mutations.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:50 +01:00
Paweł Dziepak
737eb73499 mutation_reader: make readers return streamed_mutations
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:50 +01:00
Paweł Dziepak
52a0b405f8 tests/row_cache: simplify verify_has()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:50 +01:00
Paweł Dziepak
fec3346343 tests: add streamed_mutation assertions
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:49 +01:00
Paweł Dziepak
11f43a8e91 tests/sstable: drop sstable_range_wrapping_reader
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:49 +01:00
Paweł Dziepak
5b45d46f82 row_cache: simplify slicing_reader
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:49 +01:00
Paweł Dziepak
9c83eb9542 mutation_reader: drop joining and lazy readers
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:49 +01:00
Paweł Dziepak
579de26e95 storage_proxy: drop make_local_reader()
This code was used only by its unit test.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:49 +01:00
Paweł Dziepak
c8f4b96e76 tests: add streamed_mutation_tests
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:49 +01:00
Paweł Dziepak
a1fc5888d3 streamed_mutation: add mutation_merger
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:49 +01:00
Paweł Dziepak
48e08fa997 mutation: add mutation_from_streamed_mutation()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:49 +01:00
Paweł Dziepak
9df01c2a36 streamed_mutation: add streamed_mutation_from_mutation()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:49 +01:00
Paweł Dziepak
22160ae6d5 mutation_partition: make rows_type public
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:49 +01:00
Paweł Dziepak
675f684788 streamed_mutation: introduce streamed_mutation
streamed_mutation represents a mutation in a form of a stream of
mutation_fragments. streamed_mutation emits mutation fragments in the
order they should appear in the sstables, i.e. static row is always
the first one, then clustering rows and range tombstones are emitted
according to the lexicographical ordering of their clustering keys and
bounds of the range tombstones.

Range tombstones are disjoint, i.e. after emitting
range_tombstone_begin it is guaranteed that there is going to be a
single range_tombstone_end before another range_tombstone_begin is
emitted.

The ordering of mutation_fragments also guarantees that by the time
the consumer sees a clustering row it has already received all
relevant tombstones.

Partition key and partition tombstone are not streamed and are part of
the streamed_mutation itself.

streamed_mutation uses batching. The mutation implementations are
supposed to fill a buffer with mutation fragments until is_buffer_full()
or the end of stream is encountered.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:49 +01:00
Paweł Dziepak
262337768a streamed_mutation: introduce mutation_fragment
This commit introduces mutation_fragment class which represents the parts
of mutation streamed by streamed_mutation.

mutation_fragment can be:
 - a static row (only one in the mutation)
 - a clustering row
 - start of range tombstone
 - end of range tombstone

There is an ordering (implemented in position_in_partition class) between
mutation_fragment objects. It reflects the order in which content of
partition appears in the sstables.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:49 +01:00
Paweł Dziepak
84713d2236 utils: extract optimized_optional<> from mutation_opt
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:49 +01:00
Paweł Dziepak
847bf878ec mutation_partition: add more row::apply() overloads
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:48 +01:00
Paweł Dziepak
7809adc6ce keys: add compound_wrapper::tri_compare
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:48 +01:00
Paweł Dziepak
c24f08a683 range_tombstone_list: compare full tombstones not just timestamps
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:48 +01:00
Paweł Dziepak
df4c1c6293 range_tombstone: simplify bound_view::equal()
Bounds are equal only if they are of the same kind. No need to check
weights.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:48 +01:00
Paweł Dziepak
a6aceb179d range_tombstone: fix bound ordering
Assuming the clustering keys are equal:
  excl_end < incl_start < incl_end < excl_start.
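One way to read this order: for a key k, an exclusive end and an inclusive start sit just before k, while an inclusive end and an exclusive start sit just after it. A small sketch that reproduces the stated order; the numeric weights and the function name are assumptions made for illustration only.

```cpp
#include <cassert>

// Hypothetical enum naming the four kinds from the ordering above.
enum class bound_kind { excl_end, incl_start, incl_end, excl_start };

// Assumed weights, chosen only so that they reproduce the stated order.
constexpr int weight(bound_kind k) {
    switch (k) {
    case bound_kind::excl_end:   return 0;
    case bound_kind::incl_start: return 1;
    case bound_kind::incl_end:   return 2;
    case bound_kind::excl_start: return 3;
    }
    return 0; // not reached for valid enumerators
}

// Compare two bounds whose clustering keys are already known to be equal.
constexpr bool bound_less_for_equal_keys(bound_kind a, bound_kind b) {
    return weight(a) < weight(b);
}
```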

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:48 +01:00
Paweł Dziepak
3a0e76d635 range_tombstone: check for adjacent instead of equal bounds
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:48 +01:00
Nadav Har'El
3372052d48 Rewriting shared sstables only after all shards loaded sstables
After commit faa4581, each shard only starts splitting its shared sstables
after opening all sstables. This was important because compaction needs to
be aware of all sstables.

However, another bug remained: If one shard finishes loading its sstables
and starts the splitting compactions, and in parallel a different shard is
still opening sstables - the second shard might find a half-written sstable
being written by the first shard, and abort on a malformed sstable.

So in this patch we start the shared sstable rewrites - on all shards -
only after all shards finished loading their sstables. Doing this is easy,
because main.cc already contains a list of sequential steps where each
uses invoke_on_all() to make sure the step completes on all shards before
continuing to the next step.

Fixes #1371

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1466426641-3972-1-git-send-email-nyh@scylladb.com>
2016-06-20 16:25:24 +03:00
Calle Wilund
7cdea1b889 commitlog: Use flush queue for write/flush ordering, improve batch
Using an ordering mechanism better than rw-locks for write/flush
means we can wait for pending write in batch mode, and coalesce
data from more than one mutation into a chunk.

It also means we can wait for a specific read+flush pair (based on
file position).

Downside is that we will not do parallel writes in batch mode (unless
we run out of buffer), which might underutilize the disk bandwidth.

Upside is that running in batch mode (i.e. per-write consistency)
now has way better bandwidth, and also, at least with high mutation
rate, better average latency.

Message-Id: <1465990064-2258-1-git-send-email-calle@scylladb.com>
2016-06-20 13:09:16 +03:00
Benoît Canet
77375cefaa docker: normalize environment variables names
Use a more Docker-like form.

Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1466414939-5019-1-git-send-email-benoit@scylladb.com>
2016-06-20 12:30:13 +03:00
Benoît Canet
4c7ac4cab7 docker: implement seeds and broadcast_address variables
Implement the seeds and broadcast_address variables
required for clustering behavior.

Do it raw with sed in the startup script.

Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1466412846-4760-3-git-send-email-benoit@scylladb.com>
2016-06-20 11:55:03 +03:00
Benoît Canet
fd811c90fc docker: Complete the missing part of production mode
Scylla will not start if the disk was not benchmarked,
so run io_tune with the right parameters.

Also add the cpu_set environment variables for passing
cpu set to iotune and scylla.

Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1466412846-4760-2-git-send-email-benoit@scylladb.com>
2016-06-20 11:54:54 +03:00
Pekka Enberg
1d5f7be447 systemd: Use PermissionsStartOnly instead of running sudo
Use the PermissionsStartOnly systemd option to apply the permission
related configurations only to the start command. This allows us to stop
using "sudo" for ExecStartPre and ExecStopPost hooks and drop the
"requiretty" /etc/sudoers hack from Scylla's RPM.

Tested-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1466407587-31734-1-git-send-email-penberg@scylladb.com>
2016-06-20 11:53:24 +03:00
Vlad Zolotarov
baf3614e8f sstables: don't backup sstables that are a result of a compaction
According to incremental backup description
(http://docs.datastax.com/en/cassandra_win/2.2/cassandra/operations/opsBackupIncremental.html)
sstables that are a result of a compaction process should not
be backed up since original sstables had already been backed up.

Fixes #1308

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Reviewed-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <1466338622-7323-1-git-send-email-vladz@cloudius-systems.com>
2016-06-20 09:52:30 +03:00
Pekka Enberg
f4153c75a0 cql3: Bump CQL language version to 3.2.1
We already added 3.2.1 support in commit 569d288 ("cql3: Add TRUNCATE
TABLE alias for TRUNCATE") but never got around fixing the CQL version
reported to drivers.

Fixes #1358.

Message-Id: <1466403967-28654-1-git-send-email-penberg@scylladb.com>
2016-06-20 09:42:12 +03:00
Avi Kivity
07045ffd7c dist: fix scylla-kernel-conf postinstall scriptlet failure
Because we build on CentOS 7, which does not have the %sysctl_apply macro,
the macro is not expanded, and therefore executed incorrectly even on 7.2,
which does.

Fix by expanding the macro manually.

Fixes #1360.
Message-Id: <1466250006-19476-1-git-send-email-avi@scylladb.com>
2016-06-20 09:36:39 +03:00
Lucas Meneghel Rodrigues
ae622b0c08 dist/common/scripts/scylla_kernel_check: Update messages
Small grammar tweaks to the script's output messages.

Signed-off-by: Lucas Meneghel Rodrigues <lmr@scylladb.com>
Message-Id: <1466205496-3885-3-git-send-email-lmr@scylladb.com>
2016-06-19 19:28:58 +03:00
Lucas Meneghel Rodrigues
aacf7eb2ae dist/common/scripts/scylla_kernel_check: Fix conditional statement
Since most of the time people are running scylla_setup on
a fully upgraded ubuntu 14.04 box, we rarely reach that
code path, but once we do we end up with an error. Let's
fix that.

Signed-off-by: Lucas Meneghel Rodrigues <lmr@scylladb.com>
Message-Id: <1466205496-3885-2-git-send-email-lmr@scylladb.com>
2016-06-19 19:28:56 +03:00
Nadav Har'El
faa45812b2 Rewrite shared sstables only after entire CF is read
Starting in commit 721f7d1d4f, we start "rewriting" a shared sstable (i.e.,
splitting it into individual shards) as soon as it is loaded in each shard.

However, as discovered in issue #1366, this is too soon: our compaction
process relies in several places on compaction being done only after all
the sstables of the same CF have been loaded. One example is that we
need to know the content of the other sstables to decide which tombstones
we can expire (this is issue #1366). Another example is that we use the
last generation number we are aware of to decide the number of the next
compaction output - and this is wrong before we saw all sstables.

So with this patch, while loading sstables we only make a list of shared
sstables which need to be rewritten - and the actual rewrite is only started
when we finish reading all the sstables for this CF. We need to do this in
two cases: reboot (when we load all the existing sstables we find on disk),
and nodetool refresh (when we import a set of new sstables).

Fixes #1366.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1466344078-31290-1-git-send-email-nyh@scylladb.com>
2016-06-19 16:50:51 +03:00
Paweł Dziepak
dde87e0b0e row_cache: drop schema upgrade for new entries in update()
Commit daad2eb "row_cache: fix memory leak in case of schema upgrade
failure" has fixed a memory leak caused by failed upgrade_entry().
However, in case of upgrade failure memtable_entry used to create the
new cache entry was left in some invalid state. If the operation was
retried the cache would attempt again to apply that memtable_entry which
now would be in invalid state.

The solution is either to ignore upgrade_entry() exceptions or not to
call it at all and let the cache entry be upgraded on demand. This patch
implements the latter.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1466163435-27367-1-git-send-email-pdziepak@scylladb.com>
2016-06-17 13:43:01 +02:00
Paweł Dziepak
daad2ebf81 row_cache: fix memory leak in case of schema upgrade failure
When update() causes a new entry to be inserted to the cache the
procedure is as follows:
1. allocate and construct new entry
2. upgrade entry schema
3. add entry to lru list and cache tree

Step 2 may fail and at this point the pointer to the entry is neither
protected by RAII nor added in any of the cache containers. The solution
is to swap steps 2 and 3 so that even if the upgrade fails the entry is
already owned by the cache and won't leak.
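The reordering can be sketched as follows. This is an illustration only: the `entry` and `cache` types, the `live` counter, and the vector of owning raw pointers are hypothetical stand-ins for the real cache containers.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

struct entry {
    static int live;   // number of entries currently allocated
    entry()  { ++live; }
    ~entry() { --live; }
};
int entry::live = 0;

struct cache {
    std::vector<entry*> entries;   // owning raw pointers (stand-in for lru list and tree)
    ~cache() { for (auto* e : entries) delete e; }

    // Stand-in for the schema upgrade step, which may throw.
    static void upgrade(entry&, bool fail) {
        if (fail) throw std::runtime_error("upgrade failed");
    }

    void insert(bool fail_upgrade) {
        entries.reserve(entries.size() + 1); // make the push_back below non-throwing
        entry* e = new entry();              // 1. allocate and construct
        entries.push_back(e);                // 3. hand ownership to the cache first
        upgrade(*e, fail_upgrade);           // 2. may throw, but e is already owned
    }
};
```

Even when the upgrade throws, the entry is released together with the cache.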

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1466161709-25288-1-git-send-email-pdziepak@scylladb.com>
2016-06-17 13:12:01 +02:00
Asias He
4f3ce42163 storage_service: Prevent old version node to join a new version cluster
We want to prevent an older version of scylla, which has fewer features, from
joining a cluster with a newer version of scylla, which has more features,
because when scylla sees a feature is enabled on all other nodes, it
will start to use the feature and assume existing nodes and future nodes
will always have this feature.

In order to support downgrade during rolling upgrade, we need to support
mixed old and new nodes case.

1) All old nodes
O O O O O <- N   OK
O O O O O <- O   OK

2) All new nodes
N N N N N <- N   OK
N N N N N <- O   FAIL

3) Mixed old and new nodes
O N O N O <- N   OK
O N O N O <- O   OK

(O == old node, N == new node, <- == joining the cluster)
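The table above is consistent with a simple rule: a node may join if it understands every feature that all existing nodes have in common. A sketch of that rule, assuming features are plain strings; `feature_set`, `common_features()` and `can_join()` are hypothetical names, not the functions added by this patch.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <utility>
#include <vector>

using feature_set = std::set<std::string>;

// Features supported by every existing node (intersection of all sets).
feature_set common_features(const std::vector<feature_set>& nodes) {
    if (nodes.empty()) return {};
    feature_set common = nodes.front();
    for (const auto& n : nodes) {
        feature_set tmp;
        std::set_intersection(common.begin(), common.end(), n.begin(), n.end(),
                              std::inserter(tmp, tmp.begin()));
        common = std::move(tmp);
    }
    return common;
}

// The joining node must understand all features common to the cluster.
bool can_join(const feature_set& local, const std::vector<feature_set>& cluster) {
    const feature_set common = common_features(cluster);
    return std::includes(local.begin(), local.end(), common.begin(), common.end());
}
```

In a mixed cluster the common set is empty, so both old and new nodes pass the check.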

With this patch, I tested:

1.1) Add new node to new node cluster
gossip - Feature check passed. Local node 127.0.0.4 features =
{RANGE_TOMBSTONES}, Remote common_features = {RANGE_TOMBSTONES}

1.2) Add old node to old node cluster
gossip - Feature check passed. Local node 127.0.0.4 features = {},
Remote common_features = {}

2.1) Add new node to new node cluster
gossip - Feature check passed. Local node 127.0.0.4 features =
{RANGE_TOMBSTONES}, Remote common_features = {RANGE_TOMBSTONES}

2.2) Add old node to new node cluster
seastar - Exiting on unhandled exception: std::runtime_error (Feature
check failed. This node can not join the cluster because it does not
understand the feature. Local node 127.0.0.4 features = {}, Remote
common_features = {RANGE_TOMBSTONES})

3.1) Add new node to mixed cluster
gossip - Feature check passed. Local node 127.0.0.4 features =
{RANGE_TOMBSTONES}, Remote common_features = {}

3.2) Add old node to mixed cluster
gossip - Feature check passed. Local node 127.0.0.4 features = {},
Remote common_features = {}

Fixes #1253
2016-06-17 10:49:45 +08:00
Asias He
32ed468e42 gossip: Remove empty string feature in get_supported_features
If the feature string is empty, boost::split will return
std::set<sstring> = {""} instead of std::set<sstring> = {}
which will make a node with a feature, e.g. std::set<sstring> =
{"RANGE_TOMBSTONES"}, think it does not understand the feature of
a node with no features at all.
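The pitfall can be reproduced without boost. In this sketch `split_features()` is a hypothetical stand-in that, like the split described above, yields one empty token for an empty input; the comma separator and the std::string-based signature of `get_supported_features()` are assumptions.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for the split step: an empty input yields {""}.
std::set<std::string> split_features(const std::string& s, char sep = ',') {
    std::set<std::string> out;
    std::string::size_type start = 0;
    while (true) {
        const auto pos = s.find(sep, start);
        out.insert(s.substr(start, pos == std::string::npos ? pos : pos - start));
        if (pos == std::string::npos) break;
        start = pos + 1;
    }
    return out;
}

// Drop the empty token so that "no features" really is the empty set.
std::set<std::string> get_supported_features(const std::string& s) {
    auto features = split_features(s);
    features.erase("");
    return features;
}
```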
2016-06-17 10:49:45 +08:00
Gleb Natapov
4659800ab9 storage_proxy: implement custom speculative retry strategy
User may specify time after which speculative retry should happen
instead of relying on cf statistics. Use provided value in speculative
executor.

Message-Id: <20160616104422.GH5961@scylladb.com>
2016-06-16 13:45:56 +03:00
Pekka Enberg
d72c608868 service/storage_service: Make do_isolate_on_error() more robust
Currently, we only stop the CQL transport server. Extract a
stop_transport() function from drain_on_shutdown() and call it from
do_isolate_on_error() to also shut down the inter-node RPC transport,
Thrift, and other communications services.

Fixes #1353
2016-06-16 13:34:09 +03:00
Avi Kivity
85bb5ea064 Merge "Reduce LSA reclaim latency" from Tomasz
"Reclaiming many segments was observed to cause up to multi-ms
latency. With the new setting, the latency of reclamation cycle with
full segments (worst case mode) is below 1ms.

I saw no difference in throughput in a CQL write micro benchmark
in either of these workloads:
 - full segments, reclaim by random eviction
 - sparse segments (3% occupancy), reclaim by compaction and no eviction

Fixes #1274."
2016-06-16 10:47:57 +03:00
Pekka Enberg
a8f95e8081 dist/docker: Use Scylla superpackage for installation
Make the Dockerfile more future-proof by using the Scylla superpackage
for installation.

Message-Id: <1466015996-19792-1-git-send-email-penberg@scylladb.com>
2016-06-16 10:32:18 +03:00
Benoît Canet
c133748a24 scylla_setup: Fix RAID device enumeration
Commit f42673ed1e ("scylla_setup: Hide
busy block devices from RAID0 configuration") wasn't enumerating
anything. Additionally, it listed from /dev/ and not /dev/dm, which broke
the test conditions.

This one uses blkid instead of /proc/partitions.

A follow up patch will be required to mask encrypted devices.

Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1466059657-12377-1-git-send-email-benoit@scylladb.com>
2016-06-16 09:52:25 +03:00
Glauber Costa
01a658f51d LSA: helper function for region_group
current hierarchy walk converted, but more users will come.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-06-15 22:26:50 -04:00
Glauber Costa
741aa16748 LSA: allow a region_group to have a threshold for throttling specified
Allocations will still be allowed if made directly, but callers will have the
choice (in an upcoming patch) to proceed only if memory is below this threshold.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-06-15 22:26:50 -04:00
Glauber Costa
7cd0c0731e region_group: delete move constructor
Tomek correctly points out that since we are now using "this" in lambda
captures, we should make the region_group not movable. We currently define a
move constructor, but there are no users. So we should just remove them.

copy constructor is already deleted, and so are the copy and move assignment
operators. So by removing the move constructor, we should be fine.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-06-15 22:26:50 -04:00
Benoît Canet
0cf8144485 scylla_setup: Propose default values when judicious
Also takes care of explaining the options.

Fixes #1031

Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1466011848-11054-1-git-send-email-benoit@scylladb.com>
2016-06-15 20:33:55 +03:00
Benoît Canet
263a55c0da scylla_setup: Inform the user that he can skip any step
Fixes: #1188

Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1466001423-9547-3-git-send-email-benoit@scylladb.com>
2016-06-15 19:38:23 +03:00
Benoît Canet
f42673ed1e scylla_setup: Hide busy block devices from RAID0 configuration
This patch looks in /proc/mounts for the device name so
the device or its subdevices will be excluded from the available
RAID0 targets. It does the same with physical volumes from device
mapper.

Fixes #1189
Message-Id: <1466001423-9547-4-git-send-email-benoit@scylladb.com>
2016-06-15 19:36:11 +03:00
Paweł Dziepak
c8e75d2e84 schema: cache is_atomic() in column_definition
is_atomic() is called for each cell in mutation applies, compaction
and query. Since the value doesn't change it can be easily cached which
would save one indirection and virtual call.
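A minimal sketch of the caching idea, assuming a polymorphic type object behind a pointer; `abstract_type`, `int_type` and `list_type` here are simplified stand-ins and the member names are illustrative.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Simplified stand-ins for the polymorphic column types.
struct abstract_type {
    virtual ~abstract_type() = default;
    virtual bool is_atomic() const = 0;
};
struct int_type : abstract_type {
    bool is_atomic() const override { return true; }
};
struct list_type : abstract_type {
    bool is_atomic() const override { return false; }
};

class column_definition {
    std::shared_ptr<const abstract_type> _type;
    bool _is_atomic;   // computed once, since the type never changes
public:
    explicit column_definition(std::shared_ptr<const abstract_type> t)
        : _type(std::move(t)), _is_atomic(_type->is_atomic()) {}

    // No pointer chase and no virtual call on the hot path.
    bool is_atomic() const { return _is_atomic; }
};
```

The cached flag is only valid because the column's type is fixed for the lifetime of the definition.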

Results of perf_simple_query -c1 (median, duration 60):
         before      after
read   54611.49   55396.01   +1.44%
write  65378.92   68554.25   +4.86%

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1465991045-11140-1-git-send-email-pdziepak@scylladb.com>
2016-06-15 19:18:13 +03:00
Benoît Canet
4def1f4524 dist: sysctl.d: Disable automatic numa balancing
On NUMA hardware, autonuma may reduce performance by
unmapping memory.

Since we do manual NUMA placement, autonuma will not
help anything.

We ought to disable it by setting the kernel.numa_balancing
sysctl to 0.

Fixes: #1120

Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1466006345-9972-1-git-send-email-benoit@scylladb.com>
2016-06-15 19:11:00 +03:00
Gleb Natapov
7f54333c45 storage_proxy: fix compilation on older boost
Boost before 1.56.0 had a broken boost::size() implementation. Do not use
it.

Message-Id: <20160615123134.GD5961@scylladb.com>
2016-06-15 15:34:57 +03:00
Asias He
de0fd98349 repair: Switch log level to warn instead of error
dtest takes error level log as serious error. It is not a serious error
for streaming to fail to send a verb and fail a streaming session which
triggers a repair failure, for example, the peer node is gone or
stopped. Switch to use log level warn instead of level error.

Fixes repair_additional_test.py:RepairAdditionalTest.repair_kill_3_test

Fixes: #1335
Message-Id: <406fb0c4a45b81bd9c0aea2a898d7ca0787b23e9.1465979288.git.asias@scylladb.com>
2016-06-15 13:01:35 +03:00
Asias He
94c9211b0e streaming: Switch log level to warn instead of error
dtest takes error level log as serious error. It is not a serious error
for streaming to fail to send a verb and fail a streaming session, for
example, the peer node is gone or stopped. Switch to use log level warn
instead of level error.

Fixes repair_additional_test.py:RepairAdditionalTest.repair_kill_3_test

Fixes: #1335
Message-Id: <0149d30044e6e4d80732f1a20cd20593de489fc8.1465979288.git.asias@scylladb.com>
2016-06-15 13:01:22 +03:00
Vlad Zolotarov
c616e74ae4 locator::gossiping_property_file_snitch: use a lowres_clock time source for a timer
gossiping_property_file_snitch checks a configuration file every 60s.
lowres_clock clock source should be good enough for that.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1465314448-11611-1-git-send-email-vladz@cloudius-systems.com>
2016-06-15 13:01:05 +03:00
Tomasz Grabiec
207c8d94f1 idl: Rename variable to a more meaningful name
Message-Id: <1465909911-10534-2-git-send-email-tgrabiec@scylladb.com>
2016-06-14 17:02:59 +03:00
Raphael S. Carvalho
80d8c5ef6f compaction: use proper type in constructor
Correctness is not affected due to long type, but an unsigned
long type should be definitely used instead.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <d3ab15a3206306de195aeb3d78f9b5bc4ca9208e.1465908970.git.raphaelsc@scylladb.com>
2016-06-14 17:02:32 +03:00
Tomasz Grabiec
8e8f63de85 mutation_partition_view: Avoid unnecessary copy into temporary
Message-Id: <1465909038-8174-1-git-send-email-tgrabiec@scylladb.com>
2016-06-14 17:02:17 +03:00
Tomasz Grabiec
75f899cc93 lsa: Make reclamation step configurable via config
2016-06-14 15:13:15 +02:00
Tomasz Grabiec
cd9955d2ce lsa: Reclaim 1 segment by default
Reclaiming many segments was observed to cause up to multi-ms
latency. With the new setting, the latency of reclamation cycle with
full segments (worst case mode) is below 1ms.

I saw no decrease in throughput compared to the step of 16 segments in
either of these modes:
  - full segments, reclaim by random eviction
  - sparse segments (3% occupancy), reclaim by compaction and no eviction

Fixes #1274.
2016-06-14 15:13:15 +02:00
Tomasz Grabiec
86b76171a8 lsa: Use the same step in both internal and external reclamations
2016-06-14 15:13:15 +02:00
Tomasz Grabiec
d74d902a01 lsa: Make reclamation step configurable
2016-06-14 15:13:14 +02:00
Tomasz Grabiec
93bb95bd0d lsa: Log reclamation rate
2016-06-14 15:13:14 +02:00
Tomasz Grabiec
cb18418022 lsa: Print more details before aborting
2016-06-14 15:13:14 +02:00
Tomasz Grabiec
7cb98c916f tests: lsa_async_eviction_test: Push to refs with reclaim lock
push_back() is not reentrant with pop_front(), used by the evictor. If
reclaimer runs when std::deque allocates a new node it will get
corrupted. Fix by running push_back() under reclaim lock.
2016-06-14 15:13:14 +02:00
Tomasz Grabiec
de8772525a tests: lsa_async_eviction_test: Make sure refs scope encloses reclaimer scope
2016-06-14 15:13:14 +02:00
Tomasz Grabiec
c4a556ac13 tests: lsa_async_eviction_test: Fix use after free due to at_exit() callback
The callback will run after thread is destroyed. We don't really need
the stop feature, so for now just remove it.
2016-06-14 15:13:14 +02:00
Pekka Enberg
155ad2eeb5 storage_service: Fix start_rpc_server() to use logger
Message-Id: <1465882880-7392-1-git-send-email-penberg@scylladb.com>
2016-06-14 09:52:04 +02:00
Raphael S. Carvalho
0b2cd41daf database: remember sstable level when cleaning it up
Cleanup operation wasn't preserving level of sstables. That will have
a bad impact on performance because compaction work is lost.

Fixes #1317.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <35ce8fbbb4590725bb0414e6a5450fcbe6cb7212.1465843387.git.raphaelsc@scylladb.com>
2016-06-14 08:06:00 +03:00
Vlad Zolotarov
d3960f0bbb tracing: rearrange shut down
tracing::tracing local instance is dereferenced from a
cql_server::connection::process_request(), therefore tracing::tracing
service may be stop()ed only after a CQL server service is down.
On the other hand it may not be stopped before RPC service is down
because a remote side may request a tracing for a specific command too.

This patch splits the tracing::tracing stop() into two phases:
   1) Flush all pending tracing records and stop the backend.
   2) Stop the service.

The first phase is called after CQL server is down and before RPC is down.
The second phase is called after RPC is down.

Fixes #1339

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1465840496-19990-1-git-send-email-vladz@cloudius-systems.com>
2016-06-14 07:58:04 +03:00
Avi Kivity
49449fc30c Merge seastar upstream
* seastar 864d6dc...401c333 (8):
  > scollectd: Support filtering specific collectd metrics
  > core: Integrate error reporting with the logging framework
  > rpc: wait for all replies to be completed before closing rpc server
  > rpc: clean up resource accounting
  > queue: fix race between pop_eventually() and abort()
  > rpc_test: fix cancel test to not depend on timing.
  > tutorial: explain application-specific command line options
  > add ostream output operator for std::unordered_map
2016-06-13 19:35:00 +03:00
Gleb Natapov
e089166cfa storage_proxy: wait only for expected CL when writing back data during read repair
When read repair writes diffs back to replicas it is enough to wait
for requested CL to guarantee read monotonicity. This patch makes read
repair write reuse regular mutate functionality which already tracks
CL status. This is done by changing write response handler to not hold
mutation directly, but instead hold a container that, depending on
whether this is read repair write or regular one, can provide different
mutation per destination.

Message-Id: <20160613124727.GL1096@scylladb.com>
2016-06-13 19:01:51 +03:00
Duarte Nunes
c896309383 database: Actually decrease query_state limit
query_state expects the current row limit to be updated so it
can be enforced across partition ranges. A regression introduced
in e4e8acc946 prevented that from
happening by passing a copy of the limit to querying_reader.

This patch fixes the issue by having column_family::query update
the limit as it processes partitions from the querying_reader.
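
A minimal sketch of the idea, assuming a simplified model in which each
partition range just reports how many rows it holds. query_state_sketch,
query_range() and query_all() are hypothetical names, not the real
column_family::query code. The point is that the remaining limit is shared
by reference, so rows taken from one range reduce what later ranges may
return.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the state shared across partition ranges.
struct query_state_sketch {
    std::uint32_t remaining_rows;
};

// Takes up to the remaining limit from one range and charges the rows
// against the shared state. The state is passed by reference; passing a
// copy would leave the caller's limit unchanged.
inline std::uint32_t query_range(query_state_sketch& qs, std::uint32_t rows_in_range) {
    std::uint32_t taken = rows_in_range < qs.remaining_rows ? rows_in_range : qs.remaining_rows;
    qs.remaining_rows -= taken;
    return taken;
}

// Queries all ranges in order and returns the total number of rows read.
inline std::uint32_t query_all(std::uint32_t limit, const std::vector<std::uint32_t>& ranges) {
    query_state_sketch qs{limit};
    std::uint32_t total = 0;
    for (auto rows : ranges) {
        if (qs.remaining_rows == 0) {
            break;  // limit reached, skip the remaining ranges
        }
        total += query_range(qs, rows);
    }
    return total;
}
```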

Fixes #1338

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1465804012-30535-1-git-send-email-duarte@scylladb.com>
2016-06-13 10:03:27 +02:00
Avi Kivity
465c0a4ead Merge "Make stronger guarantees in row_cache's clear/invalidate" from Tomasz
"Correctness of current uses of clear() and invalidate() relies on fact
that cache is not populated using readers created before
invalidation. Sstables are first modified and then cache is
invalidated. This is not guaranteed by current implementation
though. As pointed out by Avi, a populating read may race with the
call to clear(). If that read started before clear() and completed
after it, the cache may be populated with data which does not
correspond to the new sstable set.

To provide such a guarantee, invalidate() variants were adjusted to
synchronize using _populate_phaser, similarly to how row_cache::update()
does.

Fixes #1291."
2016-06-13 09:55:29 +03:00
Shlomi Livne
ac6f2b5c13 dist/common: Update scylla_io_setup to use settings done in cpuset.conf
scylla_io_setup searches for the --smp and --cpuset settings in
SCYLLA_ARGS. We have moved these settings into
/etc/scylla.d/cpuset.conf, and scylla_cpuset_setup sets them in
CPUSET.

Fixes: #1327

Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
Message-Id: <2735e3abdd63d245ec96cfa1e65f766b1c12132e.1465508701.git.shlomi@scylladb.com>
2016-06-10 09:37:44 +03:00
Vlad Zolotarov
89375d4c2a service::storage_proxy: tracing: instrument read_digest and read_mutation_data
Instrument read_digest and read_mutation_data handlers similarly
to a read_data handler instrumentation.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1465304055-4263-1-git-send-email-vladz@cloudius-systems.com>
2016-06-09 14:32:42 +02:00
Pekka Enberg
8df5aa7b0c utils/exceptions: Whitelist EEXIST and ENOENT in should_stop_on_system_error()
There are various call-sites that explicitly check for EEXIST and
ENOENT:

  $ git grep "std::error_code(E"
  database.cc:                            if (e.code() != std::error_code(EEXIST, std::system_category())) {
  database.cc:            if (e.code() != std::error_code(ENOENT, std::system_category())) {
  database.cc:        if (e.code() != std::error_code(ENOENT, std::system_category())) {
  database.cc:                            if (e.code() != std::error_code(ENOENT, std::system_category())) {
  sstables/sstables.cc:            if (e.code() == std::error_code(ENOENT, std::system_category())) {
  sstables/sstables.cc:            if (e.code() == std::error_code(ENOENT, std::system_category())) {

Commit 961e80a ("Be more conservative when deciding when to shut down
due to disk errors") turned these errors into a storage_io_exception
that is not expected by the callers, which causes 'nodetool snapshot'
functionality to break, for example.

Whitelist the two error codes to revert back to the old behavior of
io_check().
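
A minimal sketch of such a whitelist, assuming the decision is made from the
std::system_error alone. The function name carries a _sketch suffix because
the signature and body of the real should_stop_on_system_error() are not
shown here and may differ.

```cpp
#include <cerrno>
#include <system_error>

// Returns true if the error should be treated as fatal for the node.
// EEXIST and ENOENT are whitelisted because callers check for them
// explicitly and handle them on their own.
inline bool should_stop_on_system_error_sketch(const std::system_error& e) {
    if (e.code().category() == std::system_category()) {
        switch (e.code().value()) {
        case EEXIST:
        case ENOENT:
            return false;
        default:
            break;
        }
    }
    return true;
}
```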
Message-Id: <1465454446-17954-1-git-send-email-penberg@scylladb.com>
2016-06-09 10:03:04 +02:00
Pekka Enberg
02d033667a utils: Improve storage_io_exception error message
Make storage_io_exception exception error message less cryptic by
actually including the human-readable error message from
std::system_error...

Before:

  nodetool: Scylla API server HTTP POST to URL '/storage_service/snapshots' failed: Storage io error errno: 2

After:

  nodetool: Scylla API server HTTP POST to URL '/storage_service/snapshots' failed: Storage I/O error: 2: No such file or directory

We can improve this further by including the name of the file that the I/O
error happened on.
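
One way to build a message of the "After" form is sketched below. The helper
name storage_io_error_message() is hypothetical; only the message layout is
taken from the example above, and the human-readable part comes from
std::error_code::message(), whose exact text is platform dependent.

```cpp
#include <cerrno>
#include <string>
#include <system_error>

// Builds "Storage I/O error: <errno>: <human-readable description>".
inline std::string storage_io_error_message(const std::system_error& e) {
    return "Storage I/O error: " + std::to_string(e.code().value())
         + ": " + e.code().message();
}
```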
Message-Id: <1465452061-15474-1-git-send-email-penberg@scylladb.com>
2016-06-09 09:58:00 +02:00
Tomasz Grabiec
d5a2d7a88d row_cache: Add eviction and removal counters
Fixes #1273.

Message-Id: <1465315433-8473-1-git-send-email-tgrabiec@scylladb.com>
2016-06-08 16:08:32 -04:00
Nadav Har'El
721f7d1d4f Rewrite shared sstables soon after startup
Several shards may share the same sstable - e.g., when re-starting scylla
with a different number of shards, or when importing sstables from an
external source. Sharing an sstable is fine, but it can result in excessive
disk space use because the shared sstable cannot be deleted until all
the shards using it have finished compacting it. Normally, we have no idea
when the shards will decide to compact these sstables - e.g., with size-
tiered-compaction a large sstable will take a long time until we decide
to compact it. So what this patch does is to initiate compaction of the
shared sstables - on each shard using it - so that as soon as possible after
the restart, the original sstable is split into separate
sstables per shard, and the original sstable can be deleted. If several
sstables are shared, we serialize this compaction process so that each
shard only rewrites one sstable at a time. Regular compactions may happen
in parallel, but they will not be able to choose any of the shared
sstables because those are already marked as being compacted.

Commit 3f2286d0 increased the need for this patch, because since that
commit, if we don't delete the shared sstable, we also cannot delete
additional sstables which the different shards compacted with it. For one
scylla user, this resulted in so much excessive disk space use, that it
literally filled the whole disk.

After this patch, commit 3f2286d0 (or the discussion in issue #1318 on how
to improve it) is no longer necessary, because we will never compact a shared
sstable together with any other sstable - as explained above, the shared
sstables are marked as "being compacted" so the regular compactions will
avoid them.

Fixes #1314.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1465406235-15378-1-git-send-email-nyh@scylladb.com>
Reviewed-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-06-08 15:44:29 -04:00
Raphael S. Carvalho
1b8e170254 compaction: retry compaction until strategy is satisfied
Previously, we were using a stat to decide if compaction should be
retried, but that's not efficient. The information is also lost
after node is restarted.

After these changes, compaction will be retried until strategy is
satisfied, i.e. there is nothing to compact.
We will now be doing the following in a loop:
Get compaction job from compaction strategy.
	If cannot run, finish the loop.
	Otherwise, compact this column family.
Go back to start of the loop.
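
The loop above can be sketched like this. strategy_sketch and its next_job()
method are hypothetical; jobs are modelled as plain strings and "compacting"
only records the job name.

```cpp
#include <deque>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical strategy: hands out jobs until it has nothing left to compact.
struct strategy_sketch {
    std::deque<std::string> jobs;

    std::optional<std::string> next_job() {
        if (jobs.empty()) {
            return std::nullopt;  // strategy is satisfied
        }
        std::string job = std::move(jobs.front());
        jobs.pop_front();
        return job;
    }
};

// Keeps asking the strategy for work and runs it until no job is returned.
// Returns the jobs that were run, in order.
inline std::vector<std::string> compact_until_satisfied(strategy_sketch& s) {
    std::vector<std::string> done;
    while (auto job = s.next_job()) {
        // A real implementation would compact the column family here.
        done.push_back(*job);
    }
    return done;
}
```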

By the way, the pending_compactions stat will be deprecated after this
commit. Previously, it was increased to indicate that compaction was
wanted and decreased when compaction finished. Now, we can
compact more than we asked for, so it could be decreased below 0.
Also, it is now the strategy that tells whether compaction is wanted.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <899df0d8d807f6b5d9bb8600d7c63b4e260cc282.1465398243.git.raphaelsc@scylladb.com>
2016-06-08 11:31:56 -04:00
Avi Kivity
7bd4b7ca63 cql3: split use_statement into raw and prepared variants
Rather than having one class fulfil both roles, have one class per role,
disentangling dependencies.
Message-Id: <1465053407-20931-1-git-send-email-avi@scylladb.com>
2016-06-08 16:48:45 +03:00
Yoav Kleinberger
43071bf488 tools/scyllatop: handle absentee metrics
Sometimes a metric previously reported from collectd is not available
anymore. Previously, this caused scyllatop to log an exception to the
user, which in effect destroyed the user experience and inhibited
monitoring of other metrics. This patch makes ScyllaTop handle this
problem. It will display such metrics as 'not available', and exclude
them from sum and average computations.

Closes issue #1287.

Signed-off-by: Yoav Kleinberger <yoav@scylladb.com>
Message-Id: <1465301178-27544-1-git-send-email-yoav@scylladb.com>
2016-06-08 16:35:55 +03:00
Vlad Zolotarov
24624b2600 tests/gossiping_property_file_snitch_test: cancel O_DIRECT enforcement
Cancel O_DIRECT enforcement on shard 0 (a default I/O shard for this snitch)
to ensure proper functioning on any FS (e.g. ecryptfs).

Otherwise the test fails on file systems that do not support O_DIRECT.

Fixes #1324

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1465385087-20510-1-git-send-email-vladz@cloudius-systems.com>
2016-06-08 16:21:44 +03:00
Tomasz Grabiec
287ff7dbd3 Merge tag 'ms_update/v2' from seastar-dev.git
From Asias:

In f27e5d2a6 (messaging_service: Delay listening ms during boot up),
messaging_service startup was split into two stages. Adjust the api
registration code and fix up the messaging_service stop code.
2016-06-08 10:25:14 +02:00
Paul McGuire
8326fe7760 Clean up idl-compiler pyparsing usage
This patch makes a few minor improvements in the parser:

  - merge first and rest into 2-argument form of Word to define
    identifier – should give some performance boost, simpler code

  - replace Literal(keyword_string) with Keyword(keyword_string)
    throughout - stricter parsing, avoids misinterpreting identifiers
    with keywords

  - replace expr.setResultsName("name") with expr("name") throughout –
    this is a style change (no actual change in underlying parser
    behavior), but I find this form easier to follow

  - add calls to setName to make exceptions more readable

Message-Id: <005901d1bbd2$711f7bb0$535e7310$@austin.rr.com>
2016-06-08 08:13:05 +03:00
Asias He
b36d3be5d4 messaging_service: Fix messaging_service::stop
There are two problems:

1. _server_tls is not stopped

2. _server and _server_tls might not be created if
messaging_service::start_listen is not called yet.
2016-06-08 11:13:36 +08:00
Asias He
e6f63a50e1 main: Delay the messaging_service api registration
Since messaging_service is fully initialized in
storage_service::init_server which calls
messaging_service::start_listen, we need to delay
the messaging_service api registration after it.
2016-06-08 11:13:35 +08:00
Asias He
f7d25e6bae messaging_service: Handle _server is not created in foreach_server_connection_stats
It is possible _server is not created yet when
foreach_server_connection_stats is called. Handle this case.
2016-06-08 11:13:35 +08:00
Gleb Natapov
9635e67a84 config: adjust boost::program_options validator to work with db::string_map
Fixes #1320

Message-Id: <20160607064511.GX9939@scylladb.com>
2016-06-07 10:42:27 +03:00
Amnon Heiman
2cf882c365 rate_moving_average: mean_rate is not initialized
The rate_moving_average is used by timed_rate_moving_average to return
its internal values.

If there are no timed events, the mean_rate is not properly initialized.
To solve that, the mean_rate is now initialized to 0 in the structure
definition.

Refs #1306

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1465231006-7081-1-git-send-email-amnon@scylladb.com>
2016-06-07 09:38:58 +03:00
Vlad Zolotarov
ce08bc611c tracing: fix debug compilation
Define flush_period as a const and not as constexpr.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1465240516-20128-1-git-send-email-vladz@cloudius-systems.com>
2016-06-06 22:15:27 -04:00
Avi Kivity
e25380347a Merge "tracing: probabilistic tracing" from Vlad
"This series includes some fixes to and adds a probabilistic tracing feature."
2016-06-06 11:25:18 -04:00
Benoît Canet
b508aaf0d9 docker: Add the production environment variable
This variable if set to true will activate
developer mode. It will be set by using the
-e option of docker run.

The xfs bind mount behavior and the cpuset behavior
will be set by using the relevant docker command
lines options and documented in the scylla/docker
howto.

Fixes: #1267

Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1465213713-2537-1-git-send-email-benoit@scylladb.com>
2016-06-06 16:28:17 +03:00
Benoît Canet
c771854120 docker: Start scylla on ubuntu docker
Make it behave on par with redhat version

Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1465218003-2740-1-git-send-email-benoit@scylladb.com>
2016-06-06 16:27:03 +03:00
Vlad Zolotarov
0611417c76 api::storage_service: add set_trace_probability/get_trace_probability
Trace probability defines a probability for the next CQL command
to be traced.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-06-06 15:44:28 +03:00
Vlad Zolotarov
905190ac06 tracing: add support for probabilistic tracing
Add a support for defining a probability (a value in a [0,1] range)
for tracing the next CQL request.

Traces for requests that are chosen to be traced due to this feature
are not going to be flushed immediately.

Use std::subtract_with_carry_engine (implements the "lagged Fibonacci" algorithm)
random number engine for fastest generation of random integer values.
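
A minimal sketch of probabilistic sampling with a subtract-with-carry engine.
trace_sampler_sketch and its members are hypothetical names; the choice of
std::ranlux48_base (a standard std::subtract_with_carry_engine instantiation
producing 48-bit values) and the fixed default seed are assumptions made for
the example, not details taken from the patch.

```cpp
#include <cstdint>
#include <random>

class trace_sampler_sketch {
    std::ranlux48_base _gen;    // values are uniform in [0, 2^48 - 1]
    std::uint64_t _threshold;   // trace when the next value is below this
public:
    explicit trace_sampler_sketch(double probability, std::uint64_t seed = 1)
        : _gen(seed), _threshold(to_threshold(probability)) {}

    // Decides whether the next request should be traced.
    bool should_trace() { return _gen() < _threshold; }

private:
    // Maps a probability in [0, 1] to an integer threshold in [0, 2^48],
    // so the per-request check is a single integer comparison.
    static std::uint64_t to_threshold(double p) {
        const std::uint64_t range = std::uint64_t(1) << 48;
        if (!(p > 0)) {
            return 0;       // never trace (also covers NaN)
        }
        if (p >= 1) {
            return range;   // always trace
        }
        return static_cast<std::uint64_t>(p * static_cast<double>(range));
    }
};
```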

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-06-06 15:41:01 +03:00
Vlad Zolotarov
779ff88c76 tracing: add flush timer
Flush pending sessions to the storage every 2 seconds.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-06-06 14:34:08 +03:00
Tomasz Grabiec
170a214628 row_cache: Make stronger guarantees in clear/invalidate
Correctness of current uses of clear() and invalidate() relies on the fact
that the cache is not populated using readers created before
invalidation. Sstables are first modified and then the cache is
invalidated. This is not guaranteed by the current implementation
though. As pointed out by Avi, a populating read may race with the
call to clear(). If that read started before clear() and completed
after it, the cache may be populated with data which does not
correspond to the new sstable set.

To provide such a guarantee, invalidate() variants were adjusted to
synchronize using _populate_phaser, similarly to how row_cache::update()
does.
2016-06-06 13:21:06 +02:00
Vlad Zolotarov
4b008ac5ea tracing: rework maximum sessions amount back pressure strategy
A tracing session life cycle includes 3 stages:
   1) Active: when new trace records are being added to this session.
   2) Pending for flushing to a storage: when session is over but not
      yet flushed to the storage ("backend").
   3) Flushing: when session's records are being flushed to the storage
      and this process is not yet completed.

Sessions may accumulate in each of the stages above and we should limit
the maximum amount of sessions being accumulated in each of them in order to avoid OOM
situation.

Current in-tree implementation only limits the number of tracing sessions
accumulated in the first ("Active") stage.

Since currently every closing session is being immediately flushed (as long
as "settraceprobability" is not implemented) the second stage never accumulates
tracing sessions.

The third stage is currently not controlled at all and if, for instance, we
manage to push enough tracing sessions towards a slow storage backend, they may
accumulate there consuming an uncontrolled amount of memory and may eventually consume
all of it.

This patch fixes this unpleasant situation by applying the following strategy:

   - Limit the total amount of accumulated tracing sessions in all stages above together
     by a static value - 2 times "flush threshold". "2 times" is needed to allow new
     tracing sessions to accumulate in the stage 2 while sessions in the stage 3 are still
     being  processed.
   - Forcefully flush sessions in the stage 2 to the storage when their count reaches a "flush
     threshold".

This ensures that there will be no more than (2 * "flush threshold") sessions in total (in any stage)
on each shard.

An advantage of this strategy is its simplicity - we only need a single threshold to control all stages.
If we feel that we need finer granularity for each stage, we may add separate limits for each of them
in the future.
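
The single-threshold accounting can be sketched as below. session_budget_sketch
and its members are hypothetical; the three counters stand for the Active,
Pending and Flushing stages, and the actual flush to storage is left out.

```cpp
#include <cstddef>

struct session_budget_sketch {
    std::size_t flush_threshold;
    std::size_t active = 0;    // stage 1
    std::size_t pending = 0;   // stage 2
    std::size_t flushing = 0;  // stage 3

    std::size_t total() const { return active + pending + flushing; }

    // A new session is admitted only while the total across all stages
    // stays below 2 * flush_threshold.
    bool try_begin_session() {
        if (total() >= 2 * flush_threshold) {
            return false;
        }
        ++active;
        return true;
    }

    // Moves one session from Active to Pending. Returns true if the
    // pending count reached the threshold and a flush was forced.
    bool end_session() {
        if (active == 0) {
            return false;
        }
        --active;
        ++pending;
        if (pending < flush_threshold) {
            return false;
        }
        flushing += pending;  // hand the whole batch to the backend
        pending = 0;
        return true;
    }

    // Called when the backend finished writing n sessions.
    void flush_completed(std::size_t n) {
        flushing -= (n < flushing ? n : flushing);
    }
};
```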

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-06-06 13:50:41 +03:00
Pekka Enberg
b407031401 Update scylla-ami submodule
* dist/ami/files/scylla-ami 72ae258...863cc45 (3):
  > Move --cpuset/--smp parameter settings from scylla_sysconfig_setup to scylla_ami_setup
  > convert scylla_install_ami to bash script
  > 'sh -x -e' is not valid since all scripts converted to bash script, so remove them
2016-06-06 13:37:21 +03:00
Vlad Zolotarov
35402b965f service/client_state: don't try to dereference a tracing state if it's not initialized
A call to tracing::tracing::create_session() does not guarantee that a session is created.
Check that the session is actually created before trying to use it.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-06-06 13:00:31 +03:00
Vlad Zolotarov
139fa9d1bd tracing: minor cleanups
- Make small functions on a fast path "inline".
- Add "const" qualifier where needed.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-06-06 13:00:31 +03:00
Avi Kivity
961e80ab74 Be more conservative when deciding when to shut down due to disk errors
Currently we only shut down on EIO.  Expand this to shut down on any
system_error.

This may cause us to shut down prematurely due to a transient error,
but this is better than not shutting down due to a permanent error
(such as ENOSPC or EPERM).  We may whitelist certain errors in the future
to improve the behavior.

Fixes #1311.
Message-Id: <1465136956-1352-1-git-send-email-avi@scylladb.com>
2016-06-06 10:56:34 +02:00
Raphael S. Carvalho
17b56eb459 compaction: leveled: improve log message for overlapping table
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <2dcbe3c8131f1d88a3536daa0b6cdd25c6e41d76.1464883077.git.raphaelsc@scylladb.com>
2016-06-05 18:20:01 +03:00
Raphael S. Carvalho
588ce915d6 compaction: disable parallel compaction for leveled strategy
It was discussed that leveled strategy may not benefit from the parallel
compaction feature because almost all compaction jobs will have similar
size. It was also found that leveled strategy wasn't working correctly
with it because two overlapping sstables (targeting the same level)
could be created in parallel by two ongoing compactions.

Fixes #1293.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <60fe165d611c0283ca203c6d3aa2662ab091e363.1464883077.git.raphaelsc@scylladb.com>
2016-06-05 18:20:00 +03:00
Amnon Heiman
5f84e55bf6 histogram: total need to be increment on plus operator
The total counter (the one that counts the actual number of sample
points) should be incremented when adding histograms.
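
A minimal sketch of what this means for operator+. histogram_sketch and its
two fields are hypothetical simplifications; the real histogram class has
more state than is shown here.

```cpp
#include <cstdint>

struct histogram_sketch {
    std::uint64_t total = 0;  // number of sample points recorded
    std::uint64_t sum = 0;    // sum of all sample values

    void add(std::uint64_t sample) {
        ++total;
        sum += sample;
    }

    // Merging two histograms must also merge the sample counts;
    // otherwise any value derived from total is wrong after the merge.
    histogram_sketch& operator+=(const histogram_sketch& o) {
        total += o.total;
        sum += o.sum;
        return *this;
    }
};

inline histogram_sketch operator+(histogram_sketch a, const histogram_sketch& b) {
    a += b;
    return a;
}
```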

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1464172277-4251-1-git-send-email-amnon@scylladb.com>
2016-06-05 12:09:36 +03:00
Tomasz Grabiec
2ab18dcd2d row_cache: Implement clear() using invalidate()
Reduces code duplication.
2016-06-03 13:34:40 +02:00
Tomasz Grabiec
57413618e8 Merge branch 'range-tombstone-v9' from https://github.com/duarten/scylla.git
From Duarte:

This patchset adds the range_tombstone_list data structure,
used to hold a set of disjoint range tombstones, and changes
the internal representation of row tombstones to use that
data structure.

Fixes #1155

[tgrabiec: Added compound_wrapper::make_empty(const schema&) overload
	   to fix compilation failure in tracing code]
2016-06-02 22:17:17 +02:00
Raphael S. Carvalho
3f4500cb71 db: compaction strategy changes via alter table must have immediate effect
At the moment, compaction strategy changes via ALTER TABLE have no effect until
node restart.

Tomek says: "Statements of the following form should have immediate effect:
ALTER TABLE t WITH compaction = { 'class' : 'LeveledCompactionStrategy' };"

Fixes #877.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <3b72c494f887643b82a272ef0a9995edb970382c.1464726828.git.raphaelsc@scylladb.com>
2016-06-02 16:59:50 +02:00
Pekka Enberg
d03f65d94e database: Don't use std::cbegin() and std::cend()
They're not supported by GCC 4.9.

Fixes #1305
Message-Id: <1464877984-27856-1-git-send-email-penberg@scylladb.com>
2016-06-02 16:57:24 +02:00
Duarte Nunes
c970d682d1 storage_service: Announce range tombstones feature
This patch enables the RANGE_TOMBSTONES supported feature, meaning
that the node is capable of accepting row entry tombstones as range
tombstones.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-02 16:21:59 +02:00
Duarte Nunes
70083efee2 sstables: Read and write range tombstone bounds
This patch uses the composite_marker to add inclusiveness information
to the prefixes of a range tombstone.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-02 16:21:59 +02:00
Duarte Nunes
7628e403a3 sstables: Drop code for tombstone merging
Since Scylla now supports proper range tombstones, the code for
reading ranges from sstables and converting them to overlapping
tombstones is no longer necessary, and is, in fact, wasteful as
the internal representation converts overlapping tombstones back to
ranges.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-02 16:21:59 +02:00
Duarte Nunes
79bff2742f random_mutation_generator: Generate range tombstones
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-02 16:21:59 +02:00
Duarte Nunes
95594b8171 mutations: Encapsulate row tombstones difference
This patch moves the difference between two mutation_partition's
row_tombstones inside the range_tombstone_list.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-02 16:21:59 +02:00
Duarte Nunes
91aac30f12 mutations: Row tombstones are now a set of ranges
This patch changes the type of the mutation partition's row_tombstones
to be a range_tombstone_list, so that they are now represented as a
set of disjoint ranges. All of its usages are updated accordingly.

Fixes #1155

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-02 16:21:59 +02:00
Duarte Nunes
e46537b7d3 storage_service: Include range tombstones feature
This patch adds the range tombstones feature, which is not enabled
yet, to the storage_service, so that consumers can query for it.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-02 16:21:58 +02:00
Duarte Nunes
17a544c4a6 gossip: Add feature default ctor and operator=
This allows a feature to be declared and initialized later.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-02 16:21:58 +02:00
Duarte Nunes
2c82dcd309 gossip: Decouple feature lifetime from the gossiper
This patch changes the gms::feature destructor so it
checks whether the gossiper has been stopped before trying
to unregister the feature.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-02 16:21:58 +02:00
Duarte Nunes
351aaf9738 range_tombstone: Introduce range_tombstone_to_prefix_tombstone_converter
This patch extracts the code from sstables/partition.cc which is used
to transform a set of range tombstones into a set of overlapping
scylladb tombstones.

The range_tombstone_merger will be used to send mutations to nodes not
yet updated to support the internal range tombstone representation.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-02 16:21:58 +02:00
Duarte Nunes
f7809bcaef range_tombstone_list: Add unit test
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-02 16:21:58 +02:00
Duarte Nunes
284bb6b66f range_tombstone_list: Make it ReversiblyMergeable
This patch implements the ReversiblyMergeable cancellative monoid
for the range_tombstone_list.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-02 16:21:58 +02:00
Duarte Nunes
86030885c8 mutations: Introduce range tombstone list
This class is responsible for representing a set of range tombstones
as non-overlapping disjoint sets of range tombstones.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-02 16:21:58 +02:00
Duarte Nunes
6a111fdd01 mutations: Introduce the range_tombstone class
This patch introduces the range_tombstone class, composed of
a [start, end] pair of clustering_key_prefixes, the type
of inclusiveness of each bound, and a tombstone.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-02 16:21:58 +02:00
Duarte Nunes
dc8319ed91 keys: Remove schema argument from make_empty
An empty key is independent of the schema.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-02 16:21:36 +02:00
Duarte Nunes
7f8c35dd8c idl: Add range tombstone IDL
This patch adds the range tombstone IDL, preserving backwards
compatibility.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-02 16:21:36 +02:00
Duarte Nunes
9bd7d08fc7 idl-compiler: Default expr can refer to previous fields
This patch changes the idl-compiler so that the default value of a
field can be set to the value of a previous field in the class:

class P {
    uint32_t x;
    uint32_t y = x;
};

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-02 16:21:36 +02:00
Duarte Nunes
e2812c1b7a idl: Rename range_tombstone::key to start
... and make it a clustering_key_prefix, in preparation of
supporting not-whole-row range tombstones.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-02 16:21:36 +02:00
Pekka Enberg
f64c25a495 cql3/statements/select_statement: Unify coding style
The coding style in select_statement.cc is very inconsistent which makes
the code hard to read. Clean that up.
Message-Id: <1464871790-21031-1-git-send-email-penberg@scylladb.com>
2016-06-02 16:17:21 +02:00
Avi Kivity
6da0449fc7 tests: adjust config_test for db::string_map changes
2016-06-02 14:48:02 +03:00
Gleb Natapov
9132604a90 config: make string_map to be a unique type instead of an alias to unordered_map
Config provides operators << >> for string_map which makes it impossible
to have generic stream operators for unordered_map. Fix it by making
string_map a separate type and not just an alias.
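
A minimal sketch of the approach, placed in a hypothetical db_sketch
namespace. The "key=value" output format is made up for the example. Because
string_map is its own type declared in that namespace, the stream operator
can live next to it, is found through argument-dependent lookup, and no
longer collides with a generic operator<< for std::unordered_map.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <unordered_map>

namespace db_sketch {

// A distinct type rather than an alias, so overloads can target it alone.
struct string_map : std::unordered_map<std::string, std::string> {};

// Found via ADL for db_sketch::string_map arguments only.
inline std::ostream& operator<<(std::ostream& os, const string_map& m) {
    bool first = true;
    for (const auto& kv : m) {
        if (!first) {
            os << ',';
        }
        os << kv.first << '=' << kv.second;
        first = false;
    }
    return os;
}

} // namespace db_sketch
```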

Message-Id: <20160602102642.GJ9939@scylladb.com>
2016-06-02 13:28:40 +03:00
Asias He
96463cc17c streaming: Fix indention in do_send_mutations
Message-Id: <bc8cfa7c7b29f08e70c0af6d2fb835124d0831ac.1464857352.git.asias@scylladb.com>
2016-06-02 11:56:03 +03:00
Asias He
206955e47c streaming: Reduce memory usage when sending mutations
Limit disk bandwidth to 5MB/s to emulate a slow disk:
echo "8:0 5000000" > /cgroup/blkio/limit/blkio.throttle.write_bps_device
echo "8:0 5000000" > /cgroup/blkio/limit/blkio.throttle.read_bps_device

Start scylla node 1 with low memory:
scylla -c 1 -m 128M --auto-bootstrap false

Run c-s:
taskset -c 7 cassandra-stress write duration=5m cl=ONE -schema 'replication(factor=1)' -pop seq=1..100000  -rate threads=20 limit=2000/s -node 127.0.0.1

Start scylla node 2 with low memory:
scylla -c 1 -m 128M --auto-bootstrap true

Without this patch, I saw std::bad_alloc during streaming

ERROR 2016-06-01 14:31:00,196 [shard 0] storage_proxy - exception during mutation write to 127.0.0.1: std::bad_alloc (std::bad_alloc)
...
ERROR 2016-06-01 14:31:10,172 [shard 0] database - failed to move memtable to cache: std::bad_alloc (std::bad_alloc)
...

To fix:

1. Apply the streaming mutation limiter before we read the mutation into
memory to avoid wasting memory holding the mutation which we can not
send.

2. Reduce the parallelism of sending streaming mutations. Before, we sent all
ranges in parallel; now we send the ranges one by one.

   before: nr_vnode * nr_shard * (send_info + cf.make_reader memory usage)

   after: nr_shard * (send_info + cf.make_reader memory usage)

This reduces memory usage by at least a factor of nr_vnode, 256 by
default.
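
The arithmetic can be written out as below. The function names and the
per_range_bytes parameter (standing for "send_info + cf.make_reader memory
usage") are hypothetical, and any numbers passed in are illustrative rather
than measured.

```cpp
#include <cstdint>

// Before: every vnode range on every shard is sent in parallel.
constexpr std::uint64_t streaming_mem_before(std::uint64_t nr_vnode,
                                             std::uint64_t nr_shard,
                                             std::uint64_t per_range_bytes) {
    return nr_vnode * nr_shard * per_range_bytes;
}

// After: each shard sends one range at a time.
constexpr std::uint64_t streaming_mem_after(std::uint64_t nr_shard,
                                            std::uint64_t per_range_bytes) {
    return nr_shard * per_range_bytes;
}
```

With 256 vnodes the ratio between the two is 256, whatever the per-range
cost and shard count are.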

In my setup, fix 1) alone is not enough; with both fix 1) and 2), I saw
no std::bad_alloc. Also, I did not see streaming bandwidth drop due
to 2).

In addition, I tested grow_cluster_test.py:GrowClusterTest.test_grow_3_to_4,
as described:

https://github.com/scylladb/scylla/issues/1270#issuecomment-222585375

With this patch, I saw no std::bad_alloc any more.

Fixes: #1270

Message-Id: <7703cf7a9db40e53a87f0f7b5acbb03fff2daf43.1464785542.git.asias@scylladb.com>
2016-06-02 11:01:58 +03:00
Gleb Natapov
1476becd28 config: put operators << and >> into db namespace
Makes ADL find the right version of the overload.

Message-Id: <20160601130952.GJ2381@scylladb.com>
2016-06-02 10:45:01 +03:00
Pekka Enberg
b6b2c84316 Merge "CQL tracing" from Vlad
"This series introduces a tracing infrastructure that may be used
for tracing CQL commands execution and measuring latencies of separate
stages of CQL handling as defined by a CQL binary protocol specification.

To begin tracing one should create a "tracing session", which may then
be used to issue tracing events.

If execution of a specific CQL command involves other Nodes (not only a Coordinator),
then a "tracing session ID" is passed to that Node (in the context of the
corresponding RPC call). Then this "session ID" may be used to create a
"secondary tracing session" to issue tracing events in the context of the original session.

The series contains an implementation of tracing that uses a keyspace in the current
cluster for storing tracing information.

This series contains a demo per-request tracing instrumentation of a QUERY
CQL command and even this instrumentation is partial: it only fully instruments
a QUERY->SELECT->read_data call chain.

This is only the very beginning of the proper instrumentation, which is
still to come.

Right now the latencies for a single SELECT of a single row with RF 1 from a 2-Node cluster
on my laptop, started using ccm (for C* all default parameters, for scylla - memory 256MB, --smp 2),
are as follows (pseudo-graphics warning):
--------------------------------------------------------------------------------------------
                                       | scylla (2 Nodes x 2 shards each)  |     C* 2.1.8
_______________________________________|___________________________________|________________
Coordinator and replica are same Node  |                                   |
(TRACING OFF):                         |                0.3ms              |     0.3ms
c-s with a single thread mean latency  |      (was 0.2ms before the last   |
value                                  |       rebase with a master)       |
--------------------------------------------------------------------------------------------
Coordinator and replica are same Node  |                                   |
(TRACING ON)                           |                ~250us             |     ~1200us
Running a SELECT command from a cqlsh  |                                   |
a few times                            |                                   |
--------------------------------------------------------------------------------------------
Coordinator and replica are not on the |                                   |
same Node                              |                ~700us             |     >2500us
(TRACING ON)                           |                                   |
--------------------------------------------------------------------------------------------

To begin tracing one may use the cqlsh "TRACING ON/OFF" commands:

cqlsh> TRACING ON
Now Tracing is enabled
cqlsh> select "C0", "C1" from keyspace1.standard1  where key=0x12345679;

 C0                 | C1
--------------------+------
 0x000000000001e240 | null

(1 rows)

Tracing session: 146f0180-21e7-11e6-b244-000000000000

 activity                                                          | timestamp                  | source    | source_elapsed
-------------------------------------------------------------------+----------------------------+-----------+----------------
 select "C0", "C1" from keyspace1.standard1  where key=0x12345679; | 2016-05-24 22:38:24.536000 | 127.0.0.1 |              0
                              message received from /127.0.0.1 [0] | 2016-05-24 22:38:24.537000 | 127.0.0.2 |             --
                                          Done reading options [0] | 2016-05-24 22:38:24.537000 | 127.0.0.1 |              3
                                    read_data handling is done [0] | 2016-05-24 22:38:24.537000 | 127.0.0.2 |             37
                                           Parsing a statement [0] | 2016-05-24 22:38:24.537000 | 127.0.0.1 |              3
                                        Processing a statement [0] | 2016-05-24 22:38:24.537000 | 127.0.0.1 |             56
                          Done processing - preparing a result [0] | 2016-05-24 22:38:24.537000 | 127.0.0.1 |            550
                                                  Request complete | 2016-05-24 22:38:24.536560 | 127.0.0.1 |            560

cqlsh>"
2016-06-02 08:35:33 +03:00
Avi Kivity
c7953897d1 build: remove obsolete log.cc dependency 2016-06-01 22:35:07 +03:00
Vlad Zolotarov
69bd8efc40 storage_proxy: instrument a read_data handler to accept a tracing info
This is a demo instrumentation:
   - Check if a tracing info is present in the read_command.
   - If yes - create a tracing session with the given tracing
     session ID.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-06-01 20:17:25 +03:00
Vlad Zolotarov
4c17a422e0 cql3: instrument a SELECT query to send tracing info
Instrument a coordinator of a SELECT query to send tracing session
info to the corresponding replica Nodes.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-06-01 20:17:25 +03:00
Vlad Zolotarov
6e26909b02 query::read_command: add an optional trace_info field
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-06-01 20:17:19 +03:00
Vlad Zolotarov
a53d329b25 tracing: add a serializable trace_info object
tracing::trace_info is used to pass the tracing information between nodes.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-06-01 20:16:53 +03:00
Vlad Zolotarov
099ff0d2d5 transport: instrument a QUERY with tracing
- Store a trace state inside a client_state.
   - Start tracing in a cql_server::connection::process_query().

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-06-01 20:14:29 +03:00
Vlad Zolotarov
f994e0a8d0 transport/server: add support for sending a tracing session ID in a CQL response
- Add a tracing ID (UUID) optional field to cql_server::response.
   - If _tracing_id is set make_frame() would insert a tracing ID
     in the response message. According to CQL spec it should be the
     first thing in the response "body" and the TRACING bit (0x02) should be
     set in the "flags" field.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-06-01 20:13:53 +03:00
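The response layout described in commit f994e0a8d0 above can be sketched as follows. This is only an illustration, assuming the 9-byte CQL native protocol v3/v4 frame header (version, flags, stream, opcode, body length); the function name and signature here are hypothetical and are not Scylla's actual make_frame().

```python
import struct
import uuid

TRACING_FLAG = 0x02  # the TRACING bit in the "flags" field, as named in the commit message


def make_frame(version, stream, opcode, body, tracing_id=None):
    """Build a response frame; hypothetical helper, assumes a v3/v4 style header."""
    flags = 0
    if tracing_id is not None:
        # The tracing session ID (16 raw UUID bytes) goes first in the body.
        flags |= TRACING_FLAG
        body = tracing_id.bytes + body
    # Big-endian header: version (1), flags (1), stream (2), opcode (1), length (4).
    header = struct.pack(">BBhBI", version, flags, stream, opcode, len(body))
    return header + body
```

A caller that has no tracing session simply omits tracing_id, in which case the flags byte stays 0 and the body is sent unchanged.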
Vlad Zolotarov
9e61a3498d cql_server::response: rework make_frame()
Use a template function to avoid code duplication.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-06-01 20:13:53 +03:00
Vlad Zolotarov
8bf34fca02 service::client_state: store a client address
When client_state is created with an external_tag - store
a client address in the client state.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-06-01 20:13:53 +03:00
Vlad Zolotarov
c58c56bccc gms::inet_address: add a constructor from socket_address
Currently only IPv4 addresses are supported.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-06-01 20:13:53 +03:00
Vlad Zolotarov
63c724c41d service::client_state: make private fields actually private
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-06-01 20:13:53 +03:00
Vlad Zolotarov
4b43b08ffc main: start a tracing service
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-06-01 20:13:53 +03:00
Vlad Zolotarov
c965528a03 tracing: add a trace_state and tracing classes
trace_state: Is a single tracing session.
tracing:     A sharded service that contains an i_tracing_backend_helper instance
             and is a "factory" of trace_state objects.

trace_state main interface functions are:
   - begin(): Start time counting (should be used via tracing::begin() wrapper).
   - trace(): Create a tracing event - it's coupled with a time passed since begin()
              (should be used via tracing::trace() wrapper).
   - ~trace_state(): Destructor will close the tracing session.

"tracing" service main interface functions are:
   - start(): Initialize a backend.
   - stop():  Shut down a backend.
   - create_session(): Creates a new tracing session.

(tracing::end_session(): Is called by a trace_state destructor).

When trace_state needs to store a tracing event it uses a backend helper from
the "tracing" service.

The "tracing" service limits the number of open tracing sessions to a static number.
If this number is reached, subsequent sessions will be dropped.

trace_state implements a similar strategy with regard to tracing events per single
session.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-06-01 20:13:42 +03:00
Vlad Zolotarov
fa14ad3a99 service/client_state: don't allow modification of a system_trace KS
Only users with enough permissions are allowed to modify system_trace KS.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-06-01 20:12:19 +03:00
Vlad Zolotarov
d3988a8113 tracing::trace_keyspace_helper: a keyspace based i_tracing_backend_helper implementation
Uses a CQL keyspace system_traces to store tracing information.

Uses two tables:

CREATE TABLE system_traces.sessions (
session_id uuid,
command text,
client inet,
coordinator inet,
duration int,
parameters map<text, text>,
request text,
started_at timestamp,
PRIMARY KEY ((session_id)))

and

CREATE TABLE system_traces.events (
session_id uuid,
event_id timeuuid,
activity text,
source inet,
source_elapsed int,
thread text,
PRIMARY KEY ((session_id), event_id))

system_traces.sessions table contains records of tracing sessions.
system_traces.sessions columns description:
   - session_id:  an ID of the session.
   - command:     type of a command this session was created for
                  (currently supported "NONE", "QUERY" and "REPAIR").
   - client:      IP of the client that issued the command.
   - coordinator: IP of a coordinator that received the command.
   - duration:    total duration of the tracing session (in us).
   - parameters:  optional parameters for this session, passed to
                  i_trace_state::begin() call.
   - request:     a CQL command this tracing session is created for.
   - started_at:  the time the session has been started at.

system_traces.events contains records of separate tracing events.
system_traces.events columns description:
   - session_id:     an ID of the session.
   - event_id:       an ID of the event.
   - activity:       the trace point description - a message given to
                     i_trace_state::trace().
   - source:         IP of the Node where trace event was issued.
   - source_elapsed: time passed since creation of a tracing session (in us) on
                     the Node where this trace event was issued.
   - thread:         name of the thread in whose context this trace event was
                     issued (currently it's "core N", where 'N' is the index of
                     the shard the trace event was issued on).

This class will cache lambdas creating the corresponding mutations for each tracing
record requested to be stored till flush() method is called.

flush() will merge all pending mutations to "sessions" and "events" tables and
then apply a mutation to "events" table and when it completes - to "sessions"
table. This way it'll ensure that when some tracing session is visible, all its
events are visible too.

trace_keyspace_helper exposes a few metrics via collectd:
   - tracing_error - a total number of errors (not including OOM)
   - bad_column_family_errors - number of times a tracing record wasn't
                                stored because system_trace tables' schema
                                didn't match the expected value. This may happen if
                                a DB administrator is doing funny things like altering
                                the schemas of the above tables.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-06-01 20:12:19 +03:00
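The flush() ordering described in commit d3988a8113 above (events first, then sessions) can be sketched as follows. This is a minimal illustration only: the class name and the apply callback are hypothetical, and the real helper builds and applies mutations asynchronously rather than calling a plain function.

```python
class TraceBufferSketch:
    """Caches record-producing callables until flush(); hypothetical stand-in."""

    def __init__(self, apply):
        self._apply = apply  # assumed callable(table_name, list_of_records)
        self._pending_sessions = []
        self._pending_events = []

    def store_session_record(self, make_record):
        self._pending_sessions.append(make_record)

    def store_event_record(self, make_record):
        self._pending_events.append(make_record)

    def flush(self):
        # Materialize the cached callables into records.
        events = [make() for make in self._pending_events]
        sessions = [make() for make in self._pending_sessions]
        self._pending_events.clear()
        self._pending_sessions.clear()
        # Apply "events" before "sessions", so that a visible session
        # always has its events visible too.
        if events:
            self._apply("events", events)
        if sessions:
            self._apply("sessions", sessions)
```

The point of the ordering is that a reader who finds a row in "sessions" can rely on the matching "events" rows having been written already.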
Vlad Zolotarov
a2994ffd7f tracing: add i_tracing_backend_helper interface
This class represents an interface for a specific backend that is
going to store tracing information.

A specific implementation may, and is expected to, implement caching
of pending tracing records.

Interface functions are:
   - start(): Initialize a backend (e.g. create keyspace and tables).
   - stop():  Flush all pending work and shut down the backend.
   - store_session_record()/store_event_record():
              Cache/store the corresponding tracing records.
   - flush(): Flush pending tracing records.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-06-01 20:12:13 +03:00
Gleb Natapov
91c773fdde storage_proxy: fix writes_attempts counter
writes_attempts is supposed to count how many times data was sent out, but
currently it also counts those replicas in other DCs that get the data
through a coordinator. Fix it by counting only when data is actually sent.

Message-Id: <20160601153124.GB9939@scylladb.com>
2016-06-01 18:46:23 +03:00
Avi Kivity
8dcbddc7ed Merge "Serialize memtable flushes" from Glauber
"One of the things we need to do as part of the throttle rework I am doing is to
serialize memtable flushes to some extent - that will guarantee that in case
we're throttling, the flushes finish earlier and release memory earlier, if
compared to the case in which we just let all tables flush freely and
simultaneously."
2016-06-01 18:31:18 +03:00
Avi Kivity
0c7b2e2d5c Merge 2016-06-01 18:29:23 +03:00
Avi Kivity
d2e4548b35 Merge seastar upstream
* seastar 0bcdd28...864d6dc (4):
  > Logging framework
  > Add libubsan and libasan to fedora deps docs
  > tests: add rpc cancellable tests
  > rpc: add cancellable interface

Dropped logging implementation in favor of seastar's due to a link
conflict with operator<<.
2016-06-01 18:28:42 +03:00
Tomasz Grabiec
56736389c1 Merge branch 'sstable-errors/v2' from https://github.com/penberg/scylla.git
This series adds a constructor to malformed_sstable_exception that
includes a filename and converts some call-sites to use it.

There are still plenty of low-level sites that don't even know the
sstable filename they are operating on. We need to either change the
code to carry the filename to lower layers or find a higher-level
call-site where we can catch malformed_sstable_exception and rethrow it
with the sstable filename. But that's for another series by someone who
knows the sstable code well.

Refs #669.
2016-06-01 16:59:56 +02:00
Gleb Natapov
26b50eb8f4 storage_proxy: drop debug output
Message-Id: <20160601132641.GK2381@scylladb.com>
2016-06-01 17:13:12 +03:00
Pekka Enberg
94c35cc135 sstables/sstables: Add sstable filename to thrown malformed_sstable_exceptions 2016-06-01 17:11:05 +03:00
Pekka Enberg
3ca7fc2a8b database: Add sstable filename to thrown malformed_sstable_exceptions 2016-06-01 14:56:10 +03:00
Pekka Enberg
fa5354dda4 sstables: Add optional filename to malformed_sstable_exception
Add a constructor to malformed_sstable_exception that accepts an error
message and an sstable name.
2016-06-01 14:48:08 +03:00
Pekka Enberg
de0634c289 Merge "Extract modification_statement's (and related) parsed statement
into raw" from Avi

"Move parsed statements into raw namespace.  Mindless but therapeutic."
2016-06-01 14:19:53 +03:00
Avi Kivity
92d815a6cf Make github issue template less shouty 2016-06-01 10:45:04 +03:00
Pekka Enberg
0255318bf3 Revert "Revert "main: change order between storage service and drain execution during exit""
This reverts commit b3ed55be1d.

The issue is in the failing dtest, not this commit. Gleb writes:

  "The bug is in the test, not the patch. Test waits for repair session
   to end one way or the other when node is killed, but for nodetool to
   know if repair is completed it needs to poll for it.  If node dies
   before nodetool managed to see repair completion it will stuck
   forever since jmx is alive, but does not provide answers any more.
   The patch changes timing, repair is completed much close to exit now,
   so problem appears, but it may happen even without the patch.

   The fix is for dtest to kill jmx as part of killing a node
   operation."

Now that Lucas fixed the problem in scylla-ccm, revert the revert.
2016-06-01 08:48:50 +03:00
Glauber Costa
0f64eb7e7d serialize memtable flush for a memtable_list
We can only free memory for a region_group when the entire memtable is released.
This means that while the disk can handle requests from multiple memtables just fine,
we won't free any memory until all of them finish. If we are under a pressure situation
we will take a lot more time to leave it.

Ideally, with write-behind, we would allow just one memtable to be flushed at a
time. But since we don't have it enabled, it's better to serialize the flushes
so that only some memtables (4) are flushed at a time. Having the memtable writer
bandwidth all to itself, the memtable will finish sooner, release memory sooner,
and recover the system's health sooner.

We would like to do that without having streaming and memtables starve each
other. Ideally, that should mean half the bandwidth for each - but that
sacrifices memtable writes in the common case there is no streaming. Again,
write behind will help here, and since this is something we intend to do, there
is no need to complicate the code too much for an interim solution.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-05-31 17:18:35 -04:00
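The idea in commit 0f64eb7e7d above, letting only a few memtables (4) flush at a time, can be sketched with a counting semaphore. This is an illustration under assumptions: flush_all and flush_one are hypothetical names, and Python's asyncio stands in for the actual futures-based code.

```python
import asyncio

MAX_CONCURRENT_FLUSHES = 4  # the limit mentioned in the commit message


async def flush_all(memtables, flush_one, limit=MAX_CONCURRENT_FLUSHES):
    """Flush every memtable, with at most `limit` flushes in flight at once."""
    sem = asyncio.Semaphore(limit)

    async def guarded(memtable):
        # Later flushes wait here until an earlier one finishes.
        async with sem:
            return await flush_one(memtable)

    return await asyncio.gather(*(guarded(m) for m in memtables))
```

With fewer flushes competing for disk bandwidth, each one completes sooner, which is what allows its memory to be released earlier.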
Glauber Costa
46c79be401 database: allow callers to specify memtable list's flush behavior
This patch introduces an explicit behavior enum class, one of delayed or
immediate, that allows callers to tell the memtable list whether they want a
delayed flush (default), or force an immediate flush. So far this only affects
the streaming code (memtables just ignore it), but the concept is one that can
be easily generalized.

With that in place, we can revert back the stop function to use the standard
flush. I have argued before that adding infrastructure like that would not be
worth it for the sake of stop alone, but some other code could now use it.

Specifically, the active reclaimer for the throttler would like to force
immediate flushes, as delayed flushes really won't make a lot of difference in
reducing memory usage.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-05-31 17:17:48 -04:00
Avi Kivity
c8b5104aa5 cql3: extract raw batch_statement into raw sub-namespace
prepare() was moved to .cc to avoid circular dependencies.
2016-05-31 21:41:26 +03:00
Avi Kivity
1d144699f6 cql3: extract raw delete_statement into raw sub-namespace 2016-05-31 21:24:56 +03:00
Avi Kivity
e596799962 cql3: extract raw update_statement into raw sub-namespace
update_statement also has an insert_statement counterpart, convert it too.
2016-05-31 21:16:53 +03:00
Avi Kivity
10213c4211 cql3: extract raw modification_statement into raw sub-namespace 2016-05-31 20:53:37 +03:00
Asias He
f27e5d2a68 messaging_service: Delay listening ms during boot up
When a node starts up, a peer node can send a gossip syn message to it
before the gossip message handlers are registered in messaging_service.

We can see:

  scylla[123]:  [shard 0] rpc - client a.b.c.d: unknown verb exception 6 ignored

To fix, we delay the listening of messaging_service to the point when
gossip message handlers are registered.
Message-Id: <9b20d85e199ef0e44cdcde2920123a301a88f3d7.1464254400.git.asias@scylladb.com>
2016-05-31 12:28:11 +03:00
Avi Kivity
f3fc3afe00 cql3: optimize make_empty_metadata()
All empty metadata objects are equal, so make just one and keep returning
it.
Message-Id: <1464334638-7971-4-git-send-email-avi@scylladb.com>
2016-05-31 09:12:20 +03:00
Avi Kivity
0135b4d5cd cql3: constify metadata users
Metadata usually doesn't change after it is created; make that visible in
the code, allowing further optimizations to be applied later.
Message-Id: <1464334638-7971-3-git-send-email-avi@scylladb.com>
2016-05-31 09:12:11 +03:00
Avi Kivity
6728454591 cql3: rationalize extract_result_metadata()
Rather than dynamic_cast<>ing the statement to see whether it is a
select statement, add a virtual function to cql_statement to get the
result metadata.

This is faster and easier to follow.
Message-Id: <1464334638-7971-2-git-send-email-avi@scylladb.com>
2016-05-31 09:12:02 +03:00
Avi Kivity
25b3d74f45 cql3: Split select_statement::raw_statement into raw namespace
cql3::select_statement::raw_statement
    -> cql3::raw::select_statement
Message-Id: <1464609556-3756-4-git-send-email-avi@scylladb.com>
2016-05-31 09:09:30 +03:00
Avi Kivity
c8f98c5981 cql3: move cf_statement into raw hierarchy
cql3::statements::cf_statement
    -> cql3::statements::raw::cf_statement
Message-Id: <1464609556-3756-3-git-send-email-avi@scylladb.com>
2016-05-31 09:09:21 +03:00
Avi Kivity
caf8d4f0e6 cql3: separate parsed_statement and parsed_statment::prepared
cql3::statements::parsed_statement
    -> cql3::statements::raw::parsed_statement
  cql3::statements::parsed_statement::prepared
    -> cql3::statements::prepared_statement
Message-Id: <1464609556-3756-2-git-send-email-avi@scylladb.com>
2016-05-31 09:09:10 +03:00
Duarte Nunes
a15ed3c60f mutation_test: Specify tmp data dir
Otherwise we attempt to create sstable files under /.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1464618602-1124-1-git-send-email-duarte@scylladb.com>
2016-05-30 20:34:47 +02:00
Pekka Enberg
b3ed55be1d Revert "main: change order between storage service and drain execution during exit"
This reverts commit 0ebd8b18b7.

The change breaks repair_additional_test.py:RepairAdditionalTest.repair_kill_1_test
2016-05-30 12:48:09 +03:00
Avi Kivity
e515933c70 dist: tune scheduler for lower latency
Scylla-jmx and collectd can preempt scylla and induce long latencies.  Tune
the scheduler to provide lower latencies.

Since when the support processes are not running we normally do not context
switch (one thread per core, remember?), there should be no effect on
throughput.

The tunings are provided in a separate package, which can be uninstalled
if the server is shared with other applications which are negatively
affected by the tuning.

Fixes #1218.
Message-Id: <1464529625-12825-1-git-send-email-avi@scylladb.com>
2016-05-30 08:42:19 +03:00
Avi Kivity
e8e00338d1 config: document defragment_memory_on_idle
Message-Id: <1464261650-14136-2-git-send-email-avi@scylladb.com>
2016-05-30 08:39:26 +03:00
Avi Kivity
b50cb3eca8 config: rename compact_on_idle
compact_on_idle will lead users to thinking we're talking about sstable
compaction, not log-structured-allocator compaction.

Rename the variable to reduce the probability of confusion.
Message-Id: <1464261650-14136-1-git-send-email-avi@scylladb.com>
2016-05-30 08:39:13 +03:00
Yoav Kleinberger
e580ac5dae docker: fix Ubuntu Dockerfile
one needs to update the repository info before one can install packages.
Fixes issue #1296.

Signed-off-by: Yoav Kleinberger <yoav@scylladb.com>
Message-Id: <a906e76d584baff5988cb31a4003de27455e0741.1464529740.git.yoav@scylladb.com>
2016-05-29 17:00:25 +03:00
Avi Kivity
3f6ecb9f28 Merge "cancel cross DC read repair if non matching data was recently modified" from Gleb 2016-05-29 15:58:55 +03:00
Gleb Natapov
2efbccc901 storage_proxy: do only local read repair if non matching data was recently modified
When a read and a write to a partition happen in parallel, the reader may
detect a digest mismatch that may potentially cause a cross DC read repair
attempt, but the repair is not really needed, so added latency is not justified.

This patch tries to prevent such parallel access from causing a heavy
cross DC repair operation by checking the timestamp of the most recent
modification. If the modification happened less than "write timeout"
seconds ago, the patch assumes that the read operation raced with a write
and cancels the cross DC repair, but only if CL is LOCAL_*.
2016-05-29 15:26:51 +03:00
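The decision described in commit 2efbccc901 above can be written as a small predicate. This is a sketch only: the function name, the microsecond units and the string form of the consistency level are assumptions made for illustration, not the storage_proxy code.

```python
def should_skip_cross_dc_repair(last_modified_us, now_us, write_timeout_us,
                                consistency_level):
    """Return True when a digest mismatch is assumed to be a read/write race."""
    # Only LOCAL_* consistency levels qualify, per the commit message.
    is_local_cl = consistency_level.startswith("LOCAL_")
    # "Recently" means less than the write timeout has passed since the
    # latest modification seen by the read.
    recently_modified = (now_us - last_modified_us) < write_timeout_us
    return is_local_cl and recently_modified
```

For example, with a 2 second write timeout, a LOCAL_QUORUM read that sees data modified half a second ago would repair locally only, while a QUORUM read would still repair across DCs.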
Amnon Heiman
d4123ba613 API: column_family count sstable space used correctly
The space calculation counters in column family had two problems:
1. The total bytes is an ever-growing counter, which is meaningless for
the API.

2. Simply summing the sizes on all shards ignores the fact that the
same sstable file can be referenced by multiple shards; this is
especially noticeable during migration.

To solve this, the implementation was modified so that instead of
collecting the sizes, the API collects a map of file name to size
and then does the summing.

This removes the duplicates and fixes the total bytes calculation.

Calling cfstats before the change, with load, after a compaction happened:

$ nodetool cfstats keyspace1
Keyspace: keyspace1
Verify write latency 1068253.0 76435
	Read Count: 75915
	Read Latency: 0.5953986037015082 ms.
	Write Count: 76435
	Write Latency: 0.013975966507490025 ms.
	Pending Flushes: 0
		Table: standard1
		SSTable count: 5
		Space used (live): 44261215
		Space used (total): 219724478

After the fix:

$ nodetool cfstats keyspace1
Keyspace: keyspace1
Verify write latency 1863206.0 124219
	Read Count: 125401
	Read Latency: 0.9381053978835895 ms.
	Write Count: 124219
	Write Latency: 0.01499936402643718 ms.
	Pending Flushes: 0
		Table: standard1
		SSTable count: 6
		Space used (live): 50402904
		Space used (total): 50402904
		Space used by snapshots (total): 0

Fixes: #1042

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1464518757-14666-2-git-send-email-amnon@scylladb.com>
2016-05-29 14:11:03 +03:00
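The fix in commit d4123ba613 above, collecting a map of file name to size per shard and summing only after merging, can be sketched as follows. The function name and the example file names are hypothetical.

```python
def total_sstable_bytes(per_shard_maps):
    """Sum sstable sizes across shards, counting each file name once."""
    merged = {}
    for shard_map in per_shard_maps:
        # A file referenced by several shards collapses to a single entry.
        merged.update(shard_map)
    return sum(merged.values())
```

A naive sum over the per-shard totals would count a shared file once per shard that references it, which is the over-counting the commit describes.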
Gleb Natapov
32c9a06faf messaging_service: abort retrying send during exit
Fixes #862

Message-Id: <1463579574-15789-3-git-send-email-gleb@scylladb.com>
2016-05-29 11:39:36 +03:00
Gleb Natapov
0ebd8b18b7 main: change order between storage service and drain execution during exit
Even the comment says drain_on_shutdown should be called first, but for
that it has to be registered last.

Fixes #862

Message-Id: <1463579574-15789-2-git-send-email-gleb@scylladb.com>
2016-05-29 11:39:24 +03:00
Glauber Costa
30d54cef38 database: add a comment explaining the choice of function in CF stop
We have recently committed a fix for a streaming bug that involved
reverting column_family::stop() back to calling the custom seal functions
explicitly for both memtables and streaming memtables.

Here we add a comment to explain why that had to be done.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <fe94b5883e9c29adc7fc9ee9f498894c057e7b64.1464293167.git.glauber@scylladb.com>
2016-05-29 11:28:15 +03:00
Avi Kivity
8e124b31aa Merge "gossip: Refactor waiting for supported features" from Duarte
"This patch changes the way we wait for supported features. We no longer
sleep periodically, waking up to check if the wanted features are now
available. Instead, we register waiters in a condition variable that is
signaled whenever new endpoint information is received.

We also add a new poll interface based on the feature class, which
encapsulates the availability of a cluster feature."
2016-05-27 20:24:25 +03:00
Duarte Nunes
f613dabf53 gossip: Introduce the gms::feature class
This class encapsulates the waiting for a cluster feature. A feature
object is registered with the gossiper, which is responsible for later
marking it as enabled.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-05-27 17:20:51 +00:00
Duarte Nunes
4684b8ecbb gossip: Refactor waiting for features
This patch changes the sleep-based mechanism of detecting new features
by instead registering waiters with a condition variable that is
signaled whenever a new endpoint information is received.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-05-27 17:20:51 +00:00
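The change in commit 4684b8ecbb above, replacing periodic sleeping with waiters on a condition variable that is signaled when endpoint information arrives, can be sketched as follows. The class and method names are hypothetical and asyncio.Condition stands in for the condition variable actually used; the gossiper's real rules for when a feature counts as supported are not modeled here.

```python
import asyncio


class FeatureWaiterSketch:
    """Tracks known feature names and wakes waiters when new ones arrive."""

    def __init__(self):
        self._supported = set()
        self._cond = asyncio.Condition()

    async def on_endpoint_info(self, features):
        # Called whenever new endpoint information is received.
        async with self._cond:
            self._supported |= set(features)
            self._cond.notify_all()

    async def wait_for(self, feature):
        # No polling loop: the waiter sleeps until it is signaled and the
        # wanted feature is present.
        async with self._cond:
            await self._cond.wait_for(lambda: feature in self._supported)
```

Compared with waking up on a timer, a waiter here resumes as soon as the relevant information is received and does no work in between.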
Duarte Nunes
422f244172 gossip: Don't timeout when waiting for features
This patch removes the timeout when waiting for features,
since future patches will make this argument unnecessary.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-05-27 17:20:51 +00:00
Avi Kivity
fab4cc8d6d Merge seastar upstream
* seastar 8bfbb1a...0bcdd28 (1):
  > Merge "introduce sleep_abortable() that throws exception on application exit" from Gleb
2016-05-27 20:14:49 +03:00
Duarte Nunes
b3011c9039 gossip: Rename set_heart_beat_state
...to set_heart_beat_state_and_update_timestamp in order to make it
explicit to callers that the update timestamp is also changed.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1464309023-3254-3-git-send-email-duarte@scylladb.com>
2016-05-27 09:11:39 +03:00
Duarte Nunes
8c0e2e05b7 gossip: Fix modification to shadow endpoint state
This patch fixes an inadvertent change to the shadow endpoint state
map in gossiper::run, done by calling get_heart_beat_state() which
also updates the endpoint state's timestamp. This did not happen for
the normal map, but did happen for the shadow map. As a result, every
time gossiper::run() was scheduled, endpoint_map_changed would always
be true and all the shards would make superfluous copies of the
endpoint state maps.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1464309023-3254-2-git-send-email-duarte@scylladb.com>
2016-05-27 09:10:38 +03:00
Pekka Enberg
b7e79b72d5 Merge "Introduce SET_NIC for non-AMI environment" from Takuya
"This patchset provides a way to enable SET_NIC(posix_net_conf.sh) on
 non-AMI environment.
 Also support -mq option of the script.
 This also contains number of bug fixes of scripts.

 Fixes #1192"
2016-05-26 13:37:06 +03:00
Yoav Kleinberger
26c0d86401 tools/scyllatop: improved user interface: scrollable views
NOTE: scyllatop now requires the urwid library

Previously, if there were more metrics than lines in the terminal
window, the user could not see some of the metrics.  Now the user can
scroll.

As an added bonus, the program will not crash when the window size
changes.

Signed-off-by: Yoav Kleinberger <yoav@scylladb.com>
Message-Id: <1464098832-5755-1-git-send-email-yoav@scylladb.com>
2016-05-26 13:36:28 +03:00
Piotr Jastrzebski
136b8148d2 Use idle CPU to compact LSA memory
Register an idle CPU handler that compacts a single segment
every time there's nothing better to execute on CPU.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <c26aa608a1e0752fb9e6db1833ef3ba1de95f161.1464169748.git.piotr@scylladb.com>
2016-05-26 12:43:53 +03:00
Avi Kivity
d7f36a093f Merge seastar upstream
* seastar e5faea8...8bfbb1a (1):
  > reactor: advertise the logging_failures metric as a DERIVE counter

Fixes #1292.
2016-05-26 11:46:08 +03:00
Tomasz Grabiec
f0c2b1d161 config: Fix typos
Message-Id: <1464201938-4778-1-git-send-email-tgrabiec@scylladb.com>
2016-05-26 08:19:57 +03:00
Asias He
f1b3cb4a08 storage_service: Catch and fail an invalid configuration with --replace-address
Vlad reported a strange user configuration:

   SCYLLA_ARGS="--log-to-syslog 1 --log-to-stdout 0 --default-log-level
   info --collectd-address=127.0.0.1:25826 --collectd=1
   --collectd-poll-period 60000 --network-stack posix --num-io-queues 32
   --max-io-requests 128 --replace-address 10.0.4.131"

   seed_provider:
       - class_name: org.apache.cassandra.locator.SimpleSeedProvider
         parameters:
             - seeds: "10.0.4.131"

   Meanwhile, 10.0.4.131 is the IP address of the node itself.

When the node was started, the following messages were reported.

   Apr 13 06:31:12 n0 scylla[19681]: [shard 0] gossip - Connect seeds again
   ... (20 seconds passed)
   Apr 13 06:31:13 n0 scylla[19681]: [shard 0] gossip - Connect seeds again
   ... (21 seconds passed)
   Apr 13 06:31:14 n0 scylla[19681]: [shard 0] gossip - Connect seeds again
   ... (22 seconds passed)
   Apr 13 06:31:15 n0 scylla[19681]: [shard 0] gossip - Connect seeds again
   ... (23 seconds passed)

The configuration is invalid, because for --replace-address to
work, at least one working seed node should be alive. Catch the
configuration error and fail it with an appropriate error message.

Fixes #1183
Message-Id: <a94a082d896313e7a668915ae21fe2c03719da3a.1464164058.git.asias@scylladb.com>
2016-05-25 14:42:19 +03:00
Asias He
fed1e65e1e gossip: Do not insert the same node into _live_endpoints_just_added
_live_endpoints_just_added tracks the peer nodes which have just become live.
When a down node comes back, the peer nodes can receive multiple messages
which would mark the node up, e.g., messages piled up in the sender's
tcp stack after a node was blocked with gdb and released. Each such
message will trigger an echo message, and when the reply to the echo
message is received (real_mark_alive), the same node will be added to
_live_endpoints_just_added more than once. Thus, we see the
same node be favored more than once:

INFO  2016-04-12 12:09:57,399 [shard 0] gossip -
do_gossip_to_live_member: Favor newly added node 127.0.0.2
INFO  2016-04-12 12:09:58,412 [shard 0] gossip -
do_gossip_to_live_member: Favor newly added node 127.0.0.2
INFO  2016-04-12 12:09:59,429 [shard 0] gossip -
do_gossip_to_live_member: Favor newly added node 127.0.0.2
INFO  2016-04-12 12:10:00,429 [shard 0] gossip -
do_gossip_to_live_member: Favor newly added node 127.0.0.2
INFO  2016-04-12 12:10:01,430 [shard 0] gossip -
do_gossip_to_live_member: Favor newly added node 127.0.0.2
INFO  2016-04-12 12:10:02,442 [shard 0] gossip -
do_gossip_to_live_member: Favor newly added node 127.0.0.2
INFO  2016-04-12 12:10:03,454 [shard 0] gossip -
do_gossip_to_live_member: Favor newly added node 127.0.0.2

To fix, do not insert the node if it is already in
_live_endpoints_just_added.

Fixes #1178
Message-Id: <6bcfad4430fbc63b4a8c40ec86a2744bdfafb40f.1464161975.git.asias@scylladb.com>
2016-05-25 14:19:40 +03:00
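The fix in commit fed1e65e1e above amounts to a membership check before insertion. A minimal sketch, assuming a plain list for _live_endpoints_just_added and a hypothetical helper name:

```python
def mark_just_added(live_endpoints_just_added, endpoint):
    """Append endpoint unless it is already queued; return True if appended."""
    if endpoint in live_endpoints_just_added:
        # A duplicate "node is up" notification: nothing to do.
        return False
    live_endpoints_just_added.append(endpoint)
    return True
```

With this check, repeated echo replies for the same node leave a single entry, so the node is favored once rather than on every gossip round.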
Glauber Costa
46f60f52d9 database: do not use implicitly stated seal function when closing the CF
In commit 4981362f57, I have introduced a regression that was thankfully
caught by our dtest infrastructure.

That patch is a preparation patch for the active reclaim patchset that is to
come, and it consolidated all the flushes using the memtable_list's seal_fn
function instead of calling the seal function explicitly.

The problem here is that the streaming memtables have the delayed mechanism,
about which the memtable_list is unaware. Calling memtable_list's
seal_active_memtable() for the streaming memtables calls the delayed version,
that does not guarantee flush. If we're lucky, we will indeed flush after the
timer expires, but if we're not we'll just stop the CF with data not flushed.

There are two options to fix this: the first is to teach the memtable_list about
the delayed/forced mechanism, and the second is to just call the correct
function explicitly during shutdown, and then when the time comes to add
continuations to the result of the seal, add them here as well.

Although the second option involves a bit more work and duplication, I think it
is better in the sense that the delayed / forced mechanism really is something
that belongs to streaming only. Since this is the only user, I don't think it
justifies complicating the memtable_list with this concept.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <b26017c825ccf585f39f58c4ab3787d78e551f5f.1464126884.git.glauber@scylladb.com>
2016-05-25 08:21:24 +03:00
Avi Kivity
2d4d6c9c92 Merge seastar upstream
* seastar aed893e...e5faea8 (5):
  > Catch exceptions thrown by idle cpu handler
  > core::gate: add a get_count() method
  > reactor: Introduce idle CPU handler
  > core: add missing header for g++-4.9
  > Add lksctp-tools-devel do required packages
2016-05-24 20:42:41 +03:00
Pekka Enberg
ceb29f9d32 Merge "Introduce upload dir for sstable migration" from Raphael
"This change is intended to make migration process safer and easier.
 All column families will now have a directory called upload.
 With this feature, users may choose to copy migrated sstables to upload
 directory of respective column families, and run 'nodetool refresh'.
 That's supposed to be the preferred option from now on."
2016-05-24 16:36:47 +03:00
Gleb Natapov
7f6b12c97a query: add user provided timestamp to read_command
If the read query supplies a timestamp, move it to read_command to be
used later; otherwise use the local timestamp.
2016-05-24 15:19:35 +03:00
Pekka Enberg
d7d8c76fe5 transport/server: Add CQL frame LZ4 compression support
The default CQL frame compression algorithm in Cassandra is LZ4. Add
support for decompressing incoming frames and compressing outgoing
frames with LZ4 if the CQL driver asks for that.

Fixes #416

Message-Id: <1464086807-11325-1-git-send-email-penberg@scylladb.com>
2016-05-24 15:03:33 +03:00
Takuya ASADA
53cebb4a5e dist/ubuntu: don't rebuild dependency packages by default
Same as CentOS, do not build dependencies by default, install binary packages from our repository.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1464023451-21436-1-git-send-email-syuu@scylladb.com>
2016-05-24 14:10:59 +03:00
Gleb Natapov
12cf60c302 messaging_service: add timestamp of last modification to READ_DIGEST verb return value 2016-05-24 13:27:34 +03:00
Gleb Natapov
1e6f64f4ab query: add latest modification timestamp to result structure 2016-05-24 13:27:34 +03:00
Gleb Natapov
5fef0717cc query: find latest modification timestamp while calculating result digest 2016-05-24 13:27:34 +03:00
Avi Kivity
9637c2232c Merge "Move the JMX timer polling logic to Scylla" from Amnon 2016-05-24 13:07:52 +03:00
Raphael S. Carvalho
c2fa3b796d db: fix read consistency after refresh
If sstable loaded by refresh covers a row that is cached by the
column family, read query may fail to return consistent data.
What we should do is to clear cache for the column family being
loaded with new sstables.

Fixes #1212.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <a08c9885a5ceb0b2991e40337acf5b7679580a66.1464072720.git.raphaelsc@scylladb.com>
2016-05-24 12:11:41 +03:00
Takuya ASADA
5d5d525a14 dist/ubuntu: fix incorrect dependency package name
PyYAML is the CentOS/RHEL/Fedora package name; python-yaml is the correct one here.

Fixes #1279

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1463987823-22837-1-git-send-email-syuu@scylladb.com>
2016-05-23 10:21:29 +03:00
Pekka Enberg
8a7197e390 dist/docker: Fetch RPM repository from Scylla web site
Fix the hard-coded Scylla RPM repository by downloading it from Scylla
web site. This makes it easier to switch between different versions.

Message-Id: <1463981271-25231-1-git-send-email-penberg@scylladb.com>
2016-05-23 09:45:41 +03:00
Piotr Jastrzebski
2be4ec4e06 Add lksctp-tools-devel to required packages
in fedora build instructions.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <15f3db34f12f01cb9da32fd14c16ba87e64ad5f4.1463947999.git.piotr@scylladb.com>
2016-05-23 08:26:02 +03:00
Avi Kivity
5e5317b228 dist: add build dependencies for sctp
Required by new seastar
2016-05-22 19:10:25 +03:00
Avi Kivity
5bb1255da1 Merge seastar upstream
* seastar 6a849ac...aed893e (3):
  > net: move 'transport' enum to seastar namespace
  > net: sctp protocol support for posix stack
  > future: Support get() when state is at a promise
2016-05-22 16:32:33 +03:00
Amnon Heiman
e26002d581 idl-compiler: default constructor of complex types
This patch solves a problem where a complex type is defined as version
dependent (with the version attribute) but doesn't have a default value.

In those cases the default constructor is used, but in the case of
complex types (template) param_type should be used to get the C++ type.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1463916723-15322-1-git-send-email-amnon@scylladb.com>
2016-05-22 15:32:29 +03:00
Raphael S. Carvalho
e5f0314afd db: introduce upload directory for sstable migration
This change is intended to make migration process safer and easier.
All column families will now have a directory called upload.
With this feature, users may choose to copy migrated sstables to upload
directory of respective column families, and call 'nodetool refresh'.
That's supposed to be the preferred option from now on.

For each sstable in upload directory, refresh will do the following:
1) Mutate sstable level to 0.
2) Create hard links to its components in column family dir, using
a new generation. We make it safe by creating a hard link to temporary
TOC first.
3) Remove all of its components in upload directory.

This new code runs after refresh checked for new sstables in the column
family directory. Otherwise, we could have a generation conflict.
Unlike the first step, this new step runs with sstable write enabled.
It's easier here because we know exactly which sstables are new.

After that, refresh will load new sstables found in column family
and upload directories.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-05-20 17:26:21 -03:00
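The link-then-remove sequence described in e5f0314afd above can be sketched roughly as follows. This is a minimal illustration only, not Scylla's code: the `<generation>-<component>` file naming, the `.tmp` suffix, and the function names `migrate_from_upload` and `demo` are all assumptions made for the example, and step 1 (setting the sstable level to 0) is left out.

```python
import os
import tempfile


def migrate_from_upload(upload_dir, cf_dir, old_gen, new_gen):
    """Hard-link one sstable's components into cf_dir under new_gen, then
    remove the originals from upload_dir.

    Hypothetical naming: each component is '<generation>-<component>'.
    """
    prefix = f"{old_gen}-"
    components = sorted(n for n in os.listdir(upload_dir) if n.startswith(prefix))
    toc = f"{old_gen}-TOC.txt"
    if toc not in components:
        raise FileNotFoundError(toc)

    # Link the TOC under a temporary name first, so a crash halfway through
    # never leaves something that looks like a complete sstable.
    tmp_toc = os.path.join(cf_dir, f"{new_gen}-TOC.txt.tmp")
    os.link(os.path.join(upload_dir, toc), tmp_toc)

    new_names = [f"{new_gen}-TOC.txt"]
    for name in components:
        if name == toc:
            continue
        new_name = f"{new_gen}-{name[len(prefix):]}"
        os.link(os.path.join(upload_dir, name), os.path.join(cf_dir, new_name))
        new_names.append(new_name)

    # All components are in place: publish the TOC under its final name.
    os.rename(tmp_toc, os.path.join(cf_dir, f"{new_gen}-TOC.txt"))

    # Only now remove the components from the upload directory.
    for name in components:
        os.remove(os.path.join(upload_dir, name))
    return sorted(new_names)


def demo():
    with tempfile.TemporaryDirectory() as cf_dir:
        upload = os.path.join(cf_dir, "upload")
        os.mkdir(upload)
        for comp in ("Data.db", "TOC.txt"):
            with open(os.path.join(upload, f"7-{comp}"), "w") as f:
                f.write(comp)
        linked = migrate_from_upload(upload, cf_dir, old_gen=7, new_gen=42)
        return linked, os.listdir(upload)
```

Hard links require both directories to be on the same filesystem, which holds when the upload directory lives inside the column family directory as the commit describes.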
Raphael S. Carvalho
70b793e4d3 tests: add test for statistics rewrite
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-05-20 17:26:12 -03:00
Raphael S. Carvalho
74c8a87777 sstables: fix statistics rewrite
It's not working because it tries to overwrite existing statistics
file with exclusive flag.
It's fixed by writing new statistics into temporary file and
renaming it into place.

If Scylla failed in middle of rewrite, a temporary file is left
over. So boot code was adjusted to delete a temporary file created
by this rewrite procedure.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-05-20 17:24:15 -03:00
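The write-to-temporary-then-rename approach from 74c8a87777 above is a common pattern; a minimal sketch of it follows. The file name `statistics.bin`, the `.tmp` suffix, and the helper names are hypothetical and chosen only for illustration, not taken from Scylla.

```python
import os
import tempfile


def rewrite_atomically(path, data):
    """Replace the contents of path without ever opening it exclusively."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes are on disk before renaming
    os.replace(tmp, path)     # atomic rename over the existing file


def remove_leftover_tmp(directory):
    """Boot-time cleanup: delete temporary files left by an interrupted rewrite."""
    removed = []
    for name in os.listdir(directory):
        if name.endswith(".tmp"):
            os.remove(os.path.join(directory, name))
            removed.append(name)
    return sorted(removed)


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "statistics.bin")
        rewrite_atomically(path, b"old")

        # Exclusive creation fails on an existing file, which is the
        # failure mode the commit message describes.
        try:
            open(path, "xb").close()
            exclusive_failed = False
        except FileExistsError:
            exclusive_failed = True

        rewrite_atomically(path, b"new")

        # Simulate a crash in the middle of a later rewrite.
        with open(path + ".tmp", "wb") as f:
            f.write(b"partial")
        removed = remove_leftover_tmp(d)

        with open(path, "rb") as f:
            content = f.read()
        return exclusive_failed, content, removed, sorted(os.listdir(d))
```

Readers see either the old file or the new one, never a partially written one, and the only residue of a crash is a temporary file that the cleanup pass removes.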
Pekka Enberg
94e7e61cd0 api: Register snitch API earlier
Currently, we register snitch API in set_server_gossip_settle() which
waits until a node has joined the cluster. This makes 'nodetool status'
not properly show the status of a joining node. Fix the issue by
registering snitch API earlier.

Fixes #1269.
Message-Id: <1463576381-15484-1-git-send-email-penberg@scylladb.com>
2016-05-20 14:24:14 +03:00
Gleb Natapov
7a54b5ebbb gossiper: cleanup mark_alive() even more
Message-Id: <20160519100513.GE984@scylladb.com>
2016-05-19 12:47:19 +02:00
Takuya ASADA
03a762bb0b dist/common/scripts: Ask to set SET_NIC=yes on scylla_setup interactive prompt
We supported SET_NIC on non-AMI environment, so ask user to use it on scylla_setup interactive prompt.
2016-05-19 06:26:23 +09:00
Takuya ASADA
88fde0a91e dist/ami: fix dependency unresolved error on AMI build script with local package, by adding scylla-conf package
Since we added scylla-conf package, we cannot install scylla-server/-tools without the package, because of this --localrpm is failing.
So copy scylla-conf package to AMI, and install it to fix the problem.
2016-05-19 06:26:23 +09:00
Takuya ASADA
898243929f dist/common/scripts: specify queue settings for posix_net_conf.sh on scylla_prepare
posix_net_conf.sh wants -sq/-mq options, so detect number of queues and specify the option in scylla_prepare.
2016-05-19 06:26:23 +09:00
Takuya ASADA
f84b7b094f dist/common/scripts: drop special condition to enable SET_NIC on AMI, do this on AMI installation script
Remove special case of SET_NIC in AMI, do this in scylla-ami-setup.service.
2016-05-19 06:25:41 +09:00
Takuya ASADA
49cdd0b786 dist: move '--cpuset' and '--smp' configuration to scylla_cpuset_setup / cpuset.conf
These parameters are only required for AMI, not for non-AMI environments that want to enable SET_NIC, so split them into an individual script / conf file and call it from the AMI install script.
2016-05-19 06:25:28 +09:00
Takuya ASADA
46fa80a5a6 dist/common/scripts: replace IFNAME variable when --nic specified to scylla_sysconfig_setup
scylla_sysconfig_setup had a bug where it did not replace the IFNAME variable; this fixes it.
2016-05-19 06:25:15 +09:00
Glauber Costa
4eff07d773 database: reorder initialization
In a preparation move for the LSA throttler, we have reordered the
initialization fields in database.hh so that the sizes of the regions are
computed before the initialization of the region.

However, that seemingly innocent move broke one of our tests. The reason behind
that, is that if we don't destroy the column families before destroying the
region, we may end up with a use after free in the memtable destructor - that
itself expects to call into the region.

This patch reorders the initialization so that the CF list still comes after the
dirty regions (therefore being destroyed first), while maintaining the relative
ordering between size / region that we needed in the first place.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <0669984b5bccdb2c950f2444bdee4427abad56ba.1463508884.git.glauber@scylladb.com>
2016-05-18 11:02:40 +03:00
Asias He
eb9ac9ab91 gms: Optimize gossiper::is_alive
In perf-flame, I saw in

service::storage_proxy::create_write_response_handler (2.66% cpu)

  gossiper::is_alive takes 0.72% cpu
  locator::token_metadata::pending_endpoints_for takes 1.2% cpu

After this patch:

service::storage_proxy::create_write_response_handler (2.17% cpu)

  gossiper::is_alive does not show up at all
  locator::token_metadata::pending_endpoints_for takes 1.3% cpu

There is no need to copy the endpoint_state from the endpoint_state_map
to check if a node is alive. Optimize it since gossiper::is_alive is
called in the fast path.

Message-Id: <2144310aef8d170cab34a2c96cb67cabca761ca8.1463540290.git.asias@scylladb.com>
2016-05-18 10:12:38 +03:00
Avi Kivity
6ec0000df8 Merge "fix migration of tables with level > 0" from Raphael 2016-05-17 19:14:01 +03:00
Raphael S. Carvalho
cbc2e96a58 tests: check that overlapping sstable has its level changed to 0
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-05-17 11:11:05 -03:00
Raphael S. Carvalho
ee0f66eef6 db: fix migration of sstables with level greater than 0
Refresh will rewrite statistics of any migrated sstable with level
> 0. However, this operation is currently not working because O_EXCL
flag is used, meaning that create will fail.

It turns out that we don't actually need to change on-disk level of
a sstable by overwriting statistics file.
We can only set in-memory level of a sstable to 0. If Scylla reboots
before all migrated sstables are compacted, leveled strategy is smart
enough to detect sstables that overlap, and set their in-memory level
to 0.

Fixes #1124.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-05-17 11:08:08 -03:00
Gleb Natapov
76e0eb426e gossiper: simplify mark_alive()
The code runs in a thread so there is no need to use heap to
communicate between statements.

Message-Id: <20160517120245.GK984@scylladb.com>
2016-05-17 15:37:21 +03:00
Avi Kivity
4413176051 Merge "reduce performance degradation when adding node" from Asias
"With this series, the operations per second drop during adding node period gets
much better.

Before:
45K to 10K

After:
45k to 38K

Refs: #1223
"
2016-05-17 14:31:31 +03:00
Asias He
089734474b token_metadata: Speed up pending_endpoints_for
pending_endpoints_for is called frequently by
storage_proxy::create_write_response_handler when doing cql query.

Before this patch, each call to pending_endpoints_for involves
converting a multimap (std::unordered_multimap<range<token>,
inet_address>) to a map (std::unordered_map<range<token>,
std::unordered_set<inet_address>>).

To speed up the token to pending endpoint mapping search, an interval map
is introduced. It is faster than searching the map linearly and can
avoid caching the token/pending endpoint mapping.

With this patch, the operations per second drop during adding node
period gets much better.

Before:
45K to 10K

After:
45k to 38K

(The number is measured with the streaming code skipping to send data to
rule out the streaming factor.)

Refs: #1223
2016-05-17 17:32:15 +08:00
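To illustrate why an interval map helps in 089734474b above, here is a small stand-in written with a sorted boundary list and binary search. It is not the boost interval_map the patch uses, and it simplifies heavily: tokens are plain integers, ranges are half-open `[start, end)` and never wrap around, and the class name `IntervalMap` is an assumption for the example.

```python
import bisect


class IntervalMap:
    """Maps a token to the set of endpoints whose range contains it."""

    def __init__(self, ranges):
        # ranges: iterable of (start, end, endpoint), half-open [start, end)
        ranges = list(ranges)
        # Every range boundary splits the token line into elementary segments.
        self._points = sorted({p for s, e, _ in ranges for p in (s, e)})
        # Precompute, once, which endpoints cover each elementary segment.
        self._sets = [
            frozenset(ep for s, e, ep in ranges if s <= lo and hi <= e)
            for lo, hi in zip(self._points, self._points[1:])
        ]

    def lookup(self, token):
        # Binary search for the segment containing the token: O(log n)
        # per query instead of scanning every range.
        i = bisect.bisect_right(self._points, token) - 1
        if i < 0 or i >= len(self._sets):
            return frozenset()
        return self._sets[i]


pending = IntervalMap([(0, 10, "10.0.0.1"), (5, 20, "10.0.0.2")])
print(sorted(pending.lookup(7)))
```

The cost of merging overlapping ranges is paid once when the pending ranges change, rather than on every write, which matches the motivation given in the commit message.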
Asias He
ee0585cee9 dht: Add default constructor for token
It is needed to put a token into a boost interval_map in the following
patch.
2016-05-17 17:32:15 +08:00
Amnon Heiman
ad34f80e6f API: change cache_service, column_family and storage_proxy to rate
object

The API now exposes the rate_moving_average and
rate_moving_average_and_histogram.

The old endpoints remain for the transition period, but are marked as
deprecated.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-05-17 11:56:52 +03:00
Amnon Heiman
b33ed48527 API Definition: change cache_service, column_family and storage_proxy to use rate objects
This patch replaces the latency histogram with
rate_moving_average_and_histogram and the counters with
rate_moving_average.

The old endpoints were left unchanged but marked as deprecated when
needed.
2016-05-17 11:55:06 +03:00
Amnon Heiman
20a48b0f20 API: column family stats break the map_reduce functionality
This patch replaces the helper function for column family with two
functions, one that collects the relevant column family from all shards
and another one that does the translation to a json object.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-05-17 11:53:15 +03:00
Amnon Heiman
750f30cf07 column_family: Change histogram to
timed_rate_moving_average_and_histogram

As part of moving the derived statistics into scylla, this replaces the
histogram object in the column_family with
timed_rate_moving_average_and_histogram.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-05-17 11:53:15 +03:00
Amnon Heiman
468bcfbf1f row_cache: Change counter to timed_rate_moving_average_and_histogram
As part of moving the derived statistics into scylla, this replaces the
counter in the row_cache stats with
timed_rate_moving_average_and_histogram.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-05-17 11:53:15 +03:00
Amnon Heiman
64e0c8cd1b storage_proxy: Change histogram to
timed_rate_moving_average_and_histogram

As part of moving the derived statistics into scylla, this replaces the
histogram object in the storage_proxy with
timed_rate_moving_average_and_histogram, and the read, write and range
counters were replaced by rate_moving_average.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-05-17 11:52:16 +03:00
Amnon Heiman
f6a5a4e3da API: Add helper function for the rate objects
This patch adds the helper functions that are used to sum the
rate_moving_average and rate_moving_average_and_histogram.

The current sum functionality for histogram was modified to support
rate and histogram but return a histogram. This way current endpoints
would continue to behave the same.

It also cleans the histogram related method by using the plus operator
in the histogram.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-05-17 11:49:34 +03:00
Amnon Heiman
8ef25ceb05 Add weighted average rate related objects
This patch adds a few data structures for derived and accumulative
statistics that are similar to the yammer implementation used by the
JMX.

It also adds a plus operator to histogram, which cleans up the histogram
usage.

moving_average - An exponentially-weighted moving average. Calculates an event rate
over a given interval.

rate_moving_average and timed_rate_moving_average - Calculate 1m, 5m and
15m EWMAs, an all-time average and a counter.

rate_moving_average_and_histogram and
timed_rate_moving_average_and_histogram - Combine a histogram with a
rate_moving_average. They also expose a histogram API, so it will be an
easy task to replace a histogram with a
timed_rate_moving_average_and_histogram.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-05-17 11:47:49 +03:00
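For readers unfamiliar with the yammer-style meter that 8ef25ceb05 above refers to, the following is a minimal sketch of an exponentially-weighted moving average of an event rate. It follows the general EWMA scheme (a decay factor derived from the tick interval and the averaging window); the class name, method names, and the 5-second tick are assumptions for the example and are not taken from the Scylla sources.

```python
import math


class MovingAverage:
    """Exponentially-weighted moving average of events per second."""

    def __init__(self, window_seconds, tick_seconds=5.0):
        # The weight given to the newest sample; older samples decay
        # exponentially with a time constant of window_seconds.
        self._alpha = 1.0 - math.exp(-tick_seconds / window_seconds)
        self._tick = tick_seconds
        self._count = 0
        self._rate = 0.0
        self._initialized = False

    def add(self, n=1):
        """Record n events since the last tick."""
        self._count += n

    def update(self):
        """Call once per tick: fold the events seen so far into the rate."""
        instant = self._count / self._tick
        self._count = 0
        if self._initialized:
            self._rate += self._alpha * (instant - self._rate)
        else:
            # The first tick seeds the average with the observed rate.
            self._rate = instant
            self._initialized = True

    def rate(self):
        return self._rate


one_minute = MovingAverage(window_seconds=60.0)
one_minute.add(50)
one_minute.update()
print(one_minute.rate())
```

A 1m/5m/15m set, as described in the commit message, would simply be three such objects with different windows fed by the same events, alongside a plain counter for the all-time average.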
Glauber Costa
17b9203719 database: invert order of elements
So that the sizes of the region can be initialized first

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <dc3df186a977b492d83c0a397f206c2db940aa37.1463448522.git.glauber@scylladb.com>
2016-05-17 11:28:39 +03:00
Glauber Costa
2ff6d38d0c database: use a single constructor for the column family
We've been keeping two constructors for the column family to allow for a
version without the commitlog. But it's by now quite complicated to maintain
the two, because changes always have to be made in two places.

This patch adds a private constructor that does the actual construction, and
have the public constructors to call it.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <dd3cb0b9c20ad154a6131bad6ece619f70ed5025.1463448522.git.glauber@scylladb.com>
2016-05-17 11:28:39 +03:00
Glauber Costa
8fede5b98e memtables: isolate logic for disk writes disabled
When we have disk writes disabled, we exit immediately from the flush
function. We can just encode that separately and pass a different function
in the memtable_list creation. That simplifies the memtable flush a bit.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <908e3b5eb2c6ee84b8ad7b31c3673be5531a087c.1463448522.git.glauber@scylladb.com>
2016-05-17 11:28:38 +03:00
Glauber Costa
4981362f57 memtables: always seal through memtable_list seal function
I would like to be able to apply a function at the end of every flush, that is
common for both memtables and streaming memtables. For instance, to unthrottle
current waiters. Right now some calls to seal_active_memtable are open coded,
calling the column family's function directly, for both the main memtable list
and the streaming list.

This patch moves all the current open code callers to call the respective
memtable_list function.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <0c780254f3c4eb03e2bcd856b83941cf49a84b85.1463448522.git.glauber@scylladb.com>
2016-05-17 11:28:37 +03:00
Takuya ASADA
4972a72380 dist: drop 'sudo -E' and SETENV for security reason, source envfile from scripts
As Nadav pointed out, SETENV and sudo -E might cause a security hole:
https://github.com/scylladb/scylla/issues/1028#issuecomment-196202171
So drop them now, sourcing envfiles from the scylla_prepare / scylla_stop scripts
instead.

Also, on the "[PATCH] ubuntu: Fix the init script variable sourcing" thread
we had a problem passing variables from envfiles to scylla_prepare /
scylla_stop on Ubuntu; it seems better to source them from these scripts.

Additionally, this fixes #1249

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1462989906-30062-1-git-send-email-syuu@scylladb.com>
2016-05-17 10:31:03 +03:00
Pekka Enberg
9c450f673c cql3: Clean up prepared_metadata class
Return vectors by const reference in prepared_metadata class and add a
FIXME to result_message class.

Message-Id: <1463425756-20225-1-git-send-email-penberg@scylladb.com>
2016-05-17 10:02:14 +03:00
Pekka Enberg
217c1ffa95 cql3: Specify result set flag ABI explicitly
As Avi points out, the flag values are an ABI. So specify them explicitly.

Message-Id: <1463413379-8355-1-git-send-email-penberg@scylladb.com>
2016-05-16 19:00:52 +03:00
Avi Kivity
a3b23d75b9 Merge "Fix Prepared message metadata serialization"
"The Prepared message has a metadata section that's similar to result set
metadata but not exactly the same. Fix serialization by introducing a
separate prepared_metadata class like Origin has and implement
serialization as per the CQL protocol specification. This fixes one CQL
binary protocol version 4 issue that we currently have.

The changes have been verified by running the gocql integration tests
using v4. Please note that this series does *not* enable v4 for clients
because Cassandra 2.1.x series only supports CQL binary protocol v3."
2016-05-16 18:59:54 +03:00
Pekka Enberg
868ff5107c cql3: Introduce prepared_metadata class
Introduce a new prepared_metadata class that holds prepared statement
metadata and implement CQL binary protocol serialization that works for
all versions.
2016-05-16 18:06:01 +03:00
Tomasz Grabiec
272e89846d Merge branch 'cache' from git@github.com:haaawk/scylla.git
From Piotr:

Fixes #656.

It makes it possible to slice using clustering ranges in mutation
readers.  We don't have row index yet so the slicing is just ignoring
data which is out of range.
2016-05-16 14:44:33 +02:00
Piotr Jastrzebski
dcba6f5c45 Pass clustering_row_ranges to mutation readers.
This will allow readers to reduce the amount of data read.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2016-05-16 14:36:57 +02:00
Pekka Enberg
a68671e247 cql3: Add column_specification::all_in_same_table() helper
We need it in the prepared_metadata class that we're about to introduce.
2016-05-16 14:13:31 +03:00
Takuya ASADA
80037aa95b dist/common/scripts: don't proceed to run scylla_raid_setup when disks not selected, on interactive RAID setup
When disks not selected, run disk select prompt again.
Fixes #1260

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1463388933-3640-1-git-send-email-syuu@scylladb.com>
2016-05-16 13:45:17 +03:00
Pekka Enberg
adfb4d7bbd cql3: Move result_set class implementation to source file 2016-05-16 13:20:45 +03:00
Pekka Enberg
8552f222f5 cql3: Clean up result_set class
Kill some left-over ifdef'd code from the result_set class.

Message-Id: <1463392997-22921-1-git-send-email-penberg@scylladb.com>
2016-05-16 13:09:37 +03:00
Piotr Jastrzebski
23c23abe53 Make memtable mutation_reader slice using clustering ranges.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2016-05-16 11:46:41 +02:00
Piotr Jastrzebski
484d2ecd0a Slice data with clustering key range in sstable reader
Add additional parameters to mp_row_consumer to be able to fetch
only cells for given clustering key ranges

This will be used in row_cache when it will work on clustering key
level instead of partition key level.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2016-05-16 11:46:30 +02:00
Piotr Jastrzebski
8307681975 Introduce clustering_ranges type.
It will be used to slice data returned by mutation_readers.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2016-05-16 11:46:09 +02:00
Amnon Heiman
7e07d97e4b API utils: Adding rate moving average
rate_moving_average and rate_moving_average_and_histogram are types that
are used by the JMX.  They are based on the yammer meter and timer and
are used to collect derivative information.

Specifically: rate_moving_average calculates rates and
rate_moving_average_and_histogram collects rates and
a histogram.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-05-16 11:40:19 +03:00
Pekka Enberg
17765b6c06 Merge seastar upstream
* seastar 3dec26f...6a849ac (4):
  > seastar::socket: Be resilient against ENOTCONN
  > Merge " improve performance and predictability of syscall thread communications" from Glauber
  > rpc_test: Shutdown properly
  > [PATCH} future: better detect get_future() on already used promise
2016-05-16 08:04:47 +03:00
Yoav Kleinberger
de7952a8db tools/scyllatop: log input from collectd for easier debugging
When running with DEBUG verbosity, scyllatop will now log every single
value it receives from collectd. When you suspect that scyllatop is
somehow distorting values, this is a good way to check it.

Signed-off-by: Yoav Kleinberger <yoav@scylladb.com>
Message-Id: <1463320730-6631-1-git-send-email-yoav@scylladb.com>
2016-05-15 19:17:10 +03:00
Tomasz Grabiec
1eabe9b840 storage_proxy: Add trace-level logging for mutating
Message-Id: <1462978554-31217-1-git-send-email-tgrabiec@scylladb.com>
2016-05-12 13:52:56 +03:00
Tomasz Grabiec
7207cc8b1a storage_proxy: Improve error reporting
Knowing the source node can help in debugging the issue.
Message-Id: <1462978535-31164-1-git-send-email-tgrabiec@scylladb.com>
2016-05-12 13:52:39 +03:00
Pekka Enberg
b5d9aa866d Merge "Fixes for schema synchronization" from Tomek
"Writes may start to be rejected by replicas after issuing alter table
 which doesn't affect columns. This affects all versions with alter table
 support.

 Fixes #1258"
2016-05-12 09:43:25 +03:00
Duarte Nunes
7dbeef3c39 storage_service: Fix ignored future in on_alive
This patch ensures the future created by invoke_on_all is not ignored
by waiting on it, which is safe to do since we are within a
seastar::async context.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1462989837-7326-1-git-send-email-duarte@scylladb.com>
2016-05-12 09:03:46 +03:00
Tomasz Grabiec
13d8cd0ae9 migration_manager: Invalidate prepared statements on every schema change
Currently we only do that when column set changes. When prepared
statements are executed, parameters like read repair chance are read
from schema version stored in the statement. Not invalidating prepared
statements on changes of such parameters will appear as if alter took
no effect.

Fixes #1255.
Message-Id: <1462985495-9767-1-git-send-email-tgrabiec@scylladb.com>
2016-05-12 08:58:40 +03:00
Tomasz Grabiec
90c31701e3 tests: Add unit tests for schema_registry 2016-05-11 17:31:22 +02:00
Tomasz Grabiec
443e5aef5a schema_registry: Fix possible hang in maybe_sync() if syncer doesn't defer
Spotted during code review.

If it doesn't defer, we may execute then_wrapped() body before we
change the state. Fix by moving then_wrapped() body after state changes.
2016-05-11 17:31:22 +02:00
Tomasz Grabiec
8703136a4f migration_manager: Fix schema syncing with older version
The problem was that "s" would not be marked as synced-with if it came from
shard != 0.

As a result, mutation using that schema would fail to apply with an exception:

  "attempted to mutate using not synced schema of ..."

The problem could surface when altering schema without changing
columns and restarting one of the nodes so that it forgets past
versions.

Fixes #1258.

Will be covered by dtest:

  SchemaManagementTest.test_prepared_statements_work_after_node_restart_after_altering_schema_without_changing_columns
2016-05-11 17:29:14 +02:00
Takuya ASADA
8503600e30 dist/common/systemd: drop hardcoded path
Stop using /var/lib/scylla, use $SCYLLA_HOME instead.
systemd does not seem to expand variables in Environment="HOME=$SCYLLA_HOME", but both CentOS and Ubuntu are able to run scylla-server without $HOME, so it is dropped.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1462977871-26632-1-git-send-email-syuu@scylladb.com>
2016-05-11 17:53:53 +03:00
Calle Wilund
152bd82a05 alter_keyspace_statement: Handle missing replication strategy
ALTER KEYSPACE should allow no replication strategy to be set,
in which case old strategy should be kept.
Initial translation from origin missed this.

Fixes #1256

Message-Id: <1462967584-2875-2-git-send-email-calle@scylladb.com>
2016-05-11 16:02:22 +03:00
Calle Wilund
5604fb8aa3 cql3::statements::cf_prop_defs: Fix compaction min/max not handled
Property parsing code was looking at wrong property level
for initial guard statement.

Fixes #1257

Message-Id: <1462967584-2875-1-git-send-email-calle@scylladb.com>
2016-05-11 16:02:16 +03:00
Takuya ASADA
c38b5fbb3d dist/common/scripts: On scylla_io_setup, run iotune on correct data directory which specified on scylla.yaml
Currently scylla_io_setup is hardcoded to run iotune on /var/lib/scylla, but the user may change the data directory by modifying scylla.yaml, and it may be on a different block device.
So use scylla_config_get.py to get the configuration from scylla.yaml and pass it to iotune.

Fixes #1167

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1462955824-21983-2-git-send-email-syuu@scylladb.com>
2016-05-11 13:02:25 +03:00
Takuya ASADA
53820393da dist/common/scripts: add scylla.yaml parser for scripts
To parse scylla.yaml, scylla_config_get.py is added.
It can be used like 'scylla_config_get.py [key name]' from a shell script or the command line.
This is needed for scylla_io_setup, to get 'data_file_directories' from a shell script.
Currently it does not support specifying a key name inside a nested data structure, but that is enough for scylla_io_setup.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1462955824-21983-1-git-send-email-syuu@scylladb.com>
2016-05-11 13:02:23 +03:00
Pekka Enberg
d93d46e721 Merge "ALTER KEYSPACE" from Calle
"Implementation of ALTER KEYSPACE.
Fixes #429"
2016-05-10 22:07:06 +03:00
Takuya ASADA
a73924b4e0 dist/ubuntu/dep: introduce scylla-gdb-7.11 for Ubuntu 14.04LTS
Introduce scylla-gdb-7.11 for Ubuntu 14.04LTS, to get better support of recent version of g++ on gdb.

Fixes #969

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1462825880-20866-3-git-send-email-syuu@scylladb.com>
2016-05-10 17:53:32 +03:00
Takuya ASADA
9ff2efb28b dist/common/dep: add Ubuntu support for scylla-env
Since Ubuntu 14.04LTS needs the scylla-gdb package, which installs to /opt/scylladb, we need to port the scylla-env package to Ubuntu as well.
This change introduces scylla-env package to Ubuntu 14.04LTS.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1462825880-20866-2-git-send-email-syuu@scylladb.com>
2016-05-10 17:53:32 +03:00
Takuya ASADA
43cc77d1b8 dist/redhat/centos_dep: move scylla-env to dist/common to share with Ubuntu
Since Ubuntu 14.04LTS needs the scylla-gdb package, which installs to /opt/scylladb, we need to port the scylla-env package to Ubuntu as well.
To do that, first share the package directory in dist/common/dep.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1462825880-20866-1-git-send-email-syuu@scylladb.com>
2016-05-10 17:53:31 +03:00
Calle Wilund
147aa81177 Cql.g: Handle ALTER KEYSPACE 2016-05-10 14:36:46 +00:00
Calle Wilund
5c36d2e09e alter_keyspace_statement: Implement
Note: Like create keyspace, we don't properly validate 
replication strategy yet.
2016-05-10 14:36:17 +00:00
Piotr Jastrzebski
240a185727 Stop scanning keyspace data directory when populating.
Iterate over column families and check/create directories for them
instead of scanning keyspace data directory and filtering directories
against column families that exist in system tables for this keyspace.

Fixes #1008

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <26da66eec67a1ab1318917a66161915cdef924ab.1462890592.git.piotr@scylladb.com>
2016-05-10 17:35:55 +03:00
Calle Wilund
63b6c6bb5a migration_manager: Implement announce_keyspace_update
More or less the same as create keyspace...
2016-05-10 14:34:51 +00:00
Calle Wilund
8cdf4e37fb schema_tables: Fix merge_keyspaces to handle alter keyspace
Must keep "altered" alive into the call chain.
2016-05-10 14:32:51 +00:00
Calle Wilund
6ef7885ae3 database: Implement update_keyspace
Reloads keyspace metadata and replaces in existing keyspace. 
Note: since keyspace metadata, and consequently, replication 
strategy now becomes volatile, keyspace::metadata now returns
shared pointer by value (i.e. keep-alive). 
Replication strategy should receive the same treatment, but
since it is extensively used, but never kept across a 
continuation, I've just added a comment for now.
2016-05-10 14:31:30 +00:00
Raphael S. Carvalho
d80d194873 compaction_manager: stop compaction tasks in parallel
Purpose is to speed up shutdown.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <a8db3492f1ceeea2a886d3920e5effa841ea155f.1462838670.git.raphaelsc@scylladb.com>
2016-05-10 10:03:35 +03:00
Avi Kivity
28cc6f97af Merge 2016-05-09 14:25:25 +03:00
Calle Wilund
917bf850fa transport::server: Do not treat accept exception as fatal
1.) It most likely is not, i.e. either tcp or more likely, ssl
    negotiation failure. In any case, we can still try next
    connection.
2.) Not retrying will cause us to "leak" the accept, and then hang
    on shutdown.

Also, promote logging message on accept exception to "warn", since
dtest(s?) depend on seeing log output.

Message-Id: <1462283265-27051-4-git-send-email-calle@scylladb.com>
2016-05-09 14:13:07 +03:00
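The behaviour described in 917bf850fa above (log a failed accept at "warn" level and keep accepting) can be pictured with the synchronous sketch below. The real server is asynchronous and built on seastar futures; `accept_loop`, the `accept`/`handle` callables, and the use of `None` to signal shutdown are all assumptions made for this illustration.

```python
import logging


def accept_loop(accept, handle, log=logging.getLogger("server")):
    """Serve connections until accept() returns None (shutdown).

    A failed accept (for example a TCP or TLS negotiation error) is logged
    as a warning and the loop moves on to the next connection instead of
    giving up.
    """
    served = 0
    while True:
        try:
            conn = accept()
        except OSError as e:
            log.warning("accept failed: %s", e)
            continue  # not fatal: try the next connection
        if conn is None:
            break
        handle(conn)
        served += 1
    return served


def demo():
    # Scripted sequence: one good connection, one failure, another good
    # connection, then shutdown.
    events = iter(["c1", OSError("tls handshake failed"), "c2", None])

    def accept():
        ev = next(events)
        if isinstance(ev, Exception):
            raise ev
        return ev

    handled = []
    served = accept_loop(accept, handled.append)
    return served, handled
```

Breaking out of the loop on the first error would correspond to the old behaviour, where later connections were never accepted and shutdown could hang on the abandoned accept.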
Calle Wilund
437ebe7128 cql_server: Use credentials_builder to init tls
Slightly cleaner, and shard-safe tls init.

Message-Id: <1462283265-27051-3-git-send-email-calle@scylladb.com>
2016-05-09 14:12:59 +03:00
Calle Wilund
58f7edb04f messaging_service: Change tls init to use credentials_builder
To simplify init of msg service, use credentials_builder
to encapsulate tls options so actual credentials can be
more easily created in each shard.

Message-Id: <1462283265-27051-2-git-send-email-calle@scylladb.com>
2016-05-09 14:12:53 +03:00
Avi Kivity
29e103a2ae Merge seastar upstream
* seastar 7782ad4...3dec26f (3):
  > tests/mkcert.gmk: Fix makefile bug in snakeoil cert generator
  > tls_test: Add case to do a little checking of credentials_builder
  > tls: Add credentials_builder - copyable credentials "factory"
2016-05-09 14:12:29 +03:00
Tomasz Grabiec
1ca5ceadff Merge tag '1235-v2' from https://github.com/avikivity/scylla
From Avi:

When we shut down, we may have to give up on some pending atomic
sstable deletions, because not all shards may have agreed to delete
all members of the set.

This is expected, so silence these frightening error messages.

Fixes #1235.
2016-05-09 12:22:41 +02:00
Duarte Nunes
dada385826 rpc: Secure connection attempts can be cancelled
This patch adds support for secure connection attempts to be
cancellable.

Fixes #862

Includes seastar upstream merge:

* seastar f1a3520...7782ad4 (1):
  > Merge "rpc: Allow client connections to be cancelled" from Duarte

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1462783335-10731-1-git-send-email-duarte@scylladb.com>
2016-05-09 11:44:53 +03:00
Takuya ASADA
f7d41ba07a dist: Extract scylla.yaml and create metapackage
This patch creates a scylla-conf package containing
scylla.yaml and a scylla package acting as a metapackage.

Fixes #421

Signed-off-by: Benoît Canet <benoit@scylladb.com>
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1462280987-26909-1-git-send-email-syuu@scylladb.com>
2016-05-09 11:23:28 +03:00
Avi Kivity
4b34152870 Merge seastar upstream
* seastar ab74536...f1a3520 (2):
  > rpc: clear outgoing queue of a socket after failed connection
  > Merge "unconnected socket (now seastar::socket)" from Duarte

Fixes #1236.
2016-05-09 10:16:15 +03:00
Raphael S. Carvalho
3ac22bc0d7 compaction_manager: simplify code that waits for cleanup termination
Now that a task is created on demand, it's possible to wait for
termination of cleanup without extra machinery.
However, shared_future<> is now used because we may have more
than one fiber waiting for completion of task.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <209de365c7782742dc2876a66f9d0784998cae53.1462599296.git.raphaelsc@scylladb.com>
2016-05-08 11:26:36 +03:00
Avi Kivity
ee7225a9cb sstables: silence atomic deletion cancellation logs during sstable deletion
Those logs are expected during shutdown.
2016-05-07 20:37:49 +03:00
Avi Kivity
80302d98dd database: silence atomic deletion cancellation logs during compaction
Those logs are expected during shutdown.
2016-05-07 20:37:48 +03:00
Avi Kivity
43221fc7e2 sstables: make delete_atomically() throw a distinct exception when cancelled
Throwing a runtime_error makes it impossible to catch the cancellation
exception, so replace it with a distinct exception class.
2016-05-07 20:37:46 +03:00
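A minimal sketch of the idea behind commit 43221fc7e2 above, using only the standard library. The class name `deletion_cancelled_sketch` and both helper functions are hypothetical; the commit only says that a distinct exception class replaces runtime_error so that callers can catch cancellation separately.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical name: the commit does not spell out the real class name.
class deletion_cancelled_sketch : public std::exception {
public:
    const char* what() const noexcept override {
        return "atomic deletion cancelled";
    }
};

// Stand-in for delete_atomically(): throws on cancellation or on a real failure.
void delete_atomically_sketch(bool cancelled, bool io_failure) {
    if (cancelled) {
        throw deletion_cancelled_sketch();
    }
    if (io_failure) {
        throw std::runtime_error("I/O error while deleting sstable");
    }
}

// Returns the log level a caller might pick for each outcome.
std::string log_level_for(bool cancelled, bool io_failure) {
    try {
        delete_atomically_sketch(cancelled, io_failure);
        return "none";
    } catch (const deletion_cancelled_sketch&) {
        return "debug";  // expected during shutdown, keep it quiet
    } catch (const std::exception&) {
        return "error";  // anything else is still a real error
    }
}
```

Had both cases been thrown as std::runtime_error, the first handler could not tell them apart without inspecting the message text.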
Calle Wilund
709dd82d59 storage_service: Add logging to match origin
Pointing out if CQL server is listening in SSL mode.
Message-Id: <1462368016-32394-2-git-send-email-calle@scylladb.com>
2016-05-06 13:27:55 +03:00
Raphael S. Carvalho
bf18025937 main: stop compaction manager earlier
Avi says:
"During shutdown, we prevent new compactions, but perhaps too late.
Memtables are flushed and these can trigger compaction."

To solve that, let's stop compaction manager at a very early step
of shutdown. We will still try to stop compaction manager in
database::stop() because user may ask for a shutdown before scylla
was fully started. It's fine to stop compaction manager twice.
Only the first call will actually stop the manager.

Fixes #1238.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <c64ab11f3c91129c424259d317e48abc5bde6ff3.1462496694.git.raphaelsc@scylladb.com>
2016-05-06 07:41:29 +03:00
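The "stop twice, only the first call does the work" behaviour described in commit bf18025937 above can be sketched as follows. The class and its members are hypothetical and synchronous; the real compaction manager is asynchronous and is not reproduced here.

```cpp
// Hypothetical stand-in for a manager whose stop() is safe to call twice.
class compaction_manager_sketch {
    bool _stopped = false;
    int _stop_work_done = 0;  // counts how often the real stop work ran
public:
    void stop() {
        if (_stopped) {
            return;           // second and later calls are no-ops
        }
        _stopped = true;
        ++_stop_work_done;    // cancel tasks, wait for them, etc.
    }
    // New compactions are refused once stop() has been called.
    bool can_submit() const { return !_stopped; }
    int stop_work_done() const { return _stop_work_done; }
};
```

With this shape, an early stop in the shutdown path and a later stop in database::stop() can coexist without extra coordination.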
Calle Wilund
d8ea85cd90 messaging_service: Add logging to match origin
To announce rpc port + ssl if on.

Message-Id: <1462368016-32394-1-git-send-email-calle@scylladb.com>
2016-05-05 10:26:01 +03:00
Raphael S. Carvalho
b8277979ef compaction_manager: fix indentation
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <82c6b93b24cbcc97f5eff3f91b05d4c1b415ecee.1462412927.git.raphaelsc@scylladb.com>
2016-05-05 10:06:56 +03:00
Avi Kivity
3aefa4f1d2 Merge seastar upstream
* seastar e536555...ab74536 (4):
  > reactor: kill max_inline_continuations
  > smp: optimize smp_message_queue::flush_request_batch() for empty queue
  > thread: do not yield if idle
  > Merge "Fixes for iotune" from Glauber
2016-05-05 09:48:58 +03:00
Gleb Natapov
f1cd52ff3f tests: test for result row counting
Message-Id: <1462377579-2419-2-git-send-email-gleb@scylladb.com>
2016-05-04 18:18:17 +02:00
Gleb Natapov
b75475de80 query: fix result row counting for results with multiple partitions
Message-Id: <1462377579-2419-1-git-send-email-gleb@scylladb.com>
2016-05-04 18:18:15 +02:00
Gleb Natapov
2a00c06dd5 query: fix non full clustering key deserialization
Clustering key prefix may have fewer columns than described in schema.
Deserialization should stop when end of buffer is reached.

Message-Id: <20160503140420.GP23113@scylladb.com>
2016-05-04 17:42:28 +02:00
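To illustrate the fix in commit 2a00c06dd5 above, here is a sketch of a deserializer that stops at the end of the buffer instead of insisting on one component per schema column. The wire format (a 16-bit big-endian length before each component) and the function name are assumptions made for this example, not taken from the patch.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Reads at most schema_columns components, but stops early when the
// buffer is exhausted, so a key prefix with fewer components is accepted.
// Assumed format: [len_hi][len_lo][len bytes of data] per component.
std::vector<std::string> deserialize_prefix_sketch(const std::vector<uint8_t>& buf,
                                                   size_t schema_columns) {
    std::vector<std::string> components;
    size_t pos = 0;
    while (components.size() < schema_columns && pos < buf.size()) {
        if (buf.size() - pos < 2) {
            throw std::runtime_error("truncated component length");
        }
        size_t len = (size_t(buf[pos]) << 8) | size_t(buf[pos + 1]);
        pos += 2;
        if (buf.size() - pos < len) {
            throw std::runtime_error("truncated component data");
        }
        components.emplace_back(buf.begin() + pos, buf.begin() + pos + len);
        pos += len;
    }
    return components;
}
```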
Raphael S. Carvalho
5aeeb0b3e8 compaction: add support to parallel compaction on the same column family
It was noticed that small sstables will accumulate for a column family because
scylla was limited to two compactions per shard, and a column family could have
at most one compaction running at a given shard. With the number of sstables
increasing rapidly, read performance is degraded.

At the moment, our compaction manager works by running two compaction task
handlers that run in parallel to the rest of the system. Each task handler
gets to run when needed, gets a column family from compaction manager queue,
runs compaction on it, and goes to sleep again. That's basically its cycle.
Compaction manager only allows one instance of a column family to be on its
queue, meaning that it's impossible for a column family to be compacted in
parallel. One compaction starts after another for a given column family.

To solve the problem described, we want to concurrently run compaction jobs
of a column family that have different "size tier" (or "weight").
For those unfamiliar, compaction job contains a list of sstables that will be
compacted together.
The "size tier" of a compaction job is the log of the total size of the input
sstables. So a compaction job only gets to run if its "size tier" is not the
same as that of an ongoing compaction. There is no point in compacting concurrently at
the same "size tier", because that slows down both compactions.

We will no longer queue column families in compaction manager. Instead, we
create a new fiber to run compaction on demand.
This fiber that runs asynchronously will do the following:
1) Get a compaction job from compaction strategy.
2) Calculate "size tier" of compaction job.
3) Run compaction job if its "size tier" is not the same as that of an ongoing
compaction for the given column family.
As before, it may decide to re-compact a column family based on a stat stored
in column family object.

Ran all compaction-related dtests.

Fixes #1216.

Reviewed-by: Nadav Har'El <nyh@scylladb.com>
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <d30952ff136192a522bde4351926130addec8852.1462311908.git.raphaelsc@scylladb.com>
2016-05-04 11:46:09 +03:00
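The three steps listed in commit 5aeeb0b3e8 above can be sketched roughly as below. The logarithm base (4), the integer rounding, and the names `size_tier_sketch` and `tier_registry_sketch` are assumptions; the commit message only says the tier is "the log of the total size" and that two jobs of the same tier must not run together for one column family.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Integer floor(log4(total size)); base 4 is an assumption for this sketch.
int size_tier_sketch(const std::vector<uint64_t>& sstable_sizes) {
    uint64_t total = 0;
    for (auto s : sstable_sizes) {
        total += s;
    }
    int tier = 0;
    while (total >= 4) {
        total /= 4;
        ++tier;
    }
    return tier;
}

// Tracks the tiers of ongoing compactions for a single column family.
class tier_registry_sketch {
    std::set<int> _running;
public:
    // True if the job may start, i.e. no ongoing job has the same tier.
    bool try_start(int tier) { return _running.insert(tier).second; }
    void finish(int tier) { _running.erase(tier); }
};
```

A job whose tier is already registered would simply be skipped for now and reconsidered later, which keeps small and large compactions from blocking each other.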
Calle Wilund
6d2caedafd auth: Make auth.* schemas use deterministic UUIDs
In initial implementation I figured this was not required, but
we get issues communicating across nodes if system tables
don't have the same UUID, since creation is forcefully local, yet
shared.

Just do a manual re-create of the schema with a name UUID, and
use migration manager directly.
Message-Id: <1462194588-11964-1-git-send-email-calle@scylladb.com>
2016-05-03 10:48:24 +03:00
Avi Kivity
24f90b087f Merge "fix range queries with limiter to not generate more requests than needed" from Gleb
Fixes #1204.
2016-05-02 15:14:45 +03:00
Gleb Natapov
3039e4c7de storage_proxy: stop range query with limit after the limit is reached 2016-05-02 15:10:15 +03:00
Gleb Natapov
db322d8f74 query: put live row count into query::result
The patch calculates row count during result building and while merging.
If one of results that are being merged does not have row count the
merged result will not have one either.
2016-05-02 15:10:15 +03:00
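The merging rule from commit db322d8f74 above (a merged result has a row count only if every input has one) could look like this in isolation. The function name and the use of std::optional (C++17) are assumptions for the sketch; query::result itself is not modelled.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Sums per-result row counts; if any partial result lacks a count,
// the merged result has no count either.
std::optional<uint32_t> merge_row_counts_sketch(
        const std::vector<std::optional<uint32_t>>& partial) {
    uint32_t total = 0;
    for (const auto& count : partial) {
        if (!count) {
            return std::nullopt;
        }
        total += *count;
    }
    return total;
}
```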
Gleb Natapov
41c586313a storage_proxy: fix calculation of concurrency queried ranges 2016-05-02 15:10:15 +03:00
Gleb Natapov
c364ab9121 storage_proxy: add logging for range query row count estimation 2016-05-02 15:10:15 +03:00
Calle Wilund
751ba2f0bf messaging_service: Change init to use per-shard tls credentials
Fixes: #1220

While the server_credentials object is technically immutable
(esp with last change in seastar), the ::shared_ptr holding them
is not safe to share across shards.

Pre-create cpu x credentials and then move-hand them out in service
start-up instead.

Fixes assertion error in debug builds. And just maybe real memory
corruption in release.

Requires seastar tls change:
"Change server_credentials to copy dh_params input"

Message-Id: <1462187704-2056-1-git-send-email-calle@scylladb.com>
2016-05-02 15:04:40 +03:00
Raphael S. Carvalho
ae95ce1bd7 sstables: optimize leveled compaction strategy
Leveled compaction strategy is doing a lot of work whenever it's asked to get
a list of sstables to be compacted. It's checking if a sstable overlaps with
another sstable in the same level twice. First, when adding a sstable to a
list with sstables at the same level. Second, after adding all sstables to
their respective lists.

It's enough to check that a sstable creates an overlap in its level only once.
So I am changing the code to unconditionally insert a sstable to its respective
list, and after that, it will call repair_overlapping_sstables() that will send
any sstable that creates an overlap in its level to L0 list.

By the way, the optimization isn't in the compaction itself, instead in the
strategy code that gets a set of sstables to be compacted.

Reviewed-by: Nadav Har'El <nyh@scylladb.com>
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <8c8526737277cb47987a3a5dbd5ff3bb81a6d038.1461965074.git.raphaelsc@scylladb.com>
2016-05-02 11:18:39 +03:00
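A rough sketch of the approach described in commit ae95ce1bd7 above: insert every sstable into its level's list unconditionally, then make one pass per level that moves any sstable overlapping its predecessor to L0. The struct, the string keys, and `build_levels_sketch` are hypothetical simplifications; the real repair_overlapping_sstables() is not reproduced here.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct sstable_sketch {
    std::string first_key;
    std::string last_key;
    int level;
};

// Groups sstables by level, then demotes overlapping ones to level 0.
std::map<int, std::vector<sstable_sketch>> build_levels_sketch(
        const std::vector<sstable_sketch>& all) {
    std::map<int, std::vector<sstable_sketch>> levels;
    for (const auto& s : all) {
        levels[s.level].push_back(s);  // unconditional insert, no overlap check
    }
    std::vector<sstable_sketch> demoted;
    for (auto& [level, list] : levels) {
        if (level == 0) {
            continue;  // L0 sstables are allowed to overlap
        }
        std::sort(list.begin(), list.end(),
                  [](const sstable_sketch& a, const sstable_sketch& b) {
                      return a.first_key < b.first_key;
                  });
        std::vector<sstable_sketch> kept;
        for (auto& s : list) {
            if (!kept.empty() && s.first_key <= kept.back().last_key) {
                s.level = 0;           // overlaps the previous one: send to L0
                demoted.push_back(s);
            } else {
                kept.push_back(s);
            }
        }
        list = std::move(kept);
    }
    auto& l0 = levels[0];
    l0.insert(l0.end(), demoted.begin(), demoted.end());
    return levels;
}
```

Each sstable is examined for overlap exactly once, after sorting, which is the saving the commit message describes.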
Avi Kivity
dc69999fd8 Merge seastar upstream
* seastar dab58e4...e536555 (5):
  > rpc: introduce outgoing packet queue
  > Add condition variable implementation.
  > future-utils: support futures with multiple values in map_reduce
  > tests: rpc: stop client and server
  > tls_test: Add test for large-ish buffer send/receive
2016-05-02 11:10:33 +03:00
Takuya ASADA
122330a5eb dist/common/scripts: add interactive prompt for package installation check, also check scylla-tools installed
Currently scylla_setup is unusable when the user does not want to install scylla-jmx, because it checks for the package unconditionally. Some users (or developers) do not want to install it, so let's ask on the interactive prompt whether to skip the check.

Also, the scylla-tools package should be installed in most cases, so add a check for that package.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1460662354-10221-1-git-send-email-syuu@scylladb.com>
2016-05-01 14:50:50 +03:00
Takuya ASADA
cc74b6ff5f dist/ubuntu: move lines from rules to .install/.dirs/.docs
To simplify build script, and make it easier to split into two packages,
use .install/.dirs/.docs instead of rules.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1461960695-30647-1-git-send-email-syuu@scylladb.com>
2016-05-01 10:16:35 +03:00
Avi Kivity
434db0bc8b Update scylla-ami submodule
* dist/ami/files/scylla-ami 7019088...72ae258 (1):
  > Add --repo option to scylla_install_ami to construct AMI with custom repository URL
2016-04-28 16:41:30 +03:00
Takuya ASADA
6723978891 dist/ami: Add --repo option to build_ami.sh to construct AMI with custom repository URL
To build AMI from specified build of .rpm, custom repo URL option is required.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1461849370-11963-1-git-send-email-syuu@scylladb.com>
2016-04-28 16:40:49 +03:00
Takuya ASADA
3ec47fbcf0 dist/ubuntu: unofficial support Debian 8.4
Unofficial support for Debian 8.4.
We now support both Ubuntu and Debian, but keep the directory name as 'dist/ubuntu' for now.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1461006868-28273-1-git-send-email-syuu@scylladb.com>
2016-04-27 15:39:20 +03:00
Pekka Enberg
31090f3116 Merge "Fix for systemd support on Ubuntu, add Ubuntu 16.04 support" from Takuya
"This is a bug fix for systemd support on Ubuntu, plus added Ubuntu 16.04 support."
2016-04-27 15:37:25 +03:00
Takuya ASADA
1cfde50102 dist/ubuntu: support 16.04
Drop 'unsupported release' message on 16.04.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-04-27 18:06:59 +09:00
Takuya ASADA
988b7bcd3d dist/ubuntu: don't use ubuntu-toolchain-r/test ppa repo on recent versions of Ubuntu, since it has newer g++
On Ubuntu 15.04 and newer, official g++ package is >= g++-4.9.
So we don't need to use development repository, just use official package.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-04-27 18:06:59 +09:00
Takuya ASADA
fa0b90b727 dist/ubuntu: add dependency for libsystemd-dev to handle startup correctly on recent versions of Ubuntu
To handle scylla startup correctly on systemd versions of Ubuntu, scylla must be built with libsystemd-dev.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-04-27 18:06:59 +09:00
Takuya ASADA
eae881ff70 dist/ubuntu: skip dh_installinit --upstart-only on recent versions of Ubuntu
Since 16.04LTS does not support this argument anymore, drop it on recent versions of Ubuntu which do not use Upstart.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-04-27 18:06:59 +09:00
Takuya ASADA
d5efa02eab dist/ubuntu/dep: Drop python-support on Ubuntu 16.04
Ubuntu 16.04 seems to have dropped python-support, so remove it from thrift package.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-04-27 18:06:59 +09:00
Takuya ASADA
e733c2aae8 dist/ubuntu/dep: use distribution's thrift-compiler-0.9.1 on newer versions of Ubuntu
Use distribution's thrift if version > 14.04LTS.
14.04LTS doesn't have thrift-compiler-0.9.1, use our version.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-04-27 18:06:59 +09:00
Avi Kivity
ad9e75a3fa Merge seastar upstream
* seastar 15a92cf...dab58e4 (6):
  > tls: Fix tls sink::put so it deals with larger packets
  > tls: Change server_credentials to copy dh_params input
  > seastar thread: allow the thread_scheduling_group's usage fraction to change
  > seastar::async allow passing an attribute
  > thread: document undocumented classes
  > fair_queue: fix inconsistency during renormalization
2016-04-27 10:39:09 +03:00
Avi Kivity
454512a272 dist/redhat: package scylla_kernel_check
Can't build rpm without this.
Message-Id: <1461683947-30356-1-git-send-email-avi@scylladb.com>
2016-04-27 08:38:48 +03:00
Tomasz Grabiec
61435108a5 query: Do not take arguments via ... in the visitor
Amnon reports that current code fails to compile on gcc 4.9:

distcc[9700] ERROR: compile /home/amnon/.ccache/tmp/query.tmp.localhost.localdomain.9673.ii on localhost failed
In file included from query.cc:30:0:
query-result-reader.hh: In instantiation of ‘void query::result_view::consume(const query::partition_slice&, ResultVisitor&&) [with ResultVisitor = query::result::calculate_row_count(const query::partition_slice&)::<anonymous struct>&]’:
query.cc:196:32:   required from here
query-result-reader.hh:184:21: error: cannot pass objects of non-trivially-copyable type ‘class clustering_key_prefix’ through ‘...’
                     visitor.accept_new_row(*row.key(), static_row, view);
                     ^
query-result-reader.hh:184:21: error: cannot pass objects of non-trivially-copyable type ‘class query::result_row_view’ through ‘...’
query-result-reader.hh:184:21: error: cannot pass objects of non-trivially-copyable type ‘class query::result_row_view’ through ‘...’
query-result-reader.hh:186:21: error: cannot pass objects of non-trivially-copyable type ‘class query::result_row_view’ through ‘...’
                     visitor.accept_new_row(static_row, view);
                     ^
query-result-reader.hh:186:21: error: cannot pass objects of non-trivially-copyable type ‘class query::result_row_view’ through ‘...’

Work around the problem by not using '...'.
Message-Id: <1460964042-2867-1-git-send-email-tgrabiec@scylladb.com>
2016-04-26 14:50:35 +03:00
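The compile errors quoted in commit 61435108a5 above come from passing class-type objects through a C-style `...` parameter. One way to keep a "accept anything" visitor method without `...` is a template parameter pack, sketched below. This is not necessarily how the patch works around the problem; all names here are hypothetical.

```cpp
#include <string>

// Non-trivially-copyable type, standing in for clustering_key_prefix.
struct row_key_sketch {
    std::string value;
};

struct counting_visitor_sketch {
    int rows = 0;
    // Arguments are bound by reference through a parameter pack,
    // so nothing is ever passed through C varargs.
    template <typename... Args>
    void accept_new_row(Args&&...) {
        ++rows;
    }
};

// Calls the visitor with two different argument lists, as the
// result reader in the error output does.
template <typename Visitor>
void consume_two_rows_sketch(Visitor& visitor) {
    row_key_sketch key{"k"};
    std::string static_row = "s";
    std::string view = "v";
    visitor.accept_new_row(key, static_row, view);
    visitor.accept_new_row(static_row, view);
}
```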
Takuya ASADA
eb9bd3ee21 dist/common/scripts: show knowledge base URL when kernel is too old
To explain why this kernel is not supported, we need to show kb URL here.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1461644708-32078-1-git-send-email-syuu@scylladb.com>
2016-04-26 14:43:10 +03:00
Takuya ASADA
05ac4bb99d dist/common/scripts: notice restart required after changing bootparameters
Fixes #1115

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1459330851-32470-1-git-send-email-syuu@scylladb.com>
2016-04-26 14:41:49 +03:00
Tomasz Grabiec
88bb5fcb53 api: Fix error message
Keyspace and table names are separated by a single colon.
Message-Id: <1461600269-4070-1-git-send-email-tgrabiec@scylladb.com>
2016-04-26 08:40:28 +03:00
Takuya ASADA
e7f438eeae dist/ubuntu: Drop dependency to libthrift0, link it statically
Drop the dependency on libthrift0 at installation time; link libthrift statically.
With this fix, we no longer need to distribute the libthrift0 deb package to install the scylla-server binary package.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1461594460-2403-2-git-send-email-syuu@scylladb.com>
2016-04-25 17:44:46 +03:00
Takuya ASADA
ec2ef467c8 configure.py: configure.py: add --static-thrift option to link libthrift statically
This is needed for Ubuntu packaging, to drop the dependency on libthrift0 at installation time.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1461594460-2403-1-git-send-email-syuu@scylladb.com>
2016-04-25 17:44:44 +03:00
Avi Kivity
af803a9149 Merge seastar upstream
* seastar 2b3c363...15a92cf (2):
  > smp: allow more than 128 in-flight operations on core-to-core queue
  > future: balance constructors and destructors in future_state<>

Fixes #1205.
2016-04-25 13:34:27 +03:00
Calle Wilund
cdd0f00de5 client_state: Remove unwarranted keyspace check
"has_keyspace_access" is not supposed to (according to origin)
verify that a keyspace exists. Remove.
It (and all others) is, however, supposed to check that "ks" (name)
is not empty. Add this.
Message-Id: <1461578072-24113-1-git-send-email-calle@scylladb.com>
2016-04-25 13:16:36 +03:00
Calle Wilund
49d3d79dfe sstables: Fix compilation error on boost 1.55
Message-Id: <1461067254-526-2-git-send-email-calle@scylladb.com>
2016-04-25 12:54:44 +03:00
Calle Wilund
9130b0de16 database.cc: Fix compilation error with boost 1.55
Message-Id: <1461067254-526-1-git-send-email-calle@scylladb.com>
2016-04-25 12:54:43 +03:00
Takuya ASADA
c657a431dc dist/common/scripts: Fix incorrect order to run scylla_sysconfig_setup on scylla_setup
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1461174517-12441-2-git-send-email-syuu@scylladb.com>
2016-04-25 11:09:49 +03:00
Takuya ASADA
9a99231f6b dist/common/scripts: On scylla_setup, skip showing 'lo' interface on sysconfig prompt
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1461174517-12441-1-git-send-email-syuu@scylladb.com>
2016-04-25 11:09:48 +03:00
Takuya ASADA
611b0a3400 dist/common/scripts: Add kernel version check
Check kernel version at beginning of scylla_setup, show error when kernel is too old.
Use iotune --fs-check to check kernel.

Fixes #1116

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1459886738-10882-1-git-send-email-syuu@scylladb.com>
2016-04-24 17:47:13 +03:00
Vlad Zolotarov
813ad4024f query_processor: account unprepared statements executions
Add the statistics counter for a number of unprepared statements
executions and expose it with collectd.

Since in our implementation the number of unprepared statement executions
equals the number of executions of the prepare() function, we may simply
increment the new statistics counter every time query_processor::get_statement()
is called.

Fixes #1068

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1461503492-32228-1-git-send-email-vladz@cloudius-systems.com>
2016-04-24 16:55:15 +03:00
Avi Kivity
c6b5890eb2 Merge 2016-04-24 16:17:00 +03:00
Pekka Enberg
f6da9bc92b Merge "Additional mutations/queries related collectd metrics" from Vlad
"This series introduces some additional metrics (mostly) in a storage_proxy and
a database level that are meant to create a better picture of how data flows
in the cluster.

First of all where possible counters of each category (e.g. total writes in the storage
proxy level) are split into the following categories:
   - operations performed on a local Node
   - operations performed on remote Nodes aggregated per DC

In a storage_proxy level there are the following metrics that have this "split"
nature (all on a sending side):
   - total writes (attempts/errors)
   - writes performed as a result of a Read Repair logic
   - total data reads (attempts/completed/errors)
   - total digest reads (attempts/completed/errors)
   - total mutations data reads (attempts/completed/errors)

In a batchlog_manager:
   - writes performed as a result of a batchlog replay logic

Thereby if, for instance, somebody wants to get an idea of how many writes
the current Node performs due to user requested mutations only, he/she has
to take the counter of total writes and subtract the writes resulting from Read
Repairs and batchlog replays.

On a receiving side of a storage_proxy we add the two following counters:
   - total number of received mutations
   - total number of forwarded mutations (attempts/errors)

In order to get a better picture of what is going on on a local Node
we are adding two counters on a database level:
   - total number of writes
   - total number of reads

Comparing these to total writes/reads in a storage_proxy may give a good
idea if there is an excessive access to a local DB for example."
2016-04-21 15:58:45 +03:00
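The subtraction described in merge f6da9bc92b above, written out as a small sketch. The struct and field names are hypothetical and do not correspond to the actual collectd metric names.

```cpp
#include <cstdint>

// Hypothetical snapshot of the three write counters mentioned in the series.
struct write_counters_sketch {
    uint64_t total_writes;
    uint64_t read_repair_writes;
    uint64_t batchlog_replay_writes;
};

// Writes caused by user-requested mutations only:
// total minus read-repair writes minus batchlog-replay writes.
uint64_t user_requested_writes_sketch(const write_counters_sketch& c) {
    return c.total_writes - c.read_repair_writes - c.batchlog_replay_writes;
}
```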
Takuya ASADA
2bfc8e8c12 main: add tcp_syncookies sanity check
Check net.ipv4.tcp_syncookies, show error message when it is set to 0.
Fixes #1118

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1460738415-3798-1-git-send-email-syuu@scylladb.com>
2016-04-21 14:55:26 +03:00
Pekka Enberg
3f1fcca3bc cql3: Fix DROP KEYSPACE error message when keyspace does not exist
Commit d3fe0c5 ("Refactor db/keyspace/column_family toplogy") changed
database::find_keyspace() to throw a std::nested_exception so the catch
block in migration_manager::announce_keyspace_drop() no longer catches
the exception. Fix the issue by explicitly checking if the keyspace
exists and throwing the correct exception type if it doesn't.

Fixes TestCQL.keyspace_test.
Message-Id: <1461218910-26691-1-git-send-email-penberg@scylladb.com>
2016-04-21 12:42:45 +02:00
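The mechanism behind commit 3f1fcca3bc above can be shown with the standard library alone: once a lookup wraps its error with std::throw_with_nested, a caller's handler for the original type no longer matches, and an explicit existence check avoids relying on the exception at all. The exception types and function names below are hypothetical stand-ins, not the real Scylla ones.

```cpp
#include <exception>
#include <set>
#include <stdexcept>
#include <string>

struct lookup_failed_sketch : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Lookup that wraps the original error inside another exception type.
void find_keyspace_sketch(const std::set<std::string>& keyspaces,
                          const std::string& name) {
    try {
        if (keyspaces.count(name) == 0) {
            throw std::out_of_range(name);
        }
    } catch (...) {
        std::throw_with_nested(lookup_failed_sketch("no such keyspace: " + name));
    }
}

// Old-style caller: its out_of_range handler is never reached any more,
// because out_of_range is now only the nested exception.
std::string drop_keyspace_old_sketch(const std::set<std::string>& keyspaces,
                                     const std::string& name) {
    try {
        find_keyspace_sketch(keyspaces, name);
        return "dropped";
    } catch (const std::out_of_range&) {
        return "invalid request";
    } catch (const std::exception&) {
        return "unexpected error";
    }
}

// Fixed caller: checks existence explicitly and reports the right error.
std::string drop_keyspace_fixed_sketch(const std::set<std::string>& keyspaces,
                                       const std::string& name) {
    if (keyspaces.count(name) == 0) {
        return "invalid request";
    }
    find_keyspace_sketch(keyspaces, name);
    return "dropped";
}
```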
Vlad Zolotarov
4ef5b11e9b batchlog_manager: add a counter for a total number of write attempts
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-04-21 11:29:21 +03:00
Vlad Zolotarov
97e5bfa815 database: add metrics for total writes and reads
This patch adds a counter of total writes and reads
for each shard.

It seems that nothing ensures that all database queries are
ready before database object is destroyed.
Make _stats lw_shared_ptr in order to ensure that the object is
alive when lambda gets to incrementing it.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-04-21 11:28:53 +03:00
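The keep-alive idea in commit 97e5bfa815 above, sketched with std::shared_ptr standing in for seastar's lw_shared_ptr. The class and member names are hypothetical; the point is only that a lambda holding its own reference to the stats object can safely run after the owning object is gone.

```cpp
#include <cstdint>
#include <functional>
#include <memory>

struct db_stats_sketch {
    uint64_t total_writes = 0;
    uint64_t total_reads = 0;
};

class database_sketch {
    std::shared_ptr<db_stats_sketch> _stats = std::make_shared<db_stats_sketch>();
public:
    // Returns a completion callback that may outlive *this. It captures
    // the shared pointer by value, so the stats object stays alive.
    std::function<uint64_t()> start_write() {
        return [stats = _stats] { return ++stats->total_writes; };
    }
};
```

Capturing `this` or a reference to a plain `_stats` member instead would leave the callback with a dangling pointer once the database object is destroyed.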
Vlad Zolotarov
9bf8253412 storage_proxy: add read requests split counters
Add split (local Nodes, external Nodes aggregated per Nodes' DCs) counters
for the following read categories:
   - data reads
   - digest reads
   - mutation data reads

Each category gets attempts, completions and errors metrics.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-04-21 11:28:19 +03:00
Vlad Zolotarov
cbcbdc3b4a storage_proxy: add split counters for writes
Added split metrics for operations on a local Node and on external
Nodes aggregated per Nodes' DCs.

Added separate split counters for:
    - total writes attempts/errors
    - read repair write attempts (there is no easy way to separate errors
      at the moment)

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-04-21 11:28:15 +03:00
Vlad Zolotarov
c92654b281 storage_proxy: add counters for received and forwarded mutations
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-04-21 11:27:29 +03:00
Piotr Jastrzebski
8231385e0c sstables: Remove unused code from mp_row_consumer
_mutation_to_subscription is not used anywhere so
it should probably be removed.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <90ef62daee0c183b29dcb86d08843145d657ea38.1461179970.git.piotr@scylladb.com>
2016-04-20 23:10:43 +03:00
Raphael S. Carvalho
eb51c93a5a tests: fix use-after-free in sstable test
After commit a843aea547, a gate was introduced to make sure that
an asynchronous operation is finished before column family is
destroyed. A sstable testcase was not stopping column family,
instead it just removed column family from compaction manager.
That could cause a use-after-free if column family is destroyed
while the asynchronous operation is running. Let's fix it by
stopping column family in the test.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <ed910ec459c1752148099e6dc503e7f3adee54da.1461177411.git.raphaelsc@scylladb.com>
2016-04-20 22:08:08 +03:00
Pekka Enberg
7af9ac2880 Merge "Add support for User Defined Types" from Duarte
"This patchset enables support for user defined types,
 completing the functionality that was already in place.

 Fixes #426"
2016-04-20 21:26:03 +03:00
Yoav Kleinberger
1543253bfd scyllatop: differentiate metrics coming from different hosts
Fix issue #1173.
Previously scyllatop aggregated metrics coming from a cluster with many
hosts so that individual contributions could not be recognized. This is
now changed so that aggregation is also by hostname.

Signed-off-by: Yoav Kleinberger <yoav@scylladb.com>
Message-Id: <8a4d8b82216d8c1aa855026ff31bcfd8bfac7e47.1461150261.git.yoav@scylladb.com>
2016-04-20 20:20:09 +02:00
Duarte Nunes
c04f8c239e udt: Enable user type query test case
This patch enables the test case for user defined types in
cql_query_test.

Fixes #426

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 18:07:07 +02:00
Duarte Nunes
bc90d6a730 udt: type_parser handles user defined types
This patch ensures type_parser can handle user defined types. It also
prefixes user_type_impl::make_name() with
org.apache.cassandra.db.marshal.UserType.

Fixes #631

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 18:07:07 +02:00
Duarte Nunes
b5a87f8bdc udt: Add unit test for user type schema changes
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 18:07:07 +02:00
Duarte Nunes
7911438de0 udt: Add grammar for altering user types
This patch adds support in Cql.g for the alter user type statement.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 18:07:07 +02:00
Duarte Nunes
fbf70e9bed udt: Add alter type statement
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 18:07:07 +02:00
Duarte Nunes
3e663cfa9a udt: Add capability to replace a user_type
This patch adds a function to abstract_type that locates the usage of
a given user_type and recursively returns an updated version of the
enclosing type that contains the updated user type.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 18:07:06 +02:00
Duarte Nunes
6cb57a567f udt: Add grammar for dropping user types
This patch adds support in Cql.g for the drop user type statement.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 18:07:06 +02:00
Duarte Nunes
809b45e160 udt: Add drop type statement
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 18:07:02 +02:00
Calle Wilund
7f85373e15 cql3/drop_table_statement: Fix exception handling in access check
Tried to handle possibly benign exception in continuation, but
this is always thrown synchronously.
Fixes ttl_test dtest failures.

Message-Id: <1461154499-10674-1-git-send-email-calle@scylladb.com>
2016-04-20 15:49:04 +03:00
Duarte Nunes
66c60f03fe udt: Add references_user_type to abstract_type
This patch adds a virtual function to the abstract_type hierarchy to
tell whether a given type references the specified type. Needed to
implement the drop and alter type statements.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 09:54:07 +02:00
Duarte Nunes
6732da67ab udt: Add is_user_type function to abstract_type
This patch adds a function to identify a given abstract_type as a
user_type_impl.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 09:54:07 +02:00
Duarte Nunes
ddb4a4b29b udt: Implement as_cql3_type for user_type_impl
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 09:54:06 +02:00
Duarte Nunes
35a88b5d49 udt: Complete create_type_statement
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 09:54:06 +02:00
Duarte Nunes
d1f215b743 udt: Merge user defined type mutations
This patch implements the merge_types() function,
allowing mutations to user defined types to be applied.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 09:54:06 +02:00
Duarte Nunes
fdddcfb3ea udt: Fix user type compatibility check
A new user type is checked for compatibility against the previous
version of that type, so as to ensure that an updated field type
is compatible with the previous field type (e.g., altering a field
type from text to blob is allowed, but not the other way around).

However, it is also possible to add new fields to a user type. So,
when comparing a user type against its previous version, we should
also allow the current, new type to be longer than the previous one.
The current code instead allows for the previous type to be longer,
which this patch fixes.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 09:54:06 +02:00
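The corrected rule from commit fdddcfb3ea above, as a standalone sketch: the new version of a user type may have more fields than the previous one, never fewer, and each shared field must be compatible. Field types are modelled as plain strings and the compatibility rule is a toy one covering only the text-to-blob example from the message; both function names are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy field rule: identical types, or a change from text to blob.
bool field_compatible_sketch(const std::string& new_type,
                             const std::string& old_type) {
    return new_type == old_type || (new_type == "blob" && old_type == "text");
}

// The new type may be longer than the previous version (fields were added),
// but every field of the previous version must have a compatible successor.
bool user_type_compatible_sketch(const std::vector<std::string>& new_fields,
                                 const std::vector<std::string>& old_fields) {
    if (new_fields.size() < old_fields.size()) {
        return false;
    }
    for (std::size_t i = 0; i < old_fields.size(); ++i) {
        if (!field_compatible_sketch(new_fields[i], old_fields[i])) {
            return false;
        }
    }
    return true;
}
```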
Duarte Nunes
eae7f10906 map_difference: Allow on unordered_map
This patch changes the map_difference interface so difference()
can be called on unordered_maps.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 09:54:06 +02:00
Duarte Nunes
7dc895e63d types: Add operator== for abstract_types
This patch allows abstract_types to be compared for equality. In
particular, it enables the indirect_equal_to<abstract_type> idiom.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 09:54:06 +02:00
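For readers unfamiliar with the idiom named in commit 7dc895e63d above, here is a sketch of an "indirect equal" comparator: it compares the objects two pointers refer to rather than the pointers themselves, which requires operator== on the pointee. The names carry a `_sketch` suffix because this is an illustration, not the actual indirect_equal_to from the code base.

```cpp
#include <memory>
#include <string>

struct type_sketch {
    std::string name;
    bool operator==(const type_sketch& other) const { return name == other.name; }
};

// Compares pointees through T::operator==; two null pointers compare equal.
template <typename T>
struct indirect_equal_to_sketch {
    template <typename Ptr>
    bool operator()(const Ptr& a, const Ptr& b) const {
        if (bool(a) != bool(b)) {
            return false;
        }
        if (!a) {
            return true;
        }
        return static_cast<const T&>(*a) == static_cast<const T&>(*b);
    }
};
```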
Duarte Nunes
0aeb4dcaaf udt: Implement equals() for user_type_impl
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 09:54:06 +02:00
Duarte Nunes
d6d29f7c52 schema: Replace ad hoc func with indirect_equal_to
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 09:54:06 +02:00
Duarte Nunes
08a7bba4ed udt: Announce UDT migrations
This patch defines the member functions responsible for announcing
create, update and drop migrations of user defined types.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 09:54:06 +02:00
Duarte Nunes
dd75fe8ec0 udt: Add mutations for user defined types
This patch implements mutations for user defined types.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 09:54:06 +02:00
Duarte Nunes
37a1547971 udt: Add migration notifications
This patch adds migration notifications for user defined types.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 09:54:06 +02:00
Duarte Nunes
c2e3e918e8 udt: Take name by ref when querying for an UDT
..so as not to incur a copy.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 09:54:06 +02:00
Duarte Nunes
2c15778fe0 udt: Remove user_types field from keyspace
This field is superfluous and adds confusion regarding the user_types
field in the keyspace metadata.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 09:54:06 +02:00
Duarte Nunes
c7b3a4b144 udt: Parse user types system table
This patch loads and parses the user types system table during
bootstrap.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 09:54:06 +02:00
Duarte Nunes
f8d8dbdeb7 types: Don't wrap tombstone in an std::optional
All the callers of do_serialize_mutation_form pass a valid tombstone
that is converted into a non-empty optional. This happens even if the
tombstone is empty (tombstone::timestamp == api::missing_timestamp).

This patch fixes this by passing in a reference to the tombstone which
is convertible to bool, based on whether it is empty or not.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1460620528-3628-1-git-send-email-duarte@scylladb.com>
2016-04-20 09:22:01 +02:00
Duarte Nunes
40c1b29701 cql3: Implement contains relation
Although it doesn't work in the absence of secondary indexes,
now we provide the same error messages as origin when trying to use
the contains relation.

Fixes #1158

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1461088626-26958-1-git-send-email-duarte@scylladb.com>
2016-04-20 09:22:25 +03:00
Pekka Enberg
4ed702f0da Merge "Authorizer support" from Calle
"Conversion/implementation of "authorizer" code from origin, handling
 permissions management for users/resources.

 Default implementation keeps mapping of <user.resource>->{permissions}
 in a table, contents of which is cached for slightly quicker checks.

 Adds access control to all (existing) cql statements.
 Adds access management support to the CQL impl. (GRANT/REVOKE/LIST)

 Verified manually and with dtest auth_test.py. Note that several of these
 still fail due to (unrelated) unimplemented features, like index, types
 etc.

 Fixes #1138"
2016-04-19 15:00:38 +03:00
Calle Wilund
4c246b5cc3 scylla.yaml: Move authorizer/authenticator options to supported section 2016-04-19 11:49:06 +00:00
Calle Wilund
9ed25a970e Cql.g: Permission statements parsing 2016-04-19 11:49:06 +00:00
Calle Wilund
3b101c6e19 cql3::statements::drop_user_statement: Drop all permissions for user 2016-04-19 11:49:06 +00:00
Calle Wilund
14cc47d8b9 cql3::statements::revoke_statement: Initial conversion 2016-04-19 11:49:06 +00:00
Calle Wilund
4e1ef3c1bc cql3::statements::grant_statement: Initial conversion 2016-04-19 11:49:05 +00:00
Calle Wilund
04c37def3a cql3::statements::list_permissions_statement: Initial conversion 2016-04-19 11:49:05 +00:00
Calle Wilund
fe23447f6f cql3::statements::permission_altering_statement: Initial conversion
Alter permission base type
2016-04-19 11:49:05 +00:00
Calle Wilund
add2111c0a cql3::statements::authorization_statement: Initial conversion
Auth cql base type
2016-04-19 11:49:05 +00:00
Calle Wilund
3906dc9f0d cql3::statements: Change check_access to future<> + implement 2016-04-19 11:49:05 +00:00
Calle Wilund
dac6cf69eb service::client_state: Add authorization checkers 2016-04-19 11:49:05 +00:00
Calle Wilund
072acc68da validation: Add KS validation + convenience methods
Looking up local db.
2016-04-19 11:49:05 +00:00
Calle Wilund
a7e1af1c06 db::config: Add permissions cache entries/mark auth/perm as used 2016-04-19 11:49:05 +00:00
Calle Wilund
36bb40c205 auth::auth: Add authorizer initialization + permissions getter
Create and init authorizer object on start. Create thread local
permissions cache to front end the actual authorizer.
2016-04-19 11:49:05 +00:00
Calle Wilund
03568d0325 tests::cql_test_env: Fake logged in user in case test requires it. 2016-04-19 11:49:05 +00:00
Calle Wilund
ead1c882f8 utils::loading_cache: Version of the LoadingCache type used in origin
Simple, expiring, cache of potentially limited number of entries.
2016-04-19 11:49:05 +00:00
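The "simple, expiring cache of a potentially limited number of entries" described in ead1c882f8 can be illustrated with a short Python sketch. This is not the Scylla `utils::loading_cache` code: the class layout, the `clock` parameter, and the evict-oldest-loaded policy are all assumptions made for illustration.

```python
import time
from collections import OrderedDict


class LoadingCache:
    """Hypothetical expiring cache that loads missing or stale entries on demand."""

    def __init__(self, load, max_entries, ttl_seconds, clock=time.monotonic):
        self._load = load            # callable: key -> value
        self._max = max_entries      # upper bound on cached entries
        self._ttl = ttl_seconds      # entry lifetime in seconds
        self._clock = clock          # injectable time source (assumption)
        self._entries = OrderedDict()  # key -> (value, load_time)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]          # still fresh: serve from cache
        value = self._load(key)      # missing or expired: reload
        self._entries[key] = (value, now)
        self._entries.move_to_end(key)
        while len(self._entries) > self._max:
            self._entries.popitem(last=False)  # evict the oldest-loaded entry
        return value
```

Passing a fake `clock` makes expiry easy to exercise without sleeping.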
Calle Wilund
956ee87e12 auth::authenticator: Change "protected_resources" to return reference
It is an immutable static value anyway.
2016-04-19 11:49:05 +00:00
Calle Wilund
1f0bbf2d9a auth::authorizer: Initial conversion
Main authorization endpoint. Default (and only) real authorizer
keeps a mapping resource -> permission sets in system table
2016-04-19 11:49:04 +00:00
Benoît Canet
e17795d2dd scylla_dev_mode_setup: Unify --developer-mode prompt and parsing
Fixes: #1194

Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1461002978-5379-2-git-send-email-benoit@scylladb.com>
2016-04-19 09:38:03 +03:00
Takuya ASADA
f6252be0c1 utils: fix compilation error on utils/exceptions.hh
std::system_error could not be found due to a missing header.

Fixes #1202

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1461006884-28316-1-git-send-email-syuu@scylladb.com>
2016-04-19 09:37:31 +03:00
Raphael S. Carvalho
bf03cd1ea6 sstables: kill unused code from size tiered strategy
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <485b1e49419cb052218ab4558f27270ce3bd03b4.1460761821.git.raphaelsc@scylladb.com>
2016-04-19 08:46:06 +03:00
Raphael S. Carvalho
29db5f5e1f sstables: move compaction strategy code to a new source file
Moving compaction strategy code from sstables/compaction.cc to
sstables/compaction_strategy.cc
That improves readability. Strategy code should be separated
from the generic compaction code.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <5af6fc8f7321351a071fc0ce03c80ffea21f8396.1460761821.git.raphaelsc@scylladb.com>
2016-04-19 08:45:43 +03:00
Pekka Enberg
a3a0404d33 Merge seastar upstream
* seastar 2185f37...2b3c363 (1):
  > net/tls: Fix compilation with older GnuTLS versions
2016-04-19 08:43:36 +03:00
Calle Wilund
c446fe50e6 tuple_hash: Add convenience operator for two arguments (non-pair) 2016-04-18 13:51:15 +00:00
Calle Wilund
f0d2efd206 data_value: Add constructor from unordered_set<> 2016-04-18 13:51:15 +00:00
Calle Wilund
690c7207fe cql3::untyped_result_set: Add get_set<> method
Gets a value as a, you guessed it, set.
2016-04-18 13:51:15 +00:00
Calle Wilund
443af44f24 log: Add output operator for std::exception&/std::system_error& 2016-04-18 13:51:15 +00:00
Calle Wilund
ca7d339110 auth::authenticated_user: Add copy/move constructors 2016-04-18 13:51:15 +00:00
Calle Wilund
d3a9650646 auth::permission_set: Add < operator 2016-04-18 13:51:15 +00:00
Calle Wilund
c93d114949 auth::permission: Add stringizers + move sets into namespace 2016-04-18 13:51:15 +00:00
Calle Wilund
6e09920f93 auth::data_resource: Fix to_string to match origin 2016-04-18 13:51:15 +00:00
Calle Wilund
bb96e5bd66 auth::data_resource: Move declaration of "resource_ids" 2016-04-18 13:51:15 +00:00
Takuya ASADA
2eb91421eb dist/ami: Show correct login message when scylla-ami-setup.service is still running
While scylla-ami-setup.service is running, the login message says to run "systemctl status scylla-server" to see status, but scylla-server has not actually been launched yet.

This patch fixes the message to note that RAID construction is running, and that 'systemctl status scylla-ami-setup' is the correct way to see status.

Fixes #1035

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1460660628-10103-2-git-send-email-syuu@scylladb.com>
2016-04-18 15:22:02 +03:00
Takuya ASADA
07a6057c03 dist/ami: fix incorrect service name on .bash_profile
Ubuntu's service name on .bash_profile is incorrect, fix it.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1460660628-10103-1-git-send-email-syuu@scylladb.com>
2016-04-18 15:21:48 +03:00
Tomasz Grabiec
45527fcffa Merge branch 'glommer/issue-1144-v5'
From Glauber:

There are currently some outstanding issues with the throttling code. It's
easier to see them with the streaming code, but at least one of them is general.

One of them is related to situations in which the amount of memory available
leaves only one memtable fitting in memory. That would only happen with the
general code if we set the memtable cleanup threshold to 100 % - and I don't
even know if it is valid - but will happen quite often with the streaming code.
If that happens, we'll start throttling when that memtable is being written,
but won't be able to put anything else in its place - leading to unnecessary
throttling.

The second, and more serious, happens when we start throttling and the amount
of available memory is not at least 1MB. This can deadlock the database in
the sense that it will prevent any request from continuing and, in turn, from
causing a flush due to memtable size. It is good practice anyway to always
guarantee progress.

Fixes #1144
2016-04-18 12:20:13 +02:00
Gleb Natapov
f3b515052b udt: fix error generation if accessed type is not udt
Fixes #1198
Message-Id: <1460884314-3717-2-git-send-email-gleb@scylladb.com>
2016-04-18 12:45:03 +03:00
Duarte Nunes
ece89069dd udt: Implement to_string() for selectable
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1460884314-3717-1-git-send-email-gleb@scylladb.com>
2016-04-18 12:44:48 +03:00
Pekka Enberg
edf7f098e2 Merge "Fix query of collection cell with all items deleted" from Tomek 2016-04-18 11:01:24 +03:00
Tomasz Grabiec
2e08d0f698 Merge branch 'dev/gleb/logging'
Logging improvements from Gleb.
2016-04-15 19:03:44 +02:00
Tomasz Grabiec
89bc32b020 tests: Add test for query of collection with deleted item 2016-04-15 18:14:05 +02:00
Tomasz Grabiec
c69d0a8e87 mutation_partition: Fix collection emptiness check
Broken by f15c380a4f.

This resulted in empty collection being returned in the results
instead of no collection.

Fixes org.apache.cassandra.cql3.validation.entities.CollectionsTest
from cassandra-unit-tests.
2016-04-15 18:14:05 +02:00
Tomasz Grabiec
b0d4782016 types: Add default argument values to is_any_live() 2016-04-15 18:14:05 +02:00
Avi Kivity
0de32ab120 Merge seastar upstream
* seastar 2aeb9dd...2185f37 (15):
  > reactor: avoid issuing systemwide memory barriers in parallel
  > Revert "Use sys_membarrier() when available"
  > Merge "Various exception-safety fixes" from Tomasz
  > future-util: make map reduce exception safe
  > collectd: do not give up after a failure
  > future-util: make repeat_until_value exception safe
  > rpc: do not block connection when unknown verbs is received
  > rpc: do not wait for a reply after timeout
  > rpc: move connection stats to base class
  > core/reactor: Handle io_submit failures inside flush_pending_aio
  > apps/iotune: add --fs-check option to use iotune for kernel version check
  > Merge "Some exception safety patches" from Paweł
  > tls: Fix conversion of dh_params::level to gnutls_sec_param_t
  > core: posix_thread: Mark start_routine as noexcept
  > fair_queue: better overflow protection
2016-04-15 16:06:53 +03:00
Pekka Enberg
3f2286d02e Merge "Delete compacted sstables atomically" from Avi
"If we compact sstables A, B into a new sstable C we must either delete both
A and B, or none of them.  This is because a tombstone in B may delete data
in A, and during compaction, both the tombstone and the data are removed.
If only B is deleted, then the data gets resurrected.

Non-atomic deletion occurs because the filesystem does not support atomic
deletion of multiple files; but the window for that is small and is not
addressed in this patchset.  Another case is when A is shared across
multiple shards (as is the case when changing shard count, or migrating
from existing Cassandra sstables).  This case is covered by this patchset.

Fixes #1181."
2016-04-14 22:04:15 +03:00
Glauber Costa
9c87ae3496 throttle: always release at least one request if we are below the limit
Our current throttling code releases one request per 1MB of available
memory. If we are below the memory limit, but not by 1MB or more, then
we will keep getting to unthrottle, but never really do anything.

If another memtable is close to the flushing point, those requests may be
exactly the ones that would make it flush. Without them, we'll freeze the
database.

In general, we need to always release at least one request to make sure that
progress is always achieved.

This fixes #1144

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-04-14 13:13:15 -04:00
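The release rule described in 9c87ae3496 can be sketched as follows. This is a minimal Python illustration, not the Scylla throttling code; the function name `requests_to_release` and its parameters are hypothetical.

```python
MB = 1 << 20


def requests_to_release(memory_limit, memory_used, waiting):
    """Return how many throttled requests to release right now (sketch)."""
    if waiting == 0 or memory_used >= memory_limit:
        return 0  # nothing waiting, or still at/over the limit
    available = memory_limit - memory_used
    # One request per full 1MB of headroom, but never zero while we are
    # below the limit, so that progress is always made.
    return min(waiting, max(1, available // MB))
```

With less than 1MB of headroom, `available // MB` is zero; the `max(1, ...)` is what keeps a waiting request from being stuck forever.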
Gleb Natapov
9801d69d53 storage_proxy: add query result row count to brief format
Report number of rows in brief reporting format, but only if
we can count them without linearizing result's buffer.
2016-04-14 19:26:00 +03:00
Gleb Natapov
53993527ed storage_proxy: move verbose query result printing into separate logger
If the query result is large, tracing cannot be done since printing the
result takes too much time and space.
2016-04-14 19:26:00 +03:00
Gleb Natapov
46e5d05220 storage_proxy: cleanup query logging.
Since commit c1cffd06 the logger catches errors internally, so there is no
need to catch most of them at the top level. Only those that can happen
during parameter evaluation can reach here. Change parameters to not throw
either.
2016-04-14 19:26:00 +03:00
Gleb Natapov
15ebe5e4e5 query: add calculate_row_count function to query::result 2016-04-14 19:26:00 +03:00
Gleb Natapov
f47b2dad18 query: add lazy printer to query::result
Transforming a query::result into printable form is a very heavy operation
that allocates memory and thus can fail. Add a class to query::result that
can be used with the logger to defer the to-string conversion until output
is performed.
2016-04-14 19:26:00 +03:00
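The lazy printer idea from f47b2dad18 can be illustrated in Python, whose standard `logging` module only formats arguments when a record is actually emitted. This is a sketch of the general technique, not the Scylla class; `LazyPrinter` and `describe_result` are hypothetical names.

```python
import logging


class LazyPrinter:
    """Wraps a callable; the text is produced only when str() is called."""

    def __init__(self, make_text):
        self._make_text = make_text

    def __str__(self):
        return self._make_text()


calls = []


def describe_result():
    # Stand-in for an expensive, allocating conversion of a query result
    calls.append(1)
    return "3 rows"


log = logging.getLogger("lazy_printer_sketch")
log.setLevel(logging.INFO)

# DEBUG is disabled for this logger, so the argument is never formatted
# and describe_result() is never invoked.
log.debug("result: %s", LazyPrinter(describe_result))
```

The cost of building the string is paid only on the path where the output is really written.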
Glauber Costa
2c5dfe08c1 memtable_list: make sure at least two memtables are available
This is usually not a problem for the main memtable list - although it can be,
depending on settings - but it shows up easily for the streaming memtables list.

We would like to have at least two memtables, even if we have to cut it short.
If we don't do that, one memtable will use all available memory and we'll
force throttling until the memtable gets totally flushed.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-04-14 12:12:50 -04:00
Glauber Costa
1daede7396 unnest throttle_state
throttle_state is currently a nested member of database, but there is no
particular reason for it to be so - aside from the fact that it is currently
only ever referenced by the database.

We'll soon want to have some interaction between this and the column family, to
allow us to flush during throttle. To make that easier, let's unnest it.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-04-14 12:12:50 -04:00
Glauber Costa
39def369ce move information about memtables' region group inside memtable list
This is a preparation patch so we can move the throttling infrastructure inside
the memtable_list. To do that, the region group will have to be passed to the
throttler so let's just go ahead and store it.

In consequence of that, all that the CF has to tell us is what is the current
schema - no longer how to create a new memtable.

Also, with a new parameter to be passed to the memtable_list the creation code
gets quite big and hard to follow. So let's move the creation functions to a
helper.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-04-14 12:12:50 -04:00
Avi Kivity
a843aea547 db: delete compacted sstables atomically
If sstables A, B are compacted, A and B must be deleted atomically.
Otherwise, if A has data that is covered by a tombstone in B, and that
tombstone is deleted, and if B is deleted while A is not, then the data
in A is resurrected.

Fixes #1181.
2016-04-14 17:14:26 +03:00
Avi Kivity
3798d04ae8 sstables: convert sstable::mark_for_deletion() to atomic deletion infrastructure
All deletions must go through the same data structure, or some atomic
deletions will never be satisfied.
2016-04-14 17:14:26 +03:00
Avi Kivity
e43dbac836 main: cancel pending atomic deletions on shutdown
A shared sstable must be compacted by all shards before it can be deleted.
Since we're stopping, that's not going to happen.  Cancel those pending
deletions to let anyone waiting on them continue.
2016-04-14 17:14:26 +03:00
Avi Kivity
2ba584db8d sstables: add delete_atomically(), for atomically deleting multiple sstables
When we compact a set of sstables, we have to remove the set atomically,
otherwise we can resurrect data if the following happens:

 insert data to sstable A
 insert tombstone to sstable B
 compact A+B -> C (removing both data and tombstone)
 delete B only
 read data from A

Since an sstable may be shared by multiple shards, and each shard performs
compaction at a different time, we need to defer deletion of an sstable
set until all shards agree that the set can be deleted.

An additional atomicity issue exists because posix does not provide a way
to atomically delete multiple files.  This issue is not addressed by this
patch.
2016-04-14 17:14:26 +03:00
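The deferral described in 2ba584db8d (delete a set only once every shard sharing each sstable has agreed) can be sketched in Python. This is an illustration under stated assumptions, not the Scylla implementation: the class `AtomicDeletionManager`, the `owners` map, and the merging of overlapping sets are hypothetical choices made so the sketch stays atomic.

```python
class AtomicDeletionManager:
    """Sketch: defer deleting a set of sstables until all owning shards agree."""

    def __init__(self, owners):
        # owners: sstable name -> shard ids that share that sstable
        self.owners = {name: set(shards) for name, shards in owners.items()}
        self.requested = {}   # sstable name -> shards that asked to delete it
        self.pending = []     # pairwise-disjoint sets awaiting agreement
        self.deleted = set()

    def _ready(self, group):
        return all(self.requested.get(n, set()) >= self.owners[n] for n in group)

    def delete_atomically(self, shard, sstables):
        group = set(sstables)
        for name in group:
            self.requested.setdefault(name, set()).add(shard)
        # Merge with every pending set that overlaps the request, so that
        # overlapping sets are always deleted together or not at all.
        merged, rest = set(group), []
        for other in self.pending:
            if other & group:
                merged |= other
            else:
                rest.append(other)
        if self._ready(merged):
            self.deleted |= merged   # all owners agreed: delete the whole set
        else:
            rest.append(merged)      # keep waiting for the remaining shards
        self.pending = rest
```

For example, if A is shared by shards 0 and 1 while B belongs only to shard 0, a request from shard 0 to delete {A, B} removes nothing until shard 1 also asks to delete A; B is never deleted while A survives.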
Pekka Enberg
a1a9294d8c Merge "Support nodetool removenode force and status" from Asias
"With this series, we support all the 3 nodetool removenode commands, e.g.,

$ nodetool removenode 778948bf-6709-4eb5-80fe-bee911e9c3bf

$ nodetool removenode status
RemovalStatus: Removing token (-8969872965815280276). Waiting for
replication confirmation from [127.0.0.3,127.0.0.1].

$ nodetool removenode force
RemovalStatus: No token removals in process.

Tested with:

1)
- start 3 nodes
- inject data with
  cassandra-stress write no-warmup cl=TWO n=2000000 -schema 'replication(factor=2)'
- kill -9 node2
- wait for node2 to be in DOWN state
- run nodetool removenode host2_host_id on node1

2)
- start 3 nodes
- inject data with
  cassandra-stress write no-warmup cl=TWO n=2000000 -schema 'replication(factor=2)'
- kill -9 node2
- wait for node2 to be in DOWN state
- run nodetool removenode host2_host_id on node1
- kill -9 node3
- nodetool removenode will wait forever since node3 is gone, node3
  will never send the replication confirmation to node1
- run nodetool removenode force on node1
  nodetool removenode completes with the following error:
    $ nodetool removenode 31690b82-ebb0-4594-8bcf-1ce82b6e0f6e
    nodetool: Scylla API server HTTP POST to URL
    '/storage_service/remove_node' failed: nodetool removenode force is called by user
  nodetool removenode force completes successfully
    $ nodetool removenode force
    RemovalStatus: Removing token (-9171569494049085776). Waiting for
    replication confirmation from [127.0.0.3,127.0.0.1].

Fixes #1135."
2016-04-14 15:44:33 +03:00
Pekka Enberg
144d1e3216 dist/docker/redhat: Start up JMX proxy and include tools
Make the Docker image more user-friendly by starting up JMX proxy in the
background and install Scylla tools in the image. Also add a welcome
banner like we have with our AMI so that users have pointers to nodetool
and cqlsh, as well as our documentation.
Message-Id: <1460376059-3678-1-git-send-email-penberg@scylladb.com>
2016-04-14 15:41:21 +03:00
Pekka Enberg
355c3ea331 dist/docker/redhat: Make sure image builds against latest Scylla
Use "yum clean expire-cache" to make sure we build against the latest
Scylla release.
Message-Id: <1460374418-27315-1-git-send-email-penberg@scylladb.com>
2016-04-14 15:41:10 +03:00
Gleb Natapov
6f13715f8c storage_proxy: add logging to read executor creation path
Message-Id: <1460549369-29523-4-git-send-email-gleb@scylladb.com>
2016-04-14 14:58:02 +03:00
Gleb Natapov
14ecadb247 storage_proxy: add logging for mutation write path
Message-Id: <1460549369-29523-3-git-send-email-gleb@scylladb.com>
2016-04-14 14:57:29 +03:00
Gleb Natapov
dbb1217896 cl: enable logging for insufficient LOCAL_QUORUM consistency
Message-Id: <1460549369-29523-2-git-send-email-gleb@scylladb.com>
2016-04-14 14:56:58 +03:00
Gleb Natapov
dfdbb1e703 storage_proxy: move hack to make coordinator most preferable node for read into sorting function
This is kind of sorting, so it belongs there, but it also fixes a bug in
storage_proxy::get_read_executor() that assumes filter_for_query() does
not change the order of nodes in all_nodes when an extra replica is chosen.
Otherwise, if the coordinator IP happens to be last in all_nodes, it will
be chosen as the extra replica and will be queried twice.
Message-Id: <1460549369-29523-1-git-send-email-gleb@scylladb.com>
2016-04-14 14:56:21 +03:00
Duarte Nunes
73e3b5ac5d udt: Fix user type compatibility check
A new user type is checked for compatibility against the previous
version of that type, so as to ensure that an updated field type
is compatible with the previous field type (e.g., altering a field
type from text to blob is allowed, but not the other way around).

However, it is also possible to add new fields to a user type. So,
when comparing a user type against its previous version, we should
also allow the current, new type to be longer than the previous one.
The current code instead allows for the previous type to be longer,
which this patch fixes.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1460627939-11376-12-git-send-email-duarte@scylladb.com>
2016-04-14 13:30:37 +03:00
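The corrected check from 73e3b5ac5d can be sketched as follows. This is a minimal Python illustration, not the Scylla code: user types are modeled as plain lists of field type names, and the `COMPATIBLE` table holds only the one pair the commit message mentions (a text field altered to blob).

```python
# Hypothetical table of allowed alterations: (new_type, previous_type)
COMPATIBLE = {("blob", "text")}


def field_compatible(new, prev):
    return new == prev or (new, prev) in COMPATIBLE


def is_compatible_with(new_fields, prev_fields):
    """True if the new version of a user type may replace the previous one."""
    # The new type may be longer (fields were added), never shorter.
    if len(new_fields) < len(prev_fields):
        return False
    # Every field that already existed must have a compatible type.
    return all(field_compatible(n, p) for n, p in zip(new_fields, prev_fields))
```

The bug described above corresponds to having the length comparison the other way around, which accepted a shorter new type and rejected an extended one.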
Takuya ASADA
f98997120a dist: #!/bin/bash for all scripts
We chose #!/bin/sh for the shebang when we started to implement the installer scripts, not bash.
After we started to work on Ubuntu, we found that we had mistakenly used bash syntax in the AMI script, which caused an error since /bin/sh is dash on Ubuntu.
So we changed the shebang to /bin/bash for that script, and since then we have had both sh scripts and bash scripts.
(2f39e2e269)
If we use bash syntax in sh scripts, they won't work on Ubuntu but will work on Fedora/CentOS, which is very confusing.
So switch all scripts to #!/bin/bash. It is much safer.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1460594643-30666-1-git-send-email-syuu@scylladb.com>
2016-04-14 12:01:28 +03:00
Pekka Enberg
60352f810a Merge "Fixes for the reading of missing Summary" from Glauber
"This patchset contains some fixes spotted during post-merge review
by {Nad,}av{,i}. I don't consider any of them a must for backport to 1.0,
but since we haven't yet even backported the main series, might as well backport
everything.

It also includes some unit tests to make sure that they will be kept working
in the future."
2016-04-13 11:32:05 +03:00
Raphael S. Carvalho
beaacbda2e tests: test that leveled strategy was fixed
L1 wasn't being compacted into L2.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <1a357896a448eafa7da4d28bc56fa02b89d4193e.1460508373.git.raphaelsc@scylladb.com>
2016-04-13 11:14:28 +03:00
Raphael S. Carvalho
c7b728e716 sstables: Fix leveled compaction strategy
There is a problem in the implementation of leveled compaction strategy that
prevents level 1 from being compacted into level 2, and so forth. As a result,
all sstables will only belong to either level 0 or 1. One of the consequences
is level 1 being overwhelmed by a huge amount of sstables.

The root of the problem is a conditional statement in the code that prevents a
single sstable, with level > 0, from being compacted into a subsequent level
that is empty or has no overlapping sstables.

Fixes #1180.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <9a4bffdb0368dea77b49c23687015ff5832299ab.1460508373.git.raphaelsc@scylladb.com>
2016-04-13 11:14:14 +03:00
Asias He
1e84699a64 api: Wire up storage_service removal_status and force_remove_completion
They are used by nodetool removenode:

$ nodetool removenode force
$ nodetool removenode status

For example:

$ nodetool removenode status
RemovalStatus: Removing token (-8969872965815280276). Waiting for
replication confirmation from [127.0.0.3,127.0.0.1].

$ nodetool removenode force
RemovalStatus: No token removals in process.

Tested with:

1)
- start 3 nodes
- inject data with
  cassandra-stress write no-warmup cl=TWO n=2000000 -schema 'replication(factor=2)'
- kill -9 node2
- wait for node2 to be in DOWN state
- run nodetool removenode host2_host_id on node1

2)
- start 3 nodes
- inject data with
  cassandra-stress write no-warmup cl=TWO n=2000000 -schema 'replication(factor=2)'
- kill -9 node2
- wait for node2 to be in DOWN state
- run nodetool removenode host2_host_id on node1
- kill -9 node3
- nodetool removenode will wait forever since node3 is gone, node3
  will never send the replication confirmation to node1
- run nodetool removenode force on node1
  nodetool removenode completes with the following error:
    $ nodetool removenode 31690b82-ebb0-4594-8bcf-1ce82b6e0f6e
    nodetool: Scylla API server HTTP POST to URL
    '/storage_service/remove_node' failed: nodetool removenode force is called by user
  nodetool removenode force completes successfully
    $ nodetool removenode force
    RemovalStatus: Removing token (-9171569494049085776). Waiting for
    replication confirmation from [127.0.0.3,127.0.0.1].

Fixes #1135.
2016-04-13 14:53:28 +08:00
Asias He
891e947314 storage_service: Rename remove_node to removenode
nodetool uses removenode command to remove a node. Rename the
implementation in storage_service to match the command.
2016-04-13 14:53:28 +08:00
Asias He
9ffb95216d storage_service: Add force_remove_completion
It is needed by the

$ nodetool removenode force

command.
2016-04-13 14:53:28 +08:00
Asias He
7c7e5967f6 storage_service: Add get_removal_status
It is needed by the

$ nodetool removenode status

command.
2016-04-13 14:53:28 +08:00
Asias He
8d7cd07d6c storage_service: Add print info in confirm_replication
The message is rare but it is very useful for debugging the removenode operation.
2016-04-13 14:53:28 +08:00
Asias He
ffe91b5755 token_metadata: Do not assert in get_host_id
Throw an exception instead of assert.
2016-04-13 14:53:27 +08:00
Raphael S. Carvalho
c28d168619 sstables: allow user to specify max sstable size with leveled strategy
This change will allow user to specify the maximum size of a new sstable
created as a result of leveled compaction.

Example of using this setting:
ALTER TABLE ks.test5 with compaction = {'sstable_size_in_mb': '1000',
'class': 'LeveledCompactionStrategy'}

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <ebb9844401af74388bda12586c2435283f6d8db8.1460486043.git.raphaelsc@scylladb.com>
2016-04-13 09:13:33 +03:00
Raphael S. Carvalho
15246f31f7 sstables: fix incorrect sstable size when compression is enabled
Size of uncompressed sstable was being unconditionally used to determine
when to stop writing a table. When compression is enabled, compressed
size should be used instead. Problem affected Scylla when compression
and leveled strategy were used.

Fixes #1177.

Reviewed-by: Nadav Har'El <nyh@scylladb.com>
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <d9bf26def41fb33ca297f4127ce042b7f67adf96.1460484529.git.raphaelsc@scylladb.com>
2016-04-13 09:01:01 +03:00
Glauber Costa
60ab3b3f50 sstable_tests: make sure the generation of the Summary is sane
When we recreate the summary from a missing Summary, we should make
sure it is generated sanely, and that it resembles the Summary that
would have otherwise been there.

In these tests we'll grab one of the Summary tests we've been doing,
and just apply them to the non-existent Summary file. We expect
the same results on those cases. Plus, a new test is added with some
sanity checking.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-04-12 11:55:01 -04:00
Glauber Costa
114ba5e3a8 be robust against broken summary files
Now that we can boot without a Summary file, we can just as easily boot
with a broken one.

Suggested by Nadav, and it is actually very easy to do, so do it.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-04-12 11:55:01 -04:00
Glauber Costa
72dc45999d review fixes for generate_summary
Spotted by Avi post-merge
1) Need to close the file
2) Should be using the parameter pc instead of the default_class

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-04-12 11:55:01 -04:00
Glauber Costa
f78f43850d clear components if reading toc fails
This shouldn't be a problem in practice, because if read_toc() fails,
the users will just tend to discard the sstable object altogether, and
not insist on using it.

However, if somebody does try to keep using it, a subsequent read_toc() could
theoretically have some components filled up leading the new reader to believe
the toc was populated successfully.

It is easier to just clear the _components set and never worry about it, than
trying to reason about whether or not that could happen.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-04-12 11:55:01 -04:00
Glauber Costa
0f41ef1b84 index_reader: avoid misleading parent name
Also add comments about the expected signature of IndexConsumer

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-04-12 11:15:11 -04:00
Takuya ASADA
1eebe8bce1 dist: Support systemd for Ubuntu 15.10
To share the systemd unit file between Fedora/CentOS and Ubuntu, generate
the systemd unit file at build time, since Fedora/CentOS and Ubuntu have
sysconfdir in different places.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1459779957-11007-1-git-send-email-syuu@scylladb.com>
2016-04-12 14:39:26 +03:00
Avi Kivity
715794cce6 sstables: filter sstables single-row read using first_key/last_key
Using leveled compaction strategy, only a few sstables will contain a
given key, so we need to filter out the rest.  Using the summary entries
to filter keys works if the key is before the first summary entry,
but does not work if it is after the last summary entry, because the last
summary entry does not represent the last key; so sstables that are
towards the beginning of the ring are read even if they do not contain
the key, greatly reducing read performance.

Fix by consulting the summary's first_key/last_key entries before consulting
the summary entry array.
2016-04-12 10:33:17 +03:00
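The filtering problem described in 715794cce6 can be shown with a small Python sketch. It is an illustration only, not the Scylla sstable code: keys are modeled as integers, and the `Summary` class and both helper functions are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Summary:
    first_key: int
    last_key: int
    entries: List[int]  # sampled keys; the last entry is NOT the last key


def may_contain_by_entries(summary, key):
    # The entries alone can only rule out keys before the first sample.
    return key >= summary.entries[0]


def may_contain(summary, key):
    # Consult first_key/last_key before looking at the entry array.
    if key < summary.first_key or key > summary.last_key:
        return False
    return may_contain_by_entries(summary, key)


s = Summary(first_key=10, last_key=39, entries=[10, 20, 30])
```

Here a lookup for key 50 cannot be excluded from the entries alone, because 50 lies after the last sample (30); checking `last_key` rules the sstable out without reading it.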
Pekka Enberg
64c9ebb962 Merge "More exception safety fixes" from Paweł
"This is the second part of exception safety fixes for issues
discovered using memory allocation failure injector."
2016-04-12 08:08:00 +03:00
Paweł Dziepak
d53354947c storage_proxy: mark hint_to_dead_endpoints() noexcept
Hints are currently unimplemented but there is code depending on the
fact that hint_to_dead_endpoints() doesn't throw.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-04-12 00:06:10 +01:00
Paweł Dziepak
209b373412 exceptions: make exception constructors noexcept
Some of the exceptions are not thrown but constructed and set to some
future. In such a case, if another exception is thrown in the
constructor, it won't be propagated properly, as it will cause the stack to
be unwound at the place where the future is set, not in the continuation
chain waiting for it.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-04-12 00:06:02 +01:00
Paweł Dziepak
b00a3a76cc transport: ignore errors during connection shutdown
If the other end of the connection has already disconnected the shutdown
will fail with ENOTCONN. The resulting exception is going to propagate
through the continuation chain that is supposed to shut the cql server
down preventing it from properly waiting for all outstanding
continuations.

The solution is to just ignore any errors that shutdown() may return.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-04-11 23:54:47 +01:00
Paweł Dziepak
0d3d0a3c08 gossiper: handle failures in gossiper thread creation
seastar::async() creates a seastar thread and to do that allocates
memory. That allocation, obviously, may fail so the error handling code
needs to be moved so that it also catches errors from thread creation.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-04-11 23:54:47 +01:00
Paweł Dziepak
c1cffd0639 log: try to report logger failure
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-04-11 23:54:47 +01:00
Paweł Dziepak
b75c4098f2 storage_proxy: catch all errors in abstract_read_executor::execute()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-04-11 23:52:13 +01:00
Paweł Dziepak
9cd3da496e transport: retry do_accept() in case of bad_alloc
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-04-11 23:52:13 +01:00
Paweł Dziepak
2db70cf912 database: remove throw() specifiers
Most of them are missing std::bad_alloc (which leads to aborts) and they
force the compiler to add unnecessary runtime checks.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-04-11 23:52:13 +01:00
Gleb Natapov
3734dcbace storage_proxy: cleanup data_read_resolver::resolve()
live_row_count is summed several times in the same function. Do it only
once.

--
v1->v2:
  - call get() on std::reference_wrapper<std::vector<partition>> to get
    to reference for moving out of it.

Message-Id: <20160411123829.GE21479@scylladb.com>
2016-04-11 17:13:48 +02:00
Pekka Enberg
7af46e41e5 Merge "CQL authentication implementation" from Calle
"Adds support for CQL commands to create, alter, drop and list users.
Verified manually and by relevant dtests.

With this patch set, scylla supports adding super/regular users and
run sessions logged in as these. Note however that since actual
authorization is still not implemented, no CF/KS is really protected
by the authentication beyond initial login.

Some fixes for lingering bugs in user management in the existing code
as well.

Fixes #1121"
2016-04-11 12:57:00 +03:00
Pekka Enberg
4e04805352 cql3: Make lexer and parser error messages compatible with Cassandra
The default recognition error messages in antlr C++ backend are
different from Java backend which makes Scylla's CQL error messages
incompatible with Cassandra. This makes it very hard to write CQL level
test cases which are portable between Scylla and Cassandra.

To fix the issue, override the most common lexer and parser error
messages to follow the convention set by the antlr Java backend. This
unlocks various test cases in AlterTest, for example.
Message-Id: <1460032883-14422-1-git-send-email-penberg@scylladb.com>
2016-04-11 12:35:53 +03:00
Calle Wilund
ceac4df164 Cql.g: Add create/drop/alter/list user parsing 2016-04-11 09:10:41 +00:00
Calle Wilund
b8bd77e621 cql3::list_users_statement: Initial conversion 2016-04-11 09:10:41 +00:00
Calle Wilund
adaf21403b cql3::drop_user_statement: Initial conversion 2016-04-11 09:10:41 +00:00
Calle Wilund
8732b3eed7 cql3::alter_user_statement: Initial conversion 2016-04-11 09:10:41 +00:00
Calle Wilund
da89189308 cql3::create_user_statement: Initial conversion 2016-04-11 09:10:41 +00:00
Calle Wilund
57f5bb854f cql3::authentication_statement: cql auth base class 2016-04-11 09:10:41 +00:00
Calle Wilund
cef52d1653 cql3::user_options: Add options wrapper type 2016-04-11 09:10:41 +00:00
Calle Wilund
7ebac35779 client_state: break up setting login/validation
transport::server uses client_state in a move-temporary-around
fashion. Having a setter that does continuation-bound validation
makes this messier. Break them up to separate "this" placement
from the actual validation continuation logic
2016-04-11 09:10:41 +00:00
Calle Wilund
83e2604bc6 client_state: Propagate login user in merge 2016-04-11 09:10:41 +00:00
Calle Wilund
3daf768a82 client_state : Add ensure_not_anonymous method 2016-04-11 09:10:41 +00:00
Calle Wilund
1d7930c4bd authenticated_user: implement "is_super"
Which also, unfortunately, must be a continuation. (Queries tables)
2016-04-11 09:10:41 +00:00
Calle Wilund
d9b176307f auth::authenticator: option<->string 2016-04-11 09:10:41 +00:00
Raphael S. Carvalho
8fe7524e46 sstables: enable leveled strategy feature to prevent L0 from falling behind
If level 0 falls behind, size tiered strategy is used on it to reduce overhead
until we can catch up on the higher levels.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <17bf15b7d12cd5dc652cc92939c0c68f921662a2.1459976469.git.raphaelsc@scylladb.com>
2016-04-11 11:52:00 +03:00
Nadav Har'El
92ef11ffaa sstable_mutation_test: more compare keys not representations
Commit 0fc4c36952 missed one place where
keys were compared using their byte representation. Fix that.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1459778074-10759-4-git-send-email-nyh@scylladb.com>
2016-04-11 11:36:17 +03:00
Nadav Har'El
9f9353ae5b sstable_mutation_test: another test for range tombstone merging
This is an even more elaborate tombstone merging unit test, with
3 levels of nesting, which did not pass with older range-tombstone
merging algorithms but works with the current one.

I started with deletion of three nested levels of row -
aaa, aaa:bbb, and aaa:bbb:ccc. I then complicated the sstable
even further by adding additional middle-points with the same
timestamps (which we saw happening in some real-life sstables),
resulting in:

[
{"key": "pk",
 "cells": [["aaa:_","aaa:bba:_",1459438519943668,"t",1459438519],
           ["aaa:bba:_","aaa:bbb:_",1459438519943668,"t",1459438519],
           ["aaa:bbb:_","aaa:bbb:ccb:_",1459438519950348,"t",1459438519],
           ["aaa:bbb:ccb:_","aaa:bbb:ccc:_",1459438519950348,"t",1459438519],
           ["aaa:bbb:ccc:_","aaa:bbb:ccc:!",1459438519958850,"t",1459438519],
           ["aaa:bbb:ccc:!","aaa:bbb:ddd:!",1459438519950348,"t",1459438519],
           ["aaa:bbb:ddd:!","aaa:bbb:!",1459438519950348,"t",1459438519],
           ["aaa:bbb:!","aaa:!",1459438519943668,"t",1459438519]]}
]

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1459778074-10759-3-git-send-email-nyh@scylladb.com>
2016-04-11 11:35:59 +03:00
Nadav Har'El
77a793048e sstable_mutation_test: strengthen tombstone_merging test
In the tombstone_merging test, we expected one row tombstone. But we did
not verify that in addition to that row tombstone, there are no other rows
(deleted or otherwise). It turns out that in the old merging algorithm,
we did produce additional deleted rows which shouldn't have been there.

So this patch adds a test that there are no such additional deleted rows
beyond the one row tombstone we expect. The test passes with the new
range tombstone merging algorithm.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1459778074-10759-2-git-send-email-nyh@scylladb.com>
2016-04-11 11:35:46 +03:00
Nadav Har'El
818f14f444 sstables: overhaul (again) range tombstone merging
In commit 99ecda3c96, we overhauled the
way we read Cassandra's disjoint range tombstones, and convert them to
the overlapping whole-prefix tombstones which we support.

Unfortunately, while this algorithm worked correctly for a couple of
test cases, it did not for additional test cases. While the previous
algorithm could not generate "wrong" tombstones (it didn't generate things
it didn't see), it could generate redundant overlapping tombstones, and
missed some sanity checks about the correctness of the merge process.

In this patch, a new algorithm makes sure to not generate redundant
tombstones, and includes additional tests to ensure that we do not
mistakenly merge range tombstones which cannot actually be merged.

The following patches will include tests which failed with the previous
algorithm, and succeed with this one.

I described the new algorithm on the ScyllaDB mailing list this way:

1. Have a stack of open ranges, start & timestamp for each (no end for
   each), and just one "end of last contiguous deletion"
Processing each range tombstone:
2. If the start of a range tombstone is not adjacent to the "end of last
   deletion", assert we have no open range on the stack (because we can
   never close those). In any case, set the "end of last deletion" to
   the end of this tombstone.
3. If the current tombstone's timestamp is STRICTLY HIGHER than that on the
   top of the stack, push the new tombstone's start+timestamp to the stack.
   Note: If it was STRICTLY LOWER, throw error (it means the open range will
   never be closed).
4. If the current tombstone's end matches (i.e., closes the row of) the start on
   the top of the stack, emit this tombstone and pop the stack.
When the row ends:
5. Assert the stack is empty.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1459778074-10759-1-git-send-email-nyh@scylladb.com>
2016-04-11 11:35:23 +03:00
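The five steps listed in commit 818f14f444 can be sketched as follows. This is a minimal illustration, not the code from the patch. It assumes the string encoding seen in the JSON dump of the neighbouring test commit, where a bound ending in `_` opens a prefix and the same bound ending in `!` closes it, and where "adjacent" means that a tombstone starts exactly where the previous one ended. The names `range_tombstone`, `prefix_tombstone`, `closes` and `merge_range_tombstones` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

struct range_tombstone { std::string start, end; int64_t timestamp; };
struct prefix_tombstone { std::string prefix; int64_t timestamp; };

// Assumed encoding: "aaa:_" opens prefix "aaa" and "aaa:!" closes it.
inline bool closes(const std::string& open_start, const std::string& end) {
    if (open_start.size() < 2 || open_start.back() != '_') {
        return false;
    }
    std::string expected = open_start;
    expected.back() = '!';
    return end == expected;
}

std::vector<prefix_tombstone>
merge_range_tombstones(const std::vector<range_tombstone>& input) {
    struct open_range { std::string start; int64_t timestamp; };
    std::vector<open_range> stack;   // step 1: open ranges, no end yet
    bool have_last_end = false;      // step 1: end of last contiguous deletion
    std::string last_end;
    std::vector<prefix_tombstone> out;

    for (const auto& rt : input) {
        // Step 2: a gap is only legal when nothing is left open.
        if (have_last_end && rt.start != last_end && !stack.empty()) {
            throw std::runtime_error("gap while a range is still open");
        }
        last_end = rt.end;
        have_last_end = true;

        // Step 3: a strictly newer timestamp opens a nested range;
        // a strictly older one could never be closed.
        if (stack.empty() || rt.timestamp > stack.back().timestamp) {
            stack.push_back({rt.start, rt.timestamp});
        } else if (rt.timestamp < stack.back().timestamp) {
            throw std::runtime_error("open range can never be closed");
        }

        // Step 4: the end closes the range on top of the stack.
        if (closes(stack.back().start, rt.end)) {
            const std::string& s = stack.back().start;
            out.push_back({s.substr(0, s.size() - 2), stack.back().timestamp});
            stack.pop_back();
        }
    }
    // Step 5: everything opened must have been closed.
    if (!stack.empty()) {
        throw std::runtime_error("unclosed range at end of row");
    }
    return out;
}
```

Fed with the eight disjoint ranges from the three-level test (using small stand-in timestamps), this sketch emits three whole-prefix tombstones, innermost first: aaa:bbb:ccc, aaa:bbb, then aaa.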
Avi Kivity
0c7f9917dc Merge seastar upstream
* seastar aa281bd...2aeb9dd (20):
  > memory: avoid exercising the reclaimers for oversized requests
  > tests: test cross-cpu free not underflowing live object counter
  > memory: fix live objects counter underflow due to cross-cpu free
  > core/reactor: Don't abort in allocate_aligned_buffer() on allocation failure
  > build: add --tests-debuginfo, to avoid stripping tests
  > connected_socket: Add buffer size arg to output()
  > scripts/posix_net_conf.sh: added a support for bonding interfaces
  > scripts/posix_net_conf.sh: move the NIC configuration code into a separate function
  > scripts/posix_net_conf.sh: implement the logic for selecting default MQ mode
  > scripts/posix_net_conf.sh: forward the interface name as a parameter
  > http/routes: Remove request failure logging to stderr
  > lowres_clock: Initialize _now when the clock is created
  > apps/iotune: fix broken URL
  > tutorial: expand and improve semaphore section
  > DPDK: support set RSS key to port_conf when hash_key_size is unknown
  > dpdk: aware of vmxnet3 max xmit frags and do linearizing
  > packet_util: insert out of order packet when map is empty
  > core: Fix use-after-free of scollectd::impl
  > futures: Optimize finally()
  > futures: Factor out exceptional path of finally()
2016-04-10 18:08:51 +03:00
Pekka Enberg
9b98278436 Merge "Be able to boot without a Summary" from Glauber
"Summary files are a relatively recent addition to Cassandra. I thought
that every SSTable converted to 2.1 would have them, but that does not
seem to be true. It's easy to generate a stream of files that will boot
in Cassandra 2.1 just fine, but not in Scylla as they will be missing
the Summary.

Cassandra can boot those files because it is robust against the Summary
not existing, and we should do the same.

Since we keep the Summary in memory, in case one does not exist we create a
memory copy of it from the Index - the filesystem is not touched. Hopefully,
compaction will run soon and the next time we boot we won't have to do such
thing.

Fixes #1170"
2016-04-09 20:38:57 +03:00
Pekka Enberg
992dab3fcb Merge "Fixes for mutation querying" from Tomek
"Fixes dtest failure of paging_test.py:TestPagingData.static_columns_paging_test"
2016-04-09 09:07:36 +03:00
Glauber Costa
8a50b027aa summary: generate one if it is not present
There are cases in which a Summary file will not be present, and imported
SSTables will have just the Index and Data files. In earlier versions of
Cassandra, a Summary didn't exist, so one may not be generated when migrating.

In Issue #1170, we can see an example of tables generated by CQLSSTableWriter,
and they lack a Summary. Cassandra is robust against this and can cope
perfectly with the Summary not existing. I will argue that we should do the
same.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-04-08 17:14:29 -04:00
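As a rough illustration of regenerating a Summary in memory from an Index (commit 8a50b027aa), the sketch below samples every `min_interval`-th index entry. That sampling rule is an assumption about the Summary format made for this example, and `index_entry`, `summary_entry` and `summary_from_index` are hypothetical names, not Scylla's types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, simplified view of one Index entry.
struct index_entry {
    std::string key;
    uint64_t index_offset;  // where this entry lives in the Index file
};

// Hypothetical, simplified view of one Summary entry.
struct summary_entry {
    std::string key;
    uint64_t index_offset;
};

// Builds an in-memory summary by keeping every min_interval-th index entry.
// Nothing is written to disk. A non-empty index always yields at least one
// summary entry, because entry 0 is always sampled.
std::vector<summary_entry>
summary_from_index(const std::vector<index_entry>& index, std::size_t min_interval) {
    if (min_interval == 0) {
        throw std::invalid_argument("min_interval must be positive");
    }
    std::vector<summary_entry> summary;
    for (std::size_t i = 0; i < index.size(); i += min_interval) {
        summary.push_back({index[i].key, index[i].index_offset});
    }
    return summary;
}
```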
Glauber Costa
4de26fdec8 sstables: allow read_toc to be called more than once
We do that by bailing immediately if we detect that the components
map is already populated. This allows us to call read_toc() earlier
if we need to - for instance, to inquire about the existence of the
Summary - without the need to re-read the components again later.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-04-08 17:14:29 -04:00
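The early-return guard described in commit 4de26fdec8 might look roughly like this. The class `sstable_sketch`, its members and the fixed component list are hypothetical and only illustrate the "bail if the components map is already populated" idea.

```cpp
#include <cassert>
#include <set>
#include <string>

class sstable_sketch {
    std::set<std::string> _components;
    int _toc_reads = 0;

    // Stand-in for the real TOC parsing; counts how often it runs.
    std::set<std::string> load_toc() {
        ++_toc_reads;
        return {"TOC.txt", "Data.db", "Index.db"};
    }

public:
    // Safe to call more than once: later calls return immediately.
    void read_toc() {
        if (!_components.empty()) {
            return;
        }
        _components = load_toc();
    }

    bool has_component(const std::string& name) const {
        return _components.count(name) != 0;
    }

    int toc_reads() const { return _toc_reads; }
};
```

A caller can then call read_toc() early, check `has_component("Summary.db")`, and let the normal load path call read_toc() again at no extra cost.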
Glauber Costa
736e21222e sstables: avoid passing schema unnecessarily
for prepare_summary we can just pass the min interval as a parameter and
avoid having the schema do yet another hop. For sealing the summary, it
is completely unused and we can do away with it.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-04-08 17:14:29 -04:00
Glauber Costa
0de3a32147 index reader: make index_consumer a template parameter
This is done so we can use other consumers. One example is regenerating
the Summary from an existing Index.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-04-08 17:14:29 -04:00
Glauber Costa
8453ff7788 make get_sstable_key_range an instance method
Because just creating an SSTable object does not generate any I/O,
get_sstable_key_range should be an instance method. The main advantage
of doing that is that we won't have to read the summary twice. The way
we're doing it currently, if it happens to be a shard-relevant table we'll
call load() - which reads the summary again.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-04-08 17:14:29 -04:00
Glauber Costa
6ae601a025 do not re-read the summary
There are times in which we read the Summary file twice. That actually happens
every time during normal boot (it doesn't during refresh). First during
get_sstable_key_range and then again during load().

Every summary will have at least one entry, so we can easily test for whether
or not this is properly initialized.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-04-08 17:14:29 -04:00
Tomasz Grabiec
3e0c24934b tests: cql_query_test: Add test for slicing in reverse 2016-04-08 20:53:33 +02:00
Tomasz Grabiec
c2b955d40b mutation_partition: Fix static row being returned when paginating
Reproduced by dtest paging_test.py:TestPagingData.static_columns_paging_test.

Broken by f15c380a4f, where the
calculation of has_ck_selector got broken, in such a way that present
clustering restrictions were treated as if not present, which resulted
in static row being returned when it shouldn't.

While at it, unify the check between query_compacted() and
do_compact() by extracting it to a function.
2016-04-08 20:53:33 +02:00
Tomasz Grabiec
a1539fed95 mutation_partition: Fix reversed trim_rows()
The first erase_and_dispose(), which removes rows between last
position and beginning of the next range, can invalidate end()
iterator of the range. Fix by looking up end after erasing.

mutation_partition::range() was split into lower_bound() and
upper_bound() to allow for that.

This affects for example queries with descending order where the
selected clustering range is empty and falls before all rows.

Exposed by f15c380a4f, which is now
calling do_compact() during query.

Reproduced by dtest paging_test.py:TestPagingData.static_columns_paging_test
2016-04-08 20:53:33 +02:00
Avi Kivity
db03295c8a Merge "Fix query digest mismatch" from Tomasz
"Currently data query digest includes cells and tombstones which may have
expired or be covered by higher-level tombstones. This causes digest
mismatch between replicas if some elements are compacted on one of the
nodes and not on others. This mismatch triggers read-repair which doesn't
resolve it because the mutations received by mutation queries do not differ;
they are compacted already.

The fix adds compacting step before writing and digesting query results by
reusing the algorithm used by mutation query. This is not the most optimal
way to fix this. The compaction step could be folded with the query writing,
there is redundancy in both steps. However such change carries more risk,
and thus was postponed.

perf_simple_query test (cassandra-stress-like partitions) shows regression
from 83k to 77k (7%) ops/s.

Fixes #1165."
2016-04-08 12:13:29 +03:00
Pekka Enberg
47a904c0f6 Merge "gossip: Introduce SUPPORTED_FEATURES" from Asias
"There is a need to have an ability to detect whether a feature is
supported by entire cluster. The way to do it is to advertise feature
availability over gossip and then each node will be able to check if all
other nodes have a feature in question.

The idea is to have new application state SUPPORTED_FEATURES that will contain
set of strings, each string holding feature name.

This series adds API to do so.

The following patch on top of this series demonstrates how to wait for features
during boot up. FEATURE1 and FEATURE2 are introduced. We use
wait_for_feature_on_all_node to wait for FEATURE1 and FEATURE2 successfully.
Since FEATURE3 is not supported, that wait will not succeed and will time out.

   --- a/service/storage_service.cc
   +++ b/service/storage_service.cc
   @@ -95,7 +95,7 @@ sstring storage_service::get_config_supported_features() {
        // Add features supported by this local node. When a new feature is
        // introduced in scylla, update it here, e.g.,
        // return sstring("FEATURE1,FEATURE2")
   -    return sstring("");
   +    return sstring("FEATURE1,FEATURE2");
    }

    std::set<inet_address> get_seeds() {
   @@ -212,6 +212,11 @@ void storage_service::prepare_to_join() {
        // gossip snitch infos (local DC and rack)
        gossip_snitch_info().get();

   +    gossiper.wait_for_feature_on_all_node(std::set<sstring>{sstring("FEATURE1"), sstring("FEATURE2")}, std::chrono::seconds(30)).get();
   +    logger.info("Wait for FEATURE1 and FEATURE2 done");
   +    gossiper.wait_for_feature_on_all_node(std::set<sstring>{sstring("FEATURE3")}).get();
   +    logger.info("Wait for FEATURE3 done");
   +

We can query the supported_features:

    cqlsh> SELECT supported_features from system.peers;

     supported_features
    --------------------
      FEATURE1,FEATURE2
      FEATURE1,FEATURE2

    (2 rows)
    cqlsh> SELECT supported_features from system.local;

     supported_features
    --------------------
      FEATURE1,FEATURE2

    (1 rows)"
2016-04-08 09:22:50 +03:00
Benoît Canet
7c99ecf16f scylla_setup: Check if scylla-jmx is installed
Signed-off-by: Benoît Canet <benoit@scylladb.com>

Fixes #1107
Message-Id: <1460045692-815-1-git-send-email-benoit@scylladb.com>
2016-04-08 09:03:38 +03:00
Pekka Enberg
38a54df863 Fix pre-ScyllaDB copyright statements
People keep tripping over the old copyrights and copy-pasting them to
new files. Search and replace "Cloudius Systems" with "ScyllaDB".

Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>
2016-04-08 08:12:47 +03:00
Tomasz Grabiec
474a35ba6b tests: Add test for query digest calculation 2016-04-07 19:57:19 +02:00
Tomasz Grabiec
4418da77e6 tests: mutation_source: Include random mutations in generate_mutation_sets() result
Probably increases coverage.
2016-04-07 19:57:19 +02:00
Tomasz Grabiec
5d768d0681 tests: mutation_test: Move mutation generator to mutation_source_test.hh
So that it can be reused.
2016-04-07 19:57:19 +02:00
Tomasz Grabiec
30d25bc47a tests: mutation_test: Add test case for querying of expired cells 2016-04-07 19:57:19 +02:00
Tomasz Grabiec
58bbd4203f partition_slice_builder: Add new setters 2016-04-07 19:57:19 +02:00
Tomasz Grabiec
7cd8e61429 tests: result_set_assertions: Add and_only_that() 2016-04-07 19:57:19 +02:00
Tomasz Grabiec
f15c380a4f database: Compact mutations when executing data queries
Currently data query digest includes cells and tombstones which may have
expired or be covered by higher-level tombstones. This causes digest
mismatch between replicas if some elements are compacted on one of the
nodes and not on others. This mismatch triggers read-repair which doesn't
resolve it because the mutations received by mutation queries do not differ;
they are compacted already.

The fix adds compacting step before writing and digesting query results by
reusing the algorithm used by mutation query. This is not the most optimal
way to fix this. The compaction step could be folded with the query writing,
there is redundancy in both steps. However such change carries more risk,
and thus was postponed.

perf_simple_query test (cassandra-stress-like partitions) shows regression
from 83k to 77k (7%) ops/s.

Fixes #1165.
2016-04-07 19:56:58 +02:00
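Why compacting before digesting removes the mismatch described in commit f15c380a4f can be shown with a toy model. Everything here is a simplification made for illustration: the `cell` and `partition` structs, the convention that a tombstone timestamp of 0 means "no tombstone", and the FNV-1a hash standing in for the real digest.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct cell {
    std::string name;
    std::string value;
    int64_t timestamp;
    int64_t expires_at;  // 0 means the cell never expires (toy convention)
};

struct partition {
    int64_t tombstone_timestamp;  // 0 means no tombstone (toy convention)
    std::vector<cell> cells;
};

// Drops cells covered by the partition tombstone and cells that have expired.
partition compact(partition p, int64_t now) {
    auto dead = [&](const cell& c) {
        bool covered = c.timestamp <= p.tombstone_timestamp;
        bool expired = c.expires_at != 0 && c.expires_at <= now;
        return covered || expired;
    };
    p.cells.erase(std::remove_if(p.cells.begin(), p.cells.end(), dead),
                  p.cells.end());
    return p;
}

// Toy digest (FNV-1a); the real digest algorithm is not modelled here.
uint64_t digest(const partition& p) {
    uint64_t h = 14695981039346656037ULL;
    auto mix = [&h](const std::string& bytes) {
        for (unsigned char c : bytes) {
            h ^= c;
            h *= 1099511628211ULL;
        }
        h ^= 0xff;  // field separator
        h *= 1099511628211ULL;
    };
    mix(std::to_string(p.tombstone_timestamp));
    for (const auto& c : p.cells) {
        mix(c.name);
        mix(c.value);
        mix(std::to_string(c.timestamp));
    }
    return h;
}
```

A replica that still stores a shadowed or expired cell and a replica that has already compacted it away produce the same digest once both run `compact()` before `digest()`.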
Tomasz Grabiec
e4e8acc946 mutation_query: Extract main part of mutation_query() into more generic querying_reader
So that it can be reused in query()
2016-04-07 19:03:04 +02:00
Takuya ASADA
ed7a3beed2 dist/ubuntu: drop unused scripts
These were used when we didn't share scripts between CentOS/Fedora and Ubuntu, but they are not used anymore, so drop them.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1459408583-13497-1-git-send-email-syuu@scylladb.com>
2016-04-06 08:21:09 +03:00
Asias He
d5dce8016b storage_service: Advertise supported_features into cluster
Advertise features supported by this node, so that other nodes can know
this info. For example, on a 3 node cluster with supported_features ==
FEATURE1 and FEATURE2, it looks like:

cqlsh> SELECT supported_features from system.peers;

 supported_features
--------------------
  FEATURE1,FEATURE2
  FEATURE1,FEATURE2

(2 rows)
cqlsh> SELECT supported_features from system.local;

 supported_features
--------------------
  FEATURE1,FEATURE2

(1 rows)
2016-04-06 07:12:34 +08:00
Asias He
0e1738943d storage_service: Add supported_features into system.peers table 2016-04-06 07:12:34 +08:00
Asias He
50bcfe569a system_keyspace: Add supported_features into system.local table 2016-04-06 07:12:34 +08:00
Asias He
b710a5f9ee storage_service: Introduce get_config_supported_features
It returns the features supported by this local node. When a new feature is
introduced in scylla, update the features returned by
get_config_supported_features, e.g.,

   return sstring("FEATURE1,FEATURE2")
2016-04-06 07:12:34 +08:00
Asias He
e0a82a1107 gossip: Add supported_features helper in versioned_value
Given a supported features sstring, return a versioned_value for it.
2016-04-06 07:12:34 +08:00
Asias He
214c0f72b2 db: Add supported_features column in system.local and system.peers table 2016-04-06 07:12:34 +08:00
Asias He
04e8727793 gossip: Introduce wait_for_feature_on_{all}_node
API to wait until features are available on a node or on all the nodes in the
cluster.

$timeout specifies how long we want to wait. If the features are not
available yet, sleep 2 seconds and retry.
2016-04-06 07:12:34 +08:00
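The "check, sleep 2 seconds, retry until the timeout" behaviour described in commit 04e8727793 can be approximated with a blocking loop. This is a simplified sketch: the real API returns a future, while `wait_until_ready` below is a hypothetical helper that blocks the calling thread and takes the readiness check as a callable.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Polls `ready` until it returns true or the timeout would be exceeded.
// Returns true on success and false on timeout. The default retry interval
// of 2 seconds mirrors the commit message; tests can pass a shorter one.
template <typename Predicate>
bool wait_until_ready(Predicate ready,
                      std::chrono::milliseconds timeout,
                      std::chrono::milliseconds retry_interval = std::chrono::seconds(2)) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (!ready()) {
        // Give up if another sleep would take us past the deadline.
        if (std::chrono::steady_clock::now() + retry_interval > deadline) {
            return false;
        }
        std::this_thread::sleep_for(retry_interval);
    }
    return true;
}
```

In a reactor-based server a blocking sleep like this would stall the thread, so it should be read as a description of the control flow only.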
Asias He
1e437e925c gossip: Introduce get_supported_features
- Get features supported by this particular node

  std::set<sstring> get_supported_features(inet_address endpoint) const;

- Get features supported by all the nodes this node knows about

  std::set<sstring> get_supported_features() const;
2016-04-06 07:12:34 +08:00
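One way to read "features supported by all the nodes" in commit 1e437e925c is as the intersection of the per-node feature sets. The sketch below assumes that reading, along with the comma-separated string format shown elsewhere in this series ("FEATURE1,FEATURE2"). The function names are hypothetical and `std::string` stands in for `sstring`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Splits "FEATURE1,FEATURE2" into a set; empty items are skipped.
std::set<std::string> parse_features(const std::string& csv) {
    std::set<std::string> out;
    std::istringstream in(csv);
    std::string item;
    while (std::getline(in, item, ',')) {
        if (!item.empty()) {
            out.insert(item);
        }
    }
    return out;
}

// Intersection of the feature sets advertised by every known node.
std::set<std::string> common_features(const std::vector<std::string>& per_node) {
    if (per_node.empty()) {
        return {};
    }
    std::set<std::string> common = parse_features(per_node.front());
    for (std::size_t i = 1; i < per_node.size(); ++i) {
        std::set<std::string> node = parse_features(per_node[i]);
        std::set<std::string> both;
        std::set_intersection(common.begin(), common.end(),
                              node.begin(), node.end(),
                              std::inserter(both, both.begin()));
        common.swap(both);
    }
    return common;
}

// True when every wanted feature is supported by all nodes.
bool all_nodes_support(const std::vector<std::string>& per_node,
                       const std::set<std::string>& wanted) {
    std::set<std::string> common = common_features(per_node);
    return std::includes(common.begin(), common.end(),
                         wanted.begin(), wanted.end());
}
```

With this reading, a single node that advertises an empty feature string is enough to make the cluster-wide set empty.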
Asias He
a6080773b3 gossip: Add SUPPORTED_FEATURES application_state
It is used to negotiate cluster wide features.
2016-04-06 07:12:34 +08:00
Piotr Jastrzebski
d3f91eec61 Implement tuple_type_impl::from_string
This is a fix for:
https://github.com/scylladb/scylla/issues/574

It mirrors the behavior of:
org.apache.cassandra.db.marshal.TupleType.java#fromString

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <24a7d6253727d0faebb1df117c2f52410523d42f.1459843091.git.piotr@scylladb.com>
2016-04-05 16:00:18 +03:00
Vlad Zolotarov
2daaa00c4f conf: resurrect the important text related to endpoint_snitch configuration
commit d1b44cef1b removed an
important part of a comment related to an 'endpoint_snitch'
configuration. This patch puts it back.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1459858934-12005-1-git-send-email-vladz@cloudius-systems.com>
2016-04-05 15:23:13 +03:00
Raphael S. Carvalho
e15ce5eb4d api: Add support to get column family compression ratio
After this change, user can query compression ratio on a per column
family basis with 'nodetool cfstats'.

look at 'nodetool cfstats' output:
./bin/nodetool cfstats ks.test5
Keyspace: ks
	Read Count: 0
	Read Latency: NaN ms.
	Write Count: 0
	Write Latency: NaN ms.
	Pending Flushes: 0
		Table: test5
		SSTable count: 1
		Space used (live): 4774
		Space used (total): 4774
		Space used by snapshots (total): 0
		Off heap memory used (total): 131384
		SSTable Compression Ratio: 0.833333
	...

Fixes #636.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <a1bee5a23fe63787df3e387a88f2d216ba4a4134.1459802771.git.raphaelsc@scylladb.com>
2016-04-05 12:46:40 +03:00
Asias He
d1b44cef1b conf: Drop duplicated section for endpoint_snitch
endpoint_snitch is supported and it is listed under "Supported Parameters".
Remove the duplicated section in "Unsupported parameters".
Message-Id: <f8260b72558305f9186c011b8f8f452b3b91339b.1459325982.git.asias@scylladb.com>
2016-04-05 08:48:48 +03:00
Pekka Enberg
32471fcb96 Merge "Do batch log replay in decommission" from Asias
"batchlog_manager is modified to allow the storage_service to initiate a batchlog
 replay operation.

 Refs #1085.

 Tested with tests/batchlog_manager_test and batch_test.py"
2016-04-05 08:42:47 +03:00
Gleb Natapov
70575699e4 commitlog, sstables: enlarge XFS extent allocation for large files
With big rows I see contention in XFS allocations which causes the reactor
thread to sleep. Commitlog is a main offender, so enlarge the extent to
commitlog segment size for big files (commitlog and sstable Data files).

Message-Id: <20160404110952.GP20957@scylladb.com>
2016-04-04 14:15:00 +03:00
Amnon Heiman
725231a7a0 api: set the api_doc before registering any api
This is a leftover from the reordering of the API init.  The api_doc
should be set first, so that later API registrations will enable their
relevant swagger doc.

Currently, the swagger documentation of the system API is not available.

Fixes #1160

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1459750490-15996-1-git-send-email-amnon@scylladb.com>
2016-04-04 11:37:59 +03:00
Avi Kivity
6a3cf4ac41 cql: unlock ALTER TABLE syntax
It was marked experimental for 1.0, but will be fully supported in the
next release.
Message-Id: <1459707946-5860-1-git-send-email-avi@scylladb.com>
2016-04-04 11:36:11 +03:00
Piotr Jastrzebski
613e7d8618 Add more info to wrong RPC address error
If listening on RPC address failed then report
IP address and port in the error message.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <c4db3527df2ce6dccb3b619584ee3fcb1e70ffd1.1459512258.git.piotr@scylladb.com>
2016-04-03 12:57:19 +03:00
Takuya ASADA
cad5edc53b dist: fix build error at copy symlinks
Both build_rpm.sh and build_deb.sh fail with "cannot stat 'xxx': No such file or directory" when the scylla-server package is not installed. Prevent this by passing the --no-dereference option to cp.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1459523585-9108-1-git-send-email-syuu@scylladb.com>
2016-04-03 12:49:55 +03:00
Tomasz Grabiec
0fc4c36952 tests: sstable_mutation_test: Compare keys not representations
Representation is opaque at this level of abstraction.

Reviewed-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1459508193-7086-1-git-send-email-tgrabiec@scylladb.com>
2016-04-03 11:39:03 +03:00
Nadav Har'El
6c4ee49bd3 sstables: another test for range tombstone merging
This is another unit test for range tombstone merging, introduced in commit
0fc9a5ee4d and rewritten in commit
99ecda3c96.

In this test, a single large deletion was broken up into several smaller
ranges, all with the same time stamps, so we should recombine them into
one row tombstone, instead of failing the read.

The sstable in this test case was artificially created using json2sstable.
We don't yet know how to produce such a case using Cassandra 2, but we
have seen a similar occurrence in the wild, in a real SSTable.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1459429243-15821-1-git-send-email-nyh@scylladb.com>
2016-04-01 11:55:14 +02:00
Takuya ASADA
d59c1c7648 dist/redhat: drop very old %pre script
These lines were needed for very old versions of scylla, not for 1.0.
Can be removed now.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1459177601-20269-1-git-send-email-syuu@scylladb.com>
2016-04-01 09:41:18 +03:00
Pekka Enberg
b9a1aef670 Merge "Random exception safety fixes" from Paweł
"These patches fix some of the problems found by randomly injecting
 memory allocation failures."
2016-04-01 08:58:00 +03:00
Paweł Dziepak
8f78b8e190 log: ignore logging exceptions
Logging is used in many places including those that shouldn't really
throw any exceptions (destructors, noexcept functions).

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-03-31 16:43:32 +01:00
Paweł Dziepak
c8159eca52 commitlog: make sure that segment destructor doesn't throw
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-03-31 16:42:56 +01:00
Paweł Dziepak
3e0555809e storage_proxy: catch all exceptions in read executor
abstract_read_executor::reconcile() is supposed to make sure that
_result_promise is eventually set to either a result or an exception.
That may not happen however if reconciliation throws any exception
since only read timeouts are being caught. When that happens the
continuation chain becomes stuck.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-03-31 16:38:41 +01:00
Paweł Dziepak
3c107c4b05 sstables: remove HyperLogLog throw() specifier
HyperLogLog constructor promises that it only throws instances of
std::invalid_argument. That's a lie since it also adds elements to a
vector (and doesn't catch potential bad_allocs).

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-03-31 16:36:53 +01:00
Avi Kivity
417bcb122d commitlog: ignore commitlog segments generated by Cassandra-derived tools
Cassandra-derived tools (such as sstable2json) may write commitlog segments,
that Scylla cannot recognize.  Since we now write them with a distinct name,
we can recognize the name and ignore these segments, as we know the data they
contain is not interesting.

Fixes #1112.
Message-Id: <1459356904-20699-1-git-send-email-avi@scylladb.com>
2016-03-31 16:01:08 +03:00
Nadav Har'El
78c9f49585 sstables: Move check_marker() to source file
The check_marker() function is used as a sanity-check of data we read
from sstable, so instead of the header file key.hh, let's move it to
the sstable-parsing source file partition.cc.

In addition to having less code in header files, another benefit is
that the function can now throw a more specific exception (malformed
sstable exception).

Also fixed the exception's message (which had a second "%d" but only
one parameter).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1459420430-5968-1-git-send-email-nyh@scylladb.com>
2016-03-31 14:22:51 +03:00
Nadav Har'El
99ecda3c96 sstables: overhaul range tombstone reading
Until recently, we believed that range tombstones we read from sstables will
always be for entire rows (or more generalized clustering-key prefixes),
not for arbitrary ranges. But as we found out, because Cassandra insists
that range tombstones do not overlap, it may take two overlapping row
tombstones and convert them into three range tombstones which look like
general ranges (see the patch for a more detailed example).

Not only do we need to accept such "split" range tombstones, we also need
to convert them back to our internal representation which, in the above
example, involves two overlapping tombstones. This is what this patch does.

This patch also contains a test for this case: We created in Cassandra
an sstable with two overlapping deletions, and verify that when we read
it to Scylla, we get these two overlapping deletions - despite the
sstable file actually having contained three non-overlapping tombstones.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <b7c07466074bf0db6457323af8622bb5210bb86a.1459399004.git.glauber@scylladb.com>
2016-03-31 12:49:50 +03:00
Pekka Enberg
2629389d5d dist/docker/ubuntu: Use bash in start-scylla script
The default shell in Ubuntu is "dash" which causes the following error
when "start-scylla" script is executed:

  /start-scylla: 8: /start-scylla: source: not found

Message-Id: <1459406561-20141-1-git-send-email-penberg@scylladb.com>
2016-03-31 11:21:36 +03:00
Duarte Nunes
26a3461908 cql: Fix antlr3 missing token leak
This patch overrides the antlr3 function that allocates the missing
tokens that would eventually leak. The override stores these tokens in
a vector, ensuring memory is freed whenever the parser is destroyed.

Fixes #1147

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1459355146-17402-1-git-send-email-duarte@scylladb.com>
2016-03-31 08:44:45 +03:00
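The ownership pattern behind commit 26a3461908 (keep every token created during error recovery in a container owned by the parser) can be sketched without any antlr3 API. `token`, `parser_sketch` and `make_missing_token` are hypothetical names for this illustration and do not correspond to antlr3 functions.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token type that tracks how many instances are alive.
struct token {
    static int live;
    std::string text;
    explicit token(std::string t) : text(std::move(t)) { ++live; }
    ~token() { --live; }
    token(const token&) = delete;
    token& operator=(const token&) = delete;
};
int token::live = 0;

class parser_sketch {
    // Tokens synthesized during error recovery; freed with the parser.
    std::vector<std::unique_ptr<token>> _missing_tokens;

public:
    // Stand-in for the overridden "allocate missing token" hook. The caller
    // gets a raw pointer, but ownership stays with the parser.
    token* make_missing_token(const std::string& expected) {
        _missing_tokens.push_back(
            std::unique_ptr<token>(new token("<missing " + expected + ">")));
        return _missing_tokens.back().get();
    }
};
```

Because the vector holds `std::unique_ptr`s, destroying the parser releases every recovery token even if the parse ended with an error.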
yan cui
6fc29843cd dist/docker: refine docker file for ubuntu 2016-03-30 18:54:14 +03:00
Duarte Nunes
f7a12adb6f cql3: Disable pg-style string format test
antlr3 leaks the token it creates itself when recovering from a mismatch in
the case the missing token can be determined. Until this bug is fixed
or circumvented, the test should remain disabled.

Ref #1147

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1459345403-8243-1-git-send-email-duarte@scylladb.com>
2016-03-30 16:44:47 +03:00
Asias He
bc1889b7ab storage_service: Shutdown batchlog_manager after decommission
On the node which was decommissioned, I saw

2016-03-29 09:35:52,097 [shard 0] storage_service - DECOMMISSIONED:
2016-03-29 09:35:52,097 [shard 0] storage_service - DECOMMISSIONING: done
2016-03-29 09:36:28,814 [shard 0] batchlog_manager - Batchlog replay on shard 0: starts
2016-03-29 09:36:28,814 [shard 0] batchlog_manager - Batchlog replay on shard 0: done
2016-03-29 09:37:28,819 [shard 0] batchlog_manager - Batchlog replay on shard 1: starts
2016-03-29 09:37:28,820 [shard 0] batchlog_manager - Batchlog replay on shard 1: done
2016-03-29 09:38:28,830 [shard 0] batchlog_manager - Batchlog replay on shard 0: starts
2016-03-29 09:38:28,830 [shard 0] batchlog_manager - Batchlog replay on shard 0: done
2016-03-29 09:39:28,844 [shard 0] batchlog_manager - Batchlog replay on shard 1: starts
2016-03-29 09:39:28,844 [shard 0] batchlog_manager - Batchlog replay on shard 1: done

We should stop the batchlog_manager to avoid initiating any future
batchlog replay operations.
2016-03-30 20:54:30 +08:00
Asias He
5d1140b1eb storage_service: Do batch log replay in decommission
Replay the batch log during decommission. Kill one FIXME.

Refs #1085
2016-03-30 20:54:30 +08:00
Asias He
5550aeba1d batchlog_manager: Avoid stopping batchlog_manager more than once
We can stop batchlog_manager in decommission and drain.
Avoid stopping it more than once.

Fix the following error:

$ nodetool decommission
$ nodetool drain

storage_service - DECOMMISSIONING: stop_gossiping done
storage_service - messaging_service stopped
storage_service - DECOMMISSIONING: stop messaging_service done
storage_service - DECOMMISSIONING: set_bootstrap_state done
storage_service - DECOMMISSIONED:
storage_service - DECOMMISSIONING: done
storage_service - DRAINING: starting drain process
gossip - gossip is already stopped
scylla: ./seastar/core/gate.hh:93: future<> seastar::gate::close():
Assertion `!_stopped && "seastar::gate::close() cannot be called
more than once"' failed.
2016-03-30 20:54:30 +08:00
Asias He
cdb43c5586 batchlog_manager: Allow user initiated batchlog replay operation
During decommission, the storage_service::unbootstrap() needs to
initiate a batchlog replay operation. To sync the replay operation
initiated by the timer in batchlog_manager and storage_service, a
semaphore is introduced.  To simplify the semaphore locking, the
management code now always runs on shard zero, but the real work is
distributed to all shards.
2016-03-30 20:54:30 +08:00
Nadav Har'El
0fc9a5ee4d sstables: merge range tombstones if possible
This is a rewrite of Glauber's earlier patch to do the same thing, taking
into account Avi's comments (do not use a class, do not throw from the
constructor, etc.). I also verified that the actual use case which was
broken in #1136 was fixed by this patch.

Currently, we have no support for range tombstones because CQL will not
generate them as of version 2.x. Thrift will, but we can safely leave this for
the future.

However, we have seen cases during a real migration in which a pure-CQL
Cassandra would generate range tombstones in its SSTables.

Although we are not sure how and why, those range tombstones were of a special
kind: their end and next's start range were adjacent, which means that in
reality, they could very well have been written as a single range tombstone for
an entire clustering key - which we support just fine.

This code will attempt to fix this problem temporarily by merging such ranges
if possible. Care must be taken so that we don't end up accepting a true
generic range tombstone by accident.
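
As a rough illustration of the merging idea (not the actual Scylla code), the
sketch below represents each range tombstone as a hypothetical
(start, end, timestamp) tuple and merges neighbours whose end equals the next
start. The tuple layout, the requirement that timestamps match, and the
function name are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical representation: (start, end, deletion timestamp).
RangeTombstone = Tuple[int, int, int]


def merge_adjacent(tombstones: List[RangeTombstone]) -> List[RangeTombstone]:
    """Merge neighbouring tombstones whose ranges touch and whose
    timestamps are equal; leave everything else untouched."""
    merged: List[RangeTombstone] = []
    for start, end, ts in tombstones:
        if merged:
            prev_start, prev_end, prev_ts = merged[-1]
            # Only merge when the previous range ends exactly where this
            # one begins and both carry the same timestamp.
            if prev_end == start and prev_ts == ts:
                merged[-1] = (prev_start, end, ts)
                continue
        merged.append((start, end, ts))
    return merged
```

Ranges that do not touch, or that differ in timestamp, pass through unchanged,
which mirrors the caution above about not accepting a true generic range
tombstone by accident.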

Fixes #1136

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1459333972-20345-1-git-send-email-nyh@scylladb.com>
2016-03-30 13:40:10 +03:00
Calle Wilund
0f5ca342b8 lists.cc: setter_by_uuid does not require read before execute
Fixes #1082

Setting by UUID does not need existing data in list,
so need no read before execute

Message-Id: <1459325931-16387-1-git-send-email-calle@scylladb.com>
2016-03-30 11:24:20 +03:00
Takuya ASADA
73fa36b416 dist/common/scripts: update SET_NIC when --setup-nic passed to scylla_sysconfig_setup
scylla_sysconfig_setup mistakenly ignores --setup-nic argument.
Fixes #1132

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1459285500-22185-1-git-send-email-syuu@scylladb.com>
2016-03-30 11:07:33 +03:00
Takuya ASADA
58fb7000b1 dist: add setup scripts symlink to /usr/sbin
Instead of moving script to /usr/sbin, create symlink from /usr/lib/scylla/scylla_*setup to /usr/sbin/

Fixes #1092

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1459324684-31364-1-git-send-email-syuu@scylladb.com>
2016-03-30 11:04:41 +03:00
Glauber Costa
23808ba184 sstables: fix exception printouts in check_marker
As Nadav noticed in his bug report, check_marker is creating its error messages
using characters instead of numbers - which is what we intended here in the
first place.

That happens because sprint(), when faced with an 8-bit type, interprets this
as a character.  To avoid that we'll use uint16_t types, taking care not to
sign-extend them.

The bug also noted that one of the error messages is missing a parameter, and
that is also fixed.

Fixes #1122

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <74f825bbff8488ffeb1911e626db51eed88629b1.1459266115.git.glauber@scylladb.com>
2016-03-29 19:23:28 +03:00
Takuya ASADA
c1277bacb4 dist/common/scripts: prevent misinterpret blank input as '/dev/', show error when inputted device path is not found
Fixes #1110

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1459267786-19123-1-git-send-email-syuu@scylladb.com>
2016-03-29 19:18:51 +03:00
Glauber Costa
d5c1366e85 compaction: be verbose about which table is causing an exception
When we, for some reason, fail to compact an SSTable, we do not log the file
name leaving us with cryptic messages that tell us what happened, but not where
it happened.

This patch adds logging in compaction so that we'll know what's going on.
Please note that readers are more of a concern, because the SSTable being
written technically does not exist yet. Still, better safe than sorry: if
open_data fails, or we leave an unfinished SSTable, it is still good to know
which one was the culprit.

Some argument can be made about whether we should log this at the lower SSTable
level, or at the compaction level.

The reason I am logging this at the compaction level, is that we don't really
know which exception will trigger, and where: it may be the case that we're
seeing exceptions that are not SSTable specific, and may not have the chance to
log it properly.

In particular, if the exception happens inside the reader: read_rows() and
friends only return a mutation reader, which doesn't really do anything until
we call read(). But at that time, we don't hold any pointers to the SSTable
anymore.

In summary, logging at the compaction level guarantees that we always do it no
matter what. Exceptions that are part of the main SSTable path can log the file
name as well if they want: if that's the case, we'll be left with the name
appearing twice. That's totally harmless, and better than none.

Fixes #1123

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <c5c969fb6aeb788a037bd7a4ea69979c1042cb34.1459263847.git.glauber@scylladb.com>
2016-03-29 18:15:56 +03:00
Glauber Costa
d536846433 commitlog: initialize sync period with actual sync period
commitlog's sync period is initialized as the batch period, and not as the
sync period itself as it should be.

I've found this by code inspection, but unless I am missing something
really fundamental, this seems to be completely wrong. It's been working
fine because in our defaults, I have checked that both variables default to
the same value. But it seems to me that as long as anyone would change one
of them, the behavior wouldn't be as expected.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <2e7c565242fe5d4481a3ee8b0ba425ef14f5e42a.1459252783.git.glauber@scylladb.com>
2016-03-29 15:21:02 +03:00
Takuya ASADA
a5bb6c4b1b dist/ubuntu: drop classical sysv init script, only support Upstart for Ubuntu 14.04LTS
The sysv init script was added just to prevent a warning message from lintian
and was never really used by Ubuntu users. As a result, we often break this
script, since the upstart/systemd unit files change frequently. It may
confuse users, so it's better to use Upstart only, just like Fedora/CentOS.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1459177601-20269-2-git-send-email-syuu@scylladb.com>
2016-03-29 11:48:18 +03:00
Takuya ASADA
42ce77a3b7 dist/redhat: prevent 'yum: command not found' on some Fedora environment
On some Fedora environments, such as the official Fedora AMI, the dnf-yum
package is not installed by default, which causes a "command not found" error
when we run our setup scripts.

To prevent this, we need to add dnf-yum to scylla-server package
dependency.

Fixes #1106

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1459099744-23068-1-git-send-email-syuu@scylladb.com>
2016-03-29 11:29:09 +03:00
Avi Kivity
adffb1c061 dist/ubuntu: improve handling of bad command line options
On a bad command line, Scylla will exit with an exit code of 2.  Mark it
as a "normal" exit, to prevent a respawn.

Fixes #1087
Message-Id: <1458827221-12833-1-git-send-email-avi@scylladb.com>
2016-03-29 11:14:45 +03:00
Avi Kivity
c1d8fb56f7 dist/ubuntu: specify kill timeout
Allow more time for commitlog flushing
Message-Id: <1458827216-12778-1-git-send-email-avi@scylladb.com>
2016-03-29 11:14:27 +03:00
Raphael Carvalho
d515a7fd85 sstables: fix deletion of sstable with temporary TOC
After 4e52b41a4, remove_by_toc_name() became aware of temporary TOC
files, however, it doesn't consider that some components may be
missing if temporary TOC is present.
When creating a new sstable, the first thing we do is to write all
components into temporary TOC, so content of a temporary TOC isn't
reliable until it is renamed.

Solution is about implementing the following flow (described by Avi):
"Flow should be:

  - remove all components in parallel
  - forgive ENOENT, since the component may not have been written;
otherwise deletion error should be raised
  - fsync the directory
  - delete the temporary TOC
"

This problem can be reproduced by running compaction without disk
space, so compaction would fail and leave a partial sstable that would
be marked for deletion. Afterwards, remove_by_toc_name() would try to
delete a component that doesn't exist because it looked at the content
of temporary TOC.
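
The flow quoted above can be sketched as follows. This is a minimal
illustration for a POSIX system, not the Scylla implementation; the function
names, the thread pool used for parallelism, and the way component names are
passed in are assumptions made for the example.

```python
import errno
import os
from concurrent.futures import ThreadPoolExecutor


def remove_component(path: str) -> None:
    """Delete one component file, forgiving ENOENT only."""
    try:
        os.remove(path)
    except OSError as e:
        if e.errno != errno.ENOENT:
            raise  # any other deletion error is propagated


def remove_sstable(directory: str, components: list, temp_toc: str) -> None:
    """Hypothetical helper: delete the components, then the temporary TOC."""
    paths = [os.path.join(directory, name) for name in components]
    # 1. Remove all components in parallel.
    with ThreadPoolExecutor() as pool:
        # list() forces evaluation so that exceptions surface here.
        list(pool.map(remove_component, paths))
    # 2. fsync the directory so the removals are durable.
    fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)
    # 3. Only now delete the temporary TOC.
    os.remove(os.path.join(directory, temp_toc))
```

Deleting the temporary TOC last means a crash part-way through still leaves a
marker on disk saying the sstable is incomplete.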

Fixes #1095.

Signed-off-by: Raphael Carvalho <raphaelsc@scylladb.com>
Message-Id: <0cfcaacb43cc5bad3a8a7ea6c1fa6f325c5de97d.1459194263.git.raphaelsc@scylladb.com>
2016-03-29 10:38:01 +03:00
Tomasz Grabiec
d1db23e353 storage_service: Fix typos
Message-Id: <1458837390-26634-1-git-send-email-tgrabiec@scylladb.com>
2016-03-29 10:29:04 +03:00
Pekka Enberg
994390769f Update scylla-ami submodule
* dist/ami/files/scylla-ami 89e7436...7019088 (1):
  > Re-enable clocksource=tsc on AMI
2016-03-29 10:18:06 +03:00
Takuya ASADA
201b0c6ab3 dist: re-enable clocksource=tsc on AMI
clocksource=tsc on boot parameter mistakenly dropped on b3c85aea89, need to re-enable.

[ penberg: Manual backport of commit 050fb911d5 to 1.0. ]
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1459180643-4389-1-git-send-email-syuu@scylladb.com>

(cherry picked from commit 80242ff443)
2016-03-29 10:17:41 +03:00
Pekka Enberg
227daecba6 Revert "dist: move setup scripts to /usr/sbin"
This reverts commit 989357189a because it
broke our Jenkins packaging jobs.
2016-03-29 10:17:05 +03:00
Pekka Enberg
d1ec97e76f Revert "dist: re-enable clocksource=tsc on AMI"
This reverts commit 050fb911d5 in
preparation for reverting 989357189a.
2016-03-29 10:16:48 +03:00
Takuya ASADA
050fb911d5 dist: re-enable clocksource=tsc on AMI
clocksource=tsc on boot parameter mistakenly dropped on b3c85aea89, need to re-enable.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1459180643-4389-1-git-send-email-syuu@scylladb.com>
2016-03-29 09:53:23 +03:00
Asias He
62d443a07d streaming: Fix log of plan_id and session address in stream_session
They got swapped. Fix it up. Spotted by looking at the log.
Message-Id: <d163d71e9a96d1a45c3a4c529519790eeff7c486.1459172778.git.asias@scylladb.com>
2016-03-29 09:01:06 +03:00
Nadav Har'El
a05577ca41 sstable: fix read failure of certain sstables
We had a problem reading certain existing Cassandra sstables into
Scylla.

Our consume_range_tombstone() function assumes that the start and end
columns have certain "end of component" markers, and wants to verify
that assumption. But because of bugs in older versions of Cassandra,
see https://issues.apache.org/jira/browse/CASSANDRA-7593, sometimes the
"end of component" was missing (set to 0). CASSANDRA-7593 suggested
this problem might exist on the start column, so we allowed for that,
but now we discovered a case where also the end column is set to 0 -
causing the test in consume_range_tombstone() to fail and the sstable
read to fail - causing Scylla to not be able to import that sstable from
Cassandra. Allowing for a 0 also on the end column made it possible
to read that sstable, compact it, and so on.

Fixes #1125.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1459173964-23242-1-git-send-email-nyh@scylladb.com>
2016-03-28 17:09:37 +03:00
Duarte Nunes
db881fdc8f cql: Add support for pg-style string literal
This patch adds support for pg-style string literals to the CQL
grammar.

Fixes #1078

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1459093238-2529-1-git-send-email-duarte@scylladb.com>
2016-03-28 17:06:03 +03:00
yan cui
e5d1c031ac dist: add ubuntu docker file 2016-03-28 10:14:12 +03:00
Avi Kivity
a919113fdb schema_tables: fix deadlock in cross-node communications
Seastar wrongly limits the number of concurrent submit_to()s to a single
remote shard.  This can cause an ABBA deadlock:

  fiberA                fiberB (x127)
  submit_to(0)                         # lock schema
  <- returns
                        submit_to(0)   # lock schema (waits)
  submit_to(0)                         # do work (waits)

The fiberBs wait for fiberA, which in turn waits for a fiberB to return.

While the correct fix is to remove the client-side limit and replace it
with a server-side per-verb limit, we start with a simpler fix that
replaces the blocking lock call with a non-blocking call, removing the
deadlock.

Fixes #1088.

Message-Id: <1459095357-28950-1-git-send-email-avi@scylladb.com>
2016-03-28 10:12:10 +03:00
Raphael Carvalho
e6e5999282 Fix corner-case in refresh
Problem found by dtest which loads sstables with generation 1 and 2 into an
empty column family. The root of the problem is that reshuffle procedure
changes new sstables to start from generation 2 at least. So reshuffle could
try to set generation 1 to 2 when generation 2 exists.
This problem can be fixed by starting from generation 1 instead, so reshuffle
would handle this case properly.

Fixes #1099.

Signed-off-by: Raphael Carvalho <raphaelsc@scylladb.com>
Message-Id: <88c51fbda9557a506ad99395aeb0a91cd550ede4.1458917237.git.raphaelsc@scylladb.com>
2016-03-27 10:03:32 +03:00
Avi Kivity
077c0d1022 dist: ami: fix AMI_OPT receiving no value
We assign AMI=0 and AMI_OPT=1, so in the true case, AMI_OPT has no value,
and a later compare fails.
2016-03-26 21:16:28 +03:00
Takuya ASADA
989357189a dist: move setup scripts to /usr/sbin
Since these scripts are user command, should be on $PATH.

Fixes #1092

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1458860407-25269-1-git-send-email-syuu@scylladb.com>
2016-03-25 11:50:13 +03:00
Takuya ASADA
2582dbe4a0 dist/ami: use tilde for release candidate builds
Sync with ubuntu package versioning rule

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1458882718-29317-1-git-send-email-syuu@scylladb.com>
2016-03-25 11:34:28 +03:00
Glauber Costa
e750a94300 sanity check Seastar's I/O queue configuration
While Seastar in general can accept any parameter for its I/O queues, Scylla
in particular shouldn't run with them disabled. Such will be the status when
the max-io-requests parameter is not enabled.

On top of that, we would like to have enough depth per I/O queue to allow
for shard-local parallelism. Therefore, we will require a minimum per-queue
capacity of 4. In machines where the disk iodepth is not enough to allow for 4
concurrent requests per shard, one should reduce the number of I/O queues.

For --max-io-requests, we will check the parameter itself. However, the
--num-io-queues parameter is not mandatory, and given enough concurrent
requests, Seastar's default configuration can very well just be doing the right
thing. So for that, we will check the final result of each I/O queue.

As it is the case with other checks of the sorts, this can be overridden by
the --developer-mode switch.
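
A minimal sketch of the checks described above, assuming the per-queue
capacities have already been computed elsewhere. The function name, its
parameters, and the error type are hypothetical; only the minimum of 4 and
the developer-mode override come from the text.

```python
MIN_PER_QUEUE_CAPACITY = 4  # minimum depth per I/O queue, from the text


def check_io_queues(max_io_requests, queue_capacities, developer_mode=False):
    """Return the list of problems found; raise unless developer mode is on."""
    problems = []
    # The max-io-requests parameter itself must be set.
    if max_io_requests is None:
        problems.append("max-io-requests is not set")
    # num-io-queues is optional, so check the resulting capacity per queue.
    for i, capacity in enumerate(queue_capacities):
        if capacity < MIN_PER_QUEUE_CAPACITY:
            problems.append(
                f"I/O queue {i} has capacity {capacity}, "
                f"need at least {MIN_PER_QUEUE_CAPACITY}"
            )
    if problems and not developer_mode:
        raise ValueError("; ".join(problems))
    return problems
```

With developer mode on, the problems are returned rather than raised, so a
caller could log them as warnings.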

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <63bf7e91ac10c95810351815bb8f5e94d75592a5.1458836000.git.glauber@scylladb.com>
2016-03-25 11:33:57 +03:00
Tomasz Grabiec
53bbcf4a1e schema_tables: Wait for notifications to be processed.
Listeners may defer since:

 93015bcc54 "migration_manager: Make the migration callbacks runs inside seastar thread"

Not all places were adjusted to wait for them. Fix that.

Message-Id: <1458837613-27616-1-git-send-email-tgrabiec@scylladb.com>
2016-03-24 19:04:12 +02:00
Avi Kivity
12744217b8 Initial github issue template
Message-Id: <1458817106-1513-1-git-send-email-avi@scylladb.com>
2016-03-24 15:37:00 +02:00
Benoît Canet
4ac1126677 collectd: Write to the network to get rid of spurious log messages
Closes #1018

Suggested-by: Avi Kivity <avi@scylladb.com>
Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1458759378-4935-1-git-send-email-benoit@scylladb.com>
2016-03-24 12:34:14 +02:00
Calle Wilund
ff5df306e3 database: Use disk-marking delete function in discard_sstables
Fixes #797

To make sure an inopportune crash after truncate does not leave
sstables on disk to be considered live, and thus resurrect data,
after a truncate, use delete function that renames the TOC file to
make sure we've marked sstables as dead on disk when we finish
this discard call.
Message-Id: <1458575440-505-2-git-send-email-calle@scylladb.com>
2016-03-24 12:02:08 +02:00
Calle Wilund
4e52b41a46 sstables: Add delete func to rename TOC ensuring table is marked dead
Note: "normal" remove_by_toc_name must now be prepared for and check
if the TOC of the sstable is already moved to temp file when we
get to the juicy delete parts.
Message-Id: <1458575440-505-1-git-send-email-calle@scylladb.com>
2016-03-24 12:01:53 +02:00
Asias He
6fd6e57e80 streaming: Harden keep alive timer
- Do nothing in case the session is closed, to prevent firing up the
  timer again

- Print log info when no progress has been made when the timer expires; it
  is very useful for debugging an idle session

- Grab a reference when the keep alive timer is running

Message-Id: <9f2cc3164696905a6a39c0d072a980765d598dfd.1458782956.git.asias@scylladb.com>
2016-03-24 11:58:54 +02:00
Avi Kivity
112a930f92 Merge "Bring back simplify session completion logic" from Asias
"The following patches were reverted because they were thought to break
Glauber's "Make sure repairs do not cripple incoming load" series. It turns out
these two patches just made another bug more visible. The bug is fixed in
c2eff7e824 (streaming: Complete receive task
after the flush). We can bring the two patches back now.

Passed repair_additional_test.py and update_cluster_layout_tests.py with smp 2."
2016-03-24 11:57:20 +02:00
Tomasz Grabiec
341b509f68 cql_test_env: Make initialization exception-safe
Currently start() is not prepared to handle exceptions thrown from
service initialization. It's easy to trigger such an exception by
starting two tests at the same time, which will result in socket bind
error.

Exception thrown from start() typically results in assertion failures
like this one:

  seastar::sharded<Service>::~sharded() [with Service = database]: Assertion `_instances.empty()' failed.

This patch fixes the problem by combining start() and stop() in a
single do_with() and using RAII for stopping services.

Now exceptions thrown from service initialization should stop services
in proper order and let the original exception pass
through. Example result:

  fatal error in "test_new_schema_with_no_structural_change_is_propagated": std::runtime_error: bind: Address already in use
Message-Id: <1458768018-27662-1-git-send-email-tgrabiec@scylladb.com>
2016-03-24 11:20:01 +02:00
Shlomi Livne
d3a91e737b fix a collision between --ami command line param and env
sysconfig scylla-server includes an AMI variable, and the script also used
an AMI variable. Fix this by renaming the script variable.

6a18634f9f introduced this issue since it
started importing the sysconfig scylla-server

Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
Message-Id: <0bc472bb885db2f43702907e3e40d871f1385972.1458767984.git.shlomi@scylladb.com>
2016-03-24 08:14:41 +02:00
Asias He
fe263e5436 Revert "Revert "streaming: Start to send mutations after PREPARE_DONE_MESSAGE""
This reverts commit 1f29a698d5.
2016-03-24 08:43:17 +08:00
Asias He
a6dd6e6d55 Revert "Revert "streaming: Simplify session completion logic""
This reverts commit 354fca9d56.
2016-03-24 07:48:27 +08:00
Gleb Natapov
0afd1c6f0a config: enable truncate_request_timeout_in_ms option
Option truncate_request_timeout_in_ms is used by truncate. Mark it as
used.

Message-Id: <20160323162649.GH2282@scylladb.com>
2016-03-23 18:50:24 +02:00
Yoav Kleinberger
91269d0c15 tools/scyllatop: add sums to aggregate view
the aggregate view now supports both sums and means.

Signed-off-by: Yoav Kleinberger <yoav@scylladb.com>
Message-Id: <1328af8efb113a786d7402b0704220108bfb28db.1458749600.git.yoav@scylladb.com>
2016-03-23 18:49:57 +02:00
Shlomi Livne
6a18634f9f scylla_io_setup import scylla-server env args
scylla_io_setup requires the scylla-server env to be setup to run
correctly. previously scylla_io_setup was encapsulated in
scylla-io.service that assured this.

extracting CPUSET,SMP from SCYLLA_ARGS as CPUSET is needed for invoking
io_tune

Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
Message-Id: <d49af9cb54ae327c38e451ff76fe0322e64a5f00.1458747527.git.shlomi@scylladb.com>
2016-03-23 17:54:06 +02:00
Pekka Enberg
8bf3d4f550 Merge "Make sure repairs do not cripple incoming load" from Glauber
"This series makes sure that the influence of repairs on the ongoing loads is
limited. This patch does not fix the situation completely, but it will be the
best we can do for 1.0

Here's a brief explanation about some potentially contentious points, and future work:

1) With the old parallelism semaphore in tree, we could never really drop parallelism
below 256, since even with (local) parallelism = 1, we would still have 256 vnodes. So while
the number 100 is totally empirical, we know for a fact that around 200-something, we
start having real trouble. (total) parallelism = 100 is enough to allow us to survive
a load as much as 3 times heavier than the load described in Issue944. So while it is
empirical, at least it is based on something

2) I totally support changing the checksumming algorithm. However, I would rather focus
my efforts on testing this to exhaustion than doing this at the moment. But if anybody
wants to do it, I think it is a great thing to have before 1.0. Especially because
we'll probably need a new verb for that, so we would be better off having it from the start

3) This problem was made harder due to the fact that there are three conditions really that
can affect the ongoing load. Only one of them needs to trigger for us to see degradation, so
fixing them individually will usually buy us nothing.

Those are:
    a) The disk bandwidth. Since the mutations are all together in the same memtable/commitlog
       as normal memtables, we cannot differentiate between them from the I/O Scheduler perspective.
       This is not an issue of course if the incoming mutations are not enough for us to saturate
       the disk, but especially given the highly parallel nature of repair, we usually will. If
       the commitlog queue starts getting too big, for instance, new requests will start being
       put to wait. The effect of this part of the series is to *completely* shift the high waiting
       times from those classes to the streaming ones (unfortunately compaction is still affected,
       but that's fine IMHO). With the new streaming classes, the waiting time of memtable / commitlog
       requests is still kept in the microseconds range. The streaming classes, on the other hand,
       will be in the hundreds of milliseconds range, or even seconds.

    b) The memory consumption: since the whole problem that leads to a) is the fact that due to high
       disk activity some requests will have to wait, we will end up with a lot of streaming memtables
       not yet flushed. Because of that, we will start throttling new incoming CQL requests and all
       the isolation efforts are rendered useless. Once again, due to the highly parallel nature of
       repair, this turned out to be a very easy condition to trigger. The solution proposed here
       is to limit a maximum amount of dirty memory for the repair job (in here, 25 %). This way,
       we can endure even slightly heavier loads without sweating too much.

    c) The task scheduler: repair generates a ton of requests for range checksums, and we actually
       want to keep it that way - so that the ranges checksummed are small enough so we don't have
       to resend a lot of mutations for no reason. However, if we pile up thousands of continuations
       in the task scheduler, seastar has absolutely no mechanism (right now) to prioritize between
       different kinds of requests. That means that the continuations that are supposed to be handling
       user requests will simply not run for a long time. Even if the Seastar load is less than 100 %
       that is still a problem, since that is just adding hundreds of milliseconds worth of latencies to
       any request processing.

Fixes #944 and fixes #1033."
2016-03-23 16:07:06 +02:00
Yoav Kleinberger
d2cfb86dc8 tools/scyllatop: defend against unexpected strings from collectd
Signed-off-by: Yoav Kleinberger <yoav@scylladb.com>
Message-Id: <cd7ecf6b3b82bd2027179cbec4e689a946469e9a.1458740337.git.yoav@scylladb.com>
2016-03-23 16:05:59 +02:00
Asias He
c2eff7e824 streaming: Complete receive task after the flush
A STREAM_MUTATION_DONE message will signal the receiver that the sender
has completed the sending of stream mutations. When the receiver finds
it has zero task to send and zero task to receive, it will finish the
stream_session, and in turn finish the stream_plan if all the
stream_sessions are finished. We should call receive_task_completed only
after the flush finishes so that when stream_plan is finished all the
data is on disk.

Fixes repair_disjoint_data_test issue with Glauber's "[PATCH v4 0/9] Make
sure repairs do not cripple incoming load" series

======================================================================
FAIL: repair_disjoint_data_test
(repair_additional_test.RepairAdditionalTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "scylla-dtest/repair_additional_test.py",
line 102, in repair_disjoint_data_test
    self.check_rows_on_node(node1, 3000)
  File "scylla-dtest/repair_additional_test.py",
line 33, in check_rows_on_node
    self.assertEqual(len(result), rows, len(result))
AssertionError: 2461
2016-03-23 09:40:49 -04:00
Glauber Costa
f49e965d78 repair: rework repair code so we can limit parallelism
The repair code as it is right now is a bit convoluted: it resorts to detached
continuations + do_for_each when calling sync_ranges, and deals with the
problem of excessive parallelism by employing a semaphore inside that range.

Still, even by doing that, we still generate a great number of
checksum requests because the ranges themselves are processed in parallel.

It would be better to have a single-semaphore to limit the overall parallelism
for all requests.
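
The idea of one semaphore bounding all in-flight requests can be sketched as
below. This uses Python's asyncio purely for illustration (the real code is
built on Seastar); the function names and the simulated checksum request are
assumptions.

```python
import asyncio


async def checksum_all(ranges, limit):
    """Run one (simulated) checksum request per range, with at most
    `limit` requests in flight across all ranges."""
    sem = asyncio.Semaphore(limit)  # single semaphore shared by every range
    in_flight = 0
    peak = 0

    async def checksum(r):
        nonlocal in_flight, peak
        async with sem:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0)  # stand-in for the real request
            in_flight -= 1
            return hash(r)

    # All ranges are started at once; the semaphore bounds the concurrency.
    results = await asyncio.gather(*(checksum(r) for r in ranges))
    return results, peak
```

For example, `asyncio.run(checksum_all(range(50), 5))` processes all 50 ranges
while the returned peak never exceeds 5.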

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-03-23 09:40:49 -04:00
Glauber Costa
34a9fc106f database: keep streaming memtables in their own region group
Theoretically, because we can have a lot of pending streaming memtables, we can
have the database start throttling and incoming connections slowing down during
streaming.

Turns out this is actually a very easy condition to trigger. That is basically
because the other side of the wire in this case is quite efficient in sending
us work. This situation is alleviated a bit by reducing parallelism, but not
only does it not go away completely, once we have the tools to start increasing
parallelism again it will become commonplace.

The solution for this is to limit the streaming memtables to a fraction of the
total allowed dirty memory. Using the nesting capability built into the LSA
regions, we will make the streaming region group a child of the main region
group.  With that, we can throttle streaming requests separately, while at the
same time being able to control the total amount of dirty memory as well.
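
A toy model of nested accounting, to make the parent/child relationship
concrete. It is not the LSA region_group API: the class, its methods, and the
limits used below are hypothetical, and memory is just an integer counter.

```python
class RegionGroup:
    """Tracks memory use; usage is also charged to every ancestor."""

    def __init__(self, limit, parent=None):
        self.limit = limit
        self.parent = parent
        self.used = 0

    def update(self, delta):
        # Charge (or release) memory in this group and all of its ancestors.
        group = self
        while group is not None:
            group.used += delta
            group = group.parent

    def should_throttle(self):
        # Throttle if this group or any ancestor is over its limit.
        group = self
        while group is not None:
            if group.used > group.limit:
                return True
            group = group.parent
        return False


# Example wiring: streaming is limited to a fraction of the total dirty memory.
dirty = RegionGroup(limit=1000)
streaming = RegionGroup(limit=250, parent=dirty)
```

Streaming writes are throttled once the child exceeds its own limit, while
everything charged to the child still counts against the parent's total.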

Because of the property, it can still be the case that incoming requests will
throttle earlier due to streaming - unless we allow for more dirty memory to be
used during repairs - but at least that effect will be limited.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-03-23 09:40:47 -04:00
Glauber Costa
455d5a57d2 streaming memtables: coalesce incoming writes
The repair process will potentially send ranges containing few mutations,
definitely not enough to fill a memtable. It wants to know whether or not each
of those ranges individually succeeded or failed, so we need a future for each.

Small memtables being flushed are bad, and we would like to write bigger
memtables so we can better utilize our disks.

One of the ways to fix that, is changing the repair itself to send more
mutations at a single batch. But relying on that is a bad idea for two reasons:

First, the goals of the SSTable writer and the repair sender are at odds. The
SSTable writer wants to write as few SSTables as possible, while the repair
sender wants to break down the range in pieces as small as it can and checksum
them individually, so it doesn't have to send a lot of mutations for no reason.

Second, even if the repair process wants to process larger ranges at once, some
ranges themselves may be small. So while most ranges would be large, we would
still have potentially some fairly small SSTables lying around.

The best course of action in this case is to coalesce the incoming streams
write-side.  repair can now choose whatever strategy - small or big ranges - it
wants, resting assured that the incoming memtables will be coalesced together.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-03-23 09:38:22 -04:00
Glauber Costa
5fa866223d streaming: add incoming streaming mutations to a different sstable
Keeping the mutations coming from the streaming process as mutations like any
other has a number of advantages - and that's why we do it.

However, this makes it impossible for Seastar's I/O scheduler to differentiate
between incoming requests from clients, and those that are arriving from peers
in the streaming process.

As a result, if the streaming mutations consume a significant fraction of the
total mutations, and we happen to be using the disk at its limits, we are in no
position to provide any guarantees - defeating the whole purpose of the
scheduler.

To implement that, we'll keep a separate set of memtables that will contain
only streaming mutations. We don't have to do it this way, but doing so
makes life a lot easier. In particular, to write an SSTable, our API requires
(because the filter requires it) that a good estimate of the number of partitions
be provided in advance. The partitions also need to be sorted.

We could write mutations directly to disk, but the above conditions couldn't be
met without significant effort. In particular, because mutations can be
arriving from multiple peer nodes, we can't really sort them without keeping a
staging area anyway.
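
The two writer requirements above (a partition count known up front, and
partitions in sorted order) can be illustrated with a minimal sketch. This is
not Scylla's memtable code: streaming_staging_area and its methods are
hypothetical names, keys and payloads are plain strings, and "merging" is just
concatenation.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a streaming-only memtable: partitions arriving
// from several peers are kept sorted by key until they are flushed.
class streaming_staging_area {
    std::map<std::string, std::string> _partitions; // key -> merged payload
public:
    // Mutations may arrive in any order and from any peer.
    void apply(const std::string& key, const std::string& payload) {
        assert(!key.empty());          // toy model: keys are never empty
        _partitions[key] += payload;   // toy "merge": concatenate payloads
    }

    // The writer needs the partition count before it starts writing.
    std::size_t partition_count() const { return _partitions.size(); }

    // Hand the partitions over in sorted key order and reset the staging area.
    std::vector<std::pair<std::string, std::string>> flush() {
        std::vector<std::pair<std::string, std::string>> out(
            _partitions.begin(), _partitions.end());
        _partitions.clear();
        return out;
    }
};
```

Because the map keeps keys ordered, the count and the sorted sequence are both
available at flush time regardless of how the peers interleaved their sends.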

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-03-23 09:13:00 -04:00
Glauber Costa
10c8ca6ace priority manager: separate streaming reads from writes
Streaming currently has one class, which is used for the read operations
generated by the streaming process. Those reads come from two
places:

- checksums (if doing repair)
- reading mutations to be sent over the wire.

Depending on the amount of data we're dealing with, that can generate a
significant chunk of data, with seconds worth of backlog, and if we need to
have the incoming writes intertwined with those reads, those can take a long
time.

Even if a node is only acting as a receiver, it may still read a lot: during
repairs, those reads come from the checksums.

However, in more complicated failure scenarios, it is not hard to imagine a
node that will be both sending and receiving a lot of data.

The best way to guarantee progress on both fronts is to put both kinds of
operations into different classes.

This patch introduces a new write class, and renames the old read class so it
can have a more meaningful name.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-03-23 09:12:59 -04:00
Glauber Costa
78189de57f database: make seal_on_overflow a method of the memtable_list
Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-03-23 09:12:59 -04:00
Glauber Costa
635bb942b2 database: move add_memtable as a method of the memtable_list
The column family still has to teach the memtable list how to allocate a new memtable,
since it uses CF parameters to do so.

After that, the memtable_list's constructor takes a seal and a create function and is complete.
The copy constructor can now go, since there are no users left.
The behavior of keeping a reference to the underlying memtables can also go, since we can now
guarantee that nobody is keeping references to it (it is not even a shared pointer anymore).
Individual memtables are, and users may be keeping references to them individually.
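
A minimal sketch of the shape described above, under stated assumptions: the
memtable struct, the member names and the callback signatures are illustrative
and not copied from the Scylla sources. The list owns shared pointers to
individual memtables, is itself non-copyable, and learns how to create and
seal memtables from two functions supplied by its owner.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

struct memtable { int id; };  // toy memtable, hypothetical

class memtable_list {
    using shared_memtable = std::shared_ptr<memtable>;
    std::vector<shared_memtable> _memtables;
    std::function<void()> _seal_fn;                   // supplied by the owner
    std::function<shared_memtable()> _new_memtable;   // knows the CF parameters
public:
    memtable_list(std::function<void()> seal_fn,
                  std::function<shared_memtable()> create_fn)
        : _seal_fn(std::move(seal_fn)), _new_memtable(std::move(create_fn)) {
        add_memtable();  // a list always starts with one active memtable
    }
    // No copies: nobody else holds the list itself, only individual memtables.
    memtable_list(const memtable_list&) = delete;
    memtable_list& operator=(const memtable_list&) = delete;

    void add_memtable() { _memtables.push_back(_new_memtable()); }
    shared_memtable active_memtable() const {
        assert(!_memtables.empty());
        return _memtables.back();
    }
    std::size_t size() const { return _memtables.size(); }
    void seal_active_memtable() { _seal_fn(); }
};
```

Passing the two callbacks in the constructor keeps the list independent of the
column family type while still letting the column family decide how memtables
are built and flushed.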

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-03-23 09:12:59 -04:00
Glauber Costa
6ba95d450f database: move active_memtable to memtable_list
Each list can have a different active memtable. The column family method keeps
existing, since the two separate sets of memtable are just an implementation
detail to deal with the problem of streaming QoS: *the* active memtable keeps
being the one from the main list.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-03-23 09:12:59 -04:00
Glauber Costa
af6c7a5192 database: create a class for memtable_list
memtable_list is currently just an alias for a vector of memtables.  Let's move
it to a class of its own, exporting the relevant methods to keep user code
unchanged as much as possible.

This will help us keep separate lists of memtables.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-03-23 09:12:59 -04:00
Avi Kivity
8ed95754c0 Merge seastar upstream
* seastar 9f2b868...aa281bd (7):
  > shared_promise: Add move assignment operator
  > lowres_clock: Fix stretched time
  > scripts: Delete tap with ip instead of tunctl
  > vla: Actually be exception-safe
  > vla: Ensure memory is freed if ctor throws
  > vla: Ensure memory is correctly freed
  > net: Improve error message when parsing invalid ipv4 address
2016-03-23 14:39:31 +02:00
Takuya ASADA
50db64de33 dist: drop -j2 option on .spec, make build_rpm.sh able to specify -j option
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1458678665-30273-1-git-send-email-syuu@scylladb.com>
2016-03-23 13:32:14 +02:00
Gleb Natapov
48c83163b9 init: make more initialization threaded
Since initialization now runs in a thread, the storage, messaging and
gossiper services' initialization code may take advantage of it too.

Message-Id: <20160323094732.GF2282@scylladb.com>
2016-03-23 11:53:11 +02:00
Shlomi Livne
4ecc37111f dist/ami: Use the actual number of disks instead of AWS meta service
We have seen in some cases that, when using the boto API to start
instances, the AWS metadata service
http://169.254.169.254/latest/meta-data/block-device-mapping/ returns
an incorrect number of disks. Work around that by checking the actual
number of disks using lsblk.

Also add a validation at the end, verifying that after all computations
NR_IO_QUEUES is not greater than the number of shards (we had an
issue with i2.8x).

Fixes: #1062

Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
Message-Id: <54c51cd94dd30577a3fe23aef3ce916c01e05504.1458721659.git.shlomi@scylladb.com>
2016-03-23 10:47:08 +02:00
Raphael Carvalho
370b1336fe service: fix refresh
Vlad and I were working on finding the root of the problems with
refresh. We found that refresh was deleting existing sstable files
because of a bug in a function that was supposed to return the maximum
generation of a column family.
The intention of this function is to get generation from last element
of column_family::_sstables, which is of type std::map.
However, we were incorrectly using std::map::end() to get last element,
so garbage was being read instead of maximum generation.
If the garbage value is lower than the minimum generation of a column
family, then reshuffle_sstables() would set generation of all existing
sstables to a lower value. That would confuse our mechanism used to
delete sstables because sstables loaded at boot stage were touched.
The solution to this problem is to use rbegin() instead of end() to
get the last element from column_family::_sstables.
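
The difference can be sketched with a plain std::map. The sstable_map alias
and both function names are hypothetical stand-ins for the real
column_family code; next_unused_generation mirrors the "maximum plus one"
idea used for the reshuffle procedure.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for column_family::_sstables: generation -> sstable.
using sstable_map = std::map<int64_t, std::string>;

// Buggy pattern: end() is a past-the-end iterator, so dereferencing it is
// undefined behaviour and may yield garbage:
//     return sstables.end()->first;

// Correct pattern: rbegin() points at the element with the largest key.
std::optional<int64_t> max_generation(const sstable_map& sstables) {
    if (sstables.empty()) {
        return std::nullopt;  // rbegin() must not be dereferenced when empty
    }
    return sstables.rbegin()->first;
}

// A generation that no existing sstable uses (1 when there are none).
int64_t next_unused_generation(const sstable_map& sstables) {
    return max_generation(sstables).value_or(0) + 1;
}
```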

The other problem is that refresh will only load generations that are
larger than or equal to X, so new sstables with a lower generation will
not be loaded. The solution is to create a set with the generations of
live SSTables from all shards, and to use this set to determine whether
a generation is new or not.

The last change was about providing an unused generation to reshuffle
procedure by adding one to the maximum generation. That's important to
prevent reshuffle from touching an existing SSTable.

Tested 'refresh' under the following scenarios:
1) Existing generations: 1, 2, 3, 4. New ones: 5, 6.
2) Existing generations: 3, 4, 5, 6. New ones: 1, 2.
3) Existing generations: 1, 2, 3, 4. New ones: 7, 8.
4) No existing generation. No new generation.
5) No existing generation. New ones: 1, 2.
I also had to adapt existing testcase for reshuffle procedure.

Fixes #1073.

Signed-off-by: Raphael Carvalho <raphaelsc@scylladb.com>
Message-Id: <1c7b8b7f94163d5cd00d90247598dd7d26442e70.1458694985.git.raphaelsc@scylladb.com>
2016-03-23 10:21:58 +02:00
Benoît Canet
1594bdd5bb dist/ubuntu: Fix the init script variable sourcing
The variable sourcing was crashing the init script on ubuntu.
Fix it with the suggestion from Avi.

Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1458685099-1160-1-git-send-email-benoit@scylladb.com>
2016-03-23 09:03:17 +02:00
Tomasz Grabiec
5f44afa311 cql3: batch_statement: Execute statements sequentially
Currently we execute all statements in parallel, but some statements
depend on order, in particular list append/prepend. Fix by executing
sequentially.

Fixes cql_additional_tests.py:TestCQL.batch_and_list_test dtest.

Fixes #1075.

Message-Id: <1458672874-4749-1-git-send-email-tgrabiec@scylladb.com>
2016-03-22 20:59:40 +02:00
Pekka Enberg
354fca9d56 Revert "streaming: Simplify session completion logic"
This reverts commit 208b7fa7ba. It breaks
Glauber's upcoming repair series.
2016-03-22 20:37:50 +02:00
Pekka Enberg
1f29a698d5 Revert "streaming: Start to send mutations after PREPARE_DONE_MESSAGE"
This reverts commit 4c06221766. It breaks
Glauber's upcoming repair series.
2016-03-22 20:37:22 +02:00
Avi Kivity
7df21768d6 Merge "Fix row_cache_alloc_stress test" from Tomasz
"The test predates LSA zones and was not anticipating that LSA would
take much more free memory from the system than it needs in its assertions.
Fix by accounting for the fact properly."
2016-03-22 18:46:31 +02:00
Avi Kivity
b8f80bb2be Update scylla-ami submodule
* dist/ami/files/scylla-ami 56f1ab7...89e7436 (1):
  > Merge "iotune packaging fix for scylla-ami" from Takuya
2016-03-22 17:55:00 +02:00
Takuya ASADA
dac2bc3055 dist: on scylla_io_setup, SMP and CPUSET should be empty when the parameter not present
Fixes #1060

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1458659928-2050-1-git-send-email-syuu@scylladb.com>
2016-03-22 17:49:06 +02:00
Avi Kivity
8cf785e53a Merge "Merge "iotune packaging fix" from Takuya
"This implements #1065
 - iotune will NOT be a part of scylla service - remove the scylla.io.service
 - User will have to run it manually, using a script called scylla_io_tune_setup (that will do the exact same thing the service does today).
 - If they don't, and do not use --developer-mode, scylla init will fail with a proper error and scylla will not start (in the same manner it does not start if you run scylla on a non-XFS FS)
 - For c3,m3,i2 we will use the evaluation formula we have (that takes the number of disks, cores etc.)
 - For other instances we will set --developer-mode. If the user logs into the instance, they will get a developer-mode warning
 - No iotune on AWS"

Fixes #1065.
2016-03-22 17:46:32 +02:00
Takuya ASADA
9889712d43 dist: remove scylla-io-setup.service and make it standalone script 2016-03-22 17:45:58 +02:00
Takuya ASADA
2cedab07f2 dist: on scylla_io_setup print out message both for stdout and syslog 2016-03-22 17:45:58 +02:00
Takuya ASADA
83112551bb dist: introduce dev-mode.conf and scylla_dev_mode_setup 2016-03-22 17:45:58 +02:00
Tomasz Grabiec
a4e3adfbec Fix assertion in row_cache_alloc_stress
Fixes the following assertion failure:

  row_cache_alloc_stress: tests/row_cache_alloc_stress.cc:120: main(int, char**)::<lambda()>::<lambda()>: Assertion `mt->occupancy().used_space() < memory::stats().free_memory()' failed.

memory::stats().free_memory() may be much lower than the actual
amount of reclaimable memory in the system since LSA zones will try to
keep a lot of free segments to themselves. Fix by using actual amount
of reclaimable memory in the check.
2016-03-22 16:31:04 +01:00
Tomasz Grabiec
a0cba3c86f logalloc: Introduce tracker::occupancy()
Returns occupancy information for all memory allocated by LSA, including
segment pools / zones.
2016-03-22 16:28:10 +01:00
Yoav Kleinberger
97bb7a35d9 tools/scyllatop: some sensible default metrics
Previously, if the user did not specify any metrics, scyllatop used
whatever it could find. Now we have some preset defaults which are
probably more interesting.

Signed-off-by: Yoav Kleinberger <yoav@scylladb.com>
Message-Id: <1458658804-377-1-git-send-email-yoav@scylladb.com>
2016-03-22 17:04:13 +02:00
Tomasz Grabiec
529c8b8858 logalloc: Rename tracker::occupancy() to region_occupancy() 2016-03-22 14:56:44 +01:00
Pekka Enberg
5019b709ba service/migration_manager: Simplify verb unregistration
You can safely unregister verbs even if they're not registered yet.
Simplify code in migration manager by dropping the redundant checks.
Message-Id: <1458027669-6517-1-git-send-email-penberg@scylladb.com>
2016-03-22 15:24:55 +02:00
Pekka Enberg
3e1a660839 Merge seastar upstream
* seastar c193821...9f2b868 (4):
  > memory: set free memory to non-zero value in debug mode
  > Merge "Increase IOTune's robustness by including a timeout" from Glauber
  > shared_future: add companion class, shared_promise
  > rpc: fix client connection stopping
2016-03-22 15:16:21 +02:00
Asias He
4c06221766 streaming: Start to send mutations after PREPARE_DONE_MESSAGE
Below are 3 possible cases in a stream session. After commit
208b7fa7ba (streaming: Simplify session completion logic), we might
close the session before the exchange of the PREPARE_DONE_MESSAGE
message in case 1). To fix, we defer the sending of mutations until after
PREPARE_DONE_MESSAGE is sent at the initiator node.

1)
Initiator         Follower
tx rx              tx rx
1  0               0  1
send prepare
                   send back prepare
recv prepare
send mutations (close the session before prepare_done msg is sent)
                  recv mutations (close session before prepare_done msg is received)
send prepare_done
                   recv prepare_done and send no mutations
2)
Initiator         Follower
tx rx              tx rx
0  1               1  0
send prepare
                    send back prepare
recv prepare
nothing to send
send prepare_done
                    recv prepare_done and send mutations (close session)
recv mutations (close session)

3)
Initiator         Follower
tx rx              tx rx
1  1               1  1
send prepare
                    send back prepare
recv prepare
send mutations
                    recv mutations, can not close session since we have mutations to send
send prepare_done
                    recv prepare_done and send mutations (close session)
recv mutations (close session)
Message-Id: <d6510b558565db23202164fa491b883ef3796e58.1458634037.git.asias@scylladb.com>
2016-03-22 15:05:57 +02:00
Takuya ASADA
6b2a8a2f70 dist: enable collectd on scylla_setup by default, to make scyllatop usable
Fixes #1037

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1458324769-9152-1-git-send-email-syuu@scylladb.com>
2016-03-22 15:02:18 +02:00
Tomasz Grabiec
ca08db504b managed_bytes: Make operator[] work for large blobs as well
Fixes assertion in mutation_test:

mutation_test: ./utils/managed_bytes.hh:349: blob_storage::char_type* managed_bytes::data(): Assertion `!_u.ptr->next'

Introduced in ea7c2dd085

Message-Id: <1458648786-9127-1-git-send-email-tgrabiec@scylladb.com>
2016-03-22 14:43:52 +02:00
Gleb Natapov
1e6352e398 messaging: do not admit new requests during messaging service shutdown.
Sending a message may open a new client connection, which will never be
closed if the messaging service is already shutting down.

Fixes #1059

Message-Id: <1458639452-29388-3-git-send-email-gleb@scylladb.com>
2016-03-22 13:00:18 +02:00
Gleb Natapov
357c91a076 messaging: do not delete client during messaging service shutdown
Messaging service stop() method calls stop() on all clients. If
remove_rpc_client_one() is called while those stops are running,
client::stop() will be called twice, which is not supposed to happen. Fix it
by ignoring client remove requests during messaging service shutdown.

Fixes #1059

Message-Id: <1458639452-29388-2-git-send-email-gleb@scylladb.com>
2016-03-22 13:00:18 +02:00
Asias He
b8abd88841 messaging_service: Take reference of ms in send_message_timeout_and_retry
Take a reference of messaging_service object inside
send_message_timeout_and_retry to make sure it is not freed during the
lifetime of the send_message_timeout_and_retry operation.
2016-03-22 12:32:19 +02:00
Pekka Enberg
ae33e9fe76 dist/ubuntu: Use tilde for release candidate builds
The version number ordering rules are different for rpm and deb. Use
tilde ('~') for the latter to ensure a release candidate is ordered
_before_ a final version.

Message-Id: <1458627524-23030-1-git-send-email-penberg@scylladb.com>
2016-03-22 11:52:05 +02:00
Avi Kivity
5a20a70728 Merge "CQL syntax extension to handle sstable loader lists" from Calle
"Adds an extension function SCYLLA_TIMEUUID_LIST_INDEX to CQL syntax
for collection element indexing, which, if the target is a list,
will attempt to directly index the list (which is really a map)
by the ordering time uuid (as index parameter)."
2016-03-22 11:42:47 +02:00
Duarte Nunes
36571a2018 init: Trim spaces in seeds list
This patch ensures we are resilient against spaces before or after IP
addresses in the seeds list.
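
As an illustration only (this is not the code from the patch), trimming each
entry of a comma-separated seeds string might look like the sketch below. The
function names trim and parse_seeds are hypothetical, and skipping empty
entries is an assumption of the sketch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Strip leading and trailing spaces/tabs from a single entry.
std::string trim(const std::string& s) {
    const char* ws = " \t";
    auto first = s.find_first_not_of(ws);
    if (first == std::string::npos) {
        return "";  // entry was empty or whitespace only
    }
    auto last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Split a comma-separated seeds string and trim every address.
std::vector<std::string> parse_seeds(const std::string& seeds) {
    std::vector<std::string> out;
    std::string::size_type start = 0;
    while (start <= seeds.size()) {
        auto comma = seeds.find(',', start);
        if (comma == std::string::npos) {
            comma = seeds.size();
        }
        auto entry = trim(seeds.substr(start, comma - start));
        if (!entry.empty()) {
            out.push_back(entry);  // assumption: empty entries are skipped
        }
        start = comma + 1;
    }
    return out;
}
```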

Fixes #958

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1458637617-5761-1-git-send-email-duarte@scylladb.com>
2016-03-22 11:10:29 +02:00
Avi Kivity
1798889e85 Merge "Make apply() exception-safe" from Tomasz
"We cannot leave partially applied mutation behind when the write
fails. It may fail if memory allocation fails in the middle of
apply(). This would, for example, violate write atomicity: readers
should either see the whole write or none at all.

This fix makes apply() revert partially applied data upon failure, by
means of the ReversiblyMergeable concept. In a nutshell, the idea is
to store old state in the source mutation as we apply it and swap back
in case of exception. At cell level this swapping is inexpensive, just
rewiring pointers. For this to work, the source mutation needs to be
brought into mutable form, so frozen mutations need to be unfrozen. In
practice this doesn't increase the amount of cell allocations in the
memtable apply path because incoming data will usually be newer and we
will have to copy it into LSA anyway. There are extra allocations,
though, for the data structures which hold cells.

I didn't see significant change in performance of:

    build/release/tests/perf/perf_simple_query -c1 -m1G --write --duration 13

The score fluctuates around ~77k ops/s.

The change was tested with a unit test (patch to mutation_test) which generates
random mutations and injects allocation failures at every possible allocation
site in the apply path. This also uncovered other preexisting bugs."
2016-03-22 10:43:41 +02:00
Gleb Natapov
ea92064d38 avoid invoke_on_all during developer-mode application if possible
Message-Id: <20160315145327.GW6117@scylladb.com>
2016-03-22 10:40:30 +02:00
Nadav Har'El
2eb0627665 sstable: fix use-after-free of temporary ioclass copy
Commit 6a3872b355 fixed some use-after-free
bugs but introduced a new one because of a typo:

Instead of capturing a reference to the long-living io-class object, as
all the code does, one place in the code accidentally captured a *copy*
of this object. This copy had a very temporary life, and when a reference
to that *copy* was passed to sstable reading code which assumed that it
lives at least as long as the read call, a use-after-free resulted.

Fixes #1072

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1458595629-9314-1-git-send-email-nyh@scylladb.com>
2016-03-21 22:28:05 +01:00
Tomasz Grabiec
6e73c3f3dc perf_simple_query: Make duration configurable 2016-03-21 21:49:53 +01:00
Tomasz Grabiec
2fbb55929d mutation_test: Add allocation failure stress test for apply()
The test injects allocation failures at every allocation site during
apply(). Only allocations through allocation_strategy are instrumented,
but currently those should include all allocations in the apply() path.

The target and source mutations are randomized.
2016-03-21 21:49:53 +01:00
Tomasz Grabiec
8ede27f9c6 mutation_test: Add more apply() tests 2016-03-21 21:49:53 +01:00
Tomasz Grabiec
36575d9f01 mutation_test: Hoist make_blob() to a function 2016-03-21 21:49:53 +01:00
Tomasz Grabiec
4c85d06df7 mutation_test: Make make_blob() return different blob each time
random_bytes was constructed with the same seed each time.
2016-03-21 21:49:53 +01:00
Tomasz Grabiec
19b3df9f0f mutation_test: Fix use-after-free
The problem was that verify_row() was returning a future which was not
waited on. Fix by running the code in a thread.
2016-03-21 21:49:53 +01:00
Tomasz Grabiec
a7966e9b71 mutation_partition: Fix friend declarations
Missing "class" confuses CLion IDE.
2016-03-21 21:49:53 +01:00
Tomasz Grabiec
dc290f0af7 mutation_partition: Make apply() atomic even in case of exception
We cannot leave partially applied mutation behind when the write
fails. It may fail if memory allocation fails in the middle of
apply(). This would, for example, violate write atomicity: readers
should either see the whole write or none at all.

This fix makes apply() revert partially applied data upon failure, by
means of the ReversiblyMergeable concept. In a nutshell, the idea is
to store old state in the source mutation as we apply it and swap back
in case of exception. At cell level this swapping is inexpensive, just
rewiring pointers. For this to work, the source mutation needs to be
brought into mutable form, so frozen mutations need to be unfrozen. In
practice this doesn't increase the amount of cell allocations in the
memtable apply path because incoming data will usually be newer and we
will have to copy it into LSA anyway. There are extra allocations,
though, for the data structures which hold cells.
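
The swap-and-revert idea can be sketched on a toy row. Everything here is
hypothetical: the cell struct, the "newer timestamp wins" rule, the function
names and the fail_at injection point are simplifications for illustration,
and the per-cell flag only loosely mirrors the REVERT flag added elsewhere in
this series.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <new>
#include <utility>
#include <vector>

struct cell {
    long timestamp;
    int value;
    bool revert = false;  // set on a source cell that now holds old state
};

// Merge src into dst. If src wins, swap so that src keeps the old dst state.
void apply_reversibly(cell& dst, cell& src) {
    if (src.timestamp > dst.timestamp) {
        std::swap(dst, src);
        src.revert = true;
    }
}

// Undo apply_reversibly(): swap the old state back if a swap happened.
void revert(cell& dst, cell& src) noexcept {
    if (src.revert) {
        src.revert = false;
        std::swap(dst, src);
    }
}

constexpr std::size_t no_failure = std::numeric_limits<std::size_t>::max();

// Apply a whole row; fail_at simulates an allocation failure at that cell.
void apply_row(std::vector<cell>& dst, std::vector<cell>& src, std::size_t fail_at) {
    assert(dst.size() == src.size());
    std::size_t done = 0;
    try {
        for (; done < dst.size(); ++done) {
            if (done == fail_at) {
                throw std::bad_alloc();
            }
            apply_reversibly(dst[done], src[done]);
        }
    } catch (...) {
        // Roll back the cells already applied, newest first, then rethrow.
        while (done > 0) {
            --done;
            revert(dst[done], src[done]);
        }
        throw;
    }
}
```

After a failed apply_row() both rows are back in their original state, so a
reader never observes a half-applied write in this model.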

I didn't see significant change in performance of:

  build/release/tests/perf/perf_simple_query -c1 -m1G --write --duration 13

The score fluctuates around ~77k ops/s.

Fixes #283.
2016-03-21 21:49:52 +01:00
Tomasz Grabiec
e09d186c7c mutation_partition: Make intrusive sets ReversiblyMergeable 2016-03-21 21:49:52 +01:00
Tomasz Grabiec
f1a4feb1fc mutation_partition: Make row_tombstones_entry ReversiblyMergeable 2016-03-21 19:26:24 +01:00
Tomasz Grabiec
e4a576a90f mutation_partition: Make rows_entry ReversiblyMergeable 2016-03-21 19:26:24 +01:00
Tomasz Grabiec
aadcd75d89 mutation_partition: Make row_marker ReversiblyMergeable 2016-03-21 19:26:24 +01:00
Tomasz Grabiec
ea7c2dd085 mutation_partition: Make row ReversiblyMergeable 2016-03-21 19:26:24 +01:00
Tomasz Grabiec
c9d4f5a49c atomic_cell_or_collection: Introduce as_atomic_cell_ref()
Needed for setting the REVERT flag on existing cell.
2016-03-21 19:25:54 +01:00
Tomasz Grabiec
1ffe06165d atomic_cell_hash: Specialize appending_hash<> for atomic_cell and collection_mutation 2016-03-21 18:41:27 +01:00
Tomasz Grabiec
bfc6413414 atomic_cell: Add REVERT flag
Needed to make atomic cells ReversiblyMergeable.
2016-03-21 18:41:27 +01:00
Tomasz Grabiec
7fcfa97916 tombstone: Make ReversiblyMergeable 2016-03-21 18:41:27 +01:00
Tomasz Grabiec
1407173186 Introduce the concept of ReversiblyMergeable 2016-03-21 18:41:27 +01:00
Tomasz Grabiec
9fc7f8a5ed mutation_partition: row: Add empty() 2016-03-21 18:41:27 +01:00
Tomasz Grabiec
d5e66a5b0d mutation_partition: row: Allow storing empty cells internally
Currently only the "set" storage can store empty cells; the "vector" one
cannot, because there an empty cell means a missing cell. To implement
rollback, we need to be able to distinguish empty cells from missing ones.
Solve this by making vector storage use a bitmap for presence checking
instead of emptiness. This adds 4 bytes to vector storage.
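
A minimal sketch of presence tracking with a 32-bit bitmap, assuming at most
32 columns in vector storage (an assumption of this sketch, chosen to match
the 4 bytes mentioned above). The class and method names are hypothetical and
cells are modelled as strings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

class vector_storage {
    static constexpr std::size_t max_columns = 32;
    std::uint32_t _present = 0;        // bit i set => column i has a cell
    std::vector<std::string> _cells;   // an empty string is a valid cell
public:
    void set(std::size_t column, std::string value) {
        assert(column < max_columns);
        if (column >= _cells.size()) {
            _cells.resize(column + 1);
        }
        _cells[column] = std::move(value);
        _present |= std::uint32_t{1} << column;
    }
    void remove(std::size_t column) {
        if (column < _cells.size()) {
            _cells[column].clear();
            _present &= ~(std::uint32_t{1} << column);
        }
    }
    // Presence comes from the bitmap, not from the cell being non-empty.
    bool has(std::size_t column) const {
        return column < max_columns && ((_present >> column) & 1u) != 0;
    }
    const std::string* find(std::size_t column) const {
        return has(column) ? &_cells[column] : nullptr;
    }
};
```

With this layout, set(2, "") leaves column 2 present but empty, while columns
0 and 1 stay missing even though the underlying vector has slots for them.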
2016-03-21 18:41:27 +01:00
Tomasz Grabiec
ed1e6515db mutation_partition: Make row::merge() tolerate empty row
The row may be empty and still have a set storage, in which case
rbegin() dereference is undefined behavior.
2016-03-21 18:41:27 +01:00
Tomasz Grabiec
184e2831e7 managed_bytes: Mark move-assignment noexcept 2016-03-21 18:41:27 +01:00
Tomasz Grabiec
92d4cfc3ab managed_bytes: Make copy assignment exception-safe 2016-03-21 18:41:27 +01:00
Tomasz Grabiec
22d193ba9f managed_bytes: Make linearization_context::forget() noexcept
It is needed for noexcept destruction, which we need for exception
safety in higher layers.

According to [1], erase() only throws if key comparison throws, and in
our case it doesn't.

[1] http://en.cppreference.com/w/cpp/container/unordered_map/erase
2016-03-21 18:41:27 +01:00
Tomasz Grabiec
87d7279267 mutation: Add copy assignment operator
We already have a copy constructor, so can have copy assignment as
well.
2016-03-21 18:41:27 +01:00
Shlomi Livne
b7e338275b fix centos local ami creation (revert some changes)
On CentOS we do not have a version file created; revert these changes,
which were introduced when adding Ubuntu AMI creation.

Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
Message-Id: <69c80dcfa7afe4f5db66dde2893d9253a86ac430.1458578004.git.shlomi@scylladb.com>
2016-03-21 18:41:40 +02:00
Asias He
208b7fa7ba streaming: Simplify session completion logic
Both the initiator and follower of a stream session know how many
transfer tasks and receive tasks the stream session contains in the
preparation phase. They use the _transfers and _receivers maps to track
the tasks, like below:

       std::map<UUID, stream_transfer_task> _transfers;
       std::map<UUID, stream_receive_task> _receivers;

A stream_transfer_task will send STREAM_MUTATION verb to transfer data
with frozen_mutation, when all the STREAM_MUTATIONs are sent, it will
send STREAM_MUTATION_DONE to tell the peer the stream_transfer_task is
completed and remove the stream_transfer_task from _transfers map.  The
peer will remove the corresponding stream_receive_task in _receivers.
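
A rough sketch of the bookkeeping described above, under the assumption that
a session counts as complete once both maps are empty. The class name, the
method names and the use of std::string in place of UUID are hypothetical;
only the two map members follow the message.

```cpp
#include <cassert>
#include <map>
#include <string>

struct stream_transfer_task {};
struct stream_receive_task {};

class stream_session_sketch {
    // std::string stands in for UUID in this sketch.
    std::map<std::string, stream_transfer_task> _transfers;
    std::map<std::string, stream_receive_task> _receivers;
public:
    // Both sides learn the full task set during the preparation phase.
    void add_transfer(const std::string& id) { _transfers.emplace(id, stream_transfer_task{}); }
    void add_receiver(const std::string& id) { _receivers.emplace(id, stream_receive_task{}); }

    // Local side: all STREAM_MUTATIONs and STREAM_MUTATION_DONE were sent.
    void transfer_done(const std::string& id) { _transfers.erase(id); }
    // Peer side: STREAM_MUTATION_DONE for this task was received.
    void receive_done(const std::string& id) { _receivers.erase(id); }

    // Assumption: nothing left to send or receive means the session is done.
    bool completed() const { return _transfers.empty() && _receivers.empty(); }
};
```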

We do not really need the COMPLETE_MESSAGE verb to notify the peer we
have completed sending. It makes the session completion logic much
simpler and cleaner if we do not depend on COMPLETE_MESSAGE verb.

However, to be compatible with older versions, we always send a
COMPLETE_MESSAGE message, do nothing in the COMPLETE_MESSAGE handler,
and reply with a ready future even if the stream_session is closed already.
This way, a node with an older version will get a COMPLETE_MESSAGE message
and manage to send a COMPLETE_MESSAGE message to the new node as before.

Message-Id: <1458540564-34277-2-git-send-email-asias@scylladb.com>
2016-03-21 16:58:03 +02:00
Pekka Enberg
4892a6ded9 build: Invoke Seastar build only once
Make sure we invoke the Seastar ninja build only once from our own build
process so that we don't have multiple ninjas racing with each other.

Refs #1061.

Message-Id: <1458563076-29502-1-git-send-email-penberg@scylladb.com>
2016-03-21 16:22:11 +02:00
Takuya ASADA
6edd909b00 dist: stop using '-p' option on lsblk since Ubuntu doesn't supported it
On scylla_setup interactive mode we are using lsblk to list up candidate
block devices for RAID, and -p option is to print full device paths.

Since the Ubuntu 14.04LTS version of lsblk doesn't support this option, we
need to use non-full path names and complete the paths before passing them to
scylla_raid_setup.

Fixes #1030

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1458325411-9870-1-git-send-email-syuu@scylladb.com>
2016-03-21 14:54:36 +02:00
Calle Wilund
5982c0ee10 Cql.g: Add extension function SCYLLA_TIMEUUID_LIST_INDEX
Allows scylla sstable loader (cql) to do by-uuid updates to
non-frozen lists.
2016-03-21 12:28:37 +00:00
Calle Wilund
5b570c417b cql3::operation: Allow set_element to be "by uuid" (for lists)
Just add an instantiation flag to keep track. Then choose actual
operation to perform in prepare.
2016-03-21 12:28:37 +00:00
Calle Wilund
71170f51a8 cql3::lists: Add setter_by_uuid operation
Allows direct setting of list element by UUID key
2016-03-21 12:28:36 +00:00
Asias He
39992dd559 gossip: Sync gossip_digest.idl.hh and application_state.hh
We did the clean up in idl/gossip_digest.idl.hh, but the patch to clean
up gms/application_state.hh was never merged.

To maintain compatibility with previous versions of scylla, we cannot
change application_state.hh; instead, change the idl to be in sync with
application_state.hh.

Message-Id: <3a78b159d5cb60bc65b354d323d163ce8528b36d.1458557948.git.asias@scylladb.com>
2016-03-21 13:07:22 +02:00
Pekka Enberg
bcdd034512 dist/ubuntu: Install wget package if it's not available
The build scripts use wget so make sure it's actually installed on the
machine.

Message-Id: <1458554706-14558-1-git-send-email-penberg@scylladb.com>
2016-03-21 12:36:52 +02:00
Asias He
7acc9816d2 gossip: Handle unknown application_state when printing
In case an unknown application_state is received, we should be able to
handle it when printing.

Message-Id: <98d2307359292e90c8925f38f67a74b69e45bebe.1458553057.git.asias@scylladb.com>
2016-03-21 11:59:04 +02:00
Asias He
28ccd866e2 streaming: Move ranges in stream_plan
The ranges are not used afterwards. We can move instead of copy.
Message-Id: <1458540564-34277-1-git-send-email-asias@scylladb.com>
2016-03-21 10:10:09 +01:00
Avi Kivity
e1e4766cc6 Merge "Ubuntu based AMI support" from Takuya
"This provides Ubuntu based AMI support.
With this patchset, you will be able to run build_ami.sh on Ubuntu 14.04LTS."
2016-03-20 20:40:21 +02:00
Raphael Carvalho
de4b4e593d db: better handling of failure in column_family::populate
Improve handling of failure by saving first exception and ignoring
the remaining futures. At the moment, code only throws first
exception and doesn't care about any possible remaining future.
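
The "save the first exception, still drain the rest" pattern can be sketched
without Seastar. In this illustration the pending operations are modelled as
plain callables rather than futures, and run_all_keep_first_error is a
hypothetical name.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <vector>

// Run every task to completion; remember only the first failure and
// rethrow it once nothing is left pending.
void run_all_keep_first_error(const std::vector<std::function<void()>>& tasks) {
    std::exception_ptr first;
    for (const auto& task : tasks) {
        try {
            task();
        } catch (...) {
            if (!first) {
                first = std::current_exception();
            }
            // Later failures are deliberately ignored.
        }
    }
    if (first) {
        std::rethrow_exception(first);
    }
}
```

The point of the design is that no operation is abandoned half-way: the
caller sees one error, but only after every task has finished.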

Signed-off-by: Raphael Carvalho <raphaelsc@scylladb.com>
Message-Id: <383dc4445db09dd2fbce093d4609a0a0bc38a405.1458240398.git.raphaelsc@scylladb.com>
2016-03-20 17:33:20 +02:00
Avi Kivity
7869a48c31 Update scylla-ami submodule
* dist/ami/files/scylla-ami 84bcd0d...56f1ab7 (2):
  > Ubuntu AMI support on scylla_install_ami
  > scylla_ami_setup is not POSIX sh compatible, change shebang to /bin/bash
2016-03-20 17:26:03 +02:00
Takuya ASADA
769204d41e dist: allow more requests for i2 instances
i2 instances have better performance than others, so allow more requests.
Fixes #921

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1458251067-1533-1-git-send-email-syuu@scylladb.com>
2016-03-20 17:24:52 +02:00
Tomasz Grabiec
c518e852ee modification_statement: Use result_view::do_with()
Reduces code duplication.

Message-Id: <1458336592-22065-1-git-send-email-tgrabiec@scylladb.com>
2016-03-20 15:14:28 +02:00
Avi Kivity
6d031b4c6b Merge seastar upstream
* seastar 6a207e1...c193821 (6):
  > semaphore: allow wait() and signal() after broken()
  > run reactor::stop() only once
  > sharded: fix start with reference parameter
  > core: add asserts to rwlock
  > util/defer: Fix cancel() not being respected
  > tcp: Do not return accept until the connection is connected
2016-03-20 13:32:18 +02:00
Tomasz Grabiec
8134992024 mutation_partition: Add cell_entry constructor which makes an empty cell 2016-03-18 22:30:04 +01:00
Tomasz Grabiec
518e956736 mutation_partition: Make row::vector_to_set() exception-safe
Currently allocation failure can leave the old row in a
half-moved-from state and leak cell_entry objects.
2016-03-18 22:30:04 +01:00
Tomasz Grabiec
c91eefa183 mutation_partition: Unmark cell_entry's copy constructor as noexcept
It was a mistake, it certainly may throw because it copies cells.
2016-03-18 22:30:04 +01:00
Glauber Costa
e52b869b25 fix small typo
will sent -> will send

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20eaf0cea6fe14b03332547b7c4a3b85e9b619e7.1458325926.git.glauber@scylladb.com>
2016-03-18 20:34:22 +02:00
Takuya ASADA
a6cd085c38 dist: allow to run 'sudo scylla_ami_setup' for Ubuntu AMI
Allows to run scylla_ami_setup from scylla-server.conf

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-03-18 05:57:50 +09:00
Takuya ASADA
7828023599 dist: launch scylla_ami_setup on Ubuntu AMI
Since upstart does not have same behavior as systemd, we need to run scylla_io_setup and scylla_ami_setup in scylla-server.conf's pre-start stanza.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-03-18 05:57:50 +09:00
Takuya ASADA
93bf7bff8e dist: fix broken scylla_install_pkg --local-pkg and --unstable on Ubuntu
The --local-pkg and --unstable arguments were not handled on Ubuntu; support them.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-03-18 05:57:50 +09:00
Takuya ASADA
0c83b34d0c dist: prevent to show up dialog on apt-get in scylla_raid_setup
"apt-get -y install mdadm" shows up a dialog to select install mode of postfix, this will block scylla-ami-setup.service forever since it is running as background task, we need to prevent it.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-03-18 05:57:50 +09:00
Takuya ASADA
b097ed6d75 dist: Ubuntu based AMI support
This introduces Ubuntu AMI.
Both the CentOS AMI and the Ubuntu AMI need to be built on the same distribution, so the build_ami.sh script automatically detects the current distribution and selects the base AMI image.

Fixes #998

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-03-18 05:57:40 +09:00
Takuya ASADA
4cc589872d dist: follow sysconfig setting when counting number of cpus on scylla_io_setup
When NR_CPU >= 8, we disabled cpu0 for AMI on scylla_sysconfig_setup.
But scylla_io_setup doesn't know that and tries to assign NR_CPU queues, so scylla fails to start because queues > cpus.
With this fix, scylla_io_setup checks the sysconfig settings: if '--smp <n>' is specified in SCYLLA_ARGS, it uses n to limit the number of queues.
Also, when the instance type has no pre-configured parameters, we need to pass the --cpuset parameter to iotune. Otherwise iotune will run on a different set of CPUs, which may have different performance characteristics.

Fixes #996, #1043, #1046

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1458221762-10595-2-git-send-email-syuu@scylladb.com>
2016-03-17 16:44:46 +02:00
Takuya ASADA
6f71173827 dist: On scylla_sysconfig_setup, don't disable cpu0 on non-AMI environments
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1458221762-10595-1-git-send-email-syuu@scylladb.com>
2016-03-17 16:44:45 +02:00
Benoît Canet
3b1d3d977d exceptions: Shutdown communications on non file I/O errors
Apply the same treatment to non file filesystem I/O errors.

Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1458154098-9977-2-git-send-email-benoit@scylladb.com>
2016-03-17 15:02:54 +02:00
Benoît Canet
1fb9a48ac5 exception: Optionally shutdown communication on I/O errors.
I/O errors cannot be fixed by Scylla; the only solution
is to shut down the database communications.

Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1458154098-9977-1-git-send-email-benoit@scylladb.com>
2016-03-17 15:02:52 +02:00
Pekka Enberg
69dacf9063 main: Fix broadcast_address and listen_address validation errors
Fix the validation error message to look like this:

  Scylla version 666.development-20160316.49af399 starting ...
  WARN  2016-03-17 12:24:15,137 [shard 0] config - Option partitioner is not (yet) used.
  WARN  2016-03-17 12:24:15,138 [shard 0] init - NOFILE rlimit too low (recommended setting 200000, minimum setting 10000; you may run out of file descriptors.
  ERROR 2016-03-17 12:24:15,138 [shard 0] init - Bad configuration: invalid 'listen_address': eth0: boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::system::system_error> > (Invalid argument)
  Exiting on unhandled exception of type 'bad_configuration_error': std::exception

Instead of:

  Exiting on unhandled exception of type 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::system::system_error> >': Invalid argument

Fixes #1051.

Message-Id: <1458210329-4488-1-git-send-email-penberg@scylladb.com>
2016-03-17 14:59:00 +02:00
Tomasz Grabiec
b9af32c9d5 Merge branch 'pdziepak/fix-lsa-memory-accounting/v1' from seastar-dev.git
Memory accounting fix from Paweł.
2016-03-17 12:55:21 +01:00
Paweł Dziepak
13849fd129 tests/lsa: add test for region groups
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-03-17 11:20:22 +00:00
Paweł Dziepak
ed53784cb6 tests/lsa: do not leak memory in large allocation test
The large allocation test, unsurprisingly, allocates a lot of memory. Do
not leak it, so that any tests that run afterwards still have some
memory left.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-03-17 11:19:13 +00:00
Paweł Dziepak
338fd34770 lsa: update _closed_occupancy after freeing all segments
_closed_occupancy will be used when a region is removed from its region
group, make sure that it is accurate.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-03-17 11:12:05 +00:00
Pekka Enberg
0434bc3d33 dist: Fix '--developer-mode' parsing in scylla_io_setup
We need to support the following variations:

   --developer-mode true
   --developer-mode 1
   --developer-mode=true
   --developer-mode=1

Fixes #1026.
Message-Id: <1458203393-26658-1-git-send-email-penberg@scylladb.com>
2016-03-17 09:58:34 +01:00
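
The four accepted spellings listed in the commit above can be illustrated with a small sketch. The real fix lives in the scylla_io_setup shell script; the C++ function below, including its name `developer_mode_enabled`, is a hypothetical stand-in written only to show the two forms (separate value and `=`-joined value) being handled the same way.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not taken from scylla_io_setup: returns true if any
// occurrence of --developer-mode carries the value "true" or "1".
bool developer_mode_enabled(const std::vector<std::string>& args) {
    const std::string flag = "--developer-mode";
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string& arg = args[i];
        std::string value;
        if (arg == flag) {
            // Form 1: "--developer-mode <value>", value is the next argument.
            if (i + 1 == args.size()) {
                break; // flag given without a value
            }
            value = args[i + 1];
        } else if (arg.compare(0, flag.size() + 1, flag + "=") == 0) {
            // Form 2: "--developer-mode=<value>", value follows the '='.
            value = arg.substr(flag.size() + 1);
        } else {
            continue;
        }
        if (value == "true" || value == "1") {
            return true;
        }
    }
    return false;
}
```

Calling `developer_mode_enabled({"--developer-mode=1"})` and `developer_mode_enabled({"--developer-mode", "true"})` both take the same final comparison, which is the point of the fix: the value is extracted first and interpreted afterwards.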
Pekka Enberg
972fc6e014 main: Defer API server hooks until commitlog replay
Defer registering services to the API server until commitlog has been
replayed to ensure that nobody is able to trigger sstable operations via
'nodetool' before we are ready for them.
Message-Id: <1458116227-4671-1-git-send-email-penberg@scylladb.com>
2016-03-17 10:04:35 +02:00
Takuya ASADA
95161d5db7 dist: add scylla-gdb.py on Ubuntu dbg package
Fixes #969

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1458150248-10632-1-git-send-email-syuu@scylladb.com>
2016-03-17 09:03:00 +02:00
Pekka Enberg
303dd76205 Merge "Fix debug messages for streaming session" from Glauber
"One of the messages is printed twice, and one of the verbs is missing
 a message. That makes it hard to debug the session."
2016-03-17 08:11:50 +02:00
Glauber Costa
a3ebf640c6 stream_session: print debug message for STREAM_MUTATION
For this verb(), we don't call get_session - and it doesn't look like we will.
We currently have no debug message for this one, which makes it harder to debug
the stream of messages. Print it.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-03-16 22:09:46 -04:00
Glauber Costa
0ab4275893 stream_session: remove duplicated debug message
Whenever we call get_session, that will print a debug message about the arrival
of this new verb. Because we also print that explicitly in PREPARE_DONE, that
message gets duplicated.

That confuses poor developers who are, for a while, left wondering why
the sender is sending the message twice.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-03-16 22:04:25 -04:00
Glauber Costa
6a3872b355 sstables: do not assume mutation_reader will be kept alive
Our sstables::mutation_reader has a specialization in which start and end
ranges are passed as futures. That is needed because we may have to read the
index file for those.

This works well under the assumption that every time a mutation_reader will be
created it will be used, since whoever is using it will surely keep the state
of the reader alive.

However, that assumption has not been true for a while. We use a reader
interface for reading everything from mutations and sstables to cache entries,
and when we create an sstable mutation_reader, that does not mean we'll use it.
In fact we won't, if the read can be serviced first by a higher level entity.

If that happens to be the case, the reader will be destructed. However, since
it may take more time than that for the start and end futures to resolve, by
the time they are resolved the state of the mutation reader will no longer be
valid.

The proposed fix for that is to only resolve the future inside
mutation_reader's read() function. If that function is called, we can have a
reasonable expectation that the caller object is being kept alive.

A second way to fix this would be to force the mutation reader to be kept alive
by transforming it into a shared pointer and acquiring a reference to itself.
However, because the reader may turn out not to be used, the delayed read
actually has the advantage of not even reading anything from the disk if there
is no need for it.

Also, because sstables can be compacted, we can't guarantee that the sst object
itself, used in the resolution of start and end, is still alive, which is the
same problem. Delaying those calls solves that problem as well. We assume here
that the outer reader is keeping the SSTable object alive.

I must note that I have not reproduced this problem. What goes above is the
result of the analysis we have made in #1036. That being the case, a thorough
review is appreciated.

Fixes #1036

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <a7e4e722f76774d0b1f263d86c973061fb7fe2f2.1458135770.git.glauber@scylladb.com>
2016-03-16 17:51:02 +02:00
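
The idea of deferring the range resolution until read() is called can be sketched in isolation. This is a simplified illustration, not the Scylla implementation: it uses a plain `std::function` where the real reader uses Seastar futures, and the class name `lazy_range_reader` and the integer range are assumptions made for the example.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical stand-in for a reader whose start/end range is expensive to
// compute (in the commit above, it requires reading the sstable index file).
class lazy_range_reader {
    std::function<std::pair<int, int>()> _resolve_range; // deferred work
    bool _resolved = false;
    std::pair<int, int> _range{0, 0};
public:
    explicit lazy_range_reader(std::function<std::pair<int, int>()> resolve)
        : _resolve_range(std::move(resolve)) {}

    // The range is resolved on the first read() only. A caller that invokes
    // read() holds the reader, so the reader's state is alive while the
    // resolution runs. A reader that is never read does no work at all.
    std::pair<int, int> read() {
        if (!_resolved) {
            _range = _resolve_range();
            _resolved = true;
        }
        return _range;
    }
};
```

A reader that is constructed and destroyed without read() never runs the resolver, which mirrors the advantage described in the commit message: nothing is read from disk if a higher-level entity services the request first.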
Nadav Har'El
02ba8ffbe8 Allow uncompression at end of file
Asking to read from byte 100 when a file has 50 bytes is an obvious error.
But what if we ask to read from byte 50? What if we ask to read 0 bytes at
byte 50? :-)

Before this patch, code which asked to read from the EOF position would
get an exception. After this patch, it would simply read nothing, without
error. This allows, for example, reading 0 bytes from position 0 on a file
with 0 bytes, which apparently happened in issue #1039...

A read which starts at a position higher than the EOF position still
generates an exception.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1458137867-10998-1-git-send-email-nyh@scylladb.com>
2016-03-16 17:50:23 +02:00
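
The boundary rule described in the commit above (a read starting exactly at EOF succeeds and returns nothing, a read starting past EOF is an error) can be sketched as a bounds check. The function name `readable_bytes` is hypothetical, and clamping `len` to the remaining bytes is an assumption of this sketch rather than something the commit message states.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical helper: how many bytes a read of `len` bytes at offset `pos`
// may return from a file of `file_size` bytes.
uint64_t readable_bytes(uint64_t pos, uint64_t len, uint64_t file_size) {
    if (pos > file_size) {
        // Starting past EOF is still an error.
        throw std::out_of_range("read starts past end of file");
    }
    // pos == file_size is allowed and yields 0 bytes.
    // Assumption: longer reads are clamped to what is left in the file.
    return std::min(len, file_size - pos);
}
```

With this rule, reading 0 bytes at position 0 of an empty file is valid, which is the case the commit message attributes to issue #1039.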
Nadav Har'El
73297c7872 Fix out-of-range exception when uncompressing 0 bytes
The uncompression code reads the compressed chunks containing the bytes
pos through pos + len - 1. This, however, is not correct when len==0,
and pos + len - 1 may even be -1, causing an out-of-range exception when
calling locate() to find the chunks containing this byte position.

So we need to treat len==0 specially, and in this case we don't read
anything, and don't need to locate() the chunks to read.

Refs #1039.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1458135987-10200-1-git-send-email-nyh@scylladb.com>
2016-03-16 15:54:48 +02:00
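
The len==0 special case can be shown with a short sketch of the chunk-range computation. This is not the code from the patch: the function name `chunks_to_read` and the fixed `chunk_size` parameter are assumptions, and the real code uses locate() on compressed chunk offsets rather than a plain division.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>

// Hypothetical helper: inclusive range of chunk indexes that cover the bytes
// pos through pos + len - 1, or nothing when there is nothing to read.
std::optional<std::pair<uint64_t, uint64_t>>
chunks_to_read(uint64_t pos, uint64_t len, uint64_t chunk_size) {
    assert(chunk_size > 0);
    if (len == 0) {
        // Without this guard, pos + len - 1 is out of range when pos == 0
        // (it wraps around for unsigned types), so no chunk lookup is done.
        return std::nullopt;
    }
    uint64_t first = pos / chunk_size;
    uint64_t last = (pos + len - 1) / chunk_size;
    return std::make_pair(first, last);
}
```

The early return keeps the last-byte arithmetic out of the empty-read path entirely, so the lookup only ever sees a valid byte position.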
Takuya ASADA
f1d18e9980 dist: do not auto-start scylla-server job on Ubuntu package install time
Fixes #1017

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1458122424-22889-1-git-send-email-syuu@scylladb.com>
2016-03-16 13:55:12 +02:00
Pekka Enberg
2f519b9b34 tests/gossip_test: Fix messaging service stop
This fixes gossip test shutdown similar to what commit 13ce48e ("tests:
Fix stop of storage_service in cql_test_env") did for CQL tests:

  gossip_test: /home/penberg/scylla/seastar/core/sharded.hh:439: Service& seastar::sharded<Service>::local() [with Service = net::messaging_service]: Assertion `local_is_initialized()' failed.
  Running 1 test case...

  [snip]

  unknown location(0): fatal error in "test_boot_shutdown": signal: SIGABRT (application abort requested)
  seastar/tests/test-utils.cc(32): last checkpoint
Message-Id: <1458126520-20025-1-git-send-email-penberg@scylladb.com>
2016-03-16 13:15:18 +02:00
Asias He
2d50c71ca3 streaming: Handle cf is deleted after the deletion check
The cf can be deleted after the cf deletion check. Handle this case as
well.

Use "warn" level to log if cf is missing. Although we can handle this
case, it is good to distinguish whether the receiver of streaming
applied all the stream mutations or not. We believe that the cf is
missing because it was dropped, but it could be missing because of a bug
or something we didn't anticipate here.

Related patch: "streaming: Handle cf is deleted when sending
STREAM_MUTATION_DONE"

Fixes simple_add_new_node_while_schema_changes_test failure.
Message-Id: <c4497e0500f50e0a3422efb37e73130765c88c57.1458090598.git.asias@scylladb.com>
2016-03-16 09:46:41 +01:00
Asias He
13ce48e775 tests: Fix stop of storage_service in cql_test_env
In stop() of storage_service, it unregisters the verb handler. In the
test, we stop messaging_service before storage_service. Fix it by
deferring stop of messaging_service.
Message-Id: <c71f7b5b46e475efe2fac4c1588460406f890176.1458086329.git.asias@scylladb.com>
2016-03-16 08:32:01 +02:00
839 changed files with 37924 additions and 9900 deletions

.github/ISSUE_TEMPLATE.md

@@ -0,0 +1,9 @@
*Installation details*
Scylla version (or git commit hash):
Cluster size:
OS (RHEL/CentOS/Ubuntu/AWS AMI):
*Hardware details (for performance issues)* Delete if unneeded
Platform (physical/VM/cloud instance type/docker):
Hardware: sockets= cores= hyperthreading= memory=
Disks: (SSD/HDD, count)


@@ -1,6 +1,6 @@
#Scylla
# Scylla
##Building Scylla
## Building Scylla
In addition to required packages by Seastar, the following packages are required by Scylla.
@@ -15,7 +15,7 @@ git submodule update --recursive
* Installing required packages:
```
sudo yum install yaml-cpp-devel lz4-devel zlib-devel snappy-devel jsoncpp-devel thrift-devel antlr3-tool antlr3-C++-devel libasan libubsan gcc-c++ gnutls-devel ninja-build ragel libaio-devel cryptopp-devel xfsprogs-devel numactl-devel hwloc-devel libpciaccess-devel libxml2-devel python3-pyparsing
sudo yum install yaml-cpp-devel lz4-devel zlib-devel snappy-devel jsoncpp-devel thrift-devel antlr3-tool antlr3-C++-devel libasan libubsan gcc-c++ gnutls-devel ninja-build ragel libaio-devel cryptopp-devel xfsprogs-devel numactl-devel hwloc-devel libpciaccess-devel libxml2-devel python3-pyparsing lksctp-tools-devel
```
* Build Scylla


@@ -1,6 +1,6 @@
#!/bin/sh
VERSION=1.0.4
VERSION=1.3.5
if test -f version
then


@@ -487,6 +487,36 @@
}
]
},
{
"path": "/cache_service/metrics/row/hits_moving_avrage",
"operations": [
{
"method": "GET",
"summary": "Get row hits moving avrage",
"type": "#/utils/rate_moving_average",
"nickname": "get_row_hits_moving_avrage",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path": "/cache_service/metrics/row/requests_moving_avrage",
"operations": [
{
"method": "GET",
"summary": "Get row requests moving avrage",
"type": "#/utils/rate_moving_average",
"nickname": "get_row_requests_moving_avrage",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path": "/cache_service/metrics/row/size",
"operations": [


@@ -55,6 +55,57 @@
"paramType":"query"
}
]
},
{
"method":"POST",
"summary":"Start reporting on one or more collectd metric",
"type":"void",
"nickname":"enable_collectd",
"produces":[
"application/json"
],
"parameters":[
{
"name":"pluginid",
"description":"The plugin ID, describe the component the metric belongs to. Examples are cache, thrift, etc'. Regex are supported.The plugin ID, describe the component the metric belong to. Examples are: cache, thrift etc'. regex are supported",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
},
{
"name":"instance",
"description":"The plugin instance typically #CPU indicating per CPU metric. Regex are supported. Omit for all",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"type",
"description":"The plugin type, the type of the information. Examples are total_operations, bytes, total_operations, etc'. Regex are supported. Omit for all",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"type_instance",
"description":"The plugin type instance, the specific metric. Exampls are total_writes, total_size, zones, etc'. Regex are supported, Omit for all",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"enable",
"description":"set to true to enable all, anything else or omit to disable",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
}
]
}
]
},
@@ -63,10 +114,10 @@
"operations":[
{
"method":"GET",
"summary":"Get a collectd value",
"summary":"Get a list of all collectd metrics and their status",
"type":"array",
"items":{
"type":"type_instance_id"
"type":"collectd_metric_status"
},
"nickname":"get_collectd_items",
"produces":[
@@ -74,6 +125,25 @@
],
"parameters":[
]
},
{
"method":"POST",
"summary":"Enable or disable all collectd metrics",
"type":"void",
"nickname":"enable_all_collectd",
"produces":[
"application/json"
],
"parameters":[
{
"name":"enable",
"description":"set to true to enable all, anything else or omit to disable",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
}
]
}
]
}
@@ -113,6 +183,20 @@
}
}
}
},
"collectd_metric_status":{
"id":"collectd_metric_status",
"description":"Holds a collectd id and an enable flag",
"properties":{
"id":{
"description":"The metric ID",
"type":"type_instance_id"
},
"enable":{
"description":"Is the metric enabled",
"type":"boolean"
}
}
}
}
}


@@ -1094,7 +1094,7 @@
"method":"GET",
"summary":"Get read latency histogram",
"$ref": "#/utils/histogram",
"nickname":"get_read_latency_histogram",
"nickname":"get_read_latency_histogram_depricated",
"produces":[
"application/json"
],
@@ -1121,6 +1121,49 @@
"items":{
"$ref": "#/utils/histogram"
},
"nickname":"get_all_read_latency_histogram_depricated",
"produces":[
"application/json"
],
"parameters":[
]
}
]
},
{
"path":"/column_family/metrics/read_latency/moving_average_histogram/{name}",
"operations":[
{
"method":"GET",
"summary":"Get read latency moving avrage histogram",
"$ref": "#/utils/rate_moving_average_and_histogram",
"nickname":"get_read_latency_histogram",
"produces":[
"application/json"
],
"parameters":[
{
"name":"name",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
}
]
}
]
},
{
"path":"/column_family/metrics/read_latency/moving_average_histogram/",
"operations":[
{
"method":"GET",
"summary":"Get read latency moving avrage histogram from all column family",
"type":"array",
"items":{
"$ref": "#/utils/rate_moving_average_and_histogram"
},
"nickname":"get_all_read_latency_histogram",
"produces":[
"application/json"
@@ -1260,7 +1303,7 @@
"method":"GET",
"summary":"Get write latency histogram",
"$ref": "#/utils/histogram",
"nickname":"get_write_latency_histogram",
"nickname":"get_write_latency_histogram_depricated",
"produces":[
"application/json"
],
@@ -1287,6 +1330,49 @@
"items":{
"$ref": "#/utils/histogram"
},
"nickname":"get_all_write_latency_histogram_depricated",
"produces":[
"application/json"
],
"parameters":[
]
}
]
},
{
"path":"/column_family/metrics/write_latency/moving_average_histogram/{name}",
"operations":[
{
"method":"GET",
"summary":"Get write latency moving average histogram",
"$ref": "#/utils/rate_moving_average_and_histogram",
"nickname":"get_write_latency_histogram",
"produces":[
"application/json"
],
"parameters":[
{
"name":"name",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
}
]
}
]
},
{
"path":"/column_family/metrics/write_latency/moving_average_histogram/",
"operations":[
{
"method":"GET",
"summary":"Get write latency moving average histogram of all column family",
"type":"array",
"items":{
"$ref": "#/utils/rate_moving_average_and_histogram"
},
"nickname":"get_all_write_latency_histogram",
"produces":[
"application/json"


@@ -716,6 +716,36 @@
}
]
},
{
"path": "/storage_proxy/metrics/read/timeouts_rates",
"operations": [
{
"method": "GET",
"summary": "Get read metrics rates",
"type": "#/utils/rate_moving_average",
"nickname": "get_read_metrics_timeouts_rates",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path": "/storage_proxy/metrics/read/unavailables_rates",
"operations": [
{
"method": "GET",
"summary": "Get read metrics rates",
"type": "#/utils/rate_moving_average",
"nickname": "get_read_metrics_unavailables_rates",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path": "/storage_proxy/metrics/read/histogram",
"operations": [
@@ -723,7 +753,7 @@
"method": "GET",
"summary": "Get read metrics",
"$ref": "#/utils/histogram",
"nickname": "get_read_metrics_latency_histogram",
"nickname": "get_read_metrics_latency_histogram_depricated",
"produces": [
"application/json"
],
@@ -738,6 +768,36 @@
"method": "GET",
"summary": "Get range metrics",
"$ref": "#/utils/histogram",
"nickname": "get_range_metrics_latency_histogram_depricated",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path": "/storage_proxy/metrics/read/moving_avrage_histogram",
"operations": [
{
"method": "GET",
"summary": "Get read metrics",
"$ref": "#/utils/rate_moving_average_and_histogram",
"nickname": "get_read_metrics_latency_histogram",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path": "/storage_proxy/metrics/range/moving_avrage_histogram",
"operations": [
{
"method": "GET",
"summary": "Get range metrics rate and histogram",
"$ref": "#/utils/rate_moving_average_and_histogram",
"nickname": "get_range_metrics_latency_histogram",
"produces": [
"application/json"
@@ -776,6 +836,36 @@
}
]
},
{
"path": "/storage_proxy/metrics/range/timeouts_rates",
"operations": [
{
"method": "GET",
"summary": "Get range metrics rates",
"type": "#/utils/rate_moving_average",
"nickname": "get_range_metrics_timeouts_rates",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path": "/storage_proxy/metrics/range/unavailables_rates",
"operations": [
{
"method": "GET",
"summary": "Get range metrics rates",
"type": "#/utils/rate_moving_average",
"nickname": "get_range_metrics_unavailables_rates",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path": "/storage_proxy/metrics/write/timeouts",
"operations": [
@@ -806,6 +896,36 @@
}
]
},
{
"path": "/storage_proxy/metrics/write/timeouts_rates",
"operations": [
{
"method": "GET",
"summary": "Get write metrics rates",
"type": "#/utils/rate_moving_average",
"nickname": "get_write_metrics_timeouts_rates",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path": "/storage_proxy/metrics/write/unavailables_rates",
"operations": [
{
"method": "GET",
"summary": "Get write metrics rates",
"type": "#/utils/rate_moving_average",
"nickname": "get_write_metrics_unavailables_rates",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path": "/storage_proxy/metrics/write/histogram",
"operations": [
@@ -813,6 +933,21 @@
"method": "GET",
"summary": "Get write metrics",
"$ref": "#/utils/histogram",
"nickname": "get_write_metrics_latency_histogram_depricated",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path": "/storage_proxy/metrics/write/moving_avrage_histogram",
"operations": [
{
"method": "GET",
"summary": "Get write metrics",
"$ref": "#/utils/rate_moving_average_and_histogram",
"nickname": "get_write_metrics_latency_histogram",
"produces": [
"application/json"


@@ -177,6 +177,22 @@
}
]
},
{
"path":"/storage_service/scylla_release_version",
"operations":[
{
"method":"GET",
"summary":"Fetch a string representation of the Scylla version.",
"type":"string",
"nickname":"get_scylla_release_version",
"produces":[
"application/json"
],
"parameters":[
]
}
]
},
{
"path":"/storage_service/schema_version",
"operations":[


@@ -65,6 +65,41 @@
"description":"The series of values to which the counts in `buckets` correspond"
}
}
}
}
},
"rate_moving_average": {
"id":"rate_moving_average",
"description":"A meter metric which measures mean throughput and one, five, and fifteen-minute exponentially-weighted moving average throughputs",
"properties":{
"rates": {
"type":"array",
"items":{
"type":"double"
},
"description":"One, five and fifteen mintues rates"
},
"mean_rate": {
"type":"double",
"description":"The mean rate from startup"
},
"count": {
"type":"long",
"description":"Total number of events from startup"
}
}
},
"rate_moving_average_and_histogram": {
"id":"rate_moving_average_and_histogram",
"description":"A timer metric which aggregates timing durations and provides duration statistics, plus throughput statistics",
"properties":{
"meter": {
"type":"rate_moving_average",
"description":"The metric rate moving average"
},
"hist": {
"type":"histogram",
"description":"The metric histogram"
}
}
}
}
}


@@ -61,10 +61,10 @@ future<> set_server_init(http_context& ctx) {
new content_replace("html")));
r.add(GET, url("/ui").remainder("path"), new httpd::directory_handler(ctx.api_dir,
new content_replace("html")));
rb->set_api_doc(r);
rb->register_function(r, "system",
"The system related API");
set_system(ctx, r);
rb->set_api_doc(r);
});
}
@@ -83,6 +83,10 @@ future<> set_server_storage_service(http_context& ctx) {
return register_api(ctx, "storage_service", "The storage service API", set_storage_service);
}
future<> set_server_snitch(http_context& ctx) {
return register_api(ctx, "endpoint_snitch_info", "The endpoint snitch info API", set_endpoint_snitch);
}
future<> set_server_gossip(http_context& ctx) {
return register_api(ctx, "gossiper",
"The gossiper API", set_gossiper);
@@ -118,10 +122,6 @@ future<> set_server_gossip_settle(http_context& ctx) {
rb->register_function(r, "cache_service",
"The cache service API");
set_cache_service(ctx,r);
rb->register_function(r, "endpoint_snitch_info",
"The endpoint snitch info API");
set_endpoint_snitch(ctx, r);
});
}


@@ -110,44 +110,7 @@ future<json::json_return_type> sum_stats(distributed<T>& d, V F::*f) {
});
}
inline double pow2(double a) {
return a * a;
}
// FIXME: Move to utils::ihistogram::operator+=()
inline utils::ihistogram add_histogram(utils::ihistogram res,
const utils::ihistogram& val) {
if (res.count == 0) {
return val;
}
if (val.count == 0) {
return std::move(res);
}
if (res.min > val.min) {
res.min = val.min;
}
if (res.max < val.max) {
res.max = val.max;
}
double ncount = res.count + val.count;
// To get an estimated sum we take the estimated mean
// and multiply it by the true count
res.sum = res.sum + val.mean * val.count;
double a = res.count/ncount;
double b = val.count/ncount;
double mean = a * res.mean + b * val.mean;
res.variance = (res.variance + pow2(res.mean - mean) )* a +
(val.variance + pow2(val.mean -mean))* b;
res.mean = mean;
res.count = res.count + val.count;
for (auto i : val.sample) {
res.sample.push_back(i);
}
return res;
}
inline
httpd::utils_json::histogram to_json(const utils::ihistogram& val) {
@@ -156,15 +119,39 @@ httpd::utils_json::histogram to_json(const utils::ihistogram& val) {
return h;
}
template<class T, class F>
future<json::json_return_type> sum_histogram_stats(distributed<T>& d, utils::ihistogram F::*f) {
inline
httpd::utils_json::rate_moving_average meter_to_json(const utils::rate_moving_average& val) {
httpd::utils_json::rate_moving_average m;
m = val;
return m;
}
return d.map_reduce0([f](const T& p) {return p.get_stats().*f;}, utils::ihistogram(),
add_histogram).then([](const utils::ihistogram& val) {
inline
httpd::utils_json::rate_moving_average_and_histogram timer_to_json(const utils::rate_moving_average_and_histogram& val) {
httpd::utils_json::rate_moving_average_and_histogram h;
h.hist = val.hist;
h.meter = meter_to_json(val.rate);
return h;
}
template<class T, class F>
future<json::json_return_type> sum_histogram_stats(distributed<T>& d, utils::timed_rate_moving_average_and_histogram F::*f) {
return d.map_reduce0([f](const T& p) {return (p.get_stats().*f).hist;}, utils::ihistogram(),
std::plus<utils::ihistogram>()).then([](const utils::ihistogram& val) {
return make_ready_future<json::json_return_type>(to_json(val));
});
}
template<class T, class F>
future<json::json_return_type> sum_timer_stats(distributed<T>& d, utils::timed_rate_moving_average_and_histogram F::*f) {
return d.map_reduce0([f](const T& p) {return (p.get_stats().*f).rate();}, utils::rate_moving_average_and_histogram(),
std::plus<utils::rate_moving_average_and_histogram>()).then([](const utils::rate_moving_average_and_histogram& val) {
return make_ready_future<json::json_return_type>(timer_to_json(val));
});
}
inline int64_t min_int64(int64_t a, int64_t b) {
return std::min(a,b);
}


@@ -38,6 +38,7 @@ struct http_context {
};
future<> set_server_init(http_context& ctx);
future<> set_server_snitch(http_context& ctx);
future<> set_server_storage_service(http_context& ctx);
future<> set_server_gossip(http_context& ctx);
future<> set_server_load_sstable(http_context& ctx);


@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*
@@ -194,30 +194,46 @@ void set_cache_service(http_context& ctx, routes& r) {
});
cs::get_row_capacity.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, 0, [](const column_family& cf) {
return map_reduce_cf(ctx, uint64_t(0), [](const column_family& cf) {
return cf.get_row_cache().get_cache_tracker().region().occupancy().used_space();
}, std::plus<uint64_t>());
});
cs::get_row_hits.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, 0, [](const column_family& cf) {
return cf.get_row_cache().stats().hits;
}, std::plus<int64_t>());
return map_reduce_cf(ctx, uint64_t(0), [](const column_family& cf) {
return cf.get_row_cache().stats().hits.count();
}, std::plus<uint64_t>());
});
cs::get_row_requests.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, 0, [](const column_family& cf) {
return cf.get_row_cache().stats().hits + cf.get_row_cache().stats().misses;
}, std::plus<int64_t>());
return map_reduce_cf(ctx, uint64_t(0), [](const column_family& cf) {
return cf.get_row_cache().stats().hits.count() + cf.get_row_cache().stats().misses.count();
}, std::plus<uint64_t>());
});
cs::get_row_hit_rate.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, ratio_holder(), [](const column_family& cf) {
return ratio_holder(cf.get_row_cache().stats().hits + cf.get_row_cache().stats().misses,
cf.get_row_cache().stats().hits);
return ratio_holder(cf.get_row_cache().stats().hits.count() + cf.get_row_cache().stats().misses.count(),
cf.get_row_cache().stats().hits.count());
}, std::plus<ratio_holder>());
});
cs::get_row_hits_moving_avrage.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf_raw(ctx, utils::rate_moving_average(), [](const column_family& cf) {
return cf.get_row_cache().stats().hits.rate();
}, std::plus<utils::rate_moving_average>()).then([](const utils::rate_moving_average& m) {
return make_ready_future<json::json_return_type>(meter_to_json(m));
});
});
cs::get_row_requests_moving_avrage.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf_raw(ctx, utils::rate_moving_average(), [](const column_family& cf) {
return cf.get_row_cache().stats().hits.rate() + cf.get_row_cache().stats().misses.rate();
}, std::plus<utils::rate_moving_average>()).then([](const utils::rate_moving_average& m) {
return make_ready_future<json::json_return_type>(meter_to_json(m));
});
});
cs::get_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {
// In origin row size is the weighted size.
// We currently do not support weights, so we use num entries instead


@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*


@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*
@@ -25,10 +25,14 @@
#include "core/scollectd_api.hh"
#include "endian.h"
#include <boost/range/irange.hpp>
#include <regex>
namespace api {
using namespace scollectd;
using namespace httpd;
using namespace json;
namespace cd = httpd::collectd_json;
static auto transformer(const std::vector<collectd_value>& values) {
@@ -49,6 +53,14 @@ static auto transformer(const std::vector<collectd_value>& values) {
return collected_value;
}
static const char* str_to_regex(const sstring& v) {
if (v != "") {
return v.c_str();
}
return ".*";
}
void set_collectd(http_context& ctx, routes& r) {
cd::get_collectd.set(r, [&ctx](std::unique_ptr<request> req) {
@@ -72,7 +84,7 @@ void set_collectd(http_context& ctx, routes& r) {
});
cd::get_collectd_items.set(r, [](const_req req) {
std::vector<cd::type_instance_id> res;
std::vector<cd::collectd_metric_status> res;
auto ids = scollectd::get_collectd_ids();
for (auto i: ids) {
cd::type_instance_id id;
@@ -80,10 +92,44 @@ void set_collectd(http_context& ctx, routes& r) {
id.plugin_instance = i.plugin_instance();
id.type = i.type();
id.type_instance = i.type_instance();
res.push_back(id);
cd::collectd_metric_status it;
it.id = id;
it.enable = scollectd::is_enabled(i);
res.push_back(it);
}
return res;
});
cd::enable_collectd.set(r, [](std::unique_ptr<request> req) -> future<json::json_return_type> {
std::regex plugin(req->param["pluginid"].c_str());
std::regex instance(str_to_regex(req->get_query_param("instance")));
std::regex type(str_to_regex(req->get_query_param("type")));
std::regex type_instance(str_to_regex(req->get_query_param("type_instance")));
bool enable = strcasecmp(req->get_query_param("enable").c_str(), "true") == 0;
return smp::invoke_on_all([enable, plugin, instance, type, type_instance]() {
for (auto id: scollectd::get_collectd_ids()) {
if (std::regex_match(std::string(id.plugin()), plugin) &&
std::regex_match(std::string(id.plugin_instance()), instance) &&
std::regex_match(std::string(id.type()), type) &&
std::regex_match(std::string(id.type_instance()), type_instance)) {
scollectd::enable(id, enable);
}
}
}).then([] {
return json::json_return_type(json_void());
});
});
cd::enable_all_collectd.set(r, [](std::unique_ptr<request> req) -> future<json::json_return_type> {
bool enable = strcasecmp(req->get_query_param("enable").c_str(), "true") == 0;
return smp::invoke_on_all([enable] {
for (auto id: scollectd::get_collectd_ids()) {
scollectd::enable(id, enable);
}
}).then([] {
return json::json_return_type(json_void());
});
});
}
}
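
The enable_collectd handler above selects metrics by matching four id fields against regular expressions, with an empty query parameter apparently treated as ".*". A minimal standalone sketch of that selection follows. It assumes str_to_regex maps an empty string to ".*" (its full body is not shown here), and metric_id, to_pattern and select_ids are hypothetical names used only for illustration, not scollectd's API.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical stand-in for a collectd metric id (not the real scollectd type).
struct metric_id {
    std::string plugin;
    std::string plugin_instance;
    std::string type;
    std::string type_instance;
};

// Assumption: an empty query parameter means "match anything".
inline std::string to_pattern(const std::string& v) {
    return v.empty() ? ".*" : v;
}

// Returns the indices of the ids whose four fields all fully match
// their respective patterns (std::regex_match requires a full match).
inline std::vector<std::size_t> select_ids(const std::vector<metric_id>& ids,
                                           const std::string& plugin,
                                           const std::string& instance,
                                           const std::string& type,
                                           const std::string& type_instance) {
    const std::regex p(plugin);
    const std::regex i(to_pattern(instance));
    const std::regex t(to_pattern(type));
    const std::regex ti(to_pattern(type_instance));
    std::vector<std::size_t> out;
    for (std::size_t n = 0; n < ids.size(); ++n) {
        const metric_id& id = ids[n];
        if (std::regex_match(id.plugin, p) &&
            std::regex_match(id.plugin_instance, i) &&
            std::regex_match(id.type, t) &&
            std::regex_match(id.type_instance, ti)) {
            out.push_back(n);
        }
    }
    return out;
}
```

Because the match is a full match, a plugin pattern such as "cach" would not select a plugin named "cache"; a caller wanting prefix semantics would have to pass "cach.*".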


@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*


@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*
@@ -40,7 +40,7 @@ const utils::UUID& get_uuid(const sstring& name, const database& db) {
if (pos == sstring::npos) {
pos = name.find(":");
if (pos == sstring::npos) {
throw bad_param_exception("Column family name should be in keyspace::column_family format");
throw bad_param_exception("Column family name should be in keyspace:column_family format");
}
end = pos + 1;
} else {
@@ -77,14 +77,14 @@ future<json::json_return_type> get_cf_stats(http_context& ctx,
}
static future<json::json_return_type> get_cf_stats_count(http_context& ctx, const sstring& name,
utils::ihistogram column_family::stats::*f) {
utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
return map_reduce_cf(ctx, name, int64_t(0), [f](const column_family& cf) {
return (cf.get_stats().*f).count;
return (cf.get_stats().*f).hist.count;
}, std::plus<int64_t>());
}
static future<json::json_return_type> get_cf_stats_sum(http_context& ctx, const sstring& name,
utils::ihistogram column_family::stats::*f) {
utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
auto uuid = get_uuid(name, ctx.db.local());
return ctx.db.map_reduce0([uuid, f](database& db) {
// Histograms information is sample of the actual load
@@ -92,7 +92,7 @@ static future<json::json_return_type> get_cf_stats_sum(http_context& ctx, const
// with count. The information is gathered in nanoseconds,
// but reported in microseconds
column_family& cf = db.find_column_family(uuid);
return ((cf.get_stats().*f).count/1000.0) * (cf.get_stats().*f).mean;
return ((cf.get_stats().*f).hist.count/1000.0) * (cf.get_stats().*f).hist.mean;
}, 0.0, std::plus<double>()).then([](double res) {
return make_ready_future<json::json_return_type>((int64_t)res);
});
@@ -100,28 +100,29 @@ static future<json::json_return_type> get_cf_stats_sum(http_context& ctx, const
static future<json::json_return_type> get_cf_stats_count(http_context& ctx,
utils::ihistogram column_family::stats::*f) {
utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
return map_reduce_cf(ctx, int64_t(0), [f](const column_family& cf) {
return (cf.get_stats().*f).count;
return (cf.get_stats().*f).hist.count;
}, std::plus<int64_t>());
}
static future<json::json_return_type> get_cf_histogram(http_context& ctx, const sstring& name,
utils::ihistogram column_family::stats::*f) {
utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
utils::UUID uuid = get_uuid(name, ctx.db.local());
return ctx.db.map_reduce0([f, uuid](const database& p) {return p.find_column_family(uuid).get_stats().*f;},
return ctx.db.map_reduce0([f, uuid](const database& p) {
return (p.find_column_family(uuid).get_stats().*f).hist;},
utils::ihistogram(),
add_histogram)
std::plus<utils::ihistogram>())
.then([](const utils::ihistogram& val) {
return make_ready_future<json::json_return_type>(to_json(val));
});
}
static future<json::json_return_type> get_cf_histogram(http_context& ctx, utils::ihistogram column_family::stats::*f) {
static future<json::json_return_type> get_cf_histogram(http_context& ctx, utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
std::function<utils::ihistogram(const database&)> fun = [f] (const database& db) {
utils::ihistogram res;
for (auto i : db.get_column_families()) {
res = add_histogram(res, i.second->get_stats().*f);
res += (i.second->get_stats().*f).hist;
}
return res;
};
@@ -132,6 +133,33 @@ static future<json::json_return_type> get_cf_histogram(http_context& ctx, utils:
});
}
static future<json::json_return_type> get_cf_rate_and_histogram(http_context& ctx, const sstring& name,
utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
utils::UUID uuid = get_uuid(name, ctx.db.local());
return ctx.db.map_reduce0([f, uuid](const database& p) {
return (p.find_column_family(uuid).get_stats().*f).rate();},
utils::rate_moving_average_and_histogram(),
std::plus<utils::rate_moving_average_and_histogram>())
.then([](const utils::rate_moving_average_and_histogram& val) {
return make_ready_future<json::json_return_type>(timer_to_json(val));
});
}
static future<json::json_return_type> get_cf_rate_and_histogram(http_context& ctx, utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
std::function<utils::rate_moving_average_and_histogram(const database&)> fun = [f] (const database& db) {
utils::rate_moving_average_and_histogram res;
for (auto i : db.get_column_families()) {
res += (i.second->get_stats().*f).rate();
}
return res;
};
return ctx.db.map(fun).then([](const std::vector<utils::rate_moving_average_and_histogram> &res) {
std::vector<httpd::utils_json::rate_moving_average_and_histogram> r;
boost::copy(res | boost::adaptors::transformed(timer_to_json), std::back_inserter(r));
return make_ready_future<json::json_return_type>(r);
});
}
static future<json::json_return_type> get_cf_unleveled_sstables(http_context& ctx, const sstring& name) {
return map_reduce_cf(ctx, name, int64_t(0), [](const column_family& cf) {
return cf.get_unleveled_sstables();
@@ -141,7 +169,7 @@ static future<json::json_return_type> get_cf_unleveled_sstables(http_context& ct
static int64_t min_row_size(column_family& cf) {
int64_t res = INT64_MAX;
for (auto i: *cf.get_sstables() ) {
res = std::min(res, i.second->get_stats_metadata().estimated_row_size.min());
res = std::min(res, i->get_stats_metadata().estimated_row_size.min());
}
return (res == INT64_MAX) ? 0 : res;
}
@@ -149,7 +177,7 @@ static int64_t min_row_size(column_family& cf) {
static int64_t max_row_size(column_family& cf) {
int64_t res = 0;
for (auto i: *cf.get_sstables() ) {
res = std::max(i.second->get_stats_metadata().estimated_row_size.max(), res);
res = std::max(i->get_stats_metadata().estimated_row_size.max(), res);
}
return res;
}
@@ -166,13 +194,95 @@ static double update_ratio(double acc, double f, double total) {
static ratio_holder mean_row_size(column_family& cf) {
ratio_holder res;
for (auto i: *cf.get_sstables() ) {
auto c = i.second->get_stats_metadata().estimated_row_size.count();
res.sub += i.second->get_stats_metadata().estimated_row_size.mean() * c;
auto c = i->get_stats_metadata().estimated_row_size.count();
res.sub += i->get_stats_metadata().estimated_row_size.mean() * c;
res.total += c;
}
return res;
}
static std::unordered_map<sstring, uint64_t> merge_maps(std::unordered_map<sstring, uint64_t> a,
const std::unordered_map<sstring, uint64_t>& b) {
a.insert(b.begin(), b.end());
return a;
}
static json::json_return_type sum_map(const std::unordered_map<sstring, uint64_t>& val) {
uint64_t res = 0;
for (auto i : val) {
res += i.second;
}
return res;
}
static future<json::json_return_type> sum_sstable(http_context& ctx, const sstring name, bool total) {
auto uuid = get_uuid(name, ctx.db.local());
return ctx.db.map_reduce0([uuid, total](database& db) {
std::unordered_map<sstring, uint64_t> m;
auto sstables = (total) ? db.find_column_family(uuid).get_sstables_including_compacted_undeleted() :
db.find_column_family(uuid).get_sstables();
for (auto t : *sstables) {
m[t->get_filename()] = t->bytes_on_disk();
}
return m;
}, std::unordered_map<sstring, uint64_t>(), merge_maps).
then([](const std::unordered_map<sstring, uint64_t>& val) {
return sum_map(val);
});
}
static future<json::json_return_type> sum_sstable(http_context& ctx, bool total) {
return map_reduce_cf_raw(ctx, std::unordered_map<sstring, uint64_t>(), [total](column_family& cf) {
std::unordered_map<sstring, uint64_t> m;
auto sstables = (total) ? cf.get_sstables_including_compacted_undeleted() :
cf.get_sstables();
for (auto t : *sstables) {
m[t->get_filename()] = t->bytes_on_disk();
}
return m;
},merge_maps).then([](const std::unordered_map<sstring, uint64_t>& val) {
return sum_map(val);
});
}
template <typename T>
class sum_ratio {
uint64_t _n = 0;
T _total = 0;
public:
future<> operator()(T value) {
if (value > 0) {
_total += value;
_n++;
}
return make_ready_future<>();
}
// Returns average value of all registered ratios.
T get() && {
return _n ? (_total / _n) : T(0);
}
};
static double get_compression_ratio(column_family& cf) {
sum_ratio<double> result;
for (auto i : *cf.get_sstables()) {
auto compression_ratio = i->get_compression_ratio();
if (compression_ratio != sstables::metadata_collector::NO_COMPRESSION_RATIO) {
result(compression_ratio);
}
}
return std::move(result).get();
}
static std::vector<uint64_t> concat_sstable_count_per_level(std::vector<uint64_t> a, std::vector<uint64_t>&& b) {
a.resize(std::max(a.size(), b.size()), 0UL);
for (auto i = 0U; i < b.size(); i++) {
a[i] += b[i];
}
return a;
}
void set_column_family(http_context& ctx, routes& r) {
cf::get_column_family_name.set(r, [&ctx] (const_req req){
vector<sstring> res;
@@ -296,7 +406,7 @@ void set_column_family(http_context& ctx, routes& r) {
return map_reduce_cf(ctx, req->param["name"], sstables::estimated_histogram(0), [](column_family& cf) {
sstables::estimated_histogram res(0);
for (auto i: *cf.get_sstables() ) {
res.merge(i.second->get_stats_metadata().estimated_row_size);
res.merge(i->get_stats_metadata().estimated_row_size);
}
return res;
},
@@ -307,7 +417,7 @@ void set_column_family(http_context& ctx, routes& r) {
return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](column_family& cf) {
uint64_t res = 0;
for (auto i: *cf.get_sstables() ) {
res += i.second->get_stats_metadata().estimated_row_size.count();
res += i->get_stats_metadata().estimated_row_size.count();
}
return res;
},
@@ -318,7 +428,7 @@ void set_column_family(http_context& ctx, routes& r) {
return map_reduce_cf(ctx, req->param["name"], sstables::estimated_histogram(0), [](column_family& cf) {
sstables::estimated_histogram res(0);
for (auto i: *cf.get_sstables() ) {
res.merge(i.second->get_stats_metadata().estimated_column_count);
res.merge(i->get_stats_metadata().estimated_column_count);
}
return res;
},
@@ -355,10 +465,14 @@ void set_column_family(http_context& ctx, routes& r) {
return get_cf_stats_count(ctx, &column_family::stats::writes);
});
cf::get_read_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_read_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_histogram(ctx, req->param["name"], &column_family::stats::reads);
});
cf::get_read_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_rate_and_histogram(ctx, req->param["name"], &column_family::stats::reads);
});
cf::get_read_latency.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats_sum(ctx,req->param["name"] ,&column_family::stats::reads);
});
@@ -367,24 +481,40 @@ void set_column_family(http_context& ctx, routes& r) {
return get_cf_stats_sum(ctx, req->param["name"] ,&column_family::stats::writes);
});
cf::get_all_read_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_all_read_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_histogram(ctx, &column_family::stats::writes);
});
cf::get_write_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_all_read_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_rate_and_histogram(ctx, &column_family::stats::writes);
});
cf::get_write_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_histogram(ctx, req->param["name"], &column_family::stats::writes);
});
cf::get_all_write_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_write_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_rate_and_histogram(ctx, req->param["name"], &column_family::stats::writes);
});
cf::get_all_write_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_histogram(ctx, &column_family::stats::writes);
});
cf::get_all_write_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_rate_and_histogram(ctx, &column_family::stats::writes);
});
cf::get_pending_compactions.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats(ctx, req->param["name"], &column_family::stats::pending_compactions);
return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](column_family& cf) {
return cf.get_compaction_strategy().estimated_pending_compactions(cf);
}, std::plus<int64_t>());
});
cf::get_all_pending_compactions.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats(ctx, &column_family::stats::pending_compactions);
return map_reduce_cf(ctx, int64_t(0), [](column_family& cf) {
return cf.get_compaction_strategy().estimated_pending_compactions(cf);
}, std::plus<int64_t>());
});
cf::get_live_ss_table_count.set(r, [&ctx] (std::unique_ptr<request> req) {
@@ -400,19 +530,19 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_live_disk_space_used.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats(ctx, req->param["name"], &column_family::stats::live_disk_space_used);
return sum_sstable(ctx, req->param["name"], false);
});
cf::get_all_live_disk_space_used.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats(ctx, &column_family::stats::live_disk_space_used);
return sum_sstable(ctx, false);
});
cf::get_total_disk_space_used.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats(ctx, req->param["name"], &column_family::stats::total_disk_space_used);
return sum_sstable(ctx, req->param["name"], true);
});
cf::get_all_total_disk_space_used.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats(ctx, &column_family::stats::total_disk_space_used);
return sum_sstable(ctx, true);
});
cf::get_min_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {
@@ -442,7 +572,7 @@ void set_column_family(http_context& ctx, routes& r) {
cf::get_bloom_filter_false_positives.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], uint64_t(0), [] (column_family& cf) {
return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), uint64_t(0), [](uint64_t s, auto& sst) {
return s + sst.second->filter_get_false_positive();
return s + sst->filter_get_false_positive();
});
}, std::plus<uint64_t>());
});
@@ -450,7 +580,7 @@ void set_column_family(http_context& ctx, routes& r) {
cf::get_all_bloom_filter_false_positives.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, uint64_t(0), [] (column_family& cf) {
return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), uint64_t(0), [](uint64_t s, auto& sst) {
return s + sst.second->filter_get_false_positive();
return s + sst->filter_get_false_positive();
});
}, std::plus<uint64_t>());
});
@@ -458,7 +588,7 @@ void set_column_family(http_context& ctx, routes& r) {
cf::get_recent_bloom_filter_false_positives.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], uint64_t(0), [] (column_family& cf) {
return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), uint64_t(0), [](uint64_t s, auto& sst) {
return s + sst.second->filter_get_recent_false_positive();
return s + sst->filter_get_recent_false_positive();
});
}, std::plus<uint64_t>());
});
@@ -466,7 +596,7 @@ void set_column_family(http_context& ctx, routes& r) {
cf::get_all_recent_bloom_filter_false_positives.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, uint64_t(0), [] (column_family& cf) {
return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), uint64_t(0), [](uint64_t s, auto& sst) {
return s + sst.second->filter_get_recent_false_positive();
return s + sst->filter_get_recent_false_positive();
});
}, std::plus<uint64_t>());
});
@@ -474,8 +604,8 @@ void set_column_family(http_context& ctx, routes& r) {
cf::get_bloom_filter_false_ratio.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], double(0), [] (column_family& cf) {
return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), double(0), [](double s, auto& sst) {
double f = sst.second->filter_get_false_positive();
return update_ratio(s, f, f + sst.second->filter_get_true_positive());
double f = sst->filter_get_false_positive();
return update_ratio(s, f, f + sst->filter_get_true_positive());
});
}, std::plus<double>());
});
@@ -483,8 +613,8 @@ void set_column_family(http_context& ctx, routes& r) {
cf::get_all_bloom_filter_false_ratio.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, double(0), [] (column_family& cf) {
return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), double(0), [](double s, auto& sst) {
double f = sst.second->filter_get_false_positive();
return update_ratio(s, f, f + sst.second->filter_get_true_positive());
double f = sst->filter_get_false_positive();
return update_ratio(s, f, f + sst->filter_get_true_positive());
});
}, std::plus<double>());
});
@@ -492,8 +622,8 @@ void set_column_family(http_context& ctx, routes& r) {
cf::get_recent_bloom_filter_false_ratio.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], double(0), [] (column_family& cf) {
return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), double(0), [](double s, auto& sst) {
double f = sst.second->filter_get_recent_false_positive();
return update_ratio(s, f, f + sst.second->filter_get_recent_true_positive());
double f = sst->filter_get_recent_false_positive();
return update_ratio(s, f, f + sst->filter_get_recent_true_positive());
});
}, std::plus<double>());
});
@@ -501,8 +631,8 @@ void set_column_family(http_context& ctx, routes& r) {
cf::get_all_recent_bloom_filter_false_ratio.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, double(0), [] (column_family& cf) {
return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), double(0), [](double s, auto& sst) {
double f = sst.second->filter_get_recent_false_positive();
return update_ratio(s, f, f + sst.second->filter_get_recent_true_positive());
double f = sst->filter_get_recent_false_positive();
return update_ratio(s, f, f + sst->filter_get_recent_true_positive());
});
}, std::plus<double>());
});
@@ -510,7 +640,7 @@ void set_column_family(http_context& ctx, routes& r) {
cf::get_bloom_filter_disk_space_used.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], uint64_t(0), [] (column_family& cf) {
return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), uint64_t(0), [](uint64_t s, auto& sst) {
return sst.second->filter_size();
return sst->filter_size();
});
}, std::plus<uint64_t>());
});
@@ -518,7 +648,7 @@ void set_column_family(http_context& ctx, routes& r) {
cf::get_all_bloom_filter_disk_space_used.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, uint64_t(0), [] (column_family& cf) {
return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), uint64_t(0), [](uint64_t s, auto& sst) {
return sst.second->filter_size();
return sst->filter_size();
});
}, std::plus<uint64_t>());
});
@@ -526,7 +656,7 @@ void set_column_family(http_context& ctx, routes& r) {
cf::get_bloom_filter_off_heap_memory_used.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], uint64_t(0), [] (column_family& cf) {
return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), uint64_t(0), [](uint64_t s, auto& sst) {
return sst.second->filter_memory_size();
return sst->filter_memory_size();
});
}, std::plus<uint64_t>());
});
@@ -534,7 +664,7 @@ void set_column_family(http_context& ctx, routes& r) {
cf::get_all_bloom_filter_off_heap_memory_used.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, uint64_t(0), [] (column_family& cf) {
return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), uint64_t(0), [](uint64_t s, auto& sst) {
return sst.second->filter_memory_size();
return sst->filter_memory_size();
});
}, std::plus<uint64_t>());
});
@@ -542,7 +672,7 @@ void set_column_family(http_context& ctx, routes& r) {
cf::get_index_summary_off_heap_memory_used.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], uint64_t(0), [] (column_family& cf) {
return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), uint64_t(0), [](uint64_t s, auto& sst) {
return sst.second->get_summary().memory_footprint();
return sst->get_summary().memory_footprint();
});
}, std::plus<uint64_t>());
});
@@ -550,7 +680,7 @@ void set_column_family(http_context& ctx, routes& r) {
cf::get_all_index_summary_off_heap_memory_used.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, uint64_t(0), [] (column_family& cf) {
return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), uint64_t(0), [](uint64_t s, auto& sst) {
return sst.second->get_summary().memory_footprint();
return sst->get_summary().memory_footprint();
});
}, std::plus<uint64_t>());
});
@@ -623,27 +753,35 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_row_cache_hit.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](const column_family& cf) {
return cf.get_row_cache().stats().hits;
}, std::plus<int64_t>());
return map_reduce_cf_raw(ctx, req->param["name"], utils::rate_moving_average(), [](const column_family& cf) {
return cf.get_row_cache().stats().hits.rate();
}, std::plus<utils::rate_moving_average>()).then([](const utils::rate_moving_average& m) {
return make_ready_future<json::json_return_type>(meter_to_json(m));
});
});
cf::get_all_row_cache_hit.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, int64_t(0), [](const column_family& cf) {
return cf.get_row_cache().stats().hits;
}, std::plus<int64_t>());
return map_reduce_cf_raw(ctx, utils::rate_moving_average(), [](const column_family& cf) {
return cf.get_row_cache().stats().hits.rate();
}, std::plus<utils::rate_moving_average>()).then([](const utils::rate_moving_average& m) {
return make_ready_future<json::json_return_type>(meter_to_json(m));
});
});
cf::get_row_cache_miss.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](const column_family& cf) {
return cf.get_row_cache().stats().misses;
}, std::plus<int64_t>());
return map_reduce_cf_raw(ctx, req->param["name"], utils::rate_moving_average(), [](const column_family& cf) {
return cf.get_row_cache().stats().misses.rate();
}, std::plus<utils::rate_moving_average>()).then([](const utils::rate_moving_average& m) {
return make_ready_future<json::json_return_type>(meter_to_json(m));
});
});
cf::get_all_row_cache_miss.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, int64_t(0), [](const column_family& cf) {
return cf.get_row_cache().stats().misses;
}, std::plus<int64_t>());
return map_reduce_cf_raw(ctx, utils::rate_moving_average(), [](const column_family& cf) {
return cf.get_row_cache().stats().misses.rate();
}, std::plus<utils::rate_moving_average>()).then([](const utils::rate_moving_average& m) {
return make_ready_future<json::json_return_type>(meter_to_json(m));
});
});
@@ -719,11 +857,15 @@ void set_column_family(http_context& ctx, routes& r) {
return std::vector<sstring>();
});
cf::get_compression_ratio.set(r, [](const_req) {
// FIXME
// Currently there are no compression information
// so we return 0 as the ratio
return 0;
cf::get_compression_ratio.set(r, [&ctx](std::unique_ptr<request> req) {
auto uuid = get_uuid(req->param["name"], ctx.db.local());
return ctx.db.map_reduce(sum_ratio<double>(), [uuid](database& db) {
column_family& cf = db.find_column_family(uuid);
return make_ready_future<double>(get_compression_ratio(cf));
}).then([] (const double& result) {
return make_ready_future<json::json_return_type>(result);
});
});
cf::get_read_latency_estimated_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
@@ -766,12 +908,11 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_sstable_count_per_level.set(r, [&ctx](std::unique_ptr<request> req) {
// TBD
// FIXME
// This is a workaround, until there will be an API to return the count
// per level, we return an empty array
vector<uint64_t> res;
return make_ready_future<json::json_return_type>(res);
return map_reduce_cf_raw(ctx, req->param["name"], std::vector<uint64_t>(), [](const column_family& cf) {
return cf.sstable_count_per_level();
}, concat_sstable_count_per_level).then([](const std::vector<uint64_t>& res) {
return make_ready_future<json::json_return_type>(res);
});
});
}
}
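
The per-level sstable count above is reduced across shards with concat_sstable_count_per_level, which pads the shorter vector with zeros and adds element by element. The sketch below restates that reducer outside Seastar and folds it over a list of per-shard results. The names add_per_level and reduce_levels are hypothetical, and the plain loop stands in for map_reduce0; it is an illustration under those assumptions, not the project's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Element-wise sum of two per-level counters. The accumulator is grown
// to the longer length first, so missing levels count as zero.
inline std::vector<std::uint64_t> add_per_level(std::vector<std::uint64_t> a,
                                                const std::vector<std::uint64_t>& b) {
    a.resize(std::max(a.size(), b.size()), 0);
    for (std::size_t i = 0; i < b.size(); ++i) {
        a[i] += b[i];
    }
    return a;
}

// Folds the reducer over the per-shard vectors, starting from an empty
// accumulator (a sequential stand-in for a cross-shard map-reduce).
inline std::vector<std::uint64_t> reduce_levels(
        const std::vector<std::vector<std::uint64_t>>& shards) {
    std::vector<std::uint64_t> acc;
    for (const auto& s : shards) {
        acc = add_per_level(std::move(acc), s);
    }
    return acc;
}
```

Starting from an empty vector works because the reducer resizes before adding, so shards that report different numbers of levels can be combined in any order.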


@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*
@@ -34,31 +34,44 @@ future<> foreach_column_family(http_context& ctx, const sstring& name, std::func
template<class Mapper, class I, class Reducer>
future<json::json_return_type> map_reduce_cf(http_context& ctx, const sstring& name, I init,
future<I> map_reduce_cf_raw(http_context& ctx, const sstring& name, I init,
Mapper mapper, Reducer reducer) {
auto uuid = get_uuid(name, ctx.db.local());
return ctx.db.map_reduce0([mapper, uuid](database& db) {
return mapper(db.find_column_family(uuid));
}, init, reducer).then([](const I& res) {
}, init, reducer);
}
template<class Mapper, class I, class Reducer>
future<json::json_return_type> map_reduce_cf(http_context& ctx, const sstring& name, I init,
Mapper mapper, Reducer reducer) {
return map_reduce_cf_raw(ctx, name, init, mapper, reducer).then([](const I& res) {
return make_ready_future<json::json_return_type>(res);
});
}
template<class Mapper, class I, class Reducer, class Result>
future<json::json_return_type> map_reduce_cf(http_context& ctx, const sstring& name, I init,
future<I> map_reduce_cf_raw(http_context& ctx, const sstring& name, I init,
Mapper mapper, Reducer reducer, Result result) {
auto uuid = get_uuid(name, ctx.db.local());
return ctx.db.map_reduce0([mapper, uuid](database& db) {
return mapper(db.find_column_family(uuid));
}, init, reducer).then([result](const I& res) mutable {
}, init, reducer);
}
template<class Mapper, class I, class Reducer, class Result>
future<json::json_return_type> map_reduce_cf(http_context& ctx, const sstring& name, I init,
Mapper mapper, Reducer reducer, Result result) {
return map_reduce_cf_raw(ctx, name, init, mapper, reducer, result).then([result](const I& res) mutable {
result = res;
return make_ready_future<json::json_return_type>(result);
});
}
template<class Mapper, class I, class Reducer>
future<json::json_return_type> map_reduce_cf(http_context& ctx, I init,
future<I> map_reduce_cf_raw(http_context& ctx, I init,
Mapper mapper, Reducer reducer) {
return ctx.db.map_reduce0([mapper, init, reducer](database& db) {
auto res = init;
@@ -66,10 +79,18 @@ future<json::json_return_type> map_reduce_cf(http_context& ctx, I init,
res = reducer(res, mapper(*i.second.get()));
}
return res;
}, init, reducer).then([](const I& res) {
}, init, reducer);
}
template<class Mapper, class I, class Reducer>
future<json::json_return_type> map_reduce_cf(http_context& ctx, I init,
Mapper mapper, Reducer reducer) {
return map_reduce_cf_raw(ctx, init, mapper, reducer).then([](const I& res) {
return make_ready_future<json::json_return_type>(res);
});
}
future<json::json_return_type> get_cf_stats(http_context& ctx, const sstring& name,
int64_t column_family::stats::*f);
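
The header change above splits each map_reduce_cf helper in two: a "_raw" variant that returns the reduced value itself, and a thin wrapper that converts that value to a JSON return type. The following synchronous sketch shows the same layering without Seastar futures. All names here (map_reduce_raw, map_reduce_json, hit_rate, to_json) are hypothetical, a vector of ints stands in for the per-shard databases, and std::string stands in for json::json_return_type.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// "Raw" layer: map every shard, reduce the results, return the value as-is.
template <class Shard, class I, class Mapper, class Reducer>
I map_reduce_raw(const std::vector<Shard>& shards, I init, Mapper mapper, Reducer reducer) {
    for (const auto& s : shards) {
        init = reducer(std::move(init), mapper(s));
    }
    return init;
}

// Wrapper layer: same computation, result rendered with a default conversion.
template <class Shard, class I, class Mapper, class Reducer>
std::string map_reduce_json(const std::vector<Shard>& shards, I init, Mapper mapper, Reducer reducer) {
    return std::to_string(map_reduce_raw(shards, init, mapper, reducer));
}

// A result type with no default conversion; callers use the raw layer
// and then apply their own formatter.
struct hit_rate {
    std::int64_t hits;
    std::int64_t total;
};

inline hit_rate operator+(hit_rate a, const hit_rate& b) {
    a.hits += b.hits;
    a.total += b.total;
    return a;
}

inline std::string to_json(const hit_rate& r) {
    return "{\"hits\": " + std::to_string(r.hits) +
           ", \"total\": " + std::to_string(r.total) + "}";
}
```

The raw layer is what lets handlers such as the row cache hit and miss routes reduce a richer type and then choose their own converter, instead of relying on the wrapper's default conversion.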


@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*
@@ -22,6 +22,7 @@
#include "compaction_manager.hh"
#include "api/api-doc/compaction_manager.json.hh"
#include "db/system_keyspace.hh"
#include "column_family.hh"
namespace api {
@@ -78,7 +79,9 @@ void set_compaction_manager(http_context& ctx, routes& r) {
});
cm::get_pending_tasks.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cm_stats(ctx, &compaction_manager::stats::pending_tasks);
return map_reduce_cf(ctx, int64_t(0), [](column_family& cf) {
return cf.get_compaction_strategy().estimated_pending_compactions(cf);
}, std::plus<int64_t>());
});
cm::get_completed_tasks.set(r, [&ctx] (std::unique_ptr<request> req) {


@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*
@@ -33,6 +33,25 @@ namespace sp = httpd::storage_proxy_json;
using proxy = service::storage_proxy;
using namespace json;
static future<utils::rate_moving_average> sum_timed_rate(distributed<proxy>& d, utils::timed_rate_moving_average proxy::stats::*f) {
return d.map_reduce0([f](const proxy& p) {return (p.get_stats().*f).rate();}, utils::rate_moving_average(),
std::plus<utils::rate_moving_average>());
}
static future<json::json_return_type> sum_timed_rate_as_obj(distributed<proxy>& d, utils::timed_rate_moving_average proxy::stats::*f) {
return sum_timed_rate(d, f).then([](const utils::rate_moving_average& val) {
httpd::utils_json::rate_moving_average m;
m = val;
return make_ready_future<json::json_return_type>(m);
});
}
static future<json::json_return_type> sum_timed_rate_as_long(distributed<proxy>& d, utils::timed_rate_moving_average proxy::stats::*f) {
return sum_timed_rate(d, f).then([](const utils::rate_moving_average& val) {
return make_ready_future<json::json_return_type>(val.count);
});
}
static future<json::json_return_type> sum_estimated_histogram(http_context& ctx, sstables::estimated_histogram proxy::stats::*f) {
return ctx.sp.map_reduce0([f](const proxy& p) {return p.get_stats().*f;}, sstables::estimated_histogram(),
sstables::merge).then([](const sstables::estimated_histogram& val) {
@@ -42,8 +61,8 @@ static future<json::json_return_type> sum_estimated_histogram(http_context& ctx
});
}
static future<json::json_return_type> total_latency(http_context& ctx, utils::ihistogram proxy::stats::*f) {
return ctx.sp.map_reduce0([f](const proxy& p) {return (p.get_stats().*f).mean * (p.get_stats().*f).count;}, 0.0,
static future<json::json_return_type> total_latency(http_context& ctx, utils::timed_rate_moving_average_and_histogram proxy::stats::*f) {
return ctx.sp.map_reduce0([f](const proxy& p) {return (p.get_stats().*f).hist.mean * (p.get_stats().*f).hist.count;}, 0.0,
std::plus<double>()).then([](double val) {
int64_t res = val;
return make_ready_future<json::json_return_type>(res);
@@ -291,41 +310,77 @@ void set_storage_proxy(http_context& ctx, routes& r) {
});
sp::get_read_metrics_timeouts.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_stats(ctx.sp, &proxy::stats::read_timeouts);
return sum_timed_rate_as_long(ctx.sp, &proxy::stats::read_timeouts);
});
sp::get_read_metrics_unavailables.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_stats(ctx.sp, &proxy::stats::read_unavailables);
return sum_timed_rate_as_long(ctx.sp, &proxy::stats::read_unavailables);
});
sp::get_range_metrics_timeouts.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_stats(ctx.sp, &proxy::stats::range_slice_timeouts);
return sum_timed_rate_as_long(ctx.sp, &proxy::stats::range_slice_timeouts);
});
sp::get_range_metrics_unavailables.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_stats(ctx.sp, &proxy::stats::range_slice_unavailables);
return sum_timed_rate_as_long(ctx.sp, &proxy::stats::range_slice_unavailables);
});
sp::get_write_metrics_timeouts.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_stats(ctx.sp, &proxy::stats::write_timeouts);
return sum_timed_rate_as_long(ctx.sp, &proxy::stats::write_timeouts);
});
sp::get_write_metrics_unavailables.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_stats(ctx.sp, &proxy::stats::write_unavailables);
return sum_timed_rate_as_long(ctx.sp, &proxy::stats::write_unavailables);
});
sp::get_range_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
sp::get_read_metrics_timeouts_rates.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timed_rate_as_obj(ctx.sp, &proxy::stats::read_timeouts);
});
sp::get_read_metrics_unavailables_rates.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timed_rate_as_obj(ctx.sp, &proxy::stats::read_unavailables);
});
sp::get_range_metrics_timeouts_rates.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timed_rate_as_obj(ctx.sp, &proxy::stats::range_slice_timeouts);
});
sp::get_range_metrics_unavailables_rates.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timed_rate_as_obj(ctx.sp, &proxy::stats::range_slice_unavailables);
});
sp::get_write_metrics_timeouts_rates.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timed_rate_as_obj(ctx.sp, &proxy::stats::write_timeouts);
});
sp::get_write_metrics_unavailables_rates.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timed_rate_as_obj(ctx.sp, &proxy::stats::write_unavailables);
});
sp::get_range_metrics_latency_histogram_depricated.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_histogram_stats(ctx.sp, &proxy::stats::range);
});
sp::get_write_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
sp::get_write_metrics_latency_histogram_depricated.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_histogram_stats(ctx.sp, &proxy::stats::write);
});
sp::get_read_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
sp::get_read_metrics_latency_histogram_depricated.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_histogram_stats(ctx.sp, &proxy::stats::read);
});
sp::get_range_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timer_stats(ctx.sp, &proxy::stats::range);
});
sp::get_write_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timer_stats(ctx.sp, &proxy::stats::write);
});
sp::get_read_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timer_stats(ctx.sp, &proxy::stats::read);
});
sp::get_read_estimated_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_estimated_histogram(ctx, &proxy::stats::estimated_read);
});
@@ -342,7 +397,7 @@ void set_storage_proxy(http_context& ctx, routes& r) {
});
sp::get_range_estimated_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_histogram_stats(ctx.sp, &proxy::stats::read);
return sum_timer_stats(ctx.sp, &proxy::stats::read);
});
sp::get_range_latency.set(r, [&ctx](std::unique_ptr<request> req) {


@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*


@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*
@@ -31,6 +31,7 @@
#include "locator/snitch_base.hh"
#include "column_family.hh"
#include "log.hh"
#include "release.hh"
namespace api {
@@ -121,6 +122,9 @@ void set_storage_service(http_context& ctx, routes& r) {
return service::get_local_storage_service().get_release_version();
});
ss::get_scylla_release_version.set(r, [](const_req req) {
return scylla_version();
});
ss::get_schema_version.set(r, [](const_req req) {
return service::get_local_storage_service().get_schema_version();
});
@@ -382,21 +386,21 @@ void set_storage_service(http_context& ctx, routes& r) {
ss::remove_node.set(r, [](std::unique_ptr<request> req) {
auto host_id = req->get_query_param("host_id");
return service::get_local_storage_service().remove_node(host_id).then([] {
return service::get_local_storage_service().removenode(host_id).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
ss::get_removal_status.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>("");
return service::get_local_storage_service().get_removal_status().then([] (auto status) {
return make_ready_future<json::json_return_type>(status);
});
});
ss::force_remove_completion.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(json_void());
return service::get_local_storage_service().force_remove_completion().then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
ss::set_logging_level.set(r, [](std::unique_ptr<request> req) {
@@ -659,16 +663,22 @@ void set_storage_service(http_context& ctx, routes& r) {
});
ss::set_trace_probability.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
auto probability = req->get_query_param("probability");
return make_ready_future<json::json_return_type>(json_void());
try {
double real_prob = std::stod(probability.c_str());
return tracing::tracing::tracing_instance().invoke_on_all([real_prob] (auto& local_tracing) {
local_tracing.set_trace_probability(real_prob);
}).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
} catch (...) {
throw httpd::bad_param_exception(sprint("Bad format of a probability value: \"%s\"", probability.c_str()));
}
});
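
The `set_trace_probability` handler above parses the query parameter with `std::stod` and turns any failure into a bad-parameter error. The sketch below shows a stricter variant under stated assumptions: `parse_probability` is a hypothetical helper, and both the trailing-character check and the `[0, 1]` range check are additions that the diff does not perform.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: parse a probability from a query parameter.
// Throws std::invalid_argument on malformed input, trailing characters,
// or a value outside [0, 1] (the last two checks are assumptions).
inline double parse_probability(const std::string& text) {
    std::size_t consumed = 0;
    double value = 0.0;
    try {
        value = std::stod(text, &consumed);
    } catch (const std::exception&) {
        throw std::invalid_argument("Bad format of a probability value: \"" + text + "\"");
    }
    if (consumed != text.size()) {
        throw std::invalid_argument("Trailing characters in probability value: \"" + text + "\"");
    }
    // Written as a negated conjunction so that NaN is rejected as well.
    if (!(value >= 0.0 && value <= 1.0)) {
        throw std::invalid_argument("Probability out of range: \"" + text + "\"");
    }
    return value;
}
```

Plain `std::stod` accepts input such as `0.5abc` and returns `0.5`, so checking the consumed length is one way to reject it.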
ss::get_trace_probability.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
return make_ready_future<json::json_return_type>(tracing::tracing::get_local_tracing_instance().get_trace_probability());
});
ss::enable_auto_compaction.set(r, [&ctx](std::unique_ptr<request> req) {


@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*


@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*


@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*


@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*


@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*


@@ -1,5 +1,5 @@
/*
* Copyright (C) 2015 Cloudius Systems, Ltd.
* Copyright (C) 2015 ScyllaDB
*/
/*


@@ -1,5 +1,5 @@
/*
* Copyright (C) 2015 Cloudius Systems, Ltd.
* Copyright (C) 2015 ScyllaDB
*/
/*


@@ -1,5 +1,5 @@
/*
* Copyright (C) 2015 Cloudius Systems, Ltd.
* Copyright (C) 2015 ScyllaDB
*/
/*
@@ -63,5 +63,8 @@ public:
::feed_hash(as_collection_mutation(), h, def.type);
}
}
size_t memory_usage() const {
return _data.memory_usage();
}
friend std::ostream& operator<<(std::ostream&, const atomic_cell_or_collection&);
};


@@ -17,9 +17,9 @@
*/
/*
* Copyright 2016 Cloudius Systems
* Copyright (C) 2016 ScyllaDB
*
* Modified by Cloudius Systems
* Modified by ScyllaDB
*/
/*
@@ -40,14 +40,19 @@
*/
#include <seastar/core/sleep.hh>
#include <seastar/core/distributed.hh>
#include "auth.hh"
#include "authenticator.hh"
#include "authorizer.hh"
#include "database.hh"
#include "cql3/query_processor.hh"
#include "cql3/statements/cf_statement.hh"
#include "cql3/statements/raw/cf_statement.hh"
#include "cql3/statements/create_table_statement.hh"
#include "db/config.hh"
#include "service/migration_manager.hh"
#include "utils/loading_cache.hh"
#include "utils/hash.hh"
const sstring auth::auth::DEFAULT_SUPERUSER_NAME("cassandra");
const sstring auth::auth::AUTH_KS("system_auth");
@@ -76,13 +81,10 @@ class auth_migration_listener : public service::migration_listener {
void on_update_aggregate(const sstring& ks_name, const sstring& aggregate_name) override {}
void on_drop_keyspace(const sstring& ks_name) override {
// TODO:
//DatabaseDescriptor.getAuthorizer().revokeAll(DataResource.keyspace(ksName));
auth::authorizer::get().revoke_all(auth::data_resource(ks_name));
}
void on_drop_column_family(const sstring& ks_name, const sstring& cf_name) override {
// TODO:
//DatabaseDescriptor.getAuthorizer().revokeAll(DataResource.columnFamily(ksName, cfName));
auth::authorizer::get().revoke_all(auth::data_resource(ks_name, cf_name));
}
void on_drop_user_type(const sstring& ks_name, const sstring& type_name) override {}
void on_drop_function(const sstring& ks_name, const sstring& function_name) override {}
@@ -91,6 +93,64 @@ class auth_migration_listener : public service::migration_listener {
static auth_migration_listener auth_migration;
namespace std {
template <>
struct hash<auth::data_resource> {
size_t operator()(const auth::data_resource & v) const {
return v.hash_value();
}
};
template <>
struct hash<auth::authenticated_user> {
size_t operator()(const auth::authenticated_user & v) const {
return utils::tuple_hash()(v.name(), v.is_anonymous());
}
};
}
class auth::auth::permissions_cache {
public:
typedef utils::loading_cache<std::pair<authenticated_user, data_resource>, permission_set, utils::tuple_hash> cache_type;
typedef typename cache_type::key_type key_type;
permissions_cache()
: permissions_cache(
cql3::get_local_query_processor().db().local().get_config()) {
}
permissions_cache(const db::config& cfg)
: _cache(cfg.permissions_cache_max_entries(), expiry(cfg),
std::chrono::milliseconds(
cfg.permissions_validity_in_ms()),
[](const key_type& k) {
logger.debug("Refreshing permissions for {}", k.first.name());
return authorizer::get().authorize(::make_shared<authenticated_user>(k.first), k.second);
}) {
}
static std::chrono::milliseconds expiry(const db::config& cfg) {
auto exp = cfg.permissions_update_interval_in_ms();
if (exp == 0 || exp == std::numeric_limits<uint32_t>::max()) {
exp = cfg.permissions_validity_in_ms();
}
return std::chrono::milliseconds(exp);
}
future<> stop() {
return make_ready_future<>();
}
future<permission_set> get(::shared_ptr<authenticated_user> user, data_resource resource) {
return _cache.get(key_type(*user, std::move(resource)));
}
private:
cache_type _cache;
};
static distributed<auth::auth::permissions_cache> perm_cache;
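
The `expiry()` helper in `permissions_cache` falls back to the validity period when the update interval is unset. A standalone sketch of that rule is shown here. `effective_expiry` is a hypothetical free function, and the `uint32_t` parameter types are an assumption inferred from the comparison against `std::numeric_limits<uint32_t>::max()` in the diff:

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <limits>

// Hypothetical free-function version of permissions_cache::expiry().
// An update interval of 0 or UINT32_MAX is treated as "not configured",
// in which case the validity period is used instead.
inline std::chrono::milliseconds effective_expiry(std::uint32_t update_interval_ms,
                                                  std::uint32_t validity_ms) {
    if (update_interval_ms == 0 ||
        update_interval_ms == std::numeric_limits<std::uint32_t>::max()) {
        update_interval_ms = validity_ms;
    }
    return std::chrono::milliseconds(update_interval_ms);
}
```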
/**
* Poor mans job schedule. For maximum 2 jobs. Sic.
* Still does nothing more clever than waiting 10 seconds
@@ -163,14 +223,22 @@ bool auth::auth::is_class_type(const sstring& type, const sstring& classname) {
future<> auth::auth::setup() {
auto& db = cql3::get_local_query_processor().db().local();
auto& cfg = db.get_config();
auto type = cfg.authenticator();
if (is_class_type(type, authenticator::ALLOW_ALL_AUTHENTICATOR_NAME)) {
return authenticator::setup(type).discard_result(); // just create the object
future<> f = perm_cache.start();
if (is_class_type(cfg.authenticator(),
authenticator::ALLOW_ALL_AUTHENTICATOR_NAME)
&& is_class_type(cfg.authorizer(),
authorizer::ALLOW_ALL_AUTHORIZER_NAME)
) {
// just create the objects
return f.then([&cfg] {
return authenticator::setup(cfg.authenticator());
}).then([&cfg] {
return authorizer::setup(cfg.authorizer());
});
}
future<> f = make_ready_future();
if (!db.has_keyspace(AUTH_KS)) {
std::map<sstring, sstring> opts;
opts["replication_factor"] = "1";
@@ -182,10 +250,10 @@ future<> auth::auth::setup() {
return setup_table(USERS_CF, sprint("CREATE TABLE %s.%s (%s text, %s boolean, PRIMARY KEY(%s)) WITH gc_grace_seconds=%d",
AUTH_KS, USERS_CF, USER_NAME, SUPER, USER_NAME,
90 * 24 * 60 * 60)); // 3 months.
}).then([type] {
return authenticator::setup(type).discard_result();
}).then([] {
// TODO authorizer
}).then([&cfg] {
return authenticator::setup(cfg.authenticator());
}).then([&cfg] {
return authorizer::setup(cfg.authorizer());
}).then([] {
service::get_local_migration_manager().register_listener(&auth_migration); // again, only one shard...
// instead of once-timer, just schedule this later
@@ -216,9 +284,15 @@ future<> auth::auth::shutdown() {
// db-env-shutdown != process shutdown
return smp::invoke_on_all([] {
thread_waiters().clear();
}).then([] {
return perm_cache.stop();
});
}
future<auth::permission_set> auth::auth::get_permissions(::shared_ptr<authenticated_user> user, data_resource resource) {
return perm_cache.local().get(std::move(user), std::move(resource));
}
static db::consistency_level consistency_for_user(const sstring& username) {
if (username == auth::auth::DEFAULT_SUPERUSER_NAME) {
return db::consistency_level::QUORUM;
@@ -274,15 +348,18 @@ future<> auth::auth::setup_table(const sstring& name, const sstring& cql) {
return make_ready_future();
}
::shared_ptr<cql3::statements::cf_statement> parsed = static_pointer_cast<
cql3::statements::cf_statement>(cql3::query_processor::parse_statement(cql));
::shared_ptr<cql3::statements::raw::cf_statement> parsed = static_pointer_cast<
cql3::statements::raw::cf_statement>(cql3::query_processor::parse_statement(cql));
parsed->prepare_keyspace(AUTH_KS);
::shared_ptr<cql3::statements::create_table_statement> statement =
static_pointer_cast<cql3::statements::create_table_statement>(
parsed->prepare(db)->statement);
// Origin sets "Legacy Cf Id" for the new table. We have no need to be
// pre-2.1 compatible (afaik), so lets skip a whole lotta hoolaballo
return statement->announce_migration(qp.proxy(), false).then([statement](bool) {});
auto schema = statement->get_cf_meta_data();
auto uuid = generate_legacy_id(schema->ks_name(), schema->cf_name());
schema_builder b(schema);
b.set_uuid(uuid);
return service::get_local_migration_manager().announce_new_column_family(b.build(), false);
}
future<bool> auth::auth::has_existing_users(const sstring& cfname, const sstring& def_user_name, const sstring& name_column) {


@@ -17,9 +17,9 @@
*/
/*
* Copyright 2016 Cloudius Systems
* Copyright (C) 2016 ScyllaDB
*
* Modified by Cloudius Systems
* Modified by ScyllaDB
*/
/*
@@ -44,13 +44,21 @@
#include <chrono>
#include <seastar/core/sstring.hh>
#include <seastar/core/future.hh>
#include <seastar/core/shared_ptr.hh>
#include "exceptions/exceptions.hh"
#include "permission.hh"
#include "data_resource.hh"
namespace auth {
class authenticated_user;
class auth {
public:
class permissions_cache;
static const sstring DEFAULT_SUPERUSER_NAME;
static const sstring AUTH_KS;
static const sstring USERS_CF;
@@ -58,12 +66,7 @@ public:
static bool is_class_type(const sstring& type, const sstring& classname);
#if 0
public static Set<Permission> getPermissions(AuthenticatedUser user, IResource resource)
{
return permissionsCache.getPermissions(user, resource);
}
#endif
static future<permission_set> get_permissions(::shared_ptr<authenticated_user>, data_resource);
/**
* Checks if the username is stored in AUTH_KS.USERS_CF.


@@ -17,9 +17,9 @@
*/
/*
* Copyright 2016 Cloudius Systems
* Copyright (C) 2016 ScyllaDB
*
* Modified by Cloudius Systems
* Modified by ScyllaDB
*/
/*
@@ -41,6 +41,7 @@
#include "authenticated_user.hh"
#include "auth.hh"
const sstring auth::authenticated_user::ANONYMOUS_USERNAME("anonymous");
@@ -52,10 +53,20 @@ auth::authenticated_user::authenticated_user(sstring name)
: _name(name), _anon(false)
{}
auth::authenticated_user::authenticated_user(authenticated_user&&) = default;
auth::authenticated_user::authenticated_user(const authenticated_user&) = default;
const sstring& auth::authenticated_user::name() const {
return _anon ? ANONYMOUS_USERNAME : _name;
}
future<bool> auth::authenticated_user::is_super() const {
if (is_anonymous()) {
return make_ready_future<bool>(false);
}
return auth::auth::is_super_user(_name);
}
bool auth::authenticated_user::operator==(const authenticated_user& v) const {
return _anon ? v._anon : _name == v._name;
}
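
The permissions cache keys on `authenticated_user`, so the `operator==` above has to agree with the `std::hash` specialization added in `auth.cc`: users that compare equal must hash equally. The sketch below illustrates that contract with hypothetical `user` and `user_hash` types. The hash-combine step is an assumption standing in for `utils::tuple_hash`, whose implementation is not shown in the diff:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for auth::authenticated_user.
struct user {
    std::string name;
    bool anon;
    // Same rule as the diff: an anonymous user equals any other anonymous
    // user; a named user is compared by name.
    bool operator==(const user& v) const { return anon ? v.anon : name == v.name; }
};

// Hash over the visible name and the anonymous flag, so that all
// anonymous users collapse to the same hash value.
struct user_hash {
    std::size_t operator()(const user& u) const {
        const std::string visible = u.anon ? "anonymous" : u.name;
        const std::size_t h = std::hash<std::string>()(visible);
        // Boost-style hash combine (an assumption, not utils::tuple_hash).
        return h ^ (std::hash<bool>()(u.anon) + 0x9e3779b9 + (h << 6) + (h >> 2));
    }
};
```

Hashing the visible name rather than the stored one matters here: two anonymous users may carry different stored names yet must land in the same bucket.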


@@ -17,9 +17,9 @@
*/
/*
* Copyright 2016 Cloudius Systems
* Copyright (C) 2016 ScyllaDB
*
* Modified by Cloudius Systems
* Modified by ScyllaDB
*/
/*
@@ -42,6 +42,7 @@
#pragma once
#include <seastar/core/sstring.hh>
#include <seastar/core/future.hh>
namespace auth {
@@ -51,6 +52,8 @@ public:
authenticated_user();
authenticated_user(sstring name);
authenticated_user(authenticated_user&&);
authenticated_user(const authenticated_user&);
const sstring& name() const;
@@ -60,7 +63,7 @@ public:
* In most cases, though not necessarily, a superuser will have Permission.ALL on every resource
* (depends on IAuthorizer implementation).
*/
bool is_super() const;
future<bool> is_super() const;
/**
* If IAuthenticator doesn't require authentication, this method may return true.


@@ -17,9 +17,9 @@
*/
/*
* Copyright 2016 Cloudius Systems
* Copyright (C) 2016 ScyllaDB
*
* Modified by Cloudius Systems
* Modified by ScyllaDB
*/
/*
@@ -49,6 +49,22 @@ const sstring auth::authenticator::USERNAME_KEY("username");
const sstring auth::authenticator::PASSWORD_KEY("password");
const sstring auth::authenticator::ALLOW_ALL_AUTHENTICATOR_NAME("org.apache.cassandra.auth.AllowAllAuthenticator");
auth::authenticator::option auth::authenticator::string_to_option(const sstring& name) {
if (strcasecmp(name.c_str(), "password") == 0) {
return option::PASSWORD;
}
throw std::invalid_argument(name);
}
sstring auth::authenticator::option_to_string(option opt) {
switch (opt) {
case option::PASSWORD:
return "PASSWORD";
default:
throw std::invalid_argument(sprint("Unknown option %d", static_cast<int>(opt)));
}
}
/**
* Authenticator is assumed to be a fully state-less immutable object (note all the const).
* We thus store a single instance globally, since it should be safe/ok.
@@ -84,8 +100,9 @@ auth::authenticator::setup(const sstring& type) throw (exceptions::configuration
future<> drop(sstring username) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) override {
return make_ready_future();
}
resource_ids protected_resources() const override {
return resource_ids();
const resource_ids& protected_resources() const override {
static const resource_ids ids;
return ids;
}
::shared_ptr<sasl_challenge> new_sasl_challenge() const override {
throw std::runtime_error("Should not reach");


@@ -17,9 +17,9 @@
*/
/*
* Copyright 2016 Cloudius Systems
* Copyright (C) 2016 ScyllaDB
*
* Modified by Cloudius Systems
* Modified by ScyllaDB
*/
/*
@@ -79,15 +79,13 @@ public:
PASSWORD
};
static option string_to_option(const sstring&);
static sstring option_to_string(option);
using option_set = enum_set<super_enum<option, option::PASSWORD>>;
using option_map = std::unordered_map<option, boost::any, enum_hash<option>>;
using credentials_map = std::unordered_map<sstring, sstring>;
/**
* Resource id mappings, i.e. keyspace and/or column families.
*/
using resource_ids = std::set<data_resource>;
/**
* Setup is called once upon system startup to initialize the IAuthenticator.
*
@@ -174,7 +172,7 @@ public:
* @return Keyspaces, column families that will be unmodifiable by users; other resources.
* @see resource_ids
*/
virtual resource_ids protected_resources() const = 0;
virtual const resource_ids& protected_resources() const = 0;
class sasl_challenge {
public:
@@ -194,5 +192,9 @@ public:
virtual ::shared_ptr<sasl_challenge> new_sasl_challenge() const = 0;
};
inline std::ostream& operator<<(std::ostream& os, authenticator::option opt) {
return os << authenticator::option_to_string(opt);
}
}

104
auth/authorizer.cc Normal file

@@ -0,0 +1,104 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright (C) 2016 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "authorizer.hh"
#include "authenticated_user.hh"
#include "default_authorizer.hh"
#include "auth.hh"
#include "db/config.hh"
const sstring auth::authorizer::ALLOW_ALL_AUTHORIZER_NAME("org.apache.cassandra.auth.AllowAllAuthorizer");
/**
* Authorizer is assumed to be a fully state-less immutable object (note all the const).
* We thus store a single instance globally, since it should be safe/ok.
*/
static std::unique_ptr<auth::authorizer> global_authorizer;
future<>
auth::authorizer::setup(const sstring& type) {
if (auth::auth::is_class_type(type, ALLOW_ALL_AUTHORIZER_NAME)) {
class allow_all_authorizer : public authorizer {
public:
future<permission_set> authorize(::shared_ptr<authenticated_user>, data_resource) const override {
return make_ready_future<permission_set>(permissions::ALL);
}
future<> grant(::shared_ptr<authenticated_user>, permission_set, data_resource, sstring) override {
throw exceptions::invalid_request_exception("GRANT operation is not supported by AllowAllAuthorizer");
}
future<> revoke(::shared_ptr<authenticated_user>, permission_set, data_resource, sstring) override {
throw exceptions::invalid_request_exception("REVOKE operation is not supported by AllowAllAuthorizer");
}
future<std::vector<permission_details>> list(::shared_ptr<authenticated_user> performer, permission_set, optional<data_resource>, optional<sstring>) const override {
throw exceptions::invalid_request_exception("LIST PERMISSIONS operation is not supported by AllowAllAuthorizer");
}
future<> revoke_all(sstring dropped_user) override {
return make_ready_future();
}
future<> revoke_all(data_resource) override {
return make_ready_future();
}
const resource_ids& protected_resources() override {
static const resource_ids ids;
return ids;
}
future<> validate_configuration() const override {
return make_ready_future();
}
};
global_authorizer = std::make_unique<allow_all_authorizer>();
} else if (auth::auth::is_class_type(type, default_authorizer::DEFAULT_AUTHORIZER_NAME)) {
auto da = std::make_unique<default_authorizer>();
auto f = da->init();
return f.then([da = std::move(da)]() mutable {
global_authorizer = std::move(da);
});
} else {
throw exceptions::configuration_exception("Invalid authorizer type: " + type);
}
return make_ready_future();
}
auth::authorizer& auth::authorizer::get() {
assert(global_authorizer);
return *global_authorizer;
}
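
`authorizer::setup()` selects an implementation by class name, stores it in a single global, and `authorizer::get()` hands it out afterwards. The pattern can be sketched synchronously as below. All names (`simple_authorizer`, `allow_all`, `setup_authorizer`, `get_authorizer`, the `"AllowAll"` type string) are hypothetical, futures are omitted, and `get_authorizer` throws where the diff uses `assert`:

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical, much reduced authorizer interface.
class simple_authorizer {
public:
    virtual ~simple_authorizer() {}
    virtual bool allows(const std::string& user) const = 0;
};

// Grants everything, like the allow-all authorizer in the diff.
class allow_all : public simple_authorizer {
public:
    bool allows(const std::string&) const override { return true; }
};

// Single global instance, installed once by setup_authorizer().
static std::unique_ptr<simple_authorizer> global_instance;

inline void setup_authorizer(const std::string& type) {
    if (type == "AllowAll") {
        global_instance.reset(new allow_all());
    } else {
        throw std::invalid_argument("Invalid authorizer type: " + type);
    }
}

// Must be called after setup_authorizer().
inline simple_authorizer& get_authorizer() {
    if (!global_instance) {
        throw std::logic_error("authorizer not set up");
    }
    return *global_instance;
}
```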

171
auth/authorizer.hh Normal file

@@ -0,0 +1,171 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright (C) 2016 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <vector>
#include <tuple>
#include <experimental/optional>
#include <seastar/core/future.hh>
#include <seastar/core/shared_ptr.hh>
#include "permission.hh"
#include "data_resource.hh"
namespace auth {
class authenticated_user;
struct permission_details {
sstring user;
data_resource resource;
permission_set permissions;
bool operator<(const permission_details& v) const {
return std::tie(user, resource, permissions) < std::tie(v.user, v.resource, v.permissions);
}
};
using std::experimental::optional;
class authorizer {
public:
static const sstring ALLOW_ALL_AUTHORIZER_NAME;
virtual ~authorizer() {}
/**
* The primary Authorizer method. Returns a set of permissions of a user on a resource.
*
* @param user Authenticated user requesting authorization.
* @param resource Resource for which the authorization is being requested. @see DataResource.
* @return Set of permissions of the user on the resource. Should never return empty. Use permission.NONE instead.
*/
virtual future<permission_set> authorize(::shared_ptr<authenticated_user>, data_resource) const = 0;
/**
* Grants a set of permissions on a resource to a user.
* The opposite of revoke().
*
* @param performer User who grants the permissions.
* @param permissions Set of permissions to grant.
* @param to Grantee of the permissions.
* @param resource Resource on which to grant the permissions.
*
* @throws RequestValidationException
* @throws RequestExecutionException
*/
virtual future<> grant(::shared_ptr<authenticated_user> performer, permission_set, data_resource, sstring to) = 0;
/**
* Revokes a set of permissions on a resource from a user.
* The opposite of grant().
*
* @param performer User who revokes the permissions.
* @param permissions Set of permissions to revoke.
* @param from Revokee of the permissions.
* @param resource Resource on which to revoke the permissions.
*
* @throws RequestValidationException
* @throws RequestExecutionException
*/
virtual future<> revoke(::shared_ptr<authenticated_user> performer, permission_set, data_resource, sstring from) = 0;
/**
* Returns a list of permissions on a resource of a user.
*
* @param performer User who wants to see the permissions.
* @param permissions Set of Permission values the user is interested in. The result should only include the matching ones.
* @param resource The resource on which permissions are requested. Can be null, in which case permissions on all resources
* should be returned.
* @param of The user whose permissions are requested. Can be null, in which case permissions of every user should be returned.
*
* @return All of the matching permissions that the requesting user is authorized to know about.
*
* @throws RequestValidationException
* @throws RequestExecutionException
*/
virtual future<std::vector<permission_details>> list(::shared_ptr<authenticated_user> performer, permission_set, optional<data_resource>, optional<sstring>) const = 0;
/**
* This method is called before deleting a user with DROP USER query so that a new user with the same
* name wouldn't inherit permissions of the deleted user in the future.
*
* @param droppedUser The user to revoke all permissions from.
*/
virtual future<> revoke_all(sstring dropped_user) = 0;
/**
* This method is called after a resource is removed (i.e. keyspace or a table is dropped).
*
* @param droppedResource The resource to revoke all permissions on.
*/
virtual future<> revoke_all(data_resource) = 0;
/**
* Set of resources that should be made inaccessible to users and only accessible internally.
*
* @return Keyspaces, column families that will be unmodifiable by users; other resources.
*/
virtual const resource_ids& protected_resources() = 0;
/**
* Validates configuration of IAuthorizer implementation (if configurable).
*
* @throws ConfigurationException when there is a configuration error.
*/
virtual future<> validate_configuration() const = 0;
/**
* Setup is called once upon system startup to initialize the IAuthorizer.
*
* For example, use this method to create any required keyspaces/column families.
*/
static future<> setup(const sstring& type);
/**
* Returns the system authorizer. Must have called setup before calling this.
*/
static authorizer& get();
};
}
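
`permission_details::operator<` in this header compares `(user, resource, permissions)` lexicographically through `std::tie`. The sketch below shows the resulting sort order on a hypothetical `details` struct, with `std::string` and `unsigned` standing in for `data_resource` and `permission_set`:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical simplification of auth::permission_details.
struct details {
    std::string user;
    std::string resource;
    unsigned permissions;

    // Lexicographic comparison: user first, then resource, then permissions.
    bool operator<(const details& v) const {
        return std::tie(user, resource, permissions)
             < std::tie(v.user, v.resource, v.permissions);
    }
};

// Sort entries so that rows for the same user and resource are adjacent.
inline void sort_details(std::vector<details>& v) {
    std::sort(v.begin(), v.end());
}
```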


@@ -17,9 +17,9 @@
*/
/*
* Copyright 2016 Cloudius Systems
* Copyright (C) 2016 ScyllaDB
*
* Modified by Cloudius Systems
* Modified by ScyllaDB
*/
/*
@@ -47,11 +47,8 @@
const sstring auth::data_resource::ROOT_NAME("data");
auth::data_resource::data_resource(level l, const sstring& ks, const sstring& cf)
: _ks(ks), _cf(cf)
: _level(l), _ks(ks), _cf(cf)
{
if (l != get_level()) {
throw std::invalid_argument("level/keyspace/column mismatch");
}
}
auth::data_resource::data_resource()
@@ -67,14 +64,7 @@ auth::data_resource::data_resource(const sstring& ks, const sstring& cf)
{}
auth::data_resource::level auth::data_resource::get_level() const {
if (!_cf.empty()) {
assert(!_ks.empty());
return level::COLUMN_FAMILY;
}
if (!_ks.empty()) {
return level::KEYSPACE;
}
return level::ROOT;
return _level;
}
auth::data_resource auth::data_resource::from_name(
@@ -158,7 +148,15 @@ bool auth::data_resource::exists() const {
}
sstring auth::data_resource::to_string() const {
return name();
switch (get_level()) {
case level::ROOT:
return "<all keyspaces>";
case level::KEYSPACE:
return sprint("<keyspace %s>", _ks);
case level::COLUMN_FAMILY:
default:
return sprint("<table %s.%s>", _ks, _cf);
}
}
bool auth::data_resource::operator==(const data_resource& v) const {
@@ -170,6 +168,6 @@ bool auth::data_resource::operator<(const data_resource& v) const {
}
std::ostream& auth::operator<<(std::ostream& os, const data_resource& r) {
return os << r.name();
return os << r.to_string();
}
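
With this change `data_resource` stores its level explicitly and `to_string()` switches on it. A standalone sketch of the same formatting follows. `describe` and the free-standing `level` enum are hypothetical, and string concatenation replaces `sprint`:

```cpp
#include <cassert>
#include <string>

// Hypothetical copy of the data_resource levels.
enum class level { ROOT, KEYSPACE, COLUMN_FAMILY };

// Human-readable description per level, mirroring data_resource::to_string().
inline std::string describe(level l, const std::string& ks, const std::string& cf) {
    switch (l) {
    case level::ROOT:
        return "<all keyspaces>";
    case level::KEYSPACE:
        return "<keyspace " + ks + ">";
    case level::COLUMN_FAMILY:
    default:
        return "<table " + ks + "." + cf + ">";
    }
}
```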


@@ -17,9 +17,9 @@
*/
/*
* Copyright 2016 Cloudius Systems
* Copyright (C) 2016 ScyllaDB
*
* Modified by Cloudius Systems
* Modified by ScyllaDB
*/
/*
@@ -41,7 +41,9 @@
#pragma once
#include "utils/hash.hh"
#include <iosfwd>
#include <set>
#include <seastar/core/sstring.hh>
namespace auth {
@@ -54,6 +56,7 @@ private:
static const sstring ROOT_NAME;
level _level;
sstring _ks;
sstring _cf;
@@ -136,8 +139,17 @@ public:
bool operator==(const data_resource&) const;
bool operator<(const data_resource&) const;
size_t hash_value() const {
return utils::tuple_hash()(_ks, _cf);
}
};
/**
* Resource id mappings, i.e. keyspace and/or column families.
*/
using resource_ids = std::set<data_resource>;
std::ostream& operator<<(std::ostream&, const data_resource&);
}

240
auth/default_authorizer.cc Normal file
View File

@@ -0,0 +1,240 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright (C) 2016 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include <unistd.h>
#include <crypt.h>
#include <random>
#include <chrono>
#include <seastar/core/reactor.hh>
#include "auth.hh"
#include "default_authorizer.hh"
#include "authenticated_user.hh"
#include "permission.hh"
#include "cql3/query_processor.hh"
#include "exceptions/exceptions.hh"
#include "log.hh"
const sstring auth::default_authorizer::DEFAULT_AUTHORIZER_NAME(
"org.apache.cassandra.auth.CassandraAuthorizer");
static const sstring USER_NAME = "username";
static const sstring RESOURCE_NAME = "resource";
static const sstring PERMISSIONS_NAME = "permissions";
static const sstring PERMISSIONS_CF = "permissions";
static logging::logger logger("default_authorizer");
auth::default_authorizer::default_authorizer() {
}
auth::default_authorizer::~default_authorizer() {
}
future<> auth::default_authorizer::init() {
sstring create_table = sprint("CREATE TABLE %s.%s ("
"%s text,"
"%s text,"
"%s set<text>,"
"PRIMARY KEY(%s, %s)"
") WITH gc_grace_seconds=%d", auth::auth::AUTH_KS,
PERMISSIONS_CF, USER_NAME, RESOURCE_NAME, PERMISSIONS_NAME,
USER_NAME, RESOURCE_NAME, 90 * 24 * 60 * 60); // 3 months.
return auth::setup_table(PERMISSIONS_CF, create_table);
}
future<auth::permission_set> auth::default_authorizer::authorize(
::shared_ptr<authenticated_user> user, data_resource resource) const {
return user->is_super().then([this, user, resource = std::move(resource)](bool is_super) {
if (is_super) {
return make_ready_future<permission_set>(permissions::ALL);
}
/**
* TODO: could create actual data type for permission (translating string<->perm),
* but this seems overkill right now. We still must store strings so...
*/
auto& qp = cql3::get_local_query_processor();
auto query = sprint("SELECT %s FROM %s.%s WHERE %s = ? AND %s = ?"
, PERMISSIONS_NAME, auth::AUTH_KS, PERMISSIONS_CF, USER_NAME, RESOURCE_NAME);
return qp.process(query, db::consistency_level::LOCAL_ONE, {user->name(), resource.name() })
.then_wrapped([=](future<::shared_ptr<cql3::untyped_result_set>> f) {
try {
auto res = f.get0();
if (res->empty() || !res->one().has(PERMISSIONS_NAME)) {
return make_ready_future<permission_set>(permissions::NONE);
}
return make_ready_future<permission_set>(permissions::from_strings(res->one().get_set<sstring>(PERMISSIONS_NAME)));
} catch (exceptions::request_execution_exception& e) {
logger.warn("CassandraAuthorizer failed to authorize {} for {}", user->name(), resource);
return make_ready_future<permission_set>(permissions::NONE);
}
});
});
}
#include <boost/range.hpp>
future<> auth::default_authorizer::modify(
::shared_ptr<authenticated_user> performer, permission_set set,
data_resource resource, sstring user, sstring op) {
// TODO: why does this not check super user?
auto& qp = cql3::get_local_query_processor();
auto query = sprint("UPDATE %s.%s SET %s = %s %s ? WHERE %s = ? AND %s = ?",
auth::AUTH_KS, PERMISSIONS_CF, PERMISSIONS_NAME,
PERMISSIONS_NAME, op, USER_NAME, RESOURCE_NAME);
return qp.process(query, db::consistency_level::ONE, {
permissions::to_strings(set), user, resource.name() }).discard_result();
}
future<> auth::default_authorizer::grant(
::shared_ptr<authenticated_user> performer, permission_set set,
data_resource resource, sstring to) {
return modify(std::move(performer), std::move(set), std::move(resource), std::move(to), "+");
}
future<> auth::default_authorizer::revoke(
::shared_ptr<authenticated_user> performer, permission_set set,
data_resource resource, sstring from) {
return modify(std::move(performer), std::move(set), std::move(resource), std::move(from), "-");
}
future<std::vector<auth::permission_details>> auth::default_authorizer::list(
::shared_ptr<authenticated_user> performer, permission_set set,
optional<data_resource> resource, optional<sstring> user) const {
return performer->is_super().then([this, performer, set = std::move(set), resource = std::move(resource), user = std::move(user)](bool is_super) {
if (!is_super && (!user || performer->name() != *user)) {
throw exceptions::unauthorized_exception(sprint("You are not authorized to view %s's permissions", user ? *user : "everyone"));
}
auto query = sprint("SELECT %s, %s, %s FROM %s.%s", USER_NAME, RESOURCE_NAME, PERMISSIONS_NAME, auth::AUTH_KS, PERMISSIONS_CF);
auto& qp = cql3::get_local_query_processor();
// Oh, look, it is a case where it does not pay off to have
// parameters to process in an initializer list.
future<::shared_ptr<cql3::untyped_result_set>> f = make_ready_future<::shared_ptr<cql3::untyped_result_set>>();
if (resource && user) {
query += sprint(" WHERE %s = ? AND %s = ?", USER_NAME, RESOURCE_NAME);
f = qp.process(query, db::consistency_level::ONE, {*user, resource->name()});
} else if (resource) {
query += sprint(" WHERE %s = ? ALLOW FILTERING", RESOURCE_NAME);
f = qp.process(query, db::consistency_level::ONE, {resource->name()});
} else if (user) {
query += sprint(" WHERE %s = ?", USER_NAME);
f = qp.process(query, db::consistency_level::ONE, {*user});
} else {
f = qp.process(query, db::consistency_level::ONE, {});
}
return f.then([set](::shared_ptr<cql3::untyped_result_set> res) {
std::vector<permission_details> result;
for (auto& row : *res) {
if (row.has(PERMISSIONS_NAME)) {
auto username = row.get_as<sstring>(USER_NAME);
auto resource = data_resource::from_name(row.get_as<sstring>(RESOURCE_NAME));
auto ps = permissions::from_strings(row.get_set<sstring>(PERMISSIONS_NAME));
ps = permission_set::from_mask(ps.mask() & set.mask());
result.emplace_back(permission_details {username, resource, ps});
}
}
return make_ready_future<std::vector<permission_details>>(std::move(result));
});
});
}
future<> auth::default_authorizer::revoke_all(sstring dropped_user) {
auto& qp = cql3::get_local_query_processor();
auto query = sprint("DELETE FROM %s.%s WHERE %s = ?", auth::AUTH_KS,
PERMISSIONS_CF, USER_NAME);
return qp.process(query, db::consistency_level::ONE, { dropped_user }).discard_result().handle_exception(
[dropped_user](auto ep) {
try {
std::rethrow_exception(ep);
} catch (exceptions::request_execution_exception& e) {
logger.warn("CassandraAuthorizer failed to revoke all permissions of {}: {}", dropped_user, e);
}
});
}
future<> auth::default_authorizer::revoke_all(data_resource resource) {
auto& qp = cql3::get_local_query_processor();
auto query = sprint("SELECT %s FROM %s.%s WHERE %s = ? ALLOW FILTERING",
USER_NAME, auth::AUTH_KS, PERMISSIONS_CF, RESOURCE_NAME);
return qp.process(query, db::consistency_level::LOCAL_ONE, { resource.name() })
.then_wrapped([resource, &qp](future<::shared_ptr<cql3::untyped_result_set>> f) {
try {
auto res = f.get0();
return parallel_for_each(res->begin(), res->end(), [&qp, res, resource](const cql3::untyped_result_set::row& r) {
auto query = sprint("DELETE FROM %s.%s WHERE %s = ? AND %s = ?"
, auth::AUTH_KS, PERMISSIONS_CF, USER_NAME, RESOURCE_NAME);
return qp.process(query, db::consistency_level::LOCAL_ONE, { r.get_as<sstring>(USER_NAME), resource.name() })
.discard_result().handle_exception([resource](auto ep) {
try {
std::rethrow_exception(ep);
} catch (exceptions::request_execution_exception& e) {
logger.warn("CassandraAuthorizer failed to revoke all permissions on {}: {}", resource, e);
}
});
});
} catch (exceptions::request_execution_exception& e) {
logger.warn("CassandraAuthorizer failed to revoke all permissions on {}: {}", resource, e);
return make_ready_future();
}
});
}
const auth::resource_ids& auth::default_authorizer::protected_resources() {
static const resource_ids ids({ data_resource(auth::AUTH_KS, PERMISSIONS_CF) });
return ids;
}
future<> auth::default_authorizer::validate_configuration() const {
return make_ready_future();
}

View File

@@ -0,0 +1,78 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright (C) 2016 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "authorizer.hh"
namespace auth {
class default_authorizer : public authorizer {
public:
static const sstring DEFAULT_AUTHORIZER_NAME;
default_authorizer();
~default_authorizer();
future<> init();
future<permission_set> authorize(::shared_ptr<authenticated_user>, data_resource) const override;
future<> grant(::shared_ptr<authenticated_user>, permission_set, data_resource, sstring) override;
future<> revoke(::shared_ptr<authenticated_user>, permission_set, data_resource, sstring) override;
future<std::vector<permission_details>> list(::shared_ptr<authenticated_user>, permission_set, optional<data_resource>, optional<sstring>) const override;
future<> revoke_all(sstring) override;
future<> revoke_all(data_resource) override;
const resource_ids& protected_resources() override;
future<> validate_configuration() const override;
private:
future<> modify(::shared_ptr<authenticated_user>, permission_set, data_resource, sstring, sstring);
};
} /* namespace auth */

View File

@@ -17,9 +17,9 @@
*/
/*
* Copyright 2016 Cloudius Systems
* Copyright (C) 2016 ScyllaDB
*
* Modified by Cloudius Systems
* Modified by ScyllaDB
*/
/*
@@ -218,12 +218,12 @@ future<::shared_ptr<auth::authenticated_user> > auth::password_authenticator::au
// obsolete prepared statements pretty quickly.
// Rely on query processing caching statements instead, and let's assume
// that a map lookup string->statement is not gonna kill us much.
auto& qp = cql3::get_local_query_processor();
return qp.process(
sprint("SELECT %s FROM %s.%s WHERE %s = ?", SALTED_HASH,
auth::AUTH_KS, CREDENTIALS_CF, USER_NAME),
consistency_for_user(username), { username }, true).then_wrapped(
[=](future<::shared_ptr<cql3::untyped_result_set>> f) {
return futurize_apply([this, username, password] {
auto& qp = cql3::get_local_query_processor();
return qp.process(sprint("SELECT %s FROM %s.%s WHERE %s = ?", SALTED_HASH,
auth::AUTH_KS, CREDENTIALS_CF, USER_NAME),
consistency_for_user(username), {username}, true);
}).then_wrapped([=](future<::shared_ptr<cql3::untyped_result_set>> f) {
try {
auto res = f.get0();
if (res->empty() || !checkpw(password, res->one().get_as<sstring>(SALTED_HASH))) {
@@ -234,6 +234,8 @@ future<::shared_ptr<auth::authenticated_user> > auth::password_authenticator::au
std::throw_with_nested(exceptions::authentication_exception("Could not verify password"));
} catch (exceptions::request_execution_exception& e) {
std::throw_with_nested(exceptions::authentication_exception(e.what()));
} catch (...) {
std::throw_with_nested(exceptions::authentication_exception("authentication failed"));
}
});
}
@@ -281,8 +283,9 @@ future<> auth::password_authenticator::drop(sstring username)
}
}
auth::authenticator::resource_ids auth::password_authenticator::protected_resources() const {
return { data_resource(auth::AUTH_KS, CREDENTIALS_CF) };
const auth::resource_ids& auth::password_authenticator::protected_resources() const {
static const resource_ids ids({ data_resource(auth::AUTH_KS, CREDENTIALS_CF) });
return ids;
}
::shared_ptr<auth::authenticator::sasl_challenge> auth::password_authenticator::new_sasl_challenge() const {

View File

@@ -17,9 +17,9 @@
*/
/*
* Copyright 2016 Cloudius Systems
* Copyright (C) 2016 ScyllaDB
*
* Modified by Cloudius Systems
* Modified by ScyllaDB
*/
/*
@@ -62,7 +62,7 @@ public:
future<> create(sstring username, const option_map& options) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) override;
future<> alter(sstring username, const option_map& options) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) override;
future<> drop(sstring username) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) override;
resource_ids protected_resources() const override;
const resource_ids& protected_resources() const override;
::shared_ptr<sasl_challenge> new_sasl_challenge() const override;

View File

@@ -17,9 +17,9 @@
*/
/*
* Copyright 2016 Cloudius Systems
* Copyright (C) 2016 ScyllaDB
*
* Modified by Cloudius Systems
* Modified by ScyllaDB
*/
/*
@@ -39,11 +39,66 @@
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include <unordered_map>
#include <boost/algorithm/string.hpp>
#include "permission.hh"
const auth::permission_set auth::ALL_DATA = auth::permission_set::of
< auth::permission::CREATE, auth::permission::ALTER,
auth::permission::DROP, auth::permission::SELECT,
auth::permission::MODIFY, auth::permission::AUTHORIZE>();
const auth::permission_set auth::ALL = auth::ALL_DATA;
const auth::permission_set auth::NONE;
const auth::permission_set auth::permissions::ALL_DATA =
auth::permission_set::of<auth::permission::CREATE,
auth::permission::ALTER, auth::permission::DROP,
auth::permission::SELECT,
auth::permission::MODIFY,
auth::permission::AUTHORIZE>();
const auth::permission_set auth::permissions::ALL = auth::permissions::ALL_DATA;
const auth::permission_set auth::permissions::NONE;
const auth::permission_set auth::permissions::ALTERATIONS =
auth::permission_set::of<auth::permission::CREATE,
auth::permission::ALTER, auth::permission::DROP>();
static const std::unordered_map<sstring, auth::permission> permission_names({
{ "READ", auth::permission::READ },
{ "WRITE", auth::permission::WRITE },
{ "CREATE", auth::permission::CREATE },
{ "ALTER", auth::permission::ALTER },
{ "DROP", auth::permission::DROP },
{ "SELECT", auth::permission::SELECT },
{ "MODIFY", auth::permission::MODIFY },
{ "AUTHORIZE", auth::permission::AUTHORIZE },
});
const sstring& auth::permissions::to_string(permission p) {
for (auto& v : permission_names) {
if (v.second == p) {
return v.first;
}
}
throw std::out_of_range("unknown permission");
}
auth::permission auth::permissions::from_string(const sstring& s) {
sstring upper(s);
boost::to_upper(upper);
return permission_names.at(upper);
}
std::unordered_set<sstring> auth::permissions::to_strings(const permission_set& set) {
std::unordered_set<sstring> res;
for (auto& v : permission_names) {
if (set.contains(v.second)) {
res.emplace(v.first);
}
}
return res;
}
auth::permission_set auth::permissions::from_strings(const std::unordered_set<sstring>& set) {
permission_set res = auth::permissions::NONE;
for (auto& s : set) {
res.set(from_string(s));
}
return res;
}
bool auth::operator<(const permission_set& p1, const permission_set& p2) {
return p1.mask() < p2.mask();
}

View File

@@ -17,9 +17,9 @@
*/
/*
* Copyright 2016 Cloudius Systems
* Copyright (C) 2016 ScyllaDB
*
* Modified by Cloudius Systems
* Modified by ScyllaDB
*/
/*
@@ -41,6 +41,9 @@
#pragma once
#include <unordered_set>
#include <seastar/core/sstring.hh>
#include "enum_set.hh"
namespace auth {
@@ -74,8 +77,22 @@ typedef enum_set<super_enum<permission,
permission::MODIFY,
permission::AUTHORIZE>> permission_set;
bool operator<(const permission_set&, const permission_set&);
namespace permissions {
extern const permission_set ALL_DATA;
extern const permission_set ALL;
extern const permission_set NONE;
extern const permission_set ALTERATIONS;
const sstring& to_string(permission);
permission from_string(const sstring&);
std::unordered_set<sstring> to_strings(const permission_set&);
permission_set from_strings(const std::unordered_set<sstring>&);
}
}

View File

@@ -1,5 +1,5 @@
/*
* Copyright (C) 2014 Cloudius Systems, Ltd.
* Copyright (C) 2014 ScyllaDB
*/
/*

View File

@@ -1,5 +1,5 @@
/*
* Copyright (C) 2015 Cloudius Systems, Ltd.
* Copyright (C) 2015 ScyllaDB
*/
/*

View File

@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*

View File

@@ -1,5 +1,5 @@
/*
* Copyright (C) 2015 Cloudius Systems, Ltd.
* Copyright (C) 2015 ScyllaDB
*/
/*

View File

@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*
*/

151
checked-file-impl.hh Normal file
View File

@@ -0,0 +1,151 @@
/*
* Copyright (C) 2016 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "seastar/core/file.hh"
#include "disk-error-handler.hh"
class checked_file_impl : public file_impl {
public:
checked_file_impl(disk_error_signal_type& s, file f)
: _signal(s) , _file(f) {
_memory_dma_alignment = f.memory_dma_alignment();
_disk_read_dma_alignment = f.disk_read_dma_alignment();
_disk_write_dma_alignment = f.disk_write_dma_alignment();
}
virtual future<size_t> write_dma(uint64_t pos, const void* buffer, size_t len, const io_priority_class& pc) override {
return do_io_check(_signal, [&] {
return get_file_impl(_file)->write_dma(pos, buffer, len, pc);
});
}
virtual future<size_t> write_dma(uint64_t pos, std::vector<iovec> iov, const io_priority_class& pc) override {
return do_io_check(_signal, [&] {
return get_file_impl(_file)->write_dma(pos, iov, pc);
});
}
virtual future<size_t> read_dma(uint64_t pos, void* buffer, size_t len, const io_priority_class& pc) override {
return do_io_check(_signal, [&] {
return get_file_impl(_file)->read_dma(pos, buffer, len, pc);
});
}
virtual future<size_t> read_dma(uint64_t pos, std::vector<iovec> iov, const io_priority_class& pc) override {
return do_io_check(_signal, [&] {
return get_file_impl(_file)->read_dma(pos, iov, pc);
});
}
virtual future<> flush(void) override {
return do_io_check(_signal, [&] {
return get_file_impl(_file)->flush();
});
}
virtual future<struct stat> stat(void) override {
return do_io_check(_signal, [&] {
return get_file_impl(_file)->stat();
});
}
virtual future<> truncate(uint64_t length) override {
return do_io_check(_signal, [&] {
return get_file_impl(_file)->truncate(length);
});
}
virtual future<> discard(uint64_t offset, uint64_t length) override {
return do_io_check(_signal, [&] {
return get_file_impl(_file)->discard(offset, length);
});
}
virtual future<> allocate(uint64_t position, uint64_t length) override {
return do_io_check(_signal, [&] {
return get_file_impl(_file)->allocate(position, length);
});
}
virtual future<uint64_t> size(void) override {
return do_io_check(_signal, [&] {
return get_file_impl(_file)->size();
});
}
virtual future<> close() override {
return do_io_check(_signal, [&] {
return get_file_impl(_file)->close();
});
}
virtual subscription<directory_entry> list_directory(std::function<future<> (directory_entry de)> next) override {
return do_io_check(_signal, [&] {
return get_file_impl(_file)->list_directory(next);
});
}
private:
disk_error_signal_type &_signal;
file _file;
};
inline file make_checked_file(disk_error_signal_type& signal, file& f)
{
return file(::make_shared<checked_file_impl>(signal, f));
}
future<file>
inline open_checked_file_dma(disk_error_signal_type& signal,
sstring name, open_flags flags,
file_open_options options)
{
return do_io_check(signal, [&] {
return open_file_dma(name, flags, options).then([&] (file f) {
return make_ready_future<file>(make_checked_file(signal, f));
});
});
}
future<file>
inline open_checked_file_dma(disk_error_signal_type& signal,
sstring name, open_flags flags)
{
return do_io_check(signal, [&] {
return open_file_dma(name, flags).then([&] (file f) {
return make_ready_future<file>(make_checked_file(signal, f));
});
});
}
future<file>
inline open_checked_directory(disk_error_signal_type& signal,
sstring name)
{
return do_io_check(signal, [&] {
return engine().open_directory(name).then([&] (file f) {
return make_ready_future<file>(make_checked_file(signal, f));
});
});
}

View File

@@ -0,0 +1,127 @@
/*
* Copyright (C) 2016 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "keys.hh"
#include "schema.hh"
#include "range.hh"
/**
* Represents the kind of bound in a range tombstone.
*/
enum class bound_kind : uint8_t {
excl_end = 0,
incl_start = 1,
// values 2 to 5 are reserved for forward Origin compatibility
incl_end = 6,
excl_start = 7,
};
std::ostream& operator<<(std::ostream& out, const bound_kind k);
bound_kind invert_kind(bound_kind k);
int32_t weight(bound_kind k);
static inline bound_kind flip_bound_kind(bound_kind bk)
{
switch (bk) {
case bound_kind::excl_end: return bound_kind::excl_start;
case bound_kind::incl_end: return bound_kind::incl_start;
case bound_kind::excl_start: return bound_kind::excl_end;
case bound_kind::incl_start: return bound_kind::incl_end;
}
abort();
}
class bound_view {
const static thread_local clustering_key empty_prefix;
public:
const clustering_key_prefix& prefix;
bound_kind kind;
bound_view(const clustering_key_prefix& prefix, bound_kind kind)
: prefix(prefix)
, kind(kind)
{ }
struct compare {
// To make it assignable and to avoid taking a schema_ptr, we
// wrap the schema reference.
std::reference_wrapper<const schema> _s;
compare(const schema& s) : _s(s)
{ }
bool operator()(const clustering_key_prefix& p1, int32_t w1, const clustering_key_prefix& p2, int32_t w2) const {
auto type = _s.get().clustering_key_prefix_type();
auto res = prefix_equality_tri_compare(type->types().begin(),
type->begin(p1), type->end(p1),
type->begin(p2), type->end(p2),
tri_compare);
if (res) {
return res < 0;
}
auto d1 = p1.size(_s);
auto d2 = p2.size(_s);
if (d1 == d2) {
return w1 < w2;
}
return d1 < d2 ? w1 <= 0 : w2 > 0;
}
bool operator()(const bound_view b, const clustering_key_prefix& p) const {
return operator()(b.prefix, weight(b.kind), p, 0);
}
bool operator()(const clustering_key_prefix& p, const bound_view b) const {
return operator()(p, 0, b.prefix, weight(b.kind));
}
bool operator()(const bound_view b1, const bound_view b2) const {
return operator()(b1.prefix, weight(b1.kind), b2.prefix, weight(b2.kind));
}
};
bool equal(const schema& s, const bound_view other) const {
return kind == other.kind && prefix.equal(s, other.prefix);
}
bool adjacent(const schema& s, const bound_view other) const {
return invert_kind(other.kind) == kind && prefix.equal(s, other.prefix);
}
static bound_view bottom() {
return {empty_prefix, bound_kind::incl_start};
}
static bound_view top() {
return {empty_prefix, bound_kind::incl_end};
}
/*
template<template<typename> typename T, typename U>
concept bool Range() {
return requires (T<U> range) {
{ range.start() } -> stdx::optional<U>;
{ range.end() } -> stdx::optional<U>;
};
};*/
template<template<typename> typename Range>
static std::pair<bound_view, bound_view> from_range(const Range<clustering_key_prefix>& range) {
return {
range.start() ? bound_view(range.start()->value(), range.start()->is_inclusive() ? bound_kind::incl_start : bound_kind::excl_start) : bottom(),
range.end() ? bound_view(range.end()->value(), range.end()->is_inclusive() ? bound_kind::incl_end : bound_kind::excl_end) : top(),
};
}
friend std::ostream& operator<<(std::ostream& out, const bound_view& b) {
return out << "{bound: prefix=" << b.prefix << ", kind=" << b.kind << "}";
}
};
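
Note on the header above: from_range() maps an optional range bound to a bound kind, substituting bottom() (inclusive start) or top() (inclusive end) when a side is missing, and flip_bound_kind() swaps start and end while preserving inclusiveness. The sketch below reproduces only that mapping on plain integers. int_bound and kinds_for_range are hypothetical names, and std::optional is used in place of the range type from the source.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <optional>
#include <utility>

// Same enumerator values as in the header above.
enum class bound_kind : uint8_t {
    excl_end = 0,
    incl_start = 1,
    incl_end = 6,
    excl_start = 7,
};

// Swap start <-> end, keeping inclusiveness.
inline bound_kind flip_bound_kind(bound_kind bk) {
    switch (bk) {
    case bound_kind::excl_end: return bound_kind::excl_start;
    case bound_kind::incl_end: return bound_kind::incl_start;
    case bound_kind::excl_start: return bound_kind::excl_end;
    case bound_kind::incl_start: return bound_kind::incl_end;
    }
    std::abort();
}

// Hypothetical simplified bound: a value plus an inclusiveness flag.
struct int_bound {
    int value;
    bool inclusive;
};

// Kinds for the two sides of a range; a missing side behaves like
// bottom() (incl_start) or top() (incl_end).
inline std::pair<bound_kind, bound_kind>
kinds_for_range(std::optional<int_bound> start, std::optional<int_bound> end) {
    bound_kind s = start
        ? (start->inclusive ? bound_kind::incl_start : bound_kind::excl_start)
        : bound_kind::incl_start;
    bound_kind e = end
        ? (end->inclusive ? bound_kind::incl_end : bound_kind::excl_end)
        : bound_kind::incl_end;
    return {s, e};
}
```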

138
clustering_key_filter.cc Normal file
View File

@@ -0,0 +1,138 @@
/*
* Copyright (C) 2016 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "clustering_key_filter.hh"
#include "keys.hh"
#include "query-request.hh"
#include "range.hh"
namespace query {
const clustering_row_ranges&
clustering_key_filtering_context::get_ranges(const partition_key& key) const {
static thread_local clustering_row_ranges full_range = {{}};
return _factory ? _factory->get_ranges(key) : full_range;
}
clustering_key_filtering_context clustering_key_filtering_context::create_no_filtering() {
return clustering_key_filtering_context{};
}
const clustering_key_filtering_context no_clustering_key_filtering =
clustering_key_filtering_context::create_no_filtering();
class stateless_clustering_key_filter_factory : public clustering_key_filter_factory {
clustering_key_filter _filter;
clustering_row_ranges _ranges;
public:
stateless_clustering_key_filter_factory(clustering_row_ranges&& ranges,
clustering_key_filter&& filter)
: _filter(std::move(filter)), _ranges(std::move(ranges)) {}
virtual clustering_key_filter get_filter(const partition_key& key) override {
return _filter;
}
virtual clustering_key_filter get_filter_for_sorted(const partition_key& key) override {
return _filter;
}
virtual const clustering_row_ranges& get_ranges(const partition_key& key) override {
return _ranges;
}
virtual bool want_static_columns(const partition_key& key) override {
return true;
}
};
class partition_slice_clustering_key_filter_factory : public clustering_key_filter_factory {
schema_ptr _schema;
const partition_slice& _slice;
clustering_key_prefix::prefix_equal_tri_compare _cmp;
clustering_row_ranges _ck_ranges;
public:
partition_slice_clustering_key_filter_factory(schema_ptr s, const partition_slice& slice)
: _schema(std::move(s)), _slice(slice), _cmp(*_schema) {}
virtual clustering_key_filter get_filter(const partition_key& key) override {
const clustering_row_ranges& ranges = _slice.row_ranges(*_schema, key);
return [this, &ranges] (const clustering_key& key) {
return std::any_of(std::begin(ranges), std::end(ranges),
[this, &key] (const clustering_range& r) { return r.contains(key, _cmp); });
};
}
virtual clustering_key_filter get_filter_for_sorted(const partition_key& key) override {
const clustering_row_ranges& ranges = _slice.row_ranges(*_schema, key);
return [this, &ranges] (const clustering_key& key) {
return std::any_of(std::begin(ranges), std::end(ranges),
[this, &key] (const clustering_range& r) { return r.contains(key, _cmp); });
};
}
virtual const clustering_row_ranges& get_ranges(const partition_key& key) override {
if (_slice.options.contains(query::partition_slice::option::reversed)) {
_ck_ranges = _slice.row_ranges(*_schema, key);
std::reverse(_ck_ranges.begin(), _ck_ranges.end());
return _ck_ranges;
}
return _slice.row_ranges(*_schema, key);
}
virtual bool want_static_columns(const partition_key& key) override {
return true;
}
};
static const shared_ptr<clustering_key_filter_factory>
create_partition_slice_filter(schema_ptr s, const partition_slice& slice) {
return ::make_shared<partition_slice_clustering_key_filter_factory>(std::move(s), slice);
}
const clustering_key_filtering_context
clustering_key_filtering_context::create(schema_ptr schema, const partition_slice& slice) {
static thread_local clustering_key_filtering_context accept_all = clustering_key_filtering_context(
::make_shared<stateless_clustering_key_filter_factory>(clustering_row_ranges{{}},
[](const clustering_key&) { return true; }));
static thread_local clustering_key_filtering_context reject_all = clustering_key_filtering_context(
::make_shared<stateless_clustering_key_filter_factory>(clustering_row_ranges{},
[](const clustering_key&) { return false; }));
if (slice.get_specific_ranges()) {
return clustering_key_filtering_context(create_partition_slice_filter(schema, slice));
}
const clustering_row_ranges& ranges = slice.default_row_ranges();
if (ranges.empty()) {
return reject_all;
}
if (ranges.size() == 1 && ranges[0].is_full()) {
return accept_all;
}
return clustering_key_filtering_context(create_partition_slice_filter(schema, slice));
}
}
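
Note on the file above: create() picks one of three filters: reject everything when there are no ranges, accept everything when the only range is the full range, and otherwise test the key against each range with std::any_of. A rough standalone sketch of that selection follows, using integers as keys. int_range, key_filter and make_filter are hypothetical names, and the closure copies the ranges instead of referencing the slice as the original does.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical closed integer range; `full` marks the unbounded range.
struct int_range {
    int lo;
    int hi;
    bool full = false;

    bool contains(int k) const {
        return full || (lo <= k && k <= hi);
    }
};

// Predicate deciding whether a key is accepted.
using key_filter = std::function<bool(int)>;

// Choose a filter the way create() chooses a filtering context.
inline key_filter make_filter(std::vector<int_range> ranges) {
    if (ranges.empty()) {
        return [](int) { return false; };   // reject_all
    }
    if (ranges.size() == 1 && ranges[0].full) {
        return [](int) { return true; };    // accept_all
    }
    // General case: accept a key contained in any range.
    return [ranges = std::move(ranges)](int k) {
        return std::any_of(ranges.begin(), ranges.end(),
                           [k](const int_range& r) { return r.contains(k); });
    };
}
```

Copying the ranges into the closure trades a little memory for not having to keep the source slice alive; the original avoids the copy by holding references.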

82
clustering_key_filter.hh Normal file
View File

@@ -0,0 +1,82 @@
/*
* Copyright (C) 2016 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <functional>
#include <vector>
#include "core/shared_ptr.hh"
#include "database_fwd.hh"
#include "schema.hh"
template<typename T> class range;
namespace query {
class partition_slice;
// A predicate that tells if a clustering key should be accepted.
using clustering_key_filter = std::function<bool(const clustering_key&)>;
// A factory for clustering key filter which can be reused for multiple clustering keys.
class clustering_key_filter_factory {
public:
// Create a clustering key filter that can be used for multiple clustering keys with no restrictions.
virtual clustering_key_filter get_filter(const partition_key&) = 0;
// Create a clustering key filter that can be used for multiple clustering keys but they have to be sorted.
virtual clustering_key_filter get_filter_for_sorted(const partition_key&) = 0;
virtual const std::vector<range<clustering_key_prefix>>& get_ranges(const partition_key&) = 0;
// Whether we want to get the static row, in addition to the desired clustering rows
virtual bool want_static_columns(const partition_key&) = 0;
virtual ~clustering_key_filter_factory() = default;
};
class clustering_key_filtering_context {
private:
shared_ptr<clustering_key_filter_factory> _factory;
clustering_key_filtering_context() {};
clustering_key_filtering_context(shared_ptr<clustering_key_filter_factory> factory) : _factory(factory) {}
public:
// Create a clustering key filter that can be used for multiple clustering keys with no restrictions.
clustering_key_filter get_filter(const partition_key& key) const {
return _factory ? _factory->get_filter(key) : [] (const clustering_key&) { return true; };
}
// Create a clustering key filter that can be used for multiple clustering keys but they have to be sorted.
clustering_key_filter get_filter_for_sorted(const partition_key& key) const {
return _factory ? _factory->get_filter_for_sorted(key) : [] (const clustering_key&) { return true; };
}
const std::vector<range<clustering_key_prefix>>& get_ranges(const partition_key& key) const;
bool want_static_columns(const partition_key& key) const {
return _factory ? _factory->want_static_columns(key) : true;
}
static const clustering_key_filtering_context create(schema_ptr, const partition_slice&);
static clustering_key_filtering_context create_no_filtering();
};
extern const clustering_key_filtering_context no_clustering_key_filtering;
}


@@ -1,5 +1,5 @@
/*
* Copyright (C) 2015 Cloudius Systems, Ltd.
* Copyright (C) 2015 ScyllaDB
*/
/*


@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*
@@ -22,6 +22,8 @@
#pragma once
class column_family;
class schema;
using schema_ptr = lw_shared_ptr<const schema>;
namespace sstables {
@@ -30,11 +32,12 @@ enum class compaction_strategy_type {
major,
size_tiered,
leveled,
// FIXME: Add support to DateTiered.
date_tiered,
};
class compaction_strategy_impl;
class sstable;
class sstable_set;
struct compaction_descriptor;
class compaction_strategy {
@@ -51,6 +54,16 @@ public:
// Return a list of sstables to be compacted after applying the strategy.
compaction_descriptor get_sstables_for_compaction(column_family& cfs, std::vector<lw_shared_ptr<sstable>> candidates);
// Some strategies may look at the compacted and resulting sstables to
// get some useful information for subsequent compactions.
void notify_completion(schema_ptr schema, const std::vector<lw_shared_ptr<sstable>>& removed, const std::vector<lw_shared_ptr<sstable>>& added);
// Return whether the strategy allows parallel compaction.
bool parallel_compaction() const;
// An estimate of the number of compactions needed for the strategy to be satisfied.
int64_t estimated_pending_compactions(column_family& cf) const;
static sstring name(compaction_strategy_type type) {
switch (type) {
case compaction_strategy_type::null:
@@ -61,6 +74,8 @@ public:
return "SizeTieredCompactionStrategy";
case compaction_strategy_type::leveled:
return "LeveledCompactionStrategy";
case compaction_strategy_type::date_tiered:
return "DateTieredCompactionStrategy";
default:
throw std::runtime_error("Invalid Compaction Strategy");
}
@@ -77,6 +92,8 @@ public:
return compaction_strategy_type::size_tiered;
} else if (short_name == "LeveledCompactionStrategy") {
return compaction_strategy_type::leveled;
} else if (short_name == "DateTieredCompactionStrategy") {
return compaction_strategy_type::date_tiered;
} else {
throw exceptions::configuration_exception(sprint("Unable to find compaction strategy class '%s'", name));
}
@@ -87,6 +104,8 @@ public:
sstring name() const {
return name(type());
}
sstable_set make_sstable_set(schema_ptr schema) const;
};
// Creates a compaction_strategy object from one of the strategies available.
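The hunks above add `date_tiered` in both directions of the type/name mapping. A minimal sketch of keeping such a mapping consistent, assuming only the three strategy names visible in this diff; `strategy_type`, `to_name` and `from_name` are hypothetical names, and the lookup is derived from the forward mapping rather than written as a second `if`/`else` chain as in the original.

```cpp
#include <initializer_list>
#include <stdexcept>
#include <string>

// Hypothetical subset of compaction_strategy_type: only the entries
// whose class names appear in the diff.
enum class strategy_type { size_tiered, leveled, date_tiered };

inline std::string to_name(strategy_type t) {
    switch (t) {
    case strategy_type::size_tiered:
        return "SizeTieredCompactionStrategy";
    case strategy_type::leveled:
        return "LeveledCompactionStrategy";
    case strategy_type::date_tiered:
        return "DateTieredCompactionStrategy";
    }
    throw std::runtime_error("Invalid Compaction Strategy");
}

// Reverse lookup built on to_name(), so a new enumerator only needs
// one new case above and one new entry in the list below.
inline strategy_type from_name(const std::string& name) {
    for (auto t : {strategy_type::size_tiered,
                   strategy_type::leveled,
                   strategy_type::date_tiered}) {
        if (to_name(t) == name) {
            return t;
        }
    }
    throw std::invalid_argument("Unable to find compaction strategy class '" + name + "'");
}
```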


@@ -0,0 +1,64 @@
/*
* Copyright (C) 2016 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "query-request.hh"
#include <experimental/optional>
// Wraps ring_position so it is compatible with old-style C++: default constructor,
// stateless comparators, yada yada
class compatible_ring_position {
const schema* _schema = nullptr;
// optional to supply a default constructor, no more
std::experimental::optional<dht::ring_position> _rp;
public:
compatible_ring_position() noexcept = default;
compatible_ring_position(const schema& s, const dht::ring_position& rp)
: _schema(&s), _rp(rp) {
}
compatible_ring_position(const schema& s, dht::ring_position&& rp)
: _schema(&s), _rp(std::move(rp)) {
}
friend int tri_compare(const compatible_ring_position& x, const compatible_ring_position& y) {
return x._rp->tri_compare(*x._schema, *y._rp);
}
friend bool operator<(const compatible_ring_position& x, const compatible_ring_position& y) {
return tri_compare(x, y) < 0;
}
friend bool operator<=(const compatible_ring_position& x, const compatible_ring_position& y) {
return tri_compare(x, y) <= 0;
}
friend bool operator>(const compatible_ring_position& x, const compatible_ring_position& y) {
return tri_compare(x, y) > 0;
}
friend bool operator>=(const compatible_ring_position& x, const compatible_ring_position& y) {
return tri_compare(x, y) >= 0;
}
friend bool operator==(const compatible_ring_position& x, const compatible_ring_position& y) {
return tri_compare(x, y) == 0;
}
friend bool operator!=(const compatible_ring_position& x, const compatible_ring_position& y) {
return tri_compare(x, y) != 0;
}
};
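The pattern used by `compatible_ring_position` is to derive all six relational operators from a single three-way comparison. A standalone sketch, under these assumptions: `position` is a hypothetical stand-in for `dht::ring_position`, no schema is needed for comparison, and C++17 `std::optional` replaces `std::experimental::optional`.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for dht::ring_position.
struct position {
    long token;
    int tri_compare(const position& other) const {
        return token < other.token ? -1 : (token > other.token ? 1 : 0);
    }
};

class comparable_position {
    // Optional only so that a default constructor can exist.
    std::optional<position> _pos;
public:
    comparable_position() noexcept = default;
    explicit comparable_position(position p) : _pos(p) {}

    // Precondition: both operands hold a value. A default-constructed
    // object can be stored in a container but must not be compared.
    friend int tri_compare(const comparable_position& x, const comparable_position& y) {
        assert(x._pos && y._pos);
        return x._pos->tri_compare(*y._pos);
    }
    friend bool operator<(const comparable_position& x, const comparable_position& y) {
        return tri_compare(x, y) < 0;
    }
    friend bool operator<=(const comparable_position& x, const comparable_position& y) {
        return tri_compare(x, y) <= 0;
    }
    friend bool operator>(const comparable_position& x, const comparable_position& y) {
        return tri_compare(x, y) > 0;
    }
    friend bool operator>=(const comparable_position& x, const comparable_position& y) {
        return tri_compare(x, y) >= 0;
    }
    friend bool operator==(const comparable_position& x, const comparable_position& y) {
        return tri_compare(x, y) == 0;
    }
    friend bool operator!=(const comparable_position& x, const comparable_position& y) {
        return tri_compare(x, y) != 0;
    }
};
```

Because the comparators are stateless free functions, a type like this can be used as a key in containers that expect `operator<` without passing a schema-aware comparator object.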


@@ -1,5 +1,5 @@
/*
* Copyright (C) 2015 Cloudius Systems, Ltd.
* Copyright (C) 2015 ScyllaDB
*/
/*


@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*
@@ -21,7 +21,10 @@
#pragma once
#include <boost/range/algorithm/copy.hpp>
#include <boost/range/adaptor/transformed.hpp>
#include "compound.hh"
#include "schema.hh"
//
// This header provides adaptors between the representation used by our compound_type<>
@@ -180,3 +183,348 @@ bytes to_legacy(CompoundType& type, bytes_view packed) {
std::copy(lv.begin(), lv.end(), legacy_form.begin());
return legacy_form;
}
// Represents a value serialized according to Origin's CompositeType.
// If is_compound is true, then the value is one or more components encoded as:
//
// <representation> ::= ( <component> )+
// <component> ::= <length> <value> <EOC>
// <length> ::= <uint16_t>
// <EOC> ::= <uint8_t>
//
// If false, then it encodes a single value, without a prefix length or a suffix EOC.
class composite final {
bytes _bytes;
bool _is_compound;
public:
composite(bytes&& b, bool is_compound)
: _bytes(std::move(b))
, _is_compound(is_compound)
{ }
composite(bytes&& b)
: _bytes(std::move(b))
, _is_compound(true)
{ }
composite()
: _bytes()
, _is_compound(true)
{ }
using size_type = uint16_t;
using eoc_type = int8_t;
/*
* The 'end-of-component' byte should always be 0 for an actual column name.
* However, it can be set to 1 for query bounds. This allows querying for the
* equivalent of 'give me the full range'. That is, if a slice query is:
* start = <3><"foo".getBytes()><0>
* end = <3><"foo".getBytes()><1>
* then we'll return *all* the columns whose first component is "foo".
* If, for a component, the 'end-of-component' is != 0, there should not be any
* following component. The end-of-component can also be -1 to allow a
* non-inclusive query. For instance:
* end = <3><"foo".getBytes()><-1>
* allows querying everything that is smaller than <3><"foo".getBytes()>, but
* not <3><"foo".getBytes()> itself.
*/
enum class eoc : eoc_type {
start = -1,
none = 0,
end = 1
};
using component = std::pair<bytes, eoc>;
using component_view = std::pair<bytes_view, eoc>;
private:
template<typename Value, typename = std::enable_if_t<!std::is_same<const data_value, std::decay_t<Value>>::value>>
static size_t size(Value& val) {
return val.size();
}
static size_t size(const data_value& val) {
return val.serialized_size();
}
template<typename Value, typename = std::enable_if_t<!std::is_same<data_value, std::decay_t<Value>>::value>>
static void write_value(Value&& val, bytes::iterator& out) {
out = std::copy(val.begin(), val.end(), out);
}
static void write_value(const data_value& val, bytes::iterator& out) {
val.serialize(out);
}
template<typename RangeOfSerializedComponents>
static void serialize_value(RangeOfSerializedComponents&& values, bytes::iterator& out, bool is_compound) {
if (!is_compound) {
auto it = values.begin();
write_value(std::forward<decltype(*it)>(*it), out);
return;
}
for (auto&& val : values) {
write<size_type>(out, static_cast<size_type>(size(val)));
write_value(std::forward<decltype(val)>(val), out);
// Range tombstones are not keys. For collections, only frozen
// values can be keys. Therefore, for as long as it is safe to
// assume that this code will be used to create keys, it is safe
// to assume the trailing byte is always zero.
write<eoc_type>(out, eoc_type(eoc::none));
}
}
template <typename RangeOfSerializedComponents>
static size_t serialized_size(RangeOfSerializedComponents&& values, bool is_compound) {
size_t len = 0;
auto it = values.begin();
if (it != values.end()) {
// CQL3 uses a specific prefix (0xFFFF) to encode "static columns"
// (CASSANDRA-6561). This does mean the maximum size of the first component of a
// composite is 65534, not 65535 (or we wouldn't be able to detect if the first 2
// bytes is the static prefix or not).
auto value_size = size(*it);
if (value_size > static_cast<size_type>(std::numeric_limits<size_type>::max() - uint8_t(is_compound))) {
throw std::runtime_error(sprint("First component size too large: %d > %d", value_size, std::numeric_limits<size_type>::max() - is_compound));
}
if (!is_compound) {
return value_size;
}
len += sizeof(size_type) + value_size + sizeof(eoc_type);
++it;
}
for ( ; it != values.end(); ++it) {
auto value_size = size(*it);
if (value_size > std::numeric_limits<size_type>::max()) {
throw std::runtime_error(sprint("Component size too large: %d > %d", value_size, std::numeric_limits<size_type>::max()));
}
len += sizeof(size_type) + value_size + sizeof(eoc_type);
}
return len;
}
public:
template <typename Describer>
auto describe_type(Describer f) const {
return f(const_cast<bytes&>(_bytes));
}
template<typename RangeOfSerializedComponents>
static bytes serialize_value(RangeOfSerializedComponents&& values, bool is_compound = true) {
auto size = serialized_size(values, is_compound);
bytes b(bytes::initialized_later(), size);
auto i = b.begin();
serialize_value(std::forward<decltype(values)>(values), i, is_compound);
return b;
}
class iterator : public std::iterator<std::input_iterator_tag, const component_view> {
bytes_view _v;
component_view _current;
private:
eoc to_eoc(int8_t eoc_byte) {
return eoc_byte == 0 ? eoc::none : (eoc_byte < 0 ? eoc::start : eoc::end);
}
void read_current() {
size_type len;
{
if (_v.empty()) {
_v = bytes_view(nullptr, 0);
return;
}
len = read_simple<size_type>(_v);
if (_v.size() < len) {
throw marshal_exception();
}
}
auto value = bytes_view(_v.begin(), len);
_v.remove_prefix(len);
_current = component_view(std::move(value), to_eoc(read_simple<eoc_type>(_v)));
}
public:
struct end_iterator_tag {};
iterator(const bytes_view& v, bool is_compound, bool is_static)
: _v(v) {
if (is_static) {
_v.remove_prefix(2);
}
if (is_compound) {
read_current();
} else {
_current = component_view(_v, eoc::none);
_v.remove_prefix(_v.size());
}
}
iterator(end_iterator_tag) : _v(nullptr, 0) {}
iterator& operator++() {
read_current();
return *this;
}
iterator operator++(int) {
iterator i(*this);
++(*this);
return i;
}
const value_type& operator*() const { return _current; }
const value_type* operator->() const { return &_current; }
bool operator!=(const iterator& i) const { return _v.begin() != i._v.begin(); }
bool operator==(const iterator& i) const { return _v.begin() == i._v.begin(); }
};
iterator begin() const {
return iterator(_bytes, _is_compound, is_static());
}
iterator end() const {
return iterator(iterator::end_iterator_tag());
}
boost::iterator_range<iterator> components() const & {
return { begin(), end() };
}
auto values() const & {
return components() | boost::adaptors::transformed([](auto&& c) { return c.first; });
}
std::vector<component> components() const && {
std::vector<component> result;
std::transform(begin(), end(), std::back_inserter(result), [](auto&& p) {
return component(bytes(p.first.begin(), p.first.end()), p.second);
});
return result;
}
std::vector<bytes> values() const && {
std::vector<bytes> result;
boost::copy(components() | boost::adaptors::transformed([](auto&& c) { return to_bytes(c.first); }), std::back_inserter(result));
return result;
}
const bytes& get_bytes() const {
return _bytes;
}
size_t size() const {
return _bytes.size();
}
bool empty() const {
return _bytes.empty();
}
static bool is_static(bytes_view bytes, bool is_compound) {
return is_compound && bytes.size() > 2 && (bytes[0] & bytes[1] & 0xff) == 0xff;
}
bool is_static() const {
return is_static(_bytes, _is_compound);
}
bool is_compound() const {
return _is_compound;
}
// The following factory functions assume this composite is a compound value.
template <typename ClusteringElement>
static composite from_clustering_element(const schema& s, const ClusteringElement& ce) {
return serialize_value(ce.components(s));
}
static composite from_exploded(const std::vector<bytes_view>& v, eoc marker = eoc::none) {
if (v.size() == 0) {
return bytes(size_t(1), bytes::value_type(marker));
}
auto b = serialize_value(v);
b.back() = eoc_type(marker);
return composite(std::move(b));
}
static composite static_prefix(const schema& s) {
static bytes static_marker(size_t(2), bytes::value_type(0xff));
std::vector<bytes_view> sv(s.clustering_key_size());
return static_marker + serialize_value(sv);
}
explicit operator bytes_view() const {
return _bytes;
}
template <typename Component>
friend inline std::ostream& operator<<(std::ostream& os, const std::pair<Component, eoc>& c) {
return os << "{value=" << c.first << "; eoc=" << sprint("0x%02x", eoc_type(c.second) & 0xff) << "}";
}
};
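The `<length> <value> <EOC>` layout described in the comments of `class composite` can be illustrated with a small standalone encoder. This is a sketch under stated assumptions, not the code from the diff: `encode_composite` is a hypothetical name, the 16-bit length is written big-endian (an assumption about the wire format), only the last component may carry a non-zero EOC (as in `from_exploded`), and the 0xFFFF static prefix is not handled.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

using bytes = std::vector<uint8_t>;

// Encodes each component as <uint16 length><value><EOC byte>.
// last_eoc is -1, 0 or 1 and is applied to the final component only;
// every other component gets an EOC of 0.
inline bytes encode_composite(const std::vector<std::string>& components,
                              int8_t last_eoc = 0) {
    bytes out;
    for (std::size_t i = 0; i < components.size(); ++i) {
        const std::string& c = components[i];
        if (c.size() > 0xFFFF) {
            throw std::runtime_error("Component size too large");
        }
        // Assumption of this sketch: length is stored big-endian.
        out.push_back(static_cast<uint8_t>(c.size() >> 8));
        out.push_back(static_cast<uint8_t>(c.size() & 0xff));
        out.insert(out.end(), c.begin(), c.end());
        const bool last = (i + 1 == components.size());
        out.push_back(static_cast<uint8_t>(last ? last_eoc : 0));
    }
    return out;
}
```

With this layout, `encode_composite({"foo"}, 1)` corresponds to the `end = <3><"foo".getBytes()><1>` bound from the comment, and passing `-1` gives the non-inclusive form.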
class composite_view final {
bytes_view _bytes;
bool _is_compound;
public:
composite_view(bytes_view b, bool is_compound = true)
: _bytes(b)
, _is_compound(is_compound)
{ }
composite_view(const composite& c)
: composite_view(static_cast<bytes_view>(c), c.is_compound())
{ }
composite_view()
: _bytes(nullptr, 0)
, _is_compound(true)
{ }
std::vector<bytes> explode() const {
if (!_is_compound) {
return { to_bytes(_bytes) };
}
std::vector<bytes> ret;
for (auto it = begin(), e = end(); it != e; ) {
ret.push_back(to_bytes(it->first));
auto marker = it->second;
++it;
if (it != e && marker != composite::eoc::none) {
throw runtime_exception(sprint("non-zero component divider found (%d) mid", sprint("0x%02x", composite::eoc_type(marker) & 0xff)));
}
}
return ret;
}
composite::iterator begin() const {
return composite::iterator(_bytes, _is_compound, is_static());
}
composite::iterator end() const {
return composite::iterator(composite::iterator::end_iterator_tag());
}
boost::iterator_range<composite::iterator> components() const {
return { begin(), end() };
}
auto values() const {
return components() | boost::adaptors::transformed([](auto&& c) { return c.first; });
}
size_t size() const {
return _bytes.size();
}
bool empty() const {
return _bytes.empty();
}
bool is_static() const {
return composite::is_static(_bytes, _is_compound);
}
explicit operator bytes_view() const {
return _bytes;
}
bool operator==(const composite_view& k) const { return k._bytes == _bytes && k._is_compound == _is_compound; }
bool operator!=(const composite_view& k) const { return !(k == *this); }
};
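The reading side (`composite::iterator` plus `composite_view::explode`) can be sketched the same way. The functions `decode_composite` and `explode` below are hypothetical standalone versions that assume a compound value, a big-endian 16-bit length, and no static prefix; unlike the iterator in the diff, this sketch also bounds-checks the EOC byte.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Splits a compound composite into (value, eoc) pairs.
inline std::vector<std::pair<std::string, int8_t>>
decode_composite(const std::vector<uint8_t>& in) {
    std::vector<std::pair<std::string, int8_t>> out;
    std::size_t pos = 0;
    while (pos < in.size()) {
        if (in.size() - pos < 2) {
            throw std::runtime_error("truncated component length");
        }
        // Assumption of this sketch: length is stored big-endian.
        const std::size_t len = (std::size_t(in[pos]) << 8) | in[pos + 1];
        pos += 2;
        if (in.size() - pos < len + 1) {
            throw std::runtime_error("truncated component");
        }
        std::string value(in.begin() + pos, in.begin() + pos + len);
        pos += len;
        const int8_t eoc = static_cast<int8_t>(in[pos++]);
        out.emplace_back(std::move(value), eoc);
    }
    return out;
}

// Returns only the values, rejecting a non-zero EOC on any component
// that is followed by another one, as explode() does in the diff.
inline std::vector<std::string> explode(const std::vector<uint8_t>& in) {
    auto comps = decode_composite(in);
    std::vector<std::string> values;
    for (std::size_t i = 0; i < comps.size(); ++i) {
        if (i + 1 != comps.size() && comps[i].second != 0) {
            throw std::runtime_error("non-zero component divider found mid-composite");
        }
        values.push_back(comps[i].first);
    }
    return values;
}
```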


@@ -1,5 +1,5 @@
/*
* Copyright (C) 2015 Cloudius Systems, Ltd.
* Copyright (C) 2015 ScyllaDB
*/
/*
@@ -32,7 +32,7 @@ enum class compressor {
class compression_parameters {
public:
static constexpr int32_t DEFAULT_CHUNK_LENGTH = 64 * 1024;
static constexpr int32_t DEFAULT_CHUNK_LENGTH = 4 * 1024;
static constexpr double DEFAULT_CRC_CHECK_CHANCE = 1.0;
static constexpr auto SSTABLE_COMPRESSION = "sstable_compression";

conf/housekeeping.cfg Normal file

@@ -0,0 +1,2 @@
[housekeeping]
check-version: True


@@ -106,6 +106,19 @@ write_request_timeout_in_ms: 2000
# most users should never need to adjust this.
# phi_convict_threshold: 8
# IEndpointSnitch. The snitch has two functions:
# - it teaches Scylla enough about your network topology to route
# requests efficiently
# - it allows Scylla to spread replicas around your cluster to avoid
# correlated failures. It does this by grouping machines into
# "datacenters" and "racks." Scylla will do its best not to have
# more than one replica on the same "rack" (which may not actually
# be a physical location)
#
# IF YOU CHANGE THE SNITCH AFTER DATA IS INSERTED INTO THE CLUSTER,
# YOU MUST RUN A FULL REPAIR, SINCE THE SNITCH AFFECTS WHERE REPLICAS
# ARE PLACED.
#
# Out of the box, Scylla provides
# - SimpleSnitch:
# Treats Strategy order as proximity. This can improve cache
@@ -179,6 +192,24 @@ api_address: 127.0.0.1
# Caution should be taken on increasing the size of this threshold as it can lead to node instability.
batch_size_warn_threshold_in_kb: 5
# Authentication backend, identifying users
# Out of the box, Scylla provides org.apache.cassandra.auth.{AllowAllAuthenticator,
# PasswordAuthenticator}.
#
# - AllowAllAuthenticator performs no checks - set it to disable authentication.
# - PasswordAuthenticator relies on username/password pairs to authenticate
# users. It keeps usernames and hashed passwords in system_auth.credentials table.
# Please increase system_auth keyspace replication factor if you use this authenticator.
# authenticator: AllowAllAuthenticator
# Authorization backend, implementing IAuthorizer; used to limit access/provide permissions
# Out of the box, Scylla provides org.apache.cassandra.auth.{AllowAllAuthorizer,
# CassandraAuthorizer}.
#
# - AllowAllAuthorizer allows any action to any user - set it to disable authorization.
# - CassandraAuthorizer stores permissions in system_auth.permissions table. Please
# increase system_auth keyspace replication factor if you use this authorizer.
# authorizer: AllowAllAuthorizer
###################################################
## Not currently supported, reserved for future use
@@ -216,25 +247,6 @@ batch_size_warn_threshold_in_kb: 5
# reduced proportionally to the number of nodes in the cluster.
# batchlog_replay_throttle_in_kb: 1024
# Authentication backend, identifying users
# Out of the box, Scylla provides org.apache.cassandra.auth.{AllowAllAuthenticator,
# PasswordAuthenticator}.
#
# - AllowAllAuthenticator performs no checks - set it to disable authentication.
# - PasswordAuthenticator relies on username/password pairs to authenticate
# users. It keeps usernames and hashed passwords in system_auth.credentials table.
# Please increase system_auth keyspace replication factor if you use this authenticator.
# authenticator: AllowAllAuthenticator
# Authorization backend, implementing IAuthorizer; used to limit access/provide permissions
# Out of the box, Scylla provides org.apache.cassandra.auth.{AllowAllAuthorizer,
# CassandraAuthorizer}.
#
# - AllowAllAuthorizer allows any action to any user - set it to disable authorization.
# - CassandraAuthorizer stores permissions in system_auth.permissions table. Please
# increase system_auth keyspace replication factor if you use this authorizer.
# authorizer: AllowAllAuthorizer
# Validity period for permissions cache (fetching permissions can be an
# expensive operation depending on the authorizer, CassandraAuthorizer is
# one example). Defaults to 2000, set to 0 to disable.
@@ -680,58 +692,6 @@ commitlog_total_space_in_mb: -1
# Default value is 0, which never times out streams.
# streaming_socket_timeout_in_ms: 0
# endpoint_snitch -- Set this to a class that implements
# IEndpointSnitch. The snitch has two functions:
# - it teaches Scylla enough about your network topology to route
# requests efficiently
# - it allows Scylla to spread replicas around your cluster to avoid
# correlated failures. It does this by grouping machines into
# "datacenters" and "racks." Scylla will do its best not to have
# more than one replica on the same "rack" (which may not actually
# be a physical location)
#
# IF YOU CHANGE THE SNITCH AFTER DATA IS INSERTED INTO THE CLUSTER,
# YOU MUST RUN A FULL REPAIR, SINCE THE SNITCH AFFECTS WHERE REPLICAS
# ARE PLACED.
#
# Out of the box, Scylla provides
# - SimpleSnitch:
# Treats Strategy order as proximity. This can improve cache
# locality when disabling read repair. Only appropriate for
# single-datacenter deployments.
# - GossipingPropertyFileSnitch
# This should be your go-to snitch for production use. The rack
# and datacenter for the local node are defined in
# cassandra-rackdc.properties and propagated to other nodes via
# gossip. If cassandra-topology.properties exists, it is used as a
# fallback, allowing migration from the PropertyFileSnitch.
# - PropertyFileSnitch:
# Proximity is determined by rack and data center, which are
# explicitly configured in cassandra-topology.properties.
# - Ec2Snitch:
# Appropriate for EC2 deployments in a single Region. Loads Region
# and Availability Zone information from the EC2 API. The Region is
# treated as the datacenter, and the Availability Zone as the rack.
# Only private IPs are used, so this will not work across multiple
# Regions.
# - Ec2MultiRegionSnitch:
# Uses public IPs as broadcast_address to allow cross-region
# connectivity. (Thus, you should set seed addresses to the public
# IP as well.) You will need to open the storage_port or
# ssl_storage_port on the public IP firewall. (For intra-Region
# traffic, Scylla will switch to the private IP after
# establishing a connection.)
# - RackInferringSnitch:
# Proximity is determined by rack and data center, which are
# assumed to correspond to the 3rd and 2nd octet of each node's IP
# address, respectively. Unless this happens to match your
# deployment conventions, this is best used as an example of
# writing a custom Snitch class and is provided in that spirit.
#
# You can use a custom Snitch by setting this to the full class name
# of the snitch, which will be assumed to be on your classpath.
# controls how often to perform the more expensive part of host score
# calculation
# dynamic_snitch_update_interval_in_ms: 100
@@ -824,7 +784,7 @@ commitlog_total_space_in_mb: -1
# can be: all - all traffic is compressed
# dc - traffic between different datacenters is compressed
# none - nothing is compressed.
# internode_compression: all
# internode_compression: none
# Enable or disable tcp_nodelay for inter-dc communication.
# Disabling it will result in larger (but fewer) network packets being sent,
@@ -845,3 +805,11 @@ commitlog_total_space_in_mb: -1
# true: relaxed environment checks; performance and reliability may degrade.
#
# developer_mode: false
# Idle-time background processing
#
# Scylla can perform certain jobs in the background while the system is otherwise idle,
# freeing processor resources when there is other work to be done.
#
# defragment_memory_on_idle: true


@@ -1,6 +1,6 @@
#!/usr/bin/python3
#
# Copyright 2015 Cloudius Systems
# Copyright (C) 2015 ScyllaDB
#
#
@@ -162,6 +162,7 @@ modes = {
scylla_tests = [
'tests/mutation_test',
'tests/streamed_mutation_test',
'tests/schema_registry_test',
'tests/canonical_mutation_test',
'tests/range_test',
@@ -216,6 +217,9 @@ scylla_tests = [
'tests/dynamic_bitset_test',
'tests/auth_test',
'tests/idl_test',
'tests/range_tombstone_list_test',
'tests/anchorless_list_test',
'tests/database_test',
]
apps = [
@@ -256,6 +260,8 @@ arg_parser.add_argument('--debuginfo', action = 'store', dest = 'debuginfo', typ
help = 'Enable(1)/disable(0)compiler debug information generation')
arg_parser.add_argument('--static-stdc++', dest = 'staticcxx', action = 'store_true',
help = 'Link libgcc and libstdc++ statically')
arg_parser.add_argument('--static-thrift', dest = 'staticthrift', action = 'store_true',
help = 'Link libthrift statically')
arg_parser.add_argument('--tests-debuginfo', action = 'store', dest = 'tests_debuginfo', type = int, default = 0,
help = 'Enable(1)/disable(0)compiler debug information generation for tests')
arg_parser.add_argument('--python', action = 'store', dest = 'python', default = 'python3',
@@ -276,6 +282,8 @@ scylla_core = (['database.cc',
'schema_registry.cc',
'bytes.cc',
'mutation.cc',
'streamed_mutation.cc',
'partition_version.cc',
'row_cache.cc',
'canonical_mutation.cc',
'frozen_mutation.cc',
@@ -291,15 +299,15 @@ scylla_core = (['database.cc',
'mutation_query.cc',
'key_reader.cc',
'keys.cc',
'clustering_key_filter.cc',
'sstables/sstables.cc',
'sstables/compress.cc',
'sstables/row.cc',
'sstables/key.cc',
'sstables/partition.cc',
'sstables/filter.cc',
'sstables/compaction.cc',
'sstables/compaction_strategy.cc',
'sstables/compaction_manager.cc',
'log.cc',
'transport/event.cc',
'transport/event_notifier.cc',
'transport/server.cc',
@@ -316,11 +324,14 @@ scylla_core = (['database.cc',
'cql3/functions/functions.cc',
'cql3/statements/cf_prop_defs.cc',
'cql3/statements/cf_statement.cc',
'cql3/statements/authentication_statement.cc',
'cql3/statements/create_keyspace_statement.cc',
'cql3/statements/create_table_statement.cc',
'cql3/statements/create_type_statement.cc',
'cql3/statements/create_user_statement.cc',
'cql3/statements/drop_keyspace_statement.cc',
'cql3/statements/drop_table_statement.cc',
'cql3/statements/drop_type_statement.cc',
'cql3/statements/schema_altering_statement.cc',
'cql3/statements/ks_prop_defs.cc',
'cql3/statements/modification_statement.cc',
@@ -336,8 +347,19 @@ scylla_core = (['database.cc',
'cql3/statements/create_index_statement.cc',
'cql3/statements/truncate_statement.cc',
'cql3/statements/alter_table_statement.cc',
'cql3/statements/alter_user_statement.cc',
'cql3/statements/drop_user_statement.cc',
'cql3/statements/list_users_statement.cc',
'cql3/statements/authorization_statement.cc',
'cql3/statements/permission_altering_statement.cc',
'cql3/statements/list_permissions_statement.cc',
'cql3/statements/grant_statement.cc',
'cql3/statements/revoke_statement.cc',
'cql3/statements/alter_type_statement.cc',
'cql3/statements/alter_keyspace_statement.cc',
'cql3/update_parameters.cc',
'cql3/ut_name.cc',
'cql3/user_options.cc',
'thrift/handler.cc',
'thrift/server.cc',
'thrift/thrift_validation.cc',
@@ -353,6 +375,7 @@ scylla_core = (['database.cc',
'cql3/operator.cc',
'cql3/relation.cc',
'cql3/column_identifier.cc',
'cql3/column_specification.cc',
'cql3/constants.cc',
'cql3/query_processor.cc',
'cql3/query_options.cc',
@@ -368,6 +391,7 @@ scylla_core = (['database.cc',
'cql3/selection/selection.cc',
'cql3/selection/selector.cc',
'cql3/restrictions/statement_restrictions.cc',
'cql3/result_set.cc',
'db/consistency_level.cc',
'db/system_keyspace.cc',
'db/schema_tables.cc',
@@ -388,6 +412,7 @@ scylla_core = (['database.cc',
'utils/file_lock.cc',
'utils/dynamic_bitset.cc',
'utils/managed_bytes.cc',
'utils/exceptions.cc',
'gms/version_generator.cc',
'gms/versioned_value.cc',
'gms/gossiper.cc',
@@ -448,9 +473,17 @@ scylla_core = (['database.cc',
'auth/auth.cc',
'auth/authenticated_user.cc',
'auth/authenticator.cc',
'auth/authorizer.cc',
'auth/default_authorizer.cc',
'auth/data_resource.cc',
'auth/password_authenticator.cc',
'auth/permission.cc',
'tracing/tracing.cc',
'tracing/trace_keyspace_helper.cc',
'tracing/trace_state.cc',
'range_tombstone.cc',
'range_tombstone_list.cc',
'db/size_estimates_recorder.cc'
]
+ [Antlr3Grammar('cql3/Cql.g')]
+ [Thrift('interface/cassandra.thrift', 'Cassandra')]
@@ -510,6 +543,7 @@ idls = ['idl/gossip_digest.idl.hh',
'idl/query.idl.hh',
'idl/idl_test.idl.hh',
'idl/commitlog.idl.hh',
'idl/tracing.idl.hh',
]
scylla_tests_dependencies = scylla_core + api + idls + [
@@ -532,8 +566,6 @@ tests_not_using_seastar_test_framework = set([
'tests/keys_test',
'tests/partitioner_test',
'tests/map_difference_test',
'tests/frozen_mutation_test',
'tests/canonical_mutation_test',
'tests/perf/perf_mutation',
'tests/lsa_async_eviction_test',
'tests/lsa_sync_eviction_test',
@@ -554,6 +586,8 @@ tests_not_using_seastar_test_framework = set([
'tests/managed_vector_test',
'tests/dynamic_bitset_test',
'tests/idl_test',
'tests/range_tombstone_list_test',
'tests/anchorless_list_test',
])
for t in tests_not_using_seastar_test_framework:
@@ -570,7 +604,8 @@ deps['tests/sstable_test'] += ['tests/sstable_datafile_test.cc']
deps['tests/bytes_ostream_test'] = ['tests/bytes_ostream_test.cc']
deps['tests/UUID_test'] = ['utils/UUID_gen.cc', 'tests/UUID_test.cc']
deps['tests/murmur_hash_test'] = ['bytes.cc', 'utils/murmur_hash.cc', 'tests/murmur_hash_test.cc']
deps['tests/allocation_strategy_test'] = ['tests/allocation_strategy_test.cc', 'utils/logalloc.cc', 'log.cc', 'utils/dynamic_bitset.cc']
deps['tests/allocation_strategy_test'] = ['tests/allocation_strategy_test.cc', 'utils/logalloc.cc', 'utils/dynamic_bitset.cc']
deps['tests/anchorless_list_test'] = ['tests/anchorless_list_test.cc']
warnings = [
'-Wno-mismatched-tags', # clang-only
@@ -706,6 +741,10 @@ user_cflags = args.user_cflags
user_ldflags = args.user_ldflags
if args.staticcxx:
user_ldflags += " -static-libgcc -static-libstdc++"
if args.staticthrift:
thrift_libs = "-Wl,-Bstatic -lthrift -Wl,-Bdynamic"
else:
thrift_libs = "-lthrift"
outdir = 'build'
buildfile = 'build.ninja'
@@ -816,14 +855,14 @@ with open(buildfile, 'w') as f:
f.write('build $builddir/{}/{}: {}.{} {} {}\n'.format(mode, binary, tests_link_rule, mode, str.join(' ', objs),
'seastar/build/{}/libseastar.a'.format(mode)))
if has_thrift:
f.write(' libs = -lthrift -lboost_system $libs\n')
f.write(' libs = {} -lboost_system $libs\n'.format(thrift_libs))
f.write('build $builddir/{}/{}_g: link.{} {} {}\n'.format(mode, binary, mode, str.join(' ', objs),
'seastar/build/{}/libseastar.a'.format(mode)))
else:
f.write('build $builddir/{}/{}: link.{} {} {}\n'.format(mode, binary, mode, str.join(' ', objs),
'seastar/build/{}/libseastar.a'.format(mode)))
if has_thrift:
f.write(' libs = -lthrift -lboost_system $libs\n')
f.write(' libs = {} -lboost_system $libs\n'.format(thrift_libs))
for src in srcs:
if src.endswith('.cc'):
obj = '$builddir/' + mode + '/' + src.replace('.cc', '.o')


@@ -1,5 +1,5 @@
/*
* Copyright (C) 2015 Cloudius Systems, Ltd.
* Copyright (C) 2015 ScyllaDB
*/
/*
@@ -35,7 +35,7 @@ class converting_mutation_partition_applier : public mutation_partition_visitor
deletable_row* _current_row;
private:
static bool is_compatible(const column_definition& new_def, const data_type& old_type, column_kind kind) {
return new_def.kind == kind && new_def.type->is_value_compatible_with(*old_type);
return ::is_compatible(new_def.kind, kind) && new_def.type->is_value_compatible_with(*old_type);
}
void accept_cell(row& dst, column_kind kind, const column_definition& new_def, const data_type& old_type, atomic_cell_view cell) {
if (is_compatible(new_def, old_type, kind) && cell.timestamp() > new_def.dropped_at()) {
@@ -90,8 +90,8 @@ public:
}
}
virtual void accept_row_tombstone(clustering_key_prefix_view prefix, tombstone t) override {
_p.apply_row_tombstone(_p_schema, prefix, t);
virtual void accept_row_tombstone(const range_tombstone& rt) override {
_p.apply_row_tombstone(_p_schema, rt);
}
virtual void accept_row(clustering_key_view key, tombstone deleted_at, const row_marker& rm) override {

@@ -26,26 +26,39 @@ options {
@parser::namespace{cql3_parser}
@lexer::includes {
#include "cql3/error_collector.hh"
#include "cql3/error_listener.hh"
}
@parser::includes {
#include "cql3/selection/writetime_or_ttl.hh"
#include "cql3/statements/raw/parsed_statement.hh"
#include "cql3/statements/raw/select_statement.hh"
#include "cql3/statements/alter_keyspace_statement.hh"
#include "cql3/statements/alter_table_statement.hh"
#include "cql3/statements/create_keyspace_statement.hh"
#include "cql3/statements/drop_keyspace_statement.hh"
#include "cql3/statements/create_index_statement.hh"
#include "cql3/statements/create_table_statement.hh"
#include "cql3/statements/create_type_statement.hh"
#include "cql3/statements/drop_type_statement.hh"
#include "cql3/statements/alter_type_statement.hh"
#include "cql3/statements/property_definitions.hh"
#include "cql3/statements/drop_table_statement.hh"
#include "cql3/statements/truncate_statement.hh"
#include "cql3/statements/select_statement.hh"
#include "cql3/statements/update_statement.hh"
#include "cql3/statements/delete_statement.hh"
#include "cql3/statements/raw/update_statement.hh"
#include "cql3/statements/raw/insert_statement.hh"
#include "cql3/statements/raw/delete_statement.hh"
#include "cql3/statements/index_prop_defs.hh"
#include "cql3/statements/use_statement.hh"
#include "cql3/statements/batch_statement.hh"
#include "cql3/statements/raw/use_statement.hh"
#include "cql3/statements/raw/batch_statement.hh"
#include "cql3/statements/create_user_statement.hh"
#include "cql3/statements/alter_user_statement.hh"
#include "cql3/statements/drop_user_statement.hh"
#include "cql3/statements/list_users_statement.hh"
#include "cql3/statements/grant_statement.hh"
#include "cql3/statements/revoke_statement.hh"
#include "cql3/statements/list_permissions_statement.hh"
#include "cql3/statements/index_target.hh"
#include "cql3/statements/ks_prop_defs.hh"
#include "cql3/selection/raw_selector.hh"
@@ -108,10 +121,13 @@ struct uninitialized {
}
@context {
using listener_type = cql3::error_listener<RecognizerType>;
using collector_type = cql3::error_collector<ComponentType, ExceptionBaseType::TokenType, ExceptionBaseType>;
using listener_type = cql3::error_listener<ComponentType, ExceptionBaseType>;
listener_type* listener;
std::vector<::shared_ptr<cql3::column_identifier>> _bind_variables;
std::vector<std::unique_ptr<TokenType>> _missing_tokens;
// Can't use static variable, since it needs to be defined out-of-line
static const std::unordered_set<sstring>& _reserved_type_names() {
@@ -161,15 +177,26 @@ struct uninitialized {
void displayRecognitionError(ANTLR_UINT8** token_names, ExceptionBaseType* ex)
{
std::stringstream msg;
ex->displayRecognitionError(token_names, msg);
listener->syntax_error(*this, msg.str());
listener->syntax_error(*this, token_names, ex);
}
void add_recognition_error(const sstring& msg) {
listener->syntax_error(*this, msg);
}
bool is_eof_token(CommonTokenType token) const
{
return token == CommonTokenType::TOKEN_EOF;
}
std::string token_text(const TokenType* token)
{
if (!token) {
return "";
}
return token->getText();
}
std::map<sstring, sstring> convert_property_map(shared_ptr<cql3::maps::literal> map) {
if (!map || map->entries.empty()) {
return std::map<sstring, sstring>{};
@@ -216,6 +243,13 @@ struct uninitialized {
}
operations.emplace_back(std::move(key), std::move(update));
}
TokenType* getMissingSymbol(IntStreamType* istream, ExceptionBaseType* e,
ANTLR_UINT32 expectedTokenType, BitsetListType* follow) {
auto token = BaseType::getMissingSymbol(istream, e, expectedTokenType, follow);
_missing_tokens.emplace_back(token);
return token;
}
}
@lexer::namespace{cql3_parser}
@@ -233,7 +267,8 @@ struct uninitialized {
}
@lexer::context {
using listener_type = cql3::error_listener<RecognizerType>;
using collector_type = cql3::error_collector<ComponentType, ExceptionBaseType::TokenType, ExceptionBaseType>;
using listener_type = cql3::error_listener<ComponentType, ExceptionBaseType>;
listener_type* listener;
@@ -243,19 +278,30 @@ struct uninitialized {
void displayRecognitionError(ANTLR_UINT8** token_names, ExceptionBaseType* ex)
{
std::stringstream msg;
ex->displayRecognitionError(token_names, msg);
listener->syntax_error(*this, msg.str());
listener->syntax_error(*this, token_names, ex);
}
bool is_eof_token(CommonTokenType token) const
{
return token == CommonTokenType::TOKEN_EOF;
}
std::string token_text(const TokenType* token) const
{
if (!token) {
return "";
}
return std::to_string(int(*token));
}
}
/** STATEMENTS **/
query returns [shared_ptr<parsed_statement> stmnt]
query returns [shared_ptr<raw::parsed_statement> stmnt]
: st=cqlStatement (';')* EOF { $stmnt = st; }
;
cqlStatement returns [shared_ptr<parsed_statement> stmt]
cqlStatement returns [shared_ptr<raw::parsed_statement> stmt]
@after{ if (stmt) { stmt->set_bound_variables(_bind_variables); } }
: st1= selectStatement { $stmt = st1; }
| st2= insertStatement { $stmt = st2; }
@@ -273,7 +319,6 @@ cqlStatement returns [shared_ptr<parsed_statement> stmt]
| st13=dropIndexStatement { $stmt = st13; }
#endif
| st14=alterTableStatement { $stmt = st14; }
#if 0
| st15=alterKeyspaceStatement { $stmt = st15; }
| st16=grantStatement { $stmt = st16; }
| st17=revokeStatement { $stmt = st17; }
@@ -282,13 +327,14 @@ cqlStatement returns [shared_ptr<parsed_statement> stmt]
| st20=alterUserStatement { $stmt = st20; }
| st21=dropUserStatement { $stmt = st21; }
| st22=listUsersStatement { $stmt = st22; }
#if 0
| st23=createTriggerStatement { $stmt = st23; }
| st24=dropTriggerStatement { $stmt = st24; }
#endif
| st25=createTypeStatement { $stmt = st25; }
#if 0
| st26=alterTypeStatement { $stmt = st26; }
| st27=dropTypeStatement { $stmt = st27; }
#if 0
| st28=createFunctionStatement { $stmt = st28; }
| st29=dropFunctionStatement { $stmt = st29; }
| st30=createAggregateStatement { $stmt = st30; }
@@ -299,8 +345,8 @@ cqlStatement returns [shared_ptr<parsed_statement> stmt]
/*
* USE <KEYSPACE>;
*/
useStatement returns [::shared_ptr<use_statement> stmt]
: K_USE ks=keyspaceName { $stmt = ::make_shared<use_statement>(ks); }
useStatement returns [::shared_ptr<raw::use_statement> stmt]
: K_USE ks=keyspaceName { $stmt = ::make_shared<raw::use_statement>(ks); }
;
/**
@@ -309,11 +355,11 @@ useStatement returns [::shared_ptr<use_statement> stmt]
* WHERE KEY = "key1" AND COL > 1 AND COL < 100
* LIMIT <NUMBER>;
*/
selectStatement returns [shared_ptr<select_statement::raw_statement> expr]
selectStatement returns [shared_ptr<raw::select_statement> expr]
@init {
bool is_distinct = false;
::shared_ptr<cql3::term::raw> limit;
select_statement::parameters::orderings_type orderings;
raw::select_statement::parameters::orderings_type orderings;
bool allow_filtering = false;
}
: K_SELECT ( ( K_DISTINCT { is_distinct = true; } )?
@@ -326,8 +372,8 @@ selectStatement returns [shared_ptr<select_statement::raw_statement> expr]
( K_LIMIT rows=intValue { limit = rows; } )?
( K_ALLOW K_FILTERING { allow_filtering = true; } )?
{
auto params = ::make_shared<select_statement::parameters>(std::move(orderings), is_distinct, allow_filtering);
$expr = ::make_shared<select_statement::raw_statement>(std::move(cf), std::move(params),
auto params = ::make_shared<raw::select_statement::parameters>(std::move(orderings), is_distinct, allow_filtering);
$expr = ::make_shared<raw::select_statement>(std::move(cf), std::move(params),
std::move(sclause), std::move(wclause), std::move(limit));
}
;
@@ -381,7 +427,7 @@ whereClause returns [std::vector<cql3::relation_ptr> clause]
: relation[$clause] (K_AND relation[$clause])*
;
orderByClause[select_statement::parameters::orderings_type& orderings]
orderByClause[raw::select_statement::parameters::orderings_type& orderings]
@init{
bool reversed = false;
}
@@ -394,7 +440,7 @@ orderByClause[select_statement::parameters::orderings_type& orderings]
* USING TIMESTAMP <long>;
*
*/
insertStatement returns [::shared_ptr<update_statement::parsed_insert> expr]
insertStatement returns [::shared_ptr<raw::insert_statement> expr]
@init {
auto attrs = ::make_shared<cql3::attributes::raw>();
std::vector<::shared_ptr<cql3::column_identifier::raw>> column_names;
@@ -409,7 +455,7 @@ insertStatement returns [::shared_ptr<update_statement::parsed_insert> expr]
( K_IF K_NOT K_EXISTS { if_not_exists = true; } )?
( usingClause[attrs] )?
{
$expr = ::make_shared<update_statement::parsed_insert>(std::move(cf),
$expr = ::make_shared<raw::insert_statement>(std::move(cf),
std::move(attrs),
std::move(column_names),
std::move(values),
@@ -432,7 +478,7 @@ usingClauseObjective[::shared_ptr<cql3::attributes::raw> attrs]
* SET name1 = value1, name2 = value2
* WHERE key = value;
*/
updateStatement returns [::shared_ptr<update_statement::parsed_update> expr]
updateStatement returns [::shared_ptr<raw::update_statement> expr]
@init {
auto attrs = ::make_shared<cql3::attributes::raw>();
std::vector<std::pair<::shared_ptr<cql3::column_identifier::raw>, ::shared_ptr<cql3::operation::raw_update>>> operations;
@@ -443,7 +489,7 @@ updateStatement returns [::shared_ptr<update_statement::parsed_update> expr]
K_WHERE wclause=whereClause
( K_IF conditions=updateConditions )?
{
return ::make_shared<update_statement::parsed_update>(std::move(cf),
return ::make_shared<raw::update_statement>(std::move(cf),
std::move(attrs),
std::move(operations),
std::move(wclause),
@@ -462,7 +508,7 @@ updateConditions returns [conditions_type conditions]
* WHERE KEY = keyname
[IF (EXISTS | name = value, ...)];
*/
deleteStatement returns [::shared_ptr<delete_statement::parsed> expr]
deleteStatement returns [::shared_ptr<raw::delete_statement> expr]
@init {
auto attrs = ::make_shared<cql3::attributes::raw>();
std::vector<::shared_ptr<cql3::operation::raw_deletion>> column_deletions;
@@ -474,7 +520,7 @@ deleteStatement returns [::shared_ptr<delete_statement::parsed> expr]
K_WHERE wclause=whereClause
( K_IF ( K_EXISTS { if_exists = true; } | conditions=updateConditions ))?
{
return ::make_shared<delete_statement::parsed>(cf,
return ::make_shared<raw::delete_statement>(cf,
std::move(attrs),
std::move(column_deletions),
std::move(wclause),
@@ -521,11 +567,11 @@ usingClauseDelete[::shared_ptr<cql3::attributes::raw> attrs]
* ...
* APPLY BATCH
*/
batchStatement returns [shared_ptr<cql3::statements::batch_statement::parsed> expr]
batchStatement returns [shared_ptr<cql3::statements::raw::batch_statement> expr]
@init {
using btype = cql3::statements::batch_statement::type;
using btype = cql3::statements::raw::batch_statement::type;
btype type = btype::LOGGED;
std::vector<shared_ptr<cql3::statements::modification_statement::parsed>> statements;
std::vector<shared_ptr<cql3::statements::raw::modification_statement>> statements;
auto attrs = make_shared<cql3::attributes::raw>();
}
: K_BEGIN
@@ -534,11 +580,11 @@ batchStatement returns [shared_ptr<cql3::statements::batch_statement::parsed> ex
( s=batchStatementObjective ';'? { statements.push_back(std::move(s)); } )*
K_APPLY K_BATCH
{
$expr = ::make_shared<cql3::statements::batch_statement::parsed>(type, std::move(attrs), std::move(statements));
$expr = ::make_shared<cql3::statements::raw::batch_statement>(type, std::move(attrs), std::move(statements));
}
;
batchStatementObjective returns [shared_ptr<cql3::statements::modification_statement::parsed> statement]
batchStatementObjective returns [shared_ptr<cql3::statements::raw::modification_statement> statement]
: i=insertStatement { $statement = i; }
| u=updateStatement { $statement = u; }
| d=deleteStatement { $statement = d; }
@@ -764,15 +810,18 @@ dropTriggerStatement returns [DropTriggerStatement expr]
{ $expr = new DropTriggerStatement(cf, name.toString(), ifExists); }
;
#endif
/**
* ALTER KEYSPACE <KS> WITH <property> = <value>;
*/
alterKeyspaceStatement returns [AlterKeyspaceStatement expr]
@init { KSPropDefs attrs = new KSPropDefs(); }
alterKeyspaceStatement returns [shared_ptr<cql3::statements::alter_keyspace_statement> expr]
@init {
auto attrs = make_shared<cql3::statements::ks_prop_defs>();
}
: K_ALTER K_KEYSPACE ks=keyspaceName
K_WITH properties[attrs] { $expr = new AlterKeyspaceStatement(ks, attrs); }
K_WITH properties[attrs] { $expr = make_shared<cql3::statements::alter_keyspace_statement>(ks, attrs); }
;
#endif
/**
* ALTER COLUMN FAMILY <CF> ALTER <column> TYPE <newtype>;
@@ -803,26 +852,27 @@ alterTableStatement returns [shared_ptr<alter_table_statement> expr]
}
;
#if 0
/**
* ALTER TYPE <name> ALTER <field> TYPE <newtype>;
* ALTER TYPE <name> ADD <field> <newtype>;
* ALTER TYPE <name> RENAME <field> TO <newtype> AND ...;
*/
alterTypeStatement returns [AlterTypeStatement expr]
alterTypeStatement returns [::shared_ptr<alter_type_statement> expr]
: K_ALTER K_TYPE name=userTypeName
( K_ALTER f=ident K_TYPE v=comparatorType { $expr = AlterTypeStatement.alter(name, f, v); }
| K_ADD f=ident v=comparatorType { $expr = AlterTypeStatement.addition(name, f, v); }
( K_ALTER f=ident K_TYPE v=comparatorType { $expr = ::make_shared<alter_type_statement::add_or_alter>(name, false, f, v); }
| K_ADD f=ident v=comparatorType { $expr = ::make_shared<alter_type_statement::add_or_alter>(name, true, f, v); }
| K_RENAME
{ Map<ColumnIdentifier, ColumnIdentifier> renames = new HashMap<ColumnIdentifier, ColumnIdentifier>(); }
id1=ident K_TO toId1=ident { renames.put(id1, toId1); }
( K_AND idn=ident K_TO toIdn=ident { renames.put(idn, toIdn); } )*
{ $expr = AlterTypeStatement.renames(name, renames); }
{ $expr = ::make_shared<alter_type_statement::renames>(name); }
renames[{ static_pointer_cast<alter_type_statement::renames>($expr) }]
)
;
#endif
renames[::shared_ptr<alter_type_statement::renames> expr]
: fromId=ident K_TO toId=ident { $expr->add_rename(fromId, toId); }
( K_AND renames[$expr] )?
;
/**
* DROP KEYSPACE [IF EXISTS] <KSP>;
*/
@@ -839,15 +889,15 @@ dropTableStatement returns [::shared_ptr<drop_table_statement> stmt]
: K_DROP K_COLUMNFAMILY (K_IF K_EXISTS { if_exists = true; } )? cf=columnFamilyName { $stmt = ::make_shared<drop_table_statement>(cf, if_exists); }
;
#if 0
/**
* DROP TYPE <name>;
*/
dropTypeStatement returns [DropTypeStatement stmt]
@init { boolean ifExists = false; }
: K_DROP K_TYPE (K_IF K_EXISTS { ifExists = true; } )? name=userTypeName { $stmt = new DropTypeStatement(name, ifExists); }
dropTypeStatement returns [::shared_ptr<drop_type_statement> stmt]
@init { bool if_exists = false; }
: K_DROP K_TYPE (K_IF K_EXISTS { if_exists = true; } )? name=userTypeName { $stmt = ::make_shared<drop_type_statement>(name, if_exists); }
;
#if 0
/**
* DROP INDEX [IF EXISTS] <INDEX_NAME>
*/
@@ -865,120 +915,118 @@ truncateStatement returns [::shared_ptr<truncate_statement> stmt]
: K_TRUNCATE (K_COLUMNFAMILY)? cf=columnFamilyName { $stmt = ::make_shared<truncate_statement>(cf); }
;
#if 0
/**
* GRANT <permission> ON <resource> TO <username>
*/
grantStatement returns [GrantStatement stmt]
grantStatement returns [::shared_ptr<grant_statement> stmt]
: K_GRANT
permissionOrAll
K_ON
resource
K_TO
username
{ $stmt = new GrantStatement($permissionOrAll.perms, $resource.res, $username.text); }
{ $stmt = ::make_shared<grant_statement>($permissionOrAll.perms, $resource.res, $username.text); }
;
/**
* REVOKE <permission> ON <resource> FROM <username>
*/
revokeStatement returns [RevokeStatement stmt]
revokeStatement returns [::shared_ptr<revoke_statement> stmt]
: K_REVOKE
permissionOrAll
K_ON
resource
K_FROM
username
{ $stmt = new RevokeStatement($permissionOrAll.perms, $resource.res, $username.text); }
{ $stmt = ::make_shared<revoke_statement>($permissionOrAll.perms, $resource.res, $username.text); }
;
listPermissionsStatement returns [ListPermissionsStatement stmt]
listPermissionsStatement returns [::shared_ptr<list_permissions_statement> stmt]
@init {
IResource resource = null;
String username = null;
boolean recursive = true;
std::experimental::optional<auth::data_resource> r;
std::experimental::optional<sstring> u;
bool recursive = true;
}
: K_LIST
permissionOrAll
( K_ON resource { resource = $resource.res; } )?
( K_OF username { username = $username.text; } )?
( K_ON resource { r = $resource.res; } )?
( K_OF username { u = sstring($username.text); } )?
( K_NORECURSIVE { recursive = false; } )?
{ $stmt = new ListPermissionsStatement($permissionOrAll.perms, resource, username, recursive); }
{ $stmt = ::make_shared<list_permissions_statement>($permissionOrAll.perms, std::move(r), std::move(u), recursive); }
;
permission returns [Permission perm]
permission returns [auth::permission perm]
: p=(K_CREATE | K_ALTER | K_DROP | K_SELECT | K_MODIFY | K_AUTHORIZE)
{ $perm = Permission.valueOf($p.text.toUpperCase()); }
{ $perm = auth::permissions::from_string($p.text); }
;
permissionOrAll returns [Set<Permission> perms]
: K_ALL ( K_PERMISSIONS )? { $perms = Permission.ALL_DATA; }
| p=permission ( K_PERMISSION )? { $perms = EnumSet.of($p.perm); }
permissionOrAll returns [auth::permission_set perms]
: K_ALL ( K_PERMISSIONS )? { $perms = auth::permissions::ALL_DATA; }
| p=permission ( K_PERMISSION )? { $perms = auth::permission_set::from_mask(auth::permission_set::mask_for($p.perm)); }
;
resource returns [IResource res]
resource returns [auth::data_resource res]
: r=dataResource { $res = $r.res; }
;
dataResource returns [DataResource res]
: K_ALL K_KEYSPACES { $res = DataResource.root(); }
| K_KEYSPACE ks = keyspaceName { $res = DataResource.keyspace($ks.id); }
dataResource returns [auth::data_resource res]
: K_ALL K_KEYSPACES { $res = auth::data_resource(); }
| K_KEYSPACE ks = keyspaceName { $res = auth::data_resource($ks.id); }
| ( K_COLUMNFAMILY )? cf = columnFamilyName
{ $res = DataResource.columnFamily($cf.name.getKeyspace(), $cf.name.getColumnFamily()); }
{ $res = auth::data_resource($cf.name->get_keyspace(), $cf.name->get_column_family()); }
;
/**
* CREATE USER [IF NOT EXISTS] <username> [WITH PASSWORD <password>] [SUPERUSER|NOSUPERUSER]
*/
createUserStatement returns [CreateUserStatement stmt]
createUserStatement returns [::shared_ptr<create_user_statement> stmt]
@init {
UserOptions opts = new UserOptions();
boolean superuser = false;
boolean ifNotExists = false;
auto opts = ::make_shared<cql3::user_options>();
bool superuser = false;
bool ifNotExists = false;
}
: K_CREATE K_USER (K_IF K_NOT K_EXISTS { ifNotExists = true; })? username
( K_WITH userOptions[opts] )?
( K_SUPERUSER { superuser = true; } | K_NOSUPERUSER { superuser = false; } )?
{ $stmt = new CreateUserStatement($username.text, opts, superuser, ifNotExists); }
{ $stmt = ::make_shared<create_user_statement>($username.text, std::move(opts), superuser, ifNotExists); }
;
/**
* ALTER USER <username> [WITH PASSWORD <password>] [SUPERUSER|NOSUPERUSER]
*/
alterUserStatement returns [AlterUserStatement stmt]
alterUserStatement returns [::shared_ptr<alter_user_statement> stmt]
@init {
UserOptions opts = new UserOptions();
Boolean superuser = null;
auto opts = ::make_shared<cql3::user_options>();
std::experimental::optional<bool> superuser;
}
: K_ALTER K_USER username
( K_WITH userOptions[opts] )?
( K_SUPERUSER { superuser = true; } | K_NOSUPERUSER { superuser = false; } )?
{ $stmt = new AlterUserStatement($username.text, opts, superuser); }
{ $stmt = ::make_shared<alter_user_statement>($username.text, std::move(opts), std::move(superuser)); }
;
/**
* DROP USER [IF EXISTS] <username>
*/
dropUserStatement returns [DropUserStatement stmt]
@init { boolean ifExists = false; }
: K_DROP K_USER (K_IF K_EXISTS { ifExists = true; })? username { $stmt = new DropUserStatement($username.text, ifExists); }
dropUserStatement returns [::shared_ptr<drop_user_statement> stmt]
@init { bool ifExists = false; }
: K_DROP K_USER (K_IF K_EXISTS { ifExists = true; })? username { $stmt = ::make_shared<drop_user_statement>($username.text, ifExists); }
;
/**
* LIST USERS
*/
listUsersStatement returns [ListUsersStatement stmt]
: K_LIST K_USERS { $stmt = new ListUsersStatement(); }
listUsersStatement returns [::shared_ptr<list_users_statement> stmt]
: K_LIST K_USERS { $stmt = ::make_shared<list_users_statement>(); }
;
userOptions[UserOptions opts]
userOptions[::shared_ptr<cql3::user_options> opts]
: userOption[opts]
;
userOption[UserOptions opts]
: k=K_PASSWORD v=STRING_LITERAL { opts.put($k.text, $v.text); }
userOption[::shared_ptr<cql3::user_options> opts]
: k=K_PASSWORD v=STRING_LITERAL { opts->put($k.text, $v.text); }
;
#endif
/** DEFINITIONS **/
@@ -1157,7 +1205,8 @@ columnOperation[operations_type& operations]
columnOperationDifferentiator[operations_type& operations, ::shared_ptr<cql3::column_identifier::raw> key]
: '=' normalColumnOperation[operations, key]
| '[' k=term ']' specializedColumnOperation[operations, key, k]
| '[' k=term ']' specializedColumnOperation[operations, key, k, false]
| '[' K_SCYLLA_TIMEUUID_LIST_INDEX '(' k=term ')' ']' specializedColumnOperation[operations, key, k, true]
;
normalColumnOperation[operations_type& operations, ::shared_ptr<cql3::column_identifier::raw> key]
@@ -1199,11 +1248,12 @@ normalColumnOperation[operations_type& operations, ::shared_ptr<cql3::column_ide
specializedColumnOperation[std::vector<std::pair<shared_ptr<cql3::column_identifier::raw>,
shared_ptr<cql3::operation::raw_update>>>& operations,
shared_ptr<cql3::column_identifier::raw> key,
shared_ptr<cql3::term::raw> k]
shared_ptr<cql3::term::raw> k,
bool by_uuid]
: '=' t=term
{
add_raw_update(operations, key, make_shared<cql3::operation::set_element>(k, t));
add_raw_update(operations, key, make_shared<cql3::operation::set_element>(k, t, by_uuid));
}
;
@@ -1383,12 +1433,10 @@ tuple_type returns [shared_ptr<cql3::cql3_type::raw> t]
'>' { $t = cql3::cql3_type::raw::tuple(std::move(types)); }
;
#if 0
username
: IDENT
| STRING_LITERAL
;
#endif
// Basically the same as cident, but we need to exclude existing CQL3 types
// (which for some reason are not reserved otherwise)
@@ -1567,6 +1615,8 @@ K_OR: O R;
K_REPLACE: R E P L A C E;
K_DETERMINISTIC: D E T E R M I N I S T I C;
K_SCYLLA_TIMEUUID_LIST_INDEX: S C Y L L A '_' T I M E U U I D '_' L I S T '_' I N D E X;
// Case-insensitive alpha characters
fragment A: ('a'|'A');
fragment B: ('b'|'B');
@@ -1612,20 +1662,17 @@ STRING_LITERAL
setText(txt);
}
:
// FIXME:
#if 0
/* pg-style string literal */
(
'\$' '\$'
( /* collect all input until '$$' is reached again */
{ (input.size() - input.index() > 1)
&& !"$$".equals(input.substring(input.index(), input.index() + 1)) }?
=> c=. { txt.appendCodePoint(c); }
'$' '$'
(
(c=~('$') { txt.push_back(c); })
|
('$' (c=~('$') { txt.push_back('$'); txt.push_back(c); }))
)*
'\$' '\$'
'$' '$'
)
|
#endif
/* conventional quoted string literal */
(
'\'' (c=~('\'') { txt.push_back(c);} | '\'' '\'' { txt.push_back('\''); })* '\''
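The pg-style branch of `STRING_LITERAL` enabled above accepts `$$ ... $$` and keeps a single `$` when it is followed by a non-`$` character. A minimal sketch of the same scanning logic in plain C++, assuming a hypothetical helper `scan_dollar_quoted` that is not part of the grammar or of Scylla:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: scans a `$$ ... $$` literal that starts at input[0].
// On success stores the literal body in txt and returns true.
bool scan_dollar_quoted(const std::string& input, std::string& txt) {
    txt.clear();
    if (input.size() < 4 || input[0] != '$' || input[1] != '$') {
        return false;                  // no opening '$' '$'
    }
    std::size_t i = 2;
    while (i < input.size()) {
        char c = input[i];
        if (c != '$') {
            txt.push_back(c);          // mirrors c=~('$')
            ++i;
        } else if (i + 1 >= input.size()) {
            return false;              // lone trailing '$'
        } else if (input[i + 1] != '$') {
            txt.push_back('$');        // mirrors '$' c=~('$')
            txt.push_back(input[i + 1]);
            i += 2;
        } else {
            return true;               // closing '$' '$'
        }
    }
    return false;                      // input ended before the closing '$$'
}
```

Under these assumptions `$$a$b$$` yields the body `a$b`, while `$$abc` is rejected because the closing delimiter never appears.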

@@ -17,9 +17,9 @@
*/
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*
* Modified by Cloudius Systems
* Modified by ScyllaDB
*/
/*

@@ -17,9 +17,9 @@
*/
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*
* Modified by Cloudius Systems
* Modified by ScyllaDB
*/
/*

@@ -17,9 +17,9 @@
*/
/*
* Copyright 2014 Cloudius Systems
* Copyright (C) 2014 ScyllaDB
*
* Modified by Cloudius Systems
* Modified by ScyllaDB
*/
/*

@@ -17,9 +17,9 @@
*/
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*
* Modified by Cloudius Systems
* Modified by ScyllaDB
*/
/*

@@ -17,9 +17,9 @@
*/
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*
* Modified by Cloudius Systems
* Modified by ScyllaDB
*/
/*

@@ -17,9 +17,9 @@
*/
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*
* Modified by Cloudius Systems
* Modified by ScyllaDB
*/
/*

@@ -17,9 +17,9 @@
*/
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*
* Modified by Cloudius Systems
* Modified by ScyllaDB
*/
/*

@@ -17,9 +17,9 @@
*/
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*
* Modified by Cloudius Systems
* Modified by ScyllaDB
*/
/*

@@ -17,9 +17,9 @@
*/
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*
* Modified by Cloudius Systems
* Modified by ScyllaDB
*/
/*

@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*

@@ -17,9 +17,9 @@
*/
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*
* Modified by Cloudius Systems
* Modified by ScyllaDB
*/
/*

@@ -0,0 +1,56 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright (C) 2016 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "cql3/column_specification.hh"
namespace cql3 {
bool column_specification::all_in_same_table(const std::vector<::shared_ptr<column_specification>>& names)
{
assert(!names.empty());
auto first = names.front();
return std::all_of(std::next(names.begin()), names.end(), [first] (auto&& spec) {
return spec->ks_name == first->ks_name && spec->cf_name == first->cf_name;
});
}
}
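The check added in `column_specification::all_in_same_table` can be tried outside the Scylla tree. A minimal sketch, assuming a simplified `column_spec` struct (hypothetical) in place of `cql3::column_specification` and `std::shared_ptr` in place of Seastar's `::shared_ptr`:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for cql3::column_specification; only the two
// fields the check reads are modelled.
struct column_spec {
    std::string ks_name;
    std::string cf_name;
};

// Same shape as the function in the diff: compare every element after the
// first against the first element's keyspace and table names.
bool all_in_same_table(const std::vector<std::shared_ptr<column_spec>>& names) {
    assert(!names.empty()); // precondition carried over from the original
    auto first = names.front();
    return std::all_of(std::next(names.begin()), names.end(), [first] (const auto& spec) {
        return spec->ks_name == first->ks_name && spec->cf_name == first->cf_name;
    });
}
```

A single-element vector passes trivially, since `std::all_of` over an empty range returns true.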

@@ -17,9 +17,9 @@
*/
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*
* Modified by Cloudius Systems
* Modified by ScyllaDB
*/
/*
@@ -75,6 +75,8 @@ public:
bool is_reversed_type() const {
return ::dynamic_pointer_cast<const reversed_type_impl>(type) != nullptr;
}
static bool all_in_same_table(const std::vector<::shared_ptr<column_specification>>& names);
};
}

@@ -17,9 +17,9 @@
*/
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*
* Modified by Cloudius Systems
* Modified by ScyllaDB
*/
/*

@@ -17,9 +17,9 @@
*/
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*
* Modified by Cloudius Systems
* Modified by ScyllaDB
*/
/*

@@ -1,5 +1,5 @@
/*
* Copyright (C) 2014 Cloudius Systems, Ltd.
* Copyright (C) 2014 ScyllaDB
*/
/*
@@ -148,7 +148,7 @@ public:
try {
auto&& ks = db.find_keyspace(_name.get_keyspace());
try {
auto&& type = ks._user_types.get_type(_name.get_user_type_name());
auto&& type = ks.metadata()->user_types()->get_type(_name.get_user_type_name());
if (!_frozen) {
throw exceptions::invalid_request_exception("Non-frozen User-Defined types are not supported, please use frozen<>");
}

@@ -17,9 +17,9 @@
*/
/*
* Modified by Cloudius Systems
* Modified by ScyllaDB
*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*

@@ -17,9 +17,9 @@
*/
/*
* Copyright 2014 Cloudius Systems
* Copyright (C) 2014 ScyllaDB
*
* Modified by Cloudius Systems
* Modified by ScyllaDB
*/
/*
@@ -58,6 +58,9 @@ class result_message;
namespace cql3 {
class metadata;
shared_ptr<const metadata> make_empty_metadata();
class cql_statement {
public:
virtual ~cql_statement()
@@ -70,7 +73,7 @@ public:
*
* @param state the current client state
*/
virtual void check_access(const service::client_state& state) = 0;
virtual future<> check_access(const service::client_state& state) = 0;
/**
* Perform additional validation required by the statement.
@@ -102,6 +105,15 @@ public:
virtual bool depends_on_keyspace(const sstring& ks_name) const = 0;
virtual bool depends_on_column_family(const sstring& cf_name) const = 0;
virtual shared_ptr<const metadata> get_result_metadata() const = 0;
};
class cql_statement_no_metadata : public cql_statement {
public:
virtual shared_ptr<const metadata> get_result_metadata() const override {
return make_empty_metadata();
}
};
}
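The `cql_statement_no_metadata` class above gives statements without a result set a default `get_result_metadata()`. A minimal sketch of that pattern, assuming simplified stand-ins (`metadata`, `statement`, `statement_no_metadata`, `use_keyspace` are all hypothetical) and assuming that `make_empty_metadata()` hands out one shared immutable instance, which the diff does not show:

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for cql3::metadata.
struct metadata {
    std::vector<std::string> column_names;
};

// Assumed behaviour: a single shared, immutable empty instance.
inline std::shared_ptr<const metadata> make_empty_metadata() {
    static const std::shared_ptr<const metadata> empty = std::make_shared<metadata>();
    return empty;
}

// Simplified counterpart of cql_statement: every statement must be able
// to describe its result set.
class statement {
public:
    virtual ~statement() = default;
    virtual std::shared_ptr<const metadata> get_result_metadata() const = 0;
};

// Counterpart of cql_statement_no_metadata: supplies the empty default.
class statement_no_metadata : public statement {
public:
    std::shared_ptr<const metadata> get_result_metadata() const override {
        return make_empty_metadata();
    }
};

// A statement with no result set only needs to pick the right base class.
class use_keyspace : public statement_no_metadata {};
```

Keeping the accessor pure virtual in the base forces each statement type to make an explicit choice, while the intermediate class avoids repeating the same override.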

@@ -17,9 +17,9 @@
*/
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*
* Modified by Cloudius Systems
* Modified by ScyllaDB
*/
/*
@@ -50,8 +50,8 @@ namespace cql3 {
/**
* <code>ErrorListener</code> that collects and enhances the errors sent by the CQL lexer and parser.
*/
template<typename Recognizer>
class error_collector : public error_listener<Recognizer> {
template<typename RecognizerType, typename TokenType, typename ExceptionBaseType>
class error_collector : public error_listener<RecognizerType, ExceptionBaseType> {
/**
* The offset of the first token of the snippet.
*/
@@ -81,25 +81,19 @@ public:
*/
error_collector(const sstring_view& query) : _query(query) {}
virtual void syntax_error(Recognizer& recognizer, const std::vector<sstring>& token_names) override {
// FIXME: stub
syntax_error(recognizer, "Parsing failed, detailed description construction not implemented yet");
virtual void syntax_error(RecognizerType& recognizer, ANTLR_UINT8** token_names, ExceptionBaseType* ex) override {
auto hdr = get_error_header(ex);
auto msg = get_error_message(recognizer, ex, token_names);
std::stringstream result;
result << hdr << ' ' << msg;
#if 0
String hdr = recognizer.getErrorHeader(e);
String msg = recognizer.getErrorMessage(e, tokenNames);
StringBuilder builder = new StringBuilder().append(hdr)
.append(' ')
.append(msg);
if (recognizer instanceof Parser)
appendQuerySnippet((Parser) recognizer, builder);
errorMsgs.add(builder.toString());
#endif
_error_msgs.emplace_back(result.str());
}
virtual void syntax_error(Recognizer& recognizer, const sstring& msg) override {
virtual void syntax_error(RecognizerType& recognizer, const sstring& msg) override {
_error_msgs.emplace_back(msg);
}
@@ -114,6 +108,60 @@ public:
}
}
private:
std::string get_error_header(ExceptionBaseType* ex) {
std::stringstream result;
result << "line " << ex->get_line() << ":" << ex->get_charPositionInLine();
return result.str();
}
std::string get_error_message(RecognizerType& recognizer, ExceptionBaseType* ex, ANTLR_UINT8** token_names)
{
using namespace antlr3;
std::stringstream msg;
switch (ex->getType()) {
case ExceptionType::UNWANTED_TOKEN_EXCEPTION: {
msg << "extraneous input " << get_token_error_display(recognizer, ex->get_token());
if (token_names != nullptr) {
std::string token_name;
if (recognizer.is_eof_token(ex->get_expecting())) {
token_name = "EOF";
} else {
token_name = reinterpret_cast<const char*>(token_names[ex->get_expecting()]);
}
msg << " expecting " << token_name;
}
break;
}
case ExceptionType::MISSING_TOKEN_EXCEPTION: {
std::string token_name;
if (token_names == nullptr) {
token_name = "(" + std::to_string(ex->get_expecting()) + ")";
} else {
if (recognizer.is_eof_token(ex->get_expecting())) {
token_name = "EOF";
} else {
token_name = reinterpret_cast<const char*>(token_names[ex->get_expecting()]);
}
}
msg << "missing " << token_name << " at " << get_token_error_display(recognizer, ex->get_token());
break;
}
case ExceptionType::NO_VIABLE_ALT_EXCEPTION: {
msg << "no viable alternative at input " << get_token_error_display(recognizer, ex->get_token());
break;
}
default:
ex->displayRecognitionError(token_names, msg);
}
return msg.str();
}
std::string get_token_error_display(RecognizerType& recognizer, const TokenType* token)
{
return "'" + recognizer.token_text(token) + "'";
}
#if 0
/**
