Compare commits

191 Commits

Author SHA1 Message Date
Pekka Enberg
c4bd4e89ae release: prepare for 1.3.5 2016-11-29 09:47:38 +02:00
Raphael S. Carvalho
c5f43ecd0e main: fix exception handling when initializing data or commitlog dirs
Exception handling was broken because, after the introduction of the io
checker, system error exceptions are wrapped in storage_io_error. Also,
the message emitted when handling an exception wasn't precise enough for
all cases, for example lack of permission to write to an existing data
directory.

Fixes #883.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <b2dc75010a06f16ab1b676ce905ae12e930a700a.1478542388.git.raphaelsc@scylladb.com>
(cherry picked from commit 9a9f0d3a0f)
2016-11-16 15:13:44 +02:00
Paweł Dziepak
c923b6e20c row_cache: touch entries read during range queries
Fixes #1847.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1479230809-27547-1-git-send-email-pdziepak@scylladb.com>
(cherry picked from commit 999dafbe57)
2016-11-16 13:08:41 +00:00
Amnon Heiman
e9811897d2 API: cache_capacity should use uint for summing
Using an integer as the type for the map_reduce causes numeric overflow.

Fixes #1801

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1479299425-782-1-git-send-email-amnon@scylladb.com>
(cherry picked from commit a4be7afbb0)
2016-11-16 15:04:24 +02:00
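The overflow in the commit above can be reproduced in miniature; this is an illustrative sketch (not the actual Scylla API), summing per-shard byte counts with a too-narrow accumulator versus a 64-bit one:

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Wrong: a 32-bit accumulator wraps around once the per-shard cache
// capacities sum past 4 GiB (values here are illustrative).
uint32_t sum_capacity_narrow(const std::vector<uint64_t>& per_shard) {
    uint32_t total = 0;
    for (auto v : per_shard) {
        total += static_cast<uint32_t>(v);  // unsigned wraparound
    }
    return total;
}

// Fixed: accumulate into a 64-bit unsigned total.
uint64_t sum_capacity(const std::vector<uint64_t>& per_shard) {
    return std::accumulate(per_shard.begin(), per_shard.end(), uint64_t{0});
}
```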
Paweł Dziepak
54c338d785 partition_version: make sure that snapshot is destroyed under LSA
The snapshot destructor may free some objects managed by the LSA. That's
why the partition_snapshot_reader destructor explicitly destroys the
snapshot it uses. However, it was possible for an exception thrown by
_read_section to prevent that from happening, leaving the snapshot to be
destroyed implicitly without the current allocator set to the LSA.

Refs #1831.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1478778570-2795-1-git-send-email-pdziepak@scylladb.com>
(cherry picked from commit f16d6f9c40)
2016-11-16 12:54:16 +00:00
Glauber Costa
f705be3518 histogram: moving averages: fix inverted parameters
moving_averages constructor is defined like this:

    moving_average(latency_counter::duration interval, latency_counter::duration tick_interval)

But when it is time to initialize them, we do this:

	... {tick_interval(), std::chrono::minutes(1)} ...

As can be seen, the interval and tick interval are inverted. This
leads to the metrics being assigned bogus values.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <d83f09eed20ea2ea007d120544a003b2e0099732.1478798595.git.glauber@scylladb.com>
(cherry picked from commit d3f11fbabf)
2016-11-11 10:16:14 +02:00
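A minimal stand-in for the bug pattern in this commit: when both constructor parameters share a type, swapped arguments compile silently and the fields end up inverted (illustrative, not the actual histogram class):

```cpp
#include <cassert>
#include <chrono>

using duration = std::chrono::seconds;

// Both parameters have the same type, so nothing stops a caller from
// passing them in the wrong order -- exactly the bug described above.
struct moving_average {
    duration interval;
    duration tick_interval;
    moving_average(duration i, duration t) : interval(i), tick_interval(t) {}
};
```

A named-field struct or strong duration types would make the inversion a compile error; here the only guard is passing arguments in declared order.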
Calle Wilund
16a9027552 auth::password_authenticator: Ensure exceptions are processed in continuation
Fixes #1718 (even more)
Message-Id: <1475497389-27016-1-git-send-email-calle@scylladb.com>

(cherry picked from commit 5b815b81b4)
2016-11-07 09:25:29 +02:00
Calle Wilund
23e792c9ea auth::password_authenticator: "authenticate" should not throw undeclared excpt
Fixes #1718

Message-Id: <1475487331-25927-1-git-send-email-calle@scylladb.com>
(cherry picked from commit d24d0f8f90)
2016-11-07 09:25:24 +02:00
Pekka Enberg
5294dd9eb2 Merge seastar submodule
* seastar b62d7a5...5adb964 (2):
  > file: make close() more robust against concurrent calls
  > rpc: Do not close client connection on error response for a timed out request
2016-11-07 09:25:02 +02:00
Raphael S. Carvalho
11323582d6 lcs: fix starvation at higher levels
When the max sstable size is increased, higher levels suffer from
starvation because we decide to compact a given level if the following
calculation results in a number greater than 1.001:
level_size(L) / max_size_for_level_l(L)

Fixes #1720.

For this backport, I needed to add schema as parameter to sstable
functions that return first and last decorated keys.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit a8ab4b8f37)
2016-11-05 11:26:01 +02:00
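The compaction trigger described above can be sketched as follows (the 1.001 threshold comes from the message; the function names are illustrative, not the actual compaction-strategy code):

```cpp
#include <cassert>
#include <cstdint>

// Score for a level: how far its current size exceeds its target size.
inline double level_score(uint64_t level_size, uint64_t max_size_for_level) {
    return static_cast<double>(level_size) /
           static_cast<double>(max_size_for_level);
}

// A level becomes a compaction candidate when it is more than 0.1%
// over its target size.
inline bool level_needs_compaction(uint64_t level_size,
                                   uint64_t max_size_for_level) {
    return level_score(level_size, max_size_for_level) > 1.001;
}
```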
Raphael S. Carvalho
69948778c6 lcs: fix broken token range distribution at higher levels
Uniform token range distribution across sstables in a level > 1 was broken,
because we were only choosing the sstable with the lowest first key when
compacting a level > 0. This resulted in a performance problem because,
for example, L1->L2 may develop a huge overlap over time.
The last compacted key will now be stored for each level to ensure a sort
of "round-robin" selection of sstables for compaction at levels >= 1.
That's also done by C*, which was once affected by this, as described in
https://issues.apache.org/jira/browse/CASSANDRA-6284.

Fixes #1719.

For this backport, I added schema parameter to compaction_strategy::
notify_completion() because sstable doesn't store schema here.
Most conflicts were that some interfaces take schema parameter at
this version.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit a3bf7558f2)
2016-11-05 11:26:01 +02:00
Pekka Enberg
ef4332ab09 release: prepare for 1.3.4 2016-11-03 12:09:13 +02:00
Pekka Enberg
18a99c00b0 cql3: Fix selecting same column multiple times
Under the hood, the selectable::add_and_get_index() function
deliberately filters out duplicate columns. This causes
simple_selector::get_output_row() to return a row with all duplicate
columns filtered out, which triggers an assertion because of a row
mismatch with the metadata (which contains the duplicate columns).

The fix is rather simple: just make selection::from_selectors() use
selection_with_processing if the number of selectors and column
definitions doesn't match -- like Apache Cassandra does.

Fixes #1367
Message-Id: <1477989740-6485-1-git-send-email-penberg@scylladb.com>

(cherry picked from commit e1e8ca2788)
2016-11-01 09:34:49 +00:00
Pekka Enberg
cc3a4173f6 release: prepare for 1.3.3 2016-10-28 09:54:41 +03:00
Pekka Enberg
4dc196164d auth: Fix resource level handling
We use the `data_resource` class in the CQL parser, which lets users refer
to a table resource without specifying a keyspace. This asserts out in
get_level() for no good reason, as we already know the intended level
based on the constructor. Therefore, change `data_resource` to track the
level like upstream Cassandra does and use that.

Fixes #1790

Message-Id: <1477599169-2945-1-git-send-email-penberg@scylladb.com>
(cherry picked from commit b54870764f)
2016-10-27 23:38:01 +03:00
Glauber Costa
31ba6325ef auth: always convert string to upper case before comparing
We store all auth perm strings in upper case, but the user might very
well pass them in lower case.

We could use a standard key comparator / hash here, but since the
strings tend to be small, the new sstring will likely be allocated on
the stack and this approach yields significantly less code.

Fixes #1791.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <51df92451e6e0a6325a005c19c95eaa55270da61.1477594199.git.glauber@scylladb.com>
(cherry picked from commit ef3c7ab38e)
2016-10-27 22:10:04 +03:00
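A sketch of the fix under the assumption that stored permission strings are upper case (helper names are illustrative, not the actual auth code):

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Normalize user input to upper case before comparing it against the
// stored, already-upper-case permission strings.
inline std::string to_upper(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return s;
}

inline bool permission_equal(const std::string& stored_upper,
                             const std::string& user_input) {
    return stored_upper == to_upper(user_input);
}
```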
Tomasz Grabiec
dcd8b87eb8 Merge seastar upstream
* seastar b62d7a5...0fd8792 (1):
  > rpc: Do not close client connection on error response for a timed out request

Refs #1778
2016-10-25 13:56:36 +02:00
Tomasz Grabiec
6b53aad9fb partition_version: Fix corruption of partition_version list
The move constructor of partition_version was not invoking move
constructor of anchorless_list_base_hook. As a result, when
partition_version objects were moved, e.g. during LSA compaction, they
were unlinked from their lists.

This can make readers return invalid data, because not all versions
will be reachable.

It also causes leaks of the versions which are not directly attached
to a memtable entry. This will trigger an assertion failure in the LSA
region destructor. This assertion triggers with the row cache disabled.
With the cache enabled (the default), all segments are merged into the
cache region, which currently is not destroyed on shutdown, so this
problem would go unnoticed. With the cache disabled, the memtable region
is destroyed after the memtable is flushed and after all readers stop
using that memtable.

Fixes #1753.
Message-Id: <1476778472-5711-1-git-send-email-tgrabiec@scylladb.com>

(cherry picked from commit fe387f8ba0)
2016-10-18 11:00:19 +02:00
Pekka Enberg
762b156809 database: Fix io_priority_class related compilation error
Commit e6ef49e ("db: Do not timeout streaming readers") breaks compilation of database.cc:

  database.cc: In lambda function:
  database.cc:282:62: error: ‘const class io_priority_class’ has no member named ‘id’
               if (service::get_local_streaming_read_priority().id() == pc.id()) {
                                                              ^~
  database.cc:282:73: error: ‘const class io_priority_class’ has no member named ‘id’
               if (service::get_local_streaming_read_priority().id() == pc.id()) {

...because we don't have Seastar commit 823a404 ("io_priority_class:
remove non-explicit operator unsigned") backported.

Fix the issue by using the non-explicit operator instead of explicit id().

Acked-by: Tomasz Grabiec <tgrabiec@scylladb.com>
Message-Id: <1476425276-17171-1-git-send-email-penberg@scylladb.com>
2016-10-14 13:28:32 +03:00
Pekka Enberg
f3ed1b4763 release: prepare for 1.3.2 2016-10-13 20:34:46 +03:00
Paweł Dziepak
6618236fff query_pagers: fix clustering key range calculation
The paging code assumes that the clustering row range [a, a] contains only
one row, which may not be true. Another problem is that it tries to use
the range<> interface for dealing with clustering key ranges, which
doesn't work because of the lack of a correct comparator.

Refs #1446.
Fixes #1684.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1475236805-16223-1-git-send-email-pdziepak@scylladb.com>
(cherry picked from commit eb1fcf3ecc)
2016-10-10 16:08:09 +03:00
Asias He
ecb0e44480 gossip: Switch to use system_clock
The expire time, which is used to decide when to remove a node from
gossip membership, is gossiped around the cluster. We switched to the
steady clock in the past. In order to have a consistent time_point on
all the nodes in the cluster, we have to use a wall clock. Switch to
system_clock for gossip.

Fixes #1704

(cherry picked from commit f0d3084c8b)
2016-10-09 18:12:46 +03:00
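The clock distinction matters because steady_clock's epoch is per-process (often boot time), so its time_points cannot be meaningfully gossiped between nodes, while system_clock time_points share the Unix epoch cluster-wide. A hedged sketch (not the actual gossiper code):

```cpp
#include <cassert>
#include <chrono>

// A wall-clock expire time is meaningful on every node in the cluster;
// a steady_clock time_point would only be meaningful to the process
// that produced it.
using expire_point = std::chrono::system_clock::time_point;

inline expire_point expire_after(std::chrono::seconds ttl) {
    return std::chrono::system_clock::now() + ttl;
}
```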
Tomasz Grabiec
e6ef49e366 db: Do not timeout streaming readers
There is a limit to concurrency of sstable readers on each shard. When
this limit is exhausted (currently 100 readers) readers queue. There
is a timeout after which queued readers are failed, equal to
read_request_timeout_in_ms (5s by default). The reason we have the
timeout here is primarily because the readers created for the purpose
of serving a CQL request no longer need to execute after waiting
longer than read_request_timeout_in_ms. The coordinator no longer
waits for the result so there is no point in proceeding with the read.

This timeout should not apply for readers created for streaming. The
streaming client currently times out after 10 minutes, so we could
wait at least that long. Timing out sooner makes streaming unreliable,
which under high load may prevent streaming from completing.

The change sets no timeout for streaming readers at replica level,
similarly as we do for system tables readers.

Fixes #1741.

Message-Id: <1475840678-25606-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 2a5a90f391)
2016-10-09 10:33:39 +03:00
Tomasz Grabiec
7d24a3ed56 transport: Extend request memory footprint accounting to also cover execution
CQL server is supposed to throttle requests so that they don't
overflow memory. The problem is that it currently accounts for
request's memory only around reading of its frame from the connection
and not actual request execution. As a result too many requests may be
allowed to execute and we may run out of memory.

Fixes #1708.
Message-Id: <1475149302-11517-1-git-send-email-tgrabiec@scylladb.com>

(cherry picked from commit 7e25b958ac)
2016-10-03 14:15:21 +03:00
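The accounting idea can be sketched with an RAII guard that holds the reserved memory for the request's entire lifetime rather than only while its frame is read (a simplification; Seastar actually uses semaphore units for this):

```cpp
#include <cassert>
#include <cstddef>

// Memory reserved for a request is counted on construction and released
// only on destruction -- so the permit must live across both frame
// parsing and execution, which is the essence of the fix above.
struct memory_permit {
    std::size_t& in_use;
    std::size_t bytes;
    memory_permit(std::size_t& counter, std::size_t b)
        : in_use(counter), bytes(b) {
        in_use += bytes;  // admission control sees this before execution
    }
    ~memory_permit() { in_use -= bytes; }  // released after execution ends
};
```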
Avi Kivity
ae2a1158d7 Update seastar submodule
* seastar 9b541ef...b62d7a5 (1):
  > semaphore: Introduce get_units()
2016-10-03 14:11:26 +03:00
Asias He
f110c2456a gossip: Do not remove failure_detector history on remove_endpoint
Otherwise a node could wrongly think the decommissioned node is still
alive and not evict it from the gossip membership.

Backport: CASSANDRA-10371

7877d6f Don't remove FailureDetector history on removeEndpoint

Fixes #1714
Message-Id: <f7f6f1eec2aab1b97a2e568acfd756cca7fc463a.1475112303.git.asias@scylladb.com>

(cherry picked from commit 511f8aeb91)
2016-09-29 13:27:34 +03:00
Raphael S. Carvalho
37d27b1144 api: implement api to return sstable count per level
'nodetool cfstats' wasn't showing per-level sstable count because
the API wasn't implemented.

Fixes #1119.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <0dcdf9196eaec1692003fcc8ef18c77d0834b2c6.1474410770.git.raphaelsc@scylladb.com>
(cherry picked from commit 67343798cf)
2016-09-26 09:51:33 +03:00
Tomasz Grabiec
ff674391c2 Merge seastar upstream
Refs #1622
Refs #1690

* seastar b58a287...9b541ef (2):
  > input_stream: Fix possible infinite recursion in consume()
  > iostream: Fix stack overflow in output_stream::split_and_put()
2016-09-22 14:58:44 +02:00
Asias He
55c9279354 gossip: Fix std::out_of_range in setup_collectd
It is possible that endpoint_state_map does not contain the entry for
the node itself when collectd accesses it.

Fixes the issue:

Sep 18 11:33:16 XXX scylla[19483]: [shard 0] seastar - Exceptional
future ignored: std::out_of_range (_Map_base::at)

Fixes #1656

Message-Id: <8ffe22a542ff71e8c121b06ad62f94db54cc388f.1474377722.git.asias@scylladb.com>
(cherry picked from commit aa47265381)
2016-09-20 21:10:55 +03:00
Tomasz Grabiec
ef7b4c61ff tests: Add test for UUID type ordering
Message-Id: <1473956716-5209-2-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 2282599394)
2016-09-20 12:22:03 +02:00
Tomasz Grabiec
b9e169ead9 types: fix uuid_type_impl::less
timeuuid_type_impl::compare_bytes is a "trichotomic" comparator (-1,
0, 1) while less() is a "less" comparator (false, true). The code
incorrectly returns c1 instead of c1 < 0 which breaks the ordering.

Fixes #1196.
Message-Id: <1473956716-5209-1-git-send-email-tgrabiec@scylladb.com>

(cherry picked from commit 804fe50b7f)
2016-09-20 12:22:00 +02:00
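The bug pattern in miniature (simplified to ints, not the actual uuid_type_impl code): a trichotomic comparator returns -1/0/1, and returning that raw value from less() converts both -1 and 1 to true:

```cpp
#include <cassert>

// Three-way comparator: -1 if a < b, 0 if equal, 1 if a > b.
inline int compare3(int a, int b) { return a < b ? -1 : a > b ? 1 : 0; }

// Buggy: implicit int-to-bool conversion makes any nonzero result "true",
// so less_buggy(2, 1) also claims 2 < 1 and breaks the ordering.
inline bool less_buggy(int a, int b) { return compare3(a, b); }

// Fixed: explicitly test for a negative comparison result.
inline bool less_fixed(int a, int b) { return compare3(a, b) < 0; }
```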
Shlomi Livne
dabbadcf39 release: prepare for 1.3.1
Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
2016-09-18 13:18:20 +03:00
Duarte Nunes
c9cb14e160 thrift: Correctly detect clustering range wrap around
This patch uses the clustering bounds comparator to correctly detect
wrap around of a clustering range in the thrift handler.

Refs #1446

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1473952155-14886-1-git-send-email-duarte@scylladb.com>
(cherry picked from commit bc3cbb7009)
2016-09-15 16:52:43 +01:00
Shlomi Livne
985352298d ami: Fix instructions how to run scylla_io_setup on non ephemeral instances
On instances different from i2/m3/c3 we provide instructions to run
scylla_io_setup. Running scylla_io_setup requires access to
/var/lib/scylla to create a temporary file. To gain access to that
directory, the user should run 'sudo scylla_io_setup'.

refs: #1645

Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
Message-Id: <4ce90ca1ba4da8f07cf8aa15e755675463a22933.1473935778.git.shlomi@scylladb.com>
(cherry picked from commit acb83073e2)
2016-09-15 15:27:02 +01:00
Paweł Dziepak
c2d347efc7 Merge "Fix abort when querying with contradicting clustering restrictions" from Tomek
"This series backports fixes for #1670 on top of 1.3 branch.

Fixes abort when querying with contradicting clustering column
restrictions, for example:

   SELECT * FROM test WHERE k = 0 AND ck < 1 and ck > 2"
2016-09-15 14:55:28 +01:00
Tomasz Grabiec
78c7408927 Fix abort when querying with contradicting clustering restrictions
Example of affected query:

  SELECT * FROM test WHERE k = 0 AND ck < 1 and ck > 2

Refs #1670.

This commit brings back the backport of "Don't allow CK wrapping
ranges" by Duarte by reverting commit 11d7f83d52.

It also has the following fix, which is introduced by the
aforementioned commit, squashed to improve bisectability:

"cql3: Consider bound type when detecting wrap around

 This patch uses the clustering bounds comparator to correctly detect
 wrap around of a clustering range. This fixes a manifestation of #1446,
 introduced by b1f9688432, where a query
 such as select * from cf where k = 0x00 and c0 = 0x02 and c1 > 0x02
 would result in a range containing a clustering key and a prefix,
 incorrectly ordered by the prefix equality or lexicographical
 comparators.

 Refs #1446

 Signed-off-by: Duarte Nunes <duarte@scylladb.com>
 (cherry picked from commit ee2694e27d)"
2016-09-14 19:50:49 +02:00
Duarte Nunes
5716decf60 bounds_view: Create from nonwrapping_range
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
(cherry picked from commit 084b931457)
2016-09-14 18:28:55 +02:00
Duarte Nunes
f60eb3958a range_tombstone: Extract out bounds_view
This patch extracts bounds_view from range_tombstone so its comparator
can be reused elsewhere.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
(cherry picked from commit 878927d9d2)
2016-09-14 18:28:55 +02:00
Tomasz Grabiec
7848781e5f database: Ignore spaces in initial_token list
Currently we get a boost::lexical_cast error on startup if initial_token
has a list which contains spaces after commas, e.g.:

  initial_token: -1100081313741479381, -1104041856484663086, ...

Fixes #1664.
Message-Id: <1473840915-5682-1-git-send-email-tgrabiec@scylladb.com>

(cherry picked from commit a498da1987)
2016-09-14 12:03:06 +03:00
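A sketch of the fix: trim surrounding whitespace from each element of the comma-separated list before numeric conversion (illustrative, not the actual config-parsing code):

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Split a comma-separated token list, trimming spaces/tabs around each
// element so " -123" converts cleanly instead of failing.
std::vector<long long> parse_tokens(const std::string& csv) {
    std::vector<long long> out;
    std::istringstream in(csv);
    std::string item;
    while (std::getline(in, item, ',')) {
        auto b = item.find_first_not_of(" \t");
        auto e = item.find_last_not_of(" \t");
        out.push_back(std::stoll(item.substr(b, e - b + 1)));
    }
    return out;
}
```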
Pekka Enberg
195994ea4b Update scylla-ami submodule
* dist/ami/files/scylla-ami 14c1666...e1e3919 (1):
  > scylla_ami_setup: remove scylla_cpuset_setup
2016-09-07 21:05:33 +03:00
Takuya ASADA
183910b8b4 dist/common/scripts/scylla_sysconfig_setup: sync cpuset parameters with rps_cpus settings when posix_net_conf.sh is enabled and NIC is single queue
In posix_net_conf.sh's single-queue NIC mode (which means RPS-enabled mode), we exclude cpu0 and its sibling from the network stack processing cpus, and assign the NIC IRQ to cpu0.
So the network stack never runs on cpu0 and its sibling; to get better performance we need to exclude these cpus from Scylla too.
To do this, we need to get the RPS cpu mask from posix_net_conf.sh and pass it to scylla_cpuset_setup to construct /etc/scylla.d/cpuset.conf when scylla_setup is executed.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1472544875-2033-2-git-send-email-syuu@scylladb.com>
(cherry picked from commit 533dc0485d)
2016-09-07 21:00:30 +03:00
Takuya ASADA
f4746d2a46 dist/common/scripts/scylla_prepare: drop unnecesarry multiqueue NIC detection code on scylla_prepare
Right now scylla_prepare passes the -mq option to posix_net_conf.sh when the number of RX queues is > 1, but posix_net_conf.sh itself sets the NIC mode to sq when queues < ncpus / 2.
So the logic differs; in fact posix_net_conf.sh no longer needs -sq/-mq to be specified, as it autodetects the queue mode.
So we need to drop the detection logic from scylla_prepare and let posix_net_conf.sh detect it.

Fixes #1406

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1472544875-2033-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit 0c3bb2ee63)
2016-09-07 20:59:10 +03:00
Pekka Enberg
fa7f990407 Update seastar submodule
* seastar e6571c4...b58a287 (3):
  > scripts/posix_net_conf.sh: supress 'ls: cannot access /sys/class/net/<NIC>/device/msi_irqs/' error message
  > scripts/posix_net_conf.sh: fix 'command not found' error when specifies --cpu-mask
  > scripts/posix_net_conf.sh: add support --cpu-mask mode
2016-09-07 20:58:09 +03:00
Pekka Enberg
86dbcf093b systemd: Don't start Scylla service until network is up
Alexandr Porunov reports that Scylla fails to start up after reboot as follows:

  Aug 25 19:44:51 scylla1 scylla[637]: Exiting on unhandled exception of type 'std::system_error': Error system:99 (Cannot assign requested address)

The problem is that because there's no dependency to network service,
Scylla simply attempts to start up too soon in the boot sequence and
fails.

Fixes #1618.

Message-Id: <1472212447-21445-1-git-send-email-penberg@scylladb.com>
(cherry picked from commit 2d3aee73a6)
2016-08-29 13:26:11 +03:00
Takuya ASADA
db89811fcc dist/common/scripts/scylla_setup: support enabling services on Ubuntu 15.10/16.04
Right now it ignores Ubuntu, but we share .service files between Fedora/CentOS and Ubuntu >= 15.10, so support it.

Fixes #1556.

Message-Id: <1471932814-17347-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit 74d994f6a1)
2016-08-29 13:26:08 +03:00
Duarte Nunes
9ec939f6a3 thrift: Avoid always recording size estimates
Size estimates for a particular column family are recorded every 5
minutes. However, when a user calls the describe_splits(_ex) verbs,
they may want to see estimates for a recently created and updated
column family; this is legitimate and common in testing. However, a
client may also call describe_splits(_ex) very frequently and
recording the estimates on every call is wasteful and, worse, can
cause clients to give up. This patch fixes this by only recording
estimates if the first attempt to query them produces no results.

Refs #1139

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1471900595-4715-1-git-send-email-duarte@scylladb.com>
(cherry picked from commit 440c1b2189)
2016-08-29 12:08:53 +03:00
Pekka Enberg
d40f586839 dist/docker: Clean up Scylla description for Docker image
Message-Id: <1472145307-3399-1-git-send-email-penberg@scylladb.com>
(cherry picked from commit c5e5e7bb40)
2016-08-29 10:49:09 +03:00
Raphael S. Carvalho
d55c55efec api: use estimation of pending tasks in compaction manager too
We have API for getting pending compaction tasks both in column
family and compaction manager. Column family is already returning
pending tasks properly.
Compaction manager's one is used by 'nodetool compactionstats', and
was returning a value which doesn't reflect pending compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <a20b88938ad39e95f98bfd7f93e4d1666d1c6f95.1471641211.git.raphaelsc@scylladb.com>
(cherry picked from commit d8be32d93a)
2016-08-29 10:20:20 +03:00
Raphael S. Carvalho
dc79761c17 sstables: Fix estimation of pending tasks for leveled strategy
There were two underflow bugs.

1) in variable i, causing get_level() to see an invalid level and
throw an exception as a result.
2) when estimating number of pending tasks for a level.

Fixes #1603.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <cce993863d9de4d1f49b3aabe981c475700595fc.1471636164.git.raphaelsc@scylladb.com>
(cherry picked from commit 77d4cd21d7)
2016-08-29 10:19:30 +03:00
Paweł Dziepak
bbffb51811 mutation_partition: fix iterator invalidation in trim_rows
Reversed iterators are adaptors for 'normal' iterators. These underlying
iterators point to different objects than the reversed iterators
themselves.

The consequence of this is that removing an element pointed to by a
reversed iterator may invalidate a reversed iterator which points to a
completely different object.

This is what happens in trim_rows for reversed queries. Erasing a row
can invalidate the end iterator and the loop would fail to stop.

The solution is to introduce the
reversal_traits::erase_dispose_and_update_end() function, which erases
and disposes the object pointed to by a given iterator but also takes a
reference to an end iterator and updates it if necessary to make sure
that it stays valid.

Fixes #1609.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1472080609-11642-1-git-send-email-pdziepak@scylladb.com>
(cherry picked from commit 6012a7e733)
2016-08-25 17:40:37 +03:00
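The underlying-iterator relationship can be shown with a safe erase-through-reverse-iterator helper (a simplified illustration of the idea behind erase_dispose_and_update_end, not the Scylla code):

```cpp
#include <cassert>
#include <iterator>
#include <list>

// A reverse_iterator physically wraps an iterator that points one element
// *past* the element it designates. Erasing through it must therefore go
// through base(), and the caller gets back a recomputed reverse iterator
// instead of keeping a possibly-invalidated one.
std::list<int>::reverse_iterator
erase_via_reverse(std::list<int>& l, std::list<int>::reverse_iterator rit) {
    // rit designates *std::prev(rit.base()); erase that node and rebuild
    // a reverse iterator to the next element in reverse order.
    return std::list<int>::reverse_iterator(l.erase(std::prev(rit.base())));
}
```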
Amnon Heiman
ec3ace5aa3 housekeeping: Silently ignore check version if Scylla is not available
Normally, the version check should start and stop with the scylla-server
service.

If it fails to find the scylla server, there is no need to check the
version, nor to report it, so it can stop silently.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
(cherry picked from commit 2b98335da4)
2016-08-23 18:11:44 +03:00
Amnon Heiman
1f2d1012be housekeeping: Use curl instead of Python's libraries
There is a problem with Python SSL's in Ubuntu 14.04:

  ubuntu@ip-10-81-165-156:~$ /usr/lib/scylla/scylla-housekeeping -q version
  Traceback (most recent call last):
    File "/usr/lib/scylla/scylla-housekeeping", line 94, in <module>
      args.func(args)
    File "/usr/lib/scylla/scylla-housekeeping", line 71, in check_version
      latest_version = get_json_from_url(version_url + "?version=" + current_version)["version"]
    File "/usr/lib/scylla/scylla-housekeeping", line 50, in get_json_from_url
      response = urllib2.urlopen(req)
    File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
      return _opener.open(url, data, timeout)
    File "/usr/lib/python2.7/urllib2.py", line 404, in open
      response = self._open(req, data)
    File "/usr/lib/python2.7/urllib2.py", line 422, in _open
      '_open', req)
    File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
      result = func(*args)
    File "/usr/lib/python2.7/urllib2.py", line 1222, in https_open
      return self.do_open(httplib.HTTPSConnection, req)
    File "/usr/lib/python2.7/urllib2.py", line 1184, in do_open
      raise URLError(err)
  urllib2.URLError: <urlopen error [Errno 1] _ssl.c:510: error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure>

Instead of using Python libraries to connect to the check version
server, we will use curl for that.

Fixes #1600

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
(cherry picked from commit 4598674673)
2016-08-23 18:11:38 +03:00
Amnon Heiman
9c8cfd3c0e housekeeping: Add curl as a dependency
To work around an SSL problem with Python on Ubuntu 14.04, we need to
use curl. Add it as a dependency so that it's available on the host.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
(cherry picked from commit 91944b736e)
2016-08-23 18:11:13 +03:00
Takuya ASADA
d9e4ab38f6 dist/ubuntu: support scylla-housekeeping service on all Ubuntu versions
The current scylla-housekeeping support on Ubuntu has a bug: it does not install the .service/.timer files for Ubuntu 16.04.
So fix it to make it work.

Fixes #1502

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Tested-by: Amos Kong <amos@scylladb.com>
Message-Id: <1471607903-14889-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit 80f7449095)
2016-08-23 13:50:43 +03:00
Takuya ASADA
5d72e96ccc dist/common/systemd: don't use .in for scylla-housekeeping.*, since these are not template file
.in is the suffix for template files which require rewriting at build time, but these systemd unit files do not require a rewrite, so don't name them .in; reference them directly from the .spec.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1471607533-3821-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit aac60082ae)
2016-08-23 13:50:36 +03:00
Paweł Dziepak
a072df0e09 sstables: do not call consume_end_partition() after proceed::no
After state_processor().process_state() returns proceed::no the upper
layer should have a chance to act before more data is pushed to the
consumer. This means that in case of proceed::no verify_end_state()
should not be called immediately since it may invoke
consume_end_partition().

Fixes #1605.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1471943032-7290-1-git-send-email-pdziepak@scylladb.com>
(cherry picked from commit 5feed84e32)
2016-08-23 12:24:59 +03:00
Pekka Enberg
8291ec13fa dist/docker: Separate supervisord config files
Move scylla-server and scylla-jmx supervisord config files to separate
files and make the main supervisord.conf scan /etc/supervisord.conf.d/
directory. This makes it easier for people to extend the Docker image
and add their own services.

Message-Id: <1471588406-25444-1-git-send-email-penberg@scylladb.com>
(cherry picked from commit 9d1d8baf37)
2016-08-23 11:56:24 +03:00
Vlad Zolotarov
c485551488 tracing::trace_state: fix a compilation error with gcc 4.9
See #1602.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1471774784-26266-1-git-send-email-vladz@cloudius-systems.com>
2016-08-21 16:39:50 +03:00
Paweł Dziepak
b85164bc1d sstables: optimise clustering rows filtering
Clustering rows in the sstables are sorted in the ascending order so we
can use that to minimise number of comparisons when checking if a row is
in the requested range.

Refs #1544.

Paweł further explains the backport rationale for 1.3:

"Apart from making sense on its own, this patch has a very curious property
of working around #1544 in a way that doesn't make #1446 hit us harder than
usual.
So, in the branch-1.3 we can:
 - revert 85376ce555
   'Merge "Don't allow CK wrapping ranges" from Duarte' -- the previous,
   insufficient workaround for #1544
 - apply this patch
 - rejoice as cql_query_test passes and #1544 is no longer a problem

The scenario above assumes that this patch doesn't introduce any
regressions."

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Reviewed-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <1471608921-30818-1-git-send-email-pdziepak@scylladb.com>
(cherry picked from commit e60bb83688)
2016-08-19 18:11:49 +03:00
Pekka Enberg
11d7f83d52 Revert "Merge "Don't allow CK wrapping ranges" from Duarte"
This reverts commit 85376ce555, reversing
changes made to 3f54e0c28e.

The change breaks CQL range queries.
2016-08-19 18:11:31 +03:00
Pekka Enberg
5c4a24c1c0 dist/docker: Use Scylla mascot as the logo
Glauber "eagle eyes" Costa pointed out that the Scylla logo used in our
Docker image documentation looks broken because it's missing the Scylla
text.

Fix the problem by using the Scylla mascot instead.
Message-Id: <1471525154-2800-1-git-send-email-penberg@scylladb.com>

(cherry picked from commit 2bf5e8de6e)
2016-08-19 12:50:46 +03:00
Pekka Enberg
306eeedf3e dist/docker: Fix bug tracker URL in the documentation
The bug tracker URL in our Docker image documentation is not clickable
because the URL Markdown extracts automatically is broken.

Fix that and add some more links on how to get help and report issues.
Message-Id: <1471524880-2501-1-git-send-email-penberg@scylladb.com>

(cherry picked from commit 4d90e1b4d4)
2016-08-19 12:50:42 +03:00
Pekka Enberg
9eada540d9 release: prepare for 1.3.0 2016-08-18 16:20:21 +03:00
Yoav Kleinberger
a662765087 docker: extend supervisor capabilities
Allow the user to use the `supervisorctl' program to start and stop
services. `exec` needed to be added to the scylla and scylla-jmx starter
scripts - otherwise supervisord loses track of the actual process we
want to manage.

Signed-off-by: Yoav Kleinberger <yoav@scylladb.com>
Message-Id: <1471442960-110914-1-git-send-email-yoav@scylladb.com>
(cherry picked from commit 25fb5e831e)
2016-08-18 15:41:08 +03:00
Pekka Enberg
192f89bc6f dist/docker: Documentation cleanups
- Fix invisible characters to be space so that Markdown to PDF
  conversion works.

- Fix formatting of examples to be consistent.

- Spellcheck.

Message-Id: <1471514924-29361-1-git-send-email-penberg@scylladb.com>
(cherry picked from commit 1553bec57a)
2016-08-18 13:14:21 +03:00
Pekka Enberg
b16bb0c299 dist/docker: Document image command line options
This patch documents all the command line options Scylla's Docker image supports.

Message-Id: <1471513755-27518-1-git-send-email-penberg@scylladb.com>
(cherry picked from commit 4ca260a526)
2016-08-18 13:14:16 +03:00
Amos Kong
cd9d967c44 systemd: have the first housekeeping check right after start
Issue: https://github.com/scylladb/scylla/issues/1594

Currently systemd runs the first housekeeping check at the end of the
first timer period. We expect it to run right after start.

This patch makes systemd consistent with upstart.

Signed-off-by: Amos Kong <amos@scylladb.com>
Message-Id: <4cc880d509b0a7b283278122a70856e21e5f1649.1471433388.git.amos@scylladb.com>
(cherry picked from commit 9d53305475)
2016-08-17 16:02:25 +03:00
Avi Kivity
236b089b03 Merge "Fixes for streamed_mutation_from_mutation" from Paweł
"This series contains fixes for two memory leaks in
streamed_mutation_from_mutation.

Fixes #1557."

(cherry picked from commit 4871b19337)
2016-08-17 13:26:25 +03:00
Avi Kivity
9d54b33644 Merge 2016-08-17 13:25:49 +03:00
Benoit Canet
4ef6c3155e systemd: Remove WorkingDirectory directive
The WorkingDirectory directive does not support environment variables on
the systemd version shipped with Ubuntu 16.04. Fortunately, not
setting WorkingDirectory implicitly sets it to the user's home directory,
which is the same thing (i.e. /var/lib/scylla).

Fixes #1319

Signed-off-by: Benoit Canet <benoit@scylladb.com>
Message-Id: <1470053876-1019-1-git-send-email-benoit@scylladb.com>
(cherry picked from commit 90ef150ee9)
2016-08-17 12:34:44 +03:00
Takuya ASADA
fe529606ae dist/common/scripts: mkdir -p /var/lib/scylla/coredump before symlinking
We are creating this dir in scylla_raid_setup, but a user may create the XFS volume without using that command; scylla_coredump_setup should work in that condition too.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1470638615-17262-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit 60ce16cd54)
2016-08-16 12:35:15 +03:00
Avi Kivity
85376ce555 Merge "Don't allow CK wrapping ranges" from Duarte
"This pathset ensures user-specified clustering key ranges are never
wrapping, as those types of ranges are not defined for CQL3.

Fixes #1544"
2016-08-16 10:09:31 +03:00
Duarte Nunes
5e8ac82614 cql3: Discard wrap around ranges.
Wrapping ranges are not supported in CQL3. If one is specified,
this patch converts it to an empty range.

Fixes #1544

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-08-15 15:22:44 +00:00
Duarte Nunes
22c8520d61 storage_proxy: Short circuit query without clustering ranges
This patch makes the storage_proxy return an empty result when the
query doesn't define any clustering ranges (default or specific).

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-08-15 15:05:23 +00:00
Duarte Nunes
e7355c9b60 thrift: Don't always validate clustering range
This patch makes make_clustering_range not enforce that the range be
non-wrapping, so that it can be validated differently if needed. A
make_clustering_range_and_validate function is introduced that keeps
the old behavior.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-08-15 15:05:18 +00:00
Paweł Dziepak
3f54e0c28e partition_version: handle errors during version merge
Currently, the partition snapshot destructor can throw, which is a big no-no.
The solution is to ignore the exception, leave the versions unmerged, and
hope that subsequent reads will succeed at merging.

However, another problem is that the merge doesn't use allocating
sections which means that memory won't be reclaimed to satisfy its
needs. If the cache is full this may result in partition versions not
being merged for a very long time.

This patch introduces partition_snapshot::merge_partition_versions()
which contains all the version merging logic that was previously present
in the snapshot destructor. This function may throw so that it can be
used with allocating sections.

The actual merging and handling of potential errors is done from the
partition_snapshot_reader destructor. It tries to merge versions under
the allocating section. Only if that fails does it give up and leave them
unmerged.

Fixes #1578
Fixes #1579.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1471265544-23579-1-git-send-email-pdziepak@scylladb.com>
(cherry picked from commit 5cae44114f)
2016-08-15 15:57:10 +03:00
Asias He
4c6f8f9d85 gossip: Add heart_beat_version to collectd
$ tools/scyllatop/scyllatop.py '*gossip*'

node-1/gossip-0/gauge-heart_beat_version 1.0
node-2/gossip-0/gauge-heart_beat_version 1.0
node-3/gossip-0/gauge-heart_beat_version 1.0

Gossip heart beat version changes every second. If everything is working
correctly, the gauge-heart_beat_version output should be 1.0. If not,
the gauge-heart_beat_version output should be less than 1.0.

Message-Id: <cbdaa1397cdbcd0dc6a67987f8af8038fd9b2d08.1470712861.git.asias@scylladb.com>
(cherry picked from commit ef782f0335)
2016-08-15 12:32:17 +03:00
Nadav Har'El
7a76157cb9 sstables: don't forget to read static row
[v2: fix check for static column (don't check if the schema is not compound)
 and move want-static-columns flag inside the filtering context to avoid
 changing all the callers.]

When a CQL request asks to read only a range of clustering keys inside
a partition, we actually need to read not just these clustering rows, but
also the static columns and add them to the response (as explained by Tomek
in issue #1568).

With the current code, that CQL request is translated into an
sstable::read_row() with a clustering-key filter. But this currently
only reads the requested clustering keys - NOT the static columns.

We don't want sstable::read_row() to unconditionally read the static
columns from disk because if, for example, they are already cached, we
might not want to read them from disk. We don't have such a partial-partition
cache yet, but we are likely to have one in the future.

This patch adds to the clustering key filter object a flag indicating whether we
need to read the static columns (actually, it's a function, returning this
flag per partition, to match the API for the clustering-key filtering).

When sstable::read_row() sees the flag for this partition is true, it also
requests to read the static columns.
Currently, the code always passes "true" for this flag - because we don't
have the logic to cache partially-read partitions.

The current find_disk_ranges() code does not yet support returning a non-
contiguous byte range, so this patch, if it notices that this partition
really has static columns in addition to the range it needs to read,
falls back to reading the entire partition. This is a correct solution
(and fixes #1568) but not the most efficient one. Because static
columns are relatively rare, let's start with this solution (correct
but less efficient when there are static columns); providing the non-
contiguous reading support is left as a FIXME.

Fixes #1568

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1471124536-19471-1-git-send-email-nyh@scylladb.com>
(cherry picked from commit 0d00da7f7f)
2016-08-15 12:30:36 +03:00
Amnon Heiman
b2e6a52461 scylla.spec: conditionally include the housekeeping.cfg in the conf package
When the housekeeping configuration file name was changed from conf to cfg,
it was no longer included as part of the conf rpm.

This change adds a macro that determines whether the file should be
included and uses that macro to conditionally add the
configuration file to the rpm.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1471169042-19099-1-git-send-email-amnon@scylladb.com>
(cherry picked from commit 612f677283)
2016-08-14 13:26:25 +03:00
Tomasz Grabiec
b1376fef9b partition_version: Add missing linearization context
Snapshot removal merges partitions, and cell merging must be done
inside linearization context.

Fixes #1574

Reviewed-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1471010625-18019-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 1b2ea14d0e)
2016-08-12 17:56:21 +03:00
Piotr Jastrzebski
23f4813a48 Fix after free access bug in storage proxy
Due to speculative reads we can't guarantee that all
fibers started by storage_proxy::query will be finished
by the time the method returns a result.

We need to make sure that no parameter passed to this
method ever changes.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <31952e323e599905814b7f378aafdf779f7072b8.1471005642.git.piotr@scylladb.com>
(cherry picked from commit f212a6cfcb)
2016-08-12 16:35:45 +02:00
Duarte Nunes
c16c3127fe docker: If set, broadcast address is seed
This patch configures the broadcast address to be the seed if it is
configured, otherwise Scylla complains about it and aborts.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1470863058-1011-1-git-send-email-duarte@scylladb.com>
(cherry picked from commit 918a2939ff)
2016-08-12 11:47:08 +03:00
Tomasz Grabiec
48fdeb47e2 Merge branch 'raphael/fix_min_max_metadata_v2' from git@github.com:raphaelsc/scylla.git
Fix for generation of sstables min/max clustering metadata from Raphael.

(cherry picked from commit d7f8ce7722)
2016-08-11 17:53:01 +03:00
Avi Kivity
9ef4006d67 Update seastar submodule
* seastar 36a8ebe...e6571c4 (1):
  > reactor: Do not test for poll mode default
2016-08-11 14:45:52 +03:00
Amnon Heiman
75c53e4f24 scylla-housekeeping: rename configuration file from conf to cfg
Files with a conf extension are run by scylla_prepare on the AMI.
The scylla-housekeeping configuration file is not a bash script and
should not be run.

This patch changes its extension to cfg, which is more Python-like.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1470896759-22651-2-git-send-email-amnon@scylladb.com>
(cherry picked from commit 5a4fc9c503)
2016-08-11 14:45:11 +03:00
Tomasz Grabiec
66292c0ef0 sstables: Fix bug in promoted index generation
maybe_flush_pi_block, which is called for each cell, assumes that
block_first_colname will be empty when the first cell is encountered
for each partition.

This didn't hold after writing a partition which generated no index
entry, because block_first_colname was cleared only when there was any
data written into the promoted index. Fix by always clearing the name.

The effect was that the promoted index entry for the next partition
would be flushed sooner than necessary (still counting since the start
of the previous partition) and with offset pointing to the start of
the current partition. This will cause a parsing error when such an sstable
is read through a promoted index entry because the offset is assumed to
point to a cell, not to a partition start.

Fixes #1567

Message-Id: <1470909915-4400-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit f1c2481040)
2016-08-11 13:09:05 +03:00
Amnon Heiman
84f7d9a49c build_deb: Add dist flag
The dist flag marks the debian package as a distributed package.
As such, the housekeeping configuration file will be included in the
package and will not need to be created by scylla_setup.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1470907208-502-2-git-send-email-amnon@scylladb.com>
(cherry picked from commit a24941cc5f)
2016-08-11 12:25:28 +03:00
Pekka Enberg
f0535eae9b dist/docker: Fix typo in "--overprovisioned" help text
Reported by Mathias Bogaert (@analytically).
Message-Id: <1470904395-4614-1-git-send-email-penberg@scylladb.com>

(cherry picked from commit d1a052237d)
2016-08-11 11:49:42 +03:00
Avi Kivity
4f096b60df Update seastar submodule
* seastar de789f1...36a8ebe (1):
  > reactor: fix I/O queue pending requests collectd metric

Fixes #1558.
2016-08-10 15:28:09 +03:00
Pekka Enberg
4ac160f2fe release: prepare for 1.3.rc3 2016-08-10 13:53:53 +03:00
Avi Kivity
395edc4361 Merge 2016-08-10 13:34:48 +03:00
Avi Kivity
e2c9feafa3 Merge "Add configuration file to scylla-housekeeping" from Amnon
"The series adds an optional configuration file to scylla-housekeeping. The
file acts as a way to prevent scylla-housekeeping from running. A missing
configuration file will make scylla-housekeeping exit immediately.

The series adds a flag to build_rpm that differentiates between public
distributions, which contain the configuration file, and private
distributions, which do not; in the latter case the setup script
creates it."

(cherry picked from commit da4d33802e)
2016-08-10 13:34:04 +03:00
Avi Kivity
f4dea17c19 Merge "housekeeping: Switch to python2 and handle version" from Amnon
This series handles two issues:
* Moving to python2: though python3 is supported, there are modules that we
  need that are not rpm installable; python3 will have to wait until it is
  more mature.

* Check version should send the current version when it checks for a new one,
  and a simple string compare is the wrong way to compare versions.

(cherry picked from commit ec62f0d321)
2016-08-10 13:31:50 +03:00
Amnon Heiman
a45b72b66f scylla-housekeeping: check version should use the current version
This patch handles two issues with check version:
* When checking for a version, the script sends the current version.
* Instead of a string compare, it uses parse_version to compare the
versions.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
(cherry picked from commit 406fa11cc5)
2016-08-10 13:29:53 +03:00
Amnon Heiman
be1c2a875b scylla-housekeeping: Switching to python2
There is a problem with python module installation in python3,
especially on CentOS. Though python34 has a normal package, a lot of the
modules are missing yum installation and can only be installed by pip.

This patch switches the scylla-housekeeping implementation to use
python2; we should switch back to python3 when CentOS python3 is
more mature.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
(cherry picked from commit 641e5dc57c)
2016-08-10 13:23:47 +03:00
Nadav Har'El
0b9f83c6b6 sstable: avoid copying non-existent value
The promoted-index reading code contained a bug where it copied the value
of a disengaged optional (this non-value was never used, but it was still
copied). Fix it by keeping the optional<> as such for longer.

This bug caused tests/sstable_test in the debug build to crash (the release
build somehow worked).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1470742418-8813-1-git-send-email-nyh@scylladb.com>
(cherry picked from commit e005762271)
2016-08-10 13:14:49 +03:00
Pekka Enberg
0d77615b80 cql3: Filter compaction strategy class from compaction options
Cassandra 2.x does not store the compaction strategy class in compaction
options so neither should we to avoid confusing the drivers.

Fixes #1538.
Message-Id: <1470722615-29106-1-git-send-email-penberg@scylladb.com>

(cherry picked from commit 9ff242d339)
2016-08-10 12:44:50 +03:00
Pekka Enberg
8771220745 dist/docker: Add '--smp', '--memory', and '--overprovisioned' options
Add '--smp', '--memory', and '--overprovisioned' options to the Docker
image. The options are written to /etc/scylla.d/docker.conf file, which
is picked up by the Scylla startup scripts.

You can now, for example, restrict your Docker container to 1 CPU and 1
GB of memory with:

   $ docker run --name some-scylla penberg/scylla --smp 1 --memory 1G --overprovisioned 1

Needed by folks who want to run Scylla on Docker in production.

Cc: Sasha Levin <alexander.levin@verizon.com>
Message-Id: <1470680445-25731-1-git-send-email-penberg@scylladb.com>
(cherry picked from commit 6a5ab6bff4)
2016-08-10 11:54:01 +03:00
Avi Kivity
f552a62169 Update seastar submodule
* seastar ee1ecc5...de789f1 (1):
  > Merge "Fix the SMP queue poller" from Tomasz

Fixes #1553.
2016-08-10 09:54:15 +03:00
Avi Kivity
696a978611 Update seastar submodule
* seastar 0b53ab2...ee1ecc5 (1):
  > byteorder: add missing cpu_to_be(), be_to_cpu() functions

Fixes build failure.
2016-08-10 09:51:35 +03:00
Nadav Har'El
0475a98de1 Avoid some warnings in debug build
The sanitizer of the debug build warns when a "bool" variable is read while
containing a value that is not 0 or 1. In particular, if a class has an
uninitialized bool field, which the class logic allows to be set only later,
then "move"ing such an object will read the uninitialized value and produce
this warning.

This patch fixes four of these warnings seen in sstable_test by initializing
some bool fields to false, even though the code doesn't strictly need this
initialization.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1470744318-10230-1-git-send-email-nyh@scylladb.com>
(cherry picked from commit c2e4f5ba16)
2016-08-09 17:54:54 +03:00
Nadav Har'El
0b69e37065 Fix failing tests
Commit 0d8463aba5 broke some of the tests with an assertion
failure about local_is_initialized(). It turns out that there is more than
one level of local_is_initialized() we need to check... For some tests,
neither local was initialized, but for others, one was and the other
wasn't, and the wrong one was tested.

With this patch, all unit tests except "flush_queue_test.cc" pass on my
machine. I doubt this test is relevant to the promoted index patches,
but I'll continue to investigate it.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1470695199-32649-1-git-send-email-nyh@scylladb.com>
(cherry picked from commit bce020efbd)
2016-08-09 17:54:49 +03:00
Avi Kivity
dc6be68852 Merge "promoted index for reading partial partitions" from Nadav
"The goal of this patch series is to support reading and writing of a
"promoted index" - the Cassandra 2.* SSTable feature which allows reading
only a part of the partition without needing to read an entire partition
when it is very long. To make a long story short, a "promoted index" is
a sample of each partition's column names, written to the SSTable Index
file with that partition's entry. See a longer explanation of the index
file format, and the promoted index, here:

     https://github.com/scylladb/scylla/wiki/SSTables-Index-File

There are two main features in this series - first enabling reading of
parts of partitions (using the promoted index stored in an sstable),
and then enable writing promoted indexes to new sstables. These two
features are broken up into smaller stand-alone pieces to facilitate the
review.

Three features are still missing from this series and are planned to be
developed later:

1. When we fail to parse a partition's promoted index, we silently fall back
   to reading the entire partition. We should log (with rate limiting) and
   count these errors, to help in debugging sstable problems.

2. The current code only uses the promoted index when looking for a single
   contiguous clustering-key range. If the ck range is non-contiguous, we
   fall back to reading the entire partition. We should use the promoted
   index in that case too.

3. The current code only uses the promoted index when reading a single
   partition, via sstable::read_row(). When scanning through all or a
   range of partitions (read_rows() or read_range_rows()), we do not yet
   use the promoted index; We read contiguously from data file (we do not
   even read from the index file, so unsurprisingly we can't use it)."

(cherry picked from commit 700feda0db)
2016-08-09 17:54:15 +03:00
Avi Kivity
8c20741150 Revert "sstables: promoted index write support"
This reverts commit c0e387e1ac.  The full
patchset needs to be backported instead.
2016-08-09 17:53:24 +03:00
Avi Kivity
3e3eaa693c Revert "Fix failing tests"
This reverts commit 8d542221eb.  It is needed,
but prevents another revert from taking place.  Will be reinstated later.
2016-08-09 17:52:57 +03:00
Avi Kivity
03ef0a9231 Revert "Avoid some warnings in debug build"
This reverts commit 47bf8181af.  It is needed,
but prevents another revert from taking place.  Will be reinstated later.
2016-08-09 17:52:09 +03:00
Nadav Har'El
47bf8181af Avoid some warnings in debug build
The sanitizer of the debug build warns when a "bool" variable is read while
containing a value that is not 0 or 1. In particular, if a class has an
uninitialized bool field, which the class logic allows to be set only later,
then "move"ing such an object will read the uninitialized value and produce
this warning.

This patch fixes four of these warnings seen in sstable_test by initializing
some bool fields to false, even though the code doesn't strictly need this
initialization.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1470744318-10230-1-git-send-email-nyh@scylladb.com>
(cherry picked from commit c2e4f5ba16)
2016-08-09 16:58:27 +03:00
Nadav Har'El
8d542221eb Fix failing tests
Commit 0d8463aba5 broke some of the tests with an assertion
failure about local_is_initialized(). It turns out that there is more than
one level of local_is_initialized() we need to check... For some tests,
neither local was initialized, but for others, one was and the other
wasn't, and the wrong one was tested.

With this patch, all unit tests except "flush_queue_test.cc" pass on my
machine. I doubt this test is relevant to the promoted index patches,
but I'll continue to investigate it.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1470695199-32649-1-git-send-email-nyh@scylladb.com>
(cherry picked from commit bce020efbd)
2016-08-09 16:58:27 +03:00
Nadav Har'El
c0e387e1ac sstables: promoted index write support
This patch adds writing of promoted index to sstables.

The promoted index is basically a sample of columns and their positions
for large partitions: The promoted index appears in the sstable's index
file for partitions which are larger than 64 KB, and divides the partition
to 64 KB blocks (as in Cassandra, this interval is configurable through
the column_index_size_in_kb config parameter). Beyond modifying the index
file, having a promoted index may also modify the data file: Since each
of blocks may be read independently, we need to add in the beginning of
each block the list of range tombstones that are still open at that
position.

See also https://github.com/scylladb/scylla/wiki/SSTables-Index-File

Fixes #959

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
(cherry picked from commit 0d8463aba5)
2016-08-09 16:58:27 +03:00
Duarte Nunes
57d3dc5c66 thrift: Set default validator
This patch sets the default validator for dynamic column families.
Doing so has no consequences in terms of behavior, but it causes the
correct type to be shown when describing the column family through
cassandra-cli.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1470739773-30497-1-git-send-email-duarte@scylladb.com>
(cherry picked from commit 0ed19ec64d)
2016-08-09 13:56:43 +02:00
Duarte Nunes
2daee0b62d thrift: Send empty col metadata when describing ks
This patch ensures we always send the column metadata, even when the
column family is dynamic and the metadata is empty, as some clients
like cassandra-cli always assume its presence.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1470740971-31169-1-git-send-email-duarte@scylladb.com>
(cherry picked from commit f63886b32e)
2016-08-09 14:34:14 +03:00
Pekka Enberg
3eddf5ac54 dist/docker: Document data volume and cpuset configuration
Message-Id: <1470649675-5648-1-git-send-email-penberg@scylladb.com>
(cherry picked from commit 3b31d500c8)
2016-08-09 11:15:57 +03:00
Pekka Enberg
42d6f389f9 dist/docker: Add '--broadcast-rpc-address' command line option
We already have a '--broadcast-address' command line option so let's add
the same thing for RPC broadcast address configuration.

Message-Id: <1470656449-11038-1-git-send-email-penberg@scylladb.com>
(cherry picked from commit 4372da426c)
2016-08-09 11:15:53 +03:00
Pekka Enberg
1a6f6f1605 Update scylla-ami submodule
* dist/ami/files/scylla-ami 2e599a3...14c1666 (1):
  > setup coredump on first startup
2016-08-09 11:10:20 +03:00
Avi Kivity
8d8e997f5a Update scylla-ami submodule
* dist/ami/files/scylla-ami 863cc45...2e599a3 (1):
  > Do not set developer-mode on unsupported instance types
2016-08-07 17:52:13 +03:00
Takuya ASADA
50ee889679 dist/ami/files: add a message for privacy policy agreement on login prompt
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1470212054-351-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit 3d45d6579b)
2016-08-07 17:40:56 +03:00
Duarte Nunes
325f917d8a system_keyspace: Correctly deal with wrapped ranges
This patch ensures we correctly deal with ranges that wrap around when
querying the size_estimates system table.

Ref #693

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1470412433-7767-1-git-send-email-duarte@scylladb.com>
(cherry picked from commit e0a43a82c6)
2016-08-07 17:21:58 +03:00
Takuya ASADA
b088dd7d9e dist/ami/files: show warning message for unsupported instance types
Notify users to run scylla_io_setup before launching scylla on unsupported instance types.

Fixes #1511

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1470090415-8632-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit bd1ab3a0ad)
2016-08-05 09:51:27 +03:00
Takuya ASADA
a42b2bb0d6 dist/ami: Install scylla metapackage and debuginfo on AMI
Install the scylla metapackage and debuginfo on the AMI to make it easier to report bugs from the AMI.
Fixes #1496

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1469635071-16821-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit 9b59bb59f2)
2016-08-05 09:48:19 +03:00
Takuya ASADA
aecda01f8e dist/common/scripts: disable coredump compression by default, add an argument to enable compression on scylla_coredump_setup
On large memory machine compression takes too long, so disable it by default.
Also provide a way to enable it again.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1469706934-6280-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit 89b790358e)
2016-08-05 09:47:17 +03:00
Takuya ASADA
f9b0a29def dist/ami: setup correct repository when --localrpm specified
There was no way to set up the correct repo when the AMI is built with the --localrpm option, since the AMI does not have access to the 'version' file and we did not pass the repo URL to the AMI.
So detect the optimal repo path when starting the AMI build, pass the repo URL to the AMI, and set it up correctly.

Note: this changes the behavior of build_ami.sh/scylla_install_pkg's --repo option.
It was a repository URL, but now becomes a .repo/.list file URL.
This is optimal for distributions which require 3rd-party packages to install scylla, like CentOS7.
Existing shell scripts which invoke build_ami.sh need to change accordingly, such as our Jenkins jobs.

Fixes #1414

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1469636377-17828-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit d3746298ae)
2016-08-05 09:45:22 +03:00
Pekka Enberg
192e935832 dist/docker: Use Scylla 1.3 RPM repository 2016-08-05 09:13:08 +03:00
Avi Kivity
436ff3488a Merge "Docker image fixes" from Pekka
"Kubernetes is unhappy with our Docker image because we start systemd
under the hood. Fix that by switching to use "supervisord" to manage the
two processes -- "scylla" and "scylla-jmx":

  http://blog.kunicki.org/blog/2016/02/12/multiple-entrypoints-in-docker/

While at it, fix up "docker logs" and "docker exec cqlsh" to work
out-of-the-box, and update our documentation to match what we have.

Further work is needed to ensure Scylla production configuration works
as expected and is documented accordingly."

(cherry picked from commit 28ee2bdbd2)
2016-08-05 09:10:51 +03:00
Benoît Canet
b91712fc36 docker: Add documentation page for Docker Hub
Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1466438296-5593-1-git-send-email-benoit@scylladb.com>
(cherry picked from commit 4ce7bced27)
2016-08-05 09:10:48 +03:00
Yoav Kleinberger
be954ccaec docker: bring docker image closer to a more 'standard' scylla installation
Previously, the Docker image could only be run interactively, which is
not conducive to running clusters. This patch makes the docker image
run in the background (using systemd). This makes the docker workflow
similar to working with virtual machines, i.e. the user launches a
container, and once it is running they can connect to it with

       docker exec -it <container_name> bash

and immediately use `cqlsh` to control it.

In addition, the configuration of scylla is done using established
scripts, such as `scylla_dev_mode_setup`, `scylla_cpuset_setup` and
`scylla_io_setup`, whereas previously code from these scripts was
duplicated into the docker startup file.

To specify seeds for making a cluster, use the --seeds command line
argument, e.g.

    docker run -d --privileged scylladb/scylla
    docker run -d --privileged scylladb/scylla --seeds 172.17.0.2

other options include --developer-mode, --cpuset, --broadcast-address

The --developer-mode option is on by default - so that we don't fail users
who just want to play with this.

The Dockerfile entrypoint script was rewritten as a few Python modules.
The move to Python is merited because:

    * Using `sed` to manipulate YAML is fragile
    * Lack of proper command line parsing resulted in introducing ad-hoc environment variables
    * Shell scripts don't throw exceptions, and it's easy to forget to check exit codes for every single command

I've made an effort to make the entrypoint `go' script very simple and readable.
The gory details are hidden inside the other Python modules.

Signed-off-by: Yoav Kleinberger <yoav@scylladb.com>
Message-Id: <1468938693-32168-1-git-send-email-yoav@scylladb.com>
(cherry picked from commit d1d1be4c1a)
2016-08-05 09:10:35 +03:00
Glauber Costa
2bffa8af74 logalloc: make sure allocations in release_requests don't recurse back into the allocator
Calls like later() and with_gate() may allocate memory, although that is not
very common. This can create a problem in the sense that it will potentially
recurse and bring us back to the allocator during free - which is the very thing
we are trying to avoid with the call to later().

This patch wraps the relevant calls in the reclaimer lock. This does mean that the
allocation may fail if we are under severe pressure - which includes having
exhausted all reserved space - but at least we won't recurse back to the
allocator.

To make sure we do this as early as possible, we just fold both release_requests
and do_release_requests into a single function.

Thanks Tomek for the suggestion.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <980245ccc17960cf4fcbbfedb29d1878a98d85d8.1470254846.git.glauber@scylladb.com>
(cherry picked from commit fe6a0d97d1)
2016-08-04 11:17:54 +02:00
Glauber Costa
4a6d0d503f logalloc: make sure blocked requests memory allocations are served from the standard allocator
Issue 1510 describes a scenario in which, under load, we allocate memory within
release_requests() leading to a reentry into an invalid state in our
blocked requests' shared_promise.

This is not easy to trigger since not all allocations will actually get to the
point in which they need a new segment, let alone have that happening during
another allocator call.

Having those kinds of reentry is something we have always sought to avoid with
release_requests(): this is the reason why most of the actual routine is
deferred after a call to later().

However, that is a trick we cannot use for updating the state of the blocked
requests' shared_promise: we can't guarantee when is that going to run, and we
always need a valid shared_promise, in a valid state, waiting for new requests
to hook into.

The solution employed by this patch is to make sure that no allocation
operations whatsoever happen during the initial part of release_requests on
behalf of the shared promise.  Allocation is now deferred to first use, which
relieves release_requests() from all allocation duties. All it needs to do is
free the old object and signal to its user that an allocation is needed (by
storing {} into the shared_promise).

Fixes #1510

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <49771e51426f972ddbd4f3eeea3cdeef9cc3b3c6.1470238168.git.glauber@scylladb.com>
(cherry picked from commit ad58691afb)
2016-08-04 11:17:49 +02:00
Avi Kivity
fa81385469 conf: synchronize internode_compression between scylla.yaml and code
Our default is "none", to give reasonable performance, so have scylla.yaml
reflect that.

(cherry picked from commit 9df4ac53e5)
2016-08-04 12:10:03 +03:00
Duarte Nunes
93981aaa93 schema_builder: Ensure dense tables have compact col
This patch ensures that when the schema is dense, regardless of
compact_storage being set, the single regular column is translated
into a compact column.

This fixes an issue where Thrift dynamic column families are
translated to a dense schema with a regular column, instead of a
compact one.

Since a compact column is also a regular column (e.g., for purposes of
querying), no further changes are required.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1470062410-1414-1-git-send-email-duarte@scylladb.com>
(cherry picked from commit 5995aebf39)

Fixes #1535.
2016-08-03 13:49:51 +02:00
Duarte Nunes
89b40f54db schema: Dense schemas are correctly upgraded
When upgrading a dense schema, we would drop the cells of the regular
(compact) column. This patch fixes this by making the regular and
compact column kinds compatible.

Fixes #1536

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1470172097-7719-1-git-send-email-duarte@scylladb.com>
2016-08-03 13:37:57 +02:00
Paweł Dziepak
99dfbedf36 sstables: extend sstable life until reader is fully closed
data_consume_rows_context needs to have close() called and the returned
future waited for before it can be destroyed. data_consume_context::impl
does that in the background upon its destruction.

However, it is possible that the sstable is removed before
data_consume_rows_context::close() completes in which case EBADF may
happen. The solution is to make data_consume_context::impl keep a
reference to the sstable and extend its life time until closing of
data_consume_rows_context (which is performed in the background)
completes.

Side effect of this change is also that data_consume_context no longer
requires its user to make sure that the sstable exists as long as it is
in use since it owns its own reference to it.

Fixes #1537.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1470222225-19948-1-git-send-email-pdziepak@scylladb.com>
(cherry picked from commit 02ffc28f0d)
2016-08-03 13:19:50 +02:00
Paweł Dziepak
e95f4eaee4 Merge "partition_limit: Don't count dead partitions" from Duarte
"This patch series ensures we don't count dead partitions (i.e.,
partitions with no live rows) towards the partition_limit. We also
enforce the partition limit at the storage_proxy level, so that
limits with smp > 1 work correctly."

(cherry picked from commit 5f11a727c9)
2016-08-03 12:44:32 +03:00
Avi Kivity
2570da2006 Update seastar submodule
* seastar f603f88...0b53ab2 (2):
  > reactor: limit task backlog
  > reactor: make sure a poll cycle always happens when later is called

Fix runaway task queue growth on cpu-bound loads.
2016-08-03 12:33:54 +03:00
Tomasz Grabiec
b224ff6ede Merge 'pdziepak/row-cache-wide-entries/v4' from seastar-dev.git
This series adds the ability for the partition cache to keep information on
whether a partition's size makes it uncacheable. During reads, these
entries save us IO operations, since we already know that the partition
is too big to be put in the cache.

The first part of the patchset makes all mutation_readers allow the
streamed_mutations they produce to outlive them, which is a guarantee
used later by the code handling reading large partitions.

(cherry picked from commit d2ed75c9ff)
2016-08-02 20:24:29 +02:00
Piotr Jastrzebski
6960fce9b2 Use continuity flag correctly with concurrent invalidations
Between reading a cache entry and actually using it, invalidations can
happen, so we have to check that no flag was cleared; if one was, we
need to read the entry again.

Fixes #1464.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <7856b0ded45e42774ccd6f402b5ee42175bd73cf.1469701026.git.piotr@scylladb.com>
(cherry picked from commit fdfd1af694)
2016-08-02 20:24:22 +02:00
Avi Kivity
a556265ccd checked_file: preserve DMA alignment
Inherit the alignment parameters from the underlying file instead of
defaulting to 4096.  This gives better read performance on disks with 512-byte
sectors.

Fixes #1532.
Message-Id: <1470122188-25548-1-git-send-email-avi@scylladb.com>

(cherry picked from commit 9f35e4d328)
2016-08-02 12:22:37 +03:00
Duarte Nunes
8243d3d1e0 storage_service: Fix get_range_to_address_map_in_local_dc
This patch fixes a couple of bugs in
get_range_to_address_map_in_local_dc.

Fixes #1517

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1469782666-21320-1-git-send-email-duarte@scylladb.com>
(cherry picked from commit 7d1b7e8da3)
2016-07-29 11:24:12 +02:00
Pekka Enberg
2665bfdc93 Update seastar submodule
* seastar 103543a...f603f88 (1):
  > iotune: Fix SIGFPE with some executions
2016-07-29 11:11:56 +03:00
Tomasz Grabiec
3a1e8fffde Merge branch 'sstables/static-1.3/v1' from git@github.com:duarten/scylla.git into branch-1.3
The current code assumes cell names are always compound and may
wrongly report a non-static row as such. This patch addresses this
and adds a test case to catch regressions.

Backports the fix to #1495.
2016-07-28 15:07:41 +02:00
Gleb Natapov
23c340bed8 api: fix use after free in sum_sstable
get_sstables_including_compacted_undeleted() may return temporary shared
ptr which will be destroyed before the loop if not stored locally.

Fixes #1514

Message-Id: <20160728100504.GD2502@scylladb.com>
(cherry picked from commit 3531dd8d71)
2016-07-28 14:28:25 +03:00
Duarte Nunes
ff8a795021 sstables: Validate static cell is on static column
This patch enforces compatibility between a cell and the
corresponding column definition with regards to them being
static.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-28 12:11:46 +02:00
Duarte Nunes
d11b0cac3b sstable_mutation_test: Test non-compound cell name
This patch adds a test case for reading non-compound cell names,
validating that such a cell is not incorrectly marked as static.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1469616205-4550-5-git-send-email-duarte@scylladb.com>
2016-07-28 12:11:37 +02:00
Duarte Nunes
5ad0448cc9 sstables: Don't assume cell name is compound
The current code assumes cell names are always compound and may
wrongly report a non-static row as such, since it looks at the first
bytes of the name assuming they are the component's length.

Tables with compact storage (which cannot contain static rows) may not
have a compound comparator, so we check for the table's compoundness
before checking for the static marker. We do this by delegating to
composite_view::is_static.

Fixes #1495

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1469616205-4550-4-git-send-email-duarte@scylladb.com>
2016-07-28 12:11:27 +02:00
Duarte Nunes
35ab2cadc2 sstables: Remove duplication in extract_clustering_key
This patch removes some duplicated code in extract_clustering_key(),
which is already handled in composite_view.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1469397806-8067-1-git-send-email-duarte@scylladb.com>
2016-07-28 12:11:22 +02:00
Duarte Nunes
a1cee9f97c sstables: Remove superfluous call to check_static()
When building a column we're calling check_static() twice;
refactor things a bit so that this doesn't happen and we reuse the
previous calculation.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1469397748-7987-1-git-send-email-duarte@scylladb.com>
2016-07-28 12:11:15 +02:00
Duarte Nunes
0ae7347d8e composite: Use operator[] instead of at()
Since we already do bounds checking in is_static(), we can use
bytes_view::operator[] instead of bytes_view::at() to avoid repeating
the bounds checking.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1469616205-4550-3-git-send-email-duarte@scylladb.com>
2016-07-28 12:10:14 +02:00
Duarte Nunes
b04168c015 composite_view: Fix is_static
composite_view's is_static function is wrong because:

1) It doesn't guard against the composite being a compound;
2) It doesn't deal with widening due to integral promotions and
   consequent sign extension.

This patch fixes this by ensuring there's only one correct
implementation of is_static, to avoid code duplication and
enforce test coverage.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1469616205-4550-2-git-send-email-duarte@scylladb.com>
2016-07-28 12:10:06 +02:00
Duarte Nunes
4e13853cbc compound_compat: Only compound values can be static
If a composite is not a compound, then it doesn't carry a length
prefix where static information is encoded. In its absence, a
non-compound composite can never be static.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1469397561-7748-1-git-send-email-duarte@scylladb.com>
2016-07-28 12:09:59 +02:00
Pekka Enberg
503f6c6755 release: prepare for 1.3.rc2 2016-07-28 10:57:11 +03:00
Tomasz Grabiec
7d73599acd tests: lsa_async_eviction_test: Use chunked_fifo<>
To protect against large reallocations during push() which are done
under reclaim lock and may fail.
2016-07-28 09:43:51 +02:00
Piotr Jastrzebski
bf27379583 Add tests for wide partition handling in cache.
They shouldn't be cached.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
(cherry picked from commit 7d29cdf81f)
2016-07-27 14:09:45 +03:00
Piotr Jastrzebski
02cf5a517a Add collectd counter for uncached wide partitions.
Keep track of every read of a wide partition that's
not cached.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
(cherry picked from commit 37a7d49676)
2016-07-27 14:09:40 +03:00
Piotr Jastrzebski
ec3d59bf13 Add flag to configure
max size of a cached partition.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
(cherry picked from commit 636a4acfd0)
2016-07-27 14:09:34 +03:00
Piotr Jastrzebski
30c72ef3b4 Try to read whole streamed_mutation up to limit
If the limit is exceeded, return the streamed_mutation
and don't cache it.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
(cherry picked from commit 98c12dc2e2)
2016-07-27 14:09:29 +03:00
Piotr Jastrzebski
15e69a32ba Implement mutation_from_streamed_mutation_with_limit
If a mutation is bigger than this limit,
it won't be read and mutation_from_streamed_mutation
will return an empty optional.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
(cherry picked from commit 0d39bb1ad0)
2016-07-27 14:09:23 +03:00
Paweł Dziepak
4e43cb84ff tests/sstables: test reading sstable with duplicated range tombstones
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
(cherry picked from commit b405ff8ad2)
2016-07-27 14:09:02 +03:00
Paweł Dziepak
07d5e939be sstables: avoid recursion in sstable_streamed_mutation::read_next()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
(cherry picked from commit 04f2c278c2)
2016-07-27 14:06:03 +03:00
Paweł Dziepak
a2a5a22504 sstables: protect against duplicated range tombstones
The promoted index may cause an sstable to have range tombstones duplicated
several times. These duplicates appear in the "wrong" place, since they
are smaller than the entity preceding them.

This patch ignores such duplicates by skipping range tombstones that are
smaller than previously read ones.

Moreover, these duplicated range tombstones may appear in the middle of a
clustering row, so the sstable reader has also gained the ability to
merge parts of the row in such cases.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
(cherry picked from commit 08032db269)
2016-07-27 14:05:58 +03:00
Paweł Dziepak
a39bec0e24 tests: extract streamed_mutation assertions
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
(cherry picked from commit 50469e5ef3)
2016-07-27 14:05:43 +03:00
Duarte Nunes
f0af5719d5 thrift: Preserve partition order when accumulating
This patch changes the column_visitor so that it preserves the order
of the partitions it visits when building the accumulation result.

This is required by verbs such as get_range_slice, on top of which
users can implement paging. In such cases, the last key returned by
the query will be that start of the range for the next query. If
that key is not actually the last in the partitioner's order, then
the new request will likely result in duplicate values being sent.

Ref #693

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1469568135-19644-1-git-send-email-duarte@scylladb.com>
(cherry picked from commit 5aaf43d1bc)
2016-07-27 12:11:41 +03:00
Avi Kivity
0523000af5 size_estimates_recorder: unwrap ranges before searching for sstables
column_family::select_sstables() requires unwrapped ranges, so unwrap
them.  Fixes crash with Leveled Compaction Strategy.

Fixes #1507.

Reviewed-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1469563488-14869-1-git-send-email-avi@scylladb.com>
(cherry picked from commit 64d0cf58ea)
2016-07-27 10:07:13 +03:00
Paweł Dziepak
69a0e6e002 sstables: fix skipping partitions with no rows
If a partition contains no static or clustering rows and no range
tombstones, mp_row_consumer will return a disengaged mutation_fragment_opt
with the is_mutation_end flag set to mark the end of this partition.

Currently, the mutation_reader::impl code incorrectly recognizes a
disengaged mutation fragment as the end of the stream of all mutations.
This patch fixes that by using the is_mutation_end flag to determine
whether the end of a partition or the end of the stream was reached.

Fixes #1503.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1469525449-15525-1-git-send-email-pdziepak@scylladb.com>
(cherry picked from commit efa690ce8c)
2016-07-26 13:10:31 +03:00
Amos Kong
58d4de295c scylla-housekeeping: fix typo of script path
I tried to start scylla-housekeeping service by:
 # sudo systemctl restart scylla-housekeeping.service

But it failed due to a wrong script path; error detail:
 systemd[5605]: Failed at step EXEC spawning
 /usr/lib/scylla/scylla-Housekeeping: No such file or directory

The right script name is 'scylla-housekeeping'

Signed-off-by: Amos Kong <amos@scylladb.com>
Message-Id: <c11319a3c7d3f22f613f5f6708699be0aa6bd740.1469506477.git.amos@scylladb.com>
(cherry picked from commit 64530e9686)
2016-07-26 09:19:15 +03:00
Vlad Zolotarov
026061733f tracing: set a default TTL for system_traces tables when they are created
Fixes #1482

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1469104164-4452-1-git-send-email-vladz@cloudius-systems.com>
(cherry picked from commit 4647ad9d8a)
2016-07-25 13:50:43 +03:00
Vlad Zolotarov
1d7ed190f8 SELECT tracing instrumentation: improve inter-nodes communication stages messages
Add/fix "sending to"/"received from" messages.

With this patch the single key select trace with a data on an external node
looks as follows:

Tracing session: 65dbfcc0-4f51-11e6-8dd2-000000000001

 activity                                                                                                                        | timestamp                  | source    | source_elapsed
---------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------
                                                                                                              Execute CQL3 query | 2016-07-21 17:42:50.124000 | 127.0.0.2 |              0
                                                                                                   Parsing a statement [shard 1] | 2016-07-21 17:42:50.124127 | 127.0.0.2 |             --
                                                                                                Processing a statement [shard 1] | 2016-07-21 17:42:50.124190 | 127.0.0.2 |             64
 Creating read executor for token 2309717968349690594 with all: {127.0.0.1} targets: {127.0.0.1} repair decision: NONE [shard 1] | 2016-07-21 17:42:50.124229 | 127.0.0.2 |            103
                                                                            read_data: sending a message to /127.0.0.1 [shard 1] | 2016-07-21 17:42:50.124234 | 127.0.0.2 |            108
                                                                           read_data: message received from /127.0.0.2 [shard 1] | 2016-07-21 17:42:50.124358 | 127.0.0.1 |             14
                                                          read_data handling is done, sending a response to /127.0.0.2 [shard 1] | 2016-07-21 17:42:50.124434 | 127.0.0.1 |             89
                                                                               read_data: got response from /127.0.0.1 [shard 1] | 2016-07-21 17:42:50.124662 | 127.0.0.2 |            536
                                                                                  Done processing - preparing a result [shard 1] | 2016-07-21 17:42:50.124695 | 127.0.0.2 |            569
                                                                                                                Request complete | 2016-07-21 17:42:50.124580 | 127.0.0.2 |            580

Fixes #1481

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1469112271-22818-1-git-send-email-vladz@cloudius-systems.com>
(cherry picked from commit 57b58cad8e)
2016-07-25 13:50:39 +03:00
Raphael S. Carvalho
2d66a4621a compaction: do not convert timestamp resolution to uppercase
C* only allows timestamp resolution in uppercase, so we shouldn't
be forgiving about it, otherwise migration to C* will not work.
Note that timestamp resolution is stored in the schema's compaction
strategy options.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <d64878fc9bbcf40fd8de3d0f08cce9f6c2fde717.1469133851.git.raphaelsc@scylladb.com>
(cherry picked from commit c4f34f5038)
2016-07-25 13:47:23 +03:00
Duarte Nunes
aaa9b5ace8 system_keyspace: Add query_size_estimates() function
The query_size_estimates() function queries the size_estimates system
table for a given keyspace and table, filtering out the token ranges
according to the specified tokens.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
(cherry picked from commit ecfa04da77)
2016-07-25 13:43:16 +03:00
Duarte Nunes
8d491e9879 size_estimates_recorder: Fix stop()
This patch fixes stop() by checking the current CPU instead of
whether the service is active (which it won't be at the time stop() is
called).

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
(cherry picked from commit d984cc30bf)
2016-07-25 13:43:08 +03:00
Duarte Nunes
b63c9fb84b system_keyspace: Avoid pointers in range_estimates
This patch makes range_estimates a proper struct, where tokens are
represented as dht::tokens rather than dht::ring_position*.

We also pass other arguments to update_ and clear_size_estimates by
copy, since one will already be required.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
(cherry picked from commit e16f3f2969)
2016-07-25 13:42:53 +03:00
Duarte Nunes
b229f03198 thrift: Fail when creating mixed CF
This patch ensures we fail when creating a mixed column family, either
when adding columns to a dynamic CF through updated_column_family() or
when adding a dynamic column upon insertion.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1469378658-19853-1-git-send-email-duarte@scylladb.com>
(cherry picked from commit 5c4a2044d5)
2016-07-25 13:42:05 +03:00
Duarte Nunes
6caa59560b thrift: Correctly translate no_such_column_family
The no_such_column_family exception is translated to
InvalidRequestException instead of to NotFoundException.

8991d35231 exposed this problem.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1469376674-14603-1-git-send-email-duarte@scylladb.com>
(cherry picked from commit 560cc12fd7)
2016-07-25 13:41:58 +03:00
Duarte Nunes
79196af9fb thrift: Implement describe_splits verb
This patch implements the describe_splits verb on top of
describe_splits_ex.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
(cherry picked from commit ab08561b89)
2016-07-25 13:41:54 +03:00
Duarte Nunes
afe09da858 thrift: Implement describe_splits_ex verb
This patch implements the describe_splits_ex verb by querying the
size_estimates system table for all the estimates in the specified
token range.

If the keys_per_split argument is bigger than the
estimated partition count, we merge ranges until keys_per_split
is met. Note that since the tokens can't be split any further,
keys_per_split might be less than the reported number of keys in one
or more ranges.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
(cherry picked from commit 472c23d7d2)
2016-07-25 13:41:46 +03:00
Duarte Nunes
d6cb41ff24 thrift: Handle and convert invalid_request_exception
This patch converts an exceptions::invalid_request_exception
into a Thrift InvalidRequestException instead of into a generic one.

This makes TitanDB work correctly, which expects an
InvalidRequestException when setting a non-existent keyspace.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1469362086-1013-1-git-send-email-duarte@scylladb.com>
(cherry picked from commit 2be45c4806)
2016-07-24 16:46:18 +03:00
Duarte Nunes
6bf77c7b49 thrift: Use database::find_schema directly
This patch changes lookup_schema() so it directly calls
database::find_schema() instead of going through
database::find_column_family(). It also drops conversion of the
no_such_column_family exception, as that is already handled at a higher
layer.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
(cherry picked from commit 8991d35231)
2016-07-24 16:46:05 +03:00
Duarte Nunes
6d34b4dab7 thrift: Remove hardcoded version constant
...and use the one in thrift_server.hh instead.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
(cherry picked from commit 038d42c589)
2016-07-24 16:45:46 +03:00
Duarte Nunes
d367f1e9ab thrift: Remove unused with_cob_dereference function
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
(cherry picked from commit 8bb43d09b1)
2016-07-24 16:45:22 +03:00
Avi Kivity
75a36ae453 bloom_filter: fix overflow for large filters
We use ::abs(), which has an int parameter, on long arguments, resulting
in incorrect results.

Switch to std::abs() instead, which has the correct overloads.

Fixes #1494.

Message-Id: <1469347802-28933-1-git-send-email-avi@scylladb.com>
(cherry picked from commit 900639915d)
2016-07-24 11:32:28 +03:00
Tomasz Grabiec
35c1781913 schema_tables: Fix hang during keyspace drop
Fixes #1484.

We drop tables as part of keyspace drop. Table drop starts with
creating a snapshot on all shards. All shards must use the same
snapshot timestamp which, among other things, is part of the snapshot
name. The timestamp is generated using supplied timestamp generating
function (joinpoint object). The joinpoint object will wait for all
shards to arrive and then generate and return the timestamp.

However, we drop tables in parallel, using the same joinpoint
instance. So joinpoint may be contacted by snapshotting shards of
tables A and B concurrently, generating timestamp t1 for some shards
of table A and some shards of table B. Later the remaining shards of
table A will get a different timestamp. As a result, different shards
may use different snapshot names for the same table. The snapshot
creation will never complete because the sealing fiber waits for all
shards to signal it, on the same name.

The fix is to give each table a separate joinpoint instance.

Message-Id: <1469117228-17879-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 5e8f0efc85)
2016-07-22 15:36:45 +02:00
Vlad Zolotarov
1489b28ffd cql_server::connection::process_prepare(): don't std::move() a shared_ptr captured by reference in value_of() lambda
A seastar::value_of() lambda used in a trace point was doing the unthinkable:
it called std::move() on a value captured by reference. Not only did it compile(!!!),
it also actually std::move()d the shared_ptr before it was used in make_result(),
which naturally caused a SIGSEGV crash.

Fixes #1491

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1469193763-27631-1-git-send-email-vladz@cloudius-systems.com>
(cherry picked from commit 9423c13419)
2016-07-22 16:33:17 +03:00
Avi Kivity
f975653c94 Update seastar submodule to point at scylla-seastar 2016-07-21 12:31:09 +03:00
Duarte Nunes
96f5cbb604 thrift: Omit regular columns for dynamic CFs
This patch skips adding the auto-generated regular column when
describing a dynamic Column family for the describe_keyspace(s) verbs.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1469091720-10113-1-git-send-email-duarte@scylladb.com>
(cherry picked from commit a436cf945c)
2016-07-21 12:06:29 +03:00
Raphael S. Carvalho
66ebef7d10 tests: add new test for date tiered strategy
This test sets the time window to 1 hour and checks that the strategy
works accordingly.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit cf54af9e58)
2016-07-21 12:00:26 +03:00
Raphael S. Carvalho
789fb0db97 compaction: implement date tiered compaction strategy options
Now date tiered compaction strategy will take into account the
strategy options which are defined in the schema.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit eaa6e281a2)
2016-07-21 12:00:18 +03:00
Pekka Enberg
af7c0f6433 Revert "Merge seastar upstream"
This reverts commit aaf6786997.

We should backport the iotune fixes for 1.3 and not pull everything.
2016-07-21 11:19:50 +03:00
Pekka Enberg
aaf6786997 Merge seastar upstream
* seastar 103543a...9d1db3f (8):
  > reactor: limit task backlog
  > iotune: Fix SIGFPE with some executions
  > Merge "Preparation for protobuf" from Amnon
  > byteorder: add missing cpu_to_be(), be_to_cpu() functions
  > rpc: fix gcc-7 compilation error
  > reactor: Register the smp metrics disabled
  > scollectd: Allow creating metric that is disabled
  > Merge "Propagate timeout to a server" from Gleb
2016-07-21 11:04:31 +03:00
Pekka Enberg
e8cb163cdf db/config: Start Thrift server by default
We have Thrift support now so start the server by default.
Message-Id: <1469002000-26767-1-git-send-email-penberg@scylladb.com>

(cherry picked from commit aff8cf319d)
2016-07-20 11:29:24 +03:00
Duarte Nunes
2d7c322805 thrift: Actually concatenate strings
This patch fixes concatenating a char[] with an int by using sprint
instead of just increasing the pointer.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1468971542-9600-1-git-send-email-duarte@scylladb.com>
(cherry picked from commit 64dff69077)
2016-07-20 11:09:15 +03:00
Tomasz Grabiec
13f18c6445 database: Add table name to log message about sealing
Message-Id: <1468917744-2539-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 0d26294fac)
2016-07-20 10:13:32 +03:00
Tomasz Grabiec
9c430c2cff schema_tables: Add more logging
Message-Id: <1468917771-2592-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit a0832f08d2)
2016-07-20 10:13:28 +03:00
Pekka Enberg
c84e030fe9 release: prepare for 1.3.rc1 2016-07-19 20:15:26 +03:00
192 changed files with 2452 additions and 6443 deletions

View File

@@ -8,14 +8,14 @@ In addition to required packages by Seastar, the following packages are required
Scylla uses submodules, so make sure you pull the submodules first by doing:
```
git submodule init
-git submodule update --init --recursive
+git submodule update --recursive
```
### Building and Running Scylla on Fedora
* Installing required packages:
```
-sudo dnf install yaml-cpp-devel lz4-devel zlib-devel snappy-devel jsoncpp-devel thrift-devel antlr3-tool antlr3-C++-devel libasan libubsan gcc-c++ gnutls-devel ninja-build ragel libaio-devel cryptopp-devel xfsprogs-devel numactl-devel hwloc-devel libpciaccess-devel libxml2-devel python3-pyparsing lksctp-tools-devel protobuf-devel protobuf-compiler systemd-devel libunwind-devel
+sudo yum install yaml-cpp-devel lz4-devel zlib-devel snappy-devel jsoncpp-devel thrift-devel antlr3-tool antlr3-C++-devel libasan libubsan gcc-c++ gnutls-devel ninja-build ragel libaio-devel cryptopp-devel xfsprogs-devel numactl-devel hwloc-devel libpciaccess-devel libxml2-devel python3-pyparsing lksctp-tools-devel
```
* Build Scylla

View File

@@ -1,6 +1,6 @@
#!/bin/sh
-VERSION=1.4.3
+VERSION=1.3.5
if test -f version
then

View File

@@ -777,7 +777,7 @@
]
},
{
-"path": "/storage_proxy/metrics/read/moving_average_histogram",
+"path": "/storage_proxy/metrics/read/moving_avrage_histogram",
"operations": [
{
"method": "GET",
@@ -792,7 +792,7 @@
]
},
{
-"path": "/storage_proxy/metrics/range/moving_average_histogram",
+"path": "/storage_proxy/metrics/range/moving_avrage_histogram",
"operations": [
{
"method": "GET",
@@ -942,7 +942,7 @@
]
},
{
-"path": "/storage_proxy/metrics/write/moving_average_histogram",
+"path": "/storage_proxy/metrics/write/moving_avrage_histogram",
"operations": [
{
"method": "GET",

View File

@@ -1736,57 +1736,6 @@
}
]
},
-{
-"path":"/storage_service/slow_query",
-"operations":[
-{
-"method":"POST",
-"summary":"Set slow query parameter",
-"type":"void",
-"nickname":"set_slow_query",
-"produces":[
-"application/json"
-],
-"parameters":[
-{
-"name":"enable",
-"description":"set it to true to enable, anything else to disable",
-"required":false,
-"allowMultiple":false,
-"type":"boolean",
-"paramType":"query"
-},
-{
-"name":"ttl",
-"description":"TTL in seconds",
-"required":false,
-"allowMultiple":false,
-"type":"long",
-"paramType":"query"
-},
-{
-"name":"threshold",
-"description":"Slow query record threshold in microseconds",
-"required":false,
-"allowMultiple":false,
-"type":"long",
-"paramType":"query"
-}
-]
-},
-{
-"method":"GET",
-"summary":"Returns the slow query record configuration.",
-"type":"slow_query_info",
-"nickname":"get_slow_query_info",
-"produces":[
-"application/json"
-],
-"parameters":[
-]
-}
-]
-},
{
"path":"/storage_service/auto_compaction/{keyspace}",
"operations":[
@@ -2184,24 +2133,6 @@
}
}
},
-"slow_query_info": {
-"id":"slow_query_info",
-"description":"Slow query triggering information",
-"properties":{
-"enable":{
-"type":"boolean",
-"description":"Is slow query logging enable or disable"
-},
-"ttl":{
-"type":"long",
-"description":"The slow query TTL in seconds"
-},
-"threshold":{
-"type":"long",
-"description":"The slow query logging threshold in microseconds. Queries that takes longer, will be logged"
-}
-}
-},
"endpoint_detail":{
"id":"endpoint_detail",
"description":"Endpoint detail",

View File

@@ -116,7 +116,6 @@ inline
httpd::utils_json::histogram to_json(const utils::ihistogram& val) {
httpd::utils_json::histogram h;
h = val;
-h.sum = val.estimated_sum();
return h;
}
@@ -130,7 +129,7 @@ httpd::utils_json::rate_moving_average meter_to_json(const utils::rate_moving_av
inline
httpd::utils_json::rate_moving_average_and_histogram timer_to_json(const utils::rate_moving_average_and_histogram& val) {
httpd::utils_json::rate_moving_average_and_histogram h;
-h.hist = to_json(val.hist);
+h.hist = val.hist;
h.meter = meter_to_json(val.rate);
return h;
}

View File

@@ -24,7 +24,7 @@
#include <vector>
#include "http/exception.hh"
#include "sstables/sstables.hh"
-#include "utils/estimated_histogram.hh"
+#include "sstables/estimated_histogram.hh"
#include <algorithm>
namespace api {
@@ -403,14 +403,14 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_estimated_row_size_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
-return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](column_family& cf) {
-utils::estimated_histogram res(0);
+return map_reduce_cf(ctx, req->param["name"], sstables::estimated_histogram(0), [](column_family& cf) {
+sstables::estimated_histogram res(0);
for (auto i: *cf.get_sstables() ) {
res.merge(i->get_stats_metadata().estimated_row_size);
}
return res;
},
-utils::estimated_histogram_merge, utils_json::estimated_histogram());
+sstables::merge, utils_json::estimated_histogram());
});
cf::get_estimated_row_count.set(r, [&ctx] (std::unique_ptr<request> req) {
@@ -425,14 +425,14 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_estimated_column_count_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
-return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](column_family& cf) {
-utils::estimated_histogram res(0);
+return map_reduce_cf(ctx, req->param["name"], sstables::estimated_histogram(0), [](column_family& cf) {
+sstables::estimated_histogram res(0);
for (auto i: *cf.get_sstables() ) {
res.merge(i->get_stats_metadata().estimated_column_count);
}
return res;
},
-utils::estimated_histogram_merge, utils_json::estimated_histogram());
+sstables::merge, utils_json::estimated_histogram());
});
cf::get_all_compression_ratio.set(r, [] (std::unique_ptr<request> req) {
@@ -807,10 +807,10 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_sstables_per_read_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
-return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](column_family& cf) {
+return map_reduce_cf(ctx, req->param["name"], sstables::estimated_histogram(0), [](column_family& cf) {
return cf.get_stats().estimated_sstable_per_read;
},
-utils::estimated_histogram_merge, utils_json::estimated_histogram());
+sstables::merge, utils_json::estimated_histogram());
});
cf::get_tombstone_scanned_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
@@ -869,17 +869,17 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_read_latency_estimated_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
-return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](column_family& cf) {
+return map_reduce_cf(ctx, req->param["name"], sstables::estimated_histogram(0), [](column_family& cf) {
return cf.get_stats().estimated_read;
},
-utils::estimated_histogram_merge, utils_json::estimated_histogram());
+sstables::merge, utils_json::estimated_histogram());
});
cf::get_write_latency_estimated_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
-return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](column_family& cf) {
+return map_reduce_cf(ctx, req->param["name"], sstables::estimated_histogram(0), [](column_family& cf) {
return cf.get_stats().estimated_write;
},
-utils::estimated_histogram_merge, utils_json::estimated_histogram());
+sstables::merge, utils_json::estimated_histogram());
});
cf::set_compaction_strategy_class.set(r, [&ctx](std::unique_ptr<request> req) {


@@ -52,9 +52,9 @@ static future<json::json_return_type> sum_timed_rate_as_long(distributed<proxy>
});
}
-static future<json::json_return_type> sum_estimated_histogram(http_context& ctx, utils::estimated_histogram proxy::stats::*f) {
-return ctx.sp.map_reduce0([f](const proxy& p) {return p.get_stats().*f;}, utils::estimated_histogram(),
-utils::estimated_histogram_merge).then([](const utils::estimated_histogram& val) {
+static future<json::json_return_type> sum_estimated_histogram(http_context& ctx, sstables::estimated_histogram proxy::stats::*f) {
+return ctx.sp.map_reduce0([f](const proxy& p) {return p.get_stats().*f;}, sstables::estimated_histogram(),
+sstables::merge).then([](const sstables::estimated_histogram& val) {
utils_json::estimated_histogram res;
res = val;
return make_ready_future<json::json_return_type>(res);


@@ -681,37 +681,6 @@ void set_storage_service(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(tracing::tracing::get_local_tracing_instance().get_trace_probability());
});
-ss::get_slow_query_info.set(r, [](const_req req) {
-ss::slow_query_info res;
-res.enable = tracing::tracing::get_local_tracing_instance().slow_query_tracing_enabled();
-res.ttl = tracing::tracing::get_local_tracing_instance().slow_query_record_ttl().count() ;
-res.threshold = tracing::tracing::get_local_tracing_instance().slow_query_threshold().count();
-return res;
-});
-ss::set_slow_query.set(r, [](std::unique_ptr<request> req) {
-auto enable = req->get_query_param("enable");
-auto ttl = req->get_query_param("ttl");
-auto threshold = req->get_query_param("threshold");
-try {
-return tracing::tracing::tracing_instance().invoke_on_all([enable, ttl, threshold] (auto& local_tracing) {
-if (threshold != "") {
-local_tracing.set_slow_query_threshold(std::chrono::microseconds(std::stol(threshold.c_str())));
-}
-if (ttl != "") {
-local_tracing.set_slow_query_record_ttl(std::chrono::seconds(std::stol(ttl.c_str())));
-}
-if (enable != "") {
-local_tracing.set_slow_query_enabled(strcasecmp(enable.c_str(), "true") == 0);
-}
-}).then([] {
-return make_ready_future<json::json_return_type>(json_void());
-});
-} catch (...) {
-throw httpd::bad_param_exception(sprint("Bad format value: "));
-}
-});
ss::enable_auto_compaction.set(r, [&ctx](std::unique_ptr<request> req) {
//TBD
unimplemented();


@@ -236,19 +236,11 @@ public:
static atomic_cell make_live(api::timestamp_type timestamp, bytes_view value) {
return atomic_cell_type::make_live(timestamp, value);
}
-static atomic_cell make_live(api::timestamp_type timestamp, const bytes& value) {
-return make_live(timestamp, bytes_view(value));
-}
static atomic_cell make_live(api::timestamp_type timestamp, bytes_view value,
gc_clock::time_point expiry, gc_clock::duration ttl)
{
return atomic_cell_type::make_live(timestamp, value, expiry, ttl);
}
-static atomic_cell make_live(api::timestamp_type timestamp, const bytes& value,
-gc_clock::time_point expiry, gc_clock::duration ttl)
-{
-return make_live(timestamp, bytes_view(value), expiry, ttl);
-}
static atomic_cell make_live(api::timestamp_type timestamp, bytes_view value, ttl_opt ttl) {
if (!ttl) {
return atomic_cell_type::make_live(timestamp, value);


@@ -38,7 +38,6 @@ class bytes_ostream {
public:
using size_type = bytes::size_type;
using value_type = bytes::value_type;
-static constexpr size_type max_chunk_size = 16 * 1024;
private:
static_assert(sizeof(value_type) == 1, "value_type is assumed to be one byte long");
struct chunk {
@@ -154,18 +153,19 @@ public:
}
bytes_ostream& operator=(const bytes_ostream& o) {
-if (this != &o) {
-auto x = bytes_ostream(o);
-*this = std::move(x);
-}
+_size = 0;
+_current = nullptr;
+_begin = {};
+append(o);
return *this;
}
bytes_ostream& operator=(bytes_ostream&& o) noexcept {
-if (this != &o) {
-this->~bytes_ostream();
-new (this) bytes_ostream(std::move(o));
-}
+_size = o._size;
+_begin = std::move(o._begin);
+_current = o._current;
+o._current = nullptr;
+o._size = 0;
return *this;
}
@@ -174,7 +174,7 @@ public:
value_type* ptr;
// makes the place_holder look like a stream
seastar::simple_output_stream get_stream() {
-return seastar::simple_output_stream(reinterpret_cast<char*>(ptr), sizeof(T));
+return seastar::simple_output_stream{reinterpret_cast<char*>(ptr)};
}
};
@@ -195,19 +195,19 @@ public:
if (v.empty()) {
return;
}
-auto this_size = std::min(v.size(), size_t(current_space_left()));
-if (this_size) {
-memcpy(_current->data + _current->offset, v.begin(), this_size);
-_current->offset += this_size;
-_size += this_size;
-v.remove_prefix(this_size);
-}
-while (!v.empty()) {
-auto this_size = std::min(v.size(), size_t(max_chunk_size));
-std::copy_n(v.begin(), this_size, alloc(this_size));
-v.remove_prefix(this_size);
-}
+auto space_left = current_space_left();
+if (v.size() <= space_left) {
+memcpy(_current->data + _current->offset, v.begin(), v.size());
+_current->offset += v.size();
+_size += v.size();
+} else {
+if (space_left) {
+memcpy(_current->data + _current->offset, v.begin(), space_left);
+_current->offset += space_left;
+_size += space_left;
+v.remove_prefix(space_left);
+}
+memcpy(alloc(v.size()), v.begin(), v.size());
+}
}
@@ -272,8 +272,13 @@ public:
}
void append(const bytes_ostream& o) {
-for (auto&& bv : o.fragments()) {
-write(bv);
-}
+if (o.size() > 0) {
+auto dst = alloc(o.size());
+auto r = o._begin.get();
+while (r) {
+dst = std::copy_n(r->data, r->offset, dst);
+r = r->next.get();
+}
+}
}
@@ -323,45 +328,6 @@ public:
_current->next = nullptr;
_current->offset = pos._offset;
}
-void reduce_chunk_count() {
-// FIXME: This is a simplified version. It linearizes the whole buffer
-// if its size is below max_chunk_size. We probably could also gain
-// some read performance by doing "real" reduction, i.e. merging
-// all chunks until all but the last one is max_chunk_size.
-if (size() < max_chunk_size) {
-linearize();
-}
-}
-bool operator==(const bytes_ostream& other) const {
-auto as = fragments().begin();
-auto as_end = fragments().end();
-auto bs = other.fragments().begin();
-auto bs_end = other.fragments().end();
-auto a = *as++;
-auto b = *bs++;
-while (!a.empty() || !b.empty()) {
-auto now = std::min(a.size(), b.size());
-if (!std::equal(a.begin(), a.begin() + now, b.begin(), b.begin() + now)) {
-return false;
-}
-a.remove_prefix(now);
-if (a.empty() && as != as_end) {
-a = *as++;
-}
-b.remove_prefix(now);
-if (b.empty() && bs != bs_end) {
-b = *bs++;
-}
-}
-return true;
-}
-bool operator!=(const bytes_ostream& other) const {
-return !(*this == other);
-}
};
template<>

clustering_key_filter.cc Normal file

@@ -0,0 +1,138 @@
/*
* Copyright (C) 2016 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "clustering_key_filter.hh"
#include "keys.hh"
#include "query-request.hh"
#include "range.hh"
namespace query {
const clustering_row_ranges&
clustering_key_filtering_context::get_ranges(const partition_key& key) const {
static thread_local clustering_row_ranges full_range = {{}};
return _factory ? _factory->get_ranges(key) : full_range;
}
clustering_key_filtering_context clustering_key_filtering_context::create_no_filtering() {
return clustering_key_filtering_context{};
}
const clustering_key_filtering_context no_clustering_key_filtering =
clustering_key_filtering_context::create_no_filtering();
class stateless_clustering_key_filter_factory : public clustering_key_filter_factory {
clustering_key_filter _filter;
clustering_row_ranges _ranges;
public:
stateless_clustering_key_filter_factory(clustering_row_ranges&& ranges,
clustering_key_filter&& filter)
: _filter(std::move(filter)), _ranges(std::move(ranges)) {}
virtual clustering_key_filter get_filter(const partition_key& key) override {
return _filter;
}
virtual clustering_key_filter get_filter_for_sorted(const partition_key& key) override {
return _filter;
}
virtual const clustering_row_ranges& get_ranges(const partition_key& key) override {
return _ranges;
}
virtual bool want_static_columns(const partition_key& key) override {
return true;
}
};
class partition_slice_clustering_key_filter_factory : public clustering_key_filter_factory {
schema_ptr _schema;
const partition_slice& _slice;
clustering_key_prefix::prefix_equal_tri_compare _cmp;
clustering_row_ranges _ck_ranges;
public:
partition_slice_clustering_key_filter_factory(schema_ptr s, const partition_slice& slice)
: _schema(std::move(s)), _slice(slice), _cmp(*_schema) {}
virtual clustering_key_filter get_filter(const partition_key& key) override {
const clustering_row_ranges& ranges = _slice.row_ranges(*_schema, key);
return [this, &ranges] (const clustering_key& key) {
return std::any_of(std::begin(ranges), std::end(ranges),
[this, &key] (const clustering_range& r) { return r.contains(key, _cmp); });
};
}
virtual clustering_key_filter get_filter_for_sorted(const partition_key& key) override {
const clustering_row_ranges& ranges = _slice.row_ranges(*_schema, key);
return [this, &ranges] (const clustering_key& key) {
return std::any_of(std::begin(ranges), std::end(ranges),
[this, &key] (const clustering_range& r) { return r.contains(key, _cmp); });
};
}
virtual const clustering_row_ranges& get_ranges(const partition_key& key) override {
if (_slice.options.contains(query::partition_slice::option::reversed)) {
_ck_ranges = _slice.row_ranges(*_schema, key);
std::reverse(_ck_ranges.begin(), _ck_ranges.end());
return _ck_ranges;
}
return _slice.row_ranges(*_schema, key);
}
virtual bool want_static_columns(const partition_key& key) override {
return true;
}
};
static const shared_ptr<clustering_key_filter_factory>
create_partition_slice_filter(schema_ptr s, const partition_slice& slice) {
return ::make_shared<partition_slice_clustering_key_filter_factory>(std::move(s), slice);
}
const clustering_key_filtering_context
clustering_key_filtering_context::create(schema_ptr schema, const partition_slice& slice) {
static thread_local clustering_key_filtering_context accept_all = clustering_key_filtering_context(
::make_shared<stateless_clustering_key_filter_factory>(clustering_row_ranges{{}},
[](const clustering_key&) { return true; }));
static thread_local clustering_key_filtering_context reject_all = clustering_key_filtering_context(
::make_shared<stateless_clustering_key_filter_factory>(clustering_row_ranges{},
[](const clustering_key&) { return false; }));
if (slice.get_specific_ranges()) {
return clustering_key_filtering_context(create_partition_slice_filter(schema, slice));
}
const clustering_row_ranges& ranges = slice.default_row_ranges();
if (ranges.empty()) {
return reject_all;
}
if (ranges.size() == 1 && ranges[0].is_full()) {
return accept_all;
}
return clustering_key_filtering_context(create_partition_slice_filter(schema, slice));
}
}


@@ -22,46 +22,61 @@
*/
#pragma once
#include <functional>
#include <vector>
#include "core/shared_ptr.hh"
#include "database_fwd.hh"
#include "schema.hh"
#include "query-request.hh"
template<typename T> class range;
namespace query {
class clustering_key_filter_ranges {
clustering_row_ranges _storage;
const clustering_row_ranges& _ref;
class partition_slice;
// A predicate that tells if a clustering key should be accepted.
using clustering_key_filter = std::function<bool(const clustering_key&)>;
// A factory for clustering key filter which can be reused for multiple clustering keys.
class clustering_key_filter_factory {
public:
clustering_key_filter_ranges(const clustering_row_ranges& ranges) : _ref(ranges) { }
struct reversed { };
clustering_key_filter_ranges(reversed, const clustering_row_ranges& ranges)
: _storage(ranges.rbegin(), ranges.rend()), _ref(_storage) { }
// Create a clustering key filter that can be used for multiple clustering keys with no restrictions.
virtual clustering_key_filter get_filter(const partition_key&) = 0;
// Create a clustering key filter that can be used for multiple clustering keys but they have to be sorted.
virtual clustering_key_filter get_filter_for_sorted(const partition_key&) = 0;
virtual const std::vector<range<clustering_key_prefix>>& get_ranges(const partition_key&) = 0;
// Whether we want to get the static row, in addition to the desired clustering rows
virtual bool want_static_columns(const partition_key&) = 0;
clustering_key_filter_ranges(clustering_key_filter_ranges&& other) noexcept
: _storage(std::move(other._storage))
, _ref(&other._ref == &other._storage ? _storage : other._ref)
{ }
clustering_key_filter_ranges& operator=(clustering_key_filter_ranges&& other) noexcept {
if (this != &other) {
this->~clustering_key_filter_ranges();
new (this) clustering_key_filter_ranges(std::move(other));
}
return *this;
}
auto begin() const { return _ref.begin(); }
auto end() const { return _ref.end(); }
bool empty() const { return _ref.empty(); }
size_t size() const { return _ref.size(); }
static clustering_key_filter_ranges get_ranges(const schema& schema, const query::partition_slice& slice, const partition_key& key) {
const query::clustering_row_ranges& ranges = slice.row_ranges(schema, key);
if (slice.options.contains(query::partition_slice::option::reversed)) {
return clustering_key_filter_ranges(clustering_key_filter_ranges::reversed{}, ranges);
}
return clustering_key_filter_ranges(ranges);
}
virtual ~clustering_key_filter_factory() = default;
};
class clustering_key_filtering_context {
private:
shared_ptr<clustering_key_filter_factory> _factory;
clustering_key_filtering_context() {};
clustering_key_filtering_context(shared_ptr<clustering_key_filter_factory> factory) : _factory(factory) {}
public:
// Create a clustering key filter that can be used for multiple clustering keys with no restrictions.
clustering_key_filter get_filter(const partition_key& key) const {
return _factory ? _factory->get_filter(key) : [] (const clustering_key&) { return true; };
}
// Create a clustering key filter that can be used for multiple clustering keys but they have to be sorted.
clustering_key_filter get_filter_for_sorted(const partition_key& key) const {
return _factory ? _factory->get_filter_for_sorted(key) : [] (const clustering_key&) { return true; };
}
const std::vector<range<clustering_key_prefix>>& get_ranges(const partition_key& key) const;
bool want_static_columns(const partition_key& key) const {
return _factory ? _factory->want_static_columns(key) : true;
}
static const clustering_key_filtering_context create(schema_ptr, const partition_slice&);
static clustering_key_filtering_context create_no_filtering();
};
extern const clustering_key_filtering_context no_clustering_key_filtering;
}


@@ -56,14 +56,11 @@ public:
// Some strategies may look at the compacted and resulting sstables to
// get some useful information for subsequent compactions.
-void notify_completion(const std::vector<lw_shared_ptr<sstable>>& removed, const std::vector<lw_shared_ptr<sstable>>& added);
+void notify_completion(schema_ptr schema, const std::vector<lw_shared_ptr<sstable>>& removed, const std::vector<lw_shared_ptr<sstable>>& added);
// Return if parallel compaction is allowed by strategy.
bool parallel_compaction() const;
-// Return if optimization to rule out sstables based on clustering key filter should be applied.
-bool use_clustering_key_filter() const;
// An estimation of number of compaction for strategy to be satisfied.
int64_t estimated_pending_compactions(column_family& cf) const;


@@ -211,16 +211,16 @@ batch_size_warn_threshold_in_kb: 5
# increase system_auth keyspace replication factor if you use this authorizer.
# authorizer: AllowAllAuthorizer
-###################################################
-## Not currently supported, reserved for future use
-###################################################
# initial_token allows you to specify tokens manually. While you can use it with
# vnodes (num_tokens > 1, above) -- in which case you should provide a
# comma-separated list -- it's primarily used when adding nodes to legacy clusters
# that do not have vnodes enabled.
# initial_token:
+###################################################
+## Not currently supported, reserved for future use
+###################################################
# See http://wiki.apache.org/cassandra/HintedHandoff
# May either be "true" or "false" to enable globally, or contain a list
# of data centers to enable per-datacenter.
@@ -813,13 +813,3 @@ commitlog_total_space_in_mb: -1
# freeing processor resources when there is other work to be done.
#
# defragment_memory_on_idle: true
-#
-# prometheus port
-# By default, Scylla opens prometheus API port on port 9180
-# setting the port to 0 will disable the prometheus API.
-# prometheus_port: 9180
-#
-# prometheus address
-# By default, Scylla binds all interfaces to the prometheus API
-# It is possible to restrict the listening address to a specific one
-# prometheus_address: 0.0.0.0


@@ -220,9 +220,6 @@ scylla_tests = [
'tests/range_tombstone_list_test',
'tests/anchorless_list_test',
'tests/database_test',
-'tests/nonwrapping_range_test',
-'tests/input_stream_test',
-'tests/sstable_atomic_deletion_test',
]
apps = [
@@ -302,6 +299,7 @@ scylla_core = (['database.cc',
'mutation_query.cc',
'key_reader.cc',
'keys.cc',
+'clustering_key_filter.cc',
'sstables/sstables.cc',
'sstables/compress.cc',
'sstables/row.cc',
@@ -310,7 +308,6 @@ scylla_core = (['database.cc',
'sstables/compaction.cc',
'sstables/compaction_strategy.cc',
'sstables/compaction_manager.cc',
-'sstables/atomic_deletion.cc',
'transport/event.cc',
'transport/event_notifier.cc',
'transport/server.cc',
@@ -428,7 +425,6 @@ scylla_core = (['database.cc',
'dht/i_partitioner.cc',
'dht/murmur3_partitioner.cc',
'dht/byte_ordered_partitioner.cc',
-'dht/random_partitioner.cc',
'dht/boot_strapper.cc',
'dht/range_streamer.cc',
'unimplemented.cc',
@@ -592,7 +588,6 @@ tests_not_using_seastar_test_framework = set([
'tests/idl_test',
'tests/range_tombstone_list_test',
'tests/anchorless_list_test',
-'tests/nonwrapping_range_test',
])
for t in tests_not_using_seastar_test_framework:
@@ -607,7 +602,6 @@ for t in scylla_tests:
deps['tests/sstable_test'] += ['tests/sstable_datafile_test.cc']
deps['tests/bytes_ostream_test'] = ['tests/bytes_ostream_test.cc']
-deps['tests/input_stream_test'] = ['tests/input_stream_test.cc']
deps['tests/UUID_test'] = ['utils/UUID_gen.cc', 'tests/UUID_test.cc']
deps['tests/murmur_hash_test'] = ['bytes.cc', 'utils/murmur_hash.cc', 'tests/murmur_hash_test.cc']
deps['tests/allocation_strategy_test'] = ['tests/allocation_strategy_test.cc', 'utils/logalloc.cc', 'utils/dynamic_bitset.cc']


@@ -93,6 +93,12 @@ public:
specific_options options,
cql_serialization_format sf);
+explicit query_options(db::consistency_level consistency,
+std::vector<std::vector<bytes_view_opt>> value_views,
+bool skip_metadata,
+specific_options options,
+cql_serialization_format sf);
// Batch query_options constructor
explicit query_options(query_options&&, std::vector<std::vector<bytes_view_opt>> value_views);


@@ -64,8 +64,8 @@ class query_processor::internal_state {
service::query_state _qs;
public:
internal_state()
-: _qs(service::client_state{service::client_state::internal_tag()})
-{ }
+: _qs(service::client_state{service::client_state::internal_tag()}) {
+}
operator service::query_state&() {
return _qs;
}
@@ -88,7 +88,7 @@ api::timestamp_type query_processor::next_timestamp() {
}
query_processor::query_processor(distributed<service::storage_proxy>& proxy,
distributed<database>& db)
distributed<database>& db)
: _migration_subscriber{std::make_unique<migration_subscriber>(this)}
, _proxy(proxy)
, _db(db)
@@ -133,9 +133,8 @@ query_processor::process(const sstring_view& query_string, service::query_state&
}
future<::shared_ptr<result_message>>
-query_processor::process_statement(::shared_ptr<cql_statement> statement,
-service::query_state& query_state,
-const query_options& options)
+query_processor::process_statement(::shared_ptr<cql_statement> statement, service::query_state& query_state,
+const query_options& options)
{
#if 0
logger.trace("Process {} @CL.{}", statement, options.getConsistency());
@@ -146,7 +145,7 @@ query_processor::process_statement(::shared_ptr<cql_statement> statement,
statement->validate(_proxy, client_state);
-auto fut = make_ready_future<::shared_ptr<transport::messages::result_message>>();
+future<::shared_ptr<transport::messages::result_message>> fut = make_ready_future<::shared_ptr<transport::messages::result_message>>();
if (client_state.is_internal()) {
fut = statement->execute_internal(_proxy, query_state, options);
} else {
@@ -171,9 +170,7 @@ query_processor::prepare(const std::experimental::string_view& query_string, ser
}
future<::shared_ptr<transport::messages::result_message::prepared>>
-query_processor::prepare(const std::experimental::string_view& query_string,
-const service::client_state& client_state,
-bool for_thrift)
+query_processor::prepare(const std::experimental::string_view& query_string, const service::client_state& client_state, bool for_thrift)
{
auto existing = get_stored_prepared_statement(query_string, client_state.get_raw_keyspace(), for_thrift);
if (existing) {
@@ -189,9 +186,7 @@ query_processor::prepare(const std::experimental::string_view& query_string,
}
::shared_ptr<transport::messages::result_message::prepared>
-query_processor::get_stored_prepared_statement(const std::experimental::string_view& query_string,
-const sstring& keyspace,
-bool for_thrift)
+query_processor::get_stored_prepared_statement(const std::experimental::string_view& query_string, const sstring& keyspace, bool for_thrift)
{
if (for_thrift) {
auto statement_id = compute_thrift_id(query_string, keyspace);
@@ -211,10 +206,8 @@ query_processor::get_stored_prepared_statement(const std::experimental::string_v
}
future<::shared_ptr<transport::messages::result_message::prepared>>
-query_processor::store_prepared_statement(const std::experimental::string_view& query_string,
-const sstring& keyspace,
-::shared_ptr<statements::prepared_statement> prepared,
-bool for_thrift)
+query_processor::store_prepared_statement(const std::experimental::string_view& query_string, const sstring& keyspace,
+::shared_ptr<statements::prepared_statement> prepared, bool for_thrift)
{
#if 0
// Concatenate the current keyspace so we don't mix prepared statements between keyspace (#5352).
@@ -308,7 +301,7 @@ query_processor::parse_statement(const sstring_view& query)
if (!statement) {
throw exceptions::syntax_exception("Parsing failed");
}
-};
+return std::move(statement);
} catch (const exceptions::recognition_exception& e) {
throw exceptions::syntax_exception(sprint("Invalid or malformed CQL query string: %s", e.what()));
@@ -320,10 +313,10 @@ query_processor::parse_statement(const sstring_view& query)
}
}
-query_options query_processor::make_internal_options(::shared_ptr<statements::prepared_statement> p,
-const std::initializer_list<data_value>& values,
-db::consistency_level cl)
-{
+query_options query_processor::make_internal_options(
+::shared_ptr<statements::prepared_statement> p,
+const std::initializer_list<data_value>& values,
+db::consistency_level cl) {
if (p->bound_names.size() != values.size()) {
throw std::invalid_argument(sprint("Invalid number of values. Expecting %d but got %d", p->bound_names.size(), values.size()));
}
@@ -342,8 +335,8 @@ query_options query_processor::make_internal_options(::shared_ptr<statements::pr
return query_options(cl, bound_values);
}
-::shared_ptr<statements::prepared_statement> query_processor::prepare_internal(const sstring& query_string)
-{
+::shared_ptr<statements::prepared_statement> query_processor::prepare_internal(
+const sstring& query_string) {
auto& p = _internal_statements[query_string];
if (p == nullptr) {
auto np = parse_statement(query_string)->prepare(_db.local());
@@ -353,10 +346,9 @@ query_options query_processor::make_internal_options(::shared_ptr<statements::pr
return p;
}
-future<::shared_ptr<untyped_result_set>>
-query_processor::execute_internal(const sstring& query_string,
-const std::initializer_list<data_value>& values)
-{
+future<::shared_ptr<untyped_result_set>> query_processor::execute_internal(
+const sstring& query_string,
+const std::initializer_list<data_value>& values) {
if (log.is_enabled(logging::log_level::trace)) {
log.trace("execute_internal: \"{}\" ({})", query_string, ::join(", ", values));
}
@@ -364,23 +356,22 @@ query_processor::execute_internal(const sstring& query_string,
return execute_internal(p, values);
}
-future<::shared_ptr<untyped_result_set>>
-query_processor::execute_internal(::shared_ptr<statements::prepared_statement> p,
-const std::initializer_list<data_value>& values)
-{
+future<::shared_ptr<untyped_result_set>> query_processor::execute_internal(
+::shared_ptr<statements::prepared_statement> p,
+const std::initializer_list<data_value>& values) {
auto opts = make_internal_options(p, values);
-return do_with(std::move(opts), [this, p = std::move(p)](auto& opts) {
-return p->statement->execute_internal(_proxy, *_internal_state, opts).then([p](auto msg) {
-return make_ready_future<::shared_ptr<untyped_result_set>>(::make_shared<untyped_result_set>(msg));
-});
-});
+return do_with(std::move(opts),
+[this, p = std::move(p)](query_options & opts) {
+return p->statement->execute_internal(_proxy, *_internal_state, opts).then(
+[p](::shared_ptr<transport::messages::result_message> msg) {
+return make_ready_future<::shared_ptr<untyped_result_set>>(::make_shared<untyped_result_set>(msg));
+});
+});
}
-future<::shared_ptr<untyped_result_set>>
-query_processor::process(const sstring& query_string,
-db::consistency_level cl,
-const std::initializer_list<data_value>& values,
-bool cache)
+future<::shared_ptr<untyped_result_set>> query_processor::process(
+const sstring& query_string,
+db::consistency_level cl, const std::initializer_list<data_value>& values, bool cache)
{
auto p = cache ? prepare_internal(query_string) : parse_statement(query_string)->prepare(_db.local());
if (!cache) {
@@ -389,24 +380,23 @@ query_processor::process(const sstring& query_string,
return process(p, cl, values);
}
-future<::shared_ptr<untyped_result_set>>
-query_processor::process(::shared_ptr<statements::prepared_statement> p,
-db::consistency_level cl,
-const std::initializer_list<data_value>& values)
+future<::shared_ptr<untyped_result_set>> query_processor::process(
+::shared_ptr<statements::prepared_statement> p,
+db::consistency_level cl, const std::initializer_list<data_value>& values)
{
auto opts = make_internal_options(p, values, cl);
-return do_with(std::move(opts), [this, p = std::move(p)](auto & opts) {
-return p->statement->execute(_proxy, *_internal_state, opts).then([p](auto msg) {
-return make_ready_future<::shared_ptr<untyped_result_set>>(::make_shared<untyped_result_set>(msg));
-});
-});
+return do_with(std::move(opts),
+[this, p = std::move(p)](query_options & opts) {
+return p->statement->execute(_proxy, *_internal_state, opts).then(
+[p](::shared_ptr<transport::messages::result_message> msg) {
+return make_ready_future<::shared_ptr<untyped_result_set>>(::make_shared<untyped_result_set>(msg));
+});
+});
}
future<::shared_ptr<transport::messages::result_message>>
-query_processor::process_batch(::shared_ptr<statements::batch_statement> batch,
-service::query_state& query_state,
-query_options& options)
-{
+query_processor::process_batch(::shared_ptr<statements::batch_statement> batch, service::query_state& query_state, query_options& options) {
return batch->check_access(query_state.get_client_state()).then([this, &query_state, &options, batch] {
-batch->validate();
+batch->validate(_proxy, query_state.get_client_state());


@@ -393,12 +393,12 @@ public:
auto prefix = clustering_key_prefix::from_optional_exploded(*_schema, vals);
return bounds_range_type::bound(prefix, is_inclusive(b));
};
-auto range = wrapping_range<clustering_key_prefix>(read_bound(statements::bound::START), read_bound(statements::bound::END));
+auto range = bounds_range_type(read_bound(statements::bound::START), read_bound(statements::bound::END));
auto bounds = bound_view::from_range(range);
if (bound_view::compare(*_schema)(bounds.second, bounds.first)) {
return { };
}
-return { bounds_range_type(std::move(range)) };
+return { std::move(range) };
}
#if 0
@Override


@@ -207,7 +207,7 @@ protected:
auto& columns = schema->all_columns_in_select_order();
cds.reserve(columns.size());
for (auto& c : columns) {
-if (!schema->is_dense() || !c.is_regular() || !c.name().empty()) {
+if (!c.is_compact_value() || !c.name().empty()) {
cds.emplace_back(&c);
}
}
@@ -393,6 +393,9 @@ void result_set_builder::visitor::accept_new_row(
case column_kind::regular_column:
add_value(*def, row_iterator);
break;
+case column_kind::compact_column:
+add_value(*def, row_iterator);
+break;
case column_kind::static_column:
add_value(*def, static_row_iterator);
break;


@@ -95,7 +95,7 @@ single_column_relation::to_receivers(schema_ptr schema, const column_definition&
using namespace statements::request_validations;
auto receiver = column_def.column_specification;
-if (schema->is_dense() && column_def.is_regular()) {
+if (column_def.is_compact_value()) {
throw exceptions::invalid_request_exception(sprint(
"Predicates on the non-primary-key column (%s) of a COMPACT table are not yet supported", column_def.name_as_text()));
}


@@ -40,7 +40,6 @@
*/
#include "alter_keyspace_statement.hh"
-#include "prepared_statement.hh"
#include "service/migration_manager.hh"
#include "db/system_keyspace.hh"
#include "database.hh"
@@ -101,9 +100,3 @@ shared_ptr<transport::event::schema_change> cql3::statements::alter_keyspace_sta
transport::event::schema_change::change_type::UPDATED,
keyspace());
}
-shared_ptr<cql3::statements::prepared_statement>
-cql3::statements::alter_keyspace_statement::prepare(database& db) {
-return make_shared<prepared_statement>(make_shared<alter_keyspace_statement>(*this));
-}


@@ -63,7 +63,6 @@ public:
void validate(distributed<service::storage_proxy>& proxy, const service::client_state& state) override;
future<bool> announce_migration(distributed<service::storage_proxy>& proxy, bool is_local_only) override;
shared_ptr<transport::event::schema_change> change_event() override;
virtual shared_ptr<prepared> prepare(database& db) override;
};
}


@@ -40,7 +40,6 @@
*/
#include "cql3/statements/alter_table_statement.hh"
#include "prepared_statement.hh"
#include "service/migration_manager.hh"
#include "validation.hh"
#include "db/config.hh"
@@ -181,6 +180,7 @@ future<bool> alter_table_statement::announce_migration(distributed<service::stor
}
break;
case column_kind::compact_column:
case column_kind::regular_column:
case column_kind::static_column:
// Thrift allows to change a column validator so CFMetaData.validateCompatibility will let it slide
@@ -273,11 +273,6 @@ shared_ptr<transport::event::schema_change> alter_table_statement::change_event(
transport::event::schema_change::target_type::TABLE, keyspace(), column_family());
}
shared_ptr<cql3::statements::prepared_statement>
cql3::statements::alter_table_statement::prepare(database& db) {
return make_shared<prepared_statement>(make_shared<alter_table_statement>(*this));
}
}
}


@@ -80,7 +80,6 @@ public:
virtual void validate(distributed<service::storage_proxy>& proxy, const service::client_state& state) override;
virtual future<bool> announce_migration(distributed<service::storage_proxy>& proxy, bool is_local_only) override;
virtual shared_ptr<transport::event::schema_change> change_event() override;
virtual shared_ptr<prepared> prepare(database& db) override;
};
}


@@ -39,7 +39,6 @@
#include "cql3/statements/alter_type_statement.hh"
#include "cql3/statements/create_type_statement.hh"
#include "prepared_statement.hh"
#include "schema_builder.hh"
#include "service/migration_manager.hh"
#include "boost/range/adaptor/map.hpp"
@@ -228,16 +227,6 @@ user_type alter_type_statement::renames::make_updated_type(database& db, user_ty
return updated;
}
shared_ptr<cql3::statements::prepared_statement>
alter_type_statement::add_or_alter::prepare(database& db) {
return make_shared<prepared_statement>(make_shared<alter_type_statement::add_or_alter>(*this));
}
shared_ptr<cql3::statements::prepared_statement>
alter_type_statement::renames::prepare(database& db) {
return make_shared<prepared_statement>(make_shared<alter_type_statement::renames>(*this));
}
}
}


@@ -84,7 +84,6 @@ public:
const shared_ptr<column_identifier> field_name,
const shared_ptr<cql3_type::raw> field_type);
virtual user_type make_updated_type(database& db, user_type to_update) const override;
virtual shared_ptr<prepared> prepare(database& db) override;
private:
user_type do_add(database& db, user_type to_update) const;
user_type do_alter(database& db, user_type to_update) const;
@@ -101,7 +100,6 @@ public:
void add_rename(shared_ptr<column_identifier> previous_name, shared_ptr<column_identifier> new_name);
virtual user_type make_updated_type(database& db, user_type to_update) const override;
virtual shared_ptr<prepared> prepare(database& db) override;
};
}


@@ -58,7 +58,7 @@ bool batch_statement::depends_on_column_family(const sstring& cf_name) const
}
void batch_statement::verify_batch_size(const std::vector<mutation>& mutations) {
size_t warn_threshold = service::get_local_storage_proxy().get_db().local().get_config().batch_size_warn_threshold_in_kb() * 1024;
size_t warn_threshold = service::get_local_storage_proxy().get_db().local().get_config().batch_size_warn_threshold_in_kb();
class my_partition_visitor : public mutation_partition_visitor {
public:
@@ -87,15 +87,17 @@ void batch_statement::verify_batch_size(const std::vector<mutation>& mutations)
m.partition().accept(*m.schema(), v);
}
if (v.size > warn_threshold) {
auto size = v.size / 1024;
if (size > warn_threshold) {
std::unordered_set<sstring> ks_cf_pairs;
for (auto&& m : mutations) {
ks_cf_pairs.insert(m.schema()->ks_name() + "." + m.schema()->cf_name());
}
_logger.warn(
"Batch of prepared statements for {} is of size {}, exceeding specified threshold of {} by {}.{}",
join(", ", ks_cf_pairs), v.size, warn_threshold,
v.size - warn_threshold, "");
join(", ", ks_cf_pairs), size, warn_threshold,
size - warn_threshold, "");
}
}
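
The fix in this hunk is a unit bug: the old code compared a raw byte count against a threshold the config expresses in KiB. A minimal sketch of the corrected comparison (the function name here is hypothetical, for illustration only):

```cpp
#include <cassert>
#include <cstddef>

// Both sides of the comparison are in KiB: convert the batch size down
// instead of multiplying the configured threshold up into bytes.
bool exceeds_warn_threshold(size_t batch_size_bytes, size_t warn_threshold_kb) {
    size_t size_kb = batch_size_bytes / 1024;
    return size_kb > warn_threshold_kb;
}
```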


@@ -40,7 +40,6 @@
*/
#include "create_index_statement.hh"
#include "prepared_statement.hh"
#include "validation.hh"
#include "service/storage_proxy.hh"
#include "service/migration_manager.hh"
@@ -206,9 +205,4 @@ cql3::statements::create_index_statement::announce_migration(distributed<service
});
}
shared_ptr<cql3::statements::prepared_statement>
cql3::statements::create_index_statement::prepare(database& db) {
return make_shared<prepared_statement>(make_shared<create_index_statement>(*this));
}


@@ -87,7 +87,6 @@ public:
transport::event::schema_change::target_type::TABLE, keyspace(),
column_family());
}
virtual shared_ptr<prepared> prepare(database& db) override;
};
}


@@ -40,7 +40,6 @@
*/
#include "cql3/statements/create_keyspace_statement.hh"
#include "prepared_statement.hh"
#include "service/migration_manager.hh"
@@ -123,11 +122,6 @@ shared_ptr<transport::event::schema_change> create_keyspace_statement::change_ev
return make_shared<transport::event::schema_change>(transport::event::schema_change::change_type::CREATED, keyspace());
}
shared_ptr<cql3::statements::prepared_statement>
cql3::statements::create_keyspace_statement::prepare(database& db) {
return make_shared<prepared_statement>(make_shared<create_keyspace_statement>(*this));
}
}
}


@@ -84,7 +84,6 @@ public:
virtual future<bool> announce_migration(distributed<service::storage_proxy>& proxy, bool is_local_only) override;
virtual shared_ptr<transport::event::schema_change> change_event() override;
virtual shared_ptr<prepared> prepare(database& db) override;
};
}


@@ -156,14 +156,6 @@ void create_table_statement::add_column_metadata_from_aliases(schema_builder& bu
}
}
shared_ptr<prepared_statement>
create_table_statement::prepare(database& db) {
// Cannot happen; create_table_statement is never instantiated as a raw statement
// (instead we instantiate create_table_statement::raw_statement)
abort();
}
create_table_statement::raw_statement::raw_statement(::shared_ptr<cf_name> name, bool if_not_exists)
: cf_statement{std::move(name)}
, _if_not_exists{if_not_exists}


@@ -103,8 +103,6 @@ public:
virtual shared_ptr<transport::event::schema_change> change_event() override;
virtual shared_ptr<prepared> prepare(database& db) override;
schema_ptr get_cf_meta_data();
class raw_statement;


@@ -38,7 +38,6 @@
*/
#include "cql3/statements/create_type_statement.hh"
#include "prepared_statement.hh"
#include "service/migration_manager.hh"
@@ -156,11 +155,6 @@ future<bool> create_type_statement::announce_migration(distributed<service::stor
return service::get_local_migration_manager().announce_new_type(type, is_local_only).then([] { return true; });
}
shared_ptr<cql3::statements::prepared_statement>
create_type_statement::prepare(database& db) {
return make_shared<prepared_statement>(make_shared<create_type_statement>(*this));
}
}
}


@@ -69,8 +69,6 @@ public:
virtual future<bool> announce_migration(distributed<service::storage_proxy>& proxy, bool is_local_only) override;
virtual shared_ptr<prepared> prepare(database& db) override;
static void check_for_duplicate_names(user_type type);
private:
bool type_exists_in(::keyspace& ks);


@@ -40,7 +40,6 @@
*/
#include "cql3/statements/drop_keyspace_statement.hh"
#include "cql3/statements/prepared_statement.hh"
#include "service/migration_manager.hh"
#include "transport/event.hh"
@@ -98,11 +97,6 @@ shared_ptr<transport::event::schema_change> drop_keyspace_statement::change_even
}
shared_ptr<cql3::statements::prepared_statement>
drop_keyspace_statement::prepare(database& db) {
return make_shared<prepared_statement>(make_shared<drop_keyspace_statement>(*this));
}
}
}


@@ -62,8 +62,6 @@ public:
virtual future<bool> announce_migration(distributed<service::storage_proxy>& proxy, bool is_local_only) override;
virtual shared_ptr<transport::event::schema_change> change_event() override;
virtual shared_ptr<prepared> prepare(database& db) override;
};
}


@@ -40,7 +40,6 @@
*/
#include "cql3/statements/drop_table_statement.hh"
#include "cql3/statements/prepared_statement.hh"
#include "service/migration_manager.hh"
@@ -99,11 +98,6 @@ shared_ptr<transport::event::schema_change> drop_table_statement::change_event()
column_family());
}
shared_ptr<cql3::statements::prepared_statement>
drop_table_statement::prepare(database& db) {
return make_shared<prepared_statement>(make_shared<drop_table_statement>(*this));
}
}
}


@@ -61,8 +61,6 @@ public:
virtual future<bool> announce_migration(distributed<service::storage_proxy>& proxy, bool is_local_only) override;
virtual shared_ptr<transport::event::schema_change> change_event() override;
virtual shared_ptr<prepared> prepare(database& db) override;
};
}


@@ -38,7 +38,6 @@
*/
#include "cql3/statements/drop_type_statement.hh"
#include "cql3/statements/prepared_statement.hh"
#include "boost/range/adaptor/map.hpp"
@@ -117,11 +116,6 @@ future<bool> drop_type_statement::announce_migration(distributed<service::storag
return service::get_local_migration_manager().announce_type_drop(to_drop->second, is_local_only).then([] { return true; });
}
shared_ptr<cql3::statements::prepared_statement>
drop_type_statement::prepare(database& db) {
return make_shared<prepared_statement>(make_shared<drop_type_statement>(*this));
}
}
}


@@ -64,8 +64,6 @@ public:
virtual const sstring& keyspace() const override;
virtual future<bool> announce_migration(distributed<service::storage_proxy>& proxy, bool is_local_only) override;
virtual shared_ptr<prepared> prepare(database& db) override;
};
}


@@ -447,8 +447,6 @@ modification_statement::execute(distributed<service::storage_proxy>& proxy, serv
throw exceptions::invalid_request_exception("Conditional updates are not supported by the protocol version in use. You need to upgrade to a driver using the native protocol v2.");
}
tracing::add_table_name(qs.get_trace_state(), keyspace(), column_family());
if (has_conditions()) {
return execute_with_condition(proxy, qs, options);
}
@@ -510,9 +508,6 @@ modification_statement::execute_internal(distributed<service::storage_proxy>& pr
if (has_conditions()) {
throw exceptions::unsupported_operation_exception();
}
tracing::add_table_name(qs.get_trace_state(), keyspace(), column_family());
return get_mutations(proxy, options, true, options.get_timestamp(qs), qs.get_trace_state()).then(
[&proxy] (auto mutations) {
return proxy.local().mutate_locally(std::move(mutations));


@@ -86,6 +86,11 @@ void schema_altering_statement::prepare_keyspace(const service::client_state& st
}
}
::shared_ptr<prepared_statement> schema_altering_statement::prepare(database& db)
{
return ::make_shared<prepared>(this->shared_from_this());
}
future<::shared_ptr<messages::result_message>>
schema_altering_statement::execute0(distributed<service::storage_proxy>& proxy, service::query_state& state, const query_options& options, bool is_local_only) {
// If an IF [NOT] EXISTS clause was used, this may not result in an actual schema change. To avoid doing


@@ -60,7 +60,7 @@ namespace messages = transport::messages;
/**
* Abstract class for statements that alter the schema.
*/
class schema_altering_statement : public raw::cf_statement, public cql_statement_no_metadata {
class schema_altering_statement : public raw::cf_statement, public cql_statement_no_metadata, public ::enable_shared_from_this<schema_altering_statement> {
private:
const bool _is_column_family_level;
@@ -81,6 +81,8 @@ protected:
virtual void prepare_keyspace(const service::client_state& state) override;
virtual ::shared_ptr<prepared> prepare(database& db) override;
virtual shared_ptr<transport::event::schema_change> change_event() = 0;
/**


@@ -218,8 +218,6 @@ select_statement::execute(distributed<service::storage_proxy>& proxy,
service::query_state& state,
const query_options& options)
{
tracing::add_table_name(state.get_trace_state(), keyspace(), column_family());
auto cl = options.get_consistency();
validate_for_read(_schema->ks_name(), cl);
@@ -329,8 +327,6 @@ select_statement::execute_internal(distributed<service::storage_proxy>& proxy,
make_partition_slice(options), limit, to_gc_clock(now), std::experimental::nullopt, query::max_partitions, options.get_timestamp(state));
auto partition_ranges = _restrictions->get_partition_key_ranges(options);
tracing::add_table_name(state.get_trace_state(), keyspace(), column_family());
if (needs_post_query_ordering() && _limit) {
return do_with(std::move(partition_ranges), [this, &proxy, &state, command] (auto prs) {
query::result_merger merger;


@@ -64,16 +64,16 @@ void update_statement::add_update_for_key(mutation& m, const exploded_clustering
throw exceptions::invalid_request_exception(sprint("Missing PRIMARY KEY part %s", s->clustering_key_columns().begin()->name_as_text()));
}
// An empty name for the value is what we use to recognize the case where there is no column
// An empty name for the compact value is what we use to recognize the case where there is no column
// outside the PK, see CreateStatement.
if (s->regular_begin()->name().empty()) {
if (s->compact_column().name().empty()) {
// There is no column outside the PK. So no operation could have passed through validation
assert(_column_operations.empty());
constants::setter(*s->regular_begin(), make_shared(constants::value(bytes()))).execute(m, prefix, params);
constants::setter(s->compact_column(), make_shared(constants::value(bytes()))).execute(m, prefix, params);
} else {
// dense means we don't have a row marker, so don't accept setting only the PK. See CASSANDRA-5648.
if (_column_operations.empty()) {
throw exceptions::invalid_request_exception(sprint("Column %s is mandatory for this COMPACT STORAGE table", s->regular_begin()->name_as_text()));
throw exceptions::invalid_request_exception(sprint("Column %s is mandatory for this COMPACT STORAGE table", s->compact_column().name_as_text()));
}
}
} else {


@@ -43,7 +43,6 @@
#include <boost/range/adaptor/map.hpp>
#include "locator/simple_snitch.hh"
#include <boost/algorithm/cxx11/all_of.hpp>
#include <boost/algorithm/cxx11/any_of.hpp>
#include <boost/function_output_iterator.hpp>
#include <boost/range/algorithm/heap_algorithm.hpp>
#include <boost/range/algorithm/remove_if.hpp>
@@ -155,10 +154,9 @@ mutation_source
column_family::sstables_as_mutation_source() {
return mutation_source([this] (schema_ptr s,
const query::partition_range& r,
const query::partition_slice& slice,
const io_priority_class& pc,
tracing::trace_state_ptr trace_state) {
return make_sstable_reader(std::move(s), r, slice, pc, std::move(trace_state));
query::clustering_key_filtering_context ck_filtering,
const io_priority_class& pc) {
return make_sstable_reader(std::move(s), r, ck_filtering, pc);
});
}
@@ -188,138 +186,6 @@ bool belongs_to_current_shard(const streamed_mutation& m) {
return dht::shard_of(m.decorated_key().token()) == engine().cpu_id();
}
// Stores ranges for all components of the same clustering key, index 0 referring to component
// range 0, and so on.
using ck_filter_clustering_key_components = std::vector<nonwrapping_range<bytes_view>>;
// Stores an entry for each clustering key range specified by the filter.
using ck_filter_clustering_key_ranges = std::vector<ck_filter_clustering_key_components>;
// Used to split a clustering key range into a range for each component.
// If a range in ck_filtering_all_ranges is composite, a range will be created
// for each component. If it's not composite, a single range is created.
// This split is needed to check for overlap in each component individually.
static ck_filter_clustering_key_ranges
ranges_for_clustering_key_filter(const schema_ptr& schema, const query::clustering_row_ranges& ck_filtering_all_ranges) {
ck_filter_clustering_key_ranges ranges;
for (auto& r : ck_filtering_all_ranges) {
// this vector stores a range for each component of a key, only one if not composite.
ck_filter_clustering_key_components composite_ranges;
if (r.is_full()) {
ranges.push_back({ nonwrapping_range<bytes_view>::make_open_ended_both_sides() });
continue;
}
auto start = r.start() ? r.start()->value().components() : clustering_key_prefix::make_empty().components();
auto end = r.end() ? r.end()->value().components() : clustering_key_prefix::make_empty().components();
auto start_it = start.begin();
auto end_it = end.begin();
// This test is enough because equal bounds in nonwrapping_range are inclusive.
auto is_singular = [&schema] (const auto& type_it, const bytes_view& b1, const bytes_view& b2) {
if (type_it == schema->clustering_key_type()->types().end()) {
throw std::runtime_error(sprint("clustering key filter passed more components than defined in schema of %s.%s",
schema->ks_name(), schema->cf_name()));
}
return (*type_it)->compare(b1, b2) == 0;
};
auto type_it = schema->clustering_key_type()->types().begin();
composite_ranges.reserve(schema->clustering_key_size());
// the rule is to ignore any component cn if another component ck (k < n) is not of the form [v, v].
// If we have [v1, v1], [v2, v2], ... {vl3, vr3}, ....
// then we generate [v1, v1], [v2, v2], ... {vl3, vr3}. Where { = '(' or '[', etc.
while (start_it != start.end() && end_it != end.end() && is_singular(type_it++, *start_it, *end_it)) {
composite_ranges.push_back(nonwrapping_range<bytes_view>({{ std::move(*start_it++), true }},
{{ std::move(*end_it++), true }}));
}
// handle a single non-singular tail element, if present
if (start_it != start.end() && end_it != end.end()) {
composite_ranges.push_back(nonwrapping_range<bytes_view>({{ std::move(*start_it), r.start()->is_inclusive() }},
{{ std::move(*end_it), r.end()->is_inclusive() }}));
} else if (start_it != start.end()) {
composite_ranges.push_back(nonwrapping_range<bytes_view>({{ std::move(*start_it), r.start()->is_inclusive() }}, {}));
} else if (end_it != end.end()) {
composite_ranges.push_back(nonwrapping_range<bytes_view>({}, {{ std::move(*end_it), r.end()->is_inclusive() }}));
}
ranges.push_back(std::move(composite_ranges));
}
return ranges;
}
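
The splitting rule described in the comments above — emit a singular range per equal leading component, then at most one non-singular tail, ignoring anything after it — can be sketched in isolation. The `component_range`/string types below are simplified stand-ins for the real `nonwrapping_range<bytes_view>` and per-type comparators:

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Simplified model of a per-component range as an inclusive [lo, hi] pair.
using component_range = std::pair<std::string, std::string>;

// Split a composite clustering-key bound pair into per-component ranges:
// singular [v, v] ranges for the equal leading components, then a single
// non-singular (or open-ended) tail range.
std::vector<component_range>
split_composite(const std::vector<std::string>& start,
                const std::vector<std::string>& end) {
    std::vector<component_range> out;
    size_t i = 0;
    size_t n = std::min(start.size(), end.size());
    while (i < n && start[i] == end[i]) {   // singular prefix: [v, v]
        out.emplace_back(start[i], end[i]);
        ++i;
    }
    if (i < n) {                            // one non-singular tail element
        out.emplace_back(start[i], end[i]);
    } else if (i < start.size()) {          // open-ended tail on one side
        out.emplace_back(start[i], "+inf");
    } else if (i < end.size()) {
        out.emplace_back("-inf", end[i]);
    }
    return out;
}
```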
// Return true if this sstable possibly stores clustering row(s) specified by ranges.
static inline bool
contains_rows(const sstables::sstable& sst, const schema_ptr& schema, const ck_filter_clustering_key_ranges& ranges) {
auto& clustering_key_types = schema->clustering_key_type()->types();
auto& clustering_components_ranges = sst.clustering_components_ranges();
if (!schema->clustering_key_size() || clustering_components_ranges.empty()) {
return true;
}
return boost::algorithm::any_of(ranges, [&] (const ck_filter_clustering_key_components& range) {
auto s = std::min(range.size(), clustering_components_ranges.size());
return boost::algorithm::all_of(boost::irange<unsigned>(0, s), [&] (unsigned i) {
auto& type = clustering_key_types[i];
return range[i].is_full() || range[i].overlaps(clustering_components_ranges[i], type->as_tri_comparator());
});
});
}
// Filter out sstables for reader using bloom filter and sstable metadata that keeps track
// of a range for each clustering component.
static std::vector<sstables::shared_sstable>
filter_sstable_for_reader(std::vector<sstables::shared_sstable>&& sstables, column_family& cf, const schema_ptr& schema,
const sstables::key& key, const query::partition_slice& slice) {
auto sstable_has_not_key = [&] (const sstables::shared_sstable& sst) {
return !sst->filter_has_key(key);
};
sstables.erase(boost::remove_if(sstables, sstable_has_not_key), sstables.end());
// no clustering filtering is applied if schema defines no clustering key or
// compaction strategy thinks it will not benefit from such an optimization.
if (!schema->clustering_key_size() || !cf.get_compaction_strategy().use_clustering_key_filter()) {
return sstables;
}
::cf_stats* stats = cf.cf_stats();
stats->clustering_filter_count++;
stats->sstables_checked_by_clustering_filter += sstables.size();
auto ck_filtering_all_ranges = slice.get_all_ranges();
// fast path to include all sstables if only one full range was specified.
// For example, this happens if query only specifies a partition key.
if (ck_filtering_all_ranges.size() == 1 && ck_filtering_all_ranges[0].is_full()) {
stats->clustering_filter_fast_path_count++;
stats->surviving_sstables_after_clustering_filter += sstables.size();
return sstables;
}
auto ranges = ranges_for_clustering_key_filter(schema, ck_filtering_all_ranges);
if (ranges.empty()) {
return {};
}
int64_t min_timestamp = std::numeric_limits<int64_t>::max();
auto sstable_has_clustering_key = [&min_timestamp, &schema, &ranges] (const sstables::shared_sstable& sst) {
if (!contains_rows(*sst, schema, ranges)) {
return false; // ordered after sstables that contain clustering rows.
} else {
min_timestamp = std::min(min_timestamp, sst->get_stats_metadata().min_timestamp);
return true;
}
};
auto sstable_has_relevant_tombstone = [&min_timestamp] (const sstables::shared_sstable& sst) {
const auto& stats = sst->get_stats_metadata();
// re-add sstable as candidate if it contains a tombstone that may cover a row in an included sstable.
return (stats.max_timestamp > min_timestamp && stats.estimated_tombstone_drop_time.bin.map.size());
};
auto skipped = std::partition(sstables.begin(), sstables.end(), sstable_has_clustering_key);
auto actually_skipped = std::partition(skipped, sstables.end(), sstable_has_relevant_tombstone);
sstables.erase(actually_skipped, sstables.end());
stats->surviving_sstables_after_clustering_filter += sstables.size();
return sstables;
}
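
The two-pass `std::partition` idea in `filter_sstable_for_reader` above — keep sstables that match the clustering filter, then re-admit skipped ones whose tombstones may shadow a kept row — can be sketched with a hypothetical simplified `sst` struct standing in for real sstable metadata:

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical stand-in for an sstable: whether it passes the clustering
// filter, plus the timestamp metadata used for the tombstone re-check.
struct sst {
    bool matches;           // contains rows for the filtered ranges
    int64_t min_ts;         // minimum write timestamp in the sstable
    int64_t max_ts;         // maximum write timestamp in the sstable
    bool has_tombstones;
};

// First pass keeps matching sstables while tracking their minimum
// timestamp; second pass re-admits skipped sstables whose tombstones are
// newer than that minimum and so may cover a row in the kept set.
std::vector<sst> filter_for_reader(std::vector<sst> v) {
    int64_t min_timestamp = std::numeric_limits<int64_t>::max();
    auto kept = std::partition(v.begin(), v.end(), [&](const sst& s) {
        if (!s.matches) {
            return false;
        }
        min_timestamp = std::min(min_timestamp, s.min_ts);
        return true;
    });
    auto last = std::partition(kept, v.end(), [&](const sst& s) {
        return s.has_tombstones && s.max_ts > min_timestamp;
    });
    v.erase(last, v.end());
    return v;
}
```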
class range_sstable_reader final : public mutation_reader::impl {
const query::partition_range& _pr;
lw_shared_ptr<sstables::sstable_set> _sstables;
@@ -327,25 +193,23 @@ class range_sstable_reader final : public mutation_reader::impl {
// Use a pointer instead of copying, so we don't need to regenerate the reader if
// the priority changes.
const io_priority_class& _pc;
tracing::trace_state_ptr _trace_state;
query::clustering_key_filtering_context _ck_filtering;
public:
range_sstable_reader(schema_ptr s,
lw_shared_ptr<sstables::sstable_set> sstables,
const query::partition_range& pr,
const query::partition_slice& slice,
const io_priority_class& pc,
tracing::trace_state_ptr trace_state)
query::clustering_key_filtering_context ck_filtering,
const io_priority_class& pc)
: _pr(pr)
, _sstables(std::move(sstables))
, _pc(pc)
, _trace_state(std::move(trace_state))
, _ck_filtering(ck_filtering)
{
std::vector<mutation_reader> readers;
for (const lw_shared_ptr<sstables::sstable>& sst : _sstables->select(pr)) {
tracing::trace(_trace_state, "Reading partition range {} from sstable {}", _pr, seastar::value_of([&sst] { return sst->get_filename(); }));
// FIXME: make sstable::read_range_rows() return ::mutation_reader so that we can drop this wrapper.
mutation_reader reader =
make_mutation_reader<sstable_range_wrapping_reader>(sst, s, pr, slice, _pc);
make_mutation_reader<sstable_range_wrapping_reader>(sst, s, pr, _ck_filtering, _pc);
if (sst->is_shared()) {
reader = make_filtering_reader(std::move(reader), belongs_to_current_shard);
}
@@ -362,48 +226,37 @@ public:
};
class single_key_sstable_reader final : public mutation_reader::impl {
column_family* _cf;
schema_ptr _schema;
dht::ring_position _rp;
sstables::key _key;
std::vector<streamed_mutation> _mutations;
bool _done = false;
lw_shared_ptr<sstables::sstable_set> _sstables;
utils::estimated_histogram& _sstable_histogram;
// Use a pointer instead of copying, so we don't need to regenerate the reader if
// the priority changes.
const io_priority_class& _pc;
const query::partition_slice& _slice;
tracing::trace_state_ptr _trace_state;
query::clustering_key_filtering_context _ck_filtering;
public:
single_key_sstable_reader(column_family* cf,
schema_ptr schema,
single_key_sstable_reader(schema_ptr schema,
lw_shared_ptr<sstables::sstable_set> sstables,
utils::estimated_histogram& sstable_histogram,
const partition_key& key,
const query::partition_slice& slice,
const io_priority_class& pc,
tracing::trace_state_ptr trace_state)
: _cf(cf)
, _schema(std::move(schema))
query::clustering_key_filtering_context ck_filtering,
const io_priority_class& pc)
: _schema(std::move(schema))
, _rp(dht::global_partitioner().decorate_key(*_schema, key))
, _key(sstables::key::from_partition_key(*_schema, key))
, _sstables(std::move(sstables))
, _sstable_histogram(sstable_histogram)
, _pc(pc)
, _slice(slice)
, _trace_state(std::move(trace_state))
, _ck_filtering(ck_filtering)
{ }
virtual future<streamed_mutation_opt> operator()() override {
if (_done) {
return make_ready_future<streamed_mutation_opt>();
}
auto candidates = filter_sstable_for_reader(_sstables->select(query::partition_range(_rp)), *_cf, _schema, _key, _slice);
return parallel_for_each(std::move(candidates),
return parallel_for_each(_sstables->select(query::partition_range(_rp)),
[this](const lw_shared_ptr<sstables::sstable>& sstable) {
tracing::trace(_trace_state, "Reading key {} from sstable {}", *_rp.key(), seastar::value_of([&sstable] { return sstable->get_filename(); }));
return sstable->read_row(_schema, _key, _slice, _pc).then([this](auto smo) {
return sstable->read_row(_schema, _key, _ck_filtering, _pc).then([this](auto smo) {
if (smo) {
_mutations.emplace_back(std::move(*smo));
}
@@ -413,7 +266,6 @@ public:
if (_mutations.empty()) {
return { };
}
_sstable_histogram.add(_mutations.size());
return merge_mutations(std::move(_mutations));
});
}
@@ -422,13 +274,12 @@ public:
mutation_reader
column_family::make_sstable_reader(schema_ptr s,
const query::partition_range& pr,
const query::partition_slice& slice,
const io_priority_class& pc,
tracing::trace_state_ptr trace_state) const {
query::clustering_key_filtering_context ck_filtering,
const io_priority_class& pc) const {
// restricts a reader's concurrency if the configuration specifies it
auto restrict_reader = [&] (mutation_reader&& in) {
auto&& config = [this, &pc] () -> const restricted_mutation_reader_config& {
if (service::get_local_streaming_read_priority().id() == pc.id()) {
if (service::get_local_streaming_read_priority() == pc) {
return _config.streaming_read_concurrency_config;
}
return _config.read_concurrency_config;
@@ -445,11 +296,10 @@ column_family::make_sstable_reader(schema_ptr s,
if (dht::shard_of(pos.token()) != engine().cpu_id()) {
return make_empty_reader(); // range doesn't belong to this shard
}
return restrict_reader(make_mutation_reader<single_key_sstable_reader>(const_cast<column_family*>(this), std::move(s), _sstables,
_stats.estimated_sstable_per_read, *pos.key(), slice, pc, std::move(trace_state)));
return restrict_reader(make_mutation_reader<single_key_sstable_reader>(std::move(s), _sstables, *pos.key(), ck_filtering, pc));
} else {
// range_sstable_reader is not movable so we need to wrap it
return restrict_reader(make_mutation_reader<range_sstable_reader>(std::move(s), _sstables, pr, slice, pc, std::move(trace_state)));
return restrict_reader(make_mutation_reader<range_sstable_reader>(std::move(s), _sstables, pr, ck_filtering, pc));
}
}
@@ -511,9 +361,8 @@ column_family::find_row(schema_ptr s, const dht::decorated_key& partition_key, c
mutation_reader
column_family::make_reader(schema_ptr s,
const query::partition_range& range,
const query::partition_slice& slice,
const io_priority_class& pc,
tracing::trace_state_ptr trace_state) const {
const query::clustering_key_filtering_context& ck_filtering,
const io_priority_class& pc) const {
if (query::is_wrap_around(range, *s)) {
// make_combined_reader() can't handle streams that wrap around yet.
fail(unimplemented::cause::WRAP_AROUND);
@@ -543,40 +392,18 @@ column_family::make_reader(schema_ptr s,
// https://github.com/scylladb/scylla/issues/185
for (auto&& mt : *_memtables) {
readers.emplace_back(mt->make_reader(s, range, slice, pc));
readers.emplace_back(mt->make_reader(s, range, ck_filtering, pc));
}
if (_config.enable_cache) {
readers.emplace_back(_cache.make_reader(s, range, slice, pc, std::move(trace_state)));
readers.emplace_back(_cache.make_reader(s, range, ck_filtering, pc));
} else {
readers.emplace_back(make_sstable_reader(s, range, slice, pc, std::move(trace_state)));
readers.emplace_back(make_sstable_reader(s, range, ck_filtering, pc));
}
return make_combined_reader(std::move(readers));
}
mutation_reader
column_family::make_streaming_reader(schema_ptr s,
const query::partition_range& range) const {
auto& slice = query::full_slice;
auto& pc = service::get_local_streaming_read_priority();
if (query::is_wrap_around(range, *s)) {
// make_combined_reader() can't handle streams that wrap around yet.
fail(unimplemented::cause::WRAP_AROUND);
}
std::vector<mutation_reader> readers;
readers.reserve(_memtables->size() + 1);
for (auto&& mt : *_memtables) {
readers.emplace_back(mt->make_reader(s, range, slice, pc));
}
readers.emplace_back(make_sstable_reader(s, range, slice, pc, nullptr));
return make_combined_reader(std::move(readers));
}
// Not performance critical. Currently used for testing only.
template <typename Func>
future<bool>
@@ -666,11 +493,7 @@ protected:
});
}
future<> done() {
return _listing.done().then([this] {
return _f.close();
});
}
future<> done() { return _listing.done(); }
private:
future<directory_entry> guarantee_type(directory_entry de) {
if (de.type) {
@@ -798,7 +621,7 @@ future<sstables::entry_descriptor> column_family::probe_file(sstring sstdir, sst
}
return load_sstable(sstables::sstable(
_schema, sstdir, comps.generation,
_schema->ks_name(), _schema->cf_name(), sstdir, comps.generation,
comps.version, comps.format)).then_wrapped([fname, comps] (future<> f) {
try {
f.get();
@@ -891,7 +714,7 @@ column_family::seal_active_streaming_memtable_immediate() {
_config.streaming_dirty_memory_manager->serialize_flush([this, old] {
return with_lock(_sstables_lock.for_read(), [this, old] {
auto newtab = make_lw_shared<sstables::sstable>(_schema,
auto newtab = make_lw_shared<sstables::sstable>(_schema->ks_name(), _schema->cf_name(),
_config.datadir, calculate_generation_for_new_table(),
sstables::sstable::version_types::ka,
sstables::sstable::format_types::big);
@@ -946,7 +769,7 @@ future<> column_family::seal_active_streaming_memtable_big(streaming_memtable_bi
return with_gate(_streaming_flush_gate, [this, old, &smb] {
return with_gate(smb.flush_in_progress, [this, old, &smb] {
return with_lock(_sstables_lock.for_read(), [this, old, &smb] {
auto newtab = make_lw_shared<sstables::sstable>(_schema,
auto newtab = make_lw_shared<sstables::sstable>(_schema->ks_name(), _schema->cf_name(),
_config.datadir, calculate_generation_for_new_table(),
sstables::sstable::version_types::ka,
sstables::sstable::format_types::big);
@@ -982,11 +805,6 @@ column_family::seal_active_memtable(memtable_list::flush_behavior ignored) {
_highest_flushed_rp = old->replay_position();
return _flush_queue->run_cf_flush(old->replay_position(), [old, this] {
auto memtable_size = old->occupancy().total_space();
_config.cf_stats->pending_memtables_flushes_count++;
_config.cf_stats->pending_memtables_flushes_bytes += memtable_size;
return _config.dirty_memory_manager->serialize_flush([this, old] {
return repeat([this, old] {
return with_lock(_sstables_lock.for_read(), [this, old] {
@@ -994,9 +812,6 @@ column_family::seal_active_memtable(memtable_list::flush_behavior ignored) {
return try_flush_memtable_to_sstable(old);
});
});
}).then([this, memtable_size] {
_config.cf_stats->pending_memtables_flushes_count--;
_config.cf_stats->pending_memtables_flushes_bytes -= memtable_size;
});
}, [old, this] {
if (_commitlog) {
@@ -1011,11 +826,15 @@ future<stop_iteration>
column_family::try_flush_memtable_to_sstable(lw_shared_ptr<memtable> old) {
auto gen = calculate_generation_for_new_table();
auto newtab = make_lw_shared<sstables::sstable>(_schema,
auto newtab = make_lw_shared<sstables::sstable>(_schema->ks_name(), _schema->cf_name(),
_config.datadir, gen,
sstables::sstable::version_types::ka,
sstables::sstable::format_types::big);
auto memtable_size = old->occupancy().total_space();
_config.cf_stats->pending_memtables_flushes_count++;
_config.cf_stats->pending_memtables_flushes_bytes += memtable_size;
newtab->set_unshared();
dblog.debug("Flushing to {}", newtab->get_filename());
// Note that due to our sharded architecture, it is possible that
@@ -1032,7 +851,9 @@ column_family::try_flush_memtable_to_sstable(lw_shared_ptr<memtable> old) {
auto&& priority = service::get_local_memtable_flush_priority();
return newtab->write_components(*old, incremental_backups_enabled(), priority).then([this, newtab, old] {
return newtab->open_data();
}).then_wrapped([this, old, newtab] (future<> ret) {
}).then_wrapped([this, old, newtab, memtable_size] (future<> ret) {
_config.cf_stats->pending_memtables_flushes_count--;
_config.cf_stats->pending_memtables_flushes_bytes -= memtable_size;
dblog.debug("Flushing to {} done", newtab->get_filename());
try {
ret.get();
@@ -1100,7 +921,7 @@ future<std::vector<sstables::entry_descriptor>> column_family::flush_upload_dir(
if (comps.component != sstables::sstable::component_type::TOC) {
return make_ready_future<>();
}
auto sst = make_lw_shared<sstables::sstable>(_schema,
auto sst = make_lw_shared<sstables::sstable>(_schema->ks_name(), _schema->cf_name(),
_config.datadir + "/upload", comps.generation,
comps.version, comps.format);
work.sstables.emplace(comps.generation, std::move(sst));
@@ -1155,7 +976,7 @@ column_family::reshuffle_sstables(std::set<int64_t> all_generations, int64_t sta
if (work.all_generations.count(comps.generation) != 0) {
return make_ready_future<>();
}
auto sst = make_lw_shared<sstables::sstable>(_schema,
auto sst = make_lw_shared<sstables::sstable>(_schema->ks_name(), _schema->cf_name(),
_config.datadir, comps.generation,
comps.version, comps.format);
work.sstables.emplace(comps.generation, std::move(sst));
@@ -1247,17 +1068,10 @@ column_family::rebuild_sstable_list(const std::vector<sstables::shared_sstable>&
// Second, delete the old sstables. This is done in the background, so we can
// consider this compaction completed.
seastar::with_gate(_sstable_deletion_gate, [this, sstables_to_remove] {
return sstables::delete_atomically(sstables_to_remove).then_wrapped([this, sstables_to_remove] (future<> f) {
std::exception_ptr eptr;
try {
f.get();
} catch(...) {
eptr = std::current_exception();
}
return sstables::delete_atomically(sstables_to_remove).then([this, sstables_to_remove] {
auto current_sstables = _sstables;
auto new_sstable_list = make_lw_shared<sstable_list>();
// unconditionally remove compacted sstables from _sstables_compacted_but_not_deleted,
// or they could stay forever in the set, resulting in deleted files remaining
// opened and disk space not being released until shutdown.
std::unordered_set<sstables::shared_sstable> s(
sstables_to_remove.begin(), sstables_to_remove.end());
auto e = boost::range::remove_if(_sstables_compacted_but_not_deleted, [&] (sstables::shared_sstable sst) -> bool {
@@ -1265,11 +1079,6 @@ column_family::rebuild_sstable_list(const std::vector<sstables::shared_sstable>&
});
_sstables_compacted_but_not_deleted.erase(e, _sstables_compacted_but_not_deleted.end());
rebuild_statistics();
if (eptr) {
return make_exception_future<>(eptr);
}
return make_ready_future<>();
}).handle_exception([] (std::exception_ptr e) {
try {
std::rethrow_exception(e);
@@ -1293,7 +1102,7 @@ column_family::compact_sstables(sstables::compaction_descriptor descriptor, bool
auto create_sstable = [this] {
auto gen = this->calculate_generation_for_new_table();
// FIXME: use "tmp" marker in names of incomplete sstable
auto sst = make_lw_shared<sstables::sstable>(_schema, _config.datadir, gen,
auto sst = make_lw_shared<sstables::sstable>(_schema->ks_name(), _schema->cf_name(), _config.datadir, gen,
sstables::sstable::version_types::ka,
sstables::sstable::format_types::big);
sst->set_unshared();
@@ -1301,7 +1110,7 @@ column_family::compact_sstables(sstables::compaction_descriptor descriptor, bool
};
return sstables::compact_sstables(*sstables_to_compact, *this, create_sstable, descriptor.max_sstable_bytes, descriptor.level,
cleanup).then([this, sstables_to_compact] (auto new_sstables) {
_compaction_strategy.notify_completion(*sstables_to_compact, new_sstables);
_compaction_strategy.notify_completion(_schema, *sstables_to_compact, new_sstables);
return this->rebuild_sstable_list(new_sstables, *sstables_to_compact);
});
});
@@ -1310,8 +1119,8 @@ column_family::compact_sstables(sstables::compaction_descriptor descriptor, bool
static bool needs_cleanup(const lw_shared_ptr<sstables::sstable>& sst,
const lw_shared_ptr<std::vector<range<dht::token>>>& owned_ranges,
schema_ptr s) {
auto first = sst->get_first_partition_key();
auto last = sst->get_last_partition_key();
auto first = sst->get_first_partition_key(*s);
auto last = sst->get_last_partition_key(*s);
auto first_token = dht::global_partitioner().get_token(*s, first);
auto last_token = dht::global_partitioner().get_token(*s, last);
range<dht::token> sst_token_range = range<dht::token>::make(first_token, last_token);
@@ -1344,7 +1153,7 @@ future<>
column_family::load_new_sstables(std::vector<sstables::entry_descriptor> new_tables) {
return parallel_for_each(new_tables, [this] (auto comps) {
return this->load_sstable(sstables::sstable(
_schema, _config.datadir,
_schema->ks_name(), _schema->cf_name(), _config.datadir,
comps.generation, comps.version, comps.format), true);
}).then([this] {
start_rewrite();
@@ -1637,34 +1446,6 @@ database::setup_collectd() {
, scollectd::make_typed(scollectd::data_type::GAUGE, _cf_stats.pending_memtables_flushes_bytes)
));
_collectd.push_back(
scollectd::add_polled_metric(scollectd::type_instance_id("database"
, scollectd::per_cpu_plugin_instance
, "total_operations", "clustering_filter")
, scollectd::make_typed(scollectd::data_type::DERIVE, _cf_stats.clustering_filter_count)
));
_collectd.push_back(
scollectd::add_polled_metric(scollectd::type_instance_id("database"
, scollectd::per_cpu_plugin_instance
, "total_operations", "clustering_filter")
, scollectd::make_typed(scollectd::data_type::DERIVE, _cf_stats.sstables_checked_by_clustering_filter)
));
_collectd.push_back(
scollectd::add_polled_metric(scollectd::type_instance_id("database"
, scollectd::per_cpu_plugin_instance
, "total_operations", "clustering_filter")
, scollectd::make_typed(scollectd::data_type::DERIVE, _cf_stats.clustering_filter_fast_path_count)
));
_collectd.push_back(
scollectd::add_polled_metric(scollectd::type_instance_id("database"
, scollectd::per_cpu_plugin_instance
, "total_operations", "clustering_filter")
, scollectd::make_typed(scollectd::data_type::DERIVE, _cf_stats.surviving_sstables_after_clustering_filter)
));
_collectd.push_back(
scollectd::add_polled_metric(scollectd::type_instance_id("database"
, scollectd::per_cpu_plugin_instance
@@ -2264,15 +2045,15 @@ struct query_state {
};
future<lw_shared_ptr<query::result>>
column_family::query(schema_ptr s, const query::read_command& cmd, query::result_request request, const std::vector<query::partition_range>& partition_ranges, tracing::trace_state_ptr trace_state) {
column_family::query(schema_ptr s, const query::read_command& cmd, query::result_request request, const std::vector<query::partition_range>& partition_ranges) {
utils::latency_counter lc;
_stats.reads.set_latency(lc);
auto qs_ptr = std::make_unique<query_state>(std::move(s), cmd, request, partition_ranges);
auto& qs = *qs_ptr;
{
return do_until(std::bind(&query_state::done, &qs), [this, &qs, trace_state = std::move(trace_state)] {
return do_until(std::bind(&query_state::done, &qs), [this, &qs] {
auto&& range = *qs.current_partition_range++;
return data_query(qs.schema, as_mutation_source(trace_state), range, qs.cmd.slice, qs.limit, qs.partition_limit,
return data_query(qs.schema, as_mutation_source(), range, qs.cmd.slice, qs.limit, qs.partition_limit,
qs.cmd.timestamp, qs.builder).then([&qs] (auto&& r) {
qs.limit -= r.live_rows;
qs.partition_limit -= r.partitions;
@@ -2290,28 +2071,28 @@ column_family::query(schema_ptr s, const query::read_command& cmd, query::result
}
mutation_source
column_family::as_mutation_source(tracing::trace_state_ptr trace_state) const {
return mutation_source([this, trace_state = std::move(trace_state)] (schema_ptr s,
column_family::as_mutation_source() const {
return mutation_source([this] (schema_ptr s,
const query::partition_range& range,
const query::partition_slice& slice,
query::clustering_key_filtering_context ck_filtering,
const io_priority_class& pc) {
return this->make_reader(std::move(s), range, slice, pc, std::move(trace_state));
return this->make_reader(std::move(s), range, ck_filtering, pc);
});
}
future<lw_shared_ptr<query::result>>
database::query(schema_ptr s, const query::read_command& cmd, query::result_request request, const std::vector<query::partition_range>& ranges, tracing::trace_state_ptr trace_state) {
database::query(schema_ptr s, const query::read_command& cmd, query::result_request request, const std::vector<query::partition_range>& ranges) {
column_family& cf = find_column_family(cmd.cf_id);
return cf.query(std::move(s), cmd, request, ranges, std::move(trace_state)).then([this, s = _stats] (auto&& res) {
return cf.query(std::move(s), cmd, request, ranges).then([this, s = _stats] (auto&& res) {
++s->total_reads;
return std::move(res);
});
}
future<reconcilable_result>
database::query_mutations(schema_ptr s, const query::read_command& cmd, const query::partition_range& range, tracing::trace_state_ptr trace_state) {
database::query_mutations(schema_ptr s, const query::read_command& cmd, const query::partition_range& range) {
column_family& cf = find_column_family(cmd.cf_id);
return mutation_query(std::move(s), cf.as_mutation_source(std::move(trace_state)), range, cmd.slice, cmd.row_limit, cmd.partition_limit,
return mutation_query(std::move(s), cf.as_mutation_source(), range, cmd.slice, cmd.row_limit, cmd.partition_limit,
cmd.timestamp).then([this, s = _stats] (auto&& res) {
++s->total_reads;
return std::move(res);
@@ -3012,23 +2793,15 @@ future<std::unordered_map<sstring, column_family::snapshot_details>> column_fami
}
future<> column_family::flush() {
// FIXME: this will synchronously wait for this write to finish, but doesn't guarantee
// anything about previous writes.
_stats.pending_flushes++;
auto fut = _memtables->seal_active_memtable(memtable_list::flush_behavior::immediate);
// this rp is either:
// a.) Done - no-op
// b.) Ours
// c.) The last active flush not finished. If our latest memtable is
// empty it still makes sense for this api call to wait for this.
auto high_rp = _highest_flushed_rp;
return fut.finally([this, high_rp] {
return _memtables->seal_active_memtable(memtable_list::flush_behavior::immediate).finally([this]() mutable {
_stats.pending_flushes--;
// In origin memtable_switch_count is incremented inside
// ColumnFamilyMeetrics Flush.run
_stats.memtable_switch_count++;
// wait for all up until us.
return _flush_queue->wait_for_pending(high_rp);
return make_ready_future<>();
});
}


@@ -67,13 +67,12 @@
#include "sstables/compaction_manager.hh"
#include "utils/exponential_backoff_retry.hh"
#include "utils/histogram.hh"
#include "utils/estimated_histogram.hh"
#include "sstables/estimated_histogram.hh"
#include "sstables/compaction.hh"
#include "sstables/sstable_set.hh"
#include "key_reader.hh"
#include <seastar/core/rwlock.hh>
#include <seastar/core/shared_future.hh>
#include "tracing/trace_state.hh"
class frozen_mutation;
class reconcilable_result;
@@ -300,15 +299,6 @@ using sstable_list = sstables::sstable_list;
struct cf_stats {
int64_t pending_memtables_flushes_count = 0;
int64_t pending_memtables_flushes_bytes = 0;
// number of time the clustering filter was executed
int64_t clustering_filter_count = 0;
// sstables considered by the filter (so dividing this by the previous one we get average sstables per read)
int64_t sstables_checked_by_clustering_filter = 0;
// number of times the filter passed the fast-path checks
int64_t clustering_filter_fast_path_count = 0;
// how many sstables survived the clustering key checks
int64_t surviving_sstables_after_clustering_filter = 0;
};
class column_family {
@@ -342,9 +332,9 @@ public:
int64_t pending_compactions = 0;
utils::timed_rate_moving_average_and_histogram reads{256};
utils::timed_rate_moving_average_and_histogram writes{256};
utils::estimated_histogram estimated_read;
utils::estimated_histogram estimated_write;
utils::estimated_histogram estimated_sstable_per_read{35};
sstables::estimated_histogram estimated_read;
sstables::estimated_histogram estimated_write;
sstables::estimated_histogram estimated_sstable_per_read;
utils::timed_rate_moving_average_and_histogram tombstone_scanned;
utils::timed_rate_moving_average_and_histogram live_scanned;
};
@@ -356,7 +346,7 @@ public:
private:
schema_ptr _schema;
config _config;
mutable stats _stats;
stats _stats;
lw_shared_ptr<memtable_list> _memtables;
@@ -478,9 +468,8 @@ private:
// Mutations returned by the reader will all have given schema.
mutation_reader make_sstable_reader(schema_ptr schema,
const query::partition_range& range,
const query::partition_slice& slice,
const io_priority_class& pc,
tracing::trace_state_ptr trace_state) const;
query::clustering_key_filtering_context ck_filtering,
const io_priority_class& pc) const;
mutation_source sstables_as_mutation_source();
key_source sstables_as_key_source() const;
@@ -518,18 +507,10 @@ public:
// will be scheduled under the priority class given by pc.
mutation_reader make_reader(schema_ptr schema,
const query::partition_range& range = query::full_partition_range,
const query::partition_slice& slice = query::full_slice,
const io_priority_class& pc = default_priority_class(),
tracing::trace_state_ptr trace_state = nullptr) const;
const query::clustering_key_filtering_context& ck_filtering = query::no_clustering_key_filtering,
const io_priority_class& pc = default_priority_class()) const;
// The streaming mutation reader differs from the regular mutation reader in that:
// - Reflects all writes accepted by replica prior to creation of the
// reader and a _bounded_ amount of writes which arrive later.
// - Does not populate the cache
mutation_reader make_streaming_reader(schema_ptr schema,
const query::partition_range& range = query::full_partition_range) const;
mutation_source as_mutation_source(tracing::trace_state_ptr trace_state) const;
mutation_source as_mutation_source() const;
// Queries can be satisfied from multiple data sources, so they are returned
// as temporaries.
@@ -571,8 +552,7 @@ public:
// Returns at most "cmd.limit" rows
future<lw_shared_ptr<query::result>> query(schema_ptr,
const query::read_command& cmd, query::result_request request,
const std::vector<query::partition_range>& ranges,
tracing::trace_state_ptr trace_state);
const std::vector<query::partition_range>& ranges);
future<> populate(sstring datadir);
@@ -693,10 +673,6 @@ public:
return _stats;
}
::cf_stats* cf_stats() {
return _config.cf_stats;
}
compaction_manager& get_compaction_manager() const {
return _compaction_manager;
}
@@ -1069,8 +1045,8 @@ public:
unsigned shard_of(const dht::token& t);
unsigned shard_of(const mutation& m);
unsigned shard_of(const frozen_mutation& m);
future<lw_shared_ptr<query::result>> query(schema_ptr, const query::read_command& cmd, query::result_request request, const std::vector<query::partition_range>& ranges, tracing::trace_state_ptr trace_state);
future<reconcilable_result> query_mutations(schema_ptr, const query::read_command& cmd, const query::partition_range& range, tracing::trace_state_ptr trace_state);
future<lw_shared_ptr<query::result>> query(schema_ptr, const query::read_command& cmd, query::result_request request, const std::vector<query::partition_range>& ranges);
future<reconcilable_result> query_mutations(schema_ptr, const query::read_command& cmd, const query::partition_range& range);
future<> apply(schema_ptr, const frozen_mutation&);
future<> apply_streaming_mutation(schema_ptr, utils::UUID plan_id, const frozen_mutation&, bool fragmented);
keyspace::config make_keyspace_config(const keyspace_metadata& ksm);


@@ -163,7 +163,7 @@ public:
bool _shutdown = false;
semaphore _new_segment_semaphore {1};
semaphore _new_segment_semaphore;
semaphore _write_semaphore;
semaphore _flush_semaphore;
@@ -453,7 +453,7 @@ public:
segment(::shared_ptr<segment_manager> m, const descriptor& d, file && f, bool active)
: _segment_manager(std::move(m)), _desc(std::move(d)), _file(std::move(f)),
_file_name(_segment_manager->cfg.commit_log_location + "/" + _desc.filename()), _sync_time(
clock_type::now()), _pending_ops(true) // want exception propagation
clock_type::now())
{
++_segment_manager->totals.segments_created;
logger.debug("Created new {} segment {}", active ? "active" : "reserve", *this);
@@ -762,9 +762,9 @@ public:
auto me = shared_from_this();
++_segment_manager->totals.pending_allocations;
logger.trace("Previous allocation is blocking. Must wait.");
return _pending_ops.wait_for_pending().then_wrapped([me](auto f) { // TODO: do we need a finally?
return _pending_ops.wait_for_pending().then([me] { // TODO: do we need a finally?
--me->_segment_manager->totals.pending_allocations;
return f.failed() ? me->_segment_manager->active_segment() : make_ready_future<sseg_ptr>(me);
return make_ready_future<sseg_ptr>(me);
});
}
@@ -793,13 +793,6 @@ public:
});
}
return me->sync();
}).handle_exception([me, fp](auto p) {
// If we get an IO exception (which we assume this is)
// we should close the segment.
// TODO: should we also trunctate away any partial write
// we did?
me->_closed = true; // just mark segment as closed, no writes will be done.
return make_exception_future<sseg_ptr>(p);
});
}
/**
@@ -1557,15 +1550,6 @@ db::commitlog::read_log_file(const sstring& filename, commit_load_reader_func ne
subscription<temporary_buffer<char>, db::replay_position>
db::commitlog::read_log_file(file f, commit_load_reader_func next, position_type off) {
struct work {
private:
file_input_stream_options make_file_input_stream_options() {
file_input_stream_options fo;
fo.buffer_size = db::commitlog::segment::default_size;
fo.read_ahead = 10;
fo.io_priority_class = service::get_local_commitlog_priority();
return fo;
}
public:
file f;
stream<temporary_buffer<char>, replay_position> s;
input_stream<char> fin;
@@ -1581,7 +1565,7 @@ db::commitlog::read_log_file(file f, commit_load_reader_func next, position_type
bool header = true;
work(file f, position_type o = 0)
: f(f), fin(make_file_input_stream(f, o, make_file_input_stream_options())), start_off(o) {
: f(f), fin(make_file_input_stream(f)), start_off(o) {
}
work(work&&) = default;
@@ -1764,8 +1748,6 @@ db::commitlog::read_log_file(file f, commit_load_reader_func next, position_type
throw segment_data_corruption_error("Data corruption", corrupt_size);
}
});
}).finally([this] {
return fin.close();
});
}
};


@@ -73,7 +73,7 @@ void commitlog_entry_writer::compute_size() {
}
void commitlog_entry_writer::write(data_output& out) const {
seastar::simple_output_stream str(out.reserve(size()), size());
seastar::simple_output_stream str(out.reserve(size()));
ser::serialize(str, get_entry());
}


@@ -399,8 +399,6 @@ public:
"The IP address a node tells other nodes in the cluster to contact it by. It allows public and private address to be different. For example, use the broadcast_address parameter in topologies where not all nodes have access to other nodes by their private IP addresses.\n" \
"If your Scylla cluster is deployed across multiple Amazon EC2 regions and you use the EC2MultiRegionSnitch , set the broadcast_address to public IP address of the node and the listen_address to the private IP." \
) \
val(listen_on_broadcast_address, bool, false, Used, "When using multiple physical network interfaces, set this to true to listen on broadcast_address in addition to the listen_address, allowing nodes to communicate in both interfaces. Ignore this property if the network configuration automatically routes between the public and private networks such as EC2." \
)\
val(initial_token, sstring, /* N/A */, Used, \
"Used in the single-node-per-token architecture, where a node owns exactly one contiguous range in the ring space. Setting this property overrides num_tokens.\n" \
"If you not using vnodes or have num_tokens set it to 1 or unspecified (#num_tokens), you should always specify this parameter when setting up a production cluster for the first time and when adding capacity. For more information, see this parameter in the Cassandra 1.1 Node and Cluster Configuration documentation.\n" \
@@ -734,9 +732,6 @@ public:
val(skip_wait_for_gossip_to_settle, int32_t, -1, Used, "An integer to configure the wait for gossip to settle. -1: wait normally, 0: do not wait at all, n: wait for at most n polls. Same as -Dcassandra.skip_wait_for_gossip_to_settle in cassandra.") \
val(experimental, bool, false, Used, "Set to true to unlock experimental features.") \
val(lsa_reclamation_step, size_t, 1, Used, "Minimum number of segments to reclaim in a single step") \
val(prometheus_port, uint16_t, 9180, Used, "Prometheus port, set to zero to disable") \
val(prometheus_address, sstring, "0.0.0.0", Used, "Prometheus listening address") \
val(abort_on_lsa_bad_alloc, bool, false, Used, "Abort when allocation in LSA region fails") \
/* done! */
#define _make_value_member(name, type, deflt, status, desc, ...) \


@@ -494,7 +494,7 @@ read_schema_partition_for_table(distributed<service::storage_proxy>& proxy, cons
return query_partition_mutation(proxy.local(), std::move(schema), std::move(cmd), std::move(keyspace_key));
}
static semaphore the_merge_lock {1};
static semaphore the_merge_lock;
future<> merge_lock() {
// ref: #1088
@@ -1213,8 +1213,8 @@ schema_mutations make_table_mutations(schema_ptr table, api::timestamp_type time
m.set_clustered_cell(ckey, "key_aliases", alias(table->partition_key_columns()), timestamp);
m.set_clustered_cell(ckey, "column_aliases", alias(table->clustering_key_columns()), timestamp);
if (table->is_dense()) {
m.set_clustered_cell(ckey, "value_alias", table->regular_begin()->name_as_text(), timestamp);
if (table->compact_columns_count() == 1) {
m.set_clustered_cell(ckey, "value_alias", table->compact_column().name_as_text(), timestamp);
} // null if none
map_type_impl::mutation dropped_columns;
@@ -1593,6 +1593,7 @@ sstring serialize_kind(column_kind kind)
case column_kind::clustering_key: return "clustering_key";
case column_kind::static_column: return "static";
case column_kind::regular_column: return "regular";
case column_kind::compact_column: return "compact_value";
default: throw std::invalid_argument("unknown column kind");
}
}
@@ -1606,8 +1607,8 @@ column_kind deserialize_kind(sstring kind) {
return column_kind::static_column;
} else if (kind == "regular") {
return column_kind::regular_column;
} else if (kind == "compact_value") { // backward compatibility
return column_kind::regular_column;
} else if (kind == "compact_value") {
return column_kind::compact_column;
} else {
throw std::invalid_argument("unknown column kind: " + kind);
}


@@ -75,7 +75,7 @@ static std::vector<db::system_keyspace::range_estimates> estimates_for(const col
// Each range defines both bounds.
for (auto& range : local_ranges) {
int64_t count{0};
utils::estimated_histogram hist{0};
sstables::estimated_histogram hist{0};
unwrapped.clear();
if (range.is_wrap_around(dht::ring_position_comparator(*cf.schema()))) {
auto uw = range.unwrap();
@@ -85,10 +85,10 @@ static std::vector<db::system_keyspace::range_estimates> estimates_for(const col
unwrapped.push_back(range);
}
for (auto&& uwr : unwrapped) {
for (auto&& sstable : cf.select_sstables(uwr)) {
count += sstable->get_estimated_key_count();
hist.merge(sstable->get_stats_metadata().estimated_row_size);
}
for (auto&& sstable : cf.select_sstables(uwr)) {
count += sstable->get_estimated_key_count();
hist.merge(sstable->get_stats_metadata().estimated_row_size);
}
}
estimates.emplace_back(db::system_keyspace::range_estimates{
range.start()->value().token(),


@@ -1,234 +0,0 @@
/*
* Copyright (C) 2016 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "md5_hasher.hh"
#include "random_partitioner.hh"
#include "utils/class_registrator.hh"
#include <boost/multiprecision/cpp_int.hpp>
namespace dht {
static const boost::multiprecision::uint128_t cppint_one{1};
static const boost::multiprecision::uint128_t cppint127_max = cppint_one << 127;
// Convert token's byte array to integer value.
static boost::multiprecision::uint128_t token_to_cppint(const token& t) {
boost::multiprecision::uint128_t ret{0};
// If the token is minimum token, token._data will be empty,
// zero will be returned
for (uint8_t d : t._data) {
ret = (ret << 8) + d;
}
return ret;
}
// Store integer value for the token into token's byte array. The value must be within [0, 2 ^ 127].
static token cppint_to_token(boost::multiprecision::uint128_t i) {
if (i == 0) {
return minimum_token();
}
if (i > cppint127_max) {
throw std::runtime_error(sprint("RandomPartitioner value %s must be within [0, 2 ^ 127]", i));
}
std::vector<int8_t> t;
while (i) {
static boost::multiprecision::uint128_t byte_mask = 0xFF;
auto data = (i & byte_mask).convert_to<uint8_t>();
t.push_back(data);
i >>= 8;
}
std::reverse(t.begin(), t.end());
return token(token::kind::key, managed_bytes(t.data(), t.size()));
}
// Convert a 16 bytes long raw byte array to token. Byte 0 is the most significant byte.
static token bytes_to_token(bytes digest) {
if (digest.size() != 16) {
throw std::runtime_error(sprint("RandomPartitioner digest should be 16 bytes, it is %d", digest.size()));
}
// Translates the bytes array to signed integer i,
// abs(i) is stored in token's _data array.
if (digest[0] & 0x80) {
boost::multiprecision::uint128_t i = 0;
for (uint8_t d : digest) {
i = (i << 8) + d;
}
// i = abs(i) = ~i + 1
i = ~i + 1;
return cppint_to_token(i);
} else {
return token(token::kind::key, std::move(digest));
}
}
static float ratio_helper(boost::multiprecision::uint128_t a, boost::multiprecision::uint128_t b) {
boost::multiprecision::uint128_t val;
if (a >= b) {
val = a - b;
} else {
val = cppint127_max - (b - a);
}
return static_cast<float>(val.convert_to<double>() * 0x1p-127);
}
token random_partitioner::get_token(bytes data) {
md5_hasher h;
h.update(reinterpret_cast<const char*>(data.c_str()), data.size());
return bytes_to_token(h.finalize());
}
token random_partitioner::get_token(const schema& s, partition_key_view key) {
auto&& legacy = key.legacy_form(s);
return get_token(bytes(legacy.begin(), legacy.end()));
}
token random_partitioner::get_token(const sstables::key_view& key) {
auto v = bytes_view(key);
if (v.empty()) {
return minimum_token();
}
return get_token(bytes(v.begin(), v.end()));
}
int random_partitioner::tri_compare(const token& t1, const token& t2) {
auto l1 = token_to_cppint(t1);
auto l2 = token_to_cppint(t2);
if (l1 == l2) {
return 0;
} else {
return l1 < l2 ? -1 : 1;
}
}
token random_partitioner::get_random_token() {
boost::multiprecision::uint128_t i = dht::get_random_number<uint64_t>();
i = (i << 64) + dht::get_random_number<uint64_t>();
if (i > cppint127_max) {
i = ~i + 1;
}
return cppint_to_token(i);
}
std::map<token, float> random_partitioner::describe_ownership(const std::vector<token>& sorted_tokens) {
std::map<token, float> ownerships;
auto i = sorted_tokens.begin();
// 0-case
if (i == sorted_tokens.end()) {
throw runtime_exception("No nodes present in the cluster. Has this node finished starting up?");
}
// 1-case
if (sorted_tokens.size() == 1) {
ownerships[sorted_tokens[0]] = 1.0;
// n-case
} else {
const token& start = sorted_tokens[0];
auto ti = token_to_cppint(start); // The first token and its value
auto cppint_start = ti;
auto tim1 = ti; // The last token and its value (after loop)
for (i++; i != sorted_tokens.end(); i++) {
ti = token_to_cppint(*i); // The next token and its value
ownerships[*i]= ratio_helper(ti, tim1); // save (T(i) -> %age)
tim1 = ti;
}
// The start token's range extends backward to the last token, which is why both were saved above.
ownerships[start] = ratio_helper(cppint_start, ti);
}
return ownerships;
}
token random_partitioner::midpoint(const token& t1, const token& t2) const {
unsigned sigbytes = std::max(t1._data.size(), t2._data.size());
if (sigbytes == 0) {
// The midpoint of two minimum token is minimum token
return minimum_token();
}
static boost::multiprecision::uint128_t max = cppint_one << 127;
auto l1 = token_to_cppint(t1);
auto l2 = token_to_cppint(t2);
auto sum = l1 + l2;
boost::multiprecision::uint128_t mid;
// t1 <= t2 is the same as l1 <= l2
if (l1 <= l2) {
mid = sum / 2;
} else {
mid = (sum / 2 + max / 2) % max;
}
return cppint_to_token(mid);
}
sstring random_partitioner::to_sstring(const dht::token& t) const {
if (t._kind == dht::token::kind::before_all_keys) {
return sstring();
} else {
return token_to_cppint(t).str();
}
}
dht::token random_partitioner::from_sstring(const sstring& t) const {
if (t.empty()) {
return minimum_token();
} else {
boost::multiprecision::uint128_t x(t.c_str());
return cppint_to_token(x);
}
}
unsigned random_partitioner::shard_of(const token& t) const {
switch (t._kind) {
case token::kind::before_all_keys:
return 0;
case token::kind::after_all_keys:
return smp::count - 1;
case token::kind::key:
auto i = (boost::multiprecision::uint256_t(token_to_cppint(t)) * smp::count) >> 127;
// token can be [0, 2^127], make sure smp be [0, smp::count)
auto smp = i.convert_to<unsigned>();
if (smp >= smp::count) {
return smp::count - 1;
}
return smp;
}
assert(0);
}
bytes random_partitioner::token_to_bytes(const token& t) const {
static const bytes zero_byte(1, int8_t(0x00));
if (t.is_minimum() || t._data.empty()) {
return zero_byte;
}
auto data = bytes(t._data.begin(), t._data.end());
if (t._data[0] & 0x80) {
// Prepend 0x00 to the byte array to mimic BigInteger.toByteArray's
// byte array representation which has a sign bit.
return zero_byte + data;
}
return data;
}
using registry = class_registrator<i_partitioner, random_partitioner>;
static registry registrator("org.apache.cassandra.dht.RandomPartitioner");
static registry registrator_short_name("RandomPartitioner");
}


@@ -1,50 +0,0 @@
/*
* Copyright (C) 2016 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "i_partitioner.hh"
#include "bytes.hh"
#include "sstables/key.hh"
namespace dht {
class random_partitioner final : public i_partitioner {
public:
virtual const sstring name() { return "org.apache.cassandra.dht.RandomPartitioner"; }
virtual token get_token(const schema& s, partition_key_view key) override;
virtual token get_token(const sstables::key_view& key) override;
virtual token get_random_token() override;
virtual bool preserves_order() override { return false; }
virtual std::map<token, float> describe_ownership(const std::vector<token>& sorted_tokens) override;
virtual data_type get_token_validator() override { return varint_type; }
virtual bytes token_to_bytes(const token& t) const override;
virtual int tri_compare(const token& t1, const token& t2) override;
virtual token midpoint(const token& t1, const token& t2) const;
virtual sstring to_sstring(const dht::token& t) const override;
virtual dht::token from_sstring(const sstring& t) const override;
virtual unsigned shard_of(const token& t) const override;
private:
token get_token(bytes data);
};
}


@@ -33,7 +33,7 @@ done
. /etc/os-release
case "$ID" in
"centos")
AMI=ami-4e1d5b59
AMI=ami-f3102499
REGION=us-east-1
SSH_USERNAME=centos
;;


@@ -1 +0,0 @@
/usr/lib/scylla/node_exporter_install


@@ -1,56 +0,0 @@
#!/bin/sh
#
# Copyright 2016 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
if [ "`id -u`" -ne 0 ]; then
echo "Requires root permission."
exit 1
fi
if [ -f /usr/bin/node_exporter ]; then
echo "node_exporter already installed"
exit 1
fi
version=0.12.0
dir=/usr/lib/scylla/Prometheus/node_exporter
mkdir -p $dir
cd $dir
wget https://github.com/prometheus/node_exporter/releases/download/$version/node_exporter-$version.linux-amd64.tar.gz -O $dir/node_exporter-$version.linux-amd64.tar.gz
tar -xvzf $dir/node_exporter-$version.linux-amd64.tar.gz
rm $dir/node_exporter-$version.linux-amd64.tar.gz
ln -s $dir/node_exporter-$version.linux-amd64/node_exporter /usr/bin
. /etc/os-release
if [ "$(cat /proc/1/comm)" = "systemd" ]; then
systemctl enable node-exporter
systemctl start node-exporter
else
cat <<EOT >> /etc/init/node_exporter.conf
# Run node_exporter
start on startup
script
/usr/bin/node_exporter
end script
EOT
service node_exporter start
fi
printf "node_exporter successfully installed\n"


@@ -22,11 +22,7 @@ while [ $# -gt 0 ]; do
done
. /etc/os-release
if [ "$ID" = "ubuntu" ]; then
. /etc/default/scylla-server
else
. /etc/sysconfig/scylla-server
fi
if [ ! -f /etc/default/grub ]; then
echo "Unsupported bootloader"
exit 1
@@ -44,7 +40,7 @@ else
fi
mv /tmp/grub /etc/default/grub
if [ "$ID" = "ubuntu" ] || [ "$ID" = "debian" ]; then
update-grub
grub-mkconfig -o /boot/grub/grub.cfg
else
grub2-mkconfig -o /boot/grub2/grub.cfg
fi


@@ -44,8 +44,8 @@ output_to_user()
}
if [ `is_developer_mode` -eq 0 ]; then
SMP=`echo $CPUSET|grep smp|sed -e "s/^.*smp\(\s\+\|=\)\([^ ]*\).*$/\2/"`
CPUSET=`echo $CPUSET|grep cpuset|sed -e "s/^.*\(--cpuset\(\s\+\|=\)[^ ]*\).*$/\1/"`
SMP=`echo $CPUSET|grep smp|sed -e "s/^.*smp\(\s\+\|=\)\([0-9]*\).*$/\2/"`
CPUSET=`echo $CPUSET|grep cpuset|sed -e "s/^.*\(--cpuset\(\s\+\|=\)[0-9\-]*\).*$/\1/"`
if [ $AMI_OPT -eq 1 ]; then
NR_CPU=`cat /proc/cpuinfo |grep processor|wc -l`
NR_DISKS=`lsblk --list --nodeps --noheadings | grep -v xvda | grep xvd | wc -l`


@@ -30,6 +30,6 @@ else
else
echo "Please upgrade to a newer kernel version."
fi
echo " see http://www.scylladb.com/kb/kb-fs-not-qualified-aio/ for details"
echo " see http://docs.scylladb.com/kb/kb-fs-not-qualified-aio/ for details"
fi
exit $RET


@@ -29,7 +29,6 @@ print_usage() {
echo " --no-sysconfig-setup skip sysconfig setup"
echo " --no-io-setup skip IO configuration setup"
echo " --no-version-check skip daily version check"
echo " --no-node-exporter do not install the node exporter"
exit 1
}
@@ -76,21 +75,6 @@ verify_package() {
fi
}
get_unused_disks() {
blkid -c /dev/null|cut -f 1 -d ' '|sed s/://g|grep -v loop|while read dev
do
count_raw=$(grep $dev /proc/mounts|wc -l)
count_pvs=0
if [ -f /usr/sbin/pvs ]; then
count_pvs=$(pvs|grep $dev|wc -l)
fi
count_swap=$(swapon -s |grep `realpath $dev`|wc -l)
if [ $count_raw -eq 0 -a $count_pvs -eq 0 -a $count_swap -eq 0 ]; then
echo -n "$dev "
fi
done
}
AMI=0
SET_NIC=0
DEV_MODE=0
@@ -105,7 +89,6 @@ RAID_SETUP=1
COREDUMP_SETUP=1
SYSCONFIG_SETUP=1
IO_SETUP=1
NODE_EXPORTER=1
if [ $# -ne 0 ]; then
INTERACTIVE=0
@@ -183,10 +166,6 @@ while [ $# -gt 0 ]; do
IO_SETUP=0
shift 1
;;
"--no-node-exporter")
NODE_EXPORTER=0
shift 1
;;
"-h" | "--help")
print_usage
shift 1
@@ -226,32 +205,31 @@ fi
if [ $INTERACTIVE -eq 1 ]; then
interactive_ask_service "Do you want to enable ScyllaDB services?" "Answer yes to automatically start Scylla when the node boots; answer no to skip this step." "yes" &&:
ENABLE_SERVICE=$?
if [ $ENABLE_SERVICE -eq 1 ] && [ ! -f /etc/scylla.d/housekeeping.cfg ]; then
interactive_ask_service "Do you want to enable ScyllaDB version check?" "Answer yes to automatically start scylla-housekeeping, which checks for newer versions, when the node boots; answer no to skip this step." "yes" &&:
ENABLE_CHECK_VERSION=$?
fi
fi
if [ $ENABLE_SERVICE -eq 1 ]; then
if [ "$ID" = "fedora" ] || [ "$ID" = "centos" ] || [ "$ID" = "ubuntu" -a "$VERSION_ID" != "14.04" ]; then
systemctl enable scylla-server.service
systemctl enable collectd.service
fi
if [ $INTERACTIVE -eq 1 ] && [ ! -f /etc/scylla.d/housekeeping.cfg ]; then
interactive_ask_service "Do you want to enable ScyllaDB version check?" "Answer yes to automatically start scylla-housekeeping, which checks for newer versions, when the node boots; answer no to skip this step." "yes" &&:
ENABLE_CHECK_VERSION=$?
if [ $ENABLE_CHECK_VERSION -eq 1 ]; then
systemctl unmask scylla-housekeeping.timer
else
systemctl mask scylla-housekeeping.timer
systemctl stop scylla-housekeeping.timer || true
fi
fi
if [ $ENABLE_CHECK_VERSION -eq 1 ]; then
if [ ! -f /etc/scylla.d/housekeeping.cfg ]; then
printf "[housekeeping]\ncheck-version: True\n" > /etc/scylla.d/housekeeping.cfg
fi
if [ "$ID" = "fedora" ] || [ "$ID" = "centos" ] || [ "$ID" = "ubuntu" -a "$VERSION_ID" != "14.04" ]; then
systemctl unmask scylla-housekeeping.timer
fi
else
if [ ! -f /etc/scylla.d/housekeeping.cfg ]; then
printf "[housekeeping]\ncheck-version: False\n" > /etc/scylla.d/housekeeping.cfg
fi
if [ "$ID" = "fedora" ] || [ "$ID" = "centos" ] || [ "$ID" = "ubuntu" -a "$VERSION_ID" != "14.04" ]; then
systemctl mask scylla-housekeeping.timer
systemctl stop scylla-housekeeping.timer || true
fi
fi
fi
@@ -287,15 +265,21 @@ if [ $INTERACTIVE -eq 1 ]; then
interactive_ask_service "Do you want to setup RAID and XFS?" "It is recommended to use RAID0 and XFS for Scylla data. If you select yes, you will be prompt to choose which disks to use for Scylla data. Selected disks will be formatted in the process." "yes" &&:
RAID_SETUP=$?
if [ $RAID_SETUP -eq 1 ]; then
DEVS=`get_unused_disks`
if [ "$DEVS" = "" ]; then
echo "No free disks detected, abort RAID/XFS setup."
echo
RAID_SETUP=0
else
echo "Please select disks from the following list: $DEVS"
fi
while [ "$DEVS" != "" ]; do
echo "Please select disks from following list: "
while true; do
blkid -c /dev/null|cut -f 1 -d ' '|sed s/://g|grep -v loop|while read dev
do
count_raw=$(grep $dev /proc/mounts|wc -l)
count_pvs=0
if [ -f /usr/sbin/pvs ]; then
count_pvs=$(pvs|grep $dev|wc -l)
fi
count_swap=$(swapon --show |grep `realpath $dev`|wc -l)
if [ $count_raw -eq 0 -a $count_pvs -eq 0 -a $count_swap -eq 0 ]; then
echo -n "$dev "
fi
done
echo "type 'done' to finish selection. selected: $DISKS"
echo -n "> "
read dsk
@@ -374,20 +358,10 @@ if [ $INTERACTIVE -eq 1 ]; then
interactive_ask_service "Do you want to setup IO configuration?" "Answer yes to let iotune study what are your disks IO profile and adapt Scylla to it. Answer no to skip this action." "yes" &&:
IO_SETUP=$?
fi
if [ $IO_SETUP -eq 1 ]; then
/usr/lib/scylla/scylla_io_setup
fi
if [ $INTERACTIVE -eq 1 ]; then
interactive_ask_service "Do you want to install node exporter, that exports prometheus data from the node?" "Answer yes to install it; answer no to skip this installation." "yes" &&:
NODE_EXPORTER=$?
fi
if [ $NODE_EXPORTER -eq 1 ]; then
/usr/lib/scylla/node_exporter_install
fi
if [ $DEV_MODE -eq 1 ]; then
/usr/lib/scylla/scylla_dev_mode_setup --developer-mode 1
fi


@@ -1,9 +0,0 @@
[Unit]
Description=Node Exporter
[Service]
Type=simple
ExecStart=/usr/bin/node_exporter
[Install]
WantedBy=multi-user.target


@@ -4,8 +4,7 @@ After=scylla-server.service
BindsTo=scylla-server.service
[Timer]
# set OnActiveSec to 3 to safely avoid issues/1846
OnActiveSec=3
OnBootSec=0
OnUnitActiveSec=1d
[Install]


@@ -20,7 +20,6 @@ TimeoutStartSec=900
KillMode=process
Restart=on-abnormal
User=scylla
OOMScoreAdjust=-950
[Install]
WantedBy=multi-user.target


@@ -7,7 +7,7 @@ ENV container docker
VOLUME [ "/sys/fs/cgroup" ]
#install scylla
RUN curl http://downloads.scylladb.com/rpm/centos/scylla-1.4.repo -o /etc/yum.repos.d/scylla.repo
RUN curl http://downloads.scylladb.com/rpm/centos/scylla-1.3.repo -o /etc/yum.repos.d/scylla.repo
RUN yum -y install epel-release
RUN yum -y clean expire-cache
RUN yum -y update


@@ -9,7 +9,6 @@ def parse():
parser.add_argument('--smp', default=None, help="e.g --smp 2 to use two CPUs")
parser.add_argument('--memory', default=None, help="e.g. --memory 1G to use 1 GB of RAM")
parser.add_argument('--overprovisioned', default='0', choices=['0', '1'], help="run in overprovisioned environment")
parser.add_argument('--listen-address', default=None, dest='listenAddress')
parser.add_argument('--broadcast-address', default=None, dest='broadcastAddress')
parser.add_argument('--broadcast-rpc-address', default=None, dest='broadcastRpcAddress')
return parser.parse_args()


@@ -8,7 +8,6 @@ class ScyllaSetup:
self._developerMode = arguments.developerMode
self._seeds = arguments.seeds
self._cpuset = arguments.cpuset
self._listenAddress = arguments.listenAddress
self._broadcastAddress = arguments.broadcastAddress
self._broadcastRpcAddress = arguments.broadcastRpcAddress
self._smp = arguments.smp
@@ -32,15 +31,14 @@ class ScyllaSetup:
def scyllaYAML(self):
configuration = yaml.load(open('/etc/scylla/scylla.yaml'))
if self._listenAddress is None:
self._listenAddress = subprocess.check_output(['hostname', '-i']).decode('ascii').strip()
configuration['listen_address'] = self._listenAddress
configuration['rpc_address'] = self._listenAddress
IP = subprocess.check_output(['hostname', '-i']).decode('ascii').strip()
configuration['listen_address'] = IP
configuration['rpc_address'] = IP
if self._seeds is None:
if self._broadcastAddress is not None:
self._seeds = self._broadcastAddress
else:
self._seeds = self._listenAddress
self._seeds = IP
configuration['seed_provider'] = [
{'class_name': 'org.apache.cassandra.locator.SimpleSeedProvider',
'parameters': [{'seeds': self._seeds}]}


@@ -50,9 +50,6 @@ fi
if [ ! -f /usr/bin/git ]; then
sudo yum -y install git
fi
if [ ! -f /usr/bin/rpmbuild ]; then
sudo yum -y install rpm-build
fi
mkdir -p $RPMBUILD/{BUILD,BUILDROOT,RPMS,SOURCES,SPECS,SRPMS}
if [ "$ID" = "centos" ]; then
sudo yum install -y epel-release


@@ -27,10 +27,10 @@ Group: Applications/Databases
Summary: The Scylla database server
License: AGPLv3
URL: http://www.scylladb.com/
BuildRequires: libaio-devel libstdc++-devel cryptopp-devel hwloc-devel numactl-devel libpciaccess-devel libxml2-devel zlib-devel thrift-devel yaml-cpp-devel lz4-devel snappy-devel jsoncpp-devel systemd-devel xz-devel openssl-devel libcap-devel libselinux-devel libgcrypt-devel libgpg-error-devel elfutils-devel krb5-devel libcom_err-devel libattr-devel pcre-devel elfutils-libelf-devel bzip2-devel keyutils-libs-devel xfsprogs-devel make gnutls-devel systemd-devel lksctp-tools-devel protobuf-devel protobuf-compiler libunwind-devel
BuildRequires: libaio-devel libstdc++-devel cryptopp-devel hwloc-devel numactl-devel libpciaccess-devel libxml2-devel zlib-devel thrift-devel yaml-cpp-devel lz4-devel snappy-devel jsoncpp-devel systemd-devel xz-devel openssl-devel libcap-devel libselinux-devel libgcrypt-devel libgpg-error-devel elfutils-devel krb5-devel libcom_err-devel libattr-devel pcre-devel elfutils-libelf-devel bzip2-devel keyutils-libs-devel xfsprogs-devel make gnutls-devel systemd-devel lksctp-tools-devel
%{?fedora:BuildRequires: boost-devel ninja-build ragel antlr3-tool antlr3-C++-devel python3 gcc-c++ libasan libubsan python3-pyparsing dnf-yum}
%{?rhel:BuildRequires: scylla-libstdc++-static scylla-boost-devel scylla-ninja-build scylla-ragel scylla-antlr3-tool scylla-antlr3-C++-devel python34 scylla-gcc-c++ >= 5.1.1, python34-pyparsing}
Requires: scylla-conf systemd-libs hwloc collectd PyYAML python-urwid pciutils pyparsing python-requests curl bc util-linux
Requires: scylla-conf systemd-libs hwloc collectd PyYAML python-urwid pyparsing python-requests curl bc util-linux
Conflicts: abrt
%description server
@@ -108,8 +108,8 @@ cp -r scylla-housekeeping $RPM_BUILD_ROOT%{_prefix}/lib/scylla/scylla-housekeepi
cp -P dist/common/sbin/* $RPM_BUILD_ROOT%{_sbindir}/
%pre server
getent group scylla || /usr/sbin/groupadd scylla 2> /dev/null || :
getent passwd scylla || /usr/sbin/useradd -g scylla -s /sbin/nologin -r -d %{_sharedstatedir}/scylla scylla 2> /dev/null || :
/usr/sbin/groupadd scylla 2> /dev/null || :
/usr/sbin/useradd -g scylla -s /sbin/nologin -r -d %{_sharedstatedir}/scylla scylla 2> /dev/null || :
%post server
# Upgrade coredump settings
@@ -151,7 +151,6 @@ rm -rf $RPM_BUILD_ROOT
%{_unitdir}/scylla-server.service
%{_unitdir}/scylla-housekeeping.service
%{_unitdir}/scylla-housekeeping.timer
%{_unitdir}/node-exporter.service
%{_bindir}/scylla
%{_bindir}/iotune
%{_bindir}/scyllatop
@@ -160,8 +159,6 @@ rm -rf $RPM_BUILD_ROOT
%{_prefix}/lib/scylla/scylla_prepare
%{_prefix}/lib/scylla/scylla_stop
%{_prefix}/lib/scylla/scylla_setup
%{_prefix}/lib/scylla/node_exporter_install
%{_sbindir}/node_exporter_install
%{_prefix}/lib/scylla/scylla_coredump_setup
%{_prefix}/lib/scylla/scylla_raid_setup
%{_prefix}/lib/scylla/scylla_sysconfig_setup


@@ -78,13 +78,13 @@ cp dist/ubuntu/scylla-server.install.in debian/scylla-server.install
if [ "$RELEASE" = "14.04" ]; then
sed -i -e "s/@@DH_INSTALLINIT@@/--upstart-only/g" debian/rules
sed -i -e "s/@@COMPILER@@/g++-5/g" debian/rules
sed -i -e "s/@@BUILD_DEPENDS@@/g++-5, libunwind8-dev/g" debian/control
sed -i -e "s/@@BUILD_DEPENDS@@/g++-5/g" debian/control
sed -i -e "s#@@INSTALL@@#dist/ubuntu/sudoers.d/scylla etc/sudoers.d#g" debian/scylla-server.install
sed -i -e "s#@@HKDOTTIMER@@##g" debian/scylla-server.install
else
sed -i -e "s/@@DH_INSTALLINIT@@//g" debian/rules
sed -i -e "s/@@COMPILER@@/g++/g" debian/rules
sed -i -e "s/@@BUILD_DEPENDS@@/libsystemd-dev, g++, libunwind-dev/g" debian/control
sed -i -e "s/@@BUILD_DEPENDS@@/libsystemd-dev, g++/g" debian/control
sed -i -e "s#@@INSTALL@@##g" debian/scylla-server.install
sed -i -e "s#@@HKDOTTIMER@@#dist/common/systemd/scylla-housekeeping.timer /lib/systemd/system#g" debian/scylla-server.install
fi
@@ -102,7 +102,6 @@ fi
cp dist/common/systemd/scylla-server.service.in debian/scylla-server.service
sed -i -e "s#@@SYSCONFDIR@@#/etc/default#g" debian/scylla-server.service
cp dist/common/systemd/scylla-housekeeping.service debian/scylla-server.scylla-housekeeping.service
cp dist/common/systemd/node-exporter.service debian/scylla-server.node-exporter.service
if [ "$RELEASE" = "14.04" ] && [ $REBUILD -eq 0 ]; then
if [ ! -f /etc/apt/sources.list.d/scylla-3rdparty-trusty.list ]; then


@@ -4,7 +4,7 @@ Homepage: http://scylladb.com
Section: database
Priority: optional
Standards-Version: 3.9.5
Build-Depends: debhelper (>= 9), libyaml-cpp-dev, liblz4-dev, libsnappy-dev, libcrypto++-dev, libjsoncpp-dev, libaio-dev, libthrift-dev, thrift-compiler, antlr3, antlr3-c++-dev, ragel, ninja-build, git, libboost-program-options1.55-dev | libboost-program-options-dev, libboost-filesystem1.55-dev | libboost-filesystem-dev, libboost-system1.55-dev | libboost-system-dev, libboost-thread1.55-dev | libboost-thread-dev, libboost-test1.55-dev | libboost-test-dev, libgnutls28-dev, libhwloc-dev, libnuma-dev, libpciaccess-dev, xfslibs-dev, python3-pyparsing, libxml2-dev, libsctp-dev, python-urwid, pciutils, libprotobuf-dev, protobuf-compiler, @@BUILD_DEPENDS@@
Build-Depends: debhelper (>= 9), libyaml-cpp-dev, liblz4-dev, libsnappy-dev, libcrypto++-dev, libjsoncpp-dev, libaio-dev, libthrift-dev, thrift-compiler, antlr3, antlr3-c++-dev, ragel, ninja-build, git, libboost-program-options1.55-dev | libboost-program-options-dev, libboost-filesystem1.55-dev | libboost-filesystem-dev, libboost-system1.55-dev | libboost-system-dev, libboost-thread1.55-dev | libboost-thread-dev, libboost-test1.55-dev | libboost-test-dev, libgnutls28-dev, libhwloc-dev, libnuma-dev, libpciaccess-dev, xfslibs-dev, python3-pyparsing, libxml2-dev, libsctp-dev, python-urwid, @@BUILD_DEPENDS@@
Package: scylla-conf
Architecture: any
@@ -16,7 +16,7 @@ Conflicts: scylla-server (<< 1.1)
Package: scylla-server
Architecture: amd64
Depends: ${shlibs:Depends}, ${misc:Depends}, adduser, hwloc-nox, collectd, scylla-conf, python-yaml, python-urwid, python-requests, curl, bc, util-linux, realpath, @@DEPENDS@@
Depends: ${shlibs:Depends}, ${misc:Depends}, adduser, hwloc-nox, collectd, scylla-conf, python-yaml, python-urwid, python-requests, curl, bc, util-linux, @@DEPENDS@@
Description: Scylla database server binaries
Scylla is a highly scalable, eventually consistent, distributed,
partitioned row DB.


@@ -12,7 +12,6 @@ override_dh_auto_clean:
override_dh_installinit:
dh_installinit --no-start @@DH_INSTALLINIT@@
dh_installinit --no-start --name scylla-housekeeping @@DH_INSTALLINIT@@
dh_installinit --no-start --name node-exporter @@DH_INSTALLINIT@@
override_dh_strip:
dh_strip --dbg-package=scylla-server-dbg


@@ -97,18 +97,6 @@ For example, to configure Scylla to run with two seed nodes `192.168.0.100` and
$ docker run --name some-scylla -d scylladb/scylla --seeds 192.168.0.100,192.168.0.200
```
### `--listen-address ADDR`
The `--listen-address` command line option configures the IP address the Scylla instance listens for client connections.
For example, to configure Scylla to use listen address `10.0.0.5`:
```console
$ docker run --name some-scylla -d scylladb/scylla --listen-address 10.0.0.5
```
**Since: 1.4**
### `--broadcast-address ADDR`
The `--broadcast-address` command line option configures the IP address the Scylla instance tells other Scylla nodes in the cluster to connect to.


@@ -46,19 +46,18 @@
using namespace db;
ser::mutation_view frozen_mutation::mutation_view() const {
auto in = ser::as_input_stream(_bytes);
return ser::deserialize(in, boost::type<ser::mutation_view>());
}
utils::UUID
frozen_mutation::column_family_id() const {
return mutation_view().table_id();
auto in = ser::as_input_stream(_bytes);
auto mv = ser::deserialize(in, boost::type<ser::mutation_view>());
return mv.table_id();
}
utils::UUID
frozen_mutation::schema_version() const {
return mutation_view().schema_version();
auto in = ser::as_input_stream(_bytes);
auto mv = ser::deserialize(in, boost::type<ser::mutation_view>());
return mv.schema_version();
}
partition_key_view
@@ -72,36 +71,37 @@ frozen_mutation::decorated_key(const schema& s) const {
}
partition_key frozen_mutation::deserialize_key() const {
return mutation_view().key();
auto in = ser::as_input_stream(_bytes);
auto mv = ser::deserialize(in, boost::type<ser::mutation_view>());
return mv.key();
}
frozen_mutation::frozen_mutation(bytes_ostream&& b)
frozen_mutation::frozen_mutation(bytes&& b)
: _bytes(std::move(b))
, _pk(deserialize_key())
{
_bytes.reduce_chunk_count();
}
{ }
frozen_mutation::frozen_mutation(bytes_ostream&& b, partition_key pk)
: _bytes(std::move(b))
frozen_mutation::frozen_mutation(bytes_view bv, partition_key pk)
: _bytes(bytes(bv.begin(), bv.end()))
, _pk(std::move(pk))
{
_bytes.reduce_chunk_count();
}
{ }
frozen_mutation::frozen_mutation(const mutation& m)
: _pk(m.key())
{
mutation_partition_serializer part_ser(*m.schema(), m.partition());
ser::writer_of_mutation wom(_bytes);
bytes_ostream out;
ser::writer_of_mutation wom(out);
std::move(wom).write_table_id(m.schema()->id())
.write_schema_version(m.schema()->version())
.write_key(m.key())
.partition([&] (auto wr) {
part_ser.write(std::move(wr));
}).end_mutation();
_bytes.reduce_chunk_count();
auto bv = out.linearize();
_bytes = bytes(bv.begin(), bv.end()); // FIXME: avoid copy
}
mutation
@@ -117,7 +117,9 @@ frozen_mutation freeze(const mutation& m) {
}
mutation_partition_view frozen_mutation::partition() const {
return mutation_partition_view::from_view(mutation_view().partition());
auto in = ser::as_input_stream(_bytes);
auto mv = ser::deserialize(in, boost::type<ser::mutation_view>());
return mutation_partition_view::from_view(mv.partition());
}
std::ostream& operator<<(std::ostream& out, const frozen_mutation::printer& pr) {
@@ -166,7 +168,7 @@ frozen_mutation streamed_mutation_freezer::consume_end_of_stream() {
std::move(_sr), std::move(_rts),
std::move(_crs), std::move(wr));
}).end_mutation();
return frozen_mutation(std::move(out), std::move(_key));
return frozen_mutation(out.linearize(), std::move(_key));
}
future<frozen_mutation> freeze(streamed_mutation sm) {
@@ -206,7 +208,7 @@ private:
_rts.clear();
_crs.clear();
_dirty_size = 0;
return _consumer(frozen_mutation(std::move(out), _key), _fragmented);
return _consumer(frozen_mutation(out.linearize(), _key), _fragmented);
}
future<stop_iteration> maybe_flush() {


@@ -29,10 +29,6 @@
class mutation;
class streamed_mutation;
namespace ser {
class mutation_view;
}
// Immutable, compact form of mutation.
//
// This form is primarily destined to be sent over the network channel.
@@ -45,20 +41,20 @@ class mutation_view;
//
class frozen_mutation final {
private:
bytes_ostream _bytes;
bytes _bytes;
partition_key _pk;
private:
partition_key deserialize_key() const;
ser::mutation_view mutation_view() const;
public:
frozen_mutation(const mutation& m);
explicit frozen_mutation(bytes_ostream&& b);
frozen_mutation(bytes_ostream&& b, partition_key key);
explicit frozen_mutation(bytes&& b);
frozen_mutation(bytes_view bv, partition_key key);
frozen_mutation(frozen_mutation&& m) = default;
frozen_mutation(const frozen_mutation& m) = default;
frozen_mutation& operator=(frozen_mutation&&) = default;
frozen_mutation& operator=(const frozen_mutation&) = default;
const bytes_ostream& representation() const { return _bytes; }
bytes_view representation() const { return _bytes; }
utils::UUID column_family_id() const;
utils::UUID schema_version() const; // FIXME: Should replace column_family_id()
partition_key_view key(const schema& s) const;


@@ -192,12 +192,10 @@ future<> gossiper::handle_syn_msg(msg_addr from, gossip_digest_syn syn_msg) {
/* If the message is from a different cluster throw it away. */
if (syn_msg.cluster_id() != get_cluster_name()) {
logger.warn("ClusterName mismatch from {} {}!={}", from.addr, syn_msg.cluster_id(), get_cluster_name());
return make_ready_future<>();
}
if (syn_msg.partioner() != "" && syn_msg.partioner() != get_partitioner_name()) {
logger.warn("Partitioner mismatch from {} {}!={}", from.addr, syn_msg.partioner(), get_partitioner_name());
return make_ready_future<>();
}
@@ -747,10 +745,10 @@ void gossiper::convict(inet_address endpoint, double phi) {
return;
}
auto& state = it->second;
logger.debug("Convicting {} with status {} - alive {}", endpoint, get_gossip_status(state), state.is_alive());
if (!state.is_alive()) {
return;
}
logger.debug("Convicting {} with status {} - alive {}", endpoint, get_gossip_status(state), state.is_alive());
logger.trace("convict ep={}, phi={}, is_alive={}, is_dead_state={}", endpoint, phi, state.is_alive(), is_dead_state(state));
if (is_shutdown(endpoint)) {
@@ -979,10 +977,7 @@ future<> gossiper::do_gossip_to_unreachable_member(gossip_digest_syn message) {
if (rand_dbl < prob) {
std::set<inet_address> addrs;
for (auto&& x : _unreachable_endpoints) {
// Ignore the node which is decommissioned
if (get_gossip_status(x.first) != sstring(versioned_value::STATUS_LEFT)) {
addrs.insert(x.first);
}
addrs.insert(x.first);
}
logger.trace("do_gossip_to_unreachable_member: live_endpoint nr={} unreachable_endpoints nr={}",
live_endpoint_count, unreachable_endpoint_count);
@@ -1164,7 +1159,7 @@ void gossiper::real_mark_alive(inet_address addr, endpoint_state& local_state) {
_expire_time_endpoint_map.erase(addr);
logger.debug("removing expire time for endpoint : {}", addr);
if (!_in_shadow_round) {
logger.info("InetAddress {} is now UP, status = {}", addr, get_gossip_status(local_state));
logger.info("InetAddress {} is now UP", addr);
}
_subscribers.for_each([addr, local_state] (auto& subscriber) {
@@ -1180,7 +1175,7 @@ void gossiper::mark_dead(inet_address addr, endpoint_state& local_state) {
_live_endpoints.erase(addr);
_live_endpoints_just_added.remove(addr);
_unreachable_endpoints[addr] = now();
logger.info("InetAddress {} is now DOWN, status = {}", addr, get_gossip_status(local_state));
logger.info("InetAddress {} is now DOWN", addr);
_subscribers.for_each([addr, local_state] (auto& subscriber) {
subscriber->on_dead(addr, local_state);
logger.trace("Notified {}", subscriber.get());
@@ -1195,9 +1190,9 @@ void gossiper::handle_major_state_change(inet_address ep, const endpoint_state&
}
if (!is_dead_state(eps) && !_in_shadow_round) {
if (endpoint_state_map.count(ep)) {
logger.debug("Node {} has restarted, now UP, status = {}", ep, get_gossip_status(eps));
logger.info("Node {} has restarted, now UP", ep);
} else {
logger.debug("Node {} is now part of the cluster, status = {}", ep, get_gossip_status(eps));
logger.info("Node {} is now part of the cluster", ep);
}
}
logger.trace("Adding endpoint state for {}, status = {}", ep, get_gossip_status(eps));
@@ -1590,14 +1585,7 @@ bool gossiper::is_in_shadow_round() {
}
void gossiper::add_expire_time_for_endpoint(inet_address endpoint, clk::time_point expire_time) {
char expire_time_buf[100];
auto expire_time_tm = std::chrono::system_clock::to_time_t(expire_time);
auto now_ = now();
strftime(expire_time_buf, sizeof(expire_time_buf), "%Y-%m-%d %T", std::localtime(&expire_time_tm));
auto diff = std::chrono::duration_cast<std::chrono::seconds>(expire_time - now_).count();
logger.info("Node {} will be removed from gossip at [{}]: (expire = {}, now = {}, diff = {} seconds)",
endpoint, expire_time_buf, expire_time.time_since_epoch().count(),
now_.time_since_epoch().count(), diff);
logger.debug("adding expire time for endpoint : {} ({})", endpoint, expire_time.time_since_epoch().count());
_expire_time_endpoint_map[endpoint] = expire_time;
}
@@ -1701,7 +1689,7 @@ future<> gossiper::wait_for_gossip_to_settle() {
total_polls++;
// Make sure 5 gossip rounds are completed successfully
if (_nr_run > 5) {
logger.debug("Gossip looks settled. gossip round completed: {}", _nr_run);
logger.debug("Gossip looks settled. {} gossip rounds completed", _nr_run);
num_okay++;
} else {
logger.info("Gossip not settled after {} polls.", total_polls);


@@ -102,7 +102,7 @@ private:
return msg_addr{to, _default_cpuid};
}
void do_sort(std::vector<gossip_digest>& g_digest_list);
timer<lowres_clock> _scheduled_gossip_task;
timer<std::chrono::steady_clock> _scheduled_gossip_task;
bool _enabled = false;
std::set<inet_address> _seeds_from_config;
sstring _cluster_name;


@@ -184,9 +184,6 @@ struct serializer<$name> {
template <typename Input>
static $name read(Input& buf);
template <typename Input>
static void skip(Input& buf);
};
""").substitute({'name' : name, 'sizetype' : SIZETYPE, 'tmp_param' : template_param }))
if config.ns != '':
@@ -732,19 +729,16 @@ def add_variant_read_size(hout, typ):
add_variant_read_size(hout, p)
read_sizes.add(t)
fprintln(hout, Template("""
template<typename Input>
inline void skip(Input& v, boost::type<${type}>) {
return seastar::with_serialized_stream(v, [] (auto& v) {
template<>
inline void skip(seastar::simple_input_stream& v, boost::type<${type}>) {
size_type ln = deserialize(v, boost::type<size_type>());
v.skip(ln - sizeof(size_type));
});
}""").substitute ({'type' : t}))
fprintln(hout, Template("""
template<typename Input>
$type deserialize(Input& v, boost::type<${type}>) {
return seastar::with_serialized_stream(v, [] (auto& v) {
auto in = v;
deserialize(in, boost::type<size_type>());
size_type o = deserialize(in, boost::type<size_type>());
@@ -755,7 +749,7 @@ $type deserialize(Input& v, boost::type<${type}>) {
v.skip(sizeof(size_type)*2);
return $full_type(deserialize(v, boost::type<$type>()));
}""").substitute({'ind' : index, 'type' : param_view_type(param), 'full_type' : t}))
fprintln(hout, ' return ' + t + '(deserialize(v, boost::type<unknown_variant_type>()));\n });\n}')
fprintln(hout, ' return ' + t + '(deserialize(v, boost::type<unknown_variant_type>()));\n}')
def add_view(hout, info):
[cls, namespaces, parent_template_param] = info
@@ -764,7 +758,7 @@ def add_view(hout, info):
add_variant_read_size(hout, m["type"])
fprintln(hout, Template("""struct ${name}_view {
utils::input_stream v;
seastar::simple_input_stream v;
""").substitute({'name' : cls["name"]}))
if not is_stub(cls["name"]) and is_local_type(cls["name"]):
@@ -775,47 +769,42 @@ def add_view(hout, info):
}
""")).substitute({'type' : cls["name"]}))
skip = "" if is_final(cls) else "ser::skip(in, boost::type<size_type>());"
skip = "" if is_final(cls) else "skip(in, boost::type<size_type>());"
for m in members:
full_type = param_view_type(m["type"])
fprintln(hout, Template(reindent(4, """
auto $name() const {
return seastar::with_serialized_stream(v, [] (auto& v) {
$type $name() const {
auto in = v;
$skip
return deserialize(in, boost::type<$type>());
});
}
""")).substitute({'name' : m["name"], 'type' : full_type, 'skip' : skip}))
skip = skip + Template("\n ser::skip(in, boost::type<${type}>());").substitute({'type': full_type})
skip = skip + Template("\n skip(in, boost::type<${type}>());").substitute({'type': full_type})
fprintln(hout, "};")
skip_impl = "auto& in = v;\n " + skip if is_final(cls) else "v.skip(read_frame_size(v));"
skip_impl = "seastar::simple_input_stream& in = v;\n " + skip if is_final(cls) else "v.skip(read_frame_size(v));"
if skip == "":
skip_impl = ""
fprintln(hout, Template("""
template<>
inline void skip(seastar::simple_input_stream& v, boost::type<${type}_view>) {
$skip_impl
}
template<>
struct serializer<${type}_view> {
template<typename Input>
static ${type}_view read(Input& v) {
return seastar::with_serialized_stream(v, [] (auto& v) {
auto v_start = v;
auto start_size = v.size();
skip(v);
skip(v, boost::type<${type}_view>());
return ${type}_view{v_start.read_substream(start_size - v.size())};
});
}
template<typename Output>
static void write(Output& out, ${type}_view v) {
v.v.copy_to(out);
}
template<typename Input>
static void skip(Input& v) {
return seastar::with_serialized_stream(v, [] (auto& v) {
$skip_impl
});
out.write(v.v.begin(), v.v.size());
}
};
""").substitute({'type' : param_type(cls["name"]), 'skip' : skip, 'skip_impl' : skip_impl}))
@@ -876,13 +865,12 @@ void serializer<$name>::write(Output& buf, const $name& obj) {""").substitute({'
fprintln(cout, Template("""
$template
template <typename Input>
$name$temp_param serializer<$name$temp_param>::read(Input& buf) {
return seastar::with_serialized_stream(buf, [] (auto& buf) {""").substitute({'func' : DESERIALIZER, 'name' : name, 'template': template, 'temp_param' : template_class_param}))
$name$temp_param serializer<$name$temp_param>::read(Input& buf) {""").substitute({'func' : DESERIALIZER, 'name' : name, 'template': template, 'temp_param' : template_class_param}))
if not is_final:
fprintln(cout, Template(""" $size_type size = $func(buf, boost::type<$size_type>());
auto in = buf.read_substream(size - sizeof($size_type));""").substitute({'func' : DESERIALIZER, 'size_type' : SIZETYPE}))
Input in = buf.read_substream(size - sizeof($size_type));""").substitute({'func' : DESERIALIZER, 'size_type' : SIZETYPE}))
else:
fprintln(cout, """ auto& in = buf;""")
fprintln(cout, """ Input& in = buf;""")
params = []
local_names = {}
for index, param in enumerate(cls["members"]):
@@ -894,31 +882,16 @@ $name$temp_param serializer<$name$temp_param>::read(Input& buf) {
deflt = param["default"][0] if "default" in param else param_type(param["type"]) + "()"
if deflt in local_names:
deflt = local_names[deflt]
fprintln(cout, Template(""" auto $local = (in.size()>0) ?
fprintln(cout, Template(""" $typ $local = (in.size()>0) ?
$func(in, boost::type<$typ>()) : $default;""").substitute({'func' : DESERIALIZER, 'typ': param_type(param["type"]), 'local' : local_param, 'default': deflt}))
else:
fprintln(cout, Template(""" auto $local = $func(in, boost::type<$typ>());""").substitute({'func' : DESERIALIZER, 'typ': param_type(param["type"]), 'local' : local_param}))
fprintln(cout, Template(""" $typ $local = $func(in, boost::type<$typ>());""").substitute({'func' : DESERIALIZER, 'typ': param_type(param["type"]), 'local' : local_param}))
params.append("std::move(" + local_param + ")")
fprintln(cout, Template("""
$name$temp_param res {$params};
return res;
});
}""").substitute({'name' : name, 'params': ", ".join(params), 'temp_param' : template_class_param}))
fprintln(cout, Template("""
$template
template <typename Input>
void serializer<$name$temp_param>::skip(Input& buf) {
seastar::with_serialized_stream(buf, [] (auto& buf) {""").substitute({'func' : DESERIALIZER, 'name' : name, 'template': template, 'temp_param' : template_class_param}))
if not is_final:
fprintln(cout, Template(""" $size_type size = $func(buf, boost::type<$size_type>());
buf.skip(size - sizeof($size_type));""").substitute({'func' : DESERIALIZER, 'size_type' : SIZETYPE}))
else:
for m in get_members(cls):
full_type = param_view_type(m["type"])
fprintln(cout, " ser::skip(buf, boost::type<%s>());" % full_type)
fprintln(cout, """ });\n}""")
def handle_objects(tree, hout, cout, namespaces=[]):
for obj in tree:


@@ -70,11 +70,3 @@ struct compound_with_optional {
std::experimental::optional<simple_compound> first;
simple_compound second;
};
class non_final_composite_test_object {
simple_compound x();
};
class final_composite_test_object final {
simple_compound x();
};


@@ -31,10 +31,3 @@ class range {
std::experimental::optional<range_bound<T>> end();
bool is_singular();
};
template<typename T>
class nonwrapping_range {
std::experimental::optional<range_bound<T>> start();
std::experimental::optional<range_bound<T>> end();
bool is_singular();
};


@@ -27,11 +27,11 @@ namespace query {
class specific_ranges {
partition_key pk();
std::vector<nonwrapping_range<clustering_key_prefix>> ranges();
std::vector<range<clustering_key_prefix>> ranges();
};
class partition_slice {
std::vector<nonwrapping_range<clustering_key_prefix>> default_row_ranges();
std::vector<range<clustering_key_prefix>> default_row_ranges();
std::vector<uint32_t> static_columns;
std::vector<uint32_t> regular_columns;
query::partition_slice::option_set options;


@@ -26,7 +26,7 @@ class result_digest final {
};
class result {
bytes buf();
bytes_ostream buf();
std::experimental::optional<query::result_digest> digest();
api::timestamp_type last_modified() [ [version 1.2] ] = api::missing_timestamp;
};


@@ -30,9 +30,6 @@ class trace_info {
utils::UUID session_id;
tracing::trace_type type;
bool write_on_close;
tracing::trace_state_props_set state_props [[version 1.4]];
uint32_t slow_query_threshold_us [[version 1.4]];
uint32_t slow_query_ttl_sec [[version 1.4]];
};
}


@@ -48,8 +48,7 @@ void init_ms_fd_gossiper(sstring listen_address
, sstring ms_compress
, db::seed_provider_type seed_provider
, sstring cluster_name
, double phi
, bool sltba)
, double phi)
{
const gms::inet_address listen(listen_address);
@@ -91,7 +90,7 @@ void init_ms_fd_gossiper(sstring listen_address
// Init messaging_service
// Delay listening messaging_service until gossip message handlers are registered
bool listen_now = false;
net::get_messaging_service().start(listen, storage_port, ew, cw, ssl_storage_port, creds, sltba, listen_now).get();
net::get_messaging_service().start(listen, storage_port, ew, cw, ssl_storage_port, creds, listen_now).get();
// #293 - do not stop anything
//engine().at_exit([] { return net::get_messaging_service().stop(); });


@@ -37,5 +37,4 @@ void init_ms_fd_gossiper(sstring listen_address
, sstring ms_compress
, db::seed_provider_type seed_provider
, sstring cluster_name = "Test Cluster"
, double phi = 8
, bool sltba = false);
, double phi = 8);

keys.hh

@@ -168,7 +168,7 @@ public:
template<typename RangeOfSerializedComponents>
static TopLevel from_exploded(RangeOfSerializedComponents&& v) {
return TopLevel::from_range(std::forward<RangeOfSerializedComponents>(v));
return TopLevel(std::forward<RangeOfSerializedComponents>(v));
}
static TopLevel from_exploded(const schema& s, const std::vector<bytes>& v) {
@@ -615,12 +615,8 @@ public:
using c_type = compound_type<allow_prefixes::no>;
template<typename RangeOfSerializedComponents>
static partition_key from_range(RangeOfSerializedComponents&& v) {
return partition_key(managed_bytes(c_type::serialize_value(std::forward<RangeOfSerializedComponents>(v))));
}
partition_key(std::vector<bytes> v)
: compound_wrapper(managed_bytes(c_type::serialize_value(std::move(v))))
partition_key(RangeOfSerializedComponents&& v)
: compound_wrapper(managed_bytes(c_type::serialize_value(std::forward<RangeOfSerializedComponents>(v))))
{ }
partition_key(partition_key&& v) = default;
@@ -709,12 +705,8 @@ class clustering_key_prefix : public prefix_compound_wrapper<clustering_key_pref
{ }
public:
template<typename RangeOfSerializedComponents>
static clustering_key_prefix from_range(RangeOfSerializedComponents&& v) {
return clustering_key_prefix(compound::element_type::serialize_value(std::forward<RangeOfSerializedComponents>(v)));
}
clustering_key_prefix(std::vector<bytes> v)
: prefix_compound_wrapper(compound::element_type::serialize_value(std::move(v)))
clustering_key_prefix(RangeOfSerializedComponents&& v)
: prefix_compound_wrapper(compound::element_type::serialize_value(std::forward<RangeOfSerializedComponents>(v)))
{ }
clustering_key_prefix(clustering_key_prefix&& v) = default;


@@ -54,17 +54,14 @@ static void unwrap_first_range(std::vector<range<token>>& ret) {
std::unique_ptr<abstract_replication_strategy> abstract_replication_strategy::create_replication_strategy(const sstring& ks_name, const sstring& strategy_name, token_metadata& tk_metadata, const std::map<sstring, sstring>& config_options) {
assert(locator::i_endpoint_snitch::get_local_snitch_ptr());
try {
return create_object<abstract_replication_strategy,
const sstring&,
token_metadata&,
snitch_ptr&,
const std::map<sstring, sstring>&>
(strategy_name, ks_name, tk_metadata,
locator::i_endpoint_snitch::get_local_snitch_ptr(), config_options);
} catch (const no_such_class& e) {
throw exceptions::configuration_exception(e.what());
}
return create_object<abstract_replication_strategy,
const sstring&,
token_metadata&,
snitch_ptr&,
const std::map<sstring, sstring>&>
(strategy_name, ks_name, tk_metadata,
locator::i_endpoint_snitch::get_local_snitch_ptr(), config_options);
}
void abstract_replication_strategy::validate_replication_strategy(const sstring& ks_name,

main.cc

@@ -51,7 +51,6 @@
#include "disk-error-handler.hh"
#include "tracing/tracing.hh"
#include "db/size_estimates_recorder.hh"
#include "core/prometheus.hh"
#ifdef HAVE_LIBSYSTEMD
#include <systemd/sd-daemon.h>
@@ -306,8 +305,6 @@ int main(int ac, char** av) {
auto& proxy = service::get_storage_proxy();
auto& mm = service::get_migration_manager();
api::http_context ctx(db, proxy);
httpd::http_server_control prometheus_server;
prometheus::config pctx;
directories dirs;
return app.run_deprecated(ac, av, [&] {
@@ -338,7 +335,7 @@ int main(int ac, char** av) {
tcp_syncookies_sanity();
return seastar::async([cfg, &db, &qp, &proxy, &mm, &ctx, &opts, &dirs, &pctx, &prometheus_server] {
return seastar::async([cfg, &db, &qp, &proxy, &mm, &ctx, &opts, &dirs] {
read_config(opts, *cfg).get();
apply_logger_settings(cfg->default_log_level(), cfg->logger_log_level(),
cfg->log_to_stdout(), cfg->log_to_syslog());
@@ -419,7 +416,7 @@ int main(int ac, char** av) {
ctx.http_server.start().get();
api::set_server_init(ctx).get();
ctx.http_server.listen(ipv4_addr{ip, api_port}).get();
startlog.info("Scylla API server listening on {}:{} ...", api_address, api_port);
print("Scylla API server listening on %s:%s ...\n", api_address, api_port);
supervisor_notify("initializing storage service");
init_storage_service(db);
supervisor_notify("starting per-shard database core");
@@ -496,8 +493,7 @@ int main(int ac, char** av) {
, cfg->internode_compression()
, seed_provider
, cluster_name
, phi
, cfg->listen_on_broadcast_address());
, phi);
supervisor_notify("starting messaging service");
supervisor_notify("starting storage proxy");
proxy.start(std::ref(db)).get();
@@ -630,21 +626,7 @@ int main(int ac, char** av) {
smp::invoke_on_all([&cfg] () {
return logalloc::shard_tracker().set_reclamation_step(cfg->lsa_reclamation_step());
}).get();
if (cfg->abort_on_lsa_bad_alloc()) {
smp::invoke_on_all([&cfg]() {
return logalloc::shard_tracker().enable_abort_on_bad_alloc();
}).get();
}
api::set_server_done(ctx).get();
dns::hostent prom_addr = dns::gethostbyname(cfg->prometheus_address()).get0();
supervisor_notify("starting prometheus API server");
uint16_t pport = cfg->prometheus_port();
if (pport) {
pctx.metric_help = "Scylla server statistics";
prometheus_server.start().get();
prometheus::start(prometheus_server, pctx);
prometheus_server.listen(ipv4_addr{prom_addr.addresses[0].in.s_addr, pport}).get();
}
supervisor_notify("serving");
// Register at_exit last, so that storage_service::drain_on_shutdown will be called first
engine().at_exit([] {


@@ -115,7 +115,7 @@ class scanning_reader final : public mutation_reader::impl {
stdx::optional<query::partition_range> _delegate_range;
mutation_reader _delegate;
const io_priority_class& _pc;
const query::partition_slice& _slice;
query::clustering_key_filtering_context _ck_filtering;
private:
memtable::partitions_type::iterator lookup_end() {
auto cmp = memtable_entry::compare(_memtable->_schema);
@@ -152,13 +152,13 @@ public:
scanning_reader(schema_ptr s,
lw_shared_ptr<memtable> m,
const query::partition_range& range,
const query::partition_slice& slice,
const query::clustering_key_filtering_context& ck_filtering,
const io_priority_class& pc)
: _memtable(std::move(m))
, _schema(std::move(s))
, _range(range)
, _pc(pc)
, _slice(slice)
, _ck_filtering(ck_filtering)
{ }
virtual future<streamed_mutation_opt> operator()() override {
@@ -171,7 +171,7 @@ public:
// FIXME: Use cache. See column_family::make_reader().
_delegate_range = _last ? _range.split_after(*_last, dht::ring_position_comparator(*_memtable->_schema)) : _range;
_delegate = make_mutation_reader<sstable_range_wrapping_reader>(
_memtable->_sstable, _schema, *_delegate_range, _slice, _pc);
_memtable->_sstable, _schema, *_delegate_range, _ck_filtering, _pc);
_memtable = {};
_last = {};
return _delegate();
@@ -187,14 +187,14 @@ public:
++_i;
_last = e.key();
_memtable->upgrade_entry(e);
return make_ready_future<streamed_mutation_opt>(e.read(_memtable, _schema, _slice));
return make_ready_future<streamed_mutation_opt>(e.read(_memtable, _schema, _ck_filtering));
}
};
mutation_reader
memtable::make_reader(schema_ptr s,
const query::partition_range& range,
const query::partition_slice& slice,
const query::clustering_key_filtering_context& ck_filtering,
const io_priority_class& pc) {
if (query::is_wrap_around(range, *s)) {
fail(unimplemented::cause::WRAP_AROUND);
@@ -207,13 +207,13 @@ memtable::make_reader(schema_ptr s,
auto i = partitions.find(pos, memtable_entry::compare(_schema));
if (i != partitions.end()) {
upgrade_entry(*i);
return make_reader_returning(i->read(shared_from_this(), s, slice));
return make_reader_returning(i->read(shared_from_this(), s, ck_filtering));
} else {
return make_empty_reader();
}
});
} else {
return make_mutation_reader<scanning_reader>(std::move(s), shared_from_this(), range, slice, pc);
return make_mutation_reader<scanning_reader>(std::move(s), shared_from_this(), range, ck_filtering, pc);
}
}
@@ -300,15 +300,15 @@ bool memtable::is_flushed() const {
}
streamed_mutation
memtable_entry::read(lw_shared_ptr<memtable> mtbl, const schema_ptr& target_schema, const query::partition_slice& slice) {
auto cr = query::clustering_key_filter_ranges::get_ranges(*_schema, slice, _key.key());
memtable_entry::read(lw_shared_ptr<memtable> mtbl, const schema_ptr& target_schema, const query::clustering_key_filtering_context& ck_filtering) {
if (_schema->version() != target_schema->version()) {
auto mp = mutation_partition(_pe.squashed(_schema, target_schema), *target_schema, std::move(cr));
auto mp = mutation_partition(_pe.squashed(_schema, target_schema), *target_schema, ck_filtering.get_ranges(_key.key()));
mutation m = mutation(target_schema, _key, std::move(mp));
return streamed_mutation_from_mutation(std::move(m));
}
auto& cr = ck_filtering.get_ranges(_key.key());
auto snp = _pe.read(_schema);
return make_partition_snapshot_reader(_schema, _key, std::move(cr), snp, *mtbl, mtbl->_read_section, mtbl);
return make_partition_snapshot_reader(_schema, _key, ck_filtering, cr, snp, *mtbl, mtbl->_read_section, mtbl);
}
void memtable::upgrade_entry(memtable_entry& e) {


@@ -59,7 +59,7 @@ public:
partition_entry& partition() { return _pe; }
const schema_ptr& schema() const { return _schema; }
schema_ptr& schema() { return _schema; }
streamed_mutation read(lw_shared_ptr<memtable> mtbl, const schema_ptr&, const query::partition_slice&);
streamed_mutation read(lw_shared_ptr<memtable> mtbl, const schema_ptr&, const query::clustering_key_filtering_context&);
struct compare {
dht::decorated_key::less_comparator _c;
@@ -144,7 +144,7 @@ public:
// Mutations returned by the reader will all have given schema.
mutation_reader make_reader(schema_ptr,
const query::partition_range& range = query::full_partition_range,
const query::partition_slice& slice = query::full_slice,
const query::clustering_key_filtering_context& ck_filtering = query::no_clustering_key_filtering,
const io_priority_class& pc = default_priority_class());
mutation_source as_data_source();


@@ -181,12 +181,10 @@ void messaging_service::foreach_client(std::function<void(const msg_addr& id, co
}
void messaging_service::foreach_server_connection_stats(std::function<void(const rpc::client_info&, const rpc::stats&)>&& f) const {
for (auto&& s : _server) {
if (s) {
s->foreach_connection([f](const rpc_protocol::server::connection& c) {
f(c.info(), c.get_stats());
});
}
if (_server) {
_server->foreach_connection([f](const rpc_protocol::server::connection& c) {
f(c.info(), c.get_stats());
});
}
}
@@ -219,7 +217,7 @@ void register_handler(messaging_service* ms, messaging_verb verb, Func&& func) {
}
messaging_service::messaging_service(gms::inet_address ip, uint16_t port, bool listen_now)
: messaging_service(std::move(ip), port, encrypt_what::none, compress_what::none, 0, nullptr, false, listen_now)
: messaging_service(std::move(ip), port, encrypt_what::none, compress_what::none, 0, nullptr, listen_now)
{}
static
@@ -233,42 +231,28 @@ rpc_resource_limits() {
}
void messaging_service::start_listen() {
bool listen_to_bc = _should_listen_to_broadcast_address && _listen_address != utils::fb_utilities::get_broadcast_address();
rpc::server_options so;
if (_compress_what != compress_what::none) {
so.compressor_factory = &compressor_factory;
}
if (!_server[0]) {
auto listen = [&] (const gms::inet_address& a) {
auto addr = ipv4_addr{a.raw_addr(), _port};
return std::unique_ptr<rpc_protocol_server_wrapper>(new rpc_protocol_server_wrapper(*_rpc,
so, addr, rpc_resource_limits()));
};
_server[0] = listen(_listen_address);
if (listen_to_bc) {
_server[1] = listen(utils::fb_utilities::get_broadcast_address());
}
if (!_server) {
auto addr = ipv4_addr{_listen_address.raw_addr(), _port};
_server = std::unique_ptr<rpc_protocol_server_wrapper>(new rpc_protocol_server_wrapper(*_rpc,
so, addr, rpc_resource_limits()));
}
if (!_server_tls[0]) {
auto listen = [&] (const gms::inet_address& a) {
return std::unique_ptr<rpc_protocol_server_wrapper>(
[this, &so, &a] () -> std::unique_ptr<rpc_protocol_server_wrapper>{
if (!_server_tls) {
_server_tls = std::unique_ptr<rpc_protocol_server_wrapper>(
[this, &so] () -> std::unique_ptr<rpc_protocol_server_wrapper>{
if (_encrypt_what == encrypt_what::none) {
return nullptr;
}
listen_options lo;
lo.reuse_address = true;
auto addr = make_ipv4_address(ipv4_addr{a.raw_addr(), _ssl_port});
auto addr = make_ipv4_address(ipv4_addr{_listen_address.raw_addr(), _ssl_port});
return std::make_unique<rpc_protocol_server_wrapper>(*_rpc,
so, seastar::tls::listen(_credentials, addr, lo));
}());
};
_server_tls[0] = listen(_listen_address);
if (listen_to_bc) {
_server_tls[1] = listen(utils::fb_utilities::get_broadcast_address());
}
}());
}
}
@@ -278,14 +262,12 @@ messaging_service::messaging_service(gms::inet_address ip
, compress_what cw
, uint16_t ssl_port
, std::shared_ptr<seastar::tls::credentials_builder> credentials
, bool sltba
, bool listen_now)
: _listen_address(ip)
, _port(port)
, _ssl_port(ssl_port)
, _encrypt_what(ew)
, _compress_what(cw)
, _should_listen_to_broadcast_address(sltba)
, _rpc(new rpc_protocol_wrapper(serializer { }))
, _credentials(credentials ? credentials->build_server_credentials() : nullptr)
{
@@ -304,7 +286,7 @@ messaging_service::messaging_service(gms::inet_address ip
// Do this on just cpu 0, to avoid duplicate logs.
if (engine().cpu_id() == 0) {
if (_server_tls[0]) {
if (_server_tls) {
logger.info("Starting Encrypted Messaging Service on SSL port {}", _ssl_port);
}
logger.info("Starting Messaging Service on port {}", _port);
@@ -329,19 +311,15 @@ gms::inet_address messaging_service::listen_address() {
}
future<> messaging_service::stop_tls_server() {
for (auto&& s : _server_tls) {
if (s) {
return s->stop();
}
if (_server_tls) {
return _server_tls->stop();
}
return make_ready_future<>();
}
future<> messaging_service::stop_nontls_server() {
for (auto&& s : _server) {
if (s) {
return s->stop();
}
if (_server) {
return _server->stop();
}
return make_ready_future<>();
}
@@ -796,7 +774,7 @@ future<std::vector<frozen_mutation>> messaging_service::send_migration_request(m
return send_message<std::vector<frozen_mutation>>(this, messaging_verb::MIGRATION_REQUEST, std::move(id));
}
void messaging_service::register_mutation(std::function<future<rpc::no_wait_type> (const rpc::client_info&, rpc::opt_time_point, frozen_mutation fm, std::vector<inet_address> forward,
void messaging_service::register_mutation(std::function<future<rpc::no_wait_type> (const rpc::client_info&, frozen_mutation fm, std::vector<inet_address> forward,
inet_address reply_to, unsigned shard, response_id_type response_id, rpc::optional<std::experimental::optional<tracing::trace_info>> trace_info)>&& func) {
register_handler(this, net::messaging_verb::MUTATION, std::move(func));
}


@@ -181,13 +181,12 @@ private:
uint16_t _ssl_port;
encrypt_what _encrypt_what;
compress_what _compress_what;
bool _should_listen_to_broadcast_address;
// map: Node broadcast address -> Node internal IP for communication within the same data center
std::unordered_map<gms::inet_address, gms::inet_address> _preferred_ip_cache;
std::unique_ptr<rpc_protocol_wrapper> _rpc;
std::array<std::unique_ptr<rpc_protocol_server_wrapper>, 2> _server;
std::unique_ptr<rpc_protocol_server_wrapper> _server;
::shared_ptr<seastar::tls::server_credentials> _credentials;
std::array<std::unique_ptr<rpc_protocol_server_wrapper>, 2> _server_tls;
std::unique_ptr<rpc_protocol_server_wrapper> _server_tls;
std::array<clients_map, 3> _clients;
uint64_t _dropped_messages[static_cast<int32_t>(messaging_verb::LAST)] = {};
bool _stopping = false;
@@ -198,7 +197,7 @@ public:
uint16_t port = 7000, bool listen_now = true);
messaging_service(gms::inet_address ip, uint16_t port, encrypt_what, compress_what,
uint16_t ssl_port, std::shared_ptr<seastar::tls::credentials_builder>,
bool sltba = false, bool listen_now = true);
bool listen_now = true);
~messaging_service();
public:
void start_listen();
@@ -278,7 +277,7 @@ public:
// FIXME: response_id_type is an alias in service::storage_proxy::response_id_type
using response_id_type = uint64_t;
// Wrapper for MUTATION
void register_mutation(std::function<future<rpc::no_wait_type> (const rpc::client_info&, rpc::opt_time_point, frozen_mutation fm, std::vector<inet_address> forward,
void register_mutation(std::function<future<rpc::no_wait_type> (const rpc::client_info&, frozen_mutation fm, std::vector<inet_address> forward,
inet_address reply_to, unsigned shard, response_id_type response_id, rpc::optional<std::experimental::optional<tracing::trace_info>> trace_info)>&& func);
void unregister_mutation();
future<> send_mutation(msg_addr id, clock_type::time_point timeout, const frozen_mutation& fm, std::vector<inet_address> forward,


@@ -248,7 +248,7 @@ mutation_partition::mutation_partition(const mutation_partition& x)
}
mutation_partition::mutation_partition(const mutation_partition& x, const schema& schema,
query::clustering_key_filter_ranges ck_ranges)
const query::clustering_row_ranges& ck_ranges)
: _tombstone(x._tombstone)
, _static_row(x._static_row)
, _rows(x._rows.value_comp())
@@ -268,7 +268,7 @@ mutation_partition::mutation_partition(const mutation_partition& x, const schema
}
mutation_partition::mutation_partition(mutation_partition&& x, const schema& schema,
query::clustering_key_filter_ranges ck_ranges)
const query::clustering_row_ranges& ck_ranges)
: _tombstone(x._tombstone)
, _static_row(std::move(x._static_row))
, _rows(std::move(x._rows))
@@ -1830,7 +1830,7 @@ future<data_query_result> data_query(schema_ptr s, const mutation_source& source
auto cfq = make_stable_flattened_mutations_consumer<compact_for_query<emit_only_live_rows::yes, query_result_builder>>(
*s, query_time, slice, row_limit, partition_limit, std::move(qrb));
auto reader = source(s, range, slice, service::get_local_sstable_query_read_priority());
auto reader = source(s, range, query::clustering_key_filtering_context::create(s, slice), service::get_local_sstable_query_read_priority());
return consume_flattened(std::move(reader), std::move(cfq), is_reversed);
}
@@ -1904,6 +1904,6 @@ mutation_query(schema_ptr s,
auto cfq = make_stable_flattened_mutations_consumer<compact_for_query<emit_only_live_rows::no, reconcilable_result_builder>>(
*s, query_time, slice, row_limit, partition_limit, std::move(rrb));
auto reader = source(s, range, slice, service::get_local_sstable_query_read_priority());
auto reader = source(s, range, query::clustering_key_filtering_context::create(s, slice), service::get_local_sstable_query_read_priority());
return consume_flattened(std::move(reader), std::move(cfq), is_reversed);
}


@@ -40,7 +40,6 @@
#include "utils/managed_vector.hh"
#include "hashing_partition_visitor.hh"
#include "range_tombstone_list.hh"
#include "clustering_key_filter.hh"
//
// Container for cells of a row. Cells are identified by column_id.
@@ -564,8 +563,8 @@ public:
{ }
mutation_partition(mutation_partition&&) = default;
mutation_partition(const mutation_partition&);
mutation_partition(const mutation_partition&, const schema&, query::clustering_key_filter_ranges);
mutation_partition(mutation_partition&&, const schema&, query::clustering_key_filter_ranges);
mutation_partition(const mutation_partition&, const schema&, const query::clustering_row_ranges&);
mutation_partition(mutation_partition&&, const schema&, const query::clustering_row_ranges&);
~mutation_partition();
mutation_partition& operator=(const mutation_partition& x);
mutation_partition& operator=(mutation_partition&& x) noexcept;


@@ -23,7 +23,8 @@
#include "database_fwd.hh"
#include "mutation_partition_visitor.hh"
#include "utils/input_stream.hh"
#include <seastar/core/simple-stream.hh>
namespace ser {
class mutation_partition_view;
@@ -31,13 +32,13 @@ class mutation_partition_view;
// View on serialized mutation partition. See mutation_partition_serializer.
class mutation_partition_view {
utils::input_stream _in;
seastar::simple_input_stream _in;
private:
mutation_partition_view(utils::input_stream v)
mutation_partition_view(seastar::simple_input_stream v)
: _in(v)
{ }
public:
static mutation_partition_view from_stream(utils::input_stream v) {
static mutation_partition_view from_stream(seastar::simple_input_stream v) {
return { v };
}
static mutation_partition_view from_view(ser::mutation_partition_view v);


@@ -163,12 +163,12 @@ public:
}
};
mutation_reader make_reader_returning_many(std::vector<mutation> mutations, const query::partition_slice& slice) {
mutation_reader make_reader_returning_many(std::vector<mutation> mutations, query::clustering_key_filtering_context ck_filtering) {
std::vector<streamed_mutation> streamed_mutations;
streamed_mutations.reserve(mutations.size());
for (auto& m : mutations) {
auto ck_ranges = query::clustering_key_filter_ranges::get_ranges(*m.schema(), slice, m.key());
auto mp = mutation_partition(std::move(m.partition()), *m.schema(), std::move(ck_ranges));
const query::clustering_row_ranges& ck_ranges = ck_filtering.get_ranges(m.key());
auto mp = mutation_partition(std::move(m.partition()), *m.schema(), ck_ranges);
auto sm = streamed_mutation_from_mutation(mutation(m.schema(), m.decorated_key(), std::move(mp)));
streamed_mutations.emplace_back(std::move(sm));
}


@@ -24,11 +24,9 @@
#include <vector>
#include "mutation.hh"
#include "clustering_key_filter.hh"
#include "core/future.hh"
#include "core/future-util.hh"
#include "core/do_with.hh"
#include "tracing/trace_state.hh"
// A mutation_reader is an object which allows iterating on mutations: invoke
// the function to get a future for the next mutation, with an unset optional
@@ -89,7 +87,7 @@ mutation_reader make_combined_reader(mutation_reader&& a, mutation_reader&& b);
mutation_reader make_reader_returning(mutation);
mutation_reader make_reader_returning(streamed_mutation);
mutation_reader make_reader_returning_many(std::vector<mutation>,
const query::partition_slice& slice = query::full_slice);
query::clustering_key_filtering_context filter = query::no_clustering_key_filtering);
mutation_reader make_reader_returning_many(std::vector<streamed_mutation>);
mutation_reader make_empty_reader();
@@ -189,35 +187,29 @@ future<> consume(mutation_reader& reader, Consumer consumer) {
// when invoking the source.
class mutation_source {
using partition_range = const query::partition_range&;
using clustering_filter = query::clustering_key_filtering_context;
using io_priority = const io_priority_class&;
std::function<mutation_reader(schema_ptr, partition_range, const query::partition_slice&, io_priority, tracing::trace_state_ptr)> _fn;
std::function<mutation_reader(schema_ptr, partition_range, clustering_filter, io_priority)> _fn;
public:
mutation_source(std::function<mutation_reader(schema_ptr, partition_range, const query::partition_slice&, io_priority, tracing::trace_state_ptr)> fn)
: _fn(std::move(fn)) {}
mutation_source(std::function<mutation_reader(schema_ptr, partition_range, const query::partition_slice&, io_priority)> fn)
: _fn([fn = std::move(fn)] (schema_ptr s, partition_range range, const query::partition_slice& slice, io_priority pc, tracing::trace_state_ptr) {
return fn(s, range, slice, pc);
}) {}
mutation_source(std::function<mutation_reader(schema_ptr, partition_range, const query::partition_slice&)> fn)
: _fn([fn = std::move(fn)] (schema_ptr s, partition_range range, const query::partition_slice& slice, io_priority, tracing::trace_state_ptr) {
return fn(s, range, slice);
mutation_source(std::function<mutation_reader(schema_ptr, partition_range, clustering_filter, io_priority)> fn)
: _fn(std::move(fn)) {}
mutation_source(std::function<mutation_reader(schema_ptr, partition_range, clustering_filter)> fn)
: _fn([fn = std::move(fn)] (schema_ptr s, partition_range range, clustering_filter ck_filtering, io_priority) {
return fn(s, range, ck_filtering);
}) {}
mutation_source(std::function<mutation_reader(schema_ptr, partition_range range)> fn)
: _fn([fn = std::move(fn)] (schema_ptr s, partition_range range, const query::partition_slice&, io_priority, tracing::trace_state_ptr) {
: _fn([fn = std::move(fn)] (schema_ptr s, partition_range range, clustering_filter, io_priority) {
return fn(s, range);
}) {}
mutation_reader operator()(schema_ptr s, partition_range range, const query::partition_slice& slice, io_priority pc, tracing::trace_state_ptr trace_state) const {
return _fn(std::move(s), range, slice, pc, std::move(trace_state));
mutation_reader operator()(schema_ptr s, partition_range range, clustering_filter ck_filtering, io_priority pc) const {
return _fn(std::move(s), range, ck_filtering, pc);
}
mutation_reader operator()(schema_ptr s, partition_range range, const query::partition_slice& slice, io_priority pc) const {
return _fn(std::move(s), range, slice, pc, nullptr);
}
mutation_reader operator()(schema_ptr s, partition_range range, const query::partition_slice& slice) const {
return _fn(std::move(s), range, slice, default_priority_class(), nullptr);
mutation_reader operator()(schema_ptr s, partition_range range, clustering_filter ck_filtering) const {
return _fn(std::move(s), range, ck_filtering, default_priority_class());
}
mutation_reader operator()(schema_ptr s, partition_range range) const {
return _fn(std::move(s), range, query::full_slice, default_priority_class(), nullptr);
return _fn(std::move(s), range, query::no_clustering_key_filtering, default_priority_class());
}
};


@@ -294,14 +294,14 @@ lw_shared_ptr<partition_snapshot> partition_entry::read(schema_ptr entry_schema)
}
partition_snapshot_reader::partition_snapshot_reader(schema_ptr s, dht::decorated_key dk,
lw_shared_ptr<partition_snapshot> snp,
query::clustering_key_filter_ranges crr, logalloc::region& region,
lw_shared_ptr<partition_snapshot> snp, query::clustering_key_filtering_context fc,
const query::clustering_row_ranges& crr, logalloc::region& region,
logalloc::allocating_section& read_section, boost::any pointer_to_container)
: streamed_mutation::impl(s, std::move(dk), tomb(*snp))
, _container_guard(std::move(pointer_to_container))
, _ck_ranges(std::move(crr))
, _current_ck_range(_ck_ranges.begin())
, _ck_range_end(_ck_ranges.end())
, _filtering_context(fc)
, _current_ck_range(crr.begin())
, _ck_range_end(crr.end())
, _cmp(*s)
, _eq(*s)
, _snapshot(snp)
@@ -463,10 +463,10 @@ future<> partition_snapshot_reader::fill_buffer()
}
streamed_mutation make_partition_snapshot_reader(schema_ptr s, dht::decorated_key dk,
query::clustering_key_filter_ranges crr,
query::clustering_key_filtering_context fc, const query::clustering_row_ranges& crr,
lw_shared_ptr<partition_snapshot> snp, logalloc::region& region,
logalloc::allocating_section& read_section, boost::any pointer_to_container)
{
return make_streamed_mutation<partition_snapshot_reader>(s, std::move(dk),
snp, std::move(crr), region, read_section, std::move(pointer_to_container));
snp, fc, crr, region, read_section, std::move(pointer_to_container));
}


@@ -290,7 +290,8 @@ private:
// that its lifetime is appropriately extended.
boost::any _container_guard;
query::clustering_key_filter_ranges _ck_ranges;
// _filtering_context keeps alive the range of clustering rows
query::clustering_key_filtering_context _filtering_context;
query::clustering_row_ranges::const_iterator _current_ck_range;
query::clustering_row_ranges::const_iterator _ck_range_end;
bool _in_ck_range = false;
@@ -320,7 +321,7 @@ private:
static tombstone tomb(partition_snapshot& snp);
public:
partition_snapshot_reader(schema_ptr s, dht::decorated_key dk, lw_shared_ptr<partition_snapshot> snp,
query::clustering_key_filter_ranges crr,
query::clustering_key_filtering_context fc, const query::clustering_row_ranges& crr,
logalloc::region& region, logalloc::allocating_section& read_section,
boost::any pointer_to_container);
~partition_snapshot_reader();
@@ -328,6 +329,6 @@ public:
};
streamed_mutation make_partition_snapshot_reader(schema_ptr s, dht::decorated_key dk,
query::clustering_key_filter_ranges crr,
query::clustering_key_filtering_context fc, const query::clustering_row_ranges& crr,
lw_shared_ptr<partition_snapshot> snp, logalloc::region& region,
logalloc::allocating_section& read_section, boost::any pointer_to_container);

Some files were not shown because too many files have changed in this diff.