Compare commits

..

191 Commits

Author SHA1 Message Date
Pekka Enberg
c4bd4e89ae release: prepare for 1.3.5 2016-11-29 09:47:38 +02:00
Raphael S. Carvalho
c5f43ecd0e main: fix exception handling when initializing data or commitlog dirs
Exception handling was broken because after io checker, storage_io_error
exception is wrapped around system error exceptions. Also the message
when handling exception wasn't precise enough for all cases. For example,
lack of permission to write to existing data directory.

Fixes #883.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <b2dc75010a06f16ab1b676ce905ae12e930a700a.1478542388.git.raphaelsc@scylladb.com>
(cherry picked from commit 9a9f0d3a0f)
2016-11-16 15:13:44 +02:00
Paweł Dziepak
c923b6e20c row_cache: touch entries read during range queries
Fixes #1847.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1479230809-27547-1-git-send-email-pdziepak@scylladb.com>
(cherry picked from commit 999dafbe57)
2016-11-16 13:08:41 +00:00
Amnon Heiman
e9811897d2 API: cache_capacity should use uint for summing
Using integer as a type for the map_reduce causes number over overflow.

Fixes #1801

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1479299425-782-1-git-send-email-amnon@scylladb.com>
(cherry picked from commit a4be7afbb0)
2016-11-16 15:04:24 +02:00
Paweł Dziepak
54c338d785 partition_version: make sure that snapshot is destroyed under LSA
Snapshot destructor may free some objects managed by the LSA. That's why
partition_snapshot_reader destructor explicitly destroys the snapshot it
uses. However, it was possible that exception thrown by _read_section
prevented that from happenning making snapshot destoryed implicitly
without current allocator set to LSA.

Refs #1831.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1478778570-2795-1-git-send-email-pdziepak@scylladb.com>
(cherry picked from commit f16d6f9c40)
2016-11-16 12:54:16 +00:00
Glauber Costa
f705be3518 histogram: moving averages: fix inverted parameters
moving_averages constructor is defined like this:

    moving_average(latency_counter::duration interval, latency_counter::duration tick_interval)

But when it is time to initialize them, we do this:

	... {tick_interval(), std::chrono::minutes(1)} ...

As it can be seen, the interval and tick interval are inverted. This
leads to the metrics being assigned bogus values.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <d83f09eed20ea2ea007d120544a003b2e0099732.1478798595.git.glauber@scylladb.com>
(cherry picked from commit d3f11fbabf)
2016-11-11 10:16:14 +02:00
Calle Wilund
16a9027552 auth::password_authenticator: Ensure exceptions are processed in continuation
Fixes #1718 (even more)
Message-Id: <1475497389-27016-1-git-send-email-calle@scylladb.com>

(cherry picked from commit 5b815b81b4)
2016-11-07 09:25:29 +02:00
Calle Wilund
23e792c9ea auth::password_authenticator: "authenticate" should not throw undeclared excpt
Fixes #1718

Message-Id: <1475487331-25927-1-git-send-email-calle@scylladb.com>
(cherry picked from commit d24d0f8f90)
2016-11-07 09:25:24 +02:00
Pekka Enberg
5294dd9eb2 Merge seastar submodule
* seastar b62d7a5...5adb964 (2):
  > file: make close() more robust against concurrent calls
  > rpc: Do not close client connection on error response for a timed out request
2016-11-07 09:25:02 +02:00
Raphael S. Carvalho
11323582d6 lcs: fix starvation at higher levels
When max sstable size is increased, higher levels are suffering from
starvation because we decide to compact a given level if the following
calculation results in a number greater than 1.001:
level_size(L) / max_size_for_level_l(L)

Fixes #1720.

For this backport, I needed to add schema as parameter to sstable
functions that return first and last decorated keys.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit a8ab4b8f37)
2016-11-05 11:26:01 +02:00
Raphael S. Carvalho
69948778c6 lcs: fix broken token range distribution at higher levels
Uniform token range distribution across sstables in a level > 1 was broken,
because we were only choosing sstable with lowest first key, when compacting
a level > 0. This resulted in performance problem because L1->L2 may have a
huge overlap over time, for example.
Last compacted key will now be stored for each level to ensure sort of
"round robin" selection of sstables for compactions at level >= 1.
That's also done by C*, and they were once affected by it as described in
https://issues.apache.org/jira/browse/CASSANDRA-6284.

Fixes #1719.

For this backport, I added schema parameter to compaction_strategy::
notify_completion() because sstable doesn't store schema here.
Most conflicts were that some interfaces take schema parameter at
this version.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit a3bf7558f2)
2016-11-05 11:26:01 +02:00
Pekka Enberg
ef4332ab09 release: prepare for 1.3.4 2016-11-03 12:09:13 +02:00
Pekka Enberg
18a99c00b0 cql3: Fix selecting same column multiple times
Under the hood, the selectable::add_and_get_index() function
deliberately filters out duplicate columns. This causes
simple_selector::get_output_row() to return a row with all duplicate
columns filtered out, which triggers and assertion because of row
mismatch with metadata (which contains the duplicate columns).

The fix is rather simple: just make selection::from_selectors() use
selection_with_processing if the number of selectors and column
definitions doesn't match -- like Apache Cassandra does.

Fixes #1367
Message-Id: <1477989740-6485-1-git-send-email-penberg@scylladb.com>

(cherry picked from commit e1e8ca2788)
2016-11-01 09:34:49 +00:00
Pekka Enberg
cc3a4173f6 release: prepare for 1.3.3 2016-10-28 09:54:41 +03:00
Pekka Enberg
4dc196164d auth: Fix resource level handling
We use `data_resource` class in the CQL parser, which let's users refer
to a table resource without specifying a keyspace. This asserts out in
get_level() for no good reason as we already know the intented level
based on the constructor. Therefore, change `data_resource` to track the
level like upstream Cassandra does and use that.

Fixes #1790

Message-Id: <1477599169-2945-1-git-send-email-penberg@scylladb.com>
(cherry picked from commit b54870764f)
2016-10-27 23:38:01 +03:00
Glauber Costa
31ba6325ef auth: always convert string to upper case before comparing
We store all auth perm strings in upper case, but the user might very
well pass this in upper case.

We could use a standard key comparator / hash here, but since the
strings tend to be small, the new sstring will likely be allocated in
the stack here and this approach yields significantly less code.

Fixes #1791.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <51df92451e6e0a6325a005c19c95eaa55270da61.1477594199.git.glauber@scylladb.com>
(cherry picked from commit ef3c7ab38e)
2016-10-27 22:10:04 +03:00
Tomasz Grabiec
dcd8b87eb8 Merge seastar upstream
* seastar b62d7a5...0fd8792 (1):
  > rpc: Do not close client connection on error response for a timed out request

Refs #1778
2016-10-25 13:56:36 +02:00
Tomasz Grabiec
6b53aad9fb partition_version: Fix corruption of partition_version list
The move constructor of partition_version was not invoking move
constructor of anchorless_list_base_hook. As a result, when
partition_version objects were moved, e.g. during LSA compaction, they
were unlinked from their lists.

This can make readers return invalid data, because not all versions
will be reachable.

It also casues leaks of the versions which are not directly attached
to memtable entry. This will trigger assertion failure in LSA region
destructor. This assetion triggers with row cache disabled. With cache
enabled (default) all segments are merged into the cache region, which
currently is not destroyed on shutdown, so this problem would go
unnoticed. With cache disabled, memtable region is destroyed after
memtable is flushed and after all readers stop using that memtable.

Fixes #1753.
Message-Id: <1476778472-5711-1-git-send-email-tgrabiec@scylladb.com>

(cherry picked from commit fe387f8ba0)
2016-10-18 11:00:19 +02:00
Pekka Enberg
762b156809 database: Fix io_priority_class related compilation error
Commit e6ef49e ("db: Do not timeout streaming readers") breaks compilation of database.cc:

  database.cc: In lambda function:
  database.cc:282:62: error: ‘const class io_priority_class’ has no member named ‘id’
               if (service::get_local_streaming_read_priority().id() == pc.id()) {
                                                              ^~
  database.cc:282:73: error: ‘const class io_priority_class’ has no member named ‘id’
               if (service::get_local_streaming_read_priority().id() == pc.id()) {

...because we don't have Seastar commit 823a404 ("io_priority_class:
remove non-explicit operator unsigned") backported.

Fix the issue by using the non-explicit operator instead of explicit id().

Acked-by: Tomasz Grabiec <tgrabiec@scylladb.com>
Message-Id: <1476425276-17171-1-git-send-email-penberg@scylladb.com>
2016-10-14 13:28:32 +03:00
Pekka Enberg
f3ed1b4763 release: prepare for 1.3.2 2016-10-13 20:34:46 +03:00
Paweł Dziepak
6618236fff query_pagers: fix clustering key range calculation
Paging code assumes that clustering row range [a, a] contains only one
row which may not be true. Another problem is that it tries to use
range<> interface for dealing with clustering key ranges which doesn't
work because of the lack of correct comparator.

Refs #1446.
Fixes #1684.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1475236805-16223-1-git-send-email-pdziepak@scylladb.com>
(cherry picked from commit eb1fcf3ecc)
2016-10-10 16:08:09 +03:00
Asias He
ecb0e44480 gossip: Switch to use system_clock
The expire time which is used to decide when to remove a node from
gossip membership is gossiped around the cluster. We switched to steady
clock in the past. In order to have a consistent time_point in all the
nodes in the cluster, we have to use wall clock. Switch to use
system_clock for gossip.

Fixes #1704

(cherry picked from commit f0d3084c8b)
2016-10-09 18:12:46 +03:00
Tomasz Grabiec
e6ef49e366 db: Do not timeout streaming readers
There is a limit to concurrency of sstable readers on each shard. When
this limit is exhausted (currently 100 readers) readers queue. There
is a timeout after which queued readers are failed, equal to
read_request_timeout_in_ms (5s by default). The reason we have the
timeout here is primarily because the readers created for the purpose
of serving a CQL request no longer need to execute after waiting
longer than read_request_timeout_in_ms. The coordinator no longer
waits for the result so there is no point in proceeding with the read.

This timeout should not apply for readers created for streaming. The
streaming client currently times out after 10 minutes, so we could
wait at least that long. Timing out sooner makes streaming unreliable,
which under high load may prevent streaming from completing.

The change sets no timeout for streaming readers at replica level,
similarly as we do for system tables readers.

Fixes #1741.

Message-Id: <1475840678-25606-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 2a5a90f391)
2016-10-09 10:33:39 +03:00
Tomasz Grabiec
7d24a3ed56 transport: Extend request memory footprint accounting to also cover execution
CQL server is supposed to throttle requests so that they don't
overflow memory. The problem is that it currently accounts for
request's memory only around reading of its frame from the connection
and not actual request execution. As a result too many requests may be
allowed to execute and we may run out of memory.

Fixes #1708.
Message-Id: <1475149302-11517-1-git-send-email-tgrabiec@scylladb.com>

(cherry picked from commit 7e25b958ac)
2016-10-03 14:15:21 +03:00
Avi Kivity
ae2a1158d7 Update seastar submodule
* seastar 9b541ef...b62d7a5 (1):
  > semaphore: Introduce get_units()
2016-10-03 14:11:26 +03:00
Asias He
f110c2456a gossip: Do not remove failure_detector history on remove_endpoint
Otherwise a node could wrongly think the decommissioned node is still
alive and not evict it from the gossip membership.

Backport: CASSANDRA-10371

7877d6f Don't remove FailureDetector history on removeEndpoint

Fixes #1714
Message-Id: <f7f6f1eec2aab1b97a2e568acfd756cca7fc463a.1475112303.git.asias@scylladb.com>

(cherry picked from commit 511f8aeb91)
2016-09-29 13:27:34 +03:00
Raphael S. Carvalho
37d27b1144 api: implement api to return sstable count per level
'nodetool cfstats' wasn't showing per-level sstable count because
the API wasn't implemented.

Fixes #1119.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <0dcdf9196eaec1692003fcc8ef18c77d0834b2c6.1474410770.git.raphaelsc@scylladb.com>
(cherry picked from commit 67343798cf)
2016-09-26 09:51:33 +03:00
Tomasz Grabiec
ff674391c2 Merge sestar upstream
Refs #1622
Refs #1690

* seastar b58a287...9b541ef (2):
  > input_stream: Fix possible infinite recursion in consume()
  > iostream: Fix stack overflow in output_stream::split_and_put()
2016-09-22 14:58:44 +02:00
Asias He
55c9279354 gossip: Fix std::out_of_range in setup_collectd
It is possible that endpoint_state_map does not contain the entry for
the node itself when collectd accesses it.

Fixes the issue:

Sep 18 11:33:16 XXX scylla[19483]: [shard 0] seastar - Exceptional
future ignored: std::out_of_range (_Map_base::at)

Fixes #1656

Message-Id: <8ffe22a542ff71e8c121b06ad62f94db54cc388f.1474377722.git.asias@scylladb.com>
(cherry picked from commit aa47265381)
2016-09-20 21:10:55 +03:00
Tomasz Grabiec
ef7b4c61ff tests: Add test for UUID type ordering
Message-Id: <1473956716-5209-2-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 2282599394)
2016-09-20 12:22:03 +02:00
Tomasz Grabiec
b9e169ead9 types: fix uuid_type_impl::less
timeuuid_type_impl::compare_bytes is a "trichotomic" comparator (-1,
0, 1) while less() is a "less" comparator (false, true). The code
incorrectly returns c1 instead of c1 < 0 which breaks the ordering.

Fixes #1196.
Message-Id: <1473956716-5209-1-git-send-email-tgrabiec@scylladb.com>

(cherry picked from commit 804fe50b7f)
2016-09-20 12:22:00 +02:00
Shlomi Livne
dabbadcf39 release: prepare for 1.3.1
Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
2016-09-18 13:18:20 +03:00
Duarte Nunes
c9cb14e160 thrift: Correctly detect clustering range wrap around
This patch uses the clustering bounds comparator to correctly detect
wrap around of a clustering range in the thrift handler.

Refs #1446

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1473952155-14886-1-git-send-email-duarte@scylladb.com>
(cherry picked from commit bc3cbb7009)
2016-09-15 16:52:43 +01:00
Shlomi Livne
985352298d ami: Fix instructions how to run scylla_io_setup on non ephemeral instances
On instances differenet then i2/m3/c3 we provide instructions to run
scylla_ip_setup. Running scylla_io_setup requires access to
/var/lib/scylla to crate a temporary file. To gain access to that
directory the user should run 'sudo scylla_io_setup'.

refs: #1645

Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
Message-Id: <4ce90ca1ba4da8f07cf8aa15e755675463a22933.1473935778.git.shlomi@scylladb.com>
(cherry picked from commit acb83073e2)
2016-09-15 15:27:02 +01:00
Paweł Dziepak
c2d347efc7 Merge "Fix abort when querying with contradicting clustering restrictions" from Tomek
"This series backports fixes for #1670 on top of 1.3 branch.

Fixes abort when querying with contradicting clustering column
restrictions, for example:

   SELECT * FROM test WHERE k = 0 AND ck < 1 and ck > 2"
2016-09-15 14:55:28 +01:00
Tomasz Grabiec
78c7408927 Fix abort when querying with contradicting clustering restrictions
Example of affected query:

  SELECT * FROM test WHERE k = 0 AND ck < 1 and ck > 2

Refs #1670.

This commit brings back the backport of "Don't allow CK wrapping
ranges" by Duarte by reverting commit 11d7f83d52.

It also has the following fix, which is introduced by the
aforementioned commit, squashed to improve bisectability:

"cql3: Consider bound type when detecting wrap around

 This patch uses the clustering bounds comparator to correctly detect
 wrap around of a clustering range. This fixes a manifestation of #1446,
 introduced by b1f9688432, where a query
 such as select * from cf where k = 0x00 and c0 = 0x02 and c1 > 0x02
 would result in a range containing a clustering key and a prefix,
 incorrectly ordered by the prefix equality or lexicographical
 comparators.

 Refs #1446

 Signed-off-by: Duarte Nunes <duarte@scylladb.com>
 (cherry picked from commit ee2694e27d)"
2016-09-14 19:50:49 +02:00
Duarte Nunes
5716decf60 bounds_view: Create from nonwrapping_range
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
(cherry picked from commit 084b931457)
2016-09-14 18:28:55 +02:00
Duarte Nunes
f60eb3958a range_tombstone: Extract out bounds_view
This patch extracts bounds_view from range_tombstone so its comprator
can be reused elsewhere.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
(cherry picked from commit 878927d9d2)
2016-09-14 18:28:55 +02:00
Tomasz Grabiec
7848781e5f database: Ignore spaces in initial_token list
Currently we get boost::lexical_cast on startup if inital_token has a
list which contains spaces after commas, e.g.:

  initial_token: -1100081313741479381, -1104041856484663086, ...

Fixes #1664.
Message-Id: <1473840915-5682-1-git-send-email-tgrabiec@scylladb.com>

(cherry picked from commit a498da1987)
2016-09-14 12:03:06 +03:00
Pekka Enberg
195994ea4b Update scylla-ami submodule
* dist/ami/files/scylla-ami 14c1666...e1e3919 (1):
  > scylla_ami_setup: remove scylla_cpuset_setup
2016-09-07 21:05:33 +03:00
Takuya ASADA
183910b8b4 dist/common/scripts/scylla_sysconfig_setup: sync cpuset parameters with rps_cpus settings when posix_net_conf.sh is enabled and NIC is single queue
On posix_net_conf.sh's single queue NIC mode (which means RPS enabled mode), we are excluded cpu0 and it's sibling from network stack processing cpus, and assigned NIC IRQ to cpu0.
So always network stack is not working on cpu0 and it's sibling, to get better performance we need to exclude these cpus from scylla too.
To do this, we need to get RPS cpu mask from posix_net_conf.sh, pass it to scylla_cpuset_setup to construct /etc/scylla.d/cpuset.conf when scylla_setup executed.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1472544875-2033-2-git-send-email-syuu@scylladb.com>
(cherry picked from commit 533dc0485d)
2016-09-07 21:00:30 +03:00
Takuya ASADA
f4746d2a46 dist/common/scripts/scylla_prepare: drop unnecesarry multiqueue NIC detection code on scylla_prepare
Right now scylla_prepare specifies -mq option to posix_net_conf.sh when number of RX queues > 1, but on posix_net_conf.sh it sets NIC mode to sq when queues < ncpus / 2.
So the logic is different, and actually posix_net_conf.sh does not need to specify -sq/-mq now, it autodetects queue mode.
So we need to drop detection logic from scylla_prepare, let posix_net_conf.sh to detect it.

Fixes #1406

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1472544875-2033-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit 0c3bb2ee63)
2016-09-07 20:59:10 +03:00
Pekka Enberg
fa7f990407 Update seastar submodule
* seastar e6571c4...b58a287 (3):
  > scripts/posix_net_conf.sh: supress 'ls: cannot access
  > /sys/class/net/<NIC>/device/msi_irqs/' error message
  > scripts/posix_net_conf.sh: fix 'command not found' error when
  > specifies --cpu-mask
  > scripts/posix_net_conf.sh: add support --cpu-mask mode
2016-09-07 20:58:09 +03:00
Pekka Enberg
86dbcf093b systemd: Don't start Scylla service until network is up
Alexandr Porunov reports that Scylla fails to start up after reboot as follows:

  Aug 25 19:44:51 scylla1 scylla[637]: Exiting on unhandled exception of type 'std::system_error': Error system:99 (Cannot assign requested address)

The problem is that because there's no dependency to network service,
Scylla simply attempts to start up too soon in the boot sequence and
fails.

Fixes #1618.

Message-Id: <1472212447-21445-1-git-send-email-penberg@scylladb.com>
(cherry picked from commit 2d3aee73a6)
2016-08-29 13:26:11 +03:00
Takuya ASADA
db89811fcc dist/common/scripts/scylla_setup: support enabling services on Ubuntu 15.10/16.04
Right now it ignores Ubuntu, but we shareing .service between Fedora/CentOS and Ubuntu >= 15.10, so support it.

Fixes #1556.

Message-Id: <1471932814-17347-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit 74d994f6a1)
2016-08-29 13:26:08 +03:00
Duarte Nunes
9ec939f6a3 thrift: Avoid always recording size estimates
Size estimates for a particular column family are recorded every 5
minutes. However, when a user calls the describe_splits(_ex) verbs,
they may want to see estimates for a recently created and updated
column family; this is legitimate and common in testing. However, a
client may also call describe_splits(_ex) very frequently and
recording the estimates on every call is wasteful and, worse, can
cause clients to give up. This patch fixes this by only recording
estimates if the first attempt to query them produces no results.

Refs #1139

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1471900595-4715-1-git-send-email-duarte@scylladb.com>
(cherry picked from commit 440c1b2189)
2016-08-29 12:08:53 +03:00
Pekka Enberg
d40f586839 dist/docker: Clean up Scylla description for Docker image
Message-Id: <1472145307-3399-1-git-send-email-penberg@scylladb.com>
(cherry picked from commit c5e5e7bb40)
2016-08-29 10:49:09 +03:00
Raphael S. Carvalho
d55c55efec api: use estimation of pending tasks in compaction manager too
We have API for getting pending compaction tasks both in column
family and compaction manager. Column family is already returning
pending tasks properly.
Compaction manager's one is used by 'nodetool compactionstats', and
was returning a value which doesn't reflect pending compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <a20b88938ad39e95f98bfd7f93e4d1666d1c6f95.1471641211.git.raphaelsc@scylladb.com>
(cherry picked from commit d8be32d93a)
2016-08-29 10:20:20 +03:00
Raphael S. Carvalho
dc79761c17 sstables: Fix estimation of pending tasks for leveled strategy
There were two underflow bugs.

1) in variable i, causing get_level() to see an invalid level and
throw an exception as a result.
2) when estimating number of pending tasks for a level.

Fixes #1603.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <cce993863d9de4d1f49b3aabe981c475700595fc.1471636164.git.raphaelsc@scylladb.com>
(cherry picked from commit 77d4cd21d7)
2016-08-29 10:19:30 +03:00
Paweł Dziepak
bbffb51811 mutation_partition: fix iterator invalidation in trim_rows
Reversed iterators are adaptors for 'normal' iterators. These underlying
iterators point to different objects that the reversed iterators
themselves.

The consequence of this is that removing an element pointed to by a
reversed iterator may invalidate reversed iterator which point to a
completely different object.

This is what happens in trim_rows for reversed queries. Erasing a row
can invalidate end iterator and the loop would fail to stop.

The solution is to introduce
reversal_traits::erase_dispose_and_update_end() funcion which erases and
disposes object pointed to by a given iterator but takes also a
reference to and end iterator and updates it if necessary to make sure
that it stays valid.

Fixes #1609.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1472080609-11642-1-git-send-email-pdziepak@scylladb.com>
(cherry picked from commit 6012a7e733)
2016-08-25 17:40:37 +03:00
Amnon Heiman
ec3ace5aa3 housekeeping: Silently ignore check version if Scylla is not available
Normally, the check version should start and stop with the scylla-server
service.

If it fails to find scylla server, there is no need to check the
version, nor to report it, so it can stop silently.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
(cherry picked from commit 2b98335da4)
2016-08-23 18:11:44 +03:00
Amnon Heiman
1f2d1012be housekeeping: Use curl instead of Python's libraries
There is a problem with Python SSL's in Ubuntu 14.04:

  ubuntu@ip-10-81-165-156:~$ /usr/lib/scylla/scylla-housekeeping -q version
  Traceback (most recent call last):
    File "/usr/lib/scylla/scylla-housekeeping", line 94, in <module>
      args.func(args)
    File "/usr/lib/scylla/scylla-housekeeping", line 71, in check_version
      latest_version = get_json_from_url(version_url + "?version=" + current_version)["version"]
    File "/usr/lib/scylla/scylla-housekeeping", line 50, in get_json_from_url
      response = urllib2.urlopen(req)
    File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
      return _opener.open(url, data, timeout)
    File "/usr/lib/python2.7/urllib2.py", line 404, in open
      response = self._open(req, data)
    File "/usr/lib/python2.7/urllib2.py", line 422, in _open
      '_open', req)
    File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
      result = func(*args)
    File "/usr/lib/python2.7/urllib2.py", line 1222, in https_open
      return self.do_open(httplib.HTTPSConnection, req)
    File "/usr/lib/python2.7/urllib2.py", line 1184, in do_open
      raise URLError(err)
  urllib2.URLError: <urlopen error [Errno 1] _ssl.c:510: error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure>

Instead of using Python libraries to connect to the check version
server, we will use curl for that.

Fixes #1600

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
(cherry picked from commit 4598674673)
2016-08-23 18:11:38 +03:00
Amnon Heiman
9c8cfd3c0e housekeeping: Add curl as a dependency
To work around an SSL problem with Python on Ubuntu 14.04, we need to
use curl. Add it as a dependency so that it's available on the host.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
(cherry picked from commit 91944b736e)
2016-08-23 18:11:13 +03:00
Takuya ASADA
d9e4ab38f6 dist/ubuntu: support scylla-housekeeping service on all Ubuntu versions
Current scylla-housekeeping support on Ubuntu has bug, it does not installs .service/.timer for Ubuntu 16.04.
So fix it to make it work.

Fixes #1502

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Tested-by: Amos Kong <amos@scylladb.com>
Message-Id: <1471607903-14889-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit 80f7449095)
2016-08-23 13:50:43 +03:00
Takuya ASADA
5d72e96ccc dist/common/systemd: don't use .in for scylla-housekeeping.*, since these are not template file
.in is the name for template files witch requires to rewrite on building time, but these systemd unit files does not require rewrite, so don't name .in, reference directly from .spec.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1471607533-3821-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit aac60082ae)
2016-08-23 13:50:36 +03:00
Paweł Dziepak
a072df0e09 sstables: do not call consume_end_partition() after proceed::no
After state_processor().process_state() returns proceed::no the upper
layer should have a chance to act before more data is pushed to the
consumer. This means that in case of proceed::no verify_end_state()
should not be called immediately since it may invoke
consume_end_partition().

Fixes #1605.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1471943032-7290-1-git-send-email-pdziepak@scylladb.com>
(cherry picked from commit 5feed84e32)
2016-08-23 12:24:59 +03:00
Pekka Enberg
8291ec13fa dist/docker: Separate supervisord config files
Move scylla-server and scylla-jmx supervisord config files to separate
files and make the main supervisord.conf scan /etc/supervisord.conf.d/
directory. This makes it easier for people to extend the Docker image
and add their own services.

Message-Id: <1471588406-25444-1-git-send-email-penberg@scylladb.com>
(cherry picked from commit 9d1d8baf37)
2016-08-23 11:56:24 +03:00
Vlad Zolotarov
c485551488 tracing::trace_state: fix a compilation error with gcc 4.9
See #1602.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1471774784-26266-1-git-send-email-vladz@cloudius-systems.com>
2016-08-21 16:39:50 +03:00
Paweł Dziepak
b85164bc1d sstables: optimise clustering rows filtering
Clustering rows in the sstables are sorted in the ascending order so we
can use that to minimise number of comparisons when checking if a row is
in the requested range.

Refs #1544.

Paweł further explains the backport rationale for 1.3:

"Apart from making sense on its own, this patch has a very curious
property
of working around #1544 in a way that doesn't make #1446 hit us harder
than
usual.
So, in the branch-1.3 we can:
 - revert 85376ce555
   'Merge "Don't allow CK wrapping ranges" from Duarte' -- previous,
   insufficient workaround for #1544
 - apply this patch
 - rejoice as cql_query_test passes and #1544 is no longer a problem

The scenario above assumes that this patch doesn't introduces any
regressions."

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Reviewed-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <1471608921-30818-1-git-send-email-pdziepak@scylladb.com>
(cherry picked from commit e60bb83688)
2016-08-19 18:11:49 +03:00
Pekka Enberg
11d7f83d52 Revert "Merge "Don't allow CK wrapping ranges" from Duarte"
This reverts commit 85376ce555, reversing
changes made to 3f54e0c28e.

The change breaks CQL range queries.
2016-08-19 18:11:31 +03:00
Pekka Enberg
5c4a24c1c0 dist/docker: Use Scylla mascot as the logo
Glauber "eagle eyes" Costa pointed out that the Scylla logo used in our
Docker image documentation looks broken because it's missing the Scylla
text.

Fix the problem by using the Scylla mascot instead.
Message-Id: <1471525154-2800-1-git-send-email-penberg@scylladb.com>

(cherry picked from commit 2bf5e8de6e)
2016-08-19 12:50:46 +03:00
Pekka Enberg
306eeedf3e dist/docker: Fix bug tracker URL in the documentation
The bug tracker URL in our Docker image documentation is not clickable
because the URL Markdown extracts automatically is broken.

Fix that and add some more links on how to get help and report issues.
Message-Id: <1471524880-2501-1-git-send-email-penberg@scylladb.com>

(cherry picked from commit 4d90e1b4d4)
2016-08-19 12:50:42 +03:00
Pekka Enberg
9eada540d9 release: prepare for 1.3.0 2016-08-18 16:20:21 +03:00
Yoav Kleinberger
a662765087 docker: extend supervisor capabilities
allow user to use the `supervisorctl' program to start and stop
services. `exec` needed to be added to the scylla and scylla-jmx starter
scripts - otherwise supervisord loses track of the actual process we
want to manage.

Signed-off-by: Yoav Kleinberger <yoav@scylladb.com>
Message-Id: <1471442960-110914-1-git-send-email-yoav@scylladb.com>
(cherry picked from commit 25fb5e831e)
2016-08-18 15:41:08 +03:00
Pekka Enberg
192f89bc6f dist/docker: Documentation cleanups
- Fix invisible characters to be space so that Markdown to PDF
  conversion works.

- Fix formatting of examples to be consistent.

- Spellcheck.

Message-Id: <1471514924-29361-1-git-send-email-penberg@scylladb.com>
(cherry picked from commit 1553bec57a)
2016-08-18 13:14:21 +03:00
Pekka Enberg
b16bb0c299 dist/docker: Document image command line options
This patch documents all the command line options Scylla's Docker image supports.

Message-Id: <1471513755-27518-1-git-send-email-penberg@scylladb.com>
(cherry picked from commit 4ca260a526)
2016-08-18 13:14:16 +03:00
Amos Kong
cd9d967c44 systemd: have the first housekeeping check right after start
Issue: https://github.com/scylladb/scylla/issues/1594

Currently systemd run first housekeeping check at the end of
first timer period. We expected it to be run right after start.

This patch makes systemd to be consistent with upstart.

Signed-off-by: Amos Kong <amos@scylladb.com>
Message-Id: <4cc880d509b0a7b283278122a70856e21e5f1649.1471433388.git.amos@scylladb.com>
(cherry picked from commit 9d53305475)
2016-08-17 16:02:25 +03:00
Avi Kivity
236b089b03 Merge "Fixes for streamed_mutation_from_mutation" from Paweł
"This series contains fixes for two memory leaks in
streamed_mutation_from_mutation.

Fixes #1557."

(cherry picked from commit 4871b19337)
2016-08-17 13:26:25 +03:00
Avi Kivity
9d54b33644 Merge 2016-08-17 13:25:49 +03:00
Benoit Canet
4ef6c3155e systemd: Remove WorkingDirectory directive
The WorkingDirectory directive does not support environment variables on
systemd version that is shipped with Ubuntu 16.04. Fortunately, not
setting WorkingDirectory implicitly sets it to user home directory,
which is the same thing (i.e. /var/lib/scylla).

Fixes #1319

Signed-of-by: Benoit Canet <benoit@scylladb.com>
Message-Id: <1470053876-1019-1-git-send-email-benoit@scylladb.com>
(cherry picked from commit 90ef150ee9)
2016-08-17 12:34:44 +03:00
Takuya ASADA
fe529606ae dist/common/scripts: mkdir -p /var/lib/scylla/coredump before symlinking
We are creating this dir in scylla_raid_setup, but user may create XFS volume w/o using the command, scylla_coredump_setup should work on such condition.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1470638615-17262-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit 60ce16cd54)
2016-08-16 12:35:15 +03:00
Avi Kivity
85376ce555 Merge "Don't allow CK wrapping ranges" from Duarte
"This pathset ensures user-specified clustering key ranges are never
wrapping, as those types of ranges are not defined for CQL3.

Fixes #1544"
2016-08-16 10:09:31 +03:00
Duarte Nunes
5e8ac82614 cql3: Discard wrap around ranges.
Wrapping ranges are not supported in CQL3. If one is specified,
this patch converts it to an empty range.

Fixes #1544

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-08-15 15:22:44 +00:00
Duarte Nunes
22c8520d61 storage_proxy: Short circuit query without clustering ranges
This patch makes the storage_proxy return an empty result when the
query doesn't define any clustering ranges (default or specific).

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-08-15 15:05:23 +00:00
Duarte Nunes
e7355c9b60 thrift: Don't always validate clustering range
This patch makes make_clustering_range not enforce that the range be
non-wrapping, so that it can be validated differently if needed. A
make_clustering_range_and_validate function is introduced that keeps
the old behavior.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-08-15 15:05:18 +00:00
Paweł Dziepak
3f54e0c28e partition_version: handle errors during version merge
Currently, partition snapshot destructor can throw which is a big no-no.
The solution is to ignore the exception and leave versions unmerged and
hope that subsequent reads will succeed at merging.

However, another problem is that the merge doesn't use allocating
sections which means that memory won't be reclaimed to satisfy its
needs. If the cache is full this may result in partition versions not
being merged for a very long time.

This patch introduces partition_snapshot::merge_partition_versions()
which contains all the version merging logic that was previously present
in the snapshot destructor. This function may throw so that it can be
used with allocating sections.

The actual merging and handling of potential erros is done from
partition_snapshot_reader destructor. It tries to merge versions under
the allocating section. Only if that fails it gives up and leaves them
unmerged.

Fixes #1578
Fixes #1579.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1471265544-23579-1-git-send-email-pdziepak@scylladb.com>
(cherry picked from commit 5cae44114f)
2016-08-15 15:57:10 +03:00
Asias He
4c6f8f9d85 gossip: Add heart_beat_version to collectd
$ tools/scyllatop/scyllatop.py '*gossip*'

node-1/gossip-0/gauge-heart_beat_version 1.0
node-2/gossip-0/gauge-heart_beat_version 1.0
node-3/gossip-0/gauge-heart_beat_version 1.0

Gossip heart beat version changes every second. If everyting is working
correctly, the gauge-heart_beat_version output should be 1.0. If not,
the gauge-heart_beat_version output should be less than 1.0.

Message-Id: <cbdaa1397cdbcd0dc6a67987f8af8038fd9b2d08.1470712861.git.asias@scylladb.com>
(cherry picked from commit ef782f0335)
2016-08-15 12:32:17 +03:00
Nadav Har'El
7a76157cb9 sstables: don't forget to read static row
[v2: fix check for static column (don't check if the schema is not compound)
 and move want-static-columns flag inside the filtering context to avoid
 changing all the callers.]

When a CQL request asks to read only a range of clustering keys inside
a partition, we actually need to read not just these clustering rows, but
also the static columns and add them to the response (as explained by Tomek
in issue #1568).

With the current code, that CQL request is translated into an
sstable::read_row() with a clustering-key filter. But this currently
only reads the requested clustering keys - NOT the static columns.

We don't want sstable::read_row() to unconditionally read the from disk
the static columns because if, for example, they are already cached, we
might not want to read them from disk. We don't have such partial-partition
cache yet, but we are likely to have one in the future.

This patch adds in the clustering key filter object a flag of whether we
need to read the static columns (actually, it's function, returning this
flag per partition, to match the API for the clustering-key filtering).

When sstable::read_row() sees the flag for this partition is true, it also
request to read the static columns.
Currently, the code always passes "true" for this flag - because we don't
have the logic to cache partially-read partitions.

The current find_disk_ranges() code does not yet support returning a non-
contiguous byte range, so this patch, if it notices that this partition
really has static columns in addition to the range it needs to read,
falls back to reading the entire partition. This is a correct solution
(and fixes #1568) but not the most efficient solution. Because static
columns are relatively rare, let's start with this solution (correct
by less efficient when there are static columns) and providing the non-
contiguous reading support is left as a FIXME.

Fixes #1568

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1471124536-19471-1-git-send-email-nyh@scylladb.com>
(cherry picked from commit 0d00da7f7f)
2016-08-15 12:30:36 +03:00
Amnon Heiman
b2e6a52461 scylla.spec: conditionally include the housekeeping.cfg in the conf package
When the housekeeping configuration name was changed from conf to cfg it
was no longer included as part of the conf rpm.

This change adds a macro that determines of if the file should be
included or not and use that marco to conditionally add the
configuration file to the rpm.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1471169042-19099-1-git-send-email-amnon@scylladb.com>
(cherry picked from commit 612f677283)
2016-08-14 13:26:25 +03:00
Tomasz Grabiec
b1376fef9b partition_version: Add missing linearization context
Snapshot removal merges partitions, and cell merging must be done
inside linearization context.

Fixes #1574

Reviewed-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1471010625-18019-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 1b2ea14d0e)
2016-08-12 17:56:21 +03:00
Piotr Jastrzebski
23f4813a48 Fix after free access bug in storage proxy
Due to speculative reads we can't guarantee that all
fibers started by storage_proxy::query will be finished
by the time the method returns a result.

We need to make sure that no parameter passed to this
method ever changes.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <31952e323e599905814b7f378aafdf779f7072b8.1471005642.git.piotr@scylladb.com>
(cherry picked from commit f212a6cfcb)
2016-08-12 16:35:45 +02:00
Duarte Nunes
c16c3127fe docker: If set, broadcast address is seed
This patch configures the broadcast address to be the seed if it is
configured, otherwise Scylla complains about it and aborts.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1470863058-1011-1-git-send-email-duarte@scylladb.com>
(cherry picked from commit 918a2939ff)
2016-08-12 11:47:08 +03:00
Tomasz Grabiec
48fdeb47e2 Merge branch 'raphael/fix_min_max_metadata_v2' from git@github.com:raphaelsc/scylla.git
Fix for generation of sstables min/max clustering metadata from Raphael.

(cherry picked from commit d7f8ce7722)
2016-08-11 17:53:01 +03:00
Avi Kivity
9ef4006d67 Update seastar submodule
* seastar 36a8ebe...e6571c4 (1):
  > reactor: Do not test for poll mode default
2016-08-11 14:45:52 +03:00
Amnon Heiman
75c53e4f24 scylla-housekeeping: rename configuration file from conf to cfg
Files with a conf extension are run by the scylla_prepare on the AMI.
The scylla-housekeeping configuration file is not a bash script and
should not be run.

This patch changes its extension to cfg which is more python like.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1470896759-22651-2-git-send-email-amnon@scylladb.com>
(cherry picked from commit 5a4fc9c503)
2016-08-11 14:45:11 +03:00
Tomasz Grabiec
66292c0ef0 sstables: Fix bug in promoted index generation
maybe_flush_pi_block, which is called for each cell, assumes that
block_first_colname will be empty when the first cell is encountered
for each partition.

This didn't hold after writing partition which generated no index
entry, because block_first_colname was cleared only when there way any
data written into the promoted index. Fix by always clearing the name.

The effect was that the promoted index entry for the next partition
would be flushed sooner than necessary (still counting since the start
of the previous partition) and with offset pointing to the start of
the current partition. This will cause parsing error when such sstable
is read through promoted index entry because the offset is assumed to
point to a cell not to partition start.

Fixes #1567

Message-Id: <1470909915-4400-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit f1c2481040)
2016-08-11 13:09:05 +03:00
Amnon Heiman
84f7d9a49c build_deb: Add dist flag
The dist flag mark the debian package as distributed package.
As such the housekeeping configuration file will be included in the
package and will not need to be created by the scylla_setup.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1470907208-502-2-git-send-email-amnon@scylladb.com>
(cherry picked from commit a24941cc5f)
2016-08-11 12:25:28 +03:00
Pekka Enberg
f0535eae9b dist/docker: Fix typo in "--overprovisioned" help text
Reported by Mathias Bogaert (@analytically).
Message-Id: <1470904395-4614-1-git-send-email-penberg@scylladb.com>

(cherry picked from commit d1a052237d)
2016-08-11 11:49:42 +03:00
Avi Kivity
4f096b60df Update seastar submodule
* seastar de789f1...36a8ebe (1):
  > reactor: fix I/O queue pending requests collectd metric

Fixes #1558.
2016-08-10 15:28:09 +03:00
Pekka Enberg
4ac160f2fe release: prepare for 1.3.rc3 2016-08-10 13:53:53 +03:00
Avi Kivity
395edc4361 Merge 2016-08-10 13:34:48 +03:00
Avi Kivity
e2c9feafa3 Merge "Add configuration file to scylla-housekeeping" from Amnon
"The series adds an optional configuration file to the scylla-housekeeping. The
file act as a way to prevent the scylla-housekeeping to run. A missing
configuration file, will make the scylla-housekeeping immediately.

The series adds a flag to the build_rpm that differentiate between public
distributions that would contain the configuration file and private
distributions that will not contain it which will cause the setup script to
create it."

(cherry picked from commit da4d33802e)
2016-08-10 13:34:04 +03:00
Avi Kivity
f4dea17c19 Merge "housekeeping: Switch to pytho2 and handle version" from Amnon
This series handle two issues:
* Moving to python2, though python3 is supported, there are modules that we
  need that are not rpm installable, python3 would wait when it will be more
  mature.

* Check version should send the current version when it check for a new one and
  a simple string compare is wrong.

(cherry picked from commit ec62f0d321)
2016-08-10 13:31:50 +03:00
Amnon Heiman
a45b72b66f scylla-housekeeping: check version should use the current version
This patch handle two issues with check version:
* When checking for a version, the script send the current version
* Instead of string compare it uses parse_version to compare the
versions.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
(cherry picked from commit 406fa11cc5)
2016-08-10 13:29:53 +03:00
Amnon Heiman
be1c2a875b scylla-housekeeping: Switchng to pythno2
There is a problem with python module installation in pythno3,
especially on centos. Though pytho34 has a normal package, alot of the
modules are missing yum installation and can only be installed by pip.

This patch switch the  scylla-housekeeping implementation to use
python2, we should switch back to python3 when CeontOS python3 will be
more mature.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
(cherry picked from commit 641e5dc57c)
2016-08-10 13:23:47 +03:00
Nadav Har'El
0b9f83c6b6 sstable: avoid copying non-existant value
The promoted-index reading code contained a bug where it copied the value
of an disengaged optional (this non-value was never used, but it was still
copied ). Fix it by keeping the optional<> as such longer.

This bug caused tests/sstable_test in the debug build to crash (the release
build somehow worked).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1470742418-8813-1-git-send-email-nyh@scylladb.com>
(cherry picked from commit e005762271)
2016-08-10 13:14:49 +03:00
Pekka Enberg
0d77615b80 cql3: Filter compaction strategy class from compaction options
Cassandra 2.x does not store the compaction strategy class in compaction
options so neither should we to avoid confusing the drivers.

Fixes #1538.
Message-Id: <1470722615-29106-1-git-send-email-penberg@scylladb.com>

(cherry picked from commit 9ff242d339)
2016-08-10 12:44:50 +03:00
Pekka Enberg
8771220745 dist/docker: Add '--smp', '--memory', and '--overprovisoned' options
Add '--smp', '--memory', and '--overprovisioned' options to the Docker
image. The options are written to /etc/scylla.d/docker.conf file, which
is picked up by the Scylla startup scripts.

You can now, for example, restrict your Docker container to 1 CPU and 1
GB of memory with:

   $ docker run --name some-scylla penberg/scylla --smp 1 --memory 1G --overprovisioned 1

Needed by folks who want to run Scylla on Docker in production.

Cc: Sasha Levin <alexander.levin@verizon.com>
Message-Id: <1470680445-25731-1-git-send-email-penberg@scylladb.com>
(cherry picked from commit 6a5ab6bff4)
2016-08-10 11:54:01 +03:00
Avi Kivity
f552a62169 Update seastar submodule
* seastar ee1ecc5...de789f1 (1):
  > Merge "Fix the SMP queue poller" from Tomasz

Fixes #1553.
2016-08-10 09:54:15 +03:00
Avi Kivity
696a978611 Update seastar submodule
* seastar 0b53ab2...ee1ecc5 (1):
  > byteorder: add missing cpu_to_be(), be_to_cpu() functions

Fixes build failure.
2016-08-10 09:51:35 +03:00
Nadav Har'El
0475a98de1 Avoid some warnings in debug build
The sanitizer of the debug build warns when a "bool" variable is read when
containing a value not 0 or 1. In particular, if a class has an
uninitialized bool field, which class logic allows to only be set later,
then "move"ing such an object will read the uninitialized value and produce
this warning.

This patch fixes four of these warnings seen in sstable_test by initializing
some bool fields to false, even though the code doesn't strictly need this
initialization.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1470744318-10230-1-git-send-email-nyh@scylladb.com>
(cherry picked from commit c2e4f5ba16)
2016-08-09 17:54:54 +03:00
Nadav Har'El
0b69e37065 Fix failing tests
Commit 0d8463aba5 broke some of the tests with an assertion
failure about local_is_initialized(). It turns out that there is more than
one level of local_is_initialized() we need to check... For some tests,
neither locals were initialized, but for others, one was and the other
wasn't, and the wrong one was tested.

With this patch, all unit tests except "flush_queue_test.cc" pass on my
machine. I doubt this test is relevant to the promoted index patches,
but I'll continue to investigate it.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1470695199-32649-1-git-send-email-nyh@scylladb.com>
(cherry picked from commit bce020efbd)
2016-08-09 17:54:49 +03:00
Avi Kivity
dc6be68852 Merge "promoted index for reading partial partitions" from Nadav
"The goal of this patch series is to support reading and writing of a
"promoted index" - the Cassandra 2.* SSTable feature which allows reading
only a part of the partition without needing to read an entire partition
when it is very long. To make a long story short, a "promoted index" is
a sample of each partition's column names, written to the SSTable Index
file with that partition's entry. See a longer explanation of the index
file format, and the promoted index, here:

     https://github.com/scylladb/scylla/wiki/SSTables-Index-File

There are two main features in this series - first enabling reading of
parts of partitions (using the promoted index stored in an sstable),
and then enable writing promoted indexes to new sstables. These two
features are broken up into smaller stand-alone pieces to facilitate the
review.

Three features are still missing from this series and are planned to be
developed later:

1. When we fail to parse a partition's promoted index, we silently fall back
   to reading the entire partition. We should log (with rate limiting) and
   count these errors, to help in debugging sstable problems.

2. The current code only uses the promoted index when looking for a single
   contiguous clustering-key range. If the ck range is non-contiguous, we
   fall back to reading the entire partition. We should use the promoted
   index in that case too.

3. The current code only uses the promoted index when reading a single
   partition, via sstable::read_row(). When scanning through all or a
   range of partitions (read_rows() or read_range_rows()), we do not yet
   use the promoted index; We read contiguously from data file (we do not
   even read from the index file, so unsurprisingly we can't use it)."

(cherry picked from commit 700feda0db)
2016-08-09 17:54:15 +03:00
Avi Kivity
8c20741150 Revert "sstables: promoted index write support"
This reverts commit c0e387e1ac.  The full
patchset needs to be backported instead.
2016-08-09 17:53:24 +03:00
Avi Kivity
3e3eaa693c Revert "Fix failing tests"
This reverts commit 8d542221eb.  It is needed,
but prevents another revert from taking place.  Will be reinstated later
2016-08-09 17:52:57 +03:00
Avi Kivity
03ef0a9231 Revert "Avoid some warnings in debug build"
This reverts commit 47bf8181af.  It is needed,
but prevents another revert from taking place.  Will be reinstated later.
2016-08-09 17:52:09 +03:00
Nadav Har'El
47bf8181af Avoid some warnings in debug build
The sanitizer of the debug build warns when a "bool" variable is read when
containing a value not 0 or 1. In particular, if a class has an
uninitialized bool field, which class logic allows to only be set later,
then "move"ing such an object will read the uninitialized value and produce
this warning.

This patch fixes four of these warnings seen in sstable_test by initializing
some bool fields to false, even though the code doesn't strictly need this
initialization.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1470744318-10230-1-git-send-email-nyh@scylladb.com>
(cherry picked from commit c2e4f5ba16)
2016-08-09 16:58:27 +03:00
Nadav Har'El
8d542221eb Fix failing tests
Commit 0d8463aba5 broke some of the tests with an assertion
failure about local_is_initialized(). It turns out that there is more than
one level of local_is_initialized() we need to check... For some tests,
neither locals were initialized, but for others, one was and the other
wasn't, and the wrong one was tested.

With this patch, all unit tests except "flush_queue_test.cc" pass on my
machine. I doubt this test is relevant to the promoted index patches,
but I'll continue to investigate it.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1470695199-32649-1-git-send-email-nyh@scylladb.com>
(cherry picked from commit bce020efbd)
2016-08-09 16:58:27 +03:00
Nadav Har'El
c0e387e1ac sstables: promoted index write support
This patch adds writing of promoted index to sstables.

The promoted index is basically a sample of columns and their positions
for large partitions: The promoted index appears in the sstable's index
file for partitions which are larger than 64 KB, and divides the partition
to 64 KB blocks (as in Cassandra, this interval is configurable through
the column_index_size_in_kb config parameter). Beyond modifying the index
file, having a promoted index may also modify the data file: Since each
of blocks may be read independently, we need to add in the beginning of
each block the list of range tombstones that are still open at that
position.

See also https://github.com/scylladb/scylla/wiki/SSTables-Index-File

Fixes #959

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
(cherry picked from commit 0d8463aba5)
2016-08-09 16:58:27 +03:00
Duarte Nunes
57d3dc5c66 thrift: Set default validator
This patch sets the default validator for dynamic column families.
Doing so has no consequences in terms of behavior, but it causes the
correct type to be shown when describing the column family through
cassandra-cli.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1470739773-30497-1-git-send-email-duarte@scylladb.com>
(cherry picked from commit 0ed19ec64d)
2016-08-09 13:56:43 +02:00
Duarte Nunes
2daee0b62d thrift: Send empty col metadata when describing ks
This patch ensures we always send the column metadata, even when the
column family is dynamic and the metadata is empty, as some clients
like cassandra-cli always assume its presence.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1470740971-31169-1-git-send-email-duarte@scylladb.com>
(cherry picked from commit f63886b32e)
2016-08-09 14:34:14 +03:00
Pekka Enberg
3eddf5ac54 dist/docker: Document data volume and cpuset configuration
Message-Id: <1470649675-5648-1-git-send-email-penberg@scylladb.com>
(cherry picked from commit 3b31d500c8)
2016-08-09 11:15:57 +03:00
Pekka Enberg
42d6f389f9 dist/docker: Add '--broadcast-rpc-address' command line option
We already have a '--broadcat-address' command line option so let's add
the same thing for RPC broadcast address configuration.

Message-Id: <1470656449-11038-1-git-send-email-penberg@scylladb.com>
(cherry picked from commit 4372da426c)
2016-08-09 11:15:53 +03:00
Pekka Enberg
1a6f6f1605 Update scylla-ami submodule
* dist/ami/files/scylla-ami 2e599a3...14c1666 (1):
  > setup coredump on first startup
2016-08-09 11:10:20 +03:00
Avi Kivity
8d8e997f5a Update scylla-ami submodule
* dist/ami/files/scylla-ami 863cc45...2e599a3 (1):
  > Do not set developer-mode on unsupported instance types
2016-08-07 17:52:13 +03:00
Takuya ASADA
50ee889679 dist/ami/files: add a message for privacy policy agreement on login prompt
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1470212054-351-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit 3d45d6579b)
2016-08-07 17:40:56 +03:00
Duarte Nunes
325f917d8a system_keyspace: Correctly deal with wrapped ranges
This patch ensures we correctly deal with ranges that wrap around when
querying the size_estimates system table.

Ref #693

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1470412433-7767-1-git-send-email-duarte@scylladb.com>
(cherry picked from commit e0a43a82c6)
2016-08-07 17:21:58 +03:00
Takuya ASADA
b088dd7d9e dist/ami/files: show warning message for unsupported instance types
Notify users to run scylla_io_setup before lunching scylla on unsupported instance types.

Fixes #1511

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1470090415-8632-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit bd1ab3a0ad)
2016-08-05 09:51:27 +03:00
Takuya ASADA
a42b2bb0d6 dist/ami: Install scylla metapackage and debuginfo on AMI
Install scylla metapackage and debuginfo on AMI to make AMI to report bugs easier.
Fixes #1496

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1469635071-16821-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit 9b59bb59f2)
2016-08-05 09:48:19 +03:00
Takuya ASADA
aecda01f8e dist/common/scripts: disable coredump compression by default, add an argument to enable compression on scylla_coredump_setup
On large memory machine compression takes too long, so disable it by default.
Also provide a way to enable it again.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1469706934-6280-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit 89b790358e)
2016-08-05 09:47:17 +03:00
Takuya ASADA
f9b0a29def dist/ami: setup correct repository when --localrpm specified
There was no way to setup correct repo when AMI is building by --localrpm option, since AMI does not have access to 'version' file, and we don't passed repo URL to the AMI.
So detect optimal repo path when starting build AMI, passes repo URL to the AMI, setup it correctly.

Note: this changes behavor of build_ami.sh/scylla_install_pkg's --repo option.
It was repository URL, but now become .repo/.list file URL.
This is optimal for the distribution which requires 3rdparty packages to install scylla, like CentOS7.
Existing shell scripts which invoking build_ami.sh are need to change in new way, such as our Jenkins jobs.

Fixes #1414

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1469636377-17828-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit d3746298ae)
2016-08-05 09:45:22 +03:00
Pekka Enberg
192e935832 dist/docker: Use Scylla 1.3 RPM repository 2016-08-05 09:13:08 +03:00
Avi Kivity
436ff3488a Merge "Docker image fixes" from Pekka
"Kubernetes is unhappy with our Docker image because we start systemd
under the hood. Fix that by switching to use "supervisord" to manage the
two processes -- "scylla" and "scylla-jmx":

  http://blog.kunicki.org/blog/2016/02/12/multiple-entrypoints-in-docker/

While at it, fix up "docker logs" and "docker exec cqlsh" to work
out-of-the-box, and update our documentation to match what we have.

Further work is needed to ensure Scylla production configuration works
as expected and is documented accordingly."

(cherry picked from commit 28ee2bdbd2)
2016-08-05 09:10:51 +03:00
Benoît Canet
b91712fc36 docker: Add documentation page for Docker Hub
Signed-of-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1466438296-5593-1-git-send-email-benoit@scylladb.com>
(cherry picked from commit 4ce7bced27)
2016-08-05 09:10:48 +03:00
Yoav Kleinberger
be954ccaec docker: bring docker image closer to a more 'standard' scylla installation
Previously, the Docker image could only be run interactively, which is
not conducive for running clusters. This patch makes the docker image
run in the background (using systemd). This makes the docker workflow
similar to working with virtual machines, i.e. the user launches a
container, and once it is running they can connect to it with

       docker exec -it <container_name> bash

and immediately use `cqlsh` to control it.

In addition, the configuration of scylla is done using established
scripts, such as `scylla_dev_mode_setup`, `scylla_cpuset_setup` and
`scylla_io_setup`, whereas previously code from these scripts was
duplicated into the docker startup file.

To specify seeds for making a cluster, use the --seeds command line
argument, e.g.

    docker run -d --privileged scylladb/scylla
    docker run -d --privileged scylladb/scylla --seeds 172.17.0.2

other options include --developer-mode, --cpuset, --broadcast-address

The --developer-mode option mode is on by default - so that we don't fail users
who just want to play with this.

The Dockerfile entrypoint script was rewritten as a few Python modules.
The move to Python is meritted because:

    * Using `sed` to manipulate YAML is fragile
    * Lack of proper command line parsing resulted in introducing ad-hoc environment variables
    * Shell scripts don't throw exceptions, and it's easy to forget to check exit codes for every single command

I've made an effort to make the entrypoint `go' script very simple and readable.
The goary details are hidden inside the other python modules.

Signed-off-by: Yoav Kleinberger <yoav@scylladb.com>
Message-Id: <1468938693-32168-1-git-send-email-yoav@scylladb.com>
(cherry picked from commit d1d1be4c1a)
2016-08-05 09:10:35 +03:00
Glauber Costa
2bffa8af74 logalloc: make sure allocations in release_requests don't recurse back into the allocator
Calls like later() and with_gate() may allocate memory, although that is not
very common. This can create a problem in the sense that it will potentially
recurse and bring us back to the allocator during free - which is the very thing
we are trying to avoid with the call to later().

This patch wraps the relevant calls in the reclaimer lock. This do mean that the
allocation may fail if we are under severe pressure - which includes having
exhausted all reserved space - but at least we won't recurse back to the
allocator.

To make sure we do this as early as possible, we just fold both release_requests
and do_release_requests into a single function

Thanks Tomek for the suggestion.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <980245ccc17960cf4fcbbfedb29d1878a98d85d8.1470254846.git.glauber@scylladb.com>
(cherry picked from commit fe6a0d97d1)
2016-08-04 11:17:54 +02:00
Glauber Costa
4a6d0d503f logalloc: make sure blocked requests memory allocations are served from the standar allocator
Issue 1510 describes a scenario in which, under load, we allocate memory within
release_requests() leading to a reentry into an invalid state in our
blocked requests' shared_promise.

This is not easy to trigger since not all allocations will actually get to the
point in which they need a new segment, let alone have that happening during
another allocator call.

Having those kinds of reentry is something we have always sought to avoid with
release_requests(): this is the reason why most of the actual routine is
deferred after a call to later().

However, that is a trick we cannot use for updating the state of the blocked
requests' shared_promise: we can't guarantee when is that going to run, and we
always need a valid shared_promise, in a valid state, waiting for new requests
to hook into.

The solution employed by this patch is to make sure that no allocation
operations whatsoever happen during the initial part of release_requests on
behalf of the shared promise.  Allocation is now deferred to first use, which
relieves release_requests() from all allocation duties. All it needs to do is
free the old object and signal to the its user that an allocation is needed (by
storing {} into the shared_promise).

Fixes #1510

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <49771e51426f972ddbd4f3eeea3cdeef9cc3b3c6.1470238168.git.glauber@scylladb.com>
(cherry picked from commit ad58691afb)
2016-08-04 11:17:49 +02:00
Avi Kivity
fa81385469 conf: synchronize internode_compression between scylla.yaml and code
Our default is "none", to give reasonable performance, so have scylla.yaml
reflect that.

(cherry picked from commit 9df4ac53e5)
2016-08-04 12:10:03 +03:00
Duarte Nunes
93981aaa93 schema_builder: Ensure dense tables have compact col
This patch ensures that when the schema is dense, regardless of
compact_storage being set, the single regular columns is translated
into a compact column.

This fixes an issue where Thrift dynamic column families are
translated to a dense schema with a regular column, instead of a
compact one.

Since a compact column is also a regular column (e.g., for purposes of
querying), no further changes are required.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1470062410-1414-1-git-send-email-duarte@scylladb.com>
(cherry picked from commit 5995aebf39)

Fixes #1535.
2016-08-03 13:49:51 +02:00
Duarte Nunes
89b40f54db schema: Dense schemas are correctly upgrades
When upgrading a dense schema, we would drop the cells of the regular
(compact) column. This patch fixes this by making the regular and
compact column kinds compatible.

Fixes #1536

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1470172097-7719-1-git-send-email-duarte@scylladb.com>
2016-08-03 13:37:57 +02:00
Paweł Dziepak
99dfbedf36 sstables: extend sstable life until reader is fully closed
data_consume_rows_context needs to have close() called and the returned
future waited for before it can be destroyed. data_consume_context::impl
does that in the background upon its destruction.

However, it is possible that the sstable is removed before
data_consume_rows_context::close() completes in which case EBADF may
happen. The solution is to make data_consume_context::impl keep a
reference to the sstable and extend its life time until closing of
data_consume_rows_context (which is performed in the background)
completes.

Side effect of this change is also that data_consume_context no longer
requires its user to make sure that the sstable exists as long as it is
in use since it owns its own reference to it.

Fixes #1537.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1470222225-19948-1-git-send-email-pdziepak@scylladb.com>
(cherry picked from commit 02ffc28f0d)
2016-08-03 13:19:50 +02:00
Paweł Dziepak
e95f4eaee4 Merge "partition_limit: Don't count dead partitions" from Duarte
"This patch series ensures we don't count dead partitions (i.e.,
partitions with no live rows) towards the partition_limit. We also
enforce the partition limit at the storage_proxy level, so that
limits with smp > 1 works correctly."

(cherry picked from commit 5f11a727c9)
2016-08-03 12:44:32 +03:00
Avi Kivity
2570da2006 Update seastar submodule
* seastar f603f88...0b53ab2 (2):
  > reactor: limit task backlog
  > reactor: make sure a poll cycle always happens when later is called

Fix runaway task queue growth on cpu-bound loads.
2016-08-03 12:33:54 +03:00
Tomasz Grabiec
b224ff6ede Merge 'pdziepak/row-cache-wide-entries/v4' from seastar-dev.git
This series adds the ability for partition cache to keep information
whether partition size makes it uncacheable. During, reads these
entries save us IO operations since we already know that the partiiton
is too big to be put in the cache.

First part of the patchset makes all mutation_readers allow the
streamed_mutations they produce to outlive them, which is a guarantee
used later by the code handling reading large partitions.

(cherry picked from commit d2ed75c9ff)
2016-08-02 20:24:29 +02:00
Piotr Jastrzebski
6960fce9b2 Use continuity flag correctly with concurrent invalidations
Between reading cache entry and actually using it
invalidations can happen so we have to check if no flag was
cleared if it was we need to read the entry again.

Fixes #1464.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <7856b0ded45e42774ccd6f402b5ee42175bd73cf.1469701026.git.piotr@scylladb.com>
(cherry picked from commit fdfd1af694)
2016-08-02 20:24:22 +02:00
Avi Kivity
a556265ccd checked_file: preserve DMA alignment
Inherit the alignment parameters from the underlying file instead of
defaulting to 4096.  This gives better read performance on disks with 512-byte
sectors.

Fixes #1532.
Message-Id: <1470122188-25548-1-git-send-email-avi@scylladb.com>

(cherry picked from commit 9f35e4d328)
2016-08-02 12:22:37 +03:00
Duarte Nunes
8243d3d1e0 storage_service: Fix get_range_to_address_map_in_local_dc
This patch fixes a couple of bugs in
get_range_to_address_map_in_local_dc.

Fixes #1517

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1469782666-21320-1-git-send-email-duarte@scylladb.com>
(cherry picked from commit 7d1b7e8da3)
2016-07-29 11:24:12 +02:00
Pekka Enberg
2665bfdc93 Update seastar submodule
* seastar 103543a...f603f88 (1):
  > iotune: Fix SIGFPE with some executions
2016-07-29 11:11:56 +03:00
Tomasz Grabiec
3a1e8fffde Merge branch 'sstables/static-1.3/v1' from git@github.com:duarten/scylla.git into branch-1.3
The current code assumes cell names are always compound and may
wrongly report a non-static row as such. This patch addresses this
and adds a test case to catch regressions.

Backports the fix to #1495.
2016-07-28 15:07:41 +02:00
Gleb Natapov
23c340bed8 api: fix use after free in sum_sstable
get_sstables_including_compacted_undeleted() may return temporary shared
ptr which will be destroyed before the loop if not stored locally.

Fixes #1514

Message-Id: <20160728100504.GD2502@scylladb.com>
(cherry picked from commit 3531dd8d71)
2016-07-28 14:28:25 +03:00
Duarte Nunes
ff8a795021 sstables: Validate static cell is on static column
This patch enforces compatibility between a cell and the
corresponding column definition with regards to them being
static.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-28 12:11:46 +02:00
Duarte Nunes
d11b0cac3b sstable_mutation_test: Test non-compound cell name
This patch adds a test case for reading non-compound cell names,
validating that such a cell is not incorrectly marked as static.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1469616205-4550-5-git-send-email-duarte@scylladb.com>
2016-07-28 12:11:37 +02:00
Duarte Nunes
5ad0448cc9 sstables: Don't assume cell name is compound
The current code assumes cell names are always compound and may
wrongly report a non-static row as such, since it looks at the first
bytes of the name assuming they are the component's length.

Tables with compact storage (which cannot contain static rows) may not
have a compound comparator, so we check for the table's compoundness
before checking for the static marker. We do this by delegating to
composite_view::is_static.

Fixes #1495

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1469616205-4550-4-git-send-email-duarte@scylladb.com>
2016-07-28 12:11:27 +02:00
Duarte Nunes
35ab2cadc2 sstables: Remove duplication in extract_clustering_key
This patch removes some duplicated code in extract_clustering_key(),
which is already handled in composite_view.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1469397806-8067-1-git-send-email-duarte@scylladb.com>
2016-07-28 12:11:22 +02:00
Duarte Nunes
a1cee9f97c sstables: Remove superfluous call to check_static()
When building a column we're calling check_static() two times;
refector things a bit so that this doesn't happen and we reuse the
previous calculation.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1469397748-7987-1-git-send-email-duarte@scylladb.com>
2016-07-28 12:11:15 +02:00
Duarte Nunes
0ae7347d8e composite: Use operator[] instead of at()
Since we already do bounds checking on is_static(), we can use
bytes_view::operator[] instead of bytes_view::at() to avoid repeating
the bounds checking.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1469616205-4550-3-git-send-email-duarte@scylladb.com>
2016-07-28 12:10:14 +02:00
Duarte Nunes
b04168c015 composite_view: Fix is_static
composite_view's is_static function is wrong because:

1) It doesn't guard against the composite being a compound;
2) Doesn't deal with widening due to integral promotions and
   consequent sign extension.

This patch fixes this by ensuring there's only one correct
implementation of is_static, to avoid code duplication and
enforce test coverage.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1469616205-4550-2-git-send-email-duarte@scylladb.com>
2016-07-28 12:10:06 +02:00
Duarte Nunes
4e13853cbc compound_compat: Only compound values can be static
If a composite is not a compound, then it doesn't carry a length
prefix where static information is encoded. In its absence, a
non-compound composite can never be static.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1469397561-7748-1-git-send-email-duarte@scylladb.com>
2016-07-28 12:09:59 +02:00
Pekka Enberg
503f6c6755 release: prepare for 1.3.rc2 2016-07-28 10:57:11 +03:00
Tomasz Grabiec
7d73599acd tests: lsa_async_eviction_test: Use chunked_fifo<>
To protect against large reallocations during push() which are done
under reclaim lock and may fail.
2016-07-28 09:43:51 +02:00
Piotr Jastrzebski
bf27379583 Add tests for wide partiton handling in cache.
They shouldn't be cached.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
(cherry picked from commit 7d29cdf81f)
2016-07-27 14:09:45 +03:00
Piotr Jastrzebski
02cf5a517a Add collectd counter for uncached wide partitions.
Keep track of every read of wide partition that's
not cached.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
(cherry picked from commit 37a7d49676)
2016-07-27 14:09:40 +03:00
Piotr Jastrzebski
ec3d59bf13 Add flag to configure
max size of a cached partition.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
(cherry picked from commit 636a4acfd0)
2016-07-27 14:09:34 +03:00
Piotr Jastrzebski
30c72ef3b4 Try to read whole streamed_mutation up to limit
If limit is exceeded then return the streamed_mutation
and don't cache it.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
(cherry picked from commit 98c12dc2e2)
2016-07-27 14:09:29 +03:00
Piotr Jastrzebski
15e69a32ba Implement mutation_from_streamed_mutation_with_limit
If mutation is bigger than this limit
it won't be read and mutation_from_streamed_mutation
will return empty optional.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
(cherry picked from commit 0d39bb1ad0)
2016-07-27 14:09:23 +03:00
Paweł Dziepak
4e43cb84ff mests/sstables: test reading sstable with duplicated range tombstones
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
(cherry picked from commit b405ff8ad2)
2016-07-27 14:09:02 +03:00
Paweł Dziepak
07d5e939be sstables: avoid recursion in sstable_streamed_mutation::read_next()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
(cherry picked from commit 04f2c278c2)
2016-07-27 14:06:03 +03:00
Paweł Dziepak
a2a5a22504 sstables: protect against duplicated range tombstones
Promoted index may cause sstable to have range tombstones duplicated
several times. These duplicates appear in the "wrong" place since they
are smaller than the entity preceeding them.

This patch ignores such duplicates by skipping range tombstones that are
smaller than previously read ones.

Moreover, these duplicted range tombstone may appear in the middle of
clustering row, so the sstable reader has also gained the ability to
merge parts of the row in such cases.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
(cherry picked from commit 08032db269)
2016-07-27 14:05:58 +03:00
Paweł Dziepak
a39bec0e24 tests: extract streamed_mutation assertions
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
(cherry picked from commit 50469e5ef3)
2016-07-27 14:05:43 +03:00
Duarte Nunes
f0af5719d5 thrift: Preserve partition order when accumulating
This patch changes the column_visitor so that it preservers the order
of the partitions it visits when building the accumulation result.

This is required by verbs such as get_range_slice, on top of which
users can implement paging. In such cases, the last key returned by
the query will be that start of the range for the next query. If
that key is not actually the last in the partitioner's order, then
the new request will likely result in duplicate values being sent.

Ref #693

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1469568135-19644-1-git-send-email-duarte@scylladb.com>
(cherry picked from commit 5aaf43d1bc)
2016-07-27 12:11:41 +03:00
Avi Kivity
0523000af5 size_estimates_recorder: unwrap ranges before searching for sstables
column_family::select_sstables() requires unwrapped ranges, so unwrap
them.  Fixes crash with Leveled Compaction Strategy.

Fixes #1507.

Reviewed-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1469563488-14869-1-git-send-email-avi@scylladb.com>
(cherry picked from commit 64d0cf58ea)
2016-07-27 10:07:13 +03:00
Paweł Dziepak
69a0e6e002 stables: fix skipping partitions with no rows
If partition contains no static and clustering rows or range tombstones
mp_row_consumer will return disengaged mutation_fragment_opt with
is_mutation_end flag set to mark end of this partition.

Current, mutation_reader::impl code incorrectly recognized disengaged
mutation fragment as end of the stream of all mutations. This patch
fixes that by using is_mutation_end flag to determine whether end of
partition or end of stream was reached.

Fixes #1503.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1469525449-15525-1-git-send-email-pdziepak@scylladb.com>
(cherry picked from commit efa690ce8c)
2016-07-26 13:10:31 +03:00
Amos Kong
58d4de295c scylla-housekeeping: fix typo of script path
I tried to start scylla-housekeeping service by:
 # sudo systemctl restart scylla-housekeeping.service

But it's failed for wrong script path, error detail:
 systemd[5605]: Failed at step EXEC spawning
 /usr/lib/scylla/scylla-Housekeeping: No such file or directory

The right script name is 'scylla-housekeeping'

Signed-off-by: Amos Kong <amos@scylladb.com>
Message-Id: <c11319a3c7d3f22f613f5f6708699be0aa6bd740.1469506477.git.amos@scylladb.com>
(cherry picked from commit 64530e9686)
2016-07-26 09:19:15 +03:00
Vlad Zolotarov
026061733f tracing: set a default TTL for system_traces tables when they are created
Fixes #1482

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1469104164-4452-1-git-send-email-vladz@cloudius-systems.com>
(cherry picked from commit 4647ad9d8a)
2016-07-25 13:50:43 +03:00
Vlad Zolotarov
1d7ed190f8 SELECT tracing instrumentation: improve inter-nodes communication stages messages
Add/fix "sending to"/"received from" messages.

With this patch the single key select trace with a data on an external node
looks as follows:

Tracing session: 65dbfcc0-4f51-11e6-8dd2-000000000001

 activity                                                                                                                        | timestamp                  | source    | source_elapsed
---------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------
                                                                                                              Execute CQL3 query | 2016-07-21 17:42:50.124000 | 127.0.0.2 |              0
                                                                                                   Parsing a statement [shard 1] | 2016-07-21 17:42:50.124127 | 127.0.0.2 |             --
                                                                                                Processing a statement [shard 1] | 2016-07-21 17:42:50.124190 | 127.0.0.2 |             64
 Creating read executor for token 2309717968349690594 with all: {127.0.0.1} targets: {127.0.0.1} repair decision: NONE [shard 1] | 2016-07-21 17:42:50.124229 | 127.0.0.2 |            103
                                                                            read_data: sending a message to /127.0.0.1 [shard 1] | 2016-07-21 17:42:50.124234 | 127.0.0.2 |            108
                                                                           read_data: message received from /127.0.0.2 [shard 1] | 2016-07-21 17:42:50.124358 | 127.0.0.1 |             14
                                                          read_data handling is done, sending a response to /127.0.0.2 [shard 1] | 2016-07-21 17:42:50.124434 | 127.0.0.1 |             89
                                                                               read_data: got response from /127.0.0.1 [shard 1] | 2016-07-21 17:42:50.124662 | 127.0.0.2 |            536
                                                                                  Done processing - preparing a result [shard 1] | 2016-07-21 17:42:50.124695 | 127.0.0.2 |            569
                                                                                                                Request complete | 2016-07-21 17:42:50.124580 | 127.0.0.2 |            580

Fixes #1481

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1469112271-22818-1-git-send-email-vladz@cloudius-systems.com>
(cherry picked from commit 57b58cad8e)
2016-07-25 13:50:39 +03:00
Raphael S. Carvalho
2d66a4621a compaction: do not convert timestamp resolution to uppercase
C* only allows timestamp resolution in uppercase, so we shouldn't
be forgiving about it, otherwise migration to C* will not work.
Timestamp resolution is stored in compaction strategy options of
schema BTW.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <d64878fc9bbcf40fd8de3d0f08cce9f6c2fde717.1469133851.git.raphaelsc@scylladb.com>
(cherry picked from commit c4f34f5038)
2016-07-25 13:47:23 +03:00
Duarte Nunes
aaa9b5ace8 system_keyspace: Add query_size_estimates() function
The query_size_estimates() function queries the size_estimates system
table for a given keyspace and table, filtering out the token ranges
according to the specified tokens.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
(cherry picked from commit ecfa04da77)
2016-07-25 13:43:16 +03:00
Duarte Nunes
8d491e9879 size_estimates_recorder: Fix stop()
This patch fixes stop() by checking if the current CPU instead of
whether the service is active (which it won't be at the time stop() is
called).

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
(cherry picked from commit d984cc30bf)
2016-07-25 13:43:08 +03:00
Duarte Nunes
b63c9fb84b system_keyspace: Avoid pointers in range_estimates
This patch makes range_estimates a proper struct, where tokens are
represented as dht::tokens rather than dht::ring_position*.

We also pass other arguments to update_ and clear_size_estimates by
copy, since one will already be required.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
(cherry picked from commit e16f3f2969)
2016-07-25 13:42:53 +03:00
Duarte Nunes
b229f03198 thrift: Fail when creating mixed CF
This patch ensures we fail when creating a mixed column family, either
when adding columns to a dynamic CF through updated_column_family() or
when adding a dynamic column upon insertion.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1469378658-19853-1-git-send-email-duarte@scylladb.com>
(cherry picked from commit 5c4a2044d5)
2016-07-25 13:42:05 +03:00
Duarte Nunes
6caa59560b thrift: Correctly translate no_such_column_family
The no_such_column_family exception is translated to
InvalidRequestException instead of to NotFoundException.

8991d35231 exposed this problem.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1469376674-14603-1-git-send-email-duarte@scylladb.com>
(cherry picked from commit 560cc12fd7)
2016-07-25 13:41:58 +03:00
Duarte Nunes
79196af9fb thrift: Implement describe_splits verb
This patch implements the describe_splits verb on top of
describe_splits_ex.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
(cherry picked from commit ab08561b89)
2016-07-25 13:41:54 +03:00
Duarte Nunes
afe09da858 thrift: Implement describe_splits_ex verb
This patch implements the describe_splits_ex verbs by querying the
size_estimates system table for all the estimates in the specified
token range.

If the keys_per_split argument is bigger then the
estimated partitions count, then we merge ranges until keys_per_split
is met. Note that the tokens can't be split any further,
keys_per_split might be less than the reported number of keys in one
or more ranges.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
(cherry picked from commit 472c23d7d2)
2016-07-25 13:41:46 +03:00
Duarte Nunes
d6cb41ff24 thrift: Handle and convert invalid_request_exception
This patch converts an exceptions::invalid_request_exception
into a Thrift InvalidRequestException instead of into a generic one.

This makes TitanDB work correctly, which expects an
InvalidRequestException when setting a non-existent keyspace.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1469362086-1013-1-git-send-email-duarte@scylladb.com>
(cherry picked from commit 2be45c4806)
2016-07-24 16:46:18 +03:00
Duarte Nunes
6bf77c7b49 thrift: Use database::find_schema directly
This patch changes lookup_schema() so it directly calls
database::find_schema() instead of going through
database::find_column_family(). It also drops conversion of the
no_such_column_family exeption, as that is already handled at a higher
layer.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
(cherry picked from commit 8991d35231)
2016-07-24 16:46:05 +03:00
Duarte Nunes
6d34b4dab7 thrift: Remove hardcoded version constant
...and use the one in thrift_server.hh instead.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
(cherry picked from commit 038d42c589)
2016-07-24 16:45:46 +03:00
Duarte Nunes
d367f1e9ab thrift: Remove unused with_cob_dereference function
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
(cherry picked from commit 8bb43d09b1)
2016-07-24 16:45:22 +03:00
Avi Kivity
75a36ae453 bloom_filter: fix overflow for large filters
We use ::abs(), which has an int parameter, on long arguments, resulting
in incorrect results.

Switch to std::abs() instead, which has the correct overloads.

Fixes #1494.

Message-Id: <1469347802-28933-1-git-send-email-avi@scylladb.com>
(cherry picked from commit 900639915d)
2016-07-24 11:32:28 +03:00
Tomasz Grabiec
35c1781913 schema_tables: Fix hang during keyspace drop
Fixes #1484.

We drop tables as part of keyspace drop. Table drop starts with
creating a snapshot on all shards. All shards must use the same
snapshot timestamp which, among other things, is part of the snapshot
name. The timestamp is generated using supplied timestamp generating
function (joinpoint object). The joinpoint object will wait for all
shards to arrive and then generate and return the timestamp.

However, we drop tables in parallel, using the same joinpoint
instance. So joinpoint may be contacted by snapshotting shards of
tables A and B concurrently, generating timestamp t1 for some shards
of table A and some shards of table B. Later the remaining shards of
table A will get a different timestamp. As a result, different shards
may use different snapshot names for the same table. The snapshot
creation will never complete because the sealing fiber waits for all
shards to signal it, on the same name.

The fix is to give each table a separate joinpoint instance.

Message-Id: <1469117228-17879-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 5e8f0efc85)
2016-07-22 15:36:45 +02:00
Vlad Zolotarov
1489b28ffd cql_server::connection::process_prepare(): don't std::move() a shared_ptr captured by reference in value_of() lambda
A seastar::value_of() lambda used in a trace point was doing the unthinkable:
it called std::move() on a value captured by reference. Not only it compiled(!!!)
but it also actually std::move()ed the shared_ptr before it was used in a make_result()
which naturally caused a SIGSEG crash.

Fixes #1491

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1469193763-27631-1-git-send-email-vladz@cloudius-systems.com>
(cherry picked from commit 9423c13419)
2016-07-22 16:33:17 +03:00
Avi Kivity
f975653c94 Update seastar submodule to point at scylla-seastar 2016-07-21 12:31:09 +03:00
Duarte Nunes
96f5cbb604 thrift: Omit regular columns for dynamic CFs
This patch skips adding the auto-generated regular column when
describing a dynamic Column family for the describe_keyspace(s) verbs.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1469091720-10113-1-git-send-email-duarte@scylladb.com>
(cherry picked from commit a436cf945c)
2016-07-21 12:06:29 +03:00
Raphael S. Carvalho
66ebef7d10 tests: add new test for date tiered strategy
This test set the time window to 1 hour and checks that the strategy
works accordingly.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit cf54af9e58)
2016-07-21 12:00:26 +03:00
Raphael S. Carvalho
789fb0db97 compaction: implement date tiered compaction strategy options
Now date tiered compaction strategy will take into account the
strategy options which are defined in the schema.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit eaa6e281a2)
2016-07-21 12:00:18 +03:00
Pekka Enberg
af7c0f6433 Revert "Merge seastar upstream"
This reverts commit aaf6786997.

We should backport the iotune fixes for 1.3 and not pull everything.
2016-07-21 11:19:50 +03:00
Pekka Enberg
aaf6786997 Merge seastar upstream
* seastar 103543a...9d1db3f (8):
  > reactor: limit task backlog
  > iotune: Fix SIGFPE with some executions
  > Merge "Preparation for protobuf" from Amnon
  > byteorder: add missing cpu_to_be(), be_to_cpu() functions
  > rpc: fix gcc-7 compilation error
  > reactor: Register the smp metrics disabled
  > scollectd: Allow creating metric that is disabled
  > Merge "Propagate timeout to a server" from Gleb
2016-07-21 11:04:31 +03:00
Pekka Enberg
e8cb163cdf db/config: Start Thrift server by default
We have Thrift support now so start the server by default.
Message-Id: <1469002000-26767-1-git-send-email-penberg@scylladb.com>

(cherry picked from commit aff8cf319d)
2016-07-20 11:29:24 +03:00
Duarte Nunes
2d7c322805 thrift: Actually concatenate strings
This patch fixes concatenating a char[] with an int by using sprint
instead of just increasing the pointer.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1468971542-9600-1-git-send-email-duarte@scylladb.com>
(cherry picked from commit 64dff69077)
2016-07-20 11:09:15 +03:00
Tomasz Grabiec
13f18c6445 database: Add table name to log message about sealing
Message-Id: <1468917744-2539-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 0d26294fac)
2016-07-20 10:13:32 +03:00
Tomasz Grabiec
9c430c2cff schema_tables: Add more logging
Message-Id: <1468917771-2592-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit a0832f08d2)
2016-07-20 10:13:28 +03:00
Pekka Enberg
c84e030fe9 release: prepare for 1.3.rc1 2016-07-19 20:15:26 +03:00
901 changed files with 29785 additions and 93951 deletions

9
.gitignore vendored
View File

@@ -9,12 +9,3 @@ dist/ami/files/*.rpm
dist/ami/variables.json
dist/ami/scylla_deploy.sh
*.pyc
Cql.tokens
.kdev4
*.kdev4
CMakeLists.txt.user
.cache
.tox
*.egg-info
__pycache__CMakeLists.txt.user
.gdbinit

View File

@@ -1,140 +0,0 @@
##
## For best results, first compile the project using the Ninja build-system.
##
cmake_minimum_required(VERSION 3.7)
project(scylla)
if (NOT DEFINED FOR_IDE AND NOT DEFINED ENV{FOR_IDE} AND NOT DEFINED ENV{CLION_IDE})
message(FATAL_ERROR "This CMakeLists.txt file is only valid for use in IDEs, please define FOR_IDE to acknowledge this.")
endif()
# Default value. A more accurate list is populated through `pkg-config` below if `seastar.pc` is available.
set(SEASTAR_INCLUDE_DIRS "seastar")
# These paths are always available, since they're included in the repository. Additional DPDK headers are placed while
# Seastar is built, and are captured in `SEASTAR_INCLUDE_DIRS` through parsing the Seastar pkg-config file (below).
set(SEASTAR_DPDK_INCLUDE_DIRS
seastar/dpdk/lib/librte_eal/common/include
seastar/dpdk/lib/librte_eal/common/include/generic
seastar/dpdk/lib/librte_eal/common/include/x86
seastar/dpdk/lib/librte_ether)
find_package(PkgConfig REQUIRED)
set(ENV{PKG_CONFIG_PATH} "${CMAKE_SOURCE_DIR}/seastar/build/release:$ENV{PKG_CONFIG_PATH}")
pkg_check_modules(SEASTAR seastar)
find_package(Boost COMPONENTS filesystem program_options system thread)
##
## Populate the names of all source and header files in the indicated paths in a designated variable.
##
## When RECURSIVE is specified, directories are traversed recursively.
##
## Use: scan_scylla_source_directories(VAR my_result_var [RECURSIVE] PATHS [path1 path2 ...])
##
function (scan_scylla_source_directories)
set(options RECURSIVE)
set(oneValueArgs VAR)
set(multiValueArgs PATHS)
cmake_parse_arguments(args "${options}" "${oneValueArgs}" "${multiValueArgs}" "${ARGN}")
set(globs "")
foreach (dir ${args_PATHS})
list(APPEND globs "${dir}/*.cc" "${dir}/*.hh")
endforeach()
if (args_RECURSIVE)
set(glob_kind GLOB_RECURSE)
else()
set(glob_kind GLOB)
endif()
file(${glob_kind} var
${globs})
set(${args_VAR} ${var} PARENT_SCOPE)
endfunction()
## Although Seastar is an external project, it is common enough to explore the sources while doing
## Scylla development that we'll treat the Seastar sources as part of this project for easier navigation.
scan_scylla_source_directories(
VAR SEASTAR_SOURCE_FILES
RECURSIVE
PATHS
seastar/core
seastar/http
seastar/json
seastar/net
seastar/rpc
seastar/tests
seastar/util)
scan_scylla_source_directories(
VAR SCYLLA_ROOT_SOURCE_FILES
PATHS .)
scan_scylla_source_directories(
VAR SCYLLA_SUB_SOURCE_FILES
RECURSIVE
PATHS
api
auth
cql3
db
dht
exceptions
gms
index
io
locator
message
repair
service
sstables
streaming
tests
thrift
tracing
transport
utils)
scan_scylla_source_directories(
VAR SCYLLA_GEN_SOURCE_FILES
RECURSIVE
PATHS build/release/gen)
set(SCYLLA_SOURCE_FILES
${SCYLLA_ROOT_SOURCE_FILES}
${SCYLLA_GEN_SOURCE_FILES}
${SCYLLA_SUB_SOURCE_FILES})
add_executable(scylla
${SEASTAR_SOURCE_FILES}
${SCYLLA_SOURCE_FILES})
# Note that since CLion does not undestand GCC6 concepts, we always disable them (even if users configure otherwise).
# CLion seems to have trouble with `-U` (macro undefinition), so we do it this way instead.
list(REMOVE_ITEM SEASTAR_CFLAGS "-DHAVE_GCC6_CONCEPTS")
# If the Seastar pkg-config information is available, append to the default flags.
#
# For ease of browsing the source code, we always pretend that DPDK is enabled.
target_compile_options(scylla PUBLIC
-std=gnu++14
-DHAVE_DPDK
-DHAVE_HWLOC
"${SEASTAR_CFLAGS}")
# The order matters here: prefer the "static" DPDK directories to any dynamic paths from pkg-config. Some files are only
# available dynamically, though.
target_include_directories(scylla PUBLIC
.
${SEASTAR_DPDK_INCLUDE_DIRS}
${SEASTAR_INCLUDE_DIRS}
${Boost_INCLUDE_DIRS}
build/release/gen)

View File

@@ -1,11 +0,0 @@
# Asking questions or requesting help
Use the [ScyllaDB user mailing list](https://groups.google.com/forum/#!forum/scylladb-users) for general questions and help.
# Reporting an issue
Please use the [Issue Tracker](https://github.com/scylladb/scylla/issues/) to report issues. Fill in as much information as you can in the issue template, especially for performance problems.
# Contributing Code to Scylla
To contribute code to Scylla, you need to sign the [Contributor License Agreement](http://www.scylladb.com/opensource/cla/) and send your changes as [patches](https://github.com/scylladb/scylla/wiki/Formatting-and-sending-patches) to the [mailing list](https://groups.google.com/forum/#!forum/scylladb-dev). We don't accept pull requests on GitHub.

View File

@@ -1,233 +0,0 @@
# Guidelines for developing Scylla
This document is intended to help developers and contributors to Scylla get started. The first part consists of general guidelines that make no assumptions about a development environment or tooling. The second part describes a particular environment and work-flow for exemplary purposes.
## Overview
This section covers some high-level information about the Scylla source code and work-flow.
### Getting the source code
Scylla uses [Git submodules](https://git-scm.com/book/en/v2/Git-Tools-Submodules) to manage its dependency on Seastar and other tools. Be sure that all submodules are correctly initialized when cloning the project:
```bash
$ git clone https://github.com/scylladb/scylla
$ cd scylla
$ git submodule update --init --recursive
```
### Dependencies
Scylla depends on the system package manager for its development dependencies.
Running `./install_dependencies.sh` (as root) installs the appropriate packages based on your Linux distribution.
### Build system
**Note**: Compiling Scylla requires, conservatively, 2 GB of memory per native thread, and up to 3 GB per native thread while linking.
Scylla is built with [Ninja](https://ninja-build.org/), a low-level rule-based system. A Python script, `configure.py`, generates a Ninja file (`build.ninja`) based on configuration options.
To build for the first time:
```bash
$ ./configure.py
$ ninja-build
```
Afterwards, it is sufficient to just execute Ninja.
The full suite of options for project configuration is available via
```bash
$ ./configure.py --help
```
The most important options are:
- `--mode={release,debug,all}`: Debug mode enables [AddressSanitizer](https://github.com/google/sanitizers/wiki/AddressSanitizer) and allows for debugging with tools like GDB. Debugging builds are generally slower and generate much larger object files than release builds.
- `--{enable,disable}-dpdk`: [DPDK](http://dpdk.org/) is a set of libraries and drivers for fast packet processing. During development, it's not necessary to enable support even if it is supported by your platform.
Source files and build targets are tracked manually in `configure.py`, so the script needs to be updated when new files or targets are added or removed.
To save time -- for instance, to avoid compiling all unit tests -- you can also specify specific targets to Ninja. For example,
```bash
$ ninja-build build/release/tests/schema_change_test
```
### Unit testing
Unit tests live in the `/tests` directory. Like with application source files, test sources and executables are specified manually in `configure.py` and need to be updated when changes are made.
A test target can be any executable. A non-zero return code indicates test failure.
Most tests in the Scylla repository are built using the [Boost.Test](http://www.boost.org/doc/libs/1_64_0/libs/test/doc/html/index.html) library. Utilities for writing tests with Seastar futures are also included.
Run all tests through the test execution wrapper with
```bash
$ ./test.py --mode={debug,release}
```
The `--name` argument can be specified to run a particular test.
Alternatively, you can execute the test executable directly. For example,
```bash
$ build/release/tests/row_cache_test -- -c1 -m1G
```
The `-c1 -m1G` arguments limit this Seastar-based test to a single system thread and 1 GB of memory.
### Preparing patches
All changes to Scylla are submitted as patches to the public mailing list. Once a patch is approved by one of the maintainers of the project, it is committed to the maintainers' copy of the repository at https://github.com/scylladb/scylla.
Detailed instructions for formatting patches for the mailing list and advice on preparing good patches are available at the [ScyllaDB website](http://docs.scylladb.com/contribute/).
### Running Scylla
Once Scylla has been compiled, executing the (`debug` or `release`) target will start a running instance in the foreground:
```bash
$ build/release/scylla
```
The `scylla` executable requires a configuration file, `scylla.yaml`. By default, this is read from `$SCYLLA_HOME/conf/scylla.yaml`. A good starting point for development is located in the repository at `/conf/scylla.yaml`.
For development, a directory at `$HOME/scylla` can be used for all Scylla-related files:
```bash
$ mkdir -p $HOME/scylla $HOME/scylla/conf
$ cp conf/scylla.yaml $HOME/scylla/conf/scylla.yaml
$ # Edit configuration options as appropriate
$ SCYLLA_HOME=$HOME/scylla build/release/scylla
```
The `scylla.yaml` file in the repository by default writes all database data to `/var/lib/scylla`, which likely requires root access. Change the `data_file_directories` and `commitlog_directory` fields as appropriate.
Scylla has a number of requirements for the file-system and operating system to operate ideally and at peak performance. However, during development, these requirements can be relaxed with the `--developer-mode` flag.
Additionally, when running on under-powered platforms like portable laptops, the `--overprovisined` flag is useful.
On a development machine, one might run Scylla as
```bash
$ SCYLLA_HOME=$HOME/scylla build/release/scylla --overprovisioned --developer-mode=yes
```
### Branches and tags
Multiple release branches are maintained on the Git repository at https://github.com/scylladb/scylla. Release 1.5, for instance, is tracked on the `branch-1.5` branch.
Similarly, tags are used to pin-point precise release versions, including hot-fix versions like 1.5.4. These are named `scylla-1.5.4`, for example.
Most development happens on the `master` branch. Release branches are cut from `master` based on time and/or features. When a patch against `master` fixes a serious issue like a node crash or data loss, it is backported to a particular release branch with `git cherry-pick` by the project maintainers.
## Example: development on Fedora 25
This section describes one possible work-flow for developing Scylla on a Fedora 25 system. It is presented as an example to help you to develop a work-flow and tools that you are comfortable with.
### Preface
This guide will be written from the perspective of a fictitious developer, Taylor Smith.
### Git work-flow
Having two Git remotes is useful:
- A public clone of Seastar (`"public"`)
- A private clone of Seastar (`"private"`) for in-progress work or work that is not yet ready to share
The first step to contributing a change to Scylla is to create a local branch dedicated to it. For example, a feature that fixes a bug in the CQL statement for creating tables could be called `ts/cql_create_table_error/v1`. The branch name is prefaced by the developer's initials and has a suffix indicating that this is the first version. The version suffix is useful when branches are shared publicly and changes are requested on the mailing list. Having a branch for each version of the patch (or patch set) shared publicly makes it easier to reference and compare the history of a change.
Setting the upstream branch of your development branch to `master` is a useful way to track your changes. You can do this with
```bash
$ git branch -u master ts/cql_create_table_error/v1
```
As a patch set is developed, you can periodically push the branch to the private remote to back-up work.
Once the patch set is ready to be reviewed, push the branch to the public remote and prepare an email to the `scylladb-dev` mailing list. Including a link to the branch on your public remote allows for reviewers to quickly test and explore your changes.
### Development environment and source code navigation
Scylla includes a [CMake](https://cmake.org/) file, `CMakeLists.txt`, for use only with development environments (not for building) so that they can properly analyze the source code.
[CLion](https://www.jetbrains.com/clion/) is a commercial IDE offers reasonably good source code navigation and advice for code hygiene, though its C++ parser sometimes makes errors and flags false issues.
Other good options that directly parse CMake files are [KDevelop](https://www.kdevelop.org/) and [QtCreator](https://wiki.qt.io/Qt_Creator).
To use the `CMakeLists.txt` file with these programs, define the `FOR_IDE` CMake variable or shell environmental variable.
[Eclipse](https://eclipse.org/cdt/) is another open-source option. It doesn't natively work with CMake projects, and its C++ parser has many similar issues as CLion.
### Distributed compilation: `distcc` and `ccache`
Scylla's compilations times can be long. Two tools help somewhat:
- [ccache](https://ccache.samba.org/) caches compiled object files on disk and re-uses them when possible
- [distcc](https://github.com/distcc/distcc) distributes compilation jobs to remote machines
A reasonably-powered laptop acts as the coordinator for compilation. A second, more powerful, machine acts as a passive compilation server.
Having a direct wired connection between the machines ensures that object files can be transmitted quickly and limits the overhead of remote compilation.
The coordinator has been assigned the static IP address `10.0.0.1` and the passive compilation machine has been assigned `10.0.0.2`.
On Fedora, installing the `ccache` package places symbolic links for `gcc` and `g++` in the `PATH`. This allows normal compilation to transparently invoke `ccache` for compilation and cache object files on the local file-system.
Next, set `CCACHE_PREFIX` so that `ccache` is responsible for invoking `distcc` as necessary:
```bash
export CCACHE_PREFIX="distcc"
```
On each host, edit `/etc/sysconfig/distccd` to include the allowed coordinators and the total number of jobs that the machine should accept.
This example is for the laptop, which has 2 physical cores (4 logical cores with hyper-threading):
```
OPTIONS="--allow 10.0.0.2 --allow 127.0.0.1 --jobs 4"
```
`10.0.0.2` has 8 physical cores (16 logical cores) and 64 GB of memory.
As a rule-of-thumb, the number of jobs that a machine should be specified to support should be equal to the number of its native threads.
Restart the `distccd` service on all machines.
On the coordinator machine, edit `$HOME/.distcc/hosts` with the available hosts for compilation. Order of the hosts indicates preference.
```
10.0.0.2/16 localhost/2
```
In this example, `10.0.0.2` will be sent up to 16 jobs and the local machine will be sent up to 2. Allowing for two extra threads on the host machine for coordination, we run compilation with `16 + 2 + 2 = 20` jobs in total: `ninja-build -j20`.
When a compilation is in progress, the status of jobs on all remote machines can be visualized in the terminal with `distccmon-text` or graphically as a GTK application with `distccmon-gnome`.
One thing to keep in mind is that linking object files happens on the coordinating machine, which can be a bottleneck. See the next section speeding up this process.
### Using the `gold` linker
Linking Scylla can be slow. The gold linker can replace GNU ld and often speeds the linking process. On Fedora, you can switch the system linker using
```bash
$ sudo alternatives --config ld
```
### Testing changes in Seastar with Scylla
Sometimes Scylla development is closely tied with a feature being developed in Seastar. It can be useful to compile Scylla with a particular check-out of Seastar.
One way to do this it to create a local remote for the Seastar submodule in the Scylla repository:
```bash
$ cd $HOME/src/scylla
$ cd seastar
$ git remote add local /home/tsmith/src/seastar
$ git remote update
$ git checkout -t local/my_local_seastar_branch
```

View File

@@ -1,19 +1,29 @@
# Scylla
## Quick-start
## Building Scylla
```bash
$ git submodule update --init --recursive
$ sudo ./install-dependencies.sh
$ ./configure.py --mode=release
$ ninja-build -j4 # Assuming 4 system threads.
$ ./build/release/scylla
$ # Rejoice!
In addition to required packages by Seastar, the following packages are required by Scylla.
### Submodules
Scylla uses submodules, so make sure you pull the submodules first by doing:
```
git submodule init
git submodule update --recursive
```
Please see [HACKING.md](HACKING.md) for detailed information on building and developing Scylla.
### Building and Running Scylla on Fedora
* Installing required packages:
## Running Scylla
```
sudo yum install yaml-cpp-devel lz4-devel zlib-devel snappy-devel jsoncpp-devel thrift-devel antlr3-tool antlr3-C++-devel libasan libubsan gcc-c++ gnutls-devel ninja-build ragel libaio-devel cryptopp-devel xfsprogs-devel numactl-devel hwloc-devel libpciaccess-devel libxml2-devel python3-pyparsing lksctp-tools-devel
```
* Build Scylla
```
./configure.py --mode=release --with=scylla --disable-xen
ninja-build build/release/scylla -j2 # you can use more cpus if you have tons of RAM
```
* Run Scylla
```
@@ -73,6 +83,14 @@ Run the image with:
docker run -p $(hostname -i):9042:9042 -i -t <image name>
```
## Contributing to Scylla
[Guidelines for contributing](CONTRIBUTING.md)
Do not send pull requests.
Send patches to the mailing list address scylladb-dev@googlegroups.com.
Be sure to subscribe.
In order for your patches to be merged, you must sign the Contributor's
License Agreement, protecting your rights and ours. See
http://www.scylladb.com/opensource/cla/.

View File

@@ -1,6 +1,6 @@
#!/bin/sh
VERSION=2.1.6
VERSION=1.3.5
if test -f version
then
@@ -10,12 +10,7 @@ else
DATE=$(date +%Y%m%d)
GIT_COMMIT=$(git log --pretty=format:'%h' -n 1)
SCYLLA_VERSION=$VERSION
# For custom package builds, replace "0" with "counter.your_name",
# where counter starts at 1 and increments for successive versions.
# This ensures that the package manager will select your custom
# package over the standard release.
SCYLLA_BUILD=0
SCYLLA_RELEASE=$SCYLLA_BUILD.$DATE.$GIT_COMMIT
SCYLLA_RELEASE=$DATE.$GIT_COMMIT
fi
echo "$SCYLLA_VERSION-$SCYLLA_RELEASE"

View File

@@ -397,36 +397,6 @@
}
]
},
{
"path": "/cache_service/metrics/key/hits_moving_avrage",
"operations": [
{
"method": "GET",
"summary": "Get key hits moving avrage",
"type": "#/utils/rate_moving_average",
"nickname": "get_key_hits_moving_avrage",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path": "/cache_service/metrics/key/requests_moving_avrage",
"operations": [
{
"method": "GET",
"summary": "Get key requests moving avrage",
"type": "#/utils/rate_moving_average",
"nickname": "get_key_requests_moving_avrage",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path": "/cache_service/metrics/key/size",
"operations": [
@@ -637,36 +607,6 @@
}
]
},
{
"path": "/cache_service/metrics/counter/hits_moving_avrage",
"operations": [
{
"method": "GET",
"summary": "Get counter hits moving avrage",
"type": "#/utils/rate_moving_average",
"nickname": "get_counter_hits_moving_avrage",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path": "/cache_service/metrics/counter/requests_moving_avrage",
"operations": [
{
"method": "GET",
"summary": "Get counter requests moving avrage",
"type": "#/utils/rate_moving_average",
"nickname": "get_counter_requests_moving_avrage",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path": "/cache_service/metrics/counter/size",
"operations": [

View File

@@ -78,19 +78,11 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
},
{
"name":"split_output",
"description":"true if the output of the major compaction should be split in several sstables",
"required":false,
"allowMultiple":false,
"type":"bool",
"paramType":"query"
}
]
}
@@ -110,7 +102,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -137,7 +129,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -161,7 +153,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -188,7 +180,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -212,7 +204,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -252,7 +244,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -279,7 +271,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -306,7 +298,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -325,7 +317,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -357,7 +349,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -389,7 +381,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -413,7 +405,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -440,7 +432,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -467,7 +459,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -499,7 +491,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -526,7 +518,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -553,7 +545,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -577,7 +569,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -601,7 +593,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -641,7 +633,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -681,7 +673,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -721,7 +713,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -761,7 +753,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -801,7 +793,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -841,7 +833,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -881,7 +873,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -924,7 +916,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -951,7 +943,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -978,7 +970,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -1002,7 +994,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -1042,7 +1034,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -1066,7 +1058,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -1109,7 +1101,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -1152,7 +1144,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -1211,7 +1203,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -1251,7 +1243,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -1275,7 +1267,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -1318,7 +1310,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -1361,7 +1353,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -1420,7 +1412,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -1460,7 +1452,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -1500,7 +1492,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -1540,7 +1532,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -1580,7 +1572,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -1620,7 +1612,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -1660,7 +1652,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -1700,7 +1692,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -1740,7 +1732,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -1780,7 +1772,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -1820,7 +1812,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -1860,7 +1852,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -1900,7 +1892,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -1940,7 +1932,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -1980,7 +1972,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2020,7 +2012,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2060,7 +2052,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2100,7 +2092,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2124,7 +2116,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2164,7 +2156,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2204,7 +2196,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2244,7 +2236,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2284,7 +2276,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2308,7 +2300,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2332,7 +2324,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2359,7 +2351,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2386,7 +2378,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2413,7 +2405,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2440,7 +2432,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2509,7 +2501,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2533,7 +2525,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2557,7 +2549,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2581,7 +2573,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2605,7 +2597,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2629,7 +2621,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2653,7 +2645,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2677,7 +2669,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2701,7 +2693,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2725,7 +2717,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2749,7 +2741,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2773,7 +2765,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",

View File

@@ -21,8 +21,8 @@
"parameters":[
{
"name":"host",
"description":"The host name. If absent, the local server broadcast/listen address is used",
"required":false,
"description":"The host name",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
@@ -45,8 +45,8 @@
"parameters":[
{
"name":"host",
"description":"The host name. If absent, the local server broadcast/listen address is used",
"required":false,
"description":"The host name",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"

View File

@@ -42,25 +42,6 @@
}
]
},
{
"path":"/failure_detector/endpoint_phi_values",
"operations":[
{
"method":"GET",
"summary":"Get end point phi values",
"type":"array",
"items":{
"type":"endpoint_phi_values"
},
"nickname":"get_endpoint_phi_values",
"produces":[
"application/json"
],
"parameters":[
]
}
]
},
{
"path":"/failure_detector/endpoints/",
"operations":[
@@ -221,20 +202,6 @@
"description": "The application state version"
}
}
},
"endpoint_phi_value": {
"id" : "endpoint_phi_value",
"description": "Holds phi value for a single end point",
"properties": {
"phi": {
"type": "double",
"description": "Phi value"
},
"endpoint": {
"type": "string",
"description": "end point address"
}
}
}
}
}

View File

@@ -777,7 +777,7 @@
]
},
{
"path": "/storage_proxy/metrics/read/moving_average_histogram",
"path": "/storage_proxy/metrics/read/moving_avrage_histogram",
"operations": [
{
"method": "GET",
@@ -792,7 +792,7 @@
]
},
{
"path": "/storage_proxy/metrics/range/moving_average_histogram",
"path": "/storage_proxy/metrics/range/moving_avrage_histogram",
"operations": [
{
"method": "GET",
@@ -942,7 +942,7 @@
]
},
{
"path": "/storage_proxy/metrics/write/moving_average_histogram",
"path": "/storage_proxy/metrics/write/moving_avrage_histogram",
"operations": [
{
"method": "GET",

View File

@@ -952,22 +952,6 @@
}
]
},
{
"path":"/storage_service/force_terminate_repair",
"operations":[
{
"method":"POST",
"summary":"Force terminate all repair sessions",
"type":"void",
"nickname":"force_terminate_all_repair_sessions_new",
"produces":[
"application/json"
],
"parameters":[
]
}
]
},
{
"path":"/storage_service/decommission",
"operations":[
@@ -1217,12 +1201,11 @@
],
"parameters":[
{
"name":"type",
"description":"Which keyspaces to return",
"name":"non_system",
"description":"When set to true limit to non system",
"required":false,
"allowMultiple":false,
"type":"string",
"enum": [ "all", "user", "non_local_strategy" ],
"type":"boolean",
"paramType":"query"
}
]
@@ -1753,57 +1736,6 @@
}
]
},
{
"path":"/storage_service/slow_query",
"operations":[
{
"method":"POST",
"summary":"Set slow query parameter",
"type":"void",
"nickname":"set_slow_query",
"produces":[
"application/json"
],
"parameters":[
{
"name":"enable",
"description":"set it to true to enable, anything else to disable",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
},
{
"name":"ttl",
"description":"TTL in seconds",
"required":false,
"allowMultiple":false,
"type":"long",
"paramType":"query"
},
{
"name":"threshold",
"description":"Slow query record threshold in microseconds",
"required":false,
"allowMultiple":false,
"type":"long",
"paramType":"query"
}
]
},
{
"method":"GET",
"summary":"Returns the slow query record configuration.",
"type":"slow_query_info",
"nickname":"get_slow_query_info",
"produces":[
"application/json"
],
"parameters":[
]
}
]
},
{
"path":"/storage_service/auto_compaction/{keyspace}",
"operations":[
@@ -2201,24 +2133,6 @@
}
}
},
"slow_query_info": {
"id":"slow_query_info",
"description":"Slow query triggering information",
"properties":{
"enable":{
"type":"boolean",
"description":"Is slow query logging enable or disable"
},
"ttl":{
"type":"long",
"description":"The slow query TTL in seconds"
},
"threshold":{
"type":"long",
"description":"The slow query logging threshold in microseconds. Queries that takes longer, will be logged"
}
}
},
"endpoint_detail":{
"id":"endpoint_detail",
"description":"Endpoint detail",

View File

@@ -49,7 +49,7 @@ static std::unique_ptr<reply> exception_reply(std::exception_ptr eptr) {
throw bad_param_exception(ex.what());
}
// We never going to get here
throw std::runtime_error("exception_reply");
return std::make_unique<reply>();
}
future<> set_server_init(http_context& ctx) {

View File

@@ -29,7 +29,6 @@
#include "utils/histogram.hh"
#include "http/exception.hh"
#include "api_init.hh"
#include "seastarx.hh"
namespace api {
@@ -117,7 +116,6 @@ inline
httpd::utils_json::histogram to_json(const utils::ihistogram& val) {
httpd::utils_json::histogram h;
h = val;
h.sum = val.estimated_sum();
return h;
}
@@ -131,7 +129,7 @@ httpd::utils_json::rate_moving_average meter_to_json(const utils::rate_moving_av
inline
httpd::utils_json::rate_moving_average_and_histogram timer_to_json(const utils::rate_moving_average_and_histogram& val) {
httpd::utils_json::rate_moving_average_and_histogram h;
h.hist = to_json(val.hist);
h.hist = val.hist;
h.meter = meter_to_json(val.rate);
return h;
}
@@ -167,36 +165,33 @@ inline int64_t max_int64(int64_t a, int64_t b) {
* It combine total and the sub set for the ratio and its
* to_json method return the ration sub/total
*/
template<typename T>
struct basic_ratio_holder : public json::jsonable {
T total = 0;
T sub = 0;
struct ratio_holder : public json::jsonable {
double total = 0;
double sub = 0;
virtual std::string to_json() const {
if (total == 0) {
return "0";
}
return std::to_string(sub/total);
}
basic_ratio_holder() = default;
basic_ratio_holder& add(T _total, T _sub) {
ratio_holder() = default;
ratio_holder& add(double _total, double _sub) {
total += _total;
sub += _sub;
return *this;
}
basic_ratio_holder(T _total, T _sub) {
ratio_holder(double _total, double _sub) {
total = _total;
sub = _sub;
}
basic_ratio_holder<T>& operator+=(const basic_ratio_holder<T>& a) {
ratio_holder& operator+=(const ratio_holder& a) {
return add(a.total, a.sub);
}
friend basic_ratio_holder<T> operator+(basic_ratio_holder a, const basic_ratio_holder<T>& b) {
friend ratio_holder operator+(ratio_holder a, const ratio_holder& b) {
return a += b;
}
};
typedef basic_ratio_holder<double> ratio_holder;
typedef basic_ratio_holder<int64_t> integral_ratio_holder;
class unimplemented_exception : public base_exception {
public:

View File

@@ -177,20 +177,6 @@ void set_cache_service(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(0);
});
cs::get_key_hits_moving_avrage.set(r, [&ctx] (std::unique_ptr<request> req) {
// TBD
// FIXME
// See above
return make_ready_future<json::json_return_type>(meter_to_json(utils::rate_moving_average()));
});
cs::get_key_requests_moving_avrage.set(r, [&ctx] (std::unique_ptr<request> req) {
// TBD
// FIXME
// See above
return make_ready_future<json::json_return_type>(meter_to_json(utils::rate_moving_average()));
});
cs::get_key_size.set(r, [] (std::unique_ptr<request> req) {
// TBD
// FIXME
@@ -252,13 +238,13 @@ void set_cache_service(http_context& ctx, routes& r) {
// In origin row size is the weighted size.
// We currently do not support weights, so we use num entries instead
return map_reduce_cf(ctx, 0, [](const column_family& cf) {
return cf.get_row_cache().partitions();
return cf.get_row_cache().num_entries();
}, std::plus<uint64_t>());
});
cs::get_row_entries.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, 0, [](const column_family& cf) {
return cf.get_row_cache().partitions();
return cf.get_row_cache().num_entries();
}, std::plus<uint64_t>());
});
@@ -294,20 +280,6 @@ void set_cache_service(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(0);
});
cs::get_counter_hits_moving_avrage.set(r, [&ctx] (std::unique_ptr<request> req) {
// TBD
// FIXME
// See above
return make_ready_future<json::json_return_type>(meter_to_json(utils::rate_moving_average()));
});
cs::get_counter_requests_moving_avrage.set(r, [&ctx] (std::unique_ptr<request> req) {
// TBD
// FIXME
// See above
return make_ready_future<json::json_return_type>(meter_to_json(utils::rate_moving_average()));
});
cs::get_counter_size.set(r, [] (std::unique_ptr<request> req) {
// TBD
// FIXME

View File

@@ -40,13 +40,13 @@ static auto transformer(const std::vector<collectd_value>& values) {
for (auto v: values) {
switch (v._type) {
case scollectd::data_type::GAUGE:
collected_value.values.push(v.d());
collected_value.values.push(v.u._d);
break;
case scollectd::data_type::DERIVE:
collected_value.values.push(v.i());
collected_value.values.push(v.u._i);
break;
default:
collected_value.values.push(v.ui());
collected_value.values.push(v.u._ui);
break;
}
}

View File

@@ -24,7 +24,7 @@
#include <vector>
#include "http/exception.hh"
#include "sstables/sstables.hh"
#include "utils/estimated_histogram.hh"
#include "sstables/estimated_histogram.hh"
#include <algorithm>
namespace api {
@@ -182,8 +182,17 @@ static int64_t max_row_size(column_family& cf) {
return res;
}
static integral_ratio_holder mean_row_size(column_family& cf) {
integral_ratio_holder res;
static double update_ratio(double acc, double f, double total) {
if (f && !total) {
throw bad_param_exception("total should include all elements");
} else if (total) {
acc += f / total;
}
return acc;
}
static ratio_holder mean_row_size(column_family& cf) {
ratio_holder res;
for (auto i: *cf.get_sstables() ) {
auto c = i->get_stats_metadata().estimated_row_size.count();
res.sub += i->get_stats_metadata().estimated_row_size.mean() * c;
@@ -274,16 +283,6 @@ static std::vector<uint64_t> concat_sstable_count_per_level(std::vector<uint64_t
return a;
}
ratio_holder filter_false_positive_as_ratio_holder(const sstables::shared_sstable& sst) {
double f = sst->filter_get_false_positive();
return ratio_holder(f + sst->filter_get_true_positive(), f);
}
ratio_holder filter_recent_false_positive_as_ratio_holder(const sstables::shared_sstable& sst) {
double f = sst->filter_get_recent_false_positive();
return ratio_holder(f + sst->filter_get_recent_true_positive(), f);
}
void set_column_family(http_context& ctx, routes& r) {
cf::get_column_family_name.set(r, [&ctx] (const_req req){
vector<sstring> res;
@@ -404,14 +403,14 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_estimated_row_size_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](column_family& cf) {
utils::estimated_histogram res(0);
return map_reduce_cf(ctx, req->param["name"], sstables::estimated_histogram(0), [](column_family& cf) {
sstables::estimated_histogram res(0);
for (auto i: *cf.get_sstables() ) {
res.merge(i->get_stats_metadata().estimated_row_size);
}
return res;
},
utils::estimated_histogram_merge, utils_json::estimated_histogram());
sstables::merge, utils_json::estimated_histogram());
});
cf::get_estimated_row_count.set(r, [&ctx] (std::unique_ptr<request> req) {
@@ -426,14 +425,14 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_estimated_column_count_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](column_family& cf) {
utils::estimated_histogram res(0);
return map_reduce_cf(ctx, req->param["name"], sstables::estimated_histogram(0), [](column_family& cf) {
sstables::estimated_histogram res(0);
for (auto i: *cf.get_sstables() ) {
res.merge(i->get_stats_metadata().estimated_column_count);
}
return res;
},
utils::estimated_histogram_merge, utils_json::estimated_histogram());
sstables::merge, utils_json::estimated_histogram());
});
cf::get_all_compression_ratio.set(r, [] (std::unique_ptr<request> req) {
@@ -563,13 +562,11 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_mean_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {
// Cassandra 3.x mean values are truncated as integrals.
return map_reduce_cf(ctx, req->param["name"], integral_ratio_holder(), mean_row_size, std::plus<integral_ratio_holder>());
return map_reduce_cf(ctx, req->param["name"], ratio_holder(), mean_row_size, std::plus<ratio_holder>());
});
cf::get_all_mean_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {
// Cassandra 3.x mean values are truncated as integrals.
return map_reduce_cf(ctx, integral_ratio_holder(), mean_row_size, std::plus<integral_ratio_holder>());
return map_reduce_cf(ctx, ratio_holder(), mean_row_size, std::plus<ratio_holder>());
});
cf::get_bloom_filter_false_positives.set(r, [&ctx] (std::unique_ptr<request> req) {
@@ -605,27 +602,39 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_bloom_filter_false_ratio.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], ratio_holder(), [] (column_family& cf) {
return boost::accumulate(*cf.get_sstables() | boost::adaptors::transformed(filter_false_positive_as_ratio_holder), ratio_holder());
}, std::plus<>());
return map_reduce_cf(ctx, req->param["name"], double(0), [] (column_family& cf) {
return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), double(0), [](double s, auto& sst) {
double f = sst->filter_get_false_positive();
return update_ratio(s, f, f + sst->filter_get_true_positive());
});
}, std::plus<double>());
});
cf::get_all_bloom_filter_false_ratio.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, ratio_holder(), [] (column_family& cf) {
return boost::accumulate(*cf.get_sstables() | boost::adaptors::transformed(filter_false_positive_as_ratio_holder), ratio_holder());
}, std::plus<>());
return map_reduce_cf(ctx, double(0), [] (column_family& cf) {
return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), double(0), [](double s, auto& sst) {
double f = sst->filter_get_false_positive();
return update_ratio(s, f, f + sst->filter_get_true_positive());
});
}, std::plus<double>());
});
cf::get_recent_bloom_filter_false_ratio.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], ratio_holder(), [] (column_family& cf) {
return boost::accumulate(*cf.get_sstables() | boost::adaptors::transformed(filter_recent_false_positive_as_ratio_holder), ratio_holder());
}, std::plus<>());
return map_reduce_cf(ctx, req->param["name"], double(0), [] (column_family& cf) {
return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), double(0), [](double s, auto& sst) {
double f = sst->filter_get_recent_false_positive();
return update_ratio(s, f, f + sst->filter_get_recent_true_positive());
});
}, std::plus<double>());
});
cf::get_all_recent_bloom_filter_false_ratio.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, ratio_holder(), [] (column_family& cf) {
return boost::accumulate(*cf.get_sstables() | boost::adaptors::transformed(filter_recent_false_positive_as_ratio_holder), ratio_holder());
}, std::plus<>());
return map_reduce_cf(ctx, double(0), [] (column_family& cf) {
return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), double(0), [](double s, auto& sst) {
double f = sst->filter_get_recent_false_positive();
return update_ratio(s, f, f + sst->filter_get_recent_true_positive());
});
}, std::plus<double>());
});
cf::get_bloom_filter_disk_space_used.set(r, [&ctx] (std::unique_ptr<request> req) {
@@ -798,10 +807,10 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_sstables_per_read_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](column_family& cf) {
return map_reduce_cf(ctx, req->param["name"], sstables::estimated_histogram(0), [](column_family& cf) {
return cf.get_stats().estimated_sstable_per_read;
},
utils::estimated_histogram_merge, utils_json::estimated_histogram());
sstables::merge, utils_json::estimated_histogram());
});
cf::get_tombstone_scanned_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
@@ -860,17 +869,17 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_read_latency_estimated_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](column_family& cf) {
return map_reduce_cf(ctx, req->param["name"], sstables::estimated_histogram(0), [](column_family& cf) {
return cf.get_stats().estimated_read;
},
utils::estimated_histogram_merge, utils_json::estimated_histogram());
sstables::merge, utils_json::estimated_histogram());
});
cf::get_write_latency_estimated_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](column_family& cf) {
return map_reduce_cf(ctx, req->param["name"], sstables::estimated_histogram(0), [](column_family& cf) {
return cf.get_stats().estimated_write;
},
utils::estimated_histogram_merge, utils_json::estimated_histogram());
sstables::merge, utils_json::estimated_histogram());
});
cf::set_compaction_strategy_class.set(r, [&ctx](std::unique_ptr<request> req) {

View File

@@ -20,13 +20,13 @@
*/
#include "compaction_manager.hh"
#include "sstables/compaction_manager.hh"
#include "api/api-doc/compaction_manager.json.hh"
#include "db/system_keyspace.hh"
#include "column_family.hh"
namespace api {
using namespace scollectd;
namespace cm = httpd::compaction_manager_json;
using namespace json;

View File

@@ -22,22 +22,16 @@
#include "locator/snitch_base.hh"
#include "endpoint_snitch.hh"
#include "api/api-doc/endpoint_snitch_info.json.hh"
#include "utils/fb_utilities.hh"
namespace api {
void set_endpoint_snitch(http_context& ctx, routes& r) {
static auto host_or_broadcast = [](const_req req) {
auto host = req.get_query_param("host");
return host.empty() ? gms::inet_address(utils::fb_utilities::get_broadcast_address()) : gms::inet_address(host);
};
httpd::endpoint_snitch_info_json::get_datacenter.set(r, [](const_req req) {
return locator::i_endpoint_snitch::get_local_snitch_ptr()->get_datacenter(host_or_broadcast(req));
httpd::endpoint_snitch_info_json::get_datacenter.set(r, [] (const_req req) {
return locator::i_endpoint_snitch::get_local_snitch_ptr()->get_datacenter(req.get_query_param("host"));
});
httpd::endpoint_snitch_info_json::get_rack.set(r, [](const_req req) {
return locator::i_endpoint_snitch::get_local_snitch_ptr()->get_rack(host_or_broadcast(req));
httpd::endpoint_snitch_info_json::get_rack.set(r, [] (const_req req) {
return locator::i_endpoint_snitch::get_local_snitch_ptr()->get_rack(req.get_query_param("host"));
});
httpd::endpoint_snitch_info_json::get_snitch_name.set(r, [] (const_req req) {

View File

@@ -88,20 +88,6 @@ void set_failure_detector(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(state);
});
});
fd::get_endpoint_phi_values.set(r, [](std::unique_ptr<request> req) {
return gms::get_arrival_samples().then([](std::map<gms::inet_address, gms::arrival_window> map) {
std::vector<fd::endpoint_phi_value> res;
auto now = gms::arrival_window::clk::now();
for (auto& p : map) {
fd::endpoint_phi_value val;
val.endpoint = p.first.to_sstring();
val.phi = p.second.phi(now);
res.emplace_back(std::move(val));
}
return make_ready_future<json::json_return_type>(res);
});
});
}
}

View File

@@ -24,6 +24,7 @@
namespace api {
using namespace scollectd;
using namespace json;
namespace hh = httpd::hinted_handoff_json;

View File

@@ -29,11 +29,11 @@
namespace api {
static logging::logger alogger("lsa-api");
static logging::logger logger("lsa-api");
void set_lsa(http_context& ctx, routes& r) {
httpd::lsa_json::lsa_compact.set(r, [&ctx](std::unique_ptr<request> req) {
alogger.info("Triggering compaction");
logger.info("Triggering compaction");
return ctx.db.invoke_on_all([] (database&) {
logalloc::shard_tracker().reclaim(std::numeric_limits<size_t>::max());
}).then([] {

View File

@@ -27,7 +27,7 @@
#include <sstream>
using namespace httpd::messaging_service_json;
using namespace netw;
using namespace net;
namespace api {
@@ -120,13 +120,13 @@ void set_messaging_service(http_context& ctx, routes& r) {
}));
get_version.set(r, [](const_req req) {
return netw::get_local_messaging_service().get_raw_version(req.get_query_param("addr"));
return net::get_local_messaging_service().get_raw_version(req.get_query_param("addr"));
});
get_dropped_messages_by_ver.set(r, [](std::unique_ptr<request> req) {
shared_ptr<std::vector<uint64_t>> map = make_shared<std::vector<uint64_t>>(num_verb);
return netw::get_messaging_service().map_reduce([map](const uint64_t* local_map) mutable {
return net::get_messaging_service().map_reduce([map](const uint64_t* local_map) mutable {
for (auto i = 0; i < num_verb; i++) {
(*map)[i]+= local_map[i];
}

View File

@@ -52,9 +52,9 @@ static future<json::json_return_type> sum_timed_rate_as_long(distributed<proxy>
});
}
static future<json::json_return_type> sum_estimated_histogram(http_context& ctx, utils::estimated_histogram proxy::stats::*f) {
return ctx.sp.map_reduce0([f](const proxy& p) {return p.get_stats().*f;}, utils::estimated_histogram(),
utils::estimated_histogram_merge).then([](const utils::estimated_histogram& val) {
static future<json::json_return_type> sum_estimated_histogram(http_context& ctx, sstables::estimated_histogram proxy::stats::*f) {
return ctx.sp.map_reduce0([f](const proxy& p) {return p.get_stats().*f;}, sstables::estimated_histogram(),
sstables::merge).then([](const sstables::estimated_histogram& val) {
utils_json::estimated_histogram res;
res = val;
return make_ready_future<json::json_return_type>(res);
@@ -397,7 +397,7 @@ void set_storage_proxy(http_context& ctx, routes& r) {
});
sp::get_range_estimated_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timer_stats(ctx.sp, &proxy::stats::range);
return sum_timer_stats(ctx.sp, &proxy::stats::read);
});
sp::get_range_latency.set(r, [&ctx](std::unique_ptr<request> req) {

View File

@@ -22,8 +22,6 @@
#include "storage_service.hh"
#include "api/api-doc/storage_service.json.hh"
#include "db/config.hh"
#include <boost/range/adaptor/map.hpp>
#include <boost/range/adaptor/filtered.hpp>
#include <service/storage_service.hh>
#include <db/commitlog/commitlog.hh>
#include <gms/gossiper.hh>
@@ -34,7 +32,6 @@
#include "column_family.hh"
#include "log.hh"
#include "release.hh"
#include "sstables/compaction_manager.hh"
namespace api {
@@ -362,22 +359,16 @@ void set_storage_service(http_context& ctx, routes& r) {
try {
res = fut.get0();
} catch(std::runtime_error& e) {
throw httpd::bad_param_exception(e.what());
return make_ready_future<json::json_return_type>(json_exception(httpd::bad_param_exception(e.what())));
}
return make_ready_future<json::json_return_type>(json::json_return_type(res));
});
});
ss::force_terminate_all_repair_sessions.set(r, [](std::unique_ptr<request> req) {
return repair_abort_all(service::get_local_storage_service().db()).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
ss::force_terminate_all_repair_sessions_new.set(r, [](std::unique_ptr<request> req) {
return repair_abort_all(service::get_local_storage_service().db()).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(json_void());
});
ss::decommission.set(r, [](std::unique_ptr<request> req) {
@@ -466,15 +457,8 @@ void set_storage_service(http_context& ctx, routes& r) {
});
ss::get_keyspaces.set(r, [&ctx](const_req req) {
auto type = req.get_query_param("type");
if (type == "user") {
return ctx.db.local().get_non_system_keyspaces();
} else if (type == "non_local_strategy") {
return map_keys(ctx.db.local().get_keyspaces() | boost::adaptors::filtered([](const auto& p) {
return p.second.get_replication_strategy().get_type() != locator::replication_strategy_type::local;
}));
}
return map_keys(ctx.db.local().get_keyspaces());
auto non_system = req.get_query_param("non_system");
return map_keys(ctx.db.local().keyspaces());
});
ss::update_snitch.set(r, [](std::unique_ptr<request> req) {
@@ -558,7 +542,9 @@ void set_storage_service(http_context& ctx, routes& r) {
});
ss::is_joined.set(r, [] (std::unique_ptr<request> req) {
return make_ready_future<json::json_return_type>(service::get_local_storage_service().is_joined());
return service::get_local_storage_service().is_joined().then([] (bool is_joined) {
return make_ready_future<json::json_return_type>(is_joined);
});
});
ss::set_stream_throughput_mb_per_sec.set(r, [](std::unique_ptr<request> req) {
@@ -678,60 +664,23 @@ void set_storage_service(http_context& ctx, routes& r) {
ss::set_trace_probability.set(r, [](std::unique_ptr<request> req) {
auto probability = req->get_query_param("probability");
return futurize<json::json_return_type>::apply([probability] {
try {
double real_prob = std::stod(probability.c_str());
return tracing::tracing::tracing_instance().invoke_on_all([real_prob] (auto& local_tracing) {
local_tracing.set_trace_probability(real_prob);
}).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
}).then_wrapped([probability] (auto&& f) {
try {
f.get();
return make_ready_future<json::json_return_type>(json_void());
} catch (std::out_of_range& e) {
throw httpd::bad_param_exception(e.what());
} catch (std::invalid_argument&){
throw httpd::bad_param_exception(sprint("Bad format in a probability value: \"%s\"", probability.c_str()));
}
});
} catch (...) {
throw httpd::bad_param_exception(sprint("Bad format of a probability value: \"%s\"", probability.c_str()));
}
});
ss::get_trace_probability.set(r, [](std::unique_ptr<request> req) {
return make_ready_future<json::json_return_type>(tracing::tracing::get_local_tracing_instance().get_trace_probability());
});
ss::get_slow_query_info.set(r, [](const_req req) {
ss::slow_query_info res;
res.enable = tracing::tracing::get_local_tracing_instance().slow_query_tracing_enabled();
res.ttl = tracing::tracing::get_local_tracing_instance().slow_query_record_ttl().count() ;
res.threshold = tracing::tracing::get_local_tracing_instance().slow_query_threshold().count();
return res;
});
ss::set_slow_query.set(r, [](std::unique_ptr<request> req) {
auto enable = req->get_query_param("enable");
auto ttl = req->get_query_param("ttl");
auto threshold = req->get_query_param("threshold");
try {
return tracing::tracing::tracing_instance().invoke_on_all([enable, ttl, threshold] (auto& local_tracing) {
if (threshold != "") {
local_tracing.set_slow_query_threshold(std::chrono::microseconds(std::stol(threshold.c_str())));
}
if (ttl != "") {
local_tracing.set_slow_query_record_ttl(std::chrono::seconds(std::stol(ttl.c_str())));
}
if (enable != "") {
local_tracing.set_slow_query_enabled(strcasecmp(enable.c_str(), "true") == 0);
}
}).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
} catch (...) {
throw httpd::bad_param_exception(sprint("Bad format value: "));
}
});
ss::enable_auto_compaction.set(r, [&ctx](std::unique_ptr<request> req) {
//TBD
unimplemented();
@@ -809,8 +758,10 @@ void set_storage_service(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(json_void());
});
ss::get_metrics_load.set(r, [&ctx](std::unique_ptr<request> req) {
return get_cf_stats(ctx, &column_family::stats::live_disk_space_used);
ss::get_metrics_load.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
});
ss::get_exceptions.set(r, [](const_req req) {

View File

@@ -28,12 +28,11 @@
#include "utils/managed_bytes.hh"
#include "net/byteorder.hh"
#include <cstdint>
#include <iosfwd>
#include <seastar/util/gcc6-concepts.hh>
#include <iostream>
template<typename T, typename Input>
template<typename T>
static inline
void set_field(Input& v, unsigned offset, T val) {
void set_field(managed_bytes& v, unsigned offset, T val) {
reinterpret_cast<net::packed<T>*>(v.begin() + offset)->raw = net::hton(val);
}
@@ -58,8 +57,6 @@ private:
static constexpr int8_t LIVE_FLAG = 0x01;
static constexpr int8_t EXPIRY_FLAG = 0x02; // When present, expiry field is present. Set only for live cells
static constexpr int8_t REVERT_FLAG = 0x04; // transient flag used to efficiently implement ReversiblyMergeable for atomic cells.
static constexpr int8_t COUNTER_UPDATE_FLAG = 0x08; // Cell is a counter update.
static constexpr int8_t COUNTER_IN_PLACE_REVERT = 0x10;
static constexpr unsigned flags_size = 1;
static constexpr unsigned timestamp_offset = flags_size;
static constexpr unsigned timestamp_size = 8;
@@ -69,25 +66,14 @@ private:
static constexpr unsigned deletion_time_size = 4;
static constexpr unsigned ttl_offset = expiry_offset + expiry_size;
static constexpr unsigned ttl_size = 4;
friend class counter_cell_builder;
private:
static bool is_counter_update(bytes_view cell) {
return cell[0] & COUNTER_UPDATE_FLAG;
}
static bool is_revert_set(bytes_view cell) {
return cell[0] & REVERT_FLAG;
}
static bool is_counter_in_place_revert_set(bytes_view cell) {
return cell[0] & COUNTER_IN_PLACE_REVERT;
}
template<typename BytesContainer>
static void set_revert(BytesContainer& cell, bool revert) {
cell[0] = (cell[0] & ~REVERT_FLAG) | (revert * REVERT_FLAG);
}
template<typename BytesContainer>
static void set_counter_in_place_revert(BytesContainer& cell, bool flag) {
cell[0] = (cell[0] & ~COUNTER_IN_PLACE_REVERT) | (flag * COUNTER_IN_PLACE_REVERT);
}
static bool is_live(const bytes_view& cell) {
return cell[0] & LIVE_FLAG;
}
@@ -101,30 +87,13 @@ private:
static api::timestamp_type timestamp(const bytes_view& cell) {
return get_field<api::timestamp_type>(cell, timestamp_offset);
}
template<typename BytesContainer>
static void set_timestamp(BytesContainer& cell, api::timestamp_type ts) {
set_field(cell, timestamp_offset, ts);
}
// Can be called on live cells only
private:
template<typename BytesView>
static BytesView do_get_value(BytesView cell) {
static bytes_view value(bytes_view cell) {
auto expiry_field_size = bool(cell[0] & EXPIRY_FLAG) * (expiry_size + ttl_size);
auto value_offset = flags_size + timestamp_size + expiry_field_size;
cell.remove_prefix(value_offset);
return cell;
}
public:
static bytes_view value(bytes_view cell) {
return do_get_value(cell);
}
static bytes_mutable_view value(bytes_mutable_view cell) {
return do_get_value(cell);
}
// Can be called on live counter update cells only
static int64_t counter_update_value(bytes_view cell) {
return get_field<int64_t>(cell, flags_size + timestamp_size);
}
// Can be called only when is_dead() is true.
static gc_clock::time_point deletion_time(const bytes_view& cell) {
assert(is_dead(cell));
@@ -157,14 +126,6 @@ public:
std::copy_n(value.begin(), value.size(), b.begin() + value_offset);
return b;
}
static managed_bytes make_live_counter_update(api::timestamp_type timestamp, int64_t value) {
auto value_offset = flags_size + timestamp_size;
managed_bytes b(managed_bytes::initialized_later(), value_offset + sizeof(value));
b[0] = LIVE_FLAG | COUNTER_UPDATE_FLAG;
set_field(b, timestamp_offset, timestamp);
set_field(b, value_offset, value);
return b;
}
static managed_bytes make_live(api::timestamp_type timestamp, bytes_view value, gc_clock::time_point expiry, gc_clock::duration ttl) {
auto value_offset = flags_size + timestamp_size + expiry_size + ttl_size;
managed_bytes b(managed_bytes::initialized_later(), value_offset + value.size());
@@ -175,31 +136,6 @@ public:
std::copy_n(value.begin(), value.size(), b.begin() + value_offset);
return b;
}
// make_live_from_serializer() is intended for users that need to serialise
// some object or objects to the format used in atomic_cell::value().
// With just make_live() the patter would look like follows:
// 1. allocate a buffer and write to it serialised objects
// 2. pass that buffer to make_live()
// 3. make_live() needs to prepend some metadata to the cell value so it
// allocates a new buffer and copies the content of the original one
//
// The allocation and copy of a buffer can be avoided.
// make_live_from_serializer() allows the user code to specify the timestamp
// and size of the cell value as well as provide the serialiser function
// object, which would write the serialised value of the cell to the buffer
// given to it by make_live_from_serializer().
template<typename Serializer>
GCC6_CONCEPT(requires requires(Serializer serializer, bytes::iterator it) {
serializer(it);
})
static managed_bytes make_live_from_serializer(api::timestamp_type timestamp, size_t size, Serializer&& serializer) {
auto value_offset = flags_size + timestamp_size;
managed_bytes b(managed_bytes::initialized_later(), value_offset + size);
b[0] = LIVE_FLAG;
set_field(b, timestamp_offset, timestamp);
serializer(b.begin() + value_offset);
return b;
}
template<typename ByteContainer>
friend class atomic_cell_base;
friend class atomic_cell;
@@ -213,23 +149,17 @@ protected:
atomic_cell_base(ByteContainer&& data) : _data(std::forward<ByteContainer>(data)) { }
friend class atomic_cell_or_collection;
public:
bool is_counter_update() const {
return atomic_cell_type::is_counter_update(_data);
}
bool is_revert_set() const {
return atomic_cell_type::is_revert_set(_data);
}
bool is_counter_in_place_revert_set() const {
return atomic_cell_type::is_counter_in_place_revert_set(_data);
}
bool is_live() const {
return atomic_cell_type::is_live(_data);
}
bool is_live(tombstone t, bool is_counter) const {
return is_live() && !is_covered_by(t, is_counter);
bool is_live(tombstone t) const {
return is_live() && !is_covered_by(t);
}
bool is_live(tombstone t, gc_clock::time_point now, bool is_counter) const {
return is_live() && !is_covered_by(t, is_counter) && !has_expired(now);
bool is_live(tombstone t, gc_clock::time_point now) const {
return is_live() && !is_covered_by(t) && !has_expired(now);
}
bool is_live_and_has_ttl() const {
return atomic_cell_type::is_live_and_has_ttl(_data);
@@ -237,24 +167,17 @@ public:
bool is_dead(gc_clock::time_point now) const {
return atomic_cell_type::is_dead(_data) || has_expired(now);
}
bool is_covered_by(tombstone t, bool is_counter) const {
return timestamp() <= t.timestamp || (is_counter && t.timestamp != api::missing_timestamp);
bool is_covered_by(tombstone t) const {
return timestamp() <= t.timestamp;
}
// Can be called on live and dead cells
api::timestamp_type timestamp() const {
return atomic_cell_type::timestamp(_data);
}
void set_timestamp(api::timestamp_type ts) {
atomic_cell_type::set_timestamp(_data, ts);
}
// Can be called on live cells only
auto value() const {
bytes_view value() const {
return atomic_cell_type::value(_data);
}
// Can be called on live counter update cells only
int64_t counter_update_value() const {
return atomic_cell_type::counter_update_value(_data);
}
// Can be called only when is_dead(gc_clock::time_point)
gc_clock::time_point deletion_time() const {
return !is_live() ? atomic_cell_type::deletion_time(_data) : expiry() - ttl();
@@ -269,7 +192,7 @@ public:
}
// Can be called on live and dead cells
bool has_expired(gc_clock::time_point now) const {
return is_live_and_has_ttl() && expiry() <= now;
return is_live_and_has_ttl() && expiry() < now;
}
bytes_view serialize() const {
return _data;
@@ -277,9 +200,6 @@ public:
void set_revert(bool revert) {
atomic_cell_type::set_revert(_data, revert);
}
void set_counter_in_place_revert(bool flag) {
atomic_cell_type::set_counter_in_place_revert(_data, flag);
}
};
class atomic_cell_view final : public atomic_cell_base<bytes_view> {
@@ -291,14 +211,6 @@ public:
friend std::ostream& operator<<(std::ostream& os, const atomic_cell_view& acv);
};
class atomic_cell_mutable_view final : public atomic_cell_base<bytes_mutable_view> {
atomic_cell_mutable_view(bytes_mutable_view data) : atomic_cell_base(std::move(data)) {}
public:
static atomic_cell_mutable_view from_bytes(bytes_mutable_view data) { return atomic_cell_mutable_view(data); }
friend class atomic_cell;
};
class atomic_cell_ref final : public atomic_cell_base<managed_bytes&> {
public:
atomic_cell_ref(managed_bytes& buf) : atomic_cell_base(buf) {}
@@ -324,22 +236,11 @@ public:
static atomic_cell make_live(api::timestamp_type timestamp, bytes_view value) {
return atomic_cell_type::make_live(timestamp, value);
}
static atomic_cell make_live(api::timestamp_type timestamp, const bytes& value) {
return make_live(timestamp, bytes_view(value));
}
static atomic_cell make_live_counter_update(api::timestamp_type timestamp, int64_t value) {
return atomic_cell_type::make_live_counter_update(timestamp, value);
}
static atomic_cell make_live(api::timestamp_type timestamp, bytes_view value,
gc_clock::time_point expiry, gc_clock::duration ttl)
{
return atomic_cell_type::make_live(timestamp, value, expiry, ttl);
}
static atomic_cell make_live(api::timestamp_type timestamp, const bytes& value,
gc_clock::time_point expiry, gc_clock::duration ttl)
{
return make_live(timestamp, bytes_view(value), expiry, ttl);
}
static atomic_cell make_live(api::timestamp_type timestamp, bytes_view value, ttl_opt ttl) {
if (!ttl) {
return atomic_cell_type::make_live(timestamp, value);
@@ -347,10 +248,6 @@ public:
return atomic_cell_type::make_live(timestamp, value, gc_clock::now() + *ttl, *ttl);
}
}
template<typename Serializer>
static atomic_cell make_live_from_serializer(api::timestamp_type timestamp, size_t size, Serializer&& serializer) {
return atomic_cell_type::make_live_from_serializer(timestamp, size, std::forward<Serializer>(serializer));
}
friend class atomic_cell_or_collection;
friend std::ostream& operator<<(std::ostream& os, const atomic_cell& ac);
};
@@ -388,6 +285,11 @@ collection_mutation::operator collection_mutation_view() const {
return { data };
}
namespace db {
template<typename T>
class serializer;
}
class column_definition;
int compare_atomic_cell_for_merge(atomic_cell_view left, atomic_cell_view right);

View File

@@ -26,17 +26,16 @@
#include "types.hh"
#include "atomic_cell.hh"
#include "hashing.hh"
#include "counters.hh"
template<>
struct appending_hash<collection_mutation_view> {
template<typename Hasher>
void operator()(Hasher& h, collection_mutation_view cell, const column_definition& cdef) const {
void operator()(Hasher& h, collection_mutation_view cell) const {
auto m_view = collection_type_impl::deserialize_mutation_form(cell);
::feed_hash(h, m_view.tomb);
for (auto&& key_and_value : m_view.cells) {
::feed_hash(h, key_and_value.first);
::feed_hash(h, key_and_value.second, cdef);
::feed_hash(h, key_and_value.second);
}
}
};
@@ -44,14 +43,10 @@ struct appending_hash<collection_mutation_view> {
template<>
struct appending_hash<atomic_cell_view> {
template<typename Hasher>
void operator()(Hasher& h, atomic_cell_view cell, const column_definition& cdef) const {
void operator()(Hasher& h, atomic_cell_view cell) const {
feed_hash(h, cell.is_live());
feed_hash(h, cell.timestamp());
if (cell.is_live()) {
if (cdef.is_counter()) {
::feed_hash(h, counter_cell_view(cell));
return;
}
if (cell.is_live_and_has_ttl()) {
feed_hash(h, cell.expiry());
feed_hash(h, cell.ttl());
@@ -66,15 +61,15 @@ struct appending_hash<atomic_cell_view> {
template<>
struct appending_hash<atomic_cell> {
template<typename Hasher>
void operator()(Hasher& h, const atomic_cell& cell, const column_definition& cdef) const {
feed_hash(h, static_cast<atomic_cell_view>(cell), cdef);
void operator()(Hasher& h, const atomic_cell& cell) const {
feed_hash(h, static_cast<atomic_cell_view>(cell));
}
};
template<>
struct appending_hash<collection_mutation> {
template<typename Hasher>
void operator()(Hasher& h, const collection_mutation& cm, const column_definition& cdef) const {
feed_hash(h, static_cast<collection_mutation_view>(cm), cdef);
void operator()(Hasher& h, const collection_mutation& cm) const {
feed_hash(h, static_cast<collection_mutation_view>(cm));
}
};

View File

@@ -39,14 +39,10 @@ public:
static atomic_cell_or_collection from_atomic_cell(atomic_cell data) { return { std::move(data._data) }; }
atomic_cell_view as_atomic_cell() const { return atomic_cell_view::from_bytes(_data); }
atomic_cell_ref as_atomic_cell_ref() { return { _data }; }
atomic_cell_mutable_view as_mutable_atomic_cell() { return atomic_cell_mutable_view::from_bytes(_data); }
atomic_cell_or_collection(collection_mutation cm) : _data(std::move(cm.data)) {}
explicit operator bool() const {
return !_data.empty();
}
bool can_use_mutable_view() const {
return !_data.is_fragmented();
}
static atomic_cell_or_collection from_collection_mutation(collection_mutation data) {
return std::move(data.data);
}
@@ -62,13 +58,13 @@ public:
template<typename Hasher>
void feed_hash(Hasher& h, const column_definition& def) const {
if (def.is_atomic()) {
::feed_hash(h, as_atomic_cell(), def);
::feed_hash(h, as_atomic_cell());
} else {
::feed_hash(h, as_collection_mutation(), def);
::feed_hash(as_collection_mutation(), h, def.type);
}
}
size_t external_memory_usage() const {
return _data.external_memory_usage();
size_t memory_usage() const {
return _data.memory_usage();
}
friend std::ostream& operator<<(std::ostream&, const atomic_cell_or_collection&);
};

View File

@@ -1,41 +0,0 @@
/*
* Copyright (C) 2017 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "auth/allow_all_authenticator.hh"
#include "service/migration_manager.hh"
#include "utils/class_registrator.hh"
namespace auth {
const sstring& allow_all_authenticator_name() {
static const sstring name = meta::AUTH_PACKAGE_NAME + "AllowAllAuthenticator";
return name;
}
// To ensure correct initialization order, we unfortunately need to use a string literal.
static const class_registrator<
authenticator,
allow_all_authenticator,
cql3::query_processor&,
::service::migration_manager&> registration("org.apache.cassandra.auth.AllowAllAuthenticator");
}

View File

@@ -1,97 +0,0 @@
/*
* Copyright (C) 2017 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <stdexcept>
#include "auth/authenticator.hh"
#include "auth/authenticated_user.hh"
#include "auth/common.hh"
namespace cql3 {
class query_processor;
}
namespace service {
class migration_manager;
}
namespace auth {
const sstring& allow_all_authenticator_name();
class allow_all_authenticator final : public authenticator {
public:
allow_all_authenticator(cql3::query_processor&, ::service::migration_manager&) {
}
future<> start() override {
return make_ready_future<>();
}
future<> stop() override {
return make_ready_future<>();
}
const sstring& qualified_java_name() const override {
return allow_all_authenticator_name();
}
bool require_authentication() const override {
return false;
}
option_set supported_options() const override {
return option_set();
}
option_set alterable_options() const override {
return option_set();
}
future<::shared_ptr<authenticated_user>> authenticate(const credentials_map& credentials) const override {
return make_ready_future<::shared_ptr<authenticated_user>>(::make_shared<authenticated_user>());
}
future<> create(sstring username, const option_map& options) override {
return make_ready_future();
}
future<> alter(sstring username, const option_map& options) override {
return make_ready_future();
}
future<> drop(sstring username) override {
return make_ready_future();
}
const resource_ids& protected_resources() const override {
static const resource_ids ids;
return ids;
}
::shared_ptr<sasl_challenge> new_sasl_challenge() const override {
throw std::runtime_error("Should not reach");
}
};
}

View File

@@ -1,41 +0,0 @@
/*
* Copyright (C) 2017 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "auth/allow_all_authorizer.hh"
#include "auth/common.hh"
#include "utils/class_registrator.hh"
namespace auth {
const sstring& allow_all_authorizer_name() {
static const sstring name = meta::AUTH_PACKAGE_NAME + "AllowAllAuthorizer";
return name;
}
// To ensure correct initialization order, we unfortunately need to use a string literal.
static const class_registrator<
authorizer,
allow_all_authorizer,
cql3::query_processor&,
::service::migration_manager&> registration("org.apache.cassandra.auth.AllowAllAuthorizer");
}

View File

@@ -1,98 +0,0 @@
/*
* Copyright (C) 2017 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "authorizer.hh"
#include "exceptions/exceptions.hh"
#include "stdx.hh"
namespace cql3 {
class query_processor;
}
namespace service {
class migration_manager;
}
namespace auth {
class service;
const sstring& allow_all_authorizer_name();
class allow_all_authorizer final : public authorizer {
public:
allow_all_authorizer(cql3::query_processor&, ::service::migration_manager&) {
}
future<> start() override {
return make_ready_future<>();
}
future<> stop() override {
return make_ready_future<>();
}
const sstring& qualified_java_name() const override {
return allow_all_authorizer_name();
}
future<permission_set> authorize(service&, ::shared_ptr<authenticated_user>, data_resource) const override {
return make_ready_future<permission_set>(permissions::ALL);
}
future<> grant(::shared_ptr<authenticated_user>, permission_set, data_resource, sstring) override {
throw exceptions::invalid_request_exception("GRANT operation is not supported by AllowAllAuthorizer");
}
future<> revoke(::shared_ptr<authenticated_user>, permission_set, data_resource, sstring) override {
throw exceptions::invalid_request_exception("REVOKE operation is not supported by AllowAllAuthorizer");
}
future<std::vector<permission_details>> list(
service&,
::shared_ptr<authenticated_user> performer,
permission_set,
stdx::optional<data_resource>,
stdx::optional<sstring>) const override {
throw exceptions::invalid_request_exception("LIST PERMISSIONS operation is not supported by AllowAllAuthorizer");
}
future<> revoke_all(sstring dropped_user) override {
return make_ready_future();
}
future<> revoke_all(data_resource) override {
return make_ready_future();
}
const resource_ids& protected_resources() override {
static const resource_ids ids;
return ids;
}
future<> validate_configuration() const override {
return make_ready_future();
}
};
}

383
auth/auth.cc Normal file
View File

@@ -0,0 +1,383 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright (C) 2016 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include <seastar/core/sleep.hh>
#include <seastar/core/distributed.hh>
#include "auth.hh"
#include "authenticator.hh"
#include "authorizer.hh"
#include "database.hh"
#include "cql3/query_processor.hh"
#include "cql3/statements/raw/cf_statement.hh"
#include "cql3/statements/create_table_statement.hh"
#include "db/config.hh"
#include "service/migration_manager.hh"
#include "utils/loading_cache.hh"
#include "utils/hash.hh"
const sstring auth::auth::DEFAULT_SUPERUSER_NAME("cassandra");
const sstring auth::auth::AUTH_KS("system_auth");
const sstring auth::auth::USERS_CF("users");
static const sstring USER_NAME("name");
static const sstring SUPER("super");
static logging::logger logger("auth");
// TODO: configurable
using namespace std::chrono_literals;
const std::chrono::milliseconds auth::auth::SUPERUSER_SETUP_DELAY = 10000ms;
class auth_migration_listener : public service::migration_listener {
void on_create_keyspace(const sstring& ks_name) override {}
void on_create_column_family(const sstring& ks_name, const sstring& cf_name) override {}
void on_create_user_type(const sstring& ks_name, const sstring& type_name) override {}
void on_create_function(const sstring& ks_name, const sstring& function_name) override {}
void on_create_aggregate(const sstring& ks_name, const sstring& aggregate_name) override {}
void on_update_keyspace(const sstring& ks_name) override {}
void on_update_column_family(const sstring& ks_name, const sstring& cf_name, bool) override {}
void on_update_user_type(const sstring& ks_name, const sstring& type_name) override {}
void on_update_function(const sstring& ks_name, const sstring& function_name) override {}
void on_update_aggregate(const sstring& ks_name, const sstring& aggregate_name) override {}
void on_drop_keyspace(const sstring& ks_name) override {
auth::authorizer::get().revoke_all(auth::data_resource(ks_name));
}
void on_drop_column_family(const sstring& ks_name, const sstring& cf_name) override {
auth::authorizer::get().revoke_all(auth::data_resource(ks_name, cf_name));
}
void on_drop_user_type(const sstring& ks_name, const sstring& type_name) override {}
void on_drop_function(const sstring& ks_name, const sstring& function_name) override {}
void on_drop_aggregate(const sstring& ks_name, const sstring& aggregate_name) override {}
};
static auth_migration_listener auth_migration;
namespace std {
template <>
struct hash<auth::data_resource> {
size_t operator()(const auth::data_resource & v) const {
return v.hash_value();
}
};
template <>
struct hash<auth::authenticated_user> {
size_t operator()(const auth::authenticated_user & v) const {
return utils::tuple_hash()(v.name(), v.is_anonymous());
}
};
}
class auth::auth::permissions_cache {
public:
typedef utils::loading_cache<std::pair<authenticated_user, data_resource>, permission_set, utils::tuple_hash> cache_type;
typedef typename cache_type::key_type key_type;
permissions_cache()
: permissions_cache(
cql3::get_local_query_processor().db().local().get_config()) {
}
permissions_cache(const db::config& cfg)
: _cache(cfg.permissions_cache_max_entries(), expiry(cfg),
std::chrono::milliseconds(
cfg.permissions_validity_in_ms()),
[](const key_type& k) {
logger.debug("Refreshing permissions for {}", k.first.name());
return authorizer::get().authorize(::make_shared<authenticated_user>(k.first), k.second);
}) {
}
static std::chrono::milliseconds expiry(const db::config& cfg) {
auto exp = cfg.permissions_update_interval_in_ms();
if (exp == 0 || exp == std::numeric_limits<uint32_t>::max()) {
exp = cfg.permissions_validity_in_ms();
}
return std::chrono::milliseconds(exp);
}
future<> stop() {
return make_ready_future<>();
}
future<permission_set> get(::shared_ptr<authenticated_user> user, data_resource resource) {
return _cache.get(key_type(*user, std::move(resource)));
}
private:
cache_type _cache;
};
static distributed<auth::auth::permissions_cache> perm_cache;
/**
* Poor mans job schedule. For maximum 2 jobs. Sic.
* Still does nothing more clever than waiting 10 seconds
* like origin, then runs the submitted tasks.
*
* Only difference compared to sleep (from which this
* borrows _heavily_) is that if tasks have not run by the time
* we exit (and do static clean up) we delete the promise + cont
*
* Should be abstracted to some sort of global server function
* probably.
*/
struct waiter {
promise<> done;
timer<> tmr;
waiter() : tmr([this] {done.set_value();})
{
tmr.arm(auth::auth::SUPERUSER_SETUP_DELAY);
}
~waiter() {
if (tmr.armed()) {
tmr.cancel();
done.set_exception(std::runtime_error("shutting down"));
}
logger.trace("Deleting scheduled task");
}
void kill() {
}
};
typedef std::unique_ptr<waiter> waiter_ptr;
static std::vector<waiter_ptr> & thread_waiters() {
static thread_local std::vector<waiter_ptr> the_waiters;
return the_waiters;
}
void auth::auth::schedule_when_up(scheduled_func f) {
logger.trace("Adding scheduled task");
auto & waiters = thread_waiters();
waiters.emplace_back(std::make_unique<waiter>());
auto* w = waiters.back().get();
w->done.get_future().finally([w] {
auto & waiters = thread_waiters();
auto i = std::find_if(waiters.begin(), waiters.end(), [w](const waiter_ptr& p) {
return p.get() == w;
});
if (i != waiters.end()) {
waiters.erase(i);
}
}).then([f = std::move(f)] {
logger.trace("Running scheduled task");
return f();
}).handle_exception([](auto ep) {
return make_ready_future();
});
}
bool auth::auth::is_class_type(const sstring& type, const sstring& classname) {
if (type == classname) {
return true;
}
auto i = classname.find_last_of('.');
return classname.compare(i + 1, sstring::npos, type) == 0;
}
future<> auth::auth::setup() {
auto& db = cql3::get_local_query_processor().db().local();
auto& cfg = db.get_config();
future<> f = perm_cache.start();
if (is_class_type(cfg.authenticator(),
authenticator::ALLOW_ALL_AUTHENTICATOR_NAME)
&& is_class_type(cfg.authorizer(),
authorizer::ALLOW_ALL_AUTHORIZER_NAME)
) {
// just create the objects
return f.then([&cfg] {
return authenticator::setup(cfg.authenticator());
}).then([&cfg] {
return authorizer::setup(cfg.authorizer());
});
}
if (!db.has_keyspace(AUTH_KS)) {
std::map<sstring, sstring> opts;
opts["replication_factor"] = "1";
auto ksm = keyspace_metadata::new_keyspace(AUTH_KS, "org.apache.cassandra.locator.SimpleStrategy", opts, true);
f = service::get_local_migration_manager().announce_new_keyspace(ksm, false);
}
return f.then([] {
return setup_table(USERS_CF, sprint("CREATE TABLE %s.%s (%s text, %s boolean, PRIMARY KEY(%s)) WITH gc_grace_seconds=%d",
AUTH_KS, USERS_CF, USER_NAME, SUPER, USER_NAME,
90 * 24 * 60 * 60)); // 3 months.
}).then([&cfg] {
return authenticator::setup(cfg.authenticator());
}).then([&cfg] {
return authorizer::setup(cfg.authorizer());
}).then([] {
service::get_local_migration_manager().register_listener(&auth_migration); // again, only one shard...
// instead of once-timer, just schedule this later
schedule_when_up([] {
// setup default super user
return has_existing_users(USERS_CF, DEFAULT_SUPERUSER_NAME, USER_NAME).then([](bool exists) {
if (!exists) {
auto query = sprint("INSERT INTO %s.%s (%s, %s) VALUES (?, ?) USING TIMESTAMP 0",
AUTH_KS, USERS_CF, USER_NAME, SUPER);
cql3::get_local_query_processor().process(query, db::consistency_level::ONE, {DEFAULT_SUPERUSER_NAME, true}).then([](auto) {
logger.info("Created default superuser '{}'", DEFAULT_SUPERUSER_NAME);
}).handle_exception([](auto ep) {
try {
std::rethrow_exception(ep);
} catch (exceptions::request_execution_exception&) {
logger.warn("Skipped default superuser setup: some nodes were not ready");
}
});
}
});
});
});
}
future<> auth::auth::shutdown() {
// just make sure we don't have pending tasks.
// this is mostly relevant for test cases where
// db-env-shutdown != process shutdown
return smp::invoke_on_all([] {
thread_waiters().clear();
}).then([] {
return perm_cache.stop();
});
}
future<auth::permission_set> auth::auth::get_permissions(::shared_ptr<authenticated_user> user, data_resource resource) {
return perm_cache.local().get(std::move(user), std::move(resource));
}
static db::consistency_level consistency_for_user(const sstring& username) {
if (username == auth::auth::DEFAULT_SUPERUSER_NAME) {
return db::consistency_level::QUORUM;
}
return db::consistency_level::LOCAL_ONE;
}
static future<::shared_ptr<cql3::untyped_result_set>> select_user(const sstring& username) {
// Here was a thread local, explicit cache of prepared statement. In normal execution this is
// fine, but since we in testing set up and tear down system over and over, we'd start using
// obsolete prepared statements pretty quickly.
// Rely on query processing caching statements instead, and lets assume
// that a map lookup string->statement is not gonna kill us much.
return cql3::get_local_query_processor().process(
sprint("SELECT * FROM %s.%s WHERE %s = ?",
auth::auth::AUTH_KS, auth::auth::USERS_CF,
USER_NAME), consistency_for_user(username),
{ username }, true);
}
future<bool> auth::auth::is_existing_user(const sstring& username) {
return select_user(username).then(
[](::shared_ptr<cql3::untyped_result_set> res) {
return make_ready_future<bool>(!res->empty());
});
}
future<bool> auth::auth::is_super_user(const sstring& username) {
return select_user(username).then(
[](::shared_ptr<cql3::untyped_result_set> res) {
return make_ready_future<bool>(!res->empty() && res->one().get_as<bool>(SUPER));
});
}
future<> auth::auth::insert_user(const sstring& username, bool is_super)
throw (exceptions::request_execution_exception) {
return cql3::get_local_query_processor().process(sprint("INSERT INTO %s.%s (%s, %s) VALUES (?, ?)",
AUTH_KS, USERS_CF, USER_NAME, SUPER),
consistency_for_user(username), { username, is_super }).discard_result();
}
future<> auth::auth::delete_user(const sstring& username) throw(exceptions::request_execution_exception) {
return cql3::get_local_query_processor().process(sprint("DELETE FROM %s.%s WHERE %s = ?",
AUTH_KS, USERS_CF, USER_NAME),
consistency_for_user(username), { username }).discard_result();
}
future<> auth::auth::setup_table(const sstring& name, const sstring& cql) {
auto& qp = cql3::get_local_query_processor();
auto& db = qp.db().local();
if (db.has_schema(AUTH_KS, name)) {
return make_ready_future();
}
::shared_ptr<cql3::statements::raw::cf_statement> parsed = static_pointer_cast<
cql3::statements::raw::cf_statement>(cql3::query_processor::parse_statement(cql));
parsed->prepare_keyspace(AUTH_KS);
::shared_ptr<cql3::statements::create_table_statement> statement =
static_pointer_cast<cql3::statements::create_table_statement>(
parsed->prepare(db)->statement);
auto schema = statement->get_cf_meta_data();
auto uuid = generate_legacy_id(schema->ks_name(), schema->cf_name());
schema_builder b(schema);
b.set_uuid(uuid);
return service::get_local_migration_manager().announce_new_column_family(b.build(), false);
}
future<bool> auth::auth::has_existing_users(const sstring& cfname, const sstring& def_user_name, const sstring& name_column) {
auto default_user_query = sprint("SELECT * FROM %s.%s WHERE %s = ?", AUTH_KS, cfname, name_column);
auto all_users_query = sprint("SELECT * FROM %s.%s LIMIT 1", AUTH_KS, cfname);
return cql3::get_local_query_processor().process(default_user_query, db::consistency_level::ONE, { def_user_name }).then([=](::shared_ptr<cql3::untyped_result_set> res) {
if (!res->empty()) {
return make_ready_future<bool>(true);
}
return cql3::get_local_query_processor().process(default_user_query, db::consistency_level::QUORUM, { def_user_name }).then([all_users_query](::shared_ptr<cql3::untyped_result_set> res) {
if (!res->empty()) {
return make_ready_future<bool>(true);
}
return cql3::get_local_query_processor().process(all_users_query, db::consistency_level::QUORUM).then([](::shared_ptr<cql3::untyped_result_set> res) {
return make_ready_future<bool>(!res->empty());
});
});
});
}

124
auth/auth.hh Normal file
View File

@@ -0,0 +1,124 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright (C) 2016 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <chrono>
#include <seastar/core/sstring.hh>
#include <seastar/core/future.hh>
#include <seastar/core/shared_ptr.hh>
#include "exceptions/exceptions.hh"
#include "permission.hh"
#include "data_resource.hh"
namespace auth {
class authenticated_user;
class auth {
public:
class permissions_cache;
static const sstring DEFAULT_SUPERUSER_NAME;
static const sstring AUTH_KS;
static const sstring USERS_CF;
static const std::chrono::milliseconds SUPERUSER_SETUP_DELAY;
static bool is_class_type(const sstring& type, const sstring& classname);
static future<permission_set> get_permissions(::shared_ptr<authenticated_user>, data_resource);
/**
* Checks if the username is stored in AUTH_KS.USERS_CF.
*
* @param username Username to query.
* @return whether or not Cassandra knows about the user.
*/
static future<bool> is_existing_user(const sstring& username);
/**
* Checks if the user is a known superuser.
*
* @param username Username to query.
* @return true is the user is a superuser, false if they aren't or don't exist at all.
*/
static future<bool> is_super_user(const sstring& username);
/**
* Inserts the user into AUTH_KS.USERS_CF (or overwrites their superuser status as a result of an ALTER USER query).
*
* @param username Username to insert.
* @param isSuper User's new status.
* @throws RequestExecutionException
*/
static future<> insert_user(const sstring& username, bool is_super) throw(exceptions::request_execution_exception);
/**
* Deletes the user from AUTH_KS.USERS_CF.
*
* @param username Username to delete.
* @throws RequestExecutionException
*/
static future<> delete_user(const sstring& username) throw(exceptions::request_execution_exception);
/**
* Sets up Authenticator and Authorizer.
*/
static future<> setup();
static future<> shutdown();
/**
* Set up table from given CREATE TABLE statement under system_auth keyspace, if not already done so.
*
* @param name name of the table
* @param cql CREATE TABLE statement
*/
static future<> setup_table(const sstring& name, const sstring& cql);
static future<bool> has_existing_users(const sstring& cfname, const sstring& def_user_name, const sstring& name_column_name);
// For internal use. Run function "when system is up".
typedef std::function<future<>()> scheduled_func;
static void schedule_when_up(scheduled_func);
};
}

View File

@@ -41,6 +41,7 @@
#include "authenticated_user.hh"
#include "auth.hh"
const sstring auth::authenticated_user::ANONYMOUS_USERNAME("anonymous");
@@ -59,6 +60,13 @@ const sstring& auth::authenticated_user::name() const {
return _anon ? ANONYMOUS_USERNAME : _name;
}
future<bool> auth::authenticated_user::is_super() const {
if (is_anonymous()) {
return make_ready_future<bool>(false);
}
return auth::auth::is_super_user(_name);
}
bool auth::authenticated_user::operator==(const authenticated_user& v) const {
return _anon ? v._anon : _name == v._name;
}

View File

@@ -43,7 +43,6 @@
#include <seastar/core/sstring.hh>
#include <seastar/core/future.hh>
#include "seastarx.hh"
namespace auth {
@@ -58,6 +57,14 @@ public:
const sstring& name() const;
/**
* Checks the user's superuser status.
* Only a superuser is allowed to perform CREATE USER and DROP USER queries.
* Im most cased, though not necessarily, a superuser will have Permission.ALL on every resource
* (depends on IAuthorizer implementation).
*/
future<bool> is_super() const;
/**
* If IAuthenticator doesn't require authentication, this method may return true.
*/

View File

@@ -41,14 +41,13 @@
#include "authenticator.hh"
#include "authenticated_user.hh"
#include "common.hh"
#include "password_authenticator.hh"
#include "cql3/query_processor.hh"
#include "auth.hh"
#include "db/config.hh"
#include "utils/class_registrator.hh"
const sstring auth::authenticator::USERNAME_KEY("username");
const sstring auth::authenticator::PASSWORD_KEY("password");
const sstring auth::authenticator::ALLOW_ALL_AUTHENTICATOR_NAME("org.apache.cassandra.auth.AllowAllAuthenticator");
auth::authenticator::option auth::authenticator::string_to_option(const sstring& name) {
if (strcasecmp(name.c_str(), "password") == 0) {
@@ -65,3 +64,64 @@ sstring auth::authenticator::option_to_string(option opt) {
throw std::invalid_argument(sprint("Unknown option {}", opt));
}
}
/**
* Authenticator is assumed to be a fully state-less immutable object (note all the const).
* We thus store a single instance globally, since it should be safe/ok.
*/
static std::unique_ptr<auth::authenticator> global_authenticator;
future<>
auth::authenticator::setup(const sstring& type) throw (exceptions::configuration_exception) {
if (auth::auth::is_class_type(type, ALLOW_ALL_AUTHENTICATOR_NAME)) {
class allow_all_authenticator : public authenticator {
public:
const sstring& class_name() const override {
return ALLOW_ALL_AUTHENTICATOR_NAME;
}
bool require_authentication() const override {
return false;
}
option_set supported_options() const override {
return option_set();
}
option_set alterable_options() const override {
return option_set();
}
future<::shared_ptr<authenticated_user>> authenticate(const credentials_map& credentials) const throw(exceptions::authentication_exception) override {
return make_ready_future<::shared_ptr<authenticated_user>>(::make_shared<authenticated_user>());
}
future<> create(sstring username, const option_map& options) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) override {
return make_ready_future();
}
future<> alter(sstring username, const option_map& options) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) override {
return make_ready_future();
}
future<> drop(sstring username) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) override {
return make_ready_future();
}
const resource_ids& protected_resources() const override {
static const resource_ids ids;
return ids;
}
::shared_ptr<sasl_challenge> new_sasl_challenge() const override {
throw std::runtime_error("Should not reach");
}
};
global_authenticator = std::make_unique<allow_all_authenticator>();
} else if (auth::auth::is_class_type(type, password_authenticator::PASSWORD_AUTHENTICATOR_NAME)) {
auto pwa = std::make_unique<password_authenticator>();
auto f = pwa->init();
return f.then([pwa = std::move(pwa)]() mutable {
global_authenticator = std::move(pwa);
});
} else {
throw exceptions::configuration_exception("Invalid authenticator type: " + type);
}
return make_ready_future();
}
auth::authenticator& auth::authenticator::get() {
assert(global_authenticator);
return *global_authenticator;
}

View File

@@ -69,6 +69,7 @@ class authenticator {
public:
static const sstring USERNAME_KEY;
static const sstring PASSWORD_KEY;
static const sstring ALLOW_ALL_AUTHENTICATOR_NAME;
/**
* Supported CREATE USER/ALTER USER options.
@@ -85,14 +86,23 @@ public:
using option_map = std::unordered_map<option, boost::any, enum_hash<option>>;
using credentials_map = std::unordered_map<sstring, sstring>;
/**
* Setup is called once upon system startup to initialize the IAuthenticator.
*
* For example, use this method to create any required keyspaces/column families.
* Note: Only call from main thread.
*/
static future<> setup(const sstring& type) throw(exceptions::configuration_exception);
/**
* Returns the system authenticator. Must have called setup before calling this.
*/
static authenticator& get();
virtual ~authenticator()
{}
virtual future<> start() = 0;
virtual future<> stop() = 0;
virtual const sstring& qualified_java_name() const = 0;
virtual const sstring& class_name() const = 0;
/**
* Whether or not the authenticator requires explicit login.
@@ -119,7 +129,7 @@ public:
*
* @throws authentication_exception if credentials don't match any known user.
*/
virtual future<::shared_ptr<authenticated_user>> authenticate(const credentials_map& credentials) const = 0;
virtual future<::shared_ptr<authenticated_user>> authenticate(const credentials_map& credentials) const throw(exceptions::authentication_exception) = 0;
/**
* Called during execution of CREATE USER query (also may be called on startup, see seedSuperuserOptions method).
@@ -131,7 +141,7 @@ public:
* @throws exceptions::request_validation_exception
* @throws exceptions::request_execution_exception
*/
virtual future<> create(sstring username, const option_map& options) = 0;
virtual future<> create(sstring username, const option_map& options) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) = 0;
/**
* Called during execution of ALTER USER query.
@@ -144,7 +154,7 @@ public:
* @throws exceptions::request_validation_exception
* @throws exceptions::request_execution_exception
*/
virtual future<> alter(sstring username, const option_map& options) = 0;
virtual future<> alter(sstring username, const option_map& options) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) = 0;
/**
@@ -154,7 +164,7 @@ public:
* @throws exceptions::request_validation_exception
* @throws exceptions::request_execution_exception
*/
virtual future<> drop(sstring username) = 0;
virtual future<> drop(sstring username) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) = 0;
/**
* Set of resources that should be made inaccessible to users and only accessible internally.
@@ -167,9 +177,9 @@ public:
class sasl_challenge {
public:
virtual ~sasl_challenge() {}
virtual bytes evaluate_response(bytes_view client_response) = 0;
virtual bytes evaluate_response(bytes_view client_response) throw(exceptions::authentication_exception) = 0;
virtual bool is_complete() const = 0;
virtual future<::shared_ptr<authenticated_user>> get_authenticated_user() const = 0;
virtual future<::shared_ptr<authenticated_user>> get_authenticated_user() const throw(exceptions::authentication_exception) = 0;
};
/**

View File

@@ -41,39 +41,23 @@
#include "authorizer.hh"
#include "authenticated_user.hh"
#include "common.hh"
#include "default_authorizer.hh"
#include "auth.hh"
#include "cql3/query_processor.hh"
#include "db/config.hh"
#include "utils/class_registrator.hh"
const sstring& auth::allow_all_authorizer_name() {
static const sstring name = meta::AUTH_PACKAGE_NAME + "AllowAllAuthorizer";
return name;
}
const sstring auth::authorizer::ALLOW_ALL_AUTHORIZER_NAME("org.apache.cassandra.auth.AllowAllAuthorizer");
/**
* Authenticator is assumed to be a fully state-less immutable object (note all the const).
* We thus store a single instance globally, since it should be safe/ok.
*/
static std::unique_ptr<auth::authorizer> global_authorizer;
using authorizer_registry = class_registry<auth::authorizer, cql3::query_processor&>;
future<>
auth::authorizer::setup(const sstring& type) {
if (type == allow_all_authorizer_name()) {
if (auth::auth::is_class_type(type, ALLOW_ALL_AUTHORIZER_NAME)) {
class allow_all_authorizer : public authorizer {
public:
future<> start() override {
return make_ready_future<>();
}
future<> stop() override {
return make_ready_future<>();
}
const sstring& qualified_java_name() const override {
return allow_all_authorizer_name();
}
future<permission_set> authorize(::shared_ptr<authenticated_user>, data_resource) const override {
return make_ready_future<permission_set>(permissions::ALL);
}
@@ -102,14 +86,16 @@ auth::authorizer::setup(const sstring& type) {
};
global_authorizer = std::make_unique<allow_all_authorizer>();
return make_ready_future();
} else {
auto a = authorizer_registry::create(type, cql3::get_local_query_processor());
auto f = a->start();
return f.then([a = std::move(a)]() mutable {
global_authorizer = std::move(a);
} else if (auth::auth::is_class_type(type, default_authorizer::DEFAULT_AUTHORIZER_NAME)) {
auto da = std::make_unique<default_authorizer>();
auto f = da->init();
return f.then([da = std::move(da)]() mutable {
global_authorizer = std::move(da);
});
} else {
throw exceptions::configuration_exception("Invalid authorizer type: " + type);
}
return make_ready_future();
}
auth::authorizer& auth::authorizer::get() {

View File

@@ -51,12 +51,8 @@
#include "permission.hh"
#include "data_resource.hh"
#include "seastarx.hh"
namespace auth {
class service;
class authenticated_user;
struct permission_details {
@@ -73,14 +69,10 @@ using std::experimental::optional;
class authorizer {
public:
static const sstring ALLOW_ALL_AUTHORIZER_NAME;
virtual ~authorizer() {}
virtual future<> start() = 0;
virtual future<> stop() = 0;
virtual const sstring& qualified_java_name() const = 0;
/**
* The primary Authorizer method. Returns a set of permissions of a user on a resource.
*
@@ -88,7 +80,7 @@ public:
* @param resource Resource for which the authorization is being requested. @see DataResource.
* @return Set of permissions of the user on the resource. Should never return empty. Use permission.NONE instead.
*/
virtual future<permission_set> authorize(service&, ::shared_ptr<authenticated_user>, data_resource) const = 0;
virtual future<permission_set> authorize(::shared_ptr<authenticated_user>, data_resource) const = 0;
/**
* Grants a set of permissions on a resource to a user.
@@ -132,7 +124,7 @@ public:
* @throws RequestValidationException
* @throws RequestExecutionException
*/
virtual future<std::vector<permission_details>> list(service&, ::shared_ptr<authenticated_user> performer, permission_set, optional<data_resource>, optional<sstring>) const = 0;
virtual future<std::vector<permission_details>> list(::shared_ptr<authenticated_user> performer, permission_set, optional<data_resource>, optional<sstring>) const = 0;
/**
* This method is called before deleting a user with DROP USER query so that a new user with the same
@@ -162,6 +154,18 @@ public:
* @throws ConfigurationException when there is a configuration error.
*/
virtual future<> validate_configuration() const = 0;
/**
* Setup is called once upon system startup to initialize the IAuthorizer.
*
* For example, use this method to create any required keyspaces/column families.
*/
static future<> setup(const sstring& type);
/**
* Returns the system authorizer. Must have called setup before calling this.
*/
static authorizer& get();
};
}

View File

@@ -1,70 +0,0 @@
/*
* Copyright (C) 2017 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "auth/common.hh"
#include <seastar/core/shared_ptr.hh>
#include "cql3/query_processor.hh"
#include "cql3/statements/create_table_statement.hh"
#include "schema_builder.hh"
#include "service/migration_manager.hh"
namespace auth {
namespace meta {
const sstring DEFAULT_SUPERUSER_NAME("cassandra");
const sstring AUTH_KS("system_auth");
const sstring USERS_CF("users");
const sstring AUTH_PACKAGE_NAME("org.apache.cassandra.auth.");
}
future<> create_metadata_table_if_missing(
const sstring& table_name,
cql3::query_processor& qp,
const sstring& cql,
::service::migration_manager& mm) {
auto& db = qp.db().local();
if (db.has_schema(meta::AUTH_KS, table_name)) {
return make_ready_future<>();
}
auto parsed_statement = static_pointer_cast<cql3::statements::raw::cf_statement>(
cql3::query_processor::parse_statement(cql));
parsed_statement->prepare_keyspace(meta::AUTH_KS);
auto statement = static_pointer_cast<cql3::statements::create_table_statement>(
parsed_statement->prepare(db, qp.get_cql_stats())->statement);
const auto schema = statement->get_cf_meta_data();
const auto uuid = generate_legacy_id(schema->ks_name(), schema->cf_name());
schema_builder b(schema);
b.set_uuid(uuid);
return mm.announce_new_column_family(b.build(), false);
}
}

View File

@@ -1,74 +0,0 @@
/*
* Copyright (C) 2017 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <chrono>
#include <seastar/core/future.hh>
#include <seastar/core/reactor.hh>
#include <seastar/core/resource.hh>
#include <seastar/core/sstring.hh>
#include "delayed_tasks.hh"
#include "seastarx.hh"
namespace service {
class migration_manager;
}
namespace cql3 {
class query_processor;
}
namespace auth {
namespace meta {
extern const sstring DEFAULT_SUPERUSER_NAME;
extern const sstring AUTH_KS;
extern const sstring USERS_CF;
extern const sstring AUTH_PACKAGE_NAME;
}
template <class Task>
future<> once_among_shards(Task&& f) {
if (engine().cpu_id() == 0u) {
return f();
}
return make_ready_future<>();
}
template <class Task, class Clock>
void delay_until_system_ready(delayed_tasks<Clock>& ts, Task&& f) {
static const typename std::chrono::milliseconds delay_duration(10000);
ts.schedule_after(delay_duration, std::forward<Task>(f));
}
future<> create_metadata_table_if_missing(
const sstring& table_name,
cql3::query_processor&,
const sstring& cql,
::service::migration_manager&);
}

View File

@@ -115,14 +115,16 @@ auth::data_resource auth::data_resource::get_parent() const {
}
}
const sstring& auth::data_resource::keyspace() const {
const sstring& auth::data_resource::keyspace() const
throw (std::invalid_argument) {
if (is_root_level()) {
throw std::invalid_argument("ROOT data resource has no keyspace");
}
return _ks;
}
const sstring& auth::data_resource::column_family() const {
const sstring& auth::data_resource::column_family() const
throw (std::invalid_argument) {
if (!is_column_family_level()) {
throw std::invalid_argument(sprint("%s data resource has no column family", name()));
}

View File

@@ -45,7 +45,6 @@
#include <iosfwd>
#include <set>
#include <seastar/core/sstring.hh>
#include "seastarx.hh"
namespace auth {
@@ -118,13 +117,13 @@ public:
* @return keyspace of the resource.
* @throws std::invalid_argument if it's the root-level resource.
*/
const sstring& keyspace() const;
const sstring& keyspace() const throw(std::invalid_argument);
/**
* @return column family of the resource.
* @throws std::invalid_argument if it's not a cf-level resource.
*/
const sstring& column_family() const;
const sstring& column_family() const throw(std::invalid_argument);
/**
* @return Whether or not the resource has a parent in the hierarchy.

View File

@@ -46,68 +46,46 @@
#include <seastar/core/reactor.hh>
#include "common.hh"
#include "auth.hh"
#include "default_authorizer.hh"
#include "authenticated_user.hh"
#include "permission.hh"
#include "cql3/query_processor.hh"
#include "cql3/untyped_result_set.hh"
#include "exceptions/exceptions.hh"
#include "log.hh"
const sstring& auth::default_authorizer_name() {
static const sstring name = meta::AUTH_PACKAGE_NAME + "CassandraAuthorizer";
return name;
}
const sstring auth::default_authorizer::DEFAULT_AUTHORIZER_NAME(
"org.apache.cassandra.auth.CassandraAuthorizer");
static const sstring USER_NAME = "username";
static const sstring RESOURCE_NAME = "resource";
static const sstring PERMISSIONS_NAME = "permissions";
static const sstring PERMISSIONS_CF = "permissions";
static logging::logger alogger("default_authorizer");
static logging::logger logger("default_authorizer");
// To ensure correct initialization order, we unfortunately need to use a string literal.
static const class_registrator<
auth::authorizer,
auth::default_authorizer,
cql3::query_processor&,
::service::migration_manager&> password_auth_reg("org.apache.cassandra.auth.CassandraAuthorizer");
auth::default_authorizer::default_authorizer(cql3::query_processor& qp, ::service::migration_manager& mm)
: _qp(qp)
, _migration_manager(mm) {
auth::default_authorizer::default_authorizer() {
}
auth::default_authorizer::~default_authorizer() {
}
future<> auth::default_authorizer::start() {
static const sstring create_table = sprint("CREATE TABLE %s.%s ("
future<> auth::default_authorizer::init() {
sstring create_table = sprint("CREATE TABLE %s.%s ("
"%s text,"
"%s text,"
"%s set<text>,"
"PRIMARY KEY(%s, %s)"
") WITH gc_grace_seconds=%d", meta::AUTH_KS,
") WITH gc_grace_seconds=%d", auth::auth::AUTH_KS,
PERMISSIONS_CF, USER_NAME, RESOURCE_NAME, PERMISSIONS_NAME,
USER_NAME, RESOURCE_NAME, 90 * 24 * 60 * 60); // 3 months.
return auth::once_among_shards([this] {
return auth::create_metadata_table_if_missing(
PERMISSIONS_CF,
_qp,
create_table,
_migration_manager);
});
return auth::setup_table(PERMISSIONS_CF, create_table);
}
future<> auth::default_authorizer::stop() {
return make_ready_future<>();
}
future<auth::permission_set> auth::default_authorizer::authorize(
service& ser, ::shared_ptr<authenticated_user> user, data_resource resource) const {
return auth::is_super_user(ser, *user).then([this, user, resource = std::move(resource)](bool is_super) {
::shared_ptr<authenticated_user> user, data_resource resource) const {
return user->is_super().then([this, user, resource = std::move(resource)](bool is_super) {
if (is_super) {
return make_ready_future<permission_set>(permissions::ALL);
}
@@ -116,9 +94,10 @@ future<auth::permission_set> auth::default_authorizer::authorize(
* TOOD: could create actual data type for permission (translating string<->perm),
* but this seems overkill right now. We still must store strings so...
*/
auto& qp = cql3::get_local_query_processor();
auto query = sprint("SELECT %s FROM %s.%s WHERE %s = ? AND %s = ?"
, PERMISSIONS_NAME, meta::AUTH_KS, PERMISSIONS_CF, USER_NAME, RESOURCE_NAME);
return _qp.process(query, db::consistency_level::LOCAL_ONE, {user->name(), resource.name() })
, PERMISSIONS_NAME, auth::AUTH_KS, PERMISSIONS_CF, USER_NAME, RESOURCE_NAME);
return qp.process(query, db::consistency_level::LOCAL_ONE, {user->name(), resource.name() })
.then_wrapped([=](future<::shared_ptr<cql3::untyped_result_set>> f) {
try {
auto res = f.get0();
@@ -128,7 +107,7 @@ future<auth::permission_set> auth::default_authorizer::authorize(
}
return make_ready_future<permission_set>(permissions::from_strings(res->one().get_set<sstring>(PERMISSIONS_NAME)));
} catch (exceptions::request_execution_exception& e) {
alogger.warn("CassandraAuthorizer failed to authorize {} for {}", user->name(), resource);
logger.warn("CassandraAuthorizer failed to authorize {} for {}", user->name(), resource);
return make_ready_future<permission_set>(permissions::NONE);
}
});
@@ -141,10 +120,11 @@ future<> auth::default_authorizer::modify(
::shared_ptr<authenticated_user> performer, permission_set set,
data_resource resource, sstring user, sstring op) {
// TODO: why does this not check super user?
auto& qp = cql3::get_local_query_processor();
auto query = sprint("UPDATE %s.%s SET %s = %s %s ? WHERE %s = ? AND %s = ?",
meta::AUTH_KS, PERMISSIONS_CF, PERMISSIONS_NAME,
auth::AUTH_KS, PERMISSIONS_CF, PERMISSIONS_NAME,
PERMISSIONS_NAME, op, USER_NAME, RESOURCE_NAME);
return _qp.process(query, db::consistency_level::ONE, {
return qp.process(query, db::consistency_level::ONE, {
permissions::to_strings(set), user, resource.name() }).discard_result();
}
@@ -162,14 +142,15 @@ future<> auth::default_authorizer::revoke(
}
future<std::vector<auth::permission_details>> auth::default_authorizer::list(
service& ser, ::shared_ptr<authenticated_user> performer, permission_set set,
::shared_ptr<authenticated_user> performer, permission_set set,
optional<data_resource> resource, optional<sstring> user) const {
return auth::is_super_user(ser, *performer).then([this, performer, set = std::move(set), resource = std::move(resource), user = std::move(user)](bool is_super) {
return performer->is_super().then([this, performer, set = std::move(set), resource = std::move(resource), user = std::move(user)](bool is_super) {
if (!is_super && (!user || performer->name() != *user)) {
throw exceptions::unauthorized_exception(sprint("You are not authorized to view %s's permissions", user ? *user : "everyone"));
}
auto query = sprint("SELECT %s, %s, %s FROM %s.%s", USER_NAME, RESOURCE_NAME, PERMISSIONS_NAME, meta::AUTH_KS, PERMISSIONS_CF);
auto query = sprint("SELECT %s, %s, %s FROM %s.%s", USER_NAME, RESOURCE_NAME, PERMISSIONS_NAME, auth::AUTH_KS, PERMISSIONS_CF);
auto& qp = cql3::get_local_query_processor();
// Oh, look, it is a case where it does not pay off to have
// parameters to process in an initializer list.
@@ -177,15 +158,15 @@ future<std::vector<auth::permission_details>> auth::default_authorizer::list(
if (resource && user) {
query += sprint(" WHERE %s = ? AND %s = ?", USER_NAME, RESOURCE_NAME);
f = _qp.process(query, db::consistency_level::ONE, {*user, resource->name()});
f = qp.process(query, db::consistency_level::ONE, {*user, resource->name()});
} else if (resource) {
query += sprint(" WHERE %s = ? ALLOW FILTERING", RESOURCE_NAME);
f = _qp.process(query, db::consistency_level::ONE, {resource->name()});
f = qp.process(query, db::consistency_level::ONE, {resource->name()});
} else if (user) {
query += sprint(" WHERE %s = ?", USER_NAME);
f = _qp.process(query, db::consistency_level::ONE, {*user});
f = qp.process(query, db::consistency_level::ONE, {*user});
} else {
f = _qp.process(query, db::consistency_level::ONE, {});
f = qp.process(query, db::consistency_level::ONE, {});
}
return f.then([set](::shared_ptr<cql3::untyped_result_set> res) {
@@ -207,40 +188,42 @@ future<std::vector<auth::permission_details>> auth::default_authorizer::list(
}
future<> auth::default_authorizer::revoke_all(sstring dropped_user) {
auto query = sprint("DELETE FROM %s.%s WHERE %s = ?", meta::AUTH_KS,
auto& qp = cql3::get_local_query_processor();
auto query = sprint("DELETE FROM %s.%s WHERE %s = ?", auth::AUTH_KS,
PERMISSIONS_CF, USER_NAME);
return _qp.process(query, db::consistency_level::ONE, { dropped_user }).discard_result().handle_exception(
return qp.process(query, db::consistency_level::ONE, { dropped_user }).discard_result().handle_exception(
[dropped_user](auto ep) {
try {
std::rethrow_exception(ep);
} catch (exceptions::request_execution_exception& e) {
alogger.warn("CassandraAuthorizer failed to revoke all permissions of {}: {}", dropped_user, e);
logger.warn("CassandraAuthorizer failed to revoke all permissions of {}: {}", dropped_user, e);
}
});
}
future<> auth::default_authorizer::revoke_all(data_resource resource) {
auto& qp = cql3::get_local_query_processor();
auto query = sprint("SELECT %s FROM %s.%s WHERE %s = ? ALLOW FILTERING",
USER_NAME, meta::AUTH_KS, PERMISSIONS_CF, RESOURCE_NAME);
return _qp.process(query, db::consistency_level::LOCAL_ONE, { resource.name() })
.then_wrapped([this, resource](future<::shared_ptr<cql3::untyped_result_set>> f) {
USER_NAME, auth::AUTH_KS, PERMISSIONS_CF, RESOURCE_NAME);
return qp.process(query, db::consistency_level::LOCAL_ONE, { resource.name() })
.then_wrapped([resource, &qp](future<::shared_ptr<cql3::untyped_result_set>> f) {
try {
auto res = f.get0();
return parallel_for_each(res->begin(), res->end(), [this, res, resource](const cql3::untyped_result_set::row& r) {
return parallel_for_each(res->begin(), res->end(), [&qp, res, resource](const cql3::untyped_result_set::row& r) {
auto query = sprint("DELETE FROM %s.%s WHERE %s = ? AND %s = ?"
, meta::AUTH_KS, PERMISSIONS_CF, USER_NAME, RESOURCE_NAME);
return _qp.process(query, db::consistency_level::LOCAL_ONE, { r.get_as<sstring>(USER_NAME), resource.name() })
, auth::AUTH_KS, PERMISSIONS_CF, USER_NAME, RESOURCE_NAME);
return qp.process(query, db::consistency_level::LOCAL_ONE, { r.get_as<sstring>(USER_NAME), resource.name() })
.discard_result().handle_exception([resource](auto ep) {
try {
std::rethrow_exception(ep);
} catch (exceptions::request_execution_exception& e) {
alogger.warn("CassandraAuthorizer failed to revoke all permissions on {}: {}", resource, e);
logger.warn("CassandraAuthorizer failed to revoke all permissions on {}: {}", resource, e);
}
});
});
} catch (exceptions::request_execution_exception& e) {
alogger.warn("CassandraAuthorizer failed to revoke all permissions on {}: {}", resource, e);
logger.warn("CassandraAuthorizer failed to revoke all permissions on {}: {}", resource, e);
return make_ready_future();
}
});
@@ -248,7 +231,7 @@ future<> auth::default_authorizer::revoke_all(data_resource resource) {
const auth::resource_ids& auth::default_authorizer::protected_resources() {
static const resource_ids ids({ data_resource(meta::AUTH_KS, PERMISSIONS_CF) });
static const resource_ids ids({ data_resource(auth::AUTH_KS, PERMISSIONS_CF) });
return ids;
}

View File

@@ -41,40 +41,26 @@
#pragma once
#include <functional>
#include "authorizer.hh"
#include "cql3/query_processor.hh"
#include "service/migration_manager.hh"
namespace auth {
const sstring& default_authorizer_name();
class default_authorizer : public authorizer {
cql3::query_processor& _qp;
::service::migration_manager& _migration_manager;
public:
default_authorizer(cql3::query_processor&, ::service::migration_manager&);
static const sstring DEFAULT_AUTHORIZER_NAME;
default_authorizer();
~default_authorizer();
future<> start() override;
future<> init();
future<> stop() override;
const sstring& qualified_java_name() const override {
return default_authorizer_name();
}
future<permission_set> authorize(service&, ::shared_ptr<authenticated_user>, data_resource) const override;
future<permission_set> authorize(::shared_ptr<authenticated_user>, data_resource) const override;
future<> grant(::shared_ptr<authenticated_user>, permission_set, data_resource, sstring) override;
future<> revoke(::shared_ptr<authenticated_user>, permission_set, data_resource, sstring) override;
future<std::vector<permission_details>> list(service&, ::shared_ptr<authenticated_user>, permission_set, optional<data_resource>, optional<sstring>) const override;
future<std::vector<permission_details>> list(::shared_ptr<authenticated_user>, permission_set, optional<data_resource>, optional<sstring>) const override;
future<> revoke_all(sstring) override;

View File

@@ -46,42 +46,28 @@
#include <seastar/core/reactor.hh>
#include "common.hh"
#include "auth.hh"
#include "password_authenticator.hh"
#include "authenticated_user.hh"
#include "cql3/untyped_result_set.hh"
#include "cql3/query_processor.hh"
#include "log.hh"
#include "service/migration_manager.hh"
#include "utils/class_registrator.hh"
const sstring& auth::password_authenticator_name() {
static const sstring name = meta::AUTH_PACKAGE_NAME + "PasswordAuthenticator";
return name;
}
const sstring auth::password_authenticator::PASSWORD_AUTHENTICATOR_NAME("org.apache.cassandra.auth.PasswordAuthenticator");
// name of the hash column.
static const sstring SALTED_HASH = "salted_hash";
static const sstring USER_NAME = "username";
static const sstring DEFAULT_USER_NAME = auth::meta::DEFAULT_SUPERUSER_NAME;
static const sstring DEFAULT_USER_PASSWORD = auth::meta::DEFAULT_SUPERUSER_NAME;
static const sstring DEFAULT_USER_NAME = auth::auth::DEFAULT_SUPERUSER_NAME;
static const sstring DEFAULT_USER_PASSWORD = auth::auth::DEFAULT_SUPERUSER_NAME;
static const sstring CREDENTIALS_CF = "credentials";
static logging::logger plogger("password_authenticator");
// To ensure correct initialization order, we unfortunately need to use a string literal.
static const class_registrator<
auth::authenticator,
auth::password_authenticator,
cql3::query_processor&,
::service::migration_manager&> password_auth_reg("org.apache.cassandra.auth.PasswordAuthenticator");
static logging::logger logger("password_authenticator");
auth::password_authenticator::~password_authenticator()
{}
auth::password_authenticator::password_authenticator(cql3::query_processor& qp, ::service::migration_manager& mm)
: _qp(qp)
, _migration_manager(mm) {
}
auth::password_authenticator::password_authenticator()
{}
// TODO: blowfish
// Origin uses Java bcrypt library, i.e. blowfish salt
@@ -102,10 +88,12 @@ auth::password_authenticator::password_authenticator(cql3::query_processor& qp,
// and some old-fashioned random salt generation.
static constexpr size_t rand_bytes = 16;
static thread_local crypt_data tlcrypt = { 0, };
static sstring hashpw(const sstring& pass, const sstring& salt) {
auto res = crypt_r(pass.c_str(), salt.c_str(), &tlcrypt);
// crypt_data is huge. should this be a thread_local static?
auto tmp = std::make_unique<crypt_data>();
tmp->initialized = 0;
auto res = crypt_r(pass.c_str(), salt.c_str(), tmp.get());
if (res == nullptr) {
throw std::system_error(errno, std::system_category());
}
@@ -134,16 +122,17 @@ static sstring gensalt() {
sstring salt;
if (!prefix.empty()) {
return prefix + input;
return prefix + salt;
}
auto tmp = std::make_unique<crypt_data>();
tmp->initialized = 0;
// Try in order:
// blowfish 2011 fix, blowfish, sha512, sha256, md5
for (sstring pfx : { "$2y$", "$2a$", "$6$", "$5$", "$1$" }) {
salt = pfx + input;
const char* e = crypt_r("fisk", salt.c_str(), &tlcrypt);
if (e && (e[0] != '*')) {
if (crypt_r("fisk", salt.c_str(), tmp.get())) {
prefix = pfx;
return salt;
}
@@ -155,52 +144,39 @@ static sstring hashpw(const sstring& pass) {
return hashpw(pass, gensalt());
}
future<> auth::password_authenticator::start() {
return auth::once_among_shards([this] {
gensalt(); // do this once to determine usable hashing
future<> auth::password_authenticator::init() {
gensalt(); // do this once to determine usable hashing
static const sstring create_table = sprint(
"CREATE TABLE %s.%s ("
"%s text,"
"%s text," // salt + hash + number of rounds
"options map<text,text>,"// for future extensions
"PRIMARY KEY(%s)"
") WITH gc_grace_seconds=%d",
meta::AUTH_KS,
CREDENTIALS_CF, USER_NAME, SALTED_HASH, USER_NAME,
90 * 24 * 60 * 60); // 3 months.
sstring create_table = sprint(
"CREATE TABLE %s.%s ("
"%s text,"
"%s text," // salt + hash + number of rounds
"options map<text,text>,"// for future extensions
"PRIMARY KEY(%s)"
") WITH gc_grace_seconds=%d",
auth::auth::AUTH_KS,
CREDENTIALS_CF, USER_NAME, SALTED_HASH, USER_NAME,
90 * 24 * 60 * 60); // 3 months.
return auth::create_metadata_table_if_missing(
CREDENTIALS_CF,
_qp,
create_table,
_migration_manager).then([this] {
auth::delay_until_system_ready(_delayed, [this] {
return has_existing_users().then([this](bool existing) {
if (!existing) {
return _qp.process(
sprint(
"INSERT INTO %s.%s (%s, %s) VALUES (?, ?) USING TIMESTAMP 0",
meta::AUTH_KS,
CREDENTIALS_CF,
USER_NAME, SALTED_HASH),
db::consistency_level::ONE,
{ DEFAULT_USER_NAME, hashpw(DEFAULT_USER_PASSWORD) }).then([](auto) {
plogger.info("Created default user '{}'", DEFAULT_USER_NAME);
});
}
return make_ready_future<>();
});
return auth::setup_table(CREDENTIALS_CF, create_table).then([this] {
// instead of once-timer, just schedule this later
auth::schedule_when_up([] {
return auth::has_existing_users(CREDENTIALS_CF, DEFAULT_USER_NAME, USER_NAME).then([](bool exists) {
if (!exists) {
cql3::get_local_query_processor().process(sprint("INSERT INTO %s.%s (%s, %s) VALUES (?, ?) USING TIMESTAMP 0",
auth::AUTH_KS,
CREDENTIALS_CF,
USER_NAME, SALTED_HASH
),
db::consistency_level::ONE, {DEFAULT_USER_NAME, hashpw(DEFAULT_USER_PASSWORD)}).then([](auto) {
logger.info("Created default user '{}'", DEFAULT_USER_NAME);
});
}
});
});
});
}
future<> auth::password_authenticator::stop() {
return make_ready_future<>();
}
db::consistency_level auth::password_authenticator::consistency_for_user(const sstring& username) {
if (username == DEFAULT_USER_NAME) {
return db::consistency_level::QUORUM;
@@ -208,8 +184,8 @@ db::consistency_level auth::password_authenticator::consistency_for_user(const s
return db::consistency_level::LOCAL_ONE;
}
const sstring& auth::password_authenticator::qualified_java_name() const {
return password_authenticator_name();
const sstring& auth::password_authenticator::class_name() const {
return PASSWORD_AUTHENTICATOR_NAME;
}
bool auth::password_authenticator::require_authentication() const {
@@ -225,7 +201,8 @@ auth::authenticator::option_set auth::password_authenticator::alterable_options(
}
future<::shared_ptr<auth::authenticated_user> > auth::password_authenticator::authenticate(
const credentials_map& credentials) const {
const credentials_map& credentials) const
throw (exceptions::authentication_exception) {
if (!credentials.count(USERNAME_KEY)) {
throw exceptions::authentication_exception(sprint("Required key '%s' is missing", USERNAME_KEY));
}
@@ -242,8 +219,9 @@ future<::shared_ptr<auth::authenticated_user> > auth::password_authenticator::au
// Rely on query processing caching statements instead, and lets assume
// that a map lookup string->statement is not gonna kill us much.
return futurize_apply([this, username, password] {
return _qp.process(sprint("SELECT %s FROM %s.%s WHERE %s = ?", SALTED_HASH,
meta::AUTH_KS, CREDENTIALS_CF, USER_NAME),
auto& qp = cql3::get_local_query_processor();
return qp.process(sprint("SELECT %s FROM %s.%s WHERE %s = ?", SALTED_HASH,
auth::AUTH_KS, CREDENTIALS_CF, USER_NAME),
consistency_for_user(username), {username}, true);
}).then_wrapped([=](future<::shared_ptr<cql3::untyped_result_set>> f) {
try {
@@ -263,50 +241,58 @@ future<::shared_ptr<auth::authenticated_user> > auth::password_authenticator::au
}
future<> auth::password_authenticator::create(sstring username,
const option_map& options) {
const option_map& options)
throw (exceptions::request_validation_exception,
exceptions::request_execution_exception) {
try {
auto password = boost::any_cast<sstring>(options.at(option::PASSWORD));
auto query = sprint("INSERT INTO %s.%s (%s, %s) VALUES (?, ?)",
meta::AUTH_KS, CREDENTIALS_CF, USER_NAME, SALTED_HASH);
return _qp.process(query, consistency_for_user(username), { username, hashpw(password) }).discard_result();
auth::AUTH_KS, CREDENTIALS_CF, USER_NAME, SALTED_HASH);
auto& qp = cql3::get_local_query_processor();
return qp.process(query, consistency_for_user(username), { username, hashpw(password) }).discard_result();
} catch (std::out_of_range&) {
throw exceptions::invalid_request_exception("PasswordAuthenticator requires PASSWORD option");
}
}
future<> auth::password_authenticator::alter(sstring username,
const option_map& options) {
const option_map& options)
throw (exceptions::request_validation_exception,
exceptions::request_execution_exception) {
try {
auto password = boost::any_cast<sstring>(options.at(option::PASSWORD));
auto query = sprint("UPDATE %s.%s SET %s = ? WHERE %s = ?",
meta::AUTH_KS, CREDENTIALS_CF, SALTED_HASH, USER_NAME);
return _qp.process(query, consistency_for_user(username), { hashpw(password), username }).discard_result();
auth::AUTH_KS, CREDENTIALS_CF, SALTED_HASH, USER_NAME);
auto& qp = cql3::get_local_query_processor();
return qp.process(query, consistency_for_user(username), { hashpw(password), username }).discard_result();
} catch (std::out_of_range&) {
throw exceptions::invalid_request_exception("PasswordAuthenticator requires PASSWORD option");
}
}
future<> auth::password_authenticator::drop(sstring username) {
future<> auth::password_authenticator::drop(sstring username)
throw (exceptions::request_validation_exception,
exceptions::request_execution_exception) {
try {
auto query = sprint("DELETE FROM %s.%s WHERE %s = ?",
meta::AUTH_KS, CREDENTIALS_CF, USER_NAME);
return _qp.process(query, consistency_for_user(username), { username }).discard_result();
auth::AUTH_KS, CREDENTIALS_CF, USER_NAME);
auto& qp = cql3::get_local_query_processor();
return qp.process(query, consistency_for_user(username), { username }).discard_result();
} catch (std::out_of_range&) {
throw exceptions::invalid_request_exception("PasswordAuthenticator requires PASSWORD option");
}
}
const auth::resource_ids& auth::password_authenticator::protected_resources() const {
static const resource_ids ids({ data_resource(meta::AUTH_KS, CREDENTIALS_CF) });
static const resource_ids ids({ data_resource(auth::AUTH_KS, CREDENTIALS_CF) });
return ids;
}
::shared_ptr<auth::authenticator::sasl_challenge> auth::password_authenticator::new_sasl_challenge() const {
class plain_text_password_challenge: public sasl_challenge {
const password_authenticator& _self;
public:
plain_text_password_challenge(const password_authenticator& self) : _self(self)
plain_text_password_challenge(const password_authenticator& a)
: _authenticator(a)
{}
/**
@@ -322,8 +308,9 @@ const auth::resource_ids& auth::password_authenticator::protected_resources() co
* would expect
* @throws javax.security.sasl.SaslException
*/
bytes evaluate_response(bytes_view client_response) override {
plogger.debug("Decoding credentials from client token");
bytes evaluate_response(bytes_view client_response)
throw (exceptions::authentication_exception) override {
logger.debug("Decoding credentials from client token");
sstring username, password;
@@ -360,59 +347,14 @@ const auth::resource_ids& auth::password_authenticator::protected_resources() co
bool is_complete() const override {
return _complete;
}
future<::shared_ptr<authenticated_user>> get_authenticated_user() const override {
return _self.authenticate(_credentials);
future<::shared_ptr<authenticated_user>> get_authenticated_user() const
throw (exceptions::authentication_exception) override {
return _authenticator.authenticate(_credentials);
}
private:
const password_authenticator& _authenticator;
credentials_map _credentials;
bool _complete = false;
};
return ::make_shared<plain_text_password_challenge>(*this);
}
//
// Similar in structure to `auth::service::has_existing_users()`, but trying to generalize the pattern breaks all kinds
// of module boundaries and leaks implementation details.
//
future<bool> auth::password_authenticator::has_existing_users() const {
static const sstring default_user_query = sprint(
"SELECT * FROM %s.%s WHERE %s = ?",
meta::AUTH_KS,
CREDENTIALS_CF,
USER_NAME);
static const sstring all_users_query = sprint(
"SELECT * FROM %s.%s LIMIT 1",
meta::AUTH_KS,
CREDENTIALS_CF);
// This logic is borrowed directly from Apache Cassandra. By first checking for the presence of the default user, we
// can potentially avoid doing a range query with a high consistency level.
return _qp.process(
default_user_query,
db::consistency_level::ONE,
{ meta::DEFAULT_SUPERUSER_NAME },
true).then([this](auto results) {
if (!results->empty()) {
return make_ready_future<bool>(true);
}
return _qp.process(
default_user_query,
db::consistency_level::QUORUM,
{ meta::DEFAULT_SUPERUSER_NAME },
true).then([this](auto results) {
if (!results->empty()) {
return make_ready_future<bool>(true);
}
return _qp.process(
all_users_query,
db::consistency_level::QUORUM).then([](auto results) {
return make_ready_future<bool>(!results->empty());
});
});
});
}

View File

@@ -42,48 +42,31 @@
#pragma once
#include "authenticator.hh"
#include "cql3/query_processor.hh"
#include "delayed_tasks.hh"
namespace service {
class migration_manager;
}
namespace auth {
const sstring& password_authenticator_name();
class password_authenticator : public authenticator {
cql3::query_processor& _qp;
::service::migration_manager& _migration_manager;
delayed_tasks<> _delayed{};
public:
password_authenticator(cql3::query_processor&, ::service::migration_manager&);
static const sstring PASSWORD_AUTHENTICATOR_NAME;
password_authenticator();
~password_authenticator();
future<> start() override;
future<> init();
future<> stop() override;
const sstring& qualified_java_name() const override;
const sstring& class_name() const override;
bool require_authentication() const override;
option_set supported_options() const override;
option_set alterable_options() const override;
future<::shared_ptr<authenticated_user>> authenticate(const credentials_map& credentials) const override;
future<> create(sstring username, const option_map& options) override;
future<> alter(sstring username, const option_map& options) override;
future<> drop(sstring username) override;
future<::shared_ptr<authenticated_user>> authenticate(const credentials_map& credentials) const throw(exceptions::authentication_exception) override;
future<> create(sstring username, const option_map& options) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) override;
future<> alter(sstring username, const option_map& options) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) override;
future<> drop(sstring username) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) override;
const resource_ids& protected_resources() const override;
::shared_ptr<sasl_challenge> new_sasl_challenge() const override;
static db::consistency_level consistency_for_user(const sstring& username);
private:
future<bool> has_existing_users() const;
};
}

View File

@@ -44,7 +44,6 @@
#include <unordered_set>
#include <seastar/core/sstring.hh>
#include "seastarx.hh"
#include "enum_set.hh"
namespace auth {

View File

@@ -1,51 +0,0 @@
/*
* Copyright (C) 2017 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "auth/permissions_cache.hh"
#include "auth/authorizer.hh"
#include "auth/common.hh"
#include "auth/service.hh"
#include "db/config.hh"
namespace auth {
permissions_cache_config permissions_cache_config::from_db_config(const db::config& dc) {
permissions_cache_config c;
c.max_entries = dc.permissions_cache_max_entries();
c.validity_period = std::chrono::milliseconds(dc.permissions_validity_in_ms());
c.update_period = std::chrono::milliseconds(dc.permissions_update_interval_in_ms());
return c;
}
permissions_cache::permissions_cache(const permissions_cache_config& c, service& ser, logging::logger& log)
: _cache(c.max_entries, c.validity_period, c.update_period, log, [&ser, &log](const key_type& k) {
log.debug("Refreshing permissions for {}", k.first.name());
return ser.underlying_authorizer().authorize(ser, ::make_shared<authenticated_user>(k.first), k.second);
}) {
}
future<permission_set> permissions_cache::get(::shared_ptr<authenticated_user> user, data_resource r) {
return _cache.get(key_type(*user, r));
}
}

View File

@@ -1,99 +0,0 @@
/*
* Copyright (C) 2017 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <chrono>
#include <functional>
#include <iostream>
#include <utility>
#include <seastar/core/future.hh>
#include <seastar/core/shared_ptr.hh>
#include "auth/authenticated_user.hh"
#include "auth/data_resource.hh"
#include "auth/permission.hh"
#include "log.hh"
#include "utils/loading_cache.hh"
namespace std {
template <>
struct hash<auth::data_resource> final {
size_t operator()(const auth::data_resource & v) const {
return v.hash_value();
}
};
template <>
struct hash<auth::authenticated_user> final {
size_t operator()(const auth::authenticated_user & v) const {
return utils::tuple_hash()(v.name(), v.is_anonymous());
}
};
inline std::ostream& operator<<(std::ostream& os, const std::pair<auth::authenticated_user, auth::data_resource>& p) {
os << "{user: " << p.first.name() << ", data_resource: " << p.second << "}";
return os;
}
}
namespace db {
class config;
}
namespace auth {
class service;
struct permissions_cache_config final {
static permissions_cache_config from_db_config(const db::config&);
std::size_t max_entries;
std::chrono::milliseconds validity_period;
std::chrono::milliseconds update_period;
};
class permissions_cache final {
using cache_type = utils::loading_cache<
std::pair<authenticated_user, data_resource>,
permission_set,
utils::loading_cache_reload_enabled::yes,
utils::simple_entry_size<permission_set>,
utils::tuple_hash>;
using key_type = typename cache_type::key_type;
cache_type _cache;
public:
explicit permissions_cache(const permissions_cache_config&, service&, logging::logger&);
future <> stop() {
return _cache.stop();
}
future<permission_set> get(::shared_ptr<authenticated_user>, data_resource);
};
}

View File

@@ -1,355 +0,0 @@
/*
* Copyright (C) 2017 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "auth/service.hh"
#include <map>
#include <seastar/core/future-util.hh>
#include <seastar/core/shared_ptr.hh>
#include "auth/allow_all_authenticator.hh"
#include "auth/allow_all_authorizer.hh"
#include "auth/common.hh"
#include "cql3/query_processor.hh"
#include "cql3/untyped_result_set.hh"
#include "db/config.hh"
#include "db/consistency_level.hh"
#include "exceptions/exceptions.hh"
#include "log.hh"
#include "service/migration_listener.hh"
#include "utils/class_registrator.hh"
namespace auth {
namespace meta {
static const sstring user_name_col_name("name");
static const sstring superuser_col_name("super");
}
static logging::logger log("auth_service");
class auth_migration_listener final : public ::service::migration_listener {
authorizer& _authorizer;
public:
explicit auth_migration_listener(authorizer& a) : _authorizer(a) {
}
private:
void on_create_keyspace(const sstring& ks_name) override {}
void on_create_column_family(const sstring& ks_name, const sstring& cf_name) override {}
void on_create_user_type(const sstring& ks_name, const sstring& type_name) override {}
void on_create_function(const sstring& ks_name, const sstring& function_name) override {}
void on_create_aggregate(const sstring& ks_name, const sstring& aggregate_name) override {}
void on_create_view(const sstring& ks_name, const sstring& view_name) override {}
void on_update_keyspace(const sstring& ks_name) override {}
void on_update_column_family(const sstring& ks_name, const sstring& cf_name, bool) override {}
void on_update_user_type(const sstring& ks_name, const sstring& type_name) override {}
void on_update_function(const sstring& ks_name, const sstring& function_name) override {}
void on_update_aggregate(const sstring& ks_name, const sstring& aggregate_name) override {}
void on_update_view(const sstring& ks_name, const sstring& view_name, bool columns_changed) override {}
void on_drop_keyspace(const sstring& ks_name) override {
_authorizer.revoke_all(auth::data_resource(ks_name));
}
void on_drop_column_family(const sstring& ks_name, const sstring& cf_name) override {
_authorizer.revoke_all(auth::data_resource(ks_name, cf_name));
}
void on_drop_user_type(const sstring& ks_name, const sstring& type_name) override {}
void on_drop_function(const sstring& ks_name, const sstring& function_name) override {}
void on_drop_aggregate(const sstring& ks_name, const sstring& aggregate_name) override {}
void on_drop_view(const sstring& ks_name, const sstring& view_name) override {}
};
static db::consistency_level consistency_for_user(const sstring& name) {
if (name == meta::DEFAULT_SUPERUSER_NAME) {
return db::consistency_level::QUORUM;
} else {
return db::consistency_level::LOCAL_ONE;
}
}
static future<::shared_ptr<cql3::untyped_result_set>> select_user(cql3::query_processor& qp, const sstring& name) {
// Here was a thread local, explicit cache of prepared statement. In normal execution this is
// fine, but since we in testing set up and tear down system over and over, we'd start using
// obsolete prepared statements pretty quickly.
// Rely on query processing caching statements instead, and lets assume
// that a map lookup string->statement is not gonna kill us much.
return qp.process(
sprint(
"SELECT * FROM %s.%s WHERE %s = ?",
meta::AUTH_KS,
meta::USERS_CF,
meta::user_name_col_name),
consistency_for_user(name),
{ name },
true);
}
service_config service_config::from_db_config(const db::config& dc) {
const qualified_name qualified_authorizer_name(meta::AUTH_PACKAGE_NAME, dc.authorizer());
const qualified_name qualified_authenticator_name(meta::AUTH_PACKAGE_NAME, dc.authenticator());
service_config c;
c.authorizer_java_name = qualified_authorizer_name;
c.authenticator_java_name = qualified_authenticator_name;
return c;
}
service::service(
permissions_cache_config c,
cql3::query_processor& qp,
::service::migration_manager& mm,
std::unique_ptr<authorizer> a,
std::unique_ptr<authenticator> b)
: _permissions_cache_config(std::move(c))
, _permissions_cache(nullptr)
, _qp(qp)
, _migration_manager(mm)
, _authorizer(std::move(a))
, _authenticator(std::move(b))
, _migration_listener(std::make_unique<auth_migration_listener>(*_authorizer)) {
}
service::service(
permissions_cache_config cache_config,
cql3::query_processor& qp,
::service::migration_manager& mm,
const service_config& sc)
: service(
std::move(cache_config),
qp,
mm,
create_object<authorizer>(sc.authorizer_java_name, qp, mm),
create_object<authenticator>(sc.authenticator_java_name, qp, mm)) {
}
bool service::should_create_metadata() const {
const bool null_authorizer = _authorizer->qualified_java_name() == allow_all_authorizer_name();
const bool null_authenticator = _authenticator->qualified_java_name() == allow_all_authenticator_name();
return !null_authorizer || !null_authenticator;
}
future<> service::create_metadata_if_missing() {
auto& db = _qp.db().local();
auto f = make_ready_future<>();
if (!db.has_keyspace(meta::AUTH_KS)) {
std::map<sstring, sstring> opts{{"replication_factor", "1"}};
auto ksm = keyspace_metadata::new_keyspace(
meta::AUTH_KS,
"org.apache.cassandra.locator.SimpleStrategy",
opts,
true);
// We use min_timestamp so that default keyspace metadata will loose with any manual adjustments.
// See issue #2129.
f = _migration_manager.announce_new_keyspace(ksm, api::min_timestamp, false);
}
return f.then([this] {
// 3 months.
static const auto gc_grace_seconds = 90 * 24 * 60 * 60;
static const sstring users_table_query = sprint(
"CREATE TABLE %s.%s (%s text, %s boolean, PRIMARY KEY (%s)) WITH gc_grace_seconds=%s",
meta::AUTH_KS,
meta::USERS_CF,
meta::user_name_col_name,
meta::superuser_col_name,
meta::user_name_col_name,
gc_grace_seconds);
return create_metadata_table_if_missing(
meta::USERS_CF,
_qp,
users_table_query,
_migration_manager);
}).then([this] {
delay_until_system_ready(_delayed, [this] {
return has_existing_users().then([this](bool existing) {
if (!existing) {
//
// Create default superuser.
//
static const sstring query = sprint(
"INSERT INTO %s.%s (%s, %s) VALUES (?, ?) USING TIMESTAMP 0",
meta::AUTH_KS,
meta::USERS_CF,
meta::user_name_col_name,
meta::superuser_col_name);
return _qp.process(
query,
db::consistency_level::ONE,
{ meta::DEFAULT_SUPERUSER_NAME, true }).then([](auto&&) {
log.info("Created default superuser '{}'", meta::DEFAULT_SUPERUSER_NAME);
}).handle_exception([](auto exn) {
try {
std::rethrow_exception(exn);
} catch (const exceptions::request_execution_exception&) {
log.warn("Skipped default superuser setup: some nodes were not ready");
}
}).discard_result();
}
return make_ready_future<>();
});
});
return make_ready_future<>();
});
}
future<> service::start() {
return once_among_shards([this] {
if (should_create_metadata()) {
return create_metadata_if_missing();
}
return make_ready_future<>();
}).then([this] {
return when_all_succeed(_authorizer->start(), _authenticator->start());
}).then([this] {
_permissions_cache = std::make_unique<permissions_cache>(_permissions_cache_config, *this, log);
}).then([this] {
return once_among_shards([this] {
_migration_manager.register_listener(_migration_listener.get());
return make_ready_future<>();
});
});
}
future<> service::stop() {
return once_among_shards([this] {
_delayed.cancel_all();
return make_ready_future<>();
}).then([this] {
return _permissions_cache->stop();
}).then([this] {
return when_all_succeed(_authorizer->stop(), _authenticator->stop());
});
}
future<bool> service::has_existing_users() const {
static const sstring default_user_query = sprint(
"SELECT * FROM %s.%s WHERE %s = ?",
meta::AUTH_KS,
meta::USERS_CF,
meta::user_name_col_name);
static const sstring all_users_query = sprint(
"SELECT * FROM %s.%s LIMIT 1",
meta::AUTH_KS,
meta::USERS_CF);
// This logic is borrowed directly from Apache Cassandra. By first checking for the presence of the default user, we
// can potentially avoid doing a range query with a high consistency level.
return _qp.process(
default_user_query,
db::consistency_level::ONE,
{ meta::DEFAULT_SUPERUSER_NAME },
true).then([this](auto results) {
if (!results->empty()) {
return make_ready_future<bool>(true);
}
return _qp.process(
default_user_query,
db::consistency_level::QUORUM,
{ meta::DEFAULT_SUPERUSER_NAME },
true).then([this](auto results) {
if (!results->empty()) {
return make_ready_future<bool>(true);
}
return _qp.process(
all_users_query,
db::consistency_level::QUORUM).then([](auto results) {
return make_ready_future<bool>(!results->empty());
});
});
});
}
future<bool> service::is_existing_user(const sstring& name) const {
return select_user(_qp, name).then([](auto results) {
return !results->empty();
});
}
future<bool> service::is_super_user(const sstring& name) const {
return select_user(_qp, name).then([](auto results) {
return !results->empty() && results->one().template get_as<bool>(meta::superuser_col_name);
});
}
future<> service::insert_user(const sstring& name, bool is_superuser) {
return _qp.process(
sprint(
"INSERT INTO %s.%s (%s, %s) VALUES (?, ?)",
meta::AUTH_KS,
meta::USERS_CF,
meta::user_name_col_name,
meta::superuser_col_name),
consistency_for_user(name),
{ name, is_superuser }).discard_result();
}
future<> service::delete_user(const sstring& name) {
return _qp.process(
sprint(
"DELETE FROM %s.%s WHERE %s = ?",
meta::AUTH_KS,
meta::USERS_CF,
meta::user_name_col_name),
consistency_for_user(name),
{ name }).discard_result();
}
future<permission_set> service::get_permissions(::shared_ptr<authenticated_user> u, data_resource r) const {
return _permissions_cache->get(std::move(u), std::move(r));
}
//
// Free functions.
//
future<bool> is_super_user(const service& ser, const authenticated_user& u) {
if (u.is_anonymous()) {
return make_ready_future<bool>(false);
}
return ser.is_super_user(u.name());
}
}

View File

@@ -1,133 +0,0 @@
/*
* Copyright (C) 2017 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <memory>
#include <seastar/core/future.hh>
#include <seastar/core/sstring.hh>
#include "auth/authenticator.hh"
#include "auth/authorizer.hh"
#include "auth/authenticated_user.hh"
#include "auth/permission.hh"
#include "auth/permissions_cache.hh"
#include "delayed_tasks.hh"
#include "seastarx.hh"
namespace cql3 {
class query_processor;
}
namespace db {
class config;
}
namespace service {
class migration_manager;
class migration_listener;
}
namespace auth {
class authenticator;
class authorizer;
struct service_config final {
static service_config from_db_config(const db::config&);
sstring authorizer_java_name;
sstring authenticator_java_name;
};
class service final {
permissions_cache_config _permissions_cache_config;
std::unique_ptr<permissions_cache> _permissions_cache;
cql3::query_processor& _qp;
::service::migration_manager& _migration_manager;
std::unique_ptr<authorizer> _authorizer;
std::unique_ptr<authenticator> _authenticator;
// Only one of these should be registered, so we end up with some unused instances. Not the end of the world.
std::unique_ptr<::service::migration_listener> _migration_listener;
delayed_tasks<> _delayed{};
public:
service(
permissions_cache_config,
cql3::query_processor&,
::service::migration_manager&,
std::unique_ptr<authorizer>,
std::unique_ptr<authenticator>);
service(
permissions_cache_config,
cql3::query_processor&,
::service::migration_manager&,
const service_config&);
future<> start();
future<> stop();
future<bool> is_existing_user(const sstring& name) const;
future<bool> is_super_user(const sstring& name) const;
future<> insert_user(const sstring& name, bool is_superuser);
future<> delete_user(const sstring& name);
future<permission_set> get_permissions(::shared_ptr<authenticated_user>, data_resource) const;
authenticator& underlying_authenticator() {
return *_authenticator;
}
const authenticator& underlying_authenticator() const {
return *_authenticator;
}
authorizer& underlying_authorizer() {
return *_authorizer;
}
const authorizer& underlying_authorizer() const {
return *_authorizer;
}
private:
future<bool> has_existing_users() const;
bool should_create_metadata() const;
future<> create_metadata_if_missing();
};
future<bool> is_super_user(const service&, const authenticated_user&);
}

View File

@@ -1,232 +0,0 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright (C) 2017 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "authenticator.hh"
#include "authenticated_user.hh"
#include "authenticator.hh"
#include "authorizer.hh"
#include "password_authenticator.hh"
#include "default_authorizer.hh"
#include "permission.hh"
#include "db/config.hh"
#include "utils/class_registrator.hh"
namespace auth {
class service;
static const sstring PACKAGE_NAME("com.scylladb.auth.");
static const sstring& transitional_authenticator_name() {
static const sstring name = PACKAGE_NAME + "TransitionalAuthenticator";
return name;
}
static const sstring& transitional_authorizer_name() {
static const sstring name = PACKAGE_NAME + "TransitionalAuthorizer";
return name;
}
class transitional_authenticator : public authenticator {
std::unique_ptr<authenticator> _authenticator;
public:
static const sstring PASSWORD_AUTHENTICATOR_NAME;
transitional_authenticator(cql3::query_processor& qp, ::service::migration_manager& mm)
: transitional_authenticator(std::make_unique<password_authenticator>(qp, mm))
{}
transitional_authenticator(std::unique_ptr<authenticator> a)
: _authenticator(std::move(a))
{}
future<> start() override {
return _authenticator->start();
}
future<> stop() override {
return _authenticator->stop();
}
const sstring& qualified_java_name() const override {
return transitional_authenticator_name();
}
bool require_authentication() const override {
return true;
}
option_set supported_options() const override {
return _authenticator->supported_options();
}
option_set alterable_options() const override {
return _authenticator->alterable_options();
}
future<::shared_ptr<authenticated_user>> authenticate(const credentials_map& credentials) const override {
auto i = credentials.find(authenticator::USERNAME_KEY);
if ((i == credentials.end() || i->second.empty()) && (!credentials.count(PASSWORD_KEY) || credentials.at(PASSWORD_KEY).empty())) {
// return anon user
return make_ready_future<::shared_ptr<authenticated_user>>(::make_shared<authenticated_user>());
}
return make_ready_future().then([this, &credentials] {
return _authenticator->authenticate(credentials);
}).handle_exception([](auto ep) {
try {
std::rethrow_exception(ep);
} catch (exceptions::authentication_exception&) {
// return anon user
return make_ready_future<::shared_ptr<authenticated_user>>(::make_shared<authenticated_user>());
}
});
}
future<> create(sstring username, const option_map& options) override {
return _authenticator->create(username, options);
}
future<> alter(sstring username, const option_map& options) override {
return _authenticator->alter(username, options);
}
future<> drop(sstring username) override {
return _authenticator->drop(username);
}
const resource_ids& protected_resources() const override {
return _authenticator->protected_resources();
}
::shared_ptr<sasl_challenge> new_sasl_challenge() const override {
class sasl_wrapper : public sasl_challenge {
public:
sasl_wrapper(::shared_ptr<sasl_challenge> sasl)
: _sasl(std::move(sasl))
{}
bytes evaluate_response(bytes_view client_response) override {
try {
return _sasl->evaluate_response(client_response);
} catch (exceptions::authentication_exception&) {
_complete = true;
return {};
}
}
bool is_complete() const {
return _complete || _sasl->is_complete();
}
future<::shared_ptr<authenticated_user>> get_authenticated_user() const {
return futurize_apply([this] {
return _sasl->get_authenticated_user().handle_exception([](auto ep) {
try {
std::rethrow_exception(ep);
} catch (exceptions::authentication_exception&) {
// return anon user
return make_ready_future<::shared_ptr<authenticated_user>>(::make_shared<authenticated_user>());
}
});
});
}
private:
::shared_ptr<sasl_challenge> _sasl;
bool _complete = false;
};
return ::make_shared<sasl_wrapper>(_authenticator->new_sasl_challenge());
}
};
class transitional_authorizer : public authorizer {
std::unique_ptr<authorizer> _authorizer;
public:
transitional_authorizer(cql3::query_processor& qp, ::service::migration_manager& mm)
: transitional_authorizer(std::make_unique<default_authorizer>(qp, mm))
{}
transitional_authorizer(std::unique_ptr<authorizer> a)
: _authorizer(std::move(a))
{}
~transitional_authorizer()
{}
future<> start() override {
return _authorizer->start();
}
future<> stop() override {
return _authorizer->stop();
}
const sstring& qualified_java_name() const override {
return transitional_authorizer_name();
}
future<permission_set> authorize(service& ser, ::shared_ptr<authenticated_user> user, data_resource resource) const override {
return is_super_user(ser, *user).then([](bool s) {
static const permission_set transitional_permissions =
permission_set::of<permission::CREATE,
permission::ALTER, permission::DROP,
permission::SELECT, permission::MODIFY>();
return make_ready_future<permission_set>(s ? permissions::ALL : transitional_permissions);
});
}
future<> grant(::shared_ptr<authenticated_user> user, permission_set ps, data_resource r, sstring s) override {
return _authorizer->grant(std::move(user), std::move(ps), std::move(r), std::move(s));
}
future<> revoke(::shared_ptr<authenticated_user> user, permission_set ps, data_resource r, sstring s) override {
return _authorizer->revoke(std::move(user), std::move(ps), std::move(r), std::move(s));
}
future<std::vector<permission_details>> list(service& ser, ::shared_ptr<authenticated_user> user, permission_set ps, optional<data_resource> r, optional<sstring> s) const override {
return _authorizer->list(ser, std::move(user), std::move(ps), std::move(r), std::move(s));
}
future<> revoke_all(sstring s) override {
return _authorizer->revoke_all(std::move(s));
}
future<> revoke_all(data_resource r) override {
return _authorizer->revoke_all(std::move(r));
}
const resource_ids& protected_resources() override {
return _authorizer->protected_resources();
}
future<> validate_configuration() const override {
return _authorizer->validate_configuration();
}
};
}
//
// To ensure correct initialization order, we unfortunately need to use string literals.
//
static const class_registrator<
auth::authenticator,
auth::transitional_authenticator,
cql3::query_processor&,
::service::migration_manager&> transitional_authenticator_reg("com.scylladb.auth.TransitionalAuthenticator");
static const class_registrator<
auth::authorizer,
auth::transitional_authorizer,
cql3::query_processor&,
::service::migration_manager&> transitional_authorizer_reg("com.scylladb.auth.TransitionalAuthorizer");

View File

@@ -21,17 +21,14 @@
#pragma once
#include "seastarx.hh"
#include "core/sstring.hh"
#include "hashing.hh"
#include <experimental/optional>
#include <iosfwd>
#include <functional>
#include "utils/mutable_view.hh"
using bytes = basic_sstring<int8_t, uint32_t, 31>;
using bytes_view = std::experimental::basic_string_view<int8_t>;
using bytes_mutable_view = basic_mutable_view<bytes_view::value_type>;
using bytes_opt = std::experimental::optional<bytes>;
using sstring_view = std::experimental::string_view;

View File

@@ -38,7 +38,6 @@ class bytes_ostream {
public:
using size_type = bytes::size_type;
using value_type = bytes::value_type;
static constexpr size_type max_chunk_size() { return 16 * 1024; }
private:
static_assert(sizeof(value_type) == 1, "value_type is assumed to be one byte long");
struct chunk {
@@ -59,6 +58,7 @@ private:
};
// FIXME: consider increasing chunk size as the buffer grows
static constexpr size_type chunk_size{512};
static constexpr size_type usable_chunk_size{chunk_size - sizeof(chunk)};
private:
std::unique_ptr<chunk> _begin;
chunk* _current;
@@ -99,19 +99,6 @@ private:
}
return _current->size - _current->offset;
}
// Figure out next chunk size.
// - must be enough for data_size
// - must be at least chunk_size
// - try to double each time to prevent too many allocations
// - do not exceed max_chunk_size
size_type next_alloc_size(size_t data_size) const {
auto next_size = _current
? _current->size * 2
: chunk_size;
next_size = std::min(next_size, max_chunk_size());
// FIXME: check for overflow?
return std::max<size_type>(next_size, data_size + sizeof(chunk));
}
// Makes room for a contiguous region of given size.
// The region is accounted for as already written.
// size must not be zero.
@@ -122,7 +109,7 @@ private:
_size += size;
return ret;
} else {
auto alloc_size = next_alloc_size(size);
auto alloc_size = size <= usable_chunk_size ? chunk_size : (size + sizeof(chunk));
auto space = malloc(alloc_size);
if (!space) {
throw std::bad_alloc();
@@ -166,18 +153,19 @@ public:
}
bytes_ostream& operator=(const bytes_ostream& o) {
if (this != &o) {
auto x = bytes_ostream(o);
*this = std::move(x);
}
_size = 0;
_current = nullptr;
_begin = {};
append(o);
return *this;
}
bytes_ostream& operator=(bytes_ostream&& o) noexcept {
if (this != &o) {
this->~bytes_ostream();
new (this) bytes_ostream(std::move(o));
}
_size = o._size;
_begin = std::move(o._begin);
_current = o._current;
o._current = nullptr;
o._size = 0;
return *this;
}
@@ -186,7 +174,7 @@ public:
value_type* ptr;
// makes the place_holder looks like a stream
seastar::simple_output_stream get_stream() {
return seastar::simple_output_stream(reinterpret_cast<char*>(ptr), sizeof(T));
return seastar::simple_output_stream{reinterpret_cast<char*>(ptr)};
}
};
@@ -207,19 +195,19 @@ public:
if (v.empty()) {
return;
}
auto this_size = std::min(v.size(), size_t(current_space_left()));
if (this_size) {
memcpy(_current->data + _current->offset, v.begin(), this_size);
_current->offset += this_size;
_size += this_size;
v.remove_prefix(this_size);
}
while (!v.empty()) {
auto this_size = std::min(v.size(), size_t(max_chunk_size()));
std::copy_n(v.begin(), this_size, alloc(this_size));
v.remove_prefix(this_size);
auto space_left = current_space_left();
if (v.size() <= space_left) {
memcpy(_current->data + _current->offset, v.begin(), v.size());
_current->offset += v.size();
_size += v.size();
} else {
if (space_left) {
memcpy(_current->data + _current->offset, v.begin(), space_left);
_current->offset += space_left;
_size += space_left;
v.remove_prefix(space_left);
}
memcpy(alloc(v.size()), v.begin(), v.size());
}
}
@@ -284,8 +272,13 @@ public:
}
void append(const bytes_ostream& o) {
for (auto&& bv : o.fragments()) {
write(bv);
if (o.size() > 0) {
auto dst = alloc(o.size());
auto r = o._begin.get();
while (r) {
dst = std::copy_n(r->data, r->offset, dst);
r = r->next.get();
}
}
}
@@ -335,45 +328,6 @@ public:
_current->next = nullptr;
_current->offset = pos._offset;
}
void reduce_chunk_count() {
// FIXME: This is a simplified version. It linearizes the whole buffer
// if its size is below max_chunk_size. We probably could also gain
// some read performance by doing "real" reduction, i.e. merging
// all chunks until all but the last one is max_chunk_size.
if (size() < max_chunk_size()) {
linearize();
}
}
bool operator==(const bytes_ostream& other) const {
auto as = fragments().begin();
auto as_end = fragments().end();
auto bs = other.fragments().begin();
auto bs_end = other.fragments().end();
auto a = *as++;
auto b = *bs++;
while (!a.empty() || !b.empty()) {
auto now = std::min(a.size(), b.size());
if (!std::equal(a.begin(), a.begin() + now, b.begin(), b.begin() + now)) {
return false;
}
a.remove_prefix(now);
if (a.empty() && as != as_end) {
a = *as++;
}
b.remove_prefix(now);
if (b.empty() && bs != bs_end) {
b = *bs++;
}
}
return true;
}
bool operator!=(const bytes_ostream& other) const {
return !(*this == other);
}
};
template<>

View File

@@ -1,661 +0,0 @@
/*
* Copyright (C) 2017 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <vector>
#include "row_cache.hh"
#include "mutation_reader.hh"
#include "streamed_mutation.hh"
#include "partition_version.hh"
#include "utils/logalloc.hh"
#include "query-request.hh"
#include "partition_snapshot_reader.hh"
#include "partition_snapshot_row_cursor.hh"
#include "read_context.hh"
#include "flat_mutation_reader.hh"
namespace cache {
extern logging::logger clogger;
class cache_flat_mutation_reader final : public flat_mutation_reader::impl {
enum class state {
before_static_row,
// Invariants:
// - position_range(_lower_bound, _upper_bound) covers all not yet emitted positions from current range
// - if _next_row has valid iterators:
// - _next_row points to the nearest row in cache >= _lower_bound
// - _next_row_in_range = _next.position() < _upper_bound
// - if _next_row doesn't have valid iterators, it has no meaning.
reading_from_cache,
// Starts reading from underlying reader.
// The range to read is position_range(_lower_bound, min(_next_row.position(), _upper_bound)).
// Invariants:
// - _next_row_in_range = _next.position() < _upper_bound
move_to_underlying,
// Invariants:
// - Upper bound of the read is min(_next_row.position(), _upper_bound)
// - _next_row_in_range = _next.position() < _upper_bound
// - _last_row points at a direct predecessor of the next row which is going to be read.
// Used for populating continuity.
// - _population_range_starts_before_all_rows is set accordingly
reading_from_underlying,
end_of_stream
};
lw_shared_ptr<partition_snapshot> _snp;
position_in_partition::tri_compare _position_cmp;
query::clustering_key_filter_ranges _ck_ranges;
query::clustering_row_ranges::const_iterator _ck_ranges_curr;
query::clustering_row_ranges::const_iterator _ck_ranges_end;
lsa_manager _lsa_manager;
partition_snapshot_row_weakref _last_row;
// We need to be prepared that we may get overlapping and out of order
// range tombstones. We must emit fragments with strictly monotonic positions,
// so we can't just trim such tombstones to the position of the last fragment.
// To solve that, range tombstones are accumulated first in a range_tombstone_stream
// and emitted once we have a fragment with a larger position.
range_tombstone_stream _tombstones;
// Holds the lower bound of a position range which hasn't been processed yet.
// Only fragments with positions < _lower_bound have been emitted.
//
// It is assumed that !_lower_bound.is_clustering_row(). We depend on this when
// calling range_tombstone::trim_front() and when inserting dummy entries. Dummy
// entries are assumed to be only at !is_clustering_row() positions.
position_in_partition _lower_bound;
position_in_partition_view _upper_bound;
state _state = state::before_static_row;
lw_shared_ptr<read_context> _read_context;
partition_snapshot_row_cursor _next_row;
bool _next_row_in_range = false;
// True iff current population interval, since the previous clustering row, starts before all clustered rows.
// We cannot just look at _lower_bound, because emission of range tombstones changes _lower_bound and
// because we mark clustering intervals as continuous when consuming a clustering_row, it would prevent
// us from marking the interval as continuous.
// Valid when _state == reading_from_underlying.
bool _population_range_starts_before_all_rows;
future<> do_fill_buffer();
void copy_from_cache_to_buffer();
future<> process_static_row();
void move_to_end();
void move_to_next_range();
void move_to_range(query::clustering_row_ranges::const_iterator);
void move_to_next_entry();
// Emits all delayed range tombstones with positions smaller than upper_bound.
void drain_tombstones(position_in_partition_view upper_bound);
// Emits all delayed range tombstones.
void drain_tombstones();
void add_to_buffer(const partition_snapshot_row_cursor&);
void add_clustering_row_to_buffer(mutation_fragment&&);
void add_to_buffer(range_tombstone&&);
void add_to_buffer(mutation_fragment&&);
future<> read_from_underlying();
void start_reading_from_underlying();
bool after_current_range(position_in_partition_view position);
bool can_populate() const;
void maybe_update_continuity();
void maybe_add_to_cache(const mutation_fragment& mf);
void maybe_add_to_cache(const clustering_row& cr);
void maybe_add_to_cache(const range_tombstone& rt);
void maybe_add_to_cache(const static_row& sr);
void maybe_set_static_row_continuous();
void finish_reader() {
push_mutation_fragment(partition_end());
_end_of_stream = true;
_state = state::end_of_stream;
}
public:
cache_flat_mutation_reader(schema_ptr s,
dht::decorated_key dk,
query::clustering_key_filter_ranges&& crr,
lw_shared_ptr<read_context> ctx,
lw_shared_ptr<partition_snapshot> snp,
row_cache& cache)
: flat_mutation_reader::impl(std::move(s))
, _snp(std::move(snp))
, _position_cmp(*_schema)
, _ck_ranges(std::move(crr))
, _ck_ranges_curr(_ck_ranges.begin())
, _ck_ranges_end(_ck_ranges.end())
, _lsa_manager(cache)
, _tombstones(*_schema)
, _lower_bound(position_in_partition::before_all_clustered_rows())
, _upper_bound(position_in_partition_view::before_all_clustered_rows())
, _read_context(std::move(ctx))
, _next_row(*_schema, *_snp)
{
clogger.trace("csm {}: table={}.{}", this, _schema->ks_name(), _schema->cf_name());
push_mutation_fragment(partition_start(std::move(dk), _snp->partition_tombstone()));
}
cache_flat_mutation_reader(const cache_flat_mutation_reader&) = delete;
cache_flat_mutation_reader(cache_flat_mutation_reader&&) = delete;
virtual future<> fill_buffer() override;
virtual ~cache_flat_mutation_reader() {
maybe_merge_versions(_snp, _lsa_manager.region(), _lsa_manager.read_section());
}
virtual void next_partition() override {
clear_buffer_to_next_partition();
if (is_buffer_empty()) {
_end_of_stream = true;
}
}
virtual future<> fast_forward_to(const dht::partition_range&) override {
clear_buffer();
_end_of_stream = true;
return make_ready_future<>();
}
virtual future<> fast_forward_to(position_range pr) override {
throw std::bad_function_call();
}
};
inline
future<> cache_flat_mutation_reader::process_static_row() {
if (_snp->version()->partition().static_row_continuous()) {
_read_context->cache().on_row_hit();
row sr = _lsa_manager.run_in_read_section([this] {
return _snp->static_row();
});
if (!sr.empty()) {
push_mutation_fragment(mutation_fragment(static_row(std::move(sr))));
}
return make_ready_future<>();
} else {
_read_context->cache().on_row_miss();
return _read_context->get_next_fragment().then([this] (mutation_fragment_opt&& sr) {
if (sr) {
assert(sr->is_static_row());
maybe_add_to_cache(sr->as_static_row());
push_mutation_fragment(std::move(*sr));
}
maybe_set_static_row_continuous();
});
}
}
inline
future<> cache_flat_mutation_reader::fill_buffer() {
if (_state == state::before_static_row) {
auto after_static_row = [this] {
if (_ck_ranges_curr == _ck_ranges_end) {
finish_reader();
return make_ready_future<>();
}
_state = state::reading_from_cache;
_lsa_manager.run_in_read_section([this] {
move_to_range(_ck_ranges_curr);
});
return fill_buffer();
};
if (_schema->has_static_columns()) {
return process_static_row().then(std::move(after_static_row));
} else {
return after_static_row();
}
}
clogger.trace("csm {}: fill_buffer(), range={}, lb={}", this, *_ck_ranges_curr, _lower_bound);
return do_until([this] { return _end_of_stream || is_buffer_full(); }, [this] {
return do_fill_buffer();
});
}
inline
future<> cache_flat_mutation_reader::do_fill_buffer() {
if (_state == state::move_to_underlying) {
_state = state::reading_from_underlying;
_population_range_starts_before_all_rows = _lower_bound.is_before_all_clustered_rows(*_schema);
auto end = _next_row_in_range ? position_in_partition(_next_row.position())
: position_in_partition(_upper_bound);
return _read_context->fast_forward_to(position_range{_lower_bound, std::move(end)}).then([this] {
return read_from_underlying();
});
}
if (_state == state::reading_from_underlying) {
return read_from_underlying();
}
// assert(_state == state::reading_from_cache)
return _lsa_manager.run_in_read_section([this] {
auto next_valid = _next_row.iterators_valid();
clogger.trace("csm {}: reading_from_cache, range=[{}, {}), next={}, valid={}", this, _lower_bound,
_upper_bound, _next_row.position(), next_valid);
// We assume that if there was eviction, and thus the range may
// no longer be continuous, the cursor was invalidated.
if (!next_valid) {
auto adjacent = _next_row.advance_to(_lower_bound);
_next_row_in_range = !after_current_range(_next_row.position());
if (!adjacent && !_next_row.continuous()) {
_last_row = nullptr; // We could insert a dummy here, but this path is unlikely.
start_reading_from_underlying();
return make_ready_future<>();
}
}
_next_row.maybe_refresh();
clogger.trace("csm {}: next={}, cont={}", this, _next_row.position(), _next_row.continuous());
while (!is_buffer_full() && _state == state::reading_from_cache) {
copy_from_cache_to_buffer();
if (need_preempt()) {
break;
}
}
return make_ready_future<>();
});
}
inline
future<> cache_flat_mutation_reader::read_from_underlying() {
return consume_mutation_fragments_until(_read_context->underlying().underlying(),
[this] { return _state != state::reading_from_underlying || is_buffer_full(); },
[this] (mutation_fragment mf) {
_read_context->cache().on_row_miss();
maybe_add_to_cache(mf);
add_to_buffer(std::move(mf));
},
[this] {
_state = state::reading_from_cache;
_lsa_manager.run_in_update_section([this] {
auto same_pos = _next_row.maybe_refresh();
if (!same_pos) {
_read_context->cache().on_mispopulate(); // FIXME: Insert dummy entry at _upper_bound.
_next_row_in_range = !after_current_range(_next_row.position());
if (!_next_row.continuous()) {
start_reading_from_underlying();
}
return;
}
if (_next_row_in_range) {
maybe_update_continuity();
_last_row = _next_row;
add_to_buffer(_next_row);
try {
move_to_next_entry();
} catch (const std::bad_alloc&) {
// We cannot reenter the section, since we may have moved to the new range, and
// because add_to_buffer() should not be repeated.
_snp->region().allocator().invalidate_references(); // Invalidates _next_row
}
} else {
if (no_clustering_row_between(*_schema, _upper_bound, _next_row.position())) {
this->maybe_update_continuity();
} else if (can_populate()) {
rows_entry::compare less(*_schema);
auto& rows = _snp->version()->partition().clustered_rows();
if (query::is_single_row(*_schema, *_ck_ranges_curr)) {
with_allocator(_snp->region().allocator(), [&] {
auto e = alloc_strategy_unique_ptr<rows_entry>(
current_allocator().construct<rows_entry>(_ck_ranges_curr->start()->value()));
// Use _next_row iterator only as a hint, because there could be insertions after _upper_bound.
auto insert_result = rows.insert_check(_next_row.get_iterator_in_latest_version(), *e, less);
auto inserted = insert_result.second;
auto it = insert_result.first;
if (inserted) {
e.release();
auto next = std::next(it);
it->set_continuous(next->continuous());
clogger.trace("csm {}: inserted dummy at {}, cont={}", this, it->position(), it->continuous());
}
});
} else if (!_ck_ranges_curr->start() || _last_row.refresh(*_snp)) {
with_allocator(_snp->region().allocator(), [&] {
auto e = alloc_strategy_unique_ptr<rows_entry>(
current_allocator().construct<rows_entry>(*_schema, _upper_bound, is_dummy::yes, is_continuous::yes));
// Use _next_row iterator only as a hint, because there could be insertions after _upper_bound.
auto insert_result = rows.insert_check(_next_row.get_iterator_in_latest_version(), *e, less);
auto inserted = insert_result.second;
if (inserted) {
clogger.trace("csm {}: inserted dummy at {}", this, _upper_bound);
e.release();
} else {
clogger.trace("csm {}: mark {} as continuous", this, insert_result.first->position());
insert_result.first->set_continuous(true);
}
});
}
} else {
_read_context->cache().on_mispopulate();
}
try {
move_to_next_range();
} catch (const std::bad_alloc&) {
// We cannot reenter the section, since we may have moved to the new range
_snp->region().allocator().invalidate_references(); // Invalidates _next_row
}
}
});
return make_ready_future<>();
});
}
inline
void cache_flat_mutation_reader::maybe_update_continuity() {
if (can_populate() && (_population_range_starts_before_all_rows || _last_row.refresh(*_snp))) {
if (_next_row.is_in_latest_version()) {
clogger.trace("csm {}: mark {} continuous", this, _next_row.get_iterator_in_latest_version()->position());
_next_row.get_iterator_in_latest_version()->set_continuous(true);
} else {
// Cover entry from older version
with_allocator(_snp->region().allocator(), [&] {
auto& rows = _snp->version()->partition().clustered_rows();
rows_entry::compare less(*_schema);
auto e = alloc_strategy_unique_ptr<rows_entry>(
current_allocator().construct<rows_entry>(*_schema, _next_row.position(), is_dummy(_next_row.dummy()), is_continuous::yes));
auto insert_result = rows.insert_check(_next_row.get_iterator_in_latest_version(), *e, less);
auto inserted = insert_result.second;
if (inserted) {
clogger.trace("csm {}: inserted dummy at {}", this, e->position());
e.release();
}
});
}
} else {
_read_context->cache().on_mispopulate();
}
}
inline
void cache_flat_mutation_reader::maybe_add_to_cache(const mutation_fragment& mf) {
if (mf.is_range_tombstone()) {
maybe_add_to_cache(mf.as_range_tombstone());
} else {
assert(mf.is_clustering_row());
const clustering_row& cr = mf.as_clustering_row();
maybe_add_to_cache(cr);
}
}
inline
void cache_flat_mutation_reader::maybe_add_to_cache(const clustering_row& cr) {
if (!can_populate()) {
_last_row = nullptr;
_population_range_starts_before_all_rows = false;
_read_context->cache().on_mispopulate();
return;
}
clogger.trace("csm {}: populate({})", this, cr);
_lsa_manager.run_in_update_section_with_allocator([this, &cr] {
mutation_partition& mp = _snp->version()->partition();
rows_entry::compare less(*_schema);
auto new_entry = alloc_strategy_unique_ptr<rows_entry>(
current_allocator().construct<rows_entry>(cr.key(), cr.tomb(), cr.marker(), cr.cells()));
new_entry->set_continuous(false);
auto it = _next_row.iterators_valid() ? _next_row.get_iterator_in_latest_version()
: mp.clustered_rows().lower_bound(cr.key(), less);
auto insert_result = mp.clustered_rows().insert_check(it, *new_entry, less);
if (insert_result.second) {
_read_context->cache().on_row_insert();
new_entry.release();
}
it = insert_result.first;
rows_entry& e = *it;
if (!_ck_ranges_curr->start() || _last_row.refresh(*_snp)) {
clogger.trace("csm {}: set_continuous({})", this, e.position());
e.set_continuous(true);
} else {
_read_context->cache().on_mispopulate();
}
with_allocator(standard_allocator(), [&] {
_last_row = partition_snapshot_row_weakref(*_snp, it);
});
_population_range_starts_before_all_rows = false;
});
}
inline
bool cache_flat_mutation_reader::after_current_range(position_in_partition_view p) {
return _position_cmp(p, _upper_bound) >= 0;
}
inline
void cache_flat_mutation_reader::start_reading_from_underlying() {
clogger.trace("csm {}: start_reading_from_underlying(), range=[{}, {})", this, _lower_bound, _next_row_in_range ? _next_row.position() : _upper_bound);
_state = state::move_to_underlying;
}
inline
void cache_flat_mutation_reader::copy_from_cache_to_buffer() {
clogger.trace("csm {}: copy_from_cache, next={}, next_row_in_range={}", this, _next_row.position(), _next_row_in_range);
position_in_partition_view next_lower_bound = _next_row.dummy() ? _next_row.position() : position_in_partition_view::after_key(_next_row.key());
for (auto&& rts : _snp->range_tombstones(*_schema, _lower_bound, _next_row_in_range ? next_lower_bound : _upper_bound)) {
add_to_buffer(std::move(rts));
if (is_buffer_full()) {
return;
}
}
if (_next_row_in_range) {
_last_row = _next_row;
add_to_buffer(_next_row);
move_to_next_entry();
} else {
move_to_next_range();
}
}
inline
void cache_flat_mutation_reader::move_to_end() {
drain_tombstones();
finish_reader();
clogger.trace("csm {}: eos", this);
}
inline
void cache_flat_mutation_reader::move_to_next_range() {
auto next_it = std::next(_ck_ranges_curr);
if (next_it == _ck_ranges_end) {
move_to_end();
_ck_ranges_curr = next_it;
} else {
move_to_range(next_it);
}
}
inline
void cache_flat_mutation_reader::move_to_range(query::clustering_row_ranges::const_iterator next_it) {
auto lb = position_in_partition::for_range_start(*next_it);
auto ub = position_in_partition_view::for_range_end(*next_it);
_last_row = nullptr;
_lower_bound = std::move(lb);
_upper_bound = std::move(ub);
_ck_ranges_curr = next_it;
auto adjacent = _next_row.advance_to(_lower_bound);
_next_row_in_range = !after_current_range(_next_row.position());
clogger.trace("csm {}: move_to_range(), range={}, lb={}, ub={}, next={}", this, *_ck_ranges_curr, _lower_bound, _upper_bound, _next_row.position());
if (!adjacent && !_next_row.continuous()) {
// FIXME: We don't insert a dummy for singular range to avoid allocating 3 entries
// for a hit (before, at and after). If we supported the concept of an incomplete row,
// we could insert such a row for the lower bound if it's full instead, for both singular and
// non-singular ranges.
if (_ck_ranges_curr->start() && !query::is_single_row(*_schema, *_ck_ranges_curr)) {
// Insert dummy for lower bound
if (can_populate()) {
// FIXME: _lower_bound could be adjacent to the previous row, in which case we could skip this
clogger.trace("csm {}: insert dummy at {}", this, _lower_bound);
auto it = with_allocator(_lsa_manager.region().allocator(), [&] {
auto& rows = _snp->version()->partition().clustered_rows();
auto new_entry = current_allocator().construct<rows_entry>(*_schema, _lower_bound, is_dummy::yes, is_continuous::no);
return rows.insert_before(_next_row.get_iterator_in_latest_version(), *new_entry);
});
_last_row = partition_snapshot_row_weakref(*_snp, it);
} else {
_read_context->cache().on_mispopulate();
}
}
start_reading_from_underlying();
}
}
// _next_row must be inside the range.
inline
void cache_flat_mutation_reader::move_to_next_entry() {
clogger.trace("csm {}: move_to_next_entry(), curr={}", this, _next_row.position());
if (no_clustering_row_between(*_schema, _next_row.position(), _upper_bound)) {
move_to_next_range();
} else {
if (!_next_row.next()) {
move_to_end();
return;
}
_next_row_in_range = !after_current_range(_next_row.position());
clogger.trace("csm {}: next={}, cont={}, in_range={}", this, _next_row.position(), _next_row.continuous(), _next_row_in_range);
if (!_next_row.continuous()) {
start_reading_from_underlying();
}
}
}
inline
void cache_flat_mutation_reader::drain_tombstones(position_in_partition_view pos) {
while (true) {
reserve_one();
auto mfo = _tombstones.get_next(pos);
if (!mfo) {
break;
}
push_mutation_fragment(std::move(*mfo));
}
}
inline
void cache_flat_mutation_reader::drain_tombstones() {
while (true) {
reserve_one();
auto mfo = _tombstones.get_next();
if (!mfo) {
break;
}
push_mutation_fragment(std::move(*mfo));
}
}
inline
void cache_flat_mutation_reader::add_to_buffer(mutation_fragment&& mf) {
clogger.trace("csm {}: add_to_buffer({})", this, mf);
if (mf.is_clustering_row()) {
add_clustering_row_to_buffer(std::move(mf));
} else {
assert(mf.is_range_tombstone());
add_to_buffer(std::move(mf).as_range_tombstone());
}
}
inline
void cache_flat_mutation_reader::add_to_buffer(const partition_snapshot_row_cursor& row) {
if (!row.dummy()) {
_read_context->cache().on_row_hit();
add_clustering_row_to_buffer(row.row());
}
}
// Maintains the following invariants, also in case of exception:
// (1) no fragment with position >= _lower_bound was pushed yet
// (2) If _lower_bound > mf.position(), mf was emitted
inline
void cache_flat_mutation_reader::add_clustering_row_to_buffer(mutation_fragment&& mf) {
clogger.trace("csm {}: add_clustering_row_to_buffer({})", this, mf);
auto& row = mf.as_clustering_row();
auto key = row.key();
try {
drain_tombstones(row.position());
push_mutation_fragment(std::move(mf));
_lower_bound = position_in_partition::after_key(std::move(key));
} catch (...) {
// We may have emitted some of the range tombstones which start after the old _lower_bound
_lower_bound = position_in_partition::for_key(std::move(key));
throw;
}
}
inline
void cache_flat_mutation_reader::add_to_buffer(range_tombstone&& rt) {
clogger.trace("csm {}: add_to_buffer({})", this, rt);
// This guarantees that rt starts after any emitted clustering_row
if (!rt.trim_front(*_schema, _lower_bound)) {
return;
}
_lower_bound = position_in_partition(rt.position());
_tombstones.apply(std::move(rt));
drain_tombstones(_lower_bound);
}
inline
void cache_flat_mutation_reader::maybe_add_to_cache(const range_tombstone& rt) {
if (can_populate()) {
clogger.trace("csm {}: maybe_add_to_cache({})", this, rt);
_lsa_manager.run_in_update_section_with_allocator([&] {
_snp->version()->partition().row_tombstones().apply_monotonically(*_schema, rt);
});
} else {
_read_context->cache().on_mispopulate();
}
}
inline
void cache_flat_mutation_reader::maybe_add_to_cache(const static_row& sr) {
if (can_populate()) {
clogger.trace("csm {}: populate({})", this, sr);
_read_context->cache().on_row_insert();
_lsa_manager.run_in_update_section_with_allocator([&] {
_snp->version()->partition().static_row().apply(*_schema, column_kind::static_column, sr.cells());
});
} else {
_read_context->cache().on_mispopulate();
}
}
inline
void cache_flat_mutation_reader::maybe_set_static_row_continuous() {
if (can_populate()) {
clogger.trace("csm {}: set static row continuous", this);
_snp->version()->partition().set_static_row_continuous(true);
} else {
_read_context->cache().on_mispopulate();
}
}
inline
bool cache_flat_mutation_reader::can_populate() const {
return _snp->at_latest_version() && _read_context->cache().phase_of(_read_context->key()) == _read_context->phase();
}
} // namespace cache
inline flat_mutation_reader make_cache_flat_mutation_reader(schema_ptr s,
dht::decorated_key dk,
query::clustering_key_filter_ranges crr,
row_cache& cache,
lw_shared_ptr<cache::read_context> ctx,
lw_shared_ptr<partition_snapshot> snp)
{
return make_flat_mutation_reader<cache::cache_flat_mutation_reader>(
std::move(s), std::move(dk), std::move(crr), std::move(ctx), std::move(snp), cache);
}

View File

@@ -24,7 +24,6 @@
#include <boost/lexical_cast.hpp>
#include "exceptions/exceptions.hh"
#include "json.hh"
#include "seastarx.hh"
class schema;
@@ -59,34 +58,30 @@ class caching_options {
caching_options() : _key_cache(default_key), _row_cache(default_row) {}
public:
std::map<sstring, sstring> to_map() const {
return {{ "keys", _key_cache }, { "rows_per_partition", _row_cache }};
}
sstring to_sstring() const {
return json::to_json(to_map());
return json::to_json(std::map<sstring, sstring>({{ "keys", _key_cache }, { "rows_per_partition", _row_cache }}));
}
template<typename Map>
static caching_options from_map(const Map & map) {
sstring k = default_key;
sstring r = default_row;
static caching_options from_sstring(const sstring& str) {
auto map = json::to_map(str);
if (map.size() > 2) {
throw exceptions::configuration_exception("Invalid map: " + str);
}
sstring k;
sstring r;
if (map.count("keys")) {
k = map.at("keys");
} else {
k = default_key;
}
for (auto& p : map) {
if (p.first == "keys") {
k = p.second;
} else if (p.first == "rows_per_partition") {
r = p.second;
} else {
throw exceptions::configuration_exception("Invalid caching option: " + p.first);
}
if (map.count("rows_per_partition")) {
r = map.at("rows_per_partition");
} else {
r = default_row;
}
return caching_options(k, r);
}
static caching_options from_sstring(const sstring& str) {
return from_map(json::to_map(str));
}
bool operator==(const caching_options& other) const {
return _key_cache == other._key_cache && _row_cache == other._row_cache;
}

View File

@@ -22,7 +22,6 @@
#include "canonical_mutation.hh"
#include "mutation.hh"
#include "mutation_partition_serializer.hh"
#include "counters.hh"
#include "converting_mutation_partition_applier.hh"
#include "hashing_partition_visitor.hh"
#include "utils/UUID.hh"
@@ -45,7 +44,7 @@ canonical_mutation::canonical_mutation(const mutation& m)
mutation_partition_serializer part_ser(*m.schema(), m.partition());
bytes_ostream out;
ser::writer_of_canonical_mutation<bytes_ostream> wr(out);
ser::writer_of_canonical_mutation wr(out);
std::move(wr).write_table_id(m.schema()->id())
.write_schema_version(m.schema()->version())
.write_key(m.key())

View File

@@ -1,566 +0,0 @@
/*
* Copyright (C) 2017 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <boost/intrusive/unordered_set.hpp>
#if __has_include(<boost/container/small_vector.hpp>)
#include <boost/container/small_vector.hpp>
template <typename T, size_t N>
using small_vector = boost::container::small_vector<T, N>;
#else
#include <vector>
template <typename T, size_t N>
using small_vector = std::vector<T>;
#endif
#include "fnv1a_hasher.hh"
#include "streamed_mutation.hh"
#include "mutation_partition.hh"
class cells_range {
using ids_vector_type = small_vector<column_id, 5>;
position_in_partition_view _position;
ids_vector_type _ids;
public:
using iterator = ids_vector_type::iterator;
using const_iterator = ids_vector_type::const_iterator;
cells_range()
: _position(position_in_partition_view(position_in_partition_view::static_row_tag_t())) { }
explicit cells_range(position_in_partition_view pos, const row& cells)
: _position(pos)
{
_ids.reserve(cells.size());
cells.for_each_cell([this] (auto id, auto&&) {
_ids.emplace_back(id);
});
}
position_in_partition_view position() const { return _position; }
bool empty() const { return _ids.empty(); }
auto begin() const { return _ids.begin(); }
auto end() const { return _ids.end(); }
};
class partition_cells_range {
const mutation_partition& _mp;
public:
class iterator {
const mutation_partition& _mp;
stdx::optional<mutation_partition::rows_type::const_iterator> _position;
cells_range _current;
public:
explicit iterator(const mutation_partition& mp)
: _mp(mp)
, _current(position_in_partition_view(position_in_partition_view::static_row_tag_t()), mp.static_row())
{ }
iterator(const mutation_partition& mp, mutation_partition::rows_type::const_iterator it)
: _mp(mp)
, _position(it)
{ }
iterator& operator++() {
if (!_position) {
_position = _mp.clustered_rows().begin();
} else {
++(*_position);
}
if (_position != _mp.clustered_rows().end()) {
auto it = *_position;
_current = cells_range(position_in_partition_view(position_in_partition_view::clustering_row_tag_t(), it->key()),
it->row().cells());
}
return *this;
}
iterator operator++(int) {
iterator it(*this);
operator++();
return it;
}
cells_range& operator*() {
return _current;
}
cells_range* operator->() {
return &_current;
}
bool operator==(const iterator& other) const {
return _position == other._position;
}
bool operator!=(const iterator& other) const {
return !(*this == other);
}
};
public:
explicit partition_cells_range(const mutation_partition& mp) : _mp(mp) { }
iterator begin() const {
return iterator(_mp);
}
iterator end() const {
return iterator(_mp, _mp.clustered_rows().end());
}
};
class locked_cell;
struct cell_locker_stats {
uint64_t lock_acquisitions = 0;
uint64_t operations_waiting_for_lock = 0;
};
class cell_locker {
public:
using timeout_clock = lowres_clock;
private:
using semaphore_type = basic_semaphore<default_timeout_exception_factory, timeout_clock>;
class partition_entry;
struct cell_address {
position_in_partition position;
column_id id;
};
class cell_entry : public bi::unordered_set_base_hook<bi::link_mode<bi::auto_unlink>>,
public enable_lw_shared_from_this<cell_entry> {
partition_entry& _parent;
cell_address _address;
semaphore_type _semaphore { 0 };
friend class cell_locker;
public:
cell_entry(partition_entry& parent, position_in_partition position, column_id id)
: _parent(parent)
, _address { std::move(position), id }
{ }
// Upgrades cell_entry to another schema.
// Changes the value of cell_address, so cell_entry has to be
// temporarily removed from its parent partition_entry.
// Returns true if the cell_entry still exist in the new schema and
// should be reinserted.
bool upgrade(const schema& from, const schema& to, column_kind kind) noexcept {
auto& old_column_mapping = from.get_column_mapping();
auto& column = old_column_mapping.column_at(kind, _address.id);
auto cdef = to.get_column_definition(column.name());
if (!cdef) {
return false;
}
_address.id = cdef->id;
return true;
}
const position_in_partition& position() const {
return _address.position;
}
future<> lock(timeout_clock::time_point _timeout) {
return _semaphore.wait(_timeout);
}
void unlock() {
_semaphore.signal();
}
~cell_entry() {
if (!is_linked()) {
return;
}
unlink();
if (!--_parent._cell_count) {
delete &_parent;
}
}
class hasher {
const schema* _schema; // pointer instead of reference for default assignment
public:
explicit hasher(const schema& s) : _schema(&s) { }
size_t operator()(const cell_address& ca) const {
fnv1a_hasher hasher;
ca.position.feed_hash(hasher, *_schema);
::feed_hash(hasher, ca.id);
return hasher.finalize();
}
size_t operator()(const cell_entry& ce) const {
return operator()(ce._address);
}
};
class equal_compare {
position_in_partition::equal_compare _cmp;
private:
bool do_compare(const cell_address& a, const cell_address& b) const {
return a.id == b.id && _cmp(a.position, b.position);
}
public:
explicit equal_compare(const schema& s) : _cmp(s) { }
bool operator()(const cell_address& ca, const cell_entry& ce) const {
return do_compare(ca, ce._address);
}
bool operator()(const cell_entry& ce, const cell_address& ca) const {
return do_compare(ca, ce._address);
}
bool operator()(const cell_entry& a, const cell_entry& b) const {
return do_compare(a._address, b._address);
}
};
};
class partition_entry : public bi::unordered_set_base_hook<bi::link_mode<bi::auto_unlink>> {
using cells_type = bi::unordered_set<cell_entry,
bi::equal<cell_entry::equal_compare>,
bi::hash<cell_entry::hasher>,
bi::constant_time_size<false>>;
static constexpr size_t initial_bucket_count = 16;
using max_load_factor = std::ratio<3, 4>;
dht::decorated_key _key;
cell_locker& _parent;
size_t _rehash_at_size = compute_rehash_at_size(initial_bucket_count);
std::unique_ptr<cells_type::bucket_type[]> _buckets; // TODO: start with internal storage?
size_t _cell_count = 0; // cells_type::empty() is not O(1) if the hook is auto-unlink
cells_type::bucket_type _internal_buckets[initial_bucket_count];
cells_type _cells;
schema_ptr _schema;
friend class cell_entry;
private:
static constexpr size_t compute_rehash_at_size(size_t bucket_count) {
return bucket_count * max_load_factor::num / max_load_factor::den;
}
void maybe_rehash() {
if (_cell_count >= _rehash_at_size) {
auto new_bucket_count = std::min(_cells.bucket_count() * 2, _cells.bucket_count() + 1024);
auto buckets = std::make_unique<cells_type::bucket_type[]>(new_bucket_count);
_cells.rehash(cells_type::bucket_traits(buckets.get(), new_bucket_count));
_buckets = std::move(buckets);
_rehash_at_size = compute_rehash_at_size(new_bucket_count);
}
}
public:
partition_entry(schema_ptr s, cell_locker& parent, const dht::decorated_key& dk)
: _key(dk)
, _parent(parent)
, _cells(cells_type::bucket_traits(_internal_buckets, initial_bucket_count),
cell_entry::hasher(*s), cell_entry::equal_compare(*s))
, _schema(s)
{ }
~partition_entry() {
if (is_linked()) {
_parent._partition_count--;
}
}
// Upgrades partition entry to new schema. Returns false if all
// cell_entries has been removed during the upgrade.
bool upgrade(schema_ptr new_schema);
void insert(lw_shared_ptr<cell_entry> cell) {
_cells.insert(*cell);
_cell_count++;
maybe_rehash();
}
cells_type& cells() {
return _cells;
}
struct hasher {
size_t operator()(const dht::decorated_key& dk) const {
return std::hash<dht::decorated_key>()(dk);
}
size_t operator()(const partition_entry& pe) const {
return operator()(pe._key);
}
};
class equal_compare {
dht::decorated_key_equals_comparator _cmp;
public:
explicit equal_compare(const schema& s) : _cmp(s) { }
bool operator()(const dht::decorated_key& dk, const partition_entry& pe) {
return _cmp(dk, pe._key);
}
bool operator()(const partition_entry& pe, const dht::decorated_key& dk) {
return _cmp(dk, pe._key);
}
bool operator()(const partition_entry& a, const partition_entry& b) {
return _cmp(a._key, b._key);
}
};
};
using partitions_type = bi::unordered_set<partition_entry,
bi::equal<partition_entry::equal_compare>,
bi::hash<partition_entry::hasher>,
bi::constant_time_size<false>>;
static constexpr size_t initial_bucket_count = 4 * 1024;
using max_load_factor = std::ratio<3, 4>;
std::unique_ptr<partitions_type::bucket_type[]> _buckets;
partitions_type _partitions;
size_t _partition_count = 0;
size_t _rehash_at_size = compute_rehash_at_size(initial_bucket_count);
schema_ptr _schema;
// partitions_type uses equality comparator which keeps a reference to the
// original schema, we must ensure that it doesn't die.
schema_ptr _original_schema;
cell_locker_stats& _stats;
friend class locked_cell;
private:
struct locker;
static constexpr size_t compute_rehash_at_size(size_t bucket_count) {
return bucket_count * max_load_factor::num / max_load_factor::den;
}
void maybe_rehash() {
if (_partition_count >= _rehash_at_size) {
auto new_bucket_count = std::min(_partitions.bucket_count() * 2, _partitions.bucket_count() + 64 * 1024);
auto buckets = std::make_unique<partitions_type::bucket_type[]>(new_bucket_count);
_partitions.rehash(partitions_type::bucket_traits(buckets.get(), new_bucket_count));
_buckets = std::move(buckets);
_rehash_at_size = compute_rehash_at_size(new_bucket_count);
}
}
public:
explicit cell_locker(schema_ptr s, cell_locker_stats& stats)
: _buckets(std::make_unique<partitions_type::bucket_type[]>(initial_bucket_count))
, _partitions(partitions_type::bucket_traits(_buckets.get(), initial_bucket_count),
partition_entry::hasher(), partition_entry::equal_compare(*s))
, _schema(s)
, _original_schema(std::move(s))
, _stats(stats)
{ }
~cell_locker() {
assert(_partitions.empty());
}
void set_schema(schema_ptr s) {
_schema = s;
}
schema_ptr schema() const {
return _schema;
}
// partition_cells_range is required to be in cell_locker::schema()
future<std::vector<locked_cell>> lock_cells(const dht::decorated_key& dk, partition_cells_range&& range,
timeout_clock::time_point timeout);
};
class locked_cell {
lw_shared_ptr<cell_locker::cell_entry> _entry;
public:
explicit locked_cell(lw_shared_ptr<cell_locker::cell_entry> entry)
: _entry(std::move(entry)) { }
locked_cell(const locked_cell&) = delete;
locked_cell(locked_cell&&) = default;
~locked_cell() {
if (_entry) {
_entry->unlock();
}
}
};
struct cell_locker::locker {
cell_entry::hasher _hasher;
cell_entry::equal_compare _eq_cmp;
partition_entry& _partition_entry;
partition_cells_range _range;
partition_cells_range::iterator _current_ck;
cells_range::const_iterator _current_cell;
timeout_clock::time_point _timeout;
std::vector<locked_cell> _locks;
cell_locker_stats& _stats;
private:
void update_ck() {
if (!is_done()) {
_current_cell = _current_ck->begin();
}
}
future<> lock_next();
bool is_done() const { return _current_ck == _range.end(); }
public:
explicit locker(const ::schema& s, cell_locker_stats& st, partition_entry& pe, partition_cells_range&& range, timeout_clock::time_point timeout)
: _hasher(s)
, _eq_cmp(s)
, _partition_entry(pe)
, _range(std::move(range))
, _current_ck(_range.begin())
, _timeout(timeout)
, _stats(st)
{
update_ck();
}
locker(const locker&) = delete;
locker(locker&&) = delete;
future<> lock_all() {
// Cannot defer before first call to lock_next().
return lock_next().then([this] {
return do_until([this] { return is_done(); }, [this] {
return lock_next();
});
});
}
std::vector<locked_cell> get() && { return std::move(_locks); }
};
inline
future<std::vector<locked_cell>> cell_locker::lock_cells(const dht::decorated_key& dk, partition_cells_range&& range, timeout_clock::time_point timeout) {
partition_entry::hasher pe_hash;
partition_entry::equal_compare pe_eq(*_schema);
auto it = _partitions.find(dk, pe_hash, pe_eq);
std::unique_ptr<partition_entry> partition;
if (it == _partitions.end()) {
partition = std::make_unique<partition_entry>(_schema, *this, dk);
} else if (!it->upgrade(_schema)) {
partition = std::unique_ptr<partition_entry>(&*it);
_partition_count--;
_partitions.erase(it);
}
if (partition) {
std::vector<locked_cell> locks;
for (auto&& r : range) {
if (r.empty()) {
continue;
}
for (auto&& c : r) {
auto cell = make_lw_shared<cell_entry>(*partition, position_in_partition(r.position()), c);
_stats.lock_acquisitions++;
partition->insert(cell);
locks.emplace_back(std::move(cell));
}
}
if (!locks.empty()) {
_partitions.insert(*partition.release());
_partition_count++;
maybe_rehash();
}
return make_ready_future<std::vector<locked_cell>>(std::move(locks));
}
auto l = std::make_unique<locker>(*_schema, _stats, *it, std::move(range), timeout);
auto f = l->lock_all();
return f.then([l = std::move(l)] {
return std::move(*l).get();
});
}
inline
future<> cell_locker::locker::lock_next() {
while (!is_done()) {
if (_current_cell == _current_ck->end()) {
++_current_ck;
update_ck();
continue;
}
auto cid = *_current_cell++;
cell_address ca { position_in_partition(_current_ck->position()), cid };
auto it = _partition_entry.cells().find(ca, _hasher, _eq_cmp);
if (it != _partition_entry.cells().end()) {
_stats.operations_waiting_for_lock++;
return it->lock(_timeout).then([this, ce = it->shared_from_this()] () mutable {
_stats.operations_waiting_for_lock--;
_stats.lock_acquisitions++;
_locks.emplace_back(std::move(ce));
});
}
auto cell = make_lw_shared<cell_entry>(_partition_entry, position_in_partition(_current_ck->position()), cid);
_stats.lock_acquisitions++;
_partition_entry.insert(cell);
_locks.emplace_back(std::move(cell));
}
return make_ready_future<>();
}
inline
bool cell_locker::partition_entry::upgrade(schema_ptr new_schema) {
if (_schema == new_schema) {
return true;
}
auto buckets = std::make_unique<cells_type::bucket_type[]>(_cells.bucket_count());
auto cells = cells_type(cells_type::bucket_traits(buckets.get(), _cells.bucket_count()),
cell_entry::hasher(*new_schema), cell_entry::equal_compare(*new_schema));
_cells.clear_and_dispose([&] (cell_entry* cell_ptr) noexcept {
auto& cell = *cell_ptr;
auto kind = cell.position().is_static_row() ? column_kind::static_column
: column_kind::regular_column;
auto reinsert = cell.upgrade(*_schema, *new_schema, kind);
if (reinsert) {
cells.insert(cell);
} else {
_cell_count--;
}
});
// bi::unordered_set move assignment is actually a swap.
// Original _buckets cannot be destroyed before the container using them is
// so we need to explicitly make sure that the original _cells is no more.
_cells = std::move(cells);
auto destroy = [] (auto) { };
destroy(std::move(cells));
_buckets = std::move(buckets);
_schema = new_schema;
return _cell_count;
}

View File

@@ -27,125 +27,125 @@
class checked_file_impl : public file_impl {
public:
checked_file_impl(const io_error_handler& error_handler, file f)
: _error_handler(error_handler), _file(f) {
checked_file_impl(disk_error_signal_type& s, file f)
: _signal(s) , _file(f) {
_memory_dma_alignment = f.memory_dma_alignment();
_disk_read_dma_alignment = f.disk_read_dma_alignment();
_disk_write_dma_alignment = f.disk_write_dma_alignment();
}
virtual future<size_t> write_dma(uint64_t pos, const void* buffer, size_t len, const io_priority_class& pc) override {
return do_io_check(_error_handler, [&] {
return do_io_check(_signal, [&] {
return get_file_impl(_file)->write_dma(pos, buffer, len, pc);
});
}
virtual future<size_t> write_dma(uint64_t pos, std::vector<iovec> iov, const io_priority_class& pc) override {
return do_io_check(_error_handler, [&] {
return do_io_check(_signal, [&] {
return get_file_impl(_file)->write_dma(pos, iov, pc);
});
}
virtual future<size_t> read_dma(uint64_t pos, void* buffer, size_t len, const io_priority_class& pc) override {
return do_io_check(_error_handler, [&] {
return do_io_check(_signal, [&] {
return get_file_impl(_file)->read_dma(pos, buffer, len, pc);
});
}
virtual future<size_t> read_dma(uint64_t pos, std::vector<iovec> iov, const io_priority_class& pc) override {
return do_io_check(_error_handler, [&] {
return do_io_check(_signal, [&] {
return get_file_impl(_file)->read_dma(pos, iov, pc);
});
}
virtual future<> flush(void) override {
return do_io_check(_error_handler, [&] {
return do_io_check(_signal, [&] {
return get_file_impl(_file)->flush();
});
}
virtual future<struct stat> stat(void) override {
return do_io_check(_error_handler, [&] {
return do_io_check(_signal, [&] {
return get_file_impl(_file)->stat();
});
}
virtual future<> truncate(uint64_t length) override {
return do_io_check(_error_handler, [&] {
return do_io_check(_signal, [&] {
return get_file_impl(_file)->truncate(length);
});
}
virtual future<> discard(uint64_t offset, uint64_t length) override {
return do_io_check(_error_handler, [&] {
return do_io_check(_signal, [&] {
return get_file_impl(_file)->discard(offset, length);
});
}
virtual future<> allocate(uint64_t position, uint64_t length) override {
return do_io_check(_error_handler, [&] {
return do_io_check(_signal, [&] {
return get_file_impl(_file)->allocate(position, length);
});
}
virtual future<uint64_t> size(void) override {
return do_io_check(_error_handler, [&] {
return do_io_check(_signal, [&] {
return get_file_impl(_file)->size();
});
}
virtual future<> close() override {
return do_io_check(_error_handler, [&] {
return do_io_check(_signal, [&] {
return get_file_impl(_file)->close();
});
}
// returns a handle for plain file, so make_checked_file() should be called
// on file returned by handle.
virtual std::unique_ptr<seastar::file_handle_impl> dup() override {
return get_file_impl(_file)->dup();
}
virtual subscription<directory_entry> list_directory(std::function<future<> (directory_entry de)> next) override {
return do_io_check(_error_handler, [&] {
return do_io_check(_signal, [&] {
return get_file_impl(_file)->list_directory(next);
});
}
virtual future<temporary_buffer<uint8_t>> dma_read_bulk(uint64_t offset, size_t range_size, const io_priority_class& pc) override {
return do_io_check(_error_handler, [&] {
return get_file_impl(_file)->dma_read_bulk(offset, range_size, pc);
});
}
private:
const io_error_handler& _error_handler;
disk_error_signal_type &_signal;
file _file;
};
inline file make_checked_file(const io_error_handler& error_handler, file f)
inline file make_checked_file(disk_error_signal_type& signal, file& f)
{
return file(::make_shared<checked_file_impl>(error_handler, f));
return file(::make_shared<checked_file_impl>(signal, f));
}
future<file>
inline open_checked_file_dma(const io_error_handler& error_handler,
inline open_checked_file_dma(disk_error_signal_type& signal,
sstring name, open_flags flags,
file_open_options options = {})
file_open_options options)
{
return do_io_check(error_handler, [&] {
return do_io_check(signal, [&] {
return open_file_dma(name, flags, options).then([&] (file f) {
return make_ready_future<file>(make_checked_file(error_handler, f));
return make_ready_future<file>(make_checked_file(signal, f));
});
});
}
future<file>
inline open_checked_directory(const io_error_handler& error_handler,
inline open_checked_file_dma(disk_error_signal_type& signal,
sstring name, open_flags flags)
{
return do_io_check(signal, [&] {
return open_file_dma(name, flags).then([&] (file f) {
return make_ready_future<file>(make_checked_file(signal, f));
});
});
}
future<file>
inline open_checked_directory(disk_error_signal_type& signal,
sstring name)
{
return do_io_check(error_handler, [&] {
return do_io_check(signal, [&] {
return engine().open_directory(name).then([&] (file f) {
return make_ready_future<file>(make_checked_file(error_handler, f));
return make_ready_future<file>(make_checked_file(signal, f));
});
});
}

View File

@@ -1,49 +0,0 @@
/*
* Copyright (C) 2017 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <algorithm>
#include <atomic>
#include <chrono>
#include <cstdint>
extern std::atomic<int64_t> clocks_offset;
template<typename Duration>
static inline void forward_jump_clocks(Duration delta)
{
auto d = std::chrono::duration_cast<std::chrono::seconds>(delta).count();
clocks_offset.fetch_add(d, std::memory_order_relaxed);
}
static inline std::chrono::seconds get_clocks_offset()
{
auto off = clocks_offset.load(std::memory_order_relaxed);
return std::chrono::seconds(off);
}
// Returns a time point which is earlier from t by d, or minimum time point if it cannot be represented.
template<typename Clock, typename Duration, typename Rep, typename Period>
inline
auto saturating_subtract(std::chrono::time_point<Clock, Duration> t, std::chrono::duration<Rep, Period> d) -> decltype(t) {
return std::max(t, decltype(t)::min() + d) - d;
}

View File

@@ -42,63 +42,47 @@ std::ostream& operator<<(std::ostream& out, const bound_kind k);
bound_kind invert_kind(bound_kind k);
int32_t weight(bound_kind k);
static inline bound_kind flip_bound_kind(bound_kind bk)
{
switch (bk) {
case bound_kind::excl_end: return bound_kind::excl_start;
case bound_kind::incl_end: return bound_kind::incl_start;
case bound_kind::excl_start: return bound_kind::excl_end;
case bound_kind::incl_start: return bound_kind::incl_end;
}
abort();
}
class bound_view {
public:
const static thread_local clustering_key empty_prefix;
public:
const clustering_key_prefix& prefix;
bound_kind kind;
bound_view(const clustering_key_prefix& prefix, bound_kind kind)
: prefix(prefix)
, kind(kind)
{ }
bound_view(const bound_view& other) noexcept = default;
bound_view& operator=(const bound_view& other) noexcept {
if (this != &other) {
this->~bound_view();
new (this) bound_view(other);
}
return *this;
}
struct tri_compare {
struct compare {
// To make it assignable and to avoid taking a schema_ptr, we
// wrap the schema reference.
std::reference_wrapper<const schema> _s;
tri_compare(const schema& s) : _s(s)
compare(const schema& s) : _s(s)
{ }
int operator()(const clustering_key_prefix& p1, int32_t w1, const clustering_key_prefix& p2, int32_t w2) const {
bool operator()(const clustering_key_prefix& p1, int32_t w1, const clustering_key_prefix& p2, int32_t w2) const {
auto type = _s.get().clustering_key_prefix_type();
auto res = prefix_equality_tri_compare(type->types().begin(),
type->begin(p1), type->end(p1),
type->begin(p2), type->end(p2),
::tri_compare);
tri_compare);
if (res) {
return res;
return res < 0;
}
auto d1 = p1.size(_s);
auto d2 = p2.size(_s);
if (d1 == d2) {
return w1 - w2;
return w1 < w2;
}
return d1 < d2 ? w1 - (w1 <= 0) : -(w2 - (w2 <= 0));
}
int operator()(const bound_view b, const clustering_key_prefix& p) const {
return operator()(b.prefix, weight(b.kind), p, 0);
}
int operator()(const clustering_key_prefix& p, const bound_view b) const {
return operator()(p, 0, b.prefix, weight(b.kind));
}
int operator()(const bound_view b1, const bound_view b2) const {
return operator()(b1.prefix, weight(b1.kind), b2.prefix, weight(b2.kind));
}
};
struct compare {
// To make it assignable and to avoid taking a schema_ptr, we
// wrap the schema reference.
tri_compare _cmp;
compare(const schema& s) : _cmp(s)
{ }
bool operator()(const clustering_key_prefix& p1, int32_t w1, const clustering_key_prefix& p2, int32_t w2) const {
return _cmp(p1, w1, p2, w2) < 0;
return d1 < d2 ? w1 <= 0 : w2 > 0;
}
bool operator()(const bound_view b, const clustering_key_prefix& p) const {
return operator()(b.prefix, weight(b.kind), p, 0);
@@ -122,33 +106,20 @@ public:
static bound_view top() {
return {empty_prefix, bound_kind::incl_end};
}
template<template<typename> typename R>
GCC6_CONCEPT( requires Range<R, clustering_key_prefix_view> )
static bound_view from_range_start(const R<clustering_key_prefix>& range) {
return range.start()
? bound_view(range.start()->value(), range.start()->is_inclusive() ? bound_kind::incl_start : bound_kind::excl_start)
: bottom();
}
template<template<typename> typename R>
GCC6_CONCEPT( requires Range<R, clustering_key_prefix> )
static bound_view from_range_end(const R<clustering_key_prefix>& range) {
return range.end()
? bound_view(range.end()->value(), range.end()->is_inclusive() ? bound_kind::incl_end : bound_kind::excl_end)
: top();
}
template<template<typename> typename R>
GCC6_CONCEPT( requires Range<R, clustering_key_prefix> )
static std::pair<bound_view, bound_view> from_range(const R<clustering_key_prefix>& range) {
return {from_range_start(range), from_range_end(range)};
}
template<template<typename> typename R>
GCC6_CONCEPT( requires Range<R, clustering_key_prefix_view> )
static stdx::optional<typename R<clustering_key_prefix_view>::bound> to_range_bound(const bound_view& bv) {
if (&bv.prefix == &empty_prefix) {
return {};
}
bool inclusive = bv.kind != bound_kind::excl_end && bv.kind != bound_kind::excl_start;
return {typename R<clustering_key_prefix_view>::bound(bv.prefix.view(), inclusive)};
/*
template<template<typename> typename T, typename U>
concept bool Range() {
return requires (T<U> range) {
{ range.start() } -> stdx::optional<U>;
{ range.end() } -> stdx::optional<U>;
};
};*/
template<template<typename> typename Range>
static std::pair<bound_view, bound_view> from_range(const Range<clustering_key_prefix>& range) {
return {
range.start() ? bound_view(range.start()->value(), range.start()->is_inclusive() ? bound_kind::incl_start : bound_kind::excl_start) : bottom(),
range.end() ? bound_view(range.end()->value(), range.end()->is_inclusive() ? bound_kind::incl_end : bound_kind::excl_end) : top(),
};
}
friend std::ostream& operator<<(std::ostream& out, const bound_view& b) {
return out << "{bound: prefix=" << b.prefix << ", kind=" << b.kind << "}";

138
clustering_key_filter.cc Normal file
View File

@@ -0,0 +1,138 @@
/*
* Copyright (C) 2016 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "clustering_key_filter.hh"
#include "keys.hh"
#include "query-request.hh"
#include "range.hh"
namespace query {
const clustering_row_ranges&
clustering_key_filtering_context::get_ranges(const partition_key& key) const {
static thread_local clustering_row_ranges full_range = {{}};
return _factory ? _factory->get_ranges(key) : full_range;
}
clustering_key_filtering_context clustering_key_filtering_context::create_no_filtering() {
return clustering_key_filtering_context{};
}
const clustering_key_filtering_context no_clustering_key_filtering =
clustering_key_filtering_context::create_no_filtering();
class stateless_clustering_key_filter_factory : public clustering_key_filter_factory {
clustering_key_filter _filter;
clustering_row_ranges _ranges;
public:
stateless_clustering_key_filter_factory(clustering_row_ranges&& ranges,
clustering_key_filter&& filter)
: _filter(std::move(filter)), _ranges(std::move(ranges)) {}
virtual clustering_key_filter get_filter(const partition_key& key) override {
return _filter;
}
virtual clustering_key_filter get_filter_for_sorted(const partition_key& key) override {
return _filter;
}
virtual const clustering_row_ranges& get_ranges(const partition_key& key) override {
return _ranges;
}
virtual bool want_static_columns(const partition_key& key) override {
return true;
}
};
class partition_slice_clustering_key_filter_factory : public clustering_key_filter_factory {
schema_ptr _schema;
const partition_slice& _slice;
clustering_key_prefix::prefix_equal_tri_compare _cmp;
clustering_row_ranges _ck_ranges;
public:
partition_slice_clustering_key_filter_factory(schema_ptr s, const partition_slice& slice)
: _schema(std::move(s)), _slice(slice), _cmp(*_schema) {}
virtual clustering_key_filter get_filter(const partition_key& key) override {
const clustering_row_ranges& ranges = _slice.row_ranges(*_schema, key);
return [this, &ranges] (const clustering_key& key) {
return std::any_of(std::begin(ranges), std::end(ranges),
[this, &key] (const clustering_range& r) { return r.contains(key, _cmp); });
};
}
virtual clustering_key_filter get_filter_for_sorted(const partition_key& key) override {
const clustering_row_ranges& ranges = _slice.row_ranges(*_schema, key);
return [this, &ranges] (const clustering_key& key) {
return std::any_of(std::begin(ranges), std::end(ranges),
[this, &key] (const clustering_range& r) { return r.contains(key, _cmp); });
};
}
virtual const clustering_row_ranges& get_ranges(const partition_key& key) override {
if (_slice.options.contains(query::partition_slice::option::reversed)) {
_ck_ranges = _slice.row_ranges(*_schema, key);
std::reverse(_ck_ranges.begin(), _ck_ranges.end());
return _ck_ranges;
}
return _slice.row_ranges(*_schema, key);
}
virtual bool want_static_columns(const partition_key& key) override {
return true;
}
};
static const shared_ptr<clustering_key_filter_factory>
create_partition_slice_filter(schema_ptr s, const partition_slice& slice) {
return ::make_shared<partition_slice_clustering_key_filter_factory>(std::move(s), slice);
}
const clustering_key_filtering_context
clustering_key_filtering_context::create(schema_ptr schema, const partition_slice& slice) {
static thread_local clustering_key_filtering_context accept_all = clustering_key_filtering_context(
::make_shared<stateless_clustering_key_filter_factory>(clustering_row_ranges{{}},
[](const clustering_key&) { return true; }));
static thread_local clustering_key_filtering_context reject_all = clustering_key_filtering_context(
::make_shared<stateless_clustering_key_filter_factory>(clustering_row_ranges{},
[](const clustering_key&) { return false; }));
if (slice.get_specific_ranges()) {
return clustering_key_filtering_context(create_partition_slice_filter(schema, slice));
}
const clustering_row_ranges& ranges = slice.default_row_ranges();
if (ranges.empty()) {
return reject_all;
}
if (ranges.size() == 1 && ranges[0].is_full()) {
return accept_all;
}
return clustering_key_filtering_context(create_partition_slice_filter(schema, slice));
}
}

View File

@@ -22,47 +22,61 @@
*/
#pragma once
#include <functional>
#include <vector>
#include "core/shared_ptr.hh"
#include "database_fwd.hh"
#include "schema.hh"
#include "query-request.hh"
template<typename T> class range;
namespace query {
class clustering_key_filter_ranges {
clustering_row_ranges _storage;
const clustering_row_ranges& _ref;
class partition_slice;
// A predicate that tells if a clustering key should be accepted.
using clustering_key_filter = std::function<bool(const clustering_key&)>;
// A factory for clustering key filter which can be reused for multiple clustering keys.
class clustering_key_filter_factory {
public:
clustering_key_filter_ranges(const clustering_row_ranges& ranges) : _ref(ranges) { }
struct reversed { };
clustering_key_filter_ranges(reversed, const clustering_row_ranges& ranges)
: _storage(ranges.rbegin(), ranges.rend()), _ref(_storage) { }
// Create a clustering key filter that can be used for multiple clustering keys with no restrictions.
virtual clustering_key_filter get_filter(const partition_key&) = 0;
// Create a clustering key filter that can be used for multiple clustering keys but they have to be sorted.
virtual clustering_key_filter get_filter_for_sorted(const partition_key&) = 0;
virtual const std::vector<range<clustering_key_prefix>>& get_ranges(const partition_key&) = 0;
// Whether we want to get the static row, in addition to the desired clustering rows
virtual bool want_static_columns(const partition_key&) = 0;
clustering_key_filter_ranges(clustering_key_filter_ranges&& other) noexcept
: _storage(std::move(other._storage))
, _ref(&other._ref == &other._storage ? _storage : other._ref)
{ }
clustering_key_filter_ranges& operator=(clustering_key_filter_ranges&& other) noexcept {
if (this != &other) {
this->~clustering_key_filter_ranges();
new (this) clustering_key_filter_ranges(std::move(other));
}
return *this;
}
auto begin() const { return _ref.begin(); }
auto end() const { return _ref.end(); }
bool empty() const { return _ref.empty(); }
size_t size() const { return _ref.size(); }
const clustering_row_ranges& ranges() const { return _ref; }
static clustering_key_filter_ranges get_ranges(const schema& schema, const query::partition_slice& slice, const partition_key& key) {
const query::clustering_row_ranges& ranges = slice.row_ranges(schema, key);
if (slice.options.contains(query::partition_slice::option::reversed)) {
return clustering_key_filter_ranges(clustering_key_filter_ranges::reversed{}, ranges);
}
return clustering_key_filter_ranges(ranges);
}
virtual ~clustering_key_filter_factory() = default;
};
class clustering_key_filtering_context {
private:
shared_ptr<clustering_key_filter_factory> _factory;
clustering_key_filtering_context() {};
clustering_key_filtering_context(shared_ptr<clustering_key_filter_factory> factory) : _factory(factory) {}
public:
// Create a clustering key filter that can be used for multiple clustering keys with no restrictions.
clustering_key_filter get_filter(const partition_key& key) const {
return _factory ? _factory->get_filter(key) : [] (const clustering_key&) { return true; };
}
// Create a clustering key filter that can be used for multiple clustering keys but they have to be sorted.
clustering_key_filter get_filter_for_sorted(const partition_key& key) const {
return _factory ? _factory->get_filter_for_sorted(key) : [] (const clustering_key&) { return true; };
}
const std::vector<range<clustering_key_prefix>>& get_ranges(const partition_key& key) const;
bool want_static_columns(const partition_key& key) const {
return _factory ? _factory->want_static_columns(key) : true;
}
static const clustering_key_filtering_context create(schema_ptr, const partition_slice&);
static clustering_key_filtering_context create_no_filtering();
};
extern const clustering_key_filtering_context no_clustering_key_filtering;
}

View File

@@ -1,219 +0,0 @@
/*
* Copyright (C) 2017 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "schema.hh"
#include "query-request.hh"
#include "streamed_mutation.hh"
// Utility for in-order checking of overlap with position ranges.
class clustering_ranges_walker {
const schema& _schema;
const query::clustering_row_ranges& _ranges;
query::clustering_row_ranges::const_iterator _current;
query::clustering_row_ranges::const_iterator _end;
bool _in_current; // next position is known to be >= _current_start
bool _with_static_row;
position_in_partition_view _current_start;
position_in_partition_view _current_end;
stdx::optional<position_in_partition> _trim;
size_t _change_counter = 1;
private:
bool advance_to_next_range() {
_in_current = false;
if (!_current_start.is_static_row()) {
if (_current == _end) {
return false;
}
++_current;
}
++_change_counter;
if (_current == _end) {
_current_end = _current_start = position_in_partition_view::after_all_clustered_rows();
return false;
}
_current_start = position_in_partition_view::for_range_start(*_current);
_current_end = position_in_partition_view::for_range_end(*_current);
return true;
}
public:
clustering_ranges_walker(const schema& s, const query::clustering_row_ranges& ranges, bool with_static_row = true)
: _schema(s)
, _ranges(ranges)
, _current(ranges.begin())
, _end(ranges.end())
, _in_current(with_static_row)
, _with_static_row(with_static_row)
, _current_start(position_in_partition_view::for_static_row())
, _current_end(position_in_partition_view::before_all_clustered_rows())
{
if (!with_static_row) {
if (_current == _end) {
_current_start = position_in_partition_view::before_all_clustered_rows();
} else {
_current_start = position_in_partition_view::for_range_start(*_current);
_current_end = position_in_partition_view::for_range_end(*_current);
}
}
}
clustering_ranges_walker(clustering_ranges_walker&& o) noexcept
: _schema(o._schema)
, _ranges(o._ranges)
, _current(o._current)
, _end(o._end)
, _in_current(o._in_current)
, _with_static_row(o._with_static_row)
, _current_start(o._current_start)
, _current_end(o._current_end)
, _trim(std::move(o._trim))
, _change_counter(o._change_counter)
{ }
clustering_ranges_walker& operator=(clustering_ranges_walker&& o) {
if (this != &o) {
this->~clustering_ranges_walker();
new (this) clustering_ranges_walker(std::move(o));
}
return *this;
}
// Excludes positions smaller than pos from the ranges.
// pos should be monotonic.
// No constraints between pos and positions passed to advance_to().
//
// After the invocation, when !out_of_range(), lower_bound() returns the smallest position still contained.
void trim_front(position_in_partition pos) {
position_in_partition::less_compare less(_schema);
do {
if (!less(_current_start, pos)) {
break;
}
if (less(pos, _current_end)) {
_trim = std::move(pos);
_current_start = *_trim;
_in_current = false;
++_change_counter;
break;
}
} while (advance_to_next_range());
}
// Returns true if given position is contained.
// Must be called with monotonic positions.
// Idempotent.
bool advance_to(position_in_partition_view pos) {
position_in_partition::less_compare less(_schema);
do {
if (!_in_current && less(pos, _current_start)) {
break;
}
// All subsequent clustering keys are larger than the start of this
// range so there is no need to check that again.
_in_current = true;
if (less(pos, _current_end)) {
return true;
}
} while (advance_to_next_range());
return false;
}
// Returns true if the range expressed by start and end (as in position_range) overlaps
// with clustering ranges.
// Must be called with monotonic start position. That position must also be greater than
// the last position passed to the other advance_to() overload.
// Idempotent.
bool advance_to(position_in_partition_view start, position_in_partition_view end) {
position_in_partition::less_compare less(_schema);
do {
if (!less(_current_start, end)) {
break;
}
if (less(start, _current_end)) {
return true;
}
} while (advance_to_next_range());
return false;
}
// Returns true if the range tombstone expressed by start and end (as in position_range) overlaps
// with clustering ranges.
// No monotonicity restrictions on argument values across calls.
// Does not affect lower_bound().
// Idempotent.
bool contains_tombstone(position_in_partition_view start, position_in_partition_view end) const {
position_in_partition::less_compare less(_schema);
if (_trim && !less(*_trim, end)) {
return false;
}
auto i = _current;
while (i != _end) {
auto range_start = position_in_partition_view::for_range_start(*i);
if (!less(range_start, end)) {
return false;
}
auto range_end = position_in_partition_view::for_range_end(*i);
if (less(start, range_end)) {
return true;
}
++i;
}
return false;
}
// Returns true if advanced past all contained positions. Any later advance_to() until reset() will return false.
bool out_of_range() const {
return !_in_current && _current == _end;
}
// Resets the state of the walker so that advance_to() can be now called for new sequence of positions.
// Any range trimmings still hold after this.
void reset() {
auto trim = std::move(_trim);
auto ctr = _change_counter;
*this = clustering_ranges_walker(_schema, _ranges, _with_static_row);
_change_counter = ctr + 1;
if (trim) {
trim_front(std::move(*trim));
}
}
// Can be called only when !out_of_range()
position_in_partition_view lower_bound() const {
return _current_start;
}
// When lower_bound() changes, this also does
// Always > 0.
size_t lower_bound_change_counter() const {
return _change_counter;
}
};

View File

@@ -1,3 +0,0 @@
# Scylla Coding Style
Please see the [Seastar style document](https://github.com/scylladb/seastar/blob/master/coding-style.md).

View File

@@ -21,9 +21,6 @@
#pragma once
#include "sstables/shared_sstable.hh"
#include "exceptions/exceptions.hh"
class column_family;
class schema;
using schema_ptr = lw_shared_ptr<const schema>;
@@ -36,14 +33,12 @@ enum class compaction_strategy_type {
size_tiered,
leveled,
date_tiered,
time_window,
};
class compaction_strategy_impl;
class sstable;
class sstable_set;
struct compaction_descriptor;
struct resharding_descriptor;
class compaction_strategy {
::shared_ptr<compaction_strategy_impl> _compaction_strategy_impl;
@@ -57,20 +52,15 @@ public:
compaction_strategy& operator=(compaction_strategy&&);
// Return a list of sstables to be compacted after applying the strategy.
compaction_descriptor get_sstables_for_compaction(column_family& cfs, std::vector<shared_sstable> candidates);
std::vector<resharding_descriptor> get_resharding_jobs(column_family& cf, std::vector<shared_sstable> candidates);
compaction_descriptor get_sstables_for_compaction(column_family& cfs, std::vector<lw_shared_ptr<sstable>> candidates);
// Some strategies may look at the compacted and resulting sstables to
// get some useful information for subsequent compactions.
void notify_completion(const std::vector<shared_sstable>& removed, const std::vector<shared_sstable>& added);
void notify_completion(schema_ptr schema, const std::vector<lw_shared_ptr<sstable>>& removed, const std::vector<lw_shared_ptr<sstable>>& added);
// Return if parallel compaction is allowed by strategy.
bool parallel_compaction() const;
// Return if optimization to rule out sstables based on clustering key filter should be applied.
bool use_clustering_key_filter() const;
// An estimation of number of compaction for strategy to be satisfied.
int64_t estimated_pending_compactions(column_family& cf) const;
@@ -86,8 +76,6 @@ public:
return "LeveledCompactionStrategy";
case compaction_strategy_type::date_tiered:
return "DateTieredCompactionStrategy";
case compaction_strategy_type::time_window:
return "TimeWindowCompactionStrategy";
default:
throw std::runtime_error("Invalid Compaction Strategy");
}
@@ -106,8 +94,6 @@ public:
return compaction_strategy_type::leveled;
} else if (short_name == "DateTieredCompactionStrategy") {
return compaction_strategy_type::date_tiered;
} else if (short_name == "TimeWindowCompactionStrategy") {
return compaction_strategy_type::time_window;
} else {
throw exceptions::configuration_exception(sprint("Unable to find compaction strategy class '%s'", name));
}

View File

@@ -39,9 +39,6 @@ public:
compatible_ring_position(const schema& s, dht::ring_position&& rp)
: _schema(&s), _rp(std::move(rp)) {
}
const dht::token& token() const {
return _rp->token();
}
friend int tri_compare(const compatible_ring_position& x, const compatible_ring_position& y) {
return x._rp->tri_compare(*x._schema, *y._rp);
}

View File

@@ -22,7 +22,7 @@
#pragma once
#include "types.hh"
#include <iosfwd>
#include <iostream>
#include <algorithm>
#include <vector>
#include <boost/range/iterator_range.hpp>
@@ -130,10 +130,10 @@ public:
bytes decompose_value(const value_type& values) {
return serialize_value(values);
}
class iterator : public std::iterator<std::input_iterator_tag, const bytes_view> {
class iterator : public std::iterator<std::input_iterator_tag, bytes_view> {
private:
bytes_view _v;
bytes_view _current;
value_type _current;
private:
void read_current() {
size_type len;
@@ -220,9 +220,6 @@ public:
assert(AllowPrefixes == allow_prefixes::yes);
return std::distance(begin(v), end(v)) == (ssize_t)_types.size();
}
bool is_empty(bytes_view v) const {
return begin(v) == end(v);
}
void validate(bytes_view v) {
// FIXME: implement
warn(unimplemented::cause::VALIDATION);

View File

@@ -184,8 +184,6 @@ bytes to_legacy(CompoundType& type, bytes_view packed) {
return legacy_form;
}
class composite_view;
// Represents a value serialized according to Origin's CompositeType.
// If is_compound is true, then the value is one or more components encoded as:
//
@@ -204,7 +202,7 @@ public:
, _is_compound(is_compound)
{ }
explicit composite(bytes&& b)
composite(bytes&& b)
: _bytes(std::move(b))
, _is_compound(true)
{ }
@@ -241,7 +239,7 @@ public:
using component_view = std::pair<bytes_view, eoc>;
private:
template<typename Value, typename = std::enable_if_t<!std::is_same<const data_value, std::decay_t<Value>>::value>>
static size_t size(const Value& val) {
static size_t size(Value& val) {
return val.size();
}
static size_t size(const data_value& val) {
@@ -306,36 +304,23 @@ public:
return f(const_cast<bytes&>(_bytes));
}
// marker is ignored if !is_compound
template<typename RangeOfSerializedComponents>
static composite serialize_value(RangeOfSerializedComponents&& values, bool is_compound = true, eoc marker = eoc::none) {
static bytes serialize_value(RangeOfSerializedComponents&& values, bool is_compound = true) {
auto size = serialized_size(values, is_compound);
bytes b(bytes::initialized_later(), size);
auto i = b.begin();
serialize_value(std::forward<decltype(values)>(values), i, is_compound);
if (is_compound && !b.empty()) {
b.back() = eoc_type(marker);
}
return composite(std::move(b), is_compound);
}
template<typename RangeOfSerializedComponents>
static composite serialize_static(const schema& s, RangeOfSerializedComponents&& values) {
// FIXME: Optimize
auto b = bytes(size_t(2), bytes::value_type(0xff));
std::vector<bytes_view> sv(s.clustering_key_size());
b += composite::serialize_value(boost::range::join(sv, std::forward<RangeOfSerializedComponents>(values)), true).release_bytes();
return composite(std::move(b));
}
static eoc to_eoc(int8_t eoc_byte) {
return eoc_byte == 0 ? eoc::none : (eoc_byte < 0 ? eoc::start : eoc::end);
return b;
}
class iterator : public std::iterator<std::input_iterator_tag, const component_view> {
bytes_view _v;
component_view _current;
private:
eoc to_eoc(int8_t eoc_byte) {
return eoc_byte == 0 ? eoc::none : (eoc_byte < 0 ? eoc::start : eoc::end);
}
void read_current() {
size_type len;
{
@@ -421,10 +406,6 @@ public:
return _bytes;
}
bytes release_bytes() && {
return std::move(_bytes);
}
size_t size() const {
return _bytes.size();
}
@@ -445,20 +426,26 @@ public:
return _is_compound;
}
// The following factory functions assume this composite is a compound value.
template <typename ClusteringElement>
static composite from_clustering_element(const schema& s, const ClusteringElement& ce) {
return serialize_value(ce.components(s), s.is_compound());
return serialize_value(ce.components(s));
}
static composite from_exploded(const std::vector<bytes_view>& v, bool is_compound, eoc marker = eoc::none) {
static composite from_exploded(const std::vector<bytes_view>& v, eoc marker = eoc::none) {
if (v.size() == 0) {
return composite(bytes(size_t(1), bytes::value_type(marker)), is_compound);
return bytes(size_t(1), bytes::value_type(marker));
}
return serialize_value(v, is_compound, marker);
auto b = serialize_value(v);
b.back() = eoc_type(marker);
return composite(std::move(b));
}
static composite static_prefix(const schema& s) {
return serialize_static(s, std::vector<bytes_view>());
static bytes static_marker(size_t(2), bytes::value_type(0xff));
std::vector<bytes_view> sv(s.clustering_key_size());
return static_marker + serialize_value(sv);
}
explicit operator bytes_view() const {
@@ -469,15 +456,6 @@ public:
friend inline std::ostream& operator<<(std::ostream& os, const std::pair<Component, eoc>& c) {
return os << "{value=" << c.first << "; eoc=" << sprint("0x%02x", eoc_type(c.second) & 0xff) << "}";
}
friend std::ostream& operator<<(std::ostream& os, const composite& v);
struct tri_compare {
const std::vector<data_type>& _types;
tri_compare(const std::vector<data_type>& types) : _types(types) {}
int operator()(const composite&, const composite&) const;
int operator()(composite_view, composite_view) const;
};
};
class composite_view final {
@@ -498,15 +476,14 @@ public:
, _is_compound(true)
{ }
std::vector<bytes_view> explode() const {
std::vector<bytes> explode() const {
if (!_is_compound) {
return { _bytes };
return { to_bytes(_bytes) };
}
std::vector<bytes_view> ret;
ret.reserve(8);
std::vector<bytes> ret;
for (auto it = begin(), e = end(); it != e; ) {
ret.push_back(it->first);
ret.push_back(to_bytes(it->first));
auto marker = it->second;
++it;
if (it != e && marker != composite::eoc::none) {
@@ -528,15 +505,6 @@ public:
return { begin(), end() };
}
composite::eoc last_eoc() const {
if (!_is_compound || _bytes.empty()) {
return composite::eoc::none;
}
bytes_view v(_bytes);
v.remove_prefix(v.size() - 1);
return composite::to_eoc(read_simple<composite::eoc_type>(v));
}
auto values() const {
return components() | boost::adaptors::transformed([](auto&& c) { return c.first; });
}
@@ -559,46 +527,4 @@ public:
bool operator==(const composite_view& k) const { return k._bytes == _bytes && k._is_compound == _is_compound; }
bool operator!=(const composite_view& k) const { return !(k == *this); }
friend inline std::ostream& operator<<(std::ostream& os, composite_view v) {
return os << "{" << ::join(", ", v.components()) << ", compound=" << v._is_compound << ", static=" << v.is_static() << "}";
}
};
inline
std::ostream& operator<<(std::ostream& os, const composite& v) {
return os << composite_view(v);
}
inline
int composite::tri_compare::operator()(const composite& v1, const composite& v2) const {
return (*this)(composite_view(v1), composite_view(v2));
}
inline
int composite::tri_compare::operator()(composite_view v1, composite_view v2) const {
// See org.apache.cassandra.db.composites.AbstractCType#compare
if (v1.empty()) {
return v2.empty() ? 0 : -1;
}
if (v2.empty()) {
return 1;
}
if (v1.is_static() != v2.is_static()) {
return v1.is_static() ? -1 : 1;
}
auto a_values = v1.components();
auto b_values = v2.components();
auto cmp = [&](const data_type& t, component_view c1, component_view c2) {
// First by value, then by EOC
auto r = t->compare(c1.first, c2.first);
if (r) {
return r;
}
return static_cast<int>(c1.second) - static_cast<int>(c2.second);
};
return lexicographical_tri_compare(_types.begin(), _types.end(),
a_values.begin(), a_values.end(),
b_values.begin(), b_values.end(),
cmp);
}

View File

@@ -39,17 +39,17 @@ public:
static constexpr auto CHUNK_LENGTH_KB = "chunk_length_kb";
static constexpr auto CRC_CHECK_CHANCE = "crc_check_chance";
private:
compressor _compressor;
compressor _compressor = compressor::none;
std::experimental::optional<int> _chunk_length;
std::experimental::optional<double> _crc_check_chance;
public:
compression_parameters(compressor c = compressor::lz4) : _compressor(c) { }
compression_parameters() = default;
compression_parameters(compressor c) : _compressor(c) { }
compression_parameters(const std::map<sstring, sstring>& options) {
validate_options(options);
auto it = options.find(SSTABLE_COMPRESSION);
if (it == options.end() || it->second.empty()) {
_compressor = compressor::none;
return;
}
const auto& compressor_class = it->second;

View File

@@ -12,9 +12,7 @@
# The name of the cluster. This is mainly used to prevent machines in
# one logical cluster from joining another.
# It is recommended to change the default value when creating a new cluster.
# You can NOT modify this value for an existing cluster
#cluster_name: 'Test Cluster'
cluster_name: 'Test Cluster'
# This defines the number of tokens randomly assigned to this node on the ring
# The more tokens, relative to other nodes, the larger the proportion of data
@@ -87,26 +85,10 @@ listen_address: localhost
# Leaving this blank will set it to the same value as listen_address
# broadcast_address: 1.2.3.4
# When using multiple physical network interfaces, set this to true to listen on broadcast_address
# in addition to the listen_address, allowing nodes to communicate in both interfaces.
# Ignore this property if the network configuration automatically routes between the public and private networks such as EC2.
#
# listen_on_broadcast_address: false
# port for the CQL native transport to listen for clients on
# For security reasons, you should not expose this port to the internet. Firewall it if needed.
native_transport_port: 9042
# Enabling native transport encryption in client_encryption_options allows you to either use
# encryption for the standard port or to use a dedicated, additional port along with the unencrypted
# standard native_transport_port.
# Enabling client encryption and keeping native_transport_port_ssl disabled will use encryption
# for native_transport_port. Setting native_transport_port_ssl to a different value
# from native_transport_port will use encryption for native_transport_port_ssl while
# keeping native_transport_port unencrypted.
#native_transport_port_ssl: 9142
# Throttles all outbound streaming file transfers on this node to the
# given total throughput in Mbps. This is necessary because Scylla does
# mostly sequential IO when streaming data during bootstrap or repair, which
@@ -210,9 +192,6 @@ api_address: 127.0.0.1
# Caution should be taken on increasing the size of this threshold as it can lead to node instability.
batch_size_warn_threshold_in_kb: 5
# Fail any multiple-partition batch exceeding this value. 50kb (10x warn threshold) by default.
batch_size_fail_threshold_in_kb: 50
# Authentication backend, identifying users
# Out of the box, Scylla provides org.apache.cassandra.auth.{AllowAllAuthenticator,
# PasswordAuthenticator}.
@@ -232,25 +211,16 @@ batch_size_fail_threshold_in_kb: 50
# increase system_auth keyspace replication factor if you use this authorizer.
# authorizer: AllowAllAuthorizer
###################################################
## Not currently supported, reserved for future use
###################################################
# initial_token allows you to specify tokens manually. While you can use # it with
# vnodes (num_tokens > 1, above) -- in which case you should provide a
# comma-separated list -- it's primarily used when adding nodes # to legacy clusters
# that do not have vnodes enabled.
# initial_token:
# RPC address to broadcast to drivers and other Scylla nodes. This cannot
# be set to 0.0.0.0. If left blank, this will be set to the value of
# rpc_address. If rpc_address is set to 0.0.0.0, broadcast_rpc_address must
# be set.
# broadcast_rpc_address: 1.2.3.4
# Uncomment to enable experimental features
# experimental: true
###################################################
## Not currently supported, reserved for future use
###################################################
# See http://wiki.apache.org/cassandra/HintedHandoff
# May either be "true" or "false" to enable globally, or contain a list
# of data centers to enable per-datacenter.
@@ -279,17 +249,17 @@ batch_size_fail_threshold_in_kb: 50
# Validity period for permissions cache (fetching permissions can be an
# expensive operation depending on the authorizer, CassandraAuthorizer is
# one example). Defaults to 10000, set to 0 to disable.
# one example). Defaults to 2000, set to 0 to disable.
# Will be disabled automatically for AllowAllAuthorizer.
# permissions_validity_in_ms: 10000
# permissions_validity_in_ms: 2000
# Refresh interval for permissions cache (if enabled).
# After this interval, cache entries become eligible for refresh. Upon next
# access, an async reload is scheduled and the old value returned until it
# completes. If permissions_validity_in_ms is non-zero, then this also must have
# a non-zero value. Defaults to 2000. It's recommended to set this value to
# be at least 3 times smaller than the permissions_validity_in_ms.
# permissions_update_interval_in_ms: 2000
# completes. If permissions_validity_in_ms is non-zero, then this must be
# also.
# Defaults to the same value as permissions_validity_in_ms.
# permissions_update_interval_in_ms: 1000
# The partitioner is responsible for distributing groups of rows (by
# partition key) across nodes in the cluster. You should leave this
@@ -303,6 +273,28 @@ batch_size_fail_threshold_in_kb: 50
#
partitioner: org.apache.cassandra.dht.Murmur3Partitioner
# policy for data disk failures:
# die: shut down gossip and Thrift and kill the JVM for any fs errors or
# single-sstable errors, so the node can be replaced.
# stop_paranoid: shut down gossip and Thrift even for single-sstable errors.
# stop: shut down gossip and Thrift, leaving the node effectively dead, but
# can still be inspected via JMX.
# best_effort: stop using the failed disk and respond to requests based on
# remaining available sstables. This means you WILL see obsolete
# data at CL.ONE!
# ignore: ignore fatal errors and let requests fail, as in pre-1.2 Scylla
# disk_failure_policy: stop
# policy for commit disk failures:
# die: shut down gossip and Thrift and kill the JVM, so the node can be replaced.
# stop: shut down gossip and Thrift, leaving the node effectively dead, but
# can still be inspected via JMX.
# stop_commit: shutdown the commit log, letting writes collect but
# continuing to service reads, as in pre-2.0.5 Scylla
# ignore: ignore fatal errors and let the batches fail
# commit_failure_policy: stop
# Maximum size of the key cache in memory.
#
# Each key cache hit saves 1 seek and each row cache hit saves 2 seeks at the
@@ -417,6 +409,29 @@ partitioner: org.apache.cassandra.dht.Murmur3Partitioner
# the smaller of 1/4 of heap or 512MB.
# file_cache_size_in_mb: 512
# Total permitted memory to use for memtables. Scylla will stop
# accepting writes when the limit is exceeded until a flush completes,
# and will trigger a flush based on memtable_cleanup_threshold
# If omitted, Scylla will set both to 1/4 the size of the heap.
# memtable_heap_space_in_mb: 2048
# memtable_offheap_space_in_mb: 2048
# Ratio of occupied non-flushing memtable size to total permitted size
# that will trigger a flush of the largest memtable. Lager mct will
# mean larger flushes and hence less compaction, but also less concurrent
# flush activity which can make it difficult to keep your disks fed
# under heavy write load.
#
# memtable_cleanup_threshold defaults to 1 / (memtable_flush_writers + 1)
# memtable_cleanup_threshold: 0.11
# Specify the way Scylla allocates and manages memtable memory.
# Options are:
# heap_buffers: on heap nio buffers
# offheap_buffers: off heap (direct) nio buffers
# offheap_objects: native memory, eliminating nio buffer heap overhead
# memtable_allocation_type: heap_buffers
# Total space to use for commitlogs.
#
# If space gets above this value (it will round up to the next nearest
@@ -428,6 +443,17 @@ partitioner: org.apache.cassandra.dht.Murmur3Partitioner
# available for Scylla.
commitlog_total_space_in_mb: -1
# This sets the amount of memtable flush writer threads. These will
# be blocked by disk io, and each one will hold a memtable in memory
# while blocked.
#
# memtable_flush_writers defaults to the smaller of (number of disks,
# number of cores), with a minimum of 2 and a maximum of 8.
#
# If your data directories are backed by SSD, you should increase this
# to the number of cores.
#memtable_flush_writers: 8
# A fixed memory pool size in MB for for SSTable index summaries. If left
# empty, this will default to 5% of the heap size. If the memory usage of
# all index summaries exceeds this limit, SSTables with low read rates will
@@ -492,6 +518,13 @@ commitlog_total_space_in_mb: -1
# Whether to start the thrift rpc server.
# start_rpc: true
# RPC address to broadcast to drivers and other Scylla nodes. This cannot
# be set to 0.0.0.0. If left blank, this will be set to the value of
# rpc_address. If rpc_address is set to 0.0.0.0, broadcast_rpc_address must
# be set.
# broadcast_rpc_address: 1.2.3.4
# enable or disable keepalive on rpc/native connections
# rpc_keepalive: true
@@ -729,17 +762,22 @@ commitlog_total_space_in_mb: -1
# certificate: conf/scylla.crt
# keyfile: conf/scylla.key
# truststore: <none, use system trust>
# require_client_auth: False
# priority_string: <none, use default>
# enable or disable client/server encryption.
# client_encryption_options:
# enabled: false
# certificate: conf/scylla.crt
# keyfile: conf/scylla.key
# truststore: <none, use system trust>
# require_client_auth: False
# priority_string: <none, use default>
# require_client_auth: false
# Set trustore and truststore_password if require_client_auth is true
# truststore: conf/.truststore
# truststore_password: cassandra
# More advanced defaults below:
# protocol: TLS
# algorithm: SunX509
# store_type: JKS
# cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA]
# internode_compression controls whether traffic between nodes is
# compressed.
@@ -775,33 +813,3 @@ commitlog_total_space_in_mb: -1
# freeing processor resources when there is other work to be done.
#
# defragment_memory_on_idle: true
#
# prometheus port
# By default, Scylla opens prometheus API port on port 9180
# setting the port to 0 will disable the prometheus API.
# prometheus_port: 9180
#
# prometheus address
# By default, Scylla binds all interfaces to the prometheus API
# It is possible to restrict the listening address to a specific one
# prometheus_address: 0.0.0.0
# Distribution of data among cores (shards) within a node
#
# Scylla distributes data within a node among shards, using a round-robin
# strategy:
# [shard0] [shard1] ... [shardN-1] [shard0] [shard1] ... [shardN-1] ...
#
# Scylla versions 1.6 and below used just one repetition of the pattern;
# this intefered with data placement among nodes (vnodes).
#
# Scylla versions 1.7 and above use 4096 repetitions of the pattern; this
# provides for better data distribution.
#
# the value below is log (base 2) of the number of repetitions.
#
# Set to 0 to avoid rewriting all data when upgrading from Scylla 1.6 and
# below.
#
# Keep at 12 for new clusters.
murmur3_partitioner_ignore_msb_bits: 12

View File

@@ -86,14 +86,14 @@ def try_compile(compiler, source = '', flags = []):
with tempfile.NamedTemporaryFile() as sfile:
sfile.file.write(bytes(source, 'utf-8'))
sfile.file.flush()
return subprocess.call([compiler, '-x', 'c++', '-o', '/dev/null', '-c', sfile.name] + args.user_cflags.split() + flags,
return subprocess.call([compiler, '-x', 'c++', '-o', '/dev/null', '-c', sfile.name] + flags,
stdout = subprocess.DEVNULL,
stderr = subprocess.DEVNULL) == 0
def warning_supported(warning, compiler):
# gcc ignores -Wno-x even if it is not supported
adjusted = re.sub('^-Wno-', '-W', warning)
return try_compile(flags = ['-Werror', adjusted], compiler = compiler)
return try_compile(flags = [adjusted], compiler = compiler)
def debug_flag(compiler):
src_with_auto = textwrap.dedent('''\
@@ -108,11 +108,6 @@ def debug_flag(compiler):
print('Note: debug information disabled; upgrade your compiler')
return ''
def maybe_static(flag, libs):
if flag and not args.static:
libs = '-Wl,-Bstatic {} -Wl,-Bdynamic'.format(libs)
return libs
class Thrift(object):
def __init__(self, source, service):
self.source = source
@@ -167,9 +162,7 @@ modes = {
scylla_tests = [
'tests/mutation_test',
'tests/mvcc_test',
'tests/streamed_mutation_test',
'tests/flat_mutation_reader_test',
'tests/schema_registry_test',
'tests/canonical_mutation_test',
'tests/range_test',
@@ -177,8 +170,6 @@ scylla_tests = [
'tests/keys_test',
'tests/partitioner_test',
'tests/frozen_mutation_test',
'tests/serialized_action_test',
'tests/clustering_ranges_walker_test',
'tests/perf/perf_mutation',
'tests/lsa_async_eviction_test',
'tests/lsa_sync_eviction_test',
@@ -187,22 +178,18 @@ scylla_tests = [
'tests/perf/perf_hash',
'tests/perf/perf_cql_parser',
'tests/perf/perf_simple_query',
'tests/perf/perf_fast_forward',
'tests/perf/perf_cache_eviction',
'tests/cache_flat_mutation_reader_test',
'tests/row_cache_stress_test',
'tests/memory_footprint',
'tests/perf/perf_sstable',
'tests/cql_query_test',
'tests/storage_proxy_test',
'tests/schema_change_test',
'tests/mutation_reader_test',
'tests/key_reader_test',
'tests/mutation_query_test',
'tests/row_cache_test',
'tests/test-serialization',
'tests/sstable_test',
'tests/sstable_mutation_test',
'tests/sstable_resharding_test',
'tests/memtable_test',
'tests/commitlog_test',
'tests/cartesian_product_test',
@@ -224,7 +211,6 @@ scylla_tests = [
'tests/murmur_hash_test',
'tests/allocation_strategy_test',
'tests/logalloc_test',
'tests/log_heap_test',
'tests/managed_vector_test',
'tests/crc_test',
'tests/flush_queue_test',
@@ -234,22 +220,6 @@ scylla_tests = [
'tests/range_tombstone_list_test',
'tests/anchorless_list_test',
'tests/database_test',
'tests/nonwrapping_range_test',
'tests/input_stream_test',
'tests/sstable_atomic_deletion_test',
'tests/virtual_reader_test',
'tests/view_schema_test',
'tests/counter_test',
'tests/cell_locker_test',
'tests/streaming_histogram_test',
'tests/duration_test',
'tests/vint_serialization_test',
'tests/compress_test',
'tests/chunked_vector_test',
'tests/loading_cache_test',
'tests/castas_fcts_test',
'tests/big_decimal_test',
'tests/aggregate_fcts_test',
]
apps = [
@@ -280,8 +250,6 @@ arg_parser.add_argument('--ldflags', action = 'store', dest = 'user_ldflags', de
help = 'Extra flags for the linker')
arg_parser.add_argument('--compiler', action = 'store', dest = 'cxx', default = 'g++',
help = 'C++ compiler path')
arg_parser.add_argument('--c-compiler', action='store', dest='cc', default='gcc',
help='C compiler path')
arg_parser.add_argument('--with-osv', action = 'store', dest = 'with_osv', default = '',
help = 'Shortcut for compile for OSv')
arg_parser.add_argument('--enable-dpdk', action = 'store_true', dest = 'dpdk', default = False,
@@ -293,19 +261,13 @@ arg_parser.add_argument('--debuginfo', action = 'store', dest = 'debuginfo', typ
arg_parser.add_argument('--static-stdc++', dest = 'staticcxx', action = 'store_true',
help = 'Link libgcc and libstdc++ statically')
arg_parser.add_argument('--static-thrift', dest = 'staticthrift', action = 'store_true',
help = 'Link libthrift statically')
arg_parser.add_argument('--static-boost', dest = 'staticboost', action = 'store_true',
help = 'Link boost statically')
help = 'Link libthrift statically')
arg_parser.add_argument('--tests-debuginfo', action = 'store', dest = 'tests_debuginfo', type = int, default = 0,
help = 'Enable(1)/disable(0)compiler debug information generation for tests')
arg_parser.add_argument('--python', action = 'store', dest = 'python', default = 'python3',
help = 'Python3 path')
add_tristate(arg_parser, name = 'hwloc', dest = 'hwloc', help = 'hwloc support')
add_tristate(arg_parser, name = 'xen', dest = 'xen', help = 'Xen support')
arg_parser.add_argument('--enable-gcc6-concepts', dest='gcc6_concepts', action='store_true', default=False,
help='enable experimental support for C++ Concepts as implemented in GCC 6')
arg_parser.add_argument('--enable-alloc-failure-injector', dest='alloc_failure_injector', action='store_true', default=False,
help='enable allocation failure injection')
args = arg_parser.parse_args()
defines = []
@@ -328,26 +290,24 @@ scylla_core = (['database.cc',
'memtable.cc',
'schema_mutations.cc',
'release.cc',
'supervisor.cc',
'utils/logalloc.cc',
'utils/large_bitset.cc',
'mutation_partition.cc',
'mutation_partition_view.cc',
'mutation_partition_serializer.cc',
'mutation_reader.cc',
'flat_mutation_reader.cc',
'mutation_query.cc',
'key_reader.cc',
'keys.cc',
'counters.cc',
'clustering_key_filter.cc',
'sstables/sstables.cc',
'sstables/compress.cc',
'sstables/row.cc',
'sstables/partition.cc',
'sstables/filter.cc',
'sstables/compaction.cc',
'sstables/compaction_strategy.cc',
'sstables/compaction_manager.cc',
'sstables/atomic_deletion.cc',
'sstables/integrity_checked_file_impl.cc',
'transport/event.cc',
'transport/event_notifier.cc',
'transport/server.cc',
@@ -362,19 +322,15 @@ scylla_core = (['database.cc',
'cql3/sets.cc',
'cql3/maps.cc',
'cql3/functions/functions.cc',
'cql3/functions/castas_fcts.cc',
'cql3/statements/cf_prop_defs.cc',
'cql3/statements/cf_statement.cc',
'cql3/statements/authentication_statement.cc',
'cql3/statements/create_keyspace_statement.cc',
'cql3/statements/create_table_statement.cc',
'cql3/statements/create_view_statement.cc',
'cql3/statements/create_type_statement.cc',
'cql3/statements/create_user_statement.cc',
'cql3/statements/drop_index_statement.cc',
'cql3/statements/drop_keyspace_statement.cc',
'cql3/statements/drop_table_statement.cc',
'cql3/statements/drop_view_statement.cc',
'cql3/statements/drop_type_statement.cc',
'cql3/statements/schema_altering_statement.cc',
'cql3/statements/ks_prop_defs.cc',
@@ -391,7 +347,6 @@ scylla_core = (['database.cc',
'cql3/statements/create_index_statement.cc',
'cql3/statements/truncate_statement.cc',
'cql3/statements/alter_table_statement.cc',
'cql3/statements/alter_view_statement.cc',
'cql3/statements/alter_user_statement.cc',
'cql3/statements/drop_user_statement.cc',
'cql3/statements/list_users_statement.cc',
@@ -437,22 +392,16 @@ scylla_core = (['database.cc',
'cql3/selection/selector.cc',
'cql3/restrictions/statement_restrictions.cc',
'cql3/result_set.cc',
'cql3/variable_specifications.cc',
'db/consistency_level.cc',
'db/system_keyspace.cc',
'db/schema_tables.cc',
'db/cql_type_parser.cc',
'db/legacy_schema_migrator.cc',
'db/commitlog/commitlog.cc',
'db/commitlog/commitlog_replayer.cc',
'db/commitlog/commitlog_entry.cc',
'db/config.cc',
'db/heat_load_balance.cc',
'db/index/secondary_index.cc',
'db/marshal/type_parser.cc',
'db/batchlog_manager.cc',
'db/view/view.cc',
'index/secondary_index_manager.cc',
'io/io.cc',
'utils/utils.cc',
'utils/UUID_gen.cc',
@@ -464,7 +413,6 @@ scylla_core = (['database.cc',
'utils/dynamic_bitset.cc',
'utils/managed_bytes.cc',
'utils/exceptions.cc',
'utils/config_file.cc',
'gms/version_generator.cc',
'gms/versioned_value.cc',
'gms/gossiper.cc',
@@ -474,11 +422,9 @@ scylla_core = (['database.cc',
'gms/gossip_digest_ack2.cc',
'gms/endpoint_state.cc',
'gms/application_state.cc',
'gms/inet_address.cc',
'dht/i_partitioner.cc',
'dht/murmur3_partitioner.cc',
'dht/byte_ordered_partitioner.cc',
'dht/random_partitioner.cc',
'dht/boot_strapper.cc',
'dht/range_streamer.cc',
'unimplemented.cc',
@@ -502,7 +448,7 @@ scylla_core = (['database.cc',
'service/client_state.cc',
'service/migration_task.cc',
'service/storage_service.cc',
'service/misc_services.cc',
'service/load_broadcaster.cc',
'service/pager/paging_state.cc',
'service/pager/query_pagers.cc',
'streaming/stream_task.cc',
@@ -518,33 +464,26 @@ scylla_core = (['database.cc',
'streaming/stream_manager.cc',
'streaming/stream_result_future.cc',
'streaming/stream_session_state.cc',
'clocks-impl.cc',
'gc_clock.cc',
'partition_slice_builder.cc',
'init.cc',
'lister.cc',
'repair/repair.cc',
'exceptions/exceptions.cc',
'auth/allow_all_authenticator.cc',
'auth/allow_all_authorizer.cc',
'dns.cc',
'auth/auth.cc',
'auth/authenticated_user.cc',
'auth/authenticator.cc',
'auth/common.cc',
'auth/authorizer.cc',
'auth/default_authorizer.cc',
'auth/data_resource.cc',
'auth/password_authenticator.cc',
'auth/permission.cc',
'auth/permissions_cache.cc',
'auth/service.cc',
'auth/transitional.cc',
'tracing/tracing.cc',
'tracing/trace_keyspace_helper.cc',
'tracing/trace_state.cc',
'table_helper.cc',
'range_tombstone.cc',
'range_tombstone_list.cc',
'disk-error-handler.cc',
'duration.cc',
'vint-serialization.cc',
'db/size_estimates_recorder.cc'
]
+ [Antlr3Grammar('cql3/Cql.g')]
+ [Thrift('interface/cassandra.thrift', 'Cassandra')]
@@ -605,8 +544,6 @@ idls = ['idl/gossip_digest.idl.hh',
'idl/idl_test.idl.hh',
'idl/commitlog.idl.hh',
'idl/tracing.idl.hh',
'idl/consistency_level.idl.hh',
'idl/cache_temperature.idl.hh',
]
scylla_tests_dependencies = scylla_core + api + idls + [
@@ -625,92 +562,61 @@ deps = {
'scylla': idls + ['main.cc'] + scylla_core + api,
}
pure_boost_tests = set([
tests_not_using_seastar_test_framework = set([
'tests/keys_test',
'tests/partitioner_test',
'tests/map_difference_test',
'tests/keys_test',
'tests/compound_test',
'tests/range_tombstone_list_test',
'tests/anchorless_list_test',
'tests/nonwrapping_range_test',
'tests/test-serialization',
'tests/range_test',
'tests/crc_test',
'tests/managed_vector_test',
'tests/dynamic_bitset_test',
'tests/idl_test',
'tests/cartesian_product_test',
'tests/streaming_histogram_test',
'tests/duration_test',
'tests/vint_serialization_test',
'tests/compress_test',
'tests/chunked_vector_test',
'tests/big_decimal_test',
])
tests_not_using_seastar_test_framework = set([
'tests/perf/perf_mutation',
'tests/lsa_async_eviction_test',
'tests/lsa_sync_eviction_test',
'tests/row_cache_alloc_stress',
'tests/perf_row_cache_update',
'tests/cartesian_product_test',
'tests/perf/perf_hash',
'tests/perf/perf_cql_parser',
'tests/message',
'tests/perf/perf_simple_query',
'tests/perf/perf_fast_forward',
'tests/perf/perf_cache_eviction',
'tests/row_cache_stress_test',
'tests/memory_footprint',
'tests/test-serialization',
'tests/gossip',
'tests/compound_test',
'tests/range_test',
'tests/crc_test',
'tests/perf/perf_sstable',
]) | pure_boost_tests
'tests/managed_vector_test',
'tests/dynamic_bitset_test',
'tests/idl_test',
'tests/range_tombstone_list_test',
'tests/anchorless_list_test',
])
for t in tests_not_using_seastar_test_framework:
if not t in scylla_tests:
raise Exception("Test %s not found in scylla_tests" % (t))
for t in scylla_tests:
deps[t] = [t + '.cc']
deps[t] = scylla_tests_dependencies + [t + '.cc']
if t not in tests_not_using_seastar_test_framework:
deps[t] += scylla_tests_dependencies
deps[t] += scylla_tests_seastar_deps
else:
deps[t] += scylla_core + api + idls + ['tests/cql_test_env.cc']
deps['tests/sstable_test'] += ['tests/sstable_datafile_test.cc', 'tests/sstable_utils.cc']
deps['tests/mutation_reader_test'] += ['tests/sstable_utils.cc']
deps['tests/sstable_test'] += ['tests/sstable_datafile_test.cc']
deps['tests/bytes_ostream_test'] = ['tests/bytes_ostream_test.cc', 'utils/managed_bytes.cc', 'utils/logalloc.cc', 'utils/dynamic_bitset.cc']
deps['tests/input_stream_test'] = ['tests/input_stream_test.cc']
deps['tests/UUID_test'] = ['utils/UUID_gen.cc', 'tests/UUID_test.cc', 'utils/uuid.cc', 'utils/managed_bytes.cc', 'utils/logalloc.cc', 'utils/dynamic_bitset.cc']
deps['tests/bytes_ostream_test'] = ['tests/bytes_ostream_test.cc']
deps['tests/UUID_test'] = ['utils/UUID_gen.cc', 'tests/UUID_test.cc']
deps['tests/murmur_hash_test'] = ['bytes.cc', 'utils/murmur_hash.cc', 'tests/murmur_hash_test.cc']
deps['tests/allocation_strategy_test'] = ['tests/allocation_strategy_test.cc', 'utils/logalloc.cc', 'utils/dynamic_bitset.cc']
deps['tests/log_heap_test'] = ['tests/log_heap_test.cc']
deps['tests/anchorless_list_test'] = ['tests/anchorless_list_test.cc']
warnings = [
'-Wno-mismatched-tags', # clang-only
'-Wno-maybe-uninitialized', # false positives on gcc 5
'-Wno-tautological-compare',
'-Wno-parentheses-equality',
'-Wno-c++11-narrowing',
'-Wno-c++1z-extensions',
'-Wno-sometimes-uninitialized',
'-Wno-return-stack-address',
'-Wno-missing-braces',
'-Wno-unused-lambda-capture',
'-Wno-misleading-indentation',
'-Wno-overflow',
'-Wno-noexcept-type',
'-Wno-nonnull-compare'
]
warnings = [w
for w in warnings
if warning_supported(warning = w, compiler = args.cxx)]
warnings = ' '.join(warnings + ['-Wno-error=deprecated-declarations'])
warnings = ' '.join(warnings)
dbgflag = debug_flag(args.cxx) if args.debuginfo else ''
tests_link_rule = 'link' if args.tests_debuginfo else 'link_stripped'
@@ -764,9 +670,6 @@ if not try_compile(compiler=args.cxx, source='''\
print('Installed boost version too old. Please update {}.'.format(pkgname("boost-devel")))
sys.exit(1)
has_sanitize_address_use_after_scope = try_compile(compiler=args.cxx, flags=['-fsanitize-address-use-after-scope'], source='int f() {}')
defines = ' '.join(['-D' + d for d in defines])
globals().update(vars(args))
@@ -789,7 +692,7 @@ scylla_release = file.read().strip()
extra_cxxflags["release.cc"] = "-DSCYLLA_VERSION=\"\\\"" + scylla_version + "\\\"\" -DSCYLLA_RELEASE=\"\\\"" + scylla_release + "\\\"\""
seastar_flags = []
seastar_flags = ['--disable-xen']
if args.dpdk:
# fake dependencies on dpdk, so that it is built before anything else
seastar_flags += ['--enable-dpdk']
@@ -797,16 +700,9 @@ elif args.dpdk_target:
seastar_flags += ['--dpdk-target', args.dpdk_target]
if args.staticcxx:
seastar_flags += ['--static-stdc++']
if args.staticboost:
seastar_flags += ['--static-boost']
if args.gcc6_concepts:
seastar_flags += ['--enable-gcc6-concepts']
if args.alloc_failure_injector:
seastar_flags += ['--enable-alloc-failure-injector']
seastar_cflags = args.user_cflags + " -march=nehalem"
seastar_ldflags = args.user_ldflags
seastar_flags += ['--compiler', args.cxx, '--c-compiler', args.cc, '--cflags=%s' % (seastar_cflags), '--ldflags=%s' %(seastar_ldflags)]
seastar_flags += ['--compiler', args.cxx, '--cflags=%s' % (seastar_cflags)]
status = subprocess.call([python, './configure.py'] + seastar_flags, cwd = 'seastar')
@@ -837,14 +733,7 @@ for mode in build_modes:
seastar_deps = 'practically_anything_can_change_so_lets_run_it_every_time_and_restat.'
args.user_cflags += " " + pkg_config("--cflags", "jsoncpp")
libs = ' '.join(['-lyaml-cpp', '-llz4', '-lz', '-lsnappy', pkg_config("--libs", "jsoncpp"),
maybe_static(args.staticboost, '-lboost_filesystem'), ' -lcrypt',
maybe_static(args.staticboost, '-lboost_date_time'),
])
if not args.staticboost:
args.user_cflags += ' -DBOOST_TEST_DYN_LINK'
libs = "-lyaml-cpp -llz4 -lz -lsnappy " + pkg_config("--libs", "jsoncpp") + ' -lboost_filesystem' + ' -lcrypt' + ' -lboost_date_time'
for pkg in pkgs:
args.user_cflags += ' ' + pkg_config('--cflags', pkg)
libs += ' ' + pkg_config('--libs', pkg)
@@ -870,12 +759,10 @@ with open(buildfile, 'w') as f:
builddir = {outdir}
cxx = {cxx}
cxxflags = {user_cflags} {warnings} {defines}
ldflags = -fuse-ld=gold {user_ldflags}
ldflags = {user_ldflags}
libs = {libs}
pool link_pool
depth = {link_pool_depth}
pool seastar_pool
depth = 1
rule ragel
command = ragel -G2 -o $out $in
description = RAGEL $out
@@ -901,7 +788,7 @@ with open(buildfile, 'w') as f:
f.write(textwrap.dedent('''\
cxxflags_{mode} = -I. -I $builddir/{mode}/gen -I seastar -I seastar/build/{mode}/gen
rule cxx.{mode}
command = $cxx -MD -MT $out -MF $out.d {seastar_cflags} $cxxflags $cxxflags_{mode} $obj_cxxflags -c -o $out $in
command = $cxx -MMD -MT $out -MF $out.d {seastar_cflags} $cxxflags $cxxflags_{mode} -c -o $out $in
description = CXX $out
depfile = $out.d
rule link.{mode}
@@ -919,17 +806,7 @@ with open(buildfile, 'w') as f:
command = thrift -gen cpp:cob_style -out $builddir/{mode}/gen $in
description = THRIFT $in
rule antlr3.{mode}
# We replace many local `ExceptionBaseType* ex` variables with a single function-scope one.
# Because we add such a variable to every function, and because `ExceptionBaseType` is not a global
# name, we also add a global typedef to avoid compilation errors.
command = sed -e '/^#if 0/,/^#endif/d' $in > $builddir/{mode}/gen/$in $
&& antlr3 $builddir/{mode}/gen/$in $
&& sed -i -e 's/^\\( *\)\\(ImplTraits::CommonTokenType\\* [a-zA-Z0-9_]* = NULL;\\)$$/\\1const \\2/' $
-e '1i using ExceptionBaseType = int;' $
-e 's/^{{/{{ ExceptionBaseType\* ex = nullptr;/; $
s/ExceptionBaseType\* ex = new/ex = new/; $
s/exceptions::syntax_exception e/exceptions::syntax_exception\& e/' $
build/{mode}/gen/${{stem}}Parser.cpp
command = sed -e '/^#if 0/,/^#endif/d' $in > $builddir/{mode}/gen/$in && antlr3 $builddir/{mode}/gen/$in && sed -i 's/^\\( *\)\\(ImplTraits::CommonTokenType\\* [a-zA-Z0-9_]* = NULL;\\)$$/\\1const \\2/' build/{mode}/gen/${{stem}}Parser.cpp
description = ANTLR3 $in
''').format(mode = mode, **modeval))
f.write('build {mode}: phony {artifacts}\n'.format(mode = mode,
@@ -954,15 +831,22 @@ with open(buildfile, 'w') as f:
objs += dep.objects('$builddir/' + mode + '/gen')
if isinstance(dep, Antlr3Grammar):
objs += dep.objects('$builddir/' + mode + '/gen')
if binary.endswith('.a'):
if binary.endswith('.pc'):
vars = modeval.copy()
vars.update(globals())
pc = textwrap.dedent('''\
Name: Seastar
URL: http://seastar-project.org/
Description: Advanced C++ framework for high-performance server applications on modern hardware.
Version: 1.0
Libs: -L{srcdir}/{builddir} -Wl,--whole-archive -lseastar -Wl,--no-whole-archive {dbgflag} -Wl,--no-as-needed {static} {pie} -fvisibility=hidden -pthread {user_ldflags} {libs} {sanitize_libs}
Cflags: -std=gnu++1y {dbgflag} {fpie} -Wall -Werror -fvisibility=hidden -pthread -I{srcdir} -I{srcdir}/{builddir}/gen {user_cflags} {warnings} {defines} {sanitize} {opt}
''').format(builddir = 'build/' + mode, srcdir = os.getcwd(), **vars)
f.write('build $builddir/{}/{}: gen\n text = {}\n'.format(mode, binary, repr(pc)))
elif binary.endswith('.a'):
f.write('build $builddir/{}/{}: ar.{} {}\n'.format(mode, binary, mode, str.join(' ', objs)))
else:
if binary.startswith('tests/'):
local_libs = '$libs'
if binary not in tests_not_using_seastar_test_framework or binary in pure_boost_tests:
local_libs += ' ' + maybe_static(args.staticboost, '-lboost_unit_test_framework')
if has_thrift:
local_libs += ' ' + thrift_libs + ' ' + maybe_static(args.staticboost, '-lboost_system')
# Our code's debugging information is huge, and multiplied
# by many tests yields ridiculous amounts of disk space.
# So we strip the tests by default; The user can very
@@ -970,15 +854,15 @@ with open(buildfile, 'w') as f:
# to the test name, e.g., "ninja build/release/testname_g"
f.write('build $builddir/{}/{}: {}.{} {} {}\n'.format(mode, binary, tests_link_rule, mode, str.join(' ', objs),
'seastar/build/{}/libseastar.a'.format(mode)))
f.write(' libs = {}\n'.format(local_libs))
if has_thrift:
f.write(' libs = {} -lboost_system $libs\n'.format(thrift_libs))
f.write('build $builddir/{}/{}_g: link.{} {} {}\n'.format(mode, binary, mode, str.join(' ', objs),
'seastar/build/{}/libseastar.a'.format(mode)))
f.write(' libs = {}\n'.format(local_libs))
else:
f.write('build $builddir/{}/{}: link.{} {} {}\n'.format(mode, binary, mode, str.join(' ', objs),
'seastar/build/{}/libseastar.a'.format(mode)))
if has_thrift:
f.write(' libs = {} {} $libs\n'.format(thrift_libs, maybe_static(args.staticboost, '-lboost_system')))
if has_thrift:
f.write(' libs = {} -lboost_system $libs\n'.format(thrift_libs))
for src in srcs:
if src.endswith('.cc'):
obj = '$builddir/' + mode + '/' + src.replace('.cc', '.o')
@@ -1017,7 +901,7 @@ with open(buildfile, 'w') as f:
f.write('build {}: ragel {}\n'.format(hh, src))
for hh in swaggers:
src = swaggers[hh]
f.write('build {}: swagger {} | seastar/json/json2code.py\n'.format(hh,src))
f.write('build {}: swagger {}\n'.format(hh,src))
for hh in serializers:
src = serializers[hh]
f.write('build {}: serializer {} | idl-compiler.py\n'.format(hh,src))
@@ -1034,12 +918,8 @@ with open(buildfile, 'w') as f:
for cc in grammar.sources('$builddir/{}/gen'.format(mode)):
obj = cc.replace('.cpp', '.o')
f.write('build {}: cxx.{} {} || {}\n'.format(obj, mode, cc, ' '.join(serializers)))
if cc.endswith('Parser.cpp') and has_sanitize_address_use_after_scope:
# Parsers end up using huge amounts of stack space and overflowing their stack
f.write(' obj_cxxflags = -fno-sanitize-address-use-after-scope\n')
f.write('build seastar/build/{mode}/libseastar.a seastar/build/{mode}/apps/iotune/iotune seastar/build/{mode}/gen/http/request_parser.hh seastar/build/{mode}/gen/http/http_response_parser.hh: ninja {seastar_deps}\n'
.format(**locals()))
f.write(' pool = seastar_pool\n')
f.write(' subdir = seastar\n')
f.write(' target = build/{mode}/libseastar.a build/{mode}/apps/iotune/iotune build/{mode}/gen/http/request_parser.hh build/{mode}/gen/http/http_response_parser.hh\n'.format(**locals()))
f.write(textwrap.dedent('''\

View File

@@ -22,7 +22,6 @@
#pragma once
#include "mutation_partition_view.hh"
#include "mutation_partition.hh"
#include "schema.hh"
// Mutation partition visitor which applies visited data into
@@ -38,12 +37,12 @@ private:
static bool is_compatible(const column_definition& new_def, const data_type& old_type, column_kind kind) {
return ::is_compatible(new_def.kind, kind) && new_def.type->is_value_compatible_with(*old_type);
}
static void accept_cell(row& dst, column_kind kind, const column_definition& new_def, const data_type& old_type, atomic_cell_view cell) {
void accept_cell(row& dst, column_kind kind, const column_definition& new_def, const data_type& old_type, atomic_cell_view cell) {
if (is_compatible(new_def, old_type, kind) && cell.timestamp() > new_def.dropped_at()) {
dst.apply(new_def, atomic_cell_or_collection(cell));
}
}
static void accept_cell(row& dst, column_kind kind, const column_definition& new_def, const data_type& old_type, collection_mutation_view cell) {
void accept_cell(row& dst, column_kind kind, const column_definition& new_def, const data_type& old_type, collection_mutation_view cell) {
if (!is_compatible(new_def, old_type, kind)) {
return;
}
@@ -95,8 +94,8 @@ public:
_p.apply_row_tombstone(_p_schema, rt);
}
virtual void accept_row(position_in_partition_view key, const row_tombstone& deleted_at, const row_marker& rm, is_dummy dummy, is_continuous continuous) override {
deletable_row& r = _p.clustered_row(_p_schema, key, dummy, continuous);
virtual void accept_row(clustering_key_view key, tombstone deleted_at, const row_marker& rm) override {
deletable_row& r = _p.clustered_row(_p_schema, key);
r.apply(rm);
r.apply(deleted_at);
_current_row = &r;
@@ -117,14 +116,4 @@ public:
accept_cell(_current_row->cells(), column_kind::regular_column, *def, col.type(), collection);
}
}
// Appends the cell to dst upgrading it to the new schema.
// Cells must have monotonic names.
static void append_cell(row& dst, column_kind kind, const column_definition& new_def, const data_type& old_type, const atomic_cell_or_collection& cell) {
if (new_def.is_atomic()) {
accept_cell(dst, kind, new_def, old_type, cell.as_atomic_cell());
} else {
accept_cell(dst, kind, new_def, old_type, cell.as_collection_mutation());
}
}
};

View File

@@ -1,332 +0,0 @@
/*
* Copyright (C) 2016 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "service/storage_service.hh"
#include "counters.hh"
#include "mutation.hh"
#include "combine.hh"
counter_id counter_id::local()
{
return counter_id(service::get_local_storage_service().get_local_id());
}
bool counter_id::less_compare_1_7_4::operator()(const counter_id& a, const counter_id& b) const
{
if (a._most_significant != b._most_significant) {
return a._most_significant < b._most_significant;
} else {
return a._least_significant < b._least_significant;
}
}
std::ostream& operator<<(std::ostream& os, const counter_id& id) {
return os << id.to_uuid();
}
std::ostream& operator<<(std::ostream& os, counter_shard_view csv) {
return os << "{global_shard id: " << csv.id() << " value: " << csv.value()
<< " clock: " << csv.logical_clock() << "}";
}
std::ostream& operator<<(std::ostream& os, counter_cell_view ccv) {
return os << "{counter_cell timestamp: " << ccv.timestamp() << " shards: {" << ::join(", ", ccv.shards()) << "}}";
}
void counter_cell_builder::do_sort_and_remove_duplicates()
{
boost::range::sort(_shards, [] (auto& a, auto& b) { return a.id() < b.id(); });
std::vector<counter_shard> new_shards;
new_shards.reserve(_shards.size());
for (auto& cs : _shards) {
if (new_shards.empty() || new_shards.back().id() != cs.id()) {
new_shards.emplace_back(cs);
} else {
new_shards.back().apply(cs);
}
}
_shards = std::move(new_shards);
_sorted = true;
}
std::vector<counter_shard> counter_cell_view::shards_compatible_with_1_7_4() const
{
auto sorted_shards = boost::copy_range<std::vector<counter_shard>>(shards());
counter_id::less_compare_1_7_4 cmp;
boost::range::sort(sorted_shards, [&] (auto& a, auto& b) {
return cmp(a.id(), b.id());
});
return sorted_shards;
}
static bool apply_in_place(atomic_cell_or_collection& dst, atomic_cell_or_collection& src)
{
auto dst_ccmv = counter_cell_mutable_view(dst.as_mutable_atomic_cell());
auto src_ccmv = counter_cell_mutable_view(src.as_mutable_atomic_cell());
auto dst_shards = dst_ccmv.shards();
auto src_shards = src_ccmv.shards();
auto dst_it = dst_shards.begin();
auto src_it = src_shards.begin();
while (src_it != src_shards.end()) {
while (dst_it != dst_shards.end() && dst_it->id() < src_it->id()) {
++dst_it;
}
if (dst_it == dst_shards.end() || dst_it->id() != src_it->id()) {
// Fast-path failed. Revert and fall back to the slow path.
if (dst_it == dst_shards.end()) {
--dst_it;
}
while (src_it != src_shards.begin()) {
--src_it;
while (dst_it->id() != src_it->id()) {
--dst_it;
}
src_it->swap_value_and_clock(*dst_it);
}
return false;
}
if (dst_it->logical_clock() < src_it->logical_clock()) {
dst_it->swap_value_and_clock(*src_it);
} else {
src_it->set_value_and_clock(*dst_it);
}
++src_it;
}
auto dst_ts = dst_ccmv.timestamp();
auto src_ts = src_ccmv.timestamp();
dst_ccmv.set_timestamp(std::max(dst_ts, src_ts));
src_ccmv.set_timestamp(dst_ts);
src.as_mutable_atomic_cell().set_counter_in_place_revert(true);
return true;
}
static void revert_in_place_apply(atomic_cell_or_collection& dst, atomic_cell_or_collection& src)
{
assert(dst.can_use_mutable_view() && src.can_use_mutable_view());
auto dst_ccmv = counter_cell_mutable_view(dst.as_mutable_atomic_cell());
auto src_ccmv = counter_cell_mutable_view(src.as_mutable_atomic_cell());
auto dst_shards = dst_ccmv.shards();
auto src_shards = src_ccmv.shards();
auto dst_it = dst_shards.begin();
auto src_it = src_shards.begin();
while (src_it != src_shards.end()) {
while (dst_it != dst_shards.end() && dst_it->id() < src_it->id()) {
++dst_it;
}
assert(dst_it != dst_shards.end() && dst_it->id() == src_it->id());
dst_it->swap_value_and_clock(*src_it);
++src_it;
}
auto dst_ts = dst_ccmv.timestamp();
auto src_ts = src_ccmv.timestamp();
dst_ccmv.set_timestamp(src_ts);
src_ccmv.set_timestamp(dst_ts);
src.as_mutable_atomic_cell().set_counter_in_place_revert(false);
}
bool counter_cell_view::apply_reversibly(atomic_cell_or_collection& dst, atomic_cell_or_collection& src)
{
auto dst_ac = dst.as_atomic_cell();
auto src_ac = src.as_atomic_cell();
if (!dst_ac.is_live() || !src_ac.is_live()) {
if (dst_ac.is_live() || (!src_ac.is_live() && compare_atomic_cell_for_merge(dst_ac, src_ac) < 0)) {
std::swap(dst, src);
return true;
}
return false;
}
if (dst_ac.is_counter_update() && src_ac.is_counter_update()) {
auto src_v = src_ac.counter_update_value();
auto dst_v = dst_ac.counter_update_value();
dst = atomic_cell::make_live_counter_update(std::max(dst_ac.timestamp(), src_ac.timestamp()),
src_v + dst_v);
return true;
}
assert(!dst_ac.is_counter_update());
assert(!src_ac.is_counter_update());
if (counter_cell_view(dst_ac).shard_count() >= counter_cell_view(src_ac).shard_count()
&& dst.can_use_mutable_view() && src.can_use_mutable_view()) {
if (apply_in_place(dst, src)) {
return true;
}
}
src.as_mutable_atomic_cell().set_counter_in_place_revert(false);
auto dst_shards = counter_cell_view(dst_ac).shards();
auto src_shards = counter_cell_view(src_ac).shards();
counter_cell_builder result;
combine(dst_shards.begin(), dst_shards.end(), src_shards.begin(), src_shards.end(),
result.inserter(), counter_shard_view::less_compare_by_id(), [] (auto& x, auto& y) {
return x.logical_clock() < y.logical_clock() ? y : x;
});
auto cell = result.build(std::max(dst_ac.timestamp(), src_ac.timestamp()));
src = std::exchange(dst, atomic_cell_or_collection(cell));
return true;
}
void counter_cell_view::revert_apply(atomic_cell_or_collection& dst, atomic_cell_or_collection& src)
{
if (dst.as_atomic_cell().is_counter_update()) {
auto src_v = src.as_atomic_cell().counter_update_value();
auto dst_v = dst.as_atomic_cell().counter_update_value();
dst = atomic_cell::make_live(dst.as_atomic_cell().timestamp(),
long_type->decompose(dst_v - src_v));
} else if (src.as_atomic_cell().is_counter_in_place_revert_set()) {
revert_in_place_apply(dst, src);
} else {
std::swap(dst, src);
}
}
stdx::optional<atomic_cell> counter_cell_view::difference(atomic_cell_view a, atomic_cell_view b)
{
assert(!a.is_counter_update());
assert(!b.is_counter_update());
if (!b.is_live() || !a.is_live()) {
if (b.is_live() || (!a.is_live() && compare_atomic_cell_for_merge(b, a) < 0)) {
return atomic_cell(a);
}
return { };
}
auto a_shards = counter_cell_view(a).shards();
auto b_shards = counter_cell_view(b).shards();
auto a_it = a_shards.begin();
auto a_end = a_shards.end();
auto b_it = b_shards.begin();
auto b_end = b_shards.end();
counter_cell_builder result;
while (a_it != a_end) {
while (b_it != b_end && (*b_it).id() < (*a_it).id()) {
++b_it;
}
if (b_it == b_end || (*a_it).id() != (*b_it).id() || (*a_it).logical_clock() > (*b_it).logical_clock()) {
result.add_shard(counter_shard(*a_it));
}
++a_it;
}
stdx::optional<atomic_cell> diff;
if (!result.empty()) {
diff = result.build(std::max(a.timestamp(), b.timestamp()));
} else if (a.timestamp() > b.timestamp()) {
diff = atomic_cell::make_live(a.timestamp(), bytes_view());
}
return diff;
}
void transform_counter_updates_to_shards(mutation& m, const mutation* current_state, uint64_t clock_offset) {
// FIXME: allow current_state to be frozen_mutation
auto transform_new_row_to_shards = [clock_offset] (auto& cells) {
cells.for_each_cell([clock_offset] (auto, atomic_cell_or_collection& ac_o_c) {
auto acv = ac_o_c.as_atomic_cell();
if (!acv.is_live()) {
return; // continue -- we are in lambda
}
auto delta = acv.counter_update_value();
auto cs = counter_shard(counter_id::local(), delta, clock_offset + 1);
ac_o_c = counter_cell_builder::from_single_shard(acv.timestamp(), cs);
});
};
if (!current_state) {
transform_new_row_to_shards(m.partition().static_row());
for (auto& cr : m.partition().clustered_rows()) {
transform_new_row_to_shards(cr.row().cells());
}
return;
}
clustering_key::less_compare cmp(*m.schema());
auto transform_row_to_shards = [clock_offset] (auto& transformee, auto& state) {
std::deque<std::pair<column_id, counter_shard>> shards;
state.for_each_cell([&] (column_id id, const atomic_cell_or_collection& ac_o_c) {
auto acv = ac_o_c.as_atomic_cell();
if (!acv.is_live()) {
return; // continue -- we are in lambda
}
counter_cell_view ccv(acv);
auto cs = ccv.local_shard();
if (!cs) {
return; // continue
}
shards.emplace_back(std::make_pair(id, counter_shard(*cs)));
});
transformee.for_each_cell([&] (column_id id, atomic_cell_or_collection& ac_o_c) {
auto acv = ac_o_c.as_atomic_cell();
if (!acv.is_live()) {
return; // continue -- we are in lambda
}
while (!shards.empty() && shards.front().first < id) {
shards.pop_front();
}
auto delta = acv.counter_update_value();
if (shards.empty() || shards.front().first > id) {
auto cs = counter_shard(counter_id::local(), delta, clock_offset + 1);
ac_o_c = counter_cell_builder::from_single_shard(acv.timestamp(), cs);
} else {
auto& cs = shards.front().second;
cs.update(delta, clock_offset + 1);
ac_o_c = counter_cell_builder::from_single_shard(acv.timestamp(), cs);
shards.pop_front();
}
});
};
transform_row_to_shards(m.partition().static_row(), current_state->partition().static_row());
auto& cstate = current_state->partition();
auto it = cstate.clustered_rows().begin();
auto end = cstate.clustered_rows().end();
for (auto& cr : m.partition().clustered_rows()) {
while (it != end && cmp(it->key(), cr.key())) {
++it;
}
if (it == end || cmp(cr.key(), it->key())) {
transform_new_row_to_shards(cr.row().cells());
continue;
}
transform_row_to_shards(cr.row().cells(), it->row().cells());
}
}

View File

@@ -1,435 +0,0 @@
/*
* Copyright (C) 2016 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <boost/range/algorithm/find_if.hpp>
#include "atomic_cell_or_collection.hh"
#include "types.hh"
#include "stdx.hh"
class mutation;
class mutation;
class counter_id {
int64_t _least_significant;
int64_t _most_significant;
public:
static_assert(std::is_same<decltype(std::declval<utils::UUID>().get_least_significant_bits()), int64_t>::value
&& std::is_same<decltype(std::declval<utils::UUID>().get_most_significant_bits()), int64_t>::value,
"utils::UUID is expected to work with two signed 64-bit integers");
counter_id() = default;
explicit counter_id(utils::UUID uuid) noexcept
: _least_significant(uuid.get_least_significant_bits())
, _most_significant(uuid.get_most_significant_bits())
{ }
utils::UUID to_uuid() const {
return utils::UUID(_most_significant, _least_significant);
}
bool operator<(const counter_id& other) const {
return to_uuid() < other.to_uuid();
}
bool operator>(const counter_id& other) const {
return other.to_uuid() < to_uuid();
}
bool operator==(const counter_id& other) const {
return to_uuid() == other.to_uuid();
}
bool operator!=(const counter_id& other) const {
return !(*this == other);
}
public:
// (Wrong) Counter ID ordering used by Scylla 1.7.4 and earlier.
struct less_compare_1_7_4 {
bool operator()(const counter_id& a, const counter_id& b) const;
};
public:
static counter_id local();
// For tests.
static counter_id generate_random() {
return counter_id(utils::make_random_uuid());
}
};
static_assert(std::is_pod<counter_id>::value, "counter_id should be a POD type");
std::ostream& operator<<(std::ostream& os, const counter_id& id);
template<typename View>
class basic_counter_shard_view {
enum class offset : unsigned {
id = 0u,
value = unsigned(id) + sizeof(counter_id),
logical_clock = unsigned(value) + sizeof(int64_t),
total_size = unsigned(logical_clock) + sizeof(int64_t),
};
private:
typename View::pointer _base;
private:
template<typename T>
T read(offset off) const {
T value;
std::copy_n(_base + static_cast<unsigned>(off), sizeof(T), reinterpret_cast<signed char*>(&value));
return value;
}
public:
static constexpr auto size = size_t(offset::total_size);
public:
basic_counter_shard_view() = default;
explicit basic_counter_shard_view(typename View::pointer ptr) noexcept
: _base(ptr) { }
counter_id id() const { return read<counter_id>(offset::id); }
int64_t value() const { return read<int64_t>(offset::value); }
int64_t logical_clock() const { return read<int64_t>(offset::logical_clock); }
void swap_value_and_clock(basic_counter_shard_view& other) noexcept {
static constexpr size_t off = size_t(offset::value);
static constexpr size_t size = size_t(offset::total_size) - off;
typename View::value_type tmp[size];
std::copy_n(_base + off, size, tmp);
std::copy_n(other._base + off, size, _base + off);
std::copy_n(tmp, size, other._base + off);
}
void set_value_and_clock(const basic_counter_shard_view& other) noexcept {
static constexpr size_t off = size_t(offset::value);
static constexpr size_t size = size_t(offset::total_size) - off;
std::copy_n(other._base + off, size, _base + off);
}
bool operator==(const basic_counter_shard_view& other) const {
return id() == other.id() && value() == other.value()
&& logical_clock() == other.logical_clock();
}
bool operator!=(const basic_counter_shard_view& other) const {
return !(*this == other);
}
struct less_compare_by_id {
bool operator()(const basic_counter_shard_view& x, const basic_counter_shard_view& y) const {
return x.id() < y.id();
}
};
};
using counter_shard_view = basic_counter_shard_view<bytes_view>;
std::ostream& operator<<(std::ostream& os, counter_shard_view csv);
class counter_shard {
counter_id _id;
int64_t _value;
int64_t _logical_clock;
private:
template<typename T>
static void write(const T& value, bytes::iterator& out) {
out = std::copy_n(reinterpret_cast<const signed char*>(&value), sizeof(T), out);
}
private:
// Shared logic for applying counter_shards and counter_shard_views.
// T is either counter_shard or basic_counter_shard_view<U>.
template<typename T>
GCC6_CONCEPT(requires requires(T shard) {
{ shard.value() } -> int64_t;
{ shard.logical_clock() } -> int64_t;
})
counter_shard& do_apply(T&& other) noexcept {
auto other_clock = other.logical_clock();
if (_logical_clock < other_clock) {
_logical_clock = other_clock;
_value = other.value();
}
return *this;
}
public:
counter_shard(counter_id id, int64_t value, int64_t logical_clock) noexcept
: _id(id)
, _value(value)
, _logical_clock(logical_clock)
{ }
explicit counter_shard(counter_shard_view csv) noexcept
: _id(csv.id())
, _value(csv.value())
, _logical_clock(csv.logical_clock())
{ }
counter_id id() const { return _id; }
int64_t value() const { return _value; }
int64_t logical_clock() const { return _logical_clock; }
counter_shard& update(int64_t value_delta, int64_t clock_increment) noexcept {
_value += value_delta;
_logical_clock += clock_increment;
return *this;
}
counter_shard& apply(counter_shard_view other) noexcept {
return do_apply(other);
}
counter_shard& apply(const counter_shard& other) noexcept {
return do_apply(other);
}
static size_t serialized_size() {
return counter_shard_view::size;
}
void serialize(bytes::iterator& out) const {
write(_id, out);
write(_value, out);
write(_logical_clock, out);
}
};
class counter_cell_builder {
std::vector<counter_shard> _shards;
bool _sorted = true;
private:
void do_sort_and_remove_duplicates();
public:
counter_cell_builder() = default;
counter_cell_builder(size_t shard_count) {
_shards.reserve(shard_count);
}
void add_shard(const counter_shard& cs) {
_shards.emplace_back(cs);
}
void add_maybe_unsorted_shard(const counter_shard& cs) {
add_shard(cs);
if (_sorted && _shards.size() > 1) {
auto current = _shards.rbegin();
auto previous = std::next(current);
_sorted = current->id() > previous->id();
}
}
void sort_and_remove_duplicates() {
if (!_sorted) {
do_sort_and_remove_duplicates();
}
}
size_t serialized_size() const {
return _shards.size() * counter_shard::serialized_size();
}
void serialize(bytes::iterator& out) const {
for (auto&& cs : _shards) {
cs.serialize(out);
}
}
bool empty() const {
return _shards.empty();
}
atomic_cell build(api::timestamp_type timestamp) const {
return atomic_cell::make_live_from_serializer(timestamp, serialized_size(), [this] (bytes::iterator out) {
serialize(out);
});
}
static atomic_cell from_single_shard(api::timestamp_type timestamp, const counter_shard& cs) {
return atomic_cell::make_live_from_serializer(timestamp, counter_shard::serialized_size(), [&cs] (bytes::iterator out) {
cs.serialize(out);
});
}
class inserter_iterator : public std::iterator<std::output_iterator_tag, counter_shard> {
counter_cell_builder* _builder;
public:
explicit inserter_iterator(counter_cell_builder& b) : _builder(&b) { }
inserter_iterator& operator=(const counter_shard& cs) {
_builder->add_shard(cs);
return *this;
}
inserter_iterator& operator=(const counter_shard_view& csv) {
return operator=(counter_shard(csv));
}
inserter_iterator& operator++() { return *this; }
inserter_iterator& operator++(int) { return *this; }
inserter_iterator& operator*() { return *this; };
};
inserter_iterator inserter() {
return inserter_iterator(*this);
}
};
// <counter_id> := <int64_t><int64_t>
// <shard> := <counter_id><int64_t:value><int64_t:logical_clock>
// <counter_cell> := <shard>*
template<typename View>
class basic_counter_cell_view {
protected:
atomic_cell_base<View> _cell;
private:
class shard_iterator : public std::iterator<std::input_iterator_tag, basic_counter_shard_view<View>> {
typename View::pointer _current;
basic_counter_shard_view<View> _current_view;
public:
shard_iterator() = default;
shard_iterator(typename View::pointer ptr) noexcept
: _current(ptr), _current_view(ptr) { }
basic_counter_shard_view<View>& operator*() noexcept {
return _current_view;
}
basic_counter_shard_view<View>* operator->() noexcept {
return &_current_view;
}
shard_iterator& operator++() noexcept {
_current += counter_shard_view::size;
_current_view = basic_counter_shard_view<View>(_current);
return *this;
}
shard_iterator operator++(int) noexcept {
auto it = *this;
operator++();
return it;
}
shard_iterator& operator--() noexcept {
_current -= counter_shard_view::size;
_current_view = basic_counter_shard_view<View>(_current);
return *this;
}
shard_iterator operator--(int) noexcept {
auto it = *this;
operator--();
return it;
}
bool operator==(const shard_iterator& other) const noexcept {
return _current == other._current;
}
bool operator!=(const shard_iterator& other) const noexcept {
return !(*this == other);
}
};
public:
boost::iterator_range<shard_iterator> shards() const {
auto bv = _cell.value();
auto begin = shard_iterator(bv.data());
auto end = shard_iterator(bv.data() + bv.size());
return boost::make_iterator_range(begin, end);
}
size_t shard_count() const {
return _cell.value().size() / counter_shard_view::size;
}
public:
// ac must be a live counter cell
explicit basic_counter_cell_view(atomic_cell_base<View> ac) noexcept : _cell(ac) {
assert(_cell.is_live());
assert(!_cell.is_counter_update());
}
api::timestamp_type timestamp() const { return _cell.timestamp(); }
static data_type total_value_type() { return long_type; }
int64_t total_value() const {
return boost::accumulate(shards(), int64_t(0), [] (int64_t v, counter_shard_view cs) {
return v + cs.value();
});
}
stdx::optional<counter_shard_view> get_shard(const counter_id& id) const {
auto it = boost::range::find_if(shards(), [&id] (counter_shard_view csv) {
return csv.id() == id;
});
if (it == shards().end()) {
return { };
}
return *it;
}
stdx::optional<counter_shard_view> local_shard() const {
// TODO: consider caching local shard position
return get_shard(counter_id::local());
}
bool operator==(const basic_counter_cell_view& other) const {
return timestamp() == other.timestamp() && boost::equal(shards(), other.shards());
}
};
struct counter_cell_view : basic_counter_cell_view<bytes_view> {
using basic_counter_cell_view::basic_counter_cell_view;
// Returns counter shards in an order that is compatible with Scylla 1.7.4.
std::vector<counter_shard> shards_compatible_with_1_7_4() const;
// Reversibly applies two counter cells, at least one of them must be live.
// Returns true iff dst was modified.
static bool apply_reversibly(atomic_cell_or_collection& dst, atomic_cell_or_collection& src);
// Reverts apply performed by apply_reversible().
static void revert_apply(atomic_cell_or_collection& dst, atomic_cell_or_collection& src);
// Computes a counter cell containing minimal amount of data which, when
// applied to 'b' returns the same cell as 'a' and 'b' applied together.
static stdx::optional<atomic_cell> difference(atomic_cell_view a, atomic_cell_view b);
friend std::ostream& operator<<(std::ostream& os, counter_cell_view ccv);
};
struct counter_cell_mutable_view : basic_counter_cell_view<bytes_mutable_view> {
using basic_counter_cell_view::basic_counter_cell_view;
void set_timestamp(api::timestamp_type ts) { _cell.set_timestamp(ts); }
};
// Transforms mutation dst from counter updates to counter shards using state
// stored in current_state.
// If current_state is present it has to be in the same schema as dst.
void transform_counter_updates_to_shards(mutation& dst, const mutation* current_state, uint64_t clock_offset);
template<>
struct appending_hash<counter_shard_view> {
template<typename Hasher>
void operator()(Hasher& h, const counter_shard_view& cshard) const {
::feed_hash(h, cshard.id().to_uuid());
::feed_hash(h, cshard.value());
::feed_hash(h, cshard.logical_clock());
}
};
template<>
struct appending_hash<counter_cell_view> {
template<typename Hasher>
void operator()(Hasher& h, const counter_cell_view& cell) const {
::feed_hash(h, true); // is_live
::feed_hash(h, cell.timestamp());
for (auto&& csv : cell.shards()) {
::feed_hash(h, csv);
}
}
};

View File

@@ -1,89 +0,0 @@
/*
* Copyright (C) 2017 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <seastar/core/thread.hh>
#include <seastar/core/timer.hh>
#include <chrono>
// Simple proportional controller to adjust shares of memtable/streaming flushes.
//
// Goal is to flush as fast as we can, but not so fast that we steal all the CPU from incoming
// requests, and at the same time minimize user-visible fluctuations in the flush quota.
//
// What that translates to is we'll try to keep virtual dirty's firt derivative at 0 (IOW, we keep
// virtual dirty constant), which means that the rate of incoming writes is equal to the rate of
// flushed bytes.
//
// The exact point at which the controller stops determines the desired flush CPU usage. As we
// approach the hard dirty limit, we need to be more aggressive. We will therefore define two
// thresholds, and increase the constant as we cross them.
//
// 1) the soft limit line
// 2) halfway between soft limit and dirty limit
//
// The constants q1 and q2 are used to determine the proportional factor at each stage.
//
// Below the soft limit, we are in no particular hurry to flush, since it means we're set to
// complete flushing before we a new memtable is ready. The quota is dirty * q1, and q1 is set to a
// low number.
//
// The first half of the virtual dirty region is where we expect to be usually, so we have a low
// slope corresponding to a sluggish response between q1 * soft_limit and q2.
//
// In the second half, we're getting close to the hard dirty limit so we increase the slope and
// become more responsive, up to a maximum quota of qmax.
//
// For now we'll just set them in the structure not to complicate the constructor. But q1, q2 and
// qmax can easily become parameters if we find another user.
class flush_cpu_controller {
static constexpr float hard_dirty_limit = 0.50;
static constexpr float q1 = 0.01;
static constexpr float q2 = 0.2;
static constexpr float qmax = 1;
float _current_quota = 0.0f;
float _goal;
std::function<float()> _current_dirty;
std::chrono::milliseconds _interval;
timer<> _update_timer;
seastar::thread_scheduling_group _scheduling_group;
seastar::thread_scheduling_group *_current_scheduling_group = nullptr;
void adjust();
public:
seastar::thread_scheduling_group* scheduling_group() {
return _current_scheduling_group;
}
float current_quota() const {
return _current_quota;
}
struct disabled {
seastar::thread_scheduling_group *backup;
};
flush_cpu_controller(disabled d) : _scheduling_group(std::chrono::nanoseconds(0), 0), _current_scheduling_group(d.backup) {}
flush_cpu_controller(std::chrono::milliseconds interval, float soft_limit, std::function<float()> current_dirty);
flush_cpu_controller(flush_cpu_controller&&) = default;
};

View File

@@ -36,19 +36,15 @@ options {
#include "cql3/statements/raw/select_statement.hh"
#include "cql3/statements/alter_keyspace_statement.hh"
#include "cql3/statements/alter_table_statement.hh"
#include "cql3/statements/alter_view_statement.hh"
#include "cql3/statements/create_keyspace_statement.hh"
#include "cql3/statements/drop_keyspace_statement.hh"
#include "cql3/statements/create_index_statement.hh"
#include "cql3/statements/create_table_statement.hh"
#include "cql3/statements/create_view_statement.hh"
#include "cql3/statements/create_type_statement.hh"
#include "cql3/statements/drop_type_statement.hh"
#include "cql3/statements/alter_type_statement.hh"
#include "cql3/statements/property_definitions.hh"
#include "cql3/statements/drop_index_statement.hh"
#include "cql3/statements/drop_table_statement.hh"
#include "cql3/statements/drop_view_statement.hh"
#include "cql3/statements/truncate_statement.hh"
#include "cql3/statements/raw/update_statement.hh"
#include "cql3/statements/raw/insert_statement.hh"
@@ -319,7 +315,9 @@ cqlStatement returns [shared_ptr<raw::parsed_statement> stmt]
| st10=createIndexStatement { $stmt = st10; }
| st11=dropKeyspaceStatement { $stmt = st11; }
| st12=dropTableStatement { $stmt = st12; }
#if 0
| st13=dropIndexStatement { $stmt = st13; }
#endif
| st14=alterTableStatement { $stmt = st14; }
| st15=alterKeyspaceStatement { $stmt = st15; }
| st16=grantStatement { $stmt = st16; }
@@ -342,9 +340,6 @@ cqlStatement returns [shared_ptr<raw::parsed_statement> stmt]
| st30=createAggregateStatement { $stmt = st30; }
| st31=dropAggregateStatement { $stmt = st31; }
#endif
| st32=createViewStatement { $stmt = st32; }
| st33=alterViewStatement { $stmt = st33; }
| st34=dropViewStatement { $stmt = st34; }
;
/*
@@ -399,7 +394,6 @@ unaliasedSelector returns [shared_ptr<selectable::raw> s]
| K_WRITETIME '(' c=cident ')' { tmp = make_shared<selectable::writetime_or_ttl::raw>(c, true); }
| K_TTL '(' c=cident ')' { tmp = make_shared<selectable::writetime_or_ttl::raw>(c, false); }
| f=functionName args=selectionFunctionArgs { tmp = ::make_shared<selectable::with_function::raw>(std::move(f), std::move(args)); }
| K_CAST '(' arg=unaliasedSelector K_AS t=native_type ')' { tmp = ::make_shared<selectable::with_cast::raw>(std::move(arg), std::move(t)); }
)
( '.' fi=cident { tmp = make_shared<selectable::with_field_selection::raw>(std::move(tmp), std::move(fi)); } )*
{ $s = tmp; }
@@ -722,7 +716,7 @@ createTableStatement returns [shared_ptr<cql3::statements::create_table_statemen
cfamDefinition[shared_ptr<cql3::statements::create_table_statement::raw_statement> expr]
: '(' cfamColumns[expr] ( ',' cfamColumns[expr]? )* ')'
( K_WITH cfamProperty[$expr->properties()] ( K_AND cfamProperty[$expr->properties()] )*)?
( K_WITH cfamProperty[expr] ( K_AND cfamProperty[expr] )*)?
;
cfamColumns[shared_ptr<cql3::statements::create_table_statement::raw_statement> expr]
@@ -738,15 +732,15 @@ pkDef[shared_ptr<cql3::statements::create_table_statement::raw_statement> expr]
| '(' k1=ident { l.push_back(k1); } ( ',' kn=ident { l.push_back(kn); } )* ')' { $expr->add_key_aliases(l); }
;
cfamProperty[cql3::statements::cf_properties& expr]
: property[$expr.properties()]
| K_COMPACT K_STORAGE { $expr.set_compact_storage(); }
cfamProperty[shared_ptr<cql3::statements::create_table_statement::raw_statement> expr]
: property[expr->properties]
| K_COMPACT K_STORAGE { $expr->set_compact_storage(); }
| K_CLUSTERING K_ORDER K_BY '(' cfamOrdering[expr] (',' cfamOrdering[expr])* ')'
;
cfamOrdering[cql3::statements::cf_properties& expr]
cfamOrdering[shared_ptr<cql3::statements::create_table_statement::raw_statement> expr]
@init{ bool reversed=false; }
: k=ident (K_ASC | K_DESC { reversed=true;} ) { $expr.set_ordering(k, reversed); }
: k=ident (K_ASC | K_DESC { reversed=true;} ) { $expr->set_ordering(k, reversed); }
;
@@ -778,13 +772,12 @@ createIndexStatement returns [::shared_ptr<create_index_statement> expr]
auto props = make_shared<index_prop_defs>();
bool if_not_exists = false;
auto name = ::make_shared<cql3::index_name>();
std::vector<::shared_ptr<index_target::raw>> targets;
}
: K_CREATE (K_CUSTOM { props->is_custom = true; })? K_INDEX (K_IF K_NOT K_EXISTS { if_not_exists = true; } )?
(idxName[name])? K_ON cf=columnFamilyName '(' (target1=indexIdent { targets.emplace_back(target1); } (',' target2=indexIdent { targets.emplace_back(target2); } )*)? ')'
(idxName[name])? K_ON cf=columnFamilyName '(' id=indexIdent ')'
(K_USING cls=STRING_LITERAL { props->custom_class = sstring{$cls.text}; })?
(K_WITH properties[props])?
{ $expr = ::make_shared<create_index_statement>(cf, name, targets, props, if_not_exists); }
{ $expr = ::make_shared<create_index_statement>(cf, name, id, props, if_not_exists); }
;
indexIdent returns [::shared_ptr<index_target::raw> id]
@@ -794,39 +787,6 @@ indexIdent returns [::shared_ptr<index_target::raw> id]
| K_FULL '(' c=cident ')' { $id = index_target::raw::full_collection(c); }
;
/**
* CREATE MATERIALIZED VIEW <viewName> AS
* SELECT <columns>
* FROM <CF>
* WHERE <pkColumns> IS NOT NULL
* PRIMARY KEY (<pkColumns>)
* WITH <property> = <value> AND ...;
*/
createViewStatement returns [::shared_ptr<create_view_statement> expr]
@init {
bool if_not_exists = false;
std::vector<::shared_ptr<cql3::column_identifier::raw>> partition_keys;
std::vector<::shared_ptr<cql3::column_identifier::raw>> composite_keys;
}
: K_CREATE K_MATERIALIZED K_VIEW (K_IF K_NOT K_EXISTS { if_not_exists = true; })? cf=columnFamilyName K_AS
K_SELECT sclause=selectClause K_FROM basecf=columnFamilyName
(K_WHERE wclause=whereClause)?
K_PRIMARY K_KEY (
'(' '(' k1=cident { partition_keys.push_back(k1); } ( ',' kn=cident { partition_keys.push_back(kn); } )* ')' ( ',' c1=cident { composite_keys.push_back(c1); } )* ')'
| '(' k1=cident { partition_keys.push_back(k1); } ( ',' cn=cident { composite_keys.push_back(cn); } )* ')'
)
{
$expr = ::make_shared<create_view_statement>(
std::move(cf),
std::move(basecf),
std::move(sclause),
std::move(wclause),
std::move(partition_keys),
std::move(composite_keys),
if_not_exists);
}
( K_WITH cfamProperty[{ $expr->properties() }] ( K_AND cfamProperty[{ $expr->properties() }] )*)?
;
#if 0
/**
@@ -873,7 +833,7 @@ alterKeyspaceStatement returns [shared_ptr<cql3::statements::alter_keyspace_stat
alterTableStatement returns [shared_ptr<alter_table_statement> expr]
@init {
alter_table_statement::type type;
auto props = make_shared<cql3::statements::cf_prop_defs>();
auto props = make_shared<cql3::statements::cf_prop_defs>();;
std::vector<std::pair<shared_ptr<cql3::column_identifier::raw>, shared_ptr<cql3::column_identifier::raw>>> renames;
bool is_static = false;
}
@@ -907,18 +867,6 @@ alterTypeStatement returns [::shared_ptr<alter_type_statement> expr]
)
;
/**
* ALTER MATERIALIZED VIEW <CF> WITH <property> = <value>;
*/
alterViewStatement returns [::shared_ptr<alter_view_statement> expr]
@init {
auto props = make_shared<cql3::statements::cf_prop_defs>();
}
: K_ALTER K_MATERIALIZED K_VIEW cf=columnFamilyName K_WITH properties[props]
{
$expr = ::make_shared<alter_view_statement>(std::move(cf), std::move(props));
}
;
renames[::shared_ptr<alter_type_statement::renames> expr]
: fromId=ident K_TO toId=ident { $expr->add_rename(fromId, toId); }
@@ -949,23 +897,16 @@ dropTypeStatement returns [::shared_ptr<drop_type_statement> stmt]
: K_DROP K_TYPE (K_IF K_EXISTS { if_exists = true; } )? name=userTypeName { $stmt = ::make_shared<drop_type_statement>(name, if_exists); }
;
/**
* DROP MATERIALIZED VIEW [IF EXISTS] <view_name>
*/
dropViewStatement returns [::shared_ptr<drop_view_statement> stmt]
@init { bool if_exists = false; }
: K_DROP K_MATERIALIZED K_VIEW (K_IF K_EXISTS { if_exists = true; } )? cf=columnFamilyName
{ $stmt = ::make_shared<drop_view_statement>(cf, if_exists); }
;
#if 0
/**
* DROP INDEX [IF EXISTS] <INDEX_NAME>
*/
dropIndexStatement returns [::shared_ptr<drop_index_statement> expr]
@init { bool if_exists = false; }
: K_DROP K_INDEX (K_IF K_EXISTS { if_exists = true; } )? index=indexName
{ $expr = ::make_shared<drop_index_statement>(index, if_exists); }
dropIndexStatement returns [DropIndexStatement expr]
@init { boolean ifExists = false; }
: K_DROP K_INDEX (K_IF K_EXISTS { ifExists = true; } )? index=indexName
{ $expr = new DropIndexStatement(index, ifExists); }
;
#endif
/**
* TRUNCATE <CF>;
@@ -1168,7 +1109,6 @@ constant returns [shared_ptr<cql3::constants::literal> constant]
| t=INTEGER { $constant = cql3::constants::literal::integer(sstring{$t.text}); }
| t=FLOAT { $constant = cql3::constants::literal::floating_point(sstring{$t.text}); }
| t=BOOLEAN { $constant = cql3::constants::literal::bool_(sstring{$t.text}); }
| t=DURATION { $constant = cql3::constants::literal::duration(sstring{$t.text}); }
| t=UUID { $constant = cql3::constants::literal::uuid(sstring{$t.text}); }
| t=HEXNUMBER { $constant = cql3::constants::literal::hex(sstring{$t.text}); }
| { sign=""; } ('-' {sign = "-"; } )? t=(K_NAN | K_INFINITY) { $constant = cql3::constants::literal::floating_point(sstring{sign + $t.text}); }
@@ -1303,10 +1243,6 @@ normalColumnOperation[operations_type& operations, ::shared_ptr<cql3::column_ide
}
add_raw_update(operations, key, make_shared<cql3::operation::addition>(cql3::constants::literal::integer($i.text)));
}
| K_SCYLLA_COUNTER_SHARD_LIST '(' t=term ')'
{
add_raw_update(operations, key, ::make_shared<cql3::operation::set_counter_value_from_tuple_list>(t));
}
;
specializedColumnOperation[std::vector<std::pair<shared_ptr<cql3::column_identifier::raw>,
@@ -1368,8 +1304,7 @@ relation[std::vector<cql3::relation_ptr>& clauses]
| K_TOKEN l=tupleOfIdentifiers type=relationType t=term
{ $clauses.emplace_back(::make_shared<cql3::token_relation>(std::move(l), *type, std::move(t))); }
| name=cident K_IS K_NOT K_NULL {
$clauses.emplace_back(make_shared<cql3::single_column_relation>(std::move(name), cql3::operator_type::IS_NOT, cql3::constants::NULL_LITERAL)); }
| name=cident K_IN marker=inMarker
{ $clauses.emplace_back(make_shared<cql3::single_column_relation>(std::move(name), cql3::operator_type::IN, std::move(marker))); }
| name=cident K_IN in_values=singleColumnInValues
@@ -1466,20 +1401,15 @@ native_type returns [shared_ptr<cql3_type> t]
| K_COUNTER { $t = cql3_type::counter; }
| K_DECIMAL { $t = cql3_type::decimal; }
| K_DOUBLE { $t = cql3_type::double_; }
| K_DURATION { $t = cql3_type::duration; }
| K_FLOAT { $t = cql3_type::float_; }
| K_INET { $t = cql3_type::inet; }
| K_INT { $t = cql3_type::int_; }
| K_SMALLINT { $t = cql3_type::smallint; }
| K_TEXT { $t = cql3_type::text; }
| K_TIMESTAMP { $t = cql3_type::timestamp; }
| K_TINYINT { $t = cql3_type::tinyint; }
| K_UUID { $t = cql3_type::uuid; }
| K_VARCHAR { $t = cql3_type::varchar; }
| K_VARINT { $t = cql3_type::varint; }
| K_TIMEUUID { $t = cql3_type::timeuuid; }
| K_DATE { $t = cql3_type::date; }
| K_TIME { $t = cql3_type::time; }
;
collection_type returns [shared_ptr<cql3::cql3_type::raw> pt]
@@ -1553,8 +1483,6 @@ basic_unreserved_keyword returns [sstring str]
| K_DISTINCT
| K_CONTAINS
| K_STATIC
| K_FROZEN
| K_TUPLE
| K_FUNCTION
| K_AGGREGATE
| K_SFUNC
@@ -1572,7 +1500,6 @@ basic_unreserved_keyword returns [sstring str]
K_SELECT: S E L E C T;
K_FROM: F R O M;
K_AS: A S;
K_CAST: C A S T;
K_WHERE: W H E R E;
K_AND: A N D;
K_KEY: K E Y;
@@ -1601,8 +1528,6 @@ K_KEYSPACE: ( K E Y S P A C E
K_KEYSPACES: K E Y S P A C E S;
K_COLUMNFAMILY:( C O L U M N F A M I L Y
| T A B L E );
K_MATERIALIZED:M A T E R I A L I Z E D;
K_VIEW: V I E W;
K_INDEX: I N D E X;
K_CUSTOM: C U S T O M;
K_ON: O N;
@@ -1626,7 +1551,6 @@ K_DESC: D E S C;
K_ALLOW: A L L O W;
K_FILTERING: F I L T E R I N G;
K_IF: I F;
K_IS: I S;
K_CONTAINS: C O N T A I N S;
K_GRANT: G R A N T;
@@ -1653,12 +1577,9 @@ K_BOOLEAN: B O O L E A N;
K_COUNTER: C O U N T E R;
K_DECIMAL: D E C I M A L;
K_DOUBLE: D O U B L E;
K_DURATION: D U R A T I O N;
K_FLOAT: F L O A T;
K_INET: I N E T;
K_INT: I N T;
K_SMALLINT: S M A L L I N T;
K_TINYINT: T I N Y I N T;
K_TEXT: T E X T;
K_UUID: U U I D;
K_VARCHAR: V A R C H A R;
@@ -1666,8 +1587,6 @@ K_VARINT: V A R I N T;
K_TIMEUUID: T I M E U U I D;
K_TOKEN: T O K E N;
K_WRITETIME: W R I T E T I M E;
K_DATE: D A T E;
K_TIME: T I M E;
K_NULL: N U L L;
K_NOT: N O T;
@@ -1697,7 +1616,6 @@ K_REPLACE: R E P L A C E;
K_DETERMINISTIC: D E T E R M I N I S T I C;
K_SCYLLA_TIMEUUID_LIST_INDEX: S C Y L L A '_' T I M E U U I D '_' L I S T '_' I N D E X;
K_SCYLLA_COUNTER_SHARD_LIST: S C Y L L A '_' C O U N T E R '_' S H A R D '_' L I S T;
// Case-insensitive alpha characters
fragment A: ('a'|'A');
@@ -1783,20 +1701,6 @@ fragment EXPONENT
: E ('+' | '-')? DIGIT+
;
fragment DURATION_UNIT
: Y
| M O
| W
| D
| H
| M
| S
| M S
| U S
| '\u00B5' S
| N S
;
INTEGER
: '-'? DIGIT+
;
@@ -1821,13 +1725,6 @@ BOOLEAN
: T R U E | F A L S E
;
DURATION
: '-'? DIGIT+ DURATION_UNIT (DIGIT+ DURATION_UNIT)*
| '-'? 'P' (DIGIT+ 'Y')? (DIGIT+ 'M')? (DIGIT+ 'D')? ('T' (DIGIT+ 'H')? (DIGIT+ 'M')? (DIGIT+ 'S')?)? // ISO 8601 "format with designators"
| '-'? 'P' DIGIT+ 'W'
| '-'? 'P' DIGIT DIGIT DIGIT DIGIT '-' DIGIT DIGIT '-' DIGIT DIGIT 'T' DIGIT DIGIT ':' DIGIT DIGIT ':' DIGIT DIGIT // ISO 8601 "alternative format"
;
IDENT
: LETTER (LETTER | DIGIT | '_')*
;

View File

@@ -79,7 +79,6 @@ abstract_marker::raw::raw(int32_t bind_index)
return ::make_shared<maps::marker>(_bind_index, receiver);
}
assert(0);
return shared_ptr<term>();
}
assignment_testable::test_result abstract_marker::raw::test_assignment(database& db, const sstring& keyspace, ::shared_ptr<column_specification> receiver) {

View File

@@ -71,15 +71,13 @@ int64_t attributes::get_timestamp(int64_t now, const query_options& options) {
}
auto tval = _timestamp->bind_and_get(options);
if (tval.is_null()) {
if (!tval) {
throw exceptions::invalid_request_exception("Invalid null value of timestamp");
}
if (tval.is_unset_value()) {
return now;
}
try {
data_type_for<int64_t>()->validate(*tval);
} catch (marshal_exception& e) {
} catch (marshal_exception e) {
throw exceptions::invalid_request_exception("Invalid timestamp value");
}
return value_cast<int64_t>(data_type_for<int64_t>()->deserialize(*tval));
@@ -90,16 +88,14 @@ int32_t attributes::get_time_to_live(const query_options& options) {
return 0;
auto tval = _time_to_live->bind_and_get(options);
if (tval.is_null()) {
if (!tval) {
throw exceptions::invalid_request_exception("Invalid null value of TTL");
}
if (tval.is_unset_value()) {
return 0;
}
try {
data_type_for<int32_t>()->validate(*tval);
}
catch (marshal_exception& e) {
catch (marshal_exception e) {
throw exceptions::invalid_request_exception("Invalid TTL value");
}

View File

@@ -40,29 +40,11 @@
*/
#include "cql3/column_condition.hh"
#include "statements/request_validations.hh"
#include "unimplemented.hh"
#include "lists.hh"
#include "maps.hh"
#include <boost/range/algorithm_ext/push_back.hpp>
namespace {
void validate_operation_on_durations(const abstract_type& type, const cql3::operator_type& op) {
using cql3::statements::request_validations::check_false;
if (op.is_slice() && type.references_duration()) {
check_false(type.is_collection(), "Slice conditions are not supported on collections containing durations");
check_false(type.is_tuple(), "Slice conditions are not supported on tuples containing durations");
check_false(type.is_user_type(), "Slice conditions are not supported on UDTs containing durations");
// We're a duration.
throw exceptions::invalid_request_exception(sprint("Slice conditions are not supported on durations"));
}
}
}
namespace cql3 {
bool
@@ -113,7 +95,6 @@ column_condition::raw::prepare(database& db, const sstring& keyspace, const colu
}
return column_condition::in_condition(receiver, std::move(terms));
} else {
validate_operation_on_durations(*receiver.type, _op);
return column_condition::condition(receiver, _value->prepare(db, keyspace, receiver.column_specification), _op);
}
}
@@ -148,8 +129,6 @@ column_condition::raw::prepare(database& db, const sstring& keyspace, const colu
| boost::adaptors::transformed(std::bind(&term::raw::prepare, std::placeholders::_1, std::ref(db), std::ref(keyspace), value_spec)));
return column_condition::in_condition(receiver, _collection_element->prepare(db, keyspace, element_spec), terms);
} else {
validate_operation_on_durations(*receiver.type, _op);
return column_condition::condition(receiver,
_collection_element->prepare(db, keyspace, element_spec),
_value->prepare(db, keyspace, value_spec),

View File

@@ -23,8 +23,6 @@
#include "exceptions/exceptions.hh"
#include "cql3/selection/simple_selector.hh"
#include <regex>
namespace cql3 {
column_identifier::column_identifier(sstring raw_text, bool keep_case) {
@@ -61,17 +59,6 @@ sstring column_identifier::to_string() const {
return _text;
}
sstring column_identifier::to_cql_string() const {
static const std::regex unquoted_identifier_re("[a-z][a-z0-9_]*");
if (std::regex_match(_text.begin(), _text.end(), unquoted_identifier_re)) {
return _text;
}
static const std::regex double_quote_re("\"");
std::string result = _text;
std::regex_replace(result, double_quote_re, "\"\"");
return '"' + result + '"';
}
column_identifier::raw::raw(sstring raw_text, bool keep_case)
: _raw_text{raw_text}
, _text{raw_text}

View File

@@ -47,7 +47,7 @@
#include <algorithm>
#include <functional>
#include <iosfwd>
#include <iostream>
namespace cql3 {
@@ -80,8 +80,6 @@ public:
sstring to_string() const;
sstring to_cql_string() const;
friend std::ostream& operator<<(std::ostream& out, const column_identifier& i) {
return out << i._text;
}

View File

@@ -44,7 +44,6 @@
namespace cql3 {
thread_local const ::shared_ptr<constants::value> constants::UNSET_VALUE = ::make_shared<constants::value>(cql3::raw_value::make_unset_value());
thread_local const ::shared_ptr<term::raw> constants::NULL_LITERAL = ::make_shared<constants::null_literal>();
thread_local const ::shared_ptr<terminal> constants::null_literal::NULL_VALUE = ::make_shared<constants::null_literal::null_value>();
@@ -52,15 +51,14 @@ std::ostream&
operator<<(std::ostream&out, constants::type t)
{
switch (t) {
case constants::type::STRING: return out << "STRING";
case constants::type::INTEGER: return out << "INTEGER";
case constants::type::UUID: return out << "UUID";
case constants::type::FLOAT: return out << "FLOAT";
case constants::type::BOOLEAN: return out << "BOOLEAN";
case constants::type::HEX: return out << "HEX";
case constants::type::DURATION: return out << "DURATION";
}
abort();
case constants::type::STRING: return out << "STRING";
case constants::type::INTEGER: return out << "INTEGER";
case constants::type::UUID: return out << "UUID";
case constants::type::FLOAT: return out << "FLOAT";
case constants::type::BOOLEAN: return out << "BOOLEAN";
case constants::type::HEX: return out << "HEX";
};
assert(0);
}
bytes
@@ -99,9 +97,7 @@ constants::literal::test_assignment(database& db, const sstring& keyspace, ::sha
cql3_type::kind::TEXT,
cql3_type::kind::INET,
cql3_type::kind::VARCHAR,
cql3_type::kind::TIMESTAMP,
cql3_type::kind::DATE,
cql3_type::kind::TIME>::contains(kind)) {
cql3_type::kind::TIMESTAMP>::contains(kind)) {
return assignment_testable::test_result::WEAKLY_ASSIGNABLE;
}
break;
@@ -113,10 +109,7 @@ constants::literal::test_assignment(database& db, const sstring& keyspace, ::sha
cql3_type::kind::DOUBLE,
cql3_type::kind::FLOAT,
cql3_type::kind::INT,
cql3_type::kind::SMALLINT,
cql3_type::kind::TIMESTAMP,
cql3_type::kind::DATE,
cql3_type::kind::TINYINT,
cql3_type::kind::VARINT>::contains(kind)) {
return assignment_testable::test_result::WEAKLY_ASSIGNABLE;
}
@@ -146,11 +139,6 @@ constants::literal::test_assignment(database& db, const sstring& keyspace, ::sha
return assignment_testable::test_result::WEAKLY_ASSIGNABLE;
}
break;
case type::DURATION:
if (kind == cql3_type::kind_enum_set::prepare<cql3_type::kind::DURATION>()) {
return assignment_testable::test_result::EXACT_MATCH;
}
break;
}
return assignment_testable::test_result::NOT_ASSIGNABLE;
}
@@ -162,10 +150,10 @@ constants::literal::prepare(database& db, const sstring& keyspace, ::shared_ptr<
throw exceptions::invalid_request_exception(sprint("Invalid %s constant (%s) for \"%s\" of type %s",
_type, _text, *receiver->name, receiver->type->as_cql3_type()->to_string()));
}
return ::make_shared<value>(cql3::raw_value::make_value(parsed_value(receiver->type)));
return ::make_shared<value>(std::experimental::make_optional(parsed_value(receiver->type)));
}
void constants::deleter::execute(mutation& m, const clustering_key_prefix& prefix, const update_parameters& params) {
void constants::deleter::execute(mutation& m, const exploded_clustering_prefix& prefix, const update_parameters& params) {
if (column.type->is_multi_cell()) {
collection_type_impl::mutation coll_m;
coll_m.tomb = params.make_tombstone();

View File

@@ -44,7 +44,6 @@
#include "cql3/abstract_marker.hh"
#include "cql3/update_parameters.hh"
#include "cql3/operation.hh"
#include "cql3/values.hh"
#include "cql3/term.hh"
#include "core/shared_ptr.hh"
@@ -60,7 +59,7 @@ public:
#endif
public:
enum class type {
STRING, INTEGER, UUID, FLOAT, BOOLEAN, HEX, DURATION
STRING, INTEGER, UUID, FLOAT, BOOLEAN, HEX
};
/**
@@ -68,20 +67,18 @@ public:
*/
class value : public terminal {
public:
cql3::raw_value _bytes;
value(cql3::raw_value bytes_) : _bytes(std::move(bytes_)) {}
virtual cql3::raw_value get(const query_options& options) override { return _bytes; }
virtual cql3::raw_value_view bind_and_get(const query_options& options) override { return _bytes.to_view(); }
bytes_opt _bytes;
value(bytes_opt bytes_) : _bytes(std::move(bytes_)) {}
virtual bytes_opt get(const query_options& options) override { return _bytes; }
virtual bytes_view_opt bind_and_get(const query_options& options) override { return as_bytes_view_opt(_bytes); }
virtual sstring to_string() const override { return to_hex(*_bytes); }
};
static thread_local const ::shared_ptr<value> UNSET_VALUE;
class null_literal final : public term::raw {
private:
class null_value final : public value {
public:
null_value() : value(cql3::raw_value::make_null()) {}
null_value() : value({}) {}
virtual ::shared_ptr<terminal> bind(const query_options& options) override { return {}; }
virtual sstring to_string() const override { return "null"; }
};
@@ -149,10 +146,6 @@ public:
return ::make_shared<literal>(type::HEX, text);
}
static ::shared_ptr<literal> duration(sstring text) {
return ::make_shared<literal>(type::DURATION, text);
}
virtual ::shared_ptr<term> prepare(database& db, const sstring& keyspace, ::shared_ptr<column_specification> receiver);
private:
bytes parsed_value(data_type validator);
@@ -176,13 +169,14 @@ public:
assert(!_receiver->type->is_collection());
}
virtual cql3::raw_value_view bind_and_get(const query_options& options) override {
virtual bytes_view_opt bind_and_get(const query_options& options) override {
try {
auto value = options.get_value_at(_bind_index);
if (value) {
_receiver->type->validate(*value);
return *value;
}
return value;
return std::experimental::nullopt;
} catch (const marshal_exception& e) {
throw exceptions::invalid_request_exception(e.what());
}
@@ -193,7 +187,7 @@ public:
if (!bytes) {
return ::shared_ptr<terminal>{};
}
return ::make_shared<constants::value>(std::move(cql3::raw_value::make_value(to_bytes(*bytes))));
return ::make_shared<constants::value>(std::move(to_bytes_opt(*bytes)));
}
};
@@ -201,48 +195,54 @@ public:
public:
using operation::operation;
virtual void execute(mutation& m, const clustering_key_prefix& prefix, const update_parameters& params) override {
virtual void execute(mutation& m, const exploded_clustering_prefix& prefix, const update_parameters& params) override {
auto value = _t->bind_and_get(params._options);
if (value.is_null()) {
m.set_cell(prefix, column, std::move(make_dead_cell(params)));
} else if (value.is_value()) {
m.set_cell(prefix, column, std::move(make_cell(*value, params)));
}
auto cell = value ? make_cell(*value, params) : make_dead_cell(params);
m.set_cell(prefix, column, std::move(cell));
}
};
struct adder final : operation {
using operation::operation;
virtual void execute(mutation& m, const clustering_key_prefix& prefix, const update_parameters& params) override {
auto value = _t->bind_and_get(params._options);
if (value.is_null()) {
throw exceptions::invalid_request_exception("Invalid null value for counter increment");
} else if (value.is_unset_value()) {
return;
}
auto increment = value_cast<int64_t>(long_type->deserialize_value(*value));
m.set_cell(prefix, column, make_counter_update_cell(increment, params));
#if 0
public static class Adder extends Operation
{
public Adder(ColumnDefinition column, Term t)
{
super(column, t);
}
};
struct subtracter final : operation {
using operation::operation;
virtual void execute(mutation& m, const clustering_key_prefix& prefix, const update_parameters& params) override {
auto value = _t->bind_and_get(params._options);
if (value.is_null()) {
throw exceptions::invalid_request_exception("Invalid null value for counter increment");
} else if (value.is_unset_value()) {
return;
}
auto increment = value_cast<int64_t>(long_type->deserialize_value(*value));
if (increment == std::numeric_limits<int64_t>::min()) {
throw exceptions::invalid_request_exception(sprint("The negation of %d overflows supported counter precision (signed 8 bytes integer)", increment));
}
m.set_cell(prefix, column, make_counter_update_cell(-increment, params));
public void execute(ByteBuffer rowKey, ColumnFamily cf, Composite prefix, UpdateParameters params) throws InvalidRequestException
{
ByteBuffer bytes = t.bindAndGet(params.options);
if (bytes == null)
throw new InvalidRequestException("Invalid null value for counter increment");
long increment = ByteBufferUtil.toLong(bytes);
CellName cname = cf.getComparator().create(prefix, column);
cf.addColumn(params.makeCounter(cname, increment));
}
};
}
public static class Substracter extends Operation
{
public Substracter(ColumnDefinition column, Term t)
{
super(column, t);
}
public void execute(ByteBuffer rowKey, ColumnFamily cf, Composite prefix, UpdateParameters params) throws InvalidRequestException
{
ByteBuffer bytes = t.bindAndGet(params.options);
if (bytes == null)
throw new InvalidRequestException("Invalid null value for counter increment");
long increment = ByteBufferUtil.toLong(bytes);
if (increment == Long.MIN_VALUE)
throw new InvalidRequestException("The negation of " + increment + " overflows supported counter precision (signed 8 bytes integer)");
CellName cname = cf.getComparator().create(prefix, column);
cf.addColumn(params.makeCounter(cname, -increment));
}
}
#endif
class deleter : public operation {
public:
@@ -250,7 +250,7 @@ public:
: operation(column, {})
{ }
virtual void execute(mutation& m, const clustering_key_prefix& prefix, const update_parameters& params) override;
virtual void execute(mutation& m, const exploded_clustering_prefix& prefix, const update_parameters& params) override;
};
};

View File

@@ -19,43 +19,11 @@
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include <iostream>
#include <iterator>
#include <regex>
#include "cql3_type.hh"
#include "cql3/util.hh"
#include "ut_name.hh"
namespace cql3 {
sstring cql3_type::to_string() const {
if (_type->is_user_type()) {
return "frozen<" + util::maybe_quote(_name) + ">";
}
if (_type->is_tuple()) {
return "frozen<" + _name + ">";
}
return _name;
}
shared_ptr<cql3_type> cql3_type::raw::prepare(database& db, const sstring& keyspace) {
try {
auto&& ks = db.find_keyspace(keyspace);
return prepare_internal(keyspace, ks.metadata()->user_types());
} catch (no_such_keyspace& nsk) {
throw exceptions::invalid_request_exception("Unknown keyspace " + keyspace);
}
}
bool cql3_type::raw::is_duration() const {
return false;
}
bool cql3_type::raw::references_user_type(const sstring& name) const {
return false;
}
class cql3_type::raw_type : public raw {
private:
shared_ptr<cql3_type> _type;
@@ -67,9 +35,6 @@ public:
virtual shared_ptr<cql3_type> prepare(database& db, const sstring& keyspace) {
return _type;
}
shared_ptr<cql3_type> prepare_internal(const sstring&, lw_shared_ptr<user_types_metadata>) override {
return _type;
}
virtual bool supports_freezing() const {
return false;
@@ -82,10 +47,6 @@ public:
virtual sstring to_string() const {
return _type->to_string();
}
virtual bool is_duration() const override {
return _type->get_type()->equals(duration_type);
}
};
class cql3_type::raw_collection : public raw {
@@ -115,7 +76,7 @@ public:
return true;
}
virtual shared_ptr<cql3_type> prepare_internal(const sstring& keyspace, lw_shared_ptr<user_types_metadata> user_types) override {
virtual shared_ptr<cql3_type> prepare(database& db, const sstring& keyspace) override {
assert(_values); // "Got null values type for a collection";
if (!_frozen && _values->supports_freezing() && !_values->_frozen) {
@@ -132,30 +93,16 @@ public:
}
if (_kind == &collection_type_impl::kind::list) {
return make_shared(cql3_type(to_string(), list_type_impl::get_instance(_values->prepare_internal(keyspace, user_types)->get_type(), !_frozen), false));
return make_shared(cql3_type(to_string(), list_type_impl::get_instance(_values->prepare(db, keyspace)->get_type(), !_frozen), false));
} else if (_kind == &collection_type_impl::kind::set) {
if (_values->is_duration()) {
throw exceptions::invalid_request_exception(sprint("Durations are not allowed inside sets: %s", *this));
}
return make_shared(cql3_type(to_string(), set_type_impl::get_instance(_values->prepare_internal(keyspace, user_types)->get_type(), !_frozen), false));
return make_shared(cql3_type(to_string(), set_type_impl::get_instance(_values->prepare(db, keyspace)->get_type(), !_frozen), false));
} else if (_kind == &collection_type_impl::kind::map) {
assert(_keys); // "Got null keys type for a collection";
if (_keys->is_duration()) {
throw exceptions::invalid_request_exception(sprint("Durations are not allowed as map keys: %s", *this));
}
return make_shared(cql3_type(to_string(), map_type_impl::get_instance(_keys->prepare_internal(keyspace, user_types)->get_type(), _values->prepare_internal(keyspace, user_types)->get_type(), !_frozen), false));
return make_shared(cql3_type(to_string(), map_type_impl::get_instance(_keys->prepare(db, keyspace)->get_type(), _values->prepare(db, keyspace)->get_type(), !_frozen), false));
}
abort();
}
bool references_user_type(const sstring& name) const override {
return (_keys && _keys->references_user_type(name)) || _values->references_user_type(name);
}
bool is_duration() const override {
return false;
}
virtual sstring to_string() const override {
sstring start = _frozen ? "frozen<" : "";
sstring end = _frozen ? ">" : "";
@@ -185,7 +132,7 @@ public:
_frozen = true;
}
virtual shared_ptr<cql3_type> prepare_internal(const sstring& keyspace, lw_shared_ptr<user_types_metadata> user_types) override {
virtual shared_ptr<cql3_type> prepare(database& db, const sstring& keyspace) override {
if (_name.has_keyspace()) {
// The provided keyspace is the one of the current statement this is part of. If it's different from the keyspace of
// the UTName, we reject since we want to limit user types to their own keyspace (see #6643)
@@ -197,23 +144,23 @@ public:
} else {
_name.set_keyspace(keyspace);
}
if (!user_types) {
// bootstrap mode.
throw exceptions::invalid_request_exception(sprint("Unknown type %s", _name));
}
try {
auto&& type = user_types->get_type(_name.get_user_type_name());
if (!_frozen) {
throw exceptions::invalid_request_exception("Non-frozen User-Defined types are not supported, please use frozen<>");
auto&& ks = db.find_keyspace(_name.get_keyspace());
try {
auto&& type = ks.metadata()->user_types()->get_type(_name.get_user_type_name());
if (!_frozen) {
throw exceptions::invalid_request_exception("Non-frozen User-Defined types are not supported, please use frozen<>");
}
return make_shared<cql3_type>(_name.to_string(), std::move(type));
} catch (std::out_of_range& e) {
throw exceptions::invalid_request_exception(sprint("Unknown type %s", _name));
}
return make_shared<cql3_type>(_name.to_string(), std::move(type));
} catch (std::out_of_range& e) {
throw exceptions::invalid_request_exception(sprint("Unknown type %s", _name));
} catch (no_such_keyspace& nsk) {
throw exceptions::invalid_request_exception("Unknown keyspace " + _name.get_keyspace());
}
}
bool references_user_type(const sstring& name) const override {
return _name.get_string_type_name() == name;
}
virtual bool supports_freezing() const override {
return true;
}
@@ -244,7 +191,7 @@ public:
}
_frozen = true;
}
virtual shared_ptr<cql3_type> prepare_internal(const sstring& keyspace, lw_shared_ptr<user_types_metadata> user_types) override {
virtual shared_ptr<cql3_type> prepare(database& db, const sstring& keyspace) override {
if (!_frozen) {
freeze();
}
@@ -253,17 +200,10 @@ public:
if (t->is_counter()) {
throw exceptions::invalid_request_exception("Counters are not allowed inside tuples");
}
ts.push_back(t->prepare_internal(keyspace, user_types)->get_type());
ts.push_back(t->prepare(db, keyspace)->get_type());
}
return make_cql3_tuple_type(tuple_type_impl::get_instance(std::move(ts)));
}
bool references_user_type(const sstring& name) const override {
return std::any_of(_types.begin(), _types.end(), [&name](auto t) {
return t->references_user_type(name);
});
}
virtual sstring to_string() const override {
return sprint("tuple<%s>", join(", ", _types));
}
@@ -331,23 +271,17 @@ thread_local shared_ptr<cql3_type> cql3_type::bigint = make("bigint", long_type,
thread_local shared_ptr<cql3_type> cql3_type::blob = make("blob", bytes_type, cql3_type::kind::BLOB);
thread_local shared_ptr<cql3_type> cql3_type::boolean = make("boolean", boolean_type, cql3_type::kind::BOOLEAN);
thread_local shared_ptr<cql3_type> cql3_type::double_ = make("double", double_type, cql3_type::kind::DOUBLE);
thread_local shared_ptr<cql3_type> cql3_type::empty = make("empty", empty_type, cql3_type::kind::EMPTY);
thread_local shared_ptr<cql3_type> cql3_type::float_ = make("float", float_type, cql3_type::kind::FLOAT);
thread_local shared_ptr<cql3_type> cql3_type::int_ = make("int", int32_type, cql3_type::kind::INT);
thread_local shared_ptr<cql3_type> cql3_type::smallint = make("smallint", short_type, cql3_type::kind::SMALLINT);
thread_local shared_ptr<cql3_type> cql3_type::text = make("text", utf8_type, cql3_type::kind::TEXT);
thread_local shared_ptr<cql3_type> cql3_type::timestamp = make("timestamp", timestamp_type, cql3_type::kind::TIMESTAMP);
thread_local shared_ptr<cql3_type> cql3_type::tinyint = make("tinyint", byte_type, cql3_type::kind::TINYINT);
thread_local shared_ptr<cql3_type> cql3_type::uuid = make("uuid", uuid_type, cql3_type::kind::UUID);
thread_local shared_ptr<cql3_type> cql3_type::varchar = make("varchar", utf8_type, cql3_type::kind::TEXT);
thread_local shared_ptr<cql3_type> cql3_type::timeuuid = make("timeuuid", timeuuid_type, cql3_type::kind::TIMEUUID);
thread_local shared_ptr<cql3_type> cql3_type::date = make("date", simple_date_type, cql3_type::kind::DATE);
thread_local shared_ptr<cql3_type> cql3_type::time = make("time", time_type, cql3_type::kind::TIME);
thread_local shared_ptr<cql3_type> cql3_type::inet = make("inet", inet_addr_type, cql3_type::kind::INET);
thread_local shared_ptr<cql3_type> cql3_type::varint = make("varint", varint_type, cql3_type::kind::VARINT);
thread_local shared_ptr<cql3_type> cql3_type::decimal = make("decimal", decimal_type, cql3_type::kind::DECIMAL);
thread_local shared_ptr<cql3_type> cql3_type::counter = make("counter", counter_type, cql3_type::kind::COUNTER);
thread_local shared_ptr<cql3_type> cql3_type::duration = make("duration", duration_type, cql3_type::kind::DURATION);
const std::vector<shared_ptr<cql3_type>>&
cql3_type::values() {
@@ -359,21 +293,15 @@ cql3_type::values() {
cql3_type::counter,
cql3_type::decimal,
cql3_type::double_,
cql3_type::empty,
cql3_type::float_,
cql3_type::inet,
cql3_type:inet,
cql3_type::int_,
cql3_type::smallint,
cql3_type::text,
cql3_type::timestamp,
cql3_type::tinyint,
cql3_type::uuid,
cql3_type::varchar,
cql3_type::varint,
cql3_type::timeuuid,
cql3_type::date,
cql3_type::time,
cql3_type::duration,
};
return v;
}
@@ -393,23 +321,5 @@ operator<<(std::ostream& os, const cql3_type::raw& r) {
return os << r.to_string();
}
namespace util {
sstring maybe_quote(const sstring& s) {
static const std::regex unquoted("\\w*");
static const std::regex double_quote("\"");
if (std::regex_match(s.begin(), s.end(), unquoted)) {
return s;
}
std::ostringstream ss;
ss << "\"";
std::regex_replace(std::ostreambuf_iterator<char>(ss), s.begin(), s.end(), double_quote, "\"\"");
ss << "\"";
return ss.str();
}
}
}

View File

@@ -47,7 +47,6 @@
#include "enum_set.hh"
class database;
class user_types_metadata;
namespace cql3 {
@@ -64,23 +63,19 @@ public:
bool is_counter() const { return _type->is_counter(); }
bool is_native() const { return _native; }
data_type get_type() const { return _type; }
sstring to_string() const;
sstring to_string() const { return _name; }
// For UserTypes, we need to know the current keyspace to resolve the
// actual type used, so Raw is a "not yet prepared" CQL3Type.
class raw {
public:
virtual ~raw() {}
bool _frozen = false;
virtual bool supports_freezing() const = 0;
virtual bool is_collection() const;
virtual bool is_counter() const;
virtual bool is_duration() const;
virtual bool references_user_type(const sstring&) const;
virtual std::experimental::optional<sstring> keyspace() const;
virtual void freeze();
virtual shared_ptr<cql3_type> prepare_internal(const sstring& keyspace, lw_shared_ptr<user_types_metadata>) = 0;
virtual shared_ptr<cql3_type> prepare(database& db, const sstring& keyspace);
virtual shared_ptr<cql3_type> prepare(database& db, const sstring& keyspace) = 0;
static shared_ptr<raw> from(shared_ptr<cql3_type> type);
static shared_ptr<raw> user_type(ut_name name);
static shared_ptr<raw> map(shared_ptr<raw> t1, shared_ptr<raw> t2);
@@ -103,7 +98,7 @@ private:
public:
enum class kind : int8_t {
ASCII, BIGINT, BLOB, BOOLEAN, COUNTER, DECIMAL, DOUBLE, EMPTY, FLOAT, INT, SMALLINT, TINYINT, INET, TEXT, TIMESTAMP, UUID, VARCHAR, VARINT, TIMEUUID, DATE, TIME, DURATION
ASCII, BIGINT, BLOB, BOOLEAN, COUNTER, DECIMAL, DOUBLE, FLOAT, INT, INET, TEXT, TIMESTAMP, UUID, VARCHAR, VARINT, TIMEUUID
};
using kind_enum = super_enum<kind,
kind::ASCII,
@@ -113,21 +108,15 @@ public:
kind::COUNTER,
kind::DECIMAL,
kind::DOUBLE,
kind::EMPTY,
kind::FLOAT,
kind::INET,
kind::INT,
kind::SMALLINT,
kind::TINYINT,
kind::TEXT,
kind::TIMESTAMP,
kind::UUID,
kind::VARCHAR,
kind::VARINT,
kind::TIMEUUID,
kind::DATE,
kind::TIME,
kind::DURATION>;
kind::TIMEUUID>;
using kind_enum_set = enum_set<kind_enum>;
private:
std::experimental::optional<kind_enum_set::prepared> _kind;
@@ -140,23 +129,17 @@ public:
static thread_local shared_ptr<cql3_type> blob;
static thread_local shared_ptr<cql3_type> boolean;
static thread_local shared_ptr<cql3_type> double_;
static thread_local shared_ptr<cql3_type> empty;
static thread_local shared_ptr<cql3_type> float_;
static thread_local shared_ptr<cql3_type> int_;
static thread_local shared_ptr<cql3_type> smallint;
static thread_local shared_ptr<cql3_type> text;
static thread_local shared_ptr<cql3_type> timestamp;
static thread_local shared_ptr<cql3_type> tinyint;
static thread_local shared_ptr<cql3_type> uuid;
static thread_local shared_ptr<cql3_type> varchar;
static thread_local shared_ptr<cql3_type> timeuuid;
static thread_local shared_ptr<cql3_type> date;
static thread_local shared_ptr<cql3_type> time;
static thread_local shared_ptr<cql3_type> inet;
static thread_local shared_ptr<cql3_type> varint;
static thread_local shared_ptr<cql3_type> decimal;
static thread_local shared_ptr<cql3_type> counter;
static thread_local shared_ptr<cql3_type> duration;
static const std::vector<shared_ptr<cql3_type>>& values();
public:

View File

@@ -46,7 +46,7 @@
#include "service/storage_proxy.hh"
#include "cql3/query_options.hh"
namespace cql_transport {
namespace transport {
namespace messages {
@@ -89,7 +89,7 @@ public:
* @param state the current query state
* @param options options for this query (consistency, variables, pageSize, ...)
*/
virtual future<::shared_ptr<cql_transport::messages::result_message>>
virtual future<::shared_ptr<transport::messages::result_message>>
execute(distributed<service::storage_proxy>& proxy, service::query_state& state, const query_options& options) = 0;
/**
@@ -97,7 +97,7 @@ public:
*
* @param state the current query state
*/
virtual future<::shared_ptr<cql_transport::messages::result_message>>
virtual future<::shared_ptr<transport::messages::result_message>>
execute_internal(distributed<service::storage_proxy>& proxy, service::query_state& state, const query_options& options) = 0;
virtual bool uses_function(const sstring& ks_name, const sstring& function_name) const = 0;

View File

@@ -67,6 +67,10 @@ class error_collector : public error_listener<RecognizerType, ExceptionBaseType>
*/
const sstring_view _query;
/**
* The error messages.
*/
std::vector<sstring> _error_msgs;
public:
/**
@@ -77,10 +81,7 @@ public:
*/
error_collector(const sstring_view& query) : _query(query) {}
/**
* Format and throw a new \c exceptions::syntax_exception.
*/
[[noreturn]] virtual void syntax_error(RecognizerType& recognizer, ANTLR_UINT8** token_names, ExceptionBaseType* ex) override {
virtual void syntax_error(RecognizerType& recognizer, ANTLR_UINT8** token_names, ExceptionBaseType* ex) override {
auto hdr = get_error_header(ex);
auto msg = get_error_message(recognizer, ex, token_names);
std::stringstream result;
@@ -89,15 +90,22 @@ public:
if (recognizer instanceof Parser)
appendQuerySnippet((Parser) recognizer, builder);
#endif
_error_msgs.emplace_back(result.str());
}
throw exceptions::syntax_exception(result.str());
virtual void syntax_error(RecognizerType& recognizer, const sstring& msg) override {
_error_msgs.emplace_back(msg);
}
/**
* Throw a new \c exceptions::syntax_exception.
* Throws the first syntax error found by the lexer or the parser if it exists.
*
* @throws SyntaxException the syntax error.
*/
[[noreturn]] virtual void syntax_error(RecognizerType&, const sstring& msg) override {
throw exceptions::syntax_exception(msg);
void throw_first_syntax_error() {
if (!_error_msgs.empty()) {
throw exceptions::syntax_exception(_error_msgs[0]);
}
}
private:

View File

@@ -41,7 +41,6 @@
#pragma once
#include "seastarx.hh"
#include <seastar/core/sstring.hh>
#include <antlr3.hpp>
@@ -53,7 +52,6 @@ namespace cql3 {
template<typename RecognizerType, typename ExceptionBaseType>
class error_listener {
public:
virtual ~error_listener() = default;
/**
* Invoked when a syntax error occurs.

View File

@@ -43,7 +43,7 @@
#include "types.hh"
#include <vector>
#include <iosfwd>
#include <iostream>
#include <boost/functional/hash.hpp>
namespace cql3 {

View File

@@ -41,7 +41,6 @@
#pragma once
#include "utils/big_decimal.hh"
#include "aggregate_function.hh"
#include "native_aggregate_function.hh"
@@ -112,70 +111,9 @@ make_sum_function() {
return make_shared<sum_function_for<Type>>();
}
template <typename Type>
class impl_div_for_avg {
public:
static Type div(const Type& x, const int64_t y) {
return x/y;
}
};
template <>
class impl_div_for_avg<big_decimal> {
public:
static big_decimal div(const big_decimal& x, const int64_t y) {
return x.div(y, big_decimal::rounding_mode::HALF_EVEN);
}
};
// We need a wider accumulator for average, since summing the inputs can overflow
// the input type
template <typename T>
struct accumulator_for;
template <>
struct accumulator_for<int8_t> {
using type = __int128;
};
template <>
struct accumulator_for<int16_t> {
using type = __int128;
};
template <>
struct accumulator_for<int32_t> {
using type = __int128;
};
template <>
struct accumulator_for<int64_t> {
using type = __int128;
};
template <>
struct accumulator_for<float> {
using type = float;
};
template <>
struct accumulator_for<double> {
using type = double;
};
template <>
struct accumulator_for<boost::multiprecision::cpp_int> {
using type = boost::multiprecision::cpp_int;
};
template <>
struct accumulator_for<big_decimal> {
using type = big_decimal;
};
template <typename Type>
class impl_avg_function_for final : public aggregate_function::aggregate {
typename accumulator_for<Type>::type _sum{};
Type _sum{};
int64_t _count = 0;
public:
virtual void reset() override {
@@ -183,9 +121,9 @@ public:
_count = 0;
}
virtual opt_bytes compute(cql_serialization_format sf) override {
Type ret{};
Type ret = 0;
if (_count) {
ret = impl_div_for_avg<Type>::div(_sum, _count);
ret = _sum / _count;
}
return data_type_for<Type>()->decompose(ret);
}

View File

@@ -1,82 +0,0 @@
/*
* Copyright (C) 2017 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "castas_fcts.hh"
#include "cql3/functions/native_scalar_function.hh"
namespace cql3 {
namespace functions {
namespace {
using bytes_opt = std::experimental::optional<bytes>;
class castas_function_for : public cql3::functions::native_scalar_function {
castas_fctn _func;
public:
castas_function_for(data_type to_type,
data_type from_type,
castas_fctn func)
: native_scalar_function("castas" + to_type->as_cql3_type()->to_string(), to_type, {from_type})
, _func(func) {
}
virtual bool is_pure() override {
return true;
}
virtual void print(std::ostream& os) const override {
os << "cast(" << _arg_types[0]->name() << " as " << _return_type->name() << ")";
}
virtual bytes_opt execute(cql_serialization_format sf, const std::vector<bytes_opt>& parameters) override {
auto from_type = arg_types()[0];
auto to_type = return_type();
auto&& val = parameters[0];
if (!val) {
return val;
}
auto val_from = from_type->deserialize(*val);
auto val_to = _func(val_from);
return to_type->decompose(val_to);
}
};
shared_ptr<function> make_castas_function(data_type to_type, data_type from_type, castas_fctn func) {
return ::make_shared<castas_function_for>(std::move(to_type), std::move(from_type), std::move(func));
}
} /* Anonymous Namespace */
shared_ptr<function> castas_functions::get(data_type to_type, const std::vector<shared_ptr<cql3::selection::selector>>& provided_args, schema_ptr s) {
if (provided_args.size() != 1) {
throw exceptions::invalid_request_exception("Invalid CAST expression");
}
auto from_type = provided_args[0]->get_type();
auto from_type_key = from_type;
if (from_type_key->is_reversed()) {
from_type_key = dynamic_cast<const reversed_type_impl&>(*from_type).underlying_type();
}
auto f = get_castas_fctn(to_type, from_type_key);
return make_castas_function(to_type, from_type, f);
}
}
}

View File

@@ -1,63 +0,0 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Modified by ScyllaDB
*
* Copyright (C) 2017 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <tuple>
#include <unordered_map>
#include "cql3/functions/function.hh"
#include "cql3/functions/abstract_function.hh"
#include "exceptions/exceptions.hh"
#include "core/print.hh"
#include "cql3/cql3_type.hh"
#include "cql3/selection/selector.hh"
namespace cql3 {
namespace functions {
class castas_functions {
public:
static shared_ptr<function> get(data_type to_type, const std::vector<shared_ptr<cql3::selection::selector>>& provided_args, schema_ptr s);
};
}
}

View File

@@ -59,13 +59,13 @@ public:
virtual bool uses_function(const sstring& ks_name, const sstring& function_name) const override;
virtual void collect_marker_specification(shared_ptr<variable_specifications> bound_names) override;
virtual shared_ptr<terminal> bind(const query_options& options) override;
virtual cql3::raw_value_view bind_and_get(const query_options& options) override;
virtual bytes_view_opt bind_and_get(const query_options& options) override;
private:
static bytes_opt execute_internal(cql_serialization_format sf, scalar_function& fun, std::vector<bytes_opt> params);
public:
virtual bool contains_bind_marker() const override;
private:
static shared_ptr<terminal> make_terminal(shared_ptr<function> fun, cql3::raw_value result, cql_serialization_format sf);
static shared_ptr<terminal> make_terminal(shared_ptr<function> fun, bytes_opt result, cql_serialization_format sf);
public:
class raw : public term::raw {
function_name _name;

View File

@@ -43,7 +43,7 @@
#include "core/sstring.hh"
#include "db/system_keyspace.hh"
#include <iosfwd>
#include <iostream>
#include <functional>
namespace cql3 {

View File

@@ -59,14 +59,6 @@ functions::init() {
declare(make_to_blob_function(type->get_type()));
declare(make_from_blob_function(type->get_type()));
}
declare(aggregate_fcts::make_count_function<int8_t>());
declare(aggregate_fcts::make_max_function<int8_t>());
declare(aggregate_fcts::make_min_function<int8_t>());
declare(aggregate_fcts::make_count_function<int16_t>());
declare(aggregate_fcts::make_max_function<int16_t>());
declare(aggregate_fcts::make_min_function<int16_t>());
declare(aggregate_fcts::make_count_function<int32_t>());
declare(aggregate_fcts::make_max_function<int32_t>());
declare(aggregate_fcts::make_min_function<int32_t>());
@@ -75,26 +67,6 @@ functions::init() {
declare(aggregate_fcts::make_max_function<int64_t>());
declare(aggregate_fcts::make_min_function<int64_t>());
declare(aggregate_fcts::make_count_function<boost::multiprecision::cpp_int>());
declare(aggregate_fcts::make_max_function<boost::multiprecision::cpp_int>());
declare(aggregate_fcts::make_min_function<boost::multiprecision::cpp_int>());
declare(aggregate_fcts::make_count_function<big_decimal>());
declare(aggregate_fcts::make_max_function<big_decimal>());
declare(aggregate_fcts::make_min_function<big_decimal>());
declare(aggregate_fcts::make_count_function<float>());
declare(aggregate_fcts::make_max_function<float>());
declare(aggregate_fcts::make_min_function<float>());
declare(aggregate_fcts::make_count_function<double>());
declare(aggregate_fcts::make_max_function<double>());
declare(aggregate_fcts::make_min_function<double>());
declare(aggregate_fcts::make_count_function<sstring>());
declare(aggregate_fcts::make_max_function<sstring>());
declare(aggregate_fcts::make_min_function<sstring>());
//FIXME:
//declare(aggregate_fcts::make_count_function<bytes>());
//declare(aggregate_fcts::make_max_function<bytes>());
@@ -104,22 +76,20 @@ functions::init() {
declare(make_varchar_as_blob_fct());
declare(make_blob_as_varchar_fct());
declare(aggregate_fcts::make_sum_function<int8_t>());
declare(aggregate_fcts::make_sum_function<int16_t>());
declare(aggregate_fcts::make_sum_function<int32_t>());
declare(aggregate_fcts::make_sum_function<int64_t>());
declare(aggregate_fcts::make_sum_function<float>());
declare(aggregate_fcts::make_sum_function<double>());
declare(aggregate_fcts::make_sum_function<boost::multiprecision::cpp_int>());
declare(aggregate_fcts::make_sum_function<big_decimal>());
declare(aggregate_fcts::make_avg_function<int8_t>());
declare(aggregate_fcts::make_avg_function<int16_t>());
declare(aggregate_fcts::make_avg_function<int32_t>());
declare(aggregate_fcts::make_avg_function<int64_t>());
declare(aggregate_fcts::make_avg_function<float>());
declare(aggregate_fcts::make_avg_function<double>());
declare(aggregate_fcts::make_avg_function<boost::multiprecision::cpp_int>());
declare(aggregate_fcts::make_avg_function<big_decimal>());
#if 0
declare(AggregateFcts.sumFunctionForFloat);
declare(AggregateFcts.sumFunctionForDouble);
declare(AggregateFcts.sumFunctionForDecimal);
declare(AggregateFcts.sumFunctionForVarint);
declare(AggregateFcts.avgFunctionForFloat);
declare(AggregateFcts.avgFunctionForDouble);
declare(AggregateFcts.avgFunctionForVarint);
declare(AggregateFcts.avgFunctionForDecimal);
#endif
// also needed for smp:
#if 0
@@ -329,10 +299,10 @@ function_call::collect_marker_specification(shared_ptr<variable_specifications>
shared_ptr<terminal>
function_call::bind(const query_options& options) {
return make_terminal(_fun, cql3::raw_value::make_value(bind_and_get(options)), options.get_cql_serialization_format());
return make_terminal(_fun, to_bytes_opt(bind_and_get(options)), options.get_cql_serialization_format());
}
cql3::raw_value_view
bytes_view_opt
function_call::bind_and_get(const query_options& options) {
std::vector<bytes_opt> buffers;
buffers.reserve(_terms.size());
@@ -346,7 +316,7 @@ function_call::bind_and_get(const query_options& options) {
buffers.push_back(std::move(to_bytes_opt(val)));
}
auto result = execute_internal(options.get_cql_serialization_format(), *_fun, std::move(buffers));
return options.make_temporary(cql3::raw_value::make_value(result));
return options.make_temporary(result);
}
bytes_opt
@@ -358,7 +328,7 @@ function_call::execute_internal(cql_serialization_format sf, scalar_function& fu
fun.return_type()->validate(*result);
}
return result;
} catch (marshal_exception& e) {
} catch (marshal_exception e) {
throw runtime_exception(sprint("Return of function %s (%s) is not a valid value for its declared return type %s",
fun, to_hex(result),
*fun.return_type()->as_cql3_type()
@@ -377,7 +347,7 @@ function_call::contains_bind_marker() const {
}
shared_ptr<terminal>
function_call::make_terminal(shared_ptr<function> fun, cql3::raw_value result, cql_serialization_format sf) {
function_call::make_terminal(shared_ptr<function> fun, bytes_opt result, cql_serialization_format sf) {
if (!dynamic_pointer_cast<const collection_type_impl>(fun->return_type())) {
return ::make_shared<constants::value>(std::move(result));
}
@@ -443,7 +413,7 @@ function_call::raw::prepare(database& db, const sstring& keyspace, ::shared_ptr<
// If all parameters are terminal and the function is pure, we can
// evaluate it now, otherwise we'd have to wait execution time
if (all_terminal && scalar_fun->is_pure()) {
return make_terminal(scalar_fun, cql3::raw_value::make_value(execute(*scalar_fun, parameters)), query_options::DEFAULT.get_cql_serialization_format());
return make_terminal(scalar_fun, execute(*scalar_fun, parameters), query_options::DEFAULT.get_cql_serialization_format());
} else {
return ::make_shared<function_call>(scalar_fun, parameters);
}
@@ -456,7 +426,7 @@ function_call::raw::execute(scalar_function& fun, std::vector<shared_ptr<term>>
for (auto&& t : parameters) {
assert(dynamic_cast<terminal*>(t.get()));
auto&& param = static_cast<terminal*>(t.get())->get(query_options::DEFAULT);
buffers.push_back(std::move(to_bytes_opt(param)));
buffers.push_back(std::move(param));
}
return execute_internal(cql_serialization_format::internal(), fun, buffers);

View File

@@ -42,7 +42,6 @@
#pragma once
#include "core/sstring.hh"
#include "seastarx.hh"
#include <experimental/optional>

View File

@@ -111,7 +111,7 @@ lists::literal::test_assignment(database& db, const sstring& keyspace, shared_pt
sstring
lists::literal::to_string() const {
return std::to_string(_elements);
return ::to_string(_elements);
}
lists::value
@@ -133,9 +133,9 @@ lists::value::from_serialized(bytes_view v, list_type type, cql_serialization_fo
}
}
cql3::raw_value
bytes_opt
lists::value::get(const query_options& options) {
return cql3::raw_value::make_value(get_with_protocol_version(options.get_cql_serialization_format()));
return get_with_protocol_version(options.get_cql_serialization_format());
}
bytes
@@ -196,12 +196,10 @@ lists::delayed_value::bind(const query_options& options) {
for (auto&& t : _elements) {
auto bo = t->bind_and_get(options);
if (bo.is_null()) {
if (!bo) {
throw exceptions::invalid_request_exception("null is not supported inside collections");
}
if (bo.is_unset_value()) {
return constants::UNSET_VALUE;
}
// We don't support value > 64K because the serialization format encode the length as an unsigned short.
if (bo->size() > std::numeric_limits<uint16_t>::max()) {
throw exceptions::invalid_request_exception(sprint("List value is too long. List values are limited to %d bytes but %d bytes value provided",
@@ -218,10 +216,8 @@ lists::delayed_value::bind(const query_options& options) {
lists::marker::bind(const query_options& options) {
const auto& value = options.get_value_at(_bind_index);
auto ltype = static_pointer_cast<const list_type_impl>(_receiver->type);
if (value.is_null()) {
if (!value) {
return nullptr;
} else if (value.is_unset_value()) {
return constants::UNSET_VALUE;
} else {
return make_shared(value::from_serialized(*value, std::move(ltype), options.get_cql_serialization_format()));
}
@@ -242,11 +238,7 @@ lists::precision_time::get_next(db_clock::time_point millis) {
}
void
lists::setter::execute(mutation& m, const clustering_key_prefix& prefix, const update_parameters& params) {
const auto& value = _t->bind(params._options);
if (value == constants::UNSET_VALUE) {
return;
}
lists::setter::execute(mutation& m, const exploded_clustering_prefix& prefix, const update_parameters& params) {
if (column.type->is_multi_cell()) {
// delete + append
collection_type_impl::mutation mut;
@@ -255,7 +247,7 @@ lists::setter::execute(mutation& m, const clustering_key_prefix& prefix, const u
auto col_mut = ctype->serialize_mutation_form(std::move(mut));
m.set_cell(prefix, column, std::move(col_mut));
}
do_append(value, m, prefix, column, params);
do_append(_t, m, prefix, column, params);
}
bool
@@ -270,24 +262,24 @@ lists::setter_by_index::collect_marker_specification(shared_ptr<variable_specifi
}
void
lists::setter_by_index::execute(mutation& m, const clustering_key_prefix& prefix, const update_parameters& params) {
lists::setter_by_index::execute(mutation& m, const exploded_clustering_prefix& prefix, const update_parameters& params) {
// we should not get here for frozen lists
assert(column.type->is_multi_cell()); // "Attempted to set an individual element on a frozen list";
std::experimental::optional<clustering_key> row_key;
if (!column.is_static()) {
row_key = clustering_key::from_clustering_prefix(*params._schema, prefix);
}
auto index = _idx->bind_and_get(params._options);
if (index.is_null()) {
throw exceptions::invalid_request_exception("Invalid null value for list index");
}
if (index.is_unset_value()) {
throw exceptions::invalid_request_exception("Invalid unset value for list index");
}
auto value = _t->bind_and_get(params._options);
if (value.is_unset_value()) {
return;
if (!index) {
throw exceptions::invalid_request_exception("Invalid null value for list index");
}
auto idx = net::ntoh(int32_t(*unaligned_cast<int32_t>(index->begin())));
auto&& existing_list_opt = params.get_prefetched_list(m.key().view(), prefix.view(), column);
auto&& existing_list_opt = params.get_prefetched_list(m.key(), std::move(row_key), column);
if (!existing_list_opt) {
throw exceptions::invalid_request_exception("Attempted to set an element on a list which is null");
}
@@ -322,10 +314,15 @@ lists::setter_by_uuid::requires_read() {
}
void
lists::setter_by_uuid::execute(mutation& m, const clustering_key_prefix& prefix, const update_parameters& params) {
lists::setter_by_uuid::execute(mutation& m, const exploded_clustering_prefix& prefix, const update_parameters& params) {
// we should not get here for frozen lists
assert(column.type->is_multi_cell()); // "Attempted to set an individual element on a frozen list";
std::experimental::optional<clustering_key> row_key;
if (!column.is_static()) {
row_key = clustering_key::from_clustering_prefix(*params._schema, prefix);
}
auto index = _idx->bind_and_get(params._options);
auto value = _t->bind_and_get(params._options);
@@ -345,27 +342,24 @@ lists::setter_by_uuid::execute(mutation& m, const clustering_key_prefix& prefix,
}
void
lists::appender::execute(mutation& m, const clustering_key_prefix& prefix, const update_parameters& params) {
const auto& value = _t->bind(params._options);
if (value == constants::UNSET_VALUE) {
return;
}
lists::appender::execute(mutation& m, const exploded_clustering_prefix& prefix, const update_parameters& params) {
assert(column.type->is_multi_cell()); // "Attempted to append to a frozen list";
do_append(value, m, prefix, column, params);
do_append(_t, m, prefix, column, params);
}
void
lists::do_append(shared_ptr<term> value,
lists::do_append(shared_ptr<term> t,
mutation& m,
const clustering_key_prefix& prefix,
const exploded_clustering_prefix& prefix,
const column_definition& column,
const update_parameters& params) {
auto&& value = t->bind(params._options);
auto&& list_value = dynamic_pointer_cast<lists::value>(value);
auto&& ltype = dynamic_pointer_cast<const list_type_impl>(column.type);
if (column.type->is_multi_cell()) {
// If we append null, do nothing. Note that for Setter, we've
// already removed the previous value so we're good here too
if (!value || value == constants::UNSET_VALUE) {
if (!value) {
return;
}
@@ -391,10 +385,10 @@ lists::do_append(shared_ptr<term> value,
}
void
lists::prepender::execute(mutation& m, const clustering_key_prefix& prefix, const update_parameters& params) {
lists::prepender::execute(mutation& m, const exploded_clustering_prefix& prefix, const update_parameters& params) {
assert(column.type->is_multi_cell()); // "Attempted to prepend to a frozen list";
auto&& value = _t->bind(params._options);
if (!value || value == constants::UNSET_VALUE) {
if (!value) {
return;
}
@@ -423,10 +417,15 @@ lists::discarder::requires_read() {
}
void
lists::discarder::execute(mutation& m, const clustering_key_prefix& prefix, const update_parameters& params) {
lists::discarder::execute(mutation& m, const exploded_clustering_prefix& prefix, const update_parameters& params) {
assert(column.type->is_multi_cell()); // "Attempted to delete from a frozen list";
auto&& existing_list = params.get_prefetched_list(m.key().view(), prefix.view(), column);
std::experimental::optional<clustering_key> row_key;
if (!column.is_static()) {
row_key = clustering_key::from_clustering_prefix(*params._schema, prefix);
}
auto&& existing_list = params.get_prefetched_list(m.key(), std::move(row_key), column);
// We want to call bind before possibly returning to reject queries where the value provided is not a list.
auto&& value = _t->bind(params._options);
@@ -442,7 +441,7 @@ lists::discarder::execute(mutation& m, const clustering_key_prefix& prefix, cons
return;
}
if (!value || value == constants::UNSET_VALUE) {
if (!value) {
return;
}
@@ -475,21 +474,22 @@ lists::discarder_by_index::requires_read() {
}
void
lists::discarder_by_index::execute(mutation& m, const clustering_key_prefix& prefix, const update_parameters& params) {
lists::discarder_by_index::execute(mutation& m, const exploded_clustering_prefix& prefix, const update_parameters& params) {
assert(column.type->is_multi_cell()); // "Attempted to delete an item by index from a frozen list";
auto&& index = _t->bind(params._options);
if (!index) {
throw exceptions::invalid_request_exception("Invalid null value for list index");
}
if (index == constants::UNSET_VALUE) {
return;
}
auto ltype = static_pointer_cast<const list_type_impl>(column.type);
auto cvalue = dynamic_pointer_cast<constants::value>(index);
assert(cvalue);
auto&& existing_list_opt = params.get_prefetched_list(m.key().view(), prefix.view(), column);
std::experimental::optional<clustering_key> row_key;
if (!column.is_static()) {
row_key = clustering_key::from_clustering_prefix(*params._schema, prefix);
}
auto&& existing_list_opt = params.get_prefetched_list(m.key(), std::move(row_key), column);
int32_t idx = read_simple_exactly<int32_t>(*cvalue->_bytes);
if (!existing_list_opt) {
throw exceptions::invalid_request_exception("Attempted to delete an element from a list which is null");

Some files were not shown because too many files have changed in this diff Show More