Compare commits

102 Commits

Author SHA1 Message Date
Pekka Enberg
c87b24cc9a Merge "gossip mark alive fixes" from Asias
"This series fixes the use-after-free issue in gossip and eliminates the
duplicated / unnecessary mark alive operations.

Fixes #2341"

* tag 'asias/gossip_fix_mark_alive/v1' of github.com:cloudius-systems/seastar-dev:
  gossip: Ignore callbacks and mark alive operation in shadow round
  gossip: Ingore the duplicated mark alive operation
  gossip: Fix user after free in mark_alive

(cherry picked from commit 1e04731fa0)
2017-05-09 01:58:02 +03:00
Takuya ASADA
32cd286a5c dist/debian/debian/scylla-server.upstart: export SCYLLA_CONF, SCYLLA_HOME
We are sourcing sysconfig file on upstart, but forgot to load them as
environment variables.
So export them.

Fixes #2236

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1491209505-32293-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit b087616a6c)
2017-05-03 10:23:32 +03:00
Avi Kivity
80e7948d64 Merge "] Fix problems with slicing using sstable's promoted index" from Tomasz
"Fixes #2327.
Fixes #2326."

* 'tgrabiec/fix-promoted-index-parsing-1.7' of github.com:cloudius-systems/seastar-dev:
  sstables: Fix incorrect parsing of cell names in promoted index
  sstables: Fix find_disk_ranges() to not miss relevant range tombstones

(cherry picked from commit ea0591ad3d)
2017-04-30 17:05:01 +03:00
Pekka Enberg
f58104c434 release: prepare for 1.5.4 2017-04-26 07:53:51 +03:00
Tomasz Grabiec
f2517b5e04 sstables: Fix usage of wrong comparator in find_disk_ranges()
This made a difference if clustering restriction bounds were not full
keys but prefixes.

Fixes #2272.

Message-Id: <1493058357-24156-1-git-send-email-tgrabiec@scylladb.com>
2017-04-24 21:56:55 +03:00
Pekka Enberg
2ba78dc48c release: prepare for 1.5.3 2017-04-24 19:25:29 +03:00
Asias He
ccb0c33f75 gossip: Fix possible use-after-free of entry in endpoint_state_map
We take a reference to an endpoint_state entry in endpoint_state_map and
access it again after code that defers. The reference can become invalid
if someone deletes the entry during the deferral.

Fix this by taking the reference again after the deferring code.

I also audited the code to remove unsafe reference to endpoint_state_map entry
as much as possible.

Fixes the following SIGSEGV:

Core was generated by `/usr/bin/scylla --log-to-syslog 1 --log-to-stdout
0 --default-log-level info --'.
Program terminated with signal SIGSEGV, Segmentation fault.
(this=<optimized out>) at /usr/include/c++/5/bits/stl_pair.h:127
127     in /usr/include/c++/5/bits/stl_pair.h
[Current thread is 1 (Thread 0x7f1448f39bc0 (LWP 107308))]

Fixes #2271

Message-Id: <529ec8ede6da884e844bc81d408b93044610afd2.1491960061.git.asias@scylladb.com>
(cherry picked from commit d27b47595b)
2017-04-24 18:48:45 +03:00
Duarte Nunes
b24cb89d7c alter_type_statement: Fix signed to unsigned conversion
This could allow us to alter a non-existing field of an UDT.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170419114254.5582-1-duarte@scylladb.com>
(cherry picked from commit e06bafdc6c)
2017-04-19 14:48:56 +03:00
Avi Kivity
7c812efc8f Update seastar submodule
* seastar 548d67d...035cc15 (1):
  > prometheus: Use type_instance to diffentiate between metrics

Fixes #2221.
2017-03-28 11:19:24 +03:00
Amos Kong
c2ec9ef53e scylla_setup: match '-p' option of lsblk with strict pattern
On Ubuntu 14.04, lsblk doesn't have the '-p' option, but
`scylla_setup` tries to get the block list with `lsblk -pnr`, which
triggers an error.

The current simple pattern matches against the entire help text, so it
might match the wrong options.
  scylla-test@amos-ubuntu-1404:~$ lsblk --help | grep -e -p
   -m, --perms          output info about permissions
   -P, --pairs          use key="value" output format

Let's use a strict pattern that only matches an option at the start of a line. Example:
  scylla-test@amos-ubuntu-1404:~$ lsblk --help | grep -e '^\s*-D'
   -D, --discard        print discard capabilities

Signed-off-by: Amos Kong <amos@scylladb.com>
Message-Id: <4f0f318353a43664e27da8a66855f5831457f061.1489712867.git.amos@scylladb.com>
(cherry picked from commit 468df7dd5f)
2017-03-20 11:16:50 +02:00
Pekka Enberg
2200cea895 release: prepare for 1.5.2 2017-03-09 21:47:51 +02:00
Asias He
c024dfe093 repair: Fix midpoint is not contained in the split range assertion in split_and_add
We have:

  auto halves = range.split(midpoint, dht::token_comparator());

We saw a case where midpoint == range.start. As a result, range.split
will assert because range.start is marked non-inclusive, so the
midpoint doesn't appear to be contain()ed in the range, hence the
assertion failure.
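
The failure condition described above can be reproduced in a small standalone
model. This is only a sketch: token_range_model, midpoint_of and can_split_at
are hypothetical names, tokens are plain integers, and nothing here is the
actual dht or range code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical simplified range over integer tokens: (start, end],
// i.e. start is non-inclusive and end is inclusive.
struct token_range_model {
    std::int64_t start;
    std::int64_t end;

    bool contains(std::int64_t t) const {
        return start < t && t <= end;
    }
};

// Integer midpoint that rounds down, so it can equal start for narrow ranges.
// Assumes small values; overflow of (end - start) is not handled in this model.
inline std::int64_t midpoint_of(const token_range_model& r) {
    assert(r.start <= r.end);
    return r.start + (r.end - r.start) / 2;
}

// A split precondition in the spirit of the assertion: the split point
// must be contained in the range.
inline bool can_split_at(const token_range_model& r, std::int64_t point) {
    return r.contains(point);
}
```

For the range (10, 11] the rounded-down midpoint is 10, which equals the
non-inclusive start, so the precondition fails. How the real patch avoids that
case is not shown here.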

Fixes #2148

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Signed-off-by: Asias He <asias@scylladb.com>
Message-Id: <93af2697637c28fbca261ddfb8375a790824df65.1489023933.git.asias@scylladb.com>
(cherry picked from commit 39d2e59e7e)
2017-03-09 10:39:19 +02:00
Nadav Har'El
0fff3b60b1 sstable decompression: fix skip() to end of file
The skip() implementation for the compressed file input stream incorrectly
handled the case of skipping to the end of file: In that case we just need
to update the file pointer, but not skip anywhere in the compressed disk
file; In particular, we must NOT call locate() to find the relevant on-disk
compressed chunk, because there is none - locate() can only be called on
actual positions of bytes, not on the one-past-end-of-file position.
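
A minimal sketch of the rule described above, assuming a hypothetical
compressed_stream_model that tracks only positions (it is not the real
compressed file input stream, and its locate() is a stand-in):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of a compressed input stream; no real I/O is done.
struct compressed_stream_model {
    std::size_t uncompressed_len;  // total length of the uncompressed data
    std::size_t chunk_len;         // uncompressed bytes covered by one chunk
    std::size_t pos = 0;           // current uncompressed position
    int locate_calls = 0;          // how many times locate() was invoked

    // Maps a byte position to a chunk index. Only valid for positions of
    // actual bytes, never for the one-past-end-of-file position.
    std::size_t locate(std::size_t p) {
        assert(p < uncompressed_len);
        ++locate_calls;
        return p / chunk_len;
    }

    void skip(std::size_t n) {
        assert(n <= uncompressed_len - pos);
        pos += n;
        if (pos == uncompressed_len) {
            // Skipping to end of file: only the position is updated.
            return;
        }
        locate(pos);
    }
};
```

Skipping to the middle of the data calls locate() once; a further skip that
lands exactly on the end of the data only moves pos.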

Fixes #2143

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20170308100057.23316-1-nyh@scylladb.com>
(cherry picked from commit 506e074ba4)
2017-03-08 12:36:08 +02:00
Gleb Natapov
b9fb4442cc memtable: do not open code logalloc::reclaim_lock use
logalloc::reclaim_lock prevents reclaim from running, which may cause
regular allocation to fail even though there is enough free memory.
To solve that, there is an allocation_section, which acquires reclaim_lock
and, if the allocation fails, runs the reclaimer outside of the lock and
retries the allocation. The patch makes use of allocation_section instead of
direct use of reclaim_lock in memtable code.

Fixes #2138.

Message-Id: <20170306160050.GC5902@scylladb.com>
(cherry picked from commit d7bdf16a16)
2017-03-07 11:17:06 +02:00
Tomasz Grabiec
dc20d19f52 db: Fix overflow of gc_clock time point
If query_time is time_point::min(), which is used by
to_data_query_result(), the result of subtraction of
gc_grace_seconds() from query_time will overflow.

I don't think this bug would currently have user-perceivable
effects. This affects which tombstones are dropped, but in case of
to_data_query_result() uses, tombstones are not present in the final
data query result, and mutation_partition::do_compact() takes
tombstones into consideration while compacting before expiring them.

Fixes the following UBSAN report:

  /usr/include/c++/5.3.1/chrono:399:55: runtime error: signed integer overflow: -2147483648 - 604800 cannot be represented in type 'int'
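
The arithmetic in the report can be made safe by clamping instead of
subtracting blindly. The sketch below illustrates that general technique with
a hypothetical 32-bit seconds type; saturating_sub is an invented helper and
is not claimed to be what the patch does.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for a clock duration with a 32-bit signed rep.
using rep32 = std::int32_t;
using seconds32 = std::chrono::duration<rep32>;

// Subtracts a non-negative grace period from t, clamping at the minimum
// representable value instead of overflowing.
inline seconds32 saturating_sub(seconds32 t, seconds32 grace) {
    assert(grace.count() >= 0);
    const rep32 lo = std::numeric_limits<rep32>::min();
    // lo + grace cannot overflow because grace is non-negative.
    if (t.count() < lo + grace.count()) {
        return seconds32(lo);
    }
    return seconds32(t.count() - grace.count());
}
```

With t at the minimum (-2147483648) and a grace period of 604800 seconds, the
values from the report, the helper returns the minimum rather than wrapping.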

Message-Id: <1488385429-14276-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 4b6e77e97e)
2017-03-01 18:50:55 +02:00
Avi Kivity
ae830d9c51 Update seastar submodule
* seastar bc44be9...548d67d (1):
  > fix append_challenged_posix_file_impl::process_queue() to handle recursion

Fixes #2121.
2017-02-28 11:26:21 +02:00
Gleb Natapov
3336c1cf97 sstable: close sstable_writer's file if writing of sstable fails.
Failing to close a file properly before destroying file's object causes
crashes.

[tgrabiec: fixed typo]

Message-Id: <20170221144858.GG11471@scylladb.com>
(cherry picked from commit 0977f4fdf8)

Fixes #2122.
2017-02-28 11:24:34 +02:00
Tomasz Grabiec
181162b326 sstables: Fix double close on index and data files when writing fails
File output streams take responsibility for closing the file; they
close it as part of closing the stream.

During sstable writing we create sstable object and keep file
references there as well. Sstable object also has responsibility for
closing the files, and does so from sstable::~sstable().

Double close was supposed to be avoided by a construct like this:

  writer.close().get();
  _file = {};

However if close() failed, which can happen when write-ahead failed,
_file would not be cleared, and both the writer and sstable would
close the file. This will result in a crash in
append_challenged_posix_file_impl::close(), which is not prepared to
be closed twice.

Another problem is that if an exception happens before we reach that
construct, we should still close the writer. Currently we don't, so
there's no double close on the file, but that's a bug which needs to
be fixed, and once that's fixed, a double close on _file will be even more
likely.

The fix employed here is to not keep files inside sstable object when
writing. As soon as the writer is constructed, it's the only owner of
the file.
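
The single-owner idea can be sketched with ordinary move semantics. The types
below (file_model, writer_model) are hypothetical and synchronous; they only
illustrate that once the handle has been moved into the writer, nobody else is
left to close it a second time.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical file handle that counts close() calls in an external counter.
struct file_model {
    int& closes;
    void close() { ++closes; }
};

// Hypothetical writer: after construction it is the only owner of the file.
class writer_model {
    std::unique_ptr<file_model> _file;
public:
    explicit writer_model(std::unique_ptr<file_model> f) : _file(std::move(f)) {}
    ~writer_model() { close(); }

    // Closes the file at most once, however many times it is called.
    void close() {
        if (_file) {
            _file->close();
            _file.reset();
        }
    }
};
```

A caller that moved its handle into the writer is left with an empty pointer,
so an explicit close() followed by the destructor still closes the file once.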

Fixes #1764.

Message-Id: <1482428648-22553-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit f2a63270d1)
2017-02-28 11:16:56 +02:00
Shlomi Livne
a00c273b7d dist/redhat : fix backport of scylla.spec.in
Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
2017-02-15 23:55:57 +02:00
Takuya ASADA
2fa4aad847 dist/redhat: stop backporting ninja-build from Fedora, install it from EPEL instead
ninja-build-1.6.0-2.fc23.src.rpm was deleted from the Fedora web site for some
reason, but there is ninja-build-1.7.2-2 on EPEL, so we don't need to
backport from Fedora anymore.

Fixes #2087

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1487155729-13257-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit 9c8515eeed)
(cherry picked from commit a04722aeb2)
2017-02-15 13:07:40 +02:00
Avi Kivity
a7daa0655c Update seastar submodule
* seastar f4b5be5...bc44be9 (1):
  > prometheus: send one MetricFamily per unique metric name

Fixes #2077.
Fixes #2078.
2017-02-13 16:28:07 +02:00
Shlomi Livne
19c4353607 release: prepare for 1.5.1
Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
2017-02-06 17:19:09 +02:00
Avi Kivity
12428fd29e Merge "Avoid avalanche of tasks after memtable flush" from Tomasz
"Before, the logic for releasing writes blocked on dirty worked like this:

  1) When region group size changes and it is not under pressure and there
     are some requests blocked, then schedule request releasing task

  2) request releasing task, if no pressure, runs one request and if there are
     still blocked requests, schedules next request releasing task

If requests don't change the size of the region group, then either some request
executes or there is a request releasing task scheduled. The amount of scheduled
tasks is at most 1, there is a single releasing thread.

However, if requests themselves would change the size of the group, then each
such change would schedule yet another request releasing thread, growing the task
queue size by one.

The group size can also change when memory is reclaimed from the groups (e.g.
when they contain sparse segments). Compaction may start many request releasing
threads due to group size updates.

Such behavior is detrimental for performance and stability if there are a lot
of blocked requests. This can happen on 1.5 even with modest concurrency
because timed out requests stay in the queue. This is less likely on 1.6 where
they are dropped from the queue.

The releasing of tasks may start to dominate over other processes in the
system. When the amount of scheduled tasks reaches 1000, polling stops and
server becomes unresponsive until all of the released requests are done, which
is either when they start to block on dirty memory again or run out of blocked
requests. It may take a while to reach pressure condition after memtable flush
if it brings virtual dirty much below the threshold, which is currently the
case for workloads with overwrites producing sparse regions.

I saw this happening in a write workload from issue #2021 where the number of
request releasing threads grew into thousands.

Fix by ensuring there is at most one request releasing thread at a time. There
will be one releasing fiber per region group which is woken up when pressure is
lifted. It executes blocked requests until pressure occurs."
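
The "at most one releaser" rule can be illustrated with a small synchronous
model. region_group_model and its members are hypothetical, and a real
implementation would use an asynchronous fiber rather than a loop, so treat
this as a sketch of the invariant only.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical model of a region group with blocked requests.
class region_group_model {
    std::deque<std::function<void()>> _blocked;
    bool _under_pressure = true;
    bool _releasing = false;   // true while the single releaser is active
    int _releaser_starts = 0;  // how many times a releaser was started
public:
    void block(std::function<void()> req) { _blocked.push_back(std::move(req)); }

    // Called whenever the pressure state may have changed.
    void set_pressure(bool pressure) {
        _under_pressure = pressure;
        if (!pressure) {
            release();
        }
    }

    int releaser_starts() const { return _releaser_starts; }
    std::size_t blocked() const { return _blocked.size(); }

private:
    void release() {
        if (_releasing) {
            return;  // a releaser is already running; do not start another
        }
        _releasing = true;
        ++_releaser_starts;
        // Run blocked requests until pressure returns or the queue drains.
        while (!_under_pressure && !_blocked.empty()) {
            auto req = std::move(_blocked.front());
            _blocked.pop_front();
            req();  // may itself trigger set_pressure(false)
        }
        _releasing = false;
    }
};
```

Even if every released request triggers another notification, the model starts
exactly one releaser for the whole batch.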

* tag 'tgrabiec/lsa-single-threaded-releasing-v2' of github.com:cloudius-systems/seastar-dev:
  tests: lsa: Add test for reclaimer starting and stopping
  tests: lsa: Add request releasing stress test
  lsa: Avoid avalanche releasing of requests
  lsa: Move definitions to .cc
  lsa: Simplify hard pressure notification management
  lsa: Do not start or stop reclaiming on hard pressure
  tests: lsa: Adjust to take into account that reclaimers are run synchronously
  lsa: Document and annotate reclaimer notification callbacks
  tests: lsa: Use with_timeout() in quiesce()

(cherry picked from commit 7a00dd6985)
2017-02-03 10:16:00 +01:00
Tomasz Grabiec
41b7482ec9 Update seastar submodule
* seastar bd9eda1...f4b5be5 (1):
  > future-util: Introduce with_timeout()
2017-02-03 10:16:00 +01:00
Takuya ASADA
cfeb5c62ba dist/redhat: add python-setuptools on dependency since it requires for scylla-housekeeping
scylla-housekeeping breaks when python-setuptools isn't installed, so
add it as a dependency.

Fixes #1884

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1483525828-7507-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit 43655512e1)
2017-01-04 14:34:38 +02:00
Benoît Canet
0529766e00 scylla_setup: Use blkid or ls to list potentials block devices
blkid does not list root raw device.

Revert to lsblk while taking care of having a fallback
path in case the -p option is not supported.

Fixes #1963.

Suggested-by: Avi Kivity <avi@scylladb.com>
Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <20161225100204.13297-1-benoit@scylladb.com>
(cherry picked from commit a24ff47c63)
2016-12-27 15:21:18 +02:00
Raphael S. Carvalho
00a25f1698 db: avoid excessive disk usage during sstable resharding
Shared sstables will now be resharded in the same order to guarantee
that all shards owning an sstable will agree on its deletion at nearly
the same time, thereby reducing the disk space requirement.
That's done by picking which column family to reshard in UUID order,
and each individual column family will reshard its shared sstables
in generation order.

Fixes #1952.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <87ff649ed24590c55c00cbb32bffd8fa2743e36e.1482342754.git.raphaelsc@scylladb.com>
(cherry picked from commit 27fb8ec512)
2016-12-27 12:19:37 +02:00
Takuya ASADA
17c6fe8b77 dist/redhat: don't try to adduser when user is already exists
Currently we get "failed adding user 'scylla'" on .rpm installation when the user already exists; we can skip it to prevent the error.

Fixes #1958

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1482550075-27939-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit f3e45bc9ef)
2016-12-27 09:47:39 +02:00
Vlad Zolotarov
2228c2edf7 tracing: don't start tracing until a Tracing service is fully initialized
RPC messaging service is initialized before the Tracing service, so
we should prevent creation of tracing spans before the service is
fully initialized.

We will use an already existing "_down" state and extend it in a way
that !_down equals "started", where "started" is TRUE when the local
service is fully initialized.

We will also split the Tracing service initialization into two parts:
   1) Initialize the sharded object.
   2) Start the tracing service:
      - Create the I/O backend service.
      - Enable tracing.

Fixes issue #1939

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <1482424317-6665-1-git-send-email-vladz@scylladb.com>
2016-12-22 19:40:04 +02:00
Glauber Costa
78f2f50f09 track streaming and system virtual dirty memory
A case could be made that we should have counters for them no matter
what, since it can help us reason about the distribution of memory among
the groups. But with the hierarchy being broken in 1.5 it becomes even
more important. Now by looking solely at dirty, we will have no idea
about how much memory we are using in those groups.

After this patch, the dirty_memory_manager will register its metrics
for the 3 groups that we have, and the legacy names will be used to show
totals.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <0d04ca4c7e8472097f16a5dc950b77c73766049e.1481831644.git.glauber@scylladb.com>
(cherry picked from commit 7133583797)
2016-12-22 13:51:22 +02:00
Tomasz Grabiec
eaaabcb5d6 tests: Remove unintentional enablement of trace-level logging
Sneaked in by mistake.

(cherry picked from commit c9344826e9)
2016-12-21 15:38:51 +01:00
Pekka Enberg
654919cbf1 release: prepare for 1.5.0 2016-12-21 12:12:11 +02:00
Tomasz Grabiec
0d0e53c524 tests: commitlog: Fix assumption about write visibility
The test assumed that mutations added to the commitlog are visible to
reads as soon as a new segment is opened. That's not true because
buffers are written back in the background, and new segment may be
active while the previous one is still being written or not yet
synced.

Fix the test so that it expects that the number of mutations read
this way is <= the number of mutations read, and that after all
segments are synced, the number of mutations read is equal.

Message-Id: <1481630481-19395-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit fe6a70dba1)
2016-12-20 20:08:48 +01:00
Glauber Costa
99d9b4e727 commitlog: correctly report requests blocked
The semaphore future may be unavailable for many reasons. Specifically,
if the task quota is depleted right between sem.wait() and the .then()
clause in get_units() the resulting future won't be available.

That is particularly visible if we decrease the task quota, since those
events will be more frequent: we can in those cases clearly see this
counter going up, even though there aren't more requests pending than
usual.

This patch improves the situation by replacing that check. We now verify
whether or not there are waiters in the semaphore.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <113c0d6b43cd6653ce972541baf6920e5765546b.1481222621.git.glauber@scylladb.com>
(cherry picked from commit 9b5e6d6bd8)
2016-12-19 15:26:35 +01:00
Pekka Enberg
e2790748e6 release: prepare for 1.5.rc3 2016-12-18 11:14:09 +02:00
Tomasz Grabiec
e82324fb82 Merge branch 'virtual-dirty-fixes-1.5-backport' from git@github.com:glommer/scylla.git into branch-1.5
Rework dirty memory hierarchy from Glauber.
2016-12-16 19:48:08 +01:00
Glauber Costa
1ae62678e9 config: get rid of memtable_total_space
Those values are now statically set.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
(cherry picked from commit 2aa6514667)
Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-12-16 10:56:34 -05:00
Glauber Costa
09a463fd31 database: rework dirty memory hierarchy
Issue #1918 describes a problem, in which we are generating smaller
memtables than we could, and therefore not respecting the flush
criteria.

That happens because group sizes (and limits) for pressure purposes, and
the soft threshold is currently at 40 %. This causes system group's
soft threshold to be way below regular's virtual dirty limit and close
to regular group's soft threshold. The system group was very likely to
become under soft pressure when regular was because writes to regular
group are not yet throttled when they cross both soft thresholds.

This is a direct consequence of the linear hierarchy between the regions,
and to guarantee that it won't happen we would have to acquire the semaphore
of all ancestor regions when flushing from a child region. While that
works, it can lead to problems of its own, like priority inversion if
the regions have different priorities (like streaming and regular), and
groups lower in the hierarchy, like user, blocking explicit flushes
from their ancestors.

To fix that, this patch reorganizes the dirty memory region groups so
that groups are now completely independent. As a disadvantage, when
streaming happens we will draw some memory from the cache, but we will
live with it for the time being.

Fixes #1918

[ glauber: fix conflicts in memtable.cc due to lack of graceful clear ]

Signed-off-by: Glauber Costa <glauber@scylladb.com>
(cherry picked from commit 80440c0d79)
Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-12-16 10:55:57 -05:00
Glauber Costa
347136380d system keyspace: write batchlog mutation in user memory
Batchlog is a potentially memory-intensive table whose workload is
driven by user needs, not system's. Move it to the user dirty memory
manager.

[ glauber: fix conflict with virtual readers ]

Signed-off-by: Glauber Costa <glauber@scylladb.com>
(cherry picked from commit db7cc3cba8)
Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-12-16 10:54:47 -05:00
Glauber Costa
8680174f37 database: remove friendship declaration
Not needed anymore since memtable started having a direct pointer to the
memtable list.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
(cherry picked from commit 2e8c7d2c62)
Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-12-16 10:53:40 -05:00
Glauber Costa
261b67f4f5 database: simplify flush_one
flush_one has to make sure that we're using the correct
dirty_memory_manager object, because we could be flushing from a region
group different than the one the flush request originated.

It's simpler to just assume flush_one will be dealing with the right
object, and use a different object instead of "this" when calling it.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
(cherry picked from commit bb1509c21e)
Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-12-16 10:53:33 -05:00
Glauber Costa
bb173e3e2a database: make memtable_list aware in cases it can't flush
Some of our CFs can't be flushed. Those are the ones that are not marked
as having durable writes. We treat them just the same from the point of
view of the flush logic, but they provide a function that doesn't do
anything and just returns right away.

We already had troubles with that in the past, and that also poses a
problem for an upcoming patch reworking the flush memtable pick
criteria.

It's easier, simpler, and cleaner, to just make the memtable_list aware
it can't flush. Achieving that is also not very complicated: we just
need a special constructor that doesn't take a seal function and then we
make sure that it is initialized to an empty std::function.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
(cherry picked from commit 8ab7c04caa)
Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-12-16 10:53:24 -05:00
Glauber Costa
9688dca861 database: move reversion of virtual dirty state closer to update_cache.
When we finish writing a memtable, we revert the dirty memory charges
immediately. When we do that, dirty memory will grow back to what it
was, and soon (we hope) will go down again when we release the requests
for real.

During that time, we may not accept new requests. Sealing can take a
long time, especially in the face of Linux issues like the ones we have
seen in the past. It also will take proportionally more time if the
SSTables end up being small, which is a possibility in some scenarios.

This patch changes the dirty_memory_manager so that the charges won't be
reverted right after we finish the flush. Rather, we will hold on to it,
and revert it right before we update the cache. We don't need to do it
for all classes of memtable writes, because after we finish flushing,
flush_one() will destroy the hashed element anyway.

[tgrabiec: conflicts]

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <2d5a8f6ca57d5036f4850ac163557bca59b8063d.1480004384.git.glauber@scylladb.com>
(cherry picked from commit c32803f2f0)
2016-12-12 19:05:34 +01:00
Duarte Nunes
549c979035 lz4: Conditionally use LZ4_compress_default()
Since not all distributions have a version of LZ4 with
LZ4_compress_default(), we use it conditionally.

This is especially important beginning with version 1.7.3 of LZ4,
which deprecates the LZ4_compress() function in favour of
LZ4_compress_default() and thus prevents Scylla from compiling
due to the deprecation warning.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20161124092339.23017-1-duarte@scylladb.com>
(cherry picked from commit cc3f26c993)
2016-12-11 19:33:26 +02:00
Avi Kivity
631d921767 Update seastar submodule
* seastar 386ccd9...bd9eda1 (1):
  > rpc: Conditionally use LZ4_compress_default()
2016-12-11 19:24:06 +02:00
Glauber Costa
0a341b403b database: try to acquire semaphore before we start flush
As Tomek pointed out, as we are starting the flush before we acquire the
semaphore, we are not really limiting parallelism, but only delaying the
end of the flush instead.

Fixes #1919

[tgrabiec: conflicts]

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <6cbf9ec2f3a341c76becf94f794cfa16539c5192.1481120410.git.glauber@scylladb.com>
(cherry picked from commit 733d87fcc6)
2016-12-09 10:56:36 +01:00
Avi Kivity
182f67cf23 sstables: fix probe with Unknown component
Commit 53b7b7def3 ("sstables: handle unrecognized sstable component")
ignores unrecognized components, but misses one code path during probe_file().

Ignore unrecognized components there too.

Fixes #1922.
Message-Id: <20161208131027.28939-1-avi@scylladb.com>

(cherry picked from commit 872b5ef5f0)
2016-12-08 17:23:30 +02:00
Tomasz Grabiec
dc08cb46bb commitlog: Fix replay to not delete dirty segments
The problem is that replay will unlink any segments which were on disk
at the time the replay starts. However, some of those segments may
have been created by current node since the boot. If a segment is part
of reserve for example, it will be unlinked by replay, but we will
still use that segment to log mutations. Those mutations will not be
visible to replay after a crash though.

The fix is to record preexisting segments before any new segments will
have a chance to be created and use that as the replay list.

Introduced in abe7358767.

dtest failure:

 commitlog_test.py:TestCommitLog.test_commitlog_replay_on_startup

Message-Id: <1481117436-6243-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit f7197dabf8)
2016-12-07 15:56:08 +02:00
Amos Kong
06db918d1e systemd: reset housekeeping timer at each start
Currently the housekeeping timer isn't reset when we restart scylla-server.
We expect the service to be run at each start, which is consistent with
the upstart script in Ubuntu 14.04.

When we restart scylla-server, the housekeeping timer will also be restarted,
so let's replace "OnBootSec" with "OnActiveSec".

Fixes: #1601

Signed-off-by: Amos Kong <amos@scylladb.com>
Message-Id: <a22943cc11a3de23db266c52fd476c08014098c4.1480607401.git.amos@scylladb.com>
2016-12-06 18:33:56 +02:00
Takuya ASADA
edbd25ea0c dist/common/systemd/scylla-housekeeping.timer: workaround to avoid crash of systemd on RHEL 7.3
RHEL 7.3's systemd contains a known bug in timer.c:
https://github.com/systemd/systemd/issues/2632

This is a workaround to avoid hitting the bug.

Fixes #1846

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1480452194-11683-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit 8464903021)
2016-12-06 10:48:52 +02:00
Pekka Enberg
c7f7a3aaa1 release: prepare for 1.5.rc2 2016-12-05 09:50:37 +02:00
Paweł Dziepak
c014e7385d row_cache: dummy entry does not count as partition
Since the introduction of the continuity flag, the row cache contains a
single dummy entry. cache_tracker knows nothing about it, so it doesn't appear in
any of the metrics. However, cache destructor calls
cache_tracker::on_erase() for every entry in the cache including the
dummy one. This is incorrect since the tracker wasn't informed when the
dummy entry was created.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1478608776-10363-1-git-send-email-pdziepak@scylladb.com>
2016-12-01 14:26:20 +01:00
Glauber Costa
abe7358767 prevent commitlog replay position reordering during reserve refill
When requests hit the commitlog, each of them will be assigned a replay
position, which we expect to be ordered. If reorders happen, the request
will be discarded and re-applied. Although this is supposed to be rare,
it does increase our latencies, especially when big requests are
involved. Processing big requests is expensive and if we have to do it
twice that adds to the cost.

The commitlog is supposed to issue replay positions in order, and it
could be that the code that adds them to the memtables will reorder
them. However, there is one instance in which the commitlog will not
keep its side of the bargain.

That happens when the reserve is exhausted, and we are allocating a
segment directly at the same time the reserve is being replenished.  The
following sequence of events with its deferring points will illustrate
it:

on_timer:

    return this->allocate_segment(false). // defer here // then([this](sseg_ptr s) {

At this point, the segment id is already allocated.

new_segment():

    if (_reserve_segments.empty()) {
	[ ... ]
        return allocate_segment(true).then ...

At this point, we have a new segment that has an id that is higher than
the previous id allocated.

Then we resume the execution from the deferring point in on_timer():

    i = _reserve_segments.emplace(i, std::move(s));

The next time we need to allocate a segment, we'll pick it from the
reserve. But the segment in the reserve has an id that is lower than the
id that we have already used.

Reorders are bad, but this one is particularly bad: because the reorder
happens with the segment id side of the replay position, that means that
every request that falls into that segment will have to be reinserted.

This bug can be a bit tricky to reproduce. To make it more common, we
can artificially add a sleep() fiber after the allocate_segment(false)
in on_timer(). If we do that, we'll see a sea of reinsertions going on
in the logs (if dblog is set to debug).

Applying this patch (keeping the sleep) will make them all disappear.
We do this by rewriting the reserve logic, so that the segments always
come from the reserve. If we draw from a single pool all the time, there
is no chance of reordering happening. To make that more amenable, we'll
have the reserve filler always running in the background and take it out
of the timer code.
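
Why drawing from a single pool rules out reordering can be seen in a small
model. segment_reserve_model is hypothetical and synchronous, with ids
allocated in exactly one place; it is a sketch of the idea, not the commitlog
code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical model: segment ids are allocated only by the filler and
// handed out only from the front of a FIFO reserve.
class segment_reserve_model {
    std::deque<std::uint64_t> _reserve;  // pre-allocated segment ids, oldest first
    std::uint64_t _next_id = 1;
public:
    // The single filler: the only place where new ids are allocated.
    void refill(std::size_t target) {
        while (_reserve.size() < target) {
            _reserve.push_back(_next_id++);
        }
    }

    // Consumers always take from the reserve and never allocate directly.
    // In this synchronous model an empty reserve is filled inline; an
    // asynchronous system would wait for the filler instead.
    std::uint64_t new_segment() {
        if (_reserve.empty()) {
            refill(1);
        }
        std::uint64_t id = _reserve.front();
        _reserve.pop_front();
        assert(id < _next_id);
        return id;
    }
};
```

Because ids enter the queue in allocation order and leave in the same order,
each id handed out is larger than the previous one, whether or not the reserve
ran dry in between.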

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <49eb7edfcafaef7f1fdceb270639a9a8b50cfce7.1480531446.git.glauber@scylladb.com>
(cherry picked from commit 99a5a77234)
2016-12-01 13:33:35 +01:00
Glauber Costa
0bce019781 commitlog: sync segments before acquiring semaphore on shutdown.
Sync all segments before acquiring the semaphore; otherwise acquiring it may
have to wait for the timer to kick in and push them down.
Note that we can't guarantee that no other requests were executed in the
meantime, so we have to sync again.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <aea019fe49820acce5d2b55dd5ec31e975b3436c.1480388674.git.glauber@scylladb.com>
(cherry picked from commit 353a4cd2d4)
2016-12-01 13:33:35 +01:00
Tomasz Grabiec
ae3b1667e3 tests: Fix use-after-free on commitlog
Only shutdown() ensures all internal processes are complete. Call it before calling clear().

Message-Id: <1480495534-2253-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit c35e18ba12)
2016-12-01 13:33:35 +01:00
Tomasz Grabiec
2aa73ac176 Update seastar submodule
* seastar 6fd4534...386ccd9 (1):
  > queue: allow queue to change its maximum size
2016-12-01 13:33:35 +01:00
Avi Kivity
261fcc1e12 Update scylla-ami submodule
* dist/ami/files/scylla-ami e1e3919...d5a4397 (3):
  > scylla_install_ami: allow specify different repository for Scylla installation and receive update
  > scylla_install_ami: delete unneeded authorized_keys from AMI image
  > scylla_ami_setup: run posix_net_conf.sh when NCPUS < 8
2016-12-01 10:46:21 +02:00
Takuya ASADA
3a7b9d55da dist/ami: allow specify different repository for Scylla installation and receive update
This fix splits build_ami.sh --repo into three different options:
 --repo-for-install is for Scylla package installation, only valid
 during AMI construction.

 --repo-for-update will be stored at /etc/yum.repos.d/scylla.repo, to
 receive update package on AMI.

 --repo is both, for installation and update.

Fixes #1872

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1480438858-6007-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit 17ef5e638e)
2016-12-01 10:44:47 +02:00
Glauber Costa
60d5b21e28 database: do not call seal directly from the streaming timer
Streaming memtables have a delayed mode where many flushes are coalesced
together into one, with the actual flush happening later and propagated
to all the previous waiters.

However, the timer that triggers the actual flush was not using the
newly introduced flush infrastructure. This was a minor problem because
those flushes wouldn't try to take the semaphore, and so we could have
many flushes going on at the same time.

What was a potential performance issue became a correctness issue when
we moved the reversal of the dirty memory accounting out of
revert_potentially_cleaned_up_memory() into remove_from_flush_manager().

Since the latter is only called through the flush infrastructure, it
simply wasn't called. So the deferral of the reversal exposed this bug.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <0d5755375bc27524b8cfb9970c76d492b14d9eea.1480522742.git.glauber@scylladb.com>
(cherry picked from commit d7256e7b21)
2016-11-30 18:01:52 +01:00
Glauber Costa
903a323ba2 commitlog: use read ahead for replay requests
Aside from putting the requests in the commitlog class, read ahead
will help us go through the file faster.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
(cherry picked from commit 59a41cf7f1)
2016-11-30 13:03:33 +02:00
Glauber Costa
0174b9ad18 commitlog: use commitlog priority for replay
Right now replay is being issued with the standard seastar priority.
The rationale for that at the time is that it is an early event that
doesn't really share the disk with anybody.

That is largely untrue now that we start compactions on boot.
Compactions may fight for bandwidth with the commitlog, and with such
low priority the commitlog is guaranteed to lose.

Fixes #1856

Signed-off-by: Glauber Costa <glauber@scylladb.com>
(cherry picked from commit aa375cd33d)
2016-11-30 13:03:27 +02:00
Glauber Costa
3b7f646f88 commitlog: close file after read, and not at stop
There are other code paths that may interrupt the read in the middle
and bypass stop. It's safer this way.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <8c32ca2777ce2f44462d141fd582848ac7cf832d.1479477360.git.glauber@scylladb.com>
(cherry picked from commit 60b7d35f15)
2016-11-30 13:01:49 +02:00
Glauber Costa
127152e0a7 commitlog: close replay file
Replay file is opened, so it should be closed. We're not seeing any
problems arising from this, but they may happen. Enabling read ahead in
this stream makes them happen immediately. Fix it.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
(cherry picked from commit 4d3d774757)
2016-11-30 12:59:04 +02:00
Takuya ASADA
80811d3891 dist/common/scripts/scylla_kernel_check: fix incorrect document URL
Fixes #1871

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1480327243-18177-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit 1042e40188)
2016-11-29 11:12:09 +02:00
Avi Kivity
c6ffda7abe Update seastar submodule
* seastar df471a8...6fd4534 (1):
  > Collectd get_value_map safe scan the map

Fixes #1835.
2016-11-27 18:19:43 +02:00
Takuya ASADA
be9f62bd60 dist/ubuntu: increase number of open files on Ubuntu 14.04(upstart)
Follow the change of NOFILE for non-systemd environment.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1479975050-14907-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit ce80fb3a39)
2016-11-24 17:17:41 +02:00
Glauber Costa
d6ab5ff179 dist: increase number of open files
This limit was found to be too low for production environments. It would
be hit at boot, when we're touching a lot of files from multiple shards
before deciding that we don't need them.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <87bbf43da1a67f5fa6174017205c6ef8bdb0dc3d.1479829232.git.glauber@scylladb.com>
(cherry picked from commit 18b9fa3d43)
2016-11-24 17:17:13 +02:00
Duarte Nunes
8a83819f1d thrift: Don't apply cell limit across rows
In Thrift, SliceRange defines a count that limits the number of cells
to return from that row (in CQL3 terms, it limits the number of rows
in that partition). While this limit is honored in the engine, the
Thrift layer also applies the same limit, which, while redundant in
most cases, is used to support the get_paged_slice verb.

Currently, the limit is not being reset per Thrift row (CQL3
partition), so in practice, instead of limiting the cells in a row,
we're limiting the rows we return as well. This patch fixes that by
ensuring the limit applies only within a row/partition.

Fixes #1882

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20161123220001.15496-1-duarte@scylladb.com>
(cherry picked from commit a527ba285f)
2016-11-24 10:38:55 +02:00
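As a rough illustration of the bug described in 8a83819f1d, the following C++ sketch contrasts a cell budget that is shared across rows with one that is reset for every row. The `thrift_row` alias and the `slice` function are hypothetical stand-ins and are not taken from the Thrift layer itself.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: a Thrift row (CQL3 partition) is just a list of cell names.
using thrift_row = std::vector<std::string>;

// Returns at most `count` cells per row when reset_per_row is true (the
// intended behavior). When false, a single budget is shared by all rows,
// which mimics the bug: once it is exhausted, later rows are dropped.
std::vector<thrift_row> slice(const std::vector<thrift_row>& rows,
                              std::size_t count, bool reset_per_row) {
    std::vector<thrift_row> out;
    std::size_t remaining = count;
    for (const auto& r : rows) {
        if (reset_per_row) {
            remaining = count;  // the fix: start a fresh budget for each row
        }
        thrift_row cells;
        for (const auto& c : r) {
            if (remaining == 0) {
                break;
            }
            cells.push_back(c);
            --remaining;
        }
        if (!cells.empty()) {
            out.push_back(std::move(cells));
        }
    }
    return out;
}
```

With two rows of three cells each and a count of 2, the shared budget returns only the first row, while the per-row budget returns two cells from each row.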
Pekka Enberg
44249e4b09 dist/docker: Actually use 1.5...
Fix typo in the RPM repository URL to actually use 1.5.
2016-11-24 07:57:16 +02:00
Pekka Enberg
33c3a7e722 dist/docker: Use Scylla 1.5 RPM repository 2016-11-24 07:47:44 +02:00
Tomasz Grabiec
de5327a4fb Update seastar submodule
* seastar 25137c2...df471a8 (1):
  > semaphore_units: add missing return statement
2016-11-23 19:55:36 +01:00
Glauber Costa
6c7f055955 keep background work semaphore alive during sstable flush
We have a semaphore controlling the amount of background work generated
by the memtable flush process. However, because we are not moving it
inside the memtable post-flush continuation, the units are being
released when we start the flush and not when we finish it.

That's not the intended behavior and that can cause flushes to
accumulate.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <b7dc1866ed3473b9b1862c433d59c5ebd8575dbc.1479839600.git.glauber@scylladb.com>
(cherry picked from commit 13973e7f3b)
2016-11-22 19:54:38 +01:00
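The lifetime problem described in 6c7f055955 can be sketched in plain C++, without Seastar. Everything here is an assumption made for illustration: `counter` stands in for the semaphore, `units` for its RAII units, and the returned `std::function` for the post-flush continuation.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Hypothetical stand-in for a semaphore: just a count of available units.
struct counter {
    int available;
};

// Hypothetical RAII units: taken on construction, returned on destruction.
class units {
    counter& _c;
    int _n;
public:
    units(counter& c, int n) : _c(c), _n(n) { _c.available -= _n; }
    units(const units&) = delete;
    units& operator=(const units&) = delete;
    ~units() { _c.available += _n; }
};

// Buggy shape: the units are a local, so they are released as soon as the
// flush has been started, not when the continuation has run.
std::function<void()> start_flush_buggy(counter& c) {
    units u(c, 1);
    return [] { /* flush completes here, but the units are already gone */ };
}

// Fixed shape: the units are owned by the continuation, so they stay taken
// until the continuation object itself is destroyed.
std::function<void()> start_flush_fixed(counter& c) {
    auto u = std::make_shared<units>(c, 1);
    return [u] { /* flush completes here while the units are still held */ };
}
```

In the buggy shape the count is back at its initial value while the flush is still pending, which is what lets flushes accumulate.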
Glauber Costa
d58af7ded5 commitlog: acquire semaphore earlier
Recently we have changed our shutdown strategy to wait for the
_request_controller semaphore to make sure no other allocations are
in-flight. That was done to fix an actual issue.

The problem is that this wasn't done early enough. We acquire the
semaphore after we have already marked ourselves as _shutdown and
released the timer.

That means that if there is an allocation in flight that needs to use a
new segment, it will never finish - and we'll therefore never acquire
the semaphore.

Fix it by acquiring it first. At this point the allocations will all be
done and gone, and then we can shutdown everything else.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <5c2a2f20e3832b6ea37d6541897519a9307294ed.1479765782.git.glauber@scylladb.com>
(cherry picked from commit 0b8b5abf16)
2016-11-21 22:23:15 +00:00
Avi Kivity
d9700a2826 storage_proxy: don't query concurrently needlessly during range queries
storage_proxy has an optimization where it tries to query multiple token
ranges concurrently to satisfy very large requests (an optimization which is
likely meaningless when paging is enabled, as it always should be).  However,
the rows-per-range code severely underestimates the number of rows per range,
resulting in a large number of "read-ahead" internal queries being performed,
the results of most of which are discarded.

Fix by disabling this code. We should likely remove it completely, but let's
start with a band-aid that can be backported.

Fixes #1863.

Message-Id: <20161120165741.2488-1-avi@scylladb.com>
(cherry picked from commit 6bdb8ba31d)
2016-11-21 18:19:59 +02:00
Glauber Costa
d2438059a7 database: keep a pointer to the memtable list in a memtable
We currently pass a region group to the memtable, but after so many recent
changes, that is a bit too low level. This patch changes that so we pass
a memtable list instead.

Doing that also has a couple of advantages. Mainly, during flush we must
get from a memtable to a memtable_list. Currently we do that by going from
the memtable to a column family through the schema, and from there to
the memtable_list.

That, however, involves calling virtual functions in a derived class,
because a single column family could have both streaming and normal
memtables. If we pass a memtable_list to the memtable, we can keep a
pointer, and when needed get the memtable_list directly.

Not only does that get rid of the inheritance for aesthetic reasons, but
that inheritance is not even correct anymore. Since the introduction of
the big streaming memtables, we now have a plethora of lists per column
family and this traversal is totally wrong. We haven't noticed before
because we were flushing the memtables based on their individual sizes,
but it has been wrong all along for edge cases in which we would have to
resort to size-based flush. This could be the case, for instance, with
various plan_ids in flight at the same time.

At this point, there is no more reason to keep the derived classes for
the dirty_memory_manager. I'm only keeping them around to reduce
clutter, although they are useful for the specialized constructors and
to communicate to the reader exactly what they are. But those can be
removed in a follow up patch if we want.

The old memtable constructor signature is kept around for the benefit of
two tests in memtable_tests which have their own flush logic. In the
future we could do something like we do for the SSTable tests, and have
a proxy class that is friends with the memtable class. That too, is left
for the future.

Fixes #1870

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <811ec9e8e123dc5fc26eadbda82b0bae906657a9.1479743266.git.glauber@scylladb.com>
(cherry picked from commit 0ca8c3f162)
2016-11-21 18:18:56 +02:00
Glauber Costa
4098831ebc commitlog: wait for pending allocations to finish before closing gate.
Allocations may enter the gate, so it would be wise for us to wait for them.

Fixes #1860

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <53cd6996c1cbd8b38bab3b03604bd11e5c20beda.1479650012.git.glauber@scylladb.com>
(cherry picked from commit 21c1e2b48c)
2016-11-20 20:00:32 +02:00
Glauber Costa
4539b8403a database: fix direct flushes of non-durable column families.
If a Column Family is non-durable, then its flushes will never create a
memtable flush reader. Our current flush logic depends on that being
created and destroyed to release the semaphore permits on the flush.

We will remove the permits ourselves if there is an exception, but not
under normal circumstances. Given this issue, however, it would be more
adequate to always try to remove the permits after we flush. If the
permits were already removed by the flush reader, then this test will
just see that the permit is not in the map and return. But if it is
still there, then it is removed.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <049334c3b4bef620af2c7c045e6c84347dcf9013.1479498026.git.glauber@scylladb.com>
(cherry picked from commit 1933349654)
2016-11-18 21:33:22 +01:00
Raphael S. Carvalho
558f535fcb db: do not leak deleted sstable when deletion triggers an exception
The leakage results in deleted sstables being kept open until shutdown, and disk
space isn't released. That's because column_family::rebuild_sstable_list()
will not remove reference to deleted sstables if an exception was triggered in
sstables::delete_atomically(). A sstable only has its files closed when its
object is destructed.

The exception happens when a major compaction is issued in parallel to a
regular one, and one of them will be unable to delete a sstable already deleted
by the other. That results in remove_by_toc_name() triggering boost::filesystem
::filesystem_error because TOC and temporary TOC don't exist.

We wouldn't have seen this problem if major compaction were going through
compaction manager, but remove_by_toc_name() and rebuild_sstable_list() should
be made resilient.

Fixes #1840.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <d43b2e78f9658e2c3c5bbb7f813756f18874bf92.1479390842.git.raphaelsc@scylladb.com>
(cherry picked from commit 3dc9294023)
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <760f96d81de0bab7507bb4f52c06b30f21e82577.1479420770.git.raphaelsc@scylladb.com>
2016-11-18 13:10:46 +02:00
Glauber Costa
3d45d0d339 fix shutdown and exception conditions for flush logic
This patch addresses post-merge follow up comments by Tomek.
Basically, what we do is:
- we don't need to signal() from remove_from_flush_manager(), because
  the explicit flushes no longer wait on the condition variable. So we
  don't.
- We now wait on the stop() flushes (regardless of their return status)
  so we can make sure that the _flush_queue will indeed be done with.
- we acquire the semaphore before shutting down the dirty_memory_manager
  to make sure that there are no pending flushes
- the flush manager that holds the semaphore has to match in the exception
  handler

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <a23ab5098934546c660a08de64cd9294bb3a2008.1479400239.git.glauber@scylladb.com>
(cherry picked from commit 461778918b)
2016-11-18 11:53:21 +02:00
Avi Kivity
affc0d9138 Merge "get rid of memtable size parameter and rework flush logic" from Glauber
"This patchset allows Scylla to determine the size of a memtable instead
of relying on the user-provided memtable_cleanup_threshold. It does that
by allowing the region_group to specify a soft limit which will trigger
the allocation as early as it is reached.

Given that, we'll keep the memtables in memory for as long as it takes
to reach that limit, regardless of the individual size of any single one
of them. That limit is set to 1/4 of dirty memory. That's the same as
last submission, except this time I have run some experiments to gauge
behavior of that versus 1/2 of dirty memory, which was a preferred
theoretical value.

After that is done, the flush logic is reworked to guarantee that
flushes are not initiated if we already have one memtable under flush.
That allows us to better take advantage of coalescing opportunities with
new requests and prevents the pending memtable explosion that is
ultimately responsible for Issue 1817.

I have run mainly two workloads with this. The first one is a local RF=1
workload with large partitions, sized 128kB and 100 threads. The results
are:

Before:

op rate                   : 632 [WRITE:632]
partition rate            : 632 [WRITE:632]
row rate                  : 632 [WRITE:632]
latency mean              : 157.8 [WRITE:157.8]
latency median            : 115.5 [WRITE:115.5]
latency 95th percentile   : 486.7 [WRITE:486.7]
latency 99th percentile   : 534.8 [WRITE:534.8]
latency 99.9th percentile : 599.0 [WRITE:599.0]
latency max               : 722.6 [WRITE:722.6]
Total partitions          : 189667 [WRITE:189667]
Total errors              : 0 [WRITE:0]
total gc count            : 0
total gc mb               : 0
total gc time (s)         : 0
avg gc time(ms)           : NaN
stdev gc time(ms)         : 0
Total operation time      : 00:05:00
END

After:

op rate                   : 951 [WRITE:951]
partition rate            : 951 [WRITE:951]
row rate                  : 951 [WRITE:951]
latency mean              : 104.8 [WRITE:104.8]
latency median            : 102.5 [WRITE:102.5]
latency 95th percentile   : 155.8 [WRITE:155.8]
latency 99th percentile   : 177.8 [WRITE:177.8]
latency 99.9th percentile : 686.4 [WRITE:686.4]
latency max               : 1081.4 [WRITE:1081.4]
Total partitions          : 285324 [WRITE:285324]
Total errors              : 0 [WRITE:0]
total gc count            : 0
total gc mb               : 0
total gc time (s)         : 0
avg gc time(ms)           : NaN
stdev gc time(ms)         : 0
Total operation time      : 00:05:00
END

The other workload was the workload described in #1817. And the result
is that we now have a load that is very stable around 100k ops/s and
hardly any timeouts, instead of the 1.4 baseline of wild variations
around 100k ops/s and lots of timeouts, or the deep reduction of
1.5-rc1."

* 'issue-1817-v4' of github.com:glommer/scylla:
  database: rework memtable flush logic
  get rid of max_memtable_size
  pass a region to dirty_memory_manager accounting API
  memtable: add a method to expose the region_group
  logalloc: allow region group reclaimer to specify a soft limit
  database: remove outdated comment
  database: uphold virtual dirty for system tables.

(cherry picked from commit 5d067eebf2)
2016-11-17 14:41:23 +02:00
Gleb Natapov
3c68504e54 sstables: fix ad-hoc summary creation
If the sstable Summary is not present, Scylla does not refuse to boot but
instead creates summary information on the fly. There is a bug in this
code though. A Summary file is a map between keys and offsets into the Index
file, but the code creates map between keys and Data file offsets
instead. Fix it by keeping offset of an index entry in index_entry
structure and use it during Summary file creation.

Fixes #1857.

Reviewed-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20161116165421.GA22296@scylladb.com>
(cherry picked from commit ae0a2935b4)
2016-11-17 11:45:29 +02:00
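The distinction fixed in 3c68504e54 can be sketched as follows. The struct layouts, the `build_summary` function and the `sampling` parameter are hypothetical simplifications and not Scylla's actual types; the only point is that each summary entry records where the index entry lives in the Index file, not where the partition lives in the Data file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified index entry. index_offset is the position of this
// entry inside the Index file; data_offset is where it points in the Data file.
struct index_entry {
    std::string key;
    std::uint64_t data_offset;
    std::uint64_t index_offset;
};

// Hypothetical summary entry: a key and an offset into the Index file.
struct summary_entry {
    std::string key;
    std::uint64_t position;
};

// Builds a summary by sampling every `sampling`-th index entry (assumed > 0).
std::vector<summary_entry> build_summary(const std::vector<index_entry>& index,
                                         std::size_t sampling) {
    std::vector<summary_entry> out;
    if (sampling == 0) {
        return out;
    }
    for (std::size_t i = 0; i < index.size(); i += sampling) {
        // Using index[i].data_offset here would reproduce the bug.
        out.push_back({index[i].key, index[i].index_offset});
    }
    return out;
}
```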
Raphael S. Carvalho
e9b26d547d main: fix exception handling when initializing data or commitlog dirs
Exception handling was broken because after io checker, storage_io_error
exception is wrapped around system error exceptions. Also the message
when handling exception wasn't precise enough for all cases. For example,
lack of permission to write to existing data directory.

Fixes #883.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <b2dc75010a06f16ab1b676ce905ae12e930a700a.1478542388.git.raphaelsc@scylladb.com>
(cherry picked from commit 9a9f0d3a0f)
2016-11-16 15:12:48 +02:00
Raphael S. Carvalho
8510389188 sstables: handle unrecognized sstable component
As in C*, unrecognized sstable components should be ignored when
loading a sstable. At the moment, Scylla fails to do so and will
not boot as a result. In addition, unknown components should be
remembered when moving a sstable or changing its generation.

Fixes #1780.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <b7af0c28e5b574fd577a7a1d28fb006ac197aa0a.1478025930.git.raphaelsc@scylladb.com>
(cherry picked from commit 53b7b7def3)
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <e30115e089a4c3c3fb4aad956645c9d006c2ee55.1479141101.git.raphaelsc@scylladb.com>
2016-11-16 15:11:05 +02:00
Amnon Heiman
ea61a8b410 API: cache_capacity should use uint for summing
Using integer as a type for the map_reduce causes numeric overflow.

Fixes #1801

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1479299425-782-1-git-send-email-amnon@scylladb.com>
(cherry picked from commit a4be7afbb0)
2016-11-16 15:03:15 +02:00
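The effect described in ea61a8b410 can be reproduced with the standard library alone. This is only a sketch: `std::accumulate` stands in for the map_reduce call, and both function names are made up for the example. The accumulator type is taken from the initial value, so an `int` seed narrows every partial sum.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Narrow variant: the literal 0 is an int, so each partial sum is converted
// back to int and large totals cannot be represented.
int total_capacity_narrow(const std::vector<std::uint64_t>& per_shard) {
    return std::accumulate(per_shard.begin(), per_shard.end(), 0);
}

// Wide variant: seeding with a 64-bit unsigned zero keeps the full range.
std::uint64_t total_capacity(const std::vector<std::uint64_t>& per_shard) {
    return std::accumulate(per_shard.begin(), per_shard.end(), std::uint64_t(0));
}
```

Two shards of 2 GiB each already exceed what a 32-bit `int` can hold, so only the wide variant reports the true total.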
Paweł Dziepak
bd694d845e partition_version: make sure that snapshot is destroyed under LSA
Snapshot destructor may free some objects managed by the LSA. That's why
partition_snapshot_reader destructor explicitly destroys the snapshot it
uses. However, it was possible that an exception thrown by _read_section
prevented that from happening, making the snapshot be destroyed implicitly
without the current allocator set to LSA.

Refs #1831.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1478778570-2795-1-git-send-email-pdziepak@scylladb.com>
(cherry picked from commit f16d6f9c40)
2016-11-16 14:34:11 +02:00
Paweł Dziepak
01c01d9ac4 query_pagers: distinct queries do not have clustering keys
Query pager needs to handle results that contain partitions with
possibly multiple clustering rows quite differently than results with
just one row per partition (for example a page may end in the middle of a
partition). However, the logic dealing with partitions with clustering
rows doesn't work correctly for SELECT DISTINCT queries, which are
much more similar to the ones for schemas without clustering key.

The solution is to set _has_clustering_keys to false in case of SELECT
DISTINCT queries regardless of the schema which will make pager
correctly expect each partition to return at most one row.

Fixes #1822.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1478612486-13421-1-git-send-email-pdziepak@scylladb.com>
(cherry picked from commit 055d78ee4c)
2016-11-16 10:17:34 +01:00
Paweł Dziepak
ed39e8c235 row_cache: touch entries read during range queries
Fixes #1847.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1479230809-27547-1-git-send-email-pdziepak@scylladb.com>
(cherry picked from commit 999dafbe57)
2016-11-15 20:34:40 +00:00
Avi Kivity
c57835e7b5 Merge "Fixes for histogram and moving average calculations" from Glauber
"JMX metrics were found to be either not showing, or showing absurd
values.  Turns out there were multiple things wrong with them. The
patches were sent separately but conflict with one another. This series
is a collection of the patches needed to fix the issues we saw.

Fixes #1832, #1836, #1837"

(cherry picked from commit bf20aa722b)
2016-11-13 11:42:53 +02:00
Amnon Heiman
13baa04056 API: fix a typo in storage_proxy
This patch fixes a typo in the URL definition, which prevented the metric
in the jmx from finding it.

Fixes #1821

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1478563869-20504-1-git-send-email-amnon@scylladb.com>
(cherry picked from commit c8082ccadb)
2016-11-13 09:25:14 +02:00
Glauber Costa
298de37cef histogram: moving averages: fix inverted parameters
moving_averages constructor is defined like this:

    moving_average(latency_counter::duration interval, latency_counter::duration tick_interval)

But when it is time to initialize them, we do this:

	... {tick_interval(), std::chrono::minutes(1)} ...

As can be seen, the interval and tick interval are inverted. This
leads to the metrics being assigned bogus values.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <d83f09eed20ea2ea007d120544a003b2e0099732.1478798595.git.glauber@scylladb.com>
(cherry picked from commit d3f11fbabf)
2016-11-11 10:15:32 +02:00
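The mistake fixed in 298de37cef is easy to make because both constructor parameters have the same type, so the compiler accepts either order. The sketch below is a simplified stand-in: the `duration` alias replaces `latency_counter::duration`, and the two factory functions are hypothetical.

```cpp
#include <cassert>
#include <chrono>

// Assumed stand-in for latency_counter::duration.
using duration = std::chrono::steady_clock::duration;

// Simplified stand-in for the class named in the commit message.
struct moving_average {
    duration interval;
    duration tick_interval;
    moving_average(duration interval, duration tick_interval)
        : interval(interval), tick_interval(tick_interval) {}
};

// Inverted call, as in the bug: the tick ends up in `interval`.
inline moving_average make_inverted(duration tick) {
    return {tick, std::chrono::minutes(1)};
}

// Corrected call: the one-minute window comes first, then the tick.
inline moving_average make_correct(duration tick) {
    return {std::chrono::minutes(1), tick};
}
```

Both functions compile cleanly, which is why the inversion went unnoticed until the metrics showed bogus values.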
Paweł Dziepak
91e5e50647 Merge "Remove quadratic behavior from atomic sstable deletion" from Avi
"The atomic sstable deletion provides exception safety at the cost of
quadratic behavior in the number of sstables awaiting deletion.  This
causes high cpu utilization during startup.

Change the code to avoid quadratic complexity, and add some unit tests.

See #1812."

(cherry picked from commit 985d2f6d4a)
2016-11-08 22:46:01 +02:00
Pekka Enberg
08b1ff53dd release: prepare for 1.5.rc1 2016-11-02 13:39:53 +02:00
Pekka Enberg
0485289741 cql3: Fix selecting same column multiple times
Under the hood, the selectable::add_and_get_index() function
deliberately filters out duplicate columns. This causes
simple_selector::get_output_row() to return a row with all duplicate
columns filtered out, which triggers an assertion because of row
mismatch with metadata (which contains the duplicate columns).

The fix is rather simple: just make selection::from_selectors() use
selection_with_processing if the number of selectors and column
definitions doesn't match -- like Apache Cassandra does.

Fixes #1367
Message-Id: <1477989740-6485-1-git-send-email-penberg@scylladb.com>

(cherry picked from commit e1e8ca2788)
2016-11-01 09:33:19 +00:00
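The mismatch described in 0485289741 can be sketched as follows. The free functions are hypothetical simplifications; only the name `add_and_get_index` is borrowed from the commit message, and its real signature is not shown there. Selecting `a, b, a` yields three selectors but only two column definitions, which is the condition under which a processing step is needed to expand the row again.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified model: returns the index of `col` in `defs`, adding it only if
// it is not already present, so duplicates are filtered out.
std::size_t add_and_get_index(std::vector<std::string>& defs, const std::string& col) {
    auto it = std::find(defs.begin(), defs.end(), col);
    if (it != defs.end()) {
        return static_cast<std::size_t>(it - defs.begin());
    }
    defs.push_back(col);
    return defs.size() - 1;
}

// True when the selector count and the deduplicated column count differ.
bool needs_processing(std::size_t selector_count, const std::vector<std::string>& defs) {
    return selector_count != defs.size();
}

// Expands a deduplicated row back to one value per selector.
std::vector<std::string> expand_row(const std::vector<std::string>& row,
                                    const std::vector<std::size_t>& indexes) {
    std::vector<std::string> out;
    for (std::size_t i : indexes) {
        out.push_back(row.at(i));
    }
    return out;
}
```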
Avi Kivity
b3504e5482 Update seastar submodule
* seastar 57a17ca...25137c2 (2):
  > reactor: improve task quota timer resolution
  > future: prioritise continuations that can run immediately

Fixes #1794.
2016-10-28 14:17:26 +03:00
Avi Kivity
6cdb1256bb Update seastar submodule
* seastar e2c2bbc...57a17ca (1):
  > rpc: Avoid using zero-copy interface of output_stream (Fixes #1786)
2016-10-28 14:11:47 +03:00
Pekka Enberg
39b0da51a3 auth: Fix resource level handling
We use `data_resource` class in the CQL parser, which lets users refer
to a table resource without specifying a keyspace. This asserts out in
get_level() for no good reason as we already know the intended level
based on the constructor. Therefore, change `data_resource` to track the
level like upstream Cassandra does and use that.

Fixes #1790

Message-Id: <1477599169-2945-1-git-send-email-penberg@scylladb.com>
(cherry picked from commit b54870764f)
2016-10-27 23:37:50 +03:00
Glauber Costa
0656e66f5f auth: always convert string to upper case before comparing
We store all auth perm strings in upper case, but the user might very
well pass this in lower case.

We could use a standard key comparator / hash here, but since the
strings tend to be small, the new sstring will likely be allocated in
the stack here and this approach yields significantly less code.

Fixes #1791.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <51df92451e6e0a6325a005c19c95eaa55270da61.1477594199.git.glauber@scylladb.com>
(cherry picked from commit ef3c7ab38e)
2016-10-27 22:11:02 +03:00
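The approach taken in 0656e66f5f, upper-casing a copy of the input before the lookup, can be sketched like this. The helper names are hypothetical, `std::string` is used in place of `sstring`, and the permission set is only an illustrative list.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <set>
#include <string>

// Returns an upper-cased copy of the input. Taking the argument by value
// keeps the caller's string unchanged.
std::string to_upper(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char ch) {
        return static_cast<char>(std::toupper(ch));
    });
    return s;
}

// Looks the name up in a set whose entries are all stored in upper case.
bool is_known_permission(const std::string& name) {
    static const std::set<std::string> known = {
        "CREATE", "ALTER", "DROP", "SELECT", "MODIFY", "AUTHORIZE"};
    return known.count(to_upper(name)) > 0;
}
```

Normalizing the key at the call site avoids a custom comparator or hash, at the cost of one short temporary string per lookup.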
Avi Kivity
185fbb8abc Merge "Cache fixes" from Paweł
"5ff699e09fcbd62611e78b9de601f6c8636ab2f0 ("row_cache: rework cache to
use fast forwarding reader") brought some significant changes to the
row cache implementation. Unfortunately, "significant changes" often
translates to "more bugs" and this time was no different.

This series contains fixes for the problems introduced in that rework
and makes failing dtest
bootstrap_test.py:TestBootstrap.local_quorum_bootstrap_test
pass again."

* 'pdziepak/cache-fixes/v1' of github.com:cloudius-systems/seastar-dev:
  row_cache: avoid dereferencing invalid iterator
  row_cache: set _first_element flag correctly
  row_cache: fix clearing continuity flag at eviction

(cherry picked from commit 72d78ffa7e)
2016-10-27 11:45:20 +03:00
Tomasz Grabiec
4ed3d350cc Update seastar submodule
* seastar ab1531e...e2c2bbc (3):
  > rpc: do not assume underling semaphore type
  > rpc: fix default resource limit
  > rpc: Move _connected flag to protocol::connection
2016-10-26 10:00:52 +02:00
Tomasz Grabiec
72d4a26c43 Update seastar submodule
* seastar f8e4e93...ab1531e (1):
  > rpc: Fix crash during connection teardown
2016-10-26 09:49:41 +02:00
Tomasz Grabiec
b582525ad8 Merge seastar upstream
(This time for real)

* seastar 69acec1...f8e4e93 (1):
  > rpc: Do not close client connection on error response for a timed out request

Refs #1778
2016-10-25 13:53:01 +02:00
Tomasz Grabiec
5ca372e852 Merge seastar upstream
* seastar 69acec1...f8e4e93 (1):
  > rpc: Do not close client connection on error response for a timed out request

Refs #1778
2016-10-25 13:45:58 +02:00
1492 changed files with 40719 additions and 139617 deletions

View File

@@ -1,9 +1,3 @@
This is Scylla's bug tracker, to be used for reporting bugs only.
If you have a question about Scylla, and not a bug, please ask it in
our mailing-list at scylladb-dev@googlegroups.com or in our slack channel.
- [] I have read the disclaimer above, and I am reporting a suspected malfunction in Scylla.
*Installation details*
Scylla version (or git commit hash):
Cluster size:

View File

@@ -1,4 +0,0 @@
Scylla doesn't use pull-requests, please send a patch to the [mailing list](mailto:scylladb-dev@googlegroups.com) instead.
See our [contributing guidelines](../CONTRIBUTING.md) and our [Scylla development guidelines](../HACKING.md) for more information.
If you have any questions please don't hesitate to send a mail to the [dev list](mailto:scylladb-dev@googlegroups.com).

10
.gitignore vendored
View File

@@ -9,13 +9,3 @@ dist/ami/files/*.rpm
dist/ami/variables.json
dist/ami/scylla_deploy.sh
*.pyc
Cql.tokens
.kdev4
*.kdev4
CMakeLists.txt.user
.cache
.tox
*.egg-info
__pycache__CMakeLists.txt.user
.gdbinit
resources

5
.gitmodules vendored
View File

@@ -1,6 +1,6 @@
[submodule "seastar"]
path = seastar
url = ../seastar
url = ../scylla-seastar
ignore = dirty
[submodule "swagger-ui"]
path = swagger-ui
@@ -9,6 +9,3 @@
[submodule "dist/ami/files/scylla-ami"]
path = dist/ami/files/scylla-ami
url = ../scylla-ami
[submodule "xxHash"]
path = xxHash
url = ../xxHash

View File

@@ -1,141 +0,0 @@
##
## For best results, first compile the project using the Ninja build-system.
##
cmake_minimum_required(VERSION 3.7)
project(scylla)
if (NOT DEFINED FOR_IDE AND NOT DEFINED ENV{FOR_IDE} AND NOT DEFINED ENV{CLION_IDE})
message(FATAL_ERROR "This CMakeLists.txt file is only valid for use in IDEs, please define FOR_IDE to acknowledge this.")
endif()
# Default value. A more accurate list is populated through `pkg-config` below if `seastar.pc` is available.
set(SEASTAR_INCLUDE_DIRS "seastar")
# These paths are always available, since they're included in the repository. Additional DPDK headers are placed while
# Seastar is built, and are captured in `SEASTAR_INCLUDE_DIRS` through parsing the Seastar pkg-config file (below).
set(SEASTAR_DPDK_INCLUDE_DIRS
seastar/dpdk/lib/librte_eal/common/include
seastar/dpdk/lib/librte_eal/common/include/generic
seastar/dpdk/lib/librte_eal/common/include/x86
seastar/dpdk/lib/librte_ether)
find_package(PkgConfig REQUIRED)
set(ENV{PKG_CONFIG_PATH} "${CMAKE_SOURCE_DIR}/seastar/build/release:$ENV{PKG_CONFIG_PATH}")
pkg_check_modules(SEASTAR seastar)
find_package(Boost COMPONENTS filesystem program_options system thread)
##
## Populate the names of all source and header files in the indicated paths in a designated variable.
##
## When RECURSIVE is specified, directories are traversed recursively.
##
## Use: scan_scylla_source_directories(VAR my_result_var [RECURSIVE] PATHS [path1 path2 ...])
##
function (scan_scylla_source_directories)
set(options RECURSIVE)
set(oneValueArgs VAR)
set(multiValueArgs PATHS)
cmake_parse_arguments(args "${options}" "${oneValueArgs}" "${multiValueArgs}" "${ARGN}")
set(globs "")
foreach (dir ${args_PATHS})
list(APPEND globs "${dir}/*.cc" "${dir}/*.hh")
endforeach()
if (args_RECURSIVE)
set(glob_kind GLOB_RECURSE)
else()
set(glob_kind GLOB)
endif()
file(${glob_kind} var
${globs})
set(${args_VAR} ${var} PARENT_SCOPE)
endfunction()
## Although Seastar is an external project, it is common enough to explore the sources while doing
## Scylla development that we'll treat the Seastar sources as part of this project for easier navigation.
scan_scylla_source_directories(
VAR SEASTAR_SOURCE_FILES
RECURSIVE
PATHS
seastar/core
seastar/http
seastar/json
seastar/net
seastar/rpc
seastar/tests
seastar/util)
scan_scylla_source_directories(
VAR SCYLLA_ROOT_SOURCE_FILES
PATHS .)
scan_scylla_source_directories(
VAR SCYLLA_SUB_SOURCE_FILES
RECURSIVE
PATHS
api
auth
cql3
db
dht
exceptions
gms
index
io
locator
message
repair
service
sstables
streaming
tests
thrift
tracing
transport
utils)
scan_scylla_source_directories(
VAR SCYLLA_GEN_SOURCE_FILES
RECURSIVE
PATHS build/release/gen)
set(SCYLLA_SOURCE_FILES
${SCYLLA_ROOT_SOURCE_FILES}
${SCYLLA_GEN_SOURCE_FILES}
${SCYLLA_SUB_SOURCE_FILES})
add_executable(scylla
${SEASTAR_SOURCE_FILES}
${SCYLLA_SOURCE_FILES})
# Note that since CLion does not understand GCC6 concepts, we always disable them (even if users configure otherwise).
# CLion seems to have trouble with `-U` (macro undefinition), so we do it this way instead.
list(REMOVE_ITEM SEASTAR_CFLAGS "-DHAVE_GCC6_CONCEPTS")
# If the Seastar pkg-config information is available, append to the default flags.
#
# For ease of browsing the source code, we always pretend that DPDK is enabled.
target_compile_options(scylla PUBLIC
-std=gnu++1z
-DHAVE_DPDK
-DHAVE_HWLOC
"${SEASTAR_CFLAGS}")
# The order matters here: prefer the "static" DPDK directories to any dynamic paths from pkg-config. Some files are only
# available dynamically, though.
target_include_directories(scylla PUBLIC
.
${SEASTAR_DPDK_INCLUDE_DIRS}
${SEASTAR_INCLUDE_DIRS}
${Boost_INCLUDE_DIRS}
xxhash
build/release/gen)

View File

@@ -1,11 +0,0 @@
# Asking questions or requesting help
Use the [ScyllaDB user mailing list](https://groups.google.com/forum/#!forum/scylladb-users) for general questions and help.
# Reporting an issue
Please use the [Issue Tracker](https://github.com/scylladb/scylla/issues/) to report issues. Fill in as much information as you can in the issue template, especially for performance problems.
# Contributing Code to Scylla
To contribute code to Scylla, you need to sign the [Contributor License Agreement](http://www.scylladb.com/opensource/cla/) and send your changes as [patches](https://github.com/scylladb/scylla/wiki/Formatting-and-sending-patches) to the [mailing list](https://groups.google.com/forum/#!forum/scylladb-dev). We don't accept pull requests on GitHub.

@@ -1,279 +0,0 @@
# Guidelines for developing Scylla
This document is intended to help developers and contributors to Scylla get started. The first part consists of general guidelines that make no assumptions about a development environment or tooling. The second part describes a particular environment and work-flow for exemplary purposes.
## Overview
This section covers some high-level information about the Scylla source code and work-flow.
### Getting the source code
Scylla uses [Git submodules](https://git-scm.com/book/en/v2/Git-Tools-Submodules) to manage its dependency on Seastar and other tools. Be sure that all submodules are correctly initialized when cloning the project:
```bash
$ git clone https://github.com/scylladb/scylla
$ cd scylla
$ git submodule update --init --recursive
```
### Dependencies
Scylla depends on the system package manager for its development dependencies.
Running `./install-dependencies.sh` (as root) installs the appropriate packages based on your Linux distribution.
### Build system
**Note**: Compiling Scylla requires, conservatively, 2 GB of memory per native thread, and up to 3 GB per native thread while linking.
Scylla is built with [Ninja](https://ninja-build.org/), a low-level rule-based system. A Python script, `configure.py`, generates a Ninja file (`build.ninja`) based on configuration options.
To build for the first time:
```bash
$ ./configure.py
$ ninja-build
```
Afterwards, it is sufficient to just execute Ninja.
The full suite of options for project configuration is available via
```bash
$ ./configure.py --help
```
The most important options are:
- `--mode={release,debug,all}`: Debug mode enables [AddressSanitizer](https://github.com/google/sanitizers/wiki/AddressSanitizer) and allows for debugging with tools like GDB. Debugging builds are generally slower and generate much larger object files than release builds.
- `--{enable,disable}-dpdk`: [DPDK](http://dpdk.org/) is a set of libraries and drivers for fast packet processing. During development, it's not necessary to enable support even if it is supported by your platform.
Source files and build targets are tracked manually in `configure.py`, so the script needs to be updated when new files or targets are added or removed.
To save time -- for instance, to avoid compiling all unit tests -- you can also specify specific targets to Ninja. For example,
```bash
$ ninja-build build/release/tests/schema_change_test
```
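The memory note at the top of this section can be turned into a rough job count for Ninja. The sketch below is only illustrative: `build_jobs` is a hypothetical helper, not part of the Scylla tree. It assumes the conservative figure of 3 GB per job quoted above for linking and never returns fewer than one job.

```bash
# Hypothetical helper (not part of the Scylla repository): choose a -j value
# from the number of native threads and the available memory in GB.
# Assumes the conservative 3 GB-per-job figure quoted for linking.
build_jobs() {
    local threads=$1 mem_gb=$2
    local by_mem=$((mem_gb / 3))
    local jobs=$threads
    # Memory is the limiting factor when it supports fewer jobs than threads.
    if [ "$by_mem" -lt "$jobs" ]; then
        jobs=$by_mem
    fi
    # Always allow at least one job.
    if [ "$jobs" -lt 1 ]; then
        jobs=1
    fi
    echo "$jobs"
}

build_jobs 8 16   # memory-bound: prints 5
build_jobs 4 64   # thread-bound: prints 4
```

The result could then be passed to Ninja, for example `ninja-build -j"$(build_jobs "$(nproc)" 16)"` on a machine with 16 GB of memory.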
### Unit testing
Unit tests live in the `/tests` directory. Like with application source files, test sources and executables are specified manually in `configure.py` and need to be updated when changes are made.
A test target can be any executable. A non-zero return code indicates test failure.
Most tests in the Scylla repository are built using the [Boost.Test](http://www.boost.org/doc/libs/1_64_0/libs/test/doc/html/index.html) library. Utilities for writing tests with Seastar futures are also included.
Run all tests through the test execution wrapper with
```bash
$ ./test.py --mode={debug,release}
```
The `--name` argument can be specified to run a particular test.
Alternatively, you can execute the test executable directly. For example,
```bash
$ build/release/tests/row_cache_test -- -c1 -m1G
```
The `-c1 -m1G` arguments limit this Seastar-based test to a single system thread and 1 GB of memory.
### Preparing patches
All changes to Scylla are submitted as patches to the public mailing list. Once a patch is approved by one of the maintainers of the project, it is committed to the maintainers' copy of the repository at https://github.com/scylladb/scylla.
Detailed instructions for formatting patches for the mailing list and advice on preparing good patches are available at the [ScyllaDB website](http://docs.scylladb.com/contribute/). There are also some guidelines that can help you make the patch review process smoother:
1. Before generating patches, make sure your Git configuration points to `.gitorderfile`. You can do it by running
```bash
$ git config diff.orderfile .gitorderfile
```
2. If you are sending more than a single patch, push your changes into a new branch of your fork of Scylla on GitHub and add a URL pointing to this branch to your cover letter.
3. If you are sending a new revision of an earlier patchset, add a brief summary of changes in this version, for example:
```
In v3:
- declared move constructor and move assignment operator as noexcept
- used std::variant instead of a union
...
```
4. Add information about the tests run with this fix. It can look like
```
"Tests: unit ({mode}), dtest ({smp})"
```
The usual is "Tests: unit (release)", although running debug tests is encouraged.
5. When answering review comments, prefer inline quotes as they make it easier to track the conversation across multiple e-mails.
### Finding a person to review and merge your patches
You can use the `scripts/find-maintainer` script to find a subsystem maintainer and/or reviewer for your patches. The script accepts a filename in the git source tree as an argument and outputs a list of subsystems the file belongs to and their respective maintainers and reviewers. For example, if you changed the `cql3/statements/create_view_statement.hh` file, run the script as follows:
```bash
$ ./scripts/find-maintainer cql3/statements/create_view_statement.hh
```
and you will get output like this:
```
CQL QUERY LANGUAGE
Tomasz Grabiec <tgrabiec@scylladb.com> [maintainer]
Pekka Enberg <penberg@scylladb.com> [maintainer]
MATERIALIZED VIEWS
Pekka Enberg <penberg@scylladb.com> [maintainer]
Duarte Nunes <duarte@scylladb.com> [maintainer]
Nadav Har'El <nyh@scylladb.com> [reviewer]
Duarte Nunes <duarte@scylladb.com> [reviewer]
```
### Running Scylla
Once Scylla has been compiled, executing the (`debug` or `release`) target will start a running instance in the foreground:
```bash
$ build/release/scylla
```
The `scylla` executable requires a configuration file, `scylla.yaml`. By default, this is read from `$SCYLLA_HOME/conf/scylla.yaml`. A good starting point for development is located in the repository at `/conf/scylla.yaml`.
For development, a directory at `$HOME/scylla` can be used for all Scylla-related files:
```bash
$ mkdir -p $HOME/scylla $HOME/scylla/conf
$ cp conf/scylla.yaml $HOME/scylla/conf/scylla.yaml
$ # Edit configuration options as appropriate
$ SCYLLA_HOME=$HOME/scylla build/release/scylla
```
The `scylla.yaml` file in the repository by default writes all database data to `/var/lib/scylla`, which likely requires root access. Change the `data_file_directories` and `commitlog_directory` fields as appropriate.
Scylla has a number of requirements for the file-system and operating system to operate ideally and at peak performance. However, during development, these requirements can be relaxed with the `--developer-mode` flag.
Additionally, when running on under-powered platforms like portable laptops, the `--overprovisioned` flag is useful.
On a development machine, one might run Scylla as
```bash
$ SCYLLA_HOME=$HOME/scylla build/release/scylla --overprovisioned --developer-mode=yes
```
### Branches and tags
Multiple release branches are maintained on the Git repository at https://github.com/scylladb/scylla. Release 1.5, for instance, is tracked on the `branch-1.5` branch.
Similarly, tags are used to pin-point precise release versions, including hot-fix versions like 1.5.4. These are named `scylla-1.5.4`, for example.
Most development happens on the `master` branch. Release branches are cut from `master` based on time and/or features. When a patch against `master` fixes a serious issue like a node crash or data loss, it is backported to a particular release branch with `git cherry-pick` by the project maintainers.
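The naming scheme for release branches and tags can be sketched as two small shell functions. Both `release_tag` and `release_branch` are hypothetical helpers introduced here for illustration, and they assume a three-part version string such as `1.5.4`.

```bash
# Hypothetical helpers: derive the Git names described above from a
# three-part version string such as 1.5.4.
release_tag() {
    echo "scylla-$1"
}

release_branch() {
    # Strip the last dot-separated component: 1.5.4 -> 1.5
    echo "branch-${1%.*}"
}

release_tag 1.5.4      # prints scylla-1.5.4
release_branch 1.5.4   # prints branch-1.5
```

With these, checking out the exact release would look like `git checkout "$(release_tag 1.5.4)"`, and following its maintenance branch like `git checkout "$(release_branch 1.5.4)"`.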
## Example: development on Fedora 25
This section describes one possible work-flow for developing Scylla on a Fedora 25 system. It is presented as an example to help you to develop a work-flow and tools that you are comfortable with.
### Preface
This guide will be written from the perspective of a fictitious developer, Taylor Smith.
### Git work-flow
Having two Git remotes is useful:
- A public clone of Scylla (`"public"`)
- A private clone of Scylla (`"private"`) for in-progress work or work that is not yet ready to share
The first step to contributing a change to Scylla is to create a local branch dedicated to it. For example, a feature that fixes a bug in the CQL statement for creating tables could be called `ts/cql_create_table_error/v1`. The branch name is prefaced by the developer's initials and has a suffix indicating that this is the first version. The version suffix is useful when branches are shared publicly and changes are requested on the mailing list. Having a branch for each version of the patch (or patch set) shared publicly makes it easier to reference and compare the history of a change.
Setting the upstream branch of your development branch to `master` is a useful way to track your changes. You can do this with
```bash
$ git branch -u master ts/cql_create_table_error/v1
```
As a patch set is developed, you can periodically push the branch to the private remote to back up your work.
Once the patch set is ready to be reviewed, push the branch to the public remote and prepare an email to the `scylladb-dev` mailing list. Including a link to the branch on your public remote allows reviewers to quickly test and explore your changes.
### Development environment and source code navigation
Scylla includes a [CMake](https://cmake.org/) file, `CMakeLists.txt`, for use only with development environments (not for building) so that they can properly analyze the source code.
[CLion](https://www.jetbrains.com/clion/) is a commercial IDE that offers reasonably good source code navigation and advice for code hygiene, though its C++ parser sometimes makes errors and flags false issues.
Other good options that directly parse CMake files are [KDevelop](https://www.kdevelop.org/) and [QtCreator](https://wiki.qt.io/Qt_Creator).
To use the `CMakeLists.txt` file with these programs, define the `FOR_IDE` CMake variable or shell environment variable.
[Eclipse](https://eclipse.org/cdt/) is another open-source option. It doesn't natively work with CMake projects, and its C++ parser has many of the same issues as CLion's.
### Distributed compilation: `distcc` and `ccache`
Scylla's compilation times can be long. Two tools help somewhat:
- [ccache](https://ccache.samba.org/) caches compiled object files on disk and re-uses them when possible
- [distcc](https://github.com/distcc/distcc) distributes compilation jobs to remote machines
A reasonably-powered laptop acts as the coordinator for compilation. A second, more powerful, machine acts as a passive compilation server.
Having a direct wired connection between the machines ensures that object files can be transmitted quickly and limits the overhead of remote compilation.
The coordinator has been assigned the static IP address `10.0.0.1` and the passive compilation machine has been assigned `10.0.0.2`.
On Fedora, installing the `ccache` package places symbolic links for `gcc` and `g++` in the `PATH`. This allows normal compilation to transparently invoke `ccache` for compilation and cache object files on the local file-system.
Next, set `CCACHE_PREFIX` so that `ccache` is responsible for invoking `distcc` as necessary:
```bash
export CCACHE_PREFIX="distcc"
```
On each host, edit `/etc/sysconfig/distccd` to include the allowed coordinators and the total number of jobs that the machine should accept.
This example is for the laptop, which has 2 physical cores (4 logical cores with hyper-threading):
```
OPTIONS="--allow 10.0.0.2 --allow 127.0.0.1 --jobs 4"
```
`10.0.0.2` has 8 physical cores (16 logical cores) and 64 GB of memory.
As a rule of thumb, the number of jobs a machine is configured to accept should equal the number of its native threads.
Restart the `distccd` service on all machines.
On the coordinator machine, edit `$HOME/.distcc/hosts` with the available hosts for compilation. Order of the hosts indicates preference.
```
10.0.0.2/16 localhost/2
```
In this example, `10.0.0.2` will be sent up to 16 jobs and the local machine will be sent up to 2. Allowing for two extra threads on the host machine for coordination, we run compilation with `16 + 2 + 2 = 20` jobs in total: `ninja-build -j20`.
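The job arithmetic above can be reproduced with a short shell function. This is a minimal sketch: `distcc_total_jobs` is a hypothetical helper, and it assumes every entry in the hosts line has the plain `host/slots` form used in this example.

```bash
# Hypothetical helper: sum the job slots in a distcc hosts line
# ("host/slots host/slots ...") and add two extra jobs for coordination.
# Assumes every entry carries an explicit "/slots" suffix.
distcc_total_jobs() {
    local total=0 entry slots
    # $1 is intentionally unquoted so that it splits on whitespace.
    for entry in $1; do
        slots=${entry##*/}          # keep the text after the last "/"
        total=$((total + slots))
    done
    echo $((total + 2))
}

distcc_total_jobs "10.0.0.2/16 localhost/2"   # prints 20
```

The output can be fed straight to Ninja, for instance `ninja-build -j"$(distcc_total_jobs "10.0.0.2/16 localhost/2")"`.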
When a compilation is in progress, the status of jobs on all remote machines can be visualized in the terminal with `distccmon-text` or graphically as a GTK application with `distccmon-gnome`.
One thing to keep in mind is that linking object files happens on the coordinating machine, which can be a bottleneck. See the next section for a way to speed up this process.
### Using the `gold` linker
Linking Scylla can be slow. The gold linker can replace GNU ld and often speeds the linking process. On Fedora, you can switch the system linker using
```bash
$ sudo alternatives --config ld
```
### Testing changes in Seastar with Scylla
Sometimes Scylla development is closely tied with a feature being developed in Seastar. It can be useful to compile Scylla with a particular check-out of Seastar.
One way to do this is to create a local remote for the Seastar submodule in the Scylla repository:
```bash
$ cd $HOME/src/scylla
$ cd seastar
$ git remote add local /home/tsmith/src/seastar
$ git remote update
$ git checkout -t local/my_local_seastar_branch
```

@@ -1,131 +0,0 @@
M: Maintainer with commit access
R: Reviewer with subsystem expertise
F: Filename, directory, or pattern for the subsystem
---
AUTH
M: Paweł Dziepak <pdziepak@scylladb.com>
M: Duarte Nunes <duarte@scylladb.com>
R: Calle Wilund <calle@scylladb.com>
R: Vlad Zolotarov <vladz@scylladb.com>
R: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
F: auth/*
CACHE
M: Tomasz Grabiec <tgrabiec@scylladb.com>
M: Paweł Dziepak <pdziepak@scylladb.com>
R: Piotr Jastrzebski <piotr@scylladb.com>
F: row_cache*
F: *mutation*
F: tests/mvcc*
COMMITLOG / BATCHLOG
M: Paweł Dziepak <pdziepak@scylladb.com>
M: Duarte Nunes <duarte@scylladb.com>
R: Calle Wilund <calle@scylladb.com>
F: db/commitlog/*
F: db/batch*
COORDINATOR
M: Paweł Dziepak <pdziepak@scylladb.com>
M: Duarte Nunes <duarte@scylladb.com>
R: Gleb Natapov <gleb@scylladb.com>
F: service/storage_proxy*
COMPACTION
R: Raphael S. Carvalho <raphaelsc@scylladb.com>
R: Glauber Costa <glauber@scylladb.com>
R: Nadav Har'El <nyh@scylladb.com>
F: sstables/compaction*
CQL TRANSPORT LAYER
M: Pekka Enberg <penberg@scylladb.com>
F: transport/*
CQL QUERY LANGUAGE
M: Tomasz Grabiec <tgrabiec@scylladb.com>
M: Pekka Enberg <penberg@scylladb.com>
F: cql3/*
COUNTERS
M: Paweł Dziepak <pdziepak@scylladb.com>
F: counters*
F: tests/counter_test*
GOSSIP
M: Duarte Nunes <duarte@scylladb.com>
M: Tomasz Grabiec <tgrabiec@scylladb.com>
R: Asias He <asias@scylladb.com>
F: gms/*
DOCKER
M: Pekka Enberg <penberg@scylladb.com>
F: dist/docker/*
LSA
M: Tomasz Grabiec <tgrabiec@scylladb.com>
M: Paweł Dziepak <pdziepak@scylladb.com>
F: utils/logalloc*
MATERIALIZED VIEWS
M: Duarte Nunes <duarte@scylladb.com>
M: Pekka Enberg <penberg@scylladb.com>
R: Nadav Har'El <nyh@scylladb.com>
R: Duarte Nunes <duarte@scylladb.com>
F: db/view/*
F: cql3/statements/*view*
PACKAGING
R: Takuya ASADA <syuu@scylladb.com>
F: dist/*
REPAIR
M: Tomasz Grabiec <tgrabiec@scylladb.com>
M: Duarte Nunes <duarte@scylladb.com>
R: Asias He <asias@scylladb.com>
R: Nadav Har'El <nyh@scylladb.com>
F: repair/*
SCHEMA MANAGEMENT
M: Tomasz Grabiec <tgrabiec@scylladb.com>
M: Duarte Nunes <duarte@scylladb.com>
M: Pekka Enberg <penberg@scylladb.com>
F: db/schema_tables*
F: db/legacy_schema_migrator*
F: service/migration*
F: schema*
SECONDARY INDEXES
M: Pekka Enberg <penberg@scylladb.com>
M: Duarte Nunes <duarte@scylladb.com>
R: Nadav Har'El <nyh@scylladb.com>
R: Pekka Enberg <penberg@scylladb.com>
F: db/index/*
F: cql3/statements/*index*
SSTABLES
M: Tomasz Grabiec <tgrabiec@scylladb.com>
M: Duarte Nunes <duarte@scylladb.com>
R: Raphael S. Carvalho <raphaelsc@scylladb.com>
R: Glauber Costa <glauber@scylladb.com>
R: Nadav Har'El <nyh@scylladb.com>
F: sstables/*
STREAMING
M: Tomasz Grabiec <tgrabiec@scylladb.com>
M: Duarte Nunes <duarte@scylladb.com>
R: Asias He <asias@scylladb.com>
F: streaming/*
F: service/storage_service.*
THRIFT TRANSPORT LAYER
M: Duarte Nunes <duarte@scylladb.com>
F: thrift/*
THE REST
M: Avi Kivity <avi@scylladb.com>
M: Paweł Dziepak <pdziepak@scylladb.com>
M: Duarte Nunes <duarte@scylladb.com>
M: Tomasz Grabiec <tgrabiec@scylladb.com>
F: *

@@ -1,5 +1,2 @@
This project includes code developed by the Apache Software Foundation (http://www.apache.org/),
especially Apache Cassandra.
It also includes files from https://github.com/antonblanchard/crc32-vpmsum (author Anton Blanchard <anton@au.ibm.com>, IBM).
These files are located in utils/arch/powerpc/crc32-vpmsum. Their license may be found in licenses/LICENSE-crc32-vpmsum.TXT.

@@ -1,19 +1,29 @@
# Scylla
## Quick-start
## Building Scylla
```bash
$ git submodule update --init --recursive
$ sudo ./install-dependencies.sh
$ ./configure.py --mode=release
$ ninja-build -j4 # Assuming 4 system threads.
$ ./build/release/scylla
$ # Rejoice!
In addition to required packages by Seastar, the following packages are required by Scylla.
### Submodules
Scylla uses submodules, so make sure you pull the submodules first by doing:
```
git submodule init
git submodule update --init --recursive
```
Please see [HACKING.md](HACKING.md) for detailed information on building and developing Scylla.
### Building and Running Scylla on Fedora
* Installing required packages:
## Running Scylla
```
sudo dnf install yaml-cpp-devel lz4-devel zlib-devel snappy-devel jsoncpp-devel thrift-devel antlr3-tool antlr3-C++-devel libasan libubsan gcc-c++ gnutls-devel ninja-build ragel libaio-devel cryptopp-devel xfsprogs-devel numactl-devel hwloc-devel libpciaccess-devel libxml2-devel python3-pyparsing lksctp-tools-devel protobuf-devel protobuf-compiler systemd-devel libunwind-devel
```
* Build Scylla
```
./configure.py --mode=release --with=scylla --disable-xen
ninja-build build/release/scylla -j2 # you can use more cpus if you have tons of RAM
```
* Run Scylla
```
@@ -73,6 +83,14 @@ Run the image with:
docker run -p $(hostname -i):9042:9042 -i -t <image name>
```
## Contributing to Scylla
[Guidelines for contributing](CONTRIBUTING.md)
Do not send pull requests.
Send patches to the mailing list address scylladb-dev@googlegroups.com.
Be sure to subscribe.
In order for your patches to be merged, you must sign the Contributor's
License Agreement, protecting your rights and ours. See
http://www.scylladb.com/opensource/cla/.

@@ -1,6 +1,6 @@
#!/bin/sh
VERSION=666.development
VERSION=1.5.4
if test -f version
then
@@ -10,12 +10,7 @@ else
DATE=$(date +%Y%m%d)
GIT_COMMIT=$(git log --pretty=format:'%h' -n 1)
SCYLLA_VERSION=$VERSION
# For custom package builds, replace "0" with "counter.your_name",
# where counter starts at 1 and increments for successive versions.
# This ensures that the package manager will select your custom
# package over the standard release.
SCYLLA_BUILD=0
SCYLLA_RELEASE=$SCYLLA_BUILD.$DATE.$GIT_COMMIT
SCYLLA_RELEASE=$DATE.$GIT_COMMIT
fi
echo "$SCYLLA_VERSION-$SCYLLA_RELEASE"

@@ -397,36 +397,6 @@
}
]
},
{
"path": "/cache_service/metrics/key/hits_moving_avrage",
"operations": [
{
"method": "GET",
"summary": "Get key hits moving avrage",
"type": "#/utils/rate_moving_average",
"nickname": "get_key_hits_moving_avrage",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path": "/cache_service/metrics/key/requests_moving_avrage",
"operations": [
{
"method": "GET",
"summary": "Get key requests moving avrage",
"type": "#/utils/rate_moving_average",
"nickname": "get_key_requests_moving_avrage",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path": "/cache_service/metrics/key/size",
"operations": [
@@ -637,36 +607,6 @@
}
]
},
{
"path": "/cache_service/metrics/counter/hits_moving_avrage",
"operations": [
{
"method": "GET",
"summary": "Get counter hits moving avrage",
"type": "#/utils/rate_moving_average",
"nickname": "get_counter_hits_moving_avrage",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path": "/cache_service/metrics/counter/requests_moving_avrage",
"operations": [
{
"method": "GET",
"summary": "Get counter requests moving avrage",
"type": "#/utils/rate_moving_average",
"nickname": "get_counter_requests_moving_avrage",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path": "/cache_service/metrics/counter/size",
"operations": [

@@ -78,19 +78,11 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
},
{
"name":"split_output",
"description":"true if the output of the major compaction should be split in several sstables",
"required":false,
"allowMultiple":false,
"type":"bool",
"paramType":"query"
}
]
}
@@ -110,7 +102,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -137,7 +129,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -161,7 +153,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -188,7 +180,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -212,7 +204,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -252,7 +244,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -279,7 +271,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -306,7 +298,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -325,7 +317,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -357,7 +349,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -389,7 +381,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -413,7 +405,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -440,7 +432,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -455,7 +447,7 @@
"operations":[
{
"method":"GET",
"summary":"Returns a list of sstable filenames that contain the given partition key on this node",
"summary":"Returns a list of filenames that contain the given key on this node",
"type":"array",
"items":{
"type":"string"
@@ -467,7 +459,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -475,7 +467,7 @@
},
{
"name":"key",
"description":"The partition key. In a composite-key scenario, use ':' to separate the columns in the key.",
"description":"The key",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -499,7 +491,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -526,7 +518,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -553,7 +545,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -577,7 +569,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -601,7 +593,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -641,7 +633,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -681,7 +673,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -721,7 +713,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -761,7 +753,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -801,7 +793,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -841,7 +833,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -881,7 +873,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -924,7 +916,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -951,7 +943,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -978,7 +970,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -1002,7 +994,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -1042,7 +1034,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -1066,7 +1058,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -1109,7 +1101,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -1152,7 +1144,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -1211,7 +1203,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -1251,7 +1243,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -1275,7 +1267,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -1318,7 +1310,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -1361,7 +1353,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -1420,7 +1412,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -1460,7 +1452,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -1500,7 +1492,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -1540,7 +1532,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -1580,7 +1572,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -1620,7 +1612,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -1660,7 +1652,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -1700,7 +1692,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -1740,7 +1732,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -1780,7 +1772,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -1820,7 +1812,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -1860,7 +1852,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -1900,7 +1892,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -1940,7 +1932,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -1980,7 +1972,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2020,7 +2012,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2060,7 +2052,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2100,7 +2092,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2124,7 +2116,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2164,7 +2156,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2204,7 +2196,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2244,7 +2236,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2284,7 +2276,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2308,7 +2300,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2332,7 +2324,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2359,7 +2351,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2386,7 +2378,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2413,7 +2405,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2440,7 +2432,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2509,7 +2501,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2533,7 +2525,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2557,7 +2549,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2581,7 +2573,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2605,7 +2597,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2629,7 +2621,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2653,7 +2645,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2677,7 +2669,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2701,7 +2693,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2725,7 +2717,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2749,7 +2741,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -2773,7 +2765,7 @@
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",

View File

@@ -1,30 +0,0 @@
"/v2/config/{id}": {
"get": {
"description": "Return a config value",
"operationId": "find_config_id",
"produces": [
"application/json"
],
"tags": ["config"],
"parameters": [
{
"name": "id",
"in": "path",
"description": "ID of config to return",
"required": true,
"type": "string"
}
],
"responses": {
"200": {
"description": "Config value"
},
"default": {
"description": "unexpected error",
"schema": {
"$ref": "#/definitions/ErrorModel"
}
}
}
}
}

View File

@@ -21,8 +21,8 @@
"parameters":[
{
"name":"host",
"description":"The host name. If absent, the local server broadcast/listen address is used",
"required":false,
"description":"The host name",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
@@ -45,8 +45,8 @@
"parameters":[
{
"name":"host",
"description":"The host name. If absent, the local server broadcast/listen address is used",
"required":false,
"description":"The host name",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"

View File

@@ -42,25 +42,6 @@
}
]
},
{
"path":"/failure_detector/endpoint_phi_values",
"operations":[
{
"method":"GET",
"summary":"Get end point phi values",
"type":"array",
"items":{
"type":"endpoint_phi_values"
},
"nickname":"get_endpoint_phi_values",
"produces":[
"application/json"
],
"parameters":[
]
}
]
},
{
"path":"/failure_detector/endpoints/",
"operations":[
@@ -221,20 +202,6 @@
"description": "The application state version"
}
}
},
"endpoint_phi_value": {
"id" : "endpoint_phi_value",
"description": "Holds phi value for a single end point",
"properties": {
"phi": {
"type": "double",
"description": "Phi value"
},
"endpoint": {
"type": "string",
"description": "end point address"
}
}
}
}
}

View File

@@ -792,24 +792,6 @@
}
]
},
{
"path":"/storage_service/active_repair/",
"operations":[
{
"method":"GET",
"summary":"Return an array with the ids of the currently active repairs",
"type":"array",
"items":{
"type":"int"
},
"nickname":"get_active_repair_async",
"produces":[
"application/json"
],
"parameters":[]
}
]
},
{
"path":"/storage_service/repair_async/{keyspace}",
"operations":[
@@ -970,22 +952,6 @@
}
]
},
{
"path":"/storage_service/force_terminate_repair",
"operations":[
{
"method":"POST",
"summary":"Force terminate all repair sessions",
"type":"void",
"nickname":"force_terminate_all_repair_sessions_new",
"produces":[
"application/json"
],
"parameters":[
]
}
]
},
{
"path":"/storage_service/decommission",
"operations":[
@@ -1235,12 +1201,11 @@
],
"parameters":[
{
"name":"type",
"description":"Which keyspaces to return",
"name":"non_system",
"description":"When set to true limit to non system",
"required":false,
"allowMultiple":false,
"type":"string",
"enum": [ "all", "user", "non_local_strategy" ],
"type":"boolean",
"paramType":"query"
}
]
@@ -2129,41 +2094,6 @@
]
}
]
},
{
"path":"/storage_service/view_build_statuses/{keyspace}/{view}",
"operations":[
{
"method":"GET",
"summary":"Gets the progress of a materialized view build",
"type":"array",
"items":{
"type":"mapper"
},
"nickname":"view_build_statuses",
"produces":[
"application/json"
],
"parameters":[
{
"name":"keyspace",
"description":"The keyspace",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
},
{
"name":"view",
"description":"View name",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
}
]
}
]
}
],
"models":{

View File

@@ -1,29 +0,0 @@
{
"swagger": "2.0",
"info": {
"version": "1.0.0",
"title": "Scylla API",
"description": "The scylla API version 2.0",
"termsOfService": "http://www.scylladb.com/tos/",
"contact": {
"name": "Scylla Team",
"email": "info@scylladb.com",
"url": "http://scylladb.com"
},
"license": {
"name": "AGPL",
"url": "https://github.com/scylladb/scylla/blob/master/LICENSE.AGPL"
}
},
"host": "{{Host}}",
"basePath": "/v2",
"schemes": [
"http"
],
"consumes": [
"application/json"
],
"produces": [
"application/json"
],
"paths": {

View File

@@ -39,7 +39,6 @@
#include "http/exception.hh"
#include "stream_manager.hh"
#include "system.hh"
#include "api/config.hh"
namespace api {
@@ -50,23 +49,19 @@ static std::unique_ptr<reply> exception_reply(std::exception_ptr eptr) {
throw bad_param_exception(ex.what());
}
// We never going to get here
throw std::runtime_error("exception_reply");
return std::make_unique<reply>();
}
future<> set_server_init(http_context& ctx) {
auto rb = std::make_shared < api_registry_builder > (ctx.api_doc);
auto rb02 = std::make_shared < api_registry_builder20 > (ctx.api_doc, "/v2");
return ctx.http_server.set_routes([rb, &ctx, rb02](routes& r) {
return ctx.http_server.set_routes([rb, &ctx](routes& r) {
r.register_exeption_handler(exception_reply);
r.put(GET, "/ui", new httpd::file_handler(ctx.api_dir + "/index.html",
new content_replace("html")));
r.add(GET, url("/ui").remainder("path"), new httpd::directory_handler(ctx.api_dir,
new content_replace("html")));
rb->set_api_doc(r);
rb02->set_api_doc(r);
rb02->register_api_file(r, "swagger20_header");
set_config(rb02, ctx, r);
rb->register_function(r, "system",
"The system related API");
set_system(ctx, r);
@@ -117,11 +112,6 @@ future<> set_server_stream_manager(http_context& ctx) {
"The stream manager API", set_stream_manager);
}
future<> set_server_cache(http_context& ctx) {
return register_api(ctx, "cache_service",
"The cache service API", set_cache_service);
}
future<> set_server_gossip_settle(http_context& ctx) {
auto rb = std::make_shared < api_registry_builder > (ctx.api_doc);
@@ -129,6 +119,9 @@ future<> set_server_gossip_settle(http_context& ctx) {
rb->register_function(r, "failure_detector",
"The failure detector API");
set_failure_detector(ctx,r);
rb->register_function(r, "cache_service",
"The cache service API");
set_cache_service(ctx,r);
});
}

View File

@@ -29,7 +29,6 @@
#include "utils/histogram.hh"
#include "http/exception.hh"
#include "api_init.hh"
#include "seastarx.hh"
namespace api {
@@ -167,36 +166,33 @@ inline int64_t max_int64(int64_t a, int64_t b) {
* It combine total and the sub set for the ratio and its
* to_json method return the ration sub/total
*/
template<typename T>
struct basic_ratio_holder : public json::jsonable {
T total = 0;
T sub = 0;
struct ratio_holder : public json::jsonable {
double total = 0;
double sub = 0;
virtual std::string to_json() const {
if (total == 0) {
return "0";
}
return std::to_string(sub/total);
}
basic_ratio_holder() = default;
basic_ratio_holder& add(T _total, T _sub) {
ratio_holder() = default;
ratio_holder& add(double _total, double _sub) {
total += _total;
sub += _sub;
return *this;
}
basic_ratio_holder(T _total, T _sub) {
ratio_holder(double _total, double _sub) {
total = _total;
sub = _sub;
}
basic_ratio_holder<T>& operator+=(const basic_ratio_holder<T>& a) {
ratio_holder& operator+=(const ratio_holder& a) {
return add(a.total, a.sub);
}
friend basic_ratio_holder<T> operator+(basic_ratio_holder a, const basic_ratio_holder<T>& b) {
friend ratio_holder operator+(ratio_holder a, const ratio_holder& b) {
return a += b;
}
};
typedef basic_ratio_holder<double> ratio_holder;
typedef basic_ratio_holder<int64_t> integral_ratio_holder;
class unimplemented_exception : public base_exception {
public:
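
Review note on the hunk above: both a templated `basic_ratio_holder<T>` and a plain `double`-based `ratio_holder` appear in it. The practical difference is what `to_json()` emits. With an integral `T` the division `sub/total` truncates, whereas the `double` variant keeps the fraction. The following is a minimal, self-contained sketch of that behaviour, assuming a simplified struct that omits the `json::jsonable` base; it is an illustration, not the code from the tree.

```cpp
#include <cstdint>
#include <string>

// Simplified stand-in for the ratio holder in the hunk above.
// Assumption: the json::jsonable base class is dropped so the sketch compiles alone.
template <typename T>
struct basic_ratio_holder {
    T total = 0;
    T sub = 0;

    basic_ratio_holder() = default;
    // Argument order follows the hunk: total first, then the subset.
    basic_ratio_holder(T t, T s) : total(t), sub(s) {}

    basic_ratio_holder& add(T t, T s) {
        total += t;
        sub += s;
        return *this;
    }
    basic_ratio_holder& operator+=(const basic_ratio_holder& o) {
        return add(o.total, o.sub);
    }
    friend basic_ratio_holder operator+(basic_ratio_holder a, const basic_ratio_holder& b) {
        return a += b;
    }

    // Emits sub/total; integer division truncates when T is integral.
    std::string to_json() const {
        if (total == 0) {
            return "0";
        }
        return std::to_string(sub / total);
    }
};

using ratio_holder = basic_ratio_holder<double>;
using integral_ratio_holder = basic_ratio_holder<std::int64_t>;
```

Adding two holders sums the totals and the subsets separately, so the ratio is only computed once, when the value is serialized.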

View File

@@ -46,7 +46,7 @@ future<> set_server_messaging_service(http_context& ctx);
future<> set_server_storage_proxy(http_context& ctx);
future<> set_server_stream_manager(http_context& ctx);
future<> set_server_gossip_settle(http_context& ctx);
future<> set_server_cache(http_context& ctx);
future<> set_server_done(http_context& ctx);
}

View File

@@ -177,20 +177,6 @@ void set_cache_service(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(0);
});
cs::get_key_hits_moving_avrage.set(r, [&ctx] (std::unique_ptr<request> req) {
// TBD
// FIXME
// See above
return make_ready_future<json::json_return_type>(meter_to_json(utils::rate_moving_average()));
});
cs::get_key_requests_moving_avrage.set(r, [&ctx] (std::unique_ptr<request> req) {
// TBD
// FIXME
// See above
return make_ready_future<json::json_return_type>(meter_to_json(utils::rate_moving_average()));
});
cs::get_key_size.set(r, [] (std::unique_ptr<request> req) {
// TBD
// FIXME
@@ -252,13 +238,13 @@ void set_cache_service(http_context& ctx, routes& r) {
// In origin row size is the weighted size.
// We currently do not support weights, so we use num entries instead
return map_reduce_cf(ctx, 0, [](const column_family& cf) {
return cf.get_row_cache().partitions();
return cf.get_row_cache().num_entries();
}, std::plus<uint64_t>());
});
cs::get_row_entries.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, 0, [](const column_family& cf) {
return cf.get_row_cache().partitions();
return cf.get_row_cache().num_entries();
}, std::plus<uint64_t>());
});
@@ -294,20 +280,6 @@ void set_cache_service(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(0);
});
cs::get_counter_hits_moving_avrage.set(r, [&ctx] (std::unique_ptr<request> req) {
// TBD
// FIXME
// See above
return make_ready_future<json::json_return_type>(meter_to_json(utils::rate_moving_average()));
});
cs::get_counter_requests_moving_avrage.set(r, [&ctx] (std::unique_ptr<request> req) {
// TBD
// FIXME
// See above
return make_ready_future<json::json_return_type>(meter_to_json(utils::rate_moving_average()));
});
cs::get_counter_size.set(r, [] (std::unique_ptr<request> req) {
// TBD
// FIXME

View File

@@ -40,13 +40,13 @@ static auto transformer(const std::vector<collectd_value>& values) {
for (auto v: values) {
switch (v._type) {
case scollectd::data_type::GAUGE:
collected_value.values.push(v.d());
collected_value.values.push(v.u._d);
break;
case scollectd::data_type::DERIVE:
collected_value.values.push(v.i());
collected_value.values.push(v.u._i);
break;
default:
collected_value.values.push(v.ui());
collected_value.values.push(v.u._ui);
break;
}
}

View File

@@ -182,8 +182,17 @@ static int64_t max_row_size(column_family& cf) {
return res;
}
static integral_ratio_holder mean_row_size(column_family& cf) {
integral_ratio_holder res;
static double update_ratio(double acc, double f, double total) {
if (f && !total) {
throw bad_param_exception("total should include all elements");
} else if (total) {
acc += f / total;
}
return acc;
}
static ratio_holder mean_row_size(column_family& cf) {
ratio_holder res;
for (auto i: *cf.get_sstables() ) {
auto c = i->get_stats_metadata().estimated_row_size.count();
res.sub += i->get_stats_metadata().estimated_row_size.mean() * c;
@@ -274,16 +283,6 @@ static std::vector<uint64_t> concat_sstable_count_per_level(std::vector<uint64_t
return a;
}
ratio_holder filter_false_positive_as_ratio_holder(const sstables::shared_sstable& sst) {
double f = sst->filter_get_false_positive();
return ratio_holder(f + sst->filter_get_true_positive(), f);
}
ratio_holder filter_recent_false_positive_as_ratio_holder(const sstables::shared_sstable& sst) {
double f = sst->filter_get_recent_false_positive();
return ratio_holder(f + sst->filter_get_recent_true_positive(), f);
}
void set_column_family(http_context& ctx, routes& r) {
cf::get_column_family_name.set(r, [&ctx] (const_req req){
vector<sstring> res;
@@ -429,7 +428,7 @@ void set_column_family(http_context& ctx, routes& r) {
return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](column_family& cf) {
utils::estimated_histogram res(0);
for (auto i: *cf.get_sstables() ) {
res.merge(i->get_stats_metadata().estimated_cells_count);
res.merge(i->get_stats_metadata().estimated_column_count);
}
return res;
},
@@ -563,13 +562,11 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_mean_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {
// Cassandra 3.x mean values are truncated as integrals.
return map_reduce_cf(ctx, req->param["name"], integral_ratio_holder(), mean_row_size, std::plus<integral_ratio_holder>());
return map_reduce_cf(ctx, req->param["name"], ratio_holder(), mean_row_size, std::plus<ratio_holder>());
});
cf::get_all_mean_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {
// Cassandra 3.x mean values are truncated as integrals.
return map_reduce_cf(ctx, integral_ratio_holder(), mean_row_size, std::plus<integral_ratio_holder>());
return map_reduce_cf(ctx, ratio_holder(), mean_row_size, std::plus<ratio_holder>());
});
cf::get_bloom_filter_false_positives.set(r, [&ctx] (std::unique_ptr<request> req) {
@@ -605,27 +602,39 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_bloom_filter_false_ratio.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], ratio_holder(), [] (column_family& cf) {
return boost::accumulate(*cf.get_sstables() | boost::adaptors::transformed(filter_false_positive_as_ratio_holder), ratio_holder());
}, std::plus<>());
return map_reduce_cf(ctx, req->param["name"], double(0), [] (column_family& cf) {
return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), double(0), [](double s, auto& sst) {
double f = sst->filter_get_false_positive();
return update_ratio(s, f, f + sst->filter_get_true_positive());
});
}, std::plus<double>());
});
cf::get_all_bloom_filter_false_ratio.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, ratio_holder(), [] (column_family& cf) {
return boost::accumulate(*cf.get_sstables() | boost::adaptors::transformed(filter_false_positive_as_ratio_holder), ratio_holder());
}, std::plus<>());
return map_reduce_cf(ctx, double(0), [] (column_family& cf) {
return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), double(0), [](double s, auto& sst) {
double f = sst->filter_get_false_positive();
return update_ratio(s, f, f + sst->filter_get_true_positive());
});
}, std::plus<double>());
});
cf::get_recent_bloom_filter_false_ratio.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], ratio_holder(), [] (column_family& cf) {
return boost::accumulate(*cf.get_sstables() | boost::adaptors::transformed(filter_recent_false_positive_as_ratio_holder), ratio_holder());
}, std::plus<>());
return map_reduce_cf(ctx, req->param["name"], double(0), [] (column_family& cf) {
return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), double(0), [](double s, auto& sst) {
double f = sst->filter_get_recent_false_positive();
return update_ratio(s, f, f + sst->filter_get_recent_true_positive());
});
}, std::plus<double>());
});
cf::get_all_recent_bloom_filter_false_ratio.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, ratio_holder(), [] (column_family& cf) {
return boost::accumulate(*cf.get_sstables() | boost::adaptors::transformed(filter_recent_false_positive_as_ratio_holder), ratio_holder());
}, std::plus<>());
return map_reduce_cf(ctx, double(0), [] (column_family& cf) {
return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), double(0), [](double s, auto& sst) {
double f = sst->filter_get_recent_false_positive();
return update_ratio(s, f, f + sst->filter_get_recent_true_positive());
});
}, std::plus<double>());
});
cf::get_bloom_filter_disk_space_used.set(r, [&ctx] (std::unique_ptr<request> req) {
@@ -905,20 +914,5 @@ void set_column_family(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(res);
});
});
cf::get_sstables_for_key.set(r, [&ctx](std::unique_ptr<request> req) {
auto key = req->get_query_param("key");
auto uuid = get_uuid(req->param["name"], ctx.db.local());
return ctx.db.map_reduce0([key, uuid] (database& db) {
return db.find_column_family(uuid).get_sstables_by_partition_key(key);
}, std::unordered_set<sstring>(),
[](std::unordered_set<sstring> a, std::unordered_set<sstring>&& b) mutable {
a.insert(b.begin(),b.end());
return a;
}).then([](const std::unordered_set<sstring>& res) {
return make_ready_future<json::json_return_type>(container_to_vec(res));
});
});
}
}

View File

@@ -24,7 +24,6 @@
#include "api.hh"
#include "api/api-doc/column_family.json.hh"
#include "database.hh"
#include <any>
namespace api {
@@ -38,15 +37,9 @@ template<class Mapper, class I, class Reducer>
future<I> map_reduce_cf_raw(http_context& ctx, const sstring& name, I init,
Mapper mapper, Reducer reducer) {
auto uuid = get_uuid(name, ctx.db.local());
using mapper_type = std::function<std::any (database&)>;
using reducer_type = std::function<std::any (std::any, std::any)>;
return ctx.db.map_reduce0(mapper_type([mapper, uuid](database& db) {
return I(mapper(db.find_column_family(uuid)));
}), std::any(std::move(init)), reducer_type([reducer = std::move(reducer)] (std::any a, std::any b) mutable {
return I(reducer(std::any_cast<I>(std::move(a)), std::any_cast<I>(std::move(b))));
})).then([] (std::any r) {
return std::any_cast<I>(std::move(r));
});
return ctx.db.map_reduce0([mapper, uuid](database& db) {
return mapper(db.find_column_family(uuid));
}, init, reducer);
}
@@ -58,42 +51,35 @@ future<json::json_return_type> map_reduce_cf(http_context& ctx, const sstring& n
});
}
template<class Mapper, class I, class Reducer, class Result>
future<I> map_reduce_cf_raw(http_context& ctx, const sstring& name, I init,
Mapper mapper, Reducer reducer, Result result) {
auto uuid = get_uuid(name, ctx.db.local());
return ctx.db.map_reduce0([mapper, uuid](database& db) {
return mapper(db.find_column_family(uuid));
}, init, reducer);
}
template<class Mapper, class I, class Reducer, class Result>
future<json::json_return_type> map_reduce_cf(http_context& ctx, const sstring& name, I init,
Mapper mapper, Reducer reducer, Result result) {
return map_reduce_cf_raw(ctx, name, init, mapper, reducer).then([result](const I& res) mutable {
return map_reduce_cf_raw(ctx, name, init, mapper, reducer, result).then([result](const I& res) mutable {
result = res;
return make_ready_future<json::json_return_type>(result);
});
}
struct map_reduce_column_families_locally {
std::any init;
std::function<std::any (column_family&)> mapper;
std::function<std::any (std::any, std::any)> reducer;
std::any operator()(database& db) const {
template<class Mapper, class I, class Reducer>
future<I> map_reduce_cf_raw(http_context& ctx, I init,
Mapper mapper, Reducer reducer) {
return ctx.db.map_reduce0([mapper, init, reducer](database& db) {
auto res = init;
for (auto i : db.get_column_families()) {
res = reducer(res, mapper(*i.second.get()));
}
return res;
}
};
template<class Mapper, class I, class Reducer>
future<I> map_reduce_cf_raw(http_context& ctx, I init,
Mapper mapper, Reducer reducer) {
using mapper_type = std::function<std::any (column_family&)>;
using reducer_type = std::function<std::any (std::any, std::any)>;
auto wrapped_mapper = mapper_type([mapper = std::move(mapper)] (column_family& cf) mutable {
return I(mapper(cf));
});
auto wrapped_reducer = reducer_type([reducer = std::move(reducer)] (std::any a, std::any b) mutable {
return I(reducer(std::any_cast<I>(std::move(a)), std::any_cast<I>(std::move(b))));
});
return ctx.db.map_reduce0(map_reduce_column_families_locally{init, std::move(wrapped_mapper), wrapped_reducer}, std::any(init), wrapped_reducer).then([] (std::any res) {
return std::any_cast<I>(std::move(res));
});
}, init, reducer);
}
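
Review note on the `map_reduce_cf_raw` hunks above: one variant passes the mapper and reducer straight through as template parameters, the other wraps them in `std::function` objects that traffic in `std::any` and casts back to `I` at the end. Both should produce the same result; the type-erased form presumably trades a little run-time cost for fewer template instantiations. Below is a small sketch of the two shapes. It assumes C++17 and uses a plain loop over a `std::vector<int>` as a hypothetical stand-in for the per-shard work; it does not use Seastar's `map_reduce0`.

```cpp
#include <any>
#include <functional>
#include <utility>
#include <vector>

// Direct form: mapper and reducer types are visible to the compiler.
// The vector of ints is a hypothetical stand-in for the shards.
template <class Mapper, class I, class Reducer>
I map_reduce_direct(const std::vector<int>& shards, I init, Mapper mapper, Reducer reducer) {
    for (int s : shards) {
        init = reducer(std::move(init), mapper(s));
    }
    return init;
}

// Type-erased form: callables are hidden behind std::function and
// values travel as std::any, then are cast back to I.
template <class Mapper, class I, class Reducer>
I map_reduce_erased(const std::vector<int>& shards, I init, Mapper mapper, Reducer reducer) {
    using mapper_type = std::function<std::any(int)>;
    using reducer_type = std::function<std::any(std::any, std::any)>;

    mapper_type m = [mapper](int s) {
        return std::any(I(mapper(s)));
    };
    reducer_type r = [reducer](std::any a, std::any b) {
        return std::any(I(reducer(std::any_cast<I>(std::move(a)),
                                  std::any_cast<I>(std::move(b)))));
    };

    std::any acc = std::move(init);
    for (int s : shards) {
        acc = r(std::move(acc), m(s));
    }
    return std::any_cast<I>(std::move(acc));
}
```

Wrapping each mapped value in `I(...)` before it enters the `std::any` matters: `std::any_cast<I>` throws `std::bad_any_cast` if the stored type is not exactly `I`, so a mapper returning `int` with an `I` of `long` would otherwise fail at run time.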

View File

@@ -20,13 +20,13 @@
*/
#include "compaction_manager.hh"
#include "sstables/compaction_manager.hh"
#include "api/api-doc/compaction_manager.json.hh"
#include "db/system_keyspace.hh"
#include "column_family.hh"
namespace api {
using namespace scollectd;
namespace cm = httpd::compaction_manager_json;
using namespace json;

View File

@@ -1,112 +0,0 @@
/*
* Copyright 2018 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "api/config.hh"
#include "api/api-doc/config.json.hh"
#include "db/config.hh"
#include <sstream>
#include <boost/algorithm/string/replace.hpp>
namespace api {
template<class T>
json::json_return_type get_json_return_type(const T& val) {
return json::json_return_type(val);
}
/*
* As commented on db::seed_provider_type is not used
* and probably never will.
*
* Just in case, we will return its name
*/
template<>
json::json_return_type get_json_return_type(const db::seed_provider_type& val) {
return json::json_return_type(val.class_name);
}
std::string format_type(const std::string& type) {
if (type == "int") {
return "integer";
}
return type;
}
future<> get_config_swagger_entry(const std::string& name, const std::string& description, const std::string& type, bool& first, output_stream<char>& os) {
std::stringstream ss;
if (first) {
first=false;
} else {
ss <<',';
};
ss << "\"/config/" << name <<"\": {"
"\"get\": {"
"\"description\": \"" << boost::replace_all_copy(boost::replace_all_copy(boost::replace_all_copy(description,"\n","\\n"),"\"", "''"), "\t", " ") <<"\","
"\"operationId\": \"find_config_"<< name <<"\","
"\"produces\": ["
"\"application/json\""
"],"
"\"tags\": [\"config\"],"
"\"parameters\": ["
"],"
"\"responses\": {"
"\"200\": {"
"\"description\": \"Config value\","
"\"schema\": {"
"\"type\": \"" << format_type(type) << "\""
"}"
"},"
"\"default\": {"
"\"description\": \"unexpected error\","
"\"schema\": {"
"\"$ref\": \"#/definitions/ErrorModel\""
"}"
"}"
"}"
"}"
"}";
return os.write(ss.str());
}
namespace cs = httpd::config_json;
#define _get_config_value(name, type, deflt, status, desc, ...) if (id == #name) {return get_json_return_type(ctx.db.local().get_config().name());}
#define _get_config_description(name, type, deflt, status, desc, ...) f = f.then([&os, &first] {return get_config_swagger_entry(#name, desc, #type, first, os);});
void set_config(std::shared_ptr < api_registry_builder20 > rb, http_context& ctx, routes& r) {
rb->register_function(r, [] (output_stream<char>& os) {
return do_with(true, [&os] (bool& first) {
auto f = make_ready_future();
_make_config_values(_get_config_description)
return f;
});
});
cs::find_config_id.set(r, [&ctx] (const_req r) {
auto id = r.param["id"];
_make_config_values(_get_config_value)
throw bad_param_exception(sstring("No such config entry: ") + id);
});
}
}

@@ -22,22 +22,16 @@
#include "locator/snitch_base.hh"
#include "endpoint_snitch.hh"
#include "api/api-doc/endpoint_snitch_info.json.hh"
#include "utils/fb_utilities.hh"
namespace api {
void set_endpoint_snitch(http_context& ctx, routes& r) {
static auto host_or_broadcast = [](const_req req) {
auto host = req.get_query_param("host");
return host.empty() ? gms::inet_address(utils::fb_utilities::get_broadcast_address()) : gms::inet_address(host);
};
httpd::endpoint_snitch_info_json::get_datacenter.set(r, [](const_req req) {
return locator::i_endpoint_snitch::get_local_snitch_ptr()->get_datacenter(host_or_broadcast(req));
httpd::endpoint_snitch_info_json::get_datacenter.set(r, [] (const_req req) {
return locator::i_endpoint_snitch::get_local_snitch_ptr()->get_datacenter(req.get_query_param("host"));
});
httpd::endpoint_snitch_info_json::get_rack.set(r, [](const_req req) {
return locator::i_endpoint_snitch::get_local_snitch_ptr()->get_rack(host_or_broadcast(req));
httpd::endpoint_snitch_info_json::get_rack.set(r, [] (const_req req) {
return locator::i_endpoint_snitch::get_local_snitch_ptr()->get_rack(req.get_query_param("host"));
});
httpd::endpoint_snitch_info_json::get_snitch_name.set(r, [] (const_req req) {

@@ -88,20 +88,6 @@ void set_failure_detector(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(state);
});
});
fd::get_endpoint_phi_values.set(r, [](std::unique_ptr<request> req) {
return gms::get_arrival_samples().then([](std::map<gms::inet_address, gms::arrival_window> map) {
std::vector<fd::endpoint_phi_value> res;
auto now = gms::arrival_window::clk::now();
for (auto& p : map) {
fd::endpoint_phi_value val;
val.endpoint = p.first.to_sstring();
val.phi = p.second.phi(now);
res.emplace_back(std::move(val));
}
return make_ready_future<json::json_return_type>(res);
});
});
}
}

@@ -24,6 +24,7 @@
namespace api {
using namespace scollectd;
using namespace json;
namespace hh = httpd::hinted_handoff_json;

@@ -29,11 +29,11 @@
namespace api {
static logging::logger alogger("lsa-api");
static logging::logger logger("lsa-api");
void set_lsa(http_context& ctx, routes& r) {
httpd::lsa_json::lsa_compact.set(r, [&ctx](std::unique_ptr<request> req) {
alogger.info("Triggering compaction");
logger.info("Triggering compaction");
return ctx.db.invoke_on_all([] (database&) {
logalloc::shard_tracker().reclaim(std::numeric_limits<size_t>::max());
}).then([] {

@@ -27,7 +27,7 @@
#include <sstream>
using namespace httpd::messaging_service_json;
using namespace netw;
using namespace net;
namespace api {
@@ -120,13 +120,13 @@ void set_messaging_service(http_context& ctx, routes& r) {
}));
get_version.set(r, [](const_req req) {
return netw::get_local_messaging_service().get_raw_version(req.get_query_param("addr"));
return net::get_local_messaging_service().get_raw_version(req.get_query_param("addr"));
});
get_dropped_messages_by_ver.set(r, [](std::unique_ptr<request> req) {
shared_ptr<std::vector<uint64_t>> map = make_shared<std::vector<uint64_t>>(num_verb);
return netw::get_messaging_service().map_reduce([map](const uint64_t* local_map) mutable {
return net::get_messaging_service().map_reduce([map](const uint64_t* local_map) mutable {
for (auto i = 0; i < num_verb; i++) {
(*map)[i]+= local_map[i];
}

@@ -397,7 +397,7 @@ void set_storage_proxy(http_context& ctx, routes& r) {
});
sp::get_range_estimated_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timer_stats(ctx.sp, &proxy::stats::range);
return sum_timer_stats(ctx.sp, &proxy::stats::read);
});
sp::get_range_latency.set(r, [&ctx](std::unique_ptr<request> req) {

@@ -22,8 +22,6 @@
#include "storage_service.hh"
#include "api/api-doc/storage_service.json.hh"
#include "db/config.hh"
#include <boost/range/adaptor/map.hpp>
#include <boost/range/adaptor/filtered.hpp>
#include <service/storage_service.hh>
#include <db/commitlog/commitlog.hh>
#include <gms/gossiper.hh>
@@ -34,7 +32,6 @@
#include "column_family.hh"
#include "log.hh"
#include "release.hh"
#include "sstables/compaction_manager.hh"
namespace api {
@@ -93,13 +90,10 @@ void set_storage_service(http_context& ctx, routes& r) {
return ctx.db.local().commitlog()->active_config().commit_log_location;
});
ss::get_token_endpoint.set(r, [] (std::unique_ptr<request> req) {
return make_ready_future<json::json_return_type>(stream_range_as_array(service::get_local_storage_service().get_token_to_endpoint_map(), [](const auto& i) {
storage_service_json::mapper val;
val.key = boost::lexical_cast<std::string>(i.first);
val.value = boost::lexical_cast<std::string>(i.second);
return val;
}));
ss::get_token_endpoint.set(r, [] (const_req req) {
auto token_to_ep = service::get_local_storage_service().get_token_to_endpoint_map();
std::vector<storage_service_json::mapper> res;
return map_to_key_value(token_to_ep, res);
});
ss::get_leaving_nodes.set(r, [](const_req req) {
@@ -358,12 +352,6 @@ void set_storage_service(http_context& ctx, routes& r) {
});
});
ss::get_active_repair_async.set(r, [&ctx](std::unique_ptr<request> req) {
return get_active_repairs(ctx.db).then([] (std::vector<int> res){
return make_ready_future<json::json_return_type>(res);
});
});
ss::repair_async_status.set(r, [&ctx](std::unique_ptr<request> req) {
return repair_get_status(ctx.db, boost::lexical_cast<int>( req->get_query_param("id")))
.then_wrapped([] (future<repair_status>&& fut) {
@@ -371,22 +359,16 @@ void set_storage_service(http_context& ctx, routes& r) {
try {
res = fut.get0();
} catch(std::runtime_error& e) {
throw httpd::bad_param_exception(e.what());
return make_ready_future<json::json_return_type>(json_exception(httpd::bad_param_exception(e.what())));
}
return make_ready_future<json::json_return_type>(json::json_return_type(res));
});
});
ss::force_terminate_all_repair_sessions.set(r, [](std::unique_ptr<request> req) {
return repair_abort_all(service::get_local_storage_service().db()).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
ss::force_terminate_all_repair_sessions_new.set(r, [](std::unique_ptr<request> req) {
return repair_abort_all(service::get_local_storage_service().db()).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(json_void());
});
ss::decommission.set(r, [](std::unique_ptr<request> req) {
@@ -475,15 +457,8 @@ void set_storage_service(http_context& ctx, routes& r) {
});
ss::get_keyspaces.set(r, [&ctx](const_req req) {
auto type = req.get_query_param("type");
if (type == "user") {
return ctx.db.local().get_non_system_keyspaces();
} else if (type == "non_local_strategy") {
return map_keys(ctx.db.local().get_keyspaces() | boost::adaptors::filtered([](const auto& p) {
return p.second.get_replication_strategy().get_type() != locator::replication_strategy_type::local;
}));
}
return map_keys(ctx.db.local().get_keyspaces());
auto non_system = req.get_query_param("non_system");
return map_keys(ctx.db.local().keyspaces());
});
ss::update_snitch.set(r, [](std::unique_ptr<request> req) {
@@ -567,7 +542,9 @@ void set_storage_service(http_context& ctx, routes& r) {
});
ss::is_joined.set(r, [] (std::unique_ptr<request> req) {
return make_ready_future<json::json_return_type>(service::get_local_storage_service().is_joined());
return service::get_local_storage_service().is_joined().then([] (bool is_joined) {
return make_ready_future<json::json_return_type>(is_joined);
});
});
ss::set_stream_throughput_mb_per_sec.set(r, [](std::unique_ptr<request> req) {
@@ -687,23 +664,17 @@ void set_storage_service(http_context& ctx, routes& r) {
ss::set_trace_probability.set(r, [](std::unique_ptr<request> req) {
auto probability = req->get_query_param("probability");
return futurize<json::json_return_type>::apply([probability] {
try {
double real_prob = std::stod(probability.c_str());
return tracing::tracing::tracing_instance().invoke_on_all([real_prob] (auto& local_tracing) {
local_tracing.set_trace_probability(real_prob);
}).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
}).then_wrapped([probability] (auto&& f) {
try {
f.get();
return make_ready_future<json::json_return_type>(json_void());
} catch (std::out_of_range& e) {
throw httpd::bad_param_exception(e.what());
} catch (std::invalid_argument&){
throw httpd::bad_param_exception(sprint("Bad format in a probability value: \"%s\"", probability.c_str()));
}
});
} catch (...) {
throw httpd::bad_param_exception(sprint("Bad format of a probability value: \"%s\"", probability.c_str()));
}
});
ss::get_trace_probability.set(r, [](std::unique_ptr<request> req) {
@@ -818,8 +789,10 @@ void set_storage_service(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(json_void());
});
ss::get_metrics_load.set(r, [&ctx](std::unique_ptr<request> req) {
return get_cf_stats(ctx, &column_family::stats::live_disk_space_used);
ss::get_metrics_load.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
});
ss::get_exceptions.set(r, [](const_req req) {
@@ -852,15 +825,6 @@ void set_storage_service(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(map_to_key_value(ownership, res));
});
});
ss::view_build_statuses.set(r, [&ctx] (std::unique_ptr<request> req) {
auto keyspace = validate_keyspace(ctx, req->param);
auto view = req->param["view"];
return service::get_local_storage_service().view_build_statuses(std::move(keyspace), std::move(view)).then([] (std::unordered_map<sstring, sstring> status) {
std::vector<storage_service_json::mapper> res;
return make_ready_future<json::json_return_type>(map_to_key_value(std::move(status), res));
});
});
}
}
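
The `set_trace_probability` handler in the diff above depends on `std::stod` throwing `std::invalid_argument` for non-numeric input and `std::out_of_range` for values that do not fit in a `double`. The sketch below shows that mapping on its own, under stated assumptions: `parse_probability` and `bad_param` are hypothetical stand-ins for the handler body and for `httpd::bad_param_exception`, and the Seastar future plumbing is left out.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for httpd::bad_param_exception.
struct bad_param : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical helper: parses the query parameter and converts both
// std::stod failure modes into a single "bad parameter" error.
double parse_probability(const std::string& text) {
    try {
        return std::stod(text);
    } catch (const std::out_of_range& e) {
        // Thrown when the converted value does not fit in a double.
        throw bad_param(e.what());
    } catch (const std::invalid_argument&) {
        // Thrown when no conversion could be performed at all.
        throw bad_param("Bad format in a probability value: \"" + text + "\"");
    }
}
```

Catching the two exception types separately keeps the range error's own message while giving malformed input a message that echoes the offending value.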

@@ -1,239 +0,0 @@
/*
* Copyright (C) 2018 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "atomic_cell.hh"
#include "atomic_cell_or_collection.hh"
#include "types.hh"
/// LSA migrator for cells with irrelevant type
///
///
const data::type_imr_descriptor& no_type_imr_descriptor() {
static thread_local data::type_imr_descriptor state(data::type_info::make_variable_size());
return state;
}
atomic_cell atomic_cell::make_dead(api::timestamp_type timestamp, gc_clock::time_point deletion_time) {
auto& imr_data = no_type_imr_descriptor();
return atomic_cell(
imr_data.type_info(),
imr_object_type::make(data::cell::make_dead(timestamp, deletion_time), &imr_data.lsa_migrator())
);
}
atomic_cell atomic_cell::make_live(const abstract_type& type, api::timestamp_type timestamp, bytes_view value, atomic_cell::collection_member cm) {
auto& imr_data = type.imr_state();
return atomic_cell(
imr_data.type_info(),
imr_object_type::make(data::cell::make_live(imr_data.type_info(), timestamp, value, bool(cm)), &imr_data.lsa_migrator())
);
}
atomic_cell atomic_cell::make_live(const abstract_type& type, api::timestamp_type timestamp, ser::buffer_view<bytes_ostream::fragment_iterator> value, atomic_cell::collection_member cm) {
auto& imr_data = type.imr_state();
return atomic_cell(
imr_data.type_info(),
imr_object_type::make(data::cell::make_live(imr_data.type_info(), timestamp, value, bool(cm)), &imr_data.lsa_migrator())
);
}
atomic_cell atomic_cell::make_live(const abstract_type& type, api::timestamp_type timestamp, bytes_view value,
gc_clock::time_point expiry, gc_clock::duration ttl, atomic_cell::collection_member cm) {
auto& imr_data = type.imr_state();
return atomic_cell(
imr_data.type_info(),
imr_object_type::make(data::cell::make_live(imr_data.type_info(), timestamp, value, expiry, ttl, bool(cm)), &imr_data.lsa_migrator())
);
}
atomic_cell atomic_cell::make_live(const abstract_type& type, api::timestamp_type timestamp, ser::buffer_view<bytes_ostream::fragment_iterator> value,
gc_clock::time_point expiry, gc_clock::duration ttl, atomic_cell::collection_member cm) {
auto& imr_data = type.imr_state();
return atomic_cell(
imr_data.type_info(),
imr_object_type::make(data::cell::make_live(imr_data.type_info(), timestamp, value, expiry, ttl, bool(cm)), &imr_data.lsa_migrator())
);
}
atomic_cell atomic_cell::make_live_counter_update(api::timestamp_type timestamp, int64_t value) {
auto& imr_data = no_type_imr_descriptor();
return atomic_cell(
imr_data.type_info(),
imr_object_type::make(data::cell::make_live_counter_update(timestamp, value), &imr_data.lsa_migrator())
);
}
atomic_cell atomic_cell::make_live_uninitialized(const abstract_type& type, api::timestamp_type timestamp, size_t size) {
auto& imr_data = no_type_imr_descriptor();
return atomic_cell(
imr_data.type_info(),
imr_object_type::make(data::cell::make_live_uninitialized(imr_data.type_info(), timestamp, size), &imr_data.lsa_migrator())
);
}
static imr::utils::object<data::cell::structure> copy_cell(const data::type_imr_descriptor& imr_data, const uint8_t* ptr)
{
using imr_object_type = imr::utils::object<data::cell::structure>;
// If the cell doesn't own any memory it is trivial and can be copied with
// memcpy.
auto f = data::cell::structure::get_member<data::cell::tags::flags>(ptr);
if (!f.template get<data::cell::tags::external_data>()) {
data::cell::context ctx(f, imr_data.type_info());
// XXX: We may be better off storing the total cell size in memory. Measure!
auto size = data::cell::structure::serialized_object_size(ptr, ctx);
return imr_object_type::make_raw(size, [&] (uint8_t* dst) noexcept {
std::copy_n(ptr, size, dst);
}, &imr_data.lsa_migrator());
}
return imr_object_type::make(data::cell::copy_fn(imr_data.type_info(), ptr), &imr_data.lsa_migrator());
}
atomic_cell::atomic_cell(const abstract_type& type, atomic_cell_view other)
: atomic_cell(type.imr_state().type_info(),
copy_cell(type.imr_state(), other._view.raw_pointer()))
{ }
atomic_cell_or_collection atomic_cell_or_collection::copy(const abstract_type& type) const {
if (!_data.get()) {
return atomic_cell_or_collection();
}
auto& imr_data = type.imr_state();
return atomic_cell_or_collection(
copy_cell(imr_data, _data.get())
);
}
atomic_cell_or_collection::atomic_cell_or_collection(const abstract_type& type, atomic_cell_view acv)
: _data(copy_cell(type.imr_state(), acv._view.raw_pointer()))
{
}
static collection_mutation_view get_collection_mutation_view(const uint8_t* ptr)
{
auto f = data::cell::structure::get_member<data::cell::tags::flags>(ptr);
auto ti = data::type_info::make_collection();
data::cell::context ctx(f, ti);
auto view = data::cell::structure::get_member<data::cell::tags::cell>(ptr).as<data::cell::tags::collection>(ctx);
auto dv = data::cell::variable_value::make_view(view, f.get<data::cell::tags::external_data>());
return collection_mutation_view { dv };
}
collection_mutation_view atomic_cell_or_collection::as_collection_mutation() const {
return get_collection_mutation_view(_data.get());
}
collection_mutation::collection_mutation(const collection_type_impl& type, collection_mutation_view v)
: _data(imr_object_type::make(data::cell::make_collection(v.data), &type.imr_state().lsa_migrator()))
{
}
collection_mutation::collection_mutation(const collection_type_impl& type, bytes_view v)
: _data(imr_object_type::make(data::cell::make_collection(v), &type.imr_state().lsa_migrator()))
{
}
collection_mutation::operator collection_mutation_view() const
{
return get_collection_mutation_view(_data.get());
}
bool atomic_cell_or_collection::equals(const abstract_type& type, const atomic_cell_or_collection& other) const
{
auto ptr_a = _data.get();
auto ptr_b = other._data.get();
if (!ptr_a || !ptr_b) {
return !ptr_a && !ptr_b;
}
if (type.is_atomic()) {
auto a = atomic_cell_view::from_bytes(type.imr_state().type_info(), _data);
auto b = atomic_cell_view::from_bytes(type.imr_state().type_info(), other._data);
if (a.timestamp() != b.timestamp()) {
return false;
}
if (a.is_live()) {
if (!b.is_live()) {
return false;
}
if (a.is_counter_update()) {
if (!b.is_counter_update()) {
return false;
}
return a.counter_update_value() == b.counter_update_value();
}
if (a.is_live_and_has_ttl()) {
if (!b.is_live_and_has_ttl()) {
return false;
}
if (a.ttl() != b.ttl() || a.expiry() != b.expiry()) {
return false;
}
}
return a.value() == b.value();
}
return a.deletion_time() == b.deletion_time();
} else {
return as_collection_mutation().data == other.as_collection_mutation().data;
}
}
size_t atomic_cell_or_collection::external_memory_usage(const abstract_type& t) const
{
if (!_data.get()) {
return 0;
}
auto ctx = data::cell::context(_data.get(), t.imr_state().type_info());
auto view = data::cell::structure::make_view(_data.get(), ctx);
auto flags = view.get<data::cell::tags::flags>();
size_t external_value_size = 0;
if (flags.get<data::cell::tags::external_data>()) {
if (flags.get<data::cell::tags::collection>()) {
external_value_size = get_collection_mutation_view(_data.get()).data.size_bytes();
} else {
auto cell_view = data::cell::atomic_cell_view(t.imr_state().type_info(), view);
external_value_size = cell_view.value_size();
}
// Add overhead of chunk headers. The last one is a special case.
external_value_size += (external_value_size - 1) / data::cell::maximum_external_chunk_length * data::cell::external_chunk_overhead;
external_value_size += data::cell::external_last_chunk_overhead;
}
return data::cell::structure::serialized_object_size(_data.get(), ctx)
+ imr_object_type::size_overhead + external_value_size;
}
std::ostream& operator<<(std::ostream& os, const atomic_cell_or_collection& c) {
if (!c._data.get()) {
return os << "{ null atomic_cell_or_collection }";
}
using dc = data::cell;
os << "{ ";
if (dc::structure::get_member<dc::tags::flags>(c._data.get()).get<dc::tags::collection>()) {
os << "collection";
} else {
os << "atomic cell";
}
return os << " @" << static_cast<const void*>(c._data.get()) << " }";
}
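
The chunk-overhead arithmetic in `external_memory_usage` above can be checked with a small worked example. This is a sketch only: the three constants are made-up illustrative values (the real ones live in `data::cell` and are not shown in this diff), and `external_footprint` is a hypothetical name for the part of the computation that covers an externally stored value.

```cpp
#include <cstddef>

// Illustrative values only; the real constants are defined in data::cell.
constexpr std::size_t maximum_external_chunk_length = 8;
constexpr std::size_t external_chunk_overhead = 2;
constexpr std::size_t external_last_chunk_overhead = 1;

// Hypothetical helper: bytes charged for an externally stored value.
// Assumes value_size > 0, which is what the external_data flag implies here;
// with value_size == 0 the unsigned subtraction would wrap around.
inline std::size_t external_footprint(std::size_t value_size) {
    // Number of full chunks that precede the last chunk.
    std::size_t leading_chunks = (value_size - 1) / maximum_external_chunk_length;
    return value_size
        + leading_chunks * external_chunk_overhead
        + external_last_chunk_overhead; // the last chunk is a special case
}
```

With these illustrative constants, a value that exactly fills one chunk pays only the last-chunk overhead, and each additional started chunk adds one regular chunk header.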

@@ -28,190 +28,234 @@
#include "utils/managed_bytes.hh"
#include "net/byteorder.hh"
#include <cstdint>
#include <iosfwd>
#include <seastar/util/gcc6-concepts.hh>
#include "data/cell.hh"
#include "data/schema_info.hh"
#include "imr/utils.hh"
#include <iostream>
#include "serializer.hh"
template<typename T>
static inline
void set_field(managed_bytes& v, unsigned offset, T val) {
reinterpret_cast<net::packed<T>*>(v.begin() + offset)->raw = net::hton(val);
}
class abstract_type;
class collection_type_impl;
template<typename T>
static inline
T get_field(const bytes_view& v, unsigned offset) {
return net::ntoh(*reinterpret_cast<const net::packed<T>*>(v.begin() + offset));
}
using atomic_cell_value_view = data::value_view;
using atomic_cell_value_mutable_view = data::value_mutable_view;
class atomic_cell_or_collection;
/// View of an atomic cell
template<mutable_view is_mutable>
class basic_atomic_cell_view {
protected:
data::cell::basic_atomic_cell_view<is_mutable> _view;
/*
* Represents atomic cell layout. Works on serialized form.
*
* Layout:
*
* <live> := <int8_t:flags><int64_t:timestamp>(<int32_t:expiry><int32_t:ttl>)?<value>
* <dead> := <int8_t: 0><int64_t:timestamp><int32_t:deletion_time>
*/
class atomic_cell_type final {
private:
static constexpr int8_t LIVE_FLAG = 0x01;
static constexpr int8_t EXPIRY_FLAG = 0x02; // When set, the expiry and ttl fields are present. Set only for live cells
static constexpr int8_t REVERT_FLAG = 0x04; // transient flag used to efficiently implement ReversiblyMergeable for atomic cells.
static constexpr unsigned flags_size = 1;
static constexpr unsigned timestamp_offset = flags_size;
static constexpr unsigned timestamp_size = 8;
static constexpr unsigned expiry_offset = timestamp_offset + timestamp_size;
static constexpr unsigned expiry_size = 4;
static constexpr unsigned deletion_time_offset = timestamp_offset + timestamp_size;
static constexpr unsigned deletion_time_size = 4;
static constexpr unsigned ttl_offset = expiry_offset + expiry_size;
static constexpr unsigned ttl_size = 4;
private:
static bool is_revert_set(bytes_view cell) {
return cell[0] & REVERT_FLAG;
}
template<typename BytesContainer>
static void set_revert(BytesContainer& cell, bool revert) {
cell[0] = (cell[0] & ~REVERT_FLAG) | (revert * REVERT_FLAG);
}
static bool is_live(const bytes_view& cell) {
return cell[0] & LIVE_FLAG;
}
static bool is_live_and_has_ttl(const bytes_view& cell) {
return cell[0] & EXPIRY_FLAG;
}
static bool is_dead(const bytes_view& cell) {
return !is_live(cell);
}
// Can be called on live and dead cells
static api::timestamp_type timestamp(const bytes_view& cell) {
return get_field<api::timestamp_type>(cell, timestamp_offset);
}
// Can be called on live cells only
static bytes_view value(bytes_view cell) {
auto expiry_field_size = bool(cell[0] & EXPIRY_FLAG) * (expiry_size + ttl_size);
auto value_offset = flags_size + timestamp_size + expiry_field_size;
cell.remove_prefix(value_offset);
return cell;
}
// Can be called only when is_dead() is true.
static gc_clock::time_point deletion_time(const bytes_view& cell) {
assert(is_dead(cell));
return gc_clock::time_point(gc_clock::duration(
get_field<int32_t>(cell, deletion_time_offset)));
}
// Can be called only when is_live_and_has_ttl() is true.
static gc_clock::time_point expiry(const bytes_view& cell) {
assert(is_live_and_has_ttl(cell));
auto expiry = get_field<int32_t>(cell, expiry_offset);
return gc_clock::time_point(gc_clock::duration(expiry));
}
// Can be called only when is_live_and_has_ttl() is true.
static gc_clock::duration ttl(const bytes_view& cell) {
assert(is_live_and_has_ttl(cell));
return gc_clock::duration(get_field<int32_t>(cell, ttl_offset));
}
static managed_bytes make_dead(api::timestamp_type timestamp, gc_clock::time_point deletion_time) {
managed_bytes b(managed_bytes::initialized_later(), flags_size + timestamp_size + deletion_time_size);
b[0] = 0;
set_field(b, timestamp_offset, timestamp);
set_field(b, deletion_time_offset, deletion_time.time_since_epoch().count());
return b;
}
static managed_bytes make_live(api::timestamp_type timestamp, bytes_view value) {
auto value_offset = flags_size + timestamp_size;
managed_bytes b(managed_bytes::initialized_later(), value_offset + value.size());
b[0] = LIVE_FLAG;
set_field(b, timestamp_offset, timestamp);
std::copy_n(value.begin(), value.size(), b.begin() + value_offset);
return b;
}
static managed_bytes make_live(api::timestamp_type timestamp, bytes_view value, gc_clock::time_point expiry, gc_clock::duration ttl) {
auto value_offset = flags_size + timestamp_size + expiry_size + ttl_size;
managed_bytes b(managed_bytes::initialized_later(), value_offset + value.size());
b[0] = EXPIRY_FLAG | LIVE_FLAG;
set_field(b, timestamp_offset, timestamp);
set_field(b, expiry_offset, expiry.time_since_epoch().count());
set_field(b, ttl_offset, ttl.count());
std::copy_n(value.begin(), value.size(), b.begin() + value_offset);
return b;
}
template<typename ByteContainer>
friend class atomic_cell_base;
friend class atomic_cell;
public:
using pointer_type = std::conditional_t<is_mutable == mutable_view::no, const uint8_t*, uint8_t*>;
};
template<typename ByteContainer>
class atomic_cell_base {
protected:
explicit basic_atomic_cell_view(data::cell::basic_atomic_cell_view<is_mutable> v)
: _view(std::move(v)) { }
basic_atomic_cell_view(const data::type_info& ti, pointer_type ptr)
: _view(data::cell::make_atomic_cell_view(ti, ptr))
{ }
ByteContainer _data;
protected:
atomic_cell_base(ByteContainer&& data) : _data(std::forward<ByteContainer>(data)) { }
friend class atomic_cell_or_collection;
public:
operator basic_atomic_cell_view<mutable_view::no>() const noexcept {
return basic_atomic_cell_view<mutable_view::no>(_view);
}
void swap(basic_atomic_cell_view& other) noexcept {
using std::swap;
swap(_view, other._view);
}
bool is_counter_update() const {
return _view.is_counter_update();
bool is_revert_set() const {
return atomic_cell_type::is_revert_set(_data);
}
bool is_live() const {
return _view.is_live();
return atomic_cell_type::is_live(_data);
}
bool is_live(tombstone t, bool is_counter) const {
return is_live() && !is_covered_by(t, is_counter);
bool is_live(tombstone t) const {
return is_live() && !is_covered_by(t);
}
bool is_live(tombstone t, gc_clock::time_point now, bool is_counter) const {
return is_live() && !is_covered_by(t, is_counter) && !has_expired(now);
bool is_live(tombstone t, gc_clock::time_point now) const {
return is_live() && !is_covered_by(t) && !has_expired(now);
}
bool is_live_and_has_ttl() const {
return _view.is_expiring();
return atomic_cell_type::is_live_and_has_ttl(_data);
}
bool is_dead(gc_clock::time_point now) const {
return !is_live() || has_expired(now);
return atomic_cell_type::is_dead(_data) || has_expired(now);
}
bool is_covered_by(tombstone t, bool is_counter) const {
return timestamp() <= t.timestamp || (is_counter && t.timestamp != api::missing_timestamp);
bool is_covered_by(tombstone t) const {
return timestamp() <= t.timestamp;
}
// Can be called on live and dead cells
api::timestamp_type timestamp() const {
return _view.timestamp();
}
void set_timestamp(api::timestamp_type ts) {
_view.set_timestamp(ts);
return atomic_cell_type::timestamp(_data);
}
// Can be called on live cells only
data::basic_value_view<is_mutable> value() const {
return _view.value();
}
// Can be called on live cells only
size_t value_size() const {
return _view.value_size();
}
bool is_value_fragmented() const {
return _view.is_value_fragmented();
}
// Can be called on live counter update cells only
int64_t counter_update_value() const {
return _view.counter_update_value();
bytes_view value() const {
return atomic_cell_type::value(_data);
}
// Can be called only when is_dead(gc_clock::time_point)
gc_clock::time_point deletion_time() const {
return !is_live() ? _view.deletion_time() : expiry() - ttl();
return !is_live() ? atomic_cell_type::deletion_time(_data) : expiry() - ttl();
}
// Can be called only when is_live_and_has_ttl()
gc_clock::time_point expiry() const {
return _view.expiry();
return atomic_cell_type::expiry(_data);
}
// Can be called only when is_live_and_has_ttl()
gc_clock::duration ttl() const {
return _view.ttl();
return atomic_cell_type::ttl(_data);
}
// Can be called on live and dead cells
bool has_expired(gc_clock::time_point now) const {
return is_live_and_has_ttl() && expiry() <= now;
return is_live_and_has_ttl() && expiry() < now;
}
bytes_view serialize() const {
return _view.serialize();
return _data;
}
void set_revert(bool revert) {
atomic_cell_type::set_revert(_data, revert);
}
};
class atomic_cell_view final : public basic_atomic_cell_view<mutable_view::no> {
atomic_cell_view(const data::type_info& ti, const uint8_t* data)
: basic_atomic_cell_view<mutable_view::no>(ti, data) {}
template<mutable_view is_mutable>
atomic_cell_view(data::cell::basic_atomic_cell_view<is_mutable> view)
: basic_atomic_cell_view<mutable_view::no>(view) { }
friend class atomic_cell;
class atomic_cell_view final : public atomic_cell_base<bytes_view> {
atomic_cell_view(bytes_view data) : atomic_cell_base(std::move(data)) {}
public:
static atomic_cell_view from_bytes(const data::type_info& ti, const imr::utils::object<data::cell::structure>& data) {
return atomic_cell_view(ti, data.get());
}
static atomic_cell_view from_bytes(const data::type_info& ti, bytes_view bv) {
return atomic_cell_view(ti, reinterpret_cast<const uint8_t*>(bv.begin()));
}
static atomic_cell_view from_bytes(bytes_view data) { return atomic_cell_view(data); }
friend class atomic_cell;
friend std::ostream& operator<<(std::ostream& os, const atomic_cell_view& acv);
};
class atomic_cell_mutable_view final : public basic_atomic_cell_view<mutable_view::yes> {
atomic_cell_mutable_view(const data::type_info& ti, uint8_t* data)
: basic_atomic_cell_view<mutable_view::yes>(ti, data) {}
class atomic_cell_ref final : public atomic_cell_base<managed_bytes&> {
public:
static atomic_cell_mutable_view from_bytes(const data::type_info& ti, imr::utils::object<data::cell::structure>& data) {
return atomic_cell_mutable_view(ti, data.get());
}
friend class atomic_cell;
atomic_cell_ref(managed_bytes& buf) : atomic_cell_base(buf) {}
};
using atomic_cell_ref = atomic_cell_mutable_view;
class atomic_cell final : public basic_atomic_cell_view<mutable_view::yes> {
using imr_object_type = imr::utils::object<data::cell::structure>;
imr_object_type _data;
atomic_cell(const data::type_info& ti, imr::utils::object<data::cell::structure>&& data)
: basic_atomic_cell_view<mutable_view::yes>(ti, data.get()), _data(std::move(data)) {}
class atomic_cell final : public atomic_cell_base<managed_bytes> {
atomic_cell(managed_bytes b) : atomic_cell_base(std::move(b)) {}
public:
class collection_member_tag;
using collection_member = bool_class<collection_member_tag>;
atomic_cell(const atomic_cell&) = default;
atomic_cell(atomic_cell&&) = default;
atomic_cell& operator=(const atomic_cell&) = delete;
atomic_cell& operator=(const atomic_cell&) = default;
atomic_cell& operator=(atomic_cell&&) = default;
void swap(atomic_cell& other) noexcept {
basic_atomic_cell_view<mutable_view::yes>::swap(other);
_data.swap(other._data);
static atomic_cell from_bytes(managed_bytes b) {
return atomic_cell(std::move(b));
}
operator atomic_cell_view() const { return atomic_cell_view(_view); }
atomic_cell(const abstract_type& t, atomic_cell_view other);
static atomic_cell make_dead(api::timestamp_type timestamp, gc_clock::time_point deletion_time);
static atomic_cell make_live(const abstract_type& type, api::timestamp_type timestamp, bytes_view value,
collection_member = collection_member::no);
static atomic_cell make_live(const abstract_type& type, api::timestamp_type timestamp, ser::buffer_view<bytes_ostream::fragment_iterator> value,
collection_member = collection_member::no);
static atomic_cell make_live(const abstract_type& type, api::timestamp_type timestamp, const bytes& value,
collection_member cm = collection_member::no) {
return make_live(type, timestamp, bytes_view(value), cm);
atomic_cell(atomic_cell_view other) : atomic_cell_base(managed_bytes{other._data}) {}
operator atomic_cell_view() const {
return atomic_cell_view(_data);
}
static atomic_cell make_live_counter_update(api::timestamp_type timestamp, int64_t value);
static atomic_cell make_live(const abstract_type&, api::timestamp_type timestamp, bytes_view value,
gc_clock::time_point expiry, gc_clock::duration ttl, collection_member = collection_member::no);
static atomic_cell make_live(const abstract_type&, api::timestamp_type timestamp, ser::buffer_view<bytes_ostream::fragment_iterator> value,
gc_clock::time_point expiry, gc_clock::duration ttl, collection_member = collection_member::no);
static atomic_cell make_live(const abstract_type& type, api::timestamp_type timestamp, const bytes& value,
gc_clock::time_point expiry, gc_clock::duration ttl, collection_member cm = collection_member::no)
static atomic_cell make_dead(api::timestamp_type timestamp, gc_clock::time_point deletion_time) {
return atomic_cell_type::make_dead(timestamp, deletion_time);
}
static atomic_cell make_live(api::timestamp_type timestamp, bytes_view value) {
return atomic_cell_type::make_live(timestamp, value);
}
static atomic_cell make_live(api::timestamp_type timestamp, const bytes& value) {
return make_live(timestamp, bytes_view(value));
}
static atomic_cell make_live(api::timestamp_type timestamp, bytes_view value,
gc_clock::time_point expiry, gc_clock::duration ttl)
{
return make_live(type, timestamp, bytes_view(value), expiry, ttl, cm);
return atomic_cell_type::make_live(timestamp, value, expiry, ttl);
}
static atomic_cell make_live(const abstract_type& type, api::timestamp_type timestamp, bytes_view value, ttl_opt ttl, collection_member cm = collection_member::no) {
static atomic_cell make_live(api::timestamp_type timestamp, const bytes& value,
gc_clock::time_point expiry, gc_clock::duration ttl)
{
return make_live(timestamp, bytes_view(value), expiry, ttl);
}
static atomic_cell make_live(api::timestamp_type timestamp, bytes_view value, ttl_opt ttl) {
if (!ttl) {
return make_live(type, timestamp, value, cm);
return atomic_cell_type::make_live(timestamp, value);
} else {
return make_live(type, timestamp, value, gc_clock::now() + *ttl, *ttl, cm);
return atomic_cell_type::make_live(timestamp, value, gc_clock::now() + *ttl, *ttl);
}
}
static atomic_cell make_live_uninitialized(const abstract_type& type, api::timestamp_type timestamp, size_t size);
friend class atomic_cell_or_collection;
friend std::ostream& operator<<(std::ostream& os, const atomic_cell& ac);
};
@@ -225,24 +269,38 @@ class collection_mutation_view;
// list: tbd, probably ugly
class collection_mutation {
public:
using imr_object_type = imr::utils::object<data::cell::structure>;
imr_object_type _data;
managed_bytes data;
collection_mutation() {}
collection_mutation(const collection_type_impl&, collection_mutation_view v);
collection_mutation(const collection_type_impl&, bytes_view bv);
collection_mutation(managed_bytes b) : data(std::move(b)) {}
collection_mutation(collection_mutation_view v);
operator collection_mutation_view() const;
};
class collection_mutation_view {
public:
atomic_cell_value_view data;
bytes_view data;
bytes_view serialize() const { return data; }
static collection_mutation_view from_bytes(bytes_view v) { return { v }; }
};
inline
collection_mutation::collection_mutation(collection_mutation_view v)
: data(v.data) {
}
inline
collection_mutation::operator collection_mutation_view() const {
return { data };
}
namespace db {
template<typename T>
class serializer;
}
class column_definition;
int compare_atomic_cell_for_merge(atomic_cell_view left, atomic_cell_view right);
void merge_column(const abstract_type& def,
void merge_column(const column_definition& def,
atomic_cell_or_collection& old,
const atomic_cell_or_collection& neww);
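
The `make_live` overload above that takes a `ttl_opt` picks between a plain live cell and an expiring one, computing the expiry as `gc_clock::now() + *ttl`. A minimal sketch of that branch follows. It assumes `std::chrono::steady_clock` as a stand-in for `gc_clock` and `std::optional` for `ttl_opt`; `cell_expiry` and `resolve_ttl` are hypothetical names and are not part of the diff.

```cpp
#include <cassert>
#include <chrono>
#include <optional>

// Hypothetical stand-ins: the diff uses gc_clock and ttl_opt.
using clock_type = std::chrono::steady_clock;
using ttl_opt = std::optional<clock_type::duration>;

// Hypothetical result type describing which kind of live cell would be built.
struct cell_expiry {
    bool has_ttl;
    clock_type::time_point expiry;
    clock_type::duration ttl;
};

// Mirrors the branch in make_live(..., ttl_opt ttl):
// no TTL gives a plain live cell, otherwise expiry = now + ttl.
// `now` is passed in so the function is deterministic.
cell_expiry resolve_ttl(ttl_opt ttl, clock_type::time_point now) {
    if (!ttl) {
        return {false, clock_type::time_point{}, clock_type::duration::zero()};
    }
    return {true, now + *ttl, *ttl};
}
```

Passing the current time as a parameter, rather than reading the clock inside the function, is a choice made for this sketch only; the code in the diff calls `gc_clock::now()` directly.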


@@ -25,39 +25,28 @@
#include "types.hh"
#include "atomic_cell.hh"
#include "atomic_cell_or_collection.hh"
#include "hashing.hh"
#include "counters.hh"
template<>
struct appending_hash<collection_mutation_view> {
template<typename Hasher>
void operator()(Hasher& h, collection_mutation_view cell, const column_definition& cdef) const {
cell.data.with_linearized([&] (bytes_view cell_bv) {
auto ctype = static_pointer_cast<const collection_type_impl>(cdef.type);
auto m_view = ctype->deserialize_mutation_form(cell_bv);
void operator()(Hasher& h, collection_mutation_view cell) const {
auto m_view = collection_type_impl::deserialize_mutation_form(cell);
::feed_hash(h, m_view.tomb);
for (auto&& key_and_value : m_view.cells) {
::feed_hash(h, key_and_value.first);
::feed_hash(h, key_and_value.second, cdef);
::feed_hash(h, key_and_value.second);
}
});
}
};
template<>
struct appending_hash<atomic_cell_view> {
template<typename Hasher>
void operator()(Hasher& h, atomic_cell_view cell, const column_definition& cdef) const {
void operator()(Hasher& h, atomic_cell_view cell) const {
feed_hash(h, cell.is_live());
feed_hash(h, cell.timestamp());
if (cell.is_live()) {
if (cdef.is_counter()) {
counter_cell_view::with_linearized(cell, [&] (counter_cell_view ccv) {
::feed_hash(h, ccv);
});
return;
}
if (cell.is_live_and_has_ttl()) {
feed_hash(h, cell.expiry());
feed_hash(h, cell.ttl());
@@ -72,27 +61,15 @@ struct appending_hash<atomic_cell_view> {
template<>
struct appending_hash<atomic_cell> {
template<typename Hasher>
void operator()(Hasher& h, const atomic_cell& cell, const column_definition& cdef) const {
feed_hash(h, static_cast<atomic_cell_view>(cell), cdef);
void operator()(Hasher& h, const atomic_cell& cell) const {
feed_hash(h, static_cast<atomic_cell_view>(cell));
}
};
template<>
struct appending_hash<collection_mutation> {
template<typename Hasher>
void operator()(Hasher& h, const collection_mutation& cm, const column_definition& cdef) const {
feed_hash(h, static_cast<collection_mutation_view>(cm), cdef);
}
};
template<>
struct appending_hash<atomic_cell_or_collection> {
template<typename Hasher>
void operator()(Hasher& h, const atomic_cell_or_collection& c, const column_definition& cdef) const {
if (cdef.is_atomic()) {
feed_hash(h, c.as_atomic_cell(cdef), cdef);
} else {
feed_hash(h, c.as_collection_mutation(), cdef);
}
void operator()(Hasher& h, const collection_mutation& cm) const {
feed_hash(h, static_cast<collection_mutation_view>(cm));
}
};


@@ -25,56 +25,46 @@
#include "schema.hh"
#include "hashing.hh"
#include "imr/utils.hh"
// A variant type that can hold either an atomic_cell, or a serialized collection.
// Which type is stored is determined by the schema.
// Has an "empty" state.
// Objects moved-from are left in an empty state.
class atomic_cell_or_collection final {
// FIXME: This has made us lose the small-buffer optimisation. Unfortunately,
// due to the changed cell format it would be less effective now anyway.
// Measure the actual impact first: any attempt to fix this will become
// irrelevant once rows are converted to the IMR as well, so we may be
// able to live with it as is.
using imr_object_type = imr::utils::object<data::cell::structure>;
imr_object_type _data;
managed_bytes _data;
private:
atomic_cell_or_collection(imr::utils::object<data::cell::structure>&& data) : _data(std::move(data)) {}
atomic_cell_or_collection(managed_bytes&& data) : _data(std::move(data)) {}
public:
atomic_cell_or_collection() = default;
atomic_cell_or_collection(atomic_cell_or_collection&&) = default;
atomic_cell_or_collection(const atomic_cell_or_collection&) = delete;
atomic_cell_or_collection& operator=(atomic_cell_or_collection&&) = default;
atomic_cell_or_collection& operator=(const atomic_cell_or_collection&) = delete;
atomic_cell_or_collection(atomic_cell ac) : _data(std::move(ac._data)) {}
atomic_cell_or_collection(const abstract_type& at, atomic_cell_view acv);
static atomic_cell_or_collection from_atomic_cell(atomic_cell data) { return { std::move(data._data) }; }
atomic_cell_view as_atomic_cell(const column_definition& cdef) const { return atomic_cell_view::from_bytes(cdef.type->imr_state().type_info(), _data); }
atomic_cell_ref as_atomic_cell_ref(const column_definition& cdef) { return atomic_cell_mutable_view::from_bytes(cdef.type->imr_state().type_info(), _data); }
atomic_cell_mutable_view as_mutable_atomic_cell(const column_definition& cdef) { return atomic_cell_mutable_view::from_bytes(cdef.type->imr_state().type_info(), _data); }
atomic_cell_or_collection(collection_mutation cm) : _data(std::move(cm._data)) { }
atomic_cell_or_collection copy(const abstract_type&) const;
atomic_cell_view as_atomic_cell() const { return atomic_cell_view::from_bytes(_data); }
atomic_cell_ref as_atomic_cell_ref() { return { _data }; }
atomic_cell_or_collection(collection_mutation cm) : _data(std::move(cm.data)) {}
explicit operator bool() const {
return bool(_data);
return !_data.empty();
}
static constexpr bool can_use_mutable_view() {
return true;
static atomic_cell_or_collection from_collection_mutation(collection_mutation data) {
return std::move(data.data);
}
void swap(atomic_cell_or_collection& other) noexcept {
_data.swap(other._data);
collection_mutation_view as_collection_mutation() const {
return collection_mutation_view{_data};
}
bytes_view serialize() const {
return _data;
}
bool operator==(const atomic_cell_or_collection& other) const {
return _data == other._data;
}
template<typename Hasher>
void feed_hash(Hasher& h, const column_definition& def) const {
if (def.is_atomic()) {
::feed_hash(h, as_atomic_cell());
} else {
::feed_hash(as_collection_mutation(), h, def.type);
}
}
size_t memory_usage() const {
return _data.memory_usage();
}
static atomic_cell_or_collection from_collection_mutation(collection_mutation data) { return std::move(data._data); }
collection_mutation_view as_collection_mutation() const;
bytes_view serialize() const;
bool equals(const abstract_type& type, const atomic_cell_or_collection& other) const;
size_t external_memory_usage(const abstract_type&) const;
friend std::ostream& operator<<(std::ostream&, const atomic_cell_or_collection&);
};
namespace std {
inline void swap(atomic_cell_or_collection& a, atomic_cell_or_collection& b) noexcept
{
a.swap(b);
}
}
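
The class comment above says the type has an "empty" state, that moved-from objects are left empty, and the declarations delete the copy operations. A minimal sketch of that contract follows. It assumes a `std::string` buffer in place of `managed_bytes` or the IMR object; `cell_holder` is a hypothetical name and is not part of the diff.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical move-only holder with an explicit empty state.
class cell_holder {
    std::string _data;  // stand-in for the real storage type
public:
    cell_holder() = default;
    explicit cell_holder(std::string d) : _data(std::move(d)) {}

    // Moving resets the source to the empty state explicitly,
    // instead of relying on the moved-from state of std::string.
    cell_holder(cell_holder&& o) noexcept
        : _data(std::exchange(o._data, {})) {}
    cell_holder& operator=(cell_holder&& o) noexcept {
        _data = std::exchange(o._data, {});
        return *this;
    }

    // Copies are disabled, as in the diff; a copy would need an explicit method.
    cell_holder(const cell_holder&) = delete;
    cell_holder& operator=(const cell_holder&) = delete;

    // True when the holder contains data, false in the empty state.
    explicit operator bool() const { return !_data.empty(); }
};
```

The diff defaults the move operations and relies on the storage type to leave the source empty; the explicit `std::exchange` here only makes that guarantee visible for a type (`std::string`) that does not promise it.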


@@ -1,41 +0,0 @@
/*
* Copyright (C) 2017 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "auth/allow_all_authenticator.hh"
#include "service/migration_manager.hh"
#include "utils/class_registrator.hh"
namespace auth {
const sstring& allow_all_authenticator_name() {
static const sstring name = meta::AUTH_PACKAGE_NAME + "AllowAllAuthenticator";
return name;
}
// To ensure correct initialization order, we unfortunately need to use a string literal.
static const class_registrator<
authenticator,
allow_all_authenticator,
cql3::query_processor&,
::service::migration_manager&> registration("org.apache.cassandra.auth.AllowAllAuthenticator");
}


@@ -1,101 +0,0 @@
/*
* Copyright (C) 2017 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <stdexcept>
#include "auth/authenticated_user.hh"
#include "auth/authenticator.hh"
#include "auth/common.hh"
namespace cql3 {
class query_processor;
}
namespace service {
class migration_manager;
}
namespace auth {
const sstring& allow_all_authenticator_name();
class allow_all_authenticator final : public authenticator {
public:
allow_all_authenticator(cql3::query_processor&, ::service::migration_manager&) {
}
virtual future<> start() override {
return make_ready_future<>();
}
virtual future<> stop() override {
return make_ready_future<>();
}
virtual const sstring& qualified_java_name() const override {
return allow_all_authenticator_name();
}
virtual bool require_authentication() const override {
return false;
}
virtual authentication_option_set supported_options() const override {
return authentication_option_set();
}
virtual authentication_option_set alterable_options() const override {
return authentication_option_set();
}
future<authenticated_user> authenticate(const credentials_map& credentials) const override {
return make_ready_future<authenticated_user>(anonymous_user());
}
virtual future<> create(stdx::string_view, const authentication_options& options) const override {
return make_ready_future();
}
virtual future<> alter(stdx::string_view, const authentication_options& options) const override {
return make_ready_future();
}
virtual future<> drop(stdx::string_view) const override {
return make_ready_future();
}
virtual future<custom_options> query_custom_options(stdx::string_view role_name) const override {
return make_ready_future<custom_options>();
}
virtual const resource_set& protected_resources() const override {
static const resource_set resources;
return resources;
}
virtual ::shared_ptr<sasl_challenge> new_sasl_challenge() const override {
throw std::runtime_error("Should not reach");
}
};
}


@@ -1,41 +0,0 @@
/*
* Copyright (C) 2017 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "auth/allow_all_authorizer.hh"
#include "auth/common.hh"
#include "utils/class_registrator.hh"
namespace auth {
const sstring& allow_all_authorizer_name() {
static const sstring name = meta::AUTH_PACKAGE_NAME + "AllowAllAuthorizer";
return name;
}
// To ensure correct initialization order, we unfortunately need to use a string literal.
static const class_registrator<
authorizer,
allow_all_authorizer,
cql3::query_processor&,
::service::migration_manager&> registration("org.apache.cassandra.auth.AllowAllAuthorizer");
}


@@ -1,93 +0,0 @@
/*
* Copyright (C) 2017 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "auth/authorizer.hh"
#include "exceptions/exceptions.hh"
#include "stdx.hh"
namespace cql3 {
class query_processor;
}
namespace service {
class migration_manager;
}
namespace auth {
const sstring& allow_all_authorizer_name();
class allow_all_authorizer final : public authorizer {
public:
allow_all_authorizer(cql3::query_processor&, ::service::migration_manager&) {
}
virtual future<> start() override {
return make_ready_future<>();
}
virtual future<> stop() override {
return make_ready_future<>();
}
virtual const sstring& qualified_java_name() const override {
return allow_all_authorizer_name();
}
virtual future<permission_set> authorize(const role_or_anonymous&, const resource&) const override {
return make_ready_future<permission_set>(permissions::ALL);
}
virtual future<> grant(stdx::string_view, permission_set, const resource&) const override {
return make_exception_future<>(
unsupported_authorization_operation("GRANT operation is not supported by AllowAllAuthorizer"));
}
virtual future<> revoke(stdx::string_view, permission_set, const resource&) const override {
return make_exception_future<>(
unsupported_authorization_operation("REVOKE operation is not supported by AllowAllAuthorizer"));
}
virtual future<std::vector<permission_details>> list_all() const override {
return make_exception_future<std::vector<permission_details>>(
unsupported_authorization_operation(
"LIST PERMISSIONS operation is not supported by AllowAllAuthorizer"));
}
virtual future<> revoke_all(stdx::string_view) const override {
return make_exception_future(
unsupported_authorization_operation("REVOKE operation is not supported by AllowAllAuthorizer"));
}
virtual future<> revoke_all(const resource&) const override {
return make_exception_future(
unsupported_authorization_operation("REVOKE operation is not supported by AllowAllAuthorizer"));
}
virtual const resource_set& protected_resources() const override {
static const resource_set resources;
return resources;
}
};
}

auth/auth.cc Normal file

@@ -0,0 +1,383 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright (C) 2016 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include <seastar/core/sleep.hh>
#include <seastar/core/distributed.hh>
#include "auth.hh"
#include "authenticator.hh"
#include "authorizer.hh"
#include "database.hh"
#include "cql3/query_processor.hh"
#include "cql3/statements/raw/cf_statement.hh"
#include "cql3/statements/create_table_statement.hh"
#include "db/config.hh"
#include "service/migration_manager.hh"
#include "utils/loading_cache.hh"
#include "utils/hash.hh"
const sstring auth::auth::DEFAULT_SUPERUSER_NAME("cassandra");
const sstring auth::auth::AUTH_KS("system_auth");
const sstring auth::auth::USERS_CF("users");
static const sstring USER_NAME("name");
static const sstring SUPER("super");
static logging::logger logger("auth");
// TODO: configurable
using namespace std::chrono_literals;
const std::chrono::milliseconds auth::auth::SUPERUSER_SETUP_DELAY = 10000ms;
class auth_migration_listener : public service::migration_listener {
void on_create_keyspace(const sstring& ks_name) override {}
void on_create_column_family(const sstring& ks_name, const sstring& cf_name) override {}
void on_create_user_type(const sstring& ks_name, const sstring& type_name) override {}
void on_create_function(const sstring& ks_name, const sstring& function_name) override {}
void on_create_aggregate(const sstring& ks_name, const sstring& aggregate_name) override {}
void on_update_keyspace(const sstring& ks_name) override {}
void on_update_column_family(const sstring& ks_name, const sstring& cf_name, bool) override {}
void on_update_user_type(const sstring& ks_name, const sstring& type_name) override {}
void on_update_function(const sstring& ks_name, const sstring& function_name) override {}
void on_update_aggregate(const sstring& ks_name, const sstring& aggregate_name) override {}
void on_drop_keyspace(const sstring& ks_name) override {
auth::authorizer::get().revoke_all(auth::data_resource(ks_name));
}
void on_drop_column_family(const sstring& ks_name, const sstring& cf_name) override {
auth::authorizer::get().revoke_all(auth::data_resource(ks_name, cf_name));
}
void on_drop_user_type(const sstring& ks_name, const sstring& type_name) override {}
void on_drop_function(const sstring& ks_name, const sstring& function_name) override {}
void on_drop_aggregate(const sstring& ks_name, const sstring& aggregate_name) override {}
};
static auth_migration_listener auth_migration;
namespace std {
template <>
struct hash<auth::data_resource> {
size_t operator()(const auth::data_resource & v) const {
return v.hash_value();
}
};
template <>
struct hash<auth::authenticated_user> {
size_t operator()(const auth::authenticated_user & v) const {
return utils::tuple_hash()(v.name(), v.is_anonymous());
}
};
}
class auth::auth::permissions_cache {
public:
typedef utils::loading_cache<std::pair<authenticated_user, data_resource>, permission_set, utils::tuple_hash> cache_type;
typedef typename cache_type::key_type key_type;
permissions_cache()
: permissions_cache(
cql3::get_local_query_processor().db().local().get_config()) {
}
permissions_cache(const db::config& cfg)
: _cache(cfg.permissions_cache_max_entries(), expiry(cfg),
std::chrono::milliseconds(
cfg.permissions_validity_in_ms()),
[](const key_type& k) {
logger.debug("Refreshing permissions for {}", k.first.name());
return authorizer::get().authorize(::make_shared<authenticated_user>(k.first), k.second);
}) {
}
static std::chrono::milliseconds expiry(const db::config& cfg) {
auto exp = cfg.permissions_update_interval_in_ms();
if (exp == 0 || exp == std::numeric_limits<uint32_t>::max()) {
exp = cfg.permissions_validity_in_ms();
}
return std::chrono::milliseconds(exp);
}
future<> stop() {
return make_ready_future<>();
}
future<permission_set> get(::shared_ptr<authenticated_user> user, data_resource resource) {
return _cache.get(key_type(*user, std::move(resource)));
}
private:
cache_type _cache;
};
static distributed<auth::auth::permissions_cache> perm_cache;
/**
* Poor man's job scheduler, for at most two jobs. Sic.
* It still does nothing more clever than waiting 10 seconds,
* like origin, and then running the submitted tasks.
*
* The only difference compared to sleep (from which this
* borrows _heavily_) is that if tasks have not run by the time
* we exit (and do static clean up) we delete the promise + cont.
*
* Should probably be abstracted into some sort of global server
* function.
*/
struct waiter {
promise<> done;
timer<> tmr;
waiter() : tmr([this] {done.set_value();})
{
tmr.arm(auth::auth::SUPERUSER_SETUP_DELAY);
}
~waiter() {
if (tmr.armed()) {
tmr.cancel();
done.set_exception(std::runtime_error("shutting down"));
}
logger.trace("Deleting scheduled task");
}
void kill() {
}
};
typedef std::unique_ptr<waiter> waiter_ptr;
static std::vector<waiter_ptr> & thread_waiters() {
static thread_local std::vector<waiter_ptr> the_waiters;
return the_waiters;
}
void auth::auth::schedule_when_up(scheduled_func f) {
logger.trace("Adding scheduled task");
auto & waiters = thread_waiters();
waiters.emplace_back(std::make_unique<waiter>());
auto* w = waiters.back().get();
w->done.get_future().finally([w] {
auto & waiters = thread_waiters();
auto i = std::find_if(waiters.begin(), waiters.end(), [w](const waiter_ptr& p) {
return p.get() == w;
});
if (i != waiters.end()) {
waiters.erase(i);
}
}).then([f = std::move(f)] {
logger.trace("Running scheduled task");
return f();
}).handle_exception([](auto ep) {
return make_ready_future();
});
}
bool auth::auth::is_class_type(const sstring& type, const sstring& classname) {
if (type == classname) {
return true;
}
auto i = classname.find_last_of('.');
return classname.compare(i + 1, sstring::npos, type) == 0;
}
future<> auth::auth::setup() {
auto& db = cql3::get_local_query_processor().db().local();
auto& cfg = db.get_config();
future<> f = perm_cache.start();
if (is_class_type(cfg.authenticator(),
authenticator::ALLOW_ALL_AUTHENTICATOR_NAME)
&& is_class_type(cfg.authorizer(),
authorizer::ALLOW_ALL_AUTHORIZER_NAME)
) {
// just create the objects
return f.then([&cfg] {
return authenticator::setup(cfg.authenticator());
}).then([&cfg] {
return authorizer::setup(cfg.authorizer());
});
}
if (!db.has_keyspace(AUTH_KS)) {
std::map<sstring, sstring> opts;
opts["replication_factor"] = "1";
auto ksm = keyspace_metadata::new_keyspace(AUTH_KS, "org.apache.cassandra.locator.SimpleStrategy", opts, true);
f = service::get_local_migration_manager().announce_new_keyspace(ksm, false);
}
return f.then([] {
return setup_table(USERS_CF, sprint("CREATE TABLE %s.%s (%s text, %s boolean, PRIMARY KEY(%s)) WITH gc_grace_seconds=%d",
AUTH_KS, USERS_CF, USER_NAME, SUPER, USER_NAME,
90 * 24 * 60 * 60)); // 3 months.
}).then([&cfg] {
return authenticator::setup(cfg.authenticator());
}).then([&cfg] {
return authorizer::setup(cfg.authorizer());
}).then([] {
service::get_local_migration_manager().register_listener(&auth_migration); // again, only one shard...
// instead of once-timer, just schedule this later
schedule_when_up([] {
// setup default super user
return has_existing_users(USERS_CF, DEFAULT_SUPERUSER_NAME, USER_NAME).then([](bool exists) {
if (!exists) {
auto query = sprint("INSERT INTO %s.%s (%s, %s) VALUES (?, ?) USING TIMESTAMP 0",
AUTH_KS, USERS_CF, USER_NAME, SUPER);
cql3::get_local_query_processor().process(query, db::consistency_level::ONE, {DEFAULT_SUPERUSER_NAME, true}).then([](auto) {
logger.info("Created default superuser '{}'", DEFAULT_SUPERUSER_NAME);
}).handle_exception([](auto ep) {
try {
std::rethrow_exception(ep);
} catch (exceptions::request_execution_exception&) {
logger.warn("Skipped default superuser setup: some nodes were not ready");
}
});
}
});
});
});
}
future<> auth::auth::shutdown() {
// just make sure we don't have pending tasks.
// this is mostly relevant for test cases where
// db-env-shutdown != process shutdown
return smp::invoke_on_all([] {
thread_waiters().clear();
}).then([] {
return perm_cache.stop();
});
}
future<auth::permission_set> auth::auth::get_permissions(::shared_ptr<authenticated_user> user, data_resource resource) {
return perm_cache.local().get(std::move(user), std::move(resource));
}
static db::consistency_level consistency_for_user(const sstring& username) {
if (username == auth::auth::DEFAULT_SUPERUSER_NAME) {
return db::consistency_level::QUORUM;
}
return db::consistency_level::LOCAL_ONE;
}
static future<::shared_ptr<cql3::untyped_result_set>> select_user(const sstring& username) {
// There used to be a thread-local, explicit cache of the prepared statement here. In normal
// execution this is fine, but since in testing we set up and tear down the system over and
// over, we would start using obsolete prepared statements pretty quickly.
// Rely on the query processor caching statements instead, and let's assume
// that a string->statement map lookup is not going to cost us much.
return cql3::get_local_query_processor().process(
sprint("SELECT * FROM %s.%s WHERE %s = ?",
auth::auth::AUTH_KS, auth::auth::USERS_CF,
USER_NAME), consistency_for_user(username),
{ username }, true);
}
future<bool> auth::auth::is_existing_user(const sstring& username) {
return select_user(username).then(
[](::shared_ptr<cql3::untyped_result_set> res) {
return make_ready_future<bool>(!res->empty());
});
}
future<bool> auth::auth::is_super_user(const sstring& username) {
return select_user(username).then(
[](::shared_ptr<cql3::untyped_result_set> res) {
return make_ready_future<bool>(!res->empty() && res->one().get_as<bool>(SUPER));
});
}
future<> auth::auth::insert_user(const sstring& username, bool is_super)
throw (exceptions::request_execution_exception) {
return cql3::get_local_query_processor().process(sprint("INSERT INTO %s.%s (%s, %s) VALUES (?, ?)",
AUTH_KS, USERS_CF, USER_NAME, SUPER),
consistency_for_user(username), { username, is_super }).discard_result();
}
future<> auth::auth::delete_user(const sstring& username) throw(exceptions::request_execution_exception) {
return cql3::get_local_query_processor().process(sprint("DELETE FROM %s.%s WHERE %s = ?",
AUTH_KS, USERS_CF, USER_NAME),
consistency_for_user(username), { username }).discard_result();
}
future<> auth::auth::setup_table(const sstring& name, const sstring& cql) {
auto& qp = cql3::get_local_query_processor();
auto& db = qp.db().local();
if (db.has_schema(AUTH_KS, name)) {
return make_ready_future();
}
::shared_ptr<cql3::statements::raw::cf_statement> parsed = static_pointer_cast<
cql3::statements::raw::cf_statement>(cql3::query_processor::parse_statement(cql));
parsed->prepare_keyspace(AUTH_KS);
::shared_ptr<cql3::statements::create_table_statement> statement =
static_pointer_cast<cql3::statements::create_table_statement>(
parsed->prepare(db)->statement);
auto schema = statement->get_cf_meta_data();
auto uuid = generate_legacy_id(schema->ks_name(), schema->cf_name());
schema_builder b(schema);
b.set_uuid(uuid);
return service::get_local_migration_manager().announce_new_column_family(b.build(), false);
}
future<bool> auth::auth::has_existing_users(const sstring& cfname, const sstring& def_user_name, const sstring& name_column) {
auto default_user_query = sprint("SELECT * FROM %s.%s WHERE %s = ?", AUTH_KS, cfname, name_column);
auto all_users_query = sprint("SELECT * FROM %s.%s LIMIT 1", AUTH_KS, cfname);
return cql3::get_local_query_processor().process(default_user_query, db::consistency_level::ONE, { def_user_name }).then([=](::shared_ptr<cql3::untyped_result_set> res) {
if (!res->empty()) {
return make_ready_future<bool>(true);
}
return cql3::get_local_query_processor().process(default_user_query, db::consistency_level::QUORUM, { def_user_name }).then([all_users_query](::shared_ptr<cql3::untyped_result_set> res) {
if (!res->empty()) {
return make_ready_future<bool>(true);
}
return cql3::get_local_query_processor().process(all_users_query, db::consistency_level::QUORUM).then([](::shared_ptr<cql3::untyped_result_set> res) {
return make_ready_future<bool>(!res->empty());
});
});
});
}
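
`auth::auth::is_class_type` above accepts either the fully qualified class name or just its last dot-separated component. A minimal sketch of the same check follows, using `std::string` in place of `sstring`; the free function here is a hypothetical stand-in for the static member in the diff.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for auth::auth::is_class_type, on std::string.
bool is_class_type(const std::string& type, const std::string& classname) {
    if (type == classname) {
        return true;  // exact match on the qualified name
    }
    // find_last_of returns npos when classname has no '.'; npos + 1 wraps
    // to 0, so in that case the whole classname is compared against type.
    auto i = classname.find_last_of('.');
    return classname.compare(i + 1, std::string::npos, type) == 0;
}
```

With this check, a configuration value of `AllowAllAuthenticator` and one of `org.apache.cassandra.auth.AllowAllAuthenticator` both select the same class, while a partial suffix such as `Authenticator` does not match.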

auth/auth.hh Normal file

@@ -0,0 +1,124 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright (C) 2016 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <chrono>
#include <seastar/core/sstring.hh>
#include <seastar/core/future.hh>
#include <seastar/core/shared_ptr.hh>
#include "exceptions/exceptions.hh"
#include "permission.hh"
#include "data_resource.hh"
namespace auth {
class authenticated_user;
class auth {
public:
class permissions_cache;
static const sstring DEFAULT_SUPERUSER_NAME;
static const sstring AUTH_KS;
static const sstring USERS_CF;
static const std::chrono::milliseconds SUPERUSER_SETUP_DELAY;
static bool is_class_type(const sstring& type, const sstring& classname);
static future<permission_set> get_permissions(::shared_ptr<authenticated_user>, data_resource);
/**
* Checks if the username is stored in AUTH_KS.USERS_CF.
*
* @param username Username to query.
* @return whether or not Cassandra knows about the user.
*/
static future<bool> is_existing_user(const sstring& username);
/**
* Checks if the user is a known superuser.
*
* @param username Username to query.
* @return true if the user is a superuser, false if they aren't or don't exist at all.
*/
static future<bool> is_super_user(const sstring& username);
/**
* Inserts the user into AUTH_KS.USERS_CF (or overwrites their superuser status as a result of an ALTER USER query).
*
* @param username Username to insert.
* @param is_super User's new status.
* @throws RequestExecutionException
*/
static future<> insert_user(const sstring& username, bool is_super) throw(exceptions::request_execution_exception);
/**
* Deletes the user from AUTH_KS.USERS_CF.
*
* @param username Username to delete.
* @throws RequestExecutionException
*/
static future<> delete_user(const sstring& username) throw(exceptions::request_execution_exception);
/**
* Sets up Authenticator and Authorizer.
*/
static future<> setup();
static future<> shutdown();
/**
* Sets up a table from the given CREATE TABLE statement under the system_auth keyspace, if it does not already exist.
*
* @param name name of the table
* @param cql CREATE TABLE statement
*/
static future<> setup_table(const sstring& name, const sstring& cql);
static future<bool> has_existing_users(const sstring& cfname, const sstring& def_user_name, const sstring& name_column_name);
// For internal use. Run function "when system is up".
typedef std::function<future<>()> scheduled_func;
static void schedule_when_up(scheduled_func);
};
}

@@ -39,30 +39,34 @@
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "auth/authenticated_user.hh"
#include <iostream>
#include "authenticated_user.hh"
#include "auth.hh"
namespace auth {
const sstring auth::authenticated_user::ANONYMOUS_USERNAME("anonymous");
authenticated_user::authenticated_user(stdx::string_view name)
: name(sstring(name)) {
auth::authenticated_user::authenticated_user()
: _anon(true)
{}
auth::authenticated_user::authenticated_user(sstring name)
: _name(name), _anon(false)
{}
auth::authenticated_user::authenticated_user(authenticated_user&&) = default;
auth::authenticated_user::authenticated_user(const authenticated_user&) = default;
const sstring& auth::authenticated_user::name() const {
return _anon ? ANONYMOUS_USERNAME : _name;
}
std::ostream& operator<<(std::ostream& os, const authenticated_user& u) {
if (!u.name) {
os << "anonymous";
} else {
os << *u.name;
future<bool> auth::authenticated_user::is_super() const {
if (is_anonymous()) {
return make_ready_future<bool>(false);
}
return os;
}
static const authenticated_user the_anonymous_user{};
const authenticated_user& anonymous_user() noexcept {
return the_anonymous_user;
return auth::auth::is_super_user(_name);
}
bool auth::authenticated_user::operator==(const authenticated_user& v) const {
return _anon ? v._anon : _name == v._name;
}
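
Note on the equality semantics above: the following is a minimal, self-contained sketch of how the `_anon` flag drives `name()` and `operator==`. It is not the Scylla class itself; `user_sketch` is a hypothetical stand-in and `std::string` replaces `sstring`. Only the `operator==` expression is copied from the diff.

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for auth::authenticated_user (assumption: std::string
// is used in place of sstring so the sketch builds without Seastar).
class user_sketch {
public:
    // Anonymous user: no name is stored, only the flag is set.
    user_sketch() : _anon(true) {}
    // Named user.
    explicit user_sketch(std::string name) : _name(std::move(name)), _anon(false) {}

    // Returns "anonymous" for an anonymous user, the stored name otherwise.
    const std::string& name() const {
        static const std::string anonymous_username("anonymous");
        return _anon ? anonymous_username : _name;
    }

    bool is_anonymous() const { return _anon; }

    // Same expression as the operator== shown in the diff.
    bool operator==(const user_sketch& v) const {
        return _anon ? v._anon : _name == v._name;
    }

private:
    std::string _name;
    bool _anon;
};
```

In this sketch two anonymous users compare equal and named users compare by name. One edge case may be worth a reviewer's attention: a named user constructed with an empty name compares equal to an anonymous user when it is the left-hand operand, but not when it is the right-hand operand, because the anonymous user's `_name` is default-constructed.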

@@ -41,63 +41,42 @@
#pragma once
#include <experimental/string_view>
#include <functional>
#include <iosfwd>
#include <optional>
#include <seastar/core/sstring.hh>
#include "seastarx.hh"
#include "stdx.hh"
#include <seastar/core/future.hh>
namespace auth {
///
/// A type-safe wrapper for the name of a logged-in user, or a nameless (anonymous) user.
///
class authenticated_user final {
class authenticated_user {
public:
///
/// An anonymous user has no name.
///
std::optional<sstring> name{};
static const sstring ANONYMOUS_USERNAME;
///
/// An anonymous user.
///
authenticated_user() = default;
explicit authenticated_user(stdx::string_view name);
};
authenticated_user();
authenticated_user(sstring name);
authenticated_user(authenticated_user&&);
authenticated_user(const authenticated_user&);
///
/// The user name, or "anonymous".
///
std::ostream& operator<<(std::ostream&, const authenticated_user&);
const sstring& name() const;
inline bool operator==(const authenticated_user& u1, const authenticated_user& u2) noexcept {
return u1.name == u2.name;
}
/**
* Checks the user's superuser status.
* Only a superuser is allowed to perform CREATE USER and DROP USER queries.
* In most cases, though not necessarily, a superuser will have Permission.ALL on every resource
* (depends on IAuthorizer implementation).
*/
future<bool> is_super() const;
inline bool operator!=(const authenticated_user& u1, const authenticated_user& u2) noexcept {
return !(u1 == u2);
}
const authenticated_user& anonymous_user() noexcept;
inline bool is_anonymous(const authenticated_user& u) noexcept {
return u == anonymous_user();
}
}
namespace std {
template <>
struct hash<auth::authenticated_user> final {
size_t operator()(const auth::authenticated_user &u) const {
return std::hash<std::optional<sstring>>()(u.name);
/**
* If IAuthenticator doesn't require authentication, this method may return true.
*/
bool is_anonymous() const {
return _anon;
}
bool operator==(const authenticated_user&) const;
private:
sstring _name;
bool _anon;
};
}

@@ -1,64 +0,0 @@
/*
* Copyright (C) 2018 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <iosfwd>
#include <optional>
#include <stdexcept>
#include <unordered_map>
#include <unordered_set>
#include <seastar/core/print.hh>
#include <seastar/core/sstring.hh>
#include "seastarx.hh"
namespace auth {
enum class authentication_option {
password,
options
};
std::ostream& operator<<(std::ostream&, authentication_option);
using authentication_option_set = std::unordered_set<authentication_option>;
using custom_options = std::unordered_map<sstring, sstring>;
struct authentication_options final {
std::optional<sstring> password;
std::optional<custom_options> options;
};
inline bool any_authentication_options(const authentication_options& aos) noexcept {
return aos.password || aos.options;
}
class unsupported_authentication_option : public std::invalid_argument {
public:
explicit unsupported_authentication_option(authentication_option k)
: std::invalid_argument(sprint("The %s option is not supported.", k)) {
}
};
}

@@ -39,14 +39,89 @@
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "auth/authenticator.hh"
#include "auth/authenticated_user.hh"
#include "auth/common.hh"
#include "auth/password_authenticator.hh"
#include "cql3/query_processor.hh"
#include "authenticator.hh"
#include "authenticated_user.hh"
#include "password_authenticator.hh"
#include "auth.hh"
#include "db/config.hh"
#include "utils/class_registrator.hh"
const sstring auth::authenticator::USERNAME_KEY("username");
const sstring auth::authenticator::PASSWORD_KEY("password");
const sstring auth::authenticator::ALLOW_ALL_AUTHENTICATOR_NAME("org.apache.cassandra.auth.AllowAllAuthenticator");
auth::authenticator::option auth::authenticator::string_to_option(const sstring& name) {
if (strcasecmp(name.c_str(), "password") == 0) {
return option::PASSWORD;
}
throw std::invalid_argument(name);
}
sstring auth::authenticator::option_to_string(option opt) {
switch (opt) {
case option::PASSWORD:
return "PASSWORD";
default:
throw std::invalid_argument(sprint("Unknown option {}", opt));
}
}
/**
* Authenticator is assumed to be a fully state-less immutable object (note all the const).
* We thus store a single instance globally, since it should be safe/ok.
*/
static std::unique_ptr<auth::authenticator> global_authenticator;
future<>
auth::authenticator::setup(const sstring& type) throw (exceptions::configuration_exception) {
if (auth::auth::is_class_type(type, ALLOW_ALL_AUTHENTICATOR_NAME)) {
class allow_all_authenticator : public authenticator {
public:
const sstring& class_name() const override {
return ALLOW_ALL_AUTHENTICATOR_NAME;
}
bool require_authentication() const override {
return false;
}
option_set supported_options() const override {
return option_set();
}
option_set alterable_options() const override {
return option_set();
}
future<::shared_ptr<authenticated_user>> authenticate(const credentials_map& credentials) const throw(exceptions::authentication_exception) override {
return make_ready_future<::shared_ptr<authenticated_user>>(::make_shared<authenticated_user>());
}
future<> create(sstring username, const option_map& options) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) override {
return make_ready_future();
}
future<> alter(sstring username, const option_map& options) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) override {
return make_ready_future();
}
future<> drop(sstring username) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) override {
return make_ready_future();
}
const resource_ids& protected_resources() const override {
static const resource_ids ids;
return ids;
}
::shared_ptr<sasl_challenge> new_sasl_challenge() const override {
throw std::runtime_error("Should not reach");
}
};
global_authenticator = std::make_unique<allow_all_authenticator>();
} else if (auth::auth::is_class_type(type, password_authenticator::PASSWORD_AUTHENTICATOR_NAME)) {
auto pwa = std::make_unique<password_authenticator>();
auto f = pwa->init();
return f.then([pwa = std::move(pwa)]() mutable {
global_authenticator = std::move(pwa);
});
} else {
throw exceptions::configuration_exception("Invalid authenticator type: " + type);
}
return make_ready_future();
}
auth::authenticator& auth::authenticator::get() {
assert(global_authenticator);
return *global_authenticator;
}
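
Note on the option helpers above: the sketch below isolates the case-insensitive option parsing so it can be exercised without Seastar. It assumes `std::string` in place of `sstring` and free functions in place of the static members. Unlike the diff, the fallback in `option_to_string` throws a fixed message rather than formatting the enum value.

```cpp
#include <stdexcept>
#include <string>
#include <strings.h>  // strcasecmp (POSIX)

// Mirrors the single-valued option enum from the diff.
enum class option { PASSWORD };

// Case-insensitive lookup; any unknown name is reported via std::invalid_argument,
// carrying the offending name as the message.
option string_to_option(const std::string& name) {
    if (strcasecmp(name.c_str(), "password") == 0) {
        return option::PASSWORD;
    }
    throw std::invalid_argument(name);
}

// Canonical upper-case spelling of an option.
std::string option_to_string(option opt) {
    switch (opt) {
    case option::PASSWORD:
        return "PASSWORD";
    }
    // Assumption: a fixed message is used here instead of formatting `opt`.
    throw std::invalid_argument("unknown option");
}
```

With these definitions, `string_to_option("PassWord")` yields `option::PASSWORD`, and round-tripping through `option_to_string` gives the canonical spelling `"PASSWORD"`.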

@@ -41,24 +41,21 @@
#pragma once
#include <experimental/string_view>
#include <memory>
#include <unordered_map>
#include <set>
#include <stdexcept>
#include <unordered_map>
#include <boost/any.hpp>
#include <seastar/core/enum.hh>
#include <seastar/core/future.hh>
#include <seastar/core/sstring.hh>
#include <seastar/core/shared_ptr.hh>
#include "auth/authentication_options.hh"
#include "auth/resource.hh"
#include <seastar/core/sstring.hh>
#include <seastar/core/future.hh>
#include <seastar/core/shared_ptr.hh>
#include <seastar/core/enum.hh>
#include "bytes.hh"
#include "data_resource.hh"
#include "enum_set.hh"
#include "exceptions/exceptions.hh"
#include "stdx.hh"
namespace db {
class config;
@@ -68,104 +65,136 @@ namespace auth {
class authenticated_user;
///
/// Abstract client for authenticating role identity.
///
/// All state necessary to authorize a role is stored externally to the client instance.
///
class authenticator {
public:
///
/// The name of the key to be used for the user-name part of password authentication with \ref authenticate.
///
static const sstring USERNAME_KEY;
///
/// The name of the key to be used for the password part of password authentication with \ref authenticate.
///
static const sstring PASSWORD_KEY;
static const sstring ALLOW_ALL_AUTHENTICATOR_NAME;
using credentials_map = std::unordered_map<sstring, sstring>;
virtual ~authenticator() = default;
virtual future<> start() = 0;
virtual future<> stop() = 0;
///
/// A fully-qualified (class with package) Java-like name for this implementation.
///
virtual const sstring& qualified_java_name() const = 0;
virtual bool require_authentication() const = 0;
virtual authentication_option_set supported_options() const = 0;
///
/// A subset of `supported_options()` that users are permitted to alter for themselves.
///
virtual authentication_option_set alterable_options() const = 0;
///
/// Authenticate a user given implementation-specific credentials.
///
/// If this implementation does not require authentication (\ref require_authentication), an anonymous user may
/// result.
///
/// \returns an exceptional future with \ref exceptions::authentication_exception if given invalid credentials.
///
virtual future<authenticated_user> authenticate(const credentials_map& credentials) const = 0;
///
/// Create an authentication record for a new user. This is required before the user can log-in.
///
/// The options provided must be a subset of `supported_options()`.
///
virtual future<> create(stdx::string_view role_name, const authentication_options& options) const = 0;
///
/// Alter the authentication record of an existing user.
///
/// The options provided must be a subset of `supported_options()`.
///
/// Callers must ensure that the specification of `alterable_options()` is adhered to.
///
virtual future<> alter(stdx::string_view role_name, const authentication_options& options) const = 0;
///
/// Delete the authentication record for a user. This will disallow the user from logging in.
///
virtual future<> drop(stdx::string_view role_name) const = 0;
///
/// Query for custom options (those corresponding to \ref authentication_options::options).
///
/// If no options are set the result is an empty container.
///
virtual future<custom_options> query_custom_options(stdx::string_view role_name) const = 0;
///
/// System resources used internally as part of the implementation. These are made inaccessible to users.
///
virtual const resource_set& protected_resources() const = 0;
///
/// A stateful SASL challenge which supports many authentication schemes (depending on the implementation).
///
class sasl_challenge {
public:
virtual ~sasl_challenge() = default;
virtual bytes evaluate_response(bytes_view client_response) = 0;
virtual bool is_complete() const = 0;
virtual future<authenticated_user> get_authenticated_user() const = 0;
/**
* Supported CREATE USER/ALTER USER options.
* Currently only PASSWORD is available.
*/
enum class option {
PASSWORD
};
static option string_to_option(const sstring&);
static sstring option_to_string(option);
using option_set = enum_set<super_enum<option, option::PASSWORD>>;
using option_map = std::unordered_map<option, boost::any, enum_hash<option>>;
using credentials_map = std::unordered_map<sstring, sstring>;
/**
* Setup is called once upon system startup to initialize the IAuthenticator.
*
* For example, use this method to create any required keyspaces/column families.
* Note: Only call from main thread.
*/
static future<> setup(const sstring& type) throw(exceptions::configuration_exception);
/**
* Returns the system authenticator. Must have called setup before calling this.
*/
static authenticator& get();
virtual ~authenticator()
{}
virtual const sstring& class_name() const = 0;
/**
* Whether or not the authenticator requires explicit login.
* If false will instantiate user with AuthenticatedUser.ANONYMOUS_USER.
*/
virtual bool require_authentication() const = 0;
/**
* Set of options supported by CREATE USER and ALTER USER queries.
* Should never return null - always return an empty set instead.
*/
virtual option_set supported_options() const = 0;
/**
* Subset of supportedOptions that users are allowed to alter when performing ALTER USER [themselves].
* Should never return null - always return an empty set instead.
*/
virtual option_set alterable_options() const = 0;
/**
* Authenticates a user given a Map<String, String> of credentials.
* Should never return null - always throw AuthenticationException instead.
* Returning AuthenticatedUser.ANONYMOUS_USER is an option as well if authentication is not required.
*
* @throws authentication_exception if credentials don't match any known user.
*/
virtual future<::shared_ptr<authenticated_user>> authenticate(const credentials_map& credentials) const throw(exceptions::authentication_exception) = 0;
/**
* Called during execution of CREATE USER query (also may be called on startup, see seedSuperuserOptions method).
* If authenticator is static then the body of the method should be left blank, but don't throw an exception.
* options are guaranteed to be a subset of supportedOptions().
*
* @param username Username of the user to create.
* @param options Options the user will be created with.
* @throws exceptions::request_validation_exception
* @throws exceptions::request_execution_exception
*/
virtual future<> create(sstring username, const option_map& options) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) = 0;
/**
* Called during execution of ALTER USER query.
* options are always guaranteed to be a subset of supportedOptions(). Furthermore, if the user performing the query
* is not a superuser and is altering himself, then options are guaranteed to be a subset of alterableOptions().
* Keep the body of the method blank if your implementation doesn't support any options.
*
* @param username Username of the user that will be altered.
* @param options Options to alter.
* @throws exceptions::request_validation_exception
* @throws exceptions::request_execution_exception
*/
virtual future<> alter(sstring username, const option_map& options) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) = 0;
/**
* Called during execution of DROP USER query.
*
* @param username Username of the user that will be dropped.
* @throws exceptions::request_validation_exception
* @throws exceptions::request_execution_exception
*/
virtual future<> drop(sstring username) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) = 0;
/**
* Set of resources that should be made inaccessible to users and only accessible internally.
*
* @return Keyspaces, column families that will be unmodifiable by users; other resources.
* @see resource_ids
*/
virtual const resource_ids& protected_resources() const = 0;
class sasl_challenge {
public:
virtual ~sasl_challenge() {}
virtual bytes evaluate_response(bytes_view client_response) throw(exceptions::authentication_exception) = 0;
virtual bool is_complete() const = 0;
virtual future<::shared_ptr<authenticated_user>> get_authenticated_user() const throw(exceptions::authentication_exception) = 0;
};
/**
* Provide a sasl_challenge to be used by the CQL binary protocol server. If
* the configured authenticator requires authentication but does not implement this
* interface we refuse to start the binary protocol server as it will have no way
* of authenticating clients.
* @return sasl_challenge implementation
*/
virtual ::shared_ptr<sasl_challenge> new_sasl_challenge() const = 0;
};
inline std::ostream& operator<<(std::ostream& os, authenticator::option opt) {
return os << authenticator::option_to_string(opt);
}
}

auth/authorizer.cc (new file, 104 lines)

@@ -0,0 +1,104 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright (C) 2016 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "authorizer.hh"
#include "authenticated_user.hh"
#include "default_authorizer.hh"
#include "auth.hh"
#include "db/config.hh"
const sstring auth::authorizer::ALLOW_ALL_AUTHORIZER_NAME("org.apache.cassandra.auth.AllowAllAuthorizer");
/**
* Authorizer is assumed to be a fully state-less immutable object (note all the const).
* We thus store a single instance globally, since it should be safe/ok.
*/
static std::unique_ptr<auth::authorizer> global_authorizer;
future<>
auth::authorizer::setup(const sstring& type) {
if (auth::auth::is_class_type(type, ALLOW_ALL_AUTHORIZER_NAME)) {
class allow_all_authorizer : public authorizer {
public:
future<permission_set> authorize(::shared_ptr<authenticated_user>, data_resource) const override {
return make_ready_future<permission_set>(permissions::ALL);
}
future<> grant(::shared_ptr<authenticated_user>, permission_set, data_resource, sstring) override {
throw exceptions::invalid_request_exception("GRANT operation is not supported by AllowAllAuthorizer");
}
future<> revoke(::shared_ptr<authenticated_user>, permission_set, data_resource, sstring) override {
throw exceptions::invalid_request_exception("REVOKE operation is not supported by AllowAllAuthorizer");
}
future<std::vector<permission_details>> list(::shared_ptr<authenticated_user> performer, permission_set, optional<data_resource>, optional<sstring>) const override {
throw exceptions::invalid_request_exception("LIST PERMISSIONS operation is not supported by AllowAllAuthorizer");
}
future<> revoke_all(sstring dropped_user) override {
return make_ready_future();
}
future<> revoke_all(data_resource) override {
return make_ready_future();
}
const resource_ids& protected_resources() override {
static const resource_ids ids;
return ids;
}
future<> validate_configuration() const override {
return make_ready_future();
}
};
global_authorizer = std::make_unique<allow_all_authorizer>();
} else if (auth::auth::is_class_type(type, default_authorizer::DEFAULT_AUTHORIZER_NAME)) {
auto da = std::make_unique<default_authorizer>();
auto f = da->init();
return f.then([da = std::move(da)]() mutable {
global_authorizer = std::move(da);
});
} else {
throw exceptions::configuration_exception("Invalid authorizer type: " + type);
}
return make_ready_future();
}
auth::authorizer& auth::authorizer::get() {
assert(global_authorizer);
return *global_authorizer;
}

@@ -41,116 +41,131 @@
#pragma once
#include <experimental/string_view>
#include <functional>
#include <optional>
#include <stdexcept>
#include <tuple>
#include <vector>
#include <tuple>
#include <experimental/optional>
#include <seastar/core/future.hh>
#include <seastar/core/shared_ptr.hh>
#include "auth/permission.hh"
#include "auth/resource.hh"
#include "seastarx.hh"
#include "stdx.hh"
#include "permission.hh"
#include "data_resource.hh"
namespace auth {
class role_or_anonymous;
class authenticated_user;
struct permission_details {
sstring role_name;
::auth::resource resource;
sstring user;
data_resource resource;
permission_set permissions;
bool operator<(const permission_details& v) const {
return std::tie(user, resource, permissions) < std::tie(v.user, v.resource, v.permissions);
}
};
inline bool operator==(const permission_details& pd1, const permission_details& pd2) {
return std::forward_as_tuple(pd1.role_name, pd1.resource, pd1.permissions.mask())
== std::forward_as_tuple(pd2.role_name, pd2.resource, pd2.permissions.mask());
}
using std::experimental::optional;
inline bool operator!=(const permission_details& pd1, const permission_details& pd2) {
return !(pd1 == pd2);
}
inline bool operator<(const permission_details& pd1, const permission_details& pd2) {
return std::forward_as_tuple(pd1.role_name, pd1.resource, pd1.permissions)
< std::forward_as_tuple(pd2.role_name, pd2.resource, pd2.permissions);
}
class unsupported_authorization_operation : public std::invalid_argument {
public:
using std::invalid_argument::invalid_argument;
};
///
/// Abstract client for authorizing roles to access resources.
///
/// All state necessary to authorize a role is stored externally to the client instance.
///
class authorizer {
public:
virtual ~authorizer() = default;
static const sstring ALLOW_ALL_AUTHORIZER_NAME;
virtual future<> start() = 0;
virtual ~authorizer() {}
virtual future<> stop() = 0;
/**
* The primary Authorizer method. Returns a set of permissions of a user on a resource.
*
* @param user Authenticated user requesting authorization.
* @param resource Resource for which the authorization is being requested. @see DataResource.
* @return Set of permissions of the user on the resource. Should never return empty. Use permission.NONE instead.
*/
virtual future<permission_set> authorize(::shared_ptr<authenticated_user>, data_resource) const = 0;
///
/// A fully-qualified (class with package) Java-like name for this implementation.
///
virtual const sstring& qualified_java_name() const = 0;
/**
* Grants a set of permissions on a resource to a user.
* The opposite of revoke().
*
* @param performer User who grants the permissions.
* @param permissions Set of permissions to grant.
* @param to Grantee of the permissions.
* @param resource Resource on which to grant the permissions.
*
* @throws RequestValidationException
* @throws RequestExecutionException
*/
virtual future<> grant(::shared_ptr<authenticated_user> performer, permission_set, data_resource, sstring to) = 0;
///
/// Query for the permissions granted directly to a role for a particular \ref resource (and not any of its
/// parents).
///
/// The optional role name is empty when an anonymous user is authorized. Some implementations may still wish to
/// grant default permissions in this case.
///
virtual future<permission_set> authorize(const role_or_anonymous&, const resource&) const = 0;
/**
* Revokes a set of permissions on a resource from a user.
* The opposite of grant().
*
* @param performer User who revokes the permissions.
* @param permissions Set of permissions to revoke.
* @param from Revokee of the permissions.
* @param resource Resource on which to revoke the permissions.
*
* @throws RequestValidationException
* @throws RequestExecutionException
*/
virtual future<> revoke(::shared_ptr<authenticated_user> performer, permission_set, data_resource, sstring from) = 0;
///
/// Grant a set of permissions to a role for a particular \ref resource.
///
/// \throws \ref unsupported_authorization_operation if granting permissions is not supported.
///
virtual future<> grant(stdx::string_view role_name, permission_set, const resource&) const = 0;
/**
* Returns a list of permissions on a resource of a user.
*
* @param performer User who wants to see the permissions.
* @param permissions Set of Permission values the user is interested in. The result should only include the matching ones.
* @param resource The resource on which permissions are requested. Can be null, in which case permissions on all resources
* should be returned.
* @param of The user whose permissions are requested. Can be null, in which case permissions of every user should be returned.
*
* @return All of the matching permissions that the requesting user is authorized to know about.
*
* @throws RequestValidationException
* @throws RequestExecutionException
*/
virtual future<std::vector<permission_details>> list(::shared_ptr<authenticated_user> performer, permission_set, optional<data_resource>, optional<sstring>) const = 0;
///
/// Revoke a set of permissions from a role for a particular \ref resource.
///
/// \throws \ref unsupported_authorization_operation if revoking permissions is not supported.
///
virtual future<> revoke(stdx::string_view role_name, permission_set, const resource&) const = 0;
/**
* This method is called before deleting a user with DROP USER query so that a new user with the same
* name wouldn't inherit permissions of the deleted user in the future.
*
* @param droppedUser The user to revoke all permissions from.
*/
virtual future<> revoke_all(sstring dropped_user) = 0;
///
/// Query for all directly granted permissions.
///
/// \throws \ref unsupported_authorization_operation if listing permissions is not supported.
///
virtual future<std::vector<permission_details>> list_all() const = 0;
/**
* This method is called after a resource is removed (i.e. keyspace or a table is dropped).
*
* @param droppedResource The resource to revoke all permissions on.
*/
virtual future<> revoke_all(data_resource) = 0;
///
/// Revoke all permissions granted directly to a particular role.
///
/// \throws \ref unsupported_authorization_operation if revoking permissions is not supported.
///
virtual future<> revoke_all(stdx::string_view role_name) const = 0;
/**
* Set of resources that should be made inaccessible to users and only accessible internally.
*
* @return Keyspaces, column families that will be unmodifiable by users; other resources.
*/
virtual const resource_ids& protected_resources() = 0;
///
/// Revoke all permissions granted to any role for a particular resource.
///
/// \throws \ref unsupported_authorization_operation if revoking permissions is not supported.
///
virtual future<> revoke_all(const resource&) const = 0;
/**
* Validates configuration of IAuthorizer implementation (if configurable).
*
* @throws ConfigurationException when there is a configuration error.
*/
virtual future<> validate_configuration() const = 0;
///
/// System resources used internally as part of the implementation. These are made inaccessible to users.
///
virtual const resource_set& protected_resources() const = 0;
/**
* Setup is called once upon system startup to initialize the IAuthorizer.
*
* For example, use this method to create any required keyspaces/column families.
*/
static future<> setup(const sstring& type);
/**
* Returns the system authorizer. Must have called setup before calling this.
*/
static authorizer& get();
};
}
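
Note on `permission_details::operator<` above: comparing `std::tie` tuples gives a lexicographic order over (user, resource, permissions), which is what lets the struct be stored in ordered containers. The sketch below is a simplified illustration; `permission_details_sketch` is hypothetical, and the `std::string` resource and `unsigned` permission mask are assumptions standing in for `data_resource` and `permission_set`.

```cpp
#include <set>
#include <string>
#include <tuple>

// Hypothetical, simplified counterpart of auth::permission_details.
struct permission_details_sketch {
    std::string user;
    std::string resource;   // assumption: stands in for data_resource
    unsigned permissions;   // assumption: stands in for permission_set

    // Lexicographic comparison: user first, then resource, then permissions.
    bool operator<(const permission_details_sketch& v) const {
        return std::tie(user, resource, permissions)
             < std::tie(v.user, v.resource, v.permissions);
    }
};

// Convenience alias: an ordered, duplicate-free collection of entries.
using permission_details_set = std::set<permission_details_sketch>;
```

Inserting the same (user, resource, permissions) triple twice into a `permission_details_set` keeps a single entry, and iteration visits entries grouped by user, then by resource.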

@@ -1,97 +0,0 @@
/*
* Copyright (C) 2017 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "auth/common.hh"
#include <seastar/core/shared_ptr.hh>
#include "cql3/query_processor.hh"
#include "cql3/statements/create_table_statement.hh"
#include "database.hh"
#include "schema_builder.hh"
#include "service/migration_manager.hh"
namespace auth {
namespace meta {
const sstring DEFAULT_SUPERUSER_NAME("cassandra");
const sstring AUTH_KS("system_auth");
const sstring USERS_CF("users");
const sstring AUTH_PACKAGE_NAME("org.apache.cassandra.auth.");
}
static logging::logger auth_log("auth");
// Func must support being invoked more than once.
future<> do_after_system_ready(seastar::abort_source& as, seastar::noncopyable_function<future<>()> func) {
struct empty_state { };
return delay_until_system_ready(as).then([&as, func = std::move(func)] () mutable {
return exponential_backoff_retry::do_until_value(1s, 1min, as, [func = std::move(func)] {
return func().then_wrapped([] (auto&& f) -> stdx::optional<empty_state> {
if (f.failed()) {
auth_log.info("Auth task failed with error, rescheduling: {}", f.get_exception());
return { };
}
return { empty_state() };
});
});
}).discard_result();
}
future<> create_metadata_table_if_missing(
stdx::string_view table_name,
cql3::query_processor& qp,
stdx::string_view cql,
::service::migration_manager& mm) {
auto& db = qp.db().local();
if (db.has_schema(meta::AUTH_KS, sstring(table_name))) {
return make_ready_future<>();
}
auto parsed_statement = static_pointer_cast<cql3::statements::raw::cf_statement>(
cql3::query_processor::parse_statement(cql));
parsed_statement->prepare_keyspace(meta::AUTH_KS);
auto statement = static_pointer_cast<cql3::statements::create_table_statement>(
parsed_statement->prepare(db, qp.get_cql_stats())->statement);
const auto schema = statement->get_cf_meta_data(qp.db().local());
const auto uuid = generate_legacy_id(schema->ks_name(), schema->cf_name());
schema_builder b(schema);
b.set_uuid(uuid);
return mm.announce_new_column_family(b.build(), false);
}
future<> wait_for_schema_agreement(::service::migration_manager& mm, const database& db) {
static const auto pause = [] { return sleep(std::chrono::milliseconds(500)); };
return do_until([&db] { return db.get_version() != database::empty_version; }, pause).then([&mm] {
return do_until([&mm] { return mm.have_schema_agreement(); }, pause);
});
}
}
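
Review note: the retry loop in do_after_system_ready() above relies on exponential_backoff_retry, whose implementation is not part of this diff. The following is a minimal synchronous sketch of the same idea, assuming a hypothetical retry_with_backoff() helper and an injected sleep callback; it is not the Seastar-based code and may differ from it in timing details.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical, synchronous stand-in for the retry loop in do_after_system_ready().
// `task` returns true on success. `sleep_for` receives each delay, so a caller can
// pass a real sleep or simply record the values.
// Returns the number of attempts that were needed.
inline int retry_with_backoff(
        std::chrono::milliseconds initial,
        std::chrono::milliseconds max_delay,
        const std::function<bool()>& task,
        const std::function<void(std::chrono::milliseconds)>& sleep_for) {
    assert(initial.count() > 0 && max_delay >= initial);
    std::chrono::milliseconds delay = initial;
    int attempts = 0;
    for (;;) {
        ++attempts;
        if (task()) {
            return attempts;
        }
        // Failed attempt: wait, then double the delay up to the cap.
        sleep_for(delay);
        delay = std::min<std::chrono::milliseconds>(delay * 2, max_delay);
    }
}
```

As in the original, the task may run several times, which is why the comment above requires that Func support being invoked more than once.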


@@ -1,85 +0,0 @@
/*
* Copyright (C) 2017 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <chrono>
#include <experimental/string_view>
#include <seastar/core/future.hh>
#include <seastar/core/abort_source.hh>
#include <seastar/util/noncopyable_function.hh>
#include <seastar/core/reactor.hh>
#include <seastar/core/resource.hh>
#include <seastar/core/sstring.hh>
#include "log.hh"
#include "seastarx.hh"
#include "utils/exponential_backoff_retry.hh"
using namespace std::chrono_literals;
class database;
namespace service {
class migration_manager;
}
namespace cql3 {
class query_processor;
}
namespace auth {
namespace meta {
extern const sstring DEFAULT_SUPERUSER_NAME;
extern const sstring AUTH_KS;
extern const sstring USERS_CF;
extern const sstring AUTH_PACKAGE_NAME;
}
template <class Task>
future<> once_among_shards(Task&& f) {
if (engine().cpu_id() == 0u) {
return f();
}
return make_ready_future<>();
}
inline future<> delay_until_system_ready(seastar::abort_source& as) {
return sleep_abortable(15s, as);
}
// Func must support being invoked more than once.
future<> do_after_system_ready(seastar::abort_source& as, seastar::noncopyable_function<future<>()> func);
future<> create_metadata_table_if_missing(
stdx::string_view table_name,
cql3::query_processor&,
stdx::string_view cql,
::service::migration_manager&);
future<> wait_for_schema_agreement(::service::migration_manager&, const database&);
}

173
auth/data_resource.cc Normal file

@@ -0,0 +1,173 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright (C) 2016 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "data_resource.hh"
#include <regex>
#include "service/storage_proxy.hh"
const sstring auth::data_resource::ROOT_NAME("data");
auth::data_resource::data_resource(level l, const sstring& ks, const sstring& cf)
: _level(l), _ks(ks), _cf(cf)
{
}
auth::data_resource::data_resource()
: data_resource(level::ROOT)
{}
auth::data_resource::data_resource(const sstring& ks)
: data_resource(level::KEYSPACE, ks)
{}
auth::data_resource::data_resource(const sstring& ks, const sstring& cf)
: data_resource(level::COLUMN_FAMILY, ks, cf)
{}
auth::data_resource::level auth::data_resource::get_level() const {
return _level;
}
auth::data_resource auth::data_resource::from_name(
const sstring& s) {
static std::regex slash_regex("/");
auto i = std::regex_token_iterator<sstring::const_iterator>(s.begin(),
s.end(), slash_regex, -1);
auto e = std::regex_token_iterator<sstring::const_iterator>();
auto n = std::distance(i, e);
if (n > 3 || ROOT_NAME != sstring(*i++)) {
throw std::invalid_argument(sprint("%s is not a valid data resource name", s));
}
if (n == 1) {
return data_resource();
}
auto ks = *i++;
if (n == 2) {
return data_resource(ks.str());
}
auto cf = *i++;
return data_resource(ks.str(), cf.str());
}
sstring auth::data_resource::name() const {
switch (get_level()) {
case level::ROOT:
return ROOT_NAME;
case level::KEYSPACE:
return sprint("%s/%s", ROOT_NAME, _ks);
case level::COLUMN_FAMILY:
default:
return sprint("%s/%s/%s", ROOT_NAME, _ks, _cf);
}
}
auth::data_resource auth::data_resource::get_parent() const {
switch (get_level()) {
case level::KEYSPACE:
return data_resource();
case level::COLUMN_FAMILY:
return data_resource(_ks);
default:
throw std::invalid_argument("Root-level resource can't have a parent");
}
}
const sstring& auth::data_resource::keyspace() const
throw (std::invalid_argument) {
if (is_root_level()) {
throw std::invalid_argument("ROOT data resource has no keyspace");
}
return _ks;
}
const sstring& auth::data_resource::column_family() const
throw (std::invalid_argument) {
if (!is_column_family_level()) {
throw std::invalid_argument(sprint("%s data resource has no column family", name()));
}
return _cf;
}
bool auth::data_resource::has_parent() const {
return !is_root_level();
}
bool auth::data_resource::exists() const {
switch (get_level()) {
case level::ROOT:
return true;
case level::KEYSPACE:
return service::get_local_storage_proxy().get_db().local().has_keyspace(_ks);
case level::COLUMN_FAMILY:
default:
return service::get_local_storage_proxy().get_db().local().has_schema(_ks, _cf);
}
}
sstring auth::data_resource::to_string() const {
switch (get_level()) {
case level::ROOT:
return "<all keyspaces>";
case level::KEYSPACE:
return sprint("<keyspace %s>", _ks);
case level::COLUMN_FAMILY:
default:
return sprint("<table %s.%s>", _ks, _cf);
}
}
bool auth::data_resource::operator==(const data_resource& v) const {
return _ks == v._ks && _cf == v._cf;
}
bool auth::data_resource::operator<(const data_resource& v) const {
return _ks < v._ks ? true : (v._ks < _ks ? false : _cf < v._cf);
}
std::ostream& auth::operator<<(std::ostream& os, const data_resource& r) {
return os << r.to_string();
}
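
Review note: data_resource::from_name() above splits a name such as "data/ks/cf" with std::regex_token_iterator. The sketch below shows the same name layout parsed with plain std::string operations. The parsed_resource struct and parse_data_resource() function are hypothetical names used only for illustration, and edge cases such as trailing slashes may be handled differently from the regex-based code.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for auth::data_resource; only the parsed parts are kept.
struct parsed_resource {
    std::string ks;  // empty for the root resource
    std::string cf;  // empty unless the name addresses a table
};

// Splits "data", "data/<ks>" or "data/<ks>/<cf>" on '/' and validates the shape.
inline parsed_resource parse_data_resource(const std::string& name) {
    std::vector<std::string> parts;
    std::string::size_type begin = 0;
    for (;;) {
        const auto end = name.find('/', begin);
        if (end == std::string::npos) {
            parts.push_back(name.substr(begin));
            break;
        }
        parts.push_back(name.substr(begin, end - begin));
        begin = end + 1;
    }
    assert(!parts.empty());  // the loop always pushes at least one part
    if (parts.size() > 3 || parts[0] != "data") {
        throw std::invalid_argument(name + " is not a valid data resource name");
    }
    parsed_resource r;
    if (parts.size() > 1) {
        r.ks = parts[1];
    }
    if (parts.size() > 2) {
        r.cf = parts[2];
    }
    return r;
}
```

The root, keyspace and table levels map to one, two and three path components respectively, mirroring the n == 1, n == 2 and fall-through cases in from_name().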

158
auth/data_resource.hh Normal file

@@ -0,0 +1,158 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright (C) 2016 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "utils/hash.hh"
#include <iosfwd>
#include <set>
#include <seastar/core/sstring.hh>
namespace auth {
class data_resource {
private:
enum class level {
ROOT, KEYSPACE, COLUMN_FAMILY
};
static const sstring ROOT_NAME;
level _level;
sstring _ks;
sstring _cf;
data_resource(level, const sstring& ks = {}, const sstring& cf = {});
level get_level() const;
public:
/**
* Creates a DataResource representing the root-level resource.
* @return the root-level resource.
*/
data_resource();
/**
* Creates a DataResource representing a keyspace.
*
* @param keyspace Name of the keyspace.
*/
data_resource(const sstring& ks);
/**
* Creates a DataResource instance representing a column family.
*
* @param keyspace Name of the keyspace.
* @param columnFamily Name of the column family.
*/
data_resource(const sstring& ks, const sstring& cf);
/**
* Parses a data resource name into a DataResource instance.
*
* @param name Name of the data resource.
* @return DataResource instance matching the name.
*/
static data_resource from_name(const sstring&);
/**
* @return Printable name of the resource.
*/
sstring name() const;
/**
* @return Parent of the resource, if any. Throws std::invalid_argument if it's the root-level resource.
*/
data_resource get_parent() const;
bool is_root_level() const {
return get_level() == level::ROOT;
}
bool is_keyspace_level() const {
return get_level() == level::KEYSPACE;
}
bool is_column_family_level() const {
return get_level() == level::COLUMN_FAMILY;
}
/**
* @return keyspace of the resource.
* @throws std::invalid_argument if it's the root-level resource.
*/
const sstring& keyspace() const throw(std::invalid_argument);
/**
* @return column family of the resource.
* @throws std::invalid_argument if it's not a cf-level resource.
*/
const sstring& column_family() const throw(std::invalid_argument);
/**
* @return Whether or not the resource has a parent in the hierarchy.
*/
bool has_parent() const;
/**
* @return Whether or not the resource exists in scylla.
*/
bool exists() const;
sstring to_string() const;
bool operator==(const data_resource&) const;
bool operator<(const data_resource&) const;
size_t hash_value() const {
return utils::tuple_hash()(_ks, _cf);
}
};
/**
* Resource id mappings, i.e. keyspace and/or column families.
*/
using resource_ids = std::set<data_resource>;
std::ostream& operator<<(std::ostream&, const data_resource&);
}


@@ -39,309 +39,202 @@
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "auth/default_authorizer.hh"
extern "C" {
#include <crypt.h>
#include <unistd.h>
}
#include <chrono>
#include <crypt.h>
#include <random>
#include <chrono>
#include <boost/algorithm/string/join.hpp>
#include <boost/range.hpp>
#include <seastar/core/reactor.hh>
#include "auth/authenticated_user.hh"
#include "auth/common.hh"
#include "auth/permission.hh"
#include "auth/role_or_anonymous.hh"
#include "auth.hh"
#include "default_authorizer.hh"
#include "authenticated_user.hh"
#include "permission.hh"
#include "cql3/query_processor.hh"
#include "cql3/untyped_result_set.hh"
#include "exceptions/exceptions.hh"
#include "log.hh"
namespace auth {
const sstring auth::default_authorizer::DEFAULT_AUTHORIZER_NAME(
"org.apache.cassandra.auth.CassandraAuthorizer");
const sstring& default_authorizer_name() {
static const sstring name = meta::AUTH_PACKAGE_NAME + "CassandraAuthorizer";
return name;
}
static const sstring ROLE_NAME = "role";
static const sstring USER_NAME = "username";
static const sstring RESOURCE_NAME = "resource";
static const sstring PERMISSIONS_NAME = "permissions";
static const sstring PERMISSIONS_CF = "role_permissions";
static const sstring PERMISSIONS_CF = "permissions";
static logging::logger alogger("default_authorizer");
static logging::logger logger("default_authorizer");
// To ensure correct initialization order, we unfortunately need to use a string literal.
static const class_registrator<
authorizer,
default_authorizer,
cql3::query_processor&,
::service::migration_manager&> password_auth_reg("org.apache.cassandra.auth.CassandraAuthorizer");
default_authorizer::default_authorizer(cql3::query_processor& qp, ::service::migration_manager& mm)
: _qp(qp)
, _migration_manager(mm) {
auth::default_authorizer::default_authorizer() {
}
auth::default_authorizer::~default_authorizer() {
}
default_authorizer::~default_authorizer() {
future<> auth::default_authorizer::init() {
sstring create_table = sprint("CREATE TABLE %s.%s ("
"%s text,"
"%s text,"
"%s set<text>,"
"PRIMARY KEY(%s, %s)"
") WITH gc_grace_seconds=%d", auth::auth::AUTH_KS,
PERMISSIONS_CF, USER_NAME, RESOURCE_NAME, PERMISSIONS_NAME,
USER_NAME, RESOURCE_NAME, 90 * 24 * 60 * 60); // 3 months.
return auth::setup_table(PERMISSIONS_CF, create_table);
}
static const sstring legacy_table_name{"permissions"};
bool default_authorizer::legacy_metadata_exists() const {
return _qp.db().local().has_schema(meta::AUTH_KS, legacy_table_name);
}
future<auth::permission_set> auth::default_authorizer::authorize(
::shared_ptr<authenticated_user> user, data_resource resource) const {
return user->is_super().then([this, user, resource = std::move(resource)](bool is_super) {
if (is_super) {
return make_ready_future<permission_set>(permissions::ALL);
}
future<bool> default_authorizer::any_granted() const {
static const sstring query = sprint("SELECT * FROM %s.%s LIMIT 1", meta::AUTH_KS, PERMISSIONS_CF);
/**
* TODO: could create actual data type for permission (translating string<->perm),
* but this seems overkill right now. We still must store strings so...
*/
auto& qp = cql3::get_local_query_processor();
auto query = sprint("SELECT %s FROM %s.%s WHERE %s = ? AND %s = ?"
, PERMISSIONS_NAME, auth::AUTH_KS, PERMISSIONS_CF, USER_NAME, RESOURCE_NAME);
return qp.process(query, db::consistency_level::LOCAL_ONE, {user->name(), resource.name() })
.then_wrapped([=](future<::shared_ptr<cql3::untyped_result_set>> f) {
try {
auto res = f.get0();
return _qp.process(
query,
db::consistency_level::LOCAL_ONE,
infinite_timeout_config,
{},
true).then([this](::shared_ptr<cql3::untyped_result_set> results) {
return !results->empty();
});
}
future<> default_authorizer::migrate_legacy_metadata() const {
alogger.info("Starting migration of legacy permissions metadata.");
static const sstring query = sprint("SELECT * FROM %s.%s", meta::AUTH_KS, legacy_table_name);
return _qp.process(
query,
db::consistency_level::LOCAL_ONE,
infinite_timeout_config).then([this](::shared_ptr<cql3::untyped_result_set> results) {
return do_for_each(*results, [this](const cql3::untyped_result_set_row& row) {
return do_with(
row.get_as<sstring>("username"),
parse_resource(row.get_as<sstring>(RESOURCE_NAME)),
[this, &row](const auto& username, const auto& r) {
const permission_set perms = permissions::from_strings(row.get_set<sstring>(PERMISSIONS_NAME));
return grant(username, perms, r);
});
}).finally([results] {});
}).then([] {
alogger.info("Finished migrating legacy permissions metadata.");
}).handle_exception([](std::exception_ptr ep) {
alogger.error("Encountered an error during migration!");
std::rethrow_exception(ep);
});
}
future<> default_authorizer::start() {
static const sstring create_table = sprint(
"CREATE TABLE %s.%s ("
"%s text,"
"%s text,"
"%s set<text>,"
"PRIMARY KEY(%s, %s)"
") WITH gc_grace_seconds=%d",
meta::AUTH_KS,
PERMISSIONS_CF,
ROLE_NAME,
RESOURCE_NAME,
PERMISSIONS_NAME,
ROLE_NAME,
RESOURCE_NAME,
90 * 24 * 60 * 60); // 3 months.
return once_among_shards([this] {
return create_metadata_table_if_missing(
PERMISSIONS_CF,
_qp,
create_table,
_migration_manager).then([this] {
_finished = do_after_system_ready(_as, [this] {
return async([this] {
wait_for_schema_agreement(_migration_manager, _qp.db().local()).get0();
if (legacy_metadata_exists()) {
if (!any_granted().get0()) {
migrate_legacy_metadata().get0();
return;
}
alogger.warn("Ignoring legacy permissions metadata since role permissions exist.");
}
});
});
if (res->empty() || !res->one().has(PERMISSIONS_NAME)) {
return make_ready_future<permission_set>(permissions::NONE);
}
return make_ready_future<permission_set>(permissions::from_strings(res->one().get_set<sstring>(PERMISSIONS_NAME)));
} catch (exceptions::request_execution_exception& e) {
logger.warn("CassandraAuthorizer failed to authorize {} for {}", user->name(), resource);
return make_ready_future<permission_set>(permissions::NONE);
}
});
});
}
future<> default_authorizer::stop() {
_as.request_abort();
return _finished.handle_exception_type([](const sleep_aborted&) {});
#include <boost/range.hpp>
future<> auth::default_authorizer::modify(
::shared_ptr<authenticated_user> performer, permission_set set,
data_resource resource, sstring user, sstring op) {
// TODO: why does this not check super user?
auto& qp = cql3::get_local_query_processor();
auto query = sprint("UPDATE %s.%s SET %s = %s %s ? WHERE %s = ? AND %s = ?",
auth::AUTH_KS, PERMISSIONS_CF, PERMISSIONS_NAME,
PERMISSIONS_NAME, op, USER_NAME, RESOURCE_NAME);
return qp.process(query, db::consistency_level::ONE, {
permissions::to_strings(set), user, resource.name() }).discard_result();
}
future<permission_set>
default_authorizer::authorize(const role_or_anonymous& maybe_role, const resource& r) const {
if (is_anonymous(maybe_role)) {
return make_ready_future<permission_set>(permissions::NONE);
}
static const sstring query = sprint(
"SELECT %s FROM %s.%s WHERE %s = ? AND %s = ?",
PERMISSIONS_NAME,
meta::AUTH_KS,
PERMISSIONS_CF,
ROLE_NAME,
RESOURCE_NAME);
future<> auth::default_authorizer::grant(
::shared_ptr<authenticated_user> performer, permission_set set,
data_resource resource, sstring to) {
return modify(std::move(performer), std::move(set), std::move(resource), std::move(to), "+");
}
return _qp.process(
query,
db::consistency_level::LOCAL_ONE,
infinite_timeout_config,
{*maybe_role.name, r.name()}).then([](::shared_ptr<cql3::untyped_result_set> results) {
if (results->empty()) {
return permissions::NONE;
future<> auth::default_authorizer::revoke(
::shared_ptr<authenticated_user> performer, permission_set set,
data_resource resource, sstring from) {
return modify(std::move(performer), std::move(set), std::move(resource), std::move(from), "-");
}
future<std::vector<auth::permission_details>> auth::default_authorizer::list(
::shared_ptr<authenticated_user> performer, permission_set set,
optional<data_resource> resource, optional<sstring> user) const {
return performer->is_super().then([this, performer, set = std::move(set), resource = std::move(resource), user = std::move(user)](bool is_super) {
if (!is_super && (!user || performer->name() != *user)) {
throw exceptions::unauthorized_exception(sprint("You are not authorized to view %s's permissions", user ? *user : "everyone"));
}
return permissions::from_strings(results->one().get_set<sstring>(PERMISSIONS_NAME));
});
}
auto query = sprint("SELECT %s, %s, %s FROM %s.%s", USER_NAME, RESOURCE_NAME, PERMISSIONS_NAME, auth::AUTH_KS, PERMISSIONS_CF);
auto& qp = cql3::get_local_query_processor();
future<>
default_authorizer::modify(
stdx::string_view role_name,
permission_set set,
const resource& resource,
stdx::string_view op) const {
return do_with(
sprint(
"UPDATE %s.%s SET %s = %s %s ? WHERE %s = ? AND %s = ?",
meta::AUTH_KS,
PERMISSIONS_CF,
PERMISSIONS_NAME,
PERMISSIONS_NAME,
op,
ROLE_NAME,
RESOURCE_NAME),
[this, &role_name, set, &resource](const auto& query) {
return _qp.process(
query,
db::consistency_level::ONE,
infinite_timeout_config,
{permissions::to_strings(set), sstring(role_name), resource.name()}).discard_result();
});
}
// Oh, look, it is a case where it does not pay off to have
// parameters to process in an initializer list.
future<::shared_ptr<cql3::untyped_result_set>> f = make_ready_future<::shared_ptr<cql3::untyped_result_set>>();
if (resource && user) {
query += sprint(" WHERE %s = ? AND %s = ?", USER_NAME, RESOURCE_NAME);
f = qp.process(query, db::consistency_level::ONE, {*user, resource->name()});
} else if (resource) {
query += sprint(" WHERE %s = ? ALLOW FILTERING", RESOURCE_NAME);
f = qp.process(query, db::consistency_level::ONE, {resource->name()});
} else if (user) {
query += sprint(" WHERE %s = ?", USER_NAME);
f = qp.process(query, db::consistency_level::ONE, {*user});
} else {
f = qp.process(query, db::consistency_level::ONE, {});
}
future<> default_authorizer::grant(stdx::string_view role_name, permission_set set, const resource& resource) const {
return modify(role_name, std::move(set), resource, "+");
}
return f.then([set](::shared_ptr<cql3::untyped_result_set> res) {
std::vector<permission_details> result;
future<> default_authorizer::revoke(stdx::string_view role_name, permission_set set, const resource& resource) const {
return modify(role_name, std::move(set), resource, "-");
}
for (auto& row : *res) {
if (row.has(PERMISSIONS_NAME)) {
auto username = row.get_as<sstring>(USER_NAME);
auto resource = data_resource::from_name(row.get_as<sstring>(RESOURCE_NAME));
auto ps = permissions::from_strings(row.get_set<sstring>(PERMISSIONS_NAME));
ps = permission_set::from_mask(ps.mask() & set.mask());
future<std::vector<permission_details>> default_authorizer::list_all() const {
static const sstring query = sprint(
"SELECT %s, %s, %s FROM %s.%s",
ROLE_NAME,
RESOURCE_NAME,
PERMISSIONS_NAME,
meta::AUTH_KS,
PERMISSIONS_CF);
return _qp.process(
query,
db::consistency_level::ONE,
infinite_timeout_config,
{},
true).then([](::shared_ptr<cql3::untyped_result_set> results) {
std::vector<permission_details> all_details;
for (const auto& row : *results) {
if (row.has(PERMISSIONS_NAME)) {
auto role_name = row.get_as<sstring>(ROLE_NAME);
auto resource = parse_resource(row.get_as<sstring>(RESOURCE_NAME));
auto perms = permissions::from_strings(row.get_set<sstring>(PERMISSIONS_NAME));
all_details.push_back(permission_details{std::move(role_name), std::move(resource), std::move(perms)});
result.emplace_back(permission_details {username, resource, ps});
}
}
}
return all_details;
return make_ready_future<std::vector<permission_details>>(std::move(result));
});
});
}
future<> default_authorizer::revoke_all(stdx::string_view role_name) const {
static const sstring query = sprint(
"DELETE FROM %s.%s WHERE %s = ?",
meta::AUTH_KS,
PERMISSIONS_CF,
ROLE_NAME);
return _qp.process(
query,
db::consistency_level::ONE,
infinite_timeout_config,
{sstring(role_name)}).discard_result().handle_exception([role_name](auto ep) {
try {
std::rethrow_exception(ep);
} catch (exceptions::request_execution_exception& e) {
alogger.warn("CassandraAuthorizer failed to revoke all permissions of {}: {}", role_name, e);
}
});
future<> auth::default_authorizer::revoke_all(sstring dropped_user) {
auto& qp = cql3::get_local_query_processor();
auto query = sprint("DELETE FROM %s.%s WHERE %s = ?", auth::AUTH_KS,
PERMISSIONS_CF, USER_NAME);
return qp.process(query, db::consistency_level::ONE, { dropped_user }).discard_result().handle_exception(
[dropped_user](auto ep) {
try {
std::rethrow_exception(ep);
} catch (exceptions::request_execution_exception& e) {
logger.warn("CassandraAuthorizer failed to revoke all permissions of {}: {}", dropped_user, e);
}
});
}
future<> default_authorizer::revoke_all(const resource& resource) const {
static const sstring query = sprint(
"SELECT %s FROM %s.%s WHERE %s = ? ALLOW FILTERING",
ROLE_NAME,
meta::AUTH_KS,
PERMISSIONS_CF,
RESOURCE_NAME);
return _qp.process(
query,
db::consistency_level::LOCAL_ONE,
infinite_timeout_config,
{resource.name()}).then_wrapped([this, resource](future<::shared_ptr<cql3::untyped_result_set>> f) {
future<> auth::default_authorizer::revoke_all(data_resource resource) {
auto& qp = cql3::get_local_query_processor();
auto query = sprint("SELECT %s FROM %s.%s WHERE %s = ? ALLOW FILTERING",
USER_NAME, auth::AUTH_KS, PERMISSIONS_CF, RESOURCE_NAME);
return qp.process(query, db::consistency_level::LOCAL_ONE, { resource.name() })
.then_wrapped([resource, &qp](future<::shared_ptr<cql3::untyped_result_set>> f) {
try {
auto res = f.get0();
return parallel_for_each(
res->begin(),
res->end(),
[this, res, resource](const cql3::untyped_result_set::row& r) {
static const sstring query = sprint(
"DELETE FROM %s.%s WHERE %s = ? AND %s = ?",
meta::AUTH_KS,
PERMISSIONS_CF,
ROLE_NAME,
RESOURCE_NAME);
return _qp.process(
query,
db::consistency_level::LOCAL_ONE,
infinite_timeout_config,
{r.get_as<sstring>(ROLE_NAME), resource.name()}).discard_result().handle_exception(
[resource](auto ep) {
return parallel_for_each(res->begin(), res->end(), [&qp, res, resource](const cql3::untyped_result_set::row& r) {
auto query = sprint("DELETE FROM %s.%s WHERE %s = ? AND %s = ?"
, auth::AUTH_KS, PERMISSIONS_CF, USER_NAME, RESOURCE_NAME);
return qp.process(query, db::consistency_level::LOCAL_ONE, { r.get_as<sstring>(USER_NAME), resource.name() })
.discard_result().handle_exception([resource](auto ep) {
try {
std::rethrow_exception(ep);
} catch (exceptions::request_execution_exception& e) {
alogger.warn("CassandraAuthorizer failed to revoke all permissions on {}: {}", resource, e);
logger.warn("CassandraAuthorizer failed to revoke all permissions on {}: {}", resource, e);
}
});
});
} catch (exceptions::request_execution_exception& e) {
alogger.warn("CassandraAuthorizer failed to revoke all permissions on {}: {}", resource, e);
logger.warn("CassandraAuthorizer failed to revoke all permissions on {}: {}", resource, e);
return make_ready_future();
}
});
}
const resource_set& default_authorizer::protected_resources() const {
static const resource_set resources({ make_data_resource(meta::AUTH_KS, PERMISSIONS_CF) });
return resources;
const auth::resource_ids& auth::default_authorizer::protected_resources() {
static const resource_ids ids({ data_resource(auth::AUTH_KS, PERMISSIONS_CF) });
return ids;
}
future<> auth::default_authorizer::validate_configuration() const {
return make_ready_future();
}
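
Review note: the old list() implementation above narrows each stored permission set to the requested one with permission_set::from_mask(ps.mask() & set.mask()). The permission_set type itself is not shown in this diff, so the sketch below uses a hypothetical perm enum and perm_set struct to illustrate the bitmask intersection only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical permission bits; the real auth::permission enum is not shown in this diff.
enum class perm : std::uint8_t { select_ = 1, modify_ = 2, create_ = 4 };

// Hypothetical stand-in for auth::permission_set: one bit per permission.
struct perm_set {
    std::uint8_t mask = 0;

    static perm_set from_mask(unsigned m) {
        assert(m < 8);  // only three bits are defined in this sketch
        perm_set s;
        s.mask = static_cast<std::uint8_t>(m);
        return s;
    }
    perm_set& add(perm p) {
        mask |= static_cast<std::uint8_t>(p);
        return *this;
    }
    bool contains(perm p) const {
        return (mask & static_cast<std::uint8_t>(p)) != 0;
    }
};

// Keeps only the permissions present in both sets.
inline perm_set intersect(perm_set stored, perm_set requested) {
    return perm_set::from_mask(static_cast<unsigned>(stored.mask & requested.mask));
}
```

With this layout, filtering a row's permissions by the caller's requested set is a single bitwise AND, which is what the list() code does per result row.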


@@ -41,62 +41,37 @@
#pragma once
#include <functional>
#include <seastar/core/abort_source.hh>
#include "auth/authorizer.hh"
#include "cql3/query_processor.hh"
#include "service/migration_manager.hh"
#include "authorizer.hh"
namespace auth {
const sstring& default_authorizer_name();
class default_authorizer : public authorizer {
cql3::query_processor& _qp;
::service::migration_manager& _migration_manager;
abort_source _as{};
future<> _finished{make_ready_future<>()};
public:
default_authorizer(cql3::query_processor&, ::service::migration_manager&);
static const sstring DEFAULT_AUTHORIZER_NAME;
default_authorizer();
~default_authorizer();
virtual future<> start() override;
future<> init();
virtual future<> stop() override;
future<permission_set> authorize(::shared_ptr<authenticated_user>, data_resource) const override;
virtual const sstring& qualified_java_name() const override {
return default_authorizer_name();
}
future<> grant(::shared_ptr<authenticated_user>, permission_set, data_resource, sstring) override;
virtual future<permission_set> authorize(const role_or_anonymous&, const resource&) const override;
future<> revoke(::shared_ptr<authenticated_user>, permission_set, data_resource, sstring) override;
virtual future<> grant(stdx::string_view, permission_set, const resource&) const override;
future<std::vector<permission_details>> list(::shared_ptr<authenticated_user>, permission_set, optional<data_resource>, optional<sstring>) const override;
virtual future<> revoke( stdx::string_view, permission_set, const resource&) const override;
future<> revoke_all(sstring) override;
virtual future<std::vector<permission_details>> list_all() const override;
future<> revoke_all(data_resource) override;
virtual future<> revoke_all(stdx::string_view) const override;
const resource_ids& protected_resources() override;
virtual future<> revoke_all(const resource&) const override;
virtual const resource_set& protected_resources() const override;
future<> validate_configuration() const override;
private:
bool legacy_metadata_exists() const;
future<bool> any_granted() const;
future<> migrate_legacy_metadata() const;
future<> modify(stdx::string_view, permission_set, const resource&, stdx::string_view) const;
future<> modify(::shared_ptr<authenticated_user>, permission_set, data_resource, sstring, sstring);
};
} /* namespace auth */


@@ -39,57 +39,35 @@
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "auth/password_authenticator.hh"
extern "C" {
#include <crypt.h>
#include <unistd.h>
}
#include <algorithm>
#include <chrono>
#include <crypt.h>
#include <random>
#include <chrono>
#include <boost/algorithm/cxx11/all_of.hpp>
#include <seastar/core/reactor.hh>
#include "auth/authenticated_user.hh"
#include "auth/common.hh"
#include "auth/roles-metadata.hh"
#include "cql3/untyped_result_set.hh"
#include "auth.hh"
#include "password_authenticator.hh"
#include "authenticated_user.hh"
#include "cql3/query_processor.hh"
#include "log.hh"
#include "service/migration_manager.hh"
#include "utils/class_registrator.hh"
namespace auth {
const sstring& password_authenticator_name() {
static const sstring name = meta::AUTH_PACKAGE_NAME + "PasswordAuthenticator";
return name;
}
const sstring auth::password_authenticator::PASSWORD_AUTHENTICATOR_NAME("org.apache.cassandra.auth.PasswordAuthenticator");
// name of the hash column.
static const sstring SALTED_HASH = "salted_hash";
static const sstring DEFAULT_USER_NAME = meta::DEFAULT_SUPERUSER_NAME;
static const sstring DEFAULT_USER_PASSWORD = meta::DEFAULT_SUPERUSER_NAME;
static const sstring USER_NAME = "username";
static const sstring DEFAULT_USER_NAME = auth::auth::DEFAULT_SUPERUSER_NAME;
static const sstring DEFAULT_USER_PASSWORD = auth::auth::DEFAULT_SUPERUSER_NAME;
static const sstring CREDENTIALS_CF = "credentials";
static logging::logger plogger("password_authenticator");
static logging::logger logger("password_authenticator");
// To ensure correct initialization order, we unfortunately need to use a string literal.
static const class_registrator<
authenticator,
password_authenticator,
cql3::query_processor&,
::service::migration_manager&> password_auth_reg("org.apache.cassandra.auth.PasswordAuthenticator");
auth::password_authenticator::~password_authenticator()
{}
password_authenticator::~password_authenticator() {
}
password_authenticator::password_authenticator(cql3::query_processor& qp, ::service::migration_manager& mm)
: _qp(qp)
, _migration_manager(mm)
, _stopped(make_ready_future<>()) {
}
auth::password_authenticator::password_authenticator()
{}
// TODO: blowfish
// Origin uses Java bcrypt library, i.e. blowfish salt
@@ -110,10 +88,12 @@ password_authenticator::password_authenticator(cql3::query_processor& qp, ::serv
// and some old-fashioned random salt generation.
static constexpr size_t rand_bytes = 16;
static thread_local crypt_data tlcrypt = { 0, };
static sstring hashpw(const sstring& pass, const sstring& salt) {
auto res = crypt_r(pass.c_str(), salt.c_str(), &tlcrypt);
// crypt_data is huge. should this be a thread_local static?
auto tmp = std::make_unique<crypt_data>();
tmp->initialized = 0;
auto res = crypt_r(pass.c_str(), salt.c_str(), tmp.get());
if (res == nullptr) {
throw std::system_error(errno, std::system_category());
}
@@ -142,14 +122,17 @@ static sstring gensalt() {
sstring salt;
if (!prefix.empty()) {
return prefix + input;
return prefix + salt;
}
auto tmp = std::make_unique<crypt_data>();
tmp->initialized = 0;
// Try in order:
// blowfish 2011 fix, blowfish, sha512, sha256, md5
for (sstring pfx : { "$2y$", "$2a$", "$6$", "$5$", "$1$" }) {
salt = pfx + input;
if (crypt_r("fisk", salt.c_str(), &tlcrypt)) {
if (crypt_r("fisk", salt.c_str(), tmp.get())) {
prefix = pfx;
return salt;
}
@@ -161,129 +144,65 @@ static sstring hashpw(const sstring& pass) {
return hashpw(pass, gensalt());
}
static bool has_salted_hash(const cql3::untyped_result_set_row& row) {
return !row.get_or<sstring>(SALTED_HASH, "").empty();
}
future<> auth::password_authenticator::init() {
gensalt(); // do this once to determine usable hashing
static const sstring update_row_query = sprint(
"UPDATE %s SET %s = ? WHERE %s = ?",
meta::roles_table::qualified_name(),
SALTED_HASH,
meta::roles_table::role_col_name);
sstring create_table = sprint(
"CREATE TABLE %s.%s ("
"%s text,"
"%s text," // salt + hash + number of rounds
"options map<text,text>,"// for future extensions
"PRIMARY KEY(%s)"
") WITH gc_grace_seconds=%d",
auth::auth::AUTH_KS,
CREDENTIALS_CF, USER_NAME, SALTED_HASH, USER_NAME,
90 * 24 * 60 * 60); // 3 months.
static const sstring legacy_table_name{"credentials"};
bool password_authenticator::legacy_metadata_exists() const {
return _qp.db().local().has_schema(meta::AUTH_KS, legacy_table_name);
}
future<> password_authenticator::migrate_legacy_metadata() const {
plogger.info("Starting migration of legacy authentication metadata.");
static const sstring query = sprint("SELECT * FROM %s.%s", meta::AUTH_KS, legacy_table_name);
return _qp.process(
query,
db::consistency_level::QUORUM,
infinite_timeout_config).then([this](::shared_ptr<cql3::untyped_result_set> results) {
return do_for_each(*results, [this](const cql3::untyped_result_set_row& row) {
auto username = row.get_as<sstring>("username");
auto salted_hash = row.get_as<sstring>(SALTED_HASH);
return _qp.process(
update_row_query,
consistency_for_user(username),
infinite_timeout_config,
{std::move(salted_hash), username}).discard_result();
}).finally([results] {});
}).then([] {
plogger.info("Finished migrating legacy authentication metadata.");
}).handle_exception([](std::exception_ptr ep) {
plogger.error("Encountered an error during migration!");
std::rethrow_exception(ep);
});
}
future<> password_authenticator::create_default_if_missing() const {
return default_role_row_satisfies(_qp, &has_salted_hash).then([this](bool exists) {
if (!exists) {
return _qp.process(
update_row_query,
db::consistency_level::QUORUM,
infinite_timeout_config,
{hashpw(DEFAULT_USER_PASSWORD), DEFAULT_USER_NAME}).then([](auto&&) {
plogger.info("Created default superuser authentication record.");
return auth::setup_table(CREDENTIALS_CF, create_table).then([this] {
// instead of once-timer, just schedule this later
auth::schedule_when_up([] {
return auth::has_existing_users(CREDENTIALS_CF, DEFAULT_USER_NAME, USER_NAME).then([](bool exists) {
if (!exists) {
cql3::get_local_query_processor().process(sprint("INSERT INTO %s.%s (%s, %s) VALUES (?, ?) USING TIMESTAMP 0",
auth::AUTH_KS,
CREDENTIALS_CF,
USER_NAME, SALTED_HASH
),
db::consistency_level::ONE, {DEFAULT_USER_NAME, hashpw(DEFAULT_USER_PASSWORD)}).then([](auto) {
logger.info("Created default user '{}'", DEFAULT_USER_NAME);
});
}
});
}
return make_ready_future<>();
});
});
}
future<> password_authenticator::start() {
return once_among_shards([this] {
gensalt(); // do this once to determine usable hashing
auto f = create_metadata_table_if_missing(
meta::roles_table::name,
_qp,
meta::roles_table::creation_query(),
_migration_manager);
_stopped = do_after_system_ready(_as, [this] {
return async([this] {
wait_for_schema_agreement(_migration_manager, _qp.db().local()).get0();
if (any_nondefault_role_row_satisfies(_qp, &has_salted_hash).get0()) {
if (legacy_metadata_exists()) {
plogger.warn("Ignoring legacy authentication metadata since nondefault data already exist.");
}
return;
}
if (legacy_metadata_exists()) {
migrate_legacy_metadata().get0();
return;
}
create_default_if_missing().get0();
});
});
return f;
});
}
future<> password_authenticator::stop() {
_as.request_abort();
return _stopped.handle_exception_type([] (const sleep_aborted&) { });
}
db::consistency_level password_authenticator::consistency_for_user(stdx::string_view role_name) {
if (role_name == DEFAULT_USER_NAME) {
db::consistency_level auth::password_authenticator::consistency_for_user(const sstring& username) {
if (username == DEFAULT_USER_NAME) {
return db::consistency_level::QUORUM;
}
return db::consistency_level::LOCAL_ONE;
}
const sstring& password_authenticator::qualified_java_name() const {
return password_authenticator_name();
const sstring& auth::password_authenticator::class_name() const {
return PASSWORD_AUTHENTICATOR_NAME;
}
bool password_authenticator::require_authentication() const {
bool auth::password_authenticator::require_authentication() const {
return true;
}
authentication_option_set password_authenticator::supported_options() const {
return authentication_option_set{authentication_option::password};
auth::authenticator::option_set auth::password_authenticator::supported_options() const {
return option_set::of<option::PASSWORD>();
}
authentication_option_set password_authenticator::alterable_options() const {
return authentication_option_set{authentication_option::password};
auth::authenticator::option_set auth::password_authenticator::alterable_options() const {
return option_set::of<option::PASSWORD>();
}
future<authenticated_user> password_authenticator::authenticate(
const credentials_map& credentials) const {
future<::shared_ptr<auth::authenticated_user> > auth::password_authenticator::authenticate(
const credentials_map& credentials) const
throw (exceptions::authentication_exception) {
if (!credentials.count(USERNAME_KEY)) {
throw exceptions::authentication_exception(sprint("Required key '%s' is missing", USERNAME_KEY));
}
@@ -300,25 +219,17 @@ future<authenticated_user> password_authenticator::authenticate(
// Rely on the query processor caching statements instead, and let's assume
// that a string->statement map lookup is not going to cost us much.
return futurize_apply([this, username, password] {
static const sstring query = sprint(
"SELECT %s FROM %s WHERE %s = ?",
SALTED_HASH,
meta::roles_table::qualified_name(),
meta::roles_table::role_col_name);
return _qp.process(
query,
consistency_for_user(username),
infinite_timeout_config,
{username},
true);
auto& qp = cql3::get_local_query_processor();
return qp.process(sprint("SELECT %s FROM %s.%s WHERE %s = ?", SALTED_HASH,
auth::AUTH_KS, CREDENTIALS_CF, USER_NAME),
consistency_for_user(username), {username}, true);
}).then_wrapped([=](future<::shared_ptr<cql3::untyped_result_set>> f) {
try {
auto res = f.get0();
if (res->empty() || !checkpw(password, res->one().get_as<sstring>(SALTED_HASH))) {
throw exceptions::authentication_exception("Username and/or password are incorrect");
}
return make_ready_future<authenticated_user>(username);
return make_ready_future<::shared_ptr<authenticated_user>>(::make_shared<authenticated_user>(username));
} catch (std::system_error &) {
std::throw_with_nested(exceptions::authentication_exception("Could not verify password"));
} catch (exceptions::request_execution_exception& e) {
@@ -329,62 +240,60 @@ future<authenticated_user> password_authenticator::authenticate(
});
}
future<> password_authenticator::create(stdx::string_view role_name, const authentication_options& options) const {
if (!options.password) {
return make_ready_future<>();
future<> auth::password_authenticator::create(sstring username,
const option_map& options)
throw (exceptions::request_validation_exception,
exceptions::request_execution_exception) {
try {
auto password = boost::any_cast<sstring>(options.at(option::PASSWORD));
auto query = sprint("INSERT INTO %s.%s (%s, %s) VALUES (?, ?)",
auth::AUTH_KS, CREDENTIALS_CF, USER_NAME, SALTED_HASH);
auto& qp = cql3::get_local_query_processor();
return qp.process(query, consistency_for_user(username), { username, hashpw(password) }).discard_result();
} catch (std::out_of_range&) {
throw exceptions::invalid_request_exception("PasswordAuthenticator requires PASSWORD option");
}
return _qp.process(
update_row_query,
consistency_for_user(role_name),
infinite_timeout_config,
{hashpw(*options.password), sstring(role_name)}).discard_result();
}
future<> password_authenticator::alter(stdx::string_view role_name, const authentication_options& options) const {
if (!options.password) {
return make_ready_future<>();
future<> auth::password_authenticator::alter(sstring username,
const option_map& options)
throw (exceptions::request_validation_exception,
exceptions::request_execution_exception) {
try {
auto password = boost::any_cast<sstring>(options.at(option::PASSWORD));
auto query = sprint("UPDATE %s.%s SET %s = ? WHERE %s = ?",
auth::AUTH_KS, CREDENTIALS_CF, SALTED_HASH, USER_NAME);
auto& qp = cql3::get_local_query_processor();
return qp.process(query, consistency_for_user(username), { hashpw(password), username }).discard_result();
} catch (std::out_of_range&) {
throw exceptions::invalid_request_exception("PasswordAuthenticator requires PASSWORD option");
}
static const sstring query = sprint(
"UPDATE %s SET %s = ? WHERE %s = ?",
meta::roles_table::qualified_name(),
SALTED_HASH,
meta::roles_table::role_col_name);
return _qp.process(
query,
consistency_for_user(role_name),
infinite_timeout_config,
{hashpw(*options.password), sstring(role_name)}).discard_result();
}
future<> password_authenticator::drop(stdx::string_view name) const {
static const sstring query = sprint(
"DELETE %s FROM %s WHERE %s = ?",
SALTED_HASH,
meta::roles_table::qualified_name(),
meta::roles_table::role_col_name);
return _qp.process(query, consistency_for_user(name), infinite_timeout_config, {sstring(name)}).discard_result();
future<> auth::password_authenticator::drop(sstring username)
throw (exceptions::request_validation_exception,
exceptions::request_execution_exception) {
try {
auto query = sprint("DELETE FROM %s.%s WHERE %s = ?",
auth::AUTH_KS, CREDENTIALS_CF, USER_NAME);
auto& qp = cql3::get_local_query_processor();
return qp.process(query, consistency_for_user(username), { username }).discard_result();
} catch (std::out_of_range&) {
throw exceptions::invalid_request_exception("PasswordAuthenticator requires PASSWORD option");
}
}
future<custom_options> password_authenticator::query_custom_options(stdx::string_view role_name) const {
return make_ready_future<custom_options>();
const auth::resource_ids& auth::password_authenticator::protected_resources() const {
static const resource_ids ids({ data_resource(auth::AUTH_KS, CREDENTIALS_CF) });
return ids;
}
const resource_set& password_authenticator::protected_resources() const {
static const resource_set resources({make_data_resource(meta::AUTH_KS, meta::roles_table::name)});
return resources;
}
::shared_ptr<authenticator::sasl_challenge> password_authenticator::new_sasl_challenge() const {
class plain_text_password_challenge : public sasl_challenge {
const password_authenticator& _self;
::shared_ptr<auth::authenticator::sasl_challenge> auth::password_authenticator::new_sasl_challenge() const {
class plain_text_password_challenge: public sasl_challenge {
public:
plain_text_password_challenge(const password_authenticator& self) : _self(self) {
}
plain_text_password_challenge(const password_authenticator& a)
: _authenticator(a)
{}
/**
* SASL PLAIN mechanism specifies that credentials are encoded in a
@@ -399,8 +308,9 @@ const resource_set& password_authenticator::protected_resources() const {
* would expect
* @throws javax.security.sasl.SaslException
*/
bytes evaluate_response(bytes_view client_response) override {
plogger.debug("Decoding credentials from client token");
bytes evaluate_response(bytes_view client_response)
throw (exceptions::authentication_exception) override {
logger.debug("Decoding credentials from client token");
sstring username, password;
@@ -434,19 +344,17 @@ const resource_set& password_authenticator::protected_resources() const {
_complete = true;
return {};
}
bool is_complete() const override {
return _complete;
}
future<authenticated_user> get_authenticated_user() const override {
return _self.authenticate(_credentials);
future<::shared_ptr<authenticated_user>> get_authenticated_user() const
throw (exceptions::authentication_exception) override {
return _authenticator.authenticate(_credentials);
}
private:
const password_authenticator& _authenticator;
credentials_map _credentials;
bool _complete = false;
};
return ::make_shared<plain_text_password_challenge>(*this);
}
}
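
For reference, the SASL PLAIN token that evaluate_response() receives has the layout defined by RFC 4616: an optional authorization identity, a NUL byte, the username, a NUL byte, then the password. The decoding code itself is elided by the hunk above, so what follows is only a minimal standalone sketch of that layout, not the Scylla implementation. The names plain_credentials and decode_sasl_plain are hypothetical, and it uses std types rather than bytes_view and sstring.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical holder for the three fields of a PLAIN token.
struct plain_credentials {
    std::string authzid;   // optional, may be empty
    std::string username;  // "authcid" in RFC 4616
    std::string password;
};

// Decode "[authzid] NUL authcid NUL passwd".
// Returns std::nullopt when the token does not have exactly two NUL
// separators, or when the username or password is empty.
std::optional<plain_credentials> decode_sasl_plain(std::string_view token) {
    const auto first = token.find('\0');
    if (first == std::string_view::npos) {
        return std::nullopt;
    }
    const auto second = token.find('\0', first + 1);
    if (second == std::string_view::npos) {
        return std::nullopt;
    }
    // A third NUL would mean a malformed token.
    if (token.find('\0', second + 1) != std::string_view::npos) {
        return std::nullopt;
    }
    plain_credentials c{
        std::string(token.substr(0, first)),
        std::string(token.substr(first + 1, second - first - 1)),
        std::string(token.substr(second + 1))};
    if (c.username.empty() || c.password.empty()) {
        return std::nullopt;
    }
    return c;
}
```

A caller would typically turn a std::nullopt result into an authentication error before looking up the salted hash.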


@@ -41,64 +41,32 @@
#pragma once
#include <seastar/core/abort_source.hh>
#include "auth/authenticator.hh"
#include "cql3/query_processor.hh"
namespace service {
class migration_manager;
}
#include "authenticator.hh"
namespace auth {
const sstring& password_authenticator_name();
class password_authenticator : public authenticator {
cql3::query_processor& _qp;
::service::migration_manager& _migration_manager;
future<> _stopped;
seastar::abort_source _as;
public:
static db::consistency_level consistency_for_user(stdx::string_view role_name);
password_authenticator(cql3::query_processor&, ::service::migration_manager&);
static const sstring PASSWORD_AUTHENTICATOR_NAME;
password_authenticator();
~password_authenticator();
virtual future<> start() override;
future<> init();
virtual future<> stop() override;
const sstring& class_name() const override;
bool require_authentication() const override;
option_set supported_options() const override;
option_set alterable_options() const override;
future<::shared_ptr<authenticated_user>> authenticate(const credentials_map& credentials) const throw(exceptions::authentication_exception) override;
future<> create(sstring username, const option_map& options) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) override;
future<> alter(sstring username, const option_map& options) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) override;
future<> drop(sstring username) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) override;
const resource_ids& protected_resources() const override;
::shared_ptr<sasl_challenge> new_sasl_challenge() const override;
virtual const sstring& qualified_java_name() const override;
virtual bool require_authentication() const override;
virtual authentication_option_set supported_options() const override;
virtual authentication_option_set alterable_options() const override;
virtual future<authenticated_user> authenticate(const credentials_map& credentials) const override;
virtual future<> create(stdx::string_view role_name, const authentication_options& options) const override;
virtual future<> alter(stdx::string_view role_name, const authentication_options& options) const override;
virtual future<> drop(stdx::string_view role_name) const override;
virtual future<custom_options> query_custom_options(stdx::string_view role_name) const override;
virtual const resource_set& protected_resources() const override;
virtual ::shared_ptr<sasl_challenge> new_sasl_challenge() const override;
private:
bool legacy_metadata_exists() const;
future<> migrate_legacy_metadata() const;
future<> create_default_if_missing() const;
static db::consistency_level consistency_for_user(const sstring& username);
};
}


@@ -39,33 +39,32 @@
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "auth/permission.hh"
#include <boost/algorithm/string.hpp>
#include <unordered_map>
#include <boost/algorithm/string.hpp>
#include "permission.hh"
const auth::permission_set auth::permissions::ALL = auth::permission_set::of<
auth::permission::CREATE,
auth::permission::ALTER,
auth::permission::DROP,
auth::permission::SELECT,
auth::permission::MODIFY,
auth::permission::AUTHORIZE,
auth::permission::DESCRIBE>();
const auth::permission_set auth::permissions::ALL_DATA =
auth::permission_set::of<auth::permission::CREATE,
auth::permission::ALTER, auth::permission::DROP,
auth::permission::SELECT,
auth::permission::MODIFY,
auth::permission::AUTHORIZE>();
const auth::permission_set auth::permissions::ALL = auth::permissions::ALL_DATA;
const auth::permission_set auth::permissions::NONE;
const auth::permission_set auth::permissions::ALTERATIONS =
auth::permission_set::of<auth::permission::CREATE,
auth::permission::ALTER, auth::permission::DROP>();
static const std::unordered_map<sstring, auth::permission> permission_names({
{"READ", auth::permission::READ},
{"WRITE", auth::permission::WRITE},
{"CREATE", auth::permission::CREATE},
{"ALTER", auth::permission::ALTER},
{"DROP", auth::permission::DROP},
{"SELECT", auth::permission::SELECT},
{"MODIFY", auth::permission::MODIFY},
{"AUTHORIZE", auth::permission::AUTHORIZE},
{"DESCRIBE", auth::permission::DESCRIBE}});
{ "READ", auth::permission::READ },
{ "WRITE", auth::permission::WRITE },
{ "CREATE", auth::permission::CREATE },
{ "ALTER", auth::permission::ALTER },
{ "DROP", auth::permission::DROP },
{ "SELECT", auth::permission::SELECT },
{ "MODIFY", auth::permission::MODIFY },
{ "AUTHORIZE", auth::permission::AUTHORIZE },
});
const sstring& auth::permissions::to_string(permission p) {
for (auto& v : permission_names) {


@@ -42,11 +42,9 @@
#pragma once
#include <unordered_set>
#include <seastar/core/sstring.hh>
#include "enum_set.hh"
#include "seastarx.hh"
namespace auth {
@@ -67,13 +65,9 @@ enum class permission {
// permission management
AUTHORIZE, // required for GRANT and REVOKE.
DESCRIBE, // required on the root-level role resource to list all roles.
};
typedef enum_set<
super_enum<
permission,
typedef enum_set<super_enum<permission,
permission::READ,
permission::WRITE,
permission::CREATE,
@@ -81,15 +75,16 @@ typedef enum_set<
permission::DROP,
permission::SELECT,
permission::MODIFY,
permission::AUTHORIZE,
permission::DESCRIBE>> permission_set;
permission::AUTHORIZE>> permission_set;
bool operator<(const permission_set&, const permission_set&);
namespace permissions {
extern const permission_set ALL_DATA;
extern const permission_set ALL;
extern const permission_set NONE;
extern const permission_set ALTERATIONS;
const sstring& to_string(permission);
permission from_string(const sstring&);
@@ -97,6 +92,7 @@ permission from_string(const sstring&);
std::unordered_set<sstring> to_strings(const permission_set&);
permission_set from_strings(const std::unordered_set<sstring>&);
}
}


@@ -1,53 +0,0 @@
/*
* Copyright (C) 2017 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "auth/permissions_cache.hh"
#include "auth/authorizer.hh"
#include "auth/common.hh"
#include "auth/service.hh"
#include "db/config.hh"
namespace auth {
permissions_cache_config permissions_cache_config::from_db_config(const db::config& dc) {
permissions_cache_config c;
c.max_entries = dc.permissions_cache_max_entries();
c.validity_period = std::chrono::milliseconds(dc.permissions_validity_in_ms());
c.update_period = std::chrono::milliseconds(dc.permissions_update_interval_in_ms());
return c;
}
permissions_cache::permissions_cache(const permissions_cache_config& c, service& ser, logging::logger& log)
: _cache(c.max_entries, c.validity_period, c.update_period, log, [&ser, &log](const key_type& k) {
log.debug("Refreshing permissions for {}", k.first);
return ser.get_uncached_permissions(k.first, k.second);
}) {
}
future<permission_set> permissions_cache::get(const role_or_anonymous& maybe_role, const resource& r) {
return do_with(key_type(maybe_role, r), [this](const auto& k) {
return _cache.get(k);
});
}
}


@@ -1,91 +0,0 @@
/*
* Copyright (C) 2017 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <chrono>
#include <experimental/string_view>
#include <functional>
#include <iostream>
#include <optional>
#include <utility>
#include <seastar/core/future.hh>
#include <seastar/core/shared_ptr.hh>
#include <seastar/core/sstring.hh>
#include "auth/authenticated_user.hh"
#include "auth/permission.hh"
#include "auth/resource.hh"
#include "auth/role_or_anonymous.hh"
#include "log.hh"
#include "stdx.hh"
#include "utils/hash.hh"
#include "utils/loading_cache.hh"
namespace std {
inline std::ostream& operator<<(std::ostream& os, const pair<auth::role_or_anonymous, auth::resource>& p) {
os << "{role: " << p.first << ", resource: " << p.second << "}";
return os;
}
}
namespace db {
class config;
}
namespace auth {
class service;
struct permissions_cache_config final {
static permissions_cache_config from_db_config(const db::config&);
std::size_t max_entries;
std::chrono::milliseconds validity_period;
std::chrono::milliseconds update_period;
};
class permissions_cache final {
using cache_type = utils::loading_cache<
std::pair<role_or_anonymous, resource>,
permission_set,
utils::loading_cache_reload_enabled::yes,
utils::simple_entry_size<permission_set>,
utils::tuple_hash>;
using key_type = typename cache_type::key_type;
cache_type _cache;
public:
explicit permissions_cache(const permissions_cache_config&, service&, logging::logger&);
future <> stop() {
return _cache.stop();
}
future<permission_set> get(const role_or_anonymous&, const resource&);
};
}


@@ -1,296 +0,0 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright (C) 2016 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "auth/resource.hh"
#include <algorithm>
#include <iterator>
#include <unordered_map>
#include <boost/algorithm/string/join.hpp>
#include <boost/algorithm/string/split.hpp>
#include "service/storage_proxy.hh"
namespace auth {
std::ostream& operator<<(std::ostream& os, resource_kind kind) {
switch (kind) {
case resource_kind::data: os << "data"; break;
case resource_kind::role: os << "role"; break;
}
return os;
}
static const std::unordered_map<resource_kind, stdx::string_view> roots{
{resource_kind::data, "data"},
{resource_kind::role, "roles"}};
static const std::unordered_map<resource_kind, std::size_t> max_parts{
{resource_kind::data, 2},
{resource_kind::role, 1}};
static permission_set applicable_permissions(const data_resource_view& dv) {
if (dv.table()) {
return permission_set::of<
permission::ALTER,
permission::DROP,
permission::SELECT,
permission::MODIFY,
permission::AUTHORIZE>();
}
return permission_set::of<
permission::CREATE,
permission::ALTER,
permission::DROP,
permission::SELECT,
permission::MODIFY,
permission::AUTHORIZE>();
}
static permission_set applicable_permissions(const role_resource_view& rv) {
if (rv.role()) {
return permission_set::of<permission::ALTER, permission::DROP, permission::AUTHORIZE>();
}
return permission_set::of<
permission::CREATE,
permission::ALTER,
permission::DROP,
permission::AUTHORIZE,
permission::DESCRIBE>();
}
resource::resource(resource_kind kind) : _kind(kind), _parts{sstring(roots.at(kind))} {
}
resource::resource(resource_kind kind, std::vector<sstring> parts) : resource(kind) {
_parts.reserve(parts.size() + 1);
_parts.insert(_parts.end(), std::make_move_iterator(parts.begin()), std::make_move_iterator(parts.end()));
}
resource::resource(data_resource_t, stdx::string_view keyspace)
: resource(resource_kind::data, std::vector<sstring>{sstring(keyspace)}) {
}
resource::resource(data_resource_t, stdx::string_view keyspace, stdx::string_view table)
: resource(resource_kind::data, std::vector<sstring>{sstring(keyspace), sstring(table)}) {
}
resource::resource(role_resource_t, stdx::string_view role)
: resource(resource_kind::role, std::vector<sstring>{sstring(role)}) {
}
sstring resource::name() const {
return boost::algorithm::join(_parts, "/");
}
std::optional<resource> resource::parent() const {
if (_parts.size() == 1) {
return {};
}
resource copy = *this;
copy._parts.pop_back();
return copy;
}
permission_set resource::applicable_permissions() const {
permission_set ps;
switch (_kind) {
case resource_kind::data: ps = ::auth::applicable_permissions(data_resource_view(*this)); break;
case resource_kind::role: ps = ::auth::applicable_permissions(role_resource_view(*this)); break;
}
return ps;
}
bool operator<(const resource& r1, const resource& r2) {
if (r1._kind != r2._kind) {
return r1._kind < r2._kind;
}
return std::lexicographical_compare(
r1._parts.cbegin() + 1,
r1._parts.cend(),
r2._parts.cbegin() + 1,
r2._parts.cend());
}
std::ostream& operator<<(std::ostream& os, const resource& r) {
switch (r.kind()) {
case resource_kind::data: return os << data_resource_view(r);
case resource_kind::role: return os << role_resource_view(r);
}
return os;
}
data_resource_view::data_resource_view(const resource& r) : _resource(r) {
if (r._kind != resource_kind::data) {
throw resource_kind_mismatch(resource_kind::data, r._kind);
}
}
std::optional<stdx::string_view> data_resource_view::keyspace() const {
if (_resource._parts.size() == 1) {
return {};
}
return _resource._parts[1];
}
std::optional<stdx::string_view> data_resource_view::table() const {
if (_resource._parts.size() <= 2) {
return {};
}
return _resource._parts[2];
}
std::ostream& operator<<(std::ostream& os, const data_resource_view& v) {
const auto keyspace = v.keyspace();
const auto table = v.table();
if (!keyspace) {
os << "<all keyspaces>";
} else if (!table) {
os << "<keyspace " << *keyspace << '>';
} else {
os << "<table " << *keyspace << '.' << *table << '>';
}
return os;
}
role_resource_view::role_resource_view(const resource& r) : _resource(r) {
if (r._kind != resource_kind::role) {
throw resource_kind_mismatch(resource_kind::role, r._kind);
}
}
std::optional<stdx::string_view> role_resource_view::role() const {
if (_resource._parts.size() == 1) {
return {};
}
return _resource._parts[1];
}
std::ostream& operator<<(std::ostream& os, const role_resource_view& v) {
const auto role = v.role();
if (!role) {
os << "<all roles>";
} else {
os << "<role " << *role << '>';
}
return os;
}
resource parse_resource(stdx::string_view name) {
static const std::unordered_map<stdx::string_view, resource_kind> reverse_roots = [] {
std::unordered_map<stdx::string_view, resource_kind> result;
for (const auto& pair : roots) {
result.emplace(pair.second, pair.first);
}
return result;
}();
std::vector<sstring> parts;
boost::split(parts, name, [](char ch) { return ch == '/'; });
if (parts.empty()) {
throw invalid_resource_name(name);
}
const auto iter = reverse_roots.find(parts[0]);
if (iter == reverse_roots.end()) {
throw invalid_resource_name(name);
}
const auto kind = iter->second;
parts.erase(parts.begin());
if (parts.size() > max_parts.at(kind)) {
throw invalid_resource_name(name);
}
return resource(kind, std::move(parts));
}
static const resource the_root_data_resource{resource_kind::data};
const resource& root_data_resource() {
return the_root_data_resource;
}
static const resource the_root_role_resource{resource_kind::role};
const resource& root_role_resource() {
return the_root_role_resource;
}
resource_set expand_resource_family(const resource& rr) {
resource r = rr;
resource_set rs;
while (true) {
const auto pr = r.parent();
rs.insert(std::move(r));
if (!pr) {
break;
}
r = std::move(*pr);
}
return rs;
}
}
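
The name handling in parse_resource() above can be summarized as: split on '/', map the first token to a resource kind, then reject names with more parts than that kind allows (two for "data", one for "roles"). The sketch below restates those steps with only the standard library, assuming std::string in place of sstring and a hand-written splitter in place of boost::split. The names kind, parsed_resource, split_on_slash and parse_name are hypothetical and do not appear in the diff.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

enum class kind { data, role };

// Hypothetical result type: the kind plus the parts that follow the root.
struct parsed_resource {
    kind k;
    std::vector<std::string> parts;
};

// Split on '/', keeping empty tokens, so the result is never empty.
static std::vector<std::string> split_on_slash(const std::string& name) {
    std::vector<std::string> out;
    std::size_t start = 0;
    while (true) {
        const auto pos = name.find('/', start);
        if (pos == std::string::npos) {
            out.push_back(name.substr(start));
            return out;
        }
        out.push_back(name.substr(start, pos - start));
        start = pos + 1;
    }
}

parsed_resource parse_name(const std::string& name) {
    std::vector<std::string> tokens = split_on_slash(name);
    kind k;
    std::size_t max_parts;
    // Roots and part limits mirror the `roots` and `max_parts` tables above.
    if (tokens[0] == "data") {
        k = kind::data;
        max_parts = 2;  // keyspace, table
    } else if (tokens[0] == "roles") {
        k = kind::role;
        max_parts = 1;  // role name
    } else {
        throw std::invalid_argument("invalid resource name: " + name);
    }
    tokens.erase(tokens.begin());  // drop the root token
    if (tokens.size() > max_parts) {
        throw std::invalid_argument("invalid resource name: " + name);
    }
    return parsed_resource{k, std::move(tokens)};
}
```

With this sketch, "data" parses to the root data resource (no parts), "data/ks/tbl" to a table, and "data/ks/tbl/extra" is rejected.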


@@ -1,254 +0,0 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright (C) 2016 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <experimental/string_view>
#include <iostream>
#include <optional>
#include <stdexcept>
#include <tuple>
#include <vector>
#include <unordered_set>
#include <seastar/core/print.hh>
#include <seastar/core/sstring.hh>
#include "auth/permission.hh"
#include "seastarx.hh"
#include "stdx.hh"
#include "utils/hash.hh"
namespace auth {
class invalid_resource_name : public std::invalid_argument {
public:
explicit invalid_resource_name(stdx::string_view name)
: std::invalid_argument(sprint("The resource name '%s' is invalid.", name)) {
}
};
enum class resource_kind {
data, role
};
std::ostream& operator<<(std::ostream&, resource_kind);
///
/// Type tag for constructing data resources.
///
struct data_resource_t final {};
///
/// Type tag for constructing role resources.
///
struct role_resource_t final {};
///
/// Resources are entities that users can be granted permissions on.
///
/// There are data (keyspaces and tables) and role resources. There may be other kinds of resources in the future.
///
/// When they are stored as system metadata, resources have the form `root/part_0/part_1/.../part_n`. Each kind of
/// resource has a specific root prefix, followed by a maximum of `n` parts (where `n` is distinct for each kind of
/// resource as well). In this code, this form is called the "name".
///
/// Since all resources have this same structure, all the different kinds are stored in instances of the same class:
/// \ref resource. When we wish to query a resource for kind-specific data (like the table of a "data" resource), we
/// create a kind-specific "view" of the resource.
///
class resource final {
resource_kind _kind;
std::vector<sstring> _parts;
public:
///
/// A root resource of a particular kind.
///
explicit resource(resource_kind);
resource(data_resource_t, stdx::string_view keyspace);
resource(data_resource_t, stdx::string_view keyspace, stdx::string_view table);
resource(role_resource_t, stdx::string_view role);
resource_kind kind() const noexcept {
return _kind;
}
///
/// A machine-friendly identifier unique to each resource.
///
sstring name() const;
std::optional<resource> parent() const;
permission_set applicable_permissions() const;
private:
resource(resource_kind, std::vector<sstring> parts);
friend class std::hash<resource>;
friend class data_resource_view;
friend class role_resource_view;
friend bool operator<(const resource&, const resource&);
friend bool operator==(const resource&, const resource&);
friend resource parse_resource(stdx::string_view);
};
bool operator<(const resource&, const resource&);
inline bool operator==(const resource& r1, const resource& r2) {
return (r1._kind == r2._kind) && (r1._parts == r2._parts);
}
inline bool operator!=(const resource& r1, const resource& r2) {
return !(r1 == r2);
}
std::ostream& operator<<(std::ostream&, const resource&);
class resource_kind_mismatch : public std::invalid_argument {
public:
explicit resource_kind_mismatch(resource_kind expected, resource_kind actual)
: std::invalid_argument(
sprint("This resource has kind '%s', but was expected to have kind '%s'.", actual, expected)) {
}
};
/// A "data" view of \ref resource.
///
/// If neither `keyspace` nor `table` is present, this is the root resource.
class data_resource_view final {
const resource& _resource;
public:
///
/// \throws \ref resource_kind_mismatch if the argument is not a "data" resource.
///
explicit data_resource_view(const resource& r);
std::optional<stdx::string_view> keyspace() const;
std::optional<stdx::string_view> table() const;
};
std::ostream& operator<<(std::ostream&, const data_resource_view&);
///
/// A "role" view of \ref resource.
///
/// If `role` is not present, this is the root resource.
///
class role_resource_view final {
const resource& _resource;
public:
///
/// \throws \ref resource_kind_mismatch if the argument is not a "role" resource.
///
explicit role_resource_view(const resource&);
std::optional<stdx::string_view> role() const;
};
std::ostream& operator<<(std::ostream&, const role_resource_view&);
///
/// Parse a resource from its name.
///
/// \throws \ref invalid_resource_name when the name is malformed.
///
resource parse_resource(stdx::string_view name);
const resource& root_data_resource();
inline resource make_data_resource(stdx::string_view keyspace) {
return resource(data_resource_t{}, keyspace);
}
inline resource make_data_resource(stdx::string_view keyspace, stdx::string_view table) {
return resource(data_resource_t{}, keyspace, table);
}
const resource& root_role_resource();
inline resource make_role_resource(stdx::string_view role) {
return resource(role_resource_t{}, role);
}
}
namespace std {
template <>
struct hash<auth::resource> {
static size_t hash_data(const auth::data_resource_view& dv) {
return utils::tuple_hash()(std::make_tuple(auth::resource_kind::data, dv.keyspace(), dv.table()));
}
static size_t hash_role(const auth::role_resource_view& rv) {
return utils::tuple_hash()(std::make_tuple(auth::resource_kind::role, rv.role()));
}
size_t operator()(const auth::resource& r) const {
std::size_t value = 0;
switch (r._kind) {
case auth::resource_kind::data: value = hash_data(auth::data_resource_view(r)); break;
case auth::resource_kind::role: value = hash_role(auth::role_resource_view(r)); break;
}
return value;
}
};
}
namespace auth {
using resource_set = std::unordered_set<resource>;
//
// A resource and all of its parents.
//
resource_set expand_resource_family(const resource&);
}
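
A rough illustration of the `root/part_0/.../part_n` naming scheme documented for `resource`. It assumes the roots are spelled `data` and `roles` and allow at most two and one parts respectively; the actual root table is not visible in this header. `parsed_name`, `parse_name` and `is_valid_name` are made-up names for this sketch only.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

struct parsed_name {
    std::string root;
    std::vector<std::string> parts;
};

// ASSUMPTION: root spellings and per-kind limits are guesses for illustration.
inline const std::map<std::string, std::size_t>& max_parts_by_root() {
    static const std::map<std::string, std::size_t> limits{{"data", 2}, {"roles", 1}};
    return limits;
}

// Splits on '/', checks the root, then enforces the per-kind part limit.
parsed_name parse_name(const std::string& name) {
    std::vector<std::string> pieces;
    std::string piece;
    std::istringstream in(name);
    while (std::getline(in, piece, '/')) {
        pieces.push_back(piece);
    }
    if (pieces.empty()) {
        throw std::invalid_argument("empty resource name");
    }
    const auto iter = max_parts_by_root().find(pieces.front());
    if (iter == max_parts_by_root().end()) {
        throw std::invalid_argument("unknown root in '" + name + "'");
    }
    pieces.erase(pieces.begin());
    if (pieces.size() > iter->second) {
        throw std::invalid_argument("too many parts in '" + name + "'");
    }
    return parsed_name{iter->first, std::move(pieces)};
}

// Convenience wrapper: true when parse_name() accepts the name.
bool is_valid_name(const std::string& name) {
    try {
        parse_name(name);
        return true;
    } catch (const std::invalid_argument&) {
        return false;
    }
}
```

Under these assumptions, a bare root such as `data` parses to zero parts, which corresponds to the root resource of that kind.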


@@ -1,169 +0,0 @@
/*
* Copyright (C) 2017 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <experimental/string_view>
#include <memory>
#include <optional>
#include <stdexcept>
#include <unordered_set>
#include <seastar/core/future.hh>
#include <seastar/core/print.hh>
#include <seastar/core/sstring.hh>
#include "auth/resource.hh"
#include "seastarx.hh"
#include "stdx.hh"
namespace auth {
struct role_config final {
bool is_superuser{false};
bool can_login{false};
};
///
/// Differential update for altering existing roles.
///
struct role_config_update final {
std::optional<bool> is_superuser{};
std::optional<bool> can_login{};
};
///
/// A logical argument error for a role-management operation.
///
class roles_argument_exception : public std::invalid_argument {
public:
using std::invalid_argument::invalid_argument;
};
class role_already_exists : public roles_argument_exception {
public:
explicit role_already_exists(stdx::string_view role_name)
: roles_argument_exception(sprint("Role %s already exists.", role_name)) {
}
};
class nonexistant_role : public roles_argument_exception {
public:
explicit nonexistant_role(stdx::string_view role_name)
: roles_argument_exception(sprint("Role %s doesn't exist.", role_name)) {
}
};
class role_already_included : public roles_argument_exception {
public:
role_already_included(stdx::string_view grantee_name, stdx::string_view role_name)
: roles_argument_exception(
sprint("%s already includes role %s.", grantee_name, role_name)) {
}
};
class revoke_ungranted_role : public roles_argument_exception {
public:
revoke_ungranted_role(stdx::string_view revokee_name, stdx::string_view role_name)
: roles_argument_exception(
sprint("%s was not granted role %s, so it cannot be revoked.", revokee_name, role_name)) {
}
};
using role_set = std::unordered_set<sstring>;
enum class recursive_role_query { yes, no };
///
/// Abstract client for managing roles.
///
/// All state necessary for managing roles is stored externally to the client instance.
///
/// All implementations should throw role-related exceptions as documented. Authorization is not addressed here, and
/// access-control should never be enforced in implementations.
///
class role_manager {
public:
virtual ~role_manager() = default;
virtual stdx::string_view qualified_java_name() const noexcept = 0;
virtual const resource_set& protected_resources() const = 0;
virtual future<> start() = 0;
virtual future<> stop() = 0;
///
/// \returns an exceptional future with \ref role_already_exists for a role that has previously been created.
///
virtual future<> create(stdx::string_view role_name, const role_config&) const = 0;
///
/// \returns an exceptional future with \ref nonexistant_role if the role does not exist.
///
virtual future<> drop(stdx::string_view role_name) const = 0;
///
/// \returns an exceptional future with \ref nonexistant_role if the role does not exist.
///
virtual future<> alter(stdx::string_view role_name, const role_config_update&) const = 0;
///
/// Grant `role_name` to `grantee_name`.
///
/// \returns an exceptional future with \ref nonexistant_role if either the role or the grantee does not exist.
///
/// \returns an exceptional future with \ref role_already_included if granting the role would be redundant or
/// would create a cycle.
///
virtual future<> grant(stdx::string_view grantee_name, stdx::string_view role_name) const = 0;
///
/// Revoke `role_name` from `revokee_name`.
///
/// \returns an exceptional future with \ref nonexistant_role if either the role or the revokee does not exist.
///
/// \returns an exceptional future with \ref revoke_ungranted_role if the role was not granted.
///
virtual future<> revoke(stdx::string_view revokee_name, stdx::string_view role_name) const = 0;
///
/// \returns an exceptional future with \ref nonexistant_role if the role does not exist.
///
virtual future<role_set> query_granted(stdx::string_view grantee, recursive_role_query) const = 0;
virtual future<role_set> query_all() const = 0;
virtual future<bool> exists(stdx::string_view role_name) const = 0;
///
/// \returns an exceptional future with \ref nonexistant_role if the role does not exist.
///
virtual future<bool> is_superuser(stdx::string_view role_name) const = 0;
///
/// \returns an exceptional future with \ref nonexistant_role if the role does not exist.
///
virtual future<bool> can_login(stdx::string_view role_name) const = 0;
};
}
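
To make the `grant` and `query_granted` contracts above concrete, here is a synchronous, in-memory sketch. It is not the `standard_role_manager`: `grant_graph`, `grant_result` and both free functions are hypothetical, results are returned as an enum instead of exceptional futures, and the grantee itself is assumed not to be part of its own granted set.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

using role_set = std::unordered_set<std::string>;
// Maps each existing role to the roles granted directly to it.
using grant_graph = std::unordered_map<std::string, role_set>;

enum class grant_result { ok, nonexistent_role, already_included };

// Direct grants only, or the transitive closure when `recursive` is true.
role_set query_granted(const grant_graph& g, const std::string& grantee, bool recursive) {
    role_set found;
    std::vector<std::string> pending{grantee};
    while (!pending.empty()) {
        const std::string current = pending.back();
        pending.pop_back();
        const auto it = g.find(current);
        if (it != g.end()) {
            for (const auto& role : it->second) {
                // Only newly seen roles are queued, so cycles cannot loop forever.
                if (found.insert(role).second && recursive) {
                    pending.push_back(role);
                }
            }
        }
        if (!recursive) {
            break;
        }
    }
    return found;
}

// Rejects grants that are redundant or that would make a role include itself.
grant_result grant(grant_graph& g, const std::string& grantee, const std::string& role) {
    if (g.count(grantee) == 0 || g.count(role) == 0) {
        return grant_result::nonexistent_role;
    }
    const bool redundant = query_granted(g, grantee, true).count(role) != 0;
    const bool cycle = (grantee == role) || query_granted(g, role, true).count(grantee) != 0;
    if (redundant || cycle) {
        return grant_result::already_included;
    }
    g[grantee].insert(role);
    return grant_result::ok;
}
```

Checking reachability in both directions before inserting is one simple way to satisfy the documented "redundant, or create a cycle" condition; a real implementation may detect these cases differently.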


@@ -1,41 +0,0 @@
/*
* Copyright (C) 2018 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "auth/role_or_anonymous.hh"
#include <iostream>
namespace auth {
std::ostream& operator<<(std::ostream& os, const role_or_anonymous& mr) {
os << mr.name.value_or("<anonymous>");
return os;
}
bool operator==(const role_or_anonymous& mr1, const role_or_anonymous& mr2) noexcept {
return mr1.name == mr2.name;
}
bool is_anonymous(const role_or_anonymous& mr) noexcept {
return !mr.name.has_value();
}
}


@@ -1,66 +0,0 @@
/*
* Copyright (C) 2018 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <experimental/string_view>
#include <functional>
#include <iosfwd>
#include <optional>
#include <seastar/core/sstring.hh>
#include "seastarx.hh"
#include "stdx.hh"
namespace auth {
class role_or_anonymous final {
public:
std::optional<sstring> name{};
role_or_anonymous() = default;
role_or_anonymous(stdx::string_view name) : name(name) {
}
};
std::ostream& operator<<(std::ostream&, const role_or_anonymous&);
bool operator==(const role_or_anonymous&, const role_or_anonymous&) noexcept;
inline bool operator!=(const role_or_anonymous& mr1, const role_or_anonymous& mr2) noexcept {
return !(mr1 == mr2);
}
bool is_anonymous(const role_or_anonymous&) noexcept;
}
namespace std {
template <>
struct hash<auth::role_or_anonymous> {
size_t operator()(const auth::role_or_anonymous& mr) const {
return hash<std::optional<sstring>>()(mr.name);
}
};
}
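
A small standalone sketch of the idea behind `role_or_anonymous`: an empty `std::optional` stands for the anonymous user, and printing, comparison and hashing all fall out of the optional. `maybe_role`, `display_name` and `hash_of` are illustrative names, and `std::string` replaces `sstring` so the snippet compiles without Seastar.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>

// Hypothetical stand-in for auth::role_or_anonymous.
struct maybe_role {
    std::optional<std::string> name; // disengaged means "anonymous"
};

bool is_anonymous(const maybe_role& r) noexcept {
    return !r.name.has_value();
}

// Mirrors the stream operator: anonymous users get a placeholder label.
std::string display_name(const maybe_role& r) {
    return r.name.value_or("<anonymous>");
}

// Delegates to the standard hash for std::optional, as the header does.
std::size_t hash_of(const maybe_role& r) {
    return std::hash<std::optional<std::string>>()(r.name);
}
```

Because the anonymous case is a distinct state rather than a reserved string, no role name can collide with it.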


@@ -1,122 +0,0 @@
/*
* Copyright (C) 2018 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "auth/roles-metadata.hh"
#include <boost/algorithm/cxx11/any_of.hpp>
#include <seastar/core/print.hh>
#include <seastar/core/shared_ptr.hh>
#include <seastar/core/sstring.hh>
#include "auth/common.hh"
#include "cql3/query_processor.hh"
#include "cql3/untyped_result_set.hh"
namespace auth {
namespace meta {
namespace roles_table {
stdx::string_view creation_query() {
static const sstring instance = sprint(
"CREATE TABLE %s ("
" %s text PRIMARY KEY,"
" can_login boolean,"
" is_superuser boolean,"
" member_of set<text>,"
" salted_hash text"
")",
qualified_name(),
role_col_name);
return instance;
}
stdx::string_view qualified_name() noexcept {
static const sstring instance = AUTH_KS + "." + sstring(name);
return instance;
}
}
}
future<bool> default_role_row_satisfies(
cql3::query_processor& qp,
std::function<bool(const cql3::untyped_result_set_row&)> p) {
static const sstring query = sprint(
"SELECT * FROM %s WHERE %s = ?",
meta::roles_table::qualified_name(),
meta::roles_table::role_col_name);
return do_with(std::move(p), [&qp](const auto& p) {
return qp.process(
query,
db::consistency_level::ONE,
infinite_timeout_config,
{meta::DEFAULT_SUPERUSER_NAME},
true).then([&qp, &p](::shared_ptr<cql3::untyped_result_set> results) {
if (results->empty()) {
return qp.process(
query,
db::consistency_level::QUORUM,
infinite_timeout_config,
{meta::DEFAULT_SUPERUSER_NAME},
true).then([&p](::shared_ptr<cql3::untyped_result_set> results) {
if (results->empty()) {
return make_ready_future<bool>(false);
}
return make_ready_future<bool>(p(results->one()));
});
}
return make_ready_future<bool>(p(results->one()));
});
});
}
future<bool> any_nondefault_role_row_satisfies(
cql3::query_processor& qp,
std::function<bool(const cql3::untyped_result_set_row&)> p) {
static const sstring query = sprint("SELECT * FROM %s", meta::roles_table::qualified_name());
return do_with(std::move(p), [&qp](const auto& p) {
return qp.process(
query,
db::consistency_level::QUORUM,
infinite_timeout_config).then([&p](::shared_ptr<cql3::untyped_result_set> results) {
if (results->empty()) {
return false;
}
static const sstring col_name = sstring(meta::roles_table::role_col_name);
return boost::algorithm::any_of(*results, [&p](const cql3::untyped_result_set_row& row) {
const bool is_nondefault = row.get_as<sstring>(col_name) != meta::DEFAULT_SUPERUSER_NAME;
return is_nondefault && p(row);
});
});
});
}
}
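
The control flow of `default_role_row_satisfies()` above (try a cheap read at consistency level `ONE`, and only fall back to `QUORUM` when nothing is found) can be sketched without futures. Everything here is hypothetical scaffolding: `consistency`, `role_row`, `row_query` and `default_row_satisfies` model the query processor as a plain callback.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>

enum class consistency { one, quorum };

// Hypothetical row type; the real code passes cql3::untyped_result_set_row.
struct role_row {
    std::string role;
    bool is_superuser;
};

// Stand-in for qp.process(): returns the default role's row if it was found.
using row_query = std::function<std::optional<role_row>(consistency)>;

// Tries ONE first; retries at QUORUM only when the first read is empty.
bool default_row_satisfies(const row_query& query,
                           const std::function<bool(const role_row&)>& pred) {
    auto row = query(consistency::one);
    if (!row) {
        row = query(consistency::quorum);
    }
    return row ? pred(*row) : false;
}
```

A hit at `ONE` is enough to answer the question, so the more expensive quorum read is only paid for when the local replica has no row.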


@@ -1,69 +0,0 @@
/*
* Copyright (C) 2017 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <experimental/string_view>
#include <functional>
#include <seastar/core/future.hh>
#include "seastarx.hh"
#include "stdx.hh"
namespace cql3 {
class query_processor;
class untyped_result_set_row;
}
namespace auth {
namespace meta {
namespace roles_table {
stdx::string_view creation_query();
constexpr stdx::string_view name{"roles", 5};
stdx::string_view qualified_name() noexcept;
constexpr stdx::string_view role_col_name{"role", 4};
}
}
///
/// Check whether the default role satisfies a predicate. Resolves to `false` if the default role does not exist.
///
future<bool> default_role_row_satisfies(
cql3::query_processor&,
std::function<bool(const cql3::untyped_result_set_row&)>);
///
/// Check whether any nondefault role satisfies a predicate. Resolves to `false` if no nondefault roles exist.
///
future<bool> any_nondefault_role_row_satisfies(
cql3::query_processor&,
std::function<bool(const cql3::untyped_result_set_row&)>);
}


@@ -1,583 +0,0 @@
/*
* Copyright (C) 2017 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "auth/service.hh"
#include <algorithm>
#include <map>
#include <seastar/core/future-util.hh>
#include <seastar/core/sharded.hh>
#include <seastar/core/shared_ptr.hh>
#include "auth/allow_all_authenticator.hh"
#include "auth/allow_all_authorizer.hh"
#include "auth/common.hh"
#include "auth/password_authenticator.hh"
#include "auth/role_or_anonymous.hh"
#include "auth/standard_role_manager.hh"
#include "cql3/query_processor.hh"
#include "cql3/untyped_result_set.hh"
#include "db/config.hh"
#include "db/consistency_level_type.hh"
#include "exceptions/exceptions.hh"
#include "log.hh"
#include "service/migration_listener.hh"
#include "utils/class_registrator.hh"
namespace auth {
namespace meta {
static const sstring user_name_col_name("name");
static const sstring superuser_col_name("super");
}
static logging::logger log("auth_service");
class auth_migration_listener final : public ::service::migration_listener {
authorizer& _authorizer;
public:
explicit auth_migration_listener(authorizer& a) : _authorizer(a) {
}
private:
void on_create_keyspace(const sstring& ks_name) override {}
void on_create_column_family(const sstring& ks_name, const sstring& cf_name) override {}
void on_create_user_type(const sstring& ks_name, const sstring& type_name) override {}
void on_create_function(const sstring& ks_name, const sstring& function_name) override {}
void on_create_aggregate(const sstring& ks_name, const sstring& aggregate_name) override {}
void on_create_view(const sstring& ks_name, const sstring& view_name) override {}
void on_update_keyspace(const sstring& ks_name) override {}
void on_update_column_family(const sstring& ks_name, const sstring& cf_name, bool) override {}
void on_update_user_type(const sstring& ks_name, const sstring& type_name) override {}
void on_update_function(const sstring& ks_name, const sstring& function_name) override {}
void on_update_aggregate(const sstring& ks_name, const sstring& aggregate_name) override {}
void on_update_view(const sstring& ks_name, const sstring& view_name, bool columns_changed) override {}
void on_drop_keyspace(const sstring& ks_name) override {
_authorizer.revoke_all(
auth::make_data_resource(ks_name)).handle_exception_type([](const unsupported_authorization_operation&) {
// Nothing.
});
}
void on_drop_column_family(const sstring& ks_name, const sstring& cf_name) override {
_authorizer.revoke_all(
auth::make_data_resource(
ks_name, cf_name)).handle_exception_type([](const unsupported_authorization_operation&) {
// Nothing.
});
}
void on_drop_user_type(const sstring& ks_name, const sstring& type_name) override {}
void on_drop_function(const sstring& ks_name, const sstring& function_name) override {}
void on_drop_aggregate(const sstring& ks_name, const sstring& aggregate_name) override {}
void on_drop_view(const sstring& ks_name, const sstring& view_name) override {}
};
static future<> validate_role_exists(const service& ser, stdx::string_view role_name) {
return ser.underlying_role_manager().exists(role_name).then([role_name](bool exists) {
if (!exists) {
throw nonexistant_role(role_name);
}
});
}
service_config service_config::from_db_config(const db::config& dc) {
const qualified_name qualified_authorizer_name(meta::AUTH_PACKAGE_NAME, dc.authorizer());
const qualified_name qualified_authenticator_name(meta::AUTH_PACKAGE_NAME, dc.authenticator());
const qualified_name qualified_role_manager_name(meta::AUTH_PACKAGE_NAME, dc.role_manager());
service_config c;
c.authorizer_java_name = qualified_authorizer_name;
c.authenticator_java_name = qualified_authenticator_name;
c.role_manager_java_name = qualified_role_manager_name;
return c;
}
service::service(
permissions_cache_config c,
cql3::query_processor& qp,
::service::migration_manager& mm,
std::unique_ptr<authorizer> z,
std::unique_ptr<authenticator> a,
std::unique_ptr<role_manager> r)
: _permissions_cache_config(std::move(c))
, _permissions_cache(nullptr)
, _qp(qp)
, _migration_manager(mm)
, _authorizer(std::move(z))
, _authenticator(std::move(a))
, _role_manager(std::move(r))
, _migration_listener(std::make_unique<auth_migration_listener>(*_authorizer)) {
// The password authenticator requires that the `standard_role_manager` is running so that the roles metadata table
// it manages is created and updated. This cross-module dependency is rather gross, but we have to maintain it for
// the sake of compatibility with Apache Cassandra and its choice of auth schema.
if ((_authenticator->qualified_java_name() == password_authenticator_name())
&& (_role_manager->qualified_java_name() != standard_role_manager_name())) {
throw incompatible_module_combination(
sprint(
"The %s authenticator must be loaded alongside the %s role-manager.",
password_authenticator_name(),
standard_role_manager_name()));
}
}
service::service(
permissions_cache_config c,
cql3::query_processor& qp,
::service::migration_manager& mm,
const service_config& sc)
: service(
std::move(c),
qp,
mm,
create_object<authorizer>(sc.authorizer_java_name, qp, mm),
create_object<authenticator>(sc.authenticator_java_name, qp, mm),
create_object<role_manager>(sc.role_manager_java_name, qp, mm)) {
}
future<> service::create_keyspace_if_missing() const {
auto& db = _qp.db().local();
if (!db.has_keyspace(meta::AUTH_KS)) {
std::map<sstring, sstring> opts{{"replication_factor", "1"}};
auto ksm = keyspace_metadata::new_keyspace(
meta::AUTH_KS,
"org.apache.cassandra.locator.SimpleStrategy",
opts,
true);
// We use min_timestamp so that default keyspace metadata will lose to any manual adjustments.
// See issue #2129.
return _migration_manager.announce_new_keyspace(ksm, api::min_timestamp, false);
}
return make_ready_future<>();
}
future<> service::start() {
return once_among_shards([this] {
return create_keyspace_if_missing();
}).then([this] {
return when_all_succeed(_role_manager->start(), _authorizer->start(), _authenticator->start());
}).then([this] {
_permissions_cache = std::make_unique<permissions_cache>(_permissions_cache_config, *this, log);
}).then([this] {
return once_among_shards([this] {
_migration_manager.register_listener(_migration_listener.get());
return make_ready_future<>();
});
});
}
future<> service::stop() {
return _permissions_cache->stop().then([this] {
return when_all_succeed(_role_manager->stop(), _authorizer->stop(), _authenticator->stop());
});
}
future<bool> service::has_existing_legacy_users() const {
if (!_qp.db().local().has_schema(meta::AUTH_KS, meta::USERS_CF)) {
return make_ready_future<bool>(false);
}
static const sstring default_user_query = sprint(
"SELECT * FROM %s.%s WHERE %s = ?",
meta::AUTH_KS,
meta::USERS_CF,
meta::user_name_col_name);
static const sstring all_users_query = sprint(
"SELECT * FROM %s.%s LIMIT 1",
meta::AUTH_KS,
meta::USERS_CF);
// This logic is borrowed directly from Apache Cassandra. By first checking for the presence of the default user, we
// can potentially avoid doing a range query with a high consistency level.
return _qp.process(
default_user_query,
db::consistency_level::ONE,
infinite_timeout_config,
{meta::DEFAULT_SUPERUSER_NAME},
true).then([this](auto results) {
if (!results->empty()) {
return make_ready_future<bool>(true);
}
return _qp.process(
default_user_query,
db::consistency_level::QUORUM,
infinite_timeout_config,
{meta::DEFAULT_SUPERUSER_NAME},
true).then([this](auto results) {
if (!results->empty()) {
return make_ready_future<bool>(true);
}
return _qp.process(
all_users_query,
db::consistency_level::QUORUM,
infinite_timeout_config).then([](auto results) {
return make_ready_future<bool>(!results->empty());
});
});
});
}
future<permission_set>
service::get_uncached_permissions(const role_or_anonymous& maybe_role, const resource& r) const {
if (is_anonymous(maybe_role)) {
return _authorizer->authorize(maybe_role, r);
}
const stdx::string_view role_name = *maybe_role.name;
return has_superuser(role_name).then([this, role_name, &r](bool superuser) {
if (superuser) {
return make_ready_future<permission_set>(r.applicable_permissions());
}
//
// Aggregate the permissions from all granted roles.
//
return do_with(permission_set(), [this, role_name, &r](auto& all_perms) {
return get_roles(role_name).then([this, &r, &all_perms](role_set all_roles) {
return do_with(std::move(all_roles), [this, &r, &all_perms](const auto& all_roles) {
return parallel_for_each(all_roles, [this, &r, &all_perms](stdx::string_view role_name) {
return _authorizer->authorize(role_name, r).then([&all_perms](permission_set perms) {
all_perms = permission_set::from_mask(all_perms.mask() | perms.mask());
});
});
});
}).then([&all_perms] {
return all_perms;
});
});
});
}
future<permission_set> service::get_permissions(const role_or_anonymous& maybe_role, const resource& r) const {
return _permissions_cache->get(maybe_role, r);
}
future<bool> service::has_superuser(stdx::string_view role_name) const {
return this->get_roles(std::move(role_name)).then([this](role_set roles) {
return do_with(std::move(roles), [this](const role_set& roles) {
return do_with(false, roles.begin(), [this, &roles](bool& any_super, auto& iter) {
return do_until(
[&roles, &any_super, &iter] { return any_super || (iter == roles.end()); },
[this, &any_super, &iter] {
return _role_manager->is_superuser(*iter++).then([&any_super](bool super) {
any_super = super;
});
}).then([&any_super] {
return any_super;
});
});
});
});
}
future<role_set> service::get_roles(stdx::string_view role_name) const {
//
// We may wish to cache this information in the future (as Apache Cassandra does).
//
return _role_manager->query_granted(role_name, recursive_role_query::yes);
}
future<bool> service::exists(const resource& r) const {
switch (r.kind()) {
case resource_kind::data: {
const auto& db = _qp.db().local();
data_resource_view v(r);
const auto keyspace = v.keyspace();
const auto table = v.table();
if (table) {
return make_ready_future<bool>(db.has_schema(sstring(*keyspace), sstring(*table)));
}
if (keyspace) {
return make_ready_future<bool>(db.has_keyspace(sstring(*keyspace)));
}
return make_ready_future<bool>(true);
}
case resource_kind::role: {
role_resource_view v(r);
const auto role = v.role();
if (role) {
return _role_manager->exists(*role);
}
return make_ready_future<bool>(true);
}
}
return make_ready_future<bool>(false);
}
//
// Free functions.
//
future<bool> has_superuser(const service& ser, const authenticated_user& u) {
if (is_anonymous(u)) {
return make_ready_future<bool>(false);
}
return ser.has_superuser(*u.name);
}
future<role_set> get_roles(const service& ser, const authenticated_user& u) {
if (is_anonymous(u)) {
return make_ready_future<role_set>();
}
return ser.get_roles(*u.name);
}
future<permission_set> get_permissions(const service& ser, const authenticated_user& u, const resource& r) {
return do_with(role_or_anonymous(), [&ser, &u, &r](auto& maybe_role) {
maybe_role.name = u.name;
return ser.get_permissions(maybe_role, r);
});
}
bool is_enforcing(const service& ser) {
const bool enforcing_authorizer = ser.underlying_authorizer().qualified_java_name() != allow_all_authorizer_name();
const bool enforcing_authenticator = ser.underlying_authenticator().qualified_java_name()
!= allow_all_authenticator_name();
return enforcing_authorizer || enforcing_authenticator;
}
bool is_protected(const service& ser, const resource& r) noexcept {
return ser.underlying_role_manager().protected_resources().count(r)
|| ser.underlying_authenticator().protected_resources().count(r)
|| ser.underlying_authorizer().protected_resources().count(r);
}
static void validate_authentication_options_are_supported(
const authentication_options& options,
const authentication_option_set& supported) {
const auto check = [&supported](authentication_option k) {
if (supported.count(k) == 0) {
throw unsupported_authentication_option(k);
}
};
if (options.password) {
check(authentication_option::password);
}
if (options.options) {
check(authentication_option::options);
}
}
future<> create_role(
const service& ser,
stdx::string_view name,
const role_config& config,
const authentication_options& options) {
return ser.underlying_role_manager().create(name, config).then([&ser, name, &options] {
if (!auth::any_authentication_options(options)) {
return make_ready_future<>();
}
return futurize_apply(
&validate_authentication_options_are_supported,
options,
ser.underlying_authenticator().supported_options()).then([&ser, name, &options] {
return ser.underlying_authenticator().create(name, options);
}).handle_exception([&ser, name](std::exception_ptr ep) {
// Roll-back.
return ser.underlying_role_manager().drop(name).then([ep = std::move(ep)] {
std::rethrow_exception(ep);
});
});
});
}
future<> alter_role(
const service& ser,
stdx::string_view name,
const role_config_update& config_update,
const authentication_options& options) {
return ser.underlying_role_manager().alter(name, config_update).then([&ser, name, &options] {
if (!any_authentication_options(options)) {
return make_ready_future<>();
}
return futurize_apply(
&validate_authentication_options_are_supported,
options,
ser.underlying_authenticator().supported_options()).then([&ser, name, &options] {
return ser.underlying_authenticator().alter(name, options);
});
});
}
future<> drop_role(const service& ser, stdx::string_view name) {
return do_with(make_role_resource(name), [&ser, name](const resource& r) {
auto& a = ser.underlying_authorizer();
return when_all_succeed(
a.revoke_all(name),
a.revoke_all(r)).handle_exception_type([](const unsupported_authorization_operation&) {
// Nothing.
});
}).then([&ser, name] {
return ser.underlying_authenticator().drop(name);
}).then([&ser, name] {
return ser.underlying_role_manager().drop(name);
});
}
future<bool> has_role(const service& ser, stdx::string_view grantee, stdx::string_view name) {
return when_all_succeed(
validate_role_exists(ser, name),
ser.get_roles(grantee)).then([name](role_set all_roles) {
return make_ready_future<bool>(all_roles.count(sstring(name)) != 0);
});
}
future<bool> has_role(const service& ser, const authenticated_user& u, stdx::string_view name) {
if (is_anonymous(u)) {
return make_ready_future<bool>(false);
}
return has_role(ser, *u.name, name);
}
future<> grant_permissions(
const service& ser,
stdx::string_view role_name,
permission_set perms,
const resource& r) {
return validate_role_exists(ser, role_name).then([&ser, role_name, perms, &r] {
return ser.underlying_authorizer().grant(role_name, perms, r);
});
}
future<> grant_applicable_permissions(const service& ser, stdx::string_view role_name, const resource& r) {
return grant_permissions(ser, role_name, r.applicable_permissions(), r);
}
future<> grant_applicable_permissions(const service& ser, const authenticated_user& u, const resource& r) {
if (is_anonymous(u)) {
return make_ready_future<>();
}
return grant_applicable_permissions(ser, *u.name, r);
}
future<> revoke_permissions(
const service& ser,
stdx::string_view role_name,
permission_set perms,
const resource& r) {
return validate_role_exists(ser, role_name).then([&ser, role_name, perms, &r] {
return ser.underlying_authorizer().revoke(role_name, perms, r);
});
}
future<std::vector<permission_details>> list_filtered_permissions(
const service& ser,
permission_set perms,
std::optional<stdx::string_view> role_name,
const std::optional<std::pair<resource, recursive_permissions>>& resource_filter) {
return ser.underlying_authorizer().list_all().then([&ser, perms, role_name, &resource_filter](
std::vector<permission_details> all_details) {
if (resource_filter) {
const resource r = resource_filter->first;
const auto resources = resource_filter->second
? auth::expand_resource_family(r)
: auth::resource_set{r};
all_details.erase(
std::remove_if(
all_details.begin(),
all_details.end(),
[&resources](const permission_details& pd) {
return resources.count(pd.resource) == 0;
}),
all_details.end());
}
std::transform(
std::make_move_iterator(all_details.begin()),
std::make_move_iterator(all_details.end()),
all_details.begin(),
[perms](permission_details pd) {
pd.permissions = permission_set::from_mask(pd.permissions.mask() & perms.mask());
return pd;
});
// Eliminate rows with an empty permission set.
all_details.erase(
std::remove_if(all_details.begin(), all_details.end(), [](const permission_details& pd) {
return pd.permissions.mask() == 0;
}),
all_details.end());
if (!role_name) {
return make_ready_future<std::vector<permission_details>>(std::move(all_details));
}
//
// Filter out rows based on whether permissions have been granted to this role (directly or indirectly).
//
return do_with(std::move(all_details), [&ser, role_name](auto& all_details) {
return ser.get_roles(*role_name).then([&all_details](role_set all_roles) {
all_details.erase(
std::remove_if(
all_details.begin(),
all_details.end(),
[&all_roles](const permission_details& pd) {
return all_roles.count(pd.role_name) == 0;
}),
all_details.end());
return make_ready_future<std::vector<permission_details>>(std::move(all_details));
});
});
});
}
}
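The last two filtering stages of list_filtered_permissions above (intersect each row with the requested permission set, drop rows left empty, then keep only rows for roles the named role holds) can be sketched without Seastar. This is a minimal illustration under stated assumptions, not the project's implementation: `details`, `filter_by_permissions` and `filter_by_roles` are hypothetical names, and a permission set is reduced to a plain bit mask.

```cpp
#include <algorithm>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for auth::permission_details; permissions are a plain bit mask here.
struct details {
    std::string role_name;
    std::string resource;
    std::uint32_t mask;
};

// Keep only the requested permission bits, then drop rows left with no bits set.
std::vector<details> filter_by_permissions(std::vector<details> rows, std::uint32_t wanted) {
    for (auto& d : rows) {
        d.mask &= wanted;
    }
    rows.erase(
            std::remove_if(rows.begin(), rows.end(), [](const details& d) { return d.mask == 0; }),
            rows.end());
    return rows;
}

// Keep only rows whose role is in `granted` (the role itself plus every role granted to it).
std::vector<details> filter_by_roles(std::vector<details> rows, const std::set<std::string>& granted) {
    rows.erase(
            std::remove_if(
                    rows.begin(),
                    rows.end(),
                    [&granted](const details& d) { return granted.count(d.role_name) == 0; }),
            rows.end());
    return rows;
}
```

Both helpers take the vector by value and return it, so a caller can chain them, for example `filter_by_roles(filter_by_permissions(rows, mask), granted)`.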

@@ -1,296 +0,0 @@
/*
* Copyright (C) 2017 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <experimental/string_view>
#include <memory>
#include <optional>
#include <seastar/core/future.hh>
#include <seastar/core/sstring.hh>
#include <seastar/util/bool_class.hh>
#include "auth/authenticator.hh"
#include "auth/authorizer.hh"
#include "auth/permission.hh"
#include "auth/permissions_cache.hh"
#include "auth/role_manager.hh"
#include "seastarx.hh"
#include "stdx.hh"
namespace cql3 {
class query_processor;
}
namespace db {
class config;
}
namespace service {
class migration_manager;
class migration_listener;
}
namespace auth {
class role_or_anonymous;
struct service_config final {
static service_config from_db_config(const db::config&);
sstring authorizer_java_name;
sstring authenticator_java_name;
sstring role_manager_java_name;
};
///
/// Due to poor (in this author's opinion) decisions of Apache Cassandra, certain choices of one role-manager,
/// authenticator, or authorizer imply restrictions on the rest.
///
/// This exception is thrown when an invalid combination of modules is selected, with a message explaining the
/// incompatibility.
///
class incompatible_module_combination : public std::invalid_argument {
public:
using std::invalid_argument::invalid_argument;
};
///
/// Client for access-control in the system.
///
/// Access control encompasses user/role management, authentication, and authorization. This client provides access to
/// the dynamically-loaded implementations of these modules (through the `underlying_*` member functions), but also
/// builds on their functionality with caching and abstractions for common operations.
///
/// All state associated with access-control is stored externally to any particular instance of this class.
///
class service final {
permissions_cache_config _permissions_cache_config;
std::unique_ptr<permissions_cache> _permissions_cache;
cql3::query_processor& _qp;
::service::migration_manager& _migration_manager;
std::unique_ptr<authorizer> _authorizer;
std::unique_ptr<authenticator> _authenticator;
std::unique_ptr<role_manager> _role_manager;
// Only one of these should be registered, so we end up with some unused instances. Not the end of the world.
std::unique_ptr<::service::migration_listener> _migration_listener;
public:
service(
permissions_cache_config,
cql3::query_processor&,
::service::migration_manager&,
std::unique_ptr<authorizer>,
std::unique_ptr<authenticator>,
std::unique_ptr<role_manager>);
///
/// This constructor is intended to be used when the class is sharded via \ref seastar::sharded. In that case, the
/// arguments must be copyable, which is why we delay construction with instance-construction instructions instead
/// of the instances themselves.
///
service(
permissions_cache_config,
cql3::query_processor&,
::service::migration_manager&,
const service_config&);
future<> start();
future<> stop();
///
/// \returns an exceptional future with \ref nonexistant_role if the named role does not exist.
///
future<permission_set> get_permissions(const role_or_anonymous&, const resource&) const;
///
/// Like \ref get_permissions, but never returns cached permissions.
///
future<permission_set> get_uncached_permissions(const role_or_anonymous&, const resource&) const;
///
/// Query whether the named role has been granted a role that is a superuser.
///
/// A role is always granted to itself. Therefore, a role that "is" a superuser also "has" superuser.
///
/// \returns an exceptional future with \ref nonexistant_role if the role does not exist.
///
future<bool> has_superuser(stdx::string_view role_name) const;
///
/// Return the set of all roles granted to the given role, including itself and roles granted through other roles.
///
/// \returns an exceptional future with \ref nonexistant_role if the role does not exist.
///
future<role_set> get_roles(stdx::string_view role_name) const;
future<bool> exists(const resource&) const;
const authenticator& underlying_authenticator() const {
return *_authenticator;
}
const authorizer& underlying_authorizer() const {
return *_authorizer;
}
const role_manager& underlying_role_manager() const {
return *_role_manager;
}
private:
future<bool> has_existing_legacy_users() const;
future<> create_keyspace_if_missing() const;
};
future<bool> has_superuser(const service&, const authenticated_user&);
future<role_set> get_roles(const service&, const authenticated_user&);
future<permission_set> get_permissions(const service&, const authenticated_user&, const resource&);
///
/// Access-control is "enforcing" when either the authenticator or the authorizer is not its "allow-all" variant.
///
/// Put differently, when access control is not enforcing, all operations on resources will be allowed and users do not
/// need to authenticate themselves.
///
bool is_enforcing(const service&);
///
/// Protected resources cannot be modified even if the performer has permissions to do so.
///
bool is_protected(const service&, const resource&) noexcept;
///
/// Create a role with optional authentication information.
///
/// \returns an exceptional future with \ref role_already_exists if the user or role exists.
///
/// \returns an exceptional future with \ref unsupported_authentication_option if an unsupported option is included.
///
future<> create_role(
const service&,
stdx::string_view name,
const role_config&,
const authentication_options&);
///
/// Alter an existing role and its authentication information.
///
/// \returns an exceptional future with \ref nonexistant_role if the named role does not exist.
///
/// \returns an exceptional future with \ref unsupported_authentication_option if an unsupported option is included.
///
future<> alter_role(
const service&,
stdx::string_view name,
const role_config_update&,
const authentication_options&);
///
/// Drop a role from the system, including all permissions and authentication information.
///
/// \returns an exceptional future with \ref nonexistant_role if the named role does not exist.
///
future<> drop_role(const service&, stdx::string_view name);
///
/// Check if `grantee` has been granted the named role.
///
/// \returns an exceptional future with \ref nonexistant_role if `grantee` or `name` does not exist.
///
future<bool> has_role(const service&, stdx::string_view grantee, stdx::string_view name);
///
/// Check if the authenticated user has been granted the named role.
///
/// \returns an exceptional future with \ref nonexistant_role if the user or `name` does not exist.
///
future<bool> has_role(const service&, const authenticated_user&, stdx::string_view name);
///
/// \returns an exceptional future with \ref nonexistant_role if the named role does not exist.
///
/// \returns an exceptional future with \ref unsupported_authorization_operation if granting permissions is not
/// supported.
///
future<> grant_permissions(
const service&,
stdx::string_view role_name,
permission_set,
const resource&);
///
/// Like \ref grant_permissions, but grants all applicable permissions on the resource.
///
/// \returns an exceptional future with \ref nonexistant_role if the named role does not exist.
///
/// \returns an exceptional future with \ref unsupported_authorization_operation if granting permissions is not
/// supported.
///
future<> grant_applicable_permissions(const service&, stdx::string_view role_name, const resource&);
future<> grant_applicable_permissions(const service&, const authenticated_user&, const resource&);
///
/// \returns an exceptional future with \ref nonexistant_role if the named role does not exist.
///
/// \returns an exceptional future with \ref unsupported_authorization_operation if revoking permissions is not
/// supported.
///
future<> revoke_permissions(
const service&,
stdx::string_view role_name,
permission_set,
const resource&);
using recursive_permissions = bool_class<struct recursive_permissions_tag>;
///
/// Query for all granted permissions according to filtering criteria.
///
/// Only permissions included in the provided set are included.
///
/// If a role name is provided, only permissions granted (directly or recursively) to the role are included.
///
/// If a resource filter is provided, only permissions granted on the resource are included. When \ref
/// recursive_permissions is `true`, permissions on a parent resource are included.
///
/// \returns an exceptional future with \ref nonexistant_role if a role name is included which refers to a role that
/// does not exist.
///
/// \returns an exceptional future with \ref unsupported_authorization_operation if listing permissions is not
/// supported.
///
future<std::vector<permission_details>> list_filtered_permissions(
const service&,
permission_set,
std::optional<stdx::string_view> role_name,
const std::optional<std::pair<resource, recursive_permissions>>& resource_filter);
}
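The create_role function declared above is implemented so that a failure in the authenticator step drops the role that was just created and then rethrows the original error. A synchronous sketch of that roll-back shape follows. It assumes exceptions instead of futures, and `role_store` and `create_role_with_rollback` are hypothetical names used only for illustration.

```cpp
#include <functional>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical in-memory stand-in for the role manager: just a set of role names.
struct role_store {
    std::set<std::string> roles;
};

// Create the role, then run the authenticator step. If that step throws,
// drop the role again and rethrow the original exception unchanged.
void create_role_with_rollback(
        role_store& store,
        const std::string& name,
        const std::function<void(const std::string&)>& create_credentials) {
    if (!store.roles.insert(name).second) {
        throw std::invalid_argument("role already exists: " + name);
    }
    try {
        create_credentials(name);
    } catch (...) {
        store.roles.erase(name); // Roll-back.
        throw;
    }
}
```

The name is passed and stored by value in every step. In the asynchronous version the same care is needed for lambda captures, because a continuation may run after the frame that owned a referenced variable has been destroyed.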

@@ -1,552 +0,0 @@
/*
* Copyright (C) 2017 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "auth/standard_role_manager.hh"
#include <experimental/optional>
#include <unordered_set>
#include <vector>
#include <boost/algorithm/string/join.hpp>
#include <seastar/core/future-util.hh>
#include <seastar/core/print.hh>
#include <seastar/core/sleep.hh>
#include <seastar/core/sstring.hh>
#include <seastar/core/thread.hh>
#include "auth/common.hh"
#include "auth/roles-metadata.hh"
#include "cql3/query_processor.hh"
#include "db/consistency_level_type.hh"
#include "exceptions/exceptions.hh"
#include "log.hh"
#include "utils/class_registrator.hh"
namespace auth {
namespace meta {
namespace role_members_table {
constexpr stdx::string_view name{"role_members", 12};
static stdx::string_view qualified_name() noexcept {
static const sstring instance = AUTH_KS + "." + sstring(name);
return instance;
}
}
}
static logging::logger log("standard_role_manager");
static const class_registrator<
role_manager,
standard_role_manager,
cql3::query_processor&,
::service::migration_manager&> registration("org.apache.cassandra.auth.CassandraRoleManager");
struct record final {
sstring name;
bool is_superuser;
bool can_login;
role_set member_of;
};
static db::consistency_level consistency_for_role(stdx::string_view role_name) noexcept {
if (role_name == meta::DEFAULT_SUPERUSER_NAME) {
return db::consistency_level::QUORUM;
}
return db::consistency_level::LOCAL_ONE;
}
static future<stdx::optional<record>> find_record(cql3::query_processor& qp, stdx::string_view role_name) {
static const sstring query = sprint(
"SELECT * FROM %s WHERE %s = ?",
meta::roles_table::qualified_name(),
meta::roles_table::role_col_name);
return qp.process(
query,
consistency_for_role(role_name),
infinite_timeout_config,
{sstring(role_name)},
true).then([](::shared_ptr<cql3::untyped_result_set> results) {
if (results->empty()) {
return stdx::optional<record>();
}
const cql3::untyped_result_set_row& row = results->one();
return stdx::make_optional(
record{
row.get_as<sstring>(sstring(meta::roles_table::role_col_name)),
row.get_as<bool>("is_superuser"),
row.get_as<bool>("can_login"),
(row.has("member_of")
? row.get_set<sstring>("member_of")
: role_set())});
});
}
static future<record> require_record(cql3::query_processor& qp, stdx::string_view role_name) {
return find_record(qp, role_name).then([role_name](stdx::optional<record> mr) {
if (!mr) {
throw nonexistant_role(role_name);
}
return make_ready_future<record>(*mr);
});
}
static bool has_can_login(const cql3::untyped_result_set_row& row) {
return row.has("can_login") && !(boolean_type->deserialize(row.get_blob("can_login")).is_null());
}
stdx::string_view standard_role_manager_name() noexcept {
static const sstring instance = meta::AUTH_PACKAGE_NAME + "CassandraRoleManager";
return instance;
}
stdx::string_view standard_role_manager::qualified_java_name() const noexcept {
return standard_role_manager_name();
}
const resource_set& standard_role_manager::protected_resources() const {
static const resource_set resources({
make_data_resource(meta::AUTH_KS, meta::roles_table::name),
make_data_resource(meta::AUTH_KS, meta::role_members_table::name)});
return resources;
}
future<> standard_role_manager::create_metadata_tables_if_missing() const {
static const sstring create_role_members_query = sprint(
"CREATE TABLE %s ("
" role text,"
" member text,"
" PRIMARY KEY (role, member)"
")",
meta::role_members_table::qualified_name());
return when_all_succeed(
create_metadata_table_if_missing(
meta::roles_table::name,
_qp,
meta::roles_table::creation_query(),
_migration_manager),
create_metadata_table_if_missing(
meta::role_members_table::name,
_qp,
create_role_members_query,
_migration_manager));
}
future<> standard_role_manager::create_default_role_if_missing() const {
return default_role_row_satisfies(_qp, &has_can_login).then([this](bool exists) {
if (!exists) {
static const sstring query = sprint(
"INSERT INTO %s (%s, is_superuser, can_login) VALUES (?, true, true)",
meta::roles_table::qualified_name(),
meta::roles_table::role_col_name);
return _qp.process(
query,
db::consistency_level::QUORUM,
infinite_timeout_config,
{meta::DEFAULT_SUPERUSER_NAME}).then([](auto&&) {
log.info("Created default superuser role '{}'.", meta::DEFAULT_SUPERUSER_NAME);
return make_ready_future<>();
});
}
return make_ready_future<>();
}).handle_exception_type([](const exceptions::unavailable_exception& e) {
log.warn("Skipped default role setup: some nodes were not ready; will retry");
return make_exception_future<>(e);
});
}
static const sstring legacy_table_name{"users"};
bool standard_role_manager::legacy_metadata_exists() const {
return _qp.db().local().has_schema(meta::AUTH_KS, legacy_table_name);
}
future<> standard_role_manager::migrate_legacy_metadata() const {
log.info("Starting migration of legacy user metadata.");
static const sstring query = sprint("SELECT * FROM %s.%s", meta::AUTH_KS, legacy_table_name);
return _qp.process(
query,
db::consistency_level::QUORUM,
infinite_timeout_config).then([this](::shared_ptr<cql3::untyped_result_set> results) {
return do_for_each(*results, [this](const cql3::untyped_result_set_row& row) {
role_config config;
config.is_superuser = row.get_as<bool>("super");
config.can_login = true;
return do_with(
row.get_as<sstring>("name"),
std::move(config),
[this](const auto& name, const auto& config) {
return this->create_or_replace(name, config);
});
}).finally([results] {});
}).then([] {
log.info("Finished migrating legacy user metadata.");
}).handle_exception([](std::exception_ptr ep) {
log.error("Encountered an error during migration!");
std::rethrow_exception(ep);
});
}
future<> standard_role_manager::start() {
return once_among_shards([this] {
return this->create_metadata_tables_if_missing().then([this] {
_stopped = auth::do_after_system_ready(_as, [this] {
return seastar::async([this] {
wait_for_schema_agreement(_migration_manager, _qp.db().local()).get0();
if (any_nondefault_role_row_satisfies(_qp, &has_can_login).get0()) {
if (this->legacy_metadata_exists()) {
log.warn("Ignoring legacy user metadata since nondefault roles already exist.");
}
return;
}
if (this->legacy_metadata_exists()) {
this->migrate_legacy_metadata().get0();
return;
}
create_default_role_if_missing().get0();
});
});
});
});
}
future<> standard_role_manager::stop() {
_as.request_abort();
return _stopped.handle_exception_type([] (const sleep_aborted&) { });
}
future<> standard_role_manager::create_or_replace(stdx::string_view role_name, const role_config& c) const {
static const sstring query = sprint(
"INSERT INTO %s (%s, is_superuser, can_login) VALUES (?, ?, ?)",
meta::roles_table::qualified_name(),
meta::roles_table::role_col_name);
return _qp.process(
query,
consistency_for_role(role_name),
infinite_timeout_config,
{sstring(role_name), c.is_superuser, c.can_login},
true).discard_result();
}
future<>
standard_role_manager::create(stdx::string_view role_name, const role_config& c) const {
return this->exists(role_name).then([this, role_name, &c](bool role_exists) {
if (role_exists) {
throw role_already_exists(role_name);
}
return this->create_or_replace(role_name, c);
});
}
future<>
standard_role_manager::alter(stdx::string_view role_name, const role_config_update& u) const {
static const auto build_column_assignments = [](const role_config_update& u) -> sstring {
std::vector<sstring> assignments;
if (u.is_superuser) {
assignments.push_back(sstring("is_superuser = ") + (*u.is_superuser ? "true" : "false"));
}
if (u.can_login) {
assignments.push_back(sstring("can_login = ") + (*u.can_login ? "true" : "false"));
}
return boost::algorithm::join(assignments, ", ");
};
return require_record(_qp, role_name).then([this, role_name, &u](record) {
if (!u.is_superuser && !u.can_login) {
return make_ready_future<>();
}
return _qp.process(
sprint(
"UPDATE %s SET %s WHERE %s = ?",
meta::roles_table::qualified_name(),
build_column_assignments(u),
meta::roles_table::role_col_name),
consistency_for_role(role_name),
infinite_timeout_config,
{sstring(role_name)}).discard_result();
});
}
future<> standard_role_manager::drop(stdx::string_view role_name) const {
return this->exists(role_name).then([this, role_name](bool role_exists) {
if (!role_exists) {
throw nonexistant_role(role_name);
}
// First, revoke this role from all roles that are members of it.
const auto revoke_from_members = [this, role_name] {
static const sstring query = sprint(
"SELECT member FROM %s WHERE role = ?",
meta::role_members_table::qualified_name());
return _qp.process(
query,
consistency_for_role(role_name),
infinite_timeout_config,
{sstring(role_name)}).then([this, role_name](::shared_ptr<cql3::untyped_result_set> members) {
return parallel_for_each(
members->begin(),
members->end(),
[this, role_name](const cql3::untyped_result_set_row& member_row) {
const sstring member = member_row.template get_as<sstring>("member");
return this->modify_membership(member, role_name, membership_change::remove);
}).finally([members] {});
});
};
// In parallel, revoke all roles that this role is a member of.
const auto revoke_members_of = [this, grantee = role_name] {
return this->query_granted(
grantee,
recursive_role_query::no).then([this, grantee](role_set granted_roles) {
return do_with(
std::move(granted_roles),
[this, grantee](const role_set& granted_roles) {
return parallel_for_each(
granted_roles.begin(),
granted_roles.end(),
[this, grantee](const sstring& role_name) {
return this->modify_membership(grantee, role_name, membership_change::remove);
});
});
});
};
// Finally, delete the role itself.
auto delete_role = [this, role_name] {
static const sstring query = sprint(
"DELETE FROM %s WHERE %s = ?",
meta::roles_table::qualified_name(),
meta::roles_table::role_col_name);
return _qp.process(
query,
consistency_for_role(role_name),
infinite_timeout_config,
{sstring(role_name)}).discard_result();
};
return when_all_succeed(revoke_from_members(), revoke_members_of()).then([delete_role = std::move(delete_role)] {
return delete_role();
});
});
}
future<>
standard_role_manager::modify_membership(
stdx::string_view grantee_name,
stdx::string_view role_name,
membership_change ch) const {
const auto modify_roles = [this, role_name, grantee_name, ch] {
const auto query = sprint(
"UPDATE %s SET member_of = member_of %s ? WHERE %s = ?",
meta::roles_table::qualified_name(),
(ch == membership_change::add ? '+' : '-'),
meta::roles_table::role_col_name);
return _qp.process(
query,
consistency_for_role(grantee_name),
infinite_timeout_config,
{role_set{sstring(role_name)}, sstring(grantee_name)}).discard_result();
};
const auto modify_role_members = [this, role_name, grantee_name, ch] {
switch (ch) {
case membership_change::add:
return _qp.process(
sprint(
"INSERT INTO %s (role, member) VALUES (?, ?)",
meta::role_members_table::qualified_name()),
consistency_for_role(role_name),
infinite_timeout_config,
{sstring(role_name), sstring(grantee_name)}).discard_result();
case membership_change::remove:
return _qp.process(
sprint(
"DELETE FROM %s WHERE role = ? AND member = ?",
meta::role_members_table::qualified_name()),
consistency_for_role(role_name),
infinite_timeout_config,
{sstring(role_name), sstring(grantee_name)}).discard_result();
}
return make_ready_future<>();
};
return when_all_succeed(modify_roles(), modify_role_members());
}
future<>
standard_role_manager::grant(stdx::string_view grantee_name, stdx::string_view role_name) const {
const auto check_redundant = [this, role_name, grantee_name] {
return this->query_granted(
grantee_name,
recursive_role_query::yes).then([role_name, grantee_name](role_set roles) {
if (roles.count(sstring(role_name)) != 0) {
throw role_already_included(grantee_name, role_name);
}
return make_ready_future<>();
});
};
const auto check_cycle = [this, role_name, grantee_name] {
return this->query_granted(
role_name,
recursive_role_query::yes).then([role_name, grantee_name](role_set roles) {
if (roles.count(sstring(grantee_name)) != 0) {
throw role_already_included(role_name, grantee_name);
}
return make_ready_future<>();
});
};
return when_all_succeed(check_redundant(), check_cycle()).then([this, role_name, grantee_name] {
return this->modify_membership(grantee_name, role_name, membership_change::add);
});
}
future<>
standard_role_manager::revoke(stdx::string_view revokee_name, stdx::string_view role_name) const {
return this->exists(role_name).then([this, revokee_name, role_name](bool role_exists) {
if (!role_exists) {
throw nonexistant_role(sstring(role_name));
}
}).then([this, revokee_name, role_name] {
return this->query_granted(
revokee_name,
recursive_role_query::no).then([revokee_name, role_name](role_set roles) {
if (roles.count(sstring(role_name)) == 0) {
throw revoke_ungranted_role(revokee_name, role_name);
}
return make_ready_future<>();
}).then([this, revokee_name, role_name] {
return this->modify_membership(revokee_name, role_name, membership_change::remove);
});
});
}
static future<> collect_roles(
cql3::query_processor& qp,
stdx::string_view grantee_name,
bool recurse,
role_set& roles) {
return require_record(qp, grantee_name).then([&qp, &roles, recurse](record r) {
return do_with(std::move(r.member_of), [&qp, &roles, recurse](const role_set& memberships) {
return do_for_each(memberships.begin(), memberships.end(), [&qp, &roles, recurse](const sstring& role_name) {
roles.insert(role_name);
if (recurse) {
return collect_roles(qp, role_name, true, roles);
}
return make_ready_future<>();
});
});
});
}
future<role_set> standard_role_manager::query_granted(stdx::string_view grantee_name, recursive_role_query m) const {
const bool recurse = (m == recursive_role_query::yes);
return do_with(
role_set{sstring(grantee_name)},
[this, grantee_name, recurse](role_set& roles) {
return collect_roles(_qp, grantee_name, recurse, roles).then([&roles] { return roles; });
});
}
future<role_set> standard_role_manager::query_all() const {
static const sstring query = sprint(
"SELECT %s FROM %s",
meta::roles_table::role_col_name,
meta::roles_table::qualified_name());
// To avoid many copies of a view.
static const auto role_col_name_string = sstring(meta::roles_table::role_col_name);
return _qp.process(query, db::consistency_level::QUORUM, infinite_timeout_config).then([](::shared_ptr<cql3::untyped_result_set> results) {
role_set roles;
std::transform(
results->begin(),
results->end(),
std::inserter(roles, roles.begin()),
[](const cql3::untyped_result_set_row& row) {
return row.get_as<sstring>(role_col_name_string);
});
return roles;
});
}
future<bool> standard_role_manager::exists(stdx::string_view role_name) const {
return find_record(_qp, role_name).then([](stdx::optional<record> mr) {
return static_cast<bool>(mr);
});
}
future<bool> standard_role_manager::is_superuser(stdx::string_view role_name) const {
return require_record(_qp, role_name).then([](record r) {
return r.is_superuser;
});
}
future<bool> standard_role_manager::can_login(stdx::string_view role_name) const {
return require_record(_qp, role_name).then([](record r) {
return r.can_login;
});
}
}
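The interplay of collect_roles, query_granted and the two checks in grant above can be sketched with an in-memory map in place of the roles table. This is an illustration under stated assumptions: `membership_map`, `collect` and the exception types are hypothetical, and everything runs synchronously. As in the source, the recursion has no visited-set guard, so it terminates only because grant refuses to create a cycle.

```cpp
#include <map>
#include <set>
#include <stdexcept>
#include <string>

using role_set = std::set<std::string>;
// Hypothetical in-memory replacement for the roles table: role -> roles granted directly to it.
using membership_map = std::map<std::string, role_set>;

// Add the roles granted to `grantee` to `out`, following grants transitively when `recurse` is set.
void collect(const membership_map& m, const std::string& grantee, bool recurse, role_set& out) {
    const auto it = m.find(grantee);
    if (it == m.end()) {
        throw std::out_of_range("no such role: " + grantee);
    }
    for (const auto& role : it->second) {
        out.insert(role);
        if (recurse) {
            collect(m, role, true, out);
        }
    }
}

// A role is always granted to itself, so the result starts with `grantee`.
role_set query_granted(const membership_map& m, const std::string& grantee, bool recurse) {
    role_set out{grantee};
    collect(m, grantee, recurse, out);
    return out;
}

// Refuse a grant that is redundant or that would make the membership graph cyclic.
void grant(membership_map& m, const std::string& grantee, const std::string& role) {
    if (query_granted(m, grantee, true).count(role) != 0) {
        throw std::invalid_argument(grantee + " already includes " + role);
    }
    if (query_granted(m, role, true).count(grantee) != 0) {
        throw std::invalid_argument(role + " already includes " + grantee);
    }
    m[grantee].insert(role);
}
```

Both checks use the recursive query, which is what catches indirect redundancy and indirect cycles, not only direct ones.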

@@ -1,105 +0,0 @@
/*
* Copyright (C) 2017 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "auth/role_manager.hh"
#include <experimental/string_view>
#include <unordered_set>
#include <seastar/core/abort_source.hh>
#include <seastar/core/future.hh>
#include <seastar/core/sstring.hh>
#include "stdx.hh"
#include "seastarx.hh"
namespace cql3 {
class query_processor;
}
namespace service {
class migration_manager;
}
namespace auth {
stdx::string_view standard_role_manager_name() noexcept;
class standard_role_manager final : public role_manager {
cql3::query_processor& _qp;
::service::migration_manager& _migration_manager;
future<> _stopped;
seastar::abort_source _as;
public:
standard_role_manager(cql3::query_processor& qp, ::service::migration_manager& mm)
: _qp(qp)
, _migration_manager(mm)
, _stopped(make_ready_future<>()) {
}
virtual stdx::string_view qualified_java_name() const noexcept override;
virtual const resource_set& protected_resources() const override;
virtual future<> start() override;
virtual future<> stop() override;
virtual future<> create(stdx::string_view role_name, const role_config&) const override;
virtual future<> drop(stdx::string_view role_name) const override;
virtual future<> alter(stdx::string_view role_name, const role_config_update&) const override;
virtual future<> grant(stdx::string_view grantee_name, stdx::string_view role_name) const override;
virtual future<> revoke(stdx::string_view revokee_name, stdx::string_view role_name) const override;
virtual future<role_set> query_granted(stdx::string_view grantee_name, recursive_role_query) const override;
virtual future<role_set> query_all() const override;
virtual future<bool> exists(stdx::string_view role_name) const override;
virtual future<bool> is_superuser(stdx::string_view role_name) const override;
virtual future<bool> can_login(stdx::string_view role_name) const override;
private:
enum class membership_change { add, remove };
future<> create_metadata_tables_if_missing() const;
bool legacy_metadata_exists() const;
future<> migrate_legacy_metadata() const;
future<> create_default_role_if_missing() const;
future<> create_or_replace(stdx::string_view role_name, const role_config&) const;
future<> modify_membership(stdx::string_view grantee_name, stdx::string_view role_name, membership_change) const;
};
}

@@ -1,262 +0,0 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright (C) 2017 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "auth/authenticated_user.hh"
#include "auth/authenticator.hh"
#include "auth/authorizer.hh"
#include "auth/default_authorizer.hh"
#include "auth/password_authenticator.hh"
#include "auth/permission.hh"
#include "db/config.hh"
#include "utils/class_registrator.hh"
namespace auth {
static const sstring PACKAGE_NAME("com.scylladb.auth.");
static const sstring& transitional_authenticator_name() {
static const sstring name = PACKAGE_NAME + "TransitionalAuthenticator";
return name;
}
static const sstring& transitional_authorizer_name() {
static const sstring name = PACKAGE_NAME + "TransitionalAuthorizer";
return name;
}
class transitional_authenticator : public authenticator {
std::unique_ptr<authenticator> _authenticator;
public:
static const sstring PASSWORD_AUTHENTICATOR_NAME;
transitional_authenticator(cql3::query_processor& qp, ::service::migration_manager& mm)
: transitional_authenticator(std::make_unique<password_authenticator>(qp, mm)) {
}
transitional_authenticator(std::unique_ptr<authenticator> a)
: _authenticator(std::move(a)) {
}
virtual future<> start() override {
return _authenticator->start();
}
virtual future<> stop() override {
return _authenticator->stop();
}
virtual const sstring& qualified_java_name() const override {
return transitional_authenticator_name();
}
virtual bool require_authentication() const override {
return true;
}
virtual authentication_option_set supported_options() const override {
return _authenticator->supported_options();
}
virtual authentication_option_set alterable_options() const override {
return _authenticator->alterable_options();
}
virtual future<authenticated_user> authenticate(const credentials_map& credentials) const override {
auto i = credentials.find(authenticator::USERNAME_KEY);
if ((i == credentials.end() || i->second.empty())
&& (!credentials.count(PASSWORD_KEY) || credentials.at(PASSWORD_KEY).empty())) {
// return anon user
return make_ready_future<authenticated_user>(anonymous_user());
}
return make_ready_future().then([this, &credentials] {
return _authenticator->authenticate(credentials);
}).handle_exception([](auto ep) {
try {
std::rethrow_exception(ep);
} catch (exceptions::authentication_exception&) {
// return anon user
return make_ready_future<authenticated_user>(anonymous_user());
}
});
}
virtual future<> create(stdx::string_view role_name, const authentication_options& options) const override {
return _authenticator->create(role_name, options);
}
virtual future<> alter(stdx::string_view role_name, const authentication_options& options) const override {
return _authenticator->alter(role_name, options);
}
virtual future<> drop(stdx::string_view role_name) const override {
return _authenticator->drop(role_name);
}
virtual future<custom_options> query_custom_options(stdx::string_view role_name) const override {
return _authenticator->query_custom_options(role_name);
}
virtual const resource_set& protected_resources() const override {
return _authenticator->protected_resources();
}
virtual ::shared_ptr<sasl_challenge> new_sasl_challenge() const override {
class sasl_wrapper : public sasl_challenge {
public:
sasl_wrapper(::shared_ptr<sasl_challenge> sasl)
: _sasl(std::move(sasl)) {
}
virtual bytes evaluate_response(bytes_view client_response) override {
try {
return _sasl->evaluate_response(client_response);
} catch (exceptions::authentication_exception&) {
_complete = true;
return {};
}
}
virtual bool is_complete() const override {
return _complete || _sasl->is_complete();
}
virtual future<authenticated_user> get_authenticated_user() const {
return futurize_apply([this] {
return _sasl->get_authenticated_user().handle_exception([](auto ep) {
try {
std::rethrow_exception(ep);
} catch (exceptions::authentication_exception&) {
// return anon user
return make_ready_future<authenticated_user>(anonymous_user());
}
});
});
}
private:
::shared_ptr<sasl_challenge> _sasl;
bool _complete = false;
};
return ::make_shared<sasl_wrapper>(_authenticator->new_sasl_challenge());
}
};
class transitional_authorizer : public authorizer {
std::unique_ptr<authorizer> _authorizer;
public:
transitional_authorizer(cql3::query_processor& qp, ::service::migration_manager& mm)
: transitional_authorizer(std::make_unique<default_authorizer>(qp, mm)) {
}
transitional_authorizer(std::unique_ptr<authorizer> a)
: _authorizer(std::move(a)) {
}
~transitional_authorizer() {
}
virtual future<> start() override {
return _authorizer->start();
}
virtual future<> stop() override {
return _authorizer->stop();
}
virtual const sstring& qualified_java_name() const override {
return transitional_authorizer_name();
}
virtual future<permission_set> authorize(const role_or_anonymous&, const resource&) const override {
static const permission_set transitional_permissions =
permission_set::of<
permission::CREATE,
permission::ALTER,
permission::DROP,
permission::SELECT,
permission::MODIFY>();
return make_ready_future<permission_set>(transitional_permissions);
}
virtual future<> grant(stdx::string_view s, permission_set ps, const resource& r) const override {
return _authorizer->grant(s, std::move(ps), r);
}
virtual future<> revoke(stdx::string_view s, permission_set ps, const resource& r) const override {
return _authorizer->revoke(s, std::move(ps), r);
}
virtual future<std::vector<permission_details>> list_all() const override {
return _authorizer->list_all();
}
virtual future<> revoke_all(stdx::string_view s) const override {
return _authorizer->revoke_all(s);
}
virtual future<> revoke_all(const resource& r) const override {
return _authorizer->revoke_all(r);
}
virtual const resource_set& protected_resources() const override {
return _authorizer->protected_resources();
}
};
}
//
// To ensure correct initialization order, we unfortunately need to use string literals.
//
static const class_registrator<
auth::authenticator,
auth::transitional_authenticator,
cql3::query_processor&,
::service::migration_manager&> transitional_authenticator_reg(auth::PACKAGE_NAME + "TransitionalAuthenticator");
static const class_registrator<
auth::authorizer,
auth::transitional_authorizer,
cql3::query_processor&,
::service::migration_manager&> transitional_authorizer_reg(auth::PACKAGE_NAME + "TransitionalAuthorizer");

@@ -1,146 +0,0 @@
/*
* Copyright (C) 2017 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <seastar/core/scheduling.hh>
#include <seastar/core/timer.hh>
#include <seastar/core/gate.hh>
#include <chrono>
// Simple proportional controller to adjust shares for processes for which a backlog can be clearly
// defined.
//
// Goal is to consume the backlog as fast as we can, but not so fast that we steal all the CPU from
// incoming requests, and at the same time minimize user-visible fluctuations in the quota.
//
// What that translates to is we'll try to keep the backlog's first derivative at 0 (IOW, we keep
// backlog constant). As the backlog grows we increase CPU usage, decreasing CPU usage as the
// backlog diminishes.
//
// The exact point at which the controller stops determines the desired CPU usage. As the backlog
// grows and approaches the desired maximum, we need to be more aggressive. We will therefore define two
// thresholds, and increase the constant as we cross them.
//
// Doing that divides the range in three (before the first, between first and second, and after
// second threshold), and we'll be slow to grow in the first region, grow normally in the second
// region, and aggressively in the third region.
//
// The constants q1 and q2 are used to determine the proportional factor at each stage.
class backlog_controller {
public:
future<> shutdown() {
_update_timer.cancel();
return std::move(_inflight_update);
}
protected:
struct control_point {
float input;
float output;
};
seastar::scheduling_group _scheduling_group;
const ::io_priority_class& _io_priority;
std::chrono::milliseconds _interval;
timer<> _update_timer;
std::vector<control_point> _control_points;
std::function<float()> _current_backlog;
// Updating shares for an I/O class may contact another shard, so it returns a future.
future<> _inflight_update;
virtual void update_controller(float quota);
void adjust();
backlog_controller(seastar::scheduling_group sg, const ::io_priority_class& iop, std::chrono::milliseconds interval,
std::vector<control_point> control_points, std::function<float()> backlog)
: _scheduling_group(sg)
, _io_priority(iop)
, _interval(interval)
, _update_timer([this] { adjust(); })
, _control_points({{0,0}})
, _current_backlog(std::move(backlog))
, _inflight_update(make_ready_future<>())
{
_control_points.insert(_control_points.end(), control_points.begin(), control_points.end());
_update_timer.arm_periodic(_interval);
}
// Used when the controllers are disabled and a static share is used.
// When that option is deprecated we should remove this.
backlog_controller(seastar::scheduling_group sg, const ::io_priority_class& iop, float static_shares)
: _scheduling_group(sg)
, _io_priority(iop)
, _inflight_update(make_ready_future<>())
{
update_controller(static_shares);
}
virtual ~backlog_controller() {}
public:
backlog_controller(backlog_controller&&) = default;
float backlog_of_shares(float shares) const;
seastar::scheduling_group sg() {
return _scheduling_group;
}
};
// memtable flush CPU controller.
//
// - First threshold is the soft limit line,
// - Maximum is the point at which we'd stop consuming requests,
// - Second threshold is halfway between them.
//
// Below the soft limit, we are in no particular hurry to flush, since it means we're set to
// complete flushing before a new memtable is ready. The quota is dirty * q1, and q1 is set to a
// low number.
//
// The first half of the virtual dirty region is where we expect to be usually, so we have a low
// slope corresponding to a sluggish response between q1 * soft_limit and q2.
//
// In the second half, we're getting close to the hard dirty limit so we increase the slope and
// become more responsive, up to a maximum quota of qmax.
class flush_controller : public backlog_controller {
static constexpr float hard_dirty_limit = 1.0f;
public:
flush_controller(seastar::scheduling_group sg, const ::io_priority_class& iop, float static_shares) : backlog_controller(sg, iop, static_shares) {}
flush_controller(seastar::scheduling_group sg, const ::io_priority_class& iop, std::chrono::milliseconds interval, float soft_limit, std::function<float()> current_dirty)
: backlog_controller(sg, iop, std::move(interval),
std::vector<backlog_controller::control_point>({{soft_limit, 10}, {soft_limit + (hard_dirty_limit - soft_limit) / 2, 200} , {hard_dirty_limit, 1000}}),
std::move(current_dirty)
)
{}
};
class compaction_controller : public backlog_controller {
public:
static constexpr unsigned normalization_factor = 30;
static constexpr float disable_backlog = std::numeric_limits<double>::infinity();
static constexpr float backlog_disabled(float backlog) { return std::isinf(backlog); }
compaction_controller(seastar::scheduling_group sg, const ::io_priority_class& iop, float static_shares) : backlog_controller(sg, iop, static_shares) {}
compaction_controller(seastar::scheduling_group sg, const ::io_priority_class& iop, std::chrono::milliseconds interval, std::function<float()> current_backlog)
: backlog_controller(sg, iop, std::move(interval),
std::vector<backlog_controller::control_point>({{0.5, 10}, {1.5, 100} , {normalization_factor, 1000}}),
std::move(current_backlog)
)
{}
};
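The body of `adjust()` is not part of this diff, so the following is only a sketch of how a controller of this kind could map a backlog to shares, assuming plain linear interpolation between neighbouring control points (which is what the "slope" wording in the comments above suggests). The names `shares_for_backlog` and `flush_points` are hypothetical and do not appear in the original file; `control_point` is redeclared here so the sketch stands alone.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Redeclared for the sketch: input is the backlog, output is the shares.
struct control_point {
    float input;
    float output;
};

// Hypothetical helper: piecewise-linear interpolation over control points
// that are sorted by input. Backlogs outside the covered range are clamped
// to the first or last output.
float shares_for_backlog(const std::vector<control_point>& points, float backlog) {
    assert(points.size() >= 2);
    if (backlog >= points.back().input) {
        return points.back().output;
    }
    if (backlog <= points.front().input) {
        return points.front().output;
    }
    // Find the first control point whose input is not below the backlog.
    std::size_t idx = 1;
    while (idx < points.size() - 1 && points[idx].input < backlog) {
        ++idx;
    }
    const control_point& hi = points[idx];
    const control_point& lo = points[idx - 1];
    return lo.output + (backlog - lo.input) * (hi.output - lo.output) / (hi.input - lo.input);
}

// Hypothetical helper: the control points flush_controller passes to its base
// class, preceded by the {0, 0} origin the base constructor adds itself.
std::vector<control_point> flush_points(float soft_limit) {
    const float hard_dirty_limit = 1.0f;
    return {
        {0.0f, 0.0f},
        {soft_limit, 10.0f},
        {soft_limit + (hard_dirty_limit - soft_limit) / 2, 200.0f},
        {hard_dirty_limit, 1000.0f},
    };
}
```

With a soft limit of 0.5, this gives the three regions described above: a shallow slope up to 10 shares at the soft limit, a steeper one up to 200 shares at the halfway point, and the steepest one up to 1000 shares at the hard limit.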

@@ -21,17 +21,14 @@
#pragma once
#include "seastarx.hh"
#include "core/sstring.hh"
#include "hashing.hh"
#include <experimental/optional>
#include <iosfwd>
#include <functional>
#include "utils/mutable_view.hh"
using bytes = basic_sstring<int8_t, uint32_t, 31, false>;
using bytes = basic_sstring<int8_t, uint32_t, 31>;
using bytes_view = std::experimental::basic_string_view<int8_t>;
using bytes_mutable_view = basic_mutable_view<bytes_view::value_type>;
using bytes_opt = std::experimental::optional<bytes>;
using sstring_view = std::experimental::string_view;
@@ -78,11 +75,3 @@ struct appending_hash<bytes_view> {
h.update(reinterpret_cast<const char*>(v.begin()), v.size() * sizeof(bytes_view::value_type));
}
};
inline int32_t compare_unsigned(bytes_view v1, bytes_view v2) {
auto n = memcmp(v1.begin(), v2.begin(), std::min(v1.size(), v2.size()));
if (n) {
return n;
}
return (int32_t) (v1.size() - v2.size());
}

@@ -65,9 +65,8 @@ private:
size_type _size;
public:
class fragment_iterator : public std::iterator<std::input_iterator_tag, bytes_view> {
chunk* _current = nullptr;
chunk* _current;
public:
fragment_iterator() = default;
fragment_iterator(chunk* current) : _current(current) {}
fragment_iterator(const fragment_iterator&) = default;
fragment_iterator& operator=(const fragment_iterator&) = default;
@@ -290,24 +289,6 @@ public:
}
}
// Removes n bytes from the end of the bytes_ostream.
// Beware of O(n) algorithm.
void remove_suffix(size_t n) {
_size -= n;
auto left = _size;
auto current = _begin.get();
while (current) {
if (current->offset >= left) {
current->offset = left;
_current = current;
current->next.reset();
return;
}
left -= current->offset;
current = current->next.get();
}
}
// begin() and end() form an input range to bytes_view representing fragments.
// Any modification of this instance invalidates iterators.
fragment_iterator begin() const { return { _begin.get() }; }

@@ -1,671 +0,0 @@
/*
* Copyright (C) 2017 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <vector>
#include "row_cache.hh"
#include "mutation_reader.hh"
#include "mutation_fragment.hh"
#include "partition_version.hh"
#include "utils/logalloc.hh"
#include "query-request.hh"
#include "partition_snapshot_reader.hh"
#include "partition_snapshot_row_cursor.hh"
#include "read_context.hh"
#include "flat_mutation_reader.hh"
namespace cache {
extern logging::logger clogger;
class cache_flat_mutation_reader final : public flat_mutation_reader::impl {
enum class state {
before_static_row,
// Invariants:
// - position_range(_lower_bound, _upper_bound) covers all not yet emitted positions from current range
// - if _next_row has valid iterators:
// - _next_row points to the nearest row in cache >= _lower_bound
// - _next_row_in_range = _next_row.position() < _upper_bound
// - if _next_row doesn't have valid iterators, it has no meaning.
reading_from_cache,
// Starts reading from underlying reader.
// The range to read is position_range(_lower_bound, min(_next_row.position(), _upper_bound)).
// Invariants:
// - _next_row_in_range = _next_row.position() < _upper_bound
move_to_underlying,
// Invariants:
// - Upper bound of the read is min(_next_row.position(), _upper_bound)
// - _next_row_in_range = _next_row.position() < _upper_bound
// - _last_row points at a direct predecessor of the next row which is going to be read.
// Used for populating continuity.
reading_from_underlying,
end_of_stream
};
partition_snapshot_ptr _snp;
position_in_partition::tri_compare _position_cmp;
query::clustering_key_filter_ranges _ck_ranges;
query::clustering_row_ranges::const_iterator _ck_ranges_curr;
query::clustering_row_ranges::const_iterator _ck_ranges_end;
lsa_manager _lsa_manager;
partition_snapshot_row_weakref _last_row;
// Holds the lower bound of a position range which hasn't been processed yet.
// Only rows with positions < _lower_bound have been emitted, and only
// range_tombstones with positions <= _lower_bound.
position_in_partition _lower_bound;
position_in_partition_view _upper_bound;
state _state = state::before_static_row;
lw_shared_ptr<read_context> _read_context;
partition_snapshot_row_cursor _next_row;
bool _next_row_in_range = false;
// Whether _lower_bound was changed within current fill_buffer().
// If it did not then we cannot break out of it (e.g. on preemption) because
// forward progress is not guaranteed in case iterators are getting constantly invalidated.
bool _lower_bound_changed = false;
future<> do_fill_buffer(db::timeout_clock::time_point);
void copy_from_cache_to_buffer();
future<> process_static_row(db::timeout_clock::time_point);
void move_to_end();
void move_to_next_range();
void move_to_range(query::clustering_row_ranges::const_iterator);
void move_to_next_entry();
void add_to_buffer(const partition_snapshot_row_cursor&);
void add_clustering_row_to_buffer(mutation_fragment&&);
void add_to_buffer(range_tombstone&&);
void add_to_buffer(mutation_fragment&&);
future<> read_from_underlying(db::timeout_clock::time_point);
void start_reading_from_underlying();
bool after_current_range(position_in_partition_view position);
bool can_populate() const;
// Marks the range between _last_row (exclusive) and _next_row (exclusive) as continuous,
// provided that the underlying reader still matches the latest version of the partition.
void maybe_update_continuity();
// Tries to ensure that the lower bound of the current population range exists.
// Returns false if it failed and range cannot be populated.
// Assumes can_populate().
bool ensure_population_lower_bound();
void maybe_add_to_cache(const mutation_fragment& mf);
void maybe_add_to_cache(const clustering_row& cr);
void maybe_add_to_cache(const range_tombstone& rt);
void maybe_add_to_cache(const static_row& sr);
void maybe_set_static_row_continuous();
void finish_reader() {
push_mutation_fragment(partition_end());
_end_of_stream = true;
_state = state::end_of_stream;
}
void touch_partition();
public:
cache_flat_mutation_reader(schema_ptr s,
dht::decorated_key dk,
query::clustering_key_filter_ranges&& crr,
lw_shared_ptr<read_context> ctx,
partition_snapshot_ptr snp,
row_cache& cache)
: flat_mutation_reader::impl(std::move(s))
, _snp(std::move(snp))
, _position_cmp(*_schema)
, _ck_ranges(std::move(crr))
, _ck_ranges_curr(_ck_ranges.begin())
, _ck_ranges_end(_ck_ranges.end())
, _lsa_manager(cache)
, _lower_bound(position_in_partition::before_all_clustered_rows())
, _upper_bound(position_in_partition_view::before_all_clustered_rows())
, _read_context(std::move(ctx))
, _next_row(*_schema, *_snp)
{
clogger.trace("csm {}: table={}.{}", this, _schema->ks_name(), _schema->cf_name());
push_mutation_fragment(partition_start(std::move(dk), _snp->partition_tombstone()));
}
cache_flat_mutation_reader(const cache_flat_mutation_reader&) = delete;
cache_flat_mutation_reader(cache_flat_mutation_reader&&) = delete;
virtual future<> fill_buffer(db::timeout_clock::time_point timeout) override;
virtual void next_partition() override {
clear_buffer_to_next_partition();
if (is_buffer_empty()) {
_end_of_stream = true;
}
}
virtual future<> fast_forward_to(const dht::partition_range&, db::timeout_clock::time_point timeout) override {
clear_buffer();
_end_of_stream = true;
return make_ready_future<>();
}
virtual future<> fast_forward_to(position_range pr, db::timeout_clock::time_point timeout) override {
throw std::bad_function_call();
}
};
inline
future<> cache_flat_mutation_reader::process_static_row(db::timeout_clock::time_point timeout) {
if (_snp->static_row_continuous()) {
_read_context->cache().on_row_hit();
static_row sr = _lsa_manager.run_in_read_section([this] {
return _snp->static_row(_read_context->digest_requested());
});
if (!sr.empty()) {
push_mutation_fragment(mutation_fragment(std::move(sr)));
}
return make_ready_future<>();
} else {
_read_context->cache().on_row_miss();
return _read_context->get_next_fragment(timeout).then([this] (mutation_fragment_opt&& sr) {
if (sr) {
assert(sr->is_static_row());
maybe_add_to_cache(sr->as_static_row());
push_mutation_fragment(std::move(*sr));
}
maybe_set_static_row_continuous();
});
}
}
inline
void cache_flat_mutation_reader::touch_partition() {
if (_snp->at_latest_version()) {
rows_entry& last_dummy = *_snp->version()->partition().clustered_rows().rbegin();
_snp->tracker()->touch(last_dummy);
}
}
inline
future<> cache_flat_mutation_reader::fill_buffer(db::timeout_clock::time_point timeout) {
if (_state == state::before_static_row) {
auto after_static_row = [this, timeout] {
if (_ck_ranges_curr == _ck_ranges_end) {
touch_partition();
finish_reader();
return make_ready_future<>();
}
_state = state::reading_from_cache;
_lsa_manager.run_in_read_section([this] {
move_to_range(_ck_ranges_curr);
});
return fill_buffer(timeout);
};
if (_schema->has_static_columns()) {
return process_static_row(timeout).then(std::move(after_static_row));
} else {
return after_static_row();
}
}
clogger.trace("csm {}: fill_buffer(), range={}, lb={}", this, *_ck_ranges_curr, _lower_bound);
return do_until([this] { return _end_of_stream || is_buffer_full(); }, [this, timeout] {
return do_fill_buffer(timeout);
});
}
inline
future<> cache_flat_mutation_reader::do_fill_buffer(db::timeout_clock::time_point timeout) {
if (_state == state::move_to_underlying) {
_state = state::reading_from_underlying;
auto end = _next_row_in_range ? position_in_partition(_next_row.position())
: position_in_partition(_upper_bound);
return _read_context->fast_forward_to(position_range{_lower_bound, std::move(end)}, timeout).then([this, timeout] {
return read_from_underlying(timeout);
});
}
if (_state == state::reading_from_underlying) {
return read_from_underlying(timeout);
}
// assert(_state == state::reading_from_cache)
return _lsa_manager.run_in_read_section([this] {
auto next_valid = _next_row.iterators_valid();
clogger.trace("csm {}: reading_from_cache, range=[{}, {}), next={}, valid={}", this, _lower_bound,
_upper_bound, _next_row.position(), next_valid);
// We assume that if there was eviction, and thus the range may
// no longer be continuous, the cursor was invalidated.
if (!next_valid) {
auto adjacent = _next_row.advance_to(_lower_bound);
_next_row_in_range = !after_current_range(_next_row.position());
if (!adjacent && !_next_row.continuous()) {
_last_row = nullptr; // We could insert a dummy here, but this path is unlikely.
start_reading_from_underlying();
return make_ready_future<>();
}
}
_next_row.maybe_refresh();
clogger.trace("csm {}: next={}, cont={}", this, _next_row.position(), _next_row.continuous());
_lower_bound_changed = false;
while (_state == state::reading_from_cache) {
copy_from_cache_to_buffer();
// We need to check _lower_bound_changed even if is_buffer_full() because
// we may have emitted only a range tombstone which overlapped with _lower_bound
// and thus didn't cause _lower_bound to change.
if ((need_preempt() || is_buffer_full()) && _lower_bound_changed) {
break;
}
}
return make_ready_future<>();
});
}
inline
future<> cache_flat_mutation_reader::read_from_underlying(db::timeout_clock::time_point timeout) {
return consume_mutation_fragments_until(_read_context->underlying().underlying(),
[this] { return _state != state::reading_from_underlying || is_buffer_full(); },
[this] (mutation_fragment mf) {
_read_context->cache().on_row_miss();
maybe_add_to_cache(mf);
add_to_buffer(std::move(mf));
},
[this] {
_state = state::reading_from_cache;
_lsa_manager.run_in_update_section([this] {
auto same_pos = _next_row.maybe_refresh();
if (!same_pos) {
_read_context->cache().on_mispopulate(); // FIXME: Insert dummy entry at _upper_bound.
_next_row_in_range = !after_current_range(_next_row.position());
if (!_next_row.continuous()) {
start_reading_from_underlying();
}
return;
}
if (_next_row_in_range) {
maybe_update_continuity();
_last_row = _next_row;
add_to_buffer(_next_row);
try {
move_to_next_entry();
} catch (const std::bad_alloc&) {
// We cannot reenter the section, since we may have moved to the new range, and
// because add_to_buffer() should not be repeated.
_snp->region().allocator().invalidate_references(); // Invalidates _next_row
}
} else {
if (no_clustering_row_between(*_schema, _upper_bound, _next_row.position())) {
this->maybe_update_continuity();
} else if (can_populate()) {
rows_entry::compare less(*_schema);
auto& rows = _snp->version()->partition().clustered_rows();
if (query::is_single_row(*_schema, *_ck_ranges_curr)) {
with_allocator(_snp->region().allocator(), [&] {
auto e = alloc_strategy_unique_ptr<rows_entry>(
current_allocator().construct<rows_entry>(_ck_ranges_curr->start()->value()));
// Use _next_row iterator only as a hint, because there could be insertions after _upper_bound.
auto insert_result = rows.insert_check(_next_row.get_iterator_in_latest_version(), *e, less);
auto inserted = insert_result.second;
auto it = insert_result.first;
if (inserted) {
_snp->tracker()->insert(*e);
e.release();
auto next = std::next(it);
it->set_continuous(next->continuous());
clogger.trace("csm {}: inserted dummy at {}, cont={}", this, it->position(), it->continuous());
}
});
} else if (ensure_population_lower_bound()) {
with_allocator(_snp->region().allocator(), [&] {
auto e = alloc_strategy_unique_ptr<rows_entry>(
current_allocator().construct<rows_entry>(*_schema, _upper_bound, is_dummy::yes, is_continuous::yes));
// Use _next_row iterator only as a hint, because there could be insertions after _upper_bound.
auto insert_result = rows.insert_check(_next_row.get_iterator_in_latest_version(), *e, less);
auto inserted = insert_result.second;
if (inserted) {
clogger.trace("csm {}: inserted dummy at {}", this, _upper_bound);
_snp->tracker()->insert(*e);
e.release();
} else {
clogger.trace("csm {}: mark {} as continuous", this, insert_result.first->position());
insert_result.first->set_continuous(true);
}
});
}
} else {
_read_context->cache().on_mispopulate();
}
try {
move_to_next_range();
} catch (const std::bad_alloc&) {
// We cannot reenter the section, since we may have moved to the new range
_snp->region().allocator().invalidate_references(); // Invalidates _next_row
}
}
});
return make_ready_future<>();
});
}
inline
bool cache_flat_mutation_reader::ensure_population_lower_bound() {
if (!_ck_ranges_curr->start()) {
return true;
}
if (!_last_row.refresh(*_snp)) {
return false;
}
// Continuity flag we will later set for the upper bound extends to the previous row in the same version,
// so we need to ensure we have an entry in the latest version.
if (!_last_row.is_in_latest_version()) {
with_allocator(_snp->region().allocator(), [&] {
auto& rows = _snp->version()->partition().clustered_rows();
rows_entry::compare less(*_schema);
// FIXME: Avoid the copy by inserting an incomplete clustering row
auto e = alloc_strategy_unique_ptr<rows_entry>(
current_allocator().construct<rows_entry>(*_schema, *_last_row));
e->set_continuous(false);
auto insert_result = rows.insert_check(rows.end(), *e, less);
auto inserted = insert_result.second;
if (inserted) {
clogger.trace("csm {}: inserted lower bound dummy at {}", this, e->position());
_snp->tracker()->insert(*e);
e.release();
}
});
}
return true;
}
inline
void cache_flat_mutation_reader::maybe_update_continuity() {
if (can_populate() && ensure_population_lower_bound()) {
with_allocator(_snp->region().allocator(), [&] {
rows_entry& e = _next_row.ensure_entry_in_latest().row;
e.set_continuous(true);
});
} else {
_read_context->cache().on_mispopulate();
}
}
inline
void cache_flat_mutation_reader::maybe_add_to_cache(const mutation_fragment& mf) {
if (mf.is_range_tombstone()) {
maybe_add_to_cache(mf.as_range_tombstone());
} else {
assert(mf.is_clustering_row());
const clustering_row& cr = mf.as_clustering_row();
maybe_add_to_cache(cr);
}
}
inline
void cache_flat_mutation_reader::maybe_add_to_cache(const clustering_row& cr) {
if (!can_populate()) {
_last_row = nullptr;
_read_context->cache().on_mispopulate();
return;
}
clogger.trace("csm {}: populate({})", this, cr);
_lsa_manager.run_in_update_section_with_allocator([this, &cr] {
mutation_partition& mp = _snp->version()->partition();
rows_entry::compare less(*_schema);
if (_read_context->digest_requested()) {
cr.cells().prepare_hash(*_schema, column_kind::regular_column);
}
auto new_entry = alloc_strategy_unique_ptr<rows_entry>(
current_allocator().construct<rows_entry>(*_schema, cr.key(), cr.tomb(), cr.marker(), cr.cells()));
new_entry->set_continuous(false);
auto it = _next_row.iterators_valid() ? _next_row.get_iterator_in_latest_version()
: mp.clustered_rows().lower_bound(cr.key(), less);
auto insert_result = mp.clustered_rows().insert_check(it, *new_entry, less);
if (insert_result.second) {
_snp->tracker()->insert(*new_entry);
new_entry.release();
}
it = insert_result.first;
rows_entry& e = *it;
if (ensure_population_lower_bound()) {
clogger.trace("csm {}: set_continuous({})", this, e.position());
e.set_continuous(true);
} else {
_read_context->cache().on_mispopulate();
}
with_allocator(standard_allocator(), [&] {
_last_row = partition_snapshot_row_weakref(*_snp, it, true);
});
});
}
inline
bool cache_flat_mutation_reader::after_current_range(position_in_partition_view p) {
return _position_cmp(p, _upper_bound) >= 0;
}
inline
void cache_flat_mutation_reader::start_reading_from_underlying() {
clogger.trace("csm {}: start_reading_from_underlying(), range=[{}, {})", this, _lower_bound, _next_row_in_range ? _next_row.position() : _upper_bound);
_state = state::move_to_underlying;
_next_row.touch();
}
inline
void cache_flat_mutation_reader::copy_from_cache_to_buffer() {
clogger.trace("csm {}: copy_from_cache, next={}, next_row_in_range={}", this, _next_row.position(), _next_row_in_range);
_next_row.touch();
position_in_partition_view next_lower_bound = _next_row.dummy() ? _next_row.position() : position_in_partition_view::after_key(_next_row.key());
for (auto &&rts : _snp->range_tombstones(_lower_bound, _next_row_in_range ? next_lower_bound : _upper_bound)) {
position_in_partition::less_compare less(*_schema);
// This guarantees that rts starts after any emitted clustering_row
// and not before any emitted range tombstone.
if (!less(_lower_bound, rts.position())) {
rts.set_start(*_schema, _lower_bound);
} else {
_lower_bound = position_in_partition(rts.position());
_lower_bound_changed = true;
if (is_buffer_full()) {
return;
}
}
push_mutation_fragment(std::move(rts));
}
// We add the row to the buffer even when it's full.
// This simplifies the code. For more info see #3139.
if (_next_row_in_range) {
_last_row = _next_row;
add_to_buffer(_next_row);
move_to_next_entry();
} else {
move_to_next_range();
}
}
inline
void cache_flat_mutation_reader::move_to_end() {
finish_reader();
clogger.trace("csm {}: eos", this);
}
inline
void cache_flat_mutation_reader::move_to_next_range() {
auto next_it = std::next(_ck_ranges_curr);
if (next_it == _ck_ranges_end) {
move_to_end();
_ck_ranges_curr = next_it;
} else {
move_to_range(next_it);
}
}
inline
void cache_flat_mutation_reader::move_to_range(query::clustering_row_ranges::const_iterator next_it) {
auto lb = position_in_partition::for_range_start(*next_it);
auto ub = position_in_partition_view::for_range_end(*next_it);
_last_row = nullptr;
_lower_bound = std::move(lb);
_upper_bound = std::move(ub);
_lower_bound_changed = true;
_ck_ranges_curr = next_it;
auto adjacent = _next_row.advance_to(_lower_bound);
_next_row_in_range = !after_current_range(_next_row.position());
clogger.trace("csm {}: move_to_range(), range={}, lb={}, ub={}, next={}", this, *_ck_ranges_curr, _lower_bound, _upper_bound, _next_row.position());
if (!adjacent && !_next_row.continuous()) {
// FIXME: We don't insert a dummy for singular range to avoid allocating 3 entries
// for a hit (before, at and after). If we supported the concept of an incomplete row,
// we could insert such a row for the lower bound if it's full instead, for both singular and
// non-singular ranges.
if (_ck_ranges_curr->start() && !query::is_single_row(*_schema, *_ck_ranges_curr)) {
// Insert dummy for lower bound
if (can_populate()) {
// FIXME: _lower_bound could be adjacent to the previous row, in which case we could skip this
clogger.trace("csm {}: insert dummy at {}", this, _lower_bound);
auto it = with_allocator(_lsa_manager.region().allocator(), [&] {
auto& rows = _snp->version()->partition().clustered_rows();
auto new_entry = current_allocator().construct<rows_entry>(*_schema, _lower_bound, is_dummy::yes, is_continuous::no);
return rows.insert_before(_next_row.get_iterator_in_latest_version(), *new_entry);
});
_snp->tracker()->insert(*it);
_last_row = partition_snapshot_row_weakref(*_snp, it, true);
} else {
_read_context->cache().on_mispopulate();
}
}
start_reading_from_underlying();
}
}
// _next_row must be inside the range.
inline
void cache_flat_mutation_reader::move_to_next_entry() {
clogger.trace("csm {}: move_to_next_entry(), curr={}", this, _next_row.position());
if (no_clustering_row_between(*_schema, _next_row.position(), _upper_bound)) {
move_to_next_range();
} else {
if (!_next_row.next()) {
move_to_end();
return;
}
_next_row_in_range = !after_current_range(_next_row.position());
clogger.trace("csm {}: next={}, cont={}, in_range={}", this, _next_row.position(), _next_row.continuous(), _next_row_in_range);
if (!_next_row.continuous()) {
start_reading_from_underlying();
}
}
}
inline
void cache_flat_mutation_reader::add_to_buffer(mutation_fragment&& mf) {
clogger.trace("csm {}: add_to_buffer({})", this, mf);
if (mf.is_clustering_row()) {
add_clustering_row_to_buffer(std::move(mf));
} else {
assert(mf.is_range_tombstone());
add_to_buffer(std::move(mf).as_range_tombstone());
}
}
inline
void cache_flat_mutation_reader::add_to_buffer(const partition_snapshot_row_cursor& row) {
if (!row.dummy()) {
_read_context->cache().on_row_hit();
add_clustering_row_to_buffer(row.row(_read_context->digest_requested()));
}
}
// Maintains the following invariants, also in case of exception:
// (1) no fragment with position >= _lower_bound was pushed yet
// (2) If _lower_bound > mf.position(), mf was emitted
inline
void cache_flat_mutation_reader::add_clustering_row_to_buffer(mutation_fragment&& mf) {
clogger.trace("csm {}: add_clustering_row_to_buffer({})", this, mf);
auto& row = mf.as_clustering_row();
auto new_lower_bound = position_in_partition::after_key(row.key());
push_mutation_fragment(std::move(mf));
_lower_bound = std::move(new_lower_bound);
_lower_bound_changed = true;
}
inline
void cache_flat_mutation_reader::add_to_buffer(range_tombstone&& rt) {
clogger.trace("csm {}: add_to_buffer({})", this, rt);
// This guarantees that rt starts after any emitted clustering_row
// and not before any emitted range tombstone.
position_in_partition::less_compare less(*_schema);
if (!less(_lower_bound, rt.end_position())) {
return;
}
if (!less(_lower_bound, rt.position())) {
rt.set_start(*_schema, _lower_bound);
} else {
_lower_bound = position_in_partition(rt.position());
_lower_bound_changed = true;
}
push_mutation_fragment(std::move(rt));
}
inline
void cache_flat_mutation_reader::maybe_add_to_cache(const range_tombstone& rt) {
if (can_populate()) {
clogger.trace("csm {}: maybe_add_to_cache({})", this, rt);
_lsa_manager.run_in_update_section_with_allocator([&] {
_snp->version()->partition().row_tombstones().apply_monotonically(*_schema, rt);
});
} else {
_read_context->cache().on_mispopulate();
}
}
inline
void cache_flat_mutation_reader::maybe_add_to_cache(const static_row& sr) {
if (can_populate()) {
clogger.trace("csm {}: populate({})", this, sr);
_read_context->cache().on_static_row_insert();
_lsa_manager.run_in_update_section_with_allocator([&] {
if (_read_context->digest_requested()) {
sr.cells().prepare_hash(*_schema, column_kind::static_column);
}
_snp->version()->partition().static_row().apply(*_schema, column_kind::static_column, sr.cells());
});
} else {
_read_context->cache().on_mispopulate();
}
}
inline
void cache_flat_mutation_reader::maybe_set_static_row_continuous() {
if (can_populate()) {
clogger.trace("csm {}: set static row continuous", this);
_snp->version()->partition().set_static_row_continuous(true);
} else {
_read_context->cache().on_mispopulate();
}
}
inline
bool cache_flat_mutation_reader::can_populate() const {
return _snp->at_latest_version() && _read_context->cache().phase_of(_read_context->key()) == _read_context->phase();
}
} // namespace cache
inline flat_mutation_reader make_cache_flat_mutation_reader(schema_ptr s,
dht::decorated_key dk,
query::clustering_key_filter_ranges crr,
row_cache& cache,
lw_shared_ptr<cache::read_context> ctx,
partition_snapshot_ptr snp)
{
return make_flat_mutation_reader<cache::cache_flat_mutation_reader>(
std::move(s), std::move(dk), std::move(crr), std::move(ctx), std::move(snp), cache);
}
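The lower-bound handling in `add_to_buffer(range_tombstone&&)` above can be illustrated in isolation. The following is a minimal sketch, assuming positions are plain integers and a tombstone is a half-open interval; `interval_sketch` and `trim_to_lower_bound_sketch` are hypothetical names that are not part of the patch, and the real code compares `position_in_partition` values with a schema-aware comparator.

```cpp
#include <cassert>

// Hypothetical stand-in for a range tombstone covering [start, end).
struct interval_sketch {
    int start;
    int end;
};

// Mirrors the three cases in add_to_buffer(range_tombstone&&):
//  - the tombstone ends at or before lower_bound: drop it (return false);
//  - it starts at or before lower_bound: trim its start up to lower_bound;
//  - it starts after lower_bound: advance lower_bound to its start.
// Returns true if the (possibly trimmed) tombstone should be emitted.
inline bool trim_to_lower_bound_sketch(int& lower_bound, interval_sketch& rt) {
    if (!(lower_bound < rt.end)) {
        return false;
    }
    if (!(lower_bound < rt.start)) {
        rt.start = lower_bound;
    } else {
        lower_bound = rt.start;
    }
    return true;
}

// Illustrative walk through the trimming case.
inline void trim_to_lower_bound_demo() {
    int lower_bound = 5;
    interval_sketch overlapping{3, 9};
    bool emit = trim_to_lower_bound_sketch(lower_bound, overlapping);
    assert(emit);
    assert(overlapping.start == 5);
}
```

Either way, an emitted tombstone never starts before a position that has already been emitted, which is the invariant the comment in the source describes.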


@@ -24,7 +24,6 @@
#include <boost/lexical_cast.hpp>
#include "exceptions/exceptions.hh"
#include "json.hh"
#include "seastarx.hh"
class schema;
@@ -59,34 +58,30 @@ class caching_options {
caching_options() : _key_cache(default_key), _row_cache(default_row) {}
public:
std::map<sstring, sstring> to_map() const {
return {{ "keys", _key_cache }, { "rows_per_partition", _row_cache }};
}
sstring to_sstring() const {
return json::to_json(to_map());
return json::to_json(std::map<sstring, sstring>({{ "keys", _key_cache }, { "rows_per_partition", _row_cache }}));
}
template<typename Map>
static caching_options from_map(const Map & map) {
sstring k = default_key;
sstring r = default_row;
static caching_options from_sstring(const sstring& str) {
auto map = json::to_map(str);
if (map.size() > 2) {
throw exceptions::configuration_exception("Invalid map: " + str);
}
sstring k;
sstring r;
if (map.count("keys")) {
k = map.at("keys");
} else {
k = default_key;
}
for (auto& p : map) {
if (p.first == "keys") {
k = p.second;
} else if (p.first == "rows_per_partition") {
r = p.second;
} else {
throw exceptions::configuration_exception("Invalid caching option: " + p.first);
}
if (map.count("rows_per_partition")) {
r = map.at("rows_per_partition");
} else {
r = default_row;
}
return caching_options(k, r);
}
static caching_options from_sstring(const sstring& str) {
return from_map(json::to_map(str));
}
bool operator==(const caching_options& other) const {
return _key_cache == other._key_cache && _row_cache == other._row_cache;
}
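The loop-based `from_map` variant in this hunk starts from defaults and rejects unknown option names. A minimal standalone sketch of that pattern follows, assuming `std::string` in place of `sstring` and `std::invalid_argument` in place of `exceptions::configuration_exception`. The default values are placeholders, because the diff does not show what `default_key` and `default_row` are set to.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Placeholder defaults; the real default_key / default_row values are not shown in the diff.
static const std::string sketch_default_key = "ALL";
static const std::string sketch_default_row = "ALL";

// Returns {keys, rows_per_partition}. Missing options keep their defaults;
// any other option name is rejected.
template <typename Map>
std::pair<std::string, std::string> parse_caching_options_sketch(const Map& map) {
    std::string k = sketch_default_key;
    std::string r = sketch_default_row;
    for (const auto& p : map) {
        if (p.first == "keys") {
            k = p.second;
        } else if (p.first == "rows_per_partition") {
            r = p.second;
        } else {
            throw std::invalid_argument("Invalid caching option: " + p.first);
        }
    }
    return {k, r};
}

// Illustrative use: only "keys" is given, so rows_per_partition keeps its default.
inline void caching_options_sketch_demo() {
    auto parsed = parse_caching_options_sketch(
        std::map<std::string, std::string>{{"keys", "NONE"}});
    assert(parsed.first == "NONE");
    assert(parsed.second == sketch_default_row);
}
```

Compared with the size check plus `count()`/`at()` lookups in the other variant, iterating over the entries reports exactly which option name was not recognized.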


@@ -22,7 +22,6 @@
#include "canonical_mutation.hh"
#include "mutation.hh"
#include "mutation_partition_serializer.hh"
#include "counters.hh"
#include "converting_mutation_partition_applier.hh"
#include "hashing_partition_visitor.hh"
#include "utils/UUID.hh"
@@ -45,7 +44,7 @@ canonical_mutation::canonical_mutation(const mutation& m)
mutation_partition_serializer part_ser(*m.schema(), m.partition());
bytes_ostream out;
ser::writer_of_canonical_mutation<bytes_ostream> wr(out);
ser::writer_of_canonical_mutation wr(out);
std::move(wr).write_table_id(m.schema()->id())
.write_schema_version(m.schema()->version())
.write_key(m.key())
@@ -75,7 +74,7 @@ mutation canonical_mutation::to_mutation(schema_ptr s) const {
auto version = mv.schema_version();
auto pk = mv.key();
mutation m(std::move(s), std::move(pk));
mutation m(std::move(pk), std::move(s));
if (version == m.schema()->version()) {
auto partition_view = mutation_partition_view::from_view(mv.partition());


@@ -1,550 +0,0 @@
/*
* Copyright (C) 2017 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <boost/intrusive/unordered_set.hpp>
#include "utils/small_vector.hh"
#include "fnv1a_hasher.hh"
#include "mutation_fragment.hh"
#include "mutation_partition.hh"
#include "db/timeout_clock.hh"
class cells_range {
using ids_vector_type = utils::small_vector<column_id, 5>;
position_in_partition_view _position;
ids_vector_type _ids;
public:
using iterator = ids_vector_type::iterator;
using const_iterator = ids_vector_type::const_iterator;
cells_range()
: _position(position_in_partition_view(position_in_partition_view::static_row_tag_t())) { }
explicit cells_range(position_in_partition_view pos, const row& cells)
: _position(pos)
{
_ids.reserve(cells.size());
cells.for_each_cell([this] (auto id, auto&&) {
_ids.emplace_back(id);
});
}
position_in_partition_view position() const { return _position; }
bool empty() const { return _ids.empty(); }
auto begin() const { return _ids.begin(); }
auto end() const { return _ids.end(); }
};
class partition_cells_range {
const mutation_partition& _mp;
public:
class iterator {
const mutation_partition& _mp;
stdx::optional<mutation_partition::rows_type::const_iterator> _position;
cells_range _current;
public:
explicit iterator(const mutation_partition& mp)
: _mp(mp)
, _current(position_in_partition_view(position_in_partition_view::static_row_tag_t()), mp.static_row())
{ }
iterator(const mutation_partition& mp, mutation_partition::rows_type::const_iterator it)
: _mp(mp)
, _position(it)
{ }
iterator& operator++() {
if (!_position) {
_position = _mp.clustered_rows().begin();
} else {
++(*_position);
}
if (_position != _mp.clustered_rows().end()) {
auto it = *_position;
_current = cells_range(position_in_partition_view(position_in_partition_view::clustering_row_tag_t(), it->key()),
it->row().cells());
}
return *this;
}
iterator operator++(int) {
iterator it(*this);
operator++();
return it;
}
cells_range& operator*() {
return _current;
}
cells_range* operator->() {
return &_current;
}
bool operator==(const iterator& other) const {
return _position == other._position;
}
bool operator!=(const iterator& other) const {
return !(*this == other);
}
};
public:
explicit partition_cells_range(const mutation_partition& mp) : _mp(mp) { }
iterator begin() const {
return iterator(_mp);
}
iterator end() const {
return iterator(_mp, _mp.clustered_rows().end());
}
};
class locked_cell;
struct cell_locker_stats {
uint64_t lock_acquisitions = 0;
uint64_t operations_waiting_for_lock = 0;
};
class cell_locker {
private:
class partition_entry;
struct cell_address {
position_in_partition position;
column_id id;
};
class cell_entry : public bi::unordered_set_base_hook<bi::link_mode<bi::auto_unlink>>,
public enable_lw_shared_from_this<cell_entry> {
partition_entry& _parent;
cell_address _address;
db::timeout_semaphore _semaphore { 0 };
friend class cell_locker;
public:
cell_entry(partition_entry& parent, position_in_partition position, column_id id)
: _parent(parent)
, _address { std::move(position), id }
{ }
// Upgrades cell_entry to another schema.
// Changes the value of cell_address, so cell_entry has to be
// temporarily removed from its parent partition_entry.
// Returns true if the cell_entry still exists in the new schema and
// should be reinserted.
bool upgrade(const schema& from, const schema& to, column_kind kind) noexcept {
auto& old_column_mapping = from.get_column_mapping();
auto& column = old_column_mapping.column_at(kind, _address.id);
auto cdef = to.get_column_definition(column.name());
if (!cdef) {
return false;
}
_address.id = cdef->id;
return true;
}
const position_in_partition& position() const {
return _address.position;
}
future<> lock(db::timeout_clock::time_point _timeout) {
return _semaphore.wait(_timeout);
}
void unlock() {
_semaphore.signal();
}
~cell_entry() {
if (!is_linked()) {
return;
}
unlink();
if (!--_parent._cell_count) {
delete &_parent;
}
}
class hasher {
const schema* _schema; // pointer instead of reference for default assignment
public:
explicit hasher(const schema& s) : _schema(&s) { }
size_t operator()(const cell_address& ca) const {
fnv1a_hasher hasher;
ca.position.feed_hash(hasher, *_schema);
::feed_hash(hasher, ca.id);
return hasher.finalize();
}
size_t operator()(const cell_entry& ce) const {
return operator()(ce._address);
}
};
class equal_compare {
position_in_partition::equal_compare _cmp;
private:
bool do_compare(const cell_address& a, const cell_address& b) const {
return a.id == b.id && _cmp(a.position, b.position);
}
public:
explicit equal_compare(const schema& s) : _cmp(s) { }
bool operator()(const cell_address& ca, const cell_entry& ce) const {
return do_compare(ca, ce._address);
}
bool operator()(const cell_entry& ce, const cell_address& ca) const {
return do_compare(ca, ce._address);
}
bool operator()(const cell_entry& a, const cell_entry& b) const {
return do_compare(a._address, b._address);
}
};
};
class partition_entry : public bi::unordered_set_base_hook<bi::link_mode<bi::auto_unlink>> {
using cells_type = bi::unordered_set<cell_entry,
bi::equal<cell_entry::equal_compare>,
bi::hash<cell_entry::hasher>,
bi::constant_time_size<false>>;
static constexpr size_t initial_bucket_count = 16;
using max_load_factor = std::ratio<3, 4>;
dht::decorated_key _key;
cell_locker& _parent;
size_t _rehash_at_size = compute_rehash_at_size(initial_bucket_count);
std::unique_ptr<cells_type::bucket_type[]> _buckets; // TODO: start with internal storage?
size_t _cell_count = 0; // cells_type::empty() is not O(1) if the hook is auto-unlink
cells_type::bucket_type _internal_buckets[initial_bucket_count];
cells_type _cells;
schema_ptr _schema;
friend class cell_entry;
private:
static constexpr size_t compute_rehash_at_size(size_t bucket_count) {
return bucket_count * max_load_factor::num / max_load_factor::den;
}
void maybe_rehash() {
if (_cell_count >= _rehash_at_size) {
auto new_bucket_count = std::min(_cells.bucket_count() * 2, _cells.bucket_count() + 1024);
auto buckets = std::make_unique<cells_type::bucket_type[]>(new_bucket_count);
_cells.rehash(cells_type::bucket_traits(buckets.get(), new_bucket_count));
_buckets = std::move(buckets);
_rehash_at_size = compute_rehash_at_size(new_bucket_count);
}
}
public:
partition_entry(schema_ptr s, cell_locker& parent, const dht::decorated_key& dk)
: _key(dk)
, _parent(parent)
, _cells(cells_type::bucket_traits(_internal_buckets, initial_bucket_count),
cell_entry::hasher(*s), cell_entry::equal_compare(*s))
, _schema(s)
{ }
~partition_entry() {
if (is_linked()) {
_parent._partition_count--;
}
}
// Upgrades partition entry to new schema. Returns false if all
// cell_entries have been removed during the upgrade.
bool upgrade(schema_ptr new_schema);
void insert(lw_shared_ptr<cell_entry> cell) {
_cells.insert(*cell);
_cell_count++;
maybe_rehash();
}
cells_type& cells() {
return _cells;
}
struct hasher {
size_t operator()(const dht::decorated_key& dk) const {
return std::hash<dht::decorated_key>()(dk);
}
size_t operator()(const partition_entry& pe) const {
return operator()(pe._key);
}
};
class equal_compare {
dht::decorated_key_equals_comparator _cmp;
public:
explicit equal_compare(const schema& s) : _cmp(s) { }
bool operator()(const dht::decorated_key& dk, const partition_entry& pe) {
return _cmp(dk, pe._key);
}
bool operator()(const partition_entry& pe, const dht::decorated_key& dk) {
return _cmp(dk, pe._key);
}
bool operator()(const partition_entry& a, const partition_entry& b) {
return _cmp(a._key, b._key);
}
};
};
using partitions_type = bi::unordered_set<partition_entry,
bi::equal<partition_entry::equal_compare>,
bi::hash<partition_entry::hasher>,
bi::constant_time_size<false>>;
static constexpr size_t initial_bucket_count = 4 * 1024;
using max_load_factor = std::ratio<3, 4>;
std::unique_ptr<partitions_type::bucket_type[]> _buckets;
partitions_type _partitions;
size_t _partition_count = 0;
size_t _rehash_at_size = compute_rehash_at_size(initial_bucket_count);
schema_ptr _schema;
// partitions_type uses an equality comparator which keeps a reference to the
// original schema, so we must ensure that it doesn't die.
schema_ptr _original_schema;
cell_locker_stats& _stats;
friend class locked_cell;
private:
struct locker;
static constexpr size_t compute_rehash_at_size(size_t bucket_count) {
return bucket_count * max_load_factor::num / max_load_factor::den;
}
void maybe_rehash() {
if (_partition_count >= _rehash_at_size) {
auto new_bucket_count = std::min(_partitions.bucket_count() * 2, _partitions.bucket_count() + 64 * 1024);
auto buckets = std::make_unique<partitions_type::bucket_type[]>(new_bucket_count);
_partitions.rehash(partitions_type::bucket_traits(buckets.get(), new_bucket_count));
_buckets = std::move(buckets);
_rehash_at_size = compute_rehash_at_size(new_bucket_count);
}
}
public:
explicit cell_locker(schema_ptr s, cell_locker_stats& stats)
: _buckets(std::make_unique<partitions_type::bucket_type[]>(initial_bucket_count))
, _partitions(partitions_type::bucket_traits(_buckets.get(), initial_bucket_count),
partition_entry::hasher(), partition_entry::equal_compare(*s))
, _schema(s)
, _original_schema(std::move(s))
, _stats(stats)
{ }
~cell_locker() {
assert(_partitions.empty());
}
void set_schema(schema_ptr s) {
_schema = s;
}
schema_ptr schema() const {
return _schema;
}
// partition_cells_range is required to be in cell_locker::schema()
future<std::vector<locked_cell>> lock_cells(const dht::decorated_key& dk, partition_cells_range&& range,
db::timeout_clock::time_point timeout);
};
class locked_cell {
lw_shared_ptr<cell_locker::cell_entry> _entry;
public:
explicit locked_cell(lw_shared_ptr<cell_locker::cell_entry> entry)
: _entry(std::move(entry)) { }
locked_cell(const locked_cell&) = delete;
locked_cell(locked_cell&&) = default;
~locked_cell() {
if (_entry) {
_entry->unlock();
}
}
};
struct cell_locker::locker {
cell_entry::hasher _hasher;
cell_entry::equal_compare _eq_cmp;
partition_entry& _partition_entry;
partition_cells_range _range;
partition_cells_range::iterator _current_ck;
cells_range::const_iterator _current_cell;
db::timeout_clock::time_point _timeout;
std::vector<locked_cell> _locks;
cell_locker_stats& _stats;
private:
void update_ck() {
if (!is_done()) {
_current_cell = _current_ck->begin();
}
}
future<> lock_next();
bool is_done() const { return _current_ck == _range.end(); }
public:
explicit locker(const ::schema& s, cell_locker_stats& st, partition_entry& pe, partition_cells_range&& range, db::timeout_clock::time_point timeout)
: _hasher(s)
, _eq_cmp(s)
, _partition_entry(pe)
, _range(std::move(range))
, _current_ck(_range.begin())
, _timeout(timeout)
, _stats(st)
{
update_ck();
}
locker(const locker&) = delete;
locker(locker&&) = delete;
future<> lock_all() {
// Cannot defer before first call to lock_next().
return lock_next().then([this] {
return do_until([this] { return is_done(); }, [this] {
return lock_next();
});
});
}
std::vector<locked_cell> get() && { return std::move(_locks); }
};
inline
future<std::vector<locked_cell>> cell_locker::lock_cells(const dht::decorated_key& dk, partition_cells_range&& range, db::timeout_clock::time_point timeout) {
partition_entry::hasher pe_hash;
partition_entry::equal_compare pe_eq(*_schema);
auto it = _partitions.find(dk, pe_hash, pe_eq);
std::unique_ptr<partition_entry> partition;
if (it == _partitions.end()) {
partition = std::make_unique<partition_entry>(_schema, *this, dk);
} else if (!it->upgrade(_schema)) {
partition = std::unique_ptr<partition_entry>(&*it);
_partition_count--;
_partitions.erase(it);
}
if (partition) {
std::vector<locked_cell> locks;
for (auto&& r : range) {
if (r.empty()) {
continue;
}
for (auto&& c : r) {
auto cell = make_lw_shared<cell_entry>(*partition, position_in_partition(r.position()), c);
_stats.lock_acquisitions++;
partition->insert(cell);
locks.emplace_back(std::move(cell));
}
}
if (!locks.empty()) {
_partitions.insert(*partition.release());
_partition_count++;
maybe_rehash();
}
return make_ready_future<std::vector<locked_cell>>(std::move(locks));
}
auto l = std::make_unique<locker>(*_schema, _stats, *it, std::move(range), timeout);
auto f = l->lock_all();
return f.then([l = std::move(l)] {
return std::move(*l).get();
});
}
inline
future<> cell_locker::locker::lock_next() {
while (!is_done()) {
if (_current_cell == _current_ck->end()) {
++_current_ck;
update_ck();
continue;
}
auto cid = *_current_cell++;
cell_address ca { position_in_partition(_current_ck->position()), cid };
auto it = _partition_entry.cells().find(ca, _hasher, _eq_cmp);
if (it != _partition_entry.cells().end()) {
_stats.operations_waiting_for_lock++;
return it->lock(_timeout).then([this, ce = it->shared_from_this()] () mutable {
_stats.operations_waiting_for_lock--;
_stats.lock_acquisitions++;
_locks.emplace_back(std::move(ce));
});
}
auto cell = make_lw_shared<cell_entry>(_partition_entry, position_in_partition(_current_ck->position()), cid);
_stats.lock_acquisitions++;
_partition_entry.insert(cell);
_locks.emplace_back(std::move(cell));
}
return make_ready_future<>();
}
inline
bool cell_locker::partition_entry::upgrade(schema_ptr new_schema) {
if (_schema == new_schema) {
return true;
}
auto buckets = std::make_unique<cells_type::bucket_type[]>(_cells.bucket_count());
auto cells = cells_type(cells_type::bucket_traits(buckets.get(), _cells.bucket_count()),
cell_entry::hasher(*new_schema), cell_entry::equal_compare(*new_schema));
_cells.clear_and_dispose([&] (cell_entry* cell_ptr) noexcept {
auto& cell = *cell_ptr;
auto kind = cell.position().is_static_row() ? column_kind::static_column
: column_kind::regular_column;
auto reinsert = cell.upgrade(*_schema, *new_schema, kind);
if (reinsert) {
cells.insert(cell);
} else {
_cell_count--;
}
});
// bi::unordered_set move assignment is actually a swap.
// The original _buckets must not be destroyed before the container that uses
// them, so we explicitly make sure that the original _cells is gone first.
_cells = std::move(cells);
auto destroy = [] (auto) { };
destroy(std::move(cells));
_buckets = std::move(buckets);
_schema = new_schema;
return _cell_count;
}
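The rehash thresholds used by `partition_entry::maybe_rehash()` and `cell_locker::maybe_rehash()` follow from simple arithmetic on the bucket count. The sketch below reproduces that arithmetic outside the class; the `_sketch` names are hypothetical, and the 3/4 load factor and the step limits are copied from the code above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <ratio>

using sketch_max_load_factor = std::ratio<3, 4>;

// Element count at which a table with bucket_count buckets gets rehashed.
constexpr std::size_t rehash_at_size_sketch(std::size_t bucket_count) {
    return bucket_count * sketch_max_load_factor::num / sketch_max_load_factor::den;
}

// Growth policy: double the table, but never add more than max_step buckets
// at once (1024 for cells, 64 * 1024 for partitions in the code above).
inline std::size_t next_bucket_count_sketch(std::size_t bucket_count, std::size_t max_step) {
    return std::min(bucket_count * 2, bucket_count + max_step);
}

static_assert(rehash_at_size_sketch(16) == 12, "16 buckets at 3/4 load rehash at 12 cells");

// Illustrative check of the two growth regimes for the per-partition cell table.
inline void rehash_sketch_demo() {
    assert(next_bucket_count_sketch(16, 1024) == 32);      // small table: doubling
    assert(next_bucket_count_sketch(4096, 1024) == 5120);  // large table: capped step
}
```

Small tables therefore grow geometrically, while large ones grow by a fixed step, which bounds the size of any single bucket-array allocation.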


@@ -27,125 +27,125 @@
class checked_file_impl : public file_impl {
public:
checked_file_impl(const io_error_handler& error_handler, file f)
: _error_handler(error_handler), _file(f) {
checked_file_impl(disk_error_signal_type& s, file f)
: _signal(s) , _file(f) {
_memory_dma_alignment = f.memory_dma_alignment();
_disk_read_dma_alignment = f.disk_read_dma_alignment();
_disk_write_dma_alignment = f.disk_write_dma_alignment();
}
virtual future<size_t> write_dma(uint64_t pos, const void* buffer, size_t len, const io_priority_class& pc) override {
return do_io_check(_error_handler, [&] {
return do_io_check(_signal, [&] {
return get_file_impl(_file)->write_dma(pos, buffer, len, pc);
});
}
virtual future<size_t> write_dma(uint64_t pos, std::vector<iovec> iov, const io_priority_class& pc) override {
return do_io_check(_error_handler, [&] {
return do_io_check(_signal, [&] {
return get_file_impl(_file)->write_dma(pos, iov, pc);
});
}
virtual future<size_t> read_dma(uint64_t pos, void* buffer, size_t len, const io_priority_class& pc) override {
return do_io_check(_error_handler, [&] {
return do_io_check(_signal, [&] {
return get_file_impl(_file)->read_dma(pos, buffer, len, pc);
});
}
virtual future<size_t> read_dma(uint64_t pos, std::vector<iovec> iov, const io_priority_class& pc) override {
return do_io_check(_error_handler, [&] {
return do_io_check(_signal, [&] {
return get_file_impl(_file)->read_dma(pos, iov, pc);
});
}
virtual future<> flush(void) override {
return do_io_check(_error_handler, [&] {
return do_io_check(_signal, [&] {
return get_file_impl(_file)->flush();
});
}
virtual future<struct stat> stat(void) override {
return do_io_check(_error_handler, [&] {
return do_io_check(_signal, [&] {
return get_file_impl(_file)->stat();
});
}
virtual future<> truncate(uint64_t length) override {
return do_io_check(_error_handler, [&] {
return do_io_check(_signal, [&] {
return get_file_impl(_file)->truncate(length);
});
}
virtual future<> discard(uint64_t offset, uint64_t length) override {
return do_io_check(_error_handler, [&] {
return do_io_check(_signal, [&] {
return get_file_impl(_file)->discard(offset, length);
});
}
virtual future<> allocate(uint64_t position, uint64_t length) override {
return do_io_check(_error_handler, [&] {
return do_io_check(_signal, [&] {
return get_file_impl(_file)->allocate(position, length);
});
}
virtual future<uint64_t> size(void) override {
return do_io_check(_error_handler, [&] {
return do_io_check(_signal, [&] {
return get_file_impl(_file)->size();
});
}
virtual future<> close() override {
return do_io_check(_error_handler, [&] {
return do_io_check(_signal, [&] {
return get_file_impl(_file)->close();
});
}
// Returns a handle for the plain (unchecked) file, so make_checked_file()
// should be called on the file obtained from the handle.
virtual std::unique_ptr<seastar::file_handle_impl> dup() override {
return get_file_impl(_file)->dup();
}
virtual subscription<directory_entry> list_directory(std::function<future<> (directory_entry de)> next) override {
return do_io_check(_error_handler, [&] {
return do_io_check(_signal, [&] {
return get_file_impl(_file)->list_directory(next);
});
}
virtual future<temporary_buffer<uint8_t>> dma_read_bulk(uint64_t offset, size_t range_size, const io_priority_class& pc) override {
return do_io_check(_error_handler, [&] {
return get_file_impl(_file)->dma_read_bulk(offset, range_size, pc);
});
}
private:
const io_error_handler& _error_handler;
disk_error_signal_type &_signal;
file _file;
};
inline file make_checked_file(const io_error_handler& error_handler, file f)
inline file make_checked_file(disk_error_signal_type& signal, file& f)
{
return file(::make_shared<checked_file_impl>(error_handler, f));
return file(::make_shared<checked_file_impl>(signal, f));
}
future<file>
inline open_checked_file_dma(const io_error_handler& error_handler,
inline open_checked_file_dma(disk_error_signal_type& signal,
sstring name, open_flags flags,
file_open_options options = {})
file_open_options options)
{
return do_io_check(error_handler, [&] {
return do_io_check(signal, [&] {
return open_file_dma(name, flags, options).then([&] (file f) {
return make_ready_future<file>(make_checked_file(error_handler, f));
return make_ready_future<file>(make_checked_file(signal, f));
});
});
}
future<file>
inline open_checked_directory(const io_error_handler& error_handler,
inline open_checked_file_dma(disk_error_signal_type& signal,
sstring name, open_flags flags)
{
return do_io_check(signal, [&] {
return open_file_dma(name, flags).then([&] (file f) {
return make_ready_future<file>(make_checked_file(signal, f));
});
});
}
future<file>
inline open_checked_directory(disk_error_signal_type& signal,
sstring name)
{
return do_io_check(error_handler, [&] {
return do_io_check(signal, [&] {
return engine().open_directory(name).then([&] (file f) {
return make_ready_future<file>(make_checked_file(error_handler, f));
return make_ready_future<file>(make_checked_file(signal, f));
});
});
}
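Every `checked_file_impl` method follows the same pattern: run the underlying operation inside `do_io_check` so that failures reach the error handler (or signal). The diff does not show `do_io_check` itself, so the following is only a synchronous sketch of the pattern. It assumes the handler is notified and the exception is then rethrown; `error_handler_sketch` and `do_io_check_sketch` are hypothetical names, and the real code works with Seastar futures rather than plain return values.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <utility>

// Hypothetical stand-in for io_error_handler: a callback that inspects the failure.
using error_handler_sketch = std::function<void(std::exception_ptr)>;

// Runs func; if it throws, reports the exception to the handler and rethrows,
// so the caller still observes the original failure.
template <typename Func>
auto do_io_check_sketch(const error_handler_sketch& handler, Func&& func) -> decltype(func()) {
    try {
        return std::forward<Func>(func)();
    } catch (...) {
        handler(std::current_exception());
        throw;
    }
}

// Illustrative use: one successful call and one failing call; returns how many
// failures the handler observed.
inline int failures_seen_sketch() {
    int failures = 0;
    error_handler_sketch count = [&failures](std::exception_ptr) { ++failures; };
    int ok = do_io_check_sketch(count, [] { return 42; });
    assert(ok == 42);
    try {
        do_io_check_sketch(count, []() -> int { throw std::runtime_error("disk error"); });
    } catch (const std::runtime_error&) {
        // The original exception still reaches the caller.
    }
    return failures;
}
```

Routing every operation through a single wrapper keeps the error policy in one place, which is why swapping the `disk_error_signal_type&` parameter for an `io_error_handler` touches every method in the same mechanical way.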


@@ -1,49 +0,0 @@
/*
* Copyright (C) 2017 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <algorithm>
#include <atomic>
#include <chrono>
#include <cstdint>
extern std::atomic<int64_t> clocks_offset;
template<typename Duration>
static inline void forward_jump_clocks(Duration delta)
{
auto d = std::chrono::duration_cast<std::chrono::seconds>(delta).count();
clocks_offset.fetch_add(d, std::memory_order_relaxed);
}
static inline std::chrono::seconds get_clocks_offset()
{
auto off = clocks_offset.load(std::memory_order_relaxed);
return std::chrono::seconds(off);
}
// Returns a time point which is earlier than t by d, or the minimum time point if it cannot be represented.
template<typename Clock, typename Duration, typename Rep, typename Period>
inline
auto saturating_subtract(std::chrono::time_point<Clock, Duration> t, std::chrono::duration<Rep, Period> d) -> decltype(t) {
return std::max(t, decltype(t)::min() + d) - d;
}
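`saturating_subtract` clamps `t` to at least `min() + d` before subtracting, so the result never underflows below the minimum time point. A minimal standalone sketch of the same idea, using only `<chrono>`, is given below. The `_sketch` names and the demo values are illustrative, and the sketch assumes `d` is non-negative and uses the same duration type as the time point.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

// Same shape as the helper in the diff: clamp first, then subtract.
template <typename Clock, typename Duration, typename Rep, typename Period>
auto saturating_subtract_sketch(std::chrono::time_point<Clock, Duration> t,
                                std::chrono::duration<Rep, Period> d) -> decltype(t) {
    return std::max(t, decltype(t)::min() + d) - d;
}

using sketch_time_point =
    std::chrono::time_point<std::chrono::steady_clock, std::chrono::seconds>;

// Illustrative checks: an ordinary subtraction and the saturating case.
inline void saturating_subtract_demo() {
    const sketch_time_point t{std::chrono::seconds(100)};
    assert(saturating_subtract_sketch(t, std::chrono::seconds(30))
           == sketch_time_point{std::chrono::seconds(70)});

    const auto lowest = sketch_time_point::min();
    assert(saturating_subtract_sketch(lowest, std::chrono::seconds(10)) == lowest);
}
```

A plain `t - d` in the second case would overflow the underlying signed representation, which is undefined behavior; clamping first avoids that.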


@@ -22,7 +22,6 @@
#pragma once
#include <functional>
#include "keys.hh"
#include "schema.hh"
#include "range.hh"
@@ -43,113 +42,86 @@ std::ostream& operator<<(std::ostream& out, const bound_kind k);
bound_kind invert_kind(bound_kind k);
int32_t weight(bound_kind k);
static inline bound_kind flip_bound_kind(bound_kind bk)
{
switch (bk) {
case bound_kind::excl_end: return bound_kind::excl_start;
case bound_kind::incl_end: return bound_kind::incl_start;
case bound_kind::excl_start: return bound_kind::excl_end;
case bound_kind::incl_start: return bound_kind::incl_end;
}
abort();
}
class bound_view {
const static thread_local clustering_key _empty_prefix;
std::reference_wrapper<const clustering_key_prefix> _prefix;
bound_kind _kind;
const static thread_local clustering_key empty_prefix;
public:
const clustering_key_prefix& prefix;
bound_kind kind;
bound_view(const clustering_key_prefix& prefix, bound_kind kind)
: _prefix(prefix)
, _kind(kind)
: prefix(prefix)
, kind(kind)
{ }
bound_view(const bound_view& other) noexcept = default;
bound_view& operator=(const bound_view& other) noexcept = default;
bound_kind kind() const { return _kind; }
const clustering_key_prefix& prefix() const { return _prefix; }
struct tri_compare {
struct compare {
// To make it assignable and to avoid taking a schema_ptr, we
// wrap the schema reference.
std::reference_wrapper<const schema> _s;
tri_compare(const schema& s) : _s(s)
compare(const schema& s) : _s(s)
{ }
int operator()(const clustering_key_prefix& p1, int32_t w1, const clustering_key_prefix& p2, int32_t w2) const {
bool operator()(const clustering_key_prefix& p1, int32_t w1, const clustering_key_prefix& p2, int32_t w2) const {
auto type = _s.get().clustering_key_prefix_type();
auto res = prefix_equality_tri_compare(type->types().begin(),
type->begin(p1), type->end(p1),
type->begin(p2), type->end(p2),
::tri_compare);
tri_compare);
if (res) {
return res;
return res < 0;
}
auto d1 = p1.size(_s);
auto d2 = p2.size(_s);
if (d1 == d2) {
return w1 - w2;
return w1 < w2;
}
return d1 < d2 ? w1 - (w1 <= 0) : -(w2 - (w2 <= 0));
}
int operator()(const bound_view b, const clustering_key_prefix& p) const {
return operator()(b._prefix, weight(b._kind), p, 0);
}
int operator()(const clustering_key_prefix& p, const bound_view b) const {
return operator()(p, 0, b._prefix, weight(b._kind));
}
int operator()(const bound_view b1, const bound_view b2) const {
return operator()(b1._prefix, weight(b1._kind), b2._prefix, weight(b2._kind));
}
};
struct compare {
// To make it assignable and to avoid taking a schema_ptr, we
// wrap the schema reference.
tri_compare _cmp;
compare(const schema& s) : _cmp(s)
{ }
bool operator()(const clustering_key_prefix& p1, int32_t w1, const clustering_key_prefix& p2, int32_t w2) const {
return _cmp(p1, w1, p2, w2) < 0;
return d1 < d2 ? w1 <= 0 : w2 > 0;
}
bool operator()(const bound_view b, const clustering_key_prefix& p) const {
return operator()(b._prefix, weight(b._kind), p, 0);
return operator()(b.prefix, weight(b.kind), p, 0);
}
bool operator()(const clustering_key_prefix& p, const bound_view b) const {
return operator()(p, 0, b._prefix, weight(b._kind));
return operator()(p, 0, b.prefix, weight(b.kind));
}
bool operator()(const bound_view b1, const bound_view b2) const {
return operator()(b1._prefix, weight(b1._kind), b2._prefix, weight(b2._kind));
return operator()(b1.prefix, weight(b1.kind), b2.prefix, weight(b2.kind));
}
};
bool equal(const schema& s, const bound_view other) const {
return _kind == other._kind && _prefix.get().equal(s, other._prefix.get());
return kind == other.kind && prefix.equal(s, other.prefix);
}
bool adjacent(const schema& s, const bound_view other) const {
return invert_kind(other._kind) == _kind && _prefix.get().equal(s, other._prefix.get());
return invert_kind(other.kind) == kind && prefix.equal(s, other.prefix);
}
static bound_view bottom() {
return {_empty_prefix, bound_kind::incl_start};
return {empty_prefix, bound_kind::incl_start};
}
static bound_view top() {
return {_empty_prefix, bound_kind::incl_end};
return {empty_prefix, bound_kind::incl_end};
}
template<template<typename> typename R>
GCC6_CONCEPT( requires Range<R, clustering_key_prefix_view> )
static bound_view from_range_start(const R<clustering_key_prefix>& range) {
return range.start()
? bound_view(range.start()->value(), range.start()->is_inclusive() ? bound_kind::incl_start : bound_kind::excl_start)
: bottom();
}
template<template<typename> typename R>
GCC6_CONCEPT( requires Range<R, clustering_key_prefix> )
static bound_view from_range_end(const R<clustering_key_prefix>& range) {
return range.end()
? bound_view(range.end()->value(), range.end()->is_inclusive() ? bound_kind::incl_end : bound_kind::excl_end)
: top();
}
template<template<typename> typename R>
GCC6_CONCEPT( requires Range<R, clustering_key_prefix> )
static std::pair<bound_view, bound_view> from_range(const R<clustering_key_prefix>& range) {
return {from_range_start(range), from_range_end(range)};
}
template<template<typename> typename R>
GCC6_CONCEPT( requires Range<R, clustering_key_prefix_view> )
static stdx::optional<typename R<clustering_key_prefix_view>::bound> to_range_bound(const bound_view& bv) {
if (&bv._prefix.get() == &_empty_prefix) {
return {};
}
bool inclusive = bv._kind != bound_kind::excl_end && bv._kind != bound_kind::excl_start;
return {typename R<clustering_key_prefix_view>::bound(bv._prefix.get().view(), inclusive)};
/*
template<template<typename> typename T, typename U>
concept bool Range() {
return requires (T<U> range) {
{ range.start() } -> stdx::optional<U>;
{ range.end() } -> stdx::optional<U>;
};
};*/
template<template<typename> typename Range>
static std::pair<bound_view, bound_view> from_range(const Range<clustering_key_prefix>& range) {
return {
range.start() ? bound_view(range.start()->value(), range.start()->is_inclusive() ? bound_kind::incl_start : bound_kind::excl_start) : bottom(),
range.end() ? bound_view(range.end()->value(), range.end()->is_inclusive() ? bound_kind::incl_end : bound_kind::excl_end) : top(),
};
}
friend std::ostream& operator<<(std::ostream& out, const bound_view& b) {
return out << "{bound: prefix=" << b._prefix.get() << ", kind=" << b._kind << "}";
return out << "{bound: prefix=" << b.prefix << ", kind=" << b.kind << "}";
}
};
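The tie-breaking in the bool-returning `compare::operator()` can be easier to follow on a simplified model. The sketch below is not the Scylla implementation: `prefix` is a hypothetical stand-in for `clustering_key_prefix`, the component loop assumes that `prefix_equality_tri_compare` compares only the components both prefixes share, and the weights are assumed to be negative for start-like bounds, positive for end-like bounds and 0 for a plain key (the hunk passes 0 for keys; the values returned by `weight()` are not shown).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for clustering_key_prefix: a list of decoded components.
using prefix = std::vector<int>;

// Mirrors the shape of the comparator in the hunk: compare the shared
// components first, then break ties using the weights.
inline bool bound_less(const prefix& p1, int32_t w1, const prefix& p2, int32_t w2) {
    const std::size_t common = std::min(p1.size(), p2.size());
    for (std::size_t i = 0; i < common; ++i) {
        if (p1[i] != p2[i]) {
            return p1[i] < p2[i];
        }
    }
    if (p1.size() == p2.size()) {
        return w1 < w2;
    }
    // The shorter prefix stands for every key that extends it, so only its
    // own weight decides whether it sorts before or after the longer one.
    return p1.size() < p2.size() ? w1 <= 0 : w2 > 0;
}
```

Under these assumptions a start bound on prefix `{1}` sorts before the full key `{1, 2}`, while an end bound on the same prefix sorts after it.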

@@ -54,7 +54,6 @@ public:
auto end() const { return _ref.end(); }
bool empty() const { return _ref.empty(); }
size_t size() const { return _ref.size(); }
const clustering_row_ranges& ranges() const { return _ref; }
static clustering_key_filter_ranges get_ranges(const schema& schema, const query::partition_slice& slice, const partition_key& key) {
const query::clustering_row_ranges& ranges = slice.row_ranges(schema, key);

@@ -1,219 +0,0 @@
/*
* Copyright (C) 2017 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "schema.hh"
#include "query-request.hh"
#include "mutation_fragment.hh"
// Utility for in-order checking of overlap with position ranges.
class clustering_ranges_walker {
const schema& _schema;
const query::clustering_row_ranges& _ranges;
query::clustering_row_ranges::const_iterator _current;
query::clustering_row_ranges::const_iterator _end;
bool _in_current; // next position is known to be >= _current_start
bool _with_static_row;
position_in_partition_view _current_start;
position_in_partition_view _current_end;
stdx::optional<position_in_partition> _trim;
size_t _change_counter = 1;
private:
bool advance_to_next_range() {
_in_current = false;
if (!_current_start.is_static_row()) {
if (_current == _end) {
return false;
}
++_current;
}
++_change_counter;
if (_current == _end) {
_current_end = _current_start = position_in_partition_view::after_all_clustered_rows();
return false;
}
_current_start = position_in_partition_view::for_range_start(*_current);
_current_end = position_in_partition_view::for_range_end(*_current);
return true;
}
public:
clustering_ranges_walker(const schema& s, const query::clustering_row_ranges& ranges, bool with_static_row = true)
: _schema(s)
, _ranges(ranges)
, _current(ranges.begin())
, _end(ranges.end())
, _in_current(with_static_row)
, _with_static_row(with_static_row)
, _current_start(position_in_partition_view::for_static_row())
, _current_end(position_in_partition_view::before_all_clustered_rows())
{
if (!with_static_row) {
if (_current == _end) {
_current_start = position_in_partition_view::before_all_clustered_rows();
} else {
_current_start = position_in_partition_view::for_range_start(*_current);
_current_end = position_in_partition_view::for_range_end(*_current);
}
}
}
clustering_ranges_walker(clustering_ranges_walker&& o) noexcept
: _schema(o._schema)
, _ranges(o._ranges)
, _current(o._current)
, _end(o._end)
, _in_current(o._in_current)
, _with_static_row(o._with_static_row)
, _current_start(o._current_start)
, _current_end(o._current_end)
, _trim(std::move(o._trim))
, _change_counter(o._change_counter)
{ }
clustering_ranges_walker& operator=(clustering_ranges_walker&& o) {
if (this != &o) {
this->~clustering_ranges_walker();
new (this) clustering_ranges_walker(std::move(o));
}
return *this;
}
// Excludes positions smaller than pos from the ranges.
// pos should be monotonic.
// No constraints between pos and positions passed to advance_to().
//
// After the invocation, when !out_of_range(), lower_bound() returns the smallest position still contained.
void trim_front(position_in_partition pos) {
position_in_partition::less_compare less(_schema);
do {
if (!less(_current_start, pos)) {
break;
}
if (less(pos, _current_end)) {
_trim = std::move(pos);
_current_start = *_trim;
_in_current = false;
++_change_counter;
break;
}
} while (advance_to_next_range());
}
// Returns true if given position is contained.
// Must be called with monotonic positions.
// Idempotent.
bool advance_to(position_in_partition_view pos) {
position_in_partition::less_compare less(_schema);
do {
if (!_in_current && less(pos, _current_start)) {
break;
}
// All subsequent clustering keys are larger than the start of this
// range so there is no need to check that again.
_in_current = true;
if (less(pos, _current_end)) {
return true;
}
} while (advance_to_next_range());
return false;
}
// Returns true if the range expressed by start and end (as in position_range) overlaps
// with clustering ranges.
// Must be called with monotonic start position. That position must also be greater than
// the last position passed to the other advance_to() overload.
// Idempotent.
bool advance_to(position_in_partition_view start, position_in_partition_view end) {
position_in_partition::less_compare less(_schema);
do {
if (!less(_current_start, end)) {
break;
}
if (less(start, _current_end)) {
return true;
}
} while (advance_to_next_range());
return false;
}
// Returns true if the range tombstone expressed by start and end (as in position_range) overlaps
// with clustering ranges.
// No monotonicity restrictions on argument values across calls.
// Does not affect lower_bound().
// Idempotent.
bool contains_tombstone(position_in_partition_view start, position_in_partition_view end) const {
position_in_partition::less_compare less(_schema);
if (_trim && !less(*_trim, end)) {
return false;
}
auto i = _current;
while (i != _end) {
auto range_start = position_in_partition_view::for_range_start(*i);
if (!less(range_start, end)) {
return false;
}
auto range_end = position_in_partition_view::for_range_end(*i);
if (less(start, range_end)) {
return true;
}
++i;
}
return false;
}
// Returns true if advanced past all contained positions. Any later advance_to() until reset() will return false.
bool out_of_range() const {
return !_in_current && _current == _end;
}
// Resets the state of the walker so that advance_to() can now be called for a new sequence of positions.
// Any range trimmings still hold after this.
void reset() {
auto trim = std::move(_trim);
auto ctr = _change_counter;
*this = clustering_ranges_walker(_schema, _ranges, _with_static_row);
_change_counter = ctr + 1;
if (trim) {
trim_front(std::move(*trim));
}
}
// Can be called only when !out_of_range()
position_in_partition_view lower_bound() const {
return _current_start;
}
// Changes whenever lower_bound() changes.
// Always > 0.
size_t lower_bound_change_counter() const {
return _change_counter;
}
};
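The core idea of `advance_to(pos)`, walking a sorted list of ranges once while positions arrive in monotonic order, can be sketched with plain integers. This is a simplified model only: `int_ranges_walker` is a hypothetical name, ranges are half-open `[start, end)` pairs, and the static-row handling, trimming and change counter of the removed class are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Simplified model of an in-order range walker over sorted, disjoint,
// half-open integer ranges [first, second).
class int_ranges_walker {
    const std::vector<std::pair<int, int>>& _ranges;
    std::size_t _current = 0;
public:
    explicit int_ranges_walker(const std::vector<std::pair<int, int>>& ranges)
        : _ranges(ranges)
    { }

    // Returns true if pos is contained in some range.
    // Must be called with monotonically non-decreasing positions.
    bool advance_to(int pos) {
        while (_current < _ranges.size()) {
            if (pos < _ranges[_current].first) {
                return false; // pos falls in the gap before the current range
            }
            if (pos < _ranges[_current].second) {
                return true;  // pos is inside the current range
            }
            ++_current;       // pos is past this range; it is never revisited
        }
        return false;
    }

    // True once every range has been passed; later calls all return false.
    bool out_of_range() const {
        return _current == _ranges.size();
    }
};
```

Because ranges that have been passed are never revisited, a full scan costs time proportional to the number of positions plus the number of ranges, which is the reason for the monotonicity requirement.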

@@ -1,3 +0,0 @@
# Scylla Coding Style
Please see the [Seastar style document](https://github.com/scylladb/seastar/blob/master/coding-style.md).

@@ -21,12 +21,7 @@
#pragma once
#include "sstables/shared_sstable.hh"
#include "exceptions/exceptions.hh"
#include "sstables/compaction_backlog_manager.hh"
class table;
using column_family = table;
class column_family;
class schema;
using schema_ptr = lw_shared_ptr<const schema>;
@@ -38,14 +33,12 @@ enum class compaction_strategy_type {
size_tiered,
leveled,
date_tiered,
time_window,
};
class compaction_strategy_impl;
class sstable;
class sstable_set;
struct compaction_descriptor;
struct resharding_descriptor;
class compaction_strategy {
::shared_ptr<compaction_strategy_impl> _compaction_strategy_impl;
@@ -59,13 +52,11 @@ public:
compaction_strategy& operator=(compaction_strategy&&);
// Return a list of sstables to be compacted after applying the strategy.
compaction_descriptor get_sstables_for_compaction(column_family& cfs, std::vector<shared_sstable> candidates);
std::vector<resharding_descriptor> get_resharding_jobs(column_family& cf, std::vector<shared_sstable> candidates);
compaction_descriptor get_sstables_for_compaction(column_family& cfs, std::vector<lw_shared_ptr<sstable>> candidates);
// Some strategies may look at the compacted and resulting sstables to
// get some useful information for subsequent compactions.
void notify_completion(const std::vector<shared_sstable>& removed, const std::vector<shared_sstable>& added);
void notify_completion(const std::vector<lw_shared_ptr<sstable>>& removed, const std::vector<lw_shared_ptr<sstable>>& added);
// Return if parallel compaction is allowed by strategy.
bool parallel_compaction() const;
@@ -88,8 +79,6 @@ public:
return "LeveledCompactionStrategy";
case compaction_strategy_type::date_tiered:
return "DateTieredCompactionStrategy";
case compaction_strategy_type::time_window:
return "TimeWindowCompactionStrategy";
default:
throw std::runtime_error("Invalid Compaction Strategy");
}
@@ -108,8 +97,6 @@ public:
return compaction_strategy_type::leveled;
} else if (short_name == "DateTieredCompactionStrategy") {
return compaction_strategy_type::date_tiered;
} else if (short_name == "TimeWindowCompactionStrategy") {
return compaction_strategy_type::time_window;
} else {
throw exceptions::configuration_exception(sprint("Unable to find compaction strategy class '%s'", name));
}
@@ -122,8 +109,6 @@ public:
}
sstable_set make_sstable_set(schema_ptr schema) const;
compaction_backlog_tracker& get_backlog_tracker();
};
// Creates a compaction_strategy object from one of the strategies available.

@@ -0,0 +1,64 @@
/*
* Copyright (C) 2016 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "query-request.hh"
#include <experimental/optional>
// Wraps ring_position so it is compatible with old-style C++: default constructor,
// stateless comparators, yada yada
class compatible_ring_position {
const schema* _schema = nullptr;
// optional to supply a default constructor, no more
std::experimental::optional<dht::ring_position> _rp;
public:
compatible_ring_position() noexcept = default;
compatible_ring_position(const schema& s, const dht::ring_position& rp)
: _schema(&s), _rp(rp) {
}
compatible_ring_position(const schema& s, dht::ring_position&& rp)
: _schema(&s), _rp(std::move(rp)) {
}
friend int tri_compare(const compatible_ring_position& x, const compatible_ring_position& y) {
return x._rp->tri_compare(*x._schema, *y._rp);
}
friend bool operator<(const compatible_ring_position& x, const compatible_ring_position& y) {
return tri_compare(x, y) < 0;
}
friend bool operator<=(const compatible_ring_position& x, const compatible_ring_position& y) {
return tri_compare(x, y) <= 0;
}
friend bool operator>(const compatible_ring_position& x, const compatible_ring_position& y) {
return tri_compare(x, y) > 0;
}
friend bool operator>=(const compatible_ring_position& x, const compatible_ring_position& y) {
return tri_compare(x, y) >= 0;
}
friend bool operator==(const compatible_ring_position& x, const compatible_ring_position& y) {
return tri_compare(x, y) == 0;
}
friend bool operator!=(const compatible_ring_position& x, const compatible_ring_position& y) {
return tri_compare(x, y) != 0;
}
};
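The pattern used by `compatible_ring_position` (carry a pointer to the comparison context inside the value so that plain `operator<` works) is what lets such a type be used in containers that expect a stateless comparator. The following sketch shows the same pattern on a toy type; `collation` and `compatible_key` are hypothetical names invented for the example and do not exist in the code base.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>
#include <utility>

// Hypothetical comparison context, playing the role that the schema plays above.
struct collation {
    bool reverse;
    int tri_compare(const std::string& a, const std::string& b) const {
        int r = a.compare(b);
        return reverse ? -r : r;
    }
};

// Wraps a value together with a pointer to its context, so that the
// comparison operators need no extra state.
class compatible_key {
    const collation* _c = nullptr;
    // Optional only to make the type default-constructible.
    std::optional<std::string> _v;
public:
    compatible_key() = default;
    compatible_key(const collation& c, std::string v)
        : _c(&c), _v(std::move(v)) {
    }
    const std::string& value() const { return *_v; }
    friend int tri_compare(const compatible_key& x, const compatible_key& y) {
        return x._c->tri_compare(*x._v, *y._v);
    }
    friend bool operator<(const compatible_key& x, const compatible_key& y) {
        return tri_compare(x, y) < 0;
    }
};
```

As in the original wrapper, comparing a default-constructed object is not valid, and both operands are expected to share the same context.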

@@ -1,64 +0,0 @@
/*
* Copyright (C) 2016 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "query-request.hh"
#include <optional>
// Wraps ring_position_view so it is compatible with old-style C++: default
// constructor, stateless comparators, yada yada.
class compatible_ring_position_view {
const schema* _schema = nullptr;
// Optional to supply a default constructor, no more.
std::optional<dht::ring_position_view> _rpv;
public:
constexpr compatible_ring_position_view() = default;
compatible_ring_position_view(const schema& s, dht::ring_position_view rpv)
: _schema(&s), _rpv(rpv) {
}
const dht::ring_position_view& position() const {
return *_rpv;
}
friend int tri_compare(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
return dht::ring_position_tri_compare(*x._schema, *x._rpv, *y._rpv);
}
friend bool operator<(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
return tri_compare(x, y) < 0;
}
friend bool operator<=(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
return tri_compare(x, y) <= 0;
}
friend bool operator>(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
return tri_compare(x, y) > 0;
}
friend bool operator>=(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
return tri_compare(x, y) >= 0;
}
friend bool operator==(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
return tri_compare(x, y) == 0;
}
friend bool operator!=(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
return tri_compare(x, y) != 0;
}
};

@@ -22,13 +22,12 @@
#pragma once
#include "types.hh"
#include <iosfwd>
#include <iostream>
#include <algorithm>
#include <vector>
#include <boost/range/iterator_range.hpp>
#include <boost/range/adaptor/transformed.hpp>
#include "utils/serialization.hh"
#include "util/backtrace.hh"
#include "unimplemented.hh"
enum class allow_prefixes { no, yes };
@@ -131,10 +130,10 @@ public:
bytes decompose_value(const value_type& values) {
return serialize_value(values);
}
class iterator : public std::iterator<std::input_iterator_tag, const bytes_view> {
class iterator : public std::iterator<std::input_iterator_tag, bytes_view> {
private:
bytes_view _v;
bytes_view _current;
value_type _current;
private:
void read_current() {
size_type len;
@@ -145,7 +144,7 @@ public:
}
len = read_simple<size_type>(_v);
if (_v.size() < len) {
throw_with_backtrace<marshal_exception>(sprint("compound_type iterator - not enough bytes, expected %d, got %d", len, _v.size()));
throw marshal_exception();
}
}
_current = bytes_view(_v.begin(), len);
@@ -221,9 +220,6 @@ public:
assert(AllowPrefixes == allow_prefixes::yes);
return std::distance(begin(v), end(v)) == (ssize_t)_types.size();
}
bool is_empty(bytes_view v) const {
return begin(v) == end(v);
}
void validate(bytes_view v) {
// FIXME: implement
warn(unimplemented::cause::VALIDATION);

@@ -25,7 +25,6 @@
#include <boost/range/adaptor/transformed.hpp>
#include "compound.hh"
#include "schema.hh"
#include "sstables/version.hh"
//
// This header provides adaptors between the representation used by our compound_type<>
@@ -185,8 +184,6 @@ bytes to_legacy(CompoundType& type, bytes_view packed) {
return legacy_form;
}
class composite_view;
// Represents a value serialized according to Origin's CompositeType.
// If is_compound is true, then the value is one or more components encoded as:
//
@@ -205,7 +202,7 @@ public:
, _is_compound(is_compound)
{ }
explicit composite(bytes&& b)
composite(bytes&& b)
: _bytes(std::move(b))
, _is_compound(true)
{ }
@@ -242,7 +239,7 @@ public:
using component_view = std::pair<bytes_view, eoc>;
private:
template<typename Value, typename = std::enable_if_t<!std::is_same<const data_value, std::decay_t<Value>>::value>>
static size_t size(const Value& val) {
static size_t size(Value& val) {
return val.size();
}
static size_t size(const data_value& val) {
@@ -303,40 +300,27 @@ private:
}
public:
template <typename Describer>
auto describe_type(sstables::sstable_version_types v, Describer f) const {
auto describe_type(Describer f) const {
return f(const_cast<bytes&>(_bytes));
}
// marker is ignored if !is_compound
template<typename RangeOfSerializedComponents>
static composite serialize_value(RangeOfSerializedComponents&& values, bool is_compound = true, eoc marker = eoc::none) {
static bytes serialize_value(RangeOfSerializedComponents&& values, bool is_compound = true) {
auto size = serialized_size(values, is_compound);
bytes b(bytes::initialized_later(), size);
auto i = b.begin();
serialize_value(std::forward<decltype(values)>(values), i, is_compound);
if (is_compound && !b.empty()) {
b.back() = eoc_type(marker);
}
return composite(std::move(b), is_compound);
}
template<typename RangeOfSerializedComponents>
static composite serialize_static(const schema& s, RangeOfSerializedComponents&& values) {
// FIXME: Optimize
auto b = bytes(size_t(2), bytes::value_type(0xff));
std::vector<bytes_view> sv(s.clustering_key_size());
b += composite::serialize_value(boost::range::join(sv, std::forward<RangeOfSerializedComponents>(values)), true).release_bytes();
return composite(std::move(b));
}
static eoc to_eoc(int8_t eoc_byte) {
return eoc_byte == 0 ? eoc::none : (eoc_byte < 0 ? eoc::start : eoc::end);
return b;
}
class iterator : public std::iterator<std::input_iterator_tag, const component_view> {
bytes_view _v;
component_view _current;
private:
eoc to_eoc(int8_t eoc_byte) {
return eoc_byte == 0 ? eoc::none : (eoc_byte < 0 ? eoc::start : eoc::end);
}
void read_current() {
size_type len;
{
@@ -346,7 +330,7 @@ public:
}
len = read_simple<size_type>(_v);
if (_v.size() < len) {
throw_with_backtrace<marshal_exception>(sprint("composite iterator - not enough bytes, expected %d, got %d", len, _v.size()));
throw marshal_exception();
}
}
auto value = bytes_view(_v.begin(), len);
@@ -422,10 +406,6 @@ public:
return _bytes;
}
bytes release_bytes() && {
return std::move(_bytes);
}
size_t size() const {
return _bytes.size();
}
@@ -446,20 +426,26 @@ public:
return _is_compound;
}
// The following factory functions assume this composite is a compound value.
template <typename ClusteringElement>
static composite from_clustering_element(const schema& s, const ClusteringElement& ce) {
return serialize_value(ce.components(s), s.is_compound());
return serialize_value(ce.components(s));
}
static composite from_exploded(const std::vector<bytes_view>& v, bool is_compound, eoc marker = eoc::none) {
static composite from_exploded(const std::vector<bytes_view>& v, eoc marker = eoc::none) {
if (v.size() == 0) {
return composite(bytes(size_t(1), bytes::value_type(marker)), is_compound);
return bytes(size_t(1), bytes::value_type(marker));
}
return serialize_value(v, is_compound, marker);
auto b = serialize_value(v);
b.back() = eoc_type(marker);
return composite(std::move(b));
}
static composite static_prefix(const schema& s) {
return serialize_static(s, std::vector<bytes_view>());
static bytes static_marker(size_t(2), bytes::value_type(0xff));
std::vector<bytes_view> sv(s.clustering_key_size());
return static_marker + serialize_value(sv);
}
explicit operator bytes_view() const {
@@ -470,15 +456,6 @@ public:
friend inline std::ostream& operator<<(std::ostream& os, const std::pair<Component, eoc>& c) {
return os << "{value=" << c.first << "; eoc=" << sprint("0x%02x", eoc_type(c.second) & 0xff) << "}";
}
friend std::ostream& operator<<(std::ostream& os, const composite& v);
struct tri_compare {
const std::vector<data_type>& _types;
tri_compare(const std::vector<data_type>& types) : _types(types) {}
int operator()(const composite&, const composite&) const;
int operator()(composite_view, composite_view) const;
};
};
class composite_view final {
@@ -499,15 +476,14 @@ public:
, _is_compound(true)
{ }
std::vector<bytes_view> explode() const {
std::vector<bytes> explode() const {
if (!_is_compound) {
return { _bytes };
return { to_bytes(_bytes) };
}
std::vector<bytes_view> ret;
ret.reserve(8);
std::vector<bytes> ret;
for (auto it = begin(), e = end(); it != e; ) {
ret.push_back(it->first);
ret.push_back(to_bytes(it->first));
auto marker = it->second;
++it;
if (it != e && marker != composite::eoc::none) {
@@ -529,15 +505,6 @@ public:
return { begin(), end() };
}
composite::eoc last_eoc() const {
if (!_is_compound || _bytes.empty()) {
return composite::eoc::none;
}
bytes_view v(_bytes);
v.remove_prefix(v.size() - 1);
return composite::to_eoc(read_simple<composite::eoc_type>(v));
}
auto values() const {
return components() | boost::adaptors::transformed([](auto&& c) { return c.first; });
}
@@ -560,46 +527,4 @@ public:
bool operator==(const composite_view& k) const { return k._bytes == _bytes && k._is_compound == _is_compound; }
bool operator!=(const composite_view& k) const { return !(k == *this); }
friend inline std::ostream& operator<<(std::ostream& os, composite_view v) {
return os << "{" << ::join(", ", v.components()) << ", compound=" << v._is_compound << ", static=" << v.is_static() << "}";
}
};
inline
std::ostream& operator<<(std::ostream& os, const composite& v) {
return os << composite_view(v);
}
inline
int composite::tri_compare::operator()(const composite& v1, const composite& v2) const {
return (*this)(composite_view(v1), composite_view(v2));
}
inline
int composite::tri_compare::operator()(composite_view v1, composite_view v2) const {
// See org.apache.cassandra.db.composites.AbstractCType#compare
if (v1.empty()) {
return v2.empty() ? 0 : -1;
}
if (v2.empty()) {
return 1;
}
if (v1.is_static() != v2.is_static()) {
return v1.is_static() ? -1 : 1;
}
auto a_values = v1.components();
auto b_values = v2.components();
auto cmp = [&](const data_type& t, component_view c1, component_view c2) {
// First by value, then by EOC
auto r = t->compare(c1.first, c2.first);
if (r) {
return r;
}
return static_cast<int>(c1.second) - static_cast<int>(c2.second);
};
return lexicographical_tri_compare(_types.begin(), _types.end(),
a_values.begin(), a_values.end(),
b_values.begin(), b_values.end(),
cmp);
}

@@ -1,345 +0,0 @@
/*
* Copyright (C) 2016 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include <lz4.h>
#include <zlib.h>
#include <snappy-c.h>
#include "compress.hh"
#include "utils/class_registrator.hh"
const sstring compressor::namespace_prefix = "org.apache.cassandra.io.compress.";
class lz4_processor: public compressor {
public:
using compressor::compressor;
size_t uncompress(const char* input, size_t input_len, char* output,
size_t output_len) const override;
size_t compress(const char* input, size_t input_len, char* output,
size_t output_len) const override;
size_t compress_max_size(size_t input_len) const override;
};
class snappy_processor: public compressor {
public:
using compressor::compressor;
size_t uncompress(const char* input, size_t input_len, char* output,
size_t output_len) const override;
size_t compress(const char* input, size_t input_len, char* output,
size_t output_len) const override;
size_t compress_max_size(size_t input_len) const override;
};
class deflate_processor: public compressor {
public:
using compressor::compressor;
size_t uncompress(const char* input, size_t input_len, char* output,
size_t output_len) const override;
size_t compress(const char* input, size_t input_len, char* output,
size_t output_len) const override;
size_t compress_max_size(size_t input_len) const override;
};
compressor::compressor(sstring name)
: _name(std::move(name))
{}
std::set<sstring> compressor::option_names() const {
return {};
}
std::map<sstring, sstring> compressor::options() const {
return {};
}
shared_ptr<compressor> compressor::create(const sstring& name, const opt_getter& opts) {
if (name.empty()) {
return {};
}
qualified_name qn(namespace_prefix, name);
for (auto& c : { lz4, snappy, deflate }) {
if (c->name() == qn) {
return c;
}
}
return compressor_registry::create(qn, opts);
}
shared_ptr<compressor> compressor::create(const std::map<sstring, sstring>& options) {
auto i = options.find(compression_parameters::SSTABLE_COMPRESSION);
if (i != options.end() && !i->second.empty()) {
return create(i->second, [&options](const sstring& key) -> opt_string {
auto i = options.find(key);
if (i == options.end()) {
return std::experimental::nullopt;
}
return { i->second };
});
}
return {};
}
thread_local const shared_ptr<compressor> compressor::lz4 = make_shared<lz4_processor>(namespace_prefix + "LZ4Compressor");
thread_local const shared_ptr<compressor> compressor::snappy = make_shared<snappy_processor>(namespace_prefix + "SnappyCompressor");
thread_local const shared_ptr<compressor> compressor::deflate = make_shared<deflate_processor>(namespace_prefix + "DeflateCompressor");
const sstring compression_parameters::SSTABLE_COMPRESSION = "sstable_compression";
const sstring compression_parameters::CHUNK_LENGTH_KB = "chunk_length_kb";
const sstring compression_parameters::CRC_CHECK_CHANCE = "crc_check_chance";
compression_parameters::compression_parameters()
: compression_parameters(nullptr)
{}
compression_parameters::~compression_parameters()
{}
compression_parameters::compression_parameters(compressor_ptr c)
: _compressor(std::move(c))
{}
compression_parameters::compression_parameters(const std::map<sstring, sstring>& options) {
_compressor = compressor::create(options);
validate_options(options);
auto chunk_length = options.find(CHUNK_LENGTH_KB);
if (chunk_length != options.end()) {
try {
_chunk_length = std::stoi(chunk_length->second) * 1024;
} catch (const std::exception& e) {
throw exceptions::syntax_exception(sstring("Invalid integer value ") + chunk_length->second + " for " + CHUNK_LENGTH_KB);
}
}
auto crc_chance = options.find(CRC_CHECK_CHANCE);
if (crc_chance != options.end()) {
try {
_crc_check_chance = std::stod(crc_chance->second);
} catch (const std::exception& e) {
throw exceptions::syntax_exception(sstring("Invalid double value ") + crc_chance->second + " for " + CRC_CHECK_CHANCE);
}
}
}
void compression_parameters::validate() {
if (_chunk_length) {
auto chunk_length = _chunk_length.value();
if (chunk_length <= 0) {
throw exceptions::configuration_exception(sstring("Invalid negative or null ") + CHUNK_LENGTH_KB);
}
// _chunk_length must be a power of two
if (chunk_length & (chunk_length - 1)) {
throw exceptions::configuration_exception(sstring(CHUNK_LENGTH_KB) + " must be a power of 2.");
}
}
if (_crc_check_chance && (_crc_check_chance.value() < 0.0 || _crc_check_chance.value() > 1.0)) {
throw exceptions::configuration_exception(sstring(CRC_CHECK_CHANCE) + " must be between 0.0 and 1.0.");
}
}
std::map<sstring, sstring> compression_parameters::get_options() const {
if (!_compressor) {
return std::map<sstring, sstring>();
}
auto opts = _compressor->options();
opts.emplace(compression_parameters::SSTABLE_COMPRESSION, _compressor->name());
if (_chunk_length) {
opts.emplace(sstring(CHUNK_LENGTH_KB), std::to_string(_chunk_length.value() / 1024));
}
if (_crc_check_chance) {
opts.emplace(sstring(CRC_CHECK_CHANCE), std::to_string(_crc_check_chance.value()));
}
return opts;
}
bool compression_parameters::operator==(const compression_parameters& other) const {
return _compressor == other._compressor
&& _chunk_length == other._chunk_length
&& _crc_check_chance == other._crc_check_chance;
}
bool compression_parameters::operator!=(const compression_parameters& other) const {
return !(*this == other);
}
void compression_parameters::validate_options(const std::map<sstring, sstring>& options) {
// currently, there are no options specific to a particular compressor
static std::set<sstring> keywords({
sstring(SSTABLE_COMPRESSION),
sstring(CHUNK_LENGTH_KB),
sstring(CRC_CHECK_CHANCE),
});
std::set<sstring> ckw;
if (_compressor) {
ckw = _compressor->option_names();
}
for (auto&& opt : options) {
if (!keywords.count(opt.first) && !ckw.count(opt.first)) {
throw exceptions::configuration_exception(sprint("Unknown compression option '%s'.", opt.first));
}
}
}
size_t lz4_processor::uncompress(const char* input, size_t input_len,
char* output, size_t output_len) const {
// We use LZ4_decompress_safe(). According to the documentation, the
// function LZ4_decompress_fast() is slightly faster, but maliciously
// crafted compressed data can cause it to overflow the output buffer.
// Theoretically, our compressed data is created by us so is not malicious
// (and accidental corruption is avoided by the compressed-data checksum),
// but let's not take that chance for now, until we've actually measured
// the performance benefit that LZ4_decompress_fast() would bring.
// Cassandra's LZ4Compressor prepends to the chunk its uncompressed length
// in 4 bytes little-endian (!) order. We don't need this information -
// we already know the uncompressed data is at most the given chunk size
// (and usually is exactly that, except in the last chunk). The advance
// knowledge of the uncompressed size could be useful if we used
// LZ4_decompress_fast(), but we prefer LZ4_decompress_safe() anyway...
input += 4;
input_len -= 4;
auto ret = LZ4_decompress_safe(input, output, input_len, output_len);
if (ret < 0) {
throw std::runtime_error("LZ4 uncompression failure");
}
return ret;
}
size_t lz4_processor::compress(const char* input, size_t input_len,
char* output, size_t output_len) const {
if (output_len < LZ4_COMPRESSBOUND(input_len) + 4) {
throw std::runtime_error("LZ4 compression failure: length of output is too small");
}
// Write input_len (32-bit data) to beginning of output in little-endian representation.
output[0] = input_len & 0xFF;
output[1] = (input_len >> 8) & 0xFF;
output[2] = (input_len >> 16) & 0xFF;
output[3] = (input_len >> 24) & 0xFF;
#ifdef SEASTAR_HAVE_LZ4_COMPRESS_DEFAULT
auto ret = LZ4_compress_default(input, output + 4, input_len, LZ4_compressBound(input_len));
#else
auto ret = LZ4_compress(input, output + 4, input_len);
#endif
if (ret == 0) {
throw std::runtime_error("LZ4 compression failure: LZ4_compress() failed");
}
return ret + 4;
}
size_t lz4_processor::compress_max_size(size_t input_len) const {
return LZ4_COMPRESSBOUND(input_len) + 4;
}
size_t deflate_processor::uncompress(const char* input,
size_t input_len, char* output, size_t output_len) const {
z_stream zs;
zs.zalloc = Z_NULL;
zs.zfree = Z_NULL;
zs.opaque = Z_NULL;
zs.avail_in = 0;
zs.next_in = Z_NULL;
if (inflateInit(&zs) != Z_OK) {
throw std::runtime_error("deflate uncompression init failure");
}
// yuck, zlib is not const-correct, and also uses unsigned char while we use char :-(
zs.next_in = reinterpret_cast<unsigned char*>(const_cast<char*>(input));
zs.avail_in = input_len;
zs.next_out = reinterpret_cast<unsigned char*>(output);
zs.avail_out = output_len;
auto res = inflate(&zs, Z_FINISH);
inflateEnd(&zs);
if (res == Z_STREAM_END) {
return output_len - zs.avail_out;
} else {
throw std::runtime_error("deflate uncompression failure");
}
}
size_t deflate_processor::compress(const char* input,
size_t input_len, char* output, size_t output_len) const {
z_stream zs;
zs.zalloc = Z_NULL;
zs.zfree = Z_NULL;
zs.opaque = Z_NULL;
zs.avail_in = 0;
zs.next_in = Z_NULL;
if (deflateInit(&zs, Z_DEFAULT_COMPRESSION) != Z_OK) {
throw std::runtime_error("deflate compression init failure");
}
zs.next_in = reinterpret_cast<unsigned char*>(const_cast<char*>(input));
zs.avail_in = input_len;
zs.next_out = reinterpret_cast<unsigned char*>(output);
zs.avail_out = output_len;
auto res = ::deflate(&zs, Z_FINISH);
deflateEnd(&zs);
if (res == Z_STREAM_END) {
return output_len - zs.avail_out;
} else {
throw std::runtime_error("deflate compression failure");
}
}
size_t deflate_processor::compress_max_size(size_t input_len) const {
z_stream zs;
zs.zalloc = Z_NULL;
zs.zfree = Z_NULL;
zs.opaque = Z_NULL;
zs.avail_in = 0;
zs.next_in = Z_NULL;
if (deflateInit(&zs, Z_DEFAULT_COMPRESSION) != Z_OK) {
throw std::runtime_error("deflate compression init failure");
}
auto res = deflateBound(&zs, input_len);
deflateEnd(&zs);
return res;
}
size_t snappy_processor::uncompress(const char* input, size_t input_len,
char* output, size_t output_len) const {
if (snappy_uncompress(input, input_len, output, &output_len)
== SNAPPY_OK) {
return output_len;
} else {
throw std::runtime_error("snappy uncompression failure");
}
}
size_t snappy_processor::compress(const char* input, size_t input_len,
char* output, size_t output_len) const {
auto ret = snappy_compress(input, input_len, output, &output_len);
if (ret != SNAPPY_OK) {
throw std::runtime_error("snappy compression failure: snappy_compress() failed");
}
return output_len;
}
size_t snappy_processor::compress_max_size(size_t input_len) const {
return snappy_max_compressed_length(input_len);
}
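The comments in lz4_processor describe a framing detail: each chunk starts with its uncompressed length stored as 4 bytes in little-endian order, and uncompress() skips those 4 bytes before calling LZ4_decompress_safe(). A minimal standalone sketch of that prefix follows. The helper names write_le32, read_le32 and has_length_prefix are hypothetical and do not appear in the source. The length guard is an added assumption, since the code above subtracts 4 from input_len without checking it first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: store a 32-bit length at the start of a chunk,
// least significant byte first (little-endian), as compress() does above.
inline void write_le32(char* out, uint32_t v) {
    out[0] = static_cast<char>(v & 0xFF);
    out[1] = static_cast<char>((v >> 8) & 0xFF);
    out[2] = static_cast<char>((v >> 16) & 0xFF);
    out[3] = static_cast<char>((v >> 24) & 0xFF);
}

// Hypothetical helper: read the prefix back. Each byte goes through
// unsigned char so that values above 0x7F are not sign-extended.
inline uint32_t read_le32(const char* in) {
    auto byte = [in](int i) {
        return static_cast<uint32_t>(static_cast<unsigned char>(in[i]));
    };
    return byte(0) | (byte(1) << 8) | (byte(2) << 16) | (byte(3) << 24);
}

// Assumed guard (not in the source): a chunk shorter than the prefix
// cannot be valid, and subtracting 4 from a smaller size_t would wrap.
inline bool has_length_prefix(size_t input_len) {
    return input_len >= 4;
}
```

A caller would check has_length_prefix(input_len) before skipping the prefix, and could compare read_le32(input) against output_len as an extra sanity check.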

View File

@@ -21,103 +21,135 @@
#pragma once
#include <map>
#include <set>
#include <seastar/core/future.hh>
#include <seastar/core/shared_ptr.hh>
#include <seastar/core/sstring.hh>
#include "exceptions/exceptions.hh"
#include "stdx.hh"
class compressor {
sstring _name;
public:
compressor(sstring);
virtual ~compressor() {}
/**
* Unpacks data in "input" to output. If output_len is of insufficient size,
* an exception is thrown, i.e. you should keep track of the uncompressed size.
*/
virtual size_t uncompress(const char* input, size_t input_len, char* output,
size_t output_len) const = 0;
/**
* Packs data in "input" to output. If output_len is of insufficient size,
* an exception is thrown. The maximum required size is obtained via "compress_max_size".
*/
virtual size_t compress(const char* input, size_t input_len, char* output,
size_t output_len) const = 0;
/**
* Returns the maximum output size for compressing data of "input_len" size.
*/
virtual size_t compress_max_size(size_t input_len) const = 0;
/**
* Returns accepted option names for this compressor
*/
virtual std::set<sstring> option_names() const;
/**
* Returns original options used in instantiating this compressor
*/
virtual std::map<sstring, sstring> options() const;
/**
* Compressor class name.
*/
const sstring& name() const {
return _name;
}
// to cheaply bridge sstable compression options / maps
using opt_string = stdx::optional<sstring>;
using opt_getter = std::function<opt_string(const sstring&)>;
static shared_ptr<compressor> create(const sstring& name, const opt_getter&);
static shared_ptr<compressor> create(const std::map<sstring, sstring>&);
static thread_local const shared_ptr<compressor> lz4;
static thread_local const shared_ptr<compressor> snappy;
static thread_local const shared_ptr<compressor> deflate;
static const sstring namespace_prefix;
enum class compressor {
none,
lz4,
snappy,
deflate,
};
template<typename BaseType, typename... Args>
class class_registry;
using compressor_ptr = shared_ptr<compressor>;
using compressor_registry = class_registry<compressor_ptr, const typename compressor::opt_getter&>;
class compression_parameters {
public:
static constexpr int32_t DEFAULT_CHUNK_LENGTH = 4 * 1024;
static constexpr double DEFAULT_CRC_CHECK_CHANCE = 1.0;
static const sstring SSTABLE_COMPRESSION;
static const sstring CHUNK_LENGTH_KB;
static const sstring CRC_CHECK_CHANCE;
static constexpr auto SSTABLE_COMPRESSION = "sstable_compression";
static constexpr auto CHUNK_LENGTH_KB = "chunk_length_kb";
static constexpr auto CRC_CHECK_CHANCE = "crc_check_chance";
private:
compressor_ptr _compressor;
compressor _compressor = compressor::none;
std::experimental::optional<int> _chunk_length;
std::experimental::optional<double> _crc_check_chance;
public:
compression_parameters();
compression_parameters(compressor_ptr);
compression_parameters(const std::map<sstring, sstring>& options);
~compression_parameters();
compression_parameters() = default;
compression_parameters(compressor c) : _compressor(c) { }
compression_parameters(const std::map<sstring, sstring>& options) {
validate_options(options);
compressor_ptr get_compressor() const { return _compressor; }
auto it = options.find(SSTABLE_COMPRESSION);
if (it == options.end() || it->second.empty()) {
return;
}
const auto& compressor_class = it->second;
if (is_compressor_class(compressor_class, "LZ4Compressor")) {
_compressor = compressor::lz4;
} else if (is_compressor_class(compressor_class, "SnappyCompressor")) {
_compressor = compressor::snappy;
} else if (is_compressor_class(compressor_class, "DeflateCompressor")) {
_compressor = compressor::deflate;
} else {
throw exceptions::configuration_exception(sstring("Unsupported compression class '") + compressor_class + "'.");
}
auto chunk_length = options.find(CHUNK_LENGTH_KB);
if (chunk_length != options.end()) {
try {
_chunk_length = std::stoi(chunk_length->second) * 1024;
} catch (const std::exception& e) {
throw exceptions::syntax_exception(sstring("Invalid integer value ") + chunk_length->second + " for " + CHUNK_LENGTH_KB);
}
}
auto crc_chance = options.find(CRC_CHECK_CHANCE);
if (crc_chance != options.end()) {
try {
_crc_check_chance = std::stod(crc_chance->second);
} catch (const std::exception& e) {
throw exceptions::syntax_exception(sstring("Invalid double value ") + crc_chance->second + " for " + CRC_CHECK_CHANCE);
}
}
}
compressor get_compressor() const { return _compressor; }
int32_t chunk_length() const { return _chunk_length.value_or(int(DEFAULT_CHUNK_LENGTH)); }
double crc_check_chance() const { return _crc_check_chance.value_or(double(DEFAULT_CRC_CHECK_CHANCE)); }
void validate();
std::map<sstring, sstring> get_options() const;
bool operator==(const compression_parameters& other) const;
bool operator!=(const compression_parameters& other) const;
void validate() {
if (_chunk_length) {
auto chunk_length = _chunk_length.value();
if (chunk_length <= 0) {
throw exceptions::configuration_exception(sstring("Invalid negative or null ") + CHUNK_LENGTH_KB);
}
// _chunk_length must be a power of two
if (chunk_length & (chunk_length - 1)) {
throw exceptions::configuration_exception(sstring(CHUNK_LENGTH_KB) + " must be a power of 2.");
}
}
if (_crc_check_chance && (_crc_check_chance.value() < 0.0 || _crc_check_chance.value() > 1.0)) {
throw exceptions::configuration_exception(sstring(CRC_CHECK_CHANCE) + " must be between 0.0 and 1.0.");
}
}
std::map<sstring, sstring> get_options() const {
if (_compressor == compressor::none) {
return std::map<sstring, sstring>();
}
std::map<sstring, sstring> opts;
opts.emplace(sstring(SSTABLE_COMPRESSION), compressor_name());
if (_chunk_length) {
opts.emplace(sstring(CHUNK_LENGTH_KB), std::to_string(_chunk_length.value() / 1024));
}
if (_crc_check_chance) {
opts.emplace(sstring(CRC_CHECK_CHANCE), std::to_string(_crc_check_chance.value()));
}
return opts;
}
bool operator==(const compression_parameters& other) const {
return _compressor == other._compressor
&& _chunk_length == other._chunk_length
&& _crc_check_chance == other._crc_check_chance;
}
bool operator!=(const compression_parameters& other) const {
return !(*this == other);
}
private:
void validate_options(const std::map<sstring, sstring>&);
void validate_options(const std::map<sstring, sstring>& options) {
// currently, there are no options specific to a particular compressor
static std::set<sstring> keywords({
sstring(SSTABLE_COMPRESSION),
sstring(CHUNK_LENGTH_KB),
sstring(CRC_CHECK_CHANCE),
});
for (auto&& opt : options) {
if (!keywords.count(opt.first)) {
throw exceptions::configuration_exception(sprint("Unknown compression option '%s'.", opt.first));
}
}
}
bool is_compressor_class(const sstring& value, const sstring& class_name) {
static const sstring namespace_prefix = "org.apache.cassandra.io.compress.";
return value == class_name || value == namespace_prefix + class_name;
}
sstring compressor_name() const {
switch (_compressor) {
case compressor::lz4:
return "org.apache.cassandra.io.compress.LZ4Compressor";
case compressor::snappy:
return "org.apache.cassandra.io.compress.SnappyCompressor";
case compressor::deflate:
return "org.apache.cassandra.io.compress.DeflateCompressor";
default:
abort();
}
}
};
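Two checks in compression_parameters are compact enough to try in isolation: validate() rejects chunk lengths that are not positive powers of two, and is_compressor_class() accepts a compressor name with or without the Cassandra namespace prefix. The sketch below restates both with std::string in place of sstring. The function names is_valid_chunk_length and matches_compressor_class are hypothetical stand-ins, not part of the header above.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the chunk length test in validate():
// a positive power of two has exactly one bit set, so clearing the
// lowest set bit with (x & (x - 1)) must leave zero.
constexpr bool is_valid_chunk_length(int32_t bytes) {
    return bytes > 0 && (bytes & (bytes - 1)) == 0;
}

// Hypothetical stand-in for is_compressor_class(): accept either the
// short class name or the fully qualified Cassandra class name.
inline bool matches_compressor_class(const std::string& value,
                                     const std::string& class_name) {
    static const std::string namespace_prefix = "org.apache.cassandra.io.compress.";
    return value == class_name || value == namespace_prefix + class_name;
}

// Compile-time spot checks of the power-of-two rule.
static_assert(is_valid_chunk_length(4 * 1024), "4 KB is a power of two");
static_assert(!is_valid_chunk_length(3 * 1024), "3 KB is not a power of two");
```

Note that the source multiplies chunk_length_kb by 1024 before validating, so the value checked here is in bytes.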

View File

@@ -12,9 +12,7 @@
# The name of the cluster. This is mainly used to prevent machines in
# one logical cluster from joining another.
# It is recommended to change the default value when creating a new cluster.
# You can NOT modify this value for an existing cluster
#cluster_name: 'Test Cluster'
cluster_name: 'Test Cluster'
# This defines the number of tokens randomly assigned to this node on the ring
# The more tokens, relative to other nodes, the larger the proportion of data
@@ -87,25 +85,16 @@ listen_address: localhost
# Leaving this blank will set it to the same value as listen_address
# broadcast_address: 1.2.3.4
# When using multiple physical network interfaces, set this to true to listen on broadcast_address
# in addition to the listen_address, allowing nodes to communicate in both interfaces.
# Ignore this property if the network configuration automatically routes between the public and private networks such as EC2.
#
# listen_on_broadcast_address: false
# port for the CQL native transport to listen for clients on
# For security reasons, you should not expose this port to the internet. Firewall it if needed.
native_transport_port: 9042
# Enabling native transport encryption in client_encryption_options allows you to either use
# encryption for the standard port or to use a dedicated, additional port along with the unencrypted
# standard native_transport_port.
# Enabling client encryption and keeping native_transport_port_ssl disabled will use encryption
# for native_transport_port. Setting native_transport_port_ssl to a different value
# from native_transport_port will use encryption for native_transport_port_ssl while
# keeping native_transport_port unencrypted.
#native_transport_port_ssl: 9142
# Throttles all outbound streaming file transfers on this node to the
# given total throughput in Mbps. This is necessary because Scylla does
# mostly sequential IO when streaming data during bootstrap or repair, which
# can lead to saturating the network connection and degrading rpc performance.
# When unset, the default is 200 Mbps or 25 MB/s.
# stream_throughput_outbound_megabits_per_sec: 200
# How long the coordinator should wait for read operations to complete
read_request_timeout_in_ms: 5000
@@ -203,9 +192,6 @@ api_address: 127.0.0.1
# Caution should be taken on increasing the size of this threshold as it can lead to node instability.
batch_size_warn_threshold_in_kb: 5
# Fail any multiple-partition batch exceeding this value. 50kb (10x warn threshold) by default.
batch_size_fail_threshold_in_kb: 50
# Authentication backend, identifying users
# Out of the box, Scylla provides org.apache.cassandra.auth.{AllowAllAuthenticator,
# PasswordAuthenticator}.
@@ -231,17 +217,9 @@ batch_size_fail_threshold_in_kb: 50
# that do not have vnodes enabled.
# initial_token:
# RPC address to broadcast to drivers and other Scylla nodes. This cannot
# be set to 0.0.0.0. If left blank, this will be set to the value of
# rpc_address. If rpc_address is set to 0.0.0.0, broadcast_rpc_address must
# be set.
# broadcast_rpc_address: 1.2.3.4
# Uncomment to enable experimental features
# experimental: true
# The directory where hints files are stored if hinted handoff is enabled.
# hints_directory: /var/lib/scylla/hints
###################################################
## Not currently supported, reserved for future use
###################################################
# See http://wiki.apache.org/cassandra/HintedHandoff
# May either be "true" or "false" to enable globally, or contain a list
@@ -265,27 +243,23 @@ batch_size_fail_threshold_in_kb: 50
# cross-dc handoff tends to be slower
# max_hints_delivery_threads: 2
###################################################
## Not currently supported, reserved for future use
###################################################
# Maximum throttle in KBs per second, total. This will be
# reduced proportionally to the number of nodes in the cluster.
# batchlog_replay_throttle_in_kb: 1024
# Validity period for permissions cache (fetching permissions can be an
# expensive operation depending on the authorizer, CassandraAuthorizer is
# one example). Defaults to 10000, set to 0 to disable.
# one example). Defaults to 2000, set to 0 to disable.
# Will be disabled automatically for AllowAllAuthorizer.
# permissions_validity_in_ms: 10000
# permissions_validity_in_ms: 2000
# Refresh interval for permissions cache (if enabled).
# After this interval, cache entries become eligible for refresh. Upon next
# access, an async reload is scheduled and the old value returned until it
# completes. If permissions_validity_in_ms is non-zero, then this also must have
# a non-zero value. Defaults to 2000. It's recommended to set this value to
# be at least 3 times smaller than the permissions_validity_in_ms.
# permissions_update_interval_in_ms: 2000
# completes. If permissions_validity_in_ms is non-zero, then this must be
# non-zero as well.
# Defaults to the same value as permissions_validity_in_ms.
# permissions_update_interval_in_ms: 1000
# The partitioner is responsible for distributing groups of rows (by
# partition key) across nodes in the cluster. You should leave this
@@ -299,6 +273,28 @@ batch_size_fail_threshold_in_kb: 50
#
partitioner: org.apache.cassandra.dht.Murmur3Partitioner
# policy for data disk failures:
# die: shut down gossip and Thrift and kill the JVM for any fs errors or
# single-sstable errors, so the node can be replaced.
# stop_paranoid: shut down gossip and Thrift even for single-sstable errors.
# stop: shut down gossip and Thrift, leaving the node effectively dead, but
# can still be inspected via JMX.
# best_effort: stop using the failed disk and respond to requests based on
# remaining available sstables. This means you WILL see obsolete
# data at CL.ONE!
# ignore: ignore fatal errors and let requests fail, as in pre-1.2 Scylla
# disk_failure_policy: stop
# policy for commit disk failures:
# die: shut down gossip and Thrift and kill the JVM, so the node can be replaced.
# stop: shut down gossip and Thrift, leaving the node effectively dead, but
# can still be inspected via JMX.
# stop_commit: shutdown the commit log, letting writes collect but
# continuing to service reads, as in pre-2.0.5 Scylla
# ignore: ignore fatal errors and let the batches fail
# commit_failure_policy: stop
# Maximum size of the key cache in memory.
#
# Each key cache hit saves 1 seek and each row cache hit saves 2 seeks at the
@@ -488,6 +484,13 @@ commitlog_total_space_in_mb: -1
# Whether to start the thrift rpc server.
# start_rpc: true
# RPC address to broadcast to drivers and other Scylla nodes. This cannot
# be set to 0.0.0.0. If left blank, this will be set to the value of
# rpc_address. If rpc_address is set to 0.0.0.0, broadcast_rpc_address must
# be set.
# broadcast_rpc_address: 1.2.3.4
# enable or disable keepalive on rpc/native connections
# rpc_keepalive: true
@@ -725,17 +728,22 @@ commitlog_total_space_in_mb: -1
# certificate: conf/scylla.crt
# keyfile: conf/scylla.key
# truststore: <none, use system trust>
# require_client_auth: False
# priority_string: <none, use default>
# enable or disable client/server encryption.
# client_encryption_options:
# enabled: false
# certificate: conf/scylla.crt
# keyfile: conf/scylla.key
# truststore: <none, use system trust>
# require_client_auth: False
# priority_string: <none, use default>
# require_client_auth: false
# Set truststore and truststore_password if require_client_auth is true
# truststore: conf/.truststore
# truststore_password: cassandra
# More advanced defaults below:
# protocol: TLS
# algorithm: SunX509
# store_type: JKS
# cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA]
# internode_compression controls whether traffic between nodes is
# compressed.
@@ -781,23 +789,3 @@ commitlog_total_space_in_mb: -1
# By default, Scylla binds all interfaces to the prometheus API
# It is possible to restrict the listening address to a specific one
# prometheus_address: 0.0.0.0
# Distribution of data among cores (shards) within a node
#
# Scylla distributes data within a node among shards, using a round-robin
# strategy:
# [shard0] [shard1] ... [shardN-1] [shard0] [shard1] ... [shardN-1] ...
#
# Scylla versions 1.6 and below used just one repetition of the pattern;
# this interfered with data placement among nodes (vnodes).
#
# Scylla versions 1.7 and above use 4096 repetitions of the pattern; this
# provides for better data distribution.
#
# the value below is log (base 2) of the number of repetitions.
#
# Set to 0 to avoid rewriting all data when upgrading from Scylla 1.6 and
# below.
#
# Keep at 12 for new clusters.
murmur3_partitioner_ignore_msb_bits: 12
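The comment block for murmur3_partitioner_ignore_msb_bits states that the setting is the base-2 logarithm of the number of repetitions of the shard pattern. The arithmetic can be sketched as below. The function name shard_pattern_repetitions is hypothetical and only illustrates the relationship described in the comment, not how Scylla computes shard assignment.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the configured value is log2 of the repetition
// count, so the count is 2 raised to that value.
constexpr uint64_t shard_pattern_repetitions(unsigned ignore_msb_bits) {
    return uint64_t(1) << ignore_msb_bits;
}

// 12 gives the 4096 repetitions described for Scylla 1.7 and above;
// 0 gives the single repetition described for 1.6 and below.
static_assert(shard_pattern_repetitions(12) == 4096, "2^12 repetitions");
static_assert(shard_pattern_repetitions(0) == 1, "one repetition");
```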

View File

@@ -20,11 +20,9 @@
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
#
import os, os.path, textwrap, argparse, sys, shlex, subprocess, tempfile, re, platform
import os, os.path, textwrap, argparse, sys, shlex, subprocess, tempfile, re
from distutils.spawn import find_executable
tempfile.tempdir = "./build/tmp"
configure_args = str.join(' ', [shlex.quote(x) for x in sys.argv[1:]])
for line in open('/etc/os-release'):
@@ -85,33 +83,17 @@ def pkg_config(option, package):
return output.decode('utf-8').strip()
def try_compile(compiler, source = '', flags = []):
return try_compile_and_link(compiler, source, flags = flags + ['-c'])
def ensure_tmp_dir_exists():
if not os.path.exists(tempfile.tempdir):
os.makedirs(tempfile.tempdir)
def try_compile_and_link(compiler, source = '', flags = []):
ensure_tmp_dir_exists()
with tempfile.NamedTemporaryFile() as sfile:
ofile = tempfile.mktemp()
try:
sfile.file.write(bytes(source, 'utf-8'))
sfile.file.flush()
# We can't write to /dev/null, since in some cases (-ftest-coverage) gcc will create an auxiliary
# output file based on the name of the output file, and "/dev/null.gcsa" is not a good name
return subprocess.call([compiler, '-x', 'c++', '-o', ofile, sfile.name] + args.user_cflags.split() + flags,
stdout = subprocess.DEVNULL,
stderr = subprocess.DEVNULL) == 0
finally:
if os.path.exists(ofile):
os.unlink(ofile)
sfile.file.write(bytes(source, 'utf-8'))
sfile.file.flush()
return subprocess.call([compiler, '-x', 'c++', '-o', '/dev/null', '-c', sfile.name] + flags,
stdout = subprocess.DEVNULL,
stderr = subprocess.DEVNULL) == 0
def flag_supported(flag, compiler):
def warning_supported(warning, compiler):
# gcc ignores -Wno-x even if it is not supported
adjusted = re.sub('^-Wno-', '-W', flag)
split = adjusted.split(' ')
return try_compile(flags = ['-Werror'] + split, compiler = compiler)
adjusted = re.sub('^-Wno-', '-W', warning)
return try_compile(flags = [adjusted], compiler = compiler)
def debug_flag(compiler):
src_with_auto = textwrap.dedent('''\
@@ -126,19 +108,6 @@ def debug_flag(compiler):
print('Note: debug information disabled; upgrade your compiler')
return ''
def gold_supported(compiler):
src_main = 'int main(int argc, char **argv) { return 0; }'
if try_compile_and_link(source = src_main, flags = ['-fuse-ld=gold'], compiler = compiler):
return '-fuse-ld=gold'
else:
print('Note: gold not found; using default system linker')
return ''
def maybe_static(flag, libs):
if flag and not args.static:
libs = '-Wl,-Bstatic {} -Wl,-Bdynamic'.format(libs)
return libs
class Thrift(object):
def __init__(self, source, service):
self.source = source
@@ -159,13 +128,6 @@ class Thrift(object):
def endswith(self, end):
return self.source.endswith(end)
def default_target_arch():
mach = platform.machine()
if platform.machine() in ['i386', 'i686', 'x86_64']:
return 'nehalem'
else:
return ''
class Antlr3Grammar(object):
def __init__(self, source):
self.source = source
@@ -187,22 +149,20 @@ modes = {
'debug': {
'sanitize': '-fsanitize=address -fsanitize=leak -fsanitize=undefined',
'sanitize_libs': '-lasan -lubsan',
'opt': '-O0 -DDEBUG -DDEBUG_SHARED_PTR -DDEFAULT_ALLOCATOR -DDEBUG_LSA_SANITIZER',
'opt': '-O0 -DDEBUG -DDEBUG_SHARED_PTR -DDEFAULT_ALLOCATOR',
'libs': '',
},
'release': {
'sanitize': '',
'sanitize_libs': '',
'opt': '-O3',
'opt': '-O2',
'libs': '',
},
}
scylla_tests = [
'tests/mutation_test',
'tests/mvcc_test',
'tests/mutation_fragment_test',
'tests/flat_mutation_reader_test',
'tests/streamed_mutation_test',
'tests/schema_registry_test',
'tests/canonical_mutation_test',
'tests/range_test',
@@ -210,9 +170,6 @@ scylla_tests = [
'tests/keys_test',
'tests/partitioner_test',
'tests/frozen_mutation_test',
'tests/serialized_action_test',
'tests/hint_test',
'tests/clustering_ranges_walker_test',
'tests/perf/perf_mutation',
'tests/lsa_async_eviction_test',
'tests/lsa_sync_eviction_test',
@@ -221,14 +178,9 @@ scylla_tests = [
'tests/perf/perf_hash',
'tests/perf/perf_cql_parser',
'tests/perf/perf_simple_query',
'tests/perf/perf_fast_forward',
'tests/perf/perf_cache_eviction',
'tests/cache_flat_mutation_reader_test',
'tests/row_cache_stress_test',
'tests/memory_footprint',
'tests/perf/perf_sstable',
'tests/cql_query_test',
'tests/secondary_index_test',
'tests/storage_proxy_test',
'tests/schema_change_test',
'tests/mutation_reader_test',
@@ -236,9 +188,7 @@ scylla_tests = [
'tests/row_cache_test',
'tests/test-serialization',
'tests/sstable_test',
'tests/sstable_3_x_test',
'tests/sstable_mutation_test',
'tests/sstable_resharding_test',
'tests/memtable_test',
'tests/commitlog_test',
'tests/cartesian_product_test',
@@ -251,7 +201,6 @@ scylla_tests = [
'tests/config_test',
'tests/gossiping_property_file_snitch_test',
'tests/ec2_snitch_test',
'tests/gce_snitch_test',
'tests/snitch_reset_test',
'tests/network_topology_strategy_test',
'tests/query_processor_test',
@@ -261,7 +210,6 @@ scylla_tests = [
'tests/murmur_hash_test',
'tests/allocation_strategy_test',
'tests/logalloc_test',
'tests/log_heap_test',
'tests/managed_vector_test',
'tests/crc_test',
'tests/flush_queue_test',
@@ -273,50 +221,14 @@ scylla_tests = [
'tests/database_test',
'tests/nonwrapping_range_test',
'tests/input_stream_test',
'tests/virtual_reader_test',
'tests/view_schema_test',
'tests/view_build_test',
'tests/view_complex_test',
'tests/counter_test',
'tests/cell_locker_test',
'tests/row_locker_test',
'tests/streaming_histogram_test',
'tests/duration_test',
'tests/vint_serialization_test',
'tests/continuous_data_consumer_test',
'tests/compress_test',
'tests/chunked_vector_test',
'tests/loading_cache_test',
'tests/castas_fcts_test',
'tests/big_decimal_test',
'tests/aggregate_fcts_test',
'tests/role_manager_test',
'tests/caching_options_test',
'tests/auth_resource_test',
'tests/cql_auth_query_test',
'tests/enum_set_test',
'tests/extensions_test',
'tests/cql_auth_syntax_test',
'tests/querier_cache',
'tests/limiting_data_source_test',
'tests/meta_test',
'tests/imr_test',
'tests/partition_data_test',
'tests/reusable_buffer_test',
'tests/multishard_writer_test',
]
perf_tests = [
'tests/perf/perf_mutation_readers',
'tests/perf/perf_mutation_fragment',
'tests/perf/perf_idl',
'tests/sstable_atomic_deletion_test',
]
apps = [
'scylla',
]
tests = scylla_tests + perf_tests
tests = scylla_tests
other = [
'iotune',
@@ -338,12 +250,8 @@ arg_parser.add_argument('--cflags', action = 'store', dest = 'user_cflags', defa
help = 'Extra flags for the C++ compiler')
arg_parser.add_argument('--ldflags', action = 'store', dest = 'user_ldflags', default = '',
help = 'Extra flags for the linker')
arg_parser.add_argument('--target', action = 'store', dest = 'target', default = default_target_arch(),
help = 'Target architecture (-march)')
arg_parser.add_argument('--compiler', action = 'store', dest = 'cxx', default = 'g++',
help = 'C++ compiler path')
arg_parser.add_argument('--c-compiler', action='store', dest='cc', default='gcc',
help='C compiler path')
arg_parser.add_argument('--with-osv', action = 'store', dest = 'with_osv', default = '',
help = 'Shortcut for compile for OSv')
arg_parser.add_argument('--enable-dpdk', action = 'store_true', dest = 'dpdk', default = False,
@@ -355,25 +263,13 @@ arg_parser.add_argument('--debuginfo', action = 'store', dest = 'debuginfo', typ
arg_parser.add_argument('--static-stdc++', dest = 'staticcxx', action = 'store_true',
help = 'Link libgcc and libstdc++ statically')
arg_parser.add_argument('--static-thrift', dest = 'staticthrift', action = 'store_true',
help = 'Link libthrift statically')
arg_parser.add_argument('--static-boost', dest = 'staticboost', action = 'store_true',
help = 'Link boost statically')
arg_parser.add_argument('--static-yaml-cpp', dest = 'staticyamlcpp', action = 'store_true',
help = 'Link libyaml-cpp statically')
help = 'Link libthrift statically')
arg_parser.add_argument('--tests-debuginfo', action = 'store', dest = 'tests_debuginfo', type = int, default = 0,
help = 'Enable(1)/disable(0) compiler debug information generation for tests')
arg_parser.add_argument('--python', action = 'store', dest = 'python', default = 'python3',
help = 'Python3 path')
add_tristate(arg_parser, name = 'hwloc', dest = 'hwloc', help = 'hwloc support')
add_tristate(arg_parser, name = 'xen', dest = 'xen', help = 'Xen support')
arg_parser.add_argument('--enable-gcc6-concepts', dest='gcc6_concepts', action='store_true', default=False,
help='enable experimental support for C++ Concepts as implemented in GCC 6')
arg_parser.add_argument('--enable-alloc-failure-injector', dest='alloc_failure_injector', action='store_true', default=False,
help='enable allocation failure injection')
arg_parser.add_argument('--with-antlr3', dest='antlr3_exec', action='store', default=None,
help='path to antlr3 executable')
arg_parser.add_argument('--with-ragel', dest='ragel_exec', action='store', default=None,
help='path to ragel executable')
args = arg_parser.parse_args()
defines = []
@@ -383,50 +279,39 @@ extra_cxxflags = {}
cassandra_interface = Thrift(source = 'interface/cassandra.thrift', service = 'Cassandra')
scylla_core = (['database.cc',
'atomic_cell.cc',
'schema.cc',
'frozen_schema.cc',
'schema_registry.cc',
'bytes.cc',
'mutation.cc',
'mutation_fragment.cc',
'streamed_mutation.cc',
'partition_version.cc',
'row_cache.cc',
'canonical_mutation.cc',
'frozen_mutation.cc',
'memtable.cc',
'schema_mutations.cc',
'supervisor.cc',
'release.cc',
'utils/logalloc.cc',
'utils/large_bitset.cc',
'utils/buffer_input_stream.cc',
'utils/limiting_data_source.cc',
'mutation_partition.cc',
'mutation_partition_view.cc',
'mutation_partition_serializer.cc',
'mutation_reader.cc',
'flat_mutation_reader.cc',
'mutation_query.cc',
'keys.cc',
'counters.cc',
'compress.cc',
'sstables/mp_row_consumer.cc',
'sstables/sstables.cc',
'sstables/sstable_version.cc',
'sstables/compress.cc',
'sstables/row.cc',
'sstables/partition.cc',
'sstables/filter.cc',
'sstables/compaction.cc',
'sstables/compaction_strategy.cc',
'sstables/compaction_manager.cc',
'sstables/integrity_checked_file_impl.cc',
'sstables/prepended_input_stream.cc',
'sstables/m_format_write_helpers.cc',
'sstables/m_format_read_helpers.cc',
'sstables/atomic_deletion.cc',
'transport/event.cc',
'transport/event_notifier.cc',
'transport/server.cc',
'transport/messages/result_message.cc',
'cql3/abstract_marker.cc',
'cql3/attributes.cc',
'cql3/cf_name.cc',
@@ -438,7 +323,6 @@ scylla_core = (['database.cc',
'cql3/sets.cc',
'cql3/maps.cc',
'cql3/functions/functions.cc',
'cql3/functions/castas_fcts.cc',
'cql3/statements/cf_prop_defs.cc',
'cql3/statements/cf_statement.cc',
'cql3/statements/authentication_statement.cc',
@@ -446,10 +330,9 @@ scylla_core = (['database.cc',
'cql3/statements/create_table_statement.cc',
'cql3/statements/create_view_statement.cc',
'cql3/statements/create_type_statement.cc',
'cql3/statements/drop_index_statement.cc',
'cql3/statements/create_user_statement.cc',
'cql3/statements/drop_keyspace_statement.cc',
'cql3/statements/drop_table_statement.cc',
'cql3/statements/drop_view_statement.cc',
'cql3/statements/drop_type_statement.cc',
'cql3/statements/schema_altering_statement.cc',
'cql3/statements/ks_prop_defs.cc',
@@ -466,7 +349,8 @@ scylla_core = (['database.cc',
'cql3/statements/create_index_statement.cc',
'cql3/statements/truncate_statement.cc',
'cql3/statements/alter_table_statement.cc',
'cql3/statements/alter_view_statement.cc',
'cql3/statements/alter_user_statement.cc',
'cql3/statements/drop_user_statement.cc',
'cql3/statements/list_users_statement.cc',
'cql3/statements/authorization_statement.cc',
'cql3/statements/permission_altering_statement.cc',
@@ -475,10 +359,9 @@ scylla_core = (['database.cc',
'cql3/statements/revoke_statement.cc',
'cql3/statements/alter_type_statement.cc',
'cql3/statements/alter_keyspace_statement.cc',
'cql3/statements/role-management-statements.cc',
'cql3/update_parameters.cc',
'cql3/ut_name.cc',
'cql3/role_name.cc',
'cql3/user_options.cc',
'thrift/handler.cc',
'thrift/server.cc',
'thrift/thrift_validation.cc',
@@ -511,28 +394,18 @@ scylla_core = (['database.cc',
'cql3/selection/selector.cc',
'cql3/restrictions/statement_restrictions.cc',
'cql3/result_set.cc',
'cql3/variable_specifications.cc',
'db/consistency_level.cc',
'db/system_keyspace.cc',
'db/system_distributed_keyspace.cc',
'db/schema_tables.cc',
'db/cql_type_parser.cc',
'db/legacy_schema_migrator.cc',
'db/commitlog/commitlog.cc',
'db/commitlog/commitlog_replayer.cc',
'db/commitlog/commitlog_entry.cc',
'db/hints/manager.cc',
'db/hints/resource_manager.cc',
'db/config.cc',
'db/extensions.cc',
'db/heat_load_balance.cc',
'db/large_partition_handler.cc',
'db/index/secondary_index.cc',
'db/marshal/type_parser.cc',
'db/batchlog_manager.cc',
'db/view/view.cc',
'db/view/row_locking.cc',
'index/secondary_index_manager.cc',
'index/secondary_index.cc',
'io/io.cc',
'utils/utils.cc',
'utils/UUID_gen.cc',
'utils/i_filter.cc',
'utils/bloom_filter.cc',
@@ -542,7 +415,6 @@ scylla_core = (['database.cc',
'utils/dynamic_bitset.cc',
'utils/managed_bytes.cc',
'utils/exceptions.cc',
'utils/config_file.cc',
'gms/version_generator.cc',
'gms/versioned_value.cc',
'gms/gossiper.cc',
@@ -552,7 +424,6 @@ scylla_core = (['database.cc',
'gms/gossip_digest_ack2.cc',
'gms/endpoint_state.cc',
'gms/application_state.cc',
'gms/inet_address.cc',
'dht/i_partitioner.cc',
'dht/murmur3_partitioner.cc',
'dht/byte_ordered_partitioner.cc',
@@ -568,6 +439,7 @@ scylla_core = (['database.cc',
'locator/network_topology_strategy.cc',
'locator/everywhere_replication_strategy.cc',
'locator/token_metadata.cc',
'locator/locator.cc',
'locator/snitch_base.cc',
'locator/simple_snitch.cc',
'locator/rack_inferring_snitch.cc',
@@ -575,12 +447,11 @@ scylla_core = (['database.cc',
'locator/production_snitch_base.cc',
'locator/ec2_snitch.cc',
'locator/ec2_multi_region_snitch.cc',
'locator/gce_snitch.cc',
'message/messaging_service.cc',
'service/client_state.cc',
'service/migration_task.cc',
'service/storage_service.cc',
'service/misc_services.cc',
'service/load_broadcaster.cc',
'service/pager/paging_state.cc',
'service/pager/query_pagers.cc',
'streaming/stream_task.cc',
@@ -596,41 +467,26 @@ scylla_core = (['database.cc',
'streaming/stream_manager.cc',
'streaming/stream_result_future.cc',
'streaming/stream_session_state.cc',
'clocks-impl.cc',
'gc_clock.cc',
'partition_slice_builder.cc',
'init.cc',
'lister.cc',
'repair/repair.cc',
'exceptions/exceptions.cc',
'auth/allow_all_authenticator.cc',
'auth/allow_all_authorizer.cc',
'dns.cc',
'auth/auth.cc',
'auth/authenticated_user.cc',
'auth/authenticator.cc',
'auth/common.cc',
'auth/authorizer.cc',
'auth/default_authorizer.cc',
'auth/resource.cc',
'auth/roles-metadata.cc',
'auth/data_resource.cc',
'auth/password_authenticator.cc',
'auth/permission.cc',
'auth/permissions_cache.cc',
'auth/service.cc',
'auth/standard_role_manager.cc',
'auth/transitional.cc',
'auth/authentication_options.cc',
'auth/role_or_anonymous.cc',
'tracing/tracing.cc',
'tracing/trace_keyspace_helper.cc',
'tracing/trace_state.cc',
'table_helper.cc',
'range_tombstone.cc',
'range_tombstone_list.cc',
'disk-error-handler.cc',
'duration.cc',
'vint-serialization.cc',
'utils/arch/powerpc/crc32-vpmsum/crc32_wrapper.cc',
'querier.cc',
'data/cell.cc',
'multishard_writer.cc',
'db/size_estimates_recorder.cc'
]
+ [Antlr3Grammar('cql3/Cql.g')]
+ [Thrift('interface/cassandra.thrift', 'Cassandra')]
@@ -667,9 +523,7 @@ api = ['api/api.cc',
'api/api-doc/stream_manager.json',
'api/stream_manager.cc',
'api/api-doc/system.json',
'api/system.cc',
'api/config.cc',
'api/api-doc/config.json',
'api/system.cc'
]
idls = ['idl/gossip_digest.idl.hh',
@@ -693,11 +547,9 @@ idls = ['idl/gossip_digest.idl.hh',
'idl/idl_test.idl.hh',
'idl/commitlog.idl.hh',
'idl/tracing.idl.hh',
'idl/consistency_level.idl.hh',
'idl/cache_temperature.idl.hh',
]
scylla_tests_dependencies = scylla_core + idls + [
scylla_tests_dependencies = scylla_core + api + idls + [
'tests/cql_test_env.cc',
'tests/cql_assertions.cc',
'tests/result_set_assertions.cc',
@@ -710,124 +562,66 @@ scylla_tests_seastar_deps = [
]
deps = {
'scylla': idls + ['main.cc', 'release.cc'] + scylla_core + api,
'scylla': idls + ['main.cc'] + scylla_core + api,
}
pure_boost_tests = set([
tests_not_using_seastar_test_framework = set([
'tests/keys_test',
'tests/partitioner_test',
'tests/map_difference_test',
'tests/keys_test',
'tests/compound_test',
'tests/range_tombstone_list_test',
'tests/anchorless_list_test',
'tests/nonwrapping_range_test',
'tests/test-serialization',
'tests/range_test',
'tests/crc_test',
'tests/managed_vector_test',
'tests/dynamic_bitset_test',
'tests/idl_test',
'tests/cartesian_product_test',
'tests/streaming_histogram_test',
'tests/duration_test',
'tests/vint_serialization_test',
'tests/compress_test',
'tests/chunked_vector_test',
'tests/big_decimal_test',
'tests/caching_options_test',
'tests/auth_resource_test',
'tests/enum_set_test',
'tests/cql_auth_syntax_test',
'tests/meta_test',
'tests/imr_test',
'tests/partition_data_test',
'tests/reusable_buffer_test',
])
tests_not_using_seastar_test_framework = set([
'tests/perf/perf_mutation',
'tests/lsa_async_eviction_test',
'tests/lsa_sync_eviction_test',
'tests/row_cache_alloc_stress',
'tests/perf_row_cache_update',
'tests/cartesian_product_test',
'tests/perf/perf_hash',
'tests/perf/perf_cql_parser',
'tests/message',
'tests/perf/perf_simple_query',
'tests/perf/perf_fast_forward',
'tests/perf/perf_cache_eviction',
'tests/row_cache_stress_test',
'tests/memory_footprint',
'tests/test-serialization',
'tests/gossip',
'tests/compound_test',
'tests/range_test',
'tests/crc_test',
'tests/perf/perf_sstable',
]) | pure_boost_tests
'tests/managed_vector_test',
'tests/dynamic_bitset_test',
'tests/idl_test',
'tests/range_tombstone_list_test',
'tests/anchorless_list_test',
'tests/nonwrapping_range_test',
])
for t in tests_not_using_seastar_test_framework:
if not t in scylla_tests:
raise Exception("Test %s not found in scylla_tests" % (t))
for t in scylla_tests:
deps[t] = [t + '.cc']
deps[t] = scylla_tests_dependencies + [t + '.cc']
if t not in tests_not_using_seastar_test_framework:
deps[t] += scylla_tests_dependencies
deps[t] += scylla_tests_seastar_deps
else:
deps[t] += scylla_core + idls + ['tests/cql_test_env.cc']
perf_tests_seastar_deps = [
'seastar/tests/perf/perf_tests.cc'
]
deps['tests/sstable_test'] += ['tests/sstable_datafile_test.cc']
for t in perf_tests:
deps[t] = [t + '.cc'] + scylla_tests_dependencies + perf_tests_seastar_deps
deps['tests/sstable_test'] += ['tests/sstable_datafile_test.cc', 'tests/sstable_utils.cc']
deps['tests/mutation_reader_test'] += ['tests/sstable_utils.cc']
deps['tests/bytes_ostream_test'] = ['tests/bytes_ostream_test.cc', 'utils/managed_bytes.cc', 'utils/logalloc.cc', 'utils/dynamic_bitset.cc']
deps['tests/bytes_ostream_test'] = ['tests/bytes_ostream_test.cc']
deps['tests/input_stream_test'] = ['tests/input_stream_test.cc']
deps['tests/UUID_test'] = ['utils/UUID_gen.cc', 'tests/UUID_test.cc', 'utils/uuid.cc', 'utils/managed_bytes.cc', 'utils/logalloc.cc', 'utils/dynamic_bitset.cc']
deps['tests/UUID_test'] = ['utils/UUID_gen.cc', 'tests/UUID_test.cc']
deps['tests/murmur_hash_test'] = ['bytes.cc', 'utils/murmur_hash.cc', 'tests/murmur_hash_test.cc']
deps['tests/allocation_strategy_test'] = ['tests/allocation_strategy_test.cc', 'utils/logalloc.cc', 'utils/dynamic_bitset.cc']
deps['tests/log_heap_test'] = ['tests/log_heap_test.cc']
deps['tests/anchorless_list_test'] = ['tests/anchorless_list_test.cc']
deps['tests/perf/perf_fast_forward'] += ['release.cc']
deps['tests/meta_test'] = ['tests/meta_test.cc']
deps['tests/imr_test'] = ['tests/imr_test.cc']
deps['tests/reusable_buffer_test'] = ['tests/reusable_buffer_test.cc']
warnings = [
'-Wno-mismatched-tags', # clang-only
'-Wno-maybe-uninitialized', # false positives on gcc 5
'-Wno-tautological-compare',
'-Wno-parentheses-equality',
'-Wno-c++11-narrowing',
'-Wno-c++1z-extensions',
'-Wno-sometimes-uninitialized',
'-Wno-return-stack-address',
'-Wno-missing-braces',
'-Wno-unused-lambda-capture',
'-Wno-misleading-indentation',
'-Wno-overflow',
'-Wno-noexcept-type',
'-Wno-nonnull-compare'
]
warnings = [w
for w in warnings
if flag_supported(flag = w, compiler = args.cxx)]
if warning_supported(warning = w, compiler = args.cxx)]
warnings = ' '.join(warnings + ['-Wno-error=deprecated-declarations'])
optimization_flags = [
'--param inline-unit-growth=300',
]
optimization_flags = [o
for o in optimization_flags
if flag_supported(flag = o, compiler = args.cxx)]
modes['release']['opt'] += ' ' + ' '.join(optimization_flags)
gold_linker_flag = gold_supported(compiler = args.cxx)
warnings = ' '.join(warnings)
dbgflag = debug_flag(args.cxx) if args.debuginfo else ''
tests_link_rule = 'link' if args.tests_debuginfo else 'link_stripped'
@@ -868,22 +662,6 @@ for pkglist in optional_packages:
alternatives = ':'.join(pkglist[1:])
print('Missing optional package {pkglist[0]} (or alteratives {alternatives})'.format(**locals()))
compiler_test_src = '''
#if __GNUC__ < 7
#error "MAJOR"
#elif __GNUC__ == 7
#if __GNUC_MINOR__ < 3
#error "MINOR"
#endif
#endif
int main() { return 0; }
'''
if not try_compile_and_link(compiler=args.cxx, source=compiler_test_src):
print('Wrong GCC version. Scylla needs GCC >= 7.3 to compile.')
sys.exit(1)
if not try_compile(compiler=args.cxx, source='#include <boost/version.hpp>'):
print('Boost not installed. Please install {}.'.format(pkgname("boost-devel")))
sys.exit(1)
@@ -897,9 +675,6 @@ if not try_compile(compiler=args.cxx, source='''\
print('Installed boost version too old. Please update {}.'.format(pkgname("boost-devel")))
sys.exit(1)
has_sanitize_address_use_after_scope = try_compile(compiler=args.cxx, flags=['-fsanitize-address-use-after-scope'], source='int f() {}')
defines = ' '.join(['-D' + d for d in defines])
globals().update(vars(args))
@@ -922,7 +697,7 @@ scylla_release = file.read().strip()
extra_cxxflags["release.cc"] = "-DSCYLLA_VERSION=\"\\\"" + scylla_version + "\\\"\" -DSCYLLA_RELEASE=\"\\\"" + scylla_release + "\\\"\""
seastar_flags = []
seastar_flags = ['--disable-xen']
if args.dpdk:
# fake dependencies on dpdk, so that it is built before anything else
seastar_flags += ['--enable-dpdk']
@@ -930,22 +705,9 @@ elif args.dpdk_target:
seastar_flags += ['--dpdk-target', args.dpdk_target]
if args.staticcxx:
seastar_flags += ['--static-stdc++']
if args.staticboost:
seastar_flags += ['--static-boost']
if args.staticyamlcpp:
seastar_flags += ['--static-yaml-cpp']
if args.gcc6_concepts:
seastar_flags += ['--enable-gcc6-concepts']
if args.alloc_failure_injector:
seastar_flags += ['--enable-alloc-failure-injector']
seastar_cflags = args.user_cflags
if args.target != '':
seastar_cflags += ' -march=' + args.target
seastar_ldflags = args.user_ldflags
seastar_flags += ['--compiler', args.cxx, '--c-compiler', args.cc, '--cflags=%s' % (seastar_cflags), '--ldflags=%s' %(seastar_ldflags),
'--c++-dialect=gnu++1z', '--optflags=%s' % (modes['release']['opt']),
]
seastar_cflags = args.user_cflags + " -march=nehalem"
seastar_flags += ['--compiler', args.cxx, '--cflags=%s' % (seastar_cflags)]
status = subprocess.call([python, './configure.py'] + seastar_flags, cwd = 'seastar')
@@ -976,19 +738,7 @@ for mode in build_modes:
seastar_deps = 'practically_anything_can_change_so_lets_run_it_every_time_and_restat.'
args.user_cflags += " " + pkg_config("--cflags", "jsoncpp")
libs = ' '.join([maybe_static(args.staticyamlcpp, '-lyaml-cpp'), '-llz4', '-lz', '-lsnappy', pkg_config("--libs", "jsoncpp"),
maybe_static(args.staticboost, '-lboost_filesystem'), ' -lcrypt', ' -lcryptopp',
maybe_static(args.staticboost, '-lboost_date_time'),
])
xxhash_dir = 'xxHash'
if not os.path.exists(xxhash_dir) or not os.listdir(xxhash_dir):
raise Exception(xxhash_dir + ' is empty. Run "git submodule update --init".')
if not args.staticboost:
args.user_cflags += ' -DBOOST_TEST_DYN_LINK'
libs = "-lyaml-cpp -llz4 -lz -lsnappy " + pkg_config("--libs", "jsoncpp") + ' -lboost_filesystem' + ' -lcrypt' + ' -lboost_date_time'
for pkg in pkgs:
args.user_cflags += ' ' + pkg_config('--cflags', pkg)
libs += ' ' + pkg_config('--libs', pkg)
@@ -1008,31 +758,18 @@ os.makedirs(outdir, exist_ok = True)
do_sanitize = True
if args.static:
do_sanitize = False
if args.antlr3_exec:
antlr3_exec = args.antlr3_exec
else:
antlr3_exec = "antlr3"
if args.ragel_exec:
ragel_exec = args.ragel_exec
else:
ragel_exec = "ragel"
with open(buildfile, 'w') as f:
f.write(textwrap.dedent('''\
configure_args = {configure_args}
builddir = {outdir}
cxx = {cxx}
cxxflags = {user_cflags} {warnings} {defines}
ldflags = {gold_linker_flag} {user_ldflags}
ldflags = {user_ldflags}
libs = {libs}
pool link_pool
depth = {link_pool_depth}
pool seastar_pool
depth = 1
rule ragel
command = {ragel_exec} -G2 -o $out $in
command = ragel -G2 -o $out $in
description = RAGEL $out
rule gen
command = echo -e $text > $out
@@ -1054,9 +791,9 @@ with open(buildfile, 'w') as f:
for mode in build_modes:
modeval = modes[mode]
f.write(textwrap.dedent('''\
cxxflags_{mode} = {opt} -DXXH_PRIVATE_API -I. -I $builddir/{mode}/gen -I seastar -I seastar/build/{mode}/gen
cxxflags_{mode} = -I. -I $builddir/{mode}/gen -I seastar -I seastar/build/{mode}/gen
rule cxx.{mode}
command = $cxx -MD -MT $out -MF $out.d {seastar_cflags} $cxxflags $cxxflags_{mode} $obj_cxxflags -c -o $out $in
command = $cxx -MMD -MT $out -MF $out.d {seastar_cflags} $cxxflags $cxxflags_{mode} -c -o $out $in
description = CXX $out
depfile = $out.d
rule link.{mode}
@@ -1074,19 +811,9 @@ with open(buildfile, 'w') as f:
command = thrift -gen cpp:cob_style -out $builddir/{mode}/gen $in
description = THRIFT $in
rule antlr3.{mode}
# We replace many local `ExceptionBaseType* ex` variables with a single function-scope one.
# Because we add such a variable to every function, and because `ExceptionBaseType` is not a global
# name, we also add a global typedef to avoid compilation errors.
command = sed -e '/^#if 0/,/^#endif/d' $in > $builddir/{mode}/gen/$in $
&& {antlr3_exec} $builddir/{mode}/gen/$in $
&& sed -i -e 's/^\\( *\)\\(ImplTraits::CommonTokenType\\* [a-zA-Z0-9_]* = NULL;\\)$$/\\1const \\2/' $
-e '1i using ExceptionBaseType = int;' $
-e 's/^{{/{{ ExceptionBaseType\* ex = nullptr;/; $
s/ExceptionBaseType\* ex = new/ex = new/; $
s/exceptions::syntax_exception e/exceptions::syntax_exception\& e/' $
build/{mode}/gen/${{stem}}Parser.cpp
command = sed -e '/^#if 0/,/^#endif/d' $in > $builddir/{mode}/gen/$in && antlr3 $builddir/{mode}/gen/$in && sed -i 's/^\\( *\)\\(ImplTraits::CommonTokenType\\* [a-zA-Z0-9_]* = NULL;\\)$$/\\1const \\2/' build/{mode}/gen/${{stem}}Parser.cpp
description = ANTLR3 $in
''').format(mode = mode, antlr3_exec = antlr3_exec, **modeval))
''').format(mode = mode, **modeval))
f.write('build {mode}: phony {artifacts}\n'.format(mode = mode,
artifacts = str.join(' ', ('$builddir/' + mode + '/' + x for x in build_artifacts))))
compiles = {}
@@ -1102,7 +829,6 @@ with open(buildfile, 'w') as f:
objs = ['$builddir/' + mode + '/' + src.replace('.cc', '.o')
for src in srcs
if src.endswith('.cc')]
objs.append('$builddir/../utils/arch/powerpc/crc32-vpmsum/crc32.S')
has_thrift = False
for dep in deps[binary]:
if isinstance(dep, Thrift):
@@ -1110,15 +836,22 @@ with open(buildfile, 'w') as f:
objs += dep.objects('$builddir/' + mode + '/gen')
if isinstance(dep, Antlr3Grammar):
objs += dep.objects('$builddir/' + mode + '/gen')
if binary.endswith('.a'):
if binary.endswith('.pc'):
vars = modeval.copy()
vars.update(globals())
pc = textwrap.dedent('''\
Name: Seastar
URL: http://seastar-project.org/
Description: Advanced C++ framework for high-performance server applications on modern hardware.
Version: 1.0
Libs: -L{srcdir}/{builddir} -Wl,--whole-archive -lseastar -Wl,--no-whole-archive {dbgflag} -Wl,--no-as-needed {static} {pie} -fvisibility=hidden -pthread {user_ldflags} {libs} {sanitize_libs}
Cflags: -std=gnu++1y {dbgflag} {fpie} -Wall -Werror -fvisibility=hidden -pthread -I{srcdir} -I{srcdir}/{builddir}/gen {user_cflags} {warnings} {defines} {sanitize} {opt}
''').format(builddir = 'build/' + mode, srcdir = os.getcwd(), **vars)
f.write('build $builddir/{}/{}: gen\n text = {}\n'.format(mode, binary, repr(pc)))
elif binary.endswith('.a'):
f.write('build $builddir/{}/{}: ar.{} {}\n'.format(mode, binary, mode, str.join(' ', objs)))
else:
if binary.startswith('tests/'):
local_libs = '$libs'
if binary not in tests_not_using_seastar_test_framework or binary in pure_boost_tests:
local_libs += ' ' + maybe_static(args.staticboost, '-lboost_unit_test_framework')
if has_thrift:
local_libs += ' ' + thrift_libs + ' ' + maybe_static(args.staticboost, '-lboost_system')
# Our code's debugging information is huge, and multiplied
# by many tests yields ridiculous amounts of disk space.
# So we strip the tests by default; The user can very
@@ -1126,15 +859,15 @@ with open(buildfile, 'w') as f:
# to the test name, e.g., "ninja build/release/testname_g"
f.write('build $builddir/{}/{}: {}.{} {} {}\n'.format(mode, binary, tests_link_rule, mode, str.join(' ', objs),
'seastar/build/{}/libseastar.a'.format(mode)))
f.write(' libs = {}\n'.format(local_libs))
if has_thrift:
f.write(' libs = {} -lboost_system $libs\n'.format(thrift_libs))
f.write('build $builddir/{}/{}_g: link.{} {} {}\n'.format(mode, binary, mode, str.join(' ', objs),
'seastar/build/{}/libseastar.a'.format(mode)))
f.write(' libs = {}\n'.format(local_libs))
else:
f.write('build $builddir/{}/{}: link.{} {} {}\n'.format(mode, binary, mode, str.join(' ', objs),
'seastar/build/{}/libseastar.a'.format(mode)))
if has_thrift:
f.write(' libs = {} {} $libs\n'.format(thrift_libs, maybe_static(args.staticboost, '-lboost_system')))
if has_thrift:
f.write(' libs = {} -lboost_system $libs\n'.format(thrift_libs))
for src in srcs:
if src.endswith('.cc'):
obj = '$builddir/' + mode + '/' + src.replace('.cc', '.o')
@@ -1173,7 +906,7 @@ with open(buildfile, 'w') as f:
f.write('build {}: ragel {}\n'.format(hh, src))
for hh in swaggers:
src = swaggers[hh]
f.write('build {}: swagger {} | seastar/json/json2code.py\n'.format(hh,src))
f.write('build {}: swagger {}\n'.format(hh,src))
for hh in serializers:
src = serializers[hh]
f.write('build {}: serializer {} | idl-compiler.py\n'.format(hh,src))
@@ -1190,12 +923,8 @@ with open(buildfile, 'w') as f:
for cc in grammar.sources('$builddir/{}/gen'.format(mode)):
obj = cc.replace('.cpp', '.o')
f.write('build {}: cxx.{} {} || {}\n'.format(obj, mode, cc, ' '.join(serializers)))
if cc.endswith('Parser.cpp') and has_sanitize_address_use_after_scope:
# Parsers end up using huge amounts of stack space and overflowing their stack
f.write(' obj_cxxflags = -fno-sanitize-address-use-after-scope\n')
f.write('build seastar/build/{mode}/libseastar.a seastar/build/{mode}/apps/iotune/iotune seastar/build/{mode}/gen/http/request_parser.hh seastar/build/{mode}/gen/http/http_response_parser.hh: ninja {seastar_deps}\n'
.format(**locals()))
f.write(' pool = seastar_pool\n')
f.write(' subdir = seastar\n')
f.write(' target = build/{mode}/libseastar.a build/{mode}/apps/iotune/iotune build/{mode}/gen/http/request_parser.hh build/{mode}/gen/http/http_response_parser.hh\n'.format(**locals()))
f.write(textwrap.dedent('''\
@@ -1206,7 +935,7 @@ with open(buildfile, 'w') as f:
rule configure
command = {python} configure.py $configure_args
generator = 1
build build.ninja: configure | configure.py seastar/configure.py
build build.ninja: configure | configure.py
rule cscope
command = find -name '*.[chS]' -o -name "*.cc" -o -name "*.hh" | cscope -bq -i-
description = CSCOPE


@@ -22,7 +22,6 @@
#pragma once
#include "mutation_partition_view.hh"
#include "mutation_partition.hh"
#include "schema.hh"
// Mutation partition visitor which applies visited data into
@@ -38,33 +37,17 @@ private:
static bool is_compatible(const column_definition& new_def, const data_type& old_type, column_kind kind) {
return ::is_compatible(new_def.kind, kind) && new_def.type->is_value_compatible_with(*old_type);
}
static void accept_cell(row& dst, column_kind kind, const column_definition& new_def, const data_type& old_type, atomic_cell_view cell) {
if (!is_compatible(new_def, old_type, kind) || cell.timestamp() <= new_def.dropped_at()) {
return;
void accept_cell(row& dst, column_kind kind, const column_definition& new_def, const data_type& old_type, atomic_cell_view cell) {
if (is_compatible(new_def, old_type, kind) && cell.timestamp() > new_def.dropped_at()) {
dst.apply(new_def, atomic_cell_or_collection(cell));
}
auto new_cell = [&] {
if (cell.is_live() && !old_type->is_counter()) {
if (cell.is_live_and_has_ttl()) {
return atomic_cell_or_collection(
atomic_cell::make_live(*new_def.type, cell.timestamp(), cell.value().linearize(), cell.expiry(), cell.ttl())
);
}
return atomic_cell_or_collection(
atomic_cell::make_live(*new_def.type, cell.timestamp(), cell.value().linearize())
);
} else {
return atomic_cell_or_collection(*new_def.type, cell);
}
}();
dst.apply(new_def, std::move(new_cell));
}
static void accept_cell(row& dst, column_kind kind, const column_definition& new_def, const data_type& old_type, collection_mutation_view cell) {
void accept_cell(row& dst, column_kind kind, const column_definition& new_def, const data_type& old_type, collection_mutation_view cell) {
if (!is_compatible(new_def, old_type, kind)) {
return;
}
cell.data.with_linearized([&] (bytes_view cell_bv) {
auto&& ctype = static_pointer_cast<const collection_type_impl>(old_type);
auto old_view = ctype->deserialize_mutation_form(cell_bv);
auto old_view = ctype->deserialize_mutation_form(cell);
collection_type_impl::mutation_view new_view;
if (old_view.tomb.timestamp > new_def.dropped_at()) {
@@ -76,7 +59,6 @@ private:
}
}
dst.apply(new_def, ctype->serialize_mutation_form(std::move(new_view)));
});
}
public:
converting_mutation_partition_applier(
@@ -92,10 +74,6 @@ public:
_p.apply(t);
}
void accept_static_cell(column_id id, atomic_cell cell) {
return accept_static_cell(id, atomic_cell_view(cell));
}
virtual void accept_static_cell(column_id id, atomic_cell_view cell) override {
const column_mapping_entry& col = _visited_column_mapping.static_column_at(id);
const column_definition* def = _p_schema.get_column_definition(col.name());
@@ -116,17 +94,13 @@ public:
_p.apply_row_tombstone(_p_schema, rt);
}
virtual void accept_row(position_in_partition_view key, const row_tombstone& deleted_at, const row_marker& rm, is_dummy dummy, is_continuous continuous) override {
deletable_row& r = _p.clustered_row(_p_schema, key, dummy, continuous);
virtual void accept_row(clustering_key_view key, tombstone deleted_at, const row_marker& rm) override {
deletable_row& r = _p.clustered_row(_p_schema, key);
r.apply(rm);
r.apply(deleted_at);
_current_row = &r;
}
void accept_row_cell(column_id id, atomic_cell cell) {
return accept_row_cell(id, atomic_cell_view(cell));
}
virtual void accept_row_cell(column_id id, atomic_cell_view cell) override {
const column_mapping_entry& col = _visited_column_mapping.regular_column_at(id);
const column_definition* def = _p_schema.get_column_definition(col.name());
@@ -142,14 +116,4 @@ public:
accept_cell(_current_row->cells(), column_kind::regular_column, *def, col.type(), collection);
}
}
// Appends the cell to dst upgrading it to the new schema.
// Cells must have monotonic names.
static void append_cell(row& dst, column_kind kind, const column_definition& new_def, const column_definition& old_def, const atomic_cell_or_collection& cell) {
if (new_def.is_atomic()) {
accept_cell(dst, kind, new_def, old_def.type, cell.as_atomic_cell(old_def));
} else {
accept_cell(dst, kind, new_def, old_def.type, cell.as_collection_mutation());
}
}
};


@@ -1,302 +0,0 @@
/*
* Copyright (C) 2016 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "service/storage_service.hh"
#include "counters.hh"
#include "mutation.hh"
#include "combine.hh"
counter_id counter_id::local()
{
return counter_id(service::get_local_storage_service().get_local_id());
}
bool counter_id::less_compare_1_7_4::operator()(const counter_id& a, const counter_id& b) const
{
if (a._most_significant != b._most_significant) {
return a._most_significant < b._most_significant;
} else {
return a._least_significant < b._least_significant;
}
}
std::ostream& operator<<(std::ostream& os, const counter_id& id) {
return os << id.to_uuid();
}
std::ostream& operator<<(std::ostream& os, counter_shard_view csv) {
return os << "{global_shard id: " << csv.id() << " value: " << csv.value()
<< " clock: " << csv.logical_clock() << "}";
}
std::ostream& operator<<(std::ostream& os, counter_cell_view ccv) {
return os << "{counter_cell timestamp: " << ccv.timestamp() << " shards: {" << ::join(", ", ccv.shards()) << "}}";
}
void counter_cell_builder::do_sort_and_remove_duplicates()
{
boost::range::sort(_shards, [] (auto& a, auto& b) { return a.id() < b.id(); });
std::vector<counter_shard> new_shards;
new_shards.reserve(_shards.size());
for (auto& cs : _shards) {
if (new_shards.empty() || new_shards.back().id() != cs.id()) {
new_shards.emplace_back(cs);
} else {
new_shards.back().apply(cs);
}
}
_shards = std::move(new_shards);
_sorted = true;
}
std::vector<counter_shard> counter_cell_view::shards_compatible_with_1_7_4() const
{
auto sorted_shards = boost::copy_range<std::vector<counter_shard>>(shards());
counter_id::less_compare_1_7_4 cmp;
boost::range::sort(sorted_shards, [&] (auto& a, auto& b) {
return cmp(a.id(), b.id());
});
return sorted_shards;
}
static bool apply_in_place(const column_definition& cdef, atomic_cell_mutable_view dst, atomic_cell_mutable_view src)
{
auto dst_ccmv = counter_cell_mutable_view(dst);
auto src_ccmv = counter_cell_mutable_view(src);
auto dst_shards = dst_ccmv.shards();
auto src_shards = src_ccmv.shards();
auto dst_it = dst_shards.begin();
auto src_it = src_shards.begin();
while (src_it != src_shards.end()) {
while (dst_it != dst_shards.end() && dst_it->id() < src_it->id()) {
++dst_it;
}
if (dst_it == dst_shards.end() || dst_it->id() != src_it->id()) {
// Fast-path failed. Revert and fall back to the slow path.
if (dst_it == dst_shards.end()) {
--dst_it;
}
while (src_it != src_shards.begin()) {
--src_it;
while (dst_it->id() != src_it->id()) {
--dst_it;
}
src_it->swap_value_and_clock(*dst_it);
}
return false;
}
if (dst_it->logical_clock() < src_it->logical_clock()) {
dst_it->swap_value_and_clock(*src_it);
} else {
src_it->set_value_and_clock(*dst_it);
}
++src_it;
}
auto dst_ts = dst_ccmv.timestamp();
auto src_ts = src_ccmv.timestamp();
dst_ccmv.set_timestamp(std::max(dst_ts, src_ts));
src_ccmv.set_timestamp(dst_ts);
return true;
}
void counter_cell_view::apply(const column_definition& cdef, atomic_cell_or_collection& dst, atomic_cell_or_collection& src)
{
auto dst_ac = dst.as_atomic_cell(cdef);
auto src_ac = src.as_atomic_cell(cdef);
if (!dst_ac.is_live() || !src_ac.is_live()) {
if (dst_ac.is_live() || (!src_ac.is_live() && compare_atomic_cell_for_merge(dst_ac, src_ac) < 0)) {
std::swap(dst, src);
}
return;
}
if (dst_ac.is_counter_update() && src_ac.is_counter_update()) {
auto src_v = src_ac.counter_update_value();
auto dst_v = dst_ac.counter_update_value();
dst = atomic_cell::make_live_counter_update(std::max(dst_ac.timestamp(), src_ac.timestamp()),
src_v + dst_v);
return;
}
assert(!dst_ac.is_counter_update());
assert(!src_ac.is_counter_update());
with_linearized(dst_ac, [&] (counter_cell_view dst_ccv) {
with_linearized(src_ac, [&] (counter_cell_view src_ccv) {
if (dst_ccv.shard_count() >= src_ccv.shard_count()) {
auto dst_amc = dst.as_mutable_atomic_cell(cdef);
auto src_amc = src.as_mutable_atomic_cell(cdef);
if (!dst_amc.is_value_fragmented() && !src_amc.is_value_fragmented()) {
if (apply_in_place(cdef, dst_amc, src_amc)) {
return;
}
}
}
auto dst_shards = dst_ccv.shards();
auto src_shards = src_ccv.shards();
counter_cell_builder result;
combine(dst_shards.begin(), dst_shards.end(), src_shards.begin(), src_shards.end(),
result.inserter(), counter_shard_view::less_compare_by_id(), [] (auto& x, auto& y) {
return x.logical_clock() < y.logical_clock() ? y : x;
});
auto cell = result.build(std::max(dst_ac.timestamp(), src_ac.timestamp()));
src = std::exchange(dst, atomic_cell_or_collection(std::move(cell)));
});
});
}
stdx::optional<atomic_cell> counter_cell_view::difference(atomic_cell_view a, atomic_cell_view b)
{
assert(!a.is_counter_update());
assert(!b.is_counter_update());
if (!b.is_live() || !a.is_live()) {
if (b.is_live() || (!a.is_live() && compare_atomic_cell_for_merge(b, a) < 0)) {
return atomic_cell(*counter_type, a);
}
return { };
}
return with_linearized(a, [&] (counter_cell_view a_ccv) {
return with_linearized(b, [&] (counter_cell_view b_ccv) {
auto a_shards = a_ccv.shards();
auto b_shards = b_ccv.shards();
auto a_it = a_shards.begin();
auto a_end = a_shards.end();
auto b_it = b_shards.begin();
auto b_end = b_shards.end();
counter_cell_builder result;
while (a_it != a_end) {
while (b_it != b_end && (*b_it).id() < (*a_it).id()) {
++b_it;
}
if (b_it == b_end || (*a_it).id() != (*b_it).id() || (*a_it).logical_clock() > (*b_it).logical_clock()) {
result.add_shard(counter_shard(*a_it));
}
++a_it;
}
stdx::optional<atomic_cell> diff;
if (!result.empty()) {
diff = result.build(std::max(a.timestamp(), b.timestamp()));
} else if (a.timestamp() > b.timestamp()) {
diff = atomic_cell::make_live(*counter_type, a.timestamp(), bytes_view());
}
return diff;
});
});
}
void transform_counter_updates_to_shards(mutation& m, const mutation* current_state, uint64_t clock_offset) {
// FIXME: allow current_state to be frozen_mutation
auto transform_new_row_to_shards = [&s = *m.schema(), clock_offset] (column_kind kind, auto& cells) {
cells.for_each_cell([&] (column_id id, atomic_cell_or_collection& ac_o_c) {
auto& cdef = s.column_at(kind, id);
auto acv = ac_o_c.as_atomic_cell(cdef);
if (!acv.is_live()) {
return; // continue -- we are in lambda
}
auto delta = acv.counter_update_value();
auto cs = counter_shard(counter_id::local(), delta, clock_offset + 1);
ac_o_c = counter_cell_builder::from_single_shard(acv.timestamp(), cs);
});
};
if (!current_state) {
transform_new_row_to_shards(column_kind::static_column, m.partition().static_row());
for (auto& cr : m.partition().clustered_rows()) {
transform_new_row_to_shards(column_kind::regular_column, cr.row().cells());
}
return;
}
clustering_key::less_compare cmp(*m.schema());
auto transform_row_to_shards = [&s = *m.schema(), clock_offset] (column_kind kind, auto& transformee, auto& state) {
std::deque<std::pair<column_id, counter_shard>> shards;
state.for_each_cell([&] (column_id id, const atomic_cell_or_collection& ac_o_c) {
auto& cdef = s.column_at(kind, id);
auto acv = ac_o_c.as_atomic_cell(cdef);
if (!acv.is_live()) {
return; // continue -- we are in lambda
}
counter_cell_view::with_linearized(acv, [&] (counter_cell_view ccv) {
auto cs = ccv.local_shard();
if (!cs) {
return; // continue
}
shards.emplace_back(std::make_pair(id, counter_shard(*cs)));
});
});
transformee.for_each_cell([&] (column_id id, atomic_cell_or_collection& ac_o_c) {
auto& cdef = s.column_at(kind, id);
auto acv = ac_o_c.as_atomic_cell(cdef);
if (!acv.is_live()) {
return; // continue -- we are in lambda
}
while (!shards.empty() && shards.front().first < id) {
shards.pop_front();
}
auto delta = acv.counter_update_value();
if (shards.empty() || shards.front().first > id) {
auto cs = counter_shard(counter_id::local(), delta, clock_offset + 1);
ac_o_c = counter_cell_builder::from_single_shard(acv.timestamp(), cs);
} else {
auto& cs = shards.front().second;
cs.update(delta, clock_offset + 1);
ac_o_c = counter_cell_builder::from_single_shard(acv.timestamp(), cs);
shards.pop_front();
}
});
};
transform_row_to_shards(column_kind::static_column, m.partition().static_row(), current_state->partition().static_row());
auto& cstate = current_state->partition();
auto it = cstate.clustered_rows().begin();
auto end = cstate.clustered_rows().end();
for (auto& cr : m.partition().clustered_rows()) {
while (it != end && cmp(it->key(), cr.key())) {
++it;
}
if (it == end || cmp(cr.key(), it->key())) {
transform_new_row_to_shards(column_kind::regular_column, cr.row().cells());
continue;
}
transform_row_to_shards(column_kind::regular_column, cr.row().cells(), it->row().cells());
}
}
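
The slow path of `counter_cell_view::apply()` above merges two id-sorted shard lists and, where both sides carry the same counter id, keeps the shard with the higher logical clock (the destination wins ties). The following is a minimal standalone sketch of that merge rule, not the Scylla implementation. The `shard` struct and `merge_shards()` are hypothetical names introduced for illustration, and the id is simplified to a single integer instead of a UUID-backed `counter_id`.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical simplified shard; the real counter_id wraps a 128-bit UUID.
struct shard {
    int64_t id;
    int64_t value;
    int64_t logical_clock;
};

// Merge two shard lists that are each sorted by id.
// For equal ids, keep the shard with the higher logical clock;
// on a clock tie the shard from dst is kept.
std::vector<shard> merge_shards(const std::vector<shard>& dst, const std::vector<shard>& src) {
    std::vector<shard> out;
    out.reserve(dst.size() + src.size());
    auto d = dst.begin();
    auto s = src.begin();
    while (d != dst.end() && s != src.end()) {
        if (d->id < s->id) {
            out.push_back(*d++);
        } else if (s->id < d->id) {
            out.push_back(*s++);
        } else {
            out.push_back(d->logical_clock < s->logical_clock ? *s : *d);
            ++d;
            ++s;
        }
    }
    // Copy whatever remains on either side.
    out.insert(out.end(), d, dst.end());
    out.insert(out.end(), s, src.end());
    return out;
}
```

Because each shard is replaced as a whole rather than summed, applying the same source twice yields the same result as applying it once.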


@@ -1,469 +0,0 @@
/*
* Copyright (C) 2016 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <boost/range/algorithm/find_if.hpp>
#include "atomic_cell_or_collection.hh"
#include "types.hh"
#include "stdx.hh"
class mutation;
class counter_id {
int64_t _least_significant;
int64_t _most_significant;
public:
static_assert(std::is_same<decltype(std::declval<utils::UUID>().get_least_significant_bits()), int64_t>::value
&& std::is_same<decltype(std::declval<utils::UUID>().get_most_significant_bits()), int64_t>::value,
"utils::UUID is expected to work with two signed 64-bit integers");
counter_id() = default;
explicit counter_id(utils::UUID uuid) noexcept
: _least_significant(uuid.get_least_significant_bits())
, _most_significant(uuid.get_most_significant_bits())
{ }
utils::UUID to_uuid() const {
return utils::UUID(_most_significant, _least_significant);
}
bool operator<(const counter_id& other) const {
return to_uuid() < other.to_uuid();
}
bool operator>(const counter_id& other) const {
return other.to_uuid() < to_uuid();
}
bool operator==(const counter_id& other) const {
return to_uuid() == other.to_uuid();
}
bool operator!=(const counter_id& other) const {
return !(*this == other);
}
public:
// (Wrong) Counter ID ordering used by Scylla 1.7.4 and earlier.
struct less_compare_1_7_4 {
bool operator()(const counter_id& a, const counter_id& b) const;
};
public:
static counter_id local();
// For tests.
static counter_id generate_random() {
return counter_id(utils::make_random_uuid());
}
};
static_assert(std::is_pod<counter_id>::value, "counter_id should be a POD type");
std::ostream& operator<<(std::ostream& os, const counter_id& id);
template<mutable_view is_mutable>
class basic_counter_shard_view {
enum class offset : unsigned {
id = 0u,
value = unsigned(id) + sizeof(counter_id),
logical_clock = unsigned(value) + sizeof(int64_t),
total_size = unsigned(logical_clock) + sizeof(int64_t),
};
private:
using pointer_type = std::conditional_t<is_mutable == mutable_view::no, const signed char*, signed char*>;
pointer_type _base;
private:
template<typename T>
T read(offset off) const {
T value;
std::copy_n(_base + static_cast<unsigned>(off), sizeof(T), reinterpret_cast<signed char*>(&value));
return value;
}
public:
static constexpr auto size = size_t(offset::total_size);
public:
basic_counter_shard_view() = default;
explicit basic_counter_shard_view(pointer_type ptr) noexcept
: _base(ptr) { }
counter_id id() const { return read<counter_id>(offset::id); }
int64_t value() const { return read<int64_t>(offset::value); }
int64_t logical_clock() const { return read<int64_t>(offset::logical_clock); }
void swap_value_and_clock(basic_counter_shard_view& other) noexcept {
static constexpr size_t off = size_t(offset::value);
static constexpr size_t size = size_t(offset::total_size) - off;
signed char tmp[size];
std::copy_n(_base + off, size, tmp);
std::copy_n(other._base + off, size, _base + off);
std::copy_n(tmp, size, other._base + off);
}
void set_value_and_clock(const basic_counter_shard_view& other) noexcept {
static constexpr size_t off = size_t(offset::value);
static constexpr size_t size = size_t(offset::total_size) - off;
std::copy_n(other._base + off, size, _base + off);
}
bool operator==(const basic_counter_shard_view& other) const {
return id() == other.id() && value() == other.value()
&& logical_clock() == other.logical_clock();
}
bool operator!=(const basic_counter_shard_view& other) const {
return !(*this == other);
}
struct less_compare_by_id {
bool operator()(const basic_counter_shard_view& x, const basic_counter_shard_view& y) const {
return x.id() < y.id();
}
};
};
using counter_shard_view = basic_counter_shard_view<mutable_view::no>;
std::ostream& operator<<(std::ostream& os, counter_shard_view csv);
class counter_shard {
counter_id _id;
int64_t _value;
int64_t _logical_clock;
private:
template<typename T>
static void write(const T& value, bytes::iterator& out) {
out = std::copy_n(reinterpret_cast<const signed char*>(&value), sizeof(T), out);
}
private:
// Shared logic for applying counter_shards and counter_shard_views.
// T is either counter_shard or basic_counter_shard_view<U>.
template<typename T>
GCC6_CONCEPT(requires requires(T shard) {
{ shard.value() } -> int64_t;
{ shard.logical_clock() } -> int64_t;
})
counter_shard& do_apply(T&& other) noexcept {
auto other_clock = other.logical_clock();
if (_logical_clock < other_clock) {
_logical_clock = other_clock;
_value = other.value();
}
return *this;
}
public:
counter_shard(counter_id id, int64_t value, int64_t logical_clock) noexcept
: _id(id)
, _value(value)
, _logical_clock(logical_clock)
{ }
explicit counter_shard(counter_shard_view csv) noexcept
: _id(csv.id())
, _value(csv.value())
, _logical_clock(csv.logical_clock())
{ }
counter_id id() const { return _id; }
int64_t value() const { return _value; }
int64_t logical_clock() const { return _logical_clock; }
counter_shard& update(int64_t value_delta, int64_t clock_increment) noexcept {
_value += value_delta;
_logical_clock += clock_increment;
return *this;
}
counter_shard& apply(counter_shard_view other) noexcept {
return do_apply(other);
}
counter_shard& apply(const counter_shard& other) noexcept {
return do_apply(other);
}
static constexpr size_t serialized_size() {
return counter_shard_view::size;
}
void serialize(bytes::iterator& out) const {
write(_id, out);
write(_value, out);
write(_logical_clock, out);
}
};
class counter_cell_builder {
std::vector<counter_shard> _shards;
bool _sorted = true;
private:
void do_sort_and_remove_duplicates();
public:
counter_cell_builder() = default;
counter_cell_builder(size_t shard_count) {
_shards.reserve(shard_count);
}
void add_shard(const counter_shard& cs) {
_shards.emplace_back(cs);
}
void add_maybe_unsorted_shard(const counter_shard& cs) {
add_shard(cs);
if (_sorted && _shards.size() > 1) {
auto current = _shards.rbegin();
auto previous = std::next(current);
_sorted = current->id() > previous->id();
}
}
void sort_and_remove_duplicates() {
if (!_sorted) {
do_sort_and_remove_duplicates();
}
}
size_t serialized_size() const {
return _shards.size() * counter_shard::serialized_size();
}
void serialize(bytes::iterator& out) const {
for (auto&& cs : _shards) {
cs.serialize(out);
}
}
bool empty() const {
return _shards.empty();
}
atomic_cell build(api::timestamp_type timestamp) const {
// If we can assume that the counter shards never cross fragment boundaries,
// the serialisation code gets much simpler.
static_assert(data::cell::maximum_external_chunk_length % counter_shard::serialized_size() == 0);
auto ac = atomic_cell::make_live_uninitialized(*counter_type, timestamp, serialized_size());
auto dst_it = ac.value().begin();
auto dst_current = *dst_it++;
for (auto&& cs : _shards) {
if (dst_current.empty()) {
dst_current = *dst_it++;
}
assert(!dst_current.empty());
auto value_dst = dst_current.data();
cs.serialize(value_dst);
dst_current.remove_prefix(counter_shard::serialized_size());
}
return ac;
}
static atomic_cell from_single_shard(api::timestamp_type timestamp, const counter_shard& cs) {
// We don't really need to bother with fragmentation here.
static_assert(data::cell::maximum_external_chunk_length >= counter_shard::serialized_size());
auto ac = atomic_cell::make_live_uninitialized(*counter_type, timestamp, counter_shard::serialized_size());
auto dst = ac.value().first_fragment().begin();
cs.serialize(dst);
return ac;
}
class inserter_iterator : public std::iterator<std::output_iterator_tag, counter_shard> {
counter_cell_builder* _builder;
public:
explicit inserter_iterator(counter_cell_builder& b) : _builder(&b) { }
inserter_iterator& operator=(const counter_shard& cs) {
_builder->add_shard(cs);
return *this;
}
inserter_iterator& operator=(const counter_shard_view& csv) {
return operator=(counter_shard(csv));
}
inserter_iterator& operator++() { return *this; }
inserter_iterator& operator++(int) { return *this; }
inserter_iterator& operator*() { return *this; }
};
inserter_iterator inserter() {
return inserter_iterator(*this);
}
};
// <counter_id> := <int64_t><int64_t>
// <shard> := <counter_id><int64_t:value><int64_t:logical_clock>
// <counter_cell> := <shard>*
template<mutable_view is_mutable>
class basic_counter_cell_view {
protected:
using linearized_value_view = std::conditional_t<is_mutable == mutable_view::no,
bytes_view, bytes_mutable_view>;
using pointer_type = typename linearized_value_view::pointer;
basic_atomic_cell_view<is_mutable> _cell;
linearized_value_view _value;
private:
class shard_iterator : public std::iterator<std::input_iterator_tag, basic_counter_shard_view<is_mutable>> {
pointer_type _current;
basic_counter_shard_view<is_mutable> _current_view;
public:
shard_iterator() = default;
shard_iterator(pointer_type ptr) noexcept
: _current(ptr), _current_view(ptr) { }
basic_counter_shard_view<is_mutable>& operator*() noexcept {
return _current_view;
}
basic_counter_shard_view<is_mutable>* operator->() noexcept {
return &_current_view;
}
shard_iterator& operator++() noexcept {
_current += counter_shard_view::size;
_current_view = basic_counter_shard_view<is_mutable>(_current);
return *this;
}
shard_iterator operator++(int) noexcept {
auto it = *this;
operator++();
return it;
}
shard_iterator& operator--() noexcept {
_current -= counter_shard_view::size;
_current_view = basic_counter_shard_view<is_mutable>(_current);
return *this;
}
shard_iterator operator--(int) noexcept {
auto it = *this;
operator--();
return it;
}
bool operator==(const shard_iterator& other) const noexcept {
return _current == other._current;
}
bool operator!=(const shard_iterator& other) const noexcept {
return !(*this == other);
}
};
public:
boost::iterator_range<shard_iterator> shards() const {
auto begin = shard_iterator(_value.data());
auto end = shard_iterator(_value.data() + _value.size());
return boost::make_iterator_range(begin, end);
}
size_t shard_count() const {
return _cell.value().size_bytes() / counter_shard_view::size;
}
protected:
// ac must be a live counter cell
explicit basic_counter_cell_view(basic_atomic_cell_view<is_mutable> ac, linearized_value_view vv) noexcept
: _cell(ac), _value(vv)
{
assert(_cell.is_live());
assert(!_cell.is_counter_update());
}
public:
api::timestamp_type timestamp() const { return _cell.timestamp(); }
static data_type total_value_type() { return long_type; }
int64_t total_value() const {
return boost::accumulate(shards(), int64_t(0), [] (int64_t v, counter_shard_view cs) {
return v + cs.value();
});
}
stdx::optional<counter_shard_view> get_shard(const counter_id& id) const {
auto it = boost::range::find_if(shards(), [&id] (counter_shard_view csv) {
return csv.id() == id;
});
if (it == shards().end()) {
return { };
}
return *it;
}
stdx::optional<counter_shard_view> local_shard() const {
// TODO: consider caching local shard position
return get_shard(counter_id::local());
}
bool operator==(const basic_counter_cell_view& other) const {
return timestamp() == other.timestamp() && boost::equal(shards(), other.shards());
}
};
struct counter_cell_view : basic_counter_cell_view<mutable_view::no> {
using basic_counter_cell_view::basic_counter_cell_view;
template<typename Function>
static decltype(auto) with_linearized(basic_atomic_cell_view<mutable_view::no> ac, Function&& fn) {
return ac.value().with_linearized([&] (bytes_view value_view) {
counter_cell_view ccv(ac, value_view);
return fn(ccv);
});
}
// Returns counter shards in an order that is compatible with Scylla 1.7.4.
std::vector<counter_shard> shards_compatible_with_1_7_4() const;
// Reversibly applies two counter cells; at least one of them must be live.
static void apply(const column_definition& cdef, atomic_cell_or_collection& dst, atomic_cell_or_collection& src);
// Computes a counter cell containing the minimal amount of data which, when
// applied to 'b', returns the same cell as 'a' and 'b' applied together.
static stdx::optional<atomic_cell> difference(atomic_cell_view a, atomic_cell_view b);
friend std::ostream& operator<<(std::ostream& os, counter_cell_view ccv);
};
struct counter_cell_mutable_view : basic_counter_cell_view<mutable_view::yes> {
using basic_counter_cell_view::basic_counter_cell_view;
explicit counter_cell_mutable_view(atomic_cell_mutable_view ac) noexcept
: basic_counter_cell_view<mutable_view::yes>(ac, ac.value().first_fragment())
{
assert(!ac.value().is_fragmented());
}
void set_timestamp(api::timestamp_type ts) { _cell.set_timestamp(ts); }
};
// Transforms mutation dst from counter updates to counter shards using state
// stored in current_state.
// If current_state is present, it must use the same schema as dst.
void transform_counter_updates_to_shards(mutation& dst, const mutation* current_state, uint64_t clock_offset);
template<>
struct appending_hash<counter_shard_view> {
template<typename Hasher>
void operator()(Hasher& h, const counter_shard_view& cshard) const {
::feed_hash(h, cshard.id().to_uuid());
::feed_hash(h, cshard.value());
::feed_hash(h, cshard.logical_clock());
}
};
template<>
struct appending_hash<counter_cell_view> {
template<typename Hasher>
void operator()(Hasher& h, const counter_cell_view& cell) const {
::feed_hash(h, true); // is_live
::feed_hash(h, cell.timestamp());
for (auto&& csv : cell.shards()) {
::feed_hash(h, csv);
}
}
};
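
The header above documents the serialized layout as `<counter_cell> := <shard>*`, with each shard being a `counter_id` (two `int64_t` fields) followed by an `int64_t` value and an `int64_t` logical clock, 32 bytes in total. The following is a small standalone sketch of that layout, assuming native byte order and the least-significant id half stored first, as the member order of `counter_id` suggests. `shard_fields`, `append_shard()`, `read_value()` and `total_value()` are hypothetical helpers written for illustration and are not part of Scylla.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical plain-struct view of one shard's fields.
struct shard_fields {
    int64_t id_least;       // first half of counter_id
    int64_t id_most;        // second half of counter_id
    int64_t value;
    int64_t logical_clock;
};

// Four 64-bit fields per shard: 32 bytes.
constexpr std::size_t shard_size = 4 * sizeof(int64_t);

// Append one shard to the end of a serialized cell buffer.
void append_shard(std::vector<unsigned char>& out, const shard_fields& s) {
    const int64_t fields[] = {s.id_least, s.id_most, s.value, s.logical_clock};
    const std::size_t pos = out.size();
    out.resize(pos + shard_size);
    std::memcpy(out.data() + pos, fields, shard_size);
}

// Read the value field of the shard at the given index.
// The value sits after the two id halves, at byte offset 16 within the shard.
int64_t read_value(const std::vector<unsigned char>& cell, std::size_t index) {
    int64_t v;
    std::memcpy(&v, cell.data() + index * shard_size + 2 * sizeof(int64_t), sizeof(v));
    return v;
}

// Sum the values of all shards, mirroring what total_value() computes.
int64_t total_value(const std::vector<unsigned char>& cell) {
    int64_t sum = 0;
    for (std::size_t i = 0; i < cell.size() / shard_size; ++i) {
        sum += read_value(cell, i);
    }
    return sum;
}
```

Using `std::memcpy` for every field access avoids alignment assumptions about the buffer, which is the same reason the header reads fields with `std::copy_n` into a local variable.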

Some files were not shown because too many files have changed in this diff.