Compare commits

197 Commits

Author SHA1 Message Date
Avi Kivity
19907fad15 sstables: fix use-after-free in read_simple()
`r` is moved-from, and later captured in a different lambda. The compiler may
choose to move and perform the other capture later, resulting in a use-after-free.

Fix by copying `r` instead of moving it.

Discovered by sstable_test in debug mode.
Message-Id: <20170702082546.20570-1-avi@scylladb.com>

(cherry picked from commit 07b8adce0e)
2018-02-01 14:28:59 +01:00
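A minimal sketch of the hazard described in the commit above, not the actual read_simple() code. The `reader` type and `run()` function are hypothetical stand-ins. It assumes an API that receives the owning pointer and a continuation in the same call, so the order in which the two arguments are initialized is unspecified.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>

// Hypothetical stand-in for the object called `r` in the commit message.
struct reader {
    std::string component;
};

// Hypothetical API that takes ownership of a reader and a continuation in
// the same call; the two arguments are initialized in an unspecified order.
std::string run(std::shared_ptr<reader> owned,
                const std::function<std::string()>& continuation) {
    assert(owned != nullptr);
    return owned->component + ":" + continuation();
}

std::string read_simple_sketch() {
    auto r = std::make_shared<reader>(reader{"Statistics.db"});
    // Hazardous shape (deliberately left as a comment):
    //   run(std::move(r), [r] { return r->component; });
    // If `owned` is move-constructed before the lambda copies `r`, the
    // lambda captures an empty (moved-from) pointer and dereferences it.
    // Copying `r` in both places removes the dependency on evaluation order.
    return run(r, [r] { return r->component; });
}
```

The copy costs one extra reference-count increment, which is usually negligible next to the I/O the reader performs.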
Avi Kivity
97f781c4d8 Update seastar submodule
* seastar e23b9b8...a66e0c5 (3):
  > posix.hh: add missing include
  > tls_test: Fix echo test not setting server trust store
  > tls: Actually verify client certificate if requested

Fixes #3072
2018-01-29 15:26:24 +02:00
Avi Kivity
88e69701bd Merge "Fix memory leak on zone reclaim" from Tomek
"_free_segments_in_zones is not adjusted by
segment_pool::reclaim_segments() for empty zones on reclaim under some
conditions. For instance when some zone becomes empty due to regular
free() and then reclaiming is called from the std allocator, and it is
satisfied from a zone after the one which is empty. This would cause
free memory in such a zone to appear leaked due to a corrupted
free segment count, which may cause a later reclaim to fail. This
could result in bad_allocs.

The fix is to always collect such zones.

Fixes #3129
Refs #3119
Refs #3120"

* 'tgrabiec/fix-free_segments_in_zones-leak' of github.com:scylladb/seastar-dev:
  tests: lsa: Test _free_segments_in_zones is kept correct on reclaim
  lsa: Expose max_zone_segments for tests
  lsa: Expose tracker::non_lsa_used_space()
  lsa: Fix memory leak on zone reclaim

(cherry picked from commit 4ad212dc01)
2018-01-16 15:55:09 +02:00
Takuya ASADA
9007b38002 dist/common/systemd: specify correct repo file path for housekeeping service on Ubuntu/Debian
Currently scylla-housekeeping-daily.service/-restart.service hardcode
"--repo-files '/etc/yum.repos.d/scylla*.repo'" to specify the CentOS .repo file,
but we use the same .service files for Ubuntu/Debian.
This doesn't work correctly; we need to specify a .list file for Debian variants.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1513385159-15736-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit c2e87f4677)
2017-12-16 22:05:38 +02:00
Glauber Costa
f2e0affcc5 database: delete created SSTables if streaming writes fail
We have had an issue recently where failed SSTable writes left the
generated SSTables dangling in a potentially invalid state. If the write
had, for instance, started and generated tmp TOCs but not finished,
those files would be left for dead.

We had fixed this in commit b7e1575ad4,
but streaming memtables still have the same issue.

Note that we can't fix this in the common function
write_memtable_to_sstable because different flushers have different
retry policies.

Fixes #3062

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20171213011741.8156-1-glauber@scylladb.com>
(cherry picked from commit 1aabbc75ab)
2017-12-13 10:26:07 +02:00
Avi Kivity
6fce847000 Update seastar submodule
* seastar f27b240...e23b9b8 (1):
  > rpc: make sure that _write_buf stream is always properly closed

Fixes #3018.
2017-11-26 10:40:23 +02:00
Avi Kivity
f6f91a49cb Update seastar submodule
* seastar 121f468...f27b240 (1):
  > scripts/posix_net_conf.sh: supress unwanted output from get_irqs_one

Fixes #2808.
2017-10-08 16:40:00 +03:00
Tomasz Grabiec
266a45ad1e Update seastar submodule
* seastar b3ef898...121f468 (1):
  > configure: disable exception scalability hack on debug build
2017-09-25 10:13:59 +02:00
Tomasz Grabiec
7d88026f22 tests: row_cache_test: Fix test failure
Broken after 0ac2c388b6, which assigns
empty reader to _delegate on hitting wide partition limit. The test
assumed that the original _delegate will be invoked when the
single-partition reader is asked for the next partition, which is no
longer the case.

Message-Id: <20170912172739.6851-1-tgrabiec@scylladb.com>
2017-09-12 20:33:10 +03:00
Duarte Nunes
760af5635d tests: Remove sstable_assertions
The test using these assertions has been removed, and the
infrastructure required for them to work is absent from 1.7.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170912113714.24223-1-duarte@scylladb.com>
2017-09-12 14:41:44 +03:00
Duarte Nunes
8c18bfa8d6 sstable_mutation_test: Remove promoted index monotonicity test
The infrastructure this test relies on is not present in 1.7, so
just remove the test as backporting the required changes would be a
risky, non-trivial effort.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170912081304.10116-1-duarte@scylladb.com>
2017-09-12 11:18:05 +03:00
Avi Kivity
04e3785f77 Update seastar submodule
* seastar 688bb6f...b3ef898 (1):
  > build: fix bad merge artifacts
2017-09-11 16:27:52 +03:00
Avi Kivity
e00e6ad1b6 Update seastar submodule
* seastar 949b710...688bb6f (2):
  > build: export full cflags in pkgconfig file
  > build: disable -Wattributes when gcc -fvisibility=hidden bug strikes

Fixes build with gcc 6.4/gcc 7.
2017-09-11 15:41:52 +03:00
Pekka Enberg
5653ea9f8d release: prepare for 1.7.5 2017-09-11 14:03:12 +03:00
Avi Kivity
4dbd1b77cd Merge "Fix Scylla upgrades when counters are used" from Paweł
"A new feature flag CORRECT_COUNTER_ORDER is introduced to allow seamless
upgrade from 1.7.4 to later Scylla versions. If that feature is not
available Scylla still writes sstables and sends on-wire counters using
the old ordering so that it can be correctly understood by 1.7.4. Once
the flag becomes available, Scylla switches to the correct order.

Fixes #2752."

* tag 'fix-upgrade-with-counters-1.7/v1' of https://github.com/pdziepak/scylla:
  tests/counter: verify counter_id ordering
  counter: check that utils::UUID uses int64_t
  mutation_partition_serializer: use old counter ordering if necessary
  mutation_partition_view: do not expect counter shards to be sorted
  sstables: write counter shards in the order expected by the cluster
  tests/sstables: add storage_service_for_tests to counter write test
  tests/sstables: add test for reading wrong-order counter cells
  sstables: do not expect counter shards to be sorted
  storage_service: introduce CORRECT_COUNTER_ORDER feature
  tests/counter: test 1.7.4 compatible shard ordering
  counters: add helper for retrieving shards in 1.7.4 order
  tests/counter: add tests for 1.7.4 counter shard order
  counters: add counter id comparator compatible with Scylla 1.7.4
  tests/counter: verify order of counter shards
  tests/counter: add test for sorting and deduplicating shards
  counters: add function for sorting and deduplicating counter cells
  counters: add more comparison operators
2017-09-11 13:27:01 +03:00
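The feature-gated ordering described in the merge above could be sketched roughly as follows. This is a simplified illustration, not the code from the series. `counter_shard` and `shards_in_write_order` are hypothetical names, and the shard id is reduced to a single 64-bit integer (the real id is a UUID) so that the legacy signed order and the corrected unsigned order are easy to tell apart.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical, simplified shard: the real counter id is a UUID.
struct counter_shard {
    int64_t id;
    int64_t value;
};

// Sorts shards for writing. While the cluster-wide feature is off, keep the
// legacy (signed) order that 1.7.4 nodes expect; use the corrected
// (unsigned) order only once the feature is enabled.
std::vector<counter_shard> shards_in_write_order(std::vector<counter_shard> shards,
                                                 bool correct_counter_order_enabled) {
    std::sort(shards.begin(), shards.end(),
              [=](const counter_shard& a, const counter_shard& b) {
                  if (correct_counter_order_enabled) {
                      return static_cast<uint64_t>(a.id) < static_cast<uint64_t>(b.id);
                  }
                  return a.id < b.id;
              });
    return shards;
}
```

Readers, by contrast, are changed by the series to not expect sorted shards at all, so only the write path needs the flag.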
Paweł Dziepak
0e61212c20 tests/counter: verify counter_id ordering 2017-09-05 13:49:01 +01:00
Paweł Dziepak
6f4bc82b6e counter: check that utils::UUID uses int64_t 2017-09-05 13:49:01 +01:00
Paweł Dziepak
c1a30d3f60 mutation_partition_serializer: use old counter ordering if necessary
Until the cluster is fully upgraded from a version that uses the
incorrect counter shard ordering it is essential to keep using it lest
the old nodes corrupt the data upon receiving mutations with a counter
shard ordering they do not expect.
2017-09-05 13:49:01 +01:00
Paweł Dziepak
cbad33033f mutation_partition_view: do not expect counter shards to be sorted 2017-09-05 13:49:01 +01:00
Paweł Dziepak
1f31be9ba3 sstables: write counter shards in the order expected by the cluster
If the feature signaling that we have switched to the correct ordering
of counter shards is not enabled it means that the user still can do a
rollback to a version that expects wrong ordering. In order to avoid any
disasters when that happens write sstables using the 1.7.4 order until
we know for sure that it is no longer needed.
2017-09-05 13:49:01 +01:00
Paweł Dziepak
7e89dc3bbf tests/sstables: add storage_service_for_tests to counter write test
Writing counters to an sstable is going to require cluster feature
information, which requires accessing some singletons.
2017-09-05 13:49:01 +01:00
Paweł Dziepak
2cdcaeba6e tests/sstables: add test for reading wrong-order counter cells 2017-09-05 13:49:01 +01:00
Paweł Dziepak
55cb0cafa8 sstables: do not expect counter shards to be sorted 2017-09-05 13:49:01 +01:00
Paweł Dziepak
660572e85c storage_service: introduce CORRECT_COUNTER_ORDER feature
Scylla 1.7.4 used incorrect ordering of counter shards. In order to fix
this problem a new feature is introduced that will be used to determine
when nodes with that bug fixed can start sending counter shard in the
correct order.
2017-09-05 13:49:01 +01:00
Paweł Dziepak
b86da0c479 tests/counter: test 1.7.4 compatible shard ordering 2017-09-05 13:49:01 +01:00
Paweł Dziepak
b1b8599b1a counters: add helper for retrieving shards in 1.7.4 order 2017-09-05 13:49:00 +01:00
Paweł Dziepak
89c037dfc8 tests/counter: add tests for 1.7.4 counter shard order 2017-09-05 13:49:00 +01:00
Paweł Dziepak
25eec66935 counters: add counter id comparator compatible with Scylla 1.7.4 2017-09-05 13:49:00 +01:00
Paweł Dziepak
b5787ca640 tests/counter: verify order of counter shards 2017-09-05 13:49:00 +01:00
Paweł Dziepak
838dbd98ac tests/counter: add test for sorting and deduplicating shards 2017-09-05 13:49:00 +01:00
Paweł Dziepak
022c2ff53a counters: add function for sorting and deduplicating counter cells
Due to a bug in an implementation of UUID less compare some Scylla
versions sort counter shards in an incorrect order. Moreover, when
dealing with imported correct data the inconsistencies in ordering
caused some counter shards to become duplicated.
2017-09-05 13:49:00 +01:00
Paweł Dziepak
b7c27d73d8 counters: add more comparison operators 2017-09-05 13:49:00 +01:00
Vlad Zolotarov
bdc0ca7064 service::storage_service: initialize auth and tracing after we joined the ring
Initialize the system_auth and system_traces keyspaces and their tables after
the node joins the token ring, because as part of system_auth initialization
SELECT and possibly INSERT CQL statements are going to be issued.

This patch effectively reverts the d3b8b67 patch and brings the initialization order
to how it was before that patch.

Fixes #2273

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <1500417217-16677-1-git-send-email-vladz@scylladb.com>
(cherry picked from commit e98adb13d5)
2017-08-30 09:33:33 +02:00
Calle Wilund
34260ce471 utils::UUID: operator< should behave as comparison of hex strings/bytes
I.e., it needs to be an unsigned comparison.
Message-Id: <1487683665-23426-1-git-send-email-calle@scylladb.com>

(cherry picked from commit 0d87f3dd7d)
2017-08-24 14:18:55 +01:00
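To see why the signedness matters, here is a small sketch assuming a UUID stored as two signed 64-bit halves (consistent with the "utils::UUID uses int64_t" check elsewhere in this log). The struct and function names are illustrative and not taken from utils::UUID.

```cpp
#include <cstdint>
#include <tuple>

// Hypothetical UUID layout: two signed 64-bit halves.
struct uuid {
    int64_t most_sig_bits;
    int64_t least_sig_bits;
};

// Signed comparison: a half with the top bit set sorts *before* small
// positive values, which disagrees with the hex-string order.
bool less_signed(const uuid& a, const uuid& b) {
    return std::tie(a.most_sig_bits, a.least_sig_bits)
         < std::tie(b.most_sig_bits, b.least_sig_bits);
}

// Unsigned comparison: matches byte-wise (hex string) ordering.
bool less_unsigned(const uuid& a, const uuid& b) {
    auto u = [](int64_t v) { return static_cast<uint64_t>(v); };
    return std::make_tuple(u(a.most_sig_bits), u(a.least_sig_bits))
         < std::make_tuple(u(b.most_sig_bits), u(b.least_sig_bits));
}
```

For example, a UUID whose high half is all `f` hex digits (stored as -1) compares below `0000...0001` under the signed version but above it under the unsigned one.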
Avi Kivity
cffe57bcc7 Merge "repair: Do not allow repair until node is in NORMAL status" from Asias
Fixes #2723.

* tag 'asias/repair_issue_2723_v1' of github.com:cloudius-systems/seastar-dev:
  repair: Do not allow repair until node is in NORMAL status
  gossip: Add is_normal helper

(cherry picked from commit 2f41ed8493)
2017-08-23 09:45:54 +03:00
Paweł Dziepak
adb9ce7f38 lsa: avoid unnecessary segment migrations during reclaim
segment_zone::migrate_all_segments() was trying to migrate all segments
inside a zone to the other one hoping that the original one could be
completely freed. This was an attempt to optimise for throughput.

However, this may unnecessarily hurt latency if the zone is large, but
only a few segments are required to satisfy the reclaimer's demands.
Message-Id: <20170410171912.26821-1-pdziepak@scylladb.com>

(cherry picked from commit 0318dccafd)
2017-08-22 09:29:05 +02:00
Tomasz Grabiec
5f1fd7a0b1 schema_registry: Ensure schema_ptr is always synced on the other core
global_schema_ptr ensures that schema object is replicated to other
cores on access. It was replicating the "synced" state as well, but
only when the shard didn't know about the schema. It could happen that
the other shard has the entry, but it's not yet synced, in which case
we would fail to replicate the "synced" state. This will result in
exception from mutate(), which rejects attempts to mutate using an
unsynced schema.

The fix is to always replicate the "synced" state. If the entry is
syncing, we will preemptively mark it as synced earlier. The syncing
code is already prepared for this.

Refs #2617.
Message-Id: <1500555224-15825-1-git-send-email-tgrabiec@scylladb.com>

(cherry picked from commit 65c64614aa)
2017-08-17 17:15:12 +02:00
Avi Kivity
d1f06633e0 Update seastar submodule
* seastar a4d924e...949b710 (1):
  > fstream: do not ignore unresolved future

Fixes #2697.
2017-08-16 15:12:45 +03:00
Avi Kivity
b54ea3f6cf dist: use correct repository for third-party RPMs 2017-08-16 11:24:42 +03:00
Avi Kivity
63fd65414a Update seastar submodule
* seastar e5825b5...a4d924e (1):
  > Merge "Fix crash in rpc due to access to already destroyed server socket" from Gleb

Fixes #2690
2017-08-14 16:25:03 +03:00
Avi Kivity
9790c2d229 Update seastar submodule
* seastar 8d9fd92...e5825b5 (1):
  > tls: Only recurse once in shutdown code

Fixes #2691
2017-08-14 15:12:01 +03:00
Raphael S. Carvalho
7728a8dec5 sstables: close index file when sstable writer fails
The index file's output stream uses write-behind, but it is not closed
when an sstable write fails, and that may lead to a crash.
This happened before for the data file (where it is obviously easier to
reproduce) and was fixed by 0977f4fdf8.

Fixes #2673.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170807171146.10243-1-raphaelsc@scylladb.com>
(cherry picked from commit dddbd34b52)
2017-08-08 09:59:10 +03:00
Duarte Nunes
1fd4a3ed34 tests/sstable_mutation_test: Don't use moved-from object
Fix a bug introduced in dbbb9e93d and exposed by gcc6 by not using a
moved-from object. Twice.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170802161033.4213-1-duarte@scylladb.com>
(cherry picked from commit 4c9206ba2f)
2017-08-03 09:46:33 +03:00
Avi Kivity
0b48863a7e Merge "Ensure correct EOC for PI block cell names" from Duarte
"This series ensures we always write correct cell names to promoted
index cell blocks, taking into account the eoc of range tombstones.

Fixes #2333"

* 'pi-cell-name/v1' of github.com:duarten/scylla:
  tests/sstable_mutation_test: Test promoted index blocks are monotonic
  sstables: Consider eoc when flushing pi block
  sstables: Extract out converting bound_kind to eoc

(cherry picked from commit db7329b1cb)
2017-08-01 18:13:19 +03:00
Gleb Natapov
aec94b926c cql transport: run accept loop in the foreground
It was meant to run in the foreground, since it is waited upon during
stop(), but as it is now, from the stop() perspective it completes
after the first connection is accepted.

Fixes #2652

Message-Id: <20170801125558.GS20001@scylladb.com>
(cherry picked from commit 1da4d5c5ee)
2017-08-01 17:07:55 +03:00
Tomasz Grabiec
0ac2c388b6 row_cache: Avoid deadlock/timeout due to sstable read concurrency limit
database::make_sstable_reader() creates a reader which will need to
obtain a semaphore permit when invoked, so that there is a limit on
sstable read concurrency (edeef03). Therefore, each read may create at
most one such reader in order to be guaranteed to make
progress. Otherwise, the creation of the second reader may deadlock
(in case of system tables) or time out (non-system tables), if enough
such readers try to do the same thing at the same time.

One instance of the problem fixed by this patch is in cache populating
reader (98c12dc) when we reach partition size limit
(max_cached_partition_size_in_kb). In that case population is
abandoned and a second read is created, while still keeping the old
one alive. We saw this causing deadlocks during schema tables parsing
when system.schema_columns contained large partitions. Fixes #2623.

Another case when this can potentially happen is when populating
readers are recreated by cache. We replace the reader there, but using
assignment, so the old reader is still alive when the new one is
created. This patch fixes two out of three of such cases. The third
one (in a scanning read) is not that easy to fix. That problem doesn't
exist in version 2.0 and master, where the cache is reworked for row
granularity.

Refs #2644.

Message-Id: <1501160300-18097-1-git-send-email-tgrabiec@scylladb.com>
2017-08-01 12:10:39 +03:00
Takuya ASADA
09ac5b57aa dist/redhat: limit metapackage dependencies to specific version of scylla packages
When we install the scylla metapackage with a version (e.g., scylla-1.7.1),
it always installs the newest scylla-server/-jmx/-tools in the repo
instead of the specified version of the packages.

To install same-version packages with the metapackage, limit dependencies to
the current package version.

Fixes #2642

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20170726193321.7399-1-syuu@scylladb.com>
(cherry picked from commit 91a75f141b)
2017-07-27 14:22:06 +03:00
Shlomi Livne
ff643e3e40 release: prepare for 1.7.4
Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
2017-07-26 17:26:33 +03:00
Asias He
a7b8d89de8 gossip: Fix nr_live_nodes calculation
We need to consider the _live_endpoints size. The nr_live_nodes should
not be larger than the _live_endpoints size, otherwise the loop to collect
the live nodes can run forever.

It is a regression introduced in commit 437899909d
(gossip: Talk to more live nodes in each gossip round).

Fixes #2637

Message-Id: <863ec3890647038ae1dfcffc73dde0163e29db20.1501026478.git.asias@scylladb.com>
(cherry picked from commit 515a744303)
2017-07-26 16:49:11 +03:00
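The clamp described above can be sketched as follows. This is an illustration under assumptions, not the gossiper code: `pick_live_nodes` is a hypothetical helper, endpoints are plain strings, and the entries of `live_endpoints` are assumed to be distinct.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <set>
#include <string>
#include <vector>

// Picks up to `wanted` distinct live endpoints at random.
// Assumes the entries of `live_endpoints` are distinct.
std::set<std::string> pick_live_nodes(const std::vector<std::string>& live_endpoints,
                                      std::size_t wanted, std::mt19937& rng) {
    std::set<std::string> picked;
    if (live_endpoints.empty()) {
        return picked;
    }
    // Clamp: without this, asking for more nodes than exist would make the
    // loop below spin forever, since `picked` can never grow large enough.
    const std::size_t nr_live_nodes = std::min(wanted, live_endpoints.size());
    std::uniform_int_distribution<std::size_t> dist(0, live_endpoints.size() - 1);
    while (picked.size() < nr_live_nodes) {
        picked.insert(live_endpoints[dist(rng)]);
    }
    return picked;
}
```

With three live endpoints and `wanted = 10`, the clamped loop stops after collecting all three.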
Duarte Nunes
013fa3da14 schema: Calculate default validator
Fixes #2605

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170719105131.21455-3-duarte@scylladb.com>
2017-07-20 10:58:29 +02:00
Duarte Nunes
259cfaf8f9 thrift: Set default validator for static CFs
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170719105131.21455-2-duarte@scylladb.com>
2017-07-20 10:58:29 +02:00
Duarte Nunes
6501bf8e54 schema_tables: Recover comparator type
Fixes #2573

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170718125450.3727-1-duarte@scylladb.com>
2017-07-19 10:58:43 +02:00
Pekka Enberg
41b4055911 release: prepare for 1.7.3 2017-07-18 17:34:46 +03:00
Nadav Har'El
b594f21f91 Allow reading exactly desired byte ranges and fast_forward_to
In commit c63e88d556, support was added for
fast_forward_to() in data_consume_rows(). Because an input stream's end
cannot be changed after creation, that patch ignores the specified end
byte, and uses the end of file as the end position of the stream.

As a result of this, even when we want to read a specific byte range (e.g.,
in the repair code to checksum the partitions in a given range), the code
reads an entire 128K buffer around the end byte, or significantly more, with
read-ahead enabled. This causes repair to do more than 10 times the amount
of I/O it really has to do in the checksumming phase (which in the current
implementation, reads small ranges of partitions at a time).

This patch has two levels:

1. In the lower level, sstable::data_consume_rows(), which reads all
   partitions in a given disk byte range, now gets another byte position,
   "last_end". That can be the range's end, the end of the file, or anything
   in between the two. It opens the disk stream until last_end, which means
   (a) we will never read-ahead beyond last_end, and (b) fast_forward_to() is
   not allowed beyond last_end.

2. In the upper level, we add to the various layers of sstable readers,
   mutation readers, etc., a boolean flag mutation_reader::forwarding, which
   says whether fast_forward_to() is allowed on the stream of mutations to
   move the stream to a different partition range.

   Note that this flag is separate from the existing boolean flag
   streamed_mutation::forwarding - that one talks about skipping inside a
   single partition, while the flag we are adding is about switching the
   partition range being read. Most of the functions that previously
   accepted streamed_mutation::forwarding now accept *also* the option
   mutation_reader::forwarding. The exceptions are functions which are known
   to read only a single partition, and do not support fast_forward_to() a
   different partition range.

   We note that if mutation_reader::forwarding::no is requested, and
   fast_forward_to() is forbidden, there is no point in reading anything
   beyond the range's end, so data_consume_rows() is called with last_end as
   the range's end. But if forwarding::yes is requested, we use the end of the
   file as last_end, exactly like the code before this patch did.

Importantly, we note that the repair's partition reading code,
column_family::make_streaming_reader, uses mutation_reader::forwarding::no,
while the other existing reading code will use the default forwarding::yes.

In the future, we can further optimize the amount of bytes read from disk
by replacing forwarding::yes by an actual last partition that may ever be
read, and use its byte position as the last_end passed to data_consume_rows.
But we don't do this yet, and it's not a regression from the existing code,
which also opened the file input stream until the end of the file, and not
until the end of the range query. Moreover, such an improvement will not
improve anything if the overall range is always very large, in which
case not over-reading at its end will not improve performance.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20170718110643.8667-1-nyh@scylladb.com>
2017-07-18 16:54:11 +03:00
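The choice of last_end described in the commit above reduces to a small decision, sketched here under assumptions. The `forwarding` enum and `choose_last_end` function are hypothetical stand-ins for mutation_reader::forwarding and the logic around data_consume_rows(), not the actual signatures.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the partition-range forwarding flag.
enum class forwarding { no, yes };

// Byte position at which the disk stream should end.
uint64_t choose_last_end(uint64_t range_end, uint64_t file_size, forwarding fwd) {
    assert(range_end <= file_size);
    // With forwarding disabled nothing past the range can ever be requested,
    // so the stream (and its read-ahead) can stop at the range's end.
    // With forwarding enabled, keep the pre-patch behaviour: end of file.
    return fwd == forwarding::no ? range_end : file_size;
}
```

A repair-style reader would pass `forwarding::no` and therefore never read past the end of the range it checksums.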
Avi Kivity
bcd2e6249f dist: tolerate sysctl failures
sysctl may fail in a container environment if /proc is not virtualized
properly.

Fixes #1990
Message-Id: <20170625145930.31619-1-avi@scylladb.com>

(cherry picked from commit 08488a75e0)
2017-07-18 15:47:10 +03:00
Takuya ASADA
4c79add7b0 dist/debian: skip tunables when kernel = 3.13.0-*-generic, to prevent kernel panic bug
There is a kernel panic bug on kernel = 3.13.0-*-generic (Ubuntu 14.04), so we have to skip tunables.

Fixes #1724

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1493196636-25645-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit abf65cb485)
2017-07-18 15:47:03 +03:00
Asias He
00f6ccb75d gossip: Implement the missing fd_max_interval_ms and fd_initial_value_ms option
It is useful for larger clusters with larger gossip message latency. By
default the fd_max_interval_ms is 2 seconds which means the
failure_detector will ignore any gossip message update interval larger
than 2 seconds. However, in larger clusters, the gossip message update
interval can be larger than 2 seconds.

Fixes #2603.

Message-Id: <49b387955fbf439e49f22e109723d3a19d11a1b9.1500278434.git.asias@scylladb.com>
(cherry picked from commit adc5f0bd21)
2017-07-17 13:30:34 +03:00
Avi Kivity
77ac5a63db Update seastar submodule
* seastar fc69677...8d9fd92 (1):
  > rpc: start server's send loop only after protocol negotiation

Fixes #2600.
2017-07-17 10:43:12 +03:00
Pekka Enberg
eb9de1a807 Merge "Repair backport for 1.7 branch" from Asias
"This series backports all the repair related fixes to enterprise branch and
 updates the scylla_repair to send ranges to repair to all the shards in
 parallel, independently.

 With this series, repair can utilize all the CPUs and is much more efficient."

* tag 'asias/repair-backport-branch-1.7.3-v1' of github.com:cloudius-systems/seastar-dev:
  repair: Use selective_token_range_sharder
  tests: Add test_selective_token_range_sharder
  dht: Add selective_token_range_sharder
  repair: further limit parallelism of checksum calculation
  repair: Do not store the failed ranges
  repair: Prefer nodes in local dc when streaming
  repair: Repair on all shards
  repair: Allow one stream plan in flight
2017-07-14 13:02:26 +03:00
Duarte Nunes
643a777067 storage_proxy: Preserve replica order across mutations
In storage_proxy we arrange the mutations sent by the replicas in a
vector of vectors, such that each row corresponds to a partition key
and each column contains the mutation, possibly empty, as sent by a
particular replica.

There is reconciliation-related code that assumes that all the
mutations sent by a particular replica can be found in a single
column, but that isn't guaranteed by the way we initially arrange the
mutations.

This patch fixes this and enforces the expected order.

Fixes #2531
Fixes #2593

Signed-off-by: Gleb Natapov <gleb@scylladb.com>
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170713162014.15343-1-duarte@scylladb.com>
(cherry picked from commit b8235f2e88)
2017-07-14 12:12:09 +03:00
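One way to picture the "vector of vectors" invariant from the commit above is the sketch below. It is an illustration, not the storage_proxy code: mutations and partition keys are plain strings, an empty string stands for an empty mutation, and `arrange` is a hypothetical helper.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

using mutation = std::string;       // hypothetical stand-in; "" means empty
using partition_key = std::string;  // hypothetical stand-in

// One entry per replica, in a fixed replica order; each entry holds the
// (key, mutation) pairs that replica returned.
using replica_reply = std::vector<std::pair<partition_key, mutation>>;

// Builds one row per partition key and one column per replica. A replica
// that sent nothing for a key leaves an empty slot, so column i refers to
// replica i in every row.
std::map<partition_key, std::vector<mutation>>
arrange(const std::vector<replica_reply>& replies) {
    std::map<partition_key, std::vector<mutation>> rows;
    for (std::size_t replica = 0; replica < replies.size(); ++replica) {
        for (const auto& entry : replies[replica]) {
            auto& row = rows[entry.first];
            row.resize(replies.size());
            row[replica] = entry.second;
        }
    }
    return rows;
}
```

Because every row has exactly one slot per replica, reconciliation code can walk a single column to see everything a given replica sent.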
Avi Kivity
6f91939650 Update seastar submodule
* seastar 8e2f629...fc69677 (1):
  > tls: Wrap all IO in semaphore (Fixes #2575)
2017-07-12 10:24:04 +03:00
Gleb Natapov
15da71266d consistency_level: report less live endpoints in Unavailable exception if there are pending nodes
DowngradingConsistencyRetryPolicy uses live replicas count from
Unavailable exception to adjust CL for retry, but when there are pending
nodes CL is increased internally by a coordinator and that may prevent
retried query from succeeding. Adjust live replica count in case of
pending node presence so that retried query will be able to proceed.

Fixes #2535

Message-Id: <20170710085238.GY2324@scylladb.com>
(cherry picked from commit 739dd878e3)
2017-07-11 17:16:58 +03:00
Botond Dénes
9cd36ade00 Fix crash in the out-of order restrictions error msg composition
Use the name of the existing preceding column with a restriction
(last_column) instead of assuming that the column right after the
current column already has restrictions.
This will yield an error message that is different from that of
Cassandra, albeit still a correct one.

Fixes #2421

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <40335768a2c8bd6c911b881c27e9ea55745c442e.1499781685.git.bdenes@scylladb.com>
(cherry picked from commit 33bc62a9cf)
2017-07-11 17:16:01 +03:00
Asias He
6f58a1372e repair: Use selective_token_range_sharder
With this change, we ask all the shards to handle the ranges provided by
the user, and we use selective_token_range_sharder to split the ranges and
ignore the ranges that do not belong to the current shard.

(cherry picked from commit b10e961a64)

 Conflicts:
	repair/repair.cc
2017-07-11 08:40:49 +08:00
Asias He
0a9d26de4a tests: Add test_selective_token_range_sharder
(cherry picked from commit 2a794db61b)
2017-07-11 08:40:49 +08:00
Asias He
35cd63e1f7 dht: Add selective_token_range_sharder
It is like ring_position_range_sharder but it works with
dht::token_range. This sharder will return the ranges that belong to a
selected shard.

(cherry picked from commit d835cf2748)
2017-07-11 08:40:49 +08:00
Nadav Har'El
2ada799e07 repair: further limit parallelism of checksum calculation
Repair today has a semaphore limiting the number of ongoing checksum
comparisons running in parallel (on one shard) to 100. We needed this
number to be fairly high, because a "checksum comparison" can involve
high latency operations - namely, sending an RPC request to another node
in a remote DC and waiting for it to calculate a checksum there, and while
waiting for a response we need to proceed calculating checksums in parallel.

But as a consequence, in the current code, we can end up with as many as
100 fibers all at the same stage of reading partitions to checksum from
sstables. This requires tons of memory, to hold at least 128K of buffer
(even more with read-ahead) for each of these fibers, plus partition data
for each. But doing 100 reads in parallel is pointless - one (or very few)
should be enough.

So this patch adds another semaphore to limit the number of checksum
*calculations* (including the read and checksum calculation) on each shard
to just 2. There may still be 100 ongoing checksum *comparisons*, in
other stages of the comparisons (sending the checksum requests to other
nodes and waiting for them to return), but only 2 will ever be in the stage of
reading from disk and checksumming them.

The limit of 2 checksum calculations (per shard) applies on the repair
slave, not just to the master: The slave may receive many checksum
requests in parallel, but will only actually work on 2 at a time.

Because the parallelism=100 now rate-limits operations which use very little
memory, in the future we can safely increase it even more, to support
situations where the disk is very fast but the link between nodes has
very high latency.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20170703151329.25716-1-nyh@scylladb.com>
(cherry picked from commit d177ec05cb)
2017-07-11 08:40:49 +08:00
Asias He
b71037ac55 repair: Do not store the failed ranges
The number of failed ranges can be large so it can consume a lot of memory.
We already logged the failed ranges in the log. No need to store them
in memory.

Message-Id: <7a70c4732667c5c3a69211785e8efff0c222fc28.1498809367.git.asias@scylladb.com>
(cherry picked from commit b2a2fbcf73)

 Conflicts:
	repair/repair.cc
2017-07-11 08:40:49 +08:00
Asias He
8639f32efd repair: Prefer nodes in local dc when streaming
When peer nodes have the same partition data, i.e., with the same
checksum, we currently choose to stream from any of them randomly.
To improve streaming performance, select the peer within the same DC.
This patch is supposed to improve repair performance with multiple DCs.

Message-Id: <c6a345b6e8ed2b59f485e53c865241e463b44507.1498490831.git.asias@scylladb.com>
(cherry picked from commit cc02a62756)
2017-07-11 08:40:48 +08:00
Asias He
a0dce7c922 repair: Repair on all shards
Currently, shard zero is the coordinator of the repair. All the work of
checksumming the local node and sending the repair checksum rpc
verb is done on shard zero only. This causes other shards to be
underutilized.

With this patch, we split the ranges that need to be repaired into at least
smp::count ranges, so sizeof(ranges) / smp::count will be assigned to
each shard. For example, if we have 8 shards and 256 ranges, each shard
will repair 32 ranges. Each shard will repair its 32 ranges
sequentially. There will be at most 8 (smp::count) ranges of repair in
parallel.

(cherry picked from commit 47345078ec)

Conflicts:
	repair/repair.cc
2017-07-11 08:40:48 +08:00
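The arithmetic in the commit above (256 ranges over 8 shards gives 32 per shard) can be sketched as below. The round-robin assignment is an assumption made for illustration; the commit only states how many ranges each shard receives, not which ones. `token_range` and `assign_ranges_to_shards` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a token range: just an index here.
using token_range = std::size_t;

// Distributes ranges over shards so that each shard receives roughly
// ranges.size() / shard_count of them. Each shard would then repair its
// own list sequentially, giving at most shard_count repairs in parallel.
std::vector<std::vector<token_range>>
assign_ranges_to_shards(const std::vector<token_range>& ranges, std::size_t shard_count) {
    assert(shard_count > 0);
    std::vector<std::vector<token_range>> per_shard(shard_count);
    for (std::size_t i = 0; i < ranges.size(); ++i) {
        per_shard[i % shard_count].push_back(ranges[i]);  // round-robin (assumed)
    }
    return per_shard;
}
```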
Asias He
d39ff4f2ac repair: Allow one stream plan in flight
In "repair: Use more stream_plan" (commit 2043ffc064), we
switched to streaming while doing checksums instead of streaming only
after the checksum phase is completed. We take a parallelism_semaphore
before we do a checksum; if there are more than sub_ranges_to_stream
(1024) ranges, we start a stream_plan and wait for the streaming to
complete (still under the parallelism_semaphore). So at most
parallelism_semaphore (100) stream_plans can run in parallel.

The parallelism_semaphore limits the parallelism of both checksum and the
streaming plan. However, it is not necessary to have the same
parallelism for both checksum and streaming, because 1) a streaming
operation itself runs in parallel (handling ranges on all shards in
parallel, sending mutations in parallel), and 2) having more streaming plans
(in the worst case 100) means we can write to 100 memtables at the same time
and flush 100 memtables to disk at the same time, which can take a lot of
memory.

With this patch, we only allow one stream plan in flight.

(cherry picked from commit 54831a344c)
2017-07-11 08:40:48 +08:00
Avi Kivity
7cbfe0711f dist: redirect stdout/stderr to the journal on systemd systems
Fixes #2408.

Message-Id: <20170524080729.10085-1-avi@scylladb.com>
(cherry picked from commit 15af6acc8b)
2017-07-10 19:31:14 +03:00
Glauber Costa
139a2d14a1 disable defragment-memory-on-idle-by-default
It's been linked with various performance issues, either by causing
them or making them worse. One example is #1634, and also recently
I have investigated continuous performance degradation that was also
linked to defrag on idle activity.

Until we can figure out how to reduce its impact, we should disable it.

Signed-off-by: Glauber Costa <glauber@glauber.scylladb>
Message-Id: <20170627201109.10775-1-glauber@scylladb.com>
(cherry picked from commit f3742d1e38)
2017-07-10 19:25:12 +03:00
Asias He
6fff331698 gossip: Use vector for _live_endpoints
To speed up random access in get_random_node, switch to a vector
instead of a set.
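
A minimal sketch of why the container choice matters for picking a
random element. The function names are hypothetical and this is not the
gossiper code; it only contrasts positional access in the two standard
containers.

```cpp
#include <cstddef>
#include <iterator>
#include <set>
#include <vector>

// Reaching position `idx` in a std::set means stepping the iterator
// `idx` times, so a random pick costs O(N) in the worst case.
template <typename T>
const T& nth_in_set(const std::set<T>& s, std::size_t idx) {
    auto it = s.begin();
    std::advance(it, idx);
    return *it;
}

// A std::vector supports direct indexing, so the same pick is O(1).
template <typename T>
const T& nth_in_vector(const std::vector<T>& v, std::size_t idx) {
    return v[idx];
}
```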

(cherry picked from commit e31d4a3940)
Message-Id: <fea90eaa5273fac50d0013b3778d9a4f2562e0b7.1499394330.git.asias@scylladb.com>
2017-07-10 14:42:26 +03:00
Asias He
43ae64cd47 gossip: Talk to more live nodes in each gossip round
In large clusters with multi-DC deployments, gossip updates have been
observed to take a long time to disseminate in the cluster.

To speed up, talk to more live nodes in each gossip round.

Fixes #2528

(cherry picked from commit 437899909d)
Message-Id: <9bcdaf1fb5637d14a7fda9188ba76ced8f1afaaf.1499394330.git.asias@scylladb.com>
2017-07-10 14:40:40 +03:00
Tomasz Grabiec
f306b47a88 tests: commitlog: Check there are no segments left on disk after clean shutdown
Reproduces #2550.

Message-Id: <1499358825-17855-2-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 72e01b7fe8)
2017-07-10 12:41:33 +03:00
Tomasz Grabiec
47b1e39410 commitlog: Discard active but unused segments on shutdown
So that they are not left on disk even though we did a clean shutdown.

First part of the fix is to ensure that closed segments are recognized
as not allocating (_closed flag). Not doing this prevents them from
being collected by discard_unused_segments(). Second part is to
actually call discard_unused_segments() on shutdown after all segments
were shut down, so that those whose positions are cleared can be
removed.

Fixes #2550.

Message-Id: <1499358825-17855-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 6555a2f50b)
2017-07-10 12:40:43 +03:00
Botond Dénes
0f4d5cde8e cql3: Add K_FROZEN and K_TUPLE to basic_unreserved_keyword
To allow the non-reserved keywords "frozen" and "tuple" to be used as
column names without double-quotes.

Fixes #2507

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <9ae17390662aca90c14ae695c9b4a39531c6cde6.1499329781.git.bdenes@scylladb.com>
(cherry picked from commit c4277d6774)
2017-07-06 18:19:59 +03:00
Avi Kivity
a24dcf1a19 Update seastar submodule
* seastar 18a82e2...8e2f629 (1):
  > future-utils: fix do_for_each exception reporting

Fixes bug during a failed repair.
2017-07-06 17:32:37 +03:00
Raphael S. Carvalho
611c25234e database: fix potential use-after-free in sstable cleanup
When do_for_each is in its last iteration and with_semaphore defers
because there's an ongoing cleanup, the sstable object will be used after
being freed because it was taken by reference and the container it lives
in was destroyed prematurely.

Let's fix it with a do_with, also making code nicer.

Fixes #2537.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170630035324.19881-1-raphaelsc@scylladb.com>
(cherry picked from commit b9d0645199)
2017-07-03 12:49:34 +03:00
Amos Kong
f64e3e24d4 common/scripts: fix node_exporter url
Commit ff3d83bc2f updated node_exporter
from 0.12.0 to 0.14.0, and it introduced a bug in downloading the install file.

node_exporter started to add 'v' prefix in release tags[1] from 0.13.0,
so we need to fix the url.

[1] https://github.com/prometheus/node_exporter/tags

Fixes #2509

Signed-off-by: Amos Kong <amos@scylladb.com>
Message-Id: <42b0a7612539a34034896d404d63a0a31ce79e10.1497919368.git.amos@scylladb.com>
(cherry picked from commit 92731eff4f)
2017-06-22 08:51:35 +03:00
Shlomi Livne
f6034c717d release: prepare 1.7.2
Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
2017-06-21 22:09:31 +03:00
Amos Kong
b6f4df3cc8 scylla_setup: fix deadloop in inputting invalid option
example: # scylla_setup --invalid-opt

Fixes #2305

Signed-off-by: Amos Kong <amos@scylladb.com>
Message-Id: <9a4f631b126d8eaaae479fa99137db7a61a7c869.1493135357.git.amos@scylladb.com>
(cherry picked from commit f655639e5a)
2017-06-19 22:32:38 +03:00
Amnon Heiman
af028360d7 node_exporter_install script update version to 0.14
Fixes #2097

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <20170612125724.7287-1-amnon@scylladb.com>
(cherry picked from commit ff3d83bc2f)
2017-06-18 12:28:19 +03:00
Duarte Nunes
60af7eab10 udt: Don't check a type is unused after applying the schema mutations
This patch is based on 6c8b5fc. It moves the check whether a dropped
type is still used by other types or tables from schema_tables to
the drop_type_statement, as delaying this check to after applying the
mutations can leave the keyspace in a broken state.

Fixes #2490

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1497466736-28841-1-git-send-email-duarte@scylladb.com>
2017-06-15 10:35:01 +03:00
Calle Wilund
665d14584c database: Fix assert in truncate to handle empty memtables+sstables
If we do two truncates in a row, the second will have neither memtable
nor sstable data. Thus we will not write/remove sstables, and thus
get no resulting truncation replay position.

Fixes #2489

Message-Id: <1497378469-6063-1-git-send-email-calle@scylladb.com>

(cherry picked from commit 525730e135)
2017-06-14 16:25:57 +03:00
Gleb Natapov
bb56e7682c Fix use after free in nonwrapping_range::intersection
end_bound() returns a temporary object (end_bound_ref), so it cannot be
taken by reference here and used later. Copy instead.

Message-Id: <20170612132328.GJ21915@scylladb.com>

(cherry picked from commit 21197981a)

Fixes #2482
2017-06-14 12:08:06 +01:00
Avi Kivity
a4bd56ce40 tests: fix partitioner_test build on gcc 5
2017-06-13 21:56:02 +03:00
Calle Wilund
6340fe61af commitlog_test: Fix test_commitlog_delete_when_over_disk_limit
Test should
a.) Wait for the flush semaphore
b.) Only compare segment sets between start and end, not start,
    end and in between. I.e. the test sort of assumed we started
    with < 2 (or so) segments. Not always the case (timing)

Message-Id: <1496828317-14375-1-git-send-email-calle@scylladb.com>
(cherry picked from commit 0c598e5645)
2017-06-13 19:53:13 +03:00
Asias He
f2317a6f3f repair: Fix range use after free
Capture it by value.

scylla:  [shard 0] repair - repair's stream failed: streaming::stream_exception (Stream failed)
scylla:  [shard 0] repair - Failed sync of range ==<runtime_exception
(runtime error: Invalid token. Should have size 8, has size 0#012)>: streaming::stream_exception (Stream failed)

Message-Id: <7fda4432e54365f64b556e7e4c26e36d3a9bb1b7.1497238229.git.asias@scylladb.com>
(cherry picked from commit 2bcb368a13)
2017-06-13 11:03:14 +03:00
Paweł Dziepak
7bb41b50f9 commitlog: avoid copying column_mapping
It is safe to copy column_mapping across shards. Such a guarantee comes at
the cost of performance.

This patch makes commitlog_entry_writer use IDL generated writer to
serialise commitlog_entry so that column_mapping is not copied. This
also simplifies commitlog_entry itself.

Performance difference tested with:
perf_simple_query -c4 --write --duration 60
(medians)
          before       after      diff
write   79434.35    89247.54    +12.3%

(cherry picked from commit 374c8a56ac)

Also: Fixes #2468.
2017-06-11 15:44:20 +03:00
Paweł Dziepak
57d602fdd6 idl: fix generated writers when member functions are used
When using a member name in an identifier of a generated class or method,
the idl compiler should strip the trailing '()'.

(cherry picked from commit 4df4994b71)

(part of #2468)
2017-06-11 15:43:53 +03:00
Paweł Dziepak
cd14b83192 idl: add start_frame() overload for seastar::simple_output_stream
(cherry picked from commit 018d16d315)

(part of #2468)
2017-06-11 15:43:11 +03:00
Avi Kivity
a85b70d846 Merge "repair memory usage fix" from Asias
"This series switches repair to use more stream plans to stream the mismatched
sub ranges and use a range generator to produce sub ranges.

Tests show that repair no longer uses huge amounts of memory with a large data set.

In addition, the log now reports progress: how many ranges have been processed.

   Jun 06 14:18:22  [shard 0] repair - Repair 512 out of 529 ranges, id=1, keyspace=myks, cf=mytable, range=(8526136029525195375, 8549482295083869942]
   Jun 06 14:19:55  [shard 0] repair - Repair 513 out of 529 ranges, id=1, keyspace=myks, cf=mytable, range=(8526136029525195375, 8549482295083869942]

Fixes #2430."

* tag 'asias/fix-repair-2430-branch-master-v1' of github.com:cloudius-systems/seastar-dev:
  repair: Remove unused sub_ranges_max
  repair: Reduce parallelism in repair_ranges
  repair: Tweak the log a bit
  repair: Use more stream_plan
  repair: iterator over subranges instead of list

(cherry picked from commit 419ad9d6cb)
2017-06-08 14:52:28 +03:00
Avi Kivity
f44ea5335b Update seastar submodule
* seastar 812e232...18a82e2 (1):
  > scripts: posix_net_conf.sh: fix bash syntax causing a failure during bonding iface configuration

Fixes #2269
2017-06-07 18:23:02 +03:00
Pekka Enberg
a95c045b48 Merge "Fixes to thrift/server" from Duarte
"This series fixes some issues with the thrift_server, namely
ensuring that streams and sockets are properly closed.

Fixes #499
Fixes #2437"

* 'thrift-server-fixes/v1' of github.com:duarten/scylla:
  thrift/server: Close connections when stopping server
  thrift/server: Move connection class to header
  thrift/server: Shutdown connection
  thrift/server: Close output_stream when connection is done

(cherry picked from commit a6dc21615b)
2017-06-07 16:08:28 +03:00
Avi Kivity
eb396d2795 Update seastar submodule
* seastar 328fdbc...812e232 (1):
  > rpc: handle messages larger than memory limit

Fixes #2453.
2017-06-07 12:29:59 +03:00
Takuya ASADA
dbbf99d7fa dist/debian: install gdebi when it's not exist
Since we started using gdebi to install the build-dep metapackage generated by
mk-build-deps, we need to install gdebi in build_deb.sh too.

Fixes #2451

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1496819209-30318-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit 7fe63c539a)
2017-06-07 10:25:02 +03:00
Raphael S. Carvalho
f7a143e7be sstables: fix report of disk space used by bloom filter
After the change in boot, read_filter is called by the distributed loader,
so its update to _filter_file_size is lost. It is the load variant
that receives foreign components that must do it. We were also
not updating it for newly created sstables.

Fixes #2449.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170606151129.5477-1-raphaelsc@scylladb.com>
(cherry picked from commit 0ca1e5cca3)
2017-06-06 19:00:00 +03:00
Takuya ASADA
562102cc76 dist/debian: use gdebi instead of mk-build-deps -i
At least on Debian8, mk-build-deps -i silently finishes with return code 0
even if it fails to install dependencies.
To prevent this, we should manually install the metapackage generated by
mk-build-deps using gdebi.

Fixes #2445

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1496737502-10737-2-git-send-email-syuu@scylladb.com>
(cherry picked from commit a4c392c113)
2017-06-06 14:18:14 +03:00
Takuya ASADA
d4b444418a dist/debian/dep: install texlive from jessie-backports to prevent gdb build fail on jessie
Installing openjdk-8-jre-headless from jessie-backports breaks texlive on
jessie main repo.
It causes 'Unmet build dependencies' error when building gdb package.
To prevent this, force installing texlive from jessie-backports before
starting to build gdb.

Fixes #2444

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1496737502-10737-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit 5608842e96)
2017-06-06 14:18:08 +03:00
Raphael S. Carvalho
befd4c9819 db: fix computation of live disk usage stat after compaction
rebuild_statistics() uses sstable::data_size(), which only returns the
uncompressed data size, while the function it calls expects the actual
disk space used by all components.
Boot uses add_sstable(), which correctly updates the stat with
sstable::bytes_on_disk(). That's what rebuild_statistics() needs to
use too.

Fixes #1592

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170525210055.6391-1-raphaelsc@scylladb.com>
(cherry picked from commit 3b5ad23532)
2017-05-28 10:39:14 +03:00
Avi Kivity
eb2fe0fbd3 Merge "reduce memory requirement for loading sstables" from Rapahel
"fixes a problem in which memory requirement for loading in-memory
components of sstables is very high due to unlimited parallelism."

* 'mem_requirement_sstable_load_v2_2' of github.com:raphaelsc/scylla:
  database: fix indentation of distributed_loader::open_sstable
  database: reduce memory requirement to load sstables
  sstables: loads components for a sstable in parallel
  sstables: enable read ahead for read of in-memory components
  sstables: make random_access_reader work with read ahead

(cherry picked from commit ef428d008c)
2017-05-25 12:59:55 +03:00
Raphael S. Carvalho
eb6b0b1267 db: remove partial sstable created by memtable flush which failed
partial sstable files aren't being removed after each failed attempt
to flush memtable, which happens periodically. If the cause of the
failure is ENOSPC, memtable flush will be attempted forever, and
as a result, column family may be left with a huge amount of partial
files which will overwhelm subsequent boot when removing temporary
TOC. In the past, it led to OOM because removal of temporary TOC
took place in parallel.

Fixes #2407.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170525015455.23776-1-raphaelsc@scylladb.com>
(cherry picked from commit b7e1575ad4)
2017-05-25 11:50:17 +03:00
Asias He
7836600ded streaming: Do not abort session too early in idle detection
Streaming usually takes a long time to complete. Aborting it on a false
positive idle detection can be very wasteful.

Increase the abort timeout from 10 minutes to a very large timeout, 300
minutes. The real idle session will be aborted eventually if other
mechanisms, e.g., streaming manager has gossip callback for on_remove
and on_restart event to abort, do not abort the session.

Fixes #2197

Message-Id: <57f81bfebfdc6f42164de5a84733097c001b394e.1494552921.git.asias@scylladb.com>
(cherry picked from commit f792c78c96)
2017-05-24 12:30:47 +03:00
Shlomi Livne
230c33da49 release: prepare for 1.7.1
Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
2017-05-23 22:42:52 +03:00
Raphael S. Carvalho
17d8a0c727 compaction: do not write expired cell as dead cell if it can be purged right away
When compacting a fully expired sstable, we're not allowing that sstable
to be purged because expired cell is *unconditionally* converted into a
dead cell. Why not check if the expired cell can be purged instead using
gc before and max purgeable timestamp?

Currently, we need two compactions to get rid of a fully expired sstable
whose cells could have been purged all along.

look at this sstable with expired cell:
  {
    "partition" : {
      "key" : [ "2" ],
      "position" : 0
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 120,
        "liveness_info" : { "tstamp" : "2017-04-09T17:07:12.702597Z",
"ttl" : 20, "expires_at" : "2017-04-09T17:07:32Z", "expired" : true },
        "cells" : [
          { "name" : "country", "value" : "1" },
        ]

now this sstable data after first compaction:
[shard 0] compaction - Compacted 1 sstables to [...]. 120 bytes to 79
(~65% of original) in 229ms = 0.000328997MB/s.

  {
    ...
    "rows" : [
      {
        "type" : "row",
        "position" : 79,
        "cells" : [
          { "name" : "country", "deletion_info" :
{ "local_delete_time" : "2017-04-09T17:07:12Z" },
            "tstamp" : "2017-04-09T17:07:12.702597Z"
          },
        ]

now another compaction will actually get rid of data:
compaction - Compacted 1 sstables to []. 79 bytes to 0 (~0% of original)
in 1ms = 0MB/s. ~2 total partitions merged to 0

NOTE:
It's a waste of time to wait for second compaction because the expired
cell could have been purged at first compaction because it satisfied
gc_before and max purgeable timestamp.
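
The decision described above can be sketched like this. It is a
simplified illustration, not the compaction code: the struct, the enum
and the exact comparisons are assumptions made for the example.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-in for an expired cell.
struct expired_cell {
    std::int64_t write_timestamp; // timestamp the cell was written with
    std::int64_t expiry_time;     // point in time at which the TTL ran out
};

enum class cell_action { purge, write_dead_cell };

// Assumed rule: the cell can be dropped right away when it expired before
// gc_before and its write timestamp is below the max purgeable timestamp.
bool can_purge(const expired_cell& c, std::int64_t gc_before,
               std::int64_t max_purgeable_timestamp) {
    return c.expiry_time < gc_before
        && c.write_timestamp < max_purgeable_timestamp;
}

// Instead of unconditionally emitting a dead cell, check purgeability first.
cell_action compact_expired_cell(const expired_cell& c, std::int64_t gc_before,
                                 std::int64_t max_purgeable_timestamp) {
    return can_purge(c, gc_before, max_purgeable_timestamp)
        ? cell_action::purge
        : cell_action::write_dead_cell;
}
```

Only a cell that fails the check still needs to be written as a dead
cell and left for a later compaction.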

Fixes #2249, #2253

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170413001049.9663-1-raphaelsc@scylladb.com>
(cherry picked from commit a6f8f4fe24)
2017-05-23 20:57:54 +03:00
Tomasz Grabiec
064de6f8de row_cache: Fix undefined behavior in read_wide()
_underlying is created with _range, which is captured by
reference. But range_and_underlying_reader is moved after being
constructed by do_with(), so _range reference is invalidated.

Fixes #2377.
Message-Id: <1494492025-18091-1-git-send-email-tgrabiec@scylladb.com>

(cherry picked from commit 0351ab8bc6)
2017-05-21 19:09:03 +03:00
Gleb Natapov
df56c108b7 database: remove temporary sstables sequentially
The code that removes each sstable runs in a thread. Parallel
removing of a lot of sstables may start a lot of threads each of which
is taking 128k for its stack. There is not much benefit in running
deletion in parallel anyway, so fix it by deleting sstables sequentially.

Fixes #2384

Message-Id: <20170516103018.GQ3874@scylladb.com>
(cherry picked from commit c7ad3b9959)
2017-05-21 18:56:22 +03:00
Tomasz Grabiec
25607ab9df range: Fix SFINAE rule for picking the best do_lower_bound()/do_upper_bound() overload
mutation_partition has a slicing constructor which is supposed to copy
only the rows from the query range. The rows are located using
nonwrapping_range::lower_bound() and
nonwrapping_range::upper_bound(). Those two have two different
implementations chosen with SFINAE. One is using std::lower_bound(),
and one is using container's built in lower_bound() should it
exist. We're using intrusive tree in mutation_partition, so
container's lower_bound() is preferred. It's O(log N) whereas
std::lower_bound() is O(N), because tree's iterator is not random
access.

However, the current rule for picking container's lower_bound() never
triggers, because lower_bound() has two overloads in the container:

  ./range.hh:618:14: error: decltype cannot resolve address of overloaded function
              typename = decltype(&std::remove_reference<Range>::type::upper_bound)>
              ^~~~~~~~

As a result, the overload which uses std::lower_bound() is used.

Spotted when running perf_fast_forward with wide partition limit in
cache lifted off. It's so slow that I timed out waiting for the result
(> 16 min).
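
One way to write such a rule so that it survives overloaded members is
to test the call expression instead of taking the member's address. The
sketch below is an illustration of that general technique with
hypothetical names; it is not the code from range.hh.

```cpp
#include <algorithm>
#include <set>
#include <vector>

// Preferred overload: only viable if `c.lower_bound(v)` is a valid call.
// Testing the call expression works even when the container declares
// several lower_bound() overloads, unlike decltype(&Container::lower_bound).
template <typename Container, typename Value>
auto find_lower_bound(const Container& c, const Value& v, int)
    -> decltype(c.lower_bound(v)) {
    return c.lower_bound(v);
}

// Fallback: generic algorithm; linear for non-random-access iterators.
template <typename Container, typename Value>
auto find_lower_bound(const Container& c, const Value& v, long)
    -> decltype(c.begin()) {
    return std::lower_bound(c.begin(), c.end(), v);
}

// Entry point: the literal 0 prefers the `int` overload when it is viable.
template <typename Container, typename Value>
auto find_lower_bound(const Container& c, const Value& v)
    -> decltype(find_lower_bound(c, v, 0)) {
    return find_lower_bound(c, v, 0);
}
```

With a std::set the member function is selected; with a std::vector,
which has no such member, the call falls back to std::lower_bound().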

Fixes #2395.

Message-Id: <1495048614-9913-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 3fc1703ccf)
2017-05-18 17:12:00 +03:00
Avi Kivity
b26bd8bbeb tests: fix partitioner_test for g++ 5
It can't make the leap from dht::ring_position to
stdx::optional<range_bound<dht::ring_position>> for some reason.

(cherry picked from commit ba31619594)
2017-05-18 13:10:48 +03:00
Avi Kivity
1ca7f5458b Update seastar submodule
  > tls: make shutdown/close do "clean" handshake shutdown in background
  > tls: Make sink/source (i.e. streams) first class channel owners
  > native-stack: Make sink/source (i.e. streams) first class channel owners

More close() fixes, pointed out by Tomek.
2017-05-17 19:01:44 +03:00
Calle Wilund
50c8a08e91 scylla: fix compilation errors on gcc 5
Message-Id: <1495030581-2138-1-git-send-email-calle@scylladb.com>
(cherry picked from commit 6ca07f16c1)
2017-05-17 18:04:58 +03:00
Avi Kivity
9d1b9084ed Update seastar submodule
* seastar bfa1cb2...774c09c (1):
  > posix-stack: Make sink/source (i.e. streams) first class channel owners
2017-05-17 16:44:34 +03:00
Tomasz Grabiec
e2c75d8532 Merge "Fix performance problems with high shard counts tag" from Avi
From http://github.com/avikivity/scylla exponential-sharder/v3.

The sharder, which takes a range of tokens and splits it among shards, is
slow with large shard count and the default
murmur3_partitioner_ignore_msb_bits.

This patchset fixes excessive iteration in sstable sharding metadata writer and
nonsingular range scans.

Without this patchset, sealing a memtable takes > 60 ms on a 48-shard
system.  With the patchset, it drops below the latency tracker threshold I
used (5 ms).

Fixes #2392.

(cherry picked from commit 84648f73ef)
2017-05-17 16:19:24 +03:00
Duarte Nunes
59063f4891 tests: Add test case for nonwrapping_range::intersection()
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
(cherry picked from commit f365b7f1f7)
2017-05-17 15:59:06 +03:00
Duarte Nunes
de79792373 nonwrapping_range: Add intersection() function
intersection() returns an optional range with the intersection of
this range and the other, specified range.
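
A minimal sketch of the idea, assuming closed integer intervals and
std::optional. The type below is hypothetical; the real
nonwrapping_range also has to handle open and unbounded ends, which this
example leaves out.

```cpp
#include <algorithm>
#include <optional>

// Hypothetical, simplified closed interval [start, end].
struct closed_range {
    int start;
    int end;
};

// Returns the overlap of `a` and `b`, or an empty optional if they do
// not overlap at all.
std::optional<closed_range> intersection(const closed_range& a,
                                         const closed_range& b) {
    const int lo = std::max(a.start, b.start);
    const int hi = std::min(a.end, b.end);
    if (lo > hi) {
        return std::nullopt;
    }
    return closed_range{lo, hi};
}
```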

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
(cherry picked from commit 1f9359efba)
2017-05-17 15:58:55 +03:00
Avi Kivity
3557b449ac Merge "Adding private repository to housekeeping" from Amnon
"This series adds private repository support to scylla-housekeeping"

* 'amnon/housekeeping_private_repo_v3' of github.com:cloudius-systems/seastar-dev:
  scylla-housekeeping service: Support private repositories
  scylla-housekeeping-upstart: Use repository id, when checking for version
  scylla-housekeeping: support private repositories

(cherry picked from commit eb69fe78a4)
2017-05-17 15:58:29 +03:00
Pekka Enberg
a8e89d624a cql3: Fix variable_specifications class get_partition_key_bind_indexes()
The "_specs" array contains column specifications that have the bind
marker name if there is one. That results in
get_partition_key_bind_indices() not being able to look up a column
definition for such columns. Fix the issue by keeping track of the
actual column specifications passed to add() like Cassandra does.

Fixes #2369

(cherry picked from commit a45e656efb4c6478d80e4dfc18de99b94712eeba)
2017-05-10 10:00:47 +03:00
Pekka Enberg
31cd6914a8 cql3: Move variable_specifications implementation to source file
Move the class implementation to source file to reduce the need to
recompile everything when the implementation changes...

Message-Id: <1494312003-8428-1-git-send-email-penberg@scylladb.com>
(cherry picked from commit 5b931268d4)
2017-05-10 10:00:31 +03:00
Pekka Enberg
a441f889c3 cql3: Fix partition key bind indices for prepared statements
Fix the CQL front-end to populate the partition key bind index array in
result message prepared metadata, which is needed for CQL binary
protocol v4 to function correctly.

Fixes #2355.

(cherry picked from commit ebd76617276e660c590cec0a07e97e82422111df)

Tested-by: Shlomi Livne <shlomi@scylladb.com>
Message-Id: <1494257274-1189-1-git-send-email-penberg@scylladb.com>
2017-05-10 10:00:21 +03:00
Pekka Enberg
91b7cb8576 Merge "gossip mark alive fixes" from Asias
"This series fixes the use-after-free issue in gossip and eliminates the
duplicated / unnecessary mark alive operations.

Fixes #2341"

* tag 'asias/gossip_fix_mark_alive/v1' of github.com:cloudius-systems/seastar-dev:
  gossip: Ignore callbacks and mark alive operation in shadow round
  gossip: Ingore the duplicated mark alive operation
  gossip: Fix user after free in mark_alive

(cherry picked from commit 1e04731fa0)
2017-05-09 01:57:23 +03:00
Avi Kivity
2b17c4aacf Merge "Fix update of counter in static rows" from Paweł
"The logic responsible for converting counter updates to counter shards was
not covered by unit tests and didn't transform counter cells inside static
rows.

This series fixes the problem and makes sure that the tests cover both
static rows and transformation logic.

Fixes #2334."

* tag 'pdziepak/static-counter-updates-1.7/v1' of github.com:cloudius-systems/seastar-dev:
  tests/counter: test transform_counter_updates_to_shards
  tests/counter: test static columns
  counters: transform static rows from updates to shards
2017-05-06 15:54:20 +03:00
Pekka Enberg
f61d9ac632 release: prepare for 1.7.0
2017-05-04 15:28:28 +03:00
Asias He
fc9db8bb03 repair: Fix partition estimation
We estimate the number of partitions for a given range of a column family
and split the range into sub ranges containing fewer partitions as a
checksum unit.

The estimation is wrong, because we need to count the partitions on all
the shards, instead of only counting the local shard.

Fixes #2299

Message-Id: <7876285bd26cfaf65563d6e03ec541626814118a.1493817339.git.asias@scylladb.com>
(cherry picked from commit 66e3b73b9c)
2017-05-03 16:26:01 +03:00
Paweł Dziepak
bd67d23927 tests/counter: test transform_counter_updates_to_shards
2017-05-02 13:49:43 +01:00
Paweł Dziepak
bdeeebbd74 tests/counter: test static columns
2017-05-02 13:49:43 +01:00
Paweł Dziepak
a1cb29e7ec counters: transform static rows from updates to shards
2017-05-02 13:49:43 +01:00
Amnon Heiman
e8369644fd scylla_setup: Fix conditional when checking for newer version
When the way housekeeping checks for a newer version and warns about it
during installation was changed, the UUID part was removed but left
in the surrounding if.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <20170426075724.7132-1-amnon@scylladb.com>
(cherry picked from commit b59c95359d)
2017-05-01 12:14:04 +03:00
Glauber Costa
a36cabdb30 reduce kernel scheduler wakeup granularity
We set the scheduler wakeup granularity to 500usec, because that is the
difference in runtime we want to see from a waking task before it
preempts the running task (which will usually be Scylla). Scheduling
other processes less often is usually good for Scylla, but in this case,
one of the "other processes" is also a Scylla thread, the one we have
been using for marking ticks after we have abandoned signals.

However, there is an artifact from the Linux scheduler that causes those
preemptions to be missed if the wakeup granularity is exactly half
the sched_latency. Our sched_latency is set to 1ms, which
represents the maximum time period in which we will run all runnable
tasks.

We want to keep the sched_latency at 1ms, so we will reduce the wakeup
granularity to something slightly lower than 500usec, to make sure
that this artifact won't affect the scheduler calculations. 499.99usec
will do according to my tests, but we will reduce it to a round
number.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20170427135039.8350-1-glauber@scylladb.com>
(cherry picked from commit 14b9aa2285)
2017-05-01 11:13:51 +03:00
Raphael S. Carvalho
1d26fab73e sstables: add method to export ancestors
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-05-01 11:09:42 +03:00
Shlomi Livne
5f0c635da7 release: prepare for 1.7.rc3
Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
2017-05-01 09:53:20 +03:00
Raphael S. Carvalho
82cc3d7aa5 dtcs: do not compact fully expired sstable which ancestor is not deleted yet
Currently, fully expired sstable[1] is unconditionally chosen for compaction
by DTCS, but that may lead to a compaction loop under certain conditions.

Let's consider that an almost expired sstable is compacted, and it's not
deleted yet, and that the new sstable becomes expired before its ancestor is
deleted.
Because this new sstable is expired, it will be chosen by DTCS, but it will
not be purged because 'compacted undeleted' sstables are taken into account
by calculation of max purgeable timestamp and prevents expired data from
being purged. The problem is that this sequence of events can keep happening
forever as reported by issue #2260.
NOTE: This problem was easier to reproduce before improvement on compaction
of expired cells, because fully expired sstable was being converted into a
sstable full of tombstones, which is also considered fully expired.

Fixes #2260.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170428233554.13744-1-raphaelsc@scylladb.com>
(cherry picked from commit 687a4bb0c2)
2017-04-30 19:36:00 +03:00
Paweł Dziepak
98d782cfe1 db: make virtual dirty soft limit configurable
Message-Id: <20170428150005.28454-1-pdziepak@scylladb.com>
(cherry picked from commit 24f4dcf9e4)
2017-04-30 19:17:55 +03:00
Avi Kivity
ea0591ad3d Merge "] Fix problems with slicing using sstable's promoted index" from Tomasz
"Fixes #2327.
Fixes #2326."

* 'tgrabiec/fix-promoted-index-parsing-1.7' of github.com:cloudius-systems/seastar-dev:
  sstables: Fix incorrect parsing of cell names in promoted index
  sstables: Fix find_disk_ranges() to not miss relevant range tombstones
2017-04-30 14:48:54 +03:00
Paweł Dziepak
7eedd743bf lsa: introduce upper bound on zone size
Attempting to create huge zones may introduce significant latency. This
patch introduces a maximum allowed zone size so that the time spent
allocating and initialising a zone is bounded.

Fixes #2335.

Message-Id: <20170428145916.28093-1-pdziepak@scylladb.com>
(cherry picked from commit f5cf86484e)
2017-04-30 10:58:34 +03:00
Tomasz Grabiec
8a21961ec9 sstables: Fix incorrect parsing of cell names in promoted index
Range tombstones are serialized to cell names in this place:

  _sst.maybe_flush_pi_block(_out, start, {});

Note that the column set is empty. This is correct. A range tombstone
only has a clustering part. The cell name is deserialized by promoted
index reader using mp_row_consumer::column, like this:

   mp_row_consumer::column col(schema, std::move(col_name),
      api::max_timestamp); return std::move(col.clustering);

The problem is, column constructor assumes that there is always a
component corresponding to a cell name if the table is not dense, and
will pop it from the set of components (the clustering field):

  , cell(!schema.is_dense() ? pop_back(clustering) : (*(schema.regular_begin())).name())

As a result, a promoted index block which starts or ends with a range tombstone will
appear as having incorrect bounds. This may result in an incorrect
value for data file range start to be calculated.

Fixes #2327.
2017-04-27 18:30:00 +02:00
Tomasz Grabiec
08698d9030 sstables: Fix find_disk_ranges() to not miss relevant range tombstones
Suppose the promoted index looks like this:

block0: start=1 end=2
block1: start=4 end=5

start and end are cell names of the first and last cell in the block.

If there is a range tombstone covering [2,3], it will be only in
block0, because it is no longer in effect when block1 starts. However,
slicing the index for [3, +inf], which intersects with the tombstone,
will yield block1. That's because the slicing looks for a block with
an end which is greater than or equal to the start of the slice:

 if (!found_range_start) {
    if (!range_start || cmp(range_start->value(), end_ck) <= 0) {
       range_start_pos = ie.position() + offset;

We should take into account that any given block may actually contain
information for anything up to the start of the next block, so instead
of using end_ck, effectively use next block's start_ck (exclusive).
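
The difference between the two rules can be sketched with integer keys.
This is a simplified model with hypothetical names, not the
find_disk_ranges() implementation; it only shows which block each rule
picks as the starting point.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified promoted index block with integer cell names.
struct index_block {
    int start_ck;
    int end_ck;
};

// Old rule: first block whose end is >= the start of the slice.
std::size_t first_block_by_end(const std::vector<index_block>& blocks,
                               int slice_start) {
    for (std::size_t i = 0; i < blocks.size(); ++i) {
        if (slice_start <= blocks[i].end_ck) {
            return i;
        }
    }
    return blocks.size();
}

// Fixed rule: a block may hold information for anything up to (but not
// including) the start of the next block; the last block has no such limit.
std::size_t first_block_by_next_start(const std::vector<index_block>& blocks,
                                      int slice_start) {
    for (std::size_t i = 0; i < blocks.size(); ++i) {
        const bool is_last = (i + 1 == blocks.size());
        if (is_last || slice_start < blocks[i + 1].start_ck) {
            return i;
        }
    }
    return blocks.size();
}
```

For the blocks above (block0: 1..2, block1: 4..5) and a slice starting
at 3, the old rule starts at block1 and misses the tombstone stored in
block0, while the fixed rule starts at block0.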

Fixes #2326.
2017-04-27 18:30:00 +02:00
Tomasz Grabiec
df5a291c63 sstables: Fix usage of wrong comparator in find_disk_ranges()
This made a difference if clustering restriction bounds were not full
keys but prefixes.

Fixes #2272.

Message-Id: <1493058357-24156-1-git-send-email-tgrabiec@scylladb.com>
2017-04-24 21:56:07 +03:00
Avi Kivity
1a77312aec Merge "Reduce memory reclamation latency" from Tomasz
"Currently eviction is performed until occupancy of the whole region
drops below the 85% threshold. This may take a while if region had
high occupancy and is large. We could improve the situation by only
evicting until occupancy of the sparsest segment drops below the
threshold, as is done by this change.

I tested this using a c-s read workload in which the condition
triggers in the cache region, with 1G per shard:

 lsa-timing - Reclamation cycle took 12.934 us.
 lsa-timing - Reclamation cycle took 47.771 us.
 lsa-timing - Reclamation cycle took 125.946 us.
 lsa-timing - Reclamation cycle took 144356 us.
 lsa-timing - Reclamation cycle took 655.765 us.
 lsa-timing - Reclamation cycle took 693.418 us.
 lsa-timing - Reclamation cycle took 509.869 us.
 lsa-timing - Reclamation cycle took 1139.15 us.

The 144ms pause is when large eviction is necessary.

Statistics for reclamation pauses for a read workload over
larger-than-memory data set:

Before:

 avg = 865.796362
 stdev = 10253.498038
 min = 93.891000
 max = 264078.000000
 sum = 574022.988000
 samples = 663

After:

 avg = 513.685650
 stdev = 275.270157
 min = 212.286000
 max = 1089.670000
 sum = 340573.586000
 samples = 663

Refs #1634."

* tag 'tgrabiec/lsa-reduce-reclaim-latency-v3' of github.com:cloudius-systems/seastar-dev:
  lsa: Reduce reclamation latency
  tests: Add test for log_histogram
  log_histogram: Allow non-power-of-two minimum values
  lsa: Use regular compaction threshold in on-idle compaction
  tests: row_cache_test: Induce update failure more reliably
  lsa: Add getter for region's eviction function

(cherry picked from commit fccbf2c51f)

[avi: adjustments for 1.7's heap vs. master's log_histogram]
2017-04-21 22:12:52 +03:00
Duarte Nunes
ea684c9a3e alter_type_statement: Fix signed to unsigned conversion
This could allow us to alter a non-existing field of an UDT.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170419114254.5582-1-duarte@scylladb.com>
(cherry picked from commit e06bafdc6c)
2017-04-19 14:48:27 +03:00
Raphael S. Carvalho
2df7c80c66 compaction_manager: fix crash when dropping a resharding column family
Problem is that column family field of task wasn't being set for resharding,
so column family wasn't being properly removed from compaction manager.
In addition to fixing this issue, we'll also interrupt ongoing compactions
when dropping a column family, exactly like we do with shutdown.

Fixes #2291.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170418125807.7712-1-raphaelsc@scylladb.com>
(cherry picked from commit e78db43b79)
2017-04-18 17:40:09 +03:00
Raphael S. Carvalho
193b5d1782 partitioned_sstable_set: fix quadratic space complexity
streaming generates lots of small sstables with large token range,
which triggers O(N^2) in space in interval map.
level 0 sstables will now be stored in a structure that has O(N)
in space complexity and which will be included for every read.

Fixes #2287.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170417185509.6633-1-raphaelsc@scylladb.com>
(cherry picked from commit 11b74050a1)
2017-04-18 13:05:00 +03:00
Asias He
6609c9accb gossip: Fix possible use-after-free of entry in endpoint_state_map
We take a reference to an endpoint_state entry in endpoint_state_map and
access it again after code which defers. The reference can be invalid
after the defer if someone deletes the entry during the defer.

Fix this by taking the reference again after the deferring code.

I also audited the code to remove unsafe references to endpoint_state_map
entries as much as possible.

Fixes the following SIGSEGV:

Core was generated by `/usr/bin/scylla --log-to-syslog 1 --log-to-stdout
0 --default-log-level info --'.
Program terminated with signal SIGSEGV, Segmentation fault.
(this=<optimized out>) at /usr/include/c++/5/bits/stl_pair.h:127
127     in /usr/include/c++/5/bits/stl_pair.h
[Current thread is 1 (Thread 0x7f1448f39bc0 (LWP 107308))]

Fixes #2271

Message-Id: <529ec8ede6da884e844bc81d408b93044610afd2.1491960061.git.asias@scylladb.com>
(cherry picked from commit d27b47595b)
2017-04-13 13:18:41 +03:00
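
The "take the reference again after the defer" pattern from the commit above can be sketched like this. It is a simplified model under stated assumptions, not the gossiper code: `endpoint_state`, `endpoint_state_map` and `bump_generation_after_defer` are hypothetical stand-ins, and an ordinary callback plays the role of code that yields and lets other tasks modify the map.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical stand-ins for the real gossip types.
struct endpoint_state {
    int generation = 0;
};
using endpoint_state_map = std::unordered_map<std::string, endpoint_state>;

// Runs deferring_step (which may erase entries, as another task could during
// a defer) and then looks the entry up again rather than reusing a reference
// or iterator obtained before the step. Returns false if the entry is gone.
bool bump_generation_after_defer(endpoint_state_map& map,
                                 const std::string& endpoint,
                                 const std::function<void()>& deferring_step) {
    if (map.find(endpoint) == map.end()) {
        return false;
    }
    deferring_step();
    auto it = map.find(endpoint); // re-take the reference after the defer
    if (it == map.end()) {
        return false; // entry was deleted while we were deferred
    }
    ++it->second.generation;
    return true;
}
```

The second lookup costs one extra hash probe per call and makes the function safe regardless of what ran in between.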
Pekka Enberg
2f107d3f61 Update seastar submodule
* seastar 211ab4a...bfa1cb2 (1):
  > resource: reduce default_reserve_memory size to fit low memory environment

Fixes #2186
2017-04-12 08:41:40 +03:00
Takuya ASADA
dd9afa4c93 dist/debian/debian/scylla-server.upstart: export SCYLLA_CONF, SCYLLA_HOME
We are sourcing sysconfig file on upstart, but forgot to load them as
environment variables.
So export them.

Fixes #2236

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1491209505-32293-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit b087616a6c)
2017-04-04 11:00:33 +03:00
Pekka Enberg
4021e2befb Update seastar submodule
* seastar f391f9e...211ab4a (1):
  > http: catch and count errors in read and respond

Fixes #2242
2017-04-03 12:02:43 +03:00
Calle Wilund
9b26a57288 commitlog/replayer: Bugfix: minimum rp broken, and cl reader offset too
The previous fix removed the additional insertion of "min rp" per source
shard based on whether we had processed existing CF:s or not (i.e. if
a CF does not exist as sstable at all, we must tag it as zero-rp, and
make the whole shard for it start at the same zero).

This is bad in itself, because it can cause data loss. It does not cause
crashing, however. But it did uncover another old, lingering bug,
namely the commitlog reader initiating its stream wrongly when reading
from an actual offset (i.e. not processing the whole file).
We opened the file stream from the file offset, then tried
to read the file header and magic number from there -> boom, error.

Also, rp-to-file mapping was potentially suboptimal due to using
bucket iterator instead of actual range.

I.e. three fixes:
* Reinstate min position guarding for unencountered CF:s
* Fix stream creation in CL reader
* Fix segment map iterator use.

v2:
* Fix typo
Message-Id: <1490611637-12220-1-git-send-email-calle@scylladb.com>

(cherry picked from commit b12b65db92)
2017-03-28 10:35:04 +02:00
Pekka Enberg
31b5ef13c2 release: prepare for 1.7.rc2 2017-03-23 13:22:59 +02:00
Takuya ASADA
4bbee01288 dist/common/scripts/scylla_raid_setup: don't discard blocks at mkfs time
Discarding blocks on a large RAID volume takes too much time, and the user may
suspect that the script isn't working correctly, so it's better to skip it and
discard directly on each volume instead.

Fixes #1896

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1489533460-30127-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit b65d58e90e)
2017-03-23 09:42:51 +02:00
Calle Wilund
3cc03f88fd commitlog_replayer: Do proper const-lookup of min positions for shards
Fixes #2173

Per-shard min positions can be unset if we never collected any
sstable/truncation info for it, yet replay segments of that id.

Wrap the lookups to handle "missing data -> default", which should have been
there in the first place.

Message-Id: <1490185101-12482-1-git-send-email-calle@scylladb.com>
(cherry picked from commit c3a510a08d)
2017-03-22 17:57:30 +02:00
Vlad Zolotarov
4179d8f7c4 Don't report a Tracing session ID unless the current query had a Tracing bit in its flags
Although the current master's behaviour is legal it's suboptimal and some Clients are sensitive to that.
Let's fix that.

Fixes #2179

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <1490115157-4657-1-git-send-email-vladz@scylladb.com>
2017-03-22 14:55:39 +02:00
Pekka Enberg
c20ddaf5af dist/docker: Use Scylla 1.7 RPM repository 2017-03-21 15:07:27 +02:00
Pekka Enberg
29dd48621b dist/docker: Expose Prometheus port by default
This patch exposes Scylla's Prometheus port by default. You can now use
the Scylla Monitoring project with the Docker image:

  https://github.com/scylladb/scylla-grafana-monitoring

To configure the IP addresses, use the 'docker inspect' command to
determine Scylla's IP address (assuming your running container is called
'some-scylla'):

  docker inspect --format='{{ .NetworkSettings.IPAddress }}' some-scylla

and then use that IP address in the prometheus/scylla_servers.yml
configuration file.

Fixes #1827

Message-Id: <1490008357-19627-1-git-send-email-penberg@scylladb.com>
(cherry picked from commit 85a127bc78)
2017-03-20 15:30:15 +02:00
Amos Kong
87de77a5ea scylla_setup: match '-p' option of lsblk with strict pattern
On Ubuntu 14.04, lsblk doesn't have the '-p' option, but
`scylla_setup` tries to get the block list with `lsblk -pnr` and
triggers an error.

The current simple pattern matches against the whole help text, so it
might match the wrong options.
  scylla-test@amos-ubuntu-1404:~$ lsblk --help | grep -e -p
   -m, --perms          output info about permissions
   -P, --pairs          use key="value" output format

Let's use strict pattern to only match option at the head. Example:
  scylla-test@amos-ubuntu-1404:~$ lsblk --help | grep -e '^\s*-D'
   -D, --discard        print discard capabilities

Signed-off-by: Amos Kong <amos@scylladb.com>
Message-Id: <4f0f318353a43664e27da8a66855f5831457f061.1489712867.git.amos@scylladb.com>
(cherry picked from commit 468df7dd5f)
2017-03-20 08:11:57 +02:00
Raphael S. Carvalho
66c4dcba8e database: serialize sstable cleanup
We're cleaning up sstables in parallel. That means cleanup may need
almost twice the disk space used by all sstables being cleaned up,
if almost all sstables need cleanup and every one will discard an
insignificant portion of its whole data.
Given that cleanup is frequently issued when node is running out of
disk space, we should serialize cleanups in every shard to decrease
the disk space requirement.

Fixes #192.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170317022911.10306-1-raphaelsc@scylladb.com>
(cherry picked from commit 7deeffc953)
2017-03-19 17:16:33 +02:00
Pekka Enberg
7cfdc08af9 cql3: Wire up functions for floating-point types
Fixes #2168
Message-Id: <1489661748-13924-1-git-send-email-penberg@scylladb.com>

(cherry picked from commit 3afd7f39b5)
2017-03-17 11:14:51 +02:00
Pekka Enberg
fdbe5caf41 Update scylla-ami submodule
* dist/ami/files/scylla-ami eedd12f...407e8f3 (1):
  > scylla_create_devices: check block device is exists

Fixes #2171
2017-03-17 11:14:17 +02:00
Tomasz Grabiec
522e62089b lsa: Fix debug-mode compilation error
By moving definitions of setters out of #ifdef

(cherry picked from commit 3609665b19)
2017-03-16 18:24:27 +01:00
Avi Kivity
699648d5a1 Merge "tests: Use allocating_section in lsa_async_eviction_test" from Tomasz
"The test allocates objects in batches (allocation is always under a reclaim
lock) of ~3MiB and assumes that it will always succeed because if we cross the
low water mark for free memory (20MiB) in seastar, reclamation will be
performed between the batches, asynchronously.

Unfortunately that's prevented by can_allocate_more_memory(), which fails
segment allocation when we're below the low water mark. LSA currently doesn't
allow allocating below the low water mark.

The solution which is employed across the code base is to use allocating_section,
so use it here as well.

Exposed by recent consistent failures on branch-1.7."

* 'tgrabiec/fix-lsa-async-eviction-test' of github.com:cloudius-systems/seastar-dev:
  tests: lsa_async_eviction_test: Allocate objects under allocating section
  lsa: Allow adjusting reserves in allocating_section

(cherry picked from commit 434a4fee28)
2017-03-16 12:44:54 +02:00
Calle Wilund
698a4e62d9 commitlog_replayer: Make replay parallel per shard
Fixes #2098

Replay previously did all segments in parallel on shard 0, which
caused heavy memory load. To reduce this and spread footprint
across shards, instead do X segments per shard, sequential per shard.

v2:
* Fixed whitespace errors

Message-Id: <1489503382-830-1-git-send-email-calle@scylladb.com>
(cherry picked from commit 078589c508)
2017-03-15 13:07:45 +02:00
Amnon Heiman
63bec22d28 database: requests_blocked_memory metric should be unique
Metrics name should be unique per type.

requests_blocked_memory was registered twice, one as a gauge and one as
derived.

This is not allowed.

Fixes #2165

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <20170314162826.25521-1-amnon@scylladb.com>
(cherry picked from commit 0a2eba1b94)
2017-03-15 12:43:01 +02:00
Amnon Heiman
3d14e6e802 storage_proxy: metrics should have unique name
Metrics should have unique names. This patch renames the
throttled_writes metric for the queue length to current_throttled_writes.

Without it, metrics will be reported twice under the same name, which
may cause errors in the prometheus server.

This could be related to scylladb/seastar#250

Fixes #2163.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <20170314081456.6392-1-amnon@scylladb.com>
(cherry picked from commit 295a981c61)
2017-03-15 12:43:01 +02:00
Glauber Costa
ea4a2dad96 raid script: improve test for mounted filesystem
The current test for whether or not the filesystem is mounted is weak
and will fail if multiple pieces of the hierarchy are mounted.

util-linux ships with a mountpoint command that does exactly that,
so we'll use that instead.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <1488742801-4907-1-git-send-email-glauber@scylladb.com>
(cherry picked from commit 2d620a25fb)
2017-03-13 17:04:58 +02:00
Glauber Costa
655e6197cb setup: support mount points in raid script
By default behavior is kept the same. There are deployments in which we
would like to mount data and commitlog to different places - as much as
we have avoided this up until this moment.

One example is EC2, where users may want to have the commitlog mounted
in the SSD drives for faster writes but keep the data in larger, less
expensive and durable EBS volumes.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <1488258215-2592-1-git-send-email-glauber@scylladb.com>
(cherry picked from commit 9e61a73654)
2017-03-13 16:51:15 +02:00
Asias He
1a1370d33e repair: Fix midpoint is not contained in the split range assertion in split_and_add
We have:

  auto halves = range.split(midpoint, dht::token_comparator());

We saw a case where midpoint == range.start. As a result, range.split
will assert because range.start is marked non-inclusive, so the
midpoint doesn't appear to be contain()ed in the range, hence the
assertion failure.

Fixes #2148

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Signed-off-by: Asias He <asias@scylladb.com>
Message-Id: <93af2697637c28fbca261ddfb8375a790824df65.1489023933.git.asias@scylladb.com>
(cherry picked from commit 39d2e59e7e)
2017-03-09 09:16:57 +01:00
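
The failure mode in the commit above follows directly from how an exclusive start bound behaves. The sketch below is illustrative only: `token_range` and `try_split` are hypothetical stand-ins (not dht::token_range or the real range.split), integers stand in for tokens, and the guard shown is one possible way for a caller to avoid the assertion, not necessarily what the patch does.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>

// Hypothetical range with an exclusive start and inclusive end: (start, end].
struct token_range {
    std::int64_t start; // exclusive
    std::int64_t end;   // inclusive

    bool contains(std::int64_t t) const {
        return start < t && t <= end;
    }
};

// Splits r at midpoint into (start, midpoint] and (midpoint, end]. Returns
// std::nullopt when the midpoint is not contained in r (for example when it
// equals the exclusive start) or when the second half would be empty.
std::optional<std::pair<token_range, token_range>> try_split(const token_range& r, std::int64_t midpoint) {
    assert(r.start <= r.end); // sanity check on input
    if (!r.contains(midpoint) || midpoint == r.end) {
        return std::nullopt;
    }
    return std::make_pair(token_range{r.start, midpoint}, token_range{midpoint, r.end});
}
```

For the assumed range (10, 20], a midpoint of 10 is rejected because 10 is not contained in the range, while a midpoint of 15 yields (10, 15] and (15, 20].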
Paweł Dziepak
7f17424a4e Merge "Avoid losing changes to keyspace parameters of system_auth and tracing keyspaces" from Tomek
"If a node is bootstrapped with auto_bootstrap disabled, it will not
wait for schema sync before creating global keyspaces for auth and
tracing. When such schema changes are then reconciled with schema on
other nodes, they may overwrite changes made by the user before the
node was started, because they will have higher timestamp.

To prevent that, let's use the minimum timestamp so that the default schema
always loses to manual modifications. This is what Cassandra does.

Fixes #2129."

* tag 'tgrabiec/prevent-keyspace-metadata-loss-v1' of github.com:scylladb/seastar-dev:
  db: Create default auth and tracing keyspaces using lowest timestamp
  migration_manager: Append actual keyspace mutations with schema notifications

(cherry picked from commit 6db6d25f66)
2017-03-08 16:31:41 +02:00
Nadav Har'El
dd56f1bec7 sstable decompression: fix skip() to end of file
The skip() implementation for the compressed file input stream incorrectly
handled the case of skipping to the end of file: In that case we just need
to update the file pointer, but not skip anywhere in the compressed disk
file; In particular, we must NOT call locate() to find the relevant on-disk
compressed chunk, because there is none - locate() can only be called on
actual positions of bytes, not on the one-past-end-of-file position.

Fixes #2143

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20170308100057.23316-1-nyh@scylladb.com>
(cherry picked from commit 506e074ba4)
2017-03-08 12:35:39 +02:00
Pekka Enberg
5df61797d6 release: prepare for 1.7.rc1 2017-03-08 12:25:34 +02:00
Paweł Dziepak
b6db9e3d51 db: make do_apply_counter_update() propagate timeout to db_apply()
db_apply() expects to be given a time point at which the request will
time out. Originally, do_apply_counter_update() passed 0, which meant
that all requests were timed out if do_apply() needed to wait. The
caller of do_apply_counter_update() is already given a correct timeout
time point so the only thing needed to fix this problem is to propagate
it properly inside do_apply_counter_update() to the call to do_apply().

Fixes #2119.
Message-Id: <20170307104405.5843-1-pdziepak@scylladb.com>
2017-03-07 12:44:11 +01:00
Gleb Natapov
f2595bea85 memtable: do not open code logalloc::reclaim_lock use
logalloc::reclaim_lock prevents reclaim from running, which may cause
regular allocation to fail although there is enough free memory.
To solve that, there is an allocating_section which acquires reclaim_lock
and, if allocation fails, runs the reclaimer outside of the lock and retries
the allocation. The patch makes use of allocating_section instead of
direct use of reclaim_lock in memtable code.

Fixes #2138.

Message-Id: <20170306160050.GC5902@scylladb.com>
(cherry picked from commit d7bdf16a16)
2017-03-07 11:16:15 +02:00
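
The retry scheme described in the commit above (allocate under the lock; on failure, reclaim outside the lock and try again) can be modelled roughly as below. This is a toy sketch under stated assumptions, not the logalloc implementation: `toy_region`, its `reclaim_locked` flag and `with_retry` are hypothetical, and "reclaiming" is reduced to bumping a counter.

```cpp
#include <cassert>
#include <new>

// Hypothetical toy model: reclaim_locked plays the role of reclaim_lock,
// and reclaims counts how often reclamation ran outside the lock.
struct toy_region {
    bool reclaim_locked = false;
    int reclaims = 0;
};

// Runs alloc() with reclamation disabled. If it throws std::bad_alloc, the
// lock is dropped, reclamation runs, and the allocation is retried, up to
// max_attempts times in total.
template <typename Func>
auto with_retry(toy_region& r, Func alloc, int max_attempts = 3) -> decltype(alloc()) {
    assert(max_attempts >= 1);
    for (int attempt = 1;; ++attempt) {
        r.reclaim_locked = true;
        try {
            auto result = alloc();
            r.reclaim_locked = false;
            return result;
        } catch (const std::bad_alloc&) {
            r.reclaim_locked = false; // never reclaim while holding the lock
            if (attempt == max_attempts) {
                throw;
            }
            ++r.reclaims; // stand-in for running the reclaimer
        }
    }
}
```

Keeping the lock scope limited to the allocation attempt is what lets reclamation make progress between attempts.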
Gleb Natapov
e930ef0ee0 memtable: do not yield while holding reclaim_lock
Holding reclaim_lock while yielding may cause memory allocations to
fail.

Fixes #2139

Message-Id: <20170306153151.GA5902@scylladb.com>
(cherry picked from commit 5c4158daac)
2017-03-06 18:35:46 +02:00
Takuya ASADA
4cf0f88724 dist/redhat: enables discard on CentOS/RHEL RAID0
Since the CentOS/RHEL raid module disables discard by default, we need to
enable it again in order to use it.

Fixes #2033

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1488407037-4795-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit 6602221442)
2017-03-06 12:22:17 +02:00
Avi Kivity
372f07b06e Update scylla-ami submodule
* dist/ami/files/scylla-ami d5a4397...eedd12f (3):
  > Rewrite disk discovery to handle EBS and NVMEs.
  > add --developer-mode option
  > trivial cleanup: replace tab in indent
2017-03-04 13:31:08 +02:00
Tomasz Grabiec
0ccc6630a8 db: Fix overflow of gc_clock time point
If query_time is time_point::min(), which is used by
to_data_query_result(), the result of subtraction of
gc_grace_seconds() from query_time will overflow.

I don't think this bug would currently have user-perceivable
effects. This affects which tombstones are dropped, but in case of
to_data_query_result() uses, tombstones are not present in the final
data query result, and mutation_partition::do_compact() takes
tombstones into consideration while compacting before expiring them.

Fixes the following UBSAN report:

  /usr/include/c++/5.3.1/chrono:399:55: runtime error: signed integer overflow: -2147483648 - 604800 cannot be represented in type 'int'

Message-Id: <1488385429-14276-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 4b6e77e97e)
2017-03-01 18:50:19 +02:00
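
The overflow in the commit above is easy to reproduce with any time point whose representation is a 32-bit integer. The sketch below shows one way to subtract a grace period without overflowing; it is an assumption-laden illustration, not the fix from the patch. `gc_duration`, `gc_time_point` and `saturating_sub` are hypothetical names, and the 32-bit seconds representation is chosen only to mirror the 'int' in the UBSAN report.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical clock types with a 32-bit seconds representation.
using gc_duration = std::chrono::duration<std::int32_t>;
using gc_time_point = std::chrono::time_point<std::chrono::system_clock, gc_duration>;

// Computes query_time - grace, clamping to time_point::min() when the exact
// result would not fit. The arithmetic is done in 64 bits, so it cannot
// overflow for any pair of 32-bit inputs.
gc_time_point saturating_sub(gc_time_point query_time, gc_duration grace) {
    assert(grace.count() >= 0); // a grace period is never negative here
    const std::int64_t t = query_time.time_since_epoch().count();
    const std::int64_t g = grace.count();
    const std::int64_t lowest = gc_time_point::min().time_since_epoch().count();
    const std::int64_t result = t - g;
    if (result < lowest) {
        return gc_time_point::min();
    }
    return gc_time_point(gc_duration(static_cast<std::int32_t>(result)));
}
```

With a grace period of 604800 seconds (the value in the report), subtracting from time_point::min() stays at time_point::min() instead of wrapping.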
Takuya ASADA
b95a2338be dist/debian/dep: fix broken link of gcc-5, update it to 5.4.1-5
Since gcc-5/stretch=5.4.1-2 was removed from the apt repository, we are no
longer able to build gcc-5.

To avoid dead link, use launchpad.net archives instead of using apt-get source.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1488189378-5607-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit ba323e2074)
2017-03-01 17:13:42 +02:00
Tomasz Grabiec
f2d0ac9994 query: Fix invalid initialization of _memory_tracker by moving-from-self
Fixes the following UBSAN warning:

  core/semaphore.hh:293:74: runtime error: reference binding to misaligned address 0x0000006c55d7 for type 'struct basic_semaphore', which requires 8 byte alignment

Since the field was not initialized properly, this probably also fixes some
user-visible bug.
Message-Id: <1488368222-32009-1-git-send-email-tgrabiec@scylladb.com>

(cherry picked from commit 0c84f00b16)
2017-03-01 11:56:49 +00:00
Gleb Natapov
56725de0db sstable: close sstable_writer's file if writing of sstable fails.
Failing to close a file properly before destroying file's object causes
crashes.

[tgrabiec: fixed typo]

Fixes #2122.

Message-Id: <20170221144858.GG11471@scylladb.com>
(cherry picked from commit 0977f4fdf8)
2017-02-28 11:04:26 +02:00
Avi Kivity
6f479c8999 Update seastar submodule
* seastar b14373b...f391f9e (1):
  > fix append_challenged_posix_file_impl::process_queue() to handle recursion

Fixes #2121.
2017-02-28 10:55:54 +02:00
Calle Wilund
8c0488bce9 messaging_service: Move log printout to actual listen start
Fixes  #1845
Log printout was before we actually had evaluated endpoint
to create, thus never included SSL info.
Message-Id: <1487766738-27797-1-git-send-email-calle@scylladb.com>

(cherry picked from commit d5f57bd047)
2017-02-23 13:18:33 +02:00
Avi Kivity
68dd11e275 config: enable new sharding algorithm for new deployments
Set murmur3_partitioner_ignore_msb_bits to 12 (enabling the new sharding
algorithm), but do this in scylla.yaml rather than the built-in defaults.
This avoids changing the configuration for existing clusters, as their
scylla.yaml file will not be updated during the upgrade.
Message-Id: <20170214123253.3933-1-avi@scylladb.com>

(cherry picked from commit 9b113ffd3e)
2017-02-22 11:23:46 +01:00
Tomasz Grabiec
a64c53d05f Update seastar submodule
* seastar fc27cec...b14373b (1):
  > reactor utilization should return the utilization in 0-1 range
2017-02-22 09:38:17 +01:00
Paweł Dziepak
42e7a59cca tests/cql_test_env: wait for storage service initialization
Message-Id: <20170221121130.14064-1-pdziepak@scylladb.com>
(cherry picked from commit 274bcd415a)
2017-02-21 17:06:10 +02:00
Avi Kivity
2cd019ee47 Merge "Fixes for counter cell locking" from Paweł
"This series contains some fixes and a unit test for the logic responsible
for locking counter cells."

* 'pdziepak/cell-locking-fixes/v1' of github.com:cloudius-systems/seastar-dev:
  tests: add test for counter cell locker
  cell_locking: fix schema upgrades
  cell_locker: make locker non-movable
  cell_locking: allow to be included by anyone

(cherry picked from commit b8c4b35b57)
2017-02-15 17:37:38 +02:00
Takuya ASADA
bc8b553bec dist/redhat: stop backporting ninja-build from Fedora, install it from EPEL instead
ninja-build-1.6.0-2.fc23.src.rpm on the Fedora web site was deleted for some
reason, but there is ninja-build-1.7.2-2 on EPEL, so we don't need to
backport from Fedora anymore.

Fixes #2087

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1487155729-13257-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit 9c8515eeed)
2017-02-15 12:58:44 +02:00
Avi Kivity
0ba98be899 Update seastar submodule
* seastar bff963a...fc27cec (1):
  > collectd: send double correctly for gauge
2017-02-14 16:09:22 +02:00
Avi Kivity
d6899134a7 Update seastar submodule
* seastar f07f8ed...bff963a (1):
  > prometheus: send one MetricFamily per unique metric name
2017-02-13 11:50:43 +02:00
Avi Kivity
5253031110 seastar: point submodule at scylla-seastar.git
Allows backporting seastar patches independently of master.
2017-02-13 11:49:54 +02:00
Avi Kivity
a203c87f0d Merge "Disallow mixed schemas" from Paweł
"This series makes sure that schemas containing both counter and non-counter
regular or static columns are not allowed."

* 'pdziepak/disallow-mixed-schemas/v1' of github.com:cloudius-systems/seastar-dev:
  schema: verify that there are no both counter and non-counter columns
  test/mutation_source: specify whether to generate counter mutations
  tests/canonical_mutation: don't try to upgrade incompatible schemas

(cherry picked from commit 9e4ae0763d)
2017-02-07 18:04:24 +02:00
Gleb Natapov
37fc0e6840 storage_proxy: use storage_proxy clock instead of explicit lowres_clock
Merge commit 45b6070832 used a butchered version of the storage_proxy
patch to adjust to the rpc timer change instead of the one I sent. This
patch fixes the differences.

Message-Id: <20170206095237.GA7691@scylladb.com>
(cherry picked from commit 3c372525ed)
2017-02-06 12:51:52 +02:00
Avi Kivity
0429e5d8ea cell_locking: work around for missing boost::container::small_vector
small_vector doesn't exist on Ubuntu 14.04's boost, use std::vector
instead.

(cherry picked from commit 6e9e28d5a3)
2017-02-05 20:49:43 +02:00
Avi Kivity
3c147437ac dist: add build dependency on automake
Needed by seastar's c-ares.

(cherry picked from commit 2510b756fc)
2017-02-05 20:17:27 +02:00
Takuya ASADA
e4b3f02286 dist/common/systemd: introduce scylla-housekeeping restart mode
scylla-housekeeping needs to run in 'restart mode' to check the version during
scylla-server restart, but this wasn't invoked from a systemd timer, so add it.

The existing scylla-housekeeping.timer is renamed to scylla-housekeeping-daily.timer,
since it runs 'daily mode'.

Fixes #1953

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1486180031-18093-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit e82932b774)
2017-02-05 11:28:03 +02:00
Avi Kivity
5a8013e155 dist: add libtool build dependency for seastar/c-ares
(cherry picked from commit 4175f40da1)
2017-02-05 11:27:38 +02:00
Pekka Enberg
fdba5b8eac release: prepare for 1.7.rc0 2017-02-04 11:04:32 +02:00
Paweł Dziepak
558a52802a cell_locking: fix parititon_entry::equal_compare
The comparator constructor took schema by value instead of const l-ref
and, consequently, later tried to access an object that had been destroyed
long ago.
Message-Id: <20170202135853.8190-1-pdziepak@scylladb.com>

(cherry picked from commit 37b0c71f1d)
2017-02-03 21:28:42 +02:00
Avi Kivity
4f416c7272 Merge "Avoid avalanche of tasks after memtable flush" from Tomasz
"Before, the logic for releasing writes blocked on dirty worked like this:

  1) When region group size changes and it is not under pressure and there
     are some requests blocked, then schedule request releasing task

  2) request releasing task, if no pressure, runs one request and if there are
     still blocked requests, schedules next request releasing task

If requests don't change the size of the region group, then either some request
executes or there is a request releasing task scheduled. The amount of scheduled
tasks is at most 1, there is a single releasing thread.

However, if requests themselves would change the size of the group, then each
such change would schedule yet another request releasing thread, growing the task
queue size by one.

The group size can also change when memory is reclaimed from the groups (e.g.
when they contain sparse segments). Compaction may start many request releasing
threads due to group size updates.

Such behavior is detrimental for performance and stability if there are a lot
of blocked requests. This can happen on 1.5 even with modest concurrency
because timed out requests stay in the queue. This is less likely on 1.6 where
they are dropped from the queue.

The releasing of tasks may start to dominate over other processes in the
system. When the amount of scheduled tasks reaches 1000, polling stops and
server becomes unresponsive until all of the released requests are done, which
is either when they start to block on dirty memory again or run out of blocked
requests. It may take a while to reach pressure condition after memtable flush
if it brings virtual dirty much below the threshold, which is currently the
case for workloads with overwrites producing sparse regions.

I saw this happening in a write workload from issue #2021 where the number of
request releasing threads grew into thousands.

Fix by ensuring there is at most one request releasing thread at a time. There
will be one releasing fiber per region group which is woken up when pressure is
lifted. It executes blocked requests until pressure occurs."

* tag 'tgrabiec/lsa-single-threaded-releasing-v2' of github.com:cloudius-systems/seastar-dev:
  tests: lsa: Add test for reclaimer starting and stopping
  tests: lsa: Add request releasing stress test
  lsa: Avoid avalanche releasing of requests
  lsa: Move definitions to .cc
  lsa: Simplify hard pressure notification management
  lsa: Do not start or stop reclaiming on hard pressure
  tests: lsa: Adjust to take into account that reclaimers are run synchronously
  lsa: Document and annotate reclaimer notification callbacks
  tests: lsa: Use with_timeout() in quiesce()

(cherry picked from commit 7a00dd6985)
2017-02-03 09:47:50 +01:00
4093 changed files with 81581 additions and 311156 deletions


@@ -1,4 +0,0 @@
.git
build
seastar/build
testlog

87
.github/CODEOWNERS vendored

@@ -1,87 +0,0 @@
# AUTH
auth/* @elcallio @vladzcloudius
# CACHE
row_cache* @tgrabiec @haaawk
*mutation* @tgrabiec @haaawk
tests/mvcc* @tgrabiec @haaawk
# CDC
cdc/* @haaawk @kbr- @elcallio @piodul @jul-stas
test/cql/cdc_* @haaawk @kbr- @elcallio @piodul @jul-stas
test/boost/cdc_* @haaawk @kbr- @elcallio @piodul @jul-stas
# COMMITLOG / BATCHLOG
db/commitlog/* @elcallio
db/batch* @elcallio
# COORDINATOR
service/storage_proxy* @gleb-cloudius
# COMPACTION
sstables/compaction* @raphaelsc @nyh
# CQL TRANSPORT LAYER
transport/* @penberg
# CQL QUERY LANGUAGE
cql3/* @tgrabiec @penberg @psarna
# COUNTERS
counters* @haaawk @jul-stas
tests/counter_test* @haaawk @jul-stas
# GOSSIP
gms/* @tgrabiec @asias
# DOCKER
dist/docker/* @penberg
# LSA
utils/logalloc* @tgrabiec
# MATERIALIZED VIEWS
db/view/* @nyh @psarna
cql3/statements/*view* @nyh @psarna
test/boost/view_* @nyh @psarna
# PACKAGING
dist/* @syuu1228
# REPAIR
repair/* @tgrabiec @asias @nyh
# SCHEMA MANAGEMENT
db/schema_tables* @tgrabiec @nyh
db/legacy_schema_migrator* @tgrabiec @nyh
service/migration* @tgrabiec @nyh
schema* @tgrabiec @nyh
# SECONDARY INDEXES
db/index/* @nyh @penberg @psarna
cql3/statements/*index* @nyh @penberg @psarna
test/boost/*index* @nyh @penberg @psarna
# SSTABLES
sstables/* @tgrabiec @raphaelsc @nyh
# STREAMING
streaming/* @tgrabiec @asias
service/storage_service.* @tgrabiec @asias
# ALTERNATOR
alternator/* @nyh @psarna
test/alternator/* @nyh @psarna
# HINTED HANDOFF
db/hints/* @haaawk @piodul @vladzcloudius
# REDIS
redis/* @nyh @syuu1228
redis-test/* @nyh @syuu1228
# READERS
reader_* @denesb
querier* @denesb
test/boost/mutation_reader_test.cc @denesb
test/boost/querier_cache_test.cc @denesb


@@ -1,9 +1,3 @@
This is Scylla's bug tracker, to be used for reporting bugs only.
If you have a question about Scylla, and not a bug, please ask it in
our mailing-list at scylladb-dev@googlegroups.com or in our slack channel.
- [] I have read the disclaimer above, and I am reporting a suspected malfunction in Scylla.
*Installation details*
Scylla version (or git commit hash):
Cluster size:

16
.gitignore vendored
View File

@@ -9,19 +9,3 @@ dist/ami/files/*.rpm
dist/ami/variables.json
dist/ami/scylla_deploy.sh
*.pyc
Cql.tokens
.kdev4
*.kdev4
CMakeLists.txt.user
.cache
.tox
*.egg-info
__pycache__CMakeLists.txt.user
.gdbinit
resources
.pytest_cache
/expressions.tokens
tags
testlog
test/*/*.reject
.vscode

20
.gitmodules vendored

@@ -1,23 +1,11 @@
[submodule "seastar"]
path = seastar
url = ../seastar
url = ../scylla-seastar
ignore = dirty
[submodule "swagger-ui"]
path = swagger-ui
url = ../scylla-swagger-ui
ignore = dirty
[submodule "libdeflate"]
path = libdeflate
url = ../libdeflate
[submodule "abseil"]
path = abseil
url = ../abseil-cpp
[submodule "scylla-jmx"]
path = tools/jmx
url = ../scylla-jmx
[submodule "scylla-tools"]
path = tools/java
url = ../scylla-tools-java
[submodule "scylla-python3"]
path = tools/python3
url = ../scylla-python3
[submodule "dist/ami/files/scylla-ami"]
path = dist/ami/files/scylla-ami
url = ../scylla-ami


@@ -1,755 +0,0 @@
cmake_minimum_required(VERSION 3.18)
project(scylla)
if(NOT CMAKE_BUILD_TYPE AND NOT CMAKE_CONFIGURATION_TYPES)
message(STATUS "Setting build type to 'Release' as none was specified.")
set(CMAKE_BUILD_TYPE "Release" CACHE
STRING "Choose the type of build." FORCE)
# Set the possible values of build type for cmake-gui
set_property(CACHE CMAKE_BUILD_TYPE PROPERTY STRINGS
"Debug" "Release" "Dev" "Sanitize")
endif()
if(CMAKE_BUILD_TYPE)
string(TOLOWER "${CMAKE_BUILD_TYPE}" BUILD_TYPE)
else()
set(BUILD_TYPE "release")
endif()
function(default_target_arch arch)
set(x86_instruction_sets i386 i686 x86_64)
if(CMAKE_SYSTEM_PROCESSOR IN_LIST x86_instruction_sets)
set(${arch} "westmere" PARENT_SCOPE)
elseif(CMAKE_SYSTEM_PROCESSOR EQUAL "aarch64")
set(${arch} "armv8-a+crc+crypto" PARENT_SCOPE)
else()
set(${arch} "" PARENT_SCOPE)
endif()
endfunction()
default_target_arch(target_arch)
if(target_arch)
set(target_arch_flag "-march=${target_arch}")
endif()
# Configure Seastar compile options to align with Scylla
set(Seastar_CXX_FLAGS -fcoroutines ${target_arch_flag} CACHE INTERNAL "" FORCE)
set(Seastar_CXX_DIALECT gnu++20 CACHE INTERNAL "" FORCE)
add_subdirectory(seastar)
add_subdirectory(abseil)
# Exclude absl::strerror from the default "all" target since it's not
# used in Scylla build and, moreover, makes use of deprecated glibc APIs,
# such as sys_nerr, which are not exposed from "stdio.h" since glibc 2.32,
# which happens to be the case for recent Fedora distribution versions.
#
# Need to use the internal "absl_strerror" target name instead of namespaced
# variant because `set_target_properties` does not understand the latter form,
# unfortunately.
set_target_properties(absl_strerror PROPERTIES EXCLUDE_FROM_ALL TRUE)
# System libraries dependencies
find_package(Boost COMPONENTS filesystem program_options system thread regex REQUIRED)
find_package(Lua REQUIRED)
find_package(ZLIB REQUIRED)
find_package(ICU COMPONENTS uc REQUIRED)
set(scylla_build_dir "${CMAKE_BINARY_DIR}/build/${BUILD_TYPE}")
set(scylla_gen_build_dir "${scylla_build_dir}/gen")
file(MAKE_DIRECTORY "${scylla_build_dir}" "${scylla_gen_build_dir}")
# Place libraries, executables and archives in ${buildroot}/build/${mode}/
foreach(mode RUNTIME LIBRARY ARCHIVE)
set(CMAKE_${mode}_OUTPUT_DIRECTORY "${scylla_build_dir}")
endforeach()
# Generate C++ source files from thrift definitions
function(scylla_generate_thrift)
set(one_value_args TARGET VAR IN_FILE OUT_DIR SERVICE)
cmake_parse_arguments(args "" "${one_value_args}" "" ${ARGN})
get_filename_component(in_file_name ${args_IN_FILE} NAME_WE)
set(aux_out_file_name ${args_OUT_DIR}/${in_file_name})
set(outputs
${aux_out_file_name}_types.cpp
${aux_out_file_name}_types.h
${aux_out_file_name}_constants.cpp
${aux_out_file_name}_constants.h
${args_OUT_DIR}/${args_SERVICE}.cpp
${args_OUT_DIR}/${args_SERVICE}.h)
add_custom_command(
DEPENDS
${args_IN_FILE}
thrift
OUTPUT ${outputs}
COMMAND ${CMAKE_COMMAND} -E make_directory ${args_OUT_DIR}
COMMAND thrift -gen cpp:cob_style,no_skeleton -out "${args_OUT_DIR}" "${args_IN_FILE}")
add_custom_target(${args_TARGET}
DEPENDS ${outputs})
set(${args_VAR} ${outputs} PARENT_SCOPE)
endfunction()
scylla_generate_thrift(
TARGET scylla_thrift_gen_cassandra
VAR scylla_thrift_gen_cassandra_files
IN_FILE interface/cassandra.thrift
OUT_DIR ${scylla_gen_build_dir}
SERVICE Cassandra)
# Parse antlr3 grammar files and generate C++ sources
function(scylla_generate_antlr3)
set(one_value_args TARGET VAR IN_FILE OUT_DIR)
cmake_parse_arguments(args "" "${one_value_args}" "" ${ARGN})
get_filename_component(in_file_pure_name ${args_IN_FILE} NAME)
get_filename_component(stem ${in_file_pure_name} NAME_WE)
set(outputs
"${args_OUT_DIR}/${stem}Lexer.hpp"
"${args_OUT_DIR}/${stem}Lexer.cpp"
"${args_OUT_DIR}/${stem}Parser.hpp"
"${args_OUT_DIR}/${stem}Parser.cpp")
add_custom_command(
DEPENDS
${args_IN_FILE}
OUTPUT ${outputs}
# Remove `#if 0` ... `#endif` blocks from the grammar source code
COMMAND sed -e "/^#if 0/,/^#endif/d" "${args_IN_FILE}" > "${args_OUT_DIR}/${in_file_pure_name}"
COMMAND antlr3 "${args_OUT_DIR}/${in_file_pure_name}"
# We replace many local `ExceptionBaseType* ex` variables with a single function-scope one.
# Because we add such a variable to every function, and because `ExceptionBaseType` is not a global
# name, we also add a global typedef to avoid compilation errors.
COMMAND sed -i -e "/^.*On :.*$/d" "${args_OUT_DIR}/${stem}Lexer.hpp"
COMMAND sed -i -e "/^.*On :.*$/d" "${args_OUT_DIR}/${stem}Lexer.cpp"
COMMAND sed -i -e "/^.*On :.*$/d" "${args_OUT_DIR}/${stem}Parser.hpp"
COMMAND sed -i
-e "s/^\\( *\\)\\(ImplTraits::CommonTokenType\\* [a-zA-Z0-9_]* = NULL;\\)$/\\1const \\2/"
-e "/^.*On :.*$/d"
-e "1i using ExceptionBaseType = int;"
-e "s/^{/{ ExceptionBaseType\\* ex = nullptr;/; s/ExceptionBaseType\\* ex = new/ex = new/; s/exceptions::syntax_exception e/exceptions::syntax_exception\\& e/"
"${args_OUT_DIR}/${stem}Parser.cpp"
VERBATIM)
add_custom_target(${args_TARGET}
DEPENDS ${outputs})
set(${args_VAR} ${outputs} PARENT_SCOPE)
endfunction()
set(antlr3_grammar_files
cql3/Cql.g
alternator/expressions.g)
set(antlr3_gen_files)
foreach(f ${antlr3_grammar_files})
get_filename_component(grammar_file_name "${f}" NAME_WE)
get_filename_component(f_dir "${f}" DIRECTORY)
scylla_generate_antlr3(
TARGET scylla_antlr3_gen_${grammar_file_name}
VAR scylla_antlr3_gen_${grammar_file_name}_files
IN_FILE ${f}
OUT_DIR ${scylla_gen_build_dir}/${f_dir})
list(APPEND antlr3_gen_files "${scylla_antlr3_gen_${grammar_file_name}_files}")
endforeach()
# Generate C++ sources from ragel grammar files
seastar_generate_ragel(
TARGET scylla_ragel_gen_protocol_parser
VAR scylla_ragel_gen_protocol_parser_file
IN_FILE redis/protocol_parser.rl
OUT_FILE ${scylla_gen_build_dir}/redis/protocol_parser.hh)
# Generate C++ sources from Swagger definitions
set(swagger_files
api/api-doc/cache_service.json
api/api-doc/collectd.json
api/api-doc/column_family.json
api/api-doc/commitlog.json
api/api-doc/compaction_manager.json
api/api-doc/config.json
api/api-doc/endpoint_snitch_info.json
api/api-doc/error_injection.json
api/api-doc/failure_detector.json
api/api-doc/gossiper.json
api/api-doc/hinted_handoff.json
api/api-doc/lsa.json
api/api-doc/messaging_service.json
api/api-doc/storage_proxy.json
api/api-doc/storage_service.json
api/api-doc/stream_manager.json
api/api-doc/system.json
api/api-doc/utils.json)
set(swagger_gen_files)
foreach(f ${swagger_files})
get_filename_component(fname "${f}" NAME_WE)
get_filename_component(dir "${f}" DIRECTORY)
seastar_generate_swagger(
TARGET scylla_swagger_gen_${fname}
VAR scylla_swagger_gen_${fname}_files
IN_FILE "${f}"
OUT_DIR "${scylla_gen_build_dir}/${dir}")
list(APPEND swagger_gen_files "${scylla_swagger_gen_${fname}_files}")
endforeach()
# Create C++ bindings for IDL serializers
function(scylla_generate_idl_serializer)
set(one_value_args TARGET VAR IN_FILE OUT_FILE)
cmake_parse_arguments(args "" "${one_value_args}" "" ${ARGN})
get_filename_component(out_dir ${args_OUT_FILE} DIRECTORY)
set(idl_compiler "${CMAKE_SOURCE_DIR}/idl-compiler.py")
find_package(Python3 COMPONENTS Interpreter)
add_custom_command(
DEPENDS
${args_IN_FILE}
${idl_compiler}
OUTPUT ${args_OUT_FILE}
COMMAND ${CMAKE_COMMAND} -E make_directory ${out_dir}
COMMAND Python3::Interpreter ${idl_compiler} --ns ser -f ${args_IN_FILE} -o ${args_OUT_FILE})
add_custom_target(${args_TARGET}
DEPENDS ${args_OUT_FILE})
set(${args_VAR} ${args_OUT_FILE} PARENT_SCOPE)
endfunction()
set(idl_serializers
idl/cache_temperature.idl.hh
idl/commitlog.idl.hh
idl/consistency_level.idl.hh
idl/frozen_mutation.idl.hh
idl/frozen_schema.idl.hh
idl/gossip_digest.idl.hh
idl/idl_test.idl.hh
idl/keys.idl.hh
idl/messaging_service.idl.hh
idl/mutation.idl.hh
idl/paging_state.idl.hh
idl/partition_checksum.idl.hh
idl/paxos.idl.hh
idl/query.idl.hh
idl/range.idl.hh
idl/read_command.idl.hh
idl/reconcilable_result.idl.hh
idl/replay_position.idl.hh
idl/result.idl.hh
idl/ring_position.idl.hh
idl/streaming.idl.hh
idl/token.idl.hh
idl/tracing.idl.hh
idl/truncation_record.idl.hh
idl/uuid.idl.hh
idl/view.idl.hh)
set(idl_gen_files)
foreach(f ${idl_serializers})
get_filename_component(idl_name "${f}" NAME)
get_filename_component(idl_target "${idl_name}" NAME_WE)
get_filename_component(idl_dir "${f}" DIRECTORY)
string(REPLACE ".idl.hh" ".dist.hh" idl_out_hdr_name "${idl_name}")
scylla_generate_idl_serializer(
TARGET scylla_idl_gen_${idl_target}
VAR scylla_idl_gen_${idl_target}_files
IN_FILE ${f}
OUT_FILE ${scylla_gen_build_dir}/${idl_dir}/${idl_out_hdr_name})
list(APPEND idl_gen_files "${scylla_idl_gen_${idl_target}_files}")
endforeach()
set(scylla_sources
absl-flat_hash_map.cc
alternator/auth.cc
alternator/base64.cc
alternator/conditions.cc
alternator/executor.cc
alternator/expressions.cc
alternator/serialization.cc
alternator/server.cc
alternator/stats.cc
alternator/streams.cc
api/api.cc
api/cache_service.cc
api/collectd.cc
api/column_family.cc
api/commitlog.cc
api/compaction_manager.cc
api/config.cc
api/endpoint_snitch.cc
api/error_injection.cc
api/failure_detector.cc
api/gossiper.cc
api/hinted_handoff.cc
api/lsa.cc
api/messaging_service.cc
api/storage_proxy.cc
api/storage_service.cc
api/stream_manager.cc
api/system.cc
atomic_cell.cc
auth/allow_all_authenticator.cc
auth/allow_all_authorizer.cc
auth/authenticated_user.cc
auth/authentication_options.cc
auth/authenticator.cc
auth/common.cc
auth/default_authorizer.cc
auth/password_authenticator.cc
auth/passwords.cc
auth/permission.cc
auth/permissions_cache.cc
auth/resource.cc
auth/role_or_anonymous.cc
auth/roles-metadata.cc
auth/sasl_challenge.cc
auth/service.cc
auth/standard_role_manager.cc
auth/transitional.cc
bytes.cc
canonical_mutation.cc
cdc/cdc_partitioner.cc
cdc/generation.cc
cdc/log.cc
cdc/metadata.cc
cdc/split.cc
clocks-impl.cc
collection_mutation.cc
compress.cc
connection_notifier.cc
converting_mutation_partition_applier.cc
counters.cc
cql3/abstract_marker.cc
cql3/attributes.cc
cql3/cf_name.cc
cql3/column_condition.cc
cql3/column_identifier.cc
cql3/column_specification.cc
cql3/constants.cc
cql3/cql3_type.cc
cql3/expr/expression.cc
cql3/functions/aggregate_fcts.cc
cql3/functions/castas_fcts.cc
cql3/functions/error_injection_fcts.cc
cql3/functions/functions.cc
cql3/functions/user_function.cc
cql3/index_name.cc
cql3/keyspace_element_name.cc
cql3/lists.cc
cql3/maps.cc
cql3/operation.cc
cql3/query_options.cc
cql3/query_processor.cc
cql3/relation.cc
cql3/restrictions/statement_restrictions.cc
cql3/result_set.cc
cql3/role_name.cc
cql3/selection/abstract_function_selector.cc
cql3/selection/selectable.cc
cql3/selection/selection.cc
cql3/selection/selector.cc
cql3/selection/selector_factories.cc
cql3/selection/simple_selector.cc
cql3/sets.cc
cql3/single_column_relation.cc
cql3/statements/alter_keyspace_statement.cc
cql3/statements/alter_table_statement.cc
cql3/statements/alter_type_statement.cc
cql3/statements/alter_view_statement.cc
cql3/statements/authentication_statement.cc
cql3/statements/authorization_statement.cc
cql3/statements/batch_statement.cc
cql3/statements/cas_request.cc
cql3/statements/cf_prop_defs.cc
cql3/statements/cf_statement.cc
cql3/statements/create_function_statement.cc
cql3/statements/create_index_statement.cc
cql3/statements/create_keyspace_statement.cc
cql3/statements/create_table_statement.cc
cql3/statements/create_type_statement.cc
cql3/statements/create_view_statement.cc
cql3/statements/delete_statement.cc
cql3/statements/drop_function_statement.cc
cql3/statements/drop_index_statement.cc
cql3/statements/drop_keyspace_statement.cc
cql3/statements/drop_table_statement.cc
cql3/statements/drop_type_statement.cc
cql3/statements/drop_view_statement.cc
cql3/statements/function_statement.cc
cql3/statements/grant_statement.cc
cql3/statements/index_prop_defs.cc
cql3/statements/index_target.cc
cql3/statements/ks_prop_defs.cc
cql3/statements/list_permissions_statement.cc
cql3/statements/list_users_statement.cc
cql3/statements/modification_statement.cc
cql3/statements/permission_altering_statement.cc
cql3/statements/property_definitions.cc
cql3/statements/raw/parsed_statement.cc
cql3/statements/revoke_statement.cc
cql3/statements/role-management-statements.cc
cql3/statements/schema_altering_statement.cc
cql3/statements/select_statement.cc
cql3/statements/truncate_statement.cc
cql3/statements/update_statement.cc
cql3/statements/use_statement.cc
cql3/token_relation.cc
cql3/tuples.cc
cql3/type_json.cc
cql3/untyped_result_set.cc
cql3/update_parameters.cc
cql3/user_types.cc
cql3/ut_name.cc
cql3/util.cc
cql3/values.cc
cql3/variable_specifications.cc
data/cell.cc
database.cc
db/batchlog_manager.cc
db/commitlog/commitlog.cc
db/commitlog/commitlog_entry.cc
db/commitlog/commitlog_replayer.cc
db/config.cc
db/consistency_level.cc
db/cql_type_parser.cc
db/data_listeners.cc
db/extensions.cc
db/heat_load_balance.cc
db/hints/manager.cc
db/hints/resource_manager.cc
db/large_data_handler.cc
db/legacy_schema_migrator.cc
db/marshal/type_parser.cc
db/schema_tables.cc
db/size_estimates_virtual_reader.cc
db/snapshot-ctl.cc
db/sstables-format-selector.cc
db/system_distributed_keyspace.cc
db/system_keyspace.cc
db/view/row_locking.cc
db/view/view.cc
db/view/view_update_generator.cc
dht/boot_strapper.cc
dht/i_partitioner.cc
dht/murmur3_partitioner.cc
dht/range_streamer.cc
dht/token.cc
distributed_loader.cc
duration.cc
exceptions/exceptions.cc
flat_mutation_reader.cc
frozen_mutation.cc
frozen_schema.cc
gms/application_state.cc
gms/endpoint_state.cc
gms/failure_detector.cc
gms/feature_service.cc
gms/gossip_digest_ack.cc
gms/gossip_digest_ack2.cc
gms/gossip_digest_syn.cc
gms/gossiper.cc
gms/inet_address.cc
gms/version_generator.cc
gms/versioned_value.cc
hashers.cc
index/secondary_index.cc
index/secondary_index_manager.cc
init.cc
keys.cc
lister.cc
locator/abstract_replication_strategy.cc
locator/ec2_multi_region_snitch.cc
locator/ec2_snitch.cc
locator/everywhere_replication_strategy.cc
locator/gce_snitch.cc
locator/gossiping_property_file_snitch.cc
locator/local_strategy.cc
locator/network_topology_strategy.cc
locator/production_snitch_base.cc
locator/rack_inferring_snitch.cc
locator/simple_snitch.cc
locator/simple_strategy.cc
locator/snitch_base.cc
locator/token_metadata.cc
lua.cc
main.cc
memtable.cc
message/messaging_service.cc
multishard_mutation_query.cc
mutation.cc
raft/fsm.cc
raft/log.cc
raft/progress.cc
raft/raft.cc
raft/server.cc
mutation_fragment.cc
mutation_partition.cc
mutation_partition_serializer.cc
mutation_partition_view.cc
mutation_query.cc
mutation_reader.cc
mutation_writer/multishard_writer.cc
mutation_writer/shard_based_splitting_writer.cc
mutation_writer/timestamp_based_splitting_writer.cc
partition_slice_builder.cc
partition_version.cc
querier.cc
query-result-set.cc
query.cc
range_tombstone.cc
range_tombstone_list.cc
reader_concurrency_semaphore.cc
redis/abstract_command.cc
redis/command_factory.cc
redis/commands.cc
redis/keyspace_utils.cc
redis/lolwut.cc
redis/mutation_utils.cc
redis/options.cc
redis/query_processor.cc
redis/query_utils.cc
redis/server.cc
redis/service.cc
redis/stats.cc
repair/repair.cc
repair/row_level.cc
row_cache.cc
schema.cc
schema_mutations.cc
schema_registry.cc
service/client_state.cc
service/migration_manager.cc
service/migration_task.cc
service/misc_services.cc
service/pager/paging_state.cc
service/pager/query_pagers.cc
service/paxos/paxos_state.cc
service/paxos/prepare_response.cc
service/paxos/prepare_summary.cc
service/paxos/proposal.cc
service/priority_manager.cc
service/storage_proxy.cc
service/storage_service.cc
sstables/compaction.cc
sstables/compaction_manager.cc
sstables/compaction_strategy.cc
sstables/compress.cc
sstables/integrity_checked_file_impl.cc
sstables/kl/writer.cc
sstables/leveled_compaction_strategy.cc
sstables/m_format_read_helpers.cc
sstables/metadata_collector.cc
sstables/mp_row_consumer.cc
sstables/mx/writer.cc
sstables/partition.cc
sstables/prepended_input_stream.cc
sstables/random_access_reader.cc
sstables/size_tiered_compaction_strategy.cc
sstables/sstable_directory.cc
sstables/sstable_version.cc
sstables/sstables.cc
sstables/sstables_manager.cc
sstables/time_window_compaction_strategy.cc
sstables/writer.cc
streaming/progress_info.cc
streaming/session_info.cc
streaming/stream_coordinator.cc
streaming/stream_manager.cc
streaming/stream_plan.cc
streaming/stream_reason.cc
streaming/stream_receive_task.cc
streaming/stream_request.cc
streaming/stream_result_future.cc
streaming/stream_session.cc
streaming/stream_session_state.cc
streaming/stream_summary.cc
streaming/stream_task.cc
streaming/stream_transfer_task.cc
table.cc
table_helper.cc
thrift/controller.cc
thrift/handler.cc
thrift/server.cc
thrift/thrift_validation.cc
timeout_config.cc
tracing/trace_keyspace_helper.cc
tracing/trace_state.cc
tracing/traced_file.cc
tracing/tracing.cc
tracing/tracing_backend_registry.cc
transport/controller.cc
transport/cql_protocol_extension.cc
transport/event.cc
transport/event_notifier.cc
transport/messages/result_message.cc
transport/server.cc
types.cc
unimplemented.cc
utils/UUID_gen.cc
utils/arch/powerpc/crc32-vpmsum/crc32_wrapper.cc
utils/array-search.cc
utils/ascii.cc
utils/big_decimal.cc
utils/bloom_calculations.cc
utils/bloom_filter.cc
utils/buffer_input_stream.cc
utils/build_id.cc
utils/config_file.cc
utils/directories.cc
utils/disk-error-handler.cc
utils/dynamic_bitset.cc
utils/error_injection.cc
utils/exceptions.cc
utils/file_lock.cc
utils/generation-number.cc
utils/gz/crc_combine.cc
utils/human_readable.cc
utils/i_filter.cc
utils/large_bitset.cc
utils/like_matcher.cc
utils/limiting_data_source.cc
utils/logalloc.cc
utils/managed_bytes.cc
utils/multiprecision_int.cc
utils/murmur_hash.cc
utils/rate_limiter.cc
utils/rjson.cc
utils/runtime.cc
utils/updateable_value.cc
utils/utf8.cc
utils/uuid.cc
validation.cc
vint-serialization.cc
zstd.cc
release.cc)
set(scylla_gen_sources
"${scylla_thrift_gen_cassandra_files}"
"${scylla_ragel_gen_protocol_parser_file}"
"${swagger_gen_files}"
"${idl_gen_files}"
"${antlr3_gen_files}")
add_executable(scylla
${scylla_sources}
${scylla_gen_sources})
target_link_libraries(scylla PRIVATE
seastar
# Boost dependencies
Boost::filesystem
Boost::program_options
Boost::system
Boost::thread
Boost::regex
Boost::headers
# Abseil libs
absl::hashtablez_sampler
absl::raw_hash_set
absl::synchronization
absl::graphcycles_internal
absl::stacktrace
absl::symbolize
absl::debugging_internal
absl::demangle_internal
absl::time
absl::time_zone
absl::int128
absl::city
absl::hash
absl::malloc_internal
absl::spinlock_wait
absl::base
absl::dynamic_annotations
absl::raw_logging_internal
absl::exponential_biased
absl::throw_delegate
# System libs
ZLIB::ZLIB
ICU::uc
systemd
zstd
snappy
${LUA_LIBRARIES}
thrift
crypt)
target_link_libraries(scylla PRIVATE
-Wl,--build-id=sha1 # Force SHA1 build-id generation
# TODO: Use lld linker if it's available, otherwise gold, else bfd
-fuse-ld=lld)
# TODO: patch dynamic linker to match configure.py behavior
target_compile_options(scylla PRIVATE
-std=gnu++20
-fcoroutines # TODO: Clang does not have this flag, adjust to both variants
${target_arch_flag})
# Hacks needed to expose internal APIs for xxhash dependencies
target_compile_definitions(scylla PRIVATE XXH_PRIVATE_API HAVE_LZ4_COMPRESS_DEFAULT)
target_include_directories(scylla PRIVATE
"${CMAKE_CURRENT_SOURCE_DIR}"
libdeflate
abseil
"${scylla_gen_build_dir}")
###
### Create crc_combine_table helper executable.
### Use it to generate crc_combine_table.cc to be used in scylla at build time.
###
add_executable(crc_combine_table utils/gz/gen_crc_combine_table.cc)
target_link_libraries(crc_combine_table PRIVATE seastar)
target_include_directories(crc_combine_table PRIVATE "${CMAKE_CURRENT_SOURCE_DIR}")
target_compile_options(crc_combine_table PRIVATE
-std=gnu++20
-fcoroutines
${target_arch_flag})
add_dependencies(scylla crc_combine_table)
# Generate an additional source file at build time that is needed for Scylla compilation
add_custom_command(OUTPUT "${scylla_gen_build_dir}/utils/gz/crc_combine_table.cc"
COMMAND $<TARGET_FILE:crc_combine_table> > "${scylla_gen_build_dir}/utils/gz/crc_combine_table.cc"
DEPENDS crc_combine_table)
target_sources(scylla PRIVATE "${scylla_gen_build_dir}/utils/gz/crc_combine_table.cc")
###
### Generate version file and supply appropriate compile definitions for release.cc
###
execute_process(COMMAND ${CMAKE_SOURCE_DIR}/SCYLLA-VERSION-GEN RESULT_VARIABLE scylla_version_gen_res)
if(scylla_version_gen_res)
message(SEND_ERROR "Version file generation failed. Return code: ${scylla_version_gen_res}")
endif()
file(READ build/SCYLLA-VERSION-FILE scylla_version)
string(STRIP "${scylla_version}" scylla_version)
file(READ build/SCYLLA-RELEASE-FILE scylla_release)
string(STRIP "${scylla_release}" scylla_release)
get_property(release_cdefs SOURCE "${CMAKE_SOURCE_DIR}/release.cc" PROPERTY COMPILE_DEFINITIONS)
list(APPEND release_cdefs "SCYLLA_VERSION=\"${scylla_version}\"" "SCYLLA_RELEASE=\"${scylla_release}\"")
set_source_files_properties("${CMAKE_SOURCE_DIR}/release.cc" PROPERTIES COMPILE_DEFINITIONS "${release_cdefs}")
###
### Custom command for building libdeflate. Link the library to scylla.
###
set(libdeflate_lib "${scylla_build_dir}/libdeflate/libdeflate.a")
add_custom_command(OUTPUT "${libdeflate_lib}"
COMMAND make -C libdeflate
BUILD_DIR=../build/${BUILD_TYPE}/libdeflate/
CC=${CMAKE_C_COMPILER}
"CFLAGS=${target_arch_flag}"
../build/${BUILD_TYPE}/libdeflate//libdeflate.a) # Two slashes are important!
# Hack to force generating custom command to produce libdeflate.a
add_custom_target(libdeflate DEPENDS "${libdeflate_lib}")
target_link_libraries(scylla PRIVATE "${libdeflate_lib}")
# TODO: create cmake/ directory and move utilities (generate functions etc) there
# TODO: Build tests if BUILD_TESTING=on (using CTest module)


@@ -1,6 +1,6 @@
# Asking questions or requesting help
Use the [ScyllaDB user mailing list](https://groups.google.com/forum/#!forum/scylladb-users) or the [Slack workspace](http://slack.scylladb.com) for general questions and help.
Use the [ScyllaDB user mailing list](https://groups.google.com/forum/#!forum/scylladb-users) for general questions and help.
# Reporting an issue
@@ -8,4 +8,4 @@ Please use the [Issue Tracker](https://github.com/scylladb/scylla/issues/) to re
# Contributing Code to Scylla
To contribute code to Scylla, you need to sign the [Contributor License Agreement](https://www.scylladb.com/open-source/contributor-agreement/) and send your changes as [patches](https://github.com/scylladb/scylla/wiki/Formatting-and-sending-patches) to the [mailing list](https://groups.google.com/forum/#!forum/scylladb-dev). We don't accept pull requests on GitHub.
To contribute code to Scylla, you need to sign the [Contributor License Agreement](http://www.scylladb.com/opensource/cla/) and send your changes as [patches](https://github.com/scylladb/scylla/wiki/Formatting-and-sending-patches) to the [mailing list](https://groups.google.com/forum/#!forum/scylladb-dev). We don't accept pull requests on GitHub.


@@ -1,372 +0,0 @@
# Guidelines for developing Scylla
This document is intended to help developers and contributors to Scylla get started. The first part consists of general guidelines that make no assumptions about a development environment or tooling. The second part describes a particular environment and work-flow as an example.
## Overview
This section covers some high-level information about the Scylla source code and work-flow.
### Getting the source code
Scylla uses [Git submodules](https://git-scm.com/book/en/v2/Git-Tools-Submodules) to manage its dependency on Seastar and other tools. Be sure that all submodules are correctly initialized when cloning the project:
```bash
$ git clone https://github.com/scylladb/scylla
$ cd scylla
$ git submodule update --init --recursive
```
### Dependencies
Scylla is fairly fussy about its build environment, requiring a very recent
C++ compiler with C++20 support and numerous tools and libraries to build.
Run `./install-dependencies.sh` (as root) to use your Linux distribution's
package manager to install the appropriate packages on your build machine.
However, this will only work on very recent distributions. For example,
Fedora users currently must upgrade to Fedora 32; otherwise the C++ compiler
will be too old and will not support the C++20 standard that Scylla uses.
Alternatively, to avoid having to upgrade your build machine or install
various packages on it, we provide another option - the **frozen toolchain**.
This is a script, `./tools/toolchain/dbuild`, that can execute build or run
commands inside a Docker image that contains exactly the right build tools and
libraries. The `dbuild` technique is useful for beginners, but is also the way
in which ScyllaDB produces official releases, so it is highly recommended.
To use `dbuild`, you simply prefix any build or run command with it. Building
and running Scylla becomes as easy as:
```bash
$ ./tools/toolchain/dbuild ./configure.py
$ ./tools/toolchain/dbuild ninja build/release/scylla
$ ./tools/toolchain/dbuild ./build/release/scylla --developer-mode 1
```
### Build system
**Note**: Compiling Scylla requires, conservatively, 2 GB of memory per native
thread, and up to 3 GB per native thread while linking. GCC >= 10 is
required.
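The memory figures above can be turned into a rough cap on parallel jobs. The following is only a sketch: `suggest_jobs` is a hypothetical helper (it is not part of the repository), and it assumes the more conservative 3 GB-per-job linking figure unless told otherwise.

```bash
# Hypothetical helper: choose a parallel job count from memory and core count.
# Arguments: total memory in GB, number of cores, GB per job (default: 3).
suggest_jobs() {
    local mem_gb=$1 cores=$2 gb_per_job=${3:-3}
    local by_mem=$(( mem_gb / gb_per_job ))
    # Always allow at least one job, even on very small machines.
    if (( by_mem < 1 )); then
        by_mem=1
    fi
    # Never exceed the number of cores.
    if (( by_mem < cores )); then
        echo "$by_mem"
    else
        echo "$cores"
    fi
}

# Example: a machine with 16 GB of memory and 8 cores.
suggest_jobs 16 8
```

The result can be passed to Ninja's `-j` option, for example `ninja-build -j "$(suggest_jobs 16 8)"`.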
Scylla is built with [Ninja](https://ninja-build.org/), a low-level rule-based system. A Python script, `configure.py`, generates a Ninja file (`build.ninja`) based on configuration options.
To build for the first time:
```bash
$ ./configure.py
$ ninja-build
```
Afterwards, it is sufficient to just execute Ninja.
The full suite of options for project configuration is available via
```bash
$ ./configure.py --help
```
The most important option is:
- `--enable-dpdk`: [DPDK](http://dpdk.org/) is a set of libraries and drivers for fast packet processing. During development, it's not necessary to enable support even if it is supported by your platform.
Source files and build targets are tracked manually in `configure.py`, so the script needs to be updated when new files or targets are added or removed.
To save time -- for instance, to avoid compiling all unit tests -- you can also specify specific targets to Ninja. For example,
```bash
$ ninja-build build/release/tests/schema_change_test
$ ninja-build build/release/service/storage_proxy.o
```
You can also specify a single mode. For example
```bash
$ ninja-build release
```
This builds everything in release mode. The valid modes are:
* Debug: Enables [AddressSanitizer](https://github.com/google/sanitizers/wiki/AddressSanitizer)
and other sanity checks. It has no optimizations, which allows for debugging with tools like
GDB. Debugging builds are generally slower and generate much larger object files than release builds.
* Release: Fewer checks and more optimizations. It still has debug info.
* Dev: No optimizations or debug info. The objective is to compile and link as fast as possible.
This is useful for the first iterations of a patch.
Note that by default unit test binaries are stripped, so they can't be used with GDB or seastar-addr2line.
To include debug information in the unit test binary, build the test binary with a `_g` suffix. For example,
```bash
$ ninja-build build/release/tests/schema_change_test_g
```
### Unit testing
Unit tests live in the `/tests` directory. Like with application source files, test sources and executables are specified manually in `configure.py` and need to be updated when changes are made.
A test target can be any executable. A non-zero return code indicates test failure.
Most tests in the Scylla repository are built using the [Boost.Test](http://www.boost.org/doc/libs/1_64_0/libs/test/doc/html/index.html) library. Utilities for writing tests with Seastar futures are also included.
Run all tests through the test execution wrapper with
```bash
$ ./test.py --mode={debug,release}
```
The `--name` argument can be specified to run a particular test.
Alternatively, you can execute the test executable directly. For example,
```bash
$ build/release/tests/row_cache_test -- -c1 -m1G
```
The `-c1 -m1G` arguments limit this Seastar-based test to a single system thread and 1 GB of memory.
### Preparing patches
All changes to Scylla are submitted as patches to the public [mailing list](mailto:scylladb-dev@googlegroups.com). Once a patch is approved by one of the maintainers of the project, it is committed to the maintainers' copy of the repository at https://github.com/scylladb/scylla.
Detailed instructions for formatting patches for the mailing list and advice on preparing good patches are available at the [ScyllaDB website](http://docs.scylladb.com/contribute/). There are also some guidelines that can help you make the patch review process smoother:
1. Before generating patches, make sure your Git configuration points to `.gitorderfile`. You can do it by running
```bash
$ git config diff.orderfile .gitorderfile
```
2. If you are sending more than a single patch, push your changes into a new branch of your fork of Scylla on GitHub and add a URL pointing to this branch to your cover letter.
3. If you are sending a new revision of an earlier patchset, add a brief summary of changes in this version, for example:
```
In v3:
- declared move constructor and move assignment operator as noexcept
- used std::variant instead of a union
...
```
4. Add information about the tests run with this fix. It can look like
```
"Tests: unit ({mode}), dtest ({smp})"
```
The usual is "Tests: unit (dev)", although running debug tests is encouraged.
5. When answering review comments, prefer inline quotes as they make it easier to track the conversation across multiple e-mails.
6. The Linux kernel's [Submitting Patches](https://www.kernel.org/doc/html/v4.19/process/submitting-patches.html) document offers excellent advice on how to prepare patches and patchsets for review. Since the Scylla development process is derived from the kernel's, almost all of the advice there is directly applicable.
### Finding a person to review and merge your patches
You can use the `scripts/find-maintainer` script to find a subsystem maintainer and/or reviewer for your patches. The script accepts a filename in the git source tree as an argument and outputs a list of subsystems the file belongs to and their respective maintainers and reviewers. For example, if you changed the `cql3/statements/create_view_statement.hh` file, run the script as follows:
```bash
$ ./scripts/find-maintainer cql3/statements/create_view_statement.hh
```
and you will get output like this:
```
CQL QUERY LANGUAGE
Tomasz Grabiec <tgrabiec@scylladb.com> [maintainer]
Pekka Enberg <penberg@scylladb.com> [maintainer]
MATERIALIZED VIEWS
Pekka Enberg <penberg@scylladb.com> [maintainer]
Duarte Nunes <duarte@scylladb.com> [maintainer]
Nadav Har'El <nyh@scylladb.com> [reviewer]
Duarte Nunes <duarte@scylladb.com> [reviewer]
```
### Running Scylla
Once Scylla has been compiled, executing the (`debug` or `release`) target will start a running instance in the foreground:
```bash
$ build/release/scylla
```
The `scylla` executable requires a configuration file, `scylla.yaml`. By default, this is read from `$SCYLLA_HOME/conf/scylla.yaml`. A good starting point for development is located in the repository at `/conf/scylla.yaml`.
For development, a directory at `$HOME/scylla` can be used for all Scylla-related files:
```bash
$ mkdir -p $HOME/scylla $HOME/scylla/conf
$ cp conf/scylla.yaml $HOME/scylla/conf/scylla.yaml
$ # Edit configuration options as appropriate
$ SCYLLA_HOME=$HOME/scylla build/release/scylla
```
The `scylla.yaml` file in the repository by default writes all database data to `/var/lib/scylla`, which likely requires root access. Change the `data_file_directories` and `commitlog_directory` fields as appropriate.
Scylla has a number of requirements for the file-system and operating system to operate ideally and at peak performance. However, during development, these requirements can be relaxed with the `--developer-mode` flag.
Additionally, when running on under-powered platforms like portable laptops, the `--overprovisioned` flag is useful.
On a development machine, one might run Scylla as
```bash
$ SCYLLA_HOME=$HOME/scylla build/release/scylla --overprovisioned --developer-mode=yes
```
To interact with scylla it is recommended to build our versions of
cqlsh and nodetool. They are available at
https://github.com/scylladb/scylla-tools-java and can be built with
```bash
$ sudo ./install-dependencies.sh
$ ant jar
```
cqlsh should work out of the box, but nodetool depends on a running
scylla-jmx (https://github.com/scylladb/scylla-jmx). It can be built
with
```bash
$ mvn package
```
and must be started with
```bash
$ ./scripts/scylla-jmx
```
### Branches and tags
Multiple release branches are maintained on the Git repository at https://github.com/scylladb/scylla. Release 1.5, for instance, is tracked on the `branch-1.5` branch.
Similarly, tags are used to pin-point precise release versions, including hot-fix versions like 1.5.4. These are named `scylla-1.5.4`, for example.
Most development happens on the `master` branch. Release branches are cut from `master` based on time and/or features. When a patch against `master` fixes a serious issue like a node crash or data loss, it is backported to a particular release branch with `git cherry-pick` by the project maintainers.
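The branch and tag naming scheme described above can be sketched as two small shell functions. `release_branch` and `release_tag` are hypothetical names used only for illustration; they simply restate the `branch-<major>.<minor>` and `scylla-<version>` patterns.

```bash
# Hypothetical helpers illustrating the naming scheme; not part of the repository.

# Map a release version to its release branch: 1.5.4 -> branch-1.5
release_branch() {
    local version=$1
    local major=${version%%.*}   # text before the first dot
    local rest=${version#*.}     # text after the first dot
    local minor=${rest%%.*}      # drop any hot-fix component
    echo "branch-${major}.${minor}"
}

# Map a release version to its tag: 1.5.4 -> scylla-1.5.4
release_tag() {
    echo "scylla-$1"
}

release_branch 1.5.4
release_tag 1.5.4
```

For example, `git checkout "$(release_tag 1.5.4)"` would check out the tagged hot-fix release, assuming that tag exists in your clone.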
## Example: development on Fedora 25
This section describes one possible work-flow for developing Scylla on a Fedora 25 system. It is presented as an example to help you to develop a work-flow and tools that you are comfortable with.
### Preface
This guide will be written from the perspective of a fictitious developer, Taylor Smith.
### Git work-flow
Having two Git remotes is useful:
- A public clone of Scylla (`"public"`)
- A private clone of Scylla (`"private"`) for in-progress work or work that is not yet ready to share
The first step to contributing a change to Scylla is to create a local branch dedicated to it. For example, a feature that fixes a bug in the CQL statement for creating tables could be called `ts/cql_create_table_error/v1`. The branch name is prefaced by the developer's initials and has a suffix indicating that this is the first version. The version suffix is useful when branches are shared publicly and changes are requested on the mailing list. Having a branch for each version of the patch (or patch set) shared publicly makes it easier to reference and compare the history of a change.
Setting the upstream branch of your development branch to `master` is a useful way to track your changes. You can do this with
```bash
$ git branch -u master ts/cql_create_table_error/v1
```
As a patch set is developed, you can periodically push the branch to the private remote to back-up work.
Once the patch set is ready to be reviewed, push the branch to the public remote and prepare an email to the `scylladb-dev` mailing list. Including a link to the branch on your public remote allows for reviewers to quickly test and explore your changes.
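The whole flow can be tried end to end in a scratch directory. The sketch below assumes two local bare repositories standing in for the `public` and `private` remotes; the paths and the commit message are hypothetical.

```bash
# Scratch area with two bare repositories standing in for the remotes.
work="$(mktemp -d)"
git init -q --bare "$work/public.git"
git init -q --bare "$work/private.git"
git init -q "$work/scylla"
cd "$work/scylla"
git symbolic-ref HEAD refs/heads/master
git config user.name "Taylor Smith"
git config user.email "tsmith@example.com"
git commit -q --allow-empty -m "Initial commit"
git remote add public "$work/public.git"
git remote add private "$work/private.git"

# Developer initials, topic, and version suffix, as described above.
branch="ts/cql_create_table_error/v1"
git checkout -q -b "$branch"
git branch -u master "$branch" > /dev/null
git commit -q --allow-empty -m "cql: fix error in CREATE TABLE"

git push -q private "$branch"   # back up in-progress work
git push -q public "$branch"    # publish once ready for review
git ls-remote --heads public
```

When reviewers ask for changes, the same steps can be repeated on a new branch ending in `/v2`, which leaves the `/v1` branch available for comparison.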
### Development environment and source code navigation
Scylla includes a [CMake](https://cmake.org/) file, `CMakeLists.txt`, for use only with development environments (not for building) so that they can properly analyze the source code.
[CLion](https://www.jetbrains.com/clion/) is a commercial IDE that offers reasonably good source code navigation and advice for code hygiene, though its C++ parser sometimes makes errors and flags false issues.
Other good options that directly parse CMake files are [KDevelop](https://www.kdevelop.org/) and [QtCreator](https://wiki.qt.io/Qt_Creator).
To use the `CMakeLists.txt` file with these programs, define the `FOR_IDE` CMake variable or shell environmental variable.
[Eclipse](https://eclipse.org/cdt/) is another open-source option. It doesn't natively work with CMake projects, and its C++ parser has many of the same issues as CLion's.
### Distributed compilation: `distcc` and `ccache`
Scylla's compilation times can be long. Two tools help somewhat:
- [ccache](https://ccache.samba.org/) caches compiled object files on disk and re-uses them when possible
- [distcc](https://github.com/distcc/distcc) distributes compilation jobs to remote machines
A reasonably-powered laptop acts as the coordinator for compilation. A second, more powerful, machine acts as a passive compilation server.
Having a direct wired connection between the machines ensures that object files can be transmitted quickly and limits the overhead of remote compilation.
The coordinator has been assigned the static IP address `10.0.0.1` and the passive compilation machine has been assigned `10.0.0.2`.
On Fedora, installing the `ccache` package places symbolic links for `gcc` and `g++` in the `PATH`. This allows normal compilation to transparently invoke `ccache` for compilation and cache object files on the local file-system.
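A quick way to confirm that the symbolic links are taking effect is to check where `gcc` resolves from. This sketch assumes that the directory holding the links has `ccache` in its path, which may not hold on every distribution.

```bash
# Resolve gcc through PATH; the result is empty if gcc is not installed.
gcc_path="$(command -v gcc || true)"

case "$gcc_path" in
    "")       ccache_status="gcc not found in PATH" ;;
    *ccache*) ccache_status="gcc resolves through ccache: $gcc_path" ;;
    *)        ccache_status="gcc bypasses ccache: $gcc_path" ;;
esac

echo "$ccache_status"
```

If the compiler bypasses `ccache`, opening a new login shell after installing the package is usually worth trying before changing `PATH` by hand.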
Next, set `CCACHE_PREFIX` so that `ccache` is responsible for invoking `distcc` as necessary:
```bash
export CCACHE_PREFIX="distcc"
```
On each host, edit `/etc/sysconfig/distccd` to include the allowed coordinators and the total number of jobs that the machine should accept.
This example is for the laptop, which has 2 physical cores (4 logical cores with hyper-threading):
```
OPTIONS="--allow 10.0.0.2 --allow 127.0.0.1 --jobs 4"
```
`10.0.0.2` has 8 physical cores (16 logical cores) and 64 GB of memory.
As a rule of thumb, configure each machine to accept as many jobs as it has native (logical) threads.
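The rule of thumb can be applied mechanically. The sketch below builds an `OPTIONS` line for the compilation server from the machine's logical CPU count; it assumes `nproc` from GNU coreutils is available and uses the coordinator address from this example.

```bash
# Number of logical CPUs (native threads) on this machine.
job_count="$(nproc)"

# Coordinator address from the example above; adjust for your network.
coordinator="10.0.0.1"

# Line to place in /etc/sysconfig/distccd on the compilation server.
options_line="OPTIONS=\"--allow ${coordinator} --allow 127.0.0.1 --jobs ${job_count}\""
echo "$options_line"
```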
Restart the `distccd` service on all machines.
On the coordinator machine, edit `$HOME/.distcc/hosts` with the available hosts for compilation. Order of the hosts indicates preference.
```
10.0.0.2/16 localhost/2
```
In this example, `10.0.0.2` will be sent up to 16 jobs and the local machine will be sent up to 2. Allowing for two extra threads on the host machine for coordination, we run compilation with `16 + 2 + 2 = 20` jobs in total: `ninja-build -j20`.
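The job arithmetic above can be kept in a small script so that the hosts line and the `-j` value stay consistent when machines change. The variable names are made up for this sketch; the numbers are the ones from the example.

```bash
remote_jobs=16         # jobs sent to 10.0.0.2
local_jobs=2           # jobs kept on the coordinator
coordination_jobs=2    # extra threads on the coordinator for coordination

# Contents of $HOME/.distcc/hosts, in order of preference.
hosts_line="10.0.0.2/${remote_jobs} localhost/${local_jobs}"

# Total parallelism to request from the build tool.
total_jobs=$((remote_jobs + local_jobs + coordination_jobs))

echo "$hosts_line"
echo "ninja-build -j${total_jobs}"   # prints: ninja-build -j20
```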
When a compilation is in progress, the status of jobs on all remote machines can be visualized in the terminal with `distccmon-text` or graphically as a GTK application with `distccmon-gnome`.
One thing to keep in mind is that linking object files happens on the coordinating machine, which can be a bottleneck. See the next sections for ways to speed up this process.
### Using the `gold` linker
Linking Scylla can be slow. The gold linker can replace GNU ld and often speeds the linking process. On Fedora, you can switch the system linker using
```bash
$ sudo alternatives --config ld
```
### Using split dwarf
With debug info enabled, most of the link time is spent copying and
relocating it. It is possible to leave most of the debug info out of
the link by writing it to a side .dwo file. This is done by passing
`-gsplit-dwarf` to gcc.
Unfortunately, `-gsplit-dwarf` alone would slow down `gdb` startup. To
avoid that, the gold linker can be told to create an index with
`--gdb-index`.
More info at https://gcc.gnu.org/wiki/DebugFission.
Both options can be enabled by passing `--split-dwarf` to configure.py.
Note that distcc is *not* compatible with it, but icecream
(https://github.com/icecc/icecream) is.
### Testing changes in Seastar with Scylla
Sometimes Scylla development is closely tied with a feature being developed in Seastar. It can be useful to compile Scylla with a particular check-out of Seastar.
One way to do this is to create a local remote for the Seastar submodule in the Scylla repository:
```bash
$ cd $HOME/src/scylla
$ cd seastar
$ git remote add local /home/tsmith/src/seastar
$ git remote update
$ git checkout -t local/my_local_seastar_branch
```
### Core dump debugging
Slides:
2018.11.20: https://www.slideshare.net/tomekgrabiec/scylla-core-dump-debugging-tools


@@ -1,7 +1,2 @@
This project includes code developed by the Apache Software Foundation (http://www.apache.org/),
especially Apache Cassandra.
It includes files from https://github.com/antonblanchard/crc32-vpmsum (author Anton Blanchard <anton@au.ibm.com>, IBM).
These files are located in utils/arch/powerpc/crc32-vpmsum. Their license may be found in licenses/LICENSE-crc32-vpmsum.TXT.
It includes modified code from https://gitbox.apache.org/repos/asf?p=cassandra-dtest.git (owned by The Apache Software Foundation)

README-DPDK.md

@@ -0,0 +1,29 @@
Seastar and DPDK
================
Seastar uses the Data Plane Development Kit to drive NIC hardware directly. This
provides an enormous performance boost.
To enable DPDK, specify `--enable-dpdk` to `./configure.py`, and `--dpdk-pmd` as a
run-time parameter. This will use the DPDK package provided as a git submodule with the
seastar sources.
To use your own self-compiled DPDK package, follow this procedure:
1. Setup host to compile DPDK:
- Ubuntu
`sudo apt-get install -y build-essential linux-image-extra-$(uname -r)`
2. Prepare a DPDK SDK:
- Download the latest DPDK release: `wget http://dpdk.org/browse/dpdk/snapshot/dpdk-1.8.0.tar.gz`
- Untar it.
- Edit config/common_linuxapp: set CONFIG_RTE_MBUF_REFCNT and CONFIG_RTE_LIBRTE_KNI to 'n'.
- For DPDK 1.7.x: edit config/common_linuxapp:
- Set CONFIG_RTE_LIBRTE_PMD_BOND to 'n'.
- Set CONFIG_RTE_MBUF_SCATTER_GATHER to 'n'.
- Set CONFIG_RTE_LIBRTE_IP_FRAG to 'n'.
- Start the tools/setup.sh script as root.
- Compile a linuxapp target (option 9).
- Install IGB_UIO module (option 11).
- Bind some physical port to IGB_UIO (option 17).
- Configure hugepage mappings (option 14/15).
3. Run a configure.py: `./configure.py --dpdk-target <Path to untared dpdk-1.8.0 above>/x86_64-native-linuxapp-gcc`.

README.md

@@ -1,113 +1,88 @@
# Scylla
[![Slack](https://img.shields.io/badge/slack-scylla-brightgreen.svg?logo=slack)](http://slack.scylladb.com)
[![Twitter](https://img.shields.io/twitter/follow/ScyllaDB.svg?style=social&label=Follow)](https://twitter.com/intent/follow?screen_name=ScyllaDB)
## What is Scylla?
Scylla is the real-time big data database that is API-compatible with Apache Cassandra and Amazon DynamoDB.
Scylla embraces a shared-nothing approach that increases throughput and storage capacity to realize order-of-magnitude performance improvements and reduce hardware costs.
For more information, please see the [ScyllaDB web site].
[ScyllaDB web site]: https://www.scylladb.com
## Build Prerequisites
Scylla is fairly fussy about its build environment, requiring very recent
versions of the C++20 compiler and of many libraries to build. The document
[HACKING.md](HACKING.md) includes detailed information on building and
developing Scylla, but to get Scylla building quickly on (almost) any build
machine, Scylla offers a [frozen toolchain](tools/toolchain/README.md).
This is a pre-configured Docker image which includes recent versions of all
the required compilers, libraries and build tools. Using the frozen toolchain
allows you to avoid changing anything in your build machine to meet Scylla's
requirements - you just need to meet the frozen toolchain's prerequisites
(mostly, Docker or Podman being available).
## Building Scylla
Building Scylla with the frozen toolchain `dbuild` is as easy as:
In addition to required packages by Seastar, the following packages are required by Scylla.
```bash
$ git submodule update --init --force --recursive
$ ./tools/toolchain/dbuild ./configure.py
$ ./tools/toolchain/dbuild ninja build/release/scylla
### Submodules
Scylla uses submodules, so make sure you pull the submodules first by doing:
```
git submodule init
git submodule update --init --recursive
```
For further information, please see:
### Building and Running Scylla on Fedora
* Installing required packages:
* [Developer documentation] for more information on building Scylla.
* [Build documentation] on how to build Scylla binaries, tests, and packages.
* [Docker image build documentation] for information on how to build Docker images.
[developer documentation]: HACKING.md
[build documentation]: docs/building.md
[docker image build documentation]: dist/docker/redhat/README.md
## Running Scylla
To start Scylla server, run:
```bash
$ ./tools/toolchain/dbuild ./build/release/scylla --workdir tmp --smp 1 --developer-mode 1
```
sudo dnf install yaml-cpp-devel lz4-devel zlib-devel snappy-devel jsoncpp-devel thrift-devel antlr3-tool antlr3-C++-devel libasan libubsan gcc-c++ gnutls-devel ninja-build ragel libaio-devel cryptopp-devel xfsprogs-devel numactl-devel hwloc-devel libpciaccess-devel libxml2-devel python3-pyparsing lksctp-tools-devel protobuf-devel protobuf-compiler systemd-devel libunwind-devel
```
This will start a Scylla node with one CPU core allocated to it and data files stored in the `tmp` directory.
The `--developer-mode` is needed to disable the various checks Scylla performs at startup to ensure the machine is configured for maximum performance (not relevant on development workstations).
Please note that you need to run Scylla with `dbuild` if you built it with the frozen toolchain.
* Build Scylla
```
./configure.py --mode=release --with=scylla --disable-xen
ninja-build build/release/scylla -j2 # you can use more cpus if you have tons of RAM
For more run options, run:
```bash
$ ./tools/toolchain/dbuild ./build/release/scylla --help
```
## Testing
* Run Scylla
```
./build/release/scylla
See [test.py manual](docs/testing.md).
```
## Scylla APIs and compatibility
By default, Scylla is compatible with Apache Cassandra and its APIs - CQL and
Thrift. There is also support for the API of Amazon DynamoDB™,
which needs to be enabled and configured in order to be used. For more
information on how to enable the DynamoDB™ API in Scylla,
and the current compatibility of this feature as well as Scylla-specific extensions, see
[Alternator](docs/alternator/alternator.md) and
[Getting started with Alternator](docs/alternator/getting-started.md).
* run Scylla with one CPU and ./tmp as data directory
## Documentation
```
./build/release/scylla --datadir tmp --commitlog-directory tmp --smp 1
```
Documentation can be found in [./docs](./docs) and on the
[wiki](https://github.com/scylladb/scylla/wiki). There is currently no clear
definition of what goes where, so when looking for something be sure to check
both.
Seastar documentation can be found [here](http://docs.seastar.io/master/index.html).
User documentation can be found [here](https://docs.scylladb.com/).
* For more run options:
```
./build/release/scylla --help
```
## Training
## Building Fedora RPM
Training material and online courses can be found at [Scylla University](https://university.scylladb.com/).
The courses are free, self-paced and include hands-on examples. They cover a variety of topics including Scylla data modeling,
administration, architecture, basic NoSQL concepts, using drivers for application development, Scylla setup, failover, compactions,
multi-datacenters and how Scylla integrates with third-party applications.
As a pre-requisite, you need to install [Mock](https://fedoraproject.org/wiki/Mock) on your machine:
```
# Install mock:
sudo yum install mock
# Add user to the "mock" group:
usermod -a -G mock $USER && newgrp mock
```
Then, to build an RPM, run:
```
./dist/redhat/build_rpm.sh
```
The built RPM is stored in ``/var/lib/mock/<configuration>/result`` directory.
For example, on Fedora 21 mock reports the following:
```
INFO: Done(scylla-server-0.00-1.fc21.src.rpm) Config(default) 20 minutes 7 seconds
INFO: Results and/or logs in: /var/lib/mock/fedora-21-x86_64/result
```
## Building Fedora-based Docker image
Build a Docker image with:
```
cd dist/docker
docker build -t <image-name> .
```
Run the image with:
```
docker run -p $(hostname -i):9042:9042 -i -t <image name>
```
## Contributing to Scylla
If you want to report a bug or submit a pull request or a patch, please read the [contribution guidelines].
If you are a developer working on Scylla, please read the [developer guidelines].
[contribution guidelines]: CONTRIBUTING.md
[developer guidelines]: HACKING.md
## Contact
* The [users mailing list] and [Slack channel] are for users to discuss configuration, management, and operations of the ScyllaDB open source.
* The [developers mailing list] is for developers and people interested in following the development of ScyllaDB to discuss technical topics.
[Users mailing list]: https://groups.google.com/forum/#!forum/scylladb-users
[Slack channel]: http://slack.scylladb.com/
[Developers mailing list]: https://groups.google.com/forum/#!forum/scylladb-dev
[Guidelines for contributing](CONTRIBUTING.md)


@@ -1,7 +1,6 @@
#!/bin/sh
PRODUCT=scylla
VERSION=4.4.dev
VERSION=1.7.5
if test -f version
then
@@ -19,16 +18,7 @@ else
SCYLLA_RELEASE=$SCYLLA_BUILD.$DATE.$GIT_COMMIT
fi
if [ -f build/SCYLLA-RELEASE-FILE ]; then
RELEASE_FILE=$(cat build/SCYLLA-RELEASE-FILE)
GIT_COMMIT_FILE=$(cat build/SCYLLA-RELEASE-FILE |cut -d . -f 3)
if [ "$GIT_COMMIT" = "$GIT_COMMIT_FILE" ]; then
exit 0
fi
fi
echo "$SCYLLA_VERSION-$SCYLLA_RELEASE"
mkdir -p build
echo "$SCYLLA_VERSION" > build/SCYLLA-VERSION-FILE
echo "$SCYLLA_RELEASE" > build/SCYLLA-RELEASE-FILE
echo "$PRODUCT" > build/SCYLLA-PRODUCT-FILE

abseil

Submodule abseil deleted from 1e3d25b265


@@ -1,47 +0,0 @@
/*
* Copyright (C) 2020 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <absl/container/flat_hash_map.h>
#include <seastar/core/sstring.hh>
using namespace seastar;
struct sstring_hash {
using is_transparent = void;
size_t operator()(std::string_view v) const noexcept;
};
struct sstring_eq {
using is_transparent = void;
bool operator()(std::string_view a, std::string_view b) const noexcept {
return a == b;
}
};
template <typename K, typename V, typename... Ts>
struct flat_hash_map : public absl::flat_hash_map<K, V, Ts...> {
};
template <typename V>
struct flat_hash_map<sstring, V>
: public absl::flat_hash_map<sstring, V, sstring_hash, sstring_eq> {};


@@ -1,146 +0,0 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "alternator/error.hh"
#include "log.hh"
#include <string>
#include <string_view>
#include <gnutls/crypto.h>
#include <seastar/util/defer.hh>
#include "hashers.hh"
#include "bytes.hh"
#include "alternator/auth.hh"
#include <fmt/format.h>
#include "auth/common.hh"
#include "auth/password_authenticator.hh"
#include "auth/roles-metadata.hh"
#include "cql3/query_processor.hh"
#include "cql3/untyped_result_set.hh"
namespace alternator {
static logging::logger alogger("alternator-auth");
static hmac_sha256_digest hmac_sha256(std::string_view key, std::string_view msg) {
hmac_sha256_digest digest;
int ret = gnutls_hmac_fast(GNUTLS_MAC_SHA256, key.data(), key.size(), msg.data(), msg.size(), digest.data());
if (ret) {
throw std::runtime_error(fmt::format("Computing HMAC failed ({}): {}", ret, gnutls_strerror(ret)));
}
return digest;
}
static hmac_sha256_digest get_signature_key(std::string_view key, std::string_view date_stamp, std::string_view region_name, std::string_view service_name) {
auto date = hmac_sha256("AWS4" + std::string(key), date_stamp);
auto region = hmac_sha256(std::string_view(date.data(), date.size()), region_name);
auto service = hmac_sha256(std::string_view(region.data(), region.size()), service_name);
auto signing = hmac_sha256(std::string_view(service.data(), service.size()), "aws4_request");
return signing;
}
static std::string apply_sha256(std::string_view msg) {
sha256_hasher hasher;
hasher.update(msg.data(), msg.size());
return to_hex(hasher.finalize());
}
static std::string format_time_point(db_clock::time_point tp) {
time_t time_point_repr = db_clock::to_time_t(tp);
std::string time_point_str;
time_point_str.resize(17);
::tm time_buf;
// strftime prints the terminating null character as well
std::strftime(time_point_str.data(), time_point_str.size(), "%Y%m%dT%H%M%SZ", ::gmtime_r(&time_point_repr, &time_buf));
time_point_str.resize(16);
return time_point_str;
}
void check_expiry(std::string_view signature_date) {
//FIXME: The default 15min can be changed with X-Amz-Expires header - we should honor it
std::string expiration_str = format_time_point(db_clock::now() - 15min);
std::string validity_str = format_time_point(db_clock::now() + 15min);
if (signature_date < expiration_str) {
throw api_error::invalid_signature(
fmt::format("Signature expired: {} is now earlier than {} (current time - 15 min.)",
signature_date, expiration_str));
}
if (signature_date > validity_str) {
throw api_error::invalid_signature(
fmt::format("Signature not yet current: {} is still later than {} (current time + 15 min.)",
signature_date, validity_str));
}
}
std::string get_signature(std::string_view access_key_id, std::string_view secret_access_key, std::string_view host, std::string_view method,
std::string_view orig_datestamp, std::string_view signed_headers_str, const std::map<std::string_view, std::string_view>& signed_headers_map,
std::string_view body_content, std::string_view region, std::string_view service, std::string_view query_string) {
auto amz_date_it = signed_headers_map.find("x-amz-date");
if (amz_date_it == signed_headers_map.end()) {
throw api_error::invalid_signature("X-Amz-Date header is mandatory for signature verification");
}
std::string_view amz_date = amz_date_it->second;
check_expiry(amz_date);
std::string_view datestamp = amz_date.substr(0, 8);
if (datestamp != orig_datestamp) {
throw api_error::invalid_signature(
format("X-Amz-Date date does not match the provided datestamp. Expected {}, got {}",
orig_datestamp, datestamp));
}
std::string_view canonical_uri = "/";
std::stringstream canonical_headers;
for (const auto& header : signed_headers_map) {
canonical_headers << fmt::format("{}:{}", header.first, header.second) << '\n';
}
std::string payload_hash = apply_sha256(body_content);
std::string canonical_request = fmt::format("{}\n{}\n{}\n{}\n{}\n{}", method, canonical_uri, query_string, canonical_headers.str(), signed_headers_str, payload_hash);
std::string_view algorithm = "AWS4-HMAC-SHA256";
std::string credential_scope = fmt::format("{}/{}/{}/aws4_request", datestamp, region, service);
std::string string_to_sign = fmt::format("{}\n{}\n{}\n{}", algorithm, amz_date, credential_scope, apply_sha256(canonical_request));
hmac_sha256_digest signing_key = get_signature_key(secret_access_key, datestamp, region, service);
hmac_sha256_digest signature = hmac_sha256(std::string_view(signing_key.data(), signing_key.size()), string_to_sign);
return to_hex(bytes_view(reinterpret_cast<const int8_t*>(signature.data()), signature.size()));
}
future<std::string> get_key_from_roles(cql3::query_processor& qp, std::string username) {
static const sstring query = format("SELECT salted_hash FROM {} WHERE {} = ?",
auth::meta::roles_table::qualified_name, auth::meta::roles_table::role_col_name);
auto cl = auth::password_authenticator::consistency_for_user(username);
return qp.execute_internal(query, cl, auth::internal_distributed_query_state(), {sstring(username)}, true).then_wrapped([username = std::move(username)] (future<::shared_ptr<cql3::untyped_result_set>> f) {
auto res = f.get0();
auto salted_hash = std::optional<sstring>();
if (res->empty()) {
throw api_error::unrecognized_client(fmt::format("User not found: {}", username));
}
salted_hash = res->one().get_opt<sstring>("salted_hash");
if (!salted_hash) {
throw api_error::unrecognized_client(fmt::format("No password found for user: {}", username));
}
return make_ready_future<std::string>(*salted_hash);
});
}
}


@@ -1,46 +0,0 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <string>
#include <string_view>
#include <array>
#include "gc_clock.hh"
#include "utils/loading_cache.hh"
namespace cql3 {
class query_processor;
}
namespace alternator {
using hmac_sha256_digest = std::array<char, 32>;
using key_cache = utils::loading_cache<std::string, std::string>;
std::string get_signature(std::string_view access_key_id, std::string_view secret_access_key, std::string_view host, std::string_view method,
std::string_view orig_datestamp, std::string_view signed_headers_str, const std::map<std::string_view, std::string_view>& signed_headers_map,
std::string_view body_content, std::string_view region, std::string_view service, std::string_view query_string);
future<std::string> get_key_from_roles(cql3::query_processor& qp, std::string username);
}


@@ -1,145 +0,0 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
// The DynamoAPI dictates that "binary" (a.k.a. "bytes" or "blob") values
// be encoded in the JSON API as base64-encoded strings. This is code to
// convert byte arrays to base64-encoded strings, and back.
#include "base64.hh"
#include <ctype.h>
// Arrays for quickly converting to and from an integer between 0 and 63,
// and the character used in base64 encoding to represent it.
static class base64_chars {
public:
static constexpr const char to[] =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
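// One entry per possible unsigned char value (0..255), so that indexing
// with any input byte, including 0xff, stays in bounds.
int8_t from[256];
base64_chars() {
static_assert(sizeof(to) == 64 + 1);
for (int i = 0; i < 256; i++) {
from[i] = -1; // signal invalid character
}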
for (int i = 0; i < 64; i++) {
from[(unsigned) to[i]] = i;
}
}
} base64_chars;
std::string base64_encode(bytes_view in) {
std::string ret;
ret.reserve(((4 * in.size() / 3) + 3) & ~3);
int i = 0;
unsigned char chunk3[3]; // chunk of input
for (auto byte : in) {
chunk3[i++] = byte;
if (i == 3) {
ret += base64_chars.to[ (chunk3[0] & 0xfc) >> 2 ];
ret += base64_chars.to[ ((chunk3[0] & 0x03) << 4) + ((chunk3[1] & 0xf0) >> 4) ];
ret += base64_chars.to[ ((chunk3[1] & 0x0f) << 2) + ((chunk3[2] & 0xc0) >> 6) ];
ret += base64_chars.to[ chunk3[2] & 0x3f ];
i = 0;
}
}
if (i) {
// i can be 1 or 2.
for(int j = i; j < 3; j++)
chunk3[j] = '\0';
ret += base64_chars.to[ ( chunk3[0] & 0xfc) >> 2 ];
ret += base64_chars.to[ ((chunk3[0] & 0x03) << 4) + ((chunk3[1] & 0xf0) >> 4) ];
if (i == 2) {
ret += base64_chars.to[ ((chunk3[1] & 0x0f) << 2) + ((chunk3[2] & 0xc0) >> 6) ];
} else {
ret += '=';
}
ret += '=';
}
return ret;
}
static std::string base64_decode_string(std::string_view in) {
int i = 0;
int8_t chunk4[4]; // chunk of input, each byte converted to 0..63;
std::string ret;
ret.reserve(in.size() * 3 / 4);
for (unsigned char c : in) {
uint8_t dc = base64_chars.from[c];
if (dc == 255) {
// Any unexpected character, including the "=" character usually
// used for padding, signals the end of the decode.
break;
}
chunk4[i++] = dc;
if (i == 4) {
ret += (chunk4[0] << 2) + ((chunk4[1] & 0x30) >> 4);
ret += ((chunk4[1] & 0xf) << 4) + ((chunk4[2] & 0x3c) >> 2);
ret += ((chunk4[2] & 0x3) << 6) + chunk4[3];
i = 0;
}
}
if (i) {
// i can be 2 or 3, meaning 1 or 2 more output characters
if (i>=2)
ret += (chunk4[0] << 2) + ((chunk4[1] & 0x30) >> 4);
if (i==3)
ret += ((chunk4[1] & 0xf) << 4) + ((chunk4[2] & 0x3c) >> 2);
}
return ret;
}
bytes base64_decode(std::string_view in) {
// FIXME: This copy is sad. The problem is we need back "bytes"
// but "bytes" doesn't have efficient append and std::string.
// To fix this we need to use bytes' "uninitialized" feature.
std::string ret = base64_decode_string(in);
return bytes(ret.begin(), ret.end());
}
static size_t base64_padding_len(std::string_view str) {
size_t padding = 0;
padding += (!str.empty() && str.back() == '=');
padding += (str.size() > 1 && *(str.end() - 2) == '=');
return padding;
}
size_t base64_decoded_len(std::string_view str) {
return str.size() / 4 * 3 - base64_padding_len(str);
}
bool base64_begins_with(std::string_view base, std::string_view operand) {
if (base.size() < operand.size() || base.size() % 4 != 0 || operand.size() % 4 != 0) {
return false;
}
if (base64_padding_len(operand) == 0) {
return base.starts_with(operand);
}
const std::string_view unpadded_base_prefix = base.substr(0, operand.size() - 4);
const std::string_view unpadded_operand = operand.substr(0, operand.size() - 4);
if (unpadded_base_prefix != unpadded_operand) {
return false;
}
// Decode and compare last 4 bytes of base64-encoded strings
const std::string base_remainder = base64_decode_string(base.substr(operand.size() - 4, operand.size()));
const std::string operand_remainder = base64_decode_string(operand.substr(operand.size() - 4));
return base_remainder.starts_with(operand_remainder);
}


@@ -1,38 +0,0 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <string_view>
#include "bytes.hh"
#include "utils/rjson.hh"
std::string base64_encode(bytes_view);
bytes base64_decode(std::string_view);
inline bytes base64_decode(const rjson::value& v) {
return base64_decode(std::string_view(v.GetString(), v.GetStringLength()));
}
size_t base64_decoded_len(std::string_view str);
bool base64_begins_with(std::string_view base, std::string_view operand);


@@ -1,650 +0,0 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include <list>
#include <map>
#include <string_view>
#include "alternator/conditions.hh"
#include "alternator/error.hh"
#include "cql3/constants.hh"
#include <unordered_map>
#include "utils/rjson.hh"
#include "serialization.hh"
#include "base64.hh"
#include <stdexcept>
#include <boost/algorithm/cxx11/all_of.hpp>
#include <boost/algorithm/cxx11/any_of.hpp>
#include "utils/overloaded_functor.hh"
#include "expressions.hh"
namespace alternator {
static logging::logger clogger("alternator-conditions");
comparison_operator_type get_comparison_operator(const rjson::value& comparison_operator) {
static std::unordered_map<std::string, comparison_operator_type> ops = {
{"EQ", comparison_operator_type::EQ},
{"NE", comparison_operator_type::NE},
{"LE", comparison_operator_type::LE},
{"LT", comparison_operator_type::LT},
{"GE", comparison_operator_type::GE},
{"GT", comparison_operator_type::GT},
{"IN", comparison_operator_type::IN},
{"NULL", comparison_operator_type::IS_NULL},
{"NOT_NULL", comparison_operator_type::NOT_NULL},
{"BETWEEN", comparison_operator_type::BETWEEN},
{"BEGINS_WITH", comparison_operator_type::BEGINS_WITH},
{"CONTAINS", comparison_operator_type::CONTAINS},
{"NOT_CONTAINS", comparison_operator_type::NOT_CONTAINS},
};
if (!comparison_operator.IsString()) {
throw api_error::validation(format("Invalid comparison operator definition {}", rjson::print(comparison_operator)));
}
std::string op = comparison_operator.GetString();
auto it = ops.find(op);
if (it == ops.end()) {
throw api_error::validation(format("Unsupported comparison operator {}", op));
}
return it->second;
}
namespace {
struct size_check {
// True iff size passes this check.
virtual bool operator()(rapidjson::SizeType size) const = 0;
// Check description, such that format("expected array {}", check.what()) is human-readable.
virtual sstring what() const = 0;
};
class exact_size : public size_check {
rapidjson::SizeType _expected;
public:
explicit exact_size(rapidjson::SizeType expected) : _expected(expected) {}
bool operator()(rapidjson::SizeType size) const override { return size == _expected; }
sstring what() const override { return format("of size {}", _expected); }
};
struct empty : public size_check {
bool operator()(rapidjson::SizeType size) const override { return size < 1; }
sstring what() const override { return "to be empty"; }
};
struct nonempty : public size_check {
bool operator()(rapidjson::SizeType size) const override { return size > 0; }
sstring what() const override { return "to be non-empty"; }
};
} // anonymous namespace
// Check that array has the expected number of elements
static void verify_operand_count(const rjson::value* array, const size_check& expected, const rjson::value& op) {
if (!array && expected(0)) {
// If expected() allows an empty AttributeValueList, it is also fine
// that it is missing.
return;
}
if (!array || !array->IsArray()) {
throw api_error::validation("With ComparisonOperator, AttributeValueList must be given and an array");
}
if (!expected(array->Size())) {
throw api_error::validation(
format("{} operator requires AttributeValueList {}, instead found list size {}",
op, expected.what(), array->Size()));
}
}
struct rjson_engaged_ptr_comp {
bool operator()(const rjson::value* p1, const rjson::value* p2) const {
return rjson::single_value_comp()(*p1, *p2);
}
};
// It's not enough to compare underlying JSON objects when comparing sets,
// as internally they're stored in an array, and the order of elements is
// not important in set equality. See issue #5021
static bool check_EQ_for_sets(const rjson::value& set1, const rjson::value& set2) {
if (set1.Size() != set2.Size()) {
return false;
}
std::set<const rjson::value*, rjson_engaged_ptr_comp> set1_raw;
for (auto it = set1.Begin(); it != set1.End(); ++it) {
set1_raw.insert(&*it);
}
for (const auto& a : set2.GetArray()) {
if (!set1_raw.contains(&a)) {
return false;
}
}
return true;
}
// Check if two JSON-encoded values match with the EQ relation
static bool check_EQ(const rjson::value* v1, const rjson::value& v2) {
if (!v1) {
return false;
}
if (v1->IsObject() && v1->MemberCount() == 1 && v2.IsObject() && v2.MemberCount() == 1) {
auto it1 = v1->MemberBegin();
auto it2 = v2.MemberBegin();
if ((it1->name == "SS" && it2->name == "SS") || (it1->name == "NS" && it2->name == "NS") || (it1->name == "BS" && it2->name == "BS")) {
return check_EQ_for_sets(it1->value, it2->value);
}
}
return *v1 == v2;
}
// Check if two JSON-encoded values match with the NE relation
static bool check_NE(const rjson::value* v1, const rjson::value& v2) {
return !check_EQ(v1, v2); // null is unequal to anything; sets compare order-insensitively, as in check_EQ.
}
// Check if two JSON-encoded values match with the BEGINS_WITH relation
static bool check_BEGINS_WITH(const rjson::value* v1, const rjson::value& v2) {
// BEGINS_WITH requires that its single operand (v2) be a string or
// binary - otherwise it's a validation error. However, problems with
// the stored attribute (v1) will just return false (no match).
if (!v2.IsObject() || v2.MemberCount() != 1) {
throw api_error::validation(format("BEGINS_WITH operator encountered malformed AttributeValue: {}", v2));
}
auto it2 = v2.MemberBegin();
if (it2->name != "S" && it2->name != "B") {
throw api_error::validation(format("BEGINS_WITH operator requires String or Binary type in AttributeValue, got {}", it2->name));
}
if (!v1 || !v1->IsObject() || v1->MemberCount() != 1) {
return false;
}
auto it1 = v1->MemberBegin();
if (it1->name != it2->name) {
return false;
}
if (it2->name == "S") {
return rjson::to_string_view(it1->value).starts_with(rjson::to_string_view(it2->value));
} else /* it2->name == "B" */ {
return base64_begins_with(rjson::to_string_view(it1->value), rjson::to_string_view(it2->value));
}
}
static bool is_set_of(const rjson::value& type1, const rjson::value& type2) {
return (type2 == "S" && type1 == "SS") || (type2 == "N" && type1 == "NS") || (type2 == "B" && type1 == "BS");
}
// Check if two JSON-encoded values match with the CONTAINS relation
bool check_CONTAINS(const rjson::value* v1, const rjson::value& v2) {
if (!v1) {
return false;
}
const auto& kv1 = *v1->MemberBegin();
const auto& kv2 = *v2.MemberBegin();
if (kv1.name == "S" && kv2.name == "S") {
return rjson::to_string_view(kv1.value).find(rjson::to_string_view(kv2.value)) != std::string_view::npos;
} else if (kv1.name == "B" && kv2.name == "B") {
return base64_decode(kv1.value).find(base64_decode(kv2.value)) != bytes::npos;
} else if (is_set_of(kv1.name, kv2.name)) {
for (auto i = kv1.value.Begin(); i != kv1.value.End(); ++i) {
if (*i == kv2.value) {
return true;
}
}
} else if (kv1.name == "L") {
for (auto i = kv1.value.Begin(); i != kv1.value.End(); ++i) {
if (!i->IsObject() || i->MemberCount() != 1) {
clogger.error("check_CONTAINS received a list whose element is malformed");
return false;
}
const auto& el = *i->MemberBegin();
if (el.name == kv2.name && el.value == kv2.value) {
return true;
}
}
}
return false;
}
// Check if two JSON-encoded values match with the NOT_CONTAINS relation
static bool check_NOT_CONTAINS(const rjson::value* v1, const rjson::value& v2) {
if (!v1) {
return false;
}
return !check_CONTAINS(v1, v2);
}
// Check if a JSON-encoded value equals any element of an array, which must have at least one element.
static bool check_IN(const rjson::value* val, const rjson::value& array) {
if (!array[0].IsObject() || array[0].MemberCount() != 1) {
throw api_error::validation(
format("IN operator encountered malformed AttributeValue: {}", array[0]));
}
const auto& type = array[0].MemberBegin()->name;
if (type != "S" && type != "N" && type != "B") {
throw api_error::validation(
"IN operator requires AttributeValueList elements to be of type String, Number, or Binary");
}
if (!val) {
return false;
}
bool have_match = false;
for (const auto& elem : array.GetArray()) {
if (!elem.IsObject() || elem.MemberCount() != 1 || elem.MemberBegin()->name != type) {
throw api_error::validation(
"IN operator requires all AttributeValueList elements to have the same type");
}
if (!have_match && *val == elem) {
// Can't return yet, must check types of all array elements. <sigh>
have_match = true;
}
}
return have_match;
}
// Another variant of check_IN, this one for ConditionExpression. It needs to
// check whether the first element in the given vector is equal to any of the
// others.
static bool check_IN(const std::vector<rjson::value>& array) {
const rjson::value* first = &array[0];
for (unsigned i = 1; i < array.size(); i++) {
if (check_EQ(first, array[i])) {
return true;
}
}
return false;
}
static bool check_NULL(const rjson::value* val) {
return val == nullptr;
}
static bool check_NOT_NULL(const rjson::value* val) {
return val != nullptr;
}
// Check if two JSON-encoded values match with cmp.
template <typename Comparator>
bool check_compare(const rjson::value* v1, const rjson::value& v2, const Comparator& cmp) {
if (!v2.IsObject() || v2.MemberCount() != 1) {
throw api_error::validation(
format("{} requires a single AttributeValue of type String, Number, or Binary",
cmp.diagnostic));
}
const auto& kv2 = *v2.MemberBegin();
if (kv2.name != "S" && kv2.name != "N" && kv2.name != "B") {
throw api_error::validation(
format("{} requires a single AttributeValue of type String, Number, or Binary",
cmp.diagnostic));
}
if (!v1 || !v1->IsObject() || v1->MemberCount() != 1) {
return false;
}
const auto& kv1 = *v1->MemberBegin();
if (kv1.name != kv2.name) {
return false;
}
if (kv1.name == "N") {
return cmp(unwrap_number(*v1, cmp.diagnostic), unwrap_number(v2, cmp.diagnostic));
}
if (kv1.name == "S") {
return cmp(std::string_view(kv1.value.GetString(), kv1.value.GetStringLength()),
std::string_view(kv2.value.GetString(), kv2.value.GetStringLength()));
}
if (kv1.name == "B") {
return cmp(base64_decode(kv1.value), base64_decode(kv2.value));
}
clogger.error("check_compare panic: LHS type equals RHS type, but one is in {N,S,B} while the other isn't");
return false;
}
struct cmp_lt {
template <typename T> bool operator()(const T& lhs, const T& rhs) const { return lhs < rhs; }
// We cannot use the normal comparison operators like "<" on the bytes
// type, because they treat individual bytes as signed but we need to
// compare them as *unsigned*. So we need a specialization for bytes.
bool operator()(const bytes& lhs, const bytes& rhs) const { return compare_unsigned(lhs, rhs) < 0; }
static constexpr const char* diagnostic = "LT operator";
};
struct cmp_le {
template <typename T> bool operator()(const T& lhs, const T& rhs) const { return lhs <= rhs; }
bool operator()(const bytes& lhs, const bytes& rhs) const { return compare_unsigned(lhs, rhs) <= 0; }
static constexpr const char* diagnostic = "LE operator";
};
struct cmp_ge {
template <typename T> bool operator()(const T& lhs, const T& rhs) const { return lhs >= rhs; }
bool operator()(const bytes& lhs, const bytes& rhs) const { return compare_unsigned(lhs, rhs) >= 0; }
static constexpr const char* diagnostic = "GE operator";
};
struct cmp_gt {
template <typename T> bool operator()(const T& lhs, const T& rhs) const { return lhs > rhs; }
bool operator()(const bytes& lhs, const bytes& rhs) const { return compare_unsigned(lhs, rhs) > 0; }
static constexpr const char* diagnostic = "GT operator";
};
// True if v is between lb and ub, inclusive. Throws if lb > ub.
template <typename T>
static bool check_BETWEEN(const T& v, const T& lb, const T& ub) {
if (cmp_lt()(ub, lb)) {
throw api_error::validation(
format("BETWEEN operator requires lower_bound <= upper_bound, but {} > {}", lb, ub));
}
return cmp_ge()(v, lb) && cmp_le()(v, ub);
}
static bool check_BETWEEN(const rjson::value* v, const rjson::value& lb, const rjson::value& ub) {
if (!v) {
return false;
}
if (!v->IsObject() || v->MemberCount() != 1) {
throw api_error::validation(format("BETWEEN operator encountered malformed AttributeValue: {}", *v));
}
if (!lb.IsObject() || lb.MemberCount() != 1) {
throw api_error::validation(format("BETWEEN operator encountered malformed AttributeValue: {}", lb));
}
if (!ub.IsObject() || ub.MemberCount() != 1) {
throw api_error::validation(format("BETWEEN operator encountered malformed AttributeValue: {}", ub));
}
const auto& kv_v = *v->MemberBegin();
const auto& kv_lb = *lb.MemberBegin();
const auto& kv_ub = *ub.MemberBegin();
if (kv_lb.name != kv_ub.name) {
throw api_error::validation(
format("BETWEEN operator requires the same type for lower and upper bound; instead got {} and {}",
kv_lb.name, kv_ub.name));
}
if (kv_v.name != kv_lb.name) { // Cannot compare different types, so v is NOT between lb and ub.
return false;
}
if (kv_v.name == "N") {
const char* diag = "BETWEEN operator";
return check_BETWEEN(unwrap_number(*v, diag), unwrap_number(lb, diag), unwrap_number(ub, diag));
}
if (kv_v.name == "S") {
return check_BETWEEN(std::string_view(kv_v.value.GetString(), kv_v.value.GetStringLength()),
std::string_view(kv_lb.value.GetString(), kv_lb.value.GetStringLength()),
std::string_view(kv_ub.value.GetString(), kv_ub.value.GetStringLength()));
}
if (kv_v.name == "B") {
return check_BETWEEN(base64_decode(kv_v.value), base64_decode(kv_lb.value), base64_decode(kv_ub.value));
}
throw api_error::validation(
format("BETWEEN operator requires AttributeValueList elements to be of type String, Number, or Binary; instead got {}",
kv_lb.name));
}
// Verify one Expect condition on one attribute (whose content is "got")
// for the verify_expected() below.
// This function returns true or false depending on whether the condition
// succeeded - it does not throw ConditionalCheckFailedException.
// However, it may throw ValidationException on input validation errors.
static bool verify_expected_one(const rjson::value& condition, const rjson::value* got) {
const rjson::value* comparison_operator = rjson::find(condition, "ComparisonOperator");
const rjson::value* attribute_value_list = rjson::find(condition, "AttributeValueList");
const rjson::value* value = rjson::find(condition, "Value");
const rjson::value* exists = rjson::find(condition, "Exists");
// There are three types of conditions that Expected supports:
// a value, not-exists, and a comparison of some kind. Each allows
// and requires a different combination of parameters in the request.
if (value) {
if (exists && (!exists->IsBool() || exists->GetBool() != true)) {
throw api_error::validation("Cannot combine Value with Exists!=true");
}
if (comparison_operator) {
throw api_error::validation("Cannot combine Value with ComparisonOperator");
}
return check_EQ(got, *value);
} else if (exists) {
if (comparison_operator) {
throw api_error::validation("Cannot combine Exists with ComparisonOperator");
}
if (!exists->IsBool() || exists->GetBool() != false) {
throw api_error::validation("Exists!=false requires Value");
}
// Remember Exists=false, so we're checking that the attribute does *not* exist:
return !got;
} else {
if (!comparison_operator) {
throw api_error::validation("Missing ComparisonOperator, Value or Exists");
}
comparison_operator_type op = get_comparison_operator(*comparison_operator);
switch (op) {
case comparison_operator_type::EQ:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_EQ(got, (*attribute_value_list)[0]);
case comparison_operator_type::NE:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_NE(got, (*attribute_value_list)[0]);
case comparison_operator_type::LT:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_compare(got, (*attribute_value_list)[0], cmp_lt{});
case comparison_operator_type::LE:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_compare(got, (*attribute_value_list)[0], cmp_le{});
case comparison_operator_type::GT:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_compare(got, (*attribute_value_list)[0], cmp_gt{});
case comparison_operator_type::GE:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_compare(got, (*attribute_value_list)[0], cmp_ge{});
case comparison_operator_type::BEGINS_WITH:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_BEGINS_WITH(got, (*attribute_value_list)[0]);
case comparison_operator_type::IN:
verify_operand_count(attribute_value_list, nonempty(), *comparison_operator);
return check_IN(got, *attribute_value_list);
case comparison_operator_type::IS_NULL:
verify_operand_count(attribute_value_list, empty(), *comparison_operator);
return check_NULL(got);
case comparison_operator_type::NOT_NULL:
verify_operand_count(attribute_value_list, empty(), *comparison_operator);
return check_NOT_NULL(got);
case comparison_operator_type::BETWEEN:
verify_operand_count(attribute_value_list, exact_size(2), *comparison_operator);
return check_BETWEEN(got, (*attribute_value_list)[0], (*attribute_value_list)[1]);
case comparison_operator_type::CONTAINS:
{
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
// Expected's "CONTAINS" has this artificial limitation.
// ConditionExpression's "contains()" does not...
const rjson::value& arg = (*attribute_value_list)[0];
const auto& argtype = (*arg.MemberBegin()).name;
if (argtype != "S" && argtype != "N" && argtype != "B") {
throw api_error::validation(
format("CONTAINS operator requires a single AttributeValue of type String, Number, or Binary, "
"got {} instead", argtype));
}
return check_CONTAINS(got, arg);
}
case comparison_operator_type::NOT_CONTAINS:
{
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
// Expected's "NOT_CONTAINS" has this artificial limitation.
// ConditionExpression's "contains()" does not...
const rjson::value& arg = (*attribute_value_list)[0];
const auto& argtype = (*arg.MemberBegin()).name;
if (argtype != "S" && argtype != "N" && argtype != "B") {
throw api_error::validation(
format("NOT_CONTAINS operator requires a single AttributeValue of type String, Number, or Binary, "
"got {} instead", argtype));
}
return check_NOT_CONTAINS(got, arg);
}
}
throw std::logic_error(format("Internal error: corrupted operator enum: {}", int(op)));
}
}
conditional_operator_type get_conditional_operator(const rjson::value& req) {
const rjson::value* conditional_operator = rjson::find(req, "ConditionalOperator");
if (!conditional_operator) {
return conditional_operator_type::MISSING;
}
if (!conditional_operator->IsString()) {
throw api_error::validation("'ConditionalOperator' parameter, if given, must be a string");
}
auto s = rjson::to_string_view(*conditional_operator);
if (s == "AND") {
return conditional_operator_type::AND;
} else if (s == "OR") {
return conditional_operator_type::OR;
} else {
throw api_error::validation(
format("'ConditionalOperator' parameter must be AND, OR or missing. Found {}.", s));
}
}
// Check if the existing values of the item (previous_item) match the
// conditions given by the Expected and ConditionalOperator parameters
// (if they exist) in the request (an UpdateItem, PutItem or DeleteItem).
// This function can throw a ValidationException API error if there
// are errors in the format of the condition itself.
bool verify_expected(const rjson::value& req, const rjson::value* previous_item) {
const rjson::value* expected = rjson::find(req, "Expected");
auto conditional_operator = get_conditional_operator(req);
if (conditional_operator != conditional_operator_type::MISSING &&
(!expected || (expected->IsObject() && expected->GetObject().ObjectEmpty()))) {
throw api_error::validation("'ConditionalOperator' parameter cannot be specified for missing or empty Expression");
}
if (!expected) {
return true;
}
if (!expected->IsObject()) {
throw api_error::validation("'Expected' parameter, if given, must be an object");
}
bool require_all = conditional_operator != conditional_operator_type::OR;
return verify_condition(*expected, require_all, previous_item);
}
bool verify_condition(const rjson::value& condition, bool require_all, const rjson::value* previous_item) {
for (auto it = condition.MemberBegin(); it != condition.MemberEnd(); ++it) {
const rjson::value* got = nullptr;
if (previous_item) {
got = rjson::find(*previous_item, rjson::to_string_view(it->name));
}
bool success = verify_expected_one(it->value, got);
if (success && !require_all) {
// When !require_all, one success is enough!
return true;
} else if (!success && require_all) {
// When require_all, one failure is enough!
return false;
}
}
// If we got here and require_all, none of the checks failed, so succeed.
// If we got here and !require_all, all of the checks failed, so fail.
return require_all;
}
static bool calculate_primitive_condition(const parsed::primitive_condition& cond,
const rjson::value* previous_item) {
std::vector<rjson::value> calculated_values;
calculated_values.reserve(cond._values.size());
for (const parsed::value& v : cond._values) {
calculated_values.push_back(calculate_value(v,
cond._op == parsed::primitive_condition::type::VALUE ?
calculate_value_caller::ConditionExpressionAlone :
calculate_value_caller::ConditionExpression,
previous_item));
}
switch (cond._op) {
case parsed::primitive_condition::type::BETWEEN:
if (calculated_values.size() != 3) {
// Shouldn't happen unless we have a bug in the parser
throw std::logic_error(format("Wrong number of values {} in BETWEEN primitive_condition", cond._values.size()));
}
return check_BETWEEN(&calculated_values[0], calculated_values[1], calculated_values[2]);
case parsed::primitive_condition::type::IN:
return check_IN(calculated_values);
case parsed::primitive_condition::type::VALUE:
if (calculated_values.size() != 1) {
// Shouldn't happen unless we have a bug in the parser
throw std::logic_error(format("Unexpected number of values {} in primitive_condition", cond._values.size()));
}
// Unwrap the boolean wrapped as the value (if it is a boolean)
if (calculated_values[0].IsObject() && calculated_values[0].MemberCount() == 1) {
auto it = calculated_values[0].MemberBegin();
if (it->name == "BOOL" && it->value.IsBool()) {
return it->value.GetBool();
}
}
throw api_error::validation(
format("ConditionExpression: condition results in a non-boolean value: {}",
calculated_values[0]));
default:
// All the rest of the operators have exactly two parameters (and unless
// we have a bug in the parser, that's what we have in the parsed object):
if (calculated_values.size() != 2) {
throw std::logic_error(format("Wrong number of values {} in primitive_condition object", cond._values.size()));
}
}
switch (cond._op) {
case parsed::primitive_condition::type::EQ:
return check_EQ(&calculated_values[0], calculated_values[1]);
case parsed::primitive_condition::type::NE:
return check_NE(&calculated_values[0], calculated_values[1]);
case parsed::primitive_condition::type::GT:
return check_compare(&calculated_values[0], calculated_values[1], cmp_gt{});
case parsed::primitive_condition::type::GE:
return check_compare(&calculated_values[0], calculated_values[1], cmp_ge{});
case parsed::primitive_condition::type::LT:
return check_compare(&calculated_values[0], calculated_values[1], cmp_lt{});
case parsed::primitive_condition::type::LE:
return check_compare(&calculated_values[0], calculated_values[1], cmp_le{});
default:
// Shouldn't happen unless we have a bug in the parser
throw std::logic_error(format("Unknown type {} in primitive_condition object", (int)(cond._op)));
}
}
// Check if the existing values of the item (previous_item) match the
// conditions given by the given parsed ConditionExpression.
bool verify_condition_expression(
const parsed::condition_expression& condition_expression,
const rjson::value* previous_item) {
if (condition_expression.empty()) {
return true;
}
bool ret = std::visit(overloaded_functor {
[&] (const parsed::primitive_condition& cond) -> bool {
return calculate_primitive_condition(cond, previous_item);
},
[&] (const parsed::condition_expression::condition_list& list) -> bool {
auto verify_condition = [&] (const parsed::condition_expression& e) {
return verify_condition_expression(e, previous_item);
};
switch (list.op) {
case '&':
return boost::algorithm::all_of(list.conditions, verify_condition);
case '|':
return boost::algorithm::any_of(list.conditions, verify_condition);
default:
// Shouldn't happen unless we have a bug in the parser
throw std::logic_error("bad operator in condition_list");
}
}
}, condition_expression._expression);
return condition_expression._negated ? !ret : ret;
}
}
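
`verify_condition_expression()` above walks a parsed expression recursively: a node is either a primitive condition or a list combined with `&` or `|`, and the node's negation flag is applied last. The following is a minimal sketch of that evaluation shape on a simplified tree, under the assumption that leaves are already-computed booleans. The types and helpers (`cond`, `leaf`, `cond_list`, `evaluate`, `evaluate_example`) are hypothetical and are not the `parsed::` types used in this file.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical, simplified condition tree: a leaf holds an already-computed
// boolean; an inner node combines its children with '&' (AND) or '|' (OR).
struct cond {
    bool negated = false;
    char op = 0;                 // 0 = leaf, '&' = AND list, '|' = OR list
    bool leaf_value = false;     // meaningful only when op == 0
    std::vector<cond> children;  // meaningful only when op != 0
};

bool evaluate(const cond& c) {
    bool ret;
    switch (c.op) {
    case 0:
        ret = c.leaf_value;
        break;
    case '&':
        ret = std::all_of(c.children.begin(), c.children.end(), evaluate);
        break;
    case '|':
        ret = std::any_of(c.children.begin(), c.children.end(), evaluate);
        break;
    default:
        throw std::logic_error("bad operator in condition list");
    }
    // Negation is applied after the node's own result is known.
    return c.negated ? !ret : ret;
}

cond leaf(bool v) {
    return cond{false, 0, v, {}};
}

cond cond_list(char op, std::vector<cond> children, bool negated = false) {
    return cond{negated, op, false, std::move(children)};
}

// NOT (true AND (false OR true)) evaluates to false.
void evaluate_example() {
    cond e = cond_list('&', {leaf(true), cond_list('|', {leaf(false), leaf(true)})}, true);
    assert(!evaluate(e));
}
```

Because `std::all_of` and `std::any_of` stop at the first deciding element, the sketch short-circuits just as the `boost::algorithm` calls in the original do.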

@@ -1,60 +0,0 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
/*
* This file contains definitions and functions related to placing conditions
* on Alternator queries (equivalent of CQL's restrictions).
*
* With conditions, it's possible to add criteria to selection requests (Scan, Query)
* and use them for narrowing down the result set, by means of filtering or indexing.
*
* Ref: https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Condition.html
*/
#pragma once
#include "cql3/restrictions/statement_restrictions.hh"
#include "serialization.hh"
#include "expressions_types.hh"
namespace alternator {
enum class comparison_operator_type {
EQ, NE, LE, LT, GE, GT, IN, BETWEEN, CONTAINS, NOT_CONTAINS, IS_NULL, NOT_NULL, BEGINS_WITH
};
comparison_operator_type get_comparison_operator(const rjson::value& comparison_operator);
enum class conditional_operator_type {
AND, OR, MISSING
};
conditional_operator_type get_conditional_operator(const rjson::value& req);
bool verify_expected(const rjson::value& req, const rjson::value* previous_item);
bool verify_condition(const rjson::value& condition, bool require_all, const rjson::value* previous_item);
bool check_CONTAINS(const rjson::value* v1, const rjson::value& v2);
bool verify_condition_expression(
const parsed::condition_expression& condition_expression,
const rjson::value* previous_item);
}
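
The header above declares `conditional_operator_type` (AND, OR, MISSING) and `verify_condition()` with its `require_all` flag. In the accompanying source file, a missing operator is treated like AND, and the loop stops as soon as the outcome is decided. The following is a minimal sketch of that combination rule; `cond_op`, `check`, `verify_all_or_any`, and `verify_example` are hypothetical stand-ins, and each check is modeled as a plain callable rather than a JSON condition.

```cpp
#include <cassert>
#include <functional>
#include <vector>

enum class cond_op { AND, OR, MISSING };   // mirrors conditional_operator_type

using check = std::function<bool()>;       // hypothetical per-attribute check

bool verify_all_or_any(const std::vector<check>& checks, cond_op op) {
    // A missing operator behaves like AND.
    const bool require_all = op != cond_op::OR;
    for (const auto& c : checks) {
        const bool success = c();
        if (success && !require_all) {
            return true;    // OR: one success is enough
        }
        if (!success && require_all) {
            return false;   // AND: one failure is enough
        }
    }
    // AND with no failures succeeds; OR with no successes fails.
    return require_all;
}

void verify_example() {
    int calls = 0;
    std::vector<check> checks = {
        [&] { ++calls; return false; },
        [&] { ++calls; return true; },
    };
    assert(!verify_all_or_any(checks, cond_op::MISSING));
    assert(calls == 1);   // AND stopped at the first failure
    calls = 0;
    assert(verify_all_or_any(checks, cond_op::OR));
    assert(calls == 2);   // OR kept going until the first success
}
```

Note the consequence for an empty set of checks: it passes under AND and fails under OR, which follows directly from returning `require_all` after the loop.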

@@ -1,86 +0,0 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <seastar/http/httpd.hh>
#include "seastarx.hh"
namespace alternator {
// api_error contains a DynamoDB error message to be returned to the user.
// It can be returned by value (see executor::request_return_type) or thrown.
// DynamoDB's error messages are described in detail in
// https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Programming.Errors.html
// An error message has an HTTP code (almost always 400), a type, e.g.,
// "ResourceNotFoundException", and a human readable message.
// Eventually alternator::api_handler will convert a returned or thrown
// api_error into a JSON object, and that is returned to the user.
class api_error final {
public:
using status_type = httpd::reply::status_type;
status_type _http_code;
std::string _type;
std::string _msg;
api_error(std::string type, std::string msg, status_type http_code = status_type::bad_request)
: _http_code(std::move(http_code))
, _type(std::move(type))
, _msg(std::move(msg))
{ }
// Factory functions for some common types of DynamoDB API errors
static api_error validation(std::string msg) {
return api_error("ValidationException", std::move(msg));
}
static api_error resource_not_found(std::string msg) {
return api_error("ResourceNotFoundException", std::move(msg));
}
static api_error resource_in_use(std::string msg) {
return api_error("ResourceInUseException", std::move(msg));
}
static api_error invalid_signature(std::string msg) {
return api_error("InvalidSignatureException", std::move(msg));
}
static api_error unrecognized_client(std::string msg) {
return api_error("UnrecognizedClientException", std::move(msg));
}
static api_error unknown_operation(std::string msg) {
return api_error("UnknownOperationException", std::move(msg));
}
static api_error access_denied(std::string msg) {
return api_error("AccessDeniedException", std::move(msg));
}
static api_error conditional_check_failed(std::string msg) {
return api_error("ConditionalCheckFailedException", std::move(msg));
}
static api_error expired_iterator(std::string msg) {
return api_error("ExpiredIteratorException", std::move(msg));
}
static api_error trimmed_data_access_exception(std::string msg) {
return api_error("TrimmedDataAccessException", std::move(msg));
}
static api_error internal(std::string msg) {
return api_error("InternalServerError", std::move(msg), status_type::internal_server_error);
}
};
}
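
The comment on `api_error` above says an error can either be returned by value or thrown, and that both paths end up converted into a response. The following is a minimal sketch of that pattern using `std::variant`, under simplifying assumptions: the HTTP code is a plain `int`, the success payload is a string, and `error_sketch`, `result_sketch`, `handle_limit`, `status_of`, and `status_example` are hypothetical names that do not exist in this code base.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <variant>

// Hypothetical stand-in for api_error; the HTTP code is a plain int here.
struct error_sketch {
    int http_code;
    std::string type;
    std::string msg;
    static error_sketch validation(std::string msg) {
        return error_sketch{400, "ValidationException", std::move(msg)};
    }
};

// A handler result is either a success payload or an error returned by value.
using result_sketch = std::variant<std::string, error_sketch>;

// Hypothetical handler: rejects a negative limit without throwing.
result_sketch handle_limit(int limit) {
    if (limit < 0) {
        return error_sketch::validation("Limit must not be negative");
    }
    return std::string("ok");
}

// Both paths (returned or thrown) are funneled into the same status code.
int status_of(int limit) {
    try {
        result_sketch r = handle_limit(limit);
        if (const auto* err = std::get_if<error_sketch>(&r)) {
            return err->http_code;
        }
        return 200;
    } catch (const error_sketch& err) {
        return err.http_code;
    }
}

void status_example() {
    assert(status_of(5) == 200);
    assert(status_of(-1) == 400);
}
```

Returning the error inside the variant avoids the cost of exception unwinding on expected failures, while the `catch` clause still covers code that throws the same type from deeper in the call stack.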

File diff suppressed because it is too large

@@ -1,154 +0,0 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <seastar/core/future.hh>
#include <seastar/http/httpd.hh>
#include "seastarx.hh"
#include <seastar/json/json_elements.hh>
#include <seastar/core/sharded.hh>
#include "service/storage_proxy.hh"
#include "service/migration_manager.hh"
#include "service/client_state.hh"
#include "db/timeout_clock.hh"
#include "alternator/error.hh"
#include "stats.hh"
#include "utils/rjson.hh"
namespace db {
class system_distributed_keyspace;
}
namespace query {
class partition_slice;
class result;
}
namespace cql3::selection {
class selection;
}
namespace service {
class storage_service;
}
namespace alternator {
class rmw_operation;
struct make_jsonable : public json::jsonable {
rjson::value _value;
public:
explicit make_jsonable(rjson::value&& value);
std::string to_json() const override;
};
struct json_string : public json::jsonable {
std::string _value;
public:
explicit json_string(std::string&& value);
std::string to_json() const override;
};
class executor : public peering_sharded_service<executor> {
service::storage_proxy& _proxy;
service::migration_manager& _mm;
db::system_distributed_keyspace& _sdks;
service::storage_service& _ss;
// An smp_service_group to be used for limiting the concurrency when
// forwarding Alternator requests between shards - if necessary for LWT.
smp_service_group _ssg;
public:
using client_state = service::client_state;
using request_return_type = std::variant<json::json_return_type, api_error>;
stats _stats;
static constexpr auto ATTRS_COLUMN_NAME = ":attrs";
static constexpr auto KEYSPACE_NAME_PREFIX = "alternator_";
static constexpr std::string_view INTERNAL_TABLE_PREFIX = ".scylla.alternator.";
executor(service::storage_proxy& proxy, service::migration_manager& mm, db::system_distributed_keyspace& sdks, service::storage_service& ss, smp_service_group ssg)
: _proxy(proxy), _mm(mm), _sdks(sdks), _ss(ss), _ssg(ssg) {}
future<request_return_type> create_table(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);
future<request_return_type> describe_table(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);
future<request_return_type> delete_table(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);
future<request_return_type> update_table(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);
future<request_return_type> put_item(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);
future<request_return_type> get_item(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);
future<request_return_type> delete_item(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);
future<request_return_type> update_item(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);
future<request_return_type> list_tables(client_state& client_state, service_permit permit, rjson::value request);
future<request_return_type> scan(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);
future<request_return_type> describe_endpoints(client_state& client_state, service_permit permit, rjson::value request, std::string host_header);
future<request_return_type> batch_write_item(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);
future<request_return_type> batch_get_item(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);
future<request_return_type> query(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);
future<request_return_type> tag_resource(client_state& client_state, service_permit permit, rjson::value request);
future<request_return_type> untag_resource(client_state& client_state, service_permit permit, rjson::value request);
future<request_return_type> list_tags_of_resource(client_state& client_state, service_permit permit, rjson::value request);
future<request_return_type> list_streams(client_state& client_state, service_permit permit, rjson::value request);
future<request_return_type> describe_stream(client_state& client_state, service_permit permit, rjson::value request);
future<request_return_type> get_shard_iterator(client_state& client_state, service_permit permit, rjson::value request);
future<request_return_type> get_records(client_state& client_state, tracing::trace_state_ptr, service_permit permit, rjson::value request);
future<> start();
future<> stop() { return make_ready_future<>(); }
future<> create_keyspace(std::string_view keyspace_name);
static tracing::trace_state_ptr maybe_trace_query(client_state& client_state, sstring_view op, sstring_view query);
static sstring table_name(const schema&);
static db::timeout_clock::time_point default_timeout();
static schema_ptr find_table(service::storage_proxy&, const rjson::value& request);
private:
friend class rmw_operation;
static bool is_alternator_keyspace(const sstring& ks_name);
static sstring make_keyspace_name(const sstring& table_name);
static void describe_key_schema(rjson::value& parent, const schema&, std::unordered_map<std::string,std::string> * = nullptr);
static void describe_key_schema(rjson::value& parent, const schema& schema, std::unordered_map<std::string,std::string>&);
public:
static std::optional<rjson::value> describe_single_item(schema_ptr,
const query::partition_slice&,
const cql3::selection::selection&,
const query::result&,
const std::unordered_set<std::string>&);
static void describe_single_item(const cql3::selection::selection&,
const std::vector<bytes_opt>&,
const std::unordered_set<std::string>&,
rjson::value&,
bool = false);
void add_stream_options(const rjson::value& stream_spec, schema_builder&) const;
void supplement_table_info(rjson::value& descr, const schema& schema) const;
void supplement_table_stream_info(rjson::value& descr, const schema& schema) const;
};
}
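
The `request_return_type` alias above lets each handler return either a JSON reply or an `api_error` as a value. Below is a minimal standalone sketch of that pattern. The names `reply`, `error`, `handle_request` and `render` are hypothetical stand-ins for illustration only, not the real Seastar or Alternator types.

```cpp
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical stand-ins for json::json_return_type and alternator::api_error.
struct reply { std::string body; };
struct error { int http_status; std::string message; };

using request_return_type = std::variant<reply, error>;

// Hypothetical handler: succeeds unless the table name is empty.
request_return_type handle_request(const std::string& table_name) {
    if (table_name.empty()) {
        return error{400, "ValidationException: missing TableName"};
    }
    return reply{"{\"TableName\":\"" + table_name + "\"}"};
}

// Turn either alternative into the text sent back to the client.
std::string render(const request_return_type& r) {
    return std::visit([](const auto& v) -> std::string {
        using T = std::decay_t<decltype(v)>;
        if constexpr (std::is_same_v<T, reply>) {
            return v.body;
        } else {
            return std::to_string(v.http_status) + " " + v.message;
        }
    }, r);
}
```

Returning the error as a variant alternative keeps the expected failure path (a bad request) out of the exception machinery; whether the real server relies on this for performance is not stated in the header.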


@@ -1,728 +0,0 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "expressions.hh"
#include "serialization.hh"
#include "base64.hh"
#include "conditions.hh"
#include "alternator/expressionsLexer.hpp"
#include "alternator/expressionsParser.hpp"
#include "utils/overloaded_functor.hh"
#include "error.hh"
#include "seastarx.hh"
#include <seastar/core/print.hh>
#include <seastar/util/log.hh>
#include <boost/algorithm/cxx11/any_of.hpp>
#include <boost/algorithm/cxx11/all_of.hpp>
#include <functional>
#include <unordered_map>
namespace alternator {
template <typename Func, typename Result = std::result_of_t<Func(expressionsParser&)>>
Result do_with_parser(std::string input, Func&& f) {
expressionsLexer::InputStreamType input_stream{
reinterpret_cast<const ANTLR_UINT8*>(input.data()),
ANTLR_ENC_UTF8,
static_cast<ANTLR_UINT32>(input.size()),
nullptr };
expressionsLexer lexer(&input_stream);
expressionsParser::TokenStreamType tstream(ANTLR_SIZE_HINT, lexer.get_tokSource());
expressionsParser parser(&tstream);
auto result = f(parser);
return result;
}
parsed::update_expression
parse_update_expression(std::string query) {
try {
return do_with_parser(query, std::mem_fn(&expressionsParser::update_expression));
} catch (...) {
throw expressions_syntax_error(format("Failed parsing UpdateExpression '{}': {}", query, std::current_exception()));
}
}
std::vector<parsed::path>
parse_projection_expression(std::string query) {
try {
return do_with_parser(query, std::mem_fn(&expressionsParser::projection_expression));
} catch (...) {
throw expressions_syntax_error(format("Failed parsing ProjectionExpression '{}': {}", query, std::current_exception()));
}
}
parsed::condition_expression
parse_condition_expression(std::string query) {
try {
return do_with_parser(query, std::mem_fn(&expressionsParser::condition_expression));
} catch (...) {
throw expressions_syntax_error(format("Failed parsing ConditionExpression '{}': {}", query, std::current_exception()));
}
}
namespace parsed {
void update_expression::add(update_expression::action a) {
std::visit(overloaded_functor {
[&] (action::set&) { seen_set = true; },
[&] (action::remove&) { seen_remove = true; },
[&] (action::add&) { seen_add = true; },
[&] (action::del&) { seen_del = true; }
}, a._action);
_actions.push_back(std::move(a));
}
void update_expression::append(update_expression other) {
if ((seen_set && other.seen_set) ||
(seen_remove && other.seen_remove) ||
(seen_add && other.seen_add) ||
(seen_del && other.seen_del)) {
throw expressions_syntax_error("Each of SET, REMOVE, ADD, DELETE may only appear once in UpdateExpression");
}
std::move(other._actions.begin(), other._actions.end(), std::back_inserter(_actions));
seen_set |= other.seen_set;
seen_remove |= other.seen_remove;
seen_add |= other.seen_add;
seen_del |= other.seen_del;
}
void condition_expression::append(condition_expression&& a, char op) {
std::visit(overloaded_functor {
[&] (condition_list& x) {
// If 'a' has a single condition, we could insert that single
// condition (possibly negated if a._negated) instead of inserting
// 'a' itself. But since we don't evaluate these expressions many
// times, this optimization is not worth the extra code complexity.
if (!x.conditions.empty() && x.op != op) {
// Shouldn't happen unless we have a bug in the parser
throw std::logic_error("condition_expression::append called with mixed operators");
}
x.conditions.push_back(std::move(a));
x.op = op;
},
[&] (primitive_condition& x) {
// Shouldn't happen unless we have a bug in the parser
throw std::logic_error("condition_expression::append called on primitive_condition");
}
}, _expression);
}
} // namespace parsed
// The following resolve_*() functions resolve references in parsed
// expressions of different types. Resolving a parsed expression means
// replacing:
// 1. In parsed::path objects, replace references like "#name" with the
// attribute name from ExpressionAttributeNames,
// 2. In parsed::constant objects, replace references like ":value" with
// the value from ExpressionAttributeValues.
// These functions also track which name and value references were used, to
// allow complaining if some remain unused.
// Note that the resolve_*() functions modify the expressions in-place,
// so if we ever intend to cache parsed expressions, we need to pass a copy
// into these functions.
//
// Doing the "resolving" stage before the evaluation stage has two benefits.
// First, it allows us to be compatible with DynamoDB in catching unused
// names and values (see issue #6572). Second, in the FilterExpression case,
// we need to resolve the expression just once but then use it many times
// (once for each item to be filtered).
static void resolve_path(parsed::path& p,
const rjson::value* expression_attribute_names,
std::unordered_set<std::string>& used_attribute_names) {
const std::string& column_name = p.root();
if (column_name.size() > 0 && column_name.front() == '#') {
if (!expression_attribute_names) {
throw api_error::validation(
format("ExpressionAttributeNames missing, entry '{}' required by expression", column_name));
}
const rjson::value* value = rjson::find(*expression_attribute_names, column_name);
if (!value || !value->IsString()) {
throw api_error::validation(
format("ExpressionAttributeNames missing entry '{}' required by expression", column_name));
}
used_attribute_names.emplace(column_name);
p.set_root(std::string(rjson::to_string_view(*value)));
}
}
static void resolve_constant(parsed::constant& c,
const rjson::value* expression_attribute_values,
std::unordered_set<std::string>& used_attribute_values) {
std::visit(overloaded_functor {
[&] (const std::string& valref) {
if (!expression_attribute_values) {
throw api_error::validation(
format("ExpressionAttributeValues missing, entry '{}' required by expression", valref));
}
const rjson::value* value = rjson::find(*expression_attribute_values, valref);
if (!value) {
throw api_error::validation(
format("ExpressionAttributeValues missing entry '{}' required by expression", valref));
}
if (value->IsNull()) {
throw api_error::validation(
format("ExpressionAttributeValues null value for entry '{}' required by expression", valref));
}
validate_value(*value, "ExpressionAttributeValues");
used_attribute_values.emplace(valref);
c.set(*value);
},
[&] (const parsed::constant::literal& lit) {
// Nothing to do, already resolved
}
}, c._value);
}
void resolve_value(parsed::value& rhs,
const rjson::value* expression_attribute_names,
const rjson::value* expression_attribute_values,
std::unordered_set<std::string>& used_attribute_names,
std::unordered_set<std::string>& used_attribute_values) {
std::visit(overloaded_functor {
[&] (parsed::constant& c) {
resolve_constant(c, expression_attribute_values, used_attribute_values);
},
[&] (parsed::value::function_call& f) {
for (parsed::value& value : f._parameters) {
resolve_value(value, expression_attribute_names, expression_attribute_values,
used_attribute_names, used_attribute_values);
}
},
[&] (parsed::path& p) {
resolve_path(p, expression_attribute_names, used_attribute_names);
}
}, rhs._value);
}
void resolve_set_rhs(parsed::set_rhs& rhs,
const rjson::value* expression_attribute_names,
const rjson::value* expression_attribute_values,
std::unordered_set<std::string>& used_attribute_names,
std::unordered_set<std::string>& used_attribute_values) {
resolve_value(rhs._v1, expression_attribute_names, expression_attribute_values,
used_attribute_names, used_attribute_values);
if (rhs._op != 'v') {
resolve_value(rhs._v2, expression_attribute_names, expression_attribute_values,
used_attribute_names, used_attribute_values);
}
}
void resolve_update_expression(parsed::update_expression& ue,
const rjson::value* expression_attribute_names,
const rjson::value* expression_attribute_values,
std::unordered_set<std::string>& used_attribute_names,
std::unordered_set<std::string>& used_attribute_values) {
for (parsed::update_expression::action& action : ue.actions()) {
resolve_path(action._path, expression_attribute_names, used_attribute_names);
std::visit(overloaded_functor {
[&] (parsed::update_expression::action::set& a) {
resolve_set_rhs(a._rhs, expression_attribute_names, expression_attribute_values,
used_attribute_names, used_attribute_values);
},
[&] (parsed::update_expression::action::remove& a) {
// nothing to do
},
[&] (parsed::update_expression::action::add& a) {
resolve_constant(a._valref, expression_attribute_values, used_attribute_values);
},
[&] (parsed::update_expression::action::del& a) {
resolve_constant(a._valref, expression_attribute_values, used_attribute_values);
}
}, action._action);
}
}
static void resolve_primitive_condition(parsed::primitive_condition& pc,
const rjson::value* expression_attribute_names,
const rjson::value* expression_attribute_values,
std::unordered_set<std::string>& used_attribute_names,
std::unordered_set<std::string>& used_attribute_values) {
for (parsed::value& value : pc._values) {
resolve_value(value,
expression_attribute_names, expression_attribute_values,
used_attribute_names, used_attribute_values);
}
}
void resolve_condition_expression(parsed::condition_expression& ce,
const rjson::value* expression_attribute_names,
const rjson::value* expression_attribute_values,
std::unordered_set<std::string>& used_attribute_names,
std::unordered_set<std::string>& used_attribute_values) {
std::visit(overloaded_functor {
[&] (parsed::primitive_condition& cond) {
resolve_primitive_condition(cond,
expression_attribute_names, expression_attribute_values,
used_attribute_names, used_attribute_values);
},
[&] (parsed::condition_expression::condition_list& list) {
for (parsed::condition_expression& cond : list.conditions) {
resolve_condition_expression(cond,
expression_attribute_names, expression_attribute_values,
used_attribute_names, used_attribute_values);
}
}
}, ce._expression);
}
void resolve_projection_expression(std::vector<parsed::path>& pe,
const rjson::value* expression_attribute_names,
std::unordered_set<std::string>& used_attribute_names) {
for (parsed::path& p : pe) {
resolve_path(p, expression_attribute_names, used_attribute_names);
}
}
// condition_expression_on() checks whether a condition_expression places any
// condition on the given attribute. It can be useful, for example, for
// checking whether the condition tries to restrict a key column.
static bool value_on(const parsed::value& v, std::string_view attribute) {
return std::visit(overloaded_functor {
[&] (const parsed::constant& c) {
return false;
},
[&] (const parsed::value::function_call& f) {
for (const parsed::value& value : f._parameters) {
if (value_on(value, attribute)) {
return true;
}
}
return false;
},
[&] (const parsed::path& p) {
return p.root() == attribute;
}
}, v._value);
}
static bool primitive_condition_on(const parsed::primitive_condition& pc, std::string_view attribute) {
for (const parsed::value& value : pc._values) {
if (value_on(value, attribute)) {
return true;
}
}
return false;
}
bool condition_expression_on(const parsed::condition_expression& ce, std::string_view attribute) {
return std::visit(overloaded_functor {
[&] (const parsed::primitive_condition& cond) {
return primitive_condition_on(cond, attribute);
},
[&] (const parsed::condition_expression::condition_list& list) {
for (const parsed::condition_expression& cond : list.conditions) {
if (condition_expression_on(cond, attribute)) {
return true;
}
}
return false;
}
}, ce._expression);
}
// for_condition_expression_on() runs a given function over all the attributes
// mentioned in the expression. If the same attribute is mentioned more than
// once, the function will be called more than once for the same attribute.
static void for_value_on(const parsed::value& v, const noncopyable_function<void(std::string_view)>& func) {
std::visit(overloaded_functor {
[&] (const parsed::constant& c) { },
[&] (const parsed::value::function_call& f) {
for (const parsed::value& value : f._parameters) {
for_value_on(value, func);
}
},
[&] (const parsed::path& p) {
func(p.root());
}
}, v._value);
}
void for_condition_expression_on(const parsed::condition_expression& ce, const noncopyable_function<void(std::string_view)>& func) {
std::visit(overloaded_functor {
[&] (const parsed::primitive_condition& cond) {
for (const parsed::value& value : cond._values) {
for_value_on(value, func);
}
},
[&] (const parsed::condition_expression::condition_list& list) {
for (const parsed::condition_expression& cond : list.conditions) {
for_condition_expression_on(cond, func);
}
}
}, ce._expression);
}
// The following calculate_value() functions calculate, or evaluate, a parsed
// expression. The parsed expression is assumed to have been "resolved", with
// the matching resolve_* function.
// Take two JSON-encoded list values (remember that a list value is
// {"L": [...the actual list]}) and return the concatenation, again as
// a list value.
static rjson::value list_concatenate(const rjson::value& v1, const rjson::value& v2) {
const rjson::value* list1 = unwrap_list(v1);
const rjson::value* list2 = unwrap_list(v2);
if (!list1 || !list2) {
throw api_error::validation("UpdateExpression: list_append() given a non-list");
}
rjson::value cat = rjson::copy(*list1);
for (const auto& a : list2->GetArray()) {
rjson::push_back(cat, rjson::copy(a));
}
rjson::value ret = rjson::empty_object();
rjson::set(ret, "L", std::move(cat));
return ret;
}
// calculate_size() is ConditionExpression's size() function, i.e., it takes
// a JSON-encoded value and returns its "size" as defined differently for the
// different types - also as a JSON-encoded number.
// It returns a JSON-encoded "null" value if this value's type has no size
// defined. Comparisons against this non-numeric value will later fail.
static rjson::value calculate_size(const rjson::value& v) {
// NOTE: If v is improperly formatted for our JSON value encoding, it
// must come from the request itself, not from the database, so it makes
// sense to throw a ValidationException if we see such a problem.
if (!v.IsObject() || v.MemberCount() != 1) {
throw api_error::validation(format("invalid object: {}", v));
}
auto it = v.MemberBegin();
int ret;
if (it->name == "S") {
if (!it->value.IsString()) {
throw api_error::validation(format("invalid string: {}", v));
}
ret = it->value.GetStringLength();
} else if (it->name == "NS" || it->name == "SS" || it->name == "BS" || it->name == "L") {
if (!it->value.IsArray()) {
throw api_error::validation(format("invalid set: {}", v));
}
ret = it->value.Size();
} else if (it->name == "M") {
if (!it->value.IsObject()) {
throw api_error::validation(format("invalid map: {}", v));
}
ret = it->value.MemberCount();
} else if (it->name == "B") {
if (!it->value.IsString()) {
throw api_error::validation(format("invalid byte string: {}", v));
}
ret = base64_decoded_len(rjson::to_string_view(it->value));
} else {
rjson::value json_ret = rjson::empty_object();
rjson::set(json_ret, "null", rjson::value(true));
return json_ret;
}
rjson::value json_ret = rjson::empty_object();
rjson::set(json_ret, "N", rjson::from_string(std::to_string(ret)));
return json_ret;
}
static const rjson::value& calculate_value(const parsed::constant& c) {
return std::visit(overloaded_functor {
[&] (const parsed::constant::literal& v) -> const rjson::value& {
return *v;
},
[&] (const std::string& valref) -> const rjson::value& {
// Shouldn't happen, we should have called resolve_value() earlier
// and replaced the value reference by the literal constant.
throw std::logic_error("calculate_value() called before resolve_value()");
}
}, c._value);
}
static rjson::value to_bool_json(bool b) {
rjson::value json_ret = rjson::empty_object();
rjson::set(json_ret, "BOOL", rjson::value(b));
return json_ret;
}
static bool known_type(std::string_view type) {
static thread_local const std::unordered_set<std::string_view> types = {
"N", "S", "B", "NS", "SS", "BS", "L", "M", "NULL", "BOOL"
};
return types.contains(type);
}
using function_handler_type = rjson::value(calculate_value_caller, const rjson::value*, const parsed::value::function_call&);
static const
std::unordered_map<std::string_view, function_handler_type*> function_handlers {
{"list_append", [] (calculate_value_caller caller, const rjson::value* previous_item, const parsed::value::function_call& f) {
if (caller != calculate_value_caller::UpdateExpression) {
throw api_error::validation(
format("{}: list_append() not allowed here", caller));
}
if (f._parameters.size() != 2) {
throw api_error::validation(
format("{}: list_append() accepts 2 parameters, got {}", caller, f._parameters.size()));
}
rjson::value v1 = calculate_value(f._parameters[0], caller, previous_item);
rjson::value v2 = calculate_value(f._parameters[1], caller, previous_item);
return list_concatenate(v1, v2);
}
},
{"if_not_exists", [] (calculate_value_caller caller, const rjson::value* previous_item, const parsed::value::function_call& f) {
if (caller != calculate_value_caller::UpdateExpression) {
throw api_error::validation(
format("{}: if_not_exists() not allowed here", caller));
}
if (f._parameters.size() != 2) {
throw api_error::validation(
format("{}: if_not_exists() accepts 2 parameters, got {}", caller, f._parameters.size()));
}
if (!std::holds_alternative<parsed::path>(f._parameters[0]._value)) {
throw api_error::validation(
format("{}: if_not_exists() must include path as its first argument", caller));
}
rjson::value v1 = calculate_value(f._parameters[0], caller, previous_item);
rjson::value v2 = calculate_value(f._parameters[1], caller, previous_item);
return v1.IsNull() ? std::move(v2) : std::move(v1);
}
},
{"size", [] (calculate_value_caller caller, const rjson::value* previous_item, const parsed::value::function_call& f) {
if (caller != calculate_value_caller::ConditionExpression) {
throw api_error::validation(
format("{}: size() not allowed here", caller));
}
if (f._parameters.size() != 1) {
throw api_error::validation(
format("{}: size() accepts 1 parameter, got {}", caller, f._parameters.size()));
}
rjson::value v = calculate_value(f._parameters[0], caller, previous_item);
return calculate_size(v);
}
},
{"attribute_exists", [] (calculate_value_caller caller, const rjson::value* previous_item, const parsed::value::function_call& f) {
if (caller != calculate_value_caller::ConditionExpressionAlone) {
throw api_error::validation(
format("{}: attribute_exists() not allowed here", caller));
}
if (f._parameters.size() != 1) {
throw api_error::validation(
format("{}: attribute_exists() accepts 1 parameter, got {}", caller, f._parameters.size()));
}
if (!std::holds_alternative<parsed::path>(f._parameters[0]._value)) {
throw api_error::validation(
format("{}: attribute_exists()'s parameter must be a path", caller));
}
rjson::value v = calculate_value(f._parameters[0], caller, previous_item);
return to_bool_json(!v.IsNull());
}
},
{"attribute_not_exists", [] (calculate_value_caller caller, const rjson::value* previous_item, const parsed::value::function_call& f) {
if (caller != calculate_value_caller::ConditionExpressionAlone) {
throw api_error::validation(
format("{}: attribute_not_exists() not allowed here", caller));
}
if (f._parameters.size() != 1) {
throw api_error::validation(
format("{}: attribute_not_exists() accepts 1 parameter, got {}", caller, f._parameters.size()));
}
if (!std::holds_alternative<parsed::path>(f._parameters[0]._value)) {
throw api_error::validation(
format("{}: attribute_not_exists()'s parameter must be a path", caller));
}
rjson::value v = calculate_value(f._parameters[0], caller, previous_item);
return to_bool_json(v.IsNull());
}
},
{"attribute_type", [] (calculate_value_caller caller, const rjson::value* previous_item, const parsed::value::function_call& f) {
if (caller != calculate_value_caller::ConditionExpressionAlone) {
throw api_error::validation(
format("{}: attribute_type() not allowed here", caller));
}
if (f._parameters.size() != 2) {
throw api_error::validation(
format("{}: attribute_type() accepts 2 parameters, got {}", caller, f._parameters.size()));
}
// There is no real reason for the following check (not
// allowing the type to come from a document attribute), but
// DynamoDB does this check, so we do too...
if (!f._parameters[1].is_constant()) {
throw api_error::validation(
format("{}: attribute_type()'s second parameter must be an expression attribute", caller));
}
rjson::value v0 = calculate_value(f._parameters[0], caller, previous_item);
rjson::value v1 = calculate_value(f._parameters[1], caller, previous_item);
if (v1.IsObject() && v1.MemberCount() == 1 && v1.MemberBegin()->name == "S") {
// If the type parameter is not one of the legal types
// we should generate an error, not a failed condition:
if (!known_type(rjson::to_string_view(v1.MemberBegin()->value))) {
throw api_error::validation(
format("{}: attribute_type()'s second parameter, {}, is not a known type",
caller, v1.MemberBegin()->value));
}
if (v0.IsObject() && v0.MemberCount() == 1) {
return to_bool_json(v1.MemberBegin()->value == v0.MemberBegin()->name);
} else {
return to_bool_json(false);
}
} else {
throw api_error::validation(
format("{}: attribute_type() second parameter must refer to a string, got {}", caller, v1));
}
}
},
{"begins_with", [] (calculate_value_caller caller, const rjson::value* previous_item, const parsed::value::function_call& f) {
if (caller != calculate_value_caller::ConditionExpressionAlone) {
throw api_error::validation(
format("{}: begins_with() not allowed here", caller));
}
if (f._parameters.size() != 2) {
throw api_error::validation(
format("{}: begins_with() accepts 2 parameters, got {}", caller, f._parameters.size()));
}
rjson::value v1 = calculate_value(f._parameters[0], caller, previous_item);
rjson::value v2 = calculate_value(f._parameters[1], caller, previous_item);
// TODO: There's duplication here with check_BEGINS_WITH().
// But unfortunately, the two functions differ a bit.
// If one of v1 or v2 is malformed or has an unsupported type
// (not B or S), what we do depends on whether it came from
// the user's query (is_constant()), or the item. Unsupported
// values in the query result in an error, but if they are in
// the item, we silently return false (no match).
bool bad = false;
if (!v1.IsObject() || v1.MemberCount() != 1) {
bad = true;
if (f._parameters[0].is_constant()) {
throw api_error::validation(format("{}: begins_with() encountered malformed AttributeValue: {}", caller, v1));
}
} else if (v1.MemberBegin()->name != "S" && v1.MemberBegin()->name != "B") {
bad = true;
if (f._parameters[0].is_constant()) {
throw api_error::validation(format("{}: begins_with() supports only string or binary in AttributeValue: {}", caller, v1));
}
}
if (!v2.IsObject() || v2.MemberCount() != 1) {
bad = true;
if (f._parameters[1].is_constant()) {
throw api_error::validation(format("{}: begins_with() encountered malformed AttributeValue: {}", caller, v2));
}
} else if (v2.MemberBegin()->name != "S" && v2.MemberBegin()->name != "B") {
bad = true;
if (f._parameters[1].is_constant()) {
throw api_error::validation(format("{}: begins_with() supports only string or binary in AttributeValue: {}", caller, v2));
}
}
bool ret = false;
if (!bad) {
auto it1 = v1.MemberBegin();
auto it2 = v2.MemberBegin();
if (it1->name == it2->name) {
if (it2->name == "S") {
std::string_view val1 = rjson::to_string_view(it1->value);
std::string_view val2 = rjson::to_string_view(it2->value);
ret = val1.starts_with(val2);
} else /* it2->name == "B" */ {
ret = base64_begins_with(rjson::to_string_view(it1->value), rjson::to_string_view(it2->value));
}
}
}
return to_bool_json(ret);
}
},
{"contains", [] (calculate_value_caller caller, const rjson::value* previous_item, const parsed::value::function_call& f) {
if (caller != calculate_value_caller::ConditionExpressionAlone) {
throw api_error::validation(
format("{}: contains() not allowed here", caller));
}
if (f._parameters.size() != 2) {
throw api_error::validation(
format("{}: contains() accepts 2 parameters, got {}", caller, f._parameters.size()));
}
rjson::value v1 = calculate_value(f._parameters[0], caller, previous_item);
rjson::value v2 = calculate_value(f._parameters[1], caller, previous_item);
return to_bool_json(check_CONTAINS(v1.IsNull() ? nullptr : &v1, v2));
}
},
};
// Given a parsed::value, which can refer either to a constant value from
// ExpressionAttributeValues, to the value of some attribute, or to a function
// of other values, this function calculates the resulting value.
// "caller" determines which expression - ConditionExpression or
// UpdateExpression - is asking for this value. We need to know this because
// DynamoDB allows a different choice of functions for different expressions.
rjson::value calculate_value(const parsed::value& v,
calculate_value_caller caller,
const rjson::value* previous_item) {
return std::visit(overloaded_functor {
[&] (const parsed::constant& c) -> rjson::value {
return rjson::copy(calculate_value(c));
},
[&] (const parsed::value::function_call& f) -> rjson::value {
auto function_it = function_handlers.find(std::string_view(f._function_name));
if (function_it == function_handlers.end()) {
throw api_error::validation(
format("{}: unknown function '{}' called.", caller, f._function_name));
}
return function_it->second(caller, previous_item, f);
},
[&] (const parsed::path& p) -> rjson::value {
if (!previous_item) {
return rjson::null_value();
}
std::string update_path = p.root();
if (p.has_operators()) {
// FIXME: support this
throw api_error::validation("Reading attribute paths not yet implemented");
}
const rjson::value* previous_value = rjson::find(*previous_item, update_path);
return previous_value ? rjson::copy(*previous_value) : rjson::null_value();
}
}, v._value);
}
// Same as calculate_value() above, except takes a set_rhs, which may be
// either a single value, or v1+v2 or v1-v2.
rjson::value calculate_value(const parsed::set_rhs& rhs,
const rjson::value* previous_item) {
switch (rhs._op) {
case 'v':
return calculate_value(rhs._v1, calculate_value_caller::UpdateExpression, previous_item);
case '+': {
rjson::value v1 = calculate_value(rhs._v1, calculate_value_caller::UpdateExpression, previous_item);
rjson::value v2 = calculate_value(rhs._v2, calculate_value_caller::UpdateExpression, previous_item);
return number_add(v1, v2);
}
case '-': {
rjson::value v1 = calculate_value(rhs._v1, calculate_value_caller::UpdateExpression, previous_item);
rjson::value v2 = calculate_value(rhs._v2, calculate_value_caller::UpdateExpression, previous_item);
return number_subtract(v1, v2);
}
}
// Can't happen
return rjson::null_value();
}
} // namespace alternator
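
The resolve stage described in the comments above (replace `#name` references, remember which ones were used, then complain about leftovers) can be sketched in isolation. This is a simplified illustration under stated assumptions: `name_map`, `resolve_name` and `unused_names` are hypothetical names, a plain `std::unordered_map` stands in for the `ExpressionAttributeNames` JSON object, and `std::invalid_argument` stands in for `api_error::validation`.

```cpp
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical stand-in for the ExpressionAttributeNames JSON object.
using name_map = std::unordered_map<std::string, std::string>;

// Replace a "#ref" root with its real attribute name and record the reference
// as used. Roots that do not start with '#' are returned unchanged.
std::string resolve_name(const std::string& root,
                         const name_map* names,
                         std::unordered_set<std::string>& used) {
    if (root.empty() || root.front() != '#') {
        return root;
    }
    if (!names) {
        throw std::invalid_argument("ExpressionAttributeNames missing");
    }
    auto it = names->find(root);
    if (it == names->end()) {
        throw std::invalid_argument("no entry for " + root);
    }
    used.insert(root);
    return it->second;
}

// After all expressions are resolved, collect the entries nobody referenced,
// so the caller can reject the request.
std::unordered_set<std::string> unused_names(const name_map& names,
                                             const std::unordered_set<std::string>& used) {
    std::unordered_set<std::string> unused;
    for (const auto& entry : names) {
        if (used.count(entry.first) == 0) {
            unused.insert(entry.first);
        }
    }
    return unused;
}
```

Because the `used` set is passed in by reference, one set can be shared across several expressions of the same request, which is how the `used_attribute_names` parameter is threaded through the resolve_*() functions above.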


@@ -1,265 +0,0 @@
/*
* Copyright 2019 ScyllaDB
*
* This file is part of Scylla. See the LICENSE.PROPRIETARY file in the
* top-level directory for licensing information.
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
/*
* The DynamoDB protocol is based on JSON, and most DynamoDB requests
* describe the operation and its parameters via JSON objects such as maps
* and lists. Nevertheless, in some types of requests an "expression" is
* passed as a single string, and we need to parse this string. These
* cases include:
* 1. Attribute paths, such as "a[3].b.c", are used in projection
* expressions as well as inside other expressions described below.
* 2. Condition expressions, such as "(NOT (a=b OR c=d)) AND e=f",
* used in conditional updates, filters, and other places.
* 3. Update expressions, such as "SET #a.b = :x, c = :y DELETE d"
*
* All these expression syntaxes are very simple: Most of them could be
* parsed as regular expressions, and the parenthesized condition expression
* could be done with a simple hand-written lexical analyzer and recursive-
* descent parser. Nevertheless, we decided to specify these parsers in the
* ANTLR3 language already used in the Scylla project, hopefully making these
* parsers easier to reason about, and easier to change if needed - and
* reducing the amount of boiler-plate code.
*/
grammar expressions;
options {
language = Cpp;
}
@parser::namespace{alternator}
@lexer::namespace{alternator}
/* TODO: explain what these traits things are. I haven't seen them explained
* in any document... Compilation fails without these because definitions
* of "expressionsLexerTraits" and "expressionsParserTraits" are needed.
*/
@lexer::traits {
class expressionsLexer;
class expressionsParser;
typedef antlr3::Traits<expressionsLexer, expressionsParser> expressionsLexerTraits;
}
@parser::traits {
typedef expressionsLexerTraits expressionsParserTraits;
}
@lexer::header {
#include "alternator/expressions.hh"
// ANTLR generates a bunch of unused variables and functions. Yuck...
#pragma GCC diagnostic ignored "-Wunused-variable"
#pragma GCC diagnostic ignored "-Wunused-function"
}
@parser::header {
#include "expressionsLexer.hpp"
}
/* By default, ANTLR3 composes elaborate syntax-error messages, saying which
* token was unexpected, where, and so on, but then dutifully writes these
* error messages to the standard error, and returns from the parser as if
* everything was fine, with a half-constructed output object! If we define
* the "displayRecognitionError" method, it will be called upon to build this
* error message, and we can instead throw an exception to stop the parsing
* immediately. This is good enough for now, for our simple needs, but if
* we ever want to show more information about the syntax error, Cql3.g
* contains an elaborate implementation (it would be nice if we could reuse
* it, not duplicate it).
* Unfortunately, we have to repeat the same definition twice - once for the
* parser, and once for the lexer.
*/
@parser::context {
void displayRecognitionError(ANTLR_UINT8** token_names, ExceptionBaseType* ex) {
throw expressions_syntax_error("syntax error");
}
}
@lexer::context {
void displayRecognitionError(ANTLR_UINT8** token_names, ExceptionBaseType* ex) {
throw expressions_syntax_error("syntax error");
}
}
/*
* Lexical analysis phase, i.e., splitting the input up into tokens.
* Lexical analyzer rules have names starting with capital letters.
* "fragment" rules do not generate tokens, and are just aliases used to
* make other rules more readable.
* Characters *not* listed here, e.g., '=', '(', etc., will be handled
* as individual tokens in their own right.
* Whitespace spans are skipped, so do not generate tokens.
*/
WHITESPACE: (' ' | '\t' | '\n' | '\r')+ { skip(); };
/* shortcuts for case-insensitive keywords */
fragment A:('a'|'A');
fragment B:('b'|'B');
fragment C:('c'|'C');
fragment D:('d'|'D');
fragment E:('e'|'E');
fragment F:('f'|'F');
fragment G:('g'|'G');
fragment H:('h'|'H');
fragment I:('i'|'I');
fragment J:('j'|'J');
fragment K:('k'|'K');
fragment L:('l'|'L');
fragment M:('m'|'M');
fragment N:('n'|'N');
fragment O:('o'|'O');
fragment P:('p'|'P');
fragment Q:('q'|'Q');
fragment R:('r'|'R');
fragment S:('s'|'S');
fragment T:('t'|'T');
fragment U:('u'|'U');
fragment V:('v'|'V');
fragment W:('w'|'W');
fragment X:('x'|'X');
fragment Y:('y'|'Y');
fragment Z:('z'|'Z');
/* These keywords must appear before the generic NAME token below,
* because NAME matches too, and the first to match wins.
*/
SET: S E T;
REMOVE: R E M O V E;
ADD: A D D;
DELETE: D E L E T E;
AND: A N D;
OR: O R;
NOT: N O T;
BETWEEN: B E T W E E N;
IN: I N;
fragment ALPHA: 'A'..'Z' | 'a'..'z';
fragment DIGIT: '0'..'9';
fragment ALNUM: ALPHA | DIGIT | '_';
INTEGER: DIGIT+;
NAME: ALPHA ALNUM*;
NAMEREF: '#' ALNUM+;
VALREF: ':' ALNUM+;
/*
* Parsing phase - parsing the string of tokens generated by the lexical
* analyzer defined above.
*/
path_component: NAME | NAMEREF;
path returns [parsed::path p]:
root=path_component { $p.set_root($root.text); }
( '.' name=path_component { $p.add_dot($name.text); }
| '[' INTEGER ']' { $p.add_index(std::stoi($INTEGER.text)); }
)*;
value returns [parsed::value v]:
VALREF { $v.set_valref($VALREF.text); }
| path { $v.set_path($path.p); }
| NAME { $v.set_func_name($NAME.text); }
'(' x=value { $v.add_func_parameter($x.v); }
(',' x=value { $v.add_func_parameter($x.v); })*
')'
;
update_expression_set_rhs returns [parsed::set_rhs rhs]:
v=value { $rhs.set_value(std::move($v.v)); }
( '+' v=value { $rhs.set_plus(std::move($v.v)); }
| '-' v=value { $rhs.set_minus(std::move($v.v)); }
)?
;
update_expression_set_action returns [parsed::update_expression::action a]:
path '=' rhs=update_expression_set_rhs { $a.assign_set($path.p, $rhs.rhs); };
update_expression_remove_action returns [parsed::update_expression::action a]:
path { $a.assign_remove($path.p); };
update_expression_add_action returns [parsed::update_expression::action a]:
path VALREF { $a.assign_add($path.p, $VALREF.text); };
update_expression_delete_action returns [parsed::update_expression::action a]:
path VALREF { $a.assign_del($path.p, $VALREF.text); };
update_expression_clause returns [parsed::update_expression e]:
SET s=update_expression_set_action { $e.add(s); }
(',' s=update_expression_set_action { $e.add(s); })*
| REMOVE r=update_expression_remove_action { $e.add(r); }
(',' r=update_expression_remove_action { $e.add(r); })*
| ADD a=update_expression_add_action { $e.add(a); }
(',' a=update_expression_add_action { $e.add(a); })*
| DELETE d=update_expression_delete_action { $e.add(d); }
(',' d=update_expression_delete_action { $e.add(d); })*
;
// Note the "EOF" token at the end of the update expression. We want the
// parser to match the entire string given to it - not just its beginning!
update_expression returns [parsed::update_expression e]:
(update_expression_clause { e.append($update_expression_clause.e); })* EOF;
projection_expression returns [std::vector<parsed::path> v]:
p=path { $v.push_back(std::move($p.p)); }
(',' p=path { $v.push_back(std::move($p.p)); } )* EOF;
primitive_condition returns [parsed::primitive_condition c]:
v=value { $c.add_value(std::move($v.v));
$c.set_operator(parsed::primitive_condition::type::VALUE); }
( ( '=' { $c.set_operator(parsed::primitive_condition::type::EQ); }
| '<' '>' { $c.set_operator(parsed::primitive_condition::type::NE); }
| '<' { $c.set_operator(parsed::primitive_condition::type::LT); }
| '<' '=' { $c.set_operator(parsed::primitive_condition::type::LE); }
| '>' { $c.set_operator(parsed::primitive_condition::type::GT); }
| '>' '=' { $c.set_operator(parsed::primitive_condition::type::GE); }
)
v=value { $c.add_value(std::move($v.v)); }
| BETWEEN { $c.set_operator(parsed::primitive_condition::type::BETWEEN); }
v=value { $c.add_value(std::move($v.v)); }
AND
v=value { $c.add_value(std::move($v.v)); }
| IN '(' { $c.set_operator(parsed::primitive_condition::type::IN); }
v=value { $c.add_value(std::move($v.v)); }
(',' v=value { $c.add_value(std::move($v.v)); })*
')'
)?
;
// The following rules for parsing boolean expressions are verbose and
// somewhat strange because of Antlr 3's limitations on recursive rules,
// common rule prefixes, and (lack of) support for operator precedence.
// These rules could have been written more clearly using a more powerful
// parser generator - such as Yacc.
boolean_expression returns [parsed::condition_expression e]:
b=boolean_expression_1 { $e.append(std::move($b.e), '|'); }
(OR b=boolean_expression_1 { $e.append(std::move($b.e), '|'); } )*
;
boolean_expression_1 returns [parsed::condition_expression e]:
b=boolean_expression_2 { $e.append(std::move($b.e), '&'); }
(AND b=boolean_expression_2 { $e.append(std::move($b.e), '&'); } )*
;
boolean_expression_2 returns [parsed::condition_expression e]:
p=primitive_condition { $e.set_primitive(std::move($p.c)); }
| NOT b=boolean_expression_2 { $e = std::move($b.e); $e.apply_not(); }
| '(' b=boolean_expression ')' { $e = std::move($b.e); }
;
condition_expression returns [parsed::condition_expression e]:
boolean_expression { e=std::move($boolean_expression.e); } EOF;
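
The header comment of this grammar notes that these syntaxes could also be handled by a simple hand-written parser. As an illustration of what the "path" rule accepts, here is a minimal hand-written sketch. It is not the generated ANTLR code: toy_path, read_component and parse_toy_path are hypothetical names, whitespace skipping is omitted, and std::invalid_argument stands in for expressions_syntax_error.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

// Hypothetical, simplified counterpart of parsed::path.
struct toy_path {
    std::string root;
    std::vector<std::variant<std::string, unsigned>> operators;
};

inline bool is_alnum_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Reads a NAME (ALPHA ALNUM*) or a NAMEREF ('#' ALNUM+) starting at pos.
inline std::string read_component(const std::string& s, std::size_t& pos) {
    const std::size_t start = pos;
    if (pos < s.size() && s[pos] == '#') {
        ++pos;
        if (pos >= s.size() || !is_alnum_char(s[pos])) {
            throw std::invalid_argument("syntax error");
        }
    } else if (pos >= s.size() || !std::isalpha(static_cast<unsigned char>(s[pos]))) {
        throw std::invalid_argument("syntax error");
    }
    while (pos < s.size() && is_alnum_char(s[pos])) {
        ++pos;
    }
    return s.substr(start, pos - start);
}

// path: component ( '.' component | '[' INTEGER ']' )*
inline toy_path parse_toy_path(const std::string& s) {
    std::size_t pos = 0;
    toy_path p;
    p.root = read_component(s, pos);
    while (pos < s.size()) {
        if (s[pos] == '.') {
            ++pos;
            p.operators.emplace_back(read_component(s, pos));
        } else if (s[pos] == '[') {
            ++pos;
            const std::size_t start = pos;
            while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos]))) {
                ++pos;
            }
            if (pos == start || pos >= s.size() || s[pos] != ']') {
                throw std::invalid_argument("syntax error");
            }
            p.operators.emplace_back(static_cast<unsigned>(std::stoul(s.substr(start, pos - start))));
            ++pos;
        } else {
            throw std::invalid_argument("syntax error");
        }
    }
    // The whole input was consumed, which is what the EOF token enforces
    // in the top-level grammar rules.
    assert(pos == s.size());
    return p;
}
```

With this sketch, parse_toy_path("a[3].b.c") yields the root "a" followed by the operators 3, "b" and "c", while an input such as "a..b" is rejected.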

View File

@@ -1,102 +0,0 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <string>
#include <stdexcept>
#include <vector>
#include <unordered_set>
#include <string_view>
#include <seastar/util/noncopyable_function.hh>
#include "expressions_types.hh"
#include "utils/rjson.hh"
namespace alternator {
class expressions_syntax_error : public std::runtime_error {
public:
using runtime_error::runtime_error;
};
parsed::update_expression parse_update_expression(std::string query);
std::vector<parsed::path> parse_projection_expression(std::string query);
parsed::condition_expression parse_condition_expression(std::string query);
void resolve_update_expression(parsed::update_expression& ue,
const rjson::value* expression_attribute_names,
const rjson::value* expression_attribute_values,
std::unordered_set<std::string>& used_attribute_names,
std::unordered_set<std::string>& used_attribute_values);
void resolve_projection_expression(std::vector<parsed::path>& pe,
const rjson::value* expression_attribute_names,
std::unordered_set<std::string>& used_attribute_names);
void resolve_condition_expression(parsed::condition_expression& ce,
const rjson::value* expression_attribute_names,
const rjson::value* expression_attribute_values,
std::unordered_set<std::string>& used_attribute_names,
std::unordered_set<std::string>& used_attribute_values);
void validate_value(const rjson::value& v, const char* caller);
bool condition_expression_on(const parsed::condition_expression& ce, std::string_view attribute);
// for_condition_expression_on() runs the given function on the attributes
// that the expression uses. It may run for the same attribute more than once
// if the same attribute is used more than once in the expression.
void for_condition_expression_on(const parsed::condition_expression& ce, const noncopyable_function<void(std::string_view)>& func);
// calculate_value() behaves slightly differently (in particular, it supports
// different functions) when used in different types of expressions, as
// enumerated in this enum:
enum class calculate_value_caller {
UpdateExpression, ConditionExpression, ConditionExpressionAlone
};
inline std::ostream& operator<<(std::ostream& out, calculate_value_caller caller) {
switch (caller) {
case calculate_value_caller::UpdateExpression:
out << "UpdateExpression";
break;
case calculate_value_caller::ConditionExpression:
out << "ConditionExpression";
break;
case calculate_value_caller::ConditionExpressionAlone:
out << "ConditionExpression";
break;
default:
out << "unknown type of expression";
break;
}
return out;
}
rjson::value calculate_value(const parsed::value& v,
calculate_value_caller caller,
const rjson::value* previous_item);
rjson::value calculate_value(const parsed::set_rhs& rhs,
const rjson::value* previous_item);
} /* namespace alternator */

View File

@@ -1,255 +0,0 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <vector>
#include <string>
#include <variant>
#include <seastar/core/shared_ptr.hh>
#include "utils/rjson.hh"
/*
* Parsed representation of expressions and their components.
*
* Types in the alternator::parsed namespace are used for holding the parse
* tree - objects generated by the Antlr rules after parsing an expression.
* Because of the way Antlr works, all these objects are default-constructed
* first, and then assigned when the rule is completed, so all these types
* have only default constructors - but setter functions to set them later.
*/
namespace alternator {
namespace parsed {
// "path" is an attribute's path in a document, e.g., a.b[3].c.
class path {
// All paths have a "root", a top-level attribute, and any number of
// "dereference operators" - each either an index (e.g., "[2]") or a
// dot (e.g., ".xyz").
std::string _root;
std::vector<std::variant<std::string, unsigned>> _operators;
public:
void set_root(std::string root) {
_root = std::move(root);
}
void add_index(unsigned i) {
_operators.emplace_back(i);
}
void add_dot(std::string name) {
_operators.emplace_back(std::move(name));
}
const std::string& root() const {
return _root;
}
bool has_operators() const {
return !_operators.empty();
}
};
// When an expression is first parsed, all constants are references, like
// ":val1", into ExpressionAttributeValues, stored in the std::string alternative.
// The resolve_value() function replaces these constants by the JSON item
// extracted from the ExpressionAttributeValues.
struct constant {
// We use lw_shared_ptr<rjson::value> just to make rjson::value copyable,
// to make this entire object copyable as ANTLR needs.
using literal = lw_shared_ptr<rjson::value>;
std::variant<std::string, literal> _value;
void set(const rjson::value& v) {
_value = make_lw_shared<rjson::value>(rjson::copy(v));
}
void set(std::string& s) {
_value = s;
}
};
// "value" is a value used in the right hand side of an assignment
// expression, "SET a = ...". It can be a constant (a reference to a value
// included in the request, e.g., ":val"), a path to an attribute from the
// existing item (e.g., "a.b[3].c"), or a function of other such values.
// Note that the real right-hand-side of an assignment is actually a bit
// more general - it allows either a value, or a value+value or value-value -
// see class set_rhs below.
struct value {
struct function_call {
std::string _function_name;
std::vector<value> _parameters;
};
std::variant<constant, path, function_call> _value;
void set_constant(constant c) {
_value = std::move(c);
}
void set_valref(std::string s) {
_value = constant { std::move(s) };
}
void set_path(path p) {
_value = std::move(p);
}
void set_func_name(std::string s) {
_value = function_call {std::move(s), {}};
}
void add_func_parameter(value v) {
std::get<function_call>(_value)._parameters.emplace_back(std::move(v));
}
bool is_constant() const {
return std::holds_alternative<constant>(_value);
}
bool is_path() const {
return std::holds_alternative<path>(_value);
}
bool is_func() const {
return std::holds_alternative<function_call>(_value);
}
};
// The right-hand-side of a SET in an update expression can be either a
// single value (see above), or value+value, or value-value.
class set_rhs {
public:
char _op; // '+', '-', or 'v'
value _v1;
value _v2;
void set_value(value&& v1) {
_op = 'v';
_v1 = std::move(v1);
}
void set_plus(value&& v2) {
_op = '+';
_v2 = std::move(v2);
}
void set_minus(value&& v2) {
_op = '-';
_v2 = std::move(v2);
}
};
class update_expression {
public:
struct action {
path _path;
struct set {
set_rhs _rhs;
};
struct remove {
};
struct add {
constant _valref;
};
struct del {
constant _valref;
};
std::variant<set, remove, add, del> _action;
void assign_set(path p, set_rhs rhs) {
_path = std::move(p);
_action = set { std::move(rhs) };
}
void assign_remove(path p) {
_path = std::move(p);
_action = remove { };
}
void assign_add(path p, std::string v) {
_path = std::move(p);
_action = add { constant { std::move(v) } };
}
void assign_del(path p, std::string v) {
_path = std::move(p);
_action = del { constant { std::move(v) } };
}
};
private:
std::vector<action> _actions;
bool seen_set = false;
bool seen_remove = false;
bool seen_add = false;
bool seen_del = false;
public:
void add(action a);
void append(update_expression other);
bool empty() const {
return _actions.empty();
}
const std::vector<action>& actions() const {
return _actions;
}
std::vector<action>& actions() {
return _actions;
}
};
// A primitive_condition is a condition expression involving one condition,
// while the full condition_expression below adds boolean logic over these
// primitive conditions.
// The supported primitive conditions are:
// 1. Binary operators - v1 OP v2, where OP is =, <>, <, <=, >, or >= and
// v1 and v2 are values - from the item (an attribute path), the query
// (a ":val" reference), or a function of the above (only the size()
// function is supported).
// 2. Ternary operator - v1 BETWEEN v2 AND v3 (means v1 >= v2 AND v1 <= v3).
// 3. N-ary operator - v1 IN ( v2, v3, ... )
// 4. A single function call (attribute_exists etc.). The parser actually
// accepts a more general "value" here but later stages reject a value
// which is not a function call (because DynamoDB does it too).
class primitive_condition {
public:
enum class type {
UNDEFINED, VALUE, EQ, NE, LT, LE, GT, GE, BETWEEN, IN
};
type _op = type::UNDEFINED;
std::vector<value> _values;
void set_operator(type op) {
_op = op;
}
void add_value(value&& v) {
_values.push_back(std::move(v));
}
bool empty() const {
return _op == type::UNDEFINED;
}
};
class condition_expression {
public:
bool _negated = false; // If true, the entire condition is negated
struct condition_list {
char op = '|'; // '&' or '|'
std::vector<condition_expression> conditions;
};
std::variant<primitive_condition, condition_list> _expression = condition_list();
void set_primitive(primitive_condition&& p) {
_expression = std::move(p);
}
void append(condition_expression&& c, char op);
void apply_not() {
_negated = !_negated;
}
bool empty() const {
return std::holds_alternative<condition_list>(_expression) &&
std::get<condition_list>(_expression).conditions.empty();
}
};
} // namespace parsed
} // namespace alternator
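
To show how a condition_expression tree combines its '&' / '|' lists with the _negated flag, here is a minimal evaluation sketch. It rests on several assumptions: toy_condition, leaf, combine, toy_not and evaluate are hypothetical names, the std::variant of the original is flattened into plain fields, and each leaf carries an already-computed boolean instead of a primitive_condition. How the real code treats an empty list is not covered here.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical, flattened stand-in for parsed::condition_expression.
struct toy_condition {
    bool negated = false;                 // like _negated
    bool is_leaf = false;
    bool leaf_value = false;              // used only when is_leaf is true
    char op = '|';                        // '&' or '|', used only for lists
    std::vector<toy_condition> children;  // used only for lists
};

inline toy_condition leaf(bool value) {
    toy_condition c;
    c.is_leaf = true;
    c.leaf_value = value;
    return c;
}

inline toy_condition combine(char op, std::vector<toy_condition> children) {
    assert(op == '&' || op == '|');
    toy_condition c;
    c.op = op;
    c.children = std::move(children);
    return c;
}

inline toy_condition toy_not(toy_condition c) {
    c.negated = !c.negated;   // same toggle as apply_not()
    return c;
}

inline bool evaluate(const toy_condition& c) {
    bool result;
    if (c.is_leaf) {
        result = c.leaf_value;
    } else if (c.op == '&') {
        result = std::all_of(c.children.begin(), c.children.end(), evaluate);
    } else {
        result = std::any_of(c.children.begin(), c.children.end(), evaluate);
    }
    // Negation applies to the whole node, after its children are combined.
    return c.negated ? !result : result;
}
```

The expression "(NOT (a=b OR c=d)) AND e=f" from the grammar's header comment would map to combine('&', {toy_not(combine('|', {leaf(ab), leaf(cd)})), leaf(ef)}), where ab, cd and ef are the results of the three comparisons.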

View File

@@ -1,128 +0,0 @@
/*
* Copyright 2020 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "seastarx.hh"
#include "service/storage_proxy.hh"
#include "utils/rjson.hh"
#include "executor.hh"
namespace alternator {
// An rmw_operation encapsulates the common logic of all the item update
// operations which may involve a read of the item before the write
// (so-called Read-Modify-Write operations). These operations include PutItem,
// UpdateItem and DeleteItem: All of these may be conditional operations (the
// "Expected" parameter) which require a read before the write, and UpdateItem
// may also have an update expression which refers to the item's old value.
//
// The code below supports running the read and the write together as one
// transaction using LWT (this is why rmw_operation is a subclass of
// cas_request, as required by storage_proxy::cas()), but also has optional
// modes not using LWT.
class rmw_operation : public service::cas_request, public enable_shared_from_this<rmw_operation> {
public:
// The following options choose which mechanism to use for isolating
// parallel write operations:
// * The FORBID_RMW option forbids RMW (read-modify-write) operations
// such as conditional updates. For the remaining write-only
// operations, ordinary quorum writes are isolated enough.
// * The LWT_ALWAYS option always uses LWT (lightweight transactions)
// for any write operation - whether or not it also has a read.
// * The LWT_RMW_ONLY option uses LWT only for RMW operations, and uses
// ordinary quorum writes for write-only operations.
// This option is not safe if the user may send both RMW and write-only
// operations on the same item.
// * The UNSAFE_RMW option does read-modify-write operations as separate
// read and write. It is unsafe - concurrent RMW operations are not
// isolated at all. This option will likely be removed in the future.
enum class write_isolation {
FORBID_RMW, LWT_ALWAYS, LWT_RMW_ONLY, UNSAFE_RMW
};
static constexpr auto WRITE_ISOLATION_TAG_KEY = "system:write_isolation";
static write_isolation get_write_isolation_for_schema(schema_ptr schema);
static write_isolation default_write_isolation;
public:
static void set_default_write_isolation(std::string_view mode);
protected:
// The full request JSON
rjson::value _request;
// All RMW operations involve a single item with a specific partition
// and optional clustering key, in a single table, so the following
// information is common to all of them:
schema_ptr _schema;
partition_key _pk = partition_key::make_empty();
clustering_key _ck = clustering_key::make_empty();
write_isolation _write_isolation;
// All RMW operations can have a ReturnValues parameter from the following
// choices. But note that only UpdateItem actually supports all of them:
enum class returnvalues {
NONE, ALL_OLD, UPDATED_OLD, ALL_NEW, UPDATED_NEW
} _returnvalues;
static returnvalues parse_returnvalues(const rjson::value& request);
// When _returnvalues != NONE, apply() should store here, in JSON form,
// the values which are to be returned in the "Attributes" field.
// The default null JSON means do not return an Attributes field at all.
// This field is marked "mutable" so that the const apply() can modify
// it (see explanation below), but note that because apply() may be
// called more than once, if apply() will sometimes set this field it
// must set it (even if just to the default empty value) every time.
mutable rjson::value _return_attributes;
public:
// The constructor of an rmw_operation subclass should parse the request
// and try to discover as many input errors as it can before really
// attempting the read or write operations.
rmw_operation(service::storage_proxy& proxy, rjson::value&& request);
// rmw_operation subclasses (update_item_operation, put_item_operation
// and delete_item_operation) shall implement an apply() function which
// takes the previous value of the item (if it was read) and creates the
// write mutation. If the previous value of the item does not pass the needed
// conditional expression, apply() should return an empty optional.
// apply() may throw if it encounters input errors not discovered during
// the constructor.
// apply() may be called more than once in case of contention, so it must
// not change the state saved in the object (issue #7218 was caused by
// violating this). We mark apply() "const" to let the compiler validate
// this for us. The output-only field _return_attributes is marked
// "mutable" above so that apply() can still write to it.
virtual std::optional<mutation> apply(std::unique_ptr<rjson::value> previous_item, api::timestamp_type ts) const = 0;
// Convert the above apply() into the signature needed by cas_request:
virtual std::optional<mutation> apply(foreign_ptr<lw_shared_ptr<query::result>> qr, const query::partition_slice& slice, api::timestamp_type ts) override;
virtual ~rmw_operation() = default;
schema_ptr schema() const { return _schema; }
const rjson::value& request() const { return _request; }
rjson::value&& move_request() && { return std::move(_request); }
future<executor::request_return_type> execute(service::storage_proxy& proxy,
service::client_state& client_state,
tracing::trace_state_ptr trace_state,
service_permit permit,
bool needs_read_before_write,
stats& stats);
std::optional<shard_id> shard_for_execute(bool needs_read_before_write);
};
} // namespace alternator
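
The four write_isolation options documented in this header can be summarized as a small decision function. This is a sketch of the documented policy only, not of rmw_operation::execute(): write_path and choose_write_path are hypothetical names, and the choice of a plain write for write-only operations under UNSAFE_RMW is an assumption, since the comments describe that mode only for RMW operations.

```cpp
#include <cassert>
#include <stdexcept>

// Same enumerators as rmw_operation::write_isolation.
enum class write_isolation { FORBID_RMW, LWT_ALWAYS, LWT_RMW_ONLY, UNSAFE_RMW };

// Hypothetical names for the ways a write could be carried out.
enum class write_path { PLAIN_WRITE, LWT, SEPARATE_READ_THEN_WRITE };

inline write_path choose_write_path(write_isolation mode, bool needs_read_before_write) {
    switch (mode) {
    case write_isolation::FORBID_RMW:
        // RMW operations such as conditional updates are rejected outright.
        if (needs_read_before_write) {
            throw std::invalid_argument("read-modify-write operations are forbidden");
        }
        return write_path::PLAIN_WRITE;
    case write_isolation::LWT_ALWAYS:
        // Every write goes through LWT, whether or not it needs a read.
        return write_path::LWT;
    case write_isolation::LWT_RMW_ONLY:
        return needs_read_before_write ? write_path::LWT : write_path::PLAIN_WRITE;
    case write_isolation::UNSAFE_RMW:
        // Assumption: write-only operations use a plain write in this mode.
        return needs_read_before_write ? write_path::SEPARATE_READ_THEN_WRITE
                                       : write_path::PLAIN_WRITE;
    }
    assert(false && "unhandled write_isolation value");
    return write_path::PLAIN_WRITE;
}
```

Laid out this way, the caveat on LWT_RMW_ONLY is visible: the same item may be written through LWT by one request and through a plain write by another, which is why the header calls that mode unsafe when both kinds of operations target the same item.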

View File

@@ -1,375 +0,0 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "base64.hh"
#include "log.hh"
#include "serialization.hh"
#include "error.hh"
#include "rapidjson/writer.h"
#include "concrete_types.hh"
#include "cql3/type_json.hh"
static logging::logger slogger("alternator-serialization");
namespace alternator {
type_info type_info_from_string(std::string_view type) {
static thread_local const std::unordered_map<std::string_view, type_info> type_infos = {
{"S", {alternator_type::S, utf8_type}},
{"B", {alternator_type::B, bytes_type}},
{"BOOL", {alternator_type::BOOL, boolean_type}},
{"N", {alternator_type::N, decimal_type}}, //FIXME: Replace with custom Alternator type when implemented
};
auto it = type_infos.find(type);
if (it == type_infos.end()) {
return {alternator_type::NOT_SUPPORTED_YET, utf8_type};
}
return it->second;
}
type_representation represent_type(alternator_type atype) {
static thread_local const std::unordered_map<alternator_type, type_representation> type_representations = {
{alternator_type::S, {"S", utf8_type}},
{alternator_type::B, {"B", bytes_type}},
{alternator_type::BOOL, {"BOOL", boolean_type}},
{alternator_type::N, {"N", decimal_type}}, //FIXME: Replace with custom Alternator type when implemented
};
auto it = type_representations.find(atype);
if (it == type_representations.end()) {
throw std::runtime_error(format("Unknown alternator type {}", int8_t(atype)));
}
return it->second;
}
struct from_json_visitor {
const rjson::value& v;
bytes_ostream& bo;
void operator()(const reversed_type_impl& t) const { visit(*t.underlying_type(), from_json_visitor{v, bo}); };
void operator()(const string_type_impl& t) {
bo.write(t.from_string(rjson::to_string_view(v)));
}
void operator()(const bytes_type_impl& t) const {
bo.write(base64_decode(v));
}
void operator()(const boolean_type_impl& t) const {
bo.write(boolean_type->decompose(v.GetBool()));
}
void operator()(const decimal_type_impl& t) const {
try {
bo.write(t.from_string(rjson::to_string_view(v)));
} catch (const marshal_exception& e) {
throw api_error::validation(format("The parameter cannot be converted to a numeric value: {}", v));
}
}
// default
void operator()(const abstract_type& t) const {
bo.write(from_json_object(t, v, cql_serialization_format::internal()));
}
};
bytes serialize_item(const rjson::value& item) {
if (item.IsNull() || item.MemberCount() != 1) {
throw api_error::validation(format("An item can contain only one attribute definition: {}", item));
}
auto it = item.MemberBegin();
type_info type_info = type_info_from_string(rjson::to_string_view(it->name)); // JSON keys are guaranteed to be strings
if (type_info.atype == alternator_type::NOT_SUPPORTED_YET) {
slogger.trace("Non-optimal serialization of type {}", it->name);
return bytes{int8_t(type_info.atype)} + to_bytes(rjson::print(item));
}
bytes_ostream bo;
bo.write(bytes{int8_t(type_info.atype)});
visit(*type_info.dtype, from_json_visitor{it->value, bo});
return bytes(bo.linearize());
}
struct to_json_visitor {
rjson::value& deserialized;
const std::string& type_ident;
bytes_view bv;
void operator()(const reversed_type_impl& t) const { visit(*t.underlying_type(), to_json_visitor{deserialized, type_ident, bv}); };
void operator()(const decimal_type_impl& t) const {
auto s = to_json_string(*decimal_type, bytes(bv));
//FIXME(sarna): unnecessary copy
rjson::set_with_string_name(deserialized, type_ident, rjson::from_string(s));
}
void operator()(const string_type_impl& t) {
rjson::set_with_string_name(deserialized, type_ident, rjson::from_string(reinterpret_cast<const char *>(bv.data()), bv.size()));
}
void operator()(const bytes_type_impl& t) const {
std::string b64 = base64_encode(bv);
rjson::set_with_string_name(deserialized, type_ident, rjson::from_string(b64));
}
// default
void operator()(const abstract_type& t) const {
rjson::set_with_string_name(deserialized, type_ident, rjson::parse(to_json_string(t, bytes(bv))));
}
};
rjson::value deserialize_item(bytes_view bv) {
rjson::value deserialized(rapidjson::kObjectType);
if (bv.empty()) {
throw api_error::validation("Serialized value empty");
}
alternator_type atype = alternator_type(bv[0]);
bv.remove_prefix(1);
if (atype == alternator_type::NOT_SUPPORTED_YET) {
slogger.trace("Non-optimal deserialization of alternator type {}", int8_t(atype));
return rjson::parse(std::string_view(reinterpret_cast<const char *>(bv.data()), bv.size()));
}
type_representation type_representation = represent_type(atype);
visit(*type_representation.dtype, to_json_visitor{deserialized, type_representation.ident, bv});
return deserialized;
}
std::string type_to_string(data_type type) {
static thread_local std::unordered_map<data_type, std::string> types = {
{utf8_type, "S"},
{bytes_type, "B"},
{boolean_type, "BOOL"},
{decimal_type, "N"}, // FIXME: use a specialized Alternator number type instead of the general decimal_type
};
auto it = types.find(type);
if (it == types.end()) {
// fall back to string, in order to be able to present
// internal Scylla types in a human-readable way
return "S";
}
return it->second;
}
bytes get_key_column_value(const rjson::value& item, const column_definition& column) {
std::string column_name = column.name_as_text();
const rjson::value* key_typed_value = rjson::find(item, column_name);
if (!key_typed_value) {
throw api_error::validation(format("Key column {} not found", column_name));
}
return get_key_from_typed_value(*key_typed_value, column);
}
// Parses the JSON encoding for a key value, which is a map with a single
// entry, whose key is the type (expected to match the key column's type)
// and the value is the encoded value.
bytes get_key_from_typed_value(const rjson::value& key_typed_value, const column_definition& column) {
if (!key_typed_value.IsObject() || key_typed_value.MemberCount() != 1 ||
!key_typed_value.MemberBegin()->value.IsString()) {
throw api_error::validation(
format("Malformed value object for key column {}: {}",
column.name_as_text(), key_typed_value));
}
auto it = key_typed_value.MemberBegin();
if (it->name != type_to_string(column.type)) {
throw api_error::validation(
format("Type mismatch: expected type {} for key column {}, got type {}",
type_to_string(column.type), column.name_as_text(), it->name));
}
std::string_view value_view = rjson::to_string_view(it->value);
if (value_view.empty()) {
throw api_error::validation(
format("The AttributeValue for a key attribute cannot contain an empty string value. Key: {}", column.name_as_text()));
}
if (column.type == bytes_type) {
return base64_decode(it->value);
} else {
return column.type->from_string(rjson::to_string_view(it->value));
}
}
rjson::value json_key_column_value(bytes_view cell, const column_definition& column) {
if (column.type == bytes_type) {
std::string b64 = base64_encode(cell);
return rjson::from_string(b64);
} else if (column.type == utf8_type) {
return rjson::from_string(std::string(reinterpret_cast<const char*>(cell.data()), cell.size()));
} else if (column.type == decimal_type) {
// FIXME: use specialized Alternator number type, not the more
// general "decimal_type". A dedicated type can be more efficient
// in storage space and in parsing speed.
auto s = to_json_string(*decimal_type, bytes(cell));
return rjson::from_string(s);
} else {
// Support for arbitrary key types is useful for parsing values of virtual tables,
// which can involve any type supported by Scylla.
// In order to guarantee that the returned type is parsable by alternator clients,
// they are represented simply as strings.
return rjson::from_string(column.type->to_string(bytes(cell)));
}
}
partition_key pk_from_json(const rjson::value& item, schema_ptr schema) {
std::vector<bytes> raw_pk;
// FIXME: this is a loop, but we really allow only one partition key column.
for (const column_definition& cdef : schema->partition_key_columns()) {
bytes raw_value = get_key_column_value(item, cdef);
raw_pk.push_back(std::move(raw_value));
}
return partition_key::from_exploded(raw_pk);
}
clustering_key ck_from_json(const rjson::value& item, schema_ptr schema) {
if (schema->clustering_key_size() == 0) {
return clustering_key::make_empty();
}
std::vector<bytes> raw_ck;
// FIXME: this is a loop, but we really allow only one clustering key column.
for (const column_definition& cdef : schema->clustering_key_columns()) {
bytes raw_value = get_key_column_value(item, cdef);
raw_ck.push_back(std::move(raw_value));
}
return clustering_key::from_exploded(raw_ck);
}
big_decimal unwrap_number(const rjson::value& v, std::string_view diagnostic) {
if (!v.IsObject() || v.MemberCount() != 1) {
throw api_error::validation(format("{}: invalid number object", diagnostic));
}
auto it = v.MemberBegin();
if (it->name != "N") {
throw api_error::validation(format("{}: expected number, found type '{}'", diagnostic, it->name));
}
try {
if (it->value.IsNumber()) {
// FIXME(sarna): should use big_decimal constructor with numeric values directly:
return big_decimal(rjson::print(it->value));
}
if (!it->value.IsString()) {
throw api_error::validation(format("{}: improperly formatted number constant", diagnostic));
}
return big_decimal(rjson::to_string_view(it->value));
} catch (const marshal_exception& e) {
throw api_error::validation(format("The parameter cannot be converted to a numeric value: {}", it->value));
}
}
const std::pair<std::string, const rjson::value*> unwrap_set(const rjson::value& v) {
if (!v.IsObject() || v.MemberCount() != 1) {
return {"", nullptr};
}
auto it = v.MemberBegin();
const std::string it_key = it->name.GetString();
if (it_key != "SS" && it_key != "BS" && it_key != "NS") {
return {"", nullptr};
}
return std::make_pair(it_key, &(it->value));
}
const rjson::value* unwrap_list(const rjson::value& v) {
if (!v.IsObject() || v.MemberCount() != 1) {
return nullptr;
}
auto it = v.MemberBegin();
if (it->name != std::string("L")) {
return nullptr;
}
return &(it->value);
}
// Take two JSON-encoded numeric values ({"N": "thenumber"}) and return the
// sum, again as a JSON-encoded number.
rjson::value number_add(const rjson::value& v1, const rjson::value& v2) {
auto n1 = unwrap_number(v1, "UpdateExpression");
auto n2 = unwrap_number(v2, "UpdateExpression");
rjson::value ret = rjson::empty_object();
std::string str_ret = std::string((n1 + n2).to_string());
rjson::set(ret, "N", rjson::from_string(str_ret));
return ret;
}
rjson::value number_subtract(const rjson::value& v1, const rjson::value& v2) {
auto n1 = unwrap_number(v1, "UpdateExpression");
auto n2 = unwrap_number(v2, "UpdateExpression");
rjson::value ret = rjson::empty_object();
std::string str_ret = std::string((n1 - n2).to_string());
rjson::set(ret, "N", rjson::from_string(str_ret));
return ret;
}
// Take two JSON-encoded set values (e.g. {"SS": [...the actual set]}) and
// return the sum of both sets, again as a set value.
rjson::value set_sum(const rjson::value& v1, const rjson::value& v2) {
auto [set1_type, set1] = unwrap_set(v1);
auto [set2_type, set2] = unwrap_set(v2);
if (set1_type != set2_type) {
throw api_error::validation(format("Mismatched set types: {} and {}", set1_type, set2_type));
}
if (!set1 || !set2) {
throw api_error::validation("UpdateExpression: ADD operation for sets must be given sets as arguments");
}
rjson::value sum = rjson::copy(*set1);
std::set<rjson::value, rjson::single_value_comp> set1_raw;
for (auto it = sum.Begin(); it != sum.End(); ++it) {
set1_raw.insert(rjson::copy(*it));
}
for (const auto& a : set2->GetArray()) {
if (!set1_raw.contains(a)) {
rjson::push_back(sum, rjson::copy(a));
}
}
rjson::value ret = rjson::empty_object();
rjson::set_with_string_name(ret, set1_type, std::move(sum));
return ret;
}
// Take two JSON-encoded set values (e.g. {"SS": [...the actual set]}) and
// return the difference of s1 - s2, again as a set value.
// DynamoDB does not allow empty sets, so if the resulting set is empty,
// return an unset optional instead.
std::optional<rjson::value> set_diff(const rjson::value& v1, const rjson::value& v2) {
auto [set1_type, set1] = unwrap_set(v1);
auto [set2_type, set2] = unwrap_set(v2);
if (set1_type != set2_type) {
throw api_error::validation(format("Mismatched set types: {} and {}", set1_type, set2_type));
}
if (!set1 || !set2) {
throw api_error::validation("UpdateExpression: DELETE operation can only be performed on a set");
}
std::set<rjson::value, rjson::single_value_comp> set1_raw;
for (auto it = set1->Begin(); it != set1->End(); ++it) {
set1_raw.insert(rjson::copy(*it));
}
for (const auto& a : set2->GetArray()) {
set1_raw.erase(a);
}
if (set1_raw.empty()) {
return std::nullopt;
}
rjson::value ret = rjson::empty_object();
rjson::set_with_string_name(ret, set1_type, rjson::empty_array());
rjson::value& result_set = ret[set1_type];
for (const auto& a : set1_raw) {
rjson::push_back(result_set, rjson::copy(a));
}
return ret;
}
}
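
The set helpers above (set_sum and set_diff) can be illustrated without rjson. The following is a minimal sketch, assuming plain std::vector<std::string> as a stand-in for the JSON array inside {"SS": [...]}; the names string_set, sketch_set_sum and sketch_set_diff are hypothetical and do not exist in Scylla. It mirrors two details of the original: the union keeps the first set's order and appends only unseen elements of the second, and an empty difference is reported as an unset optional.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the JSON array inside {"SS": [...]}.
using string_set = std::vector<std::string>;

// Union: keep every element of a in its original order, then append the
// elements of b that are not already present in a. Like the original,
// this assumes b itself contains no duplicates.
string_set sketch_set_sum(const string_set& a, const string_set& b) {
    string_set sum = a;
    const std::set<std::string> seen(a.begin(), a.end());
    for (const auto& v : b) {
        if (seen.count(v) == 0) {
            sum.push_back(v);
        }
    }
    return sum;
}

// Difference a - b. An empty result is reported as std::nullopt, because
// DynamoDB does not allow empty sets. The result comes back in sorted
// order, since it is rebuilt from an ordered std::set.
std::optional<string_set> sketch_set_diff(const string_set& a, const string_set& b) {
    std::set<std::string> remaining(a.begin(), a.end());
    for (const auto& v : b) {
        remaining.erase(v);
    }
    if (remaining.empty()) {
        return std::nullopt;
    }
    return string_set(remaining.begin(), remaining.end());
}
```

A caller is expected to treat the unset optional as "remove the attribute" rather than store an empty set.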


@@ -1,89 +0,0 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <string>
#include <string_view>
#include "types.hh"
#include "schema_fwd.hh"
#include "keys.hh"
#include "utils/rjson.hh"
#include "utils/big_decimal.hh"
namespace alternator {
enum class alternator_type : int8_t {
S, B, BOOL, N, NOT_SUPPORTED_YET
};
struct type_info {
alternator_type atype;
data_type dtype;
};
struct type_representation {
std::string ident;
data_type dtype;
};
type_info type_info_from_string(std::string_view type);
type_representation represent_type(alternator_type atype);
bytes serialize_item(const rjson::value& item);
rjson::value deserialize_item(bytes_view bv);
std::string type_to_string(data_type type);
bytes get_key_column_value(const rjson::value& item, const column_definition& column);
bytes get_key_from_typed_value(const rjson::value& key_typed_value, const column_definition& column);
rjson::value json_key_column_value(bytes_view cell, const column_definition& column);
partition_key pk_from_json(const rjson::value& item, schema_ptr schema);
clustering_key ck_from_json(const rjson::value& item, schema_ptr schema);
// If v encodes a number (i.e., it is a {"N": "thenumber"}), returns an object representing it. Otherwise,
// raises ValidationException with diagnostic.
big_decimal unwrap_number(const rjson::value& v, std::string_view diagnostic);
// Checks whether a given JSON object encodes a set (i.e., it is a {"SS": [...]}, {"NS": [...]} or {"BS": [...]})
// and returns the set's type and a pointer to that set. If the object does not encode a set,
// the returned value is {"", nullptr}.
const std::pair<std::string, const rjson::value*> unwrap_set(const rjson::value& v);
// Checks whether a given JSON object encodes a list (i.e., it is a {"L": [...]})
// and returns a pointer to that list, or nullptr if it does not.
const rjson::value* unwrap_list(const rjson::value& v);
// Take two JSON-encoded numeric values ({"N": "thenumber"}) and return the
// sum, again as a JSON-encoded number.
rjson::value number_add(const rjson::value& v1, const rjson::value& v2);
rjson::value number_subtract(const rjson::value& v1, const rjson::value& v2);
// Take two JSON-encoded set values (e.g. {"SS": [...the actual set]}) and
// return the sum of both sets, again as a set value.
rjson::value set_sum(const rjson::value& v1, const rjson::value& v2);
// Take two JSON-encoded set values (e.g. {"SS": [...the actual set]}) and
// return the difference of s1 - s2, again as a set value.
// DynamoDB does not allow empty sets, so if the resulting set is empty,
// return an unset optional instead.
std::optional<rjson::value> set_diff(const rjson::value& v1, const rjson::value& v2);
}
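
The alternator_type enum declared above is used by deserialize_item() as a one-byte tag in front of the serialized payload. The sketch below shows only that framing step, under the assumption that the payload is treated as opaque bytes; sketch_type and sketch_split_tag are hypothetical names, and the real function throws api_error::validation rather than std::invalid_argument.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string_view>
#include <utility>

// Mirrors the enumerators of alternator_type from the header above.
enum class sketch_type : std::int8_t { S, B, BOOL, N, NOT_SUPPORTED_YET };

// Splits a serialized cell into its leading type tag and the remaining
// payload. An empty input is rejected, as in deserialize_item().
std::pair<sketch_type, std::string_view> sketch_split_tag(std::string_view bv) {
    if (bv.empty()) {
        throw std::invalid_argument("Serialized value empty");
    }
    const auto tag = static_cast<sketch_type>(bv[0]);
    bv.remove_prefix(1);  // the rest is the type-specific payload
    return {tag, bv};
}
```

For the NOT_SUPPORTED_YET tag the original parses the payload as raw JSON text; for the other tags it hands the payload to the matching Scylla data type.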


@@ -1,499 +0,0 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "alternator/server.hh"
#include "log.hh"
#include <seastar/http/function_handlers.hh>
#include <seastar/json/json_elements.hh>
#include "seastarx.hh"
#include "error.hh"
#include "utils/rjson.hh"
#include "auth.hh"
#include <cctype>
#include "cql3/query_processor.hh"
#include "service/storage_service.hh"
#include "utils/overloaded_functor.hh"
static logging::logger slogger("alternator-server");
using namespace httpd;
namespace alternator {
static constexpr auto TARGET = "X-Amz-Target";
inline std::vector<std::string_view> split(std::string_view text, char separator) {
std::vector<std::string_view> tokens;
if (text == "") {
return tokens;
}
while (true) {
auto pos = text.find_first_of(separator);
if (pos != std::string_view::npos) {
tokens.emplace_back(text.data(), pos);
text.remove_prefix(pos + 1);
} else {
tokens.emplace_back(text);
break;
}
}
return tokens;
}
// DynamoDB HTTP error responses are structured as described in
// https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Programming.Errors.html
// Our handlers throw an exception to report an error. If the exception
// is of type alternator::api_error, it is unwrapped and properly reported to
// the user directly. Other exceptions are unexpected, and reported as
// Internal Server Error.
class api_handler : public handler_base {
public:
api_handler(const std::function<future<executor::request_return_type>(std::unique_ptr<request> req)>& _handle) : _f_handle(
[this, _handle](std::unique_ptr<request> req, std::unique_ptr<reply> rep) {
return seastar::futurize_invoke(_handle, std::move(req)).then_wrapped([this, rep = std::move(rep)](future<executor::request_return_type> resf) mutable {
if (resf.failed()) {
// Exceptions of type api_error are wrapped as JSON and
// returned to the client as expected. Other types of
// exceptions are unexpected, and returned to the user
// as an internal server error:
try {
resf.get();
} catch (api_error &ae) {
generate_error_reply(*rep, ae);
} catch (rjson::error & re) {
generate_error_reply(*rep,
api_error::validation(re.what()));
} catch (...) {
generate_error_reply(*rep,
api_error::internal(format("Internal server error: {}", std::current_exception())));
}
return make_ready_future<std::unique_ptr<reply>>(std::move(rep));
}
auto res = resf.get0();
std::visit(overloaded_functor {
[&] (const json::json_return_type& json_return_value) {
slogger.trace("api_handler success case");
if (json_return_value._body_writer) {
rep->write_body("json", std::move(json_return_value._body_writer));
} else {
rep->_content += json_return_value._res;
}
},
[&] (const api_error& err) {
generate_error_reply(*rep, err);
}
}, res);
return make_ready_future<std::unique_ptr<reply>>(std::move(rep));
});
}), _type("json") { }
api_handler(const api_handler&) = default;
future<std::unique_ptr<reply>> handle(const sstring& path,
std::unique_ptr<request> req, std::unique_ptr<reply> rep) override {
return _f_handle(std::move(req), std::move(rep)).then(
[this](std::unique_ptr<reply> rep) {
rep->done(_type);
return make_ready_future<std::unique_ptr<reply>>(std::move(rep));
});
}
protected:
void generate_error_reply(reply& rep, const api_error& err) {
rep._content += "{\"__type\":\"com.amazonaws.dynamodb.v20120810#" + err._type + "\"," +
"\"message\":\"" + err._msg + "\"}";
rep._status = err._http_code;
slogger.trace("api_handler error case: {}", rep._content);
}
future_handler_function _f_handle;
sstring _type;
};
class gated_handler : public handler_base {
seastar::gate& _gate;
public:
gated_handler(seastar::gate& gate) : _gate(gate) {}
virtual future<std::unique_ptr<reply>> do_handle(const sstring& path, std::unique_ptr<request> req, std::unique_ptr<reply> rep) = 0;
virtual future<std::unique_ptr<reply>> handle(const sstring& path, std::unique_ptr<request> req, std::unique_ptr<reply> rep) final override {
return with_gate(_gate, [this, &path, req = std::move(req), rep = std::move(rep)] () mutable {
return do_handle(path, std::move(req), std::move(rep));
});
}
};
class health_handler : public gated_handler {
public:
health_handler(seastar::gate& pending_requests) : gated_handler(pending_requests) {}
protected:
virtual future<std::unique_ptr<reply>> do_handle(const sstring& path, std::unique_ptr<request> req, std::unique_ptr<reply> rep) override {
rep->set_status(reply::status_type::ok);
rep->write_body("txt", format("healthy: {}", req->get_header("Host")));
return make_ready_future<std::unique_ptr<reply>>(std::move(rep));
}
};
class local_nodelist_handler : public gated_handler {
public:
local_nodelist_handler(seastar::gate& pending_requests) : gated_handler(pending_requests) {}
protected:
virtual future<std::unique_ptr<reply>> do_handle(const sstring& path, std::unique_ptr<request> req, std::unique_ptr<reply> rep) override {
rjson::value results = rjson::empty_array();
// It's very easy to get a list of all live nodes on the cluster,
// using gms::get_local_gossiper().get_live_members(). But getting
// just the list of live nodes in this DC needs more elaborate code:
sstring local_dc = locator::i_endpoint_snitch::get_local_snitch_ptr()->get_datacenter(
utils::fb_utilities::get_broadcast_address());
std::unordered_set<gms::inet_address> local_dc_nodes =
service::get_local_storage_service().get_token_metadata().
get_topology().get_datacenter_endpoints().at(local_dc);
for (auto& ip : local_dc_nodes) {
if (gms::get_local_gossiper().is_alive(ip)) {
rjson::push_back(results, rjson::from_string(ip.to_sstring()));
}
}
rep->set_status(reply::status_type::ok);
rep->set_content_type("json");
rep->_content = rjson::print(results);
return make_ready_future<std::unique_ptr<reply>>(std::move(rep));
}
};
future<> server::verify_signature(const request& req) {
if (!_enforce_authorization) {
slogger.debug("Skipping authorization");
return make_ready_future<>();
}
auto host_it = req._headers.find("Host");
if (host_it == req._headers.end()) {
throw api_error::invalid_signature("Host header is mandatory for signature verification");
}
auto authorization_it = req._headers.find("Authorization");
if (authorization_it == req._headers.end()) {
throw api_error::invalid_signature("Authorization header is mandatory for signature verification");
}
std::string host = host_it->second;
std::vector<std::string_view> credentials_raw = split(authorization_it->second, ' ');
std::string credential;
std::string user_signature;
std::string signed_headers_str;
std::vector<std::string_view> signed_headers;
for (std::string_view entry : credentials_raw) {
std::vector<std::string_view> entry_split = split(entry, '=');
if (entry_split.size() != 2) {
if (entry != "AWS4-HMAC-SHA256") {
throw api_error::invalid_signature(format("Only AWS4-HMAC-SHA256 algorithm is supported. Found: {}", entry));
}
continue;
}
std::string_view auth_value = entry_split[1];
// Commas appear as an additional (quite redundant) delimiter
if (!auth_value.empty() && auth_value.back() == ',') {
auth_value.remove_suffix(1);
}
if (entry_split[0] == "Credential") {
credential = std::string(auth_value);
} else if (entry_split[0] == "Signature") {
user_signature = std::string(auth_value);
} else if (entry_split[0] == "SignedHeaders") {
signed_headers_str = std::string(auth_value);
signed_headers = split(auth_value, ';');
std::sort(signed_headers.begin(), signed_headers.end());
}
}
std::vector<std::string_view> credential_split = split(credential, '/');
if (credential_split.size() != 5) {
throw api_error::validation(format("Incorrect credential information format: {}", credential));
}
std::string user(credential_split[0]);
std::string datestamp(credential_split[1]);
std::string region(credential_split[2]);
std::string service(credential_split[3]);
std::map<std::string_view, std::string_view> signed_headers_map;
for (const auto& header : signed_headers) {
signed_headers_map.emplace(header, std::string_view());
}
for (auto& header : req._headers) {
std::string header_str;
header_str.resize(header.first.size());
std::transform(header.first.begin(), header.first.end(), header_str.begin(), ::tolower);
auto it = signed_headers_map.find(header_str);
if (it != signed_headers_map.end()) {
it->second = std::string_view(header.second);
}
}
auto cache_getter = [&qp = _qp] (std::string username) {
return get_key_from_roles(qp, std::move(username));
};
return _key_cache.get_ptr(user, cache_getter).then([this, &req,
user = std::move(user),
host = std::move(host),
datestamp = std::move(datestamp),
signed_headers_str = std::move(signed_headers_str),
signed_headers_map = std::move(signed_headers_map),
region = std::move(region),
service = std::move(service),
user_signature = std::move(user_signature)] (key_cache::value_ptr key_ptr) {
std::string signature = get_signature(user, *key_ptr, std::string_view(host), req._method,
datestamp, signed_headers_str, signed_headers_map, req.content, region, service, "");
if (signature != std::string_view(user_signature)) {
_key_cache.remove(user);
throw api_error::unrecognized_client("The security token included in the request is invalid.");
}
});
}
future<executor::request_return_type> server::handle_api_request(std::unique_ptr<request>&& req) {
_executor._stats.total_operations++;
sstring target = req->get_header(TARGET);
std::vector<std::string_view> split_target = split(target, '.');
//NOTICE(sarna): Target consists of Dynamo API version followed by a dot '.' and operation type (e.g. CreateTable)
std::string op = split_target.empty() ? std::string() : std::string(split_target.back());
slogger.trace("Request: {} {} {}", op, req->content, req->_headers);
return verify_signature(*req).then([this, op, req = std::move(req)] () mutable {
auto callback_it = _callbacks.find(op);
if (callback_it == _callbacks.end()) {
_executor._stats.unsupported_operations++;
throw api_error::unknown_operation(format("Unsupported operation {}", op));
}
return with_gate(_pending_requests, [this, callback_it = std::move(callback_it), op = std::move(op), req = std::move(req)] () mutable {
//FIXME: Client state can provide more context, e.g. client's endpoint address
// We use unique_ptr because client_state cannot be moved or copied
return do_with(std::make_unique<executor::client_state>(executor::client_state::internal_tag()),
[this, callback_it = std::move(callback_it), op = std::move(op), req = std::move(req)] (std::unique_ptr<executor::client_state>& client_state) mutable {
tracing::trace_state_ptr trace_state = executor::maybe_trace_query(*client_state, op, req->content);
tracing::trace(trace_state, op);
// JSON parsing can allocate up to roughly 2x the size of the raw document, + a couple of bytes for maintenance.
// FIXME: by this time, the whole HTTP request was already read, so some memory is already occupied.
// Once HTTP allows working on streams, we should grab the permit *before* reading the HTTP payload.
size_t mem_estimate = req->content.size() * 3 + 8000;
auto units_fut = get_units(*_memory_limiter, mem_estimate);
if (_memory_limiter->waiters()) {
++_executor._stats.requests_blocked_memory;
}
return units_fut.then([this, callback_it = std::move(callback_it), &client_state, trace_state, req = std::move(req)] (semaphore_units<> units) mutable {
return _json_parser.parse(req->content).then([this, callback_it = std::move(callback_it), &client_state, trace_state,
units = std::move(units), req = std::move(req)] (rjson::value json_request) mutable {
return callback_it->second(_executor, *client_state, trace_state, make_service_permit(std::move(units)), std::move(json_request), std::move(req)).finally([trace_state] {});
});
});
});
});
});
}
void server::set_routes(routes& r) {
api_handler* req_handler = new api_handler([this] (std::unique_ptr<request> req) mutable {
return handle_api_request(std::move(req));
});
r.put(operation_type::POST, "/", req_handler);
r.put(operation_type::GET, "/", new health_handler(_pending_requests));
// The "/localnodes" request is a new Alternator feature, not supported by
// DynamoDB and not required for DynamoDB compatibility. It allows a
// client to enquire - using a trivial HTTP request without requiring
// authentication - the list of all live nodes in the same data center of
// the Alternator cluster. The client can use this list to balance its
// request load to all the nodes in the same geographical region.
// Note that this API exposes - openly without authentication - the
// information on the cluster's members inside one data center. We do not
// consider this to be a security risk, because an attacker can already
// scan an entire subnet for nodes responding to the health request,
// or even just scan for open ports.
r.put(operation_type::GET, "/localnodes", new local_nodelist_handler(_pending_requests));
}
//FIXME: A way to immediately invalidate the cache should be considered,
// e.g. when the system table which stores the keys is changed.
// For now, this propagation may take up to 1 minute.
server::server(executor& exec, cql3::query_processor& qp)
: _http_server("http-alternator")
, _https_server("https-alternator")
, _executor(exec)
, _qp(qp)
, _key_cache(1024, 1min, slogger)
, _enforce_authorization(false)
, _enabled_servers{}
, _pending_requests{}
, _callbacks{
{"CreateTable", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.create_table(client_state, std::move(trace_state), std::move(permit), std::move(json_request));
}},
{"DescribeTable", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.describe_table(client_state, std::move(trace_state), std::move(permit), std::move(json_request));
}},
{"DeleteTable", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.delete_table(client_state, std::move(trace_state), std::move(permit), std::move(json_request));
}},
{"UpdateTable", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.update_table(client_state, std::move(trace_state), std::move(permit), std::move(json_request));
}},
{"PutItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.put_item(client_state, std::move(trace_state), std::move(permit), std::move(json_request));
}},
{"UpdateItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.update_item(client_state, std::move(trace_state), std::move(permit), std::move(json_request));
}},
{"GetItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.get_item(client_state, std::move(trace_state), std::move(permit), std::move(json_request));
}},
{"DeleteItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.delete_item(client_state, std::move(trace_state), std::move(permit), std::move(json_request));
}},
{"ListTables", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.list_tables(client_state, std::move(permit), std::move(json_request));
}},
{"Scan", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.scan(client_state, std::move(trace_state), std::move(permit), std::move(json_request));
}},
{"DescribeEndpoints", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.describe_endpoints(client_state, std::move(permit), std::move(json_request), req->get_header("Host"));
}},
{"BatchWriteItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.batch_write_item(client_state, std::move(trace_state), std::move(permit), std::move(json_request));
}},
{"BatchGetItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.batch_get_item(client_state, std::move(trace_state), std::move(permit), std::move(json_request));
}},
{"Query", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.query(client_state, std::move(trace_state), std::move(permit), std::move(json_request));
}},
{"TagResource", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.tag_resource(client_state, std::move(permit), std::move(json_request));
}},
{"UntagResource", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.untag_resource(client_state, std::move(permit), std::move(json_request));
}},
{"ListTagsOfResource", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.list_tags_of_resource(client_state, std::move(permit), std::move(json_request));
}},
{"ListStreams", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.list_streams(client_state, std::move(permit), std::move(json_request));
}},
{"DescribeStream", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.describe_stream(client_state, std::move(permit), std::move(json_request));
}},
{"GetShardIterator", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.get_shard_iterator(client_state, std::move(permit), std::move(json_request));
}},
{"GetRecords", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.get_records(client_state, std::move(trace_state), std::move(permit), std::move(json_request));
}},
} {
}
future<> server::init(net::inet_address addr, std::optional<uint16_t> port, std::optional<uint16_t> https_port, std::optional<tls::credentials_builder> creds,
bool enforce_authorization, semaphore* memory_limiter) {
_memory_limiter = memory_limiter;
_enforce_authorization = enforce_authorization;
if (!port && !https_port) {
return make_exception_future<>(std::runtime_error("Either regular port or TLS port"
" must be specified in order to init an alternator HTTP server instance"));
}
return seastar::async([this, addr, port, https_port, creds] {
try {
_executor.start().get();
if (port) {
set_routes(_http_server._routes);
_http_server.set_content_length_limit(server::content_length_limit);
_http_server.listen(socket_address{addr, *port}).get();
_enabled_servers.push_back(std::ref(_http_server));
}
if (https_port) {
set_routes(_https_server._routes);
_https_server.set_content_length_limit(server::content_length_limit);
_https_server.set_tls_credentials(creds->build_reloadable_server_credentials([](const std::unordered_set<sstring>& files, std::exception_ptr ep) {
if (ep) {
slogger.warn("Exception loading {}: {}", files, ep);
} else {
slogger.info("Reloaded {}", files);
}
}).get0());
_https_server.listen(socket_address{addr, *https_port}).get();
_enabled_servers.push_back(std::ref(_https_server));
}
} catch (...) {
slogger.error("Failed to set up Alternator HTTP server on {} port {}, TLS port {}: {}",
addr, port ? std::to_string(*port) : "OFF", https_port ? std::to_string(*https_port) : "OFF", std::current_exception());
std::throw_with_nested(std::runtime_error(
format("Failed to set up Alternator HTTP server on {} port {}, TLS port {}",
addr, port ? std::to_string(*port) : "OFF", https_port ? std::to_string(*https_port) : "OFF")));
}
});
}
future<> server::stop() {
return parallel_for_each(_enabled_servers, [] (http_server& server) {
return server.stop();
}).then([this] {
return _pending_requests.close();
}).then([this] {
return _json_parser.stop();
});
}
server::json_parser::json_parser() : _run_parse_json_thread(async([this] {
while (true) {
_document_waiting.wait().get();
if (_as.abort_requested()) {
return;
}
try {
_parsed_document = rjson::parse_yieldable(_raw_document);
_current_exception = nullptr;
} catch (...) {
_current_exception = std::current_exception();
}
_document_parsed.signal();
}
})) {
}
future<rjson::value> server::json_parser::parse(std::string_view content) {
if (content.size() < yieldable_parsing_threshold) {
return make_ready_future<rjson::value>(rjson::parse(content));
}
return with_semaphore(_parsing_sem, 1, [this, content] {
_raw_document = content;
_document_waiting.signal();
return _document_parsed.wait().then([this] {
if (_current_exception) {
return make_exception_future<rjson::value>(_current_exception);
}
return make_ready_future<rjson::value>(std::move(_parsed_document));
});
});
}
future<> server::json_parser::stop() {
_as.request_abort();
_document_waiting.signal();
_document_parsed.broken();
return std::move(_run_parse_json_thread);
}
}
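
Note on the error handling in server::init() above: the catch block logs the failure and then rethrows it wrapped in a new std::runtime_error via std::throw_with_nested, so the original cause stays attached to the higher-level message. The following is a minimal standard-C++ sketch of that pattern, not the Alternator code itself; listen_on(), try_init() and describe() are hypothetical names introduced only for illustration.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Collects the message of an exception and of everything nested inside it,
// outermost first, separated by ": ".
std::string describe(const std::exception& e) {
    std::string out = e.what();
    try {
        // Rethrows the nested exception if e carries one; otherwise does nothing.
        std::rethrow_if_nested(e);
    } catch (const std::exception& inner) {
        out += ": " + describe(inner);
    } catch (...) {
        out += ": unknown";
    }
    return out;
}

// Hypothetical stand-in for a listen() call that fails.
void listen_on(int port) {
    throw std::runtime_error("bind failed on port " + std::to_string(port));
}

// Mirrors the shape of the catch block in init(): wrap the low-level failure
// in a higher-level error while keeping the original as the nested cause.
std::string try_init(int port) {
    try {
        try {
            listen_on(port);
        } catch (...) {
            std::throw_with_nested(std::runtime_error("Failed to set up HTTP server"));
        }
    } catch (const std::exception& e) {
        return describe(e);
    }
    return "ok";
}
```

A caller that only prints what() of the outer exception sees the high-level message alone; walking the chain as describe() does recovers the underlying cause as well.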

View File

@@ -1,84 +0,0 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "alternator/executor.hh"
#include <seastar/core/future.hh>
#include <seastar/http/httpd.hh>
#include <seastar/net/tls.hh>
#include <optional>
#include "alternator/auth.hh"
#include "utils/small_vector.hh"
#include <seastar/core/units.hh>
namespace alternator {
class server {
static constexpr size_t content_length_limit = 16*MB;
using alternator_callback = std::function<future<executor::request_return_type>(executor&, executor::client_state&,
tracing::trace_state_ptr, service_permit, rjson::value, std::unique_ptr<request>)>;
using alternator_callbacks_map = std::unordered_map<std::string_view, alternator_callback>;
http_server _http_server;
http_server _https_server;
executor& _executor;
cql3::query_processor& _qp;
key_cache _key_cache;
bool _enforce_authorization;
utils::small_vector<std::reference_wrapper<seastar::httpd::http_server>, 2> _enabled_servers;
gate _pending_requests;
alternator_callbacks_map _callbacks;
semaphore* _memory_limiter;
class json_parser {
static constexpr size_t yieldable_parsing_threshold = 16*KB;
std::string_view _raw_document;
rjson::value _parsed_document;
std::exception_ptr _current_exception;
semaphore _parsing_sem{1};
condition_variable _document_waiting;
condition_variable _document_parsed;
abort_source _as;
future<> _run_parse_json_thread;
public:
json_parser();
future<rjson::value> parse(std::string_view content);
future<> stop();
};
json_parser _json_parser;
public:
server(executor& executor, cql3::query_processor& qp);
future<> init(net::inet_address addr, std::optional<uint16_t> port, std::optional<uint16_t> https_port, std::optional<tls::credentials_builder> creds,
bool enforce_authorization, semaphore* memory_limiter);
future<> stop();
private:
void set_routes(seastar::httpd::routes& r);
future<> verify_signature(const seastar::httpd::request& r);
future<executor::request_return_type> handle_api_request(std::unique_ptr<request>&& req);
};
}

View File

@@ -1,109 +0,0 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "stats.hh"
#include "utils/histogram_metrics_helper.hh"
#include <seastar/core/metrics.hh>
namespace alternator {
const char* ALTERNATOR_METRICS = "alternator";
stats::stats() : api_operations{} {
// Register the per-operation counters and latency histograms, labeled by operation name
seastar::metrics::label op("op");
_metrics.add_group("alternator", {
#define OPERATION(name, CamelCaseName) \
seastar::metrics::make_total_operations("operation", api_operations.name, \
seastar::metrics::description("number of operations via Alternator API"), {op(CamelCaseName)}),
#define OPERATION_LATENCY(name, CamelCaseName) \
seastar::metrics::make_histogram("op_latency", \
seastar::metrics::description("Latency histogram of an operation via Alternator API"), {op(CamelCaseName)}, [this]{return to_metrics_histogram(api_operations.name);}),
OPERATION(batch_write_item, "BatchWriteItem")
OPERATION(create_backup, "CreateBackup")
OPERATION(create_global_table, "CreateGlobalTable")
OPERATION(create_table, "CreateTable")
OPERATION(delete_backup, "DeleteBackup")
OPERATION(delete_item, "DeleteItem")
OPERATION(delete_table, "DeleteTable")
OPERATION(describe_backup, "DescribeBackup")
OPERATION(describe_continuous_backups, "DescribeContinuousBackups")
OPERATION(describe_endpoints, "DescribeEndpoints")
OPERATION(describe_global_table, "DescribeGlobalTable")
OPERATION(describe_global_table_settings, "DescribeGlobalTableSettings")
OPERATION(describe_limits, "DescribeLimits")
OPERATION(describe_table, "DescribeTable")
OPERATION(describe_time_to_live, "DescribeTimeToLive")
OPERATION(get_item, "GetItem")
OPERATION(list_backups, "ListBackups")
OPERATION(list_global_tables, "ListGlobalTables")
OPERATION(list_tables, "ListTables")
OPERATION(list_tags_of_resource, "ListTagsOfResource")
OPERATION(put_item, "PutItem")
OPERATION(query, "Query")
OPERATION(restore_table_from_backup, "RestoreTableFromBackup")
OPERATION(restore_table_to_point_in_time, "RestoreTableToPointInTime")
OPERATION(scan, "Scan")
OPERATION(tag_resource, "TagResource")
OPERATION(transact_get_items, "TransactGetItems")
OPERATION(transact_write_items, "TransactWriteItems")
OPERATION(untag_resource, "UntagResource")
OPERATION(update_continuous_backups, "UpdateContinuousBackups")
OPERATION(update_global_table, "UpdateGlobalTable")
OPERATION(update_global_table_settings, "UpdateGlobalTableSettings")
OPERATION(update_item, "UpdateItem")
OPERATION(update_table, "UpdateTable")
OPERATION(update_time_to_live, "UpdateTimeToLive")
OPERATION_LATENCY(put_item_latency, "PutItem")
OPERATION_LATENCY(get_item_latency, "GetItem")
OPERATION_LATENCY(delete_item_latency, "DeleteItem")
OPERATION_LATENCY(update_item_latency, "UpdateItem")
OPERATION(list_streams, "ListStreams")
OPERATION(describe_stream, "DescribeStream")
OPERATION(get_shard_iterator, "GetShardIterator")
OPERATION(get_records, "GetRecords")
OPERATION_LATENCY(get_records_latency, "GetRecords")
});
_metrics.add_group("alternator", {
seastar::metrics::make_total_operations("unsupported_operations", unsupported_operations,
seastar::metrics::description("number of unsupported operations via Alternator API")),
seastar::metrics::make_total_operations("total_operations", total_operations,
seastar::metrics::description("number of total operations via Alternator API")),
seastar::metrics::make_total_operations("reads_before_write", reads_before_write,
seastar::metrics::description("number of performed read-before-write operations")),
seastar::metrics::make_total_operations("write_using_lwt", write_using_lwt,
seastar::metrics::description("number of writes that used LWT")),
seastar::metrics::make_total_operations("shard_bounce_for_lwt", shard_bounce_for_lwt,
seastar::metrics::description("number of writes that had to be bounced from this shard because of LWT requirements")),
seastar::metrics::make_total_operations("requests_blocked_memory", requests_blocked_memory,
seastar::metrics::description("Counts a number of requests blocked due to memory pressure.")),
seastar::metrics::make_total_operations("filtered_rows_read_total", cql_stats.filtered_rows_read_total,
seastar::metrics::description("number of rows read during filtering operations")),
seastar::metrics::make_total_operations("filtered_rows_matched_total", cql_stats.filtered_rows_matched_total,
seastar::metrics::description("number of rows read and matched during filtering operations")),
seastar::metrics::make_total_operations("filtered_rows_dropped_total", [this] { return cql_stats.filtered_rows_read_total - cql_stats.filtered_rows_matched_total; },
seastar::metrics::description("number of rows read and dropped during filtering operations")),
});
}
}
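
Note on the OPERATION macros in the file above: each OPERATION(name, "CamelCaseName") line expands to one list element followed by a comma, so the long run of macro invocations builds a single initializer list pairing a struct field with its label. The following is a simplified sketch of that technique in plain C++, without Seastar; the counters struct, operation_table() and read_counter() are hypothetical and stand in for api_operations and the metrics group.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, reduced version of the api_operations counter struct.
struct counters {
    uint64_t get_item = 0;
    uint64_t put_item = 0;
    uint64_t query = 0;
};

using table_entry = std::pair<std::string, uint64_t counters::*>;

// Each invocation expands to "{label, pointer-to-member}," (note the trailing comma),
// which is what lets the invocations be stacked without separators.
#define OPERATION(name, CamelCaseName) {CamelCaseName, &counters::name},

std::vector<table_entry> operation_table() {
    return {
        OPERATION(get_item, "GetItem")
        OPERATION(put_item, "PutItem")
        OPERATION(query, "Query")
    };
}

#undef OPERATION

// Looks a counter up by its label; returns 0 for labels that were never listed.
uint64_t read_counter(const counters& c, const std::string& label) {
    for (const auto& entry : operation_table()) {
        if (entry.first == label) {
            return c.*(entry.second);
        }
    }
    return 0;
}
```

One consequence of this design is that a field is only exposed if it has its own macro line: a counter declared in the struct but never listed is silently absent from the table.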

View File

@@ -1,103 +0,0 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <cstdint>
#include <seastar/core/metrics_registration.hh>
#include "seastarx.hh"
#include "utils/estimated_histogram.hh"
#include "cql3/stats.hh"
namespace alternator {
// Object holding per-shard statistics related to Alternator.
// While this object is alive, these metrics are also registered to be
// visible by the metrics REST API, with the "alternator" prefix.
class stats {
public:
stats();
// Count of DynamoDB API operations by types
struct {
uint64_t batch_get_item = 0;
uint64_t batch_write_item = 0;
uint64_t create_backup = 0;
uint64_t create_global_table = 0;
uint64_t create_table = 0;
uint64_t delete_backup = 0;
uint64_t delete_item = 0;
uint64_t delete_table = 0;
uint64_t describe_backup = 0;
uint64_t describe_continuous_backups = 0;
uint64_t describe_endpoints = 0;
uint64_t describe_global_table = 0;
uint64_t describe_global_table_settings = 0;
uint64_t describe_limits = 0;
uint64_t describe_table = 0;
uint64_t describe_time_to_live = 0;
uint64_t get_item = 0;
uint64_t list_backups = 0;
uint64_t list_global_tables = 0;
uint64_t list_tables = 0;
uint64_t list_tags_of_resource = 0;
uint64_t put_item = 0;
uint64_t query = 0;
uint64_t restore_table_from_backup = 0;
uint64_t restore_table_to_point_in_time = 0;
uint64_t scan = 0;
uint64_t tag_resource = 0;
uint64_t transact_get_items = 0;
uint64_t transact_write_items = 0;
uint64_t untag_resource = 0;
uint64_t update_continuous_backups = 0;
uint64_t update_global_table = 0;
uint64_t update_global_table_settings = 0;
uint64_t update_item = 0;
uint64_t update_table = 0;
uint64_t update_time_to_live = 0;
uint64_t list_streams = 0;
uint64_t describe_stream = 0;
uint64_t get_shard_iterator = 0;
uint64_t get_records = 0;
utils::time_estimated_histogram put_item_latency;
utils::time_estimated_histogram get_item_latency;
utils::time_estimated_histogram delete_item_latency;
utils::time_estimated_histogram update_item_latency;
utils::time_estimated_histogram get_records_latency;
} api_operations;
// Miscellaneous event counters
uint64_t total_operations = 0;
uint64_t unsupported_operations = 0;
uint64_t reads_before_write = 0;
uint64_t write_using_lwt = 0;
uint64_t shard_bounce_for_lwt = 0;
uint64_t requests_blocked_memory = 0;
// CQL-derived stats
cql3::cql_stats cql_stats;
private:
// The metric_groups object holds this stat object's metrics registered
// as long as the stats object is alive.
seastar::metrics::metric_groups _metrics;
};
}

File diff suppressed because it is too large

View File

@@ -1,53 +0,0 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "serializer.hh"
#include "schema.hh"
#include "db/extensions.hh"
namespace alternator {
class tags_extension : public schema_extension {
public:
static constexpr auto NAME = "scylla_tags";
tags_extension() = default;
explicit tags_extension(const std::map<sstring, sstring>& tags) : _tags(std::move(tags)) {}
explicit tags_extension(bytes b) : _tags(tags_extension::deserialize(b)) {}
explicit tags_extension(const sstring& s) {
throw std::logic_error("Cannot create tags from string");
}
bytes serialize() const override {
return ser::serialize_to_buffer<bytes>(_tags);
}
static std::map<sstring, sstring> deserialize(bytes_view buffer) {
return ser::deserialize_from_buffer(buffer, boost::type<std::map<sstring, sstring>>());
}
const std::map<sstring, sstring>& tags() const {
return _tags;
}
private:
std::map<sstring, sstring> _tags;
};
}
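
Note on the first tags_extension constructor above: it takes the map by const reference and then applies std::move to it. std::move on a const lvalue yields a const rvalue reference, which cannot bind to the map's move constructor, so the map is copied. The sketch below shows this and one common alternative (take by value, then move). tags_holder is a hypothetical stand-in using std::string instead of sstring, not a proposed change to the class.

```cpp
#include <map>
#include <string>
#include <type_traits>
#include <utility>

using tag_map = std::map<std::string, std::string>;

// std::move applied to a const lvalue produces const tag_map&&. The move
// constructor takes tag_map&&, so overload resolution picks the copy constructor.
static_assert(std::is_same<decltype(std::move(std::declval<const tag_map&>())),
                           const tag_map&&>::value,
              "moving from a const reference keeps the const");

// Hypothetical stand-in for tags_extension that takes the map by value.
// Callers passing an lvalue pay for one copy; callers passing an rvalue pay
// only for moves.
class tags_holder {
public:
    explicit tags_holder(tag_map tags) : _tags(std::move(tags)) {}
    const tag_map& tags() const { return _tags; }
private:
    tag_map _tags;
};
```

With the by-value form, tags_holder h(std::move(m)) transfers the map's contents without copying, while tags_holder h(m) leaves m intact.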

View File

@@ -13,7 +13,7 @@
{
"method":"GET",
"summary":"get row cache save period in seconds",
"type": "long",
"type":"int",
"nickname":"get_row_cache_save_period_in_seconds",
"produces":[
"application/json"
@@ -35,7 +35,7 @@
"description":"row cache save period in seconds",
"required":true,
"allowMultiple":false,
"type": "long",
"type":"int",
"paramType":"query"
}
]
@@ -48,7 +48,7 @@
{
"method":"GET",
"summary":"get key cache save period in seconds",
"type": "long",
"type":"int",
"nickname":"get_key_cache_save_period_in_seconds",
"produces":[
"application/json"
@@ -70,7 +70,7 @@
"description":"key cache save period in seconds",
"required":true,
"allowMultiple":false,
"type": "long",
"type":"int",
"paramType":"query"
}
]
@@ -83,7 +83,7 @@
{
"method":"GET",
"summary":"get counter cache save period in seconds",
"type": "long",
"type":"int",
"nickname":"get_counter_cache_save_period_in_seconds",
"produces":[
"application/json"
@@ -105,7 +105,7 @@
"description":"counter cache save period in seconds",
"required":true,
"allowMultiple":false,
"type": "long",
"type":"int",
"paramType":"query"
}
]
@@ -118,7 +118,7 @@
{
"method":"GET",
"summary":"get row cache keys to save",
"type": "long",
"type":"int",
"nickname":"get_row_cache_keys_to_save",
"produces":[
"application/json"
@@ -140,7 +140,7 @@
"description":"row cache keys to save",
"required":true,
"allowMultiple":false,
"type": "long",
"type":"int",
"paramType":"query"
}
]
@@ -153,7 +153,7 @@
{
"method":"GET",
"summary":"get key cache keys to save",
"type": "long",
"type":"int",
"nickname":"get_key_cache_keys_to_save",
"produces":[
"application/json"
@@ -175,7 +175,7 @@
"description":"key cache keys to save",
"required":true,
"allowMultiple":false,
"type": "long",
"type":"int",
"paramType":"query"
}
]
@@ -188,7 +188,7 @@
{
"method":"GET",
"summary":"get counter cache keys to save",
"type": "long",
"type":"int",
"nickname":"get_counter_cache_keys_to_save",
"produces":[
"application/json"
@@ -210,7 +210,7 @@
"description":"counter cache keys to save",
"required":true,
"allowMultiple":false,
"type": "long",
"type":"int",
"paramType":"query"
}
]
@@ -448,7 +448,7 @@
{
"method": "GET",
"summary": "Get key entries",
"type": "long",
"type": "int",
"nickname": "get_key_entries",
"produces": [
"application/json"
@@ -568,7 +568,7 @@
{
"method": "GET",
"summary": "Get row entries",
"type": "long",
"type": "int",
"nickname": "get_row_entries",
"produces": [
"application/json"
@@ -688,7 +688,7 @@
{
"method": "GET",
"summary": "Get counter entries",
"type": "long",
"type": "int",
"nickname": "get_counter_entries",
"produces": [
"application/json"

View File

@@ -70,7 +70,7 @@
{
"method":"POST",
"summary":"Force a major compaction of this column family",
"type":"void",
"type":"string",
"nickname":"force_major_compaction",
"produces":[
"application/json"
@@ -121,7 +121,7 @@
"description":"The minimum number of sstables in queue before compaction kicks off",
"required":true,
"allowMultiple":false,
"type": "long",
"type":"int",
"paramType":"query"
}
]
@@ -172,7 +172,7 @@
"description":"The maximum number of sstables in queue before compaction kicks off",
"required":true,
"allowMultiple":false,
"type": "long",
"type":"int",
"paramType":"query"
}
]
@@ -223,7 +223,7 @@
"description":"The maximum number of sstables in queue before compaction kicks off",
"required":true,
"allowMultiple":false,
"type": "long",
"type":"int",
"paramType":"query"
},
{
@@ -231,7 +231,7 @@
"description":"The minimum number of sstables in queue before compaction kicks off",
"required":true,
"allowMultiple":false,
"type": "long",
"type":"int",
"paramType":"query"
}
]
@@ -380,54 +380,16 @@
"operations":[
{
"method":"GET",
"summary":"check if the auto_compaction property is enabled for a given table",
"summary":"check if the auto compaction disabled",
"type":"boolean",
"nickname":"get_auto_compaction",
"nickname":"is_auto_compaction_disabled",
"produces":[
"application/json"
],
"parameters":[
{
"name":"name",
"description":"The table name in keyspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
}
]
},
{
"method":"POST",
"summary":"Enable table auto compaction",
"type":"void",
"nickname":"enable_auto_compaction",
"produces":[
"application/json"
],
"parameters":[
{
"name":"name",
"description":"The table name in keyspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
}
]
},
{
"method":"DELETE",
"summary":"Disable table auto compaction",
"type":"void",
"nickname":"disable_auto_compaction",
"produces":[
"application/json"
],
"parameters":[
{
"name":"name",
"description":"The table name in keyspace:name format",
"description":"The column family name in keyspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -493,7 +455,7 @@
"operations":[
{
"method":"GET",
"summary":"Returns a list of sstable filenames that contain the given partition key on this node",
"summary":"Returns a list of filenames that contain the given key on this node",
"type":"array",
"items":{
"type":"string"
@@ -513,7 +475,7 @@
},
{
"name":"key",
"description":"The partition key. In a composite-key scenario, use ':' to separate the columns in the key.",
"description":"The key",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -582,7 +544,7 @@
"summary":"sstable count for each level. empty unless leveled compaction is used",
"type":"array",
"items":{
"type": "long"
"type":"int"
},
"nickname":"get_sstable_count_per_level",
"produces":[
@@ -649,54 +611,6 @@
}
]
},
{
"path":"/column_family/toppartitions/{name}",
"operations":[
{
"method":"GET",
"summary":"Toppartitions query",
"type":"toppartitions_query_results",
"nickname":"toppartitions",
"produces":[
"application/json"
],
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
},
{
"name":"duration",
"description":"Duration (in milliseconds) of monitoring operation",
"required":true,
"allowMultiple":false,
"type": "long",
"paramType":"query"
},
{
"name":"list_size",
"description":"number of the top partitions to list",
"required":false,
"allowMultiple":false,
"type": "long",
"paramType":"query"
},
{
"name":"capacity",
"description":"capacity of stream summary: determines amount of resources used in query processing",
"required":false,
"allowMultiple":false,
"type": "long",
"paramType":"query"
}
]
}
]
},
{
"path":"/column_family/metrics/memtable_columns_count/",
"operations":[
@@ -959,7 +873,7 @@
{
"method":"GET",
"summary":"Get memtable switch count",
"type": "long",
"type":"int",
"nickname":"get_memtable_switch_count",
"produces":[
"application/json"
@@ -983,7 +897,7 @@
{
"method":"GET",
"summary":"Get all memtable switch count",
"type": "long",
"type":"int",
"nickname":"get_all_memtable_switch_count",
"produces":[
"application/json"
@@ -1120,7 +1034,7 @@
{
"method":"GET",
"summary":"Get read latency",
"type": "long",
"type":"int",
"nickname":"get_read_latency",
"produces":[
"application/json"
@@ -1273,7 +1187,7 @@
{
"method":"GET",
"summary":"Get all read latency",
"type": "long",
"type":"int",
"nickname":"get_all_read_latency",
"produces":[
"application/json"
@@ -1289,7 +1203,7 @@
{
"method":"GET",
"summary":"Get range latency",
"type": "long",
"type":"int",
"nickname":"get_range_latency",
"produces":[
"application/json"
@@ -1313,7 +1227,7 @@
{
"method":"GET",
"summary":"Get all range latency",
"type": "long",
"type":"int",
"nickname":"get_all_range_latency",
"produces":[
"application/json"
@@ -1329,7 +1243,7 @@
{
"method":"GET",
"summary":"Get write latency",
"type": "long",
"type":"int",
"nickname":"get_write_latency",
"produces":[
"application/json"
@@ -1482,7 +1396,7 @@
{
"method":"GET",
"summary":"Get all write latency",
"type": "long",
"type":"int",
"nickname":"get_all_write_latency",
"produces":[
"application/json"
@@ -1498,7 +1412,7 @@
{
"method":"GET",
"summary":"Get pending flushes",
"type": "long",
"type":"int",
"nickname":"get_pending_flushes",
"produces":[
"application/json"
@@ -1522,7 +1436,7 @@
{
"method":"GET",
"summary":"Get all pending flushes",
"type": "long",
"type":"int",
"nickname":"get_all_pending_flushes",
"produces":[
"application/json"
@@ -1538,7 +1452,7 @@
{
"method":"GET",
"summary":"Get pending compactions",
"type": "long",
"type":"int",
"nickname":"get_pending_compactions",
"produces":[
"application/json"
@@ -1562,7 +1476,7 @@
{
"method":"GET",
"summary":"Get all pending compactions",
"type": "long",
"type":"int",
"nickname":"get_all_pending_compactions",
"produces":[
"application/json"
@@ -1578,7 +1492,7 @@
{
"method":"GET",
"summary":"Get live ss table count",
"type": "long",
"type":"int",
"nickname":"get_live_ss_table_count",
"produces":[
"application/json"
@@ -1602,7 +1516,7 @@
{
"method":"GET",
"summary":"Get all live ss table count",
"type": "long",
"type":"int",
"nickname":"get_all_live_ss_table_count",
"produces":[
"application/json"
@@ -1618,7 +1532,7 @@
{
"method":"GET",
"summary":"Get live disk space used",
"type": "long",
"type":"int",
"nickname":"get_live_disk_space_used",
"produces":[
"application/json"
@@ -1642,7 +1556,7 @@
{
"method":"GET",
"summary":"Get all live disk space used",
"type": "long",
"type":"int",
"nickname":"get_all_live_disk_space_used",
"produces":[
"application/json"
@@ -1658,7 +1572,7 @@
{
"method":"GET",
"summary":"Get total disk space used",
"type": "long",
"type":"int",
"nickname":"get_total_disk_space_used",
"produces":[
"application/json"
@@ -1682,7 +1596,7 @@
{
"method":"GET",
"summary":"Get all total disk space used",
"type": "long",
"type":"int",
"nickname":"get_all_total_disk_space_used",
"produces":[
"application/json"
@@ -2138,7 +2052,7 @@
{
"method":"GET",
"summary":"Get speculative retries",
"type": "long",
"type":"int",
"nickname":"get_speculative_retries",
"produces":[
"application/json"
@@ -2162,7 +2076,7 @@
{
"method":"GET",
"summary":"Get all speculative retries",
"type": "long",
"type":"int",
"nickname":"get_all_speculative_retries",
"produces":[
"application/json"
@@ -2242,7 +2156,7 @@
{
"method":"GET",
"summary":"Get row cache hit out of range",
"type": "long",
"type":"int",
"nickname":"get_row_cache_hit_out_of_range",
"produces":[
"application/json"
@@ -2266,7 +2180,7 @@
{
"method":"GET",
"summary":"Get all row cache hit out of range",
"type": "long",
"type":"int",
"nickname":"get_all_row_cache_hit_out_of_range",
"produces":[
"application/json"
@@ -2282,7 +2196,7 @@
{
"method":"GET",
"summary":"Get row cache hit",
"type": "long",
"type":"int",
"nickname":"get_row_cache_hit",
"produces":[
"application/json"
@@ -2306,7 +2220,7 @@
{
"method":"GET",
"summary":"Get all row cache hit",
"type": "long",
"type":"int",
"nickname":"get_all_row_cache_hit",
"produces":[
"application/json"
@@ -2322,7 +2236,7 @@
{
"method":"GET",
"summary":"Get row cache miss",
"type": "long",
"type":"int",
"nickname":"get_row_cache_miss",
"produces":[
"application/json"
@@ -2346,7 +2260,7 @@
{
"method":"GET",
"summary":"Get all row cache miss",
"type": "long",
"type":"int",
"nickname":"get_all_row_cache_miss",
"produces":[
"application/json"
@@ -2362,7 +2276,7 @@
{
"method":"GET",
"summary":"Get cas prepare",
"type": "long",
"type":"int",
"nickname":"get_cas_prepare",
"produces":[
"application/json"
@@ -2386,7 +2300,7 @@
{
"method":"GET",
"summary":"Get cas propose",
"type": "long",
"type":"int",
"nickname":"get_cas_propose",
"produces":[
"application/json"
@@ -2410,7 +2324,7 @@
{
"method":"GET",
"summary":"Get cas commit",
"type": "long",
"type":"int",
"nickname":"get_cas_commit",
"produces":[
"application/json"
@@ -2902,44 +2816,6 @@
"description":"The column family type"
}
}
},
"toppartitions_record":{
"id":"toppartitions_record",
"description":"nodetool toppartitions query record",
"properties":{
"partition":{
"type":"string",
"description":"Partition key"
},
"count":{
"type":"long",
"description":"Number of read/write operations"
},
"error":{
"type":"long",
"description":"Indication of inaccuracy in counting PKs"
}
}
},
"toppartitions_query_results":{
"id":"toppartitions_query_results",
"description":"nodetool toppartitions query results",
"properties":{
"read":{
"type":"array",
"items":{
"type":"toppartitions_record"
},
"description":"Read results"
},
"write":{
"type":"array",
"items":{
"type":"toppartitions_record"
},
"description":"Write results"
}
}
}
}
}
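
Note on the "long" versus "int" hunks above: most of them differ only in the declared type of a counter or size, such as disk space used. Assuming a consumer maps "int" to a 32-bit signed integer and "long" to a 64-bit one (an assumption about the client or code generator, not something this diff states), byte counts above 2 GiB would not fit the narrower type. A small sketch of the range check, with fits_in_int32() as a hypothetical helper:

```cpp
#include <cstdint>
#include <limits>

// Returns true when the value can be represented as a 32-bit signed integer.
bool fits_in_int32(int64_t value) {
    return value >= std::numeric_limits<int32_t>::min()
        && value <= std::numeric_limits<int32_t>::max();
}

// A byte count of 3 GiB, a plausible value for a disk-space metric.
constexpr int64_t three_gib = int64_t{3} * 1024 * 1024 * 1024;
```

Here fits_in_int32(three_gib) is false, while a small count such as fits_in_int32(100) is true.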

View File

@@ -118,7 +118,7 @@
{
"method": "GET",
"summary": "Get pending tasks",
"type": "long",
"type": "int",
"nickname": "get_pending_tasks",
"produces": [
"application/json"
@@ -127,24 +127,6 @@
}
]
},
{
"path": "/compaction_manager/metrics/pending_tasks_by_table",
"operations": [
{
"method": "GET",
"summary": "Get pending tasks by table name",
"type": "array",
"items": {
"type": "pending_compaction"
},
"nickname": "get_pending_tasks_by_table",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path": "/compaction_manager/metrics/completed_tasks",
"operations": [
@@ -181,7 +163,7 @@
{
"method": "GET",
"summary": "Get bytes compacted",
"type": "long",
"type": "int",
"nickname": "get_bytes_compacted",
"produces": [
"application/json"
@@ -197,7 +179,7 @@
"description":"A row merged information",
"properties":{
"key":{
"type": "long",
"type":"int",
"description":"The number of sstable"
},
"value":{
@@ -262,23 +244,6 @@
}
}
},
"pending_compaction": {
"id": "pending_compaction",
"properties": {
"cf": {
"type": "string",
"description": "The column family name"
},
"ks": {
"type":"string",
"description": "The keyspace name"
},
"task": {
"type":"long",
"description": "The number of pending tasks"
}
}
},
"history": {
"id":"history",
"description":"Compaction history information",

View File

@@ -1,30 +0,0 @@
"/v2/config/{id}": {
"get": {
"description": "Return a config value",
"operationId": "find_config_id",
"produces": [
"application/json"
],
"tags": ["config"],
"parameters": [
{
"name": "id",
"in": "path",
"description": "ID of config to return",
"required": true,
"type": "string"
}
],
"responses": {
"200": {
"description": "Config value"
},
"default": {
"description": "unexpected error",
"schema": {
"$ref": "#/definitions/ErrorModel"
}
}
}
}
}

View File

@@ -1,90 +0,0 @@
{
"apiVersion":"0.0.1",
"swaggerVersion":"1.2",
"basePath":"{{Protocol}}://{{Host}}",
"resourcePath":"/error_injection",
"produces":[
"application/json"
],
"apis":[
{
"path":"/v2/error_injection/injection/{injection}",
"operations":[
{
"method":"POST",
"summary":"Activate an injection that triggers an error in code",
"type":"void",
"nickname":"enable_injection",
"produces":[
"application/json"
],
"parameters":[
{
"name":"injection",
"description":"injection name, should correspond to an injection added in code",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
},
{
"name":"one_shot",
"description":"boolean flag indicating whether the injection should be enabled to trigger only once",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
}
]
},
{
"method":"DELETE",
"summary":"Deactivate an injection previously activated by the API",
"type":"void",
"nickname":"disable_injection",
"produces":[
"application/json"
],
"parameters":[
{
"name":"injection",
"description":"injection name",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
}
]
}
]
},
{
"path":"/v2/error_injection/injection",
"operations":[
{
"method":"GET",
"summary":"List all enabled injections on all shards, i.e. injections that will trigger an error in the code",
"type":"array",
"items":{
"type":"string"
},
"nickname":"get_enabled_injections_on_all",
"produces":[
"application/json"
],
"parameters":[]
},
{
"method":"DELETE",
"summary":"Deactivate all injections previously activated on all shards by the API",
"type":"void",
"nickname":"disable_on_all",
"produces":[
"application/json"
],
"parameters":[]
}
]
}
]
}

View File

@@ -110,7 +110,7 @@
{
"method":"GET",
"summary":"Get count down endpoint",
"type": "long",
"type":"int",
"nickname":"get_down_endpoint_count",
"produces":[
"application/json"
@@ -126,7 +126,7 @@
{
"method":"GET",
"summary":"Get count up endpoint",
"type": "long",
"type":"int",
"nickname":"get_up_endpoint_count",
"produces":[
"application/json"
@@ -180,11 +180,11 @@
"description": "The endpoint address"
},
"generation": {
"type": "long",
"type": "int",
"description": "The heart beat generation"
},
"version": {
"type": "long",
"type": "int",
"description": "The heart beat version"
},
"update_time": {
@@ -209,7 +209,7 @@
"description": "Holds a version value for an application state",
"properties": {
"application_state": {
"type": "long",
"type": "int",
"description": "The application state enum index"
},
"value": {
@@ -217,7 +217,7 @@
"description": "The version value"
},
"version": {
"type": "long",
"type": "int",
"description": "The application state version"
}
}

View File

@@ -75,7 +75,7 @@
{
"method":"GET",
"summary":"Returns files which are pending for archival attempt. Does NOT include failed archive attempts",
"type": "long",
"type":"int",
"nickname":"get_current_generation_number",
"produces":[
"application/json"
@@ -99,7 +99,7 @@
{
"method":"GET",
"summary":"Get heart beat version for a node",
"type": "long",
"type":"int",
"nickname":"get_current_heart_beat_version",
"produces":[
"application/json"

View File

@@ -99,7 +99,7 @@
{
"method": "GET",
"summary": "Get create hint count",
"type": "long",
"type": "int",
"nickname": "get_create_hint_count",
"produces": [
"application/json"
@@ -123,7 +123,7 @@
{
"method": "GET",
"summary": "Get not stored hints count",
"type": "long",
"type": "int",
"nickname": "get_not_stored_hints_count",
"produces": [
"application/json"

View File

@@ -191,7 +191,7 @@
{
"method":"GET",
"summary":"Get the version number",
"type": "long",
"type":"int",
"nickname":"get_version",
"produces":[
"application/json"
@@ -249,7 +249,7 @@
"MIGRATION_REQUEST",
"PREPARE_MESSAGE",
"PREPARE_DONE_MESSAGE",
"UNUSED__STREAM_MUTATION",
"STREAM_MUTATION",
"STREAM_MUTATION_DONE",
"COMPLETE_MESSAGE",
"REPAIR_CHECKSUM_RANGE",

@@ -68,7 +68,7 @@
"summary":"Get the hinted handoff enabled by dc",
"type":"array",
"items":{
"type":"array"
"type":"mapper_list"
},
"nickname":"get_hinted_handoff_enabled_by_dc",
"produces":[
@@ -105,7 +105,7 @@
{
"method":"GET",
"summary":"Get the max hint window",
"type": "long",
"type":"int",
"nickname":"get_max_hint_window",
"produces":[
"application/json"
@@ -128,7 +128,7 @@
"description":"max hint window in ms",
"required":true,
"allowMultiple":false,
"type": "long",
"type":"int",
"paramType":"query"
}
]
@@ -141,7 +141,7 @@
{
"method":"GET",
"summary":"Get max hints in progress",
"type": "long",
"type":"int",
"nickname":"get_max_hints_in_progress",
"produces":[
"application/json"
@@ -164,7 +164,7 @@
"description":"max hints in progress",
"required":true,
"allowMultiple":false,
"type": "long",
"type":"int",
"paramType":"query"
}
]
@@ -177,7 +177,7 @@
{
"method":"GET",
"summary":"get hints in progress",
"type": "long",
"type":"int",
"nickname":"get_hints_in_progress",
"produces":[
"application/json"
@@ -602,7 +602,7 @@
{
"method": "GET",
"summary": "Get cas write metrics",
"type": "long",
"type": "int",
"nickname": "get_cas_write_metrics_unfinished_commit",
"produces": [
"application/json"
@@ -632,7 +632,7 @@
{
"method": "GET",
"summary": "Get cas write metrics",
"type": "long",
"type": "int",
"nickname": "get_cas_write_metrics_condition_not_met",
"produces": [
"application/json"
@@ -641,28 +641,13 @@
}
]
},
{
"path": "/storage_proxy/metrics/cas_write/failed_read_round_optimization",
"operations": [
{
"method": "GET",
"summary": "Get cas write metrics",
"type": "long",
"nickname": "get_cas_write_metrics_failed_read_round_optimization",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path": "/storage_proxy/metrics/cas_read/unfinished_commit",
"operations": [
{
"method": "GET",
"summary": "Get cas read metrics",
"type": "long",
"type": "int",
"nickname": "get_cas_read_metrics_unfinished_commit",
"produces": [
"application/json"
@@ -686,13 +671,28 @@
}
]
},
{
"path": "/storage_proxy/metrics/cas_read/condition_not_met",
"operations": [
{
"method": "GET",
"summary": "Get cas read metrics",
"type": "int",
"nickname": "get_cas_read_metrics_condition_not_met",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path": "/storage_proxy/metrics/read/timeouts",
"operations": [
{
"method": "GET",
"summary": "Get read metrics",
"type": "long",
"type": "int",
"nickname": "get_read_metrics_timeouts",
"produces": [
"application/json"
@@ -707,7 +707,7 @@
{
"method": "GET",
"summary": "Get read metrics",
"type": "long",
"type": "int",
"nickname": "get_read_metrics_unavailables",
"produces": [
"application/json"
@@ -791,36 +791,6 @@
}
]
},
{
"path": "/storage_proxy/metrics/cas_read/moving_average_histogram",
"operations": [
{
"method": "GET",
"summary": "Get CAS read rate and latency histogram",
"$ref": "#/utils/rate_moving_average_and_histogram",
"nickname": "get_cas_read_metrics_latency_histogram",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path": "/storage_proxy/metrics/view_write/moving_average_histogram",
"operations": [
{
"method": "GET",
"summary": "Get view write rate and latency histogram",
"$ref": "#/utils/rate_moving_average_and_histogram",
"nickname": "get_view_write_metrics_latency_histogram",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path": "/storage_proxy/metrics/range/moving_average_histogram",
"operations": [
@@ -842,7 +812,7 @@
{
"method": "GET",
"summary": "Get range metrics",
"type": "long",
"type": "int",
"nickname": "get_range_metrics_timeouts",
"produces": [
"application/json"
@@ -857,7 +827,7 @@
{
"method": "GET",
"summary": "Get range metrics",
"type": "long",
"type": "int",
"nickname": "get_range_metrics_unavailables",
"produces": [
"application/json"
@@ -902,7 +872,7 @@
{
"method": "GET",
"summary": "Get write metrics",
"type": "long",
"type": "int",
"nickname": "get_write_metrics_timeouts",
"produces": [
"application/json"
@@ -917,7 +887,7 @@
{
"method": "GET",
"summary": "Get write metrics",
"type": "long",
"type": "int",
"nickname": "get_write_metrics_unavailables",
"produces": [
"application/json"
@@ -986,21 +956,6 @@
}
]
},
{
"path": "/storage_proxy/metrics/cas_write/moving_average_histogram",
"operations": [
{
"method": "GET",
"summary": "Get CAS write rate and latency histogram",
"$ref": "#/utils/rate_moving_average_and_histogram",
"nickname": "get_cas_write_metrics_latency_histogram",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path":"/storage_proxy/metrics/read/estimated_histogram/",
"operations":[
@@ -1023,7 +978,7 @@
{
"method":"GET",
"summary":"Get read latency",
"type": "long",
"type":"int",
"nickname":"get_read_latency",
"produces":[
"application/json"
@@ -1055,7 +1010,7 @@
{
"method":"GET",
"summary":"Get write latency",
"type": "long",
"type":"int",
"nickname":"get_write_latency",
"produces":[
"application/json"
@@ -1087,7 +1042,7 @@
{
"method":"GET",
"summary":"Get range latency",
"type": "long",
"type":"int",
"nickname":"get_range_latency",
"produces":[
"application/json"

@@ -458,7 +458,7 @@
{
"method":"GET",
"summary":"Return the generation value for this node.",
"type": "long",
"type":"int",
"nickname":"get_current_generation_number",
"produces":[
"application/json"
@@ -511,21 +511,6 @@
}
]
},
{
"path":"/storage_service/cdc_streams_check_and_repair",
"operations":[
{
"method":"POST",
"summary":"Checks that CDC streams reflect current cluster topology and regenerates them if not.",
"type":"void",
"nickname":"cdc_streams_check_and_repair",
"produces":[
"application/json"
],
"parameters":[]
}
]
},
{
"path":"/storage_service/snapshots",
"operations":[
@@ -597,15 +582,7 @@
},
{
"name":"kn",
"description":"Comma seperated keyspaces name that their snapshot will be deleted",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"cf",
"description":"an optional table name that its snapshot will be deleted",
"description":"Comma seperated keyspaces name to snapshot",
"required":false,
"allowMultiple":false,
"type":"string",
@@ -669,7 +646,7 @@
{
"method":"POST",
"summary":"Trigger a cleanup of keys on a single keyspace",
"type": "long",
"type":"int",
"nickname":"force_keyspace_cleanup",
"produces":[
"application/json"
@@ -701,7 +678,7 @@
{
"method":"GET",
"summary":"Scrub (deserialize + reserialize at the latest version, skipping bad rows if any) the given keyspace. If columnFamilies array is empty, all CFs are scrubbed. Scrubbed CFs will be snapshotted first, if disableSnapshot is false",
"type": "long",
"type":"int",
"nickname":"scrub",
"produces":[
"application/json"
@@ -749,7 +726,7 @@
{
"method":"GET",
"summary":"Rewrite all sstables to the latest version. Unlike scrub, it doesn't skip bad rows and do not snapshot sstables first.",
"type": "long",
"type":"int",
"nickname":"upgrade_sstables",
"produces":[
"application/json"
@@ -815,68 +792,13 @@
}
]
},
{
"path":"/storage_service/active_repair/",
"operations":[
{
"method":"GET",
"summary":"Return an array with the ids of the currently active repairs",
"type":"array",
"items":{
"type": "long"
},
"nickname":"get_active_repair_async",
"produces":[
"application/json"
],
"parameters":[]
}
]
},
{
"path":"/storage_service/repair_status/",
"operations":[
{
"method":"GET",
"summary":"Query the repair status and return when the repair is finished or timeout",
"type":"string",
"enum":[
"RUNNING",
"SUCCESSFUL",
"FAILED"
],
"nickname":"repair_await_completion",
"produces":[
"application/json"
],
"parameters":[
{
"name":"id",
"description":"The repair ID to check for status",
"required":true,
"allowMultiple":false,
"type": "long",
"paramType":"query"
},
{
"name":"timeout",
"description":"Seconds to wait before the query returns even if the repair is not finished. The value -1 or not providing this parameter means no timeout",
"required":false,
"allowMultiple":false,
"type": "long",
"paramType":"query"
}
]
}
]
},
{
"path":"/storage_service/repair_async/{keyspace}",
"operations":[
{
"method":"POST",
"summary":"Invoke repair asynchronously. You can track repair progress by using the get supplying id",
"type": "long",
"type":"int",
"nickname":"repair_async",
"produces":[
"application/json"
@@ -1007,7 +929,7 @@
"description":"The repair ID to check for status",
"required":true,
"allowMultiple":false,
"type": "long",
"type":"int",
"paramType":"query"
}
]
@@ -1030,22 +952,6 @@
}
]
},
{
"path":"/storage_service/force_terminate_repair",
"operations":[
{
"method":"POST",
"summary":"Force terminate all repair sessions",
"type":"void",
"nickname":"force_terminate_all_repair_sessions_new",
"produces":[
"application/json"
],
"parameters":[
]
}
]
},
{
"path":"/storage_service/decommission",
"operations":[
@@ -1337,18 +1243,18 @@
},
{
"name":"dynamic_update_interval",
"description":"interval in ms (default 100)",
"description":"integer, in ms (default 100)",
"required":false,
"allowMultiple":false,
"type":"long",
"type":"integer",
"paramType":"query"
},
{
"name":"dynamic_reset_interval",
"description":"interval in ms (default 600,000)",
"description":"integer, in ms (default 600,000)",
"required":false,
"allowMultiple":false,
"type":"long",
"type":"integer",
"paramType":"query"
},
{
@@ -1553,7 +1459,7 @@
"description":"Stream throughput",
"required":true,
"allowMultiple":false,
"type": "long",
"type":"int",
"paramType":"query"
}
]
@@ -1561,7 +1467,7 @@
{
"method":"GET",
"summary":"Get stream throughput mb per sec",
"type": "long",
"type":"int",
"nickname":"get_stream_throughput_mb_per_sec",
"produces":[
"application/json"
@@ -1577,7 +1483,7 @@
{
"method":"GET",
"summary":"get compaction throughput mb per sec",
"type": "long",
"type":"int",
"nickname":"get_compaction_throughput_mb_per_sec",
"produces":[
"application/json"
@@ -1599,7 +1505,7 @@
"description":"compaction throughput",
"required":true,
"allowMultiple":false,
"type": "long",
"type":"int",
"paramType":"query"
}
]
@@ -2003,7 +1909,7 @@
{
"method":"GET",
"summary":"Returns the threshold for warning of queries with many tombstones",
"type": "long",
"type":"int",
"nickname":"get_tombstone_warn_threshold",
"produces":[
"application/json"
@@ -2025,7 +1931,7 @@
"description":"tombstone debug threshold",
"required":true,
"allowMultiple":false,
"type": "long",
"type":"int",
"paramType":"query"
}
]
@@ -2038,7 +1944,7 @@
{
"method":"GET",
"summary":"",
"type": "long",
"type":"int",
"nickname":"get_tombstone_failure_threshold",
"produces":[
"application/json"
@@ -2060,7 +1966,7 @@
"description":"tombstone debug threshold",
"required":true,
"allowMultiple":false,
"type": "long",
"type":"int",
"paramType":"query"
}
]
@@ -2073,7 +1979,7 @@
{
"method":"GET",
"summary":"Returns the threshold for rejecting queries due to a large batch size",
"type": "long",
"type":"int",
"nickname":"get_batch_size_failure_threshold",
"produces":[
"application/json"
@@ -2095,7 +2001,7 @@
"description":"batch size debug threshold",
"required":true,
"allowMultiple":false,
"type": "long",
"type":"int",
"paramType":"query"
}
]
@@ -2119,7 +2025,7 @@
"description":"throttle in kb",
"required":true,
"allowMultiple":false,
"type": "long",
"type":"int",
"paramType":"query"
}
]
@@ -2132,7 +2038,7 @@
{
"method":"GET",
"summary":"Get load",
"type": "long",
"type":"int",
"nickname":"get_metrics_load",
"produces":[
"application/json"
@@ -2148,7 +2054,7 @@
{
"method":"GET",
"summary":"Get exceptions",
"type": "long",
"type":"int",
"nickname":"get_exceptions",
"produces":[
"application/json"
@@ -2164,7 +2070,7 @@
{
"method":"GET",
"summary":"Get total hints in progress",
"type": "long",
"type":"int",
"nickname":"get_total_hints_in_progress",
"produces":[
"application/json"
@@ -2180,7 +2086,7 @@
{
"method":"GET",
"summary":"Get total hints",
"type": "long",
"type":"int",
"nickname":"get_total_hints",
"produces":[
"application/json"
@@ -2189,77 +2095,7 @@
]
}
]
},
{
"path":"/storage_service/view_build_statuses/{keyspace}/{view}",
"operations":[
{
"method":"GET",
"summary":"Gets the progress of a materialized view build",
"type":"array",
"items":{
"type":"mapper"
},
"nickname":"view_build_statuses",
"produces":[
"application/json"
],
"parameters":[
{
"name":"keyspace",
"description":"The keyspace",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
},
{
"name":"view",
"description":"View name",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
}
]
}
]
},
{
"path":"/storage_service/sstable_info",
"operations":[
{
"method":"GET",
"summary":"SSTable information",
"type":"array",
"items":{
"type":"table_sstables"
},
"nickname":"sstable_info",
"produces":[
"application/json"
],
"parameters":[
{
"name":"keyspace",
"description":"The keyspace",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"cf",
"description":"column family name",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
}
]
}
}
],
"models":{
"mapper":{
@@ -2323,11 +2159,11 @@
"description":"The column family"
},
"total":{
"type":"long",
"type":"int",
"description":"The total snapshot size"
},
"live":{
"type":"long",
"type":"int",
"description":"The live snapshot size"
}
}
@@ -2419,92 +2255,6 @@
"description":"The endpoint details"
}
}
},
"named_maps":{
"id":"named_maps",
"properties":{
"group":{
"type":"string"
},
"attributes":{
"type":"array",
"items":{
"type":"mapper"
}
}
}
},
"sstable":{
"id":"sstable",
"properties":{
"size":{
"type":"long",
"description":"Total size in bytes of sstable"
},
"data_size":{
"type":"long",
"description":"The size in bytes on disk of data"
},
"index_size":{
"type":"long",
"description":"The size in bytes on disk of index"
},
"filter_size":{
"type":"long",
"description":"The size in bytes on disk of filter"
},
"timestamp":{
"type":"datetime",
"description":"File creation time"
},
"generation":{
"type":"long",
"description":"SSTable generation"
},
"level":{
"type":"long",
"description":"SSTable level"
},
"version":{
"type":"string",
"enum":[
"ka", "la", "mc", "md"
],
"description":"SSTable version"
},
"properties":{
"type":"array",
"description":"SSTable attributes",
"items":{
"type":"mapper"
}
},
"extended_properties":{
"type":"array",
"description":"SSTable extended attributes",
"items":{
"type":"named_maps"
}
}
}
},
"table_sstables":{
"id":"table_sstables",
"description":"Per-table SSTable info and attributes",
"properties":{
"keyspace":{
"type":"string"
},
"table":{
"type":"string"
},
"sstables":{
"type":"array",
"items":{
"$ref":"sstable"
}
}
}
}
}
}
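
Several hunks above switch the declared type of sizes and counters (for example the snapshot `total` and `live` sizes) between `long` and `int`. Assuming that `int` is generated as a 32-bit and `long` as a 64-bit signed integer, which this diff does not itself confirm, the sketch below shows which values survive the narrower type. `fits_in_int32` is a hypothetical helper for illustration, not part of the Scylla or Seastar API.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: returns true when a 64-bit value can be represented
// as a signed 32-bit integer without loss.
constexpr bool fits_in_int32(std::int64_t v) {
    return v >= std::numeric_limits<std::int32_t>::min() &&
           v <= std::numeric_limits<std::int32_t>::max();
}
```

Under that assumption, a snapshot size of 5,000,000,000 bytes does not fit, which is the kind of value that needs the 64-bit type.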

@@ -32,7 +32,7 @@
{
"method":"GET",
"summary":"Get number of active outbound streams",
"type": "long",
"type":"int",
"nickname":"get_all_active_streams_outbound",
"produces":[
"application/json"
@@ -48,7 +48,7 @@
{
"method":"GET",
"summary":"Get total incoming bytes",
"type": "long",
"type":"int",
"nickname":"get_total_incoming_bytes",
"produces":[
"application/json"
@@ -72,7 +72,7 @@
{
"method":"GET",
"summary":"Get all total incoming bytes",
"type": "long",
"type":"int",
"nickname":"get_all_total_incoming_bytes",
"produces":[
"application/json"
@@ -88,7 +88,7 @@
{
"method":"GET",
"summary":"Get total outgoing bytes",
"type": "long",
"type":"int",
"nickname":"get_total_outgoing_bytes",
"produces":[
"application/json"
@@ -112,7 +112,7 @@
{
"method":"GET",
"summary":"Get all total outgoing bytes",
"type": "long",
"type":"int",
"nickname":"get_all_total_outgoing_bytes",
"produces":[
"application/json"
@@ -154,7 +154,7 @@
"description":"The peer"
},
"session_index":{
"type": "long",
"type":"int",
"description":"The session index"
},
"connecting":{
@@ -211,7 +211,7 @@
"description":"The ID"
},
"files":{
"type": "long",
"type":"int",
"description":"Number of files to transfer. Can be 0 if nothing to transfer for some streaming request."
},
"total_size":{
@@ -242,7 +242,7 @@
"description":"The peer address"
},
"session_index":{
"type": "long",
"type":"int",
"description":"The session index"
},
"file_name":{

@@ -1,29 +0,0 @@
{
"swagger": "2.0",
"info": {
"version": "1.0.0",
"title": "Scylla API",
"description": "The scylla API version 2.0",
"termsOfService": "http://www.scylladb.com/tos/",
"contact": {
"name": "Scylla Team",
"email": "info@scylladb.com",
"url": "http://scylladb.com"
},
"license": {
"name": "AGPL",
"url": "https://github.com/scylladb/scylla/blob/master/LICENSE.AGPL"
}
},
"host": "{{Host}}",
"basePath": "/v2",
"schemes": [
"http"
],
"consumes": [
"application/json"
],
"produces": [
"application/json"
],
"paths": {

@@ -52,21 +52,6 @@
}
]
},
{
"path":"/system/uptime_ms",
"operations":[
{
"method":"GET",
"summary":"Get system uptime, in milliseconds",
"type":"long",
"nickname":"get_system_uptime",
"produces":[
"application/json"
],
"parameters":[]
}
]
},
{
"path":"/system/logger/{name}",
"operations":[

@@ -20,9 +20,9 @@
*/
#include "api.hh"
#include <seastar/http/file_handler.hh>
#include <seastar/http/transformers.hh>
#include <seastar/http/api_docs.hh>
#include "http/file_handler.hh"
#include "http/transformers.hh"
#include "http/api_docs.hh"
#include "storage_service.hh"
#include "commitlog.hh"
#include "gossiper.hh"
@@ -36,13 +36,9 @@
#include "endpoint_snitch.hh"
#include "compaction_manager.hh"
#include "hinted_handoff.hh"
#include "error_injection.hh"
#include <seastar/http/exception.hh>
#include "http/exception.hh"
#include "stream_manager.hh"
#include "system.hh"
#include "api/config.hh"
logging::logger apilog("api");
namespace api {
@@ -53,35 +49,25 @@ static std::unique_ptr<reply> exception_reply(std::exception_ptr eptr) {
throw bad_param_exception(ex.what());
}
// We never going to get here
throw std::runtime_error("exception_reply");
return std::make_unique<reply>();
}
future<> set_server_init(http_context& ctx) {
auto rb = std::make_shared < api_registry_builder > (ctx.api_doc);
auto rb02 = std::make_shared < api_registry_builder20 > (ctx.api_doc, "/v2");
return ctx.http_server.set_routes([rb, &ctx, rb02](routes& r) {
return ctx.http_server.set_routes([rb, &ctx](routes& r) {
r.register_exeption_handler(exception_reply);
r.put(GET, "/ui", new httpd::file_handler(ctx.api_dir + "/index.html",
new content_replace("html")));
r.add(GET, url("/ui").remainder("path"), new httpd::directory_handler(ctx.api_dir,
new content_replace("html")));
rb->set_api_doc(r);
rb02->set_api_doc(r);
rb02->register_api_file(r, "swagger20_header");
rb->register_function(r, "system",
"The system related API");
set_system(ctx, r);
});
}
future<> set_server_config(http_context& ctx) {
auto rb02 = std::make_shared < api_registry_builder20 > (ctx.api_doc, "/v2");
return ctx.http_server.set_routes([&ctx, rb02](routes& r) {
set_config(rb02, ctx, r);
});
}
static future<> register_api(http_context& ctx, const sstring& api_name,
const sstring api_desc,
std::function<void(http_context& ctx, routes& r)> f) {
@@ -93,42 +79,10 @@ static future<> register_api(http_context& ctx, const sstring& api_name,
});
}
future<> set_transport_controller(http_context& ctx, cql_transport::controller& ctl) {
return ctx.http_server.set_routes([&ctx, &ctl] (routes& r) { set_transport_controller(ctx, r, ctl); });
}
future<> unset_transport_controller(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_transport_controller(ctx, r); });
}
future<> set_rpc_controller(http_context& ctx, thrift_controller& ctl) {
return ctx.http_server.set_routes([&ctx, &ctl] (routes& r) { set_rpc_controller(ctx, r, ctl); });
}
future<> unset_rpc_controller(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_rpc_controller(ctx, r); });
}
future<> set_server_storage_service(http_context& ctx) {
return register_api(ctx, "storage_service", "The storage service API", set_storage_service);
}
future<> set_server_repair(http_context& ctx, sharded<netw::messaging_service>& ms) {
return ctx.http_server.set_routes([&ctx, &ms] (routes& r) { set_repair(ctx, r, ms); });
}
future<> unset_server_repair(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_repair(ctx, r); });
}
future<> set_server_snapshot(http_context& ctx, sharded<db::snapshot_ctl>& snap_ctl) {
return ctx.http_server.set_routes([&ctx, &snap_ctl] (routes& r) { set_snapshot(ctx, r, snap_ctl); });
}
future<> unset_server_snapshot(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_snapshot(ctx, r); });
}
future<> set_server_snitch(http_context& ctx) {
return register_api(ctx, "endpoint_snitch_info", "The endpoint snitch info API", set_endpoint_snitch);
}
@@ -143,14 +97,9 @@ future<> set_server_load_sstable(http_context& ctx) {
"The column family API", set_column_family);
}
future<> set_server_messaging_service(http_context& ctx, sharded<netw::messaging_service>& ms) {
future<> set_server_messaging_service(http_context& ctx) {
return register_api(ctx, "messaging_service",
"The messaging service API", [&ms] (http_context& ctx, routes& r) {
set_messaging_service(ctx, r, ms);
});
}
future<> unset_server_messaging_service(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_messaging_service(ctx, r); });
"The messaging service API", set_messaging_service);
}
future<> set_server_storage_proxy(http_context& ctx) {
@@ -163,11 +112,6 @@ future<> set_server_stream_manager(http_context& ctx) {
"The stream manager API", set_stream_manager);
}
future<> set_server_cache(http_context& ctx) {
return register_api(ctx, "cache_service",
"The cache service API", set_cache_service);
}
future<> set_server_gossip_settle(http_context& ctx) {
auto rb = std::make_shared < api_registry_builder > (ctx.api_doc);
@@ -175,6 +119,9 @@ future<> set_server_gossip_settle(http_context& ctx) {
rb->register_function(r, "failure_detector",
"The failure detector API");
set_failure_detector(ctx,r);
rb->register_function(r, "cache_service",
"The cache service API");
set_cache_service(ctx,r);
});
}
@@ -197,9 +144,6 @@ future<> set_server_done(http_context& ctx) {
rb->register_function(r, "collectd",
"The collectd API");
set_collectd(ctx, r);
rb->register_function(r, "error_injection",
"The error injection API");
set_error_injection(ctx, r);
});
}

@@ -21,17 +21,14 @@
#pragma once
#include <seastar/json/json_elements.hh>
#include <type_traits>
#include "json/json_elements.hh"
#include <boost/lexical_cast.hpp>
#include <boost/algorithm/string/split.hpp>
#include <boost/algorithm/string/classification.hpp>
#include <boost/units/detail/utility.hpp>
#include "api/api-doc/utils.json.hh"
#include "utils/histogram.hh"
#include <seastar/http/exception.hh>
#include "http/exception.hh"
#include "api_init.hh"
#include "seastarx.hh"
namespace api {
@@ -218,44 +215,4 @@ std::vector<T> concat(std::vector<T> a, std::vector<T>&& b) {
return a;
}
template <class T, class Base = T>
class req_param {
public:
sstring name;
sstring param;
T value;
req_param(const request& req, sstring name, T default_val) : name(name) {
param = req.get_query_param(name);
if (param.empty()) {
value = default_val;
return;
}
try {
// boost::lexical_cast does not use boolalpha. Converting a
// true/false throws exceptions. We don't want that.
if constexpr (std::is_same_v<Base, bool>) {
// Cannot use boolalpha because we (probably) want to
// accept 1 and 0 as well as true and false. And True. And fAlse.
std::transform(param.begin(), param.end(), param.begin(), ::tolower);
if (param == "true" || param == "1") {
value = T(true);
} else if (param == "false" || param == "0") {
value = T(false);
} else {
throw boost::bad_lexical_cast{};
}
} else {
value = T{boost::lexical_cast<Base>(param)};
}
} catch (boost::bad_lexical_cast&) {
throw bad_param_exception(format("{} ({}): type error - should be {}", name, param, boost::units::detail::demangle(typeid(Base).name())));
}
}
operator T() const { return value; }
};
utils_json::estimated_histogram time_to_json_histogram(const utils::time_estimated_histogram& val);
}
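
The boolean branch of `req_param` in the hunk above lowercases the raw query parameter and then accepts `true`/`1` and `false`/`0`, so that `True` and `fAlse` also parse. A minimal standalone sketch of that rule follows, without the Seastar or Boost dependencies. `parse_bool_param` is a hypothetical name, and returning `std::nullopt` in place of throwing `bad_param_exception` is an assumption made to keep the example self-contained.

```cpp
#include <algorithm>
#include <cctype>
#include <optional>
#include <string>

// Hypothetical helper, not part of the Scylla API: parses a boolean query
// parameter case-insensitively, accepting "1"/"0" alongside "true"/"false".
std::optional<bool> parse_bool_param(std::string param) {
    // Lowercase in place; the cast to unsigned char keeps std::tolower well defined.
    std::transform(param.begin(), param.end(), param.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    if (param == "true" || param == "1") {
        return true;
    }
    if (param == "false" || param == "0") {
        return false;
    }
    // Anything else, including an empty string, is left to the caller to report.
    return std::nullopt;
}
```

Unlike `req_param`, this sketch does not substitute a default for an empty parameter; a caller would handle that case before or after the call.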

@@ -19,16 +19,9 @@
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "database_fwd.hh"
#include "database.hh"
#include "service/storage_proxy.hh"
#include <seastar/http/httpd.hh>
namespace service { class load_meter; }
namespace locator { class shared_token_metadata; }
namespace cql_transport { class controller; }
class thrift_controller;
namespace db { class snapshot_ctl; }
namespace netw { class messaging_service; }
#include "http/httpd.hh"
namespace api {
@@ -38,38 +31,22 @@ struct http_context {
httpd::http_server_control http_server;
distributed<database>& db;
distributed<service::storage_proxy>& sp;
service::load_meter& lmeter;
const sharded<locator::shared_token_metadata>& shared_token_metadata;
http_context(distributed<database>& _db,
distributed<service::storage_proxy>& _sp,
service::load_meter& _lm, const sharded<locator::shared_token_metadata>& _stm)
: db(_db), sp(_sp), lmeter(_lm), shared_token_metadata(_stm) {
distributed<service::storage_proxy>& _sp)
: db(_db), sp(_sp) {
}
const locator::token_metadata& get_token_metadata();
};
future<> set_server_init(http_context& ctx);
future<> set_server_config(http_context& ctx);
future<> set_server_snitch(http_context& ctx);
future<> set_server_storage_service(http_context& ctx);
future<> set_server_repair(http_context& ctx, sharded<netw::messaging_service>& ms);
future<> unset_server_repair(http_context& ctx);
future<> set_transport_controller(http_context& ctx, cql_transport::controller& ctl);
future<> unset_transport_controller(http_context& ctx);
future<> set_rpc_controller(http_context& ctx, thrift_controller& ctl);
future<> unset_rpc_controller(http_context& ctx);
future<> set_server_snapshot(http_context& ctx, sharded<db::snapshot_ctl>& snap_ctl);
future<> unset_server_snapshot(http_context& ctx);
future<> set_server_gossip(http_context& ctx);
future<> set_server_load_sstable(http_context& ctx);
future<> set_server_messaging_service(http_context& ctx, sharded<netw::messaging_service>& ms);
future<> unset_server_messaging_service(http_context& ctx);
future<> set_server_messaging_service(http_context& ctx);
future<> set_server_storage_proxy(http_context& ctx);
future<> set_server_stream_manager(http_context& ctx);
future<> set_server_gossip_settle(http_context& ctx);
future<> set_server_cache(http_context& ctx);
future<> set_server_done(http_context& ctx);
}

@@ -208,11 +208,9 @@ void set_cache_service(http_context& ctx, routes& r) {
});
cs::get_row_capacity.set(r, [&ctx] (std::unique_ptr<request> req) {
return ctx.db.map_reduce0([](database& db) -> uint64_t {
return db.row_cache_tracker().region().occupancy().used_space();
}, uint64_t(0), std::plus<uint64_t>()).then([](const int64_t& res) {
return make_ready_future<json::json_return_type>(res);
});
return map_reduce_cf(ctx, uint64_t(0), [](const column_family& cf) {
return cf.get_row_cache().get_cache_tracker().region().occupancy().used_space();
}, std::plus<uint64_t>());
});
cs::get_row_hits.set(r, [&ctx] (std::unique_ptr<request> req) {
@@ -253,19 +251,15 @@ void set_cache_service(http_context& ctx, routes& r) {
cs::get_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {
// In origin row size is the weighted size.
// We currently do not support weights, so we use num entries instead
return ctx.db.map_reduce0([](database& db) -> uint64_t {
return db.row_cache_tracker().partitions();
}, uint64_t(0), std::plus<uint64_t>()).then([](const int64_t& res) {
return make_ready_future<json::json_return_type>(res);
});
return map_reduce_cf(ctx, 0, [](const column_family& cf) {
return cf.get_row_cache().num_entries();
}, std::plus<uint64_t>());
});
cs::get_row_entries.set(r, [&ctx] (std::unique_ptr<request> req) {
return ctx.db.map_reduce0([](database& db) -> uint64_t {
return db.row_cache_tracker().partitions();
}, uint64_t(0), std::plus<uint64_t>()).then([](const int64_t& res) {
return make_ready_future<json::json_return_type>(res);
});
return map_reduce_cf(ctx, 0, [](const column_family& cf) {
return cf.get_row_cache().num_entries();
}, std::plus<uint64_t>());
});
cs::get_counter_capacity.set(r, [] (std::unique_ptr<request> req) {
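
Both variants of the row-cache handlers above compute a total by mapping each shard or column family to a count and reducing with `std::plus<uint64_t>`. The sketch below shows the same map-then-sum shape with the standard library only. `shard_cache_stats` and `total_partitions` are hypothetical stand-ins, not Scylla types, and the sketch runs on a single thread rather than across Seastar shards.

```cpp
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical stand-in for one shard's row-cache counters.
struct shard_cache_stats {
    std::uint64_t partitions;
};

// Maps each shard to its partition count and sums the results, starting
// from an explicitly 64-bit zero so the accumulator type is unambiguous.
std::uint64_t total_partitions(const std::vector<shard_cache_stats>& shards) {
    return std::transform_reduce(
        shards.begin(), shards.end(),
        std::uint64_t(0),
        std::plus<std::uint64_t>(),
        [](const shard_cache_stats& s) { return s.partitions; });
}
```

Passing `std::uint64_t(0)` as the initial value, as the `map_reduce0` calls above do, keeps the sum in 64 bits regardless of how the reduction is templated.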

@@ -21,8 +21,8 @@
#include "collectd.hh"
#include "api/api-doc/collectd.json.hh"
#include <seastar/core/scollectd.hh>
#include <seastar/core/scollectd_api.hh>
#include "core/scollectd.hh"
#include "core/scollectd_api.hh"
#include "endian.h"
#include <boost/range/irange.hpp>
#include <regex>
@@ -64,7 +64,7 @@ static const char* str_to_regex(const sstring& v) {
void set_collectd(http_context& ctx, routes& r) {
cd::get_collectd.set(r, [&ctx](std::unique_ptr<request> req) {
auto id = ::make_shared<scollectd::type_instance_id>(req->param["pluginid"],
auto id = make_shared<scollectd::type_instance_id>(req->param["pluginid"],
req->get_query_param("instance"), req->get_query_param("type"),
req->get_query_param("type_instance"));

@@ -22,14 +22,10 @@
#include "column_family.hh"
#include "api/api-doc/column_family.json.hh"
#include <vector>
#include <seastar/http/exception.hh>
#include "http/exception.hh"
#include "sstables/sstables.hh"
#include "utils/estimated_histogram.hh"
#include <algorithm>
#include "db/system_keyspace_view_types.hh"
#include "db/data_listeners.hh"
extern logging::logger apilog;
namespace api {
using namespace httpd;
@@ -38,7 +34,7 @@ using namespace std;
using namespace json;
namespace cf = httpd::column_family_json;
std::tuple<sstring, sstring> parse_fully_qualified_cf_name(sstring name) {
const utils::UUID& get_uuid(const sstring& name, const database& db) {
auto pos = name.find("%3A");
size_t end;
if (pos == sstring::npos) {
@@ -50,22 +46,14 @@ std::tuple<sstring, sstring> parse_fully_qualified_cf_name(sstring name) {
} else {
end = pos + 3;
}
return std::make_tuple(name.substr(0, pos), name.substr(end));
}
const utils::UUID& get_uuid(const sstring& ks, const sstring& cf, const database& db) {
try {
return db.find_uuid(ks, cf);
return db.find_uuid(name.substr(0, pos), name.substr(end));
} catch (std::out_of_range& e) {
throw bad_param_exception(format("Column family '{}:{}' not found", ks, cf));
throw bad_param_exception("Column family '" + name.substr(0, pos) + ":"
+ name.substr(end) + "' not found");
}
}
const utils::UUID& get_uuid(const sstring& name, const database& db) {
auto [ks, cf] = parse_fully_qualified_cf_name(name);
return get_uuid(ks, cf, db);
}
future<> foreach_column_family(http_context& ctx, const sstring& name, function<void(column_family&)> f) {
auto uuid = get_uuid(name, ctx.db.local());
@@ -75,28 +63,28 @@ future<> foreach_column_family(http_context& ctx, const sstring& name, function<
}
future<json::json_return_type> get_cf_stats(http_context& ctx, const sstring& name,
int64_t column_family_stats::*f) {
int64_t column_family::stats::*f) {
return map_reduce_cf(ctx, name, int64_t(0), [f](const column_family& cf) {
return cf.get_stats().*f;
}, std::plus<int64_t>());
}
future<json::json_return_type> get_cf_stats(http_context& ctx,
int64_t column_family_stats::*f) {
int64_t column_family::stats::*f) {
return map_reduce_cf(ctx, int64_t(0), [f](const column_family& cf) {
return cf.get_stats().*f;
}, std::plus<int64_t>());
}
static future<json::json_return_type> get_cf_stats_count(http_context& ctx, const sstring& name,
utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {
utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
return map_reduce_cf(ctx, name, int64_t(0), [f](const column_family& cf) {
return (cf.get_stats().*f).hist.count;
}, std::plus<int64_t>());
}
static future<json::json_return_type> get_cf_stats_sum(http_context& ctx, const sstring& name,
utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {
utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
auto uuid = get_uuid(name, ctx.db.local());
return ctx.db.map_reduce0([uuid, f](database& db) {
// Histograms information is sample of the actual load
@@ -112,14 +100,14 @@ static future<json::json_return_type> get_cf_stats_sum(http_context& ctx, const
static future<json::json_return_type> get_cf_stats_count(http_context& ctx,
utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {
utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
return map_reduce_cf(ctx, int64_t(0), [f](const column_family& cf) {
return (cf.get_stats().*f).hist.count;
}, std::plus<int64_t>());
}
static future<json::json_return_type> get_cf_histogram(http_context& ctx, const sstring& name,
utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {
utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
utils::UUID uuid = get_uuid(name, ctx.db.local());
return ctx.db.map_reduce0([f, uuid](const database& p) {
return (p.find_column_family(uuid).get_stats().*f).hist;},
@@ -130,7 +118,7 @@ static future<json::json_return_type> get_cf_histogram(http_context& ctx, const
});
}
static future<json::json_return_type> get_cf_histogram(http_context& ctx, utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {
static future<json::json_return_type> get_cf_histogram(http_context& ctx, utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
std::function<utils::ihistogram(const database&)> fun = [f] (const database& db) {
utils::ihistogram res;
for (auto i : db.get_column_families()) {
@@ -146,7 +134,7 @@ static future<json::json_return_type> get_cf_histogram(http_context& ctx, utils:
}
static future<json::json_return_type> get_cf_rate_and_histogram(http_context& ctx, const sstring& name,
utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {
utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
utils::UUID uuid = get_uuid(name, ctx.db.local());
return ctx.db.map_reduce0([f, uuid](const database& p) {
return (p.find_column_family(uuid).get_stats().*f).rate();},
@@ -157,7 +145,7 @@ static future<json::json_return_type> get_cf_rate_and_histogram(http_context& c
});
}
static future<json::json_return_type> get_cf_rate_and_histogram(http_context& ctx, utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {
static future<json::json_return_type> get_cf_rate_and_histogram(http_context& ctx, utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
std::function<utils::rate_moving_average_and_histogram(const database&)> fun = [f] (const database& db) {
utils::rate_moving_average_and_histogram res;
for (auto i : db.get_column_families()) {
@@ -178,27 +166,36 @@ static future<json::json_return_type> get_cf_unleveled_sstables(http_context& ct
}, std::plus<int64_t>());
}
static int64_t min_partition_size(column_family& cf) {
static int64_t min_row_size(column_family& cf) {
int64_t res = INT64_MAX;
for (auto i: *cf.get_sstables() ) {
res = std::min(res, i->get_stats_metadata().estimated_partition_size.min());
res = std::min(res, i->get_stats_metadata().estimated_row_size.min());
}
return (res == INT64_MAX) ? 0 : res;
}
static int64_t max_partition_size(column_family& cf) {
static int64_t max_row_size(column_family& cf) {
int64_t res = 0;
for (auto i: *cf.get_sstables() ) {
res = std::max(i->get_stats_metadata().estimated_partition_size.max(), res);
res = std::max(i->get_stats_metadata().estimated_row_size.max(), res);
}
return res;
}
static integral_ratio_holder mean_partition_size(column_family& cf) {
static double update_ratio(double acc, double f, double total) {
if (f && !total) {
throw bad_param_exception("total should include all elements");
} else if (total) {
acc += f / total;
}
return acc;
}
static integral_ratio_holder mean_row_size(column_family& cf) {
integral_ratio_holder res;
for (auto i: *cf.get_sstables() ) {
auto c = i->get_stats_metadata().estimated_partition_size.count();
res.sub += i->get_stats_metadata().estimated_partition_size.mean() * c;
auto c = i->get_stats_metadata().estimated_row_size.count();
res.sub += i->get_stats_metadata().estimated_row_size.mean() * c;
res.total += c;
}
return res;
@@ -249,22 +246,17 @@ static future<json::json_return_type> sum_sstable(http_context& ctx, bool total)
});
}
future<json::json_return_type> map_reduce_cf_time_histogram(http_context& ctx, const sstring& name, std::function<utils::time_estimated_histogram(const column_family&)> f) {
return map_reduce_cf_raw(ctx, name, utils::time_estimated_histogram(), f, utils::time_estimated_histogram_merge).then([](const utils::time_estimated_histogram& res) {
return make_ready_future<json::json_return_type>(time_to_json_histogram(res));
});
}
template <typename T>
class sum_ratio {
uint64_t _n = 0;
T _total = 0;
public:
void operator()(T value) {
future<> operator()(T value) {
if (value > 0) {
_total += value;
_n++;
}
return make_ready_future<>();
}
// Returns average value of all registered ratios.
T get() && {
@@ -291,16 +283,6 @@ static std::vector<uint64_t> concat_sstable_count_per_level(std::vector<uint64_t
return a;
}
ratio_holder filter_false_positive_as_ratio_holder(const sstables::shared_sstable& sst) {
double f = sst->filter_get_false_positive();
return ratio_holder(f + sst->filter_get_true_positive(), f);
}
ratio_holder filter_recent_false_positive_as_ratio_holder(const sstables::shared_sstable& sst) {
double f = sst->filter_get_recent_false_positive();
return ratio_holder(f + sst->filter_get_recent_true_positive(), f);
}
void set_column_family(http_context& ctx, routes& r) {
cf::get_column_family_name.set(r, [&ctx] (const_req req){
vector<sstring> res;
@@ -413,31 +395,29 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_memtable_switch_count.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats(ctx,req->param["name"] ,&column_family_stats::memtable_switch_count);
return get_cf_stats(ctx,req->param["name"] ,&column_family::stats::memtable_switch_count);
});
cf::get_all_memtable_switch_count.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats(ctx, &column_family_stats::memtable_switch_count);
return get_cf_stats(ctx, &column_family::stats::memtable_switch_count);
});
// FIXME: this refers to partitions, not rows.
cf::get_estimated_row_size_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](column_family& cf) {
utils::estimated_histogram res(0);
for (auto i: *cf.get_sstables() ) {
res.merge(i->get_stats_metadata().estimated_partition_size);
res.merge(i->get_stats_metadata().estimated_row_size);
}
return res;
},
utils::estimated_histogram_merge, utils_json::estimated_histogram());
});
// FIXME: this refers to partitions, not rows.
cf::get_estimated_row_count.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](column_family& cf) {
uint64_t res = 0;
for (auto i: *cf.get_sstables() ) {
res += i->get_stats_metadata().estimated_partition_size.count();
res += i->get_stats_metadata().estimated_row_size.count();
}
return res;
},
@@ -448,7 +428,7 @@ void set_column_family(http_context& ctx, routes& r) {
return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](column_family& cf) {
utils::estimated_histogram res(0);
for (auto i: *cf.get_sstables() ) {
res.merge(i->get_stats_metadata().estimated_cells_count);
res.merge(i->get_stats_metadata().estimated_column_count);
}
return res;
},
@@ -462,67 +442,67 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_pending_flushes.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats(ctx,req->param["name"] ,&column_family_stats::pending_flushes);
return get_cf_stats(ctx,req->param["name"] ,&column_family::stats::pending_flushes);
});
cf::get_all_pending_flushes.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats(ctx, &column_family_stats::pending_flushes);
return get_cf_stats(ctx, &column_family::stats::pending_flushes);
});
cf::get_read.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats_count(ctx,req->param["name"] ,&column_family_stats::reads);
return get_cf_stats_count(ctx,req->param["name"] ,&column_family::stats::reads);
});
cf::get_all_read.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats_count(ctx, &column_family_stats::reads);
return get_cf_stats_count(ctx, &column_family::stats::reads);
});
cf::get_write.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats_count(ctx, req->param["name"] ,&column_family_stats::writes);
return get_cf_stats_count(ctx, req->param["name"] ,&column_family::stats::writes);
});
cf::get_all_write.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats_count(ctx, &column_family_stats::writes);
return get_cf_stats_count(ctx, &column_family::stats::writes);
});
cf::get_read_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_histogram(ctx, req->param["name"], &column_family_stats::reads);
return get_cf_histogram(ctx, req->param["name"], &column_family::stats::reads);
});
cf::get_read_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_rate_and_histogram(ctx, req->param["name"], &column_family_stats::reads);
return get_cf_rate_and_histogram(ctx, req->param["name"], &column_family::stats::reads);
});
cf::get_read_latency.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats_sum(ctx,req->param["name"] ,&column_family_stats::reads);
return get_cf_stats_sum(ctx,req->param["name"] ,&column_family::stats::reads);
});
cf::get_write_latency.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats_sum(ctx, req->param["name"] ,&column_family_stats::writes);
return get_cf_stats_sum(ctx, req->param["name"] ,&column_family::stats::writes);
});
cf::get_all_read_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_histogram(ctx, &column_family_stats::writes);
return get_cf_histogram(ctx, &column_family::stats::writes);
});
cf::get_all_read_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_rate_and_histogram(ctx, &column_family_stats::writes);
return get_cf_rate_and_histogram(ctx, &column_family::stats::writes);
});
cf::get_write_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_histogram(ctx, req->param["name"], &column_family_stats::writes);
return get_cf_histogram(ctx, req->param["name"], &column_family::stats::writes);
});
cf::get_write_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_rate_and_histogram(ctx, req->param["name"], &column_family_stats::writes);
return get_cf_rate_and_histogram(ctx, req->param["name"], &column_family::stats::writes);
});
cf::get_all_write_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_histogram(ctx, &column_family_stats::writes);
return get_cf_histogram(ctx, &column_family::stats::writes);
});
cf::get_all_write_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_rate_and_histogram(ctx, &column_family_stats::writes);
return get_cf_rate_and_histogram(ctx, &column_family::stats::writes);
});
cf::get_pending_compactions.set(r, [&ctx] (std::unique_ptr<request> req) {
@@ -538,11 +518,11 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_live_ss_table_count.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats(ctx, req->param["name"], &column_family_stats::live_sstable_count);
return get_cf_stats(ctx, req->param["name"], &column_family::stats::live_sstable_count);
});
cf::get_all_live_ss_table_count.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats(ctx, &column_family_stats::live_sstable_count);
return get_cf_stats(ctx, &column_family::stats::live_sstable_count);
});
cf::get_unleveled_sstables.set(r, [&ctx] (std::unique_ptr<request> req) {
@@ -565,36 +545,30 @@ void set_column_family(http_context& ctx, routes& r) {
return sum_sstable(ctx, true);
});
// FIXME: this refers to partitions, not rows.
cf::get_min_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], INT64_MAX, min_partition_size, min_int64);
return map_reduce_cf(ctx, req->param["name"], INT64_MAX, min_row_size, min_int64);
});
// FIXME: this refers to partitions, not rows.
cf::get_all_min_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, INT64_MAX, min_partition_size, min_int64);
return map_reduce_cf(ctx, INT64_MAX, min_row_size, min_int64);
});
// FIXME: this refers to partitions, not rows.
cf::get_max_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], int64_t(0), max_partition_size, max_int64);
return map_reduce_cf(ctx, req->param["name"], int64_t(0), max_row_size, max_int64);
});
// FIXME: this refers to partitions, not rows.
cf::get_all_max_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, int64_t(0), max_partition_size, max_int64);
return map_reduce_cf(ctx, int64_t(0), max_row_size, max_int64);
});
// FIXME: this refers to partitions, not rows.
cf::get_mean_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {
// Cassandra 3.x mean values are truncated as integrals.
return map_reduce_cf(ctx, req->param["name"], integral_ratio_holder(), mean_partition_size, std::plus<integral_ratio_holder>());
return map_reduce_cf(ctx, req->param["name"], integral_ratio_holder(), mean_row_size, std::plus<integral_ratio_holder>());
});
// FIXME: this refers to partitions, not rows.
cf::get_all_mean_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {
// Cassandra 3.x mean values are truncated as integrals.
return map_reduce_cf(ctx, integral_ratio_holder(), mean_partition_size, std::plus<integral_ratio_holder>());
return map_reduce_cf(ctx, integral_ratio_holder(), mean_row_size, std::plus<integral_ratio_holder>());
});
cf::get_bloom_filter_false_positives.set(r, [&ctx] (std::unique_ptr<request> req) {
@@ -630,27 +604,39 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_bloom_filter_false_ratio.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], ratio_holder(), [] (column_family& cf) {
return boost::accumulate(*cf.get_sstables() | boost::adaptors::transformed(filter_false_positive_as_ratio_holder), ratio_holder());
}, std::plus<>());
return map_reduce_cf(ctx, req->param["name"], double(0), [] (column_family& cf) {
return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), double(0), [](double s, auto& sst) {
double f = sst->filter_get_false_positive();
return update_ratio(s, f, f + sst->filter_get_true_positive());
});
}, std::plus<double>());
});
cf::get_all_bloom_filter_false_ratio.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, ratio_holder(), [] (column_family& cf) {
return boost::accumulate(*cf.get_sstables() | boost::adaptors::transformed(filter_false_positive_as_ratio_holder), ratio_holder());
}, std::plus<>());
return map_reduce_cf(ctx, double(0), [] (column_family& cf) {
return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), double(0), [](double s, auto& sst) {
double f = sst->filter_get_false_positive();
return update_ratio(s, f, f + sst->filter_get_true_positive());
});
}, std::plus<double>());
});
cf::get_recent_bloom_filter_false_ratio.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], ratio_holder(), [] (column_family& cf) {
return boost::accumulate(*cf.get_sstables() | boost::adaptors::transformed(filter_recent_false_positive_as_ratio_holder), ratio_holder());
}, std::plus<>());
return map_reduce_cf(ctx, req->param["name"], double(0), [] (column_family& cf) {
return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), double(0), [](double s, auto& sst) {
double f = sst->filter_get_recent_false_positive();
return update_ratio(s, f, f + sst->filter_get_recent_true_positive());
});
}, std::plus<double>());
});
cf::get_all_recent_bloom_filter_false_ratio.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, ratio_holder(), [] (column_family& cf) {
return boost::accumulate(*cf.get_sstables() | boost::adaptors::transformed(filter_recent_false_positive_as_ratio_holder), ratio_holder());
}, std::plus<>());
return map_reduce_cf(ctx, double(0), [] (column_family& cf) {
return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), double(0), [](double s, auto& sst) {
double f = sst->filter_get_recent_false_positive();
return update_ratio(s, f, f + sst->filter_get_recent_true_positive());
});
}, std::plus<double>());
});
cf::get_bloom_filter_disk_space_used.set(r, [&ctx] (std::unique_ptr<request> req) {
@@ -801,22 +787,25 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_cas_prepare.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf_time_histogram(ctx, req->param["name"], [](const column_family& cf) {
return cf.get_stats().estimated_cas_prepare;
});
cf::get_cas_prepare.set(r, [] (std::unique_ptr<request> req) {
//TBD
unimplemented();
//auto id = get_uuid(req->param["name"], ctx.db.local());
return make_ready_future<json::json_return_type>(0);
});
cf::get_cas_propose.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf_time_histogram(ctx, req->param["name"], [](const column_family& cf) {
return cf.get_stats().estimated_cas_accept;
});
cf::get_cas_propose.set(r, [] (std::unique_ptr<request> req) {
//TBD
unimplemented();
//auto id = get_uuid(req->param["name"], ctx.db.local());
return make_ready_future<json::json_return_type>(0);
});
cf::get_cas_commit.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf_time_histogram(ctx, req->param["name"], [](const column_family& cf) {
return cf.get_stats().estimated_cas_learn;
});
cf::get_cas_commit.set(r, [] (std::unique_ptr<request> req) {
//TBD
unimplemented();
//auto id = get_uuid(req->param["name"], ctx.db.local());
return make_ready_future<json::json_return_type>(0);
});
cf::get_sstables_per_read_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
@@ -827,11 +816,11 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_tombstone_scanned_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_histogram(ctx, req->param["name"], &column_family_stats::tombstone_scanned);
return get_cf_histogram(ctx, req->param["name"], &column_family::stats::tombstone_scanned);
});
cf::get_live_scanned_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_histogram(ctx, req->param["name"], &column_family_stats::live_scanned);
return get_cf_histogram(ctx, req->param["name"], &column_family::stats::live_scanned);
});
cf::get_col_update_time_delta_histogram.set(r, [] (std::unique_ptr<request> req) {
@@ -842,51 +831,19 @@ void set_column_family(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(res);
});
cf::get_auto_compaction.set(r, [&ctx] (const_req req) {
const utils::UUID& uuid = get_uuid(req.param["name"], ctx.db.local());
column_family& cf = ctx.db.local().find_column_family(uuid);
return !cf.is_auto_compaction_disabled_by_user();
cf::is_auto_compaction_disabled.set(r, [] (const_req req) {
// FIXME
// currently auto compaction is disabled
// it should be changed when it would have an API
return true;
});
cf::enable_auto_compaction.set(r, [&ctx](std::unique_ptr<request> req) {
return foreach_column_family(ctx, req->param["name"], [](column_family &cf) {
cf.enable_auto_compaction();
}).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
cf::get_built_indexes.set(r, [](const_req) {
// FIXME
// Currently there is no index support
return std::vector<sstring>();
});
cf::disable_auto_compaction.set(r, [&ctx](std::unique_ptr<request> req) {
return foreach_column_family(ctx, req->param["name"], [](column_family &cf) {
cf.disable_auto_compaction();
}).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
cf::get_built_indexes.set(r, [&ctx](std::unique_ptr<request> req) {
auto ks_cf = parse_fully_qualified_cf_name(req->param["name"]);
auto&& ks = std::get<0>(ks_cf);
auto&& cf_name = std::get<1>(ks_cf);
return db::system_keyspace::load_view_build_progress().then([ks, cf_name, &ctx](const std::vector<db::system_keyspace::view_build_progress>& vb) mutable {
std::set<sstring> vp;
for (auto b : vb) {
if (b.view.first == ks) {
vp.insert(b.view.second);
}
}
std::vector<sstring> res;
auto uuid = get_uuid(ks, cf_name, ctx.db.local());
column_family& cf = ctx.db.local().find_column_family(uuid);
res.reserve(cf.get_index_manager().list_indexes().size());
for (auto&& i : cf.get_index_manager().list_indexes()) {
if (!vp.contains(secondary_index::index_table_name(i.metadata().name()))) {
res.emplace_back(i.metadata().name());
}
}
return make_ready_future<json::json_return_type>(res);
});
});
cf::get_compression_metadata_off_heap_memory_used.set(r, [](const_req) {
// FIXME
@@ -914,15 +871,17 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_read_latency_estimated_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
return map_reduce_cf_time_histogram(ctx, req->param["name"], [](const column_family& cf) {
return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](column_family& cf) {
return cf.get_stats().estimated_read;
});
},
utils::estimated_histogram_merge, utils_json::estimated_histogram());
});
cf::get_write_latency_estimated_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
return map_reduce_cf_time_histogram(ctx, req->param["name"], [](const column_family& cf) {
return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](column_family& cf) {
return cf.get_stats().estimated_write;
});
},
utils::estimated_histogram_merge, utils_json::estimated_histogram());
});
cf::set_compaction_strategy_class.set(r, [&ctx](std::unique_ptr<request> req) {
@@ -957,70 +916,5 @@ void set_column_family(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(res);
});
});
cf::get_sstables_for_key.set(r, [&ctx](std::unique_ptr<request> req) {
auto key = req->get_query_param("key");
auto uuid = get_uuid(req->param["name"], ctx.db.local());
return ctx.db.map_reduce0([key, uuid] (database& db) {
return db.find_column_family(uuid).get_sstables_by_partition_key(key);
}, std::unordered_set<sstring>(),
[](std::unordered_set<sstring> a, std::unordered_set<sstring>&& b) mutable {
a.insert(b.begin(),b.end());
return a;
}).then([](const std::unordered_set<sstring>& res) {
return make_ready_future<json::json_return_type>(container_to_vec(res));
});
});
cf::toppartitions.set(r, [&ctx] (std::unique_ptr<request> req) {
auto name_param = req->param["name"];
auto [ks, cf] = parse_fully_qualified_cf_name(name_param);
api::req_param<std::chrono::milliseconds, unsigned> duration{*req, "duration", 1000ms};
api::req_param<unsigned> capacity(*req, "capacity", 256);
api::req_param<unsigned> list_size(*req, "list_size", 10);
apilog.info("toppartitions query: name={} duration={} list_size={} capacity={}",
name_param, duration.param, list_size.param, capacity.param);
return seastar::do_with(db::toppartitions_query(ctx.db, ks, cf, duration.value, list_size, capacity), [&ctx](auto& q) {
return q.scatter().then([&q] {
return sleep(q.duration()).then([&q] {
return q.gather(q.capacity()).then([&q] (auto topk_results) {
apilog.debug("toppartitions query: processing results");
cf::toppartitions_query_results results;
for (auto& d: topk_results.read.top(q.list_size())) {
cf::toppartitions_record r;
r.partition = sstring(d.item);
r.count = d.count;
r.error = d.error;
results.read.push(r);
}
for (auto& d: topk_results.write.top(q.list_size())) {
cf::toppartitions_record r;
r.partition = sstring(d.item);
r.count = d.count;
r.error = d.error;
results.write.push(r);
}
return make_ready_future<json::json_return_type>(results);
});
});
});
});
});
cf::force_major_compaction.set(r, [&ctx](std::unique_ptr<request> req) {
if (req->get_query_param("split_output") != "") {
fail(unimplemented::cause::API);
}
return foreach_column_family(ctx, req->param["name"], [](column_family &cf) {
return cf.compact_all_sstables();
}).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
}
}


@@ -24,8 +24,6 @@
#include "api.hh"
#include "api/api-doc/column_family.json.hh"
#include "database.hh"
#include <seastar/core/future-util.hh>
#include <any>
namespace api {
@@ -39,15 +37,9 @@ template<class Mapper, class I, class Reducer>
future<I> map_reduce_cf_raw(http_context& ctx, const sstring& name, I init,
Mapper mapper, Reducer reducer) {
auto uuid = get_uuid(name, ctx.db.local());
using mapper_type = std::function<std::unique_ptr<std::any>(database&)>;
using reducer_type = std::function<std::unique_ptr<std::any>(std::unique_ptr<std::any>, std::unique_ptr<std::any>)>;
return ctx.db.map_reduce0(mapper_type([mapper, uuid](database& db) {
return std::make_unique<std::any>(I(mapper(db.find_column_family(uuid))));
}), std::make_unique<std::any>(std::move(init)), reducer_type([reducer = std::move(reducer)] (std::unique_ptr<std::any> a, std::unique_ptr<std::any> b) mutable {
return std::make_unique<std::any>(I(reducer(std::any_cast<I>(std::move(*a)), std::any_cast<I>(std::move(*b)))));
})).then([] (std::unique_ptr<std::any> r) {
return std::any_cast<I>(std::move(*r));
});
return ctx.db.map_reduce0([mapper, uuid](database& db) {
return mapper(db.find_column_family(uuid));
}, init, reducer);
}
@@ -59,46 +51,35 @@ future<json::json_return_type> map_reduce_cf(http_context& ctx, const sstring& n
});
}
template<class Mapper, class I, class Reducer, class Result>
future<I> map_reduce_cf_raw(http_context& ctx, const sstring& name, I init,
Mapper mapper, Reducer reducer, Result result) {
auto uuid = get_uuid(name, ctx.db.local());
return ctx.db.map_reduce0([mapper, uuid](database& db) {
return mapper(db.find_column_family(uuid));
}, init, reducer);
}
template<class Mapper, class I, class Reducer, class Result>
future<json::json_return_type> map_reduce_cf(http_context& ctx, const sstring& name, I init,
Mapper mapper, Reducer reducer, Result result) {
return map_reduce_cf_raw(ctx, name, init, mapper, reducer).then([result](const I& res) mutable {
return map_reduce_cf_raw(ctx, name, init, mapper, reducer, result).then([result](const I& res) mutable {
result = res;
return make_ready_future<json::json_return_type>(result);
});
}
future<json::json_return_type> map_reduce_cf_time_histogram(http_context& ctx, const sstring& name, std::function<utils::time_estimated_histogram(const column_family&)> f);
struct map_reduce_column_families_locally {
std::any init;
std::function<std::unique_ptr<std::any>(column_family&)> mapper;
std::function<std::unique_ptr<std::any>(std::unique_ptr<std::any>, std::unique_ptr<std::any>)> reducer;
future<std::unique_ptr<std::any>> operator()(database& db) const {
auto res = seastar::make_lw_shared<std::unique_ptr<std::any>>(std::make_unique<std::any>(init));
return do_for_each(db.get_column_families(), [res, this](const std::pair<utils::UUID, seastar::lw_shared_ptr<table>>& i) {
*res = std::move(reducer(std::move(*res), mapper(*i.second.get())));
}).then([res] {
return std::move(*res);
});
}
};
template<class Mapper, class I, class Reducer>
future<I> map_reduce_cf_raw(http_context& ctx, I init,
Mapper mapper, Reducer reducer) {
using mapper_type = std::function<std::unique_ptr<std::any>(column_family&)>;
using reducer_type = std::function<std::unique_ptr<std::any>(std::unique_ptr<std::any>, std::unique_ptr<std::any>)>;
auto wrapped_mapper = mapper_type([mapper = std::move(mapper)] (column_family& cf) mutable {
return std::make_unique<std::any>(I(mapper(cf)));
});
auto wrapped_reducer = reducer_type([reducer = std::move(reducer)] (std::unique_ptr<std::any> a, std::unique_ptr<std::any> b) mutable {
return std::make_unique<std::any>(I(reducer(std::any_cast<I>(std::move(*a)), std::any_cast<I>(std::move(*b)))));
});
return ctx.db.map_reduce0(map_reduce_column_families_locally{init,
std::move(wrapped_mapper), wrapped_reducer}, std::make_unique<std::any>(init), wrapped_reducer).then([] (std::unique_ptr<std::any> res) {
return std::any_cast<I>(std::move(*res));
});
return ctx.db.map_reduce0([mapper, init, reducer](database& db) {
auto res = init;
for (auto i : db.get_column_families()) {
res = reducer(res, mapper(*i.second.get()));
}
return res;
}, init, reducer);
}
@@ -111,9 +92,9 @@ future<json::json_return_type> map_reduce_cf(http_context& ctx, I init,
}
future<json::json_return_type> get_cf_stats(http_context& ctx, const sstring& name,
int64_t column_family_stats::*f);
int64_t column_family::stats::*f);
future<json::json_return_type> get_cf_stats(http_context& ctx,
int64_t column_family_stats::*f);
int64_t column_family::stats::*f);
}


@@ -20,18 +20,17 @@
*/
#include "commitlog.hh"
#include "db/commitlog/commitlog.hh"
#include <db/commitlog/commitlog.hh>
#include "api/api-doc/commitlog.json.hh"
#include "database.hh"
#include <vector>
namespace api {
template<typename T>
static auto acquire_cl_metric(http_context& ctx, std::function<T (db::commitlog*)> func) {
typedef T ret_type;
template<typename Func>
static auto acquire_cl_metric(http_context& ctx, Func&& func) {
typedef std::result_of_t<Func(db::commitlog *)> ret_type;
return ctx.db.map_reduce0([func = std::move(func)](database& db) {
return ctx.db.map_reduce0([func = std::forward<Func>(func)](database& db) {
if (db.commitlog() == nullptr) {
return make_ready_future<ret_type>();
}
@@ -64,15 +63,15 @@ void set_commitlog(http_context& ctx, routes& r) {
});
httpd::commitlog_json::get_completed_tasks.set(r, [&ctx](std::unique_ptr<request> req) {
return acquire_cl_metric<uint64_t>(ctx, std::bind(&db::commitlog::get_completed_tasks, std::placeholders::_1));
return acquire_cl_metric(ctx, std::bind(&db::commitlog::get_completed_tasks, std::placeholders::_1));
});
httpd::commitlog_json::get_pending_tasks.set(r, [&ctx](std::unique_ptr<request> req) {
return acquire_cl_metric<uint64_t>(ctx, std::bind(&db::commitlog::get_pending_tasks, std::placeholders::_1));
return acquire_cl_metric(ctx, std::bind(&db::commitlog::get_pending_tasks, std::placeholders::_1));
});
httpd::commitlog_json::get_total_commit_log_size.set(r, [&ctx](std::unique_ptr<request> req) {
return acquire_cl_metric<uint64_t>(ctx, std::bind(&db::commitlog::get_total_size, std::placeholders::_1));
return acquire_cl_metric(ctx, std::bind(&db::commitlog::get_total_size, std::placeholders::_1));
});
}


@@ -20,14 +20,13 @@
*/
#include "compaction_manager.hh"
#include "sstables/compaction_manager.hh"
#include "api/api-doc/compaction_manager.json.hh"
#include "db/system_keyspace.hh"
#include "column_family.hh"
#include <utility>
namespace api {
using namespace scollectd;
namespace cm = httpd::compaction_manager_json;
using namespace json;
@@ -39,16 +38,6 @@ static future<json::json_return_type> get_cm_stats(http_context& ctx,
return make_ready_future<json::json_return_type>(res);
});
}
static std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash> sum_pending_tasks(std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>&& a,
const std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>& b) {
for (auto&& i : b) {
if (i.second) {
a[i.first] += i.second;
}
}
return std::move(a);
}
void set_compaction_manager(http_context& ctx, routes& r) {
cm::get_compactions.set(r, [&ctx] (std::unique_ptr<request> req) {
@@ -58,8 +47,8 @@ void set_compaction_manager(http_context& ctx, routes& r) {
for (const auto& c : cm.get_compactions()) {
cm::summary s;
s.ks = c->ks_name;
s.cf = c->cf_name;
s.ks = c->ks;
s.cf = c->cf;
s.unit = "keys";
s.task_type = sstables::compaction_name(c->type);
s.completed = c->total_keys_written;
@@ -72,32 +61,6 @@ void set_compaction_manager(http_context& ctx, routes& r) {
});
});
cm::get_pending_tasks_by_table.set(r, [&ctx] (std::unique_ptr<request> req) {
return ctx.db.map_reduce0([&ctx](database& db) {
return do_with(std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>(), [&ctx, &db](std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>& tasks) {
return do_for_each(db.get_column_families(), [&tasks](const std::pair<utils::UUID, seastar::lw_shared_ptr<table>>& i) {
table& cf = *i.second.get();
tasks[std::make_pair(cf.schema()->ks_name(), cf.schema()->cf_name())] = cf.get_compaction_strategy().estimated_pending_compactions(cf);
return make_ready_future<>();
}).then([&tasks] {
return std::move(tasks);
});
});
}, std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>(), sum_pending_tasks).then(
[](const std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>& task_map) {
std::vector<cm::pending_compaction> res;
res.reserve(task_map.size());
for (auto i : task_map) {
cm::pending_compaction task;
task.ks = i.first.first;
task.cf = i.first.second;
task.task = i.second;
res.emplace_back(std::move(task));
}
return make_ready_future<json::json_return_type>(res);
});
});
cm::force_user_defined_compaction.set(r, [] (std::unique_ptr<request> req) {
//TBD
// FIXME
@@ -140,37 +103,29 @@ void set_compaction_manager(http_context& ctx, routes& r) {
});
cm::get_compaction_history.set(r, [] (std::unique_ptr<request> req) {
std::function<future<>(output_stream<char>&&)> f = [](output_stream<char>&& s) {
return do_with(output_stream<char>(std::move(s)), true, [] (output_stream<char>& s, bool& first){
return s.write("[").then([&s, &first] {
return db::system_keyspace::get_compaction_history([&s, &first](const db::system_keyspace::compaction_history_entry& entry) mutable {
cm::history h;
h.id = entry.id.to_sstring();
h.ks = std::move(entry.ks);
h.cf = std::move(entry.cf);
h.compacted_at = entry.compacted_at;
h.bytes_in = entry.bytes_in;
h.bytes_out = entry.bytes_out;
for (auto it : entry.rows_merged) {
httpd::compaction_manager_json::row_merged e;
e.key = it.first;
e.value = it.second;
h.rows_merged.push(std::move(e));
}
auto fut = first ? make_ready_future<>() : s.write(", ");
first = false;
return fut.then([&s, h = std::move(h)] {
return formatter::write(s, h);
});
}).then([&s] {
return s.write("]").then([&s] {
return s.close();
});
});
});
});
};
return make_ready_future<json::json_return_type>(std::move(f));
return db::system_keyspace::get_compaction_history().then([] (std::vector<db::system_keyspace::compaction_history_entry> history) {
std::vector<cm::history> res;
res.reserve(history.size());
for (auto& entry : history) {
cm::history h;
h.id = entry.id.to_sstring();
h.ks = std::move(entry.ks);
h.cf = std::move(entry.cf);
h.compacted_at = entry.compacted_at;
h.bytes_in = entry.bytes_in;
h.bytes_out = entry.bytes_out;
for (auto it : entry.rows_merged) {
httpd::compaction_manager_json::row_merged e;
e.key = it.first;
e.value = it.second;
h.rows_merged.push(std::move(e));
}
res.push_back(std::move(h));
}
return make_ready_future<json::json_return_type>(res);
});
});
cm::get_compaction_info.set(r, [] (std::unique_ptr<request> req) {
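
The streaming variant of get_compaction_history in the hunk above writes "[", then each entry preceded by ", " except the first, and finally "]". Below is a minimal standalone sketch of that separator pattern. It assumes plain integers in place of cm::history entries and std::ostringstream in place of Seastar's output_stream; write_json_array is a hypothetical name, not part of Scylla.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: serialize entries as a JSON array, using a `first`
// flag to decide whether a separator is needed before each element.
std::string write_json_array(const std::vector<int>& entries) {
    std::ostringstream out;
    bool first = true;
    out << "[";
    for (int entry : entries) {
        if (!first) {
            out << ", ";  // separator goes before every element but the first
        }
        first = false;
        out << entry;
    }
    out << "]";
    return out.str();
}
```

The flag keeps the writer stateless about how many entries remain, which is why the same idea works when entries arrive one at a time through a callback.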


@@ -1,119 +0,0 @@
/*
* Copyright 2018 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "api/config.hh"
#include "api/api-doc/config.json.hh"
#include "db/config.hh"
#include "database.hh"
#include <sstream>
#include <boost/algorithm/string/replace.hpp>
namespace api {
template<class T>
json::json_return_type get_json_return_type(const T& val) {
return json::json_return_type(val);
}
/*
* As noted in its comments, db::seed_provider_type is not used
* and probably never will be.
*
* Just in case, we will return its name
*/
template<>
json::json_return_type get_json_return_type(const db::seed_provider_type& val) {
return json::json_return_type(val.class_name);
}
std::string_view format_type(std::string_view type) {
if (type == "int") {
return "integer";
}
return type;
}
future<> get_config_swagger_entry(std::string_view name, const std::string& description, std::string_view type, bool& first, output_stream<char>& os) {
std::stringstream ss;
if (first) {
first=false;
} else {
ss <<',';
};
ss << "\"/config/" << name <<"\": {"
"\"get\": {"
"\"description\": \"" << boost::replace_all_copy(boost::replace_all_copy(boost::replace_all_copy(description,"\n","\\n"),"\"", "''"), "\t", " ") <<"\","
"\"operationId\": \"find_config_"<< name <<"\","
"\"produces\": ["
"\"application/json\""
"],"
"\"tags\": [\"config\"],"
"\"parameters\": ["
"],"
"\"responses\": {"
"\"200\": {"
"\"description\": \"Config value\","
"\"schema\": {"
"\"type\": \"" << format_type(type) << "\""
"}"
"},"
"\"default\": {"
"\"description\": \"unexpected error\","
"\"schema\": {"
"\"$ref\": \"#/definitions/ErrorModel\""
"}"
"}"
"}"
"}"
"}";
return os.write(ss.str());
}
namespace cs = httpd::config_json;
void set_config(std::shared_ptr < api_registry_builder20 > rb, http_context& ctx, routes& r) {
rb->register_function(r, [&ctx] (output_stream<char>& os) {
return do_with(true, [&os, &ctx] (bool& first) {
auto f = make_ready_future();
for (auto&& cfg_ref : ctx.db.local().get_config().values()) {
auto&& cfg = cfg_ref.get();
f = f.then([&os, &first, &cfg] {
return get_config_swagger_entry(cfg.name(), std::string(cfg.desc()), cfg.type_name(), first, os);
});
}
return f;
});
});
cs::find_config_id.set(r, [&ctx] (const_req r) {
auto id = r.param["id"];
for (auto&& cfg_ref : ctx.db.local().get_config().values()) {
auto&& cfg = cfg_ref.get();
if (id == cfg.name()) {
return cfg.value_as_json();
}
}
throw bad_param_exception(sstring("No such config entry: ") + id);
});
}
}
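
The description escaping done by the nested boost::replace_all_copy calls in get_config_swagger_entry above (newline to a literal \n, double quote to two single quotes, tab to a space) can be sketched with the standard library alone. This is an illustrative sketch under those assumptions: replace_all and sanitize_description are hypothetical names, and the tab replacement is taken to be a single space as the line appears here.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: return a copy of `text` with every occurrence of
// `from` replaced by `to`.
std::string replace_all(std::string_view text, std::string_view from, std::string_view to) {
    if (from.empty()) {
        return std::string(text);
    }
    std::string out;
    std::size_t pos = 0;
    while (true) {
        std::size_t hit = text.find(from, pos);
        if (hit == std::string_view::npos) {
            out.append(text.substr(pos));  // copy the tail and stop
            break;
        }
        out.append(text.substr(pos, hit - pos));
        out.append(to);
        pos = hit + from.size();
    }
    return out;
}

// Applies the three replacements in the same order as the original code.
std::string sanitize_description(std::string_view description) {
    std::string s = replace_all(description, "\n", "\\n");
    s = replace_all(s, "\"", "''");
    return replace_all(s, "\t", " ");
}
```

Note that this only makes the description safe to embed between double quotes for these three characters; it does not handle backslashes or other control characters.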


@@ -1,30 +0,0 @@
/*
* Copyright (C) 2018 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "api.hh"
#include <seastar/http/api_docs.hh>
namespace api {
void set_config(std::shared_ptr<api_registry_builder20> rb, http_context& ctx, routes& r);
}


@@ -1,69 +0,0 @@
/*
* Copyright (C) 2020 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "api/api-doc/error_injection.json.hh"
#include "api/api.hh"
#include <seastar/http/exception.hh>
#include "log.hh"
#include "utils/error_injection.hh"
#include "seastar/core/future-util.hh"
namespace api {
namespace hf = httpd::error_injection_json;
void set_error_injection(http_context& ctx, routes& r) {
hf::enable_injection.set(r, [](std::unique_ptr<request> req) {
sstring injection = req->param["injection"];
bool one_shot = req->get_query_param("one_shot") == "True";
auto& errinj = utils::get_local_injector();
return errinj.enable_on_all(injection, one_shot).then([] {
return make_ready_future<json::json_return_type>(json::json_void());
});
});
hf::get_enabled_injections_on_all.set(r, [](std::unique_ptr<request> req) {
auto& errinj = utils::get_local_injector();
auto ret = errinj.enabled_injections_on_all();
return make_ready_future<json::json_return_type>(ret);
});
hf::disable_injection.set(r, [](std::unique_ptr<request> req) {
sstring injection = req->param["injection"];
auto& errinj = utils::get_local_injector();
return errinj.disable_on_all(injection).then([] {
return make_ready_future<json::json_return_type>(json::json_void());
});
});
hf::disable_on_all.set(r, [](std::unique_ptr<request> req) {
auto& errinj = utils::get_local_injector();
return errinj.disable_on_all().then([] {
return make_ready_future<json::json_return_type>(json::json_void());
});
});
}
} // namespace api


@@ -1,30 +0,0 @@
/*
* Copyright (C) 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "api.hh"
namespace api {
void set_error_injection(http_context& ctx, routes& r);
}


@@ -21,7 +21,7 @@
#include "gossiper.hh"
#include "api/api-doc/gossiper.json.hh"
#include "gms/gossiper.hh"
#include <gms/gossiper.hh>
namespace api {
using namespace json;


@@ -24,6 +24,7 @@
namespace api {
using namespace scollectd;
using namespace json;
namespace hh = httpd::hinted_handoff_json;


@@ -23,17 +23,17 @@
#include "api/lsa.hh"
#include "api/api.hh"
#include <seastar/http/exception.hh>
#include "http/exception.hh"
#include "utils/logalloc.hh"
#include "log.hh"
namespace api {
static logging::logger alogger("lsa-api");
static logging::logger logger("lsa-api");
void set_lsa(http_context& ctx, routes& r) {
httpd::lsa_json::lsa_compact.set(r, [&ctx](std::unique_ptr<request> req) {
alogger.info("Triggering compaction");
logger.info("Triggering compaction");
return ctx.db.invoke_on_all([] (database&) {
logalloc::shard_tracker().reclaim(std::numeric_limits<size_t>::max());
}).then([] {


@@ -21,13 +21,13 @@
#include "messaging_service.hh"
#include "message/messaging_service.hh"
#include <seastar/rpc/rpc_types.hh>
#include "rpc/rpc_types.hh"
#include "api/api-doc/messaging_service.json.hh"
#include <iostream>
#include <sstream>
using namespace httpd::messaging_service_json;
using namespace netw;
using namespace net;
namespace api {
@@ -53,8 +53,8 @@ std::vector<message_counter> map_to_message_counters(
* according to a function that it gets as a parameter.
*
*/
future_json_function get_client_getter(sharded<netw::messaging_service>& ms, std::function<uint64_t(const shard_info&)> f) {
return [&ms, f](std::unique_ptr<request> req) {
future_json_function get_client_getter(std::function<uint64_t(const shard_info&)> f) {
return [f](std::unique_ptr<request> req) {
using map_type = std::unordered_map<gms::inet_address, uint64_t>;
auto get_shard_map = [f](messaging_service& ms) {
std::unordered_map<gms::inet_address, unsigned long> map;
@@ -63,70 +63,70 @@ future_json_function get_client_getter(sharded<netw::messaging_service>& ms, std
});
return map;
};
return ms.map_reduce0(get_shard_map, map_type(), map_sum<map_type>).
return get_messaging_service().map_reduce0(get_shard_map, map_type(), map_sum<map_type>).
then([](map_type&& map) {
return make_ready_future<json::json_return_type>(map_to_message_counters(map));
});
};
}
future_json_function get_server_getter(sharded<netw::messaging_service>& ms, std::function<uint64_t(const rpc::stats&)> f) {
return [&ms, f](std::unique_ptr<request> req) {
future_json_function get_server_getter(std::function<uint64_t(const rpc::stats&)> f) {
return [f](std::unique_ptr<request> req) {
using map_type = std::unordered_map<gms::inet_address, uint64_t>;
auto get_shard_map = [f](messaging_service& ms) {
std::unordered_map<gms::inet_address, unsigned long> map;
ms.foreach_server_connection_stats([&map, f] (const rpc::client_info& info, const rpc::stats& stats) mutable {
map[gms::inet_address(info.addr.addr())] = f(stats);
map[gms::inet_address(net::ipv4_address(info.addr))] = f(stats);
});
return map;
};
return ms.map_reduce0(get_shard_map, map_type(), map_sum<map_type>).
return get_messaging_service().map_reduce0(get_shard_map, map_type(), map_sum<map_type>).
then([](map_type&& map) {
return make_ready_future<json::json_return_type>(map_to_message_counters(map));
});
};
}
void set_messaging_service(http_context& ctx, routes& r, sharded<netw::messaging_service>& ms) {
get_timeout_messages.set(r, get_client_getter(ms, [](const shard_info& c) {
void set_messaging_service(http_context& ctx, routes& r) {
get_timeout_messages.set(r, get_client_getter([](const shard_info& c) {
return c.get_stats().timeout;
}));
get_sent_messages.set(r, get_client_getter(ms, [](const shard_info& c) {
get_sent_messages.set(r, get_client_getter([](const shard_info& c) {
return c.get_stats().sent_messages;
}));
get_dropped_messages.set(r, get_client_getter(ms, [](const shard_info& c) {
get_dropped_messages.set(r, get_client_getter([](const shard_info& c) {
// We don't have the same drop message mechanism
// as origin has.
// hence we can always return 0
return 0;
}));
get_exception_messages.set(r, get_client_getter(ms, [](const shard_info& c) {
get_exception_messages.set(r, get_client_getter([](const shard_info& c) {
return c.get_stats().exception_received;
}));
get_pending_messages.set(r, get_client_getter(ms, [](const shard_info& c) {
get_pending_messages.set(r, get_client_getter([](const shard_info& c) {
return c.get_stats().pending;
}));
get_respond_pending_messages.set(r, get_server_getter(ms, [](const rpc::stats& c) {
get_respond_pending_messages.set(r, get_server_getter([](const rpc::stats& c) {
return c.pending;
}));
get_respond_completed_messages.set(r, get_server_getter(ms, [](const rpc::stats& c) {
get_respond_completed_messages.set(r, get_server_getter([](const rpc::stats& c) {
return c.sent_messages;
}));
get_version.set(r, [&ms](const_req req) {
return ms.local().get_raw_version(req.get_query_param("addr"));
get_version.set(r, [](const_req req) {
return net::get_local_messaging_service().get_raw_version(req.get_query_param("addr"));
});
get_dropped_messages_by_ver.set(r, [&ms](std::unique_ptr<request> req) {
get_dropped_messages_by_ver.set(r, [](std::unique_ptr<request> req) {
shared_ptr<std::vector<uint64_t>> map = make_shared<std::vector<uint64_t>>(num_verb);
return ms.map_reduce([map](const uint64_t* local_map) mutable {
return net::get_messaging_service().map_reduce([map](const uint64_t* local_map) mutable {
for (auto i = 0; i < num_verb; i++) {
(*map)[i]+= local_map[i];
}
@@ -139,7 +139,7 @@ void set_messaging_service(http_context& ctx, routes& r, sharded<netw::messaging
messaging_verb v = i; // for type safety we use messaging_verb values
auto idx = static_cast<uint32_t>(v);
if (idx >= map->size()) {
throw std::runtime_error(format("verb index out of bounds: {:d}, map size: {:d}", idx, map->size()));
throw std::runtime_error(sprint("verb index out of bounds: %lu, map size: %lu", idx, map->size()));
}
if ((*map)[idx] > 0) {
c.count = (*map)[idx];
@@ -151,18 +151,5 @@ void set_messaging_service(http_context& ctx, routes& r, sharded<netw::messaging
});
});
}
void unset_messaging_service(http_context& ctx, routes& r) {
get_timeout_messages.unset(r);
get_sent_messages.unset(r);
get_dropped_messages.unset(r);
get_exception_messages.unset(r);
get_pending_messages.unset(r);
get_respond_pending_messages.unset(r);
get_respond_completed_messages.unset(r);
get_version.unset(r);
get_dropped_messages_by_ver.unset(r);
}
}
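
Both get_client_getter and get_server_getter above build one map per shard and hand map_reduce0 a reducer that adds the maps together key by key. Below is a minimal standalone sketch of that reduction. It assumes std::string keys in place of gms::inet_address and a plain vector in place of the sharded service; sum_maps and reduce_shards are hypothetical names and are not claimed to match the real map_sum helper.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using counter_map = std::unordered_map<std::string, std::uint64_t>;

// Reducer: fold one shard's counters into the accumulator, summing values
// that share a key.
counter_map sum_maps(counter_map acc, const counter_map& shard) {
    for (const auto& [key, value] : shard) {
        acc[key] += value;
    }
    return acc;
}

// Stand-in for map_reduce0: start from an empty map and apply the reducer
// to each per-shard result in turn.
counter_map reduce_shards(const std::vector<counter_map>& shards) {
    counter_map total;
    for (const auto& shard : shards) {
        total = sum_maps(std::move(total), shard);
    }
    return total;
}
```

Because addition is associative and commutative, the result does not depend on the order in which shard maps are folded in.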


@@ -23,11 +23,8 @@
#include "api.hh"
namespace netw { class messaging_service; }
namespace api {
void set_messaging_service(http_context& ctx, routes& r, sharded<netw::messaging_service>& ms);
void unset_messaging_service(http_context& ctx, routes& r);
void set_messaging_service(http_context& ctx, routes& r);
}


@@ -26,8 +26,6 @@
#include "service/storage_service.hh"
#include "db/config.hh"
#include "utils/histogram.hh"
#include "database.hh"
#include "seastar/core/scheduling_specific.hh"
namespace api {
@@ -35,70 +33,12 @@ namespace sp = httpd::storage_proxy_json;
using proxy = service::storage_proxy;
using namespace json;
/**
* This function implements a two-dimensional map-reduce where
* the first level is a distributed storage_proxy class and the
* second level is the stats per scheduling group class.
* @param d - a reference to the storage_proxy distributed class.
* @param mapper - the internal mapper that is used to map the internal
* stat class into a value of type `V`.
* @param reducer - the reducer that is used in both outer and inner
* aggregations.
* @param initial_value - the initial value to use for both aggregations
* @return A future that resolves to the result of the aggregation.
*/
template<typename V, typename Reducer, typename InnerMapper>
future<V> two_dimensional_map_reduce(distributed<service::storage_proxy>& d,
InnerMapper mapper, Reducer reducer, V initial_value) {
return d.map_reduce0( [mapper, reducer, initial_value] (const service::storage_proxy& sp) {
return map_reduce_scheduling_group_specific<service::storage_proxy_stats::stats>(
mapper, reducer, initial_value, sp.get_stats_key());
}, initial_value, reducer);
static future<utils::rate_moving_average> sum_timed_rate(distributed<proxy>& d, utils::timed_rate_moving_average proxy::stats::*f) {
return d.map_reduce0([f](const proxy& p) {return (p.get_stats().*f).rate();}, utils::rate_moving_average(),
std::plus<utils::rate_moving_average>());
}
/**
* This function implements a two-dimensional map-reduce where
* the first level is a distributed storage_proxy class and the
* second level is the stats per scheduling group class.
* @param d - a reference to the storage_proxy distributed class.
* @param f - a field pointer which is the implicit internal reducer.
* @param reducer - the reducer that is used in both outer and inner
* aggregations.
* @param initial_value - the initial value to use for both aggregations
* @return A future that resolves to the result of the aggregation.
*/
template<typename V, typename Reducer, typename F>
future<V> two_dimensional_map_reduce(distributed<service::storage_proxy>& d,
V F::*f, Reducer reducer, V initial_value) {
return two_dimensional_map_reduce(d, [f] (F& stats) {
return stats.*f;
}, reducer, initial_value);
}
/**
* A variant of sum_stats for the storage proxy case, where
* the get-stats function returns a per-scheduling-group stats
* object rather than a stats object with fields. It has a
* distinct name because C++ does not support partial
* specialization of function templates.
*
*/
template<typename V, typename F>
future<json::json_return_type> sum_stats_storage_proxy(distributed<proxy>& d, V F::*f) {
return two_dimensional_map_reduce(d, [f] (F& stats) { return stats.*f; }, std::plus<V>(), V(0)).then([] (V val) {
return make_ready_future<json::json_return_type>(val);
});
}
static future<utils::rate_moving_average> sum_timed_rate(distributed<proxy>& d, utils::timed_rate_moving_average service::storage_proxy_stats::stats::*f) {
return two_dimensional_map_reduce(d, [f] (service::storage_proxy_stats::stats& stats) {
return (stats.*f).rate();
}, std::plus<utils::rate_moving_average>(), utils::rate_moving_average());
}
static future<json::json_return_type> sum_timed_rate_as_obj(distributed<proxy>& d, utils::timed_rate_moving_average service::storage_proxy_stats::stats::*f) {
static future<json::json_return_type> sum_timed_rate_as_obj(distributed<proxy>& d, utils::timed_rate_moving_average proxy::stats::*f) {
return sum_timed_rate(d, f).then([](const utils::rate_moving_average& val) {
httpd::utils_json::rate_moving_average m;
m = val;
@@ -106,93 +46,29 @@ static future<json::json_return_type> sum_timed_rate_as_obj(distributed<proxy>&
});
}
httpd::utils_json::rate_moving_average_and_histogram get_empty_moving_average() {
return timer_to_json(utils::rate_moving_average_and_histogram());
}
static future<json::json_return_type> sum_timed_rate_as_long(distributed<proxy>& d, utils::timed_rate_moving_average service::storage_proxy_stats::stats::*f) {
static future<json::json_return_type> sum_timed_rate_as_long(distributed<proxy>& d, utils::timed_rate_moving_average proxy::stats::*f) {
return sum_timed_rate(d, f).then([](const utils::rate_moving_average& val) {
return make_ready_future<json::json_return_type>(val.count);
});
}
utils_json::estimated_histogram time_to_json_histogram(const utils::time_estimated_histogram& val) {
utils_json::estimated_histogram res;
for (size_t i = 0; i < val.size(); i++) {
res.buckets.push(val.get(i));
res.bucket_offsets.push(val.get_bucket_lower_limit(i));
}
return res;
}
static future<json::json_return_type> sum_estimated_histogram(http_context& ctx, utils::time_estimated_histogram service::storage_proxy_stats::stats::*f) {
return two_dimensional_map_reduce(ctx.sp, f, utils::time_estimated_histogram_merge,
utils::time_estimated_histogram()).then([](const utils::time_estimated_histogram& val) {
return make_ready_future<json::json_return_type>(time_to_json_histogram(val));
});
}
static future<json::json_return_type> sum_estimated_histogram(http_context& ctx, utils::estimated_histogram service::storage_proxy_stats::stats::*f) {
return two_dimensional_map_reduce(ctx.sp, f, utils::estimated_histogram_merge,
utils::estimated_histogram()).then([](const utils::estimated_histogram& val) {
static future<json::json_return_type> sum_estimated_histogram(http_context& ctx, utils::estimated_histogram proxy::stats::*f) {
return ctx.sp.map_reduce0([f](const proxy& p) {return p.get_stats().*f;}, utils::estimated_histogram(),
utils::estimated_histogram_merge).then([](const utils::estimated_histogram& val) {
utils_json::estimated_histogram res;
res = val;
return make_ready_future<json::json_return_type>(res);
});
}
static future<json::json_return_type> total_latency(http_context& ctx, utils::timed_rate_moving_average_and_histogram service::storage_proxy_stats::stats::*f) {
return two_dimensional_map_reduce(ctx.sp, [f] (service::storage_proxy_stats::stats& stats) {
return (stats.*f).hist.mean * (stats.*f).hist.count;
}, std::plus<double>(), 0.0).then([](double val) {
static future<json::json_return_type> total_latency(http_context& ctx, utils::timed_rate_moving_average_and_histogram proxy::stats::*f) {
return ctx.sp.map_reduce0([f](const proxy& p) {return (p.get_stats().*f).hist.mean * (p.get_stats().*f).hist.count;}, 0.0,
std::plus<double>()).then([](double val) {
int64_t res = val;
return make_ready_future<json::json_return_type>(res);
});
}
/**
* A variant of sum_histogram_stats for the storage
* proxy case, where the get-stats function returns
* a per-scheduling-group stats object rather than a
* stats object with fields. It has a distinct name
* because C++ does not support partial specialization
* of function templates.
*/
template<typename F>
future<json::json_return_type>
sum_histogram_stats_storage_proxy(distributed<proxy>& d,
utils::timed_rate_moving_average_and_histogram F::*f) {
return two_dimensional_map_reduce(d, [f] (service::storage_proxy_stats::stats& stats) {
return (stats.*f).hist;
}, std::plus<utils::ihistogram>(), utils::ihistogram()).
then([](const utils::ihistogram& val) {
return make_ready_future<json::json_return_type>(to_json(val));
});
}
/**
* A variant of sum_timer_stats for the storage
* proxy case, where the get-stats function returns
* a per-scheduling-group stats object rather than a
* stats object with fields. It has a distinct name
* because C++ does not support partial specialization
* of function templates.
*/
template<typename F>
future<json::json_return_type>
sum_timer_stats_storage_proxy(distributed<proxy>& d,
utils::timed_rate_moving_average_and_histogram F::*f) {
return two_dimensional_map_reduce(d, [f] (service::storage_proxy_stats::stats& stats) {
return (stats.*f).rate();
}, std::plus<utils::rate_moving_average_and_histogram>(),
utils::rate_moving_average_and_histogram()).then([](const utils::rate_moving_average_and_histogram& val) {
return make_ready_future<json::json_return_type>(timer_to_json(val));
});
}
void set_storage_proxy(http_context& ctx, routes& r) {
sp::get_total_hints.set(r, [](std::unique_ptr<request> req) {
//TBD
@@ -200,40 +76,33 @@ void set_storage_proxy(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(0);
});
sp::get_hinted_handoff_enabled.set(r, [&ctx](std::unique_ptr<request> req) {
const auto& filter = service::get_storage_proxy().local().get_hints_host_filter();
return make_ready_future<json::json_return_type>(!filter.is_disabled_for_all());
sp::get_hinted_handoff_enabled.set(r, [](std::unique_ptr<request> req) {
//TBD
// FIXME
// hinted handoff is not supported currently,
// so we should return false
return make_ready_future<json::json_return_type>(false);
});
sp::set_hinted_handoff_enabled.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
auto enable = req->get_query_param("enable");
auto filter = (enable == "true" || enable == "1")
? db::hints::host_filter(db::hints::host_filter::enabled_for_all_tag {})
: db::hints::host_filter(db::hints::host_filter::disabled_for_all_tag {});
return service::get_storage_proxy().invoke_on_all([filter = std::move(filter)] (service::storage_proxy& sp) {
return sp.change_hints_host_filter(filter);
}).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
return make_ready_future<json::json_return_type>(json_void());
});
sp::get_hinted_handoff_enabled_by_dc.set(r, [](std::unique_ptr<request> req) {
std::vector<sstring> res;
const auto& filter = service::get_storage_proxy().local().get_hints_host_filter();
const auto& dcs = filter.get_dcs();
res.reserve(dcs.size());
std::copy(dcs.begin(), dcs.end(), std::back_inserter(res));
//TBD
unimplemented();
std::vector<sp::mapper_list> res;
return make_ready_future<json::json_return_type>(res);
});
sp::set_hinted_handoff_enabled_by_dc_list.set(r, [](std::unique_ptr<request> req) {
auto dcs = req->get_query_param("dcs");
auto filter = db::hints::host_filter::parse_from_dc_list(std::move(dcs));
return service::get_storage_proxy().invoke_on_all([filter = std::move(filter)] (service::storage_proxy& sp) {
return sp.change_hints_host_filter(filter);
}).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
//TBD
unimplemented();
auto enable = req->get_query_param("dcs");
return make_ready_future<json::json_return_type>(json_void());
});
sp::get_max_hint_window.set(r, [](std::unique_ptr<request> req) {
@@ -352,15 +221,15 @@ void set_storage_proxy(http_context& ctx, routes& r) {
});
sp::get_read_repair_attempted.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_stats_storage_proxy(ctx.sp, &service::storage_proxy_stats::stats::read_repair_attempts);
return sum_stats(ctx.sp, &proxy::stats::read_repair_attempts);
});
sp::get_read_repair_repaired_blocking.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_stats_storage_proxy(ctx.sp, &service::storage_proxy_stats::stats::read_repair_repaired_blocking);
return sum_stats(ctx.sp, &proxy::stats::read_repair_repaired_blocking);
});
sp::get_read_repair_repaired_background.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_stats_storage_proxy(ctx.sp, &service::storage_proxy_stats::stats::read_repair_repaired_background);
return sum_stats(ctx.sp, &proxy::stats::read_repair_repaired_background);
});
sp::get_schema_versions.set(r, [](std::unique_ptr<request> req) {
@@ -376,154 +245,163 @@ void set_storage_proxy(http_context& ctx, routes& r) {
});
});
sp::get_cas_read_timeouts.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timed_rate_as_long(ctx.sp, &proxy::stats::cas_read_timeouts);
sp::get_cas_read_timeouts.set(r, [](std::unique_ptr<request> req) {
//TBD
// FIXME
// cas is not supported yet, so just return 0
return make_ready_future<json::json_return_type>(0);
});
sp::get_cas_read_unavailables.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timed_rate_as_long(ctx.sp, &proxy::stats::cas_read_unavailables);
sp::get_cas_read_unavailables.set(r, [](std::unique_ptr<request> req) {
//TBD
// FIXME
// cas is not supported yet, so just return 0
return make_ready_future<json::json_return_type>(0);
});
sp::get_cas_write_timeouts.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timed_rate_as_long(ctx.sp, &proxy::stats::cas_write_timeouts);
sp::get_cas_write_timeouts.set(r, [](std::unique_ptr<request> req) {
//TBD
// FIXME
// cas is not supported yet, so just return 0
return make_ready_future<json::json_return_type>(0);
});
sp::get_cas_write_unavailables.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timed_rate_as_long(ctx.sp, &proxy::stats::cas_write_unavailables);
sp::get_cas_write_unavailables.set(r, [](std::unique_ptr<request> req) {
//TBD
// FIXME
// cas is not supported yet, so just return 0
return make_ready_future<json::json_return_type>(0);
});
sp::get_cas_write_metrics_unfinished_commit.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_stats(ctx.sp, &proxy::stats::cas_write_unfinished_commit);
sp::get_cas_write_metrics_unfinished_commit.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
});
sp::get_cas_write_metrics_contention.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_estimated_histogram(ctx, &proxy::stats::cas_write_contention);
sp::get_cas_write_metrics_contention.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
});
sp::get_cas_write_metrics_condition_not_met.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_stats(ctx.sp, &proxy::stats::cas_write_condition_not_met);
sp::get_cas_write_metrics_condition_not_met.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
});
sp::get_cas_write_metrics_failed_read_round_optimization.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_stats(ctx.sp, &proxy::stats::cas_failed_read_round_optimization);
sp::get_cas_read_metrics_unfinished_commit.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
});
sp::get_cas_read_metrics_unfinished_commit.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_stats(ctx.sp, &proxy::stats::cas_read_unfinished_commit);
sp::get_cas_read_metrics_contention.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
});
sp::get_cas_read_metrics_contention.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_estimated_histogram(ctx, &proxy::stats::cas_read_contention);
sp::get_cas_read_metrics_condition_not_met.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
});
sp::get_read_metrics_timeouts.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timed_rate_as_long(ctx.sp, &service::storage_proxy_stats::stats::read_timeouts);
return sum_timed_rate_as_long(ctx.sp, &proxy::stats::read_timeouts);
});
sp::get_read_metrics_unavailables.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timed_rate_as_long(ctx.sp, &service::storage_proxy_stats::stats::read_unavailables);
return sum_timed_rate_as_long(ctx.sp, &proxy::stats::read_unavailables);
});
sp::get_range_metrics_timeouts.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timed_rate_as_long(ctx.sp, &service::storage_proxy_stats::stats::range_slice_timeouts);
return sum_timed_rate_as_long(ctx.sp, &proxy::stats::range_slice_timeouts);
});
sp::get_range_metrics_unavailables.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timed_rate_as_long(ctx.sp, &service::storage_proxy_stats::stats::range_slice_unavailables);
return sum_timed_rate_as_long(ctx.sp, &proxy::stats::range_slice_unavailables);
});
sp::get_write_metrics_timeouts.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timed_rate_as_long(ctx.sp, &service::storage_proxy_stats::stats::write_timeouts);
return sum_timed_rate_as_long(ctx.sp, &proxy::stats::write_timeouts);
});
sp::get_write_metrics_unavailables.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timed_rate_as_long(ctx.sp, &service::storage_proxy_stats::stats::write_unavailables);
return sum_timed_rate_as_long(ctx.sp, &proxy::stats::write_unavailables);
});
sp::get_read_metrics_timeouts_rates.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timed_rate_as_obj(ctx.sp, &service::storage_proxy_stats::stats::read_timeouts);
return sum_timed_rate_as_obj(ctx.sp, &proxy::stats::read_timeouts);
});
sp::get_read_metrics_unavailables_rates.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timed_rate_as_obj(ctx.sp, &service::storage_proxy_stats::stats::read_unavailables);
return sum_timed_rate_as_obj(ctx.sp, &proxy::stats::read_unavailables);
});
sp::get_range_metrics_timeouts_rates.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timed_rate_as_obj(ctx.sp, &service::storage_proxy_stats::stats::range_slice_timeouts);
return sum_timed_rate_as_obj(ctx.sp, &proxy::stats::range_slice_timeouts);
});
sp::get_range_metrics_unavailables_rates.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timed_rate_as_obj(ctx.sp, &service::storage_proxy_stats::stats::range_slice_unavailables);
return sum_timed_rate_as_obj(ctx.sp, &proxy::stats::range_slice_unavailables);
});
sp::get_write_metrics_timeouts_rates.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timed_rate_as_obj(ctx.sp, &service::storage_proxy_stats::stats::write_timeouts);
return sum_timed_rate_as_obj(ctx.sp, &proxy::stats::write_timeouts);
});
sp::get_write_metrics_unavailables_rates.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timed_rate_as_obj(ctx.sp, &service::storage_proxy_stats::stats::write_unavailables);
return sum_timed_rate_as_obj(ctx.sp, &proxy::stats::write_unavailables);
});
sp::get_range_metrics_latency_histogram_depricated.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_histogram_stats_storage_proxy(ctx.sp, &service::storage_proxy_stats::stats::range);
return sum_histogram_stats(ctx.sp, &proxy::stats::range);
});
sp::get_write_metrics_latency_histogram_depricated.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_histogram_stats_storage_proxy(ctx.sp, &service::storage_proxy_stats::stats::write);
return sum_histogram_stats(ctx.sp, &proxy::stats::write);
});
sp::get_read_metrics_latency_histogram_depricated.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_histogram_stats_storage_proxy(ctx.sp, &service::storage_proxy_stats::stats::read);
return sum_histogram_stats(ctx.sp, &proxy::stats::read);
});
sp::get_range_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timer_stats_storage_proxy(ctx.sp, &service::storage_proxy_stats::stats::range);
return sum_timer_stats(ctx.sp, &proxy::stats::range);
});
sp::get_write_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timer_stats_storage_proxy(ctx.sp, &service::storage_proxy_stats::stats::write);
});
sp::get_cas_write_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timer_stats(ctx.sp, &proxy::stats::cas_write);
});
sp::get_cas_read_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timer_stats(ctx.sp, &proxy::stats::cas_read);
});
sp::get_view_write_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
//TBD
// FIXME
// No View metrics are available, so just return empty moving average
return make_ready_future<json::json_return_type>(get_empty_moving_average());
return sum_timer_stats(ctx.sp, &proxy::stats::write);
});
sp::get_read_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timer_stats_storage_proxy(ctx.sp, &service::storage_proxy_stats::stats::read);
return sum_timer_stats(ctx.sp, &proxy::stats::read);
});
sp::get_read_estimated_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_estimated_histogram(ctx, &service::storage_proxy_stats::stats::estimated_read);
return sum_estimated_histogram(ctx, &proxy::stats::estimated_read);
});
sp::get_read_latency.set(r, [&ctx](std::unique_ptr<request> req) {
return total_latency(ctx, &service::storage_proxy_stats::stats::read);
return total_latency(ctx, &proxy::stats::read);
});
sp::get_write_estimated_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_estimated_histogram(ctx, &service::storage_proxy_stats::stats::estimated_write);
return sum_estimated_histogram(ctx, &proxy::stats::estimated_write);
});
sp::get_write_latency.set(r, [&ctx](std::unique_ptr<request> req) {
return total_latency(ctx, &service::storage_proxy_stats::stats::write);
return total_latency(ctx, &proxy::stats::write);
});
sp::get_range_estimated_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timer_stats_storage_proxy(ctx.sp, &service::storage_proxy_stats::stats::range);
return sum_timer_stats(ctx.sp, &proxy::stats::read);
});
sp::get_range_latency.set(r, [&ctx](std::unique_ptr<request> req) {
return total_latency(ctx, &service::storage_proxy_stats::stats::range);
return total_latency(ctx, &proxy::stats::range);
});
}

@@ -22,37 +22,21 @@
#include "storage_service.hh"
#include "api/api-doc/storage_service.json.hh"
#include "db/config.hh"
#include "db/schema_tables.hh"
#include <optional>
#include <time.h>
#include <boost/range/adaptor/map.hpp>
#include <boost/range/adaptor/filtered.hpp>
#include "service/storage_service.hh"
#include "service/load_meter.hh"
#include "db/commitlog/commitlog.hh"
#include "gms/gossiper.hh"
#include "db/system_keyspace.hh"
#include "seastar/http/exception.hh"
#include <service/storage_service.hh>
#include <db/commitlog/commitlog.hh>
#include <gms/gossiper.hh>
#include <db/system_keyspace.hh>
#include "http/exception.hh"
#include "repair/repair.hh"
#include "locator/snitch_base.hh"
#include "column_family.hh"
#include "log.hh"
#include "release.hh"
#include "sstables/compaction_manager.hh"
#include "sstables/sstables.hh"
#include "database.hh"
#include "db/extensions.hh"
#include "db/snapshot-ctl.hh"
#include "transport/controller.hh"
#include "thrift/controller.hh"
#include "locator/token_metadata.hh"
namespace api {
const locator::token_metadata& http_context::get_token_metadata() {
return *shared_token_metadata.local().get();
}
namespace ss = httpd::storage_service_json;
using namespace json;
@@ -63,195 +47,27 @@ static sstring validate_keyspace(http_context& ctx, const parameters& param) {
throw bad_param_exception("Keyspace " + param["keyspace"] + " Does not exist");
}
static ss::token_range token_range_endpoints_to_json(const dht::token_range_endpoints& d) {
ss::token_range r;
r.start_token = d._start_token;
r.end_token = d._end_token;
r.endpoints = d._endpoints;
r.rpc_endpoints = d._rpc_endpoints;
for (auto det : d._endpoint_details) {
ss::endpoint_detail ed;
ed.host = det._host;
ed.datacenter = det._datacenter;
if (det._rack != "") {
ed.rack = det._rack;
static std::vector<ss::token_range> describe_ring(const sstring& keyspace) {
std::vector<ss::token_range> res;
for (auto d : service::get_local_storage_service().describe_ring(keyspace)) {
ss::token_range r;
r.start_token = d._start_token;
r.end_token = d._end_token;
r.endpoints = d._endpoints;
r.rpc_endpoints = d._rpc_endpoints;
for (auto det : d._endpoint_details) {
ss::endpoint_detail ed;
ed.host = det._host;
ed.datacenter = det._datacenter;
if (det._rack != "") {
ed.rack = det._rack;
}
r.endpoint_details.push(ed);
}
r.endpoint_details.push(ed);
res.push_back(r);
}
return r;
}
using ks_cf_func = std::function<future<json::json_return_type>(http_context&, std::unique_ptr<request>, sstring, std::vector<sstring>)>;
static auto wrap_ks_cf(http_context &ctx, ks_cf_func f) {
return [&ctx, f = std::move(f)](std::unique_ptr<request> req) {
auto keyspace = validate_keyspace(ctx, req->param);
auto column_families = split_cf(req->get_query_param("cf"));
if (column_families.empty()) {
column_families = map_keys(ctx.db.local().find_keyspace(keyspace).metadata().get()->cf_meta_data());
}
return f(ctx, std::move(req), std::move(keyspace), std::move(column_families));
};
}
future<json::json_return_type> set_tables_autocompaction(http_context& ctx, const sstring &keyspace, std::vector<sstring> tables, bool enabled) {
if (tables.empty()) {
tables = map_keys(ctx.db.local().find_keyspace(keyspace).metadata().get()->cf_meta_data());
}
return service::get_local_storage_service().set_tables_autocompaction(keyspace, tables, enabled).then([]{
return make_ready_future<json::json_return_type>(json_void());
});
}
void set_transport_controller(http_context& ctx, routes& r, cql_transport::controller& ctl) {
ss::start_native_transport.set(r, [&ctl](std::unique_ptr<request> req) {
return ctl.start_server().then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
ss::stop_native_transport.set(r, [&ctl](std::unique_ptr<request> req) {
return ctl.stop_server().then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
ss::is_native_transport_running.set(r, [&ctl] (std::unique_ptr<request> req) {
return ctl.is_server_running().then([] (bool running) {
return make_ready_future<json::json_return_type>(running);
});
});
}
void unset_transport_controller(http_context& ctx, routes& r) {
ss::start_native_transport.unset(r);
ss::stop_native_transport.unset(r);
ss::is_native_transport_running.unset(r);
}
void set_rpc_controller(http_context& ctx, routes& r, thrift_controller& ctl) {
ss::stop_rpc_server.set(r, [&ctl](std::unique_ptr<request> req) {
return ctl.stop_server().then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
ss::start_rpc_server.set(r, [&ctl](std::unique_ptr<request> req) {
return ctl.start_server().then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
ss::is_rpc_server_running.set(r, [&ctl] (std::unique_ptr<request> req) {
return ctl.is_server_running().then([] (bool running) {
return make_ready_future<json::json_return_type>(running);
});
});
}
void unset_rpc_controller(http_context& ctx, routes& r) {
ss::stop_rpc_server.unset(r);
ss::start_rpc_server.unset(r);
ss::is_rpc_server_running.unset(r);
}
void set_repair(http_context& ctx, routes& r, sharded<netw::messaging_service>& ms) {
ss::repair_async.set(r, [&ctx, &ms](std::unique_ptr<request> req) {
static std::vector<sstring> options = {"primaryRange", "parallelism", "incremental",
"jobThreads", "ranges", "columnFamilies", "dataCenters", "hosts", "trace",
"startToken", "endToken" };
std::unordered_map<sstring, sstring> options_map;
for (auto o : options) {
auto s = req->get_query_param(o);
if (s != "") {
options_map[o] = s;
}
}
// The repair process is asynchronous: repair_start only starts it and
// returns immediately, not waiting for the repair to finish. The user
// then has other mechanisms to track the ongoing repair's progress,
// or stop it.
return repair_start(ctx.db, ms, validate_keyspace(ctx, req->param),
options_map).then([] (int i) {
return make_ready_future<json::json_return_type>(i);
});
});
ss::get_active_repair_async.set(r, [&ctx](std::unique_ptr<request> req) {
return get_active_repairs(ctx.db).then([] (std::vector<int> res){
return make_ready_future<json::json_return_type>(res);
});
});
ss::repair_async_status.set(r, [&ctx](std::unique_ptr<request> req) {
return repair_get_status(ctx.db, boost::lexical_cast<int>( req->get_query_param("id")))
.then_wrapped([] (future<repair_status>&& fut) {
ss::ns_repair_async_status::return_type_wrapper res;
try {
res = fut.get0();
} catch(std::runtime_error& e) {
throw httpd::bad_param_exception(e.what());
}
return make_ready_future<json::json_return_type>(json::json_return_type(res));
});
});
ss::repair_await_completion.set(r, [&ctx](std::unique_ptr<request> req) {
int id;
using clock = std::chrono::steady_clock;
clock::time_point expire;
try {
id = boost::lexical_cast<int>(req->get_query_param("id"));
// If timeout is not provided, it means no timeout.
sstring s = req->get_query_param("timeout");
int64_t timeout = s.empty() ? int64_t(-1) : boost::lexical_cast<int64_t>(s);
if (timeout < 0 && timeout != -1) {
return make_exception_future<json::json_return_type>(
httpd::bad_param_exception("timeout can only be -1 (means no timeout) or non negative integer"));
}
if (timeout < 0) {
expire = clock::time_point::max();
} else {
expire = clock::now() + std::chrono::seconds(timeout);
}
} catch (std::exception& e) {
return make_exception_future<json::json_return_type>(httpd::bad_param_exception(e.what()));
}
return repair_await_completion(ctx.db, id, expire)
.then_wrapped([] (future<repair_status>&& fut) {
ss::ns_repair_async_status::return_type_wrapper res;
try {
res = fut.get0();
} catch (std::exception& e) {
return make_exception_future<json::json_return_type>(httpd::server_error_exception(e.what()));
}
return make_ready_future<json::json_return_type>(json::json_return_type(res));
});
});
ss::force_terminate_all_repair_sessions.set(r, [](std::unique_ptr<request> req) {
return repair_abort_all(service::get_local_storage_service().db()).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
ss::force_terminate_all_repair_sessions_new.set(r, [](std::unique_ptr<request> req) {
return repair_abort_all(service::get_local_storage_service().db()).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
}
void unset_repair(http_context& ctx, routes& r) {
ss::repair_async.unset(r);
ss::get_active_repair_async.unset(r);
ss::repair_async_status.unset(r);
ss::repair_await_completion.unset(r);
ss::force_terminate_all_repair_sessions.unset(r);
ss::force_terminate_all_repair_sessions_new.unset(r);
return res;
}
void set_storage_service(http_context& ctx, routes& r) {
@@ -261,43 +77,42 @@ void set_storage_service(http_context& ctx, routes& r) {
});
});
ss::get_tokens.set(r, [&ctx] (std::unique_ptr<request> req) {
return make_ready_future<json::json_return_type>(stream_range_as_array(ctx.get_token_metadata().sorted_tokens(), [](const dht::token& i) {
return boost::lexical_cast<std::string>(i);
}));
ss::get_tokens.set(r, [] (const_req req) {
auto tokens = service::get_local_storage_service().get_token_metadata().sorted_tokens();
return container_to_vec(tokens);
});
ss::get_node_tokens.set(r, [&ctx] (std::unique_ptr<request> req) {
gms::inet_address addr(req->param["endpoint"]);
return make_ready_future<json::json_return_type>(stream_range_as_array(ctx.get_token_metadata().get_tokens(addr), [](const dht::token& i) {
return boost::lexical_cast<std::string>(i);
}));
ss::get_node_tokens.set(r, [] (const_req req) {
gms::inet_address addr(req.param["endpoint"]);
auto tokens = service::get_local_storage_service().get_token_metadata().get_tokens(addr);
return container_to_vec(tokens);
});
ss::get_commitlog.set(r, [&ctx](const_req req) {
return ctx.db.local().commitlog()->active_config().commit_log_location;
});
ss::get_token_endpoint.set(r, [] (std::unique_ptr<request> req) {
return make_ready_future<json::json_return_type>(stream_range_as_array(service::get_local_storage_service().get_token_to_endpoint_map(), [](const auto& i) {
storage_service_json::mapper val;
val.key = boost::lexical_cast<std::string>(i.first);
val.value = boost::lexical_cast<std::string>(i.second);
return val;
}));
ss::get_token_endpoint.set(r, [] (const_req req) {
auto token_to_ep = service::get_local_storage_service().get_token_to_endpoint_map();
std::vector<storage_service_json::mapper> res;
return map_to_key_value(token_to_ep, res);
});
ss::get_leaving_nodes.set(r, [&ctx](const_req req) {
return container_to_vec(ctx.get_token_metadata().get_leaving_endpoints());
ss::get_leaving_nodes.set(r, [](const_req req) {
return container_to_vec(service::get_local_storage_service().get_token_metadata().get_leaving_endpoints());
});
ss::get_moving_nodes.set(r, [](const_req req) {
auto points = service::get_local_storage_service().get_token_metadata().get_moving_endpoints();
std::unordered_set<sstring> addr;
for (auto i: points) {
addr.insert(boost::lexical_cast<std::string>(i.second));
}
return container_to_vec(addr);
});
ss::get_joining_nodes.set(r, [&ctx](const_req req) {
auto points = ctx.get_token_metadata().get_bootstrap_tokens();
ss::get_joining_nodes.set(r, [](const_req req) {
auto points = service::get_local_storage_service().get_token_metadata().get_bootstrap_tokens();
std::unordered_set<sstring> addr;
for (auto i: points) {
addr.insert(boost::lexical_cast<std::string>(i.second));
@@ -325,26 +140,11 @@ void set_storage_service(http_context& ctx, routes& r) {
});
ss::get_range_to_endpoint_map.set(r, [&ctx](std::unique_ptr<request> req) {
//TBD
unimplemented();
auto keyspace = validate_keyspace(ctx, req->param);
std::vector<ss::maplist_mapper> res;
return make_ready_future<json::json_return_type>(stream_range_as_array(service::get_local_storage_service().get_range_to_address_map(keyspace),
[](const std::pair<dht::token_range, std::vector<gms::inet_address>>& entry){
ss::maplist_mapper m;
if (entry.first.start()) {
m.key.push(entry.first.start().value().value().to_sstring());
} else {
m.key.push("");
}
if (entry.first.end()) {
m.key.push(entry.first.end().value().value().to_sstring());
} else {
m.key.push("");
}
for (const gms::inet_address& address : entry.second) {
m.value.push(address.to_sstring());
}
return m;
}));
return make_ready_future<json::json_return_type>(res);
});
ss::get_pending_range_to_endpoint_map.set(r, [&ctx](std::unique_ptr<request> req) {
@@ -355,26 +155,27 @@ void set_storage_service(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(res);
});
ss::describe_any_ring.set(r, [&ctx](std::unique_ptr<request> req) {
return make_ready_future<json::json_return_type>(stream_range_as_array(service::get_local_storage_service().describe_ring(""), token_range_endpoints_to_json));
ss::describe_any_ring.set(r, [&ctx](const_req req) {
return describe_ring("");
});
ss::describe_ring.set(r, [&ctx](std::unique_ptr<request> req) {
auto keyspace = validate_keyspace(ctx, req->param);
return make_ready_future<json::json_return_type>(stream_range_as_array(service::get_local_storage_service().describe_ring(keyspace), token_range_endpoints_to_json));
ss::describe_ring.set(r, [&ctx](const_req req) {
auto keyspace = validate_keyspace(ctx, req.param);
return describe_ring(keyspace);
});
ss::get_host_id_map.set(r, [&ctx](const_req req) {
ss::get_host_id_map.set(r, [](const_req req) {
std::vector<ss::mapper> res;
return map_to_key_value(ctx.get_token_metadata().get_endpoint_to_host_id_map_for_reading(), res);
return map_to_key_value(service::get_local_storage_service().
get_token_metadata().get_endpoint_to_host_id_map_for_reading(), res);
});
ss::get_load.set(r, [&ctx](std::unique_ptr<request> req) {
return get_cf_stats(ctx, &column_family_stats::live_disk_space_used);
return get_cf_stats(ctx, &column_family::stats::live_disk_space_used);
});
ss::get_load_map.set(r, [&ctx] (std::unique_ptr<request> req) {
return ctx.lmeter.get_load_map().then([] (auto&& load_map) {
ss::get_load_map.set(r, [] (std::unique_ptr<request> req) {
return service::get_local_storage_service().get_load_map().then([] (auto&& load_map) {
std::vector<ss::map_string_double> res;
for (auto i : load_map) {
ss::map_string_double val;
@@ -399,12 +200,64 @@ void set_storage_service(http_context& ctx, routes& r) {
req.get_query_param("key")));
});
ss::cdc_streams_check_and_repair.set(r, [&ctx] (std::unique_ptr<request> req) {
return service::get_local_storage_service().check_and_repair_cdc_streams().then([] {
ss::get_snapshot_details.set(r, [](std::unique_ptr<request> req) {
return service::get_local_storage_service().get_snapshot_details().then([] (auto result) {
std::vector<ss::snapshots> res;
for (auto& map: result) {
ss::snapshots all_snapshots;
all_snapshots.key = map.first;
std::vector<ss::snapshot> snapshot;
for (auto& cf: map.second) {
ss::snapshot s;
s.ks = cf.ks;
s.cf = cf.cf;
s.live = cf.live;
s.total = cf.total;
snapshot.push_back(std::move(s));
}
all_snapshots.value = std::move(snapshot);
res.push_back(std::move(all_snapshots));
}
return make_ready_future<json::json_return_type>(std::move(res));
});
});
ss::take_snapshot.set(r, [](std::unique_ptr<request> req) {
auto tag = req->get_query_param("tag");
auto column_family = req->get_query_param("cf");
std::vector<sstring> keynames = split(req->get_query_param("kn"), ",");
auto resp = make_ready_future<>();
if (column_family.empty()) {
resp = service::get_local_storage_service().take_snapshot(tag, keynames);
} else {
if (keynames.size() > 1) {
throw httpd::bad_param_exception("Only one keyspace allowed when specifying a column family");
}
resp = service::get_local_storage_service().take_column_family_snapshot(keynames[0], column_family, tag);
}
return resp.then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
ss::del_snapshot.set(r, [](std::unique_ptr<request> req) {
auto tag = req->get_query_param("tag");
std::vector<sstring> keynames = split(req->get_query_param("kn"), ",");
return service::get_local_storage_service().clear_snapshot(tag, keynames).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
ss::true_snapshots_size.set(r, [](std::unique_ptr<request> req) {
return service::get_local_storage_service().true_snapshots_size().then([] (int64_t size) {
return make_ready_future<json::json_return_type>(size);
});
});
ss::force_keyspace_compaction.set(r, [&ctx](std::unique_ptr<request> req) {
auto keyspace = validate_keyspace(ctx, req->param);
auto column_families = split_cf(req->get_query_param("cf"));
@@ -430,40 +283,38 @@ void set_storage_service(http_context& ctx, routes& r) {
if (column_families.empty()) {
column_families = map_keys(ctx.db.local().find_keyspace(keyspace).metadata().get()->cf_meta_data());
}
return service::get_local_storage_service().is_cleanup_allowed(keyspace).then([&ctx, keyspace,
column_families = std::move(column_families)] (bool is_cleanup_allowed) mutable {
if (!is_cleanup_allowed) {
return make_exception_future<json::json_return_type>(
std::runtime_error("Can not perform cleanup operation when topology changes"));
return ctx.db.invoke_on_all([keyspace, column_families] (database& db) {
std::vector<column_family*> column_families_vec;
auto& cm = db.get_compaction_manager();
for (auto cf : column_families) {
column_families_vec.push_back(&db.find_column_family(keyspace, cf));
}
return ctx.db.invoke_on_all([keyspace, column_families] (database& db) {
std::vector<column_family*> column_families_vec;
auto& cm = db.get_compaction_manager();
for (auto cf : column_families) {
column_families_vec.push_back(&db.find_column_family(keyspace, cf));
}
return parallel_for_each(column_families_vec, [&cm, &db] (column_family* cf) {
return cm.perform_cleanup(db, cf);
});
}).then([]{
return make_ready_future<json::json_return_type>(0);
});
});
});
ss::upgrade_sstables.set(r, wrap_ks_cf(ctx, [] (http_context& ctx, std::unique_ptr<request> req, sstring keyspace, std::vector<sstring> column_families) {
bool exclude_current_version = req_param<bool>(*req, "exclude_current_version", false);
return ctx.db.invoke_on_all([=] (database& db) {
return do_for_each(column_families, [=, &db](sstring cfname) {
auto& cm = db.get_compaction_manager();
auto& cf = db.find_column_family(keyspace, cfname);
return cm.perform_sstable_upgrade(db, &cf, exclude_current_version);
return parallel_for_each(column_families_vec, [&cm] (column_family* cf) {
return cm.perform_cleanup(cf);
});
}).then([]{
return make_ready_future<json::json_return_type>(0);
});
}));
});
ss::scrub.set(r, [&ctx](std::unique_ptr<request> req) {
//TBD
unimplemented();
auto keyspace = validate_keyspace(ctx, req->param);
auto column_family = req->get_query_param("cf");
auto disable_snapshot = req->get_query_param("disable_snapshot");
auto skip_corrupted = req->get_query_param("skip_corrupted");
return make_ready_future<json::json_return_type>(json_void());
});
ss::upgrade_sstables.set(r, [&ctx](std::unique_ptr<request> req) {
//TBD
unimplemented();
auto keyspace = validate_keyspace(ctx, req->param);
auto column_family = req->get_query_param("cf");
auto exclude_current_version = req->get_query_param("exclude_current_version");
return make_ready_future<json::json_return_type>(json_void());
});
ss::force_keyspace_flush.set(r, [&ctx](std::unique_ptr<request> req) {
auto keyspace = validate_keyspace(ctx, req->param);
@@ -481,6 +332,47 @@ void set_storage_service(http_context& ctx, routes& r) {
});
ss::repair_async.set(r, [&ctx](std::unique_ptr<request> req) {
static std::vector<sstring> options = {"primaryRange", "parallelism", "incremental",
"jobThreads", "ranges", "columnFamilies", "dataCenters", "hosts", "trace",
"startToken", "endToken" };
std::unordered_map<sstring, sstring> options_map;
for (auto o : options) {
auto s = req->get_query_param(o);
if (s != "") {
options_map[o] = s;
}
}
// The repair process is asynchronous: repair_start only starts it and
// returns immediately, not waiting for the repair to finish. The user
// then has other mechanisms to track the ongoing repair's progress,
// or stop it.
return repair_start(ctx.db, validate_keyspace(ctx, req->param),
options_map).then([] (int i) {
return make_ready_future<json::json_return_type>(i);
});
});
ss::repair_async_status.set(r, [&ctx](std::unique_ptr<request> req) {
return repair_get_status(ctx.db, boost::lexical_cast<int>( req->get_query_param("id")))
.then_wrapped([] (future<repair_status>&& fut) {
ss::ns_repair_async_status::return_type_wrapper res;
try {
res = fut.get0();
} catch(std::runtime_error& e) {
return make_ready_future<json::json_return_type>(json_exception(httpd::bad_param_exception(e.what())));
}
return make_ready_future<json::json_return_type>(json::json_return_type(res));
});
});
ss::force_terminate_all_repair_sessions.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(json_void());
});
ss::decommission.set(r, [](std::unique_ptr<request> req) {
return service::get_local_storage_service().decommission().then([] {
return make_ready_future<json::json_return_type>(json_void());
@@ -548,7 +440,7 @@ void set_storage_service(http_context& ctx, routes& r) {
return service::get_storage_service().map_reduce(adder<service::storage_service::drain_progress>(), [] (auto& ss) {
return ss.get_drain_progress();
}).then([] (auto&& progress) {
auto progress_str = format("Drained {}/{} ColumnFamilies", progress.remaining_cfs, progress.total_cfs);
auto progress_str = sprint("Drained %s/%s ColumnFamilies", progress.remaining_cfs, progress.total_cfs);
return make_ready_future<json::json_return_type>(std::move(progress_str));
});
});
@@ -616,8 +508,46 @@ void set_storage_service(http_context& ctx, routes& r) {
});
});
ss::stop_rpc_server.set(r, [](std::unique_ptr<request> req) {
return service::get_local_storage_service().stop_rpc_server().then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
ss::start_rpc_server.set(r, [](std::unique_ptr<request> req) {
return service::get_local_storage_service().start_rpc_server().then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
ss::is_rpc_server_running.set(r, [] (std::unique_ptr<request> req) {
return service::get_local_storage_service().is_rpc_server_running().then([] (bool running) {
return make_ready_future<json::json_return_type>(running);
});
});
ss::start_native_transport.set(r, [](std::unique_ptr<request> req) {
return service::get_local_storage_service().start_native_transport().then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
ss::stop_native_transport.set(r, [](std::unique_ptr<request> req) {
return service::get_local_storage_service().stop_native_transport().then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
ss::is_native_transport_running.set(r, [] (std::unique_ptr<request> req) {
return service::get_local_storage_service().is_native_transport_running().then([] (bool running) {
return make_ready_future<json::json_return_type>(running);
});
});
ss::join_ring.set(r, [](std::unique_ptr<request> req) {
return make_ready_future<json::json_return_type>(json_void());
return service::get_local_storage_service().join_ring().then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
ss::is_joined.set(r, [] (std::unique_ptr<request> req) {
@@ -721,11 +651,7 @@ void set_storage_service(http_context& ctx, routes& r) {
auto coordinator = std::hash<sstring>()(cf) % smp::count;
return service::get_storage_service().invoke_on(coordinator, [ks = std::move(ks), cf = std::move(cf)] (service::storage_service& s) {
return s.load_new_sstables(ks, cf);
}).then_wrapped([] (auto&& f) {
if (f.failed()) {
auto msg = fmt::format("Failed to load new sstables: {}", f.get_exception());
return make_exception_future<json::json_return_type>(httpd::server_error_exception(msg));
}
}).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
@@ -738,17 +664,14 @@ void set_storage_service(http_context& ctx, routes& r) {
});
ss::reset_local_schema.set(r, [](std::unique_ptr<request> req) {
// FIXME: We should truncate schema tables if more than one node in the cluster.
auto& sp = service::get_storage_proxy();
auto& fs = service::get_local_storage_service().features();
return db::schema_tables::recalculate_schema_version(sp, fs).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(json_void());
});
ss::set_trace_probability.set(r, [](std::unique_ptr<request> req) {
auto probability = req->get_query_param("probability");
return futurize_invoke([probability] {
return futurize<json::json_return_type>::apply([probability] {
double real_prob = std::stod(probability.c_str());
return tracing::tracing::tracing_instance().invoke_on_all([real_prob] (auto& local_tracing) {
local_tracing.set_trace_probability(real_prob);
@@ -762,7 +685,7 @@ void set_storage_service(http_context& ctx, routes& r) {
} catch (std::out_of_range& e) {
throw httpd::bad_param_exception(e.what());
} catch (std::invalid_argument&){
throw httpd::bad_param_exception(format("Bad format in a probability value: \"{}\"", probability.c_str()));
throw httpd::bad_param_exception(sprint("Bad format in a probability value: \"%s\"", probability.c_str()));
}
});
});
@@ -798,22 +721,24 @@ void set_storage_service(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(json_void());
});
} catch (...) {
throw httpd::bad_param_exception(format("Bad format value: "));
throw httpd::bad_param_exception(sprint("Bad format value: "));
}
});
ss::enable_auto_compaction.set(r, [&ctx](std::unique_ptr<request> req) {
//TBD
unimplemented();
auto keyspace = validate_keyspace(ctx, req->param);
auto tables = split_cf(req->get_query_param("cf"));
return set_tables_autocompaction(ctx, keyspace, tables, true);
auto column_family = req->get_query_param("cf");
return make_ready_future<json::json_return_type>(json_void());
});
ss::disable_auto_compaction.set(r, [&ctx](std::unique_ptr<request> req) {
//TBD
unimplemented();
auto keyspace = validate_keyspace(ctx, req->param);
auto tables = split_cf(req->get_query_param("cf"));
return set_tables_autocompaction(ctx, keyspace, tables, false);
auto column_family = req->get_query_param("cf");
return make_ready_future<json::json_return_type>(json_void());
});
ss::deliver_hints.set(r, [](std::unique_ptr<request> req) {
@@ -877,8 +802,10 @@ void set_storage_service(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(json_void());
});
ss::get_metrics_load.set(r, [&ctx](std::unique_ptr<request> req) {
return get_cf_stats(ctx, &column_family_stats::live_disk_space_used);
ss::get_metrics_load.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
});
ss::get_exceptions.set(r, [](const_req req) {
@@ -911,252 +838,6 @@ void set_storage_service(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(map_to_key_value(ownership, res));
});
});
ss::view_build_statuses.set(r, [&ctx] (std::unique_ptr<request> req) {
auto keyspace = validate_keyspace(ctx, req->param);
auto view = req->param["view"];
return service::get_local_storage_service().view_build_statuses(std::move(keyspace), std::move(view)).then([] (std::unordered_map<sstring, sstring> status) {
std::vector<storage_service_json::mapper> res;
return make_ready_future<json::json_return_type>(map_to_key_value(std::move(status), res));
});
});
ss::sstable_info.set(r, [&ctx] (std::unique_ptr<request> req) {
auto ks = api::req_param<sstring>(*req, "keyspace", {}).value;
auto cf = api::req_param<sstring>(*req, "cf", {}).value;
// The size of this vector is bound by ks::cf. I.e. it is at most Nks + Ncf long
// which is not small, but not huge either.
using table_sstables_list = std::vector<ss::table_sstables>;
return do_with(table_sstables_list{}, [ks, cf, &ctx](table_sstables_list& dst) {
return service::get_local_storage_service().db().map_reduce([&dst](table_sstables_list&& res) {
for (auto&& t : res) {
auto i = std::find_if(dst.begin(), dst.end(), [&t](const ss::table_sstables& t2) {
return t.keyspace() == t2.keyspace() && t.table() == t2.table();
});
if (i == dst.end()) {
dst.emplace_back(std::move(t));
continue;
}
auto& ssd = i->sstables;
for (auto&& sd : t.sstables._elements) {
auto j = std::find_if(ssd._elements.begin(), ssd._elements.end(), [&sd](const ss::sstable& s) {
return s.generation() == sd.generation();
});
if (j == ssd._elements.end()) {
i->sstables.push(std::move(sd));
}
}
}
}, [ks, cf](const database& db) {
// see above
table_sstables_list res;
auto& ext = db.get_config().extensions();
for (auto& t : db.get_column_families() | boost::adaptors::map_values) {
auto& schema = t->schema();
if ((ks.empty() || ks == schema->ks_name()) && (cf.empty() || cf == schema->cf_name())) {
// at most Nsstables long
ss::table_sstables tst;
tst.keyspace = schema->ks_name();
tst.table = schema->cf_name();
for (auto sstable : *t->get_sstables_including_compacted_undeleted()) {
auto ts = db_clock::to_time_t(sstable->data_file_write_time());
::tm t;
::gmtime_r(&ts, &t);
ss::sstable info;
info.timestamp = t;
info.generation = sstable->generation();
info.level = sstable->get_sstable_level();
info.size = sstable->bytes_on_disk();
info.data_size = sstable->ondisk_data_size();
info.index_size = sstable->index_size();
info.filter_size = sstable->filter_size();
info.version = sstable->get_version();
if (sstable->has_component(sstables::component_type::CompressionInfo)) {
auto& c = sstable->get_compression();
auto cp = sstables::get_sstable_compressor(c);
ss::named_maps nm;
nm.group = "compression_parameters";
for (auto& p : cp->options()) {
ss::mapper e;
e.key = p.first;
e.value = p.second;
nm.attributes.push(std::move(e));
}
if (!cp->options().contains(compression_parameters::SSTABLE_COMPRESSION)) {
ss::mapper e;
e.key = compression_parameters::SSTABLE_COMPRESSION;
e.value = cp->name();
nm.attributes.push(std::move(e));
}
info.extended_properties.push(std::move(nm));
}
sstables::file_io_extension::attr_value_map map;
for (auto* ep : ext.sstable_file_io_extensions()) {
map.merge(ep->get_attributes(*sstable));
}
for (auto& p : map) {
struct {
const sstring& key;
ss::sstable& info;
void operator()(const std::map<sstring, sstring>& map) const {
ss::named_maps nm;
nm.group = key;
for (auto& p : map) {
ss::mapper e;
e.key = p.first;
e.value = p.second;
nm.attributes.push(std::move(e));
}
info.extended_properties.push(std::move(nm));
}
void operator()(const sstring& value) const {
ss::mapper e;
e.key = key;
e.value = value;
info.properties.push(std::move(e));
}
} v{p.first, info};
std::visit(v, p.second);
}
tst.sstables.push(std::move(info));
}
res.emplace_back(std::move(tst));
}
}
std::sort(res.begin(), res.end(), [](const ss::table_sstables& t1, const ss::table_sstables& t2) {
return t1.keyspace() < t2.keyspace() || (t1.keyspace() == t2.keyspace() && t1.table() < t2.table());
});
return res;
}).then([&dst] {
return make_ready_future<json::json_return_type>(stream_object(dst));
});
});
});
}
void set_snapshot(http_context& ctx, routes& r, sharded<db::snapshot_ctl>& snap_ctl) {
ss::get_snapshot_details.set(r, [&snap_ctl](std::unique_ptr<request> req) {
return snap_ctl.local().get_snapshot_details().then([] (std::unordered_map<sstring, std::vector<db::snapshot_ctl::snapshot_details>>&& result) {
std::function<future<>(output_stream<char>&&)> f = [result = std::move(result)](output_stream<char>&& s) {
return do_with(output_stream<char>(std::move(s)), true, [&result] (output_stream<char>& s, bool& first){
return s.write("[").then([&s, &first, &result] {
return do_for_each(result, [&s, &first](std::tuple<sstring, std::vector<db::snapshot_ctl::snapshot_details>>&& map){
return do_with(ss::snapshots(), [&s, &first, &map](ss::snapshots& all_snapshots) {
all_snapshots.key = std::get<0>(map);
future<> f = first ? make_ready_future<>() : s.write(", ");
first = false;
std::vector<ss::snapshot> snapshot;
for (auto& cf: std::get<1>(map)) {
ss::snapshot snp;
snp.ks = cf.ks;
snp.cf = cf.cf;
snp.live = cf.live;
snp.total = cf.total;
snapshot.push_back(std::move(snp));
}
all_snapshots.value = std::move(snapshot);
return f.then([&s, &all_snapshots] {
return all_snapshots.write(s);
});
});
});
}).then([&s] {
return s.write("]").then([&s] {
return s.close();
});
});
});
};
return make_ready_future<json::json_return_type>(std::move(f));
});
});
ss::take_snapshot.set(r, [&snap_ctl](std::unique_ptr<request> req) {
auto tag = req->get_query_param("tag");
auto column_families = split(req->get_query_param("cf"), ",");
std::vector<sstring> keynames = split(req->get_query_param("kn"), ",");
auto resp = make_ready_future<>();
if (column_families.empty()) {
resp = snap_ctl.local().take_snapshot(tag, keynames);
} else {
if (keynames.empty()) {
throw httpd::bad_param_exception("The keyspace of column families must be specified");
}
if (keynames.size() > 1) {
throw httpd::bad_param_exception("Only one keyspace allowed when specifying a column family");
}
resp = snap_ctl.local().take_column_family_snapshot(keynames[0], column_families, tag);
}
return resp.then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
ss::del_snapshot.set(r, [&snap_ctl](std::unique_ptr<request> req) {
auto tag = req->get_query_param("tag");
auto column_family = req->get_query_param("cf");
std::vector<sstring> keynames = split(req->get_query_param("kn"), ",");
return snap_ctl.local().clear_snapshot(tag, keynames, column_family).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
ss::true_snapshots_size.set(r, [&snap_ctl](std::unique_ptr<request> req) {
return snap_ctl.local().true_snapshots_size().then([] (int64_t size) {
return make_ready_future<json::json_return_type>(size);
});
});
ss::scrub.set(r, wrap_ks_cf(ctx, [&snap_ctl] (http_context& ctx, std::unique_ptr<request> req, sstring keyspace, std::vector<sstring> column_families) {
const auto skip_corrupted = req_param<bool>(*req, "skip_corrupted", false);
auto f = make_ready_future<>();
if (!req_param<bool>(*req, "disable_snapshot", false)) {
auto tag = format("pre-scrub-{:d}", db_clock::now().time_since_epoch().count());
f = parallel_for_each(column_families, [&snap_ctl, keyspace, tag](sstring cf) {
return snap_ctl.local().take_column_family_snapshot(keyspace, cf, tag);
});
}
return f.then([&ctx, keyspace, column_families, skip_corrupted] {
return ctx.db.invoke_on_all([=] (database& db) {
return do_for_each(column_families, [=, &db](sstring cfname) {
auto& cm = db.get_compaction_manager();
auto& cf = db.find_column_family(keyspace, cfname);
return cm.perform_sstable_scrub(&cf, skip_corrupted);
});
});
}).then([]{
return make_ready_future<json::json_return_type>(0);
});
}));
}
void unset_snapshot(http_context& ctx, routes& r) {
ss::get_snapshot_details.unset(r);
ss::take_snapshot.unset(r);
ss::del_snapshot.unset(r);
ss::true_snapshots_size.unset(r);
ss::scrub.unset(r);
}
}
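
The `set_trace_probability` handler in the hunks above converts the `probability` query parameter with `std::stod` and maps both of its failure modes (`std::out_of_range` and `std::invalid_argument`) to a bad-parameter error. A minimal standalone sketch of that pattern follows. `bad_param_error`, `parse_probability` and `is_rejected` are hypothetical names introduced only for illustration; `bad_param_error` stands in for `httpd::bad_param_exception`, and no Seastar types are used.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for httpd::bad_param_exception (assumption, not the Seastar type).
struct bad_param_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Convert the textual query parameter to a double, translating the two
// exceptions std::stod can throw into a single "bad parameter" error.
double parse_probability(const std::string& text) {
    try {
        return std::stod(text);
    } catch (const std::out_of_range& e) {
        // The text is numeric but does not fit in a double.
        throw bad_param_error(e.what());
    } catch (const std::invalid_argument&) {
        // The text does not start with a number at all.
        throw bad_param_error("Bad format in a probability value: \"" + text + "\"");
    }
}

// Helper for callers that only need to know whether the value was rejected.
bool is_rejected(const std::string& text) {
    try {
        parse_probability(text);
        return false;
    } catch (const bad_param_error&) {
        return true;
    }
}
```

Like the handler in the diff, this sketch only checks that the text converts; it does not check that the value lies in any particular range.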

View File

@@ -21,24 +21,10 @@
#pragma once
#include <seastar/core/sharded.hh>
#include "api.hh"
namespace cql_transport { class controller; }
class thrift_controller;
namespace db { class snapshot_ctl; }
namespace netw { class messaging_service; }
namespace api {
void set_storage_service(http_context& ctx, routes& r);
void set_repair(http_context& ctx, routes& r, sharded<netw::messaging_service>& ms);
void unset_repair(http_context& ctx, routes& r);
void set_transport_controller(http_context& ctx, routes& r, cql_transport::controller& ctl);
void unset_transport_controller(http_context& ctx, routes& r);
void set_rpc_controller(http_context& ctx, routes& r, thrift_controller& ctl);
void unset_rpc_controller(http_context& ctx, routes& r);
void set_snapshot(http_context& ctx, routes& r, sharded<db::snapshot_ctl>& snap_ctl);
void unset_snapshot(http_context& ctx, routes& r);
}

View File

@@ -22,8 +22,7 @@
#include "api/api-doc/system.json.hh"
#include "api/api.hh"
#include <seastar/core/reactor.hh>
#include <seastar/http/exception.hh>
#include "http/exception.hh"
#include "log.hh"
namespace api {
@@ -31,10 +30,6 @@ namespace api {
namespace hs = httpd::system_json;
void set_system(http_context& ctx, routes& r) {
hs::get_system_uptime.set(r, [](const_req req) {
return std::chrono::duration_cast<std::chrono::milliseconds>(engine().uptime()).count();
});
hs::get_all_logger_names.set(r, [](const_req req) {
return logging::logger_registry().get_all_logger_names();
});

View File

@@ -1,287 +0,0 @@
/*
* Copyright (C) 2018 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "atomic_cell.hh"
#include "atomic_cell_or_collection.hh"
#include "counters.hh"
#include "types.hh"
/// LSA migrator for cells with irrelevant type
///
///
const data::type_imr_descriptor& no_type_imr_descriptor() {
static thread_local data::type_imr_descriptor state(data::type_info::make_variable_size());
return state;
}
atomic_cell atomic_cell::make_dead(api::timestamp_type timestamp, gc_clock::time_point deletion_time) {
auto& imr_data = no_type_imr_descriptor();
return atomic_cell(
imr_data.type_info(),
imr_object_type::make(data::cell::make_dead(timestamp, deletion_time), &imr_data.lsa_migrator())
);
}
atomic_cell atomic_cell::make_live(const abstract_type& type, api::timestamp_type timestamp, bytes_view value, atomic_cell::collection_member cm) {
auto& imr_data = type.imr_state();
return atomic_cell(
imr_data.type_info(),
imr_object_type::make(data::cell::make_live(imr_data.type_info(), timestamp, value, bool(cm)), &imr_data.lsa_migrator())
);
}
atomic_cell atomic_cell::make_live(const abstract_type& type, api::timestamp_type timestamp, ser::buffer_view<bytes_ostream::fragment_iterator> value, atomic_cell::collection_member cm) {
auto& imr_data = type.imr_state();
return atomic_cell(
imr_data.type_info(),
imr_object_type::make(data::cell::make_live(imr_data.type_info(), timestamp, value, bool(cm)), &imr_data.lsa_migrator())
);
}
atomic_cell atomic_cell::make_live(const abstract_type& type, api::timestamp_type timestamp, const fragmented_temporary_buffer::view& value, collection_member cm)
{
auto& imr_data = type.imr_state();
return atomic_cell(
imr_data.type_info(),
imr_object_type::make(data::cell::make_live(imr_data.type_info(), timestamp, value, bool(cm)), &imr_data.lsa_migrator())
);
}
atomic_cell atomic_cell::make_live(const abstract_type& type, api::timestamp_type timestamp, bytes_view value,
gc_clock::time_point expiry, gc_clock::duration ttl, atomic_cell::collection_member cm) {
auto& imr_data = type.imr_state();
return atomic_cell(
imr_data.type_info(),
imr_object_type::make(data::cell::make_live(imr_data.type_info(), timestamp, value, expiry, ttl, bool(cm)), &imr_data.lsa_migrator())
);
}
atomic_cell atomic_cell::make_live(const abstract_type& type, api::timestamp_type timestamp, ser::buffer_view<bytes_ostream::fragment_iterator> value,
gc_clock::time_point expiry, gc_clock::duration ttl, atomic_cell::collection_member cm) {
auto& imr_data = type.imr_state();
return atomic_cell(
imr_data.type_info(),
imr_object_type::make(data::cell::make_live(imr_data.type_info(), timestamp, value, expiry, ttl, bool(cm)), &imr_data.lsa_migrator())
);
}
atomic_cell atomic_cell::make_live(const abstract_type& type, api::timestamp_type timestamp, const fragmented_temporary_buffer::view& value,
gc_clock::time_point expiry, gc_clock::duration ttl, collection_member cm)
{
auto& imr_data = type.imr_state();
return atomic_cell(
imr_data.type_info(),
imr_object_type::make(data::cell::make_live(imr_data.type_info(), timestamp, value, expiry, ttl, bool(cm)), &imr_data.lsa_migrator())
);
}
atomic_cell atomic_cell::make_live_counter_update(api::timestamp_type timestamp, int64_t value) {
auto& imr_data = no_type_imr_descriptor();
return atomic_cell(
imr_data.type_info(),
imr_object_type::make(data::cell::make_live_counter_update(timestamp, value), &imr_data.lsa_migrator())
);
}
atomic_cell atomic_cell::make_live_uninitialized(const abstract_type& type, api::timestamp_type timestamp, size_t size) {
auto& imr_data = no_type_imr_descriptor();
return atomic_cell(
imr_data.type_info(),
imr_object_type::make(data::cell::make_live_uninitialized(imr_data.type_info(), timestamp, size), &imr_data.lsa_migrator())
);
}
static imr::utils::object<data::cell::structure> copy_cell(const data::type_imr_descriptor& imr_data, const uint8_t* ptr)
{
using imr_object_type = imr::utils::object<data::cell::structure>;
// If the cell doesn't own any memory it is trivial and can be copied with
// memcpy.
auto f = data::cell::structure::get_member<data::cell::tags::flags>(ptr);
if (!f.template get<data::cell::tags::external_data>()) {
data::cell::context ctx(f, imr_data.type_info());
// XXX: We may be better off storing the total cell size in memory. Measure!
auto size = data::cell::structure::serialized_object_size(ptr, ctx);
return imr_object_type::make_raw(size, [&] (uint8_t* dst) noexcept {
std::copy_n(ptr, size, dst);
}, &imr_data.lsa_migrator());
}
return imr_object_type::make(data::cell::copy_fn(imr_data.type_info(), ptr), &imr_data.lsa_migrator());
}
atomic_cell::atomic_cell(const abstract_type& type, atomic_cell_view other)
: atomic_cell(type.imr_state().type_info(),
copy_cell(type.imr_state(), other._view.raw_pointer()))
{ }
atomic_cell_or_collection atomic_cell_or_collection::copy(const abstract_type& type) const {
if (!_data.get()) {
return atomic_cell_or_collection();
}
auto& imr_data = type.imr_state();
return atomic_cell_or_collection(
copy_cell(imr_data, _data.get())
);
}
atomic_cell_or_collection::atomic_cell_or_collection(const abstract_type& type, atomic_cell_view acv)
: _data(copy_cell(type.imr_state(), acv._view.raw_pointer()))
{
}
bool atomic_cell_or_collection::equals(const abstract_type& type, const atomic_cell_or_collection& other) const
{
auto ptr_a = _data.get();
auto ptr_b = other._data.get();
if (!ptr_a || !ptr_b) {
return !ptr_a && !ptr_b;
}
if (type.is_atomic()) {
auto a = atomic_cell_view::from_bytes(type.imr_state().type_info(), _data);
auto b = atomic_cell_view::from_bytes(type.imr_state().type_info(), other._data);
if (a.timestamp() != b.timestamp()) {
return false;
}
if (a.is_live() != b.is_live()) {
return false;
}
if (a.is_live()) {
if (a.is_counter_update() != b.is_counter_update()) {
return false;
}
if (a.is_counter_update()) {
return a.counter_update_value() == b.counter_update_value();
}
if (a.is_live_and_has_ttl() != b.is_live_and_has_ttl()) {
return false;
}
if (a.is_live_and_has_ttl()) {
if (a.ttl() != b.ttl() || a.expiry() != b.expiry()) {
return false;
}
}
return a.value() == b.value();
}
return a.deletion_time() == b.deletion_time();
} else {
return as_collection_mutation().data == other.as_collection_mutation().data;
}
}
size_t atomic_cell_or_collection::external_memory_usage(const abstract_type& t) const
{
if (!_data.get()) {
return 0;
}
auto ctx = data::cell::context(_data.get(), t.imr_state().type_info());
auto view = data::cell::structure::make_view(_data.get(), ctx);
auto flags = view.get<data::cell::tags::flags>();
size_t external_value_size = 0;
if (flags.get<data::cell::tags::external_data>()) {
if (flags.get<data::cell::tags::collection>()) {
external_value_size = as_collection_mutation().data.size_bytes();
} else {
auto cell_view = data::cell::atomic_cell_view(t.imr_state().type_info(), view);
external_value_size = cell_view.value_size();
}
// Add overhead of chunk headers. The last one is a special case.
external_value_size += (external_value_size - 1) / data::cell::effective_external_chunk_length * data::cell::external_chunk_overhead;
external_value_size += data::cell::external_last_chunk_overhead;
}
return data::cell::structure::serialized_object_size(_data.get(), ctx)
+ imr_object_type::size_overhead + external_value_size;
}
std::ostream&
operator<<(std::ostream& os, const atomic_cell_view& acv) {
if (acv.is_live()) {
return fmt_print(os, "atomic_cell{{{},ts={:d},expiry={:d},ttl={:d}}}",
acv.is_counter_update()
? "counter_update_value=" + to_sstring(acv.counter_update_value())
: to_hex(acv.value().linearize()),
acv.timestamp(),
acv.is_live_and_has_ttl() ? acv.expiry().time_since_epoch().count() : -1,
acv.is_live_and_has_ttl() ? acv.ttl().count() : 0);
} else {
return fmt_print(os, "atomic_cell{{DEAD,ts={:d},deletion_time={:d}}}",
acv.timestamp(), acv.deletion_time().time_since_epoch().count());
}
}
std::ostream&
operator<<(std::ostream& os, const atomic_cell& ac) {
return os << atomic_cell_view(ac);
}
std::ostream&
operator<<(std::ostream& os, const atomic_cell_view::printer& acvp) {
auto& type = acvp._type;
auto& acv = acvp._cell;
if (acv.is_live()) {
std::ostringstream cell_value_string_builder;
if (type.is_counter()) {
if (acv.is_counter_update()) {
cell_value_string_builder << "counter_update_value=" << acv.counter_update_value();
} else {
cell_value_string_builder << "shards: ";
counter_cell_view::with_linearized(acv, [&cell_value_string_builder] (counter_cell_view& ccv) {
cell_value_string_builder << ::join(", ", ccv.shards());
});
}
} else {
cell_value_string_builder << type.to_string(acv.value().linearize());
}
return fmt_print(os, "atomic_cell{{{},ts={:d},expiry={:d},ttl={:d}}}",
cell_value_string_builder.str(),
acv.timestamp(),
acv.is_live_and_has_ttl() ? acv.expiry().time_since_epoch().count() : -1,
acv.is_live_and_has_ttl() ? acv.ttl().count() : 0);
} else {
return fmt_print(os, "atomic_cell{{DEAD,ts={:d},deletion_time={:d}}}",
acv.timestamp(), acv.deletion_time().time_since_epoch().count());
}
}
std::ostream&
operator<<(std::ostream& os, const atomic_cell::printer& acp) {
return operator<<(os, static_cast<const atomic_cell_view::printer&>(acp));
}
std::ostream& operator<<(std::ostream& os, const atomic_cell_or_collection::printer& p) {
if (!p._cell._data.get()) {
return os << "{ null atomic_cell_or_collection }";
}
using dc = data::cell;
os << "{ ";
if (dc::structure::get_member<dc::tags::flags>(p._cell._data.get()).get<dc::tags::collection>()) {
os << "collection ";
auto cmv = p._cell.as_collection_mutation();
os << collection_mutation_view::printer(*p._cdef.type, cmv);
} else {
os << atomic_cell_view::printer(*p._cdef.type, p._cell.as_atomic_cell(p._cdef));
}
return os << " }";
}
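
The chunk-overhead arithmetic in `atomic_cell_or_collection::external_memory_usage()` above can be checked with a small worked example. This is only a sketch: the three constants below are made-up values chosen for readability, not the real `data::cell::effective_external_chunk_length`, `external_chunk_overhead` or `external_last_chunk_overhead`, and `external_size_with_overhead` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical constants (assumptions for illustration only).
constexpr std::size_t chunk_length = 8192;       // payload bytes held by one full chunk
constexpr std::size_t chunk_overhead = 16;       // header bytes of every chunk except the last
constexpr std::size_t last_chunk_overhead = 8;   // header bytes of the final chunk

// Mirrors the expression in the diff: every chunk before the last one adds
// chunk_overhead, and the last chunk adds last_chunk_overhead.
std::size_t external_size_with_overhead(std::size_t value_size) {
    // With value_size == 0 the subtraction below would wrap around.
    assert(value_size > 0);
    const std::size_t chunks_before_last = (value_size - 1) / chunk_length;
    return value_size + chunks_before_last * chunk_overhead + last_chunk_overhead;
}
```

With these assumed constants, a value of exactly 8192 bytes fits in a single (last) chunk, while 8193 bytes needs one full chunk plus a last chunk, which is why the expression divides `value_size - 1` rather than `value_size`.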

View File

@@ -26,55 +26,149 @@
#include "tombstone.hh"
#include "gc_clock.hh"
#include "utils/managed_bytes.hh"
#include <seastar/net/byteorder.hh>
#include "net/byteorder.hh"
#include <cstdint>
#include <iosfwd>
#include "data/cell.hh"
#include "data/schema_info.hh"
#include "imr/utils.hh"
#include "utils/fragmented_temporary_buffer.hh"
#include "serializer.hh"
template<typename T>
static inline
void set_field(managed_bytes& v, unsigned offset, T val) {
reinterpret_cast<net::packed<T>*>(v.begin() + offset)->raw = net::hton(val);
}
template<typename T>
static inline
T get_field(const bytes_view& v, unsigned offset) {
return net::ntoh(*reinterpret_cast<const net::packed<T>*>(v.begin() + offset));
}
class abstract_type;
class collection_type_impl;
class atomic_cell_or_collection;
using atomic_cell_value_view = data::value_view;
using atomic_cell_value_mutable_view = data::value_mutable_view;
/// View of an atomic cell
template<mutable_view is_mutable>
class basic_atomic_cell_view {
protected:
data::cell::basic_atomic_cell_view<is_mutable> _view;
/*
* Represents atomic cell layout. Works on serialized form.
*
* Layout:
*
* <live> := <int8_t:flags><int64_t:timestamp>(<int32_t:expiry><int32_t:ttl>)?<value>
* <dead> := <int8_t: 0><int64_t:timestamp><int32_t:deletion_time>
*/
class atomic_cell_type final {
private:
static constexpr int8_t LIVE_FLAG = 0x01;
static constexpr int8_t EXPIRY_FLAG = 0x02; // When present, expiry field is present. Set only for live cells
static constexpr int8_t REVERT_FLAG = 0x04; // transient flag used to efficiently implement ReversiblyMergeable for atomic cells.
static constexpr int8_t COUNTER_UPDATE_FLAG = 0x08; // Cell is a counter update.
static constexpr unsigned flags_size = 1;
static constexpr unsigned timestamp_offset = flags_size;
static constexpr unsigned timestamp_size = 8;
static constexpr unsigned expiry_offset = timestamp_offset + timestamp_size;
static constexpr unsigned expiry_size = 4;
static constexpr unsigned deletion_time_offset = timestamp_offset + timestamp_size;
static constexpr unsigned deletion_time_size = 4;
static constexpr unsigned ttl_offset = expiry_offset + expiry_size;
static constexpr unsigned ttl_size = 4;
private:
static bool is_counter_update(bytes_view cell) {
return cell[0] & COUNTER_UPDATE_FLAG;
}
static bool is_revert_set(bytes_view cell) {
return cell[0] & REVERT_FLAG;
}
template<typename BytesContainer>
static void set_revert(BytesContainer& cell, bool revert) {
cell[0] = (cell[0] & ~REVERT_FLAG) | (revert * REVERT_FLAG);
}
static bool is_live(const bytes_view& cell) {
return cell[0] & LIVE_FLAG;
}
static bool is_live_and_has_ttl(const bytes_view& cell) {
return cell[0] & EXPIRY_FLAG;
}
static bool is_dead(const bytes_view& cell) {
return !is_live(cell);
}
// Can be called on live and dead cells
static api::timestamp_type timestamp(const bytes_view& cell) {
return get_field<api::timestamp_type>(cell, timestamp_offset);
}
// Can be called on live cells only
static bytes_view value(bytes_view cell) {
auto expiry_field_size = bool(cell[0] & EXPIRY_FLAG) * (expiry_size + ttl_size);
auto value_offset = flags_size + timestamp_size + expiry_field_size;
cell.remove_prefix(value_offset);
return cell;
}
// Can be called only when is_dead() is true.
static gc_clock::time_point deletion_time(const bytes_view& cell) {
assert(is_dead(cell));
return gc_clock::time_point(gc_clock::duration(
get_field<int32_t>(cell, deletion_time_offset)));
}
// Can be called only when is_live_and_has_ttl() is true.
static gc_clock::time_point expiry(const bytes_view& cell) {
assert(is_live_and_has_ttl(cell));
auto expiry = get_field<int32_t>(cell, expiry_offset);
return gc_clock::time_point(gc_clock::duration(expiry));
}
// Can be called only when is_live_and_has_ttl() is true.
static gc_clock::duration ttl(const bytes_view& cell) {
assert(is_live_and_has_ttl(cell));
return gc_clock::duration(get_field<int32_t>(cell, ttl_offset));
}
static managed_bytes make_dead(api::timestamp_type timestamp, gc_clock::time_point deletion_time) {
managed_bytes b(managed_bytes::initialized_later(), flags_size + timestamp_size + deletion_time_size);
b[0] = 0;
set_field(b, timestamp_offset, timestamp);
set_field(b, deletion_time_offset, deletion_time.time_since_epoch().count());
return b;
}
static managed_bytes make_live(api::timestamp_type timestamp, bytes_view value) {
auto value_offset = flags_size + timestamp_size;
managed_bytes b(managed_bytes::initialized_later(), value_offset + value.size());
b[0] = LIVE_FLAG;
set_field(b, timestamp_offset, timestamp);
std::copy_n(value.begin(), value.size(), b.begin() + value_offset);
return b;
}
static managed_bytes make_live_counter_update(api::timestamp_type timestamp, bytes_view value) {
auto value_offset = flags_size + timestamp_size;
managed_bytes b(managed_bytes::initialized_later(), value_offset + value.size());
b[0] = LIVE_FLAG | COUNTER_UPDATE_FLAG;
set_field(b, timestamp_offset, timestamp);
std::copy_n(value.begin(), value.size(), b.begin() + value_offset);
return b;
}
static managed_bytes make_live(api::timestamp_type timestamp, bytes_view value, gc_clock::time_point expiry, gc_clock::duration ttl) {
auto value_offset = flags_size + timestamp_size + expiry_size + ttl_size;
managed_bytes b(managed_bytes::initialized_later(), value_offset + value.size());
b[0] = EXPIRY_FLAG | LIVE_FLAG;
set_field(b, timestamp_offset, timestamp);
set_field(b, expiry_offset, expiry.time_since_epoch().count());
set_field(b, ttl_offset, ttl.count());
std::copy_n(value.begin(), value.size(), b.begin() + value_offset);
return b;
}
template<typename ByteContainer>
friend class atomic_cell_base;
friend class atomic_cell;
public:
using pointer_type = std::conditional_t<is_mutable == mutable_view::no, const uint8_t*, uint8_t*>;
};
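
The serialized layout described in the `atomic_cell_type` comment above (`<flags><timestamp>(<expiry><ttl>)?<value>`) can be sketched without any Scylla types. The code below is an illustration under stated assumptions: it uses `std::vector<std::uint8_t>` instead of `managed_bytes`, writes fields in big-endian order on the assumption that `net::hton` produces network byte order, and the names `cell_bytes`, `put_be`, `get_be`, `make_live_expiring` and `value_offset` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using cell_bytes = std::vector<std::uint8_t>;  // stand-in for managed_bytes (assumption)

constexpr std::uint8_t live_flag = 0x01;
constexpr std::uint8_t expiry_flag = 0x02;
constexpr std::size_t flags_size = 1;
constexpr std::size_t timestamp_offset = flags_size;
constexpr std::size_t timestamp_size = 8;
constexpr std::size_t expiry_offset = timestamp_offset + timestamp_size;
constexpr std::size_t expiry_size = 4;
constexpr std::size_t ttl_offset = expiry_offset + expiry_size;
constexpr std::size_t ttl_size = 4;

// Append an integer field in big-endian (network) byte order.
template <typename T>
void put_be(cell_bytes& out, T value) {
    const auto bits = static_cast<std::uint64_t>(value);
    for (int shift = static_cast<int>(sizeof(T) - 1) * 8; shift >= 0; shift -= 8) {
        out.push_back(static_cast<std::uint8_t>(bits >> shift));
    }
}

// Read back a big-endian integer field stored at the given offset.
template <typename T>
T get_be(const cell_bytes& in, std::size_t offset) {
    assert(offset + sizeof(T) <= in.size());
    std::uint64_t bits = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        bits = (bits << 8) | in[offset + i];
    }
    return static_cast<T>(bits);
}

// <live> without TTL: flags, timestamp, value.
cell_bytes make_live(std::int64_t timestamp, const cell_bytes& value) {
    cell_bytes cell{live_flag};
    put_be(cell, timestamp);
    cell.insert(cell.end(), value.begin(), value.end());
    return cell;
}

// <live> with TTL: flags, timestamp, expiry, ttl, value.
cell_bytes make_live_expiring(std::int64_t timestamp, std::int32_t expiry,
                              std::int32_t ttl, const cell_bytes& value) {
    cell_bytes cell{static_cast<std::uint8_t>(live_flag | expiry_flag)};
    put_be(cell, timestamp);
    put_be(cell, expiry);
    put_be(cell, ttl);
    cell.insert(cell.end(), value.begin(), value.end());
    return cell;
}

// The value starts after the fixed fields; expiry and ttl exist only when
// the expiry flag is set. Valid for live cells only.
std::size_t value_offset(const cell_bytes& cell) {
    assert(!cell.empty() && (cell[0] & live_flag) != 0);
    const bool expiring = (cell[0] & expiry_flag) != 0;
    return flags_size + timestamp_size + (expiring ? expiry_size + ttl_size : 0);
}
```

The point of the sketch is that the flags byte alone determines where the value begins, which is how `atomic_cell_type::value()` above computes `value_offset` from `EXPIRY_FLAG`.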
template<typename ByteContainer>
class atomic_cell_base {
protected:
explicit basic_atomic_cell_view(data::cell::basic_atomic_cell_view<is_mutable> v)
: _view(std::move(v)) { }
basic_atomic_cell_view(const data::type_info& ti, pointer_type ptr)
: _view(data::cell::make_atomic_cell_view(ti, ptr))
{ }
ByteContainer _data;
protected:
atomic_cell_base(ByteContainer&& data) : _data(std::forward<ByteContainer>(data)) { }
friend class atomic_cell_or_collection;
public:
operator basic_atomic_cell_view<mutable_view::no>() const noexcept {
return basic_atomic_cell_view<mutable_view::no>(_view);
}
void swap(basic_atomic_cell_view& other) noexcept {
using std::swap;
swap(_view, other._view);
}
bool is_counter_update() const {
return _view.is_counter_update();
return atomic_cell_type::is_counter_update(_data);
}
bool is_revert_set() const {
return atomic_cell_type::is_revert_set(_data);
}
bool is_live() const {
return _view.is_live();
return atomic_cell_type::is_live(_data);
}
bool is_live(tombstone t, bool is_counter) const {
return is_live() && !is_covered_by(t, is_counter);
@@ -83,161 +177,151 @@ public:
return is_live() && !is_covered_by(t, is_counter) && !has_expired(now);
}
bool is_live_and_has_ttl() const {
return _view.is_expiring();
return atomic_cell_type::is_live_and_has_ttl(_data);
}
bool is_dead(gc_clock::time_point now) const {
return !is_live() || has_expired(now);
return atomic_cell_type::is_dead(_data) || has_expired(now);
}
bool is_covered_by(tombstone t, bool is_counter) const {
return timestamp() <= t.timestamp || (is_counter && t.timestamp != api::missing_timestamp);
}
// Can be called on live and dead cells
api::timestamp_type timestamp() const {
return _view.timestamp();
}
void set_timestamp(api::timestamp_type ts) {
_view.set_timestamp(ts);
return atomic_cell_type::timestamp(_data);
}
// Can be called on live cells only
data::basic_value_view<is_mutable> value() const {
return _view.value();
}
// Can be called on live cells only
size_t value_size() const {
return _view.value_size();
}
bool is_value_fragmented() const {
return _view.is_value_fragmented();
}
// Can be called on live counter update cells only
int64_t counter_update_value() const {
return _view.counter_update_value();
bytes_view value() const {
return atomic_cell_type::value(_data);
}
// Can be called only when is_dead(gc_clock::time_point)
gc_clock::time_point deletion_time() const {
return !is_live() ? _view.deletion_time() : expiry() - ttl();
return !is_live() ? atomic_cell_type::deletion_time(_data) : expiry() - ttl();
}
// Can be called only when is_live_and_has_ttl()
gc_clock::time_point expiry() const {
return _view.expiry();
return atomic_cell_type::expiry(_data);
}
// Can be called only when is_live_and_has_ttl()
gc_clock::duration ttl() const {
return _view.ttl();
return atomic_cell_type::ttl(_data);
}
// Can be called on live and dead cells
bool has_expired(gc_clock::time_point now) const {
return is_live_and_has_ttl() && expiry() <= now;
return is_live_and_has_ttl() && expiry() < now;
}
bytes_view serialize() const {
return _view.serialize();
return _data;
}
void set_revert(bool revert) {
atomic_cell_type::set_revert(_data, revert);
}
};
class atomic_cell_view final : public basic_atomic_cell_view<mutable_view::no> {
atomic_cell_view(const data::type_info& ti, const uint8_t* data)
: basic_atomic_cell_view<mutable_view::no>(ti, data) {}
template<mutable_view is_mutable>
atomic_cell_view(data::cell::basic_atomic_cell_view<is_mutable> view)
: basic_atomic_cell_view<mutable_view::no>(view) { }
friend class atomic_cell;
class atomic_cell_view final : public atomic_cell_base<bytes_view> {
atomic_cell_view(bytes_view data) : atomic_cell_base(std::move(data)) {}
public:
static atomic_cell_view from_bytes(const data::type_info& ti, const imr::utils::object<data::cell::structure>& data) {
return atomic_cell_view(ti, data.get());
}
static atomic_cell_view from_bytes(const data::type_info& ti, bytes_view bv) {
return atomic_cell_view(ti, reinterpret_cast<const uint8_t*>(bv.begin()));
}
static atomic_cell_view from_bytes(bytes_view data) { return atomic_cell_view(data); }
friend class atomic_cell;
friend std::ostream& operator<<(std::ostream& os, const atomic_cell_view& acv);
class printer {
const abstract_type& _type;
const atomic_cell_view& _cell;
public:
printer(const abstract_type& type, const atomic_cell_view& cell) : _type(type), _cell(cell) {}
friend std::ostream& operator<<(std::ostream& os, const printer& acvp);
};
};
class atomic_cell_mutable_view final : public basic_atomic_cell_view<mutable_view::yes> {
atomic_cell_mutable_view(const data::type_info& ti, uint8_t* data)
: basic_atomic_cell_view<mutable_view::yes>(ti, data) {}
class atomic_cell_ref final : public atomic_cell_base<managed_bytes&> {
public:
static atomic_cell_mutable_view from_bytes(const data::type_info& ti, imr::utils::object<data::cell::structure>& data) {
return atomic_cell_mutable_view(ti, data.get());
}
friend class atomic_cell;
atomic_cell_ref(managed_bytes& buf) : atomic_cell_base(buf) {}
};
using atomic_cell_ref = atomic_cell_mutable_view;
class atomic_cell final : public basic_atomic_cell_view<mutable_view::yes> {
using imr_object_type = imr::utils::object<data::cell::structure>;
imr_object_type _data;
atomic_cell(const data::type_info& ti, imr::utils::object<data::cell::structure>&& data)
: basic_atomic_cell_view<mutable_view::yes>(ti, data.get()), _data(std::move(data)) {}
class atomic_cell final : public atomic_cell_base<managed_bytes> {
atomic_cell(managed_bytes b) : atomic_cell_base(std::move(b)) {}
public:
class collection_member_tag;
using collection_member = bool_class<collection_member_tag>;
atomic_cell(const atomic_cell&) = default;
atomic_cell(atomic_cell&&) = default;
atomic_cell& operator=(const atomic_cell&) = delete;
atomic_cell& operator=(const atomic_cell&) = default;
atomic_cell& operator=(atomic_cell&&) = default;
void swap(atomic_cell& other) noexcept {
basic_atomic_cell_view<mutable_view::yes>::swap(other);
_data.swap(other._data);
static atomic_cell from_bytes(managed_bytes b) {
return atomic_cell(std::move(b));
}
operator atomic_cell_view() const { return atomic_cell_view(_view); }
atomic_cell(const abstract_type& t, atomic_cell_view other);
static atomic_cell make_dead(api::timestamp_type timestamp, gc_clock::time_point deletion_time);
static atomic_cell make_live(const abstract_type& type, api::timestamp_type timestamp, bytes_view value,
collection_member = collection_member::no);
static atomic_cell make_live(const abstract_type& type, api::timestamp_type timestamp, ser::buffer_view<bytes_ostream::fragment_iterator> value,
collection_member = collection_member::no);
static atomic_cell make_live(const abstract_type& type, api::timestamp_type timestamp, const fragmented_temporary_buffer::view& value,
collection_member = collection_member::no);
static atomic_cell make_live(const abstract_type& type, api::timestamp_type timestamp, const bytes& value,
collection_member cm = collection_member::no) {
return make_live(type, timestamp, bytes_view(value), cm);
atomic_cell(atomic_cell_view other) : atomic_cell_base(managed_bytes{other._data}) {}
operator atomic_cell_view() const {
return atomic_cell_view(_data);
}
static atomic_cell make_live_counter_update(api::timestamp_type timestamp, int64_t value);
static atomic_cell make_live(const abstract_type&, api::timestamp_type timestamp, bytes_view value,
gc_clock::time_point expiry, gc_clock::duration ttl, collection_member = collection_member::no);
static atomic_cell make_live(const abstract_type&, api::timestamp_type timestamp, ser::buffer_view<bytes_ostream::fragment_iterator> value,
gc_clock::time_point expiry, gc_clock::duration ttl, collection_member = collection_member::no);
static atomic_cell make_live(const abstract_type&, api::timestamp_type timestamp, const fragmented_temporary_buffer::view& value,
gc_clock::time_point expiry, gc_clock::duration ttl, collection_member = collection_member::no);
static atomic_cell make_live(const abstract_type& type, api::timestamp_type timestamp, const bytes& value,
gc_clock::time_point expiry, gc_clock::duration ttl, collection_member cm = collection_member::no)
static atomic_cell make_dead(api::timestamp_type timestamp, gc_clock::time_point deletion_time) {
return atomic_cell_type::make_dead(timestamp, deletion_time);
}
static atomic_cell make_live(api::timestamp_type timestamp, bytes_view value) {
return atomic_cell_type::make_live(timestamp, value);
}
static atomic_cell make_live(api::timestamp_type timestamp, const bytes& value) {
return make_live(timestamp, bytes_view(value));
}
static atomic_cell make_live_counter_update(api::timestamp_type timestamp, bytes_view value) {
return atomic_cell_type::make_live_counter_update(timestamp, value);
}
static atomic_cell make_live_counter_update(api::timestamp_type timestamp, const bytes& value) {
return atomic_cell_type::make_live_counter_update(timestamp, bytes_view(value));
}
static atomic_cell make_live(api::timestamp_type timestamp, bytes_view value,
gc_clock::time_point expiry, gc_clock::duration ttl)
{
return make_live(type, timestamp, bytes_view(value), expiry, ttl, cm);
return atomic_cell_type::make_live(timestamp, value, expiry, ttl);
}
static atomic_cell make_live(const abstract_type& type, api::timestamp_type timestamp, bytes_view value, ttl_opt ttl, collection_member cm = collection_member::no) {
static atomic_cell make_live(api::timestamp_type timestamp, const bytes& value,
gc_clock::time_point expiry, gc_clock::duration ttl)
{
return make_live(timestamp, bytes_view(value), expiry, ttl);
}
static atomic_cell make_live(api::timestamp_type timestamp, bytes_view value, ttl_opt ttl) {
if (!ttl) {
return make_live(type, timestamp, value, cm);
return atomic_cell_type::make_live(timestamp, value);
} else {
return make_live(type, timestamp, value, gc_clock::now() + *ttl, *ttl, cm);
return atomic_cell_type::make_live(timestamp, value, gc_clock::now() + *ttl, *ttl);
}
}
static atomic_cell make_live_uninitialized(const abstract_type& type, api::timestamp_type timestamp, size_t size);
friend class atomic_cell_or_collection;
friend std::ostream& operator<<(std::ostream& os, const atomic_cell& ac);
class printer : atomic_cell_view::printer {
public:
printer(const abstract_type& type, const atomic_cell_view& cell) : atomic_cell_view::printer(type, cell) {}
friend std::ostream& operator<<(std::ostream& os, const printer& acvp);
};
};
class collection_mutation_view;
// Represents a mutation of a collection. Actual format is determined by collection type,
// and is:
// set: list of atomic_cell
// map: list of pair<atomic_cell, bytes> (for key/value)
// list: tbd, probably ugly
class collection_mutation {
public:
managed_bytes data;
collection_mutation() {}
collection_mutation(managed_bytes b) : data(std::move(b)) {}
collection_mutation(collection_mutation_view v);
operator collection_mutation_view() const;
};
class collection_mutation_view {
public:
bytes_view data;
bytes_view serialize() const { return data; }
static collection_mutation_view from_bytes(bytes_view v) { return { v }; }
};
inline
collection_mutation::collection_mutation(collection_mutation_view v)
: data(v.data) {
}
inline
collection_mutation::operator collection_mutation_view() const {
return { data };
}
namespace db {
template<typename T>
class serializer;
}
class column_definition;
int compare_atomic_cell_for_merge(atomic_cell_view left, atomic_cell_view right);
void merge_column(const abstract_type& def,
void merge_column(const column_definition& def,
atomic_cell_or_collection& old,
const atomic_cell_or_collection& neww);


@@ -24,9 +24,7 @@
// Not part of atomic_cell.hh to avoid cyclic dependency between types.hh and atomic_cell.hh
#include "types.hh"
#include "types/collection.hh"
#include "atomic_cell.hh"
#include "atomic_cell_or_collection.hh"
#include "hashing.hh"
#include "counters.hh"
@@ -34,13 +32,12 @@ template<>
struct appending_hash<collection_mutation_view> {
template<typename Hasher>
void operator()(Hasher& h, collection_mutation_view cell, const column_definition& cdef) const {
cell.with_deserialized(*cdef.type, [&] (collection_mutation_view_description m_view) {
::feed_hash(h, m_view.tomb);
for (auto&& key_and_value : m_view.cells) {
::feed_hash(h, key_and_value.first);
::feed_hash(h, key_and_value.second, cdef);
}
});
auto m_view = collection_type_impl::deserialize_mutation_form(cell);
::feed_hash(h, m_view.tomb);
for (auto&& key_and_value : m_view.cells) {
::feed_hash(h, key_and_value.first);
::feed_hash(h, key_and_value.second, cdef);
}
}
};
@@ -52,9 +49,7 @@ struct appending_hash<atomic_cell_view> {
feed_hash(h, cell.timestamp());
if (cell.is_live()) {
if (cdef.is_counter()) {
counter_cell_view::with_linearized(cell, [&] (counter_cell_view ccv) {
::feed_hash(h, ccv);
});
::feed_hash(h, counter_cell_view(cell));
return;
}
if (cell.is_live_and_has_ttl()) {
@@ -83,15 +78,3 @@ struct appending_hash<collection_mutation> {
feed_hash(h, static_cast<collection_mutation_view>(cm), cdef);
}
};
template<>
struct appending_hash<atomic_cell_or_collection> {
template<typename Hasher>
void operator()(Hasher& h, const atomic_cell_or_collection& c, const column_definition& cdef) const {
if (cdef.is_atomic()) {
feed_hash(h, c.as_atomic_cell(cdef), cdef);
} else {
feed_hash(h, c.as_collection_mutation(), cdef);
}
}
};


@@ -22,72 +22,49 @@
#pragma once
#include "atomic_cell.hh"
#include "collection_mutation.hh"
#include "schema.hh"
#include "hashing.hh"
#include "imr/utils.hh"
// A variant type that can hold either an atomic_cell, or a serialized collection.
// Which type is stored is determined by the schema.
// Has an "empty" state.
// Objects moved-from are left in an empty state.
class atomic_cell_or_collection final {
// FIXME: This has made us lose small-buffer optimisation. Unfortunately,
// due to the changed cell format it would be less effective now, anyway.
// Measure the actual impact because any attempts to fix this will become
// irrelevant once rows are converted to the IMR as well, so maybe we can
// live with this like that.
using imr_object_type = imr::utils::object<data::cell::structure>;
imr_object_type _data;
managed_bytes _data;
private:
atomic_cell_or_collection(imr::utils::object<data::cell::structure>&& data) : _data(std::move(data)) {}
atomic_cell_or_collection(managed_bytes&& data) : _data(std::move(data)) {}
public:
atomic_cell_or_collection() = default;
atomic_cell_or_collection(atomic_cell_or_collection&&) = default;
atomic_cell_or_collection(const atomic_cell_or_collection&) = delete;
atomic_cell_or_collection& operator=(atomic_cell_or_collection&&) = default;
atomic_cell_or_collection& operator=(const atomic_cell_or_collection&) = delete;
atomic_cell_or_collection(atomic_cell ac) : _data(std::move(ac._data)) {}
atomic_cell_or_collection(const abstract_type& at, atomic_cell_view acv);
static atomic_cell_or_collection from_atomic_cell(atomic_cell data) { return { std::move(data._data) }; }
atomic_cell_view as_atomic_cell(const column_definition& cdef) const { return atomic_cell_view::from_bytes(cdef.type->imr_state().type_info(), _data); }
atomic_cell_ref as_atomic_cell_ref(const column_definition& cdef) { return atomic_cell_mutable_view::from_bytes(cdef.type->imr_state().type_info(), _data); }
atomic_cell_mutable_view as_mutable_atomic_cell(const column_definition& cdef) { return atomic_cell_mutable_view::from_bytes(cdef.type->imr_state().type_info(), _data); }
atomic_cell_or_collection(collection_mutation cm) : _data(std::move(cm._data)) { }
atomic_cell_or_collection copy(const abstract_type&) const;
atomic_cell_view as_atomic_cell() const { return atomic_cell_view::from_bytes(_data); }
atomic_cell_ref as_atomic_cell_ref() { return { _data }; }
atomic_cell_or_collection(collection_mutation cm) : _data(std::move(cm.data)) {}
explicit operator bool() const {
return bool(_data);
return !_data.empty();
}
static constexpr bool can_use_mutable_view() {
return true;
static atomic_cell_or_collection from_collection_mutation(collection_mutation data) {
return std::move(data.data);
}
void swap(atomic_cell_or_collection& other) noexcept {
_data.swap(other._data);
collection_mutation_view as_collection_mutation() const {
return collection_mutation_view{_data};
}
static atomic_cell_or_collection from_collection_mutation(collection_mutation data) { return std::move(data._data); }
collection_mutation_view as_collection_mutation() const;
bytes_view serialize() const;
bool equals(const abstract_type& type, const atomic_cell_or_collection& other) const;
size_t external_memory_usage(const abstract_type&) const;
class printer {
const column_definition& _cdef;
const atomic_cell_or_collection& _cell;
public:
printer(const column_definition& cdef, const atomic_cell_or_collection& cell)
: _cdef(cdef), _cell(cell) { }
printer(const printer&) = delete;
printer(printer&&) = delete;
friend std::ostream& operator<<(std::ostream&, const printer&);
};
friend std::ostream& operator<<(std::ostream&, const printer&);
bytes_view serialize() const {
return _data;
}
bool operator==(const atomic_cell_or_collection& other) const {
return _data == other._data;
}
template<typename Hasher>
void feed_hash(Hasher& h, const column_definition& def) const {
if (def.is_atomic()) {
::feed_hash(h, as_atomic_cell(), def);
} else {
::feed_hash(h, as_collection_mutation(), def);
}
}
size_t external_memory_usage() const {
return _data.external_memory_usage();
}
friend std::ostream& operator<<(std::ostream&, const atomic_cell_or_collection&);
};
namespace std {
inline void swap(atomic_cell_or_collection& a, atomic_cell_or_collection& b) noexcept
{
a.swap(b);
}
}


@@ -1,38 +0,0 @@
/*
* Copyright (C) 2017 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "auth/allow_all_authenticator.hh"
#include "service/migration_manager.hh"
#include "utils/class_registrator.hh"
namespace auth {
constexpr std::string_view allow_all_authenticator_name("org.apache.cassandra.auth.AllowAllAuthenticator");
// To ensure correct initialization order, we unfortunately need to use a string literal.
static const class_registrator<
authenticator,
allow_all_authenticator,
cql3::query_processor&,
::service::migration_manager&> registration("org.apache.cassandra.auth.AllowAllAuthenticator");
}


@@ -1,101 +0,0 @@
/*
* Copyright (C) 2017 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <stdexcept>
#include "auth/authenticated_user.hh"
#include "auth/authenticator.hh"
#include "auth/common.hh"
namespace cql3 {
class query_processor;
}
namespace service {
class migration_manager;
}
namespace auth {
extern const std::string_view allow_all_authenticator_name;
class allow_all_authenticator final : public authenticator {
public:
allow_all_authenticator(cql3::query_processor&, ::service::migration_manager&) {
}
virtual future<> start() override {
return make_ready_future<>();
}
virtual future<> stop() override {
return make_ready_future<>();
}
virtual std::string_view qualified_java_name() const override {
return allow_all_authenticator_name;
}
virtual bool require_authentication() const override {
return false;
}
virtual authentication_option_set supported_options() const override {
return authentication_option_set();
}
virtual authentication_option_set alterable_options() const override {
return authentication_option_set();
}
future<authenticated_user> authenticate(const credentials_map& credentials) const override {
return make_ready_future<authenticated_user>(anonymous_user());
}
virtual future<> create(std::string_view, const authentication_options& options) const override {
return make_ready_future();
}
virtual future<> alter(std::string_view, const authentication_options& options) const override {
return make_ready_future();
}
virtual future<> drop(std::string_view) const override {
return make_ready_future();
}
virtual future<custom_options> query_custom_options(std::string_view role_name) const override {
return make_ready_future<custom_options>();
}
virtual const resource_set& protected_resources() const override {
static const resource_set resources;
return resources;
}
virtual ::shared_ptr<sasl_challenge> new_sasl_challenge() const override {
throw std::runtime_error("Should not reach");
}
};
}


@@ -1,38 +0,0 @@
/*
* Copyright (C) 2017 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "auth/allow_all_authorizer.hh"
#include "auth/common.hh"
#include "utils/class_registrator.hh"
namespace auth {
constexpr std::string_view allow_all_authorizer_name("org.apache.cassandra.auth.AllowAllAuthorizer");
// To ensure correct initialization order, we unfortunately need to use a string literal.
static const class_registrator<
authorizer,
allow_all_authorizer,
cql3::query_processor&,
::service::migration_manager&> registration("org.apache.cassandra.auth.AllowAllAuthorizer");
}


@@ -1,92 +0,0 @@
/*
* Copyright (C) 2017 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "auth/authorizer.hh"
#include "exceptions/exceptions.hh"
namespace cql3 {
class query_processor;
}
namespace service {
class migration_manager;
}
namespace auth {
extern const std::string_view allow_all_authorizer_name;
class allow_all_authorizer final : public authorizer {
public:
allow_all_authorizer(cql3::query_processor&, ::service::migration_manager&) {
}
virtual future<> start() override {
return make_ready_future<>();
}
virtual future<> stop() override {
return make_ready_future<>();
}
virtual std::string_view qualified_java_name() const override {
return allow_all_authorizer_name;
}
virtual future<permission_set> authorize(const role_or_anonymous&, const resource&) const override {
return make_ready_future<permission_set>(permissions::ALL);
}
virtual future<> grant(std::string_view, permission_set, const resource&) const override {
return make_exception_future<>(
unsupported_authorization_operation("GRANT operation is not supported by AllowAllAuthorizer"));
}
virtual future<> revoke(std::string_view, permission_set, const resource&) const override {
return make_exception_future<>(
unsupported_authorization_operation("REVOKE operation is not supported by AllowAllAuthorizer"));
}
virtual future<std::vector<permission_details>> list_all() const override {
return make_exception_future<std::vector<permission_details>>(
unsupported_authorization_operation(
"LIST PERMISSIONS operation is not supported by AllowAllAuthorizer"));
}
virtual future<> revoke_all(std::string_view) const override {
return make_exception_future(
unsupported_authorization_operation("REVOKE operation is not supported by AllowAllAuthorizer"));
}
virtual future<> revoke_all(const resource&) const override {
return make_exception_future(
unsupported_authorization_operation("REVOKE operation is not supported by AllowAllAuthorizer"));
}
virtual const resource_set& protected_resources() const override {
static const resource_set resources;
return resources;
}
};
}

auth/auth.cc Normal file

@@ -0,0 +1,387 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright (C) 2016 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include <seastar/core/sleep.hh>
#include <seastar/core/distributed.hh>
#include "auth.hh"
#include "authenticator.hh"
#include "authorizer.hh"
#include "database.hh"
#include "cql3/query_processor.hh"
#include "cql3/statements/raw/cf_statement.hh"
#include "cql3/statements/create_table_statement.hh"
#include "db/config.hh"
#include "service/migration_manager.hh"
#include "utils/loading_cache.hh"
#include "utils/hash.hh"
const sstring auth::auth::DEFAULT_SUPERUSER_NAME("cassandra");
const sstring auth::auth::AUTH_KS("system_auth");
const sstring auth::auth::USERS_CF("users");
static const sstring USER_NAME("name");
static const sstring SUPER("super");
static logging::logger logger("auth");
// TODO: configurable
using namespace std::chrono_literals;
const std::chrono::milliseconds auth::auth::SUPERUSER_SETUP_DELAY = 10000ms;
class auth_migration_listener : public service::migration_listener {
void on_create_keyspace(const sstring& ks_name) override {}
void on_create_column_family(const sstring& ks_name, const sstring& cf_name) override {}
void on_create_user_type(const sstring& ks_name, const sstring& type_name) override {}
void on_create_function(const sstring& ks_name, const sstring& function_name) override {}
void on_create_aggregate(const sstring& ks_name, const sstring& aggregate_name) override {}
void on_create_view(const sstring& ks_name, const sstring& view_name) override {}
void on_update_keyspace(const sstring& ks_name) override {}
void on_update_column_family(const sstring& ks_name, const sstring& cf_name, bool) override {}
void on_update_user_type(const sstring& ks_name, const sstring& type_name) override {}
void on_update_function(const sstring& ks_name, const sstring& function_name) override {}
void on_update_aggregate(const sstring& ks_name, const sstring& aggregate_name) override {}
void on_update_view(const sstring& ks_name, const sstring& view_name, bool columns_changed) override {}
void on_drop_keyspace(const sstring& ks_name) override {
auth::authorizer::get().revoke_all(auth::data_resource(ks_name));
}
void on_drop_column_family(const sstring& ks_name, const sstring& cf_name) override {
auth::authorizer::get().revoke_all(auth::data_resource(ks_name, cf_name));
}
void on_drop_user_type(const sstring& ks_name, const sstring& type_name) override {}
void on_drop_function(const sstring& ks_name, const sstring& function_name) override {}
void on_drop_aggregate(const sstring& ks_name, const sstring& aggregate_name) override {}
void on_drop_view(const sstring& ks_name, const sstring& view_name) override {}
};
static auth_migration_listener auth_migration;
namespace std {
template <>
struct hash<auth::data_resource> {
size_t operator()(const auth::data_resource & v) const {
return v.hash_value();
}
};
template <>
struct hash<auth::authenticated_user> {
size_t operator()(const auth::authenticated_user & v) const {
return utils::tuple_hash()(v.name(), v.is_anonymous());
}
};
}
class auth::auth::permissions_cache {
public:
typedef utils::loading_cache<std::pair<authenticated_user, data_resource>, permission_set, utils::tuple_hash> cache_type;
typedef typename cache_type::key_type key_type;
permissions_cache()
: permissions_cache(
cql3::get_local_query_processor().db().local().get_config()) {
}
permissions_cache(const db::config& cfg)
: _cache(cfg.permissions_cache_max_entries(), expiry(cfg),
std::chrono::milliseconds(
cfg.permissions_validity_in_ms()),
[](const key_type& k) {
logger.debug("Refreshing permissions for {}", k.first.name());
return authorizer::get().authorize(::make_shared<authenticated_user>(k.first), k.second);
}) {
}
static std::chrono::milliseconds expiry(const db::config& cfg) {
auto exp = cfg.permissions_update_interval_in_ms();
if (exp == 0 || exp == std::numeric_limits<uint32_t>::max()) {
exp = cfg.permissions_validity_in_ms();
}
return std::chrono::milliseconds(exp);
}
future<> stop() {
return make_ready_future<>();
}
future<permission_set> get(::shared_ptr<authenticated_user> user, data_resource resource) {
return _cache.get(key_type(*user, std::move(resource)));
}
private:
cache_type _cache;
};
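The fallback in `expiry()` above can be isolated as a small standalone sketch. This is an illustration only: `effective_expiry` and its two plain integer parameters are hypothetical stand-ins for the `db::config` accessors `permissions_update_interval_in_ms()` and `permissions_validity_in_ms()`, and the sketch assumes both are 32-bit millisecond counts, as the comparison against `std::numeric_limits<uint32_t>::max()` suggests.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <limits>

// Hypothetical helper mirroring permissions_cache::expiry(): an update
// interval of 0 or UINT32_MAX is treated as "not configured", in which
// case the validity period is used as the refresh interval instead.
std::chrono::milliseconds effective_expiry(uint32_t update_interval_ms, uint32_t validity_ms) {
    if (update_interval_ms == 0
            || update_interval_ms == std::numeric_limits<uint32_t>::max()) {
        update_interval_ms = validity_ms;
    }
    return std::chrono::milliseconds(update_interval_ms);
}

// Small self-check showing the three cases.
void effective_expiry_example() {
    using ms = std::chrono::milliseconds;
    assert(effective_expiry(500, 2000) == ms(500));   // explicit interval wins
    assert(effective_expiry(0, 2000) == ms(2000));    // 0 falls back to validity
    assert(effective_expiry(std::numeric_limits<uint32_t>::max(), 2000) == ms(2000));
}
```

With this shape, an explicitly configured update interval always takes precedence, and the validity period only serves as the default.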
static distributed<auth::auth::permissions_cache> perm_cache;
/**
* Poor man's job scheduler, for at most two jobs (sic).
* Still does nothing more clever than waiting 10 seconds,
* like Origin, and then running the submitted tasks.
*
* The only difference compared to sleep (from which this
* borrows _heavily_) is that if tasks have not run by the time
* we exit (and do static cleanup), we delete the promise and continuation.
*
* Should probably be abstracted into some sort of global server
* function.
*/
struct waiter {
promise<> done;
timer<> tmr;
waiter() : tmr([this] {done.set_value();})
{
tmr.arm(auth::auth::SUPERUSER_SETUP_DELAY);
}
~waiter() {
if (tmr.armed()) {
tmr.cancel();
done.set_exception(std::runtime_error("shutting down"));
}
logger.trace("Deleting scheduled task");
}
void kill() {
}
};
typedef std::unique_ptr<waiter> waiter_ptr;
static std::vector<waiter_ptr> & thread_waiters() {
static thread_local std::vector<waiter_ptr> the_waiters;
return the_waiters;
}
void auth::auth::schedule_when_up(scheduled_func f) {
logger.trace("Adding scheduled task");
auto & waiters = thread_waiters();
waiters.emplace_back(std::make_unique<waiter>());
auto* w = waiters.back().get();
w->done.get_future().finally([w] {
auto & waiters = thread_waiters();
auto i = std::find_if(waiters.begin(), waiters.end(), [w](const waiter_ptr& p) {
return p.get() == w;
});
if (i != waiters.end()) {
waiters.erase(i);
}
}).then([f = std::move(f)] {
logger.trace("Running scheduled task");
return f();
}).handle_exception([](auto ep) {
return make_ready_future();
});
}
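The cleanup step inside `schedule_when_up()` looks up an owning pointer by the raw address captured earlier and erases it. A minimal standard-library sketch of that pattern follows; `task`, `task_ptr` and `remove_by_address` are hypothetical names chosen for illustration, and the Seastar promise and timer machinery is deliberately left out.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for the waiter object owned by the vector.
struct task {
    int id;
};
using task_ptr = std::unique_ptr<task>;

// Erase the element whose owned object lives at `target`.
// Returns true if an element was found and destroyed.
bool remove_by_address(std::vector<task_ptr>& tasks, const task* target) {
    auto it = std::find_if(tasks.begin(), tasks.end(), [target](const task_ptr& p) {
        return p.get() == target;
    });
    if (it == tasks.end()) {
        return false; // already removed, e.g. by a shutdown-time clear()
    }
    tasks.erase(it); // destroys the owned task
    return true;
}

// Small self-check: remove one of two tasks by address.
void remove_by_address_example() {
    std::vector<task_ptr> tasks;
    tasks.push_back(std::make_unique<task>(task{1}));
    tasks.push_back(std::make_unique<task>(task{2}));
    const task* first = tasks.front().get();
    assert(remove_by_address(tasks, first));
    assert(tasks.size() == 1 && tasks.front()->id == 2);
}
```

The not-found branch matters in the original because `shutdown()` may clear the per-thread vector before the continuation runs.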
bool auth::auth::is_class_type(const sstring& type, const sstring& classname) {
if (type == classname) {
return true;
}
auto i = classname.find_last_of('.');
return classname.compare(i + 1, sstring::npos, type) == 0;
}
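As a sketch of what `is_class_type()` accepts, the version below uses `std::string` in place of `sstring` (an assumption made so the example is self-contained) and makes the "no dot in `classname`" case explicit. In the original, `find_last_of` returning `npos` makes `i + 1` wrap to 0, so the whole string is compared and the result is the same.

```cpp
#include <cassert>
#include <string>

// Standalone illustration: a configured name matches a class either by its
// fully qualified Java name or by its unqualified (last-component) name.
bool is_class_type(const std::string& type, const std::string& classname) {
    if (type == classname) {
        return true;
    }
    auto dot = classname.find_last_of('.');
    if (dot == std::string::npos) {
        return false; // no package prefix, so only an exact match could succeed
    }
    return classname.compare(dot + 1, std::string::npos, type) == 0;
}

// Small self-check using the authenticator name that appears in this diff.
void is_class_type_example() {
    const std::string full = "org.apache.cassandra.auth.AllowAllAuthenticator";
    assert(is_class_type(full, full));
    assert(is_class_type("AllowAllAuthenticator", full));
    assert(!is_class_type("PasswordAuthenticator", full));
}
```

This is why a configuration value may name either `AllowAllAuthenticator` or the full `org.apache.cassandra.auth.AllowAllAuthenticator`.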
future<> auth::auth::setup() {
auto& db = cql3::get_local_query_processor().db().local();
auto& cfg = db.get_config();
future<> f = perm_cache.start();
if (is_class_type(cfg.authenticator(),
authenticator::ALLOW_ALL_AUTHENTICATOR_NAME)
&& is_class_type(cfg.authorizer(),
authorizer::ALLOW_ALL_AUTHORIZER_NAME)
) {
// just create the objects
return f.then([&cfg] {
return authenticator::setup(cfg.authenticator());
}).then([&cfg] {
return authorizer::setup(cfg.authorizer());
});
}
if (!db.has_keyspace(AUTH_KS)) {
std::map<sstring, sstring> opts;
opts["replication_factor"] = "1";
auto ksm = keyspace_metadata::new_keyspace(AUTH_KS, "org.apache.cassandra.locator.SimpleStrategy", opts, true);
// We use min_timestamp so that the default keyspace metadata will lose to any manual adjustments. See issue #2129.
f = service::get_local_migration_manager().announce_new_keyspace(ksm, api::min_timestamp, false);
}
return f.then([] {
return setup_table(USERS_CF, sprint("CREATE TABLE %s.%s (%s text, %s boolean, PRIMARY KEY(%s)) WITH gc_grace_seconds=%d",
AUTH_KS, USERS_CF, USER_NAME, SUPER, USER_NAME,
90 * 24 * 60 * 60)); // 3 months.
}).then([&cfg] {
return authenticator::setup(cfg.authenticator());
}).then([&cfg] {
return authorizer::setup(cfg.authorizer());
}).then([] {
service::get_local_migration_manager().register_listener(&auth_migration); // again, only one shard...
// instead of once-timer, just schedule this later
schedule_when_up([] {
// setup default super user
return has_existing_users(USERS_CF, DEFAULT_SUPERUSER_NAME, USER_NAME).then([](bool exists) {
if (!exists) {
auto query = sprint("INSERT INTO %s.%s (%s, %s) VALUES (?, ?) USING TIMESTAMP 0",
AUTH_KS, USERS_CF, USER_NAME, SUPER);
cql3::get_local_query_processor().process(query, db::consistency_level::ONE, {DEFAULT_SUPERUSER_NAME, true}).then([](auto) {
logger.info("Created default superuser '{}'", DEFAULT_SUPERUSER_NAME);
}).handle_exception([](auto ep) {
try {
std::rethrow_exception(ep);
} catch (exceptions::request_execution_exception&) {
logger.warn("Skipped default superuser setup: some nodes were not ready");
}
});
}
});
});
});
}
future<> auth::auth::shutdown() {
// just make sure we don't have pending tasks.
// this is mostly relevant for test cases where
// db-env-shutdown != process shutdown
return smp::invoke_on_all([] {
thread_waiters().clear();
}).then([] {
return perm_cache.stop();
});
}
future<auth::permission_set> auth::auth::get_permissions(::shared_ptr<authenticated_user> user, data_resource resource) {
return perm_cache.local().get(std::move(user), std::move(resource));
}
static db::consistency_level consistency_for_user(const sstring& username) {
if (username == auth::auth::DEFAULT_SUPERUSER_NAME) {
return db::consistency_level::QUORUM;
}
return db::consistency_level::LOCAL_ONE;
}
static future<::shared_ptr<cql3::untyped_result_set>> select_user(const sstring& username) {
// There used to be a thread-local, explicit cache of the prepared statement here. In normal
// execution that is fine, but since in testing we set up and tear down the system over and
// over, we would start using obsolete prepared statements pretty quickly.
// Rely on the query processor caching statements instead, and let's assume
// that a string->statement map lookup is not going to cost us much.
return cql3::get_local_query_processor().process(
sprint("SELECT * FROM %s.%s WHERE %s = ?",
auth::auth::AUTH_KS, auth::auth::USERS_CF,
USER_NAME), consistency_for_user(username),
{ username }, true);
}
future<bool> auth::auth::is_existing_user(const sstring& username) {
return select_user(username).then(
[](::shared_ptr<cql3::untyped_result_set> res) {
return make_ready_future<bool>(!res->empty());
});
}
future<bool> auth::auth::is_super_user(const sstring& username) {
return select_user(username).then(
[](::shared_ptr<cql3::untyped_result_set> res) {
return make_ready_future<bool>(!res->empty() && res->one().get_as<bool>(SUPER));
});
}
future<> auth::auth::insert_user(const sstring& username, bool is_super)
throw (exceptions::request_execution_exception) {
return cql3::get_local_query_processor().process(sprint("INSERT INTO %s.%s (%s, %s) VALUES (?, ?)",
AUTH_KS, USERS_CF, USER_NAME, SUPER),
consistency_for_user(username), { username, is_super }).discard_result();
}
future<> auth::auth::delete_user(const sstring& username) throw(exceptions::request_execution_exception) {
return cql3::get_local_query_processor().process(sprint("DELETE FROM %s.%s WHERE %s = ?",
AUTH_KS, USERS_CF, USER_NAME),
consistency_for_user(username), { username }).discard_result();
}
future<> auth::auth::setup_table(const sstring& name, const sstring& cql) {
auto& qp = cql3::get_local_query_processor();
auto& db = qp.db().local();
if (db.has_schema(AUTH_KS, name)) {
return make_ready_future();
}
::shared_ptr<cql3::statements::raw::cf_statement> parsed = static_pointer_cast<
cql3::statements::raw::cf_statement>(cql3::query_processor::parse_statement(cql));
parsed->prepare_keyspace(AUTH_KS);
::shared_ptr<cql3::statements::create_table_statement> statement =
static_pointer_cast<cql3::statements::create_table_statement>(
parsed->prepare(db, qp.get_cql_stats())->statement);
auto schema = statement->get_cf_meta_data();
auto uuid = generate_legacy_id(schema->ks_name(), schema->cf_name());
schema_builder b(schema);
b.set_uuid(uuid);
return service::get_local_migration_manager().announce_new_column_family(b.build(), false);
}
future<bool> auth::auth::has_existing_users(const sstring& cfname, const sstring& def_user_name, const sstring& name_column) {
auto default_user_query = sprint("SELECT * FROM %s.%s WHERE %s = ?", AUTH_KS, cfname, name_column);
auto all_users_query = sprint("SELECT * FROM %s.%s LIMIT 1", AUTH_KS, cfname);
return cql3::get_local_query_processor().process(default_user_query, db::consistency_level::ONE, { def_user_name }).then([=](::shared_ptr<cql3::untyped_result_set> res) {
if (!res->empty()) {
return make_ready_future<bool>(true);
}
return cql3::get_local_query_processor().process(default_user_query, db::consistency_level::QUORUM, { def_user_name }).then([all_users_query](::shared_ptr<cql3::untyped_result_set> res) {
if (!res->empty()) {
return make_ready_future<bool>(true);
}
return cql3::get_local_query_processor().process(all_users_query, db::consistency_level::QUORUM).then([](::shared_ptr<cql3::untyped_result_set> res) {
return make_ready_future<bool>(!res->empty());
});
});
});
}
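
The nested continuations in `has_existing_users()` above amount to a three-step escalation: probe for the default user at `ONE`, retry the same query at `QUORUM`, then look for any row at `QUORUM`. The following is a minimal synchronous sketch of that control flow only. The names `consistency`, `row_probe` and `has_existing_users_sketch` are hypothetical stand-ins, not part of the Scylla code base, and the real function returns a `future<bool>`.

```cpp
#include <functional>
#include <vector>

// Hypothetical stand-in for db::consistency_level (assumption, not the real type).
enum class consistency { ONE, QUORUM };

// A probe runs one query at the given consistency and reports whether any row came back.
using row_probe = std::function<bool(consistency)>;

// Mirrors the nesting of has_existing_users(): stop at the first probe that finds a row.
// `trace`, if given, records the consistency level of every probe that was issued.
bool has_existing_users_sketch(const row_probe& default_user,
                               const row_probe& any_user,
                               std::vector<consistency>* trace = nullptr) {
    auto run = [&](const row_probe& q, consistency cl) {
        if (trace) {
            trace->push_back(cl);
        }
        return q(cl);
    };
    if (run(default_user, consistency::ONE)) {      // cheapest read first
        return true;
    }
    if (run(default_user, consistency::QUORUM)) {   // same query, stronger read
        return true;
    }
    return run(any_user, consistency::QUORUM);      // any user at all (LIMIT 1)
}
```

The ordering presumably keeps the common case (the default user is already visible locally) down to a single cheap read, and pays for quorum reads only when that first probe comes back empty.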

124
auth/auth.hh Normal file

@@ -0,0 +1,124 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright (C) 2016 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <chrono>
#include <seastar/core/sstring.hh>
#include <seastar/core/future.hh>
#include <seastar/core/shared_ptr.hh>
#include "exceptions/exceptions.hh"
#include "permission.hh"
#include "data_resource.hh"
namespace auth {
class authenticated_user;
class auth {
public:
class permissions_cache;
static const sstring DEFAULT_SUPERUSER_NAME;
static const sstring AUTH_KS;
static const sstring USERS_CF;
static const std::chrono::milliseconds SUPERUSER_SETUP_DELAY;
static bool is_class_type(const sstring& type, const sstring& classname);
static future<permission_set> get_permissions(::shared_ptr<authenticated_user>, data_resource);
/**
* Checks if the username is stored in AUTH_KS.USERS_CF.
*
* @param username Username to query.
* @return whether or not Cassandra knows about the user.
*/
static future<bool> is_existing_user(const sstring& username);
/**
* Checks if the user is a known superuser.
*
* @param username Username to query.
* @return true if the user is a superuser, false if they aren't or don't exist at all.
*/
static future<bool> is_super_user(const sstring& username);
/**
* Inserts the user into AUTH_KS.USERS_CF (or overwrites their superuser status as a result of an ALTER USER query).
*
* @param username Username to insert.
* @param is_super User's new superuser status.
* @throws RequestExecutionException
*/
static future<> insert_user(const sstring& username, bool is_super) throw(exceptions::request_execution_exception);
/**
* Deletes the user from AUTH_KS.USERS_CF.
*
* @param username Username to delete.
* @throws RequestExecutionException
*/
static future<> delete_user(const sstring& username) throw(exceptions::request_execution_exception);
/**
* Sets up Authenticator and Authorizer.
*/
static future<> setup();
static future<> shutdown();
/**
* Sets up a table from the given CREATE TABLE statement under the system_auth keyspace, if it does not already exist.
*
* @param name name of the table
* @param cql CREATE TABLE statement
*/
static future<> setup_table(const sstring& name, const sstring& cql);
static future<bool> has_existing_users(const sstring& cfname, const sstring& def_user_name, const sstring& name_column_name);
// For internal use. Run function "when system is up".
typedef std::function<future<>()> scheduled_func;
static void schedule_when_up(scheduled_func);
};
}


@@ -39,30 +39,34 @@
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "auth/authenticated_user.hh"
#include <iostream>
#include "authenticated_user.hh"
#include "auth.hh"
namespace auth {
const sstring auth::authenticated_user::ANONYMOUS_USERNAME("anonymous");
authenticated_user::authenticated_user(std::string_view name)
: name(sstring(name)) {
auth::authenticated_user::authenticated_user()
: _anon(true)
{}
auth::authenticated_user::authenticated_user(sstring name)
: _name(name), _anon(false)
{}
auth::authenticated_user::authenticated_user(authenticated_user&&) = default;
auth::authenticated_user::authenticated_user(const authenticated_user&) = default;
const sstring& auth::authenticated_user::name() const {
return _anon ? ANONYMOUS_USERNAME : _name;
}
std::ostream& operator<<(std::ostream& os, const authenticated_user& u) {
if (!u.name) {
os << "anonymous";
} else {
os << *u.name;
future<bool> auth::authenticated_user::is_super() const {
if (is_anonymous()) {
return make_ready_future<bool>(false);
}
return os;
}
static const authenticated_user the_anonymous_user{};
const authenticated_user& anonymous_user() noexcept {
return the_anonymous_user;
return auth::auth::is_super_user(_name);
}
bool auth::authenticated_user::operator==(const authenticated_user& v) const {
return _anon ? v._anon : _name == v._name;
}
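
To make the comparison rule of the `_anon`/`_name` variant of `authenticated_user` easier to see, here is a minimal standalone sketch. `user_sketch` is a hypothetical class that uses `std::string` in place of `sstring` and copies only the constructors, `name()`, `is_anonymous()` and `operator==` shown above. It is an illustration, not the Scylla class.

```cpp
#include <string>
#include <utility>

// Hypothetical mirror of the _anon/_name variant of authenticated_user.
class user_sketch {
public:
    user_sketch() : _anon(true) {}
    explicit user_sketch(std::string name) : _name(std::move(name)), _anon(false) {}

    // The user name, or "anonymous" for an anonymous user.
    const std::string& name() const {
        static const std::string anonymous("anonymous");
        return _anon ? anonymous : _name;
    }

    bool is_anonymous() const {
        return _anon;
    }

    // Same expression as in the diff: an anonymous left-hand side only equals
    // another anonymous user; a named left-hand side compares the stored names.
    bool operator==(const user_sketch& v) const {
        return _anon ? v._anon : _name == v._name;
    }

private:
    std::string _name;
    bool _anon;
};
```

One property worth noting in this sketch: because a named left-hand side compares only the stored names, `user_sketch("") == user_sketch()` is true while `user_sketch() == user_sketch("")` is false. Whether an empty user name can ever reach the real class is not visible in this diff.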


@@ -41,62 +41,42 @@
#pragma once
#include <string_view>
#include <functional>
#include <iosfwd>
#include <optional>
#include <seastar/core/sstring.hh>
#include "seastarx.hh"
#include <seastar/core/future.hh>
namespace auth {
///
/// A type-safe wrapper for the name of a logged-in user, or a nameless (anonymous) user.
///
class authenticated_user final {
class authenticated_user {
public:
///
/// An anonymous user has no name.
///
std::optional<sstring> name{};
static const sstring ANONYMOUS_USERNAME;
///
/// An anonymous user.
///
authenticated_user() = default;
explicit authenticated_user(std::string_view name);
};
authenticated_user();
authenticated_user(sstring name);
authenticated_user(authenticated_user&&);
authenticated_user(const authenticated_user&);
///
/// The user name, or "anonymous".
///
std::ostream& operator<<(std::ostream&, const authenticated_user&);
const sstring& name() const;
inline bool operator==(const authenticated_user& u1, const authenticated_user& u2) noexcept {
return u1.name == u2.name;
}
/**
* Checks the user's superuser status.
* Only a superuser is allowed to perform CREATE USER and DROP USER queries.
* In most cases, though not necessarily, a superuser will have Permission.ALL on every resource
* (depends on IAuthorizer implementation).
*/
future<bool> is_super() const;
inline bool operator!=(const authenticated_user& u1, const authenticated_user& u2) noexcept {
return !(u1 == u2);
}
const authenticated_user& anonymous_user() noexcept;
inline bool is_anonymous(const authenticated_user& u) noexcept {
return u == anonymous_user();
}
}
namespace std {
template <>
struct hash<auth::authenticated_user> final {
size_t operator()(const auth::authenticated_user &u) const {
return std::hash<std::optional<sstring>>()(u.name);
/**
* If IAuthenticator doesn't require authentication, this method may return true.
*/
bool is_anonymous() const {
return _anon;
}
bool operator==(const authenticated_user&) const;
private:
sstring _name;
bool _anon;
};
}


@@ -1,37 +0,0 @@
/*
* Copyright (C) 2018 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "auth/authentication_options.hh"
#include <iostream>
namespace auth {
std::ostream& operator<<(std::ostream& os, authentication_option a) {
switch (a) {
case authentication_option::password: os << "PASSWORD"; break;
case authentication_option::options: os << "OPTIONS"; break;
}
return os;
}
}


@@ -1,64 +0,0 @@
/*
* Copyright (C) 2018 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <iosfwd>
#include <optional>
#include <stdexcept>
#include <unordered_map>
#include <unordered_set>
#include <seastar/core/print.hh>
#include <seastar/core/sstring.hh>
#include "seastarx.hh"
namespace auth {
enum class authentication_option {
password,
options
};
std::ostream& operator<<(std::ostream&, authentication_option);
using authentication_option_set = std::unordered_set<authentication_option>;
using custom_options = std::unordered_map<sstring, sstring>;
struct authentication_options final {
std::optional<sstring> password;
std::optional<custom_options> options;
};
inline bool any_authentication_options(const authentication_options& aos) noexcept {
return aos.password || aos.options;
}
class unsupported_authentication_option : public std::invalid_argument {
public:
explicit unsupported_authentication_option(authentication_option k)
: std::invalid_argument(format("The {} option is not supported.", k)) {
}
};
}


@@ -39,13 +39,89 @@
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "auth/authenticator.hh"
#include "auth/authenticated_user.hh"
#include "auth/common.hh"
#include "auth/password_authenticator.hh"
#include "cql3/query_processor.hh"
#include "utils/class_registrator.hh"
#include "authenticator.hh"
#include "authenticated_user.hh"
#include "password_authenticator.hh"
#include "auth.hh"
#include "db/config.hh"
const sstring auth::authenticator::USERNAME_KEY("username");
const sstring auth::authenticator::PASSWORD_KEY("password");
const sstring auth::authenticator::ALLOW_ALL_AUTHENTICATOR_NAME("org.apache.cassandra.auth.AllowAllAuthenticator");
auth::authenticator::option auth::authenticator::string_to_option(const sstring& name) {
if (strcasecmp(name.c_str(), "password") == 0) {
return option::PASSWORD;
}
throw std::invalid_argument(name);
}
sstring auth::authenticator::option_to_string(option opt) {
switch (opt) {
case option::PASSWORD:
return "PASSWORD";
default:
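// sprint() takes printf-style directives, so "{}" would be emitted literally. Formatting the
// enum through operator<< would also re-enter option_to_string(), so print the numeric value.
throw std::invalid_argument(sprint("Unknown option %d", static_cast<int>(opt)));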
}
}
/**
* Authenticator is assumed to be a fully state-less immutable object (note all the const).
* We thus store a single instance globally, since it should be safe/ok.
*/
static std::unique_ptr<auth::authenticator> global_authenticator;
future<>
auth::authenticator::setup(const sstring& type) throw (exceptions::configuration_exception) {
if (auth::auth::is_class_type(type, ALLOW_ALL_AUTHENTICATOR_NAME)) {
class allow_all_authenticator : public authenticator {
public:
const sstring& class_name() const override {
return ALLOW_ALL_AUTHENTICATOR_NAME;
}
bool require_authentication() const override {
return false;
}
option_set supported_options() const override {
return option_set();
}
option_set alterable_options() const override {
return option_set();
}
future<::shared_ptr<authenticated_user>> authenticate(const credentials_map& credentials) const throw(exceptions::authentication_exception) override {
return make_ready_future<::shared_ptr<authenticated_user>>(::make_shared<authenticated_user>());
}
future<> create(sstring username, const option_map& options) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) override {
return make_ready_future();
}
future<> alter(sstring username, const option_map& options) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) override {
return make_ready_future();
}
future<> drop(sstring username) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) override {
return make_ready_future();
}
const resource_ids& protected_resources() const override {
static const resource_ids ids;
return ids;
}
::shared_ptr<sasl_challenge> new_sasl_challenge() const override {
throw std::runtime_error("Should not reach");
}
};
global_authenticator = std::make_unique<allow_all_authenticator>();
} else if (auth::auth::is_class_type(type, password_authenticator::PASSWORD_AUTHENTICATOR_NAME)) {
auto pwa = std::make_unique<password_authenticator>();
auto f = pwa->init();
return f.then([pwa = std::move(pwa)]() mutable {
global_authenticator = std::move(pwa);
});
} else {
throw exceptions::configuration_exception("Invalid authenticator type: " + type);
}
return make_ready_future();
}
auth::authenticator& auth::authenticator::get() {
assert(global_authenticator);
return *global_authenticator;
}
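
The option parsing in authenticator.cc is case-insensitive on input and canonical (upper-case) on output. Below is a minimal sketch of that round trip using only the standard library. The `iequals` helper is an assumption that replaces the POSIX `strcasecmp` call, and `std::string` replaces `sstring`. The free functions mirror `authenticator::string_to_option` and `authenticator::option_to_string` but are not the originals.

```cpp
#include <algorithm>
#include <cctype>
#include <stdexcept>
#include <string>

enum class option { PASSWORD };

// Hypothetical helper: ASCII case-insensitive string equality.
inline bool iequals(const std::string& a, const std::string& b) {
    return a.size() == b.size() &&
        std::equal(a.begin(), a.end(), b.begin(), [](unsigned char x, unsigned char y) {
            return std::tolower(x) == std::tolower(y);
        });
}

// Accepts "password" in any letter case; anything else is rejected.
inline option string_to_option(const std::string& name) {
    if (iequals(name, "password")) {
        return option::PASSWORD;
    }
    throw std::invalid_argument(name);
}

// Always yields the canonical upper-case spelling.
inline std::string option_to_string(option opt) {
    switch (opt) {
    case option::PASSWORD:
        return "PASSWORD";
    }
    // Only reachable for a value outside the enumerators; report it numerically.
    throw std::invalid_argument("Unknown option " + std::to_string(static_cast<int>(opt)));
}
```

A caller would typically normalise user input with `option_to_string(string_to_option(input))` and treat `std::invalid_argument` as an unsupported option.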


@@ -41,22 +41,19 @@
#pragma once
#include <string_view>
#include <memory>
#include <unordered_map>
#include <set>
#include <stdexcept>
#include <unordered_map>
#include <boost/any.hpp>
#include <seastar/core/enum.hh>
#include <seastar/core/future.hh>
#include <seastar/core/sstring.hh>
#include <seastar/core/shared_ptr.hh>
#include "auth/authentication_options.hh"
#include "auth/resource.hh"
#include "auth/sasl_challenge.hh"
#include <seastar/core/sstring.hh>
#include <seastar/core/future.hh>
#include <seastar/core/shared_ptr.hh>
#include <seastar/core/enum.hh>
#include "bytes.hh"
#include "data_resource.hh"
#include "enum_set.hh"
#include "exceptions/exceptions.hh"
@@ -68,90 +65,136 @@ namespace auth {
class authenticated_user;
///
/// Abstract client for authenticating role identity.
///
/// All state necessary to authorize a role is stored externally to the client instance.
///
class authenticator {
public:
///
/// The name of the key to be used for the user-name part of password authentication with \ref authenticate.
///
static const sstring USERNAME_KEY;
///
/// The name of the key to be used for the password part of password authentication with \ref authenticate.
///
static const sstring PASSWORD_KEY;
static const sstring ALLOW_ALL_AUTHENTICATOR_NAME;
/**
* Supported CREATE USER/ALTER USER options.
* Currently only PASSWORD is available.
*/
enum class option {
PASSWORD
};
static option string_to_option(const sstring&);
static sstring option_to_string(option);
using option_set = enum_set<super_enum<option, option::PASSWORD>>;
using option_map = std::unordered_map<option, boost::any, enum_hash<option>>;
using credentials_map = std::unordered_map<sstring, sstring>;
virtual ~authenticator() = default;
/**
* Setup is called once upon system startup to initialize the IAuthenticator.
*
* For example, use this method to create any required keyspaces/column families.
* Note: Only call from main thread.
*/
static future<> setup(const sstring& type) throw(exceptions::configuration_exception);
virtual future<> start() = 0;
/**
* Returns the system authenticator. Must have called setup before calling this.
*/
static authenticator& get();
virtual future<> stop() = 0;
virtual ~authenticator()
{}
///
/// A fully-qualified (class with package) Java-like name for this implementation.
///
virtual std::string_view qualified_java_name() const = 0;
virtual const sstring& class_name() const = 0;
/**
* Whether or not the authenticator requires explicit login.
* If false will instantiate user with AuthenticatedUser.ANONYMOUS_USER.
*/
virtual bool require_authentication() const = 0;
virtual authentication_option_set supported_options() const = 0;
/**
* Set of options supported by CREATE USER and ALTER USER queries.
* Should never return null - always return an empty set instead.
*/
virtual option_set supported_options() const = 0;
///
/// A subset of `supported_options()` that users are permitted to alter for themselves.
///
virtual authentication_option_set alterable_options() const = 0;
/**
* Subset of supportedOptions that users are allowed to alter when performing ALTER USER [themselves].
* Should never return null - always return an empty set instead.
*/
virtual option_set alterable_options() const = 0;
///
/// Authenticate a user given implementation-specific credentials.
///
/// If this implementation does not require authentication (\ref require_authentication), an anonymous user may
/// result.
///
/// \returns an exceptional future with \ref exceptions::authentication_exception if given invalid credentials.
///
virtual future<authenticated_user> authenticate(const credentials_map& credentials) const = 0;
/**
* Authenticates a user given a Map<String, String> of credentials.
* Should never return null - always throw AuthenticationException instead.
* Returning AuthenticatedUser.ANONYMOUS_USER is an option as well if authentication is not required.
*
* @throws authentication_exception if credentials don't match any known user.
*/
virtual future<::shared_ptr<authenticated_user>> authenticate(const credentials_map& credentials) const throw(exceptions::authentication_exception) = 0;
///
/// Create an authentication record for a new user. This is required before the user can log-in.
///
/// The options provided must be a subset of `supported_options()`.
///
virtual future<> create(std::string_view role_name, const authentication_options& options) const = 0;
/**
* Called during execution of CREATE USER query (also may be called on startup, see seedSuperuserOptions method).
* If authenticator is static then the body of the method should be left blank, but don't throw an exception.
* options are guaranteed to be a subset of supportedOptions().
*
* @param username Username of the user to create.
* @param options Options the user will be created with.
* @throws exceptions::request_validation_exception
* @throws exceptions::request_execution_exception
*/
virtual future<> create(sstring username, const option_map& options) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) = 0;
///
/// Alter the authentication record of an existing user.
///
/// The options provided must be a subset of `supported_options()`.
///
/// Callers must ensure that the specification of `alterable_options()` is adhered to.
///
virtual future<> alter(std::string_view role_name, const authentication_options& options) const = 0;
/**
* Called during execution of ALTER USER query.
* options are always guaranteed to be a subset of supportedOptions(). Furthermore, if the user performing the query
* is not a superuser and is altering himself, then options are guaranteed to be a subset of alterableOptions().
* Keep the body of the method blank if your implementation doesn't support any options.
*
* @param username Username of the user that will be altered.
* @param options Options to alter.
* @throws exceptions::request_validation_exception
* @throws exceptions::request_execution_exception
*/
virtual future<> alter(sstring username, const option_map& options) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) = 0;
///
/// Delete the authentication record for a user. This will disallow the user from logging in.
///
virtual future<> drop(std::string_view role_name) const = 0;
///
/// Query for custom options (those corresponding to \ref authentication_options::options).
///
/// If no options are set the result is an empty container.
///
virtual future<custom_options> query_custom_options(std::string_view role_name) const = 0;
/**
* Called during execution of DROP USER query.
*
* @param username Username of the user that will be dropped.
* @throws exceptions::request_validation_exception
* @throws exceptions::request_execution_exception
*/
virtual future<> drop(sstring username) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) = 0;
///
/// System resources used internally as part of the implementation. These are made inaccessible to users.
///
virtual const resource_set& protected_resources() const = 0;
/**
* Set of resources that should be made inaccessible to users and only accessible internally.
*
* @return Keyspaces, column families that will be unmodifiable by users; other resources.
* @see resource_ids
*/
virtual const resource_ids& protected_resources() const = 0;
class sasl_challenge {
public:
virtual ~sasl_challenge() {}
virtual bytes evaluate_response(bytes_view client_response) throw(exceptions::authentication_exception) = 0;
virtual bool is_complete() const = 0;
virtual future<::shared_ptr<authenticated_user>> get_authenticated_user() const throw(exceptions::authentication_exception) = 0;
};
/**
* Provide a sasl_challenge to be used by the CQL binary protocol server. If
* the configured authenticator requires authentication but does not implement this
* interface we refuse to start the binary protocol server as it will have no way
* of authenticating clients.
* @return sasl_challenge implementation
*/
virtual ::shared_ptr<sasl_challenge> new_sasl_challenge() const = 0;
};
inline std::ostream& operator<<(std::ostream& os, authenticator::option opt) {
return os << authenticator::option_to_string(opt);
}
}

104
auth/authorizer.cc Normal file

@@ -0,0 +1,104 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright (C) 2016 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "authorizer.hh"
#include "authenticated_user.hh"
#include "default_authorizer.hh"
#include "auth.hh"
#include "db/config.hh"
const sstring auth::authorizer::ALLOW_ALL_AUTHORIZER_NAME("org.apache.cassandra.auth.AllowAllAuthorizer");
/**
* Authorizer is assumed to be a fully state-less immutable object (note all the const).
* We thus store a single instance globally, since it should be safe/ok.
*/
static std::unique_ptr<auth::authorizer> global_authorizer;
future<>
auth::authorizer::setup(const sstring& type) {
if (auth::auth::is_class_type(type, ALLOW_ALL_AUTHORIZER_NAME)) {
class allow_all_authorizer : public authorizer {
public:
future<permission_set> authorize(::shared_ptr<authenticated_user>, data_resource) const override {
return make_ready_future<permission_set>(permissions::ALL);
}
future<> grant(::shared_ptr<authenticated_user>, permission_set, data_resource, sstring) override {
throw exceptions::invalid_request_exception("GRANT operation is not supported by AllowAllAuthorizer");
}
future<> revoke(::shared_ptr<authenticated_user>, permission_set, data_resource, sstring) override {
throw exceptions::invalid_request_exception("REVOKE operation is not supported by AllowAllAuthorizer");
}
future<std::vector<permission_details>> list(::shared_ptr<authenticated_user> performer, permission_set, optional<data_resource>, optional<sstring>) const override {
throw exceptions::invalid_request_exception("LIST PERMISSIONS operation is not supported by AllowAllAuthorizer");
}
future<> revoke_all(sstring dropped_user) override {
return make_ready_future();
}
future<> revoke_all(data_resource) override {
return make_ready_future();
}
const resource_ids& protected_resources() override {
static const resource_ids ids;
return ids;
}
future<> validate_configuration() const override {
return make_ready_future();
}
};
global_authorizer = std::make_unique<allow_all_authorizer>();
} else if (auth::auth::is_class_type(type, default_authorizer::DEFAULT_AUTHORIZER_NAME)) {
auto da = std::make_unique<default_authorizer>();
auto f = da->init();
return f.then([da = std::move(da)]() mutable {
global_authorizer = std::move(da);
});
} else {
throw exceptions::configuration_exception("Invalid authorizer type: " + type);
}
return make_ready_future();
}
auth::authorizer& auth::authorizer::get() {
assert(global_authorizer);
return *global_authorizer;
}
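
`authorizer::setup()` and `authenticator::setup()` share one pattern: pick an implementation by configured class name, install it in a file-local `std::unique_ptr`, and have `get()` assert that setup has run. A minimal synchronous sketch of that pattern follows. Every name ending in `_sketch` is hypothetical, the string for the default implementation is made up for illustration, and exact string equality stands in for `auth::auth::is_class_type`, whose matching rules are not shown in this diff.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical, much reduced interface; the real authorizer has many more methods.
struct authorizer_sketch {
    virtual ~authorizer_sketch() = default;
    virtual bool supports_grant() const = 0;
};

struct allow_all_sketch : authorizer_sketch {
    bool supports_grant() const override { return false; }  // GRANT is rejected
};

struct default_sketch : authorizer_sketch {
    bool supports_grant() const override { return true; }
};

// Single instance per process, as in the diff.
static std::unique_ptr<authorizer_sketch> global_authorizer_sketch;

// Selects the implementation by class name; unknown names are a configuration error.
void setup_sketch(const std::string& type) {
    if (type == "org.apache.cassandra.auth.AllowAllAuthorizer") {
        global_authorizer_sketch = std::make_unique<allow_all_sketch>();
    } else if (type == "example.DefaultAuthorizer") {  // made-up name (assumption)
        global_authorizer_sketch = std::make_unique<default_sketch>();
    } else {
        throw std::invalid_argument("Invalid authorizer type: " + type);
    }
}

// Must only be called after a successful setup_sketch().
authorizer_sketch& get_sketch() {
    assert(global_authorizer_sketch != nullptr);
    return *global_authorizer_sketch;
}
```

In the sketch, a failed `setup_sketch()` leaves the previously installed instance untouched because the global is assigned only on the successful branches. The code in the diff has the same shape.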


@@ -41,115 +41,131 @@
#pragma once
#include <string_view>
#include <functional>
#include <optional>
#include <stdexcept>
#include <tuple>
#include <vector>
#include <tuple>
#include <experimental/optional>
#include <seastar/core/future.hh>
#include <seastar/core/shared_ptr.hh>
#include "auth/permission.hh"
#include "auth/resource.hh"
#include "seastarx.hh"
#include "permission.hh"
#include "data_resource.hh"
namespace auth {
class role_or_anonymous;
class authenticated_user;
struct permission_details {
sstring role_name;
::auth::resource resource;
sstring user;
data_resource resource;
permission_set permissions;
bool operator<(const permission_details& v) const {
return std::tie(user, resource, permissions) < std::tie(v.user, v.resource, v.permissions);
}
};
inline bool operator==(const permission_details& pd1, const permission_details& pd2) {
return std::forward_as_tuple(pd1.role_name, pd1.resource, pd1.permissions.mask())
== std::forward_as_tuple(pd2.role_name, pd2.resource, pd2.permissions.mask());
}
using std::experimental::optional;
inline bool operator!=(const permission_details& pd1, const permission_details& pd2) {
return !(pd1 == pd2);
}
inline bool operator<(const permission_details& pd1, const permission_details& pd2) {
return std::forward_as_tuple(pd1.role_name, pd1.resource, pd1.permissions)
< std::forward_as_tuple(pd2.role_name, pd2.resource, pd2.permissions);
}
class unsupported_authorization_operation : public std::invalid_argument {
public:
using std::invalid_argument::invalid_argument;
};
///
/// Abstract client for authorizing roles to access resources.
///
/// All state necessary to authorize a role is stored externally to the client instance.
///
class authorizer {
public:
virtual ~authorizer() = default;
static const sstring ALLOW_ALL_AUTHORIZER_NAME;
virtual future<> start() = 0;
virtual ~authorizer() {}
virtual future<> stop() = 0;
/**
* The primary Authorizer method. Returns a set of permissions of a user on a resource.
*
* @param user Authenticated user requesting authorization.
* @param resource Resource for which the authorization is being requested. @see DataResource.
* @return Set of permissions of the user on the resource. Should never return empty. Use permission.NONE instead.
*/
virtual future<permission_set> authorize(::shared_ptr<authenticated_user>, data_resource) const = 0;
///
/// A fully-qualified (class with package) Java-like name for this implementation.
///
virtual std::string_view qualified_java_name() const = 0;
/**
* Grants a set of permissions on a resource to a user.
* The opposite of revoke().
*
* @param performer User who grants the permissions.
* @param permissions Set of permissions to grant.
* @param to Grantee of the permissions.
* @param resource Resource on which to grant the permissions.
*
* @throws RequestValidationException
* @throws RequestExecutionException
*/
virtual future<> grant(::shared_ptr<authenticated_user> performer, permission_set, data_resource, sstring to) = 0;
///
/// Query for the permissions granted directly to a role for a particular \ref resource (and not any of its
/// parents).
///
/// The optional role name is empty when an anonymous user is authorized. Some implementations may still wish to
/// grant default permissions in this case.
///
virtual future<permission_set> authorize(const role_or_anonymous&, const resource&) const = 0;
/**
* Revokes a set of permissions on a resource from a user.
* The opposite of grant().
*
* @param performer User who revokes the permissions.
* @param permissions Set of permissions to revoke.
* @param from Revokee of the permissions.
* @param resource Resource on which to revoke the permissions.
*
* @throws RequestValidationException
* @throws RequestExecutionException
*/
virtual future<> revoke(::shared_ptr<authenticated_user> performer, permission_set, data_resource, sstring from) = 0;
///
/// Grant a set of permissions to a role for a particular \ref resource.
///
/// \throws \ref unsupported_authorization_operation if granting permissions is not supported.
///
virtual future<> grant(std::string_view role_name, permission_set, const resource&) const = 0;
/**
* Returns a list of permissions on a resource of a user.
*
* @param performer User who wants to see the permissions.
* @param permissions Set of Permission values the user is interested in. The result should only include the matching ones.
* @param resource The resource on which permissions are requested. Can be null, in which case permissions on all resources
* should be returned.
* @param of The user whose permissions are requested. Can be null, in which case permissions of every user should be returned.
*
* @return All of the matching permissions that the requesting user is authorized to know about.
*
* @throws RequestValidationException
* @throws RequestExecutionException
*/
virtual future<std::vector<permission_details>> list(::shared_ptr<authenticated_user> performer, permission_set, optional<data_resource>, optional<sstring>) const = 0;
///
/// Revoke a set of permissions from a role for a particular \ref resource.
///
/// \throws \ref unsupported_authorization_operation if revoking permissions is not supported.
///
virtual future<> revoke(std::string_view role_name, permission_set, const resource&) const = 0;
/**
* This method is called before deleting a user with DROP USER query so that a new user with the same
* name wouldn't inherit permissions of the deleted user in the future.
*
* @param dropped_user The user to revoke all permissions from.
*/
virtual future<> revoke_all(sstring dropped_user) = 0;
///
/// Query for all directly granted permissions.
///
/// \throws \ref unsupported_authorization_operation if listing permissions is not supported.
///
virtual future<std::vector<permission_details>> list_all() const = 0;
/**
* This method is called after a resource is removed (i.e. a keyspace or a table is dropped).
*
* @param droppedResource The resource to revoke all permissions on.
*/
virtual future<> revoke_all(data_resource) = 0;
///
/// Revoke all permissions granted directly to a particular role.
///
/// \throws \ref unsupported_authorization_operation if revoking permissions is not supported.
///
virtual future<> revoke_all(std::string_view role_name) const = 0;
/**
* Set of resources that should be made inaccessible to users and only accessible internally.
*
* @return Keyspaces, column families that will be unmodifiable by users; other resources.
*/
virtual const resource_ids& protected_resources() = 0;
///
/// Revoke all permissions granted to any role for a particular resource.
///
/// \throws \ref unsupported_authorization_operation if revoking permissions is not supported.
///
virtual future<> revoke_all(const resource&) const = 0;
/**
* Validates configuration of IAuthorizer implementation (if configurable).
*
* @throws ConfigurationException when there is a configuration error.
*/
virtual future<> validate_configuration() const = 0;
///
/// System resources used internally as part of the implementation. These are made inaccessible to users.
///
virtual const resource_set& protected_resources() const = 0;
/**
* Setup is called once upon system startup to initialize the IAuthorizer.
*
* For example, use this method to create any required keyspaces/column families.
*/
static future<> setup(const sstring& type);
/**
* Returns the system authorizer. Must have called setup before calling this.
*/
static authorizer& get();
};
}


@@ -1,124 +0,0 @@
/*
* Copyright (C) 2017 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "auth/common.hh"
#include <seastar/core/shared_ptr.hh>
#include "cql3/query_processor.hh"
#include "cql3/statements/create_table_statement.hh"
#include "database.hh"
#include "schema_builder.hh"
#include "service/migration_manager.hh"
#include "timeout_config.hh"
namespace auth {
namespace meta {
constexpr std::string_view AUTH_KS("system_auth");
constexpr std::string_view USERS_CF("users");
constexpr std::string_view AUTH_PACKAGE_NAME("org.apache.cassandra.auth.");
}
static logging::logger auth_log("auth");
// Func must support being invoked more than once.
future<> do_after_system_ready(seastar::abort_source& as, seastar::noncopyable_function<future<>()> func) {
struct empty_state { };
return delay_until_system_ready(as).then([&as, func = std::move(func)] () mutable {
return exponential_backoff_retry::do_until_value(1s, 1min, as, [func = std::move(func)] {
return func().then_wrapped([] (auto&& f) -> std::optional<empty_state> {
if (f.failed()) {
auth_log.debug("Auth task failed with error, rescheduling: {}", f.get_exception());
return { };
}
return { empty_state() };
});
});
}).discard_result();
}
static future<> create_metadata_table_if_missing_impl(
std::string_view table_name,
cql3::query_processor& qp,
std::string_view cql,
::service::migration_manager& mm) {
static auto ignore_existing = [] (seastar::noncopyable_function<future<>()> func) {
return futurize_invoke(std::move(func)).handle_exception_type([] (exceptions::already_exists_exception& ignored) { });
};
auto& db = qp.db();
auto parsed_statement = cql3::query_processor::parse_statement(cql);
auto& parsed_cf_statement = static_cast<cql3::statements::raw::cf_statement&>(*parsed_statement);
parsed_cf_statement.prepare_keyspace(meta::AUTH_KS);
auto statement = static_pointer_cast<cql3::statements::create_table_statement>(
parsed_cf_statement.prepare(db, qp.get_cql_stats())->statement);
const auto schema = statement->get_cf_meta_data(qp.db());
const auto uuid = generate_legacy_id(schema->ks_name(), schema->cf_name());
schema_builder b(schema);
b.set_uuid(uuid);
schema_ptr table = b.build();
return ignore_existing([&mm, table = std::move(table)] () {
return mm.announce_new_column_family(table, false);
});
}
future<> create_metadata_table_if_missing(
std::string_view table_name,
cql3::query_processor& qp,
std::string_view cql,
::service::migration_manager& mm) noexcept {
return futurize_invoke(create_metadata_table_if_missing_impl, table_name, qp, cql, mm);
}
future<> wait_for_schema_agreement(::service::migration_manager& mm, const database& db, seastar::abort_source& as) {
static const auto pause = [] { return sleep(std::chrono::milliseconds(500)); };
return do_until([&db, &as] {
as.check();
return db.get_version() != database::empty_version;
}, pause).then([&mm, &as] {
return do_until([&mm, &as] {
as.check();
return mm.have_schema_agreement();
}, pause);
});
}
::service::query_state& internal_distributed_query_state() noexcept {
#ifdef DEBUG
// Give the much slower debug tests more headroom for completing auth queries.
static const auto t = 30s;
#else
static const auto t = 5s;
#endif
static const timeout_config tc{t, t, t, t, t, t, t};
static thread_local ::service::client_state cs(::service::client_state::internal_tag{}, tc);
static thread_local ::service::query_state qs(cs, empty_service_permit());
return qs;
}
}
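
The retry loop in do_after_system_ready() above can be pictured without Seastar. What follows is a minimal standalone sketch, assuming a delay that doubles after each failure and is capped at a maximum. The 1s and 1min bounds come from the call above; the doubling policy, the attempt limit, and the names backoff_result and retry_with_backoff are assumptions made for illustration and are not the exponential_backoff_retry API. Delays are recorded instead of slept so the behaviour is easy to check.

```cpp
#include <algorithm>
#include <chrono>
#include <functional>
#include <vector>

// Hypothetical result type: whether the task eventually succeeded, and the
// delays that would have been waited between attempts.
struct backoff_result {
    bool succeeded = false;
    std::vector<std::chrono::seconds> delays;
};

// Hypothetical helper (not the Seastar/Scylla API): invoke `task` until it
// returns true or `max_attempts` is reached. After each failure the delay
// doubles, capped at `max_delay`. Delays are recorded rather than slept.
inline backoff_result retry_with_backoff(
        std::chrono::seconds base_delay,
        std::chrono::seconds max_delay,
        unsigned max_attempts,
        const std::function<bool()>& task) {
    backoff_result r;
    auto delay = base_delay;
    for (unsigned attempt = 0; attempt < max_attempts; ++attempt) {
        if (task()) {
            r.succeeded = true;
            return r;
        }
        r.delays.push_back(delay);
        delay = std::min(delay * 2, max_delay);
    }
    return r;
}
```

As in the original, the task has to tolerate being invoked more than once, because every failed attempt leads to another call.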


@@ -1,93 +0,0 @@
/*
* Copyright (C) 2017 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <chrono>
#include <string_view>
#include <seastar/core/future.hh>
#include <seastar/core/abort_source.hh>
#include <seastar/util/noncopyable_function.hh>
#include <seastar/core/seastar.hh>
#include <seastar/core/resource.hh>
#include <seastar/core/sstring.hh>
#include <seastar/core/smp.hh>
#include "log.hh"
#include "seastarx.hh"
#include "utils/exponential_backoff_retry.hh"
#include "service/query_state.hh"
using namespace std::chrono_literals;
class database;
class timeout_config;
namespace service {
class migration_manager;
}
namespace cql3 {
class query_processor;
}
namespace auth {
namespace meta {
constexpr std::string_view DEFAULT_SUPERUSER_NAME("cassandra");
extern const std::string_view AUTH_KS;
extern const std::string_view USERS_CF;
extern const std::string_view AUTH_PACKAGE_NAME;
}
template <class Task>
future<> once_among_shards(Task&& f) {
if (this_shard_id() == 0u) {
return f();
}
return make_ready_future<>();
}
inline future<> delay_until_system_ready(seastar::abort_source& as) {
return sleep_abortable(15s, as);
}
// Func must support being invoked more than once.
future<> do_after_system_ready(seastar::abort_source& as, seastar::noncopyable_function<future<>()> func);
future<> create_metadata_table_if_missing(
std::string_view table_name,
cql3::query_processor&,
std::string_view cql,
::service::migration_manager&) noexcept;
future<> wait_for_schema_agreement(::service::migration_manager&, const database&, seastar::abort_source&);
///
/// Time-outs for internal, non-local CQL queries.
///
::service::query_state& internal_distributed_query_state() noexcept;
}

auth/data_resource.cc Normal file

@@ -0,0 +1,173 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright (C) 2016 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "data_resource.hh"
#include <regex>
#include "service/storage_proxy.hh"
const sstring auth::data_resource::ROOT_NAME("data");
auth::data_resource::data_resource(level l, const sstring& ks, const sstring& cf)
: _level(l), _ks(ks), _cf(cf)
{
}
auth::data_resource::data_resource()
: data_resource(level::ROOT)
{}
auth::data_resource::data_resource(const sstring& ks)
: data_resource(level::KEYSPACE, ks)
{}
auth::data_resource::data_resource(const sstring& ks, const sstring& cf)
: data_resource(level::COLUMN_FAMILY, ks, cf)
{}
auth::data_resource::level auth::data_resource::get_level() const {
return _level;
}
auth::data_resource auth::data_resource::from_name(
const sstring& s) {
static std::regex slash_regex("/");
auto i = std::regex_token_iterator<sstring::const_iterator>(s.begin(),
s.end(), slash_regex, -1);
auto e = std::regex_token_iterator<sstring::const_iterator>();
auto n = std::distance(i, e);
if (n > 3 || ROOT_NAME != sstring(*i++)) {
throw std::invalid_argument(sprint("%s is not a valid data resource name", s));
}
if (n == 1) {
return data_resource();
}
auto ks = *i++;
if (n == 2) {
return data_resource(ks.str());
}
auto cf = *i++;
return data_resource(ks.str(), cf.str());
}
sstring auth::data_resource::name() const {
switch (get_level()) {
case level::ROOT:
return ROOT_NAME;
case level::KEYSPACE:
return sprint("%s/%s", ROOT_NAME, _ks);
case level::COLUMN_FAMILY:
default:
return sprint("%s/%s/%s", ROOT_NAME, _ks, _cf);
}
}
auth::data_resource auth::data_resource::get_parent() const {
switch (get_level()) {
case level::KEYSPACE:
return data_resource();
case level::COLUMN_FAMILY:
return data_resource(_ks);
default:
throw std::invalid_argument("Root-level resource can't have a parent");
}
}
const sstring& auth::data_resource::keyspace() const
throw (std::invalid_argument) {
if (is_root_level()) {
throw std::invalid_argument("ROOT data resource has no keyspace");
}
return _ks;
}
const sstring& auth::data_resource::column_family() const
throw (std::invalid_argument) {
if (!is_column_family_level()) {
throw std::invalid_argument(sprint("%s data resource has no column family", name()));
}
return _cf;
}
bool auth::data_resource::has_parent() const {
return !is_root_level();
}
bool auth::data_resource::exists() const {
switch (get_level()) {
case level::ROOT:
return true;
case level::KEYSPACE:
return service::get_local_storage_proxy().get_db().local().has_keyspace(_ks);
case level::COLUMN_FAMILY:
default:
return service::get_local_storage_proxy().get_db().local().has_schema(_ks, _cf);
}
}
sstring auth::data_resource::to_string() const {
switch (get_level()) {
case level::ROOT:
return "<all keyspaces>";
case level::KEYSPACE:
return sprint("<keyspace %s>", _ks);
case level::COLUMN_FAMILY:
default:
return sprint("<table %s.%s>", _ks, _cf);
}
}
bool auth::data_resource::operator==(const data_resource& v) const {
return _ks == v._ks && _cf == v._cf;
}
bool auth::data_resource::operator<(const data_resource& v) const {
return _ks < v._ks ? true : (v._ks < _ks ? false : _cf < v._cf);
}
std::ostream& auth::operator<<(std::ostream& os, const data_resource& r) {
return os << r.to_string();
}
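
The name format handled by data_resource::from_name() above is "data", "data/<keyspace>" or "data/<keyspace>/<table>". The following is a minimal standalone sketch of that parsing, assuming plain std::string splitting in place of std::regex_token_iterator. The names parsed_resource, split_on_slash and parse_data_resource_name are hypothetical stand-ins and not part of the auth module. Edge cases such as trailing slashes may be treated differently from the regex-based original.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical plain-struct stand-in for auth::data_resource.
struct parsed_resource {
    enum class level { root, keyspace, column_family };
    level lvl = level::root;
    std::string ks;
    std::string cf;
};

// Split `s` on '/', keeping empty tokens; always returns at least one element.
inline std::vector<std::string> split_on_slash(const std::string& s) {
    std::vector<std::string> parts;
    std::size_t start = 0;
    while (true) {
        const std::size_t pos = s.find('/', start);
        if (pos == std::string::npos) {
            parts.push_back(s.substr(start));
            return parts;
        }
        parts.push_back(s.substr(start, pos - start));
        start = pos + 1;
    }
}

// Accepts "data", "data/<ks>" or "data/<ks>/<cf>"; anything else is rejected.
inline parsed_resource parse_data_resource_name(const std::string& s) {
    const std::vector<std::string> parts = split_on_slash(s);
    if (parts.size() > 3 || parts[0] != "data") {
        throw std::invalid_argument(s + " is not a valid data resource name");
    }
    parsed_resource r;
    if (parts.size() >= 2) {
        r.lvl = parsed_resource::level::keyspace;
        r.ks = parts[1];
    }
    if (parts.size() == 3) {
        r.lvl = parsed_resource::level::column_family;
        r.cf = parts[2];
    }
    return r;
}
```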

auth/data_resource.hh Normal file

@@ -0,0 +1,158 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright (C) 2016 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "utils/hash.hh"
#include <iosfwd>
#include <set>
#include <seastar/core/sstring.hh>
namespace auth {
class data_resource {
private:
enum class level {
ROOT, KEYSPACE, COLUMN_FAMILY
};
static const sstring ROOT_NAME;
level _level;
sstring _ks;
sstring _cf;
data_resource(level, const sstring& ks = {}, const sstring& cf = {});
level get_level() const;
public:
/**
* Creates a DataResource representing the root-level resource.
* @return the root-level resource.
*/
data_resource();
/**
* Creates a DataResource representing a keyspace.
*
* @param ks Name of the keyspace.
*/
data_resource(const sstring& ks);
/**
* Creates a DataResource instance representing a column family.
*
* @param ks Name of the keyspace.
* @param cf Name of the column family.
*/
data_resource(const sstring& ks, const sstring& cf);
/**
* Parses a data resource name into a DataResource instance.
*
* @param name Name of the data resource.
* @return DataResource instance matching the name.
*/
static data_resource from_name(const sstring&);
/**
* @return Printable name of the resource.
*/
sstring name() const;
/**
* @return Parent of the resource, if any. Throws std::invalid_argument if it's the root-level resource.
*/
data_resource get_parent() const;
bool is_root_level() const {
return get_level() == level::ROOT;
}
bool is_keyspace_level() const {
return get_level() == level::KEYSPACE;
}
bool is_column_family_level() const {
return get_level() == level::COLUMN_FAMILY;
}
/**
* @return keyspace of the resource.
* @throws std::invalid_argument if it's the root-level resource.
*/
const sstring& keyspace() const throw(std::invalid_argument);
/**
* @return column family of the resource.
* @throws std::invalid_argument if it's not a cf-level resource.
*/
const sstring& column_family() const throw(std::invalid_argument);
/**
* @return Whether or not the resource has a parent in the hierarchy.
*/
bool has_parent() const;
/**
* @return Whether or not the resource exists in scylla.
*/
bool exists() const;
sstring to_string() const;
bool operator==(const data_resource&) const;
bool operator<(const data_resource&) const;
size_t hash_value() const {
return utils::tuple_hash()(_ks, _cf);
}
};
/**
* Resource id mappings, i.e. keyspace and/or column families.
*/
using resource_ids = std::set<data_resource>;
std::ostream& operator<<(std::ostream&, const data_resource&);
}


@@ -39,298 +39,202 @@
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "auth/default_authorizer.hh"
extern "C" {
#include <crypt.h>
#include <unistd.h>
}
#include <chrono>
#include <crypt.h>
#include <random>
#include <chrono>
#include <boost/algorithm/string/join.hpp>
#include <boost/range.hpp>
#include <seastar/core/seastar.hh>
#include <seastar/core/reactor.hh>
#include "auth/authenticated_user.hh"
#include "auth/common.hh"
#include "auth/permission.hh"
#include "auth/role_or_anonymous.hh"
#include "auth.hh"
#include "default_authorizer.hh"
#include "authenticated_user.hh"
#include "permission.hh"
#include "cql3/query_processor.hh"
#include "cql3/untyped_result_set.hh"
#include "exceptions/exceptions.hh"
#include "log.hh"
#include "database.hh"
namespace auth {
const sstring auth::default_authorizer::DEFAULT_AUTHORIZER_NAME(
"org.apache.cassandra.auth.CassandraAuthorizer");
std::string_view default_authorizer::qualified_java_name() const {
return "org.apache.cassandra.auth.CassandraAuthorizer";
static const sstring USER_NAME = "username";
static const sstring RESOURCE_NAME = "resource";
static const sstring PERMISSIONS_NAME = "permissions";
static const sstring PERMISSIONS_CF = "permissions";
static logging::logger logger("default_authorizer");
auth::default_authorizer::default_authorizer() {
}
auth::default_authorizer::~default_authorizer() {
}
static constexpr std::string_view ROLE_NAME = "role";
static constexpr std::string_view RESOURCE_NAME = "resource";
static constexpr std::string_view PERMISSIONS_NAME = "permissions";
static constexpr std::string_view PERMISSIONS_CF = "role_permissions";
future<> auth::default_authorizer::init() {
sstring create_table = sprint("CREATE TABLE %s.%s ("
"%s text,"
"%s text,"
"%s set<text>,"
"PRIMARY KEY(%s, %s)"
") WITH gc_grace_seconds=%d", auth::auth::AUTH_KS,
PERMISSIONS_CF, USER_NAME, RESOURCE_NAME, PERMISSIONS_NAME,
USER_NAME, RESOURCE_NAME, 90 * 24 * 60 * 60); // 3 months.
static logging::logger alogger("default_authorizer");
// To ensure correct initialization order, we unfortunately need to use a string literal.
static const class_registrator<
authorizer,
default_authorizer,
cql3::query_processor&,
::service::migration_manager&> password_auth_reg("org.apache.cassandra.auth.CassandraAuthorizer");
default_authorizer::default_authorizer(cql3::query_processor& qp, ::service::migration_manager& mm)
: _qp(qp)
, _migration_manager(mm) {
return auth::setup_table(PERMISSIONS_CF, create_table);
}
default_authorizer::~default_authorizer() {
}
static const sstring legacy_table_name{"permissions"};
future<auth::permission_set> auth::default_authorizer::authorize(
::shared_ptr<authenticated_user> user, data_resource resource) const {
return user->is_super().then([this, user, resource = std::move(resource)](bool is_super) {
if (is_super) {
return make_ready_future<permission_set>(permissions::ALL);
}
bool default_authorizer::legacy_metadata_exists() const {
return _qp.db().has_schema(meta::AUTH_KS, legacy_table_name);
}
/**
* TODO: could create actual data type for permission (translating string<->perm),
* but this seems overkill right now. We still must store strings so...
*/
auto& qp = cql3::get_local_query_processor();
auto query = sprint("SELECT %s FROM %s.%s WHERE %s = ? AND %s = ?"
, PERMISSIONS_NAME, auth::AUTH_KS, PERMISSIONS_CF, USER_NAME, RESOURCE_NAME);
return qp.process(query, db::consistency_level::LOCAL_ONE, {user->name(), resource.name() })
.then_wrapped([=](future<::shared_ptr<cql3::untyped_result_set>> f) {
try {
auto res = f.get0();
future<bool> default_authorizer::any_granted() const {
static const sstring query = format("SELECT * FROM {}.{} LIMIT 1", meta::AUTH_KS, PERMISSIONS_CF);
return _qp.execute_internal(
query,
db::consistency_level::LOCAL_ONE,
{},
true).then([this](::shared_ptr<cql3::untyped_result_set> results) {
return !results->empty();
});
}
future<> default_authorizer::migrate_legacy_metadata() const {
alogger.info("Starting migration of legacy permissions metadata.");
static const sstring query = format("SELECT * FROM {}.{}", meta::AUTH_KS, legacy_table_name);
return _qp.execute_internal(
query,
db::consistency_level::LOCAL_ONE).then([this](::shared_ptr<cql3::untyped_result_set> results) {
return do_for_each(*results, [this](const cql3::untyped_result_set_row& row) {
return do_with(
row.get_as<sstring>("username"),
parse_resource(row.get_as<sstring>(RESOURCE_NAME)),
[this, &row](const auto& username, const auto& r) {
const permission_set perms = permissions::from_strings(row.get_set<sstring>(PERMISSIONS_NAME));
return grant(username, perms, r);
});
}).finally([results] {});
}).then([] {
alogger.info("Finished migrating legacy permissions metadata.");
}).handle_exception([](std::exception_ptr ep) {
alogger.error("Encountered an error during migration!");
std::rethrow_exception(ep);
});
}
future<> default_authorizer::start() {
static const sstring create_table = sprint(
"CREATE TABLE %s.%s ("
"%s text,"
"%s text,"
"%s set<text>,"
"PRIMARY KEY(%s, %s)"
") WITH gc_grace_seconds=%d",
meta::AUTH_KS,
PERMISSIONS_CF,
ROLE_NAME,
RESOURCE_NAME,
PERMISSIONS_NAME,
ROLE_NAME,
RESOURCE_NAME,
90 * 24 * 60 * 60); // 3 months.
return once_among_shards([this] {
return create_metadata_table_if_missing(
PERMISSIONS_CF,
_qp,
create_table,
_migration_manager).then([this] {
_finished = do_after_system_ready(_as, [this] {
return async([this] {
wait_for_schema_agreement(_migration_manager, _qp.db(), _as).get0();
if (legacy_metadata_exists()) {
if (!any_granted().get0()) {
migrate_legacy_metadata().get0();
return;
}
alogger.warn("Ignoring legacy permissions metadata since role permissions exist.");
}
});
});
if (res->empty() || !res->one().has(PERMISSIONS_NAME)) {
return make_ready_future<permission_set>(permissions::NONE);
}
return make_ready_future<permission_set>(permissions::from_strings(res->one().get_set<sstring>(PERMISSIONS_NAME)));
} catch (exceptions::request_execution_exception& e) {
logger.warn("CassandraAuthorizer failed to authorize {} for {}", user->name(), resource);
return make_ready_future<permission_set>(permissions::NONE);
}
});
});
}
future<> default_authorizer::stop() {
_as.request_abort();
return _finished.handle_exception_type([](const sleep_aborted&) {}).handle_exception_type([](const abort_requested_exception&) {});
#include <boost/range.hpp>
future<> auth::default_authorizer::modify(
::shared_ptr<authenticated_user> performer, permission_set set,
data_resource resource, sstring user, sstring op) {
// TODO: why does this not check super user?
auto& qp = cql3::get_local_query_processor();
auto query = sprint("UPDATE %s.%s SET %s = %s %s ? WHERE %s = ? AND %s = ?",
auth::AUTH_KS, PERMISSIONS_CF, PERMISSIONS_NAME,
PERMISSIONS_NAME, op, USER_NAME, RESOURCE_NAME);
return qp.process(query, db::consistency_level::ONE, {
permissions::to_strings(set), user, resource.name() }).discard_result();
}
future<permission_set>
default_authorizer::authorize(const role_or_anonymous& maybe_role, const resource& r) const {
if (is_anonymous(maybe_role)) {
return make_ready_future<permission_set>(permissions::NONE);
}
static const sstring query = format("SELECT {} FROM {}.{} WHERE {} = ? AND {} = ?",
PERMISSIONS_NAME,
meta::AUTH_KS,
PERMISSIONS_CF,
ROLE_NAME,
RESOURCE_NAME);
future<> auth::default_authorizer::grant(
::shared_ptr<authenticated_user> performer, permission_set set,
data_resource resource, sstring to) {
return modify(std::move(performer), std::move(set), std::move(resource), std::move(to), "+");
}
return _qp.execute_internal(
query,
db::consistency_level::LOCAL_ONE,
{*maybe_role.name, r.name()}).then([](::shared_ptr<cql3::untyped_result_set> results) {
if (results->empty()) {
return permissions::NONE;
future<> auth::default_authorizer::revoke(
::shared_ptr<authenticated_user> performer, permission_set set,
data_resource resource, sstring from) {
return modify(std::move(performer), std::move(set), std::move(resource), std::move(from), "-");
}
future<std::vector<auth::permission_details>> auth::default_authorizer::list(
::shared_ptr<authenticated_user> performer, permission_set set,
optional<data_resource> resource, optional<sstring> user) const {
return performer->is_super().then([this, performer, set = std::move(set), resource = std::move(resource), user = std::move(user)](bool is_super) {
if (!is_super && (!user || performer->name() != *user)) {
throw exceptions::unauthorized_exception(sprint("You are not authorized to view %s's permissions", user ? *user : "everyone"));
}
return permissions::from_strings(results->one().get_set<sstring>(PERMISSIONS_NAME));
});
}
auto query = sprint("SELECT %s, %s, %s FROM %s.%s", USER_NAME, RESOURCE_NAME, PERMISSIONS_NAME, auth::AUTH_KS, PERMISSIONS_CF);
auto& qp = cql3::get_local_query_processor();
future<>
default_authorizer::modify(
std::string_view role_name,
permission_set set,
const resource& resource,
std::string_view op) const {
return do_with(
format("UPDATE {}.{} SET {} = {} {} ? WHERE {} = ? AND {} = ?",
meta::AUTH_KS,
PERMISSIONS_CF,
PERMISSIONS_NAME,
PERMISSIONS_NAME,
op,
ROLE_NAME,
RESOURCE_NAME),
[this, &role_name, set, &resource](const auto& query) {
return _qp.execute_internal(
query,
db::consistency_level::ONE,
internal_distributed_query_state(),
{permissions::to_strings(set), sstring(role_name), resource.name()}).discard_result();
});
}
// Oh, look, it is a case where it does not pay off to have
// parameters to process in an initializer list.
future<::shared_ptr<cql3::untyped_result_set>> f = make_ready_future<::shared_ptr<cql3::untyped_result_set>>();
if (resource && user) {
query += sprint(" WHERE %s = ? AND %s = ?", USER_NAME, RESOURCE_NAME);
f = qp.process(query, db::consistency_level::ONE, {*user, resource->name()});
} else if (resource) {
query += sprint(" WHERE %s = ? ALLOW FILTERING", RESOURCE_NAME);
f = qp.process(query, db::consistency_level::ONE, {resource->name()});
} else if (user) {
query += sprint(" WHERE %s = ?", USER_NAME);
f = qp.process(query, db::consistency_level::ONE, {*user});
} else {
f = qp.process(query, db::consistency_level::ONE, {});
}
future<> default_authorizer::grant(std::string_view role_name, permission_set set, const resource& resource) const {
return modify(role_name, std::move(set), resource, "+");
}
return f.then([set](::shared_ptr<cql3::untyped_result_set> res) {
std::vector<permission_details> result;
future<> default_authorizer::revoke(std::string_view role_name, permission_set set, const resource& resource) const {
return modify(role_name, std::move(set), resource, "-");
}
for (auto& row : *res) {
if (row.has(PERMISSIONS_NAME)) {
auto username = row.get_as<sstring>(USER_NAME);
auto resource = data_resource::from_name(row.get_as<sstring>(RESOURCE_NAME));
auto ps = permissions::from_strings(row.get_set<sstring>(PERMISSIONS_NAME));
ps = permission_set::from_mask(ps.mask() & set.mask());
future<std::vector<permission_details>> default_authorizer::list_all() const {
static const sstring query = format("SELECT {}, {}, {} FROM {}.{}",
ROLE_NAME,
RESOURCE_NAME,
PERMISSIONS_NAME,
meta::AUTH_KS,
PERMISSIONS_CF);
return _qp.execute_internal(
query,
db::consistency_level::ONE,
internal_distributed_query_state(),
{},
true).then([](::shared_ptr<cql3::untyped_result_set> results) {
std::vector<permission_details> all_details;
for (const auto& row : *results) {
if (row.has(PERMISSIONS_NAME)) {
auto role_name = row.get_as<sstring>(ROLE_NAME);
auto resource = parse_resource(row.get_as<sstring>(RESOURCE_NAME));
auto perms = permissions::from_strings(row.get_set<sstring>(PERMISSIONS_NAME));
all_details.push_back(permission_details{std::move(role_name), std::move(resource), std::move(perms)});
result.emplace_back(permission_details {username, resource, ps});
}
}
}
return all_details;
return make_ready_future<std::vector<permission_details>>(std::move(result));
});
});
}
future<> default_authorizer::revoke_all(std::string_view role_name) const {
static const sstring query = format("DELETE FROM {}.{} WHERE {} = ?",
meta::AUTH_KS,
PERMISSIONS_CF,
ROLE_NAME);
return _qp.execute_internal(
query,
db::consistency_level::ONE,
internal_distributed_query_state(),
{sstring(role_name)}).discard_result().handle_exception([role_name](auto ep) {
try {
std::rethrow_exception(ep);
} catch (exceptions::request_execution_exception& e) {
alogger.warn("CassandraAuthorizer failed to revoke all permissions of {}: {}", role_name, e);
}
});
future<> auth::default_authorizer::revoke_all(sstring dropped_user) {
auto& qp = cql3::get_local_query_processor();
auto query = sprint("DELETE FROM %s.%s WHERE %s = ?", auth::AUTH_KS,
PERMISSIONS_CF, USER_NAME);
return qp.process(query, db::consistency_level::ONE, { dropped_user }).discard_result().handle_exception(
[dropped_user](auto ep) {
try {
std::rethrow_exception(ep);
} catch (exceptions::request_execution_exception& e) {
logger.warn("CassandraAuthorizer failed to revoke all permissions of {}: {}", dropped_user, e);
}
});
}
future<> default_authorizer::revoke_all(const resource& resource) const {
static const sstring query = format("SELECT {} FROM {}.{} WHERE {} = ? ALLOW FILTERING",
ROLE_NAME,
meta::AUTH_KS,
PERMISSIONS_CF,
RESOURCE_NAME);
return _qp.execute_internal(
query,
db::consistency_level::LOCAL_ONE,
{resource.name()}).then_wrapped([this, resource](future<::shared_ptr<cql3::untyped_result_set>> f) {
future<> auth::default_authorizer::revoke_all(data_resource resource) {
auto& qp = cql3::get_local_query_processor();
auto query = sprint("SELECT %s FROM %s.%s WHERE %s = ? ALLOW FILTERING",
USER_NAME, auth::AUTH_KS, PERMISSIONS_CF, RESOURCE_NAME);
return qp.process(query, db::consistency_level::LOCAL_ONE, { resource.name() })
.then_wrapped([resource, &qp](future<::shared_ptr<cql3::untyped_result_set>> f) {
try {
auto res = f.get0();
return parallel_for_each(
res->begin(),
res->end(),
[this, res, resource](const cql3::untyped_result_set::row& r) {
static const sstring query = format("DELETE FROM {}.{} WHERE {} = ? AND {} = ?",
meta::AUTH_KS,
PERMISSIONS_CF,
ROLE_NAME,
RESOURCE_NAME);
return _qp.execute_internal(
query,
db::consistency_level::LOCAL_ONE,
{r.get_as<sstring>(ROLE_NAME), resource.name()}).discard_result().handle_exception(
[resource](auto ep) {
return parallel_for_each(res->begin(), res->end(), [&qp, res, resource](const cql3::untyped_result_set::row& r) {
auto query = sprint("DELETE FROM %s.%s WHERE %s = ? AND %s = ?"
, auth::AUTH_KS, PERMISSIONS_CF, USER_NAME, RESOURCE_NAME);
return qp.process(query, db::consistency_level::LOCAL_ONE, { r.get_as<sstring>(USER_NAME), resource.name() })
.discard_result().handle_exception([resource](auto ep) {
try {
std::rethrow_exception(ep);
} catch (exceptions::request_execution_exception& e) {
alogger.warn("CassandraAuthorizer failed to revoke all permissions on {}: {}", resource, e);
logger.warn("CassandraAuthorizer failed to revoke all permissions on {}: {}", resource, e);
}
});
});
} catch (exceptions::request_execution_exception& e) {
alogger.warn("CassandraAuthorizer failed to revoke all permissions on {}: {}", resource, e);
logger.warn("CassandraAuthorizer failed to revoke all permissions on {}: {}", resource, e);
return make_ready_future();
}
});
}
const resource_set& default_authorizer::protected_resources() const {
static const resource_set resources({ make_data_resource(meta::AUTH_KS, PERMISSIONS_CF) });
return resources;
const auth::resource_ids& auth::default_authorizer::protected_resources() {
static const resource_ids ids({ data_resource(auth::AUTH_KS, PERMISSIONS_CF) });
return ids;
}
future<> auth::default_authorizer::validate_configuration() const {
return make_ready_future();
}


@@ -41,58 +41,37 @@
#pragma once
#include <functional>
#include <seastar/core/abort_source.hh>
#include "auth/authorizer.hh"
#include "cql3/query_processor.hh"
#include "service/migration_manager.hh"
#include "authorizer.hh"
namespace auth {
class default_authorizer : public authorizer {
cql3::query_processor& _qp;
::service::migration_manager& _migration_manager;
abort_source _as{};
future<> _finished{make_ready_future<>()};
public:
default_authorizer(cql3::query_processor&, ::service::migration_manager&);
static const sstring DEFAULT_AUTHORIZER_NAME;
default_authorizer();
~default_authorizer();
virtual future<> start() override;
future<> init();
virtual future<> stop() override;
future<permission_set> authorize(::shared_ptr<authenticated_user>, data_resource) const override;
virtual std::string_view qualified_java_name() const override;
future<> grant(::shared_ptr<authenticated_user>, permission_set, data_resource, sstring) override;
virtual future<permission_set> authorize(const role_or_anonymous&, const resource&) const override;
future<> revoke(::shared_ptr<authenticated_user>, permission_set, data_resource, sstring) override;
virtual future<> grant(std::string_view, permission_set, const resource&) const override;
future<std::vector<permission_details>> list(::shared_ptr<authenticated_user>, permission_set, optional<data_resource>, optional<sstring>) const override;
virtual future<> revoke( std::string_view, permission_set, const resource&) const override;
future<> revoke_all(sstring) override;
virtual future<std::vector<permission_details>> list_all() const override;
future<> revoke_all(data_resource) override;
virtual future<> revoke_all(std::string_view) const override;
const resource_ids& protected_resources() override;
virtual future<> revoke_all(const resource&) const override;
virtual const resource_set& protected_resources() const override;
future<> validate_configuration() const override;
private:
bool legacy_metadata_exists() const;
future<bool> any_granted() const;
future<> migrate_legacy_metadata() const;
future<> modify(std::string_view, permission_set, const resource&, std::string_view) const;
future<> modify(::shared_ptr<authenticated_user>, permission_set, data_resource, sstring, sstring);
};
} /* namespace auth */


@@ -39,185 +39,175 @@
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "auth/password_authenticator.hh"
#include <algorithm>
#include <chrono>
#include <unistd.h>
#include <crypt.h>
#include <random>
#include <string_view>
#include <optional>
#include <chrono>
#include <boost/algorithm/cxx11/all_of.hpp>
#include <seastar/core/seastar.hh>
#include <seastar/core/reactor.hh>
#include "auth/authenticated_user.hh"
#include "auth/common.hh"
#include "auth/passwords.hh"
#include "auth/roles-metadata.hh"
#include "cql3/untyped_result_set.hh"
#include "auth.hh"
#include "password_authenticator.hh"
#include "authenticated_user.hh"
#include "cql3/query_processor.hh"
#include "log.hh"
#include "service/migration_manager.hh"
#include "utils/class_registrator.hh"
#include "database.hh"
namespace auth {
constexpr std::string_view password_authenticator_name("org.apache.cassandra.auth.PasswordAuthenticator");
const sstring auth::password_authenticator::PASSWORD_AUTHENTICATOR_NAME("org.apache.cassandra.auth.PasswordAuthenticator");
// name of the hash column.
static constexpr std::string_view SALTED_HASH = "salted_hash";
static constexpr std::string_view OPTIONS = "options";
static constexpr std::string_view DEFAULT_USER_NAME = meta::DEFAULT_SUPERUSER_NAME;
static const sstring DEFAULT_USER_PASSWORD = sstring(meta::DEFAULT_SUPERUSER_NAME);
static const sstring SALTED_HASH = "salted_hash";
static const sstring USER_NAME = "username";
static const sstring DEFAULT_USER_NAME = auth::auth::DEFAULT_SUPERUSER_NAME;
static const sstring DEFAULT_USER_PASSWORD = auth::auth::DEFAULT_SUPERUSER_NAME;
static const sstring CREDENTIALS_CF = "credentials";
static logging::logger plogger("password_authenticator");
static logging::logger logger("password_authenticator");
// To ensure correct initialization order, we unfortunately need to use a string literal.
static const class_registrator<
authenticator,
password_authenticator,
cql3::query_processor&,
::service::migration_manager&> password_auth_reg("org.apache.cassandra.auth.PasswordAuthenticator");
auth::password_authenticator::~password_authenticator()
{}
static thread_local auto rng_for_salt = std::default_random_engine(std::random_device{}());
auth::password_authenticator::password_authenticator()
{}
password_authenticator::~password_authenticator() {
// TODO: blowfish
// Origin uses Java bcrypt library, i.e. blowfish salt
// generation and hashing, which is arguably a "better"
// password hash than sha/md5 versions usually available in
// crypt_r. Otoh, glibc 2.7+ uses a modified sha512 algo
// which should be the same order of safe, so the only
// real issue should be salted hash compatibility with
// origin if importing system tables from there.
//
// Since bcrypt/blowfish is _not_ (afaict) available
// as a dev package/lib on most linux distros, we'd have to
// copy and compile for example OWL crypto
// (http://cvsweb.openwall.com/cgi/cvsweb.cgi/Owl/packages/glibc/crypt_blowfish/)
// to be fully bit-compatible.
//
// Until we decide this is needed, let's just use crypt_r,
// and some old-fashioned random salt generation.
static constexpr size_t rand_bytes = 16;
static sstring hashpw(const sstring& pass, const sstring& salt) {
// crypt_data is huge. should this be a thread_local static?
auto tmp = std::make_unique<crypt_data>();
tmp->initialized = 0;
auto res = crypt_r(pass.c_str(), salt.c_str(), tmp.get());
if (res == nullptr) {
throw std::system_error(errno, std::system_category());
}
return res;
}
password_authenticator::password_authenticator(cql3::query_processor& qp, ::service::migration_manager& mm)
: _qp(qp)
, _migration_manager(mm)
, _stopped(make_ready_future<>()) {
static bool checkpw(const sstring& pass, const sstring& salted_hash) {
auto tmp = hashpw(pass, salted_hash);
return tmp == salted_hash;
}
static bool has_salted_hash(const cql3::untyped_result_set_row& row) {
return !row.get_or<sstring>(SALTED_HASH, "").empty();
}
static sstring gensalt() {
static sstring prefix;
static const sstring& update_row_query() {
static const sstring update_row_query = format("UPDATE {} SET {} = ? WHERE {} = ?",
meta::roles_table::qualified_name,
SALTED_HASH,
meta::roles_table::role_col_name);
return update_row_query;
}
std::random_device rd;
std::default_random_engine e1(rd());
std::uniform_int_distribution<char> dist;
static const sstring legacy_table_name{"credentials"};
sstring valid_salt = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789./";
sstring input(rand_bytes, 0);
bool password_authenticator::legacy_metadata_exists() const {
return _qp.db().has_schema(meta::AUTH_KS, legacy_table_name);
}
for (char&c : input) {
c = valid_salt[dist(e1) % valid_salt.size()];
}
future<> password_authenticator::migrate_legacy_metadata() const {
plogger.info("Starting migration of legacy authentication metadata.");
static const sstring query = format("SELECT * FROM {}.{}", meta::AUTH_KS, legacy_table_name);
sstring salt;
return _qp.execute_internal(
query,
db::consistency_level::QUORUM,
internal_distributed_query_state()).then([this](::shared_ptr<cql3::untyped_result_set> results) {
return do_for_each(*results, [this](const cql3::untyped_result_set_row& row) {
auto username = row.get_as<sstring>("username");
auto salted_hash = row.get_as<sstring>(SALTED_HASH);
if (!prefix.empty()) {
return prefix + salt;
}
return _qp.execute_internal(
update_row_query(),
consistency_for_user(username),
internal_distributed_query_state(),
{std::move(salted_hash), username}).discard_result();
}).finally([results] {});
}).then([] {
plogger.info("Finished migrating legacy authentication metadata.");
}).handle_exception([](std::exception_ptr ep) {
plogger.error("Encountered an error during migration!");
std::rethrow_exception(ep);
});
}
auto tmp = std::make_unique<crypt_data>();
tmp->initialized = 0;
future<> password_authenticator::create_default_if_missing() const {
return default_role_row_satisfies(_qp, &has_salted_hash).then([this](bool exists) {
if (!exists) {
return _qp.execute_internal(
update_row_query(),
db::consistency_level::QUORUM,
internal_distributed_query_state(),
{passwords::hash(DEFAULT_USER_PASSWORD, rng_for_salt), DEFAULT_USER_NAME}).then([](auto&&) {
plogger.info("Created default superuser authentication record.");
});
// Try in order:
// blowfish 2011 fix, blowfish, sha512, sha256, md5
for (sstring pfx : { "$2y$", "$2a$", "$6$", "$5$", "$1$" }) {
salt = pfx + input;
if (crypt_r("fisk", salt.c_str(), tmp.get())) {
prefix = pfx;
return salt;
}
}
throw std::runtime_error("Could not initialize hashing algorithm");
}
return make_ready_future<>();
static sstring hashpw(const sstring& pass) {
return hashpw(pass, gensalt());
}
future<> auth::password_authenticator::init() {
gensalt(); // do this once to determine usable hashing
sstring create_table = sprint(
"CREATE TABLE %s.%s ("
"%s text,"
"%s text," // salt + hash + number of rounds
"options map<text,text>,"// for future extensions
"PRIMARY KEY(%s)"
") WITH gc_grace_seconds=%d",
auth::auth::AUTH_KS,
CREDENTIALS_CF, USER_NAME, SALTED_HASH, USER_NAME,
90 * 24 * 60 * 60); // 3 months.
return auth::setup_table(CREDENTIALS_CF, create_table).then([this] {
// instead of once-timer, just schedule this later
auth::schedule_when_up([] {
return auth::has_existing_users(CREDENTIALS_CF, DEFAULT_USER_NAME, USER_NAME).then([](bool exists) {
if (!exists) {
cql3::get_local_query_processor().process(sprint("INSERT INTO %s.%s (%s, %s) VALUES (?, ?) USING TIMESTAMP 0",
auth::AUTH_KS,
CREDENTIALS_CF,
USER_NAME, SALTED_HASH
),
db::consistency_level::ONE, {DEFAULT_USER_NAME, hashpw(DEFAULT_USER_PASSWORD)}).then([](auto) {
logger.info("Created default user '{}'", DEFAULT_USER_NAME);
});
}
});
});
});
}
future<> password_authenticator::start() {
return once_among_shards([this] {
auto f = create_metadata_table_if_missing(
meta::roles_table::name,
_qp,
meta::roles_table::creation_query(),
_migration_manager);
_stopped = do_after_system_ready(_as, [this] {
return async([this] {
wait_for_schema_agreement(_migration_manager, _qp.db(), _as).get0();
if (any_nondefault_role_row_satisfies(_qp, &has_salted_hash).get0()) {
if (legacy_metadata_exists()) {
plogger.warn("Ignoring legacy authentication metadata since nondefault data already exist.");
}
return;
}
if (legacy_metadata_exists()) {
migrate_legacy_metadata().get0();
return;
}
create_default_if_missing().get0();
});
});
return f;
});
}
future<> password_authenticator::stop() {
_as.request_abort();
return _stopped.handle_exception_type([] (const sleep_aborted&) { }).handle_exception_type([](const abort_requested_exception&) {});
}
db::consistency_level password_authenticator::consistency_for_user(std::string_view role_name) {
if (role_name == DEFAULT_USER_NAME) {
db::consistency_level auth::password_authenticator::consistency_for_user(const sstring& username) {
if (username == DEFAULT_USER_NAME) {
return db::consistency_level::QUORUM;
}
return db::consistency_level::LOCAL_ONE;
}
std::string_view password_authenticator::qualified_java_name() const {
return password_authenticator_name;
const sstring& auth::password_authenticator::class_name() const {
return PASSWORD_AUTHENTICATOR_NAME;
}
bool password_authenticator::require_authentication() const {
bool auth::password_authenticator::require_authentication() const {
return true;
}
authentication_option_set password_authenticator::supported_options() const {
return authentication_option_set{authentication_option::password, authentication_option::options};
auth::authenticator::option_set auth::password_authenticator::supported_options() const {
return option_set::of<option::PASSWORD>();
}
authentication_option_set password_authenticator::alterable_options() const {
return authentication_option_set{authentication_option::password, authentication_option::options};
auth::authenticator::option_set auth::password_authenticator::alterable_options() const {
return option_set::of<option::PASSWORD>();
}
future<authenticated_user> password_authenticator::authenticate(
const credentials_map& credentials) const {
if (!credentials.contains(USERNAME_KEY)) {
throw exceptions::authentication_exception(format("Required key '{}' is missing", USERNAME_KEY));
future<::shared_ptr<auth::authenticated_user> > auth::password_authenticator::authenticate(
const credentials_map& credentials) const
throw (exceptions::authentication_exception) {
if (!credentials.count(USERNAME_KEY)) {
throw exceptions::authentication_exception(sprint("Required key '%s' is missing", USERNAME_KEY));
}
if (!credentials.contains(PASSWORD_KEY)) {
throw exceptions::authentication_exception(format("Required key '{}' is missing", PASSWORD_KEY));
if (!credentials.count(PASSWORD_KEY)) {
throw exceptions::authentication_exception(sprint("Required key '%s' is missing", PASSWORD_KEY));
}
auto& username = credentials.at(USERNAME_KEY);
@@ -228,140 +218,143 @@ future<authenticated_user> password_authenticator::authenticate(
// obsolete prepared statements pretty quickly.
// Rely on query processing caching statements instead, and let's assume
// that a map lookup string->statement is not gonna kill us much.
return futurize_invoke([this, username, password] {
static const sstring query = format("SELECT {} FROM {} WHERE {} = ?",
SALTED_HASH,
meta::roles_table::qualified_name,
meta::roles_table::role_col_name);
return _qp.execute_internal(
query,
consistency_for_user(username),
internal_distributed_query_state(),
{username},
true);
return futurize_apply([this, username, password] {
auto& qp = cql3::get_local_query_processor();
return qp.process(sprint("SELECT %s FROM %s.%s WHERE %s = ?", SALTED_HASH,
auth::AUTH_KS, CREDENTIALS_CF, USER_NAME),
consistency_for_user(username), {username}, true);
}).then_wrapped([=](future<::shared_ptr<cql3::untyped_result_set>> f) {
try {
auto res = f.get0();
auto salted_hash = std::optional<sstring>();
if (!res->empty()) {
salted_hash = res->one().get_opt<sstring>(SALTED_HASH);
}
if (!salted_hash || !passwords::check(password, *salted_hash)) {
if (res->empty() || !checkpw(password, res->one().get_as<sstring>(SALTED_HASH))) {
throw exceptions::authentication_exception("Username and/or password are incorrect");
}
return make_ready_future<authenticated_user>(username);
return make_ready_future<::shared_ptr<authenticated_user>>(::make_shared<authenticated_user>(username));
} catch (std::system_error &) {
std::throw_with_nested(exceptions::authentication_exception("Could not verify password"));
} catch (exceptions::request_execution_exception& e) {
std::throw_with_nested(exceptions::authentication_exception(e.what()));
} catch (exceptions::authentication_exception& e) {
std::throw_with_nested(e);
} catch (...) {
std::throw_with_nested(exceptions::authentication_exception("authentication failed"));
}
});
}
future<> password_authenticator::maybe_update_custom_options(std::string_view role_name, const authentication_options& options) const {
static const sstring query = format("UPDATE {} SET {} = ? WHERE {} = ?",
meta::roles_table::qualified_name,
OPTIONS,
meta::roles_table::role_col_name);
if (!options.options) {
return make_ready_future<>();
future<> auth::password_authenticator::create(sstring username,
const option_map& options)
throw (exceptions::request_validation_exception,
exceptions::request_execution_exception) {
try {
auto password = boost::any_cast<sstring>(options.at(option::PASSWORD));
auto query = sprint("INSERT INTO %s.%s (%s, %s) VALUES (?, ?)",
auth::AUTH_KS, CREDENTIALS_CF, USER_NAME, SALTED_HASH);
auto& qp = cql3::get_local_query_processor();
return qp.process(query, consistency_for_user(username), { username, hashpw(password) }).discard_result();
} catch (std::out_of_range&) {
throw exceptions::invalid_request_exception("PasswordAuthenticator requires PASSWORD option");
}
std::vector<std::pair<data_value, data_value>> entries;
for (const auto& entry : *options.options) {
entries.push_back({data_value(entry.first), data_value(entry.second)});
}
auto map_value = make_map_value(map_type_impl::get_instance(utf8_type, utf8_type, false), entries);
return _qp.execute_internal(
query,
consistency_for_user(role_name),
internal_distributed_query_state(),
{std::move(map_value), sstring(role_name)}).discard_result();
}
future<> password_authenticator::create(std::string_view role_name, const authentication_options& options) const {
if (!options.password) {
return maybe_update_custom_options(role_name, options);
future<> auth::password_authenticator::alter(sstring username,
const option_map& options)
throw (exceptions::request_validation_exception,
exceptions::request_execution_exception) {
try {
auto password = boost::any_cast<sstring>(options.at(option::PASSWORD));
auto query = sprint("UPDATE %s.%s SET %s = ? WHERE %s = ?",
auth::AUTH_KS, CREDENTIALS_CF, SALTED_HASH, USER_NAME);
auto& qp = cql3::get_local_query_processor();
return qp.process(query, consistency_for_user(username), { hashpw(password), username }).discard_result();
} catch (std::out_of_range&) {
throw exceptions::invalid_request_exception("PasswordAuthenticator requires PASSWORD option");
}
return _qp.execute_internal(
update_row_query(),
consistency_for_user(role_name),
internal_distributed_query_state(),
{passwords::hash(*options.password, rng_for_salt), sstring(role_name)}).discard_result().then([this, role_name, &options] {
return maybe_update_custom_options(role_name, options);
});
}
future<> password_authenticator::alter(std::string_view role_name, const authentication_options& options) const {
if (!options.password) {
return maybe_update_custom_options(role_name, options);
future<> auth::password_authenticator::drop(sstring username)
throw (exceptions::request_validation_exception,
exceptions::request_execution_exception) {
try {
auto query = sprint("DELETE FROM %s.%s WHERE %s = ?",
auth::AUTH_KS, CREDENTIALS_CF, USER_NAME);
auto& qp = cql3::get_local_query_processor();
return qp.process(query, consistency_for_user(username), { username }).discard_result();
} catch (std::out_of_range&) {
throw exceptions::invalid_request_exception("PasswordAuthenticator requires PASSWORD option");
}
static const sstring query = format("UPDATE {} SET {} = ? WHERE {} = ?",
meta::roles_table::qualified_name,
SALTED_HASH,
meta::roles_table::role_col_name);
return _qp.execute_internal(
query,
consistency_for_user(role_name),
internal_distributed_query_state(),
{passwords::hash(*options.password, rng_for_salt), sstring(role_name)}).discard_result().then([this, role_name, &options] {
return maybe_update_custom_options(role_name, options);
}).discard_result();
}
future<> password_authenticator::drop(std::string_view name) const {
static const sstring query = format("DELETE {} FROM {} WHERE {} = ?",
SALTED_HASH,
meta::roles_table::qualified_name,
meta::roles_table::role_col_name);
return _qp.execute_internal(
query, consistency_for_user(name),
internal_distributed_query_state(),
{sstring(name)}).discard_result();
const auth::resource_ids& auth::password_authenticator::protected_resources() const {
static const resource_ids ids({ data_resource(auth::AUTH_KS, CREDENTIALS_CF) });
return ids;
}
future<custom_options> password_authenticator::query_custom_options(std::string_view role_name) const {
static const sstring query = format("SELECT {} FROM {} WHERE {} = ?",
OPTIONS,
meta::roles_table::qualified_name,
meta::roles_table::role_col_name);
::shared_ptr<auth::authenticator::sasl_challenge> auth::password_authenticator::new_sasl_challenge() const {
class plain_text_password_challenge: public sasl_challenge {
public:
plain_text_password_challenge(const password_authenticator& a)
: _authenticator(a)
{}
return _qp.execute_internal(
query, consistency_for_user(role_name),
internal_distributed_query_state(),
{sstring(role_name)}).then([](::shared_ptr<cql3::untyped_result_set> rs) {
custom_options opts;
const auto& row = rs->one();
if (row.has(OPTIONS)) {
row.get_map_data<sstring, sstring>(OPTIONS, std::inserter(opts, opts.end()), utf8_type, utf8_type);
/**
* SASL PLAIN mechanism specifies that credentials are encoded in a
* sequence of UTF-8 bytes, delimited by 0 (US-ASCII NUL).
* The form is : {code}authzId<NUL>authnId<NUL>password<NUL>{code}
* authzId is optional, and in fact we don't care about it here as we'll
* set the authzId to match the authnId (that is, there is no concept of
* a user being authorized to act on behalf of another).
*
* @param bytes encoded credentials string sent by the client
* @return map containing the username/password pairs in the form an IAuthenticator
* would expect
* @throws javax.security.sasl.SaslException
*/
bytes evaluate_response(bytes_view client_response)
throw (exceptions::authentication_exception) override {
logger.debug("Decoding credentials from client token");
sstring username, password;
auto b = client_response.crbegin();
auto e = client_response.crend();
auto i = b;
while (i != e) {
if (*i == 0) {
sstring tmp(i.base(), b.base());
if (password.empty()) {
password = std::move(tmp);
} else if (username.empty()) {
username = std::move(tmp);
}
b = ++i;
continue;
}
++i;
}
if (username.empty()) {
throw exceptions::authentication_exception("Authentication ID must not be null");
}
if (password.empty()) {
throw exceptions::authentication_exception("Password must not be null");
}
_credentials[USERNAME_KEY] = std::move(username);
_credentials[PASSWORD_KEY] = std::move(password);
_complete = true;
return {};
}
return opts;
});
}
const resource_set& password_authenticator::protected_resources() const {
static const resource_set resources({make_data_resource(meta::AUTH_KS, meta::roles_table::name)});
return resources;
}
::shared_ptr<sasl_challenge> password_authenticator::new_sasl_challenge() const {
return ::make_shared<plain_sasl_challenge>([this](std::string_view username, std::string_view password) {
credentials_map credentials{};
credentials[USERNAME_KEY] = sstring(username);
credentials[PASSWORD_KEY] = sstring(password);
return this->authenticate(credentials);
});
}
bool is_complete() const override {
return _complete;
}
future<::shared_ptr<authenticated_user>> get_authenticated_user() const
throw (exceptions::authentication_exception) override {
return _authenticator.authenticate(_credentials);
}
private:
const password_authenticator& _authenticator;
credentials_map _credentials;
bool _complete = false;
};
return ::make_shared<plain_text_password_challenge>(*this);
}
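
The doc comment on evaluate_response() above describes the SASL PLAIN token as NUL-delimited fields. As a minimal sketch of that decoding step, assuming the RFC 4616 layout ([authzid] NUL authcid NUL passwd, with no trailing NUL) and using only the standard library, the split could look like the following. The names plain_credentials and decode_sasl_plain are hypothetical and are not part of Scylla; unlike the reverse-scanning loop in the diff, this sketch rejects any extra NUL.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical holder for the three SASL PLAIN fields.
struct plain_credentials {
    std::string authzid;   // optional in the protocol, may be empty
    std::string username;  // the authentication identity (authcid)
    std::string password;
};

// Splits "[authzid] NUL authcid NUL passwd" into its fields.
// Returns std::nullopt when a delimiter is missing, when the user name
// or password is empty, or when the password contains a further NUL.
std::optional<plain_credentials> decode_sasl_plain(std::string_view msg) {
    constexpr auto npos = std::string_view::npos;

    const auto first = msg.find('\0');
    if (first == npos) {
        return std::nullopt;
    }
    const auto second = msg.find('\0', first + 1);
    if (second == npos) {
        return std::nullopt;
    }
    if (msg.find('\0', second + 1) != npos) {
        return std::nullopt;
    }

    plain_credentials c{
        std::string(msg.substr(0, first)),
        std::string(msg.substr(first + 1, second - first - 1)),
        std::string(msg.substr(second + 1))};

    if (c.username.empty() || c.password.empty()) {
        return std::nullopt;
    }
    return c;
}
```

A caller would copy username and password into the credentials map under USERNAME_KEY and PASSWORD_KEY and pass it to authenticate(), which is what both versions of new_sasl_challenge() in this diff end up doing.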


@@ -41,66 +41,32 @@
#pragma once
#include <seastar/core/abort_source.hh>
#include "auth/authenticator.hh"
#include "cql3/query_processor.hh"
namespace service {
class migration_manager;
}
#include "authenticator.hh"
namespace auth {
extern const std::string_view password_authenticator_name;
class password_authenticator : public authenticator {
cql3::query_processor& _qp;
::service::migration_manager& _migration_manager;
future<> _stopped;
seastar::abort_source _as;
public:
static db::consistency_level consistency_for_user(std::string_view role_name);
password_authenticator(cql3::query_processor&, ::service::migration_manager&);
static const sstring PASSWORD_AUTHENTICATOR_NAME;
password_authenticator();
~password_authenticator();
virtual future<> start() override;
future<> init();
virtual future<> stop() override;
const sstring& class_name() const override;
bool require_authentication() const override;
option_set supported_options() const override;
option_set alterable_options() const override;
future<::shared_ptr<authenticated_user>> authenticate(const credentials_map& credentials) const throw(exceptions::authentication_exception) override;
future<> create(sstring username, const option_map& options) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) override;
future<> alter(sstring username, const option_map& options) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) override;
future<> drop(sstring username) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) override;
const resource_ids& protected_resources() const override;
::shared_ptr<sasl_challenge> new_sasl_challenge() const override;
virtual std::string_view qualified_java_name() const override;
virtual bool require_authentication() const override;
virtual authentication_option_set supported_options() const override;
virtual authentication_option_set alterable_options() const override;
virtual future<authenticated_user> authenticate(const credentials_map& credentials) const override;
virtual future<> create(std::string_view role_name, const authentication_options& options) const override;
virtual future<> alter(std::string_view role_name, const authentication_options& options) const override;
virtual future<> drop(std::string_view role_name) const override;
virtual future<custom_options> query_custom_options(std::string_view role_name) const override;
virtual const resource_set& protected_resources() const override;
virtual ::shared_ptr<sasl_challenge> new_sasl_challenge() const override;
private:
future<> maybe_update_custom_options(std::string_view role_name, const authentication_options& options) const;
bool legacy_metadata_exists() const;
future<> migrate_legacy_metadata() const;
future<> create_default_if_missing() const;
static db::consistency_level consistency_for_user(const sstring& username);
};
}


@@ -1,84 +0,0 @@
/*
* Copyright (C) 2018 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "auth/passwords.hh"
#include <cerrno>
#include <optional>
extern "C" {
#include <crypt.h>
#include <unistd.h>
}
namespace auth::passwords {
static thread_local crypt_data tlcrypt = { 0, };
namespace detail {
scheme identify_best_supported_scheme() {
const auto all_schemes = { scheme::bcrypt_y, scheme::bcrypt_a, scheme::sha_512, scheme::sha_256, scheme::md5 };
// "Random", for testing schemes.
const sstring random_part_of_salt = "aaaabbbbccccdddd";
for (scheme c : all_schemes) {
const sstring salt = sstring(prefix_for_scheme(c)) + random_part_of_salt;
const char* e = crypt_r("fisk", salt.c_str(), &tlcrypt);
if (e && (e[0] != '*')) {
return c;
}
}
throw no_supported_schemes();
}
sstring hash_with_salt(const sstring& pass, const sstring& salt) {
auto res = crypt_r(pass.c_str(), salt.c_str(), &tlcrypt);
if (!res || (res[0] == '*')) {
throw std::system_error(errno, std::system_category());
}
return res;
}
const char* prefix_for_scheme(scheme c) noexcept {
switch (c) {
case scheme::bcrypt_y: return "$2y$";
case scheme::bcrypt_a: return "$2a$";
case scheme::sha_512: return "$6$";
case scheme::sha_256: return "$5$";
case scheme::md5: return "$1$";
default: return nullptr;
}
}
} // namespace detail
no_supported_schemes::no_supported_schemes()
: std::runtime_error("No allowed hashing schemes are supported on this system") {
}
bool check(const sstring& pass, const sstring& salted_hash) {
return detail::hash_with_salt(pass, salted_hash) == salted_hash;
}
} // namespace auth::passwords
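
The scheme probe above builds a salt as a scheme prefix followed by 16 characters, and the callers in password_authenticator.cc pass a random engine (rng_for_salt) when hashing. The body of passwords::hash() is not shown in this diff, so the following is only a sketch of how such a salt could be assembled, using the 64-character alphabet from the older gensalt(). The names salt_alphabet and make_salt are hypothetical. The distribution is over std::size_t rather than char, because char is not among the integer types the standard permits for std::uniform_int_distribution.

```cpp
#include <cstddef>
#include <random>
#include <string>
#include <string_view>

// Characters accepted in crypt(3)-style salts (same set as in gensalt()).
constexpr std::string_view salt_alphabet =
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789./";

// Builds "<prefix><random_chars characters drawn from salt_alphabet>".
// The engine is passed in so callers can keep one engine per thread.
template <typename RandomEngine>
std::string make_salt(std::string_view prefix, RandomEngine& rng,
                      std::size_t random_chars = 16) {
    // Uniform index into the alphabet; avoids the bias of "value % size".
    std::uniform_int_distribution<std::size_t> pick(0, salt_alphabet.size() - 1);

    std::string salt(prefix);
    salt.reserve(prefix.size() + random_chars);
    for (std::size_t i = 0; i < random_chars; ++i) {
        salt.push_back(salt_alphabet[pick(rng)]);
    }
    return salt;
}
```

The resulting string would be handed to crypt_r() as the salt argument, as hash_with_salt() does above. Whether a given prefix is usable still has to be probed at run time, as identify_best_supported_scheme() does.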

Some files were not shown because too many files have changed in this diff.