Compare commits

...

1549 Commits

Author SHA1 Message Date
Konstantin Osipov
fd293768e7 storage_proxy: do not touch all_replicas.front() if it's empty.
The list of all endpoints for a query can be empty if we have
replication_factor 0 or there are no live endpoints for this token.
Do not access all_replicas.front() in this case.
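The guard can be sketched as follows (a minimal illustration with invented names, not the actual storage_proxy code): calling front() on an empty container is undefined behavior, so emptiness must be checked first.

```cpp
#include <vector>

// Illustrative sketch only: fall back to a sentinel when the replica
// list is empty (RF=0 or no live endpoints) instead of touching front().
int first_replica_or(const std::vector<int>& all_replicas, int fallback) {
    return all_replicas.empty() ? fallback : all_replicas.front();
}
```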

Fixes #5935.
Message-Id: <20200306192521.73486-2-kostja@scylladb.com>

(cherry picked from commit 9827efe554)
2020-06-22 18:29:15 +03:00
Gleb Natapov
22dfa48585 cql transport: do not log broken pipe error when a client closes its side of a connection abruptly
Fixes #5661

Message-Id: <20200615075958.GL335449@scylladb.com>
(cherry picked from commit 7ca937778d)
2020-06-21 13:09:22 +03:00
Benny Halevy
2f3d7f1408 cql3::util::maybe_quote: avoid stack overflow and fix quote doubling
The function was reimplemented to solve the following issues.
The custom implementation also improved its performance by
close to 19%.

Using regex_match("[a-z][a-z0-9_]*") may cause stack overflow on long input strings
as found with the limits_test.py:TestLimits.max_key_length_test dtest.

std::regex_replace does not replace in-place so no doubling of
quotes was actually done.

Add unit test that reproduces the crash without this fix
and tests various string patterns for correctness.

Note that defining the regex with std::regex::optimize
still ended up with stack overflow.
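A minimal sketch of the regex-free approach described above (names and details are illustrative assumptions, not the actual cql3::util implementation): scan once, and only quote when the identifier falls outside [a-z][a-z0-9_]*, doubling any embedded quotes in place.

```cpp
#include <string>

// Hedged sketch: an identifier matching [a-z][a-z0-9_]* is returned
// unchanged; anything else is wrapped in double quotes with embedded
// quotes doubled (done in place, unlike the broken std::regex_replace use).
std::string maybe_quote_sketch(const std::string& s) {
    bool needs_quoting = s.empty() || !(s[0] >= 'a' && s[0] <= 'z');
    for (size_t i = 1; !needs_quoting && i < s.size(); ++i) {
        char c = s[i];
        needs_quoting = !((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') || c == '_');
    }
    if (!needs_quoting) {
        return s;
    }
    std::string out = "\"";
    for (char c : s) {
        if (c == '"') {
            out += '"'; // double the embedded quote
        }
        out += c;
    }
    out += '"';
    return out;
}
```

A single linear scan also cannot overflow the stack on long inputs, unlike the recursive regex engine.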

Fixes #5671

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 0329fe1fd1)
2020-06-21 13:07:21 +03:00
Gleb Natapov
76a08df939 commitlog: fix size of a write used to zero a segment
Due to a bug, the entire segment is written in one huge write of 32 MB.
The idea was to split it into writes of 128 kB, so fix it.
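The intended chunking can be sketched like this (illustrative names, not the commitlog code): a 32 MiB segment becomes a series of 128 KiB writes.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sketch: compute the sizes of the writes used to zero a segment,
// capping each write at 128 KiB instead of issuing one huge write.
std::vector<size_t> zeroing_writes(size_t segment_size) {
    const size_t chunk = 128 * 1024;
    std::vector<size_t> writes;
    for (size_t off = 0; off < segment_size; off += chunk) {
        writes.push_back(std::min(chunk, segment_size - off));
    }
    return writes;
}
```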

Fixes #5857

Message-Id: <20200220102939.30769-1-gleb@scylladb.com>
(cherry picked from commit df2f67626b)
2020-06-21 13:03:05 +03:00
Amnon Heiman
6aa129d3b0 api/storage_service.cc: stream result of token_range
The get token range API can become big which can cause large allocation
and stalls.

This patch replaces the implementation so that it streams the results
using the HTTP streaming capabilities instead of serializing and
sending one big buffer.

Fixes #6297

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
(cherry picked from commit 7c4562d532)
2020-06-21 12:57:48 +03:00
Takuya ASADA
b4f781e4eb scylla_post_install.sh: fix operator precedence issue with multiple statements
In bash, 'A || B && C' is a problem because when A is true, C will
still be evaluated, since && and || have the same precedence and
associate left to right. To avoid the issue we need to group B && C
into one statement.

Fixes #5764

(cherry picked from commit b6988112b4)
2020-06-21 12:47:05 +03:00
Takuya ASADA
27594ca50e scylla_raid_setup: create missing directories
We need to create hints, view_hints, saved_caches directories
on RAID volume.

Fixes #5811

(cherry picked from commit 086f0ffd5a)
2020-06-21 12:45:27 +03:00
Rafael Ávila de Espíndola
0f2f0d65d7 configure: Reduce the dynamic linker path size
gdb has a SO_NAME_MAX_PATH_SIZE of 512, so we use that as the path
size.

Fixes: #6494

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200528202741.398695-2-espindola@scylladb.com>
(cherry picked from commit aa778ec152)
2020-06-21 12:29:16 +03:00
Tomasz Grabiec
31c2f8a3ae row_cache: Fix undefined behavior on key linearization
This is relevant only when using partition or clustering keys which
have a representation in memory which is larger than 12.8 KB (10% of
LSA segment size).

There are several places in code (cache, background garbage
collection) which may need to linearize keys because of performing key
comparison, but it's not done safely:

 1) the code does not run with the LSA region locked, so pointers may
get invalidated on linearization if it needs to reclaim memory. This
is fixed by running the code inside an allocating section.

 2) LSA region is locked, but the scope of
with_linearized_managed_bytes() encloses the allocating section. If
allocating section needs to reclaim, linearization context will
contain invalidated pointers. The fix is to reorder the scopes so
that linearization context lives within an allocating section.

Example of 1 can be found in
range_populating_reader::handle_end_of_stream() where it performs a
lookup:

  auto prev = std::prev(it);
  if (prev->key().equal(*_cache._schema, *_last_key->_key)) {
     it->set_continuous(true);

but handle_end_of_stream() is not invoked under allocating section.

Example of 2 can be found in mutation_cleaner_impl::merge_some() where
it does:

  return with_linearized_managed_bytes([&] {
  ...
    return _worker_state->alloc_section(region, [&] {

Fixes #6637.
Refs #6108.

Tests:

  - unit (all)

Message-Id: <1592218544-9435-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit e81fc1f095)
2020-06-21 11:58:59 +03:00
Yaron Kaikov
ec12331f11 release: prepare for 3.3.4 2020-06-15 21:19:02 +03:00
Avi Kivity
ccc463b5e5 tools: toolchain: regenerate for gnutls 3.6.14
CVE-2020-13777.

Fixes #6627.

Toolchain source image registry disambiguated due to tighter podman defaults.
2020-06-15 08:05:58 +03:00
Calle Wilund
4a9676f6b7 gms::inet_address: Fix sign extension error in custom address formatting
Fixes #5808

It seems some gcc versions generate sign-extending code here. Mine does
not, but this should be more correct anyhow.

Added small stringify test to serialization_test for inet_address
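The class of bug can be illustrated with a hypothetical formatting helper (this is not the gms::inet_address code): a signed char octet >= 0x80 goes negative through an int conversion unless it is first cast to an unsigned type.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Illustration only: format one address octet. Casting through uint8_t
// keeps the value in 0..255; passing a (possibly signed) char straight
// to an int-formatting path may sign-extend it to a negative number.
std::string format_octet(char byte) {
    char buf[8];
    std::snprintf(buf, sizeof(buf), "%u",
                  static_cast<unsigned>(static_cast<uint8_t>(byte)));
    return buf;
}
```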

(cherry picked from commit a14a28cdf4)
2020-06-09 20:16:50 +03:00
Takuya ASADA
aaf4989c31 aws: update enhanced networking supported instance list
Sync enhanced networking supported instance list to latest one.

Reference: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html

Fixes #6540

(cherry picked from commit 969c4258cf)
2020-06-09 16:03:00 +03:00
Asias He
b29f954f20 gossip: Make is_safe_for_bootstrap more strict
Consider

1. Start n1, n2 in the cluster
2. Stop n2 and delete all data for n2
3. Start n2 to replace itself with replace_address_first_boot: n2
4. Kill n2 before n2 finishes the replace operation
5. Remove replace_address_first_boot: n2 from scylla.yaml of n2
6. Delete all data for n2
7. Start n2

At step 7, n2 will be allowed to bootstrap as a new node, because the
application state of n2 in the cluster is HIBERNATE, which is not
rejected by the check in is_safe_for_bootstrap. As a result, n2 will
replace itself with different tokens and a different host_id, as if the
old n2 node had been silently removed from the cluster.

Fixes #5172

(cherry picked from commit cdcedf5eb9)
2020-05-25 14:30:53 +03:00
Eliran Sinvani
5546d5df7b Auth: return correct error code when role is not found
Scylla returns the wrong error code (0000 - server internal error)
in response to trying to do authentication/authorization operations
that involves a non-existing role.
This commit changes those cases to return error code 2200 (invalid
query) which is the correct one and also the one that Cassandra
returns.
Tests:
    Unit tests (Dev)
    All auth and auth_role dtests

(cherry picked from commit ce8cebe34801f0ef0e327a32f37442b513ffc214)

Fixes #6363.
2020-05-25 12:58:38 +03:00
Amnon Heiman
541c29677f storage_service: get_range_to_address_map prevent use after free
The implementation of get_range_to_address_map has a default behaviour:
when given an empty keyspace, it uses the first non-system keyspace
("first" here being, basically, just some keyspace).

The current implementation has two issues. First, it uses a reference to
a string that is held on the stack of another function. In other words,
there is a use-after-free, and it is not clear why we never hit it.

Second, it calls get_non_system_keyspaces twice. Though this is not
a bug, it is redundant (get_non_system_keyspaces uses a loop, so calling
that function does have a cost).

This patch solves both issues. First, it changes the implementation to
hold a string instead of a reference to a string.

Second, it stores the results from get_non_system_keyspaces and reuses
them, which is more efficient and keeps the returned values on the
local stack.
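The shape of the fix can be sketched as follows (invented names; this is not the storage_service code): the chosen keyspace name is held by value, and the keyspace list is computed once and reused.

```cpp
#include <string>
#include <vector>

// Illustrative sketch: pick the keyspace to use, holding the result as
// an owning std::string (not a reference into another function's stack)
// and reusing a single, already-computed list of non-system keyspaces.
std::string choose_keyspace(const std::string& requested,
                            const std::vector<std::string>& non_system) {
    std::string ks = requested;              // by value: owns its storage
    if (ks.empty() && !non_system.empty()) {
        ks = non_system.front();             // reuse the cached list
    }
    return ks;
}
```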

Fixes #6465

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
(cherry picked from commit 69a46d4179)
2020-05-25 12:48:48 +03:00
Hagit Segev
06f18108c0 release: prepare for 3.3.3 2020-05-24 23:28:07 +03:00
Tomasz Grabiec
90002ca3d2 sstables: index_reader: Fix overflow when calculating promoted index end
When index file is larger than 4GB, offset calculation will overflow
uint32_t and _promoted_index_end will be too small.

As a result, promoted_index_size calculation will underflow and the
rest of the page will be interpreted as a promoted index.

The partitions which are in the remainder of the index page will not
be found by single-partition queries.

Data is not lost.
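The overflow class described above can be demonstrated in isolation (the sizes here are made up): narrowing a >4 GiB file offset to uint32_t wraps it around, which is what later makes the size subtraction underflow.

```cpp
#include <cstdint>

// Illustration only: the buggy narrowing of a 64-bit index-file offset
// to uint32_t. For files larger than 4 GiB the value wraps.
uint64_t truncated_offset(uint64_t index_file_offset) {
    uint32_t end = static_cast<uint32_t>(index_file_offset); // wraps past 4 GiB
    return end;
}
```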

Introduced in 6c5f8e0eda.

Fixes #6040
Message-Id: <20200521174822.8350-1-tgrabiec@scylladb.com>

(cherry picked from commit a6c87a7b9e)
2020-05-24 09:46:11 +03:00
Rafael Ávila de Espíndola
da23902311 repair: Make sure sinks are always closed
In a recent next failure I got the following backtrace

    function=function@entry=0x270360 "seastar::rpc::sink_impl<Serializer, Out>::~sink_impl() [with Serializer = netw::serializer; Out = {repair_row_on_wire_with_cmd}]") at assert.c:101
    at ./seastar/include/seastar/core/shared_ptr.hh:463
    at repair/row_level.cc:2059

This patch changes a few functions to use finally to make sure the sink
is always closed.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200515202803.60020-1-espindola@scylladb.com>
(cherry picked from commit 311fbe2f0a)

Ref #6414
2020-05-20 09:00:57 +03:00
Asias He
2b0dc21f97 repair: Fix race between write_end_of_stream and apply_rows
Consider: n1, n2, n1 is the repair master, n2 is the repair follower.

=== Case 1 ===
1) n1 sends missing rows {r1, r2} to n2
2) n2 runs apply_rows_on_follower to apply rows, e.g., {r1, r2}, r1
   is written to sstable, r2 is not written yet, r1 belongs to
   partition 1, r2 belongs to partition 2. It yields after row r1 is
   written.
   data: partition_start, r1
3) n1 sends repair_row_level_stop to n2 because error has happened on n1
4) n2 calls wait_for_writer_done() which in turn calls write_end_of_stream()
   data: partition_start, r1, partition_end
5) Step 2 resumes to apply the rows.
   data: partition_start, r1, partition_end, partition_end, partition_start, r2

=== Case 2 ===
1) n1 sends missing rows {r1, r2} to n2
2) n2 runs apply_rows_on_follower to apply rows, e.g., {r1, r2}, r1
   is written to sstable, r2 is not written yet, r1 belongs to partition
   1, r2 belongs to partition 2. It yields after partition_start for r2
   is written but before _partition_opened is set to true.
   data: partition_start, r1, partition_end, partition_start
3) n1 sends repair_row_level_stop to n2 because error has happened on n1
4) n2 calls wait_for_writer_done() which in turn calls write_end_of_stream().
   Since _partition_opened[node_idx] is false, partition_end is skipped,
   end_of_stream is written.
   data: partition_start, r1, partition_end, partition_start, end_of_stream

This causes unbalanced partition_start and partition_end in the stream
written to sstables.

To fix, serialize the write_end_of_stream and apply_rows with a semaphore.

Fixes: #6394
Fixes: #6296
Fixes: #6414
(cherry picked from commit b2c4d9fdbc)
2020-05-20 08:22:05 +03:00
Piotr Dulikowski
b544691493 hinted handoff: don't keep positions of old hints in rps_set
When sending hints from one file, rps_set field in send_one_file_ctx
keeps track of commitlog positions of hints that are being currently
sent, or have failed to be sent. At the end of the operation, if sending
of some hints failed, we will choose position of the earliest hint that
failed to be sent, and will retry sending that file later, starting from
that position. This position is stored in _last_not_complete_rp.

Usually, this set has a bounded size, because we impose a limit of at
most 128 hints being sent concurrently. Because we do not attempt to
send any more hints after a failure is detected, rps_set should not have
more than 128 elements at a time.

Due to a bug, commitlog positions of old hints (older than
gc_grace_seconds of the destination table) were inserted into rps_set
but not removed after checking their age. This could cause rps_set to
grow very large when replaying a file with old hints.

Moreover, if the file mixed expired and non-expired hints (which could
happen if it had hints to two tables with different gc_grace_seconds),
and sending of some non-expired hints failed, then positions of expired
hints could influence the calculation of _last_not_complete_rp, and more hints
than necessary would be resent on the next retry.

This simple patch removes commitlog position of a hint from rps_set when
it is detected to be too old.

Fixes #6422

(cherry picked from commit 85d5c3d5ee)
2020-05-20 08:06:17 +03:00
Piotr Dulikowski
d420b06844 hinted handoff: remove discarded hint positions from rps_set
Related commit: 85d5c3d

When attempting to send a hint, an exception might occur that results in
that hint being discarded (e.g. keyspace or table of the hint was
removed).

When such an exception is thrown, position of the hint will already be
stored in rps_set. We are only allowed to retain positions of hints that
failed to be sent and needed to be retried later. Dropping a hint is not
an error, therefore its position should be removed from rps_set - but
current logic does not do that.

Because of that bug, hint files with many discardable hints might cause
rps_set to grow large when the file is replayed. Furthermore, leaving
positions of such hints in rps_set might cause more hints than necessary
to be re-sent if some non-discarded hints fail to be sent.

This commit fixes the problem by removing positions of discarded hints
from rps_set.

Fixes #6433

(cherry picked from commit 0c5ac0da98)
2020-05-20 08:04:10 +03:00
Avi Kivity
b3a2cb2f68 Update seastar submodule
* seastar 0ebd89a858...30f03aeba9 (1):
  > timer: add scheduling_group awareness

Fixes #6170.
2020-05-10 18:39:20 +03:00
Hagit Segev
c8c057f5f8 release: prepare for 3.3.2 2020-05-10 18:16:28 +03:00
Gleb Natapov
038bfc925c storage_proxy: limit read repair only to replicas that answered during speculative reads
A speculative read has more targets than needed for the CL. In case
there is a digest mismatch the repair runs between all of them, but that
violates the provided CL. The patch makes it so that repair runs only
between the replicas that answered (there will be CL of them).

Fixes #6123

Reviewed-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20200402132245.GA21956@scylladb.com>
(cherry picked from commit 36a24bbb70)
2020-05-07 19:48:37 +03:00
Mike Goltsov
13a4e7db83 fix error in fstrim service (scylla_util.py)
On Centos 7 machine:

fstrim.timer is not enabled (only unmasked) by scylla_fstrim_setup on installation.
When trying to run the scylla-fstrim service manually you get this error:

Traceback (most recent call last):
File "/opt/scylladb/scripts/libexec/scylla_fstrim", line 60, in <module>
main()
File "/opt/scylladb/scripts/libexec/scylla_fstrim", line 44, in main
cfg = parse_scylla_dirs_with_default(conf=args.config)
File "/opt/scylladb/scripts/scylla_util.py", line 484, in parse_scylla_dirs_with_default
if key not in y or not y[k]:
NameError: name 'k' is not defined

It is caused by an error in scylla_util.py.

Fixes #6294.

(cherry picked from commit 068bb3a5bf)
2020-05-07 19:45:50 +03:00
Juliusz Stasiewicz
727d6cf8f3 atomic_cell: special rule for printing counter cells
Until now, attempts to print a counter update cell would end up
calling abort() because `atomic_cell_view::value()` has no
specialized visitor for `imr::pod<int64_t>::basic_view<is_mutable>`,
i.e. counter update IMR type. Such visitor is not easy to write
if we want to intercept counters only (and not all int64_t values).

Anyway, a linearized byte representation of a counter cell would not
be helpful without knowing whether it consists of counter shards or
a counter update (delta) - and this must be known upon `deserialize`.

This commit introduces a simple approach: it determines the cell type
at a high level (from `atomic_cell_view`) and prints counter contents
via `counter_cell_view` or `atomic_cell_view::counter_update_value()`.

Fixes #5616

(cherry picked from commit 0ea17216fe)
2020-05-07 19:40:47 +03:00
Tomasz Grabiec
6d6d7b4abe sstables: Release reserved space for sharding metadata
The intention of the code was to clear sharding metadata
chunked_vector so that it doesn't bloat memory.

The type of c is `chunked_vector*`. Assigning `{}`
clears the pointer while the intended behavior was to reset the
`chunked_vector` instance. The original instance is left unmodified
with all its reserved space.

Because of this, the previous fix had no effect: token ranges
are stored entirely inline and popping them doesn't release memory.

Fixes #4951

Tests:
  - sstable_mutation_test (dev)
  - manual using scylla binary on customer data on top of 2019.1.5

Reviewed-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <1584559892-27653-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 5fe626a887)
2020-05-07 19:06:22 +03:00
Tomasz Grabiec
28f974b810 Merge "Don't return stale data by properly invalidating row cache after cleanup" from Raphael
Row cache needs to be invalidated whenever data in sstables
changes. Cleanup removes data from sstables which doesn't belong to
the node anymore, which means cache must be invalidated on cleanup.
Currently, stale data can be returned when a node re-owns ranges whose
data is still stored in the node's row cache, because cleanup didn't
invalidate the cache.

Fixes #4446.

tests:
- unit tests (dev mode)
- dtests:
    update_cluster_layout_tests.py:TestUpdateClusterLayout.simple_decommission_node_2_test
    cleanup_test.py

(cherry picked from commit d0b6be0820)
2020-05-07 16:24:51 +03:00
Piotr Sarna
5fdadcaf3b network_topology_strategy: validate integers
In order to prevent users from creating a network topology
strategy instance with invalid inputs, it's not enough to use
std::stol() on the input: a string "3abc" still returns the number '3',
but will later confuse cqlsh and other drivers, when they ask for
topology strategy details.
The error message is now more human readable, since for incorrect
numeric inputs it used to return a rather cryptic message:
    ServerError: stol()
This commit fixes the issue and comes with a simple test.
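The stricter parse can be sketched like this (illustrative, not the actual validation code): std::stol alone happily accepts "3abc", so the fix must also require that the whole string was consumed.

```cpp
#include <stdexcept>
#include <string>

// Sketch: parse a replication factor, rejecting trailing garbage.
// std::stol reports via `pos` how many characters it consumed.
long parse_rf_strict(const std::string& s) {
    size_t pos = 0;
    long v = std::stol(s, &pos);
    if (pos != s.size()) {
        throw std::invalid_argument("invalid replication factor: " + s);
    }
    return v;
}
```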

Fixes #3801
Tests: unit(dev)
Message-Id: <7aaae83d003738f047d28727430ca0a5cec6b9c6.1583478000.git.sarna@scylladb.com>

(cherry picked from commit 5b7a35e02b)
2020-05-07 16:24:49 +03:00
Pekka Enberg
a960394f27 scripts/jobs: Keep memory reserve when calculating parallelism
The "jobs" script is used to determine the amount of compilation
parallelism on a machine. It attempts to ensure each GCC process has at
least 4 GB of memory per core. However, in the worst case scenario, we
could end up having the GCC processes take up all the system memory,
forcing swapping or the OOM killer to kick in. For example, on a 4-core
machine with 16 GB of memory, this worst case scenario seems easy to
trigger in practice.

Fix up the problem by keeping a 1 GB of memory reserve for other
processes and calculating parallelism based on that.
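The resulting formula can be sketched in code (the real "jobs" script is a shell script; this is just the arithmetic, with illustrative names): keep a 1 GB reserve, then allow one compile job per 4 GB of remaining memory, capped by the core count.

```cpp
#include <algorithm>
#include <cstdint>

// Sketch of the parallelism calculation described above.
int compile_jobs(uint64_t mem_bytes, int cores) {
    const uint64_t reserve = 1ULL << 30;   // 1 GiB kept for other processes
    const uint64_t per_job = 4ULL << 30;   // 4 GiB per GCC process
    uint64_t usable = mem_bytes > reserve ? mem_bytes - reserve : 0;
    return std::max(1, std::min<int>(cores, usable / per_job));
}
```

On the 16 GB / 4-core worst case from the text this now yields 3 jobs rather than 4.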

Message-Id: <20200423082753.31162-1-penberg@scylladb.com>
(cherry picked from commit 7304a795e5)
2020-05-04 19:01:54 +03:00
Piotr Sarna
3216a1a70a alternator: fix signature timestamps
Generating timestamps for auth signatures used a non-thread-safe
::gmtime function instead of thread-safe ::gmtime_r.
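A minimal illustration of the difference (gmtime_r is the POSIX reentrant variant; this is not the alternator code): ::gmtime returns a pointer to shared static storage, while ::gmtime_r fills a caller-owned struct tm, so concurrent callers cannot race.

```cpp
#include <ctime>

// Sketch: derive the UTC year of a timestamp using the thread-safe API.
int utc_year(time_t t) {
    struct tm tm_buf;           // caller-owned buffer, no shared state
    gmtime_r(&t, &tm_buf);
    return tm_buf.tm_year + 1900;
}
```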

Tests: unit(dev)
Fixes #6345

(cherry picked from commit fb7fa7f442)
2020-05-04 17:08:13 +03:00
Avi Kivity
5a7fd41618 Merge 'Fix hang in multishard_writer' from Asias
"
This series fix hang in multishard_writer when error happens. It contains
- multishard_writer: Abort the queue attached to consumers when producer fails
- repair: Fix hang when the writer is dead

Fixes #6241
Refs: #6248
"

* asias-stream_fix_multishard_writer_hang:
  repair: Fix hang when the writer is dead
  mutation_writer_test: Add test_multishard_writer_producer_aborts
  multishard_writer: Abort the queue attached to consumers when producer fails

(cherry picked from commit 8925e00e96)
2020-05-01 20:13:00 +03:00
Raphael S. Carvalho
dd24ba7a62 api/service: fix segfault when taking a snapshot without keyspace specified
If no keyspace is specified when taking snapshot, there will be a segfault
because keynames is unconditionally dereferenced. Let's return an error
because a keyspace must be specified when column families are specified.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200427195634.99940-1-raphaelsc@scylladb.com>
(cherry picked from commit 02e046608f)

Fixes #6336.
2020-04-30 12:57:14 +03:00
Avi Kivity
204f6dd393 Update seastar submodule
* seastar a0bdc6cd85...0ebd89a858 (1):
  > http server: fix "Date" header format

Fixes #6253.
2020-04-26 19:31:44 +03:00
Nadav Har'El
b1278adc15 alternator: unzero "scylla_alternator_total_operations" metric
In commit 388b492040, which was only supposed
to move around code, we accidentally lost the line which does

    _executor.local()._stats.total_operations++;

So after this commit this counter was always zero...
This patch returns the line incrementing this counter.

Arguably, this counter is not very important - a user can also calculate
this number by summing up all the counters in the scylla_alternator_operation
array (these are counters for individual types of operations). Nevertheless,
as long as we do export a "scylla_alternator_total_operations" metric,
we need to correctly calculate it and can't leave it zero :-)

Fixes #5836

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200219162820.14205-1-nyh@scylladb.com>
(cherry picked from commit b8aed18a24)
2020-04-19 19:07:31 +03:00
Botond Dénes
ee9677ef71 schema: schema(): use std::stable_sort() to sort key columns
When multiple key columns (clustering or partition) are passed to
the schema constructor, all having the same column id, the expectation
is that these columns will retain the order in which they were passed to
`schema_builder::with_column()`. Currently however this is not
guaranteed as the schema constructor sort key columns by column id with
`std::sort()`, which doesn't guarantee that equally comparing elements
retain their order. This can be an issue for indexes, the schemas of
which are built independently on each node. If there is any room for
variance in the key column order, this can result in different
nodes having incompatible schemas for the same index.
The fix is to use `std::stable_sort()` which guarantees that the order
of equally comparing elements won't change.
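The guarantee difference can be demonstrated with a small stand-in for key columns (hypothetical types, not the schema code): after std::stable_sort, columns with equal ids keep their insertion order, which std::sort does not promise.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Sketch: sort (id, name) pairs by id only. std::stable_sort preserves
// the relative order of equal ids; std::sort may not.
std::vector<std::string> sort_key_columns(
        std::vector<std::pair<int, std::string>> cols) {
    std::stable_sort(cols.begin(), cols.end(),
                     [](const auto& a, const auto& b) { return a.first < b.first; });
    std::vector<std::string> names;
    for (const auto& c : cols) {
        names.push_back(c.second);
    }
    return names;
}
```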

This is a suspected cause of #5856, although we don't have hard proof.

Fixes: #5856
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
[avi: upgraded "Refs" to "Fixes", since we saw that std::sort() becomes
      unstable at 17 elements, and the failing schema had a
      clustering key with 23 elements]
Message-Id: <20200417121848.1456817-1-bdenes@scylladb.com>
(cherry picked from commit a4aa753f0f)
2020-04-19 18:19:05 +03:00
Nadav Har'El
2060e361cf materialized views: fix corner case of view updates used by Alternator
While CQL does not allow creation of a materialized view with more than one
base regular column in the view's key, in Alternator we do allow this - both
partition and clustering key may be a base regular column. We had a bug in
the logic handling this case:

If the new base row is missing a value for *one* of the view key columns,
we shouldn't create a view row. Similarly, if the existing base row was
missing a value for *one* of the view key columns, a view row does not
exist and doesn't need to be deleted.  This was done incorrectly, and made
decisions based on just one of the key columns, and the logic is now
fixed (and I think, simplified) in this patch.

With this patch, the Alternator test which previously failed because of
this problem now passes. The patch also includes new tests in the existing
C++ unit test test_view_with_two_regular_base_columns_in_key. This test
was already supposed to cover various cases of two-new-key-column
updates, but missed the cases explained above. These new tests failed
badly before this patch - some of them had clean write errors, others
caused crashes. With this patch, they pass.

Fixes #6008.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200312162503.8944-1-nyh@scylladb.com>
(cherry picked from commit 635e6d887c)
2020-04-19 15:24:19 +03:00
Hagit Segev
6f939ffe19 release: prepare for 3.3.1 2020-04-18 00:23:31 +03:00
Kamil Braun
69105bde8a sstables: freeze types nested in collection types in legacy sstables
Some legacy `mc` SSTables (created in Scylla 3.0) may contain incorrect
serialization headers, which don't wrap frozen UDTs nested inside collections
with the FrozenType<...> tag. When reading such SSTable,
Scylla would detect a mismatch between the schema saved in schema
tables (which correctly wraps UDTs in the FrozenType<...> tag) and the schema
from the serialization header (which doesn't have these tags).

SSTables created in Scylla versions 3.1 and above, in particular in
Scylla versions that contain this commit, create correct serialization
headers (which wrap UDTs in the FrozenType<...> tag).

This commit does two things:
1. for all SSTables created after this commit, include a new feature
   flag, CorrectUDTsInCollections, presence of which implies that frozen
   UDTs inside collections have the FrozenType<...> tag.
2. when reading a Scylla SSTable without the feature flag, we assume that UDTs
   nested inside collections are always frozen, even if they don't have
   the tag. This assumption is safe to be made, because at the time of
   this commit, Scylla does not allow non-frozen (multi-cell) types inside
   collections or UDTs, and because of point 1 above.

There is one edge case not covered: if we don't know whether the SSTable
comes from Scylla or from C*. In that case we won't make the assumption
described in 2. Therefore, if we get a mismatch between schema and
serialization headers of a table which we couldn't confirm to come from
Scylla, we will still reject the table. If any user encounters such an
issue (unlikely), we will have to use another solution, e.g. using a
separate tool to rewrite the SSTable.

Fixes #6130.

(cherry picked from commit 3d811e2f95)
2020-04-17 09:12:28 +03:00
Kamil Braun
e09e9a5929 sstables: move definition of column_translation::state::build to a .cc file
Ref #6130
2020-04-17 09:12:28 +03:00
Piotr Sarna
2308bdbccb alternator: use partition tombstone if there's no clustering key
As @tgrabiec helpfully pointed out, creating a row tombstone
for a table which does not have a clustering key in its schema
creates something that looks like an open-ended range tombstone.
That's problematic for KA/LA sstable formats, which are incapable
of writing such tombstones, so a workaround is provided
in order to allow using KA/LA in alternator.

Fixes #6035
Cherry-picked from 0a2d7addc0
2020-04-16 12:14:10 +02:00
Asias He
a2d39c9a2e gossip: Add an option to force gossip generation
Consider 3 nodes in the cluster, n1, n2, n3 with gossip generation
number g1, g2, g3.

n1, n2, n3 running scylla version with commit
0a52ecb6df (gossip: Fix max generation
drift measure)

One year later, the user wants to upgrade n1, n2, n3 to a new version.

When n3 does a rolling restart with a new version, n3 will use a new
generation number g3'. Because g3' - g2 > MAX_GENERATION_DIFFERENCE and
g3' - g1 > MAX_GENERATION_DIFFERENCE, n1 and n2 will reject n3's
gossip update and mark n3 as down.

Such unnecessary marking of node down can cause availability issues.
For example:

DC1: n1, n2
DC2: n3, n4

When n3 and n4 restart, n1 and n2 will mark n3 and n4 as down, which
causes the whole DC2 to be unavailable.

To fix, we can start the node with a gossip generation within
MAX_GENERATION_DIFFERENCE difference for the new node.

Once all the nodes run the version with commit
0a52ecb6df, the option is no longer
needed.

Fixes #5164

(cherry picked from commit 743b529c2b)
2020-03-27 12:49:23 +01:00
Asias He
5fe2ce3bbe gossiper: Always use the new generation number
User reported an issue that after a node restart, the restarted node
is marked as DOWN by other nodes in the cluster while the node is up
and running normally.

Consider the following:

- n1, n2, n3 in the cluster
- n3 shutdown itself
- n3 send shutdown verb to n1 and n2
- n1 and n2 set n3 in SHUTDOWN status and force the heartbeat version to
  INT_MAX
- n3 restarts
- n3 sends gossip shadow rounds to n1 and n2, in
  storage_service::prepare_to_join,
- n3 receives response from n1, in gossiper::handle_ack_msg; since
  _enabled = false and _in_shadow_round == false, n3 will apply the
  application state in fiber 1; fiber 1 finishes faster than fiber 2 and
  sets _in_shadow_round = false
- n3 receives response from n2, in gossiper::handle_ack_msg; since
  _enabled = false and _in_shadow_round == false, n3 will apply the
  application state in fiber 2; fiber 2 yields
- n3 finishes the shadow round and continues
- n3 resets gossip endpoint_state_map with
  gossiper.reset_endpoint_state_map()
- n3 resumes fiber 2, apply application state about n3 into
  endpoint_state_map, at this point endpoint_state_map contains
  information including n3 itself from n2.
- n3 calls gossiper.start_gossiping(generation_number, app_states, ...)
  with new generation number generated correctly in
  storage_service::prepare_to_join, but in
  maybe_initialize_local_state(generation_nbr), it will not set new
  generation and heartbeat if the endpoint_state_map contains itself
- n3 continues with the old generation and heartbeat learned in fiber 2
- n3 continues the gossip loop, in gossiper::run,
  hbs.update_heart_beat() the heartbeat is set to the number starting
  from 0.
- n1 and n2 will not get update from n3 because they use the same
  generation number but n1 and n2 has larger heartbeat version
- n1 and n2 will mark n3 as down even if n3 is alive.

To fix, always use the new generation number.

Fixes: #5800
Backports: 3.0 3.1 3.2
(cherry picked from commit 62774ff882)
2020-03-27 12:49:20 +01:00
Piotr Sarna
aafa34bbad cql: fix qualifying indexed columns for filtering
When qualifying columns to be fetched for filtering, we also check
if the target column is not used as an index - in which case there's
no need of fetching it. However, the check was incorrectly assuming
that any restriction is eligible for indexing, while it's currently
only true for EQ. The fix makes a more specific check and contains
many dynamic casts, but these will hopefully we gone once our
long planned "restrictions rewrite" is done.
This commit comes with a test.

Fixes #5708
Tests: unit(dev)

(cherry picked from commit 767ff59418)
2020-03-22 09:00:51 +01:00
Hagit Segev
7ae2cdf46c release: prepare for 3.3.0 2020-03-19 21:46:44 +02:00
Hagit Segev
863f88c067 release: prepare for 3.3.rc3 2020-03-15 22:45:30 +02:00
Avi Kivity
90b4e9e595 Update seastar submodule
* seastar f54084c08f...a0bdc6cd85 (1):
  > tls: Fix race and stale memory use in delayed shutdown

Fixes #5759 (maybe)
2020-03-12 19:41:50 +02:00
Konstantin Osipov
434ad4548f locator: correctly select endpoints if RF=0
SimpleStrategy creates a list of endpoints by iterating over the set of
all configured endpoints for the given token, until we reach keyspace
replication factor.
There is a trivial coding bug when we first add at least one endpoint
to the list, and then compare list size and replication factor.
If RF=0 this never yields true.
Fix by moving the RF check before at least one endpoint is added to the
list.
Cassandra never had this bug since it uses a less fancy while()
loop.
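The fixed loop shape can be sketched as follows (illustrative names, not the locator code): the RF check happens before an endpoint is added, so RF=0 yields an empty list instead of looping past the limit.

```cpp
#include <cstddef>
#include <vector>

// Sketch: collect endpoints up to the replication factor, checking the
// limit *before* adding, so rf == 0 correctly returns no endpoints.
std::vector<int> calculate_endpoints(const std::vector<int>& ring, size_t rf) {
    std::vector<int> eps;
    for (int ep : ring) {
        if (eps.size() >= rf) {  // check first, then add
            break;
        }
        eps.push_back(ep);
    }
    return eps;
}
```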

Fixes #5962
Message-Id: <20200306193729.130266-1-kostja@scylladb.com>

(cherry picked from commit ac6f64a885)
2020-03-12 12:09:46 +02:00
Avi Kivity
cbbb15af5c logalloc: increase capacity of _regions vector outside reclaim lock
Reclaim consults the _regions vector, so we don't want it moving around while
allocating more capacity. For that we take the reclaim lock. However, that
can cause a false-positive OOM during startup:

1. all memory is allocated to LSA as part of priming (2baa16b371)
2. the _regions vector is resized from 64k to 128k, requiring a segment
   to be freed (plenty are free)
3. but reclaiming_lock is taken, so we cannot reclaim anything.

To fix, resize the _regions vector outside the lock.

Fixes #6003.
Message-Id: <20200311091217.1112081-1-avi@scylladb.com>

(cherry picked from commit c020b4e5e2)
2020-03-12 11:25:20 +02:00
Benny Halevy
3231580c05 dist/redhat: scylla.spec.mustache: set _no_recompute_build_ids
By default, `/usr/lib/rpm/find-debuginfo.sh` will tamper with
the binary's build-id when stripping its debug info, as it is passed
the `--build-id-seed <version>.<release>` option.

To prevent that we need to set the following macros as follows:
  unset `_unique_build_ids`
  set `_no_recompute_build_ids` to 1

Fixes #5881

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 25a763a187)
2020-03-09 15:21:50 +02:00
Piotr Sarna
62364d9dcd Merge 'cql3: do_execute_base_query: fix null deref ...
... when clustering key is unavailable' from Benny

This series fixes a null pointer dereference seen in #5794

efd7efe cql3: generate_base_key_from_index_pk; support optional index_ck
7af1f9e cql3: do_execute_base_query: generate open-ended slice when clustering key is unavailable
7fe1a9e cql3: do_execute_base_query: fixup indentation

Fixes #5794

Branches: 3.3

Test: unit(dev) secondary_indexes_test:TestSecondaryIndexes.test_truncate_base(debug)

* bhalevy/fix-5794-generate_base_key_from_index_pk:
  cql3: do_execute_base_query: fixup indentation
  cql3: do_execute_base_query: generate open-ended slice when clustering key is unavailable
  cql3: generate_base_key_from_index_pk; support optional index_ck

(cherry picked from commit 4e95b67501)
2020-03-09 15:20:01 +02:00
Takuya ASADA
3bed8063f6 dist/debian: fix "unable to open node-exporter.service.dpkg-new" error
It seems like *.service conflicts at install time because the file is
installed twice, both via debian/*.service and debian/scylla-server.install.

We don't need to use *.install, so we can just drop the line.

Fixes #5640

(cherry picked from commit 29285b28e2)
2020-03-03 12:40:39 +02:00
Yaron Kaikov
413fcab833 release: prepare for 3.3.rc2 2020-02-27 14:45:18 +02:00
Juliusz Stasiewicz
9f3c3036bf cdc: set TTLs on CDC log cells
Cells in CDC logs used to be created while completely neglecting
TTLs (the TTLs from `cdc = {...'ttl':600}`). This patch adds TTLs
to all cells; there are no row markers, so we need not set a TTL
there.

Fixes #5688

(cherry picked from commit 67b92c584f)
2020-02-26 18:12:55 +02:00
Benny Halevy
ff2e108a6d gossiper: do_stop_gossiping: copy live endpoints vector
It can be resized asynchronously by mark_dead.

Fixes #5701

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200203091344.229518-1-bhalevy@scylladb.com>
(cherry picked from commit f45fabab73)
2020-02-26 13:00:11 +02:00
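The pattern behind the fix can be illustrated in Python (a hedged sketch; the real code copies a C++ vector): iterate over a snapshot so a callback may shrink the original container safely.

```python
def stop_gossiping(live_endpoints, mark_dead):
    """Sketch: mark_dead may resize live_endpoints asynchronously, so
    iterate over a copy taken up front."""
    for ep in list(live_endpoints):  # snapshot; the original may shrink
        mark_dead(ep)
```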
Gleb Natapov
ade788ffe8 commitlog: use commitlog IO scheduling class for segment zeroing
There may be other commitlog writes waiting for zeroing to complete, so
not using the proper scheduling class causes priority inversion.

Fixes #5858.

Message-Id: <20200220102939.30769-2-gleb@scylladb.com>
(cherry picked from commit 6a78cc9e31)
2020-02-26 12:51:10 +02:00
Benny Halevy
1f8bb754d9 storage_service: drain_on_shutdown: unregister storage_proxy subscribers from local_storage_service
Match subscription done in main() and avoid cross shard access
to _lifecycle_subscribers vector.

Fixes #5385

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Acked-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200123092817.454271-1-bhalevy@scylladb.com>
(cherry picked from commit 5b0ea4c114)
2020-02-25 16:39:49 +02:00
Tomasz Grabiec
7b2eb09225 Merge fixes for use-after-frees related to shutdown of services
Backport of 884d5e2bcb and
4839ca8491.

Fixes crashes when scylla is stopped early during boot.

Merged from https://github.com/xemul/scylla/tree/br-mm-combined-fixes-for-3.3

Fixes #5765.
2020-02-25 13:34:01 +01:00
Pavel Emelyanov
d2293f9fd5 migration_manager: Abort and wait cluster upgrade waiters
The maybe_schedule_schema_pull function waits for schema_tables_v3 to
become available. This is unsafe in case the migration manager
goes away before the feature is enabled.

Fix this by subscribing to the feature with a feature::listener and
waiting for a condition variable in maybe_schedule_schema_pull.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-02-24 14:18:15 +03:00
Pavel Emelyanov
25b31f6c23 migration_manager: Abort and wait delayed schema pulls
The sleep is interrupted with the abort source, the "wait" part
is done with the existing _background_tasks gate. Also we need
to make sure the gate stays alive till the end of the function,
so make use of the async_sharded_service (migration manager is
already such).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-02-24 14:18:15 +03:00
Pavel Emelyanov
742a1ce7d6 storage_service: Unregister from gossiper notifications ... at all
This unregistration doesn't happen currently, but doesn't seem to
cause any problems in general, as on stop the gossiper is stopped and
nothing from it hits the storage_service.

However (!) if an exception pops up after the storage_service
is subscribed on the gossiper but before the drain_on_shutdown defer action
is set up, then we _may_ get into the following situation:

- main's stuff gets unrolled back
- gossiper is not stopped (drain_on_shutdown defer is not set up)
- migration manager is stopped (with deferred action in main)
- a notification comes from the gossiper
    -> storage_service::on_change might want to pull schema with
       the help of local migration manager
    -> assert(local_is_initialized) strikes

Fix this by registering the storage_service with the gossiper a bit earlier
(both are already initialized by that time) and setting up the unregister
defer right afterwards.

Test: unit(dev), manual start-stop
Bug: #5628

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200130190343.25656-1-xemul@scylladb.com>
2020-02-24 14:18:15 +03:00
Avi Kivity
4ca9d23b83 Revert "streaming: Do not invalidate cache if no sstable is added in flush_streaming_mutations"
This reverts commit bdc542143e. Exposes a data resurrection
bug (#5838).
2020-02-24 10:02:58 +02:00
Avi Kivity
9e97f3a9b3 Update seastar submodule
* seastar dd686552ff...f54084c08f (2):
  > reactor: fallback to epoll backend when fs.aio-max-nr is too small
  > util: move read_sys_file_as() from iotune to seastar header, rename read_first_line_as()

Fixes #5638.
2020-02-20 10:25:00 +02:00
Piotr Dulikowski
183418f228 hh: handle counter update hints correctly
This patch fixes a bug that appears because of an incorrect interaction
between counters and hinted handoff.

When a counter is updated on the leader, it sends mutations to other
replicas that contain all counter shards from the leader. If consistency
level is achieved but some replicas are unavailable, a hint with
mutation containing counter shards is stored.

When a hint's destination node is no longer a replica for it, we attempt
to send the hint to all the current replicas. Previously,
storage_proxy::mutate was used for that purpose. It was incorrect
because that function treats mutations for counter tables as mutations
containing only a delta (by how much to increase/decrease the counter).
These two types of mutations have different serialization format, so in
this case a "shards" mutation is reinterpreted as "delta" mutation,
which can cause data corruption to occur.

This patch backports `storage_proxy::mutate_hint_from_scratch`
function, which bypasses special handling of counter mutations and
treats them as regular mutations - which is the correct behavior for
"shards" mutations.

Refs #5833.
Backports: 3.1, 3.2, 3.3
Tests: unit(dev)
(cherry picked from commit ec513acc49)
2020-02-19 16:49:12 +02:00
Piotr Sarna
756574d094 db,view: fix generating view updates for partition tombstones
The update generation path must track and apply all tombstones,
both from the existing base row (if read-before-write was needed)
and for the new row. One such path contained an error, because
it assumed that if the existing row is empty, then the update
can be simply generated from the new row. However, lack of the
existing row can also be the result of a partition/range tombstone.
If that's the case, it needs to be applied, because it's entirely
possible that this partition tombstone also hides the new row.
Without taking the partition tombstone into account, creating
a future tombstone and inserting an out-of-order write before it
in the base table can result in ghost rows in the view table.
This patch comes with a test which was proven to fail before the
changes.

Branches 3.1,3.2,3.3
Fixes #5793

Tests: unit(dev)
Message-Id: <8d3b2abad31572668693ab585f37f4af5bb7577a.1581525398.git.sarna@scylladb.com>
(cherry picked from commit e93c54e837)
2020-02-16 20:26:28 +02:00
Rafael Ávila de Espíndola
a348418918 service: Add a lock around migration_notifier::_listeners
Before this patch the iterations over migration_notifier::_listeners
could race with listeners being added and removed.

The addition side is not modified, since it is common to add a
listener during construction and it would require a fairly big
refactoring. Instead, the iteration is modified to use indexes instead
of iterators so that it is still valid if another listener is added
concurrently.

For removal we use a rw lock, since removing an element invalidates
indexes too. There are only a few places that needed refactoring to
handle unregister_listener returning a future<>, so this is probably
OK.

Fixes #5541.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200120192819.136305-1-espindola@scylladb.com>
(cherry picked from commit 27bd3fe203)
2020-02-16 20:13:42 +02:00
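The iteration side of the fix can be sketched like this (illustrative Python; the real code is C++ and also pairs this with a rw-lock guarding removal):

```python
def notify(listeners, event):
    """Sketch: index-based iteration stays valid when a listener appends
    another listener while being notified (removal still needs a lock)."""
    i = 0
    while i < len(listeners):
        listeners[i](event)
        i += 1
```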
Avi Kivity
06c0bd0681 Update seastar submodule
* seastar 3f3e117de3...dd686552ff (1):
  > perftune.py: Use safe_load() for fix arbitrary code execution

Fixes #5630.
2020-02-16 15:53:16 +02:00
Avi Kivity
223c300435 Point seastar submodule at scylla-seastar.git branch-3.3
This allows us to backport seastar patches to Scylla 3.3.
2020-02-16 15:51:46 +02:00
Gleb Natapov
ac8bef6781 commitlog: fix flushing an entry marked as "sync" in periodic mode
After 546556b71b we can have mixed writes into the commitlog:
some flush immediately, some do not. If a non-flushing write races with
a flushing one and becomes responsible for writing back its buffer into the
file, the flush will be skipped, which will cause the assert in batch_cycle() to
trigger since the flush position will not be advanced. Fix that by checking
whether the flush was skipped and, in that case, flushing our file
position explicitly.

Fixes #5670

Message-Id: <20200128145103.GI26048@scylladb.com>
(cherry picked from commit c654ffe34b)
2020-02-16 15:48:40 +02:00
Pavel Solodovnikov
68691907af lwt: fix handling of nulls in parameter markers for LWT queries
This patch affects the LWT queries with IF conditions of the
following form: `IF col in :value`, i.e. if the parameter
marker is used.

When executing a prepared query with a bound value
of `(None,)` (a tuple with null, e.g. with the Python driver), it is
serialized not as NULL but as an "empty" value (the serialization
format differs in each case).

Therefore, Scylla deserializes the parameters in the request as
empty `data_value` instances, which are, in turn, translated
to non-empty `bytes_opt` with an empty byte-string value later.

Account for this case too in the CAS condition evaluation code.

Example of a problem this patch aims to fix:

Suppose we have a table `tbl` with a boolean field `test` and
INSERT a row with NULL value for the `test` column.

Then the following update query fails to apply due to the
error in IF condition evaluation code (assume `v=(null)`):
`UPDATE tbl SET test=false WHERE key=0 IF test IN :v`
returns false in `[applied]` column, but is expected to succeed.

Tests: unit(debug, dev), dtest(prepared stmt LWT tests at https://github.com/scylladb/scylla-dtest/pull/1286)

Fixes: #5710

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20200205102039.35851-1-pa.solodovnikov@scylladb.com>
(cherry picked from commit bcc4647552)
2020-02-16 15:29:28 +02:00
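The essence of the evaluation fix can be sketched in Python (a hypothetical model of the condition check, not Scylla's actual API): a value that arrives as an empty byte string is treated the same as NULL.

```python
def in_condition_applies(stored, bound_values):
    """Sketch: evaluate `IF col IN :v` treating an "empty" serialized
    value (b"") the same as NULL (None)."""
    normalized = [None if v == b"" else v for v in bound_values]
    return stored in normalized
```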
Avi Kivity
f59d2fcbf1 Merge "stop passing tracing state pointer in client_state" from Gleb
"
client_state is used simultaneously by many requests running in parallel
while the tracing state pointer is per request. Those two facts do not sit
well together, and as a result the tracing state is sometimes overwritten
while still being used by an active request, which may cause an incorrect
trace or even a crash.
"

Fixes #5700.

Backported from 9f1f60fc38

* 'gleb/trace_fix_3.3_backport' of ssh://github.com/scylladb/seastar-dev:
  client_state: drop the pointer to a tracing state from client_state
  transport: pass tracing state explicitly instead of relying on it been in the client_state
  alternator: pass tracing state explicitly instead of relying on it been in the client_state
2020-02-16 15:23:41 +02:00
Asias He
bdc542143e streaming: Do not invalidate cache if no sstable is added in flush_streaming_mutations
The table::flush_streaming_mutations is used in the days when streaming
data goes to memtable. After switching to the new streaming, data goes
to sstables directly in streaming, so the sstables generated in
table::flush_streaming_mutations will be empty.

It is unnecessary to invalidate the cache if no sstables are added. To
avoid needless cache invalidation, which pokes holes in the cache, skip
calling _cache.invalidate() if the set of sstables is empty.

The steps are:

- STREAM_MUTATION_DONE verb is sent when streaming is done with old or
  new streaming
- table::flush_streaming_mutations is called in the verb handler
- cache is invalidated for the streaming ranges

In summary, this patch will avoid a lot of cache invalidation for
streaming.

Backports: 3.0 3.1 3.2
Fixes: #5769
(cherry picked from commit 5e9925b9f0)
2020-02-16 15:16:24 +02:00
Botond Dénes
061a02237c row: append(): downgrade assert to on_internal_error()
This assert, added by 060e3f8 is supposed to make sure the invariant of
the append() is respected, in order to prevent building an invalid row.
The assert however proved to be too harsh, as it converts any bug
causing out-of-order clustering rows into cluster unavailability.
Downgrade it to on_internal_error(). This will still prevent corrupt
data from spreading in the cluster, without the unavailability caused by
the assert.

Fixes: #5786
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200211083829.915031-1-bdenes@scylladb.com>
(cherry picked from commit 3164456108)
2020-02-16 15:12:46 +02:00
Gleb Natapov
35b6505517 client_state: drop the pointer to a tracing state from client_state
client_state is shared between requests and tracing state is per
request. It is not safe to use the former as a container for the latter
since a state can be overwritten prematurely by subsequent requests.

(cherry picked from commit 31cf2434d6)
2020-02-13 13:45:56 +02:00
Gleb Natapov
866c04dd64 transport: pass tracing state explicitly instead of relying on it been in the client_state
Multiple requests can use the same client_state simultaneously, so it is
not safe to use it as a container for a tracing state which is per request.
Currently the next request may overwrite the tracing state of the previous
one, causing, in the best case, a wrong trace to be taken, or a crash if the
overwritten pointer is freed prematurely.

Fixes #5700

(cherry picked from commit 9f1f60fc38)
2020-02-13 13:45:56 +02:00
Gleb Natapov
dc588e6e7b alternator: pass tracing state explicitly instead of relying on it been in the client_state
Multiple requests can use the same client_state simultaneously, so it is
not safe to use it as a container for a tracing state which is per
request. This is not yet an issue for alternator since it creates a
new client_state object for each request, but first of all it should not,
and second, the trace state will be dropped from the client_state by a later
patch.

(cherry picked from commit 38fcab3db4)
2020-02-13 13:45:56 +02:00
Takuya ASADA
f842154453 dist/debian: keep /etc/systemd .conf files on 'remove'
Since dpkg does not re-install conffiles that were removed by the user,
we are currently missing dependencies.conf and sysconfdir.conf on rollback.
To prevent this, we need to stop running
'rm -rf /etc/systemd/system/scylla-server.service.d/' on 'remove'.

Fixes #5734

(cherry picked from commit 43097854a5)
2020-02-12 14:26:40 +02:00
Yaron Kaikov
b38193f71d dist/docker: Switch to 3.3 release repository (#5756)
Change the SCYLLA_REPO_URL variable to point to branch-3.3 instead of
master. This ensures that Docker image builds that don't specify the
variable build from the right repository by default.
2020-02-10 11:11:38 +02:00
Rafael Ávila de Espíndola
f47ba6dc06 lua: Handle nil returns correctly
This is a minimum backport to 3.3.

With this patch lua nil values are mapped to CQL null values instead
of producing an error.

Fixes #5667

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200203164918.70450-1-espindola@scylladb.com>
2020-02-09 18:55:42 +02:00
Hagit Segev
0d0c1d4318 release: prepare for 3.3.rc1 2020-02-09 15:55:24 +02:00
Takuya ASADA
9225b17b99 scylla_post_install.sh: fix 'integer expression expected' error
awk returns a float value on Debian, which causes a postinst script failure
since we compare it as an integer value.
Replaced it with sed + bash.

Fixes #5569

(cherry picked from commit 5627888b7c)
2020-02-04 14:30:04 +02:00
Gleb Natapov
00b3f28199 db/system_keyspace: use user memory limits for local.paxos table
Treat writes to local.paxos as user memory, as the number of writes is
dependent on the amount of user data written with LWT.

Fixes #5682

Message-Id: <20200130150048.GW26048@scylladb.com>
(cherry picked from commit b08679e1d3)
2020-02-02 17:36:52 +02:00
Rafael Ávila de Espíndola
1bbe619689 types: Fix encoding of negative varint
We would sometimes produce an unnecessary extra 0xff prefix byte.

The new encoding matches what cassandra does.

This was both an efficiency and a correctness issue, as using a varint in a
key could produce different tokens.

Fixes #5656

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
(cherry picked from commit c89c90d07f)
2020-02-02 16:00:58 +02:00
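The intended encoding can be sketched in Python (a sketch of the minimal two's-complement rule, not Scylla's serializer): reserve exactly one sign bit, so e.g. -1 encodes as a single 0xff byte with no redundant prefix.

```python
def encode_varint(n):
    """Sketch: shortest big-endian two's-complement encoding of n.
    For n < 0, ~n holds the magnitude bits; +1 accounts for the sign bit."""
    bits = (n if n >= 0 else ~n).bit_length() + 1
    return n.to_bytes((bits + 7) // 8, "big", signed=True)
```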
Avi Kivity
c36f71c783 test: make eventually() more patient
We use eventually() in tests to wait for eventually consistent data
to become consistent. However, we see spurious failures indicating
that we wait too little.

Increasing the timeout has a negative side effect in that tests that
fail will now take longer to do so. However, this negative side effect
is negligible compared to false-positive failures, since they throw away large
test efforts and sometimes require a person to investigate the problem,
only to conclude it is a false positive.

This patch therefore makes eventually() more patient, by a factor of
32.

Fixes #4707.
Message-Id: <20200130162745.45569-1-avi@scylladb.com>

(cherry picked from commit ec5b721db7)
2020-02-01 13:20:22 +02:00
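The mechanism can be sketched as a retry loop (illustrative Python; the real eventually() is a C++ test helper, and the names here are hypothetical): raising the retry budget makes failures slower but removes most flaky false positives.

```python
import time

def eventually(check, retries=16, base_delay=0.001):
    """Sketch: retry an assertion with exponential backoff; only the
    final attempt lets the failure propagate."""
    for attempt in range(retries):
        try:
            return check()
        except AssertionError:
            time.sleep(base_delay * (2 ** min(attempt, 10)))
    return check()  # last try: let the real failure surface
```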
Pekka Enberg
f5471d268b release: prepare for 3.3.rc0 2020-01-30 14:00:51 +02:00
Takuya ASADA
fd5c65d9dc dist/debian: Use tilde for release candidate builds
We need to add '~' to handle rcX versions correctly on Debian variants
(merged at ae33e9f), but when we moved to the relocatable package we
mistakenly dropped the code, so add it again.

Fixes #5641

(cherry picked from commit dd81fd3454)
2020-01-28 18:34:48 +02:00
Avi Kivity
3aa406bf00 tools: toolchain: dbuild: relax process limit in container
Docker restricts the number of processes in a container to some
limit it calculates. This limit turns out to be too low on large
machines, since we run multiple links in parallel, and each link
runs many threads.

Remove the limit by specifying --pids-limit -1. Since dbuild is
meant to provide a build environment, not a security barrier,
this is okay (the container is still restricted by host limits).

I checked that --pids-limit is supported by old versions of
docker and by podman.

Fixes #5651.
Message-Id: <20200127090807.3528561-1-avi@scylladb.com>

(cherry picked from commit 897320f6ab)
2020-01-28 18:14:01 +02:00
Piotr Sarna
c0253d9221 db,view: fix checking for secondary index special columns
A mistake in handling legacy checks for the special 'idx_token' column
resulted in not properly recognizing materialized views backing secondary
indexes. The mistake is really a typo, but with bad
consequences - instead of checking whether the view schema is an index,
we checked the base schema, which is definitely not an index of
itself.

Branches 3.1,3.2 (asap)
Fixes #5621
Fixes #4744

(cherry picked from commit 9b379e3d63)
2020-01-21 23:32:11 +02:00
Avi Kivity
12bc965f71 atomic_cell: consistently use comma as separator in pretty-printers
The atomic_cell pretty printers use a mix of commas and semicolons.
This change makes them use commas everywhere, for consistency.
Message-Id: <20200116133327.2610280-1-avi@scylladb.com>
2020-01-16 17:26:33 +01:00
Nadav Har'El
1ed21d70dc merge: CDC: do mutation augmentation from storage proxy
Merged pull request https://github.com/scylladb/scylla/pull/5567
from Calle Wilund:

Fixes #5314

Instead of tying CDC handling into cql statement objects, this patch set
moves it to storage proxy, i.e. shared code for mutating stuff. This means
we automatically handle cdc for code paths outside cql (i.e. alternator).

It also adds API handling (though initially inefficient) for batch statements.

CDC is tied into storage proxy by giving the former a ref to the latter (per
shard). Initially this is not a constructor parameter, because right now we
have chicken and egg issues here. Hopefully, Pavels refactoring of migration
manager and notifications will untie these and this relationship can become
nicer.

The actual augmentation can (as stated above) be made much more efficient.
Hopefully, the stream management refactoring will deal with expensive stream
lookup, and eventually, we can maybe coalesce pre-image selects for batches.
However, that is left as an exercise for when deemed needed.

The augmentation API has an optional return value for a "post-image handler"
to be used, if returned, after the mutation call is finished (and successful).
It is not yet actually invoked from storage_proxy, but it is at least in the
call chain.
2020-01-16 17:12:56 +02:00
Avi Kivity
e677f56094 Merge "Enable general centos RPM (not only centos7)" from Hagit 2020-01-16 14:13:24 +02:00
Tomasz Grabiec
36d90e637e Merge "Relax migration manager dependencies" from Pavel Emalyanov
This set makes the dependencies between the migration manager (mm) and other
services cleaner; in particular, after the set:

- the query processor no longer needs migration manager
  (which doesn't need query processor either)

- the database no longer needs migration manager, thus the mutual
  dependency between these two is dropped, only migration manager
  -> database is left

- the migration manager -> storage_service dependency is relaxed,
  one more patchset will be needed to remove it, thus dropping one
  more mutual dependency between them, only the storage_service
  -> migration manager will be left

- the migration manager is stopped on drain, but several more
  services need it on stop, thus causing use-after-free problems;
  in particular, there was a caught bug where the view builder crashed
  when unregistering from the notifier list on stop. Fixed.

Tests: unit(dev)
Fixes: #5404
2020-01-16 12:12:25 +01:00
Hagit Segev
d0405003bd building-packages doc: Update no specific el7 on path 2020-01-16 12:49:08 +02:00
Rafael Ávila de Espíndola
c42a2c6f28 configure: Add -O1 when compiling generated parsers
Enabling asan enables a few cleanup optimizations in gcc. The net
result is that using

  -fsanitize=address -fno-sanitize-address-use-after-scope

Produces code that uses a lot less stack than if the file is compiled
with just -O0.

This patch adds -O1 in addition to
-fno-sanitize-address-use-after-scope to protect the unfortunate
developer that decides to build in dev mode with --cflags='-O0 -g'.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200116012318.361732-2-espindola@scylladb.com>
2020-01-16 12:05:50 +02:00
Rafael Ávila de Espíndola
317e0228a8 configure: Put user flags after the mode flags
It is sometimes convenient to build with flags that don't match any
existing mode.

Recently I was tracking a bug that would not reproduce with debug, but
reproduced with dev, so I tried debugging the result of

./configure.py --cflags="-O0 -g"

While the binary had debug info, it still had optimizations because
configure.py put the mode flags after the user flags (-O0 -O1). This
patch flips the order (-O1 -O0) so that the flags passed in the
command line win.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200116012318.361732-1-espindola@scylladb.com>
2020-01-16 12:05:50 +02:00
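The compiler behavior this relies on can be sketched as a last-wins resolution (illustrative Python; gcc and clang resolve conflicting -O flags this way):

```python
def effective_opt_level(cflags):
    """Sketch: like gcc/clang, the last -O flag on the command line wins."""
    level = "-O0"  # assume a default when no -O flag is given
    for flag in cflags:
        if flag.startswith("-O"):
            level = flag
    return level
```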
Gleb Natapov
51281bc8ad lwt: fix write timeout exception reporting
The CQL transport code relies on an exception's C++ type to create the correct
reply, but in LWT we converted some mutation_timeout exceptions to the more
generic request_timeout while forwarding them, which broke the protocol.
Do not drop type information.

Fixes #5598.

Message-Id: <20200115180313.GQ9084@scylladb.com>
2020-01-16 12:05:50 +02:00
Piotr Jastrzębski
0c8c1ec014 config: fix description of enable_deprecated_partitioners
Murmur3 is the default partitioner.
ByteOrder and Random are the deprecated ones
and should be mentioned in the description.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-01-16 12:05:50 +02:00
Nadav Har'El
9953a33354 merge "Adding a schema file when creating a snapshot"
Merged pull request https://github.com/scylladb/scylla/pull/5294 from
Amnon Heiman:

To use a snapshot we need a schema file that is similar to the result of
running the CQL DESCRIBE command.

DESCRIBE is implemented in the CQL driver, so the functionality needs
to be re-implemented inside Scylla.

This series adds a describe method to the schema and uses it when doing
a snapshot.

There are different approaches to handling materialized views and
secondary indexes.

This implementation creates each schema.cql file in its own relevant
directory, so the schema for a materialized view, for example, will be
placed in the snapshot directory of the table of that view.

Fixes #4192
2020-01-16 12:05:50 +02:00
Piotr Dulikowski
c383652061 gossip: allow for aborting on sleep
This commit makes most sleeps in gossip.cc abortable. It is now possible
to quickly shut down a node during startup, most notably during the
phase while it waits for gossip to settle.
2020-01-16 12:05:50 +02:00
Avi Kivity
e5e0642f2a tools: toolchain: add dependencies for building debian and rpm packages
This reduces network traffic and eliminates time for installation when
building packages from the frozen toolchain, as well as isolating the
build from updates to those dependencies which may cause breakage.
2020-01-16 12:05:50 +02:00
Pekka Enberg
da9dae3dbe Merge 'test.py: add support for CQL tests' from Kostja
This patch set adds support for CQL tests to test.py,
as well as many other improvements:

* --name is now a positional argument
* test output is preserved in testlog/${mode}
* concise output format
* better color support
* arbitrary number of test suites
* per-suite yaml-based configuration
* options --jenkins and --xunit are removed and xml
  files are generated for all runs

A simple driver is written in C++ to read CQL from
standard input, execute it in embedded mode and produce output.

The patch is checked with BYO.

Reviewed-by: Dejan Mircevski <dejan@scylladb.com>
* 'test.py' of github.com:/scylladb/scylla-dev: (39 commits)
  test.py: introduce BoostTest and virtualize custom boost arguments
  test.py: sort tests within a suite, and sort suites
  test.py: add a basic CQL test
  test.py: add CQL .reject files to gitignore
  test.py: print a colored unidiff in case of test failure
  test.py: add CqlTestSuite to run CQL tests
  test.py: initial import of CQL test driver, cql_repl
  test.py: remove custom colors and define a color palette
  test.py: split test output per test mode
  test.py: remove tests_to_run
  test.py: virtualize Test.run(), to introduce CqlTest.Run next
  test.py: virtualize test search pattern per TestSuite
  test.py: virtualize write_xunit_report()
  test.py: ensure print_summary() is agnostic of test type
  test.py: tidy up print_summary()
  test.py: introduce base class Test for CQL and Unit tests
  test.py: move the default arguments handling to UnitTestSuite
  test.py: move custom unit test command line arguments to suite.yaml
  test.py: move command line argument processing to UnitTestSuite
  test.py: introduce add_test(), which is suite-specific
  ...
2020-01-16 12:05:50 +02:00
Pekka Enberg
e8b659ec5d dist/docker: Remove Ubuntu-based Docker image
The Ubuntu-based Docker image uses Scylla 1.0 and has not been updated
since 2017. Let's remove it as unmaintained.

Message-Id: <20200115102405.23567-1-penberg@scylladb.com>
2020-01-16 12:05:50 +02:00
Avi Kivity
546556b71b Merge "allow commitlog to wait for specific entires to be flushed on disk" from Gleb
"
Currently commitlog supports two modes of operation. First is 'periodic'
mode where all commitlog writes are ready the moment they are stored in
a memory buffer and the memory buffer is flushed to a storage periodically.
Second is a 'batch' mode where each write is flushed as soon as possible
(after previous flush completed) and writes are only ready after they
are flushed.

The first option is not very durable, the second is not very efficient.
This series adds an option to mark some writes as "more durable" in
periodic mode meaning that they will be flushed immediately and reported
complete only after the flush is complete (flushing a durable write also
flushes all writes that came before it). It also changes paxos to use
those durable writes to store paxos state.

Note that strictly speaking the last patch is not needed since after
writing to an actual table the code updates the paxos table, and the latter
uses durable writes that make sure all previous writes are flushed. Given
that both writes are supposed to run on the same shard, this should be enough.
But it feels right to make base table writes durable as well.
"

* 'gleb/commilog_sync_v4' of github.com:scylladb/seastar-dev:
  paxos: immediately sync commitlog entries for writes made by paxos learn stage
  paxos: mark paxos table schema as "always sync"
  schema: allow schema to be marked as 'always sync to commitlog'
  commitlog: add test for per entry sync mode
  database: pass sync flag from db::apply function to the commitlog
  commitlog: add sync method to entry_writer
2020-01-16 12:05:50 +02:00
Rafael Ávila de Espíndola
2ebd1463b2 tests: Handle null and not present values differently
Before this patch result_set_assertions was handling both null values
and missing values in the same way.

This patch changes the handling of missing values so that now checking
for a null value is not the same as checking for a value not being
present.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200114184116.75546-1-espindola@scylladb.com>
2020-01-16 12:05:50 +02:00
Botond Dénes
0c52c2ba50 data: make cell::make_collection(): more consistent and safer
3ec889816 changed cell::make_collection() to take different code paths
depending on whether its `data` argument is nothrow copyable/movable or
not. In case it is not, it is wrapped in a view to make it so (see the
above mentioned commit for a full explanation), relying on the methods
pre-existing requirement for callers to keep `data` alive while the
created writer is in use.
On closer look however it turns out that this requirement is neither
respected, nor enforced, at least not on the code level. The real
requirement is that the underlying data represented by `data` is kept
alive. If `data` is a view, it is not expected to be kept alive and
callers don't keep it; it is instead copied into `make_collection()`.
Non-views however *are* expected to be kept alive. This makes the API
error prone.
To avoid any future errors due to this ambiguity, require all `data`
arguments to be nothrow copyable and movable. Callers are now required
to pass views of nonconforming objects.

This patch is a usability improvement and is not fixing a bug. The
current code works as-is because it happens to conform to the underlying
requirements.

Refs: #5575
Refs: #5341

Tests: unit(dev)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200115084520.206947-1-bdenes@scylladb.com>
2020-01-16 12:05:50 +02:00
Amnon Heiman
ac8aac2b53 tests/cql_query_test: Add schema describe tests
This patch adds tests for the describe method.

test_describe_simple_schema tests regular tables.

test_describe_view_schema tests view and index.

Each test creates a table, finds its schema, calls the describe method, and
compares the result to the string that was used to create the table.

The view tests also verify that adding an index or view does not change
the base table.

When comparing results, leading and trailing whitespace is ignored
and all combinations of whitespace and newlines are treated equally.

Additional tests may be added at a future phase if required.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2020-01-15 15:07:57 +02:00
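The whitespace-insensitive comparison described above can be done by normalizing both strings before comparing. A minimal sketch (not the test's actual helper): trim leading/trailing whitespace and collapse any run of spaces/newlines into a single space.

```cpp
#include <cctype>
#include <string>

// Collapse every run of whitespace (spaces, tabs, newlines) into one space
// and drop leading/trailing whitespace.
std::string normalize_ws(const std::string& s) {
    std::string out;
    bool in_ws = true;  // start true to swallow leading whitespace
    for (unsigned char c : s) {
        if (std::isspace(c)) {
            in_ws = true;
        } else {
            if (in_ws && !out.empty()) {
                out += ' ';
            }
            in_ws = false;
            out += c;
        }
    }
    return out;
}

bool same_schema_text(const std::string& a, const std::string& b) {
    return normalize_ws(a) == normalize_ws(b);
}
```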
Amnon Heiman
028525daeb database: add schema.cql file when creating a snapshot
When creating a snapshot we need to add a schema.cql file in the
snapshot directory that describes the table in that snapshot.

This patch adds the file using the schema describe method.

get_snapshot_details and manifest_json_filter were modified to ignore
the schema.cql file.

Fixes #4192

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2020-01-15 15:06:00 +02:00
Amnon Heiman
82367b325a schema: Add a describe method
This patch adds a describe method to a table schema.

It acts similarly to the DESCRIBE CQL command that is implemented in CQL
drivers.

The method supports tables, secondary indexes, local indexes, and
materialized views.

Relates to #4192

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2020-01-15 15:06:00 +02:00
Amnon Heiman
6f58d51c83 secondary_index_manager: add the index_name_from_table_name function
index_name_from_table_name is the reverse of index_table_name:
it takes a table name that was generated for an index and returns the name
of the index that generated that table.

Relates to #4192

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2020-01-15 15:06:00 +02:00
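Assuming the forward mapping appends a fixed suffix to the index name (the actual suffix and function live in secondary_index_manager; `"_index"` here is an assumption for illustration), the reverse mapping is a suffix strip:

```cpp
#include <optional>
#include <string>

// Hypothetical suffix; the real one is defined by index_table_name().
inline const std::string index_suffix = "_index";

// Return the index name that generated a table name, or nullopt if the
// table name does not follow the index-backing naming convention.
std::optional<std::string> index_name_from_table_name(const std::string& table_name) {
    if (table_name.size() <= index_suffix.size() ||
        table_name.compare(table_name.size() - index_suffix.size(),
                           index_suffix.size(), index_suffix) != 0) {
        return std::nullopt;  // not an index-backing table
    }
    return table_name.substr(0, table_name.size() - index_suffix.size());
}
```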
Pavel Emelyanov
555856b1cd migration_manager: Use in-place value factory
The factory is purely stateless; it makes no difference which
instance of it is used, so we can omit referencing the storage_service
in passive_announce

This is the 2nd simple migration_manager -> storage_service link to cut
(more to come later).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:29:21 +03:00
Pavel Emelyanov
f129d8380f migration_manager: Get database through storage_proxy
There are several places where migration_manager needs a storage_service
reference to get the database from, thus forming a mutual dependency
between them. This is the simplest case where the migration_manager
link to the storage_service can be cut -- the database reference can be
obtained from storage_proxy instead.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:29:21 +03:00
Pavel Emelyanov
5cf365d7e7 database: Explicitly pass migration_manager through init_non_system_keyspace
This is the last place where database code needs the migration_manager
instance to be alive, so now the mutual dependency between these two
is gone: only the migration_manager needs the database, but not
vice versa.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:29:21 +03:00
Pavel Emelyanov
ebebf9f8a8 database: Do not request migration_manager instance for passive_announce
The helper in question is static, so no need to play with the
migration_manager instances.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:29:21 +03:00
Pavel Emelyanov
3f84256853 migration_manager: Remove register/unregister helpers
In the 2nd patch the migration_manager kept those for
simpler patching, but now we can drop them.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:29:21 +03:00
Pavel Emelyanov
9e4b41c32a tests: Switch on migration notifier
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:29:21 +03:00
Pavel Emelyanov
9d31bc166b cdc: Use migration_notifier to (un)register for events
If none is provided, get it from storage_service.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:29:19 +03:00
Pavel Emelyanov
ecab51f8cc storage_service: Use migration_notifier (and stop worrying)
The storage_service needs migration_manager for notifications and
carefully handles the manager's stop process so as not to demolish the
listeners list from under itself. From now on this dependency is
no longer valid (the storage_service still seems to need the
migration_manager, but that is a different story).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:28:21 +03:00
Pavel Emelyanov
7814ed3c12 cql_server: Use migration_notifier in events_notifier
This patch removes an implicit cql_server -> migration_manager
dependency, as the former's event notifier uses the latter
for notifications.

This dependency also breaks a loop:
storage_service -> cql_server -> migration_manager -> storage_service

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:28:21 +03:00
Pavel Emelyanov
d9edcb3f15 query_processor: Use migration_notifier
This patch breaks one (probably harmless, but still) dependency
loop: query_processor -> migration_manager -> storage_proxy
 -> tracing -> query_processor.

The first link is not needed, as the query_processor needs the
migration_manager purely to (un)subscribe to notifications.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:28:21 +03:00
Pavel Emelyanov
2735024a53 auth: Use migration_notifier
The same as with the view builder. The constructor still needs both,
but the life-time reference is now to the notifier only.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:28:21 +03:00
Pavel Emelyanov
28f1250b8b view_builder: Use migration notifier
The migration manager itself is still needed on start to wait
for schema agreement, but there's no longer a need for a
life-time reference to it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:28:21 +03:00
Pavel Emelyanov
7cfab1de77 database: Switch on mnotifier from migration_manager
Do not call the local migration manager instance to send notifications;
call the local migration notifier, which will always be alive.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:28:21 +03:00
Pavel Emelyanov
f45b23f088 storage_service: Keep migration_notifier
The storage service will need it to initialize sub-services
with. It also registers itself with the notifier.

That said, it's convenient to have the migration notifier on board.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:28:21 +03:00
Pavel Emelyanov
e327feb77f database: Prepare to use on-database migration_notifier
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:28:21 +03:00
Pavel Emelyanov
f240d5760c migration_manager: Split notifier from main class
The _listeners list on the migration_manager class and the corresponding
notify_xxx helpers have nothing to do with its instances; they
are just a transport for notification delivery.

At the same time, some services need the migration manager to be alive
at their stop time to unregister from it, while the manager itself
may need them for its own needs.

The proposal is to move the migration notifier into a completely separate
sharded "service". This service doesn't need anything, so it's started
first and stopped last.

While it's not effectively a "migration" notifier, we inherited the name
from Cassandra and renaming it will "scramble neurons in the old-timers'
brains but will make it easier for newcomers" as Avi says.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:28:19 +03:00
Pavel Emelyanov
074cc0c8ac migration_manager: Helpers for on_before_ notifications
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:27:27 +03:00
Pavel Emelyanov
1992755c72 storage_service: Kill initialization helper from init.cc
The helper just makes further patching more complex, so drop it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:27:27 +03:00
Konstantin Osipov
a665fab306 test.py: introduce BoostTest and virtualize custom boost arguments 2020-01-15 13:37:25 +03:00
Gleb Natapov
51672e5990 paxos: immediately sync commitlog entries for writes made by paxos learn stage 2020-01-15 12:15:42 +02:00
Gleb Natapov
0fc48515d8 paxos: mark paxos table schema as "always sync"
We want all writes to the paxos table to be persisted on storage before
being declared complete.
2020-01-15 12:15:42 +02:00
Gleb Natapov
16e0fc4742 schema: allow schema to be marked as 'always sync to commitlog'
All writes that use this schema will be immediately persisted on
storage.
2020-01-15 12:15:42 +02:00
Gleb Natapov
0ce70c7a04 commitlog: add test for per entry sync mode 2020-01-15 12:15:42 +02:00
Gleb Natapov
29574c1271 database: pass sync flag from db::apply function to the commitlog
Allow upper layers to request that a mutation be persisted on disk before
making the future ready, independent of which mode the commitlog is running in.
2020-01-15 12:15:42 +02:00
Gleb Natapov
e0bc4aa098 commitlog: add sync method to entry_writer
If the method returns true, the commitlog should sync to file immediately
after writing the entry and wait for the flush to complete before returning.
2020-01-15 12:15:42 +02:00
Piotr Sarna
9aab75db60 alternator: clean up single value rjson comparator
The comparator is refreshed to ensure the following:
 - null compares less to all other types;
 - null, true and false are comparable against each other,
   while other types are only comparable against themselves and null.

Comparing mixed types is not currently reachable from the alternator
API, because it's only used for sets, which can only use
strings, binary blobs and numbers - thus, no new pytest cases are added.

Fixes #5454
2020-01-15 10:57:49 +02:00
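The comparison rules from the commit message can be captured in a small model: null sorts below every other type, null and the booleans are mutually comparable, and any other type only compares against itself (or null). This is a toy over `std::variant`, not the real rjson code:

```cpp
#include <stdexcept>
#include <string>
#include <variant>

struct null_t {};
using value = std::variant<null_t, bool, double, std::string>;

// Returns <0, 0, or >0; throws if the two types are not comparable.
int compare(const value& a, const value& b) {
    bool a_null = std::holds_alternative<null_t>(a);
    bool b_null = std::holds_alternative<null_t>(b);
    if (a_null || b_null) {
        // null < everything else; null == null
        return int(!a_null) - int(!b_null);
    }
    if (a.index() != b.index()) {
        // true/false share the bool alternative, so they reach the bool
        // branch below; genuinely mixed types (e.g. bool vs number) throw.
        throw std::invalid_argument("types not comparable");
    }
    if (auto* x = std::get_if<bool>(&a)) {
        return int(*x) - int(std::get<bool>(b));
    }
    if (auto* x = std::get_if<double>(&a)) {
        double y = std::get<double>(b);
        return (*x > y) - (*x < y);
    }
    return std::get<std::string>(a).compare(std::get<std::string>(b));
}
```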
Juliusz Stasiewicz
d87d01b501 storage_proxy: intercept rpc::closed_error if counter leader is down (#5579)
When counter mutation is about to be sent, a leader is elected, but
if the leader fails after election, we get `rpc::closed_error`. The
exception propagates high up, causing all connections to be dropped.

This patch intercepts `rpc::closed_error` in `storage_proxy::mutate_counters`
and translates it to `mutation_write_failure_exception`.

References #2859
2020-01-15 09:56:45 +01:00
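The translation this patch performs in `storage_proxy::mutate_counters` amounts to a catch-and-rethrow at the RPC boundary. A sketch with stand-in exception types (the real ones are `rpc::closed_error` and `mutation_write_failure_exception`; names and shapes here are simplified):

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for the real exception types.
struct closed_error : std::runtime_error {
    closed_error() : std::runtime_error("connection closed") {}
};
struct mutation_write_failure : std::runtime_error {
    explicit mutation_write_failure(const std::string& why)
        : std::runtime_error("write failure: " + why) {}
};

// The leader died after election: surface a write failure to the client
// instead of letting the transport error propagate and tear down connections.
template <typename Func>
void mutate_counters_like(Func&& send_to_leader) {
    try {
        send_to_leader();
    } catch (const closed_error& e) {
        throw mutation_write_failure(e.what());
    }
}
```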
Konstantin Osipov
a351ea57d5 test.py: sort tests within a suite, and sort suites
This makes it easier to navigate the test artefacts.

No need to sort suites since they are already
stored in a dict.
2020-01-15 11:41:19 +03:00
Konstantin Osipov
ba87e73f8e test.py: add a basic CQL test 2020-01-15 11:41:19 +03:00
Konstantin Osipov
44d31db1fc test.py: add CQL .reject files to gitignore
To avoid accidental commit, add .reject files to .gitignore
2020-01-15 11:41:19 +03:00
Konstantin Osipov
4f64f0c652 test.py: print a colored unidiff in case of test failure
Print a colored unidiff between result and reject files in case of test
failure.
2020-01-15 11:41:19 +03:00
Konstantin Osipov
d3f9e64028 test.py: add CqlTestSuite to run CQL tests
Run the test and compare results. Manage temporary
and .reject files.

Now that there are CQL tests, improve logging.

run_test success no longer means test success.
2020-01-15 11:41:19 +03:00
Konstantin Osipov
b114bfe0bd test.py: initial import of CQL test driver, cql_repl
cql_repl is a simple program which reads CQL from stdin,
executes it, and writes results to stdout.

It supports --input, --output and --log options.
--log is directed to cql_test.log by default.
--input is stdin by default.
--output is stdout by default.

The result set output is printed with a basic
JSON visitor.
2020-01-15 11:41:16 +03:00
Konstantin Osipov
0ec27267ab test.py: remove custom colors and define a color palette
Using a standard Python module improves readability,
and allows using colors easily in other output.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
0165413405 test.py: split test output per test mode
Store test temporary files and logs in ${testdir}/${mode}.
Remove --jenkins and --xunit, and always write XML
files at a predefined location: ${testdir}/${mode}/xml/.

Use the .xunit.xml extension for tests whose XML output is
in xunit format, and junit.xml for the accumulated output
of all non-boost tests in junit format.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
4095ab08c8 test.py: remove tests_to_run
Avoid storing each test twice; use the per-suite
test lists to construct a global iterable.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
169128f80b test.py: virtualize Test.run(), to introduce CqlTest.Run next 2020-01-15 10:53:24 +03:00
Konstantin Osipov
d05f6c3cc7 test.py: virtualize test search pattern per TestSuite
CQL tests have .cql extension, while unit tests
have .cc.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
abcc182ab3 test.py: virtualize write_xunit_report()
Make sure any non-boost test can participate in the report.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
18aafacfad test.py: ensure print_summary() is agnostic of test type
Introduce a virtual Test.print_summary() to print
a failed test summary.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
21fbe5fa81 test.py: tidy up print_summary()
Now that we have tabular output, make print_summary()
more concise.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
c171882b51 test.py: introduce base class Test for CQL and Unit tests 2020-01-15 10:53:24 +03:00
Konstantin Osipov
fd6897d53e test.py: move the default arguments handling to UnitTestSuite
Move UnitTest default seastar argument handling to UnitTestSuite
(cleanup).
2020-01-15 10:53:24 +03:00
Konstantin Osipov
d3126f08ed test.py: move custom unit test command line arguments to suite.yaml
Load the command line arguments, if any, from suite.yaml, rather
than keep them hard-coded in test.py.

This allows the operations team to have easier access to these.

Note I had to sacrifice dynamic smp count for mutation_reader_test
(the new smp count is fixed at 3) since this is part
of test configuration now.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
ef6cebcbd2 test.py: move command line argument processing to UnitTestSuite 2020-01-15 10:53:24 +03:00
Konstantin Osipov
4a20617be3 test.py: introduce add_test(), which is suite-specific 2020-01-15 10:53:24 +03:00
Konstantin Osipov
7e10bebcda test.py: move long test list to suite.yaml
Use suite.yaml for long tests
2020-01-15 10:53:24 +03:00
Konstantin Osipov
32ffde91ba test.py: move test id assignment to TestSuite
Going forward finding and creating tests will be
a responsibility of TestSuite, so the id generator
needs to be shared.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
b5b4944111 test.py: move repeat handling to TestSuite
This way we can avoid iterating over all tests
to handle --repeat.
Besides, going forward the tests will be stored
in two places: in the global list of all tests,
for the runner, and per suite, for suite-based
reporting, so it's easier if TestSuite
is fully responsible for finding and adding tests.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
34a1b49fc3 test.py: move add_test_list() to TestSuite 2020-01-15 10:53:24 +03:00
Konstantin Osipov
44e1c4267c test.py: introduce test suites
- UnitTestSuite - for test/unit tests
- BoostTestSuite - a tweak on UnitTestSuite, with options
  to log xml test output to a dedicated file
2020-01-15 10:53:24 +03:00
Konstantin Osipov
eed3201ca6 test.py: use path, rather than test kind, for search pattern
Going forward there may be multiple suites of the same kind.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
f95c97667f test.py: support arbitrary number of test suites
Scan entire test/ for folders that contain suite.yaml,
and load tests from these folders. Skip the rest.

Each folder with a suite.yaml is expected to have a valid
suite configuration in the yaml file.

A suite is a folder with tests of the same type. E.g.
it can be a folder with unit tests, boost tests, or CQL
tests.

The harness will use suite.yaml to create an appropriate
suite test driver, to execute tests in different formats.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
c1f8169cd4 test.py: add suite.yaml to boost and unit tests
The plan is to move suite-specific settings to the
configuration file.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
ec9ad04c8a test.py: move 'success' to TestUnit class
There will be other success attributes: program return
status 0 doesn't mean the test is successful for all tests.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
b4aa4d35c3 test.py: save test output in tmpdir
It is handy to have it so that a reference of a failed
test is available without re-running it.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
f4efe03ade test.py: always produce xml output, derive output paths from tmpdir
It reduces the number of configurations to re-test when test.py is
modified, and simplifies usage of test.py in build tools, since you no
longer need to bother with extra arguments.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
d2b546d464 test.py: output job count in the log 2020-01-15 10:53:24 +03:00
Konstantin Osipov
233f921f9d test.py: make test output brief&tabular
New format:

% ./test.py --verbose --mode=release
================================================================================
[N/TOTAL] TEST                                                 MODE   RESULT
------------------------------------------------------------------------------
[1/111]   boost/UUID_test                                    release  [ PASS ]
[2/111]   boost/enum_set_test                                release  [ PASS ]
[3/111]   boost/like_matcher_test                            release  [ PASS ]
[4/111]   boost/observable_test                              release  [ PASS ]
[5/111]   boost/allocation_strategy_test                     release  [ PASS ]
^C
% ./test.py foo
================================================================================
[N/TOTAL] TEST                                                 MODE   RESULT
------------------------------------------------------------------------------
[3/3]     unit/memory_footprint_test                          debug   [ PASS ]
------------------------------------------------------------------------------
2020-01-15 10:53:24 +03:00
Konstantin Osipov
879bea20ab test.py: add a log file
Going forward I'd like to make terminal output brief&tabular,
but some test details are necessary to preserve so that a failure
is easy to debug. This information now goes to the log file.

- open and truncate the log file on each harness start
- log options of each invoked test in the log, so that
  a failure is easy to reproduce
- log test result in the log

Since tests are run concurrently, having an exact
trace of concurrent execution also helps
debugging flaky tests.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
cbee76fb95 test.py: gitignore the default ./test.py tmpdir, ./testlog 2020-01-15 10:53:24 +03:00
Konstantin Osipov
1de69228f1 test.py: add --tmpdir
It will be used for test log files.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
caf742f956 test.py: flake8 style fix 2020-01-15 10:53:24 +03:00
Konstantin Osipov
dab364c87d test.py: sort imports 2020-01-15 10:53:24 +03:00
Konstantin Osipov
7ec4b98200 test.py: make name a positional argument.
Accept multiple test names, treat test name
as a substring, and if the same name is given
multiple times, run the test multiple times.
2020-01-15 10:53:24 +03:00
Dejan Mircevski
bb2e04cc8b alternator: Improve comments on comparators
Some comparator methods in conditions.cc use unexpected operators;
explain why.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2020-01-14 22:25:55 +02:00
Tomasz Grabiec
c8a5a27bd9 Merge "storage_service: Move load_broadcaster away" from Pavel E.
The storage_service struct is a collection of diverse things,
most of them needed only on start and stop and/or running
on shard 0 (but it is nonetheless sharded).

As a part of cleaning up this structure and the inter-component
dependencies it generates, here's the sanitation of load_broadcaster.
2020-01-14 19:26:06 +01:00
Calle Wilund
313ed91ab0 cdc: Listen for migration callbacks on all shards
Fixes #5582

... but only populate log on shard 0.

Migration manager callbacks are slightly asymmetric. Notifications
for pre-create/update mutations are sent only on the initiating shard
(necessary, because we consider the mutations mutable).
But "created" callbacks are sent on all shards (immutable).

We must subscribe on all shards, but still populate the cdc table
only once, otherwise we can either miss a table creation or populate
it more than once.

v2:
- Add test case
Message-Id: <20200113140524.14890-1-calle@scylladb.com>
2020-01-14 16:35:41 +01:00
Avi Kivity
2138657d3a Update seastar submodule
* seastar 36cf5c5ff0...3f3e117de3 (16):
  > memcached: don't use C++17-only std::optional
  > reactor: Comment why _backend is assigned in constructor body
  > log: restore --log-to-stdout for backward compatibility
  > used_size.hh: Include missing headers
  > core: Move some code from reactor.cc to future.cc
  > future-util: move parallel_for_each to future-util.cc
  > task: stop wrapping tasks with unique_ptr
  > Merge "Setup timer signal handler in backend constructor" from Pavel
Fixes #5524
  > future: avoid a branch in future's move constructor if type is trivial
  > utils: Expose used_size
  > stream: Call get_future early
  > future-util: Move parallel_for_each_state code to a .cc
  > memcached: log exceptions
  > stream: Delete dead code
  > core: Turn pollable_fd into a simple proxy over pollable_fd_state.
  > Merge "log to std::cerr" from Benny
2020-01-14 16:56:25 +02:00
Pavel Emelyanov
e1ed8f3f7e storage_service: Remove _shadow_token_metadata
This is part of de-bloating storage_service.

The field in question is used to temporarily keep the _token_metadata
value during shard-wide replication. There's no need to have it as a
class member; any "local" copy is enough.

Also, as the size of token_metadata is huge, and invoke_on_all()
copies the function for each shard, keep one local copy of metadata
using do_with() and pass it into the invoke_on_all() by reference.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Reviewed-by:  Asias He <asias@scylladb.com>
Message-Id: <20200113171657.10246-1-xemul@scylladb.com>
2020-01-14 16:29:10 +02:00
Rafael Ávila de Espíndola
054f5761a7 types: Refactor code into a serialize_varint helper
This is a bit cleaner and avoids a boost::multiprecision::cpp_int copy
while serializing a decimal.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200110221422.35807-1-espindola@scylladb.com>
2020-01-14 16:28:27 +02:00
Avi Kivity
6c84dd0045 cql3: update_statement: do not set query option always_return_static_content for list read-before-write
The query option always_return_static_content was added for lightweight
transactions in commits e0b31dd273 (infrastructure) and 65b86d155e
(actual use). However, the flag was added unconditionally to
update_parameters::options. This caused it to be set for list
read-modify-write operations, not just for lightweight transactions.
This is a little wasteful, and worse, it breaks compatibility as old
nodes do not understand the always_return_static_content flag and
complain when they see it.

To fix, remove the always_return_static_content from
update_parameters::options and only set it from compare-and-swap
operations that are used to implement lightweight transactions.

Fixes #5593.

Reviewed-by: Gleb Natapov <gleb@scylladb.com>
Message-Id: <20200114135133.2338238-1-avi@scylladb.com>
2020-01-14 16:15:20 +02:00
Hagit Segev
ef88e1e822 CentOS RPMs: Remove target to enable general centos. 2020-01-14 14:31:03 +02:00
Alejo Sanchez
6909d4db42 cql3: BYPASS CACHE query counter
This patch is the first part of requested full scan metrics.
It implements a counter of SELECT queries with BYPASS CACHE option.

In scope of #5209

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Message-Id: <20200113222740.506610-2-alejo.sanchez@scylladb.com>
2020-01-14 12:19:00 +02:00
Rafael Ávila de Espíndola
dca1bc480f everywhere: Use serialized(foo) instead of data_value(foo).serialize()
This is just a simple cleanup that reduces the size of another patch I
am working on and is an independent improvement.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200114051739.370127-1-espindola@scylladb.com>
2020-01-14 12:17:12 +02:00
Pavel Emelyanov
b9f28e9335 storage_service: Remove dead drain branch
The drain_in_progress variable here is the future that's set by the
drain() operation itself. Its promise is set when the drain() finishes.

The check for this future at the beginning of drain() is pointless.
No two drain()-s can run in parallel because of the run_with_api_lock()
protection. Doing a 2nd drain after a successful 1st one is also
impossible due to the _operation_mode check. A 2nd drain after an
_exceptioned_ (and thus incomplete) 1st one would deadlock; after
this patch it will try to drain for the 2nd time, but that should be ok.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200114094724.23876-1-xemul@scylladb.com>
2020-01-14 12:07:29 +02:00
Piotr Sarna
36ec43a262 Merge "add table with connected cql clients" from Juliusz
This change introduces system.clients table, which provides
information about CQL clients connected.

PK is the client's IP address; CK consists of the outgoing port number
and client_type (which will be extended in the future to thrift/alternator/redis).
The table also supplies shard_id and username. Other columns,
like connection_stage, driver_name, driver_version...,
are currently empty but exist for C* compatibility and future use.

This is an ordinary table (i.e. non-virtual) and it's updated upon
accepting connections. This is also why C*'s column request_count
was not introduced. In case of abrupt DB stop, the table should not persist,
so it's being truncated on startup.

Resolves #4820
2020-01-14 10:01:07 +02:00
Avi Kivity
1f46133273 Merge "data: make cell::make_collection() exception safe" from Botond
"
Most of the code in `cell` and the `imr` infrastructure it is built on
is `noexcept`. This means that extra care must be taken to avoid rogue
exceptions as they will bring down the node. The changes introduced by
0a453e5d3a did just that - introduced a rogue `std::bad_alloc` into this
code path by violating an undocumented and unvalidated assumption --
that fragment ranges passed to `cell::make_collection()` are nothrow
copyable and movable.

This series refactors `cell::make_collection()` such that it does not
have this assumption anymore and is safe to use with any range.

Note that the unit test included in this series, which was used to find
all the possible exception sources, will not currently be run in any of
our build modes, due to `SEASTAR_ENABLE_ALLOC_FAILURE_INJECTION` not
being set. I plan to address this in a followup because setting this
flags fails other tests using the failure injection mechanism. This is
because these tests are normally run with the failure injection disabled
so failures managed to lurk in without anyone noticing.

Fixes: #5575
Refs: #5341

Tests: unit(dev, debug)
"

* 'data-cell-make-collection-exception-safety/v2' of https://github.com/denesb/scylla:
  test: mutation_test: add exception safety test for large collection serialization
  data/cell.hh: avoid accidental copies of non-nothrow copiable ranges
  utils/fragment_range.hh: introduce fragment_range_view
2020-01-14 10:01:06 +02:00
Nadav Har'El
5b08ec3d2c alternator: error on unsupported ScanIndexForward=false
We do not yet support the ScanIndexForward=false option for reversing
the sort order of a Query operation, as reported in issue #5153.
But even before implementing this feature, it is important that we
produce an error if a user attempts to use it - instead of outright
ignoring this parameter and giving the user wrong results. This is
what this patch does.

Before this patch, the reverse-order query in the xfailing test
test_query.py::test_query_reverse seems to succeed - yet gives
results in the wrong order. With this patch, the query itself fails -
stating that the ScanIndexForward=false argument is not supported.

Refs #5153

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200105113719.26326-1-nyh@scylladb.com>
2020-01-14 10:01:06 +02:00
Pavel Emelyanov
c4bf532d37 storage_service: Fix race in removenode/force_removenode/other
Here's another theoretical problem, that involves 3 sequential calls
to respectively removenode, force_removenode and some other operation.
Let's walk through them

First goes the removenode:
  run_with_api_lock
    _operation_in_progress = "removenode"
    storage_service::remove_node
      sleep in replicating_nodes.empty() loop

Now the force_removenode can run:

  run_with_no_api_lock
    storage_service::force_removenode
      check _operation_in_progress (not empty)
      _force_remove_completion = true
      sleep in _operation_in_progress.empty loop

Now the 1st call wakes up and:

    if _force_remove_completion == true
      throw <some exception>
  .finally() handler in run_with_api_lock
    _operation_in_progress = <empty>

At this point some other operation may start. Say, drain:

  run_with_api_lock
    _operation_in_progress = "drain"
    storage_service::drain
      ...
      go to sleep somewhere

Now let's go back to the 1st op that wakes up from its sleep.
The code it executes is

    while (!ss._operation_in_progress.empty()) {
        sleep_abortable()
    }

and while the drain is running it will never exit.

However (! and this is the core of the race) should the drain
operation happen _before_ the force_removenode, another check
for _operation_in_progress would have made the latter exit with
the "Operation drain is in progress, try again" message.

Fix this inconsistency by checking the current operation on
every wake-up from sleep_abortable.

Fixes #5591

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-14 10:01:06 +02:00
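The shape of the fix (re-check the in-progress operation on every wake-up, not just once before the loop) can be sketched in isolation. This toy stands in for the force_removenode wait loop; `wake_up()` plays the role of `sleep_abortable()` and all names are hypothetical:

```cpp
#include <string>

// Wait for the in-progress operation to finish, but bail out if some other
// operation took over while we slept -- the check runs on EVERY wake-up.
// Returns true if we observed an idle state, false if we had to bail.
template <typename CurrentOp, typename WakeUp>
bool wait_for_idle(CurrentOp&& current_op, WakeUp&& wake_up, int max_wakeups) {
    for (int i = 0; i < max_wakeups; ++i) {
        std::string op = current_op();
        if (op.empty()) {
            return true;   // nothing in progress: done
        }
        if (op != "removenode") {
            return false;  // another operation (e.g. drain) took over: bail
        }
        wake_up();
    }
    return false;          // gave up after max_wakeups
}
```

With the check inside the loop, a drain that starts after force_removenode went to sleep makes it exit with an error instead of spinning forever, matching the behavior it already had when the drain started first.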
Pavel Emelyanov
cc92683894 storage_service: Fix race and deadlock in removenode/force_removenode
Here's a theoretical problem, that involves 3 sequential calls
to respectively removenode, force_removenode and removenode (again)
operations. Let's walk through them

First goes the removenode:
  run_with_api_lock
    _operation_in_progress = "removenode"
    storage_service::remove_node
      sleep in replicating_nodes.empty() loop

Now the force_removenode can run:

  run_with_no_api_lock
    storage_service::force_removenode
      check _operation_in_progress (not empty)
      _force_remove_completion = true
      sleep in _operation_in_progress.empty loop

Now the 1st call wakes up and:

    if _force_remove_completion == true
      _force_remove_completion = false
      throw <some exception>
  .finally() handler in run_with_api_lock
    _operation_in_progress = <empty>

! at this point we have _force_remove_completion = false and
_operation_in_progress = <empty>, which opens the following
opportunity for the 3rd removenode:

  run_with_api_lock
    _operation_in_progress = "removenode"
    storage_service::remove_node
      sleep in replicating_nodes.empty() loop

Now here's what we have in 2nd and 3rd ops:

1. _operation_in_progress = "removenode" (set by 3rd) prevents the
   force_removenode from exiting its loop
2. _force_remove_completion = false (set by 1st on exit) prevents
   the removenode from waiting on replicating_nodes list

One can start the 4th call with force_removenode, it will proceed and
wake up the 3rd op, but after that we'll have two force_removenode-s
running in parallel and killing each other.

I propose not to set _force_remove_completion to false in removenode,
but just exit and let the owner of this flag unset it once it gets
control back.

Fixes #5590

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-14 10:01:06 +02:00
Benny Halevy
ff55b5dca3 cql3: functions: limit sum overflow detection to integral types
Other types do not have a wider accumulator at the moment.
And static_cast<accumulator_type>(ret) != _sum evaluates as
false for NaN/Inf floating point values.

Fixes #5586

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200112183436.77951-1-bhalevy@scylladb.com>
2020-01-14 10:01:06 +02:00
Avi Kivity
e3310201dd atomic_cell_or_collection: type-aware print atomic_cell or collection components
Now that atomic_cell_view and collection_mutation_view have
type-aware printers, we can use them in the type-aware atomic_cell_or_collection
printer.
Message-Id: <20191231142832.594960-1-avi@scylladb.com>
2020-01-14 10:01:06 +02:00
Avi Kivity
931b196d20 mutation_partition: row: resolve column name when in schema-aware printer
Instead of printing the column id, print the full column name.
Message-Id: <20191231142944.595272-1-avi@scylladb.com>
2020-01-14 10:01:06 +02:00
Nadav Har'El
4aa323154e merge: Pretty print canonical_mutation objects
Merged pull request https://github.com/scylladb/scylla/pull/5533
from Avi Kivity:

canonical_mutation objects are used for schema reconciliation, which is a
fragile area and thus deserves some debugging help.

This series makes canonical_mutation objects printable.
2020-01-14 10:01:06 +02:00
Takuya ASADA
5241deda2d dist: nonroot: fix CLI tool path for nonroot (#5584)
CLI tool path is hardcoded; we need to specify the correct path on nonroot.
2020-01-14 10:01:06 +02:00
Nadav Har'El
1511b945f8 merge: Handle multiple regular base columns in view pk
Merged patch series from Piotr Sarna:

"Previous assumption was that there can only be one regular base column
in the view key. The assumption is still correct for tables created
via CQL, but it's internally possible to create a view with multiple
such columns - the new assumption is that if there are multiple columns,
they share their liveness.

This series is vital for indexing to work properly on alternator,
so it would be best to solve the issue upstream. I strived to leave
the existing semantics intact as long as only up to one regular
column is part of the materialized view primary key, which is the case
for Scylla's materialized views. For alternator it may not be true,
but all regular columns in alternator share liveness info (since
alternator does not support per-column TTL), which is sufficient
to compute view updates in a consistent way.

Fixes #5006
Tests: unit(dev), alternator(test_gsi_update_second_regular_base_column, tic-tac-toe demo)"

Piotr Sarna (3):
  db,view: fix checking if partition key is empty
  view: handle multiple regular base columns in view pk
  test: add a case for multiple base regular columns in view key

 alternator-test/test_gsi.py              |  1 -
 view_info.hh                             |  5 +-
 cql3/statements/alter_table_statement.cc |  2 +-
 db/view/view.cc                          | 77 ++++++++++++++----------
 mutation_partition.cc                    |  2 +-
 test/boost/cql_query_test.cc             | 58 ++++++++++++++++++
 6 files changed, 109 insertions(+), 36 deletions(-)
2020-01-14 10:01:00 +02:00
Nadav Har'El
f16e3b0491 merge: bouncing lwt request to an owning shard
Merged patch series from Gleb Natapov:

"LWT is much more efficient if a request is processed on a shard that owns
a token for the request. This is because otherwise the processing will
bounce to the owning shard multiple times. The patch proposes a way to
move the request to the correct shard before running lwt. It works by returning
an error from the lwt code if the shard is incorrect, specifying the shard
the request should be moved to. The error is processed by the transport
code, which jumps to the correct shard and re-processes the incoming message there.

The nicer way to achieve the same would be to jump to the right shard
inside storage_proxy::cas(), but unfortunately with the current
implementation of the modification statements they are unusable by
a shard different from the one where they were created, so the jump has
to happen before the modification statement for a cas() is created. When we fix our
cql code to be more cross-shard friendly this can be reworked to do the
jump in the storage_proxy."

Gleb Natapov (4):
  transport: change make_result to take a reference to cql result
    instead of shared_ptr
  storage_service: move start_native_transport into a thread
  lwt: Process lwt request on an owning shard
  lwt: drop invoke_on in paxos_state prepare and accept

 auth/service.hh                           |   5 +-
 message/messaging_service.hh              |   2 +-
 service/client_state.hh                   |  30 +++-
 service/paxos/paxos_state.hh              |  10 +-
 service/query_state.hh                    |   6 +
 service/storage_proxy.hh                  |   2 +
 transport/messages/result_message.hh      |  20 +++
 transport/messages/result_message_base.hh |   4 +
 transport/request.hh                      |   4 +
 transport/server.hh                       |  25 ++-
 cql3/statements/batch_statement.cc        |   6 +
 cql3/statements/modification_statement.cc |   6 +
 cql3/statements/select_statement.cc       |   8 +
 message/messaging_service.cc              |   2 +-
 service/paxos/paxos_state.cc              |  48 ++---
 service/storage_proxy.cc                  |  47 ++++-
 service/storage_service.cc                | 120 +++++++------
 test/boost/cql_query_test.cc              |   1 +
 thrift/handler.cc                         |   3 +
 transport/messages/result_message.cc      |   5 +
 transport/server.cc                       | 203 ++++++++++++++++------
 21 files changed, 377 insertions(+), 180 deletions(-)
2020-01-14 09:59:59 +02:00
Botond Dénes
300728120f test: mutation_test: add exception safety test for large collection serialization
Use `seastar::memory::local_failure_injector()` to inject all possible
`std::bad_alloc`:s into the collection serialization code path. The test
just checks that there are no `std::abort()`:s caused by any of the
exceptions.

The test will not be run if `SEASTAR_ENABLE_ALLOC_FAILURE_INJECTION` is
not defined.
2020-01-13 16:53:35 +02:00
Botond Dénes
3ec889816a data/cell.hh: avoid accidental copies of non-nothrow copiable ranges
`cell::make_collection()` assumes that all ranges passed to it are
nothrow copyable and movable views. This is not guaranteed, is not
expressed in the interface and is not mentioned in the comments either.
The changes introduced by 0a453e5d3a to collection serialization, making
it use fragmented buffers, fell into this trap, as it passes
`bytes_ostream` to `cell::make_collection()`. `bytes_ostream`'s copy
constructor allocates and hence can throw, triggering an
`std::terminate()` inside `cell::make_collection()` as the latter is
noexcept.

To solve this issue, non-nothrow copyable and movable ranges are now
wrapped in a `fragment_range_view` to make them so.
`cell::make_collection()` already requires callers to keep alive the
range for the duration of the call, so this does not introduce any new
requirements to the callers. Additionally, to avoid any future
accidents, do not accept temporaries for the `data` parameter. We don't
ever want to move this param anyway, we will either have a trivially
copyable view, or a potentially heavy-weight range that we will create a
trivially copyable view of.
2020-01-13 16:53:35 +02:00
Botond Dénes
b52b4d36a2 utils/fragment_range.hh: introduce fragment_range_view
A lightweight, trivially copyable and movable view for fragment ranges.
Allows for uniform treatment of all kinds of ranges, i.e. treating all
of them as a view. Currently `fragment_range.hh` provides lightweight,
view-like adaptors for empty and single-fragment ranges (`bytes_view`). To
allow code to treat owning multi-fragment ranges the shame way as the
former two, we need a view for the latter as well -- this is
`fragment_range_view`.
2020-01-13 16:52:59 +02:00
Calle Wilund
75f2b2876b cdc: Remove free function for mutation augmentation 2020-01-13 13:18:55 +00:00
Calle Wilund
3eda3122af cdc: Move mutation augment from cql3::modification_statement to storage proxy
Using the attached service object
2020-01-13 13:18:55 +00:00
Juliusz Stasiewicz
27dfda0b9e main/transport: using the infrastructure of system.clients
Resolves #4820. Execution path in main.cc now cleans up system.clients
table if it exists (this is done on startup). Also, server.cc now calls
functions that notify about cql clients connecting/disconnecting.
2020-01-13 14:07:04 +01:00
Pavel Emelyanov
148da64a7e storage_service: Move load_broadcaster away
This simplifies the storage_service API and fixes the
complaint about shared_ptr usage instead of unique_ptr.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-13 13:55:09 +03:00
Pavel Emelyanov
b6e1e6df64 misc_services: Introduce load_meter
There's a lonely get_load_map() call on storage_service that
needs only the load broadcaster, always runs on shard 0, and that's it.

The next patch will move all of this into its own non-sharded helper
container; this commit is preparation for that.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-13 13:53:08 +03:00
Gleb Natapov
5753ab7195 lwt: drop invoke_on in paxos_state prepare and accept
Since lwt requests are now running on the owning shard, there is no longer
a need to invoke a cross-shard call at the paxos_state level. RPC calls may
still arrive at a wrong shard, so we need to make a cross-shard call there.
2020-01-13 10:26:02 +02:00
Gleb Natapov
d28dd4957b lwt: Process lwt request on an owning shard
LWT is much more efficient if a request is processed on a shard that owns
a token for the request. This is because otherwise the processing will
bounce to an owning shard multiple times. The patch proposes a way to
move request to correct shard before running lwt.  It works by returning
an error from lwt code if a shard is incorrect one specifying the shard
the request should be moved to. The error is processed by transport code
that jumps to a correct shard and re-process incoming message there.
2020-01-13 10:26:02 +02:00
Piotr Sarna
3853594108 alternator-test: turn off TLS self-signed verification
Two test cases did not ignore TLS self-signed warnings, which are used
locally for testing HTTPS.

Fixes #5557

Tests(test_health, test_authorization)
Message-Id: <8bda759dc1597644c534f94d00853038c2688dd7.1578394444.git.sarna@scylladb.com>
2020-01-10 15:31:30 +02:00
Rafael Ávila de Espíndola
5313828ab8 cql3: Fix indentation
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200109025855.10591-2-espindola@scylladb.com>
2020-01-09 10:42:55 +02:00
Rafael Ávila de Espíndola
4da6dc1a7f cql3: Change a lambda capture order to match another
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200109025855.10591-1-espindola@scylladb.com>
2020-01-09 10:42:49 +02:00
Avi Kivity
6d454d13ac db/schema_tables: make gratuitous generic lambdas in do_merge_schema() concrete
Those gratuitous lambdas make life harder for IDE users by hiding the actual
types from the IDEs.
Message-Id: <20200107154746.1918648-1-avi@scylladb.com>
2020-01-08 17:43:18 +01:00
Avi Kivity
454074f284 Merge "database: Avoid OOMing with flush continuations after failed memtable flush" from Tomasz
"
The original fix (10f6b125c8) didn't
take into account the case where there was a failed memtable flush,
but the memtable is not flushable because it's not the latest in
the memtable list. If that happens, it means no other memtable is
flushable either, because otherwise it would be picked due to
evictable_occupancy(). Therefore the right action is to not flush
anything in this case.

Suspected to be observed in #4982. I didn't manage to reproduce after
triggering a failed memtable flush.

Fixes #3717
"

* tag 'avoid-ooming-with-flush-continuations-v2' of github.com:tgrabiec/scylla:
  database: Avoid OOMing with flush continuations after failed memtable flush
  lsa: Introduce operator bool() to occupancy_stats
  lsa: Expose region_impl::evictable_occupancy in the region class
2020-01-08 16:58:54 +02:00
Gleb Natapov
feed544c5d paxos: fix truncation time checking during learn stage
The comparison is done in milliseconds, not microseconds.

Fixes #5566

Message-Id: <20200108094927.GN9084@scylladb.com>
2020-01-08 14:37:07 +01:00
Gleb Natapov
2832f1d9eb storage_service: move start_native_transport into a thread
The code runs only once, and it is simpler if it runs in a seastar thread.
2020-01-08 14:57:57 +02:00
Gleb Natapov
7fb2e8eb9f transport: change make_result to take a reference to cql result instead of shared_ptr 2020-01-08 14:57:57 +02:00
Avi Kivity
0bde5906b3 Merge "cql3: detect and handle int overflow in aggregate functions #5537" from Benny
"
Fix overflow handling in sum() and avg().

sum:
 - aggregated into __int128
 - detect overflow when computing result and log a warning if found

avg:
 - fix division function to divide the accumulator type _sum (__int128 for integers) by _count

Add unit tests for both cases

Test:
  - manual test against Cassandra 3.11.3 to make sure the results in the scylla unit test agree with it.
  - unit(dev), cql_query_test(debug)

Fixes #5536
"

* 'cql3-sum-overflow' of https://github.com/bhalevy/scylla:
  test: cql_query_test: test avg overflow
  cql3: functions: protect against int overflow in avg
  test: cql_query_test: test sum overflow
  cql3: functions: detect and handle int overflow in sum
  exceptions: sort exception_code definitions
  exceptions: define additional cassandra CQL exceptions codes
2020-01-08 10:39:38 +02:00
Avi Kivity
d649371baa Merge "Fix crash on SELECT SUM(udf(...))" from Rafael
"
We were failing to start a thread when the UDF call was nested in an
aggregate function call like SUM.
"

* 'espindola/fix-sum-of-udf' of https://github.com/espindola/scylla:
  cql3: Fix indentation
  cql3: Add missing with_thread_if_needed call
  cql3: Implement abstract_function_selector::requires_thread
  remove make_ready_future call
2020-01-08 10:25:42 +02:00
Benny Halevy
dafbd88349 query: initialize read_command timestamp to now
This was initialized to api::missing_timestamp but
should be set to either a client provided-timestamp or
the server's.

Unlike for write operations, this timestamp need not be unique
the way the one generated by client_state::get_timestamp is.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200108074021.282339-2-bhalevy@scylladb.com>
2020-01-08 10:19:07 +02:00
Benny Halevy
39325cf297 storage_proxy: fix int overflow in service::abstract_read_executor::execute
exec->_cmd->read_timestamp may be initialized by default to api::min_timestamp,
causing:
  service/storage_proxy.cc:3328:116: runtime error: signed integer overflow: 1577983890961976 - -9223372036854775808 cannot be represented in type 'long int'
  Aborting on shard 1.

Do not optimize cross-dc repair if read_timestamp is missing (or just negative);
we're only interested in reads that happen within write_timeout of a write.

Fixes #5556

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200108074021.282339-1-bhalevy@scylladb.com>
2020-01-08 10:18:59 +02:00
Raphael S. Carvalho
390c8b9b37 sstables: Move STCS implementation to source file
A header-only implementation potentially creates a problem with duplicate symbols.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200107154258.9746-1-raphaelsc@scylladb.com>
2020-01-08 09:55:35 +02:00
Benny Halevy
20a0b1a0b6 test: cql_query_test: test avg overflow
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-01-08 09:50:50 +02:00
Benny Halevy
1c81422c1b cql3: functions: protect against int overflow in avg
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-01-08 09:48:33 +02:00
Benny Halevy
9053ef90c7 test: cql_query_test: test sum overflow
Add unit tests for summing up int's and bigint's
with possible handling of overflow.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-01-08 09:48:33 +02:00
Benny Halevy
e97a111f64 cql3: functions: detect and handle int overflow in sum
Detect integer overflow in cql sum functions and throw an error.
Note that Cassandra quietly truncates the sum if it doesn't fit
in the input type, but we prefer to break compatibility in this
case. See https://issues.apache.org/jira/browse/CASSANDRA-4914?focusedCommentId=14158400&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14158400

Fixes #5536

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-01-08 09:48:33 +02:00
Benny Halevy
98260254df exceptions: sort exception_code definitions
Be compatible with Cassandra source.
It's easier to maintain this way.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-01-08 09:48:21 +02:00
Benny Halevy
30d0f1df75 exceptions: define additional cassandra CQL exceptions codes
As of e9da85723a

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-01-08 09:40:57 +02:00
Rafael Ávila de Espíndola
282228b303 cql3: Fix indentation
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-01-07 22:14:50 -08:00
Rafael Ávila de Espíndola
4316bc2e18 cql3: Add missing with_thread_if_needed call
This fixes an assert when doing sum(udf(...)).

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-01-07 22:14:50 -08:00
Rafael Ávila de Espíndola
d301d31de0 cql3: Implement abstract_function_selector::requires_thread
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-01-07 22:14:24 -08:00
Rafael Ávila de Espíndola
dc9b3b8ff2 remove make_ready_future call
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-01-07 22:10:27 -08:00
Calle Wilund
9f6b22d882 cdc: Assign self to storage proxy object 2020-01-07 12:01:58 +00:00
Calle Wilund
fc5904372b storage_proxy: Add (optional) cdc service object pointer member
The cdc service is assigned from outside, post construction, mainly
because of the chickens and eggs in main startup. Would be nice to
have it unconditionally, but this is workable.
2020-01-07 12:01:58 +00:00
Calle Wilund
d6003253dd storage_proxy: Move mutate_counters to private section
It is (and shall be) only called from inside the storage proxy,
and we would like this to be reflected in the interface
so that our eventual moving of cdc logic into the mutate call
chains becomes easier to verify and comprehend.
2020-01-07 12:01:58 +00:00
Calle Wilund
b6c788fccf cdc: Add augmentation call to cdc service
To eventually replace the free function.
The main difference is that this is built both to handle batches correctly
and to eventually allow hanging the cdc object on the storage proxy,
with caches on the cdc object.
2020-01-07 12:01:58 +00:00
Piotr Sarna
04dc8faec9 test: add a case for multiple base regular columns in view key
The test case checks that having two base regular columns
in the materialized view key (not obtainable via CQL)
still works fine when values are inserted or deleted.
If TTL were involved and these columns had different expiration
rules, the case would be more complicated, but it's not possible
for a user to reach that case - neither with CQL nor with alternator.
2020-01-07 12:19:06 +01:00
Piotr Sarna
155a47cc55 view: handle multiple regular base columns in view pk
Previous assumption was that there can only be one regular base column
in the view key. The assumption is still correct for tables created
via CQL, but it's internally possible to create a view with multiple
such columns - the new assumption is that if there are multiple columns,
they share their liveness.
This patch is vital for indexing to work properly on alternator,
so it would be best to solve the issue upstream. I strived to leave
the existing semantics intact as long as only up to one regular
column is part of the materialized view primary key, which is the case
for Scylla's materialized views. For alternator it may not be true,
but all regular columns in alternator share liveness info (since
alternator does not support per-column TTL), which is sufficient
to compute view updates in a consistent way.

Fixes #5006

Tests: unit(dev), alternator(test_gsi_update_second_regular_base_column, tic-tac-toe demo)

Message-Id: <c9dec243ce903d3a922ce077dc274f988bcf5d57.1567604945.git.sarna@scylladb.com>
2020-01-07 12:18:39 +01:00
Avi Kivity
6e0a073b2e mutation_partition: use type-aware printing of the clustering row
Now that position_in_partition_view has type-aware printing, use it
to provide a human readable version of clustering keys.
Message-Id: <20191231151315.602559-2-avi@scylladb.com>
2020-01-07 12:17:11 +01:00
Avi Kivity
488c42408a position_in_partition_view: add type-aware printer
If the position_in_partition_view represents a clustering key,
we can now see it with the clustering key decoded according to
the schema.
Message-Id: <20191231151315.602559-1-avi@scylladb.com>
2020-01-07 12:15:09 +01:00
Piotr Sarna
54315f89cd db,view: fix checking if partition key is empty
Previous implementation did not take into account that a column
in a partition key might exist in a mutation, but in a DEAD state
- if it's deleted. There are no regressions for CQL, while for
alternator and its capability of having two regular base columns
in a view key, this additional check must be performed.
2020-01-07 12:05:36 +01:00
Avi Kivity
3a3c20d337 schema_tables: de-templatize diff_table_or_view()
This reduces code bloat and makes the code friendlier for IDEs, as the
IDE now understands the type of create_schema.
Message-Id: <20191231134803.591190-1-avi@scylladb.com>
2020-01-07 11:56:54 +01:00
Avi Kivity
e5e42672f5 sstables: reduce bloat from sstables::write_simple()
sstables::write_simple() has quite a lot of boilerplate
which gets replicated into each template instance. Move
all of that into a non-template do_write_simple(), leaving
only things that truly depend on the component being written
in the template, and encapsulating them with a
noncopyable_function.

An explicit template instantiation was added, since this
is used in a header file. Before, it likely worked by
accident and stopped working when the template became
small enough to inline.

Tests: unit (dev)
Message-Id: <20200106135453.1634311-1-avi@scylladb.com>
2020-01-07 11:56:11 +01:00
Avi Kivity
8f7f56d6a0 schema_tables: make gratuitous generic lambda in create_tables_from_partitions() concrete
The generic lambda made IDE searches for create_table_from_table_row() fail.
Message-Id: <20191231135210.591972-1-avi@scylladb.com>
2020-01-07 11:49:10 +01:00
Avi Kivity
92fd83d3af schema_tables: make gratuitous generic lambda in create_table_from_name() concrete
The lambda made IDE searches for read_table_mutations fail.
Message-Id: <20191231135103.591741-1-avi@scylladb.com>
2020-01-07 11:48:56 +01:00
Avi Kivity
dd6dd97df9 schema_tables: make gratuitous generic lambda in merge_tables_and_views() concrete
The generic lambda made IDE searches for create_table_from_mutations fail.
Message-Id: <20191231135059.591681-1-avi@scylladb.com>
2020-01-07 11:48:39 +01:00
Avi Kivity
c63cf02745 canonical_mutation: add pretty printing
Add type-aware printing of canonical_mutation objects.
2020-01-07 12:06:31 +02:00
Avi Kivity
e093121687 mutation_partition_view: add virtual visitor
mutation_partition_view now supports a compile-time resolved visitor.
This is performant but results in bloat when the performance is not
needed. Furthermore, the template function that applies the object
to the visitor is private and out-of-line, to reduce compile time.

To allow visitation on mutation_partition_view objects, add a virtual
visitor type and a non-template accept function.

Note: mutation_partition_visitor is very similar to the new type,
but different enough to break the template visitor which is used
to implement the new visitor.

The new visitor will be used to implement pretty printing for
canonical_mutation.
2020-01-07 12:06:31 +02:00
Avi Kivity
75d9909b27 collection_mutation_view: add type-aware pretty printer
Add a way for the user to associate a type with a collection_mutation_view
and get a nice printout.
2020-01-07 12:06:29 +02:00
Rafael Ávila de Espíndola
b80852c447 main: Explicitly allow scylla core dumps
I have not looked into the security reason for disabling it when
a program has file capabilities.

Fixes #5560

[avi: remove extraneous semicolon]
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200106231836.99052-1-espindola@scylladb.com>
2020-01-07 11:15:59 +02:00
Rafael Ávila de Espíndola
07f1cb53ea tests: run with ASAN_OPTIONS='disable_coredump=0:abort_on_error=1'
These are the same options we use in seastar.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200107001513.122238-1-espindola@scylladb.com>
2020-01-07 11:11:49 +02:00
Takuya ASADA
238a25a0f4 docker: fix typo of scylla-jmx script path (#5551)
The path should be /opt/scylladb/jmx, not /opt/scylladb/scripts/jmx.

Fixes #5542
2020-01-07 10:54:16 +02:00
Asias He
401854dbaf repair: Avoid duplicated partition_end write
Consider this:

1) Write partition_start of p1
2) Write clustering_row of p1
3) Write partition_end of p1
4) Repair is stopped due to error before writing partition_start of p2
5) Repair calls repair_row_level_stop() to tear down which calls
   wait_for_writer_done(). A duplicate partition_end is written.

To fix, track the partition_start and partition_end written, avoid
unpaired writes.

Backports: 3.1 and 3.2
Fixes: #5527
2020-01-06 14:06:02 +02:00
Eliran Sinvani
e64445d7e5 debian-reloc: Propagate PRODUCT variable to renaming command in debian pkg
Commit 21dec3881c introduced
a bug that will cause the scylla debian build to fail. This is
because that commit relied on the environment PRODUCT variable
being exported (and, as a result, propagating to the rename
command that is executed by find in a subshell).
This commit fixes it by explicitly passing the PRODUCT variable
into the rename command.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <20200106102229.24769-1-eliransin@scylladb.com>
2020-01-06 12:31:58 +02:00
Asias He
38d4015619 gossiper: Remove HIBERNATE status from dead state
In scylla, the replacing node is set to HIBERNATE status. It is the only
place we use HIBERNATE status. The replacing node is supposed to be
alive and updating its heartbeat, so it is not supposed to be in dead
state.

This patch fixes the following problem in replacing.

   1) start n1, n2
   2) n2 is down
   3) start n3 to replace n2, but kill n3 in the middle of the replace
   4) start n4 to replace n2

After step 3 and step 4, the old n3 will stay in gossip forever until a
full cluster shutdown. Note n3 will only stay in gossip, not in the
system.peers table. Users will see annoying, endless logs like the
following on all the nodes:

   rpc - client $ip_of_n3:7000: fail to connect: Connection refused

Fixes: #5449
Tests: replace_address_test.py + manual test
2020-01-06 11:47:31 +02:00
Amos Kong
c5ec1e3ddc scylla_ntp_setup: check redhat variant version by parse_version (#5434)
VERSION_ID of centos7 is "7", but VERSION_ID of oel7.7 is "7.7", so
scylla_ntp_setup doesn't work on OEL 7.7 due to a ValueError:

- ValueError: invalid literal for int() with base 10: '7.7'

This patch changes redhat_version() to return the version string, and
compares it with parse_version().

Fixes #5433

Signed-off-by: Amos Kong <amos@scylladb.com>
2020-01-06 11:43:14 +02:00
Asias He
145fd0313a streaming: Fix map access in stream_manager::get_progress
When the progress is queried, e.g., query from nodetool netstats
the progress info might not be updated yet.

Fix it by checking before accessing the map, to avoid errors like:

std::out_of_range (_Map_base::at)

Fixes: #5437
Tests: nodetool_additional_test.py:TestNodetool.netstats_test
2020-01-06 10:31:15 +02:00
Rafael Ávila de Espíndola
98cd8eddeb tests: Run with halt_on_error=1:abort_on_error=1
This depends on the just emailed fixes to undefined behavior in
tests. With this change we should quickly notice if a change
introduces undefined behavior.

Fixes #4054

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>

Message-Id: <20191230222646.89628-1-espindola@scylladb.com>
2020-01-05 17:20:31 +02:00
Rafael Ávila de Espíndola
dc5ecc9630 enum_option_test: Add explicit underlying types to enums
We expect to be able to create variables with out of range values, so
these enums need explicit underlying types.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200102173422.68704-1-espindola@scylladb.com>
2020-01-05 17:20:31 +02:00
Nadav Har'El
f0d8dd4094 merge: CDC rolling upgrade
Merged pull request https://github.com/scylladb/scylla/pull/5538 from
Avi Kivity and Piotr Jastrzębski.

This series prepares CDC for rolling upgrade. This consists of
reducing the footprint of cdc, when disabled, on the schema, adding
a cluster feature, and redacting the cdc column when transferring
it to other nodes. The latter is needed because we'll want to backport
this to 3.2, which doesn't have canonical_mutations yet.
2020-01-05 17:13:12 +02:00
Gleb Natapov
720c0aa285 commitlog: update last sync timestamp when cycle a buffer
If the in-memory buffer does not have enough space for an incoming
mutation, it is written into a file, but the code missed updating the
timestamp of the last sync, so we may sync too often.
Message-Id: <20200102155049.21291-9-gleb@scylladb.com>
2020-01-05 16:13:59 +02:00
Gleb Natapov
14746e4218 commitlog: drop segment gate
The code that enters the gate never defers before leaving, so the gate
behaves like a flag. Let's use the existing flag to prohibit adding data to a
closed segment.
Message-Id: <20200102155049.21291-8-gleb@scylladb.com>
2020-01-05 16:13:59 +02:00
Gleb Natapov
f8c8a5bd1f test: fix error reporting in commitlog_test
Message-Id: <20200102155049.21291-7-gleb@scylladb.com>
2020-01-05 16:13:58 +02:00
Gleb Natapov
680330ae70 commitlog: introduce segment::close() function.
Currently segment closing code is spread over several functions and
activated based on the _closed flag. Make segment closing explicit
by moving all the code into close() function and call it where _closed
flag is set.
Message-Id: <20200102155049.21291-6-gleb@scylladb.com>
2020-01-05 16:13:55 +02:00
Gleb Natapov
a1ae08bb63 commitlog: remove unused segment::flush() parameter
Message-Id: <20200102155049.21291-5-gleb@scylladb.com>
2020-01-05 16:13:55 +02:00
Gleb Natapov
1e15e1ef44 commitlog: cleanup segment sync()
Call cycle() only once.
Message-Id: <20200102155049.21291-4-gleb@scylladb.com>
2020-01-05 16:13:54 +02:00
Gleb Natapov
3d3d2c572e commitlog: move segment shutdown code from sync()
Currently sync() does two completely different things based on the
shutdown parameter. Separate the code into two different functions.
Message-Id: <20200102155049.21291-3-gleb@scylladb.com>
2020-01-05 16:13:54 +02:00
Gleb Natapov
89afb92b28 commitlog: drop superfluous this
Message-Id: <20200102155049.21291-2-gleb@scylladb.com>
2020-01-05 16:13:53 +02:00
Piotr Jastrzebski
95feeece0b scylla_tables: treat empty cdc props as disabled
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-01-05 14:39:23 +02:00
Piotr Jastrzebski
396e35bf20 cdc: add schema_change test for cdc_options
The original "test_schema_digest_does_not_change" test case ensures
that schema digests will match for older nodes that do not support
all the features yet (including computed columns).
The additional case uses sstables generated after CDC was enabled
and a table with CDC enabled was created,
in order to make sure that the digest computed
including the CDC column does not change spuriously as well.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-01-05 14:39:23 +02:00
Piotr Jastrzebski
c08e6985cd cdc: allow cluster rolling upgrade
Addition of the cdc column in scylla_tables changes how schema
digests are calculated, and affects the ABI of schema update
messages (adding a column changes other columns' indexes
in frozen_mutation).

To fix this, extend the schema_tables mechanism with support
for the cdc column, and adjust schemas and mutations to remove
that column when sending schemas during upgrade.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-01-05 14:39:23 +02:00
Piotr Jastrzebski
caa0a4e154 tests: disable CDC in schema_change_tests
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-01-05 14:39:23 +02:00
Piotr Jastrzebski
129af99b94 cdc: Return reference from cluster_supports_cdc
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-01-05 14:39:23 +02:00
Piotr Jastrzebski
4639989964 cdc: Add CDC_OPTIONS schema_feature
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-01-05 14:39:23 +02:00
Avi Kivity
c150f2e5d7 schema_tables, cdc: don't store empty cdc columns in scylla_tables
An empty cdc column in scylla_tables is hashed differently from
a missing column. This causes schema mismatch when a schema is
propagated to another node, because the other node will redact
the schema column completely if the cluster feature isn't enabled,
and an empty value is hashed differently from a missing value.

Store a tombstone instead. Tombstones are removed before
digesting, so they don't affect the outcome.

This change also undoes the changes in 386221da84 ("schema_tables:
 handle 'cdc' options") to schema_change_test
test_merging_does_not_alter_tables_which_didnt_change. That change
enshrined the breakage into the test, instead of fixing the root cause,
which was that we added an extra mutation to the schema (for
cdc options, which were disabled).
2020-01-05 14:36:18 +02:00
Rafael Ávila de Espíndola
3d641d4062 lua: Use existing cpp_int cast logic
Different versions of boost have different rules for what conversions
from cpp_int to smaller integers are allowed.

We already had a function that worked with all supported versions, but
it was not being used by lua.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200104041028.215153-1-espindola@scylladb.com>
2020-01-05 12:10:54 +02:00
Rafael Ávila de Espíndola
88b5aadb05 tests: cql_test_env: wait for two futures starting internal services
I noticed this while looking at the crashes the next branch is currently
experiencing.

While I have no idea if this fixes the issue, it does avoid broken
future warnings (for no_sharded_instance_exception) in a debug build.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200103201540.65324-1-espindola@scylladb.com>
2020-01-05 12:09:59 +02:00
Avi Kivity
4b8e2f5003 Update seastar submodule
* seastar 0525bbb08...36cf5c5ff (6):
  > memcached: Fix use after free in shutdown
  > Revert "task: stop wrapping tasks with unique_ptr"
  > task: stop wrapping tasks with unique_ptr
  > http: Change exception formating to the generic seastar one
  > Merge "Avoid a few calls to ~exception_ptr" from Rafael
  > tests: fix core generation with asan
2020-01-03 15:48:53 +02:00
Nadav Har'El
44c2a44b54 alternator-test: test for ConditionExpression feature
This patch adds a very comprehensive test for the ConditionExpression
feature, i.e., the newer syntax of conditional writes replacing
the old-style "Expected" - for the UpdateItem, PutItem and DeleteItem
operations.

I wrote these tests while closely following the DynamoDB ConditionExpression
documentation, and attempted to cover all conceivable features, subfeatures
and subcases of the ConditionExpression syntax - to serve as a test for a
future support for this feature in Alternator (see issue #5053).

As usual, all these tests pass on AWS DynamoDB, but because we haven't yet
implemented this feature in Alternator, all but one xfail on Alternator.

Refs #5053.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191229143556.24002-1-nyh@scylladb.com>
2020-01-03 15:48:20 +02:00
Nadav Har'El
aad5eeab51 alternator: better error messages when Alternator port is taken
If Alternator is requested to be enabled on a specific port but the port is
already taken, the boot fails as expected - but the error log is confusing;
It currently looks something like this:

WARN  2019-12-24 11:22:57,303 [shard 0] alternator-server - Failed to set up Alternator HTTP server on 0.0.0.0 port 8000, TLS port 8043: std::system_error (error system:98, posix_listen failed for address 0.0.0.0:8000: Address already in use)
... (many more messages about the server shutting down)
INFO  2019-12-24 11:22:58,008 [shard 0] init - Startup failed: std::system_error (error system:98, posix_listen failed for address 0.0.0.0:8000: Address already in use)

There are two problems here. First, the "WARN" should really be an "ERROR",
because it causes the server to be shut down and the user must see this error.
Second, the final line in the log, something the user is likely to see first,
contains only the ultimate cause for the exception (an address already in use)
but not the information what this address was needed for.

This patch solves both issues, and the log now looks like:

ERROR 2019-12-24 14:00:54,496 [shard 0] alternator-server - Failed to set up Alterna
tor HTTP server on 0.0.0.0 port 8000, TLS port 8043: std::system_error (error system
:98, posix_listen failed for address 0.0.0.0:8000: Address already in use)
...
INFO  2019-12-24 14:00:55,056 [shard 0] init - Startup failed: std::_Nested_exception<std::runtime_error> (Failed to set up Alternator HTTP server on 0.0.0.0 port 8000, TLS port 8043): std::system_error (error system:98, posix_listen failed for address 0.0.0.0:8000: Address already in use)

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191224124127.7093-1-nyh@scylladb.com>
2020-01-03 15:48:20 +02:00
Nadav Har'El
1f64a3bbc9 alternator: error on unsupported ReturnValues option
We don't yet support the ReturnValues option on PutItem, UpdateItem or
DeleteItem operations (see issue #5053), but if a user tries to use such
an option anyway, we silently ignore it. It's better to fail,
reporting the unsupported option.

In this patch we check the ReturnValues option and if it is anything but
the supported default ("NONE"), we report an error.

Also added a test to confirm this fix. The test verifies that "NONE" is
allowed, and something which is unsupported (e.g., "DOG") is not ignored
but rather causes an error.

Refs #5053.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191216193310.20060-1-nyh@scylladb.com>
2020-01-03 15:48:20 +02:00
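The validation described above amounts to something like the following sketch (the helper name is hypothetical, not Alternator's actual code): only the supported default "NONE" passes; anything else raises an error instead of being silently ignored.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical sketch: reject any ReturnValues value other than the
// supported default "NONE", rather than silently ignoring it.
static void check_return_values(const std::string& rv) {
    if (rv != "NONE") {
        throw std::invalid_argument("Unsupported ReturnValues value: " + rv);
    }
}
```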
Rafael Ávila de Espíndola
dc93228b66 reloc: Turn the default flags into common flags
These are flags we always want to enable. In particular, we want them
to be used by the bots, but the bots run this script with
--configure-flags, so they were being discarded.

We put the user options later so that they can override the common
options.

Fixes #5505

Reviewed-by: Benny Halevy <bhalevy@scylladb.com>
Reviewed-by: Takuya ASADA <syuu@scylladb.com>
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-01-03 15:48:20 +02:00
Rafael Ávila de Espíndola
d4dfb6ff84 build-id: Handle the binary having multiple PT_NOTE headers
There is no requirement that all notes be placed in a single
PT_NOTE. It looks like recent lld versions actually put each note section
in its own PT_NOTE.

This change looks for build-id in all PT_NOTE headers.

Fixes #5525

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Reviewed-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191227000311.421843-1-espindola@scylladb.com>
2020-01-03 15:48:20 +02:00
Avi Kivity
1e9237d814 dist: redhat: use parallel compression for rpm payload
rpm compression uses xz, which is painfully slow. Adjust the
compression settings to run on all threads.

The xz utility documentation suggests that 0 threads is
equivalent to all CPUs, but apparently the library interface
(which rpmbuild uses) doesn't think the same way.

Message-Id: <20200101141544.1054176-1-avi@scylladb.com>
2020-01-03 15:48:20 +02:00
Nadav Har'El
de1171181c user defined types: fix support for case-sensitive type names
In the current code, support for case-sensitive (quoted) user-defined type
names is broken. For example, a test doing:

    CREATE TYPE "PHone" (country_code int, number text)
    CREATE TABLE cf (pk blob, pn "PHone", PRIMARY KEY (pk))

Fails - the first line creates the type with the case-sensitive name PHone,
but the second line wrongly ends up looking for the lowercased name phone,
and fails with an exception "Unknown type ks.phone".

The problem is in cql3_type_name_impl. This class is used to convert a
type object into its proper CQL syntax - for example frozen<list<int>>.
The problem is that for a user-defined type, we forgot to quote its name
if not lowercase, and the result is wrong CQL; for example, a list of
PHone will be written as list<PHone> - but this is wrong because the CQL
parser, when it sees this expression, lowercases the unquoted type name
PHone and it becomes just phone. It should be list<"PHone">, not list<PHone>.

The solution is for cql3_type_name_impl to use for a user-defined type
its get_name_as_cql_string() method instead of get_name_as_string().

get_name_as_cql_string() is a new method which prints the name of the
user type as it should be in a CQL expression, i.e., quoted if necessary.

The bug in the above test was apparently caused when our code serialized
the type name to disk as the string PHone (without any quoting), and then
later deserialized it using the CQL type parser, which converted it into
a lowercase phone. With this patch, the type's name is serialized as
"PHone", with the quotes, and deserialized properly as the type PHone.
While the extra quotes may seem excessive, they are necessary for the
correct CQL type expression - remember that the type expression may be
significantly more complex, e.g., frozen<list<"PHone">> and all of this,
including the quotes, is necessary for our parser to be able to translate
this string back into a type object.

This patch may cause breakage to existing databases which used case-
sensitive user-defined types, but I argue that these use cases were
already broken (as demonstrated by this test) so we won't break anything
that actually worked before.

Fixes #5544

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200101160805.15847-1-nyh@scylladb.com>
2020-01-03 15:48:20 +02:00
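The quoting rule this commit describes can be illustrated with a simplified sketch (the helper name is hypothetical; the real logic lives in cql3_type_name_impl via get_name_as_cql_string(), and also has to handle reserved keywords, which this sketch ignores): an identifier consisting solely of lowercase letters, digits and underscores, and not starting with a digit, may be emitted bare; anything else must be double-quoted, with embedded quotes doubled.

```cpp
#include <cctype>
#include <string>

// Simplified, hypothetical sketch of CQL identifier quoting: bare only if
// all-lowercase alphanumerics/underscores and not starting with a digit;
// otherwise double-quoted, with embedded double quotes doubled.
static std::string quote_if_needed(const std::string& name) {
    bool plain = !name.empty() && !std::isdigit((unsigned char)name[0]);
    for (char c : name) {
        if (!(std::islower((unsigned char)c)
              || std::isdigit((unsigned char)c) || c == '_')) {
            plain = false;
            break;
        }
    }
    if (plain) {
        return name;
    }
    std::string out = "\"";
    for (char c : name) {
        out += c;
        if (c == '"') {
            out += c; // double embedded quotes
        }
    }
    out += '"';
    return out;
}
```

With this rule, the type from the commit serializes as "PHone" (with quotes), so the parser round-trips it instead of lowercasing it to phone.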
Pavel Emelyanov
34f8762c4d storage_service: Drop _update_jobs
This field is write-only.
Leftover from 83ffae1 (storage_service: Drop block_until_update_pending_ranges_finished)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20191226091210.20966-1-xemul@scylladb.com>
2020-01-03 15:48:20 +02:00
Pavel Emelyanov
f2b20e7083 cache_hitrate_calculator: Do not reinvent the peering_sharded_service
The class in question wants to run its own instances on different
shards, for this sake it keeps reference on sharded self to call
invoke_on() on. There's a handy peering_sharded_service<> in seastar
for the same, using it makes the code nicer and shorter.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20191226112401.23960-1-xemul@scylladb.com>
2020-01-03 15:48:19 +02:00
Rafael Ávila de Espíndola
bbed9cac35 cql3: move function creation to a .cc file
We had a lot of code in a .hh file that, while using templates, was
only used for creating functions during startup.

This moves it to a new .cc file.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200101002158.246736-1-espindola@scylladb.com>
2020-01-03 15:48:19 +02:00
Benny Halevy
c0883407fe scripts: Add cpp-name-format: pretty printer
Pretty-print cpp-names, useful for deciphering complex backtraces.

For example, the following line:
    service::storage_proxy::init_messaging_service()::{lambda(seastar::rpc::client_info const&, seastar::rpc::opt_time_point, std::vector<frozen_mutation, std::allocator<frozen_mutation> >, db::consistency_level, std::optional<tracing::trace_info>)#1}::operator()(seastar::rpc::client_info const&, seastar::rpc::opt_time_point, std::vector<frozen_mutation, std::allocator<frozen_mutation> >, db::consistency_level, std::optional<tracing::trace_info>) const at /local/home/bhalevy/dev/scylla/service/storage_proxy.cc:4360

Is formatted as:
    service::storage_proxy::init_messaging_service()::{
      lambda(
        seastar::rpc::client_info const&,
        seastar::rpc::opt_time_point,
        std::vector<
          frozen_mutation,
          std::allocator<frozen_mutation>
        >,
        db::consistency_level,
        std::optional<tracing::trace_info>
      )#1
    }::operator()(
      seastar::rpc::client_info const&,
      seastar::rpc::opt_time_point,
      std::vector<
        frozen_mutation,
        std::allocator<frozen_mutation>
      >,
      db::consistency_level,
      std::optional<tracing::trace_info>
    ) const at /local/home/bhalevy/dev/scylla/service/storage_proxy.cc:4360

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191226142212.37260-1-bhalevy@scylladb.com>
2020-01-01 12:08:12 +02:00
Rafael Ávila de Espíndola
75817d1fe7 sstable: Add checks to help track problems with large_data_handler use after free
I can't quite figure out how we were trying to write an sstable with
the large data handler already stopped, but the backtrace suggests a
good place to add extra checks.

This patch adds two checks: one at the start and one at the end of
sstable::write_components. The first one should give us better
backtraces if the large_data_handler is already stopped. The second
one should help catch some race conditions.

Refs: #5470
Message-Id: <20191231173237.19040-1-espindola@scylladb.com>
2020-01-01 12:03:31 +02:00
Rafael Ávila de Espíndola
3c34e2f585 types: Avoid an unaligned load in json integer serialization
The patch also adds a test that makes the fixed issue easier to
reproduce.

Fixes #5413
Message-Id: <20191231171406.15980-1-espindola@scylladb.com>
2019-12-31 19:23:42 +02:00
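The general technique for avoiding unaligned loads, which this fix applies, looks roughly like the following generic sketch (not the actual serialization code): instead of dereferencing a reinterpret_cast'ed pointer, which is undefined behavior when the pointer is not suitably aligned, copy the bytes with memcpy; compilers lower this to a single load on architectures that permit unaligned access.

```cpp
#include <cstdint>
#include <cstring>

// Generic sketch: read an int64_t from a possibly misaligned buffer
// without invoking undefined behavior.
static int64_t read_int64(const void* p) {
    int64_t v;
    std::memcpy(&v, p, sizeof(v)); // safe regardless of alignment of p
    return v;
}
```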
Gleb Natapov
bae5cb9f37 commitlog: remove unused argument during segment creation
Since 99a5a77234 all segments are created
equal and the "active" argument is never true, so drop it.

Message-Id: <20191231150639.GR9084@scylladb.com>
2019-12-31 17:14:03 +02:00
Rafael Ávila de Espíndola
aa535a385d enum_option_test: Add an explicit underlying type to an enum
We expect to be able to create a variable with an out of range value,
so the enum needs an explicit underlying type.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191230222029.88942-1-espindola@scylladb.com>
2019-12-31 16:59:00 +02:00
Nadav Har'El
48a914c291 Fix uninitialized members
Merged pull request https://github.com/scylladb/scylla/pull/5532 from
Benny Halevy:

Initialize bool members in row_level_repair and storage_service that were
causing ubsan errors.

Fixes #5531
2019-12-31 10:32:54 +02:00
Takuya ASADA
aa87169670 dist/debian: add procps on Depends
We require the procps package to use sysctl in the postinst script for scylla-kernel-conf.

Fixes #5494

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20191218234100.37844-1-syuu@scylladb.com>
2019-12-30 19:30:35 +02:00
Avi Kivity
972127e3a8 atomic_cell: add type-aware pretty printing
The standard printer for atomic_cell prints the value as hex,
because atomic_cell does not include the type. Add a type-aware
printer that allows the user to provide the type.
2019-12-30 18:27:04 +02:00
Avi Kivity
19f68412ad atomic_cell: move pretty printers from database.cc to atomic_cell.cc
atomic_cell.cc is the logical home for atomic_cell pretty printers,
and since we plan to add more pretty printers, start by tidying up.
2019-12-30 18:20:30 +02:00
Eliran Sinvani
21dec3881c debian-reloc: rename build product to the name specified in SCYLLA-VERSION-GEN
When the product name is other than "scylla", the debian
packaging scripts go over all files that start with "scylla-"
and change the prefix to be the actual product name.
However, if there are no such files in the directory,
the script will fail since the renaming command will
get the wildcard string instead of an actual file name.
This patch replaces the command with an equivalent one
that only operates on files if there are any.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <20191230143250.18101-1-eliransin@scylladb.com>
2019-12-30 17:45:50 +02:00
Takuya ASADA
263385cb4b dist: stop replacing /usr/lib/scylla with symlink (#5530)
Since we merged /usr/lib/scylla with /opt/scylladb, we removed
/usr/lib/scylla and replaced it with a symlink pointing to /opt/scylladb.
However, RPM does not support replacing a directory with a symlink, so
we were doing a dirty hack using an RPM scriptlet, but it causes
multiple issues on upgrade/downgrade.
(See: https://docs.fedoraproject.org/en-US/packaging-guidelines/Directory_Replacement/)

To minimize Scylla upgrade/downgrade issues on the user side, it's better
to keep the /usr/lib/scylla directory.
Instead of creating a single symlink /usr/lib/scylla -> /opt/scylladb,
we can create symlinks for each setup script, like
/usr/lib/scylla/<script> -> /opt/scylladb/scripts/<script>.

Fixes #5522
Fixes #4585
Fixes #4611
2019-12-30 13:52:24 +02:00
Hagit Segev
9d454b7dc6 reloc/build_rpm.sh: Fix '--builddir' option handling (#5519)
The '--builddir' option value is assigned to the "builddir" variable,
which is wrong. The correct variable is "BUILDDIR" so use that instead
to fix the '--builddir' option.

Also, add logging to the script when executing the "dist/redhat_build.rpm.sh"
script to simplify debugging.
2019-12-30 13:25:22 +02:00
Benny Halevy
8aa5d84dd8 storage_service: initialize _is_bootstrap_mode
Hit the following ubsan error with bootstrap_test:TestBootstrap.manual_bootstrap_test in debug mode:
  service/storage_service.cc:3519:37: runtime error: load of value 190, which is not a valid value for type 'bool'

The use site is:
  service::storage_service::is_cleanup_allowed(seastar::basic_sstring<char, unsigned int, 15u, true>)::{lambda(service::storage_service&)#1}::operator()(service::storage_service&) const at /local/home/bhalevy/dev/scylla/service/storage_service.cc:3519

While at it, initialize `_initialized` to false as well, just in case.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-12-30 11:44:58 +02:00
Benny Halevy
474ffb6e54 repair: initialize row_level_repair: _zero_rows
Avoid following UBSAN error:
repair/row_level.cc:2141:7: runtime error: load of value 240, which is not a valid value for type 'bool'

Fixes #5531

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-12-30 11:44:58 +02:00
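Both of these initialization fixes follow the same pattern: default member initializers guarantee the flags always hold a valid bool, so UBSAN never sees a load of an indeterminate value. A minimal sketch (the struct and member names are illustrative only):

```cpp
// Illustrative sketch: default member initializers instead of leaving the
// flags indeterminate until some constructor or setter runs.
struct repair_state {
    bool _zero_rows = false;         // previously read before being assigned
    bool _is_bootstrap_mode = false; // likewise
};
```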
Fabiano Lucchese
d7795b1efa scylla_setup: Support for enforcing optimal Linux clocksource setting (#5499)
A Linux machine typically has multiple clocksources with distinct
performance characteristics. Setting a high-performance clocksource might
result in better performance for ScyllaDB, so this should be considered
whenever starting it up.

This patch introduces the possibility of enforcing an optimized Linux
clocksource in Scylla's setup/start-up processes. It does so by adding
an interactive question about enforcing the clocksource setting to scylla_setup,
which modifies the parameter "CLOCKSOURCE" in the scylla_server configuration
file. This parameter is read by perftune.py which, if it is set to "yes",
proceeds to (non-persistently) set the clocksource. On x86, the TSC
clocksource is used.

Fixes #4474
Fixes #5474
Fixes #5480
2019-12-30 10:54:14 +02:00
Avi Kivity
e223154268 cdc: options: return an empty options map when cdc is disabled
This is compatible with 3.1 and below, which didn't have that schema
field at all.
2019-12-29 16:34:37 +02:00
Benny Halevy
27e0aee358 docs/debugging.md: fix anchor links
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191229074136.13516-1-bhalevy@scylladb.com>
2019-12-29 16:26:26 +02:00
Pavel Solodovnikov
aba9a11ff0 cql: pass variable_specifications via lw_shared_ptr
Instances of `variable_specifications` are passed around as
shared_ptr's, which are redundant in this case since the class
is marked as `final`. Use `lw_shared_ptr` instead since we know
for sure it's not a polymorphic pointer.

Tests: unit(debug)

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20191225232853.45395-1-pa.solodovnikov@scylladb.com>
2019-12-29 16:26:26 +02:00
Benny Halevy
4c884908bb directories: Keep a unique set of directories to initialize
If any two directories of data/commitlog/hints/view_hints
are the same, we still end up running verify_owner_and_mode
and disk_sanity(check_direct_io_support) in parallel
on the same directories and hit #5510.

This change uses std::set rather than std::vector to
collect a unique set of directories that need initialization.

Fixes #5510

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191225160645.2051184-1-bhalevy@scylladb.com>
2019-12-29 16:26:26 +02:00
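The deduplication this commit describes can be sketched as follows (the helper name is hypothetical; the real code collects the configured data/commitlog/hints/view_hints paths): building a std::set from the configured paths means each distinct directory is initialized exactly once, even if several settings point at the same path.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical sketch: collapse configured directories into a unique set
// so aliased paths are only initialized once.
static std::set<std::string> unique_dirs(const std::vector<std::string>& configured) {
    return std::set<std::string>(configured.begin(), configured.end());
}
```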
Gleb Natapov
60a851d3a5 commitlog: always flush segments atomically with writing
db::commitlog::segment::batch_cycle() assumes that after a write
for a certain position completes (as reported by
_pending_ops.wait_for_pending()) it will also be flushed, but this is
true only if writing and flushing are atomic wrt the _pending_ops lock.
They usually are, unless flush_after is set to false when cycle() is
called; in that case only writing is done under the lock. This
is exactly what happens when a segment is closed. Flush is skipped
because a zero header is added after the last entry and then flushed, but
this optimization breaks batch_cycle()'s assumption. Fix it by flushing
after the write atomically even if a segment is being closed.

Fixes #5496

Message-Id: <20191224115814.GA6398@scylladb.com>
2019-12-24 14:52:23 +02:00
Pavel Emelyanov
a5cdfea799 directories: Do not mess with per-shard base dir
The hints and view_hints directories have per-shard sub-dirs,
and the directories code tries to create, check and lock
all of them, including the base one.

The manipulations in question are excessive -- it's enough
to check and lock either the base dir, or all the per-shard
ones, but not everything. Let's take the latter approach for
its simplicity.

Fixes #5510

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Looks-good-to: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191223142429.28448-1-xemul@scylladb.com>
2019-12-24 14:49:28 +02:00
Benny Halevy
f8f5db42ca dbuild: try to pull image if not present locally
Pekka Enberg <penberg@scylladb.com> wrote:
> Image might not be present, but the subsequent "docker run" command will automatically pull it.

Just letting "docker run" fail produces a rather confusing error message
referring to docker help, but we want to provide the user
with our own help, so still fail early; just also try to pull the image
if "docker image inspect" failed, indicating it's not present locally.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191223085219.1253342-4-bhalevy@scylladb.com>
2019-12-24 11:13:23 +02:00
Benny Halevy
ee2f97680a dbuild: just die when no image-id is provided
Suggested-by: Pekka Enberg <penberg@scylladb.com>
> This will print all the available Docker images,
> many (most?) of them completely unrelated.
> Why not just print an error saying that no image was specified,
> and then perhaps print usage.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191223085219.1253342-3-bhalevy@scylladb.com>
2019-12-24 11:13:22 +02:00
Benny Halevy
87b2f189f7 dbuild: s/usage/die/
Suggested-by: Dejan Mircevski <dejan@scylladb.com>
> The use pattern of this function strongly suggests a name like `die`.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191223085219.1253342-2-bhalevy@scylladb.com>
2019-12-24 11:13:21 +02:00
Benny Halevy
718e9eb341 table: move_sstables_from_staging: fix use after free of shared_sstable
Introduced in 4b3243f5b9

Reproducible with materialized_views_test:TestMaterializedViews.mv_populating_from_existing_data_during_node_remove_test
and read_amplification_test:ReadAmplificationTest.no_read_amplification_on_repair_with_mv_test

==955382==ERROR: AddressSanitizer: heap-use-after-free on address 0x60200023de18 at pc 0x00000051d788 bp 0x7f8a0563fcc0 sp 0x7f8a0563fcb0
READ of size 8 at 0x60200023de18 thread T1 (reactor-1)
    #0 0x51d787 in seastar::lw_shared_ptr<sstables::sstable>::lw_shared_ptr(seastar::lw_shared_ptr<sstables::sstable> const&) /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/shared_ptr.hh:289
    #1 0x10ba189 in apply<table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()>::<lambda(std::set<seastar::basic_sstring<char, unsigned int, 15> >&)>::<lambda(sstables::shared_sstable)>&, const seastar::lw_shared_ptr<sstables::sstabl
e>&> /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1530
    #2 0x109c4f1 in apply<table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()>::<lambda(std::set<seastar::basic_sstring<char, unsigned int, 15> >&)>::<lambda(sstables::shared_sstable)>&, const seastar::lw_shared_ptr<sstables::sstabl
e>&> /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1556
    #3 0x106941a in do_for_each<__gnu_cxx::__normal_iterator<const seastar::lw_shared_ptr<sstables::sstable>*, std::vector<seastar::lw_shared_ptr<sstables::sstable> > >, table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()>::<lambda(
std::set<seastar::basic_sstring<char, unsigned int, 15> >&)>::<lambda(sstables::shared_sstable)> > /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future-util.hh:618
    #4 0x1069203 in operator() /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future-util.hh:626
    #5 0x10ba589 in apply /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/apply.hh:36
    #6 0x10ba668 in apply<seastar::do_for_each(Iterator, Iterator, AsyncAction) [with Iterator = __gnu_cxx::__normal_iterator<const seastar::lw_shared_ptr<sstables::sstable>*, std::vector<seastar::lw_shared_ptr<sstables::sstable> > >; AsyncAction = table::move_sstables_from_staging
(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()>::<lambda(std::set<seastar::basic_sstring<char, unsigned int, 15> >&)>::<lambda(sstables::shared_sstable)>]::<lambda()>&> /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/apply.hh:44
    #7 0x10ba7c0 in apply<seastar::do_for_each(Iterator, Iterator, AsyncAction) [with Iterator = __gnu_cxx::__normal_iterator<const seastar::lw_shared_ptr<sstables::sstable>*, std::vector<seastar::lw_shared_ptr<sstables::sstable> > >; AsyncAction = table::move_sstables_from_staging
(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()>::<lambda(std::set<seastar::basic_sstring<char, unsigned int, 15> >&)>::<lambda(sstables::shared_sstable)>]::<lambda()>&> /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1563
    ...

0x60200023de18 is located 8 bytes inside of 16-byte region [0x60200023de10,0x60200023de20)
freed by thread T1 (reactor-1) here:
    #0 0x7f8a153b796f in operator delete(void*) (/lib64/libasan.so.5+0x11096f)
    #1 0x6ab4d1 in __gnu_cxx::new_allocator<seastar::lw_shared_ptr<sstables::sstable> >::deallocate(seastar::lw_shared_ptr<sstables::sstable>*, unsigned long) /usr/include/c++/9/ext/new_allocator.h:128
    #2 0x612052 in std::allocator_traits<std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >::deallocate(std::allocator<seastar::lw_shared_ptr<sstables::sstable> >&, seastar::lw_shared_ptr<sstables::sstable>*, unsigned long) /usr/include/c++/9/bits/alloc_traits.h:470
    #3 0x58fdfb in std::_Vector_base<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >::_M_deallocate(seastar::lw_shared_ptr<sstables::sstable>*, unsigned long) /usr/include/c++/9/bits/stl_vector.h:351
    #4 0x52a790 in std::_Vector_base<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >::~_Vector_base() /usr/include/c++/9/bits/stl_vector.h:332
    #5 0x52a99b in std::vector<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >::~vector() /usr/include/c++/9/bits/stl_vector.h:680
    #6 0xff60fa in ~<lambda> /local/home/bhalevy/dev/scylla/table.cc:2477
    #7 0xff7202 in operator() /local/home/bhalevy/dev/scylla/table.cc:2496
    #8 0x106af5b in apply<table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()> > /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1573
    #9 0x102f5d5 in futurize_apply<table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()> > /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1645
    #10 0x102f9ee in operator()<seastar::semaphore_units<seastar::named_semaphore_exception_factory> > /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/semaphore.hh:488
    #11 0x109d2f1 in apply /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/apply.hh:36
    #12 0x109d42c in apply<seastar::with_semaphore(seastar::basic_semaphore<ExceptionFactory, Clock>&, size_t, Func&&) [with ExceptionFactory = seastar::named_semaphore_exception_factory; Func = table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable>
 >)::<lambda()>; Clock = std::chrono::_V2::steady_clock]::<lambda(auto:51)>&, seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::chrono::_V2::steady_clock>&&> /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/apply.hh:44
    #13 0x109d595 in apply<seastar::with_semaphore(seastar::basic_semaphore<ExceptionFactory, Clock>&, size_t, Func&&) [with ExceptionFactory = seastar::named_semaphore_exception_factory; Func = table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable>
 >)::<lambda()>; Clock = std::chrono::_V2::steady_clock]::<lambda(auto:51)>&, seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::chrono::_V2::steady_clock>&&> /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1563
    ...

Fixes #5511

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191222214326.1229714-1-bhalevy@scylladb.com>
2019-12-23 15:20:41 +02:00
Konstantin Osipov
476fbc60be test.py: prepare to remove custom colors
Add dbuild dependency on python3-colorama,
which will be used in test.py instead of a hand-made palette.

[avi: update tools/toolchain/image]
Message-Id: <20191223125251.92064-2-kostja@scylladb.com>
2019-12-23 15:13:22 +02:00
Pavel Emelyanov
d361894b9d batchlog_manager: Speed up token_metadata endpoints counting a bit
In this place we only need to know the number of endpoints,
while the current code additionally shuffles them before counting.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-12-23 14:22:45 +02:00
Pavel Emelyanov
6e06c88b4c token_metadata: Remove unused helper
There are two _identical_ methods in the token_metadata class:
get_all_endpoints_count() and number_of_endpoints().
The former is used (called); the latter is not, so
let's remove it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-12-23 14:22:43 +02:00
Pavel Emelyanov
2662d9c596 migration_manager: Remove run_may_throw() first argument
It's unused in this function. Also this helps getting
rid of global instances of components.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-12-23 14:22:42 +02:00
Pavel Emelyanov
703b16516a storage_service: Remove unused helper
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-12-23 14:22:41 +02:00
Takuya ASADA
e0071b1756 reloc: don't archive dist/ami/files/*.rpm on relocatable package
We should skip archiving dist/ami/files/*.rpm on relocatable package,
since it isn't used.
The same goes for packer and variables.json.

Fixes #5508

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20191223121044.163861-1-syuu@scylladb.com>
2019-12-23 14:19:51 +02:00
Tomasz Grabiec
28dec80342 db/schema_tables: Add trace-level logging of schema digesting
This greatly helps to narrow down the source of schema digest mismatch
between nodes. Intended use is to enable this logger on disagreeing
nodes and trigger schema digest recalculation and observe which
mutations differ in digest and then examine their content.

Message-Id: <1574872791-27634-1-git-send-email-tgrabiec@scylladb.com>
2019-12-23 12:28:22 +02:00
Konstantin Osipov
1116700bc9 test.py: do not return 0 if there are failed tests
Fix a return value regression introduced when switching to asyncio.

Message-Id: <20191222134706.16616-2-kostja@scylladb.com>
2019-12-22 16:14:32 +02:00
Asias He
7322b749e0 repair: Do not return working_row_buf_nr in get combined row hash verb
In commit b463d7039c (repair: Introduce
get_combined_row_hash_response), working_row_buf_nr is returned in
REPAIR_GET_COMBINED_ROW_HASH in addition to the combined hash. It is
scheduled to be part of the 3.1 release; however, by accident it was not
backported to 3.1.

In order to be compatible between 3.1 and 3.2 repair, we need to drop
the working_row_buf_nr in 3.2 release.

Fixes: #5490
Backports: 3.2
Tests: Run repair in a mixed 3.1 and 3.2 cluster
2019-12-21 20:13:15 +02:00
Takuya ASADA
8eaecc5ed6 dist/common/scripts/scylla_setup: add swap existence check
Show warnings when no swap is configured on the node.

Closes #2511

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20191220080222.46607-1-syuu@scylladb.com>
2019-12-21 20:03:58 +02:00
Pavel Solodovnikov
5a15bed569 cql3: return result_set by cref in cql3::result::result_set
Changes summary:
* make `cql3::result_set` movable-only
* change signature of `cql3::result::result_set` to return by cref
* adjust available call sites to the aforementioned method to accept cref

Motivation behind this change is elimination of dangerous API,
which can easily set a trap for developers who don't expect that
result_set would be returned by value.

There is no point in copying the `result_set` around, so make
`cql3::result::result_set` cache the `result_set` internally in a
`unique_ptr` member variable and return a const reference, to
minimize unnecessary copies here and there.

Tests: unit(debug)

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20191220115100.21528-1-pa.solodovnikov@scylladb.com>
2019-12-21 16:56:42 +02:00
Takuya ASADA
3a6cb0ed8c install.sh: drop limits.d from nonroot mode
The file is only required for root mode.

Fixes #5507

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20191220101940.52596-1-syuu@scylladb.com>
2019-12-21 15:26:08 +02:00
Botond Dénes
08bb0bd6aa mutation_fragment_stream_validator: wrap exceptions into own exception type
So a higher level component using the validator to validate a stream can
catch only validation errors, and let any other incidental exception
through.

This allows building data correctors on top of the
`mutation_fragment_stream_validator`, by filtering a fragment stream
through a validator, catching invalid fragment stream exceptions and
dropping the respective fragments from the stream.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191220073443.530750-1-bdenes@scylladb.com>
2019-12-20 12:05:00 +01:00
Rafael Ávila de Espíndola
91c7f5bf44 Print build-id on startup
Fixes #5426

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191218031556.120089-1-espindola@scylladb.com>
2019-12-19 15:43:04 +02:00
Avi Kivity
440ad6abcc Revert "relocatable: Check that patchelf didn't mangle the PT_LOAD headers"
This reverts commit 237ba74743. While it
works for the scylla executable, it fails for iotune, which is built
by seastar. It should be reinstated after we pass the correct link
parameters to the seastar build system.
2019-12-19 11:20:34 +02:00
Pekka Enberg
c0aea19419 Merge "Add a timeout for housekeeping for offline installs" from Amnon
"
This series solves an issue with scylla_setup and prevents it from
waiting forever if housekeeping cannot check for the new Scylla version.

Fixes #5302

It should be backported to versions that support offline installations.
"

* 'scylla_setup_timeout' of git://github.com/amnonh/scylla:
  scylla_setup: do not wait forever if no reply is returned from housekeeping
  scylla_util.py: Add optional timeout to out function
2019-12-19 08:18:19 +02:00
Rafael Ávila de Espíndola
8d777b3ad5 relocatable: Use a super long path for the dynamic linker
Having a long path allows patchelf to change the interpreter without
changing the PT_LOAD headers and therefore without moving the
build-id out of the first page.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191213224803.316783-1-espindola@scylladb.com>
2019-12-18 19:10:59 +02:00
Pavel Solodovnikov
c451f6d82a LWT: Fix required participants calculation for LOCAL_SERIAL CL
Suppose we have a multi-dc setup (e.g. 9 nodes distributed across
3 datacenters: [dc1, dc2, dc3] -> [3, 3, 3]).

When a query that uses LWT is executed with LOCAL_SERIAL consistency
level, the `storage_proxy::get_paxos_participants` function
incorrectly calculates the number of required participants to serve
the query.

In the example above it's calculated to be 5 (i.e. the number of
nodes needed for a regular QUORUM) instead of 2 (for LOCAL_SERIAL,
which is equivalent to LOCAL_QUORUM cl in this case).

This behavior results in an exception being thrown when executing
the following query with LOCAL_SERIAL cl:

INSERT INTO users (userid, firstname, lastname, age) VALUES (0, 'first0', 'last0', 30) IF NOT EXISTS

Unavailable: Error from server: code=1000 [Unavailable exception] message="Cannot achieve consistency level for cl LOCAL_SERIAL. Requires 5, alive 3" info={'required_replicas': 5, 'alive_replicas': 3, 'consistency': 'LOCAL_SERIAL'}

Tests: unit(dev), dtest(consistency_test.py)

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20191216151732.64230-1-pa.solodovnikov@scylladb.com>
2019-12-18 16:58:32 +01:00
Botond Dénes
cd6bf3cb28 scylla-gdb.py: static_vector: update for changed storage
The actual buffer is now in a member called 'data'. Leave the old
`dummy.dummy` and `dummy` as fall-back. This seems to change every
Fedora release.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191218153544.511421-1-bdenes@scylladb.com>
2019-12-18 17:39:56 +02:00
Tomasz Grabiec
5865d08d6c migration_manager: Recalculate schema only on shard 0
Schema is node-global, update_schema_version_and_announce() updates
all shards.  We don't need to recalculate it from every shard, so
install the listeners only on shard 0. Reduces noise in the logs.

Message-Id: <1574872860-27899-1-git-send-email-tgrabiec@scylladb.com>
2019-12-18 16:43:26 +02:00
Pavel Emelyanov
998f51579a storage_service: Rip join_ring config option
The option in question apparently does not work: several sharded objects
are start()-ed (and thus instantiated) in join_token_ring, while instances
of these objects are used during the init of other components.

This leads to broken seastar local_is_initialized assertion on sys_dist_ks,
but reading the code shows more examples, e.g. the auth_service is started
on join, but is used for thrift and cql servers initialization.

The suggestion is to remove the option instead of fixing it. The is_joined
logic is kept, since on-start joining can still take some time and it's safer
to report the real status from the API.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20191203140717.14521-1-xemul@scylladb.com>
2019-12-18 12:45:13 +02:00
Nadav Har'El
8157f530f5 merge: CDC: handle schema changes
Merged pull request https://github.com/scylladb/scylla/pull/5366 from Calle Wilund:

Moves schema creation/alter/drop awareness to use new "before" callbacks from
migration manager, and adds/modifies log and streams table as part of the base
table modification.

Makes schema changes semi-atomic per node. While this does not deal with updates
coming in before a schema change has propagated across the cluster, it now falls into the
same pit as when this happens without CDC.

An added side effect is that schemas are now transparent across all subsystems,
not just CQL.

Patches:
  cdc_test: Add small test for altering base schema (add column)
  cdc: Handle schema changes via migration manager callbacks
  migration_manager: Invoke "before" callbacks for table operations
  migration_listener: Add empty base class and "before" callbacks for tables
  cql_test_env: Include cdc service in cql tests
  cdc: Add sharded service that does nothing.
  cdc: Move "options" to separate header to avoid too much header inclusion
  cdc: Remove some code from header
2019-12-17 23:04:36 +02:00
Avi Kivity
1157ee16a5 Update seastar submodule
* seastar 00da4c8760...0525bbb08f (7):
  > future: Simplify future_state_base::any move constructor
  > future: don't create temporary tuple on future::get().
  > future: don't instantiate new future on future::then_wrapped().
  > future: clean-up the Result handling in then_wrapped().
  > Merge "Fix core dumps when asan is enabled" from Rafael
  > future: Move ignore to the base class
  > future: Don't delete in ignore
2019-12-17 19:47:50 +02:00
Botond Dénes
638623b56b configure.py: make build.ninja target depend on SCYLLA-VERSION-GEN
Currently `SCYLLA-VERSION-GEN` is not a dependency of any target and
hence changes done to it will not be picked up by ninja. To trigger a
rebuild and hence version changes to appear in the `scylla` target
binary, one has to do `touch configure.py`. This is counter intuitive
and frustrating to people who don't know about it and wonder why their
changed version is not appearing as the output of `scylla --version`.

This patch makes `SCYLLA-VERSION-GEN` a dependency of `build.ninja`,
making the `build.ninja` target out-of-date whenever
`SCYLLA-VERSION-GEN` is changed and hence will trigger a rerun of
`configure.py` when the next target is built, allowing a build of e.g.
`scylla` to pick up any changes done to the version automatically.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191217123955.404172-1-bdenes@scylladb.com>
2019-12-17 17:40:04 +02:00
Avi Kivity
7152ba0c70 Merge "tests: automatically search for unit tests" from Kostja
"
This patch set rearranges the test files so that
it is now possible to search for tests automatically,
and adds this functionality to test.py
"

* 'test.py.requeue' of ssh://github.com/scylladb/scylla-dev:
  cmake: update CMakeLists.txt to scan test/ rather than tests/
  test.py: automatically lookup all unit and boost tests
  tests: move all test source files to their new locations
  tests: move a few remaining headers
  tests: move another set of headers to the new test layout
  tests: move .hh files and resources to new locations
  tests: remove executable property from data_listeners_test.cc
2019-12-17 17:32:18 +02:00
Amnon Heiman
dd42f83013 scylla_setup: do not wait forever if no reply is returned from housekeeping
When scylla is installed without network connectivity, the check for whether a
newer version is available can cause scylla_setup to wait forever.

This patch adds a limit to the time scylla_setup will wait for a reply.

When there is no reply, a relevant error will be shown saying that it was
unable to check for a newer version, but this will not block the setup
script.

Fixes #5302

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2019-12-17 14:56:47 +02:00
Nadav Har'El
aa1de5a171 merge: Synchronize snapshot and staging sstable deletion using sem
Merged pull request https://github.com/scylladb/scylla/pull/5343 from
Benny Halevy.

Fixes #5340

Hold the sstable_deletion_sem in table::move_sstables_from_subdirs to
serialize access to the staging directory. It now synchronizes snapshot,
compaction deletion of sstables, and view_update_generator moving of
sstables from staging.

Tests:

    unit (dev) [except test_user_function_timestamp_return, which fails for me locally, but also on master]
    snapshot_test.py (dev)
2019-12-17 14:06:02 +02:00
Juliusz Stasiewicz
7fdc8563bf system_keyspace: Added infrastructure for table `system.clients'
I used the following as a reference:
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/virtual/ClientsTable.java
At this moment there is only info about IP, clients outgoing port,
client 'type' (i.e. CQL/thrift/alternator), shard ID and username.
Column `request_count' is NOT present and CK consists of
(`port', `client_type'), contrary to what C* has: (`port').

Code that notifies `system.clients` about new connections goes
to top-level files `connection_notifier.*`. Currently only CQL
clients are observed, but enum `client_type` can be used in future
to notify about connections with other protocols.
2019-12-17 11:31:28 +01:00
Benny Halevy
4b3243f5b9 table: move_sstables_from_staging_in_thread with _sstable_deletion_sem
Hold the _sstable_deletion_sem while moving sstables from the staging directory
so as not to move them under the feet of table::snapshot.

Fixes #5340

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-12-17 12:20:20 +02:00
Benny Halevy
0446ce712a view_update_generator::start: use variable binding
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-12-17 12:20:20 +02:00
Benny Halevy
5d7c80c148 view_update_generator::start: fix indentation
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-12-17 12:20:20 +02:00
Benny Halevy
02784f46b9 view_update_generator: handle errors when processing sstable
Consumer may throw, in this case, break from the loop and retry.

move_sstable_from_staging_in_thread may theoretically throw too;
ignore the error in this case since the sstable was already processed:
individual move failures are already ignored, and moving from staging
will be retried upon restart.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-12-17 12:20:20 +02:00
Benny Halevy
abda12107f sstables: move_to_new_dir: add do_sync_dirs param
To be used for "batch" move of several sstables from staging
to the base directory, allowing the caller to sync the directories
once when all are moved rather than for each one of them.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-12-17 12:20:20 +02:00
Benny Halevy
6efef84185 sstable: return future from move_to_new_dir
distributed_loader::probe_file needlessly creates a seastar
thread for it and the next patch will use it as part of
a parallel_for_each loop to move a list of sstables
(and sync the directories once at the end).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-12-17 12:20:20 +02:00
Benny Halevy
0d2a7111b2 view_update_generator: sstable_with_table: std::move constructor args
Just a small optimization.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-12-17 12:19:55 +02:00
Nadav Har'El
fc85c49491 alternator: error on unsupported parallel scan
We do not yet support the parallel Scan options (TotalSegments, Segment),
as reported in issue #5059. But even before implementing this feature, it
is important that we produce an error if a user attempts to use it - instead
of outright ignoring this parameter. This is what this patch does.

The patch also adds a full test, test_scan.py::test_scan_parallel, for the
parallel scan feature. The test passes on DynamoDB, and still xfails
on Alternator after this patch - but now the Scan request fails immediately
reporting the unsupported option - instead of what the pre-patch code did:
returning the wrong results and the test failing just when the results
do not match the expectations.

Refs #5059.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191217084917.26191-1-nyh@scylladb.com>
2019-12-17 11:27:56 +02:00
Avi Kivity
f7d69b0428 Revert "Merge "bouncing lwt request to an owning shard" from Gleb"
This reverts commit 64cade15cc, reversing
changes made to 9f62a3538c.

This commit is suspected of corrupting the response stream.

Fixes #5479.
2019-12-17 11:06:10 +02:00
Rafael Ávila de Espíndola
237ba74743 relocatable: Check that patchelf didn't mangle the PT_LOAD headers
Should avoid issue #4983 showing up again.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191213224803.316783-2-espindola@scylladb.com>
2019-12-16 20:18:32 +02:00
Avi Kivity
3b7aca3406 Merge "db: Don't create a reference to nullptr" from Rafael
"
Only the first patch is needed to fix the undefined behavior, but the
followup ones simplify the memory management around user types.
"

* 'espindola/fix-5193-v2' of ssh://github.com/espindola/scylla:
  db: Don't use lw_shared_ptr for user_types_metadata
  user_types_metadata: don't implement enable_lw_shared_from_this
  cql3: pass a const user_types_metadata& to prepare_internal
  db: drop special case for top level UDTs
  db: simplify db::cql_type_parser::parse
  db: Don't create a reference to nullptr
  Add test for loading a schema with a non native type
2019-12-16 17:10:58 +02:00
Konstantin Osipov
d6bc7cae67 cmake: update CMakeLists.txt to scan test/ rather than tests/
A follow up on directory rename.
2019-12-16 17:47:42 +03:00
Konstantin Osipov
e079a04f2a test.py: automatically lookup all unit and boost tests 2019-12-16 17:47:42 +03:00
Konstantin Osipov
1c8736f998 tests: move all test source files to their new locations
1. Move tests to test (using singular seems to be a convention
   in the rest of the code base)
2. Move boost tests to test/boost, other
   (non-boost) unit tests to test/unit, tests which are
   expected to be run manually to test/manual.

Update configure.py and test.py with new paths to tests.
2019-12-16 17:47:42 +03:00
Konstantin Osipov
2fca24e267 tests: move a few remaining headers
Move sstable_test.hh, test_table.hh and cql_assertions.hh from tests/ to
test/lib or test/boost and update dependent .cc files.
Move tests/perf_sstable.hh to test/perf/perf_sstable.hh
2019-12-16 17:47:42 +03:00
Konstantin Osipov
b9bf1fbede tests: move another set of headers to the new test layout
Move another small subset of headers to test/
with the same goals:
- preserve bisectability
- make the revision history traceable after a move

Update dependent files.
2019-12-16 17:47:42 +03:00
Konstantin Osipov
8047d24c48 tests: move .hh files and resources to new locations
The plan is to move the unstructured content of tests/ directory
into the following directories of test/:

test/lib - shared header and source files for unit tests
test/boost - boost unit tests
test/unit - non-boost unit tests
test/manual - tests intended to be run manually
test/resource - binary test resources and configuration files

In order to not break git bisect and preserve the file history,
first move most of the header files and resources.
Update paths to these files in .cc files, which are not moved.
2019-12-16 17:47:42 +03:00
Konstantin Osipov
644595e15f tests: remove executable property from data_listeners_test.cc
The executable flag must have been committed to git by mistake.
2019-12-16 17:47:41 +03:00
Benny Halevy
d2e00abe13 tests: commitlog_test: test_allocation_failure: improve error reporting
We're seeing the following error from test from time to time:
  fatal error: in "test_allocation_failure": std::runtime_error: Did not get expected exception from writing too large record

This is not reproducible and the error string does not contain
enough information to figure out what happened exactly, therefore
this patch adds an exception if the call succeeded unexpectedly
and also prints the unexpected exception if one was caught.

Refs #4714

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191215052434.129641-1-bhalevy@scylladb.com>
2019-12-16 15:38:48 +01:00
Asias He
6b7344f6e5 streaming: Fix typo in stream_result_future::maybe_complete
s/progess/progress/

Refs: #5437
2019-12-16 11:12:03 +02:00
Dejan Mircevski
f3883cd935 dbuild: Fix podman invocation (#5481)
The is_podman check depended on `docker -v` printing "podman" in
the output, but that doesn't actually work, since podman prints $0 there.
Use `docker --help` instead, which does output "podman".

Also return podman's return status, which was previously being
dropped.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-12-16 11:11:48 +02:00
Avi Kivity
00ae4af94c Merge "Sanitize and speed-up (a bit) directories set up" from Pavel
"
On start there are two things that scylla does on data/commitlog/etc.
dirs: locks and verifies permissions. Right now these two actions are
managed by different approaches, it's convenient to merge them.

Also the introduced in this set directories class makes a ground for
better --workdir option handling. In particular, right now the db::config
entries are modified after options parse to update directories with
the workdir prefix. With the directories class at hands will be able
to stop doing this.
"

* 'br-directories-cleanup' of https://github.com/xemul/scylla:
  directories: Make internals work on fs::path
  directories: Cleanup adding dirs to the vector to work on
  directories: Drop seastar::async usage
  directories: Do touch_and_lock and verify sequentially
  directories: Do touch_and_lock in parallel
  directories: Move the whole stuff into own .cc file
  directories: Move all the dirs code into .init method
  file_lock: Work with fs::path, not sstring
2019-12-15 16:02:46 +02:00
Takuya ASADA
5e502ccea9 install.sh: setup workdir correctly on nonroot mode
Specify correct workdir on nonroot mode, to set correct path of
data / commitlog / hints directories at once.

Fixes #5475

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20191213012755.194145-1-syuu@scylladb.com>
2019-12-15 16:00:57 +02:00
Avi Kivity
c25d51a4ea Revert "scylla_setup: Support for enforcing optimal Linux clocksource setting (#5379)"
This reverts commit 4333b37f9e. It breaks upgrades,
and the question posed to the user is not informative enough to make a correct
decision.

Fixes #5478.
Fixes #5480.
2019-12-15 14:37:40 +02:00
Pavel Emelyanov
23a8d32920 directories: Make internals work on fs::path
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-12-12 19:52:01 +03:00
Pavel Emelyanov
373fcfdb3e directories: Cleanup adding dirs to the vector to work on
The unordered_set is turned into a vector since fs::path
has no hash() method, which is needed for a set.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-12-12 19:52:01 +03:00
Pavel Emelyanov
14437da769 directories: Drop seastar::async usage
Now the only future-able operation remained is the call to
parallel_for_each(), all the rest is non-blocking preparation,
so we can drop the seastar::async and just return the future
from parallel_for_each.

The indentation is now good, as the previous patch prepared it
just for that.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-12-12 19:52:01 +03:00
Pavel Emelyanov
06f4f3e6d8 directories: Do touch_and_lock and verify sequentially
The goal is to drop the seastar::async() usage.

Currently we have two places that return futures -- calls to
parallel_for_each-s.  We can either chain them together or,
since both are working on the same set of directories, chain
actions inside them.

For code simplicity I propose to chain actions.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-12-12 19:52:01 +03:00
Pavel Emelyanov
8d0c820aa1 directories: Do touch_and_lock in parallel
The list of paths that should be touch-and-locked is already
at hands, this shortens the code and makes it slightly faster
(in theory).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-12-12 19:52:01 +03:00
Pavel Emelyanov
71a528d404 directories: Move the whole stuff into own .cc file
In order not to pollute the root dir place the code in
utils/ directory, "utils" namespace.

While doing this -- move the touch_and_lock from the
class declaration.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-12-12 19:52:01 +03:00
Benny Halevy
9ec98324ed messaging_service: unregister_handler: return rpc unregister_handler future
Now that seastar returns it.

Fixes https://github.com/scylladb/scylla/issues/5228

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191212143214.99328-1-bhalevy@scylladb.com>
2019-12-12 16:38:36 +02:00
Pavel Emelyanov
f2b3c17e66 directories: Move all the dirs code into .init method
The seastar::async usage is temporary, added for bisect-safety, and
will soon go away. For this reason the indentation in the
.init method is not "canonical", but is prepared for a one-patch
drop of the seastar::async.

The hinted_handoff_enabled arg is there, as it's not just a
parameter in the config; it had been parsed in main.cc.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-12-12 17:33:11 +03:00
Pavel Emelyanov
82ef2a7730 file_lock: Work with fs::path, not sstring
The main.cc code that converts sstring to fs::path
will be patched soon, the file_desc::open belongs
to seastar and works on sstrings.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-12-12 17:32:10 +03:00
Konstantin Osipov
bc482ee666 test.py: remove an unused option
Message-Id: <20191204142622.89920-2-kostja@scylladb.com>
2019-12-12 15:53:35 +02:00
Avi Kivity
64cade15cc Merge "bouncing lwt request to an owning shard" from Gleb
"
LWT is much more efficient if a request is processed on the shard that owns
the token for the request, because otherwise the processing will
bounce to the owning shard multiple times. The patch proposes a way to
move the request to the correct shard before running LWT. It works by returning
an error from the LWT code if the shard is incorrect, specifying the shard
the request should be moved to. The error is processed by the transport
code, which jumps to the correct shard and re-processes the incoming message there.
"

* 'gleb/bounce_lwt_request' of github.com:scylladb/seastar-dev:
  lwt: take raw lock for entire cas duration
  lwt: drop invoke_on in paxos_state prepare and accept
  lwt: Process lwt request on an owning shard
  storage_service: move start_native_transport into a thread
  transport: change make_result to take a reference to cql result instead of shared_ptr
2019-12-12 15:50:22 +02:00
Nadav Har'El
9f62a3538c alternator: fix BEGINS_WITH operator for blobs
The implementation of Expected's BEGINS_WITH operator on blobs was
incorrect, naively comparing the base64-encoded strings, which doesn't
work. This patch fixes the code to compare the decoded strings.

The reason why the BEGINS_WITH test missed this bug was that we forgot
to check the blob case and only tested the string case; So this patch
also adds the missing test - which reproduces this bug, and verifies
its fix.

Fixes #5457

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191211115526.29862-1-nyh@scylladb.com>
2019-12-12 14:02:56 +01:00
Dejan Mircevski
27b8b6fe9d cql3: Fix needs_filtering() for clustering columns
The LIKE operator requires filtering, so needs_filtering() must check
is_LIKE().  This already happens for partition columns, but it was
overlooked for clustering columns in the initial implementation of
LIKE.

Fixes #5400.

Tests: unit(dev)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-12-12 01:19:13 +02:00
Benny Halevy
d1bcb39e7f hinted handoff: log message after removing hints directory (#5372)
To be used by dtest as an indicator that endpoint's hints
were drained and hints directory is removed.

Refs #5354

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-12-12 01:16:19 +02:00
Rafael Ávila de Espíndola
3b61cf3f0b db: Don't use lw_shared_ptr for user_types_metadata
The user_types_metadata can simply be owned by the keyspace. This
simplifies the code since we never have to worry about nulls and the
ownership is now explicit.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-12-11 10:44:40 -08:00
Rafael Ávila de Espíndola
a55838323b user_types_metadata: don't implement enable_lw_shared_from_this
It looks like this was done just to avoid including
user_types_metadata.hh, which seems a bit much considering that it
requires adding a specialization to the seastar namespace.

A followup patch will also stop using lw_shared_ptr for
user_types_metadata.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-12-11 10:44:40 -08:00
Rafael Ávila de Espíndola
f7c2c60b07 cql3: pass a const user_types_metadata& to prepare_internal
We never modify the user_types_metadata via prepare_internal, so we
can pass it a const reference.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-12-11 10:44:40 -08:00
Rafael Ávila de Espíndola
99cb8965be db: drop special case for top level UDTs
This was originally done in 7f64a6ec4b,
but that commit was reverted in
8517eecc28.

The revert was done because the original change would call parse_raw
for non UDT types. Unlike the old patch, this one doesn't change the
behavior of non UDT types.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-12-11 10:44:40 -08:00
Rafael Ávila de Espíndola
7ae9955c5f db: simplify db::cql_type_parser::parse
The variant of db::cql_type_parser::parse that has a
user_types_metadata argument was only used from the variant that
didn't. This inlines one in the other.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-12-11 10:44:40 -08:00
Rafael Ávila de Espíndola
2092e1ef6f db: Don't create a reference to nullptr
The user_types variable can be null during db startup since we have to
create types before reading the system table defining user types.

This avoids undefined behavior, but it is unlikely that it was causing
more serious problems since the variable is only used when creating
user types and we don't create any until after all system tables are
read, in which case the user_types variable is not null.

Fixes #5193

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-12-11 10:44:40 -08:00
Rafael Ávila de Espíndola
6143941535 Add test for loading a schema with a non native type
This would have found the error with the previous version of the patch
series.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-12-11 10:43:34 -08:00
Gleb Natapov
64cfb9b1f6 lwt: take raw lock for entire cas duration
It will prevent parallel update by the same coordinator and should
reduce contention.
2019-12-11 14:41:31 +02:00
Gleb Natapov
898d2330a2 lwt: drop invoke_on in paxos_state prepare and accept
Since lwt requests are now running on an owning shard there is no longer
a need to invoke cross shard call.
2019-12-11 14:41:31 +02:00
Gleb Natapov
964c532c4f lwt: Process lwt request on an owning shard
LWT is much more efficient if a request is processed on the shard that owns
the token for the request, because otherwise the processing will
bounce to the owning shard multiple times. The patch proposes a way to
move the request to the correct shard before running LWT. It works by returning
an error from the LWT code if the shard is incorrect, specifying the shard
the request should be moved to. The error is processed by transport code
that jumps to the correct shard and re-processes the incoming message there.
2019-12-11 14:41:31 +02:00
Gleb Natapov
54be057af3 storage_service: move start_native_transport into a thread
The code runs only once and it is simple if it runs in a seastar thread.
2019-12-11 14:41:31 +02:00
Gleb Natapov
007ba3e38e transport: change make_result to take a reference to cql result instead of shared_ptr 2019-12-11 14:41:31 +02:00
Nadav Har'El
9e5c6995a3 alternator-test: add tests for ReturnValues parameter
This patch adds comprehensive tests for the ReturnValues parameter of
the write operations (PutItem, UpdateItem, DeleteItem), which can return
pre-write or post-write values of the modified item. The tests are in
a new test file, alternator-test/test_returnvalues.py.

This feature is not yet implemented in Alternator, so all the new
tests xfail on Alternator (and all pass on AWS).

Refs #5053

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191127163735.19499-1-nyh@scylladb.com>
2019-12-11 13:26:39 +01:00
Nadav Har'El
ab69bfc111 alternator-test: add xfailing tests for ScanIndexForward
This patch adds tests for Query's "ScanIndexForward" parameter, which
can be used to return items in reversed sort order.
We test that a Limit works and returns the given number of *last* items
in the sort order, and also that such reverse queries can be resumed,
i.e., paging works in the reverse order.

These tests pass against AWS DynamoDB, but fail against Alternator (which
doesn't support ScanIndexForward yet), so they are marked xfail.

Refs #5153.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191127114657.14953-1-nyh@scylladb.com>
2019-12-11 13:26:39 +01:00
Pekka Enberg
6bc18ba713 storage_proxy: Remove reference to MBean interface
The JMX interface is implemented by the scylla-jmx project, not scylla.
Therefore, let's remove this historical reference to MBeans from
storage_proxy.

Message-Id: <20191211121652.22461-1-penberg@scylladb.com>
2019-12-11 14:24:28 +02:00
Avi Kivity
63474a3380 Merge "Add experimental_features option" from Dejan
"
Add --experimental-features -- a vector of features to unlock. Make corresponding changes in the YAML parser.

Fixes #5338
"

* 'vecexper' of https://github.com/dekimir/scylla:
  config: Add `experimental_features` option
  utils: Add enum_option
2019-12-11 14:23:08 +02:00
Avi Kivity
56b9bdc90f Update seastar submodule
* seastar e440e831c8...00da4c8760 (7):
  > Merge "reactor: fix iocb pool underflow due to unaccounted aio fsync" from Avi
Fixes #5443.
  > install-dependencies.sh: fix arch dependencies
  > Merge " rpc: fix use-after-free during rpc teardown vs. rpc server message handling" from Benny
  > Merge "testing: improve the observability of abandoned failed futures" from Botond
  > rework the fair_queue tester
  > directory_test: Update to use run instead of run_deprecated
  > log: support fmt 6.0 branch with chrono.h for log
2019-12-11 14:17:49 +02:00
Benny Halevy
105c8ef5a9 messaging_service: wait on unregister_handler
Prepare for returning future<> from seastar rpc
unregister_handler.

Refs https://github.com/scylladb/scylla/issues/5228

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191208153924.1953-1-bhalevy@scylladb.com>
2019-12-11 14:17:41 +02:00
Nadav Har'El
06c3802a1a storage_proxy: avoid overflow in view-backlog delay calculation
In the calculate_delay() code for view-backlog flow control, we calculate
a delay and cap it at a "budget" - the remaining timeout. This timeout is
measured in milliseconds, but the capping calculation converted it into
microseconds, which overflowed if the timeout is very large. This causes
some tests which enable the UB sanitizer to fail.

We fix this problem by comparing the delay to the budget in millisecond
resolution, not in microsecond resolution. Then, if the calculated delay
is short enough, we return it using its full microsecond resolution.
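The fix can be illustrated with a small sketch (a hypothetical helper, not the actual calculate_delay() code): compare at millisecond resolution first, and only convert to microseconds when the value is known to be small enough not to overflow.

```cpp
#include <cassert>
#include <chrono>

using std::chrono::duration_cast;
using std::chrono::microseconds;
using std::chrono::milliseconds;

// Cap a computed delay by the remaining timeout ("budget") without
// converting the budget to microseconds first. Converting a very large
// millisecond budget to microseconds can overflow the underlying
// integer; comparing at millisecond resolution cannot.
inline microseconds cap_delay(microseconds delay, milliseconds budget) {
    if (duration_cast<milliseconds>(delay) < budget) {
        return delay;  // short enough: keep full microsecond resolution
    }
    // The delay reaches the budget, so the budget is small enough for
    // the microsecond conversion to be safe.
    return duration_cast<microseconds>(budget);
}
```

With a huge budget (e.g. `milliseconds::max()`), the first branch is taken and no overflowing conversion ever happens.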

Fixes #5412

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191205131130.16793-1-nyh@scylladb.com>
2019-12-11 14:10:54 +02:00
Nadav Har'El
2824d8f6aa Merge: alternator: Fix EQ operator for sets
Merged pull request https://github.com/scylladb/scylla/pull/5453
from Piotr Sarna:

Checking the EQ relation for alternator attributes is usually performed
simply by comparing underlying JSON objects, but sets (SS, BS, NS types)
need a special routine, as we need to make sure that sets stored in
a different order underneath are still equal, e.g:

[1, 3, 2] == [1, 2, 3]

Fixes #5021
2019-12-11 13:20:25 +02:00
Piotr Sarna
421db1dc9d alternator-test: remove XFAIL from set EQ test
With this series merged, test_update_expected_1_eq_set from the
test_expected.py suite starts passing.
2019-12-11 12:07:39 +01:00
Piotr Sarna
a8e45683cb alternator: add EQ comparison for sets
Checking the EQ relation for alternator attributes is usually performed
simply by comparing underlying JSON objects, but sets (SS, BS, NS types)
need a special routine, as we need to make sure that sets stored in
a different order underneath are still equal, e.g:
[1, 3, 2] == [1, 2, 3]
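A minimal sketch of such an order-insensitive comparison (illustrative only; set elements are modeled here as strings rather than the actual JSON values):

```cpp
#include <algorithm>
#include <string>
#include <vector>

// DynamoDB set types (SS, BS, NS) are unordered, so two sets stored in
// a different element order must still compare equal. A plain
// element-by-element comparison would report [1,3,2] != [1,2,3];
// comparing sorted copies gives the intended set semantics.
inline bool sets_equal(std::vector<std::string> a, std::vector<std::string> b) {
    if (a.size() != b.size()) {
        return false;
    }
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    return a == b;
}
```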

Fixes #5021
2019-12-11 12:07:39 +01:00
Piotr Sarna
fb37394995 schema_tables: notify table deletions before creations
If a set of mutations contains both an entry that deletes a table
and an entry that adds a table with the same name, it's expected
to be a replacement operation (delete old + create new),
rather than a useless "try to create a table even though it exists
already and then immediately delete the original one" operation.
As such, notifications about the deletions should be performed
before notifications about the creations. The place that originally
suffered from this wrong order is view building - which in this case
created an incorrect duplicated entry in the view building bookkeeping,
and then immediately deleted it, resulting in having old, deprecated
entries with stale UUIDs lying in the build queue and never proceeding,
because the underlying table is long gone.
The issue is fixed by ensuring the order of notifications:
 - drops are announced first, view drops are announced before table drops;
 - creations follow, table creations are announced before views;
 - finally, changes to tables and views are announced;
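The ordering rule above can be sketched as a stable sort over a notification kind (hypothetical types, not the actual schema_tables code):

```cpp
#include <algorithm>
#include <vector>

// Notification kinds in the order they must be delivered: drops first
// (views before tables), then creations (tables before views), then
// updates. This makes a same-name delete+create batch behave as a
// replacement rather than a create-then-delete.
enum class change { drop_view, drop_table, create_table, create_view, update };

inline std::vector<change> order_notifications(std::vector<change> events) {
    std::stable_sort(events.begin(), events.end(),
                     [](change a, change b) {
                         return static_cast<int>(a) < static_cast<int>(b);
                     });
    return events;
}
```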

Fixes #4382

Tests: unit(dev), mv_populating_from_existing_data_during_node_stop_test
2019-12-11 12:48:29 +02:00
Benny Halevy
d544df6c3c dist/ami/build_ami.sh: support incremental build of rpms (#5191)
Iterate over an array holding all rpm names to see if any
of them is missing from `dist/ami/files`. If they are missing,
look them up in build/redhat/RPMS/x86_64 so that if reloc/build_rpm.sh
was run manually before dist/ami/build_ami.sh we can just collect
the built rpms from its output dir.

If we're still missing any rpms, then run reloc/build_rpm.sh
and copy the required rpms from build/redhat/RPMS/x86_64.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Reviewed-by: Glauber Costa <glauber@scylladb.com>
2019-12-11 12:48:29 +02:00
Amnon Heiman
f43285f39a api: replace swagger definition to use long instead of int (#5380)
In swagger 1.2, int is defined as int32.

We originally used int, following the jmx definition; in practice we
internally use uint and int64 in many places.

While the API formats the type correctly, an external system that uses
a swagger-based code generator can run into a type mismatch.

This patch replaces every use of int in a return type with long, which is defined as int64.

Changing the return type has no impact on the system, but it does help
external systems that generate code from the swagger definitions.

Fixes #5347

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2019-12-11 12:48:29 +02:00
Nadav Har'El
2abac32f2e Merged: alternator: Implement CONTAINS and NOT_CONTAINS in Expected
Merged pull request https://github.com/scylladb/scylla/pull/5447
by Dejan Mircevski.

Adds the last missing operators in the "Expected" parameter and re-enable
their tests.

Fixes #5034.
2019-12-11 12:48:29 +02:00
Cem Sancak
86b8036502 Fix DPDK mode in prepare script
Fixes #5455.
2019-12-11 12:48:29 +02:00
Calle Wilund
35089da983 conf/config: Add better descriptive text on server/client encryption
Provide some explanation on prio strings + direction to gnutls manual.
Document client auth option.
Remove confusing/misleading statement on "custom options"

Message-Id: <20191210123714.12278-1-calle@scylladb.com>
2019-12-11 12:48:28 +02:00
Dejan Mircevski
32af150f1d alternator: Implement NOT_CONTAINS operator in Expected
Enable existing NOT_CONTAINS test, add NOT_CONTAINS to the list of
recognized operators, implement check_NOT_CONTAINS, and hook it up to
verify_expected_one().

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-12-10 15:31:47 -05:00
Dejan Mircevski
bd2bd3c7c8 alternator: Implement CONTAINS operator in Expected
Enable existing CONTAINS test, implement check_CONTAINS, and hook it
up to verify_expected_one().

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-12-10 15:31:47 -05:00
Dejan Mircevski
5a56fd384c config: Add experimental_features option
When the user wants to turn on only some experimental features, they
can use this new option.  The existing `experimental` option is
preserved for backwards compatibility.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-12-10 11:47:03 -05:00
Piotr Sarna
9504bbf5a4 alternator: move unwrap_set to serialization header
The utility function for unwrapping a set is going to be useful
across source files, so it's moved to serialization.hh/serialization.cc.
2019-12-10 15:08:47 +01:00
Piotr Sarna
4660e58088 alternator: move rjson value comparison to rjson.hh
The comparison struct is going to be useful across source files,
so it's moved into rjson header, where it conceptually belongs anyway.
2019-12-10 15:08:47 +01:00
Botond Dénes
db0e2d8f90 scylla-gdb.py: document and add safety net to seastar::thread related commands
Almost all commands provided by `scylla-gdb.py` are safe to use. The
worst that could happen if they fail is that you won't get the desired
information. There is one notable exception: `scylla thread`. If
anything goes wrong while this command is executed - gdb crashes, a bug
in the command, etc. - there is a good change the process under
examination will crash. Sometimes this is fine, but other times e.g.
when live debugging a production node, this is unacceptable.
To avoid any accidents add documentation to all commands working with
`seastar::thread`. And since most people don't read documentation,
especially when debugging under pressure, add a safety net to the
`scylla thread` command. When run, this command will now warn of the
dangers and will ask for explicit acknowledgment of the risk of crash,
by means of passing an `--iamsure` flag. When this flag is missing, it
will refuse to run. I am sure this will be very annoying but I am also
sure that the avoided crashes are worth it.

As part of making `scylla thread` safe, its argument parsing code is
migrated to `argparse`. This changes the usage but this should be fine
because it is well documented.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191129092838.390878-1-bdenes@scylladb.com>
2019-12-10 11:51:57 +02:00
Eliran Sinvani
765db5d14f build_ami: Trim ami description attribute to the allowed size
The ami description attribute is only allowed to be 255
characters long. When build_ami.sh generates an ami, it
generates an ami description which is a concatenation
of all of the components' version strings. It can
happen that the description string is too long which
eventually causes the ami build to fail. This patch
trims the description string to 255 characters.
It is ok since the individual versions of the components
are also saved in tags attached to the image.

Tests:
 1. Reproduced with a long description and
    validated that it doesn't fail after the fix.

Fixes #5435

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <20191209141143.28893-1-eliransin@scylladb.com>
2019-12-10 11:51:57 +02:00
Fabiano Lucchese
4333b37f9e scylla_setup: Support for enforcing optimal Linux clocksource setting (#5379)
A Linux machine typically has multiple clocksources with distinct
performance characteristics. Selecting a high-performance clocksource can
result in better performance for ScyllaDB, so this should be considered
whenever starting it up.

This patch introduces the possibility of enforcing an optimized Linux
clocksource in Scylla's setup/start-up processes. It does so by adding
an interactive question about enforcing the clocksource setting to scylla_setup,
which modifies the parameter "CLOCKSOURCE" in the scylla_server configuration
file. This parameter is read by perftune.py which, if it is set to "yes", proceeds
to (non-persistently) set the clocksource. On x86, the TSC clocksource is
used.

Fixes #4474
2019-12-10 11:51:57 +02:00
Pavel Emelyanov
3a21419fdb features: Remove _FEATURE suffix from hinted_handoff feature name
All the other features are named w/o one. The internal const-s
are all different, but I'm fixing it separately.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20191209154310.21649-1-xemul@scylladb.com>
2019-12-10 11:51:57 +02:00
Dejan Mircevski
a26bd9b847 utils: Add enum_option
This allows us to accept command-line options with a predefined set of
valid arguments.
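A minimal sketch of the idea behind enum_option (hypothetical feature names and helper, not the actual utils implementation):

```cpp
#include <map>
#include <stdexcept>
#include <string>

// A command-line value is accepted only if it names one of a predefined
// set of enumerators; anything else is rejected with a clear error.
enum class experimental_feature { lwt, cdc, udf };

inline experimental_feature parse_feature(const std::string& name) {
    static const std::map<std::string, experimental_feature> names = {
        {"lwt", experimental_feature::lwt},
        {"cdc", experimental_feature::cdc},
        {"udf", experimental_feature::udf},
    };
    auto it = names.find(name);
    if (it == names.end()) {
        throw std::invalid_argument("invalid feature: " + name);
    }
    return it->second;
}

// Convenience predicate for validation without exception handling at
// the call site.
inline bool is_valid_feature(const std::string& name) {
    try {
        parse_feature(name);
        return true;
    } catch (const std::invalid_argument&) {
        return false;
    }
}
```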

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-12-09 09:45:59 -05:00
Calle Wilund
7c5e4c527d cdc_test: Add small test for altering base schema (add column) 2019-12-09 14:35:04 +00:00
Calle Wilund
cb0117eb44 cdc: Handle schema changes via migration manager callbacks
This allows us to create/alter/drop log and desc tables "atomically"
with the base, by including these mutations in the original mutation
set, i.e. batch create/alter tables.

Note that population does not happen until types are actually
already put into database (duh), thus there _is_ still a gap
between creating cdc and it being truly usable. This may or may
not need handling later.
2019-12-09 14:35:04 +00:00
Rafael Ávila de Espíndola
761b19cee5 build: Split the build and host linker flags
A general build system knows about 3 machines:

* build: where the building is running
* host: where the built software will run
* target: the machine the software will produce code for

The target machine is only relevant for compilers, so we can ignore
it.

Until now we could ignore the build and host distinction too. This
patch adds the first difference: don't use host ld_flags when linking
build tools (gen_crc_combine_table).

The reason for this change is to make it possible to build with
-Wl,--dynamic-linker pointing to a path that will exist on the host
machine, but may not exist on the build machine.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191207030408.987508-1-espindola@scylladb.com>
2019-12-09 15:54:57 +02:00
Calle Wilund
27183f648d migration_manager: Invoke "before" callbacks for table operations
Potentially allowing (cdc) augmentation of mutations.

Note: only does the listener part in seastar::thread, to avoid
changing call behaviour.
2019-12-09 12:12:09 +00:00
Calle Wilund
f78a3bf656 migration_listener: Add empty base class and "before" callbacks for tables
Empty base type makes for less boilerplate in implementations.
The "before" callbacks are for listeners who need to potentially
react/augment type creation/alteration _before_ actually
committing type to schema tables (and holding the semaphore for this).

I.e. it is for cdc to add/modify log/desc tables "atomically" with base.
2019-12-09 12:12:09 +00:00
Calle Wilund
4e406105b1 cql_test_env: Include cdc service in cql tests 2019-12-09 12:12:09 +00:00
Calle Wilund
a21e140169 cdc: Add sharded service that does nothing.
But it can be used to hang functionality onto eventually.
2019-12-09 12:12:09 +00:00
Calle Wilund
2787b0c4f8 cdc: Move "options" to separate header to avoid too much header inclusion
cdc should not contaminate the whole universe.
2019-12-09 12:12:09 +00:00
fastio
8f326b28f4 Redis: Combine all the source files redis/commands/* into redis/commands.{hh,cc}
Fixes: #5394

Signed-off-by: Peng Jian <pengjian.uestc@gmail.com>
2019-12-08 13:54:33 +02:00
Avi Kivity
9c63cd8da5 sysctl: reduce kernel tendency to swap anonymous pages relative to page cache (#5417)
The vm.swappiness sysctl controls the kernel's preference for swapping
anonymous memory vs page cache. Since Scylla uses very large amounts
of anonymous memory, and tiny amounts of page cache, the correct setting
is to prefer swapping page cache. If the kernel swaps anonymous memory
the reactor will stall until the page fault is satisfied. On the other
hand, page cache pages usually belong to other applications, usually
backup processes that read Scylla files.

This setting has been used in production in Scylla Cloud for a while
with good results.

Users can opt out by not installing the scylla-kernel-conf package
(same as with the other kernel tunables).
2019-12-08 13:04:25 +02:00
Avi Kivity
0e319e0359 Update seastar submodule
* seastar 166061da3...e440e831c (8):
  > Fail tests on ubsan errors
  > future: make a couple of asserts more strict
  > future: Move make_ready out of line
  > config: Do not allow zero rates
Fixes #5360
  > future: add new state to avoid temporaries in get_available_state().
  > future: avoid temporary future_state on get_available_state().
  > future: inline future::abandoned
  > noncopyable_function: Avoid uninitialized warning on empty types
2019-12-06 18:33:23 +02:00
Piotr Sarna
0718ff5133 Merge 'min/max on collections returns human-readable result' from Juliusz
Previously, scylla used min/max(blob)->blob overload for collections,
tuples and UDTs, effectively causing the results to be printed as blobs.
This PR adds "dynamically"-typed min()/max() functions for compound types.

These types can be complicated, like map<int,set<tuple<..., and created
in runtime, so functions for them are created on-demand,
similarly to tojson(). The comparison remains unchanged - underneath
this is still byte-by-byte weak lex ordering.

Fixes #5139

* jul-stas/5139-minmax-bad-printing-collections:
  cql_query_tests: Added tests for min/max/count on collections
  cql3: min()/max() for collections/tuples/UDTs do not cast to blobs
2019-12-06 16:40:17 +01:00
Juliusz Stasiewicz
75955beb0b cql_query_tests: Added tests for min/max/count on collections
This tests the new min/max functions for collections and tuples. CFs
in the test suite were named according to the types being tested, e.g.
`cf_map<int,text>', which is not a valid CF name. Therefore, these
names required "escaping" of invalid characters, here simply
replaced with '_'.
2019-12-06 12:15:49 +01:00
Juliusz Stasiewicz
9efad36fb8 cql3: min()/max() for collections/tuples/UDTs do not cast to blobs
Before:
cqlsh> insert into ks.list_types (id, val) values (1, [3,4,5]);
cqlsh> select max(val) from ks.list_types;

 system.max(val)
------------------------------------------------------------
 0x00000003000000040000000300000004000000040000000400000005

After:
cqlsh> select max(val) from ks.list_types;

 system.max(val)
--------------------
 [3, 4, 5]

This is accomplished similarly to `tojson()`/`fromjson()`: functions
are generated on demand from within `cql3::functions::get()`.
Because collections can have a variety of types, including UDTs
and tuples, it would be impossible to statically define max(T t)->T
for every T. Until now, max(blob)->blob overload was used.

Because `impl_max/min_function_for` is templated with the
input/output type, which can be defined in runtime, we need type-erased
("dynamic") versions of these functors. They work identically, i.e.
they compare byte representations of lhs and rhs with
`bytes::operator<`.
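A minimal sketch of such a type-erased aggregate (hypothetical names; the real comparison is `bytes::operator<`, modeled here with std::vector's lexicographic compare):

```cpp
#include <cstdint>
#include <vector>

using bytes = std::vector<uint8_t>;

// Type-erased max(): keep whichever serialized value is larger under
// byte-by-byte lexicographic ordering. The accumulated bytes are only
// deserialized with the column's real type when the result is printed,
// so the client sees e.g. [3, 4, 5] instead of a blob.
struct dynamic_max {
    bytes accumulated;
    bool empty = true;

    void add(const bytes& serialized_value) {
        if (empty || accumulated < serialized_value) {  // lexicographic
            accumulated = serialized_value;
            empty = false;
        }
    }
};

// Fold a sequence of serialized values through the aggregate.
inline bytes max_of(const std::vector<bytes>& vals) {
    dynamic_max m;
    for (const auto& v : vals) {
        m.add(v);
    }
    return m.accumulated;
}
```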

Resolves #5139
2019-12-06 12:14:51 +01:00
Avi Kivity
a18a921308 docs: maintainer.md: use command line to merge multi-commit pull requests
If you merge a pull request that contains multiple patches via
the github interface, it will document itself as the committer.

Work around this brain damage by using the command line.
2019-12-06 10:59:46 +01:00
Botond Dénes
7b37a700e1 configure.py: make tests explicitly depend on libseastar_testing.a
So that changes to libseastar_testing.a make all test target out of
date.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191205142436.560823-1-bdenes@scylladb.com>
2019-12-05 19:30:34 +02:00
Piotr Sarna
3a46b1bb2b Merge "handle hints on separate connection and scheduling group" from Piotr
Introduce a new verb dedicated for receiving and sending hints: HINT_MUTATION. It is handled on the streaming connection, which is separate from the one used for handling mutations sent by coordinator during a write.

The intent of using a separate connection is to increase fairness while handling hints and user requests - this way, a situation can be avoided in which one type of request saturates the connection, negatively impacting the other one.

Information about new RPC support is propagated through new gossip feature HINTED_HANDOFF_SEPARATE_CONNECTION.

Fixes #4974.

Tests: unit(release)
2019-12-05 17:25:26 +01:00
Calle Wilund
c11874d851 gms::inet_address: Use special ostream formatting to match Java
To make gms::inet_address::to_string() similar in output to origin.
The sole purpose being quick and easy fix of API/JMX ipv6
formatting of endpoints etc, where strings are used as lexical
comparisons instead of textual representation.

A better, but more work, solution is to fix the scylla-jmx
bridge to do explicit parse + re-format of addresses, but there
are many such callpoints.

An even better solution would be to fix nodetool to not make this
mistake of doing lexical comparisons, but then we risk breaking
merge compatibility. But could be an option for a separate
nodeprobe impl.

Message-Id: <20191204135319.1142-1-calle@scylladb.com>
2019-12-05 17:01:26 +02:00
Gleb Natapov
4893bc9139 tracing: split adding prepared query parameters from stopping of a trace
Currently the query_options object is passed to the trace stopping function
which makes it mandatory to keep it alive until the end of the
query. The reason for that is to add prepared statement parameters to
the trace.  All other query options that we want to put in the trace are
copied into trace_state::params_values, so lets copy prepared statement
parameters there too. Trace enabled case will become a little bit more
expensive but on the other hand we can drop a continuation that holds
query_options object alive from a fast path. It is safe to drop the call
to stop_foreground_prepared() here since The tracing will be stopped
in process_request_one().

Message-Id: <20191205102026.GJ9084@scylladb.com>
2019-12-05 17:00:47 +02:00
Tomasz Grabiec
aa173898d6 Merge "Named semaphores in concurrency reader, segment_manager and region_group" from Juliusz
Selected semaphores' names are now included in exception messages in
case of timeout or when admission queue overflows.

Resolves #5281
2019-12-05 14:19:56 +01:00
Nadav Har'El
5b2f35a21a Merge "Redis: fix the options related to Redis API, fix the DEL and GET command"
Merged pull request https://github.com/scylladb/scylla/pull/5381 by
Peng Jian, fixing multiple small issues with Redis:

* Rename the options related to Redis API, and describe them clearly.
* Rename redis_transport_port to redis_port
* Rename redis_transport_port_ssl to redis_ssl_port
* Rename redis_default_database_count to redis_database_count
* Remove unnecessary option enable_redis_protocol
* Modify the default value of options redis_read_consistency_level and redis_write_consistency_level to LOCAL_QUORUM

* Fix the DEL command: support deleting multiple keys in one command.

* Fix the GET command: return the empty string when the required key does not exist.

* Fix the redis-test/test_del_non_existent_key: mark xfail.
2019-12-05 11:58:34 +02:00
Avi Kivity
85822c7786 database: fix schema use-after-move in make_multishard_streaming_reader
On aarch64, asan detected a use-after-move. It doesn't happen on x86_64,
likely due to different argument evaluation order.

Fix by evaluating full_slice before moving the schema.

Note: I used "auto&&" and "std::move()" even though full_slice()
returns a reference. I think this is safer in case full_slice()
changes, and works just as well with a reference.

Fixes #5419.
2019-12-05 11:58:34 +02:00
Piotr Sarna
79c3a508f4 table: Reduce read amplification in view update generation
This commit makes sure that single-partition readers for
read-before-write do not have fast-forwarding enabled,
as it may lead to huge read amplification. The observed case was:
1. Creating an index.
  CREATE INDEX index1  ON myks2.standard1 ("C1");
2. Running cassandra-stress in order to generate view updates.
cassandra-stress write no-warmup n=1000000 cl=ONE -schema \
  'replication(factor=2) compaction(strategy=LeveledCompactionStrategy)' \
  keyspace=myks2 -pop seq=4000000..8000000 -rate threads=100 -errors
  skip-read-validation -node 127.0.0.1;

Without disabling fast-forwarding, single-partition readers
were turned into scanning readers in cache, which resulted
in reading 36GB (sic!) on a workload which generates less
than 1GB of view updates. After applying the fix, the number
dropped down to less than 1GB, as expected.

Refs #5409
Fixes #4615
Fixes #5418
2019-12-05 11:58:34 +02:00
Konstantin Osipov
6a5e7c0e22 tests: reduce the number of iterations of dynamic_bitset_test
This test's execution time dominates overall test execution time in
dev/release mode by a serious margin: reducing it improves the test.py
turnaround by over 70%.

Message-Id: <20191204135315.86374-2-kostja@scylladb.com>
2019-12-05 11:58:34 +02:00
Avi Kivity
07427c89a2 gdb: change 'scylla thread' command to access fs_base register directly
Currently, 'scylla thread' uses arch_prctl() to extract the value of
fsbase, used to reference thread local variables. gdb 8 added support
for directly accessing the value as $fs_base, so use that instead. This
works from core dumps as well as live processes, as you don't need to
execute inferior functions.

The patch is required for debugging threads in core dumps, but not
sufficient, as we still need to set $rip and $rsp, and gdb still[1]
doesn't allow this.

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=9370
2019-12-05 11:58:34 +02:00
Piotr Dulikowski
adfa7d7b8d messaging_service: don't move unsigned values in handlers
Performing std::move on integral types is pointless. This commit gets
rid of moves of values of `unsigned` type in rpc handlers.
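A minimal illustration of why the moves are pointless: for an unsigned, std::move degenerates to a copy and the source keeps its value (hypothetical helper, for demonstration only):

```cpp
#include <utility>

// Binding an rvalue reference to an integral and "moving" it is just a
// copy; there is no resource to steal.
inline unsigned take(unsigned&& v) { return v; }

inline bool move_is_copy_for_unsigned() {
    unsigned shard = 7;
    unsigned moved = take(std::move(shard));
    return moved == 7 && shard == 7;  // source unchanged: it was a copy
}
```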
2019-12-05 00:58:31 +01:00
Piotr Dulikowski
77d2ceaeba storage_proxy: handle hints through separate rpc verb 2019-12-05 00:51:52 +01:00
Piotr Dulikowski
2609065090 storage_proxy: move register_mutation handler to local lambda
This refactor makes it possible to reuse the lambda in following
commits.
2019-12-05 00:51:52 +01:00
Piotr Dulikowski
6198ee2735 hh: introduce HINTED_HANDOFF_SEPARATE_CONNECTION feature
The feature introduced by this commit declares that hints can be sent
using the new dedicated RPC verb. Before using the new verb, nodes need
to know if other nodes in the cluster will be able to handle the new
RPC verb.
2019-12-05 00:51:52 +01:00
Piotr Dulikowski
2e802ca650 hh: add HINT_MUTATION verb
Introduce a new verb dedicated for receiving and sending hints:
HINT_MUTATION. It is handled on the streaming connection, which is
separate from the one used for handling mutations sent by coordinator
during a write.

The intent of using a separate connection is to increase fairness while
handling hints and user requests - this way, a situation can be avoided
in which one type of request saturates the connection, negatively
impacting the other one.
2019-12-05 00:51:49 +01:00
Avi Kivity
fd951a36e3 Merge "Let compaction wait on background deletions" from Benny
"
In several cases in distributed testing (dtest) we trigger compaction using nodetool compact assuming that when it is done, it is indeed really done.
However, the way compaction is currently implemented in scylla, it may leave behind some background tasks to delete the old sstables that were compacted.

This commit changes major compaction (triggered via the ss::force_keyspace_compaction api) so it would wait on the background deletes and will return only when they finish.

Fixes #4909

Tests: unit(dev), nodetool_refresh_with_data_perms_test, test_nodetool_snapshot_during_major_compaction
"
2019-12-04 11:18:41 +02:00
Takuya ASADA
c9d8606786 dist/common/scripts/scylla_ntp_setup: relax RHEL version check
We may be able to use the chrony setup script on future versions of
RHEL/CentOS, so it is better to run chrony setup when the RHEL version
is >= 8, not only on 8.

Note that on Fedora it still provides ntp/ntpdate package, so we run
ntp setup on it for now. (same on debian variants)

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20191203192812.5861-1-syuu@scylladb.com>
2019-12-04 10:59:14 +02:00
Juliusz Stasiewicz
430b2ad19d commitlog+region_group: timeout exceptions with names
`segment_manager' now uses a decorated version of `timed_out_error'
with hardcoded name. On the other hand `region_group' uses named
`on_request_expiry' within its `expiring_fifo'.
2019-12-03 19:07:19 +01:00
Avi Kivity
91d3f2afce docs: maintainers.md: fix typo in git push --force-with-lease
Just one lease, not many.

Reported by Piotr Sarna.
2019-12-03 18:17:46 +01:00
Calle Wilund
56a5e0a251 commitlog_replayer: Ensure applied frozen_mutation is safe during apply
Fixes #5211

In 79935df959 replay apply-call was
changed from one with no continuation to one with. But the frozen
mutation arg was still just a lambda local.

Change to use do_with for this case as well.

Message-Id: <20191203162606.1664-1-calle@scylladb.com>
2019-12-03 18:28:01 +02:00
Juliusz Stasiewicz
d043393f52 db+semaphores+tests: mandatory `name' param in reader_concurrency_semaphore
Exception messages contain semaphore's name (provided in ctor).
This affects the queue overflow exception as well as timeout
exception. Also, custom throwing function in ctor was changed
to `prethrow_action', i.e. metrics can still be updated there but
now callers have no control over the type of the exception being
thrown. This affected `restricted_reader_max_queue_length' test.
`reader_concurrency_semaphore'-s docs are updated accordingly.
2019-12-03 15:41:34 +01:00
Amos Kong
e26b396f16 scylla-docker: fix default data_directories in scyllasetup.py (#5399)
Use default data_file_directories if it's not assigned in scylla.yaml

Fixes #5398

Signed-off-by: Amos Kong <amos@scylladb.com>
2019-12-03 13:58:17 +02:00
Rafael Ávila de Espíndola
1cd17887fa build: strip debug when configured with --debuginfo 0
In a build configured with --debuginfo 0 the scylla binary still ends
up with some debug info from the libraries that are statically linked
in.

We should avoid compiling subprojects (including seastar) with debug
info when none is needed, but this at least avoids it showing up in
the binary.

The main motivation for this is that it is confusing to get a binary
with *some* debug info in it.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191127215843.44992-1-espindola@scylladb.com>
2019-12-03 12:41:04 +02:00
Tomasz Grabiec
0a453e5d30 Merge "Use fragmented buffers for collection de/serialization" from Botond
This series refactors the collection de/serialization code to use
fragmented buffers, avoiding the large allocations and the associated
pains when working with large collections. Currently all operations that
involve collections require deserializing them, executing the operation,
then serializing them again to their internal storage format. The
de/serialization operations happen in linearized buffers, which means
that we have to allocate a buffer large enough to hold the *entire*
collection. This can cause immense pressure on the memory allocator,
which, in the face of memory fragmentation, might be unable to serve the
allocation at all. We've seen this causing all sorts of nasty problems,
including but not limited to: failing compactions, failing memtable
flush, OOM crash and etc.

Users are strongly discouraged from using large collections, yet they
are still a fact of life and have been haunting us since forever.

The proper solution for these problems would be to come up with an
in-memory format for collections, however that is a major effort, with a
lot of unknowns. This is something we plan on doing at some point but
until it happens we should make life less painful for those with large
collections.

The goal of this series is to avoid the need to allocate these large
buffers. Serialization now happens into a `bytes_ostream` which
automatically fragments the values internally. Deserialization happens
with `utils::linearizing_input_stream` (introduced by this series), which
linearizes only the individual collection cells, but not the entire
collection.
An important goal of this series was to introduce the least amount of
risk, and hence the least amount of code. This series does not try to
make a revolution and completely revamp and optimize the
de/serialization codepaths. These codepaths have their days numbered so
investing a lot of effort into them is in vain. We can apply incremental
optimizations where we deem it necessary.

Fixes: #5341
2019-12-03 10:31:34 +01:00
fastio
01599ffbae Redis API: Support the syntax of deleting multiple keys in one DEL command, fix the return value of the GET command.
Support deleting multiple keys in one DEL command.
Returning the number of keys actually deleted is still not supported.
Return an empty string to the client for the GET command when the required key does not exist.

Fixes: #5334

Signed-off-by: Peng Jian <pengjian.uestc@gmail.com>
2019-12-03 17:27:40 +08:00
fastio
039b83ad3b Redis API: Rename options related to Redis API, describe them clearly, and remove an unnecessary one.
Rename option redis_transport_port to redis_port, which the redis transport listens on for clients.
Rename option redis_transport_port_ssl to redis_ssl_port, which the redis TLS transport listens on for clients.
Rename option redis_database_count; it sets the redis database count.
Rename option redis_keyspace_options to redis_keyspace_replication_strategy_options, which sets the replication strategy for the redis keyspace.
Remove option enable_redis_protocol, which is unnecessary.

Fixes: #5335

Signed-off-by: Peng Jian <pengjian.uestc@gmail.com>
2019-12-03 17:13:35 +08:00
Nadav Har'El
7b93360c8d Merge: redis: skip processing requests on EOF
Merged pull request https://github.com/scylladb/scylla/pull/5393/ by
Amos Kong:
When I test redis commands with echo and nc, there is a redundant error at the end.
I checked with strace: if the client reads nothing from stdin, it shuts down
the socket, and the redis server reads nothing (0 bytes) from the socket. But
it still tries to process the empty command and returns an error.

$ echo -n -e '*1\r\n$4\r\nping\r\n' |strace nc localhost 6379
| ...
|    read(0, "*1\r\n$4\r\nping\r\n", 8192)   = 14
|    select(5, [4], [4], [], NULL)           = 1 (out [4])
|>>> sendto(4, "*1\r\n$4\r\nping\r\n", 14, 0, NULL, 0) = 14
|    select(5, [0 4], [], [], NULL)          = 1 (in [0])
|    recvfrom(0, 0x7ffe4d5b6c70, 8192, 0, 0x7ffe4d5b6bf0, 0x7ffe4d5b6bec) = -1 ENOTSOCK (Socket operation on non-socket)
|    read(0, "", 8192)                       = 0
|>>> shutdown(4, SHUT_WR)                    = 0
|    select(5, [4], [], [], NULL)            = 1 (in [4])
|    recvfrom(4, "+PONG\r\n-ERR unknown command ''\r\n", 8192, 0, 0x7ffe4d5b6bf0, [0]) = 32
|    write(1, "+PONG\r\n-ERR unknown command ''\r\n", 32+PONG
|    -ERR unknown command ''
|    ) = 32
|    select(5, [4], [], [], NULL)            = 1 (in [4])
|    recvfrom(4, "", 8192, 0, 0x7ffe4d5b6bf0, [0]) = 0
|    close(1)                                = 0
|    close(4)                                = 0

Current result:
  $ echo -n -e '' |nc localhost 6379
  -ERR unknown command ''
  $ echo -n -e '*1\r\n$4\r\nping\r\n' |nc localhost 6379
  +PONG
  -ERR unknown command ''

Expected:
  $ echo -n -e '' |nc localhost 6379
  $ echo -n -e '*1\r\n$4\r\nping\r\n' |nc localhost 6379
  +PONG
2019-12-03 10:40:20 +02:00
Avi Kivity
83feb9ea77 tools: toolchain: update frozen image
Commit 96009881d8 added diffutils to the dependencies via
Seastar's install-dependencies.sh, after it was inadvertently
dropped in 1164ff5329 (update to Fedora 31; diffutils is no
longer brought in as a side effect of something else).

Regenerate the image to include diffutils.

Ref #5401.
2019-12-03 10:36:55 +02:00
Amos Kong
fb9af2a86b redis-test: add test_raw_cmd.py
This patch adds subtests for EOF processing; they read and write the socket
directly using protocol commands.

We can add more tests in the future; tests that go through a Redis client
module would hide some protocol errors.

Signed-off-by: Amos Kong <amos@scylladb.com>
2019-12-03 10:47:56 +08:00
Amos Kong
4fa862adf4 redis: skip processing requests on EOF
When I test redis commands with echo and nc, there is a redundant error at the end.
I checked with strace: if the client reads nothing from stdin, it shuts down
the socket, and the redis server reads nothing (0 bytes) from the socket. But
it still tries to process the empty command and returns an error.

$ echo -n -e '*1\r\n$4\r\nping\r\n' |strace nc localhost 6379
| ...
|    read(0, "*1\r\n$4\r\nping\r\n", 8192)   = 14
|    select(5, [4], [4], [], NULL)           = 1 (out [4])
|>>> sendto(4, "*1\r\n$4\r\nping\r\n", 14, 0, NULL, 0) = 14
|    select(5, [0 4], [], [], NULL)          = 1 (in [0])
|    recvfrom(0, 0x7ffe4d5b6c70, 8192, 0, 0x7ffe4d5b6bf0, 0x7ffe4d5b6bec) = -1 ENOTSOCK (Socket operation on non-socket)
|    read(0, "", 8192)                       = 0
|>>> shutdown(4, SHUT_WR)                    = 0
|    select(5, [4], [], [], NULL)            = 1 (in [4])
|    recvfrom(4, "+PONG\r\n-ERR unknown command ''\r\n", 8192, 0, 0x7ffe4d5b6bf0, [0]) = 32
|    write(1, "+PONG\r\n-ERR unknown command ''\r\n", 32+PONG
|    -ERR unknown command ''
|    ) = 32
|    select(5, [4], [], [], NULL)            = 1 (in [4])
|    recvfrom(4, "", 8192, 0, 0x7ffe4d5b6bf0, [0]) = 0
|    close(1)                                = 0
|    close(4)                                = 0

Current result:
  $ echo -n -e '' |nc localhost 6379
  -ERR unknown command ''
  $ echo -n -e '*1\r\n$4\r\nping\r\n' |nc localhost 6379
  +PONG
  -ERR unknown command ''

Expected:
  $ echo -n -e '' |nc localhost 6379
  $ echo -n -e '*1\r\n$4\r\nping\r\n' |nc localhost 6379
  +PONG

Signed-off-by: Amos Kong <amos@scylladb.com>
2019-12-03 10:47:56 +08:00
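The fix described above can be sketched in Python (illustrative helper names; the real server is C++): a read that returns zero bytes means the peer shut down its side, so the loop stops instead of feeding an empty command to the parser.

```python
# Sketch of the EOF-handling fix (illustrative names, not Scylla's code):
# when read() returns b"", the client closed its side of the connection,
# so we stop instead of processing an empty command and replying
# "-ERR unknown command ''".

def handle_connection(read, process, reply):
    """read() -> bytes, process(cmd) -> bytes reply, reply(bytes) sends it."""
    while True:
        data = read()
        if not data:          # EOF: skip processing, just end the session
            return
        reply(process(data))
```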
Rafael Ávila de Espíndola
bb114de023 dbuild: Fix confusion about relabeling
podman needs to relabel directories in exactly the same cases docker
does. The difference is that podman cannot relabel /tmp.

The reason it was working before is that in practice anyone using
dbuild has already relabeled any directories that need relabeling,
with the exception of /tmp, since it is recreated on every boot.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191201235614.10511-2-espindola@scylladb.com>
2019-12-02 18:38:16 +02:00
Rafael Ávila de Espíndola
867cdbda28 dbuild: Use a temporary directory for /tmp
With this we don't have to use --security-opt label=disable.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191201235614.10511-1-espindola@scylladb.com>
2019-12-02 18:38:14 +02:00
Botond Dénes
1d1f8b0d82 tests: mutation_test: add large collection allocation test
Checking that there are no large allocations when a large collection is
de/serialized.
2019-12-02 17:13:53 +02:00
Avi Kivity
28355af134 docs: add maintainer's handbook (#5396)
This is a list of recipes used by maintainers to maintain
scylla.git.
2019-12-02 15:01:54 +02:00
Calle Wilund
8c6d6254cf cdc: Remove some code from header 2019-12-02 13:00:19 +00:00
Botond Dénes
4c59487502 collection_mutation: don't linearize the buffer on deserialization
Use `utils::linearizing_input_stream` for the deserialization of the
collection. Allows for avoiding the linearization of the entire cell
value, instead only linearizing individual values as they are
deserialized from the buffer.
2019-12-02 10:10:31 +02:00
Botond Dénes
690e9d2b44 utils: introduce linearizing_input_stream
`linearizing_input_stream` allows transparently reading linearized
values from a fragmented buffer. This is done by linearizing on-the-fly
only those read values that happen to be split across multiple
fragments. This reduces the size of the largest allocation from the size
of the entire buffer (when the entire buffer is linearized) to the size
of the largest read value. This is a huge gain when the buffer contains
loads of small objects, and modest gains when the buffer contains few
large objects. But even in the worst case, the size of the largest
allocation will be less than or equal to that of the case where the entire
buffer is linearized.

This stream is planned to be used as glue code between the fragmented
cell value and the collection deserialization code which expects to be
reading linearized values.
2019-12-02 10:10:31 +02:00
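The idea can be illustrated with a small Python sketch (a hypothetical class, not the C++ implementation): reads that fall inside a single fragment allocate nothing proportional to the whole buffer, and only a value that straddles a fragment boundary is copied into a fresh contiguous buffer.

```python
# Illustrative Python analogue of utils::linearizing_input_stream:
# only values split across fragments are linearized, on the fly.

class LinearizingInputStream:
    def __init__(self, fragments):
        self._frags = list(fragments)  # list of bytes chunks
        self._pos = 0                  # offset within the current fragment

    def read(self, n):
        frag = self._frags[0]
        if self._pos + n <= len(frag):           # fully inside one fragment:
            out = frag[self._pos:self._pos + n]  # no extra allocation needed
            self._pos += n
            if self._pos == len(frag):
                self._frags.pop(0)
                self._pos = 0
            return out
        out = bytearray()                        # straddles fragments:
        while n:                                 # linearize only these n bytes
            frag = self._frags[0]
            take = min(n, len(frag) - self._pos)
            out += frag[self._pos:self._pos + take]
            self._pos += take
            n -= take
            if self._pos == len(frag):
                self._frags.pop(0)
                self._pos = 0
        return bytes(out)
```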
Botond Dénes
065d8d37eb tests: random-utils: get_string(): add overload that takes engine parameter 2019-12-02 10:10:31 +02:00
Botond Dénes
2f9307c973 collection_mutation: use a fragmented buffer for serialization
For the serialization `bytes_ostream` is used.
2019-12-02 10:10:31 +02:00
Botond Dénes
fc5b096f73 imr: value_writer::write_to_destination(): don't dereference chunk iterator eagerly
Currently the loop which writes the data from the fragmented origin to
the destination moves to the next chunk eagerly after writing the value
of the current chunk, if the current chunk is exhausted.
This presents a problem when we are writing the last piece of data from
the last chunk, as the chunk will be exhausted and we eagerly attempt to
move to the next chunk, which doesn't exist and dereferencing it will
fail. The solution is to not be eager about moving to the next chunk and
only attempt it if we actually have more data to write and hence expect
more chunks.
2019-12-02 10:10:31 +02:00
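The off-by-one described above can be sketched in Python (illustrative code; the real loop is a C++ IMR writer): the iterator is advanced only when more data remains, since advancing eagerly after the last chunk would step past the end.

```python
# Sketch of the lazy-advance fix: only move to the next chunk when more
# data is still needed. An eager advance after the last chunk would
# raise StopIteration here, mirroring the invalid dereference in C++.

def copy_fragments(chunks, total):
    """Copy `total` bytes out of an iterable of chunks, advancing lazily."""
    it = iter(chunks)
    chunk = next(it)
    offset = 0
    out = bytearray()
    while total:
        take = min(total, len(chunk) - offset)
        out += chunk[offset:offset + take]
        offset += take
        total -= take
        if total and offset == len(chunk):   # only advance if more is needed
            chunk = next(it)
            offset = 0
    return bytes(out)
```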
Botond Dénes
875314fc4b bytes_ostream: make it a FragmentRange
The presence of `const_iterator` seems to be a requirement as well
although it is not part of the concept. But perhaps it is just an
assumption made by code using it.
2019-12-02 10:10:31 +02:00
Botond Dénes
4054ba0c45 serialization: accept any CharOutputIterator
Not just bytes::output_iterator. Allow writing into streams other than
just `bytes`. In fact we should be very careful with writing into
`bytes` as they require potentially large contiguous allocations.

The `write()` method is now templatized also on the type of its first
argument, which now accepts any CharOutputIterator. Due to our poor
use of namespaces, this now collides with `write` defined inside
`db/commitlog/commitlog.cc`. Luckily, the latter doesn't really have to
be templatized on the data type it reads from, and de-templatizing it
resolves the clash.
2019-12-02 10:10:31 +02:00
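A rough Python analogue of the change (the real code is templatized C++; these helper names are made up): the serializer writes through a generic sink with a write() method rather than one concrete buffer type, so callers can pass an in-memory buffer, a file, or a fragmenting stream.

```python
import io
import struct

# Illustrative serializers generic over the output sink, analogous to
# templatizing on CharOutputIterator instead of hardcoding bytes.

def write_u32(out, value):
    out.write(struct.pack("<I", value))   # works with any .write() sink

def write_bytes(out, data):
    write_u32(out, len(data))             # length prefix, then the payload
    out.write(data)
```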
Botond Dénes
07007edab9 bytes_ostream: add output_iterator
To allow it being used for serialization code, which works in terms of
output iterators.
2019-12-02 10:10:31 +02:00
Takuya ASADA
c5a95210fe dist/common/scripts/scylla_setup: list virtio-blk devices correctly on interactive RAID setup
Currently the interactive RAID setup prompt does not list virtio-blk devices for the
following reasons:
 - We fail to match the '-p' option in the 'lsblk --help' output because of misuse
   of a regex function, so list_block_devices() always skips using the lsblk output.
 - We don't check for the existence of /dev/vd* when we skip using lsblk.
 - We mistakenly excluded virtio-blk devices from the 'lsblk -pnr' output using the '-e'
   option, but we actually need them.

To fix the problem we need to use re.search() instead of re.match() to match the
'-p' option in the 'lsblk --help' output, add '/dev/vd*' to the block device list,
and stop passing the '-e 252' option to lsblk, which excludes virtio-blk.

Additionally, it is better to parse the 'TYPE' field of the lsblk output; we should skip
'loop' and 'rom' devices since these are not disk devices.

Fixes #4066

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20191201160143.219456-1-syuu@scylladb.com>
2019-12-01 18:36:48 +02:00
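The regex fix can be demonstrated with a minimal Python sketch (hypothetical helper; the real logic lives in scylla_setup): re.match() anchors at the start of the string, so it never finds the '-p' flag in the middle of the 'lsblk --help' output, while re.search() scans the whole string.

```python
import re

def lsblk_supports_paths(help_output):
    # re.match() would anchor at position 0 and miss the flag mid-string;
    # re.search() finds '-p, --paths' anywhere in the help text.
    return re.search(r'-p,\s*--paths', help_output) is not None
```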
Takuya ASADA
124da83103 dist/common/scripts: use chrony as NTP server on RHEL8/CentOS8
We need to use chrony as NTP server on RHEL8/CentOS8, since it dropped
ntpd/ntpdate.

Fixes #4571

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20191101174032.29171-1-syuu@scylladb.com>
2019-12-01 18:35:03 +02:00
Nadav Har'El
b82417ba27 Merge "alternator: Implement Expected operators LE, GE, and BETWEEN"
Merged pull request https://github.com/scylladb/scylla/pull/5392 from
Dejan Mircevski.

Refs #5034

The patches:
  alternator: Implement LE operator in Expected
  alternator: Implement GE operator in Expected
  alternator: Make cmp diagnostic a value, not funct
  utils: Add operator<< for big_decimal
  alternator: Implement BETWEEN operator in Expected
2019-12-01 16:11:11 +02:00
Nadav Har'El
8614c30bcf Merge "implement echo command"
Merged pull request https://github.com/scylladb/scylla/pull/5387 from
Amos Kong:

This patch implemented the echo command, which returns the string back to the client.

Reference:

    https://redis.io/commands/echo
2019-12-01 10:29:57 +02:00
Amos Kong
49fee4120e redis-test: add test_echo
Signed-off-by: Amos Kong <amos@scylladb.com>
2019-11-30 13:32:00 +08:00
Amos Kong
3e2034f07b redis: implement echo command
This patch implemented the echo command, which returns the string back to the client.

Reference:
- https://redis.io/commands/echo

Signed-off-by: Amos Kong <amos@scylladb.com>
2019-11-30 13:30:35 +08:00
Dejan Mircevski
dcb1b360ba alternator: Implement BETWEEN operator in Expected
Enable existing BETWEEN test, and add some more coverage to it.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-11-29 16:47:21 -05:00
Dejan Mircevski
c43b286f35 utils: Add operator<< for big_decimal
... and remove an existing duplicate from lua.cc.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-11-29 15:32:09 -05:00
Dejan Mircevski
e0d77739cc alternator: Make cmp diagnostic a value, not funct
All check_compare diagnostics are static strings, so there's no need
to call functions to get them.  Instead of a function, make diagnostic
a simple value.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-11-29 15:09:05 -05:00
Dejan Mircevski
65cb84150a alternator: Implement GE operator in Expected
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-11-29 12:29:08 -05:00
Dejan Mircevski
f201f0eaee alternator: Implement LE operator in Expected
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-11-29 11:59:52 -05:00
Avi Kivity
96009881d8 Update seastar submodule
* seastar 8eb6a67a4...166061da3 (3):
  > install-dependencies.sh: add diffutils
  > reactor: replace std::optional (in _network_stack_ready) with compat::optional
  > noncopyable_function: disable -Wuninitialized warning in noncopyable_function_base

Ref #5386.
2019-11-29 12:50:48 +02:00
Tomasz Grabiec
6562c60c86 Merge "test.py: terminate children upon signal" from Kostja
Allows a signal to terminate the outstanding
test tasks, to avoid dangling children.
2019-11-29 12:05:03 +02:00
Pekka Enberg
bb227cf2b4 Merge "Fix default directories in Scylla setup scripts" from Amos
"Fix two problem in scylla_io_setup:

 - Problem 1: paths of default directories are invalid, introduced by
   commit 5ec1915 ("scylla_io_setup: assume default directories under
   /var/lib/scylla").

 - Problem 2: wrong path join, introduced by commit 31ddb21
   ("dist/common/scripts: support nonroot mode on setup scripts").

Fix a problem in scylla_io_setup, scylla_fstrim and scylla_blocktune.py:

  - Fixed default scylla directories when they aren't assigned in
    scylla.yaml"

Fixes #5370

Reviewed-by: Pavel Emelyanov <xemul@scylladb.com>

* 'scylla_io_setup' of git://github.com/amoskong/scylla:
  use parse_scylla_dirs_with_default to get scylla directories
  scylla_io_setup: fix data_file_directories check
  scylla_util: introduce helper to process the default scylla directories
  scylla_util: get workdir by datadir() if it's not assigned in scylla.yaml
  scylla_io_setup: fix path join of default scylla directories
2019-11-29 12:05:03 +02:00
Ultrabug
61f1e6e99c test.py: fix undefined variable 'options' in write_xunit_report() 2019-11-28 19:06:22 +03:00
Ultrabug
5bdc0386c4 test.py: comparison to False should be 'if cond is False:' 2019-11-28 19:06:22 +03:00
Ultrabug
737b1cff5e test.py: use isinstance() for type comparison 2019-11-28 19:06:22 +03:00
Konstantin Osipov
c611325381 test.py: terminate children upon signal
Use asyncio as a more modern way to work with concurrency:
process signals in an event loop and terminate all outstanding
tests before exiting.

Breaking change: this commit requires Python 3.7 or
newer to run this script. The patch adds a version
check and a message to enforce it.
2019-11-28 19:06:22 +03:00
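A minimal sketch of the approach (hypothetical structure, not the actual test.py): signal handlers are installed in the asyncio event loop, and on SIGINT/SIGTERM every outstanding test task is cancelled so no children are left dangling. It requires Python 3.7+, matching the version check the patch adds.

```python
import asyncio
import signal
import sys

if sys.version_info < (3, 7):
    sys.exit("this script requires Python 3.7 or newer")

async def run_tests(tests):
    tasks = [asyncio.create_task(t) for t in tests]
    loop = asyncio.get_running_loop()
    for sig in (signal.SIGINT, signal.SIGTERM):
        # on a signal, cancel every outstanding test task
        loop.add_signal_handler(sig, lambda: [t.cancel() for t in tasks])
    # return_exceptions keeps cancellations from hiding other results
    return await asyncio.gather(*tasks, return_exceptions=True)
```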
Botond Dénes
cf24f4fe30 imr: move documentation to docs/
Where all the other documentation is, and hence where people would be
looking for it.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191128144612.378244-1-bdenes@scylladb.com>
2019-11-28 16:47:52 +02:00
Avi Kivity
36dd0140a8 Update seastar submodule
* seastar 5c25de907a...8eb6a67a4b (1):
  > util/backtrace.hh: add missing print.hh include
2019-11-28 16:47:16 +02:00
Benny Halevy
7aef39e400 tracing: one_session_records: keep local tracing ptr
Similar to trace_state keep shared_ptr<tracing> _local_tracing_ptr
in one_session_records when constructed so it can be used
during shutdown.

Fixes #5243

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-11-28 15:24:10 +01:00
Gleb Natapov
75499896ab client_state: store _user as optional instead of shared_ptr
_user cannot outlive client_state class instance, so there is no point
in holding it in shared_ptr.

Tested: debug test.py and dtest auth_test.py

Message-Id: <20191128131217.26294-5-gleb@scylladb.com>
2019-11-28 15:48:59 +02:00
Gleb Natapov
1538cea043 cql: modification_statement: store _restrictions as optional instead of shared_ptr
_restrictions can be optional since its lifetime is managed by
modification_statement class explicitly.

Message-Id: <20191128131217.26294-4-gleb@scylladb.com>
2019-11-28 15:48:54 +02:00
Gleb Natapov
ce5d6d5eee storage_service: store thrift server as an optional instead of shared_ptr
Only do_stop_rpc_server uses the shared_ptr to prolong server's
lifetime until stop() completes, but do_with() can be used to achieve the
same.

Message-Id: <20191128131217.26294-3-gleb@scylladb.com>
2019-11-28 15:48:51 +02:00
Gleb Natapov
b9b99431a8 storage_service: store cql server as an optional instead of shared_ptr
Only do_stop_native_transport() uses the shared_ptr to prolong server's
lifetime until stop() completes, but do_with() can be used to achieve the
same.

Message-Id: <20191128131217.26294-2-gleb@scylladb.com>
2019-11-28 15:48:47 +02:00
Avi Kivity
2b7e97514a Update seastar submodule
* seastar 6f0ef32514...5c25de907a (7):
  > shared_future: Fix crash when all returned futures time out
Fixes #5322.
  > future: don't create temporaries on get_value().
  > reactor: lower the default stall threshold to 200ms
  > reactor: Simplify network initialization
  > reactor: Replace most std::function with noncopyable_function
  > futures: Avoid extra moves in SEASTAR_TYPE_ERASE_MORE mode
  > inet_address: Make inet_address == operator ignore scope (again)
2019-11-28 14:48:01 +02:00
Juliusz Stasiewicz
fa12394dfe reader_concurrency_semaphore: cosmetic changes
Added line breaks, replaced unused include, included seastarx.hh
instead of `using namespace seastar`.
2019-11-28 13:39:08 +01:00
Nadav Har'El
fde336a882 Merged "5139 minmax bad printing"
Merged pull request https://github.com/scylladb/scylla/pull/5311 from
Juliusz Stasiewicz:

This is a partial solution to #5139 (only for two types) because of the
above and because collections are much harder to do. They are coming in
a separate PR.
2019-11-28 14:06:43 +02:00
Juliusz Stasiewicz
3b9ebca269 tests/cql_query_test: add test for aggregates on inet+time_type
This is a test of the max(), min() and count() system functions on
arguments of the types `net::inet_address` and `time_native_type`.
2019-11-28 11:20:43 +01:00
Juliusz Stasiewicz
9c23d89531 cql3/functions: add missing min/max/count for inet and time type
References #5139. Aggregate functions like max(), when invoked
on `inet_address' and `time_native_type', used to choose the
max(blob)->blob overload, casting the argument and result to
bytes. This is because the appropriate calls to
`aggregate_fcts::make_XXX_function()' were missing. This commit
adds them. Behavior remains the same, but clients now see
user-friendly representations of the aggregate result, not binary.

Comparing inet addresses without inet::operator< is done with a
trick: ADL is bypassed by wrapping the name of std::min/max
and providing an overload of the wrapper for the inet type.
2019-11-28 11:18:31 +01:00
Pavel Emelyanov
8532093c61 cql: The cql_server does not need proxy reference
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20191127153842.4098-1-xemul@scylladb.com>
2019-11-28 10:58:46 +01:00
Amos Kong
e2eb754d03 use parse_scylla_dirs_with_default to get scylla directories
Use default data_file_directories/commitlog_directory if they're not assigned
in scylla.yaml

Signed-off-by: Amos Kong <amos@scylladb.com>
2019-11-28 15:48:14 +08:00
Amos Kong
bd265bda4f scylla_io_setup: fix data_file_directories check
Use default data_file_directories if it's not assigned in scylla.yaml

Signed-off-by: Amos Kong <amos@scylladb.com>
2019-11-28 15:47:56 +08:00
Amos Kong
123c791366 scylla_util: introduce helper to process the default scylla directories
Currently we support assigning the workdir from scylla.yaml, and we hardcode
'/var/lib/scylla' in many places in the setup scripts.

Some setup scripts get the scylla directories by parsing scylla.yaml; this patch
introduces parse_scylla_dirs_with_default(), which adds default values if the
scylla directories aren't assigned in scylla.yaml.

Signed-off-by: Amos Kong <amos@scylladb.com>
2019-11-28 14:54:32 +08:00
Amos Kong
b75061b4bc scylla_util: get workdir by datadir() if it's not assigned in scylla.yaml
Signed-off-by: Amos Kong <amos@scylladb.com>
2019-11-28 14:38:01 +08:00
Amos Kong
ada0e92b85 scylla_io_setup: fix path join of default scylla directories
Currently we check an invalid path for some default scylla directories;
the directories don't exist, so the tuning is always skipped. This is caused by
two problems.

Problem 1: paths of default directories are invalid

Introduced by commit 5ec191536e, we try to tune some scylla default directories
if they exist. But the directory paths we try are wrong.

For example:
- What we check: /var/lib/scylla/commitlog_directory
- Correct one: /var/lib/scylla/commitlog

Problem 2: wrong path join

Introduced by commit 31ddb2145a, default_path might be replaced from
'/var/lib/scylla/' to '/var/lib/scylla'.

Our code tries to check an invalid path that is wrongly joined, e.g.:
'/var/lib/scyllacommitlog'

Signed-off-by: Amos Kong <amos@scylladb.com>
2019-11-28 14:37:58 +08:00
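The join bug can be reproduced in two lines of Python (paths are illustrative): plain string concatenation breaks as soon as the trailing slash disappears, whereas os.path.join handles both forms.

```python
import os.path

default_path = '/var/lib/scylla'                 # trailing slash was stripped
broken = default_path + 'commitlog'              # silently wrong join
fixed = os.path.join(default_path, 'commitlog')  # correct with or without '/'
```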
Amos Kong
d4a26f2ad0 scylla_util: get_scylla_dirs: return default data/commitlog directories if they aren't set (#5358)
The default values of data_file_directories and commitlog_directory were
commented out by commit e0f40ed16a. This causes scylla_util.py:get_scylla_dirs() to
fail in checking the values.

This patch changed get_scylla_dirs() to return default data/commitlog
directories if they aren't set.

Fixes #5358 

Reviewed-by: Pavel Emelyanov <xemul@scylladb.com>
Signed-off-by: Amos Kong <amos@scylladb.com>
2019-11-27 13:52:05 +02:00
Nadav Har'El
cb1ed5eab2 alternator-test: test Query's Limit parameter
Add a test, test_query.py::test_query_limit, to verify that the Limit
parameter correctly limits the number of rows returned by the Query.
This was supposed to already work correctly - but we never had a test for
it. As we hoped, the test passes (on both Alternator and DynamoDB).

Another test, test_query.py::test_query_limit_paging, verifies that
paging can be done with any setting of Limit. We already had tests
for paging of the Scan operation, but not for the Query operation.

Refs #5153

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-11-27 12:27:26 +01:00
Nadav Har'El
c01ca661a0 alternator-test: Select parameter of Query and Scan
This is a comprehensive test for the "Select" parameter of Query and Scan
operations, but only for the base-table case, not index, so another future
patch should add similar tests in test_gsi.py and test_lsi.py as well.

The main use of the Select parameter is to allow returning just the count
of items, instead of their content, but it also has other esoteric options,
all of which we test here.

The test currently succeeds on AWS DynamoDB, demonstrating that the test
is correct, but fails on Alternator because the "Select" parameter is not
yet supported. So the test is marked xfail.

Refs #5058

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-11-27 12:22:33 +01:00
Botond Dénes
9d09f57ba5 scylla-gdb.py: scylla_smp_queues: use lazy initialization
Currently the command tries to read all seastar smp queues in its
initialization code in the constructor. This constructor is run each
time `scylla-gdb.py` is sourced in `gdb` which leads to slowdowns and
sometimes also annoying errors because the sourcing happens in the wrong
context and seastar symbols are not available.
Avoid this by running this initializing code lazily, on the first
invocation.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191127095408.112101-1-bdenes@scylladb.com>
2019-11-27 12:04:57 +01:00
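The pattern can be sketched in Python (class and data are illustrative, not the real gdb command): nothing expensive runs when the object is constructed at source time; the real initialization happens on the first invocation and is cached.

```python
# Illustrative lazy-initialization pattern: the constructor stays cheap so
# that sourcing the script costs nothing; expensive setup runs on first use.

class ScyllaSmpQueues:
    def __init__(self):
        self._queues = None          # nothing expensive at source time

    def invoke(self):
        if self._queues is None:     # first call: do the real initialization
            self._queues = self._load_queues()
        return self._queues

    def _load_queues(self):
        return ['queue-0', 'queue-1']   # stands in for reading gdb symbols
```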
Tomasz Grabiec
87b72dad3e Merge "treewide: add missing const qualifiers" from Pavel Solodovnikov
This patchset adds missing "const" function qualifiers throughout
the Scylla code base, which would make code less error-prone.

The changeset incorporates Kostja's work regarding const qualifiers
in the cql code hierarchy along with a follow-up patch addressing the
review comment of the corresponding patch set (the patch subject is
"cql: propagate const property through prepared statement tree.").
2019-11-27 10:56:20 +01:00
Rafael Ávila de Espíndola
91b43f1f06 dbuild: fix podman with selinux enabled
With this change I am able to run tests using docker-podman. The
option also exists in docker.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191126194101.25221-1-espindola@scylladb.com>
2019-11-26 21:50:56 +02:00
Rafael Ávila de Espíndola
480055d3b5 dbuild: Fix missing docker options
With the recent changes docker was missing a few options. In
particular, it was missing -u.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191126194347.25699-1-espindola@scylladb.com>
2019-11-26 21:45:31 +02:00
Rafael Ávila de Espíndola
c0a2cd70ff lua: fix test with boost 1.66
The boost 1.67 release notes says

Changed maximum supported year from 10000 to 9999 to resolve various issues

So change the test to use a larger number so that we get an exception
with both boost 1.66 and boost 1.67.

Fixes #5344

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191126180327.93545-1-espindola@scylladb.com>
2019-11-26 21:17:15 +02:00
Pavel Solodovnikov
55a1d46133 cql: some more missing const qualifiers
There are several virtual functions in public interfaces named "is_*"
that clearly should be marked as "const", so fix that.
2019-11-26 17:57:51 +03:00
Pavel Solodovnikov
412f1f946a cql: remove "mutable" on _opts in select_statement
_opts initialization can be safely done in the constructor, hence no need to make it mutable.
2019-11-26 17:55:10 +03:00
Piotr Sarna
d90dbd6ab0 Merge "support podman as a replacement to docker" from Avi
Docker on Fedora 31 is flaky, and is not supported at all on RHEL 8.
Podman is a drop-in replacement for docker; this series adds support
for using podman in dbuild.

Apart from actually working on Fedora 31 hosts,
podman is nicer in being more secure and not requiring a daemon.

Fixes #5332
2019-11-26 15:17:49 +01:00
Tomasz Grabiec
5c9fe83615 Merge "Sanitize sub-modules shutting down" from Pavel
As suggested in issue #4586 here is the helper that prints
"shutting down foo" message, then shuts the foo down, then
prints the "[it] was successful" one. In between it catches
the exception (if any) and warns about it in the logs.

By "then" I mean literally then, not the seastar's then() :)

Fixes: #4586
2019-11-26 15:14:22 +02:00
Piotr Sarna
9c5a5a5ac2 treewide: add names to semaphores
By default, semaphore exceptions bring along very little context:
either that a semaphore was broken or that it timed out.
In order to make debugging easier without introducing significant
runtime costs, a notion of named semaphore is added.
A named semaphore is simply a semaphore with statically defined
name, which is present in its errors, bringing valuable context.
A semaphore defined as:

  auto sem = semaphore(0);

will present the following message when it breaks:
"Semaphore broken"
However, a named semaphore:

  auto named_sem = named_semaphore(0, named_semaphore_exception_factory{"io_concurrency_sem"});

will present a message with at least some debugging context:

  "Semaphore broken: io_concurrency_sem"

It's not much, but it would really help in pinpointing bugs
without having to inspect core dumps.

At the same time, it does not incur any costs for normal
semaphore operations (except for its creation), but instead
only uses more CPU in case an error is actually thrown,
which is considered rare and not to be on the hot path.

Refs #4999

Tests: unit(dev), manual: hardcoding a failure in view building code
2019-11-26 15:14:21 +02:00
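A Python sketch of the same idea (hypothetical class; the real named_semaphore is C++): the name is stored once at construction and only touched when an error is raised, so the happy path pays nothing extra.

```python
import threading

# Illustrative named semaphore: errors carry the semaphore's name for
# debugging context, with no extra cost on successful operations.

class NamedSemaphore:
    def __init__(self, value, name):
        self._sem = threading.Semaphore(value)
        self._name = name            # only read when an error is raised

    def acquire(self, timeout):
        if not self._sem.acquire(timeout=timeout):
            raise TimeoutError('Semaphore timed out: ' + self._name)

    def release(self):
        self._sem.release()
```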
Avi Kivity
6fbb724140 conf: remove unsupported options from scylla.yaml (#5299)
These unsupported options do nothing except to confuse users who
try to tune them.

Options removed:

hinted_handoff_throttle_in_kb
max_hints_delivery_threads
batchlog_replay_throttle_in_kb
key_cache_size_in_mb
key_cache_save_period
key_cache_keys_to_save
row_cache_size_in_mb
row_cache_save_period
row_cache_keys_to_save
counter_cache_size_in_mb
counter_cache_save_period
counter_cache_keys_to_save
memory_allocator
saved_caches_directory
concurrent_reads
concurrent_writes
concurrent_counter_writes
file_cache_size_in_mb
index_summary_capacity_in_mb
index_summary_resize_interval_in_minutes
trickle_fsync
trickle_fsync_interval_in_kb
internode_authenticator
native_transport_max_threads
native_transport_max_concurrent_connections
native_transport_max_concurrent_connections_per_ip
rpc_server_type
rpc_min_threads
rpc_max_threads
rpc_send_buff_size_in_bytes
rpc_recv_buff_size_in_bytes
internode_send_buff_size_in_bytes
internode_recv_buff_size_in_bytes
thrift_framed_transport_size_in_mb
concurrent_compactors
compaction_throughput_mb_per_sec
sstable_preemptive_open_interval_in_mb
inter_dc_stream_throughput_outbound_megabits_per_sec
cross_node_timeout
streaming_socket_timeout_in_ms
dynamic_snitch_update_interval_in_ms
dynamic_snitch_reset_interval_in_ms
dynamic_snitch_badness_threshold
request_scheduler
request_scheduler_options
throttle_limit
default_weight
weights
request_scheduler_id
2019-11-26 15:14:21 +02:00
Amos Kong
817f34d1a9 ami: support new aws instance types: c5d, m5d, m5ad, r5d, z1d (#5330)
Currently scylla_io_setup is skipped during scylla_setup, because we didn't
support those new instance types.

I manually executed scylla_io_setup, and the scylla-server started and worked
well.

Let's apply this patch first, then check if there is some new problem in
ami-test.

Signed-off-by: Amos Kong <amos@scylladb.com>
2019-11-26 15:14:21 +02:00
Konstantin Osipov
90346236ac cql: propagate const property through prepared statement tree.
cql_statement is a class representing a prepared statement in Scylla.
It is used concurrently during execution, so it is important that its
change is not changed by execution.

Add the const qualifier to the execution methods family, throughout the
cql hierarchy.

Mark a few places which do mutate prepared statement state during
execution as mutable. While these do not affect production today,
as code ages, they may become a source of latent bugs and should be
moved out of the prepared state or evaluated at prepare eventually:

cf_property_defs::_compaction_strategy_class
list_permissions_statement::_resource
permission_altering_statement::_resource
property_definitions::_properties
select_statement::_opts
2019-11-26 14:18:17 +03:00
Pavel Solodovnikov
2f442f28af treewide: add const qualifiers throughout the code base 2019-11-26 02:24:49 +03:00
Pavel Emelyanov
50a1ededde main: Remove now unused defer-with-log helper
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-11-25 18:47:03 +03:00
Pavel Emelyanov
a0f92d40ee main: Shut down sighup handler with verbose helper
And (!) fix the misprinted variable name.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-11-25 18:47:03 +03:00
Pavel Emelyanov
0719369d83 repair: Remove extra logging on shutdown
The shutdown start/finish messages are already printed in verbose_shutdown()

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-11-25 18:47:03 +03:00
Pavel Emelyanov
2d64fc3a3e main: Shut down database with verbose_shutdown helper
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-11-25 18:47:03 +03:00
Pavel Emelyanov
636c300db5 main: Shut down prometheus with verbose_shutdown()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

---

v2:
- Call stop earlier so that an exception in start/listen does
  not prevent prometheus.stop from being called
2019-11-25 18:47:03 +03:00
Pavel Emelyanov
804b152527 main: Sanitize shutting down callbacks
As suggested in issue #4586, here is a helper that prints a
"shutting down foo" message, then shuts foo down, then prints
"shutting down foo was successful". In between it catches the
exception (if any) and warns about it in the logs.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-11-25 18:45:49 +03:00
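The shape of the helper described in the commit above — log, shut down, log success, and warn instead of propagating on failure — can be sketched in Python (illustrative only; Scylla's actual helper is C++/Seastar, and the names here are hypothetical):

```python
import logging

logger = logging.getLogger("init")

def verbose_shutdown(what, stop_fn):
    """Shut down a subsystem with before/after log messages.

    Any exception raised by stop_fn is caught and logged as a warning
    instead of propagating, so one failing subsystem does not abort
    the rest of the shutdown sequence.
    """
    logger.info("Shutting down %s", what)
    try:
        stop_fn()
    except Exception as e:
        logger.warning("Shutting down %s failed: %s", what, e)
    else:
        logger.info("Shutting down %s was successful", what)

# A failing stop callback must not raise out of the helper.
results = []
verbose_shutdown("prometheus API server", lambda: results.append("stopped"))
verbose_shutdown("broken service", lambda: 1 / 0)
```

The key design point is that the exception is swallowed after logging, so every registered subsystem still gets its chance to stop.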
Nadav Har'El
4160b3630d Merge "Return preimage from CDC only when it's enabled"
Merged pull request https://github.com/scylladb/scylla/pull/5218
from Piotr Jastrzębski:

Users should be able to decide whether they need preimage or not. There is
already an option for that but it's not respected by the implementation.
This PR adds support for this functionality.

Tests: unit(dev).

Individual patches:
  cdc: Don't take storage_proxy as transformer::pre_image_select param
  cdc::append_log_mutations: use do_with instead of shared_ptr
  cdc::append_log_mutations: fix undefined behavior
  cdc: enable preimage in test_pre_image_logging test
  cdc: Return preimage only when it's requested
  cdc: test both enabled and disabled preimage in test_pre_image_logging
2019-11-25 14:32:17 +02:00
Pavel Emelyanov
f6ac969f1e mm: Stop migration manager
Before stopping the db itself, stop the migration service.
It must be stopped before RPC, but RPC is not stopped yet
itself, so we should be safe here.

Here's the tail of the resulting logs:

INFO  2019-11-20 11:22:35,193 [shard 0] init - shutdown migration manager
INFO  2019-11-20 11:22:35,193 [shard 0] migration_manager - stopping migration service
INFO  2019-11-20 11:22:35,193 [shard 1] migration_manager - stopping migration service
INFO  2019-11-20 11:22:35,193 [shard 0] init - Shutdown database started
INFO  2019-11-20 11:22:35,193 [shard 0] init - Shutdown database finished
INFO  2019-11-20 11:22:35,193 [shard 0] init - stopping prometheus API server
INFO  2019-11-20 11:22:35,193 [shard 0] init - Scylla version 666.development-0.20191120.25820980f shutdown complete.

Also -- on drain, stop the mm before the commitlog is stopped.
[Tomasz: mm needs the cl because pulling schema changes from other nodes
involves applying them into the database. So cl/db needs to be
stopped after mm is stopped.]

The drain logs would look like

...
INFO  2019-11-25 11:00:40,562 [shard 0] migration_manager - stopping migration service
INFO  2019-11-25 11:00:40,562 [shard 1] migration_manager - stopping migration service
INFO  2019-11-25 11:00:40,563 [shard 0] storage_service - DRAINED:

and then on stop

...
INFO  2019-11-25 11:00:46,427 [shard 0] init - shutdown migration manager
INFO  2019-11-25 11:00:46,427 [shard 0] init - Shutdown database started
INFO  2019-11-25 11:00:46,427 [shard 0] init - Shutdown database finished
INFO  2019-11-25 11:00:46,427 [shard 0] init - stopping prometheus API server
INFO  2019-11-25 11:00:46,427 [shard 0] init - Scylla version 666.development-0.20191125.3eab6cd54 shutdown complete.

Fixes #5300

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20191125080605.7661-1-xemul@scylladb.com>
2019-11-25 12:59:01 +01:00
Asias He
6ec602ff2c repair: Fix rx_hashes_nr metrics (#5213)
In get_full_row_hashes_with_rpc_stream and
repair_get_row_diff_with_rpc_stream_process_op which were introduced in
the "Repair switch to rpc stream" series, rx_hashes_nr metrics are not
updated correctly.

In the test we have 3 nodes and run repair on node3, we makes sure the
following metrics are correct.

assertEqual(node1_metrics['scylla_repair_tx_hashes_nr'] + node2_metrics['scylla_repair_tx_hashes_nr'],
   	    node3_metrics['scylla_repair_rx_hashes_nr'])
assertEqual(node1_metrics['scylla_repair_rx_hashes_nr'] + node2_metrics['scylla_repair_rx_hashes_nr'],
   	    node3_metrics['scylla_repair_tx_hashes_nr'])
assertEqual(node1_metrics['scylla_repair_tx_row_nr'] + node2_metrics['scylla_repair_tx_row_nr'],
   	    node3_metrics['scylla_repair_rx_row_nr'])
assertEqual(node1_metrics['scylla_repair_rx_row_nr'] + node2_metrics['scylla_repair_rx_row_nr'],
   	    node3_metrics['scylla_repair_tx_row_nr'])
assertEqual(node1_metrics['scylla_repair_tx_row_bytes'] + node2_metrics['scylla_repair_tx_row_bytes'],
   	    node3_metrics['scylla_repair_rx_row_bytes'])
assertEqual(node1_metrics['scylla_repair_rx_row_bytes'] + node2_metrics['scylla_repair_rx_row_bytes'],
            node3_metrics['scylla_repair_tx_row_bytes'])

Tests: repair_additional_test.py:RepairAdditionalTest.repair_almost_synced_3nodes_test
Fixes: #5339
Backports: 3.2
2019-11-25 13:57:37 +02:00
Piotr Jastrzebski
2999cb5576 cdc: test both enabled and disabled preimage in test_pre_image_logging
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-11-25 12:43:39 +01:00
Piotr Jastrzebski
222b94c707 cdc: Return preimage only when it's requested
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-11-25 12:43:39 +01:00
Piotr Jastrzebski
c94a5947b7 cdc: enable preimage in test_pre_image_logging test
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-11-25 12:43:39 +01:00
Piotr Jastrzebski
595c9f9d32 cdc::append_log_mutations: fix undefined behavior
The code was iterating over a collection that was modified
at the same time. Iterators were used for the traversal, and
modifying the collection can invalidate all of its iterators.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-11-25 12:43:39 +01:00
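The bug class fixed above is the classic "modify while iterating" pitfall. A loose Python analogy (not Scylla's actual code; the function names are made up) — in C++ the append can invalidate the live iterators outright, so the fix is to iterate over a snapshot of the original contents:

```python
def add_log_entries(mutations, make_log_entry):
    """Append a derived log entry for each pre-existing mutation.

    Iterating over `mutations` directly while appending to it would
    visit the newly added entries too (and in C++ may invalidate the
    iterators entirely), so iterate over a snapshot instead.
    """
    for m in list(mutations):  # snapshot: safe against the appends below
        mutations.append(make_log_entry(m))
    return mutations

muts = ["m1", "m2"]
add_log_entries(muts, lambda m: "log(" + m + ")")
```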
Piotr Jastrzebski
f0f44f9c51 cdc::append_log_mutations: use do_with instead of shared_ptr
This will not only save some allocations but also improve
code readability.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-11-25 12:43:39 +01:00
Piotr Jastrzebski
b8d9158c21 cdc: Don't take storage_proxy as transformer::pre_image_select param
transformer has access to storage_proxy through its _ctx field.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-11-25 12:43:39 +01:00
Nadav Har'El
3eab6cd549 Merged "toolchain: update to Fedora 31"
Merged pull request https://github.com/scylladb/scylla/pull/5310 from
Avi Kivity:

This is a minor update as gcc and boost versions did not change. A notable
update is patchelf 0.10, which adds support to large binaries.

A few minor issues exposed by the update are fixed in preparatory patches.

Patches:
  dist: rpm: correct systemd post-uninstall scriptlet
  build: force xz compression on rpm binary payload
  tools: toolchain: update to Fedora 31
2019-11-24 13:38:45 +02:00
Tomasz Grabiec
e3d025d014 row_cache: Fix abort on bad_alloc during cache update
Since 90d6c0b, cache will abort when trying to detach partition
entries while they're updated. This should never happen. It can happen
though, when the update fails on bad_alloc, because the cleanup guard
invalidates the cache before it releases partition snapshots (held by
"update" coroutine).

Fix by destroying the coroutine first.

Fixes #5327.

Tests:
  - row_cache_test (dev)

Message-Id: <1574360259-10132-1-git-send-email-tgrabiec@scylladb.com>
2019-11-24 12:06:51 +02:00
Rafael Ávila de Espíndola
8599f8205b rpmbuild: don't use dwz
By default rpm uses dwz to merge the debug info from various
binaries. Unfortunately, it looks like addr2line has not been updated
to handle this:

// This works
$ addr2line  -e build/release/scylla 0x1234567

$ dwz -m build/release/common.debug build/release/scylla.debug build/release/iotune.debug

// now this fails
$ addr2line -e build/release/scylla 0x1234567

I think the issue is

https://sourceware.org/bugzilla/show_bug.cgi?id=23652

Fixes #5289

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191123015734.89331-1-espindola@scylladb.com>
2019-11-24 11:35:29 +02:00
Rafael Ávila de Espíndola
25d5d39b3c reloc: Force using sha1 for build-ids
The default build-id used by lld is xxhash, which is 8 bytes long. rpm
requires build-ids to be at least 16 bytes long
(https://github.com/rpm-software-management/rpm/issues/950). We force
using sha1 for now. That has no impact in gold and bfd since that is
their default. We set it in here instead of configure.py to not slow
down regular builds.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191123020801.89750-1-espindola@scylladb.com>
2019-11-24 11:35:29 +02:00
Rafael Ávila de Espíndola
b5667b9c31 build: don't compress debug info in executables
By default we were compressing debug info only in release
executables. The idea, if I understand it correctly, is that those are
the ones we ship, so we want a more compact binary.

I don't think that was doing anything useful. The compression is just
gzip, so when we ship a .tar.xz, having the debug info compressed
inside the scylla binary probably reduces the overall compression a
bit.

When building an rpm the situation is amusing. As part of the rpm
build process the debug info is decompressed and extracted to an
external file.

Given that most of the link time goes to compressing debug info, it is
probably a good idea to just skip that.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191123022825.102837-1-espindola@scylladb.com>
2019-11-24 11:35:29 +02:00
Tomasz Grabiec
d84859475e Merge "Refactor test.py and cleanup resources" from Kostja
Structure the code to be able to introduce futures.
Apply trivial cleanups.
Switch to asyncio and use it to work with processes and
handle signals. Cleanup all processes upon signal.
2019-11-24 11:35:29 +02:00
Tomasz Grabiec
e166fdfa26 Merge "Optimize LWT query phase" from Vladimir Davydov
This patch implements a simple optimization for LWT: it makes PAXOS
prepare phase query locally and return the current value of the modified
key so that a separate query is not necessary. For more details see
patch 6. Patch 1 fixes a bug in next. Patches 2-5 contain trivial
preparatory refactoring.
2019-11-24 11:35:29 +02:00
Pavel Solodovnikov
4879db70a6 system_keyspace: support timeouts in queries to system.paxos table.
Also introduce supplementary `execute_cql_with_timeout` function.

Remove redundant comment for `execute_cql`.

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20191121214148.57921-1-pa.solodovnikov@scylladb.com>
2019-11-24 11:35:29 +02:00
Vladimir Davydov
bf5f864d80 paxos: piggyback result query on prepare response
Current LWT implementation uses at least three network round trips:
 - first, execute PAXOS prepare phase
 - second, query the current value of the updated key
 - third, propose the change to participating replicas

(there's also learn phase, but we don't wait for it to complete).

The idea behind the optimization implemented by this patch is simple:
piggyback the current value of the updated key on the prepare response
to eliminate one round trip.

To generate less network traffic, only the closest to the coordinator
replica sends data while other participating replicas send digests which
are used to check data consistency.

Note, this patch changes the API of some RPC calls used by PAXOS, but
this should be okay as long as the feature in the early development
stage and marked experimental.

To assess the impact of this optimization on LWT performance, I ran a
simple benchmark that starts a number of concurrent clients each of
which updates its own key (uncontended case) stored in a cluster of
three AWS i3.2xlarge nodes located in the same region (us-west-1) and
measures the aggregate bandwidth and latency. The test uses shard-aware
gocql driver. Here are the results:

                latency 99% (ms)    bandwidth (rq/s)    timeouts (rq/s)
    clients     before  after       before  after       before  after
          1          2      2          626    637            0      0
          5          4      3         2616   2843            0      0
         10          3      3         4493   4767            0      0
         50          7      7        10567  10833            0      0
        100         15     15        12265  12934            0      0
        200         48     30        13593  14317            0      0
        400        185     60        14796  15549            0      0
        600        290     94        14416  15669            0      0
        800        568    118        14077  15820            2      0
       1000        710    118        13088  15830            9      0
       2000       1388    232        13342  15658           85      0
       3000       1110    363        13282  15422          233      0
       4000       1735    454        13387  15385          329      0

That is, this optimization improves max LWT bandwidth by about 15%
and allows running 3-4x more clients while maintaining the same level
of system responsiveness.
2019-11-24 11:35:29 +02:00
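The round-trip saving described in the commit above can be modeled with a toy coordinator/replica pair (a sketch only — real PAXOS involves quorums, ballots, and digests; the names here are hypothetical). With piggybacking, the prepare response already carries the current value, so the separate read round trip disappears:

```python
class Replica:
    def __init__(self, value):
        self.value = value
        self.promised = 0

    def prepare(self, ballot, with_value):
        # Promise not to accept lower ballots; optionally piggyback the
        # current value of the key on the prepare response.
        self.promised = max(self.promised, ballot)
        return (ballot, self.value if with_value else None)

    def read(self):
        return self.value

def lwt_round_trips(replica, piggyback):
    """Count coordinator<->replica round trips through the propose phase."""
    trips = 0
    _, value = replica.prepare(ballot=1, with_value=piggyback)
    trips += 1                 # prepare phase
    if value is None:
        value = replica.read() # old scheme: separate query round trip
        trips += 1
    trips += 1                 # propose phase
    return trips
```

Under this model the old scheme costs three round trips and the piggybacked one costs two, matching the "eliminate one round trip" claim.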
Rafael Ávila de Espíndola
6160b9017d commitlog: make sure a file is closed
If allocate or truncate throws, we have to close the file.

Fixes #4877

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191114174810.49004-1-espindola@scylladb.com>
2019-11-24 11:35:29 +02:00
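The leak pattern fixed above — open succeeds, a later setup step throws, and the file handle is never closed — maps to a familiar cleanup idiom. A Python sketch under illustrative names (Scylla's commitlog is C++/Seastar, so the real fix composes futures rather than using try/except):

```python
import os
import tempfile

def create_segment(path, size):
    """Open a new segment file; if sizing it fails, close (and remove)
    the file instead of leaking the descriptor."""
    f = open(path, "wb")
    try:
        f.truncate(size)  # may raise, e.g. on ENOSPC
    except Exception:
        f.close()
        os.unlink(path)
        raise
    return f

tmp = tempfile.mkdtemp()
seg_path = os.path.join(tmp, "seg.log")
seg = create_segment(seg_path, 1024)
ok = not seg.closed
seg.close()
```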
Vladimir Davydov
3d1d4b018f paxos: remove unnecessary move constructor invocations
invoke_on() guarantees that captured objects won't be destroyed until the
future returned by the invoked function is resolved, so there's no need
to move key, token and proposal when calling paxos_state::*_impl helpers.
2019-11-24 11:35:29 +02:00
Rafael Ávila de Espíndola
cfb079b2c9 types: Refactor duplicated value_cast implementation
The two implementations of value_cast were almost identical.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191120181213.111758-3-espindola@scylladb.com>
2019-11-24 11:35:29 +02:00
Vladimir Davydov
ef2e96c47c storage_proxy: factor out helper to sort endpoints by proximity
We need it for PAXOS.
2019-11-24 11:35:29 +02:00
Nadav Har'El
854e6c8d7b alternator-test: test_health_only_works_for_root_path: remove wrong check
The test_health_only_works_for_root_path test checks that while Alternator's
HTTP server responds to a "GET /" request with success ("health check"), it
should respond to different URLs with failures (page not found).

One of the URLs it tested was "/..", but unfortunately some versions of
Python's HTTP client canonize this request to just a "/", causing the
request to unexpectedly succeed - and the test to fail.

So this patch just drops the "/.." check. A few other nonsense URLs are
attempted by the test - e.g., "/abc".

Fixes #5321

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-11-24 11:35:29 +02:00
Vladimir Davydov
63d4590336 storage_proxy: move digest_algorithm upper
We need it for PAXOS.

Mark it as static inline while we are at it.
2019-11-24 11:35:29 +02:00
Nadav Har'El
43d3e8adaf alternator: make DescribeTable return table schema
One of the fields still missing in DescribeTable's response (Refs #5026)
was the table's schema - KeySchema and AttributeDefinitions.

This patch adds this missing feature, and enables the previously-xfailing
test test_describe_table_schema.

A complication of this patch is that in a table with secondary indexes,
we need to return not just the base table's schema, but also the indexes'
schema. The existing tests did not cover that feature, so we add here
two more tests in test_gsi.py for that.

One of these secondary-index schema tests, test_gsi_2_describe_table_schema,
still fails, because it outputs a range-key which Scylla added to a view
because of its own implementation needs, but wasn't in the user's
definition of the GSI. I opened a separate issue #5320 for that.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-11-24 11:35:29 +02:00
Vladimir Davydov
f5c2a23118 serializer: add reference_wrapper handling
Serialize reference_wrapper<T> as T and make sure is_equivalent<> treats
reference_wrapper<T> wrapped in std::optional<> or std::variant<>, or
std::tuple<> as T.

We need it to avoid copying query::result while serializing
paxos::promise.
2019-11-24 11:35:29 +02:00
Botond Dénes
89f9b89a89 scylla-gdb.py: scylla task_histogram: scan all tasks with -a or -s 0
Currently even if `-a` or `-s 0` is provided, `scylla task_histogram`
will scan only a limited number of pages due to a bug in the scan loop's
stop condition, which triggers a stop once the default sample limit is
reached. Fix the loop by skipping this check when the user wants to scan
all tasks.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191121141706.29476-1-bdenes@scylladb.com>
2019-11-24 11:35:29 +02:00
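The stop-condition bug above can be reduced to a small loop: treat a limit of 0 as "scan everything" and skip the limit check in that case (a simplified model — the real command walks memory pages in gdb; names here are made up):

```python
def sample_tasks(pages, limit):
    """Collect task samples from a sequence of pages.

    limit == 0 means "scan all tasks" (the `-a` / `-s 0` case). The
    original bug applied the sample-limit check unconditionally, so the
    scan stopped early even when the user asked for everything.
    """
    samples = []
    for page in pages:
        for task in page:
            samples.append(task)
            if limit and len(samples) >= limit:  # skipped when limit == 0
                return samples
    return samples

pages = [[1, 2, 3], [4, 5], [6]]
```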
Vladimir Davydov
1452653fbc query_context: fix use after free of timeout_config in execute_cql_with_timeout
timeout_config is used by reference by cql3::query_processor::process(),
see cql3::query_options, so the caller must make sure it doesn't go away.
2019-11-24 11:35:29 +02:00
Avi Kivity
ff7e78330c tools: toolchain: dbuild: work around "podman logs --follow" hang
At least some versions of 'podman logs --follow' hang when the
container eventually exits (also happens with docker on recent
versions). Fortunately, we don't need to use 'podman logs --follow'
and can use the more natural non-detached 'podman run', because
podman does not proxy SIGTERM and instead shuts down the container
when it receives it.

So, to work around the problem, use the same code path in interactive
and non-interactive runs, when podman is in use instead of docker.
2019-11-22 13:59:05 +02:00
Avi Kivity
702834d0e4 tools: dbuild: avoid uid/gid/selinux hacks when using podman
With docker, we went to considerable lengths to ensure that
access to mounted volume was done using the calling user, including
supplementary groups. This avoids root-owned files being left around
after a build, and ensures that access to group-shared files (like
/var/cache/ccache) works as expected.

All of this is unnecessary and broken when using podman. Podman
uses a proxy to access files on behalf of the container, so naturally
all access is done using the calling user's identity. Since it remaps
user and group IDs, assigning the host uid/gid is meaningless. Using
--userns host also breaks, because sudo no longer works.

Fix this by making all the uid/gid/selinux games specific to docker and
ignore them when using podman. To preserve the functionality of tools
that depend on $HOME, set that according to the host setting.
2019-11-22 13:58:29 +02:00
Tomasz Grabiec
9d7f8f18ab database: Avoid OOMing with flush continuations after failed memtable flush
The original fix (10f6b125c8) didn't
take into account the case where there was a failed memtable flush,
but the memtable is not flushable because it's not the latest in
the memtable list. If that happens, it means no other memtable is
flushable either, because otherwise it would be picked due to
evictable_occupancy(). Therefore the right action is to not flush
anything in this case.

Suspected to be observed in #4982. I didn't manage to reproduce after
triggering a failed memtable flush.

Fixes #3717
2019-11-22 12:08:36 +01:00
Tomasz Grabiec
fb28543116 lsa: Introduce operator bool() to occupancy_stats 2019-11-22 12:08:28 +01:00
Tomasz Grabiec
a69fda819c lsa: Expose region_impl::evictable_occupancy in the region class 2019-11-22 12:08:10 +01:00
Avi Kivity
1c181c1b85 tools: dbuild: don't mount duplicate volumes
podman refuses to start with duplicate volumes, which routinely
happen if the toplevel directory is the working directory. Detect
this and avoid the duplicate.
2019-11-22 10:13:30 +02:00
Konstantin Osipov
b8b5834cf1 test.py: simplify message output in run_test() 2019-11-21 23:16:22 +03:00
Konstantin Osipov
90a8f79d7e test.py: use UnitTest class where possible 2019-11-21 23:16:22 +03:00
Konstantin Osipov
8cd8cfc307 test.py: rename harness command line arguments to 'options'
The UnitTest class juggles with the name 'args' quite a bit to
construct the command line for a unit test, so let's keep the harness
command line arguments and the unit test command line arguments
apart by consistently calling the harness command line arguments
'options', and the unit test command line arguments 'args'.

Rename usage() to parse_cmd_line().
2019-11-21 23:16:22 +03:00
Konstantin Osipov
e5d624d055 test.py: consolidate argument handling in UnitTest constructor
Create unique UnitTest objects in find_tests() for each found match,
including repeat, to ensure each test has its own unique id.
This will also be used to store execution state in the test.
2019-11-21 23:16:22 +03:00
Konstantin Osipov
dd60673cef test.py: move --collectd to standard args 2019-11-21 23:16:22 +03:00
Konstantin Osipov
fe12f73d7f test.py: introduce class UnitTest 2019-11-21 23:16:22 +03:00
Konstantin Osipov
bbcdee37f7 test.py: add add_test_list() to find_tests() 2019-11-21 23:16:22 +03:00
Konstantin Osipov
4723afa09c test.py: add long tests with add_test() 2019-11-21 23:16:22 +03:00
Konstantin Osipov
13f1e2abc6 test.py: store the non-default seastar arguments along with definition 2019-11-21 23:16:22 +03:00
Konstantin Osipov
72ef11eb79 test.py: introduce add_test() to find_tests()
To avoid code duplication, and to build upon later.
2019-11-21 23:16:22 +03:00
Konstantin Osipov
b50b24a8a7 test.py: avoid an unnecessary loop in find_tests() 2019-11-21 23:16:22 +03:00
Konstantin Osipov
a5103d0092 test.py: move args.repeat processing to find_tests()
It somewhat stands in the way of using asyncio

This patch also implements a more comprehensive
fix for #5303, since we not only have --repeat, but
run some tests in different configurations, in which
case xml output is also overwritten.
2019-11-21 23:16:22 +03:00
Konstantin Osipov
0f0a49b811 test.py: introduce print_summary() and write_xunit_report()
(One more moving of the code around).
2019-11-21 23:16:22 +03:00
Konstantin Osipov
22166771ef test.py: rename test_to_run tests_to_run 2019-11-21 23:16:22 +03:00
Konstantin Osipov
1d94d9827e test.py: introduce run_all_tests() 2019-11-21 23:16:22 +03:00
Konstantin Osipov
29087e1349 test.py: move out run_test() routine
(Trivial code refactoring.)
2019-11-21 23:16:22 +03:00
Konstantin Osipov
79506fc5ab test.py: introduce find_tests()
Trivial code refactoring.
2019-11-21 23:16:22 +03:00
Konstantin Osipov
a44a1c4124 test.py: remove print_status_succint
(Trivial code cleanup.)
2019-11-21 23:16:22 +03:00
Konstantin Osipov
b9605c1d37 test.py: move mode list evaluation to usage() 2019-11-21 23:16:22 +03:00
Konstantin Osipov
0c4df5a548 test.py: add usage() 2019-11-21 23:16:22 +03:00
Pavel Emelyanov
e0f40ed16a cli: Add the --workdir|-W option
When starting scylla daemon as non-root the initialization fails
because standard /var/lib/scylla is not accessible by regular users.
Making the default dir accessible for user is not very convenient
either, as it will cause conflicts if two or more instances of scylla
are in use.

This problem can be resolved by specifying --commitlog-directory,
--data-file-directories, etc on start, but it's too much typing. I
propose to revive Nadav's --home option that allows to move all the
directories under the same prefix in one go.

Unlike Nadav's approach, the --workdir option doesn't do any tricky
manipulations with existing directories. Instead, as Pekka suggested,
the individual directories are placed under the workdir if and only
if the respective option is NOT provided. Otherwise the directory
configuration is taken as is, regardless of whether it's an absolute
or relative path.

The value substitution is done early on start. Avi suggested that
this is unsafe wrt HUP config re-read and that proper paths must be
resolved on the fly, but this patch doesn't address that yet; here's
why.

First of all, the respective options are MustRestart now and the
substitution is done before HUP handler is installed.

Next, commitlog and data_file values are copied on start, so marking
the options as LiveUpdate won't make any effect.

Finally, the existing named_value::operator() returns a reference,
so returning a calculated (and thus temporary) value is not possible
(from my current understanding, correct me if I'm wrong). Thus if we
want the *_directory() to return a calculated value, all callers of them
must be patched to call something different (e.g. *_directory.get() ?)
which will lead to more confusion and errors.

Changes v3:
 - the option is --workdir back again
 - the existing *directory are only affected if unset
 - default config doesn't have any of these set
 - added the short -W alias

Changes v2:
 - the option is --home now
 - all other paths are changed to be relative

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20191119130059.18066-1-xemul@scylladb.com>
2019-11-21 15:07:39 +02:00
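The "fill in only unset directories" rule described above can be sketched as a small config transform (the option names below are a hypothetical subset chosen for illustration, not the full list the patch touches):

```python
import os

DEFAULT_SUBDIRS = {  # hypothetical subset of the affected options
    "commitlog_directory": "commitlog",
    "data_file_directories": "data",
    "hints_directory": "hints",
}

def apply_workdir(config, workdir):
    """Place each directory under workdir iff the respective option is
    NOT provided; an explicitly configured path (absolute or relative)
    is taken as is."""
    out = dict(config)
    for opt, subdir in DEFAULT_SUBDIRS.items():
        if not out.get(opt):
            out[opt] = os.path.join(workdir, subdir)
    return out

cfg = apply_workdir({"commitlog_directory": "/mnt/cl"}, "/home/user/scylla")
```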
Rafael Ávila de Espíndola
5417c5356b types: Move get_castas_fctn to cql3
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191120181213.111758-9-espindola@scylladb.com>
2019-11-21 12:08:50 +02:00
Rafael Ávila de Espíndola
f06d6df4df types: Simplify casts to string
These now just use the to_string member functions, which makes it
possible to move the code to another file.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191120181213.111758-8-espindola@scylladb.com>
2019-11-21 12:08:50 +02:00
Rafael Ávila de Espíndola
786b1ec364 types: Move json code to its own file
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191120181213.111758-7-espindola@scylladb.com>
2019-11-21 12:08:49 +02:00
Rafael Ávila de Espíndola
af8e207491 types: Avoid using deserialize_value in json code
This makes it independent of internal functions and makes it possible
to move it to another file.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191120181213.111758-6-espindola@scylladb.com>
2019-11-21 12:08:49 +02:00
Rafael Ávila de Espíndola
ed65e2c848 types: Move cql3_kind to the cql3 directory
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191120181213.111758-5-espindola@scylladb.com>
2019-11-21 12:08:47 +02:00
Rafael Ávila de Espíndola
bd560e5520 types: Fix dynamic types of some data_value objects
I found these mismatched types while converting some member functions
to standalone functions, since they have to use the public API that
has more type checks.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191120181213.111758-4-espindola@scylladb.com>
2019-11-21 12:08:46 +02:00
Rafael Ávila de Espíndola
0d953d8a35 types: Add a test for value_cast
We had no tests covering when value_cast throws or when it moves the value.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191120181213.111758-2-espindola@scylladb.com>
2019-11-21 12:08:45 +02:00
Konstantin Osipov
002ff51053 lua: make sure the latest master builds on Debian/Ubuntu
Use pkg-config to search for Lua dependencies rather
than hard-code include and link paths.

Avoid using boost internals, not present in earlier
versions of boost.

Reviewed-by: Rafael Avila de Espindola <espindola@scylladb.com>
Message-Id: <20191120170005.49649-1-kostja@scylladb.com>
2019-11-21 07:57:12 +02:00
Pavel Solodovnikov
d910899d61 configure.py: support multi-threaded linking via gold
Use `-Wl,--threads` flag to enable multi-threaded linking when
using `ld.gold` linker.

Additional compilation test is required because it depends on whether
or not the `gold` linker has been compiled with `--enable-threads` option.

This patch introduces a substantial improvement to the link times of
`scylla` binary in release and debug modes (around 30 percent).

Local setup reports the following numbers with release build for
linking only build/release/scylla:

Single-threaded mode:
        Elapsed (wall clock) time (h:mm:ss or m:ss): 1:09.30
Multi-threaded mode:
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:51.57

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20191120163922.21462-1-pa.solodovnikov@scylladb.com>
2019-11-20 19:28:00 +02:00
Nadav Har'El
89d6d668cb Merge "Redis API in Scylla"
Merged patch series from Peng Jian, adding optionally-enabled Redis API
support to Scylla. This feature is experimental, and partial - the extent
of this support is detailed in docs/redis/redis.md.

Patches:
   Document: add docs/redis/redis.md
   redis: Redis API in Scylla
   Redis API: graft redis module to Scylla
   redis-test: add test cases for Redis API
2019-11-20 16:59:13 +02:00
Piotr Sarna
086e744f8f scripts/find-maintainer: refresh maintainers list
This commit attempts to make the maintainers list up-to-date
to the best of my knowledge, because it got really stale over the time.

Message-Id: <eab6d3f481712907eb83e91ed2b8dbfa0872155f.1574261533.git.sarna@scylladb.com>
2019-11-20 16:56:31 +02:00
Glauber Costa
73aff1fc95 api: export system uptime via REST
This will be useful for tools like nodetool that want to query the uptime
of the system.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190619110850.14206-1-glauber@scylladb.com>
2019-11-20 16:44:11 +02:00
Tomasz Grabiec
9a686ac551 Merge "scylla-gdb: active sstables: support k_l/mc sstable readers" from Benny
Fixes #5277
2019-11-19 23:49:39 +01:00
Avi Kivity
1164ff5329 tools: toolchain: update to Fedora 31
This is a minor update as gcc and boost versions do not change.

glibc-langpack-en no longer gets pulled in by default. As it is required
by some locale use somewhere, it is added to the explicit dependencies.
2019-11-20 00:08:30 +02:00
Avi Kivity
301c835cbf build: force xz compression on rpm binary payload
Fedora 31 switched the default compression to zstd, which isn't readable
by some older rpm distributions (CentOS 7 in particular). Tell it to use
the older xz compression instead, so packages produced on Fedora 31 can
be installed on older distributions.
2019-11-20 00:08:24 +02:00
Avi Kivity
3ebd68ef8a dist: rpm: correct systemd post-uninstall scriptlet
The post-uninstall scriptlet requires a parameter, but older versions
of rpm survived without it. Fedora 31's rpm is more strict, so supply
this parameter.
2019-11-20 00:03:49 +02:00
Peng Jian
e6adddd8ef redis-test: add test cases for Redis API
Signed-off-by: Peng Jian <pengjian.uestc@gmail.com>
Signed-off-by: Amos Kong <amos@scylladb.com>
2019-11-20 04:56:16 +08:00
Peng Jian
f2801feb66 Redis API: graft redis module to Scylla
In this document, the detailed design and implementation of Redis API in
Scylla is provided.

v2: build: work around ragel 7 generated code bug (suggested by Avi)
    Ragel 7 incorrectly emits some unused variables that don't compile.
    As a workaround, sed them away.

Signed-off-by: Peng Jian <pengjian.uestc@gmail.com>
Signed-off-by: Amos Kong <amos@scylladb.com>
2019-11-20 04:55:58 +08:00
Peng Jian
0737d9e84d redis: Redis API in Scylla
Scylla has many advantages and amazing features. If Redis is built on top of
Scylla, it gets those features automatically: it gains great progress
in cluster master management, data persistence, failover and replication.

The benefit to users is that it is easy to use and develop against in their
production environment, while taking advantage of Scylla.

Using Ragel to parse the Redis request, the server obtains the command name
and the parameters from the request, invokes Scylla's internal API to
read and write the data, then replies to the client.

Signed-off-by: Peng Jian, <pengjian.uestc@gmail.com>
2019-11-20 04:55:56 +08:00
Peng Jian
708a42c284 Document: add docs/redis/redis.md
In this document, the detailed design and implementation of Redis API in
Scylla is provided.

Signed-off-by: Peng Jian <pengjian.uestc@gmail.com>
2019-11-20 04:46:33 +08:00
Nadav Har'El
9b9609c65b merge: row_marker: correct row expiry condition
Merged patch set by Piotr Dulikowski:

This change corrects condition on which a row was considered expired by its
TTL.

The logic that decides when a row becomes expired was inconsistent with the
logic that decides if a single cell is expired. A single cell becomes expired
when expiry_timestamp <= now, while a row became expired when
expiry_timestamp < now (notice the strict inequality). For rows inserted
with TTL, this caused non-key cells to expire (change their values to null)
one second before the row disappeared. Now, row expiry logic uses non-strict
inequality.

Fixes #4263,
Fixes #5290.

Tests:

    unit(dev)
    python test described in issue #5290
2019-11-19 18:14:15 +02:00
Amnon Heiman
9df10e2d4b scylla_util.py: Add optional timeout to out function
It is useful to have an option to limit the execution time of a shell
script.

This patch adds an optional timeout parameter; if it is provided, the
command returns a failure when the duration is exceeded.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2019-11-19 17:30:28 +02:00
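The timeout behavior described in the commit above can be sketched like this. This is a hypothetical illustration (the names `out` and its signature are assumptions, not the actual scylla_util.py code), using the standard library's subprocess timeout support:

```python
import subprocess
from typing import Optional

def out(cmd: str, timeout: Optional[float] = None) -> str:
    """Run cmd in a shell and return its stripped stdout.

    Raises subprocess.TimeoutExpired if the command exceeds `timeout`
    seconds, and subprocess.CalledProcessError if it exits non-zero.
    """
    return subprocess.run(cmd, shell=True, check=True, capture_output=True,
                          text=True, timeout=timeout).stdout.strip()

print(out("echo hello", timeout=5))
```

If no timeout is given, the call blocks until the command finishes, which matches the optional-parameter design described above.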
Nadav Har'El
b38c3f1288 Merge "Add separate counters for accesses to system tables"
Merged patch series from Juliusz Stasiewicz:

Welcome to my first PR to Scylla!
The task was intended as a warm-up ("noob") exercise; its description is
here: #4182. Sorry, I also couldn't help it and did some scouting: I edited
the descriptions of some metrics and shortened a few annoyingly long lines of code.
2019-11-19 15:21:56 +02:00
Piotr Dulikowski
9be842d3d8 row_marker: tests for row expiration 2019-11-19 13:45:30 +01:00
Tomasz Grabiec
5e4abd75cc main: Abort on EBADF and ENOTSOCK by default
Those are typically symptoms of use-after-free or memory corruption in
the program. It's better to catch such error sooner than later.

That situation is also dangerous: if a valid descriptor, not the one
intended for the operation, lands under the invalid access, then the
operation may be performed on the wrong file and result in corruption.

Message-Id: <1565206788-31254-1-git-send-email-tgrabiec@scylladb.com>
2019-11-19 13:07:33 +02:00
Piotr Dulikowski
589313a110 row_marker: correct expiration condition
This change corrects condition on which a row was considered expired by
its TTL.

The logic that decides when a row becomes expired was inconsistent with
the logic that decides if a single cell is expired. A single cell
becomes expired when `expiry_timestamp <= now`, while a row became
expired when `expiry_timestamp < now` (notice the strict inequality).
For rows inserted with TTL, this caused non-key cells to expire (change
their values to null) one second before the row disappeared. Now, row
expiry logic uses non-strict inequality.

Fixes: #4263, #5290.

Tests:
- unit(dev)
- python test described in issue #5290
2019-11-19 11:46:59 +01:00
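The inequality change above can be sketched as follows. This is a minimal illustrative model (plain Python, not Scylla's C++ code) of why the strict inequality made cells read as null one second before the row disappeared:

```python
def is_cell_expired(expiry_timestamp: int, now: int) -> bool:
    # Cell-expiry logic: non-strict inequality.
    return expiry_timestamp <= now

def is_row_expired(expiry_timestamp: int, now: int) -> bool:
    # After the fix, row expiry uses the same non-strict inequality.
    # Before the fix this was `expiry_timestamp < now`, so at the exact
    # expiry second the cells were already expired (null) while the row
    # itself still existed.
    return expiry_timestamp <= now

now = 1000
assert is_row_expired(999, now)
assert is_cell_expired(1000, now) == is_row_expired(1000, now)  # consistent at the boundary
assert not is_row_expired(1001, now)
```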
Pekka Enberg
505f2c1008 test.py: Append test repeat cycle to output XML filename
Currently, we overwrite the same XML output file for each test repeat
cycle. This can cause invalid XML to be generated if the XML contents
don't match exactly for every iteration.

Fix the problem by appending the test repeat cycle in the XML filename
as follows:

  $ ./test.py --repeat 3 --name vint_serialization_test --mode dev --jenkins jenkins_test

  $ ls -1 *.xml
  jenkins_test.release.vint_serialization_test.0.boost.xml
  jenkins_test.release.vint_serialization_test.1.boost.xml
  jenkins_test.release.vint_serialization_test.2.boost.xml


Fixes #5303.

Message-Id: <20191119092048.16419-1-penberg@scylladb.com>
2019-11-19 11:30:47 +02:00
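The filename scheme shown in the commit above can be sketched as a small helper. The function name is hypothetical (not the actual test.py code); it only reproduces the `prefix.mode.test.cycle.boost.xml` pattern from the example listing:

```python
def xml_report_name(prefix: str, mode: str, test: str, cycle: int) -> str:
    # Appending the repeat cycle makes each iteration write a distinct
    # XML report instead of overwriting the same file.
    return f"{prefix}.{mode}.{test}.{cycle}.boost.xml"

names = [xml_report_name("jenkins_test", "release", "vint_serialization_test", i)
         for i in range(3)]
assert names[1] == "jenkins_test.release.vint_serialization_test.1.boost.xml"
```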
Rafael Ávila de Espíndola
750adee6e3 lua: fix build with boost 1.67 and older vs fmt
It is not completely clear why the fmt base code fails with boost
1.67, but it is easy to avoid.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191118210540.129603-1-espindola@scylladb.com>
2019-11-19 11:14:00 +02:00
Tomasz Grabiec
ff567649fa Merge "gossip: Limit number of pending gossip ACK and ACK2 messages" from Asias
In a cross-dc large cluster, the receiver node of the gossip SYN message
might be slow to send the gossip ACK message. The ack messages can be
large if the payload of the application state is big, e.g.,
CACHE_HITRATES with a lot of tables. As a result, the unlimited ACK
message can consume unlimited amount of memory which causes OOM
eventually.

To fix, this patch queues the SYN message and handles it later if the
previous ACK message is still being sent. However, we only store the
latest SYN message: since the latest SYN message from the peer carries the
latest information, it is safe to drop the previous SYN message and
keep only the latest one. After this patch, there can be at most 1
pending SYN message and 1 pending ACK message per peer node.
2019-11-18 10:52:38 +01:00
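The "at most one pending SYN per peer" bound described above can be sketched with a tiny state machine. The class and method names are assumptions for illustration, not the actual gossip code; the point is that a newer SYN overwrites the queued one instead of accumulating unbounded memory:

```python
class PeerState:
    def __init__(self):
        self.ack_in_flight = False
        self.pending_syn = None  # at most one queued SYN per peer

    def on_syn(self, syn):
        if self.ack_in_flight:
            # ACK send still in progress: queue this SYN, overwriting any
            # older queued SYN (the latest one carries the latest state).
            self.pending_syn = syn
            return None
        self.ack_in_flight = True
        return "ACK"

    def on_ack_sent(self):
        # ACK done; handle the deferred SYN now, if any.
        self.ack_in_flight = False
        syn, self.pending_syn = self.pending_syn, None
        return syn

peer = PeerState()
assert peer.on_syn("syn1") == "ACK"
peer.on_syn("syn2")
peer.on_syn("syn3")
assert peer.on_ack_sent() == "syn3"  # only the latest queued SYN survives
```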
Benny Halevy
f9e93bba38 sstables: compaction: move cleanup parameter to compaction_descriptor
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191117165806.3234-1-bhalevy@scylladb.com>
2019-11-18 10:52:20 +01:00
Avi Kivity
1fe062aed4 Merge "Add basic UDF support" from Rafael
"

This patch series adds only UDF support, UDA will be in the next patch series.

With this, all CQL types are mapped to Lua. Right now we set up a new
Lua state and copy the values for each argument and return value. This will
be optimized once profiled.

We require --experimental to enable UDF in case there is some change
to the table format.
"

* 'espindola/udf-only-v4' of https://github.com/espindola/scylla: (65 commits)
  Lua: Document the conversions between Lua and CQL
  Lua: Implement decimal subtraction
  Lua: Implement decimal addition
  Lua: Implement support for returning decimal
  Lua: Implement decimal to string conversion
  Lua: Implement decimal to floating point conversion
  Lua: Implement support for decimal arguments
  Lua: Implement support for returning varint
  Lua: Implement support for returning duration
  Lua: Implement support for duration arguments
  Lua: Implement support for returning inet
  Lua: Implement support for inet arguments
  Lua: Implement support for returning time
  Lua: Implement support for time arguments
  Lua: Implement support for returning timeuuid
  Lua: Implement support for returning uuid
  Lua: Implement support for uuid and timeuuid arguments
  Lua: Implement support for returning date
  Lua: Implement support for date arguments
  Lua: Implement support for returning timestamp
  ...
2019-11-17 16:38:19 +02:00
Konstantin Osipov
48f3ca0fcb test.py: use the configured build modes from ninja mode_list
Add mode_list rule to ninja build and use it by default when searching
for tests in test.py.

Now it is no longer necessary to explicitly specify the test mode when
invoking test.py.

(cherry picked from commit a211ff30c7f2de12166d8f6f10d259207b462d4b)
2019-11-17 13:42:10 +01:00
Nadav Har'El
2fb2eb27a2 sstables: allow non-traditional characters in table name
The goal of this patch is to fix issue #5280, a rather serious Alternator
bug, where Scylla fails to restart when an Alternator table has secondary
indexes (LSI or GSI).

Traditionally, Cassandra allows table names to contain only alphanumeric
characters and underscores. However, most of our internal implementation
doesn't actually have this restriction. So Alternator uses the characters
':' and '!' in the table names to mark global and local secondary indexes,
respectively. And this actually works. Or almost...

This patch fixes a problem of listing, during boot, the sstables stored
for tables with such non-traditional names. The sstable listing code
needlessly assumes that the *directory* name, i.e., the CF names, matches
the "\w+" regular expression. When an sstable is found in a directory not
matching such regular expression, the boot fails. But there is no real
reason to require such a strict regular expression. So this patch relaxes
this requirement, and allows Scylla to boot with Alternator's GSI and LSI
tables and their names which include the ":" and "!" characters, and in
fact any other name allowed as a directory name.

Fixes #5280.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191114153811.17386-1-nyh@scylladb.com>
2019-11-17 14:27:47 +02:00
Shlomi Livne
3e873812a4 Document backport queue and procedure (#5282)
This document adds information about how fixes are tracked to be
backported into releases and what is the procedure that is followed to
backport those fixes.

Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
2019-11-17 01:45:24 -08:00
Benny Halevy
c215ad79a9 scylla-gdb: resolve: add startswith parameter
Allow filtering the resolved addresses by a startswith string.

The common use case is for resolving vtable ptrs, when resolving
the output of `find_vptrs` that may be too long for the host
(running gdb) memory size. In this case the number of vtable
ptrs is considerably smaller than the total number of objects
returned by find_ptrs (e.g. 462 vs. 69625 in an OOM core I
examined from scylla --smp=2 --memory=1024M)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-11-17 11:40:54 +02:00
Benny Halevy
2f688dcf08 scylla-gdb.py: find_single_sstable_readers: fix support for sstable_mutation_reader
provide template arguments for k_l and m readers.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-11-17 11:02:05 +02:00
Kamil Braun
a67e887dea sstables: fix sstable file I/O CQL tracing when reading multiple files (#5285)
CQL tracing would only report file I/O involving one sstable, even if
multiple sstables were read from during the query.

Steps to reproduce:

- create a table with NullCompactionStrategy
- insert row, flush memtables
- insert row, flush memtables
- restart Scylla
- tracing on
- select * from table
The trace would only report DMA reads from one of the two sstables.

Kudos to @denesb for catching this.

Related issue: #4908
2019-11-17 00:38:37 -08:00
Tomasz Grabiec
a384d0af76 Merge "A set of cleanups over main() code" from Pavel E.
There are ... signs of massive start/stop code rework in the
main() function. While fixing the sub-module interdependencies
during start/stop I've polished these signs too, so here are the
simplest ones.
2019-11-15 15:25:18 +01:00
Pavel Emelyanov
1dc490c81c tracing: Move register_tracing_keyspace_backend forward decl into proper header
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-11-14 19:59:03 +03:00
Pavel Emelyanov
7e81df71ba main: Shorten developer_mode() evaluation
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-11-14 19:59:03 +03:00
Pavel Emelyanov
1bd68d87fc main: Do not carry pctx all over the code
v2:
- do not use struct initialization extension

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-11-14 19:59:03 +03:00
Pavel Emelyanov
655b6d0d1e main: Hide start_thrift
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-11-14 19:59:03 +03:00
Pavel Emelyanov
26f2b2ce5e main,db: Kill some unused .hh includes
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-11-14 19:59:03 +03:00
Pavel Emelyanov
f5b345604f main: Factor out get_conf_sub
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-11-14 19:59:03 +03:00
Pavel Emelyanov
924d52573d main: Remove unused return_value variable (and capture)
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-11-14 19:59:03 +03:00
Pavel Emelyanov
2195edb819 gitignore: Add tags file
This file is generated by ctags utility for navigation, so it
is not to be tracked by git.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20191031221339.19030-1-xemul@scylladb.com>
2019-11-14 16:50:11 +01:00
Gleb Natapov
e0668f806a lwt: change format of partition key serialization for system.paxos table
Serialize provided partition_key in such a way that the serialized value
will hash to the same token as the original key. This way when system.paxos
table is updated the update is shard local.

Message-Id: <20191114135449.GU10922@scylladb.com>
2019-11-14 15:07:16 +01:00
Avi Kivity
19b665ea6b Merge "Correctly handle null/unset frozen collection/UDT columns in INSERT JSON." from Kamil
"
When using INSERT JSON with frozen collection/UDT columns, if the columns were left unspecified or set to null, the statement would create an empty non-null value for these columns instead of using null values as it should have. For example:

cqlsh:b> create table t (k text primary key, l frozen<list<int>>, m frozen<map<int, int>>, s frozen<set<int>>, u frozen<ut>);
cqlsh:b> insert into t JSON '{"k": "insert_json"}';
cqlsh:b> select * from t;
 k                 | l    | m    | s    | u
-------------------+------+------+------+------
       insert_json |   [] |   {} |   {} |

This PR fixes this.
Resolves #5246 and closes #5270.
"

* 'frozen-json' of https://github.com/kbr-/scylla:
  tests: add null/unset frozen collection/UDT INSERT JSON test
  cql3: correctly handle frozen null/unset collection/UDT columns in INSERT JSON
  cql3: decouple execute from term binding in user_type::setter
2019-11-14 15:29:30 +02:00
Avi Kivity
4544aa0b34 Update seastar submodule
* seastar 75e189c6ba...6f0ef32514 (6):
  > Merge "Add named semaphores" from Piotr
  > parallel_for_each_state: pass rvalue reference to add_future
  > future: Pass rvalue to uninitialized_wrapper::uninitialized_set.
  > dependencies: Add libfmt-dev to debian
  > log: Fix logger behavior when logging both to stdout and syslog.
  > README.md: list Scylla among the projects using Seastar
2019-11-14 15:01:18 +02:00
Juliusz Stasiewicz
1cfa458409 metrics: separate counters for `system' KS accesses
Resolves #4182. Metrics per system tables are accumulated separately,
depending on the origin of query (DB internals vs clients).
2019-11-14 13:14:39 +01:00
Vladimir Davydov
ab42b72c6d cql: fix SERIAL consistency check for batch statements
If CONSISTENCY is set to SERIAL or LOCAL SERIAL, all write requests must
fail according to Cassandra's documentation. However, batched writes
bypass this check. Fix this.
2019-11-14 12:15:39 +01:00
Vladimir Davydov
25aeefd6f3 cql: fix CAS consistency level validation
This patch resurrects Cassandra's code validating a consistency level
for CAS requests. Basically, it makes CAS requests use a special
function instead of validate_for_write to make error messages more
coherent.

Note, we don't need to resurrect requireNetworkTopologyStrategy as
EACH_QUORUM should work just fine for both CAS and non-CAS writes.
Looks like it is just an artefact of a rebase in the Cassandra
repository.
2019-11-14 12:15:39 +01:00
Juliusz Stasiewicz
b1e4d222ed cql3: cosmetics - improved description of metrics 2019-11-14 10:35:42 +01:00
Avi Kivity
cd075e9132 reloc: do not install dependencies when building the relocatable package
The dependencies are provided by the frozen toolchain. If a dependency
is missing, we must update the toolchain rather than rely on build-time
installation, which is not reproducible (as different package versions
are available at different times).

Luckily "dnf install" does not update an already-installed package. Had
that been the case, none of our builds would have been reproducible, since
packages would be updated to the latest version as of the build time rather
than the version selected by the frozen toolchain.

So, to prevent missing packages in the frozen toolchain translating to
an unreproducible build, remove the support for installing dependencies
from reloc/build_reloc.sh. We still parse the --nodeps option in case some
script uses it.

Fixes #5222.

Tests: reloc/build_reloc.sh.
2019-11-14 09:37:14 +02:00
Gleb Natapov
552c56633e storage_proxy: do not release mutation if not all replies were received
MV backpressure code frees the mutation for delayed client replies earlier
to save memory. The commit 2d7c026d6e that
introduced the logic claimed to do it only when all replies are received,
but this is not the case. Fix the code to free only when all replies
are received for real.

Fixes #5242

Message-Id: <20191113142117.GA14484@scylladb.com>
2019-11-13 16:23:19 +02:00
Raphael S. Carvalho
3e70523111 distributed_loader: Release disk space of SSTables deleted by resharding
Resharding is responsible for scheduling the deletion of resharded
sstables, but it was not refreshing the cache of the shards those
sstables belong to, which means cache was incorrectly holding reference
to them even after they were deleted. The consequence is sstables
deleted by resharding not having their disk space freed until cache
is refreshed by a subsequent procedure that triggers it.

Fixes #5261.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20191107193550.7860-1-raphaelsc@scylladb.com>
2019-11-13 16:03:27 +02:00
Avi Kivity
6aed3b7471 Merge "cql: trivial cleanup" from Vova
* 'cql-trivial-cleanup' of ssh://github.com/scylladb/scylla-dev:
  cql: rename modification_statement::_sets_a_collection to _selects_a_collection
  cql: rename _column_conditions to _regular_conditions
  cql: remove unnecessary optional around prefetch_data
2019-11-13 15:12:10 +02:00
Avi Kivity
1cb9f9bdfe Merge "Use a fixed-size bitset for column set" from Kostja
"
Use a fixed-size, rather than a dynamically growing
bitset for column mask. This avoids unnecessary memory
reallocation in the most common case.
"

* 'column_set' of ssh://github.com/scylladb/scylla-dev:
  schema: pre-allocate the bitset of column_set
  schema: introduce schema::all_columns_count()
  schema: rename column_mask to column_set
2019-11-13 15:08:13 +02:00
Tomasz Grabiec
f68e17eb52 Merge "Partition/row hit/miss counters for memtable write operations" from Piotr D.
Adds per-table metrics for counting partition and row reuse
in memtables. New metrics are as follows:
    - memtable_partition_writes - number of write operations performed
          on partitions in memtables,
    - memtable_partition_hits - number of write operations performed
          on partitions that previously existed in a memtable,
    - memtable_row_writes - number of row write operations performed
          in memtables,
    - memtable_row_hits - number of row write operations that overwrote
          rows previously present in a memtable.

Tests: unit(release)
2019-11-13 13:11:51 +01:00
Juliusz Stasiewicz
8318a6720a cql3: error msg w/ arg counts for prepared stmts with wrong arg cnt
Fixes #3748. Very small change: added argument count (expectation vs. reality)
to error msg within `invalid_request_exception'.
2019-11-13 13:43:37 +02:00
Nadav Har'El
ccb9038c69 alternator: Implement Expected operators LT and GT
Merged patch series from Dejan Mircevski. Implements the "LT" and "GT"
operators of the Expected update option (i.e., conditional updates),
and enables the pre-existing tests for them.
2019-11-13 12:07:44 +02:00
Konstantin Osipov
6159c012db schema: pre-allocate the bitset of column_set
The number of columns is usually small, and avoiding
a resize speeds up bit manipulation functions.
2019-11-13 11:41:51 +03:00
Konstantin Osipov
e95d675567 schema: introduce schema::all_columns_count()
schema::all_columns_count() will be used to reserve
memory of the column_set bitmask.
2019-11-13 11:41:42 +03:00
Konstantin Osipov
191acec7ab schema: rename column_mask to column_set
Since it contains a precise set of columns, it's more
accurate to call it a set, not a mask. Besides, the name
column_mask is already used for column options on storage
level.
2019-11-13 11:41:30 +03:00
Kamil Braun
d6446e352e tests: add null/unset frozen collection/UDT INSERT JSON test
When using INSERT JSON with null/unspecified frozen collection/UDT
columns, the columns should be set to null.

See #5270.
2019-11-12 18:24:47 +01:00
Vladimir Davydov
8110178e5d cql: rename modification_statement::_sets_a_collection to _selects_a_collection
This is merely to avoid confusion: we use _sets prefix to indicate that
there are operations over static/regular columns (_sets_static_columns,
_sets_regular_columns), but _sets_a_collection is set for both operations
and conditions. So let's rename it to _selects_a_collection and add some
comments.
2019-11-12 20:15:42 +03:00
Vladimir Davydov
a19192950e cql: rename _column_conditions to _regular_conditions
It's weird that modification_statement has _static_conditions for
conditions on static columns and _column_conditions for conditions on
regular columns, as if conditions on static columns are not column
conditions. Let's rename _column_conditions to _regular_conditions to
avoid confusion.
2019-11-12 20:15:35 +03:00
Konstantin Osipov
0ad0369684 cql: remove unnecessary optional around prefetch_data 2019-11-12 20:15:24 +03:00
Kamil Braun
6c04c5bed5 cql3: correctly handle frozen null/unset collection/UDT columns in INSERT JSON
Before this commit, an empty non-null value was created for
frozen collection/UDT columns when an INSERT JSON statement was executed
with the value left unspecified or set to null.
This was incompatible with Cassandra which inserted a null (dead cell).

Fixes #5270.
2019-11-12 18:05:01 +01:00
Kamil Braun
0ad7d71f31 cql3: decouple execute from term binding in user_type::setter
This commit makes it possible to pass a bound value terminal
directly to the setter.
Continuation of commit bfe3c20035.
2019-11-12 18:02:21 +01:00
Takuya ASADA
614ec6fc35 install.sh: drop --pkg option, use .install file on .deb package
The --pkg option on install.sh was introduced for .deb packaging since it
requires a different install directory for each subpackage.
But we are actually able to use "debian/tmp" as a shared install directory,
and then we can specify the file owner of the package using .install files.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20191030203142.31743-1-syuu@scylladb.com>
2019-11-12 16:50:37 +02:00
Piotr Dulikowski
59fbbb993f memtables: add partition/row hit/miss counters
Adds per-table metrics for counting partition and row reuse
in memtables. New metrics are as follows:
    - memtable_partition_writes - number of write operations performed
          on partitions in memtables,
    - memtable_partition_hits - number of write operations performed
          on partitions that previously existed in a memtable,
    - memtable_row_writes - number of row write operations performed
          in memtables,
    - memtable_row_hits - number of row write operations that overwrote
          rows previously present in a memtable.

Tests: unit(release)
2019-11-12 13:35:41 +01:00
Piotr Dulikowski
48f7b2e4fb table: move out table::stats to table_stats
This change was done in order to be able to forward-declare
the table::stats structure.
2019-11-12 13:35:41 +01:00
Avi Kivity
cf7291462d Merge "cql3/functions: add missing min/max/count functions for ascii type" from Piotr
"
Adds missing overloads of functions count, min, max for type ascii.

Now they work:

cqlsh> CREATE KEYSPACE ks WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
cqlsh> USE ks;
cqlsh:ks> CREATE TABLE test_ascii (id int PRIMARY KEY, value ascii);
cqlsh:ks> INSERT INTO test_ascii (id, value) VALUES (0, 'abcd');
cqlsh:ks> INSERT INTO test_ascii (id, value) VALUES (1, 'efgh');
cqlsh:ks> INSERT INTO test_ascii (id, value) VALUES (2, 'ijkl');
cqlsh:ks> SELECT * FROM test_ascii;

 id | value
----+-------
  1 |  efgh
  0 |  abcd
  2 |  ijkl

(3 rows)
cqlsh:ks> SELECT count(value) FROM test_ascii;

 system.count(value)
---------------------
                   3

(1 rows)
cqlsh:ks> SELECT min(value) FROM test_ascii;

 system.min(value)
-------------------
              abcd

(1 rows)
cqlsh:ks> SELECT max(value) FROM test_ascii;

 system.max(value)
-------------------
              ijkl

(1 rows)
Tests:

unit(release)
cql_group_functions_tests.py (with added check for ascii type)

Fixes #5147.
"

* '5147-fix-min-max-count-for-ascii' of https://github.com/piodul/scylla:
  tests/cql_query_test: add aggregate functions test
  cql3/functions: add missing min/max/count for ascii
2019-11-12 14:15:14 +02:00
Piotr Dulikowski
41cb16a526 tests/cql_query_test: add aggregate functions test
Adds a test for min, max and avg functions for those primitive types for
which those functions are working at the moment.
2019-11-12 13:01:34 +01:00
Piotr Dulikowski
6d78d7cc69 cql3/functions: add missing min/max/count for ascii
Adds missing overloads of functions `count`, `min`, `max` for
type `ascii`. Now they work:

cqlsh> CREATE KEYSPACE ks WITH replication = {'class': 'SimpleStrategy',
'replication_factor': 1};
cqlsh> USE ks;
cqlsh:ks> CREATE TABLE test_ascii (id int PRIMARY KEY, value ascii);
cqlsh:ks> INSERT INTO test_ascii (id, value) VALUES (0, 'abcd');
cqlsh:ks> INSERT INTO test_ascii (id, value) VALUES (1, 'efgh');
cqlsh:ks> INSERT INTO test_ascii (id, value) VALUES (2, 'ijkl');
cqlsh:ks> SELECT * FROM test_ascii;

 id | value
----+-------
  1 |  efgh
  0 |  abcd
  2 |  ijkl

(3 rows)
cqlsh:ks> SELECT count(value) FROM test_ascii;

 system.count(value)
---------------------
                   3

(1 rows)
cqlsh:ks> SELECT min(value) FROM test_ascii;

 system.min(value)
-------------------
              abcd

(1 rows)
cqlsh:ks> SELECT max(value) FROM test_ascii;

 system.max(value)
-------------------
              ijkl

(1 rows)

Tests:
- unit(release)
- cql_group_functions_tests.py (with added check for `ascii` type)

Fixes #5147.
2019-11-12 13:01:34 +01:00
Rafael Ávila de Espíndola
10bcbaf348 Lua: Document the conversions between Lua and CQL
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
6ffddeae5e Lua: Implement decimal subtraction
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
aba8e531d1 Lua: Implement decimal addition
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
bb84eabbb3 Lua: Implement support for returning decimal
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
bc17312a86 Lua: Implement decimal to string conversion
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
e83d5bf375 Lua: Implement decimal to floating point conversion
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
b568bf4f54 Lua: Implement support for decimal arguments
This is just the minimum to pass a value to Lua. Right now you can't
actually do anything with it.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
6c3f050eb4 Lua: Implement support for returning varint
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
dc377abd68 Lua: Implement support for returning duration
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
c3f021d2e4 Lua: Implement support for duration arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
9208b2f498 Lua: Implement support for returning inet
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
64be94ab01 Lua: Implement support for inet arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
faf029d472 Lua: Implement support for returning time
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
772f2a4982 Lua: Implement support for time arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
484f498534 Lua: Implement support for returning timeuuid
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
9c2daf6554 Lua: Implement support for returning uuid
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
ae1a1a4085 Lua: Implement support for uuid and timeuuid arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
f8aeed5beb Lua: Implement support for returning date
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
384effa54b Lua: Implement support for date arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
63bc960152 Lua: Implement support for returning timestamp
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
ee95756f62 Lua: Implement support for timestamp arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
1c6d5507b4 Lua: Implement support for returning counter
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
0d9d53b5da Lua: Implement support for counter arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
74c4e58b6b Lua: Add a test for nested types.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
b226511ce8 Lua: Implement support for returning maps
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
5c8d1a797f Lua: Implement support for map arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
b5b15ce4e6 Lua: Implement support for returning set
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
cf7ba441e4 Lua: Implement support for set arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
02f076be43 Lua: Implement support for returning udt
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
92c8e94d9a Lua: Implement support for udt arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
a7c3f6f297 Lua: Implement support for returning list
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
688736f5ff Lua: Implement support for returning tuple
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
ab5708a711 Lua: Implement support for list and tuple arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
534f29172c Lua: Implement support for returning boolean
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
b03c580493 Lua: Implement support for boolean arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
dcfe397eb6 Lua: Implement support for returning floating point
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
cf4b7ab39a Lua: Implement support for returning blob
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
3d22433cd4 Lua: Implement support for blob arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
dd754fcf01 Lua: Implement support for returning ascii
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
affb1f8efd Lua: Implement support for returning text
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
f8ed347ee7 Lua: Implement support for string arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
0e4f047113 Lua: Implement a visitor for return values
This adds support for all integer types. Followup commits will
implement the missing types.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
34b770e2fb Lua: Push varint as decimal
This makes it substantially simpler to support both varint and
decimal, which will be implemented in a followup patch.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
9b3cab8865 Lua: Implement support for varint to integer conversion
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
5a40264d97 Lua: Implement support for varint arguments
Right now it is not possible to do anything with the value.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
3230b8bd86 Lua: Implement support for floating point arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
9ad2cc2850 Lua: Implement a visitor for arguments
With this we support all simple integer types. Followup patches will
implement the missing types.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
ee1d87a600 Lua: Plug in the interpreter
This adds a wrapper around the Lua interpreter so that function
executions are interruptible and return futures.

With this patch it is possible to write and use simple UDFs that take
and return integer values.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
bc3bba1064 Lua: Add lua.cc and lua.hh skeleton files
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
7015e219ca Lua: Link with liblua
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
61200ebb04 Lua: Add config options
This patch just adds the config options that we will expose for the
lua runtime.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
d9337152f3 Use threads when executing user functions
This adds a requires_thread predicate to functions and propagates that
up until we get to code that already returns futures.

We can then use the predicate to decide if we need to use
seastar::async.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
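The requires_thread dispatch described above can be sketched outside Seastar with standard-library primitives. Everything below is an illustrative stand-in (Seastar's futures and seastar::async work differently): `user_function` and `execute` are hypothetical names showing only the predicate-driven choice between running inline and running in a thread context.

```cpp
#include <cassert>
#include <functional>
#include <future>

// Hypothetical UDF descriptor: requires_thread marks bodies that may block
// and therefore need their own thread context.
struct user_function {
    std::function<int(int)> body;
    bool requires_thread;
};

// Run cheap bodies inline and wrap the result in a ready future; run
// potentially-blocking bodies on a separate thread (std::async here plays
// the role seastar::async plays in the patch).
std::future<int> execute(const user_function& f, int arg) {
    if (f.requires_thread) {
        return std::async(std::launch::async, f.body, arg);
    }
    std::promise<int> p;
    p.set_value(f.body(arg));
    return p.get_future();
}
```

The caller sees a uniform future-returning interface either way, which is the point of propagating the predicate up to code that already returns futures.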
Rafael Ávila de Espíndola
52b48b415c Test that schema digests with UDFs don't change
This refactors test_schema_digest_does_not_change to also test a
schema with user defined functions and user defined aggregates.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
fc72a64c67 Add schema propagation and storage for UDF
With this it is possible to create user defined functions and
aggregates and they are saved to disk and the schema change is
propagated.

It is just not possible to call them yet.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
ce6304d920 UDF: Add a feature and config option to track if udf is enabled
It can only be enabled with --experimental.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:40:47 -08:00
Rafael Ávila de Espíndola
dd17dfcbef Reject "OR REPLACE ... IF NOT EXISTS" in the grammar
The parser now rejects having both OR REPLACE and IF NOT EXISTS in the
same statement.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola
e7e3dab4aa Convert UDF parsing code to c++
For now this just constructs the corresponding c++ classes.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola
5c45f3b573 Update UDF syntax
This updates UDF syntax to the current specification.

In particular, this removes DETERMINISTIC and adds "CALLED ON NULL
INPUT" and "RETURNS NULL ON NULL INPUT".

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola
c75cd5989c transport: Add support for FUNCTION and AGGREGATE to schema_change
While at it, modernize the code a bit and add a test.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola
dac3cf5059 Clear functions between cql_test_env runs
At some point we should make the function list non static, but this
allows us to write tests for now.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola
de1a970b93 cql: convert functions to add, remove and replace functions
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola
33f9d196f9 Add iterator version of functions::find
This avoids allocating a std::vector and is more flexible since the
iterator can be passed to erase.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola
7f9dadee5c Implement functions::type_equals.
Since the types are uniqued we can just use ==.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:19:52 -08:00
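The "types are uniqued, so == suffices" idea can be sketched with a hypothetical interning registry (names here are illustrative, not Scylla's type system): because equal types share one instance, comparing the shared handles compares the types.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical type registry: each distinct type name maps to exactly one
// shared instance, so pointer equality is type equality.
class type_registry {
    std::map<std::string, std::shared_ptr<const std::string>> _types;
public:
    std::shared_ptr<const std::string> get(const std::string& name) {
        auto it = _types.find(name);
        if (it == _types.end()) {
            it = _types.emplace(name, std::make_shared<const std::string>(name)).first;
        }
        return it->second;
    }
};
```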
Rafael Ávila de Espíndola
5cef5a1b38 types: Add a friend visitor over data_value
This is a simple wrapper that allows code that is not in the types
hierarchy to visit a data_value.

Will be used by UDF.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:19:52 -08:00
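A minimal analogue of letting code outside the type hierarchy visit a value, using std::variant in place of Scylla's data_value (all names below are illustrative, not the actual API):

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>
#include <variant>

// Stand-in for data_value: a closed set of alternatives that external code
// can visit without being a friend of any class in the hierarchy.
using value = std::variant<int64_t, double, std::string>;

// An external "visitor": maps each alternative to a CQL-ish type name.
std::string type_name(const value& v) {
    return std::visit([](const auto& x) -> std::string {
        using T = std::decay_t<decltype(x)>;
        if constexpr (std::is_same_v<T, int64_t>) {
            return "bigint";
        } else if constexpr (std::is_same_v<T, double>) {
            return "double";
        } else {
            return "text";
        }
    }, v);
}
```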
Rafael Ávila de Espíndola
9bf9a84e4d types: Move the data_value visitor to a header
It will be used by the UDF implementation.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:19:52 -08:00
Yaron Kaikov
4a9b2a8d96 dist/docker: Add SCYLLA_REPO_URL argument to Dockerfile (#5264)
This change adds a SCYLLA_REPO_URL argument to Dockerfile, which defines
the RPM repository used to install Scylla from.

When building a new Docker image, users can specify the argument by
passing the --build-arg SCYLLA_REPO_URL=<url> option to the docker build
command. If the argument is not specified, the same RPM repository is
used as before, retaining the old default behavior.

We intend to use this in release engineering infrastructure to specify
RPM repositories for nightly builds of release branches (for example,
3.1.x), which are currently only using the stable RPMs.
2019-11-07 09:21:05 +02:00
Pavel Emelyanov
486e3f94d0 deps: Add libunistring-dev to debian
With this, the previous patch to seastar, and (surprisingly) the xenial repo
for scylla-libthrift010-dev and scylla-antlr35-c++-dev, the build on Debian
buster finally passes.

Signed-off-by: Pavel Emelyanov <xemul@scyladb.com>
Message-Id: <CAHTybb-QFyJ7YQW0b6pjhY_xUr-_b1w_O3K1=1FOwrNM55BkLQ@mail.gmail.com>
2019-11-01 09:03:39 +02:00
Dejan Mircevski
859883b31d alternator: Implement GT operator in Expected
Add cmp_gt and use it in check_compare() to handle the GT case.  Also
reactivate GT tests.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-10-31 17:18:22 -04:00
Dejan Mircevski
0f7d837757 alternator: Factor out check_compare()
Code for check_LT(), check_GT(), etc. will be nearly identical, so
factor it out into a single function that takes a comparator object.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-10-31 17:01:29 -04:00
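The factoring can be sketched generically; `check_compare`, `check_LT`, and `check_GT` below are simplified stand-ins for the Alternator helpers (which compare DynamoDB values, not plain ints):

```cpp
#include <cassert>
#include <functional>

// One generic check parameterized by a comparator object, replacing
// near-identical per-operator bodies.
template <typename Cmp>
bool check_compare(int lhs, int rhs, Cmp cmp) {
    return cmp(lhs, rhs);
}

bool check_LT(int lhs, int rhs) { return check_compare(lhs, rhs, std::less<int>{}); }
bool check_GT(int lhs, int rhs) { return check_compare(lhs, rhs, std::greater<int>{}); }
```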
Dejan Mircevski
a47b768959 alternator: Implement LT operator in Expected
Add check_LT() function and reactivate LT tests.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-10-31 16:07:29 -04:00
Dejan Mircevski
ceae3c182f alternator: Overload base64_decode on rjson::value
In 1ca9dc5d47, it was established that the correct way to
base64-decode a JSON value is via string_view, rather than directly
from GetString().

This patch adds a base64_decode(rjson::value) overload, which
automatically uses the correct procedure.  It saves typing, ensures
correctness (fixing one incorrect call found), and will come in handy
for future EXPECTED comparisons.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-10-31 15:56:03 -04:00
Dejan Mircevski
9955f0342f alternator: Make unwrap_number() visible
unwrap_number() is now a public function in serialization.hh instead
of a static function visible only in executor.cc.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-10-31 10:46:30 -04:00
Nadav Har'El
3f859adebd Merge: Fix filtering static columns on empty partitions
Merged patch series from Piotr Sarna:

An otherwise empty partition can still have a valid static column.
Filtering didn't take that fact into account and only filtered
full-fledged rows, which may result in non-matching rows being returned
to the client.

Fixes #5248
2019-10-31 10:50:21 +02:00
Pavel Emelyanov
5fe4757725 docs: The scylla's dpdk config is boolean
The docs say one can pass --disable-dpdk, while that is not so. It is
seastar's configure.py that has the tristate dpdk option; scylla's one can
only be enabled.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Message-Id: <CAHTybb-rxP8DbH-wW4Zf-w89iuCirt6T6-PjZAUfVFj7C5yb=A@mail.gmail.com>
2019-10-31 10:12:17 +02:00
Vladimir Davydov
9ea8114f8c cql: fix CAS metric label
"type" label is already in use for the counter type ("derive", "gauge",
etc). Using the same label for "cas" / "non-cas" overwrites it. Let's
instead call the new label "conditional" and use "yes" / "no" for its
value, as suggested by Kostja.
Message-Id: <3082b16e4d6797f064d58da95fb4e50b59ab795c.1572451480.git.vdavydov@scylladb.com>
2019-10-30 17:14:17 +01:00
Avi Kivity
398c482cd0 Merge "combined reader gallop mode" from Piotr
"
When a single reader contributes a stream of fragments and keeps winning over other readers, mutation_reader_merger will enter gallop mode, in which it is assumed that the reader will keep winning over other readers. Currently, a reader needs to contribute 3 fragments to enter that mode.

In gallop mode, fragments returned by the galloping reader will be compared with the best fragment from _fragment_heap. If it wins, the fragment is directly returned. Otherwise, gallop mode ends and merging is performed as in the general case, which involves heap operations.

In the current implementation, when the end of a partition is encountered while in gallop mode, gallop mode is ended unconditionally.

A microbenchmark was added in order to test performance of the galloping reader optimization. A combining reader that merges results from four other readers is created. Each sub-reader provides a range of 32 clustering rows that is disjoint from others. All sub-readers return rows from the same partition. An improvement can be observed after introducing the galloping reader optimization.

As for other benchmarks from the "combined" group, results are pretty close to the old ones. The only one that seems to have suffered slightly is combined.many_overlapping.

Median times from a single run of perf_mutation_readers.combined: (1s run duration, 5 runs per benchmark, release mode)

test name                            before    after     improvement
one_row                              49.070ns  48.287ns  1.60%
single_active                        61.574us  61.235us  0.55%
many_overlapping                     488.193us 514.977us -5.49%
disjoint_interleaved                 57.462us  57.111us  0.61%
disjoint_ranges                      56.545us  56.006us  0.95%
overlapping_partitions_disjoint_rows 127.039us 80.849us  36.36%
Same results, normalized per mutation fragment:

test name                            before   after    improvement
one_row                              16.36ns  16.10ns  1.60%
single_active                        109.46ns 108.86ns 0.55%
many_overlapping                     216.97ns 228.88ns -5.49%
disjoint_interleaved                 102.15ns 101.53ns 0.61%
disjoint_ranges                      100.52ns 99.57ns  0.95%
overlapping_partitions_disjoint_rows 246.38ns 156.80ns 36.36%
Tested on AMD Ryzen Threadripper 2950X @ 3.5GHz.

Tests: unit(release)
Fixes #3593.
"

* '3593-combined_reader-gallop-mode' of https://github.com/piodul/scylla:
  mutation_reader: gallop mode microbenchmark
  mutation_reader: combined reader gallop tests
  mutation_reader: gallop mode for combined reader
  mutation_reader: refactor prepare_next
2019-10-30 17:34:47 +02:00
Piotr Sarna
dd00470a44 tests: add a test case for filtering on static columns
The test case covers filtering with an empty partition.

Refs #5248
2019-10-30 15:34:10 +01:00
Piotr Sarna
ca6fe598ec cql3: fix filtering on a static column for empty partitions
An otherwise empty partition can still have a valid static column.
Filtering didn't take that fact into account and only filtered
full-fledged rows, which may result in non-matching rows being returned
to the client.

Fixes #5248
2019-10-30 15:31:54 +01:00
Tomasz Grabiec
9da3aec115 Merge "Mutation diff improvements" from Benny
- accept diff_command option
 - standard input support
2019-10-30 13:40:58 +01:00
Tomasz Grabiec
0d9367e08f Merge "Scyllatop: one pass update of multiple metrics" from Benny
Update the previous results dictionary using the update_metrics method.
It calls metric_source.query_list to get a list of results (similar to
discover()), then updates the results dictionary for each line in the response.

New results may be appended depending on the do_append parameter (True by default).

Previously, with prometheus, each metric.update called query_list, resulting in O(n^2) behavior when all metrics were updated, as in the scylla_top dtest, which caused a test timeout when testing the debug build.
(E.g. dtest-debug/216/testReport/scyllatop_test/TestScyllaTop/default_start_test/)
2019-10-30 13:38:39 +01:00
Tomasz Grabiec
b7b0a53b50 Merge "Add metrics for light-weigth transactions" from Vova
This patch set adds metrics useful for analyzing light-weight
transaction performance. The same metrics are available in Cassandra.
2019-10-30 12:09:03 +01:00
Vladimir Davydov
f0075ba845 cql: account cas requests separately
This patch adds "type" label to the following CQL metrics:

  inserts
  updates
  deletes
  batches
  statements_in_batches

The label is set to "cas" for conditional statements and "non-cas" for
unconditional statements.

Note, for a batch to be accounted as CAS, it is enough to have just one
conditional statement. In this case all statements within the batch are
accounted as CAS as well.
2019-10-30 13:44:35 +03:00
Piotr Dulikowski
81883a9f2e mutation_reader: gallop mode microbenchmark
This microbenchmark tests performance of the galloping reader
optimization. A combining reader that merges results from four other
readers is created. Each sub-reader provides a range of 32 clustering
rows that is disjoint from others. All sub-readers return rows from
the same partition. An improvement can be observed after introducing the
galloping reader optimization.

As for other benchmarks from the "combined" group, results are pretty
close to the old ones. The only one that seems to have suffered slightly
is combined.many_overlapping.

Median times from a single run of perf_mutation_readers.combined:
(1s run duration, 5 runs per benchmark, release mode)

test name                            before    after     improvement
one_row                              49.070ns  48.287ns  1.60%
single_active                        61.574us  61.235us  0.55%
many_overlapping                     488.193us 514.977us -5.49%
disjoint_interleaved                 57.462us  57.111us  0.61%
disjoint_ranges                      56.545us  56.006us  0.95%
overlapping_partitions_disjoint_rows 127.039us 80.849us  36.36%

Same results, normalized per mutation fragment:

test name                            before   after    improvement
one_row                              16.36ns  16.10ns  1.60%
single_active                        109.46ns 108.86ns 0.55%
many_overlapping                     216.97ns 228.88ns -5.49%
disjoint_interleaved                 102.15ns 101.53ns 0.61%
disjoint_ranges                      100.52ns 99.57ns  0.95%
overlapping_partitions_disjoint_rows 246.38ns 156.80ns 36.36%

Tested on AMD Ryzen Threadripper 2950X @ 3.5GHz.
2019-10-30 09:51:18 +01:00
Piotr Dulikowski
29d6842db9 mutation_reader: combined reader gallop tests 2019-10-30 09:51:18 +01:00
Piotr Dulikowski
2b4ca0c562 mutation_reader: gallop mode for combined reader
When a single reader contributes a stream of fragments
and keeps winning over other readers, mutation_reader_merger will
enter gallop mode, in which it is assumed that the reader will keep
winning over other readers. Currently, a reader needs to contribute
3 fragments to enter that mode.

In gallop mode, fragments returned by the galloping reader will be
compared with the best fragment from _fragment_heap. If it wins, the
fragment is directly returned. Otherwise, gallop mode ends and
merging is performed as in the general case, which involves heap operations.

In the current implementation, when the end of a partition is encountered
while in gallop mode, the gallop mode is ended unconditionally.

Fixes #3593.
2019-10-30 09:51:18 +01:00
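The heuristic can be sketched as a toy k-way merge of sorted int streams (this is an illustrative sketch, not the actual mutation_reader code): after one source wins three pops in a row, its next elements are compared only against the current heap top and emitted directly while they keep winning, skipping heap operations.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <tuple>
#include <vector>

// Toy gallop-mode merge: sorted sources in, one sorted stream out.
std::vector<int> gallop_merge(const std::vector<std::vector<int>>& srcs) {
    using entry = std::tuple<int, std::size_t, std::size_t>; // value, source, position
    std::priority_queue<entry, std::vector<entry>, std::greater<entry>> heap;
    for (std::size_t i = 0; i < srcs.size(); ++i) {
        if (!srcs[i].empty()) {
            heap.emplace(srcs[i][0], i, 0);
        }
    }
    std::vector<int> out;
    std::size_t last_winner = srcs.size(); // sentinel: no winner yet
    int streak = 0;
    while (!heap.empty()) {
        auto [v, src, pos] = heap.top();
        heap.pop();
        out.push_back(v);
        streak = (src == last_winner) ? streak + 1 : 1;
        last_winner = src;
        std::size_t next = pos + 1;
        if (streak >= 3) {
            // Gallop: emit directly while this source beats the heap top.
            while (next < srcs[src].size() &&
                   (heap.empty() || srcs[src][next] <= std::get<0>(heap.top()))) {
                out.push_back(srcs[src][next]);
                ++next;
            }
        }
        if (next < srcs[src].size()) {
            heap.emplace(srcs[src][next], src, next);
        }
    }
    return out;
}
```

The payoff mirrors the benchmark table above: when one source contributes long disjoint runs, most elements bypass the heap entirely; when sources interleave tightly, the streak never reaches 3 and the merge degrades gracefully to the general heap-based case.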
Piotr Dulikowski
2a46a09e7c mutation_reader: refactor prepare_next
Move out logic responsible for adding readers at partition boundary
into `maybe_add_readers_at_partition_boundary`, and advancing one reader
into `prepare_one`. This will allow reusing this logic outside
`prepare_next`.
2019-10-30 09:49:12 +01:00
Avi Kivity
623071020e commitlog: change variadic stream in read_log_file to future<struct>
Since seastar::streams are based on future/promise, variadic streams
suffer the same fate as variadic futures - deprecation and eventual
removal.

This patch therefore replaces a variadic stream in commitlog::read_log_file()
with a non-variadic stream, via a helper struct.

Tests: unit (dev)
2019-10-29 19:25:12 +01:00
Botond Dénes
271ab750a6 scylla-gdb.py: add replica section to scylla memory
Recently, scylla memory started to go beyond just providing raw stats
about the occupancy of the various memory pools, to additionally
provide an overview of the "usual suspects" that cause memory pressure.
As part of this, 46341bd63f
added a section with the coordinator stats. This patch continues this
trend and adds a replica section, with the "usual suspects":
* read concurrency semaphores
* execution stages
* read/write operations

Example:

    Replica:
      Read Concurrency Semaphores:
        user sstable reads:        0/100, remaining mem:      84347453 B, queued: 0
        streaming sstable reads:   0/ 10, remaining mem:      84347453 B, queued: 0
        system sstable reads:      0/ 10, remaining mem:      84347453 B, queued: 0
      Execution Stages:
        data query stage:
          03 "service_level_sg_0"             4967
             Total                            4967
        mutation query stage:
             Total                            0
        apply stage:
          03 "service_level_sg_0"             12608
          06 "statement"                      3509
             Total                            16117
      Tables - Ongoing Operations:
        pending writes phaser (top 10):
                  2 ks.table1
                  2 Total (all)
        pending reads phaser (top 10):
               3380 ks.table2
                898 ks.table1
                410 ks.table3
                262 ks.table4
                 17 ks.table8
                  2 system_auth.roles
               4969 Total (all)
        pending streams phaser (top 10):
                  0 Total (all)

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191029164817.99865-1-bdenes@scylladb.com>
2019-10-29 18:03:06 +01:00
Vladimir Davydov
e510288b6f api: wire up column_family cas-related statistics 2019-10-29 19:26:18 +03:00
Vladimir Davydov
b75862610e paxos_state: account paxos round latency
This patch adds the following per table stats:

  cas_prepare_latency
  cas_propose_latency
  cas_commit_latency

They are equivalent to CasPropose, CasPrepare, CasCommit metrics exposed
by Cassandra.
2019-10-29 19:26:18 +03:00
Vladimir Davydov
21c3c98e5b api: wire up storage_proxy cas-related statistics 2019-10-29 19:26:18 +03:00
Vladimir Davydov
c27ab87410 storage_proxy: add cas request accounting
This patch implements accounting of Cassandra's metrics related to
lightweight transactions, namely:

  cas_read_latency              transactional read latency (histogram)
  cas_write_latency             transactional write latency (histogram)
  cas_read_timeouts             number of transactional read timeouts
  cas_write_timeouts            number of transactional write timeouts
  cas_read_unavailable          number of transactional read
                                unavailable errors
  cas_write_unavailable         number of transactional write
                                unavailable errors
  cas_read_unfinished_commit    number of transaction commit attempts
                                that occurred on read
  cas_write_unfinished_commit   number of transaction commit attempts
                                that occurred on write
  cas_write_condition_not_met   number of transaction preconditions
                                that did not match current values
  cas_read_contention           how many contended reads were
                                encountered (histogram)
  cas_write_contention          how many contended writes were
                                encountered (histogram)
2019-10-29 19:25:47 +03:00
Vladimir Davydov
967a9e3967 storage_proxy: zap ballot_and_contention
Pass contention by reference to begin_and_repair_paxos(), where it is
incremented on every sleep. Rationale: we want to account the total
number of times query() / cas() had to sleep, either directly or within
begin_and_repair_paxos(), no matter if the function failed or succeeded.
2019-10-29 19:22:18 +03:00
Botond Dénes
49aa8ab8a0 scylla-gdb.py: add compatibility with Scylla 3.0
Even though every Scylla version has its own scylla-gdb.py, because we
don't backport any fixes or improvements, practically we end up always
using master's version when debugging older versions of Scylla too. This
is made harder by the fact that both Scylla's and its dependencies'
(most notably libstdc++ and boost) code is constantly changing
between releases, requiring edits to scylla-gdb.py to make it usable
with past releases.

This patch attempts to make it easier to use scylla-gdb.py with past
releases, more specifically Scylla 3.0. This is achieved by wrapping
problematic lines in a `try: except:` and putting the backward
compatible version in the `except:` clause. These lines have comments
with the version they provide support for, so they can be removed when
said version is not supported anymore.

I did not attempt to provide full coverage, I only fixed up problems
that surfaced when using my favourite commands with 3.0.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191029155737.94456-1-bdenes@scylladb.com>
2019-10-29 17:05:19 +01:00
Botond Dénes
e48f301e95 repair: repair_cf_range(): extract result of local checksum calculation only once
The loop that collects the results of the checksum calculations also logs
any errors. The error logging includes `checksums[0]`, which corresponds
to the checksum calculation on the local node. This violates the
assumption of the code following the loop, which assumes that the future
of `checksums[0]` is intact after the loop terminates. However this is
only true when the checksum calculation is successful and is false when
it fails, as in this case the loop extracts the error and logs it. When
the code after the loop checks again whether said calculation failed, it
will get a false negative and will go ahead and attempt to extract the
value, triggering an assert failure.
Fix by making sure that even in the case of a failed checksum calculation,
the result of `checksums[0]` is extracted only once.

Fixes: #5238
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191029151709.90986-1-bdenes@scylladb.com>
2019-10-29 17:00:37 +01:00
Avi Kivity
60ea29da90 Update seastar submodule
* seastar 2963970f6b...75e189c6ba (7):
  > posix-stack: Do auto-resolve of ipv6 scope iff not set for link-local dests
  > README.md: Add redpanda and smf to 'Projects using Seastar'
  > unix_domain_test: don't assume that at temporary_buffer is null terminated
  > socket_address: Use offsetof instead of null pointer
  > README: add projects using seastar section to readme
  > Adjustments for glibc 2.30 and hwloc 2.0
  > Mark future::failed() as const
2019-10-29 14:34:10 +02:00
Gleb Natapov
0e9df4eaf8 lwt: mark lwt as experimental
We may want to change the paxos table format and the internode protocol,
so hide lwt behind an experimental flag for now.

Message-Id: <20191029102725.GM2866@scylladb.com>
2019-10-29 14:33:48 +02:00
Benny Halevy
79d5fed40b mutation_fragment_stream_validator: validate end of stream in partition_key filter
Currently end of stream validation is done in the destructor,
but the validator may be destructed prematurely, e.g. on
exception, as seen in https://github.com/scylladb/scylla/issues/5215

This patch adds an on_end_of_stream() method explicitly called by
consume_pausable_in_thread.  Also, the respective concepts for
PartitionFilter, MutationFragmentFilter, and a new one for the
on_end_of_stream method were unified as FlattenedConsumerFilter.

Refs #5215

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 506ff40bd447f00158c24859819d4bb06436c996)
2019-10-29 12:35:33 +01:00
Benny Halevy
d5f53bc307 mutation_fragment_stream_validator: validate partition key monotonicity
Fixes #4804

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 736360f823621f7994964fee77f37378ca934c56)
2019-10-29 12:35:33 +01:00
Gleb Natapov
e5e44bfda2 client_state: fix get_timestamp_for_paxos() to always advance a timestamp
Message-Id: <20191029102336.GL2866@scylladb.com>
2019-10-29 13:07:33 +02:00
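The "always advance" behavior named in the title can be sketched with a hypothetical generator (this is not the actual client_state code): each call returns at least one more than the previous result, even if the supplied clock value repeats or goes backwards.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical monotonic timestamp source: never hands out the same value
// twice, clamping to last+1 when the wall clock stalls or regresses. Paxos
// correctness depends on ballots/timestamps strictly advancing.
class timestamp_source {
    int64_t _last = 0;
public:
    int64_t next(int64_t wall_clock_micros) {
        _last = std::max(wall_clock_micros, _last + 1);
        return _last;
    }
};
```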
Tomasz Grabiec
c2a4c915f3 Merge "Fix a few issues with CAS requests" from Vladimir D.
There are a few issues at the CQL layer, because of which the result of
a CAS request execution may differ between Scylla and Cassandra. Mostly,
it happens when static columns are involved. The goal of this patch set
is to fix these issues, thus making Scylla's implementation of CAS yield
the same results as Cassandra's.
2019-10-29 11:50:15 +01:00
Rafael Ávila de Espíndola
c74864447b types: Simplify validate_visitor for strings
We have different types for ascii and utf8, so there is no need for
an extra if.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191024232911.22700-1-espindola@scylladb.com>
2019-10-29 11:02:55 +02:00
Nadav Har'El
d69ab1b588 CDC: (atomic) delta + (non-optional) pre-image data columns
Merged patch series by Calle Wilund, with a few fixes by Piotr Jastrzębski:

Adds delta and pre-image data column writes for the atomic columns in a
cdc-enabled table.

Note that in this patch set it is still unconditional. Adding option support
comes in next set.

Uses code more or less derived from alternator to select the pre-image, using
the raw query interface, so it should add fairly low overhead to query generation.
Pre-image and delta mutations are mixed in with the actual modification
mutations to generate the full cdc log (sans post-image).
2019-10-29 09:39:28 +02:00
Calle Wilund
7db393fe12 cdc_test: Add helper methods + preimage test
Add filtering, sorting etc helpers + simple pre-image test

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-29 07:49:05 +01:00
Vladimir Davydov
65b86d155e cql: add static row to CAS failure result if there are static conditions
Even if no rows match the clustering key restrictions of a conditional
statement with static column conditions, we still must include the
static column value into the CAS failure result set. For example,
the following conditional DELETE statement

  create table t(k int, c int, s int static, v int, primary key(k, c));
  insert into t(k, s) values(1, 1);
  delete v from t where k=1 and c=1 if v=1 and s=1;

must return

  [applied=False, v=null, s=1]

not just

  [applied=False, v=null, s=null]

To fix that, set partition_slice::option::always_return_static_content
for querying rows used for checking conditions so that we have the
static row in update_parameters::prefetch_data even if no regular row
matches clustering column restrictions. Plus modify cas_request::
applies_to() so that it sets is_in_cas_result_set flag for the static
row in case there are static column conditions, but the result set
happens to be empty.

As pointed out by Tomek, there's another reason to set partition_slice::
option::always_return_static_content apart from building a correct
result set on CAS failure. There could be a batch with two statements,
one with clustering key restrictions which select no row, and another
statement with only static column conditions. If we didn't enable this
flag, we wouldn't get a static row even if it exists, and static column
conditions would evaluate as if the static row didn't exist, for
example, the following batch

  create table t(k int, c int, s int static, primary key(k, c));
  insert into t(k, s) values(1, 1);
  begin batch
  insert into t(k, c) values(1, 1) if not exists
  update t set s = 2 where k = 1 if s = 1
  apply batch;

would fail although it clearly must succeed.
2019-10-28 22:30:37 +03:00
Vladimir Davydov
e0b31dd273 query: add flag to return static row on partition with no rows
A SELECT statement that has clustering key restrictions isn't supposed
to return static content if no regular rows matches the restrictions,
see #589. However, for the CAS statement we do need to return static
content on failure so this patch adds a flag that allows the caller to
override this behavior.
2019-10-28 21:50:44 +03:00
Vladimir Davydov
57d284d254 cql: exclude statements not checked by cas from result set
Apart from conditional statements, there may be other reading statements
in a batch, e.g. manipulating lists. We must not include rows fetched
for them into the CAS result set. For instance, the following CAS batch:

  create table t(p int, c int, i int, l list<int>, primary key(p, c));
  insert into t(p, c, i) values(1, 1, 1)
  insert into t(p, c, i, l) values(1, 1, 1, [1, 2, 3])
  begin batch
  update t set i=3 where p=1 and c=1 if i=2
  update t set l=l-[2] where p=1 and c=2
  apply batch;

is supposed to return

  [applied] | p | c | i
  ----------+---+---+---
     False  | 1 | 1 | 1

not

  [applied] | p | c | i
  ----------+---+---+---
     False  | 1 | 1 | 1
     False  | 1 | 2 | 1

To filter out such collateral rows from the result set, let's mark rows
checked by conditional statements with a special flag.
2019-10-28 21:50:43 +03:00
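The filtering described above can be sketched in Python (the actual implementation is C++; the flag name follows `is_in_cas_result_set` from the neighboring commits, and the rest of the names are illustrative):

```python
# Sketch: build a CAS result set from prefetched rows, keeping only rows
# that a conditional statement actually checked. Rows fetched solely for
# non-conditional operations (e.g. list manipulation) never get the flag
# set, so they are excluded.
def build_cas_result_set(prefetched_rows):
    return [row for row in prefetched_rows if row.get("is_in_cas_result_set")]

rows = [
    {"p": 1, "c": 1, "i": 1, "is_in_cas_result_set": True},   # checked by IF i=2
    {"p": 1, "c": 2, "i": 1, "is_in_cas_result_set": False},  # fetched for l=l-[2]
]
```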
Vladimir Davydov
74b9e80e4c cql: fix EXISTS check that applies only to static columns
If a CQL statement only updates static columns, i.e. has no clustering
key restrictions, we still fetch a regular row so that we can check it
against EXISTS condition. In this case we must be especially careful: we
can't simply pass the row to modification_statement::applies_to, because
it may turn out that the row has no static columns set, i.e. there is
in fact no static row in the partition. So we filter out such rows without
static columns right in cas_request::applies_to before passing them
further to modification_statement::applies_to.

Example:

  create table t(p int, c int, s int static, primary key(p, c));
  insert into t(p, c) values(1, 1);
  insert into t(p, s) values(1, 1) if not exists;

The conditional statement must succeed in this case.
2019-10-28 21:49:37 +03:00
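The check described in this commit can be sketched as follows (illustrative Python, not the actual C++ code in cas_request::applies_to; names are hypothetical):

```python
# Sketch: before evaluating an EXISTS condition against the static row, drop
# fetched rows that have no static column set -- such a row only proves that
# a regular row exists, not that a static row does.
def has_static_row(fetched_rows, static_columns):
    return any(
        any(row.get(col) is not None for col in static_columns)
        for row in fetched_rows
    )

# After "insert into t(p, c) values(1, 1)", the fetched row has no static
# column set, so "if not exists" on the static row holds and the conditional
# insert of the static column succeeds.
rows = [{"p": 1, "c": 1, "s": None}]
```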
Vladimir Davydov
8fbf344f03 cql: ignore clustering key if statement checks only static columns
In case a CQL statement has only static columns conditions, we must
ignore clustering key restrictions.

Example:

  create table t(p int, c int, s int static, v int, primary key(p, c));
  insert into t(p, s) values(1, 1);
  update t set v=1 where p=1 and c=1 if s=1;

This conditional statement must successfully insert row (p=1, c=1, v=1)
into the table even though there's no regular row with p=1 and c=1 in
the table before it's executed, because the statement condition only
applies to the static column s, which exists and matches.
2019-10-28 21:13:19 +03:00
Vladimir Davydov
54cf903bb2 cql: differentiate static from regular EXISTS conditions
If a modification statement doesn't have a clustering column restriction
while the table has static columns, then EXISTS condition just needs to
check if there's a static row in the partition, i.e. it doesn't need to
select any regular rows. Let's treat such an EXISTS condition like a static
column condition so that we can ignore its clustering key range while
checking CAS conditions.
2019-10-28 21:13:05 +03:00
Vladimir Davydov
934a87999f cql: turn prefetch_data::row into struct
This will allow us to add helper methods and store extra info in each
row. For example, we can add a method for checking if a row has static
columns. Also, to build CAS result set, we need to differentiate rows
fetched to check conditions from those fetched for reading operations.
Using a struct as the row container will allow us to store this information in
each prefetched row.
2019-10-28 21:12:52 +03:00
Vladimir Davydov
bdd62b8bc3 cql: remove static column check from create_clustering_ranges
The check is pointless, because we check exactly the same while
preparing the statement, see process_where_clause() method of
modification_statement.
2019-10-28 21:12:43 +03:00
Vladimir Davydov
a8ddbffa75 cql: fix applies_only_to_static_columns check
Currently, we set _sets_regular_columns/_sets_static_columns flags when
adding regular/static conditions to modification_statement. We use them
in applies_only_to_static_columns() function that returns true iff
_sets_static_columns is set and _sets_regular_columns is clear. We
assume that if this function returns true then the statement only deals
with static columns and so must not have clustering key restrictions.
Usually, that's true, but there's one exception: a DELETE FROM ...
statement that deletes whole rows. Technically, this statement doesn't
have any column operations, i.e. _sets_regular_columns flag is clear.
So if such a statement happens to have a static condition, we will
assume that it only applies to static columns and mistakenly raise an
error.

Example:

  create table t(k int, c int, s int static, v int, primary key(k, c));
  delete from t where k=1 and c=1 if s=1;

To fix this, let's not set the above mentioned flags when adding
conditions and instead check if _column_conditions array is empty in
applies_only_to_static_columns().
2019-10-28 21:12:36 +03:00
Vladimir Davydov
fbb11dac11 cql: set conditions before processing where clause
modification_statement::process_where_clause() assumes that both
operations and conditions have been added to the statement when it's
called: it uses this information to raise an error in case the statement
restrictions are incompatible with operations or conditions. Currently,
operations are set before this function is called, but not conditions.
This results in an "Invalid restrictions on clustering columns since
the {} statement modifies only static columns" error while trying to
execute the following statements:

  create table t(k int, c int, s int static, v int, primary key(k, c));
  delete s from t where k=1 and c=1 if v=1;
  update t set s=1 where k=1 and c=1 if v=1;

Fix this by always initializing conditions before processing WHERE
clause.
2019-10-28 21:12:22 +03:00
Botond Dénes
edc1750297 scylla-gdb.py: introduce scylla smp-queues
Print a histogram of the number of async work items in the shard's
outgoing smp queues.
Example:

    (gdb) scylla smp-queues
        10747 17 ->  3 ++++++++++++++++++++++++++++++++++++++++
          721 17 -> 19 ++
          247 17 -> 20 +
          233 17 -> 10 +
          210 17 -> 14 +
          205 17 ->  4 +
          204 17 ->  5 +
          198 17 -> 16 +
          197 17 ->  6 +
          189 17 -> 11 +
          181 17 ->  1 +
          179 17 -> 13 +
          176 17 ->  2 +
          173 17 ->  0 +
          163 17 ->  8 +
            1 17 ->  9 +

Useful for identifying the target shard, when `scylla task_histogram`
indicates a high number of async work items.

To produce the histogram the command goes over all virtual objects in
memory and identifies the source and target queues of each
`seastar::smp_message_queue::async_work_item` object. In practice, the
source queue will always be that of the current shard. As this scales
with the number of virtual objects in memory, it can take some time to
run. An alternative implementation would be to instead read the actual
smp queues, but the code of that is scary so I went for the simpler and
more reliable solution.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191028132456.37796-1-bdenes@scylladb.com>
2019-10-28 15:42:55 +02:00
Tomasz Grabiec
3b37027598 Merge "lwt: implement basic lightweight transactions support" from Kostja
This patch set introduces light-weight transactions support to
ScyllaDB. It is a subset of the full series, which adds
basic LWT support and which has been reviewed thus far.
2019-10-28 11:45:28 +01:00
Tomasz Grabiec
f745819ed7 Merge "lwt: paxos protocol implementation" from Gleb
This is the Paxos implementation for LWT. LWT itself is not included in the
patch, so the code is essentially not wired up yet (except for the read path).
2019-10-28 11:29:40 +01:00
Avi Kivity
f8ba96efcf Merge "test_udt_mutations fixes" from Benny
"
mutation_test/test_udt_mutations kept failing on my machine and I tracked it down to the 3rd patch in this series (use int64_t constants for long_type). While at it, this series also fixes a comment and the end iterator in BOOST_REQUIRE(std::all_of(...))

mutation_test: test_udt_mutations: fixup udt comment
mutation_test: test_udt_mutations: fix end iterator in call to std::all_of
mutation_test: test_udt_mutations: use int64_t constants for long_type

Test: mutation_test(dev, debug)
"

* 'test_udt_mutations-fixes' of https://github.com/bhalevy/scylla:
  mutation_test: test_udt_mutations: use int64_t constants for long_type
  mutation_test: test_udt_mutations: fix end iterator in call to std::all_of
  mutation_test: test_udt_mutations: fixup udt comment
2019-10-28 10:43:52 +02:00
Calle Wilund
36328acf60 cql_assertions: Change signature to accept sstring 2019-10-28 06:16:12 +01:00
Calle Wilund
7d98f735ee cdc: Add static columns to data/preimage mutations
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-28 06:16:12 +01:00
Calle Wilund
19bba5608a cdc: Create and perform a pre-image select for mutations
As well as generate pre-image rows in the resulting log mutation

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-28 06:16:12 +01:00
Calle Wilund
d4ee1938c7 cdc: Add modification record for regular atomic values in mutations
Fills in the data columns for regular columns iff they are
atomic (not unfrozen collections)
2019-10-28 06:16:12 +01:00
Calle Wilund
3fdcbd9dff cdc: Set row op in log
Adds the actual operation (partition delete, range delete, update) to the
CDC log
2019-10-28 06:16:12 +01:00
Calle Wilund
8a6b72f47e cdc: Add pre-image select generator method
Based on a mutation, creates a pre-image select operation.

Note: this uses a raw proxy query to shortcut parsing etc., instead of
trying to cache the generated query. The hypothesis is that this is
faster.

The routine assumes all rows in a mutation touch the same static/regular
columns. If this is not always true, it will need additional
calculations.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-28 06:16:12 +01:00
Calle Wilund
d74f32b07a cql3::untyped_result_set: Add constructor from cql3::result_set
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-28 06:16:12 +01:00
Calle Wilund
3ed7a9dd69 cql3::untyped_result_set: Add view getter to make non-intrusive reads cheaper
Also use in actual data conversion.
2019-10-28 06:16:12 +01:00
Calle Wilund
451bb7447d cdc: Add log / log data column operation types and make data cols tuples of these
Makes static/regular data columns tuple<op, value, ttl> as per spec.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-28 06:16:12 +01:00
Konstantin Osipov
e555dc502e lwt: implement basic lightweight transactions support
Support single-statement conditional updates as well as batches.

This patch almost fully rewrites column_condition.cc, implementing
is_satisfied_by().

Most of the remaining complications in column_condition implementation
come from the need to properly handle frozen and multi-cell
collection in predicates - up until now it was not possible
to compare entire collection values between each other. This is further
complicated since multi-cell lists and sets are returned as maps.

We can no longer assume that the columns fetched by prefetch operation
are non-frozen collections. IF EXISTS/IF NOT EXISTS condition
fetches all columns, besides, a column may be needed to check other
condition.

When fetching the old row for LWT or to apply updates to list columns,
we now calculate precisely the list of columns to fetch.

The primary key columns are also included in CAS batch result set,
and are thus also prefetched (the user needs them to figure out which
statements failed to apply).

The patch is cross-checked for compatibility with cassandra-3.11.4-1545-g86812fa502
but does deviate from the origin in handling of conditions on static
row cells. This is addressed in future series.
2019-10-27 23:42:49 +03:00
Konstantin Osipov
67e68dabf0 lwt: ensure we don't crash when we get a LIKE 2019-10-27 23:42:49 +03:00
Konstantin Osipov
f8f36d066c lwt: check for unsupported collection type in condition element access
We don't support conditions with element access on non-frozen UDTs,
so check that only supported collection types are supplied.
2019-10-27 23:42:49 +03:00
Konstantin Osipov
c9f0adf616 lwt: rewrite cql3::raw::column_condition::prepare()
Restructure the code to avoid quite a bit of code duplication.
2019-10-27 23:42:47 +03:00
Konstantin Osipov
c2217df4d8 lwt: reorganize column_condition declaration and add comments 2019-10-27 23:42:03 +03:00
Konstantin Osipov
22b0240fe7 lwt: remove useless code in column_condition.hh
Each column_condition and raw::column_condition construction case had a
static method wrapping its constructor, simply supplying some defaults.

This improved neither clarity nor maintainability.
2019-10-27 23:42:03 +03:00
Konstantin Osipov
3e25b83391 lwt: propagate if_exists condition from the parser to AST
UPDATE ... IF EXISTS is legal, but the IF EXISTS condition
was not propagated from the parser to the AST (raw::update_statement).
2019-10-27 23:42:03 +03:00
Konstantin Osipov
df28985295 lwt: introduce cql_statement_opt_metadata
cql_statement_opt_metadata is an intermediate node
in the CQL (prepared) statement hierarchy, parenting
modification_statement and batch_statement. If such
statements have an IF condition, they return a result set,
and thus have result set metadata.

The metadata itself is filled in a subsequent patch.
2019-10-27 23:42:03 +03:00
Vladimir Davydov
c8869e803e lwt: remove commented out validateWhereClauseForConditions
This logic was implemented in validate_where_clause_for_conditions()
method of modification_statement class.
2019-10-27 23:42:03 +03:00
Konstantin Osipov
eb5e82c6a1 lwt: add CAS where clause validation
Add checks for conditional modification statement limitations:
- WHERE clustering_key IN (list) IF condition is not supported
  since a condition is evaluated for a single row/cell, so
  allowing multiple rows to match the WHERE clause would create
  ambiguity,
- the same is true for conditional range deletions.
- ensure all clustering restrictions are eq for conditional delete

  We must not allow statements like

  create table t(p int, c int, v int, primary key (p, c));
  delete from t where p=1 and c>0 if v=1;

  because there may be more than one row in a partition satisfying the
  WHERE clause, in which case it's unclear which of them should satisfy
  the IF condition: all or just one.

  Raising an error on such a statement is consistent with Cassandra's
  behavior.
2019-10-27 23:42:03 +03:00
Konstantin Osipov
203eb3eccc lwt: sleep a random amount of time when retrying CAS
Sleep a random interval between 0 and 100 ms before retrying CAS.
Reuse sleep function, make the distribution object thread local.
2019-10-27 23:42:03 +03:00
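The randomized backoff described above can be sketched like this (an illustrative Python analog; the actual code is C++ and keeps the distribution object thread local, and the function name here is hypothetical):

```python
# Sketch: before retrying CAS, sleep a random interval in [0, 100) ms to
# de-synchronize contending coordinators and reduce repeated collisions.
import random
import time

def cas_retry_backoff(rng=None):
    rng = rng or random.Random()
    delay_ms = rng.uniform(0, 100)   # random interval between 0 and 100 ms
    time.sleep(delay_ms / 1000.0)
    return delay_ms
```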
Konstantin Osipov
0674fab05c lwt: implement storage_proxy::cas()
Introduce service::cas_request abstract base class
which can be used to parameterize Paxos logic.

Implement storage_proxy::cas() - compare and swap - the storage proxy
entry point for lightweight transactions.
2019-10-27 23:42:03 +03:00
Gleb Natapov
70adf65341 storage_proxy: make mutation holder responsible for mutation operation
Currently the code that manipulates mutations during writes needs to
check what kind of mutations they are and (sometimes) choose different
code paths. This patch encapsulates the differences in virtual
functions of mutation_holder object, so that high level code will not
concern itself with the details. The functions that are added:
apply_locally(), apply_remotely() and store_hint().
2019-10-27 23:21:51 +03:00
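The encapsulation can be sketched as follows (illustrative Python mirroring the named C++ virtuals; the concrete subclass and return values are hypothetical):

```python
# Sketch: hide per-mutation-kind behavior behind virtual functions so the
# high-level write path stops branching on the mutation kind.
from abc import ABC, abstractmethod

class MutationHolder(ABC):
    @abstractmethod
    def apply_locally(self): ...
    @abstractmethod
    def apply_remotely(self, endpoint): ...
    @abstractmethod
    def store_hint(self, endpoint): ...

class PlainMutationHolder(MutationHolder):
    def apply_locally(self):
        return "applied locally"
    def apply_remotely(self, endpoint):
        return f"sent to {endpoint}"
    def store_hint(self, endpoint):
        return f"hint stored for {endpoint}"

holder = PlainMutationHolder()
```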
Gleb Natapov
b3e01a45d7 lwt: storage_proxy: implement paxos protocol
This patch adds all functionality needed for Paxos protocol. The
implementation does not strictly adhere to the Paxos paper, since the original
paper allows setting a value only once, while for LWT we need to be able
to run another Paxos round after the "learn" phase completes, which requires
introducing things like repair.
2019-10-27 23:21:51 +03:00
Gleb Natapov
8d6201a23b lwt: Add RPC verbs needed for paxos implementation
The Paxos protocol has three stages: prepare, accept, and learn. This patch
adds an RPC verb for each of those stages. To stay terminologically compatible
with Cassandra, the patch calls these stages prepare, propose, and commit.
2019-10-27 23:21:51 +03:00
Gleb Natapov
d1774693bf lwt: Define state needed by paxos and persist it
The Paxos protocol relies on replicas having state that persists across
crashes/restarts. This patch defines such state and stores it in the
database itself in the paxos table to make it persistent.

The stored state is:
  in_progress_ballot    - promised ballot
  proposal              - accepted value
  proposal_ballot       - the ballot of the accepted value
  most_recent_commit    - most recently learned value
  most_recent_commit_at - the ballot of the most recently learned value
2019-10-27 23:21:51 +03:00
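The persisted state and the prepare-phase promise rule it enables can be sketched like this (illustrative Python; the real state lives in the system paxos table, ballots are timeuuids rather than the ints used here, and the method name is hypothetical):

```python
# Sketch: per-key Paxos state and the promise rule -- a replica promises a
# ballot only if it is newer than any ballot it has already promised.
from dataclasses import dataclass
from typing import Optional

@dataclass
class PaxosState:
    in_progress_ballot: int = 0           # promised ballot
    proposal: Optional[str] = None        # accepted value (a mutation)
    proposal_ballot: int = 0              # ballot of the accepted value
    most_recent_commit: Optional[str] = None   # most recently learned value
    most_recent_commit_at: int = 0        # ballot of that learned value

    def promise(self, ballot: int) -> bool:
        if ballot > self.in_progress_ballot:
            self.in_progress_ballot = ballot  # persisted before replying
            return True
        return False

state = PaxosState()
```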
Gleb Natapov
15b935b95d lwt: add data structures needed for paxos implementation
This patch adds two data structures that will be used by Paxos. The first
one is "proposal", which contains a ballot and a mutation representing
a value the Paxos protocol is trying to set. The second one is
"prepare_response", which is the value returned by the Paxos prepare stage.
It contains the currently accepted value (if any) and the most recently
learned value (again, if any). The latter is used to "repair" replicas
that missed a previous "learn" message.
2019-10-27 23:21:51 +03:00
Benny Halevy
1895fb276e mutation_test: test_udt_mutations: use int64_t constants for long_type
Otherwise they are decomposed and serialized as 4-byte int32.

For example, on my machine cell[1] looked like this:
{0002, atomic_cell{0000000310600000;ts=0;expiry=-1,ttl=0}}

and it failed cells_equal against:
{0002, atomic_cell{0000000300000000;ts=0;expiry=-1,ttl=0}}

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-10-27 20:51:29 +02:00
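The size mismatch behind the failure can be shown with a quick sketch (Python's struct module as a stand-in for the C++ serialization; the exact cell layout is simplified):

```python
# Sketch: a plain C++ int constant is decomposed and serialized as a 4-byte
# big-endian int32, while long_type expects an 8-byte int64 -- so the cell
# bytes differ even for the same logical value, and cells_equal fails.
import struct

as_int32 = struct.pack(">i", 3)  # what the buggy int constants produced
as_int64 = struct.pack(">q", 3)  # what long_type expects (int64_t)
```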
Benny Halevy
fec772538c mutation_test: test_udt_mutations: fix end iterator in call to std::all_of
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-10-27 19:49:25 +02:00
Benny Halevy
9c8cf9f51d mutation_test: test_udt_mutations: fixup udt comment
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-10-27 19:47:43 +02:00
Benny Halevy
76581e7f14 docs/debugging.md: fix gdb command for retrieving shared libraries information
This correct command is `info sharedlibrary`.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191027153541.27286-1-bhalevy@scylladb.com>
2019-10-27 18:15:09 +02:00
Dejan Mircevski
2a136ba1bc alternator: Fix race condition in set_routes()
server::set_routes() was setting the value of server::_callbacks.
This led to a race condition, as set_routes() is invoked on every
shard simultaneously.  It is also unnecessary, since _callbacks can be
initialized in the constructor.

Fixes #5220.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-10-27 12:31:24 +02:00
Avi Kivity
27ef73f4f1 Merge "Report file I/O in CQL tracing when reading from sstables." from Kamil
"
Introduce the traced_file class which wraps a file, adding CQL trace messages before and after every operation that returns a future.
Use this file to trace reads from SSTable data and index files.

Fixes #4908.
"

* 'traced_file' of https://github.com/kbr-/scylla:
  sstables: report sstable index file I/O in CQL tracing
  sstables: report sstable data file I/O in CQL tracing
  tracing: add traced_file class
2019-10-26 22:53:37 +03:00
Avi Kivity
2b856a7317 Merge "Support non-frozen UDTs." from Kamil
"
This change allows creating tables with non-frozen UDT columns. Such columns can then have single fields modified or deleted.

I had to do some refactoring first. Please read the initial commit messages, they are pretty descriptive of what happened (read the commits in the order they are listed on my branch: https://github.com/kbr-/scylla/commits/udt, starting from kbr-@8eee36e, in order to understand them). I also wrote a bunch of documentation in the code.

Fixes #2201.
"

* 'udt' of https://github.com/kbr-/scylla: (64 commits)
  tests: too many UDT fields check test
  collection_mutation: add a FIXME.
  tests: add a non-frozen UDT materialized view test
  tests: add a UDT mutation test.
  tests: add a non-frozen UDT "JSON INSERT" test.
  tests: add a non-frozen UDT to for_each_schema_change.
  tests: more non-frozen UDT tests.
  tests: move some UDT tests from cql_query_test.cc to new file.
  types: handle trailing nulls in tuples/UDTs better.
  cql3: enable deleting single fields of non-frozen UDTs.
  cql3: enable setting single fields of a non-frozen UDT.
  cql3: enable non-frozen UDTs.
  cql3: introduce user_types::marker.
  cql3: generalize function_call::make_terminal to UDTs.
  cql3: generalize insert_prepared_json_statement::execute_set_value to UDTs.
  cql3: use a dedicated setter operation for inserting user types.
  cql3: introduce user_types::value.
  types: introduce to_bytes_opt_vec function.
  cql3: make user_types::delayed_value::bind_internal return vector<bytes_opt>.
  cql3: make cql3_type::raw_ut::to_string distinguish frozenness.
  ...
2019-10-26 22:53:37 +03:00
Piotr Sarna
657e7ef5a5 alternator: add alternator health check
The health check is performed simply by issuing a GET request
to the alternator port - it returns the following status 200
response when the server is healthy:

$ curl -i localhost:8000
HTTP/1.1 200 OK
Content-Type: text/plain
Content-Length: 23
Server: Seastar httpd
Date: 21 Oct 2019 12:55:33 GMT

healthy: localhost:8000

This commit comes with a test.
Fixes #5050
Message-Id: <3050b3819661ee19640c78372e655470c1e1089c.1571921618.git.sarna@scylladb.com>
2019-10-26 18:14:18 +03:00
Botond Dénes
01e913397a tests: memtable_test: flush_reader_test: compare compacted mutations
To filter out artificial differences due to different representations of
an equivalent set of writes.

Fixes: #5207

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191024103718.29266-1-bdenes@scylladb.com>
2019-10-26 18:14:18 +03:00
Kamil Braun
432ef7c9af sstables: report sstable index file I/O in CQL tracing
Use tracing::make_traced_file when reading from the index file in
index_reader.
2019-10-25 14:10:28 +02:00
Kamil Braun
394c36835a sstables: report sstable data file I/O in CQL tracing
Use tracing::make_traced_file when creating an sstable input_stream.
To achieve that, trace_state needs to be plumbed down through some
functions.
2019-10-25 14:10:28 +02:00
Kamil Braun
a8c9d1206a tracing: add traced_file class
This is a thin wrapper over the `seastar::file` class which adds
CQL trace messages before and after I/O operations.
2019-10-25 14:10:24 +02:00
Kamil Braun
2889edea3e tests: too many UDT fields check test 2019-10-25 12:05:10 +02:00
Kamil Braun
adfc04ebec collection_mutation: add a FIXME.
We could use iterators over cells instead of a vector of cells
in collection_mutation(_view)_description. Then some use cases could
provide iterators that construct the cells "on the fly".
2019-10-25 12:05:10 +02:00
Kamil Braun
45d2a96980 tests: add a non-frozen UDT materialized view test 2019-10-25 12:05:10 +02:00
Kamil Braun
e0c233ede1 tests: add a UDT mutation test. 2019-10-25 12:05:08 +02:00
Kamil Braun
a21d12faae tests: add a non-frozen UDT "JSON INSERT" test. 2019-10-25 12:04:44 +02:00
Kamil Braun
ae3464da45 tests: add a non-frozen UDT to for_each_schema_change. 2019-10-25 12:04:44 +02:00
Kamil Braun
b87b700e66 tests: more non-frozen UDT tests. 2019-10-25 12:04:44 +02:00
Kamil Braun
474742ac5d tests: move some UDT tests from cql_query_test.cc to new file. 2019-10-25 12:04:44 +02:00
Kamil Braun
612de1f4e3 types: handle trailing nulls in tuples/UDTs better.
Comparing user types after adding new fields was buggy.
In the following scenario:

create type ut (a int);
create table cf (a int primary key, b frozen<ut>);
insert into cf (a, b) values (0, (0));
alter type ut add b int;
select * from cf where b = {a:0,b:null};

the row with a = 0 should be returned, even though the value stored in the database is shorter
(by one null) than the value given by the user. Until now it wouldn't
have been.
2019-10-25 12:04:44 +02:00
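The fixed comparison can be sketched as follows (illustrative Python, not the actual C++ tuple/UDT comparator; the function name is hypothetical):

```python
# Sketch: when a stored tuple/UDT value is shorter than the one supplied by
# the user (because fields were added later via ALTER TYPE), treat the
# missing trailing fields as nulls instead of considering the values unequal.
def udt_values_equal(stored, supplied):
    n = max(len(stored), len(supplied))
    pad = lambda v: list(v) + [None] * (n - len(v))  # pad with trailing nulls
    return pad(stored) == pad(supplied)

# Stored before "alter type ut add b int": (0,). Queried as {a:0, b:null}.
```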
Kamil Braun
1a9034e38a cql3: enable deleting single fields of non-frozen UDTs.
This was already possible by setting the field to null, but now it
supports the DELETE syntax.
2019-10-25 12:04:44 +02:00
Kamil Braun
4d271051dd cql3: enable setting single fields of a non-frozen UDT.
The commit introduces the necessary modifications to the grammar,
a set_field raw operation, and a setter_by_field operation.
2019-10-25 12:04:44 +02:00
Kamil Braun
e74b5deb5d cql3: enable non-frozen UDTs.
Add a cluster feature for non-frozen UDTs.

If the cluster supports non-frozen UDTs, do not return an error
message when trying to create a table with a non-frozen user type.
2019-10-25 12:04:44 +02:00
Kamil Braun
7ac7a3994d cql3: introduce user_types::marker.
cql3::user_types::marker is a dedicated cql3::abstract_marker for user
type placeholders in prepared CQL queries. When bound, it returns a
user_types::value.
2019-10-25 12:04:44 +02:00
Kamil Braun
36999c94f4 cql3: generalize function_call::make_terminal to UDTs.
Use the dedicated user_types::value.
There is no way this code can be executed now, so I left a TODO.
2019-10-25 12:04:44 +02:00
Kamil Braun
49a7461345 cql3: generalize insert_prepared_json_statement::execute_set_value to UDTs.
For user types, use its dedicated setter and value.
2019-10-25 12:04:44 +02:00
Kamil Braun
40f9ce2781 cql3: use a dedicated setter operation for inserting user types.
cql3::user_types::setter is a dedicated cql3::operation
for inserting and updating user types. It handles the multi-cell
(non-frozen) case.
2019-10-25 12:04:44 +02:00
Kamil Braun
51be1e3e9d cql3: introduce user_types::value.
This is a dedicated multi_item_terminal for user type values.
Will be useful in future commits.
2019-10-25 12:04:44 +02:00
Kamil Braun
abe6c2d3d2 types: introduce to_bytes_opt_vec function.
It converts a vector<bytes_view_opt> to a vector<bytes_opt>.
Used in a bunch of places.
2019-10-25 12:04:44 +02:00
Kamil Braun
8ff2aebd76 cql3: make user_types::delayed_value::bind_internal return vector<bytes_opt>.
Previously it returned vector<cql3::raw_value>, even though we don't use
unset values when setting a UDT value (fields that are not provided
become nulls; that's how C* does it).
This simplifies future implementation of user_types::{value, setter}.
2019-10-25 12:04:44 +02:00
Kamil Braun
f0a3af6adc cql3: make cql3_type::raw_ut::to_string distinguish frozenness.
This is used in error messages and may be useful.
2019-10-25 12:04:44 +02:00
Kamil Braun
c89de228e3 cql3: generalize some error messages to UDTs 2019-10-25 12:04:44 +02:00
Kamil Braun
fd3bc27418 cql3: disallow non-frozen UDTs when creating secondary indexes 2019-10-25 12:04:44 +02:00
Kamil Braun
ff0bd0bb7a cql3: check for nested non-frozen UDTs in create_type_statement. 2019-10-25 12:04:44 +02:00
Kamil Braun
adf857e9ed cql3: add cql3_type::is_user_type.
This will be used in future commits.
2019-10-25 12:04:44 +02:00
Kamil Braun
6ccb1ee19f cql3: generalize create_table_statement::raw_statement::prepare to UDTs.
Check for UDT with nested non-frozen collection.
Check for UDT with COMPACT STORAGE.
Check for UDT inside PRIMARY KEY.
2019-10-25 12:04:44 +02:00
Kamil Braun
a8c7670722 types: add multi_cell field to user_type_impl.
is_value_compatible_with_internal and update_user_type were generalized
to the non-frozen case.

For now, all user_type_impls in the code are non-multi-cell (frozen).
This will be changed in future commits.
2019-10-25 12:04:44 +02:00
Kamil Braun
b904d04925 cql3: add a TODO to implement column_conditions for UDTs.
This will become relevant after LWT is implemented.
2019-10-25 12:04:44 +02:00
Kamil Braun
44534a4a0a sstables: generalize some comments to UDTs. 2019-10-25 12:04:44 +02:00
Kamil Braun
b38b8af0f2 schema: generalize compound_name to UDTs. 2019-10-25 12:04:44 +02:00
Kamil Braun
270cf2b289 query-result-set: generalize result_set_builder to UDTs. 2019-10-25 12:04:44 +02:00
Kamil Braun
2ada219f2c view: generalize create_virtual_column and maybe_make_virtual to UDTs. 2019-10-25 12:04:44 +02:00
Kamil Braun
574e1cd514 tests: generalize timestamp_based_spliiting_writer and bucket_writer to UDTs. 2019-10-25 12:04:44 +02:00
Kamil Braun
6da89e40df tests: generalize random_schema.cc:generate_collection to UDTs. 2019-10-25 12:04:44 +02:00
Kamil Braun
0fbfb67cbb tests: generalize mutation_test.cc summaries to UDTs. 2019-10-25 12:04:44 +02:00
Kamil Braun
a3a2f65fbf types: generalize serialize_for_cql to UDTs.
Also introduces a helper "linearized" function, which implements
a pattern occurring in all serialize_for_cql_aux functions.
2019-10-25 12:04:44 +02:00
Kamil Braun
05d4b2e1a4 tests: generalize data_model.cc:mutation_description::build to UDTs. 2019-10-25 12:04:44 +02:00
Kamil Braun
338fde672a mp_row_consumer: generalize consume_cell (kl) and consume_column (mc) to UDTs. 2019-10-25 12:04:44 +02:00
Kamil Braun
5e447e3250 mutation_partition_view: generalize read_collection_cell to UDTs. 2019-10-25 12:04:44 +02:00
Kamil Braun
90927c075a converting_mutation_partition_applier: generalize accept_cell to UDTs. 2019-10-25 12:04:42 +02:00
Kamil Braun
d9baff0e4b collection_mutation: generalize collection_mutation.cc:difference to UDTs. 2019-10-25 10:49:19 +02:00
Kamil Braun
a344019b25 collection_mutation: generalize collection_mutation_view::last_update to UDTs. 2019-10-25 10:49:19 +02:00
Kamil Braun
691f00408d collection_mutation: generalize merge to UDTs. 2019-10-25 10:49:19 +02:00
Kamil Braun
7f5cd8e8ce collection_mutation: generalize collection_mutation_view_description::materialize to UDTs. 2019-10-25 10:49:19 +02:00
Kamil Braun
20b42b1155 collection_mutation: generalize collection_mutation_view::is_any_live to UDTs. 2019-10-25 10:49:19 +02:00
Kamil Braun
323370e4ba collection_mutation: generalize deserialize_collection_mutation to UDTs. 2019-10-25 10:49:19 +02:00
Kamil Braun
393974df3b cql3: make {lists,maps,sets}::value::from_serialized take const {}_type&.
This will simplify the code a bit where from_serialized is used
after switching to visitors. Also reduces the number of shared_ptr
copies.
2019-10-25 10:49:19 +02:00
Kamil Braun
4327bba0db types: introduce (de)serialize_field_index functions.
These functions are used to translate field indices, which are used to
identify fields inside UDTs, from/to a serialized representation to be
stored inside sstables and mutations.
They do it in a way that is compatible with C*.
2019-10-25 10:49:19 +02:00
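A minimal sketch of the pair of functions, assuming for illustration that the serialized form is a 2-byte big-endian unsigned integer (the real on-disk format is whatever matches Cassandra's cell-path encoding, which this commit does not spell out):

```python
# Sketch: translate a UDT field index to/from the serialized representation
# stored in sstables and mutations. The ">H" (2-byte big-endian unsigned)
# format is an assumption for illustration only.
import struct

def serialize_field_index(idx: int) -> bytes:
    return struct.pack(">H", idx)

def deserialize_field_index(data: bytes) -> int:
    (idx,) = struct.unpack(">H", data)
    return idx
```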
Kamil Braun
90d05eb627 cql3: reject too long user-defined types 2019-10-25 10:49:19 +02:00
Kamil Braun
0f8f950b74 cql3: optimize multi_item_terminal::get_elements().
Now it returns const std::vector<bytes_opt>& instead of
std::vector<bytes_opt>.
2019-10-25 10:49:19 +02:00
Kamil Braun
4374982de0 types: collection_type_impl::to_value becomes serialize_for_cql.
The purpose of collection_type_impl::to_value was to serialize a
collection for sending over CQL. The corresponding function in origin
is called serializeForNativeProtocol, but the name is a bit lengthy,
so I settled for serialize_for_cql.

The method now became a free-standing function, using the visit
function to perform a dispatch on the collection type instead
of a virtual call. This also makes it easier to generalize it to UDTs
in future commits.

Remove the old serialize_for_native_protocol with a FIXME: implement
inside. It was already implemented (to_value), just called differently.

Remove dead methods: enforce_limit and serialized_values. The
corresponding methods in C* are auxiliary methods used inside
serializeForNativeProtocol. In our case, the entire algorithm
lives in serialize_for_cql.
2019-10-25 10:49:19 +02:00
Kamil Braun
e5c0a992ef cql3: make cql3_type::raw::to_string private.
It only needs to be used in operator<<, which is a friend
of cql3_type::raw.
2019-10-25 10:42:58 +02:00
Kamil Braun
ff4d857a9d cql3: remove a dynamic_pointer_cast to user_type_impl.
There exists a method to check if something is a user type:
is_user_type(); use it instead.
2019-10-25 10:42:58 +02:00
Kamil Braun
d8f8908d34 types: introduce user_type_impl::idx_of_field method.
Each field of a user type has its index inside the type.
This method makes it easy to find, which is needed in a bunch of
places.
2019-10-25 10:42:58 +02:00
Kamil Braun
c77643a345 cql3: make cql3_type::_frozen protected. Add is_frozen() method.
No one modifies _frozen from the outside.
Moving the field to `protected` makes it harder to introduce bugs.
2019-10-25 10:42:58 +02:00
Kamil Braun
d83ebe1092 collection_mutation: move collection_type_impl::difference to collection_mutation.hh. 2019-10-25 10:42:58 +02:00
Kamil Braun
7e3bbe548c collection_mutation: move collection_type_impl::merge to collection_mutation.hh. 2019-10-25 10:42:58 +02:00
Kamil Braun
a41277a7cd collection_mutation: move collection_type_impl::last_update to collection_mutation_view 2019-10-25 10:42:58 +02:00
Kamil Braun
30802f5814 collection_mutation: move collection_type_impl::is_any_live to collection_mutation_view 2019-10-25 10:42:58 +02:00
Kamil Braun
e16ba76c2e collection_mutation: move collection_type_impl::is_empty to collection_mutation_view. 2019-10-25 10:42:58 +02:00
Kamil Braun
bbdb438d89 collection_mutation: easier (de)serialization of collection_mutation(s).
`collection_type_impl::serialize_mutation_form`
became `collection_mutation(_view)_description::serialize`.

Previously callers had to cast their data_type down to collection_type
to use serialize_mutation_form. Now it's done inside `serialize`.
In the future `serialize` will be generalized to handle UDTs.

`collection_type_impl::deserialize_mutation_form`
became a free-standing function `deserialize_collection_mutation`
with similar benefits. In fact, no one needs to call this function
manually, as explained in the next paragraph.

A common pattern consisting of linearizing data inside a `collection_mutation_view`
followed by calling `deserialize_mutation_form` has been abstracted out
as a `with_deserialized` method inside collection_mutation_view.

serialize_mutation_form_only_live was removed,
because it hadn't been used anywhere.
2019-10-25 10:42:58 +02:00
Kamil Braun
e4101679e4 collection_mutation: generalize constructor of collection_mutation to abstract_type.
The constructor doesn't use anything specific to collection_type_impl.
In the future it will also handle non-frozen user types.
2019-10-25 10:42:58 +02:00
Kamil Braun
b1d16c1601 types: move collection_type_impl::mutation(_view) out of collection_type_impl.
collection_type_impl::mutation became collection_mutation_description.
collection_type_impl::mutation_view became collection_mutation_view_description.
These classes now reside inside collection_mutation.hh.

Additional documentation has been written for these classes.

Related function implementations were moved to collection_mutation.cc.

This makes it easier to generalize these classes to non-frozen UDTs in future commits.
The new names (together with documentation) better describe their purpose.
2019-10-25 10:19:45 +02:00
Kamil Braun
c0d3e6c773 atomic_cell: move collection_mutation(_view) to a new file.
The classes 'collection_mutation' and 'collection_mutation_view'
were moved to a separate header, collection_mutation.hh.

Implementations of functions that operate on these classes,
including some methods of collection_type_impl, were moved
to a separate compilation unit, collection_mutation.cc.

This makes it easier to modify these structures in future commits
in order to generalize them for non-frozen User Defined Types.

Some additional documentation has been written for collection_mutation.
2019-10-25 10:19:45 +02:00
Kamil Braun
c90ea1056b Remove mutation_partition_applier.
It had been replaced by partition_builder
in commit dc290f0af7.
2019-10-25 10:19:45 +02:00
Asias He
f32ae00510 gossip: Limit number of pending gossip ACK2 messages
Similar to "gossip: Limit number of pending gossip ACK messages", limit
the number of pending gossip ACK2 messages in gossiper::handle_ack_msg.

Fixes #5210
2019-10-25 12:44:28 +08:00
Asias He
15148182ab gossip: Limit number of pending gossip ACK messages
In a cross-dc large cluster, the receiver node of the gossip SYN message
might be slow to send the gossip ACK message. The ack messages can be
large if the payload of the application state is big, e.g.,
CACHE_HITRATES with a lot of tables. As a result, the unlimited ACK
message can consume unlimited amount of memory which causes OOM
eventually.

To fix, this patch queues the SYN message and handles it later if the
previous ACK message is still being sent. However, we only store the
latest SYN message. Since the latest SYN message from a peer has the
latest information, it is safe to drop the previous SYN message and
keep only the latest one. After this patch, there can be at most 1
pending SYN message and 1 pending ACK message per peer node.

Fixes #5210
2019-10-25 12:44:28 +08:00
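The queueing described above can be sketched as follows; a minimal illustration with hypothetical names (`syn_msg`, `peer_state`), not the real gossiper API:

```cpp
#include <cassert>
#include <optional>
#include <utility>

// At most one SYN is kept pending per peer; a newer SYN simply overwrites
// an older pending one, since the latest SYN carries the most recent state.
struct syn_msg { int version; };

class peer_state {
    bool _ack_in_flight = false;
    std::optional<syn_msg> _pending_syn;
public:
    // Returns true if the SYN should be handled now, false if it was deferred.
    bool on_syn(syn_msg s) {
        if (_ack_in_flight) {
            _pending_syn = s;      // drop any older pending SYN
            return false;
        }
        _ack_in_flight = true;     // the ACK reply is now being sent
        return true;
    }
    // Called when the ACK send completes; returns the deferred SYN, if any.
    std::optional<syn_msg> on_ack_sent() {
        _ack_in_flight = false;
        return std::exchange(_pending_syn, std::nullopt);
    }
};
```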
Nadav Har'El
8bffb800e1 alternator: Use system_auth.roles for alternator authorization
Merged patch series from Piotr Sarna:

This series couples system_auth.roles with authorization routines
in alternator. The `salted_hash` field, which is every user's hashed
password, is used as a secret key for the signature generation
in alternator.
This series also adds related expiration verifications for alternator
signatures.
It also comes with more test cases and docs updates.

Tests: alternator(local, remote), manual

Piotr Sarna (11):
  alternator: add extracting key from system_auth.roles
  alternator: futurize verify_signature function
  alternator: move the api handler to a separate function
  alternator: use keys from system_auth.roles for authorization
  alternator: add key cache to authorization
  alternator-test: add a wrong password test
  alternator: verify that the signature has not expired
  alternator: add additional datestamp verification
  alternator-test: add tests for expired signatures
  docs: update alternator entry for authorization
  alternator-test: add authorization to README

 alternator-test/conftest.py                |   2 +-
 alternator-test/test_authorization.py      |  44 ++++++++-
 alternator-test/test_describe_endpoints.py |   2 +-
 alternator/auth.hh                         |  15 ++-
 alternator/server.hh                       |  10 +-
 alternator/auth.cc                         |  62 +++++++++++-
 alternator/server.cc                       | 106 ++++++++++++---------
 alternator-test/README.md                  |  28 ++++++
 docs/alternator/alternator.md              |   7 +-
 9 files changed, 221 insertions(+), 55 deletions(-)
2019-10-23 20:51:08 +03:00
Tomasz Grabiec
e621db591e Merge "Fix TTL serialization breakage" from Avi
Commit 93270dd changed gc_clock to be 64-bit, to fix the Y2038
problem. While 64-bit tombstone::deletion_time is serialized in a
compatible way, TTLs (gc_clock::duration) were not.

This patchset reverts TTL serialization to the 32-bit serialization
format, and also allows opting-in to the 64-bit format in case a
cluster was installed with the broken code. Only Scylla 3.1.0 is
vulnerable.

Fixes #4855

Tests: unit (dev)
2019-10-23 18:23:26 +02:00
Tomasz Grabiec
71720be4f7 Merge "storage_service: Reject nodetool cleanup when there is pending ranges" from Asias
From Shlomi:

4 node cluster Node A, B, C, D (Node A: seed)
cassandra-stress write n=10000000 -pop seq=1..10000000 -node <seed-node>
cassandra-stress read duration=10h -pop seq=1..10000000 -node <seed-node>
while read is progressing
Node D: nodetool decommission
Node A: nodetool status node - wait for UL
Node A: nodetool cleanup (while decommission progresses)

I get the error on c-s once decommission ends
  java.io.IOException: Operation x0 on key(s) [383633374d31504b5030]: Data returned was not validated
The problem is that when a node gets new ranges, e.g., the bootstrapping node, or the
existing nodes after a node is removed or decommissioned, nodetool cleanup will
remove data within the new ranges which the node just received from other nodes.

To fix, we should reject nodetool cleanup when there are pending ranges on that node.

Note, rejecting nodetool cleanup is not a full protection because new ranges
can be assigned to the node while cleanup is still in progress. However, it is
a good start until we have a full protection solution.

Refs: #5045
2019-10-23 17:45:41 +02:00
Avi Kivity
2970578677 config: add configuration option for 3.1.0 heritage clusters
Scylla 3.1.0 broke the serialization format for TTLs. Later versions
corrected it, but if a cluster was originally installed as 3.1.0,
it will use the broken serialization forever. This configuration option
allows upgrades from 3.1.0 to succeed, by enabling the broken format
even for later versions.
2019-10-23 18:36:35 +03:00
Avi Kivity
bf4c319399 gc_clock, serialization: define new serialization for gc_clock::duration (aka TTLs)
Scylla 3.1.0 inadvertently changed the serialization format of TTLs
(internally represented as gc_clock::duration) from 32-bit to 64-bit,
as part of preparation for Y2038 (which comes earlier for TTLed cells).
This breaks mutations transported in a mixed cluster.

To fix this, we revert to the 32-bit format, unless we're in a 3.1.0-
heritage cluster, in which case we use the 64-bit format. Overflow of
a TTL is not a concern, since TTLs are capped to 20 years by the TTL layer.
An assertion is added to verify this.

This patch only defines a variable to indicate we're in
a 3.1.0 heritage cluster, but a way to set it is left to
a later patch.
2019-10-23 18:36:33 +03:00
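A minimal sketch of the format selection described above, assuming a hypothetical `write_ttl` helper and heritage flag (not the real serializer):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// TTLs go out as 32 bits unless the cluster is marked as 3.1.0-heritage,
// in which case the 64-bit format is kept. The cap assertion mirrors the
// commit's note that TTLs never exceed 20 years.
bool use_64bit_ttl_format = false; // enabled only on 3.1.0-heritage clusters

void write_ttl(std::vector<uint8_t>& out, int64_t ttl_seconds) {
    // TTLs are capped to 20 years by the TTL layer, so 32 bits always suffice.
    assert(ttl_seconds >= 0 && ttl_seconds <= int64_t(20) * 365 * 24 * 3600);
    int width = use_64bit_ttl_format ? 8 : 4;
    for (int i = width - 1; i >= 0; --i) {
        out.push_back(uint8_t(ttl_seconds >> (8 * i)));  // big-endian bytes
    }
}
```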
Avi Kivity
771e028c1a Update seastar submodule
* seastar 6bcb17c964...2963970f6b (4):
  > Merge "IPv6 scope support and network interface impl" from Calle
  > noncopyable_function: do not copy uninitialized data
  > Merge "Move smp and smp queue out of reactor" from Asias
  > Consolidate posix socket implementations
2019-10-23 16:43:02 +03:00
Piotr Sarna
472e3cb4e1 alternator-test: add authorization to README
The README paragraph informs about turning on authorization with:
   alternator-enforce-authorization: true
and has a short note on how to set up the secret key for tests.
2019-10-23 15:05:39 +02:00
Piotr Sarna
280eb28324 docs: update alternator entry for authorization
The document now mentions that secret keys are extracted
from the system_auth.roles table.
2019-10-23 15:05:39 +02:00
Piotr Sarna
ebb0af3500 alternator-test: add tests for expired signatures
The first test case ensures that expired signatures are not accepted,
while the second one checks that signatures with dates that reach out
too far into the future are also refused.
2019-10-23 15:05:39 +02:00
Piotr Sarna
a0a33ae4f3 alternator: add additional datestamp verification
The authorization signature contains both a full obligatory date header
and a shortened datestamp - an additional verification step ensures that
the shortened stamp matches the full date.
2019-10-23 15:05:39 +02:00
Piotr Sarna
718cba10a1 alternator: verify that the signature has not expired
AWS signatures have a 15min expiration policy. For compatibility,
the same policy is applied for alternator requests. The policy also
ensures that signatures dated more than 15 minutes into the future
are treated as unsafe and thus not accepted.
2019-10-23 15:05:39 +02:00
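The symmetric expiry check can be sketched as follows (helper name is hypothetical, not the real alternator code):

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// The 15-minute window is applied symmetrically, so a signature dated too
// far in the past or too far in the future is rejected.
constexpr int64_t expiry_seconds = 15 * 60;

bool signature_valid(int64_t signed_at, int64_t now) {
    return std::llabs(now - signed_at) <= expiry_seconds;
}
```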
Piotr Sarna
e90c4a8130 alternator-test: add a wrong password test
The additional test case submits a request as a user that is expected
to exist (in the local setup), but the provided password is incorrect.
It also updates test_wrong_key_access so it uses an empty string
for trying to authenticate as a nonexistent user - in order to cover
more corner cases.
2019-10-23 15:05:39 +02:00
Piotr Sarna
524b03dea5 alternator: add key cache to authorization
In order to avoid fetching keys from system_auth.roles system table
on every request, a cache layer is introduced. And in order not to
reinvent the wheel, the existing implementation of loading_cache
with max size 1024 and a 1 minute timeout is used.
2019-10-23 15:05:39 +02:00
Piotr Sarna
6dee7737d7 alternator: use keys from system_auth.roles for authorization
Instead of having a hardcoded secret key, the server now verifies
an actual key extracted from system_auth.roles system table.
This commit comes with a test update - instead of 'whatever':'whatever',
the credentials used for a local run are 'alternator':'secret_pass',
which matches the initial contents of system_auth.roles table,
which acts as a key store.

Fixes #5046
2019-10-23 15:05:39 +02:00
Piotr Sarna
388b492040 alternator: move the api handler to a separate function
The lambda used for handling the api request has grown a little bit
too large, so it's moved to a separate method. Along with it,
the callbacks are now remembered inside the class itself.
2019-10-23 15:05:39 +02:00
Piotr Sarna
a93cf12668 alternator: futurize verify_signature function
The verify_signature utility will later be coupled with Scylla
authorization. In order to prepare for that, it is first transformed
into a function that returns future<>, and it also becomes a member
of class server. The reason for making it a member function is that
this will make it easier to implement a server-local key cache.
2019-10-23 15:05:39 +02:00
Piotr Sarna
dc310baa2d alternator: add extracting key from system_auth.roles
As a first step towards coupling alternator authorization with Scylla
authorization, a helper function for extracting the key (salted_hash)
belonging to the user is added.
2019-10-23 15:05:39 +02:00
Asias He
f876580740 storage_service: Reject nodetool cleanup when there is pending ranges
From Shlomi:

4 node cluster Node A, B, C, D (Node A: seed)
cassandra-stress write n=10000000 -pop seq=1..10000000 -node <seed-node>
cassandra-stress read duration=10h -pop seq=1..10000000 -node <seed-node>
while read is progressing
Node D: nodetool decommission
Node A: nodetool status node - wait for UL
Node A: nodetool cleanup (while decommission progresses)

I get the error on c-s once decommission ends
  java.io.IOException: Operation x0 on key(s) [383633374d31504b5030]: Data returned was not validated

The problem is that when a node gets new ranges, e.g., the bootstrapping node, or the
existing nodes after a node is removed or decommissioned, nodetool cleanup will
remove data within the new ranges which the node just received from other nodes.

To fix, we should reject nodetool cleanup when there are pending ranges on that node.

Note, rejecting nodetool cleanup is not a full protection because new ranges
can be assigned to the node while cleanup is still in progress. However, it is
a good start until we have a full protection solution.

Refs: #5045
2019-10-23 19:20:36 +08:00
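The guard described above can be sketched as follows (hypothetical API, not the real storage_service code):

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Refuse cleanup while the node has pending ranges, because cleanup would
// delete data in ranges the node is just receiving from other nodes.
void check_cleanup_allowed(std::size_t pending_range_count) {
    if (pending_range_count > 0) {
        throw std::runtime_error("cleanup rejected: node has pending ranges");
    }
}
```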
Asias He
a39c8d0ed0 Revert "storage_service: remove storage_service::_is_bootstrap_mode."
It will be needed by "storage_service: Reject nodetool cleanup when
there is pending ranges"

This reverts commit dbca327b46.
2019-10-23 19:20:36 +08:00
Raphael S. Carvalho
fc120a840d compaction: dont rely on undefined behavior when making garbage collected writer
Argument evaluation order is unspecified, so it's not guaranteed that
c->make_garbage_collected_sstable_writer() is called before
compaction is moved to run().

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20191023052647.9066-1-raphaelsc@scylladb.com>
2019-10-23 11:04:51 +03:00
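The bug class can be illustrated with a small stand-alone sketch (types are stand-ins, not the real compaction code):

```cpp
#include <cassert>
#include <memory>
#include <utility>

// In run(std::move(c), c->make_writer()) the compiler may construct run's
// first parameter, moving `c` out, before make_writer() is called on it,
// leaving a null dereference. The fix sequences the two expressions
// explicitly in separate statements.
struct writer { int id; };
struct compaction {
    writer make_writer() { return writer{42}; }
};

int run(std::unique_ptr<compaction> c, writer w) {
    return w.id; // stand-in for the real compaction loop
}

int run_fixed(std::unique_ptr<compaction> c) {
    auto w = c->make_writer();              // evaluate first...
    return run(std::move(c), std::move(w)); // ...then move `c`
}
```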
Benny Halevy
3b3611b57a mutation_diff: standard input support
Also, now that the file name is properly quoted,
it may contain space characters.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-10-23 08:29:58 +03:00
Benny Halevy
6feb4d5207 mutation_diff: accept diff_command option
To support using other diff tools than colordiff

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-10-23 08:29:47 +03:00
Tomasz Grabiec
dfac542466 Merge "extend multi-cell list & set type support" from Kostja
Make it possible to compare multi-cell lists and sets serialized
as maps with literal values and serialize them to network using
a standard format (vector of values).

This is a pre-requisite patch for column condition evaluation
in light-weight transactions.
2019-10-23 07:39:57 +03:00
Nadav Har'El
774f8aa4b8 docs/debugging.md: add guide on how to debug cores
Merged patch series from Botond Dénes:

This series extends the existing docs/debugging.md with a detailed guide
on how to debug Scylla coredumps. The intended target audience is
developers who are debugging their first core, hence the level of
details (hopefully enough). That said this should be just as useful for
seasoned debuggers just quickly looking up some snippet they can't
remember exactly. A Troubleshooting chapter is also added in this
series for commonly-met problems.

I decided to create this guide after myself having struggled for more
than a day on just opening(!) a coredump that was produced on Ubuntu.
As my main source, I used the How-to-debug-a-coredump page from the
internal wiki, which contains much useful information on debugging
coredumps, however I found it to be missing some crucial information, as
well being very terse, thus being primarily useful for experienced
debuggers who can fill in the blanks. The reason I'm not extending said
wiki page is that I think this information should not be hidden in some
internal wiki page. Also, docs/debugging.md now seems to be a much
better base for such a document. This document was started as a
comprehensive debugging manual for beginners (but not just).

You will notice that the information on how to debug cores from
CentOS/Redhat are quite sparse. This is because I have no experience
with such cores, so for now the respective chapters are just stubs. I
intend to complete them in the future after having gained the necessary
experience and knowledge, however those being in possession of said
knowledge are more than welcome to send a patch. :)

Botond Dénes (4):
	docs/debugging.md: demote 'Starting GDB' and 'Using GDB'
	docs/debugging.md: fix formatting issues
	docs/debugging.md: add 'Debugging coredumps' subchapter
	docs/debugging.md: add 'Throubleshooting' subchapter

  docs/debugging.md | 240 +++++++++++++++++++++++++++++++++++++++++++---
  1 file changed, 228 insertions(+), 12 deletions(-)
2019-10-23 07:39:57 +03:00
Rafael Ávila de Espíndola
b3372be679 install-dependencies: Add Lua
Add lua as a dependency in preparation for UDF. This is the first
patch since it has to go in before to allow for a frozen toolchain
update.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
[avi: update frozen toolchain image]
Message-Id: <20191018231442.11864-2-espindola@scylladb.com>
2019-10-23 07:39:57 +03:00
Konstantin Osipov
a30c08e04e lwt: support for multi-cell set & list value serialization 2019-10-22 17:40:42 +03:00
Piotr Jastrzebski
eb8ae06ced cdc: Return db_context::builder by reference
from it's with_* functions.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-22 17:13:43 +03:00
Konstantin Osipov
605755e3f6 lwt: support for multi-cell map & list comparison with literal values
Multi-cell lists and maps may be stored in different formats: as sorted
vectors of pairs of values, when retrieved from storage, or as sorted
vectors of values, when created from parser literals or supplied as
parameter values.

Implement a specialized compare for use when the receiver and parameter
representations don't match.

Add helpers.
2019-10-22 17:07:33 +03:00
Raphael S. Carvalho
3b6583990d sstables: Fix sluggish backlog controller with incremental compaction
The problem is that the backlog tracker is not being updated properly after
incremental compaction.
When replacing sstables earlier, we tell backlog tracker that we're done
with exhausted sstables[1], but we *don't* tell it about the new, sealed
sstables created that will replace the exhausted ones.
[1]: exhausted sstable is one that can be replaced earlier by compaction.
We need to notify backlog tracker about every sstable replacement which
was triggered by incremental compaction.
Otherwise, backlog for a table that enables incremental compaction will
be lower than it actually should. That's because new sstables being
tracked as partial decrease the backlog, whereas the exhausted ones
increase it.
The formula for a table's backlog is basically:
backlog(sstable set + compacting(1) - partial(2))
(1) compacting includes all compaction's input sstables, but the
exhausted ones are removed from it (correct behavior).
(2) partial includes all compaction's output sstables, but the ones
that replaced the exhausted sstables aren't removed from it (incorrect
behavior).
This problem is fixed by making the backlog tracker *fully* aware of the early
replacement, not only the exhausted sstables, but also the new sstables
that replaced the exhausted ones. The new sstables need to be moved
inside the tracker from partial state to the regular one.

Fixes #5157.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20191016002838.23811-1-raphaelsc@scylladb.com>
2019-10-22 16:19:57 +03:00
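The accounting fix can be sketched as follows (hypothetical fields, not the real backlog tracker API). Per the commit, a table's backlog is roughly backlog(sstable set + compacting - partial):

```cpp
#include <cassert>

// backlog = bytes(sstable set) + bytes(compacting inputs) - bytes(partial outputs)
struct backlog_tracker {
    long set_bytes = 0;        // regular sstables in the set
    long compacting_bytes = 0; // compaction input sstables
    long partial_bytes = 0;    // compaction output sstables still open
    long backlog() const { return set_bytes + compacting_bytes - partial_bytes; }
    // Early (incremental) replacement: the exhausted input leaves the set
    // and the compacting inputs; the sealed output graduates from "partial"
    // to a regular sstable. Forgetting the second half keeps backlog too low.
    void on_early_replacement(long exhausted, long sealed) {
        set_bytes -= exhausted;
        compacting_bytes -= exhausted;
        partial_bytes -= sealed;
    }
};
```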
Vladimir Davydov
6c6689f779 cql: refactor statement accounting
Rather than passing a pointer to a cql_stats member corresponding to
the statement type, pass a reference to a cql_stats object and use
statement_type, which is already stored in modification_statement, for
determining which counter to increment. This will allow us to account
conditional statements, which will have a separate set of counters,
right in modification_statement::execute() - all we'll need to do is
add the new counters and bump them in case execute_with_condition is
called.

While we are at it, remove extra inclusions from statement_type.hh so as
not to introduce any extra dependencies for cql_stats.hh users.

Message-Id: <20191022092258.GC21588@esperanza>
2019-10-22 12:39:14 +03:00
Nadav Har'El
51fc6c7a8e make static_row optional to reduce memory footprint
Merged patch series from Avi Kivity:

The static row can be rare: many tables don't have them, and tables
that do will often have mutations without them (if the static row
is rarely updated, it may be present in the cache and in readers,
but absent in memtable mutations). However, it always consumes ~100
bytes of memory, even if it not present, due to row's overhead.

Change it to be optional by allocating it as an external object rather
than inlined into mutation_partition. This adds overhead when the
static row is present (17 bytes for the reference, back reference,
and lsa allocator overhead).

perf_simple_query appears to be marginally (2%) faster. Footprint is
reduced by ~9% for a cache entry, 12% in memtables. More details are
provided in the patch commit logs.

Tests: unit (debug)

Avi Kivity (4):
  managed_ref: add get() accessor
  managed_ref: add external_memory_usage()
  mutation_partition: introduce lazy_row
  mutation_partition: make static_row optional to reduce memory
    footprint

 cell_locking.hh                          |   2 +-
 converting_mutation_partition_applier.hh |   4 +-
 mutation_partition.hh                    | 284 ++++++++++++++++++++++-
 partition_builder.hh                     |   4 +-
 utils/managed_ref.hh                     |  12 +
 flat_mutation_reader.cc                  |   2 +-
 memtable.cc                              |   2 +-
 mutation_partition.cc                    |  45 +++-
 mutation_partition_serializer.cc         |   2 +-
 partition_version.cc                     |   4 +-
 tests/multishard_mutation_query_test.cc  |   2 +-
 tests/mutation_source_test.cc            |   2 +-
 tests/mutation_test.cc                   |  12 +-
 tests/sstable_mutation_test.cc           |  10 +-
 14 files changed, 355 insertions(+), 32 deletions(-)
2019-10-22 12:25:15 +03:00
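The space trade-off can be illustrated with a stand-alone sketch (`row` here is a stand-in, not the real Scylla type): inlining the static row costs its full size in every partition, while holding it behind a pointer costs one pointer when absent (the common case) plus an allocation when present.

```cpp
#include <cassert>
#include <memory>

struct row { char cells[96]; };  // stand-in for the ~100-byte row overhead

struct partition_inline { row static_row; };                // always ~96 bytes
struct partition_lazy { std::unique_ptr<row> static_row; }; // 8 bytes if absent
```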
Avi Kivity
bc03b0fd47 Merge "Some refactoring of node startup code" from Kamil
"
The node startup code (in particular the functions storage_service::prepare_to_join and storage_service::join_token_ring) is complicated and hard to understand.

This patch set aims to simplify it at least a bit by removing some dead code, moving code around so it's easier to understand and adding some comments that explain what the code does.
I did it to help me prepare for implementing generation and gossiping of CDC streams.
"

* 'bootstrap-refactors' of https://github.com/kbr-/scylla:
  storage_service: more comments in join_token_ring
  db: remove system_keyspace::update_local_tokens
  db: improve documentation for update_tokens and get_saved_tokens in system_keyspace
  storage_service: remove storage_service::_is_bootstrap_mode.
  storage_service: simplify storage_service::bootstrap method
  storage_service: fix typo in handle_state_moving
  storage_service: remove unnecessary use of stringstream
  storage_service: remove redundant call to update_tokens during join_token_ring
  storage_service: remove storage_service::set_tokens method.
  storage_service: remove is_survey_mode
  storage_service::handle_state_normal: tokens_to_update* -> owned_tokens
  storage_service::handle_state_normal: remove local_tokens_to_remove
  db::system_keyspace::update_tokens: take tokens by const ref
  db::system_keyspace::prepare_tokens: make static, take tokens by const ref
  token_metadata::update_normal_tokens: take tokens by const ref
2019-10-22 12:11:11 +03:00
Asias He
0a52ecb6df gossip: Fix max generation drift measure
Assume n1 and n2 are in a cluster with generation numbers g1 and g2. The
cluster runs for more than 1 year (MAX_GENERATION_DIFFERENCE). When n1
reboots with generation g1' which is time based, n2 will see
g1' > g2 + MAX_GENERATION_DIFFERENCE and reject n1's gossip update.

To fix, check the generation drift with generation value this node would
get if this node were restarted.

This is a backport of CASSANDRA-10969.

Fixes #5164
2019-10-21 20:20:55 +02:00
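The corrected drift check can be sketched as follows (helper name is hypothetical): instead of comparing the remote generation against the locally *stored* generation, which grows stale after a year of uptime, compare it against the generation this node would get if it restarted now, i.e. one based on the current time.

```cpp
#include <cassert>
#include <cstdint>

constexpr int64_t max_generation_difference = 365LL * 24 * 3600; // ~1 year

// now_based_local_gen: the time-based generation this node would get if
// it were restarted right now.
bool accept_generation(int64_t remote_gen, int64_t now_based_local_gen) {
    return remote_gen <= now_based_local_gen + max_generation_difference;
}
```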
Kamil Braun
f1c26bf5c9 storage_service: more comments in join_token_ring
Explain why a call to update_normal_tokens is needed.
2019-10-21 11:11:03 +02:00
Kamil Braun
fb1e35f032 db: remove system_keyspace::update_local_tokens
That was dead code.
2019-10-21 11:11:03 +02:00
Kamil Braun
1b0c8e5d99 db: improve documentation for update_tokens and get_saved_tokens in system_keyspace 2019-10-21 11:11:03 +02:00
Kamil Braun
dbca327b46 storage_service: remove storage_service::_is_bootstrap_mode.
The flag did nothing. It was used in one place to check if there's a
bug, but it can easily by proven by reading the code that the check
would never pass.
2019-10-21 11:11:03 +02:00
Kamil Braun
b757a19f84 storage_service: simplify storage_service::bootstrap method
The storage_service::bootstrap method took a parameter: tokens to
bootstrap with. However, this method is only called in one place
(join_token_ring) with only one parameter: _bootstrap_tokens. It doesn't
make sense to call this method anywhere else with any other parameter.

This commit also adds a comment explaining what the method does and
moves it into the private section of storage_service.
2019-10-21 11:11:03 +02:00
Kamil Braun
84b41bd89b storage_service: fix typo in handle_state_moving 2019-10-21 11:11:03 +02:00
Kamil Braun
2ff4f9b8f4 storage_service: remove unnecessary use of stringstream 2019-10-21 11:11:03 +02:00
Kamil Braun
06cc7d409d storage_service: remove redundant call to update_tokens during join_token_ring
When a non-seed node was bootstrapping, system_keyspace::update_tokens
was called twice: first right after the tokens were generated (or
received if we were replacing a different node) in the call to
`bootstrap`, and then later in join_token_ring. The second call was
redundant.

The join_token_ring call was also redundant if we were not bootstrapping
and had tokens saved previously (e.g. when restarting). In that case we
would have read them from LOCAL and then save the same tokens again.

This commit removes the redundant call and inserts calls to
update_tokens where they are necessary, when new tokens are generated.
The aim is to make the code easier to understand.

It also adds a comment which explains why the tokens don't need to be
generated in one of the cases.
2019-10-21 11:11:03 +02:00
Kamil Braun
a223864f81 storage_service: remove storage_service::set_tokens method.
After commit 36ccf72f3c, this method
was used only in one place.
Its name did not make it obvious what it does and when it is safe to call it.
This commit pulls out the code from set_tokens to the point where it was
called (join_token_ring). The code can only be understood in context.

This code was also saving the tokens to the LOCAL table before
retrieving them from this table again. There is no point in doing that:
1. there are no races, since when join_token_ring is running, it is the
only function which can call system_keyspace::update_tokens (which saves them to the
LOCAL table). There cannot be multiple instances of join_token_ring.
2. Even if there was a race, this wouldn't fix anything. The tokens we
retrieve from LOCAL by calling get_local_tokens().get0() could already
be different in the LOCAL table when the get0() returns.
2019-10-21 11:09:59 +02:00
Kamil Braun
36ccf72f3c storage_service: remove is_survey_mode
That was dead, untested code, making it unnecessarily hard
to implement new features.
2019-10-21 10:38:49 +02:00
Kamil Braun
602c7268cc storage_service::handle_state_normal: tokens_to_update* -> owned_tokens
Replace the two variables:
    tokens_to_update_in_metadata
    tokens_to_update_in_system_keyspace
which were exactly the same, with one variable owned_tokens.
The new name describes what the variable IS instead of what it's used for.

Add a comment to clarify what "owned" means: those are the tokens the
node chose and any collision was resolved positively for this node.

Move the variable definition further down in the code, where it's
actually needed.
2019-10-21 10:38:49 +02:00
Kamil Braun
2db07c697f storage_service::handle_state_normal: remove local_tokens_to_remove
That was dead code.
Removing tokens is handled inside remove_endpoint, using the
endpoints_to_remove set.
2019-10-21 10:38:49 +02:00
Kamil Braun
8c8a17a0fe db::system_keyspace::update_tokens: take tokens by const ref 2019-10-21 10:38:49 +02:00
Kamil Braun
00dcea3478 db::system_keyspace::prepare_tokens: make static, take tokens by const ref 2019-10-21 10:38:49 +02:00
Kamil Braun
e4ac4db1c5 token_metadata::update_normal_tokens: take tokens by const ref 2019-10-21 10:38:45 +02:00
Nadav Har'El
765dc86de4 Fix legacy token column handling for local indexes
Merged patch series from Piotr Sarna:

Calculating the select statement for given view_info structure
used to work fine, but once local indexes were introduced, a subtle
bug appeared: the legacy token column does not exist in local indexes
and a valid clustering key column was omitted instead.
That results in potentially incorrect partition slices being used later
in read-before-write.
There's a long term plan for removing select_statement from
view info altogether, but nonetheless the bug needs to be fixed first.

Branch: master, 3.1

Tests: unit(dev) + manual confirmation that a correct legacy column is picked
2019-10-20 16:04:40 +03:00
Nadav Har'El
631846a852 CDC: Implement minimal version that logs only primary key of each change
Merge a patch series from Piotr Jastrzębski (haaawk):

This PR introduces CDC in its minimal version.

It is possible now to create a table with CDC enabled or to enable/disable
CDC on existing table. There is a management of CDC log and description
related to enabling/disabling CDC for a table.

For now only primary key of the changed data is logged.

To be able to co-locate cdc streams with related base table partitions,
we needed to propagate the information about the number of shards per node.
This was done through gossip.

There is an assumption that all the nodes use the same value for
sharding_ignore_msb_bits. If it does not hold, we would have to gossip
sharding_ignore_msb_bits around together with the number of shards.

Fixes #4986.

Tests: unit(dev, release, debug)
2019-10-20 11:41:01 +03:00
Botond Dénes
4aa734f238 scylla-gdb.py: scylla generate_object_graph: use correct obj in edges
Currently, the function that generates the graph edges (and vertices)
with a breadth-first traversal of the object graph accidentally uses the
object that is the starting point of the graph as the "to" part of each
edge. This results in the graph having each of its edges point to the
starting point, as if all objects in it referenced said object directly.
Fix by using the currently examined object instead.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191018113019.95093-1-bdenes@scylladb.com>
2019-10-18 13:48:20 +02:00
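The fixed traversal can be sketched like this (an illustrative model, not the actual scylla-gdb.py code; `refs` is a hypothetical adjacency mapping standing in for scanning memory for references):

```python
from collections import deque

def object_edges(start, refs):
    """Breadth-first walk from `start`, emitting one (referrer, referee)
    edge per reference. refs maps an object to the objects it references."""
    edges, seen, queue = [], {start}, deque([start])
    while queue:
        cur = queue.popleft()            # the currently examined object
        for referee in refs.get(cur, ()):
            # The bug: the "to" endpoint used to be `start` for every
            # edge; the fix uses the current object's actual referee.
            edges.append((cur, referee))
            if referee not in seen:
                seen.add(referee)
                queue.append(referee)
    return edges
```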
Botond Dénes
4dff50b7a4 docs/debugging.md: add 'Troubleshooting' subchapter
To the 'Debugging Scylla with GDB' chapter.
2019-10-18 10:08:23 +03:00
Botond Dénes
77ea086975 docs/debugging.md: add 'Debugging coredumps' subchapter
To the 'Debugging Scylla with GDB' chapter. The '### Debugging
relocatable binaries built with the toolchain' subchapter is demoted to
be just a section in this new subchapter. It is also renamed to
'Relocatable binaries'.
This subchapter is intended to be a complete guide on debugging coredumps,
from obtaining the correct version of all the binaries all the way
to correctly opening the core with GDB.
2019-10-18 10:08:23 +03:00
Pekka Enberg
f01d0e011c Update seastar submodule
* seastar e888b1df...6bcb17c9 (4):
  > iotune: don't crash in sequential read test if hitting EOF
  > Remove FindBoost.cmake from install files
  > Merge "Move reactor backend out of reactor" from Asias
  > fair_queue: Add fair_queue.cc
2019-10-18 08:45:22 +03:00
Piotr Jastrzebski
2b26e3c904 test: change test_partition_key_logging to test_primary_key_logging
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 11:28:23 +02:00
Piotr Jastrzebski
997be35ef3 modification_statement: log in cdc clustering key of a change
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 11:28:23 +02:00
Piotr Jastrzebski
d8718a4ffc test: add test_partition_key_logging
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 11:28:23 +02:00
Piotr Jastrzebski
96c800ed0b modification_statement: log in cdc partition key of a change
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 11:28:23 +02:00
Piotr Jastrzebski
a1edb68b16 test: check that alter table with cdc manages log and desc
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 11:28:23 +02:00
Piotr Jastrzebski
a45c894032 alter_table_statement: handle 'with cdc ='
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 11:28:23 +02:00
Piotr Jastrzebski
629cdb5065 test: check that drop table with cdc removes log and desc
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 11:28:23 +02:00
Piotr Jastrzebski
57c3377b1f cql_test_env: add require_table_does_not_exist assertion
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 11:28:23 +02:00
Piotr Jastrzebski
50d53cd43e drop_table_statement: remove cdc log and desc if cdc is enabled
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 11:28:23 +02:00
Piotr Jastrzebski
b9d6635fc5 test: check that create table with cdc sets up log and desc
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 11:28:23 +02:00
Piotr Jastrzebski
81a34168a3 create_table_statement: handle 'with cdc ='
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 11:28:14 +02:00
Piotr Jastrzebski
6e29f5e826 create_table_statement: prepare announce_migration for cdc
This patch wraps the announce_migration logic in a lambda
that will be used both when cdc is used and when it's not.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 10:55:31 +02:00
Piotr Jastrzebski
a9e43f4e86 test: add test_with_cdc_parameter
At the moment, this test only checks that table
creation and alteration sets cdc_options property
on a table correctly.

Future patches will extend this test to cover more
CDC aspects.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 10:55:31 +02:00
Piotr Jastrzebski
8c6d860402 cql3: add cdc table property
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 10:55:31 +02:00
Piotr Jastrzebski
386221da84 schema_tables: handle 'cdc' options
cdc options will be stored in scylla_tables to preserve
compatibility with Cassandra.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 10:55:31 +02:00
Piotr Jastrzebski
8df942a320 schema_builder: handle schema::_cdc_options
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 10:55:31 +02:00
Piotr Jastrzebski
ca9536a771 schema: add _cdc_options field
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 10:55:31 +02:00
Piotr Jastrzebski
f079dce7b1 snitch: Provide getter for ignore_msb_bits of an endpoint
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 10:55:31 +02:00
Piotr Jastrzebski
afe520ad77 gossip: Add application_state::IGNORE_MSB_BITS
We would like to share with other nodes
the value of ignore_msb_bits property used by the node.

This is needed because CDC will operate on
streams of changes. Each shard on each node
will have its own stream that will be identified
by a stream_id. Stream_id will be selected in
such a way that using stream_id as partition key
will locate partition identified by stream_id on
a node and shard that the stream belongs to.

To be able to generate such stream_id we need
to know ignore_msb_bits property value for each node.

IMPORTANT NOTE: At this point CDC does not support
topology changes. It will work only on a stable cluster.
Support for topology modifications will be added in
later steps.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 10:55:31 +02:00
Piotr Jastrzebski
b9d5851830 snitch: Provide getter for shard_count of an endpoint
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 10:55:31 +02:00
Piotr Jastrzebski
a66d7cfe57 gossip: Add application_state::SHARD_COUNT
We would like to share with other nodes
the number of shards available at the node.

This is needed because CDC will operate on
streams of changes. Each shard on each node
will have its own stream that will be identified
by a stream_id. Stream_id will be selected in
such a way that using stream_id as partition key
will locate partition identified by stream_id on
a node and shard that the stream belongs to.

To be able to generate such stream_id we need
to know how many shards are on each node.

IMPORTANT NOTE: At this point CDC does not support
topology changes. It will work only on a stable cluster.
Support for topology modifications will be added in
later steps.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 10:55:31 +02:00
Piotr Jastrzebski
f7ce8e4f2b cdc: Add flag guarding it's usage
At first, CDC will only be enabled when experimental flag is on.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 10:55:31 +02:00
Tomasz Grabiec
d7c3e48e8c Merge "Prepare modification_statement for LWT" from Kostja
Refactor modification_statement to enable lightweight
transaction implementation.

This patch set re-arranges logic of
modification_statement::get_mutations() and uses
a column mask to identify the columns to prefetch.
It also pre-computes a few modification statement properties
at prepare, assuming the prepared statement is invalidated if
the underlying schema changes.
2019-10-17 10:51:00 +02:00
Konstantin Osipov
5d3bf03811 lwt: pre-compute modification_statement properties at prepare
They are used more extensively with the introduction of lightweight
transactions, and pre-computing makes it easier to reason about
the complexity of the scenarios where they are involved.
2019-10-16 22:44:44 +03:00
Konstantin Osipov
6e0f76ea60 lwt: use column mask to build partition_slice
Pre-compute column mask of columns to prefetch when preparing
a modification statement and use it to build partition_slice
object for read command. Fetch only the required columns.

Lightweight transactions build on this by adding the
columns used in conditions and in the CAS result set to the column
mask of columns to read. Batch statements unite all column
masks to build a single relation for all rows modified by
conditional statements of a batch.
2019-10-16 22:44:37 +03:00
Konstantin Osipov
f32a7a0763 lwt: move option set for modification statement read command
Move the option set for read command to update_parameters
class, since this class encapsulates the logic of working
with the read command result.
2019-10-16 22:41:00 +03:00
Konstantin Osipov
c0f0ab5edd lwt: introduce column mask
Introduce a bitset container which can be used to compute
all columns used in a query.

Add a partition_slice constructor which uses the bitset.
2019-10-16 22:40:55 +03:00
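As a sketch of the idea (an illustrative Python model over ordinal column ids, not the actual C++ bitset container), a column mask might look like:

```python
class ColumnMask:
    """Bitset of column ordinal ids used by a query."""
    def __init__(self):
        self._bits = 0
    def set(self, ordinal_id):
        self._bits |= 1 << ordinal_id
    def test(self, ordinal_id):
        return bool(self._bits >> ordinal_id & 1)
    def union(self, other):
        # batches unite the masks of all their conditional statements
        m = ColumnMask()
        m._bits = self._bits | other._bits
        return m
    def columns(self):
        """Ordinal ids of all set columns, in ascending order."""
        i, bits, out = 0, self._bits, []
        while bits:
            if bits & 1:
                out.append(i)
            bits >>= 1
            i += 1
        return out
```

A partition_slice built from such a mask then fetches only the required columns.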
Konstantin Osipov
a00b9a92b3 lwt: refactor modification statement get_mutations()
Refactor get_mutations() so that the read command and
apply_updates() functions can be used in lightweight transactions.

Move read_command creation to its own method, as well as apply_updates().
Rewrite get_mutations() using the new API.

Avoid unnecessary shared pointers.
2019-10-16 22:32:51 +03:00
Tomasz Grabiec
7b7e4be049 Merge "lwt: introduce column_definition::ordinal_id" from Kostja
Introduce a column definition ordinal_id and use it in boosted
update_parameters::prefetch_data as a column index of a full row.

Lightweight transactions prefetch data and return a result set.
Make sure update_parameters::prefetch_data can serve as a
single representation of prefetched list cells as well as
condition cells and as a CAS result set.

I have a lot of plans for column_definition::ordinal_id: it
simplifies a lot of operations with columns and will also be
used for building a bitset of columns used in a query
or in multiple queries of a batch.
2019-10-16 15:11:10 +02:00
Konstantin Osipov
a2b629c3a1 lwt: boost update_parameters to serve as a CAS result set
In modification_statement/batch_statement, we need to prefetch data to
1) apply list operations
2) evaluate CAS conditions
3) return CAS result set.

Boost update_parameters::prefetch_data to serve as a single result set
for all of the above. In case of a batch, store multiple rows for
multiple clustering keys involved in the batch.

Use an ordered set for columns and rows to make sure 3) CAS result set
is returned to the client in an ordered manner.

Deserialize the primary key and add it to result set rows since
it is returned to the client as part of CAS result set.

Index columns using ordinal_id - this allows having a single
set for all columns and makes columns easy to look up.

Remove an extra memcpy to build view objects when looking
up a cell by primary key, use partition_key/clustering_key
objects for lookup.
2019-10-16 15:56:50 +03:00
Konstantin Osipov
a450c25946 lwt: remove dead code in cql3/update_parameters.hh 2019-10-16 15:48:40 +03:00
Konstantin Osipov
a4ccbece5c lwt: remove an unnecessary optional around prefetch_data
Get rid of an unnecessary optional around
update_parameters::prefetch_data.

update_parameters won't own prefetch_data in the future anyway,
since prefetch_data can be shared among multiple modification
statements of a batch, each statement having its own options
and hence its own update_parameters instance.
2019-10-16 15:48:25 +03:00
Konstantin Osipov
7a399ebe0d lwt: move prefetch_data_builder to update_parameters.cc
Move prefetch_data_builder class from modification_statement.cc
to update_parameters.cc.

We're going to use the same builder to build a result set
for condition evaluation and to apply updates of batch statements, so it
needs to be shared.

No other changes.
2019-10-16 15:48:08 +03:00
Konstantin Osipov
fa73421198 lwt: introduce column_definition::ordinal_id
Make sure every column in the schema, be it a column of partition
key, clustering key, static or regular one, has a unique ordinal
identifier.

This makes it easy to compute the set of columns used in a query,
as well as index row cells.

Allow getting a column definition from the schema by ordinal id.
2019-10-16 15:46:25 +03:00
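The idea can be modelled in a few lines (a hypothetical helper illustrating the ordering only; the real ordering of column kinds is an internal detail of the schema):

```python
def assign_ordinal_ids(partition_key, clustering_key, static_cols, regular_cols):
    """Assign every column, regardless of kind, a unique ordinal id,
    and allow lookup in both directions."""
    ordered = [*partition_key, *clustering_key, *static_cols, *regular_cols]
    by_id = dict(enumerate(ordered))                      # ordinal -> column
    by_name = {name: i for i, name in enumerate(ordered)} # column -> ordinal
    return by_id, by_name
```

With a dense id space like this, "the set of columns used in a query" becomes a plain bitset indexed by ordinal id.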
Avi Kivity
543e6974b9 Merge "Fix Incremental Compaction Efficiency" from Raphael
"
Incremental compaction code to release exhausted sstables was inefficient because
it was basically preventing any release from ever happening. So a new solution is
implemented to make incremental compaction approach actually efficient while
being cautious about not introducing data resurrection. This solution consists of
storing GC'able tombstones in a temporary sstable and keeping it till the end of
compaction. Overhead is avoided by not enabling it for strategies that don't work
with runs composed of multiple fragments.

Fixes #4531.

tests: unit, longevity 1TB for incremental compaction
"

* 'fix_incremental_compaction_efficiency/v6' of https://github.com/raphaelsc/scylla:
  tests: Check that partition is not resurrected on compaction failure
  tests: Add sstable compaction test for gc-only mutation compactor consumer
  sstables: Fix Incremental Compaction Efficiency
2019-10-16 15:15:53 +03:00
Tomasz Grabiec
054b53ac06 Merge "Introduce scylla generate_object_graph and improve scylla find and scylla fiber" from Botond
Introduce `scylla generate_object_graph`, a command which generates a
visual object graph, where vertices are objects and edges are
references. The graph starts from the object specified by the user. The
graph allows visual inspection of the object graph and hopefully allows
the user to identify the object in question.

Add the `--resolve` flag to `scylla find`. When specified, `scylla find`
will attempt to resolve the first pointer in the found objects as a vtable
pointer. If successful the pointer as well as the resolved  symbol will
be added to the listing.

In the listing of `scylla fiber` also print the starting task (as the
first item).
2019-10-15 20:11:16 +02:00
Tomasz Grabiec
c76f905497 Merge "scylla-gdb.py: improve the toolbox for investigating OOMs (but not just)" from Botond
This mini-series contains assorted improvements that I found very useful
while debugging OOM crashes in the past weeks:
* A wrapper for `std::list`.
* A wrapper for `std::variant`.
* Making `scylla find` usable from python code.
* Improvements to `scylla sstables` and `scylla task_histogram`
  commands.
* The `$downcast_vptr()` convenience function.
* The `$dereference_lw_shared_ptr()` convenience function.

Convenience functions in gdb are similar to commands, with some key
differences:
* They have a defined argument list.
* They can return values.
* They can be part of any gdb expression in which functions are allowed.

This makes them very useful for doing operations on values and then
returning them so that the developer can use the results in the gdb shell.
2019-10-15 19:54:09 +02:00
Avi Kivity
acc433b286 mutation_partition: make static_row optional to reduce memory footprint
The static row can be rare: many tables don't have them, and tables
that do will often have mutations without them (if the static row
is rarely updated, it may be present in the cache and in readers,
but absent in memtable mutations). However, it always consumes ~100
bytes of memory, even if it is not present, due to row's overhead.

Change it to be optional by using lazy_row instead of row. Some call
sites treewide were adjusted to deal with the extra indirection.

perf_simple_query appears to improve by 2%, from 163 krps to 165 krps,
though it's hard to be sure due to noisy measurements.

memory_footprint comparisons (before/after):

mutation footprint:		       mutation footprint:
 - in cache:	 1096		        - in cache:	992
 - in memtable:	 854		        - in memtable:	750
 - in sstable:	 351		        - in sstable:	351
 - frozen:	 540		        - frozen:	540
 - canonical:	 827		        - canonical:	827
 - query result: 342		        - query result: 342

 sizeof(cache_entry) = 112	        sizeof(cache_entry) = 112
 -- sizeof(decorated_key) = 36	        -- sizeof(decorated_key) = 36
 -- sizeof(cache_link_type) = 32        -- sizeof(cache_link_type) = 32
 -- sizeof(mutation_partition) = 200    -- sizeof(mutation_partition) = 96
 -- -- sizeof(_static_row) = 112        -- -- sizeof(_static_row) = 8
 -- -- sizeof(_rows) = 24	        -- -- sizeof(_rows) = 24
 -- -- sizeof(_row_tombstones) = 40     -- -- sizeof(_row_tombstones) = 40

 sizeof(rows_entry) = 232	        sizeof(rows_entry) = 232
 sizeof(lru_link_type) = 16	        sizeof(lru_link_type) = 16
 sizeof(deletable_row) = 168	        sizeof(deletable_row) = 168
 sizeof(row) = 112		        sizeof(row) = 112
 sizeof(atomic_cell_or_collection) = 8  sizeof(atomic_cell_or_collection) = 8

Tests: unit (dev)
2019-10-15 15:42:05 +03:00
Avi Kivity
88613e6882 mutation_partition: introduce lazy_row
lazy_row adds indirection to the row class, in order to reduce storage requirements
when the row is not present. The intent is to use it for the static row, which is
not present in many schemas, and is often not present in writes even in schemas that
have a static row.

Indirection is done using managed_ref, which is lsa-compatible.

lazy_row implements most of row's methods, and a few more:
 - get(), get_existing(), and maybe_create(): bypass the abstraction and the
   underlying row
 - some methods that accept a row parameter also have an overload with a lazy_row
   parameter
2019-10-15 15:42:05 +03:00
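A sketch of the indirection in Python (illustrative only — the real lazy_row holds the row through managed_ref so it stays LSA-compatible):

```python
class Row:
    """Stand-in for the heavyweight row: ~100 bytes of bookkeeping."""
    def __init__(self):
        self.cells = {}

class LazyRow:
    """Pay only one pointer-sized slot when the row is absent."""
    __slots__ = ('_row',)
    def __init__(self):
        self._row = None
    def get(self):
        return self._row                      # may be None
    def maybe_create(self):
        if self._row is None:                 # allocate on first write
            self._row = Row()
        return self._row
    def get_existing(self):
        assert self._row is not None, "row was never created"
        return self._row
```

Reads that tolerate absence use get(); write paths use maybe_create(), so mutations without a static row never pay for one.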
Avi Kivity
efe8fa6105 managed_ref: add external_memory_usage()
Like other managed containers, add external_memory_usage() so we can account
for a partition's memory footprint in memtable/cache.
2019-10-15 15:41:42 +03:00
Botond Dénes
71923577a4 docs/debugging.md: fix formatting issues 2019-10-15 14:40:24 +03:00
Botond Dénes
4babd116d8 docs/debugging.md: demote 'Starting GDB' and 'Using GDB'
They really belong to the 'Introduction' chapter, instead of being
separate chapters of their own.
2019-10-15 14:40:20 +03:00
Pekka Enberg
0c1dad0838 Merge "Misc documentation cleanup" from Botond
"Delete README-DPDK.md, move IDL.md to docs/ and fix
docs/review-checklist.md to point to scylla's coding style document,
instead of seastar's."

* 'documentation-cleanup/v3' of https://github.com/denesb/scylla:
  docs/review-checklist.md: point to scylla's coding-style.md instead of seastar's
  docs: mv coding-style.md docs/
  rm README-DPDK.md
  docs: mv IDL.md docs/
2019-10-15 12:53:49 +02:00
Pekka Enberg
b466d7ee33 Merge "Misc documentation cleanup" from Botond
"Delete README-DPDK.md, move IDL.md to docs/ and fix
docs/review-checklist.md to point to scylla's coding style document,
instead of seastar's."

* 'documentation-cleanup/v3' of https://github.com/denesb/scylla:
  docs/review-checklist.md: point to scylla's coding-style.md instead of seastar's
  docs: mv coding-style.md docs/
  rm README-DPDK.md
  docs: mv IDL.md docs/
2019-10-15 08:53:22 +03:00
Benny Halevy
fef3342a34 test: random_schema::make_ckeys: fix infinite loop
Allow returning fewer random clustering keys than requested since
the schema may limit the total number we can generate, for example,
if there is only one boolean clustering column.

Fixes #5161

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-10-15 08:52:39 +03:00
Botond Dénes
544f38ea6d docs/review-checklist.md: point to scylla's coding-style.md instead of seastar's 2019-10-15 08:23:08 +03:00
Botond Dénes
56df6fbd58 docs: mv coding-style.md docs/
It is not discoverable in its current location (root directory) due to
the sheer number of source files in there.
2019-10-15 08:23:08 +03:00
Botond Dénes
c0706e52ce rm README-DPDK.md
Probably a leftover from the era when seastar and scylla shared the same
git repo.
2019-10-15 08:23:01 +03:00
Botond Dénes
061ac53332 docs: mv IDL.md docs/
Documentation should be in docs/.
2019-10-15 08:21:09 +03:00
Piotr Sarna
9e98b51aaa view: fix view_info select statement for local indexes
Calculating the select statement for given view_info structure
used to work fine, but once local indexes were introduced, a subtle
bug appeared: the legacy token column does not exist in local indexes
and a valid clustering key column was omitted instead.
That results in potentially incorrect partition slices being used later
in read-before-write.
There's a long term plan for removing select_statement from
view info altogether, but nonetheless the bug needs to be fixed first.
2019-10-14 17:14:19 +02:00
Piotr Sarna
2ee8c6f595 index: add is_global_index() utility
The helper function is useful for determining if given schema
represents a global index.
2019-10-14 17:13:32 +02:00
Botond Dénes
b2e10a3f2f scylla-gdb.py: introduce scylla generate_object_graph
When investigating OOM:s a prominent pattern is a size class that is
exploded, using up most of the available memory alone. If one is lucky,
the objects causing the OOM are instances of some virtual class, making
their identification easy. Other times the objects are referenced by
instances of some virtual class, allowing their identification with some
work. However there are cases where neither these objects nor their
direct referrers are instances of virtual classes. This is the case
`scylla generate_object_graph` intends to help with.

scylla generate_object_graph, as its name suggests, generates the
object graph of the requested object. The object graph is a directed
graph, where vertices are objects and edges are references between them,
going from referrers to the referee. The vertices contain information,
like the address of the object, its size, whether it is live or not,
and, if applicable, the address and symbol name of its vtable. The edges
contain the list of offsets the referrer has references at. The
generated graph is an image, which allows the visual inspection of the
object graph, allowing the developer to notice patterns and hopefully
identify the problematic objects.

The graph is generated with the help of `graphviz`. The command
generates `.dot` files which can be converted to images with the help of
the `dot` utility. The command can do this if the output file is one of
the supported image formats (e.g. `png`), otherwise only the `.dot` file
is generated, leaving the actual image generation to the user.
2019-10-14 16:21:18 +03:00
Botond Dénes
f9e8e54603 scylla-gdb.py: boost scylla find
Add `--resolve` flag, which will make the command attempt to resolve the
first pointer of the found objects as a vtable pointer. If this is
successful the vtable pointer as well as the symbol name will be added
to the listing. This in particular makes backtracing continuation chains
a breeze, as the continuation object the searched one depends on can be
found at a glance in the resulting listing (instead of having to manually
probe each item).

The arguments of `scylla find` are now parsed via `argparse`. While at
it, support for all the size classes supported by the underlying `find`
command was added, in addition to `w` and `g`. However, the syntax of
specifying the size class to use has been changed, it now has to be
specified with the `-s|--size` command line argument, instead of passing
`-w` or `-g`.
2019-10-14 16:21:18 +03:00
Botond Dénes
0773104f32 scylla_fiber: also print the task that is the starting point of the fiber
Or in other words, the task that is the argument of the search. Example:
    (gdb) scylla fiber 0x60001a305910
    Starting task: (task*) 0x000060001a305910 0x0000000004aa5260 vtable for seastar::continuation<...> + 16
    #0  (task*) 0x0000600016217c80 0x0000000004aa5288 vtable for seastar::continuation<...> + 16
    #1  (task*) 0x000060000ac42940 0x0000000004aa2aa0 vtable for seastar::continuation<...> + 16
    #2  (task*) 0x0000600023f59a50 0x0000000004ac1b30 vtable for seastar::continuation<...> + 16
2019-10-14 13:36:25 +03:00
Botond Dénes
1a8846c04a scylla-gdb.py: move the code finding text_start and text_end to get_text_range()
This code is currently duplicated in `find_vptrs()` and
`scylla_task_histogram`. Refactor it out into a function.
The code is also improved in two ways:
* Make the search stricter, ensuring (hopefully) that indeed the
  executable's text section is found, not that of the first object in
  the `gdb file` listing.
* Throw an exception in the case when the search fails.
2019-10-14 13:25:28 +03:00
Raphael S. Carvalho
7f1a2156c7 table: Don't account for shared SSTables in compaction backlog tracker
We don't want to add shared sstables to table's backlog tracker because:
1) table's backlog tracker has only an influence on regular compaction
2) shared sstables are never regular compacted, they're worked by
resharding which has its own backlog tracker.

Such sstables belong to more than one shard, meaning that currently
they're added to the backlog tracker of every shard that owns them.
But such sstables end up being resharded on a shard that may be
completely random, so increasing the backlog of all the shards they
belong to won't lead to faster resharding. Also, table's
backlog tracker is supposed to deal only with regular compaction.

Accounting for shared sstables in table's tracker may lead to incorrect
speed up of regular compactions because the controller is not aware
that some relevant part of the backlog is due to pending resharding.
The fix is about ignoring sstables that will be resharded, letting the
table's backlog tracker account only for sstables that can be worked on
by regular compaction, and relying on resharding controlling itself
with its own tracker.
NOTE: this doesn't fix the resharding controlling issue completely,
as described in #4952. We'll still need to throttle regular compaction
on behalf of resharding. So subsequent work may be about:
- move resharding to its own priority class, perhaps streaming.
- make resharding's backlog tracker account for sstables in all of
its pending jobs, not only the ongoing ones (currently limited to 1 per shard).
- limit compaction shares when resharding is in progress.
This only fixes the issue that the controller for regular compaction
shouldn't account for sstables completely exclusive to resharding.

Fixes #5077.
Refs #4952.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190924022109.17400-1-raphaelsc@scylladb.com>
2019-10-13 10:14:13 +03:00
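The accounting change reduces to a one-line rule, sketched here as an illustrative Python model (the field names are hypothetical; the real controller math is more involved):

```python
def regular_compaction_backlog(sstables):
    """Only sstables owned exclusively by this shard count towards the
    regular-compaction backlog; shared sstables await resharding and
    are accounted by resharding's own tracker."""
    return sum(s['size'] for s in sstables if not s['shared'])
```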
Raphael S. Carvalho
88611d41d0 sstables: Fix major compaction's space amplification with incremental compaction
Incremental compaction efficiency depends on all references to compacted
sstables being released, because the file descriptors of sstable
components are only closed once the sstable object is destructed.
Incremental compaction is not working for major compaction because
references to released sstables are being kept in the compaction manager,
which prevents their disk usage from being released. So the space amplification would be
the same as with a non-incremental approach, i.e. needs twice the amount of
used disk space for the table(s). With this issue fixed, the database now
becomes very major compaction friendly, the space requirement becoming very
low, a constant which is roughly number of fragments being currently compacted
multiplied by fragment size (1GB by default), for each table involved.

Fixes #5140.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20191003211927.24153-1-raphaelsc@scylladb.com>
2019-10-13 09:55:11 +03:00
Raphael S. Carvalho
17c66224f7 tests: Check that partition is not resurrected on compaction failure
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2019-10-13 00:06:51 -03:00
Raphael S. Carvalho
6301a10fd7 tests: Add sstable compaction test for gc-only mutation compactor consumer
Make sure gc'able-tombstone-only sstable is properly generated with data that
comes from regular compaction's input sstable.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2019-10-12 21:38:53 -03:00
Raphael S. Carvalho
91260cf91b sstables: Fix Incremental Compaction Efficiency
Compaction prevents data resurrection from happening by checking that there's
no way a data shadowed by a GC'able tombstone will survive alone, after
a failure for example.

Consider the following scenario:
We have two runs A and B, each divided to 5 fragments, A1..A5, B1..B5.

They have the following token ranges:

 A:  A1=[0, 3]   A2=[4, 7]  A3=[8, 11]   A4=[12, 15]    A5=[16,18]
B is the same as A's ranges, offset by 1:

 B:  B1=[1,4]    B2=[5,8]  B3=[9,12]    B4=[13,16]    B5=[17,19]

Let's say we are finished flushing output until position 10 in the compaction.
We are currently working on A3 and B3, so obviously those cannot be deleted.
Because B2 overlaps with A3, we cannot delete B2 either.
Otherwise, B2 could have a GC'able tombstone that shadows data in A3, and after
B2 is gone, dead data in A3 could be resurrected *on failure*.
Now, A2 overlaps with B2 which we couldn't delete yet, so we can't delete A2.
Now A2 overlaps with B1 so we can't delete B1. And B1 overlaps with A1 so
we can't delete A1. So we can't delete any fragment.

The problem with this approach is obvious, fragments can potentially not be
released due to data dependency, so incremental compaction efficiency is
severely reduced.
To fix it, let's not purge GC'able tombstones right away in the mutation
compactor step. Instead, let's have compaction write them to a separate
sstable run that will be deleted at the end of compaction.
By making sure that tombstone information from all compacting sstables is not
lost, we no longer need to have incremental compaction imposing lots of
restrictions on which fragments can be released. Now, any sstable whose data
is safe in a new sstable can be released right away. In addition, incremental
compaction will only take place if compaction procedure is working with one
multi-fragment sstable run at least.

Fixes #4531.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2019-10-12 21:36:03 -03:00
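With the tombstones preserved in a separate run, the release rule collapses to a simple range check against the compaction position. A sketch over the A/B example above (illustrative model, not the compaction code):

```python
def releasable_fragments(runs, compacted_up_to):
    """A fragment can be released once compaction output has safely
    advanced past its whole token range; no cross-run overlap
    dependency remains, because GC'able tombstones are kept in a
    temporary sstable until the end of compaction."""
    return [(start, end)
            for run in runs
            for start, end in run
            if end < compacted_up_to]

A = [(0, 3), (4, 7), (8, 11), (12, 15), (16, 18)]
B = [(1, 4), (5, 8), (9, 12), (13, 16), (17, 19)]
released = releasable_fragments([A, B], 10)
```

At position 10 this releases A1, A2, B1 and B2, whereas the old overlap-chain reasoning released nothing.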
Kamil Braun
ef9d5750c8 view: fix bug in virtual columns.
When creating a virtual column of non-frozen map type,
the wrong type was used for the map's keys.

Fixes #5165.
2019-10-11 20:47:06 +03:00
Avi Kivity
f12feec2c9 Update seastar submodule
* seastar 1f68be436f...e888b1df9c (8):
  > sharded: Make map work with mapper that returns a future
  > cmake: Remove FindBoost.cmake
  > Reduce noncopyable_function instruction cache footprint
  > doc: add Loops section to the tutorial
  > Merge "Move file related code out of reactor" from Asias
  > Merge "Move the io_queue code out of reactor" from Asias
  > cmake: expose seastar_perf_testing lib
  > future: class doc: explain why discarding a future is bad

 - main.cc now includes new file io_queue.hh
 - perf tests now include seastar perf utilities via user, not
   system, includes since those are not exported
2019-10-10 18:17:28 +03:00
Nadav Har'El
33027a36b4 alternator: Add authorization
Merged patch set from Piotr Sarna:

Refs #5046

This commit adds handling "Authorization:" header in incoming requests.
The signature sent in the authorization is recomputed server-side
and compared with what the client sent. In case of a mismatch,
UnrecognizedClientException is returned.
The signature computation is based on boto3 Python implementation
and uses gnutls to compute HMAC hashes.

This series is rebased on a previous HTTPS series in order to ease
merging these two. As such, it depends on the HTTPS series being
merged first.

Tests: alternator(local, remote)

The series also comes with a simple authorization test and a docs update.

Piotr Sarna (6):
  alternator: migrate split() function to string_view
  alternator: add computing the auth signature
  config: add alternator_enforce_authorization entry
  alternator: add verifying the auth signature
  alternator-test: add a basic authorization test case
  docs: update alternator authorization entry

 alternator-test/test_authorization.py |  34 ++++++++
 configure.py                          |   1 +
 alternator/{server.hh => auth.hh}     |  22 ++---
 alternator/server.hh                  |   3 +-
 db/config.hh                          |   1 +
 alternator/auth.cc                    |  88 ++++++++++++++++++++
 alternator/server.cc                  | 112 +++++++++++++++++++++++---
 db/config.cc                          |   1 +
 main.cc                               |   2 +-
 docs/alternator/alternator.md         |   7 +-
 10 files changed, 241 insertions(+), 30 deletions(-)
 create mode 100644 alternator-test/test_authorization.py
 copy alternator/{server.hh => auth.hh} (58%)
 create mode 100644 alternator/auth.cc
2019-10-10 15:57:46 +03:00
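The server-side recomputation follows the standard AWS SigV4 key-derivation chain; a sketch with Python's hmac instead of gnutls (the parameter values in the usage below are placeholders, not real credentials, and the function names are hypothetical):

```python
import hashlib
import hmac

def signing_key(secret, date, region, service):
    """Standard SigV4 derivation: HMAC-chain the secret through the
    date, region, service and the literal 'aws4_request'."""
    def h(key, msg):
        return hmac.new(key, msg.encode(), hashlib.sha256).digest()
    k = h(('AWS4' + secret).encode(), date)
    for part in (region, service, 'aws4_request'):
        k = h(k, part)
    return k

def sign(secret, date, region, service, string_to_sign):
    key = signing_key(secret, date, region, service)
    return hmac.new(key, string_to_sign.encode(), hashlib.sha256).hexdigest()

def verify(client_sig, *args):
    """Recompute the signature server-side and compare in constant time;
    a mismatch maps to UnrecognizedClientException."""
    return hmac.compare_digest(client_sig, sign(*args))
```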
Nadav Har'El
df62499710 docs/isolation.md: copy-edit
Minor spelling and syntax corrections. No new content or semantic changes.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191010093457.20439-1-nyh@scylladb.com>
2019-10-10 15:17:28 +03:00
Piotr Dulikowski
c04e8c37aa distributed_loader: populate non-system keyspaces in parallel
Before this change, when populating non-system keyspaces, each data
directory was scanned and for each entry (keyspace directory),
a keyspace was populated. This was done in a serial fashion - populating
of one keyspace was not started until the previous one was done.

Loading keyspaces in such a fashion can introduce unnecessary waiting
when there is a large number of keyspaces in one data directory. The
population process is I/O intensive and barely uses CPU.

This change enables parallel loading of keyspaces per data directory.
Populating the next keyspace does not wait for the previous one.
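
The serial-to-parallel change can be illustrated with a toy asyncio model (asyncio here stands in for Seastar's futures; the names are illustrative, not the real distributed_loader code):

```python
import asyncio

async def populate_keyspace(name: str, log: list) -> None:
    # Stand-in for the I/O-heavy population of one keyspace directory.
    log.append(("start", name))
    await asyncio.sleep(0)  # yield, as real population would on I/O
    log.append(("done", name))

async def populate_serial(keyspaces, log):
    for ks in keyspaces:  # next keyspace waits for the previous one
        await populate_keyspace(ks, log)

async def populate_parallel(keyspaces, log):
    # All keyspaces in a data directory are populated concurrently.
    await asyncio.gather(*(populate_keyspace(ks, log) for ks in keyspaces))
```

In the parallel version every keyspace starts before any finishes, which is what hides the per-keyspace I/O latency.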

A benchmark was performed measuring startup time, with the following
setup:
  - 1 data directory,
  - 200 keyspaces,
  - 2 tables in each keyspace, with the following schema:
      CREATE TABLE tbl (a int, b int, c int, PRIMARY KEY(a, b))
        WITH CLUSTERING ORDER BY (b DESC),
  - 1024 rows in each table, with values (i, 2*i, 3*i) for i in 0..1023,
  - ran on 6-core virtual machine running on i7-8750H CPU,
  - compiled in dev mode,
  - parameters: --smp 6 --max-io-requests 4 --developer-mode=yes
      --datadir $DIR --commitlog-directory $DIR
      --hints-directory $DIR --view-hints-directory $DIR

The benchmark tested:
  - boot time, by comparing timestamp of the first message in log,
    and timestamp of the following message:
      "init - Scylla version ... initialization completed."
  - keyspace population time, by comparing timestamps of messages:
      "init - loading non-system sstables"
    and
      "init - starting view update generator"

The benchmark was run 5 times for sequential and parallel version,
with the following results:
  - sequential: boot 31.620s, keyspace population 6.051s
  - parallel:   boot 29.966s, keyspace population 4.360s

Keyspace population time decreased by ~27.95%, and overall boot time
by about ~5.23%.

Tests: unit(release)

Fixes #2007
2019-10-10 15:12:23 +03:00
Piotr Sarna
6ca55d3c83 docs: update alternator authorization entry
The entry now contains a comment that computing a signature works,
but is still based on a hardcoded key.
2019-10-10 13:51:00 +02:00
Piotr Sarna
23798b7301 alternator-test: add a basic authorization test case
The test case ensures that passing wrong credentials results
in an UnrecognizedClientException.
2019-10-10 13:51:00 +02:00
Piotr Sarna
97cbb9a2c7 alternator: add verifying the auth signature
The signature sent in the "Authorization:" header is now verified
by computing the signature server-side with a matching secret key
and confirming that the signatures match.
Currently the secret key is hardcoded to be "whatever" in order
to work with current tests, but it should be replaced
by a proper key store.

Refs #5046
2019-10-10 13:51:00 +02:00
Piotr Sarna
e245b54502 config: add alternator_enforce_authorization entry
The config entry will be used to turn authorization for alternator
requests on and off. The default is currently off, since the key store
is not implemented yet.
2019-10-10 13:51:00 +02:00
Piotr Sarna
589a22d078 alternator: add computing the auth signature
A function for computing the auth signature from user requests
is added, along with helper functions. The implementation
is based on gnutls's HMAC.

Refs #5046
2019-10-10 13:51:00 +02:00
Piotr Sarna
ca58b46b4c alternator: migrate split() function to string_view
The implementation of string split was based on the sstring type for
simplicity, but it turns out that the more generic std::string_view
will be beneficial later to avoid unneeded string copying.
Unfortunately boost::split does not cooperate well with string views,
so a simple manual implementation is provided instead.
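
The copy-avoiding idea behind a string_view-based split can be mimicked in Python by yielding offsets instead of substrings (an illustration of the intent only, not the actual C++ code):

```python
def split_views(s: str, sep: str):
    """Yield (start, end) offsets of sep-delimited fields in s.

    Mimics a std::string_view-based split: callers slice only the
    pieces they actually need, so no copies are made up front.
    """
    start = 0
    while True:
        end = s.find(sep, start)
        if end == -1:
            yield (start, len(s))
            return
        yield (start, end)
        start = end + len(sep)
```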
2019-10-10 13:50:59 +02:00
Botond Dénes
52afbae1e5 README.md: add links to other documentation sources
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191010103926.34705-3-bdenes@scylladb.com>
2019-10-10 14:15:01 +03:00
Botond Dénes
e52712f82c docs: add README.md
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191010103926.34705-2-bdenes@scylladb.com>
2019-10-10 14:14:09 +03:00
Amnon Heiman
64c2d28a7f database: Add counter for the number of schema changes
Schema changes can have big effects on performance, typically it should
be a rare event.

It is useful to monitor how frequently the schema changes.
This patch adds a counter that increases each time the schema changes.

After this patch the metrics would look like:

scylla_database_schema_changed{shard="0",type="derive"} 2

Fixes #4785

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2019-10-08 17:54:49 +02:00
Asias He
b89ced4635 streaming: Do not open rpc stream connection if reader has no data
We can use reader::peek() to check if the reader contains any data.
If not, do not open the rpc stream connection. It helps to reduce the
port usage.
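
The peek-then-connect pattern, in a toy Python model (names hypothetical; the real code peeks a flat_mutation_reader before opening an rpc stream):

```python
import itertools

def stream_if_nonempty(reader, open_connection):
    """Open a (costly) connection only if `reader` yields at least one item.

    Mirrors the idea of reader::peek(): look at the first fragment, and
    skip opening the rpc stream entirely for empty readers.
    """
    it = iter(reader)
    try:
        first = next(it)  # the "peek"
    except StopIteration:
        return 0          # no data: no connection opened
    conn = open_connection()
    sent = 0
    for item in itertools.chain([first], it):  # put the peeked item back
        conn.append(item)
        sent += 1
    return sent
```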

Refs: #4943
2019-10-08 10:31:02 +02:00
Konstantin Osipov
94006d77b1 lwt: add cas_contention_timeout_in_ms to config
Make the default conform to the origin.
Message-Id: <20191006154532.54856-3-kostja@scylladb.com>
2019-10-08 00:02:35 +02:00
Konstantin Osipov
383e17162a lwt: implement query_options::check_serial_consistency()
Both in a single-statement transaction and in a batch
we expect that serial consistency is provided. Move the
check to query_options class and make it available for
reuse.

Keep get_serial_consistency() around for use in
transport/server.cc.
Message-Id: <20191006154532.54856-2-kostja@scylladb.com>
2019-10-08 00:02:35 +02:00
Piotr Sarna
36a1905e98 storage_proxy: handle unstarted write cancelling
When another node is reported to be down, view updates queued
for it are cancelled, but some of them may already be initiated.
Right now, cancelling such a write resulted in an exception,
but on conceptual level it's not really an exception, since
this behaviour is expected.
Previous version of this patch was based on introducing a special
exception type that was later handled specially, but it's not clear
if it's a good direction. Instead, this patch simply makes this
path non-exceptional, as was originally done by Nadav in the first
version of the series that introduced handling unstarted write
cancellations. Additionally, a message containing the information
that a write is cancelled is logged with debug level.
2019-10-07 16:55:36 +03:00
Vladimir Davydov
e8bcb34ed4 api: drop /storage_proxy/metrics/cas_read/condition_not_met
There's no such metric in Cassandra (although Cassandra's docs mistakenly
say it exists). Having it would make no sense anyway so let's drop it.

Message-Id: <b4f7a6ad278235c443cb8ea740bfa6399f8e4ee1.1570434332.git.vdavydov@scylladb.com>
2019-10-07 16:54:39 +03:00
Piotr Sarna
5ab134abef alternator-test: update HTTPS section of README
README.md has 3 fixes applied:
 - s/alternator_tls_port/alternator_https_port
 - conf directory is mentioned more explicitly
 - it now correctly states that the self-signed certificate
   warning *is* explicitly ignored in tests
Message-Id: <e5767f7dbea260852fc2fa9b613e1bebf490cc78.1570444085.git.sarna@scylladb.com>
2019-10-07 14:51:16 +03:00
Avi Kivity
8ed6f94a16 Merge "Fix handling of schema alters and eviction in cache" from Tomasz
"
Fixes #5134, Eviction concurrent with preempted partition entry update after
  memtable flush may allow stale data to be populated into cache.

Fixes #5135, Cache reads may miss some writes if schema alter followed by a
  read happened concurrently with preempted partition entry update.

Fixes #5127, Cache populating read concurrent with schema alter may use the
  wrong schema version to interpret sstable data.

Fixes #5128, Reads of multi-row partitions concurrent with memtable flush may
  fail or cause a node crash after schema alter.
"

* tag 'fix-cache-issues-with-schema-alter-and-eviction-v2' of github.com:tgrabiec/scylla:
  tests: row_cache: Introduce test_alter_then_preempted_update_then_memtable_read
  tests: row_cache_stress_test: Verify all entries are evictable at the end
  tests: row_cache_stress_test: Exercise single-partition reads
  tests: row_cache_stress_test: Add periodic schema alters
  tests: memtable_snapshot_source: Allow changing the schema
  tests: simple_schema: Prepare for schema altering
  row_cache: Record upgraded schema in memtable entries during update
  memtable: Extract memtable_entry::upgrade_schema()
  row_cache, mvcc: Prevent locked snapshots from being evicted
  row_cache: Make evict() not use invalidate_unwrapped()
  mvcc: Introduce partition_snapshot::touch()
  row_cache, mvcc: Do not upgrade schema of entries which are being updated
  row_cache: Use the correct schema version to populate the partition entry
  delegating_reader: Optimize fill_buffer()
  row_cache, memtable: Use upgrade_schema()
  flat_mutation_reader: Introduce upgrade_schema()
2019-10-07 14:43:36 +03:00
Nadav Har'El
f2f0f5eb0f alternator: add https support
Merged patch series from Piotr Sarna:

This series adds HTTPS support for Alternator.
The series comes with --https option added to alternator-test, which makes
the test harness run all the tests with HTTPS instead of HTTP. All the tests
pass, albeit with security warnings that a self-signed x509 certificate was
used and it should not be trusted.

Fixes #5042
Refs scylladb/seastar#685

Patches:
  docs: update alternator entry on HTTPS
  alternator-test: suppress the "Unverified HTTPS request" warning
  alternator-test: add HTTPS info to README.md
  alternator-test: add HTTPS to test_describe_endpoints
  alternator-test: add --https parameter
  alternator: add HTTPS support
  config: add alternator HTTPS port
2019-10-07 12:38:20 +03:00
Avi Kivity
969113f0c9 Update seastar submodule
* seastar c21a7557f9...1f68be436f (6):
  > scheduling: Add per scheduling group data support
  > build: Include dpdk as a single object in libseastar.a
  > sharded: fix foreign_ptr's move assignment
  > build: Fix DPDK libraries linking in pkg-config file
  > http server: https using tls support
  > Make output_stream blurb Doxygen
2019-10-07 12:18:49 +03:00
Nadav Har'El
754add1688 alternator: fix Expected's BEGINS_WITH error handling
The BEGINS_WITH condition in conditional updates (via Expected) requires
that the given operand be either a string or a binary. Any other operand
should result in a validation exception - not a failed condition as we
generate now.

This patch fixes the test for this case so it will succeed against
Amazon DynamoDB (before this patch it fails - this failure was masked by
a typo before commit 332ffa77ea). The patch
then fixes our code to handle this case correctly.

Note that BEGINS_WITH handling of wrong types is now asymmetrical: A bad
type in the operand is now handled differently from a bad type in the
attribute's value. We add another check to the test to verify that this
is the case.

Fixes #5141

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191006080553.4135-1-nyh@scylladb.com>
2019-10-06 17:16:55 +03:00
Botond Dénes
d0fa5dc34d scylla-gdb.py: introduce the downcast_vptr convenience function
When debugging, one constantly has to inspect objects for which only
a "virtual pointer" is available, that is, a pointer that points to a
common parent class or interface.
Finding the concrete type and downcasting the pointer is easy enough but
why do it manually when it is possible to automate it trivially?
$downcast_vptr() returns any virtual pointer given to it, casted to the
actual concrete object.
Example:
    (gdb) p $1
    $2 = (flat_mutation_reader::impl *) 0x60b03363b900
    (gdb) p $downcast_vptr(0x60b03363b900)
    $3 = (combined_mutation_reader *) 0x60b03363b900
    # The return value can also be dereferenced on the spot.
    (gdb) p *$downcast_vptr($1)
    $4 = {<flat_mutation_reader::impl> = {_vptr.impl = 0x46a3ea8 <vtable
    for combined_mutation_reader+16>, _buffer = {_impl = {<std::al...
2019-10-04 17:45:47 +03:00
Botond Dénes
434a41d39b scylla-gdb.py: introduce the dereference_lw_shared_ptr convenience function
Dereferencing a `seastar::lw_shared_ptr` is another tedious manual
task. The stored pointer (`_p`) has to be cast to the right subclass
of `lw_shared_ptr_counter_base`, which involves inspecting the code,
then writing a cast expression that gdb is willing to parse. This
is something machines are so much better at doing.
`$dereference_lw_shared_ptr` returns a pointer to the actual pointed-to
object, given an instance of `seastar::lw_shared_ptr`.
Example:
    (gdb) p $1._read_context
    $2 = {_p = 0x60b00b068600}
    (gdb) p $dereference_lw_shared_ptr($1._read_context)
    $3 = {<seastar::enable_lw_shared_from_this<cache::read_context>>
    = {<seastar::lw_shared_ptr_counter_base> = {_count = 1}, ...
2019-10-04 17:45:47 +03:00
Botond Dénes
f5de002318 scylla-gdb.py: scylla_sstables: also print the sstable filename
And expose the method that obtains the file name of an sstable object
to python code.
2019-10-04 17:45:32 +03:00
Botond Dénes
ad7a668be9 scylla-gdb.py: scylla_task_histogram: expose internal parameters
Make all the parameters of the sampling tweakable via command line
arguments. I strove to keep full backward compatibility, but due to the
limitations of `argparse` there is one "breaking" change. The optional
positional size argument is now a non-positional argument as `argparse`
doesn't support optional positional arguments.
Added documentation for both the command itself as well as for all the
arguments.
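
The `argparse` limitation mentioned above is why size became a named argument; an illustrative sketch (the flag names here are hypothetical, not necessarily the ones scylla-gdb.py actually uses):

```python
import argparse

# Before: `scylla_task_histogram [size]` took an optional positional size.
# With several tweakable parameters, an optional positional becomes
# ambiguous to argparse, so it is moved to a named argument instead.
parser = argparse.ArgumentParser(prog="scylla_task_histogram")
parser.add_argument("-m", "--size", type=int, default=0,
                    help="object size to sample (0 means any size)")
parser.add_argument("-c", "--count", type=int, default=30,
                    help="number of top entries to print")

args = parser.parse_args(["--size", "64"])
```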
2019-10-04 17:44:40 +03:00
Botond Dénes
7767cc486e scylla-gdb.py: make scylla_find usable from python code 2019-10-04 17:44:40 +03:00
Botond Dénes
9cdea440ef scylla-gdb.py: add std_variant, a wrapper for std::variant
Allows conveniently obtaining the active member via calling `get()`.
2019-10-04 17:44:40 +03:00
Botond Dénes
55e9097dd9 scylla-gdb.py: add std_list, a wrapper for an std::list
std_list makes an `std::list` instance accessible from python code just
like a regular (read-only) python container.
2019-10-04 17:44:40 +03:00
Botond Dénes
b8f0b3ba93 std_optional: fix get()
Apparently there is now another layer of indirection: `std::_Storage`.
2019-10-04 17:43:40 +03:00
Tomasz Grabiec
020a537ade tests: row_cache: Introduce test_alter_then_preempted_update_then_memtable_read 2019-10-04 11:38:13 +02:00
Tomasz Grabiec
ebedefac29 tests: row_cache_stress_test: Verify all entries are evictable at the end 2019-10-04 11:38:12 +02:00
Tomasz Grabiec
1b95f5bf60 tests: row_cache_stress_test: Exercise single-partition reads
make_single_key_reader() currently doesn't actually create
single-partition readers because it doesn't set
mutation_reader::forwarding::no when it creates individual
readers. The readers will default to mutation_reader::forwarding::yes
and actually create scanning readers in preparation for
fast-forwarding across partitions.

Fix by passing mutation_reader::forwarding::no.
2019-10-04 11:38:12 +02:00
Tomasz Grabiec
81dd17da4e tests: row_cache_stress_test: Add periodic schema alters
Reproduces #5127.
2019-10-03 22:03:29 +02:00
Tomasz Grabiec
2fc144e1a8 tests: memtable_snapshot_source: Allow changing the schema 2019-10-03 22:03:29 +02:00
Tomasz Grabiec
22dde90dba tests: simple_schema: Prepare for schema altering
Currently, methods of simple_schema assume that table's schema doesn't
change. Accessors like get_value() assume that rows were generated
using simple_schema::_s. Because of that, the column_definition& for
the "v" column is cached in the instance. That column_definition&
cannot be used to access objects created with a different schema
version. To allow using simple_schema after schema changes,
column_definition& caching is now tagged with the table schema version
of origin. Methods which access schema-dependent objects, like
get_value(), are now accepting schema& corresponding to the objects.

Also, it's now possible to tell simple_schema to use a different
schema version in its generator methods.
2019-10-03 22:03:29 +02:00
Tomasz Grabiec
e6afc89735 row_cache: Record upgraded schema in memtable entries during update
Cache update may defer in the middle of moving of partition entry
from a flushed memtable to the cache. If the schema was changed since
the entry was written, it upgrades the schema of the partition_entry
first but doesn't update the schema_ptr in memtable_entry. The entry
is removed from the memtable afterward. If a memtable reader
encounters such an entry, it will try to upgrade it assuming it's
still at the old schema.

That is undefined behavior in general, which may include:

 - read failures due to bad_alloc, if fixed-size cells are interpreted
   as variable-sized cells, and we misinterpret a value for a huge
   size

 - wrong read results

 - node crash

This doesn't result in a permanent corruption, restarting the node
should help.

The more rows there are in a partition, the more likely it is to
happen. It's unlikely to happen with single-row partitions.

Introduced in 70c7277.

Fixes #5128.
2019-10-03 22:03:29 +02:00
Tomasz Grabiec
ea461a3884 memtable: Extract memtable_entry::upgrade_schema() 2019-10-03 22:03:29 +02:00
Tomasz Grabiec
90d6c0b9a2 row_cache, mvcc: Prevent locked snapshots from being evicted
If the whole partition entry is evicted while being updated from the
memtable, a subsequent read may populate the partition using the old
version of data if it attempts to do it before cache update advances
past that partition. Partial eviction is not affected because
populating reads will notice that there is a newer snapshot
corresponding to the updater.

This can happen only in OOM situations where the whole cache gets evicted.

Affects only tables with multi-row partitions, which are the only ones
that can experience the update of partition entry being preempted.

Introduced in 70c7277.

Fixes #5134.
2019-10-03 22:03:29 +02:00
Tomasz Grabiec
57a93513bd row_cache: Make evict() not use invalidate_unwrapped()
invalidate_unwrapped() calls cache_entry::evict(), which cannot be
called concurrently with cache update. invalidate() serializes it
properly by calling do_update(), but evict() doesn't. The purpose of
evict() is to stress eviction in tests, which can happen concurrently
with cache update. Switch it to use memory reclaimer, so that it's
both correct and more realistic.

evict() is used only in tests.
2019-10-03 22:03:28 +02:00
Tomasz Grabiec
c88a4e8f47 mvcc: Introduce partition_snapshot::touch() 2019-10-03 22:03:28 +02:00
Tomasz Grabiec
25e2f87a37 row_cache, mvcc: Do not upgrade schema of entries which are being updated
When a read enters a partition entry in the cache, it first upgrades
it to the current schema of the cache. The same happens when an entry
is updated after a memtable flush. Upgrading the entry is currently
performed by squashing all versions and replacing them with a single
upgraded version. That has a side effect of detaching all snapshots
from the partition entry. Partition entry update on memtable flush is
writing into a snapshot. If that snapshot is detached by a schema
upgrade, the entry will be missing writes from the memtable which fall
into continuous ranges in that entry which have not yet been updated.

This can happen only if the update of the entry is preempted and the
schema was altered during that, and a read hit that partition before
the update went past it.

Affects only tables with multi-row partitions, which are the only ones
that can experience the update of partition entry being preempted.

The problem is fixed by locking updated entries and not upgrading
schema of locked entries. cache_entry::read() is prepared for this,
and will upgrade on-the-fly to the cache's schema.

Fixes #5135
2019-10-03 22:03:28 +02:00
Tomasz Grabiec
0675088818 row_cache: Use the correct schema version to populate the partition entry
The sstable reader which populates the partition entry in the cache is
using the schema of the partition entry snapshot, which will be the
schema of the cache at the time the partition was entered. If there
was a schema change after the cache reader entered the partition but
before it created the sstable reader, the cache populating reader will
interpret sstable fragments using the wrong schema version. That is
more likely if partitions have many rows, and the front of the
partition is populated. With single-row partitions that's unlikely to
happen.

That is undefined behavior in general, which may include:

 - read failures due to bad_alloc, if fixed-size cells are
   interpreted as variable-sized cells, and we misinterpret
   a value for a huge size

 - wrong read results

 - node crash

This doesn't result in a permanent corruption, restarting the node
should help.

Fixes #5127.
2019-10-03 22:03:28 +02:00
Tomasz Grabiec
10992a8846 delegating_reader: Optimize fill_buffer()
Use move_buffer_content_to() which is faster than fill_buffer_from()
because it doesn't involve popping and pushing the fragments across
buffers. We save on size estimation costs.
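
The difference between the two approaches, in a deque-based toy model (illustrative only; the real buffers hold mutation fragments and track memory accounting):

```python
from collections import deque

def fill_buffer_from(dst: deque, src: deque) -> None:
    # Fragment-by-fragment transfer: each fragment is popped, pushed,
    # and (in the real reader) re-measured for buffer-size accounting.
    while src:
        dst.append(src.popleft())

def move_buffer_content_to(dst: deque, src: deque) -> None:
    # Hand over the whole buffer in one step: no per-fragment pops and
    # pushes, and no per-fragment size re-estimation.
    dst.extend(src)  # in the C++ reader this is a move, not a copy
    src.clear()
```

Both produce the same buffer contents; the second simply avoids the per-element work.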
2019-10-03 22:03:28 +02:00
Piotr Sarna
07ac3ea632 docs: update alternator entry on HTTPS
The HTTPS entry is updated - it's now supported, but still
lacks the same features as plain HTTP does - CRC headers, etc.
2019-10-03 19:10:30 +02:00
Piotr Sarna
b63077a8dc alternator-test: suppress the "Unverified HTTPS request" warning
Running with --https and a self-signed certificate results in a flood
of expected warnings that the connection is not to be trusted.
These warnings are silenced, as users running a local test with --https
usually use self-signed certificates.
2019-10-03 19:10:30 +02:00
Piotr Sarna
e65fd490da alternator-test: add HTTPS info to README.md
A short paragraph about running tests with `--https` and configuring
the cluster to work correctly with this parameter is added to README.md.
2019-10-03 19:10:30 +02:00
Piotr Sarna
0d28d7f528 alternator-test: add HTTPS to test_describe_endpoints
The test_describe_endpoints test spawns another client connection
to the cluster, so it needs to be HTTPS-aware in order to work properly
with --https parameter.
2019-10-03 19:10:30 +02:00
Piotr Sarna
9fd77ed81d alternator-test: add --https parameter
Running with --https parameter will result in sending the requests
via HTTPS instead of HTTP. By default, port 8043 is used for a local
cluster. Before running pytest --https, make sure that Scylla
was properly configured to initialize an HTTPS alternator server
by providing the alternator_tls_port parameter.

The HTTPS-based connection runs with verification disabled,
otherwise it would not work with self-signed certificates,
which are useful for tests.
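
The tests disable verification via Python requests' `verify=False`; an equivalent stdlib sketch of such an insecure-by-choice TLS client context:

```python
import ssl

def insecure_client_context() -> ssl.SSLContext:
    """TLS client context that accepts self-signed certificates.

    Same spirit as requests' verify=False: acceptable for a local test
    cluster with a self-signed certificate, never for production.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # must be cleared before verify_mode
    ctx.verify_mode = ssl.CERT_NONE
    return ctx
```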
2019-10-03 19:10:30 +02:00
Piotr Sarna
e1b0537149 alternator: add HTTPS support
By providing a server based on a TLS socket, it's now possible
to serve HTTPS requests in alternator. The HTTPS server is enabled
by setting its port in scylla.yaml: alternator_tls_port=XXXX.
Alternator TLS relies on the existing TLS configuration,
which is provided by certificate, keyfile, truststore, priority_string
options.

Fixes #5042
2019-10-03 19:10:30 +02:00
Piotr Sarna
b42eb8b80a config: add alternator HTTPS port
The config variable will be used to set up a TLS-based server
for serving alternator HTTPS requests.
2019-10-03 19:10:29 +02:00
Nadav Har'El
9d4e71bbc6 alternator-test: fix misleading xfail message
The test test_update_expression_function_nesting() fails because DynamoDB
doesn't allow an expression like list_append(list_append(:val1, :val2), :val3)
but Alternator doesn't check for this (and supports this expression).

The "xfail" message was outdated, suggesting that the test fails because
the "SET" expression isn't supported - but it is. So replace the message
by a more accurate one.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190915104708.30471-1-nyh@scylladb.com>
2019-10-03 18:45:03 +03:00
Nadav Har'El
9747019e7b alternator: implement additional Expected operators
Merged patch set from Dejan Mircevski implementing some of the
missing operators for Expected: NE, IN, NULL and NOT_NULL.

Patches:
  alternator: Factor out Expected operand checks
  alternator: Implement NOT_NULL operator in Expected
  alternator: Implement NULL operator in Expected
  alternator: Fix expected_1_null testcase
  alternator: Implement IN operator in Expected
  alternator: Implement NE operator in Expected
  alternator: Factor out common code in Expected
2019-10-03 18:12:38 +03:00
Konstantin Osipov
25ffd36d21 lwt: prepare the expression tree for IF condition evaluation
Frozen empty lists/map/sets are not equal to null value,
whil multi-cell empty lists/map/sets are equal to null values.

Return a NULL value for an empty multi-cell set or list
if we know the receiver is not frozen - this makes it
easy to compare the parameter with the receiver.

Add a test case for inserting an empty list or set
- the result is indistinguishable from NULL value.
Message-Id: <20191003092157.92294-2-kostja@scylladb.com>
2019-10-03 14:56:25 +02:00
Avi Kivity
3cb081eb84 Merge " hinted handoff: fix races during shutdown and draining" from Vlad
"
Fix races that may lead to use-after-free events and file system level exceptions
during shutdown and drain.

The root cause of use-after-free events in question is that space_watchdog blocks on
end_point_hints_manager::file_update_mutex() and we need to make sure this mutex is alive as long as
it's accessed even if the corresponding end_point_hints_manager instance
is destroyed in the context of manager::drain_for().

File system exceptions may occur when space_watchdog attempts to scan a
directory while it's being deleted from the drain_for() context.
In case of such an exception new hints generation is going to be blocked
- including for materialized views, till the next space_watchdog round (in 1s).

Issues that are fixed are #4685 and #4836.

Tested as follows:
 1) Patched the code in order to trigger the race with (a lot) higher
    probability and running slightly modified hinted handoff replace
    dtest with a debug binary for 100 times. Side effect of this
    testing was discovering of #4836.
 2) Using the same patch as above tested that there are no crashes and
    nodes survive stop/start sequences (they were not without this series)
    in the context of all hinted handoff dtests. Ran the whole set of
    tests with dev binary for 10 times.
"

* 'hinted_handoff_race_between_drain_for_and_space_watchdog_no_global_lock-v2' of https://github.com/vladzcloudius/scylla:
  hinted handoff: fix a race on a directory removal between space_watchdog and drain_for()
  hinted handoff: make taking file_update_mutex safe
  db::hints::manager::drain_for(): fix alignment
  db::hints::manager: serialize calls to drain_for()
  db::hints: cosmetics: identation and missing method qualifier
2019-10-03 14:38:00 +03:00
Tomasz Grabiec
aad1307b14 row_cache, memtable: Use upgrade_schema() 2019-10-03 13:28:33 +02:00
Tomasz Grabiec
3177732b35 flat_mutation_reader: Introduce upgrade_schema() 2019-10-03 13:28:33 +02:00
Asias He
a9b95f5f01 repair: Fix tracker::start and tracker::done in case of error
The operation after gate.enter() in tracker::start() can fail and throw,
we should call gate.leave() in such case to avoid unbalanced enter and
leave calls. tracker::done() has similar issue too.

Fix it by removing the gate enter and leave logic in tracker start and
done. A helper tracker::run() is introduced to take care of the gate and
repair status.
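
The fix follows the classic pattern of pairing acquire/release in a single helper so an exception cannot unbalance them; a toy Python model (a stand-in for seastar::gate, not the real types):

```python
class Gate:
    """Minimal stand-in for seastar::gate: counts in-flight operations."""
    def __init__(self):
        self.count = 0
    def enter(self):
        self.count += 1
    def leave(self):
        self.count -= 1

def run(gate, func):
    """tracker::run()-style helper: enter/leave stay balanced
    even when `func` throws (the bug fixed by this commit)."""
    gate.enter()
    try:
        return func()
    finally:
        gate.leave()
```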

In addition, the error log is improved. It now logs exceptions on all
shards in the summary. e.g.,

[shard 0] repair - repair id 1 failed: std::runtime_error
({shard 0: std::runtime_error (error0), shard 1: std::runtime_error (error1)})

Fixes #5074
2019-10-03 13:33:02 +03:00
Botond Dénes
00b432b61d querier_cache: correctly account entries evicted on insertion in the population
Currently, the population stat is not increased for entries that are
evicted immediately on insert, however the code that does the eviction
still decreases the population stat, leading to an imbalance and in some
cases the underflow of the population stat. To fix, unconditionally
increase the population stat upon inserting an entry, regardless of
whether it is immediately evicted or not.
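
The accounting rule from the fix - increment on every insert, decrement only in the eviction path - in a toy cache model (illustrative only, not the real querier_cache API):

```python
class QuerierCache:
    """Toy model of the population stat: every insert increments it and
    every eviction decrements it, even when an entry is evicted
    immediately on insert (the underflow scenario fixed here)."""
    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.entries = []
        self.population = 0

    def _evict(self):
        self.entries.pop(0)
        self.population -= 1

    def insert(self, entry):
        self.entries.append(entry)
        self.population += 1     # unconditionally, per the fix
        while len(self.entries) > self.max_entries:
            self._evict()        # may evict the entry just inserted
```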

Fixes: #5123

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191001153215.82997-1-bdenes@scylladb.com>
2019-10-03 11:49:44 +03:00
Dejan Mircevski
ac98385d04 alternator: Factor out Expected operand checks
Put all AttributeValueList size verification under
verify_operand_count(), rather than have some cases invoke
verify_operand_count() while others verify it in check_*() functions.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-10-02 17:11:58 -04:00
Dejan Mircevski
de18b3240b alternator:Implement NOT_NULL operator in Expected
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-10-02 16:23:59 -04:00
Dejan Mircevski
75960639a4 alternator: Implement NULL operator in Expected
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-10-02 16:19:14 -04:00
Dejan Mircevski
e4fd5f3ef0 alternator: Fix expected_1_null testcase
Testcase "For NULL, AttributeValueList must be empty" accidentally
used NOT_NULL instead of NULL.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-10-02 16:19:14 -04:00
Dejan Mircevski
b7ac510581 alternator: Implement IN operator in Expected
Add check_IN() and a switch case that invokes it.  Reactivate IN
tests.  Add a testcase for non-scalar attribute values.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-10-02 16:17:38 -04:00
Dejan Mircevski
56efa55a06 alternator: Implement NE operator in Expected
Recognize "NE" as a new operator type, add check_NE() function, invoke
it in verify_expected_one(), and reactivate NE tests.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-10-02 14:47:13 -04:00
Dejan Mircevski
af0462d127 alternator: Factor out common code in Expected
Operand-count verification will be repeated a lot as more operators
are implemented, so factor it out into verify_operand_count().

Also move `got` null checks to check_* functions, which reduces
duplication at call sites.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-10-02 14:36:57 -04:00
Konstantin Osipov
e8c13efb41 lwt: move mutation hashers to mutation.hh
Prepare mutation hashers for reuse in CAS implementation.
Message-Id: <20190930202409.40561-2-kostja@scylladb.com>
2019-10-01 19:49:31 +02:00
Konstantin Osipov
6cde985946 lwt: remove code that no longer serves as a reference
Remove ifdef'ed Java code, since LWT implementation
is based on the current state of the origin.
Message-Id: <20190930201022.40240-2-kostja@scylladb.com>
2019-10-01 19:46:15 +02:00
Konstantin Osipov
4d214b624b lwt: ensure enum_set::of is constexpr.
This allows using it to initialize const static members.
Message-Id: <20190930200530.40063-2-kostja@scylladb.com>
2019-10-01 19:45:56 +02:00
Tomasz Grabiec
3b9bf9d448 Merge "storage_proxy: replace variadic futures with structs" from Avi
Seastar variadic futures are deprecated, so replace with structs to
avoid nasty deprecation warnings.
2019-10-01 19:32:55 +02:00
Avi Kivity
162730862d storage_proxy: remove variadic future from query_partition_key_range_concurrent()
Seastar variadic futures are deprecated, so replace with a nice struct.
2019-09-30 21:33:44 +03:00
Avi Kivity
968b34a2b4 storage_proxy: remove variadic future from digest_read_resolver
Seastar variadic futures are deprecated, so replace with a nice
struct.
2019-09-30 21:32:17 +03:00
Avi Kivity
90096da9f3 managed_ref: add get() accessor
While a managed_ref emulates a reference more closely than it does
a pointer, it is still nullable, so add a get() (similar to
unique_ptr::get()) that can be nullptr if the reference is null.

The immediate use will be mutation_partition::_static_row, which
is often empty and takes up about 10% of a cache entry.
2019-09-30 20:55:36 +03:00
Nadav Har'El
c9aae13fae docs/alternator/getting-started.md: fix indentation in example code
The example Python code had wrong indentation, and wouldn't actually
work if naively copy-pasted. Noticed by Noam Hasson.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190929091440.28042-1-nyh@scylladb.com>
2019-09-30 13:03:29 +03:00
Avi Kivity
c6b66d197b Merge "Couple of preparatory patches for lwt" from Gleb
"
This is a collection of assorted patches that will be needed for LWT.
Most of them are trivial, but one touches a lot of files, so it has a
good chance of causing rebase headaches (I already had to rebase it on
top of Alternator). Let's push them earlier instead of carrying them in
the lwt branch.
"

* 'gleb/lwt-prepare-v2' of github.com:scylladb/seastar-dev:
  lwt: make _last_timestamp_micros static
  lwt: Add client_state::get_timestamp_for_paxos() function
  lwt: Pass client_state reference all the way to storage_proxy::query
  exceptions: Add a constructor for unavailable_exception that allows providing a custom message
  serializer: Add std::variant support
  lwt: Add missing functions to utils/UUID_gen.hh
2019-09-29 13:02:26 +03:00
Avi Kivity
9e990725d9 Merge "Simplify and explain from_varint_to_integer #5031" from Rafael
"
This is the second version of the patch series. The previous one was just the second patch; this one adds more tests and another patch to make it easier to verify that the new code has the same behavior as the old one.
"

* 'espindola/overflow-is-intentional' of https://github.com/espindola/scylla:
  types: Simplify and explain from_varint_to_integer
  Add more cast tests
2019-09-29 11:27:55 +03:00
Tomasz Grabiec
b0e0f29b06 db: read: Filter out sstables using their first and last keys
Affects single-partition reads only.

Refs #5113

When executing a query on the replica we do several things in order to
narrow down the sstable set we read from.

For tables which use LeveledCompactionStrategy, we store sstables in
an interval set and we select only sstables whose partition ranges
overlap with the queried range. Other compaction strategies don't
organize the sstables and will select all sstables at this stage. The
reasoning behind this is that for non-LCS compaction strategies the
sstables' ranges will typically overlap and using interval sets in
this case would not be effective and would result in quadratic (in
sstable count) memory consumption.

The assumption for overlap does not hold if the sstables come from
repair or streaming, which generates non-overlapping sstables.

At a later stage, for single-partition queries, we use the sstables'
bloom filter (kept in memory) to drop sstables which surely don't
contain given partition. Then we proceed to sstable indexes to narrow
down the data file range.

Tables which don't use LCS will do unnecessary I/O to read index pages
for single-partition reads if the partition is outside of the
sstable's range and the bloom filter is ineffective (Refs #5112).

This patch fixes the problem by consulting sstable's partition range
in addition to the bloom filter, so that the non-overlapping sstables
will be filtered out with certainty and not depend on bloom filter's
efficiency.

It's also faster to drop sstables based on the keys than the bloom
filter.

Tests:
  - unit (dev)
  - manual using cqlsh

Reviewed-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190927122505.21932-1-tgrabiec@scylladb.com>
2019-09-28 19:42:57 +03:00
Tomasz Grabiec
b93cc21a94 sstables: Fix partition key count estimation for a range
The method sstable::estimated_keys_for_range() was severely
under-estimating the number of partitions in an sstable for a given
token range.

The first reason is that it underestimated the number of sstable index
pages covered by the range, by one. In extreme, if the requested range
falls into a single index page, we will assume 0 pages, and report 1
partition. The reason is that we were using
get_sample_indexes_for_range(), which returns entries with the keys
falling into the range, not entries for pages which may contain the
keys.

A single page can have a lot of partitions though. By default, there
is a 1:20000 ratio between summary entry size and the data file size
covered by it. If partitions are small, that can be many hundreds of
partitions.

Another reason is that we underestimate the number of partitions in an
index page. We multiply the number of pages by:

   (downsampling::BASE_SAMPLING_LEVEL * _components->summary.header.min_index_interval)
     / _components->summary.header.sampling_level

Using defaults, that means multiplying by 128. In the cassandra-stress
workload a single partition takes about 300 bytes in the data file and
summary entry is 22 bytes. That means a single page covers 22 * 20'000
= 440'000 bytes of the data file, which contains about 1'466
partitions. So we underestimate by an order of magnitude.

Underestimating the number of partitions will result in too small
bloom filters being generated for the sstables which are the output of
repair or streaming. This will make the bloom filters ineffective
which results in reads selecting more sstables than necessary.

The fix is to base the estimation on the number of index pages which
may contain keys for the range, and multiply that by the average key
count per index page.

Fixes #5112.
Refs #4994.

The output of test_key_count_estimation:

Before:

count = 10000
est = 10112
est([-inf; +inf]) = 512
est([0; 0]) = 128
est([0; 63]) = 128
est([0; 255]) = 128
est([0; 511]) = 128
est([0; 1023]) = 128
est([0; 4095]) = 256
est([0; 9999]) = 512
est([5000; 5000]) = 1
est([5000; 5063]) = 1
est([5000; 5255]) = 1
est([5000; 5511]) = 1
est([5000; 6023]) = 128
est([5000; 9095]) = 256
est([5000; 9999]) = 256
est(non-overlapping to the left) = 1
est(non-overlapping to the right) = 1

After:

count = 10000
est = 10112
est([-inf; +inf]) = 10112
est([0; 0]) = 2528
est([0; 63]) = 2528
est([0; 255]) = 2528
est([0; 511]) = 2528
est([0; 1023]) = 2528
est([0; 4095]) = 5056
est([0; 9999]) = 10112
est([5000; 5000]) = 2528
est([5000; 5063]) = 2528
est([5000; 5255]) = 2528
est([5000; 5511]) = 2528
est([5000; 6023]) = 5056
est([5000; 9095]) = 7584
est([5000; 9999]) = 7584
est(non-overlapping to the left) = 0
est(non-overlapping to the right) = 0

Tests:
  - unit (dev)

Reviewed-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190927141339.31315-1-tgrabiec@scylladb.com>
2019-09-28 19:36:43 +03:00
Piotr Sarna
10f90d0e25 types: remove deprecated comment
The comment does not apply anymore, as this definition is no more
in database.hh.
Message-Id: <a0b6ff851e1e3bcb5fcd402fbf363be7af0219af.1569580556.git.sarna@scylladb.com>
2019-09-27 19:32:17 +02:00
Dejan Mircevski
9a89e0c5ec dbuild: Update README on interactive mode
`dbuild` was recently (24c732057) updated to run in interactive mode
when given no arguments; we can now update the README to mention that.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-09-27 16:33:27 +02:00
Dejan Mircevski
f8638d8ae1 alternator: Add build byproducts to .gitignore
Add .pytest_cache and expressions.tokens to the top-level .gitignore.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-09-27 16:18:45 +02:00
Dejan Mircevski
332ffa77ea alternator: Actually use BEGINS_WITH in its tests
For some reason, BEGINS_WITH tests used EQ as comparison operator.

Tests: pytest test_expected.py

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-09-26 22:41:34 +03:00
Tomasz Grabiec
5b0e48f25b Merge "toppartitions: don't transport schema_ptr across shards" from Avi
When the toppartitions operation gathers results, it copies partition
keys with their schema_ptr:s. When these schema_ptr:s are copied
or destroyed, they can cause leaks or premature frees of the schema
in its original shard, since reference count operations are not atomic.

Fix that by converting the schema_ptr to a global_schema_ptr during
transportation.

Fixes #5104 (direct bug)
Fixes #5018 (schema prematurely freed, toppartitions previously executed on that node)
Fixes #4973 (corrupted memory pool of the same size class as schema, toppartitions previously executed on that node)

Tests: new test added that fails with the existing code in debug mode,
manual toppartitions test
2019-09-26 17:09:54 +02:00
Avi Kivity
36b4d55b28 tests: add test for toppartitions cross-shard schema_ptr copy 2019-09-26 17:40:46 +03:00
Avi Kivity
670f398a8a toppartitions: do not copy schema_ptr:s in item keys across shards
Copying schema_ptrs across shards results in memory corruption since
lw_shared_ptr does not use atomic operations for reference counts.
Prevent that by converting schema_ptr:s to global_schema_ptr:s before
shipping them across shards in the map operation, and converting them
back to local schema_ptr:s in the reduce operation.
2019-09-26 17:26:40 +03:00
Avi Kivity
f015bd69b7 toppartitions: compare schemas using schema::id(), not pointer to schema
This allows keys from different stages in the schema's life to compare equal.
This is safe since the partition key cannot change, unlike the rest of the schema.

More importantly, it will allow us to compare keys made local after a pass through
global_schema_ptr, which does not guarantee that the schema_ptr conversion will be
the same even when starting with the same global_schema_ptr.
2019-09-26 17:15:46 +03:00
Avi Kivity
ea4976a128 schema_registry: mark global_schema_ptr move constructor noexcept
Throwing move constructors are a pain, so we should try to make
them noexcept. Currently, global_schema_ptr's move constructor
throws an exception if used illegaly (moving from a different shard);
this patch changes it to an assert, on the grounds that this error
is impossible to recover from.

The direct motivation for the patch is the desire to store objects
containing a global_schema_ptr in a chunked_vector, to move lists
of partition keys across shards for the toppartitions functionality.
chunked_vector currently requires noexcept move constructors for its
value_type.
2019-09-26 16:56:59 +03:00
Avi Kivity
ba64ec78cf messaging_service: use rpc::tuple instead of variadic futures for rpc
Since variadic future<> is deprecated, switch to rpc::tuple for multiple
return values in rpc calls. This is more or less mechanical translation.
2019-09-26 12:09:31 +02:00
Tomasz Grabiec
9183e28f2c Merge "Recreate dependent user types" from Rafael
When a user type changes we were not recreating other user types that
use it. This patch series fixes that and makes it clear which code is
responsible for it.

In the system.types table a user type refers to another by name. When
a user type is modified, only its entry in the table is changed.

At runtime a user type has a direct pointer to the types it uses. To
handle the discrepancy we need to recreate any dependent types when an
entry in system.types changes.

Fixes #5049
2019-09-26 12:06:32 +02:00
Gleb Natapov
e0b303b432 lwt: make _last_timestamp_micros static
If each client_state has its own copy of the variable two clients may
generate timestamps that clash and needlessly create contention. Making
the variable shared between all client_state on the same shard will make
sure this will not happen to two clients on the same shard. It may still
happen for two clients on two different shards or two different nodes.
2019-09-26 11:44:00 +03:00
Gleb Natapov
622d21f740 lwt: Add client_state::get_timestamp_for_paxos() function
Paxos needs a unique timestamp that is greater than some other
timestamp, so that the next round has a better chance to succeed.
Add a function that returns such a timestamp.
2019-09-26 11:44:00 +03:00
Gleb Natapov
e72a105b5e lwt: Pass client_state reference all the way to storage_proxy::query
client_state holds a state to generate monotonically increasing unique
timestamp. Queries with a SERIAL consistency level need it to generate
a paxos round.
2019-09-26 11:44:00 +03:00
Gleb Natapov
556f65e8a1 exceptions: Add a constructor for unavailable_exception that allows providing a custom message 2019-09-26 11:44:00 +03:00
Gleb Natapov
209414b4eb serializer: Add std::variant support 2019-09-26 11:44:00 +03:00
Gleb Natapov
f9209e27d4 lwt: Add missing functions to utils/UUID_gen.hh
Some lwt related code is missing in our UUID implementation. Add it.
2019-09-26 11:44:00 +03:00
Rafael Ávila de Espíndola
5af8b1e4a3 types: recreate dependent user types.
In the system.types table a user type refers to another by name. When
a user type is modified, only its entry in the table is changed.

At runtime a user type has a direct pointer to the types it uses. To
handle the discrepancy we need to recreate any dependent types when an
entry in system.types changes.

Fixes #5049

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-25 15:41:45 -07:00
Rafael Ávila de Espíndola
4c3209c549 types: Don't include dependent user types in update.
The way schema changes propagate is by editing the system tables and
comparing the before and after state.

When a user type A uses another user type B and we modify B, the
representation of A in the system table doesn't change, so this code
was not producing any changes on the diff that the receiving side
uses.

Deleting it makes it clear that it is the receiver's responsibility to
handle dependent user types.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-25 15:41:45 -07:00
Rafael Ávila de Espíndola
34eddafdb0 types: Don't modify the type list in db::cql_type_parser::raw_builder
With this patch db::cql_type_parser::raw_builder creates a local copy
of the list of existing types and uses that internally. By doing that
build() should have no observable behavior other than returning the
new types.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-25 15:41:45 -07:00
Rafael Ávila de Espíndola
d6b2e3b23b types: pass a reference to prepare_internal
We were never passing a null pointer and never saving a copy of the
lw_shared_ptr. Passing a reference is more flexible as not all callers
are required to hold the user_types_metadata in a lw_shared_ptr.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-25 15:40:30 -07:00
Avi Kivity
03260dd910 Update seastar submodule
* seastar b56a8c5045...c21a7557f9 (3):
  > net: socket::{set,get}_reuseaddr() should not be virtual
  > iotune: print verbose message in case of shutdown errors
  > iotune: close test file on shutdown

Fixes #4946.
2019-09-25 16:08:32 +03:00
Tomasz Grabiec
06b9818e98 Merge "storage_proxy: tolerate view_update_write_response_handler id not found on shutdown" from Benny
1. Add assert in remove_response_handler to make crashes like in #5032 easier to understand.
2. Look up the view_update_write_response_handler id before calling timeout_cb and tolerate it not being found.
   Just log a warning if this happens.

Fixes #5032
2019-09-25 14:49:42 +02:00
Avi Kivity
83bc59a89f Merge "mvcc: Fix incorrect schema version being used to copy the mutation when applying (#5099)" from Tomasz
"
Currently affects only counter tables.

Introduced in 27014a2.

mutation_partition(s, mp) is incorrect because it uses s to interpret
mp, while it should use mp_schema.

We may hit this if the current node has a newer schema than the
incoming mutation. This can happen during table schema altering when we receive the
mutation from a node which hasn't processed the schema change yet.

This is undefined behavior in general. If the alter was adding or
removing columns, this may result in corruption of the write where
values of one column are inserted into a different column.

Fixes #5095.
"

* 'fix-schema-alter-counter-tables' of https://github.com/tgrabiec/scylla:
  mvcc: Fix incorrect schema version being used to copy the mutation when applying
  mutation_partition: Track and validate schema version in debug builds
  tests: Use the correct schema to access mutation_partition
2019-09-25 15:30:22 +03:00
Tomasz Grabiec
11440ff792 mvcc: Fix incorrect schema version being used to copy the mutation when applying
Currently affects only counter tables.

Introduced in 27014a2.

mutation_partition(s, mp) is incorrect, because it uses s to interpret
mp, while it should use mp_schema.

We may hit this if the current node has a newer schema than the
incoming mutation. This can happen during alter when we receive the
mutation from a node which hasn't processed the schema change yet.

This is undefined behavior in general. If the alter was adding or
removing columns, this may result in corruption of the write where
values of one column are inserted into a different column.

Fixes #5095.
2019-09-25 11:28:07 +02:00
Tomasz Grabiec
bce0dac751 mutation_partition: Track and validate schema version in debug builds
This patch makes mutation_partition validate the invariant that it's
supposed to be accessed only with the schema version which it conforms
to.

Refs #5095
2019-09-25 10:27:06 +02:00
Avi Kivity
721fa44c4f Update seastar submodule
* seastar e51a1a8ed9...b56a8c5045 (3):
  > net: add support for UNIX-domain sockets
  > future: Warn on promise::set_exception with no corresponding future or task
  > Merge "Handle exceptions in repeat_until_value and misc cleanups" from Rafael
2019-09-25 11:21:57 +03:00
Benny Halevy
e9388b3f03 storage_proxy::drain_on_shutdown fixup indentation
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-25 11:19:50 +03:00
Benny Halevy
b7c7af8a75 storage_proxy: validate id from view_update_handlers_list
Handle a race where a write handler is removed from _response_handlers
but not yet from _view_update_handlers_list.

Fixes #5032

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-25 11:19:50 +03:00
Benny Halevy
1fea5f5904 storage_proxy: refactor remove_response_handler
Refactor remove_response_handler_entry out of remove_response_handler,
to be called on a valid iterator found by _response_handlers.find(id).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-25 11:19:50 +03:00
Benny Halevy
592c4bcfc2 storage_proxy: remove_response_handler: assert id was found
Help identify cases like seen in #5032 where the handler id
wasn't found from the on_down -> timeout_cb path.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-25 11:19:50 +03:00
Raphael S. Carvalho
571fa94eb5 sstables/compaction_manager: Don't perform upgrade on shared SSTables
compaction_manager::perform_sstable_upgrade() fails when it feeds
compaction mechanism with shared sstables. Shared sstables should
be ignored when performing an upgrade, leaving reshard to pick
them up in parallel. Whenever a shared sstable is brought up either
on restart or via refresh, reshard procedure kicks in.
Reshard picks the highest supported format so the upgrade for
shared sstable will naturally take place.

Fixes #5056.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190925042414.4330-1-raphaelsc@scylladb.com>
2019-09-25 11:18:40 +03:00
Asias He
19e8c14ad1 gossiper: Improve the gossip timer callback lock handling (#5097)
- Update the outdated comments in do_stop_gossiping. It was
  storage_service not storage_proxy that used the lock. More
  importantly, storage_service does not use it any more.

- Drop the unused timer_callback_lock and timer_callback_unlock API

- Use with_semaphore to make sure the semaphore usage is balanced.

- Add a log message in gossiper::do_stop_gossiping when it tries to take the
  semaphore, to help debug hangs during shutdown.

Refs: #4891
Refs: #4971
2019-09-25 10:46:38 +03:00
Tomasz Grabiec
4d9b176aaa tests: Use the correct schema to access mutation_partition 2019-09-24 19:46:57 +02:00
Botond Dénes
425cc0c104 doc: add debugging.md
A documentation file that is intended to be a place for anything
debugging related: getting started tutorial, tips and tricks and
advanced guides.
For now it contains a short introduction, some selected links to
more in-depth documentation and some tips and tricks that I could think
of off the top of my head.
One of those tricks describes how to load cores obtained from
relocatable packages inside the `dbuild` container. I originally
intended to add that to `tools/toolchain/README.md` but was convinced
that `docs/debugging.md` would be a better place for this.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190924133110.15069-1-bdenes@scylladb.com>
2019-09-24 20:18:45 +03:00
Botond Dénes
d57ab83bc8 querier_cache: add inserted stat
Recently we have seen a case where the population stat of the cache was
corrupt, either due to misaccounting or some more serious corruption.
When debugging something like that, it would have been useful to know how
many items have been inserted into the cache. I also believe that such a
counter could be generally useful as well.

Refs: #4918

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190924083429.43038-1-bdenes@scylladb.com>
2019-09-24 10:52:49 +02:00
Avi Kivity
8e8a048ada Merge "lsa: Assert no cross-shard region locking #5090" from Tomasz
"
We observed an abort on bad_alloc which was not caused by real OOM,
but could be explained by cache region being locked from a different
shard, which is not allowed, concurrently with memory reclamation.

It's impossible now to prove this, or, if that was indeed the case, to
determine which code path was attempting such lock. This patch adds an
assert which would catch such incorrect locking at the attempt.

Refs #4978

Tests:
 - unit (dev, release, debug)
"

* 'assert-no-xshard-lsa-locking' of https://github.com/tgrabiec/scylla:
  lsa: Assert no cross-shard region locking
  tests: Make managed_vector_test a seastar test
2019-09-23 19:52:47 +03:00
Avi Kivity
79d17f3c80 Update seastar submodule
* seastar 2a526bb120...e51a1a8ed9 (2):
  > rpc: introduce rpc::tuple as a way to move away from variadic future
  > shared_future: don't warn on broken futures
2019-09-23 19:50:40 +03:00
Avi Kivity
1b8009d10c sstables: compaction_manager: #include seastarx.hh
Make it easier for the IDE to resolve references to the seastar
namespace. In any case include files should be stand-alone and not
depend on previously included files.
2019-09-23 16:12:49 +02:00
Avi Kivity
07af9774b3 relocatable: erase build directory from executable and debug info
The build directory is meaningless, since it is typically some
directory in a continuous integration server. That means someone
debugging the relocatable package needs to issue the gdb command
'set substitute-path' with the correct arguments, or they lose
source debugging. Doing so in the relocatable package build saves
this step.

The default build is not modified, since a typical local build
benefits from having the paths hardcoded, as the debugger will
find the sources automatically.
2019-09-23 13:08:15 +02:00
Tomasz Grabiec
eb08ab7ed9 lsa: Assert no cross-shard region locking
We observed an abort on bad_alloc which was not caused by real OOM,
but could be explained by cache region being locked from a different
shard, which is not allowed, concurrently with memory reclamation.

It's impossible now to prove this, or, if that was indeed the case, to
determine which code path was attempting such lock. This patch adds an
assert which would catch such incorrect locking at the attempt.

Refs #4978
2019-09-23 12:51:29 +02:00
Tomasz Grabiec
8bedcd6696 tests: Make managed_vector_test a seastar test
LSA will depend on seastar reactor being present.
2019-09-23 12:51:24 +02:00
Raphael S. Carvalho
b4cf429aab sstables/LCS: Fix increased write amplification due to incorrect SSTable demotion
LCS demotes an SSTable from a given level when it thinks that level is inactive.
An inactive level means N rounds (compaction attempts) without any activity in it,
in other words, no SSTable has been promoted to it.
The problem happens because the metadata that tracks inactiveness of each level
can be incorrectly updated when there's an ongoing compaction. LCS has parallel
compaction disabled. So if a table finds itself running a long operation like
cleanup that blocks minor compaction, LCS could incorrectly think that many
levels need demotion, and by the time cleanup finishes, some demotions would
incorrectly take place.
This problem is fixed by only updating the counter that tracks inactiveness
when compaction completes, so it's not incorrectly updated when there's an
ongoing compaction for the table.

Fixes #4919.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190917235708.8131-1-raphaelsc@scylladb.com>
2019-09-22 10:46:38 +03:00
Eliran Sinvani
280715ad45 Storage proxy: protect against infinite recursion in query_partition_key_range_concurrent
A recent fix to #3767 limited the amount of ranges that
can return from query_ranges_to_vnodes_generator. This, combined
with a large number of token ranges, can lead to
infinite recursion. The algorithm multiplies the number of requested
tokens by a factor of 2 (actually a shift left by one)
in each recursion iteration. As long as the requested
number of ranges is greater than 0, the recursion is implicit,
and each call is scheduled separately since the call is inside
a continuation of a map reduce.
But if the amount of iterations is large enough (~32) the
counter for requested ranges zeros out and from that moment on
two things will happen:
1. The counter will remain 0 forever (0*2 == 0)
2. The map reduce future will be immediately available and this
will result in the continuation being invoked immediately.
The latter causes the recursive call to be a "regular" recursive call,
going through the stack and not the task queue of the scheduler, and
the former causes this recursion to be infinite.
The combination creates a stack that keeps growing and eventually
overflows resulting in undefined behavior (due to memory overrun).

This patch prevents the problem from happening: it limits the growth of
the concurrency counter to twice the last number of tokens returned
by the query_ranges_to_vnodes_generator, and also makes sure it does not
get stuck at zero.

Testing: * Unit test in dev mode.
         * Modified add 50 dtest that reproduce the problem

Fixes #4944

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <20190922072838.14957-1-eliransin@scylladb.com>
2019-09-22 10:33:31 +03:00
Gleb Natapov
73e3d0a283 messaging_service: enable reuseaddr on messaging service rpc
Fixes #4943

Message-Id: <20190918152405.GV21540@scylladb.com>
2019-09-19 11:43:03 +03:00
Rafael Ávila de Espíndola
4d0916a094 commitlog: Handle gate_closed_exception
Before this patch, if the _gate is closed, with_gate throws and
forward_to is not executed. When the promise<> p is destroyed it marks
its _task as a broken promise.

What happens next depends on the branch.

On master, we warn when the shared_future is destroyed, so this patch
changes the warning from a broken_promise to a gate closed.

On 3.1, we warn when the promises in shared_future::_peers are
destroyed since they no longer have a future attached: The future that
was attached was the "auto f" just before the with_gate call, and it
is destroyed when with_gate throws. The net result is that this patch
fixes the warning in 3.1.

I will send a patch to seastar to make the warning on master more
consistent with the warning in 3.1.

Fixes #4394

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190917211915.117252-1-espindola@scylladb.com>
2019-09-17 23:41:21 +02:00
Avi Kivity
60656d1959 Update seastar submodule
* seastar 84d8e9fe9b...2a526bb120 (1):
  > iotune: fix exception handling in case test file creation fails

Fixes #5001.
2019-09-16 19:39:14 +03:00
Glauber Costa
c9f2d1d105 do not crash in user-defined operations if the controller is disabled
Scylla currently crashes if we run manual operations like nodetool
compact with the controller disabled. While we neither like nor
recommend running with the controller disabled, due to some corner cases
in the controller algorithm we are not yet at the point in which we can
deprecate this and are sometimes forced to disable it.

The reason for the crash is that manual operations will invoke
_backlog_of_shares, which returns the backlog needed to
create a certain number of shares. That scans the existing control
points, but when we run without the controller there are no control
points and we crash.

Backlog doesn't matter if the controller is disabled, and the return
value of this function will be immaterial in this case. So to avoid the
crash, we return something right away if the controller is disabled.

Fixes #5016

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2019-09-16 18:26:57 +02:00
Avi Kivity
d77171e10e build: adjust libthread_db file name to match gdb expectations
gdb searches for libthread_db.so using its canonical name of libthread_db.so.1 rather
than the file name of libthread_db-1.0.so, so use that name to store the file in the
archive.

Fixes #4996.
2019-09-16 14:48:42 +02:00
Avi Kivity
7502985112 Update seastar submodule
* seastar b3fb4aaab3...84d8e9fe9b (8):
  > Use aio fsync if available
  > Merge "fix some tcp connection bugs and add reuseaddr option to a client socket" from Gleb
  > lz4: use LZ4_decompress_safe
  > reactor: document seastar::remove_file()
  > core/file.hh: remove redundant std::move()
  > core/{file,sstring}: do not add `const` to return value
  > http/api_docs: always call parent constructor
  > Add input_stream blurb
2019-09-16 11:52:55 +03:00
Piotr Sarna
feec3825aa view: degrade shutdown bookkeeping update failures log to warn
Currently, if updating bookkeeping operations for view building fails,
we log the error message and continue. However, during shutdown,
some errors are more likely to happen due to existing issues
like #4384. To differentiate actual errors from semi-expected
errors during shutdown, the latter are now logged with a warning
level instead of error.

Fixes #4954
2019-09-16 10:13:06 +03:00
Piotr Sarna
f912122072 main: log unexpected errors thrown on shutdown (#4993)
Shutdown routines are usually implemented via the deferred_action
mechanism, which runs a function in its destructor. We thus expect
the function to be noexcept, but unfortunately it's not always
the case. Throwing in the destructor results in terminating the program
anyway, but before we do that, the exception can be logged so it's
easier to investigate and pinpoint the issue.

Example output before the patch:
INFO  2019-09-10 12:49:05,858 [shard 0] view - Stopping view builder
terminate called without an active exception
Aborting on shard 0.
Backtrace:
  0x000000000184a9ad
(...)

Example output after the patch:
INFO  2019-09-10 12:49:05,858 [shard 0] view - Stopping view builder
ERROR 2019-09-10 12:49:05,858 [shard 0] init - Unexpected error on shutdown: std::runtime_error (Hello there!)
terminate called without an active exception
Aborting on shard 0.
Backtrace:
  0x000000000184a9ad
(...)
2019-09-16 09:42:55 +03:00
Rafael Ávila de Espíndola
1d9ba4c79b types: Simplify and explain from_varint_to_integer
This simplifies the implementation of from_varint_to_integer and
avoids using the fact that a static_cast from cpp_int to uint64_t
seems to just keep the low 64 bits.

The boost release notes
(https://www.boost.org/users/history/version_1_67_0.html) imply that
the conversion function should return the maximum value a uint64_t can
hold if the original value is too large.

The idea of using a & with ~0 is a suggestion from the boost release
notes.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-15 14:44:54 -07:00
Rafael Ávila de Espíndola
6611e9faf7 Add more cast tests
These cover converting a varint to a value smaller than 64 bits.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-15 14:44:54 -07:00
Benny Halevy
c22ad90c04 scyllatop: livedata, metric: expire absent metrics
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-15 19:48:09 +03:00
Benny Halevy
6e807a56e1 scyllatop: livedata: update all metrics based on new discovered list
Update current results dictionary using the Metric.discover method.

New results are added and missing results are marked as absent.
(Both for full metrics and for specific keys.)

Previously, with Prometheus, each metric.update called query_list,
resulting in O(n^2) behavior when all metrics were updated, as in the
scylla_top dtest - causing a test timeout when testing the debug build.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-15 19:45:34 +03:00
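The single-pass update can be sketched as follows (hypothetical names, not scyllatop's actual classes): one discovery pass yields a symbol-to-value dict, and every metric is updated or marked absent from it - O(n) instead of one query per metric.

```python
# Sketch: merge one discovered {symbol: value} map into the current
# results, adding new metrics and marking missing ones as absent.
def update_all(current, discovered):
    for symbol, value in discovered.items():
        current[symbol] = {'value': value, 'absent': False}
    for symbol in current.keys() - discovered.keys():
        current[symbol]['absent'] = True  # no longer reported

current = {'a': {'value': 1, 'absent': False},
           'b': {'value': 2, 'absent': False}}
update_all(current, {'a': 5, 'c': 7})
assert current['a']['value'] == 5                      # updated in place
assert current['b']['absent'] is True                  # marked absent
assert current['c'] == {'value': 7, 'absent': False}   # newly added
```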
Benny Halevy
16de4600a0 scyllatop: metric: return discover results as dict
So that we can easily search by symbol for updating
multiple results in a single pass.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-15 16:07:19 +03:00
Benny Halevy
02707621d4 scyllatop: metric: update_info in discover
So that all metric information can be retrieved in a single pass.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-15 16:07:19 +03:00
Benny Halevy
3861460d3b scyllatop: metric: refactor update method
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-15 16:07:19 +03:00
Benny Halevy
99ab60fc27 scyllatop: metric: add_to_results
In preparation to changing results to a dict
use a method to add a new metric to the results.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-15 16:07:19 +03:00
Benny Halevy
b489556807 scyllatop: metric: refactor discover and discover_with_help
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-15 16:07:19 +03:00
Benny Halevy
8f7c721907 scyllatop: livedata: get rid of _setupUserSpecifiedMetrics
Add self._metricPatterns member and merge _setupUserSpecifiedMetrics
with _initializeMetrics.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-15 16:07:19 +03:00
Benny Halevy
c17aee0dd3 scyllatop: add debug logging
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-15 16:07:19 +03:00
Tomasz Grabiec
79935df959 commitlog: replay: Respect back-pressure from memtable space to prevent OOM
Commit log replay was bypassing memtable space back-pressure, and if
replay was faster than memtable flush, it could lead to OOM.

The fix is to call database::apply_in_memory() instead of
table::apply(). The former blocks when memtable space is full.

Fixes #4982.

Tests:
  - unit (release)
  - manual, replay with memtable flush failing and without failing

Message-Id: <1568381952-26256-1-git-send-email-tgrabiec@scylladb.com>
2019-09-15 11:51:56 +03:00
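The back-pressure idea can be sketched in miniature (hypothetical names, not Scylla's actual apply_in_memory): writers block on a bounded budget of memtable space that the flusher releases, so replay cannot outrun flushing and run out of memory.

```python
# Sketch: a bounded semaphore stands in for memtable space; apply blocks
# when the budget is exhausted, and each completed flush frees one unit.
import threading

memtable_budget = threading.BoundedSemaphore(value=4)  # 4 units of space
applied = []

def apply_in_memory(mutation):
    """Blocks while the memtable space budget is exhausted."""
    memtable_budget.acquire()
    applied.append(mutation)  # stand-in for writing into the memtable

def on_flush_complete():
    """A flush frees one unit of memtable space."""
    memtable_budget.release()

for m in range(4):
    apply_in_memory(m)                              # fills the budget
assert not memtable_budget.acquire(blocking=False)  # a 5th write would block
on_flush_complete()                                 # flushing frees space
```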
Tomasz Grabiec
3c49b2960b gdb: Introduce 'scylla memtables'
Example output:

(gdb) scylla memtables
table "ks_truncate"."standard1":
  (memtable*) 0x60c0005a5500: total=131072, used=131072, free=0, flushed=0
table "keyspace1"."standard1":
  (memtable*) 0x60c0005a6000: total=5144444928, used=4512728524, free=631716404, flushed=0
  (memtable*) 0x60c0005a8a80: total=426901504, used=374294312, free=52607192, flushed=0
  (memtable*) 0x60c000eb6a80: total=0, used=0, free=0, flushed=0
table "system_traces"."sessions_time_idx":
  (memtable*) 0x60c0005a4d80: total=131072, used=131072, free=0, flushed=0


Message-Id: <1568133476-22463-1-git-send-email-tgrabiec@scylladb.com>
2019-09-15 10:39:55 +03:00
Kamil Braun
9bf4fe669f Auto-expand replication_factor for NetworkTopologyStrategy (#4667)
If the user supplies a 'replication_factor' option to the
'NetworkTopologyStrategy' class, it is expanded into a replication
factor for each existing DC, for their convenience.

Resolves #4210.

Signed-off-by: Kamil Braun <kbraun@scylladb.com>
2019-09-15 10:38:09 +03:00
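The expansion can be sketched like this (a hypothetical helper, not the C++ implementation): a single 'replication_factor' option becomes one entry per known datacenter, while explicit per-DC settings are kept as-is.

```python
# Sketch: expand {'replication_factor': rf} into per-DC options,
# preserving any datacenter the user configured explicitly.
def expand_replication_factor(options, datacenters):
    if 'replication_factor' in options:
        rf = options.pop('replication_factor')
        for dc in datacenters:
            options.setdefault(dc, rf)  # keep explicit per-DC overrides
    return options

assert expand_replication_factor({'replication_factor': '3'},
                                 ['dc1', 'dc2']) == {'dc1': '3', 'dc2': '3'}
assert expand_replication_factor({'replication_factor': '3', 'dc2': '2'},
                                 ['dc1', 'dc2']) == {'dc1': '3', 'dc2': '2'}
```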
Tomasz Grabiec
8517eecc28 Revert "Simplify db::cql_type_parser::parse"
This reverts commit 7f64a6ec4b.

Fixes #5011

The reverted commit exposes #3760 for all schemas, not only those
which have UDTs.

The problem is that table schema deserialization now requires keyspace
to be present. If the replica hasn't received schema changes which
introduce the keyspace yet, the write will fail.
2019-09-12 12:45:21 +02:00
Nadav Har'El
67a07e9cbc README.md: mention Alternator
Mention on the top-level README.md that Scylla by default is compatible
with Cassandra, but also has experimental support for DynamoDB's API.
Provide links to alternator/alternator.md and alternator/getting-started.md
with more information about this feature.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190911080913.10141-1-nyh@scylladb.com>
2019-09-11 18:01:58 +03:00
Avi Kivity
c08921b55a Merge "Alternator - Add support for DynamoDB Compatible API in Scylla" from Nadav & Piotr
"
In this patch set, written by Piotr Sarna and myself, we add Alternator - a new
Scylla feature adding compatibility with the API of Amazon DynamoDB(TM).
DynamoDB's API uses JSON-encoded requests and responses which are sent over
an HTTP or HTTPS transport. It is described in detail on Amazon's site:
https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/

Our goal is that any application written to use Amazon DynamoDB could
be run, unmodified, against Scylla with Alternator enabled. However, at this
stage the Alternator implementation is incomplete, and some of DynamoDB's
API features are not yet supported. The extent of Alternator's compatibility
with DynamoDB is described in the document docs/alternator/alternator.md
included in this patch set. The same document also describes Alternator's
design (and also points to a longer design document).

By default, Scylla continues to listen only to Cassandra API requests and not
DynamoDB API requests. To enable DynamoDB-API compatibility, you must set
the alternator-port configuration option (via command line or YAML) to the port on
which you wish to listen for DynamoDB API requests. For more information, see
docs/alternator/alternator.md. The document docs/alternator/getting-started.md
also contains some examples of how to get started with Alternator.
"

* 'alternator' of https://github.com/nyh/scylla: (272 commits)
  Added comments about DAX, monitoring and more
  alternator: fix usage of client_state
  alternator-test: complete test_expected.py for rest of comparison operators
  alternator-test: reproduce bug in Expected with EQ of set value
  alternator: implement the Expected request parameter
  alternator: add returning PAY_PER_REQUEST billing mode
  alternator: update docs/alternator.md on GSI/LSI situation
  Alternator: Add getting started document for alternator
  move alternator.md to its own directory
  alternator-test: add xfail test for GSI with 2 regular columns
  alternator/executor.cc: Latencies should use steady_clock
  alternator-test: fix LSI tests
  alternator-test: fix test_describe_endpoints.py for AWS run
  alternator-test: test_describe_endpoints.py without configuring AWS
  alternator: run local tests without configuring AWS
  alternator-test: add LSI tests
  alternator-test: bump create table time limit to 200s
  alternator: add basic LSI support
  alternator: rename reserved column name "attrs"
  alternator: migrate make_map_element_restriction to string view
  ...
2019-09-11 18:01:05 +03:00
Dor Laor
7d639d058e Added comments about DAX, monitoring and more 2019-09-11 18:01:05 +03:00
Nadav Har'El
c953aa3e20 alternator-test: complete test_expected.py for rest of comparison operators
This patch adds tests for all the missing comparison operators in the
Expected parameter (the old-style parameter for conditional operations).
All these new tests are now xfailing on Alternator (and succeeding on
DynamoDB), because these operators are not yet implemented in Alternator
(we only implemented EQ and BEGINS_WITH, so far - the rest are easy but
need to be implemented).

The test_expected.py is now hopefully comprehensive, covering the entire
feature set of the "Expected" parameter and all its various cases and
subcases.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190910092208.23461-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
23bb3948ee alternator-test: reproduce bug in Expected with EQ of set value
Our implementation of the "EQ" operator in Expected (conditional
operation) just compares the JSON representation of the values.
This is almost always correct, but unfortunately incorrect for
sets - where two sets can be equal despite serializing their
elements in a different order.

This patch just adds an (xfailing) test for this bug.

The bug itself can be fixed in the future in one of several ways
including changing the implementation of EQ, or changing the
serialization of sets so they'll always be sorted in the same
way.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190909125147.16484-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
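The pitfall can be reproduced in miniature (illustrative only): comparing serialized representations treats equal sets as different when their elements were serialized in a different order.

```python
# Illustrative only: representation comparison vs. semantic comparison
# of a DynamoDB-style string-set value.
import json

a = {'SS': ['x', 'y']}  # a string-set value
b = {'SS': ['y', 'x']}  # the same set, serialized in another order

assert json.dumps(a) != json.dumps(b)  # representation comparison: "differ"
assert set(a['SS']) == set(b['SS'])    # semantic comparison: equal
```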
Nadav Har'El
13d657b20d alternator: implement the Expected request parameter
In this patch we implement the Expected parameter for the UpdateItem,
PutItem and DeleteItem operations. This parameter allows a conditional
update - i.e., do an update only if the existing value of the item
matches some condition.
This is the older form of conditional updates, but is still used by many
applications, including Amazon's Tic-Tac-Toe demo.

As usual, we do not yet provide isolation guarantees for read-modify-write
operations - the item is simply read before the modification, and there is
no protection against concurrent operation. This will of course need to be
addressed in the future.

The Expected parameter has a relatively large number of variations, and most
of them are supported by this code, except that currently only two comparison
operators are supported (EQ and BEGINS_WITH) out of the 13 listed in the
documentation. The rest will be implemented later.

This patch also includes comprehensive tests for the Expected feature.
These tests are almost exhaustive, except for one missing part (labeled FIXME) -
among the 13 comparison operations, the tests only check the EQ and BEGINS_WITH
operators. We'll later need to add checks to the rest of them as well.
As usual, all the tests pass on Amazon DynamoDB, and after this patch all
of them succeed on Alternator too.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190905125558.29133-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Piotr Sarna
c5fc48d1ee alternator: add returning PAY_PER_REQUEST billing mode
In order for Spark jobs to work correctly, a hardcoded PAY_PER_REQUEST
billing mode entry is returned when describing a table with
a DescribeTable request.
Also, one test case in test_describe_table.py is no longer marked XFAIL.
Message-Id: <a4e6d02788d8be48b389045e6ff8c1628240197c.1567688894.git.sarna@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
b58eadd6c9 alternator: update docs/alternator.md on GSI/LSI situation
Update docs/alternator.md on the current level of compatibility of our
GSI and LSI implementation vs. DynamoDB.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190904120730.12615-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Eliran Sinvani
a6f600c54f Alternator: Add getting started document for alternator
This patch adds a getting started document for alternator,
it explains how to start up a cluster that has an alternator
API port open and how to test that it works using either an
application or some simple and minimal python scripts.
The goal of the document is to get a user to have an up and
running docker based cluster with alternator support in the
shortest time possible.
2019-09-11 18:01:05 +03:00
Eliran Sinvani
573ff2de35 move alternator.md to its own directory
As part of trying to make alternator more accessible
to users, we expect more documents to be created so
it seems like a good idea to give all of the alternator
docs their own directory.
2019-09-11 18:01:05 +03:00
Piotr Sarna
6579a3850a alternator-test: add xfail test for GSI with 2 regular columns
When updating the second regular base column that is also a view
key, the code in Scylla will assume it only needs to update an entry
instead of replacing an old one. This leads to inconsistencies
exposed in the test case.
Message-Id: <5dfeb9f61f986daa6e480e9da4c7aabb5a09a4ec.1567599461.git.sarna@scylladb.com>
2019-09-11 18:01:05 +03:00
Amnon Heiman
722b4b6e98 alternator/executor.cc: Latencies should use steady_clock
To get correct latency estimations, the executor should use a higher
clock resolution.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2019-09-11 18:01:05 +03:00
Piotr Sarna
b470137cea alternator-test: fix LSI tests
LSI tests are amended, so they no longer needlessly XPASS:
 * two xpassing tests are no longer marked XFAIL
 * there's an additional test for partial projection
   that succeeds on DynamoDB but does not yet work in alternator
Message-Id: <0418186cb6c8a91de84837ffef9ac0947ea4e3d3.1567585915.git.sarna@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
dc1d577421 alternator-test: fix test_describe_endpoints.py for AWS run
The previous patch fixed test_describe_endpoints.py for a local run
without an AWS configuration. But when running with "--aws", we do
need to use that AWS configuration, and this patch fixes this case.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
897dffb977 alternator-test: test_describe_endpoints.py without configuring AWS
Even when running against a local Alternator, Boto3 wants to know the
region name, and AWS credentials, even though they aren't actually needed.
For a local run, we can supply garbage values for these settings, to
allow a user who never configured AWS to run tests locally.
Running against "--aws" will, of course, still require the user to
configure AWS.

The previous patch already fixed this for most tests, this patch fixes the
same issue in test_describe_endpoints.py, which had a separate copy of the
problematic code.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
b39101cd04 alternator: run local tests without configuring AWS
Even when running against a local Alternator, Boto3 wants to know the
region name, and AWS credentials, even though they aren't actually needed.
For a local run, we can supply garbage values for these settings, to
allow a user who never configured AWS to run tests locally.
Running against "--aws" will, of course, still require the user to
configure AWS.

Also modified the README to be clearer, and more focused on the local
runs.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190708121420.7485-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Piotr Sarna
efff187deb alternator-test: add LSI tests
Cases for local secondary indexes are added - loosely based on
test_gsi.py suite.
2019-09-11 18:01:05 +03:00
Piotr Sarna
927dc87b9c alternator-test: bump create table time limit to 200s
Unfortunately the previous 100s limit proved not to be enough
for creating tables with both local and global indexes attached
to them. Empirically, 200s was chosen as a safe default,
as the longest test oscillated around 100s with a deviation of 10s.
2019-09-11 18:01:05 +03:00
Piotr Sarna
2fcd1ff8a9 alternator: add basic LSI support
With this patch, LocalSecondaryIndexes can be added to a table
during its creation. The implementation is heavily shared
with GlobalSecondaryIndexes and as such suffers from the same TODOs:
projections, describing more details in DescribeTable, etc.
2019-09-11 18:01:05 +03:00
Nadav Har'El
7b8917b5cb alternator: rename reserved column name "attrs"
We currently reserve the column name "attrs" for a map of attributes,
so the user is not allowed to use this name as a name of a key.

We plan to lift this reservation in a future patch, but until we do,
let's at least choose a more obscure name to forbid - in this patch ":attrs".
It is even less likely that a user will want to use this specific name
as a column name.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190903133508.2033-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Piotr Sarna
ef7903a90f alternator: migrate make_map_element_restriction to string view
In order to elide unnecessary copying and allow more copy elision
in the future, make_map_element_restriction helper function
uses string_view instead of a const string reference.
Message-Id: <1a3e82e7046dc40df604ee7fbea786f3853fee4d.1567502264.git.sarna@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
fc946ddfba alternator: clean error, not a crash, on reserved column name
Currently, we reserve the name ATTRS_COLUMN_NAME ("attrs") - the user
cannot use it as a key column name (key of the base table or GSI or LSI)
because we use this name for the attribute map we add to the schema.

Currently, if the user does attempt to create such a key column, the
result is undefined (sometimes corrupt sstables, sometimes outright crashes).
This patch fixes it to become a clean error, saying that this column name is
currently reserved.

The test test_create_table_special_column_name now cleanly fails, instead
of crashing Scylla, so it is converted from "skip" to "xfail".

Eventually we need to solve this issue completely (e.g., in rare cases
rename columns to allow us to reserve a name like ATTRS_COLUMN_NAME,
or alternatively, instead of using a fixed name ATTRS_COLUMN_NAME pick a
different one different from the key column names). But until we do,
better fail with a clear error instead of a crash.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190901102832.7452-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Piotr Sarna
d64980f2ae alternator-test: add initial test_condition_expression file
The file initially consists of a very simple case that succeeds
with `--aws` and expectedly fails without it, because the expression
is not implemented yet.
2019-09-11 18:01:05 +03:00
Piotr Sarna
80edc00f62 alternator-test: add tests for unsupported expressions
The test cases are marked XFAIL, as their expressions are not yet
supported in alternator. With `--aws`, they pass.
2019-09-11 18:01:05 +03:00
Pekka Enberg
380a7be54b dist/docker: Add support for Alternator
This adds a "alternator-address" and "alternator-port" configuration
options to the Docker image, so people can enable Alternator with
"docker run" with:

  docker run --name some-scylla -d <image> --alternator-port=8080
Message-Id: <20190902110920.19269-1-penberg@scylladb.com>
2019-09-11 18:01:05 +03:00
Piotr Sarna
3fae8239fa alternator: throw on unsupported expressions
When an unsupported expression parameter is encountered -
such as KeyConditionExpression, ConditionExpression or
FilterExpression - alternator will return an error instead of
ignoring the parameter.
2019-09-11 18:01:05 +03:00
Amnon Heiman
811df711fb alternator/executor: update the latencies histogram
This patch updates the latencies histogram for the get, put, delete and
update operations.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2019-09-11 18:01:05 +03:00
Amnon Heiman
4a6d1f5559 alternator/stats metrics: use labels and estimated histogram
This patch makes two changes to the alternator stats:
1. It adds an estimated_histogram for the get, put, update and delete
operations.

2. It changes the metrics naming so that the operation becomes a label,
which makes the metrics easier to handle, aggregate and display.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
de53ed7cdd alternator_test: mark test_gsi_3 as passing
The test_gsi_3, involving creating a GSI with two key columns which weren't
previously a base key, now passes, so drop the "xfail" marker.

We still have problems with such materialized views, but not in the simple
scenario tested by test_gsi_3.

Later we should create a new test for the scenario which still fails, if
any.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Piotr Sarna
0e6338ffd9 alternator: allow creating GSI with 2 base regular columns
Creating an underlying materialized view with 2 regular base columns
is risky in Scylla, as second's column liveness will not be correctly
taken into account when ensuring view row liveness.
Still, in case specific conditions are met:
 * the regular base column value is always present in the base row
 * no TTLs are involved
then the materialized view will behave as expected.

Creating a GSI with 2 base regular columns issues a warning,
as it should be performed with care.
Message-Id: <5ce8642c1576529d43ea05e5c4bab64d122df829.1567159633.git.sarna@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
3325e76c6f alternator: fix default BillingMode
It is important that BillingMode should default to PROVISIONED, as it
does on DynamoDB. This allows old clients, which don't specify
BillingMode at all, to specify ProvisionedThroughput as allowed with
PROVISIONED.

Also added a test case for this case (where BillingMode is absent).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190829193027.7982-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
395a97e928 alternator: correct error on missing index or table
When querying on a missing index, DynamoDB returns different errors in
case the entire table is missing (ResourceNotFoundException) or the table
exists and just the index is missing (ValidationException). We didn't
make this distinction, and always returned ValidationException, but this
confuses clients that expect ResourceNotFoundException - e.g., Amazon's
Tic-Tac-Toe demo.

This patch adds a test for the first case (the completely missing table) -
we already had a test for the second case - and returns the correct
error codes. As usual the test passes against DynamoDB as well as Alternator,
ensuring they behave the same.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190829174113.5558-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
62c4ed8ee3 alternator: improve request logging
We needlessly split the trace-level log message for the request into two
messages - one containing just the operation's name, and one with the
parameters. Moreover we printed them in the opposite order (parameters
first, then the operation). So this patch combines them into one log
message.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190829165341.3600-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
f755c22577 alternator-test: reproduce bug with using "attrs" as key column name
Alternator puts in the Scylla table a column called "attrs" for all the
non-key attributes. If the user happens to choose the same name, "attrs",
for one of the key columns, the result of writing two different columns
with the same name is a mess and corrupt sstables.

This test reproduces this bug (and works against DynamoDB of course).

Because the test doesn't cleanly fail, but rather leaves Scylla in a bad
state from which it can't fully recover, the test is marked as "skip"
until we fix this bug.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190828135644.23248-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Piotr Sarna
6b27eaf4d0 alternator: remove redundant key checks in UpdateItem
Updating key columns is not allowed in UpdateItem requests,
but the series introducing GSI support for regular columns
also introduced redundant duplicate checks of this kind.
This condition is already checked in resolve_update_path helper function
and existing test_update_expression_cannot_modify_key test makes sure that
the condition is checked.
Message-Id: <00f83ab631f93b263003fb09cd7b055bee1565cd.1567086111.git.sarna@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
04a117cda3 alternator-test: improve test_update_expression_cannot_modify_key
The test test_update_expression_cannot_modify_key() verifies that an
update expression cannot modify one of the key columns. The existing
test only tried the SET and REMOVE actions - this patch makes the
test more complete by also testing the ADD and DELETE actions.

This patch also makes the expected exception more picky - we now
expect that the exception message contains the word "key" (as it,
indeed, does on both DynamoDB and Alternator). If we get any other
exception, there may be a problem.

The test passed before this patch, and passes now as well - it's just
stricter now.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190829135650.30928-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Piotr Sarna
81a97b2ac0 alternator-test: add test case for GSI with both keys
A case which adds a global secondary index on a table with both
hash and sort keys is added.
2019-09-11 18:01:05 +03:00
Piotr Sarna
615603877c alternator: use from_single_value instead of from_singular in ck
The code previously used clustering_key::from_singular() to compute
a clustering key value. It works fine, but has two issues:
1. involves one redundant deserialization stage compared to
   from_single_value
2. does not work with compound clustering keys, which can appear
   when using indexes
2019-09-11 18:01:05 +03:00
Piotr Sarna
4474ceceed alternator-test: enable passing tests
With more GSI features implemented, tests with XPASS status are promoted
to being enabled.

One test case (test_gsi_describe) is partially done as DescribeTable
now contains index names, but we could try providing more attributes
(e.g. IndexSizeBytes and ItemCount from the test case), so the test
is left in the XFAIL state.
2019-09-11 18:01:05 +03:00
Piotr Sarna
f922d6d771 alternator: Add 'mismatch' to serialization error message
In order to match the tests and origin more properly, the error message
for mismatched types is updated so it contains the word 'mismatch'.
2019-09-11 18:01:05 +03:00
Piotr Sarna
9dceea14f9 alternator: add describing GSI in DescribeTable
The DescribeTable request now contains the list of index names
as well. None of the attributes of the list are marked as 'required'
in the documentation, so currently the implementation provides
index names only.
2019-09-11 18:01:05 +03:00
Piotr Sarna
938a06e4c0 alternator: allow adding GSI-related regular columns to schema
In order to be able to create a Global Secondary Index over a regular
column, this column is upgraded from being a map entry to being a full
member of the schema. As such, it's possible to use this column
definition in the underlying materialized view's key.
2019-09-11 18:01:05 +03:00
Piotr Sarna
2a123925ca alternator: add handling regular columns with schema definitions
In order to prepare alternator for adding regular columns to schema,
i.e. in order to create a materialized view over them,
the code is changed so that updating no longer assumes that only keys
are included in the table schema.
2019-09-11 18:01:05 +03:00
Piotr Sarna
befa2fdc80 alternator: start fetching all regular columns
Since in the future we may want to have more regular columns
in alternator tables' schemas, the code is changed accordingly,
so all regular columns will be fetched instead of just the attribute
map.
2019-09-11 18:01:05 +03:00
Piotr Sarna
53044645aa alternator: avoid creating empty collection mutations
If no regular column attributes are passed to PutItem, the attr
collector nonetheless serializes an empty collection mutation and
sends it. This is redundant, so if the attr collector is empty, the
collection is no longer serialized and sent to replicas.
2019-09-11 18:01:05 +03:00
Nadav Har'El
317954fe19 alternator-test: add license blurbs
Add copyright and license blurbs to all alternator-test source files.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190825161018.10358-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
c9eb9d9c76 alternator: update license blurbs
Update all the license blurbs to the one we use in the open-source
Scylla project, licensed under the AGPL.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190825160321.10016-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Piotr Sarna
d6e671b04f alternator: add initial tracing to requests
Each request provides basic tracing information about itself.

Example output from tracing:

cqlsh> select request, parameters from system_traces.sessions
           where session_id = 39813070-c4ea-11e9-8572-000000000000;
 request          | parameters
------------------+-----------------------------------------------------
 Alternator Query | {'query': '{"TableName": "alternator_test_15664",
                    "KeyConditions": {"p": {"AttributeValueList":
                    [{"S": "T0FE0QCS0X"}], "ComparisonOperator": "EQ"}}}'}

cqlsh> select session_id, activity from system_traces.events
           where session_id = 39813070-c4ea-11e9-8572-000000000000;
 session_id                           | activity
--------------------------------------+-----------------------------
 39813070-c4ea-11e9-8572-000000000000 |                    Querying
 39813070-c4ea-11e9-8572-000000000000 | Performing a database query
2019-09-11 18:01:05 +03:00
Piotr Sarna
cb791abb9d alternator: enable query tracing
Probabilistic tracing can be enabled via REST API. Alternator will
from now on create tracing sessions for its operations as well.

Examples:

 # trace around 0.1% of all requests
curl -X POST http://localhost:10000/storage_service/trace_probability?probability=0.001
 # trace everything
curl -X POST http://localhost:10000/storage_service/trace_probability?probability=1
2019-09-11 18:01:05 +03:00
Piotr Sarna
6c8c31bfc9 alternator: add client state
Keeping an instance of client_state is a convenient way of being able
to use tracing for alternator. It's also currently used in paging,
so adding a client state to the executor removes the need to keep
a dummy value.
2019-09-11 18:01:05 +03:00
Piotr Sarna
1ca9dc5d47 alternator: use correct string views in serialization
String views used in JSON serialization should use not only the pointer
returned by rapidjson, but also the string length, as it may contain
\0 characters.
Additionally, one unnecessary copy is elided.
2019-09-11 18:01:05 +03:00
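The length bug can be illustrated by analogy (concept only - the fix itself is about C++ string_view construction): treating a buffer as NUL-terminated drops everything after an embedded \0, while a (pointer, length) view preserves the whole string.

```python
# Illustrative analogy: what strlen()-based code sees vs. what a
# length-aware (pointer, length) view preserves.
buf = 'ab\0cd'

c_string_view = buf.split('\0', 1)[0]  # stops at the embedded NUL
full_view = buf                        # keeps the full length

assert c_string_view == 'ab'
assert len(full_view) == 5
```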
Nadav Har'El
32b898db7b alternator: docs/alternator.md: link to a longer document
Add a link to a longer document (currently, around 40 pages) about
DynamoDB's features and how we implemented or may implement them in
Alternator.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190825121201.31747-2-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
a5c3d11ccb alternator: document choice of RF
After changing the choice of RF in a previous patch, let's update the
relevant part of docs/alternator.md.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190825121201.31747-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
d20ec9f492 alternator: expand docs/alternator.md
Expand docs/alternator.md with new sections about how to run Alternator,
and a very brief introduction to its design.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190818164628.12531-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
9b0ef1a311 alternator: refuse CreateTable if uses unsupported features
If a user tries to create a table with an unsupported feature -
a local secondary index, a user-defined encryption key or supporting
streams (CDC), let's refuse the table creation, so the application
doesn't continue thinking this feature is available to it.

The "Tags" feature is also not supported, but it is more harmless
(it is used mostly for accounting purposes) so we do not fail the
table creation because of it.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190818125528.9091-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Piotr Sarna
ab25472034 alternator: migrate to visitor pattern in serialization
Types can now be processed with a visitor pattern, which is neater
than a chain of if statements.
Message-Id: <256429b7593d8ad8dff737d8ddb356991fb2a423.1566386758.git.sarna@scylladb.com>
2019-09-11 18:01:05 +03:00
Piotr Sarna
42d2910f2c alternator: add from_string with raw pointer to rjson
from_string is a family of functions that create rjson values from
strings - now it is extended to accept a raw pointer and a size.
Message-Id: <d443e2e4dcc115471202759ecc3641ec902ed9e4.1566386758.git.sarna@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
2f53423a2f alternator: automatically choose RF: 1 or 3
In CQL, before a user can create a table, they must create a keyspace to
contain this table and, among other things, specify this keyspace's RF.

But in the DynamoDB API, there is no "create keyspace" operation - the
user just creates a table, and there is no way, and no opportunity,
to specify the requested RF. Presumably, Amazon always uses the same
RF for all tables, most likely 3, although this is not officially
documented anywhere.

The existing code creates the keyspace during Scylla boot, with RF=1.
This RF=1 always works, and is a good choice for a one-node test run,
but was a really bad choice for a real cluster with multiple nodes, so
this patch fixes this choice:

With this patch, the keyspace creation is delayed - it doesn't happen
when the first node of the cluster boots, but only when the user creates
the first table. Presumably, at that time, the cluster is already up,
so at that point we can make the obvious choice automatically: a one-node
cluster will get RF=1, a >=3 node cluster will get RF=3. The choice of
RF is logged - and the choice of RF=1 is considered a warning.

Note that with this patch, keyspace creation is still automatic as it
was before. The user may manually create the keyspace via CQL, to
override this automatic choice. In the future we may also add additional
keyspace configuration options via configuration flags or new REST
requests, and the keyspace management code will also likely change
as we start to support clusters with multiple regions and global
tables. But for now, I think the automatic method is easiest for
users who want to test-drive Alternator without reading lengthy
instructions on how to set up the keyspace.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190820180610.5341-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
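A minimal sketch of the RF selection described in the commit above. The actual logic lives in Scylla's C++ keyspace-creation path; the `live_nodes` parameter and the treatment of a 2-node cluster (grouped with the 1-node case) are assumptions made for this sketch:

```python
import logging

def choose_rf(live_nodes):
    """Pick a replication factor for the auto-created alternator keyspace.

    One node -> RF=1 (logged as a warning, since it offers no redundancy);
    >=3 nodes -> RF=3.  Treating a 2-node cluster like the 1-node case is
    an assumption of this sketch, not something stated in the commit.
    """
    rf = 3 if live_nodes >= 3 else 1
    if rf == 1:
        logging.warning("Creating keyspace with RF=1 - data is not replicated")
    else:
        logging.info("Creating keyspace with RF=%d", rf)
    return rf
```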
Piotr Sarna
1a1935eb72 alternator-test: add a test for wrong BEGINS_WITH target type
The test ensures that passing a non-compatible type to BEGINS WITH,
e.g. a number, results in a validation error.
Tested both locally and remotely.
Message-Id: <894a10d3da710d97633dd12b6ac54edccc18be82.1566291989.git.sarna@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
b7b998568f alternator: add to CreateTable verification of BillingMode setting
We allow BillingMode to be set to either PAY_PER_REQUEST (the default)
or PROVISIONED, although neither mode is fully implemented: In the former
case the payment isn't accounted, and in the latter case the throughput
limits are not enforced.
But other settings for BillingMode are now refused, and we add a new test
to verify that.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190818122919.8431-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
66a2af4f7d alternator-test: require a new-enough boto library
The alternator tests want to exercise many of the DynamoDB API features,
so they need a recent enough version of the client libraries, boto3
and botocore. In particular, only in botocore 1.12.54, released a year
ago, was support for BillingMode added - and we rely on this to create
pay-per-request tables for our tests.

Instead of letting the user run with an old version of this library and
get dozens of mysterious errors, in this patch we add a test to conftest.py
which cleanly aborts the test if the libraries aren't new enough, and
recommends a "pip" command to upgrade these libraries.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190819121831.26101-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
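The version gate described above can be sketched as a simple tuple comparison. The `check_botocore` name is hypothetical; the real check lives in the test suite's conftest.py:

```python
def version_tuple(v):
    # "1.12.54" -> (1, 12, 54); non-numeric suffix characters are ignored
    parts = []
    for p in v.split('.'):
        digits = ''.join(c for c in p if c.isdigit())
        if digits:
            parts.append(int(digits))
    return tuple(parts)

def check_botocore(installed, required="1.12.54"):
    """True if the installed botocore is new enough (BillingMode support
    arrived in 1.12.54, per the commit message)."""
    return version_tuple(installed) >= version_tuple(required)
```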
Nadav Har'El
64bf2b29a8 alternator-test: exhaustive tests for DescribeTable operation
The DescribeTable operation was currently implemented to return the
minimal information that libraries and applications usually need from
it, namely verifying that some table exists. However, this operation
is actually supposed to return a lot more information fields (e.g.,
the size of the table, its creation date, and more) which we currently
don't return.

This patch adds a new test file, test_describe_table.py, testing all
these additional attributes that DescribeTable is supposed to return.
Several of the tests are marked xfail (expected to fail) because we
did not implement these attributes yet.

The test is exhaustive except for attributes that have to do with four
major features which will be tested together with these features: GSI,
LSI, streams (CDC), and backup/restore.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190816132546.2764-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
fbd2f5077d alternator: enable timeouts on requests
Currently Alternator starts all Scylla requests (including both reads
and writes) without any timeout set. Because of bugs and/or network
problems, Requests can theoretically hang and waste Scylla request for
hours, long after the client has given up on them and closed their
connection.

The DynamoDB protocol doesn't let a user specify which timeout to use,
so we should just use something "reasonable", in this patch 10 seconds.
Remember that all DynamoDB read and write requests are small (even scans
just scan a small piece), so 10 seconds should be above and beyond
anything we actually expect to see in practice.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190812105132.18651-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
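The idea of bounding every request with a fixed server-chosen timeout (since the DynamoDB protocol gives the client no way to pick one) can be sketched with asyncio; the real implementation uses Seastar futures in C++, and `with_timeout` here is a hypothetical helper name:

```python
import asyncio

REQUEST_TIMEOUT = 10.0  # seconds, matching the fixed value in the commit

async def with_timeout(coro, timeout=REQUEST_TIMEOUT):
    """Bound a read/write coroutine by a fixed timeout so a hung request
    cannot tie up resources long after the client has gone away."""
    try:
        return await asyncio.wait_for(coro, timeout)
    except asyncio.TimeoutError:
        raise RuntimeError("request timed out after %.0f seconds" % timeout)
```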
Nadav Har'El
b2bd3bbc1f alternator: add "--alternator-address" configuration parameter
So far we had the "--alternator-port" option, allowing the user to configure
the port on which the Alternator server listens, but the server always listened
to any address. It is important to also be able to configure the listen
address - it is useful in tests running several instances of Scylla on
the same machine, and useful in multi-homed machines with several interfaces.

So this patch adds the "--alternator-address" option, defaulting to 0.0.0.0
(to listen on all interfaces). It works like the many other "--*-address"
options that Scylla already has.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190808204641.28648-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
ea41dd2cf8 alternator: docs/alternator.md more about filtering support
Give more details about what is, and what isn't, currently
supported in filtering of Scan (and Query) results.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190811094425.30951-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Piotr Sarna
88eed415bd alternator: fix indentation
It turns out that recent rjson patches introduced some buggy
tabs instead of spaces due to bad IDE configuration. The indentation
is restored to spaces.
2019-09-11 18:01:05 +03:00
Piotr Sarna
3c11428d8d alternator-test: add QueryFilter validation cases
QueryFilter validation was lately supplemented with non-key column
checks, which is hereby tested.
2019-09-11 18:01:05 +03:00
Piotr Sarna
0e0dc14302 alternator-test: add scan case for key equality filtering
With key equality filtering enabled, a test case for scanning is provided.
2019-09-11 18:01:05 +03:00
Piotr Sarna
f1641caa41 alternator: add filtering for key equality
Until now, filtering in alternator was possible only for non-key
column equality relations. This commit adds support for equality
relations for key columns.
2019-09-11 18:01:05 +03:00
Piotr Sarna
a2828f9daa alternator: add validation to QueryFilter
QueryFilter, according to docs, can only contain non-key attributes.
2019-09-11 18:01:05 +03:00
Piotr Sarna
d055658fff alternator: add computing key bounds from filtering
Alternator allows passing hash and sort key restrictions
as filters - it is, however, better to incorporate these restrictions
directly into partition and clustering ranges, if possible.
It's also necessary, as optimizations inside restrictions_filter
assume that it will not be fed unneeded rows - e.g. if filtering
is not needed on partition key restrictions, they will not be checked.
2019-09-11 18:01:05 +03:00
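The split described above - push key restrictions down into partition/clustering ranges, and apply only the rest as per-row filters - can be sketched for the equality-only case. The function name and dict-based representation are assumptions for illustration:

```python
def split_filters(filters, hash_key, sort_key=None):
    """Split equality filters into restrictions that can be incorporated
    directly into partition/clustering ranges, and residual filters that
    must still be applied row by row after the read."""
    key_restrictions, residual = {}, {}
    for column, value in filters.items():
        if column == hash_key or (sort_key is not None and column == sort_key):
            key_restrictions[column] = value
        else:
            residual[column] = value
    return key_restrictions, residual
```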
Piotr Sarna
9c05051b59 alternator: extract getting key value subfunction
Currently the only utility function for getting key bytes
from JSON was to parse a document with the following format:
"key_column_name" : { "key_column_type" : VALUE }.
However, it's also useful to parse only the inner document, i.e.:
{ "key_column_type" : VALUE }.
2019-09-11 18:01:05 +03:00
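The two document shapes mentioned in the commit - the full `"key_column_name" : { "key_column_type" : VALUE }` form and the inner `{ "key_column_type" : VALUE }` form - can be sketched as a pair of parsers; the real code works on rjson values in C++, and these function names are hypothetical:

```python
def get_typed_value(typed_value, expected_type):
    """Parse the inner form { "key_column_type" : VALUE }."""
    if len(typed_value) != 1:
        raise ValueError("expected exactly one type/value pair")
    (actual_type, value), = typed_value.items()
    if actual_type != expected_type:
        raise ValueError("expected type %s, got %s" % (expected_type, actual_type))
    return value

def get_key_value(item, column_name, expected_type):
    """Parse the outer form "key_column_name" : { "key_column_type" : VALUE }
    by delegating to the inner-document parser."""
    return get_typed_value(item[column_name], expected_type)
```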
Piotr Sarna
c84019116a alternator: make make_map_element_restriction static
The function has no outside users and thus does not need to be exposed.
2019-09-11 18:01:05 +03:00
Piotr Sarna
3ee99a89b1 alternator: register filtering metrics
Three metrics related to filtering are added to alternator:
 - total rows read during filtering operations
 - rows read and matched by filtering
 - rows read and dropped by filtering
2019-09-11 18:01:05 +03:00
Piotr Sarna
b3e35dab26 alternator: add bumping filtering stats
When filtering is used in querying or scanning, the number of total
filtered rows is added to stats.
2019-09-11 18:01:05 +03:00
Piotr Sarna
a6d098d3eb alternator: add cql_stats to alternator stats
Some underlying operations (e.g. paging) make use of cql_stats
structure from CQL3. As such, cql_stats structure is added
to alternator stats in order to gather and use these statistics.
2019-09-11 18:01:05 +03:00
Piotr Sarna
3ae54892cd alternator: fix a comment typo
s/Miscellenous/Miscellaneous/g
2019-09-11 18:01:05 +03:00
Piotr Sarna
ccf778578a alternator: register read-before-write stats
Read-before-write stat counters were already introduced, but the metrics
needs to be added to a metric group as well in order to be available
for users.
2019-09-11 18:01:05 +03:00
Nadav Har'El
6f81d0cb15 alternator: initial support for GSI
This patch adds partial support for GSI (Global Secondary Index) in
Alternator, implemented using a materialized view in Scylla.

This initial version only supports the specific cases of the index indexing
a column which was already part of the base table's key - e.g., indexing
what used to be a sort key (clustering key) in the base table. Indexing
of non-key attributes (which today live in a map) is not yet supported in
this version.

Creation of a table with GSIs is supported, and so is deleting the table.
UpdateTable which adds a GSI to an existing table is not yet supported.
Query and Scan operations on the index are supported.
DescribeTable does not yet list the GSIs as it should.

Seven previously-failing tests now pass, so their "xfail" tag is removed.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190808090256.12374-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Piotr Sarna
33611acf44 alternator: add stats for read-before-write
A simple metric counting how many read-before-writes were executed
is added.
Message-Id: <d8cc1e9d77e832bbdeff8202a9f792ceb4f1e274.1565274797.git.sarna@scylladb.com>
2019-09-11 18:01:05 +03:00
Piotr Sarna
ae59340c15 alternator: complement rjson.hh comments
Some comments in rjson.hh header file were not clear and are hereby
amended.
Message-Id: <7fa4e2cf39b95c176af31fe66f404a6a51a25bec.1565275276.git.sarna@scylladb.com>
2019-09-11 18:01:04 +03:00
Piotr Sarna
5eb583ab09 alternator: remove missing key FIXME
The case for missing key in update_item was already properly fixed
along with migrating from libjsoncpp to rapidjson, but one FIXME
remained in the code by mistake.

Message-Id: <94b3cf53652aa932a661153c27aa2cb1207268c7.1565271432.git.sarna@scylladb.com>
2019-09-11 18:01:04 +03:00
Piotr Sarna
436f806341 alternator: remove decimal_type FIXME
Decimal precision problems were already solved by commit
d5a1854d93c9448b1d22c2d02eb1c46a286c5404, but one FIXME
remained in the code by mistake.

Message-Id: <381619e26f8362a8681b83e6920052919acf1142.1565271198.git.sarna@scylladb.com>
2019-09-11 18:01:04 +03:00
Piotr Sarna
b29b753196 alternator: add comments to rjson
The rapidjson library needs to be used with caution in order to
provide maximum performance and avoid undefined behavior.
Comments added to rjson.hh describe provided methods and potential
pitfalls to avoid.
Message-Id: <ba94eda81c8dd2f772e1d336b36cae62d39ed7e1.1565270214.git.sarna@scylladb.com>
2019-09-11 18:01:04 +03:00
Piotr Sarna
7b02c524d0 alternator: remove a pointer-based workaround for future<json>
With libjsoncpp we were forced to work around the problem of
non-noexcept constructors by using an intermediate unique pointer.
Objects provided by rapidjson have correct noexcept specifiers,
so the workaround can be dropped.
2019-09-11 18:01:04 +03:00
Piotr Sarna
cb29d6485e alternator: migrate to rapidjson library
Profiling alternator showed that JSON parsing takes up a fair amount
of CPU, and as such should be optimized. libjsoncpp is a standard
library for handling JSON objects, but it also proves slower than
rapidjson, which is hereby used instead.
The results indicated that libjsoncpp used roughly 30% of CPU
for a single-shard alternator instance under stress, while rapidjson
dropped that usage to 18% without optimizations.
Future optimizations should include eliding object copying, string copying
and perhaps experimenting with different JSON allocators.
2019-09-11 18:01:04 +03:00
Piotr Sarna
0fd1354ef9 alternator: add handling rapidjson errors in the server
If a JSON parsing error is encountered, it is transformed
to a validation exception and returned to the user in JSON form.
2019-09-11 18:01:04 +03:00
Piotr Sarna
7064b3a2bf alternator: add rapidjson helper functions
Migrating from libjsoncpp to rapidjson proved to be beneficial
for parsing performance. As a first step, a set of helper functions
is provided to ease the migration process.
2019-09-11 18:01:04 +03:00
Piotr Sarna
0b0bfc6e54 alternator: add missing namespaces to status_type
error.hh file implicitly assumed that seastar:: namespace is available
when it's included, which is not always the case. To remedy that,
seastar::httpd namespace is used explicitly.
2019-09-11 18:01:04 +03:00
Nadav Har'El
56309db085 alternator: correct catch table-already-exists exception
Our CreateTable handler assumed that the function
migration_manager::announce_new_column_family()
returns a failed future if the table already exists. But in some of
our code branches, this is not the case - the function itself throws
instead of returning a failed future. The solution is to use
seastar::futurize_apply() to handle both possibilities (direct exception
or future holding an exception).

This fixes a failure of the test_table.py::test_create_table_already_exists
test case.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 18:01:04 +03:00
Nadav Har'El
d74b203dee alternator: add docs/alternator.md
This adds a new document, docs/alternator.md, about Alternator.

The scope of this document should be expanded in the future. We begin
here by introducing Alternator and its current compatibility level with
Amazon DynamoDB, but it should later grow to explain the design of Alternator
and how it maps the DynamoDB data model onto Scylla's.

Whether this document should remain a short high-level overview, or a long
and detailed design document, remains an open question.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190805085340.17543-1-nyh@scylladb.com>
2019-09-11 18:01:04 +03:00
Piotr Sarna
75ee13e5f2 dependencies: add rapidjson
The rapidjson fast JSON parsing library is used instead of libjsoncpp
in the Alternator subproject.

[avi: update toolchain image to include the new dependency]

Message-Id: <a48104dec97c190e3762f927973a08a74fb0c773.1564995712.git.sarna@scylladb.com>
2019-09-11 18:00:44 +03:00
Nadav Har'El
5eaf73a292 alternator: fix sharing of a seastar::shared_ptr between threads
The function attrs_type() returns a supposed singleton, but because
it is a seastar::shared_ptr we can't share the same one between multiple
threads, and need to use a separate one per thread.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190804163933.13772-1-nyh@scylladb.com>
2019-09-11 16:06:05 +03:00
Nadav Har'El
1b1ede9288 alternator: fix cross-shard use of CQL type objects
The CQL type singletons like utf8_type et al. are separate for separate
shards and cannot be used across shards. So whatever hash tables we use
to find them, also needs to be per-shard. If we fail to do this, we
get errors running the debug build with multiple shards.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190804165904.14204-1-nyh@scylladb.com>
2019-09-11 16:05:39 +03:00
Nadav Har'El
7eae889513 alternator-test: some more GSI tests
Expand the GSI test suite. The most important new test is
test_gsi_key_not_in_index(), where the index's key includes just one of
the base table's key columns, but not a second one. In this case, the
Scylla implementation will nevertheless need to add the second key column
to the view (as a clustering key), even though it isn't considered a key
column by the DynamoDB API.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190718085606.7763-1-nyh@scylladb.com>
2019-09-11 16:05:38 +03:00
Nadav Har'El
10ad60f7de alternator: ListTables should not list materialized views
Our ListTables implementation uses get_column_families(), which lists both
base tables and materialized views. We will use materialized views to
implement DynamoDB's secondary indexes, and those should not be listed in
the results of ListTables.

The patch also includes a test for this.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190717133103.26321-2-nyh@scylladb.com>
2019-09-11 16:04:29 +03:00
Nadav Har'El
676ada4576 alternator-test: move list_tables to util.py
The list_tables() utility function was used only in test_table.py
but I want to use it elsewhere too (in GSI test) so let's move it
to util.py.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190717133103.26321-1-nyh@scylladb.com>
2019-09-11 16:04:28 +03:00
Piotr Sarna
f3963865f5 alternator: make set_sum exception more user-friendly
As in case of set_diff, an exception message in set_sum should include
the user-provided request (ADD) rather than our internal helper function
set_sum.
2019-09-11 16:03:27 +03:00
Piotr Sarna
9dd8644e4a alternator-tests: enable DELETE case for sets
UpdateExpression's case for DELETE operation for sets is enabled.
2019-09-11 16:03:26 +03:00
Piotr Sarna
2b215b159c alternator: implement set DELETE
UpdateExpression's DELETE operation for set is implemented on top
of set_diff helper function.
2019-09-11 16:02:25 +03:00
Piotr Sarna
fe72a6740c alternator: add set difference helper function
A function for computing the set difference of two sets represented
as JSON is added.
2019-09-11 16:01:03 +03:00
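The helper works on DynamoDB-style JSON sets such as `{"SS": ["a", "b"]}`. A minimal Python sketch of the same operation (the real helper operates on rjson values in C++, and order preservation here is an assumption):

```python
def set_diff(s1, s2):
    """Set difference of two DynamoDB-style JSON sets, e.g.
    {"SS": ["a", "b"]} minus {"SS": ["b"]} -> {"SS": ["a"]}.
    Raises if the two sets have different element types."""
    (t1, v1), = s1.items()
    (t2, v2), = s2.items()
    if t1 != t2:
        raise ValueError("mismatched set types: %s vs %s" % (t1, t2))
    remove = set(v2)
    return {t1: [x for x in v1 if x not in remove]}
```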
Nadav Har'El
e13c56be0b alternator: fail attempt to create table with GSI
Although we do not support GSI yet, until now we silently ignored
CreateTable's GSI parameter, and the user wouldn't know the table
wasn't created as intended.

In this patch, GSI is still unsupported, but now CreateTable will
fail with an error message that GSI is not supported.

We need to change some of the tests which test the error path, and
expect an error - but should not consider a table creation error
as the expected error.

After this patch, test_gsi.py still fails all the tests on
Alternator, but much more quickly :-)

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190711161420.18547-1-nyh@scylladb.com>
2019-09-11 16:00:01 +03:00
Piotr Sarna
336c90daaa alternator-test: add stub case for set add duplication
The test case for adding two sets with common values is added.
This case is a stub, because boto3 transforms the result into a Python
set, which removes duplicates on its own. A proper TODO is left
in order to migrate this case to a lower-level API and check
the returned JSON directly for lack of duplicates.
2019-09-11 16:00:00 +03:00
Piotr Sarna
67c95cb303 alternator-test: enable tests for ADD operation
Tests for UpdateExpression::ADD are enabled.
2019-09-11 15:59:59 +03:00
Piotr Sarna
f29c2f6895 alternator: add ADD operation
UpdateExpression is now able to perform ADD operation on both numbers
and sets.
2019-09-11 15:59:00 +03:00
Piotr Sarna
a5f2926056 alternator: add helper function for adding sets
A helper function that allows creating a set sum out of two sets
represented in JSON is added.
2019-09-11 15:57:41 +03:00
Piotr Sarna
18686ff288 alternator: add unwrap_set
It will be needed later to implement adding sets.
2019-09-11 15:56:15 +03:00
Piotr Sarna
09993cf857 alternator: add get_item_type_string helper function
It will be useful later for ensuring that parameters for various
functions have matching types.
2019-09-11 15:52:31 +03:00
Nadav Har'El
d54c82209c alternator: fix Query verification of appropriate key columns
The Query operation's conditions can be used to search for a particular
hash key or both hash and sort keys - but not any other combinations.
We previously forgot to verify most errors, so in this patch we add
missing verifications - and tests to confirm we fail the query when
DynamoDB does.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190711132720.17248-1-nyh@scylladb.com>
2019-09-11 15:51:27 +03:00
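The allowed combinations - hash key alone, or hash key together with sort key, nothing else - can be sketched as a small validator (names are hypothetical; the real verification is done while parsing the Query conditions in C++):

```python
def validate_query_conditions(condition_columns, hash_key, sort_key=None):
    """A Query's conditions may restrict the hash key alone, or the hash
    key together with the sort key - any other combination is an error."""
    cols = set(condition_columns)
    if cols == {hash_key}:
        return
    if sort_key is not None and cols == {hash_key, sort_key}:
        return
    raise ValueError("Query conditions must target the hash key, "
                     "optionally together with the sort key")
```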
Nadav Har'El
fbe63ddcc4 alternator-test: more GSI tests
Add more tests for GSI - tests that DescribeTable describes the GSI,
and test the case of more than one GSI for a base table.

Unfortunately, creating an empty table with two GSIs routinely takes
more than a full minute (!) on DynamoDB, so because we now have a
test with two GSIs, I had to increase the timeout in create_test_table().

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190711112911.14703-1-nyh@scylladb.com>
2019-09-11 15:51:26 +03:00
Piotr Sarna
a3be9dda7f alternator-test: enable if_not_exists-related tests
Test cases that relied on the implementation of if_not_exists are
enabled.
2019-09-11 15:51:25 +03:00
Piotr Sarna
cec82490d2 alternator: implement if_not_exists
The if_not_exists function is implemented on the basis of recently added
read-before write mechanism.
2019-09-11 15:50:22 +03:00
Piotr Sarna
b14e3c0e72 alternator: rename holds_path to a more generic name
The holds_path() utility function is actually used to check if a value
needs read before write, so its name is changed to more fitting
check_needs_read_before_write.
2019-09-11 15:49:19 +03:00
Nadav Har'El
5fc7b0507e alternator: fix bug in collection mutations
Alternator currently keeps an item's attributes inside a map, and we
had a serious bug in the way we build mutations for this map:

We didn't know there was a requirement to build this mutation sorted by
the attribute's name. When we neglect to do this sorting, this confuses
Scylla's merging algorithms, which assume collection cells are thus
sorted, and the result can be duplicate cells in a collection, and the
visible effect is a mutation that seems to be ignored - because both
old and new values exist in the collection.

So this patch includes a new helper class, "attribute_collector", which
helps collect attribute updates (put and del) and extract them in correctly
sorted order. This helper class also eliminates some duplication of
arcane code to create collection cells or deletions of collection cells.

This patch includes a simple test that previously failed, and one xfail
test that failed just because of this bug (this was the test that exposed
this bug). Both tests now succeed.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190709160858.6316-1-nyh@scylladb.com>
2019-09-11 15:48:18 +03:00
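The role of the "attribute_collector" class - gather put/delete operations on the attribute map and emit them sorted by attribute name, as the merge algorithm requires - can be sketched like this (the Python class and tuple encoding are illustrative stand-ins for the C++ mutation-building code):

```python
class AttributeCollector:
    """Collect put/delete operations on an item's attribute map and emit
    them sorted by attribute name.  Emitting collection cells out of key
    order is what confused the merging algorithm in the bug above."""
    def __init__(self):
        self._ops = {}

    def put(self, name, value):
        self._ops[name] = ("put", value)

    def delete(self, name):
        self._ops[name] = ("del", None)

    def build(self):
        # Sorted by attribute name, so the resulting cells are in key order.
        return [(name,) + op for name, op in sorted(self._ops.items())]
```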
Nadav Har'El
5cce53fed9 alternator-test: exhaustive tests for GSI
This patch adds what is hopefully an exhaustive test suite for the
global secondary indexing (GSI) feature, and all its various
complications and corner cases of how GSIs can be created, deleted,
named, written, read, and more (the tests are heavily documented to
explain what they are testing).

All these tests pass on DynamoDB, and fail on Alternator, so they are
marked "xfail". As we develop the GSI feature in Alternator piece by
piece, we should make these tests start to pass.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190708160145.13865-1-nyh@scylladb.com>
2019-09-11 15:48:17 +03:00
Nadav Har'El
9eea90d30d alternator-test: another test for BatchWriteItem
This adds another test for BatchWriteItem: that if one of the operations is
invalid - e.g., has a wrong key type - the entire batch is rejected and
none of its operations are done - not even the valid ones.

The test succeeds, because we already handle this case correctly.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190707134610.30613-1-nyh@scylladb.com>
2019-09-11 15:48:16 +03:00
Nadav Har'El
01f4cf1373 alternator-test: test UpdateItem's SET with #reference
Test an operation like SET #one = #two, where the RHS has a reference
to a name, rather than the name itself. Also verify that DynamoDB
gives an error if ExpressionAttributeNames includes names not needed
by either the left or right hand side of such assignments.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190708133311.11843-1-nyh@scylladb.com>
2019-09-11 15:48:15 +03:00
Piotr Sarna
e482f27e2f alternator-test: add test for reading key before write
The test case checks if reading keys in order to use their values
in read-before-write updates works fine.
2019-09-11 15:48:14 +03:00
Piotr Sarna
7b605d5bec alternator-test: add test case for nested read-before-write
A test for read-before-write in nested paths (inside a function call
or inside a +/- operator) is added.
2019-09-11 15:48:13 +03:00
Piotr Sarna
da795d8733 alternator-test: enable basic read-before-write cases
With unsafe read-before-write implemented, simple cases can be enabled
by removing their xfail flag.
2019-09-11 15:48:12 +03:00
Piotr Sarna
2e473b901a alternator: fix indentation 2019-09-11 15:48:09 +03:00
Piotr Sarna
bf13564a9d alternator: add unsafe read-before-write to update_item
In order to serve update requests that depend on read-before-write,
a proper helper function which fetches the existing item with a given
key from the database is added.
This read-before-write mechanism is not considered safe, because it
provides no linearizability guarantees and offers no synchronization
protection. As such, it should be considered a placeholder that works
fine on a single machine and/or with no concurrent access to the same key.
2019-09-11 15:45:21 +03:00
Piotr Sarna
2fb711a438 alternator: add context parameters to calculate_value
The calculate_value utility function is going to need more context
in order to resolve paths present in the right-hand side of update_item
operators: update_info and schema.
2019-09-11 15:40:17 +03:00
Piotr Sarna
cbe1836883 alternator: add allowing key columns when resolving path
Historically, resolving a path checked for key columns, which are not
allowed to be on the left-hand side of the assignment. However, path
resolving will now also be used for right-hand side, where it should
be allowed to use the key value.
2019-09-11 15:39:15 +03:00
Piotr Sarna
20a6077fb3 alternator: add optional previous item to calculate_value
In order to implement read-before-write in the future, calculate_value
now accepts an additional parameter: previous_item. If read-before-write
was performed, previous_item will contain an item for the given key
which already exists in the database at the time of the update.
2019-09-11 15:38:13 +03:00
Piotr Sarna
784aaaa8ff alternator: move describe_item implementation up
It will be needed later to add read-before-write to update_item.
2019-09-11 15:37:13 +03:00
Nadav Har'El
bd4dfa3724 alternator-test: move create_test_table() to util.py
This patch moves the create_test_table() utility function, which creates
a test table with a unique name, from the fixtures (conftest.py) to
util.py. This will allow reusing this function in tests which need to
create tables but not through the existing fixtures. In particular
we will need to do this for GSI (global secondary index) tests
in the next patch.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190708104438.5830-1-nyh@scylladb.com>
2019-09-11 15:37:12 +03:00
Nadav Har'El
ce13a0538c alternator-test: expand tests of duplicate items in BatchWriteItem
The tests we had for BatchWriteItem's refusal to accept duplicate keys
only used test_table_s, with just a hash key. This patch adds tests
for test_table, i.e., a table with both hash and sort keys - to check
that we check duplicates in that case correctly as well.

Moreover, the expanded tests also verify that although identical
keys are not allowed, keys with just one component (hash or sort key)
the same but the other not the same - are fine.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190705191737.22235-1-nyh@scylladb.com>
2019-09-11 15:37:11 +03:00
Nadav Har'El
9bc2685a92 alternator-test: run local tests without configuring AWS
Even when running against a local Alternator, Boto3 wants to know the
region name, and AWS credentials, even though they aren't actually needed.
For a local run, we can supply garbage values for these settings, to
allow a user who never configured AWS to run tests locally.
Running against "--aws" will, of course, still require the user to
configure AWS.

Also modified the README to be clearer, and more focused on the local
runs.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190708121420.7485-1-nyh@scylladb.com>
2019-09-11 15:37:10 +03:00
Nadav Har'El
cb42c75e0a alternator-test: don't hardcode us-east-1 region
For "--aws" tests, use the default region chosen by the user in the
AWS configuration (~/.aws/config or environment variable), instead of
hard-coding "us-east-1".

Patch by Pekka Enberg.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190708105852.6313-1-nyh@scylladb.com>
2019-09-11 15:37:09 +03:00
Piotr Sarna
8f9e720f10 alternator-test: enable precision test for add
With big_decimal-based implementation, the precision test passes.
Message-Id: <6d631a43901a272cb9ebd349cb779c9677ce471e.1562318971.git.sarna@scylladb.com>
2019-09-11 15:37:08 +03:00
Piotr Sarna
78e495fac3 alternator: allow arithmetics without losing precision
Calculating value represented as 'v1 + v2' or 'v1 - v2' was previously
implemented with a double type, which offers limited precision.
From now on, these computations are based on big_decimal, which
allows returning values without losing precision.
This patch depends on 'add big_decimal arithmetic operators' series.
Message-Id: <f741017fe3d3287fa70618068bdc753bfc903e74.1562318971.git.sarna@scylladb.com>
2019-09-11 15:36:08 +03:00
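Python's `decimal.Decimal` illustrates the same point the commit makes about big_decimal versus `double`: decimal arithmetic keeps the result exact where binary floating point does not (the function name is hypothetical):

```python
from decimal import Decimal

def calculate_add(v1, v2):
    """'v1 + v2' with arbitrary-precision decimal arithmetic, standing in
    for the big_decimal-based implementation.  The equivalent float
    addition loses precision on these same inputs."""
    return str(Decimal(v1) + Decimal(v2))
```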
Piotr Sarna
466f25b1e8 alternator-test: enable batch duplication cases
With duplication checks implemented, batch write and delete tests
no longer need to be marked @xfail.
Message-Id: <6c5864607e06e8249101bd711dac665743f78d9f.1562325663.git.sarna@scylladb.com>
2019-09-11 15:36:07 +03:00
Piotr Sarna
eb7ada8387 alternator: add checking for duplicate keys in batches
Batch writes and batch deletes do not allow multiple entries
for the same key. This patch implements checking for duplicated
entries and throws an error if applicable.
Message-Id: <450220ba74f26a0893430cb903e4749f978dfd31.1562325663.git.sarna@scylladb.com>
2019-09-11 15:35:01 +03:00
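The duplicate check can be sketched by freezing each full key into a hashable form; note that keys sharing only one component (hash or sort) are still distinct, matching the test expansion in the later commit. Names are illustrative:

```python
def check_batch_duplicates(keys):
    """keys: list of key dicts, e.g. [{"p": "x", "c": 1}, ...].
    Raise if the same full key appears more than once in the batch -
    keys that share only one component are fine."""
    seen = set()
    for key in keys:
        frozen = tuple(sorted(key.items()))
        if frozen in seen:
            raise ValueError("Provided list of item keys contains duplicates")
        seen.add(frozen)
```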
Nadav Har'El
b810fa59c4 alternator-test: move utility functions to a new "util.py"
Move some common utility functions to a common file "util.py"
instead of repeating them in many test files.

The utility functions include random_string(), random_bytes(),
full_scan(), full_query(), and multiset() (the more general
version, which also supports freezing nested dicts).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190705081013.1796-1-nyh@scylladb.com>
2019-09-11 15:35:00 +03:00
Nadav Har'El
2fb77ed9ad alternator: use std::visit for reading std::variant
The idiomatic way to use an std::variant depending on the type it holds is to use
std::visit. This modern API makes it unnecessary to write many boiler-plate
functions to test and cast the type of the variant, and makes it impossible
to forget one of the options. So in this patch we throw out the old ways,
and welcome the new.

Thanks to Piotr Sarna for the idea.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190704205625.20300-1-nyh@scylladb.com>
2019-09-11 15:33:57 +03:00
Nadav Har'El
4d07e2b7c5 alternator: support BatchGetItem
This patch adds to Alternator an implementation of the BatchGetItem
operation, which allows starting a number of GetItem requests in parallel
in a single request.

The implementation is almost complete - the only missing feature is the
ability to ask only for non-top-level attributes in ProjectionExpression.
Everything else should work, and this patch also includes tests which,
as usual, pass on DynamoDB and now also on Alternator.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:33:50 +03:00
Nadav Har'El
d1a5512a35 alternator: fix second boot
Amazingly, it appears we never tested booting Alternator a second time :-)

Our initialization code creates a new keyspace, and was supposed to ignore
the error if this keyspace already existed - but we thought the error would
come as an exceptional future, which it didn't - it came as a thrown
exception. So we need to change handle_exception() to a try/catch.

With this patch, I can kill Alternator and it will correctly start again.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:22:48 +03:00
Nadav Har'El
374162f759 alternator: generate error on spurious key columns
Operations which take a key as parameter, namely GetItem, UpdateItem,
DeleteItem and BatchWriteItem's DeleteRequest, already fail if the given
key is missing one of the necessary key attributes, or has the wrong types
for them. But they should also fail if the given key has spurious
attributes beyond those actually needed in a key.

So this patch adds this check, and tests to confirm that we do these checks
correctly.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:21:50 +03:00
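A minimal sketch of such a key check (hypothetical helper and parameter names; the real check lives in the C++ request handlers):

```python
def validate_key(key, schema_key_names):
    """Check that `key` has exactly the table's key attributes - no more, no less.

    `key` maps attribute name -> value; `schema_key_names` is the set of key
    attributes defined for the table, e.g. {'p'} or {'p', 'c'}.
    """
    missing = schema_key_names - set(key)
    if missing:
        raise ValueError(f"Key is missing attributes: {sorted(missing)}")
    spurious = set(key) - schema_key_names
    if spurious:
        raise ValueError(f"Key has spurious attributes: {sorted(spurious)}")
```

Both failure modes the commits discuss - a missing key attribute and a spurious extra one - are rejected by the same helper.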
Nadav Har'El
da4da6afbf alternator: fix PutItem to really replace item.
The PutItem operation, and also the PutRequest of BatchWriteItem, are
supposed to completely replace the item - not to merge the new value with
the previous value. We implemented this wrongly - we just wrote the new
item, forgetting to first write a tombstone to remove the old item.

So this patch fixes these operations, and adds tests which confirm the
fix (as usual, these tests pass on DynamoDB, failed on Alternator before
this patch, and pass after the patch).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:20:55 +03:00
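The replace semantics can be illustrated with a toy in-memory table (not the Scylla implementation, which achieves the same effect with a row tombstone plus the new values):

```python
def put_item(table, key, attrs):
    # PutItem completely replaces the item; old attributes are never merged in.
    table[key] = dict(attrs)

table = {}
put_item(table, 'k1', {'a': 1, 'b': 2})
put_item(table, 'k1', {'c': 3})
# After the second put, 'a' and 'b' are gone - only {'c': 3} remains.
```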
Nadav Har'El
a0fffcebde alternator: add support for DeleteRequest in BatchWriteItem
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:20:01 +03:00
Nadav Har'El
83b91d4b49 alternator: add DeleteItem
Add support for the DeleteItem operation, which deletes an item.

The basic deletion operation is supported. Still not supported are:

1. Parameters to conditionally delete (ConditionalExpression or Expected)
2. Parameters to return pre-delete content
3. ReturnItemCollectionMetrics (statistics relevant for tables with LSI)

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:19:46 +03:00
Nadav Har'El
b09603ed9b alternator: cleaner error on DeleteRequest
In BatchWriteItem, we currently only support the PutRequest operation.
If a user tries to use DeleteRequest (which we don't support yet), they
will get a bizarre error. Let's test the request type more carefully,
and print a better error message. This will also be the place where
eventually we'll actually implement the DeleteRequest.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:16:02 +03:00
Nadav Har'El
a7f7ce1a73 alternator-test: tests for BatchWriteItem
This patch adds more comprehensive tests for the BatchWriteItem operation,
in a new file batch_test.py. The one test we already had for it was also
moved from test_item.py here.

Some of the tests still xfail for two reasons:
1. Support for the DeleteRequest operation of BatchWriteItem is missing.
2. Tests that forbid duplicate keys in the same request are missing.

As usual, all tests succeed on DynamoDB, and hopefully (I tried...)
cover all the BatchWriteItem features.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:16:01 +03:00
Nadav Har'El
a8dd3044e2 alternator: support (most of) ProjectionExpression
DynamoDB has two similar parameters - AttributesToGet and
ProjectionExpression - which are supported by the GetItem, Scan and
Query operations. Until now we supported only the older AttributesToGet,
and this patch adds support to the newer ProjectionExpression.

Besides having a different syntax, the main difference between
AttributesToGet and ProjectionExpression is that the latter also
allows fetching only a specific nested attribute, e.g., a.b[3].c.
We do not support this feature yet, although it would not be
hard to add it: With our current data representation, it means
fetching the top-level attribute 'a', whose value is a JSON, and then
post-filtering it to take out only the '.b[3].c'. We'll do that
later.

This patch also adds more test cases to test_projection_expression.py.
All tests except three which check the nested attributes now pass,
and those three xfail (they succeed on DynamoDB, and fail as expected
on Alternator), reminding us what still needs to be done.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:15:01 +03:00
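The post-filtering idea mentioned above - fetch the top-level attribute, then walk the nested path - can be sketched like this (the path representation is an assumption of this sketch):

```python
def extract_path(doc, path):
    # A path like a.b[3].c is given here as ['a', 'b', 3, 'c']:
    # string steps index into maps, integer steps into lists.
    for step in path:
        doc = doc[step]
    return doc
```

For example, walking `['a', 'b', 3, 'c']` over `{'a': {'b': [0, 1, 2, {'c': 'hit'}]}}` yields `'hit'`.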
Nadav Har'El
98c4e646a5 alternator-test: tests for yet-unimplemented ProjectionExpression
Our GetItem, Query and Scan implementations support the AttributesToGet
parameter to fetch only a subset of the attributes, but we don't yet
support the more elaborate ProjectionExpression parameter, which is
similar but has a different syntax and also allows specifying nested
document paths.

This patch adds extensive testing of all the ProjectionExpression features.
All these tests pass against DynamoDB, but fail against the current
Alternator so they are marked "xfail". These tests will be helpful for
developing the ProjectionExpression feature.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:15:00 +03:00
Nadav Har'El
7c9e64ed81 alternator-test: more tests for AttributesToGet parameter
The AttributesToGet parameter - saying which attributes to fetch for each
item - is already supported in the GetItem, Query and Scan operations.
However, we only had a test for it for Scan. This patch adds
similar tests also for the GetItem and Query operations.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:14:59 +03:00
Nadav Har'El
9c53f33003 alternator-test: another test for top-level attribute overwrite
Yet another test for overwriting a top-level attribute which contains
a nested document - here, overwriting it by just a string.

This test passes. In the current implementation we don't yet support
updates to specific attribute paths (e.g. a.b[3].c) but we do support
well writing and over-writing top-level attributes.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:14:58 +03:00
Nadav Har'El
f6fa971e96 alternator: initial implementation of "+" and "-" in UpdateExpression
This patch implements the last (finally!) syntactic feature of the
UpdateExpression - the ability to do SET a=val1+val2 (where, as
before, each of the values can be a reference to a value, an
attribute path, or a function call).

The implementation is not perfect: It adds the values as double-precision
numbers, which can lose precision. So the patch adds a new test which
checks that the precision isn't lost - a test that currently fails
(xfail) on Alternator, but passes on DynamoDB. The pre-existing test
for adding small integer now passes on Alternator.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:14:01 +03:00
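The precision problem is easy to demonstrate: a double keeps only about 16 significant digits, while DynamoDB numbers carry up to 38. A small Python illustration:

```python
from decimal import Decimal, getcontext

getcontext().prec = 38  # DynamoDB numbers carry up to 38 significant digits

exact = Decimal('3.1415926535897932384626433832795028841')
# Round-tripping through a double (~16 significant digits) loses digits:
assert Decimal(float(exact)) != exact
# Arbitrary-precision arithmetic at 38 digits keeps the value intact:
assert exact + Decimal(1) - Decimal(1) == exact
```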
Nadav Har'El
a5af962d80 alternator: support the list_append() function in UpdateExpression
In the previous patch we added function-call support in the UpdateExpression
parser. In this patch we add support for one such function - list_append().
This function takes two values, confirms they are lists, and concatenates
them. After this patch only one function remains unimplemented:
if_not_exists().

We also split the test we already had for list_append() into two tests:
One uses only value references (":val") and passes after this patch.
The second test also uses references to other attributes and will only
work after we start supporting read-modify-write.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:13:07 +03:00
Nadav Har'El
9d2eba1c75 alternator: parse more types of values in UpdateExpression
Until this patch, in update expressions like "SET a = :val", we only
allowed the right-hand-side of the assignment to be a reference to a
value stored in the request - like ":val" in the above example.

But DynamoDB also allows the value to be an attribute path (e.g.,
"a.b[3].c"), and can also be a function of a bunch of other values.
This patch adds support for parsing all these value types.

This patch only adds the correct parsing of these additional types of
values, but they are still not supported: reading existing attributes
(i.e., read-modify-write operations) is still not supported, and
none of the two functions which UpdateExpression needs to support
are supported yet. Nevertheless, the parsing is now correct, and the
"unknown_function" test starts to pass.

Note that DynamoDB allows the right-hand side of an assignment to be
not only a single value, but also value+value and value-value. This
possibility is not yet supported by the parser and will be added
later.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:12:06 +03:00
Piotr Sarna
cb50207c7b alternator-test: add initial filtering test for scans
Currently the only supported case is equality on non-key attributes.
More complex filtering tests are also included in test_query.py.
2019-09-11 15:12:05 +03:00
Piotr Sarna
b5eb3aed10 alternator-test: add initial filtering test for query
The test cases verify that equality-based filtering on non-key
attributes works fine. It also contains test stubs for key filtering
and non-equality attribute filtering.
2019-09-11 15:12:04 +03:00
Piotr Sarna
319e946d8f alternator-test: diversify attribute values in filled test table
Filled test table used to have identical non-key attributes for all
rows. These values are now diversified in order to allow writing
filtering test cases.
2019-09-11 15:12:03 +03:00
Piotr Sarna
e4516617eb alternator: add filtering to Query
Query requests now accept QueryFilter parameter.
2019-09-11 15:11:10 +03:00
Piotr Sarna
4ea02bec89 alternator: enable filtering for Scan
Scans can now accept ScanFilter parameter to perform filtering
on returned rows.
2019-09-11 15:10:12 +03:00
Piotr Sarna
8cb078f757 alternator: add initial filtering implementation
Filtering is currently only implemented for the equality operator
on non-key attributes.
Next steps (TODO) involve:
1. Implementing filtering for key restrictions
2. Implementing non-key attribute filtering for operators other than EQ.
   It, in turn, may involve introducing a 'map value restrictions' notion
   to Scylla, since now it only allows equality restrictions on map
   values (alternator attributes are currently kept in a CQL map).
3. Implementing FilterExpression in addition to deprecated QueryFilter
2019-09-11 15:08:50 +03:00
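A sketch of the equality-only filtering, using the ScanFilter/QueryFilter request shape (ComparisonOperator / AttributeValueList); this Python model is illustrative only:

```python
def apply_scan_filter(items, scan_filter):
    """Keep only items matching every condition; only EQ is handled,
    matching the initial implementation described above."""
    def matches(item):
        for attr, cond in scan_filter.items():
            if cond['ComparisonOperator'] != 'EQ':
                raise NotImplementedError(cond['ComparisonOperator'])
            if item.get(attr) != cond['AttributeValueList'][0]:
                return False
        return True
    return [item for item in items if matches(item)]
```

An item that lacks the filtered attribute entirely simply fails the EQ condition and is dropped.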
Nadav Har'El
aa94e7e680 alternator: clean up parsing of attribute-path components
Before this patch, we read either an attribute name like "name" or
a reference to one ("#name") as one type of token - NAME.
However, while attribute paths indeed can use either one, in some other
contexts - such as a function name - only "name" is allowed, so we
need to distinguish between two types of tokens: NAME and NAMEREF.

While separating those, I noticed that we incorrectly allowed a "#"
followed by *zero* alphanumeric characters to be considered a NAMEREF,
which it shouldn't. In other words, NAMEREF should have ALNUM+, not ALNUM*.
Same for VALREF, which can't be just a ":" with nothing after it.
So this patch fixes these mistakes, and adds tests for them.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:08:36 +03:00
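The fixed token rules, expressed as regular expressions (the exact character class is an assumption for illustration - the real lexer is ANTLR-generated):

```python
import re

# A name reference is '#' followed by ONE OR MORE alphanumerics (ALNUM+),
# and a value reference likewise for ':' - a bare '#' or ':' is not a token.
NAMEREF = re.compile(r'#[A-Za-z0-9_]+\Z')
VALREF = re.compile(r':[A-Za-z0-9_]+\Z')

assert NAMEREF.match('#name')
assert not NAMEREF.match('#')    # ALNUM* would wrongly accept this
assert VALREF.match(':val')
assert not VALREF.match(':')
```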
Nadav Har'El
13476c8202 alternator: complain about unused values or names in UpdateExpression
DynamoDB complains, and fails an update, if the update contains in
ExpressionAttributeNames or ExpressionAttributeValues names which aren't
used by the expression.

Let's do the same, although sadly this means more work to track which
of the references we've seen and which we haven't.

This patch makes two previously xfail (expected fail) tests become
successful tests on Alternator (they always succeeded against DynamoDB).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:07:35 +03:00
Nadav Har'El
c4fc02082b alternator-test: complete test for UpdateItem's UpdateExpression
The existing tests in test_update_expression.py thoroughly tested the
UpdateExpression features which we currently support. But tests for
features which Alternator *doesn't* yet support were partial.

In this patch, we add a large number of new tests to
test_update_expression.py aiming to cover ALL the features of
UpdateExpression, regardless of whether we already support it in
Alternator or not. Every single feature and esoteric edge-case I could
discover is covered in these tests - and as far as I know these tests
now cover the *entire* UpdateExpression feature. All the tests succeed
on DynamoDB, and confirm our understanding of what DynamoDB actually does
on all these cases.

After this patch, test_update_expression.py is a whopper, with 752 lines of
code and 37 separate test functions. 23 out of these 37 tests are still
"xfail" - they succeed on DynamoDB but fail on Alternator, because of
several features we are still missing. Those missing features include
direct updates of nested attributes, read-modify-write updates (e.g.,
"SET a=b" or "SET a=a+1"), functions (e.g., "SET a = list_append(a, :val)"),
the ADD and DELETE operations on sets, and various other small missing
pieces.

The benefit of this whopper test is two-fold: First, it will allow us
to test our implementation as we continue to fill it (i.e., "test-
driven development"). Second, all these tested edge cases basically
"reverse engineer" how DynamoDB's expression parser is supposed to work,
and we will need this knowledge to implement the still-missing features of
UpdateExpression.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:07:34 +03:00
Nadav Har'El
ede5943401 alternator-test: test for UpdateItem's UpdateExpression
This patch adds an extensive array of tests for UpdateItem's UpdateExpression
support, which was introduced in the previous patch.

The tests include verification of various edge cases of the parser, support
for ":value" and "#name" references, functioning SET and REMOVE operations,
combinations of multiple such operations, and much more.

As usual, all these tests were run and succeed on DynamoDB, as well as on
Alternator - to confirm Alternator behaves the same as DynamoDB.

There are two tests marked "xfail" (expected to fail), because Alternator
still doesn't support the attribute copy syntax (e.g., "SET a = b",
doing a read-before-write).

There are some additional areas which we don't support - such as the DELETE
and ADD operations or SET with functions - but those areas aren't yet tested
in these tests.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:07:33 +03:00
Nadav Har'El
4baa0d3b67 alternator: enable support for UpdateItem's UpdateExpression
For the UpdateItem operation, so far we supported updates via the
AttributeUpdates parameter, specifying which attributes to set or remove
and how. But this parameter is considered deprecated, and DynamoDB supports
a more elaborate way to modify attributes, via an "UpdateExpression".

In the previous patch we added a function to parse such an UpdateExpression,
and in this patch we use the result of this parsing to actually perform
the required updates.

UpdateExpression is only partially supported after this patch. The basic
"SET" and "REMOVE" operations are supported, but various other cases aren't
fully supported and will be fixed in followup patches. The following
patch will add extensive tests to confirm exactly what works correctly
with the new UpdateExpression support.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:06:34 +03:00
Nadav Har'El
829bafd181 alternator: add expression parsers
The DynamoDB protocol is based on JSON, and most DynamoDB requests describe
the operation and its parameters via JSON objects such as maps and lists.
However, in some types of requests an "expression" is passed as a single
string, and we need to parse this string. These cases include:
1. Attribute paths, such as "a[3].b.c", are used in projection
 expressions as well as inside other expressions described below.
2. Condition expressions, such as "(NOT (a=b OR c=d)) AND e=f",
 used in conditional updates, filters, and other places.
3. Update expressions, such as "SET #a.b = :x, c = :y DELETE d"

This patch introduces the framework to parse these expressions, and
an implementation of parsing update expressions. These update expressions
will be used in the UpdateItem operation in the next patch.

All these expression syntaxes are very simple: Most of them could be
parsed as regular expressions, or at most a simple hand-written lexical
analyzer and recursive-descent parser. Nevertheless, we decided to specify
these parsers in the same ANTLR3 language already used in the Scylla
project for parsing CQL, hopefully making these parsers easier to reason
about, and easier to change if needed - and reducing the amount of boiler-
plate code.

The parsing of update expressions is mostly complete, except that in SET
actions, only the "path = value" form is supported and not yet forms
such as "path1 = path2" (which does read-before-write) or
"path1 = path1 + value" or "path = function(...)".

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:06:12 +03:00
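The update-expression grammar is simple enough that a hand-written sketch conveys its shape. The real parser is generated from ANTLR3; this Python tokenizer and parser are only illustrative, handling just SET with value references and REMOVE:

```python
import re

# Keywords first, then value references, names, and punctuation.
TOKEN = re.compile(r'SET|REMOVE|:[A-Za-z0-9_]+|[A-Za-z_][A-Za-z0-9_]*|[=,]')

def parse_update_expression(expr):
    tokens = TOKEN.findall(expr)
    actions = {'SET': [], 'REMOVE': []}
    clause = None
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in ('SET', 'REMOVE'):
            clause = tok
            i += 1
        elif clause == 'SET':
            path, eq, value = tokens[i:i + 3]   # "path = :value"
            assert eq == '='
            actions['SET'].append((path, value))
            i += 3
            if i < len(tokens) and tokens[i] == ',':
                i += 1
        elif clause == 'REMOVE':
            actions['REMOVE'].append(tok)
            i += 1
            if i < len(tokens) and tokens[i] == ',':
                i += 1
        else:
            raise ValueError('expression must start with SET or REMOVE')
    return actions
```

For example, "SET a = :x, b = :y REMOVE c" parses into two SET actions and one REMOVE action.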
Nadav Har'El
f0f50607a7 alternator-test: split nested-document tests to new file
We need to write more tests for various case of handling
nested documents and nested attributes. Let's collect them
all in the same test file.

This patch mostly moves existing code, but also adds one
small test, test_nested_document_attribute_write, which
just writes a nested document and reads it back (it's
mostly covered by the existing test_put_and_get_attribute_types,
but is specifically about a nested document).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:06:11 +03:00
Nadav Har'El
12abe8e797 alternator-test: make local test the default
We usually run Alternator tests against the local Alternator - testing
against AWS DynamoDB is rarer, and usually just done when writing the
test. So let's make "pytest" without parameters default to testing locally.
To test against AWS, use "pytest --aws" explicitly.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:06:10 +03:00
Piotr Sarna
b67f22bfc6 alternator: move related functions to serialization.cc
Existing functions related to serialization and deserialization
are moved to serialization.cc source file.
Message-Id: <fb49a08b05fdfcf7473e6a7f0ac53f6eaedc0144.1559646761.git.sarna@scylladb.com>
2019-09-11 15:06:05 +03:00
Piotr Sarna
fdba9866fc alternator: apply new serialization to reads and writes
Attributes for reads (GetItem, Query, Scan, ...) and writes (PutItem,
UpdateItem, ...) are now serialized and deserialized in binary form
instead of raw JSON, provided that their type is S, B, BOOL or N.
Optimized serialization for the rest of the types will be introduced
as follow-ups.
Message-Id: <6aa9979d5db22ac42be0a835f8ed2931dae208c1.1559646761.git.sarna@scylladb.com>
2019-09-11 15:02:21 +03:00
Piotr Sarna
b3fd4b5660 alternator: add simple attribute serialization routines
Attributes used to be written into the database in raw JSON format,
which is far from optimal. This patch introduces more robust
serialization routines for simple alternator types: S, B, BOOL, N.
Serialization uses the first byte to encode attribute type
and follows with serializing data in binary form.
More complex types (sets, lists, etc.) are currently still
serialized in raw JSON and will be optimized in follow-up patches.
Message-Id: <10955606455bbe9165affb8ac8fba4d9e7c3705f.1559646761.git.sarna@scylladb.com>
2019-09-11 15:01:07 +03:00
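A sketch of the type-tag-then-payload scheme. The tag byte values and payload encodings here are invented for illustration; the actual on-disk encoding is defined by the C++ code:

```python
# Hypothetical tag bytes - one leading byte identifies the attribute type.
TAGS = {'S': b'\x01', 'B': b'\x02', 'BOOL': b'\x03'}

def serialize(attr_type, value):
    if attr_type == 'S':
        payload = value.encode('utf-8')
    elif attr_type == 'B':
        payload = value                      # already raw bytes
    elif attr_type == 'BOOL':
        payload = b'\x01' if value else b'\x00'
    else:
        raise NotImplementedError(attr_type)  # N, sets, lists handled elsewhere
    return TAGS[attr_type] + payload
```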
Piotr Sarna
27f00d1693 alternator: move error class to a separate header
Error class definitions were previously in server.hh, but they
are separate entities - future .cc files can use the errors without
the need of including server definitions.
Message-Id: <b5689e0f4c9f9183161eafff718f45dd8a61b653.1559646761.git.sarna@scylladb.com>
2019-09-11 14:52:58 +03:00
Nadav Har'El
52810d1103 configure.py: move alternator source files to separate list
For some unknown reason we put the list of alternator source files
in configure.py inside the "api" list. Let's move it into a separate
list.

We could have just put it in the scylla_core list, but that would cause
frequent and annoying patch conflicts when people add alternator source
files and Scylla core source files concurrently.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:52:39 +03:00
Nadav Har'El
d4b3c493ad alternator: stub support for UpdateItem with UpdateExpression
So far for UpdateItem we only supported the old-style AttributeUpdates
parameter, not the newer UpdateExpression. This patch begins the path
to supporting UpdateExpression. First, trying to use *both* parameters
should result in an error, and this patch implements (and tests) that.
Second, passing neither parameter is allowed, and should result in
an *empty* item being created.

Finally, since today we do not yet support UpdateExpression, this patch
will cause UpdateItem to fail if UpdateExpression is used, instead of
silently ignoring it as we did so far.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:51:40 +03:00
Nadav Har'El
04856a81f5 alternator-tests: two simple test for nested documents
This patch adds two simple tests for nested documents, which pass:

test_nested_document_attribute_overwrite() tests what happens when
we UpdateItem a top-level attribute to a dictionary. We already tested
this works on an empty item in a previous test, but now we check what
happens when the attribute already existed, and already was a dictionary,
and now we update it to a new dictionary. In the test attribute a was
{b:3, c:4} and now we update it to {c:5}. The test verifies that the new
dictionary completely replaces the old one - the two are not merged.
The new value of the attribute is just {c:5}, *not* {b:3, c:5}.

The second test verifies that the AttributeUpdates parameter of
UpdateItem cannot be used to update just a nested attribute.
Any dot in the attribute name is considered an actual dot - not
part of a path of attribute names.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:51:39 +03:00
Nadav Har'El
b782d1ef8d alternator-test: test_query.py: change item list comparison
Comparing two lists of items without regard for order is not trivial.
For this reason some tests in test_query.py only compare arrays of sort
keys, and those tests are fine.

But other tests used a trick of converting a list of items into a set
via set_of_frozen_elements() and comparing these sets. This trick is almost
correct, but it can miss cases where items repeat.

So in this patch, we replace the set_of_frozen_elements() approach by
a similar one using a multiset (set with repetitions) instead of a set.
A multiset in Python is "collections.Counter". This is the same approach
we also started to use in test_scan.py in a recent patch.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:51:38 +03:00
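The multiset comparison boils down to a few lines with collections.Counter:

```python
from collections import Counter

def multiset(items):
    # Freeze each item (a dict) into a hashable form, then count repetitions.
    return Counter(tuple(sorted(item.items())) for item in items)

a = [{'p': 1}, {'p': 1}, {'p': 2}]
b = [{'p': 1}, {'p': 2}]
```

Unlike a plain set, the Counter distinguishes `a` from `b` (a repeated item matters) while still ignoring order.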
Nadav Har'El
15f47a351e alternator: remove unused code
Remove the incomplete and unused function to convert DynamoDB type names
to ScyllaDB type objects:

DynamoDB has a different set of types relevant for keys and for attributes.
We already have a separate function, parse_key_type(), for parsing key
types, and for attributes - we don't currently parse the type names at
all (we just save them as JSON strings), so the function we removed here
wasn't used, and was in fact #if'ed out. It was never completed, and it now
started to decay (the type for numbers is wrong), so we're better off
completely removing it.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:50:44 +03:00
Nadav Har'El
b63bd037ea alternator: implement correct "number" type for keys
This patch implements a fully working number type for keys, and now
Alternator fully and correctly supports every key type - strings, byte
arrays, and numbers.

The patch also adds a test which verifies that Scylla correctly sorts
number sort keys, and also correctly retrieves them to the full precision
guaranteed by DynamoDB (38 decimal digits).

The implementation uses Scylla's "decimal" type, which supports arbitrary
precision decimal floating point, and in particular supports the precision
specified by DynamoDB. However, "decimal" is actually over-qualified for
this use, so might not be optimal for the more specific requirements of
DynamoDB. So a FIXME is left to optimize this case in the future.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:49:47 +03:00
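Python's decimal module illustrates both properties the commit relies on - 38-digit precision survives a round trip, and values order numerically rather than lexicographically:

```python
from decimal import Decimal

# 38 significant digits survive a store/retrieve round trip.
n = Decimal('12345678901234567890.123456789012345678')
assert Decimal(str(n)) == n

# Decimal keys sort numerically; their string forms would sort wrongly
# (lexicographically, '10' comes before '9.5').
keys = [Decimal('10'), Decimal('9.5'), Decimal('-2'), Decimal('0.1')]
assert [str(k) for k in sorted(keys)] == ['-2', '0.1', '9.5', '10']
```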
Nadav Har'El
cb1b2b1fc2 alternator-test: test_scan.py: change item list comparison
Comparing two lists of items without regard for order is not trivial.
test_scan.py currently has two ways of doing this, both unsatisfactory:

1. We convert each list to a set via set_of_frozen_elements(), and compare
   the sets. But this comparison can miss cases where items repeat.

2. We use sorted() on the list. This doesn't work on Python 3 because
   it removed the ability to compare (with "<") dictionaries.

So in this patch, we replace both by a new approach, similar to the first
one except we use a multiset (set with repetitions) instead of a set.
A multiset in Python is "collections.Counter".

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:49:46 +03:00
Nadav Har'El
4a1b6bf728 alternator-test: drop "test_2_tables" fixture
Creating and deleting tables is the slowest part of our tests,
so we should lower the number of tables our tests create.

We had a "test_2_tables" fixture as a way to create two
tables, but since our tests already create other tables
for testing different key types, it's faster to reuse those
tables - instead of creating two more unused tables.

On my system, a "pytest --local", running all 38 tests
locally, drops from 25 seconds to 20 seconds.

As a bonus, we also have one fewer fixture ;-)

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:49:45 +03:00
Nadav Har'El
013fb1ae38 alternator-test: fix errors in len/length variable name
Also change "xrage" to "range" to appease Python 3

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:49:44 +03:00
Nadav Har'El
30a123d8ad DynamoDB limits the size of hash keys to 2048 bytes, sort keys
to 1024 bytes, and the entire item to 400 KB which therefore also
limits the size of one attribute. This test checks that we can
reach up to these limits, with binary keys and attributes.

The test does *not* check what happens once we exceed these
limits. In such a case, DynamoDB throws an error (I checked that
manually) but Alternator currently simply succeeds. If in the
future we decide to add artificial limits to Alternator as well,
we should add such tests as well.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:49:43 +03:00
Nadav Har'El
b91eca28bd alternator-test: don't use "len" as a parameter name
"len" is an unfortunate choice for a variable name, in case one
day the implementation may want to call the built-in "len" function.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:49:42 +03:00
Nadav Har'El
e21e0e6a37 alternator-test: test sort-key ordering - for both string and binary keys
We already have a test for *string* sort-key ordering of items returned
by the Scan operation, and this test adds a similar test for the Query
operation. We verify that items are retrieved in the desired sorted
order (sorted by the aptly-named sort key) and not in creation order
or any other wrong order.

But beyond just checking that Query works as expected (it should,
given it uses the same machinery as Scan), the nice thing about this
test is that it doesn't create a new table - it uses a shared table
and creates one random partition inside it. This makes this test
faster and easier to write (no need for a new fixture), and most
importantly - easily allows us to write similar tests for other
key types.

So this patch also tests the correct ordering of *binary* sort keys.
It helped expose bugs in previous versions of the binary key implementation.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:49:41 +03:00
Nadav Har'El
1d058cf753 alternator-test: test item operations with binary keys
Simple tests for item operations (PutItem, GetItem) with binary key instead
of string for the hash and sort keys. We need to be able to store such
keys, and then retrieve them correctly.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:49:40 +03:00
Nadav Har'El
4bfd5d7ed1 alternator: add support for bytes as key columns
Until now we only supported string for key columns (hash or sort key).
This patch adds support for the bytes type (a.k.a. binary or blob) as well.
The last missing type to be supported in keys is the number type.

Note that in JSON, bytes values are represented with base64 encoding,
so we need to decode them before storing the decoded value, and re-encode
when the user retrieves the value. The decoding is important not just
for saving storage space (the encoding is 4/3 the size of the decoded)
but also for correct *sorting* of the binary keys.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:49:35 +03:00
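Why decoding matters for sorting is easy to see in Python: base64 text does not preserve the byte order of its payload:

```python
import base64

raw = b'\x00\xffbinary-key'
encoded = base64.b64encode(raw).decode('ascii')  # 4/3 the size of the payload
assert base64.b64decode(encoded) == raw

# b'\xff' sorts after b'a' as bytes, but its base64 form '/w==' sorts
# *before* 'YQ==' (the base64 form of b'a') - so keys must be stored decoded.
assert sorted([b'a', b'\xff']) == [b'a', b'\xff']
assert sorted(['YQ==', '/w==']) == ['/w==', 'YQ==']
```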
Nadav Har'El
57b46a92d7 alternator: add base64 encoding and decoding functions
The DynamoDB API uses base64 encoding to encode binary blobs as JSON
strings. So we need functions to do these conversions.

This code was "inspired" by https://github.com/ReneNyffenegger/cpp-base64
but doesn't actually copy code from it.

I didn't write any specific unit tests for this code, but it will be
exercised and tested in a following patch which tests Alternator's use
of these functions.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:46:13 +03:00
Piotr Sarna
0980fde9d5 alternator-test: add dedicated BEGINS_WITH case to Query
BEGINS_WITH behaves in a special way when a key postfix
consists of <255> bytes. The initial test does not use that
and instead checks UTF-8 characters, but once bytes type
is implemented for keys, it should also test specifically for
corner cases, like strings that consist of <255> bytes only.
Message-Id: <fe10d7addc1c9d095f7a06f908701bb2990ce6fe.1558603189.git.sarna@scylladb.com>
2019-09-11 14:46:12 +03:00
Piotr Sarna
5bc7bb00e0 alternator-test: rename test_query_with_paginator
Paginator is an implementation detail and does not belong in the name,
and thus the test is renamed to test_query_basic_restrictions.
Message-Id: <849bc9d210d0faee4bb8479306654f2a59e18517.1558524028.git.sarna@scylladb.com>
2019-09-11 14:46:11 +03:00
Piotr Sarna
9e2ecf5188 alternator: fix string increment for BEGINS_WITH
BEGINS_WITH statement increments a string in order to compute
the upper bound for a clustering range of a query.
Unfortunately, the previous implementation was not correct,
as it appended a <0> byte if the last character was <255>,
instead of incrementing the last-but-one character.
If the string consists of <255> bytes only, the returned
upper bound is infinite.
Message-Id: <3a569f08f61fca66cc4f5d9e09a7188f6daad578.1558524028.git.sarna@scylladb.com>
2019-09-11 14:45:17 +03:00
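The corrected increment can be sketched in Python (hypothetical helper name; the real code operates on Scylla byte strings):

```python
def upper_bound_for_begins_with(prefix: bytes):
    """Smallest byte string greater than every string with this prefix.

    Returns None when the prefix is all <255> bytes: the range is unbounded.
    """
    trimmed = prefix.rstrip(b'\xff')   # drop trailing <255> bytes
    if not trimmed:
        return None                    # infinite upper bound
    return trimmed[:-1] + bytes([trimmed[-1] + 1])
```

So b'abc' yields b'abd', a trailing <255> in b'ab\xff' makes the last-but-one byte get incremented instead (b'ac'), and an all-<255> prefix has no finite bound.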
Nadav Har'El
7b9180cd99 alternator: common get_read_consistency() function
We had several places in the code that need to parse the
ConsistentRead flag in the request. Let's add a function
that does this, and while at it, checks for more error
cases and also returns LOCAL_QUORUM and LOCAL_ONE instead
of QUORUM and ONE.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:44:24 +03:00
Nadav Har'El
56907bf6c6 alternator: for writes, use LOCAL_QUORUM instead of QUORUM
As Shlomi suggested in the past, it is more likely that when we
eventually support global tables, we will use LOCAL_QUORUM,
not QUORUM. So let's switch to that now.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:44:20 +03:00
Nadav Har'El
8c347cc786 alternator-test: verify that table with only hash key also works
So far, all of the tests in test_item.py (for PutItem, GetItem, UpdateItem),
were arbitrarily done on a test table with both hash key and sort key
(both with string type). While this covers most of the code paths, we still
need to verify that the case where there is *not* a sort key, also works
fine. E.g., maybe we have a bug where a missing clustering key is handled
incorrectly or an error is incorrectly reported in that case?

But in this patch we add tests for the hash-key-only case, and see that
it already works correctly. No bug :-)

We add a new fixture test_table_s for creating a test table with just
a single string key. Later we'll probably add more of these test tables
for additional key types.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:41:16 +03:00
Nadav Har'El
c53b2ebe4d alternator-test: also test for missing part of key
Another type of key type error can be to forget part of the key
(the hash or sort key). Let's test that too (it already works correctly,
no need to patch the code).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:41:15 +03:00
Nadav Har'El
f58abb76d6 alternator: gracefully handle wrong key types
When a table has a hash key or sort key of a certain type (this can
be string, bytes, or number), one cannot try to choose an item using
values of different types.

We previously did not handle this case gracefully, and PutItem handled
it particularly bad - writing malformed data to the sstable and basically
hanging Scylla. In this patch we fix the pk_from_json() and ck_from_json()
functions to verify the expected type, and fail gracefully if the user
sent the wrong type.

This patch also adds tests for these failures, for the GetItem, PutItem,
and UpdateItem operations.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:40:23 +03:00
Nadav Har'El
9ee912d5cf alternator: correct handling of missing item in GetItem
According to the documentation, trying to GetItem a non-existent item
should result in an empty response - NOT a response with an empty "Item"
map as we do before this patch.

This patch fixes this case, and adds a test case for it. As usual,
we verify that the test case also works on Amazon DynamoDB, to verify
DynamoDB really behaves the way we think it does.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:39:32 +03:00
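The distinction is easiest to see in the two response shapes (key and attribute names here are hypothetical):

```python
# GetItem on an existing item: the response carries an "Item" map.
found = {"Item": {"p": {"S": "key1"}, "attr": {"S": "value"}}}

# GetItem on a non-existent item: an entirely empty response,
# NOT a response containing an empty "Item" map.
not_found = {}

assert "Item" in found
assert "Item" not in not_found
```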
Nadav Har'El
7f73f561d5 alternator: fix support for empty items
If an empty item (i.e., no attributes except the key) is created, or an item
becomes empty (by deleting its existing attributes), the empty item must be
maintained - it cannot just disappear. To do this in Scylla, we must add a
row marker - otherwise an empty attribute map is not enough to keep the
row alive.

This patch includes 4 test cases for all the various ways an empty item can be
created empty or non-empty item be emptied, and verifies that the empty item
can be correctly retrieved (as usual, to verify that our expectation of
"correctness" is indeed correct, we run the same tests against DynamoDB).
All these 4 tests failed before this patch, and now succeed.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:38:40 +03:00
Nadav Har'El
95ed2f7de8 alternator: remove two unused lines of code
These lines of code were superfluous and their result unused: the
make_item_mutation() function finds the pk and ck on its own.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:37:49 +03:00
Nadav Har'El
eb81b31132 alternator: add statistics
This patch adds a statistics framework to Alternator: Executor has (for
each shard) a _stats object which contains counters for various events,
and also is in charge of making these counters visible via Scylla's regular
metrics API (http://localhost:9180/metrics).

This patch includes a counter for each of DynamoDB's operation types,
and we increase the ones we support when handled. We also added counters
for total operations and unsupported operations (operation types we don't
yet handle). In the future we can easily add many more counters: Define
the counter in stats.hh, export it in stats.cc, and increment it
where relevant in executor.cc (or server.cc).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:36:26 +03:00
Piotr Sarna
d267e914ad alternator-test: add initial Query test
The test covers simple restrictions on primary keys.
Message-Id: <2a7119d380a9f8572210571c565feb8168d43001.1558356119.git.sarna@scylladb.com>
2019-09-11 14:36:25 +03:00
Piotr Sarna
b309c9d54b alternator: implement basic Query
The implementation covers the following restrictions
 - equality for hash key;
 - equality, <, <=, >, >=, between, begins_with for sort key.
Message-Id: <021989f6d0803674cbd727f9b8b3815433ceeea5.1558356119.git.sarna@scylladb.com>
2019-09-11 14:36:16 +03:00
Piotr Sarna
8571046d3e alternator: move do_query to separate function
A fair portion of code from scan() will be used later to implement
query(), so it's extracted as a helper function.
Message-Id: <d3bc163a1cb2032402768fcbc6a447192fba52a4.1558356119.git.sarna@scylladb.com>
2019-09-11 14:31:31 +03:00
Nadav Har'El
4a8b2c794d alternator-test: another edge case for Scan with AttributesToGet
Ask to retrieve only an attribute name which *none* of the items have.
The result should be a silly list of empty items, and indeed it is.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:31:30 +03:00
Nadav Har'El
c766d1153d alternator-test: shorten test_scan.py by reusing full_scan more
Use full_scan() in another test instead of open-coding the scan.
There are two more tests that could have used full_scan(), but
since they seem to be specifically adding more assertions or
using a different API ("paginators"), I decided to leave them
as-is. But new tests should use full_scan().

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:31:29 +03:00
Nadav Har'El
2666b29c77 alternator-test: test AttributesToGet parameter in Scan request
This is a short but extensive test of the AttributesToGet parameter
to Scan, which allows selecting only some of the attributes for output.

The AttributesToGet feature has several non-obvious features. Firstly,
it doesn't require that any key attributes be selected. So since each
item may have different non-key attributes, some scanned items may
be missing some of the selected columns, and some of the items may
even be missing *all* the selected columns - in which case DynamoDB
returns an empty item (and doesn't entirely skip this item). This
test covers all these cases, and it adds yet another item to the
'filled_test_table' fixture, one which has different attributes,
so we can see these issues.

As usual, this test passes in both DynamoDB and Alternator, to
assure we correspond to the *right* behavior, not just what we
think is right.

This test actually exposed a bug in the way our code returned
empty items (items which had none of the selected columns),
a bug which was fixed by the previous patch.

Instead of having yet another copy of table-scanning code, this
patch adds a utility function full_scan(), to scan an entire
table (with optional extra parameters for the scan) and return
the result as an array. We also simplify existing tests in
test_scan.py by using this new function.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:31:28 +03:00
Avi Kivity
446faba49c Merge "dbuild: add --image option, help, and usage" from Benny
* tag 'dbuild-image-help-usage-v1' of github.com:bhalevy/scylla:
  dbuild: add usage
  dbuild: add help option
  dbuild: list available images when no image arg is given
  dbuild: add --image option
2019-09-11 14:30:45 +03:00
Nadav Har'El
f871a4bc87 alternator: fix bug in returning an empty item in a Scan
When a Scan selects only certain attributes, and none of the key
attributes are selected, for some of the scanned items *nothing*
will remain to be output, but still Dynamo outputs an empty item
in this case. Our code had a bug where after each item we "moved"
the object leaving behind a null object, not an empty map, so a
completely empty item wasn't output as an empty map as expected,
and resulted in boto3 failing to parse the response.

This simple one-line patch fixes the bug, by resetting the item
to an empty map after moving it out.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:30:37 +03:00
Piotr Sarna
8525b14271 alternator: add lookup table for requests
Instead of using a really long if-else chain, requests are now
looked up via a routing table.
Message-Id: <746a34b754c3070aa9cbeaf98a6e7c6781aaee65.1557914794.git.sarna@scylladb.com>
2019-09-11 14:29:59 +03:00
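The routing-table approach can be sketched in Python (the handler names are stand-ins for the real executor methods):

```python
# Hypothetical handlers standing in for the real executor methods.
def create_table(request):
    return {"handled": "CreateTable"}

def put_item(request):
    return {"handled": "PutItem"}

# One lookup table replaces the long if-else chain: supporting a new
# operation means adding one entry here.
ROUTES = {
    "CreateTable": create_table,
    "PutItem": put_item,
}

def handle(op, request):
    handler = ROUTES.get(op)
    if handler is None:
        raise KeyError(f"unsupported operation {op}")
    return handler(request)
```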
Piotr Sarna
f3440f2e4a alternator-test: migrate filled_test_table to use batches
Filled test table fixture now takes advantage of batch writes
in order to run faster.
Message-Id: <e299cdffa9131d36465481ca3246199502d65e0c.1557914382.git.sarna@scylladb.com>
2019-09-11 14:29:58 +03:00
Piotr Sarna
4c3bdd3021 alternator-test: add batch writing test case
Message-Id: <a950799dd6d31db429353d9220b63aa96676a7a7.1557914382.git.sarna@scylladb.com>
2019-09-11 14:29:57 +03:00
Piotr Sarna
c0ecd1a334 alternator: add basic BatchWriteItem
The initial implementation only supports PutRequest requests,
without serving DeleteRequest properly.
Message-Id: <451bcbed61f7eb2307ff5722de33c2e883563643.1557914382.git.sarna@scylladb.com>
2019-09-11 14:29:50 +03:00
Nadav Har'El
9a0c13913d alternator: improve where DescribeEndpoints gets its information
Instead of blindly returning "localhost:8000" in response to
DescribeEndpoints and for sure causing us problems in the future,
the right thing to do is to return the same domain name which the
user originally used to get to us, be it "localhost:8000" or
"some.domain.name:1234". But how can we know what this domain name
was? Easy - this is why HTTP 1.1 added a mandatory "Host:" header,
and the DynamoDB driver I tested (boto3) adds it as expected,
indeed with the expected value of "localhost:8000" on my local setup.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:25:22 +03:00
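A minimal sketch of the Host-header idea (the response field names follow the DynamoDB DescribeEndpoints schema; the error text is illustrative):

```python
def describe_endpoints(headers):
    # HTTP/1.1 makes the Host header mandatory, so it tells us the
    # domain name the client actually used to reach this server.
    host = headers.get("Host")
    if host is None:
        raise ValueError("missing mandatory Host header")
    return {"Endpoints": [{"Address": host, "CachePeriodInMinutes": 1440}]}
```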
Nadav Har'El
a4a3b2fe43 alternator-test: test for sort order of items in a single partition
Although different partitions are returned by a Scan in (seemingly)
random order, items in a single partition need to be returned sorted
by their sort key. This adds a test to verify this.

This patch adds to the filled_test_table fixture, which until now
had just one item in each partition, another partition (with the key
"long") with 164 additional items. The test_scan_sort_order_string
test then scans this table, and verifies that the items are really
returned in sorted order.

The sort order is, of course, string order. So we have the first
item with sort key "1", then "10", then "100", then "101", "102",
etc. When we implement numeric keys we'll need to add a version
of this test which uses a numeric clustering key and verifies the
sort order is numeric.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:25:21 +03:00
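The string ordering the test verifies is plain lexicographic comparison, e.g.:

```python
# Sort keys written in arbitrary order...
sort_keys = ["2", "100", "1", "101", "10", "102"]

# ...come back in string order: "1" < "10" < "100" < ... < "2".
assert sorted(sort_keys) == ["1", "10", "100", "101", "102", "2"]
```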
Nadav Har'El
32c388b48c alternator: fix clustering key setup
Because of a typo, we incorrectly set the table's sort key as a second
partition key column instead of a clustering key column. This has bad
but subtle consequences - such as that the items are *not* sorted
according to the sort key. So in this patch we fix the typo.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:24:30 +03:00
Nadav Har'El
29e0f68ee0 alternator: add initial implementation of DescribeEndpoints
DescribeEndpoints is not a very important API (and by default, clients
don't use it) but I wanted to understand how DynamoDB responds to it,
and what better way than to write a test :-)

And then, if we already have a test, let's implement this request in
Scylla as well. This is a silly implementation, which always returns
"localhost:8000". In the future, this will need to be configurable -
we're not supposed here to return *this* server's IP address, but rather
a domain name which can be used to get to all servers.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:22:47 +03:00
Avi Kivity
211b0d3eb4 Merge "sstables, gdb: Centralize tracking of sstable instances" from Tomasz
"
Currently, GDB scripts locate sstables by scanning the heap for
bag_sstable_set containers. That has disadvantages:

  - not all containers are considered

  - it's extremely slow on large heaps

  - fragile, new containers can be added, and we won't even know

This series fixes all above by adding a per-shard sstable tracker
which tracks sstable objects in a linked-list.
"

* 'sstable-tracker' of github.com:tgrabiec/scylla:
  gdb: Use sstable tracker to get the list of sstables
  gdb: Make intrusive_list recognize member_hook links
  sstables: Track whether sstable was already open or not
  sstables: Track all instances of sstable objects
  sstables: Make sstable object not movable
  sstables: Move constructor out of line
2019-09-11 14:22:41 +03:00
Nadav Har'El
982b5e60e7 alternator: unify and improve TableName field handling
Most of the request types need a TableName parameter, specifying the
name of the table they operate on. There's a lot of boilerplate code
required to get this table name and verify that it is valid (the parameter
exists, is a string, passes DynamoDB's naming rules, and the table
actually exists), which resulted in a lot of code duplication - and
in some cases missing checks.

So this patch introduces two utility functions, get_table_name()
and get_table(), to fetch a table name or the schema of an existing
table, from the request, with all necessary validation. If validation
fails, the appropriate api_error() is thrown so the user gets the
right error message.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:21:53 +03:00
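A sketch of such a validation helper (the regex encodes DynamoDB's documented naming rules, 3-255 characters from letters, digits, `_`, `-` and `.`; the error strings are illustrative):

```python
import re

# DynamoDB table names: 3-255 characters from [a-zA-Z0-9_.-].
_TABLE_NAME_RE = re.compile(r"[a-zA-Z0-9_.\-]{3,255}")

def get_table_name(request):
    name = request.get("TableName")
    if not isinstance(name, str):
        raise ValueError("ValidationException: TableName must be a string")
    if not _TABLE_NAME_RE.fullmatch(name):
        raise ValueError("ValidationException: invalid table name")
    return name
```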
Nadav Har'El
b8fc783171 alternator-test: clean up conftest.py
Remove unused random-string code from conftest.py, and also add a
TODO comment how we should speed up filled_test_table fixture by
using a batch write - when that becomes available in Alternator.
(right now this fixture takes almost 4 seconds to prepare on a local
Alternator, and a whopping 3 minutes (!) to prepare on DynamoDB).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:21:52 +03:00
Piotr Sarna
a4387079ac alternator-test: add initial scan test
Message-Id: <c28ff1d38930527b299fe34e9295ecd25607398c.1557757402.git.sarna@scylladb.com>
2019-09-11 14:21:51 +03:00
Piotr Sarna
b6d148c9e0 alternator-test: add filled test table fixture
The fixture creates a test table and fills it with random data,
which can be later used for testing reads.
Message-Id: <649a8b8928e1899c5cbd82d65d745a464c1163c8.1557757402.git.sarna@scylladb.com>
2019-09-11 14:21:50 +03:00
Piotr Sarna
4def674731 alternator: implement basic scan
The most basic version of Scan request is implemented.
It still contains a list of TODOs, among them support for the Segments
parameter for scan parallelism.
Message-Id: <5d1bfc086dbbe64b3674b0053e58a0439e64909b.1557757402.git.sarna@scylladb.com>
2019-09-11 14:21:39 +03:00
Piotr Sarna
0ce3866fb5 alternator: lower debug messages verbosity in the HTTP server
The HTTP server still uses WARN log level to log debug messages,
which is way higher than necessary. These messages are downgraded
to TRACE level.
Message-Id: <59559277f2548d4046001bebff45ab2d3b7063b5.1557744617.git.sarna@scylladb.com>
2019-09-11 14:12:40 +03:00
Nadav Har'El
d45220fb39 alternator-test: simplify test_put_and_get_attribute_types
The test test_put_and_get_attribute_types needlessly named all the
different attributes and their variables, causing a lot of repetition
and chance for mistakes when adding additional attributes to the test.

In this rewrite, we only have a list of items, and automatically build
attributes with them as values (using sequential names for the attributes)
and check we read back the same item (Python's dict equality operator
checks the equality recursively, as expected).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:12:39 +03:00
Nadav Har'El
ea32841dab alternator-test: test all attribute types
Although we planned to initially support only string types, it turns out
for the attributes (*not* the key), we actually support all types already,
including all scalar types (string, number, bool, binary and null) and
more complex types (list, nested document, and sets).

This adds a test which PutItem's these types and verifies that we can
retrieve them.

Note that this test deals with top-level attributes only. There is no
attempt to modify only a nested attribute (and with the current code,
it wouldn't work).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:12:38 +03:00
Nadav Har'El
c645538061 alternator-test: rewrite ListTables test
In our tests, we cannot really assume that ListTables returns *only*
the tables we created for the test, or even that a page size of 100 will
be enough to list our 3 tables. The issue is that on a shared DynamoDB, or
in hypothetical cases where multiple tests are run in parallel, or previous
tests had catastrophic errors and failed to clean up, we have no idea how
many unrelated tables there are in the system. There may be hundreds of
them.  So every ListTables test will need to use paging.

So in this re-implementation, we begin with a list_tables() utility function
which calls ListTables multiple times to fetch all tables, and return the
resulting list (we assume this list isn't so huge it becomes unreasonable
to hold it in memory). We then use this utility function to fetch the table
list with various page sizes, and check that the test tables we created are
listed in the resulting list.

There's no longer a separate test for "all" tables (really was a page of 100
tables) and smaller pages (1,2,3,4) - we now have just one test that does the
page sizes 1,2,3,4, 50 and 100.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:12:37 +03:00
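The pagination loop behind such a list_tables() utility can be sketched as follows; `FakeDynamoDB` is an invented stand-in for the client (the real tests use boto3), but the request/response field names follow the DynamoDB ListTables API:

```python
class FakeDynamoDB:
    """Minimal stand-in for a DynamoDB client, for illustration only."""
    def __init__(self, names):
        self.names = names

    def list_tables(self, Limit, ExclusiveStartTableName=None):
        start = 0
        if ExclusiveStartTableName is not None:
            start = self.names.index(ExclusiveStartTableName) + 1
        page = self.names[start:start + Limit]
        resp = {"TableNames": page}
        if start + Limit < len(self.names):
            # More tables remain: tell the caller where to resume.
            resp["LastEvaluatedTableName"] = page[-1]
        return resp

def list_all_tables(client, page_size=100):
    # Keep calling ListTables until no LastEvaluatedTableName is returned.
    tables = []
    kwargs = {"Limit": page_size}
    while True:
        resp = client.list_tables(**kwargs)
        tables.extend(resp["TableNames"])
        last = resp.get("LastEvaluatedTableName")
        if last is None:
            return tables
        kwargs["ExclusiveStartTableName"] = last
```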
Piotr Sarna
6b83e17b74 alternator: add tests to ListTables command
Test cases cover both listing appropriate table names
and pagination.
Message-Id: <e7d5f1e5cce10c86c47cdfb4d803149488935ec0.1557402320.git.sarna@scylladb.com>
2019-09-11 14:12:36 +03:00
Piotr Sarna
dfbf4ffe0f alternator-test: add 2 tables fixture
For some tests, more than 1 table is needed, so another fixture
that provided two additional test tables is added.
Message-Id: <75ae9de5cc1bca19594db1f0bc03260f83459380.1557402320.git.sarna@scylladb.com>
2019-09-11 14:12:35 +03:00
Piotr Sarna
b6dde25bcc alternator: implement ListTables
ListTables is used to extract all table names created so far.
Message-Id: <04f4d804a40ff08a38125f36351e56d7426d2e3d.1557402320.git.sarna@scylladb.com>
2019-09-11 14:10:54 +03:00
Piotr Sarna
b73a9f3744 alternator: use trace level for debug messages
In the early development stage, warn level was used for all
debug messages, while it's more appropriate to use 'trace' or 'debug'.
Message-Id: <419ca5a22bc356c6e47fce80b392403cefbee14d.1557402320.git.sarna@scylladb.com>
2019-09-11 14:10:02 +03:00
Nadav Har'El
4ed9aa4fb4 alternator-test: cleanup in conftest.py
This patch cleans up some comments and reorganizes some functions in
conftest.py, where the test_table fixture was defined. The goal is to
later add additional types of test tables with different schemas (e.g.,
just a partition key, different key types, etc.) without too much
code duplication.

This patch doesn't change anything functional in the tests, and they
still pass ("pytest --local" runs all tests against the local Alternator).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:10:01 +03:00
Nadav Har'El
5c564b7117 alternator: make ck_from_json() easier to use
The ck_from_json() utility function is easier to use if it handles
the no-clustering-key case as the callers need them too, instead of
requiring them to handle the no-clustering-key case separately.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:09:06 +03:00
Nadav Har'El
3ae0066aae alternator: add support for UpdateItem's DELETE operation
So far we supported UpdateItem only with PUT operations - this patch
adds support for DELETE operations, to delete specific attributes from
an item.

Only the case of a missing value is supported. DynamoDB also provides
the ability to pass the old value, and only perform the deletion if
the value and/or its type is still up-to-date - but we don't support
this yet, and fail such a request if it is attempted.

This patch also includes a test for this case in alternator-test/

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:08:57 +03:00
Nadav Har'El
81679d7401 alternator-test: add tests for UpdateItem
Add initial tests for UpdateItem. Only the features currently supported
by our code (only string attributes, only "PUT" action) are tested.

As usual, this test (like all others) was tested to pass on both DynamoDB
and Alternator.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:03:10 +03:00
Nadav Har'El
0c2a440f7f alternator: add initial UpdateItem implementation
Add an initial UpdateItem implementation. As with PutItem and GetItem, we
are still limited to string attributes. This initial implementation
of UpdateItem implements only the "PUT" action (not "DELETE" and
certainly not "ADD") and not any of the more advanced options.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:03:00 +03:00
Piotr Sarna
686d1d9c3c alternator: add attrs_column() helper function
Message-Id: <d93ae70ccd27fe31d0bc6915a20d83d7a85342cf.1557223199.git.sarna@scylladb.com>
2019-09-11 13:08:52 +03:00
Piotr Sarna
6ad9b10317 alternator: make constant names more explicit
KEYSPACE and ATTRS constants refer to their names, not objects,
so they're named more explicitly.
Message-Id: <14b1f00d625e041985efbc4cbde192bd447cbf03.1557223199.git.sarna@scylladb.com>
2019-09-11 13:07:14 +03:00
Piotr Sarna
2975ca668c alternator: remove inaccessible return statement
Message-Id: <afaef20e7e110fa23271fb8c3dc40cec0716efb6.1557223199.git.sarna@scylladb.com>
2019-09-11 13:06:21 +03:00
Piotr Sarna
6e8db5ac6a alternator: inline keywords
It was decided that all alternator-specific keywords can be inlined
in code instead of defining them as constants.
Message-Id: <6dffb9527cfab2a28b8b95ac0ad614c18027f679.1557223199.git.sarna@scylladb.com>
2019-09-11 13:04:38 +03:00
Nadav Har'El
50a69174b3 alternator: some cleanups in validate_table_name()
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 13:03:44 +03:00
Nadav Har'El
0e06d82a1f alternator: clean up api_error() interface
All operation-generated error messages should have the 400 HTTP error
code. It's a real nag to have to type it every time. So make it the
default.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 13:01:47 +03:00
Nadav Har'El
0634629a79 alternator-test: test for error on creating an already-existing table
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 13:01:46 +03:00
Nadav Har'El
6fe6cf0074 alternator: correct error when trying to CreateTable an existing table
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 13:00:54 +03:00
Nadav Har'El
871dd7b908 alternator: fix return object from PutItem
Without special options, PutItem should return nothing (an empty
JSON result). Previously we had trouble doing this, because instead
of returning an empty JSON result, we converted an empty string into
JSON :-) So the existing code had an ugly workaround which worked,
sort of, for the Python driver but not for the Java driver.

The correct fix, in this patch, is to invent a new type json_string
which is a string *already* in JSON and doesn't need further conversion,
so we can use it to return the empty result. PutItem now works from
YCSB's Java driver.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 13:00:47 +03:00
Nadav Har'El
ae1ee91f3c alternator-test: more examples in README.md
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:56:07 +03:00
Nadav Har'El
886438784c alternator-test: test table name limit of 222 bytes, instead of 255.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:56:06 +03:00
Nadav Har'El
28e7fa20ed alternator: limit table names to 222 bytes
Although we would like to allow table names up to 222 bytes, this is not
currently possible because Scylla tacks on an additional 33 bytes to create
a directory name, and directory names are limited to 255 bytes.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:55:07 +03:00
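The arithmetic behind the 222-byte limit, using the 33-byte suffix figure quoted above:

```python
MAX_DIRNAME_BYTES = 255   # common filesystem limit on one path component
SUFFIX_BYTES = 33         # extra bytes Scylla appends to the table's directory name
MAX_TABLE_NAME_BYTES = MAX_DIRNAME_BYTES - SUFFIX_BYTES
assert MAX_TABLE_NAME_BYTES == 222
```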
Nadav Har'El
a702e5a727 alternator-test: verify appropriate error when invalid key type is used
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:55:06 +03:00
Nadav Har'El
8af58b0801 alternator: better key type parsing
The supported key types are just S(tring), B(lob), or N(umber).
Other types are valid for attributes, but not for keys, and should
not be accepted. And wrong types used should result in the appropriate
user-visible error.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:54:12 +03:00
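A sketch of that key-type check (the mapped names on the right are illustrative, not Scylla's actual internal types):

```python
def parse_key_type(attribute_type):
    # Only the scalar types S, B and N may be used for key attributes;
    # other attribute types (e.g. SS, L, M) are valid for non-key
    # attributes but must be rejected here with a user-visible error.
    allowed = {"S": "string", "B": "bytes", "N": "number"}
    if attribute_type not in allowed:
        raise ValueError(f"ValidationException: invalid key type {attribute_type}")
    return allowed[attribute_type]
```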
Nadav Har'El
6cdcf5abac alternator-test: additional cases of invalid schemas in CreateTable
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:54:11 +03:00
Nadav Har'El
9839183157 alternator: better invalid schema detection for CreateTable
To be correct, CreateTable's input parsing need to work in reverse from
what it did: First, the key columns are listed in KeySchema, and then
each of these (and potentially more, e.g., from indexes) needs to appear
in AttributeDefinitions.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:53:22 +03:00
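That direction of checking can be sketched as follows (field names follow the DynamoDB CreateTable request shape; the error text is illustrative):

```python
def check_create_table_schema(key_schema, attribute_definitions):
    # Start from KeySchema: every key column must have a matching entry
    # in AttributeDefinitions (which may define additional attributes,
    # e.g. for indexes).
    defined = {d["AttributeName"] for d in attribute_definitions}
    for key in key_schema:
        name = key["AttributeName"]
        if name not in defined:
            raise ValueError(f"ValidationException: {name} missing "
                             "from AttributeDefinitions")
```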
Nadav Har'El
8bfbc1bae5 alternator-test: tests for CreateTable with bad schema
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:53:21 +03:00
Benny Halevy
0f01a4c1b8 dbuild: add usage
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-11 12:53:02 +03:00
Benny Halevy
f43bffdf9c dbuild: add help option
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-11 12:52:50 +03:00
Nadav Har'El
dc34c92899 alternator: better error handling for schema errors in CreateTable
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:52:31 +03:00
Nadav Har'El
77de0af40f alternator-test: test for PutItem to nonexistent table
We expect to see the right error code, not some "internal error".

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:52:30 +03:00
Nadav Har'El
ca3553c880 alternator: PutItem: appropriate error for a non-existent table
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:51:38 +03:00
Nadav Har'El
275a07cf10 alternator-test: add another column to test_basic_string_put_and_get()
Just to make sure our success isn't limited to just a single non-key
attribute, let's add another one.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:51:37 +03:00
Nadav Har'El
6ca72b5fed alternator: GetItem should by default return all the columns, not none
The test

  pytest --local test_item.py::test_basic_string_put_and_get

Now passes.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:51:31 +03:00
Benny Halevy
c840c43fa7 dbuild: list available images when no image arg is given
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-11 12:51:26 +03:00
Nadav Har'El
9920143fb7 alternator: change empty return of PutItem
Without any arguments, PutItem should return no data at all. But somehow,
for reasons I don't understand, the boto3 driver gets confused by an
empty JSON, thinking it isn't JSON at all. If we return a structure with
an empty "attributes" field, boto3 is happy.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:49:20 +03:00
Nadav Har'El
8dec31d23b alternator: add initial implementation of DeleteTable
Add an initial implementation of Delete table, enough for making the

   pytest --local test_table.py::test_create_and_delete_table

Pass.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:45:42 +03:00
Nadav Har'El
41d4b88e78 alternator: on unknown operation, return standard API error
When given an unknown operation (we didn't implement yet many of them...)
we should throw the appropriate api_error, not some random exception.

This allows the client to understand the operation is not supported
and stop retrying - instead of retrying thinking this was a weird
internal error.

For example the test
   pytest --local test_table.py::test_create_and_delete_table

Now fails immediately, saying Unsupported operation DeleteTable.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:45:04 +03:00
Nadav Har'El
1b1921bc94 alternator: fix JSON in DescribeTable response
The structure's name in DescribeTable's output is supposed to be called
"Table", not "TableDescription". Putting in the wrong place caused the
driver's table creation waiters to fail.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:44:14 +03:00
Nadav Har'El
6a455035ba alternator: validate table name in CreateTable
Validate the table name in CreateTable, and if it doesn't fit DynamoDB's
requirements, return the appropriate error as drivers expect.

With this patch, test_table.py::test_create_table_unsupported_names
now passes (albeit with a one-minute pause - this is a bug with keep-alive
support...).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:43:24 +03:00
Nadav Har'El
0da214c2fe alternator-test: test_create_table_unsupported_names minor fix
Check the expected error message to contain just ValidationException
instead of an overly specific text message from DynamoDB, so we aren't
so constrained in our own messages' wording.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:43:23 +03:00
Nadav Har'El
4f721a0637 alternator-test: test for creating table with very long name
Dynamo allows table names up to 255 characters, but when this is tested on
Alternator, the results are disastrous: mkdir with such a long directory
name fails, Scylla considers this an unrecoverable "I/O error", and exits
the server.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:43:22 +03:00
Nadav Har'El
6967dd3d8f test-table: test DescribeTable on non-existent table
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:43:21 +03:00
Nadav Har'El
d0cdc65b4c Add "--local" option to run tests against a local Scylla installation
For example "pytest --local test_item.py"

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:43:21 +03:00
Nadav Har'El
079c7c3737 test_item.py: basic string put and get test
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:43:20 +03:00
Nadav Har'El
4550f3024d test_table fixture: be quicker to realize table was created.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:43:19 +03:00
Nadav Har'El
f1f76ed17b test_table fixture: automatically delete
Automatically delete the test table when the test ends.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:43:18 +03:00
Nadav Har'El
a946e255c6 test_item.py: start testing CRUD operations
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:43:17 +03:00
Nadav Har'El
4d7d871930 Start to use "test fixtures"
Start to use "test fixtures" defined in conftest.py: The connection to
the DynamoDB API, and also temporary tables, can be reused between multiple
tests.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:43:16 +03:00
Nadav Har'El
6984ccf462 Add some table tests and README
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:43:15 +03:00
Nadav Har'El
f66ec337f7 alternator: very initial implementation of DescribeTable
This initial implementation is enough to pass a test of getting a
failure for a non-existent table -
test_table.py::test_describe_table_non_existent_table
and to recognize an existing table. But it's still missing a lot
of fields for an existing table (among others, the schema).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:41:32 +03:00
Nadav Har'El
ad9eb0a003 alternator: errors should be output from server as Dynamo drivers expect
Exceptions from the handlers need to be output in a certain way - as
a JSON with specific fields - as DynamoDB drivers expect them to be.
If a handler throws an alternator::api_error with these specific fields,
they are output, but any other exception is converted into the same
format as an "Internal Error".

After this patch, executor code can throw an alternator::api_error and
the client will receive this error in the right format.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:40:55 +03:00
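The "JSON with specific fields" the drivers expect is a body with a namespaced `__type` and a human-readable `message`; a hedged sketch (the prefix string follows AWS's public wire format, the helper name is hypothetical):

```python
import json

def make_error_response(error_type: str, message: str) -> str:
    """Serialize an error the way DynamoDB drivers expect: a JSON
    object with a namespaced '__type' and a human-readable 'message'."""
    return json.dumps({
        '__type': 'com.amazonaws.dynamodb.v20120810#' + error_type,
        'message': message,
    })

body = make_error_response('ResourceNotFoundException',
                           'Requested resource not found: Table: tbl not found')
assert json.loads(body)['__type'].endswith('#ResourceNotFoundException')
```

Any unexpected exception would be serialized through the same path with an internal-error type, so the client always gets a parseable body.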
Nadav Har'El
db49bc6141 alternator: add alternator::api_error exception type
DynamoDB error messages are returned in JSON format and expect specific
information: Some HTTP error code (often but not always 400), a string
error "type" and a user-readable message. Code that wants to return
user-visible exceptions should use this type, and in the next patch we
will translate it to the appropriate JSON string.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:39:26 +03:00
Nadav Har'El
9d72bc3167 alternator: table creation time is in seconds
The "Timestamp" type returned for CreationDateTime can be one of several
things but if it is a number, it is supposed to be the time in *seconds*
since the epoch - not in milliseconds. Returning milliseconds as we
wrongly did causes boto3 (AWS's Python driver) to throw a parse exception
on this response.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:38:41 +03:00
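The seconds-vs-milliseconds distinction above is easy to see in a sketch: only the seconds value round-trips through an epoch-based parser like the one boto3 uses.

```python
import datetime

t = datetime.datetime(2019, 9, 11, tzinfo=datetime.timezone.utc)
seconds = int(t.timestamp())   # what a numeric Timestamp must carry
millis = seconds * 1000        # what the buggy code returned

# Read back as seconds-since-epoch, the correct value round-trips;
# a millisecond count interpreted as seconds is off by a factor of 1000.
assert datetime.datetime.fromtimestamp(seconds, datetime.timezone.utc) == t
assert millis // 1000 == seconds
```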
Nadav Har'El
c0518183c2 alternator: require alternator-port configuration
Until now, we always opened the Alternator port along with Scylla's
regular ports (CQL etc.). This should really be made optional.

With this patch, by default Alternator does NOT start and does not
open a port. Run Scylla with --alternator-port=8000 to open an Alternator
API port on port 8000, as was the default until now. It's also possible
to set this in scylla.yaml.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:38:31 +03:00
Piotr Sarna
2ec78164bc alternator: add minimal HTTP interface
The interface works on port 8000 by default and provides
the most basic alternator operations - it's an incomplete
set without validation, meant to allow testing as early as possible.
2019-09-11 12:34:18 +03:00
Benny Halevy
443e0275ab dbuild: add --image option
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-11 11:46:33 +03:00
Tomasz Grabiec
06154569d5 gdb: Use sstable tracker to get the list of sstables 2019-09-10 17:05:19 +02:00
Tomasz Grabiec
a141d30eca gdb: Make intrusive_list recognize member_hook links
GDB now gives "struct boost::intrusive::member_hook" from template_arguments()
2019-09-10 17:05:19 +02:00
Tomasz Grabiec
c014c79d4b sstables: Track whether sstable was already open or not
Some sstable objects correspond to sstables which are being written
and are not sealed yet. Such sstables don't have all the fields
filled-in. Tools which calculate statistics (like GDB scripts) need to
distinguish such sstables.
2019-09-10 17:05:18 +02:00
Tomasz Grabiec
33bef82f6b sstables: Track all instances of sstable objects
Will make it easier to collect statistics about sstable in-memory metadata.
2019-09-10 17:05:16 +02:00
Tomasz Grabiec
fd74504e87 sstables: Make sstable object not movable
Will be easier to add non-movable fields.

We don't really need it to be movable, all instances should be managed
by a shared pointer.
2019-09-10 17:04:54 +02:00
Tomasz Grabiec
589c7476e0 sstables: Move constructor out of line 2019-09-10 17:04:54 +02:00
Tomasz Grabiec
785fe281e7 gdb: scylla sstables: Print table name
Message-Id: <1568121825-32008-1-git-send-email-tgrabiec@scylladb.com>
2019-09-10 16:36:21 +03:00
Glauber Costa
6651f96a70 sstables: do not keep sharding information from scylla metadata in memory (#4915)
There is no reason to keep parts of the Scylla Metadata component in memory
after it is read, parsed, and its information fed into the SSTable.

We have seen systems in which the Scylla metadata component is one
of the heaviest memory users, more than the Summary and Filter.

In particular, we use the token metadata, which is the largest part of the
Scylla component, to calculate a single integer: the shards that are
responsible for this SSTable. Once we do that, we never use it again.

Tests: unit (release/debug), + manual scylla write load + reshard.

Fixes #4951

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2019-09-09 22:28:51 +03:00
Tomasz Grabiec
a09479e63c Merge "Validate position in partition monotonicity" from Benny
Introduce mutation_fragment_stream_validator class and use it as a
filter to flat_mutation_reader::consume_in_thread from
sstable::write_components to validate partition region and optionally
clustering key monotonicity.

Fixes #4803
2019-09-09 15:38:31 +02:00
Benny Halevy
42f6462837 config: enable_sstable_key_validation by default in debug build
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-09 15:30:59 +03:00
Benny Halevy
34d306b982 config: add enable_sstable_key_validation option
Key monotonicity validation incurs overhead, since it has to store the last
key and compare against it, so provide an option to enable/disable it (disabled by default).

Refs #4804

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-09 15:30:59 +03:00
Benny Halevy
507c99c011 mutation_fragment_stream_validator: add compare_keys flag
Storing and comparing keys is expensive.
Add a flag to enable/disable this feature (disabled by default).
Without the flag, only the partition region monotonicity is
validated, allowing repeated clustering rows, regardless of
clustering keys.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-09 15:30:59 +03:00
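The validator's contract described above - region order always checked, clustering-key order only when the flag is set - can be sketched as follows (a toy model, not the actual C++ class):

```python
class StreamValidator:
    """Check that a fragment stream is monotonic.

    The partition region must never go backwards; clustering keys are
    only compared when compare_keys=True, since storing and comparing
    the previous key costs extra work.
    """
    def __init__(self, compare_keys: bool = False):
        self.compare_keys = compare_keys
        self.prev_region = None
        self.prev_key = None

    def validate(self, region: int, key=None) -> bool:
        if self.prev_region is not None and region < self.prev_region:
            return False                      # region monotonicity violated
        if (self.compare_keys and key is not None
                and region == self.prev_region
                and self.prev_key is not None and key < self.prev_key):
            return False                      # key monotonicity violated
        self.prev_region, self.prev_key = region, key
        return True

v = StreamValidator(compare_keys=True)
assert v.validate(0)                # partition start
assert v.validate(1, key=(1,))      # clustered row
assert v.validate(1, key=(2,))      # keys may only grow within a region
assert not v.validate(1, key=(1,))  # out-of-order key rejected
```

With `compare_keys=False`, repeated or unordered clustering keys pass, exactly as the commit message says.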
Benny Halevy
bc2ef1d409 mutation_fragment: declare partition_region operator<< in header file
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-09 15:30:59 +03:00
Benny Halevy
496467d0a2 sstables: writer: Validate input mutation fragment stream
Fixes #4803
Refs #4804

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-09 15:30:59 +03:00
Benny Halevy
a37acee68f position_in_partition: define operator=(position_in_partition_view)
The respective constructor is explicit.
Define this assignment operator to be used by flat_mutation_reader
mutation_fragment_stream_validator filter so that it can use
mutation_fragment::position() verbatim and keep its internal
state as position_in_partition.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-09 15:30:59 +03:00
Benny Halevy
41b60b8bc5 compaction: s/filter_func/make_partition_filter/
It expresses the purpose of this function better
as suggested by Tomasz Grabiec.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-09 15:30:59 +03:00
Benny Halevy
24c7320575 dbuild: run interactive shell by default
If not given any other args to run, just run an interactive shell.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190909113140.9130-1-bhalevy@scylladb.com>
2019-09-09 15:15:57 +03:00
Nadav Har'El
2543760ee6 docs/metrics.md: document additional "labels"
Recently we started to make more use of metric labels - several
metrics which share the same name, but differ in the value of some label
such as "group" (for different scheduling groups).

This patch documents this feature in docs/metrics.md, gives the example of
scheduling groups, and explains a couple more relevant Prometheus syntax
tricks.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190909113803.15383-1-nyh@scylladb.com>
2019-09-09 15:15:57 +03:00
Botond Dénes
59a96cd995 scylla-gdb.py: introduce scylla task-queues
This command provides an overview of the reactors task queues.
Example:
   id name                             shares  tasks
 A 00 "main"                           1000.00 4
   01 "atexit"                         1000.00 0
   02 "streaming"                       200.00 0
 A 03 "compaction"                      171.51 1
   04 "mem_compaction"                 1000.00 0
*A 05 "statement"                      1000.00 2
   06 "memtable"                          8.02 0
   07 "memtable_to_cache"               200.00 0

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190906060039.42301-1-bdenes@scylladb.com>
2019-09-09 15:15:57 +03:00
Avi Kivity
8e8975730d Update seastar submodule
* seastar cb7026c16f...b3fb4aaab3 (10):
  > Revert "scheduling groups: Adding per scheduling group data support"
  > scheduling groups: Adding per scheduling group data support
  > rpc: check that two servers are not created with the same streaming id
  > future: really ignore exceptions in ignore_ready_future
  > iostream: Constify eof() function
  > apply.hh: add missing #include for size_t
  > scheduling_group_demo: add explicit yields since future::get() no longer does
  > Fix buffer size used when calling accept4()
  > future-util: reduce allocations and continuations in parallel_for_each
  > rpc: lz4_decompressor: Add a static constexpr variable declaration for Cpp14 compatibility
2019-09-09 15:15:34 +03:00
Gleb Natapov
9e9f64d90e messaging_service: configure different streaming domain for each rpc server
A streaming domain identifies a server across shards. Each server should
have different one.

Fixes: #4953

Message-Id: <20190908085327.GR21540@scylladb.com>
2019-09-08 14:05:40 +03:00
Piotr Sarna
01410c9770 transport: make sure returning connection errors happens inside the gate.
Previously, the gate could get
closed too early, which would result in shutting down the server
before it had an opportunity to respond to the client.

Refs #4818
2019-09-08 13:23:20 +03:00
Avi Kivity
5663218fac Merge "types: Fix decimal to integer and varint to integer conversion" from Rafael
"
The release notes for boost 1.67.0 includes:

Breaking Change: When converting a multiprecision integer to a narrower type, if the value is too large (or negative) to fit in the smaller type, then the result is either the maximum (or minimum) value of the target type.

Since we just moved out of boost 1.66, we have to update our code.

This fixes issue #4960
"

* 'espindola/fix-4960' of https://github.com/espindola/scylla:
  types: fix varint to integer conversion
  types: extract a from_varint_to_integer from make_castas_fctn_from_decimal_to_integer
  types: fix decimal to integer conversion
  types: extract helper for converting a decimal to a cppint
  types: rename and detemplate make_castas_fctn_from_decimal_to_integer
2019-09-08 10:45:42 +03:00
Avi Kivity
244218e483 Merge "simplify date type" from Rafael
"
With this patch series one has to be explicit to create a date_type_impl and now there is only the one documented difference between date_type_impl and timestamp_type_impl.
"

* 'espindola/simplify-date-type' of https://github.com/espindola/scylla:
  types: Reduce duplication around date_type_impl
  types: Don't use date_type_native_type when we want a timestamp
  types: Remove timestamp_native_type
  types: Don't specialize data_type_for for db_clock::time_point
  types: Make it harder to create date_type
2019-09-08 10:21:48 +03:00
Rafael Ávila de Espíndola
3bac4ebac7 types: Reduce duplication around date_type_impl
According to the comments, the only difference between date_type_impl
and timestamp_type_impl is the comparison function.

This patch makes that explicit by merging all code paths except:

* The warning when converting between the two
* The compare function

The date_type_impl type can still be user visible via very old
sstables or via the thrift protocol. It is not clear if we still need
to support either, but with this patch it is easy to do so.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-07 10:07:33 -07:00
Rafael Ávila de Espíndola
36d40b4858 types: Don't use date_type_native_type when we want a timestamp
In these cases it is pretty clear that the original code wanted to
create a timestamp_type data_value but was creating a date_type one
because of the old defaults.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-07 10:07:33 -07:00
Rafael Ávila de Espíndola
01cd21c04d types: Remove timestamp_native_type
Now that we know that anything expecting a date_type has been
converted to date_type_native_type, switch to using
db_clock::time_point when we want a timestamp_type.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-07 10:07:33 -07:00
Rafael Ávila de Espíndola
df6c2d1230 types: Don't specialize data_type_for for db_clock::time_point
This also moves every user to date_type_native_type. A followup patch
will convert to timestamp_type when appropriate.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-07 10:07:33 -07:00
Rafael Ávila de Espíndola
e09fa2dcff types: Make it harder to create date_type
date_type was replaced with timestamp_type, but it was very easy to
create a date_type instead of a timestamp_type by accident.

This patch changes the code so that a date_type is no longer
implicitly used when constructing a data_value. All existing code that
was depending on this is converted to explicitly using
date_type_native_type. A followup patch will convert to timestamp_type
when appropriate.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-07 10:07:33 -07:00
Gleb Natapov
f78b2c5588 transport: remove remaining cruft related to cql's server load balancing
Commit 7e3805ed3d removed the load balancing code from the cql
server, but it did not remove most of the cruft that load balancing
introduced. Most of the complexity (and probably the main reason the
code never worked properly) is around the service::client_state class, which
is copied before being passed to the request processor (because in the past
the processing could have happened on another shard) and then merged back
into the "master copy" because request processing may have changed it.

This commit removes all this copying. The client_state is passed as a
reference all the way to the lowest layer that needs it, and its copy
construction is removed to make sure nobody copies it by mistake.

tests: dev, default c-s load of 3 node cluster

Message-Id: <20190906083050.GA21796@scylladb.com>
2019-09-07 18:17:53 +03:00
Avi Kivity
3b5aa13437 Merge "Optimize type find" from Rafael
"
This avoids a double dispatch on _kind and also removes a few shared_ptr copies.

The extra work was a small regression from the recent types refactoring.
"

* 'espindola/optimize_type_find' of https://github.com/espindola/scylla:
  types: optimize type find implementation
  types: Avoid shared_ptr copies
2019-09-07 18:14:36 +03:00
Gleb Natapov
5b9dc00916 test: fix query_processor_test::test_query_counters to use SERIAL consistency correctly
It is not possible to scan a table with SERIAL consistency only to read
a single partition.

Message-Id: <20190905143023.GQ21540@scylladb.com>
2019-09-07 18:07:01 +03:00
Gleb Natapov
e52ebfb957 cql3: remove unused next_timestamp() function
next_timestamp() just calls get_timestamp() directly and nobody uses it
anyway.

Message-Id: <20190905101648.GO21540@scylladb.com>
2019-09-05 17:20:21 +03:00
Botond Dénes
783277fb02 stream_session: STREAM_MUTATION_FRAGMENTS: print errors in receive and distribute phase
Currently when an error happens during the receive and distribute phase
it is swallowed and we just return a -1 status to the remote. We only
log errors that happen during responding with the status. This means
that when streaming fails, we only know that something went wrong, but
the node on which the failure happened doesn't log anything.

Fix by also logging errors happening in the receive and distribute
phase. Also mention the phase in which the error happened in both error
log messages.

Refs: #4901
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190903115735.49915-1-bdenes@scylladb.com>
2019-09-05 13:43:00 +02:00
Rafael Ávila de Espíndola
dd81e94684 types: fix varint to integer conversion
The previous code was using the boost::multiprecision::cpp_int to
integer conversion, but that doesn't have the same semantics as CQL
for signed numbers.

This fixes the dtest cql_cast_test.py:CQLCastTest.cast_varint_test.

Fixes #4960

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-04 15:08:14 -07:00
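The semantic difference at stake: newer boost saturates a too-wide value to the target type's min/max, while CQL's cast keeps only the low bits, two's-complement style. A sketch of the expected truncation for a 32-bit target (an illustration of wrap-vs-saturate, not Scylla's actual code):

```python
def varint_to_int32(v: int) -> int:
    """Truncate an arbitrary-precision integer to a signed 32-bit
    value: keep the low 32 bits, then sign-extend."""
    v &= 0xFFFFFFFF
    return v - (1 << 32) if v >= (1 << 31) else v

assert varint_to_int32(1) == 1
assert varint_to_int32(-1) == -1
assert varint_to_int32(2**31) == -2**31   # wraps, does not saturate
assert varint_to_int32(2**32 + 5) == 5    # only the low 32 bits survive
```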
Rafael Ávila de Espíndola
263e18b625 types: extract a from_varint_to_integer from make_castas_fctn_from_decimal_to_integer
It will be used when converting varint to integer too.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-04 15:08:14 -07:00
Rafael Ávila de Espíndola
2d453b8e17 types: fix decimal to integer conversion
The previous code was using the boost::multiprecision::cpp_rational to
integer conversion, but that doesn't have the same semantics as CQL.

This patch avoids creating a cpp_rational in the first place and works
just with integers.

This fixes the dtest cql_cast_test.py:CQLCastTest.cast_decimal_test.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-04 15:08:14 -07:00
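Working "just with integers" here means dropping the fractional part, truncating toward zero, without ever building a rational intermediate; a sketch of that semantics (helper name is ours):

```python
from decimal import Decimal, ROUND_DOWN

def decimal_to_varint(d: Decimal) -> int:
    """Drop the fractional part, truncating toward zero, without
    building a rational intermediate."""
    return int(d.to_integral_value(rounding=ROUND_DOWN))

assert decimal_to_varint(Decimal('3.9')) == 3
assert decimal_to_varint(Decimal('-3.9')) == -3  # toward zero, not floor
```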
Rafael Ávila de Espíndola
fb760774dd types: extract helper for converting a decimal to a cppint
It will also be used in the decimal to integer conversion.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-04 15:08:07 -07:00
Rafael Ávila de Espíndola
40e6882906 types: rename and detemplate make_castas_fctn_from_decimal_to_integer
It was only ever used for varint.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-04 14:54:47 -07:00
Avi Kivity
301246f6c0 storage_proxy: protect _view_update_handlers_list iterators from invalidation
on_down() iterates over _view_update_handlers_list, but it yields during iteration,
and while it yields, elements in that list can be removed, resulting in a
use-after-free.

Prevent this by registering iterators that can be potentially invalidated, and
any time we remove an element from the list, check whether we're removing an element
that is being pointed to by a live iterator. If that is the case, advance the iterator
so that it points at a valid element (or at the end of the list).

Fixes #4912.

Tests: unit (dev)
2019-09-04 17:19:28 +03:00
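The fix's idea - register live iterators and, on erase, advance any that point at the victim - can be modeled with list indices standing in for intrusive-list iterators (a toy sketch, not the actual implementation):

```python
class SafeList:
    """List whose erase() keeps registered cursors valid: a cursor on
    the erased element ends up on the next one (or past the end)."""
    def __init__(self, items):
        self.items = list(items)
        self.cursors = {}                    # cursor-id -> index

    def register(self, cid, index=0):
        self.cursors[cid] = index

    def erase(self, index):
        del self.items[index]
        for cid, c in self.cursors.items():
            if c > index:
                self.cursors[cid] = c - 1    # same element, new slot
            # c == index: unchanged, so it now names the next element

    def deref(self, cid):
        c = self.cursors[cid]
        return self.items[c] if c < len(self.items) else None

lst = SafeList(['a', 'b', 'c'])
lst.register('it', 1)            # iterator on 'b'
lst.erase(1)                     # 'b' removed mid-iteration
assert lst.deref('it') == 'c'    # iterator advanced, not dangling
lst.erase(1)                     # 'c' removed too
assert lst.deref('it') is None   # now "end of list"
```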
Tomasz Grabiec
9f5826fd4b Merge "Use canonical mutations for background schema sync" from Botond
Currently the background schema sync (push/pull) uses frozen mutation to
send the schema mutations over the wire to the remote node. For this to
work correctly, both nodes have to have the exact same schema for the
system schema tables, as attempting to unpack the frozen mutation with
the wrong schema leads to undefined behaviour.
To avoid this and to ensure syncing schema between nodes with different
schema table schema versions is defined we migrate the background
schema sync to use canonical mutations for the transfer of the schema
mutations. Canonical mutations are immune to this problem, as they
support deserializing with any version of the schema, older or newer
one.

The foreground schema sync mechanisms -- the on-demand schema pulls on
reads and writes -- already use canonical mutations to transmit the
schema mutations.

It is important to note that due to this change, column-level
incompatibilities between the schema mutations and the schema used to
deserialize them will be hidden. This is undesired and should be fixed
in a follow-up (#4956). Table level incompatibilities are detected and
schema mutations containing such mutations will be rejected just like before.

This patch adds canonical mutation support to the two background schema
sync verbs:
* `DEFINITIONS_UPDATE` (schema push)
* `MIGRATION_REQUEST` (schema pull)

Both verbs still support the old frozen mutation schema transfer, albeit
that path is now much less efficient. After all nodes are upgraded, the
pull verb can effectively avoid sending frozen mutations altogether,
completely migrating to canonical mutations. Unfortunately this was not
possible for the push verb, so that one now has an overhead as it needs
to send both the frozen and canonical mutations.

Fixes: #4273
2019-09-04 13:58:14 +02:00
Benny Halevy
bc29520eb8 flat_mutation_reader: consume_in_thread: add mutation_filter
For validating mutation_fragment's monotonicity.

Note: forwarding constructor allows implicit conversion by
current callers.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-04 13:42:37 +03:00
Rafael Ávila de Espíndola
000514e7cc sstable: close file_writer if an exception is thrown
The previous code was not exception safe and would eventually cause a
file to be destroyed without being closed, causing an assert failure.

Unfortunately it doesn't seem to be possible to test this without
error injection, since using an invalid directory fails before this
code is executed.

Fixes #4948

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190904002314.79591-1-espindola@scylladb.com>
2019-09-04 13:28:55 +03:00
Botond Dénes
7adc764b6e messaging_service: add canonical_support to schema pull and push verbs
The verbs are:
* DEFINITIONS_UPDATE (push)
* MIGRATION_REQUEST (pull)

Support was added in a backward-compatible way. The push verb sends
both the old frozen mutation parameter, and the new optional canonical
mutation parameter. It is expected that new nodes will use the latter,
while old nodes will fall-back to the former. The pull verb has a new
optional `options` parameter, which for now contains a single flag:
`remote_supports_canonical_mutation_retval`. This flag, if set, means
that the remote node supports the new canonical mutation return value,
thus the old frozen mutations return value can be left empty.
2019-09-04 10:32:44 +03:00
Botond Dénes
d9a8ff15d8 service::migration_manager: add canonical_mutation merge_schema_from() overload
Add an overload which takes a vector of canonical mutations. Going
forward, this is the overload to use.
2019-09-04 10:32:44 +03:00
Botond Dénes
e02b93cae1 schema_tables: convert_schema_to_mutations: return canonical_mutations
In preparation to the schema push/pull migrating to use canonical
mutations, convert the method producing the schema mutations to return a
vector of canonical mutations. The only user, MIGRATION_REQUEST verb,
converts the canonical mutations back to frozen mutations. This is very
inefficient, but this path will only be used in mixed clusters. After
all nodes are upgraded the verb will be sending the canonical mutations
directly instead.
2019-09-04 08:47:20 +03:00
Rafael Ávila de Espíndola
b100f95adc types: optimize type find implementation
This turns find into a template so there is only one switch over the
kind of each type in the search.

To evaluate the change in code size, I added [[noinline]] to
find and obtained the following results.

The release columns in the before case have an extra column
because the functions are sufficiently complex to trigger gcc to split
them into hot + cold.

before:
                      dev                         release (hot + cold split)
find                  0x35f               = 863   0x3d5 + 0x112               = 1255
references_duration   0x62 + 0x22 + 0x8   = 140   0x55 + 0x1f + 0x2a + 0x8    = 166
references_user_type  0x6b + 0x26 + 0x111 = 418   0x65 + 0x1f + 0x32 + 0x11b  = 465

after:
                      dev                          release
find                  0xd6 + 0x1b4        = 650    0xd2 + 0x1f5               = 711
references_duration   0x13                = 19     0x13                       = 19
references_user_type  0x1a                = 26     0x21                       = 33

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-03 08:23:21 -07:00
Rafael Ávila de Espíndola
e0065b414e types: Avoid shared_ptr copies
They are somewhat expensive (in code size at least) and not needed
everywhere.

Inside the getter the variables are 'const data_type&', so we can
return that. Everything still works when a copy is needed, but in code
that just wants to check a property we avoid the copy.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-03 07:43:35 -07:00
Benny Halevy
bdfb73f67d scripts/create-relocatable-package: ldd: print executable name in exception
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190903080511.534-1-bhalevy@scylladb.com>
2019-09-03 15:34:38 +03:00
Avi Kivity
294a86122e Merge "nonroot installer" from Takuya
"
This is nonroot installer patchset v9.
"

* 'nonroot_v9' of https://github.com/syuu1228/scylla:
  dist/common/scripts: support nonroot mode on setup scripts
  reloc/python3: add install.sh on python relocatable package
  install.sh: add --nonroot mode
  dist/common/systemd: untemplatize *.service, use drop-in units instead
  dist/debian: delete debian/*.install, debian/*.dirs
2019-09-03 15:33:20 +03:00
Piotr Sarna
7b297865e1 transport: wait for the connections to finish when stopping (#4818)
During CQL request processing, a gate is used to ensure that
the connection is not shut down until all ongoing requests
are done. However, the gate might have been left too early
if the database was not ready to respond immediately - which
could result in trying to respond to an already closed connection
later. This issue is solved by postponing leaving the gate
until the continuation chain that handles the request is finished.

Refs #4808
2019-09-03 14:49:11 +03:00
Avi Kivity
8fb59915bb Merge "Minor cleanup patches for sstables" from Asias
* 'cleanup_sstables' of https://github.com/asias/scylla:
  sstables: Move leveled_compaction_strategy implementation to source file
  sstables: Include dht/i_partitioner.hh for dht::partition_range
2019-09-03 14:47:44 +03:00
Takuya ASADA
31ddb2145a dist/common/scripts: support nonroot mode on setup scripts
Since nonroot mode requires running everything as a non-privileged user,
most of the setup scripts cannot be used in nonroot mode.
We only provide the following functions in nonroot mode:
 - EC2 check
 - IO setup
 - Node exporter installer
 - Dev mode setup
The rest of the functions will be skipped by scylla_setup.
To implement nonroot mode in the setup scripts, scylla_util provides
utility functions that abstract the differences in directory structure between
a normal installation and nonroot mode.
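The "abstract the directory structure" idea amounts to resolving paths differently depending on install mode; a hedged sketch (the nonroot prefix and dict layout are illustrative, not the layout the real scylla_util uses):

```python
import os

def scylla_paths(nonroot: bool) -> dict:
    """Pick the directory layout for a root vs a nonroot install.
    (The nonroot prefix here is illustrative only.)"""
    if nonroot:
        prefix = os.path.expanduser('~/scylladb')
        return {'etc': prefix + '/etc/scylla',
                'data': prefix + '/var/lib/scylla'}
    return {'etc': '/etc/scylla', 'data': '/var/lib/scylla'}

assert scylla_paths(False)['data'] == '/var/lib/scylla'
assert scylla_paths(True)['etc'].endswith('/etc/scylla')
```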
Takuya ASADA
cfa8885ae1 reloc/python3: add install.sh to python relocatable package
To support nonroot installation of scylla-python3, add install.sh to the
scylla-python3 relocatable package.
2019-09-03 20:06:30 +09:00
Takuya ASADA
2de14e0800 install.sh: add --nonroot mode
This implements a way to install Scylla without requiring root privileges;
it is not distribution dependent and does not use a package manager.
2019-09-03 20:06:24 +09:00
Takuya ASADA
cde798dba5 dist/common/systemd: untemplatize *.service, use drop-in units instead
Since systemd units can override parameters using drop-in units, we don't need
mustache templates for them.

Also, drop the --disttype and --target options from install.sh since they are
not required anymore, and introduce --sysconfdir instead for non-redhat distributions.
2019-09-03 20:06:15 +09:00
Takuya ASADA
49a360f234 dist/debian: delete debian/*.install, debian/*.dirs
Since ac9b115, we switched to install.sh on Debian so we don't rely on .deb
specific packaging scripts anymore.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2019-09-03 20:06:09 +09:00
Benny Halevy
7827e3f11d tests: test_large_data: do not stop database
Now that compaction returns only after the compacted sstables are
deleted, we no longer need to stop the database to force waiting
for the deletes (that were previously done asynchronously).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-02 12:15:38 +03:00
Benny Halevy
19b67d82c9 table::on_compaction_completion: fix indentation
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-02 12:15:38 +03:00
Benny Halevy
8dd6e13468 table::on_compaction_completion: wait for background deletes
Don't let background deletes accumulate uncontrollably.

Fixes #4909

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-02 12:15:38 +03:00
Benny Halevy
da6645dc2c table: refresh_snapshot before deleting any sstables
The row cache must not hold references to any sstable we're
about to delete.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-02 12:15:29 +03:00
Nadav Har'El
6c4ad93296 api/compaction_manager: do not hold map on the stack
Merged patch series by Amnon Heiman:

This patch fixes a bug that a map is held on the stack and then is used
by a future.

Instead, the map is now moved to the relevant lambda function.

Fixes #4824
2019-09-01 13:16:34 +03:00
Avi Kivity
e962beea20 toolchain: update to Fedora 30 and gcc 9.2
In Fedora 30 we have a new boost version, so we no longer need to
use our patched boost, and we can also remove the scylladb/toolchain copr.
2019-09-01 12:05:26 +03:00
Piotr Sarna
23c891923e main: make sure view_builder doesn't propagate semaphore errors
Stopping services, which occurs in a destructor of deferred_action,
should not throw, or it will end the program with
terminate(). View builder breaks a semaphore during its shutdown,
which results in propagating a broken_semaphore exception,
which in turn results in throwing an exception during stop().get().
In order to fix that issue, semaphore exceptions are explicitly
ignored, since they're expected to appear during shutdown.

Fixes #4875
2019-09-01 11:59:57 +03:00
Tomasz Grabiec
c8f8a9450f Merge "Improve cpu instruction set support checks" from Avi
To prevent termination with SIGILL, tighten the instruction set
support checks. First, check for CLMUL too. Second, add a check in
scylla_prepare to catch the problem early.

Fixes #4921.
2019-08-30 16:54:04 +02:00
Avi Kivity
07010af44c scylla_prepare: verify processor satisfies instruction set requirements
Scylla requires the CLMUL and SSE 4.2 instruction sets and will fail without them.
There is a check in main(), but that happens after the code is running and it may
already be too late. Add a check in scylla_prepare which runs before the main
executable.
2019-08-29 15:34:29 +03:00
Avi Kivity
9579946e72 main: extend CPU feature check to verify that PCLMUL is available
Since 79136e895f, we use the pclmul instruction set,
so check it is there.
2019-08-29 15:13:32 +03:00
Gleb Natapov
e61a86bbb2 to_string: Add operator<< overload for std::tuple.
Message-Id: <20190829100902.GN21540@scylladb.com>
2019-08-29 13:35:02 +03:00
Rafael Ávila de Espíndola
036f51927c sstables: Remove unused include
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190827210424.37848-1-espindola@scylladb.com>
2019-08-28 11:32:44 +03:00
Benny Halevy
869b518dca sstables: auto-delete unsealed sstables
Fixes #4807

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190827082044.27223-1-bhalevy@scylladb.com>
2019-08-28 09:46:17 +03:00
Botond Dénes
969aa22d51 configure.py: promote unused result warning to error
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190827111428.6829-2-bdenes@scylladb.com>
2019-08-28 09:46:17 +03:00
Botond Dénes
480b42b84f tests/gossip_test: silence discarded future warning
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190827111428.6829-1-bdenes@scylladb.com>
2019-08-28 09:46:17 +03:00
Avi Kivity
d85339e734 Update seastar submodule
* seastar 20bfd61955...cb7026c16f (2):
  > net: dpdk: suppress discarded future warning
  > Merge "Optimize promises in then/then_wrapped" from Rafael
2019-08-28 09:46:17 +03:00
Avi Kivity
f1d73d0c13 Merge "systemd: put scylla processes in systemd slices. #4743" from Glauber
"
It is well known that seastar applications, like Scylla, do not play
well with external processes: CPU usage from external processes may
confuse the I/O and CPU schedulers and create stalls.

We have also recently seen that memory usage from other application's
anonymous and page cache memory can bring the system to OOM.

Linux has a very good infrastructure for resource control contributed by
amazingly bright engineers in the form of cgroup controllers. This
infrastructure is exposed by SystemD in the form of slices: a
hierarchical structure to which controllers can be attached.

In true systemd way, the hierarchy is implicit in the filenames of the
slice files. a "-" symbol defines the hierarchy, so the files that this
patch presents, scylla-server and scylla-helper, essentially create a
"scylla" cgroup at the top level with "server" and "helper" children.

Later we mark the Services needed to run scylla as belonging to one
or the other through the Slice= directive.

Scylla DBAs can benefit from this setup by using the systemd-run
utility to fire ad-hoc commands.

Let's say for example that someone wants to hypothetically run a backup
and transfer files to an external object store like S3, making sure that
the amount of page cache used won't create swap pressure leading to
database timeouts.

One can then run something like:

sudo systemd-run --uid=`id -u scylla` --gid=`id -g scylla` -t --slice=scylla-helper.slice /path/to/my/magical_backup_tool


(or even better, the backup tool can itself be a systemd timer)
"

* 'slices' of https://github.com/glommer/scylla:
  systemd: put scylla processes in systemd slices.
  move postinst steps to an external script
2019-08-26 20:16:55 +03:00
Benny Halevy
20083be9f6 sstables: delete_atomically: fix misplaced parenthesis in pending_delete_log warning message
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190818064637.9207-1-bhalevy@scylladb.com>
2019-08-26 19:50:21 +03:00
Avi Kivity
b9e9d7d379 Merge "Resolve discarded future warnings" from Botond
"
The warning for discarded futures will only become useful, once we can
silence all present warnings and flip the flag to make it become error.
Then it will start being useful in finding new, accidental discarding of
futures.
This series silences all remaining warnings in the Scylla codebase. For
those cases where it was obvious that the future is discarded on
purpose, the author taking all necessary precaution (handling exception)
the warning was simply silenced by casting the future to void and
adding a relevant comment. Where the discarding seems to have been done
in error, I have fixed the code to not discard it. To the rest of the
sites I added a FIXME to fix the discarding.
"

* 'resolve-discarded-future-warnings/v4.2' of https://github.com/denesb/scylla:
  treewide: silence discarded future warnings for questionable discards
  treewide: silence discarded future warnings for legit discards
  tests: silence discarded future warnings
  tests/cql_query_test.cc: convert some tests to thread
2019-08-26 19:40:25 +03:00
Botond Dénes
136fc856c5 treewide: silence discarded future warnings for questionable discards
This patch silences the remaining discarded future warnings, those
where it cannot be determined with reasonable confidence that this was
indeed the actual intent of the author, or that the discarding of the
future could lead to problems. For all those places a FIXME is added,
with the intent that these will be soon followed-up with an actual fix.
I deliberately haven't fixed any of these, even if the fix seems
trivial. It is too easy to overlook a bad fix mixed in with so many
mechanical changes.
2019-08-26 19:28:43 +03:00
Botond Dénes
fddd9a88dd treewide: silence discarded future warnings for legit discards
This patch silences those future discard warnings where it is clear that
discarding the future was actually the intent of the original author,
*and* they did the necessary precautions (handling errors). The patch
also adds some trivial error handling (logging the error) in some
places, which were lacking this, but otherwise look ok. No functional
changes.
2019-08-26 18:54:44 +03:00
Botond Dénes
cff4c4932d tests: silence discarded future warnings 2019-08-26 18:54:44 +03:00
Botond Dénes
486fa8c10c tests/cql_query_test.cc: convert some tests to thread
Some tests are currently discarding futures unjustifiably, however
adding code to wait on these futures is quite inconvenient due to the
continuation style code of these tests. Convert them to run in a seastar
thread to make the fix easier.
2019-08-26 18:54:44 +03:00
Tomasz Grabiec
ac5ff4994a service: Announce the new schema version when features are enabled
Introduced in c96ee98.

We call update_schema_version() after features are enabled and we
recalculate the schema version. This method does not update gossip,
though. The node will still use its database::version() to decide on
syncing, so it will not sync and will stay inconsistent in gossip until
the next schema change.

We should call update_schema_version_and_announce() instead so that
the gossip state is also updated.

There is no actual schema inconsistency, but the joining node will
think there is and will wait indefinitely. Making a random schema
change would unblock it.

Fixes #4647.

Message-Id: <1566825684-18000-1-git-send-email-tgrabiec@scylladb.com>
2019-08-26 17:54:59 +03:00
Avi Kivity
a7b82af4c3 Update seastar submodule
* seastar afc5bbf511...20bfd61955 (18):
  > reactor: closing file used to check if direct_io is supported
  > future: set_coroutine(): s/state()/_state/
  > tests/perf/perf_test.hh: suppress discarded future warning
  > tests: rpc: fix memory leak in timeout wraparound tests
  > Revert "future-util: reduce allocations and continuations in parallel_for_each"
  > reactor: fix rename_priority_class() build failure in C++14 mode
  > future: mark future_state_base::failed() as unlikely
  > future-util: reduce allocations and continuations in parallel_for_each
  > future-utils: generalize when_all_estimate_vector_capacity()
  > output_stream: Add comment on sequentiality
  > docs/tutorial.md: minor cleanups in first section
  > core: fix a race in execution stages (Fixes #4856, fixes #4766)
  > semaphore: use semaphore's clock type in with_semaphore()/get_units()
  > future: fix doxygen documentation for promise<>
  > sharded: fixed detecting stop method when building with clang
  > reactor: fixed locking error in rename_priority_class
  > Assert that append_challenged_posix_file_impl are closed.
  > rpc: correctly handle huge timeouts
2019-08-26 15:37:58 +03:00
Asias He
3ea1255020 storage_service: Use sleep_abortable instead of sleep (#4697)
Make the sleep abortable so that it is able to break the loop during
shutdown.

Fixes #4885
2019-08-26 13:35:44 +03:00
Asias He
2f24fd9106 sstables: Move leveled_compaction_strategy implementation to source file
It is better than putting everything in the header.
2019-08-26 16:49:48 +08:00
Asias He
b69138c4e4 sstables: Include dht/i_partitioner.hh for dht::partition_range
Get rid of one FIXME.
2019-08-26 16:35:18 +08:00
Nadav Har'El
b60d201a11 API: column_family.cc Add get_built_indexes implementation
Merged patch series from Amnon Heiman amnon@scylladb.com

This patch adds an implementation of the get built indexes API and removes a
FIXME.

The API returns a list of secondary indexes that belong to a column family
and have already been fully built.

Example:

CREATE KEYSPACE scylla_demo WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};

CREATE TABLE scylla_demo.mytableID ( uid uuid, text text, time timeuuid, PRIMARY KEY (uid, time) );
CREATE index on scylla_demo.mytableID (time);

$ curl -X GET 'http://localhost:10000/column_family/built_indexes/scylla_demo%3Amytableid'
["mytableid_time_idx"]
2019-08-25 18:37:44 +03:00
Amnon Heiman
2d3185fa7d column_family.cc: remove unhandled future
The sum_ratio struct is a helper struct that is used when calculating
a ratio over multiple shards.

Originally it was created thinking that it may need to use a future; in
practice the future was never used and was ignored.

This patch removes the future from the implementation and eliminates an
unhandled future warning from the compilation.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2019-08-25 16:51:14 +03:00
Amnon Heiman
21dee3d8ef API:column_family.cc Add get_build_index implementation
This patch adds an implementation of the get build index API and removes a
FIXME.

The API returns the list of built secondary indexes belonging to a column family.

Example:

CREATE KEYSPACE scylla_demo WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};

CREATE TABLE scylla_demo.mytableID ( uid uuid, text text, time timeuuid, PRIMARY KEY (uid, time) );
CREATE index on scylla_demo.mytableID (time);

$ curl -X GET 'http://localhost:10000/column_family/built_indexes/scylla_demo%3Amytableid'
["mytableid_time_idx"]

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2019-08-25 16:46:49 +03:00
Juliana Oliveira
711ed76c82 auth: standard_role_manager: read null columns as false
When a role is created through the `create role` statement, the
'is_superuser' and 'can_login' columns are set to false by default.
Likewise, `list roles`, `alter roles` and `* roles` operations
expect to find a boolean when reading the same columns.

This is not the case, though, when a user directly inserts to
`system_auth.roles` and doesn't set those columns. Even though
manually creating roles is not a desired day-to-day operation,
it is an insert just like any other and it should work.

`* roles` operations, on the other hand, are not prepared for
these deviations. If a user manually creates a role and doesn't
set boolean values for those columns, `* roles` will return all
sorts of errors. This happens because `* roles` explicitly
expects a boolean and casts for it.

This patch makes `* roles` more forgiving by considering the
boolean variable `false` - inside the `* roles` context - if the
actual value is `null`; it won't change the stored `null` value.

Fixes #4280

Signed-off-by: Juliana Oliveira <juliana@scylladb.com>
Message-Id: <20190816032617.61680-1-juliana@scylladb.com>
2019-08-25 11:52:43 +03:00
Pekka Enberg
118a141f5d scylla_blocktune.py: Kill btrfs related FIXME
The scylla_blocktune.py has a FIXME for btrfs from 2016, which is no
longer relevant for Scylla deployments, as Red Hat dropped support for
the file system in 2017.

Message-Id: <20190823114013.31112-1-penberg@scylladb.com>
2019-08-24 20:40:08 +03:00
Botond Dénes
18581cfb76 multishard_mutation_query: create_reader(): use the caller's priority class
The priority class the shard reader was created with was hardcoded to be
`service::get_local_sstable_query_read_priority()`. At the time this
code was written, priority classes could not be passed to other shards,
so this method, receiving its priority class parameters from another
shard, could not use it. This is now fixed, so we can just use whatever
the caller wants us to use.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190823115111.68711-1-bdenes@scylladb.com>
2019-08-23 16:10:43 +02:00
Tomasz Grabiec
080989d296 Merge "cql3: cartesian product limits" from Avi
Cartesian products (generated by IN restrictions) can grow very large,
even for short queries. This can overwhelm server resources.

Add limit checking for cartesian products, and configuration items for
users that are not satisfied with the default of 100 records fetched.

Fixes #4752.

Tests: unit (dev), manual test with SIGHUP.
2019-08-21 19:35:59 +02:00
Avi Kivity
67b0d379e0 main: add glue between db::config and cql3::cql_config
Copy values between the flat db::config and the hierarchical cql_config, adding
observers to keep the values updated.
2019-08-21 19:35:59 +02:00
Avi Kivity
8c7ad1d4cd cql: single_column_clustering_key_restrictions: limit cartesian products
Cartesian products (via IN restrictions) make it easy to generate huge
primary key sets with simple queries, overflowing server resources. Limit
them in the coordinator and report an exception instead of trying to
execute a query that would consume all of our memory.

A unit test is added.
2019-08-21 19:35:59 +02:00
Avi Kivity
3a44fa9988 cql3, treewide: introduce empty cql3::cql_config class and propagate it
We need a way to configure the cql interpreter and runtime. So far we relied
on accessing the configuration class via various backdoors, but that causes
its own problems around initialization order and testability. To avoid that,
this patch adds an empty cql_config class and propagates it from main.cc
(and from tests) to the cql interpreter via the query_options class, which is
already passed everywhere.

Later patches will fill it with contents.
2019-08-21 19:35:59 +02:00
Rafael Ávila de Espíndola
86c29256eb types: Fix references_user_type
This was broken since the type refactoring. It was checking the static
type, which is always abstract_type. Unfortunately we only had dtests
for this.

This can probably be optimized to avoid the double switch over kind,
but it is probably better to do the simple fix first.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190821155354.47704-1-espindola@scylladb.com>
2019-08-21 19:13:59 +03:00
Dejan Mircevski
ea9d358df9 cql3: Optimize LIKE regex construction
Currently we create a regex from the LIKE pattern for every row
considered during filtering, even though the pattern is always the
same.  This is wasteful, especially since we require costly
optimization in the regex compiler.  Fix it by reusing the regex
whenever the pattern is unchanged since the last call.

Tests: unit (dev)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-08-21 16:45:47 +03:00
Piotr Sarna
526f4c42aa storage_proxy: fix iterator liveness issue in on_down (#4876)
The loop over view update handlers used a guard in order to ensure
that the object is not prematurely destroyed (thus invalidating
the iterator), but the guard itself was not in the right scope.
Fixed by replacing a 'for' loop with a 'while' loop, which moves
the iterator incrementation inside the scope in which it's still
guarded and valid.

Fixes #4866
2019-08-21 15:44:43 +03:00
Avi Kivity
4ef7429c4a build: build seastar in build directory
Currently, seastar is built in seastar/build/{mode}. This means we have two build
directories: build/{mode} and seastar/build/{mode}.

This patch changes that to have only a single build directory (build/{mode}). It
does that by calling Seastar's cmake directly instead of through Seastar's
./configure.py.  However, to support dpdk, if that is enabled it calls cmake
through Seastar's ./cooking.sh (similar to what Seastar's ./configure.py does).

All ./configure.py flags are translated to cmake variables, in the same way that
Seastar does.

Contains fix from Rafael to pass the flags for the correct mode.
2019-08-21 13:10:17 +02:00
Rafael Ávila de Espíndola
278b6abb2b Improve documentation on the system.large_* tables
This clarifies that "rows" are clustering rows and that there is no
information about individual collection elements.

The patch also documents some properties common to all these tables.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190820171204.48739-1-espindola@scylladb.com>
2019-08-21 10:36:25 +03:00
Vlad Zolotarov
d253846c91 hinted handoff: fix a race on a directory removal between space_watchdog and drain_for()
The endpoint directories scanned by space_watchdog may get deleted
by the manager::drain_for().

If a deleted directory is given to a lister::scan_dir() this will end up
in an exception and as a result a space_watchdog will skip this round
and hinted handoff is going to be disabled (for all agents including MVs)
for the whole space_watchdog round.

Let's make sure this doesn't happen by serializing the scanning and deletion
using end_point_hints_manager::file_update_mutex.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2019-08-20 11:46:46 -04:00
Vlad Zolotarov
b34c36baa2 hinted handoff: make taking file_update_mutex safe
end_point_hints_manager::file_update_mutex is taken by space_watchdog
but while space_watchdog is waiting for it the corresponding
end_point_hints_manager instance may get destroyed by manager::drain_for()
or by manager::stop().

This will end up in a use-after-free event.

Let's change the end_point_hints_manager's API in a way that prevents
such unsafe locking:

   - Introduce the with_file_update_mutex().
   - Make end_point_hints_manager::file_update_mutex() method private.

Fixes #4685
Fixes #4836

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2019-08-20 11:26:19 -04:00
Vlad Zolotarov
dbad9fcc7d db::hints::manager::drain_for(): fix alignment
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2019-08-20 10:58:36 -04:00
Vlad Zolotarov
7a12b46fc9 db::hints::manager: serialize calls to drain_for()
If drain_for() is running together with itself: one instance for the local
node and one for some other node, erasing of elements from the _ep_managers
map may lead to a use-after-free event.

Let's serialize drain_for() calls with a semaphore.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2019-08-20 10:58:36 -04:00
Vlad Zolotarov
09600f1779 db::hints: cosmetics: indentation and missing method qualifier
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2019-08-20 10:58:36 -04:00
Avi Kivity
698b72b501 relocatable: switch from run-time relocation to install-time relocation
Our current relocation works by invoking the dynamic linker with the
executable as an argument. This confuses gdb since the kernel records
the dynamic linker as the executable, not the real executable.

Switch to install-time relocation with patchelf: when installing the
executable and libraries, all paths are known, and we can update the
path to the dynamic loader and to the dynamic libraries.

Since patchelf itself is dynamically linked, we have to relocate it
dynamically (with the old method of invoking it via the dynamic linker).
This is okay since it's a one-time operation and since we don't expect
to debug core dumps of patchelf crashes.

We lose the ability to run scylla directly from the uninstalled
tarball, but since the nonroot installer is already moving in the
direction of requiring install.sh, that is not a great loss, and
certainly the ability to debug is more important.

dh_strip barfs on some binaries which were treated with patchelf,
so exclude them from dh_strip. This doesn't lose any functionality,
since these binaries didn't have debug information to begin with
(they are already-stripped Fedora executables).

Fixes #4673.
2019-08-20 00:25:43 +02:00
Botond Dénes
4cb873abfe query::trim_clustering_row_ranges_to(): fix handling of non-full prefix keys
Non-full prefix keys are currently not handled correctly as all keys
are treated as if they were full prefixes, and therefore they represent
a point in the key space. Non-full prefixes however represent a
sub-range of the key space and therefore require null extending before
they can be treated as a point.
As a quick reminder, `key` is used to trim the clustering ranges such
that they only cover positions >= the key. Thus,
`trim_clustering_row_ranges_to()` does the equivalent of intersecting
each range with (key, inf). When `key` is a prefix, this would exclude
all positions that are prefixed by key as well, which is not desired.

Fixes: #4839
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190819134950.33406-1-bdenes@scylladb.com>
2019-08-20 00:24:51 +02:00
Avi Kivity
21d6f0bb16 Merge "Add LIKE test cases for all non-string types #4859" from Dejan
"
Follow-up to #4610, where a review comment asked for test coverage on all types. Existing tests cover all the types admissible in LIKE, while this PR adds coverage for all inadmissible types.

Tests: unit (dev)
"

* 'like-nonstring' of https://github.com/dekimir/scylla:
  cql_query_test: Add LIKE tests for all types
  cql_query_test: Remove LIKE-nonstring-pattern case
  cql_query_test: Move a testcase elsewhere in file
2019-08-20 00:24:51 +02:00
Tomasz Grabiec
6813ae22b0 Merge "Handle termination signals during streaming" from Avi
In b197924, we changed the shutdown process not to rely on the global
reactor-defined exit, but instead added a local variable to hold the
shutdown state. However, we did not propagate that state everywhere,
and now streaming processes are not able to abort.

Fix that by enhancing stop_signal with a sharded<abort_source> member
that can be propagated to services. Propagate it to storage_service
and thence to boot_strapper and range_streamer so that streaming
processes can be aborted.

Fixes #4674
Fixes #4501

Tests: unit (dev), manual bootstrap test
2019-08-20 00:24:51 +02:00
Avi Kivity
2c7435418a Merge "database: assign proper io priority for streaming view updates" from Piotr
"
Streamed view updates parasitized on writing io priority, which is
reserved for user writes - it's now properly bound to streaming
write priority.

Verified manually by checking appropriate io metrics: scylla_io_queue_total_bytes{class="streaming_write" ...} vs scylla_io_queue_total_bytes{class="query" ...}

Tests: unit(dev)
"

* 'assign_proper_io_priority_to_streaming_view_updates' of https://github.com/psarna/scylla:
  db,view: wrap view update generation in stream scheduling group
  database: assign proper io priority for streaming view updates
2019-08-20 00:24:51 +02:00
Pekka Enberg
d0eecbf3bb api/storage_proxy: Wire up hinted-handoff status to API
We support hinted-handoff now, so let's return it's status via the API.

Message-Id: <20190819080006.18070-1-penberg@scylladb.com>
2019-08-20 00:24:50 +02:00
Piotr Sarna
3cc5a04301 db,view: wrap view update generation in stream scheduling group
Generating view updates is used by streaming, so the service itself
should also run under the matching scheduling group.
2019-08-20 00:24:50 +02:00
Piotr Sarna
1ab07b80b4 database: assign proper io priority for streaming view updates
Streamed view updates parasitized on writing io priority, which is
reserved for user writes - it's now properly bound to streaming
write priority.
2019-08-20 00:24:50 +02:00
Tomasz Grabiec
b9447d0319 Revert "relocatable: switch from run-time relocation to install-time relocation"
This reverts commit 4ecce2d286.

Should be committed via the next branch.
2019-08-20 00:22:30 +02:00
Avi Kivity
4ecce2d286 relocatable: switch from run-time relocation to install-time relocation
Our current relocation works by invoking the dynamic linker with the
executable as an argument. This confuses gdb since the kernel records
the dynamic linker as the executable, not the real executable.

Switch to install-time relocation with patchelf: when installing the
executable and libraries, all paths are known, and we can update the
path to the dynamic loader and to the dynamic libraries.

Since patchelf itself is dynamically linked, we have to relocate it
dynamically (with the old method of invoking it via the dynamic linker).
This is okay since it's a one-time operation and since we don't expect
to debug core dumps of patchelf crashes.

We lose the ability to run scylla directly from the uninstalled
tarball, but since the nonroot installer is already moving in the
direction of requiring install.sh, that is not a great loss, and
certainly the ability to debug is more important.

dh_strip barfs on some binaries which were treated with patchelf,
so exclude them from dh_strip. This doesn't lose any functionality,
since these binaries didn't have debug information to begin with
(they are already-stripped Fedora executables).

Fixes #4673.
2019-08-20 00:20:19 +02:00
Glauber Costa
da260ecd61 systemd: put scylla processes in systemd slices.
It is well known that seastar applications, like Scylla, do not play
well with external processes: CPU usage from external processes may
confuse the I/O and CPU schedulers and create stalls.

We have also recently seen that memory usage from other application's
anonymous and page cache memory can bring the system to OOM.

Linux has a very good infrastructure for resource control contributed by
amazingly bright engineers in the form of cgroup controllers. This
infrastructure is exposed by SystemD in the form of slices: a
hierarchical structure to which controllers can be attached.

In true systemd way, the hierarchy is implicit in the filenames of the
slice files. A "-" symbol defines the hierarchy, so the files that this
patch presents, scylla-server and scylla-helper, essentially create a
"scylla" cgroup at the top level with "server" and "helper" children.

Later we mark the Services needed to run scylla as belonging to one
or the other through the Slice= directive.

Scylla DBAs can benefit from this setup by using the systemd-run
utility to fire ad-hoc commands.

Let's say for example that someone wants to hypothetically run a backup
and transfer files to an external object store like S3, making sure that
the amount of page cache used won't create swap pressure leading to
database timeouts.

One can then run something like:

```
   sudo systemd-run --uid=`id -u scylla` --gid=`id -g scylla` -t --slice=scylla-helper.slice /path/to/my/magical_backup_tool
```

(or even better, the backup tool can itself be a systemd timer)

Changes from last version:
- No longer use the CPUQuota
- Minor typo fixes
- postinstall fixup for small machines

Benchmark results:
==================

Test: read from disk, with 100% disk util using a single i3.xlarge (4 vCPUs).
We have to fill the cache as we read, so this should stress CPU, memory and
disk I/O.

cassandra-stress command:
```
  cassandra-stress read no-warmup duration=5m -rate threads=20 -node 10.2.209.188 -pop dist=uniform\(1..150000000\)
```

Baseline results:

```
Results:
Op rate                   :   13,830 op/s  [READ: 13,830 op/s]
Partition rate            :   13,830 pk/s  [READ: 13,830 pk/s]
Row rate                  :   13,830 row/s [READ: 13,830 row/s]
Latency mean              :    1.4 ms [READ: 1.4 ms]
Latency median            :    1.4 ms [READ: 1.4 ms]
Latency 95th percentile   :    2.4 ms [READ: 2.4 ms]
Latency 99th percentile   :    2.8 ms [READ: 2.8 ms]
Latency 99.9th percentile :    3.4 ms [READ: 3.4 ms]
Latency max               :   12.0 ms [READ: 12.0 ms]
Total partitions          :  4,149,130 [READ: 4,149,130]
Total errors              :          0 [READ: 0]
Total GC count            : 0
Total GC memory           : 0.000 KiB
Total GC time             :    0.0 seconds
Avg GC time               :    NaN ms
StdDev GC time            :    0.0 ms
Total operation time      : 00:05:00
```

Question 1:
===========

Does putting scylla in a special slice affect its performance ?

Results with Scylla running in a slice:

```
Results:
Op rate                   :   13,811 op/s  [READ: 13,811 op/s]
Partition rate            :   13,811 pk/s  [READ: 13,811 pk/s]
Row rate                  :   13,811 row/s [READ: 13,811 row/s]
Latency mean              :    1.4 ms [READ: 1.4 ms]
Latency median            :    1.4 ms [READ: 1.4 ms]
Latency 95th percentile   :    2.2 ms [READ: 2.2 ms]
Latency 99th percentile   :    2.6 ms [READ: 2.6 ms]
Latency 99.9th percentile :    3.3 ms [READ: 3.3 ms]
Latency max               :   23.2 ms [READ: 23.2 ms]
Total partitions          :  4,151,409 [READ: 4,151,409]
Total errors              :          0 [READ: 0]
Total GC count            : 0
Total GC memory           : 0.000 KiB
Total GC time             :    0.0 seconds
Avg GC time               :    NaN ms
StdDev GC time            :    0.0 ms
Total operation time      : 00:05:00
```

*Conclusion* : No significant change

Question 2:
===========

What happens when there is a CPU hog running in the same server as scylla?

CPU hog:

```
   taskset -c 0 /bin/sh -c "while true; do true; done" &
   taskset -c 1 /bin/sh -c "while true; do true; done" &
   taskset -c 2 /bin/sh -c "while true; do true; done" &
   taskset -c 3 /bin/sh -c "while true; do true; done" &
   sleep 330
```

Scenario 1: CPU hog runs freely:

```
Results:
Op rate                   :    2,939 op/s  [READ: 2,939 op/s]
Partition rate            :    2,939 pk/s  [READ: 2,939 pk/s]
Row rate                  :    2,939 row/s [READ: 2,939 row/s]
Latency mean              :    6.8 ms [READ: 6.8 ms]
Latency median            :    5.3 ms [READ: 5.3 ms]
Latency 95th percentile   :   11.0 ms [READ: 11.0 ms]
Latency 99th percentile   :   14.9 ms [READ: 14.9 ms]
Latency 99.9th percentile :   17.1 ms [READ: 17.1 ms]
Latency max               :   26.3 ms [READ: 26.3 ms]
Total partitions          :    884,460 [READ: 884,460]
Total errors              :          0 [READ: 0]
Total GC count            : 0
Total GC memory           : 0.000 KiB
Total GC time             :    0.0 seconds
Avg GC time               :    NaN ms
StdDev GC time            :    0.0 ms
Total operation time      : 00:05:00
```

Scenario 2: CPU hog runs inside scylla-helper slice

```
Results:
Op rate                   :   13,527 op/s  [READ: 13,527 op/s]
Partition rate            :   13,527 pk/s  [READ: 13,527 pk/s]
Row rate                  :   13,527 row/s [READ: 13,527 row/s]
Latency mean              :    1.5 ms [READ: 1.5 ms]
Latency median            :    1.4 ms [READ: 1.4 ms]
Latency 95th percentile   :    2.4 ms [READ: 2.4 ms]
Latency 99th percentile   :    2.9 ms [READ: 2.9 ms]
Latency 99.9th percentile :    3.8 ms [READ: 3.8 ms]
Latency max               :   18.7 ms [READ: 18.7 ms]
Total partitions          :  4,069,934 [READ: 4,069,934]
Total errors              :          0 [READ: 0]
Total GC count            : 0
Total GC memory           : 0.000 KiB
Total GC time             :    0.0 seconds
Avg GC time               :    NaN ms
StdDev GC time            :    0.0 ms
Total operation time      : 00:05:00
```

*Conclusion*: With systemd slice we can keep the performance very close to
baseline

Question 3:
===========

What happens when there is an I/O hog running in the same server as scylla?

I/O hog: (Data in the cluster is 2x size of memory)

```
while true; do
	find /var/lib/scylla/data -type f -exec grep glauber {} +
done
```

Scenario 1: I/O hog runs freely:

```
Results:
Op rate                   :    7,680 op/s  [READ: 7,680 op/s]
Partition rate            :    7,680 pk/s  [READ: 7,680 pk/s]
Row rate                  :    7,680 row/s [READ: 7,680 row/s]
Latency mean              :    2.6 ms [READ: 2.6 ms]
Latency median            :    1.3 ms [READ: 1.3 ms]
Latency 95th percentile   :    7.8 ms [READ: 7.8 ms]
Latency 99th percentile   :   10.9 ms [READ: 10.9 ms]
Latency 99.9th percentile :   16.9 ms [READ: 16.9 ms]
Latency max               :   40.8 ms [READ: 40.8 ms]
Total partitions          :  2,306,723 [READ: 2,306,723]
Total errors              :          0 [READ: 0]
Total GC count            : 0
Total GC memory           : 0.000 KiB
Total GC time             :    0.0 seconds
Avg GC time               :    NaN ms
StdDev GC time            :    0.0 ms
Total operation time      : 00:05:00
```

Scenario 2: I/O hog runs in the scylla-helper systemd slice:

```
Results:
Op rate                   :   13,277 op/s  [READ: 13,277 op/s]
Partition rate            :   13,277 pk/s  [READ: 13,277 pk/s]
Row rate                  :   13,277 row/s [READ: 13,277 row/s]
Latency mean              :    1.5 ms [READ: 1.5 ms]
Latency median            :    1.4 ms [READ: 1.4 ms]
Latency 95th percentile   :    2.4 ms [READ: 2.4 ms]
Latency 99th percentile   :    2.9 ms [READ: 2.9 ms]
Latency 99.9th percentile :    3.5 ms [READ: 3.5 ms]
Latency max               :  183.4 ms [READ: 183.4 ms]
Total partitions          :  3,984,080 [READ: 3,984,080]
Total errors              :          0 [READ: 0]
Total GC count            : 0
Total GC memory           : 0.000 KiB
Total GC time             :    0.0 seconds
Avg GC time               :    NaN ms
StdDev GC time            :    0.0 ms
Total operation time      : 00:05:00
```

*Conclusion*: With systemd slice we can keep the performance very close to
baseline

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2019-08-19 14:31:28 -04:00
Avi Kivity
c32f9a8f7b dht: check for aborts during streaming
Propagate the abort_source from main() into boot_strapper and range_stream and
check for aborts at strategic points. This includes aborting running stream_plans
and aborting sleeps between retries.

Fixes #4674
2019-08-18 20:41:07 +03:00
Avi Kivity
5af6f5aa22 main: expose SIGINT/SIGTERM as abort_source
In order to propagate stop signals, expose them as sharded<abort_source>. This
allows propagating the signal to all shards, and integrating it with
sleep_abortable().

Because sharded<abort_source>::stop() will block, we'll now require stop_signal
to run in a thread (which is already the case).
2019-08-18 20:15:26 +03:00
Dejan Mircevski
48bb89fcb7 cql_query_test: Add LIKE tests for all types
As requested in a prior code review [1], ensure that LIKE cannot be
used on any non-string type.

[1] https://github.com/scylladb/scylla/pull/4610#pullrequestreview-255590129

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-08-16 17:55:35 -04:00
Dejan Mircevski
ef071bf7ce cql_query_test: Remove LIKE-nonstring-pattern case
This testcase was previously commented out, pending a fix that cannot
be made.  Currently it is impossible to validate the marker-value type
at filtering time.  The value is entered into the options object under
its presumed type of string, regardless of what it was made from.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-08-16 17:07:44 -04:00
Dejan Mircevski
20e688e703 cql_query_test: Move a testcase elsewhere in file
Somehow this test case sits in the middle of LIKE-operator tests:
test_alter_type_on_compact_storage_with_no_regular_columns_does_not_crash

Move it so LIKE test cases are contiguous.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-08-16 17:07:44 -04:00
Glauber Costa
ffc328c924 move postinst steps to an external script
There are systemd-related steps done in both rpm and deb builds.
Move that to a script so we avoid duplication.

The tests are so far a bit specific to the distributions, so they
need to be adapted a bit.

Note that this also fixes a bug with rpm as a side effect:
rpm does not call daemon-reload after potentially changing the
systemd files (it is only implied during postun operations, which
happen during uninstall). daemon-reload was called explicitly for
debian packages, and now it is called for both.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2019-08-15 10:43:17 -04:00
Amnon Heiman
6a0490c419 api/compaction_manager: indentation 2019-08-12 14:04:40 +03:00
Amnon Heiman
8181601f0e api/compaction_manager: do not hold map on the stack
This patch fixes a bug where a map was held on the stack and then used
by a future.

Instead, the map is now wrapped with do_with.

Fixes #4824

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2019-08-12 14:04:00 +03:00
2786 changed files with 47730 additions and 11849 deletions

.gitignore

@@ -19,3 +19,8 @@ CMakeLists.txt.user
__pycache__
.gdbinit
resources
.pytest_cache
/expressions.tokens
tags
testlog/*
test/*/*.reject

.gitmodules

@@ -1,6 +1,6 @@
[submodule "seastar"]
path = seastar
url = ../seastar
url = ../scylla-seastar
ignore = dirty
[submodule "swagger-ui"]
path = swagger-ui


@@ -97,7 +97,7 @@ scan_scylla_source_directories(
service
sstables
streaming
tests
test
thrift
tracing
transport


@@ -56,7 +56,7 @@ $ ./configure.py --help
The most important option is:
- `--{enable,disable}-dpdk`: [DPDK](http://dpdk.org/) is a set of libraries and drivers for fast packet processing. During development, it's not necessary to enable support even if it is supported by your platform.
- `--enable-dpdk`: [DPDK](http://dpdk.org/) is a set of libraries and drivers for fast packet processing. During development, it's not necessary to enable support even if it is supported by your platform.
Source files and build targets are tracked manually in `configure.py`, so the script needs to be updated when new files or targets are added or removed.


@@ -5,8 +5,6 @@ F: Filename, directory, or pattern for the subsystem
---
AUTH
M: Paweł Dziepak <pdziepak@scylladb.com>
M: Duarte Nunes <duarte@scylladb.com>
R: Calle Wilund <calle@scylladb.com>
R: Vlad Zolotarov <vladz@scylladb.com>
R: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
@@ -14,22 +12,17 @@ F: auth/*
CACHE
M: Tomasz Grabiec <tgrabiec@scylladb.com>
M: Paweł Dziepak <pdziepak@scylladb.com>
R: Piotr Jastrzebski <piotr@scylladb.com>
F: row_cache*
F: *mutation*
F: tests/mvcc*
COMMITLOG / BATCHLOG
M: Paweł Dziepak <pdziepak@scylladb.com>
M: Duarte Nunes <duarte@scylladb.com>
R: Calle Wilund <calle@scylladb.com>
F: db/commitlog/*
F: db/batch*
COORDINATOR
M: Paweł Dziepak <pdziepak@scylladb.com>
M: Duarte Nunes <duarte@scylladb.com>
R: Gleb Natapov <gleb@scylladb.com>
F: service/storage_proxy*
@@ -49,12 +42,10 @@ M: Pekka Enberg <penberg@scylladb.com>
F: cql3/*
COUNTERS
M: Paweł Dziepak <pdziepak@scylladb.com>
F: counters*
F: tests/counter_test*
GOSSIP
M: Duarte Nunes <duarte@scylladb.com>
M: Tomasz Grabiec <tgrabiec@scylladb.com>
R: Asias He <asias@scylladb.com>
F: gms/*
@@ -65,14 +56,11 @@ F: dist/docker/*
LSA
M: Tomasz Grabiec <tgrabiec@scylladb.com>
M: Paweł Dziepak <pdziepak@scylladb.com>
F: utils/logalloc*
MATERIALIZED VIEWS
M: Duarte Nunes <duarte@scylladb.com>
M: Pekka Enberg <penberg@scylladb.com>
R: Nadav Har'El <nyh@scylladb.com>
R: Duarte Nunes <duarte@scylladb.com>
M: Nadav Har'El <nyh@scylladb.com>
F: db/view/*
F: cql3/statements/*view*
@@ -82,14 +70,12 @@ F: dist/*
REPAIR
M: Tomasz Grabiec <tgrabiec@scylladb.com>
M: Duarte Nunes <duarte@scylladb.com>
R: Asias He <asias@scylladb.com>
R: Nadav Har'El <nyh@scylladb.com>
F: repair/*
SCHEMA MANAGEMENT
M: Tomasz Grabiec <tgrabiec@scylladb.com>
M: Duarte Nunes <duarte@scylladb.com>
M: Pekka Enberg <penberg@scylladb.com>
F: db/schema_tables*
F: db/legacy_schema_migrator*
@@ -98,15 +84,13 @@ F: schema*
SECONDARY INDEXES
M: Pekka Enberg <penberg@scylladb.com>
M: Duarte Nunes <duarte@scylladb.com>
R: Nadav Har'El <nyh@scylladb.com>
M: Nadav Har'El <nyh@scylladb.com>
R: Pekka Enberg <penberg@scylladb.com>
F: db/index/*
F: cql3/statements/*index*
SSTABLES
M: Tomasz Grabiec <tgrabiec@scylladb.com>
M: Duarte Nunes <duarte@scylladb.com>
R: Raphael S. Carvalho <raphaelsc@scylladb.com>
R: Glauber Costa <glauber@scylladb.com>
R: Nadav Har'El <nyh@scylladb.com>
@@ -114,18 +98,17 @@ F: sstables/*
STREAMING
M: Tomasz Grabiec <tgrabiec@scylladb.com>
M: Duarte Nunes <duarte@scylladb.com>
R: Asias He <asias@scylladb.com>
F: streaming/*
F: service/storage_service.*
THRIFT TRANSPORT LAYER
M: Duarte Nunes <duarte@scylladb.com>
F: thrift/*
ALTERNATOR
M: Nadav Har'El <nyh@scylladb.com>
F: alternator/*
F: alternator-test/*
THE REST
M: Avi Kivity <avi@scylladb.com>
M: Paweł Dziepak <pdziepak@scylladb.com>
M: Duarte Nunes <duarte@scylladb.com>
M: Tomasz Grabiec <tgrabiec@scylladb.com>
M: Nadav Har'El <nyh@scylladb.com>
F: *


@@ -1,29 +0,0 @@
Seastar and DPDK
================
Seastar uses the Data Plane Development Kit to drive NIC hardware directly. This
provides an enormous performance boost.
To enable DPDK, specify `--enable-dpdk` to `./configure.py`, and `--dpdk-pmd` as a
run-time parameter. This will use the DPDK package provided as a git submodule with the
seastar sources.
To use your own self-compiled DPDK package, follow this procedure:
1. Setup host to compile DPDK:
- Ubuntu
`sudo apt-get install -y build-essential linux-image-extra-$(uname -r)`
2. Prepare a DPDK SDK:
- Download the latest DPDK release: `wget http://dpdk.org/browse/dpdk/snapshot/dpdk-1.8.0.tar.gz`
- Untar it.
- Edit config/common_linuxapp: set CONFIG_RTE_MBUF_REFCNT and CONFIG_RTE_LIBRTE_KNI to 'n'.
- For DPDK 1.7.x: edit config/common_linuxapp:
- Set CONFIG_RTE_LIBRTE_PMD_BOND to 'n'.
- Set CONFIG_RTE_MBUF_SCATTER_GATHER to 'n'.
- Set CONFIG_RTE_LIBRTE_IP_FRAG to 'n'.
- Start the tools/setup.sh script as root.
- Compile a linuxapp target (option 9).
- Install IGB_UIO module (option 11).
- Bind some physical port to IGB_UIO (option 17).
- Configure hugepage mappings (option 14/15).
3. Run a configure.py: `./configure.py --dpdk-target <Path to untared dpdk-1.8.0 above>/x86_64-native-linuxapp-gcc`.


@@ -27,10 +27,10 @@ Please see [HACKING.md](HACKING.md) for detailed information on building and dev
```
* run Scylla with one CPU and ./tmp as data directory
* run Scylla with one CPU and ./tmp as work directory
```
./build/release/scylla --datadir tmp --commitlog-directory tmp --smp 1
./build/release/scylla --workdir tmp --smp 1
```
* For more run options:
@@ -38,6 +38,24 @@ Please see [HACKING.md](HACKING.md) for detailed information on building and dev
./build/release/scylla --help
```
## Scylla APIs and compatibility
By default, Scylla is compatible with Apache Cassandra and its APIs - CQL and
Thrift. There is also experimental support for the API of Amazon DynamoDB,
but being experimental it needs to be explicitly enabled to be used. For more
information on how to enable the experimental DynamoDB compatibility in Scylla,
and the current limitations of this feature, see
[Alternator](docs/alternator/alternator.md) and
[Getting started with Alternator](docs/alternator/getting-started.md).
## Documentation
Documentation can be found in [./docs](./docs) and on the
[wiki](https://github.com/scylladb/scylla/wiki). There is currently no clear
definition of what goes where, so when looking for something be sure to check
both.
Seastar documentation can be found [here](http://docs.seastar.io/master/index.html).
User documentation can be found [here](https://docs.scylladb.com/).
## Building Fedora RPM
As a pre-requisite, you need to install [Mock](https://fedoraproject.org/wiki/Mock) on your machine:


@@ -1,7 +1,7 @@
#!/bin/sh
PRODUCT=scylla
VERSION=666.development
VERSION=3.3.4
if test -f version
then

alternator-test/README.md

@@ -0,0 +1,78 @@
Tests for Alternator that should also pass, identically, against DynamoDB.
Tests use the boto3 library for the AWS API and the pytest framework
(both are available from Linux distributions, or with "pip install").
To run all tests against the local installation of Alternator on
http://localhost:8000, just run `pytest`.
Some additional pytest options:
* To run all tests in a single file, do `pytest test_table.py`.
* To run a single specific test, do `pytest test_table.py::test_create_table_unsupported_names`.
* Additional useful pytest options, especially useful for debugging tests:
* -v: show the names of each individual test running instead of just dots.
* -s: show the full output of running tests (by default, pytest captures the test's output and only displays it if a test fails)
Add the `--aws` option to test against AWS instead of the local installation.
For example - `pytest --aws test_item.py` or `pytest --aws`.
If you plan to run tests against AWS and not just a local Scylla installation,
the file ~/.aws/credentials should be configured with your AWS key:
```
[default]
aws_access_key_id = XXXXXXXXXXXXXXXXXXXX
aws_secret_access_key = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```
and ~/.aws/config with the default region to use in the test:
```
[default]
region = us-east-1
```
## HTTPS support
In order to run tests with HTTPS, run pytest with the `--https` parameter. Note that the Scylla cluster needs to be provided
with the alternator\_https\_port configuration option in order to initialize an HTTPS server.
Moreover, running an HTTPS server requires a certificate. Here's how to easily generate
a key and a self-signed certificate, which is sufficient to run `--https` tests:
```
openssl genrsa 2048 > scylla.key
openssl req -new -x509 -nodes -sha256 -days 365 -key scylla.key -out scylla.crt
```
If this pair is put into the `conf/` directory, it will be enough
to allow the alternator HTTPS server to think it's been authorized and properly certified.
Still, the boto3 library issues warnings that the certificate used for communication is self-signed,
and thus should not be trusted. For the sake of running local tests this warning is explicitly ignored.
## Authorization
By default, boto3 prepares a properly signed Authorization header with every request.
In order to confirm the authorization, the server recomputes the signature by using
user credentials (user-provided username + a secret key known by the server),
and then checks if it matches the signature from the header.
Early alternator code did not verify signatures at all, which is also allowed by the protocol.
A partial implementation of the authorization verification can be allowed by providing a Scylla
configuration parameter:
```yaml
alternator_enforce_authorization: true
```
The implementation is currently coupled with Scylla's system\_auth.roles table,
which means that an additional step needs to be performed when setting up Scylla
as the test environment. Tests will use the following credentials:
Username: `alternator`
Secret key: `secret_pass`
With cqlsh, it can be achieved by executing this snippet:
```bash
cqlsh -x "INSERT INTO system_auth.roles (role, salted_hash) VALUES ('alternator', 'secret_pass')"
```
Most tests expect the authorization to succeed, so they will pass even with `alternator_enforce_authorization`
turned off. However, test cases from `test_authorization.py` may require this option to be turned on,
so turning it on is advised.
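The recompute-and-compare step described above can be sketched in a few lines. This is a minimal illustration of AWS SigV4-style signing-key derivation and comparison, not Alternator's actual implementation; the function names and inputs below are made up, and real verification also canonicalizes the request before building the "string to sign":

```python
import hashlib
import hmac

def sigv4_signature(secret_key, date, region, service, string_to_sign):
    """Derive the SigV4 signing key from the user's secret and sign."""
    def sign(key, msg):
        return hmac.new(key, msg.encode(), hashlib.sha256).digest()
    # Chained HMACs, as specified by AWS Signature Version 4.
    k_date = sign(('AWS4' + secret_key).encode(), date)
    k_region = sign(k_date, region)
    k_service = sign(k_region, service)
    k_signing = sign(k_service, 'aws4_request')
    return hmac.new(k_signing, string_to_sign.encode(), hashlib.sha256).hexdigest()

def verify(secret_key, date, region, service, string_to_sign, claimed_signature):
    """Server-side check: recompute the signature and compare."""
    expected = sigv4_signature(secret_key, date, region, service, string_to_sign)
    # Constant-time comparison, as a real server would use.
    return hmac.compare_digest(expected, claimed_signature)
```

A wrong secret key yields a different signature, which is what the server reports as `UnrecognizedClientException`/`InvalidSignatureException`.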

alternator-test/conftest.py

@@ -0,0 +1,179 @@
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# This file contains "test fixtures", a pytest concept described in
# https://docs.pytest.org/en/latest/fixture.html.
# A "fixture" is some sort of setup which an individual test requires to run.
# The fixture has setup code and teardown code, and if multiple tests
# require the same fixture, it can be set up only once - while still allowing
# the user to run individual tests and automatically set up the fixtures they need.
import pytest
import boto3
from util import create_test_table
# Test that the Boto libraries are new enough. These tests want to test a
# large variety of DynamoDB API features, and to do this we need a new-enough
# version of the Boto libraries (boto3 and botocore) so that they can
# access all these API features.
# In particular, the BillingMode feature was added in botocore 1.12.54.
import botocore
import sys
from distutils.version import LooseVersion
if (LooseVersion(botocore.__version__) < LooseVersion('1.12.54')):
    pytest.exit("Your Boto library is too old. Please upgrade it,\ne.g. using:\n    sudo pip{} install --upgrade boto3".format(sys.version_info[0]))
# By default, tests run against a local Scylla installation on localhost:8080/.
# The "--aws" option can be used to run against Amazon DynamoDB in the us-east-1
# region.
def pytest_addoption(parser):
    parser.addoption("--aws", action="store_true",
                     help="run against AWS instead of a local Scylla installation")
    parser.addoption("--https", action="store_true",
                     help="communicate via HTTPS protocol on port 8043 instead of HTTP when"
                          " running against a local Scylla installation")
# "dynamodb" fixture: set up client object for communicating with the DynamoDB
# API. Currently this chooses either Amazon's DynamoDB in the default region
# or a local Alternator installation on http://localhost:8080 - depending on the
# existence of the "--aws" option. In the future we should provide options
# for choosing other Amazon regions or local installations.
# We use scope="session" so that all tests will reuse the same client object.
@pytest.fixture(scope="session")
def dynamodb(request):
if request.config.getoption('aws'):
return boto3.resource('dynamodb')
else:
# Even though we connect to the local installation, Boto3 still
# requires us to specify dummy region and credential parameters,
# otherwise the user is forced to properly configure ~/.aws even
# for local runs.
local_url = 'https://localhost:8043' if request.config.getoption('https') else 'http://localhost:8000'
# Disable verifying in order to be able to use self-signed TLS certificates
verify = not request.config.getoption('https')
# Silencing the 'Unverified HTTPS request warning'
if request.config.getoption('https'):
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
return boto3.resource('dynamodb', endpoint_url=local_url, verify=verify,
region_name='us-east-1', aws_access_key_id='alternator', aws_secret_access_key='secret_pass')
# "test_table" fixture: Create and return a temporary table to be used in tests
# that need a table to work on. The table is automatically deleted at the end.
# We use scope="session" so that all tests will reuse the same client object.
# This "test_table" creates a table which has a specific key schema: both a
# partition key and a sort key, and both are strings. Other fixtures (below)
# can be used to create different types of tables.
#
# TODO: Although we are careful about deleting temporary tables when the
# fixture is torn down, in some cases (e.g., interrupted tests) we can be left
# with some tables not deleted, and they will never be deleted. Because all
# our temporary tables have the same test_table_prefix, we can actually find
# and remove these old tables with this prefix. We can have a fixture, which
# test_table will require, which on teardown will delete all remaining tables
# (possibly from an older run). Because the table's name includes the current
# time, we can also remove just tables older than a particular age. Such
# mechanism will allow running tests in parallel, without the risk of deleting
# a parallel run's temporary tables.
@pytest.fixture(scope="session")
def test_table(dynamodb):
    table = create_test_table(dynamodb,
        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' },
                    { 'AttributeName': 'c', 'KeyType': 'RANGE' }
        ],
        AttributeDefinitions=[
            { 'AttributeName': 'p', 'AttributeType': 'S' },
            { 'AttributeName': 'c', 'AttributeType': 'S' },
        ])
    yield table
    # We get back here when this fixture is torn down. We ask Dynamo to delete
    # this table, but not wait for the deletion to complete. The next time
    # we create a test_table fixture, we'll choose a different table name
    # anyway.
    table.delete()
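The TODO above suggests cleaning up leftover tables by their shared prefix and by the creation time embedded in the name. A small sketch of that selection logic; the `"<prefix><unix-time>_..."` name format is an assumption for illustration only (the real names come from `util.create_test_table`), and the actual deletion via boto3 is left out:

```python
import time

def stale_test_tables(table_names, prefix, max_age_seconds, now=None):
    """Return names that carry our test prefix and are older than max_age_seconds."""
    now = now if now is not None else time.time()
    stale = []
    for name in table_names:
        if not name.startswith(prefix):
            continue  # not one of our temporary tables - never touch it
        try:
            # Hypothetical name format: the prefix is followed by the
            # creation unix time, then more random characters.
            created = float(name[len(prefix):].split('_')[0])
        except ValueError:
            continue  # unexpected suffix; leave the table alone
        if now - created > max_age_seconds:
            stale.append(name)
    return stale
```

Selecting only tables older than a threshold is what would allow parallel test runs to coexist without deleting each other's live tables.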
# The following fixtures test_table_* are similar to test_table but create
# tables with different key schemas.
@pytest.fixture(scope="session")
def test_table_s(dynamodb):
    table = create_test_table(dynamodb,
        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, ],
        AttributeDefinitions=[ { 'AttributeName': 'p', 'AttributeType': 'S' } ])
    yield table
    table.delete()

@pytest.fixture(scope="session")
def test_table_b(dynamodb):
    table = create_test_table(dynamodb,
        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, ],
        AttributeDefinitions=[ { 'AttributeName': 'p', 'AttributeType': 'B' } ])
    yield table
    table.delete()

@pytest.fixture(scope="session")
def test_table_sb(dynamodb):
    table = create_test_table(dynamodb,
        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' } ],
        AttributeDefinitions=[ { 'AttributeName': 'p', 'AttributeType': 'S' }, { 'AttributeName': 'c', 'AttributeType': 'B' } ])
    yield table
    table.delete()

@pytest.fixture(scope="session")
def test_table_sn(dynamodb):
    table = create_test_table(dynamodb,
        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' } ],
        AttributeDefinitions=[ { 'AttributeName': 'p', 'AttributeType': 'S' }, { 'AttributeName': 'c', 'AttributeType': 'N' } ])
    yield table
    table.delete()
# "filled_test_table" fixture: Create a temporary table to be used in tests
# that involve reading data - GetItem, Scan, etc. The table is filled with
# 328 items - each consisting of a partition key, clustering key and two
# string attributes. 164 of the items are in a single partition (with the
# partition key 'long') and the 164 other items are each in a separate
# partition. Finally, a 329th item is added with different attributes.
# This table is supposed to be read from, not updated nor overwritten.
# This fixture returns both a table object and the description of all items
# inserted into it.
@pytest.fixture(scope="session")
def filled_test_table(dynamodb):
    table = create_test_table(dynamodb,
        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' },
                    { 'AttributeName': 'c', 'KeyType': 'RANGE' }
        ],
        AttributeDefinitions=[
            { 'AttributeName': 'p', 'AttributeType': 'S' },
            { 'AttributeName': 'c', 'AttributeType': 'S' },
        ])
    count = 164
    items = [{
        'p': str(i),
        'c': str(i),
        'attribute': "x" * 7,
        'another': "y" * 16
    } for i in range(count)]
    items = items + [{
        'p': 'long',
        'c': str(i),
        'attribute': "x" * (1 + i % 7),
        'another': "y" * (1 + i % 16)
    } for i in range(count)]
    items.append({'p': 'hello', 'c': 'world', 'str': 'and now for something completely different'})
    with table.batch_writer() as batch:
        for item in items:
            batch.put_item(item)
    yield table, items
    table.delete()


@@ -0,0 +1,74 @@
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Tests for authorization
import pytest
import botocore
from botocore.exceptions import ClientError
import boto3
import requests
# Test that trying to perform an operation signed with a wrong key
# will not succeed
def test_wrong_key_access(request, dynamodb):
    print("Please make sure authorization is enforced in your Scylla installation: alternator_enforce_authorization: true")
    url = dynamodb.meta.client._endpoint.host
    with pytest.raises(ClientError, match='UnrecognizedClientException'):
        if url.endswith('.amazonaws.com'):
            boto3.client('dynamodb', endpoint_url=url, aws_access_key_id='wrong_id', aws_secret_access_key='').describe_endpoints()
        else:
            verify = not url.startswith('https')
            boto3.client('dynamodb', endpoint_url=url, region_name='us-east-1', aws_access_key_id='whatever', aws_secret_access_key='', verify=verify).describe_endpoints()

# A similar test, but this time the user is expected to exist in the database (for local tests)
def test_wrong_password(request, dynamodb):
    print("Please make sure authorization is enforced in your Scylla installation: alternator_enforce_authorization: true")
    url = dynamodb.meta.client._endpoint.host
    with pytest.raises(ClientError, match='UnrecognizedClientException'):
        if url.endswith('.amazonaws.com'):
            boto3.client('dynamodb', endpoint_url=url, aws_access_key_id='alternator', aws_secret_access_key='wrong_key').describe_endpoints()
        else:
            verify = not url.startswith('https')
            boto3.client('dynamodb', endpoint_url=url, region_name='us-east-1', aws_access_key_id='alternator', aws_secret_access_key='wrong_key', verify=verify).describe_endpoints()
# A test ensuring that expired signatures are not accepted
def test_expired_signature(dynamodb, test_table):
    url = dynamodb.meta.client._endpoint.host
    print(url)
    headers = {'Content-Type': 'application/x-amz-json-1.0',
               'X-Amz-Date': '20170101T010101Z',
               'X-Amz-Target': 'DynamoDB_20120810.DescribeEndpoints',
               'Authorization': 'AWS4-HMAC-SHA256 Credential=alternator/2/3/4/aws4_request SignedHeaders=x-amz-date;host Signature=123'
               }
    response = requests.post(url, headers=headers, verify=False)
    assert not response.ok
    assert "InvalidSignatureException" in response.text and "Signature expired" in response.text
# A test ensuring that signatures that exceed current time too much are not accepted.
# Watch out - this test is valid only for around the next 1000 years; it will need to be updated later.
def test_signature_too_futuristic(dynamodb, test_table):
    url = dynamodb.meta.client._endpoint.host
    print(url)
    headers = {'Content-Type': 'application/x-amz-json-1.0',
               'X-Amz-Date': '30200101T010101Z',
               'X-Amz-Target': 'DynamoDB_20120810.DescribeEndpoints',
               'Authorization': 'AWS4-HMAC-SHA256 Credential=alternator/2/3/4/aws4_request SignedHeaders=x-amz-date;host Signature=123'
               }
    response = requests.post(url, headers=headers, verify=False)
    assert not response.ok
    assert "InvalidSignatureException" in response.text and "Signature not yet current" in response.text
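The two tests above exercise both ends of the same server-side rule: an `X-Amz-Date` too far in the past or the future is rejected. A sketch of such a check; the 15-minute window is an assumption (it matches AWS's documented SigV4 default), not necessarily Alternator's exact limit:

```python
from datetime import datetime, timedelta

def signature_date_acceptable(amz_date, now, max_skew=timedelta(minutes=15)):
    """amz_date is in the X-Amz-Date format, e.g. '20170101T010101Z'.

    Reject requests whose timestamp deviates from the server clock by
    more than max_skew - in either direction.
    """
    ts = datetime.strptime(amz_date, '%Y%m%dT%H%M%SZ')
    return abs(now - ts) <= max_skew
```

A past date fails the check as "Signature expired", while a future one fails as "Signature not yet current".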


@@ -0,0 +1,253 @@
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Tests for batch operations - BatchWriteItem, BatchReadItem.
# Note that various other tests in other files also use these operations,
# so they are actually tested by other tests as well.
import pytest
from botocore.exceptions import ClientError
from util import random_string, full_scan, full_query, multiset
# Test ensuring that items inserted by a batched statement can be properly extracted
# via GetItem. Schema has both hash and sort keys.
def test_basic_batch_write_item(test_table):
    count = 7
    with test_table.batch_writer() as batch:
        for i in range(count):
            batch.put_item(Item={
                'p': "batch{}".format(i),
                'c': "batch_ck{}".format(i),
                'attribute': str(i),
                'another': 'xyz'
            })
    for i in range(count):
        item = test_table.get_item(Key={'p': "batch{}".format(i), 'c': "batch_ck{}".format(i)}, ConsistentRead=True)['Item']
        assert item['p'] == "batch{}".format(i)
        assert item['c'] == "batch_ck{}".format(i)
        assert item['attribute'] == str(i)
        assert item['another'] == 'xyz'

# Test batch write to a table with only a hash key
def test_batch_write_hash_only(test_table_s):
    items = [{'p': random_string(), 'val': random_string()} for i in range(10)]
    with test_table_s.batch_writer() as batch:
        for item in items:
            batch.put_item(item)
    for item in items:
        assert test_table_s.get_item(Key={'p': item['p']}, ConsistentRead=True)['Item'] == item

# Test batch delete operation (DeleteRequest): We create a bunch of items, and
# then delete them all.
def test_batch_write_delete(test_table_s):
    items = [{'p': random_string(), 'val': random_string()} for i in range(10)]
    with test_table_s.batch_writer() as batch:
        for item in items:
            batch.put_item(item)
    for item in items:
        assert test_table_s.get_item(Key={'p': item['p']}, ConsistentRead=True)['Item'] == item
    with test_table_s.batch_writer() as batch:
        for item in items:
            batch.delete_item(Key={'p': item['p']})
    # Verify that all items are now missing:
    for item in items:
        assert not 'Item' in test_table_s.get_item(Key={'p': item['p']}, ConsistentRead=True)

# Test the same batch including both writes and delete. Should be fine.
def test_batch_write_and_delete(test_table_s):
    p1 = random_string()
    p2 = random_string()
    test_table_s.put_item(Item={'p': p1})
    assert 'Item' in test_table_s.get_item(Key={'p': p1}, ConsistentRead=True)
    assert not 'Item' in test_table_s.get_item(Key={'p': p2}, ConsistentRead=True)
    with test_table_s.batch_writer() as batch:
        batch.put_item({'p': p2})
        batch.delete_item(Key={'p': p1})
    assert not 'Item' in test_table_s.get_item(Key={'p': p1}, ConsistentRead=True)
    assert 'Item' in test_table_s.get_item(Key={'p': p2}, ConsistentRead=True)
# It is forbidden to update the same key twice in the same batch.
# DynamoDB says "Provided list of item keys contains duplicates".
def test_batch_write_duplicate_write(test_table_s, test_table):
    p = random_string()
    with pytest.raises(ClientError, match='ValidationException.*duplicates'):
        with test_table_s.batch_writer() as batch:
            batch.put_item({'p': p})
            batch.put_item({'p': p})
    c = random_string()
    with pytest.raises(ClientError, match='ValidationException.*duplicates'):
        with test_table.batch_writer() as batch:
            batch.put_item({'p': p, 'c': c})
            batch.put_item({'p': p, 'c': c})
    # But it is fine to touch items with one component the same, but the other not.
    other = random_string()
    with test_table.batch_writer() as batch:
        batch.put_item({'p': p, 'c': c})
        batch.put_item({'p': p, 'c': other})
        batch.put_item({'p': other, 'c': c})

def test_batch_write_duplicate_delete(test_table_s, test_table):
    p = random_string()
    with pytest.raises(ClientError, match='ValidationException.*duplicates'):
        with test_table_s.batch_writer() as batch:
            batch.delete_item(Key={'p': p})
            batch.delete_item(Key={'p': p})
    c = random_string()
    with pytest.raises(ClientError, match='ValidationException.*duplicates'):
        with test_table.batch_writer() as batch:
            batch.delete_item(Key={'p': p, 'c': c})
            batch.delete_item(Key={'p': p, 'c': c})
    # But it is fine to touch items with one component the same, but the other not.
    other = random_string()
    with test_table.batch_writer() as batch:
        batch.delete_item(Key={'p': p, 'c': c})
        batch.delete_item(Key={'p': p, 'c': other})
        batch.delete_item(Key={'p': other, 'c': c})

def test_batch_write_duplicate_write_and_delete(test_table_s, test_table):
    p = random_string()
    with pytest.raises(ClientError, match='ValidationException.*duplicates'):
        with test_table_s.batch_writer() as batch:
            batch.delete_item(Key={'p': p})
            batch.put_item({'p': p})
    c = random_string()
    with pytest.raises(ClientError, match='ValidationException.*duplicates'):
        with test_table.batch_writer() as batch:
            batch.delete_item(Key={'p': p, 'c': c})
            batch.put_item({'p': p, 'c': c})
    # But it is fine to touch items with one component the same, but the other not.
    other = random_string()
    with test_table.batch_writer() as batch:
        batch.delete_item(Key={'p': p, 'c': c})
        batch.put_item({'p': p, 'c': other})
        batch.put_item({'p': other, 'c': c})
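The three tests above all hit the same rule: one batch may touch each full key at most once ("Provided list of item keys contains duplicates"). A sketch of how a server could detect this up front, before executing any request; the function and its input shape are illustrative, not Alternator's internals:

```python
def batch_has_duplicates(key_list):
    """key_list: iterable of key dicts, e.g. [{'p': 'a'}, {'p': 'a', 'c': 'b'}].

    Returns True if the same full key appears more than once, which a
    DynamoDB-style server rejects with a ValidationException.
    """
    seen = set()
    for key in key_list:
        # Freeze the dict into a hashable, order-independent form.
        frozen = tuple(sorted(key.items()))
        if frozen in seen:
            return True
        seen.add(frozen)
    return False
```

Keys that share only one component (same `p`, different `c`) freeze to different tuples, matching the "one component the same, but the other not" cases that the tests expect to succeed.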
# Test that BatchWriteItem's PutRequest completely replaces an existing item.
# It shouldn't merge it with a previously existing value. See also the same
# test for PutItem - test_put_item_replace().
def test_batch_put_item_replace(test_table_s, test_table):
    p = random_string()
    with test_table_s.batch_writer() as batch:
        batch.put_item(Item={'p': p, 'a': 'hi'})
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hi'}
    with test_table_s.batch_writer() as batch:
        batch.put_item(Item={'p': p, 'b': 'hello'})
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 'hello'}
    c = random_string()
    with test_table.batch_writer() as batch:
        batch.put_item(Item={'p': p, 'c': c, 'a': 'hi'})
    assert test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item'] == {'p': p, 'c': c, 'a': 'hi'}
    with test_table.batch_writer() as batch:
        batch.put_item(Item={'p': p, 'c': c, 'b': 'hello'})
    assert test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item'] == {'p': p, 'c': c, 'b': 'hello'}
# Test that if one of the batch's operations is invalid, because a key
# column is missing or has the wrong type, the entire batch is rejected
# before any write is done.
def test_batch_write_invalid_operation(test_table_s):
# test key attribute with wrong type:
p1 = random_string()
p2 = random_string()
items = [{'p': p1}, {'p': 3}, {'p': p2}]
with pytest.raises(ClientError, match='ValidationException'):
with test_table_s.batch_writer() as batch:
for item in items:
batch.put_item(item)
for p in [p1, p2]:
        assert not 'Item' in test_table_s.get_item(Key={'p': p}, ConsistentRead=True)
# test missing key attribute:
p1 = random_string()
p2 = random_string()
items = [{'p': p1}, {'x': 'whatever'}, {'p': p2}]
with pytest.raises(ClientError, match='ValidationException'):
with test_table_s.batch_writer() as batch:
for item in items:
batch.put_item(item)
for p in [p1, p2]:
        assert not 'Item' in test_table_s.get_item(Key={'p': p}, ConsistentRead=True)
# Basic test for BatchGetItem, reading several entire items.
# Schema has both hash and sort keys.
def test_batch_get_item(test_table):
items = [{'p': random_string(), 'c': random_string(), 'val': random_string()} for i in range(10)]
with test_table.batch_writer() as batch:
for item in items:
batch.put_item(item)
keys = [{k: x[k] for k in ('p', 'c')} for x in items]
# We use the low-level batch_get_item API for lack of a more convenient
# API. At least it spares us the need to encode the key's types...
reply = test_table.meta.client.batch_get_item(RequestItems = {test_table.name: {'Keys': keys, 'ConsistentRead': True}})
print(reply)
got_items = reply['Responses'][test_table.name]
assert multiset(got_items) == multiset(items)
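The comment above alludes to the low-level client API requiring explicitly typed attribute values, unlike the resource-level API used elsewhere in these tests. A hedged sketch of what that encoding looks like (the converter below is illustrative and covers only the types these tests use; boto3 ships a real one in `boto3.dynamodb.types.TypeSerializer`):

```python
from decimal import Decimal

# Illustrative encoder for DynamoDB's typed JSON:
# {'p': 'x'} -> {'p': {'S': 'x'}}, {'n': 3} -> {'n': {'N': '3'}}.
def to_dynamodb_json(item):
    def encode(v):
        if isinstance(v, str):
            return {'S': v}
        if isinstance(v, (int, Decimal)):
            return {'N': str(v)}
        if isinstance(v, bytes):
            return {'B': v}
        raise TypeError('unsupported type: {}'.format(type(v)))
    return {k: encode(v) for k, v in item.items()}

assert to_dynamodb_json({'p': 'dog', 'c': 'cat'}) == {'p': {'S': 'dog'}, 'c': {'S': 'cat'}}
assert to_dynamodb_json({'p': 'x', 'n': 3}) == {'p': {'S': 'x'}, 'n': {'N': '3'}}
```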
# Same, but with a schema that has just a hash key.
def test_batch_get_item_hash(test_table_s):
items = [{'p': random_string(), 'val': random_string()} for i in range(10)]
with test_table_s.batch_writer() as batch:
for item in items:
batch.put_item(item)
    keys = [{k: x[k] for k in ('p',)} for x in items]
reply = test_table_s.meta.client.batch_get_item(RequestItems = {test_table_s.name: {'Keys': keys, 'ConsistentRead': True}})
got_items = reply['Responses'][test_table_s.name]
assert multiset(got_items) == multiset(items)
# Test what we get if we try to read two *missing* values in addition to
# an existing one. It turns out the missing items are simply not returned,
# with no sign they are missing.
def test_batch_get_item_missing(test_table_s):
    p = random_string()
test_table_s.put_item(Item={'p': p})
reply = test_table_s.meta.client.batch_get_item(RequestItems = {test_table_s.name: {'Keys': [{'p': random_string()}, {'p': random_string()}, {'p': p}], 'ConsistentRead': True}})
got_items = reply['Responses'][test_table_s.name]
assert got_items == [{'p' : p}]
# If all the keys requested from a particular table are missing, we still
# get a response array for that table - it's just empty.
def test_batch_get_item_completely_missing(test_table_s):
reply = test_table_s.meta.client.batch_get_item(RequestItems = {test_table_s.name: {'Keys': [{'p': random_string()}], 'ConsistentRead': True}})
got_items = reply['Responses'][test_table_s.name]
assert got_items == []
# Test BatchGetItem with AttributesToGet
def test_batch_get_item_attributes_to_get(test_table):
items = [{'p': random_string(), 'c': random_string(), 'val1': random_string(), 'val2': random_string()} for i in range(10)]
with test_table.batch_writer() as batch:
for item in items:
batch.put_item(item)
keys = [{k: x[k] for k in ('p', 'c')} for x in items]
for wanted in [['p'], ['p', 'c'], ['val1'], ['p', 'val2']]:
reply = test_table.meta.client.batch_get_item(RequestItems = {test_table.name: {'Keys': keys, 'AttributesToGet': wanted, 'ConsistentRead': True}})
got_items = reply['Responses'][test_table.name]
expected_items = [{k: item[k] for k in wanted if k in item} for item in items]
assert multiset(got_items) == multiset(expected_items)
# Test BatchGetItem with ProjectionExpression (just a simple one, with
# top-level attributes)
def test_batch_get_item_projection_expression(test_table):
items = [{'p': random_string(), 'c': random_string(), 'val1': random_string(), 'val2': random_string()} for i in range(10)]
with test_table.batch_writer() as batch:
for item in items:
batch.put_item(item)
keys = [{k: x[k] for k in ('p', 'c')} for x in items]
for wanted in [['p'], ['p', 'c'], ['val1'], ['p', 'val2']]:
reply = test_table.meta.client.batch_get_item(RequestItems = {test_table.name: {'Keys': keys, 'ProjectionExpression': ",".join(wanted), 'ConsistentRead': True}})
got_items = reply['Responses'][test_table.name]
expected_items = [{k: item[k] for k in wanted if k in item} for item in items]
assert multiset(got_items) == multiset(expected_items)
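Many of the assertions in this file compare lists of items order-insensitively via `util.multiset()`. A minimal sketch of such a helper, assuming it simply counts occurrences (dicts are unhashable, so each one is frozen into a canonical tuple first):

```python
from collections import Counter

# Order-insensitive comparison of lists of dicts: freeze each dict into a
# hashable canonical form, then count occurrences. A sketch of what a
# multiset() helper like the one in util.py might do.
def multiset(items):
    return Counter(tuple(sorted(d.items())) for d in items)

a = [{'p': 'x', 'c': 'y'}, {'p': 'z'}]
b = [{'p': 'z'}, {'c': 'y', 'p': 'x'}]
assert multiset(a) == multiset(b)
assert multiset(a) != multiset(a + [{'p': 'z'}])  # counts matter, not just membership
```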

File diff suppressed because it is too large


@@ -0,0 +1,49 @@
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Test for the DescribeEndpoints operation
import boto3
# Test that the DescribeEndpoints operation works as expected: that it
# returns one endpoint (it may return more, but Amazon never does),
# and this endpoint can be used to make more requests.
def test_describe_endpoints(request, dynamodb):
endpoints = dynamodb.meta.client.describe_endpoints()['Endpoints']
# It is not strictly necessary that only a single endpoint be returned,
# but this is what Amazon DynamoDB does today (and so does Alternator).
assert len(endpoints) == 1
for endpoint in endpoints:
assert 'CachePeriodInMinutes' in endpoint.keys()
address = endpoint['Address']
# Check that the address is a valid endpoint by checking that we can
# send it another describe_endpoints() request ;-) Note that the
# address does not include the "http://" or "https://" prefix, and
# we need to choose one manually.
prefix = "https://" if request.config.getoption('https') else "http://"
verify = not request.config.getoption('https')
url = prefix + address
if address.endswith('.amazonaws.com'):
boto3.client('dynamodb',endpoint_url=url, verify=verify).describe_endpoints()
else:
# Even though we connect to the local installation, Boto3 still
# requires us to specify dummy region and credential parameters,
# otherwise the user is forced to properly configure ~/.aws even
# for local runs.
boto3.client('dynamodb',endpoint_url=url, region_name='us-east-1', aws_access_key_id='alternator', aws_secret_access_key='secret_pass', verify=verify).describe_endpoints()
# Nothing to check here - if the above call failed with an exception,
# the test would fail.


@@ -0,0 +1,169 @@
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Tests for the DescribeTable operation.
# Some attributes used only by a specific major feature will be tested
# elsewhere:
# 1. Tests for describing tables with global or local secondary indexes
# (the GlobalSecondaryIndexes and LocalSecondaryIndexes attributes)
# are in test_gsi.py and test_lsi.py.
# 2. Tests for the stream feature (LatestStreamArn, LatestStreamLabel,
# StreamSpecification) will be in the tests devoted to the stream
# feature.
# 3. Tests for describing a restored table (RestoreSummary, TableId)
# will be together with tests devoted to the backup/restore feature.
import pytest
from botocore.exceptions import ClientError
import re
import time
from util import multiset
# Test that DescribeTable correctly returns the table's name and state
def test_describe_table_basic(test_table):
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
assert got['TableName'] == test_table.name
assert got['TableStatus'] == 'ACTIVE'
# Test that DescribeTable correctly returns the table's schema, in
# AttributeDefinitions and KeySchema attributes
def test_describe_table_schema(test_table):
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
expected = { # Copied from test_table()'s fixture
'KeySchema': [ { 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'c', 'KeyType': 'RANGE' }
],
'AttributeDefinitions': [
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'c', 'AttributeType': 'S' },
]
}
assert got['KeySchema'] == expected['KeySchema']
# The list of attribute definitions may be arbitrarily reordered
assert multiset(got['AttributeDefinitions']) == multiset(expected['AttributeDefinitions'])
# Test that DescribeTable correctly returns the table's billing mode,
# in the BillingModeSummary attribute.
def test_describe_table_billing(test_table):
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
assert got['BillingModeSummary']['BillingMode'] == 'PAY_PER_REQUEST'
# The BillingModeSummary should also contain a
# LastUpdateToPayPerRequestDateTime attribute, which is a date.
# We don't know what date this is supposed to be, but something we
# do know is that the test table was created already with this billing
# mode, so the table creation date should be the same as the billing
# mode setting date.
assert 'LastUpdateToPayPerRequestDateTime' in got['BillingModeSummary']
assert got['BillingModeSummary']['LastUpdateToPayPerRequestDateTime'] == got['CreationDateTime']
# Test that DescribeTable correctly returns the table's creation time.
# We don't know what this creation time is supposed to be, so this test
# cannot be very thorough... We currently just test against something we
# know to be wrong - returning the *current* time, which changes on every
# call.
@pytest.mark.xfail(reason="DescribeTable does not return table creation time")
def test_describe_table_creation_time(test_table):
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
assert 'CreationDateTime' in got
time1 = got['CreationDateTime']
time.sleep(1)
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
time2 = got['CreationDateTime']
assert time1 == time2
# Test that DescribeTable returns the table's estimated item count
# in the ItemCount attribute. Unfortunately, there's not much we can
# really test here... The documentation says that the count can be
# delayed by six hours, so the number we get here may have no relation
# to the current number of items in the test table. The attribute should exist,
# though. This test does NOT verify that ItemCount isn't always returned as
# zero - such a stub implementation would pass this test.
@pytest.mark.xfail(reason="DescribeTable does not return table item count")
def test_describe_table_item_count(test_table):
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
assert 'ItemCount' in got
# Similar test for estimated size in bytes - TableSizeBytes - which again,
# may reflect the size as long as six hours ago.
@pytest.mark.xfail(reason="DescribeTable does not return table size")
def test_describe_table_size(test_table):
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
assert 'TableSizeBytes' in got
# Test the ProvisionedThroughput attribute returned by DescribeTable.
# This is a very partial test: Our test table is configured without
# provisioned throughput, so obviously it will not have interesting settings
# for it. DynamoDB returns zeros for some of the attributes, even though
# the documentation suggests missing values should have been fine too.
@pytest.mark.xfail(reason="DescribeTable does not return provisioned throughput")
def test_describe_table_provisioned_throughput(test_table):
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
assert got['ProvisionedThroughput']['NumberOfDecreasesToday'] == 0
assert got['ProvisionedThroughput']['WriteCapacityUnits'] == 0
assert got['ProvisionedThroughput']['ReadCapacityUnits'] == 0
# This is a silly test for the RestoreSummary attribute in DescribeTable -
# it should not exist in a table not created by a restore. When testing
# the backup/restore feature, we will have more meaningful tests for the
# value of this attribute in that case.
def test_describe_table_restore_summary(test_table):
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
assert not 'RestoreSummary' in got
# This is a silly test for the SSEDescription attribute in DescribeTable -
# by default, a table is encrypted with AWS-owned keys, not using client-
# owned keys, and the SSEDescription attribute is not returned at all.
def test_describe_table_encryption(test_table):
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
assert not 'SSEDescription' in got
# This is a silly test for the StreamSpecification attribute in DescribeTable -
# when there are no streams, this attribute should be missing.
def test_describe_table_stream_specification(test_table):
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
assert not 'StreamSpecification' in got
# Test that the table has an ARN, a unique identifier for the table which
# includes which zone it is on, which account, and of course the table's
# name. The ARN format is described in
# https://docs.aws.amazon.com/general/latest/gr/aws-arns-and-namespaces.html#genref-arns
@pytest.mark.xfail(reason="DescribeTable does not return ARN")
def test_describe_table_arn(test_table):
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
assert 'TableArn' in got and got['TableArn'].startswith('arn:')
# Test that the table has a TableId.
# TODO: Figure out what this TableId is supposed to be: is it just a
# unique id that is created with the table and never changes, or
# something else?
@pytest.mark.xfail(reason="DescribeTable does not return TableId")
def test_describe_table_id(test_table):
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
assert 'TableId' in got
# DescribeTable error path: trying to describe a non-existent table should
# result in a ResourceNotFoundException.
def test_describe_table_non_existent_table(dynamodb):
with pytest.raises(ClientError, match='ResourceNotFoundException') as einfo:
dynamodb.meta.client.describe_table(TableName='non_existent_table')
# As one of the first error-path tests that we wrote, let's test in more
# detail that the error reply has the appropriate fields:
response = einfo.value.response
print(response)
err = response['Error']
assert err['Code'] == 'ResourceNotFoundException'
    assert re.match('Requested resource not found: Table: non_existent_table not found', err['Message'])
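The error details inspected above live in a plain nested dict that botocore attaches to the `ClientError` exception. A stand-in response (values illustrative, matching the shape the assertions read) makes the structure explicit:

```python
# Shape of a botocore error response, as consulted via einfo.value.response:
# the 'Error' sub-dict carries the machine-readable code and the message.
response = {
    'Error': {
        'Code': 'ResourceNotFoundException',
        'Message': 'Requested resource not found: Table: non_existent_table not found',
    },
    'ResponseMetadata': {'HTTPStatusCode': 400},
}
err = response['Error']
assert err['Code'] == 'ResourceNotFoundException'
assert err['Message'].startswith('Requested resource not found')
```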

File diff suppressed because it is too large

alternator-test/test_gsi.py

@@ -0,0 +1,874 @@
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Tests of GSI (Global Secondary Indexes)
#
# Note that many of these tests are slower than usual, because many of them
# need to create new tables and/or new GSIs of different types, operations
# which are extremely slow in DynamoDB, often taking minutes (!).
import pytest
import time
from botocore.exceptions import ClientError, ParamValidationError
from util import create_test_table, random_string, full_scan, full_query, multiset, list_tables
# GSIs only support eventually consistent reads, so tests that involve
# writing to a table and then expect to read something from it cannot be
# guaranteed to succeed without retrying the read. The following utility
# functions make it easy to write such tests.
# Note that in practice, these repeated reads are almost never necessary:
# Amazon claims that "Changes to the table data are propagated to the global
# secondary indexes within a fraction of a second, under normal conditions"
# and indeed, in practice, the tests here almost always succeed without a
# retry.
def assert_index_query(table, index_name, expected_items, **kwargs):
for i in range(3):
if multiset(expected_items) == multiset(full_query(table, IndexName=index_name, **kwargs)):
return
print('assert_index_query retrying')
time.sleep(1)
assert multiset(expected_items) == multiset(full_query(table, IndexName=index_name, **kwargs))
def assert_index_scan(table, index_name, expected_items, **kwargs):
for i in range(3):
if multiset(expected_items) == multiset(full_scan(table, IndexName=index_name, **kwargs)):
return
print('assert_index_scan retrying')
time.sleep(1)
assert multiset(expected_items) == multiset(full_scan(table, IndexName=index_name, **kwargs))
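The two `assert_index_*` helpers above share one retry-until-consistent pattern. It can be factored into a generic helper; this is a sketch of that pattern, not a function that exists in `util.py`:

```python
import time

# Generic eventual-consistency helper: retry `predicate` a few times with a
# delay, then assert it one final time so a failure surfaces as a normal
# assertion error with a traceback.
def assert_eventually(predicate, attempts=3, delay=1):
    for _ in range(attempts):
        if predicate():
            return
        time.sleep(delay)
    assert predicate()

# Usage with a condition that becomes true on the second call:
state = {'calls': 0}
def becomes_true():
    state['calls'] += 1
    return state['calls'] >= 2
assert_eventually(becomes_true, attempts=3, delay=0)
assert state['calls'] == 2
```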
# Although quite silly, it is actually allowed to create an index which is
# identical to the base table.
def test_gsi_identical(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }],
AttributeDefinitions=[{ 'AttributeName': 'p', 'AttributeType': 'S' }],
GlobalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [{ 'AttributeName': 'p', 'KeyType': 'HASH' }],
'Projection': { 'ProjectionType': 'ALL' }
}
])
items = [{'p': random_string(), 'x': random_string()} for i in range(10)]
with table.batch_writer() as batch:
for item in items:
batch.put_item(item)
# Scanning the entire table directly or via the index yields the same
# results (in different order).
assert multiset(items) == multiset(full_scan(table))
assert_index_scan(table, 'hello', items)
# We can't scan a non-existent index
with pytest.raises(ClientError, match='ValidationException'):
full_scan(table, IndexName='wrong')
table.delete()
# One of the simplest forms of a non-trivial GSI: The base table has a hash
# and sort key, and the index reverses those roles. Other attributes are just
# copied.
@pytest.fixture(scope="session")
def test_table_gsi_1(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'c', 'KeyType': 'RANGE' }
],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'c', 'AttributeType': 'S' },
],
GlobalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [
{ 'AttributeName': 'c', 'KeyType': 'HASH' },
{ 'AttributeName': 'p', 'KeyType': 'RANGE' },
],
'Projection': { 'ProjectionType': 'ALL' }
}
],
)
yield table
table.delete()
def test_gsi_simple(test_table_gsi_1):
items = [{'p': random_string(), 'c': random_string(), 'x': random_string()} for i in range(10)]
with test_table_gsi_1.batch_writer() as batch:
for item in items:
batch.put_item(item)
c = items[0]['c']
# The index allows a query on just a specific sort key, which isn't
# allowed on the base table.
with pytest.raises(ClientError, match='ValidationException'):
full_query(test_table_gsi_1, KeyConditions={'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}})
expected_items = [x for x in items if x['c'] == c]
assert_index_query(test_table_gsi_1, 'hello', expected_items,
KeyConditions={'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}})
# Scanning the entire table directly or via the index yields the same
# results (in different order).
assert_index_scan(test_table_gsi_1, 'hello', full_scan(test_table_gsi_1))
def test_gsi_same_key(test_table_gsi_1):
    c = random_string()
# All these items have the same sort key 'c' but different hash key 'p'
items = [{'p': random_string(), 'c': c, 'x': random_string()} for i in range(10)]
with test_table_gsi_1.batch_writer() as batch:
for item in items:
batch.put_item(item)
assert_index_query(test_table_gsi_1, 'hello', items,
KeyConditions={'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}})
# Check we get an appropriate error when trying to read a non-existing index
# of an existing table. Although the documentation specifies that a
# ResourceNotFoundException should be returned if "The operation tried to
# access a nonexistent table or index", in fact in the specific case that
# the table does exist but an index does not - we get a ValidationException.
def test_gsi_missing_index(test_table_gsi_1):
with pytest.raises(ClientError, match='ValidationException.*wrong_name'):
full_query(test_table_gsi_1, IndexName='wrong_name',
KeyConditions={'x': {'AttributeValueList': [1], 'ComparisonOperator': 'EQ'}})
with pytest.raises(ClientError, match='ValidationException.*wrong_name'):
full_scan(test_table_gsi_1, IndexName='wrong_name')
# Nevertheless, if the table itself does not exist, a query should return
# a ResourceNotFoundException, not ValidationException:
def test_gsi_missing_table(dynamodb):
with pytest.raises(ClientError, match='ResourceNotFoundException'):
dynamodb.meta.client.query(TableName='nonexistent_table', IndexName='any_name', KeyConditions={'x': {'AttributeValueList': [1], 'ComparisonOperator': 'EQ'}})
with pytest.raises(ClientError, match='ResourceNotFoundException'):
dynamodb.meta.client.scan(TableName='nonexistent_table', IndexName='any_name')
# Verify that strongly-consistent reads on GSI are *not* allowed.
@pytest.mark.xfail(reason="GSI strong consistency not checked")
def test_gsi_strong_consistency(test_table_gsi_1):
with pytest.raises(ClientError, match='ValidationException.*Consistent'):
full_query(test_table_gsi_1, KeyConditions={'c': {'AttributeValueList': ['hi'], 'ComparisonOperator': 'EQ'}}, IndexName='hello', ConsistentRead=True)
with pytest.raises(ClientError, match='ValidationException.*Consistent'):
full_scan(test_table_gsi_1, IndexName='hello', ConsistentRead=True)
# Verify that a GSI is correctly listed in describe_table
@pytest.mark.xfail(reason="DescribeTable provides index names only, no size or item count")
def test_gsi_describe(test_table_gsi_1):
desc = test_table_gsi_1.meta.client.describe_table(TableName=test_table_gsi_1.name)
assert 'Table' in desc
assert 'GlobalSecondaryIndexes' in desc['Table']
gsis = desc['Table']['GlobalSecondaryIndexes']
assert len(gsis) == 1
gsi = gsis[0]
assert gsi['IndexName'] == 'hello'
assert 'IndexSizeBytes' in gsi # actual size depends on content
assert 'ItemCount' in gsi
assert gsi['Projection'] == {'ProjectionType': 'ALL'}
assert gsi['IndexStatus'] == 'ACTIVE'
assert gsi['KeySchema'] == [{'KeyType': 'HASH', 'AttributeName': 'c'},
{'KeyType': 'RANGE', 'AttributeName': 'p'}]
# TODO: check also ProvisionedThroughput, IndexArn
# When a GSI's key includes an attribute not in the base table's key, we
# need to remember to add its type to AttributeDefinitions.
def test_gsi_missing_attribute_definition(dynamodb):
with pytest.raises(ClientError, match='ValidationException.*AttributeDefinitions'):
create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
AttributeDefinitions=[ { 'AttributeName': 'p', 'AttributeType': 'S' } ],
GlobalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [ { 'AttributeName': 'c', 'KeyType': 'HASH' } ],
'Projection': { 'ProjectionType': 'ALL' }
}
])
# test_table_gsi_1_hash_only is a variant of test_table_gsi_1: It's another
# case where the index doesn't involve non-key attributes. Again the base
# table has a hash and sort key, but in this case the index has *only* a
# hash key (which is the base's hash key). In the materialized-view-based
# implementation, we need to remember the other part of the base key as a
# clustering key.
@pytest.fixture(scope="session")
def test_table_gsi_1_hash_only(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'c', 'KeyType': 'RANGE' }
],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'c', 'AttributeType': 'S' },
],
GlobalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [
{ 'AttributeName': 'c', 'KeyType': 'HASH' },
],
'Projection': { 'ProjectionType': 'ALL' }
}
],
)
yield table
table.delete()
def test_gsi_key_not_in_index(test_table_gsi_1_hash_only):
# Test with items with different 'c' values:
items = [{'p': random_string(), 'c': random_string(), 'x': random_string()} for i in range(10)]
with test_table_gsi_1_hash_only.batch_writer() as batch:
for item in items:
batch.put_item(item)
c = items[0]['c']
expected_items = [x for x in items if x['c'] == c]
assert_index_query(test_table_gsi_1_hash_only, 'hello', expected_items,
KeyConditions={'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}})
# Test items with the same sort key 'c' but different hash key 'p'
    c = random_string()
items = [{'p': random_string(), 'c': c, 'x': random_string()} for i in range(10)]
with test_table_gsi_1_hash_only.batch_writer() as batch:
for item in items:
batch.put_item(item)
assert_index_query(test_table_gsi_1_hash_only, 'hello', items,
KeyConditions={'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}})
# Scanning the entire table directly or via the index yields the same
# results (in different order).
assert_index_scan(test_table_gsi_1_hash_only, 'hello', full_scan(test_table_gsi_1_hash_only))
# A second scenario of GSI. Base table has just hash key, Index has a
# different hash key - one of the non-key attributes from the base table.
@pytest.fixture(scope="session")
def test_table_gsi_2(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'x', 'AttributeType': 'S' },
],
GlobalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [
{ 'AttributeName': 'x', 'KeyType': 'HASH' },
],
'Projection': { 'ProjectionType': 'ALL' }
}
])
yield table
table.delete()
def test_gsi_2(test_table_gsi_2):
items1 = [{'p': random_string(), 'x': random_string()} for i in range(10)]
x1 = items1[0]['x']
x2 = random_string()
items2 = [{'p': random_string(), 'x': x2} for i in range(10)]
items = items1 + items2
with test_table_gsi_2.batch_writer() as batch:
for item in items:
batch.put_item(item)
expected_items = [i for i in items if i['x'] == x1]
assert_index_query(test_table_gsi_2, 'hello', expected_items,
KeyConditions={'x': {'AttributeValueList': [x1], 'ComparisonOperator': 'EQ'}})
expected_items = [i for i in items if i['x'] == x2]
assert_index_query(test_table_gsi_2, 'hello', expected_items,
KeyConditions={'x': {'AttributeValueList': [x2], 'ComparisonOperator': 'EQ'}})
# Test that when a table has a GSI, if the indexed attribute is missing, the
# item is added to the base table but not the index.
def test_gsi_missing_attribute(test_table_gsi_2):
p1 = random_string()
x1 = random_string()
test_table_gsi_2.put_item(Item={'p': p1, 'x': x1})
p2 = random_string()
test_table_gsi_2.put_item(Item={'p': p2})
# Both items are now in the base table:
assert test_table_gsi_2.get_item(Key={'p': p1})['Item'] == {'p': p1, 'x': x1}
assert test_table_gsi_2.get_item(Key={'p': p2})['Item'] == {'p': p2}
# But only the first item is in the index: It can be found using a
# Query, and a scan of the index won't find it (but a scan on the base
# will).
assert_index_query(test_table_gsi_2, 'hello', [{'p': p1, 'x': x1}],
KeyConditions={'x': {'AttributeValueList': [x1], 'ComparisonOperator': 'EQ'}})
assert any([i['p'] == p1 for i in full_scan(test_table_gsi_2)])
# Note: with eventually consistent read, we can't really be sure that
# and item will "never" appear in the index. We do this test last,
# so if we had a bug and such item did appear, hopefully we had enough
# time for the bug to become visible. At least sometimes.
assert not any([i['p'] == p2 for i in full_scan(test_table_gsi_2, IndexName='hello')])
# Test when a table has a GSI, if the indexed attribute has the wrong type,
# the update operation is rejected, and is added to neither base table nor
# index. This is different from the case of a *missing* attribute, where
# the item is added to the base table but not index.
# The following three tests test_gsi_wrong_type_attribute_{put,update,batch}
# test updates using PutItem, UpdateItem, and BatchWriteItem respectively.
def test_gsi_wrong_type_attribute_put(test_table_gsi_2):
# PutItem with wrong type for 'x' is rejected, item isn't created even
# in the base table.
p = random_string()
with pytest.raises(ClientError, match='ValidationException.*mismatch'):
test_table_gsi_2.put_item(Item={'p': p, 'x': 3})
assert not 'Item' in test_table_gsi_2.get_item(Key={'p': p}, ConsistentRead=True)
def test_gsi_wrong_type_attribute_update(test_table_gsi_2):
# An UpdateItem with wrong type for 'x' is also rejected, but naturally
# if the item already existed, it remains as it was.
p = random_string()
x = random_string()
test_table_gsi_2.put_item(Item={'p': p, 'x': x})
with pytest.raises(ClientError, match='ValidationException.*mismatch'):
test_table_gsi_2.update_item(Key={'p': p}, AttributeUpdates={'x': {'Value': 3, 'Action': 'PUT'}})
assert test_table_gsi_2.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'x': x}

def test_gsi_wrong_type_attribute_batch(test_table_gsi_2):
    # In a BatchWriteItem, if any update is forbidden, the entire batch is
    # rejected, and none of the updates happen at all.
    p1 = random_string()
    p2 = random_string()
    p3 = random_string()
    items = [{'p': p1, 'x': random_string()},
             {'p': p2, 'x': 3},
             {'p': p3, 'x': random_string()}]
    with pytest.raises(ClientError, match='ValidationException.*mismatch'):
        with test_table_gsi_2.batch_writer() as batch:
            for item in items:
                batch.put_item(item)
    for p in [p1, p2, p3]:
        assert not 'Item' in test_table_gsi_2.get_item(Key={'p': p}, ConsistentRead=True)

# A third scenario of GSI. Index has a hash key and a sort key, both are
# non-key attributes from the base table. This scenario may be very
# difficult to implement in Alternator because Scylla's materialized-views
# implementation only allows one new key column in the view, and here
# we need two (which, also, aren't actual columns, but map items).
@pytest.fixture(scope="session")
def test_table_gsi_3(dynamodb):
    table = create_test_table(dynamodb,
        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
        AttributeDefinitions=[
            { 'AttributeName': 'p', 'AttributeType': 'S' },
            { 'AttributeName': 'a', 'AttributeType': 'S' },
            { 'AttributeName': 'b', 'AttributeType': 'S' }
        ],
        GlobalSecondaryIndexes=[
            { 'IndexName': 'hello',
              'KeySchema': [
                  { 'AttributeName': 'a', 'KeyType': 'HASH' },
                  { 'AttributeName': 'b', 'KeyType': 'RANGE' }
              ],
              'Projection': { 'ProjectionType': 'ALL' }
            }
        ])
    yield table
    table.delete()

def test_gsi_3(test_table_gsi_3):
    items = [{'p': random_string(), 'a': random_string(), 'b': random_string()} for i in range(10)]
    with test_table_gsi_3.batch_writer() as batch:
        for item in items:
            batch.put_item(item)
    assert_index_query(test_table_gsi_3, 'hello', [items[3]],
        KeyConditions={'a': {'AttributeValueList': [items[3]['a']], 'ComparisonOperator': 'EQ'},
                       'b': {'AttributeValueList': [items[3]['b']], 'ComparisonOperator': 'EQ'}})

def test_gsi_update_second_regular_base_column(test_table_gsi_3):
    items = [{'p': random_string(), 'a': random_string(), 'b': random_string(), 'd': random_string()} for i in range(10)]
    with test_table_gsi_3.batch_writer() as batch:
        for item in items:
            batch.put_item(item)
    items[3]['b'] = 'updated'
    test_table_gsi_3.update_item(Key={'p': items[3]['p']}, AttributeUpdates={'b': {'Value': 'updated', 'Action': 'PUT'}})
    assert_index_query(test_table_gsi_3, 'hello', [items[3]],
        KeyConditions={'a': {'AttributeValueList': [items[3]['a']], 'ComparisonOperator': 'EQ'},
                       'b': {'AttributeValueList': [items[3]['b']], 'ComparisonOperator': 'EQ'}})

# Test that when a table has a GSI, if the indexed attribute is missing, the
# item is added to the base table but not the index.
# This is the same feature we already tested in test_gsi_missing_attribute()
# above, but on a different table: In that test we used test_table_gsi_2,
# with one indexed attribute, and in this test we use test_table_gsi_3 which
# has two base regular attributes in the view key, and more possibilities
# of which value might be missing. Reproduces issue #6008.
def test_gsi_missing_attribute_3(test_table_gsi_3):
    p = random_string()
    a = random_string()
    b = random_string()
    # First, add an item with a missing "a" value. It should appear in the
    # base table, but not in the index:
    test_table_gsi_3.put_item(Item={'p': p, 'b': b})
    assert test_table_gsi_3.get_item(Key={'p': p})['Item'] == {'p': p, 'b': b}
    # Note: with eventually consistent read, we can't really be sure that
    # an item will "never" appear in the index. We hope that if a bug exists
    # and such an item did appear, sometimes the delay here will be enough
    # for the unexpected item to become visible.
    assert not any([i['p'] == p for i in full_scan(test_table_gsi_3, IndexName='hello')])
    # Same thing for an item with a missing "b" value:
    test_table_gsi_3.put_item(Item={'p': p, 'a': a})
    assert test_table_gsi_3.get_item(Key={'p': p})['Item'] == {'p': p, 'a': a}
    assert not any([i['p'] == p for i in full_scan(test_table_gsi_3, IndexName='hello')])
    # And for an item missing both:
    test_table_gsi_3.put_item(Item={'p': p})
    assert test_table_gsi_3.get_item(Key={'p': p})['Item'] == {'p': p}
    assert not any([i['p'] == p for i in full_scan(test_table_gsi_3, IndexName='hello')])
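# The behavior the test above exercises - an item is materialized in a GSI
# only when every one of the index's key attributes is present in the item -
# can be modeled by a small predicate. This is an illustrative sketch, not
# Alternator code; appears_in_index() and index_keys are hypothetical names.

```python
def appears_in_index(item, index_keys):
    # An item becomes visible in a GSI only if all of the index's key
    # attributes exist in the item; a missing 'a' or 'b' keeps the item
    # out of the 'hello' index of test_table_gsi_3.
    return all(k in item for k in index_keys)

assert appears_in_index({'p': 'x', 'a': '1', 'b': '2'}, ['a', 'b'])
assert not appears_in_index({'p': 'x', 'b': '2'}, ['a', 'b'])
assert not appears_in_index({'p': 'x'}, ['a', 'b'])
```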

# A fourth scenario of GSI. Two GSIs on a single base table.
@pytest.fixture(scope="session")
def test_table_gsi_4(dynamodb):
    table = create_test_table(dynamodb,
        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
        AttributeDefinitions=[
            { 'AttributeName': 'p', 'AttributeType': 'S' },
            { 'AttributeName': 'a', 'AttributeType': 'S' },
            { 'AttributeName': 'b', 'AttributeType': 'S' }
        ],
        GlobalSecondaryIndexes=[
            { 'IndexName': 'hello_a',
              'KeySchema': [
                  { 'AttributeName': 'a', 'KeyType': 'HASH' },
              ],
              'Projection': { 'ProjectionType': 'ALL' }
            },
            { 'IndexName': 'hello_b',
              'KeySchema': [
                  { 'AttributeName': 'b', 'KeyType': 'HASH' },
              ],
              'Projection': { 'ProjectionType': 'ALL' }
            }
        ])
    yield table
    table.delete()

# Test that a base table with two GSIs updates both as expected.
def test_gsi_4(test_table_gsi_4):
    items = [{'p': random_string(), 'a': random_string(), 'b': random_string()} for i in range(10)]
    with test_table_gsi_4.batch_writer() as batch:
        for item in items:
            batch.put_item(item)
    assert_index_query(test_table_gsi_4, 'hello_a', [items[3]],
        KeyConditions={'a': {'AttributeValueList': [items[3]['a']], 'ComparisonOperator': 'EQ'}})
    assert_index_query(test_table_gsi_4, 'hello_b', [items[3]],
        KeyConditions={'b': {'AttributeValueList': [items[3]['b']], 'ComparisonOperator': 'EQ'}})

# Verify that describe_table lists the two GSIs.
def test_gsi_4_describe(test_table_gsi_4):
    desc = test_table_gsi_4.meta.client.describe_table(TableName=test_table_gsi_4.name)
    assert 'Table' in desc
    assert 'GlobalSecondaryIndexes' in desc['Table']
    gsis = desc['Table']['GlobalSecondaryIndexes']
    assert len(gsis) == 2
    assert multiset([g['IndexName'] for g in gsis]) == multiset(['hello_a', 'hello_b'])
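# The multiset() helper imported from util is used throughout for
# order-insensitive comparisons of lists returned in arbitrary order. Its
# real implementation isn't shown in this file; a minimal sketch could look
# like this (freeze() and the Counter-based approach are illustrative
# assumptions, not the actual util.py code):

```python
from collections import Counter

def multiset(items):
    # Turn a list into an order-insensitive "bag". Dicts are frozen into
    # sorted key/value tuples so they become hashable and countable;
    # nested lists are frozen recursively, preserving their order.
    def freeze(x):
        if isinstance(x, dict):
            return tuple(sorted((k, freeze(v)) for k, v in x.items()))
        if isinstance(x, list):
            return tuple(freeze(v) for v in x)
        return x
    return Counter(freeze(i) for i in items)

assert multiset([{'a': '1'}, {'b': '2'}]) == multiset([{'b': '2'}, {'a': '1'}])
assert multiset(['x', 'x']) != multiset(['x'])
```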

# A scenario for GSI in which the table has both hash and sort key
@pytest.fixture(scope="session")
def test_table_gsi_5(dynamodb):
    table = create_test_table(dynamodb,
        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' } ],
        AttributeDefinitions=[
            { 'AttributeName': 'p', 'AttributeType': 'S' },
            { 'AttributeName': 'c', 'AttributeType': 'S' },
            { 'AttributeName': 'x', 'AttributeType': 'S' },
        ],
        GlobalSecondaryIndexes=[
            { 'IndexName': 'hello',
              'KeySchema': [
                  { 'AttributeName': 'p', 'KeyType': 'HASH' },
                  { 'AttributeName': 'x', 'KeyType': 'RANGE' },
              ],
              'Projection': { 'ProjectionType': 'ALL' }
            }
        ])
    yield table
    table.delete()

def test_gsi_5(test_table_gsi_5):
    items1 = [{'p': random_string(), 'c': random_string(), 'x': random_string()} for i in range(10)]
    p1, x1 = items1[0]['p'], items1[0]['x']
    p2, x2 = random_string(), random_string()
    items2 = [{'p': p2, 'c': random_string(), 'x': x2} for i in range(10)]
    items = items1 + items2
    with test_table_gsi_5.batch_writer() as batch:
        for item in items:
            batch.put_item(item)
    expected_items = [i for i in items if i['p'] == p1 and i['x'] == x1]
    assert_index_query(test_table_gsi_5, 'hello', expected_items,
        KeyConditions={'p': {'AttributeValueList': [p1], 'ComparisonOperator': 'EQ'},
                       'x': {'AttributeValueList': [x1], 'ComparisonOperator': 'EQ'}})
    expected_items = [i for i in items if i['p'] == p2 and i['x'] == x2]
    assert_index_query(test_table_gsi_5, 'hello', expected_items,
        KeyConditions={'p': {'AttributeValueList': [p2], 'ComparisonOperator': 'EQ'},
                       'x': {'AttributeValueList': [x2], 'ComparisonOperator': 'EQ'}})

# Verify that DescribeTable correctly returns the schema of both base-table
# and secondary indexes. KeySchema is given for each of the base table and
# indexes, and AttributeDefinitions is merged for all of them together.
def test_gsi_5_describe_table_schema(test_table_gsi_5):
    got = test_table_gsi_5.meta.client.describe_table(TableName=test_table_gsi_5.name)['Table']
    # Copied from test_table_gsi_5 fixture
    expected_base_keyschema = [
        { 'AttributeName': 'p', 'KeyType': 'HASH' },
        { 'AttributeName': 'c', 'KeyType': 'RANGE' } ]
    expected_gsi_keyschema = [
        { 'AttributeName': 'p', 'KeyType': 'HASH' },
        { 'AttributeName': 'x', 'KeyType': 'RANGE' } ]
    expected_all_attribute_definitions = [
        { 'AttributeName': 'p', 'AttributeType': 'S' },
        { 'AttributeName': 'c', 'AttributeType': 'S' },
        { 'AttributeName': 'x', 'AttributeType': 'S' } ]
    assert got['KeySchema'] == expected_base_keyschema
    gsis = got['GlobalSecondaryIndexes']
    assert len(gsis) == 1
    assert gsis[0]['KeySchema'] == expected_gsi_keyschema
    # The list of attribute definitions may be arbitrarily reordered
    assert multiset(got['AttributeDefinitions']) == multiset(expected_all_attribute_definitions)

# Similar DescribeTable schema test for test_table_gsi_2. The peculiarity
# in that table is that the base table has only a hash key p, and the index
# has only a hash key x. Now, while internally Scylla needs to add "p" as a
# clustering key in the materialized view (in Scylla the view key always
# contains the base key), when describing the table, "p" shouldn't be
# returned as a range key, because the user didn't ask for it.
# This test reproduces issue #5320.
@pytest.mark.xfail(reason="GSI DescribeTable spurious range key (#5320)")
def test_gsi_2_describe_table_schema(test_table_gsi_2):
    got = test_table_gsi_2.meta.client.describe_table(TableName=test_table_gsi_2.name)['Table']
    # Copied from test_table_gsi_2 fixture
    expected_base_keyschema = [ { 'AttributeName': 'p', 'KeyType': 'HASH' } ]
    expected_gsi_keyschema = [ { 'AttributeName': 'x', 'KeyType': 'HASH' } ]
    expected_all_attribute_definitions = [
        { 'AttributeName': 'p', 'AttributeType': 'S' },
        { 'AttributeName': 'x', 'AttributeType': 'S' } ]
    assert got['KeySchema'] == expected_base_keyschema
    gsis = got['GlobalSecondaryIndexes']
    assert len(gsis) == 1
    assert gsis[0]['KeySchema'] == expected_gsi_keyschema
    # The list of attribute definitions may be arbitrarily reordered
    assert multiset(got['AttributeDefinitions']) == multiset(expected_all_attribute_definitions)

# All tests above involved "ProjectionType: ALL". This test checks how
# "ProjectionType: KEYS_ONLY" works. We note that it projects both
# the index's key, *and* the base table's key. So items which had different
# base-table keys cannot suddenly become the same item in the index.
@pytest.mark.xfail(reason="GSI not supported")
def test_gsi_projection_keys_only(dynamodb):
    table = create_test_table(dynamodb,
        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
        AttributeDefinitions=[
            { 'AttributeName': 'p', 'AttributeType': 'S' },
            { 'AttributeName': 'x', 'AttributeType': 'S' },
        ],
        GlobalSecondaryIndexes=[
            { 'IndexName': 'hello',
              'KeySchema': [
                  { 'AttributeName': 'x', 'KeyType': 'HASH' },
              ],
              'Projection': { 'ProjectionType': 'KEYS_ONLY' }
            }
        ])
    items = [{'p': random_string(), 'x': random_string(), 'y': random_string()} for i in range(10)]
    with table.batch_writer() as batch:
        for item in items:
            batch.put_item(item)
    wanted = ['p', 'x']
    expected_items = [{k: x[k] for k in wanted if k in x} for x in items]
    assert_index_scan(table, 'hello', expected_items)
    table.delete()

# Test for "ProjectionType: INCLUDE". The secondary table includes its
# own and the base's keys (as in KEYS_ONLY) plus the extra keys given
# in NonKeyAttributes.
@pytest.mark.xfail(reason="GSI not supported")
def test_gsi_projection_include(dynamodb):
    table = create_test_table(dynamodb,
        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
        AttributeDefinitions=[
            { 'AttributeName': 'p', 'AttributeType': 'S' },
            { 'AttributeName': 'x', 'AttributeType': 'S' },
        ],
        GlobalSecondaryIndexes=[
            { 'IndexName': 'hello',
              'KeySchema': [
                  { 'AttributeName': 'x', 'KeyType': 'HASH' },
              ],
              'Projection': { 'ProjectionType': 'INCLUDE',
                              'NonKeyAttributes': ['a', 'b'] }
            }
        ])
    # Some items have the projected attributes a,b and some don't:
    items = [{'p': random_string(), 'x': random_string(), 'a': random_string(), 'b': random_string(), 'y': random_string()} for i in range(10)]
    items = items + [{'p': random_string(), 'x': random_string(), 'y': random_string()} for i in range(10)]
    with table.batch_writer() as batch:
        for item in items:
            batch.put_item(item)
    wanted = ['p', 'x', 'a', 'b']
    expected_items = [{k: x[k] for k in wanted if k in x} for x in items]
    assert_index_scan(table, 'hello', expected_items)
    print(len(expected_items))
    table.delete()

# DynamoDB's documentation says the "Projection" argument of
# GlobalSecondaryIndexes is mandatory, and indeed Boto3 enforces that it
# must be passed. The documentation then goes on to claim that the
# "ProjectionType" member of "Projection" is optional - and Boto3 allows it
# to be missing. But in fact, it is not allowed to be missing: DynamoDB
# complains: "Unknown ProjectionType: null".
@pytest.mark.xfail(reason="GSI not supported")
def test_gsi_missing_projection_type(dynamodb):
    with pytest.raises(ClientError, match='ValidationException.*ProjectionType'):
        create_test_table(dynamodb,
            KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }],
            AttributeDefinitions=[{ 'AttributeName': 'p', 'AttributeType': 'S' }],
            GlobalSecondaryIndexes=[
                { 'IndexName': 'hello',
                  'KeySchema': [{ 'AttributeName': 'p', 'KeyType': 'HASH' }],
                  'Projection': {}
                }
            ])

# update_table() for creating a GSI is an asynchronous operation.
# The table's TableStatus changes from ACTIVE to UPDATING for a short while
# and then goes back to ACTIVE, but the new GSI's IndexStatus appears as
# CREATING, until eventually (after a *long* time...) it becomes ACTIVE.
# During the CREATING phase, at some point the Backfilling attribute also
# appears, until it eventually disappears. We need to wait until all three
# markers indicate completion.
# Unfortunately, while boto3 has a client.get_waiter('table_exists') to
# wait for a table to exist, there is no such function to wait for an
# index to come up, so we need to code it ourselves.
def wait_for_gsi(table, gsi_name):
    start_time = time.time()
    # Surprisingly, even for tiny tables this can take a very long time
    # on DynamoDB - often many minutes!
    for i in range(300):
        time.sleep(1)
        desc = table.meta.client.describe_table(TableName=table.name)
        table_status = desc['Table']['TableStatus']
        if table_status != 'ACTIVE':
            print('%d Table status still %s' % (i, table_status))
            continue
        index_desc = [x for x in desc['Table']['GlobalSecondaryIndexes'] if x['IndexName'] == gsi_name]
        assert len(index_desc) == 1
        index_status = index_desc[0]['IndexStatus']
        if index_status != 'ACTIVE':
            print('%d Index status still %s' % (i, index_status))
            continue
        # When the index is ACTIVE, this must be after backfilling completed
        assert not 'Backfilling' in index_desc[0]
        print('wait_for_gsi took %d seconds' % (time.time() - start_time))
        return
    raise AssertionError("wait_for_gsi did not complete")
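# The wait loop above follows a common "poll until a predicate holds or a
# deadline passes" pattern. A generalized sketch (poll_until and its
# parameters are illustrative names, not part of this test suite):

```python
import time

def poll_until(predicate, timeout=300, interval=1):
    # Call predicate() once per interval until it returns True,
    # or raise after the timeout expires.
    deadline = time.time() + timeout
    while time.time() < deadline:
        if predicate():
            return
        time.sleep(interval)
    raise AssertionError("poll_until did not complete")

# For example, wait_for_gsi could be phrased as
# poll_until(lambda: gsi_is_active(table, 'hello')).
calls = []
poll_until(lambda: (calls.append(1), len(calls) >= 2)[1], timeout=10, interval=0)
assert len(calls) == 2
```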

# Similarly to how wait_for_gsi() waits for a GSI to finish adding,
# this function waits for a GSI to be finally deleted.
def wait_for_gsi_gone(table, gsi_name):
    start_time = time.time()
    for i in range(300):
        time.sleep(1)
        desc = table.meta.client.describe_table(TableName=table.name)
        table_status = desc['Table']['TableStatus']
        if table_status != 'ACTIVE':
            print('%d Table status still %s' % (i, table_status))
            continue
        if 'GlobalSecondaryIndexes' in desc['Table']:
            index_desc = [x for x in desc['Table']['GlobalSecondaryIndexes'] if x['IndexName'] == gsi_name]
            if len(index_desc) != 0:
                index_status = index_desc[0]['IndexStatus']
                print('%d Index status still %s' % (i, index_status))
                continue
        print('wait_for_gsi_gone took %d seconds' % (time.time() - start_time))
        return
    raise AssertionError("wait_for_gsi_gone did not complete")

# All tests above involved creating a new table with a GSI up-front. This
# test will test creating a base table *without* a GSI, putting data in
# it, and then adding a GSI with the UpdateTable operation. This starts
# a backfilling stage - where data is copied to the index - and when this
# stage is done, the index is usable. Items whose indexed column contains
# the wrong type are silently ignored and not added to the index (it would
# not have been possible to add such items if the GSI was already configured
# when they were added).
@pytest.mark.xfail(reason="GSI not supported")
def test_gsi_backfill(dynamodb):
    # First create, and fill, a table without GSI. The items in items1
    # will have the appropriate string type for 'x' and will later get
    # indexed. Items in items2 have no value for 'x', and in items3 'x' is
    # not a string; so the items in items2 and items3 will be missing
    # in the index we'll create later.
    table = create_test_table(dynamodb,
        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
        AttributeDefinitions=[ { 'AttributeName': 'p', 'AttributeType': 'S' } ])
    items1 = [{'p': random_string(), 'x': random_string(), 'y': random_string()} for i in range(10)]
    items2 = [{'p': random_string(), 'y': random_string()} for i in range(10)]
    items3 = [{'p': random_string(), 'x': i} for i in range(10)]
    items = items1 + items2 + items3
    with table.batch_writer() as batch:
        for item in items:
            batch.put_item(item)
    assert multiset(items) == multiset(full_scan(table))
    # Now use UpdateTable to create the GSI
    dynamodb.meta.client.update_table(TableName=table.name,
        AttributeDefinitions=[{ 'AttributeName': 'x', 'AttributeType': 'S' }],
        GlobalSecondaryIndexUpdates=[ { 'Create':
            { 'IndexName': 'hello',
              'KeySchema': [{ 'AttributeName': 'x', 'KeyType': 'HASH' }],
              'Projection': { 'ProjectionType': 'ALL' }
            }}])
    # update_table is an asynchronous operation. We need to wait until it
    # finishes and the table is backfilled.
    wait_for_gsi(table, 'hello')
    # As explained above, only items in items1 got copied to the gsi,
    # and Scan on them works as expected.
    # Note that we don't need to retry the reads here (i.e., use the
    # assert_index_scan() or assert_index_query() functions) because after
    # we waited for backfilling to complete, we know all the pre-existing
    # data is already in the index.
    assert multiset(items1) == multiset(full_scan(table, IndexName='hello'))
    # We can also use Query on the new GSI, to search on the attribute x:
    assert multiset([items1[3]]) == multiset(full_query(table,
        IndexName='hello',
        KeyConditions={'x': {'AttributeValueList': [items1[3]['x']], 'ComparisonOperator': 'EQ'}}))
    # Let's also test that we cannot add another index with the same name
    # that already exists
    with pytest.raises(ClientError, match='ValidationException.*already exists'):
        dynamodb.meta.client.update_table(TableName=table.name,
            AttributeDefinitions=[{ 'AttributeName': 'y', 'AttributeType': 'S' }],
            GlobalSecondaryIndexUpdates=[ { 'Create':
                { 'IndexName': 'hello',
                  'KeySchema': [{ 'AttributeName': 'y', 'KeyType': 'HASH' }],
                  'Projection': { 'ProjectionType': 'ALL' }
                }}])
    table.delete()

# Test deleting an existing GSI using UpdateTable
@pytest.mark.xfail(reason="GSI not supported")
def test_gsi_delete(dynamodb):
    table = create_test_table(dynamodb,
        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
        AttributeDefinitions=[
            { 'AttributeName': 'p', 'AttributeType': 'S' },
            { 'AttributeName': 'x', 'AttributeType': 'S' },
        ],
        GlobalSecondaryIndexes=[
            { 'IndexName': 'hello',
              'KeySchema': [
                  { 'AttributeName': 'x', 'KeyType': 'HASH' },
              ],
              'Projection': { 'ProjectionType': 'ALL' }
            }
        ])
    items = [{'p': random_string(), 'x': random_string()} for i in range(10)]
    with table.batch_writer() as batch:
        for item in items:
            batch.put_item(item)
    # So far, we have the index for "x" and can use it:
    assert_index_query(table, 'hello', [items[3]],
        KeyConditions={'x': {'AttributeValueList': [items[3]['x']], 'ComparisonOperator': 'EQ'}})
    # Now use UpdateTable to delete the GSI for "x"
    dynamodb.meta.client.update_table(TableName=table.name,
        GlobalSecondaryIndexUpdates=[{ 'Delete': { 'IndexName': 'hello' } }])
    # update_table is an asynchronous operation. We need to wait until it
    # finishes and the GSI is removed.
    wait_for_gsi_gone(table, 'hello')
    # Now index is gone. We cannot query using it.
    with pytest.raises(ClientError, match='ValidationException.*hello'):
        full_query(table, IndexName='hello',
            KeyConditions={'x': {'AttributeValueList': [items[3]['x']], 'ComparisonOperator': 'EQ'}})
    table.delete()

# Utility function for creating a new table with a GSI of the given name,
# and, if creation was successful, deleting it. Useful for testing which
# GSI names work.
def create_gsi(dynamodb, index_name):
    table = create_test_table(dynamodb,
        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }],
        AttributeDefinitions=[{ 'AttributeName': 'p', 'AttributeType': 'S' }],
        GlobalSecondaryIndexes=[
            { 'IndexName': index_name,
              'KeySchema': [{ 'AttributeName': 'p', 'KeyType': 'HASH' }],
              'Projection': { 'ProjectionType': 'ALL' }
            }
        ])
    # Verify that the GSI wasn't just ignored, as Scylla originally did ;-)
    assert 'GlobalSecondaryIndexes' in table.meta.client.describe_table(TableName=table.name)['Table']
    table.delete()

# Like table names (tested in test_table.py), index names must also
# be 3-255 characters and match the regex [a-zA-Z0-9._-]+. This test
# is similar to test_create_table_unsupported_names(), but for GSI names.
# Note that Scylla is actually more limited in the length of the index
# names, because both table name and index name, together, have to fit in
# 221 characters. But we don't verify here this specific limitation.
def test_gsi_unsupported_names(dynamodb):
    # Unfortunately, the boto library tests for names shorter than the
    # minimum length (3 characters) immediately, and failure results in
    # ParamValidationError. But the other invalid names are passed to
    # DynamoDB, which returns an HTTP response code, which results in a
    # ClientError exception.
    with pytest.raises(ParamValidationError):
        create_gsi(dynamodb, 'n')
    with pytest.raises(ParamValidationError):
        create_gsi(dynamodb, 'nn')
    with pytest.raises(ClientError, match='ValidationException.*nnnnn'):
        create_gsi(dynamodb, 'n' * 256)
    with pytest.raises(ClientError, match='ValidationException.*nyh'):
        create_gsi(dynamodb, 'nyh@test')
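# The naming rule being exercised here (3-255 characters matching
# [a-zA-Z0-9._-]+) can be expressed directly. This checker is a sketch for
# illustration only - it is not how DynamoDB or Scylla actually validate
# names, and valid_index_name is a hypothetical helper:

```python
import re

def valid_index_name(name):
    # 3-255 characters, drawn only from letters, digits, '.', '_', '-'.
    return 3 <= len(name) <= 255 and re.fullmatch(r'[a-zA-Z0-9._-]+', name) is not None

assert valid_index_name('hello')
assert not valid_index_name('nn')           # too short
assert not valid_index_name('n' * 256)      # too long
assert not valid_index_name('nyh@test')     # '@' not allowed
assert valid_index_name('.alternator_test') # allowed by DynamoDB's rules
```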

# On the other hand, names following the above rules should be accepted. Even
# names which the Scylla rules forbid, such as a name starting with .
def test_gsi_non_scylla_name(dynamodb):
    create_gsi(dynamodb, '.alternator_test')

# Index names with 255 characters are allowed in Dynamo. In Scylla, the
# limit is different - the sum of both table and index length cannot
# exceed 211 characters. So we test a much shorter limit.
# (compare test_create_and_delete_table_very_long_name()).
def test_gsi_very_long_name(dynamodb):
    #create_gsi(dynamodb, 'n' * 255)   # works on DynamoDB, but not on Scylla
    create_gsi(dynamodb, 'n' * 190)

# Verify that ListTables does not list materialized views used for indexes.
# This is hard to test, because we don't really know which table names
# should be listed beyond those we created, and don't want to assume that
# no other test runs in parallel with us. So the method we chose is to use a
# unique random name for an index, and check that no table contains this
# name. This assumes that materialized-view names are composed using the
# index's name (which is currently what we do).
@pytest.fixture(scope="session")
def test_table_gsi_random_name(dynamodb):
    index_name = random_string()
    table = create_test_table(dynamodb,
        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' },
                    { 'AttributeName': 'c', 'KeyType': 'RANGE' }
        ],
        AttributeDefinitions=[
            { 'AttributeName': 'p', 'AttributeType': 'S' },
            { 'AttributeName': 'c', 'AttributeType': 'S' },
        ],
        GlobalSecondaryIndexes=[
            { 'IndexName': index_name,
              'KeySchema': [
                  { 'AttributeName': 'c', 'KeyType': 'HASH' },
                  { 'AttributeName': 'p', 'KeyType': 'RANGE' },
              ],
              'Projection': { 'ProjectionType': 'ALL' }
            }
        ],
    )
    yield [table, index_name]
    table.delete()

def test_gsi_list_tables(dynamodb, test_table_gsi_random_name):
    table, index_name = test_table_gsi_random_name
    # Check that the random "index_name" isn't a substring of any table name:
    tables = list_tables(dynamodb)
    for name in tables:
        assert not index_name in name
    # But of course, the table's name should be in the list:
    assert table.name in tables

# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Tests for the health check

import requests

# Test that a health check can be performed with a GET packet
def test_health_works(dynamodb):
    url = dynamodb.meta.client._endpoint.host
    response = requests.get(url)
    assert response.ok
    assert response.content.decode('utf-8').strip() == 'healthy: {}'.format(url.replace('https://', '').replace('http://', ''))

# Test that a health check only works for the root URL ('/')
def test_health_only_works_for_root_path(dynamodb):
    url = dynamodb.meta.client._endpoint.host
    for suffix in ['/abc', '/-', '/index.htm', '/health']:
        print(url + suffix)
        response = requests.get(url + suffix, verify=False)
        assert response.status_code in range(400, 405)
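# The expected health-check body is derived from the endpoint URL by
# stripping its scheme. The same normalization in isolation (strip_scheme
# is an illustrative helper name, not part of these tests):

```python
def strip_scheme(url):
    # Drop the 'http://' or 'https://' prefix, leaving host:port.
    return url.replace('https://', '').replace('http://', '')

assert strip_scheme('http://localhost:8000') == 'localhost:8000'
assert 'healthy: {}'.format(strip_scheme('https://127.0.0.1:8043')) == 'healthy: 127.0.0.1:8043'
```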

# Tests for the CRUD item operations: PutItem, GetItem, UpdateItem, DeleteItem
import pytest
from botocore.exceptions import ClientError
from decimal import Decimal
from util import random_string, random_bytes
# Basic test for creating a new item with a random name, and reading it back
# with strong consistency.
# Only the string type is used for keys and attributes. None of the various
# optional PutItem features (Expected, ReturnValues, ReturnConsumedCapacity,
# ReturnItemCollectionMetrics, ConditionalOperator, ConditionExpression,
# ExpressionAttributeNames, ExpressionAttributeValues) are used, and
# for GetItem strong consistency is requested as well as all attributes,
# but no other optional features (AttributesToGet, ReturnConsumedCapacity,
# ProjectionExpression, ExpressionAttributeNames)
def test_basic_string_put_and_get(test_table):
    p = random_string()
    c = random_string()
    val = random_string()
    val2 = random_string()
    test_table.put_item(Item={'p': p, 'c': c, 'attribute': val, 'another': val2})
    item = test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item']
    assert item['p'] == p
    assert item['c'] == c
    assert item['attribute'] == val
    assert item['another'] == val2

# Similar to test_basic_string_put_and_get, just uses UpdateItem instead of
# PutItem. Because the item does not yet exist, it should work the same.
def test_basic_string_update_and_get(test_table):
    p = random_string()
    c = random_string()
    val = random_string()
    val2 = random_string()
    test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'attribute': {'Value': val, 'Action': 'PUT'}, 'another': {'Value': val2, 'Action': 'PUT'}})
    item = test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item']
    assert item['p'] == p
    assert item['c'] == c
    assert item['attribute'] == val
    assert item['another'] == val2

# Test put_item and get_item of various types for the *attributes*,
# including both scalars as well as nested documents, lists and sets.
# The full list of types tested here:
# number, boolean, bytes, null, list, map, string set, number set,
# binary set.
# The keys are still strings.
# Note that only top-level attributes are written and read in this test -
# this test does not attempt to modify *nested* attributes.
# See https://boto3.amazonaws.com/v1/documentation/api/latest/reference/customizations/dynamodb.html
# on how to pass these various types to Boto3's put_item().
def test_put_and_get_attribute_types(test_table):
    key = {'p': random_string(), 'c': random_string()}
    test_items = [
        Decimal("12.345"),
        42,
        True,
        False,
        b'xyz',
        None,
        ['hello', 'world', 42],
        {'hello': 'world', 'life': 42},
        {'hello': {'test': 'hi', 'hello': True, 'list': [1, 2, 'hi']}},
        set(['hello', 'world', 'hi']),
        set([1, 42, Decimal("3.14")]),
        set([b'xyz', b'hi']),
    ]
    item = { str(i) : test_items[i] for i in range(len(test_items)) }
    item.update(key)
    test_table.put_item(Item=item)
    got_item = test_table.get_item(Key=key, ConsistentRead=True)['Item']
    assert item == got_item
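# For reference, the Python values used above map to DynamoDB's wire-level
# type codes roughly as follows. This is a simplified sketch for
# illustration - boto3's TypeSerializer does the real serialization, and
# dynamodb_type is a hypothetical helper, not part of these tests:

```python
from decimal import Decimal

def dynamodb_type(value):
    # Simplified Python-value -> DynamoDB type-code mapping. Note the
    # bool check must come before the numeric one, since in Python
    # isinstance(True, int) is True.
    if isinstance(value, bool):
        return 'BOOL'
    if isinstance(value, (int, Decimal)):
        return 'N'
    if isinstance(value, str):
        return 'S'
    if isinstance(value, bytes):
        return 'B'
    if value is None:
        return 'NULL'
    if isinstance(value, list):
        return 'L'
    if isinstance(value, dict):
        return 'M'
    if isinstance(value, set):
        return 'SS'  # really SS, NS or BS depending on the element type
    raise TypeError(value)

assert dynamodb_type(True) == 'BOOL'
assert dynamodb_type(42) == 'N'
assert dynamodb_type(Decimal("12.345")) == 'N'
assert dynamodb_type('hi') == 'S'
assert dynamodb_type(b'xyz') == 'B'
```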

# The test_empty_* tests below verify support for empty items, with no
# attributes except the key. This is a difficult case for Scylla, because
# for an empty row to exist, Scylla needs to add a "CQL row marker".
# There are several ways to create empty items - via PutItem, UpdateItem
# and deleting attributes from non-empty items, and we need to check them
# all, in several test_empty_* tests:
def test_empty_put(test_table):
    p = random_string()
    c = random_string()
    test_table.put_item(Item={'p': p, 'c': c})
    item = test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item']
    assert item == {'p': p, 'c': c}

def test_empty_put_delete(test_table):
    p = random_string()
    c = random_string()
    test_table.put_item(Item={'p': p, 'c': c, 'hello': 'world'})
    test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'hello': {'Action': 'DELETE'}})
    item = test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item']
    assert item == {'p': p, 'c': c}

def test_empty_update(test_table):
    p = random_string()
    c = random_string()
    test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={})
    item = test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item']
    assert item == {'p': p, 'c': c}

def test_empty_update_delete(test_table):
    p = random_string()
    c = random_string()
    test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'hello': {'Value': 'world', 'Action': 'PUT'}})
    test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'hello': {'Action': 'DELETE'}})
    item = test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item']
    assert item == {'p': p, 'c': c}

# Test error handling of UpdateItem passed a bad "Action" field.
def test_update_bad_action(test_table):
    p = random_string()
    c = random_string()
    val = random_string()
    with pytest.raises(ClientError, match='ValidationException'):
        test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'attribute': {'Value': val, 'Action': 'NONEXISTENT'}})

# A more elaborate UpdateItem test, updating different attributes at different
# times. Includes PUT and DELETE operations.
def test_basic_string_more_update(test_table):
    p = random_string()
    c = random_string()
    val1 = random_string()
    val2 = random_string()
    val3 = random_string()
    val4 = random_string()
    test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'a3': {'Value': val1, 'Action': 'PUT'}})
    test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'a1': {'Value': val1, 'Action': 'PUT'}})
    test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'a2': {'Value': val2, 'Action': 'PUT'}})
    test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'a1': {'Value': val3, 'Action': 'PUT'}})
    test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'a3': {'Action': 'DELETE'}})
    item = test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item']
    assert item['p'] == p
    assert item['c'] == c
    assert item['a1'] == val3
    assert item['a2'] == val2
    assert not 'a3' in item

# Test that item operations on a non-existent table name fail with correct
# error code.
def test_item_operations_nonexistent_table(dynamodb):
    with pytest.raises(ClientError, match='ResourceNotFoundException'):
        dynamodb.meta.client.put_item(TableName='non_existent_table',
            Item={'a': {'S': 'b'}})

# Fetching a non-existent item. According to the DynamoDB doc, "If there is no
# matching item, GetItem does not return any data and there will be no Item
# element in the response."
def test_get_item_missing_item(test_table):
    p = random_string()
    c = random_string()
    assert not "Item" in test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)

# Test that if we have a table with string hash and sort keys, we can't read
# or write items with other key types to it.
def test_put_item_wrong_key_type(test_table):
    b = random_bytes()
    s = random_string()
    n = Decimal("3.14")
    # Should succeed (correct key types)
    test_table.put_item(Item={'p': s, 'c': s})
    assert test_table.get_item(Key={'p': s, 'c': s}, ConsistentRead=True)['Item'] == {'p': s, 'c': s}
    # Should fail (incorrect hash key types)
    with pytest.raises(ClientError, match='ValidationException'):
        test_table.put_item(Item={'p': b, 'c': s})
    with pytest.raises(ClientError, match='ValidationException'):
        test_table.put_item(Item={'p': n, 'c': s})
    # Should fail (incorrect sort key types)
    with pytest.raises(ClientError, match='ValidationException'):
        test_table.put_item(Item={'p': s, 'c': b})
    with pytest.raises(ClientError, match='ValidationException'):
        test_table.put_item(Item={'p': s, 'c': n})
    # Should fail (missing hash key)
    with pytest.raises(ClientError, match='ValidationException'):
        test_table.put_item(Item={'c': s})
    # Should fail (missing sort key)
    with pytest.raises(ClientError, match='ValidationException'):
        test_table.put_item(Item={'p': s})

def test_update_item_wrong_key_type(test_table, test_table_s):
    b = random_bytes()
    s = random_string()
    n = Decimal("3.14")
    # Should succeed (correct key types)
    test_table.update_item(Key={'p': s, 'c': s}, AttributeUpdates={})
    assert test_table.get_item(Key={'p': s, 'c': s}, ConsistentRead=True)['Item'] == {'p': s, 'c': s}
    # Should fail (incorrect hash key types)
    with pytest.raises(ClientError, match='ValidationException'):
        test_table.update_item(Key={'p': b, 'c': s}, AttributeUpdates={})
    with pytest.raises(ClientError, match='ValidationException'):
        test_table.update_item(Key={'p': n, 'c': s}, AttributeUpdates={})
    # Should fail (incorrect sort key types)
    with pytest.raises(ClientError, match='ValidationException'):
        test_table.update_item(Key={'p': s, 'c': b}, AttributeUpdates={})
    with pytest.raises(ClientError, match='ValidationException'):
        test_table.update_item(Key={'p': s, 'c': n}, AttributeUpdates={})
    # Should fail (missing hash key)
    with pytest.raises(ClientError, match='ValidationException'):
        test_table.update_item(Key={'c': s}, AttributeUpdates={})
    # Should fail (missing sort key)
    with pytest.raises(ClientError, match='ValidationException'):
        test_table.update_item(Key={'p': s}, AttributeUpdates={})
    # Should fail (spurious key columns)
    with pytest.raises(ClientError, match='ValidationException'):
        test_table.get_item(Key={'p': s, 'c': s, 'spurious': s})
    with pytest.raises(ClientError, match='ValidationException'):
        test_table_s.get_item(Key={'p': s, 'c': s})

def test_get_item_wrong_key_type(test_table, test_table_s):
    b = random_bytes()
    s = random_string()
    n = Decimal("3.14")
    # Should succeed (correct key types) but have empty result
    assert not "Item" in test_table.get_item(Key={'p': s, 'c': s}, ConsistentRead=True)
    # Should fail (incorrect hash key types)
    with pytest.raises(ClientError, match='ValidationException'):
        test_table.get_item(Key={'p': b, 'c': s})
    with pytest.raises(ClientError, match='ValidationException'):
        test_table.get_item(Key={'p': n, 'c': s})
# Should fail (incorrect sort key types)
with pytest.raises(ClientError, match='ValidationException'):
test_table.get_item(Key={'p': s, 'c': b})
with pytest.raises(ClientError, match='ValidationException'):
test_table.get_item(Key={'p': s, 'c': n})
# Should fail (missing hash key)
with pytest.raises(ClientError, match='ValidationException'):
test_table.get_item(Key={'c': s})
# Should fail (missing sort key)
with pytest.raises(ClientError, match='ValidationException'):
test_table.get_item(Key={'p': s})
# Should fail (spurious key columns)
with pytest.raises(ClientError, match='ValidationException'):
test_table.get_item(Key={'p': s, 'c': s, 'spurious': s})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.get_item(Key={'p': s, 'c': s})
def test_delete_item_wrong_key_type(test_table, test_table_s):
b = random_bytes()
s = random_string()
n = Decimal("3.14")
# Should succeed (correct key types)
test_table.delete_item(Key={'p': s, 'c': s})
# Should fail (incorrect hash key types)
with pytest.raises(ClientError, match='ValidationException'):
test_table.delete_item(Key={'p': b, 'c': s})
with pytest.raises(ClientError, match='ValidationException'):
test_table.delete_item(Key={'p': n, 'c': s})
# Should fail (incorrect sort key types)
with pytest.raises(ClientError, match='ValidationException'):
test_table.delete_item(Key={'p': s, 'c': b})
with pytest.raises(ClientError, match='ValidationException'):
test_table.delete_item(Key={'p': s, 'c': n})
# Should fail (missing hash key)
with pytest.raises(ClientError, match='ValidationException'):
test_table.delete_item(Key={'c': s})
# Should fail (missing sort key)
with pytest.raises(ClientError, match='ValidationException'):
test_table.delete_item(Key={'p': s})
# Should fail (spurious key columns)
with pytest.raises(ClientError, match='ValidationException'):
test_table.delete_item(Key={'p': s, 'c': s, 'spurious': s})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.delete_item(Key={'p': s, 'c': s})
# Most of the tests here arbitrarily used a table with both hash and sort keys
# (both strings). Let's check that a table with *only* a hash key works ok
# too, for PutItem, GetItem, and UpdateItem.
def test_only_hash_key(test_table_s):
s = random_string()
test_table_s.put_item(Item={'p': s, 'hello': 'world'})
assert test_table_s.get_item(Key={'p': s}, ConsistentRead=True)['Item'] == {'p': s, 'hello': 'world'}
test_table_s.update_item(Key={'p': s}, AttributeUpdates={'hi': {'Value': 'there', 'Action': 'PUT'}})
assert test_table_s.get_item(Key={'p': s}, ConsistentRead=True)['Item'] == {'p': s, 'hello': 'world', 'hi': 'there'}
# Tests for item operations in tables with non-string hash or sort keys.
# These tests focus only on the type of the key - everything else is kept
# as simple as possible (string attributes, no special options for GetItem
# or PutItem). These tests also check individual items only, not the sort
# order of sort keys - that should be verified in test_query.py, for example.
def test_bytes_hash_key(test_table_b):
# Bytes values are passed using base64 encoding, which has weird cases
# depending on len%3 and len%4. So let's try various lengths.
for length in range(10,18):
p = random_bytes(length)
val = random_string()
test_table_b.put_item(Item={'p': p, 'attribute': val})
assert test_table_b.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'attribute': val}
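The base64 length dependence that the comment in test_bytes_hash_key alludes to can be checked locally, with no DynamoDB involved. A minimal sketch of why lengths 10 through 17 cover the interesting cases:

```python
import base64

# Base64 maps every 3 input bytes to 4 output characters, padding the
# final group with '=' so the encoded length is always a multiple of 4.
for n in range(10, 18):
    encoded = base64.b64encode(b'\x00' * n)
    # ceil(n / 3) groups of 4 characters each
    assert len(encoded) == 4 * ((n + 2) // 3)
    # the number of '=' padding characters is determined by n % 3
    pad = {0: 0, 1: 2, 2: 1}[n % 3]
    assert encoded.endswith(b'=' * pad)
    assert not encoded.endswith(b'=' * (pad + 1))
```

Lengths 10..17 span all residues of both len%3 and len%4, which is why the test loops over that range.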
def test_bytes_sort_key(test_table_sb):
p = random_string()
c = random_bytes()
val = random_string()
test_table_sb.put_item(Item={'p': p, 'c': c, 'attribute': val})
assert test_table_sb.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item'] == {'p': p, 'c': c, 'attribute': val}
# Tests for using a large binary blob as hash key, sort key, or attribute.
# DynamoDB strictly limits the size of the binary hash key to 2048 bytes,
# and binary sort key to 1024 bytes, and refuses anything larger. The total
# size of an item is limited to 400KB, which also limits the size of the
# largest attributes. For more details on these limits, see
# https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Limits.html
# Alternator currently does *not* have these limitations, and can accept much
# larger keys and attributes, but what we do in the following tests is to verify
# that items up to DynamoDB's maximum sizes also work well in Alternator.
def test_large_blob_hash_key(test_table_b):
b = random_bytes(2048)
test_table_b.put_item(Item={'p': b})
assert test_table_b.get_item(Key={'p': b}, ConsistentRead=True)['Item'] == {'p': b}
def test_large_blob_sort_key(test_table_sb):
s = random_string()
b = random_bytes(1024)
test_table_sb.put_item(Item={'p': s, 'c': b})
assert test_table_sb.get_item(Key={'p': s, 'c': b}, ConsistentRead=True)['Item'] == {'p': s, 'c': b}
def test_large_blob_attribute(test_table):
p = random_string()
c = random_string()
b = random_bytes(409500) # a bit less than 400KB
test_table.put_item(Item={'p': p, 'c': c, 'attribute': b })
assert test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item'] == {'p': p, 'c': c, 'attribute': b}
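The 409500-byte blob in test_large_blob_attribute deliberately leaves headroom under DynamoDB's 400 KB item limit, which counts attribute names as well as values. A rough sketch of that accounting (an assumption about how the limit is tallied, not Alternator's code):

```python
# DynamoDB's item-size limit is 400 KB, counted over attribute names
# plus attribute values.
LIMIT = 400 * 1024  # 409600 bytes

item = {'p': 'x' * 10, 'c': 'y' * 10, 'attribute': b'\x00' * 409500}
# Sum the length of each attribute name plus its string/bytes value.
size = sum(len(name) + (len(v) if isinstance(v, (str, bytes)) else 0)
           for name, v in item.items())
assert size <= LIMIT
```

With the key names and values above, the item weighs in at 409531 bytes, just under the limit.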
# Checks that it is not allowed to use both old-style AttributeUpdates and
# new-style UpdateExpression in a single UpdateItem request.
def test_update_item_two_update_methods(test_table_s):
p = random_string()
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
AttributeUpdates={'a': {'Value': 3, 'Action': 'PUT'}},
UpdateExpression='SET b = :val1',
ExpressionAttributeValues={':val1': 4})
# Verify that having neither AttributeUpdates nor UpdateExpression is
# allowed, and results in creation of an empty item.
def test_update_item_no_update_method(test_table_s):
p = random_string()
assert not "Item" in test_table_s.get_item(Key={'p': p}, ConsistentRead=True)
test_table_s.update_item(Key={'p': p})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p}
# Test GetItem with the AttributesToGet parameter. Result should include the
# selected attributes only - if one wants the key attributes as well, one
# needs to select them explicitly. When no key attributes are selected,
# some items may have *none* of the selected attributes. Those items are
# returned too, as empty items - they are not outright missing.
def test_getitem_attributes_to_get(dynamodb, test_table):
p = random_string()
c = random_string()
item = {'p': p, 'c': c, 'a': 'hello', 'b': 'hi'}
test_table.put_item(Item=item)
for wanted in [ ['a'], # only non-key attribute
['c', 'a'], # a key attribute (sort key) and non-key
['p', 'c'], # entire key
['nonexistent'] # Our item doesn't have this
]:
got_item = test_table.get_item(Key={'p': p, 'c': c}, AttributesToGet=wanted, ConsistentRead=True)['Item']
expected_item = {k: item[k] for k in wanted if k in item}
assert expected_item == got_item
# Basic test for DeleteItem, with hash key only
def test_delete_item_hash(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p})
assert 'Item' in test_table_s.get_item(Key={'p': p}, ConsistentRead=True)
test_table_s.delete_item(Key={'p': p})
assert not 'Item' in test_table_s.get_item(Key={'p': p}, ConsistentRead=True)
# Basic test for DeleteItem, with hash and sort key
def test_delete_item_sort(test_table):
p = random_string()
c = random_string()
key = {'p': p, 'c': c}
test_table.put_item(Item=key)
assert 'Item' in test_table.get_item(Key=key, ConsistentRead=True)
test_table.delete_item(Key=key)
assert not 'Item' in test_table.get_item(Key=key, ConsistentRead=True)
# Test that PutItem completely replaces an existing item. It shouldn't merge
# it with a previously existing value, as UpdateItem does!
# We test for a table with just hash key, and for a table with both hash and
# sort keys.
def test_put_item_replace(test_table_s, test_table):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hi'}
test_table_s.put_item(Item={'p': p, 'b': 'hello'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 'hello'}
c = random_string()
test_table.put_item(Item={'p': p, 'c': c, 'a': 'hi'})
assert test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item'] == {'p': p, 'c': c, 'a': 'hi'}
test_table.put_item(Item={'p': p, 'c': c, 'b': 'hello'})
assert test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item'] == {'p': p, 'c': c, 'b': 'hello'}
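The replace-versus-merge distinction this test checks can be illustrated with plain dictionaries (an analogy for the two semantics, not the server's code path):

```python
# Replace semantics (PutItem-like): the new item fully supersedes the old.
old = {'p': 'k', 'a': 'hi'}
new = {'p': 'k', 'b': 'hello'}
replaced = dict(new)
assert replaced == {'p': 'k', 'b': 'hello'}  # 'a' is gone

# Merge semantics (UpdateItem-like): updates overlay the existing item.
merged = {**old, **new}
assert merged == {'p': 'k', 'a': 'hi', 'b': 'hello'}  # 'a' survives
```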

alternator-test/test_lsi.py
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Tests of LSI (Local Secondary Indexes)
#
# Note that many of these tests are slower than usual, because many of them
# need to create new tables and/or new LSIs of different types, operations
# which are extremely slow in DynamoDB, often taking minutes (!).
import pytest
import time
from botocore.exceptions import ClientError, ParamValidationError
from util import create_test_table, random_string, full_scan, full_query, multiset, list_tables
# Currently, Alternator's LSIs only support eventually consistent reads, so tests
# that involve writing to a table and then expect to read something from it cannot
# be guaranteed to succeed without retrying the read. The following utility
# functions make it easy to write such tests.
def assert_index_query(table, index_name, expected_items, **kwargs):
for i in range(3):
if multiset(expected_items) == multiset(full_query(table, IndexName=index_name, **kwargs)):
return
print('assert_index_query retrying')
time.sleep(1)
assert multiset(expected_items) == multiset(full_query(table, IndexName=index_name, **kwargs))
def assert_index_scan(table, index_name, expected_items, **kwargs):
for i in range(3):
if multiset(expected_items) == multiset(full_scan(table, IndexName=index_name, **kwargs)):
return
print('assert_index_scan retrying')
time.sleep(1)
assert multiset(expected_items) == multiset(full_scan(table, IndexName=index_name, **kwargs))
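The `multiset` helper imported from util is not shown in this diff. As a rough sketch (an assumption about its behavior, not the actual implementation), an order-insensitive, duplicate-respecting comparison of lists of flat dicts could look like:

```python
from collections import Counter

def multiset(items):
    # Canonicalize each dict into a hashable, order-independent form so
    # two lists can be compared ignoring order but respecting duplicates.
    # (Flat dicts only - nested values would not be hashable as-is.)
    return Counter(tuple(sorted(d.items())) for d in items)

assert multiset([{'a': 1}, {'b': 2}]) == multiset([{'b': 2}, {'a': 1}])
assert multiset([{'a': 1}, {'a': 1}]) != multiset([{'a': 1}])
```

This is why the assertions above compare `multiset(...)` on both sides: Query and Scan through an index may return items in a different order than the expected list.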
# Although quite silly, it is actually allowed to create an index which is
# identical to the base table.
def test_lsi_identical(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' }],
AttributeDefinitions=[{ 'AttributeName': 'p', 'AttributeType': 'S' }, { 'AttributeName': 'c', 'AttributeType': 'S' }],
LocalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [{ 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' }],
'Projection': { 'ProjectionType': 'ALL' }
}
])
items = [{'p': random_string(), 'c': random_string()} for i in range(10)]
with table.batch_writer() as batch:
for item in items:
batch.put_item(item)
# Scanning the entire table directly or via the index yields the same
# results (in different order).
assert multiset(items) == multiset(full_scan(table))
assert_index_scan(table, 'hello', items)
# We can't scan a non-existent index
with pytest.raises(ClientError, match='ValidationException'):
full_scan(table, IndexName='wrong')
table.delete()
# Checks that providing a hash key different from the base table's is not
# allowed, nor are duplicated keys or a missing sort key.
def test_lsi_wrong(dynamodb):
with pytest.raises(ClientError, match='ValidationException.*'):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'a', 'AttributeType': 'S' },
{ 'AttributeName': 'b', 'AttributeType': 'S' }
],
LocalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [
{ 'AttributeName': 'b', 'KeyType': 'HASH' },
{ 'AttributeName': 'p', 'KeyType': 'RANGE' }
],
'Projection': { 'ProjectionType': 'ALL' }
}
])
table.delete()
with pytest.raises(ClientError, match='ValidationException.*'):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'a', 'AttributeType': 'S' },
{ 'AttributeName': 'b', 'AttributeType': 'S' }
],
LocalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [
{ 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'p', 'KeyType': 'RANGE' }
],
'Projection': { 'ProjectionType': 'ALL' }
}
])
table.delete()
with pytest.raises(ClientError, match='ValidationException.*'):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'a', 'AttributeType': 'S' },
{ 'AttributeName': 'b', 'AttributeType': 'S' }
],
LocalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [
{ 'AttributeName': 'p', 'KeyType': 'HASH' }
],
'Projection': { 'ProjectionType': 'ALL' }
}
])
table.delete()
# A simple scenario for LSI. Base table has just hash key, Index has an
# additional sort key - one of the non-key attributes from the base table.
@pytest.fixture(scope="session")
def test_table_lsi_1(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' } ],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'c', 'AttributeType': 'S' },
{ 'AttributeName': 'b', 'AttributeType': 'S' },
],
LocalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [
{ 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'b', 'KeyType': 'RANGE' }
],
'Projection': { 'ProjectionType': 'ALL' }
}
])
yield table
table.delete()
def test_lsi_1(test_table_lsi_1):
items1 = [{'p': random_string(), 'c': random_string(), 'b': random_string()} for i in range(10)]
p1, b1 = items1[0]['p'], items1[0]['b']
p2, b2 = random_string(), random_string()
items2 = [{'p': p2, 'c': p2, 'b': b2}]
items = items1 + items2
with test_table_lsi_1.batch_writer() as batch:
for item in items:
batch.put_item(item)
expected_items = [i for i in items if i['p'] == p1 and i['b'] == b1]
assert_index_query(test_table_lsi_1, 'hello', expected_items,
KeyConditions={'p': {'AttributeValueList': [p1], 'ComparisonOperator': 'EQ'},
'b': {'AttributeValueList': [b1], 'ComparisonOperator': 'EQ'}})
expected_items = [i for i in items if i['p'] == p2 and i['b'] == b2]
assert_index_query(test_table_lsi_1, 'hello', expected_items,
KeyConditions={'p': {'AttributeValueList': [p2], 'ComparisonOperator': 'EQ'},
'b': {'AttributeValueList': [b2], 'ComparisonOperator': 'EQ'}})
# A second LSI scenario: the base table has both hash and sort keys, and
# a local index is created on each non-key attribute.
@pytest.fixture(scope="session")
def test_table_lsi_4(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' } ],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'c', 'AttributeType': 'S' },
{ 'AttributeName': 'x1', 'AttributeType': 'S' },
{ 'AttributeName': 'x2', 'AttributeType': 'S' },
{ 'AttributeName': 'x3', 'AttributeType': 'S' },
{ 'AttributeName': 'x4', 'AttributeType': 'S' },
],
LocalSecondaryIndexes=[
{ 'IndexName': 'hello_' + column,
'KeySchema': [
{ 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': column, 'KeyType': 'RANGE' }
],
'Projection': { 'ProjectionType': 'ALL' }
} for column in ['x1','x2','x3','x4']
])
yield table
table.delete()
def test_lsi_4(test_table_lsi_4):
items1 = [{'p': random_string(), 'c': random_string(),
'x1': random_string(), 'x2': random_string(), 'x3': random_string(), 'x4': random_string()} for i in range(10)]
i_values = items1[0]
i5 = random_string()
items2 = [{'p': i5, 'c': i5, 'x1': i5, 'x2': i5, 'x3': i5, 'x4': i5}]
items = items1 + items2
with test_table_lsi_4.batch_writer() as batch:
for item in items:
batch.put_item(item)
for column in ['x1', 'x2', 'x3', 'x4']:
expected_items = [i for i in items if (i['p'], i[column]) == (i_values['p'], i_values[column])]
assert_index_query(test_table_lsi_4, 'hello_' + column, expected_items,
KeyConditions={'p': {'AttributeValueList': [i_values['p']], 'ComparisonOperator': 'EQ'},
column: {'AttributeValueList': [i_values[column]], 'ComparisonOperator': 'EQ'}})
expected_items = [i for i in items if (i['p'], i[column]) == (i5, i5)]
assert_index_query(test_table_lsi_4, 'hello_' + column, expected_items,
KeyConditions={'p': {'AttributeValueList': [i5], 'ComparisonOperator': 'EQ'},
column: {'AttributeValueList': [i5], 'ComparisonOperator': 'EQ'}})
def test_lsi_describe(test_table_lsi_4):
desc = test_table_lsi_4.meta.client.describe_table(TableName=test_table_lsi_4.name)
assert 'Table' in desc
assert 'LocalSecondaryIndexes' in desc['Table']
lsis = desc['Table']['LocalSecondaryIndexes']
assert(sorted([lsi['IndexName'] for lsi in lsis]) == ['hello_x1', 'hello_x2', 'hello_x3', 'hello_x4'])
# TODO: check projection and key params
# TODO: check also ProvisionedThroughput, IndexArn
# A table with selective projection - only keys are projected into the index
@pytest.fixture(scope="session")
def test_table_lsi_keys_only(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' } ],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'c', 'AttributeType': 'S' },
{ 'AttributeName': 'b', 'AttributeType': 'S' }
],
LocalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [
{ 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'b', 'KeyType': 'RANGE' }
],
'Projection': { 'ProjectionType': 'KEYS_ONLY' }
}
])
yield table
table.delete()
# Check that it's possible to extract a non-projected attribute from the index,
# as the documentation promises
def test_lsi_get_not_projected_attribute(test_table_lsi_keys_only):
items1 = [{'p': random_string(), 'c': random_string(), 'b': random_string(), 'd': random_string()} for i in range(10)]
p1, b1, d1 = items1[0]['p'], items1[0]['b'], items1[0]['d']
p2, b2, d2 = random_string(), random_string(), random_string()
items2 = [{'p': p2, 'c': p2, 'b': b2, 'd': d2}]
items = items1 + items2
with test_table_lsi_keys_only.batch_writer() as batch:
for item in items:
batch.put_item(item)
expected_items = [i for i in items if i['p'] == p1 and i['b'] == b1 and i['d'] == d1]
assert_index_query(test_table_lsi_keys_only, 'hello', expected_items,
KeyConditions={'p': {'AttributeValueList': [p1], 'ComparisonOperator': 'EQ'},
'b': {'AttributeValueList': [b1], 'ComparisonOperator': 'EQ'}},
Select='ALL_ATTRIBUTES')
expected_items = [i for i in items if i['p'] == p2 and i['b'] == b2 and i['d'] == d2]
assert_index_query(test_table_lsi_keys_only, 'hello', expected_items,
KeyConditions={'p': {'AttributeValueList': [p2], 'ComparisonOperator': 'EQ'},
'b': {'AttributeValueList': [b2], 'ComparisonOperator': 'EQ'}},
Select='ALL_ATTRIBUTES')
expected_items = [{'d': i['d']} for i in items if i['p'] == p2 and i['b'] == b2 and i['d'] == d2]
assert_index_query(test_table_lsi_keys_only, 'hello', expected_items,
KeyConditions={'p': {'AttributeValueList': [p2], 'ComparisonOperator': 'EQ'},
'b': {'AttributeValueList': [b2], 'ComparisonOperator': 'EQ'}},
Select='SPECIFIC_ATTRIBUTES', AttributesToGet=['d'])
# Check that only projected attributes can be extracted
@pytest.mark.xfail(reason="LSI in alternator currently only implement full projections")
def test_lsi_get_all_projected_attributes(test_table_lsi_keys_only):
items1 = [{'p': random_string(), 'c': random_string(), 'b': random_string(), 'd': random_string()} for i in range(10)]
p1, b1, d1 = items1[0]['p'], items1[0]['b'], items1[0]['d']
p2, b2, d2 = random_string(), random_string(), random_string()
items2 = [{'p': p2, 'c': p2, 'b': b2, 'd': d2}]
items = items1 + items2
with test_table_lsi_keys_only.batch_writer() as batch:
for item in items:
batch.put_item(item)
expected_items = [{'p': i['p'], 'c': i['c'],'b': i['b']} for i in items if i['p'] == p1 and i['b'] == b1]
assert_index_query(test_table_lsi_keys_only, 'hello', expected_items,
KeyConditions={'p': {'AttributeValueList': [p1], 'ComparisonOperator': 'EQ'},
'b': {'AttributeValueList': [b1], 'ComparisonOperator': 'EQ'}})
# Check that strongly consistent reads are allowed for LSI
def test_lsi_consistent_read(test_table_lsi_1):
items1 = [{'p': random_string(), 'c': random_string(), 'b': random_string()} for i in range(10)]
p1, b1 = items1[0]['p'], items1[0]['b']
p2, b2 = random_string(), random_string()
items2 = [{'p': p2, 'c': p2, 'b': b2}]
items = items1 + items2
with test_table_lsi_1.batch_writer() as batch:
for item in items:
batch.put_item(item)
expected_items = [i for i in items if i['p'] == p1 and i['b'] == b1]
assert_index_query(test_table_lsi_1, 'hello', expected_items,
KeyConditions={'p': {'AttributeValueList': [p1], 'ComparisonOperator': 'EQ'},
'b': {'AttributeValueList': [b1], 'ComparisonOperator': 'EQ'}},
ConsistentRead=True)
expected_items = [i for i in items if i['p'] == p2 and i['b'] == b2]
assert_index_query(test_table_lsi_1, 'hello', expected_items,
KeyConditions={'p': {'AttributeValueList': [p2], 'ComparisonOperator': 'EQ'},
'b': {'AttributeValueList': [b2], 'ComparisonOperator': 'EQ'}},
ConsistentRead=True)
# A table with both gsi and lsi present
@pytest.fixture(scope="session")
def test_table_lsi_gsi(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' } ],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'c', 'AttributeType': 'S' },
{ 'AttributeName': 'x1', 'AttributeType': 'S' },
],
GlobalSecondaryIndexes=[
{ 'IndexName': 'hello_g1',
'KeySchema': [
{ 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'x1', 'KeyType': 'RANGE' }
],
'Projection': { 'ProjectionType': 'KEYS_ONLY' }
}
],
LocalSecondaryIndexes=[
{ 'IndexName': 'hello_l1',
'KeySchema': [
{ 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'x1', 'KeyType': 'RANGE' }
],
'Projection': { 'ProjectionType': 'KEYS_ONLY' }
}
])
yield table
table.delete()
# Test that GSI and LSI can coexist, even if they're identical
def test_lsi_and_gsi(test_table_lsi_gsi):
desc = test_table_lsi_gsi.meta.client.describe_table(TableName=test_table_lsi_gsi.name)
assert 'Table' in desc
assert 'LocalSecondaryIndexes' in desc['Table']
assert 'GlobalSecondaryIndexes' in desc['Table']
lsis = desc['Table']['LocalSecondaryIndexes']
gsis = desc['Table']['GlobalSecondaryIndexes']
assert(sorted([lsi['IndexName'] for lsi in lsis]) == ['hello_l1'])
assert(sorted([gsi['IndexName'] for gsi in gsis]) == ['hello_g1'])
items = [{'p': random_string(), 'c': random_string(), 'x1': random_string()} for i in range(17)]
p1, c1, x1 = items[0]['p'], items[0]['c'], items[0]['x1']
with test_table_lsi_gsi.batch_writer() as batch:
for item in items:
batch.put_item(item)
for index in ['hello_g1', 'hello_l1']:
expected_items = [i for i in items if i['p'] == p1 and i['x1'] == x1]
assert_index_query(test_table_lsi_gsi, index, expected_items,
KeyConditions={'p': {'AttributeValueList': [p1], 'ComparisonOperator': 'EQ'},
'x1': {'AttributeValueList': [x1], 'ComparisonOperator': 'EQ'}})

# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Test for operations on items with *nested* attributes.
import pytest
from botocore.exceptions import ClientError
from util import random_string
# Test that we can write a top-level attribute that is a nested document, and
# read it back correctly.
def test_nested_document_attribute_write(test_table_s):
nested_value = {
'a': 3,
'b': {'c': 'hello', 'd': ['hi', 'there', {'x': 'y'}, '42']},
}
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': nested_value})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': nested_value}
# Test that if we have a top-level attribute that is a nested document (i.e.,
# a dictionary), updating this attribute will replace it entirely with a new
# nested document - not merge the new content into the old.
def test_nested_document_attribute_overwrite(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': {'b': 3, 'c': 4}, 'd': 5})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': {'b': 3, 'c': 4}, 'd': 5}
test_table_s.update_item(Key={'p': p}, AttributeUpdates={'a': {'Value': {'c': 5}, 'Action': 'PUT'}})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': {'c': 5}, 'd': 5}
# Moreover, we can overwrite an entire nested document by, say, a string,
# and that's also fine.
def test_nested_document_attribute_overwrite_2(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': {'b': 3, 'c': 4}, 'd': 5})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': {'b': 3, 'c': 4}, 'd': 5}
test_table_s.update_item(Key={'p': p}, AttributeUpdates={'a': {'Value': 'hi', 'Action': 'PUT'}})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hi', 'd': 5}
# Verify that AttributeUpdates cannot be used to update a nested attribute -
# trying to use a dot in the attribute name will just create an attribute
# with an actual dot in its name.
def test_attribute_updates_dot(test_table_s):
p = random_string()
test_table_s.update_item(Key={'p': p}, AttributeUpdates={'a.b': {'Value': 3, 'Action': 'PUT'}})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a.b': 3}
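The distinction this test checks - a top-level attribute literally named "a.b" versus a nested path - mirrors plain dictionaries:

```python
# A top-level key containing a dot is distinct from a nested structure.
flat = {'a.b': 3}          # one attribute literally named "a.b"
nested = {'a': {'b': 3}}   # attribute "a" holding a document with key "b"

assert 'a.b' in flat and 'a' not in flat
assert 'a' in nested and nested['a']['b'] == 3
assert flat != nested
```

AttributeUpdates operates only on the flat form; reaching into the nested form requires an UpdateExpression with a document path.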

# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Tests for the various operations (GetItem, Query, Scan) with a
# ProjectionExpression parameter.
#
# ProjectionExpression is an extension of the legacy AttributesToGet
# parameter. Both parameters request that only a subset of the attributes
# be fetched for each item, instead of all of them. But while AttributesToGet
# was limited to top-level attributes, ProjectionExpression can request also
# nested attributes.
import pytest
from botocore.exceptions import ClientError
from util import random_string, full_scan, full_query, multiset
# Basic test for ProjectionExpression, requesting only top-level attributes.
# Result should include the selected attributes only - if one wants the key
# attributes as well, one needs to select them explicitly. When no key
# attributes are selected, an item may have *none* of the selected
# attributes, and is returned as an empty item.
def test_projection_expression_toplevel(test_table):
p = random_string()
c = random_string()
item = {'p': p, 'c': c, 'a': 'hello', 'b': 'hi'}
test_table.put_item(Item=item)
for wanted in [ ['a'], # only non-key attribute
['c', 'a'], # a key attribute (sort key) and non-key
['p', 'c'], # entire key
['nonexistent'] # Our item doesn't have this
]:
got_item = test_table.get_item(Key={'p': p, 'c': c}, ProjectionExpression=",".join(wanted), ConsistentRead=True)['Item']
expected_item = {k: item[k] for k in wanted if k in item}
assert expected_item == got_item
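The expected-item computation in the loop above amounts to a simple projection over top-level keys. Pulled out as a stand-alone helper (hypothetical, for illustration only):

```python
def project(item, wanted):
    # Keep only the requested top-level attributes; attributes the item
    # lacks are simply absent from the result, so an item that has none
    # of them comes back as an empty dict - not as a missing item.
    return {k: item[k] for k in wanted if k in item}

item = {'p': 'x', 'c': 'y', 'a': 'hello', 'b': 'hi'}
assert project(item, ['a']) == {'a': 'hello'}
assert project(item, ['p', 'c']) == {'p': 'x', 'c': 'y'}
assert project(item, ['nonexistent']) == {}
```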
# Various simple tests for ProjectionExpression's syntax, using only top-level
# attributes.
def test_projection_expression_toplevel_syntax(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hello', 'b': 'hi'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a')['Item'] == {'a': 'hello'}
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='#name', ExpressionAttributeNames={'#name': 'a'})['Item'] == {'a': 'hello'}
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a,b')['Item'] == {'a': 'hello', 'b': 'hi'}
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression=' a , b ')['Item'] == {'a': 'hello', 'b': 'hi'}
# Missing or unused names in ExpressionAttributeNames are errors:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='#name', ExpressionAttributeNames={'#wrong': 'a'})['Item'] == {'a': 'hello'}
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='#name', ExpressionAttributeNames={'#name': 'a', '#unused': 'b'})['Item'] == {'a': 'hello'}
# It is not allowed to fetch the same top-level attribute twice (or in
# general, list two overlapping attributes). We get an error like
# "Invalid ProjectionExpression: Two document paths overlap with each
# other; must remove or rewrite one of these paths; path one: [a], path
# two: [a]".
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a,a')['Item']
# A comma with nothing after it is a syntax error:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a,')['Item']
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression=',a')['Item']
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a,,b')['Item']
# An empty ProjectionExpression is not allowed. DynamoDB recognizes its
# syntax, but then writes: "Invalid ProjectionExpression: The expression
# can not be empty".
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='')['Item']
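The syntax rules exercised by the tests above (comma-separated names, `#placeholder` resolution, no empty expression or component, no duplicates, no missing or unused names) can be summarized with a small sketch. This is a hypothetical helper written just to illustrate the rules, not Alternator's actual parser:

```python
# Hypothetical validator for top-level ProjectionExpression syntax, as
# exercised by the tests above. Raises ValueError for the same inputs
# that DynamoDB rejects with ValidationException.
def parse_projection(expr, names=None):
    names = names or {}
    if expr.strip() == '':
        raise ValueError('The expression can not be empty')
    used = set()
    attrs = []
    for part in expr.split(','):
        part = part.strip()
        if part == '':
            # Covers 'a,', ',a' and 'a,,b'
            raise ValueError('Syntax error: empty path component')
        if part.startswith('#'):
            if part not in names:
                raise ValueError('Missing ExpressionAttributeNames entry: ' + part)
            used.add(part)
            part = names[part]
        if part in attrs:
            raise ValueError('Two document paths overlap: ' + part)
        attrs.append(part)
    if used != set(names):
        raise ValueError('Unused entries in ExpressionAttributeNames')
    return attrs
```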
# The following two tests are similar to test_projection_expression_toplevel()
# which tested the GetItem operation - but these test Scan and Query.
# Both test ProjectionExpression with only top-level attributes.
def test_projection_expression_scan(filled_test_table):
table, items = filled_test_table
for wanted in [ ['another'], # only non-key attributes (one item doesn't have it!)
['c', 'another'], # a key attribute (sort key) and non-key
['p', 'c'], # entire key
['nonexistent'] # none of the items have this attribute!
]:
got_items = full_scan(table, ProjectionExpression=",".join(wanted))
expected_items = [{k: x[k] for k in wanted if k in x} for x in items]
assert multiset(expected_items) == multiset(got_items)
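These tests compare results order-insensitively with `multiset()` from util.py. One plausible implementation of such a helper (an assumption about what util.multiset does, valid for items whose attribute values are hashable) counts each item using a hashable view of the dict:

```python
from collections import Counter

# A plausible sketch of the order-insensitive comparison the tests rely
# on (the real helper lives in util.py): count each item via a hashable
# representation, so two lists compare equal iff they contain the same
# items with the same multiplicities, regardless of order.
def multiset(items):
    return Counter(frozenset(item.items()) for item in items)
```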
def test_projection_expression_query(test_table):
p = random_string()
items = [{'p': p, 'c': str(i), 'a': str(i*10), 'b': str(i*100) } for i in range(10)]
with test_table.batch_writer() as batch:
for item in items:
batch.put_item(item)
for wanted in [ ['a'], # only non-key attributes
['c', 'a'], # a key attribute (sort key) and non-key
['p', 'c'], # entire key
['nonexistent'] # none of the items have this attribute!
]:
got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, ProjectionExpression=",".join(wanted))
expected_items = [{k: x[k] for k in wanted if k in x} for x in items]
assert multiset(expected_items) == multiset(got_items)
# The previous tests all fetched only top-level attributes. They could all
# be written using AttributesToGet instead of ProjectionExpression (and,
# in fact, we do have similar tests with AttributesToGet in other files),
# but the previous test checked that the alternative syntax works correctly.
# The following test checks fetching more elaborate attribute paths from
# nested documents.
@pytest.mark.xfail(reason="ProjectionExpression does not yet support attribute paths")
def test_projection_expression_path(test_table_s):
p = random_string()
test_table_s.put_item(Item={
'p': p,
'a': {'b': [2, 4, {'x': 'hi', 'y': 'yo'}], 'c': 5},
'b': 'hello'
})
# Fetching the entire nested document "a" works, of course:
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a')['Item'] == {'a': {'b': [2, 4, {'x': 'hi', 'y': 'yo'}], 'c': 5}}
# If we fetch a.b, we get only the content of b - but it's still inside
# the a dictionary:
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b')['Item'] == {'a': {'b': [2, 4, {'x': 'hi', 'y': 'yo'}]}}
# Similarly, fetching a.b[0] gives us a one-element array in a dictionary.
# Note that [0] is the first element of an array.
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b[0]')['Item'] == {'a': {'b': [2]}}
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b[2]')['Item'] == {'a': {'b': [{'x': 'hi', 'y': 'yo'}]}}
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b[2].y')['Item'] == {'a': {'b': [{'y': 'yo'}]}}
# Trying to read any sort of non-existent attribute returns an empty item.
# This includes a non-existing top-level attribute, an attempt to read
# beyond the end of an array or a non-existent member of a dictionary, as
# well as paths which begin with a non-existent prefix.
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='x')['Item'] == {}
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b[3]')['Item'] == {}
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.x')['Item'] == {}
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.x.y')['Item'] == {}
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b[3].x')['Item'] == {}
# We can read multiple paths - the results are merged into one object
# structured the same way as in the original item:
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b[0],a.b[1]')['Item'] == {'a': {'b': [2, 4]}}
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b[0],a.c')['Item'] == {'a': {'b': [2], 'c': 5}}
# It is not allowed to read the same path multiple times. The error from
# DynamoDB looks like: "Invalid ProjectionExpression: Two document paths
# overlap with each other; must remove or rewrite one of these paths;
# path one: [a, b, [0]], path two: [a, b, [0]]".
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b[0],a.b[0]')['Item']
# Two paths are considered to "overlap" if the content of one path
# contains the content of the second path. So requesting both "a" and
# "a.b[0]" is not allowed.
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a,a.b[0]')['Item']
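The single-path behavior checked above can be sketched as follows. This is a hypothetical illustration, not Alternator's implementation: walk a dotted path with optional `[i]` indexes and rebuild only the matched leaf, so `a.b[2].y` yields `{'a': {'b': [{'y': ...}]}}`, while any missing step along the way yields an empty item.

```python
import re

# Hypothetical sketch of single-path projection. Output lists keep only
# the selected element, mirroring the compaction seen in the tests.
def _steps(path):
    for name, idx in re.findall(r'([A-Za-z_]\w*)|\[(\d+)\]', path):
        yield name if name else int(idx)

def project(item, path):
    def walk(node, steps):
        if not steps:
            return True, node
        s, rest = steps[0], steps[1:]
        if isinstance(s, int):
            if isinstance(node, list) and s < len(node):
                ok, sub = walk(node[s], rest)
                if ok:
                    return True, [sub]
        elif isinstance(node, dict) and s in node:
            ok, sub = walk(node[s], rest)
            if ok:
                return True, {s: sub}
        return False, None   # missing step anywhere -> empty item
    ok, result = walk(item, list(_steps(path)))
    return result if ok else {}
```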
@pytest.mark.xfail(reason="ProjectionExpression does not yet support attribute paths")
def test_query_projection_expression_path(test_table):
p = random_string()
items = [{'p': p, 'c': str(i), 'a': {'x': str(i*10), 'y': 'hi'}, 'b': 'hello' } for i in range(10)]
with test_table.batch_writer() as batch:
for item in items:
batch.put_item(item)
got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, ProjectionExpression="a.x")
expected_items = [{'a': {'x': x['a']['x']}} for x in items]
assert multiset(expected_items) == multiset(got_items)
@pytest.mark.xfail(reason="ProjectionExpression does not yet support attribute paths")
def test_scan_projection_expression_path(test_table):
# This test is similar to test_query_projection_expression_path above,
# but uses a scan instead of a query. The scan will generate unrelated
# partitions created by other tests (hopefully not too many...) that we
# need to ignore. We also need to ask for "p", so we can filter by it.
p = random_string()
items = [{'p': p, 'c': str(i), 'a': {'x': str(i*10), 'y': 'hi'}, 'b': 'hello' } for i in range(10)]
with test_table.batch_writer() as batch:
for item in items:
batch.put_item(item)
got_items = [ x for x in full_scan(test_table, ProjectionExpression="p, a.x") if x['p'] == p]
expected_items = [{'p': p, 'a': {'x': x['a']['x']}} for x in items]
assert multiset(expected_items) == multiset(got_items)
# It is not allowed to use both ProjectionExpression and its older cousin,
# AttributesToGet, together. If trying to do this, DynamoDB produces an error
# like "Can not use both expression and non-expression parameters in the same
# request: Non-expression parameters: {AttributesToGet} Expression
# parameters: {ProjectionExpression}".
def test_projection_expression_and_attributes_to_get(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hello', 'b': 'hi'})
with pytest.raises(ClientError, match='ValidationException.*both'):
test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a', AttributesToGet=['b'])['Item']
with pytest.raises(ClientError, match='ValidationException.*both'):
full_scan(test_table_s, ProjectionExpression='a', AttributesToGet=['a'])
with pytest.raises(ClientError, match='ValidationException.*both'):
full_query(test_table_s, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, ProjectionExpression='a', AttributesToGet=['a'])

# -*- coding: utf-8 -*-
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Tests for the Query operation
import random
import pytest
from botocore.exceptions import ClientError, ParamValidationError
from decimal import Decimal
from util import random_string, random_bytes, full_query, multiset
from boto3.dynamodb.conditions import Key, Attr
# Test that scanning works fine with in-stock paginator
def test_query_basic_restrictions(dynamodb, filled_test_table):
test_table, items = filled_test_table
paginator = dynamodb.meta.client.get_paginator('query')
# EQ
got_items = []
for page in paginator.paginate(TableName=test_table.name, KeyConditions={
'p' : {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'}
}):
got_items += page['Items']
print(got_items)
assert multiset([item for item in items if item['p'] == 'long']) == multiset(got_items)
# LT
got_items = []
for page in paginator.paginate(TableName=test_table.name, KeyConditions={
'p' : {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'},
'c' : {'AttributeValueList': ['12'], 'ComparisonOperator': 'LT'}
}):
got_items += page['Items']
print(got_items)
assert multiset([item for item in items if item['p'] == 'long' and item['c'] < '12']) == multiset(got_items)
# LE
got_items = []
for page in paginator.paginate(TableName=test_table.name, KeyConditions={
'p' : {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'},
'c' : {'AttributeValueList': ['14'], 'ComparisonOperator': 'LE'}
}):
got_items += page['Items']
print(got_items)
assert multiset([item for item in items if item['p'] == 'long' and item['c'] <= '14']) == multiset(got_items)
# GT
got_items = []
for page in paginator.paginate(TableName=test_table.name, KeyConditions={
'p' : {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'},
'c' : {'AttributeValueList': ['15'], 'ComparisonOperator': 'GT'}
}):
got_items += page['Items']
print(got_items)
assert multiset([item for item in items if item['p'] == 'long' and item['c'] > '15']) == multiset(got_items)
# GE
got_items = []
for page in paginator.paginate(TableName=test_table.name, KeyConditions={
'p' : {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'},
'c' : {'AttributeValueList': ['14'], 'ComparisonOperator': 'GE'}
}):
got_items += page['Items']
print(got_items)
assert multiset([item for item in items if item['p'] == 'long' and item['c'] >= '14']) == multiset(got_items)
# BETWEEN
got_items = []
for page in paginator.paginate(TableName=test_table.name, KeyConditions={
'p' : {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'},
'c' : {'AttributeValueList': ['155', '164'], 'ComparisonOperator': 'BETWEEN'}
}):
got_items += page['Items']
print(got_items)
assert multiset([item for item in items if item['p'] == 'long' and item['c'] >= '155' and item['c'] <= '164']) == multiset(got_items)
# BEGINS_WITH
got_items = []
for page in paginator.paginate(TableName=test_table.name, KeyConditions={
'p' : {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'},
'c' : {'AttributeValueList': ['11'], 'ComparisonOperator': 'BEGINS_WITH'}
}):
print([item for item in items if item['p'] == 'long' and item['c'].startswith('11')])
got_items += page['Items']
print(got_items)
assert multiset([item for item in items if item['p'] == 'long' and item['c'].startswith('11')]) == multiset(got_items)
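The comparison operators exercised above can be modeled as plain predicates over an attribute value. This is a sketch for illustration only, not how Alternator evaluates KeyConditions; `matches` and `OPERATORS` are hypothetical names:

```python
# Sketch of the KeyConditions comparison operators exercised above.
# vals is the AttributeValueList; BETWEEN is inclusive on both ends,
# and BEGINS_WITH applies to string (or bytes) sort keys.
OPERATORS = {
    'EQ':          lambda v, vals: v == vals[0],
    'LT':          lambda v, vals: v < vals[0],
    'LE':          lambda v, vals: v <= vals[0],
    'GT':          lambda v, vals: v > vals[0],
    'GE':          lambda v, vals: v >= vals[0],
    'BETWEEN':     lambda v, vals: vals[0] <= v <= vals[1],
    'BEGINS_WITH': lambda v, vals: v.startswith(vals[0]),
}

def matches(item, conditions):
    return all(
        OPERATORS[cond['ComparisonOperator']](item[attr], cond['AttributeValueList'])
        for attr, cond in conditions.items())
```

Note that all comparisons here are on the native Python values, so string sort keys compare lexicographically, just as the tests above expect (e.g., '100' < '12').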
# Test that KeyConditionExpression parameter is supported
@pytest.mark.xfail(reason="KeyConditionExpression not supported yet")
def test_query_key_condition_expression(dynamodb, filled_test_table):
test_table, items = filled_test_table
paginator = dynamodb.meta.client.get_paginator('query')
got_items = []
for page in paginator.paginate(TableName=test_table.name, KeyConditionExpression=Key("p").eq("long") & Key("c").lt("12")):
got_items += page['Items']
print(got_items)
assert multiset([item for item in items if item['p'] == 'long' and item['c'] < '12']) == multiset(got_items)
def test_begins_with(dynamodb, test_table):
paginator = dynamodb.meta.client.get_paginator('query')
items = [{'p': 'unorthodox_chars', 'c': sort_key, 'str': 'a'} for sort_key in [u'ÿÿÿ', u'cÿbÿ', u'cÿbÿÿabg'] ]
with test_table.batch_writer() as batch:
for item in items:
batch.put_item(item)
# TODO(sarna): Once bytes type is supported, the \xFF character should be tested
got_items = []
for page in paginator.paginate(TableName=test_table.name, KeyConditions={
'p' : {'AttributeValueList': ['unorthodox_chars'], 'ComparisonOperator': 'EQ'},
'c' : {'AttributeValueList': [u'ÿÿ'], 'ComparisonOperator': 'BEGINS_WITH'}
}):
got_items += page['Items']
print(got_items)
assert sorted([d['c'] for d in got_items]) == sorted([d['c'] for d in items if d['c'].startswith(u'ÿÿ')])
got_items = []
for page in paginator.paginate(TableName=test_table.name, KeyConditions={
'p' : {'AttributeValueList': ['unorthodox_chars'], 'ComparisonOperator': 'EQ'},
'c' : {'AttributeValueList': [u'cÿbÿ'], 'ComparisonOperator': 'BEGINS_WITH'}
}):
got_items += page['Items']
print(got_items)
assert sorted([d['c'] for d in got_items]) == sorted([d['c'] for d in items if d['c'].startswith(u'cÿbÿ')])
def test_begins_with_wrong_type(dynamodb, test_table_sn):
paginator = dynamodb.meta.client.get_paginator('query')
with pytest.raises(ClientError, match='ValidationException'):
for page in paginator.paginate(TableName=test_table_sn.name, KeyConditions={
'p' : {'AttributeValueList': ['unorthodox_chars'], 'ComparisonOperator': 'EQ'},
'c' : {'AttributeValueList': [17], 'ComparisonOperator': 'BEGINS_WITH'}
}):
pass
# Items returned by Query should be sorted by the sort key. The following
# tests verify that this is indeed the case, for the three allowed key types:
# strings, binary, and numbers. These tests test not just the Query operation,
# but inherently that the sort-key sorting works.
def test_query_sort_order_string(test_table):
# Insert a lot of random items in one new partition:
# str(i) has a non-obvious sort order (e.g., "100" comes before "2") so is a nice test.
p = random_string()
items = [{'p': p, 'c': str(i)} for i in range(128)]
with test_table.batch_writer() as batch:
for item in items:
batch.put_item(item)
got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}})
assert len(items) == len(got_items)
# Extract just the sort key ("c") from the items
sort_keys = [x['c'] for x in items]
got_sort_keys = [x['c'] for x in got_items]
# Verify that got_sort_keys are already sorted (in string order)
assert sorted(got_sort_keys) == got_sort_keys
# Verify that got_sort_keys are a sorted version of the expected sort_keys
assert sorted(sort_keys) == got_sort_keys
def test_query_sort_order_bytes(test_table_sb):
# Insert a lot of random items in one new partition:
# We arbitrarily use random_bytes with a random length.
p = random_string()
items = [{'p': p, 'c': random_bytes(10)} for i in range(128)]
with test_table_sb.batch_writer() as batch:
for item in items:
batch.put_item(item)
got_items = full_query(test_table_sb, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}})
assert len(items) == len(got_items)
sort_keys = [x['c'] for x in items]
got_sort_keys = [x['c'] for x in got_items]
# Boto3's "Binary" objects are sorted as if bytes are signed integers.
# This isn't the order that DynamoDB itself uses (byte 0 should be first,
# not byte -128). Sorting the byte array ".value" works.
assert sorted(got_sort_keys, key=lambda x: x.value) == got_sort_keys
assert sorted(sort_keys) == got_sort_keys
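The ordering quirk noted in the comments above can be demonstrated directly: Python's `bytes` compare as unsigned octets (DynamoDB's order, with byte 0x00 first), whereas viewing each byte as a signed int8 would sort 0x80..0xFF (the negative values) first. The `signed_key` helper below is hypothetical, written only to show the difference:

```python
# Unsigned (DynamoDB-style) vs signed byte ordering, illustrating why
# sorting boto3 Binary objects by their raw ".value" bytes gives the
# correct order while the Binary objects' own ordering does not.
def signed_key(b):
    return [x - 256 if x >= 128 else x for x in b]

vals = [b'\x00', b'\x7f', b'\x80', b'\xff']
unsigned_order = sorted(vals)                 # 0x00 first, 0xff last
signed_order = sorted(vals, key=signed_key)   # 0x80 (i.e., -128) first
```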
def test_query_sort_order_number(test_table_sn):
# This is a list of numbers, sorted in correct order, and each suitable
# for accurate representation by Alternator's number type.
numbers = [
Decimal("-2e10"),
Decimal("-7.1e2"),
Decimal("-4.1"),
Decimal("-0.1"),
Decimal("-1e-5"),
Decimal("0"),
Decimal("2e-5"),
Decimal("0.15"),
Decimal("1"),
Decimal("1.00000000000000000000000001"),
Decimal("3.14159"),
Decimal("3.1415926535897932384626433832795028841"),
Decimal("31.4"),
Decimal("1.4e10"),
]
# Insert these numbers, in random order, into one partition:
p = random_string()
items = [{'p': p, 'c': num} for num in random.sample(numbers, len(numbers))]
with test_table_sn.batch_writer() as batch:
for item in items:
batch.put_item(item)
# Finally, verify that we get back exactly the same numbers (with identical
# precision), and in their original sorted order.
got_items = full_query(test_table_sn, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}})
got_sort_keys = [x['c'] for x in got_items]
assert got_sort_keys == numbers
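The reason the list above uses `Decimal` rather than `float` is precision: DynamoDB numbers carry up to 38 significant digits, which `Decimal` round-trips exactly while a Python float silently collapses. A minimal demonstration:

```python
from decimal import Decimal

# Decimal preserves the distinction between 1 and
# 1.00000000000000000000000001; float does not.
a = Decimal("1.00000000000000000000000001")
b = Decimal(1)
distinct_as_decimal = (a != b)
collapsed_as_float = (float(a) == float(b))
```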
def test_query_filtering_attributes_equality(filled_test_table):
test_table, items = filled_test_table
query_filter = {
"attribute" : {
"AttributeValueList" : [ "xxxx" ],
"ComparisonOperator": "EQ"
}
}
got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'}}, QueryFilter=query_filter)
print(got_items)
assert multiset([item for item in items if item['p'] == 'long' and item['attribute'] == 'xxxx']) == multiset(got_items)
query_filter = {
"attribute" : {
"AttributeValueList" : [ "xxxx" ],
"ComparisonOperator": "EQ"
},
"another" : {
"AttributeValueList" : [ "yy" ],
"ComparisonOperator": "EQ"
}
}
got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'}}, QueryFilter=query_filter)
print(got_items)
assert multiset([item for item in items if item['p'] == 'long' and item['attribute'] == 'xxxx' and item['another'] == 'yy']) == multiset(got_items)
# Test that FilterExpression works as expected
@pytest.mark.xfail(reason="FilterExpression not supported yet")
def test_query_filter_expression(filled_test_table):
test_table, items = filled_test_table
got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'}}, FilterExpression=Attr("attribute").eq("xxxx"))
print(got_items)
assert multiset([item for item in items if item['p'] == 'long' and item['attribute'] == 'xxxx']) == multiset(got_items)
got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'}}, FilterExpression=Attr("attribute").eq("xxxx") & Attr("another").eq("yy"))
print(got_items)
assert multiset([item for item in items if item['p'] == 'long' and item['attribute'] == 'xxxx' and item['another'] == 'yy']) == multiset(got_items)
# QueryFilter can only contain non-key attributes, to be compatible with DynamoDB
def test_query_filtering_key_equality(filled_test_table):
test_table, items = filled_test_table
with pytest.raises(ClientError, match='ValidationException'):
query_filter = {
"c" : {
"AttributeValueList" : [ "5" ],
"ComparisonOperator": "EQ"
}
}
got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'}}, QueryFilter=query_filter)
print(got_items)
with pytest.raises(ClientError, match='ValidationException'):
query_filter = {
"attribute" : {
"AttributeValueList" : [ "x" ],
"ComparisonOperator": "EQ"
},
"p" : {
"AttributeValueList" : [ "5" ],
"ComparisonOperator": "EQ"
}
}
got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'}}, QueryFilter=query_filter)
print(got_items)
# Test Query with the AttributesToGet parameter. Result should include the
# selected attributes only - if one wants the key attributes as well, one
# needs to select them explicitly. When no key attributes are selected,
# some items may have *none* of the selected attributes. Those items are
# returned too, as empty items - they are not outright missing.
def test_query_attributes_to_get(dynamodb, test_table):
p = random_string()
items = [{'p': p, 'c': str(i), 'a': str(i*10), 'b': str(i*100) } for i in range(10)]
with test_table.batch_writer() as batch:
for item in items:
batch.put_item(item)
for wanted in [ ['a'], # only non-key attributes
['c', 'a'], # a key attribute (sort key) and non-key
['p', 'c'], # entire key
['nonexistent'] # none of the items have this attribute!
]:
got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, AttributesToGet=wanted)
expected_items = [{k: x[k] for k in wanted if k in x} for x in items]
assert multiset(expected_items) == multiset(got_items)
# Test that in a table with both hash key and sort key, which keys we can
# Query by: We can Query by the hash key, by a combination of both hash and
# sort keys, but *cannot* query by just the sort key, and obviously not
# by any non-key column.
def test_query_which_key(test_table):
p = random_string()
c = random_string()
p2 = random_string()
c2 = random_string()
item1 = {'p': p, 'c': c}
item2 = {'p': p, 'c': c2}
item3 = {'p': p2, 'c': c}
for i in [item1, item2, item3]:
test_table.put_item(Item=i)
# Query by hash key only:
got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}})
expected_items = [item1, item2]
assert multiset(expected_items) == multiset(got_items)
# Query by hash key *and* sort key (this is basically a GetItem):
got_items = full_query(test_table, KeyConditions={
'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'},
'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}
})
expected_items = [item1]
assert multiset(expected_items) == multiset(got_items)
# Query by sort key alone is not allowed. DynamoDB reports:
# "Query condition missed key schema element: p".
with pytest.raises(ClientError, match='ValidationException'):
full_query(test_table, KeyConditions={
'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}
})
# Query by a non-key isn't allowed, for the same reason - that the
# actual hash key (p) is missing in the query:
with pytest.raises(ClientError, match='ValidationException'):
full_query(test_table, KeyConditions={
'z': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}
})
# If we try both p and a non-key we get a complaint that the sort
# key is missing: "Query condition missed key schema element: c"
with pytest.raises(ClientError, match='ValidationException'):
full_query(test_table, KeyConditions={
'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'},
'z': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}
})
# If we try p, c and another key, we get an error that
# "Conditions can be of length 1 or 2 only".
with pytest.raises(ClientError, match='ValidationException'):
full_query(test_table, KeyConditions={
'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'},
'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'},
'z': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}
})
# Test the "Select" parameter of Query. The default Select mode,
# ALL_ATTRIBUTES, returns items with all their attributes. Other modes
# allow returning just specific attributes or just counting the results
# without returning items at all.
@pytest.mark.xfail(reason="Select not supported yet")
def test_query_select(test_table_sn):
numbers = [Decimal(i) for i in range(10)]
# Insert these numbers, in random order, into one partition:
p = random_string()
items = [{'p': p, 'c': num, 'x': num} for num in random.sample(numbers, len(numbers))]
with test_table_sn.batch_writer() as batch:
for item in items:
batch.put_item(item)
# Verify that we get back the numbers in their sorted order. By default,
# query returns all attributes:
got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}})['Items']
got_sort_keys = [x['c'] for x in got_items]
assert got_sort_keys == numbers
got_x_attributes = [x['x'] for x in got_items]
assert got_x_attributes == numbers
# Select=ALL_ATTRIBUTES does exactly the same as the default - return
# all attributes:
got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Select='ALL_ATTRIBUTES')['Items']
got_sort_keys = [x['c'] for x in got_items]
assert got_sort_keys == numbers
got_x_attributes = [x['x'] for x in got_items]
assert got_x_attributes == numbers
# Select=ALL_PROJECTED_ATTRIBUTES is not allowed on a base table (it
# is just for indexes, when IndexName is specified)
with pytest.raises(ClientError, match='ValidationException'):
test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Select='ALL_PROJECTED_ATTRIBUTES')
# Select=SPECIFIC_ATTRIBUTES requires that either an AttributesToGet
# or ProjectionExpression parameter appears, but then really does nothing:
with pytest.raises(ClientError, match='ValidationException'):
test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Select='SPECIFIC_ATTRIBUTES')
got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Select='SPECIFIC_ATTRIBUTES', AttributesToGet=['x'])['Items']
expected_items = [{'x': i} for i in numbers]
assert got_items == expected_items
got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Select='SPECIFIC_ATTRIBUTES', ProjectionExpression='x')['Items']
assert got_items == expected_items
# Select=COUNT just returns a count - not any items
got = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Select='COUNT')
assert got['Count'] == len(numbers)
assert not 'Items' in got
# Check that even without Select=COUNT we still get a Count in the
# response - but in that case, the items are returned as well:
got = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}})
assert got['Count'] == len(numbers)
assert 'Items' in got
# Select with some unknown string generates a validation exception:
with pytest.raises(ClientError, match='ValidationException'):
test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Select='UNKNOWN')
# Test that the "Limit" parameter can be used to return only some of the
# items in a single partition. The items returned are the first in the
# sorted order.
def test_query_limit(test_table_sn):
numbers = [Decimal(i) for i in range(10)]
# Insert these numbers, in random order, into one partition:
p = random_string()
items = [{'p': p, 'c': num} for num in random.sample(numbers, len(numbers))]
with test_table_sn.batch_writer() as batch:
for item in items:
batch.put_item(item)
# Verify that we get back the numbers in their sorted order.
# First, no Limit so we should get all numbers (we have few of them, so
# it all fits in the default 1MB limitation)
got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}})['Items']
got_sort_keys = [x['c'] for x in got_items]
assert got_sort_keys == numbers
# Now try a few different Limit values, and verify that the query
# returns exactly the first Limit sorted numbers.
for limit in [1, 2, 3, 7, 10, 17, 100, 10000]:
got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Limit=limit)['Items']
assert len(got_items) == min(limit, len(numbers))
got_sort_keys = [x['c'] for x in got_items]
assert got_sort_keys == numbers[0:limit]
# Unfortunately, the boto3 library forbids a Limit of 0 on its own,
# before even sending a request, so we can't test how the server responds.
with pytest.raises(ParamValidationError):
test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Limit=0)
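The paging mechanism the next tests exercise can be sketched as the resume loop a helper like `full_query()` presumably implements (names here are hypothetical): issue Query calls, feeding each response's `LastEvaluatedKey` back as `ExclusiveStartKey` until the server stops returning one.

```python
# Sketch of the LastEvaluatedKey/ExclusiveStartKey resume loop.
# query_fn stands in for a boto3-style query call returning dicts with
# 'Items' and, on all but the last page, 'LastEvaluatedKey'.
def full_query_sketch(query_fn, **kwargs):
    items = []
    while True:
        response = query_fn(**kwargs)
        items += response['Items']
        last = response.get('LastEvaluatedKey')
        if not last:
            return items
        kwargs['ExclusiveStartKey'] = last
```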
# In test_query_limit we tested just that Limit allows stopping the result
# after the right number of items. Here we test that such a stopped result
# can be resumed, via the LastEvaluatedKey/ExclusiveStartKey paging mechanism.
def test_query_limit_paging(test_table_sn):
numbers = [Decimal(i) for i in range(20)]
# Insert these numbers, in random order, into one partition:
p = random_string()
items = [{'p': p, 'c': num} for num in random.sample(numbers, len(numbers))]
with test_table_sn.batch_writer() as batch:
for item in items:
batch.put_item(item)
# Verify that full_query() returns all these numbers, in sorted order.
# full_query() will do a query with the given limit, and resume it again
# and again until the last page.
for limit in [1, 2, 3, 7, 10, 17, 100, 10000]:
got_items = full_query(test_table_sn, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Limit=limit)
got_sort_keys = [x['c'] for x in got_items]
assert got_sort_keys == numbers
# Test that the ScanIndexForward parameter works, and can be used to
# return items sorted in reverse order. Combining this with Limit can
# be used to return the last items instead of the first items of the
# partition.
@pytest.mark.xfail(reason="ScanIndexForward not supported yet")
def test_query_reverse(test_table_sn):
numbers = [Decimal(i) for i in range(20)]
# Insert these numbers, in random order, into one partition:
p = random_string()
items = [{'p': p, 'c': num} for num in random.sample(numbers, len(numbers))]
with test_table_sn.batch_writer() as batch:
for item in items:
batch.put_item(item)
# Verify that we get back the numbers in their sorted order or reverse
# order, depending on the ScanIndexForward parameter being True or False.
# First, no Limit so we should get all numbers (we have few of them, so
# it all fits in the default 1MB limitation)
reversed_numbers = list(reversed(numbers))
got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, ScanIndexForward=True)['Items']
got_sort_keys = [x['c'] for x in got_items]
assert got_sort_keys == numbers
got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, ScanIndexForward=False)['Items']
got_sort_keys = [x['c'] for x in got_items]
assert got_sort_keys == reversed_numbers
# Now try a few different Limit values, and verify that the query
# returns exactly the first Limit sorted numbers - in regular or
# reverse order, depending on ScanIndexForward.
for limit in [1, 2, 3, 7, 10, 17, 100, 10000]:
got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Limit=limit, ScanIndexForward=True)['Items']
assert len(got_items) == min(limit, len(numbers))
got_sort_keys = [x['c'] for x in got_items]
assert got_sort_keys == numbers[0:limit]
got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Limit=limit, ScanIndexForward=False)['Items']
assert len(got_items) == min(limit, len(numbers))
got_sort_keys = [x['c'] for x in got_items]
assert got_sort_keys == reversed_numbers[0:limit]
# Test that paging also works properly with reverse order
# (ScanIndexForward=false), i.e., reverse-order queries can be resumed
@pytest.mark.xfail(reason="ScanIndexForward not supported yet")
def test_query_reverse_paging(test_table_sn):
numbers = [Decimal(i) for i in range(20)]
# Insert these numbers, in random order, into one partition:
p = random_string()
items = [{'p': p, 'c': num} for num in random.sample(numbers, len(numbers))]
with test_table_sn.batch_writer() as batch:
for item in items:
batch.put_item(item)
reversed_numbers = list(reversed(numbers))
# Verify that with ScanIndexForward=False, full_query() returns all
# these numbers in reversed sorted order - getting pages of Limit items
# at a time and resuming the query.
for limit in [1, 2, 3, 7, 10, 17, 100, 10000]:
got_items = full_query(test_table_sn, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, ScanIndexForward=False, Limit=limit)
got_sort_keys = [x['c'] for x in got_items]
assert got_sort_keys == reversed_numbers


@@ -0,0 +1,226 @@
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Tests for the ReturnValues parameter for the different update operations
# (PutItem, UpdateItem, DeleteItem).
import pytest
from botocore.exceptions import ClientError
from util import random_string
# Test trivial support for the ReturnValues parameter in PutItem, UpdateItem
# and DeleteItem - test that "NONE" works (and changes nothing), while a
# completely unsupported value gives an error.
# This test is useful to check that before the ReturnValues parameter is fully
# implemented, it returns an error when a still-unsupported ReturnValues
# option is attempted in the request - instead of simply being ignored.
def test_trivial_returnvalues(test_table_s):
# PutItem:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi'})
ret=test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='NONE')
assert not 'Attributes' in ret
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='DOG')
# UpdateItem:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi', 'b': 'dog'})
ret=test_table_s.update_item(Key={'p': p}, ReturnValues='NONE',
UpdateExpression='SET b = :val',
ExpressionAttributeValues={':val': 'cat'})
assert not 'Attributes' in ret
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, ReturnValues='DOG',
UpdateExpression='SET a = a + :val',
ExpressionAttributeValues={':val': 1})
# DeleteItem:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi'})
ret=test_table_s.delete_item(Key={'p': p}, ReturnValues='NONE')
assert not 'Attributes' in ret
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.delete_item(Key={'p': p}, ReturnValues='DOG')
# Test the ReturnValues parameter on a PutItem operation. Only two settings
# are supported for this parameter for this operation: NONE (the default)
# and ALL_OLD.
@pytest.mark.xfail(reason="ReturnValues not supported")
def test_put_item_returnvalues(test_table_s):
# By default, the previous value of an item is not returned:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi'})
ret=test_table_s.put_item(Item={'p': p, 'a': 'hello'})
assert not 'Attributes' in ret
# Using ReturnValues=NONE is the same:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi'})
ret=test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='NONE')
assert not 'Attributes' in ret
# With ReturnValues=ALL_OLD, the old value of the item is returned
# in an "Attributes" attribute:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi'})
ret=test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='ALL_OLD')
assert ret['Attributes'] == {'p': p, 'a': 'hi'}
# Other ReturnValue options - UPDATED_OLD, ALL_NEW, UPDATED_NEW,
# are supported by other operations but not by PutItem:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='UPDATED_OLD')
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='ALL_NEW')
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='UPDATED_NEW')
# Also, obviously, a non-supported setting "DOG" also results in an error:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='DOG')
# The ReturnValues value is case sensitive, so while "NONE" is supported
# (and tested above), "none" isn't:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='none')
# Test the ReturnValues parameter on a DeleteItem operation. Only two settings
# are supported for this parameter for this operation: NONE (the default)
# and ALL_OLD.
@pytest.mark.xfail(reason="ReturnValues not supported")
def test_delete_item_returnvalues(test_table_s):
# By default, the previous value of an item is not returned:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi'})
ret=test_table_s.delete_item(Key={'p': p})
assert not 'Attributes' in ret
# Using ReturnValues=NONE is the same:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi'})
ret=test_table_s.delete_item(Key={'p': p}, ReturnValues='NONE')
assert not 'Attributes' in ret
# With ReturnValues=ALL_OLD, the old value of the item is returned
# in an "Attributes" attribute:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi'})
ret=test_table_s.delete_item(Key={'p': p}, ReturnValues='ALL_OLD')
assert ret['Attributes'] == {'p': p, 'a': 'hi'}
# Other ReturnValue options - UPDATED_OLD, ALL_NEW, UPDATED_NEW -
# are supported by other operations but not by DeleteItem:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.delete_item(Key={'p': p}, ReturnValues='UPDATED_OLD')
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.delete_item(Key={'p': p}, ReturnValues='ALL_NEW')
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.delete_item(Key={'p': p}, ReturnValues='UPDATED_NEW')
# Also, obviously, a non-supported setting "DOG" also results in an error:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.delete_item(Key={'p': p}, ReturnValues='DOG')
# The ReturnValues value is case sensitive, so while "NONE" is supported
# (and tested above), "none" isn't:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.delete_item(Key={'p': p}, ReturnValues='none')
# Test the ReturnValues parameter on a UpdateItem operation. All five
# settings are supported for this parameter for this operation: NONE
# (the default), ALL_OLD, UPDATED_OLD, ALL_NEW and UPDATED_NEW.
@pytest.mark.xfail(reason="ReturnValues not supported")
def test_update_item_returnvalues(test_table_s):
# By default, the previous value of an item is not returned:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi', 'b': 'dog'})
ret=test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = :val',
ExpressionAttributeValues={':val': 'cat'})
assert not 'Attributes' in ret
# Using ReturnValues=NONE is the same:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi', 'b': 'dog'})
ret=test_table_s.update_item(Key={'p': p}, ReturnValues='NONE',
UpdateExpression='SET b = :val',
ExpressionAttributeValues={':val': 'cat'})
assert not 'Attributes' in ret
# With ReturnValues=ALL_OLD, the entire old value of the item (even
# attributes we did not modify) is returned in an "Attributes" attribute:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi', 'b': 'dog'})
ret=test_table_s.update_item(Key={'p': p}, ReturnValues='ALL_OLD',
UpdateExpression='SET b = :val',
ExpressionAttributeValues={':val': 'cat'})
assert ret['Attributes'] == {'p': p, 'a': 'hi', 'b': 'dog'}
# With ReturnValues=UPDATED_OLD, only the overwritten attributes of the
# old item are returned in an "Attributes" attribute:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi', 'b': 'dog'})
ret=test_table_s.update_item(Key={'p': p}, ReturnValues='UPDATED_OLD',
UpdateExpression='SET b = :val, c = :val2',
ExpressionAttributeValues={':val': 'cat', ':val2': 'hello'})
assert ret['Attributes'] == {'b': 'dog'}
# Even if an update overwrites an attribute by the same value again,
# this is considered an update, and the old value (identical to the
# new one) is returned:
ret=test_table_s.update_item(Key={'p': p}, ReturnValues='UPDATED_OLD',
UpdateExpression='SET b = :val',
ExpressionAttributeValues={':val': 'cat'})
assert ret['Attributes'] == {'b': 'cat'}
# Deleting an attribute also counts as overwriting it, of course:
ret=test_table_s.update_item(Key={'p': p}, ReturnValues='UPDATED_OLD',
UpdateExpression='REMOVE b')
assert ret['Attributes'] == {'b': 'cat'}
# With ReturnValues=ALL_NEW, the entire new value of the item (including
# old attributes we did not modify) is returned:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi', 'b': 'dog'})
ret=test_table_s.update_item(Key={'p': p}, ReturnValues='ALL_NEW',
UpdateExpression='SET b = :val',
ExpressionAttributeValues={':val': 'cat'})
assert ret['Attributes'] == {'p': p, 'a': 'hi', 'b': 'cat'}
# With ReturnValues=UPDATED_NEW, only the new values of the updated
# attributes are returned. Note that "updated attributes" means
# the newly set attributes - it doesn't require that these attributes
# have had any previous values.
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi', 'b': 'dog'})
ret=test_table_s.update_item(Key={'p': p}, ReturnValues='UPDATED_NEW',
UpdateExpression='SET b = :val, c = :val2',
ExpressionAttributeValues={':val': 'cat', ':val2': 'hello'})
assert ret['Attributes'] == {'b': 'cat', 'c': 'hello'}
# Deleting an attribute also counts as overwriting it, but the deleted
# attribute is not returned in the response - so it's empty in this case.
ret=test_table_s.update_item(Key={'p': p}, ReturnValues='UPDATED_NEW',
UpdateExpression='REMOVE b')
assert not 'Attributes' in ret
# In the above examples, UPDATED_NEW is not useful because it just
# returns the new values we already know from the request... UPDATED_NEW
# becomes more useful in read-modify-write operations:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 1})
ret=test_table_s.update_item(Key={'p': p}, ReturnValues='UPDATED_NEW',
UpdateExpression='SET a = a + :val',
ExpressionAttributeValues={':val': 1})
assert ret['Attributes'] == {'a': 2}
# A non-supported setting "DOG" also results in an error:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, ReturnValues='DOG',
UpdateExpression='SET a = a + :val',
ExpressionAttributeValues={':val': 1})
# The ReturnValues value is case sensitive, so while "NONE" is supported
# (and tested above), "none" isn't:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, ReturnValues='none',
UpdateExpression='SET a = a + :val',
ExpressionAttributeValues={':val': 1})


@@ -0,0 +1,252 @@
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Tests for the Scan operation
import pytest
from botocore.exceptions import ClientError
from util import random_string, full_scan, full_scan_and_count, multiset
from boto3.dynamodb.conditions import Attr
# Test that scanning works fine with/without pagination
def test_scan_basic(filled_test_table):
test_table, items = filled_test_table
for limit in [None,1,2,4,33,50,100,9007,16*1024*1024]:
pos = None
got_items = []
while True:
if limit:
response = test_table.scan(Limit=limit, ExclusiveStartKey=pos) if pos else test_table.scan(Limit=limit)
assert len(response['Items']) <= limit
else:
response = test_table.scan(ExclusiveStartKey=pos) if pos else test_table.scan()
pos = response.get('LastEvaluatedKey', None)
got_items += response['Items']
if not pos:
break
assert len(items) == len(got_items)
assert multiset(items) == multiset(got_items)
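# The multiset() comparison used above (imported from util) exists because
# Scan returns partitions in an unspecified order. A minimal sketch of such a
# helper (an assumption, not the actual util.py), for items that are dicts
# with hashable attribute values:

```python
from collections import Counter

def multiset(items):
    # Dicts are unhashable, so freeze each item into a canonical
    # (sorted) tuple of key/value pairs, then count occurrences.
    # Two scans returned the same data iff the Counters are equal,
    # regardless of the order in which the items arrived.
    return Counter(tuple(sorted(item.items())) for item in items)
```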
def test_scan_with_paginator(dynamodb, filled_test_table):
test_table, items = filled_test_table
paginator = dynamodb.meta.client.get_paginator('scan')
got_items = []
for page in paginator.paginate(TableName=test_table.name):
got_items += page['Items']
assert len(items) == len(got_items)
assert multiset(items) == multiset(got_items)
for page_size in [1, 17, 1234]:
got_items = []
for page in paginator.paginate(TableName=test_table.name, PaginationConfig={'PageSize': page_size}):
got_items += page['Items']
assert len(items) == len(got_items)
assert multiset(items) == multiset(got_items)
# Although partitions are scanned in seemingly-random order, inside a
# partition items must be returned by Scan sorted in sort-key order.
# This test verifies this, for string sort key. We'll need separate
# tests for the other sort-key types (number and binary)
def test_scan_sort_order_string(filled_test_table):
test_table, items = filled_test_table
got_items = full_scan(test_table)
assert len(items) == len(got_items)
# Extract just the sort key ("c") from the partition "long"
items_long = [x['c'] for x in items if x['p'] == 'long']
got_items_long = [x['c'] for x in got_items if x['p'] == 'long']
# Verify that got_items_long are already sorted (in string order)
assert sorted(got_items_long) == got_items_long
# Verify that got_items_long are a sorted version of the expected items_long
assert sorted(items_long) == got_items_long
# Test Scan with the AttributesToGet parameter. Result should include the
# selected attributes only - if one wants the key attributes as well, one
# needs to select them explicitly. When no key attributes are selected,
# some items may have *none* of the selected attributes. Those items are
# returned too, as empty items - they are not outright missing.
def test_scan_attributes_to_get(dynamodb, filled_test_table):
table, items = filled_test_table
for wanted in [ ['another'], # only non-key attributes (one item doesn't have it!)
['c', 'another'], # a key attribute (sort key) and non-key
['p', 'c'], # entire key
['nonexistent'] # none of the items have this attribute!
]:
print(wanted)
got_items = full_scan(table, AttributesToGet=wanted)
expected_items = [{k: x[k] for k in wanted if k in x} for x in items]
assert multiset(expected_items) == multiset(got_items)
def test_scan_with_attribute_equality_filtering(dynamodb, filled_test_table):
table, items = filled_test_table
scan_filter = {
"attribute" : {
"AttributeValueList" : [ "xxxxx" ],
"ComparisonOperator": "EQ"
}
}
got_items = full_scan(table, ScanFilter=scan_filter)
expected_items = [item for item in items if "attribute" in item.keys() and item["attribute"] == "xxxxx" ]
assert multiset(expected_items) == multiset(got_items)
scan_filter = {
"another" : {
"AttributeValueList" : [ "y" ],
"ComparisonOperator": "EQ"
},
"attribute" : {
"AttributeValueList" : [ "xxxxx" ],
"ComparisonOperator": "EQ"
}
}
got_items = full_scan(table, ScanFilter=scan_filter)
expected_items = [item for item in items if "attribute" in item.keys() and item["attribute"] == "xxxxx" and item["another"] == "y" ]
assert multiset(expected_items) == multiset(got_items)
# Test that FilterExpression works as expected
@pytest.mark.xfail(reason="FilterExpression not supported yet")
def test_scan_filter_expression(filled_test_table):
test_table, items = filled_test_table
got_items = full_scan(test_table, FilterExpression=Attr("attribute").eq("xxxx"))
print(got_items)
assert multiset([item for item in items if 'attribute' in item.keys() and item['attribute'] == 'xxxx']) == multiset(got_items)
got_items = full_scan(test_table, FilterExpression=Attr("attribute").eq("xxxx") & Attr("another").eq("yy"))
print(got_items)
assert multiset([item for item in items if 'attribute' in item.keys() and 'another' in item.keys() and item['attribute'] == 'xxxx' and item['another'] == 'yy']) == multiset(got_items)
def test_scan_with_key_equality_filtering(dynamodb, filled_test_table):
table, items = filled_test_table
scan_filter_p = {
"p" : {
"AttributeValueList" : [ "7" ],
"ComparisonOperator": "EQ"
}
}
scan_filter_c = {
"c" : {
"AttributeValueList" : [ "9" ],
"ComparisonOperator": "EQ"
}
}
scan_filter_p_and_attribute = {
"p" : {
"AttributeValueList" : [ "7" ],
"ComparisonOperator": "EQ"
},
"attribute" : {
"AttributeValueList" : [ "x"*7 ],
"ComparisonOperator": "EQ"
}
}
scan_filter_c_and_another = {
"c" : {
"AttributeValueList" : [ "9" ],
"ComparisonOperator": "EQ"
},
"another" : {
"AttributeValueList" : [ "y"*16 ],
"ComparisonOperator": "EQ"
}
}
# Filtering on the hash key
got_items = full_scan(table, ScanFilter=scan_filter_p)
expected_items = [item for item in items if "p" in item.keys() and item["p"] == "7" ]
assert multiset(expected_items) == multiset(got_items)
# Filtering on the sort key
got_items = full_scan(table, ScanFilter=scan_filter_c)
expected_items = [item for item in items if "c" in item.keys() and item["c"] == "9"]
assert multiset(expected_items) == multiset(got_items)
# Filtering on the hash key and an attribute
got_items = full_scan(table, ScanFilter=scan_filter_p_and_attribute)
expected_items = [item for item in items if "p" in item.keys() and "another" in item.keys() and item["p"] == "7" and item["another"] == "y"*16]
assert multiset(expected_items) == multiset(got_items)
# Filtering on the sort key and an attribute
got_items = full_scan(table, ScanFilter=scan_filter_c_and_another)
expected_items = [item for item in items if "c" in item.keys() and "another" in item.keys() and item["c"] == "9" and item["another"] == "y"*16]
assert multiset(expected_items) == multiset(got_items)
# Test the "Select" parameter of Scan. The default Select mode,
# ALL_ATTRIBUTES, returns items with all their attributes. Other modes
# allow returning just specific attributes or just counting the results
# without returning items at all.
@pytest.mark.xfail(reason="Select not supported yet")
def test_scan_select(filled_test_table):
test_table, items = filled_test_table
# By default, a scan returns all the items, with all their attributes:
got_items = full_scan(test_table)
assert multiset(items) == multiset(got_items)
# Select=ALL_ATTRIBUTES does exactly the same as the default - return
# all attributes:
got_items = full_scan(test_table, Select='ALL_ATTRIBUTES')
assert multiset(items) == multiset(got_items)
# Select=ALL_PROJECTED_ATTRIBUTES is not allowed on a base table (it
# is just for indexes, when IndexName is specified)
with pytest.raises(ClientError, match='ValidationException'):
full_scan(test_table, Select='ALL_PROJECTED_ATTRIBUTES')
# Select=SPECIFIC_ATTRIBUTES requires that either a AttributesToGet
# or ProjectionExpression appears, but then really does nothing beyond
# what AttributesToGet and ProjectionExpression already do:
with pytest.raises(ClientError, match='ValidationException'):
full_scan(test_table, Select='SPECIFIC_ATTRIBUTES')
wanted = ['c', 'another']
got_items = full_scan(test_table, Select='SPECIFIC_ATTRIBUTES', AttributesToGet=wanted)
expected_items = [{k: x[k] for k in wanted if k in x} for x in items]
assert multiset(expected_items) == multiset(got_items)
got_items = full_scan(test_table, Select='SPECIFIC_ATTRIBUTES', ProjectionExpression=','.join(wanted))
assert multiset(expected_items) == multiset(got_items)
# Select=COUNT just returns a count - not any items
(got_count, got_items) = full_scan_and_count(test_table, Select='COUNT')
assert got_count == len(items)
assert got_items == []
# Check that we also get a count in regular scans - not just with
# Select=COUNT; without Select=COUNT we get both items and count:
(got_count, got_items) = full_scan_and_count(test_table)
assert got_count == len(items)
assert multiset(items) == multiset(got_items)
# Select with some unknown string generates a validation exception:
with pytest.raises(ClientError, match='ValidationException'):
full_scan(test_table, Select='UNKNOWN')
# Test parallel scan, i.e., the Segments and TotalSegments options.
# In the following test we check that these parameters allow splitting
# a scan into multiple parts, and that these parts are in fact disjoint,
# and their union is the entire contents of the table. We do not actually
# try to run these queries in *parallel* in this test.
@pytest.mark.xfail(reason="parallel scan not supported yet")
def test_scan_parallel(filled_test_table):
test_table, items = filled_test_table
for nsegments in [1, 2, 17]:
print('Testing TotalSegments={}'.format(nsegments))
got_items = []
for segment in range(nsegments):
got_items.extend(full_scan(test_table, TotalSegments=nsegments, Segment=segment))
# The following comparison verifies that each of the expected item
# in items was returned in one - and just one - of the segments.
assert multiset(items) == multiset(got_items)
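# When one does want real parallelism, each worker issues its own scan with a
# distinct Segment value. A sketch using a thread pool, where scan_segment is
# a hypothetical stand-in for a full_scan-style callable:

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_scan(table, nsegments, scan_segment):
    # One worker per segment; each scans a disjoint slice of the table,
    # so the concatenation of the results covers the whole table.
    with ThreadPoolExecutor(max_workers=nsegments) as pool:
        futures = [pool.submit(scan_segment, table,
                               TotalSegments=nsegments, Segment=seg)
                   for seg in range(nsegments)]
        items = []
        for future in futures:
            items.extend(future.result())
        return items
```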


@@ -0,0 +1,276 @@
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Tests for basic table operations: CreateTable, DeleteTable, ListTables.
import pytest
from botocore.exceptions import ClientError
from util import list_tables, test_table_name, create_test_table, random_string
# Utility function for creating a table with a given name and some valid
# schema. This function initiates the table's creation, but doesn't
# wait for the table to actually become ready.
def create_table(dynamodb, name, BillingMode='PAY_PER_REQUEST', **kwargs):
return dynamodb.create_table(
TableName=name,
BillingMode=BillingMode,
KeySchema=[
{
'AttributeName': 'p',
'KeyType': 'HASH'
},
{
'AttributeName': 'c',
'KeyType': 'RANGE'
}
],
AttributeDefinitions=[
{
'AttributeName': 'p',
'AttributeType': 'S'
},
{
'AttributeName': 'c',
'AttributeType': 'S'
},
],
**kwargs
)
# Utility function for creating a table with a given name, and then deleting
# it immediately, waiting for these operations to complete. Since the wait
# uses DescribeTable, this function requires all of CreateTable, DescribeTable
# and DeleteTable to work correctly.
# Note that in DynamoDB, table deletion takes a very long time, so tests
# successfully using this function are very slow.
def create_and_delete_table(dynamodb, name, **kwargs):
table = create_table(dynamodb, name, **kwargs)
table.meta.client.get_waiter('table_exists').wait(TableName=name)
table.delete()
table.meta.client.get_waiter('table_not_exists').wait(TableName=name)
##############################################################################
# Test creating a table, and then deleting it, waiting for each operation
# to have completed before proceeding. Since the wait uses DescribeTable,
# this test requires all of CreateTable, DescribeTable and DeleteTable to
# function properly in their basic use cases.
# Unfortunately, this test is extremely slow with DynamoDB because
# table deletion takes a long time to actually complete.
def test_create_and_delete_table(dynamodb):
create_and_delete_table(dynamodb, 'alternator_test')
# DynamoDB documentation specifies that table names must be 3-255 characters,
# and match the regex [a-zA-Z0-9._-]+. Names not matching these rules should
# be rejected, and no table be created.
def test_create_table_unsupported_names(dynamodb):
from botocore.exceptions import ParamValidationError, ClientError
# Interestingly, the boto library tests for names shorter than the
# minimum length (3 characters) immediately, and failure results in
# ParamValidationError. But the other invalid names are passed to
# DynamoDB, which returns an HTTP response code, which results in a
# ClientError exception.
with pytest.raises(ParamValidationError):
create_table(dynamodb, 'n')
with pytest.raises(ParamValidationError):
create_table(dynamodb, 'nn')
with pytest.raises(ClientError, match='ValidationException'):
create_table(dynamodb, 'n' * 256)
with pytest.raises(ClientError, match='ValidationException'):
create_table(dynamodb, 'nyh@test')
# On the other hand, names following the above rules should be accepted. Even
# names which the Scylla rules forbid, such as a name starting with .
def test_create_and_delete_table_non_scylla_name(dynamodb):
create_and_delete_table(dynamodb, '.alternator_test')
# Names with 255 characters are allowed in DynamoDB, but they are not currently
# supported in Scylla because we create a directory whose name is the table's
# name followed by 33 bytes (underscore and UUID). So currently, we only
# correctly support names with length up to 222.
def test_create_and_delete_table_very_long_name(dynamodb):
# In the future, this should work:
#create_and_delete_table(dynamodb, 'n' * 255)
# But for now, only 222 works:
create_and_delete_table(dynamodb, 'n' * 222)
# We cannot test the following on DynamoDB because it will succeed
# (DynamoDB allows up to 255 bytes)
#with pytest.raises(ClientError, match='ValidationException'):
# create_table(dynamodb, 'n' * 223)
# Test that creating a table with an invalid schema returns a
# ValidationException error.
def test_create_table_invalid_schema(dynamodb):
# The name of the table "created" by this test shouldn't matter, the
# creation should not succeed anyway.
with pytest.raises(ClientError, match='ValidationException'):
dynamodb.create_table(
TableName='name_doesnt_matter',
BillingMode='PAY_PER_REQUEST',
KeySchema=[
{ 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'c', 'KeyType': 'HASH' }
],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'c', 'AttributeType': 'S' },
],
)
with pytest.raises(ClientError, match='ValidationException'):
dynamodb.create_table(
TableName='name_doesnt_matter',
BillingMode='PAY_PER_REQUEST',
KeySchema=[
{ 'AttributeName': 'p', 'KeyType': 'RANGE' },
{ 'AttributeName': 'c', 'KeyType': 'RANGE' }
],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'c', 'AttributeType': 'S' },
],
)
with pytest.raises(ClientError, match='ValidationException'):
dynamodb.create_table(
TableName='name_doesnt_matter',
BillingMode='PAY_PER_REQUEST',
KeySchema=[
{ 'AttributeName': 'c', 'KeyType': 'RANGE' }
],
AttributeDefinitions=[
{ 'AttributeName': 'c', 'AttributeType': 'S' },
],
)
with pytest.raises(ClientError, match='ValidationException'):
dynamodb.create_table(
TableName='name_doesnt_matter',
BillingMode='PAY_PER_REQUEST',
KeySchema=[
{ 'AttributeName': 'c', 'KeyType': 'HASH' },
{ 'AttributeName': 'p', 'KeyType': 'RANGE' },
{ 'AttributeName': 'z', 'KeyType': 'RANGE' }
],
AttributeDefinitions=[
{ 'AttributeName': 'c', 'AttributeType': 'S' },
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'z', 'AttributeType': 'S' }
],
)
with pytest.raises(ClientError, match='ValidationException'):
dynamodb.create_table(
TableName='name_doesnt_matter',
BillingMode='PAY_PER_REQUEST',
KeySchema=[
{ 'AttributeName': 'c', 'KeyType': 'HASH' },
],
AttributeDefinitions=[
{ 'AttributeName': 'z', 'AttributeType': 'S' }
],
)
with pytest.raises(ClientError, match='ValidationException'):
dynamodb.create_table(
TableName='name_doesnt_matter',
BillingMode='PAY_PER_REQUEST',
KeySchema=[
{ 'AttributeName': 'k', 'KeyType': 'HASH' },
],
AttributeDefinitions=[
{ 'AttributeName': 'k', 'AttributeType': 'Q' }
],
)
# Test that trying to create a table that already exists fails in the
# appropriate way (ResourceInUseException)
def test_create_table_already_exists(dynamodb, test_table):
with pytest.raises(ClientError, match='ResourceInUseException'):
create_table(dynamodb, test_table.name)
# Test that BillingMode error path works as expected - only the values
# PROVISIONED or PAY_PER_REQUEST are allowed. The former requires
# ProvisionedThroughput to be set, the latter forbids it.
# If BillingMode is outright missing, it defaults (as original
# DynamoDB did) to PROVISIONED so ProvisionedThroughput is allowed.
def test_create_table_billing_mode_errors(dynamodb, test_table):
with pytest.raises(ClientError, match='ValidationException'):
create_table(dynamodb, test_table_name(), BillingMode='unknown')
# BillingMode is case-sensitive:
with pytest.raises(ClientError, match='ValidationException'):
create_table(dynamodb, test_table_name(), BillingMode='pay_per_request')
# PAY_PER_REQUEST cannot come with a ProvisionedThroughput:
with pytest.raises(ClientError, match='ValidationException'):
create_table(dynamodb, test_table_name(),
BillingMode='PAY_PER_REQUEST', ProvisionedThroughput={'ReadCapacityUnits': 10, 'WriteCapacityUnits': 10})
# On the other hand, PROVISIONED requires ProvisionedThroughput:
# By the way, ProvisionedThroughput not only needs to appear, it must
# have both ReadCapacityUnits and WriteCapacityUnits - but we can't test
# this with boto3, because boto3 has its own verification that if
# ProvisionedThroughput is given, it must have the correct form.
with pytest.raises(ClientError, match='ValidationException'):
create_table(dynamodb, test_table_name(), BillingMode='PROVISIONED')
# If BillingMode is completely missing, it defaults to PROVISIONED, so
# ProvisionedThroughput is required
with pytest.raises(ClientError, match='ValidationException'):
dynamodb.create_table(TableName=test_table_name(),
KeySchema=[{ 'AttributeName': 'p', 'KeyType': 'HASH' }],
AttributeDefinitions=[{ 'AttributeName': 'p', 'AttributeType': 'S' }])
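# For contrast, a CreateTable call that does satisfy PROVISIONED mode must
# carry both capacity fields. A sketch of the extra arguments (the helper
# name and the capacity values are illustrative, not from the tests above):

```python
def provisioned_kwargs(read_units, write_units):
    # The arguments PROVISIONED billing adds on top of the schema:
    # ProvisionedThroughput must be present and carry both fields.
    return {
        'BillingMode': 'PROVISIONED',
        'ProvisionedThroughput': {
            'ReadCapacityUnits': read_units,
            'WriteCapacityUnits': write_units,
        },
    }
```

# With these, create_table(dynamodb, test_table_name(),
# **provisioned_kwargs(10, 10)) would be expected to succeed.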
# Our first implementation had a special column name called "attrs" where
# we stored a map for all non-key columns. If the user tried to name one
# of the key columns with this same name, the result was a disaster - Scylla
# goes into a bad state after trying to write data with two updates to same-
# named columns.
special_column_name1 = 'attrs'
special_column_name2 = ':attrs'
@pytest.fixture(scope="session")
def test_table_special_column_name(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[
{ 'AttributeName': special_column_name1, 'KeyType': 'HASH' },
{ 'AttributeName': special_column_name2, 'KeyType': 'RANGE' }
],
AttributeDefinitions=[
{ 'AttributeName': special_column_name1, 'AttributeType': 'S' },
{ 'AttributeName': special_column_name2, 'AttributeType': 'S' },
],
)
yield table
table.delete()
@pytest.mark.xfail(reason="special attrs column not yet hidden correctly")
def test_create_table_special_column_name(test_table_special_column_name):
s = random_string()
c = random_string()
h = random_string()
expected = {special_column_name1: s, special_column_name2: c, 'hello': h}
test_table_special_column_name.put_item(Item=expected)
got = test_table_special_column_name.get_item(Key={special_column_name1: s, special_column_name2: c}, ConsistentRead=True)['Item']
assert got == expected
# Test that all tables we create are listed, and pagination works properly.
# Note that the DynamoDB setup we run this against may have hundreds of
# other tables, for all we know. We just need to check that the tables we
# created are indeed listed.
def test_list_tables_paginated(dynamodb, test_table, test_table_s, test_table_b):
my_tables_set = {table.name for table in [test_table, test_table_s, test_table_b]}
for limit in [1, 2, 3, 4, 50, 100]:
print("testing limit={}".format(limit))
list_tables_set = set(list_tables(dynamodb, limit))
assert my_tables_set.issubset(list_tables_set)
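The `list_tables()` helper used above comes from the tests' util module; one plausible implementation, assuming a boto3-style client whose `list_tables` call takes `Limit`/`ExclusiveStartTableName` and returns `LastEvaluatedTableName` when truncated, looks like this (a sketch, not necessarily the helper's real code):

```python
# Drain the paginated ListTables operation, following
# LastEvaluatedTableName until the listing is exhausted.
def list_tables(dynamodb, limit=100):
    # Accept either a boto3 resource (with .meta.client) or a bare client.
    client = dynamodb.meta.client if hasattr(dynamodb, 'meta') else dynamodb
    tables = []
    kwargs = {'Limit': limit}
    while True:
        response = client.list_tables(**kwargs)
        tables.extend(response['TableNames'])
        last = response.get('LastEvaluatedTableName')
        if last is None:
            return tables
        kwargs['ExclusiveStartTableName'] = last
```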
# Test that pagination limit is validated
def test_list_tables_wrong_limit(dynamodb):
# lower limit (min. 1) is imposed by boto3 library checks
with pytest.raises(ClientError, match='ValidationException'):
dynamodb.meta.client.list_tables(Limit=101)

@@ -0,0 +1,854 @@
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Tests for the UpdateItem operation with an UpdateExpression parameter
import random
import string
import pytest
from botocore.exceptions import ClientError
from decimal import Decimal
from util import random_string
# The simplest test of using UpdateExpression to set a top-level attribute,
# instead of the older AttributeUpdates parameter.
# Checks only one "SET" action in an UpdateExpression.
def test_update_expression_set(test_table_s):
p = random_string()
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = :val1',
ExpressionAttributeValues={':val1': 4})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 4}
# An empty UpdateExpression is NOT allowed, and generates a "The expression
# can not be empty" error. This contrasts with an empty AttributeUpdates which
# is allowed, and results in the creation of an empty item if it didn't exist
# yet (see test_empty_update()).
def test_update_expression_empty(test_table_s):
p = random_string()
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='')
# A basic test with multiple SET actions in one expression
def test_update_expression_set_multi(test_table_s):
p = random_string()
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET x = :val1, y = :val1',
ExpressionAttributeValues={':val1': 4})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'x': 4, 'y': 4}
# SET can be used to copy an existing attribute to a new one
def test_update_expression_set_copy(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hello'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hello'}
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET b = a')
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hello', 'b': 'hello'}
# Copying a non-existent attribute generates an error
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET c = z')
# It turns out that attributes to be copied are read before the SET
# starts to write, so "SET x = :val1, y = x" does not work...
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET x = :val1, y = x', ExpressionAttributeValues={':val1': 4})
# SET z=z does nothing if z exists, or fails if it doesn't
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = a')
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hello', 'b': 'hello'}
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET z = z')
# We can also use name references in either LHS or RHS of SET, e.g.,
# SET #one = #two. We need to also take the references used in the RHS
# when we want to complain about unused names in ExpressionAttributeNames.
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #one = #two',
ExpressionAttributeNames={'#one': 'c', '#two': 'a'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hello', 'b': 'hello', 'c': 'hello'}
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #one = #two',
ExpressionAttributeNames={'#one': 'c', '#two': 'a', '#three': 'z'})
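A minimal sketch of the read-before-write semantics the test above demonstrates, using a hypothetical `apply_set_actions` helper that handles only attribute-to-attribute copies (real updates also take `:value` operands):

```python
# Read-before-write: all right-hand-side reads in a SET clause are
# evaluated against the item's image *before* any action writes to it.
def apply_set_actions(item, actions):
    previous = dict(item)  # snapshot taken before any action writes
    for target, source in actions:
        if source not in previous:
            raise ValueError('ValidationException: %s does not exist' % source)
        item[target] = previous[source]
    return item
```

This is why "SET x = :val1, y = x" fails when x did not exist before the update: y's right-hand side reads the old image, where x is still absent.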
# Test for a read-before-write action where the value to be read is
# nested inside a '-' operator or a function call
def test_update_expression_set_nested_copy(test_table_s):
p = random_string()
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #n = :two',
ExpressionAttributeNames={'#n': 'n'}, ExpressionAttributeValues={':two': 2})
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #nn = :seven - #n',
ExpressionAttributeNames={'#nn': 'nn', '#n': 'n'}, ExpressionAttributeValues={':seven': 7})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'n': 2, 'nn': 5}
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #nnn = :nnn',
ExpressionAttributeNames={'#nnn': 'nnn'}, ExpressionAttributeValues={':nnn': [2,4]})
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #nnnn = list_append(:val1, #nnn)',
ExpressionAttributeNames={'#nnnn': 'nnnn', '#nnn': 'nnn'}, ExpressionAttributeValues={':val1': [1,3]})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'n': 2, 'nn': 5, 'nnn': [2,4], 'nnnn': [1,3,2,4]}
# Test for getting a key value with read-before-write
def test_update_expression_set_key(test_table_sn):
p = random_string()
test_table_sn.update_item(Key={'p': p, 'c': 7})
test_table_sn.update_item(Key={'p': p, 'c': 7}, UpdateExpression='SET #n = #p',
ExpressionAttributeNames={'#n': 'n', '#p': 'p'})
test_table_sn.update_item(Key={'p': p, 'c': 7}, UpdateExpression='SET #nn = #c + #c',
ExpressionAttributeNames={'#nn': 'nn', '#c': 'c'})
assert test_table_sn.get_item(Key={'p': p, 'c': 7}, ConsistentRead=True)['Item'] == {'p': p, 'c': 7, 'n': p, 'nn': 14}
# Simple test for the "REMOVE" action
def test_update_expression_remove(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hello', 'b': 'hi'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hello', 'b': 'hi'}
test_table_s.update_item(Key={'p': p}, UpdateExpression='REMOVE a')
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 'hi'}
# Demonstrate that although all DynamoDB examples give UpdateExpression
# action names in uppercase - e.g., "SET", it can actually be any case.
def test_update_expression_action_case(test_table_s):
p = random_string()
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET b = :val1', ExpressionAttributeValues={':val1': 3})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 3}
test_table_s.update_item(Key={'p': p}, UpdateExpression='set b = :val1', ExpressionAttributeValues={':val1': 4})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 4}
test_table_s.update_item(Key={'p': p}, UpdateExpression='sEt b = :val1', ExpressionAttributeValues={':val1': 5})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 5}
# Demonstrate that whitespace is ignored in UpdateExpression parsing.
def test_update_expression_action_whitespace(test_table_s):
p = random_string()
test_table_s.update_item(Key={'p': p}, UpdateExpression='set b = :val1', ExpressionAttributeValues={':val1': 4})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 4}
test_table_s.update_item(Key={'p': p}, UpdateExpression=' set b=:val1 ', ExpressionAttributeValues={':val1': 5})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 5}
# In UpdateExpression, the attribute name can appear directly in the expression
# (without a "#placeholder" notation) only if it is a single "token" as
# determined by DynamoDB's lexical analyzer rules: Such token is composed of
# alphanumeric characters whose first character must be alphabetic. Other
# names cause the parser to see multiple tokens, and produce syntax errors.
def test_update_expression_name_token(test_table_s):
p = random_string()
# Alphanumeric names starting with an alphabetical character work
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET alnum = :val1', ExpressionAttributeValues={':val1': 1})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['alnum'] == 1
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET Alpha_Numeric_123 = :val1', ExpressionAttributeValues={':val1': 2})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['Alpha_Numeric_123'] == 2
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET A123_ = :val1', ExpressionAttributeValues={':val1': 3})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['A123_'] == 3
# But alphanumeric names cannot start with underscore or digits.
# DynamoDB's lexical analyzer doesn't recognize them, and produces
# a ValidationException looking like:
# Invalid UpdateExpression: Syntax error; token: "_", near: "SET _123"
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET _123 = :val1', ExpressionAttributeValues={':val1': 3})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET _abc = :val1', ExpressionAttributeValues={':val1': 3})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET 123a = :val1', ExpressionAttributeValues={':val1': 3})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET 123 = :val1', ExpressionAttributeValues={':val1': 3})
# Various other non-alphanumeric characters split the token and are NOT allowed
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET hi-there = :val1', ExpressionAttributeValues={':val1': 3})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET hi$there = :val1', ExpressionAttributeValues={':val1': 3})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET "hithere" = :val1', ExpressionAttributeValues={':val1': 3})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET !hithere = :val1', ExpressionAttributeValues={':val1': 3})
# In addition to the literal names, DynamoDB also allows references to any
# name, using the "#reference" syntax. It turns out the reference name is
# also a token following the rules as above, with one interesting point:
# since "#" already started the token, the next character may be any
# alphanumeric and doesn't need to be only alphabetical.
# Note that the reference target - the actual attribute name - can include
# absolutely any characters, and we use silly_name below as an example
silly_name = '3can include any character!.#='
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #Alpha_Numeric_123 = :val1', ExpressionAttributeValues={':val1': 4}, ExpressionAttributeNames={'#Alpha_Numeric_123': silly_name})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'][silly_name] == 4
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #123a = :val1', ExpressionAttributeValues={':val1': 5}, ExpressionAttributeNames={'#123a': silly_name})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'][silly_name] == 5
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #123 = :val1', ExpressionAttributeValues={':val1': 6}, ExpressionAttributeNames={'#123': silly_name})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'][silly_name] == 6
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #_ = :val1', ExpressionAttributeValues={':val1': 7}, ExpressionAttributeNames={'#_': silly_name})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'][silly_name] == 7
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #hi-there = :val1', ExpressionAttributeValues={':val1': 7}, ExpressionAttributeNames={'#hi-there': silly_name})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #!hi = :val1', ExpressionAttributeValues={':val1': 7}, ExpressionAttributeNames={'#!hi': silly_name})
# Just a "#" is not enough as a token. Interestingly, DynamoDB will
# find the bad name in ExpressionAttributeNames before it actually tries
# to parse UpdateExpression, but we can verify the parse fails too by
# using a valid but irrelevant name in ExpressionAttributeNames:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET # = :val1', ExpressionAttributeValues={':val1': 7}, ExpressionAttributeNames={'#': silly_name})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET # = :val1', ExpressionAttributeValues={':val1': 7}, ExpressionAttributeNames={'#a': silly_name})
# There is also the value references, ":reference", for the right-hand
# side of an assignment. These have similar naming rules like "#reference".
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :Alpha_Numeric_123', ExpressionAttributeValues={':Alpha_Numeric_123': 8})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 8
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :123a', ExpressionAttributeValues={':123a': 9})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 9
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :123', ExpressionAttributeValues={':123': 10})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 10
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :_', ExpressionAttributeValues={':_': 11})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 11
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :hi!there', ExpressionAttributeValues={':hi!there': 12})
# Just a ":" is not enough as a token.
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :', ExpressionAttributeValues={':': 7})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :', ExpressionAttributeValues={':a': 7})
# Trying to use a :reference on the left-hand side of an assignment will
# not work. In DynamoDB, it's a different type of token (and generates
# syntax error).
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET :a = :b', ExpressionAttributeValues={':a': 1, ':b': 2})
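The lexer rules probed by the test above can be summarized with three regular expressions. These are hypothetical approximations inferred from the observed behavior, not DynamoDB's published grammar:

```python
import re

# A literal attribute name is alphanumeric/underscore, first char alphabetic.
LITERAL_NAME = re.compile(r'[A-Za-z][A-Za-z0-9_]*\Z')
# '#' already starts the token, so any alphanumeric/underscore may follow,
# but at least one such character is required ('#' alone is not a token).
NAME_REF = re.compile(r'#[A-Za-z0-9_]+\Z')
# Value references follow the same rule after ':'.
VALUE_REF = re.compile(r':[A-Za-z0-9_]+\Z')
```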
# Multiple actions are allowed in one expression, but actions are divided
# into clauses (SET, REMOVE, DELETE, ADD) and each of those can only appear
# once.
def test_update_expression_multi(test_table_s):
p = random_string()
# We can have two SET actions in one SET clause:
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :val1, b = :val2', ExpressionAttributeValues={':val1': 1, ':val2': 2})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 1, 'b': 2}
# But not two SET clauses - we get error "The "SET" section can only be used once in an update expression"
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :val1 SET b = :val2', ExpressionAttributeValues={':val1': 1, ':val2': 2})
# We can have a REMOVE and a SET clause (note no comma between clauses):
test_table_s.update_item(Key={'p': p}, UpdateExpression='REMOVE a SET b = :val2', ExpressionAttributeValues={':val2': 3})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 3}
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET c = :val2 REMOVE b', ExpressionAttributeValues={':val2': 3})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'c': 3}
# The same clause (e.g., SET) cannot be used twice, even if interleaved with something else
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :val1 REMOVE a SET b = :val2', ExpressionAttributeValues={':val1': 1, ':val2': 2})
# Trying to modify the same item twice in the same update is forbidden.
# For "SET a=:v REMOVE a" DynamoDB says: "Invalid UpdateExpression: Two
# document paths overlap with each other; must remove or rewrite one of
# these paths; path one: [a], path two: [a]".
# It is actually good for Scylla that such updates are forbidden, because had
# we allowed "SET a=:v REMOVE a" the result would be surprising - because data
# wins over a delete with the same timestamp, so "a" would be set despite the
# REMOVE command appearing later in the command line.
def test_update_expression_multi_overlap(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hello'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hello'}
# Neither "REMOVE a SET a = :v" nor "SET a = :v REMOVE a" are allowed:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='REMOVE a SET a = :v', ExpressionAttributeValues={':v': 'hi'})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :v REMOVE a', ExpressionAttributeValues={':v': 'yo'})
# It's also not allowed to set a twice in the same clause
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :v1, a = :v2', ExpressionAttributeValues={':v1': 'yo', ':v2': 'he'})
# Obviously, the paths are compared after the name references are evaluated
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #a1 = :v1, #a2 = :v2', ExpressionAttributeValues={':v1': 'yo', ':v2': 'he'}, ExpressionAttributeNames={'#a1': 'a', '#a2': 'a'})
# The problem isn't just with identical paths - we can't modify two paths that
# "overlap" in the sense that one is the ancestor of the other.
@pytest.mark.xfail(reason="nested updates not yet implemented")
def test_update_expression_multi_overlap_nested(test_table_s):
p = random_string()
with pytest.raises(ClientError, match='ValidationException.*overlap'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :val1, a.b = :val2',
ExpressionAttributeValues={':val1': {'b': 7}, ':val2': 'there'})
test_table_s.put_item(Item={'p': p, 'a': {'b': {'c': 2}}})
with pytest.raises(ClientError, match='ValidationException.*overlap'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a.b = :val1, a.b.c = :val2',
ExpressionAttributeValues={':val1': 'hi', ':val2': 'there'})
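The overlap rule exercised by the two tests above reduces to a prefix check on document paths: two paths conflict when they are equal or when one is an ancestor of the other. A sketch with a hypothetical `paths_overlap` helper:

```python
# Two document paths overlap when one is a prefix of the other
# (equal paths are the trivial case of that).
def paths_overlap(path1, path2):
    # Paths as component lists, e.g. ['a', 'b', 'c'] for a.b.c
    n = min(len(path1), len(path2))
    return path1[:n] == path2[:n]
```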
# In the previous test we saw that *modifying* the same item twice in the same
# update is forbidden; But it is allowed to *read* an item in the same update
# that also modifies it, and we check this here.
def test_update_expression_multi_with_copy(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hello'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hello'}
# "REMOVE a SET b = a" works: as noted in test_update_expression_set_copy()
# the value of 'a' is read before the actual REMOVE operation happens.
test_table_s.update_item(Key={'p': p}, UpdateExpression='REMOVE a SET b = a')
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 'hello'}
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET c = b REMOVE b')
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'c': 'hello'}
# Test case where a :val1 is referenced, without being defined
def test_update_expression_set_missing_value(test_table_s):
p = random_string()
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = :val1',
ExpressionAttributeValues={':val2': 4})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = :val1')
# It is forbidden for ExpressionAttributeValues to contain values not used
# by the expression. DynamoDB produces an error like: "Value provided in
# ExpressionAttributeValues unused in expressions: keys: {:val1}"
def test_update_expression_spurious_value(test_table_s):
p = random_string()
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :val1',
ExpressionAttributeValues={':val1': 3, ':val2': 4})
# Test case where a #name is referenced, without being defined
def test_update_expression_set_missing_name(test_table_s):
p = random_string()
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET #name = :val1',
ExpressionAttributeValues={':val2': 4},
ExpressionAttributeNames={'#wrongname': 'hello'})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET #name = :val1',
ExpressionAttributeValues={':val2': 4})
# It is forbidden for ExpressionAttributeNames to contain names not used
# by the expression. DynamoDB produces an error like: "Value provided in
# ExpressionAttributeNames unused in expressions: keys: {#b}"
def test_update_expression_spurious_name(test_table_s):
p = random_string()
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #a = :val1',
ExpressionAttributeNames={'#a': 'hello', '#b': 'hi'},
ExpressionAttributeValues={':val1': 3, ':val2': 4})
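The missing/spurious-reference tests above all exercise one two-way cross-check: every `#name`/`:value` the expression uses must be defined, and every one defined must be used. A sketch with a hypothetical `check_references` helper (a regex scan is only an approximation of real expression parsing):

```python
import re

def check_references(expression, names, values):
    used_names = set(re.findall(r'#[A-Za-z0-9_]+', expression))
    used_values = set(re.findall(r':[A-Za-z0-9_]+', expression))
    # Used but not defined:
    if used_names - set(names) or used_values - set(values):
        raise ValueError('ValidationException: undefined reference')
    # Defined but not used:
    if set(names) - used_names or set(values) - used_values:
        raise ValueError('ValidationException: unused reference')
```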
# Test that the key attributes (hash key or sort key) cannot be modified
# by an update
def test_update_expression_cannot_modify_key(test_table):
p = random_string()
c = random_string()
with pytest.raises(ClientError, match='ValidationException.*key'):
test_table.update_item(Key={'p': p, 'c': c},
UpdateExpression='SET p = :val1', ExpressionAttributeValues={':val1': 4})
with pytest.raises(ClientError, match='ValidationException.*key'):
test_table.update_item(Key={'p': p, 'c': c},
UpdateExpression='SET c = :val1', ExpressionAttributeValues={':val1': 4})
with pytest.raises(ClientError, match='ValidationException.*key'):
test_table.update_item(Key={'p': p, 'c': c}, UpdateExpression='REMOVE p')
with pytest.raises(ClientError, match='ValidationException.*key'):
test_table.update_item(Key={'p': p, 'c': c}, UpdateExpression='REMOVE c')
with pytest.raises(ClientError, match='ValidationException.*key'):
test_table.update_item(Key={'p': p, 'c': c},
UpdateExpression='ADD p :val1', ExpressionAttributeValues={':val1': 4})
with pytest.raises(ClientError, match='ValidationException.*key'):
test_table.update_item(Key={'p': p, 'c': c},
UpdateExpression='ADD c :val1', ExpressionAttributeValues={':val1': 4})
with pytest.raises(ClientError, match='ValidationException.*key'):
test_table.update_item(Key={'p': p, 'c': c},
UpdateExpression='DELETE p :val1', ExpressionAttributeValues={':val1': set(['cat', 'mouse'])})
with pytest.raises(ClientError, match='ValidationException.*key'):
test_table.update_item(Key={'p': p, 'c': c},
UpdateExpression='DELETE c :val1', ExpressionAttributeValues={':val1': set(['cat', 'mouse'])})
# As sanity check, verify we *can* modify a non-key column
test_table.update_item(Key={'p': p, 'c': c}, UpdateExpression='SET a = :val1', ExpressionAttributeValues={':val1': 4})
assert test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item'] == {'p': p, 'c': c, 'a': 4}
test_table.update_item(Key={'p': p, 'c': c}, UpdateExpression='REMOVE a')
assert test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item'] == {'p': p, 'c': c}
# Test that trying to start an expression with some nonsense like HELLO
# instead of SET, REMOVE, ADD or DELETE, fails.
def test_update_expression_non_existant_clause(test_table_s):
p = random_string()
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='HELLO b = :val1',
ExpressionAttributeValues={':val1': 4})
# Test support for "SET a = :val1 + :val2", "SET a = :val1 - :val2"
# Only exactly these combinations work - e.g., it's a syntax error to
# try to add three values. Trying to add a string fails.
def test_update_expression_plus_basic(test_table_s):
p = random_string()
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = :val1 + :val2',
ExpressionAttributeValues={':val1': 4, ':val2': 3})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 7}
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = :val1 - :val2',
ExpressionAttributeValues={':val1': 5, ':val2': 2})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 3}
# Only the addition of exactly two values is supported!
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = :val1 + :val2 + :val3',
ExpressionAttributeValues={':val1': 4, ':val2': 3, ':val3': 2})
# Only numeric values can be added - other things like strings or lists
# cannot be added, and we get an error like "Incorrect operand type for
# operator or function; operator or function: +, operand type: S".
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = :val1 + :val2',
ExpressionAttributeValues={':val1': 'dog', ':val2': 3})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = :val1 + :val2',
ExpressionAttributeValues={':val1': ['a', 'b'], ':val2': ['1', '2']})
# While most of the Alternator code just saves high-precision numbers
# unchanged, the "+" and "-" operations need to calculate with them, and
# we should check the calculation isn't done with some lower-precision
# representation, e.g., double
def test_update_expression_plus_precision(test_table_s):
p = random_string()
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = :val1 + :val2',
ExpressionAttributeValues={':val1': Decimal("1"), ':val2': Decimal("10000000000000000000000")})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': Decimal("10000000000000000000001")}
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = :val2 - :val1',
ExpressionAttributeValues={':val1': Decimal("1"), ':val2': Decimal("10000000000000000000000")})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': Decimal("9999999999999999999999")}
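A quick pure-Python demonstration of why this matters, using the same values as the test: a double cannot represent the result exactly, while Decimal arithmetic can.

```python
from decimal import Decimal

big = Decimal("10000000000000000000000")
# Arbitrary-precision arithmetic keeps the low-order 1:
assert big + Decimal("1") == Decimal("10000000000000000000001")
# A double-precision float silently loses it:
assert float(big) + 1 == float(big)
```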
# Test support for "SET a = b + :val2" et al., i.e., a version of the
# above test_update_expression_plus_basic with read before write.
def test_update_expression_plus_rmw(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 2})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 2
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = a + :val1',
ExpressionAttributeValues={':val1': 3})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 5
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = :val1 + a',
ExpressionAttributeValues={':val1': 4})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 9
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = :val1 + a',
ExpressionAttributeValues={':val1': 1})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['b'] == 10
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = b + a')
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 19
# Test the list_append() function in SET, for the most basic use case of
# concatenating two value references. Because this is the first test of
# functions in SET, we also test some generic features of how functions
# are parsed.
def test_update_expression_list_append_basic(test_table_s):
p = random_string()
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = list_append(:val1, :val2)',
ExpressionAttributeValues={':val1': [4, 'hello'], ':val2': ['hi', 7]})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': [4, 'hello', 'hi', 7]}
# Unlike the operation name "SET", function names are case-sensitive!
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = LIST_APPEND(:val1, :val2)',
ExpressionAttributeValues={':val1': [4, 'hello'], ':val2': ['hi', 7]})
# As usual, spaces are ignored by the parser
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = list_append( :val1 , :val2 )',
ExpressionAttributeValues={':val1': ['a'], ':val2': ['b']})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': ['a', 'b']}
# The list_append function only allows two parameters. The parser can
# correctly parse fewer or more, but then an error is generated: "Invalid
# UpdateExpression: Incorrect number of operands for operator or function;
# operator or function: list_append, number of operands: 1".
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = list_append(:val1)',
ExpressionAttributeValues={':val1': ['a']})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = list_append(:val1, :val2, :val3)',
ExpressionAttributeValues={':val1': [4, 'hello'], ':val2': [7], ':val3': ['a']})
# If list_append is used on a value which isn't a list, we get an
# error: "Invalid UpdateExpression: Incorrect operand type for operator
# or function; operator or function: list_append, operand type: S"
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = list_append(:val1, :val2)',
ExpressionAttributeValues={':val1': [4, 'hello'], ':val2': 'hi'})
# Additional list_append() tests, also using attribute paths as parameters
# (i.e., read-modify-write).
def test_update_expression_list_append(test_table_s):
p = random_string()
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = :val1',
ExpressionAttributeValues={':val1': ['hi', 2]})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] ==['hi', 2]
# Often, list_append is used to append items to a list attribute
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = list_append(a, :val1)',
ExpressionAttributeValues={':val1': [4, 'hello']})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == ['hi', 2, 4, 'hello']
# But it can also be used to just concatenate in other ways:
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = list_append(:val1, a)',
ExpressionAttributeValues={':val1': ['dog']})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == ['dog', 'hi', 2, 4, 'hello']
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = list_append(a, :val1)',
ExpressionAttributeValues={':val1': ['cat']})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['b'] == ['dog', 'hi', 2, 4, 'hello', 'cat']
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET c = list_append(a, b)')
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['c'] == ['dog', 'hi', 2, 4, 'hello', 'dog', 'hi', 2, 4, 'hello', 'cat']
# As usual, #references are allowed instead of inline names:
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET #name1 = list_append(#name2,:val1)',
ExpressionAttributeValues={':val1': [8]},
ExpressionAttributeNames={'#name1': 'a', '#name2': 'a'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == ['dog', 'hi', 2, 4, 'hello', 8]
# Test the "if_not_exists" function in SET
# The test also checks additional features of function-call parsing.
def test_update_expression_if_not_exists(test_table_s):
p = random_string()
# Since attribute a doesn't exist, set it:
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = if_not_exists(a, :val1)',
ExpressionAttributeValues={':val1': 2})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 2
# Now the attribute does exist, so set does nothing:
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = if_not_exists(a, :val1)',
ExpressionAttributeValues={':val1': 3})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 2
# if_not_exists can also be used to check one attribute and set another,
# but note that if_not_exists(a, :val) means a's value if it exists,
# otherwise :val!
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = if_not_exists(c, :val1)',
ExpressionAttributeValues={':val1': 4})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['b'] == 4
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 2
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = if_not_exists(c, :val1)',
ExpressionAttributeValues={':val1': 5})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['b'] == 5
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = if_not_exists(a, :val1)',
ExpressionAttributeValues={':val1': 6})
# note how because 'a' does exist, its value is copied, overwriting b's
# value:
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['b'] == 2
# The parser expects function parameters to be value references, paths,
# or nested calls to functions. Anything else causes a syntax error:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = if_not_exists(non@sense, :val1)',
ExpressionAttributeValues={':val1': 6})
# if_not_exists() requires that the first parameter be a path. However,
# the parser doesn't know this, and allows a function parameter to also
# be a value reference or a function call. If we try one of these other
# things, the parser succeeds, but we get a later error, looking like:
# "Invalid UpdateExpression: Operator or function requires a document
# path; operator or function: if_not_exists"
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = if_not_exists(if_not_exists(a, :val2), :val1)',
ExpressionAttributeValues={':val1': 6, ':val2': 3})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = if_not_exists(:val2, :val1)',
ExpressionAttributeValues={':val1': 6, ':val2': 3})
# Surprisingly, if the wrong argument is a :val value reference, the
# parser first tries to look it up in ExpressionAttributeValues (and
# fails if it's missing), before realizing any value reference would be
# wrong... So the following fails like the above does - but with a
# different error message (which we do not check here): "Invalid
# UpdateExpression: An expression attribute value used in expression
# is not defined; attribute value: :val2"
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = if_not_exists(:val2, :val1)',
ExpressionAttributeValues={':val1': 6})
# When the expression parser parses a function call f(value, value), each
# value may itself be a function call - ad infinitum. So expressions like
# list_append(if_not_exists(a, :val1), :val2) are legal and so is deeper
# nesting.
@pytest.mark.xfail(reason="for unknown reason, DynamoDB does not allow nesting list_append")
def test_update_expression_function_nesting(test_table_s):
p = random_string()
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = list_append(if_not_exists(a, :val1), :val2)',
ExpressionAttributeValues={':val1': ['a', 'b'], ':val2': ['cat', 'dog']})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == ['a', 'b', 'cat', 'dog']
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = list_append(if_not_exists(a, :val1), :val2)',
ExpressionAttributeValues={':val1': ['a', 'b'], ':val2': ['1', '2']})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == ['a', 'b', 'cat', 'dog', '1', '2']
# I don't understand why the following expression isn't accepted, but it
# isn't! It produces an "Invalid UpdateExpression: The function is not
# allowed to be used this way in an expression; function: list_append"
# error. In any case, the *parsing* works - this is not a syntax error -
# the failure is in some verification done later.
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = list_append(list_append(:val1, :val2), :val3)',
ExpressionAttributeValues={':val1': ['a'], ':val2': ['1'], ':val3': ['hi']})
# Ditto, the following passes the parser but fails some later check with
# the same error message as above.
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = list_append(list_append(list_append(:val1, :val2), :val3), :val4)',
ExpressionAttributeValues={':val1': ['a'], ':val2': ['1'], ':val3': ['hi'], ':val4': ['yo']})
# Verify how in SET expressions, "+" (or "-") nests with functions.
# We discover that f(x)+f(y) works but f(x+y) does NOT (results in a syntax
# error on the "+"). This means that the parser has two separate rules:
# 1. set_action: SET path = value + value
# 2. value: VALREF | NAME | NAME (value, ...)
def test_update_expression_function_plus_nesting(test_table_s):
p = random_string()
# As explained above, this - with "+" outside the expression, works:
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = if_not_exists(b, :val1)+:val2',
ExpressionAttributeValues={':val1': 2, ':val2': 3})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['b'] == 5
# ...but this - with the "+" inside an expression parameter, is a syntax
# error:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET c = if_not_exists(c, :val1+:val2)',
ExpressionAttributeValues={':val1': 5, ':val2': 4})
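The two grammar rules inferred above can be sketched as a tiny recursive-descent parser. This is purely an illustration of the inferred grammar (all names here are hypothetical, not Scylla's actual parser):

```python
import re

# One token per match: a NAME or :VALREF, or a single punctuation character.
TOKEN = re.compile(r'\s*(:?[A-Za-z][A-Za-z0-9_]*|[(),+=-])')

def tokenize(s):
    out, pos = [], 0
    while pos < len(s):
        m = TOKEN.match(s, pos)
        if not m:
            raise SyntaxError('bad token at: ' + s[pos:])
        out.append(m.group(1))
        pos = m.end()
    return out

def parse_value(toks):
    # value := VALREF | NAME | NAME '(' value (',' value)* ')'
    # Note: no "+" allowed here - which is why f(x+y) is a syntax error.
    tok = toks.pop(0)
    if not re.fullmatch(r':?[A-Za-z][A-Za-z0-9_]*', tok):
        raise SyntaxError(tok)
    if tok.startswith(':') or not toks or toks[0] != '(':
        return tok
    toks.pop(0)                      # consume '('
    args = [parse_value(toks)]
    while toks and toks[0] == ',':
        toks.pop(0)
        args.append(parse_value(toks))
    if not toks or toks.pop(0) != ')':
        raise SyntaxError("expected ')'")
    return (tok, args)

def parse_set_action(s):
    # set_action := path '=' value (('+'|'-') value)?
    # The "+"/"-" lives only at this level, outside function calls.
    toks = tokenize(s)
    path = toks.pop(0)
    if toks.pop(0) != '=':
        raise SyntaxError("expected '='")
    lhs = parse_value(toks)
    if toks and toks[0] in '+-':
        toks.pop(0)
        parse_value(toks)
    if toks:
        raise SyntaxError(' '.join(toks))
    return path, lhs
```

With these two rules, `f(x)+f(y)` parses but `f(x+y)` fails on the "+", matching the observations in the test above.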
# This test tries to use an undefined function "f". This, obviously, fails,
# but were we to actually print the error we would see "Invalid
# UpdateExpression: Invalid function name; function: f". Not a syntax error.
# This means that the parser accepts any alphanumeric name as a function
# name, and only the later use of this function fails because it's not one
# of the supported functions.
def test_update_expression_unknown_function(test_table_s):
p = random_string()
with pytest.raises(ClientError, match='ValidationException.*f'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = f(b,c,d)')
with pytest.raises(ClientError, match='ValidationException.*f123_hi'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = f123_hi(b,c,d)')
# Just like unreferenced column names parsed by the DynamoDB parser,
# function names must also start with an alphabetic character. Trying
# to use _f as a function name results in an actual syntax error,
# on the "_" token.
with pytest.raises(ClientError, match='ValidationException.*yntax error'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = _f(b,c,d)')
# Test "ADD" operation for numbers
def test_update_expression_add_numbers(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 3, 'b': 'hi'})
test_table_s.update_item(Key={'p': p},
UpdateExpression='ADD a :val1',
ExpressionAttributeValues={':val1': 4})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 7
# If the value to be added isn't a number, we get an error like "Invalid
# UpdateExpression: Incorrect operand type for operator or function;
# operator: ADD, operand type: STRING".
with pytest.raises(ClientError, match='ValidationException.*type'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='ADD a :val1',
ExpressionAttributeValues={':val1': 'hello'})
# Similarly, if the attribute we're adding to isn't a number, we get an
# error like "An operand in the update expression has an incorrect data
# type"
with pytest.raises(ClientError, match='ValidationException.*type'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='ADD b :val1',
ExpressionAttributeValues={':val1': 1})
# Test "ADD" operation for sets
def test_update_expression_add_sets(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': set(['dog', 'cat', 'mouse']), 'b': 'hi'})
test_table_s.update_item(Key={'p': p},
UpdateExpression='ADD a :val1',
ExpressionAttributeValues={':val1': set(['pig'])})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == set(['dog', 'cat', 'mouse', 'pig'])
# TODO: right now this test won't detect duplicated values in the returned result,
# because boto3 parses a set out of the returned JSON anyway. This check should leverage
# lower level API (if exists) to ensure that the JSON contains no duplicates
# in the set representation. It has been verified manually.
test_table_s.put_item(Item={'p': p, 'a': set(['beaver', 'lynx', 'coati']), 'b': 'hi'})
test_table_s.update_item(Key={'p': p},
UpdateExpression='ADD a :val1',
ExpressionAttributeValues={':val1': set(['coati', 'beaver', 'badger'])})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == set(['beaver', 'badger', 'lynx', 'coati'])
# The value to be added needs to be a set of the same type - it can't
# be a single element or anything else. If the value has the wrong type,
# we get an error like "Invalid UpdateExpression: Incorrect operand type
# for operator or function; operator: ADD, operand type: STRING".
with pytest.raises(ClientError, match='ValidationException.*type'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='ADD a :val1',
ExpressionAttributeValues={':val1': 'hello'})
# Test "DELETE" operation for sets
def test_update_expression_delete_sets(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': set(['dog', 'cat', 'mouse']), 'b': 'hi'})
test_table_s.update_item(Key={'p': p},
UpdateExpression='DELETE a :val1',
ExpressionAttributeValues={':val1': set(['cat', 'mouse'])})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == set(['dog'])
# Deleting an element not present in the set is not an error - it just
# does nothing
test_table_s.update_item(Key={'p': p},
UpdateExpression='DELETE a :val1',
ExpressionAttributeValues={':val1': set(['pig'])})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == set(['dog'])
# The value to be deleted must be a set of the same type - it can't
# be a single element or anything else. If the value has the wrong type,
# we get an error like "Invalid UpdateExpression: Incorrect operand type
# for operator or function; operator: DELETE, operand type: STRING".
with pytest.raises(ClientError, match='ValidationException.*type'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='DELETE a :val1',
ExpressionAttributeValues={':val1': 'hello'})
######## Tests for paths and nested attribute updates:
# A dot inside a name in ExpressionAttributeNames is a literal dot, and
# results in a top-level attribute with an actual dot in its name - not
# a nested attribute path.
def test_update_expression_dot_in_name(test_table_s):
p = random_string()
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #a = :val1',
ExpressionAttributeValues={':val1': 3},
ExpressionAttributeNames={'#a': 'a.b'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a.b': 3}
# A basic test for direct update of a nested attribute: One of the top-level
# attributes is itself a document, and we update only one of that document's
# nested attributes.
@pytest.mark.xfail(reason="nested updates not yet implemented")
def test_update_expression_nested_attribute_dot(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': {'b': 3, 'c': 4}, 'd': 5})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': {'b': 3, 'c': 4}, 'd': 5}
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a.c = :val1',
ExpressionAttributeValues={':val1': 7})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': {'b': 3, 'c': 7}, 'd': 5}
# Of course we can also add new nested attributes, not just modify
# existing ones:
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a.d = :val1',
ExpressionAttributeValues={':val1': 3})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': {'b': 3, 'c': 7, 'd': 3}, 'd': 5}
# Similar test, for a list: one of the top-level attributes is a list, we
# can update one of its items.
@pytest.mark.xfail(reason="nested updates not yet implemented")
def test_update_expression_nested_attribute_index(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': ['one', 'two', 'three']})
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a[1] = :val1',
ExpressionAttributeValues={':val1': 'hello'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': ['one', 'hello', 'three']}
# Test that just like happens in top-level attributes, also in nested
# attributes, setting them replaces the old value - potentially an entire
# nested document, by the whole value (which may have a different type)
@pytest.mark.xfail(reason="nested updates not yet implemented")
def test_update_expression_nested_different_type(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': {'b': 3, 'c': {'one': 1, 'two': 2}}})
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a.c = :val1',
ExpressionAttributeValues={':val1': 7})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': {'b': 3, 'c': 7}}
# Yet another test of a nested attribute update. This one uses deeper
# level of nesting (dots and indexes), adds #name references to the mix.
@pytest.mark.xfail(reason="nested updates not yet implemented")
def test_update_expression_nested_deep(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': {'b': 3, 'c': ['hi', {'x': {'y': [3, 5, 7]}}]}})
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a.c[1].#name.y[1] = :val1',
ExpressionAttributeValues={':val1': 9}, ExpressionAttributeNames={'#name': 'x'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == {'b': 3, 'c': ['hi', {'x': {'y': [3, 9, 7]}}]}
# A deep path can also appear on the right-hand-side of an assignment
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a.z = a.c[1].#name.y[1]',
ExpressionAttributeNames={'#name': 'x'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a']['z'] == 9
# A REMOVE operation can be used to remove nested attributes, and also
# individual list items.
@pytest.mark.xfail(reason="nested updates not yet implemented")
def test_update_expression_nested_remove(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': {'b': 3, 'c': ['hi', {'x': {'y': [3, 5, 7]}, 'q': 2}]}})
test_table_s.update_item(Key={'p': p}, UpdateExpression='REMOVE a.c[1].x.y[1], a.c[1].q')
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == {'b': 3, 'c': ['hi', {'x': {'y': [3, 7]}}]}
# The DynamoDB documentation specifies: "When you use SET to update a list
# element, the contents of that element are replaced with the new data that
# you specify. If the element does not already exist, SET will append the
# new element at the end of the list."
# So if we take a three-element list a, and set a[7], the new element
# will be put at the end of the list, not at position 7 specifically.
@pytest.mark.xfail(reason="nested updates not yet implemented")
def test_nested_attribute_update_array_out_of_bounds(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': ['one', 'two', 'three']})
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a[7] = :val1',
ExpressionAttributeValues={':val1': 'hello'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': ['one', 'two', 'three', 'hello']}
# The DynamoDB documentation also says: "If you add multiple elements
# in a single SET operation, the elements are sorted in order by element
# number."
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a[84] = :val1, a[37] = :val2',
ExpressionAttributeValues={':val1': 'a1', ':val2': 'a2'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': ['one', 'two', 'three', 'hello', 'a2', 'a1']}
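The two assertions above can be summarized as a minimal Python model of the observed semantics (a sketch of DynamoDB's documented behavior, not Scylla's implementation; the helper name is made up):

```python
def set_list_elements(lst, assignments):
    """Model of DynamoDB SET on list indexes: an index inside the list
    overwrites in place; indexes past the end are sorted by element
    number and appended at the end, regardless of the index value."""
    in_range = [(i, v) for i, v in assignments if i < len(lst)]
    beyond = sorted((i, v) for i, v in assignments if i >= len(lst))
    for i, v in in_range:
        lst[i] = v
    for _, v in beyond:
        lst.append(v)
    return lst
```

For example, `set_list_elements(['one', 'two', 'three'], [(7, 'hello')])` appends 'hello' at position 3, not 7.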
# Test what happens if we try to write to a.b, which would only make sense if
# a were a nested document, but a doesn't exist, or exists and is NOT a nested
# document but rather a scalar or list or something.
# DynamoDB actually detects this case and prints an error:
# ClientError: An error occurred (ValidationException) when calling the
# UpdateItem operation: The document path provided in the update expression
# is invalid for update
# Because Scylla doesn't read before write, it cannot detect this as an error,
# so we'll probably want to allow for that possibility as well.
@pytest.mark.xfail(reason="nested updates not yet implemented")
def test_nested_attribute_update_bad_path_dot(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hello', 'b': ['hi']})
with pytest.raises(ClientError, match='ValidationException.*path'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a.c = :val1',
ExpressionAttributeValues={':val1': 7})
with pytest.raises(ClientError, match='ValidationException.*path'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET b.c = :val1',
ExpressionAttributeValues={':val1': 7})
with pytest.raises(ClientError, match='ValidationException.*path'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET c.c = :val1',
ExpressionAttributeValues={':val1': 7})
# Similarly for other types of bad paths - using [0] on something which
# isn't an array.
@pytest.mark.xfail(reason="nested updates not yet implemented")
def test_nested_attribute_update_bad_path_array(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hello'})
with pytest.raises(ClientError, match='ValidationException.*path'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a[0] = :val1',
ExpressionAttributeValues={':val1': 7})

alternator-test/util.py (new file, 141 lines)

@@ -0,0 +1,141 @@
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Various utility functions which are useful for multiple tests
import string
import random
import collections
import time
def random_string(length=10, chars=string.ascii_uppercase + string.digits):
return ''.join(random.choice(chars) for x in range(length))
def random_bytes(length=10):
return bytearray(random.getrandbits(8) for _ in range(length))
# Utility functions for scan and query into an array of items:
# TODO: make full_scan and full_query default to ConsistentRead=True, as
# they're not useful for tests without it!
def full_scan(table, **kwargs):
response = table.scan(**kwargs)
items = response['Items']
while 'LastEvaluatedKey' in response:
response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'], **kwargs)
items.extend(response['Items'])
return items
# full_scan_and_count returns both items and count as returned by the server.
# Note that count isn't simply len(items) - the server returns them
# independently. e.g., with Select='COUNT' the items are not returned, but
# count is.
def full_scan_and_count(table, **kwargs):
response = table.scan(**kwargs)
items = []
count = 0
if 'Items' in response:
items.extend(response['Items'])
if 'Count' in response:
count = count + response['Count']
while 'LastEvaluatedKey' in response:
response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'], **kwargs)
if 'Items' in response:
items.extend(response['Items'])
if 'Count' in response:
count = count + response['Count']
return (count, items)
# Utility function for fetching the entire results of a query into an array of items
def full_query(table, **kwargs):
response = table.query(**kwargs)
items = response['Items']
while 'LastEvaluatedKey' in response:
response = table.query(ExclusiveStartKey=response['LastEvaluatedKey'], **kwargs)
items.extend(response['Items'])
return items
# To compare two lists of items (each is a dict) without regard for order,
# "==" is not good enough because it will fail if the order is different.
# The following function, multiset() converts the list into a multiset
# (set with duplicates) where order doesn't matter, so the multisets can
# be compared.
def freeze(item):
if isinstance(item, dict):
return frozenset((key, freeze(value)) for key, value in item.items())
elif isinstance(item, list):
return tuple(freeze(value) for value in item)
return item
def multiset(items):
return collections.Counter([freeze(item) for item in items])
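For example, the two helpers above (repeated here so the snippet is self-contained) let result lists that differ only in order compare equal, while still counting duplicates:

```python
import collections

def freeze(item):
    # Recursively convert dicts/lists into hashable, order-insensitive
    # (for dicts) equivalents so items can be put in a Counter.
    if isinstance(item, dict):
        return frozenset((key, freeze(value)) for key, value in item.items())
    elif isinstance(item, list):
        return tuple(freeze(value) for value in item)
    return item

def multiset(items):
    return collections.Counter([freeze(item) for item in items])

a = [{'p': 'x', 'v': [1, 2]}, {'p': 'y', 'v': [3]}, {'p': 'y', 'v': [3]}]
b = [{'p': 'y', 'v': [3]}, {'p': 'x', 'v': [1, 2]}, {'p': 'y', 'v': [3]}]
```

Here `a != b` (the lists are ordered differently) but `multiset(a) == multiset(b)`; dropping one of the duplicate items from either list would make the multisets unequal.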
test_table_prefix = 'alternator_test_'
def test_table_name():
current_ms = int(round(time.time() * 1000))
# On the off chance that test_table_name() is called twice in the same millisecond...
if test_table_name.last_ms >= current_ms:
current_ms = test_table_name.last_ms + 1
test_table_name.last_ms = current_ms
return test_table_prefix + str(current_ms)
test_table_name.last_ms = 0
def create_test_table(dynamodb, **kwargs):
name = test_table_name()
print("fixture creating new table {}".format(name))
table = dynamodb.create_table(TableName=name,
BillingMode='PAY_PER_REQUEST', **kwargs)
waiter = table.meta.client.get_waiter('table_exists')
# recheck every second instead of the default, lower, frequency. This can
# save a few seconds on AWS with its very slow table creation, and even
# more in tests against Scylla with its faster table creation turnaround.
waiter.config.delay = 1
waiter.config.max_attempts = 200
waiter.wait(TableName=name)
return table
# DynamoDB's ListTables request returns up to a single page of table names
# (e.g., up to 100) and it is up to the caller to call it again and again
# to get the next page. This is a utility function which calls it repeatedly
# as much as necessary to get the entire list.
# We deliberately return a list and not a set, because we want the caller
# to be able to recognize bugs in ListTables which cause the same table
# to be returned twice.
def list_tables(dynamodb, limit=100):
ret = []
pos = None
while True:
if pos:
page = dynamodb.meta.client.list_tables(Limit=limit, ExclusiveStartTableName=pos)
else:
page = dynamodb.meta.client.list_tables(Limit=limit)
results = page.get('TableNames', None)
assert results
ret = ret + results
newpos = page.get('LastEvaluatedTableName', None)
if not newpos:
break
# It doesn't make sense for Dynamo to tell us we need more pages, but
# not send anything in *this* page!
assert len(results) > 0
assert newpos != pos
# Note that we only checked that we got back tables, not that we got
# any new tables not already in ret. So a buggy implementation might
# still cause an endless loop getting the same tables again and again.
pos = newpos
return ret
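The paging protocol that list_tables() drives can be exercised against a stub client (FakeClient and list_all below are hypothetical illustrations of the same Limit / ExclusiveStartTableName / LastEvaluatedTableName loop, not part of the test suite):

```python
class FakeClient:
    """Stub of the ListTables paging protocol: returns up to Limit names
    per call, plus LastEvaluatedTableName while more pages remain."""
    def __init__(self, names):
        self.names = sorted(names)
    def list_tables(self, Limit=100, ExclusiveStartTableName=None):
        start = 0
        if ExclusiveStartTableName is not None:
            # Resume strictly after the previously returned position.
            start = self.names.index(ExclusiveStartTableName) + 1
        page = self.names[start:start + Limit]
        resp = {'TableNames': page}
        if start + Limit < len(self.names):
            resp['LastEvaluatedTableName'] = page[-1]
        return resp

def list_all(client, limit):
    # Same loop shape as list_tables() above: follow the paging token
    # until the server stops returning one.
    ret, pos = [], None
    while True:
        kwargs = {'Limit': limit}
        if pos:
            kwargs['ExclusiveStartTableName'] = pos
        page = client.list_tables(**kwargs)
        ret += page['TableNames']
        pos = page.get('LastEvaluatedTableName')
        if not pos:
            return ret
```

Running `list_all(FakeClient(names), 3)` over 7 names takes three round-trips and reassembles the full, ordered list.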

alternator/auth.cc (new file, 147 lines)

@@ -0,0 +1,147 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "alternator/error.hh"
#include "log.hh"
#include <string>
#include <string_view>
#include <gnutls/crypto.h>
#include <seastar/util/defer.hh>
#include "hashers.hh"
#include "bytes.hh"
#include "alternator/auth.hh"
#include <fmt/format.h>
#include "auth/common.hh"
#include "auth/password_authenticator.hh"
#include "auth/roles-metadata.hh"
#include "cql3/query_processor.hh"
#include "cql3/untyped_result_set.hh"
namespace alternator {
static logging::logger alogger("alternator-auth");
static hmac_sha256_digest hmac_sha256(std::string_view key, std::string_view msg) {
hmac_sha256_digest digest;
int ret = gnutls_hmac_fast(GNUTLS_MAC_SHA256, key.data(), key.size(), msg.data(), msg.size(), digest.data());
if (ret) {
throw std::runtime_error(fmt::format("Computing HMAC failed ({}): {}", ret, gnutls_strerror(ret)));
}
return digest;
}
static hmac_sha256_digest get_signature_key(std::string_view key, std::string_view date_stamp, std::string_view region_name, std::string_view service_name) {
auto date = hmac_sha256("AWS4" + std::string(key), date_stamp);
auto region = hmac_sha256(std::string_view(date.data(), date.size()), region_name);
auto service = hmac_sha256(std::string_view(region.data(), region.size()), service_name);
auto signing = hmac_sha256(std::string_view(service.data(), service.size()), "aws4_request");
return signing;
}
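get_signature_key() above is the standard AWS Signature Version 4 key-derivation chain (HMAC the date, region, service and the literal "aws4_request" in turn, starting from "AWS4" + secret). The same computation in Python, using only the standard library, is a useful cross-check (a sketch, not Scylla code; AWS's documentation publishes test vectors for this derivation):

```python
import hashlib
import hmac

def sigv4_signing_key(secret_key, date_stamp, region, service):
    # kSigning = HMAC(HMAC(HMAC(HMAC("AWS4"+secret, date), region),
    #                      service), "aws4_request")
    def h(key, msg):
        return hmac.new(key, msg.encode(), hashlib.sha256).digest()
    k_date = h(('AWS4' + secret_key).encode(), date_stamp)
    k_region = h(k_date, region)
    k_service = h(k_region, service)
    return h(k_service, 'aws4_request')
```

Because the date is folded into the key, a signing key derived for one datestamp is useless for any other day, which is what ties the signature to the X-Amz-Date checks below.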
static std::string apply_sha256(std::string_view msg) {
sha256_hasher hasher;
hasher.update(msg.data(), msg.size());
return to_hex(hasher.finalize());
}
static std::string format_time_point(db_clock::time_point tp) {
time_t time_point_repr = db_clock::to_time_t(tp);
std::string time_point_str;
time_point_str.resize(17);
::tm time_buf;
// strftime writes the terminating null character as well
std::strftime(time_point_str.data(), time_point_str.size(), "%Y%m%dT%H%M%SZ", ::gmtime_r(&time_point_repr, &time_buf));
time_point_str.resize(16);
return time_point_str;
}
void check_expiry(std::string_view signature_date) {
//FIXME: The default 15min can be changed with X-Amz-Expires header - we should honor it
std::string expiration_str = format_time_point(db_clock::now() - 15min);
std::string validity_str = format_time_point(db_clock::now() + 15min);
if (signature_date < expiration_str) {
throw api_error("InvalidSignatureException",
fmt::format("Signature expired: {} is now earlier than {} (current time - 15 min.)",
signature_date, expiration_str));
}
if (signature_date > validity_str) {
throw api_error("InvalidSignatureException",
fmt::format("Signature not yet current: {} is still later than {} (current time + 15 min.)",
signature_date, validity_str));
}
}
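Note that check_expiry() compares the formatted timestamps as plain strings; this is safe because the fixed-width "%Y%m%dT%H%M%SZ" format sorts lexicographically in chronological order. A Python model of the same ±15-minute window (hypothetical helper names, and `now` passed in explicitly for testability):

```python
import datetime

def fmt(tp):
    return tp.strftime('%Y%m%dT%H%M%SZ')

def check_expiry(signature_date, now):
    # Accept only signatures dated within [now - 15min, now + 15min].
    # String comparison works because the format is fixed-width with
    # the most significant field first.
    earliest = fmt(now - datetime.timedelta(minutes=15))
    latest = fmt(now + datetime.timedelta(minutes=15))
    if signature_date < earliest:
        raise ValueError('Signature expired')
    if signature_date > latest:
        raise ValueError('Signature not yet current')
```

A signature dated five minutes ago passes; one from thirty minutes ago is rejected as expired, and one dated twenty minutes in the future is rejected as not yet current.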
std::string get_signature(std::string_view access_key_id, std::string_view secret_access_key, std::string_view host, std::string_view method,
std::string_view orig_datestamp, std::string_view signed_headers_str, const std::map<std::string_view, std::string_view>& signed_headers_map,
std::string_view body_content, std::string_view region, std::string_view service, std::string_view query_string) {
auto amz_date_it = signed_headers_map.find("x-amz-date");
if (amz_date_it == signed_headers_map.end()) {
throw api_error("InvalidSignatureException", "X-Amz-Date header is mandatory for signature verification");
}
std::string_view amz_date = amz_date_it->second;
check_expiry(amz_date);
std::string_view datestamp = amz_date.substr(0, 8);
if (datestamp != orig_datestamp) {
throw api_error("InvalidSignatureException",
format("X-Amz-Date date does not match the provided datestamp. Expected {}, got {}",
orig_datestamp, datestamp));
}
std::string_view canonical_uri = "/";
std::stringstream canonical_headers;
for (const auto& header : signed_headers_map) {
canonical_headers << fmt::format("{}:{}", header.first, header.second) << '\n';
}
std::string payload_hash = apply_sha256(body_content);
std::string canonical_request = fmt::format("{}\n{}\n{}\n{}\n{}\n{}", method, canonical_uri, query_string, canonical_headers.str(), signed_headers_str, payload_hash);
std::string_view algorithm = "AWS4-HMAC-SHA256";
std::string credential_scope = fmt::format("{}/{}/{}/aws4_request", datestamp, region, service);
std::string string_to_sign = fmt::format("{}\n{}\n{}\n{}", algorithm, amz_date, credential_scope, apply_sha256(canonical_request));
hmac_sha256_digest signing_key = get_signature_key(secret_access_key, datestamp, region, service);
hmac_sha256_digest signature = hmac_sha256(std::string_view(signing_key.data(), signing_key.size()), string_to_sign);
return to_hex(bytes_view(reinterpret_cast<const int8_t*>(signature.data()), signature.size()));
}
future<std::string> get_key_from_roles(cql3::query_processor& qp, std::string username) {
static const sstring query = format("SELECT salted_hash FROM {} WHERE {} = ?",
auth::meta::roles_table::qualified_name(), auth::meta::roles_table::role_col_name);
auto cl = auth::password_authenticator::consistency_for_user(username);
auto timeout = auth::internal_distributed_timeout_config();
return qp.process(query, cl, timeout, {sstring(username)}, true).then_wrapped([username = std::move(username)] (future<::shared_ptr<cql3::untyped_result_set>> f) {
auto res = f.get0();
auto salted_hash = std::optional<sstring>();
if (res->empty()) {
throw api_error("UnrecognizedClientException", fmt::format("User not found: {}", username));
}
salted_hash = res->one().get_opt<sstring>("salted_hash");
if (!salted_hash) {
throw api_error("UnrecognizedClientException", fmt::format("No password found for user: {}", username));
}
return make_ready_future<std::string>(*salted_hash);
});
}
}

alternator/auth.hh Normal file

@@ -0,0 +1,46 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <string>
#include <string_view>
#include <array>
#include "gc_clock.hh"
#include "utils/loading_cache.hh"
namespace cql3 {
class query_processor;
}
namespace alternator {
using hmac_sha256_digest = std::array<char, 32>;
using key_cache = utils::loading_cache<std::string, std::string>;
std::string get_signature(std::string_view access_key_id, std::string_view secret_access_key, std::string_view host, std::string_view method,
std::string_view orig_datestamp, std::string_view signed_headers_str, const std::map<std::string_view, std::string_view>& signed_headers_map,
std::string_view body_content, std::string_view region, std::string_view service, std::string_view query_string);
future<std::string> get_key_from_roles(cql3::query_processor& qp, std::string username);
}

alternator/base64.cc Normal file

@@ -0,0 +1,111 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
// The DynamoAPI dictates that "binary" (a.k.a. "bytes" or "blob") values
// be encoded in the JSON API as base64-encoded strings. This is code to
// convert byte arrays to base64-encoded strings, and back.
#include "base64.hh"
#include <ctype.h>
// Arrays for quickly converting to and from an integer between 0 and 63,
// and the character used in base64 encoding to represent it.
static class base64_chars {
public:
static constexpr const char* to =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
int8_t from[256];
base64_chars() {
static_assert(strlen(to) == 64);
for (int i = 0; i < 256; i++) {
from[i] = 255; // signal invalid character
}
for (int i = 0; i < 64; i++) {
from[(unsigned) to[i]] = i;
}
}
} base64_chars;
std::string base64_encode(bytes_view in) {
std::string ret;
ret.reserve(((4 * in.size() / 3) + 3) & ~3);
int i = 0;
unsigned char chunk3[3]; // chunk of input
for (auto byte : in) {
chunk3[i++] = byte;
if (i == 3) {
ret += base64_chars.to[ (chunk3[0] & 0xfc) >> 2 ];
ret += base64_chars.to[ ((chunk3[0] & 0x03) << 4) + ((chunk3[1] & 0xf0) >> 4) ];
ret += base64_chars.to[ ((chunk3[1] & 0x0f) << 2) + ((chunk3[2] & 0xc0) >> 6) ];
ret += base64_chars.to[ chunk3[2] & 0x3f ];
i = 0;
}
}
if (i) {
// i can be 1 or 2.
for(int j = i; j < 3; j++)
chunk3[j] = '\0';
ret += base64_chars.to[ ( chunk3[0] & 0xfc) >> 2 ];
ret += base64_chars.to[ ((chunk3[0] & 0x03) << 4) + ((chunk3[1] & 0xf0) >> 4) ];
if (i == 2) {
ret += base64_chars.to[ ((chunk3[1] & 0x0f) << 2) + ((chunk3[2] & 0xc0) >> 6) ];
} else {
ret += '=';
}
ret += '=';
}
return ret;
}
bytes base64_decode(std::string_view in) {
int i = 0;
int8_t chunk4[4]; // chunk of input, each byte converted to 0..63;
std::string ret;
ret.reserve(in.size() * 3 / 4);
for (unsigned char c : in) {
uint8_t dc = base64_chars.from[c];
if (dc == 255) {
// Any unexpected character, including the "=" character usually
// used for padding, signals the end of the decode.
break;
}
chunk4[i++] = dc;
if (i == 4) {
ret += (chunk4[0] << 2) + ((chunk4[1] & 0x30) >> 4);
ret += ((chunk4[1] & 0xf) << 4) + ((chunk4[2] & 0x3c) >> 2);
ret += ((chunk4[2] & 0x3) << 6) + chunk4[3];
i = 0;
}
}
if (i) {
// i can be 2 or 3, meaning 1 or 2 more output characters
if (i>=2)
ret += (chunk4[0] << 2) + ((chunk4[1] & 0x30) >> 4);
if (i==3)
ret += ((chunk4[1] & 0xf) << 4) + ((chunk4[2] & 0x3c) >> 2);
}
// FIXME: This copy is sad. The problem is that we need to return "bytes",
// but "bytes" lacks the efficient append that std::string has.
// To fix this we need to use bytes' "uninitialized" feature.
return bytes(ret.begin(), ret.end());
}

alternator/base64.hh Normal file

@@ -0,0 +1,34 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <string_view>
#include "bytes.hh"
#include "rjson.hh"
std::string base64_encode(bytes_view);
bytes base64_decode(std::string_view);
inline bytes base64_decode(const rjson::value& v) {
return base64_decode(std::string_view(v.GetString(), v.GetStringLength()));
}

alternator/conditions.cc Normal file

@@ -0,0 +1,564 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include <list>
#include <map>
#include <string_view>
#include "alternator/conditions.hh"
#include "alternator/error.hh"
#include "cql3/constants.hh"
#include <unordered_map>
#include "rjson.hh"
#include "serialization.hh"
#include "base64.hh"
#include <stdexcept>
namespace alternator {
static logging::logger clogger("alternator-conditions");
comparison_operator_type get_comparison_operator(const rjson::value& comparison_operator) {
static std::unordered_map<std::string, comparison_operator_type> ops = {
{"EQ", comparison_operator_type::EQ},
{"NE", comparison_operator_type::NE},
{"LE", comparison_operator_type::LE},
{"LT", comparison_operator_type::LT},
{"GE", comparison_operator_type::GE},
{"GT", comparison_operator_type::GT},
{"IN", comparison_operator_type::IN},
{"NULL", comparison_operator_type::IS_NULL},
{"NOT_NULL", comparison_operator_type::NOT_NULL},
{"BETWEEN", comparison_operator_type::BETWEEN},
{"BEGINS_WITH", comparison_operator_type::BEGINS_WITH},
{"CONTAINS", comparison_operator_type::CONTAINS},
{"NOT_CONTAINS", comparison_operator_type::NOT_CONTAINS},
};
if (!comparison_operator.IsString()) {
throw api_error("ValidationException", format("Invalid comparison operator definition {}", rjson::print(comparison_operator)));
}
std::string op = comparison_operator.GetString();
auto it = ops.find(op);
if (it == ops.end()) {
throw api_error("ValidationException", format("Unsupported comparison operator {}", op));
}
return it->second;
}
static ::shared_ptr<cql3::restrictions::single_column_restriction::contains> make_map_element_restriction(const column_definition& cdef, std::string_view key, const rjson::value& value) {
bytes raw_key = utf8_type->from_string(sstring_view(key.data(), key.size()));
auto key_value = ::make_shared<cql3::constants::value>(cql3::raw_value::make_value(std::move(raw_key)));
bytes raw_value = serialize_item(value);
auto entry_value = ::make_shared<cql3::constants::value>(cql3::raw_value::make_value(std::move(raw_value)));
return make_shared<cql3::restrictions::single_column_restriction::contains>(cdef, std::move(key_value), std::move(entry_value));
}
static ::shared_ptr<cql3::restrictions::single_column_restriction::EQ> make_key_eq_restriction(const column_definition& cdef, const rjson::value& value) {
bytes raw_value = get_key_from_typed_value(value, cdef, type_to_string(cdef.type));
auto restriction_value = ::make_shared<cql3::constants::value>(cql3::raw_value::make_value(std::move(raw_value)));
return make_shared<cql3::restrictions::single_column_restriction::EQ>(cdef, std::move(restriction_value));
}
::shared_ptr<cql3::restrictions::statement_restrictions> get_filtering_restrictions(schema_ptr schema, const column_definition& attrs_col, const rjson::value& query_filter) {
clogger.trace("Getting filtering restrictions for: {}", rjson::print(query_filter));
auto filtering_restrictions = ::make_shared<cql3::restrictions::statement_restrictions>(schema, true);
for (auto it = query_filter.MemberBegin(); it != query_filter.MemberEnd(); ++it) {
std::string_view column_name(it->name.GetString(), it->name.GetStringLength());
const rjson::value& condition = it->value;
const rjson::value& comp_definition = rjson::get(condition, "ComparisonOperator");
const rjson::value& attr_list = rjson::get(condition, "AttributeValueList");
comparison_operator_type op = get_comparison_operator(comp_definition);
if (op != comparison_operator_type::EQ) {
throw api_error("ValidationException", "Filtering is currently implemented for EQ operator only");
}
if (attr_list.Size() != 1) {
throw api_error("ValidationException", format("EQ restriction needs exactly 1 attribute value: {}", rjson::print(attr_list)));
}
if (const column_definition* cdef = schema->get_column_definition(to_bytes(column_name))) {
// Primary key restriction
filtering_restrictions->add_restriction(make_key_eq_restriction(*cdef, attr_list[0]), false, true);
} else {
// Regular column restriction
filtering_restrictions->add_restriction(make_map_element_restriction(attrs_col, column_name, attr_list[0]), false, true);
}
}
return filtering_restrictions;
}
namespace {
struct size_check {
// True iff size passes this check.
virtual bool operator()(rapidjson::SizeType size) const = 0;
// Check description, such that format("expected array {}", check.what()) is human-readable.
virtual sstring what() const = 0;
};
class exact_size : public size_check {
rapidjson::SizeType _expected;
public:
explicit exact_size(rapidjson::SizeType expected) : _expected(expected) {}
bool operator()(rapidjson::SizeType size) const override { return size == _expected; }
sstring what() const override { return format("of size {}", _expected); }
};
struct empty : public size_check {
bool operator()(rapidjson::SizeType size) const override { return size < 1; }
sstring what() const override { return "to be empty"; }
};
struct nonempty : public size_check {
bool operator()(rapidjson::SizeType size) const override { return size > 0; }
sstring what() const override { return "to be non-empty"; }
};
} // anonymous namespace
// Check that array has the expected number of elements
static void verify_operand_count(const rjson::value* array, const size_check& expected, const rjson::value& op) {
if (!array || !array->IsArray()) {
throw api_error("ValidationException", "With ComparisonOperator, AttributeValueList must be given and an array");
}
if (!expected(array->Size())) {
throw api_error("ValidationException",
format("{} operator requires AttributeValueList {}, instead found list size {}",
op, expected.what(), array->Size()));
}
}
struct rjson_engaged_ptr_comp {
bool operator()(const rjson::value* p1, const rjson::value* p2) const {
return rjson::single_value_comp()(*p1, *p2);
}
};
// It's not enough to compare underlying JSON objects when comparing sets,
// as internally they're stored in an array, and the order of elements is
// not important in set equality. See issue #5021
static bool check_EQ_for_sets(const rjson::value& set1, const rjson::value& set2) {
if (set1.Size() != set2.Size()) {
return false;
}
std::set<const rjson::value*, rjson_engaged_ptr_comp> set1_raw;
for (auto it = set1.Begin(); it != set1.End(); ++it) {
set1_raw.insert(&*it);
}
for (const auto& a : set2.GetArray()) {
if (set1_raw.count(&a) == 0) {
return false;
}
}
return true;
}
// Check if two JSON-encoded values match with the EQ relation
static bool check_EQ(const rjson::value* v1, const rjson::value& v2) {
if (!v1) {
return false;
}
if (v1->IsObject() && v1->MemberCount() == 1 && v2.IsObject() && v2.MemberCount() == 1) {
auto it1 = v1->MemberBegin();
auto it2 = v2.MemberBegin();
if ((it1->name == "SS" && it2->name == "SS") || (it1->name == "NS" && it2->name == "NS") || (it1->name == "BS" && it2->name == "BS")) {
return check_EQ_for_sets(it1->value, it2->value);
}
}
return *v1 == v2;
}
// Check if two JSON-encoded values match with the NE relation
static bool check_NE(const rjson::value* v1, const rjson::value& v2) {
return !v1 || *v1 != v2; // null is unequal to anything.
}
// Check if two JSON-encoded values match with the BEGINS_WITH relation
static bool check_BEGINS_WITH(const rjson::value* v1, const rjson::value& v2) {
// BEGINS_WITH requires that its single operand (v2) be a string or
// binary - otherwise it's a validation error. However, problems with
// the stored attribute (v1) will just return false (no match).
if (!v2.IsObject() || v2.MemberCount() != 1) {
throw api_error("ValidationException", format("BEGINS_WITH operator encountered malformed AttributeValue: {}", v2));
}
auto it2 = v2.MemberBegin();
if (it2->name != "S" && it2->name != "B") {
throw api_error("ValidationException", format("BEGINS_WITH operator requires String or Binary in AttributeValue, got {}", it2->name));
}
if (!v1 || !v1->IsObject() || v1->MemberCount() != 1) {
return false;
}
auto it1 = v1->MemberBegin();
if (it1->name != it2->name) {
return false;
}
if (it2->name == "S") {
std::string_view val1(it1->value.GetString(), it1->value.GetStringLength());
std::string_view val2(it2->value.GetString(), it2->value.GetStringLength());
return val1.substr(0, val2.size()) == val2;
} else /* it2->name == "B" */ {
// TODO (optimization): Check the begins_with condition directly on
// the base64-encoded string, without making a decoded copy.
bytes val1 = base64_decode(it1->value);
bytes val2 = base64_decode(it2->value);
return val1.substr(0, val2.size()) == val2;
}
}
static std::string_view to_string_view(const rjson::value& v) {
return std::string_view(v.GetString(), v.GetStringLength());
}
static bool is_set_of(const rjson::value& type1, const rjson::value& type2) {
return (type2 == "S" && type1 == "SS") || (type2 == "N" && type1 == "NS") || (type2 == "B" && type1 == "BS");
}
// Check if two JSON-encoded values match with the CONTAINS relation
static bool check_CONTAINS(const rjson::value* v1, const rjson::value& v2) {
if (!v1) {
return false;
}
const auto& kv1 = *v1->MemberBegin();
const auto& kv2 = *v2.MemberBegin();
if (kv2.name != "S" && kv2.name != "N" && kv2.name != "B") {
throw api_error("ValidationException",
format("CONTAINS operator requires a single AttributeValue of type String, Number, or Binary, "
"got {} instead", kv2.name));
}
if (kv1.name == "S" && kv2.name == "S") {
return to_string_view(kv1.value).find(to_string_view(kv2.value)) != std::string_view::npos;
} else if (kv1.name == "B" && kv2.name == "B") {
return base64_decode(kv1.value).find(base64_decode(kv2.value)) != bytes::npos;
} else if (is_set_of(kv1.name, kv2.name)) {
for (auto i = kv1.value.Begin(); i != kv1.value.End(); ++i) {
if (*i == kv2.value) {
return true;
}
}
} else if (kv1.name == "L") {
for (auto i = kv1.value.Begin(); i != kv1.value.End(); ++i) {
if (!i->IsObject() || i->MemberCount() != 1) {
clogger.error("check_CONTAINS received a list whose element is malformed");
return false;
}
const auto& el = *i->MemberBegin();
if (el.name == kv2.name && el.value == kv2.value) {
return true;
}
}
}
return false;
}
// Check if two JSON-encoded values match with the NOT_CONTAINS relation
static bool check_NOT_CONTAINS(const rjson::value* v1, const rjson::value& v2) {
if (!v1) {
return false;
}
return !check_CONTAINS(v1, v2);
}
// Check if a JSON-encoded value equals any element of an array, which must have at least one element.
static bool check_IN(const rjson::value* val, const rjson::value& array) {
if (!array[0].IsObject() || array[0].MemberCount() != 1) {
throw api_error("ValidationException",
format("IN operator encountered malformed AttributeValue: {}", array[0]));
}
const auto& type = array[0].MemberBegin()->name;
if (type != "S" && type != "N" && type != "B") {
throw api_error("ValidationException",
"IN operator requires AttributeValueList elements to be of type String, Number, or Binary ");
}
if (!val) {
return false;
}
bool have_match = false;
for (const auto& elem : array.GetArray()) {
if (!elem.IsObject() || elem.MemberCount() != 1 || elem.MemberBegin()->name != type) {
throw api_error("ValidationException",
"IN operator requires all AttributeValueList elements to have the same type ");
}
if (!have_match && *val == elem) {
// Can't return yet, must check types of all array elements. <sigh>
have_match = true;
}
}
return have_match;
}
static bool check_NULL(const rjson::value* val) {
return val == nullptr;
}
static bool check_NOT_NULL(const rjson::value* val) {
return val != nullptr;
}
// Check if two JSON-encoded values match with cmp.
template <typename Comparator>
bool check_compare(const rjson::value* v1, const rjson::value& v2, const Comparator& cmp) {
if (!v2.IsObject() || v2.MemberCount() != 1) {
throw api_error("ValidationException",
format("{} requires a single AttributeValue of type String, Number, or Binary",
cmp.diagnostic));
}
const auto& kv2 = *v2.MemberBegin();
if (kv2.name != "S" && kv2.name != "N" && kv2.name != "B") {
throw api_error("ValidationException",
format("{} requires a single AttributeValue of type String, Number, or Binary",
cmp.diagnostic));
}
if (!v1 || !v1->IsObject() || v1->MemberCount() != 1) {
return false;
}
const auto& kv1 = *v1->MemberBegin();
if (kv1.name != kv2.name) {
return false;
}
if (kv1.name == "N") {
return cmp(unwrap_number(*v1, cmp.diagnostic), unwrap_number(v2, cmp.diagnostic));
}
if (kv1.name == "S") {
return cmp(std::string_view(kv1.value.GetString(), kv1.value.GetStringLength()),
std::string_view(kv2.value.GetString(), kv2.value.GetStringLength()));
}
if (kv1.name == "B") {
return cmp(base64_decode(kv1.value), base64_decode(kv2.value));
}
clogger.error("check_compare panic: LHS type equals RHS type, but one is in {N,S,B} while the other isn't");
return false;
}
struct cmp_lt {
template <typename T> bool operator()(const T& lhs, const T& rhs) const { return lhs < rhs; }
static constexpr const char* diagnostic = "LT operator";
};
struct cmp_le {
// bytes only has <, so we cannot use <=.
template <typename T> bool operator()(const T& lhs, const T& rhs) const { return lhs < rhs || lhs == rhs; }
static constexpr const char* diagnostic = "LE operator";
};
struct cmp_ge {
// bytes only has <, so we cannot use >=.
template <typename T> bool operator()(const T& lhs, const T& rhs) const { return rhs < lhs || lhs == rhs; }
static constexpr const char* diagnostic = "GE operator";
};
struct cmp_gt {
// bytes only has <, so we cannot use >.
template <typename T> bool operator()(const T& lhs, const T& rhs) const { return rhs < lhs; }
static constexpr const char* diagnostic = "GT operator";
};
// True if v is between lb and ub, inclusive. Throws if lb > ub.
template <typename T>
bool check_BETWEEN(const T& v, const T& lb, const T& ub) {
if (ub < lb) {
throw api_error("ValidationException",
format("BETWEEN operator requires lower_bound <= upper_bound, but {} > {}", lb, ub));
}
return cmp_ge()(v, lb) && cmp_le()(v, ub);
}
static bool check_BETWEEN(const rjson::value* v, const rjson::value& lb, const rjson::value& ub) {
if (!v) {
return false;
}
if (!v->IsObject() || v->MemberCount() != 1) {
throw api_error("ValidationException", format("BETWEEN operator encountered malformed AttributeValue: {}", *v));
}
if (!lb.IsObject() || lb.MemberCount() != 1) {
throw api_error("ValidationException", format("BETWEEN operator encountered malformed AttributeValue: {}", lb));
}
if (!ub.IsObject() || ub.MemberCount() != 1) {
throw api_error("ValidationException", format("BETWEEN operator encountered malformed AttributeValue: {}", ub));
}
const auto& kv_v = *v->MemberBegin();
const auto& kv_lb = *lb.MemberBegin();
const auto& kv_ub = *ub.MemberBegin();
if (kv_lb.name != kv_ub.name) {
throw api_error(
"ValidationException",
format("BETWEEN operator requires the same type for lower and upper bound; instead got {} and {}",
kv_lb.name, kv_ub.name));
}
if (kv_v.name != kv_lb.name) { // Cannot compare different types, so v is NOT between lb and ub.
return false;
}
if (kv_v.name == "N") {
const char* diag = "BETWEEN operator";
return check_BETWEEN(unwrap_number(*v, diag), unwrap_number(lb, diag), unwrap_number(ub, diag));
}
if (kv_v.name == "S") {
return check_BETWEEN(std::string_view(kv_v.value.GetString(), kv_v.value.GetStringLength()),
std::string_view(kv_lb.value.GetString(), kv_lb.value.GetStringLength()),
std::string_view(kv_ub.value.GetString(), kv_ub.value.GetStringLength()));
}
if (kv_v.name == "B") {
return check_BETWEEN(base64_decode(kv_v.value), base64_decode(kv_lb.value), base64_decode(kv_ub.value));
}
throw api_error("ValidationException",
format("BETWEEN operator requires AttributeValueList elements to be of type String, Number, or Binary; instead got {}",
kv_lb.name));
}
// Verify one Expect condition on one attribute (whose content is "got")
// for the verify_expected() below.
// This function returns true or false depending on whether the condition
// succeeded - it does not throw ConditionalCheckFailedException.
// However, it may throw ValidationException on input validation errors.
static bool verify_expected_one(const rjson::value& condition, const rjson::value* got) {
const rjson::value* comparison_operator = rjson::find(condition, "ComparisonOperator");
const rjson::value* attribute_value_list = rjson::find(condition, "AttributeValueList");
const rjson::value* value = rjson::find(condition, "Value");
const rjson::value* exists = rjson::find(condition, "Exists");
// There are three types of conditions that Expected supports:
// A value, not-exists, and a comparison of some kind. Each allows
// and requires a different combinations of parameters in the request
if (value) {
if (exists && (!exists->IsBool() || exists->GetBool() != true)) {
throw api_error("ValidationException", "Cannot combine Value with Exists!=true");
}
if (comparison_operator) {
throw api_error("ValidationException", "Cannot combine Value with ComparisonOperator");
}
return check_EQ(got, *value);
} else if (exists) {
if (comparison_operator) {
throw api_error("ValidationException", "Cannot combine Exists with ComparisonOperator");
}
if (!exists->IsBool() || exists->GetBool() != false) {
throw api_error("ValidationException", "Exists!=false requires Value");
}
// Remember Exists=false, so we're checking that the attribute does *not* exist:
return !got;
} else {
if (!comparison_operator) {
throw api_error("ValidationException", "Missing ComparisonOperator, Value or Exists");
}
comparison_operator_type op = get_comparison_operator(*comparison_operator);
switch (op) {
case comparison_operator_type::EQ:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_EQ(got, (*attribute_value_list)[0]);
case comparison_operator_type::NE:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_NE(got, (*attribute_value_list)[0]);
case comparison_operator_type::LT:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_compare(got, (*attribute_value_list)[0], cmp_lt{});
case comparison_operator_type::LE:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_compare(got, (*attribute_value_list)[0], cmp_le{});
case comparison_operator_type::GT:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_compare(got, (*attribute_value_list)[0], cmp_gt{});
case comparison_operator_type::GE:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_compare(got, (*attribute_value_list)[0], cmp_ge{});
case comparison_operator_type::BEGINS_WITH:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_BEGINS_WITH(got, (*attribute_value_list)[0]);
case comparison_operator_type::IN:
verify_operand_count(attribute_value_list, nonempty(), *comparison_operator);
return check_IN(got, *attribute_value_list);
case comparison_operator_type::IS_NULL:
verify_operand_count(attribute_value_list, empty(), *comparison_operator);
return check_NULL(got);
case comparison_operator_type::NOT_NULL:
verify_operand_count(attribute_value_list, empty(), *comparison_operator);
return check_NOT_NULL(got);
case comparison_operator_type::BETWEEN:
verify_operand_count(attribute_value_list, exact_size(2), *comparison_operator);
return check_BETWEEN(got, (*attribute_value_list)[0], (*attribute_value_list)[1]);
case comparison_operator_type::CONTAINS:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_CONTAINS(got, (*attribute_value_list)[0]);
case comparison_operator_type::NOT_CONTAINS:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_NOT_CONTAINS(got, (*attribute_value_list)[0]);
}
throw std::logic_error(format("Internal error: corrupted operator enum: {}", int(op)));
}
}
// Verify that the existing values of the item (previous_item) match the
// conditions given by the Expected and ConditionalOperator parameters
// (if they exist) in the request (an UpdateItem, PutItem or DeleteItem).
// This function will throw a ConditionalCheckFailedException API error
// if the values do not match the condition, or ValidationException if there
// are errors in the format of the condition itself.
void verify_expected(const rjson::value& req, const std::unique_ptr<rjson::value>& previous_item) {
const rjson::value* expected = rjson::find(req, "Expected");
if (!expected) {
return;
}
if (!expected->IsObject()) {
throw api_error("ValidationException", "'Expected' parameter, if given, must be an object");
}
// ConditionalOperator can be "AND" for requiring all conditions, or
// "OR" for requiring one condition, and defaults to "AND" if missing.
const rjson::value* conditional_operator = rjson::find(req, "ConditionalOperator");
bool require_all = true;
if (conditional_operator) {
if (!conditional_operator->IsString()) {
throw api_error("ValidationException", "'ConditionalOperator' parameter, if given, must be a string");
}
std::string_view s(conditional_operator->GetString(), conditional_operator->GetStringLength());
if (s == "AND") {
// require_all is already true
} else if (s == "OR") {
require_all = false;
} else {
throw api_error("ValidationException", "'ConditionalOperator' parameter must be AND, OR or missing");
}
if (expected->GetObject().ObjectEmpty()) {
throw api_error("ValidationException", "'ConditionalOperator' parameter cannot be specified for empty Expression");
}
}
for (auto it = expected->MemberBegin(); it != expected->MemberEnd(); ++it) {
const rjson::value* got = nullptr;
if (previous_item && previous_item->IsObject() && previous_item->HasMember("Item")) {
got = rjson::find((*previous_item)["Item"], rjson::string_ref_type(it->name.GetString()));
}
bool success = verify_expected_one(it->value, got);
if (success && !require_all) {
// When !require_all, one success is enough!
return;
} else if (!success && require_all) {
// When require_all, one failure is enough!
throw api_error("ConditionalCheckFailedException", "Failed condition.");
}
}
// If we got here and require_all, none of the checks failed, so succeed.
// If we got here and !require_all, all of the checks failed, so fail.
if (!require_all) {
throw api_error("ConditionalCheckFailedException", "None of ORed Expect conditions were successful.");
}
}
}

alternator/conditions.hh Normal file

@@ -0,0 +1,49 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
/*
* This file contains definitions and functions related to placing conditions
* on Alternator queries (equivalent of CQL's restrictions).
*
* With conditions, it's possible to add criteria to selection requests (Scan, Query)
* and use them for narrowing down the result set, by means of filtering or indexing.
*
* Ref: https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Condition.html
*/
#pragma once
#include "cql3/restrictions/statement_restrictions.hh"
#include "serialization.hh"
namespace alternator {
enum class comparison_operator_type {
EQ, NE, LE, LT, GE, GT, IN, BETWEEN, CONTAINS, NOT_CONTAINS, IS_NULL, NOT_NULL, BEGINS_WITH
};
comparison_operator_type get_comparison_operator(const rjson::value& comparison_operator);
::shared_ptr<cql3::restrictions::statement_restrictions> get_filtering_restrictions(schema_ptr schema, const column_definition& attrs_col, const rjson::value& query_filter);
void verify_expected(const rjson::value& req, const std::unique_ptr<rjson::value>& previous_item);
}

alternator/error.hh Normal file

@@ -0,0 +1,50 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <seastar/http/httpd.hh>
#include "seastarx.hh"
namespace alternator {
// DynamoDB's error messages are described in detail in
// https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Programming.Errors.html
// An error message has a "type", e.g., "ResourceNotFoundException", a coarser
// HTTP code (almost always, 400), and a human readable message. Eventually these
// will be wrapped into a JSON object returned to the client.
class api_error : public std::exception {
public:
using status_type = httpd::reply::status_type;
status_type _http_code;
std::string _type;
std::string _msg;
api_error(std::string type, std::string msg, status_type http_code = status_type::bad_request)
: _http_code(std::move(http_code))
, _type(std::move(type))
, _msg(std::move(msg))
{ }
api_error() = default;
virtual const char* what() const noexcept override { return _msg.c_str(); }
};
}
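The api_error class above couples an error "type", an HTTP status code, and a message into one exception so a single catch site can build the JSON error response. A minimal standalone sketch of the same pattern, with a plain int standing in for httpd::reply::status_type so it compiles without Seastar (api_error_sketch is an illustrative name, not part of the code above):

```cpp
#include <exception>
#include <string>
#include <utility>

// Minimal standalone sketch of the api_error pattern: an exception carrying
// a DynamoDB-style error "type", an HTTP status code (a plain int here,
// standing in for httpd::reply::status_type), and a human-readable message
// returned from what().
class api_error_sketch : public std::exception {
public:
    int _http_code;
    std::string _type;
    std::string _msg;
    api_error_sketch(std::string type, std::string msg, int http_code = 400)
        : _http_code(http_code)
        , _type(std::move(type))
        , _msg(std::move(msg))
    { }
    const char* what() const noexcept override { return _msg.c_str(); }
};
```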

alternator/executor.cc Normal file

File diff suppressed because it is too large

alternator/executor.hh Normal file

@@ -0,0 +1,71 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <seastar/core/future.hh>
#include <seastar/http/httpd.hh>
#include "seastarx.hh"
#include <seastar/json/json_elements.hh>
#include "service/storage_proxy.hh"
#include "service/migration_manager.hh"
#include "service/client_state.hh"
#include "stats.hh"
namespace alternator {
class executor {
service::storage_proxy& _proxy;
service::migration_manager& _mm;
public:
using client_state = service::client_state;
stats _stats;
static constexpr auto ATTRS_COLUMN_NAME = ":attrs";
static constexpr auto KEYSPACE_NAME = "alternator";
executor(service::storage_proxy& proxy, service::migration_manager& mm) : _proxy(proxy), _mm(mm) {}
future<json::json_return_type> create_table(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
future<json::json_return_type> describe_table(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
future<json::json_return_type> delete_table(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
future<json::json_return_type> put_item(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
future<json::json_return_type> get_item(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
future<json::json_return_type> delete_item(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
future<json::json_return_type> update_item(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
future<json::json_return_type> list_tables(client_state& client_state, std::string content);
future<json::json_return_type> scan(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
future<json::json_return_type> describe_endpoints(client_state& client_state, std::string content, std::string host_header);
future<json::json_return_type> batch_write_item(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
future<json::json_return_type> batch_get_item(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
future<json::json_return_type> query(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
future<> start();
future<> stop() { return make_ready_future<>(); }
future<> maybe_create_keyspace();
static tracing::trace_state_ptr maybe_trace_query(client_state& client_state, sstring_view op, sstring_view query);
};
}

alternator/expressions.cc Normal file

@@ -0,0 +1,98 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "expressions.hh"
#include "alternator/expressionsLexer.hpp"
#include "alternator/expressionsParser.hpp"
#include <seastarx.hh>
#include <seastar/core/print.hh>
#include <seastar/util/log.hh>
#include <functional>
namespace alternator {
template <typename Func, typename Result = std::result_of_t<Func(expressionsParser&)>>
Result do_with_parser(std::string input, Func&& f) {
expressionsLexer::InputStreamType input_stream{
reinterpret_cast<const ANTLR_UINT8*>(input.data()),
ANTLR_ENC_UTF8,
static_cast<ANTLR_UINT32>(input.size()),
nullptr };
expressionsLexer lexer(&input_stream);
expressionsParser::TokenStreamType tstream(ANTLR_SIZE_HINT, lexer.get_tokSource());
expressionsParser parser(&tstream);
auto result = f(parser);
return result;
}
parsed::update_expression
parse_update_expression(std::string query) {
try {
return do_with_parser(query, std::mem_fn(&expressionsParser::update_expression));
} catch (...) {
throw expressions_syntax_error(format("Failed parsing UpdateExpression '{}': {}", query, std::current_exception()));
}
}
std::vector<parsed::path>
parse_projection_expression(std::string query) {
try {
return do_with_parser(query, std::mem_fn(&expressionsParser::projection_expression));
} catch (...) {
throw expressions_syntax_error(format("Failed parsing ProjectionExpression '{}': {}", query, std::current_exception()));
}
}
template<class... Ts> struct overloaded : Ts... { using Ts::operator()...; };
template<class... Ts> overloaded(Ts...) -> overloaded<Ts...>;
namespace parsed {
void update_expression::add(update_expression::action a) {
std::visit(overloaded {
[&] (action::set&) { seen_set = true; },
[&] (action::remove&) { seen_remove = true; },
[&] (action::add&) { seen_add = true; },
[&] (action::del&) { seen_del = true; }
}, a._action);
_actions.push_back(std::move(a));
}
void update_expression::append(update_expression other) {
if ((seen_set && other.seen_set) ||
(seen_remove && other.seen_remove) ||
(seen_add && other.seen_add) ||
(seen_del && other.seen_del)) {
throw expressions_syntax_error("Each of SET, REMOVE, ADD, DELETE may only appear once in UpdateExpression");
}
std::move(other._actions.begin(), other._actions.end(), std::back_inserter(_actions));
seen_set |= other.seen_set;
seen_remove |= other.seen_remove;
seen_add |= other.seen_add;
seen_del |= other.seen_del;
}
} // namespace parsed
} // namespace alternator
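The `overloaded` helper defined above (and used with std::visit in update_expression::add) is a standard C++17 idiom for building an overload set out of lambdas. A self-contained demonstration of the same idiom, where the variant and the describe function are illustrative, not part of the code above:

```cpp
#include <string>
#include <variant>

// Same C++17 idiom as above: inherit from every lambda and pull in all of
// their call operators, so std::visit can pick the overload matching the
// variant's currently active alternative.
template<class... Ts> struct overloaded : Ts... { using Ts::operator()...; };
template<class... Ts> overloaded(Ts...) -> overloaded<Ts...>;

std::string describe(const std::variant<int, std::string>& v) {
    return std::visit(overloaded {
        [](int)                { return std::string("int"); },
        [](const std::string&) { return std::string("string"); }
    }, v);
}
```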

alternator/expressions.g Normal file

@@ -0,0 +1,214 @@
/*
* Copyright 2019 ScyllaDB
*
* This file is part of Scylla. See the LICENSE.PROPRIETARY file in the
* top-level directory for licensing information.
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
/*
* The DynamoDB protocol is based on JSON, and most DynamoDB requests
* describe the operation and its parameters via JSON objects such as maps
* and lists. Nevertheless, in some types of requests an "expression" is
* passed as a single string, and we need to parse this string. These
* cases include:
* 1. Attribute paths, such as "a[3].b.c", are used in projection
* expressions as well as inside other expressions described below.
* 2. Condition expressions, such as "(NOT (a=b OR c=d)) AND e=f",
* used in conditional updates, filters, and other places.
* 3. Update expressions, such as "SET #a.b = :x, c = :y DELETE d"
*
* All these expression syntaxes are very simple: Most of them could be
* parsed as regular expressions, and the parenthesized condition expression
* could be done with a simple hand-written lexical analyzer and recursive-
* descent parser. Nevertheless, we decided to specify these parsers in the
* ANTLR3 language already used in the Scylla project, hopefully making these
* parsers easier to reason about, and easier to change if needed - and
* reducing the amount of boiler-plate code.
*/
grammar expressions;
options {
language = Cpp;
}
@parser::namespace{alternator}
@lexer::namespace{alternator}
/* TODO: explain what these traits things are. I haven't seen them explained
* in any document... Compilation fails without them because a definition
* of "expressionsLexerTraits" and "expressionsParserTraits" is needed.
*/
@lexer::traits {
class expressionsLexer;
class expressionsParser;
typedef antlr3::Traits<expressionsLexer, expressionsParser> expressionsLexerTraits;
}
@parser::traits {
typedef expressionsLexerTraits expressionsParserTraits;
}
@lexer::header {
#include "alternator/expressions.hh"
// ANTLR generates a bunch of unused variables and functions. Yuck...
#pragma GCC diagnostic ignored "-Wunused-variable"
#pragma GCC diagnostic ignored "-Wunused-function"
}
@parser::header {
#include "expressionsLexer.hpp"
}
/* By default, ANTLR3 composes elaborate syntax-error messages, saying which
* token was unexpected, where, and so on, but then dutifully writes these
* error messages to the standard error, and returns from the parser as if
* everything was fine, with a half-constructed output object! If we define
* the "displayRecognitionError" method, it will be called upon to build this
* error message, and we can instead throw an exception to stop the parsing
* immediately. This is good enough for now, for our simple needs, but if
* we ever want to show more information about the syntax error, Cql3.g
* contains an elaborate implementation (it would be nice if we could reuse
* it, not duplicate it).
* Unfortunately, we have to repeat the same definition twice - once for the
* parser, and once for the lexer.
*/
@parser::context {
void displayRecognitionError(ANTLR_UINT8** token_names, ExceptionBaseType* ex) {
throw expressions_syntax_error("syntax error");
}
}
@lexer::context {
void displayRecognitionError(ANTLR_UINT8** token_names, ExceptionBaseType* ex) {
throw expressions_syntax_error("syntax error");
}
}
/*
* Lexical analysis phase, i.e., splitting the input up to tokens.
* Lexical analyzer rules have names starting in capital letters.
* "fragment" rules do not generate tokens, and are just aliases used to
* make other rules more readable.
* Characters *not* listed here, e.g., '=', '(', etc., will be handled
* as individual tokens in their own right.
* Whitespace spans are skipped, so do not generate tokens.
*/
WHITESPACE: (' ' | '\t' | '\n' | '\r')+ { skip(); };
/* shortcuts for case-insensitive keywords */
fragment A:('a'|'A');
fragment B:('b'|'B');
fragment C:('c'|'C');
fragment D:('d'|'D');
fragment E:('e'|'E');
fragment F:('f'|'F');
fragment G:('g'|'G');
fragment H:('h'|'H');
fragment I:('i'|'I');
fragment J:('j'|'J');
fragment K:('k'|'K');
fragment L:('l'|'L');
fragment M:('m'|'M');
fragment N:('n'|'N');
fragment O:('o'|'O');
fragment P:('p'|'P');
fragment Q:('q'|'Q');
fragment R:('r'|'R');
fragment S:('s'|'S');
fragment T:('t'|'T');
fragment U:('u'|'U');
fragment V:('v'|'V');
fragment W:('w'|'W');
fragment X:('x'|'X');
fragment Y:('y'|'Y');
fragment Z:('z'|'Z');
/* These keywords must appear before the generic NAME token below,
* because NAME matches too, and the first to match wins.
*/
SET: S E T;
REMOVE: R E M O V E;
ADD: A D D;
DELETE: D E L E T E;
fragment ALPHA: 'A'..'Z' | 'a'..'z';
fragment DIGIT: '0'..'9';
fragment ALNUM: ALPHA | DIGIT | '_';
INTEGER: DIGIT+;
NAME: ALPHA ALNUM*;
NAMEREF: '#' ALNUM+;
VALREF: ':' ALNUM+;
/*
* Parsing phase - parsing the string of tokens generated by the lexical
* analyzer defined above.
*/
path_component: NAME | NAMEREF;
path returns [parsed::path p]:
root=path_component { $p.set_root($root.text); }
( '.' name=path_component { $p.add_dot($name.text); }
| '[' INTEGER ']' { $p.add_index(std::stoi($INTEGER.text)); }
)*;
update_expression_set_value returns [parsed::value v]:
VALREF { $v.set_valref($VALREF.text); }
| path { $v.set_path($path.p); }
| NAME { $v.set_func_name($NAME.text); }
'(' x=update_expression_set_value { $v.add_func_parameter($x.v); }
(',' x=update_expression_set_value { $v.add_func_parameter($x.v); })*
')'
;
update_expression_set_rhs returns [parsed::set_rhs rhs]:
v=update_expression_set_value { $rhs.set_value(std::move($v.v)); }
( '+' v=update_expression_set_value { $rhs.set_plus(std::move($v.v)); }
| '-' v=update_expression_set_value { $rhs.set_minus(std::move($v.v)); }
)?
;
update_expression_set_action returns [parsed::update_expression::action a]:
path '=' rhs=update_expression_set_rhs { $a.assign_set($path.p, $rhs.rhs); };
update_expression_remove_action returns [parsed::update_expression::action a]:
path { $a.assign_remove($path.p); };
update_expression_add_action returns [parsed::update_expression::action a]:
path VALREF { $a.assign_add($path.p, $VALREF.text); };
update_expression_delete_action returns [parsed::update_expression::action a]:
path VALREF { $a.assign_del($path.p, $VALREF.text); };
update_expression_clause returns [parsed::update_expression e]:
SET s=update_expression_set_action { $e.add(s); }
(',' s=update_expression_set_action { $e.add(s); })*
| REMOVE r=update_expression_remove_action { $e.add(r); }
(',' r=update_expression_remove_action { $e.add(r); })*
| ADD a=update_expression_add_action { $e.add(a); }
(',' a=update_expression_add_action { $e.add(a); })*
| DELETE d=update_expression_delete_action { $e.add(d); }
(',' d=update_expression_delete_action { $e.add(d); })*
;
// Note the "EOF" token at the end of the update expression. We want the
// parser to match the entire string given to it - not just its beginning!
update_expression returns [parsed::update_expression e]:
(update_expression_clause { e.append($update_expression_clause.e); })* EOF;
projection_expression returns [std::vector<parsed::path> v]:
p=path { $v.push_back(std::move($p.p)); }
(',' p=path { $v.push_back(std::move($p.p)); } )* EOF;

alternator/expressions.hh Normal file

@@ -0,0 +1,41 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <string>
#include <stdexcept>
#include <vector>
#include "expressions_types.hh"
namespace alternator {
class expressions_syntax_error : public std::runtime_error {
public:
using runtime_error::runtime_error;
};
parsed::update_expression parse_update_expression(std::string query);
std::vector<parsed::path> parse_projection_expression(std::string query);
} /* namespace alternator */

alternator/expressions_types.hh Normal file

@@ -0,0 +1,166 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <vector>
#include <string>
#include <variant>
/*
* Parsed representation of expressions and their components.
*
* Types in alternator::parse namespace are used for holding the parse
* tree - objects generated by the Antlr rules after parsing an expression.
* Because of the way Antlr works, all these objects are default-constructed
* first, and then assigned when the rule is completed, so all these types
* have only default constructors - but setter functions to set them later.
*/
namespace alternator {
namespace parsed {
// "path" is an attribute's path in a document, e.g., a.b[3].c.
class path {
// All paths have a "root", a top-level attribute, and any number of
// "dereference operators" - each either an index (e.g., "[2]") or a
// dot (e.g., ".xyz").
std::string _root;
std::vector<std::variant<std::string, unsigned>> _operators;
public:
void set_root(std::string root) {
_root = std::move(root);
}
void add_index(unsigned i) {
_operators.emplace_back(i);
}
void add_dot(std::string name) {
_operators.emplace_back(std::move(name));
}
const std::string& root() const {
return _root;
}
bool has_operators() const {
return !_operators.empty();
}
};
// "value" is a value used in the right hand side of an assignment
// expression, "SET a = ...". It can be a reference to a value included in
// the request (":val"), a path to an attribute from the existing item
// (e.g., "a.b[3].c"), or a function of other such values.
// Note that the real right-hand-side of an assignment is actually a bit
// more general - it allows either a value, or a value+value or value-value -
// see class set_rhs below.
struct value {
struct function_call {
std::string _function_name;
std::vector<value> _parameters;
};
std::variant<std::string, path, function_call> _value;
void set_valref(std::string s) {
_value = std::move(s);
}
void set_path(path p) {
_value = std::move(p);
}
void set_func_name(std::string s) {
_value = function_call {std::move(s), {}};
}
void add_func_parameter(value v) {
std::get<function_call>(_value)._parameters.emplace_back(std::move(v));
}
};
// The right-hand-side of a SET in an update expression can be either a
// single value (see above), or value+value, or value-value.
class set_rhs {
public:
char _op; // '+', '-', or 'v'
value _v1;
value _v2;
void set_value(value&& v1) {
_op = 'v';
_v1 = std::move(v1);
}
void set_plus(value&& v2) {
_op = '+';
_v2 = std::move(v2);
}
void set_minus(value&& v2) {
_op = '-';
_v2 = std::move(v2);
}
};
class update_expression {
public:
struct action {
path _path;
struct set {
set_rhs _rhs;
};
struct remove {
};
struct add {
std::string _valref;
};
struct del {
std::string _valref;
};
std::variant<set, remove, add, del> _action;
void assign_set(path p, set_rhs rhs) {
_path = std::move(p);
_action = set { std::move(rhs) };
}
void assign_remove(path p) {
_path = std::move(p);
_action = remove { };
}
void assign_add(path p, std::string v) {
_path = std::move(p);
_action = add { std::move(v) };
}
void assign_del(path p, std::string v) {
_path = std::move(p);
_action = del { std::move(v) };
}
};
private:
std::vector<action> _actions;
bool seen_set = false;
bool seen_remove = false;
bool seen_add = false;
bool seen_del = false;
public:
void add(action a);
void append(update_expression other);
bool empty() const {
return _actions.empty();
}
const std::vector<action>& actions() const {
return _actions;
}
};
} // namespace parsed
} // namespace alternator
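As a concrete illustration of the parsed::path representation above, here is a hand-built decomposition of the DynamoDB attribute path a.b[3].c into a root plus dereference operators. mini_path is an illustrative stand-in for the real class, using the same root-plus-variant-operators layout:

```cpp
#include <string>
#include <variant>
#include <vector>

// Illustrative stand-in for parsed::path: the path a.b[3].c has root "a"
// and three dereference operators - dot "b", index 3, dot "c".
struct mini_path {
    std::string root;
    std::vector<std::variant<std::string, unsigned>> operators;
};

mini_path decompose_a_b3_c() {
    mini_path p;
    p.root = "a";
    p.operators.emplace_back(std::string("b")); // the ".b" dot operator
    p.operators.emplace_back(3u);               // the "[3]" index operator
    p.operators.emplace_back(std::string("c")); // the ".c" dot operator
    return p;
}
```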

alternator/rjson.cc Normal file

@@ -0,0 +1,172 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "rjson.hh"
#include "error.hh"
#include <seastar/core/print.hh>
namespace rjson {
static allocator the_allocator;
std::string print(const rjson::value& value) {
string_buffer buffer;
writer writer(buffer);
value.Accept(writer);
return std::string(buffer.GetString());
}
rjson::value copy(const rjson::value& value) {
return rjson::value(value, the_allocator);
}
rjson::value parse(const std::string& str) {
return parse_raw(str.c_str(), str.size());
}
rjson::value parse_raw(const char* c_str, size_t size) {
rjson::document d;
d.Parse(c_str, size);
if (d.HasParseError()) {
throw rjson::error(format("Parsing JSON failed: {}", GetParseError_En(d.GetParseError())));
}
rjson::value& v = d;
return std::move(v);
}
rjson::value& get(rjson::value& value, rjson::string_ref_type name) {
auto member_it = value.FindMember(name);
if (member_it != value.MemberEnd())
return member_it->value;
else {
throw rjson::error(format("JSON parameter {} not found", name));
}
}
const rjson::value& get(const rjson::value& value, rjson::string_ref_type name) {
auto member_it = value.FindMember(name);
if (member_it != value.MemberEnd())
return member_it->value;
else {
throw rjson::error(format("JSON parameter {} not found", name));
}
}
rjson::value from_string(const std::string& str) {
return rjson::value(str.c_str(), str.size(), the_allocator);
}
rjson::value from_string(const sstring& str) {
return rjson::value(str.c_str(), str.size(), the_allocator);
}
rjson::value from_string(const char* str, size_t size) {
return rjson::value(str, size, the_allocator);
}
const rjson::value* find(const rjson::value& value, string_ref_type name) {
auto member_it = value.FindMember(name);
return member_it != value.MemberEnd() ? &member_it->value : nullptr;
}
rjson::value* find(rjson::value& value, string_ref_type name) {
auto member_it = value.FindMember(name);
return member_it != value.MemberEnd() ? &member_it->value : nullptr;
}
void set_with_string_name(rjson::value& base, const std::string& name, rjson::value&& member) {
base.AddMember(rjson::value(name.c_str(), name.size(), the_allocator), std::move(member), the_allocator);
}
void set_with_string_name(rjson::value& base, const std::string& name, rjson::string_ref_type member) {
base.AddMember(rjson::value(name.c_str(), name.size(), the_allocator), rjson::value(member), the_allocator);
}
void set(rjson::value& base, rjson::string_ref_type name, rjson::value&& member) {
base.AddMember(name, std::move(member), the_allocator);
}
void set(rjson::value& base, rjson::string_ref_type name, rjson::string_ref_type member) {
base.AddMember(name, rjson::value(member), the_allocator);
}
void push_back(rjson::value& base_array, rjson::value&& item) {
base_array.PushBack(std::move(item), the_allocator);
}
bool single_value_comp::operator()(const rjson::value& r1, const rjson::value& r2) const {
auto r1_type = r1.GetType();
auto r2_type = r2.GetType();
// null is the smallest type and compares less than every other type; nothing is less than null
if (r1_type == rjson::type::kNullType || r2_type == rjson::type::kNullType) {
return r1_type < r2_type;
}
// only null, true, and false are comparable with each other, other types are not compatible
if (r1_type != r2_type) {
if (r1_type > rjson::type::kTrueType || r2_type > rjson::type::kTrueType) {
throw rjson::error(format("Types are not comparable: {} {}", r1, r2));
}
}
switch (r1_type) {
case rjson::type::kNullType:
// fall-through
case rjson::type::kFalseType:
// fall-through
case rjson::type::kTrueType:
return r1_type < r2_type;
case rjson::type::kObjectType:
throw rjson::error("Object type comparison is not supported");
case rjson::type::kArrayType:
throw rjson::error("Array type comparison is not supported");
case rjson::type::kStringType: {
const size_t r1_len = r1.GetStringLength();
const size_t r2_len = r2.GetStringLength();
size_t len = std::min(r1_len, r2_len);
int result = std::strncmp(r1.GetString(), r2.GetString(), len);
return result < 0 || (result == 0 && r1_len < r2_len);
}
case rjson::type::kNumberType: {
if (r1.IsInt() && r2.IsInt()) {
return r1.GetInt() < r2.GetInt();
} else if (r1.IsUint() && r2.IsUint()) {
return r1.GetUint() < r2.GetUint();
} else if (r1.IsInt64() && r2.IsInt64()) {
return r1.GetInt64() < r2.GetInt64();
} else if (r1.IsUint64() && r2.IsUint64()) {
return r1.GetUint64() < r2.GetUint64();
} else {
// it's safe to call GetDouble() on any number type
return r1.GetDouble() < r2.GetDouble();
}
}
default:
return false;
}
}
} // end namespace rjson
std::ostream& std::operator<<(std::ostream& os, const rjson::value& v) {
return os << rjson::print(v);
}
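The kStringType branch of single_value_comp above reduces to a lexicographic less-than over length-delimited strings: compare the common prefix with strncmp, then break ties by length. A standalone sketch of just that logic (str_less is an illustrative name):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

// Standalone sketch of the string branch of single_value_comp: compare the
// common prefix byte-wise; if the prefixes are equal, the shorter string
// orders first.
bool str_less(const char* s1, std::size_t l1, const char* s2, std::size_t l2) {
    std::size_t len = std::min(l1, l2);
    int result = std::strncmp(s1, s2, len);
    return result < 0 || (result == 0 && l1 < l2);
}
```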

alternator/rjson.hh Normal file

@@ -0,0 +1,163 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
/*
* rjson is a wrapper over rapidjson library, providing fast JSON parsing and generation.
*
* rapidjson has strict copy elision policies, which, among other things, involves
* using provided char arrays without copying them and allows copying objects only explicitly.
* As such, one should be careful when passing strings with limited liveness
* (e.g. data underneath local std::strings) to rjson functions, because created JSON objects
* may end up relying on dangling char pointers. All rjson functions that create JSON values from
* strings come in two variants: one taking string_ref_type (more optimal, used when the string is
* known to live at least as long as the object, e.g. a static char array) and one taking std::string.
* The more optimal variants should be used *only* if the liveness of the string is guaranteed,
* otherwise the result is undefined behaviour.
* Also, bear in mind that methods exposed by rjson::value are generic, but some of them
* work fine only for specific types. In case the type does not match, an rjson::error will be thrown.
* Examples of such mismatched usages is calling MemberCount() on a JSON value not of object type
* or calling Size() on a non-array value.
*/
#include <string>
#include <stdexcept>
namespace rjson {
class error : public std::exception {
std::string _msg;
public:
error() = default;
error(const std::string& msg) : _msg(msg) {}
virtual const char* what() const noexcept override { return _msg.c_str(); }
};
}
// rapidjson configuration macros
#define RAPIDJSON_HAS_STDSTRING 1
// Default rapidjson policy is to use assert() - which is dangerous for two reasons:
// 1. assert() can be turned off with -DNDEBUG
// 2. assert() crashes a program
// Fortunately, the default policy can be overridden, and so rapidjson errors will
// throw an rjson::error exception instead.
#define RAPIDJSON_ASSERT(x) do { if (!(x)) throw rjson::error(std::string("JSON error: condition not met: ") + #x); } while (0)
#include <rapidjson/document.h>
#include <rapidjson/writer.h>
#include <rapidjson/stringbuffer.h>
#include <rapidjson/error/en.h>
#include <seastar/core/sstring.hh>
#include "seastarx.hh"
namespace rjson {
using allocator = rapidjson::CrtAllocator;
using encoding = rapidjson::UTF8<>;
using document = rapidjson::GenericDocument<encoding, allocator>;
using value = rapidjson::GenericValue<encoding, allocator>;
using string_ref_type = value::StringRefType;
using string_buffer = rapidjson::GenericStringBuffer<encoding>;
using writer = rapidjson::Writer<string_buffer, encoding>;
using type = rapidjson::Type;
// Returns an object representing JSON's null
inline rjson::value null_value() {
return rjson::value(rapidjson::kNullType);
}
// Returns an empty JSON object - {}
inline rjson::value empty_object() {
return rjson::value(rapidjson::kObjectType);
}
// Returns an empty JSON array - []
inline rjson::value empty_array() {
return rjson::value(rapidjson::kArrayType);
}
// Returns an empty JSON string - ""
inline rjson::value empty_string() {
return rjson::value(rapidjson::kStringType);
}
// Convert the JSON value to a string with JSON syntax, the opposite of parse().
// The representation is dense - without any redundant indentation.
std::string print(const rjson::value& value);
// Copies given JSON value - involves allocation
rjson::value copy(const rjson::value& value);
// Parses a JSON value from given string or raw character array.
// The string/char array liveness does not need to be persisted,
// as both parse() and parse_raw() will allocate member names and values.
// Throws rjson::error if parsing failed.
rjson::value parse(const std::string& str);
rjson::value parse_raw(const char* c_str, size_t size);
// Creates a JSON value (of JSON string type) out of internal string representations.
// The string value is copied, so str's liveness does not need to be persisted.
rjson::value from_string(const std::string& str);
rjson::value from_string(const sstring& str);
rjson::value from_string(const char* str, size_t size);
// Returns a pointer to JSON member if it exists, nullptr otherwise
rjson::value* find(rjson::value& value, rjson::string_ref_type name);
const rjson::value* find(const rjson::value& value, rjson::string_ref_type name);
// Returns a reference to JSON member if it exists, throws otherwise
rjson::value& get(rjson::value& value, rjson::string_ref_type name);
const rjson::value& get(const rjson::value& value, rjson::string_ref_type name);
// Sets a member in given JSON object by moving the member - allocates the name.
// Throws if base is not a JSON object.
void set_with_string_name(rjson::value& base, const std::string& name, rjson::value&& member);
// Sets a string member in given JSON object by assigning its reference - allocates the name.
// NOTICE: member string liveness must be ensured to be at least as long as base's.
// Throws if base is not a JSON object.
void set_with_string_name(rjson::value& base, const std::string& name, rjson::string_ref_type member);
// Sets a member in given JSON object by moving the member.
// NOTICE: name liveness must be ensured to be at least as long as base's.
// Throws if base is not a JSON object.
void set(rjson::value& base, rjson::string_ref_type name, rjson::value&& member);
// Sets a string member in given JSON object by assigning its reference.
// NOTICE: name liveness must be ensured to be at least as long as base's.
// NOTICE: member liveness must be ensured to be at least as long as base's.
// Throws if base is not a JSON object.
void set(rjson::value& base, rjson::string_ref_type name, rjson::string_ref_type member);
// Adds a value to a JSON list by moving the item to its end.
// Throws if base_array is not a JSON array.
void push_back(rjson::value& base_array, rjson::value&& item);
struct single_value_comp {
bool operator()(const rjson::value& r1, const rjson::value& r2) const;
};
} // end namespace rjson
namespace std {
std::ostream& operator<<(std::ostream& os, const rjson::value& v);
}

alternator/serialization.cc (new file)

@@ -0,0 +1,261 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "base64.hh"
#include "log.hh"
#include "serialization.hh"
#include "error.hh"
#include "rapidjson/writer.h"
#include "concrete_types.hh"
#include "cql3/type_json.hh"
static logging::logger slogger("alternator-serialization");
namespace alternator {
type_info type_info_from_string(std::string type) {
static thread_local const std::unordered_map<std::string, type_info> type_infos = {
{"S", {alternator_type::S, utf8_type}},
{"B", {alternator_type::B, bytes_type}},
{"BOOL", {alternator_type::BOOL, boolean_type}},
{"N", {alternator_type::N, decimal_type}}, //FIXME: Replace with custom Alternator type when implemented
};
auto it = type_infos.find(type);
if (it == type_infos.end()) {
return {alternator_type::NOT_SUPPORTED_YET, utf8_type};
}
return it->second;
}
type_representation represent_type(alternator_type atype) {
static thread_local const std::unordered_map<alternator_type, type_representation> type_representations = {
{alternator_type::S, {"S", utf8_type}},
{alternator_type::B, {"B", bytes_type}},
{alternator_type::BOOL, {"BOOL", boolean_type}},
{alternator_type::N, {"N", decimal_type}}, //FIXME: Replace with custom Alternator type when implemented
};
auto it = type_representations.find(atype);
if (it == type_representations.end()) {
throw std::runtime_error(format("Unknown alternator type {}", int8_t(atype)));
}
return it->second;
}
struct from_json_visitor {
const rjson::value& v;
bytes_ostream& bo;
void operator()(const reversed_type_impl& t) const { visit(*t.underlying_type(), from_json_visitor{v, bo}); };
void operator()(const string_type_impl& t) {
bo.write(t.from_string(sstring_view(v.GetString(), v.GetStringLength())));
}
void operator()(const bytes_type_impl& t) const {
bo.write(base64_decode(v));
}
void operator()(const boolean_type_impl& t) const {
bo.write(boolean_type->decompose(v.GetBool()));
}
void operator()(const decimal_type_impl& t) const {
bo.write(t.from_string(sstring_view(v.GetString(), v.GetStringLength())));
}
// default
void operator()(const abstract_type& t) const {
bo.write(from_json_object(t, Json::Value(rjson::print(v)), cql_serialization_format::internal()));
}
};
bytes serialize_item(const rjson::value& item) {
if (item.IsNull() || item.MemberCount() != 1) {
throw api_error("ValidationException", format("An item can contain only one attribute definition: {}", item));
}
auto it = item.MemberBegin();
type_info type_info = type_info_from_string(it->name.GetString()); // JSON keys are guaranteed to be strings
if (type_info.atype == alternator_type::NOT_SUPPORTED_YET) {
slogger.trace("Non-optimal serialization of type {}", it->name.GetString());
return bytes{int8_t(type_info.atype)} + to_bytes(rjson::print(item));
}
bytes_ostream bo;
bo.write(bytes{int8_t(type_info.atype)});
visit(*type_info.dtype, from_json_visitor{it->value, bo});
return bytes(bo.linearize());
}
struct to_json_visitor {
rjson::value& deserialized;
const std::string& type_ident;
bytes_view bv;
void operator()(const reversed_type_impl& t) const { visit(*t.underlying_type(), to_json_visitor{deserialized, type_ident, bv}); };
void operator()(const decimal_type_impl& t) const {
auto s = to_json_string(*decimal_type, bytes(bv));
//FIXME(sarna): unnecessary copy
rjson::set_with_string_name(deserialized, type_ident, rjson::from_string(s));
}
void operator()(const string_type_impl& t) {
rjson::set_with_string_name(deserialized, type_ident, rjson::from_string(reinterpret_cast<const char *>(bv.data()), bv.size()));
}
void operator()(const bytes_type_impl& t) const {
std::string b64 = base64_encode(bv);
rjson::set_with_string_name(deserialized, type_ident, rjson::from_string(b64));
}
// default
void operator()(const abstract_type& t) const {
rjson::set_with_string_name(deserialized, type_ident, rjson::parse(t.to_string(bytes(bv))));
}
};
rjson::value deserialize_item(bytes_view bv) {
rjson::value deserialized(rapidjson::kObjectType);
if (bv.empty()) {
throw api_error("ValidationException", "Serialized value empty");
}
alternator_type atype = alternator_type(bv[0]);
bv.remove_prefix(1);
if (atype == alternator_type::NOT_SUPPORTED_YET) {
slogger.trace("Non-optimal deserialization of alternator type {}", int8_t(atype));
return rjson::parse_raw(reinterpret_cast<const char *>(bv.data()), bv.size());
}
type_representation type_representation = represent_type(atype);
visit(*type_representation.dtype, to_json_visitor{deserialized, type_representation.ident, bv});
return deserialized;
}
std::string type_to_string(data_type type) {
static thread_local std::unordered_map<data_type, std::string> types = {
{utf8_type, "S"},
{bytes_type, "B"},
{boolean_type, "BOOL"},
{decimal_type, "N"}, // FIXME: use a specialized Alternator number type instead of the general decimal_type
};
auto it = types.find(type);
if (it == types.end()) {
throw std::runtime_error(format("Unknown type {}", type->name()));
}
return it->second;
}
bytes get_key_column_value(const rjson::value& item, const column_definition& column) {
std::string column_name = column.name_as_text();
std::string expected_type = type_to_string(column.type);
const rjson::value& key_typed_value = rjson::get(item, rjson::value::StringRefType(column_name.c_str()));
if (!key_typed_value.IsObject() || key_typed_value.MemberCount() != 1) {
throw api_error("ValidationException",
format("Missing or invalid value object for key column {}: {}", column_name, item));
}
return get_key_from_typed_value(key_typed_value, column, expected_type);
}
bytes get_key_from_typed_value(const rjson::value& key_typed_value, const column_definition& column, const std::string& expected_type) {
auto it = key_typed_value.MemberBegin();
if (it->name.GetString() != expected_type) {
throw api_error("ValidationException",
format("Type mismatch: expected type {} for key column {}, got type {}",
expected_type, column.name_as_text(), it->name.GetString()));
}
if (column.type == bytes_type) {
return base64_decode(it->value);
} else {
return column.type->from_string(it->value.GetString());
}
}
rjson::value json_key_column_value(bytes_view cell, const column_definition& column) {
if (column.type == bytes_type) {
std::string b64 = base64_encode(cell);
return rjson::from_string(b64);
} else if (column.type == utf8_type) {
return rjson::from_string(std::string(reinterpret_cast<const char*>(cell.data()), cell.size()));
} else if (column.type == decimal_type) {
// FIXME: use specialized Alternator number type, not the more
// general "decimal_type". A dedicated type can be more efficient
// in storage space and in parsing speed.
auto s = to_json_string(*decimal_type, bytes(cell));
return rjson::from_string(s);
} else {
// We shouldn't get here, we shouldn't see such key columns.
throw std::runtime_error(format("Unexpected key type: {}", column.type->name()));
}
}
partition_key pk_from_json(const rjson::value& item, schema_ptr schema) {
std::vector<bytes> raw_pk;
// FIXME: this is a loop, but we really allow only one partition key column.
for (const column_definition& cdef : schema->partition_key_columns()) {
bytes raw_value = get_key_column_value(item, cdef);
raw_pk.push_back(std::move(raw_value));
}
return partition_key::from_exploded(raw_pk);
}
clustering_key ck_from_json(const rjson::value& item, schema_ptr schema) {
if (schema->clustering_key_size() == 0) {
return clustering_key::make_empty();
}
std::vector<bytes> raw_ck;
// FIXME: this is a loop, but we really allow only one clustering key column.
for (const column_definition& cdef : schema->clustering_key_columns()) {
bytes raw_value = get_key_column_value(item, cdef);
raw_ck.push_back(std::move(raw_value));
}
return clustering_key::from_exploded(raw_ck);
}
big_decimal unwrap_number(const rjson::value& v, std::string_view diagnostic) {
if (!v.IsObject() || v.MemberCount() != 1) {
throw api_error("ValidationException", format("{}: invalid number object", diagnostic));
}
auto it = v.MemberBegin();
if (it->name != "N") {
throw api_error("ValidationException", format("{}: expected number, found type '{}'", diagnostic, it->name));
}
if (it->value.IsNumber()) {
// FIXME(sarna): should use big_decimal constructor with numeric values directly:
return big_decimal(rjson::print(it->value));
}
if (!it->value.IsString()) {
throw api_error("ValidationException", format("{}: improperly formatted number constant", diagnostic));
}
return big_decimal(it->value.GetString());
}
const std::pair<std::string, const rjson::value*> unwrap_set(const rjson::value& v) {
if (!v.IsObject() || v.MemberCount() != 1) {
return {"", nullptr};
}
auto it = v.MemberBegin();
const std::string it_key = it->name.GetString();
if (it_key != "SS" && it_key != "BS" && it_key != "NS") {
return {"", nullptr};
}
return std::make_pair(it_key, &(it->value));
}
}

alternator/serialization.hh (new file)

@@ -0,0 +1,72 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <string>
#include <string_view>
#include "types.hh"
#include "schema.hh"
#include "keys.hh"
#include "rjson.hh"
#include "utils/big_decimal.hh"
namespace alternator {
enum class alternator_type : int8_t {
S, B, BOOL, N, NOT_SUPPORTED_YET
};
struct type_info {
alternator_type atype;
data_type dtype;
};
struct type_representation {
std::string ident;
data_type dtype;
};
type_info type_info_from_string(std::string type);
type_representation represent_type(alternator_type atype);
bytes serialize_item(const rjson::value& item);
rjson::value deserialize_item(bytes_view bv);
std::string type_to_string(data_type type);
bytes get_key_column_value(const rjson::value& item, const column_definition& column);
bytes get_key_from_typed_value(const rjson::value& key_typed_value, const column_definition& column, const std::string& expected_type);
rjson::value json_key_column_value(bytes_view cell, const column_definition& column);
partition_key pk_from_json(const rjson::value& item, schema_ptr schema);
clustering_key ck_from_json(const rjson::value& item, schema_ptr schema);
// If v encodes a number (i.e., it is a {"N": ...} object), returns a big_decimal
// representing it. Otherwise, throws an api_error (ValidationException) that
// includes the given diagnostic.
big_decimal unwrap_number(const rjson::value& v, std::string_view diagnostic);
// Checks if a given JSON object encodes a set (i.e., it is a {"SS": [...]},
// {"NS": [...]} or {"BS": [...]} object) and returns the set's type identifier
// and a pointer to the set. If the object does not encode a set, the returned
// value is {"", nullptr}.
const std::pair<std::string, const rjson::value*> unwrap_set(const rjson::value& v);
}

alternator/server.cc (new file)

@@ -0,0 +1,314 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "alternator/server.hh"
#include "log.hh"
#include <seastar/http/function_handlers.hh>
#include <seastar/json/json_elements.hh>
#include <seastarx.hh>
#include "error.hh"
#include "rjson.hh"
#include "auth.hh"
#include <cctype>
#include "cql3/query_processor.hh"
static logging::logger slogger("alternator-server");
using namespace httpd;
namespace alternator {
static constexpr auto TARGET = "X-Amz-Target";
inline std::vector<std::string_view> split(std::string_view text, char separator) {
std::vector<std::string_view> tokens;
if (text == "") {
return tokens;
}
while (true) {
auto pos = text.find_first_of(separator);
if (pos != std::string_view::npos) {
tokens.emplace_back(text.data(), pos);
text.remove_prefix(pos + 1);
} else {
tokens.emplace_back(text);
break;
}
}
return tokens;
}
// DynamoDB HTTP error responses are structured as follows
// https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Programming.Errors.html
// Our handlers throw an exception to report an error. If the exception
// is of type alternator::api_error, it is unwrapped and properly reported to
// the user directly. Other exceptions are unexpected, and reported as
// Internal Server Error.
class api_handler : public handler_base {
public:
api_handler(const future_json_function& _handle) : _f_handle(
[_handle](std::unique_ptr<request> req, std::unique_ptr<reply> rep) {
return seastar::futurize_apply(_handle, std::move(req)).then_wrapped([rep = std::move(rep)](future<json::json_return_type> resf) mutable {
if (resf.failed()) {
// Exceptions of type api_error are wrapped as JSON and
// returned to the client as expected. Other types of
// exceptions are unexpected, and returned to the user
// as an internal server error:
api_error ret;
try {
resf.get();
} catch (api_error &ae) {
ret = ae;
} catch (rjson::error & re) {
ret = api_error("ValidationException", re.what());
} catch (...) {
ret = api_error(
"Internal Server Error",
format("Internal server error: {}", std::current_exception()),
reply::status_type::internal_server_error);
}
// FIXME: what is this version number?
rep->_content += "{\"__type\":\"com.amazonaws.dynamodb.v20120810#" + ret._type + "\"," +
"\"message\":\"" + ret._msg + "\"}";
rep->_status = ret._http_code;
slogger.trace("api_handler error case: {}", rep->_content);
return make_ready_future<std::unique_ptr<reply>>(std::move(rep));
}
slogger.trace("api_handler success case");
auto res = resf.get0();
if (res._body_writer) {
rep->write_body("json", std::move(res._body_writer));
} else {
rep->_content += res._res;
}
return make_ready_future<std::unique_ptr<reply>>(std::move(rep));
});
}), _type("json") { }
api_handler(const api_handler&) = default;
future<std::unique_ptr<reply>> handle(const sstring& path,
std::unique_ptr<request> req, std::unique_ptr<reply> rep) override {
return _f_handle(std::move(req), std::move(rep)).then(
[this](std::unique_ptr<reply> rep) {
rep->done(_type);
return make_ready_future<std::unique_ptr<reply>>(std::move(rep));
});
}
protected:
future_handler_function _f_handle;
sstring _type;
};
class health_handler : public handler_base {
virtual future<std::unique_ptr<reply>> handle(const sstring& path, std::unique_ptr<request> req, std::unique_ptr<reply> rep) override {
rep->set_status(reply::status_type::ok);
rep->write_body("txt", format("healthy: {}", req->get_header("Host")));
return make_ready_future<std::unique_ptr<reply>>(std::move(rep));
}
};
future<> server::verify_signature(const request& req) {
if (!_enforce_authorization) {
slogger.debug("Skipping authorization");
return make_ready_future<>();
}
auto host_it = req._headers.find("Host");
if (host_it == req._headers.end()) {
throw api_error("InvalidSignatureException", "Host header is mandatory for signature verification");
}
auto authorization_it = req._headers.find("Authorization");
if (authorization_it == req._headers.end()) {
throw api_error("InvalidSignatureException", "Authorization header is mandatory for signature verification");
}
std::string host = host_it->second;
std::vector<std::string_view> credentials_raw = split(authorization_it->second, ' ');
std::string credential;
std::string user_signature;
std::string signed_headers_str;
std::vector<std::string_view> signed_headers;
for (std::string_view entry : credentials_raw) {
std::vector<std::string_view> entry_split = split(entry, '=');
if (entry_split.size() != 2) {
if (entry != "AWS4-HMAC-SHA256") {
throw api_error("InvalidSignatureException", format("Only AWS4-HMAC-SHA256 algorithm is supported. Found: {}", entry));
}
continue;
}
std::string_view auth_value = entry_split[1];
// Commas appear as an additional (quite redundant) delimiter
if (auth_value.back() == ',') {
auth_value.remove_suffix(1);
}
if (entry_split[0] == "Credential") {
credential = std::string(auth_value);
} else if (entry_split[0] == "Signature") {
user_signature = std::string(auth_value);
} else if (entry_split[0] == "SignedHeaders") {
signed_headers_str = std::string(auth_value);
signed_headers = split(auth_value, ';');
std::sort(signed_headers.begin(), signed_headers.end());
}
}
std::vector<std::string_view> credential_split = split(credential, '/');
if (credential_split.size() != 5) {
throw api_error("ValidationException", format("Incorrect credential information format: {}", credential));
}
std::string user(credential_split[0]);
std::string datestamp(credential_split[1]);
std::string region(credential_split[2]);
std::string service(credential_split[3]);
std::map<std::string_view, std::string_view> signed_headers_map;
for (const auto& header : signed_headers) {
signed_headers_map.emplace(header, std::string_view());
}
for (auto& header : req._headers) {
std::string header_str;
header_str.resize(header.first.size());
std::transform(header.first.begin(), header.first.end(), header_str.begin(), ::tolower);
auto it = signed_headers_map.find(header_str);
if (it != signed_headers_map.end()) {
it->second = std::string_view(header.second);
}
}
auto cache_getter = [] (std::string username) {
return get_key_from_roles(cql3::get_query_processor().local(), std::move(username));
};
return _key_cache.get_ptr(user, cache_getter).then([this, &req,
user = std::move(user),
host = std::move(host),
datestamp = std::move(datestamp),
signed_headers_str = std::move(signed_headers_str),
signed_headers_map = std::move(signed_headers_map),
region = std::move(region),
service = std::move(service),
user_signature = std::move(user_signature)] (key_cache::value_ptr key_ptr) {
std::string signature = get_signature(user, *key_ptr, std::string_view(host), req._method,
datestamp, signed_headers_str, signed_headers_map, req.content, region, service, "");
if (signature != std::string_view(user_signature)) {
_key_cache.remove(user);
throw api_error("UnrecognizedClientException", "The security token included in the request is invalid.");
}
});
}
future<json::json_return_type> server::handle_api_request(std::unique_ptr<request>&& req) {
_executor.local()._stats.total_operations++;
sstring target = req->get_header(TARGET);
std::vector<std::string_view> split_target = split(target, '.');
//NOTICE(sarna): Target consists of Dynamo API version followed by a dot '.' and operation type (e.g. CreateTable)
std::string op = split_target.empty() ? std::string() : std::string(split_target.back());
slogger.trace("Request: {} {}", op, req->content);
return verify_signature(*req).then([this, op, req = std::move(req)] () mutable {
auto callback_it = _callbacks.find(op);
if (callback_it == _callbacks.end()) {
_executor.local()._stats.unsupported_operations++;
throw api_error("UnknownOperationException",
format("Unsupported operation {}", op));
}
//FIXME: Client state can provide more context, e.g. client's endpoint address
// We use unique_ptr because client_state cannot be moved or copied
return do_with(std::make_unique<executor::client_state>(executor::client_state::internal_tag()), [this, callback_it = std::move(callback_it), op = std::move(op), req = std::move(req)] (std::unique_ptr<executor::client_state>& client_state) mutable {
client_state->set_raw_keyspace(executor::KEYSPACE_NAME);
tracing::trace_state_ptr trace_state = executor::maybe_trace_query(*client_state, op, req->content);
tracing::trace(trace_state, op);
return callback_it->second(_executor.local(), *client_state, trace_state, std::move(req)).finally([trace_state] {});
});
});
}
void server::set_routes(routes& r) {
api_handler* req_handler = new api_handler([this] (std::unique_ptr<request> req) mutable {
return handle_api_request(std::move(req));
});
r.add(operation_type::POST, url("/"), req_handler);
r.add(operation_type::GET, url("/"), new health_handler);
}
//FIXME: A way to immediately invalidate the cache should be considered,
// e.g. when the system table which stores the keys is changed.
// For now, this propagation may take up to 1 minute.
server::server(seastar::sharded<executor>& e)
: _executor(e), _key_cache(1024, 1min, slogger), _enforce_authorization(false)
, _callbacks{
{"CreateTable", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) {
return e.maybe_create_keyspace().then([&e, &client_state, req = std::move(req), trace_state = std::move(trace_state)] () mutable { return e.create_table(client_state, std::move(trace_state), req->content); }); }
},
{"DescribeTable", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.describe_table(client_state, std::move(trace_state), req->content); }},
{"DeleteTable", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.delete_table(client_state, std::move(trace_state), req->content); }},
{"PutItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.put_item(client_state, std::move(trace_state), req->content); }},
{"UpdateItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.update_item(client_state, std::move(trace_state), req->content); }},
{"GetItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.get_item(client_state, std::move(trace_state), req->content); }},
{"DeleteItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.delete_item(client_state, std::move(trace_state), req->content); }},
{"ListTables", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.list_tables(client_state, req->content); }},
{"Scan", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.scan(client_state, std::move(trace_state), req->content); }},
{"DescribeEndpoints", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.describe_endpoints(client_state, req->content, req->get_header("Host")); }},
{"BatchWriteItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.batch_write_item(client_state, std::move(trace_state), req->content); }},
{"BatchGetItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.batch_get_item(client_state, std::move(trace_state), req->content); }},
{"Query", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.query(client_state, std::move(trace_state), req->content); }},
} {
}
future<> server::init(net::inet_address addr, std::optional<uint16_t> port, std::optional<uint16_t> https_port, std::optional<tls::credentials_builder> creds, bool enforce_authorization) {
_enforce_authorization = enforce_authorization;
if (!port && !https_port) {
return make_exception_future<>(std::runtime_error("Either regular port or TLS port"
" must be specified in order to init an alternator HTTP server instance"));
}
return seastar::async([this, addr, port, https_port, creds] {
try {
_executor.invoke_on_all([] (executor& e) {
return e.start();
}).get();
if (port) {
_control.start().get();
_control.set_routes(std::bind(&server::set_routes, this, std::placeholders::_1)).get();
_control.listen(socket_address{addr, *port}).get();
slogger.info("Alternator HTTP server listening on {} port {}", addr, *port);
}
if (https_port) {
_https_control.start().get();
_https_control.set_routes(std::bind(&server::set_routes, this, std::placeholders::_1)).get();
_https_control.server().invoke_on_all([creds] (http_server& serv) {
return serv.set_tls_credentials(creds->build_server_credentials());
}).get();
_https_control.listen(socket_address{addr, *https_port}).get();
slogger.info("Alternator HTTPS server listening on {} port {}", addr, *https_port);
}
} catch (...) {
slogger.error("Failed to set up Alternator HTTP server on {} port {}, TLS port {}: {}",
addr, port ? std::to_string(*port) : "OFF", https_port ? std::to_string(*https_port) : "OFF", std::current_exception());
std::throw_with_nested(std::runtime_error(
format("Failed to set up Alternator HTTP server on {} port {}, TLS port {}",
addr, port ? std::to_string(*port) : "OFF", https_port ? std::to_string(*https_port) : "OFF")));
}
});
}
}

alternator/server.hh (new file)

@@ -0,0 +1,54 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "alternator/executor.hh"
#include <seastar/core/future.hh>
#include <seastar/http/httpd.hh>
#include <seastar/net/tls.hh>
#include <optional>
#include <alternator/auth.hh>
namespace alternator {
class server {
using alternator_callback = std::function<future<json::json_return_type>(executor&, executor::client_state&, tracing::trace_state_ptr, std::unique_ptr<request>)>;
using alternator_callbacks_map = std::unordered_map<std::string_view, alternator_callback>;
seastar::httpd::http_server_control _control;
seastar::httpd::http_server_control _https_control;
seastar::sharded<executor>& _executor;
key_cache _key_cache;
bool _enforce_authorization;
alternator_callbacks_map _callbacks;
public:
server(seastar::sharded<executor>& executor);
seastar::future<> init(net::inet_address addr, std::optional<uint16_t> port, std::optional<uint16_t> https_port, std::optional<tls::credentials_builder> creds, bool enforce_authorization);
private:
void set_routes(seastar::httpd::routes& r);
future<> verify_signature(const seastar::httpd::request& r);
future<json::json_return_type> handle_api_request(std::unique_ptr<request>&& req);
};
}

alternator/stats.cc (new file)

@@ -0,0 +1,98 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "stats.hh"
#include <seastar/core/metrics.hh>
namespace alternator {
const char* ALTERNATOR_METRICS = "alternator";
stats::stats() : api_operations{} {
// Register per-operation counters and latency histograms, labeled by operation name.
    seastar::metrics::label op("op");
    _metrics.add_group("alternator", {
#define OPERATION(name, CamelCaseName) \
            seastar::metrics::make_total_operations("operation", api_operations.name, \
                seastar::metrics::description("number of operations via Alternator API"), {op(CamelCaseName)}),
#define OPERATION_LATENCY(name, CamelCaseName) \
            seastar::metrics::make_histogram("op_latency", \
                seastar::metrics::description("Latency histogram of an operation via Alternator API"), {op(CamelCaseName)}, [this]{return api_operations.name.get_histogram(1,20);}),
            OPERATION(batch_write_item, "BatchWriteItem")
            OPERATION(create_backup, "CreateBackup")
            OPERATION(create_global_table, "CreateGlobalTable")
            OPERATION(create_table, "CreateTable")
            OPERATION(delete_backup, "DeleteBackup")
            OPERATION(delete_item, "DeleteItem")
            OPERATION(delete_table, "DeleteTable")
            OPERATION(describe_backup, "DescribeBackup")
            OPERATION(describe_continuous_backups, "DescribeContinuousBackups")
            OPERATION(describe_endpoints, "DescribeEndpoints")
            OPERATION(describe_global_table, "DescribeGlobalTable")
            OPERATION(describe_global_table_settings, "DescribeGlobalTableSettings")
            OPERATION(describe_limits, "DescribeLimits")
            OPERATION(describe_table, "DescribeTable")
            OPERATION(describe_time_to_live, "DescribeTimeToLive")
            OPERATION(get_item, "GetItem")
            OPERATION(list_backups, "ListBackups")
            OPERATION(list_global_tables, "ListGlobalTables")
            OPERATION(list_tables, "ListTables")
            OPERATION(list_tags_of_resource, "ListTagsOfResource")
            OPERATION(put_item, "PutItem")
            OPERATION(query, "Query")
            OPERATION(restore_table_from_backup, "RestoreTableFromBackup")
            OPERATION(restore_table_to_point_in_time, "RestoreTableToPointInTime")
            OPERATION(scan, "Scan")
            OPERATION(tag_resource, "TagResource")
            OPERATION(transact_get_items, "TransactGetItems")
            OPERATION(transact_write_items, "TransactWriteItems")
            OPERATION(untag_resource, "UntagResource")
            OPERATION(update_continuous_backups, "UpdateContinuousBackups")
            OPERATION(update_global_table, "UpdateGlobalTable")
            OPERATION(update_global_table_settings, "UpdateGlobalTableSettings")
            OPERATION(update_item, "UpdateItem")
            OPERATION(update_table, "UpdateTable")
            OPERATION(update_time_to_live, "UpdateTimeToLive")
            OPERATION_LATENCY(put_item_latency, "PutItem")
            OPERATION_LATENCY(get_item_latency, "GetItem")
            OPERATION_LATENCY(delete_item_latency, "DeleteItem")
            OPERATION_LATENCY(update_item_latency, "UpdateItem")
    });
    _metrics.add_group("alternator", {
            seastar::metrics::make_total_operations("unsupported_operations", unsupported_operations,
                seastar::metrics::description("number of unsupported operations via Alternator API")),
            seastar::metrics::make_total_operations("total_operations", total_operations,
                seastar::metrics::description("number of total operations via Alternator API")),
            seastar::metrics::make_total_operations("reads_before_write", reads_before_write,
                seastar::metrics::description("number of performed read-before-write operations")),
            seastar::metrics::make_total_operations("filtered_rows_read_total", cql_stats.filtered_rows_read_total,
                seastar::metrics::description("number of rows read during filtering operations")),
            seastar::metrics::make_total_operations("filtered_rows_matched_total", cql_stats.filtered_rows_matched_total,
                seastar::metrics::description("number of rows read and matched during filtering operations")),
            seastar::metrics::make_total_operations("filtered_rows_dropped_total", [this] { return cql_stats.filtered_rows_read_total - cql_stats.filtered_rows_matched_total; },
                seastar::metrics::description("number of rows read and dropped during filtering operations")),
    });
}

}

alternator/stats.hh Normal file

@@ -0,0 +1,95 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once

#include <cstdint>
#include <seastar/core/metrics_registration.hh>
#include "seastarx.hh"
#include "utils/estimated_histogram.hh"
#include "cql3/stats.hh"

namespace alternator {

// Object holding per-shard statistics related to Alternator.
// While this object is alive, these metrics are also registered to be
// visible by the metrics REST API, with the "alternator" prefix.
class stats {
public:
    stats();
    // Count of DynamoDB API operations by types
    struct {
        uint64_t batch_get_item = 0;
        uint64_t batch_write_item = 0;
        uint64_t create_backup = 0;
        uint64_t create_global_table = 0;
        uint64_t create_table = 0;
        uint64_t delete_backup = 0;
        uint64_t delete_item = 0;
        uint64_t delete_table = 0;
        uint64_t describe_backup = 0;
        uint64_t describe_continuous_backups = 0;
        uint64_t describe_endpoints = 0;
        uint64_t describe_global_table = 0;
        uint64_t describe_global_table_settings = 0;
        uint64_t describe_limits = 0;
        uint64_t describe_table = 0;
        uint64_t describe_time_to_live = 0;
        uint64_t get_item = 0;
        uint64_t list_backups = 0;
        uint64_t list_global_tables = 0;
        uint64_t list_tables = 0;
        uint64_t list_tags_of_resource = 0;
        uint64_t put_item = 0;
        uint64_t query = 0;
        uint64_t restore_table_from_backup = 0;
        uint64_t restore_table_to_point_in_time = 0;
        uint64_t scan = 0;
        uint64_t tag_resource = 0;
        uint64_t transact_get_items = 0;
        uint64_t transact_write_items = 0;
        uint64_t untag_resource = 0;
        uint64_t update_continuous_backups = 0;
        uint64_t update_global_table = 0;
        uint64_t update_global_table_settings = 0;
        uint64_t update_item = 0;
        uint64_t update_table = 0;
        uint64_t update_time_to_live = 0;
        utils::estimated_histogram put_item_latency;
        utils::estimated_histogram get_item_latency;
        utils::estimated_histogram delete_item_latency;
        utils::estimated_histogram update_item_latency;
    } api_operations;
    // Miscellaneous event counters
    uint64_t total_operations = 0;
    uint64_t unsupported_operations = 0;
    uint64_t reads_before_write = 0;
    // CQL-derived stats
    cql3::cql_stats cql_stats;
private:
    // The metric_groups object holds this stat object's metrics registered
    // as long as the stats object is alive.
    seastar::metrics::metric_groups _metrics;
};

}


@@ -13,7 +13,7 @@
{
"method":"GET",
"summary":"get row cache save period in seconds",
"type":"int",
"type": "long",
"nickname":"get_row_cache_save_period_in_seconds",
"produces":[
"application/json"
@@ -35,7 +35,7 @@
"description":"row cache save period in seconds",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -48,7 +48,7 @@
{
"method":"GET",
"summary":"get key cache save period in seconds",
"type":"int",
"type": "long",
"nickname":"get_key_cache_save_period_in_seconds",
"produces":[
"application/json"
@@ -70,7 +70,7 @@
"description":"key cache save period in seconds",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -83,7 +83,7 @@
{
"method":"GET",
"summary":"get counter cache save period in seconds",
"type":"int",
"type": "long",
"nickname":"get_counter_cache_save_period_in_seconds",
"produces":[
"application/json"
@@ -105,7 +105,7 @@
"description":"counter cache save period in seconds",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -118,7 +118,7 @@
{
"method":"GET",
"summary":"get row cache keys to save",
"type":"int",
"type": "long",
"nickname":"get_row_cache_keys_to_save",
"produces":[
"application/json"
@@ -140,7 +140,7 @@
"description":"row cache keys to save",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -153,7 +153,7 @@
{
"method":"GET",
"summary":"get key cache keys to save",
"type":"int",
"type": "long",
"nickname":"get_key_cache_keys_to_save",
"produces":[
"application/json"
@@ -175,7 +175,7 @@
"description":"key cache keys to save",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -188,7 +188,7 @@
{
"method":"GET",
"summary":"get counter cache keys to save",
"type":"int",
"type": "long",
"nickname":"get_counter_cache_keys_to_save",
"produces":[
"application/json"
@@ -210,7 +210,7 @@
"description":"counter cache keys to save",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -448,7 +448,7 @@
{
"method": "GET",
"summary": "Get key entries",
"type": "int",
"type": "long",
"nickname": "get_key_entries",
"produces": [
"application/json"
@@ -568,7 +568,7 @@
{
"method": "GET",
"summary": "Get row entries",
"type": "int",
"type": "long",
"nickname": "get_row_entries",
"produces": [
"application/json"
@@ -688,7 +688,7 @@
{
"method": "GET",
"summary": "Get counter entries",
"type": "int",
"type": "long",
"nickname": "get_counter_entries",
"produces": [
"application/json"


@@ -121,7 +121,7 @@
"description":"The minimum number of sstables in queue before compaction kicks off",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -172,7 +172,7 @@
"description":"The maximum number of sstables in queue before compaction kicks off",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -223,7 +223,7 @@
"description":"The maximum number of sstables in queue before compaction kicks off",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
},
{
@@ -231,7 +231,7 @@
"description":"The minimum number of sstables in queue before compaction kicks off",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -544,7 +544,7 @@
"summary":"sstable count for each level. empty unless leveled compaction is used",
"type":"array",
"items":{
"type":"int"
"type": "long"
},
"nickname":"get_sstable_count_per_level",
"produces":[
@@ -636,7 +636,7 @@
"description":"Duration (in milliseconds) of monitoring operation",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
},
{
@@ -644,7 +644,7 @@
"description":"number of the top partitions to list",
"required":false,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
},
{
@@ -652,7 +652,7 @@
"description":"capacity of stream summary: determines amount of resources used in query processing",
"required":false,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -921,7 +921,7 @@
{
"method":"GET",
"summary":"Get memtable switch count",
"type":"int",
"type": "long",
"nickname":"get_memtable_switch_count",
"produces":[
"application/json"
@@ -945,7 +945,7 @@
{
"method":"GET",
"summary":"Get all memtable switch count",
"type":"int",
"type": "long",
"nickname":"get_all_memtable_switch_count",
"produces":[
"application/json"
@@ -1082,7 +1082,7 @@
{
"method":"GET",
"summary":"Get read latency",
"type":"int",
"type": "long",
"nickname":"get_read_latency",
"produces":[
"application/json"
@@ -1235,7 +1235,7 @@
{
"method":"GET",
"summary":"Get all read latency",
"type":"int",
"type": "long",
"nickname":"get_all_read_latency",
"produces":[
"application/json"
@@ -1251,7 +1251,7 @@
{
"method":"GET",
"summary":"Get range latency",
"type":"int",
"type": "long",
"nickname":"get_range_latency",
"produces":[
"application/json"
@@ -1275,7 +1275,7 @@
{
"method":"GET",
"summary":"Get all range latency",
"type":"int",
"type": "long",
"nickname":"get_all_range_latency",
"produces":[
"application/json"
@@ -1291,7 +1291,7 @@
{
"method":"GET",
"summary":"Get write latency",
"type":"int",
"type": "long",
"nickname":"get_write_latency",
"produces":[
"application/json"
@@ -1444,7 +1444,7 @@
{
"method":"GET",
"summary":"Get all write latency",
"type":"int",
"type": "long",
"nickname":"get_all_write_latency",
"produces":[
"application/json"
@@ -1460,7 +1460,7 @@
{
"method":"GET",
"summary":"Get pending flushes",
"type":"int",
"type": "long",
"nickname":"get_pending_flushes",
"produces":[
"application/json"
@@ -1484,7 +1484,7 @@
{
"method":"GET",
"summary":"Get all pending flushes",
"type":"int",
"type": "long",
"nickname":"get_all_pending_flushes",
"produces":[
"application/json"
@@ -1500,7 +1500,7 @@
{
"method":"GET",
"summary":"Get pending compactions",
"type":"int",
"type": "long",
"nickname":"get_pending_compactions",
"produces":[
"application/json"
@@ -1524,7 +1524,7 @@
{
"method":"GET",
"summary":"Get all pending compactions",
"type":"int",
"type": "long",
"nickname":"get_all_pending_compactions",
"produces":[
"application/json"
@@ -1540,7 +1540,7 @@
{
"method":"GET",
"summary":"Get live ss table count",
"type":"int",
"type": "long",
"nickname":"get_live_ss_table_count",
"produces":[
"application/json"
@@ -1564,7 +1564,7 @@
{
"method":"GET",
"summary":"Get all live ss table count",
"type":"int",
"type": "long",
"nickname":"get_all_live_ss_table_count",
"produces":[
"application/json"
@@ -1580,7 +1580,7 @@
{
"method":"GET",
"summary":"Get live disk space used",
"type":"int",
"type": "long",
"nickname":"get_live_disk_space_used",
"produces":[
"application/json"
@@ -1604,7 +1604,7 @@
{
"method":"GET",
"summary":"Get all live disk space used",
"type":"int",
"type": "long",
"nickname":"get_all_live_disk_space_used",
"produces":[
"application/json"
@@ -1620,7 +1620,7 @@
{
"method":"GET",
"summary":"Get total disk space used",
"type":"int",
"type": "long",
"nickname":"get_total_disk_space_used",
"produces":[
"application/json"
@@ -1644,7 +1644,7 @@
{
"method":"GET",
"summary":"Get all total disk space used",
"type":"int",
"type": "long",
"nickname":"get_all_total_disk_space_used",
"produces":[
"application/json"
@@ -2100,7 +2100,7 @@
{
"method":"GET",
"summary":"Get speculative retries",
"type":"int",
"type": "long",
"nickname":"get_speculative_retries",
"produces":[
"application/json"
@@ -2124,7 +2124,7 @@
{
"method":"GET",
"summary":"Get all speculative retries",
"type":"int",
"type": "long",
"nickname":"get_all_speculative_retries",
"produces":[
"application/json"
@@ -2204,7 +2204,7 @@
{
"method":"GET",
"summary":"Get row cache hit out of range",
"type":"int",
"type": "long",
"nickname":"get_row_cache_hit_out_of_range",
"produces":[
"application/json"
@@ -2228,7 +2228,7 @@
{
"method":"GET",
"summary":"Get all row cache hit out of range",
"type":"int",
"type": "long",
"nickname":"get_all_row_cache_hit_out_of_range",
"produces":[
"application/json"
@@ -2244,7 +2244,7 @@
{
"method":"GET",
"summary":"Get row cache hit",
"type":"int",
"type": "long",
"nickname":"get_row_cache_hit",
"produces":[
"application/json"
@@ -2268,7 +2268,7 @@
{
"method":"GET",
"summary":"Get all row cache hit",
"type":"int",
"type": "long",
"nickname":"get_all_row_cache_hit",
"produces":[
"application/json"
@@ -2284,7 +2284,7 @@
{
"method":"GET",
"summary":"Get row cache miss",
"type":"int",
"type": "long",
"nickname":"get_row_cache_miss",
"produces":[
"application/json"
@@ -2308,7 +2308,7 @@
{
"method":"GET",
"summary":"Get all row cache miss",
"type":"int",
"type": "long",
"nickname":"get_all_row_cache_miss",
"produces":[
"application/json"
@@ -2324,7 +2324,7 @@
{
"method":"GET",
"summary":"Get cas prepare",
"type":"int",
"type": "long",
"nickname":"get_cas_prepare",
"produces":[
"application/json"
@@ -2348,7 +2348,7 @@
{
"method":"GET",
"summary":"Get cas propose",
"type":"int",
"type": "long",
"nickname":"get_cas_propose",
"produces":[
"application/json"
@@ -2372,7 +2372,7 @@
{
"method":"GET",
"summary":"Get cas commit",
"type":"int",
"type": "long",
"nickname":"get_cas_commit",
"produces":[
"application/json"


@@ -118,7 +118,7 @@
{
"method": "GET",
"summary": "Get pending tasks",
"type": "int",
"type": "long",
"nickname": "get_pending_tasks",
"produces": [
"application/json"
@@ -181,7 +181,7 @@
{
"method": "GET",
"summary": "Get bytes compacted",
"type": "int",
"type": "long",
"nickname": "get_bytes_compacted",
"produces": [
"application/json"
@@ -197,7 +197,7 @@
"description":"A row merged information",
"properties":{
"key":{
"type":"int",
"type": "long",
"description":"The number of sstable"
},
"value":{


@@ -110,7 +110,7 @@
{
"method":"GET",
"summary":"Get count down endpoint",
"type":"int",
"type": "long",
"nickname":"get_down_endpoint_count",
"produces":[
"application/json"
@@ -126,7 +126,7 @@
{
"method":"GET",
"summary":"Get count up endpoint",
"type":"int",
"type": "long",
"nickname":"get_up_endpoint_count",
"produces":[
"application/json"
@@ -180,11 +180,11 @@
"description": "The endpoint address"
},
"generation": {
"type": "int",
"type": "long",
"description": "The heart beat generation"
},
"version": {
"type": "int",
"type": "long",
"description": "The heart beat version"
},
"update_time": {
@@ -209,7 +209,7 @@
"description": "Holds a version value for an application state",
"properties": {
"application_state": {
"type": "int",
"type": "long",
"description": "The application state enum index"
},
"value": {
@@ -217,7 +217,7 @@
"description": "The version value"
},
"version": {
"type": "int",
"type": "long",
"description": "The application state version"
}
}


@@ -75,7 +75,7 @@
{
"method":"GET",
"summary":"Returns files which are pending for archival attempt. Does NOT include failed archive attempts",
"type":"int",
"type": "long",
"nickname":"get_current_generation_number",
"produces":[
"application/json"
@@ -99,7 +99,7 @@
{
"method":"GET",
"summary":"Get heart beat version for a node",
"type":"int",
"type": "long",
"nickname":"get_current_heart_beat_version",
"produces":[
"application/json"


@@ -99,7 +99,7 @@
{
"method": "GET",
"summary": "Get create hint count",
"type": "int",
"type": "long",
"nickname": "get_create_hint_count",
"produces": [
"application/json"
@@ -123,7 +123,7 @@
{
"method": "GET",
"summary": "Get not stored hints count",
"type": "int",
"type": "long",
"nickname": "get_not_stored_hints_count",
"produces": [
"application/json"


@@ -191,7 +191,7 @@
{
"method":"GET",
"summary":"Get the version number",
"type":"int",
"type": "long",
"nickname":"get_version",
"produces":[
"application/json"


@@ -105,7 +105,7 @@
{
"method":"GET",
"summary":"Get the max hint window",
"type":"int",
"type": "long",
"nickname":"get_max_hint_window",
"produces":[
"application/json"
@@ -128,7 +128,7 @@
"description":"max hint window in ms",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -141,7 +141,7 @@
{
"method":"GET",
"summary":"Get max hints in progress",
"type":"int",
"type": "long",
"nickname":"get_max_hints_in_progress",
"produces":[
"application/json"
@@ -164,7 +164,7 @@
"description":"max hints in progress",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -177,7 +177,7 @@
{
"method":"GET",
"summary":"get hints in progress",
"type":"int",
"type": "long",
"nickname":"get_hints_in_progress",
"produces":[
"application/json"
@@ -602,7 +602,7 @@
{
"method": "GET",
"summary": "Get cas write metrics",
"type": "int",
"type": "long",
"nickname": "get_cas_write_metrics_unfinished_commit",
"produces": [
"application/json"
@@ -632,7 +632,7 @@
{
"method": "GET",
"summary": "Get cas write metrics",
"type": "int",
"type": "long",
"nickname": "get_cas_write_metrics_condition_not_met",
"produces": [
"application/json"
@@ -647,7 +647,7 @@
{
"method": "GET",
"summary": "Get cas read metrics",
"type": "int",
"type": "long",
"nickname": "get_cas_read_metrics_unfinished_commit",
"produces": [
"application/json"
@@ -671,28 +671,13 @@
}
]
},
{
"path": "/storage_proxy/metrics/cas_read/condition_not_met",
"operations": [
{
"method": "GET",
"summary": "Get cas read metrics",
"type": "int",
"nickname": "get_cas_read_metrics_condition_not_met",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path": "/storage_proxy/metrics/read/timeouts",
"operations": [
{
"method": "GET",
"summary": "Get read metrics",
"type": "int",
"type": "long",
"nickname": "get_read_metrics_timeouts",
"produces": [
"application/json"
@@ -707,7 +692,7 @@
{
"method": "GET",
"summary": "Get read metrics",
"type": "int",
"type": "long",
"nickname": "get_read_metrics_unavailables",
"produces": [
"application/json"
@@ -842,7 +827,7 @@
{
"method": "GET",
"summary": "Get range metrics",
"type": "int",
"type": "long",
"nickname": "get_range_metrics_timeouts",
"produces": [
"application/json"
@@ -857,7 +842,7 @@
{
"method": "GET",
"summary": "Get range metrics",
"type": "int",
"type": "long",
"nickname": "get_range_metrics_unavailables",
"produces": [
"application/json"
@@ -902,7 +887,7 @@
{
"method": "GET",
"summary": "Get write metrics",
"type": "int",
"type": "long",
"nickname": "get_write_metrics_timeouts",
"produces": [
"application/json"
@@ -917,7 +902,7 @@
{
"method": "GET",
"summary": "Get write metrics",
"type": "int",
"type": "long",
"nickname": "get_write_metrics_unavailables",
"produces": [
"application/json"
@@ -1023,7 +1008,7 @@
{
"method":"GET",
"summary":"Get read latency",
"type":"int",
"type": "long",
"nickname":"get_read_latency",
"produces":[
"application/json"
@@ -1055,7 +1040,7 @@
{
"method":"GET",
"summary":"Get write latency",
"type":"int",
"type": "long",
"nickname":"get_write_latency",
"produces":[
"application/json"
@@ -1087,7 +1072,7 @@
{
"method":"GET",
"summary":"Get range latency",
"type":"int",
"type": "long",
"nickname":"get_range_latency",
"produces":[
"application/json"


@@ -458,7 +458,7 @@
{
"method":"GET",
"summary":"Return the generation value for this node.",
"type":"int",
"type": "long",
"nickname":"get_current_generation_number",
"produces":[
"application/json"
@@ -646,7 +646,7 @@
{
"method":"POST",
"summary":"Trigger a cleanup of keys on a single keyspace",
"type":"int",
"type": "long",
"nickname":"force_keyspace_cleanup",
"produces":[
"application/json"
@@ -678,7 +678,7 @@
{
"method":"GET",
"summary":"Scrub (deserialize + reserialize at the latest version, skipping bad rows if any) the given keyspace. If columnFamilies array is empty, all CFs are scrubbed. Scrubbed CFs will be snapshotted first, if disableSnapshot is false",
"type":"int",
"type": "long",
"nickname":"scrub",
"produces":[
"application/json"
@@ -726,7 +726,7 @@
{
"method":"GET",
"summary":"Rewrite all sstables to the latest version. Unlike scrub, it doesn't skip bad rows and do not snapshot sstables first.",
"type":"int",
"type": "long",
"nickname":"upgrade_sstables",
"produces":[
"application/json"
@@ -800,7 +800,7 @@
"summary":"Return an array with the ids of the currently active repairs",
"type":"array",
"items":{
"type":"int"
"type": "long"
},
"nickname":"get_active_repair_async",
"produces":[
@@ -816,7 +816,7 @@
{
"method":"POST",
"summary":"Invoke repair asynchronously. You can track repair progress by using the get supplying id",
"type":"int",
"type": "long",
"nickname":"repair_async",
"produces":[
"application/json"
@@ -947,7 +947,7 @@
"description":"The repair ID to check for status",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -1277,18 +1277,18 @@
},
{
"name":"dynamic_update_interval",
"description":"integer, in ms (default 100)",
"description":"interval in ms (default 100)",
"required":false,
"allowMultiple":false,
"type":"integer",
"type":"long",
"paramType":"query"
},
{
"name":"dynamic_reset_interval",
"description":"integer, in ms (default 600,000)",
"description":"interval in ms (default 600,000)",
"required":false,
"allowMultiple":false,
"type":"integer",
"type":"long",
"paramType":"query"
},
{
@@ -1493,7 +1493,7 @@
"description":"Stream throughput",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -1501,7 +1501,7 @@
{
"method":"GET",
"summary":"Get stream throughput mb per sec",
"type":"int",
"type": "long",
"nickname":"get_stream_throughput_mb_per_sec",
"produces":[
"application/json"
@@ -1517,7 +1517,7 @@
{
"method":"GET",
"summary":"get compaction throughput mb per sec",
"type":"int",
"type": "long",
"nickname":"get_compaction_throughput_mb_per_sec",
"produces":[
"application/json"
@@ -1539,7 +1539,7 @@
"description":"compaction throughput",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -1943,7 +1943,7 @@
{
"method":"GET",
"summary":"Returns the threshold for warning of queries with many tombstones",
"type":"int",
"type": "long",
"nickname":"get_tombstone_warn_threshold",
"produces":[
"application/json"
@@ -1965,7 +1965,7 @@
"description":"tombstone debug threshold",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -1978,7 +1978,7 @@
{
"method":"GET",
"summary":"",
"type":"int",
"type": "long",
"nickname":"get_tombstone_failure_threshold",
"produces":[
"application/json"
@@ -2000,7 +2000,7 @@
"description":"tombstone debug threshold",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -2013,7 +2013,7 @@
{
"method":"GET",
"summary":"Returns the threshold for rejecting queries due to a large batch size",
"type":"int",
"type": "long",
"nickname":"get_batch_size_failure_threshold",
"produces":[
"application/json"
@@ -2035,7 +2035,7 @@
"description":"batch size debug threshold",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -2059,7 +2059,7 @@
"description":"throttle in kb",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -2072,7 +2072,7 @@
{
"method":"GET",
"summary":"Get load",
"type":"int",
"type": "long",
"nickname":"get_metrics_load",
"produces":[
"application/json"
@@ -2088,7 +2088,7 @@
{
"method":"GET",
"summary":"Get exceptions",
"type":"int",
"type": "long",
"nickname":"get_exceptions",
"produces":[
"application/json"
@@ -2104,7 +2104,7 @@
{
"method":"GET",
"summary":"Get total hints in progress",
"type":"int",
"type": "long",
"nickname":"get_total_hints_in_progress",
"produces":[
"application/json"
@@ -2120,7 +2120,7 @@
{
"method":"GET",
"summary":"Get total hints",
"type":"int",
"type": "long",
"nickname":"get_total_hints",
"produces":[
"application/json"


@@ -32,7 +32,7 @@
{
"method":"GET",
"summary":"Get number of active outbound streams",
"type":"int",
"type": "long",
"nickname":"get_all_active_streams_outbound",
"produces":[
"application/json"
@@ -48,7 +48,7 @@
{
"method":"GET",
"summary":"Get total incoming bytes",
"type":"int",
"type": "long",
"nickname":"get_total_incoming_bytes",
"produces":[
"application/json"
@@ -72,7 +72,7 @@
{
"method":"GET",
"summary":"Get all total incoming bytes",
"type":"int",
"type": "long",
"nickname":"get_all_total_incoming_bytes",
"produces":[
"application/json"
@@ -88,7 +88,7 @@
{
"method":"GET",
"summary":"Get total outgoing bytes",
"type":"int",
"type": "long",
"nickname":"get_total_outgoing_bytes",
"produces":[
"application/json"
@@ -112,7 +112,7 @@
{
"method":"GET",
"summary":"Get all total outgoing bytes",
"type":"int",
"type": "long",
"nickname":"get_all_total_outgoing_bytes",
"produces":[
"application/json"
@@ -154,7 +154,7 @@
"description":"The peer"
},
"session_index":{
"type":"int",
"type": "long",
"description":"The session index"
},
"connecting":{
@@ -211,7 +211,7 @@
"description":"The ID"
},
"files":{
"type":"int",
"type": "long",
"description":"Number of files to transfer. Can be 0 if nothing to transfer for some streaming request."
},
"total_size":{
@@ -242,7 +242,7 @@
"description":"The peer address"
},
"session_index":{
"type":"int",
"type": "long",
"description":"The session index"
},
"file_name":{


@@ -52,6 +52,21 @@
}
]
},
{
"path":"/system/uptime_ms",
"operations":[
{
"method":"GET",
"summary":"Get system uptime, in milliseconds",
"type":"long",
"nickname":"get_system_uptime",
"produces":[
"application/json"
],
"parameters":[]
}
]
},
{
"path":"/system/logger/{name}",
"operations":[


@@ -23,6 +23,8 @@
#include "service/storage_proxy.hh"
#include <seastar/http/httpd.hh>
namespace service { class load_meter; }
namespace api {
struct http_context {
@@ -31,9 +33,11 @@ struct http_context {
httpd::http_server_control http_server;
distributed<database>& db;
distributed<service::storage_proxy>& sp;
service::load_meter& lmeter;
http_context(distributed<database>& _db,
distributed<service::storage_proxy>& _sp)
: db(_db), sp(_sp) {
distributed<service::storage_proxy>& _sp,
service::load_meter& _lm)
: db(_db), sp(_sp), lmeter(_lm) {
}
};


@@ -26,7 +26,7 @@
#include "sstables/sstables.hh"
#include "utils/estimated_histogram.hh"
#include <algorithm>
#include "db/system_keyspace_view_types.hh"
#include "db/data_listeners.hh"
extern logging::logger apilog;
@@ -53,8 +53,7 @@ std::tuple<sstring, sstring> parse_fully_qualified_cf_name(sstring name) {
return std::make_tuple(name.substr(0, pos), name.substr(end));
}
const utils::UUID& get_uuid(const sstring& name, const database& db) {
auto [ks, cf] = parse_fully_qualified_cf_name(name);
const utils::UUID& get_uuid(const sstring& ks, const sstring& cf, const database& db) {
try {
return db.find_uuid(ks, cf);
} catch (std::out_of_range& e) {
@@ -62,6 +61,11 @@ const utils::UUID& get_uuid(const sstring& name, const database& db) {
}
}
const utils::UUID& get_uuid(const sstring& name, const database& db) {
auto [ks, cf] = parse_fully_qualified_cf_name(name);
return get_uuid(ks, cf, db);
}
future<> foreach_column_family(http_context& ctx, const sstring& name, function<void(column_family&)> f) {
auto uuid = get_uuid(name, ctx.db.local());
@@ -71,28 +75,28 @@ future<> foreach_column_family(http_context& ctx, const sstring& name, function<
}
future<json::json_return_type> get_cf_stats(http_context& ctx, const sstring& name,
int64_t column_family::stats::*f) {
int64_t column_family_stats::*f) {
return map_reduce_cf(ctx, name, int64_t(0), [f](const column_family& cf) {
return cf.get_stats().*f;
}, std::plus<int64_t>());
}
future<json::json_return_type> get_cf_stats(http_context& ctx,
int64_t column_family::stats::*f) {
int64_t column_family_stats::*f) {
return map_reduce_cf(ctx, int64_t(0), [f](const column_family& cf) {
return cf.get_stats().*f;
}, std::plus<int64_t>());
}
static future<json::json_return_type> get_cf_stats_count(http_context& ctx, const sstring& name,
utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {
return map_reduce_cf(ctx, name, int64_t(0), [f](const column_family& cf) {
return (cf.get_stats().*f).hist.count;
}, std::plus<int64_t>());
}
static future<json::json_return_type> get_cf_stats_sum(http_context& ctx, const sstring& name,
utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {
auto uuid = get_uuid(name, ctx.db.local());
return ctx.db.map_reduce0([uuid, f](database& db) {
// Histograms information is sample of the actual load
@@ -108,14 +112,14 @@ static future<json::json_return_type> get_cf_stats_sum(http_context& ctx, const
static future<json::json_return_type> get_cf_stats_count(http_context& ctx,
utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {
return map_reduce_cf(ctx, int64_t(0), [f](const column_family& cf) {
return (cf.get_stats().*f).hist.count;
}, std::plus<int64_t>());
}
static future<json::json_return_type> get_cf_histogram(http_context& ctx, const sstring& name,
utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {
utils::UUID uuid = get_uuid(name, ctx.db.local());
return ctx.db.map_reduce0([f, uuid](const database& p) {
return (p.find_column_family(uuid).get_stats().*f).hist;},
@@ -126,7 +130,7 @@ static future<json::json_return_type> get_cf_histogram(http_context& ctx, const
});
}
static future<json::json_return_type> get_cf_histogram(http_context& ctx, utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
static future<json::json_return_type> get_cf_histogram(http_context& ctx, utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {
std::function<utils::ihistogram(const database&)> fun = [f] (const database& db) {
utils::ihistogram res;
for (auto i : db.get_column_families()) {
@@ -142,7 +146,7 @@ static future<json::json_return_type> get_cf_histogram(http_context& ctx, utils:
}
static future<json::json_return_type> get_cf_rate_and_histogram(http_context& ctx, const sstring& name,
utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {
utils::UUID uuid = get_uuid(name, ctx.db.local());
return ctx.db.map_reduce0([f, uuid](const database& p) {
return (p.find_column_family(uuid).get_stats().*f).rate();},
@@ -153,7 +157,7 @@ static future<json::json_return_type> get_cf_rate_and_histogram(http_context& c
});
}
static future<json::json_return_type> get_cf_rate_and_histogram(http_context& ctx, utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
static future<json::json_return_type> get_cf_rate_and_histogram(http_context& ctx, utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {
std::function<utils::rate_moving_average_and_histogram(const database&)> fun = [f] (const database& db) {
utils::rate_moving_average_and_histogram res;
for (auto i : db.get_column_families()) {
@@ -250,12 +254,11 @@ class sum_ratio {
uint64_t _n = 0;
T _total = 0;
public:
future<> operator()(T value) {
void operator()(T value) {
if (value > 0) {
_total += value;
_n++;
}
return make_ready_future<>();
}
// Returns average value of all registered ratios.
T get() && {
@@ -404,11 +407,11 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_memtable_switch_count.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats(ctx,req->param["name"] ,&column_family::stats::memtable_switch_count);
return get_cf_stats(ctx,req->param["name"] ,&column_family_stats::memtable_switch_count);
});
cf::get_all_memtable_switch_count.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats(ctx, &column_family::stats::memtable_switch_count);
return get_cf_stats(ctx, &column_family_stats::memtable_switch_count);
});
// FIXME: this refers to partitions, not rows.
@@ -453,67 +456,67 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_pending_flushes.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats(ctx,req->param["name"] ,&column_family::stats::pending_flushes);
return get_cf_stats(ctx,req->param["name"] ,&column_family_stats::pending_flushes);
});
cf::get_all_pending_flushes.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats(ctx, &column_family::stats::pending_flushes);
return get_cf_stats(ctx, &column_family_stats::pending_flushes);
});
cf::get_read.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats_count(ctx,req->param["name"] ,&column_family::stats::reads);
return get_cf_stats_count(ctx,req->param["name"] ,&column_family_stats::reads);
});
cf::get_all_read.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats_count(ctx, &column_family::stats::reads);
return get_cf_stats_count(ctx, &column_family_stats::reads);
});
cf::get_write.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats_count(ctx, req->param["name"] ,&column_family::stats::writes);
return get_cf_stats_count(ctx, req->param["name"] ,&column_family_stats::writes);
});
cf::get_all_write.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats_count(ctx, &column_family::stats::writes);
return get_cf_stats_count(ctx, &column_family_stats::writes);
});
cf::get_read_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_histogram(ctx, req->param["name"], &column_family::stats::reads);
return get_cf_histogram(ctx, req->param["name"], &column_family_stats::reads);
});
cf::get_read_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_rate_and_histogram(ctx, req->param["name"], &column_family::stats::reads);
return get_cf_rate_and_histogram(ctx, req->param["name"], &column_family_stats::reads);
});
cf::get_read_latency.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats_sum(ctx,req->param["name"] ,&column_family::stats::reads);
return get_cf_stats_sum(ctx,req->param["name"] ,&column_family_stats::reads);
});
cf::get_write_latency.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats_sum(ctx, req->param["name"] ,&column_family::stats::writes);
return get_cf_stats_sum(ctx, req->param["name"] ,&column_family_stats::writes);
});
cf::get_all_read_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_histogram(ctx, &column_family::stats::writes);
return get_cf_histogram(ctx, &column_family_stats::writes);
});
cf::get_all_read_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_rate_and_histogram(ctx, &column_family::stats::writes);
return get_cf_rate_and_histogram(ctx, &column_family_stats::writes);
});
cf::get_write_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_histogram(ctx, req->param["name"], &column_family::stats::writes);
return get_cf_histogram(ctx, req->param["name"], &column_family_stats::writes);
});
cf::get_write_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_rate_and_histogram(ctx, req->param["name"], &column_family::stats::writes);
return get_cf_rate_and_histogram(ctx, req->param["name"], &column_family_stats::writes);
});
cf::get_all_write_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_histogram(ctx, &column_family::stats::writes);
return get_cf_histogram(ctx, &column_family_stats::writes);
});
cf::get_all_write_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_rate_and_histogram(ctx, &column_family::stats::writes);
return get_cf_rate_and_histogram(ctx, &column_family_stats::writes);
});
cf::get_pending_compactions.set(r, [&ctx] (std::unique_ptr<request> req) {
@@ -529,11 +532,11 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_live_ss_table_count.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats(ctx, req->param["name"], &column_family::stats::live_sstable_count);
return get_cf_stats(ctx, req->param["name"], &column_family_stats::live_sstable_count);
});
cf::get_all_live_ss_table_count.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats(ctx, &column_family::stats::live_sstable_count);
return get_cf_stats(ctx, &column_family_stats::live_sstable_count);
});
cf::get_unleveled_sstables.set(r, [&ctx] (std::unique_ptr<request> req) {
@@ -792,25 +795,25 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_cas_prepare.set(r, [] (std::unique_ptr<request> req) {
//TBD
unimplemented();
//auto id = get_uuid(req->param["name"], ctx.db.local());
return make_ready_future<json::json_return_type>(0);
cf::get_cas_prepare.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](column_family& cf) {
return cf.get_stats().estimated_cas_prepare;
},
utils::estimated_histogram_merge, utils_json::estimated_histogram());
});
cf::get_cas_propose.set(r, [] (std::unique_ptr<request> req) {
//TBD
unimplemented();
//auto id = get_uuid(req->param["name"], ctx.db.local());
return make_ready_future<json::json_return_type>(0);
cf::get_cas_propose.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](column_family& cf) {
return cf.get_stats().estimated_cas_propose;
},
utils::estimated_histogram_merge, utils_json::estimated_histogram());
});
cf::get_cas_commit.set(r, [] (std::unique_ptr<request> req) {
//TBD
unimplemented();
//auto id = get_uuid(req->param["name"], ctx.db.local());
return make_ready_future<json::json_return_type>(0);
cf::get_cas_commit.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](column_family& cf) {
return cf.get_stats().estimated_cas_commit;
},
utils::estimated_histogram_merge, utils_json::estimated_histogram());
});
cf::get_sstables_per_read_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
@@ -821,11 +824,11 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_tombstone_scanned_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_histogram(ctx, req->param["name"], &column_family::stats::tombstone_scanned);
return get_cf_histogram(ctx, req->param["name"], &column_family_stats::tombstone_scanned);
});
cf::get_live_scanned_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_histogram(ctx, req->param["name"], &column_family::stats::live_scanned);
return get_cf_histogram(ctx, req->param["name"], &column_family_stats::live_scanned);
});
cf::get_col_update_time_delta_histogram.set(r, [] (std::unique_ptr<request> req) {
@@ -843,13 +846,28 @@ void set_column_family(http_context& ctx, routes& r) {
return true;
});
cf::get_built_indexes.set(r, [](const_req) {
// FIXME
// Currently there are no index support
return std::vector<sstring>();
cf::get_built_indexes.set(r, [&ctx](std::unique_ptr<request> req) {
auto [ks, cf_name] = parse_fully_qualified_cf_name(req->param["name"]);
return db::system_keyspace::load_view_build_progress().then([ks, cf_name, &ctx](const std::vector<db::system_keyspace::view_build_progress>& vb) mutable {
std::set<sstring> vp;
for (auto b : vb) {
if (b.view.first == ks) {
vp.insert(b.view.second);
}
}
std::vector<sstring> res;
auto uuid = get_uuid(ks, cf_name, ctx.db.local());
column_family& cf = ctx.db.local().find_column_family(uuid);
res.reserve(cf.get_index_manager().list_indexes().size());
for (auto&& i : cf.get_index_manager().list_indexes()) {
if (vp.find(secondary_index::index_table_name(i.metadata().name())) == vp.end()) {
res.emplace_back(i.metadata().name());
}
}
return make_ready_future<json::json_return_type>(res);
});
});
cf::get_compression_metadata_off_heap_memory_used.set(r, [](const_req) {
// FIXME
// Currently there are no information on the compression

View File

@@ -109,9 +109,9 @@ future<json::json_return_type> map_reduce_cf(http_context& ctx, I init,
}
future<json::json_return_type> get_cf_stats(http_context& ctx, const sstring& name,
int64_t column_family::stats::*f);
int64_t column_family_stats::*f);
future<json::json_return_type> get_cf_stats(http_context& ctx,
int64_t column_family::stats::*f);
int64_t column_family_stats::*f);
}
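The hunks above consistently swap the member-pointer parameter's class from `column_family::stats` to `column_family_stats`. The underlying idiom is a C++ pointer-to-data-member used to select which statistic a generic handler sums. A minimal sketch of that pattern (the struct and function names here are illustrative stand-ins, not Scylla's actual API):

```cpp
#include <cstdint>
#include <vector>

// Illustrative stand-in for the stats struct referenced in the diff.
struct column_family_stats {
    int64_t reads = 0;
    int64_t writes = 0;
};

// The caller picks the statistic via a pointer-to-data-member, just as
// get_cf_stats() takes `int64_t column_family_stats::*f` and applies it
// per table with `s.*f`.
int64_t sum_stat(const std::vector<column_family_stats>& all,
                 int64_t column_family_stats::*f) {
    int64_t total = 0;
    for (const auto& s : all) {
        total += s.*f;  // dereference the member pointer on each object
    }
    return total;
}
```

A call site would pass, e.g., `&column_family_stats::reads`, which is why the rename touches every handler registration rather than the summing logic itself.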

View File

@@ -74,13 +74,14 @@ void set_compaction_manager(http_context& ctx, routes& r) {
cm::get_pending_tasks_by_table.set(r, [&ctx] (std::unique_ptr<request> req) {
return ctx.db.map_reduce0([&ctx](database& db) {
std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash> tasks;
return do_for_each(db.get_column_families(), [&tasks](const std::pair<utils::UUID, seastar::lw_shared_ptr<table>>& i) {
table& cf = *i.second.get();
tasks[std::make_pair(cf.schema()->ks_name(), cf.schema()->cf_name())] = cf.get_compaction_strategy().estimated_pending_compactions(cf);
return make_ready_future<>();
}).then([&tasks] {
return tasks;
return do_with(std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>(), [&ctx, &db](std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>& tasks) {
return do_for_each(db.get_column_families(), [&tasks](const std::pair<utils::UUID, seastar::lw_shared_ptr<table>>& i) {
table& cf = *i.second.get();
tasks[std::make_pair(cf.schema()->ks_name(), cf.schema()->cf_name())] = cf.get_compaction_strategy().estimated_pending_compactions(cf);
return make_ready_future<>();
}).then([&tasks] {
return std::move(tasks);
});
});
}, std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>(), sum_pending_tasks).then(
[](const std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>& task_map) {
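The `do_with` change above exists to keep the `tasks` map alive until the asynchronous `do_for_each` loop has finished; previously the continuations referenced a stack local that could die first. A plain-C++ analogue of the fix, using shared ownership in place of Seastar's `do_with` (names are illustrative):

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Deferred work must own (or share) its state rather than hold a
// reference to a caller's stack local. Shared ownership plays the role
// of do_with here: the map lives exactly as long as the callable does.
std::function<std::size_t()> make_task_counter(std::vector<std::string> names) {
    auto tasks = std::make_shared<std::unordered_map<std::string, uint64_t>>();
    return [tasks, names = std::move(names)] {
        for (const auto& n : names) {
            ++(*tasks)[n];  // safe: the lambda co-owns the map
        }
        return tasks->size();
    };
}
```

In the patch itself, `do_with` achieves the same lifetime guarantee without reference counting, by tying the map's storage to the returned future chain.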

View File

@@ -81,12 +81,9 @@ void set_storage_proxy(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(0);
});
sp::get_hinted_handoff_enabled.set(r, [](std::unique_ptr<request> req) {
//TBD
// FIXME
// hinted handoff is not supported currently,
// so we should return false
return make_ready_future<json::json_return_type>(false);
sp::get_hinted_handoff_enabled.set(r, [&ctx](std::unique_ptr<request> req) {
auto enabled = ctx.db.local().get_config().hinted_handoff_enabled();
return make_ready_future<json::json_return_type>(enabled);
});
sp::set_hinted_handoff_enabled.set(r, [](std::unique_ptr<request> req) {
@@ -250,68 +247,40 @@ void set_storage_proxy(http_context& ctx, routes& r) {
});
});
sp::get_cas_read_timeouts.set(r, [](std::unique_ptr<request> req) {
//TBD
// FIXME
// cas is not supported yet, so just return 0
return make_ready_future<json::json_return_type>(0);
sp::get_cas_read_timeouts.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timed_rate_as_long(ctx.sp, &proxy::stats::cas_read_timeouts);
});
sp::get_cas_read_unavailables.set(r, [](std::unique_ptr<request> req) {
//TBD
// FIXME
// cas is not supported yet, so just return 0
return make_ready_future<json::json_return_type>(0);
sp::get_cas_read_unavailables.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timed_rate_as_long(ctx.sp, &proxy::stats::cas_read_unavailables);
});
sp::get_cas_write_timeouts.set(r, [](std::unique_ptr<request> req) {
//TBD
// FIXME
// cas is not supported yet, so just return 0
return make_ready_future<json::json_return_type>(0);
sp::get_cas_write_timeouts.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timed_rate_as_long(ctx.sp, &proxy::stats::cas_write_timeouts);
});
sp::get_cas_write_unavailables.set(r, [](std::unique_ptr<request> req) {
//TBD
// FIXME
// cas is not supported yet, so just return 0
return make_ready_future<json::json_return_type>(0);
sp::get_cas_write_unavailables.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timed_rate_as_long(ctx.sp, &proxy::stats::cas_write_unavailables);
});
sp::get_cas_write_metrics_unfinished_commit.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
sp::get_cas_write_metrics_unfinished_commit.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_stats(ctx.sp, &proxy::stats::cas_write_unfinished_commit);
});
sp::get_cas_write_metrics_contention.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
sp::get_cas_write_metrics_contention.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_estimated_histogram(ctx, &proxy::stats::cas_write_contention);
});
sp::get_cas_write_metrics_condition_not_met.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
sp::get_cas_write_metrics_condition_not_met.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_stats(ctx.sp, &proxy::stats::cas_write_condition_not_met);
});
sp::get_cas_read_metrics_unfinished_commit.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
sp::get_cas_read_metrics_unfinished_commit.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_stats(ctx.sp, &proxy::stats::cas_read_unfinished_commit);
});
sp::get_cas_read_metrics_contention.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
});
sp::get_cas_read_metrics_condition_not_met.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
sp::get_cas_read_metrics_contention.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_estimated_histogram(ctx, &proxy::stats::cas_read_contention);
});
sp::get_read_metrics_timeouts.set(r, [&ctx](std::unique_ptr<request> req) {
@@ -382,19 +351,11 @@ void set_storage_proxy(http_context& ctx, routes& r) {
return sum_timer_stats(ctx.sp, &proxy::stats::write);
});
sp::get_cas_write_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
//TBD
// FIXME
// cas is not supported yet, so just return empty moving average
return make_ready_future<json::json_return_type>(get_empty_moving_average());
return sum_timer_stats(ctx.sp, &proxy::stats::cas_write);
});
sp::get_cas_read_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
//TBD
// FIXME
// cas is not supported yet, so just return empty moving average
return make_ready_future<json::json_return_type>(get_empty_moving_average());
return sum_timer_stats(ctx.sp, &proxy::stats::cas_read);
});
sp::get_view_write_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {

View File

@@ -27,6 +27,7 @@
#include <boost/range/adaptor/map.hpp>
#include <boost/range/adaptor/filtered.hpp>
#include "service/storage_service.hh"
#include "service/load_meter.hh"
#include "db/commitlog/commitlog.hh"
#include "gms/gossiper.hh"
#include "db/system_keyspace.hh"
@@ -55,26 +56,22 @@ static sstring validate_keyspace(http_context& ctx, const parameters& param) {
throw bad_param_exception("Keyspace " + param["keyspace"] + " Does not exist");
}
static std::vector<ss::token_range> describe_ring(const sstring& keyspace) {
std::vector<ss::token_range> res;
for (auto d : service::get_local_storage_service().describe_ring(keyspace)) {
ss::token_range r;
r.start_token = d._start_token;
r.end_token = d._end_token;
r.endpoints = d._endpoints;
r.rpc_endpoints = d._rpc_endpoints;
for (auto det : d._endpoint_details) {
ss::endpoint_detail ed;
ed.host = det._host;
ed.datacenter = det._datacenter;
if (det._rack != "") {
ed.rack = det._rack;
}
r.endpoint_details.push(ed);
static ss::token_range token_range_endpoints_to_json(const dht::token_range_endpoints& d) {
ss::token_range r;
r.start_token = d._start_token;
r.end_token = d._end_token;
r.endpoints = d._endpoints;
r.rpc_endpoints = d._rpc_endpoints;
for (auto det : d._endpoint_details) {
ss::endpoint_detail ed;
ed.host = det._host;
ed.datacenter = det._datacenter;
if (det._rack != "") {
ed.rack = det._rack;
}
res.push_back(r);
r.endpoint_details.push(ed);
}
return res;
return r;
}
void set_storage_service(http_context& ctx, routes& r) {
@@ -176,13 +173,13 @@ void set_storage_service(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(res);
});
ss::describe_any_ring.set(r, [&ctx](const_req req) {
return describe_ring("");
ss::describe_any_ring.set(r, [&ctx](std::unique_ptr<request> req) {
return make_ready_future<json::json_return_type>(stream_range_as_array(service::get_local_storage_service().describe_ring(""), token_range_endpoints_to_json));
});
ss::describe_ring.set(r, [&ctx](const_req req) {
auto keyspace = validate_keyspace(ctx, req.param);
return describe_ring(keyspace);
ss::describe_ring.set(r, [&ctx](std::unique_ptr<request> req) {
auto keyspace = validate_keyspace(ctx, req->param);
return make_ready_future<json::json_return_type>(stream_range_as_array(service::get_local_storage_service().describe_ring(keyspace), token_range_endpoints_to_json));
});
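The `describe_ring` handlers above now convert one `token_range_endpoints` element at a time through `token_range_endpoints_to_json` and stream the results, rather than materializing the whole response vector (which could cause a large allocation). A minimal sketch of that per-element streaming shape, with illustrative stand-in types rather than the real Scylla ones:

```cpp
#include <string>
#include <vector>

// Illustrative stand-ins for the input range element and its JSON form.
struct range_in  { std::string start_token, end_token; };
struct range_out { std::string start_token, end_token; };

// Per-element converter, in the spirit of token_range_endpoints_to_json.
range_out to_json(const range_in& d) {
    return {d.start_token, d.end_token};
}

// Applies the converter element by element and hands each result to a
// sink, so only one converted element is in flight at a time.
template <typename Range, typename Conv, typename Sink>
void stream_range_as(const Range& r, Conv conv, Sink sink) {
    for (const auto& e : r) {
        sink(conv(e));
    }
}
```

The real `stream_range_as_array` additionally writes each converted element straight into the HTTP response body, so memory use stays bounded regardless of ring size.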
ss::get_host_id_map.set(r, [](const_req req) {
@@ -192,11 +189,11 @@ void set_storage_service(http_context& ctx, routes& r) {
});
ss::get_load.set(r, [&ctx](std::unique_ptr<request> req) {
return get_cf_stats(ctx, &column_family::stats::live_disk_space_used);
return get_cf_stats(ctx, &column_family_stats::live_disk_space_used);
});
ss::get_load_map.set(r, [] (std::unique_ptr<request> req) {
return service::get_local_storage_service().get_load_map().then([] (auto&& load_map) {
ss::get_load_map.set(r, [&ctx] (std::unique_ptr<request> req) {
return ctx.lmeter.get_load_map().then([] (auto&& load_map) {
std::vector<ss::map_string_double> res;
for (auto i : load_map) {
ss::map_string_double val;
@@ -254,6 +251,9 @@ void set_storage_service(http_context& ctx, routes& r) {
if (column_family.empty()) {
resp = service::get_local_storage_service().take_snapshot(tag, keynames);
} else {
if (keynames.empty()) {
throw httpd::bad_param_exception("The keyspace of column families must be specified");
}
if (keynames.size() > 1) {
throw httpd::bad_param_exception("Only one keyspace allowed when specifying a column family");
}
@@ -304,17 +304,24 @@ void set_storage_service(http_context& ctx, routes& r) {
if (column_families.empty()) {
column_families = map_keys(ctx.db.local().find_keyspace(keyspace).metadata().get()->cf_meta_data());
}
return ctx.db.invoke_on_all([keyspace, column_families] (database& db) {
std::vector<column_family*> column_families_vec;
auto& cm = db.get_compaction_manager();
for (auto cf : column_families) {
column_families_vec.push_back(&db.find_column_family(keyspace, cf));
return service::get_local_storage_service().is_cleanup_allowed(keyspace).then([&ctx, keyspace,
column_families = std::move(column_families)] (bool is_cleanup_allowed) mutable {
if (!is_cleanup_allowed) {
return make_exception_future<json::json_return_type>(
std::runtime_error("Can not perform cleanup operation when topology changes"));
}
return parallel_for_each(column_families_vec, [&cm] (column_family* cf) {
return cm.perform_cleanup(cf);
return ctx.db.invoke_on_all([keyspace, column_families] (database& db) {
std::vector<column_family*> column_families_vec;
auto& cm = db.get_compaction_manager();
for (auto cf : column_families) {
column_families_vec.push_back(&db.find_column_family(keyspace, cf));
}
return parallel_for_each(column_families_vec, [&cm] (column_family* cf) {
return cm.perform_cleanup(cf);
});
}).then([]{
return make_ready_future<json::json_return_type>(0);
});
}).then([]{
return make_ready_future<json::json_return_type>(0);
});
});
@@ -598,9 +605,7 @@ void set_storage_service(http_context& ctx, routes& r) {
});
ss::join_ring.set(r, [](std::unique_ptr<request> req) {
return service::get_local_storage_service().join_ring().then([] {
return make_ready_future<json::json_return_type>(json_void());
});
return make_ready_future<json::json_return_type>(json_void());
});
ss::is_joined.set(r, [] (std::unique_ptr<request> req) {
@@ -860,7 +865,7 @@ void set_storage_service(http_context& ctx, routes& r) {
});
ss::get_metrics_load.set(r, [&ctx](std::unique_ptr<request> req) {
return get_cf_stats(ctx, &column_family::stats::live_disk_space_used);
return get_cf_stats(ctx, &column_family_stats::live_disk_space_used);
});
ss::get_exceptions.set(r, [](const_req req) {

View File

@@ -30,6 +30,10 @@ namespace api {
namespace hs = httpd::system_json;
void set_system(http_context& ctx, routes& r) {
hs::get_system_uptime.set(r, [](const_req req) {
return std::chrono::duration_cast<std::chrono::milliseconds>(engine().uptime()).count();
});
hs::get_all_logger_names.set(r, [](const_req req) {
return logging::logger_registry().get_all_logger_names();
});

View File

@@ -21,8 +21,8 @@
#include "atomic_cell.hh"
#include "atomic_cell_or_collection.hh"
#include "counters.hh"
#include "types.hh"
#include "types/collection.hh"
/// LSA migrator for cells with irrelevant type
///
@@ -148,35 +148,6 @@ atomic_cell_or_collection::atomic_cell_or_collection(const abstract_type& type,
{
}
static collection_mutation_view get_collection_mutation_view(const uint8_t* ptr)
{
auto f = data::cell::structure::get_member<data::cell::tags::flags>(ptr);
auto ti = data::type_info::make_collection();
data::cell::context ctx(f, ti);
auto view = data::cell::structure::get_member<data::cell::tags::cell>(ptr).as<data::cell::tags::collection>(ctx);
auto dv = data::cell::variable_value::make_view(view, f.get<data::cell::tags::external_data>());
return collection_mutation_view { dv };
}
collection_mutation_view atomic_cell_or_collection::as_collection_mutation() const {
return get_collection_mutation_view(_data.get());
}
collection_mutation::collection_mutation(const collection_type_impl& type, collection_mutation_view v)
: _data(imr_object_type::make(data::cell::make_collection(v.data), &type.imr_state().lsa_migrator()))
{
}
collection_mutation::collection_mutation(const collection_type_impl& type, bytes_view v)
: _data(imr_object_type::make(data::cell::make_collection(v), &type.imr_state().lsa_migrator()))
{
}
collection_mutation::operator collection_mutation_view() const
{
return get_collection_mutation_view(_data.get());
}
bool atomic_cell_or_collection::equals(const abstract_type& type, const atomic_cell_or_collection& other) const
{
auto ptr_a = _data.get();
@@ -231,7 +202,7 @@ size_t atomic_cell_or_collection::external_memory_usage(const abstract_type& t)
size_t external_value_size = 0;
if (flags.get<data::cell::tags::external_data>()) {
if (flags.get<data::cell::tags::collection>()) {
external_value_size = get_collection_mutation_view(_data.get()).data.size_bytes();
external_value_size = as_collection_mutation().data.size_bytes();
} else {
auto cell_view = data::cell::atomic_cell_view(t.imr_state().type_info(), view);
external_value_size = cell_view.value_size();
@@ -244,6 +215,61 @@ size_t atomic_cell_or_collection::external_memory_usage(const abstract_type& t)
+ imr_object_type::size_overhead + external_value_size;
}
std::ostream&
operator<<(std::ostream& os, const atomic_cell_view& acv) {
if (acv.is_live()) {
return fmt_print(os, "atomic_cell{{{},ts={:d},expiry={:d},ttl={:d}}}",
acv.is_counter_update()
? "counter_update_value=" + to_sstring(acv.counter_update_value())
: to_hex(acv.value().linearize()),
acv.timestamp(),
acv.is_live_and_has_ttl() ? acv.expiry().time_since_epoch().count() : -1,
acv.is_live_and_has_ttl() ? acv.ttl().count() : 0);
} else {
return fmt_print(os, "atomic_cell{{DEAD,ts={:d},deletion_time={:d}}}",
acv.timestamp(), acv.deletion_time().time_since_epoch().count());
}
}
std::ostream&
operator<<(std::ostream& os, const atomic_cell& ac) {
return os << atomic_cell_view(ac);
}
std::ostream&
operator<<(std::ostream& os, const atomic_cell_view::printer& acvp) {
auto& type = acvp._type;
auto& acv = acvp._cell;
if (acv.is_live()) {
std::ostringstream cell_value_string_builder;
if (type.is_counter()) {
if (acv.is_counter_update()) {
cell_value_string_builder << "counter_update_value=" << acv.counter_update_value();
} else {
cell_value_string_builder << "shards: ";
counter_cell_view::with_linearized(acv, [&cell_value_string_builder] (counter_cell_view& ccv) {
cell_value_string_builder << ::join(", ", ccv.shards());
});
}
} else {
cell_value_string_builder << type.to_string(acv.value().linearize());
}
return fmt_print(os, "atomic_cell{{{},ts={:d},expiry={:d},ttl={:d}}}",
cell_value_string_builder.str(),
acv.timestamp(),
acv.is_live_and_has_ttl() ? acv.expiry().time_since_epoch().count() : -1,
acv.is_live_and_has_ttl() ? acv.ttl().count() : 0);
} else {
return fmt_print(os, "atomic_cell{{DEAD,ts={:d},deletion_time={:d}}}",
acv.timestamp(), acv.deletion_time().time_since_epoch().count());
}
}
std::ostream&
operator<<(std::ostream& os, const atomic_cell::printer& acp) {
return operator<<(os, static_cast<const atomic_cell_view::printer&>(acp));
}
std::ostream& operator<<(std::ostream& os, const atomic_cell_or_collection::printer& p) {
if (!p._cell._data.get()) {
return os << "{ null atomic_cell_or_collection }";
@@ -253,9 +279,9 @@ std::ostream& operator<<(std::ostream& os, const atomic_cell_or_collection::prin
if (dc::structure::get_member<dc::tags::flags>(p._cell._data.get()).get<dc::tags::collection>()) {
os << "collection ";
auto cmv = p._cell.as_collection_mutation();
os << to_hex(cmv.data.linearize());
os << collection_mutation_view::printer(*p._cdef.type, cmv);
} else {
os << p._cell.as_atomic_cell(p._cdef);
os << atomic_cell_view::printer(*p._cdef.type, p._cell.as_atomic_cell(p._cdef));
}
return os << " }";
}

View File

@@ -153,6 +153,14 @@ public:
}
friend std::ostream& operator<<(std::ostream& os, const atomic_cell_view& acv);
class printer {
const abstract_type& _type;
const atomic_cell_view& _cell;
public:
printer(const abstract_type& type, const atomic_cell_view& cell) : _type(type), _cell(cell) {}
friend std::ostream& operator<<(std::ostream& os, const printer& acvp);
};
};
class atomic_cell_mutable_view final : public basic_atomic_cell_view<mutable_view::yes> {
@@ -219,30 +227,12 @@ public:
static atomic_cell make_live_uninitialized(const abstract_type& type, api::timestamp_type timestamp, size_t size);
friend class atomic_cell_or_collection;
friend std::ostream& operator<<(std::ostream& os, const atomic_cell& ac);
};
class collection_mutation_view;
// Represents a mutation of a collection. Actual format is determined by collection type,
// and is:
// set: list of atomic_cell
// map: list of pair<atomic_cell, bytes> (for key/value)
// list: tbd, probably ugly
class collection_mutation {
public:
using imr_object_type = imr::utils::object<data::cell::structure>;
imr_object_type _data;
collection_mutation() {}
collection_mutation(const collection_type_impl&, collection_mutation_view v);
collection_mutation(const collection_type_impl&, bytes_view bv);
operator collection_mutation_view() const;
};
class collection_mutation_view {
public:
atomic_cell_value_view data;
class printer : atomic_cell_view::printer {
public:
printer(const abstract_type& type, const atomic_cell_view& cell) : atomic_cell_view::printer(type, cell) {}
friend std::ostream& operator<<(std::ostream& os, const printer& acvp);
};
};
class column_definition;

View File

@@ -34,14 +34,12 @@ template<>
struct appending_hash<collection_mutation_view> {
template<typename Hasher>
void operator()(Hasher& h, collection_mutation_view cell, const column_definition& cdef) const {
cell.data.with_linearized([&] (bytes_view cell_bv) {
auto ctype = static_pointer_cast<const collection_type_impl>(cdef.type);
auto m_view = ctype->deserialize_mutation_form(cell_bv);
::feed_hash(h, m_view.tomb);
for (auto&& key_and_value : m_view.cells) {
::feed_hash(h, key_and_value.first);
::feed_hash(h, key_and_value.second, cdef);
}
cell.with_deserialized(*cdef.type, [&] (collection_mutation_view_description m_view) {
::feed_hash(h, m_view.tomb);
for (auto&& key_and_value : m_view.cells) {
::feed_hash(h, key_and_value.first);
::feed_hash(h, key_and_value.second, cdef);
}
});
}
};

View File

@@ -22,6 +22,7 @@
#pragma once
#include "atomic_cell.hh"
#include "collection_mutation.hh"
#include "schema.hh"
#include "hashing.hh"

View File

@@ -33,6 +33,7 @@
#include "auth/resource.hh"
#include "seastarx.hh"
#include "exceptions/exceptions.hh"
namespace auth {
@@ -52,9 +53,9 @@ struct role_config_update final {
///
/// A logical argument error for a role-management operation.
///
class roles_argument_exception : public std::invalid_argument {
class roles_argument_exception : public exceptions::invalid_request_exception {
public:
using std::invalid_argument::invalid_argument;
using exceptions::invalid_request_exception::invalid_request_exception;
};
class role_already_exists : public roles_argument_exception {

View File

@@ -39,7 +39,7 @@
#include "db/consistency_level_type.hh"
#include "exceptions/exceptions.hh"
#include "log.hh"
#include "service/migration_listener.hh"
#include "service/migration_manager.hh"
#include "utils/class_registrator.hh"
#include "database.hh"
@@ -77,17 +77,23 @@ private:
void on_update_view(const sstring& ks_name, const sstring& view_name, bool columns_changed) override {}
void on_drop_keyspace(const sstring& ks_name) override {
_authorizer.revoke_all(
// Do it in the background.
(void)_authorizer.revoke_all(
auth::make_data_resource(ks_name)).handle_exception_type([](const unsupported_authorization_operation&) {
// Nothing.
}).handle_exception([] (std::exception_ptr e) {
log.error("Unexpected exception while revoking all permissions on dropped keyspace: {}", e);
});
}
void on_drop_column_family(const sstring& ks_name, const sstring& cf_name) override {
_authorizer.revoke_all(
// Do it in the background.
(void)_authorizer.revoke_all(
auth::make_data_resource(
ks_name, cf_name)).handle_exception_type([](const unsupported_authorization_operation&) {
// Nothing.
}).handle_exception([] (std::exception_ptr e) {
log.error("Unexpected exception while revoking all permissions on dropped table: {}", e);
});
}
@@ -108,14 +114,14 @@ static future<> validate_role_exists(const service& ser, std::string_view role_n
service::service(
permissions_cache_config c,
cql3::query_processor& qp,
::service::migration_manager& mm,
::service::migration_notifier& mn,
std::unique_ptr<authorizer> z,
std::unique_ptr<authenticator> a,
std::unique_ptr<role_manager> r)
: _permissions_cache_config(std::move(c))
, _permissions_cache(nullptr)
, _qp(qp)
, _migration_manager(mm)
, _mnotifier(mn)
, _authorizer(std::move(z))
, _authenticator(std::move(a))
, _role_manager(std::move(r))
@@ -135,18 +141,19 @@ service::service(
service::service(
permissions_cache_config c,
cql3::query_processor& qp,
::service::migration_notifier& mn,
::service::migration_manager& mm,
const service_config& sc)
: service(
std::move(c),
qp,
mm,
mn,
create_object<authorizer>(sc.authorizer_java_name, qp, mm),
create_object<authenticator>(sc.authenticator_java_name, qp, mm),
create_object<role_manager>(sc.role_manager_java_name, qp, mm)) {
}
future<> service::create_keyspace_if_missing() const {
future<> service::create_keyspace_if_missing(::service::migration_manager& mm) const {
auto& db = _qp.db();
if (!db.has_keyspace(meta::AUTH_KS)) {
@@ -160,15 +167,15 @@ future<> service::create_keyspace_if_missing() const {
// We use min_timestamp so that default keyspace metadata will loose with any manual adjustments.
// See issue #2129.
return _migration_manager.announce_new_keyspace(ksm, api::min_timestamp, false);
return mm.announce_new_keyspace(ksm, api::min_timestamp, false);
}
return make_ready_future<>();
}
future<> service::start() {
return once_among_shards([this] {
return create_keyspace_if_missing();
future<> service::start(::service::migration_manager& mm) {
return once_among_shards([this, &mm] {
return create_keyspace_if_missing(mm);
}).then([this] {
return _role_manager->start().then([this] {
return when_all_succeed(_authorizer->start(), _authenticator->start());
@@ -177,7 +184,7 @@ future<> service::start() {
_permissions_cache = std::make_unique<permissions_cache>(_permissions_cache_config, *this, log);
}).then([this] {
return once_among_shards([this] {
_migration_manager.register_listener(_migration_listener.get());
_mnotifier.register_listener(_migration_listener.get());
return make_ready_future<>();
});
});
@@ -186,9 +193,9 @@ future<> service::start() {
future<> service::stop() {
// Only one of the shards has the listener registered, but let's try to
// unregister on each one just to make sure.
_migration_manager.unregister_listener(_migration_listener.get());
return _permissions_cache->stop().then([this] {
return _mnotifier.unregister_listener(_migration_listener.get()).then([this] {
return _permissions_cache->stop();
}).then([this] {
return when_all_succeed(_role_manager->stop(), _authorizer->stop(), _authenticator->stop());
});
}


@@ -28,6 +28,7 @@
#include <seastar/core/future.hh>
#include <seastar/core/sstring.hh>
#include <seastar/util/bool_class.hh>
#include <seastar/core/sharded.hh>
#include "auth/authenticator.hh"
#include "auth/authorizer.hh"
@@ -42,6 +43,7 @@ class query_processor;
namespace service {
class migration_manager;
class migration_notifier;
class migration_listener;
}
@@ -76,13 +78,15 @@ public:
///
/// All state associated with access-control is stored externally to any particular instance of this class.
///
class service final {
/// peering_sharded_service inheritance is needed to be able to access shard local authentication service
/// given an object from another shard. Used for bouncing lwt requests to correct shard.
class service final : public seastar::peering_sharded_service<service> {
permissions_cache_config _permissions_cache_config;
std::unique_ptr<permissions_cache> _permissions_cache;
cql3::query_processor& _qp;
::service::migration_manager& _migration_manager;
::service::migration_notifier& _mnotifier;
std::unique_ptr<authorizer> _authorizer;
@@ -97,7 +101,7 @@ public:
service(
permissions_cache_config,
cql3::query_processor&,
::service::migration_manager&,
::service::migration_notifier&,
std::unique_ptr<authorizer>,
std::unique_ptr<authenticator>,
std::unique_ptr<role_manager>);
@@ -110,10 +114,11 @@ public:
service(
permissions_cache_config,
cql3::query_processor&,
::service::migration_notifier&,
::service::migration_manager&,
const service_config&);
future<> start();
future<> start(::service::migration_manager&);
future<> stop();
@@ -159,7 +164,7 @@ public:
private:
future<bool> has_existing_legacy_users() const;
future<> create_keyspace_if_missing() const;
future<> create_keyspace_if_missing(::service::migration_manager& mm) const;
};
future<bool> has_superuser(const service&, const authenticated_user&);


@@ -101,8 +101,8 @@ static future<std::optional<record>> find_record(cql3::query_processor& qp, std:
return std::make_optional(
record{
row.get_as<sstring>(sstring(meta::roles_table::role_col_name)),
row.get_as<bool>("is_superuser"),
row.get_as<bool>("can_login"),
row.get_or<bool>("is_superuser", false),
row.get_or<bool>("can_login", false),
(row.has("member_of")
? row.get_set<sstring>("member_of")
: role_set())});
@@ -203,7 +203,7 @@ future<> standard_role_manager::migrate_legacy_metadata() const {
internal_distributed_timeout_config()).then([this](::shared_ptr<cql3::untyped_result_set> results) {
return do_for_each(*results, [this](const cql3::untyped_result_set_row& row) {
role_config config;
config.is_superuser = row.get_as<bool>("super");
config.is_superuser = row.get_or<bool>("super", false);
config.can_login = true;
return do_with(

build_id.cc Normal file

@@ -0,0 +1,71 @@
/*
* Copyright (C) 2019 ScyllaDB
*/
#include "build_id.hh"
#include <fmt/printf.h>
#include <link.h>
#include <seastar/core/align.hh>
#include <sstream>
using namespace seastar;
static const Elf64_Nhdr* get_nt_build_id(dl_phdr_info* info) {
auto base = info->dlpi_addr;
const auto* h = info->dlpi_phdr;
auto num_headers = info->dlpi_phnum;
for (int i = 0; i != num_headers; ++i, ++h) {
if (h->p_type != PT_NOTE) {
continue;
}
auto* p = reinterpret_cast<const char*>(base) + h->p_vaddr;
auto* e = p + h->p_memsz;
while (p != e) {
const auto* n = reinterpret_cast<const Elf64_Nhdr*>(p);
if (n->n_type == NT_GNU_BUILD_ID) {
return n;
}
p += sizeof(Elf64_Nhdr);
p += n->n_namesz;
p = align_up(p, 4);
p += n->n_descsz;
p = align_up(p, 4);
}
}
assert(0 && "no NT_GNU_BUILD_ID note");
}
static int callback(dl_phdr_info* info, size_t size, void* data) {
std::string& ret = *(std::string*)data;
std::ostringstream os;
// The first DSO is always the main program, which has an empty name.
assert(strlen(info->dlpi_name) == 0);
auto* n = get_nt_build_id(info);
auto* p = reinterpret_cast<const char*>(n);
p += sizeof(Elf64_Nhdr);
p += n->n_namesz;
p = align_up(p, 4);
const char* desc = p;
for (unsigned i = 0; i < n->n_descsz; ++i) {
fmt::fprintf(os, "%02x", (unsigned char)*(desc + i));
}
ret = os.str();
return 1;
}
std::string get_build_id() {
std::string ret;
int r = dl_iterate_phdr(callback, &ret);
assert(r == 1);
return ret;
}

build_id.hh Normal file

@@ -0,0 +1,9 @@
/*
* Copyright (C) 2019 ScyllaDB
*/
#pragma once
#include <string>
std::string get_build_id();


@@ -38,6 +38,7 @@ class bytes_ostream {
public:
using size_type = bytes::size_type;
using value_type = bytes::value_type;
using fragment_type = bytes_view;
static constexpr size_type max_chunk_size() { return 128 * 1024; }
private:
static_assert(sizeof(value_type) == 1, "value_type is assumed to be one byte long");
@@ -93,6 +94,29 @@ public:
return _current != other._current;
}
};
using const_iterator = fragment_iterator;
class output_iterator {
public:
using iterator_category = std::output_iterator_tag;
using difference_type = std::ptrdiff_t;
using value_type = bytes_ostream::value_type;
using pointer = bytes_ostream::value_type*;
using reference = bytes_ostream::value_type&;
friend class bytes_ostream;
private:
bytes_ostream* _ostream = nullptr;
private:
explicit output_iterator(bytes_ostream& os) : _ostream(&os) { }
public:
reference operator*() const { return *_ostream->write_place_holder(1); }
output_iterator& operator++() { return *this; }
output_iterator operator++(int) { return *this; }
};
private:
inline size_type current_space_left() const {
if (!_current) {
@@ -289,6 +313,11 @@ public:
return _size;
}
// For the FragmentRange concept
size_type size_bytes() const {
return _size;
}
bool empty() const {
return _size == 0;
}
@@ -326,6 +355,8 @@ public:
fragment_iterator begin() const { return { _begin.get() }; }
fragment_iterator end() const { return { nullptr }; }
output_iterator write_begin() { return output_iterator(*this); }
boost::iterator_range<fragment_iterator> fragments() const {
return { begin(), end() };
}


@@ -61,6 +61,7 @@ class cache_flat_mutation_reader final : public flat_mutation_reader::impl {
// - _last_row points at a direct predecessor of the next row which is going to be read.
// Used for populating continuity.
// - _population_range_starts_before_all_rows is set accordingly
// - _underlying is engaged and fast-forwarded
reading_from_underlying,
end_of_stream
@@ -99,7 +100,13 @@ class cache_flat_mutation_reader final : public flat_mutation_reader::impl {
// forward progress is not guaranteed in case iterators are getting constantly invalidated.
bool _lower_bound_changed = false;
// Points to the underlying reader conforming to _schema,
// either to *_underlying_holder or _read_context->underlying().underlying().
flat_mutation_reader* _underlying = nullptr;
std::optional<flat_mutation_reader> _underlying_holder;
future<> do_fill_buffer(db::timeout_clock::time_point);
future<> ensure_underlying(db::timeout_clock::time_point);
void copy_from_cache_to_buffer();
future<> process_static_row(db::timeout_clock::time_point);
void move_to_end();
@@ -186,23 +193,22 @@ future<> cache_flat_mutation_reader::process_static_row(db::timeout_clock::time_
return make_ready_future<>();
} else {
_read_context->cache().on_row_miss();
return _read_context->get_next_fragment(timeout).then([this] (mutation_fragment_opt&& sr) {
if (sr) {
assert(sr->is_static_row());
maybe_add_to_cache(sr->as_static_row());
push_mutation_fragment(std::move(*sr));
}
maybe_set_static_row_continuous();
return ensure_underlying(timeout).then([this, timeout] {
return (*_underlying)(timeout).then([this] (mutation_fragment_opt&& sr) {
if (sr) {
assert(sr->is_static_row());
maybe_add_to_cache(sr->as_static_row());
push_mutation_fragment(std::move(*sr));
}
maybe_set_static_row_continuous();
});
});
}
}
inline
void cache_flat_mutation_reader::touch_partition() {
if (_snp->at_latest_version()) {
rows_entry& last_dummy = *_snp->version()->partition().clustered_rows().rbegin();
_snp->tracker()->touch(last_dummy);
}
_snp->touch();
}
inline
@@ -232,14 +238,36 @@ future<> cache_flat_mutation_reader::fill_buffer(db::timeout_clock::time_point t
});
}
inline
future<> cache_flat_mutation_reader::ensure_underlying(db::timeout_clock::time_point timeout) {
if (_underlying) {
return make_ready_future<>();
}
return _read_context->ensure_underlying(timeout).then([this, timeout] {
flat_mutation_reader& ctx_underlying = _read_context->underlying().underlying();
if (ctx_underlying.schema() != _schema) {
_underlying_holder = make_delegating_reader(ctx_underlying);
_underlying_holder->upgrade_schema(_schema);
_underlying = &*_underlying_holder;
} else {
_underlying = &ctx_underlying;
}
});
}
inline
future<> cache_flat_mutation_reader::do_fill_buffer(db::timeout_clock::time_point timeout) {
if (_state == state::move_to_underlying) {
if (!_underlying) {
return ensure_underlying(timeout).then([this, timeout] {
return do_fill_buffer(timeout);
});
}
_state = state::reading_from_underlying;
_population_range_starts_before_all_rows = _lower_bound.is_before_all_clustered_rows(*_schema);
auto end = _next_row_in_range ? position_in_partition(_next_row.position())
: position_in_partition(_upper_bound);
return _read_context->fast_forward_to(position_range{_lower_bound, std::move(end)}, timeout).then([this, timeout] {
return _underlying->fast_forward_to(position_range{_lower_bound, std::move(end)}, timeout).then([this, timeout] {
return read_from_underlying(timeout);
});
}
@@ -280,7 +308,7 @@ future<> cache_flat_mutation_reader::do_fill_buffer(db::timeout_clock::time_poin
inline
future<> cache_flat_mutation_reader::read_from_underlying(db::timeout_clock::time_point timeout) {
return consume_mutation_fragments_until(_read_context->underlying().underlying(),
return consume_mutation_fragments_until(*_underlying,
[this] { return _state != state::reading_from_underlying || is_buffer_full(); },
[this] (mutation_fragment mf) {
_read_context->cache().on_row_miss();


@@ -35,6 +35,7 @@
#include "idl/uuid.dist.impl.hh"
#include "idl/keys.dist.impl.hh"
#include "idl/mutation.dist.impl.hh"
#include <iostream>
canonical_mutation::canonical_mutation(bytes data)
: _data(std::move(data))
@@ -79,7 +80,8 @@ mutation canonical_mutation::to_mutation(schema_ptr s) const {
if (version == m.schema()->version()) {
auto partition_view = mutation_partition_view::from_view(mv.partition());
m.partition().apply(*m.schema(), partition_view, *m.schema());
mutation_application_stats app_stats;
m.partition().apply(*m.schema(), partition_view, *m.schema(), app_stats);
} else {
column_mapping cm = mv.mapping();
converting_mutation_partition_applier v(cm, *m.schema(), m.partition());
@@ -88,3 +90,81 @@ mutation canonical_mutation::to_mutation(schema_ptr s) const {
}
return m;
}
static sstring bytes_to_text(bytes_view bv) {
sstring ret(sstring::initialized_later(), bv.size());
std::copy_n(reinterpret_cast<const char*>(bv.data()), bv.size(), ret.data());
return ret;
}
std::ostream& operator<<(std::ostream& os, const canonical_mutation& cm) {
auto in = ser::as_input_stream(cm._data);
auto mv = ser::deserialize(in, boost::type<ser::canonical_mutation_view>());
column_mapping mapping = mv.mapping();
auto partition_view = mutation_partition_view::from_view(mv.partition());
fmt::print(os, "{{canonical_mutation: ");
fmt::print(os, "table_id {} schema_version {} ", mv.table_id(), mv.schema_version());
fmt::print(os, "partition_key {} ", mv.key());
class printing_visitor : public mutation_partition_view_virtual_visitor {
std::ostream& _os;
const column_mapping& _cm;
bool _first = true;
bool _in_row = false;
private:
void print_separator() {
if (!_first) {
fmt::print(_os, ", ");
}
_first = false;
}
public:
printing_visitor(std::ostream& os, const column_mapping& cm) : _os(os), _cm(cm) {}
virtual void accept_partition_tombstone(tombstone t) override {
print_separator();
fmt::print(_os, "partition_tombstone {}", t);
}
virtual void accept_static_cell(column_id id, atomic_cell ac) override {
print_separator();
auto&& entry = _cm.static_column_at(id);
fmt::print(_os, "static column {} {}", bytes_to_text(entry.name()), atomic_cell::printer(*entry.type(), ac));
}
virtual void accept_static_cell(column_id id, collection_mutation_view cmv) override {
print_separator();
auto&& entry = _cm.static_column_at(id);
fmt::print(_os, "static column {} {}", bytes_to_text(entry.name()), collection_mutation_view::printer(*entry.type(), cmv));
}
virtual void accept_row_tombstone(range_tombstone rt) override {
print_separator();
fmt::print(_os, "row tombstone {}", rt);
}
virtual void accept_row(position_in_partition_view pipv, row_tombstone rt, row_marker rm, is_dummy, is_continuous) override {
if (_in_row) {
fmt::print(_os, "}}, ");
}
fmt::print(_os, "{{row {} tombstone {} marker {}", pipv, rt, rm);
_in_row = true;
_first = false;
}
virtual void accept_row_cell(column_id id, atomic_cell ac) override {
print_separator();
auto&& entry = _cm.regular_column_at(id);
fmt::print(_os, "column {} {}", bytes_to_text(entry.name()), atomic_cell::printer(*entry.type(), ac));
}
virtual void accept_row_cell(column_id id, collection_mutation_view cmv) override {
print_separator();
auto&& entry = _cm.regular_column_at(id);
fmt::print(_os, "column {} {}", bytes_to_text(entry.name()), collection_mutation_view::printer(*entry.type(), cmv));
}
void finalize() {
if (_in_row) {
fmt::print(_os, "}}");
}
}
};
printing_visitor pv(os, mapping);
partition_view.accept(mapping, pv);
pv.finalize();
fmt::print(os, "}}");
return os;
}


@@ -26,6 +26,7 @@
#include "database_fwd.hh"
#include "mutation_partition_visitor.hh"
#include "mutation_partition_serializer.hh"
#include <iosfwd>
// Immutable mutation form which can be read using any schema version of the same table.
// Safe to access from other shards via const&.
@@ -52,4 +53,5 @@ public:
const bytes& representation() const { return _data; }
friend std::ostream& operator<<(std::ostream& os, const canonical_mutation& cm);
};

cdc/cdc.cc Normal file

@@ -0,0 +1,835 @@
/*
* Copyright (C) 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include <utility>
#include <algorithm>
#include <boost/range/irange.hpp>
#include <seastar/util/defer.hh>
#include <seastar/core/thread.hh>
#include "cdc/cdc.hh"
#include "bytes.hh"
#include "database.hh"
#include "db/config.hh"
#include "dht/murmur3_partitioner.hh"
#include "partition_slice_builder.hh"
#include "schema.hh"
#include "schema_builder.hh"
#include "service/migration_listener.hh"
#include "service/storage_service.hh"
#include "types/tuple.hh"
#include "cql3/statements/select_statement.hh"
#include "cql3/multi_column_relation.hh"
#include "cql3/tuples.hh"
#include "log.hh"
#include "json.hh"
using locator::snitch_ptr;
using locator::token_metadata;
using locator::topology;
using seastar::sstring;
using service::migration_notifier;
using service::storage_proxy;
namespace std {
template<> struct hash<std::pair<net::inet_address, unsigned int>> {
std::size_t operator()(const std::pair<net::inet_address, unsigned int> &p) const {
return std::hash<net::inet_address>{}(p.first) ^ std::hash<int>{}(p.second);
}
};
}
using namespace std::chrono_literals;
static logging::logger cdc_log("cdc");
namespace cdc {
static schema_ptr create_log_schema(const schema&, std::optional<utils::UUID> = {});
static schema_ptr create_stream_description_table_schema(const schema&, std::optional<utils::UUID> = {});
static future<> populate_desc(db_context ctx, const schema& s);
}
class cdc::cdc_service::impl : service::migration_listener::empty_listener {
friend cdc_service;
db_context _ctxt;
bool _stopped = false;
public:
impl(db_context ctxt)
: _ctxt(std::move(ctxt))
{
_ctxt._migration_notifier.register_listener(this);
}
~impl() {
assert(_stopped);
}
future<> stop() {
return _ctxt._migration_notifier.unregister_listener(this).then([this] {
_stopped = true;
});
}
void on_before_create_column_family(const schema& schema, std::vector<mutation>& mutations, api::timestamp_type timestamp) override {
if (schema.cdc_options().enabled()) {
auto& db = _ctxt._proxy.get_db().local();
auto logname = log_name(schema.cf_name());
if (!db.has_schema(schema.ks_name(), logname)) {
// in seastar thread
auto log_schema = create_log_schema(schema);
auto stream_desc_schema = create_stream_description_table_schema(schema);
auto& keyspace = db.find_keyspace(schema.ks_name());
auto log_mut = db::schema_tables::make_create_table_mutations(keyspace.metadata(), log_schema, timestamp);
auto stream_mut = db::schema_tables::make_create_table_mutations(keyspace.metadata(), stream_desc_schema, timestamp);
mutations.insert(mutations.end(), std::make_move_iterator(log_mut.begin()), std::make_move_iterator(log_mut.end()));
mutations.insert(mutations.end(), std::make_move_iterator(stream_mut.begin()), std::make_move_iterator(stream_mut.end()));
}
}
}
void on_before_update_column_family(const schema& new_schema, const schema& old_schema, std::vector<mutation>& mutations, api::timestamp_type timestamp) override {
bool is_cdc = new_schema.cdc_options().enabled();
bool was_cdc = old_schema.cdc_options().enabled();
// We need to create or modify the log & stream schemas if either the cdc status changed (was != is),
// or if cdc is enabled now unconditionally, since then any actual base schema changes will affect
// the log columns etc.
if (was_cdc || is_cdc) {
auto logname = log_name(old_schema.cf_name());
auto descname = desc_name(old_schema.cf_name());
auto& db = _ctxt._proxy.get_db().local();
auto& keyspace = db.find_keyspace(old_schema.ks_name());
auto log_schema = was_cdc ? db.find_column_family(old_schema.ks_name(), logname).schema() : nullptr;
auto stream_desc_schema = was_cdc ? db.find_column_family(old_schema.ks_name(), descname).schema() : nullptr;
if (!is_cdc) {
auto log_mut = db::schema_tables::make_drop_table_mutations(keyspace.metadata(), log_schema, timestamp);
auto stream_mut = db::schema_tables::make_drop_table_mutations(keyspace.metadata(), stream_desc_schema, timestamp);
mutations.insert(mutations.end(), std::make_move_iterator(log_mut.begin()), std::make_move_iterator(log_mut.end()));
mutations.insert(mutations.end(), std::make_move_iterator(stream_mut.begin()), std::make_move_iterator(stream_mut.end()));
return;
}
auto new_log_schema = create_log_schema(new_schema, log_schema ? std::make_optional(log_schema->id()) : std::nullopt);
auto new_stream_desc_schema = create_stream_description_table_schema(new_schema, stream_desc_schema ? std::make_optional(stream_desc_schema->id()) : std::nullopt);
auto log_mut = log_schema
? db::schema_tables::make_update_table_mutations(keyspace.metadata(), log_schema, new_log_schema, timestamp, false)
: db::schema_tables::make_create_table_mutations(keyspace.metadata(), new_log_schema, timestamp)
;
auto stream_mut = stream_desc_schema
? db::schema_tables::make_update_table_mutations(keyspace.metadata(), stream_desc_schema, new_stream_desc_schema, timestamp, false)
: db::schema_tables::make_create_table_mutations(keyspace.metadata(), new_stream_desc_schema, timestamp)
;
mutations.insert(mutations.end(), std::make_move_iterator(log_mut.begin()), std::make_move_iterator(log_mut.end()));
mutations.insert(mutations.end(), std::make_move_iterator(stream_mut.begin()), std::make_move_iterator(stream_mut.end()));
}
}
void on_before_drop_column_family(const schema& schema, std::vector<mutation>& mutations, api::timestamp_type timestamp) override {
if (schema.cdc_options().enabled()) {
auto logname = log_name(schema.cf_name());
auto descname = desc_name(schema.cf_name());
auto& db = _ctxt._proxy.get_db().local();
auto& keyspace = db.find_keyspace(schema.ks_name());
auto log_schema = db.find_column_family(schema.ks_name(), logname).schema();
auto stream_desc_schema = db.find_column_family(schema.ks_name(), descname).schema();
auto log_mut = db::schema_tables::make_drop_table_mutations(keyspace.metadata(), log_schema, timestamp);
auto stream_mut = db::schema_tables::make_drop_table_mutations(keyspace.metadata(), stream_desc_schema, timestamp);
mutations.insert(mutations.end(), std::make_move_iterator(log_mut.begin()), std::make_move_iterator(log_mut.end()));
mutations.insert(mutations.end(), std::make_move_iterator(stream_mut.begin()), std::make_move_iterator(stream_mut.end()));
}
}
void on_create_column_family(const sstring& ks_name, const sstring& cf_name) override {
// This callback is done on all shards. Only do the work once.
if (engine().cpu_id() != 0) {
return;
}
auto& db = _ctxt._proxy.get_db().local();
auto& cf = db.find_column_family(ks_name, cf_name);
auto schema = cf.schema();
if (schema->cdc_options().enabled()) {
populate_desc(_ctxt, *schema).get();
}
}
void on_update_column_family(const sstring& ks_name, const sstring& cf_name, bool columns_changed) override {
on_create_column_family(ks_name, cf_name);
}
void on_drop_column_family(const sstring& ks_name, const sstring& cf_name) override {}
future<std::tuple<std::vector<mutation>, result_callback>> augment_mutation_call(
lowres_clock::time_point timeout,
std::vector<mutation>&& mutations
);
template<typename Iter>
future<> append_mutations(Iter i, Iter e, schema_ptr s, lowres_clock::time_point, std::vector<mutation>&);
};
cdc::cdc_service::cdc_service(service::storage_proxy& proxy)
: cdc_service(db_context::builder(proxy).build())
{}
cdc::cdc_service::cdc_service(db_context ctxt)
: _impl(std::make_unique<impl>(std::move(ctxt)))
{
_impl->_ctxt._proxy.set_cdc_service(this);
}
future<> cdc::cdc_service::stop() {
return _impl->stop();
}
cdc::cdc_service::~cdc_service() = default;
cdc::options::options(const std::map<sstring, sstring>& map) {
if (map.find("enabled") == std::end(map)) {
return;
}
for (auto& p : map) {
if (p.first == "enabled") {
_enabled = p.second == "true";
} else if (p.first == "preimage") {
_preimage = p.second == "true";
} else if (p.first == "postimage") {
_postimage = p.second == "true";
} else if (p.first == "ttl") {
_ttl = std::stoi(p.second);
} else {
throw exceptions::configuration_exception("Invalid CDC option: " + p.first);
}
}
}
std::map<sstring, sstring> cdc::options::to_map() const {
if (!_enabled) {
return {};
}
return {
{ "enabled", _enabled ? "true" : "false" },
{ "preimage", _preimage ? "true" : "false" },
{ "postimage", _postimage ? "true" : "false" },
{ "ttl", std::to_string(_ttl) },
};
}
sstring cdc::options::to_sstring() const {
return json::to_json(to_map());
}
bool cdc::options::operator==(const options& o) const {
return _enabled == o._enabled && _preimage == o._preimage && _postimage == o._postimage && _ttl == o._ttl;
}
bool cdc::options::operator!=(const options& o) const {
return !(*this == o);
}
namespace cdc {
using operation_native_type = std::underlying_type_t<operation>;
using column_op_native_type = std::underlying_type_t<column_op>;
sstring log_name(const sstring& table_name) {
static constexpr auto cdc_log_suffix = "_scylla_cdc_log";
return table_name + cdc_log_suffix;
}
sstring desc_name(const sstring& table_name) {
static constexpr auto cdc_desc_suffix = "_scylla_cdc_desc";
return table_name + cdc_desc_suffix;
}
static schema_ptr create_log_schema(const schema& s, std::optional<utils::UUID> uuid) {
schema_builder b(s.ks_name(), log_name(s.cf_name()));
b.set_comment(sprint("CDC log for %s.%s", s.ks_name(), s.cf_name()));
b.with_column("stream_id", uuid_type, column_kind::partition_key);
b.with_column("time", timeuuid_type, column_kind::clustering_key);
b.with_column("batch_seq_no", int32_type, column_kind::clustering_key);
b.with_column("operation", data_type_for<operation_native_type>());
b.with_column("ttl", long_type);
auto add_columns = [&] (const schema::const_iterator_range_type& columns, bool is_data_col = false) {
for (const auto& column : columns) {
auto type = column.type;
if (is_data_col) {
type = tuple_type_impl::get_instance({ /* op */ data_type_for<column_op_native_type>(), /* value */ type, /* ttl */long_type});
}
b.with_column("_" + column.name(), type);
}
};
add_columns(s.partition_key_columns());
add_columns(s.clustering_key_columns());
add_columns(s.static_columns(), true);
add_columns(s.regular_columns(), true);
if (uuid) {
b.set_uuid(*uuid);
}
return b.build();
}
static schema_ptr create_stream_description_table_schema(const schema& s, std::optional<utils::UUID> uuid) {
schema_builder b(s.ks_name(), desc_name(s.cf_name()));
b.set_comment(sprint("CDC description for %s.%s", s.ks_name(), s.cf_name()));
b.with_column("node_ip", inet_addr_type, column_kind::partition_key);
b.with_column("shard_id", int32_type, column_kind::partition_key);
b.with_column("created_at", timestamp_type, column_kind::clustering_key);
b.with_column("stream_id", uuid_type);
if (uuid) {
b.set_uuid(*uuid);
}
return b.build();
}
// This function assumes setup_stream_description_table was called on |s| before the call to this
// function.
static future<> populate_desc(db_context ctx, const schema& s) {
auto& db = ctx._proxy.get_db().local();
auto desc_schema =
db.find_schema(s.ks_name(), desc_name(s.cf_name()));
auto log_schema =
db.find_schema(s.ks_name(), log_name(s.cf_name()));
auto belongs_to = [&](const gms::inet_address& endpoint,
const unsigned int shard_id,
const int shard_count,
const unsigned int ignore_msb_bits,
const utils::UUID& stream_id) {
const auto log_pk = partition_key::from_singular(*log_schema,
data_value(stream_id));
const auto token = ctx._partitioner.decorate_key(*log_schema, log_pk).token();
if (ctx._token_metadata.get_endpoint(ctx._token_metadata.first_token(token)) != endpoint) {
return false;
}
const auto owning_shard_id = dht::murmur3_partitioner(shard_count, ignore_msb_bits).shard_of(token);
return owning_shard_id == shard_id;
};
std::vector<mutation> mutations;
const auto ts = api::new_timestamp();
const auto ck = clustering_key::from_single_value(
*desc_schema, timestamp_type->decompose(ts));
auto cdef = desc_schema->get_column_definition(to_bytes("stream_id"));
for (const auto& dc : ctx._token_metadata.get_topology().get_datacenter_endpoints()) {
for (const auto& endpoint : dc.second) {
const auto decomposed_ip = inet_addr_type->decompose(endpoint.addr());
const unsigned int shard_count = ctx._snitch->get_shard_count(endpoint);
const unsigned int ignore_msb_bits = ctx._snitch->get_ignore_msb_bits(endpoint);
for (unsigned int shard_id = 0; shard_id < shard_count; ++shard_id) {
const auto pk = partition_key::from_exploded(
*desc_schema, { decomposed_ip, int32_type->decompose(static_cast<int>(shard_id)) });
mutations.emplace_back(desc_schema, pk);
auto stream_id = utils::make_random_uuid();
while (!belongs_to(endpoint, shard_id, shard_count, ignore_msb_bits, stream_id)) {
stream_id = utils::make_random_uuid();
}
auto value = atomic_cell::make_live(*uuid_type,
ts,
uuid_type->decompose(stream_id));
mutations.back().set_cell(ck, *cdef, std::move(value));
}
}
}
return ctx._proxy.mutate(std::move(mutations),
db::consistency_level::QUORUM,
db::no_timeout,
nullptr,
empty_service_permit());
}
db_context::builder::builder(service::storage_proxy& proxy)
: _proxy(proxy)
{}
db_context::builder& db_context::builder::with_migration_notifier(service::migration_notifier& migration_notifier) {
_migration_notifier = migration_notifier;
return *this;
}
db_context::builder& db_context::builder::with_token_metadata(locator::token_metadata& token_metadata) {
_token_metadata = token_metadata;
return *this;
}
db_context::builder& db_context::builder::with_snitch(locator::snitch_ptr& snitch) {
_snitch = snitch;
return *this;
}
db_context::builder& db_context::builder::with_partitioner(dht::i_partitioner& partitioner) {
_partitioner = partitioner;
return *this;
}
db_context db_context::builder::build() {
return db_context{
_proxy,
_migration_notifier ? _migration_notifier->get() : service::get_local_storage_service().get_migration_notifier(),
_token_metadata ? _token_metadata->get() : service::get_local_storage_service().get_token_metadata(),
_snitch ? _snitch->get() : locator::i_endpoint_snitch::get_local_snitch_ptr(),
_partitioner ? _partitioner->get() : dht::global_partitioner()
};
}
class transformer final {
public:
using streams_type = std::unordered_map<std::pair<net::inet_address, unsigned int>, utils::UUID>;
private:
db_context _ctx;
schema_ptr _schema;
schema_ptr _log_schema;
utils::UUID _time;
bytes _decomposed_time;
::shared_ptr<const transformer::streams_type> _streams;
const column_definition& _op_col;
ttl_opt _cdc_ttl_opt;
clustering_key set_pk_columns(const partition_key& pk, int batch_no, mutation& m) const {
const auto log_ck = clustering_key::from_exploded(
*m.schema(), { _decomposed_time, int32_type->decompose(batch_no) });
auto pk_value = pk.explode(*_schema);
size_t pos = 0;
for (const auto& column : _schema->partition_key_columns()) {
assert (pos < pk_value.size());
auto cdef = m.schema()->get_column_definition(to_bytes("_" + column.name()));
auto value = atomic_cell::make_live(*column.type,
_time.timestamp(),
bytes_view(pk_value[pos]),
_cdc_ttl_opt);
m.set_cell(log_ck, *cdef, std::move(value));
++pos;
}
return log_ck;
}
void set_operation(const clustering_key& ck, operation op, mutation& m) const {
m.set_cell(ck, _op_col, atomic_cell::make_live(*_op_col.type, _time.timestamp(), _op_col.type->decompose(operation_native_type(op)), _cdc_ttl_opt));
}
partition_key stream_id(const net::inet_address& ip, unsigned int shard_id) const {
auto it = _streams->find(std::make_pair(ip, shard_id));
if (it == std::end(*_streams)) {
throw std::runtime_error(format("No stream found for node {} and shard {}", ip, shard_id));
}
return partition_key::from_exploded(*_log_schema, { uuid_type->decompose(it->second) });
}
public:
transformer(db_context ctx, schema_ptr s, ::shared_ptr<const transformer::streams_type> streams)
: _ctx(ctx)
, _schema(std::move(s))
, _log_schema(ctx._proxy.get_db().local().find_schema(_schema->ks_name(), log_name(_schema->cf_name())))
, _time(utils::UUID_gen::get_time_UUID())
, _decomposed_time(timeuuid_type->decompose(_time))
, _streams(std::move(streams))
, _op_col(*_log_schema->get_column_definition(to_bytes("operation")))
{
if (_schema->cdc_options().ttl()) {
_cdc_ttl_opt = std::chrono::seconds(_schema->cdc_options().ttl());
}
}
// TODO: is pre-image data based on the query enough? We only have actual column data. Do we need
// more details like tombstones/ttl? Probably not, but keep in mind.
mutation transform(const mutation& m, const cql3::untyped_result_set* rs = nullptr) const {
auto& t = m.token();
auto&& ep = _ctx._token_metadata.get_endpoint(
_ctx._token_metadata.first_token(t));
if (!ep) {
throw std::runtime_error(format("No owner found for key {}", m.decorated_key()));
}
auto shard_id = dht::murmur3_partitioner(_ctx._snitch->get_shard_count(*ep), _ctx._snitch->get_ignore_msb_bits(*ep)).shard_of(t);
mutation res(_log_schema, stream_id(ep->addr(), shard_id));
auto& p = m.partition();
if (p.partition_tombstone()) {
// Partition deletion
auto log_ck = set_pk_columns(m.key(), 0, res);
set_operation(log_ck, operation::partition_delete, res);
} else if (!p.row_tombstones().empty()) {
// range deletion
int batch_no = 0;
for (auto& rt : p.row_tombstones()) {
auto set_bound = [&] (const clustering_key& log_ck, const clustering_key_prefix& ckp) {
auto exploded = ckp.explode(*_schema);
size_t pos = 0;
for (const auto& column : _schema->clustering_key_columns()) {
if (pos >= exploded.size()) {
break;
}
auto cdef = _log_schema->get_column_definition(to_bytes("_" + column.name()));
auto value = atomic_cell::make_live(*column.type,
_time.timestamp(),
bytes_view(exploded[pos]),
_cdc_ttl_opt);
res.set_cell(log_ck, *cdef, std::move(value));
++pos;
}
};
{
auto log_ck = set_pk_columns(m.key(), batch_no, res);
set_bound(log_ck, rt.start);
// TODO: separate inclusive/exclusive range
set_operation(log_ck, operation::range_delete_start, res);
++batch_no;
}
{
auto log_ck = set_pk_columns(m.key(), batch_no, res);
set_bound(log_ck, rt.end);
// TODO: separate inclusive/exclusive range
set_operation(log_ck, operation::range_delete_end, res);
++batch_no;
}
}
} else {
// should be update or deletion
int batch_no = 0;
for (const rows_entry& r : p.clustered_rows()) {
auto ck_value = r.key().explode(*_schema);
std::optional<clustering_key> pikey;
const cql3::untyped_result_set_row * pirow = nullptr;
if (rs) {
for (auto& utr : *rs) {
bool match = true;
for (auto& c : _schema->clustering_key_columns()) {
auto rv = utr.get_view(c.name_as_text());
auto cv = r.key().get_component(*_schema, c.component_index());
if (rv != cv) {
match = false;
break;
}
}
if (match) {
pikey = set_pk_columns(m.key(), batch_no, res);
set_operation(*pikey, operation::pre_image, res);
pirow = &utr;
++batch_no;
break;
}
}
}
auto log_ck = set_pk_columns(m.key(), batch_no, res);
size_t pos = 0;
for (const auto& column : _schema->clustering_key_columns()) {
assert (pos < ck_value.size());
auto cdef = _log_schema->get_column_definition(to_bytes("_" + column.name()));
res.set_cell(log_ck, *cdef, atomic_cell::make_live(*column.type, _time.timestamp(), bytes_view(ck_value[pos]), _cdc_ttl_opt));
if (pirow) {
assert(pirow->has(column.name_as_text()));
res.set_cell(*pikey, *cdef, atomic_cell::make_live(*column.type, _time.timestamp(), bytes_view(ck_value[pos]), _cdc_ttl_opt));
}
++pos;
}
std::vector<bytes_opt> values(3);
auto process_cells = [&](const row& r, column_kind ckind) {
r.for_each_cell([&](column_id id, const atomic_cell_or_collection& cell) {
auto& cdef = _schema->column_at(ckind, id);
auto* dst = _log_schema->get_column_definition(to_bytes("_" + cdef.name()));
// todo: collections.
if (cdef.is_atomic()) {
column_op op;
values[1] = values[2] = std::nullopt;
auto view = cell.as_atomic_cell(cdef);
if (view.is_live()) {
op = column_op::set;
values[1] = view.value().linearize();
if (view.is_live_and_has_ttl()) {
values[2] = long_type->decompose(data_value(view.ttl().count()));
}
} else {
op = column_op::del;
}
values[0] = data_type_for<column_op_native_type>()->decompose(data_value(static_cast<column_op_native_type>(op)));
res.set_cell(log_ck, *dst, atomic_cell::make_live(*dst->type, _time.timestamp(), tuple_type_impl::build_value(values), _cdc_ttl_opt));
if (pirow && pirow->has(cdef.name_as_text())) {
values[0] = data_type_for<column_op_native_type>()->decompose(data_value(static_cast<column_op_native_type>(column_op::set)));
values[1] = pirow->get_blob(cdef.name_as_text());
values[2] = std::nullopt;
assert(std::addressof(res.partition().clustered_row(*_log_schema, *pikey)) != std::addressof(res.partition().clustered_row(*_log_schema, log_ck)));
assert(pikey->explode() != log_ck.explode());
res.set_cell(*pikey, *dst, atomic_cell::make_live(*dst->type, _time.timestamp(), tuple_type_impl::build_value(values), _cdc_ttl_opt));
}
} else {
cdc_log.warn("Non-atomic cell ignored {}.{}:{}", _schema->ks_name(), _schema->cf_name(), cdef.name_as_text());
}
});
};
process_cells(r.row().cells(), column_kind::regular_column);
process_cells(p.static_row().get(), column_kind::static_column);
set_operation(log_ck, operation::update, res);
++batch_no;
}
}
return res;
}
static db::timeout_clock::time_point default_timeout() {
return db::timeout_clock::now() + 10s;
}
future<lw_shared_ptr<cql3::untyped_result_set>> pre_image_select(
service::client_state& client_state,
db::consistency_level cl,
const mutation& m)
{
auto& p = m.partition();
if (p.partition_tombstone() || !p.row_tombstones().empty() || p.clustered_rows().empty()) {
return make_ready_future<lw_shared_ptr<cql3::untyped_result_set>>();
}
dht::partition_range_vector partition_ranges{dht::partition_range(m.decorated_key())};
auto&& pc = _schema->partition_key_columns();
auto&& cc = _schema->clustering_key_columns();
std::vector<query::clustering_range> bounds;
if (cc.empty()) {
bounds.push_back(query::clustering_range::make_open_ended_both_sides());
} else {
for (const rows_entry& r : p.clustered_rows()) {
auto& ck = r.key();
bounds.push_back(query::clustering_range::make_singular(ck));
}
}
std::vector<const column_definition*> columns;
columns.reserve(_schema->all_columns().size());
std::transform(pc.begin(), pc.end(), std::back_inserter(columns), [](auto& c) { return &c; });
std::transform(cc.begin(), cc.end(), std::back_inserter(columns), [](auto& c) { return &c; });
query::column_id_vector static_columns, regular_columns;
auto sk = column_kind::static_column;
auto rk = column_kind::regular_column;
// TODO: this assumes all mutations touch the same set of columns. This might not be true, and we may need to do more horrible set operation here.
for (auto& [r, cids, kind] : { std::tie(p.static_row().get(), static_columns, sk), std::tie(p.clustered_rows().begin()->row().cells(), regular_columns, rk) }) {
r.for_each_cell([&](column_id id, const atomic_cell_or_collection&) {
auto& cdef = _schema->column_at(kind, id);
cids.emplace_back(id);
columns.emplace_back(&cdef);
});
}
auto selection = cql3::selection::selection::for_columns(_schema, std::move(columns));
auto partition_slice = query::partition_slice(std::move(bounds), std::move(static_columns), std::move(regular_columns), selection->get_query_options());
auto command = ::make_lw_shared<query::read_command>(_schema->id(), _schema->version(), partition_slice, query::max_partitions);
return _ctx._proxy.query(_schema, std::move(command), std::move(partition_ranges), cl, service::storage_proxy::coordinator_query_options(default_timeout(), empty_service_permit(), client_state)).then(
[s = _schema, partition_slice = std::move(partition_slice), selection = std::move(selection)] (service::storage_proxy::coordinator_query_result qr) -> lw_shared_ptr<cql3::untyped_result_set> {
cql3::selection::result_set_builder builder(*selection, gc_clock::now(), cql_serialization_format::latest());
query::result_view::consume(*qr.query_result, partition_slice, cql3::selection::result_set_builder::visitor(builder, *s, *selection));
auto result_set = builder.build();
if (!result_set || result_set->empty()) {
return {};
}
return make_lw_shared<cql3::untyped_result_set>(*result_set);
});
}
};
// This class is used to build a mapping from <node ip, shard id> to stream_id.
// It is used as a consumer for the rows returned by the query to the CDC description table.
class streams_builder {
const schema& _schema;
transformer::streams_type _streams;
net::inet_address _node_ip = net::inet_address();
unsigned int _shard_id = 0;
api::timestamp_type _latest_row_timestamp = api::min_timestamp;
utils::UUID _latest_row_stream_id = utils::UUID();
public:
streams_builder(const schema& s) : _schema(s) {}
void accept_new_partition(const partition_key& key, uint32_t row_count) {
auto exploded = key.explode(_schema);
_node_ip = value_cast<net::inet_address>(inet_addr_type->deserialize(exploded[0]));
_shard_id = static_cast<unsigned int>(value_cast<int>(int32_type->deserialize(exploded[1])));
_latest_row_timestamp = api::min_timestamp;
_latest_row_stream_id = utils::UUID();
}
void accept_new_partition(uint32_t row_count) {
assert(false);
}
void accept_new_row(
const clustering_key& key,
const query::result_row_view& static_row,
const query::result_row_view& row) {
auto row_iterator = row.iterator();
api::timestamp_type timestamp = value_cast<db_clock::time_point>(
timestamp_type->deserialize(key.explode(_schema)[0])).time_since_epoch().count();
if (timestamp <= _latest_row_timestamp) {
return;
}
_latest_row_timestamp = timestamp;
for (auto&& cdef : _schema.regular_columns()) {
if (cdef.name_as_text() != "stream_id") {
row_iterator.skip(cdef);
continue;
}
auto val_opt = row_iterator.next_atomic_cell();
assert(val_opt);
val_opt->value().with_linearized([&] (bytes_view bv) {
_latest_row_stream_id = value_cast<utils::UUID>(uuid_type->deserialize(bv));
});
}
}
void accept_new_row(const query::result_row_view& static_row, const query::result_row_view& row) {
assert(false);
}
void accept_partition_end(const query::result_row_view& static_row) {
_streams.emplace(std::make_pair(_node_ip, _shard_id), _latest_row_stream_id);
}
transformer::streams_type build() {
return std::move(_streams);
}
};
static future<::shared_ptr<transformer::streams_type>> get_streams(
db_context ctx,
const sstring& ks_name,
const sstring& cf_name,
lowres_clock::time_point timeout,
service::query_state& qs) {
auto s =
ctx._proxy.get_db().local().find_schema(ks_name, desc_name(cf_name));
query::read_command cmd(
s->id(),
s->version(),
partition_slice_builder(*s).with_no_static_columns().build());
return ctx._proxy.query(
s,
make_lw_shared(std::move(cmd)),
{dht::partition_range::make_open_ended_both_sides()},
db::consistency_level::QUORUM,
{timeout, qs.get_permit(), qs.get_client_state()}).then([s = std::move(s)] (auto qr) mutable {
return query::result_view::do_with(*qr.query_result,
[s = std::move(s)] (query::result_view v) {
auto slice = partition_slice_builder(*s)
.with_no_static_columns()
.build();
streams_builder builder{ *s };
v.consume(slice, builder);
return ::make_shared<transformer::streams_type>(builder.build());
});
});
}
template <typename Func>
future<std::vector<mutation>>
transform_mutations(std::vector<mutation>& muts, decltype(muts.size()) batch_size, Func&& f) {
return parallel_for_each(
boost::irange(static_cast<decltype(muts.size())>(0), muts.size(), batch_size),
std::move(f))
.then([&muts] () mutable { return std::move(muts); });
}
} // namespace cdc
future<std::tuple<std::vector<mutation>, cdc::result_callback>>
cdc::cdc_service::impl::augment_mutation_call(lowres_clock::time_point timeout, std::vector<mutation>&& mutations) {
// we do all this because in the case of batches, we can have mixed schemas.
auto e = mutations.end();
auto i = std::find_if(mutations.begin(), e, [](const mutation& m) {
return m.schema()->cdc_options().enabled();
});
if (i == e) {
return make_ready_future<std::tuple<std::vector<mutation>, cdc::result_callback>>(std::make_tuple(std::move(mutations), result_callback{}));
}
mutations.reserve(2 * mutations.size());
return do_with(std::move(mutations), service::query_state(service::client_state::for_internal_calls(), empty_service_permit()), [this, timeout, i](std::vector<mutation>& mutations, service::query_state& qs) {
return transform_mutations(mutations, 1, [this, &mutations, timeout, &qs] (int idx) {
auto& m = mutations[idx];
auto s = m.schema();
if (!s->cdc_options().enabled()) {
return make_ready_future<>();
}
// For batches/multiple mutations this is super inefficient. Either partition the mutation set by schema
// and re-use streams, or, probably better, add a cache so this lookup is a no-op on the second mutation.
return get_streams(_ctxt, s->ks_name(), s->cf_name(), timeout, qs).then([this, s = std::move(s), &qs, &mutations, idx](::shared_ptr<transformer::streams_type> streams) mutable {
auto& m = mutations[idx]; // should not really be needed because of the reserve, but let's be conservative
transformer trans(_ctxt, s, streams);
if (!s->cdc_options().preimage()) {
mutations.emplace_back(trans.transform(m));
return make_ready_future<>();
}
// Note: a further improvement here would be to coalesce the pre-image selects into one
// iff a batch contains several modifications to the same table. OTOH, batches are rare(?),
// so this is premature.
auto f = trans.pre_image_select(qs.get_client_state(), db::consistency_level::LOCAL_QUORUM, m);
return f.then([trans = std::move(trans), &mutations, idx] (lw_shared_ptr<cql3::untyped_result_set> rs) mutable {
mutations.push_back(trans.transform(mutations[idx], rs.get()));
});
});
}).then([](std::vector<mutation> mutations) {
return make_ready_future<std::tuple<std::vector<mutation>, cdc::result_callback>>(std::make_tuple(std::move(mutations), result_callback{}));
});
});
}
bool cdc::cdc_service::needs_cdc_augmentation(const std::vector<mutation>& mutations) const {
return std::any_of(mutations.begin(), mutations.end(), [](const mutation& m) {
return m.schema()->cdc_options().enabled();
});
}
future<std::tuple<std::vector<mutation>, cdc::result_callback>>
cdc::cdc_service::augment_mutation_call(lowres_clock::time_point timeout, std::vector<mutation>&& mutations) {
return _impl->augment_mutation_call(timeout, std::move(mutations));
}

cdc/cdc.hh

@@ -0,0 +1,142 @@
/*
* Copyright (C) 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <functional>
#include <optional>
#include <map>
#include <string>
#include <vector>
#include <seastar/core/future.hh>
#include <seastar/core/lowres_clock.hh>
#include <seastar/core/shared_ptr.hh>
#include <seastar/core/sstring.hh>
#include "exceptions/exceptions.hh"
#include "timestamp.hh"
#include "cdc_options.hh"
class schema;
using schema_ptr = seastar::lw_shared_ptr<const schema>;
namespace locator {
class snitch_ptr;
class token_metadata;
} // namespace locator
namespace service {
class migration_notifier;
class storage_proxy;
class query_state;
} // namespace service
namespace dht {
class i_partitioner;
} // namespace dht
class mutation;
class partition_key;
namespace cdc {
class db_context;
// Callback to be invoked on mutation finish to fix
// the whole bit about post-image.
// TODO: decide on what the parameters are to be for this.
using result_callback = std::function<future<>()>;
/// \brief CDC service, responsible for schema listeners
///
/// CDC service will listen for schema changes and iff CDC is enabled/changed
/// create/modify/delete corresponding log tables etc as part of the schema change.
///
class cdc_service {
class impl;
std::unique_ptr<impl> _impl;
public:
future<> stop();
cdc_service(service::storage_proxy&);
cdc_service(db_context);
~cdc_service();
// If any of the mutations are CDC-enabled, optionally selects the pre-image and adds the
// appropriate log-table entries.
// Iff post-image is enabled for any of these, a non-empty callback is also
// returned, to be invoked after the mutation query completes.
future<std::tuple<std::vector<mutation>, result_callback>> augment_mutation_call(
lowres_clock::time_point timeout,
std::vector<mutation>&& mutations
);
bool needs_cdc_augmentation(const std::vector<mutation>&) const;
};
struct db_context final {
service::storage_proxy& _proxy;
service::migration_notifier& _migration_notifier;
locator::token_metadata& _token_metadata;
locator::snitch_ptr& _snitch;
dht::i_partitioner& _partitioner;
class builder final {
service::storage_proxy& _proxy;
std::optional<std::reference_wrapper<service::migration_notifier>> _migration_notifier;
std::optional<std::reference_wrapper<locator::token_metadata>> _token_metadata;
std::optional<std::reference_wrapper<locator::snitch_ptr>> _snitch;
std::optional<std::reference_wrapper<dht::i_partitioner>> _partitioner;
public:
builder(service::storage_proxy& proxy);
builder& with_migration_notifier(service::migration_notifier& migration_notifier);
builder& with_token_metadata(locator::token_metadata& token_metadata);
builder& with_snitch(locator::snitch_ptr& snitch);
builder& with_partitioner(dht::i_partitioner& partitioner);
db_context build();
};
};
// cdc log table operation
enum class operation : int8_t {
// note: these values will eventually be read by a third party, probably not privy to this
// enum decl, so don't change the constant values (or the datatype).
pre_image = 0, update = 1, row_delete = 2, range_delete_start = 3, range_delete_end = 4, partition_delete = 5
};
// cdc log data column operation
enum class column_op : int8_t {
// same as "operation": do not edit the values or the underlying type unless you _really_ want to.
set = 0, del = 1, add = 2,
};
seastar::sstring log_name(const seastar::sstring& table_name);
seastar::sstring desc_name(const seastar::sstring& table_name);
} // namespace cdc

cdc/cdc_options.hh

@@ -0,0 +1,51 @@
/*
* Copyright (C) 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <map>
#include <seastar/core/sstring.hh>
#include "seastarx.hh"
namespace cdc {
class options final {
bool _enabled = false;
bool _preimage = false;
bool _postimage = false;
int _ttl = 86400; // 24h in seconds
public:
options() = default;
options(const std::map<sstring, sstring>& map);
std::map<sstring, sstring> to_map() const;
sstring to_sstring() const;
bool enabled() const { return _enabled; }
bool preimage() const { return _preimage; }
bool postimage() const { return _postimage; }
int ttl() const { return _ttl; }
bool operator==(const options& o) const;
bool operator!=(const options& o) const;
};
} // namespace cdc


@@ -68,7 +68,7 @@ public:
public:
explicit iterator(const mutation_partition& mp)
: _mp(mp)
, _current(position_in_partition_view(position_in_partition_view::static_row_tag_t()), mp.static_row())
, _current(position_in_partition_view(position_in_partition_view::static_row_tag_t()), mp.static_row().get())
{ }
iterator(const mutation_partition& mp, mutation_partition::rows_type::const_iterator it)

collection_mutation.cc

@@ -0,0 +1,476 @@
/*
* Copyright (C) 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "types/collection.hh"
#include "types/user.hh"
#include "concrete_types.hh"
#include "atomic_cell_or_collection.hh"
#include "mutation_partition.hh"
#include "compaction_garbage_collector.hh"
#include "combine.hh"
#include "collection_mutation.hh"
collection_mutation::collection_mutation(const abstract_type& type, collection_mutation_view v)
: _data(imr_object_type::make(data::cell::make_collection(v.data), &type.imr_state().lsa_migrator())) {}
collection_mutation::collection_mutation(const abstract_type& type, const bytes_ostream& data)
: _data(imr_object_type::make(data::cell::make_collection(fragment_range_view(data)), &type.imr_state().lsa_migrator())) {}
static collection_mutation_view get_collection_mutation_view(const uint8_t* ptr)
{
auto f = data::cell::structure::get_member<data::cell::tags::flags>(ptr);
auto ti = data::type_info::make_collection();
data::cell::context ctx(f, ti);
auto view = data::cell::structure::get_member<data::cell::tags::cell>(ptr).as<data::cell::tags::collection>(ctx);
auto dv = data::cell::variable_value::make_view(view, f.get<data::cell::tags::external_data>());
return collection_mutation_view { dv };
}
collection_mutation::operator collection_mutation_view() const
{
return get_collection_mutation_view(_data.get());
}
collection_mutation_view atomic_cell_or_collection::as_collection_mutation() const {
return get_collection_mutation_view(_data.get());
}
bool collection_mutation_view::is_empty() const {
auto in = collection_mutation_input_stream(data);
auto has_tomb = in.read_trivial<bool>();
return !has_tomb && in.read_trivial<uint32_t>() == 0;
}
template <typename F>
GCC6_CONCEPT(requires std::is_invocable_r_v<const data::type_info&, F, collection_mutation_input_stream&>)
static bool is_any_live(const atomic_cell_value_view& data, tombstone tomb, gc_clock::time_point now, F&& read_cell_type_info) {
auto in = collection_mutation_input_stream(data);
auto has_tomb = in.read_trivial<bool>();
if (has_tomb) {
auto ts = in.read_trivial<api::timestamp_type>();
auto ttl = in.read_trivial<gc_clock::duration::rep>();
tomb.apply(tombstone{ts, gc_clock::time_point(gc_clock::duration(ttl))});
}
auto nr = in.read_trivial<uint32_t>();
for (uint32_t i = 0; i != nr; ++i) {
auto& type_info = read_cell_type_info(in);
auto vsize = in.read_trivial<uint32_t>();
auto value = atomic_cell_view::from_bytes(type_info, in.read(vsize));
if (value.is_live(tomb, now, false)) {
return true;
}
}
return false;
}
bool collection_mutation_view::is_any_live(const abstract_type& type, tombstone tomb, gc_clock::time_point now) const {
return visit(type, make_visitor(
[&] (const collection_type_impl& ctype) {
auto& type_info = ctype.value_comparator()->imr_state().type_info();
return ::is_any_live(data, tomb, now, [&type_info] (collection_mutation_input_stream& in) -> const data::type_info& {
auto key_size = in.read_trivial<uint32_t>();
in.skip(key_size);
return type_info;
});
},
[&] (const user_type_impl& utype) {
return ::is_any_live(data, tomb, now, [&utype] (collection_mutation_input_stream& in) -> const data::type_info& {
auto key_size = in.read_trivial<uint32_t>();
auto key = in.read(key_size);
return utype.type(deserialize_field_index(key))->imr_state().type_info();
});
},
[&] (const abstract_type& o) -> bool {
throw std::runtime_error(format("collection_mutation_view::is_any_live: unknown type {}", o.name()));
}
));
}
template <typename F>
GCC6_CONCEPT(requires std::is_invocable_r_v<const data::type_info&, F, collection_mutation_input_stream&>)
static api::timestamp_type last_update(const atomic_cell_value_view& data, F&& read_cell_type_info) {
auto in = collection_mutation_input_stream(data);
api::timestamp_type max = api::missing_timestamp;
auto has_tomb = in.read_trivial<bool>();
if (has_tomb) {
max = std::max(max, in.read_trivial<api::timestamp_type>());
(void)in.read_trivial<gc_clock::duration::rep>();
}
auto nr = in.read_trivial<uint32_t>();
for (uint32_t i = 0; i != nr; ++i) {
auto& type_info = read_cell_type_info(in);
auto vsize = in.read_trivial<uint32_t>();
auto value = atomic_cell_view::from_bytes(type_info, in.read(vsize));
max = std::max(value.timestamp(), max);
}
return max;
}
api::timestamp_type collection_mutation_view::last_update(const abstract_type& type) const {
return visit(type, make_visitor(
[&] (const collection_type_impl& ctype) {
auto& type_info = ctype.value_comparator()->imr_state().type_info();
return ::last_update(data, [&type_info] (collection_mutation_input_stream& in) -> const data::type_info& {
auto key_size = in.read_trivial<uint32_t>();
in.skip(key_size);
return type_info;
});
},
[&] (const user_type_impl& utype) {
return ::last_update(data, [&utype] (collection_mutation_input_stream& in) -> const data::type_info& {
auto key_size = in.read_trivial<uint32_t>();
auto key = in.read(key_size);
return utype.type(deserialize_field_index(key))->imr_state().type_info();
});
},
[&] (const abstract_type& o) -> api::timestamp_type {
throw std::runtime_error(format("collection_mutation_view::last_update: unknown type {}", o.name()));
}
));
}
std::ostream& operator<<(std::ostream& os, const collection_mutation_view::printer& cmvp) {
fmt::print(os, "{{collection_mutation_view ");
cmvp._cmv.with_deserialized(cmvp._type, [&os, &type = cmvp._type] (const collection_mutation_view_description& cmvd) {
bool first = true;
fmt::print(os, "tombstone {}", cmvd.tomb);
visit(type, make_visitor(
[&] (const collection_type_impl& ctype) {
auto&& key_type = ctype.name_comparator();
auto&& value_type = ctype.value_comparator();
for (auto&& [key, value] : cmvd.cells) {
if (!first) {
fmt::print(os, ", ");
}
fmt::print(os, "{}: {}", key_type->to_string(key), atomic_cell_view::printer(*value_type, value));
first = false;
}
},
[&] (const user_type_impl& utype) {
for (auto&& [raw_idx, value] : cmvd.cells) {
if (!first) {
fmt::print(os, ", ");
}
auto idx = deserialize_field_index(raw_idx);
fmt::print(os, "{}: {}", utype.field_name_as_string(idx), atomic_cell_view::printer(*utype.type(idx), value));
first = false;
}
},
[&] (const abstract_type& o) {
// Not throwing an exception in this likely-to-be-debug context
fmt::print(os, "attempted to pretty-print collection_mutation_view_description with type {}", o.name());
}
));
});
fmt::print(os, "}}");
return os;
}
collection_mutation_description
collection_mutation_view_description::materialize(const abstract_type& type) const {
collection_mutation_description m;
m.tomb = tomb;
m.cells.reserve(cells.size());
visit(type, make_visitor(
[&] (const collection_type_impl& ctype) {
auto& value_type = *ctype.value_comparator();
for (auto&& e : cells) {
m.cells.emplace_back(to_bytes(e.first), atomic_cell(value_type, e.second));
}
},
[&] (const user_type_impl& utype) {
for (auto&& e : cells) {
m.cells.emplace_back(to_bytes(e.first), atomic_cell(*utype.type(deserialize_field_index(e.first)), e.second));
}
},
[&] (const abstract_type& o) {
throw std::runtime_error(format("attempted to materialize collection_mutation_view_description with type {}", o.name()));
}
));
return m;
}
bool collection_mutation_description::compact_and_expire(column_id id, row_tombstone base_tomb, gc_clock::time_point query_time,
can_gc_fn& can_gc, gc_clock::time_point gc_before, compaction_garbage_collector* collector)
{
bool any_live = false;
auto t = tomb;
tombstone purged_tomb;
if (tomb <= base_tomb.regular()) {
tomb = tombstone();
} else if (tomb.deletion_time < gc_before && can_gc(tomb)) {
purged_tomb = tomb;
tomb = tombstone();
}
t.apply(base_tomb.regular());
utils::chunked_vector<std::pair<bytes, atomic_cell>> survivors;
utils::chunked_vector<std::pair<bytes, atomic_cell>> losers;
for (auto&& name_and_cell : cells) {
atomic_cell& cell = name_and_cell.second;
auto cannot_erase_cell = [&] {
return cell.deletion_time() >= gc_before || !can_gc(tombstone(cell.timestamp(), cell.deletion_time()));
};
if (cell.is_covered_by(t, false) || cell.is_covered_by(base_tomb.shadowable().tomb(), false)) {
continue;
}
if (cell.has_expired(query_time)) {
if (cannot_erase_cell()) {
survivors.emplace_back(std::make_pair(
std::move(name_and_cell.first), atomic_cell::make_dead(cell.timestamp(), cell.deletion_time())));
} else if (collector) {
losers.emplace_back(std::pair(
std::move(name_and_cell.first), atomic_cell::make_dead(cell.timestamp(), cell.deletion_time())));
}
} else if (!cell.is_live()) {
if (cannot_erase_cell()) {
survivors.emplace_back(std::move(name_and_cell));
} else if (collector) {
losers.emplace_back(std::move(name_and_cell));
}
} else {
any_live |= true;
survivors.emplace_back(std::move(name_and_cell));
}
}
if (collector) {
collector->collect(id, collection_mutation_description{purged_tomb, std::move(losers)});
}
cells = std::move(survivors);
return any_live;
}
template <typename Iterator>
static collection_mutation serialize_collection_mutation(
const abstract_type& type,
const tombstone& tomb,
boost::iterator_range<Iterator> cells) {
auto element_size = [] (size_t c, auto&& e) -> size_t {
return c + 8 + e.first.size() + e.second.serialize().size();
};
auto size = accumulate(cells, (size_t)4, element_size);
size += 1;
if (tomb) {
size += sizeof(tomb.timestamp) + sizeof(tomb.deletion_time);
}
bytes_ostream ret;
ret.reserve(size);
auto out = ret.write_begin();
*out++ = bool(tomb);
if (tomb) {
write(out, tomb.timestamp);
write(out, tomb.deletion_time.time_since_epoch().count());
}
auto writeb = [&out] (bytes_view v) {
serialize_int32(out, v.size());
out = std::copy_n(v.begin(), v.size(), out);
};
// FIXME: overflow?
serialize_int32(out, boost::distance(cells));
for (auto&& kv : cells) {
auto&& k = kv.first;
auto&& v = kv.second;
writeb(k);
writeb(v.serialize());
}
return collection_mutation(type, ret);
}
collection_mutation collection_mutation_description::serialize(const abstract_type& type) const {
return serialize_collection_mutation(type, tomb, boost::make_iterator_range(cells.begin(), cells.end()));
}
collection_mutation collection_mutation_view_description::serialize(const abstract_type& type) const {
return serialize_collection_mutation(type, tomb, boost::make_iterator_range(cells.begin(), cells.end()));
}
template <typename C>
GCC6_CONCEPT(requires std::is_base_of_v<abstract_type, std::remove_reference_t<C>>)
static collection_mutation_view_description
merge(collection_mutation_view_description a, collection_mutation_view_description b, C&& key_type) {
using element_type = std::pair<bytes_view, atomic_cell_view>;
auto compare = [&] (const element_type& e1, const element_type& e2) {
return key_type.less(e1.first, e2.first);
};
auto merge = [] (const element_type& e1, const element_type& e2) {
// FIXME: use std::max()?
return std::make_pair(e1.first, compare_atomic_cell_for_merge(e1.second, e2.second) > 0 ? e1.second : e2.second);
};
// applied to a tombstone, returns a predicate checking whether a cell is killed by
// the tombstone
auto cell_killed = [] (const std::optional<tombstone>& t) {
return [&t] (const element_type& e) {
if (!t) {
return false;
}
// tombstone wins if timestamps equal here, unlike row tombstones
if (t->timestamp < e.second.timestamp()) {
return false;
}
return true;
// FIXME: should we consider TTLs too?
};
};
collection_mutation_view_description merged;
merged.cells.reserve(a.cells.size() + b.cells.size());
combine(a.cells.begin(), std::remove_if(a.cells.begin(), a.cells.end(), cell_killed(b.tomb)),
b.cells.begin(), std::remove_if(b.cells.begin(), b.cells.end(), cell_killed(a.tomb)),
std::back_inserter(merged.cells),
compare,
merge);
merged.tomb = std::max(a.tomb, b.tomb);
return merged;
}
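The merge above first drops cells shadowed by the opposite side's tombstone, then walks both key-sorted lists, letting the newer cell win on key collisions. A minimal self-contained sketch of the same pattern, using hypothetical simplified stand-in types (`cell`/`cells` here, not the real `bytes_view`/`atomic_cell_view` machinery):

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplified stand-ins; only the timestamp matters for
// conflict resolution in this sketch.
struct cell { long timestamp; std::string value; };
using cells = std::vector<std::pair<std::string, cell>>;

// Merge two key-sorted cell lists. A cell whose timestamp is <= the other
// side's tombstone is dropped (tombstone wins ties, as in the code above);
// on a key collision the cell with the higher timestamp survives.
static cells merge_cells(cells a, cells b,
                         std::optional<long> a_tomb, std::optional<long> b_tomb) {
    auto killed_by = [](std::optional<long> t) {
        return [t](const std::pair<std::string, cell>& e) {
            return t && e.second.timestamp <= *t;
        };
    };
    a.erase(std::remove_if(a.begin(), a.end(), killed_by(b_tomb)), a.end());
    b.erase(std::remove_if(b.begin(), b.end(), killed_by(a_tomb)), b.end());
    cells out;
    auto i = a.begin();
    auto j = b.begin();
    while (i != a.end() && j != b.end()) {
        if (i->first < j->first) {
            out.push_back(*i++);
        } else if (j->first < i->first) {
            out.push_back(*j++);
        } else { // same key: newer cell wins
            out.push_back(i->second.timestamp >= j->second.timestamp ? *i : *j);
            ++i;
            ++j;
        }
    }
    out.insert(out.end(), i, a.end());
    out.insert(out.end(), j, b.end());
    return out;
}
```

The real implementation uses `combine` (a merge that applies a conflict-resolution function) and resolves conflicts via `compare_atomic_cell_for_merge`, which also considers more than the timestamp; the sketch keeps only the ordering and tombstone logic.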
collection_mutation merge(const abstract_type& type, collection_mutation_view a, collection_mutation_view b) {
return a.with_deserialized(type, [&] (collection_mutation_view_description a_view) {
return b.with_deserialized(type, [&] (collection_mutation_view_description b_view) {
return visit(type, make_visitor(
[&] (const collection_type_impl& ctype) {
return merge(std::move(a_view), std::move(b_view), *ctype.name_comparator());
},
[&] (const user_type_impl& utype) {
return merge(std::move(a_view), std::move(b_view), *short_type);
},
[] (const abstract_type& o) -> collection_mutation_view_description {
throw std::runtime_error(format("collection_mutation merge: unknown type: {}", o.name()));
}
)).serialize(type);
});
});
}
template <typename C>
GCC6_CONCEPT(requires std::is_base_of_v<abstract_type, std::remove_reference_t<C>>)
static collection_mutation_view_description
difference(collection_mutation_view_description a, collection_mutation_view_description b, C&& key_type)
{
collection_mutation_view_description diff;
diff.cells.reserve(std::max(a.cells.size(), b.cells.size()));
auto it = b.cells.begin();
for (auto&& c : a.cells) {
while (it != b.cells.end() && key_type.less(it->first, c.first)) {
++it;
}
if (it == b.cells.end() || !key_type.equal(it->first, c.first)
|| compare_atomic_cell_for_merge(c.second, it->second) > 0) {
auto cell = std::make_pair(c.first, c.second);
diff.cells.emplace_back(std::move(cell));
}
}
if (a.tomb > b.tomb) {
diff.tomb = a.tomb;
}
return diff;
}
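`difference` keeps the cells of `a` that have no match in `b` or are strictly newer than the matching cell, advancing a single iterator over `b` since both lists are key-sorted. The same walk in a standalone sketch (the simplified `dcell`/`dcells` types are hypothetical):

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplified stand-ins for the serialized key/cell pairs.
struct dcell { long timestamp; std::string value; };
using dcells = std::vector<std::pair<std::string, dcell>>;

// Keep a cell of `a` unless `b` has a cell with the same key that is at
// least as new; both inputs must be sorted by key.
static dcells diff_cells(const dcells& a, const dcells& b) {
    dcells out;
    auto it = b.begin();
    for (const auto& c : a) {
        while (it != b.end() && it->first < c.first) {
            ++it; // skip b-keys smaller than the current a-key
        }
        if (it == b.end() || it->first != c.first ||
            c.second.timestamp > it->second.timestamp) {
            out.push_back(c);
        }
    }
    return out;
}
```

As in the original, the tombstone would be carried over separately: the result keeps `a`'s tombstone only if it is newer than `b`'s.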
collection_mutation difference(const abstract_type& type, collection_mutation_view a, collection_mutation_view b)
{
return a.with_deserialized(type, [&] (collection_mutation_view_description a_view) {
return b.with_deserialized(type, [&] (collection_mutation_view_description b_view) {
return visit(type, make_visitor(
[&] (const collection_type_impl& ctype) {
return difference(std::move(a_view), std::move(b_view), *ctype.name_comparator());
},
[&] (const user_type_impl& utype) {
return difference(std::move(a_view), std::move(b_view), *short_type);
},
[] (const abstract_type& o) -> collection_mutation_view_description {
throw std::runtime_error(format("collection_mutation difference: unknown type: {}", o.name()));
}
)).serialize(type);
});
});
}
template <typename F>
GCC6_CONCEPT(requires std::is_invocable_r_v<std::pair<bytes_view, atomic_cell_view>, F, collection_mutation_input_stream&>)
static collection_mutation_view_description
deserialize_collection_mutation(collection_mutation_input_stream& in, F&& read_kv) {
collection_mutation_view_description ret;
auto has_tomb = in.read_trivial<bool>();
if (has_tomb) {
auto ts = in.read_trivial<api::timestamp_type>();
auto ttl = in.read_trivial<gc_clock::duration::rep>();
ret.tomb = tombstone{ts, gc_clock::time_point(gc_clock::duration(ttl))};
}
auto nr = in.read_trivial<uint32_t>();
ret.cells.reserve(nr);
for (uint32_t i = 0; i != nr; ++i) {
ret.cells.push_back(read_kv(in));
}
assert(in.empty());
return ret;
}
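The stream consumed above follows a simple layout: an optional tombstone (flag, then timestamp and deletion time), a cell count, then length-prefixed key/value pairs. A round-trip sketch of that layout (simplified and hypothetical: raw host-endian fields, as with `read_trivial<T>()`, and plain strings instead of serialized cells):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Layout sketched here:
//   [u8 has_tomb] ([i64 ts][i64 ttl] if set)
//   [u32 n]  then n x ([u32 ksize][key bytes][u32 vsize][value bytes])
template <typename T>
void put_trivial(std::vector<uint8_t>& out, T v) {
    const auto* p = reinterpret_cast<const uint8_t*>(&v);
    out.insert(out.end(), p, p + sizeof(T));
}

template <typename T>
T read_trivial(const std::vector<uint8_t>& in, size_t& pos) {
    T v;
    std::memcpy(&v, in.data() + pos, sizeof(T));
    pos += sizeof(T);
    return v;
}

void put_blob(std::vector<uint8_t>& out, const std::string& s) {
    put_trivial<uint32_t>(out, uint32_t(s.size()));
    out.insert(out.end(), s.begin(), s.end());
}

std::string read_blob(const std::vector<uint8_t>& in, size_t& pos) {
    auto n = read_trivial<uint32_t>(in, pos);
    std::string s(reinterpret_cast<const char*>(in.data() + pos), n);
    pos += n;
    return s;
}

using kv_list = std::vector<std::pair<std::string, std::string>>;

std::vector<uint8_t> serialize_sketch(bool has_tomb, int64_t ts, int64_t ttl,
                                      const kv_list& cells) {
    std::vector<uint8_t> out;
    put_trivial<uint8_t>(out, has_tomb ? 1 : 0);
    if (has_tomb) {
        put_trivial<int64_t>(out, ts);
        put_trivial<int64_t>(out, ttl);
    }
    put_trivial<uint32_t>(out, uint32_t(cells.size()));
    for (const auto& [k, v] : cells) {
        put_blob(out, k);
        put_blob(out, v);
    }
    return out;
}

kv_list deserialize_sketch(const std::vector<uint8_t>& in) {
    size_t pos = 0;
    if (read_trivial<uint8_t>(in, pos)) {   // has_tomb
        read_trivial<int64_t>(in, pos);     // timestamp (discarded here)
        read_trivial<int64_t>(in, pos);     // deletion time
    }
    auto n = read_trivial<uint32_t>(in, pos);
    kv_list cells;
    for (uint32_t i = 0; i != n; ++i) {
        auto k = read_blob(in, pos);
        auto v = read_blob(in, pos);
        cells.emplace_back(std::move(k), std::move(v));
    }
    assert(pos == in.size()); // mirrors the in.empty() check above
    return cells;
}
```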
collection_mutation_view_description
deserialize_collection_mutation(const abstract_type& type, collection_mutation_input_stream& in) {
return visit(type, make_visitor(
[&] (const collection_type_impl& ctype) {
// value_comparator(), ugh
auto& type_info = ctype.value_comparator()->imr_state().type_info();
return deserialize_collection_mutation(in, [&type_info] (collection_mutation_input_stream& in) {
// FIXME: we could probably avoid the need for size
auto ksize = in.read_trivial<uint32_t>();
auto key = in.read(ksize);
auto vsize = in.read_trivial<uint32_t>();
auto value = atomic_cell_view::from_bytes(type_info, in.read(vsize));
return std::make_pair(key, value);
});
},
[&] (const user_type_impl& utype) {
return deserialize_collection_mutation(in, [&utype] (collection_mutation_input_stream& in) {
// FIXME: we could probably avoid the need for size
auto ksize = in.read_trivial<uint32_t>();
auto key = in.read(ksize);
auto vsize = in.read_trivial<uint32_t>();
auto value = atomic_cell_view::from_bytes(
utype.type(deserialize_field_index(key))->imr_state().type_info(), in.read(vsize));
return std::make_pair(key, value);
});
},
[&] (const abstract_type& o) -> collection_mutation_view_description {
throw std::runtime_error(format("deserialize_collection_mutation: unknown type {}", o.name()));
}
));
}

collection_mutation.hh (new file)
@@ -0,0 +1,139 @@
/*
* Copyright (C) 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "utils/chunked_vector.hh"
#include "schema_fwd.hh"
#include "gc_clock.hh"
#include "atomic_cell.hh"
#include "cql_serialization_format.hh"
#include "marshal_exception.hh"
#include "utils/linearizing_input_stream.hh"
#include <iosfwd>
class abstract_type;
class bytes_ostream;
class compaction_garbage_collector;
class row_tombstone;
class collection_mutation;
// An auxiliary struct used to (de)construct collection_mutations.
// Unlike collection_mutation which is a serialized blob, this struct allows to inspect logical units of information
// (tombstone and cells) inside the mutation easily.
struct collection_mutation_description {
tombstone tomb;
// FIXME: use iterators?
// we never iterate over `cells` more than once, so there is no need to store them in memory.
// In some cases instead of constructing the `cells` vector, it would be more efficient to provide
// a one-time-use forward iterator which returns the cells.
utils::chunked_vector<std::pair<bytes, atomic_cell>> cells;
// Expires cells based on query_time. Expires tombstones based on max_purgeable and gc_before.
// Removes cells covered by tomb or this->tomb.
bool compact_and_expire(column_id id, row_tombstone tomb, gc_clock::time_point query_time,
can_gc_fn&, gc_clock::time_point gc_before, compaction_garbage_collector* collector = nullptr);
// Packs the data to a serialized blob.
collection_mutation serialize(const abstract_type&) const;
};
// Similar to collection_mutation_description, except that it doesn't store the cells' data, only observes it.
struct collection_mutation_view_description {
tombstone tomb;
// FIXME: use iterators? See the fixme in collection_mutation_description; the same considerations apply here.
utils::chunked_vector<std::pair<bytes_view, atomic_cell_view>> cells;
// Copies the observed data, storing it in a collection_mutation_description.
collection_mutation_description materialize(const abstract_type&) const;
// Packs the data to a serialized blob.
collection_mutation serialize(const abstract_type&) const;
};
using collection_mutation_input_stream = utils::linearizing_input_stream<atomic_cell_value_view, marshal_exception>;
// Given a linearized collection_mutation_view, returns an auxiliary struct allowing the inspection of each cell.
// The struct is an observer of the data given by the collection_mutation_view and is only valid while the
// passed in `collection_mutation_input_stream` is alive.
// The function needs to be given the type of stored data to reconstruct the structural information.
collection_mutation_view_description deserialize_collection_mutation(const abstract_type&, collection_mutation_input_stream&);
class collection_mutation_view {
public:
atomic_cell_value_view data;
// Is this a noop mutation?
bool is_empty() const;
// Is any of the stored cells live (not deleted nor expired) at the time point `tp`,
// given the later of the tombstones `t` and the one stored in the mutation (if any)?
// Requires a type to reconstruct the structural information.
bool is_any_live(const abstract_type&, tombstone t = tombstone(), gc_clock::time_point tp = gc_clock::time_point::min()) const;
// The maximum of timestamps of the mutation's cells and tombstone.
api::timestamp_type last_update(const abstract_type&) const;
// Given a function that operates on a collection_mutation_view_description,
// calls it on the corresponding description of `this`.
template <typename F>
inline decltype(auto) with_deserialized(const abstract_type& type, F f) const {
auto stream = collection_mutation_input_stream(data);
return f(deserialize_collection_mutation(type, stream));
}
class printer {
const abstract_type& _type;
const collection_mutation_view& _cmv;
public:
printer(const abstract_type& type, const collection_mutation_view& cmv)
: _type(type), _cmv(cmv) {}
friend std::ostream& operator<<(std::ostream& os, const printer& cmvp);
};
};
// A serialized mutation of a collection of cells.
// Used to represent mutations of collections (lists, maps, sets) or non-frozen user defined types.
// It contains a sequence of cells, each representing a mutation of a single entry (element or field) of the collection.
// Each cell has an associated 'key' (or 'path'). The meaning of each (key, cell) pair is:
// for sets: the key is the serialized set element, the cell contains no data (except liveness information),
// for maps: the key is the serialized map element's key, the cell contains the serialized map element's value,
// for lists: the key is a timeuuid identifying the list entry, the cell contains the serialized value,
// for user types: the key is an index identifying the field, the cell contains the value of the field.
// The mutation may also contain a collection-wide tombstone.
class collection_mutation {
public:
using imr_object_type = imr::utils::object<data::cell::structure>;
imr_object_type _data;
collection_mutation() {}
collection_mutation(const abstract_type&, collection_mutation_view);
collection_mutation(const abstract_type& type, const bytes_ostream& data);
operator collection_mutation_view() const;
};
collection_mutation merge(const abstract_type&, collection_mutation_view, collection_mutation_view);
collection_mutation difference(const abstract_type&, collection_mutation_view, collection_mutation_view);
// Serializes the given collection of cells to a sequence of bytes ready to be sent over the CQL protocol.
bytes serialize_for_cql(const abstract_type&, collection_mutation_view, cql_serialization_format);
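The per-collection meaning of the (key, cell) pairs documented in the class comment can be illustrated with a toy model (hypothetical and simplified: real keys and values are serialized bytes, and cells carry timestamps and liveness information rather than plain strings):

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a collection mutation's cell list.
struct toy_cell { bool live; std::string value; };
using toy_mutation = std::vector<std::pair<std::string, toy_cell>>;

// set {"x", "y"}: key = the element itself, cell carries only liveness
toy_mutation make_set_mutation() {
    return {{"x", {true, ""}}, {"y", {true, ""}}};
}

// map {"a": "1"}: key = map key, cell value = map value
toy_mutation make_map_mutation() {
    return {{"a", {true, "1"}}};
}

// list ["v"]: key = a timeuuid identifying the entry (placeholder here)
toy_mutation make_list_mutation() {
    return {{"<timeuuid-0>", {true, "v"}}};
}

// UDT with field 1 = "f": key = the serialized field index (placeholder)
toy_mutation make_udt_mutation() {
    return {{"<field-index-1>", {true, "f"}}};
}
```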


@@ -22,7 +22,7 @@
#pragma once
#include "schema.hh"
#include "types/collection.hh"
#include "collection_mutation.hh"
class atomic_cell;
class row_marker;
@@ -31,6 +31,6 @@ class compaction_garbage_collector {
public:
virtual ~compaction_garbage_collector() = default;
virtual void collect(column_id id, atomic_cell) = 0;
virtual void collect(column_id id, collection_type_impl::mutation) = 0;
virtual void collect(column_id id, collection_mutation_description) = 0;
virtual void collect(row_marker) = 0;
};


@@ -74,8 +74,8 @@ private:
* <len(value1)><value1><len(value2)><value2>...<len(value_n)><value_n>
*
*/
template<typename RangeOfSerializedComponents>
static void serialize_value(RangeOfSerializedComponents&& values, bytes::iterator& out) {
template<typename RangeOfSerializedComponents, typename CharOutputIterator>
static void serialize_value(RangeOfSerializedComponents&& values, CharOutputIterator& out) {
for (auto&& val : values) {
assert(val.size() <= std::numeric_limits<size_type>::max());
write<size_type>(out, size_type(val.size()));


@@ -248,15 +248,16 @@ private:
static size_t size(const data_value& val) {
return val.serialized_size();
}
template<typename Value, typename = std::enable_if_t<!std::is_same<data_value, std::decay_t<Value>>::value>>
static void write_value(Value&& val, bytes::iterator& out) {
template<typename Value, typename CharOutputIterator, typename = std::enable_if_t<!std::is_same<data_value, std::decay_t<Value>>::value>>
static void write_value(Value&& val, CharOutputIterator& out) {
out = std::copy(val.begin(), val.end(), out);
}
static void write_value(const data_value& val, bytes::iterator& out) {
template <typename CharOutputIterator>
static void write_value(const data_value& val, CharOutputIterator& out) {
val.serialize(out);
}
template<typename RangeOfSerializedComponents>
static void serialize_value(RangeOfSerializedComponents&& values, bytes::iterator& out, bool is_compound) {
template<typename RangeOfSerializedComponents, typename CharOutputIterator>
static void serialize_value(RangeOfSerializedComponents&& values, CharOutputIterator& out, bool is_compound) {
if (!is_compound) {
auto it = values.begin();
write_value(std::forward<decltype(*it)>(*it), out);


@@ -92,14 +92,17 @@ struct duration_type_impl final : public concrete_type<cql_duration> {
struct timestamp_type_impl final : public simple_type_impl<db_clock::time_point> {
timestamp_type_impl();
static db_clock::time_point from_sstring(sstring_view s);
};
struct simple_date_type_impl final : public simple_type_impl<uint32_t> {
simple_date_type_impl();
static uint32_t from_sstring(sstring_view s);
};
struct time_type_impl final : public simple_type_impl<int64_t> {
time_type_impl();
static int64_t from_sstring(sstring_view s);
};
struct string_type_impl : public concrete_type<sstring> {
@@ -125,8 +128,11 @@ struct date_type_impl final : public concrete_type<db_clock::time_point> {
date_type_impl();
};
using timestamp_date_base_class = concrete_type<db_clock::time_point>;
struct timeuuid_type_impl final : public concrete_type<utils::UUID> {
timeuuid_type_impl();
static utils::UUID from_sstring(sstring_view s);
};
struct varint_type_impl final : public concrete_type<boost::multiprecision::cpp_int> {
@@ -135,10 +141,13 @@ struct varint_type_impl final : public concrete_type<boost::multiprecision::cpp_
struct inet_addr_type_impl final : public concrete_type<seastar::net::inet_address> {
inet_addr_type_impl();
static sstring to_sstring(const seastar::net::inet_address& addr);
static seastar::net::inet_address from_sstring(sstring_view s);
};
struct uuid_type_impl final : public concrete_type<utils::UUID> {
uuid_type_impl();
static utils::UUID from_sstring(sstring_view s);
};
template <typename Func> using visit_ret_type = std::invoke_result_t<Func, const ascii_type_impl&>;
@@ -239,3 +248,28 @@ static inline visit_ret_type<Func> visit(const abstract_type& t, Func&& f) {
}
__builtin_unreachable();
}
template <typename Func> struct data_value_visitor {
const void* v;
Func& f;
auto operator()(const empty_type_impl& t) { return f(t, v); }
auto operator()(const counter_type_impl& t) { return f(t, v); }
auto operator()(const reversed_type_impl& t) { return f(t, v); }
template <typename T> auto operator()(const T& t) {
return f(t, reinterpret_cast<const typename T::native_type*>(v));
}
};
// Given an abstract_type and a void pointer to an object of that
// type, call f with the runtime type of t and v casted to the
// corresponding native type.
// This takes an abstract_type and a void pointer instead of a
// data_value to support reversed_type_impl without requiring that
// each visitor create a new data_value just to recurse.
template <typename Func> inline auto visit(const abstract_type& t, const void* v, Func&& f) {
return ::visit(t, data_value_visitor<Func>{v, f});
}
template <typename Func> inline auto visit(const data_value& v, Func&& f) {
return ::visit(*v.type(), v._value, f);
}


@@ -25,15 +25,19 @@
# multiple tokens per node, see http://cassandra.apache.org/doc/latest/operating
num_tokens: 256
# Directory where Scylla should store all its files, which are commitlog,
# data, hints, view_hints and saved_caches subdirectories. All of these
# subdirectories can be overridden by the respective options below.
# If unset, the value defaults to /var/lib/scylla
# workdir: /var/lib/scylla
# Directory where Scylla should store data on disk.
# If not set, the default directory is /var/lib/scylla/data.
data_file_directories:
- /var/lib/scylla/data
# data_file_directories:
# - /var/lib/scylla/data
# commit log. when running on magnetic HDD, this should be a
# separate spindle than the data directories.
# If not set, the default directory is /var/lib/scylla/commitlog.
commitlog_directory: /var/lib/scylla/commitlog
# commitlog_directory: /var/lib/scylla/commitlog
# commitlog_sync may be either "periodic" or "batch."
#
@@ -112,6 +116,9 @@ read_request_timeout_in_ms: 5000
# How long the coordinator should wait for writes to complete
write_request_timeout_in_ms: 2000
# how long a coordinator should continue to retry a CAS operation
# that contends with other proposals for the same row
cas_contention_timeout_in_ms: 1000
# phi value that must be reached for a host to be marked down.
# most users should never need to adjust this.
@@ -238,7 +245,10 @@ batch_size_fail_threshold_in_kb: 50
# broadcast_rpc_address: 1.2.3.4
# Uncomment to enable experimental features
# experimental: true
# experimental_features:
# - cdc
# - lwt
# - udf
# The directory where hints files are stored if hinted handoff is enabled.
# hints_directory: /var/lib/scylla/hints
@@ -257,24 +267,6 @@ batch_size_fail_threshold_in_kb: 50
# created until it has been seen alive and gone down again.
# max_hint_window_in_ms: 10800000 # 3 hours
# Maximum throttle in KBs per second, per delivery thread. This will be
# reduced proportionally to the number of nodes in the cluster. (If there
# are two nodes in the cluster, each delivery thread will use the maximum
# rate; if there are three, each will throttle to half of the maximum,
# since we expect two nodes to be delivering hints simultaneously.)
# hinted_handoff_throttle_in_kb: 1024
# Number of threads with which to deliver hints;
# Consider increasing this number when you have multi-dc deployments, since
# cross-dc handoff tends to be slower
# max_hints_delivery_threads: 2
###################################################
## Not currently supported, reserved for future use
###################################################
# Maximum throttle in KBs per second, total. This will be
# reduced proportionally to the number of nodes in the cluster.
# batchlog_replay_throttle_in_kb: 1024
# Validity period for permissions cache (fetching permissions can be an
# expensive operation depending on the authorizer, CassandraAuthorizer is
@@ -302,120 +294,6 @@ batch_size_fail_threshold_in_kb: 50
#
partitioner: org.apache.cassandra.dht.Murmur3Partitioner
# Maximum size of the key cache in memory.
#
# Each key cache hit saves 1 seek and each row cache hit saves 2 seeks at the
# minimum, sometimes more. The key cache is fairly tiny for the amount of
# time it saves, so it's worthwhile to use it at large numbers.
# The row cache saves even more time, but must contain the entire row,
# so it is extremely space-intensive. It's best to only use the
# row cache if you have hot rows or static rows.
#
# NOTE: if you reduce the size, you may not get your hottest keys loaded on startup.
#
# Default value is empty to make it "auto" (min(5% of Heap (in MB), 100MB)). Set to 0 to disable key cache.
# key_cache_size_in_mb:
# Duration in seconds after which Scylla should
# save the key cache. Caches are saved to saved_caches_directory as
# specified in this configuration file.
#
# Saved caches greatly improve cold-start speeds, and is relatively cheap in
# terms of I/O for the key cache. Row cache saving is much more expensive and
# has limited use.
#
# Default is 14400 or 4 hours.
# key_cache_save_period: 14400
# Number of keys from the key cache to save
# Disabled by default, meaning all keys are going to be saved
# key_cache_keys_to_save: 100
# Maximum size of the row cache in memory.
# NOTE: if you reduce the size, you may not get your hottest keys loaded on startup.
#
# Default value is 0, to disable row caching.
# row_cache_size_in_mb: 0
# Duration in seconds after which Scylla should
# save the row cache. Caches are saved to saved_caches_directory as specified
# in this configuration file.
#
# Saved caches greatly improve cold-start speeds, and is relatively cheap in
# terms of I/O for the key cache. Row cache saving is much more expensive and
# has limited use.
#
# Default is 0 to disable saving the row cache.
# row_cache_save_period: 0
# Number of keys from the row cache to save
# Disabled by default, meaning all keys are going to be saved
# row_cache_keys_to_save: 100
# Maximum size of the counter cache in memory.
#
# Counter cache helps to reduce counter locks' contention for hot counter cells.
# In case of RF = 1 a counter cache hit will cause Scylla to skip the read before
# write entirely. With RF > 1 a counter cache hit will still help to reduce the duration
# of the lock hold, helping with hot counter cell updates, but will not allow skipping
# the read entirely. Only the local (clock, count) tuple of a counter cell is kept
# in memory, not the whole counter, so it's relatively cheap.
#
# NOTE: if you reduce the size, you may not get your hottest keys loaded on startup.
#
# Default value is empty to make it "auto" (min(2.5% of Heap (in MB), 50MB)). Set to 0 to disable counter cache.
# NOTE: if you perform counter deletes and rely on low gcgs, you should disable the counter cache.
# counter_cache_size_in_mb:
# Duration in seconds after which Scylla should
# save the counter cache (keys only). Caches are saved to saved_caches_directory as
# specified in this configuration file.
#
# Default is 7200 or 2 hours.
# counter_cache_save_period: 7200
# Number of keys from the counter cache to save
# Disabled by default, meaning all keys are going to be saved
# counter_cache_keys_to_save: 100
# The off-heap memory allocator. Affects storage engine metadata as
# well as caches. Experiments show that JEMalloc saves more memory
# than the native GCC allocator (i.e., JEMalloc is more
# fragmentation-resistant).
#
# Supported values are: NativeAllocator, JEMallocAllocator
#
# If you intend to use JEMallocAllocator you have to install JEMalloc as library and
# modify cassandra-env.sh as directed in the file.
#
# Defaults to NativeAllocator
# memory_allocator: NativeAllocator
# saved caches
# If not set, the default directory is /var/lib/scylla/saved_caches.
# saved_caches_directory: /var/lib/scylla/saved_caches
# For workloads with more data than can fit in memory, Scylla's
# bottleneck will be reads that need to fetch data from
# disk. "concurrent_reads" should be set to (16 * number_of_drives) in
# order to allow the operations to enqueue low enough in the stack
# that the OS and drives can reorder them. Same applies to
# "concurrent_counter_writes", since counter writes read the current
# values before incrementing and writing them back.
#
# On the other hand, since writes are almost never IO bound, the ideal
# number of "concurrent_writes" is dependent on the number of cores in
# your system; (8 * number_of_cores) is a good rule of thumb.
# concurrent_reads: 32
# concurrent_writes: 32
# concurrent_counter_writes: 32
# Total memory to use for sstable-reading buffers. Defaults to
# the smaller of 1/4 of heap or 512MB.
# file_cache_size_in_mb: 512
# Total space to use for commitlogs.
#
# If space gets above this value (it will round up to the next nearest
@@ -427,28 +305,6 @@ partitioner: org.apache.cassandra.dht.Murmur3Partitioner
# available for Scylla.
commitlog_total_space_in_mb: -1
# A fixed memory pool size in MB for SSTable index summaries. If left
# empty, this will default to 5% of the heap size. If the memory usage of
# all index summaries exceeds this limit, SSTables with low read rates will
# shrink their index summaries in order to meet this limit. However, this
# is a best-effort process. In extreme conditions Scylla may need to use
# more than this amount of memory.
# index_summary_capacity_in_mb:
# How frequently index summaries should be resampled. This is done
# periodically to redistribute memory from the fixed-size pool to sstables
# proportional to their recent read rates. Setting to -1 will disable this
# process, leaving existing index summaries at their current sampling level.
# index_summary_resize_interval_in_minutes: 60
# Whether to, when doing sequential writing, fsync() at intervals in
# order to force the operating system to flush the dirty
# buffers. Enable this to avoid sudden dirty buffer flushing from
# impacting read latencies. Almost always a good idea on SSDs; not
# necessarily on platters.
# trickle_fsync: false
# trickle_fsync_interval_in_kb: 10240
# TCP port, for commands and data
# For security reasons, you should not expose this port to the internet. Firewall it if needed.
# storage_port: 7000
@@ -461,91 +317,21 @@ commitlog_total_space_in_mb: -1
# listen_interface: eth0
# listen_interface_prefer_ipv6: false
# Internode authentication backend, implementing IInternodeAuthenticator;
# used to allow/disallow connections from peer nodes.
# internode_authenticator: org.apache.cassandra.auth.AllowAllInternodeAuthenticator
# Whether to start the native transport server.
# Please note that the address on which the native transport is bound is the
# same as the rpc_address. The port however is different and specified below.
# start_native_transport: true
# The maximum threads for handling requests when the native transport is used.
# This is similar to rpc_max_threads though the default differs slightly (and
# there is no native_transport_min_threads, idle threads will always be stopped
# after 30 seconds).
# native_transport_max_threads: 128
#
# The maximum size of allowed frame. Frame (requests) larger than this will
# be rejected as invalid. The default is 256MB.
# native_transport_max_frame_size_in_mb: 256
# The maximum number of concurrent client connections.
# The default is -1, which means unlimited.
# native_transport_max_concurrent_connections: -1
# The maximum number of concurrent client connections per source ip.
# The default is -1, which means unlimited.
# native_transport_max_concurrent_connections_per_ip: -1
# Whether to start the thrift rpc server.
# start_rpc: true
# enable or disable keepalive on rpc/native connections
# rpc_keepalive: true
# Scylla provides two out-of-the-box options for the RPC Server:
#
# sync -> One thread per thrift connection. For a very large number of clients, memory
# will be your limiting factor. On a 64 bit JVM, 180KB is the minimum stack size
# per thread, and that will correspond to your use of virtual memory (but physical memory
# may be limited depending on use of stack space).
#
# hsha -> Stands for "half synchronous, half asynchronous." All thrift clients are handled
# asynchronously using a small number of threads that does not vary with the amount
# of thrift clients (and thus scales well to many clients). The rpc requests are still
# synchronous (one thread per active request). If hsha is selected then it is essential
# that rpc_max_threads is changed from the default value of unlimited.
#
# The default is sync because on Windows hsha is about 30% slower. On Linux,
# sync/hsha performance is about the same, with hsha of course using less memory.
#
# Alternatively, you can provide your own RPC server by providing the fully-qualified class name
# of an o.a.c.t.TServerFactory that can create an instance of it.
# rpc_server_type: sync
# Uncomment rpc_min|max_thread to set request pool size limits.
#
# Regardless of your choice of RPC server (see above), the number of maximum requests in the
# RPC thread pool dictates how many concurrent requests are possible (but if you are using the sync
# RPC server, it also dictates the number of clients that can be connected at all).
#
# The default is unlimited and thus provides no protection against clients overwhelming the server. You are
# encouraged to set a maximum that makes sense for you in production, but do keep in mind that
# rpc_max_threads represents the maximum number of client requests this server may execute concurrently.
#
# rpc_min_threads: 16
# rpc_max_threads: 2048
# uncomment to set socket buffer sizes on rpc connections
# rpc_send_buff_size_in_bytes:
# rpc_recv_buff_size_in_bytes:
# Uncomment to set socket buffer size for internode communication
# Note that when setting this, the buffer size is limited by net.core.wmem_max
# and when not set, it is defined by net.ipv4.tcp_wmem
# See:
# /proc/sys/net/core/wmem_max
# /proc/sys/net/core/rmem_max
# /proc/sys/net/ipv4/tcp_wmem
# /proc/sys/net/ipv4/tcp_rmem
# and: man tcp
# internode_send_buff_size_in_bytes:
# internode_recv_buff_size_in_bytes:
# Frame size for thrift (maximum message length).
# thrift_framed_transport_size_in_mb: 15
# Set to true to have Scylla create a hard link to each sstable
# flushed or streamed locally in a backups/ subdirectory of the
# keyspace data. Removing these links is the operator's
@@ -588,30 +374,6 @@ commitlog_total_space_in_mb: -1
# column_index_size_in_kb: 64
# Number of simultaneous compactions to allow, NOT including
# validation "compactions" for anti-entropy repair. Simultaneous
# compactions can help preserve read performance in a mixed read/write
# workload, by mitigating the tendency of small sstables to accumulate
# during a single long-running compaction. The default is usually
# fine and if you experience problems with compaction running too
# slowly or too fast, you should look at
# compaction_throughput_mb_per_sec first.
#
# concurrent_compactors defaults to the smaller of (number of disks,
# number of cores), with a minimum of 2 and a maximum of 8.
#
# If your data directories are backed by SSD, you should increase this
# to the number of cores.
#concurrent_compactors: 1
# Throttles compaction to the given total throughput across the entire
# system. The faster you insert data, the faster you need to compact in
# order to keep the sstable count down, but in general, setting this to
# 16 to 32 times the rate you are inserting data is more than sufficient.
# Setting this to 0 disables throttling. Note that this account for all types
# of compaction, including validation compaction.
# compaction_throughput_mb_per_sec: 16
# Log a warning when writing partitions larger than this value
# compaction_large_partition_warning_threshold_mb: 1000
@@ -624,18 +386,6 @@ commitlog_total_space_in_mb: -1
# Log a warning when row number is larger than this value
# compaction_rows_count_warning_threshold: 100000
# When compacting, the replacement sstable(s) can be opened before they
# are completely written, and used in place of the prior sstables for
# any range that has been written. This helps to smoothly transfer reads
# between the sstables, reducing page cache churn and keeping hot rows hot
# sstable_preemptive_open_interval_in_mb: 50
# Throttles all streaming file transfer between the datacenters,
# this setting allows users to throttle inter dc stream throughput in addition
# to throttling all network stream traffic as configured with
# stream_throughput_outbound_megabits_per_sec
# inter_dc_stream_throughput_outbound_megabits_per_sec:
# How long the coordinator should wait for seq or index scans to complete
# range_request_timeout_in_ms: 10000
# How long the coordinator should wait for writes to complete
@@ -650,88 +400,23 @@ commitlog_total_space_in_mb: -1
# The default timeout for other, miscellaneous operations
# request_timeout_in_ms: 10000
# Enable operation timeout information exchange between nodes to accurately
# measure request timeouts. If disabled, replicas will assume that requests
# were forwarded to them instantly by the coordinator, which means that
# under overload conditions we will waste that much extra time processing
# already-timed-out requests.
#
# Warning: before enabling this property make sure ntp is installed
# and the times are synchronized between the nodes.
# cross_node_timeout: false
# Enable socket timeout for streaming operation.
# When a timeout occurs during streaming, streaming is retried from the start
# of the current file. This _can_ involve re-streaming a significant amount of
# data, so you should avoid setting the value too low.
# The default value is 0, which never times out streams.
# streaming_socket_timeout_in_ms: 0
# controls how often to perform the more expensive part of host score
# calculation
# dynamic_snitch_update_interval_in_ms: 100
# controls how often to reset all host scores, allowing a bad host to
# possibly recover
# dynamic_snitch_reset_interval_in_ms: 600000
# if set greater than zero and read_repair_chance is < 1.0, this will allow
# 'pinning' of replicas to hosts in order to increase cache capacity.
# The badness threshold will control how much worse the pinned host has to be
# before the dynamic snitch will prefer other replicas over it. This is
# expressed as a double which represents a percentage. Thus, a value of
# 0.2 means Scylla would continue to prefer the static snitch values
# until the pinned host was 20% worse than the fastest.
# dynamic_snitch_badness_threshold: 0.1
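The badness-threshold comparison described above can be sketched as a small predicate. This is an illustrative sketch only, not Scylla's actual snitch code; the function name and score convention (lower latency score is better) are assumptions:

```python
def prefer_pinned(pinned_score: float, best_score: float,
                  badness_threshold: float = 0.1) -> bool:
    """Keep using the pinned replica while its latency score is at most
    `badness_threshold` (expressed as a fraction) worse than the
    best-scoring replica's; otherwise fall back to the dynamic choice."""
    return pinned_score <= best_score * (1.0 + badness_threshold)
```

With the default threshold of 0.1, a pinned host 5% slower than the fastest is still preferred; at 25% slower it is not.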
# request_scheduler -- Set this to a class that implements
# RequestScheduler, which will schedule incoming client requests
# according to the specific policy. This is useful for multi-tenancy
# with a single Scylla cluster.
# NOTE: This is specifically for requests from the client and does
# not affect inter node communication.
# org.apache.cassandra.scheduler.NoScheduler - No scheduling takes place
# org.apache.cassandra.scheduler.RoundRobinScheduler - Round robin of
# client requests to a node with a separate queue for each
# request_scheduler_id. The scheduler is further customized by
# request_scheduler_options as described below.
# request_scheduler: org.apache.cassandra.scheduler.NoScheduler
# Scheduler Options vary based on the type of scheduler
# NoScheduler - Has no options
# RoundRobin
# - throttle_limit -- The throttle_limit is the number of in-flight
# requests per client. Requests beyond
# that limit are queued up until
# running requests can complete.
# The value of 80 here is twice the number of
# concurrent_reads + concurrent_writes.
# - default_weight -- default_weight is optional and allows for
# overriding the default which is 1.
# - weights -- Weights are optional and will default to 1 or the
# overridden default_weight. The weight translates into how
# many requests are handled during each turn of the
# RoundRobin, based on the scheduler id.
#
# request_scheduler_options:
# throttle_limit: 80
# default_weight: 5
# weights:
# Keyspace1: 1
# Keyspace2: 5
# request_scheduler_id -- An identifier based on which to perform
# the request scheduling. Currently the only valid option is keyspace.
# request_scheduler_id: keyspace
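The RoundRobin policy described above (serve up to `weight` requests per queue per turn, keyed by scheduler id) can be sketched as a generator. This is a hypothetical illustration of the documented behavior, not the scheduler's real implementation; a real scheduler would also park or block on empty queues rather than spin:

```python
import itertools

def weighted_round_robin(queues, weights, default_weight=1):
    # Cycle over the queues (keyed by scheduler id, e.g. keyspace name),
    # serving up to `weight` pending requests from each queue per turn.
    for name in itertools.cycle(list(queues)):
        for _ in range(weights.get(name, default_weight)):
            if queues[name]:
                yield queues[name].pop(0)
```

For example, with `weights = {'Keyspace1': 1, 'Keyspace2': 2}`, each full turn drains one request from Keyspace1 and two from Keyspace2.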
# Enable or disable inter-node encryption.
# You must also generate keys and provide the appropriate key and trust store locations and passwords.
# No custom encryption options are currently enabled. The available options are:
#
# The available internode options are : all, none, dc, rack
# If set to dc, Scylla will encrypt the traffic between the DCs
# If set to rack, Scylla will encrypt the traffic between the racks
#
# SSL/TLS algorithm and ciphers used can be controlled by
# the priority_string parameter. Info on priority string
# syntax and values is available at:
# https://gnutls.org/manual/html_node/Priority-Strings.html
#
# The require_client_auth parameter allows you to
# restrict access to the service based on certificate
# validation. Clients must provide a certificate
# accepted by the configured trust store to connect.
#
# server_encryption_options:
# internode_encryption: none
# certificate: conf/scylla.crt

@@ -144,8 +144,12 @@ def flag_supported(flag, compiler):
def gold_supported(compiler):
src_main = 'int main(int argc, char **argv) { return 0; }'
if try_compile_and_link(source=src_main, flags=['-fuse-ld=gold'], compiler=compiler):
return '-fuse-ld=gold'
link_flags = ['-fuse-ld=gold']
if try_compile_and_link(source=src_main, flags=link_flags, compiler=compiler):
threads_flag = '-Wl,--threads'
if try_compile_and_link(source=src_main, flags=link_flags + [threads_flag], compiler=compiler):
link_flags.append(threads_flag)
return ' '.join(link_flags)
else:
print('Note: gold not found; using default system linker')
return ''
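The cascading flag detection in the new `gold_supported` can be isolated from the compiler invocation by passing the probe as a callable. A minimal sketch, with `probe` standing in for `try_compile_and_link` (the function name `detect_gold_flags` is invented for illustration):

```python
def detect_gold_flags(probe):
    """Return the linker flag string to use: '-fuse-ld=gold' if the gold
    linker links a trivial program, plus '-Wl,--threads' if gold also
    accepts multithreaded linking; empty if gold is unavailable."""
    flags = []
    if probe(['-fuse-ld=gold']):
        flags.append('-fuse-ld=gold')
        # Only try the optional threading flag on top of a working gold.
        if probe(flags + ['-Wl,--threads']):
            flags.append('-Wl,--threads')
    return ' '.join(flags)
```

This mirrors the diff's structure: the base flag gates the optional one, so a gold without `--threads` support still gets used single-threaded.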
@@ -257,136 +261,142 @@ modes = {
}
scylla_tests = [
'tests/mutation_test',
'tests/mvcc_test',
'tests/mutation_fragment_test',
'tests/flat_mutation_reader_test',
'tests/schema_registry_test',
'tests/canonical_mutation_test',
'tests/range_test',
'tests/types_test',
'tests/keys_test',
'tests/partitioner_test',
'tests/frozen_mutation_test',
'tests/serialized_action_test',
'tests/hint_test',
'tests/clustering_ranges_walker_test',
'tests/perf/perf_mutation',
'tests/lsa_async_eviction_test',
'tests/lsa_sync_eviction_test',
'tests/row_cache_alloc_stress',
'tests/perf_row_cache_update',
'tests/perf/perf_hash',
'tests/perf/perf_cql_parser',
'tests/perf/perf_simple_query',
'tests/perf/perf_fast_forward',
'tests/perf/perf_cache_eviction',
'tests/cache_flat_mutation_reader_test',
'tests/row_cache_stress_test',
'tests/memory_footprint',
'tests/perf/perf_sstable',
'tests/cql_query_test',
'tests/secondary_index_test',
'tests/json_cql_query_test',
'tests/filtering_test',
'tests/storage_proxy_test',
'tests/schema_change_test',
'tests/mutation_reader_test',
'tests/mutation_query_test',
'tests/row_cache_test',
'tests/test-serialization',
'tests/broken_sstable_test',
'tests/sstable_test',
'tests/sstable_datafile_test',
'tests/sstable_3_x_test',
'tests/sstable_mutation_test',
'tests/sstable_resharding_test',
'tests/memtable_test',
'tests/commitlog_test',
'tests/cartesian_product_test',
'tests/hash_test',
'tests/map_difference_test',
'tests/message',
'tests/gossip',
'tests/gossip_test',
'tests/compound_test',
'tests/config_test',
'tests/gossiping_property_file_snitch_test',
'tests/ec2_snitch_test',
'tests/gce_snitch_test',
'tests/snitch_reset_test',
'tests/network_topology_strategy_test',
'tests/query_processor_test',
'tests/batchlog_manager_test',
'tests/bytes_ostream_test',
'tests/UUID_test',
'tests/murmur_hash_test',
'tests/allocation_strategy_test',
'tests/logalloc_test',
'tests/log_heap_test',
'tests/managed_vector_test',
'tests/crc_test',
'tests/checksum_utils_test',
'tests/flush_queue_test',
'tests/dynamic_bitset_test',
'tests/auth_test',
'tests/idl_test',
'tests/range_tombstone_list_test',
'tests/anchorless_list_test',
'tests/database_test',
'tests/nonwrapping_range_test',
'tests/input_stream_test',
'tests/virtual_reader_test',
'tests/view_schema_test',
'tests/view_build_test',
'tests/view_complex_test',
'tests/counter_test',
'tests/cell_locker_test',
'tests/row_locker_test',
'tests/streaming_histogram_test',
'tests/duration_test',
'tests/vint_serialization_test',
'tests/continuous_data_consumer_test',
'tests/compress_test',
'tests/chunked_vector_test',
'tests/loading_cache_test',
'tests/castas_fcts_test',
'tests/big_decimal_test',
'tests/aggregate_fcts_test',
'tests/role_manager_test',
'tests/caching_options_test',
'tests/auth_resource_test',
'tests/cql_auth_query_test',
'tests/enum_set_test',
'tests/extensions_test',
'tests/cql_auth_syntax_test',
'tests/querier_cache',
'tests/limiting_data_source_test',
'tests/meta_test',
'tests/imr_test',
'tests/partition_data_test',
'tests/reusable_buffer_test',
'tests/mutation_writer_test',
'tests/observable_test',
'tests/transport_test',
'tests/fragmented_temporary_buffer_test',
'tests/json_test',
'tests/auth_passwords_test',
'tests/multishard_mutation_query_test',
'tests/top_k_test',
'tests/utf8_test',
'tests/small_vector_test',
'tests/data_listeners_test',
'tests/truncation_migration_test',
'tests/like_matcher_test',
'test/boost/UUID_test',
'test/boost/aggregate_fcts_test',
'test/boost/allocation_strategy_test',
'test/boost/anchorless_list_test',
'test/boost/auth_passwords_test',
'test/boost/auth_resource_test',
'test/boost/auth_test',
'test/boost/batchlog_manager_test',
'test/boost/big_decimal_test',
'test/boost/broken_sstable_test',
'test/boost/bytes_ostream_test',
'test/boost/cache_flat_mutation_reader_test',
'test/boost/caching_options_test',
'test/boost/canonical_mutation_test',
'test/boost/cartesian_product_test',
'test/boost/castas_fcts_test',
'test/boost/cdc_test',
'test/boost/cell_locker_test',
'test/boost/checksum_utils_test',
'test/boost/chunked_vector_test',
'test/boost/clustering_ranges_walker_test',
'test/boost/commitlog_test',
'test/boost/compound_test',
'test/boost/compress_test',
'test/boost/config_test',
'test/boost/continuous_data_consumer_test',
'test/boost/counter_test',
'test/boost/cql_auth_query_test',
'test/boost/cql_auth_syntax_test',
'test/boost/cql_query_test',
'test/boost/crc_test',
'test/boost/data_listeners_test',
'test/boost/database_test',
'test/boost/duration_test',
'test/boost/dynamic_bitset_test',
'test/boost/enum_option_test',
'test/boost/enum_set_test',
'test/boost/extensions_test',
'test/boost/filtering_test',
'test/boost/flat_mutation_reader_test',
'test/boost/flush_queue_test',
'test/boost/fragmented_temporary_buffer_test',
'test/boost/frozen_mutation_test',
'test/boost/gossip_test',
'test/boost/gossiping_property_file_snitch_test',
'test/boost/hash_test',
'test/boost/idl_test',
'test/boost/input_stream_test',
'test/boost/json_cql_query_test',
'test/boost/keys_test',
'test/boost/like_matcher_test',
'test/boost/limiting_data_source_test',
'test/boost/linearizing_input_stream_test',
'test/boost/loading_cache_test',
'test/boost/log_heap_test',
'test/boost/logalloc_test',
'test/boost/managed_vector_test',
'test/boost/map_difference_test',
'test/boost/memtable_test',
'test/boost/meta_test',
'test/boost/multishard_mutation_query_test',
'test/boost/murmur_hash_test',
'test/boost/mutation_fragment_test',
'test/boost/mutation_query_test',
'test/boost/mutation_reader_test',
'test/boost/mutation_test',
'test/boost/mutation_writer_test',
'test/boost/mvcc_test',
'test/boost/network_topology_strategy_test',
'test/boost/nonwrapping_range_test',
'test/boost/observable_test',
'test/boost/partitioner_test',
'test/boost/querier_cache_test',
'test/boost/query_processor_test',
'test/boost/range_test',
'test/boost/range_tombstone_list_test',
'test/boost/reusable_buffer_test',
'test/boost/role_manager_test',
'test/boost/row_cache_test',
'test/boost/schema_change_test',
'test/boost/schema_registry_test',
'test/boost/secondary_index_test',
'test/boost/serialization_test',
'test/boost/serialized_action_test',
'test/boost/small_vector_test',
'test/boost/snitch_reset_test',
'test/boost/sstable_3_x_test',
'test/boost/sstable_datafile_test',
'test/boost/sstable_mutation_test',
'test/boost/sstable_resharding_test',
'test/boost/sstable_test',
'test/boost/storage_proxy_test',
'test/boost/top_k_test',
'test/boost/transport_test',
'test/boost/truncation_migration_test',
'test/boost/types_test',
'test/boost/user_function_test',
'test/boost/user_types_test',
'test/boost/utf8_test',
'test/boost/view_build_test',
'test/boost/view_complex_test',
'test/boost/view_schema_test',
'test/boost/vint_serialization_test',
'test/boost/virtual_reader_test',
'test/manual/ec2_snitch_test',
'test/manual/gce_snitch_test',
'test/manual/gossip',
'test/manual/hint_test',
'test/manual/imr_test',
'test/manual/json_test',
'test/manual/message',
'test/manual/partition_data_test',
'test/manual/row_locker_test',
'test/manual/streaming_histogram_test',
'test/perf/perf_cache_eviction',
'test/perf/perf_cql_parser',
'test/perf/perf_fast_forward',
'test/perf/perf_hash',
'test/perf/perf_mutation',
'test/perf/perf_row_cache_update',
'test/perf/perf_simple_query',
'test/perf/perf_sstable',
'test/tools/cql_repl',
'test/unit/lsa_async_eviction_test',
'test/unit/lsa_sync_eviction_test',
'test/unit/memory_footprint_test',
'test/unit/row_cache_alloc_stress_test',
'test/unit/row_cache_stress_test',
]
perf_tests = [
'tests/perf/perf_mutation_readers',
'tests/perf/perf_checksum',
'tests/perf/perf_mutation_fragment',
'tests/perf/perf_idl',
'tests/perf/perf_vint',
'test/perf/perf_mutation_readers',
'test/perf/perf_checksum',
'test/perf/perf_mutation_fragment',
'test/perf/perf_idl',
'test/perf/perf_vint',
]
apps = [
@@ -429,8 +439,6 @@ arg_parser.add_argument('--dpdk-target', action='store', dest='dpdk_target', def
help='Path to DPDK SDK target location (e.g. <DPDK SDK dir>/x86_64-native-linuxapp-gcc)')
arg_parser.add_argument('--debuginfo', action='store', dest='debuginfo', type=int, default=1,
help='Enable(1)/disable(0) compiler debug information generation')
arg_parser.add_argument('--compress-exec-debuginfo', action='store', dest='compress_exec_debuginfo', type=int, default=1,
help='Enable(1)/disable(0) debug information compression in executables')
arg_parser.add_argument('--static-stdc++', dest='staticcxx', action='store_true',
help='Link libgcc and libstdc++ statically')
arg_parser.add_argument('--static-thrift', dest='staticthrift', action='store_true',
@@ -453,6 +461,8 @@ arg_parser.add_argument('--enable-alloc-failure-injector', dest='alloc_failure_i
help='enable allocation failure injection')
arg_parser.add_argument('--with-antlr3', dest='antlr3_exec', action='store', default=None,
help='path to antlr3 executable')
arg_parser.add_argument('--with-ragel', dest='ragel_exec', action='store', default='ragel',
help='path to ragel executable')
args = arg_parser.parse_args()
defines = ['XXH_PRIVATE_API',
@@ -466,6 +476,8 @@ cassandra_interface = Thrift(source='interface/cassandra.thrift', service='Cassa
scylla_core = (['database.cc',
'table.cc',
'atomic_cell.cc',
'collection_mutation.cc',
'connection_notifier.cc',
'hashers.cc',
'schema.cc',
'frozen_schema.cc',
@@ -485,6 +497,7 @@ scylla_core = (['database.cc',
'utils/buffer_input_stream.cc',
'utils/limiting_data_source.cc',
'utils/updateable_value.cc',
'utils/directories.cc',
'mutation_partition.cc',
'mutation_partition_view.cc',
'mutation_partition_serializer.cc',
@@ -505,6 +518,8 @@ scylla_core = (['database.cc',
'sstables/partition.cc',
'sstables/compaction.cc',
'sstables/compaction_strategy.cc',
'sstables/size_tiered_compaction_strategy.cc',
'sstables/leveled_compaction_strategy.cc',
'sstables/compaction_manager.cc',
'sstables/integrity_checked_file_impl.cc',
'sstables/prepended_input_stream.cc',
@@ -513,6 +528,8 @@ scylla_core = (['database.cc',
'transport/event_notifier.cc',
'transport/server.cc',
'transport/messages/result_message.cc',
'cdc/cdc.cc',
'cql3/type_json.cc',
'cql3/abstract_marker.cc',
'cql3/attributes.cc',
'cql3/cf_name.cc',
@@ -524,7 +541,9 @@ scylla_core = (['database.cc',
'cql3/sets.cc',
'cql3/tuples.cc',
'cql3/maps.cc',
'cql3/functions/user_function.cc',
'cql3/functions/functions.cc',
'cql3/functions/aggregate_fcts.cc',
'cql3/functions/castas_fcts.cc',
'cql3/statements/cf_prop_defs.cc',
'cql3/statements/cf_statement.cc',
@@ -533,14 +552,18 @@ scylla_core = (['database.cc',
'cql3/statements/create_table_statement.cc',
'cql3/statements/create_view_statement.cc',
'cql3/statements/create_type_statement.cc',
'cql3/statements/create_function_statement.cc',
'cql3/statements/drop_index_statement.cc',
'cql3/statements/drop_keyspace_statement.cc',
'cql3/statements/drop_table_statement.cc',
'cql3/statements/drop_view_statement.cc',
'cql3/statements/drop_type_statement.cc',
'cql3/statements/drop_function_statement.cc',
'cql3/statements/schema_altering_statement.cc',
'cql3/statements/ks_prop_defs.cc',
'cql3/statements/function_statement.cc',
'cql3/statements/modification_statement.cc',
'cql3/statements/cas_request.cc',
'cql3/statements/parsed_statement.cc',
'cql3/statements/property_definitions.cc',
'cql3/statements/update_statement.cc',
@@ -578,6 +601,10 @@ scylla_core = (['database.cc',
'service/priority_manager.cc',
'service/migration_manager.cc',
'service/storage_proxy.cc',
'service/paxos/proposal.cc',
'service/paxos/prepare_response.cc',
'service/paxos/paxos_state.cc',
'service/paxos/prepare_summary.cc',
'cql3/operator.cc',
'cql3/relation.cc',
'cql3/column_identifier.cc',
@@ -716,6 +743,7 @@ scylla_core = (['database.cc',
'tracing/trace_keyspace_helper.cc',
'tracing/trace_state.cc',
'tracing/tracing_backend_registry.cc',
'tracing/traced_file.cc',
'table_helper.cc',
'range_tombstone.cc',
'range_tombstone_list.cc',
@@ -733,6 +761,7 @@ scylla_core = (['database.cc',
'utils/ascii.cc',
'utils/like_matcher.cc',
'mutation_writer/timestamp_based_splitting_writer.cc',
'lua.cc',
] + [Antlr3Grammar('cql3/Cql.g')] + [Thrift('interface/cassandra.thrift', 'Cassandra')]
)
@@ -772,6 +801,34 @@ api = ['api/api.cc',
'api/api-doc/config.json',
]
alternator = [
'alternator/server.cc',
'alternator/executor.cc',
'alternator/stats.cc',
'alternator/base64.cc',
'alternator/serialization.cc',
'alternator/expressions.cc',
Antlr3Grammar('alternator/expressions.g'),
'alternator/conditions.cc',
'alternator/rjson.cc',
'alternator/auth.cc',
]
redis = [
'redis/service.cc',
'redis/server.cc',
'redis/query_processor.cc',
'redis/protocol_parser.rl',
'redis/keyspace_utils.cc',
'redis/options.cc',
'redis/stats.cc',
'redis/mutation_utils.cc',
'redis/query_utils.cc',
'redis/abstract_command.cc',
'redis/command_factory.cc',
'redis/commands.cc',
]
idls = ['idl/gossip_digest.idl.hh',
'idl/uuid.idl.hh',
'idl/range.idl.hh',
@@ -796,77 +853,80 @@ idls = ['idl/gossip_digest.idl.hh',
'idl/consistency_level.idl.hh',
'idl/cache_temperature.idl.hh',
'idl/view.idl.hh',
'idl/messaging_service.idl.hh',
'idl/paxos.idl.hh',
]
headers = find_headers('.', excluded_dirs=['idl', 'build', 'seastar', '.git'])
scylla_tests_generic_dependencies = [
'tests/cql_test_env.cc',
'tests/test_services.cc',
'test/lib/cql_test_env.cc',
'test/lib/test_services.cc',
]
scylla_tests_dependencies = scylla_core + idls + scylla_tests_generic_dependencies + [
'tests/cql_assertions.cc',
'tests/result_set_assertions.cc',
'tests/mutation_source_test.cc',
'tests/data_model.cc',
'tests/exception_utils.cc',
'tests/random_schema.cc',
'test/lib/cql_assertions.cc',
'test/lib/result_set_assertions.cc',
'test/lib/mutation_source_test.cc',
'test/lib/data_model.cc',
'test/lib/exception_utils.cc',
'test/lib/random_schema.cc',
]
deps = {
'scylla': idls + ['main.cc', 'release.cc'] + scylla_core + api,
'scylla': idls + ['main.cc', 'release.cc', 'build_id.cc'] + scylla_core + api + alternator + redis,
}
pure_boost_tests = set([
'tests/map_difference_test',
'tests/keys_test',
'tests/compound_test',
'tests/range_tombstone_list_test',
'tests/anchorless_list_test',
'tests/nonwrapping_range_test',
'tests/test-serialization',
'tests/range_test',
'tests/crc_test',
'tests/checksum_utils_test',
'tests/managed_vector_test',
'tests/dynamic_bitset_test',
'tests/idl_test',
'tests/cartesian_product_test',
'tests/streaming_histogram_test',
'tests/duration_test',
'tests/vint_serialization_test',
'tests/compress_test',
'tests/chunked_vector_test',
'tests/big_decimal_test',
'tests/caching_options_test',
'tests/auth_resource_test',
'tests/enum_set_test',
'tests/cql_auth_syntax_test',
'tests/meta_test',
'tests/observable_test',
'tests/json_test',
'tests/auth_passwords_test',
'tests/top_k_test',
'tests/small_vector_test',
'tests/like_matcher_test',
'test/boost/anchorless_list_test',
'test/boost/auth_passwords_test',
'test/boost/auth_resource_test',
'test/boost/big_decimal_test',
'test/boost/caching_options_test',
'test/boost/cartesian_product_test',
'test/boost/checksum_utils_test',
'test/boost/chunked_vector_test',
'test/boost/compound_test',
'test/boost/compress_test',
'test/boost/cql_auth_syntax_test',
'test/boost/crc_test',
'test/boost/duration_test',
'test/boost/dynamic_bitset_test',
'test/boost/enum_option_test',
'test/boost/enum_set_test',
'test/boost/idl_test',
'test/boost/keys_test',
'test/boost/like_matcher_test',
'test/boost/linearizing_input_stream_test',
'test/boost/map_difference_test',
'test/boost/meta_test',
'test/boost/nonwrapping_range_test',
'test/boost/observable_test',
'test/boost/range_test',
'test/boost/range_tombstone_list_test',
'test/boost/serialization_test',
'test/boost/small_vector_test',
'test/boost/top_k_test',
'test/boost/vint_serialization_test',
'test/manual/json_test',
'test/manual/streaming_histogram_test',
])
tests_not_using_seastar_test_framework = set([
'tests/perf/perf_mutation',
'tests/lsa_async_eviction_test',
'tests/lsa_sync_eviction_test',
'tests/row_cache_alloc_stress',
'tests/perf_row_cache_update',
'tests/perf/perf_hash',
'tests/perf/perf_cql_parser',
'tests/message',
'tests/perf/perf_cache_eviction',
'tests/row_cache_stress_test',
'tests/memory_footprint',
'tests/gossip',
'tests/perf/perf_sstable',
'tests/small_vector_test',
'test/boost/small_vector_test',
'test/manual/gossip',
'test/manual/message',
'test/perf/perf_cache_eviction',
'test/perf/perf_cql_parser',
'test/perf/perf_hash',
'test/perf/perf_mutation',
'test/perf/perf_row_cache_update',
'test/perf/perf_sstable',
'test/unit/lsa_async_eviction_test',
'test/unit/lsa_sync_eviction_test',
'test/unit/memory_footprint_test',
'test/unit/row_cache_alloc_stress_test',
'test/unit/row_cache_stress_test',
]) | pure_boost_tests
for t in tests_not_using_seastar_test_framework:
@@ -887,28 +947,29 @@ perf_tests_seastar_deps = [
for t in perf_tests:
deps[t] = [t + '.cc'] + scylla_tests_dependencies + perf_tests_seastar_deps
deps['tests/sstable_test'] += ['tests/sstable_utils.cc', 'tests/normalizing_reader.cc']
deps['tests/sstable_datafile_test'] += ['tests/sstable_utils.cc', 'tests/normalizing_reader.cc']
deps['tests/mutation_reader_test'] += ['tests/sstable_utils.cc']
deps['test/boost/sstable_test'] += ['test/lib/sstable_utils.cc', 'test/lib/normalizing_reader.cc']
deps['test/boost/sstable_datafile_test'] += ['test/lib/sstable_utils.cc', 'test/lib/normalizing_reader.cc']
deps['test/boost/mutation_reader_test'] += ['test/lib/sstable_utils.cc']
deps['tests/bytes_ostream_test'] = ['tests/bytes_ostream_test.cc', 'utils/managed_bytes.cc', 'utils/logalloc.cc', 'utils/dynamic_bitset.cc']
deps['tests/input_stream_test'] = ['tests/input_stream_test.cc']
deps['tests/UUID_test'] = ['utils/UUID_gen.cc', 'tests/UUID_test.cc', 'utils/uuid.cc', 'utils/managed_bytes.cc', 'utils/logalloc.cc', 'utils/dynamic_bitset.cc', 'hashers.cc']
deps['tests/murmur_hash_test'] = ['bytes.cc', 'utils/murmur_hash.cc', 'tests/murmur_hash_test.cc']
deps['tests/allocation_strategy_test'] = ['tests/allocation_strategy_test.cc', 'utils/logalloc.cc', 'utils/dynamic_bitset.cc']
deps['tests/log_heap_test'] = ['tests/log_heap_test.cc']
deps['tests/anchorless_list_test'] = ['tests/anchorless_list_test.cc']
deps['tests/perf/perf_fast_forward'] += ['release.cc']
deps['tests/perf/perf_simple_query'] += ['release.cc']
deps['tests/meta_test'] = ['tests/meta_test.cc']
deps['tests/imr_test'] = ['tests/imr_test.cc', 'utils/logalloc.cc', 'utils/dynamic_bitset.cc']
deps['tests/reusable_buffer_test'] = ['tests/reusable_buffer_test.cc']
deps['tests/utf8_test'] = ['utils/utf8.cc', 'tests/utf8_test.cc']
deps['tests/small_vector_test'] = ['tests/small_vector_test.cc']
deps['tests/multishard_mutation_query_test'] += ['tests/test_table.cc']
deps['tests/vint_serialization_test'] = ['tests/vint_serialization_test.cc', 'vint-serialization.cc', 'bytes.cc']
deps['test/boost/bytes_ostream_test'] = ['test/boost/bytes_ostream_test.cc', 'utils/managed_bytes.cc', 'utils/logalloc.cc', 'utils/dynamic_bitset.cc']
deps['test/boost/input_stream_test'] = ['test/boost/input_stream_test.cc']
deps['test/boost/UUID_test'] = ['utils/UUID_gen.cc', 'test/boost/UUID_test.cc', 'utils/uuid.cc', 'utils/managed_bytes.cc', 'utils/logalloc.cc', 'utils/dynamic_bitset.cc', 'hashers.cc']
deps['test/boost/murmur_hash_test'] = ['bytes.cc', 'utils/murmur_hash.cc', 'test/boost/murmur_hash_test.cc']
deps['test/boost/allocation_strategy_test'] = ['test/boost/allocation_strategy_test.cc', 'utils/logalloc.cc', 'utils/dynamic_bitset.cc']
deps['test/boost/log_heap_test'] = ['test/boost/log_heap_test.cc']
deps['test/boost/anchorless_list_test'] = ['test/boost/anchorless_list_test.cc']
deps['test/perf/perf_fast_forward'] += ['release.cc']
deps['test/perf/perf_simple_query'] += ['release.cc']
deps['test/boost/meta_test'] = ['test/boost/meta_test.cc']
deps['test/manual/imr_test'] = ['test/manual/imr_test.cc', 'utils/logalloc.cc', 'utils/dynamic_bitset.cc']
deps['test/boost/reusable_buffer_test'] = ['test/boost/reusable_buffer_test.cc']
deps['test/boost/utf8_test'] = ['utils/utf8.cc', 'test/boost/utf8_test.cc']
deps['test/boost/small_vector_test'] = ['test/boost/small_vector_test.cc']
deps['test/boost/multishard_mutation_query_test'] += ['test/boost/test_table.cc']
deps['test/boost/vint_serialization_test'] = ['test/boost/vint_serialization_test.cc', 'vint-serialization.cc', 'bytes.cc']
deps['test/boost/linearizing_input_stream_test'] = ['test/boost/linearizing_input_stream_test.cc']
deps['tests/duration_test'] += ['tests/exception_utils.cc']
deps['test/boost/duration_test'] += ['test/lib/exception_utils.cc']
deps['utils/gz/gen_crc_combine_table'] = ['utils/gz/gen_crc_combine_table.cc']
@@ -951,9 +1012,13 @@ modes['release']['cxx_ld_flags'] += ' ' + ' '.join(optimization_flags)
gold_linker_flag = gold_supported(compiler=args.cxx)
dbgflag = '-g' if args.debuginfo else ''
dbgflag = '-g -gz' if args.debuginfo else ''
tests_link_rule = 'link' if args.tests_debuginfo else 'link_stripped'
# Strip if debuginfo is disabled, otherwise we end up with partial
# debug info from the libraries we statically link with
regular_link_rule = 'link' if args.debuginfo else 'link_stripped'
if args.so:
args.pie = '-shared'
args.fpie = '-fpic'
@@ -970,6 +1035,10 @@ else:
optional_packages = [['libsystemd', 'libsystemd-daemon']]
pkgs = []
# Lua can be provided by lua53 package on Debian-like
# systems and by Lua on others.
pkgs.append('lua53' if have_pkg('lua53') else 'lua')
def setup_first_pkg_of_list(pkglist):
# The HAVE_pkg symbol is taken from the first alternative
@@ -1060,23 +1129,6 @@ scylla_release = file.read().strip()
extra_cxxflags["release.cc"] = "-DSCYLLA_VERSION=\"\\\"" + scylla_version + "\\\"\" -DSCYLLA_RELEASE=\"\\\"" + scylla_release + "\\\"\""
seastar_flags = []
if args.dpdk:
# fake dependencies on dpdk, so that it is built before anything else
seastar_flags += ['--enable-dpdk']
if args.gcc6_concepts:
seastar_flags += ['--enable-gcc6-concepts']
if args.alloc_failure_injector:
seastar_flags += ['--enable-alloc-failure-injector']
if args.split_dwarf:
seastar_flags += ['--split-dwarf']
# We never compress debug info in debug mode
modes['debug']['cxxflags'] += ' -gz'
# We compress it by default in release mode
flag_dest = 'cxx_ld_flags' if args.compress_exec_debuginfo else 'cxxflags'
modes['release'][flag_dest] += ' -gz'
for m in ['debug', 'release', 'sanitize']:
modes[m]['cxxflags'] += ' ' + dbgflag
@@ -1085,27 +1137,56 @@ seastar_cflags += ' -Wno-error'
if args.target != '':
seastar_cflags += ' -march=' + args.target
seastar_ldflags = args.user_ldflags
seastar_flags += ['--compiler', args.cxx, '--c-compiler', args.cc, '--cflags=%s' % (seastar_cflags), '--ldflags=%s' % (seastar_ldflags),
'--c++-dialect=gnu++17', '--use-std-optional-variant-stringview=1', '--optflags=%s' % (modes['release']['cxx_ld_flags']), ]
libdeflate_cflags = seastar_cflags
zstd_cflags = seastar_cflags + ' -Wno-implicit-fallthrough'
status = subprocess.call([args.python, './configure.py'] + seastar_flags, cwd='seastar')
MODE_TO_CMAKE_BUILD_TYPE = {'release' : 'RelWithDebInfo', 'debug' : 'Debug', 'dev' : 'Dev', 'sanitize' : 'Sanitize' }
if status != 0:
print('Seastar configuration failed')
sys.exit(1)
def configure_seastar(build_dir, mode):
seastar_build_dir = os.path.join(build_dir, mode, 'seastar')
seastar_cmake_args = [
'-DCMAKE_BUILD_TYPE={}'.format(MODE_TO_CMAKE_BUILD_TYPE[mode]),
'-DCMAKE_C_COMPILER={}'.format(args.cc),
'-DCMAKE_CXX_COMPILER={}'.format(args.cxx),
'-DSeastar_CXX_FLAGS={}'.format((seastar_cflags + ' ' + modes[mode]['cxx_ld_flags']).replace(' ', ';')),
'-DSeastar_LD_FLAGS={}'.format(seastar_ldflags),
'-DSeastar_CXX_DIALECT=gnu++17',
'-DSeastar_STD_OPTIONAL_VARIANT_STRINGVIEW=ON',
'-DSeastar_UNUSED_RESULT_ERROR=ON',
]
if args.dpdk:
seastar_cmake_args += ['-DSeastar_DPDK=ON', '-DSeastar_DPDK_MACHINE=wsm']
if args.gcc6_concepts:
seastar_cmake_args += ['-DSeastar_GCC6_CONCEPTS=ON']
if args.split_dwarf:
seastar_cmake_args += ['-DSeastar_SPLIT_DWARF=ON']
if args.alloc_failure_injector:
seastar_cmake_args += ['-DSeastar_ALLOC_FAILURE_INJECTION=ON']
pc = {mode: 'build/{}/seastar.pc'.format(mode) for mode in build_modes}
seastar_cmd = ['cmake', '-G', 'Ninja', os.path.relpath('seastar', seastar_build_dir)] + seastar_cmake_args
cmake_dir = seastar_build_dir
if args.dpdk:
# need to cook first
cmake_dir = 'seastar' # required by cooking.sh
relative_seastar_build_dir = os.path.join('..', seastar_build_dir) # relative to seastar/
seastar_cmd = ['./cooking.sh', '-i', 'dpdk', '-d', relative_seastar_build_dir, '--'] + seastar_cmd[4:]
print(seastar_cmd)
os.makedirs(seastar_build_dir, exist_ok=True)
subprocess.check_call(seastar_cmd, shell=False, cwd=cmake_dir)
for mode in build_modes:
configure_seastar('build', mode)
pc = {mode: 'build/{}/seastar/seastar.pc'.format(mode) for mode in build_modes}
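The `.replace(' ', ';')` applied to `Seastar_CXX_FLAGS` above exists because CMake cache variables holding lists use `;` as the element separator. A one-line sketch of that conversion (helper name invented for illustration):

```python
def to_cmake_list(flags: str) -> str:
    # Turn a space-separated compiler-flag string into the
    # semicolon-separated form CMake expects for list-valued variables.
    return ';'.join(flags.split())
```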
ninja = find_executable('ninja') or find_executable('ninja-build')
if not ninja:
print('Ninja executable (ninja or ninja-build) not found on PATH\n')
sys.exit(1)
def query_seastar_flags(seastar_pc_file, link_static_cxx=False):
pc_file = os.path.join('seastar', seastar_pc_file)
def query_seastar_flags(pc_file, link_static_cxx=False):
cflags = pkg_config(pc_file, '--cflags', '--static')
libs = pkg_config(pc_file, '--libs', '--static')
@@ -1119,8 +1200,6 @@ for mode in build_modes:
modes[mode]['seastar_cflags'] = seastar_cflags
modes[mode]['seastar_libs'] = seastar_libs
MODE_TO_CMAKE_BUILD_TYPE = {'release' : 'RelWithDebInfo', 'debug' : 'Debug', 'dev' : 'Dev', 'sanitize' : 'Sanitize' }
# We need to use experimental features of the zstd library (to use our own allocators for the (de)compression context),
# which are available only when the library is linked statically.
def configure_zstd(build_dir, mode):
@@ -1190,6 +1269,11 @@ if args.antlr3_exec:
else:
antlr3_exec = "antlr3"
if args.ragel_exec:
ragel_exec = args.ragel_exec
else:
ragel_exec = "ragel"
for mode in build_modes:
configure_zstd(outdir, mode)
@@ -1206,6 +1290,7 @@ with open(buildfile_tmp, 'w') as f:
cxx = {cxx}
cxxflags = {user_cflags} {warnings} {defines}
ldflags = {gold_linker_flag} {user_ldflags}
ldflags_build = {gold_linker_flag}
libs = {libs}
pool link_pool
depth = {link_pool_depth}
@@ -1224,6 +1309,11 @@ with open(buildfile_tmp, 'w') as f:
command = {ninja} -C $subdir $target
restat = 1
description = NINJA $out
rule ragel
# sed away a bug in ragel 7 that emits some extraneous _nfa* variables
# (the $$ is collapsed to a single one by ninja)
command = {ragel_exec} -G2 -o $out $in && sed -i -e '1h;2,$$H;$$!d;g' -re 's/static const char _nfa[^;]*;//g' $out
description = RAGEL $out
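The sed invocation in the ragel rule slurps the generated file and deletes the extraneous `_nfa*` declarations. A rough Python equivalent of that post-processing step (a sketch of the same regex substitution, not the build rule itself):

```python
import re

def strip_nfa_decls(text: str) -> str:
    # Delete each "static const char _nfa...;" declaration that ragel 7
    # emits, matching everything up to the terminating semicolon.
    return re.sub(r'static const char _nfa[^;]*;', '', text)
```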
rule run
command = $in > $out
description = GEN $out
@@ -1243,7 +1333,7 @@ with open(buildfile_tmp, 'w') as f:
libs_{mode} = -l{fmt_lib}
seastar_libs_{mode} = {seastar_libs}
rule cxx.{mode}
command = $cxx -MD -MT $out -MF $out.d {seastar_cflags} $cxxflags $cxxflags_{mode} $obj_cxxflags -c -o $out $in
command = $cxx -MD -MT $out -MF $out.d {seastar_cflags} $cxxflags_{mode} $cxxflags $obj_cxxflags -c -o $out $in
description = CXX $out
depfile = $out.d
rule link.{mode}
@@ -1254,6 +1344,10 @@ with open(buildfile_tmp, 'w') as f:
command = $cxx $ld_flags_{mode} -s $ldflags -o $out $in $libs $libs_{mode}
description = LINK (stripped) $out
pool = link_pool
rule link_build.{mode}
command = $cxx $ld_flags_{mode} $ldflags_build -o $out $in $libs $libs_{mode}
description = LINK (build) $out
pool = link_pool
rule ar.{mode}
command = rm -f $out; ar cr $out $in; ranlib $out
description = AR $out
@@ -1288,8 +1382,10 @@ with open(buildfile_tmp, 'w') as f:
swaggers = {}
serializers = {}
thrifts = set()
ragels = {}
antlr3_grammars = set()
seastar_dep = 'seastar/build/{}/libseastar.a'.format(mode)
seastar_dep = 'build/{}/seastar/libseastar.a'.format(mode)
seastar_testing_dep = 'build/{}/seastar/libseastar_testing.a'.format(mode)
for binary in build_artifacts:
if binary in other:
continue
@@ -1313,12 +1409,12 @@ with open(buildfile_tmp, 'w') as f:
'zstd/lib/libzstd.a',
]])
objs.append('$builddir/' + mode + '/gen/utils/gz/crc_combine_table.o')
if binary.startswith('tests/'):
if binary.startswith('test/'):
local_libs = '$seastar_libs_{} $libs'.format(mode)
if binary in pure_boost_tests:
local_libs += ' ' + maybe_static(args.staticboost, '-lboost_unit_test_framework')
if binary not in tests_not_using_seastar_test_framework:
pc_path = os.path.join('seastar', pc[mode].replace('seastar.pc', 'seastar-testing.pc'))
pc_path = pc[mode].replace('seastar.pc', 'seastar-testing.pc')
local_libs += ' ' + pkg_config(pc_path, '--libs', '--static')
if has_thrift:
local_libs += ' ' + thrift_libs + ' ' + maybe_static(args.staticboost, '-lboost_system')
@@ -1327,12 +1423,12 @@ with open(buildfile_tmp, 'w') as f:
# So we strip the tests by default; The user can very
# quickly re-link the test unstripped by adding a "_g"
# to the test name, e.g., "ninja build/release/testname_g"
f.write('build $builddir/{}/{}: {}.{} {} | {}\n'.format(mode, binary, tests_link_rule, mode, str.join(' ', objs), seastar_dep))
f.write('build $builddir/{}/{}: {}.{} {} | {} {}\n'.format(mode, binary, tests_link_rule, mode, str.join(' ', objs), seastar_dep, seastar_testing_dep))
f.write(' libs = {}\n'.format(local_libs))
f.write('build $builddir/{}/{}_g: link.{} {} | {}\n'.format(mode, binary, mode, str.join(' ', objs), seastar_dep))
f.write('build $builddir/{}/{}_g: {}.{} {} | {} {}\n'.format(mode, binary, regular_link_rule, mode, str.join(' ', objs), seastar_dep, seastar_testing_dep))
f.write(' libs = {}\n'.format(local_libs))
else:
f.write('build $builddir/{}/{}: link.{} {} | {}\n'.format(mode, binary, mode, str.join(' ', objs), seastar_dep))
f.write('build $builddir/{}/{}: {}.{} {} | {}\n'.format(mode, binary, regular_link_rule, mode, str.join(' ', objs), seastar_dep))
if has_thrift:
f.write(' libs = {} {} $seastar_libs_{} $libs\n'.format(thrift_libs, maybe_static(args.staticboost, '-lboost_system'), mode))
for src in srcs:
@@ -1345,6 +1441,9 @@ with open(buildfile_tmp, 'w') as f:
elif src.endswith('.json'):
hh = '$builddir/' + mode + '/gen/' + src + '.hh'
swaggers[hh] = src
elif src.endswith('.rl'):
hh = '$builddir/' + mode + '/gen/' + src.replace('.rl', '.hh')
ragels[hh] = src
elif src.endswith('.thrift'):
thrifts.add(src)
elif src.endswith('.g'):
@@ -1355,7 +1454,7 @@ with open(buildfile_tmp, 'w') as f:
compiles['$builddir/' + mode + '/utils/gz/gen_crc_combine_table.o'] = 'utils/gz/gen_crc_combine_table.cc'
f.write('build {}: run {}\n'.format('$builddir/' + mode + '/gen/utils/gz/crc_combine_table.cc',
'$builddir/' + mode + '/utils/gz/gen_crc_combine_table'))
f.write('build {}: link.{} {}\n'.format('$builddir/' + mode + '/utils/gz/gen_crc_combine_table', mode,
f.write('build {}: link_build.{} {}\n'.format('$builddir/' + mode + '/utils/gz/gen_crc_combine_table', mode,
'$builddir/' + mode + '/utils/gz/gen_crc_combine_table.o'))
f.write(' libs = $seastar_libs_{}\n'.format(mode))
f.write(
@@ -1373,6 +1472,7 @@ with open(buildfile_tmp, 'w') as f:
gen_headers += g.headers('$builddir/{}/gen'.format(mode))
gen_headers += list(swaggers.keys())
gen_headers += list(serializers.keys())
gen_headers += list(ragels.keys())
gen_headers_dep = ' '.join(gen_headers)
for obj in compiles:
@@ -1386,6 +1486,9 @@ with open(buildfile_tmp, 'w') as f:
for hh in serializers:
src = serializers[hh]
f.write('build {}: serializer {} | idl-compiler.py\n'.format(hh, src))
for hh in ragels:
src = ragels[hh]
f.write('build {}: ragel {}\n'.format(hh, src))
for thrift in thrifts:
outs = ' '.join(thrift.generated('$builddir/{}/gen'.format(mode)))
f.write('build {}: thrift.{} {}\n'.format(outs, mode, thrift.source))
@@ -1399,25 +1502,33 @@ with open(buildfile_tmp, 'w') as f:
for cc in grammar.sources('$builddir/{}/gen'.format(mode)):
obj = cc.replace('.cpp', '.o')
f.write('build {}: cxx.{} {} || {}\n'.format(obj, mode, cc, ' '.join(serializers)))
if cc.endswith('Parser.cpp') and has_sanitize_address_use_after_scope:
# Parsers end up using huge amounts of stack space and overflowing their stack
f.write(' obj_cxxflags = -fno-sanitize-address-use-after-scope\n')
if cc.endswith('Parser.cpp'):
# Unoptimized parsers end up using huge amounts of stack space and overflowing their stack
flags = '-O1'
if has_sanitize_address_use_after_scope:
flags += ' -fno-sanitize-address-use-after-scope'
f.write(' obj_cxxflags = %s\n' % flags)
for hh in headers:
f.write('build $builddir/{mode}/{hh}.o: checkhh.{mode} {hh} || {gen_headers_dep}\n'.format(
mode=mode, hh=hh, gen_headers_dep=gen_headers_dep))
f.write('build seastar/build/{mode}/libseastar.a: ninja | always\n'
f.write('build build/{mode}/seastar/libseastar.a: ninja | always\n'
.format(**locals()))
f.write(' pool = submodule_pool\n')
f.write(' subdir = seastar/build/{mode}\n'.format(**locals()))
f.write(' target = seastar seastar_testing\n'.format(**locals()))
f.write('build seastar/build/{mode}/apps/iotune/iotune: ninja\n'
f.write(' subdir = build/{mode}/seastar\n'.format(**locals()))
f.write(' target = seastar\n'.format(**locals()))
f.write('build build/{mode}/seastar/libseastar_testing.a: ninja\n'
.format(**locals()))
f.write(' pool = submodule_pool\n')
f.write(' subdir = seastar/build/{mode}\n'.format(**locals()))
f.write(' subdir = build/{mode}/seastar\n'.format(**locals()))
f.write(' target = seastar_testing\n'.format(**locals()))
f.write('build build/{mode}/seastar/apps/iotune/iotune: ninja\n'
.format(**locals()))
f.write(' pool = submodule_pool\n')
f.write(' subdir = build/{mode}/seastar\n'.format(**locals()))
f.write(' target = iotune\n'.format(**locals()))
f.write(textwrap.dedent('''\
build build/{mode}/iotune: copy seastar/build/{mode}/apps/iotune/iotune
build build/{mode}/iotune: copy build/{mode}/seastar/apps/iotune/iotune
''').format(**locals()))
f.write('build build/{mode}/scylla-package.tar.gz: package build/{mode}/scylla build/{mode}/iotune build/SCYLLA-RELEASE-FILE build/SCYLLA-VERSION-FILE | always\n'.format(**locals()))
f.write(' pool = submodule_pool\n')
@@ -1438,7 +1549,7 @@ with open(buildfile_tmp, 'w') as f:
rule configure
command = {python} configure.py $configure_args
generator = 1
build build.ninja: configure | configure.py seastar/configure.py
build build.ninja: configure | configure.py SCYLLA-VERSION-GEN
rule cscope
command = find -name '*.[chS]' -o -name "*.cc" -o -name "*.hh" | cscope -bq -i-
description = CSCOPE
@@ -1447,6 +1558,10 @@ with open(buildfile_tmp, 'w') as f:
command = rm -rf build
description = CLEAN
build clean: clean
rule mode_list
command = echo {modes_list}
description = List configured modes
build mode_list: mode_list
default {modes_list}
''').format(modes_list=' '.join(default_modes), **globals()))
f.write(textwrap.dedent('''\

connection_notifier.cc (new file)
@@ -0,0 +1,71 @@
/*
* Copyright (C) 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "connection_notifier.hh"
#include "db/query_context.hh"
#include "cql3/constants.hh"
#include "database.hh"
#include "service/storage_proxy.hh"
#include <stdexcept>
namespace db::system_keyspace {
extern const char *const CLIENTS;
}
static sstring to_string(client_type ct) {
switch (ct) {
case client_type::cql: return "cql";
case client_type::thrift: return "thrift";
case client_type::alternator: return "alternator";
default: throw std::runtime_error("Invalid client_type");
}
}
future<> notify_new_client(client_data cd) {
// FIXME: consider prepared statement
const static sstring req
= format("INSERT INTO system.{} (address, port, client_type, shard_id, protocol_version, username) "
"VALUES (?, ?, ?, ?, ?, ?);", db::system_keyspace::CLIENTS);
return db::execute_cql(req,
std::move(cd.ip), cd.port, to_string(cd.ct), cd.shard_id,
cd.protocol_version.has_value() ? data_value(*cd.protocol_version) : data_value::make_null(int32_type),
cd.username.value_or("anonymous")).discard_result();
}
future<> notify_disconnected_client(gms::inet_address addr, client_type ct, int port) {
// FIXME: consider prepared statement
const static sstring req
= format("DELETE FROM system.{} where address=? AND port=? AND client_type=?;",
db::system_keyspace::CLIENTS);
return db::execute_cql(req, addr.addr(), port, to_string(ct)).discard_result();
}
future<> clear_clientlist() {
auto& db_local = service::get_storage_proxy().local().get_db().local();
return db_local.truncate(
db_local.find_keyspace(db::system_keyspace_name()),
db_local.find_column_family(db::system_keyspace_name(),
db::system_keyspace::CLIENTS),
[] { return make_ready_future<db_clock::time_point>(db_clock::now()); },
false /* with_snapshot */);
}
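The bind logic in `notify_new_client()` maps an unset `std::optional` to a CQL NULL and falls back to `"anonymous"` for the username. A standalone sketch of that pattern, with a plain string standing in for `data_value` (names hypothetical):

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Sketch of the nullable-bind pattern used in notify_new_client():
// an unset std::optional becomes a CQL NULL.
std::string bind_or_null(const std::optional<int32_t>& v) {
    return v.has_value() ? std::to_string(*v) : "NULL";
}

// Mirrors cd.username.value_or("anonymous").
std::string bind_username(const std::optional<std::string>& username) {
    return username.value_or("anonymous");
}
```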

connection_notifier.hh (new file)
@@ -0,0 +1,57 @@
/*
* Copyright (C) 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "gms/inet_address.hh"
#include <seastar/core/sstring.hh>
#include <optional>
enum class client_type {
cql = 0,
thrift,
alternator,
};
// Representation of a row in `system.clients'. std::optionals are for nullable cells.
struct client_data {
gms::inet_address ip;
int32_t port;
client_type ct;
int32_t shard_id; /// ID of server-side shard which is processing the connection.
// `optional' column means that it's nullable (possibly because it's
// unimplemented yet). If you want to fill ("implement") any of them,
// remember to update the query in `notify_new_client()'.
std::optional<sstring> connection_stage;
std::optional<sstring> driver_name;
std::optional<sstring> driver_version;
std::optional<sstring> hostname;
std::optional<int32_t> protocol_version;
std::optional<sstring> ssl_cipher_suite;
std::optional<bool> ssl_enabled;
std::optional<sstring> ssl_protocol;
std::optional<sstring> username;
};
future<> notify_new_client(client_data cd);
future<> notify_disconnected_client(gms::inet_address addr, client_type ct, int port);
future<> clear_clientlist();


@@ -21,6 +21,9 @@
#pragma once
#include "types/user.hh"
#include "concrete_types.hh"
#include "mutation_partition_view.hh"
#include "mutation_partition.hh"
#include "schema.hh"
@@ -35,8 +38,8 @@ class converting_mutation_partition_applier : public mutation_partition_visitor
const column_mapping& _visited_column_mapping;
deletable_row* _current_row;
private:
static bool is_compatible(const column_definition& new_def, const data_type& old_type, column_kind kind) {
return ::is_compatible(new_def.kind, kind) && new_def.type->is_value_compatible_with(*old_type);
static bool is_compatible(const column_definition& new_def, const abstract_type& old_type, column_kind kind) {
return ::is_compatible(new_def.kind, kind) && new_def.type->is_value_compatible_with(old_type);
}
static atomic_cell upgrade_cell(const abstract_type& new_type, const abstract_type& old_type, atomic_cell_view cell,
atomic_cell::collection_member cm = atomic_cell::collection_member::no) {
@@ -49,32 +52,59 @@ private:
return atomic_cell(new_type, cell);
}
}
static void accept_cell(row& dst, column_kind kind, const column_definition& new_def, const data_type& old_type, atomic_cell_view cell) {
static void accept_cell(row& dst, column_kind kind, const column_definition& new_def, const abstract_type& old_type, atomic_cell_view cell) {
if (!is_compatible(new_def, old_type, kind) || cell.timestamp() <= new_def.dropped_at()) {
return;
}
dst.apply(new_def, upgrade_cell(*new_def.type, *old_type, cell));
dst.apply(new_def, upgrade_cell(*new_def.type, old_type, cell));
}
static void accept_cell(row& dst, column_kind kind, const column_definition& new_def, const data_type& old_type, collection_mutation_view cell) {
static void accept_cell(row& dst, column_kind kind, const column_definition& new_def, const abstract_type& old_type, collection_mutation_view cell) {
if (!is_compatible(new_def, old_type, kind)) {
return;
}
cell.data.with_linearized([&] (bytes_view cell_bv) {
auto new_ctype = static_pointer_cast<const collection_type_impl>(new_def.type);
auto old_ctype = static_pointer_cast<const collection_type_impl>(old_type);
auto old_view = old_ctype->deserialize_mutation_form(cell_bv);
collection_type_impl::mutation new_view;
cell.with_deserialized(old_type, [&] (collection_mutation_view_description old_view) {
collection_mutation_description new_view;
if (old_view.tomb.timestamp > new_def.dropped_at()) {
new_view.tomb = old_view.tomb;
}
for (auto& c : old_view.cells) {
if (c.second.timestamp() > new_def.dropped_at()) {
new_view.cells.emplace_back(c.first, upgrade_cell(*new_ctype->value_comparator(), *old_ctype->value_comparator(), c.second, atomic_cell::collection_member::yes));
visit(old_type, make_visitor(
[&] (const collection_type_impl& old_ctype) {
assert(new_def.type->is_collection()); // because is_compatible
auto& new_ctype = static_cast<const collection_type_impl&>(*new_def.type);
auto& new_value_type = *new_ctype.value_comparator();
auto& old_value_type = *old_ctype.value_comparator();
for (auto& c : old_view.cells) {
if (c.second.timestamp() > new_def.dropped_at()) {
new_view.cells.emplace_back(c.first, upgrade_cell(
new_value_type, old_value_type, c.second, atomic_cell::collection_member::yes));
}
}
},
[&] (const user_type_impl& old_utype) {
assert(new_def.type->is_user_type()); // because is_compatible
auto& new_utype = static_cast<const user_type_impl&>(*new_def.type);
for (auto& c : old_view.cells) {
if (c.second.timestamp() > new_def.dropped_at()) {
auto idx = deserialize_field_index(c.first);
assert(idx < new_utype.size() && idx < old_utype.size());
new_view.cells.emplace_back(c.first, upgrade_cell(
*new_utype.type(idx), *old_utype.type(idx), c.second, atomic_cell::collection_member::yes));
}
}
},
[&] (const abstract_type& o) {
throw std::runtime_error(format("not a multi-cell type: {}", o.name()));
}
}
));
if (new_view.tomb || !new_view.cells.empty()) {
dst.apply(new_def, new_ctype->serialize_mutation_form(std::move(new_view)));
dst.apply(new_def, new_view.serialize(*new_def.type));
}
});
}
@@ -100,7 +130,7 @@ public:
const column_mapping_entry& col = _visited_column_mapping.static_column_at(id);
const column_definition* def = _p_schema.get_column_definition(col.name());
if (def) {
accept_cell(_p._static_row, column_kind::static_column, *def, col.type(), cell);
accept_cell(_p._static_row.maybe_create(), column_kind::static_column, *def, *col.type(), cell);
}
}
@@ -108,7 +138,7 @@ public:
const column_mapping_entry& col = _visited_column_mapping.static_column_at(id);
const column_definition* def = _p_schema.get_column_definition(col.name());
if (def) {
accept_cell(_p._static_row, column_kind::static_column, *def, col.type(), collection);
accept_cell(_p._static_row.maybe_create(), column_kind::static_column, *def, *col.type(), collection);
}
}
@@ -131,7 +161,7 @@ public:
const column_mapping_entry& col = _visited_column_mapping.regular_column_at(id);
const column_definition* def = _p_schema.get_column_definition(col.name());
if (def) {
accept_cell(_current_row->cells(), column_kind::regular_column, *def, col.type(), cell);
accept_cell(_current_row->cells(), column_kind::regular_column, *def, *col.type(), cell);
}
}
@@ -139,7 +169,7 @@ public:
const column_mapping_entry& col = _visited_column_mapping.regular_column_at(id);
const column_definition* def = _p_schema.get_column_definition(col.name());
if (def) {
accept_cell(_current_row->cells(), column_kind::regular_column, *def, col.type(), collection);
accept_cell(_current_row->cells(), column_kind::regular_column, *def, *col.type(), collection);
}
}
@@ -147,9 +177,9 @@ public:
// Cells must have monotonic names.
static void append_cell(row& dst, column_kind kind, const column_definition& new_def, const column_definition& old_def, const atomic_cell_or_collection& cell) {
if (new_def.is_atomic()) {
accept_cell(dst, kind, new_def, old_def.type, cell.as_atomic_cell(old_def));
accept_cell(dst, kind, new_def, *old_def.type, cell.as_atomic_cell(old_def));
} else {
accept_cell(dst, kind, new_def, old_def.type, cell.as_collection_mutation());
accept_cell(dst, kind, new_def, *old_def.type, cell.as_collection_mutation());
}
}
};
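The `visit(old_type, make_visitor(...))` call above is Scylla's flavor of the overloaded-lambdas visitation idiom. A self-contained `std::variant` analogue of the same dispatch shape (the tag types here are hypothetical stand-ins for the type hierarchy):

```cpp
#include <cassert>
#include <string>
#include <variant>

// Standard "overloaded" helper: merge several lambdas into one visitor.
template <class... Fs> struct overloaded : Fs... { using Fs::operator()...; };
template <class... Fs> overloaded(Fs...) -> overloaded<Fs...>;

struct collection_tag {};
struct user_type_tag {};
struct other_tag {};

// Dispatch on the alternative, like make_visitor() over abstract_type.
std::string dispatch(const std::variant<collection_tag, user_type_tag, other_tag>& t) {
    return std::visit(overloaded{
        [](collection_tag) { return std::string("collection"); },
        [](user_type_tag)  { return std::string("user type"); },
        [](other_tag)      { return std::string("other"); },
    }, t);
}
```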


@@ -43,12 +43,14 @@ options {
#include "cql3/statements/create_table_statement.hh"
#include "cql3/statements/create_view_statement.hh"
#include "cql3/statements/create_type_statement.hh"
#include "cql3/statements/create_function_statement.hh"
#include "cql3/statements/drop_type_statement.hh"
#include "cql3/statements/alter_type_statement.hh"
#include "cql3/statements/property_definitions.hh"
#include "cql3/statements/drop_index_statement.hh"
#include "cql3/statements/drop_table_statement.hh"
#include "cql3/statements/drop_view_statement.hh"
#include "cql3/statements/drop_function_statement.hh"
#include "cql3/statements/truncate_statement.hh"
#include "cql3/statements/raw/update_statement.hh"
#include "cql3/statements/raw/insert_statement.hh"
@@ -243,10 +245,14 @@ struct uninitialized {
return res;
}
bool convert_boolean_literal(std::string_view s) {
std::string lower_s(s.size(), '\0');
sstring to_lower(std::string_view s) {
sstring lower_s(s.size(), '\0');
std::transform(s.cbegin(), s.cend(), lower_s.begin(), &::tolower);
return lower_s == "true";
return lower_s;
}
bool convert_boolean_literal(std::string_view s) {
return to_lower(s) == "true";
}
void add_raw_update(std::vector<std::pair<::shared_ptr<cql3::column_identifier::raw>,::shared_ptr<cql3::operation::raw_update>>>& operations,
@@ -348,9 +354,9 @@ cqlStatement returns [shared_ptr<raw::parsed_statement> stmt]
| st25=createTypeStatement { $stmt = st25; }
| st26=alterTypeStatement { $stmt = st26; }
| st27=dropTypeStatement { $stmt = st27; }
#if 0
| st28=createFunctionStatement { $stmt = st28; }
| st29=dropFunctionStatement { $stmt = st29; }
#if 0
| st30=createAggregateStatement { $stmt = st30; }
| st31=dropAggregateStatement { $stmt = st31; }
#endif
@@ -524,6 +530,7 @@ usingClauseObjective[::shared_ptr<cql3::attributes::raw> attrs]
*/
updateStatement returns [::shared_ptr<raw::update_statement> expr]
@init {
bool if_exists = false;
auto attrs = ::make_shared<cql3::attributes::raw>();
std::vector<std::pair<::shared_ptr<cql3::column_identifier::raw>, ::shared_ptr<cql3::operation::raw_update>>> operations;
}
@@ -531,13 +538,14 @@ updateStatement returns [::shared_ptr<raw::update_statement> expr]
( usingClause[attrs] )?
K_SET columnOperation[operations] (',' columnOperation[operations])*
K_WHERE wclause=whereClause
( K_IF conditions=updateConditions )?
( K_IF (K_EXISTS{ if_exists = true; } | conditions=updateConditions) )?
{
return ::make_shared<raw::update_statement>(std::move(cf),
std::move(attrs),
std::move(operations),
std::move(wclause),
std::move(conditions));
std::move(conditions),
if_exists);
}
;
@@ -581,6 +589,7 @@ deleteSelection returns [std::vector<::shared_ptr<cql3::operation::raw_deletion>
deleteOp returns [::shared_ptr<cql3::operation::raw_deletion> op]
: c=cident { $op = ::make_shared<cql3::operation::column_deletion>(std::move(c)); }
| c=cident '[' t=term ']' { $op = ::make_shared<cql3::operation::element_deletion>(std::move(c), std::move(t)); }
| c=cident '.' field=ident { $op = ::make_shared<cql3::operation::field_deletion>(std::move(c), std::move(field)); }
;
usingClauseDelete[::shared_ptr<cql3::attributes::raw> attrs]
@@ -683,54 +692,56 @@ dropAggregateStatement returns [DropAggregateStatement expr]
)?
{ $expr = new DropAggregateStatement(fn, argsTypes, argsPresent, ifExists); }
;
#endif
createFunctionStatement returns [CreateFunctionStatement expr]
createFunctionStatement returns [shared_ptr<cql3::statements::create_function_statement> expr]
@init {
boolean orReplace = false;
boolean ifNotExists = false;
bool or_replace = false;
bool if_not_exists = false;
boolean deterministic = true;
List<ColumnIdentifier> argsNames = new ArrayList<>();
List<CQL3Type.Raw> argsTypes = new ArrayList<>();
std::vector<shared_ptr<cql3::column_identifier>> arg_names;
std::vector<shared_ptr<cql3_type::raw>> arg_types;
bool called_on_null_input = false;
}
: K_CREATE (K_OR K_REPLACE { orReplace = true; })?
((K_NON { deterministic = false; })? K_DETERMINISTIC)?
K_FUNCTION
(K_IF K_NOT K_EXISTS { ifNotExists = true; })?
: K_CREATE
// "OR REPLACE" and "IF NOT EXISTS" cannot be used together
((K_OR K_REPLACE { or_replace = true; } K_FUNCTION)
| (K_FUNCTION K_IF K_NOT K_EXISTS { if_not_exists = true; })
| K_FUNCTION)
fn=functionName
'('
(
k=ident v=comparatorType { argsNames.add(k); argsTypes.add(v); }
( ',' k=ident v=comparatorType { argsNames.add(k); argsTypes.add(v); } )*
k=ident v=comparatorType { arg_names.push_back(k); arg_types.push_back(v); }
( ',' k=ident v=comparatorType { arg_names.push_back(k); arg_types.push_back(v); } )*
)?
')'
( (K_RETURNS K_NULL) | (K_CALLED { called_on_null_input = true; })) K_ON K_NULL K_INPUT
K_RETURNS rt = comparatorType
K_LANGUAGE language = IDENT
K_AS body = STRING_LITERAL
{ $expr = new CreateFunctionStatement(fn, $language.text.toLowerCase(), $body.text, deterministic, argsNames, argsTypes, rt, orReplace, ifNotExists); }
{ $expr = ::make_shared<cql3::statements::create_function_statement>(std::move(fn), to_lower($language.text), $body.text, std::move(arg_names), std::move(arg_types), std::move(rt), called_on_null_input, or_replace, if_not_exists); }
;
dropFunctionStatement returns [DropFunctionStatement expr]
dropFunctionStatement returns [shared_ptr<cql3::statements::drop_function_statement> expr]
@init {
boolean ifExists = false;
List<CQL3Type.Raw> argsTypes = new ArrayList<>();
boolean argsPresent = false;
bool if_exists = false;
std::vector<shared_ptr<cql3_type::raw>> arg_types;
bool args_present = false;
}
: K_DROP K_FUNCTION
(K_IF K_EXISTS { ifExists = true; } )?
(K_IF K_EXISTS { if_exists = true; } )?
fn=functionName
(
'('
(
v=comparatorType { argsTypes.add(v); }
( ',' v=comparatorType { argsTypes.add(v); } )*
v=comparatorType { arg_types.push_back(v); }
( ',' v=comparatorType { arg_types.push_back(v); } )*
)?
')'
{ argsPresent = true; }
{ args_present = true; }
)?
{ $expr = new DropFunctionStatement(fn, argsTypes, argsPresent, ifExists); }
{ $expr = ::make_shared<cql3::statements::drop_function_statement>(std::move(fn), std::move(arg_types), args_present, if_exists); }
;
#endif
/**
* CREATE KEYSPACE [IF NOT EXISTS] <KEYSPACE> WITH attr1 = value1 AND attr2 = value2;
@@ -1396,8 +1407,9 @@ columnOperation[operations_type& operations]
columnOperationDifferentiator[operations_type& operations, ::shared_ptr<cql3::column_identifier::raw> key]
: '=' normalColumnOperation[operations, key]
| '[' k=term ']' specializedColumnOperation[operations, key, k, false]
| '[' K_SCYLLA_TIMEUUID_LIST_INDEX '(' k=term ')' ']' specializedColumnOperation[operations, key, k, true]
| '[' k=term ']' collectionColumnOperation[operations, key, k, false]
| '.' field=ident udtColumnOperation[operations, key, field]
| '[' K_SCYLLA_TIMEUUID_LIST_INDEX '(' k=term ')' ']' collectionColumnOperation[operations, key, k, true]
;
normalColumnOperation[operations_type& operations, ::shared_ptr<cql3::column_identifier::raw> key]
@@ -1440,31 +1452,38 @@ normalColumnOperation[operations_type& operations, ::shared_ptr<cql3::column_ide
}
;
specializedColumnOperation[std::vector<std::pair<shared_ptr<cql3::column_identifier::raw>,
shared_ptr<cql3::operation::raw_update>>>& operations,
shared_ptr<cql3::column_identifier::raw> key,
shared_ptr<cql3::term::raw> k,
bool by_uuid]
collectionColumnOperation[operations_type& operations,
shared_ptr<cql3::column_identifier::raw> key,
shared_ptr<cql3::term::raw> k,
bool by_uuid]
: '=' t=term
{
add_raw_update(operations, key, make_shared<cql3::operation::set_element>(k, t, by_uuid));
}
;
udtColumnOperation[operations_type& operations,
shared_ptr<cql3::column_identifier::raw> key,
shared_ptr<cql3::column_identifier> field]
: '=' t=term
{
add_raw_update(operations, std::move(key), make_shared<cql3::operation::set_field>(std::move(field), std::move(t)));
}
;
columnCondition[conditions_type& conditions]
// Note: we'll reject duplicates later
: key=cident
( op=relationType t=term { conditions.emplace_back(key, cql3::column_condition::raw::simple_condition(t, *op)); }
( op=relationType t=term { conditions.emplace_back(key, cql3::column_condition::raw::simple_condition(t, {}, *op)); }
| K_IN
( values=singleColumnInValues { conditions.emplace_back(key, cql3::column_condition::raw::simple_in_condition(values)); }
| marker=inMarker { conditions.emplace_back(key, cql3::column_condition::raw::simple_in_condition(marker)); }
( values=singleColumnInValues { conditions.emplace_back(key, cql3::column_condition::raw::in_condition({}, {}, values)); }
| marker=inMarker { conditions.emplace_back(key, cql3::column_condition::raw::in_condition({}, marker, {})); }
)
| '[' element=term ']'
( op=relationType t=term { conditions.emplace_back(key, cql3::column_condition::raw::collection_condition(t, element, *op)); }
( op=relationType t=term { conditions.emplace_back(key, cql3::column_condition::raw::simple_condition(t, element, *op)); }
| K_IN
( values=singleColumnInValues { conditions.emplace_back(key, cql3::column_condition::raw::collection_in_condition(element, values)); }
| marker=inMarker { conditions.emplace_back(key, cql3::column_condition::raw::collection_in_condition(element, marker)); }
( values=singleColumnInValues { conditions.emplace_back(key, cql3::column_condition::raw::in_condition(element, {}, values)); }
| marker=inMarker { conditions.emplace_back(key, cql3::column_condition::raw::in_condition(element, marker, {})); }
)
)
)
@@ -1732,8 +1751,8 @@ basic_unreserved_keyword returns [sstring str]
| K_INITCOND
| K_RETURNS
| K_LANGUAGE
| K_NON
| K_DETERMINISTIC
| K_CALLED
| K_INPUT
| K_JSON
| K_CACHE
| K_BYPASS
@@ -1872,11 +1891,11 @@ K_STYPE: S T Y P E;
K_FINALFUNC: F I N A L F U N C;
K_INITCOND: I N I T C O N D;
K_RETURNS: R E T U R N S;
K_CALLED: C A L L E D;
K_INPUT: I N P U T;
K_LANGUAGE: L A N G U A G E;
K_NON: N O N;
K_OR: O R;
K_REPLACE: R E P L A C E;
K_DETERMINISTIC: D E T E R M I N I S T I C;
K_JSON: J S O N;
K_DEFAULT: D E F A U L T;
K_UNSET: U N S E T;
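The `to_lower()`/`convert_boolean_literal()` helpers added to the grammar's action code can be exercised on their own. A standalone equivalent (using a lambda over `unsigned char` instead of `&::tolower`, which is undefined for negative `char` values; `std::string` stands in for `sstring`):

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <string_view>

// Lower-case a string, as the grammar does before comparing keywords.
std::string to_lower(std::string_view s) {
    std::string lower_s(s.size(), '\0');
    std::transform(s.cbegin(), s.cend(), lower_s.begin(),
                   [](unsigned char c) { return std::tolower(c); });
    return lower_s;
}

// A boolean literal is true iff it spells "true", case-insensitively.
bool convert_boolean_literal(std::string_view s) {
    return to_lower(s) == "true";
}
```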


@@ -45,6 +45,7 @@
#include "cql3/lists.hh"
#include "cql3/maps.hh"
#include "cql3/sets.hh"
#include "cql3/user_types.hh"
#include "types/list.hh"
namespace cql3 {
@@ -54,7 +55,7 @@ abstract_marker::abstract_marker(int32_t bind_index, ::shared_ptr<column_specifi
, _receiver{std::move(receiver)}
{ }
void abstract_marker::collect_marker_specification(::shared_ptr<variable_specifications> bound_names) {
void abstract_marker::collect_marker_specification(lw_shared_ptr<variable_specifications> bound_names) {
bound_names->add(_bind_index, _receiver);
}
@@ -68,19 +69,22 @@ abstract_marker::raw::raw(int32_t bind_index)
::shared_ptr<term> abstract_marker::raw::prepare(database& db, const sstring& keyspace, ::shared_ptr<column_specification> receiver)
{
auto receiver_type = ::dynamic_pointer_cast<const collection_type_impl>(receiver->type);
if (receiver_type == nullptr) {
return ::make_shared<constants::marker>(_bind_index, receiver);
if (receiver->type->is_collection()) {
if (receiver->type->get_kind() == abstract_type::kind::list) {
return ::make_shared<lists::marker>(_bind_index, receiver);
} else if (receiver->type->get_kind() == abstract_type::kind::set) {
return ::make_shared<sets::marker>(_bind_index, receiver);
} else if (receiver->type->get_kind() == abstract_type::kind::map) {
return ::make_shared<maps::marker>(_bind_index, receiver);
}
assert(0);
}
if (receiver_type->get_kind() == abstract_type::kind::list) {
return ::make_shared<lists::marker>(_bind_index, receiver);
} else if (receiver_type->get_kind() == abstract_type::kind::set) {
return ::make_shared<sets::marker>(_bind_index, receiver);
} else if (receiver_type->get_kind() == abstract_type::kind::map) {
return ::make_shared<maps::marker>(_bind_index, receiver);
if (receiver->type->is_user_type()) {
return ::make_shared<user_types::marker>(_bind_index, receiver);
}
assert(0);
return shared_ptr<term>();
return ::make_shared<constants::marker>(_bind_index, receiver);
}
assignment_testable::test_result abstract_marker::raw::test_assignment(database& db, const sstring& keyspace, ::shared_ptr<column_specification> receiver) {


@@ -57,7 +57,7 @@ protected:
public:
abstract_marker(int32_t bind_index, ::shared_ptr<column_specification>&& receiver);
virtual void collect_marker_specification(::shared_ptr<variable_specifications> bound_names) override;
virtual void collect_marker_specification(lw_shared_ptr<variable_specifications> bound_names) override;
virtual bool contains_bind_marker() const override;


@@ -120,7 +120,7 @@ int32_t attributes::get_time_to_live(const query_options& options) {
return ttl;
}
void attributes::collect_marker_specification(::shared_ptr<variable_specifications> bound_names) {
void attributes::collect_marker_specification(lw_shared_ptr<variable_specifications> bound_names) {
if (_timestamp) {
_timestamp->collect_marker_specification(bound_names);
}

Some files were not shown because too many files have changed in this diff.