Compare commits

...

2248 Commits

Author SHA1 Message Date
Konstantin Osipov
fd293768e7 storage_proxy: do not touch all_replicas.front() if it's empty.
The list of all endpoints for a query can be empty if we have
replication_factor 0 or there are no live endpoints for this token.
Do not access all_replicas.front() in this case.

Fixes #5935.
Message-Id: <20200306192521.73486-2-kostja@scylladb.com>

(cherry picked from commit 9827efe554)
2020-06-22 18:29:15 +03:00
Gleb Natapov
22dfa48585 cql transport: do not log broken pipe error when a client closes its side of a connection abruptly
Fixes #5661

Message-Id: <20200615075958.GL335449@scylladb.com>
(cherry picked from commit 7ca937778d)
2020-06-21 13:09:22 +03:00
Benny Halevy
2f3d7f1408 cql3::util::maybe_quote: avoid stack overflow and fix quote doubling
The function was reimplemented to solve the following issues.
The custom implementation also improved its performance by
close to 19%.

Using regex_match("[a-z][a-z0-9_]*") may cause stack overflow on long input strings
as found with the limits_test.py:TestLimits.max_key_length_test dtest.

std::regex_replace does not replace in-place so no doubling of
quotes was actually done.

Add unit test that reproduces the crash without this fix
and tests various string patterns for correctness.

Note that defining the regex with std::regex::optimize
still ended up with stack overflow.

Fixes #5671

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 0329fe1fd1)
2020-06-21 13:07:21 +03:00
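The regex-free approach described above can be sketched roughly as follows. This is a hypothetical reimplementation, not Scylla's actual code (the real function also considers reserved keywords, omitted here): one linear scan decides whether quoting is needed, and quoting doubles embedded double quotes while copying, with no regex backtracking to overflow the stack on long inputs.

```cpp
#include <string>

// Hypothetical regex-free sketch of maybe_quote: a single linear scan
// decides whether the identifier needs quoting; quoting doubles any
// embedded double quotes while copying.
static std::string maybe_quote(const std::string& s) {
    // The unquoted form is allowed only for [a-z][a-z0-9_]*.
    auto is_simple = [&s] {
        if (s.empty() || s[0] < 'a' || s[0] > 'z') {
            return false;
        }
        for (char c : s) {
            bool ok = (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') || c == '_';
            if (!ok) {
                return false;
            }
        }
        return true;
    };
    if (is_simple()) {
        return s;
    }
    std::string out;
    out.reserve(s.size() + 2);
    out.push_back('"');
    for (char c : s) {
        out.push_back(c);
        if (c == '"') {
            out.push_back('"'); // double embedded quotes
        }
    }
    out.push_back('"');
    return out;
}
```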
Gleb Natapov
76a08df939 commitlog: fix size of a write used to zero a segment
Due to a bug, the entire segment is written in one huge write of 32 MB.
The idea was to split it into writes of 128 KB, so fix it.

Fixes #5857

Message-Id: <20200220102939.30769-1-gleb@scylladb.com>
(cherry picked from commit df2f67626b)
2020-06-21 13:03:05 +03:00
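The intended chunked zeroing can be illustrated with a small model. The names and the plain loop here are made up; the real code goes through the commitlog's file abstraction, so this only records the size of each write:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Illustrative model: zero a segment with writes of at most chunk_size
// bytes instead of one huge write, recording each write's size.
static size_t zero_in_chunks(uint64_t segment_size, uint64_t chunk_size,
                             std::vector<uint64_t>& write_sizes) {
    uint64_t pos = 0;
    while (pos < segment_size) {
        uint64_t n = std::min(chunk_size, segment_size - pos);
        write_sizes.push_back(n); // one write of at most chunk_size bytes
        pos += n;
    }
    return write_sizes.size();
}
```

With a 32 MB segment and 128 KB chunks this yields 256 equal writes.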
Amnon Heiman
6aa129d3b0 api/storage_service.cc: stream result of token_range
The response of the get token range API can become big, which can cause
large allocations and stalls.

This patch replaces the implementation so that it streams the results
using the HTTP streaming capabilities instead of serializing and sending
one big buffer.

Fixes #6297

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
(cherry picked from commit 7c4562d532)
2020-06-21 12:57:48 +03:00
Takuya ASADA
b4f781e4eb scylla_post_install.sh: fix operator precedence issue with multiple statements
In bash, 'A || B && C' is a problem: since && and || have the same
precedence, C will be evaluated even when A is true.
To avoid the issue we need to make B && C a single statement.

Fixes #5764

(cherry picked from commit b6988112b4)
2020-06-21 12:47:05 +03:00
Takuya ASADA
27594ca50e scylla_raid_setup: create missing directories
We need to create hints, view_hints, saved_caches directories
on RAID volume.

Fixes #5811

(cherry picked from commit 086f0ffd5a)
2020-06-21 12:45:27 +03:00
Rafael Ávila de Espíndola
0f2f0d65d7 configure: Reduce the dynamic linker path size
gdb has a SO_NAME_MAX_PATH_SIZE of 512, so we use that as the path
size.

Fixes: #6494

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200528202741.398695-2-espindola@scylladb.com>
(cherry picked from commit aa778ec152)
2020-06-21 12:29:16 +03:00
Tomasz Grabiec
31c2f8a3ae row_cache: Fix undefined behavior on key linearization
This is relevant only when using partition or clustering keys whose
in-memory representation is larger than 12.8 KB (10% of the
LSA segment size).

There are several places in the code (cache, background garbage
collection) which may need to linearize keys in order to perform key
comparison, but this is not done safely:

 1) the code does not run with the LSA region locked, so pointers may
get invalidated on linearization if it needs to reclaim memory. This
is fixed by running the code inside an allocating section.

 2) LSA region is locked, but the scope of
with_linearized_managed_bytes() encloses the allocating section. If
allocating section needs to reclaim, linearization context will
contain invalidated pointers. The fix is to reorder the scopes so
that linearization context lives within an allocating section.

Example of 1 can be found in
range_populating_reader::handle_end_of_stream() where it performs a
lookup:

  auto prev = std::prev(it);
  if (prev->key().equal(*_cache._schema, *_last_key->_key)) {
     it->set_continuous(true);

but handle_end_of_stream() is not invoked under allocating section.

Example of 2 can be found in mutation_cleaner_impl::merge_some() where
it does:

  return with_linearized_managed_bytes([&] {
  ...
    return _worker_state->alloc_section(region, [&] {

Fixes #6637.
Refs #6108.

Tests:

  - unit (all)

Message-Id: <1592218544-9435-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit e81fc1f095)
2020-06-21 11:58:59 +03:00
Yaron Kaikov
ec12331f11 release: prepare for 3.3.4 2020-06-15 21:19:02 +03:00
Avi Kivity
ccc463b5e5 tools: toolchain: regenerate for gnutls 3.6.14
CVE-2020-13777.

Fixes #6627.

Toolchain source image registry disambiguated due to tighter podman defaults.
2020-06-15 08:05:58 +03:00
Calle Wilund
4a9676f6b7 gms::inet_address: Fix sign extension error in custom address formatting
Fixes #5808

It seems some GCC versions will generate the code as sign-extending.
Mine does not, but this should be more correct anyhow.

Added small stringify test to serialization_test for inet_address

(cherry picked from commit a14a28cdf4)
2020-06-09 20:16:50 +03:00
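The sign-extension pitfall the commit describes can be reproduced in isolation. This helper is illustrative, not the actual gms::inet_address formatting code: on platforms where char is signed, byte 0xff promotes to -1 and %x would print ffffffff; casting through uint8_t first keeps the value in 0..255.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Illustrative helper: format one raw address byte without sign
// extension by widening through uint8_t before the integer promotion.
static std::string byte_to_hex(char b) {
    char buf[16];
    std::snprintf(buf, sizeof(buf), "%02x",
                  static_cast<unsigned>(static_cast<uint8_t>(b)));
    return buf;
}
```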
Takuya ASADA
aaf4989c31 aws: update enhanced networking supported instance list
Sync enhanced networking supported instance list to latest one.

Reference: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html

Fixes #6540

(cherry picked from commit 969c4258cf)
2020-06-09 16:03:00 +03:00
Asias He
b29f954f20 gossip: Make is_safe_for_bootstrap more strict
Consider

1. Start n1, n2 in the cluster
2. Stop n2 and delete all data for n2
3. Start n2 to replace itself with replace_address_first_boot: n2
4. Kill n2 before n2 finishes the replace operation
5. Remove replace_address_first_boot: n2 from scylla.yaml of n2
6. Delete all data for n2
7. Start n2

At step 7, n2 will be allowed to bootstrap as a new node, because the
application state of n2 in the cluster is HIBERNATE which is not
rejected in the check of is_safe_for_bootstrap. As a result, n2 will
replace n2 with different tokens and a different host_id, as if the
old n2 node was removed from the cluster silently.

Fixes #5172

(cherry picked from commit cdcedf5eb9)
2020-05-25 14:30:53 +03:00
Eliran Sinvani
5546d5df7b Auth: return correct error code when role is not found
Scylla returns the wrong error code (0000 - server internal error)
in response to attempts to perform authentication/authorization
operations that involve a non-existing role.
This commit changes those cases to return error code 2200 (invalid
query), which is the correct one and also the one that Cassandra
returns.
Tests:
    Unit tests (Dev)
    All auth and auth_role dtests

(cherry picked from commit ce8cebe34801f0ef0e327a32f37442b513ffc214)

Fixes #6363.
2020-05-25 12:58:38 +03:00
Amnon Heiman
541c29677f storage_service: get_range_to_address_map prevent use after free
The implementation of get_range_to_address_map has a default behaviour:
when given an empty keyspace, it uses the first non-system keyspace
("first" here is basically just an arbitrary keyspace).

The current implementation has two issues. First, it uses a reference to
a string that is held on the stack of another function. In other words,
there is a use-after-free; it is not clear why we never hit it.

Second, it calls get_non_system_keyspaces twice. Though this is not
a bug, it is redundant (get_non_system_keyspaces uses a loop, so calling
that function does have a cost).

This patch solves both issues. First, it changes the implementation to
hold a string instead of a reference to a string.

Second, it stores the results from get_non_system_keyspaces and reuses
them; this is more efficient and keeps the returned values on the local
stack.

Fixes #6465

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
(cherry picked from commit 69a46d4179)
2020-05-25 12:48:48 +03:00
Hagit Segev
06f18108c0 release: prepare for 3.3.3 2020-05-24 23:28:07 +03:00
Tomasz Grabiec
90002ca3d2 sstables: index_reader: Fix overflow when calculating promoted index end
When the index file is larger than 4 GB, the offset calculation will
overflow uint32_t and _promoted_index_end will be too small.

As a result, the promoted_index_size calculation will underflow and the
rest of the page will be interpreted as a promoted index.

The partitions which are in the remainder of the index page will not
be found by single-partition queries.

Data is not lost.

Introduced in 6c5f8e0eda.

Fixes #6040
Message-Id: <20200521174822.8350-1-tgrabiec@scylladb.com>

(cherry picked from commit a6c87a7b9e)
2020-05-24 09:46:11 +03:00
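The overflow class described above can be demonstrated in a few lines. The field names are illustrative; only the width of the arithmetic matters:

```cpp
#include <cstdint>

// Bug sketch: computing a file offset in 32-bit arithmetic wraps once
// the index file grows past 4 GB.
static uint64_t promoted_index_end_buggy(uint64_t offset, uint32_t size) {
    uint32_t end = static_cast<uint32_t>(offset) + size; // wraps past 4 GB
    return end;
}

// Fix sketch: do the arithmetic in 64 bits, so no wrap occurs.
static uint64_t promoted_index_end_fixed(uint64_t offset, uint32_t size) {
    return offset + size;
}
```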
Rafael Ávila de Espíndola
da23902311 repair: Make sure sinks are always closed
In a recent next failure I got the following backtrace

    function=function@entry=0x270360 "seastar::rpc::sink_impl<Serializer, Out>::~sink_impl() [with Serializer = netw::serializer; Out = {repair_row_on_wire_with_cmd}]") at assert.c:101
    at ./seastar/include/seastar/core/shared_ptr.hh:463
    at repair/row_level.cc:2059

This patch changes a few functions to use finally to make sure the sink
is always closed.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200515202803.60020-1-espindola@scylladb.com>
(cherry picked from commit 311fbe2f0a)

Ref #6414
2020-05-20 09:00:57 +03:00
Asias He
2b0dc21f97 repair: Fix race between write_end_of_stream and apply_rows
Consider: n1, n2, n1 is the repair master, n2 is the repair follower.

=== Case 1 ===
1) n1 sends missing rows {r1, r2} to n2
2) n2 runs apply_rows_on_follower to apply rows, e.g., {r1, r2}, r1
   is written to sstable, r2 is not written yet, r1 belongs to
   partition 1, r2 belongs to partition 2. It yields after row r1 is
   written.
   data: partition_start, r1
3) n1 sends repair_row_level_stop to n2 because an error has happened on n1
4) n2 calls wait_for_writer_done() which in turn calls write_end_of_stream()
   data: partition_start, r1, partition_end
5) Step 2 resumes to apply the rows.
   data: partition_start, r1, partition_end, partition_end, partition_start, r2

=== Case 2 ===
1) n1 sends missing rows {r1, r2} to n2
2) n2 runs apply_rows_on_follower to apply rows, e.g., {r1, r2}, r1
   is written to sstable, r2 is not written yet, r1 belongs to partition
   1, r2 belongs to partition 2. It yields after partition_start for r2
   is written but before _partition_opened is set to true.
   data: partition_start, r1, partition_end, partition_start
3) n1 sends repair_row_level_stop to n2 because an error has happened on n1.
4) n2 calls wait_for_writer_done() which in turn calls write_end_of_stream().
   Since _partition_opened[node_idx] is false, partition_end is skipped,
   end_of_stream is written.
   data: partition_start, r1, partition_end, partition_start, end_of_stream

This causes unbalanced partition_start and partition_end in the stream
written to sstables.

To fix, serialize the write_end_of_stream and apply_rows with a semaphore.

Fixes: #6394
Fixes: #6296
Fixes: #6414
(cherry picked from commit b2c4d9fdbc)
2020-05-20 08:22:05 +03:00
Piotr Dulikowski
b544691493 hinted handoff: don't keep positions of old hints in rps_set
When sending hints from one file, rps_set field in send_one_file_ctx
keeps track of commitlog positions of hints that are being currently
sent, or have failed to be sent. At the end of the operation, if sending
of some hints failed, we will choose position of the earliest hint that
failed to be sent, and will retry sending that file later, starting from
that position. This position is stored in _last_not_complete_rp.

Usually, this set has a bounded size, because we impose a limit of at
most 128 hints being sent concurrently. Because we do not attempt to
send any more hints after a failure is detected, rps_set should not have
more than 128 elements at a time.

Due to a bug, commitlog positions of old hints (older than
gc_grace_seconds of the destination table) were inserted into rps_set
but not removed after checking their age. This could cause rps_set to
grow very large when replaying a file with old hints.

Moreover, if the file mixed expired and non-expired hints (which could
happen if it had hints to two tables with different gc_grace_seconds),
and sending of some non-expired hints failed, then positions of expired
hints could influence the calculation of _last_not_complete_rp, and more hints
than necessary would be resent on the next retry.

This simple patch removes commitlog position of a hint from rps_set when
it is detected to be too old.

Fixes #6422

(cherry picked from commit 85d5c3d5ee)
2020-05-20 08:06:17 +03:00
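The shape of the fix can be modeled with a plain std::set standing in for rps_set, and ints standing in for commitlog replay positions. The on_hint helper is made up: the position is tracked while the hint is being considered, and the fix erases it again once the hint turns out to be too old.

```cpp
#include <set>

// Minimal model of the fix: expired hints must not leave their replay
// positions behind in the in-flight set.
static void on_hint(std::set<int>& rps_set, int rp, bool too_old) {
    rps_set.insert(rp);     // tracked while the hint is being considered
    if (too_old) {
        rps_set.erase(rp);  // fix: drop positions of expired hints
    }
}
```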
Piotr Dulikowski
d420b06844 hinted handoff: remove discarded hint positions from rps_set
Related commit: 85d5c3d

When attempting to send a hint, an exception might occur that results in
that hint being discarded (e.g. keyspace or table of the hint was
removed).

When such an exception is thrown, the position of the hint will already
be stored in rps_set. We are only allowed to retain positions of hints
that failed to be sent and need to be retried later. Dropping a hint is
not an error, therefore its position should be removed from rps_set - but
the current logic does not do that.

Because of that bug, hint files with many discardable hints might cause
rps_set to grow large when the file is replayed. Furthermore, leaving
positions of such hints in rps_set might cause more hints than necessary
to be re-sent if some non-discarded hints fail to be sent.

This commit fixes the problem by removing positions of discarded hints
from rps_set.

Fixes #6433

(cherry picked from commit 0c5ac0da98)
2020-05-20 08:04:10 +03:00
Avi Kivity
b3a2cb2f68 Update seastar submodule
* seastar 0ebd89a858...30f03aeba9 (1):
  > timer: add scheduling_group awareness

Fixes #6170.
2020-05-10 18:39:20 +03:00
Hagit Segev
c8c057f5f8 release: prepare for 3.3.2 2020-05-10 18:16:28 +03:00
Gleb Natapov
038bfc925c storage_proxy: limit read repair only to replicas that answered during speculative reads
The speculative reader has more targets than needed for the CL. In case
there is a digest mismatch, the repair runs between all of them, but that
violates the provided CL. The patch makes it so that repair runs only
between replicas that answered (there will be CL of them).

Fixes #6123

Reviewed-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20200402132245.GA21956@scylladb.com>
(cherry picked from commit 36a24bbb70)
2020-05-07 19:48:37 +03:00
Mike Goltsov
13a4e7db83 fix error in fstrim service (scylla_util.py)
On a CentOS 7 machine:

fstrim.timer is not enabled, only unmasked, due to scylla_fstrim_setup on installation.
When trying to run the scylla-fstrim service manually you get this error:

Traceback (most recent call last):
  File "/opt/scylladb/scripts/libexec/scylla_fstrim", line 60, in <module>
    main()
  File "/opt/scylladb/scripts/libexec/scylla_fstrim", line 44, in main
    cfg = parse_scylla_dirs_with_default(conf=args.config)
  File "/opt/scylladb/scripts/scylla_util.py", line 484, in parse_scylla_dirs_with_default
    if key not in y or not y[k]:
NameError: name 'k' is not defined

It is caused by an error in scylla_util.py.

Fixes #6294.

(cherry picked from commit 068bb3a5bf)
2020-05-07 19:45:50 +03:00
Juliusz Stasiewicz
727d6cf8f3 atomic_cell: special rule for printing counter cells
Until now, attempts to print a counter update cell would end up
calling abort() because `atomic_cell_view::value()` has no
specialized visitor for `imr::pod<int64_t>::basic_view<is_mutable>`,
i.e. the counter update IMR type. Such a visitor is not easy to write
if we want to intercept counters only (and not all int64_t values).

Anyway, a linearized byte representation of a counter cell would not
be helpful without knowing whether it consists of counter shards or
a counter update (delta) - and this must be known upon `deserialize`.

This commit introduces a simple approach: it determines the cell type at
a high level (from `atomic_cell_view`) and prints counter contents via
`counter_cell_view` or `atomic_cell_view::counter_update_value()`.

Fixes #5616

(cherry picked from commit 0ea17216fe)
2020-05-07 19:40:47 +03:00
Tomasz Grabiec
6d6d7b4abe sstables: Release reserved space for sharding metadata
The intention of the code was to clear the sharding metadata
chunked_vector so that it doesn't bloat memory.

The type of c is `chunked_vector*`. Assigning `{}`
clears the pointer while the intended behavior was to reset the
`chunked_vector` instance. The original instance is left unmodified
with all its reserved space.

Because of this, the previous fix had no effect, because token ranges
are stored entirely inline and popping them doesn't release memory.

Fixes #4951

Tests:
  - sstable_mutation_test (dev)
  - manual using scylla binary on customer data on top of 2019.1.5

Reviewed-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <1584559892-27653-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 5fe626a887)
2020-05-07 19:06:22 +03:00
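The pointer-versus-instance confusion is easy to reproduce with std::vector standing in for chunked_vector: assigning `{}` to the pointer nulls the pointer and leaves the vector's reserved space alive, while resetting the instance itself (e.g. `*c = {}`, or the swap idiom below) releases the capacity.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Minimal reproduction of the bug class, with std::vector standing in
// for chunked_vector. Returns {capacity after the buggy reset,
// capacity after the real reset}.
static std::pair<size_t, size_t> demo() {
    std::vector<int> v;
    v.reserve(1024);
    std::vector<int>* c = &v;

    c = {};                          // bug: only the pointer is cleared
    (void)c;
    size_t after_bug = v.capacity(); // reservation still held

    std::vector<int>{}.swap(v);      // fix: reset the instance itself
    size_t after_fix = v.capacity(); // reserved space released

    return {after_bug, after_fix};
}
```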
Tomasz Grabiec
28f974b810 Merge "Don't return stale data by properly invalidating row cache after cleanup" from Raphael
Row cache needs to be invalidated whenever data in sstables
changes. Cleanup removes data from sstables which doesn't belong to
the node anymore, which means cache must be invalidated on cleanup.
Currently, stale data can be returned when a node re-owns ranges whose
data is still stored in the node's row cache, because cleanup didn't
invalidate the cache.

Fixes #4446.

tests:
- unit tests (dev mode)
- dtests:
    update_cluster_layout_tests.py:TestUpdateClusterLayout.simple_decommission_node_2_test
    cleanup_test.py

(cherry picked from commit d0b6be0820)
2020-05-07 16:24:51 +03:00
Piotr Sarna
5fdadcaf3b network_topology_strategy: validate integers
In order to prevent users from creating a network topology
strategy instance with invalid inputs, it's not enough to use
std::stol() on the input: a string "3abc" still returns the number '3',
but will later confuse cqlsh and other drivers, when they ask for
topology strategy details.
The error message is now more human readable, since for incorrect
numeric inputs it used to return a rather cryptic message:
    ServerError: stol()
This commit fixes the issue and comes with a simple test.

Fixes #3801
Tests: unit(dev)
Message-Id: <7aaae83d003738f047d28727430ca0a5cec6b9c6.1583478000.git.sarna@scylladb.com>

(cherry picked from commit 5b7a35e02b)
2020-05-07 16:24:49 +03:00
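A strict variant of the parse can be sketched like this. parse_rf_strict is a hypothetical helper, not Scylla's actual code; the point is that std::stol's pos out-parameter tells you whether the whole string was consumed, so "3abc" can be rejected instead of silently parsing as 3.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical strict parse: accept the value only if std::stol
// consumed the entire input string.
static long parse_rf_strict(const std::string& s) {
    std::size_t pos = 0;
    long v = std::stol(s, &pos); // throws std::invalid_argument for "abc"
    if (pos != s.size()) {
        throw std::invalid_argument("invalid replication factor: " + s);
    }
    return v;
}
```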
Pekka Enberg
a960394f27 scripts/jobs: Keep memory reserve when calculating parallelism
The "jobs" script is used to determine the amount of compilation
parallelism on a machine. It attempts to ensure each GCC process has at
least 4 GB of memory per core. However, in the worst case scenario, we
could end up having the GCC processes take up all the system memory,
forcin swapping or OOM killer to kick in. For example, on a 4 core
machine with 16 GB of memory, this worst case scenario seems easy to
trigger in practice.

Fix up the problem by keeping a 1 GB of memory reserve for other
processes and calculating parallelism based on that.

Message-Id: <20200423082753.31162-1-penberg@scylladb.com>
(cherry picked from commit 7304a795e5)
2020-05-04 19:01:54 +03:00
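The resulting formula can be sketched as follows. This is a model of the description above, not the script itself; the constants mirror the commit message:

```cpp
#include <cstdint>

// Model of the parallelism calculation: keep 1 GB in reserve for the
// rest of the system, give each compiler job 4 GB, and stay between one
// job and the core count.
static unsigned jobs(uint64_t mem_bytes, unsigned cores) {
    const uint64_t reserve = 1ull << 30; // 1 GB kept for other processes
    const uint64_t per_job = 4ull << 30; // 4 GB per GCC process
    uint64_t usable = mem_bytes > reserve ? mem_bytes - reserve : 0;
    uint64_t by_mem = usable / per_job;
    uint64_t n = by_mem < cores ? by_mem : cores;
    return n > 0 ? static_cast<unsigned>(n) : 1;
}
```

On the 4-core, 16 GB example from the commit message, the reserve leaves 15 GB usable, which caps parallelism at 3 jobs instead of 4.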
Piotr Sarna
3216a1a70a alternator: fix signature timestamps
Generating timestamps for auth signatures used a non-thread-safe
::gmtime function instead of thread-safe ::gmtime_r.

Tests: unit(dev)
Fixes #6345

(cherry picked from commit fb7fa7f442)
2020-05-04 17:08:13 +03:00
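The thread-safety difference can be shown with a small helper. The SigV4-style timestamp format here is illustrative; the key point is that ::gmtime returns a pointer to shared static storage (so concurrent callers can corrupt each other's result), while ::gmtime_r (POSIX) fills a caller-provided struct tm.

```cpp
#include <cstring> // strcmp, for callers comparing the result
#include <ctime>

// Thread-safe timestamp formatting via gmtime_r: the broken-down time
// lives in a caller-owned buffer instead of shared static storage.
static void format_timestamp(std::time_t t, char out[17]) {
    struct tm tm_buf {};
    gmtime_r(&t, &tm_buf);                             // thread-safe variant
    std::strftime(out, 17, "%Y%m%dT%H%M%SZ", &tm_buf); // 16 chars + NUL
}
```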
Avi Kivity
5a7fd41618 Merge 'Fix hang in multishard_writer' from Asias
"
This series fixes a hang in multishard_writer when an error happens. It contains:
- multishard_writer: Abort the queue attached to consumers when producer fails
- repair: Fix hang when the writer is dead

Fixes #6241
Refs: #6248
"

* asias-stream_fix_multishard_writer_hang:
  repair: Fix hang when the writer is dead
  mutation_writer_test: Add test_multishard_writer_producer_aborts
  multishard_writer: Abort the queue attached to consumers when producer fails

(cherry picked from commit 8925e00e96)
2020-05-01 20:13:00 +03:00
Raphael S. Carvalho
dd24ba7a62 api/service: fix segfault when taking a snapshot without keyspace specified
If no keyspace is specified when taking a snapshot, there will be a segfault
because keynames is unconditionally dereferenced. Let's return an error,
because a keyspace must be specified when column families are specified.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200427195634.99940-1-raphaelsc@scylladb.com>
(cherry picked from commit 02e046608f)

Fixes #6336.
2020-04-30 12:57:14 +03:00
Avi Kivity
204f6dd393 Update seastar submodule
* seastar a0bdc6cd85...0ebd89a858 (1):
  > http server: fix "Date" header format

Fixes #6253.
2020-04-26 19:31:44 +03:00
Nadav Har'El
b1278adc15 alternator: unzero "scylla_alternator_total_operations" metric
In commit 388b492040, which was only supposed
to move around code, we accidentally lost the line which does

    _executor.local()._stats.total_operations++;

So after that commit this counter was always zero...
This patch restores the line incrementing this counter.

Arguably, this counter is not very important - a user can also calculate
this number by summing up all the counters in the scylla_alternator_operation
array (these are counters for individual types of operations). Nevertheless,
as long as we do export a "scylla_alternator_total_operations" metric,
we need to correctly calculate it and can't leave it zero :-)

Fixes #5836

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200219162820.14205-1-nyh@scylladb.com>
(cherry picked from commit b8aed18a24)
2020-04-19 19:07:31 +03:00
Botond Dénes
ee9677ef71 schema: schema(): use std::stable_sort() to sort key columns
When multiple key columns (clustering or partition) are passed to
the schema constructor, all having the same column id, the expectation
is that these columns will retain the order in which they were passed to
`schema_builder::with_column()`. Currently, however, this is not
guaranteed, as the schema constructor sorts key columns by column id with
`std::sort()`, which doesn't guarantee that equally comparing elements
retain their order. This can be an issue for indexes, the schemas of
which are built independently on each node. If there is any room for
variance in the key column order, this can result in different
nodes having incompatible schemas for the same index.
The fix is to use `std::stable_sort()` which guarantees that the order
of equally comparing elements won't change.

This is a suspected cause of #5856, although we don't have hard proof.

Fixes: #5856
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
[avi: upgraded "Refs" to "Fixes", since we saw that std::sort() becomes
      unstable at 17 elements, and the failing schema had a
      clustering key with 23 elements]
Message-Id: <20200417121848.1456817-1-bdenes@scylladb.com>
(cherry picked from commit a4aa753f0f)
2020-04-19 18:19:05 +03:00
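A minimal model of the fix, with the `column` struct standing in for Scylla's column_definition: with std::stable_sort, columns sharing a column id keep the order in which they were added via with_column(), which std::sort does not guarantee.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Stand-in for column_definition: id is the sort key, and equal-id
// columns must keep their insertion order.
struct column {
    int id;
    std::string name;
};

// The fix: stable_sort guarantees equally comparing elements retain
// their relative order.
static void sort_columns(std::vector<column>& cols) {
    std::stable_sort(cols.begin(), cols.end(),
                     [](const column& a, const column& b) { return a.id < b.id; });
}
```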
Nadav Har'El
2060e361cf materialized views: fix corner case of view updates used by Alternator
While CQL does not allow creation of a materialized view with more than one
base regular column in the view's key, in Alternator we do allow this - both
partition and clustering key may be a base regular column. We had a bug in
the logic handling this case:

If the new base row is missing a value for *one* of the view key columns,
we shouldn't create a view row. Similarly, if the existing base row was
missing a value for *one* of the view key columns, a view row does not
exist and doesn't need to be deleted.  This was done incorrectly, and made
decisions based on just one of the key columns, and the logic is now
fixed (and I think, simplified) in this patch.

With this patch, the Alternator test which previously failed because of
this problem now passes. The patch also includes new tests in the existing
C++ unit test test_view_with_two_regular_base_columns_in_key. This test
was already supposed to be testing various cases of two-new-key-column
updates, but missed the cases explained above. These new tests failed
badly before this patch - some of them had clean write errors, others
caused crashes. With this patch, they pass.

Fixes #6008.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200312162503.8944-1-nyh@scylladb.com>
(cherry picked from commit 635e6d887c)
2020-04-19 15:24:19 +03:00
Hagit Segev
6f939ffe19 release: prepare for 3.3.1 2020-04-18 00:23:31 +03:00
Kamil Braun
69105bde8a sstables: freeze types nested in collection types in legacy sstables
Some legacy `mc` SSTables (created in Scylla 3.0) may contain incorrect
serialization headers, which don't wrap frozen UDTs nested inside collections
with the FrozenType<...> tag. When reading such an SSTable,
Scylla would detect a mismatch between the schema saved in schema
tables (which correctly wraps UDTs in the FrozenType<...> tag) and the schema
from the serialization header (which doesn't have these tags).

SSTables created in Scylla versions 3.1 and above, in particular in
Scylla versions that contain this commit, create correct serialization
headers (which wrap UDTs in the FrozenType<...> tag).

This commit does two things:
1. for all SSTables created after this commit, include a new feature
   flag, CorrectUDTsInCollections, presence of which implies that frozen
   UDTs inside collections have the FrozenType<...> tag.
2. when reading a Scylla SSTable without the feature flag, we assume that UDTs
   nested inside collections are always frozen, even if they don't have
   the tag. This assumption is safe to be made, because at the time of
   this commit, Scylla does not allow non-frozen (multi-cell) types inside
   collections or UDTs, and because of point 1 above.

There is one edge case not covered: if we don't know whether the SSTable
comes from Scylla or from C*. In that case we won't make the assumption
described in 2. Therefore, if we get a mismatch between schema and
serialization headers of a table which we couldn't confirm to come from
Scylla, we will still reject the table. If any user encounters such an
issue (unlikely), we will have to use another solution, e.g. using a
separate tool to rewrite the SSTable.

Fixes #6130.

(cherry picked from commit 3d811e2f95)
2020-04-17 09:12:28 +03:00
Kamil Braun
e09e9a5929 sstables: move definition of column_translation::state::build to a .cc file
Ref #6130
2020-04-17 09:12:28 +03:00
Piotr Sarna
2308bdbccb alternator: use partition tombstone if there's no clustering key
As @tgrabiec helpfully pointed out, creating a row tombstone
for a table which does not have a clustering key in its schema
creates something that looks like an open-ended range tombstone.
That's problematic for KA/LA sstable formats, which are incapable
of writing such tombstones, so a workaround is provided
in order to allow using KA/LA in alternator.

Fixes #6035
Cherry-picked from 0a2d7addc0
2020-04-16 12:14:10 +02:00
Asias He
a2d39c9a2e gossip: Add an option to force gossip generation
Consider 3 nodes in the cluster, n1, n2, n3 with gossip generation
number g1, g2, g3.

n1, n2, n3 running scylla version with commit
0a52ecb6df (gossip: Fix max generation
drift measure)

One year later, the user wants to upgrade n1, n2, n3 to a new version.

When n3 does a rolling restart with the new version, n3 will use a new
generation number g3'. Because g3' - g2 > MAX_GENERATION_DIFFERENCE and
g3' - g1 > MAX_GENERATION_DIFFERENCE, n1 and n2 will reject n3's
gossip update and mark n3 as down.

Such unnecessary marking of a node as down can cause availability issues.
For example:

DC1: n1, n2
DC2: n3, n4

When n3 and n4 restart, n1 and n2 will mark n3 and n4 as down, which
causes the whole DC2 to be unavailable.

To fix, we can start the new node with a gossip generation within
MAX_GENERATION_DIFFERENCE.

Once all the nodes run a version with commit
0a52ecb6df, the option is no longer
needed.

Fixes #5164

(cherry picked from commit 743b529c2b)
2020-03-27 12:49:23 +01:00
Asias He
5fe2ce3bbe gossiper: Always use the new generation number
A user reported an issue where, after a node restart, the restarted node
is marked as DOWN by other nodes in the cluster while the node is up
and running normally.

Consider the following:

- n1, n2, n3 in the cluster
- n3 shutdown itself
- n3 send shutdown verb to n1 and n2
- n1 and n2 set n3 in SHUTDOWN status and force the heartbeat version to
  INT_MAX
- n3 restarts
- n3 sends gossip shadow rounds to n1 and n2, in
  storage_service::prepare_to_join,
- n3 receives a response from n1, in gossiper::handle_ack_msg; since
  _enabled = false and _in_shadow_round == false, n3 will apply the
  application state in fiber 1; fiber 1 finishes faster than fiber 2 and
  sets _in_shadow_round = false
- n3 receives a response from n2, in gossiper::handle_ack_msg; since
  _enabled = false and _in_shadow_round == false, n3 will apply the
  application state in fiber 2; fiber 2 yields
- n3 finishes the shadow round and continues
- n3 resets gossip endpoint_state_map with
  gossiper.reset_endpoint_state_map()
- n3 resumes fiber 2, apply application state about n3 into
  endpoint_state_map, at this point endpoint_state_map contains
  information including n3 itself from n2.
- n3 calls gossiper.start_gossiping(generation_number, app_states, ...)
  with new generation number generated correctly in
  storage_service::prepare_to_join, but in
  maybe_initialize_local_state(generation_nbr), it will not set new
  generation and heartbeat if the endpoint_state_map contains itself
- n3 continues with the old generation and heartbeat learned in fiber 2
- n3 continues the gossip loop, in gossiper::run,
  hbs.update_heart_beat() the heartbeat is set to the number starting
  from 0.
- n1 and n2 will not get updates from n3 because they use the same
  generation number but n1 and n2 have a larger heartbeat version
- n1 and n2 will mark n3 as down even if n3 is alive.

To fix, always use the new generation number.

Fixes: #5800
Backports: 3.0 3.1 3.2
(cherry picked from commit 62774ff882)
2020-03-27 12:49:20 +01:00
Piotr Sarna
aafa34bbad cql: fix qualifying indexed columns for filtering
When qualifying columns to be fetched for filtering, we also check
whether the target column is used as an index - in which case there's
no need to fetch it. However, the check incorrectly assumed
that any restriction is eligible for indexing, while this is currently
only true for EQ. The fix makes a more specific check and contains
many dynamic casts, but these will hopefully be gone once our
long-planned "restrictions rewrite" is done.
This commit comes with a test.

Fixes #5708
Tests: unit(dev)

(cherry picked from commit 767ff59418)
2020-03-22 09:00:51 +01:00
Hagit Segev
7ae2cdf46c release: prepare for 3.3.0 2020-03-19 21:46:44 +02:00
Hagit Segev
863f88c067 release: prepare for 3.3.rc3 2020-03-15 22:45:30 +02:00
Avi Kivity
90b4e9e595 Update seastar submodule
* seastar f54084c08f...a0bdc6cd85 (1):
  > tls: Fix race and stale memory use in delayed shutdown

Fixes #5759 (maybe)
2020-03-12 19:41:50 +02:00
Konstantin Osipov
434ad4548f locator: correctly select endpoints if RF=0
SimpleStrategy creates a list of endpoints by iterating over the set of
all configured endpoints for the given token, until we reach the
keyspace's replication factor.
There is a trivial coding bug: we first add at least one endpoint
to the list, and only then compare the list size with the replication
factor. If RF=0 this comparison never yields true.
Fix by moving the RF check before any endpoint is added to the
list.
Cassandra never had this bug since it uses a less fancy while()
loop.

Fixes #5962
Message-Id: <20200306193729.130266-1-kostja@scylladb.com>

(cherry picked from commit ac6f64a885)
2020-03-12 12:09:46 +02:00
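The control-flow fix can be sketched with simplified types (calculate_endpoints and int endpoints are stand-ins for the real SimpleStrategy logic): the RF check runs before any endpoint is inserted, so RF=0 yields an empty replica list instead of adding endpoints past the check.

```cpp
#include <cstddef>
#include <vector>

// Sketch of the fixed loop: test the replica count against RF before
// inserting, so RF=0 produces an empty list.
static std::vector<int> calculate_endpoints(const std::vector<int>& ring,
                                            std::size_t rf) {
    std::vector<int> replicas;
    for (int ep : ring) {
        if (replicas.size() >= rf) { // fixed: check before inserting
            break;
        }
        replicas.push_back(ep);
    }
    return replicas;
}
```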
Avi Kivity
cbbb15af5c logalloc: increase capacity of _regions vector outside reclaim lock
Reclaim consults the _regions vector, so we don't want it moving around while
allocating more capacity. For that we take the reclaim lock. However, that
can cause a false-positive OOM during startup:

1. all memory is allocated to LSA as part of priming (2baa16b371)
2. the _regions vector is resized from 64k to 128k, requiring a segment
   to be freed (plenty are free)
3. but reclaiming_lock is taken, so we cannot reclaim anything.

To fix, resize the _regions vector outside the lock.
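
The pattern can be illustrated with a minimal sketch (an assumed
simplification: the real code deals with LSA segments and a reclaim lock,
modeled here with std::mutex):

```cpp
// Grow the vector's capacity *before* taking the reclaim lock, so any
// allocation (which may need to reclaim a segment) happens while reclaim
// is still allowed. Illustrative only, not the logalloc code.
#include <cassert>
#include <mutex>
#include <vector>

struct tracker {
    std::mutex reclaim_lock;      // reclaim walks _regions under this lock
    std::vector<int> _regions;

    void add_region(int r) {
        // May allocate and therefore trigger reclaim: done unlocked.
        if (_regions.size() == _regions.capacity()) {
            _regions.reserve(_regions.empty() ? 16 : _regions.capacity() * 2);
        }
        std::lock_guard<std::mutex> guard(reclaim_lock);
        _regions.push_back(r);    // cannot reallocate now
    }
};
```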

Fixes #6003.
Message-Id: <20200311091217.1112081-1-avi@scylladb.com>

(cherry picked from commit c020b4e5e2)
2020-03-12 11:25:20 +02:00
Benny Halevy
3231580c05 dist/redhat: scylla.spec.mustache: set _no_recompute_build_ids
By default, `/usr/lib/rpm/find-debuginfo.sh` will tamper with
the binary's build-id when stripping its debug info, as it is passed
the `--build-id-seed <version>.<release>` option.

To prevent that we need to set the following macros as follows:
  unset `_unique_build_ids`
  set `_no_recompute_build_ids` to 1

Fixes #5881

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 25a763a187)
2020-03-09 15:21:50 +02:00
Piotr Sarna
62364d9dcd Merge 'cql3: do_execute_base_query: fix null deref ...
... when clustering key is unavailable' from Benny

This series fixes null pointer dereference seen in #5794

efd7efe cql3: generate_base_key_from_index_pk; support optional index_ck
7af1f9e cql3: do_execute_base_query: generate open-ended slice when clustering key is unavailable
7fe1a9e cql3: do_execute_base_query: fixup indentation

Fixes #5794

Branches: 3.3

Test: unit(dev) secondary_indexes_test:TestSecondaryIndexes.test_truncate_base(debug)

* bhalevy/fix-5794-generate_base_key_from_index_pk:
  cql3: do_execute_base_query: fixup indentation
  cql3: do_execute_base_query: generate open-ended slice when clustering key is unavailable
  cql3: generate_base_key_from_index_pk; support optional index_ck

(cherry picked from commit 4e95b67501)
2020-03-09 15:20:01 +02:00
Takuya ASADA
3bed8063f6 dist/debian: fix "unable to open node-exporter.service.dpkg-new" error
It seems like *.service is conflicting at install time because the file
is installed twice, both via debian/*.service and debian/scylla-server.install.

We don't need to use *.install, so we can just drop the line.

Fixes #5640

(cherry picked from commit 29285b28e2)
2020-03-03 12:40:39 +02:00
Yaron Kaikov
413fcab833 release: prepare for 3.3.rc2 2020-02-27 14:45:18 +02:00
Juliusz Stasiewicz
9f3c3036bf cdc: set TTLs on CDC log cells
Cells in CDC logs used to be created while completely neglecting
TTLs (the TTLs from `cdc = {...'ttl':600}`). This patch adds TTLs
to all cells; there are no row markers, so we need not set a TTL
there.

Fixes #5688

(cherry picked from commit 67b92c584f)
2020-02-26 18:12:55 +02:00
Benny Halevy
ff2e108a6d gossiper: do_stop_gossiping: copy live endpoints vector
It can be resized asynchronously by mark_dead.
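
The idiom amounts to iterating over a copy; a hedged sketch with
illustrative names:

```cpp
// Iterating over a snapshot keeps the loop safe even if the original
// vector is resized (invalidating iterators) by a concurrent task such
// as mark_dead. Illustrative, not the actual gossiper code.
#include <cassert>
#include <vector>

int stop_endpoints(const std::vector<int>& live_endpoints) {
    auto snapshot = live_endpoints;   // copy before any yield point
    int stopped = 0;
    for (int ep : snapshot) {
        (void)ep;                     // here: await shutdown of ep
        ++stopped;
    }
    return stopped;
}
```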

Fixes #5701

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200203091344.229518-1-bhalevy@scylladb.com>
(cherry picked from commit f45fabab73)
2020-02-26 13:00:11 +02:00
Gleb Natapov
ade788ffe8 commitlog: use commitlog IO scheduling class for segment zeroing
There may be other commitlog writes waiting for zeroing to complete, so
not using proper scheduling class causes priority inversion.

Fixes #5858.

Message-Id: <20200220102939.30769-2-gleb@scylladb.com>
(cherry picked from commit 6a78cc9e31)
2020-02-26 12:51:10 +02:00
Benny Halevy
1f8bb754d9 storage_service: drain_on_shutdown: unregister storage_proxy subscribers from local_storage_service
Match subscription done in main() and avoid cross shard access
to _lifecycle_subscribers vector.

Fixes #5385

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Acked-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200123092817.454271-1-bhalevy@scylladb.com>
(cherry picked from commit 5b0ea4c114)
2020-02-25 16:39:49 +02:00
Tomasz Grabiec
7b2eb09225 Merge fixes for use-after-frees related to shutdown of services
Backport of 884d5e2bcb and
4839ca8491.

Fixes crashes when scylla is stopped early during boot.

Merged from https://github.com/xemul/scylla/tree/br-mm-combined-fixes-for-3.3

Fixes #5765.
2020-02-25 13:34:01 +01:00
Pavel Emelyanov
d2293f9fd5 migration_manager: Abort and wait cluster upgrade waiters
The maybe_schedule_schema_pull waits for schema_tables_v3 to
become available. This is unsafe in case migration manager
goes away before the feature is enabled.

Fix this by subscribing to the feature with a feature::listener and
waiting on a condition variable in maybe_schedule_schema_pull.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-02-24 14:18:15 +03:00
Pavel Emelyanov
25b31f6c23 migration_manager: Abort and wait delayed schema pulls
The sleep is interrupted with the abort source, the "wait" part
is done with the existing _background_tasks gate. Also we need
to make sure the gate stays alive till the end of the function,
so make use of the async_sharded_service (migration manager is
already such).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-02-24 14:18:15 +03:00
Pavel Emelyanov
742a1ce7d6 storage_service: Unregister from gossiper notifications ... at all
This unregistration doesn't happen currently, but doesn't seem to
cause any problems in general, as on stop the gossiper is stopped and
nothing from it hits the storage_service.

However (!) if an exception pops up between the point where the
storage_service is subscribed to the gossiper and the point where the
drain_on_shutdown defer action is set up, then we _may_ get into the
following situation:

- main's stuff gets unrolled back
- gossiper is not stopped (drain_on_shutdown defer is not set up)
- migration manager is stopped (with deferred action in main)
- a notification comes from gossiper
    -> storage_service::on_change might want to pull schema with
       the help of local migration manager
    -> assert(local_is_initialized) strikes

Fix this by registering storage_service to the gossiper a bit earlier
(both are already initialized by that time) and setting up the unregister
defer right afterwards.

Test: unit(dev), manual start-stop
Bug: #5628

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200130190343.25656-1-xemul@scylladb.com>
2020-02-24 14:18:15 +03:00
Avi Kivity
4ca9d23b83 Revert "streaming: Do not invalidate cache if no sstable is added in flush_streaming_mutations"
This reverts commit bdc542143e. Exposes a data resurrection
bug (#5838).
2020-02-24 10:02:58 +02:00
Avi Kivity
9e97f3a9b3 Update seastar submodule
* seastar dd686552ff...f54084c08f (2):
  > reactor: fallback to epoll backend when fs.aio-max-nr is too small
  > util: move read_sys_file_as() from iotune to seastar header, rename read_first_line_as()

Fixes #5638.
2020-02-20 10:25:00 +02:00
Piotr Dulikowski
183418f228 hh: handle counter update hints correctly
This patch fixes a bug that appears because of an incorrect interaction
between counters and hinted handoff.

When a counter is updated on the leader, it sends mutations to other
replicas that contain all counter shards from the leader. If consistency
level is achieved but some replicas are unavailable, a hint with
mutation containing counter shards is stored.

When a hint's destination node is no longer a replica for it, the hint
is instead sent to all of its current replicas. Previously,
storage_proxy::mutate was used for that purpose. It was incorrect
because that function treats mutations for counter tables as mutations
containing only a delta (by how much to increase/decrease the counter).
These two types of mutations have different serialization format, so in
this case a "shards" mutation is reinterpreted as "delta" mutation,
which can cause data corruption to occur.

This patch backports `storage_proxy::mutate_hint_from_scratch`
function, which bypasses special handling of counter mutations and
treats them as regular mutations - which is the correct behavior for
"shards" mutations.

Refs #5833.
Backports: 3.1, 3.2, 3.3
Tests: unit(dev)
(cherry picked from commit ec513acc49)
2020-02-19 16:49:12 +02:00
Piotr Sarna
756574d094 db,view: fix generating view updates for partition tombstones
The update generation path must track and apply all tombstones,
both from the existing base row (if read-before-write was needed)
and for the new row. One such path contained an error, because
it assumed that if the existing row is empty, then the update
can be simply generated from the new row. However, lack of the
existing row can also be the result of a partition/range tombstone.
If that's the case, it needs to be applied, because it's entirely
possible that this partition tombstone also hides the new row.
Without taking the partition tombstone into account, creating
a future tombstone and inserting an out-of-order write before it
in the base table can result in ghost rows in the view table.
This patch comes with a test which was proven to fail before the
changes.

Branches 3.1,3.2,3.3
Fixes #5793

Tests: unit(dev)
Message-Id: <8d3b2abad31572668693ab585f37f4af5bb7577a.1581525398.git.sarna@scylladb.com>
(cherry picked from commit e93c54e837)
2020-02-16 20:26:28 +02:00
Rafael Ávila de Espíndola
a348418918 service: Add a lock around migration_notifier::_listeners
Before this patch the iterations over migration_notifier::_listeners
could race with listeners being added and removed.

The addition side is not modified, since it is common to add a
listener during construction and it would require a fairly big
refactoring. Instead, the iteration is modified to use indexes instead
of iterators so that it is still valid if another listener is added
concurrently.

For removal we use a rw lock, since removing an element invalidates
indexes too. There are only a few places that needed refactoring to
handle unregister_listener returning a future<>, so this is probably
OK.
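
The index-based iteration can be sketched as follows (an illustrative
simplification, not the actual migration_notifier code):

```cpp
// Index-based iteration stays valid if a listener is appended while the
// loop is suspended, because we re-check size() on each pass and copy the
// element before invoking it (a reallocation would invalidate references).
// Illustrative sketch only.
#include <cassert>
#include <functional>
#include <vector>

void notify_all(std::vector<std::function<void(int&)>>& listeners, int& state) {
    for (size_t i = 0; i < listeners.size(); ++i) {
        auto listener = listeners[i];  // copy: safe across reallocation
        listener(state);
    }
}
```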

Fixes #5541.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200120192819.136305-1-espindola@scylladb.com>
(cherry picked from commit 27bd3fe203)
2020-02-16 20:13:42 +02:00
Avi Kivity
06c0bd0681 Update seastar submodule
* seastar 3f3e117de3...dd686552ff (1):
  > perftune.py: Use safe_load() for fix arbitrary code execution

Fixes #5630.
2020-02-16 15:53:16 +02:00
Avi Kivity
223c300435 Point seastar submodule at scylla-seastar.git branch-3.3
This allows us to backport seastar patches to Scylla 3.3.
2020-02-16 15:51:46 +02:00
Gleb Natapov
ac8bef6781 commitlog: fix flushing an entry marked as "sync" in periodic mode
After 546556b71b we can have mixed writes into the commitlog:
some flush immediately, some do not. If a non-flushing write races with
a flushing one and becomes responsible for writing its buffer back into
the file, the flush will be skipped, which will cause the assert in
batch_cycle() to trigger since the flush position will not be advanced.
Fix that by checking whether the flush was skipped and, in that case,
explicitly flushing our file position.

Fixes #5670

Message-Id: <20200128145103.GI26048@scylladb.com>
(cherry picked from commit c654ffe34b)
2020-02-16 15:48:40 +02:00
Pavel Solodovnikov
68691907af lwt: fix handling of nulls in parameter markers for LWT queries
This patch affects the LWT queries with IF conditions of the
following form: `IF col in :value`, i.e. if the parameter
marker is used.

When executing a prepared query with a bound value
of `(None,)` (tuple with null, example for Python driver), it is
serialized not as NULL but as "empty" value (serialization
format differs in each case).

Therefore, Scylla deserializes the parameters in the request as
empty `data_value` instances, which are, in turn, translated
to non-empty `bytes_opt` with empty byte-string value later.

Account for this case too in the CAS condition evaluation code.
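
The distinction the fix has to handle can be sketched like this (a
hypothetical helper, not Scylla's actual evaluation code):

```cpp
// A missing value is a disengaged optional, while an "empty" serialized
// value is an engaged optional holding zero bytes. Both must compare as
// "no value" when evaluated against a null column. Hypothetical names.
#include <cassert>
#include <optional>
#include <string>

using bytes_opt = std::optional<std::string>;

bool matches_null(const bytes_opt& bound_value) {
    // Treat both NULL and the empty serialized form as matching null.
    return !bound_value.has_value() || bound_value->empty();
}
```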

Example of a problem this patch aims to fix:

Suppose we have a table `tbl` with a boolean field `test` and
INSERT a row with NULL value for the `test` column.

Then the following update query fails to apply due to the
error in IF condition evaluation code (assume `v=(null)`):
`UPDATE tbl SET test=false WHERE key=0 IF test IN :v`
returns false in `[applied]` column, but is expected to succeed.

Tests: unit(debug, dev), dtest(prepared stmt LWT tests at https://github.com/scylladb/scylla-dtest/pull/1286)

Fixes: #5710

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20200205102039.35851-1-pa.solodovnikov@scylladb.com>
(cherry picked from commit bcc4647552)
2020-02-16 15:29:28 +02:00
Avi Kivity
f59d2fcbf1 Merge "stop passing tracing state pointer in client_state" from Gleb
"
client_state is used simultaneously by many requests running in parallel,
while the tracing state pointer is per request. Both those facts do not sit
well together, and as a result the tracing state is sometimes overwritten
while still being used by an active request, which may cause an incorrect
trace or even a crash.
"

Fixes #5700.

Backported from 9f1f60fc38

* 'gleb/trace_fix_3.3_backport' of ssh://github.com/scylladb/seastar-dev:
  client_state: drop the pointer to a tracing state from client_state
  transport: pass tracing state explicitly instead of relying on it been in the client_state
  alternator: pass tracing state explicitly instead of relying on it been in the client_state
2020-02-16 15:23:41 +02:00
Asias He
bdc542143e streaming: Do not invalidate cache if no sstable is added in flush_streaming_mutations
The table::flush_streaming_mutations dates back to the days when streaming
data went to the memtable. After switching to the new streaming, data goes
to sstables directly during streaming, so the sstables generated in
table::flush_streaming_mutations will be empty.

It is unnecessary to invalidate the cache if no sstables are added. To
avoid unnecessary cache invalidation, which pokes holes in the cache, skip
calling _cache.invalidate() if the sstable set is empty.

The steps are:

- STREAM_MUTATION_DONE verb is sent when streaming is done with old or
  new streaming
- table::flush_streaming_mutations is called in the verb handler
- cache is invalidated for the streaming ranges

In summary, this patch will avoid a lot of cache invalidation for
streaming.

Backports: 3.0 3.1 3.2
Fixes: #5769
(cherry picked from commit 5e9925b9f0)
2020-02-16 15:16:24 +02:00
Botond Dénes
061a02237c row: append(): downgrade assert to on_internal_error()
This assert, added by 060e3f8, is supposed to make sure the invariant of
the append() is respected, in order to prevent building an invalid row.
The assert however proved to be too harsh, as it converts any bug
causing out-of-order clustering rows into cluster unavailability.
Downgrade it to on_internal_error(). This will still prevent corrupt
data from spreading in the cluster, without the unavailability caused by
the assert.
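
The difference can be sketched as follows (a hypothetical helper; not
Scylla's actual on_internal_error):

```cpp
// An assert aborts the whole node; an internal-error check can log and
// throw, failing only the current operation while still refusing to
// build an invalid row. Illustrative sketch only.
#include <cassert>
#include <stdexcept>
#include <string>

void on_internal_error_sketch(const std::string& msg) {
    // a real implementation would also log the error
    throw std::runtime_error("internal error: " + msg);
}

bool append_checked(int last_pos, int new_pos) {
    if (new_pos <= last_pos) {
        on_internal_error_sketch("out-of-order clustering row");
    }
    return true;
}
```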

Fixes: #5786
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200211083829.915031-1-bdenes@scylladb.com>
(cherry picked from commit 3164456108)
2020-02-16 15:12:46 +02:00
Gleb Natapov
35b6505517 client_state: drop the pointer to a tracing state from client_state
client_state is shared between requests while the tracing state is per
request. It is not safe to use the former as a container for the latter,
since a state can be overwritten prematurely by subsequent requests.

(cherry picked from commit 31cf2434d6)
2020-02-13 13:45:56 +02:00
Gleb Natapov
866c04dd64 transport: pass tracing state explicitly instead of relying on it been in the client_state
Multiple requests can use the same client_state simultaneously, so it is
not safe to use it as a container for a tracing state, which is per request.
Currently the next request may overwrite the tracing state of the previous
one, causing, in the best case, a wrong trace to be taken, or a crash if
the overwritten pointer is freed prematurely.

Fixes #5700

(cherry picked from commit 9f1f60fc38)
2020-02-13 13:45:56 +02:00
Gleb Natapov
dc588e6e7b alternator: pass tracing state explicitly instead of relying on it been in the client_state
Multiple requests can use the same client_state simultaneously, so it is
not safe to use it as a container for a tracing state, which is per
request. This is not yet an issue for the alternator, since it creates a
new client_state object for each request, but first of all it should not,
and second, the trace state will be dropped from the client_state by a
later patch.

(cherry picked from commit 38fcab3db4)
2020-02-13 13:45:56 +02:00
Takuya ASADA
f842154453 dist/debian: keep /etc/systemd .conf files on 'remove'
Since dpkg does not re-install conffiles when they are removed by the user,
we are currently missing dependencies.conf and sysconfdir.conf on rollback.
To prevent this, we need to stop running
'rm -rf /etc/systemd/system/scylla-server.service.d/' on 'remove'.

Fixes #5734

(cherry picked from commit 43097854a5)
2020-02-12 14:26:40 +02:00
Yaron Kaikov
b38193f71d dist/docker: Switch to 3.3 release repository (#5756)
Change the SCYLLA_REPO_URL variable to point to branch-3.3 instead of
master. This ensures that Docker image builds that don't specify the
variable build from the right repository by default.
2020-02-10 11:11:38 +02:00
Rafael Ávila de Espíndola
f47ba6dc06 lua: Handle nil returns correctly
This is a minimum backport to 3.3.

With this patch lua nil values are mapped to CQL null values instead
of producing an error.

Fixes #5667

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200203164918.70450-1-espindola@scylladb.com>
2020-02-09 18:55:42 +02:00
Hagit Segev
0d0c1d4318 release: prepare for 3.3.rc1 2020-02-09 15:55:24 +02:00
Takuya ASADA
9225b17b99 scylla_post_install.sh: fix 'integer expression expected' error
awk returns a float value on Debian, which causes the postinst script to
fail since we compare it as an integer value.
Replaced with sed + bash.

Fixes #5569

(cherry picked from commit 5627888b7c)
2020-02-04 14:30:04 +02:00
Gleb Natapov
00b3f28199 db/system_keyspace: use user memory limits for local.paxos table
Treat writes to local.paxos as user memory, as the number of writes is
dependent on the amount of user data written with LWT.

Fixes #5682

Message-Id: <20200130150048.GW26048@scylladb.com>
(cherry picked from commit b08679e1d3)
2020-02-02 17:36:52 +02:00
Rafael Ávila de Espíndola
1bbe619689 types: Fix encoding of negative varint
We would sometimes produce an unnecessary extra 0xff prefix byte.

The new encoding matches what cassandra does.

This was both an efficiency and a correctness issue, as using varint in a
key could produce different tokens.
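
The minimal-length two's-complement rule involved can be sketched as
follows (a hedged illustration of the encoding idea, not Scylla's actual
implementation):

```cpp
// Encode an int64 in two's complement, then drop redundant sign-extension
// bytes: a leading 0x00 before a byte with a clear sign bit, or a leading
// 0xff before a byte with a set sign bit. Keeping an unneeded 0xff is the
// kind of extra prefix byte the fix removes. Illustrative only.
#include <cassert>
#include <cstdint>
#include <vector>

std::vector<uint8_t> encode_varint(int64_t v) {
    std::vector<uint8_t> out;
    for (int shift = 56; shift >= 0; shift -= 8) {
        out.push_back(static_cast<uint8_t>(v >> shift));
    }
    while (out.size() > 1 &&
           ((out[0] == 0x00 && !(out[1] & 0x80)) ||
            (out[0] == 0xff && (out[1] & 0x80)))) {
        out.erase(out.begin());
    }
    return out;
}
```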

Fixes #5656

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
(cherry picked from commit c89c90d07f)
2020-02-02 16:00:58 +02:00
Avi Kivity
c36f71c783 test: make eventually() more patient
We use eventually() in tests to wait for eventually consistent data
to become consistent. However, we see spurious failures indicating
that we wait too little.

Increasing the timeout has a negative side effect in that tests that
fail will now take longer to do so. However, this negative side effect
is negligible compared to false-positive failures, since they throw away large
test efforts and sometimes require a person to investigate the problem,
only to conclude it is a false positive.

This patch therefore makes eventually() more patient, by a factor of
32.

Fixes #4707.
Message-Id: <20200130162745.45569-1-avi@scylladb.com>

(cherry picked from commit ec5b721db7)
2020-02-01 13:20:22 +02:00
Pekka Enberg
f5471d268b release: prepare for 3.3.rc0 2020-01-30 14:00:51 +02:00
Takuya ASADA
fd5c65d9dc dist/debian: Use tilde for release candidate builds
We need to add '~' to handle rcX version correctly on Debian variants
(merged at ae33e9f), but when we moved to relocated package we mistakenly
dropped the code, so add the code again.

Fixes #5641

(cherry picked from commit dd81fd3454)
2020-01-28 18:34:48 +02:00
Avi Kivity
3aa406bf00 tools: toolchain: dbuild: relax process limit in container
Docker restricts the number of processes in a container to some
limit it calculates. This limit turns out to be too low on large
machines, since we run multiple links in parallel, and each link
runs many threads.

Remove the limit by specifying --pids-limit -1. Since dbuild is
meant to provide a build environment, not a security barrier,
this is okay (the container is still restricted by host limits).

I checked that --pids-limit is supported by old versions of
docker and by podman.

Fixes #5651.
Message-Id: <20200127090807.3528561-1-avi@scylladb.com>

(cherry picked from commit 897320f6ab)
2020-01-28 18:14:01 +02:00
Piotr Sarna
c0253d9221 db,view: fix checking for secondary index special columns
A mistake in handling legacy checks for special 'idx_token' column
resulted in not recognizing materialized views backing secondary
indexes properly. The mistake is really a typo, but with bad
consequences - instead of checking the view schema for being an index,
we asked for the base schema, which is definitely not an index of
itself.

Branches 3.1,3.2 (asap)
Fixes #5621
Fixes #4744

(cherry picked from commit 9b379e3d63)
2020-01-21 23:32:11 +02:00
Avi Kivity
12bc965f71 atomic_cell: consistently use comma as separator in pretty-printers
The atomic_cell pretty printers use a mix of commas and semicolons.
This change makes them use commas everywhere, for consistency.
Message-Id: <20200116133327.2610280-1-avi@scylladb.com>
2020-01-16 17:26:33 +01:00
Nadav Har'El
1ed21d70dc merge: CDC: do mutation augmentation from storage proxy
Merged pull request https://github.com/scylladb/scylla/pull/5567
from Calle Wilund:

Fixes #5314

Instead of tying CDC handling into cql statement objects, this patch set
moves it to storage proxy, i.e. shared code for mutating stuff. This means
we automatically handle cdc for code paths outside cql (i.e. alternator).

It also adds api handling (though initially inefficient) for batch statements.

CDC is tied into storage proxy by giving the former a ref to the latter (per
shard). Initially this is not a constructor parameter, because right now we
have chicken and egg issues here. Hopefully, Pavels refactoring of migration
manager and notifications will untie these and this relationship can become
nicer.

The actual augmentation can (as stated above) be made much more efficient.
Hopefully, the stream management refactoring will deal with expensive stream
lookup, and eventually, we can maybe coalesce pre-image selects for batches.
However, that is left as an exercise for when deemed needed.

The augmentation API has an optional return value for a "post-image handler"
to be used iff returned after mutation call is finished (and successful).
It is not yet actually invoked from storage_proxy, but it is at least in the
call chain.
2020-01-16 17:12:56 +02:00
Avi Kivity
e677f56094 Merge "Enable general centos RPM (not only centos7)" from Hagit 2020-01-16 14:13:24 +02:00
Tomasz Grabiec
36d90e637e Merge "Relax migration manager dependencies" from Pavel Emalyanov
The set make dependencies between mm and other services cleaner,
in particular, after the set:

- the query processor no longer needs migration manager
  (which doesn't need query processor either)

- the database no longer needs migration manager, thus the mutual
  dependency between these two is dropped, only migration manager
  -> database is left

- the migration manager -> storage_service dependency is relaxed,
  one more patchset will be needed to remove it, thus dropping one
  more mutual dependency between them, only the storage_service
  -> migration manager will be left

- the migration manager is stopped on drain, but several more
  services need it on stop, thus causing use after free problems,
  in particular there's a caught bug when view builder crashes
  when unregistering from notifier list on stop. Fixed.

Tests: unit(dev)
Fixes: #5404
2020-01-16 12:12:25 +01:00
Hagit Segev
d0405003bd building-packages doc: Update no specific el7 on path 2020-01-16 12:49:08 +02:00
Rafael Ávila de Espíndola
c42a2c6f28 configure: Add -O1 when compiling generated parsers
Enabling asan enables a few cleanup optimizations in gcc. The net
result is that using

  -fsanitize=address -fno-sanitize-address-use-after-scope

Produces code that uses a lot less stack than if the file is compiled
with just -O0.

This patch adds -O1 in addition to
-fno-sanitize-address-use-after-scope to protect the unfortunate
developer that decides to build in dev mode with --cflags='-O0 -g'.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200116012318.361732-2-espindola@scylladb.com>
2020-01-16 12:05:50 +02:00
Rafael Ávila de Espíndola
317e0228a8 configure: Put user flags after the mode flags
It is sometimes convenient to build with flags that don't match any
existing mode.

Recently I was tracking a bug that would not reproduce with debug, but
reproduced with dev, so I tried debugging the result of

./configure.py --cflags="-O0 -g"

While the binary had debug info, it still had optimizations because
configure.py put the mode flags after the user flags (-O0 -O1). This
patch flips the order (-O1 -O0) so that the flags passed in the
command line win.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200116012318.361732-1-espindola@scylladb.com>
2020-01-16 12:05:50 +02:00
Gleb Natapov
51281bc8ad lwt: fix write timeout exception reporting
CQL transport code relies on an exception's C++ type to create the correct
reply, but in lwt we converted some mutation_timeout exceptions to the more
generic request_timeout while forwarding them, which broke the protocol.
Do not drop type information.

Fixes #5598.

Message-Id: <20200115180313.GQ9084@scylladb.com>
2020-01-16 12:05:50 +02:00
Piotr Jastrzębski
0c8c1ec014 config: fix description of enable_deprecated_partitioners
Murmur3 is the default partitioner.
ByteOrder and Random are the deprecated ones
and should be mentioned in the description.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-01-16 12:05:50 +02:00
Nadav Har'El
9953a33354 merge "Adding a schema file when creating a snapshot"
Merged pull request https://github.com/scylladb/scylla/pull/5294 from
Amnon Heiman:

To use a snapshot we need a schema file that is similar to the result of
running cql DESCRIBE command.

The DESCRIBE is implemented in the cql driver so the functionality needs
to be re-implemented inside scylla.

This series adds a describe method to the schema file and uses it when doing
a snapshot.

There are different approaches to handling materialized views and
secondary indexes.

This implementation creates each schema.cql file in its own relevant
directory, so the schema for a materialized view, for example, will be
placed in the snapshot directory of the table of that view.

Fixes #4192
2020-01-16 12:05:50 +02:00
Piotr Dulikowski
c383652061 gossip: allow for aborting on sleep
This commit makes most sleeps in gossip.cc abortable. It is now possible
to quickly shut down a node during startup, most notably during the
phase while it waits for gossip to settle.
2020-01-16 12:05:50 +02:00
Avi Kivity
e5e0642f2a tools: toolchain: add dependencies for building debian and rpm packages
This reduces network traffic and eliminates time for installation when
building packages from the frozen toolchain, as well as isolating the
build from updates to those dependencies which may cause breakage.
2020-01-16 12:05:50 +02:00
Pekka Enberg
da9dae3dbe Merge 'test.py: add support for CQL tests' from Kostja
This patch set adds support for CQL tests to test.py,
as well as many other improvements:

* --name is now a positional argument
* test output is preserved in testlog/${mode}
* concise output format
* better color support
* arbitrary number of test suites
* per-suite yaml-based configuration
* options --jenkins and --xunit are removed and xml
  files are generated for all runs

A simple driver is written in C++ to read CQL for
standard input, execute in embedded mode and produce output.

The patch is checked with BYO.

Reviewed-by: Dejan Mircevski <dejan@scylladb.com>
* 'test.py' of github.com:/scylladb/scylla-dev: (39 commits)
  test.py: introduce BoostTest and virtualize custom boost arguments
  test.py: sort tests within a suite, and sort suites
  test.py: add a basic CQL test
  test.py: add CQL .reject files to gitignore
  test.py: print a colored unidiff in case of test failure
  test.py: add CqlTestSuite to run CQL tests
  test.py: initial import of CQL test driver, cql_repl
  test.py: remove custom colors and define a color palette
  test.py: split test output per test mode
  test.py: remove tests_to_run
  test.py: virtualize Test.run(), to introduce CqlTest.Run next
  test.py: virtualize test search pattern per TestSuite
  test.py: virtualize write_xunit_report()
  test.py: ensure print_summary() is agnostic of test type
  test.py: tidy up print_summary()
  test.py: introduce base class Test for CQL and Unit tests
  test.py: move the default arguments handling to UnitTestSuite
  test.py: move custom unit test command line arguments to suite.yaml
  test.py: move command line argument processing to UnitTestSuite
  test.py: introduce add_test(), which is suite-specific
  ...
2020-01-16 12:05:50 +02:00
Pekka Enberg
e8b659ec5d dist/docker: Remove Ubuntu-based Docker image
The Ubuntu-based Docker image uses Scylla 1.0 and has not been updated
since 2017. Let's remove it as unmaintained.

Message-Id: <20200115102405.23567-1-penberg@scylladb.com>
2020-01-16 12:05:50 +02:00
Avi Kivity
546556b71b Merge "allow commitlog to wait for specific entries to be flushed on disk" from Gleb
"
Currently commitlog supports two modes of operation. First is 'periodic'
mode where all commitlog writes are ready the moment they are stored in
a memory buffer and the memory buffer is flushed to a storage periodically.
Second is a 'batch' mode where each write is flushed as soon as possible
(after previous flush completed) and writes are only ready after they
are flushed.

The first option is not very durable, the second is not very efficient.
This series adds an option to mark some writes as "more durable" in
periodic mode meaning that they will be flushed immediately and reported
complete only after the flush is complete (flushing a durable write also
flushes all writes that came before it). It also changes paxos to use
those durable writes to store paxos state.

Note that strictly speaking the last patch is not needed, since after
writing to an actual table the code updates the paxos table, and the latter
uses durable writes that make sure all previous writes are flushed. Given
that both writes are supposed to run on the same shard this should be enough.
But it feels right to make base table writes durable as well.
"

* 'gleb/commilog_sync_v4' of github.com:scylladb/seastar-dev:
  paxos: immediately sync commitlog entries for writes made by paxos learn stage
  paxos: mark paxos table schema as "always sync"
  schema: allow schema to be marked as 'always sync to commitlog'
  commitlog: add test for per entry sync mode
  database: pass sync flag from db::apply function to the commitlog
  commitlog: add sync method to entry_writer
2020-01-16 12:05:50 +02:00
Rafael Ávila de Espíndola
2ebd1463b2 tests: Handle null and not present values differently
Before this patch result_set_assertions was handling both null values
and missing values in the same way.

This patch changes the handling of missing values so that now checking
for a null value is not the same as checking for a value not being
present.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200114184116.75546-1-espindola@scylladb.com>
2020-01-16 12:05:50 +02:00
Botond Dénes
0c52c2ba50 data: make cell::make_collection(): more consistent and safer
3ec889816 changed cell::make_collection() to take different code paths
depending on whether its `data` argument is nothrow copyable/movable or
not. In case it is not, it is wrapped in a view to make it so (see the
above mentioned commit for a full explanation), relying on the method's
pre-existing requirement for callers to keep `data` alive while the
created writer is in use.
On closer look however it turns out that this requirement is neither
respected, nor enforced, at least not on the code level. The real
requirement is that the underlying data represented by `data` is kept
alive. If `data` is a view, it is not expected to be kept alive and
callers don't, it is instead copied into `make_collection()`.
Non-views however *are* expected to be kept alive. This makes the API
error prone.
To avoid any future errors due to this ambiguity, require all `data`
arguments to be nothrow copyable and movable. Callers are now required
to pass views of nonconforming objects.

This patch is a usability improvement and is not fixing a bug. The
current code works as-is because it happens to conform to the underlying
requirements.

Refs: #5575
Refs: #5341

Tests: unit(dev)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200115084520.206947-1-bdenes@scylladb.com>
2020-01-16 12:05:50 +02:00
Amnon Heiman
ac8aac2b53 tests/cql_query_test: Add schema describe tests
This patch adds tests for the describe method.

test_describe_simple_schema tests regular tables.

test_describe_view_schema tests views and indexes.

Each test creates a table, finds the schema, calls the describe method and
compares the results to the string that was used to create the table.

The view tests also verify that adding an index or view does not change
the base table.

When comparing results, leading and trailing whitespace is ignored
and all combinations of whitespace and newlines are treated equally.

Additional tests may be added at a future phase if required.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2020-01-15 15:07:57 +02:00
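The whitespace-insensitive comparison described above can be done with a simple normalization step; a sketch (the `normalized` helper name is illustrative, not the test's actual code):

```python
import re

def normalized(schema_text: str) -> str:
    # collapse any run of whitespace/newlines into a single space
    # and drop leading/trailing whitespace
    return re.sub(r"\s+", " ", schema_text).strip()

create_stmt = "CREATE TABLE ks.t (\n    pk int PRIMARY KEY\n)"
described   = "  CREATE TABLE ks.t ( pk int PRIMARY KEY ) "
assert normalized(create_stmt) == normalized(described)
```

Under this normalization, any mix of spaces, tabs and newlines between tokens compares equal.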
Amnon Heiman
028525daeb database: add schema.cql file when creating a snapshot
When creating a snapshot we need to add a schema.cql file in the
snapshot directory that describes the table in that snapshot.

This patch adds the file using the schema describe method.

get_snapshot_details and manifest_json_filter were modified to ignore
the schema.cql file.

Fixes #4192

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2020-01-15 15:06:00 +02:00
Amnon Heiman
82367b325a schema: Add a describe method
This patch adds a describe method to a table schema.

It acts similarly to the DESCRIBE cql command that is implemented in a CQL
driver.

The method supports tables, secondary indexes, local indexes and
materialized views.

Relates to #4192

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2020-01-15 15:06:00 +02:00
Amnon Heiman
6f58d51c83 secondary_index_manager: add the index_name_from_table_name function
index_name_from_table_name is the reverse of index_table_name:
it takes a table name that was generated for an index and returns the name
of the index that generated that table.

Relates to #4192

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2020-01-15 15:06:00 +02:00
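Assuming the convention that an index's backing table is named by appending an `_index` suffix (an assumption for illustration; the actual index_table_name implementation is authoritative), the round trip looks like:

```python
INDEX_TABLE_SUFFIX = "_index"  # assumed naming convention

def index_table_name(index_name: str) -> str:
    # forward direction: index name -> generated backing-table name
    return index_name + INDEX_TABLE_SUFFIX

def index_name_from_table_name(table_name: str) -> str:
    # reverse direction: strip the suffix to recover the index name
    if not table_name.endswith(INDEX_TABLE_SUFFIX):
        raise ValueError(f"not an index table name: {table_name}")
    return table_name[:-len(INDEX_TABLE_SUFFIX)]

assert index_name_from_table_name(index_table_name("my_idx")) == "my_idx"
```

The round-trip assertion is the property the commit relies on: reversing the generated name must always recover the original index name.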
Pavel Emelyanov
555856b1cd migration_manager: Use in-place value factory
The factory is purely a stateless thing; there is no difference which
instance of it is used, so we may omit referencing the storage_service
in passive_announce.

This is the 2nd simple migration_manager -> storage_service link to cut
(more to come later).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:29:21 +03:00
Pavel Emelyanov
f129d8380f migration_manager: Get database through storage_proxy
There are several places where migration_manager needs a storage_service
reference to get the database from, thus forming a mutual dependency
between them. This is the simplest case where the migration_manager
link to the storage_service can be cut -- the database reference can be
obtained from storage_proxy instead.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:29:21 +03:00
Pavel Emelyanov
5cf365d7e7 database: Explicitly pass migration_manager through init_non_system_keyspace
This is the last place where database code needs the migration_manager
instance to be alive, so now the mutual dependency between these two
is gone: only the migration_manager needs the database, but not
vice versa.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:29:21 +03:00
Pavel Emelyanov
ebebf9f8a8 database: Do not request migration_manager instance for passive_announce
The helper in question is static, so no need to play with the
migration_manager instances.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:29:21 +03:00
Pavel Emelyanov
3f84256853 migration_manager: Remove register/unregister helpers
In the 2nd patch the migration_manager kept those for
simpler patching, but now we can drop them.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:29:21 +03:00
Pavel Emelyanov
9e4b41c32a tests: Switch on migration notifier
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:29:21 +03:00
Pavel Emelyanov
9d31bc166b cdc: Use migration_notifier to (un)register for events
If none is provided, get it from storage_service.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:29:19 +03:00
Pavel Emelyanov
ecab51f8cc storage_service: Use migration_notifier (and stop worrying)
The storage_service needs the migration_manager for notifications and
carefully handles the manager's stop process so as not to demolish the
listeners list from under itself. From now on this dependency is
no longer valid (however, the storage_service still seems to need the
migration_manager, but that is a different story).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:28:21 +03:00
Pavel Emelyanov
7814ed3c12 cql_server: Use migration_notifier in events_notifier
This patch removes an implicit cql_server -> migration_manager
dependency, as the former's event notifier uses the latter
for notifications.

This dependency also breaks a loop:
storage_service -> cql_server -> migration_manager -> storage_service

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:28:21 +03:00
Pavel Emelyanov
d9edcb3f15 query_processor: Use migration_notifier
This patch breaks one (probably harmless, but still) dependency
loop: query_processor -> migration_manager -> storage_proxy
 -> tracing -> query_processor.

The first link is not needed, as the query_processor needs the
migration_manager purely to (un)subscribe to notifications.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:28:21 +03:00
Pavel Emelyanov
2735024a53 auth: Use migration_notifier
The same as with view builder. The constructor still needs both,
but the life-time reference is now for notifier only.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:28:21 +03:00
Pavel Emelyanov
28f1250b8b view_builder: Use migration notifier
The migration manager itself is still needed on start to wait
for schema agreement, but there's no longer the need for the
life-time reference on it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:28:21 +03:00
Pavel Emelyanov
7cfab1de77 database: Switch on mnotifier from migration_manager
Do not call the local migration manager instance to send notifications;
call the local migration notifier, which will always be alive.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:28:21 +03:00
Pavel Emelyanov
f45b23f088 storage_service: Keep migration_notifier
The storage service will need this guy to initialize sub-services
with. Also it registers itself with notifiers.

That said, it's convenient to have the migration notifier on board.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:28:21 +03:00
Pavel Emelyanov
e327feb77f database: Prepare to use on-database migration_notifier
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:28:21 +03:00
Pavel Emelyanov
f240d5760c migration_manager: Split notifier from main class
The _listeners list on the migration_manager class and the corresponding
notify_xxx helpers have nothing to do with its instances; they
are just a transport for notification delivery.

At the same time, some services need the migration manager to be alive
at their stop time to unregister from it, while the manager itself
may need them for its own needs.

The proposal is to move the migration notifier into a complete separate
sharded "service". This service doesn't need anything, so it's started
first and stopped last.

While it's not effectively a "migration" notifier, we inherited the name
from Cassandra, and renaming it will "scramble neurons in the old-timers'
brains but will make it easier for newcomers", as Avi says.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:28:19 +03:00
Pavel Emelyanov
074cc0c8ac migration_manager: Helpers for on_before_ notifications
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:27:27 +03:00
Pavel Emelyanov
1992755c72 storage_service: Kill initialization helper from init.cc
The helper just makes further patching more complex, so drop it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:27:27 +03:00
Konstantin Osipov
a665fab306 test.py: introduce BoostTest and virtualize custom boost arguments 2020-01-15 13:37:25 +03:00
Gleb Natapov
51672e5990 paxos: immediately sync commitlog entries for writes made by paxos learn stage 2020-01-15 12:15:42 +02:00
Gleb Natapov
0fc48515d8 paxos: mark paxos table schema as "always sync"
We want all writes to the paxos table to be persisted on storage before
being declared completed.
2020-01-15 12:15:42 +02:00
Gleb Natapov
16e0fc4742 schema: allow schema to be marked as 'always sync to commitlog'
All writes that use this schema will be immediately persisted on
storage.
2020-01-15 12:15:42 +02:00
Gleb Natapov
0ce70c7a04 commitlog: add test for per entry sync mode 2020-01-15 12:15:42 +02:00
Gleb Natapov
29574c1271 database: pass sync flag from db::apply function to the commitlog
Allow upper layers to request a mutation to be persisted on disk before
making the future ready, independent of which mode the commitlog is running in.
2020-01-15 12:15:42 +02:00
Gleb Natapov
e0bc4aa098 commitlog: add sync method to entry_writer
If the method returns true, the commitlog should sync to file immediately
after writing the entry and wait for the flush to complete before returning.
2020-01-15 12:15:42 +02:00
Piotr Sarna
9aab75db60 alternator: clean up single value rjson comparator
The comparator is refreshed to ensure the following:
 - null compares less than all other types;
 - null, true and false are comparable against each other,
   while other types are only comparable against themselves and null.

Comparing mixed types is not currently reachable from the alternator
API, because it's only used for sets, which can only use
strings, binary blobs and numbers - thus, no new pytest cases are added.

Fixes #5454
2020-01-15 10:57:49 +02:00
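The ordering rules above can be modeled in a few lines of Python (a toy model of the C++ rjson comparator; the `rjson_compare` name is illustrative, and `None` stands in for JSON null):

```python
def rjson_compare(a, b):
    """Return -1/0/1 per the refreshed ordering rules; raise TypeError
    for incomparable mixed types. None models JSON null."""
    if a is None or b is None:
        # null compares less than all other types
        return (a is not None) - (b is not None)
    a_bool, b_bool = isinstance(a, bool), isinstance(b, bool)
    if a_bool != b_bool or (not a_bool and type(a) is not type(b)):
        # other types are only comparable against themselves (and null)
        raise TypeError(f"cannot compare {a!r} and {b!r}")
    return (a > b) - (a < b)

assert rjson_compare(None, False) == -1   # null < false
assert rjson_compare(False, True) == -1   # false < true
assert rjson_compare(2, 10) == -1         # same type: ordinary ordering
```

Mixed-type comparisons such as a string against a number raise, matching the commit's note that such comparisons are unreachable from the alternator API.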
Juliusz Stasiewicz
d87d01b501 storage_proxy: intercept rpc::closed_error if counter leader is down (#5579)
When counter mutation is about to be sent, a leader is elected, but
if the leader fails after election, we get `rpc::closed_error`. The
exception propagates high up, causing all connections to be dropped.

This patch intercepts `rpc::closed_error` in `storage_proxy::mutate_counters`
and translates it to `mutation_write_failure_exception`.

References #2859
2020-01-15 09:56:45 +01:00
Konstantin Osipov
a351ea57d5 test.py: sort tests within a suite, and sort suites
This makes it easier to navigate the test artefacts.

No need to sort suites since they are already
stored in a dict.
2020-01-15 11:41:19 +03:00
Konstantin Osipov
ba87e73f8e test.py: add a basic CQL test 2020-01-15 11:41:19 +03:00
Konstantin Osipov
44d31db1fc test.py: add CQL .reject files to gitignore
To avoid accidental commits, add .reject files to .gitignore
2020-01-15 11:41:19 +03:00
Konstantin Osipov
4f64f0c652 test.py: print a colored unidiff in case of test failure
Print a colored unidiff between result and reject files in case of test
failure.
2020-01-15 11:41:19 +03:00
Konstantin Osipov
d3f9e64028 test.py: add CqlTestSuite to run CQL tests
Run the test and compare results. Manage temporary
and .reject files.

Now that there are CQL tests, improve logging.

run_test success no longer means test success.
2020-01-15 11:41:19 +03:00
Konstantin Osipov
b114bfe0bd test.py: initial import of CQL test driver, cql_repl
cql_repl is a simple program which reads CQL from stdin,
executes it, and writes results to stdout.

It supports --input, --output and --log options.
--log is directed to cql_test.log by default.
--input is stdin by default.
--output is stdout by default.

The result set output is printed with a basic
JSON visitor.
2020-01-15 11:41:16 +03:00
Konstantin Osipov
0ec27267ab test.py: remove custom colors and define a color palette
Using a standard Python module improves readability,
and allows using colors easily in other output.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
0165413405 test.py: split test output per test mode
Store test temporary files and logs in ${testdir}/${mode}.
Remove --jenkins and --xunit, and always write XML
files at a predefined location: ${testdir}/${mode}/xml/.

Use .xunit.xml extension for tests which XML output is
in xunit format, and junit.xml for an accumulated output
of all non-boost tests in junit format.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
4095ab08c8 test.py: remove tests_to_run
Avoid storing each test twice, use per-tests
list to construct a global iterable.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
169128f80b test.py: virtualize Test.run(), to introduce CqlTest.Run next 2020-01-15 10:53:24 +03:00
Konstantin Osipov
d05f6c3cc7 test.py: virtualize test search pattern per TestSuite
CQL tests have .cql extension, while unit tests
have .cc.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
abcc182ab3 test.py: virtualize write_xunit_report()
Make sure any non-boost test can participate in the report.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
18aafacfad test.py: ensure print_summary() is agnostic of test type
Introduce a virtual Test.print_summary() to print
a failed test summary.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
21fbe5fa81 test.py: tidy up print_summary()
Now that we have tabular output, make print_summary()
more concise.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
c171882b51 test.py: introduce base class Test for CQL and Unit tests 2020-01-15 10:53:24 +03:00
Konstantin Osipov
fd6897d53e test.py: move the default arguments handling to UnitTestSuite
Move UnitTest default seastar argument handling to UnitTestSuite
(cleanup).
2020-01-15 10:53:24 +03:00
Konstantin Osipov
d3126f08ed test.py: move custom unit test command line arguments to suite.yaml
Load the command line arguments, if any, from suite.yaml, rather
than keep them hard-coded in test.py.

This allows the operations team to have easier access to these.

Note I had to sacrifice dynamic smp count for mutation_reader_test
(the new smp count is fixed at 3) since this is part
of test configuration now.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
ef6cebcbd2 test.py: move command line argument processing to UnitTestSuite 2020-01-15 10:53:24 +03:00
Konstantin Osipov
4a20617be3 test.py: introduce add_test(), which is suite-specific 2020-01-15 10:53:24 +03:00
Konstantin Osipov
7e10bebcda test.py: move long test list to suite.yaml
Use suite.yaml for long tests
2020-01-15 10:53:24 +03:00
Konstantin Osipov
32ffde91ba test.py: move test id assignment to TestSuite
Going forward finding and creating tests will be
a responsibility of TestSuite, so the id generator
needs to be shared.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
b5b4944111 test.py: move repeat handling to TestSuite
This way we can avoid iterating over all tests
to handle --repeat.
Besides, going forward the tests will be stored
in two places: in the global list of all tests,
for the runner, and per suite, for suite-based
reporting, so it's easier if TestSuite
is fully responsible for finding and adding tests.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
34a1b49fc3 test.py: move add_test_list() to TestSuite 2020-01-15 10:53:24 +03:00
Konstantin Osipov
44e1c4267c test.py: introduce test suites
- UnitTestSuite - for test/unit tests
- BoostTestSuite - a tweak on UnitTestSuite, with options
  to log xml test output to a dedicated file
2020-01-15 10:53:24 +03:00
Konstantin Osipov
eed3201ca6 test.py: use path, rather than test kind, for search pattern
Going forward there may be multiple suites of the same kind.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
f95c97667f test.py: support arbitrary number of test suites
Scan entire test/ for folders that contain suite.yaml,
and load tests from these folders. Skip the rest.

Each folder with a suite.yaml is expected to have a valid
suite configuration in the yaml file.

A suite is a folder with tests of the same type, e.g.
it can be a folder with unit tests, boost tests, or CQL
tests.

The harness will use suite.yaml to create an appropriate
suite test driver, to execute tests in different formats.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
c1f8169cd4 test.py: add suite.yaml to boost and unit tests
The plan is to move suite-specific settings to the
configuration file.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
ec9ad04c8a test.py: move 'success' to TestUnit class
There will be other success attributes: program return
status 0 doesn't mean the test is successful for all tests.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
b4aa4d35c3 test.py: save test output in tmpdir
It is handy to have it so that a reference of a failed
test is available without re-running it.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
f4efe03ade test.py: always produce xml output, derive output paths from tmpdir
It reduces the number of configurations to re-test when test.py is
modified, and simplifies usage of test.py in build tools, since you no
longer need to bother with extra arguments.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
d2b546d464 test.py: output job count in the log 2020-01-15 10:53:24 +03:00
Konstantin Osipov
233f921f9d test.py: make test output brief&tabular
New format:

% ./test.py --verbose --mode=release
================================================================================
[N/TOTAL] TEST                                                 MODE   RESULT
------------------------------------------------------------------------------
[1/111]   boost/UUID_test                                    release  [ PASS ]
[2/111]   boost/enum_set_test                                release  [ PASS ]
[3/111]   boost/like_matcher_test                            release  [ PASS ]
[4/111]   boost/observable_test                              release  [ PASS ]
[5/111]   boost/allocation_strategy_test                     release  [ PASS ]
^C
% ./test.py foo
================================================================================
[N/TOTAL] TEST                                                 MODE   RESULT
------------------------------------------------------------------------------
[3/3]     unit/memory_footprint_test                          debug   [ PASS ]
------------------------------------------------------------------------------
2020-01-15 10:53:24 +03:00
Konstantin Osipov
879bea20ab test.py: add a log file
Going forward I'd like to make terminal output brief&tabular,
but some test details are necessary to preserve so that a failure
is easy to debug. This information now goes to the log file.

- open and truncate the log file on each harness start
- log options of each invoked test in the log, so that
  a failure is easy to reproduce
- log test result in the log

Since tests are run concurrently, having an exact
trace of concurrent execution also helps
debugging flaky tests.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
cbee76fb95 test.py: gitignore the default ./test.py tmpdir, ./testlog 2020-01-15 10:53:24 +03:00
Konstantin Osipov
1de69228f1 test.py: add --tmpdir
It will be used for test log files.
2020-01-15 10:53:24 +03:00
Konstantin Osipov
caf742f956 test.py: flake8 style fix 2020-01-15 10:53:24 +03:00
Konstantin Osipov
dab364c87d test.py: sort imports 2020-01-15 10:53:24 +03:00
Konstantin Osipov
7ec4b98200 test.py: make name a positional argument.
Accept multiple test names, treat test name
as a substring, and if the same name is given
multiple times, run the test multiple times.
2020-01-15 10:53:24 +03:00
Dejan Mircevski
bb2e04cc8b alternator: Improve comments on comparators
Some comparator methods in conditions.cc use unexpected operators;
explain why.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2020-01-14 22:25:55 +02:00
Tomasz Grabiec
c8a5a27bd9 Merge "storage_service: Move load_broadcaster away" from Pavel E.
The storage_service struct is a collection of diverse things,
most of them required only on start and stop and/or running
on shard 0 (but it is nonetheless sharded).

As a part of clearing up this structure and the inter-component
dependencies it generates, here's the sanitation of load_broadcaster.
2020-01-14 19:26:06 +01:00
Calle Wilund
313ed91ab0 cdc: Listen for migration callbacks on all shards
Fixes #5582

... but only populate log on shard 0.

Migration manager callbacks are slightly asymmetric. Notifications
for pre-create/update mutations are sent only on the initiating shard
(necessary, because we consider the mutations mutable).
But "created" callbacks are sent on all shards (immutable).

We must subscribe on all shards, but still do the population of the cdc table
only once, otherwise we can either miss a table creation or populate it
more than once.

v2:
- Add test case
Message-Id: <20200113140524.14890-1-calle@scylladb.com>
2020-01-14 16:35:41 +01:00
Avi Kivity
2138657d3a Update seastar submodule
* seastar 36cf5c5ff0...3f3e117de3 (16):
  > memcached: don't use C++17-only std::optional
  > reactor: Comment why _backend is assigned in constructor body
  > log: restore --log-to-stdout for backward compatibility
  > used_size.hh: Include missing headers
  > core: Move some code from reactor.cc to future.cc
  > future-util: move parallel_for_each to future-util.cc
  > task: stop wrapping tasks with unique_ptr
  > Merge "Setup timer signal handler in backend constructor" from Pavel
Fixes #5524
  > future: avoid a branch in future's move constructor if type is trivial
  > utils: Expose used_size
  > stream: Call get_future early
  > future-util: Move parallel_for_each_state code to a .cc
  > memcached: log exceptions
  > stream: Delete dead code
  > core: Turn pollable_fd into a simple proxy over pollable_fd_state.
  > Merge "log to std::cerr" from Benny
2020-01-14 16:56:25 +02:00
Pavel Emelyanov
e1ed8f3f7e storage_service: Remove _shadow_token_metadata
This is part of de-bloating storage_service.

The field in question is used to temporarily keep the _token_metadata
value during shard-wide replication. There's no need to have it as a
class member; any "local" copy is enough.

Also, as the size of token_metadata is huge, and invoke_on_all()
copies the function for each shard, keep one local copy of metadata
using do_with() and pass it into the invoke_on_all() by reference.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Reviewed-by:  Asias He <asias@scylladb.com>
Message-Id: <20200113171657.10246-1-xemul@scylladb.com>
2020-01-14 16:29:10 +02:00
Rafael Ávila de Espíndola
054f5761a7 types: Refactor code into a serialize_varint helper
This is a bit cleaner and avoids a boost::multiprecision::cpp_int copy
while serializing a decimal.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200110221422.35807-1-espindola@scylladb.com>
2020-01-14 16:28:27 +02:00
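CQL's varint type is serialized as a minimal-length two's-complement big-endian byte string; a Python sketch of that wire format (the actual helper operates on boost cpp_int in C++, this only illustrates the encoding):

```python
def serialize_varint(n: int) -> bytes:
    # minimal number of bytes that can hold n in two's complement:
    # for negatives, (n + 1) has the same magnitude bit count as the
    # stored pattern, so both signs share one formula
    nbytes = (n + (n < 0)).bit_length() // 8 + 1
    return n.to_bytes(nbytes, byteorder="big", signed=True)

assert serialize_varint(0) == b"\x00"
assert serialize_varint(127) == b"\x7f"
assert serialize_varint(128) == b"\x00\x80"   # needs a leading sign byte
assert serialize_varint(-1) == b"\xff"
assert serialize_varint(-129) == b"\xff\x7f"
```

Avoiding an intermediate copy of the big integer, as the commit does, matters because decimals embed a varint of arbitrary size.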
Avi Kivity
6c84dd0045 cql3: update_statement: do not set query option always_return_static_content for list read-before-write
The query option always_return_static_content was added for lightweight
transactions in commits e0b31dd273 (infrastructure) and 65b86d155e
(actual use). However, the flag was added unconditionally to
update_parameters::options. This caused it to be set for list
read-modify-write operations, not just for lightweight transactions.
This is a little wasteful, and worse, it breaks compatibility as old
nodes do not understand the always_return_static_content flag and
complain when they see it.

To fix, remove the always_return_static_content from
update_parameters::options and only set it from compare-and-swap
operations that are used to implement lightweight transactions.

Fixes #5593.

Reviewed-by: Gleb Natapov <gleb@scylladb.com>
Message-Id: <20200114135133.2338238-1-avi@scylladb.com>
2020-01-14 16:15:20 +02:00
Hagit Segev
ef88e1e822 CentOS RPMs: Remove target to enable general centos. 2020-01-14 14:31:03 +02:00
Alejo Sanchez
6909d4db42 cql3: BYPASS CACHE query counter
This patch is the first part of requested full scan metrics.
It implements a counter of SELECT queries with BYPASS CACHE option.

In scope of #5209

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Message-Id: <20200113222740.506610-2-alejo.sanchez@scylladb.com>
2020-01-14 12:19:00 +02:00
Rafael Ávila de Espíndola
dca1bc480f everywhere: Use serialized(foo) instead of data_value(foo).serialize()
This is just a simple cleanup that reduces the size of another patch I
am working on and is an independent improvement.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200114051739.370127-1-espindola@scylladb.com>
2020-01-14 12:17:12 +02:00
Pavel Emelyanov
b9f28e9335 storage_service: Remove dead drain branch
The drain_in_progress variable here is the future that's set by the
drain() operation itself. Its promise is set when the drain() finishes.

The check for this future at the beginning of drain() is pointless.
No two drain()-s can run in parallel because of the run_with_api_lock()
protection. Doing a 2nd drain after a successful 1st one is also
impossible due to the _operation_mode check. A 2nd drain after an
_exceptioned_ (and thus incomplete) 1st one would deadlock; after
this patch it will try to drain for the 2nd time, but that should be ok.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200114094724.23876-1-xemul@scylladb.com>
2020-01-14 12:07:29 +02:00
Piotr Sarna
36ec43a262 Merge "add table with connected cql clients" from Juliusz
This change introduces system.clients table, which provides
information about CQL clients connected.

PK is the client's IP address; CK consists of the outgoing port number
and client_type (which will be extended in the future to thrift/alternator/redis).
The table also supplies shard_id and username. Other columns,
like connection_stage, driver_name, driver_version...,
are currently empty but exist for C* compatibility and future use.

This is an ordinary table (i.e. non-virtual) and it's updated upon
accepting connections. This is also why C*'s request_count column
was not introduced. In case of an abrupt DB stop, the table should not persist,
so it is truncated on startup.

Resolves #4820
2020-01-14 10:01:07 +02:00
Avi Kivity
1f46133273 Merge "data: make cell::make_collection() exception safe" from Botond
"
Most of the code in `cell` and the `imr` infrastructure it is built on
is `noexcept`. This means that extra care must be taken to avoid rogue
exceptions, as they will bring down the node. The changes introduced by
0a453e5d3a did just that - introduced a rogue `std::bad_alloc` into this
code path by violating an undocumented and unvalidated assumption --
that fragment ranges passed to `cell::make_collection()` are nothrow
copyable and movable.

This series refactors `cell::make_collection()` such that it does not
have this assumption anymore and is safe to use with any range.

Note that the unit test included in this series, which was used to find
all the possible exception sources, will not currently be run in any of
our build modes, due to `SEASTAR_ENABLE_ALLOC_FAILURE_INJECTION` not
being set. I plan to address this in a followup because setting this
flag fails other tests using the failure injection mechanism. This is
because these tests are normally run with the failure injection disabled,
so failures managed to lurk in without anyone noticing.

Fixes: #5575
Refs: #5341

Tests: unit(dev, debug)
"

* 'data-cell-make-collection-exception-safety/v2' of https://github.com/denesb/scylla:
  test: mutation_test: add exception safety test for large collection serialization
  data/cell.hh: avoid accidental copies of non-nothrow copiable ranges
  utils/fragment_range.hh: introduce fragment_range_view
2020-01-14 10:01:06 +02:00
Nadav Har'El
5b08ec3d2c alternator: error on unsupported ScanIndexForward=false
We do not yet support the ScanIndexForward=false option for reversing
the sort order of a Query operation, as reported in issue #5153.
But even before implementing this feature, it is important that we
produce an error if a user attempts to use it - instead of outright
ignoring this parameter and giving the user wrong results. This is
what this patch does.

Before this patch, the reverse-order query in the xfailing test
test_query.py::test_query_reverse seems to succeed - yet gives
results in the wrong order. With this patch, the query itself fails -
stating that the ScanIndexForward=false argument is not supported.

Refs #5153

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200105113719.26326-1-nyh@scylladb.com>
2020-01-14 10:01:06 +02:00
Pavel Emelyanov
c4bf532d37 storage_service: Fix race in removenode/force_removenode/other
Here's another theoretical problem, that involves 3 sequential calls
to respectively removenode, force_removenode and some other operation.
Let's walk through them

First goes the removenode:
  run_with_api_lock
    _operation_in_progress = "removenode"
    storage_service::remove_node
      sleep in replicating_nodes.empty() loop

Now the force_removenode can run:

  run_with_no_api_lock
    storage_service::force_removenode
      check _operation_in_progress (not empty)
      _force_remove_completion = true
      sleep in _operation_in_progress.empty loop

Now the 1st call wakes up and:

    if _force_remove_completion == true
      throw <some exception>
  .finally() handler in run_with_api_lock
    _operation_in_progress = <empty>

At this point some other operation may start. Say, drain:

  run_with_api_lock
    _operation_in_progress = "drain"
    storage_service::drain
      ...
      go to sleep somewhere

Now let's go back to the 1st op that wakes up from its sleep.
The code it executes is

    while (!ss._operation_in_progress.empty()) {
        sleep_abortable()
    }

and while the drain is running it will never exit.

However (! and this is the core of the race) should the drain
operation happen _before_ the force_removenode, another check
for _operation_in_progress would have made the latter exit with
the "Operation drain is in progress, try again" message.

Fix this inconsistency by checking the current operation on
every wake-up from the sleep_abortable.

Fixes #5591

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-14 10:01:06 +02:00
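The fix can be modeled as moving the current-operation check inside the wait loop; a simplified synchronous Python sketch (the real code is asynchronous C++, and the `StorageService` / `force_removenode_wait` names here are illustrative):

```python
class StorageService:
    def __init__(self):
        self.operation_in_progress = ""

def force_removenode_wait(ss, sleep_abortable):
    # re-check which operation is running on *every* wake-up, so a new,
    # unrelated operation (e.g. "drain") aborts the wait instead of
    # making it spin forever
    while ss.operation_in_progress:
        if ss.operation_in_progress != "removenode":
            raise RuntimeError(
                f"Operation {ss.operation_in_progress} is in progress, try again")
        sleep_abortable()

ss = StorageService()
ss.operation_in_progress = "removenode"

def fake_sleep():
    # model the race: removenode finishes and a drain starts while we sleep
    ss.operation_in_progress = "drain"

try:
    force_removenode_wait(ss, fake_sleep)
except RuntimeError as e:
    print(e)  # Operation drain is in progress, try again
```

With the check only before the loop (the old behavior), the same schedule would leave the waiter spinning until the drain finished.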
Pavel Emelyanov
cc92683894 storage_service: Fix race and deadlock in removenode/force_removenode
Here's a theoretical problem, that involves 3 sequential calls
to respectively removenode, force_removenode and removenode (again)
operations. Let's walk through them

First goes the removenode:
  run_with_api_lock
    _operation_in_progress = "removenode"
    storage_service::remove_node
      sleep in replicating_nodes.empty() loop

Now the force_removenode can run:

  run_with_no_api_lock
    storage_service::force_removenode
      check _operation_in_progress (not empty)
      _force_remove_completion = true
      sleep in _operation_in_progress.empty loop

Now the 1st call wakes up and:

    if _force_remove_completion == true
      _force_remove_completion = false
      throw <some exception>
  .finally() handler in run_with_api_lock
    _operation_in_progress = <empty>

! at this point we have _force_remove_completion = false and
_operation_in_progress = <empty>, which opens the following
opportunity for the 3rd removenode:

  run_with_api_lock
    _operation_in_progress = "removenode"
    storage_service::remove_node
      sleep in replicating_nodes.empty() loop

Now here's what we have in 2nd and 3rd ops:

1. _operation_in_progress = "removenode" (set by 3rd) prevents the
   force_removenode from exiting its loop
2. _force_remove_completion = false (set by 1st on exit) prevents
   the removenode from waiting on replicating_nodes list

One can start the 4th call with force_removenode, it will proceed and
wake up the 3rd op, but after it we'll have two force_removenode-s
running in parallel and killing each other.

I propose not to set _force_remove_completion to false in removenode,
but just exit and let the owner of this flag unset it once it gets
control back.

Fixes #5590

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-14 10:01:06 +02:00
Benny Halevy
ff55b5dca3 cql3: functions: limit sum overflow detection to integral types
Other types do not have a wider accumulator at the moment.
And static_cast<accumulator_type>(ret) != _sum evaluates as
false for NaN/Inf floating point values.

Fixes #5586

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200112183436.77951-1-bhalevy@scylladb.com>
2020-01-14 10:01:06 +02:00
Avi Kivity
e3310201dd atomic_cell_or_collection: type-aware print atomic_cell or collection components
Now that atomic_cell_view and collection_mutation_view have
type-aware printers, we can use them in the type-aware atomic_cell_or_collection
printer.
Message-Id: <20191231142832.594960-1-avi@scylladb.com>
2020-01-14 10:01:06 +02:00
Avi Kivity
931b196d20 mutation_partition: row: resolve column name when in schema-aware printer
Instead of printing the column id, print the full column name.
Message-Id: <20191231142944.595272-1-avi@scylladb.com>
2020-01-14 10:01:06 +02:00
Nadav Har'El
4aa323154e merge: Pretty print canonical_mutation objects
Merged pull request https://github.com/scylladb/scylla/pull/5533
from Avi Kivity:

canonical_mutation objects are used for schema reconciliation, which is a
fragile area and thus deserves some debugging help.

This series makes canonical_mutation objects printable.
2020-01-14 10:01:06 +02:00
Takuya ASADA
5241deda2d dist: nonroot: fix CLI tool path for nonroot (#5584)
The CLI tool path is hardcoded; we need to specify the correct path for nonroot installs.
2020-01-14 10:01:06 +02:00
Nadav Har'El
1511b945f8 merge: Handle multiple regular base columns in view pk
Merged patch series from Piotr Sarna:

"Previous assumption was that there can only be one regular base column
in the view key. The assumption is still correct for tables created
via CQL, but it's internally possible to create a view with multiple
such columns - the new assumption is that if there are multiple columns,
they share their liveness.

This series is vital for indexing to work properly on alternator,
so it would be best to solve the issue upstream. I strived to leave
the existing semantics intact as long as only up to one regular
column is part of the materialized view primary key, which is the case
for Scylla's materialized views. For alternator it may not be true,
but all regular columns in alternator share liveness info (since
alternator does not support per-column TTL), which is sufficient
to compute view updates in a consistent way.

Fixes #5006
Tests: unit(dev), alternator(test_gsi_update_second_regular_base_column, tic-tac-toe demo)"

Piotr Sarna (3):
  db,view: fix checking if partition key is empty
  view: handle multiple regular base columns in view pk
  test: add a case for multiple base regular columns in view key

 alternator-test/test_gsi.py              |  1 -
 view_info.hh                             |  5 +-
 cql3/statements/alter_table_statement.cc |  2 +-
 db/view/view.cc                          | 77 ++++++++++++++----------
 mutation_partition.cc                    |  2 +-
 test/boost/cql_query_test.cc             | 58 ++++++++++++++++++
 6 files changed, 109 insertions(+), 36 deletions(-)
2020-01-14 10:01:00 +02:00
Nadav Har'El
f16e3b0491 merge: bouncing lwt request to an owning shard
Merged patch series from Gleb Natapov:

"LWT is much more efficient if a request is processed on a shard that owns
a token for the request. This is because otherwise the processing will
bounce to an owning shard multiple times. The patch proposes a way to
move the request to the correct shard before running lwt. It works by
returning an error from the lwt code when the shard is incorrect,
specifying the shard the request should be moved to. The error is
processed by the transport code, which jumps to the correct shard and
re-processes the incoming message there.

The nicer way to achieve the same would be to jump to the right shard
inside storage_proxy::cas(), but unfortunately, with the current
implementation of modification statements, they are unusable by a shard
different from the one where they were created, so the jump has to happen
before the modification statement for a cas() is created. When we fix our
cql code to be more cross-shard friendly this can be reworked to do the
jump in storage_proxy."

Gleb Natapov (4):
  transport: change make_result to take a reference to cql result
    instead of shared_ptr
  storage_service: move start_native_transport into a thread
  lwt: Process lwt request on an owning shard
  lwt: drop invoke_on in paxos_state prepare and accept

 auth/service.hh                           |   5 +-
 message/messaging_service.hh              |   2 +-
 service/client_state.hh                   |  30 +++-
 service/paxos/paxos_state.hh              |  10 +-
 service/query_state.hh                    |   6 +
 service/storage_proxy.hh                  |   2 +
 transport/messages/result_message.hh      |  20 +++
 transport/messages/result_message_base.hh |   4 +
 transport/request.hh                      |   4 +
 transport/server.hh                       |  25 ++-
 cql3/statements/batch_statement.cc        |   6 +
 cql3/statements/modification_statement.cc |   6 +
 cql3/statements/select_statement.cc       |   8 +
 message/messaging_service.cc              |   2 +-
 service/paxos/paxos_state.cc              |  48 ++---
 service/storage_proxy.cc                  |  47 ++++-
 service/storage_service.cc                | 120 +++++++------
 test/boost/cql_query_test.cc              |   1 +
 thrift/handler.cc                         |   3 +
 transport/messages/result_message.cc      |   5 +
 transport/server.cc                       | 203 ++++++++++++++++------
 21 files changed, 377 insertions(+), 180 deletions(-)
2020-01-14 09:59:59 +02:00
Botond Dénes
300728120f test: mutation_test: add exception safety test for large collection serialization
Use `seastar::memory::local_failure_injector()` to inject all possible
`std::bad_alloc`:s into the collection serialization code path. The test
just checks that there are no `std::abort()`:s caused by any of the
exceptions.

The test will not be run if `SEASTAR_ENABLE_ALLOC_FAILURE_INJECTION` is
not defined.
2020-01-13 16:53:35 +02:00
Botond Dénes
3ec889816a data/cell.hh: avoid accidental copies of non-nothrow copiable ranges
`cell::make_collection()` assumes that all ranges passed to it are
nothrow copyable and movable views. This is not guaranteed, is not
expressed in the interface and is not mentioned in the comments either.
The changes introduced by 0a453e5d3a to collection serialization, making
it use fragmented buffers, fell into this trap, as it passes
`bytes_ostream` to `cell::make_collection()`. `bytes_ostream`'s copy
constructor allocates and hence can throw, triggering an
`std::terminate()` inside `cell::make_collection()` as the latter is
noexcept.

To solve this issue, non-nothrow copyable and movable ranges are now
wrapped in a `fragment_range_view` to make them so.
`cell::make_collection()` already requires callers to keep alive the
range for the duration of the call, so this does not introduce any new
requirements to the callers. Additionally, to avoid any future
accidents, do not accept temporaries for the `data` parameter. We don't
ever want to move this param anyway, we will either have a trivially
copyable view, or a potentially heavy-weight range that we will create a
trivially copyable view of.
2020-01-13 16:53:35 +02:00
Botond Dénes
b52b4d36a2 utils/fragment_range.hh: introduce fragment_range_view
A lightweight, trivially copyable and movable view for fragment ranges.
Allows for uniform treatment of all kinds of ranges, i.e. treating all
of them as a view. Currently `fragment_range.hh` provides lightweight,
view-like adaptors for empty and single-fragment ranges (`bytes_view`). To
allow code to treat owning multi-fragment ranges the same way as the
former two, we need a view for the latter as well -- this is
`fragment_range_view`.
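The shape of such a view can be sketched like this (a simplified stand-in, not the actual `fragment_range_view`):

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <vector>

// Simplified sketch: store only a pointer to the underlying range, so the
// view itself is trivially copyable and nothrow movable even when copying
// the range (e.g. bytes_ostream) could allocate and throw. As described
// above, the caller must keep the underlying range alive.
template <typename Range>
class fragment_range_view_sketch {
    const Range* _range;
public:
    explicit fragment_range_view_sketch(const Range& r) : _range(&r) {}
    auto begin() const { return _range->begin(); }
    auto end() const { return _range->end(); }
};

static_assert(std::is_trivially_copyable_v<
        fragment_range_view_sketch<std::vector<std::string>>>);
```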
2020-01-13 16:52:59 +02:00
Calle Wilund
75f2b2876b cdc: Remove free function for mutation augmentation 2020-01-13 13:18:55 +00:00
Calle Wilund
3eda3122af cdc: Move mutation augment from cql3::modification_statement to storage proxy
Using the attached service object
2020-01-13 13:18:55 +00:00
Juliusz Stasiewicz
27dfda0b9e main/transport: using the infrastructure of system.clients
Resolves #4820. The startup path in main.cc now cleans up the
system.clients table if it exists. Also, server.cc now calls
functions that notify about cql clients connecting/disconnecting.
2020-01-13 14:07:04 +01:00
Pavel Emelyanov
148da64a7e storage_service: Move load_broadcaster away
This simplifies the storage_service API and fixes the
complaint about using shared_ptr instead of unique_ptr.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-13 13:55:09 +03:00
Pavel Emelyanov
b6e1e6df64 misc_services: Introduce load_meter
There's a lonely get_load_map() call on storage_service that
needs only the load broadcaster and always runs on shard 0.

The next patch will move all of this into its own non-sharded helper
container; this is preparation for that.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-13 13:53:08 +03:00
Gleb Natapov
5753ab7195 lwt: drop invoke_on in paxos_state prepare and accept
Since lwt requests now run on an owning shard there is no longer a need
to invoke a cross-shard call at the paxos_state level. RPC calls may
still arrive at a wrong shard, so we need to make a cross-shard call there.
2020-01-13 10:26:02 +02:00
Gleb Natapov
d28dd4957b lwt: Process lwt request on an owning shard
LWT is much more efficient if a request is processed on a shard that owns
a token for the request. This is because otherwise the processing will
bounce to an owning shard multiple times. The patch proposes a way to
move the request to the correct shard before running lwt. It works by
returning an error from the lwt code when the shard is incorrect,
specifying the shard the request should be moved to. The error is
processed by the transport code, which jumps to the correct shard and
re-processes the incoming message there.
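The bounce protocol can be modelled in a few lines (the names and the shard-selection rule here are invented for illustration, not Scylla's actual routing):

```cpp
#include <cassert>

// Model of the bounce: the LWT entry point checks whether the current
// shard owns the token; if not, it reports the owning shard instead of a
// result, and the transport layer re-runs the request there.
struct bounce_to { unsigned shard; };

constexpr unsigned shard_count = 4;

unsigned owning_shard(unsigned token) { return token % shard_count; }

// returns the shard the statement actually executed on
unsigned execute_cas(unsigned token, unsigned current_shard) {
    if (owning_shard(token) != current_shard) {
        throw bounce_to{owning_shard(token)};  // transport re-dispatches
    }
    return current_shard;
}

unsigned handle_request(unsigned token, unsigned current_shard) {
    try {
        return execute_cas(token, current_shard);
    } catch (const bounce_to& b) {
        // jump to the owning shard and re-process the message there
        return execute_cas(token, b.shard);
    }
}
```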
2020-01-13 10:26:02 +02:00
Piotr Sarna
3853594108 alternator-test: turn off TLS self-signed verification
Two test cases did not ignore TLS self-signed warnings, which are used
locally for testing HTTPS.

Fixes #5557

Tests(test_health, test_authorization)
Message-Id: <8bda759dc1597644c534f94d00853038c2688dd7.1578394444.git.sarna@scylladb.com>
2020-01-10 15:31:30 +02:00
Rafael Ávila de Espíndola
5313828ab8 cql3: Fix indentation
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200109025855.10591-2-espindola@scylladb.com>
2020-01-09 10:42:55 +02:00
Rafael Ávila de Espíndola
4da6dc1a7f cql3: Change a lambda capture order to match another
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200109025855.10591-1-espindola@scylladb.com>
2020-01-09 10:42:49 +02:00
Avi Kivity
6d454d13ac db/schema_tables: make gratuitous generic lambdas in do_merge_schema() concrete
Those gratuitous lambdas make life harder for IDE users by hiding the actual
types from the IDEs.
Message-Id: <20200107154746.1918648-1-avi@scylladb.com>
2020-01-08 17:43:18 +01:00
Avi Kivity
454074f284 Merge "database: Avoid OOMing with flush continuations after failed memtable flush" from Tomasz
"
The original fix (10f6b125c8) didn't take into account the case where a
memtable failed to flush but is not flushable because it's not the latest
in the memtable list. If that happens, it means no other memtable is
flushable either, because otherwise it would be picked due to
evictable_occupancy(). Therefore the right action is to not flush
anything in this case.

Suspected to be observed in #4982. I didn't manage to reproduce after
triggering a failed memtable flush.

Fixes #3717
"

* tag 'avoid-ooming-with-flush-continuations-v2' of github.com:tgrabiec/scylla:
  database: Avoid OOMing with flush continuations after failed memtable flush
  lsa: Introduce operator bool() to occupancy_stats
  lsa: Expose region_impl::evictable_occupancy in the region class
2020-01-08 16:58:54 +02:00
Gleb Natapov
feed544c5d paxos: fix truncation time checking during learn stage
The comparison is done in milliseconds, not microseconds.
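This class of bug is easy to avoid with `std::chrono` types, where the unit is part of the type and conversions are explicit (illustrative only; not the Scylla code):

```cpp
#include <cassert>
#include <chrono>

// Comparing a microsecond timestamp against a millisecond threshold: the
// duration types convert both sides to a common unit before comparing, so
// a ms-vs-us mixup cannot slip through silently.
bool is_before_truncation(std::chrono::microseconds write_time,
                          std::chrono::milliseconds truncated_at) {
    return write_time < truncated_at;
}
```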

Fixes #5566

Message-Id: <20200108094927.GN9084@scylladb.com>
2020-01-08 14:37:07 +01:00
Gleb Natapov
2832f1d9eb storage_service: move start_native_transport into a thread
The code runs only once, and it is simpler if it runs in a seastar thread.
2020-01-08 14:57:57 +02:00
Gleb Natapov
7fb2e8eb9f transport: change make_result to take a reference to cql result instead of shared_ptr 2020-01-08 14:57:57 +02:00
Avi Kivity
0bde5906b3 Merge "cql3: detect and handle int overflow in aggregate functions #5537" from Benny
"
Fix overflow handling in sum() and avg().

sum:
 - aggregated into __int128
 - detect overflow when computing result and log a warning if found

avg:
 - fix division function to divide the accumulator type _sum (__int128 for integers) by _count

Add unit tests for both cases

Test:
  - manual test against Cassandra 3.11.3 to make sure the results in the scylla unit test agree with it.
  - unit(dev), cql_query_test(debug)

Fixes #5536
"

* 'cql3-sum-overflow' of https://github.com/bhalevy/scylla:
  test: cql_query_test: test avg overflow
  cql3: functions: protect against int overflow in avg
  test: cql_query_test: test sum overflow
  cql3: functions: detect and handle int overflow in sum
  exceptions: sort exception_code definitions
  exceptions: define additional cassandra CQL exceptions codes
2020-01-08 10:39:38 +02:00
Avi Kivity
d649371baa Merge "Fix crash on SELECT SUM(udf(...))" from Rafael
"
We were failing to start a thread when the UDF call was nested in an
aggregate function call like SUM.
"

* 'espindola/fix-sum-of-udf' of https://github.com/espindola/scylla:
  cql3: Fix indentation
  cql3: Add missing with_thread_if_needed call
  cql3: Implement abstract_function_selector::requires_thread
  remove make_ready_future call
2020-01-08 10:25:42 +02:00
Benny Halevy
dafbd88349 query: initialize read_command timestamp to now
This was initialized to api::missing_timestamp but
should be set to either a client provided-timestamp or
the server's.

Unlike write operations, this timestamp need not be unique
as the one generated by client_state::get_timestamp.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200108074021.282339-2-bhalevy@scylladb.com>
2020-01-08 10:19:07 +02:00
Benny Halevy
39325cf297 storage_proxy: fix int overflow in service::abstract_read_executor::execute
exec->_cmd->read_timestamp may be initialized by default to api::min_timestamp,
causing:
  service/storage_proxy.cc:3328:116: runtime error: signed integer overflow: 1577983890961976 - -9223372036854775808 cannot be represented in type 'long int'
  Aborting on shard 1.

Do not optimize cross-dc repair if read_timestamp is missing (or just negative).
We're interested in reads that happen within write_timeout of a write.
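The guard can be sketched as follows (hypothetical names; the helper and its parameters are not the actual storage_proxy code):

```cpp
#include <cassert>
#include <cstdint>

// Skip the optimization when the read timestamp is missing (negative)
// instead of computing now - timestamp, which is signed-overflow UB when
// the timestamp is INT64_MIN (api::min_timestamp).
bool within_write_timeout(int64_t now_us, int64_t read_timestamp_us,
                          int64_t write_timeout_us) {
    if (read_timestamp_us < 0) {
        return false;  // missing or bogus timestamp: do not optimize
    }
    return now_us - read_timestamp_us < write_timeout_us;
}
```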

Fixes #5556

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200108074021.282339-1-bhalevy@scylladb.com>
2020-01-08 10:18:59 +02:00
Raphael S. Carvalho
390c8b9b37 sstables: Move STCS implementation to source file
A header-only implementation can potentially create problems with duplicate symbols.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200107154258.9746-1-raphaelsc@scylladb.com>
2020-01-08 09:55:35 +02:00
Benny Halevy
20a0b1a0b6 test: cql_query_test: test avg overflow
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-01-08 09:50:50 +02:00
Benny Halevy
1c81422c1b cql3: functions: protect against int overflow in avg
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-01-08 09:48:33 +02:00
Benny Halevy
9053ef90c7 test: cql_query_test: test sum overflow
Add unit tests for summing up int's and bigint's
with possible handling of overflow.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-01-08 09:48:33 +02:00
Benny Halevy
e97a111f64 cql3: functions: detect and handle int overflow in sum
Detect integer overflow in cql sum functions and throw an error.
Note that Cassandra quietly truncates the sum if it doesn't fit
in the input type but we rather break compatibility in this
case. See https://issues.apache.org/jira/browse/CASSANDRA-4914?focusedCommentId=14158400&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14158400
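The approach can be sketched like this (a simplified model of the technique, not the Scylla aggregate itself): accumulate into `__int128` and check that the result narrows back without loss. Note `__int128` is a GCC/Clang extension.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Sum 64-bit values into a wider __int128 accumulator; throw if the final
// sum does not fit back into the 64-bit result type, instead of quietly
// truncating as Cassandra does.
int64_t checked_sum(const std::vector<int64_t>& values) {
    __int128 acc = 0;
    for (int64_t v : values) {
        acc += v;  // cannot overflow the 128-bit accumulator here
    }
    auto ret = static_cast<int64_t>(acc);
    if (static_cast<__int128>(ret) != acc) {
        throw std::overflow_error("sum() overflowed the result type");
    }
    return ret;
}
```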

Fixes #5536

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-01-08 09:48:33 +02:00
Benny Halevy
98260254df exceptions: sort exception_code definitions
Be compatible with Cassandra source.
It's easier to maintain this way.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-01-08 09:48:21 +02:00
Benny Halevy
30d0f1df75 exceptions: define additional cassandra CQL exceptions codes
As of e9da85723a

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-01-08 09:40:57 +02:00
Rafael Ávila de Espíndola
282228b303 cql3: Fix indentation
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-01-07 22:14:50 -08:00
Rafael Ávila de Espíndola
4316bc2e18 cql3: Add missing with_thread_if_needed call
This fixes an assert when doing sum(udf(...)).

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-01-07 22:14:50 -08:00
Rafael Ávila de Espíndola
d301d31de0 cql3: Implement abstract_function_selector::requires_thread
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-01-07 22:14:24 -08:00
Rafael Ávila de Espíndola
dc9b3b8ff2 remove make_ready_future call
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-01-07 22:10:27 -08:00
Calle Wilund
9f6b22d882 cdc: Assign self to storage proxy object 2020-01-07 12:01:58 +00:00
Calle Wilund
fc5904372b storage_proxy: Add (optional) cdc service object pointer member
The cdc service is assigned from outside, post construction, mainly
because of the chicken-and-egg problem in main startup. It would be nice
to have it unconditionally, but this is workable.
2020-01-07 12:01:58 +00:00
Calle Wilund
d6003253dd storage_proxy: Move mutate_counters to private section
It is (and shall) only be called from inside storage proxy,
and we would like this to be reflected in the interface
so our eventual moving of cdc logic into the mutate call
chains become easier to verify and comprehend.
2020-01-07 12:01:58 +00:00
Calle Wilund
b6c788fccf cdc: Add augmentation call to cdc service
To eventually replace the free function.
The main difference is that this is built both to handle batches correctly
and to eventually allow hanging the cdc object on the storage proxy,
with caches on the cdc object.
2020-01-07 12:01:58 +00:00
Piotr Sarna
04dc8faec9 test: add a case for multiple base regular columns in view key
The test case checks that having two base regular columns
in the materialized view key (not obtainable via CQL),
still works fine when values are inserted or deleted.
If TTL were involved and these columns had different expiration
rules, the case would be more complicated, but it's not possible
for a user to reach that case, neither with CQL nor with alternator.
2020-01-07 12:19:06 +01:00
Piotr Sarna
155a47cc55 view: handle multiple regular base columns in view pk
Previous assumption was that there can only be one regular base column
in the view key. The assumption is still correct for tables created
via CQL, but it's internally possible to create a view with multiple
such columns - the new assumption is that if there are multiple columns,
they share their liveness.
This patch is vital for indexing to work properly on alternator,
so it would be best to solve the issue upstream. I strived to leave
the existing semantics intact as long as only up to one regular
column is part of the materialized view primary key, which is the case
for Scylla's materialized views. For alternator it may not be true,
but all regular columns in alternator share liveness info (since
alternator does not support per-column TTL), which is sufficient
to compute view updates in a consistent way.

Fixes #5006

Tests: unit(dev), alternator(test_gsi_update_second_regular_base_column, tic-tac-toe demo)

Message-Id: <c9dec243ce903d3a922ce077dc274f988bcf5d57.1567604945.git.sarna@scylladb.com>
2020-01-07 12:18:39 +01:00
Avi Kivity
6e0a073b2e mutation_partition: use type-aware printing of the clustering row
Now that position_in_partition_view has type-aware printing, use it
to provide a human readable version of clustering keys.
Message-Id: <20191231151315.602559-2-avi@scylladb.com>
2020-01-07 12:17:11 +01:00
Avi Kivity
488c42408a position_in_partition_view: add type-aware printer
If the position_in_partition_view represents a clustering key,
we can now see it with the clustering key decoded according to
the schema.
Message-Id: <20191231151315.602559-1-avi@scylladb.com>
2020-01-07 12:15:09 +01:00
Piotr Sarna
54315f89cd db,view: fix checking if partition key is empty
Previous implementation did not take into account that a column
in a partition key might exist in a mutation, but in a DEAD state
- if it's deleted. There are no regressions for CQL, while for
alternator and its capability of having two regular base columns
in a view key, this additional check must be performed.
2020-01-07 12:05:36 +01:00
Avi Kivity
3a3c20d337 schema_tables: de-templatize diff_table_or_view()
This reduces code bloat and makes the code friendlier for IDEs, as the
IDE now understands the type of create_schema.
Message-Id: <20191231134803.591190-1-avi@scylladb.com>
2020-01-07 11:56:54 +01:00
Avi Kivity
e5e42672f5 sstables: reduce bloat from sstables::write_simple()
sstables::write_simple() has quite a lot of boilerplate
which gets replicated into each template instance. Move
all of that into a non-template do_write_simple(), leaving
only things that truly depend on the component being written
in the template, and encapsulating them with a
noncopyable_function.

An explicit template instantiation was added, since this
is used in a header file. Before, it likely worked by
accident and stopped working when the template became
small enough to inline.

Tests: unit (dev)
Message-Id: <20200106135453.1634311-1-avi@scylladb.com>
2020-01-07 11:56:11 +01:00
Avi Kivity
8f7f56d6a0 schema_tables: make gratuitous generic lambda in create_tables_from_partitions() concrete
The generic lambda made IDE searches for create_table_from_table_row() fail.
Message-Id: <20191231135210.591972-1-avi@scylladb.com>
2020-01-07 11:49:10 +01:00
Avi Kivity
92fd83d3af schema_tables: make gratuitous generic lambda in create_table_from_name() concrete
The lambda made IDE searches for read_table_mutations fail.
Message-Id: <20191231135103.591741-1-avi@scylladb.com>
2020-01-07 11:48:56 +01:00
Avi Kivity
dd6dd97df9 schema_tables: make gratuitous generic lambda in merge_tables_and_views() concrete
The generic lambda made IDE searches for create_table_from_mutations fail.
Message-Id: <20191231135059.591681-1-avi@scylladb.com>
2020-01-07 11:48:39 +01:00
Avi Kivity
c63cf02745 canonical_mutation: add pretty printing
Add type-aware printing of canonical_mutation objects.
2020-01-07 12:06:31 +02:00
Avi Kivity
e093121687 mutation_partition_view: add virtual visitor
mutation_partition_view now supports a compile-time resolved visitor.
This is performant but results in bloat when the performance is not
needed. Furthermore, the template function that applies the object
to the visitor is private and out-of-line, to reduce compile time.

To allow visitation on mutation_partition_view objects, add a virtual
visitor type and a non-template accept function.

Note: mutation_partition_visitor is very similar to the new type,
but different enough to break the template visitor which is used
to implement the new visitor.

The new visitor will be used to implement pretty printing for
canonical_mutation.
2020-01-07 12:06:31 +02:00
Avi Kivity
75d9909b27 collection_mutation_view: add type-aware pretty printer
Add a way for the user to associate a type with a collection_mutation_view
and get a nice printout.
2020-01-07 12:06:29 +02:00
Rafael Ávila de Espíndola
b80852c447 main: Explicitly allow scylla core dumps
I have not looked into the security reason for disabling it when
a program has file capabilities.

Fixes #5560

[avi: remove extraneous semicolon]
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200106231836.99052-1-espindola@scylladb.com>
2020-01-07 11:15:59 +02:00
Rafael Ávila de Espíndola
07f1cb53ea tests: run with ASAN_OPTIONS='disable_coredump=0:abort_on_error=1'
These are the same options we use in seastar.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200107001513.122238-1-espindola@scylladb.com>
2020-01-07 11:11:49 +02:00
Takuya ASADA
238a25a0f4 docker: fix typo of scylla-jmx script path (#5551)
The path should be /opt/scylladb/jmx, not /opt/scylladb/scripts/jmx.

Fixes #5542
2020-01-07 10:54:16 +02:00
Asias He
401854dbaf repair: Avoid duplicated partition_end write
Consider this:

1) Write partition_start of p1
2) Write clustering_row of p1
3) Write partition_end of p1
4) Repair is stopped due to error before writing partition_start of p2
5) Repair calls repair_row_level_stop() to tear down which calls
   wait_for_writer_done(). A duplicate partition_end is written.

To fix, track the partition_start and partition_end written, avoid
unpaired writes.
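The pairing can be tracked with a single flag, roughly like this (names invented; the real writer is the repair row-level machinery):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Track whether a partition_start has been written without its matching
// partition_end, and only emit the closing partition_end on teardown if a
// partition is actually open -- avoiding the duplicate described above.
struct repair_writer_sketch {
    std::vector<std::string> out;
    bool partition_open = false;

    void write_partition_start() { out.push_back("partition_start"); partition_open = true; }
    void write_clustering_row()  { out.push_back("clustering_row"); }
    void write_partition_end()   { out.push_back("partition_end"); partition_open = false; }

    // called from the teardown path (wait_for_writer_done)
    void finish() {
        if (partition_open) {
            write_partition_end();
        }
    }
};
```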

Backports: 3.1 and 3.2
Fixes: #5527
2020-01-06 14:06:02 +02:00
Eliran Sinvani
e64445d7e5 debian-reloc: Propagate PRODUCT variable to renaming command in debian pkg
commit 21dec3881c introduced
a bug that will cause the scylla debian build to fail. This is
because the commit relied on the PRODUCT environment variable
being exported (and, as a result, propagating to the rename
command that is executed by find in a subshell).
This commit fixes it by explicitly passing the PRODUCT variable
into the rename command.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <20200106102229.24769-1-eliransin@scylladb.com>
2020-01-06 12:31:58 +02:00
Asias He
38d4015619 gossiper: Remove HIBERNATE status from dead state
In scylla, the replacing node is set to HIBERNATE status; this is the only
place we use HIBERNATE status. The replacing node is supposed to be
alive and updating its heartbeat, so it is not supposed to be in a dead
state.

This patch fixes the following problem in replacing.

   1) start n1, n2
   2) n2 is down
   3) start n3 to replace n2, but kill n3 in the middle of the replace
   4) start n4 to replace n2

After step 3 and step 4, the old n3 will stay in gossip forever until a
full cluster shutdown. Note that n3 will only stay in gossip, not in the
system.peers table. Users will see annoying, endless logs like the
following on all the nodes:

   rpc - client $ip_of_n3:7000: fail to connect: Connection refused

Fixes: #5449
Tests: replace_address_test.py + manual test
2020-01-06 11:47:31 +02:00
Amos Kong
c5ec1e3ddc scylla_ntp_setup: check redhat variant version by parse_version (#5434)
VERSION_ID of centos7 is "7", but VERSION_ID of oel7.7 is "7.7", so
scylla_ntp_setup fails on OEL 7.7 with a ValueError:

- ValueError: invalid literal for int() with base 10: '7.7'

This patch changes redhat_version() to return the version string and
compares versions with parse_version().

Fixes #5433

Signed-off-by: Amos Kong <amos@scylladb.com>
2020-01-06 11:43:14 +02:00
Asias He
145fd0313a streaming: Fix map access in stream_manager::get_progress
When the progress is queried, e.g., query from nodetool netstats
the progress info might not be updated yet.

Fix it by checking before accessing the map, to avoid errors like:

std::out_of_range (_Map_base::at)
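The defensive lookup amounts to replacing `at()` with `find()` plus a default (an illustrative map layout, not the actual stream_manager structures):

```cpp
#include <cassert>
#include <map>
#include <string>

// at() throws std::out_of_range when progress for a session has not been
// recorded yet; find() lets us fall back to a default instead.
long get_progress(const std::map<std::string, long>& progress,
                  const std::string& session) {
    auto it = progress.find(session);
    return it == progress.end() ? 0 : it->second;
}
```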

Fixes: #5437
Tests: nodetool_additional_test.py:TestNodetool.netstats_test
2020-01-06 10:31:15 +02:00
Rafael Ávila de Espíndola
98cd8eddeb tests: Run with halt_on_error=1:abort_on_error=1
This depends on the just emailed fixes to undefined behavior in
tests. With this change we should quickly notice if a change
introduces undefined behavior.

Fixes #4054

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>

Message-Id: <20191230222646.89628-1-espindola@scylladb.com>
2020-01-05 17:20:31 +02:00
Rafael Ávila de Espíndola
dc5ecc9630 enum_option_test: Add explicit underlying types to enums
We expect to be able to create variables with out-of-range values, so
these enums need explicit underlying types.
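A tiny illustration of why the underlying type matters (a generic example, not the enums from the test):

```cpp
#include <cassert>
#include <cstdint>

// Without an explicit underlying type, an enum's value range is only as
// wide as its enumerators require, and casting an out-of-range value into
// it is undefined. With a fixed underlying type, every uint32_t is valid.
enum class raw_value : uint32_t {};

raw_value from_raw(uint32_t v) {
    return static_cast<raw_value>(v);  // well-defined for any uint32_t
}
```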

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200102173422.68704-1-espindola@scylladb.com>
2020-01-05 17:20:31 +02:00
Nadav Har'El
f0d8dd4094 merge: CDC rolling upgrade
Merged pull request https://github.com/scylladb/scylla/pull/5538 from
Avi Kivity and Piotr Jastrzębski.

This series prepares CDC for rolling upgrade. This consists of
reducing the footprint of cdc, when disabled, on the schema, adding
a cluster feature, and redacting the cdc column when transferring
it to other nodes. The latter is needed because we'll want to backport
this to 3.2, which doesn't have canonical_mutations yet.
2020-01-05 17:13:12 +02:00
Gleb Natapov
720c0aa285 commitlog: update last sync timestamp when cycle a buffer
If the in-memory buffer does not have enough space for an incoming
mutation it is written to a file, but the code missed updating the
timestamp of the last sync, so we may sync too often.
Message-Id: <20200102155049.21291-9-gleb@scylladb.com>
2020-01-05 16:13:59 +02:00
Gleb Natapov
14746e4218 commitlog: drop segment gate
The code that enters the gate never defers before leaving, so the gate
behaves like a flag. Let's use the existing flag to prohibit adding data to a
closed segment.
Message-Id: <20200102155049.21291-8-gleb@scylladb.com>
2020-01-05 16:13:59 +02:00
Gleb Natapov
f8c8a5bd1f test: fix error reporting in commitlog_test
Message-Id: <20200102155049.21291-7-gleb@scylladb.com>
2020-01-05 16:13:58 +02:00
Gleb Natapov
680330ae70 commitlog: introduce segment::close() function.
Currently segment closing code is spread over several functions and
activated based on the _closed flag. Make segment closing explicit
by moving all the code into close() function and call it where _closed
flag is set.
Message-Id: <20200102155049.21291-6-gleb@scylladb.com>
2020-01-05 16:13:55 +02:00
Gleb Natapov
a1ae08bb63 commitlog: remove unused segment::flush() parameter
Message-Id: <20200102155049.21291-5-gleb@scylladb.com>
2020-01-05 16:13:55 +02:00
Gleb Natapov
1e15e1ef44 commitlog: cleanup segment sync()
Call cycle() only once.
Message-Id: <20200102155049.21291-4-gleb@scylladb.com>
2020-01-05 16:13:54 +02:00
Gleb Natapov
3d3d2c572e commitlog: move segment shutdown code from sync()
Currently sync() does two completely different things based on the
shutdown parameter. Separate the code into two different functions.
Message-Id: <20200102155049.21291-3-gleb@scylladb.com>
2020-01-05 16:13:54 +02:00
Gleb Natapov
89afb92b28 commitlog: drop superfluous this
Message-Id: <20200102155049.21291-2-gleb@scylladb.com>
2020-01-05 16:13:53 +02:00
Piotr Jastrzebski
95feeece0b scylla_tables: treat empty cdc props as disabled
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-01-05 14:39:23 +02:00
Piotr Jastrzebski
396e35bf20 cdc: add schema_change test for cdc_options
The original "test_schema_digest_does_not_change" test case ensures
that schema digests will match for older nodes that do not support
all the features yet (including computed columns).
The additional case uses sstables that were generated after CDC was enabled
and a table with CDC enabled was created,
in order to make sure that the digest computed
including the CDC column does not change spuriously either.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-01-05 14:39:23 +02:00
Piotr Jastrzebski
c08e6985cd cdc: allow cluster rolling upgrade
Addition of the cdc column in scylla_tables changes how schema
digests are calculated, and affects the ABI of schema update
messages (adding a column changes other columns' indexes
in frozen_mutation).

To fix this, extend the schema_tables mechanism with support
for the cdc column, and adjust schemas and mutations to remove
that column when sending schemas during upgrade.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-01-05 14:39:23 +02:00
Piotr Jastrzebski
caa0a4e154 tests: disable CDC in schema_change_tests
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-01-05 14:39:23 +02:00
Piotr Jastrzebski
129af99b94 cdc: Return reference from cluster_supports_cdc
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-01-05 14:39:23 +02:00
Piotr Jastrzebski
4639989964 cdc: Add CDC_OPTIONS schema_feature
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-01-05 14:39:23 +02:00
Avi Kivity
c150f2e5d7 schema_tables, cdc: don't store empty cdc columns in scylla_tables
An empty cdc column in scylla_tables is hashed differently from
a missing column. This causes schema mismatch when a schema is
propagated to another node, because the other node will redact
the schema column completely if the cluster feature isn't enabled,
and an empty value is hashed differently from a missing value.

Store a tombstone instead. Tombstones are removed before
digesting, so they don't affect the outcome.

This change also undoes the changes in 386221da84 ("schema_tables:
 handle 'cdc' options") to schema_change_test
test_merging_does_not_alter_tables_which_didnt_change. That change
enshrined the breakage into the test, instead of fixing the root cause,
which was that we added an extra mutation to the schema (for
cdc options, which were disabled).
2020-01-05 14:36:18 +02:00
Rafael Ávila de Espíndola
3d641d4062 lua: Use existing cpp_int cast logic
Different versions of boost have different rules for what conversions
from cpp_int to smaller integers are allowed.

We already had a function that worked with all supported versions, but
it was not being used by lua.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200104041028.215153-1-espindola@scylladb.com>
2020-01-05 12:10:54 +02:00
Rafael Ávila de Espíndola
88b5aadb05 tests: cql_test_env: wait for two futures starting internal services
I noticed this while looking at the crashes next is currently
experiencing.

While I have no idea if this fixes the issue, it does avoid broken
future warnings (for no_sharded_instance_exception) in a debug build.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200103201540.65324-1-espindola@scylladb.com>
2020-01-05 12:09:59 +02:00
Avi Kivity
4b8e2f5003 Update seastar submodule
* seastar 0525bbb08...36cf5c5ff (6):
  > memcached: Fix use after free in shutdown
  > Revert "task: stop wrapping tasks with unique_ptr"
  > task: stop wrapping tasks with unique_ptr
  > http: Change exception formating to the generic seastar one
  > Merge "Avoid a few calls to ~exception_ptr" from Rafael
  > tests: fix core generation with asan
2020-01-03 15:48:53 +02:00
Nadav Har'El
44c2a44b54 alternator-test: test for ConditionExpression feature
This patch adds a very comprehensive test for the ConditionExpression
feature, i.e., the newer syntax of conditional writes replacing
the old-style "Expected" - for the UpdateItem, PutItem and DeleteItem
operations.

I wrote these tests while closely following the DynamoDB ConditionExpression
documentation, and attempted to cover all conceivable features, subfeatures
and subcases of the ConditionExpression syntax - to serve as a test for a
future support for this feature in Alternator (see issue #5053).

As usual, all these tests pass on AWS DynamoDB, but because we haven't yet
implemented this feature in Alternator, all but one xfail on Alternator.

Refs #5053.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191229143556.24002-1-nyh@scylladb.com>
2020-01-03 15:48:20 +02:00
Nadav Har'El
aad5eeab51 alternator: better error messages when Alternator port is taken
If Alternator is requested to be enabled on a specific port but the port is
already taken, the boot fails as expected - but the error log is confusing;
It currently looks something like this:

WARN  2019-12-24 11:22:57,303 [shard 0] alternator-server - Failed to set up Alternator HTTP server on 0.0.0.0 port 8000, TLS port 8043: std::system_error (error system:98, posix_listen failed for address 0.0.0.0:8000: Address already in use)
... (many more messages about the server shutting down)
INFO  2019-12-24 11:22:58,008 [shard 0] init - Startup failed: std::system_error (error system:98, posix_listen failed for address 0.0.0.0:8000: Address already in use)

There are two problems here. First, the "WARN" should really be an "ERROR",
because it causes the server to be shut down and the user must see this error.
Second, the final line in the log, something the user is likely to see first,
contains only the ultimate cause for the exception (an address already in use)
but not the information what this address was needed for.

This patch solves both issues, and the log now looks like:

ERROR 2019-12-24 14:00:54,496 [shard 0] alternator-server - Failed to set up Alternator HTTP server on 0.0.0.0 port 8000, TLS port 8043: std::system_error (error system:98, posix_listen failed for address 0.0.0.0:8000: Address already in use)
...
INFO  2019-12-24 14:00:55,056 [shard 0] init - Startup failed: std::_Nested_exception<std::runtime_error> (Failed to set up Alternator HTTP server on 0.0.0.0 port 8000, TLS port 8043): std::system_error (error system:98, posix_listen failed for address 0.0.0.0:8000: Address already in use)

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191224124127.7093-1-nyh@scylladb.com>
2020-01-03 15:48:20 +02:00
Nadav Har'El
1f64a3bbc9 alternator: error on unsupported ReturnValues option
We don't yet support the ReturnValues option on PutItem, UpdateItem or
DeleteItem operations (see issue #5053), but if a user tries to use such
an option anyway, we silently ignore this option. It's better to fail,
reporting the unsupported option.

In this patch we check the ReturnValues option and if it is anything but
the supported default ("NONE"), we report an error.

Also added a test to confirm this fix. The test verifies that "NONE" is
allowed, and something which is unsupported (e.g., "DOG") is not ignored
but rather causes an error.

Refs #5053.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191216193310.20060-1-nyh@scylladb.com>
2020-01-03 15:48:20 +02:00
Rafael Ávila de Espíndola
dc93228b66 reloc: Turn the default flags into common flags
These are flags we always want to enable. In particular, we want them
to be used by the bots, but the bots run this script with
--configure-flags, so they were being discarded.

We put the user options later so that they can override the common
options.

Fixes #5505

Reviewed-by: Benny Halevy <bhalevy@scylladb.com>
Reviewed-by: Takuya ASADA <syuu@scylladb.com>
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-01-03 15:48:20 +02:00
Rafael Ávila de Espíndola
d4dfb6ff84 build-id: Handle the binary having multiple PT_NOTE headers
There is no requirement that all notes be placed in a single
PT_NOTE. It looks like recent lld's actually put each section in its
own PT_NOTE.

This change looks for build-id in all PT_NOTE headers.

Fixes #5525

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Reviewed-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191227000311.421843-1-espindola@scylladb.com>
2020-01-03 15:48:20 +02:00
Avi Kivity
1e9237d814 dist: redhat: use parallel compression for rpm payload
rpm compression uses xz, which is painfully slow. Adjust the
compression settings to run on all threads.

The xz utility documentation suggests that 0 threads is
equivalent to all CPUs, but apparently the library interface
(which rpmbuild uses) doesn't think the same way.

Message-Id: <20200101141544.1054176-1-avi@scylladb.com>
2020-01-03 15:48:20 +02:00
Nadav Har'El
de1171181c user defined types: fix support for case-sensitive type names
In the current code, support for case-sensitive (quoted) user-defined type
names is broken. For example, a test doing:

    CREATE TYPE "PHone" (country_code int, number text)
    CREATE TABLE cf (pk blob, pn "PHone", PRIMARY KEY (pk))

Fails - the first line creates the type with the case-sensitive name PHone,
but the second line wrongly ends up looking for the lowercased name phone,
and fails with an exception "Unknown type ks.phone".

The problem is in cql3_type_name_impl. This class is used to convert a
type object into its proper CQL syntax - for example frozen<list<int>>.
The problem is that for a user-defined type, we forgot to quote its name
if not lowercase, and the result is wrong CQL; For example, a list of
PHone will be written as list<PHone> - but this is wrong because the CQL
parser, when it sees this expression, lowercases the unquoted type name
PHone and it becomes just phone. It should be list<"PHone">, not list<PHone>.

The solution is for cql3_type_name_impl to use for a user-defined type
its get_name_as_cql_string() method instead of get_name_as_string().

get_name_as_cql_string() is a new method which prints the name of the
user type as it should be in a CQL expression, i.e., quoted if necessary.

The bug in the above test was apparently caused when our code serialized
the type name to disk as the string PHone (without any quoting), and then
later deserialized it using the CQL type parser, which converted it into
a lowercase phone. With this patch, the type's name is serialized as
"PHone", with the quotes, and deserialized properly as the type PHone.
While the extra quotes may seem excessive, they are necessary for the
correct CQL type expression - remember that the type expression may be
significantly more complex, e.g., frozen<list<"PHone">> and all of this,
including the quotes, is necessary for our parser to be able to translate
this string back into a type object.

This patch may cause breakage to existing databases which used case-
sensitive user-defined types, but I argue that these use cases were
already broken (as demonstrated by this test) so we won't break anything
that actually worked before.

Fixes #5544

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200101160805.15847-1-nyh@scylladb.com>
2020-01-03 15:48:20 +02:00
Pavel Emelyanov
34f8762c4d storage_service: Drop _update_jobs
This field is write-only.
Leftover from 83ffae1 (storage_service: Drop block_until_update_pending_ranges_finished)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20191226091210.20966-1-xemul@scylladb.com>
2020-01-03 15:48:20 +02:00
Pavel Emelyanov
f2b20e7083 cache_hitrate_calculator: Do not reinvent the peering_sharded_service
The class in question wants to run its own instances on different
shards, for this sake it keeps reference on sharded self to call
invoke_on() on. There's a handy peering_sharded_service<> in seastar
for the same, using it makes the code nicer and shorter.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20191226112401.23960-1-xemul@scylladb.com>
2020-01-03 15:48:19 +02:00
Rafael Ávila de Espíndola
bbed9cac35 cql3: move function creation to a .cc file
We had a lot of code in a .hh file that, while using templates, was
only used for creating functions during startup.

This moves it to a new .cc file.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200101002158.246736-1-espindola@scylladb.com>
2020-01-03 15:48:19 +02:00
Benny Halevy
c0883407fe scripts: Add cpp-name-format: pretty printer
Pretty-print cpp-names, useful for deciphering complex backtraces.

For example, the following line:
    service::storage_proxy::init_messaging_service()::{lambda(seastar::rpc::client_info const&, seastar::rpc::opt_time_point, std::vector<frozen_mutation, std::allocator<frozen_mutation> >, db::consistency_level, std::optional<tracing::trace_info>)#1}::operator()(seastar::rpc::client_info const&, seastar::rpc::opt_time_point, std::vector<frozen_mutation, std::allocator<frozen_mutation> >, db::consistency_level, std::optional<tracing::trace_info>) const at /local/home/bhalevy/dev/scylla/service/storage_proxy.cc:4360

Is formatted as:
    service::storage_proxy::init_messaging_service()::{
      lambda(
        seastar::rpc::client_info const&,
        seastar::rpc::opt_time_point,
        std::vector<
          frozen_mutation,
          std::allocator<frozen_mutation>
        >,
        db::consistency_level,
        std::optional<tracing::trace_info>
      )#1
    }::operator()(
      seastar::rpc::client_info const&,
      seastar::rpc::opt_time_point,
      std::vector<
        frozen_mutation,
        std::allocator<frozen_mutation>
      >,
      db::consistency_level,
      std::optional<tracing::trace_info>
    ) const at /local/home/bhalevy/dev/scylla/service/storage_proxy.cc:4360

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191226142212.37260-1-bhalevy@scylladb.com>
2020-01-01 12:08:12 +02:00
Rafael Ávila de Espíndola
75817d1fe7 sstable: Add checks to help track problems with large_data_handler use after free
I can't quite figure out how we were trying to write a sstable with
the large data handler already stopped, but the backtrace suggests a
good place to add extra checks.

This patch adds two checks: one at the start and one at the end of
sstable::write_components. The first one should give us better
backtraces if the large_data_handler is already stopped. The second
one should help catch a race condition.

Refs: #5470
Message-Id: <20191231173237.19040-1-espindola@scylladb.com>
2020-01-01 12:03:31 +02:00
Rafael Ávila de Espíndola
3c34e2f585 types: Avoid an unaligned load in json integer serialization
The patch also adds a test that makes the fixed issue easier to
reproduce.

Fixes #5413
Message-Id: <20191231171406.15980-1-espindola@scylladb.com>
2019-12-31 19:23:42 +02:00
Gleb Natapov
bae5cb9f37 commitlog: remove unused argument during segment creation
Since 99a5a77234 all segments are created
equal and the "active" argument is never true, so drop it.

Message-Id: <20191231150639.GR9084@scylladb.com>
2019-12-31 17:14:03 +02:00
Rafael Ávila de Espíndola
aa535a385d enum_option_test: Add an explicit underlying type to an enum
We expect to be able to create a variable with an out of range value,
so the enum needs an explicit underlying type.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191230222029.88942-1-espindola@scylladb.com>
2019-12-31 16:59:00 +02:00
Nadav Har'El
48a914c291 Fix uninitialized members
Merged pull request https://github.com/scylladb/scylla/pull/5532 from
Benny Halevy:

Initialize bool members in row_level_repair and _storage_service causing
ubsan errors.

Fixes #5531
2019-12-31 10:32:54 +02:00
Takuya ASADA
aa87169670 dist/debian: add procps on Depends
We require the procps package to use sysctl in the postinst script for scylla-kernel-conf.

Fixes #5494

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20191218234100.37844-1-syuu@scylladb.com>
2019-12-30 19:30:35 +02:00
Avi Kivity
972127e3a8 atomic_cell: add type-aware pretty printing
The standard printer for atomic_cell prints the value as hex,
because atomic_cell does not include the type. Add a type-aware
printer that allows the user to provide the type.
2019-12-30 18:27:04 +02:00
Avi Kivity
19f68412ad atomic_cell: move pretty printers from database.cc to atomic_cell.cc
atomic_cell.cc is the logical home for atomic_cell pretty printers,
and since we plan to add more pretty printers, start by tidying up.
2019-12-30 18:20:30 +02:00
Eliran Sinvani
21dec3881c debian-reloc: rename build product to the name specified in SCYLLA-VERSION-GEN
When the product name is other than "scylla", the debian
packaging scripts go over all files that start with "scylla-"
and change the prefix to be the actual product name.
However, if there are no such files in the directory,
the script will fail since the renaming command will
get the wildcard string instead of an actual file name.
This patch replaces that command with an equivalent one
that only operates on files if there are any.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <20191230143250.18101-1-eliransin@scylladb.com>
2019-12-30 17:45:50 +02:00
Takuya ASADA
263385cb4b dist: stop replacing /usr/lib/scylla with symlink (#5530)
Since we merged /usr/lib/scylla with /opt/scylladb, we removed
/usr/lib/scylla and replaced it with a symlink pointing to /opt/scylladb.
However, RPM does not support replacing a directory with a symlink,
so we were doing a dirty hack using an RPM scriptlet, but it caused
multiple issues on upgrade/downgrade.
(See: https://docs.fedoraproject.org/en-US/packaging-guidelines/Directory_Replacement/)

To minimize Scylla upgrade/downgrade issues on the user side, it's better
to keep the /usr/lib/scylla directory.
Instead of creating a single symlink /usr/lib/scylla -> /opt/scylladb,
we can create a symlink for each setup script, like
/usr/lib/scylla/<script> -> /opt/scylladb/scripts/<script>.

Fixes #5522
Fixes #4585
Fixes #4611
2019-12-30 13:52:24 +02:00
Hagit Segev
9d454b7dc6 reloc/build_rpm.sh: Fix '--builddir' option handling (#5519)
The '--builddir' option value is assigned to the "builddir" variable,
which is wrong. The correct variable is "BUILDDIR" so use that instead
to fix the '--builddir' option.

Also, add logging to the script when executing the "dist/redhat_build.rpm.sh"
script to simplify debugging.
2019-12-30 13:25:22 +02:00
Benny Halevy
8aa5d84dd8 storage_service: initialize _is_bootstrap_mode
Hit the following ubsan error with bootstrap_test:TestBootstrap.manual_bootstrap_test in debug mode:
  service/storage_service.cc:3519:37: runtime error: load of value 190, which is not a valid value for type 'bool'

The use site is:
  service::storage_service::is_cleanup_allowed(seastar::basic_sstring<char, unsigned int, 15u, true>)::{lambda(service::storage_service&)#1}::operator()(service::storage_service&) const at /local/home/bhalevy/dev/scylla/service/storage_service.cc:3519

While at it, initialize `_initialized` to false as well, just in case.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-12-30 11:44:58 +02:00
Benny Halevy
474ffb6e54 repair: initialize row_level_repair: _zero_rows
Avoid following UBSAN error:
repair/row_level.cc:2141:7: runtime error: load of value 240, which is not a valid value for type 'bool'

Fixes #5531

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-12-30 11:44:58 +02:00
Fabiano Lucchese
d7795b1efa scylla_setup: Support for enforcing optimal Linux clocksource setting (#5499)
A Linux machine typically has multiple clocksources with distinct
performance characteristics. Setting a high-performance clocksource might
result in better performance for ScyllaDB, so this should be considered
whenever starting it up.

This patch introduces the possibility of enforcing an optimized Linux
clocksource in Scylla's setup/start-up processes. It does so by adding
an interactive question about enforcing the clocksource setting to
scylla_setup, which modifies the "CLOCKSOURCE" parameter in the
scylla_server configuration file. This parameter is read by perftune.py
which, if it is set to "yes", proceeds to (non-persistently) set the
clocksource. On x86, the TSC clocksource is used.

Fixes #4474
Fixes #5474
Fixes #5480
2019-12-30 10:54:14 +02:00
Avi Kivity
e223154268 cdc: options: return an empty options map when cdc is disabled
This is compatible with 3.1 and below, which didn't have that schema
field at all.
2019-12-29 16:34:37 +02:00
Benny Halevy
27e0aee358 docs/debugging.md: fix anchor links
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191229074136.13516-1-bhalevy@scylladb.com>
2019-12-29 16:26:26 +02:00
Pavel Solodovnikov
aba9a11ff0 cql: pass variable_specifications via lw_shared_ptr
Instances of `variable_specifications` are passed around as
shared_ptr's, which are redundant in this case since the class
is marked as `final`. Use `lw_shared_ptr` instead since we know
for sure it's not a polymorphic pointer.

Tests: unit(debug)

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20191225232853.45395-1-pa.solodovnikov@scylladb.com>
2019-12-29 16:26:26 +02:00
Benny Halevy
4c884908bb directories: Keep a unique set of directories to initialize
If any two directories of data/commitlog/hints/view_hints
are the same, we still end up running verify_owner_and_mode
and disk_sanity(check_direct_io_support) in parallel
on the same directories and hit #5510.

This change uses std::set rather than std::vector to
collect a unique set of directories that need initialization.

Fixes #5510

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191225160645.2051184-1-bhalevy@scylladb.com>
2019-12-29 16:26:26 +02:00
Gleb Natapov
60a851d3a5 commitlog: always flush segments atomically with writing
db::commitlog::segment::batch_cycle() assumes that after a write
for a certain position completes (as reported by
_pending_ops.wait_for_pending()) it will also be flushed, but this is
true only if writing and flushing are atomic wrt _pending_ops lock.
It usually is unless flush_after is set to false when cycle() is
called. In this case only writing is done under the lock. This
is exactly what happens when a segment is closed. Flush is skipped
because a zero header is added after the last entry and then flushed, but
this optimization breaks batch_cycle() assumption. Fix it by flushing
after the write atomically even if a segment is being closed.

Fixes #5496

Message-Id: <20191224115814.GA6398@scylladb.com>
2019-12-24 14:52:23 +02:00
Pavel Emelyanov
a5cdfea799 directories: Do not mess with per-shard base dir
The hints and view_hints directory has per-shard sub-dirs,
and the directories code tries to create, check and lock
all of them, including the base one.

The manipulations in question are excessive -- it's enough
to check and lock either the base dir, or all the per-shard
ones, but not everything. Let's take the latter approach for
its simplicity.

Fixes #5510

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Looks-good-to: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191223142429.28448-1-xemul@scylladb.com>
2019-12-24 14:49:28 +02:00
Benny Halevy
f8f5db42ca dbuild: try to pull image if not present locally
Pekka Enberg <penberg@scylladb.com> wrote:
> Image might not be present, but the subsequent "docker run" command will automatically pull it.

Just letting "docker run" fail produces a rather confusing error message
referring to docker help, but we want to provide the user
with our own help. So still fail early, but also try to pull the image
if "docker image inspect" fails, indicating it's not present locally.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191223085219.1253342-4-bhalevy@scylladb.com>
2019-12-24 11:13:23 +02:00
Benny Halevy
ee2f97680a dbuild: just die when no image-id is provided
Suggested-by: Pekka Enberg <penberg@scylladb.com>
> This will print all the available Docker images,
> many (most?) of them completely unrelated.
> Why not just print an error saying that no image was specified,
> and then perhaps print usage.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191223085219.1253342-3-bhalevy@scylladb.com>
2019-12-24 11:13:22 +02:00
Benny Halevy
87b2f189f7 dbuild: s/usage/die/
Suggested-by: Dejan Mircevski <dejan@scylladb.com>
> The use pattern of this function strongly suggests a name like `die`.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191223085219.1253342-2-bhalevy@scylladb.com>
2019-12-24 11:13:21 +02:00
Benny Halevy
718e9eb341 table: move_sstables_from_staging: fix use after free of shared_sstable
Introduced in 4b3243f5b9

Reproducible with materialized_views_test:TestMaterializedViews.mv_populating_from_existing_data_during_node_remove_test
and read_amplification_test:ReadAmplificationTest.no_read_amplification_on_repair_with_mv_test

==955382==ERROR: AddressSanitizer: heap-use-after-free on address 0x60200023de18 at pc 0x00000051d788 bp 0x7f8a0563fcc0 sp 0x7f8a0563fcb0
READ of size 8 at 0x60200023de18 thread T1 (reactor-1)
    #0 0x51d787 in seastar::lw_shared_ptr<sstables::sstable>::lw_shared_ptr(seastar::lw_shared_ptr<sstables::sstable> const&) /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/shared_ptr.hh:289
    #1 0x10ba189 in apply<table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()>::<lambda(std::set<seastar::basic_sstring<char, unsigned int, 15> >&)>::<lambda(sstables::shared_sstable)>&, const seastar::lw_shared_ptr<sstables::sstabl
e>&> /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1530
    #2 0x109c4f1 in apply<table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()>::<lambda(std::set<seastar::basic_sstring<char, unsigned int, 15> >&)>::<lambda(sstables::shared_sstable)>&, const seastar::lw_shared_ptr<sstables::sstabl
e>&> /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1556
    #3 0x106941a in do_for_each<__gnu_cxx::__normal_iterator<const seastar::lw_shared_ptr<sstables::sstable>*, std::vector<seastar::lw_shared_ptr<sstables::sstable> > >, table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()>::<lambda(
std::set<seastar::basic_sstring<char, unsigned int, 15> >&)>::<lambda(sstables::shared_sstable)> > /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future-util.hh:618
    #4 0x1069203 in operator() /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future-util.hh:626
    #5 0x10ba589 in apply /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/apply.hh:36
    #6 0x10ba668 in apply<seastar::do_for_each(Iterator, Iterator, AsyncAction) [with Iterator = __gnu_cxx::__normal_iterator<const seastar::lw_shared_ptr<sstables::sstable>*, std::vector<seastar::lw_shared_ptr<sstables::sstable> > >; AsyncAction = table::move_sstables_from_staging
(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()>::<lambda(std::set<seastar::basic_sstring<char, unsigned int, 15> >&)>::<lambda(sstables::shared_sstable)>]::<lambda()>&> /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/apply.hh:44
    #7 0x10ba7c0 in apply<seastar::do_for_each(Iterator, Iterator, AsyncAction) [with Iterator = __gnu_cxx::__normal_iterator<const seastar::lw_shared_ptr<sstables::sstable>*, std::vector<seastar::lw_shared_ptr<sstables::sstable> > >; AsyncAction = table::move_sstables_from_staging
(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()>::<lambda(std::set<seastar::basic_sstring<char, unsigned int, 15> >&)>::<lambda(sstables::shared_sstable)>]::<lambda()>&> /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1563
    ...

0x60200023de18 is located 8 bytes inside of 16-byte region [0x60200023de10,0x60200023de20)
freed by thread T1 (reactor-1) here:
    #0 0x7f8a153b796f in operator delete(void*) (/lib64/libasan.so.5+0x11096f)
    #1 0x6ab4d1 in __gnu_cxx::new_allocator<seastar::lw_shared_ptr<sstables::sstable> >::deallocate(seastar::lw_shared_ptr<sstables::sstable>*, unsigned long) /usr/include/c++/9/ext/new_allocator.h:128
    #2 0x612052 in std::allocator_traits<std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >::deallocate(std::allocator<seastar::lw_shared_ptr<sstables::sstable> >&, seastar::lw_shared_ptr<sstables::sstable>*, unsigned long) /usr/include/c++/9/bits/alloc_traits.h:470
    #3 0x58fdfb in std::_Vector_base<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >::_M_deallocate(seastar::lw_shared_ptr<sstables::sstable>*, unsigned long) /usr/include/c++/9/bits/stl_vector.h:351
    #4 0x52a790 in std::_Vector_base<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >::~_Vector_base() /usr/include/c++/9/bits/stl_vector.h:332
    #5 0x52a99b in std::vector<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >::~vector() /usr/include/c++/9/bits/stl_vector.h:680
    #6 0xff60fa in ~<lambda> /local/home/bhalevy/dev/scylla/table.cc:2477
    #7 0xff7202 in operator() /local/home/bhalevy/dev/scylla/table.cc:2496
    #8 0x106af5b in apply<table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()> > /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1573
    #9 0x102f5d5 in futurize_apply<table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()> > /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1645
    #10 0x102f9ee in operator()<seastar::semaphore_units<seastar::named_semaphore_exception_factory> > /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/semaphore.hh:488
    #11 0x109d2f1 in apply /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/apply.hh:36
    #12 0x109d42c in apply<seastar::with_semaphore(seastar::basic_semaphore<ExceptionFactory, Clock>&, size_t, Func&&) [with ExceptionFactory = seastar::named_semaphore_exception_factory; Func = table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable>
 >)::<lambda()>; Clock = std::chrono::_V2::steady_clock]::<lambda(auto:51)>&, seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::chrono::_V2::steady_clock>&&> /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/apply.hh:44
    #13 0x109d595 in apply<seastar::with_semaphore(seastar::basic_semaphore<ExceptionFactory, Clock>&, size_t, Func&&) [with ExceptionFactory = seastar::named_semaphore_exception_factory; Func = table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable>
 >)::<lambda()>; Clock = std::chrono::_V2::steady_clock]::<lambda(auto:51)>&, seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::chrono::_V2::steady_clock>&&> /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1563
    ...

Fixes #5511

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191222214326.1229714-1-bhalevy@scylladb.com>
2019-12-23 15:20:41 +02:00
Konstantin Osipov
476fbc60be test.py: prepare to remove custom colors
Add dbuild dependency on python3-colorama,
which will be used in test.py instead of a hand-made palette.

[avi: update tools/toolchain/image]
Message-Id: <20191223125251.92064-2-kostja@scylladb.com>
2019-12-23 15:13:22 +02:00
Pavel Emelyanov
d361894b9d batchlog_manager: Speed up token_metadata endpoints counting a bit
In this place we only need to know the number of endpoints,
while the current code additionally shuffles them before counting.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-12-23 14:22:45 +02:00
Pavel Emelyanov
6e06c88b4c token_metadata: Remove unused helper
There are two _identical_ methods in the token_metadata class:
get_all_endpoints_count() and number_of_endpoints().
The former is used (called); the latter is not, so
let's remove it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-12-23 14:22:43 +02:00
Pavel Emelyanov
2662d9c596 migration_manager: Remove run_may_throw() first argument
It's unused in this function. Also this helps getting
rid of global instances of components.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-12-23 14:22:42 +02:00
Pavel Emelyanov
703b16516a storage_service: Remove unused helper
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-12-23 14:22:41 +02:00
Takuya ASADA
e0071b1756 reloc: don't archive dist/ami/files/*.rpm on relocatable package
We should skip archiving dist/ami/files/*.rpm on relocatable package,
since it isn't used.
The same goes for packer and variables.json.

Fixes #5508

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20191223121044.163861-1-syuu@scylladb.com>
2019-12-23 14:19:51 +02:00
Tomasz Grabiec
28dec80342 db/schema_tables: Add trace-level logging of schema digesting
This greatly helps to narrow down the source of schema digest mismatch
between nodes. The intended use is to enable this logger on disagreeing
nodes and trigger schema digest recalculation and observe which
mutations differ in digest and then examine their content.

Message-Id: <1574872791-27634-1-git-send-email-tgrabiec@scylladb.com>
2019-12-23 12:28:22 +02:00
Konstantin Osipov
1116700bc9 test.py: do not return 0 if there are failed tests
Fix a return value regression introduced when switching to asyncio.

Message-Id: <20191222134706.16616-2-kostja@scylladb.com>
2019-12-22 16:14:32 +02:00
Asias He
7322b749e0 repair: Do not return working_row_buf_nr in get combined row hash verb
In commit b463d7039c (repair: Introduce
get_combined_row_hash_response), working_row_buf_nr is returned in
REPAIR_GET_COMBINED_ROW_HASH in addition to the combined hash. It was
scheduled to be part of the 3.1 release; however, by accident it was
not backported to 3.1.

In order to keep 3.1 and 3.2 repair compatible, we need to drop
the working_row_buf_nr in the 3.2 release.

Fixes: #5490
Backports: 3.2
Tests: Run repair in a mixed 3.1 and 3.2 cluster
2019-12-21 20:13:15 +02:00
Takuya ASADA
8eaecc5ed6 dist/common/scripts/scylla_setup: add swap existence check
Show warnings when no swap is configured on the node.

Closes #2511

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20191220080222.46607-1-syuu@scylladb.com>
2019-12-21 20:03:58 +02:00
Pavel Solodovnikov
5a15bed569 cql3: return result_set by cref in cql3::result::result_set
Changes summary:
* make `cql3::result_set` movable-only
* change signature of `cql3::result::result_set` to return by cref
* adjust available call sites to the aforementioned method to accept cref

Motivation behind this change is elimination of dangerous API,
which can easily set a trap for developers who don't expect that
result_set would be returned by value.

There is no point in copying the `result_set` around, so make
`cql3::result::result_set` cache the `result_set` internally in a
`unique_ptr` member variable and return a const reference, so as to
minimize unnecessary copies here and there.

Tests: unit(debug)

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20191220115100.21528-1-pa.solodovnikov@scylladb.com>
2019-12-21 16:56:42 +02:00
Takuya ASADA
3a6cb0ed8c install.sh: drop limits.d from nonroot mode
The file is only required for root mode.

Fixes #5507

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20191220101940.52596-1-syuu@scylladb.com>
2019-12-21 15:26:08 +02:00
Botond Dénes
08bb0bd6aa mutation_fragment_stream_validator: wrap exceptions into own exception type
So a higher level component using the validator to validate a stream can
catch only validation errors, and let any other incidental exception
through.

This allows building data correctors on top of the
`mutation_fragment_stream_validator`, by filtering a fragment stream
through a validator, catching invalid fragment stream exceptions and
dropping the respective fragments from the stream.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191220073443.530750-1-bdenes@scylladb.com>
2019-12-20 12:05:00 +01:00
Rafael Ávila de Espíndola
91c7f5bf44 Print build-id on startup
Fixes #5426

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191218031556.120089-1-espindola@scylladb.com>
2019-12-19 15:43:04 +02:00
Avi Kivity
440ad6abcc Revert "relocatable: Check that patchelf didn't mangle the PT_LOAD headers"
This reverts commit 237ba74743. While it
works for the scylla executable, it fails for iotune, which is built
by seastar. It should be reinstated after we pass the correct link
parameters to the seastar build system.
2019-12-19 11:20:34 +02:00
Pekka Enberg
c0aea19419 Merge "Add a timeout for housekeeping for offline installs" from Amnon
"
This series solves an issue with scylla_setup and prevents it from
waiting forever if housekeeping cannot check for the new Scylla version.

Fixes #5302

It should be backported to versions that support offline installations.
"

* 'scylla_setup_timeout' of git://github.com/amnonh/scylla:
  scylla_setup: do not wait forever if no reply is returned from housekeeping
  scylla_util.py: Add optional timeout to out function
2019-12-19 08:18:19 +02:00
Rafael Ávila de Espíndola
8d777b3ad5 relocatable: Use a super long path for the dynamic linker
Having a long path allows patchelf to change the interpreter without
changing the PT_LOAD headers and therefore without moving the
build-id out of the first page.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191213224803.316783-1-espindola@scylladb.com>
2019-12-18 19:10:59 +02:00
Pavel Solodovnikov
c451f6d82a LWT: Fix required participants calculation for LOCAL_SERIAL CL
Suppose we have a multi-dc setup (e.g. 9 nodes distributed across
3 datacenters: [dc1, dc2, dc3] -> [3, 3, 3]).

When a query that uses LWT is executed with LOCAL_SERIAL consistency
level, the `storage_proxy::get_paxos_participants` function
incorrectly calculates the number of required participants to serve
the query.

In the example above it's calculated to be 5 (i.e. the number of
nodes needed for a regular QUORUM) instead of 2 (for LOCAL_SERIAL,
which is equivalent to LOCAL_QUORUM cl in this case).

This behavior results in an exception being thrown when executing
the following query with LOCAL_SERIAL cl:

INSERT INTO users (userid, firstname, lastname, age) VALUES (0, 'first0', 'last0', 30) IF NOT EXISTS

Unavailable: Error from server: code=1000 [Unavailable exception] message="Cannot achieve consistency level for cl LOCAL_SERIAL. Requires 5, alive 3" info={'required_replicas': 5, 'alive_replicas': 3, 'consistency': 'LOCAL_SERIAL'}

Tests: unit(dev), dtest(consistency_test.py)

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20191216151732.64230-1-pa.solodovnikov@scylladb.com>
2019-12-18 16:58:32 +01:00
Botond Dénes
cd6bf3cb28 scylla-gdb.py: static_vector: update for changed storage
The actual buffer is now in a member called 'data'. Leave the old
`dummy.dummy` and `dummy` as fall-back. This seems to change every
Fedora release.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191218153544.511421-1-bdenes@scylladb.com>
2019-12-18 17:39:56 +02:00
Tomasz Grabiec
5865d08d6c migration_manager: Recalculate schema only on shard 0
Schema is node-global, update_schema_version_and_announce() updates
all shards.  We don't need to recalculate it from every shard, so
install the listeners only on shard 0. Reduces noise in the logs.

Message-Id: <1574872860-27899-1-git-send-email-tgrabiec@scylladb.com>
2019-12-18 16:43:26 +02:00
Pavel Emelyanov
998f51579a storage_service: Rip join_ring config option
The option in question apparently does not work: several sharded objects
are start()-ed (and thus instantiated) in join_token_ring, while the
instances of these objects are used during the init of other components.

This leads to broken seastar local_is_initialized assertion on sys_dist_ks,
but reading the code shows more examples, e.g. the auth_service is started
on join, but is used for thrift and cql servers initialization.

The suggestion is to remove the option instead of fixing it. The is_joined
logic is kept since on-start joining can still take some time and it's safer
to report the real status from the API.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20191203140717.14521-1-xemul@scylladb.com>
2019-12-18 12:45:13 +02:00
Nadav Har'El
8157f530f5 merge: CDC: handle schema changes
Merged pull request https://github.com/scylladb/scylla/pull/5366 from Calle Wilund:

Moves schema creation/alter/drop awareness to use new "before" callbacks from
migration manager, and adds/modifies log and streams table as part of the base
table modification.

Makes schema changes semi-atomic per node. While this does not deal with updates
coming in before a schema change has propagated through the cluster, it now
falls into the same pit as when this happens without CDC.

An added side effect is that schemas are now transparent across all subsystems,
not just cql.

Patches:
  cdc_test: Add small test for altering base schema (add column)
  cdc: Handle schema changes via migration manager callbacks
  migration_manager: Invoke "before" callbacks for table operations
  migration_listener: Add empty base class and "before" callbacks for tables
  cql_test_env: Include cdc service in cql tests
  cdc: Add sharded service that does nothing.
  cdc: Move "options" to separate header to avoid too much header inclusion
  cdc: Remove some code from header
2019-12-17 23:04:36 +02:00
Avi Kivity
1157ee16a5 Update seastar submodule
* seastar 00da4c8760...0525bbb08f (7):
  > future: Simplify future_state_base::any move constructor
  > future: don't create temporary tuple on future::get().
  > future: don't instantiate new future on future::then_wrapped().
  > future: clean-up the Result handling in then_wrapped().
  > Merge "Fix core dumps when asan is enabled" from Rafael
  > future: Move ignore to the base class
  > future: Don't delete in ignore
2019-12-17 19:47:50 +02:00
Botond Dénes
638623b56b configure.py: make build.ninja target depend on SCYLLA-VERSION-GEN
Currently `SCYLLA-VERSION-GEN` is not a dependency of any target and
hence changes done to it will not be picked up by ninja. To trigger a
rebuild and hence version changes to appear in the `scylla` target
binary, one has to do `touch configure.py`. This is counter intuitive
and frustrating to people who don't know about it and wonder why their
changed version is not appearing as the output of `scylla --version`.

This patch makes `SCYLLA-VERSION-GEN` a dependency of `build.ninja`,
making the `build.ninja` target out-of-date whenever
`SCYLLA-VERSION-GEN` is changed and hence will trigger a rerun of
`configure.py` when the next target is built, allowing a build of e.g.
`scylla` to pick up any changes done to the version automatically.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191217123955.404172-1-bdenes@scylladb.com>
2019-12-17 17:40:04 +02:00
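The dependency described above maps onto ninja's standard "generator" idiom. A hypothetical fragment (not the actual generated file) showing `SCYLLA-VERSION-GEN` listed as an implicit input of the `build.ninja` target, so changing it marks `build.ninja` out of date and reruns `configure.py` before the next build:

```ninja
# Hypothetical sketch of the generated build.ninja self-regeneration rule.
rule configure
  command = ./configure.py
  generator = 1

# build.ninja depends implicitly on configure.py and SCYLLA-VERSION-GEN.
build build.ninja: configure | configure.py SCYLLA-VERSION-GEN
```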
Avi Kivity
7152ba0c70 Merge "tests: automatically search for unit tests" from Kostja
"
This patch set rearranges the test files so that
it is now possible to search for tests automatically,
and adds this functionality to test.py
"

* 'test.py.requeue' of ssh://github.com/scylladb/scylla-dev:
  cmake: update CMakeLists.txt to scan test/ rather than tests/
  test.py: automatically lookup all unit and boost tests
  tests: move all test source files to their new locations
  tests: move a few remaining headers
  tests: move another set of headers to the new test layout
  tests: move .hh files and resources to new locations
  tests: remove executable property from data_listeners_test.cc
2019-12-17 17:32:18 +02:00
Amnon Heiman
dd42f83013 scylla_setup: do not wait forever if no reply is returned from housekeeping
When scylla is installed without network connectivity, the check for a
newer version can cause scylla_setup to wait forever.

This patch adds a limit to the time scylla_setup will wait for a reply.

When there is no reply, a relevant error is shown saying that it was
unable to check for a newer version, but this does not block the setup
script.

Fixes #5302

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2019-12-17 14:56:47 +02:00
Nadav Har'El
aa1de5a171 merge: Synchronize snapshot and staging sstable deletion using sem
Merged pull request https://github.com/scylladb/scylla/pull/5343 from
Benny Halevy.

Fixes #5340

Hold the sstable_deletion_sem in table::move_sstables_from_subdirs to
serialize access to the staging directory. It now synchronizes snapshot,
compaction deletion of sstables, and view_update_generator moving of
sstables from staging.

Tests:

    unit (dev) [except test_user_function_timestamp_return, which fails for me locally, but also on master]
    snapshot_test.py (dev)
2019-12-17 14:06:02 +02:00
Juliusz Stasiewicz
7fdc8563bf system_keyspace: Added infrastructure for table `system.clients'
I used the following as a reference:
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/virtual/ClientsTable.java
At this moment there is only info about the IP, the client's outgoing port,
client 'type' (i.e. CQL/thrift/alternator), shard ID and username.
Column `request_count' is NOT present and CK consists of
(`port', `client_type'), contrary to what C* has: (`port').

Code that notifies `system.clients` about new connections goes
to top-level files `connection_notifier.*`. Currently only CQL
clients are observed, but enum `client_type` can be used in future
to notify about connections with other protocols.
2019-12-17 11:31:28 +01:00
Benny Halevy
4b3243f5b9 table: move_sstables_from_staging_in_thread with _sstable_deletion_sem
Hold the _sstable_deletion_sem while moving sstables from the staging directory
so not to move them under the feet of table::snapshot.

Fixes #5340

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-12-17 12:20:20 +02:00
Benny Halevy
0446ce712a view_update_generator::start: use variable binding
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-12-17 12:20:20 +02:00
Benny Halevy
5d7c80c148 view_update_generator::start: fix indentation
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-12-17 12:20:20 +02:00
Benny Halevy
02784f46b9 view_update_generator: handle errors when processing sstable
The consumer may throw; in this case, break from the loop and retry.

move_sstable_from_staging_in_thread may theoretically throw too;
ignore the error in this case since the sstable was already processed.
Individual move failures are already ignored, and moving from staging
will be retried upon restart.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-12-17 12:20:20 +02:00
Benny Halevy
abda12107f sstables: move_to_new_dir: add do_sync_dirs param
To be used for "batch" move of several sstables from staging
to the base directory, allowing the caller to sync the directories
once when all are moved rather than for each one of them.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-12-17 12:20:20 +02:00
Benny Halevy
6efef84185 sstable: return future from move_to_new_dir
distributed_loader::probe_file needlessly creates a seastar
thread for it and the next patch will use it as part of
a parallel_for_each loop to move a list of sstables
(and sync the directories once at the end).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-12-17 12:20:20 +02:00
Benny Halevy
0d2a7111b2 view_update_generator: sstable_with_table: std::move constructor args
Just a small optimization.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-12-17 12:19:55 +02:00
Nadav Har'El
fc85c49491 alternator: error on unsupported parallel scan
We do not yet support the parallel Scan options (TotalSegments, Segment),
as reported in issue #5059. But even before implementing this feature, it
is important that we produce an error if a user attempts to use it - instead
of outright ignoring this parameter. This is what this patch does.

The patch also adds a full test, test_scan.py::test_scan_parallel, for the
parallel scan feature. The test passes on DynamoDB, and still xfails
on Alternator after this patch - but now the Scan request fails immediately
reporting the unsupported option - instead of what the pre-patch code did:
returning the wrong results and the test failing just when the results
do not match the expectations.

Refs #5059.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191217084917.26191-1-nyh@scylladb.com>
2019-12-17 11:27:56 +02:00
Avi Kivity
f7d69b0428 Revert "Merge "bouncing lwt request to an owning shard" from Gleb"
This reverts commit 64cade15cc, reversing
changes made to 9f62a3538c.

This commit is suspected of corrupting the response stream.

Fixes #5479.
2019-12-17 11:06:10 +02:00
Rafael Ávila de Espíndola
237ba74743 relocatable: Check that patchelf didn't mangle the PT_LOAD headers
Should avoid issue #4983 showing up again.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191213224803.316783-2-espindola@scylladb.com>
2019-12-16 20:18:32 +02:00
Avi Kivity
3b7aca3406 Merge "db: Don't create a reference to nullptr" from Rafael
"
Only the first patch is needed to fix the undefined behavior, but the
followup ones simplify the memory management around user types.
"

* 'espindola/fix-5193-v2' of ssh://github.com/espindola/scylla:
  db: Don't use lw_shared_ptr for user_types_metadata
  user_types_metadata: don't implement enable_lw_shared_from_this
  cql3: pass a const user_types_metadata& to prepare_internal
  db: drop special case for top level UDTs
  db: simplify db::cql_type_parser::parse
  db: Don't create a reference to nullptr
  Add test for loading a schema with a non native type
2019-12-16 17:10:58 +02:00
Konstantin Osipov
d6bc7cae67 cmake: update CMakeLists.txt to scan test/ rather than tests/
A follow up on directory rename.
2019-12-16 17:47:42 +03:00
Konstantin Osipov
e079a04f2a test.py: automatically lookup all unit and boost tests 2019-12-16 17:47:42 +03:00
Konstantin Osipov
1c8736f998 tests: move all test source files to their new locations
1. Move tests to test (using singular seems to be a convention
   in the rest of the code base)
2. Move boost tests to test/boost, other
   (non-boost) unit tests to test/unit, tests which are
   expected to be run manually to test/manual.

Update configure.py and test.py with new paths to tests.
2019-12-16 17:47:42 +03:00
Konstantin Osipov
2fca24e267 tests: move a few remaining headers
Move sstable_test.hh, test_table.hh and cql_assertions.hh from tests/ to
test/lib or test/boost and update dependent .cc files.
Move tests/perf_sstable.hh to test/perf/perf_sstable.hh
2019-12-16 17:47:42 +03:00
Konstantin Osipov
b9bf1fbede tests: move another set of headers to the new test layout
Move another small subset of headers to test/
with the same goals:
- preserve bisectability
- make the revision history traceable after a move

Update dependent files.
2019-12-16 17:47:42 +03:00
Konstantin Osipov
8047d24c48 tests: move .hh files and resources to new locations
The plan is to move the unstructured content of tests/ directory
into the following directories of test/:

test/lib - shared header and source files for unit tests
test/boost - boost unit tests
test/unit - non-boost unit tests
test/manual - tests intended to be run manually
test/resource - binary test resources and configuration files

In order to not break git bisect and preserve the file history,
first move most of the header files and resources.
Update paths to these files in .cc files, which are not moved.
2019-12-16 17:47:42 +03:00
Konstantin Osipov
644595e15f tests: remove executable property from data_listeners_test.cc
The executable flag must have been committed to git by mistake.
2019-12-16 17:47:41 +03:00
Benny Halevy
d2e00abe13 tests: commitlog_test: test_allocation_failure: improve error reporting
We're seeing the following error from test from time to time:
  fatal error: in "test_allocation_failure": std::runtime_error: Did not get expected exception from writing too large record

This is not reproducible and the error string does not contain
enough information to figure out what happened exactly, therefore
this patch adds an exception if the call succeeded unexpectedly
and also prints the unexpected exception if one was caught.

Refs #4714

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191215052434.129641-1-bhalevy@scylladb.com>
2019-12-16 15:38:48 +01:00
Asias He
6b7344f6e5 streaming: Fix typo in stream_result_future::maybe_complete
s/progess/progress/

Refs: #5437
2019-12-16 11:12:03 +02:00
Dejan Mircevski
f3883cd935 dbuild: Fix podman invocation (#5481)
The is_podman check was depending on `docker -v` printing "podman" in
the output, but that doesn't actually work, since podman prints $0.
Use `docker --help` instead, which will output "podman".

Also return podman's return status, which was previously being
dropped.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-12-16 11:11:48 +02:00
Avi Kivity
00ae4af94c Merge "Sanitize and speed-up (a bit) directories set up" from Pavel
"
On start there are two things that scylla does on data/commitlog/etc.
dirs: locks and verifies permissions. Right now these two actions are
managed by different approaches; it's convenient to merge them.

Also, the directories class introduced in this set lays the groundwork for
better --workdir option handling. In particular, right now the db::config
entries are modified after option parsing to update the directories with
the workdir prefix. With the directories class at hand we will be able
to stop doing this.
"

* 'br-directories-cleanup' of https://github.com/xemul/scylla:
  directories: Make internals work on fs::path
  directories: Cleanup adding dirs to the vector to work on
  directories: Drop seastar::async usage
  directories: Do touch_and_lock and verify sequentially
  directories: Do touch_and_lock in parallel
  directories: Move the whole stuff into own .cc file
  directories: Move all the dirs code into .init method
  file_lock: Work with fs::path, not sstring
2019-12-15 16:02:46 +02:00
Takuya ASADA
5e502ccea9 install.sh: setup workdir correctly on nonroot mode
Specify correct workdir on nonroot mode, to set correct path of
data / commitlog / hints directories at once.

Fixes #5475

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20191213012755.194145-1-syuu@scylladb.com>
2019-12-15 16:00:57 +02:00
Avi Kivity
c25d51a4ea Revert "scylla_setup: Support for enforcing optimal Linux clocksource setting (#5379)"
This reverts commit 4333b37f9e. It breaks upgrades,
and the user question is not informative enough for the user to make a correct
decision.

Fixes #5478.
Fixes #5480.
2019-12-15 14:37:40 +02:00
Pavel Emelyanov
23a8d32920 directories: Make internals work on fs::path
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-12-12 19:52:01 +03:00
Pavel Emelyanov
373fcfdb3e directories: Cleanup adding dirs to the vector to work on
The unordered_set is turned into vector since for fs::path
there's no hash() method that's needed for set.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-12-12 19:52:01 +03:00
Pavel Emelyanov
14437da769 directories: Drop seastar::async usage
Now the only future-returning operation remaining is the call to
parallel_for_each(); all the rest is non-blocking preparation,
so we can drop the seastar::async and just return the future
from parallel_for_each.

The indentation is now good, as the previous patch prepared it
just for that.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-12-12 19:52:01 +03:00
Pavel Emelyanov
06f4f3e6d8 directories: Do touch_and_lock and verify sequentially
The goal is to drop the seastar::async() usage.

Currently we have two places that return futures -- calls to
parallel_for_each-s.  We can either chain them together or,
since both are working on the same set of directories, chain
actions inside them.

For code simplicity I propose to chain actions.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-12-12 19:52:01 +03:00
Pavel Emelyanov
8d0c820aa1 directories: Do touch_and_lock in parallel
The list of paths that should be touch-and-locked is already
at hands, this shortens the code and makes it slightly faster
(in theory).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-12-12 19:52:01 +03:00
Pavel Emelyanov
71a528d404 directories: Move the whole stuff into own .cc file
In order not to pollute the root dir place the code in
utils/ directory, "utils" namespace.

While doing this -- move the touch_and_lock from the
class declaration.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-12-12 19:52:01 +03:00
Benny Halevy
9ec98324ed messaging_service: unregister_handler: return rpc unregister_handler future
Now that seastar returns it.

Fixes https://github.com/scylladb/scylla/issues/5228

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191212143214.99328-1-bhalevy@scylladb.com>
2019-12-12 16:38:36 +02:00
Pavel Emelyanov
f2b3c17e66 directories: Move all the dirs code into .init method
The seastar::async usage is temporary, added for bisect-safety;
soon it will go away. For this reason the indentation in the
.init method is not "canonical", but is prepared for a one-patch
drop of the seastar::async.

The hinted_handoff_enabled arg is there since it's not just a
config parameter; it had been parsed in main.cc.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-12-12 17:33:11 +03:00
Pavel Emelyanov
82ef2a7730 file_lock: Work with fs::path, not sstring
The main.cc code that converts sstring to fs::path
will be patched soon, the file_desc::open belongs
to seastar and works on sstrings.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-12-12 17:32:10 +03:00
Konstantin Osipov
bc482ee666 test.py: remove an unused option
Message-Id: <20191204142622.89920-2-kostja@scylladb.com>
2019-12-12 15:53:35 +02:00
Avi Kivity
64cade15cc Merge "bouncing lwt request to an owning shard" from Gleb
"
LWT is much more efficient if a request is processed on the shard that owns
the token for the request, because otherwise the processing will bounce
to the owning shard multiple times. The patch proposes a way to move a
request to the correct shard before running LWT. It works by returning
an error from the LWT code if the shard is incorrect, specifying the
shard the request should be moved to. The error is processed by the
transport code, which jumps to the correct shard and re-processes the
incoming message there.
"

* 'gleb/bounce_lwt_request' of github.com:scylladb/seastar-dev:
  lwt: take raw lock for entire cas duration
  lwt: drop invoke_on in paxos_state prepare and accept
  lwt: Process lwt request on an owning shard
  storage_service: move start_native_transport into a thread
  transport: change make_result to takes a reference to cql result instead of shared_ptr
2019-12-12 15:50:22 +02:00
Nadav Har'El
9f62a3538c alternator: fix BEGINS_WITH operator for blobs
The implementation of Expected's BEGINS_WITH operator on blobs was
incorrect, naively comparing the base64-encoded strings, which doesn't
work. This patch fixes the code to compare the decoded strings.

The reason why the BEGINS_WITH test missed this bug was that we forgot
to check the blob case and only tested the string case; So this patch
also adds the missing test - which reproduces this bug, and verifies
its fix.

Fixes #5457

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191211115526.29862-1-nyh@scylladb.com>
2019-12-12 14:02:56 +01:00
Dejan Mircevski
27b8b6fe9d cql3: Fix needs_filtering() for clustering columns
The LIKE operator requires filtering, so needs_filtering() must check
is_LIKE().  This already happens for partition columns, but it was
overlooked for clustering columns in the initial implementation of
LIKE.

Fixes #5400.

Tests: unit(dev)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-12-12 01:19:13 +02:00
Benny Halevy
d1bcb39e7f hinted handoff: log message after removing hints directory (#5372)
To be used by dtest as an indicator that endpoint's hints
were drained and hints directory is removed.

Refs #5354

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-12-12 01:16:19 +02:00
Rafael Ávila de Espíndola
3b61cf3f0b db: Don't use lw_shared_ptr for user_types_metadata
The user_types_metadata can simply be owned by the keyspace. This
simplifies the code since we never have to worry about nulls and the
ownership is now explicit.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-12-11 10:44:40 -08:00
Rafael Ávila de Espíndola
a55838323b user_types_metadata: don't implement enable_lw_shared_from_this
It looks like this was done just to avoid including
user_types_metadata.hh, which seems a bit much considering that it
requires adding a specialization to the seastar namespace.

A followup patch will also stop using lw_shared_ptr for
user_types_metadata.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-12-11 10:44:40 -08:00
Rafael Ávila de Espíndola
f7c2c60b07 cql3: pass a const user_types_metadata& to prepare_internal
We never modify the user_types_metadata via prepare_internal, so we
can pass it a const reference.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-12-11 10:44:40 -08:00
Rafael Ávila de Espíndola
99cb8965be db: drop special case for top level UDTs
This was originally done in 7f64a6ec4b,
but that commit was reverted in
8517eecc28.

The revert was done because the original change would call parse_raw
for non UDT types. Unlike the old patch, this one doesn't change the
behavior of non UDT types.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-12-11 10:44:40 -08:00
Rafael Ávila de Espíndola
7ae9955c5f db: simplify db::cql_type_parser::parse
The variant of db::cql_type_parser::parse that has a
user_types_metadata argument was only used from the variant that
didn't. This inlines one in the other.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-12-11 10:44:40 -08:00
Rafael Ávila de Espíndola
2092e1ef6f db: Don't create a reference to nullptr
The user_types variable can be null during db startup since we have to
create types before reading the system table defining user types.

This avoids undefined behavior, but it is unlikely to have been causing
more serious problems, since the variable is only used when creating
user types and we don't create any until after all system tables are
read, at which point the user_types variable is no longer null.

Fixes #5193

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-12-11 10:44:40 -08:00
Rafael Ávila de Espíndola
6143941535 Add test for loading a schema with a non native type
This would have found the error with the previous version of the patch
series.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-12-11 10:43:34 -08:00
Gleb Natapov
64cfb9b1f6 lwt: take raw lock for entire cas duration
It will prevent parallel update by the same coordinator and should
reduce contention.
2019-12-11 14:41:31 +02:00
Gleb Natapov
898d2330a2 lwt: drop invoke_on in paxos_state prepare and accept
Since lwt requests are now running on an owning shard there is no longer
a need to invoke cross shard call.
2019-12-11 14:41:31 +02:00
Gleb Natapov
964c532c4f lwt: Process lwt request on an owning shard
LWT is much more efficient if a request is processed on the shard that owns
the token for the request, because otherwise the processing will bounce
to the owning shard multiple times. The patch proposes a way to move a
request to the correct shard before running LWT. It works by returning
an error from the LWT code if the shard is incorrect, specifying the
shard the request should be moved to. The error is processed by the
transport code, which jumps to the correct shard and re-processes the
incoming message there.
2019-12-11 14:41:31 +02:00
Gleb Natapov
54be057af3 storage_service: move start_native_transport into a thread
The code runs only once, and it is simpler if it runs in a seastar thread.
2019-12-11 14:41:31 +02:00
Gleb Natapov
007ba3e38e transport: change make_result to take a reference to cql result instead of shared_ptr 2019-12-11 14:41:31 +02:00
Nadav Har'El
9e5c6995a3 alternator-test: add tests for ReturnValues parameter
This patch adds comprehensive tests for the ReturnValue parameter of
the write operations (PutItem, UpdateItem, DeleteItem), which can return
pre-write or post-write values of the modified item. The tests are in
a new test file, alternator-test/test_returnvalues.py.

This feature is not yet implemented in Alternator, so all the new
tests xfail on Alternator (and all pass on AWS).

Refs #5053

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191127163735.19499-1-nyh@scylladb.com>
2019-12-11 13:26:39 +01:00
Nadav Har'El
ab69bfc111 alternator-test: add xfailing tests for ScanIndexForward
This patch adds tests for Query's "ScanIndexForward" parameter, which
can be used to return items in reversed sort order.
We test that a Limit works and returns the given number of *last* items
in the sort order, and also that such reverse queries can be resumed,
i.e., paging works in the reverse order.

These tests pass against AWS DynamoDB, but fail against Alternator (which
doesn't support ScanIndexForward yet), so it is marked xfail.

Refs #5153.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191127114657.14953-1-nyh@scylladb.com>
2019-12-11 13:26:39 +01:00
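The contract these tests pin down - a reversed query returns the *last* items in sort order, and paging resumes from an exclusive start key - can be modeled without DynamoDB; `query` below is a toy stand-in, not boto3:

```python
def query(items, limit, forward=True, start_after=None):
    """Toy model of Query with ScanIndexForward and paging.

    items: sort keys in ascending order. Returns (page, last_key);
    last_key is None once the result set is exhausted.
    """
    ordered = items if forward else list(reversed(items))
    if start_after is not None:
        # Resume strictly after the last key of the previous page.
        ordered = ordered[ordered.index(start_after) + 1:]
    page = ordered[:limit]
    last_key = page[-1] if len(page) == limit and len(ordered) > limit else None
    return page, last_key
```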
Pekka Enberg
6bc18ba713 storage_proxy: Remove reference to MBean interface
The JMX interface is implemented by the scylla-jmx project, not scylla.
Therefore, let's remove this historical reference to MBeans from
storage_proxy.

Message-Id: <20191211121652.22461-1-penberg@scylladb.com>
2019-12-11 14:24:28 +02:00
Avi Kivity
63474a3380 Merge "Add experimental_features option" from Dejan
"
Add --experimental-features -- a vector of features to unlock. Make corresponding changes in the YAML parser.

Fixes #5338
"

* 'vecexper' of https://github.com/dekimir/scylla:
  config: Add `experimental_features` option
  utils: Add enum_option
2019-12-11 14:23:08 +02:00
Avi Kivity
56b9bdc90f Update seastar submodule
* seastar e440e831c8...00da4c8760 (7):
  > Merge "reactor: fix iocb pool underflow due to unaccounted aio fsync" from Avi
Fixes #5443.
  > install-dependencies.sh: fix arch dependencies
  > Merge " rpc: fix use-after-free during rpc teardown vs. rpc server message handling" from Benny
  > Merge "testing: improve the observability of abandoned failed futures" from Botond
  > rework the fair_queue tester
  > directory_test: Update to use run instead of run_deprecated
  > log: support fmt 6.0 branch with chrono.h for log
2019-12-11 14:17:49 +02:00
Benny Halevy
105c8ef5a9 messaging_service: wait on unregister_handler
Prepare for returning future<> from seastar rpc
unregister_handler.

Refs https://github.com/scylladb/scylla/issues/5228

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191208153924.1953-1-bhalevy@scylladb.com>
2019-12-11 14:17:41 +02:00
Nadav Har'El
06c3802a1a storage_proxy: avoid overflow in view-backlog delay calculation
In the calculate_delay() code for view-backlog flow control, we calculate
a delay and cap it at a "budget" - the remaining timeout. This timeout is
measured in milliseconds, but the capping calculation converted it into
microseconds, which overflowed if the timeout is very large. This causes
some tests which enable the UB sanitizer to fail.

We fix this problem by comparing the delay to the budget in millisecond
resolution, not in microsecond resolution. Then, if the calculated delay
is short enough, we return it using its full microsecond resolution.

Fixes #5412

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191205131130.16793-1-nyh@scylladb.com>
2019-12-11 14:10:54 +02:00
Nadav Har'El
2824d8f6aa Merge: alternator: Fix EQ operator for sets
Merged pull request https://github.com/scylladb/scylla/pull/5453
from Piotr Sarna:

Checking the EQ relation for alternator attributes is usually performed
simply by comparing underlying JSON objects, but sets (SS, BS, NS types)
need a special routine, as we need to make sure that sets stored in
a different order underneath are still equal, e.g:

[1, 3, 2] == [1, 2, 3]

Fixes #5021
2019-12-11 13:20:25 +02:00
Piotr Sarna
421db1dc9d alternator-test: remove XFAIL from set EQ test
With this series merged, test_update_expected_1_eq_set from
test_expected.py suite starts passing.
2019-12-11 12:07:39 +01:00
Piotr Sarna
a8e45683cb alternator: add EQ comparison for sets
Checking the EQ relation for alternator attributes is usually performed
simply by comparing underlying JSON objects, but sets (SS, BS, NS types)
need a special routine, as we need to make sure that sets stored in
a different order underneath are still equal, e.g:
[1, 3, 2] == [1, 2, 3]

Fixes #5021
2019-12-11 12:07:39 +01:00
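A minimal order-insensitive comparison over DynamoDB-style attribute values (the helper name and the tiny subset of the encoding are illustrative):

```python
SET_TYPES = {"SS", "NS", "BS"}

def attrs_equal(a, b):
    """Compare two DynamoDB-style attribute values, e.g. {"NS": ["1", "2"]}.
    Set types (SS/NS/BS) are compared as unordered collections."""
    (ta, va), = a.items()
    (tb, vb), = b.items()
    if ta != tb:
        return False
    if ta in SET_TYPES:
        # Sets stored in a different order underneath are still equal.
        return sorted(va) == sorted(vb)
    return va == vb
```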
Piotr Sarna
fb37394995 schema_tables: notify table deletions before creations
If a set of mutations contains both an entry that deletes a table
and an entry that adds a table with the same name, it's expected
to be a replacement operation (delete old + create new),
rather than a useless "try to create a table even though it exists
already and then immediately delete the original one" operation.
As such, notifications about the deletions should be performed
before notifications about the creations. The place that originally
suffered from this wrong order is view building - which in this case
created an incorrect duplicated entry in the view building bookkeeping,
and then immediately deleted it, resulting in having old, deprecated
entries with stale UUIDS lying in the build queue and never proceeding,
because the underlying table is long gone.
The issue is fixed by ensuring the order of notifications:
 - drops are announced first, view drops are announced before table drops;
 - creations follow, table creations are announced before views;
 - finally, changes to tables and views are announced;

Fixes #4382

Tests: unit(dev), mv_populating_from_existing_data_during_node_stop_test
2019-12-11 12:48:29 +02:00
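The required ordering can be captured as a sort over notification kinds (a schematic, not the schema_tables code):

```python
# Order mandated by the fix: view drops, table drops, table creates,
# view creates, then updates to tables and views.
NOTIFY_ORDER = ["drop_view", "drop_table", "create_table", "create_view",
                "update_table", "update_view"]

def order_notifications(events):
    """Sort pending schema-change notifications into the safe order,
    so a delete+create of the same table reads as a replacement."""
    return sorted(events, key=NOTIFY_ORDER.index)
```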
Benny Halevy
d544df6c3c dist/ami/build_ami.sh: support incremental build of rpms (#5191)
Iterate over an array holding all rpm names to see if any
of them is missing from `dist/ami/files`. If they are missing,
look them up in build/redhat/RPMS/x86_64 so that if reloc/build_rpm.sh
was run manually before dist/ami/build_ami.sh we can just collect
the built rpms from its output dir.

If we're still missing any rpms, then run reloc/build_rpm.sh
and copy the required rpms from build/redhat/RPMS/x86_64.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Reviewed-by: Glauber Costa <glauber@scylladb.com>
2019-12-11 12:48:29 +02:00
Amnon Heiman
f43285f39a api: replace swagger definition to use long instead of int (#5380)
In swagger 1.2 int is defined as int32.

We originally used int, following the JMX definition; in practice we
internally use uint and int64 in many places.

While the API formats the type correctly, an external system that uses
a swagger-based code generator can run into type issues.

This patch replaces all uses of int in return types with long, which is defined as int64.

Changing the return type has no impact on the system, but it does help
external systems that use code generators driven by the swagger definitions.

Fixes #5347

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2019-12-11 12:48:29 +02:00
Nadav Har'El
2abac32f2e Merged: alternator: Implement CONTAINS and NOT_CONTAINS in Expected
Merged pull request https://github.com/scylladb/scylla/pull/5447
by Dejan Mircevski.

Adds the last missing operators in the "Expected" parameter and re-enables
their tests.

Fixes #5034.
2019-12-11 12:48:29 +02:00
Cem Sancak
86b8036502 Fix DPDK mode in prepare script
Fixes #5455.
2019-12-11 12:48:29 +02:00
Calle Wilund
35089da983 conf/config: Add better descriptive text on server/client encryption
Provide some explanation on prio strings + direction to gnutls manual.
Document client auth option.
Remove confusing/misleading statement on "custom options"

Message-Id: <20191210123714.12278-1-calle@scylladb.com>
2019-12-11 12:48:28 +02:00
Dejan Mircevski
32af150f1d alternator: Implement NOT_CONTAINS operator in Expected
Enable existing NOT_CONTAINS test, add NOT_CONTAINS to the list of
recognized operators, implement check_NOT_CONTAINS, and hook it up to
verify_expected_one().

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-12-10 15:31:47 -05:00
Dejan Mircevski
bd2bd3c7c8 alternator: Implement CONTAINS operator in Expected
Enable existing CONTAINS test, implement check_CONTAINS, and hook it
up to verify_expected_one().

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-12-10 15:31:47 -05:00
Dejan Mircevski
5a56fd384c config: Add experimental_features option
When the user wants to turn on only some experimental features, they
can use this new option.  The existing `experimental` option is
preserved for backwards compatibility.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-12-10 11:47:03 -05:00
Piotr Sarna
9504bbf5a4 alternator: move unwrap_set to serialization header
The utility function for unwrapping a set is going to be useful
across source files, so it's moved to serialization.hh/serialization.cc.
2019-12-10 15:08:47 +01:00
Piotr Sarna
4660e58088 alternator: move rjson value comparison to rjson.hh
The comparison struct is going to be useful across source files,
so it's moved into rjson header, where it conceptually belongs anyway.
2019-12-10 15:08:47 +01:00
Botond Dénes
db0e2d8f90 scylla-gdb.py: document and add safety net to seastar::thread related commands
Almost all commands provided by `scylla-gdb.py` are safe to use. The
worst that could happen if they fail is that you won't get the desired
information. There is one notable exception: `scylla thread`. If
anything goes wrong while this command is executed - gdb crashes, a bug
in the command, etc. - there is a good chance the process under
examination will crash. Sometimes this is fine, but other times e.g.
when live debugging a production node, this is unacceptable.
To avoid any accidents add documentation to all commands working with
`seastar::thread`. And since most people don't read documentation,
especially when debugging under pressure, add a safety net to the
`scylla thread` command. When run, this command will now warn of the
dangers and will ask for explicit acknowledgment of the risk of crash,
by means of passing an `--iamsure` flag. When this flag is missing, it
will refuse to run. I am sure this will be very annoying but I am also
sure that the avoided crashes are worth it.

As part of making `scylla thread` safe, its argument parsing code is
migrated to `argparse`. This changes the usage but this should be fine
because it is well documented.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191129092838.390878-1-bdenes@scylladb.com>
2019-12-10 11:51:57 +02:00
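The safety net amounts to refusing to run without an explicit acknowledgment flag; a minimal argparse sketch of the pattern (not the actual scylla-gdb.py code):

```python
import argparse

def parse_thread_args(argv):
    parser = argparse.ArgumentParser(prog="scylla thread")
    parser.add_argument("address", nargs="?", help="thread to switch to")
    parser.add_argument(
        "--iamsure", action="store_true",
        help="acknowledge that this command can crash the inferior")
    args = parser.parse_args(argv)
    if not args.iamsure:
        # Refuse to run without explicit acknowledgment of the risk.
        raise SystemExit(
            "scylla thread may crash the process under examination; "
            "re-run with --iamsure to acknowledge the risk")
    return args
```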
Eliran Sinvani
765db5d14f build_ami: Trim ami description attribute to the allowed size
The ami description attribute is only allowed to be 255
characters long. When build_ami.sh generates an ami, it
generates an ami description which is a concatenation
of all of the componnents version strings. It can
happen that the description string is too long which
eventually causes the ami build to fail. This patch
trims the description string to 255 characters.
It is ok since the individual versions of the components
are also saved in tags attached to the image.

Tests:
 1. Reproduced with a long description and
    validated that it doesn't fail after the fix.

Fixes #5435

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <20191209141143.28893-1-eliransin@scylladb.com>
2019-12-10 11:51:57 +02:00
Fabiano Lucchese
4333b37f9e scylla_setup: Support for enforcing optimal Linux clocksource setting (#5379)
A Linux machine typically has multiple clocksources with distinct
performance characteristics. Selecting a high-performance clocksource
can result in better performance for ScyllaDB, so this should be
considered whenever starting it up.

This patch makes it possible to enforce an optimized Linux clocksource
in Scylla's setup/start-up processes. It does so by adding an
interactive question about enforcing the clocksource setting to
scylla_setup, which modifies the parameter "CLOCKSOURCE" in the
scylla_server configuration file. This parameter is read by perftune.py
which, if set to "yes", proceeds to (non-persistently) set the
clocksource. On x86, the TSC clocksource is used.

Fixes #4474
2019-12-10 11:51:57 +02:00
Pavel Emelyanov
3a21419fdb features: Remove _FEATURE suffix from hinted_handoff feature name
All the other features are named w/o one. The internal const-s
are all different, but I'm fixing it separately.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20191209154310.21649-1-xemul@scylladb.com>
2019-12-10 11:51:57 +02:00
Dejan Mircevski
a26bd9b847 utils: Add enum_option
This allows us to accept command-line options with a predefined set of
valid arguments.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-12-09 09:45:59 -05:00
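In spirit, enum_option ties an option's accepted values to the members of an enum; a Python analogue using argparse (the `Feature` members here are made up for illustration):

```python
import argparse
import enum

class Feature(enum.Enum):
    # Hypothetical feature names, for illustration only.
    LWT = "lwt"
    CDC = "cdc"
    UDF = "udf"

parser = argparse.ArgumentParser()
parser.add_argument(
    "--experimental-features", nargs="*", type=Feature,
    choices=list(Feature), default=[],
    help="experimental features to unlock")

def parse(argv):
    # Rejects any value that is not a member of Feature.
    return parser.parse_args(argv).experimental_features
```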
Calle Wilund
7c5e4c527d cdc_test: Add small test for altering base schema (add column) 2019-12-09 14:35:04 +00:00
Calle Wilund
cb0117eb44 cdc: Handle schema changes via migration manager callbacks
This allows us to create/alter/drop log and desc tables "atomically"
with the base, by including these mutations in the original mutation
set, i.e. batch create/alter tables.

Note that population does not happen until types are actually
already put into database (duh), thus there _is_ still a gap
between creating cdc and it being truly usable. This may or may
not need handling later.
2019-12-09 14:35:04 +00:00
Rafael Ávila de Espíndola
761b19cee5 build: Split the build and host linker flags
A general build system knows about 3 machines:

* build: where the building is running
* host: where the built software will run
* target: the machine the software will produce code for

The target machine is only relevant for compilers, so we can ignore
it.

Until now we could ignore the build and host distinction too. This
patch adds the first difference: don't use host ld_flags when linking
build tools (gen_crc_combine_table).

The reason for this change is to make it possible to build with
-Wl,--dynamic-linker pointing to a path that will exist on the host
machine, but may not exist on the build machine.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191207030408.987508-1-espindola@scylladb.com>
2019-12-09 15:54:57 +02:00
Calle Wilund
27183f648d migration_manager: Invoke "before" callbacks for table operations
Potentially allowing (cdc) augmentation of mutations.

Note: only does the listener part in seastar::thread, to avoid
changing call behaviour.
2019-12-09 12:12:09 +00:00
Calle Wilund
f78a3bf656 migration_listener: Add empty base class and "before" callbacks for tables
Empty base type makes for less boilerplate in implementations.
The "before" callbacks are for listeners who need to potentially
react/augment type creation/alteration _before_ actually
committing type to schema tables (and holding the semaphore for this).

I.e. it is for cdc to add/modify log/desc tables "atomically" with base.
2019-12-09 12:12:09 +00:00
Calle Wilund
4e406105b1 cql_test_env: Include cdc service in cql tests 2019-12-09 12:12:09 +00:00
Calle Wilund
a21e140169 cdc: Add sharded service that does nothing.
But can be used to hang functionality into eventually.
2019-12-09 12:12:09 +00:00
Calle Wilund
2787b0c4f8 cdc: Move "options" to separate header to avoid too much header inclusion
cdc should not contaminate the whole universe.
2019-12-09 12:12:09 +00:00
fastio
8f326b28f4 Redis: Combine all the source files redis/commands/* into redis/commands.{hh,cc}
Fixes: #5394

Signed-off-by: Peng Jian <pengjian.uestc@gmail.com>
2019-12-08 13:54:33 +02:00
Avi Kivity
9c63cd8da5 sysctl: reduce kernel tendency to swap anonymous pages relative to page cache (#5417)
The vm.swappiness sysctl controls the kernel's preference for swapping
anonymous memory vs page cache. Since Scylla uses very large amounts
of anonymous memory, and tiny amounts of page cache, the correct setting
is to prefer swapping page cache. If the kernel swaps anonymous memory
the reactor will stall until the page fault is satisfied. On the other
hand, page cache pages usually belong to other applications, usually
backup processes that read Scylla files.

This setting has been used in production in Scylla Cloud for a while
with good results.

Users can opt out by not installing the scylla-kernel-conf package
(same as with the other kernel tunables).
2019-12-08 13:04:25 +02:00
Avi Kivity
0e319e0359 Update seastar submodule
* seastar 166061da3...e440e831c (8):
  > Fail tests on ubsan errors
  > future: make a couple of asserts more strict
  > future: Move make_ready out of line
  > config: Do not allow zero rates
Fixes #5360
  > future: add new state to avoid temporaries in get_available_state().
  > future: avoid temporary future_state on get_available_state().
  > future: inline future::abandoned
  > noncopyable_function: Avoid uninitialized warning on empty types
2019-12-06 18:33:23 +02:00
Piotr Sarna
0718ff5133 Merge 'min/max on collections returns human-readable result' from Juliusz
Previously, scylla used min/max(blob)->blob overload for collections,
tuples and UDTs, effectively causing the results to be printed as blobs.
This PR adds "dynamically"-typed min()/max() functions for compound types.

These types can be complicated, like map<int,set<tuple<..., and created
in runtime, so functions for them are created on-demand,
similarly to tojson(). The comparison remains unchanged - underneath
this is still byte-by-byte weak lex ordering.

Fixes #5139

* jul-stas/5139-minmax-bad-printing-collections:
  cql_query_tests: Added tests for min/max/count on collections
  cql3: min()/max() for collections/tuples/UDTs do not cast to blobs
2019-12-06 16:40:17 +01:00
Juliusz Stasiewicz
75955beb0b cql_query_tests: Added tests for min/max/count on collections
This tests the new min/max functions for collections and tuples. CFs
in the test suite were named according to the types being tested, e.g.
`cf_map<int,text>', which is not a valid CF name. Therefore, these
names required "escaping" of invalid characters, here simply
replacing them with '_'.
2019-12-06 12:15:49 +01:00
Juliusz Stasiewicz
9efad36fb8 cql3: min()/max() for collections/tuples/UDTs do not cast to blobs
Before:
cqlsh> insert into ks.list_types (id, val) values (1, [3,4,5]);
cqlsh> select max(val) from ks.list_types;

 system.max(val)
------------------------------------------------------------
 0x00000003000000040000000300000004000000040000000400000005

After:
cqlsh> select max(val) from ks.list_types;

 system.max(val)
--------------------
 [3, 4, 5]

This is accomplished similarly to `tojson()`/`fromjson()`: functions
are generated on demand from within `cql3::functions::get()`.
Because collections can have a variety of types, including UDTs
and tuples, it would be impossible to statically define max(T t)->T
for every T. Until now, max(blob)->blob overload was used.

Because `impl_max/min_function_for` is templated with the
input/output type, which can be defined in runtime, we need type-erased
("dynamic") versions of these functors. They work identically, i.e.
they compare byte representations of lhs and rhs with
`bytes::operator<`.

Resolves #5139
2019-12-06 12:14:51 +01:00
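The unchanged comparison - byte-by-byte weak lexicographic ordering of the serialized values - can be shown in miniature; `serialize` below is a stand-in, not Scylla's internal format:

```python
def serialize(collection):
    """Stand-in for a length-prefixed internal format, enough to
    demonstrate byte-wise ordering of whole collections."""
    out = b""
    for x in collection:
        payload = x.to_bytes(4, "big")
        out += len(payload).to_bytes(4, "big") + payload
    return out

def dynamic_max(values):
    # The on-demand max() compares serialized representations
    # byte-by-byte instead of requiring a statically typed overload.
    return max(values, key=serialize)
```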
Avi Kivity
a18a921308 docs: maintainer.md: use command line to merge multi-commit pull requests
If you merge a pull request that contains multiple patches via
the github interface, it will document itself as the committer.

Work around this brain damage by using the command line.
2019-12-06 10:59:46 +01:00
Botond Dénes
7b37a700e1 configure.py: make tests explicitly depend on libseastar_testing.a
So that changes to libseastar_testing.a make all test targets out of
date.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191205142436.560823-1-bdenes@scylladb.com>
2019-12-05 19:30:34 +02:00
Piotr Sarna
3a46b1bb2b Merge "handle hints on separate connection and scheduling group" from Piotr
Introduce a new verb dedicated for receiving and sending hints: HINT_MUTATION. It is handled on the streaming connection, which is separate from the one used for handling mutations sent by coordinator during a write.

The intent of using a separate connection is to increase fairness while handling hints and user requests - this way, a situation can be avoided in which one type of request saturates the connection, negatively impacting the other one.

Information about new RPC support is propagated through new gossip feature HINTED_HANDOFF_SEPARATE_CONNECTION.

Fixes #4974.

Tests: unit(release)
2019-12-05 17:25:26 +01:00
Calle Wilund
c11874d851 gms::inet_address: Use special ostream formatting to match Java
To make gms::inet_address::to_string() similar in output to origin.
The sole purpose being quick and easy fix of API/JMX ipv6
formatting of endpoints etc, where strings are used as lexical
comparisons instead of textual representation.

A better, but more work, solution is to fix the scylla-jmx
bridge to do explicit parse + re-format of addresses, but there
are many such callpoints.

An even better solution would be to fix nodetool to not make this
mistake of doing lexical comparisons, but then we risk breaking
merge compatibility. But could be an option for a separate
nodeprobe impl.

Message-Id: <20191204135319.1142-1-calle@scylladb.com>
2019-12-05 17:01:26 +02:00
Gleb Natapov
4893bc9139 tracing: split adding prepared query parameters from stopping of a trace
Currently query_options objects is passed to a trace stopping function
which makes it mandatory to make them alive until the end of the
query. The reason for that is to add prepared statement parameters to
the trace.  All other query options that we want to put in the trace are
copied into trace_state::params_values, so let's copy prepared statement
parameters there too. Trace enabled case will become a little bit more
expensive but on the other hand we can drop a continuation that holds
query_options object alive from a fast path. It is safe to drop the call
to stop_foreground_prepared() here since The tracing will be stopped
in process_request_one().

Message-Id: <20191205102026.GJ9084@scylladb.com>
2019-12-05 17:00:47 +02:00
Tomasz Grabiec
aa173898d6 Merge "Named semaphores in concurrency reader, segment_manager and region_group" from Juliusz
Selected semaphores' names are now included in exception messages in
case of timeout or when admission queue overflows.

Resolves #5281
2019-12-05 14:19:56 +01:00
Nadav Har'El
5b2f35a21a Merge "Redis: fix the options related to Redis API, fix the DEL and GET command"
Merged pull request https://github.com/scylladb/scylla/pull/5381 by
Peng Jian, fixing multiple small issues with Redis:

* Rename the options related to Redis API, and describe them clearly.
* Rename redis_transport_port to redis_port
* Rename redis_transport_port_ssl to redis_ssl_port
* Rename redis_default_database_count to redis_database_count
* Remove unnecessary option enable_redis_protocol
* Modify the default value of the options redis_read_consistency_level and redis_write_consistency_level to LOCAL_QUORUM

* Fix the DEL command: support deleting multiple keys in one command.

* Fix the GET command: return the empty string when the requested key does not exist.

* Fix the redis-test/test_del_non_existent_key: mark xfail.
2019-12-05 11:58:34 +02:00
Avi Kivity
85822c7786 database: fix schema use-after-move in make_multishard_streaming_reader
On aarch64, asan detected a use-after-move. It doesn't happen on x86_64,
likely due to different argument evaluation order.

Fix by evaluating full_slice before moving the schema.

Note: I used "auto&&" and "std::move()" even though full_slice()
returns a reference. I think this is safer in case full_slice()
changes, and works just as well with a reference.

Fixes #5419.
2019-12-05 11:58:34 +02:00
Piotr Sarna
79c3a508f4 table: Reduce read amplification in view update generation
This commit makes sure that single-partition readers for
read-before-write do not have fast-forwarding enabled,
as it may lead to huge read amplification. The observed case was:
1. Creating an index.
  CREATE INDEX index1  ON myks2.standard1 ("C1");
2. Running cassandra-stress in order to generate view updates.
cassandra-stress write no-warmup n=1000000 cl=ONE -schema \
  'replication(factor=2) compaction(strategy=LeveledCompactionStrategy)' \
  keyspace=myks2 -pop seq=4000000..8000000 -rate threads=100 -errors
  skip-read-validation -node 127.0.0.1;

Without disabling fast-forwarding, single-partition readers
were turned into scanning readers in cache, which resulted
in reading 36GB (sic!) on a workload which generates less
than 1GB of view updates. After applying the fix, the number
dropped down to less than 1GB, as expected.

Refs #5409
Fixes #4615
Fixes #5418
2019-12-05 11:58:34 +02:00
Konstantin Osipov
6a5e7c0e22 tests: reduce the number of iterations of dynamic_bitset_test
This test execution time dominates by a serious margin
test execution time in dev/release mode: reducing its
execution time improves the test.py turnaround by over 70%.

Message-Id: <20191204135315.86374-2-kostja@scylladb.com>
2019-12-05 11:58:34 +02:00
Avi Kivity
07427c89a2 gdb: change 'scylla thread' command to access fs_base register directly
Currently, 'scylla thread' uses arch_prctl() to extract the value of
fsbase, used to reference thread local variables. gdb 8 added support
for directly accessing the value as $fs_base, so use that instead. This
works from core dumps as well as live processes, as you don't need to
execute inferior functions.

The patch is required for debugging threads in core dumps, but not
sufficient, as we still need to set $rip and $rsp, and gdb still[1]
doesn't allow this.

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=9370
2019-12-05 11:58:34 +02:00
Piotr Dulikowski
adfa7d7b8d messaging_service: don't move unsigned values in handlers
Performing std::move on integral types is pointless. This commit gets
rid of moves of values of `unsigned` type in rpc handlers.
2019-12-05 00:58:31 +01:00
Piotr Dulikowski
77d2ceaeba storage_proxy: handle hints through separate rpc verb 2019-12-05 00:51:52 +01:00
Piotr Dulikowski
2609065090 storage_proxy: move register_mutation handler to local lambda
This refactor makes it possible to reuse the lambda in following
commits.
2019-12-05 00:51:52 +01:00
Piotr Dulikowski
6198ee2735 hh: introduce HINTED_HANDOFF_SEPARATE_CONNECTION feature
The feature introduced by this commit declares that hints can be sent
using the new dedicated RPC verb. Before using the new verb, nodes need
to know if other nodes in the cluster will be able to handle the new
RPC verb.
2019-12-05 00:51:52 +01:00
Piotr Dulikowski
2e802ca650 hh: add HINT_MUTATION verb
Introduce a new verb dedicated for receiving and sending hints:
HINT_MUTATION. It is handled on the streaming connection, which is
separate from the one used for handling mutations sent by coordinator
during a write.

The intent of using a separate connection is to increase fairness while
handling hints and user requests - this way, a situation can be avoided
in which one type of request saturates the connection, negatively
impacting the other one.
2019-12-05 00:51:49 +01:00
Avi Kivity
fd951a36e3 Merge "Let compaction wait on background deletions" from Benny
"
In several cases in distributed testing (dtest) we trigger compaction using nodetool compact assuming that when it is done, it is indeed really done.
However, the way compaction is currently implemented in scylla, it may leave behind some background tasks to delete the old sstables that were compacted.

This commit changes major compaction (triggered via the ss::force_keyspace_compaction api) so it would wait on the background deletes and will return only when they finish.

Fixes #4909

Tests: unit(dev), nodetool_refresh_with_data_perms_test, test_nodetool_snapshot_during_major_compaction
"
2019-12-04 11:18:41 +02:00
Takuya ASADA
c9d8606786 dist/common/scripts/scylla_ntp_setup: relax RHEL version check
We may be able to use the chrony setup script on future versions of
RHEL/CentOS, so it is better to run chrony setup when the RHEL version
is >= 8, not only on 8.

Note that on Fedora it still provides ntp/ntpdate package, so we run
ntp setup on it for now. (same on debian variants)

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20191203192812.5861-1-syuu@scylladb.com>
2019-12-04 10:59:14 +02:00
Juliusz Stasiewicz
430b2ad19d commitlog+region_group: timeout exceptions with names
`segment_manager' now uses a decorated version of `timed_out_error'
with hardcoded name. On the other hand `region_group' uses named
`on_request_expiry' within its `expiring_fifo'.
2019-12-03 19:07:19 +01:00
Avi Kivity
91d3f2afce docs: maintainers.md: fix typo in git push --force-with-lease
Just one lease, not many.

Reported by Piotr Sarna.
2019-12-03 18:17:46 +01:00
Calle Wilund
56a5e0a251 commitlog_replayer: Ensure applied frozen_mutation is safe during apply
Fixes #5211

In 79935df959 replay apply-call was
changed from one with no continuation to one with. But the frozen
mutation arg was still just lambda local.

Change to use do_with for this case as well.

Message-Id: <20191203162606.1664-1-calle@scylladb.com>
2019-12-03 18:28:01 +02:00
Juliusz Stasiewicz
d043393f52 db+semaphores+tests: mandatory `name' param in reader_concurrency_semaphore
Exception messages contain semaphore's name (provided in ctor).
This affects the queue overflow exception as well as timeout
exception. Also, custom throwing function in ctor was changed
to `prethrow_action', i.e. metrics can still be updated there but
now callers have no control over the type of the exception being
thrown. This affected `restricted_reader_max_queue_length' test.
`reader_concurrency_semaphore'-s docs are updated accordingly.
2019-12-03 15:41:34 +01:00
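The observable effect of naming the semaphore is better diagnostics on failure; a minimal Python analogue (the class names, semaphore name, and message wording are illustrative):

```python
class NamedSemaphoreError(Exception):
    pass

class NamedSemaphore:
    """Counting semaphore whose failures identify it by name."""
    def __init__(self, name, units, max_queue):
        self.name = name
        self.units = units
        self.max_queue = max_queue
        self.queued = 0

    def try_acquire(self, n=1):
        if self.units >= n:
            self.units -= n
            return True
        if self.queued >= self.max_queue:
            # The exception message now says *which* semaphore overflowed.
            raise NamedSemaphoreError(
                f"semaphore '{self.name}': admission queue overflow")
        self.queued += 1
        return False
```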
Amos Kong
e26b396f16 scylla-docker: fix default data_directories in scyllasetup.py (#5399)
Use default data_file_directories if it's not assigned in scylla.yaml

Fixes #5398

Signed-off-by: Amos Kong <amos@scylladb.com>
2019-12-03 13:58:17 +02:00
Rafael Ávila de Espíndola
1cd17887fa build: strip debug when configured with --debuginfo 0
In a build configured with --debuginfo 0 the scylla binary still ends
up with some debug info from the libraries that are statically linked
in.

We should avoid compiling subprojects (including seastar) with debug
info when none is needed, but this at least avoids it showing up in
the binary.

The main motivation for this is that it is confusing to get a binary
with *some* debug info in it.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191127215843.44992-1-espindola@scylladb.com>
2019-12-03 12:41:04 +02:00
Tomasz Grabiec
0a453e5d30 Merge "Use fragmented buffers for collection de/serialization" from Botond
This series refactors the collection de/serialization code to use
fragmented buffers, avoiding the large allocations and the associated
pains when working with large collections. Currently all operations that
involve collections require deserializing them, executing the operation,
then serializing them again to their internal storage format. The
de/serialization operations happen in linearized buffers, which means
that we have to allocate a buffer large enough to hold the *entire*
collection. This can cause immense pressure on the memory allocator,
which, in the face of memory fragmentation, might be unable to serve the
allocation at all. We've seen this causing all sorts of nasty problems,
including but not limited to: failing compactions, failing memtable
flush, OOM crashes, etc.

Users are strongly discouraged from using large collections, yet they
are still a fact of life and have been haunting us since forever.

The proper solution for these problems would be to come up with an
in-memory format for collections, however that is a major effort, with a
lot of unknowns. This is something we plan on doing at some point but
until it happens we should make life less painful for those with large
collections.

The goal of this series is to avoid the need to allocate these large
buffers. Serialization now happens into a `bytes_ostream` which
automatically fragments the values internally. Deserialization happens
with `utils::linearizing_input_stream` (introduced by this series), which
linearizes only the individual collection cells, but not the entire
collection.
An important goal of this series was to introduce the least amount of
risk, and hence the least amount of code. This series does not attempt
a revolution, completely revamping and optimizing the
de/serialization codepaths. These codepaths have their days numbered, so
investing a lot of effort into them is in vain. We can apply incremental
optimizations where we deem it necessary.

Fixes: #5341
2019-12-03 10:31:34 +01:00
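The fragmenting-buffer idea described above can be sketched as a toy model in Python (the actual implementation is the C++ `bytes_ostream`; the class name and chunk size here are illustrative):

```python
# Toy model of a fragmenting output stream: writes land in fixed-size
# chunks, so the largest single allocation is bounded by the chunk size
# rather than by the total serialized size of the collection.
class FragmentingOstream:
    def __init__(self, chunk_size=128 * 1024):
        self.chunk_size = chunk_size
        self.chunks = [bytearray()]

    def write(self, data):
        while data:
            room = self.chunk_size - len(self.chunks[-1])
            if room == 0:
                self.chunks.append(bytearray())  # start a new fragment
                room = self.chunk_size
            self.chunks[-1] += data[:room]
            data = data[room:]

    def linearize(self):
        return b"".join(self.chunks)

stream = FragmentingOstream(chunk_size=4)
stream.write(b"a large collection value")
assert stream.linearize() == b"a large collection value"
assert max(len(c) for c in stream.chunks) <= 4
```

No single write ever allocates more than one chunk, which is the property that relieves the pressure on the memory allocator.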
fastio
01599ffbae Redis API: Support the syntax of deleting multiple keys in one DEL command, fix the return value of the GET command.
Support deleting multiple keys in one DEL command.
Returning the number of keys actually deleted is still not supported.
Return an empty string to the client for the GET command when the requested key does not exist.

Fixes: #5334

Signed-off-by: Peng Jian <pengjian.uestc@gmail.com>
2019-12-03 17:27:40 +08:00
fastio
039b83ad3b Redis API: Rename options related to Redis API, describe them clearly, and remove unnecessary one.
Rename option redis_transport_port to redis_port, which the redis transport listens on for clients.
Rename option redis_transport_port_ssl to redis_ssl_port, which the redis TLS transport listens on for clients.
Rename option redis_database_count, which sets the redis database count.
Rename option redis_keyspace_opitons to redis_keyspace_replication_strategy_options, which sets the replication strategy for the redis keyspace.
Remove option enable_redis_protocol, which is unnecessary.

Fixes: #5335

Signed-off-by: Peng Jian <pengjian.uestc@gmail.com>
2019-12-03 17:13:35 +08:00
Nadav Har'El
7b93360c8d Merge: redis: skip processing request of EOF
Merged pull request https://github.com/scylladb/scylla/pull/5393/ by
Amos Kong:
When I test redis commands with echo and nc, there is a spurious error at the end.
Checking with strace: currently, if the client reads nothing from stdin, it
shuts down the socket and the redis server reads nothing (0 bytes) from the socket. But
it still tries to process the empty command and returns an error.

$ echo -n -e '*1\r\n$4\r\nping\r\n' |strace nc localhost 6379
| ...
|    read(0, "*1\r\n$4\r\nping\r\n", 8192)   = 14
|    select(5, [4], [4], [], NULL)           = 1 (out [4])
|>>> sendto(4, "*1\r\n$4\r\nping\r\n", 14, 0, NULL, 0) = 14
|    select(5, [0 4], [], [], NULL)          = 1 (in [0])
|    recvfrom(0, 0x7ffe4d5b6c70, 8192, 0, 0x7ffe4d5b6bf0, 0x7ffe4d5b6bec) = -1 ENOTSOCK (Socket operation on non-socket)
|    read(0, "", 8192)                       = 0
|>>> shutdown(4, SHUT_WR)                    = 0
|    select(5, [4], [], [], NULL)            = 1 (in [4])
|    recvfrom(4, "+PONG\r\n-ERR unknown command ''\r\n", 8192, 0, 0x7ffe4d5b6bf0, [0]) = 32
|    write(1, "+PONG\r\n-ERR unknown command ''\r\n", 32+PONG
|    -ERR unknown command ''
|    ) = 32
|    select(5, [4], [], [], NULL)            = 1 (in [4])
|    recvfrom(4, "", 8192, 0, 0x7ffe4d5b6bf0, [0]) = 0
|    close(1)                                = 0
|    close(4)                                = 0

Current result:
  $ echo -n -e '' |nc localhost 6379
  -ERR unknown command ''
  $ echo -n -e '*1\r\n$4\r\nping\r\n' |nc localhost 6379
  +PONG
  -ERR unknown command ''

Expected:
  $ echo -n -e '' |nc localhost 6379
  $ echo -n -e '*1\r\n$4\r\nping\r\n' |nc localhost 6379
  +PONG
2019-12-03 10:40:20 +02:00
Avi Kivity
83feb9ea77 tools: toolchain: update frozen image
Commit 96009881d8 added diffutils to the dependencies via
Seastar's install-dependencies.sh, after it was inadvertently
dropped in 1164ff5329 (update to Fedora 31; diffutils is no
longer brought in as a side effect of something else).

Regenerate the image to include diffutils.

Ref #5401.
2019-12-03 10:36:55 +02:00
Amos Kong
fb9af2a86b redis-test: add test_raw_cmd.py
This patch adds subtests for EOF processing; they read and write the socket
directly, using raw protocol commands.

We can add more tests in the future; tests that go through a Redis client
module would hide some protocol errors.

Signed-off-by: Amos Kong <amos@scylladb.com>
2019-12-03 10:47:56 +08:00
Amos Kong
4fa862adf4 redis: skip processing request of EOF
When I test redis commands with echo and nc, there is a spurious error at the end.
Checking with strace: currently, if the client reads nothing from stdin, it
shuts down the socket and the redis server reads nothing (0 bytes) from the socket. But
it still tries to process the empty command and returns an error.

$ echo -n -e '*1\r\n$4\r\nping\r\n' |strace nc localhost 6379
| ...
|    read(0, "*1\r\n$4\r\nping\r\n", 8192)   = 14
|    select(5, [4], [4], [], NULL)           = 1 (out [4])
|>>> sendto(4, "*1\r\n$4\r\nping\r\n", 14, 0, NULL, 0) = 14
|    select(5, [0 4], [], [], NULL)          = 1 (in [0])
|    recvfrom(0, 0x7ffe4d5b6c70, 8192, 0, 0x7ffe4d5b6bf0, 0x7ffe4d5b6bec) = -1 ENOTSOCK (Socket operation on non-socket)
|    read(0, "", 8192)                       = 0
|>>> shutdown(4, SHUT_WR)                    = 0
|    select(5, [4], [], [], NULL)            = 1 (in [4])
|    recvfrom(4, "+PONG\r\n-ERR unknown command ''\r\n", 8192, 0, 0x7ffe4d5b6bf0, [0]) = 32
|    write(1, "+PONG\r\n-ERR unknown command ''\r\n", 32+PONG
|    -ERR unknown command ''
|    ) = 32
|    select(5, [4], [], [], NULL)            = 1 (in [4])
|    recvfrom(4, "", 8192, 0, 0x7ffe4d5b6bf0, [0]) = 0
|    close(1)                                = 0
|    close(4)                                = 0

Current result:
  $ echo -n -e '' |nc localhost 6379
  -ERR unknown command ''
  $ echo -n -e '*1\r\n$4\r\nping\r\n' |nc localhost 6379
  +PONG
  -ERR unknown command ''

Expected:
  $ echo -n -e '' |nc localhost 6379
  $ echo -n -e '*1\r\n$4\r\nping\r\n' |nc localhost 6379
  +PONG

Signed-off-by: Amos Kong <amos@scylladb.com>
2019-12-03 10:47:56 +08:00
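The fix amounts to treating a 0-byte read as end of connection instead of dispatching it as a command. A minimal Python sketch of that request loop (names are illustrative, not the server's actual code):

```python
# Toy request loop illustrating the fix: a 0-byte read means the peer
# shut down its side, so stop instead of dispatching an empty command
# (which is what produced the spurious "-ERR unknown command ''" reply).
def handle_connection(read_request, dispatch):
    replies = []
    while True:
        data = read_request()
        if not data:            # EOF: skip processing, just close
            return replies
        replies.append(dispatch(data))

requests = iter([b"*1\r\n$4\r\nping\r\n", b""])
out = handle_connection(
    lambda: next(requests),
    lambda req: b"+PONG\r\n" if b"ping" in req else b"-ERR unknown command ''\r\n")
assert out == [b"+PONG\r\n"]    # no trailing error reply after EOF
```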
Rafael Ávila de Espíndola
bb114de023 dbuild: Fix confusion about relabeling
podman needs to relabel directories in exactly the same cases docker
does. The difference is that podman cannot relabel /tmp.

The reason it was working before is that in practice anyone using
dbuild has already relabeled any directories that need relabeling,
with the exception of /tmp, since it is recreated on every boot.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191201235614.10511-2-espindola@scylladb.com>
2019-12-02 18:38:16 +02:00
Rafael Ávila de Espíndola
867cdbda28 dbuild: Use a temporary directory for /tmp
With this we don't have to use --security-opt label=disable.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191201235614.10511-1-espindola@scylladb.com>
2019-12-02 18:38:14 +02:00
Botond Dénes
1d1f8b0d82 tests: mutation_test: add large collection allocation test
Checking that there are no large allocations when a large collection is
de/serialized.
2019-12-02 17:13:53 +02:00
Avi Kivity
28355af134 docs: add maintainer's handbook (#5396)
This is a list of recipes used by maintainers to maintain
scylla.git.
2019-12-02 15:01:54 +02:00
Calle Wilund
8c6d6254cf cdc: Remove some code from header 2019-12-02 13:00:19 +00:00
Botond Dénes
4c59487502 collection_mutation: don't linearize the buffer on deserialization
Use `utils::linearizing_input_stream` for the deserialization of the
collection. This allows avoiding the linearization of the entire cell
value, instead linearizing only the individual values as they are
deserialized from the buffer.
2019-12-02 10:10:31 +02:00
Botond Dénes
690e9d2b44 utils: introduce linearizing_input_stream
`linearizing_input_stream` allows transparently reading linearized
values from a fragmented buffer. This is done by linearizing on-the-fly
only those read values that happen to be split across multiple
fragments. This reduces the size of the largest allocation from the size
of the entire buffer (when the entire buffer is linearized) to the size
of the largest read value. This is a huge gain when the buffer contains
loads of small objects, and modest gains when the buffer contains few
large objects. But the even in the worst case the size of the largest
allocation will be less or equal compared to the case where the entire
buffer is linearized.

This stream is planned to be used as glue code between the fragmented
cell value and the collection deserialization code which expects to be
reading linearized values.
2019-12-02 10:10:31 +02:00
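The stream's behavior can be sketched as a toy Python model (the real code is C++; class and method names here are illustrative): reads that fit inside one fragment take a fast path with no joining, and only a read that straddles a fragment boundary is linearized, at the size of that read.

```python
# Toy linearizing input stream over a list of byte fragments.
class LinearizingInputStream:
    def __init__(self, fragments):
        self.fragments = list(fragments)
        self.pos = 0                      # offset into the current fragment

    def read(self, n):
        # drop exhausted fragments lazily
        while self.fragments and self.pos == len(self.fragments[0]):
            self.fragments.pop(0)
            self.pos = 0
        frag = self.fragments[0]
        if self.pos + n <= len(frag):     # fast path: value is contiguous
            out = bytes(frag[self.pos:self.pos + n])
            self.pos += n
            return out
        parts = []                        # slow path: linearize just n bytes
        need = n
        while need:
            if self.pos == len(self.fragments[0]):
                self.fragments.pop(0)
                self.pos = 0
            frag = self.fragments[0]
            take = min(need, len(frag) - self.pos)
            parts.append(bytes(frag[self.pos:self.pos + take]))
            self.pos += take
            need -= take
        return b"".join(parts)

s = LinearizingInputStream([b"abc", b"defg", b"h"])
assert s.read(2) == b"ab"     # within one fragment: no join
assert s.read(4) == b"cdef"   # spans fragments: linearized on the fly
assert s.read(2) == b"gh"
```

The largest temporary buffer is the size of the largest value read, never the size of the whole collection.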
Botond Dénes
065d8d37eb tests: random-utils: get_string(): add overload that takes engine parameter 2019-12-02 10:10:31 +02:00
Botond Dénes
2f9307c973 collection_mutation: use a fragmented buffer for serialization
For the serialization `bytes_ostream` is used.
2019-12-02 10:10:31 +02:00
Botond Dénes
fc5b096f73 imr: value_writer::write_to_destination(): don't dereference chunk iterator eagerly
Currently the loop which writes the data from the fragmented origin to
the destination, moves to the next chunk eagerly after writing the value
of the current chunk, if the current chunk is exhausted.
This presents a problem when we are writing the last piece of data from
the last chunk, as the chunk will be exhausted and we eagerly attempt to
move to the next chunk, which doesn't exist and dereferencing it will
fail. The solution is to not be eager about moving to the next chunk and
only attempt it if we actually have more data to write and hence expect
more chunks.
2019-12-02 10:10:31 +02:00
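The lazy-advance fix can be sketched in Python (a toy model; the real writer is C++ IMR code): the loop only moves to the next chunk when more data actually remains, so finishing exactly at the last chunk's end never touches a nonexistent chunk.

```python
# Toy version of the fixed copy loop: the chunk iterator is advanced
# lazily, only when more data is actually needed, so writing the very
# last byte of the last chunk never dereferences a next chunk that
# doesn't exist.
def copy_from_fragments(chunks, n):
    it = iter(chunks)
    chunk = next(it)
    pos = 0
    out = bytearray()
    while n > 0:
        if pos == len(chunk):      # advance only because data remains
            chunk = next(it)
            pos = 0
        take = min(n, len(chunk) - pos)
        out += chunk[pos:pos + take]
        pos += take
        n -= take
    return bytes(out)

# Consuming everything, ending exactly at the last chunk's boundary,
# must not raise StopIteration (an eager-advance version would).
assert copy_from_fragments([b"abcd", b"ef"], 6) == b"abcdef"
```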
Botond Dénes
875314fc4b bytes_ostream: make it a FragmentRange
The presence of `const_iterator` seems to be a requirement as well
although it is not part of the concept. But perhaps it is just an
assumption made by code using it.
2019-12-02 10:10:31 +02:00
Botond Dénes
4054ba0c45 serialization: accept any CharOutputIterator
Not just bytes::output_iterator. Allow writing into streams other than
just `bytes`. In fact we should be very careful with writing into
`bytes` as they require potentially large contiguous allocations.

The `write()` method is now templatized also on the type of its first
argument, which now accepts any CharOutputIterator. Due to our poor
usage of namespace this now collides with `write` defined inside
`db/commitlog/commitlog.cc`. Luckily, the latter doesn't really have to
be templatized on the data type it reads from, and de-templatizing it
resolves the clash.
2019-12-02 10:10:31 +02:00
Botond Dénes
07007edab9 bytes_ostream: add output_iterator
To allow it being used for serialization code, which works in terms of
output iterators.
2019-12-02 10:10:31 +02:00
Takuya ASADA
c5a95210fe dist/common/scripts/scylla_setup: list virtio-blk devices correctly on interactive RAID setup
Currently the interactive RAID setup prompt does not list virtio-blk devices, for the
following reasons:
 - We fail to match the '-p' option in the 'lsblk --help' output because of a
   misuse of a regex function, so list_block_devices() always skips using the
   lsblk output.
 - We don't check for the existence of /dev/vd* when we skip using lsblk.
 - We mistakenly excluded virtio-blk devices from the 'lsblk -pnr' output using
   the '-e' option, but we actually need them.

To fix the problem, use re.search() instead of re.match() to match the
'-p' option in 'lsblk --help', add '/dev/vd*' to the block device list,
and drop the '-e 252' option to lsblk, which excludes virtio-blk.

Additionally, it is better to parse the 'TYPE' field of the lsblk output and
skip 'loop' and 'rom' devices, since these are not disk devices.

Fixes #4066

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20191201160143.219456-1-syuu@scylladb.com>
2019-12-01 18:36:48 +02:00
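The re.search() vs re.match() distinction at the heart of this fix is easy to demonstrate (the help text below is an illustrative excerpt, not lsblk's exact output):

```python
import re

# The '-p' option appears in the middle of the `lsblk --help` output.
# re.match() anchors at the start of the string, so it never finds it;
# re.search() scans the whole string.
lsblk_help = """Usage:
 lsblk [options] [<device> ...]

Options:
 -p, --paths          print complete device path
"""

assert re.match(r'-p, --paths', lsblk_help) is None        # the bug
assert re.search(r'-p, --paths', lsblk_help) is not None   # the fix
```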
Takuya ASADA
124da83103 dist/common/scripts: use chrony as NTP server on RHEL8/CentOS8
We need to use chrony as NTP server on RHEL8/CentOS8, since it dropped
ntpd/ntpdate.

Fixes #4571

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20191101174032.29171-1-syuu@scylladb.com>
2019-12-01 18:35:03 +02:00
Nadav Har'El
b82417ba27 Merge "alternator: Implement Expected operators LE, GE, and BETWEEN"
Merged pull request https://github.com/scylladb/scylla/pull/5392 from
Dejan Mircevski.

Refs #5034

The patches:
  alternator: Implement LE operator in Expected
  alternator: Implement GE operator in Expected
  alternator: Make cmp diagnostic a value, not funct
  utils: Add operator<< for big_decimal
  alternator: Implement BETWEEN operator in Expected
2019-12-01 16:11:11 +02:00
Nadav Har'El
8614c30bcf Merge "implement echo command"
Merged pull request https://github.com/scylladb/scylla/pull/5387 from
Amos Kong:

This patch implements the echo command, which returns the string back to the client.

Reference:

    https://redis.io/commands/echo
2019-12-01 10:29:57 +02:00
Amos Kong
49fee4120e redis-test: add test_echo
Signed-off-by: Amos Kong <amos@scylladb.com>
2019-11-30 13:32:00 +08:00
Amos Kong
3e2034f07b redis: implement echo command
This patch implements the echo command, which returns the string back to the client.

Reference:
- https://redis.io/commands/echo

Signed-off-by: Amos Kong <amos@scylladb.com>
2019-11-30 13:30:35 +08:00
Dejan Mircevski
dcb1b360ba alternator: Implement BETWEEN operator in Expected
Enable existing BETWEEN test, and add some more coverage to it.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-11-29 16:47:21 -05:00
Dejan Mircevski
c43b286f35 utils: Add operator<< for big_decimal
... and remove an existing duplicate from lua.cc.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-11-29 15:32:09 -05:00
Dejan Mircevski
e0d77739cc alternator: Make cmp diagnostic a value, not funct
All check_compare diagnostics are static strings, so there's no need
to call functions to get them.  Instead of a function, make diagnostic
a simple value.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-11-29 15:09:05 -05:00
Dejan Mircevski
65cb84150a alternator: Implement GE operator in Expected
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-11-29 12:29:08 -05:00
Dejan Mircevski
f201f0eaee alternator: Implement LE operator in Expected
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-11-29 11:59:52 -05:00
Avi Kivity
96009881d8 Update seastar submodule
* seastar 8eb6a67a4...166061da3 (3):
  > install-dependencies.sh: add diffutils
  > reactor: replace std::optional (in _network_stack_ready) with compat::optional
  > noncopyable_function: disable -Wuninitialized warning in noncopyable_function_base

Ref #5386.
2019-11-29 12:50:48 +02:00
Tomasz Grabiec
6562c60c86 Merge "test.py: terminate children upon signal" from Kostja
Allows a signal to terminate the outstanding
test tasks, to avoid dangling children.
2019-11-29 12:05:03 +02:00
Pekka Enberg
bb227cf2b4 Merge "Fix default directories in Scylla setup scripts" from Amos
"Fix two problem in scylla_io_setup:

 - Problem 1: the paths of the default directories are invalid, introduced by
   commit 5ec1915 ("scylla_io_setup: assume default directories under
   /var/lib/scylla").

 - Problem 2: wrong path join, introduced by commit 31ddb21
   ("dist/common/scripts: support nonroot mode on setup scripts").

Fix a problem in scylla_io_setup, scylla_fstrim and scylla_blocktune.py:

  - Fixed default scylla directories when they aren't assigned in
    scylla.yaml"

Fixes #5370

Reviewed-by: Pavel Emelyanov <xemul@scylladb.com>

* 'scylla_io_setup' of git://github.com/amoskong/scylla:
  use parse_scylla_dirs_with_default to get scylla directories
  scylla_io_setup: fix data_file_directories check
  scylla_util: introduce helper to process the default scylla directories
  scylla_util: get workdir by datadir() if it's not assigned in scylla.yaml
  scylla_io_setup: fix path join of default scylla directories
2019-11-29 12:05:03 +02:00
Ultrabug
61f1e6e99c test.py: fix undefined variable 'options' in write_xunit_report() 2019-11-28 19:06:22 +03:00
Ultrabug
5bdc0386c4 test.py: comparison to False should be 'if cond is False:' 2019-11-28 19:06:22 +03:00
Ultrabug
737b1cff5e test.py: use isinstance() for type comparison 2019-11-28 19:06:22 +03:00
Konstantin Osipov
c611325381 test.py: terminate children upon signal
Use asyncio as a more modern way to work with concurrency,
Process signals in an event loop, terminate all outstanding
tests before exiting.

Breaking change: this commit requires Python 3.7 or
newer to run this script. The patch adds a version
check and a message to enforce it.
2019-11-28 19:06:22 +03:00
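The approach can be sketched in Python 3.7+ (a simplified model of what the commit describes; function and test names here are illustrative, not test.py's actual code): outstanding tasks are tracked, and a signal handler cancels them all so no children are left dangling.

```python
import asyncio
import os
import signal

# Track the outstanding test tasks and cancel them all when a
# terminating signal arrives, instead of leaving dangling children.
async def run_tests(coros):
    tasks = [asyncio.ensure_future(c) for c in coros]
    loop = asyncio.get_running_loop()
    for sig in (signal.SIGINT, signal.SIGTERM):
        loop.add_signal_handler(sig, lambda: [t.cancel() for t in tasks])
    return await asyncio.gather(*tasks, return_exceptions=True)

async def main():
    async def hanging_test():
        await asyncio.sleep(3600)
    # simulate Ctrl-C shortly after the tests start
    asyncio.get_running_loop().call_later(
        0.05, os.kill, os.getpid(), signal.SIGTERM)
    return await run_tests([hanging_test(), hanging_test()])

results = asyncio.run(main())   # asyncio.run() itself requires >= 3.7
assert all(isinstance(r, asyncio.CancelledError) for r in results)
```

`loop.add_signal_handler` is Unix-only, which is fine for a script that drives Scylla tests.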
Botond Dénes
cf24f4fe30 imr: move documentation to docs/
Where all the other documentation is, and hence where people would be
looking for it.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191128144612.378244-1-bdenes@scylladb.com>
2019-11-28 16:47:52 +02:00
Avi Kivity
36dd0140a8 Update seastar submodule
* seastar 5c25de907a...8eb6a67a4b (1):
  > util/backtrace.hh: add missing print.hh include
2019-11-28 16:47:16 +02:00
Benny Halevy
7aef39e400 tracing: one_session_records: keep local tracing ptr
Similarly to trace_state, keep a shared_ptr<tracing> _local_tracing_ptr
in one_session_records when it is constructed, so it can be used
during shutdown.

Fixes #5243

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-11-28 15:24:10 +01:00
Gleb Natapov
75499896ab client_state: store _user as optional instead of shared_ptr
_user cannot outlive the client_state instance, so there is no point
in holding it in a shared_ptr.

Tested: debug test.py and dtest auth_test.py

Message-Id: <20191128131217.26294-5-gleb@scylladb.com>
2019-11-28 15:48:59 +02:00
Gleb Natapov
1538cea043 cql: modification_statement: store _restrictions as optional instead of shared_ptr
_restrictions can be optional since its lifetime is managed by
modification_statement class explicitly.

Message-Id: <20191128131217.26294-4-gleb@scylladb.com>
2019-11-28 15:48:54 +02:00
Gleb Natapov
ce5d6d5eee storage_service: store thrift server as an optional instead of shared_ptr
Only do_stop_rpc_server uses the shared_ptr to prolong server's
lifetime until stop() completes, but do_with() can be used to achieve the
same.

Message-Id: <20191128131217.26294-3-gleb@scylladb.com>
2019-11-28 15:48:51 +02:00
Gleb Natapov
b9b99431a8 storage_service: store cql server as an optional instead of shared_ptr
Only do_stop_native_transport() uses the shared_ptr to prolong server's
lifetime until stop() completes, but do_with() can be used to achieve the
same.

Message-Id: <20191128131217.26294-2-gleb@scylladb.com>
2019-11-28 15:48:47 +02:00
Avi Kivity
2b7e97514a Update seastar submodule
* seastar 6f0ef32514...5c25de907a (7):
  > shared_future: Fix crash when all returned futures time out
Fixes #5322.
  > future: don't create temporaries on get_value().
  > reactor: lower the default stall threshold to 200ms
  > reactor: Simplify network initialization
  > reactor: Replace most std::function with noncopyable_function
  > futures: Avoid extra moves in SEASTAR_TYPE_ERASE_MORE mode
  > inet_address: Make inet_address == operator ignore scope (again)
2019-11-28 14:48:01 +02:00
Juliusz Stasiewicz
fa12394dfe reader_concurrency_semaphore: cosmetic changes
Added line breaks, replaced unused include, included seastarx.hh
instead of `using namespace seastar`.
2019-11-28 13:39:08 +01:00
Nadav Har'El
fde336a882 Merged "5139 minmax bad printing"
Merged pull request https://github.com/scylladb/scylla/pull/5311 from
Juliusz Stasiewicz:

This is a partial solution to #5139 (only for two types) because of the
above and because collections are much harder to do. They are coming in
a separate PR.
2019-11-28 14:06:43 +02:00
Juliusz Stasiewicz
3b9ebca269 tests/cql_query_test: add test for aggregates on inet+time_type
This is a test to max(), min() and count() system functions on
the arguments of types: `net::inet_address` and `time_native_type`.
2019-11-28 11:20:43 +01:00
Juliusz Stasiewicz
9c23d89531 cql3/functions: add missing min/max/count for inet and time type
References #5139. Aggregate functions like max(), when invoked
on `inet_address' and `time_native_type', used to choose the
max(blob)->blob overload, casting the argument and result to
bytes. This is because the appropriate calls to
`aggregate_fcts::make_XXX_function()' were missing. This commit
adds them. Behavior remains the same, but clients now see
user-friendly representations of the aggregate result, not binary.

Comparing inet addresses without inet::operator< is performed by a
trick: ADL is bypassed by wrapping the names of std::min/max
and providing an overload of the wrapper for the inet type.
2019-11-28 11:18:31 +01:00
Pavel Emelyanov
8532093c61 cql: The cql_server does not need proxy reference
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20191127153842.4098-1-xemul@scylladb.com>
2019-11-28 10:58:46 +01:00
Amos Kong
e2eb754d03 use parse_scylla_dirs_with_default to get scylla directories
Use default data_file_directories/commitlog_directory if it's not assigned
in scylla.yaml

Signed-off-by: Amos Kong <amos@scylladb.com>
2019-11-28 15:48:14 +08:00
Amos Kong
bd265bda4f scylla_io_setup: fix data_file_directories check
Use default data_file_directories if it's not assigned in scylla.yaml

Signed-off-by: Amos Kong <amos@scylladb.com>
2019-11-28 15:47:56 +08:00
Amos Kong
123c791366 scylla_util: introduce helper to process the default scylla directories
Currently we support assigning the workdir from scylla.yaml, yet we hardcode
'/var/lib/scylla' in many places in the setup scripts.

Some setup scripts get the scylla directories by parsing scylla.yaml; introduce
parse_scylla_dirs_with_default(), which adds default values if the scylla
directories aren't assigned in scylla.yaml.

Signed-off-by: Amos Kong <amos@scylladb.com>
2019-11-28 14:54:32 +08:00
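The shape of such a helper can be sketched as follows (the default values and the exact key handling are illustrative assumptions, not the script's actual code):

```python
# Illustrative sketch: take the dict parsed from scylla.yaml and fill
# in default directories for any key that is missing, commented out
# (absent), or empty.
SCYLLA_DIR_DEFAULTS = {
    'workdir': '/var/lib/scylla',
    'data_file_directories': ['/var/lib/scylla/data'],
    'commitlog_directory': '/var/lib/scylla/commitlog',
}

def parse_scylla_dirs_with_default(conf):
    out = dict(conf)
    for key, default in SCYLLA_DIR_DEFAULTS.items():
        if not out.get(key):       # absent, None, or empty
            out[key] = default
    return out

cfg = parse_scylla_dirs_with_default({'commitlog_directory': '/mnt/cl'})
assert cfg['data_file_directories'] == ['/var/lib/scylla/data']
assert cfg['commitlog_directory'] == '/mnt/cl'    # explicit value wins
```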
Amos Kong
b75061b4bc scylla_util: get workdir by datadir() if it's not assigned in scylla.yaml
Signed-off-by: Amos Kong <amos@scylladb.com>
2019-11-28 14:38:01 +08:00
Amos Kong
ada0e92b85 scylla_io_setup: fix path join of default scylla directories
Currently we check an invalid path for some default scylla directories;
the directories don't exist, so the tune is always skipped. This is
caused by two problems.

Problem 1: the paths of the default directories are invalid

Introduced by commit 5ec191536e: we try to tune some default scylla
directories if they exist, but the directory paths we try are wrong.

For example:
- What we check: /var/lib/scylla/commitlog_directory
- Correct one: /var/lib/scylla/commitlog

Problem 2: wrong path join

Introduced by commit 31ddb2145a, default_path might be replaced from
'/var/lib/scylla/' to '/var/lib/scylla'.

Our code then checks an invalid, wrongly joined path, e.g.:
'/var/lib/scyllacommitlog'

Signed-off-by: Amos Kong <amos@scylladb.com>
2019-11-28 14:37:58 +08:00
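The path-join bug is a classic; a short demonstration of why naive string concatenation breaks once the trailing slash disappears, and why os.path.join does not:

```python
import os.path

default_path = '/var/lib/scylla'          # trailing slash was dropped

# Naive concatenation silently produces a bogus path:
assert default_path + 'commitlog' == '/var/lib/scyllacommitlog'

# os.path.join inserts the separator when needed and tolerates both forms:
assert os.path.join(default_path, 'commitlog') == '/var/lib/scylla/commitlog'
assert os.path.join(default_path + '/', 'commitlog') == '/var/lib/scylla/commitlog'
```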
Amos Kong
d4a26f2ad0 scylla_util: get_scylla_dirs: return default data/commitlog directories if they aren't set (#5358)
The default values of data_file_directories and commitlog_directory were
commented out by commit e0f40ed16a. This causes scylla_util.py:get_scylla_dirs()
to fail when checking the values.

This patch changes get_scylla_dirs() to return the default data/commitlog
directories if they aren't set.

Fixes #5358 

Reviewed-by: Pavel Emelyanov <xemul@scylladb.com>
Signed-off-by: Amos Kong <amos@scylladb.com>
2019-11-27 13:52:05 +02:00
Nadav Har'El
cb1ed5eab2 alternator-test: test Query's Limit parameter
Add a test, test_query.py::test_query_limit, to verify that the Limit
parameter correctly limits the number of rows returned by the Query.
This was supposed to already work correctly - but we never had a test for
it. As we hoped, the test passes (on both Alternator and DynamoDB).

Another test, test_query.py::test_query_limit_paging, verifies that
paging can be done with any setting of Limit. We already had tests
for paging of the Scan operation, but not for the Query operation.

Refs #5153

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-11-27 12:27:26 +01:00
Nadav Har'El
c01ca661a0 alternator-test: Select parameter of Query and Scan
This is a comprehensive test for the "Select" parameter of Query and Scan
operations, but only for the base-table case, not index, so another future
patch should add similar tests in test_gsi.py and test_lsi.py as well.

The main use of the Select parameter is to allow returning just the count
of items, instead of their content, but it also has other esoteric options,
all of which we test here.

The test currently succeeds on AWS DynamoDB, demonstrating that the test
is correct, but fails on Alternator because the "Select" parameter is not
yet supported. So the test is marked xfail.

Refs #5058

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-11-27 12:22:33 +01:00
Botond Dénes
9d09f57ba5 scylla-gdb.py: scylla_smp_queues: use lazy initialization
Currently the command tries to read all seastar smp queues in its
initialization code in the constructor. This constructor is run each
time `scylla-gdb.py` is sourced in `gdb` which leads to slowdowns and
sometimes also annoying errors because the sourcing happens in the wrong
context and seastar symbols are not available.
Avoid this by running this initializing code lazily, on the first
invocation.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191127095408.112101-1-bdenes@scylladb.com>
2019-11-27 12:04:57 +01:00
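The lazy-initialization pattern in miniature (an illustrative sketch, not the gdb command's actual code): construction does no work, and the expensive discovery runs once, on first use.

```python
# Defer expensive discovery to first use, so merely constructing the
# command (i.e. sourcing the script) stays cheap and cannot fail in a
# context where the needed symbols aren't loaded yet.
class LazyCommand:
    def __init__(self, discover):
        self._discover = discover
        self._queues = None

    @property
    def queues(self):
        if self._queues is None:     # first invocation pays the cost
            self._queues = self._discover()
        return self._queues

calls = []
def expensive_discovery():
    calls.append(1)
    return ['queue0', 'queue1']

cmd = LazyCommand(expensive_discovery)
assert calls == []                   # constructing did no work
assert cmd.queues == ['queue0', 'queue1']
assert cmd.queues == ['queue0', 'queue1']
assert calls == [1]                  # discovery ran exactly once
```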
Tomasz Grabiec
87b72dad3e Merge "treewide: add missing const qualifiers" from Pavel Solodovnikov
This patchset adds missing "const" function qualifiers throughout
the Scylla code base, which would make code less error-prone.

The changeset incorporates Kostja's work regarding const qualifiers
in the cql code hierarchy along with a follow-up patch addressing the
review comment of the corresponding patch set (the patch subject is
"cql: propagate const property through prepared statement tree.").
2019-11-27 10:56:20 +01:00
Rafael Ávila de Espíndola
91b43f1f06 dbuild: fix podman with selinux enabled
With this change I am able to run tests using docker-podman. The
option also exists in docker.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191126194101.25221-1-espindola@scylladb.com>
2019-11-26 21:50:56 +02:00
Rafael Ávila de Espíndola
480055d3b5 dbuild: Fix missing docker options
With the recent changes docker was missing a few options. In
particular, it was missing -u.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191126194347.25699-1-espindola@scylladb.com>
2019-11-26 21:45:31 +02:00
Rafael Ávila de Espíndola
c0a2cd70ff lua: fix test with boost 1.66
The boost 1.67 release notes say:

Changed maximum supported year from 10000 to 9999 to resolve various issues

So change the test to use a larger number so that we get an exception
with both boost 1.66 and boost 1.67.

Fixes #5344

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191126180327.93545-1-espindola@scylladb.com>
2019-11-26 21:17:15 +02:00
Pavel Solodovnikov
55a1d46133 cql: some more missing const qualifiers
There are several virtual functions in public interfaces named "is_*"
that clearly should be marked as "const", so fix that.
2019-11-26 17:57:51 +03:00
Pavel Solodovnikov
412f1f946a cql: remove "mutable" on _opts in select_statement
_opts initialization can be safely done in the constructor, hence no need to make it mutable.
2019-11-26 17:55:10 +03:00
Piotr Sarna
d90dbd6ab0 Merge "support podman as a replacement to docker" from Avi
Docker on Fedora 31 is flakey, and is not supported at all on RHEL 8.
Podman is a drop-in replacement for docker; this series adds support
for using podman in dbuild.

Apart from actually working on Fedora 31 hosts,
podman is nicer in being more secure and not requiring a daemon.

Fixes #5332
2019-11-26 15:17:49 +01:00
Tomasz Grabiec
5c9fe83615 Merge "Sanitize sub-modules shutting down" from Pavel
As suggested in issue #4586 here is the helper that prints
"shutting down foo" message, then shuts the foo down, then
prints the "[it] was successful" one. In between, it catches
the exception (if any) and warns about it in the logs.

By "then" I mean literally then, not the seastar's then() :)

Fixes: #4586
2019-11-26 15:14:22 +02:00
Piotr Sarna
9c5a5a5ac2 treewide: add names to semaphores
By default, semaphore exceptions bring along very little context:
either that a semaphore was broken or that it timed out.
In order to make debugging easier without introducing significant
runtime costs, a notion of named semaphore is added.
A named semaphore is simply a semaphore with a statically defined
name, which is present in its errors, bringing valuable context.
A semaphore defined as:

  auto sem = semaphore(0);

will present the following message when it breaks:
"Semaphore broken"
However, a named semaphore:

  auto named_sem = named_semaphore(0, named_semaphore_exception_factory{"io_concurrency_sem"});

will present a message with at least some debugging context:

  "Semaphore broken: io_concurrency_sem"

It's not much, but it would really help in pinpointing bugs
without having to inspect core dumps.

At the same time, it does not incur any costs for normal
semaphore operations (except for its creation), but instead
only uses more CPU when an error is actually thrown,
which is considered rare and not on the hot path.

Refs #4999

Tests: unit(dev), manual: hardcoding a failure in view building code
2019-11-26 15:14:21 +02:00
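The idea translates directly to a small Python sketch (the real code is C++, with an exception factory; the class and exception names below are illustrative): the name is paid for once at creation and only formatted into a message on the rare error path.

```python
import threading

# A semaphore that carries a statically defined name, so its failures
# say *which* semaphore timed out instead of an anonymous error.
class NamedSemaphoreTimeout(Exception):
    pass

class NamedSemaphore:
    def __init__(self, value, name):
        self._sem = threading.Semaphore(value)
        self.name = name             # paid once, at creation

    def acquire(self, timeout):
        if not self._sem.acquire(timeout=timeout):
            # the name is only formatted on the (rare) error path
            raise NamedSemaphoreTimeout(f'Semaphore timed out: {self.name}')

    def release(self):
        self._sem.release()

sem = NamedSemaphore(0, 'io_concurrency_sem')
try:
    sem.acquire(timeout=0.01)
    message = None
except NamedSemaphoreTimeout as e:
    message = str(e)
assert message == 'Semaphore timed out: io_concurrency_sem'
```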
Avi Kivity
6fbb724140 conf: remove unsupported options from scylla.yaml (#5299)
These unsupported options do nothing except to confuse users who
try to tune them.

Options removed:

hinted_handoff_throttle_in_kb
max_hints_delivery_threads
batchlog_replay_throttle_in_kb
key_cache_size_in_mb
key_cache_save_period
key_cache_keys_to_save
row_cache_size_in_mb
row_cache_save_period
row_cache_keys_to_save
counter_cache_size_in_mb
counter_cache_save_period
counter_cache_keys_to_save
memory_allocator
saved_caches_directory
concurrent_reads
concurrent_writes
concurrent_counter_writes
file_cache_size_in_mb
index_summary_capacity_in_mb
index_summary_resize_interval_in_minutes
trickle_fsync
trickle_fsync_interval_in_kb
internode_authenticator
native_transport_max_threads
native_transport_max_concurrent_connections
native_transport_max_concurrent_connections_per_ip
rpc_server_type
rpc_min_threads
rpc_max_threads
rpc_send_buff_size_in_bytes
rpc_recv_buff_size_in_bytes
internode_send_buff_size_in_bytes
internode_recv_buff_size_in_bytes
thrift_framed_transport_size_in_mb
concurrent_compactors
compaction_throughput_mb_per_sec
sstable_preemptive_open_interval_in_mb
inter_dc_stream_throughput_outbound_megabits_per_sec
cross_node_timeout
streaming_socket_timeout_in_ms
dynamic_snitch_update_interval_in_ms
dynamic_snitch_reset_interval_in_ms
dynamic_snitch_badness_threshold
request_scheduler
request_scheduler_options
throttle_limit
default_weight
weights
request_scheduler_id
2019-11-26 15:14:21 +02:00
Amos Kong
817f34d1a9 ami: support new aws instance types: c5d, m5d, m5ad, r5d, z1d (#5330)
Currently scylla_io_setup is skipped in scylla_setup, because we don't support
those new instance types.

I manually executed scylla_io_setup, and the scylla-server started and worked
well.

Let's apply this patch first, then check whether any new problems show up in
the AMI test.

Signed-off-by: Amos Kong <amos@scylladb.com>
2019-11-26 15:14:21 +02:00
Konstantin Osipov
90346236ac cql: propagate const property through prepared statement tree.
cql_statement is a class representing a prepared statement in Scylla.
It is used concurrently during execution, so it is important that its
state is not changed by execution.

Add the const qualifier to the family of execution methods throughout the
cql hierarchy.

Mark a few places which do mutate prepared statement state during
execution as mutable. While these are not affecting production today,
as code ages, they may become a source of latent bugs and should be
moved out of the prepared state or evaluated at prepare eventually:

cf_property_defs::_compaction_strategy_class
list_permissions_statement::_resource
permission_altering_statement::_resource
property_definitions::_properties
select_statement::_opts
2019-11-26 14:18:17 +03:00
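The const-plus-mutable pattern this commit describes can be sketched roughly as below (hypothetical class and members for illustration, not Scylla's actual code):

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch: execution methods are const so one prepared statement
// can be shared by concurrent executions; state that still mutates during
// execution is marked mutable (and, per the commit, should eventually move
// out of the prepared statement).
class cql_statement {
    std::string _query;
    mutable unsigned _execution_count = 0; // example of latent mutable state
public:
    explicit cql_statement(std::string q) : _query(std::move(q)) {}

    // const: executing must not change the prepared statement itself
    std::string execute() const {
        ++_execution_count; // compiles only because the member is mutable
        return "executed: " + _query;
    }

    unsigned executions() const { return _execution_count; }
};
```

The mutable members listed in the commit play the same role as `_execution_count` here: mutation during a const execution path that the author flags as a source of latent bugs.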
Pavel Solodovnikov
2f442f28af treewide: add const qualifiers throughout the code base 2019-11-26 02:24:49 +03:00
Pavel Emelyanov
50a1ededde main: Remove now unused defer-with-log helper
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-11-25 18:47:03 +03:00
Pavel Emelyanov
a0f92d40ee main: Shut down sighup handler with verbose helper
And (!) fix the misprinted variable name.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-11-25 18:47:03 +03:00
Pavel Emelyanov
0719369d83 repair: Remove extra logging on shutdown
The shutdown start/finish messages are already printed in verbose_shutdown()

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-11-25 18:47:03 +03:00
Pavel Emelyanov
2d64fc3a3e main: Shut down database with verbose_shutdown helper
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-11-25 18:47:03 +03:00
Pavel Emelyanov
636c300db5 main: Shut down prometheus with verbose_shutdown()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

---

v2:
- Have stop earlier so that an exception in start/listen does
  not prevent prometheus.stop from being called
2019-11-25 18:47:03 +03:00
Pavel Emelyanov
804b152527 main: Sanitize shutting down callbacks
As suggested in issue #4586, here is a helper that prints a
"shutting down foo" message, then shuts foo down, then
prints "shutting down foo was successful". In between
it catches the exception (if any) and warns about it in the logs.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-11-25 18:45:49 +03:00
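A minimal sketch of such a helper, simplified to synchronous code (the names and signature are illustrative, not the actual Seastar/Scylla ones):

```cpp
#include <exception>
#include <functional>
#include <iostream>
#include <string>

// Print "shutting down foo", run the shutdown action, then report success;
// if the action throws, catch the exception and warn instead of propagating.
bool verbose_shutdown(const std::string& what, const std::function<void()>& stop) {
    std::cout << "shutting down " << what << "\n";
    try {
        stop();
        std::cout << "shutting down " << what << " was successful\n";
        return true;
    } catch (const std::exception& e) {
        std::cout << "WARN: shutting down " << what << " failed: " << e.what() << "\n";
        return false;
    }
}
```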
Nadav Har'El
4160b3630d Merge "Return preimage from CDC only when it's enabled"
Merged pull request https://github.com/scylladb/scylla/pull/5218
from Piotr Jastrzębski:

Users should be able to decide whether they need preimage or not. There is
already an option for that but it's not respected by the implementation.
This PR adds support for this functionality.

Tests: unit(dev).

Individual patches:
  cdc: Don't take storage_proxy as transformer::pre_image_select param
  cdc::append_log_mutations: use do_with instead of shared_ptr
  cdc::append_log_mutations: fix undefined behavior
  cdc: enable preimage in test_pre_image_logging test
  cdc: Return preimage only when it's requested
  cdc: test both enabled and disabled preimage in test_pre_image_logging
2019-11-25 14:32:17 +02:00
Pavel Emelyanov
f6ac969f1e mm: Stop migration manager
Before stopping the db itself, stop the migration service.
It must be stopped before RPC, but RPC is not stopped yet
itself, so we should be safe here.

Here's the tail of the resulting logs:

INFO  2019-11-20 11:22:35,193 [shard 0] init - shutdown migration manager
INFO  2019-11-20 11:22:35,193 [shard 0] migration_manager - stopping migration service
INFO  2019-11-20 11:22:35,193 [shard 1] migration_manager - stopping migration service
INFO  2019-11-20 11:22:35,193 [shard 0] init - Shutdown database started
INFO  2019-11-20 11:22:35,193 [shard 0] init - Shutdown database finished
INFO  2019-11-20 11:22:35,193 [shard 0] init - stopping prometheus API server
INFO  2019-11-20 11:22:35,193 [shard 0] init - Scylla version 666.development-0.20191120.25820980f shutdown complete.

Also -- stop the mm on drain before the commitlog is stopped.
[Tomasz: mm needs the cl because pulling schema changes from other nodes
involves applying them into the database. So cl/db needs to be
stopped after mm is stopped.]

The drain logs would look like

...
INFO  2019-11-25 11:00:40,562 [shard 0] migration_manager - stopping migration service
INFO  2019-11-25 11:00:40,562 [shard 1] migration_manager - stopping migration service
INFO  2019-11-25 11:00:40,563 [shard 0] storage_service - DRAINED:

and then on stop

...
INFO  2019-11-25 11:00:46,427 [shard 0] init - shutdown migration manager
INFO  2019-11-25 11:00:46,427 [shard 0] init - Shutdown database started
INFO  2019-11-25 11:00:46,427 [shard 0] init - Shutdown database finished
INFO  2019-11-25 11:00:46,427 [shard 0] init - stopping prometheus API server
INFO  2019-11-25 11:00:46,427 [shard 0] init - Scylla version 666.development-0.20191125.3eab6cd54 shutdown complete.

Fixes #5300

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20191125080605.7661-1-xemul@scylladb.com>
2019-11-25 12:59:01 +01:00
Asias He
6ec602ff2c repair: Fix rx_hashes_nr metrics (#5213)
In get_full_row_hashes_with_rpc_stream and
repair_get_row_diff_with_rpc_stream_process_op which were introduced in
the "Repair switch to rpc stream" series, rx_hashes_nr metrics are not
updated correctly.

In the test we have 3 nodes and run repair on node3, we makes sure the
following metrics are correct.

assertEqual(node1_metrics['scylla_repair_tx_hashes_nr'] + node2_metrics['scylla_repair_tx_hashes_nr'],
   	    node3_metrics['scylla_repair_rx_hashes_nr'])
assertEqual(node1_metrics['scylla_repair_rx_hashes_nr'] + node2_metrics['scylla_repair_rx_hashes_nr'],
   	    node3_metrics['scylla_repair_tx_hashes_nr'])
assertEqual(node1_metrics['scylla_repair_tx_row_nr'] + node2_metrics['scylla_repair_tx_row_nr'],
   	    node3_metrics['scylla_repair_rx_row_nr'])
assertEqual(node1_metrics['scylla_repair_rx_row_nr'] + node2_metrics['scylla_repair_rx_row_nr'],
   	    node3_metrics['scylla_repair_tx_row_nr'])
assertEqual(node1_metrics['scylla_repair_tx_row_bytes'] + node2_metrics['scylla_repair_tx_row_bytes'],
   	    node3_metrics['scylla_repair_rx_row_bytes'])
assertEqual(node1_metrics['scylla_repair_rx_row_bytes'] + node2_metrics['scylla_repair_rx_row_bytes'],
            node3_metrics['scylla_repair_tx_row_bytes'])

Tests: repair_additional_test.py:RepairAdditionalTest.repair_almost_synced_3nodes_test
Fixes: #5339
Backports: 3.2
2019-11-25 13:57:37 +02:00
Piotr Jastrzebski
2999cb5576 cdc: test both enabled and disabled preimage in test_pre_image_logging
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-11-25 12:43:39 +01:00
Piotr Jastrzebski
222b94c707 cdc: Return preimage only when it's requested
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-11-25 12:43:39 +01:00
Piotr Jastrzebski
c94a5947b7 cdc: enable preimage in test_pre_image_logging test
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-11-25 12:43:39 +01:00
Piotr Jastrzebski
595c9f9d32 cdc::append_log_mutations: fix undefined behavior
The code was iterating over a collection that was modified
at the same time. Iterators were used for that and collection
modification can invalidate all iterators.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-11-25 12:43:39 +01:00
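The class of bug being fixed, and one safe pattern for it, in a generic sketch (unrelated to the actual cdc code):

```cpp
#include <cassert>
#include <vector>

// Buggy shape: calling push_back() while iterating the same vector with
// iterators can reallocate the buffer and invalidate every iterator, which
// is undefined behavior. Safe shape: collect the additions into a separate
// container and append them after the loop.
std::vector<int> append_doubled_evens(std::vector<int> v) {
    std::vector<int> additions;
    for (int x : v) {                   // iterate the original, untouched vector
        if (x % 2 == 0) {
            additions.push_back(2 * x); // mutate only the side container
        }
    }
    v.insert(v.end(), additions.begin(), additions.end());
    return v;
}
```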
Piotr Jastrzebski
f0f44f9c51 cdc::append_log_mutations: use do_with instead of shared_ptr
This will not only save some allocations but also improve
code readability.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-11-25 12:43:39 +01:00
Piotr Jastrzebski
b8d9158c21 cdc: Don't take storage_proxy as transformer::pre_image_select param
transformer has access to storage_proxy through its _ctx field.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-11-25 12:43:39 +01:00
Nadav Har'El
3eab6cd549 Merged "toolchain: update to Fedora 31"
Merged pull request https://github.com/scylladb/scylla/pull/5310 from
Avi Kivity:

This is a minor update as gcc and boost versions did not change. A notable
update is patchelf 0.10, which adds support for large binaries.

A few minor issues exposed by the update are fixed in preparatory patches.

Patches:
  dist: rpm: correct systemd post-uninstall scriptlet
  build: force xz compression on rpm binary payload
  tools: toolchain: update to Fedora 31
2019-11-24 13:38:45 +02:00
Tomasz Grabiec
e3d025d014 row_cache: Fix abort on bad_alloc during cache update
Since 90d6c0b, cache will abort when trying to detach partition
entries while they're updated. This should never happen. It can happen
though, when the update fails on bad_alloc, because the cleanup guard
invalidates the cache before it releases partition snapshots (held by
"update" coroutine).

Fix by destroying the coroutine first.

Fixes #5327.

Tests:
  - row_cache_test (dev)

Message-Id: <1574360259-10132-1-git-send-email-tgrabiec@scylladb.com>
2019-11-24 12:06:51 +02:00
Rafael Ávila de Espíndola
8599f8205b rpmbuild: don't use dwz
By default rpm uses dwz to merge the debug info from various
binaries. Unfortunately, it looks like addr2line has not been updated
to handle this:

// This works
$ addr2line  -e build/release/scylla 0x1234567

$ dwz -m build/release/common.debug build/release/scylla.debug build/release/iotune.debug

// now this fails
$ addr2line -e build/release/scylla 0x1234567

I think the issue is

https://sourceware.org/bugzilla/show_bug.cgi?id=23652

Fixes #5289

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191123015734.89331-1-espindola@scylladb.com>
2019-11-24 11:35:29 +02:00
Rafael Ávila de Espíndola
25d5d39b3c reloc: Force using sha1 for build-ids
The default build-id used by lld is xxhash, which is 8 bytes long. rpm
requires build-ids to be at least 16 bytes long
(https://github.com/rpm-software-management/rpm/issues/950). We force
using sha1 for now. That has no impact in gold and bfd since that is
their default. We set it in here instead of configure.py to not slow
down regular builds.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191123020801.89750-1-espindola@scylladb.com>
2019-11-24 11:35:29 +02:00
Rafael Ávila de Espíndola
b5667b9c31 build: don't compress debug info in executables
By default we were compressing debug info only in release
executables. The idea, if I understand it correctly, is that those are
the ones we ship, so we want a more compact binary.

I don't think that was doing anything useful. The compression is just
gzip, so when we ship a .tar.xz, having the debug info compressed
inside the scylla binary probably reduces the overall compression a
bit.

When building an rpm the situation is amusing. As part of the rpm
build process the debug info is decompressed and extracted to an
external file.

Given that most of the link time goes to compressing debug info, it is
probably a good idea to just skip that.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191123022825.102837-1-espindola@scylladb.com>
2019-11-24 11:35:29 +02:00
Tomasz Grabiec
d84859475e Merge "Refactor test.py and cleanup resources" from Kostja
Structure the code to be able to introduce futures.
Apply trivial cleanups.
Switch to asyncio and use it to work with processes and
handle signals. Cleanup all processes upon signal.
2019-11-24 11:35:29 +02:00
Tomasz Grabiec
e166fdfa26 Merge "Optimize LWT query phase" from Vladimir Davydov
This patch implements a simple optimization for LWT: it makes PAXOS
prepare phase query locally and return the current value of the modified
key so that a separate query is not necessary. For more details see
patch 6. Patch 1 fixes a bug in next. Patches 2-5 contain trivial
preparatory refactoring.
2019-11-24 11:35:29 +02:00
Pavel Solodovnikov
4879db70a6 system_keyspace: support timeouts in queries to system.paxos table.
Also introduce supplementary `execute_cql_with_timeout` function.

Remove redundant comment for `execute_cql`.

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20191121214148.57921-1-pa.solodovnikov@scylladb.com>
2019-11-24 11:35:29 +02:00
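The shape of such a timeout-taking wrapper, as a hypothetical synchronous sketch (the real function works with Seastar futures; names are illustrative):

```cpp
#include <chrono>
#include <stdexcept>
#include <string>

using db_clock = std::chrono::steady_clock;

// Hypothetical sketch: fail fast once the caller-supplied deadline has
// passed, instead of letting the query run unbounded.
std::string execute_cql_with_timeout(const std::string& query,
                                     db_clock::time_point timeout) {
    if (db_clock::now() >= timeout) {
        throw std::runtime_error("query timed out: " + query);
    }
    return "ok: " + query; // stand-in for actually executing the query
}
```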
Vladimir Davydov
bf5f864d80 paxos: piggyback result query on prepare response
Current LWT implementation uses at least three network round trips:
 - first, execute PAXOS prepare phase
 - second, query the current value of the updated key
 - third, propose the change to participating replicas

(there's also learn phase, but we don't wait for it to complete).

The idea behind the optimization implemented by this patch is simple:
piggyback the current value of the updated key on the prepare response
to eliminate one round trip.

To generate less network traffic, only the closest to the coordinator
replica sends data while other participating replicas send digests which
are used to check data consistency.

Note, this patch changes the API of some RPC calls used by PAXOS, but
this should be okay as long as the feature in the early development
stage and marked experimental.

To assess the impact of this optimization on LWT performance, I ran a
simple benchmark that starts a number of concurrent clients each of
which updates its own key (uncontended case) stored in a cluster of
three AWS i3.2xlarge nodes located in the same region (us-west-1) and
measures the aggregate bandwidth and latency. The test uses shard-aware
gocql driver. Here are the results:

                latency 99% (ms)    bandwidth (rq/s)    timeouts (rq/s)
    clients     before  after       before  after       before  after
          1          2      2          626    637            0      0
          5          4      3         2616   2843            0      0
         10          3      3         4493   4767            0      0
         50          7      7        10567  10833            0      0
        100         15     15        12265  12934            0      0
        200         48     30        13593  14317            0      0
        400        185     60        14796  15549            0      0
        600        290     94        14416  15669            0      0
        800        568    118        14077  15820            2      0
       1000        710    118        13088  15830            9      0
       2000       1388    232        13342  15658           85      0
       3000       1110    363        13282  15422          233      0
       4000       1735    454        13387  15385          329      0

That is, this optimization improves max LWT bandwidth by about 15%
and allows running 3-4x more clients while maintaining the same level
of system responsiveness.
2019-11-24 11:35:29 +02:00
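The data/digest split described above can be sketched as a prepare response that optionally carries the value (illustrative types only, not Scylla's actual RPC API):

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>

// Only the replica closest to the coordinator fills in `value`; the others
// send just a digest, which the coordinator uses to check consistency
// without the extra read round trip.
struct prepare_response {
    std::optional<std::string> value; // set by the closest replica only
    std::size_t digest;               // sent by every replica
};

prepare_response make_prepare_response(const std::string& current_value,
                                       bool is_closest_replica) {
    return {
        is_closest_replica ? std::optional<std::string>(current_value) : std::nullopt,
        std::hash<std::string>{}(current_value),
    };
}
```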
Rafael Ávila de Espíndola
6160b9017d commitlog: make sure a file is closed
If allocate or truncate throws, we have to close the file.

Fixes #4877

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191114174810.49004-1-espindola@scylladb.com>
2019-11-24 11:35:29 +02:00
Vladimir Davydov
3d1d4b018f paxos: remove unnecessary move constructor invocations
invoke_on() guarantees that captured objects won't be destroyed until the
future returned by the invoked function is resolved so there's no need
to move key, token, proposal for calling paxos_state::*_impl helpers.
2019-11-24 11:35:29 +02:00
Rafael Ávila de Espíndola
cfb079b2c9 types: Refactor duplicated value_cast implementation
The two implementations of value_cast were almost identical.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191120181213.111758-3-espindola@scylladb.com>
2019-11-24 11:35:29 +02:00
Vladimir Davydov
ef2e96c47c storage_proxy: factor out helper to sort endpoints by proximity
We need it for PAXOS.
2019-11-24 11:35:29 +02:00
Nadav Har'El
854e6c8d7b alternator-test: test_health_only_works_for_root_path: remove wrong check
The test_health_only_works_for_root_path test checks that while Alternator's
HTTP server responds to a "GET /" request with success ("health check"), it
should respond to different URLs with failures (page not found).

One of the URLs it tested was "/..", but unfortunately some versions of
Python's HTTP client canonicalize this request to just a "/", causing the
request to unexpectedly succeed - and the test to fail.

So this patch just drops the "/.." check. A few other nonsense URLs are
attempted by the test - e.g., "/abc".

Fixes #5321

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-11-24 11:35:29 +02:00
Vladimir Davydov
63d4590336 storage_proxy: move digest_algorithm upper
We need it for PAXOS.

Mark it as static inline while we are at it.
2019-11-24 11:35:29 +02:00
Nadav Har'El
43d3e8adaf alternator: make DescribeTable return table schema
One of the fields still missing in DescribeTable's response (Refs #5026)
was the table's schema - KeySchema and AttributeDefinitions.

This patch adds this missing feature, and enables the previously-xfailing
test test_describe_table_schema.

A complication of this patch is that in a table with secondary indexes,
we need to return not just the base table's schema, but also the indexes'
schema. The existing tests did not cover that feature, so we add here
two more tests in test_gsi.py for that.

One of these secondary-index schema tests, test_gsi_2_describe_table_schema,
still fails, because it outputs a range-key which Scylla added to a view
because of its own implementation needs, but wasn't in the user's
definition of the GSI. I opened a separate issue #5320 for that.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-11-24 11:35:29 +02:00
Vladimir Davydov
f5c2a23118 serializer: add reference_wrapper handling
Serialize reference_wrapper<T> as T and make sure is_equivalent<> treats
reference_wrapper<T> wrapped in std::optional<> or std::variant<>, or
std::tuple<> as T.

We need it to avoid copying query::result while serializing
paxos::promise.
2019-11-24 11:35:29 +02:00
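The idea — serialize reference_wrapper&lt;T&gt; exactly as T so large payloads can be referenced rather than copied into the message — can be sketched generically (toy serializer for illustration, not the actual IDL code):

```cpp
#include <functional>
#include <string>

// Toy "serializer": a real one would write to an output stream.
std::string serialize(const std::string& v) {
    return v;
}

// reference_wrapper<T> serializes as the referee, so callers can pass
// std::ref(big_object) and avoid a copy while building the message.
template <typename T>
std::string serialize(std::reference_wrapper<T> ref) {
    return serialize(ref.get());
}
```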
Botond Dénes
89f9b89a89 scylla-gdb.py: scylla task_histogram: scan all tasks with -a or -s 0
Currently even if `-a` or `-s 0` is provided, `scylla task_histogram`
will scan a limited number of pages due to a bug in the scan loop's stop
condition, which triggers a stop once the default sample limit is
reached. Fix the loop by skipping this check when the user wants to scan
all tasks.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191121141706.29476-1-bdenes@scylladb.com>
2019-11-24 11:35:29 +02:00
Vladimir Davydov
1452653fbc query_context: fix use after free of timeout_config in execute_cql_with_timeout
timeout_config is used by reference by cql3::query_processor::process(),
see cql3::query_options, so the caller must make sure it doesn't go away.
2019-11-24 11:35:29 +02:00
Avi Kivity
ff7e78330c tools: toolchain: dbuild: work around "podman logs --follow" hang
At least some versions of 'podman logs --follow' hang when the
container eventually exits (also happens with docker on recent
versions). Fortunately, we don't need to use 'podman logs --follow'
and can use the more natural non-detached 'podman run', because
podman does not proxy SIGTERM and instead shuts down the container
when it receives it.

So, to work around the problem, use the same code path in interactive
and non-interactive runs, when podman is in use instead of docker.
2019-11-22 13:59:05 +02:00
Avi Kivity
702834d0e4 tools: dbuild: avoid uid/gid/selinux hacks when using podman
With docker, we went to considerable lengths to ensure that
access to mounted volume was done using the calling user, including
supplementary groups. This avoids root-owned files being left around
after a build, and ensures that access to group-shared files (like
/var/cache/ccache) works as expected.

All of this is unnecessary and broken when using podman. Podman
uses a proxy to access files on behalf of the container, so naturally
all access is done using the calling user's identity. Since it remaps
user and group IDs, assigning the host uid/gid is meaningless. Using
--userns host also breaks, because sudo no longer works.

Fix this by making all the uid/gid/selinux games specific to docker and
ignore them when using podman. To preserve the functionality of tools
that depend on $HOME, set that according to the host setting.
2019-11-22 13:58:29 +02:00
Tomasz Grabiec
9d7f8f18ab database: Avoid OOMing with flush continuations after failed memtable flush
The original fix (10f6b125c8) didn't
take into account that there can be a failed memtable flush whose
memtable is not flushable because it's not the latest in
the memtable list. If that happens, it means no other memtable is
flushable either, because otherwise it would be picked due to
evictable_occupancy(). Therefore the right action is to not flush
anything in this case.

Suspected to be observed in #4982. I didn't manage to reproduce after
triggering a failed memtable flush.

Fixes #3717
2019-11-22 12:08:36 +01:00
Tomasz Grabiec
fb28543116 lsa: Introduce operator bool() to occupancy_stats 2019-11-22 12:08:28 +01:00
Tomasz Grabiec
a69fda819c lsa: Expose region_impl::evictable_occupancy in the region class 2019-11-22 12:08:10 +01:00
Avi Kivity
1c181c1b85 tools: dbuild: don't mount duplicate volumes
podman refuses to start with duplicate volumes, which routinely
happen if the toplevel directory is the working directory. Detect
this and avoid the duplicate.
2019-11-22 10:13:30 +02:00
Konstantin Osipov
b8b5834cf1 test.py: simplify message output in run_test() 2019-11-21 23:16:22 +03:00
Konstantin Osipov
90a8f79d7e test.py: use UnitTest class where possible 2019-11-21 23:16:22 +03:00
Konstantin Osipov
8cd8cfc307 test.py: rename harness command line arguments to 'options'
The UnitTest class juggles the name 'args' quite a bit to
construct the command line for a unit test, so let's keep
the harness command line arguments and the unit test command line
arguments apart by consistently calling the harness command line
arguments 'options', and unit test command line arguments 'args'.

Rename usage() to parse_cmd_line().
2019-11-21 23:16:22 +03:00
Konstantin Osipov
e5d624d055 test.py: consolidate argument handling in UnitTest constructor
Create unique UnitTest objects in find_tests() for each found match,
including repeat, to ensure each test has its own unique id.
This will also be used to store execution state in the test.
2019-11-21 23:16:22 +03:00
Konstantin Osipov
dd60673cef test.py: move --collectd to standard args 2019-11-21 23:16:22 +03:00
Konstantin Osipov
fe12f73d7f test.py: introduce class UnitTest 2019-11-21 23:16:22 +03:00
Konstantin Osipov
bbcdee37f7 test.py: add add_test_list() to find_tests() 2019-11-21 23:16:22 +03:00
Konstantin Osipov
4723afa09c test.py: add long tests with add_test() 2019-11-21 23:16:22 +03:00
Konstantin Osipov
13f1e2abc6 test.py: store the non-default seastar arguments along with definition 2019-11-21 23:16:22 +03:00
Konstantin Osipov
72ef11eb79 test.py: introduce add_test() to find_tests()
To avoid code duplication, and to build upon later.
2019-11-21 23:16:22 +03:00
Konstantin Osipov
b50b24a8a7 test.py: avoid an unnecessary loop in find_tests() 2019-11-21 23:16:22 +03:00
Konstantin Osipov
a5103d0092 test.py: move args.repeat processing to find_tests()
It somewhat stands in the way of using asyncio

This patch also implements a more comprehensive
fix for #5303, since we not only have --repeat, but
run some tests in different configurations, in which
case xml output is also overwritten.
2019-11-21 23:16:22 +03:00
Konstantin Osipov
0f0a49b811 test.py: introduce print_summary() and write_xunit_report()
(One more moving of the code around).
2019-11-21 23:16:22 +03:00
Konstantin Osipov
22166771ef test.py: rename test_to_run tests_to_run 2019-11-21 23:16:22 +03:00
Konstantin Osipov
1d94d9827e test.py: introduce run_all_tests() 2019-11-21 23:16:22 +03:00
Konstantin Osipov
29087e1349 test.py: move out run_test() routine
(Trivial code refactoring.)
2019-11-21 23:16:22 +03:00
Konstantin Osipov
79506fc5ab test.py: introduce find_tests()
Trivial code refactoring.
2019-11-21 23:16:22 +03:00
Konstantin Osipov
a44a1c4124 test.py: remove print_status_succint
(Trivial code cleanup.)
2019-11-21 23:16:22 +03:00
Konstantin Osipov
b9605c1d37 test.py: move mode list evaluation to usage() 2019-11-21 23:16:22 +03:00
Konstantin Osipov
0c4df5a548 test.py: add usage() 2019-11-21 23:16:22 +03:00
Pavel Emelyanov
e0f40ed16a cli: Add the --workdir|-W option
When starting the scylla daemon as non-root, initialization fails
because the standard /var/lib/scylla is not accessible to regular users.
Making the default dir accessible to the user is not very convenient
either, as it will cause conflicts if two or more instances of scylla
are in use.

This problem can be resolved by specifying --commitlog-directory,
--data-file-directories, etc on start, but it's too much typing. I
propose to revive Nadav's --home option that allows to move all the
directories under the same prefix in one go.

Unlike Nadav's approach the --workdir option doesn't do any tricky
manipulations with existing directories. Instead, as Pekka suggested,
the individual directories are placed under the workdir if and only
if the respective option is NOT provided. Otherwise the directory
configuration is taken as is, regardless of whether it is an absolute
or relative path.

The value substitution is done early on start. Avi suggested that
this is unsafe wrt HUP config re-read and that proper paths must be
resolved on the fly, but this patch doesn't address that yet; here's
why.

First of all, the respective options are MustRestart now and the
substitution is done before HUP handler is installed.

Next, commitlog and data_file values are copied on start, so marking
the options as LiveUpdate won't have any effect.

Finally, the existing named_value::operator() returns a reference,
so returning a calculated (and thus temporary) value is not possible
(from my current understanding, correct me if I'm wrong). Thus if we
want the *_directory() to return calculated value all callers of them
must be patched to call something different (e.g. *_directory.get() ?)
which will lead to more confusion and errors.

Changes v3:
 - the option is --workdir back again
 - the existing *directory are only affected if unset
 - default config doesn't have any of these set
 - added the short -W alias

Changes v2:
 - the option is --home now
 - all other paths are changed to be relative

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20191119130059.18066-1-xemul@scylladb.com>
2019-11-21 15:07:39 +02:00
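The substitution rule described above — a directory option falls back under the workdir only when unset — in a minimal sketch (hypothetical helper, not the actual config code):

```cpp
#include <optional>
#include <string>

// If the specific directory option was provided, it wins as-is (absolute or
// relative); otherwise the default subdirectory is placed under the workdir.
std::string resolve_dir(const std::optional<std::string>& configured,
                        const std::string& workdir,
                        const std::string& default_subdir) {
    return configured ? *configured : workdir + "/" + default_subdir;
}
```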
Rafael Ávila de Espíndola
5417c5356b types: Move get_castas_fctn to cql3
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191120181213.111758-9-espindola@scylladb.com>
2019-11-21 12:08:50 +02:00
Rafael Ávila de Espíndola
f06d6df4df types: Simplify casts to string
These now just use the to_string member functions, which makes it
possible to move the code to another file.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191120181213.111758-8-espindola@scylladb.com>
2019-11-21 12:08:50 +02:00
Rafael Ávila de Espíndola
786b1ec364 types: Move json code to its own file
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191120181213.111758-7-espindola@scylladb.com>
2019-11-21 12:08:49 +02:00
Rafael Ávila de Espíndola
af8e207491 types: Avoid using deserialize_value in json code
This makes it independent of internal functions and makes it possible
to move it to another file.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191120181213.111758-6-espindola@scylladb.com>
2019-11-21 12:08:49 +02:00
Rafael Ávila de Espíndola
ed65e2c848 types: Move cql3_kind to the cql3 directory
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191120181213.111758-5-espindola@scylladb.com>
2019-11-21 12:08:47 +02:00
Rafael Ávila de Espíndola
bd560e5520 types: Fix dynamic types of some data_value objects
I found these mismatched types while converting some member functions
to standalone functions, since they have to use the public API that
has more type checks.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191120181213.111758-4-espindola@scylladb.com>
2019-11-21 12:08:46 +02:00
Rafael Ávila de Espíndola
0d953d8a35 types: Add a test for value_cast
We had no tests on when value_cast throws or when it moves the value.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191120181213.111758-2-espindola@scylladb.com>
2019-11-21 12:08:45 +02:00
Konstantin Osipov
002ff51053 lua: make sure the latest master builds on Debian/Ubuntu
Use pkg-config to search for Lua dependencies rather
than hard-code include and link paths.

Avoid using boost internals, not present in earlier
versions of boost.

Reviewed-by: Rafael Avila de Espindola <espindola@scylladb.com>
Message-Id: <20191120170005.49649-1-kostja@scylladb.com>
2019-11-21 07:57:12 +02:00
Pavel Solodovnikov
d910899d61 configure.py: support multi-threaded linking via gold
Use `-Wl,--threads` flag to enable multi-threaded linking when
using `ld.gold` linker.

Additional compilation test is required because it depends on whether
or not the `gold` linker has been compiled with `--enable-threads` option.

This patch introduces a substantial improvement to the link times of
the `scylla` binary in release and debug modes (around 30 percent).

Local setup reports the following numbers with release build for
linking only build/release/scylla:

Single-threaded mode:
        Elapsed (wall clock) time (h:mm:ss or m:ss): 1:09.30
Multi-threaded mode:
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:51.57

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20191120163922.21462-1-pa.solodovnikov@scylladb.com>
2019-11-20 19:28:00 +02:00
Nadav Har'El
89d6d668cb Merge "Redis API in Scylla"
Merged patch series from Peng Jian, adding optionally-enabled Redis API
support to Scylla. This feature is experimental, and partial - the extent
of this support is detailed in docs/redis/redis.md.

Patches:
   Document: add docs/redis/redis.md
   redis: Redis API in Scylla
   Redis API: graft redis module to Scylla
   redis-test: add test cases for Redis API
2019-11-20 16:59:13 +02:00
Piotr Sarna
086e744f8f scripts/find-maintainer: refresh maintainers list
This commit attempts to make the maintainers list up-to-date
to the best of my knowledge, because it got really stale over time.

Message-Id: <eab6d3f481712907eb83e91ed2b8dbfa0872155f.1574261533.git.sarna@scylladb.com>
2019-11-20 16:56:31 +02:00
Glauber Costa
73aff1fc95 api: export system uptime via REST
This will be useful for tools like nodetool that want to query the uptime
of the system.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190619110850.14206-1-glauber@scylladb.com>
2019-11-20 16:44:11 +02:00
Tomasz Grabiec
9a686ac551 Merge "scylla-gdb: active sstables: support k_l/mc sstable readers" from Benny
Fixes #5277
2019-11-19 23:49:39 +01:00
Avi Kivity
1164ff5329 tools: toolchain: update to Fedora 31
This is a minor update as gcc and boost versions do not change.

glibc-langpack-en no longer gets pulled in by default. As it is required
by some locale use somewhere, it is added to the explicit dependencies.
2019-11-20 00:08:30 +02:00
Avi Kivity
301c835cbf build: force xz compression on rpm binary payload
Fedora 31 switched the default compression to zstd, which isn't readable
by some older rpm distributions (CentOS 7 in particular). Tell it to use
the older xz compression instead, so packages produced on Fedora 31 can
be installed on older distributions.
2019-11-20 00:08:24 +02:00
Avi Kivity
3ebd68ef8a dist: rpm: correct systemd post-uninstall scriptlet
The post-uninstall scriptlet requires a parameter, but older versions
of rpm survived without it. Fedora 31's rpm is more strict, so supply
this parameter.
2019-11-20 00:03:49 +02:00
Peng Jian
e6adddd8ef redis-test: add test cases for Redis API
Signed-off-by: Peng Jian <pengjian.uestc@gmail.com>
Signed-off-by: Amos Kong <amos@scylladb.com>
2019-11-20 04:56:16 +08:00
Peng Jian
f2801feb66 Redis API: graft redis module to Scylla
In this document, the detailed design and implementation of Redis API in
Scylla is provided.

v2: build: work around ragel 7 generated code bug (suggested by Avi)
    Ragel 7 incorrectly emits some unused variables that don't compile.
    As a workaround, sed them away.

Signed-off-by: Peng Jian <pengjian.uestc@gmail.com>
Signed-off-by: Amos Kong <amos@scylladb.com>
2019-11-20 04:55:58 +08:00
Peng Jian
0737d9e84d redis: Redis API in Scylla
Scylla has many advantages and amazing features. If Redis is built on top of
Scylla, it gains those features automatically: Scylla has achieved great
progress in cluster master management, data persistence, failover and
replication.

The benefit to users is that Redis becomes easy to use and develop against
in their production environment, while taking advantage of Scylla.

Using Ragel to parse the Redis request, the server obtains the command name
and the parameters from the request, invokes Scylla's internal API to
read and write the data, and then replies to the client.

Signed-off-by: Peng Jian <pengjian.uestc@gmail.com>
2019-11-20 04:55:56 +08:00
Peng Jian
708a42c284 Document: add docs/redis/redis.md
In this document, the detailed design and implementation of Redis API in
Scylla is provided.

Signed-off-by: Peng Jian <pengjian.uestc@gmail.com>
2019-11-20 04:46:33 +08:00
Nadav Har'El
9b9609c65b merge: row_marker: correct row expiry condition
Merged patch set by Piotr Dulikowski:

This change corrects condition on which a row was considered expired by its
TTL.

The logic that decides when a row becomes expired was inconsistent with the
logic that decides if a single cell is expired. A single cell becomes expired
when expiry_timestamp <= now, while a row became expired when
expiry_timestamp < now (notice the strict inequality). For rows inserted
with TTL, this caused non-key cells to expire (change their values to null)
one second before the row disappeared. Now, row expiry logic uses non-strict
inequality.

Fixes #4263,
Fixes #5290.

Tests:

    unit(dev)
    python test described in issue #5290
2019-11-19 18:14:15 +02:00
Amnon Heiman
9df10e2d4b scylla_util.py: Add optional timeout to out function
It is useful to have an option to limit the execution time of a shell
script.

This patch adds an optional timeout parameter: if it is provided, the
command returns a failure once the given duration has elapsed.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2019-11-19 17:30:28 +02:00
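A minimal sketch of threading such a timeout through a shell helper with the standard library; the function name `out` follows the commit, but the signature and behaviour here are an assumption, not scylla_util's actual implementation:

```python
import subprocess

def out(cmd, timeout=None):
    """Run a shell command and return its stripped stdout; if `timeout`
    (in seconds) is given, fail once it elapses.  Illustrative sketch only."""
    # subprocess.run raises subprocess.TimeoutExpired when the deadline passes
    res = subprocess.run(cmd, shell=True, capture_output=True,
                         text=True, timeout=timeout)
    return res.stdout.strip()

print(out("echo hello", timeout=5))
```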
Nadav Har'El
b38c3f1288 Merge "Add separate counters for accesses to system tables"
Merged patch series from Juliusz Stasiewicz:

Welcome to my first PR to Scylla!
The task was intended as a warm-up ("noob") exercise; its description is
here: #4182. Sorry, I also couldn't help it and did some scouting: I edited
the descriptions of some metrics and shortened a few annoyingly long LoC.
2019-11-19 15:21:56 +02:00
Piotr Dulikowski
9be842d3d8 row_marker: tests for row expiration 2019-11-19 13:45:30 +01:00
Tomasz Grabiec
5e4abd75cc main: Abort on EBADF and ENOTSOCK by default
Those are typically symptoms of use-after-free or memory corruption in
the program. It's better to catch such errors sooner rather than later.

That situation is also dangerous: if a valid descriptor happens to land
under the invalid access, instead of the one intended for the operation,
then the operation may be performed on the wrong file and result in
corruption.

Message-Id: <1565206788-31254-1-git-send-email-tgrabiec@scylladb.com>
2019-11-19 13:07:33 +02:00
Piotr Dulikowski
589313a110 row_marker: correct expiration condition
This change corrects condition on which a row was considered expired by
its TTL.

The logic that decides when a row becomes expired was inconsistent with
the logic that decides if a single cell is expired. A single cell
becomes expired when `expiry_timestamp <= now`, while a row became
expired when `expiry_timestamp < now` (notice the strict inequality).
For rows inserted with TTL, this caused non-key cells to expire (change
their values to null) one second before the row disappeared. Now, row
expiry logic uses non-strict inequality.

Fixes: #4263, #5290.

Tests:
- unit(dev)
- python test described in issue #5290
2019-11-19 11:46:59 +01:00
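The boundary described above can be sketched in a few lines; this is a toy illustration of the inequality change, not Scylla's row_marker code:

```python
def cell_expired(expiry_ts, now):
    return expiry_ts <= now          # cells have always used non-strict <=

def row_expired(expiry_ts, now, strict=False):
    # before the fix, rows used strict `<`; now they agree with cells
    return expiry_ts < now if strict else expiry_ts <= now

now = 1000
# At the exact expiry second the old logic kept the row alive while its
# cells had already expired -- the one-second null window from the issue.
assert cell_expired(1000, now) and not row_expired(1000, now, strict=True)
assert row_expired(1000, now)    # fixed behaviour matches the cells
```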
Pekka Enberg
505f2c1008 test.py: Append test repeat cycle to output XML filename
Currently, we overwrite the same XML output file for each test repeat
cycle. This can cause invalid XML to be generated if the XML contents
don't match exactly for every iteration.

Fix the problem by appending the test repeat cycle in the XML filename
as follows:

  $ ./test.py --repeat 3 --name vint_serialization_test --mode dev --jenkins jenkins_test

  $ ls -1 *.xml
  jenkins_test.release.vint_serialization_test.0.boost.xml
  jenkins_test.release.vint_serialization_test.1.boost.xml
  jenkins_test.release.vint_serialization_test.2.boost.xml


Fixes #5303.

Message-Id: <20191119092048.16419-1-penberg@scylladb.com>
2019-11-19 11:30:47 +02:00
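The naming scheme shown above can be sketched as a small helper; the function name is illustrative, not test.py's actual code:

```python
def xml_output_name(prefix, mode, test_name, cycle):
    # Append the repeat cycle so each iteration writes to a distinct file,
    # avoiding overwrites that could produce invalid merged XML.
    return "{}.{}.{}.{}.boost.xml".format(prefix, mode, test_name, cycle)

names = [xml_output_name("jenkins_test", "release", "vint_serialization_test", i)
         for i in range(3)]
print("\n".join(names))
```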
Rafael Ávila de Espíndola
750adee6e3 lua: fix build with boost 1.67 and older vs fmt
It is not completely clear why the fmt base code fails with boost
1.67, but it is easy to avoid.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191118210540.129603-1-espindola@scylladb.com>
2019-11-19 11:14:00 +02:00
Tomasz Grabiec
ff567649fa Merge "gossip: Limit number of pending gossip ACK and ACK2 messages" from Asias
In a cross-dc large cluster, the receiver node of the gossip SYN message
might be slow to send the gossip ACK message. The ack messages can be
large if the payload of the application state is big, e.g.,
CACHE_HITRATES with a lot of tables. As a result, the unbounded ACK
messages can consume an unlimited amount of memory, which eventually
causes OOM.

To fix, this patch queues the SYN message and handles it later if the
previous ACK message is still being sent. However, we only store the
latest SYN message. Since the latest SYN message from a peer carries the
latest information, it is safe to drop the previous SYN message and
keep only the latest one. After this patch, there can be at most 1
pending SYN message and 1 pending ACK message per peer node.
2019-11-18 10:52:38 +01:00
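The bounded-queue idea above can be sketched as follows; class and method names are illustrative only, not Scylla's gossip code:

```python
class PendingGossip:
    """Keep at most one pending SYN per peer: a newer SYN supersedes the
    older one, so memory stays bounded no matter how slowly ACKs drain."""

    def __init__(self):
        self._pending_syn = {}

    def on_syn(self, peer, syn, ack_in_flight):
        if not ack_in_flight:
            return syn                 # nothing queued: handle immediately
        self._pending_syn[peer] = syn  # overwrite: only the latest survives
        return None

    def take_pending(self, peer):
        # called once the previous ACK finishes sending
        return self._pending_syn.pop(peer, None)
```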
Benny Halevy
f9e93bba38 sstables: compaction: move cleanup parameter to compaction_descriptor
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191117165806.3234-1-bhalevy@scylladb.com>
2019-11-18 10:52:20 +01:00
Avi Kivity
1fe062aed4 Merge "Add basic UDF support" from Rafael
"

This patch series adds only UDF support, UDA will be in the next patch series.

With this all CQL types are mapped to Lua. Right now we setup a new
lua state and copy the values for each argument and return. This will
be optimized once profiled.

We require --experimental to enable UDF in case there is some change
to the table format.
"

* 'espindola/udf-only-v4' of https://github.com/espindola/scylla: (65 commits)
  Lua: Document the conversions between Lua and CQL
  Lua: Implement decimal subtraction
  Lua: Implement decimal addition
  Lua: Implement support for returning decimal
  Lua: Implement decimal to string conversion
  Lua: Implement decimal to floating point conversion
  Lua: Implement support for decimal arguments
  Lua: Implement support for returning varint
  Lua: Implement support for returning duration
  Lua: Implement support for duration arguments
  Lua: Implement support for returning inet
  Lua: Implement support for inet arguments
  Lua: Implement support for returning time
  Lua: Implement support for time arguments
  Lua: Implement support for returning timeuuid
  Lua: Implement support for returning uuid
  Lua: Implement support for uuid and timeuuid arguments
  Lua: Implement support for returning date
  Lua: Implement support for date arguments
  Lua: Implement support for returning timestamp
  ...
2019-11-17 16:38:19 +02:00
Konstantin Osipov
48f3ca0fcb test.py: use the configured build modes from ninja mode_list
Add mode_list rule to ninja build and use it by default when searching
for tests in test.py.

Now it is no longer necessary to explicitly specify the test mode when
invoking test.py.

(cherry picked from commit a211ff30c7f2de12166d8f6f10d259207b462d4b)
2019-11-17 13:42:10 +01:00
Nadav Har'El
2fb2eb27a2 sstables: allow non-traditional characters in table name
The goal of this patch is to fix issue #5280, a rather serious Alternator
bug, where Scylla fails to restart when an Alternator table has secondary
indexes (LSI or GSI).

Traditionally, Cassandra allows table names to contain only alphanumeric
characters and underscores. However, most of our internal implementation
doesn't actually have this restriction. So Alternator uses the characters
':' and '!' in the table names to mark global and local secondary indexes,
respectively. And this actually works. Or almost...

This patch fixes a problem of listing, during boot, the sstables stored
for tables with such non-traditional names. The sstable listing code
needlessly assumes that the *directory* name, i.e., the CF name, matches
the "\w+" regular expression. When an sstable is found in a directory not
matching such regular expression, the boot fails. But there is no real
reason to require such a strict regular expression. So this patch relaxes
this requirement, and allows Scylla to boot with Alternator's GSI and LSI
tables and their names which include the ":" and "!" characters, and in
fact any other name allowed as a directory name.

Fixes #5280.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191114153811.17386-1-nyh@scylladb.com>
2019-11-17 14:27:47 +02:00
Shlomi Livne
3e873812a4 Document backport queue and procedure (#5282)
This document adds information about how fixes are tracked to be
backported into releases and what is the procedure that is followed to
backport those fixes.

Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
2019-11-17 01:45:24 -08:00
Benny Halevy
c215ad79a9 scylla-gdb: resolve: add startswith parameter
Allow filtering the resolved addresses by a startswith string.

The common use case is resolving vtable ptrs, when the output of
`find_vptrs` may be too large for the memory of the host running
gdb. In this case the number of vtable ptrs is considerably
smaller than the total number of objects returned by find_ptrs
(e.g. 462 vs. 69625 in an OOM core I examined from scylla
--smp=2 --memory=1024M).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-11-17 11:40:54 +02:00
Benny Halevy
2f688dcf08 scylla-gdb.py: find_single_sstable_readers: fix support for sstable_mutation_reader
provide template arguments for k_l and m readers.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-11-17 11:02:05 +02:00
Kamil Braun
a67e887dea sstables: fix sstable file I/O CQL tracing when reading multiple files (#5285)
CQL tracing would only report file I/O involving one sstable, even if
multiple sstables were read from during the query.

Steps to reproduce:

create a table with NullCompactionStrategy
insert row, flush memtables
insert row, flush memtables
restart Scylla
tracing on
select * from table
The trace would only report DMA reads from one of the two sstables.

Kudos to @denesb for catching this.

Related issue: #4908
2019-11-17 00:38:37 -08:00
Tomasz Grabiec
a384d0af76 Merge "A set of cleanups over main() code" from Pavel E.
There are ... signs of massive start/stop code rework in the
main() function. While fixing the sub-module interdependencies
during start/stop I've polished these signs too, so here are the
simplest ones.
2019-11-15 15:25:18 +01:00
Pavel Emelyanov
1dc490c81c tracing: Move register_tracing_keyspace_backend forward decl into proper header
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-11-14 19:59:03 +03:00
Pavel Emelyanov
7e81df71ba main: Shorten developer_mode() evaluation
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-11-14 19:59:03 +03:00
Pavel Emelyanov
1bd68d87fc main: Do not carry pctx all over the code
v2:
- do not use the struct initialization extension

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-11-14 19:59:03 +03:00
Pavel Emelyanov
655b6d0d1e main: Hide start_thrift
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-11-14 19:59:03 +03:00
Pavel Emelyanov
26f2b2ce5e main,db: Kill some unused .hh includes
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-11-14 19:59:03 +03:00
Pavel Emelyanov
f5b345604f main: Factor out get_conf_sub
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-11-14 19:59:03 +03:00
Pavel Emelyanov
924d52573d main: Remove unused return_value variable (and capture)
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2019-11-14 19:59:03 +03:00
Pavel Emelyanov
2195edb819 gitignore: Add tags file
This file is generated by the ctags utility for navigation, so it
is not to be tracked by git.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20191031221339.19030-1-xemul@scylladb.com>
2019-11-14 16:50:11 +01:00
Gleb Natapov
e0668f806a lwt: change format of partition key serialization for system.paxos table
Serialize provided partition_key in such a way that the serialized value
will hash to the same token as the original key. This way when system.paxos
table is updated the update is shard local.

Message-Id: <20191114135449.GU10922@scylladb.com>
2019-11-14 15:07:16 +01:00
Avi Kivity
19b665ea6b Merge "Correctly handle null/unset frozen collection/UDT columns in INSERT JSON." from Kamil
"
When using INSERT JSON with frozen collection/UDT columns, if the columns were left unspecified or set to null, the statement would create an empty non-null value for these columns instead of using null values as it should have. For example:

cqlsh:b> create table t (k text primary key, l frozen<list<int>>, m frozen<map<int, int>>, s frozen<set<int>>, u frozen<ut>);
cqlsh:b> insert into t JSON '{"k": "insert_json"}';
cqlsh:b> select * from t;
 k           | l  | m  | s  | u
-------------+----+----+----+---
 insert_json | [] | {} | {} |

This PR fixes this.
Resolves #5246 and closes #5270.
"

* 'frozen-json' of https://github.com/kbr-/scylla:
  tests: add null/unset frozen collection/UDT INSERT JSON test
  cql3: correctly handle frozen null/unset collection/UDT columns in INSERT JSON
  cql3: decouple execute from term binding in user_type::setter
2019-11-14 15:29:30 +02:00
Avi Kivity
4544aa0b34 Update seastar submodule
* seastar 75e189c6ba...6f0ef32514 (6):
  > Merge "Add named semaphores" from Piotr
  > parallel_for_each_state: pass rvalue reference to add_future
  > future: Pass rvalue to uninitialized_wrapper::uninitialized_set.
  > dependencies: Add libfmt-dev to debian
  > log: Fix logger behavior when logging both to stdout and syslog.
  > README.md: list Scylla among the projects using Seastar
2019-11-14 15:01:18 +02:00
Juliusz Stasiewicz
1cfa458409 metrics: separate counters for `system' KS accesses
Resolves #4182. Metrics per system tables are accumulated separately,
depending on the origin of query (DB internals vs clients).
2019-11-14 13:14:39 +01:00
Vladimir Davydov
ab42b72c6d cql: fix SERIAL consistency check for batch statements
If CONSISTENCY is set to SERIAL or LOCAL SERIAL, all write requests must
fail according to Cassandra's documentation. However, batched writes
bypass this check. Fix this.
2019-11-14 12:15:39 +01:00
Vladimir Davydov
25aeefd6f3 cql: fix CAS consistency level validation
This patch resurrects Cassandra's code validating a consistency level
for CAS requests. Basically, it makes CAS requests use a special
function instead of validate_for_write to make error messages more
coherent.

Note, we don't need to resurrect requireNetworkTopologyStrategy as
EACH_QUORUM should work just fine for both CAS and non-CAS writes.
Looks like it is just an artefact of a rebase in the Cassandra
repository.
2019-11-14 12:15:39 +01:00
Juliusz Stasiewicz
b1e4d222ed cql3: cosmetics - improved description of metrics 2019-11-14 10:35:42 +01:00
Avi Kivity
cd075e9132 reloc: do not install dependencies when building the relocatable package
The dependencies are provided by the frozen toolchain. If a dependency
is missing, we must update the toolchain rather than rely on build-time
installation, which is not reproducible (as different package versions
are available at different times).

Luckily "dnf install" does not update an already-installed package. Had
that been the case, none of our builds would have been reproducible, since
packages would be updated to the latest version as of the build time rather
than the version selected by the frozen toolchain.

So, to prevent missing packages in the frozen toolchain translating to
an unreproducible build, remove the support for installing dependencies
from reloc/build_reloc.sh. We still parse the --nodeps option in case some
script uses it.

Fixes #5222.

Tests: reloc/build_reloc.sh.
2019-11-14 09:37:14 +02:00
Gleb Natapov
552c56633e storage_proxy: do not release mutation if not all replies were received
MV backpressure code frees the mutation for delayed client replies early
to save memory. The commit 2d7c026d6e that
introduced the logic claimed to do it only when all replies are received,
but this was not the case. Fix the code so that it frees the mutation
only when all replies have actually been received.

Fixes #5242

Message-Id: <20191113142117.GA14484@scylladb.com>
2019-11-13 16:23:19 +02:00
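The invariant above can be sketched with a simple reply counter; the class is illustrative, not Scylla's storage_proxy code:

```python
class MutationHolder:
    """Release the mutation only once *every* targeted replica has
    replied, not merely once enough replies arrived for the client."""

    def __init__(self, mutation, total_targets):
        self.mutation = mutation
        self._waiting = total_targets

    def on_reply(self):
        self._waiting -= 1
        if self._waiting == 0:
            self.mutation = None  # safe now: no replica can still need it
        return self._waiting == 0
```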
Raphael S. Carvalho
3e70523111 distributed_loader: Release disk space of SSTables deleted by resharding
Resharding is responsible for scheduling the deletion of the sstables it
replaced, but it was not refreshing the cache of the shards those
sstables belong to, which means the cache incorrectly kept holding
references to them even after they were deleted. The consequence is that
sstables deleted by resharding did not have their disk space freed until
the cache was refreshed by a subsequent procedure that triggers it.

Fixes #5261.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20191107193550.7860-1-raphaelsc@scylladb.com>
2019-11-13 16:03:27 +02:00
Avi Kivity
6aed3b7471 Merge "cql: trivial cleanup" from Vova
* 'cql-trivial-cleanup' of ssh://github.com/scylladb/scylla-dev:
  cql: rename modification_statement::_sets_a_collection to _selects_a_collection
  cql: rename _column_conditions to _regular_conditions
  cql: remove unnecessary optional around prefetch_data
2019-11-13 15:12:10 +02:00
Avi Kivity
1cb9f9bdfe Merge "Use a fixed-size bitset for column set" from Kostja
"
Use a fixed-size, rather than a dynamically growing
bitset for column mask. This avoids unnecessary memory
reallocation in the most common case.
"

* 'column_set' of ssh://github.com/scylladb/scylla-dev:
  schema: pre-allocate the bitset of column_set
  schema: introduce schema::all_columns_count()
  schema: rename column_mask to column_set
2019-11-13 15:08:13 +02:00
Tomasz Grabiec
f68e17eb52 Merge "Partition/row hit/miss counters for memtable write operations" from Piotr D.
Adds per-table metrics for counting partition and row reuse
in memtables. New metrics are as follows:
    - memtable_partition_writes - number of write operations performed
          on partitions in memtables,
    - memtable_partition_hits - number of write operations performed
          on partitions that previously existed in a memtable,
    - memtable_row_writes - number of row write operations performed
          in memtables,
    - memtable_row_hits - number of row write operations that overwrote
          rows previously present in a memtable.

Tests: unit(release)
2019-11-13 13:11:51 +01:00
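How the four counters relate can be sketched on a dict-based toy memtable; all names here are illustrative, not Scylla's table code:

```python
class MemtableStats:
    def __init__(self):
        self.partition_writes = 0
        self.partition_hits = 0
        self.row_writes = 0
        self.row_hits = 0

def apply_write(memtable, stats, pkey, ckey, value):
    stats.partition_writes += 1
    part = memtable.get(pkey)
    if part is not None:
        stats.partition_hits += 1   # partition already existed
    else:
        part = memtable[pkey] = {}
    stats.row_writes += 1
    if ckey in part:
        stats.row_hits += 1         # overwriting a row already present
    part[ckey] = value
```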
Juliusz Stasiewicz
8318a6720a cql3: error msg w/ arg counts for prepared stmts with wrong arg cnt
Fixes #3748. Very small change: added argument count (expectation vs. reality)
to error msg within `invalid_request_exception'.
2019-11-13 13:43:37 +02:00
Nadav Har'El
ccb9038c69 alternator: Implement Expected operators LT and GT
Merged patch series from Dejan Mircevski. Implements the "LT" and "GT"
operators of the Expected update option (i.e., conditional updates),
and enables the pre-existing tests for them.
2019-11-13 12:07:44 +02:00
Konstantin Osipov
6159c012db schema: pre-allocate the bitset of column_set
The number of columns is usually small, and avoiding
a resize speeds up bit manipulation functions.
2019-11-13 11:41:51 +03:00
Konstantin Osipov
e95d675567 schema: introduce schema::all_columns_count()
schema::all_columns_count() will be used to reserve
memory of the column_set bitmask.
2019-11-13 11:41:42 +03:00
Konstantin Osipov
191acec7ab schema: rename column_mask to column_set
Since it contains a precise set of columns, it's more
accurate to call it a set, not a mask. Besides, the name
column_mask is already used for column options on storage
level.
2019-11-13 11:41:30 +03:00
Kamil Braun
d6446e352e tests: add null/unset frozen collection/UDT INSERT JSON test
When using INSERT JSON with null/unspecified frozen collection/UDT
columns, the columns should be set to null.

See #5270.
2019-11-12 18:24:47 +01:00
Vladimir Davydov
8110178e5d cql: rename modification_statement::_sets_a_collection to _selects_a_collection
This is merely to avoid confusion: we use _sets prefix to indicate that
there are operations over static/regular columns (_sets_static_columns,
_sets_regular_columns), but _sets_a_collection is set for both operations
and conditions. So let's rename it to _selects_a_collection and add some
comments.
2019-11-12 20:15:42 +03:00
Vladimir Davydov
a19192950e cql: rename _column_conditions to _regular_conditions
It's weird that modification_statement has _static_conditions for
conditions on static columns and _column_conditions for conditions on
regular columns, as if conditions on static columns are not column
conditions. Let's rename _column_conditions to _regular_conditions to
avoid confusion.
2019-11-12 20:15:35 +03:00
Konstantin Osipov
0ad0369684 cql: remove unnecessary optional around prefetch_data 2019-11-12 20:15:24 +03:00
Kamil Braun
6c04c5bed5 cql3: correctly handle frozen null/unset collection/UDT columns in INSERT JSON
Before this commit, an empty non-null value was created for
frozen collection/UDT columns when an INSERT JSON statement was executed
with the value left unspecified or set to null.
This was incompatible with Cassandra which inserted a null (dead cell).

Fixes #5270.
2019-11-12 18:05:01 +01:00
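The fixed mapping can be sketched in plain Python: a field that is missing from the JSON document, or explicitly null, becomes None (a null cell) rather than an empty collection. The helper name is illustrative:

```python
import json

def row_from_json(doc, columns):
    parsed = json.loads(doc)
    # dict.get returns None for missing keys, and json maps null to None,
    # so unspecified/null columns never become [] or {}.
    return {col: parsed.get(col) for col in columns}

row = row_from_json('{"k": "insert_json"}', ["k", "l", "m", "s", "u"])
```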
Kamil Braun
0ad7d71f31 cql3: decouple execute from term binding in user_type::setter
This commit makes it possible to pass a bound value terminal
directly to the setter.
Continuation of commit bfe3c20035.
2019-11-12 18:02:21 +01:00
Takuya ASADA
614ec6fc35 install.sh: drop --pkg option, use .install file on .deb package
The --pkg option on install.sh was introduced for .deb packaging, since it
requires a different install directory for each subpackage.
But we are actually able to use "debian/tmp" as a shared install directory,
and then we can specify the file ownership of each package using .install files.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20191030203142.31743-1-syuu@scylladb.com>
2019-11-12 16:50:37 +02:00
Piotr Dulikowski
59fbbb993f memtables: add partition/row hit/miss counters
Adds per-table metrics for counting partition and row reuse
in memtables. New metrics are as follows:
    - memtable_partition_writes - number of write operations performed
          on partitions in memtables,
    - memtable_partition_hits - number of write operations performed
          on partitions that previously existed in a memtable,
    - memtable_row_writes - number of row write operations performed
          in memtables,
    - memtable_row_hits - number of row write operations that overwrote
          rows previously present in a memtable.

Tests: unit(release)
2019-11-12 13:35:41 +01:00
Piotr Dulikowski
48f7b2e4fb table: move out table::stats to table_stats
This change was done in order to be able to forward-declare
the table::stats structure.
2019-11-12 13:35:41 +01:00
Avi Kivity
cf7291462d Merge "cql3/functions: add missing min/max/count functions for ascii type" from Piotr
"
Adds missing overloads of functions count, min, max for type ascii.

Now they work:

cqlsh> CREATE KEYSPACE ks WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
cqlsh> USE ks;
cqlsh:ks> CREATE TABLE test_ascii (id int PRIMARY KEY, value ascii);
cqlsh:ks> INSERT INTO test_ascii (id, value) VALUES (0, 'abcd');
cqlsh:ks> INSERT INTO test_ascii (id, value) VALUES (1, 'efgh');
cqlsh:ks> INSERT INTO test_ascii (id, value) VALUES (2, 'ijkl');
cqlsh:ks> SELECT * FROM test_ascii;

 id | value
----+-------
  1 |  efgh
  0 |  abcd
  2 |  ijkl

(3 rows)
cqlsh:ks> SELECT count(value) FROM test_ascii;

 system.count(value)
---------------------
                   3

(1 rows)
cqlsh:ks> SELECT min(value) FROM test_ascii;

 system.min(value)
-------------------
              abcd

(1 rows)
cqlsh:ks> SELECT max(value) FROM test_ascii;

 system.max(value)
-------------------
              ijkl

(1 rows)
Tests:

unit(release)
cql_group_functions_tests.py (with added check for ascii type)

Fixes #5147.
"

* '5147-fix-min-max-count-for-ascii' of https://github.com/piodul/scylla:
  tests/cql_query_test: add aggregate functions test
  cql3/functions: add missing min/max/count for ascii
2019-11-12 14:15:14 +02:00
Piotr Dulikowski
41cb16a526 tests/cql_query_test: add aggregate functions test
Adds a test for min, max and avg functions for those primitive types for
which those functions are working at the moment.
2019-11-12 13:01:34 +01:00
Piotr Dulikowski
6d78d7cc69 cql3/functions: add missing min/max/count for ascii
Adds missing overloads of functions `count`, `min`, `max` for
type `ascii`. Now they work:

cqlsh> CREATE KEYSPACE ks WITH replication = {'class': 'SimpleStrategy',
'replication_factor': 1};
cqlsh> USE ks;
cqlsh:ks> CREATE TABLE test_ascii (id int PRIMARY KEY, value ascii);
cqlsh:ks> INSERT INTO test_ascii (id, value) VALUES (0, 'abcd');
cqlsh:ks> INSERT INTO test_ascii (id, value) VALUES (1, 'efgh');
cqlsh:ks> INSERT INTO test_ascii (id, value) VALUES (2, 'ijkl');
cqlsh:ks> SELECT * FROM test_ascii;

 id | value
----+-------
  1 |  efgh
  0 |  abcd
  2 |  ijkl

(3 rows)
cqlsh:ks> SELECT count(value) FROM test_ascii;

 system.count(value)
---------------------
                   3

(1 rows)
cqlsh:ks> SELECT min(value) FROM test_ascii;

 system.min(value)
-------------------
              abcd

(1 rows)
cqlsh:ks> SELECT max(value) FROM test_ascii;

 system.max(value)
-------------------
              ijkl

(1 rows)

Tests:
- unit(release)
- cql_group_functions_tests.py (with added check for `ascii` type)

Fixes #5147.
2019-11-12 13:01:34 +01:00
Rafael Ávila de Espíndola
10bcbaf348 Lua: Document the conversions between Lua and CQL
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
6ffddeae5e Lua: Implement decimal subtraction
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
aba8e531d1 Lua: Implement decimal addition
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
bb84eabbb3 Lua: Implement support for returning decimal
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
bc17312a86 Lua: Implement decimal to string conversion
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
e83d5bf375 Lua: Implement decimal to floating point conversion
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
b568bf4f54 Lua: Implement support for decimal arguments
This is just the minimum to pass a value to Lua. Right now you can't
actually do anything with it.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
6c3f050eb4 Lua: Implement support for returning varint
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
dc377abd68 Lua: Implement support for returning duration
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
c3f021d2e4 Lua: Implement support for duration arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
9208b2f498 Lua: Implement support for returning inet
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
64be94ab01 Lua: Implement support for inet arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
faf029d472 Lua: Implement support for returning time
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
772f2a4982 Lua: Implement support for time arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
484f498534 Lua: Implement support for returning timeuuid
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
9c2daf6554 Lua: Implement support for returning uuid
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
ae1a1a4085 Lua: Implement support for uuid and timeuuid arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
f8aeed5beb Lua: Implement support for returning date
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
384effa54b Lua: Implement support for date arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
63bc960152 Lua: Implement support for returning timestamp
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
ee95756f62 Lua: Implement support for timestamp arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
1c6d5507b4 Lua: Implement support for returning counter
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
0d9d53b5da Lua: Implement support for counter arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
74c4e58b6b Lua: Add a test for nested types.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
b226511ce8 Lua: Implement support for returning maps
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
5c8d1a797f Lua: Implement support for map arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
b5b15ce4e6 Lua: Implement support for returning set
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
cf7ba441e4 Lua: Implement support for set arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
02f076be43 Lua: Implement support for returning udt
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
92c8e94d9a Lua: Implement support for udt arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
a7c3f6f297 Lua: Implement support for returning list
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
688736f5ff Lua: Implement support for returning tuple
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
ab5708a711 Lua: Implement support for list and tuple arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
534f29172c Lua: Implement support for returning boolean
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
b03c580493 Lua: Implement support for boolean arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
dcfe397eb6 Lua: Implement support for returning floating point
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
cf4b7ab39a Lua: Implement support for returning blob
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
3d22433cd4 Lua: Implement support for blob arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
dd754fcf01 Lua: Implement support for returning ascii
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
affb1f8efd Lua: Implement support for returning text
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
f8ed347ee7 Lua: Implement support for string arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
0e4f047113 Lua: Implement a visitor for return values
This adds support for all integer types. Followup commits will
implement the missing types.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
34b770e2fb Lua: Push varint as decimal
This makes it substantially simpler to support both varint and
decimal, which will be implemented in a followup patch.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
9b3cab8865 Lua: Implement support for varint to integer conversion
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
5a40264d97 Lua: Implement support for varint arguments
Right now it is not possible to do anything with the value.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
3230b8bd86 Lua: Implement support for floating point arguments
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
9ad2cc2850 Lua: Implement a visitor for arguments
With this we support all simple integer types. Followup patches will
implement the missing types.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
ee1d87a600 Lua: Plug in the interpreter
This adds a wrapper around the Lua interpreter so that function
executions are interruptible and return futures.

With this patch it is possible to write and use simple UDFs that take
and return integer values.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
bc3bba1064 Lua: Add lua.cc and lua.hh skeleton files
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
7015e219ca Lua: Link with liblua
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
61200ebb04 Lua: Add config options
This patch just adds the config options that we will expose for the
lua runtime.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
d9337152f3 Use threads when executing user functions
This adds a requires_thread predicate to functions and propagates that
up until we get to code that already returns futures.

We can then use the predicate to decide if we need to use
seastar::async.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
52b48b415c Test that schema digests with UDFs don't change
This refactors test_schema_digest_does_not_change to also test a
schema with user defined functions and user defined aggregates.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
fc72a64c67 Add schema propagation and storage for UDF
With this it is possible to create user defined functions and
aggregates and they are saved to disk and the schema change is
propagated.

It is just not possible to call them yet.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola
ce6304d920 UDF: Add a feature and config option to track if udf is enabled
It can only be enabled with --experimental.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:40:47 -08:00
Rafael Ávila de Espíndola
dd17dfcbef Reject "OR REPLACE ... IF NOT EXISTS" in the grammar
The parser now rejects having both OR REPLACE and IF NOT EXISTS in the
same statement.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola
e7e3dab4aa Convert UDF parsing code to c++
For now this just constructs the corresponding c++ classes.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola
5c45f3b573 Update UDF syntax
This updates UDF syntax to the current specification.

In particular, this removes DETERMINISTIC and adds "CALLED ON NULL
INPUT" and "RETURNS NULL ON NULL INPUT".

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola
c75cd5989c transport: Add support for FUNCTION and AGGREGATE to schema_change
While at it, modernize the code a bit and add a test.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola
dac3cf5059 Clear functions between cql_test_env runs
At some point we should make the function list non static, but this
allows us to write tests for now.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola
de1a970b93 cql: convert functions to add, remove and replace functions
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola
33f9d196f9 Add iterator version of functions::find
This avoids allocating a std::vector and is more flexible since the
iterator can be passed to erase.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola
7f9dadee5c Implement functions::type_equals.
Since the types are uniqued we can just use ==.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:19:52 -08:00
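The "uniqued types" argument above can be sketched outside Scylla's codebase. The names below (`intern`, `type_info`, `type_ptr`) are illustrative, not Scylla's actual API; the point is only that once every type name maps to exactly one shared instance, type equality reduces to pointer comparison.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical interning registry: every distinct type name gets exactly
// one shared instance, so two type_ptrs naming the same type are always
// the same pointer.
struct type_info { std::string name; };
using type_ptr = std::shared_ptr<const type_info>;

type_ptr intern(const std::string& name) {
    static std::map<std::string, type_ptr> registry;
    auto it = registry.find(name);
    if (it == registry.end()) {
        it = registry.emplace(name,
                std::make_shared<type_info>(type_info{name})).first;
    }
    return it->second;
}

bool type_equals(const type_ptr& a, const type_ptr& b) {
    // Valid only because instances are uniqued; without interning this
    // would compare identity, not structural equality.
    return a == b;
}
```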
Rafael Ávila de Espíndola
5cef5a1b38 types: Add a friend visitor over data_value
This is a simple wrapper that allows code that is not in the types
hierarchy to visit a data_value.

Will be used by UDF.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola
9bf9a84e4d types: Move the data_value visitor to a header
It will be used by the UDF implementation.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-11-07 08:19:52 -08:00
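The visitor idea in the two commits above can be sketched with `std::variant` standing in for Scylla's `data_value` hierarchy: code outside the types hierarchy (such as the Lua UDF glue) dispatches on the runtime type without a chain of ifs. All names here are illustrative.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <variant>

// Stand-in for data_value: one alternative per supported CQL type.
using data_value = std::variant<std::int64_t, double, std::string, bool>;

// A visitor defined outside the "types hierarchy", one overload per type.
struct type_name_visitor {
    std::string operator()(std::int64_t) const { return "bigint"; }
    std::string operator()(double) const { return "double"; }
    std::string operator()(const std::string&) const { return "text"; }
    std::string operator()(bool) const { return "boolean"; }
};

std::string type_name(const data_value& v) {
    return std::visit(type_name_visitor{}, v);
}
```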
Yaron Kaikov
4a9b2a8d96 dist/docker: Add SCYLLA_REPO_URL argument to Dockerfile (#5264)
This change adds a SCYLLA_REPO_URL argument to Dockerfile, which defines
the RPM repository used to install Scylla from.

When building a new Docker image, users can specify the argument by
passing the --build-arg SCYLLA_REPO_URL=<url> option to the docker build
command. If the argument is not specified, the same RPM repository is
used as before, retaining the old default behavior.

We intend to use this in release engineering infrastructure to specify
RPM repositories for nightly builds of release branches (for example,
3.1.x), which are currently only using the stable RPMs.
2019-11-07 09:21:05 +02:00
Pavel Emelyanov
486e3f94d0 deps: Add libunistring-dev to debian
With this, the previous patch to Seastar, and (unexpectedly) the xenial
repo for scylla-libthrift010-dev and scylla-antlr35-c++-dev, the build
on Debian buster finally passes.

Signed-off-by: Pavel Emelyanov <xemul@scyladb.com>
Message-Id: <CAHTybb-QFyJ7YQW0b6pjhY_xUr-_b1w_O3K1=1FOwrNM55BkLQ@mail.gmail.com>
2019-11-01 09:03:39 +02:00
Dejan Mircevski
859883b31d alternator: Implement GT operator in Expected
Add cmp_gt and use it in check_compare() to handle the GT case.  Also
reactivate GT tests.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-10-31 17:18:22 -04:00
Dejan Mircevski
0f7d837757 alternator: Factor out check_compare()
Code for check_LT(), check_GT(), etc. will be nearly identical, so
factor it out into a single function that takes a comparator object.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-10-31 17:01:29 -04:00
Dejan Mircevski
a47b768959 alternator: Implement LT operator in Expected
Add check_LT() function and reactivate LT tests.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-10-31 16:07:29 -04:00
Dejan Mircevski
ceae3c182f alternator: Overload base64_decode on rjson::value
In 1ca9dc5d47, it was established that the correct way to
base64-decode a JSON value is via string_view, rather than directly
from GetString().

This patch adds a base64_decode(rjson::value) overload, which
automatically uses the correct procedure.  It saves typing, ensures
correctness (fixing one incorrect call found), and will come in handy
for future EXPECTED comparisons.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-10-31 15:56:03 -04:00
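The overload pattern described above can be sketched without rapidjson. The `json_value` struct below is a hypothetical stand-in for a JSON string value that, like rapidjson, exposes a pointer plus an explicit length (the string may contain embedded NULs, so a strlen()-based path would truncate), and the "decoder" merely copies its input to keep the sketch focused on the overload, not on real base64.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical JSON string value: pointer + explicit length.
struct json_value {
    const char* data;
    std::size_t len;
};

// Pretend core decoder; a real one would do base64. It takes a
// string_view, so the full length (including embedded NULs) is honored.
std::string base64_decode(std::string_view s) {
    return std::string(s);
}

// The overload the commit adds: route the JSON value through string_view
// so callers cannot accidentally use a NUL-truncating GetString() path.
std::string base64_decode(const json_value& v) {
    return base64_decode(std::string_view(v.data, v.len));
}
```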
Dejan Mircevski
9955f0342f alternator: Make unwrap_number() visible
unwrap_number() is now a public function in serialization.hh instead
of a static function visible only in executor.cc.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-10-31 10:46:30 -04:00
Nadav Har'El
3f859adebd Merge: Fix filtering static columns on empty partitions
Merged patch series from Piotr Sarna:

An otherwise empty partition can still have a valid static column.
Filtering didn't take that fact into account and only filtered
full-fledged rows, which may result in non-matching rows being returned
to the client.

Fixes #5248
2019-10-31 10:50:21 +02:00
Pavel Emelyanov
5fe4757725 docs: The scylla's dpdk config is boolean
The docs say one can pass --disable-dpdk, but that is not so. It's
Seastar's configure.py that has a tristate dpdk option; Scylla's can
only be enabled.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Message-Id: <CAHTybb-rxP8DbH-wW4Zf-w89iuCirt6T6-PjZAUfVFj7C5yb=A@mail.gmail.com>
2019-10-31 10:12:17 +02:00
Vladimir Davydov
9ea8114f8c cql: fix CAS metric label
"type" label is already in use for the counter type ("derive", "gauge",
etc). Using the same label for "cas" / "non-cas" overwrites it. Let's
instead call the new label "conditional" and use "yes" / "no" for its
value, as suggested by Kostja.
Message-Id: <3082b16e4d6797f064d58da95fb4e50b59ab795c.1572451480.git.vdavydov@scylladb.com>
2019-10-30 17:14:17 +01:00
Avi Kivity
398c482cd0 Merge "combined reader gallop mode" from Piotr
"
In case when a single reader contributes a stream of fragments and keeps winning over other readers, mutation_reader_merger will enter gallop mode, in which it is assumed that the reader will keep winning over other readers. Currently, a reader needs to contribute 3 fragments to enter that mode.

In gallop mode, fragments returned by the galloping reader will be compared with the best fragment from _fragment_heap. If it wins, the fragment is directly returned. Otherwise, gallop mode ends and merging performed as in general case, which involves heap operations.

In current implementation, when the end of partition is encountered while in gallop mode, the gallop mode is ended unconditionally.

A microbenchmark was added in order to test performance of the galloping reader optimization. A combining reader that merges results from four other readers is created. Each sub-reader provides a range of 32 clustering rows that is disjoint from others. All sub-readers return rows from the same partition. An improvement can be observed after introducing the galloping reader optimization.

As for other benchmarks from the "combined" group, results are pretty close to the old ones. The only one that seems to have suffered slightly is combined.many_overlapping.

Median times from a single run of perf_mutation_readers.combined: (1s run duration, 5 runs per benchmark, release mode)

test name                            before    after     improvement
one_row                              49.070ns  48.287ns  1.60%
single_active                        61.574us  61.235us  0.55%
many_overlapping                     488.193us 514.977us -5.49%
disjoint_interleaved                 57.462us  57.111us  0.61%
disjoint_ranges                      56.545us  56.006us  0.95%
overlapping_partitions_disjoint_rows 127.039us 80.849us  36.36%
Same results, normalized per mutation fragment:

test name                            before   after    improvement
one_row                              16.36ns  16.10ns  1.60%
single_active                        109.46ns 108.86ns 0.55%
many_overlapping                     216.97ns 228.88ns -5.49%
disjoint_interleaved                 102.15ns 101.53ns 0.61%
disjoint_ranges                      100.52ns 99.57ns  0.95%
overlapping_partitions_disjoint_rows 246.38ns 156.80ns 36.36%
Tested on AMD Ryzen Threadripper 2950X @ 3.5GHz.

Tests: unit(release)
Fixes #3593.
"

* '3593-combined_reader-gallop-mode' of https://github.com/piodul/scylla:
  mutation_reader: gallop mode microbenchmark
  mutation_reader: combined reader gallop tests
  mutation_reader: gallop mode for combined reader
  mutation_reader: refactor prepare_next
2019-10-30 17:34:47 +02:00
Piotr Sarna
dd00470a44 tests: add a test case for filtering on static columns
The test case covers filtering with an empty partition.

Refs #5248
2019-10-30 15:34:10 +01:00
Piotr Sarna
ca6fe598ec cql3: fix filtering on a static column for empty partitions
An otherwise empty partition can still have a valid static column.
Filtering didn't take that fact into account and only filtered
full-fledged rows, which may result in non-matching rows being returned
to the client.

Fixes #5248
2019-10-30 15:31:54 +01:00
Tomasz Grabiec
9da3aec115 Merge "Mutation diff improvements" from Benny
- accept diff_command option
 - standard input support
2019-10-30 13:40:58 +01:00
Tomasz Grabiec
0d9367e08f Merge "Scyllatop: one pass update of multiple metrics" from Benny
Update previous results dictionary using the update_metrics method.
It calls metric_source.query_list to get a list of results (similar to discover()), then for each line in the response it updates the results dictionary.

New results may be appended depending on the do_append parameter (True by default).

Previously, with prometheus, each metric.update called query_list, resulting in O(n^2) behavior when all metrics were updated, as in the scylla_top dtest - causing a test timeout when testing the debug build.
(E.g. dtest-debug/216/testReport/scyllatop_test/TestScyllaTop/default_start_test/)
2019-10-30 13:38:39 +01:00
Tomasz Grabiec
b7b0a53b50 Merge "Add metrics for light-weight transactions" from Vova
This patch set adds metrics useful for analyzing light-weight
transaction performance. The same metrics are available in Cassandra.
2019-10-30 12:09:03 +01:00
Vladimir Davydov
f0075ba845 cql: account cas requests separately
This patch adds "type" label to the following CQL metrics:

  inserts
  updates
  deletes
  batches
  statements_in_batches

The label is set to "cas" for conditional statements and "non-cas" for
unconditional statements.

Note, for a batch to be accounted as CAS, it is enough to have just one
conditional statement. In this case all statements within the batch are
accounted as CAS as well.
2019-10-30 13:44:35 +03:00
Piotr Dulikowski
81883a9f2e mutation_reader: gallop mode microbenchmark
This microbenchmark tests performance of the galloping reader
optimization. A combining reader that merges results from four other
readers is created. Each sub-reader provides a range of 32 clustering
rows that is disjoint from others. All sub-readers return rows from
the same partition. An improvement can be observed after introducing the
galloping reader optimization.

As for other benchmarks from the "combined" group, results are pretty
close to the old ones. The only one that seems to have suffered slightly
is combined.many_overlapping.

Median times from a single run of perf_mutation_readers.combined:
(1s run duration, 5 runs per benchmark, release mode)

test name                            before    after     improvement
one_row                              49.070ns  48.287ns  1.60%
single_active                        61.574us  61.235us  0.55%
many_overlapping                     488.193us 514.977us -5.49%
disjoint_interleaved                 57.462us  57.111us  0.61%
disjoint_ranges                      56.545us  56.006us  0.95%
overlapping_partitions_disjoint_rows 127.039us 80.849us  36.36%

Same results, normalized per mutation fragment:

test name                            before   after    improvement
one_row                              16.36ns  16.10ns  1.60%
single_active                        109.46ns 108.86ns 0.55%
many_overlapping                     216.97ns 228.88ns -5.49%
disjoint_interleaved                 102.15ns 101.53ns 0.61%
disjoint_ranges                      100.52ns 99.57ns  0.95%
overlapping_partitions_disjoint_rows 246.38ns 156.80ns 36.36%

Tested on AMD Ryzen Threadripper 2950X @ 3.5GHz.
2019-10-30 09:51:18 +01:00
Piotr Dulikowski
29d6842db9 mutation_reader: combined reader gallop tests 2019-10-30 09:51:18 +01:00
Piotr Dulikowski
2b4ca0c562 mutation_reader: gallop mode for combined reader
In case when a single reader contributes a stream of fragments
and keeps winning over other readers, mutation_reader_merger will
enter gallop mode, in which it is assumed that the reader will keep
winning over other readers. Currently, a reader needs to contribute
3 fragments to enter that mode.

In gallop mode, fragments returned by the galloping reader will be
compared with the best fragment from _fragment_heap. If it wins, the
fragment is directly returned. Otherwise, gallop mode ends and
merging performed as in general case, which involves heap operations.

In current implementation, when the end of partition is encountered
while in gallop mode, the gallop mode is ended unconditionally.

Fixes #3593.
2019-10-30 09:51:18 +01:00
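The gallop-mode scheme described in the commit above can be sketched independently of Scylla's reader classes. Everything below (`gallop_merge`, `GALLOP_THRESHOLD`) is illustrative, not Scylla's actual code: after one source wins the threshold number of times in a row, its next element is compared only against the other sources' heads instead of going through full heap maintenance.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A reader needs this many consecutive wins to enter gallop mode
// (the commit uses 3 as well).
constexpr int GALLOP_THRESHOLD = 3;

// Illustrative k-way merge of sorted int vectors with a gallop fast path.
std::vector<int> gallop_merge(std::vector<std::vector<int>> sources) {
    std::vector<std::size_t> pos(sources.size(), 0);
    std::vector<int> out;
    int last_winner = -1;
    int win_streak = 0;
    for (;;) {
        // General case: scan all sources for the smallest head
        // (a heap in the real implementation).
        int best = -1;
        for (std::size_t i = 0; i < sources.size(); ++i) {
            if (pos[i] < sources[i].size() &&
                (best == -1 || sources[i][pos[i]] < sources[best][pos[best]])) {
                best = static_cast<int>(i);
            }
        }
        if (best == -1) {
            return out; // all sources exhausted
        }
        win_streak = (best == last_winner) ? win_streak + 1 : 1;
        last_winner = best;
        out.push_back(sources[best][pos[best]++]);
        // Gallop mode: keep draining the winner while it stays no larger
        // than every other source's head, skipping heap operations.
        while (win_streak >= GALLOP_THRESHOLD &&
               pos[best] < sources[best].size()) {
            bool still_wins = true;
            for (std::size_t i = 0; i < sources.size(); ++i) {
                if (static_cast<int>(i) != best && pos[i] < sources[i].size() &&
                    sources[i][pos[i]] < sources[best][pos[best]]) {
                    still_wins = false;
                    break;
                }
            }
            if (!still_wins) {
                break; // fall back to general merging
            }
            out.push_back(sources[best][pos[best]++]);
        }
    }
}
```

The microbenchmark's best case (overlapping_partitions_disjoint_rows) corresponds to inputs where one source emits a long disjoint run and keeps winning.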
Piotr Dulikowski
2a46a09e7c mutation_reader: refactor prepare_next
Move the logic responsible for adding readers at partition boundaries
into `maybe_add_readers_at_partition_boundary`, and advancing one reader
into `prepare_one`. This will allow reusing this logic outside
`prepare_next`.
2019-10-30 09:49:12 +01:00
Avi Kivity
623071020e commitlog: change variadic stream in read_log_file to future<struct>
Since seastar::streams are based on future/promise, variadic streams
suffer the same fate as variadic futures - deprecation and eventual
removal.

This patch therefore replaces a variadic stream in commitlog::read_log_file()
with a non-variadic stream, via a helper struct.

Tests: unit (dev)
2019-10-29 19:25:12 +01:00
Botond Dénes
271ab750a6 scylla-gdb.py: add replica section to scylla memory
Recently, scylla memory started to go beyond providing raw stats
about the occupancy of the various memory pools, to also
provide an overview of the "usual suspects" that cause memory pressure.
As part of this, recently 46341bd63f
added a section of the coordinator stats. This patch continues this
trend and adds a replica section, with the "usual suspects":
* read concurrency semaphores
* execution stages
* read/write operations

Example:

    Replica:
      Read Concurrency Semaphores:
        user sstable reads:        0/100, remaining mem:      84347453 B, queued: 0
        streaming sstable reads:   0/ 10, remaining mem:      84347453 B, queued: 0
        system sstable reads:      0/ 10, remaining mem:      84347453 B, queued: 0
      Execution Stages:
        data query stage:
          03 "service_level_sg_0"             4967
             Total                            4967
        mutation query stage:
             Total                            0
        apply stage:
          03 "service_level_sg_0"             12608
          06 "statement"                      3509
             Total                            16117
      Tables - Ongoing Operations:
        pending writes phaser (top 10):
                  2 ks.table1
                  2 Total (all)
        pending reads phaser (top 10):
               3380 ks.table2
                898 ks.table1
                410 ks.table3
                262 ks.table4
                 17 ks.table8
                  2 system_auth.roles
               4969 Total (all)
        pending streams phaser (top 10):
                  0 Total (all)

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191029164817.99865-1-bdenes@scylladb.com>
2019-10-29 18:03:06 +01:00
Vladimir Davydov
e510288b6f api: wire up column_family cas-related statistics 2019-10-29 19:26:18 +03:00
Vladimir Davydov
b75862610e paxos_state: account paxos round latency
This patch adds the following per table stats:

  cas_prepare_latency
  cas_propose_latency
  cas_commit_latency

They are equivalent to CasPropose, CasPrepare, CasCommit metrics exposed
by Cassandra.
2019-10-29 19:26:18 +03:00
Vladimir Davydov
21c3c98e5b api: wire up storage_proxy cas-related statistics 2019-10-29 19:26:18 +03:00
Vladimir Davydov
c27ab87410 storage_proxy: add cas request accounting
This patch implements accounting of Cassandra's metrics related to
lightweight transactions, namely:

  cas_read_latency              transactional read latency (histogram)
  cas_write_latency             transactional write latency (histogram)
  cas_read_timeouts             number of transactional read timeouts
  cas_write_timeouts            number of transactional write timeouts
  cas_read_unavailable          number of transactional read
                                unavailable errors
  cas_write_unavailable         number of transactional write
                                unavailable errors
  cas_read_unfinished_commit    number of transaction commit attempts
                                that occurred on read
  cas_write_unfinished_commit   number of transaction commit attempts
                                that occurred on write
  cas_write_condition_not_met   number of transaction preconditions
                                that did not match current values
  cas_read_contention           how many contended reads were
                                encountered (histogram)
  cas_write_contention          how many contended writes were
                                encountered (histogram)
2019-10-29 19:25:47 +03:00
Vladimir Davydov
967a9e3967 storage_proxy: zap ballot_and_contention
Pass contention by reference to begin_and_repair_paxos(), where it is
incremented on every sleep. Rationale: we want to account the total
number of times query() / cas() had to sleep, either directly or within
begin_and_repair_paxos(), no matter if the function failed or succeeded.
2019-10-29 19:22:18 +03:00
Botond Dénes
49aa8ab8a0 scylla-gdb.py: add compatibility with Scylla 3.0
Every Scylla version has its own scylla-gdb.py, but because we don't
backport any fixes or improvements, in practice we always end up using
master's version when debugging older versions of Scylla too. This is
made harder by the fact that the code of both Scylla and its
dependencies (most notably libstdc++ and boost) is constantly changing
between releases, requiring edits to scylla-gdb.py to make it usable
with past releases.

This patch attempts to make it easier to use scylla-gdb.py with past
releases, more specifically Scylla 3.0. This is achieved by wrapping
problematic lines in a `try: except:` and putting the backward
compatible version in the `except:` clause. These lines have comments
with the version they provide support for, so they can be removed when
said version is not supported anymore.

I did not attempt to provide full coverage, I only fixed up problems
that surfaced when using my favourite commands with 3.0.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191029155737.94456-1-bdenes@scylladb.com>
2019-10-29 17:05:19 +01:00
Botond Dénes
e48f301e95 repair: repair_cf_range(): extract result of local checksum calculation only once
The loop that collects the results of the checksum calculations also
logs any errors. The error logging includes `checksums[0]`, which
corresponds to the checksum calculation on the local node. This violates
the assumption of the code following the loop, which assumes that the
future of `checksums[0]` is intact after the loop terminates. However,
this is only true when the checksum calculation is successful; when it
fails, the loop extracts the error and logs it. When the code after the
loop checks again whether said calculation failed, it will get a false
negative and will go ahead and attempt to extract the value, triggering
an assert failure.
Fix by making sure that even in the case of a failed checksum
calculation, the result of `checksums[0]` is extracted only once.

Fixes: #5238
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191029151709.90986-1-bdenes@scylladb.com>
2019-10-29 17:00:37 +01:00
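The extract-once pattern behind the fix above can be sketched with `std::future` standing in for seastar futures (the names `checksum_result` and `consume_once` are illustrative): a future's result may be consumed only once, so the code that logs errors must cache what it extracted instead of leaving later code to re-extract from an already-consumed future.

```cpp
#include <cassert>
#include <future>
#include <stdexcept>
#include <string>

// Cached outcome of a single extraction from a future.
struct checksum_result {
    bool failed = false;
    std::string error;
    int value = 0;
};

checksum_result consume_once(std::future<int> f) {
    // Extraction happens exactly once, inside this function; any later
    // check must consult the returned struct, not the future. Calling
    // f.get() a second time after a failure is the double-extraction bug.
    checksum_result r;
    try {
        r.value = f.get();
    } catch (const std::exception& e) {
        r.failed = true;
        r.error = e.what();
    }
    return r;
}
```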
Avi Kivity
60ea29da90 Update seastar submodule
* seastar 2963970f6b...75e189c6ba (7):
  > posix-stack: Do auto-resolve of ipv6 scope iff not set for link-local dests
  > README.md: Add redpanda and smf to 'Projects using Seastar'
  > unix_domain_test: don't assume that at temporary_buffer is null terminated
  > socket_address: Use offsetof instead of null pointer
  > README: add projects using seastar section to readme
  > Adjustments for glibc 2.30 and hwloc 2.0
  > Mark future::failed() as const
2019-10-29 14:34:10 +02:00
Gleb Natapov
0e9df4eaf8 lwt: mark lwt as experimental
We may want to change paxos tables format and change internode protocol,
so hide lwt behind experimental flag for now.

Message-Id: <20191029102725.GM2866@scylladb.com>
2019-10-29 14:33:48 +02:00
Benny Halevy
79d5fed40b mutation_fragment_stream_validator: validate end of stream in partition_key filter
Currently end of stream validation is done in the destructor,
but the validator may be destructed prematurely, e.g. on
exception, as seen in https://github.com/scylladb/scylla/issues/5215

This patch adds an on_end_of_stream() method explicitly called by
consume_pausable_in_thread.  Also, the respective concepts for
PartitionFilter, MutationFragmentFilter and a new one for the
on_end_of_stream method were unified as FlattenedConsumerFilter.

Refs #5215

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 506ff40bd447f00158c24859819d4bb06436c996)
2019-10-29 12:35:33 +01:00
Benny Halevy
d5f53bc307 mutation_fragment_stream_validator: validate partition key monotonicity
Fixes #4804

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 736360f823621f7994964fee77f37378ca934c56)
2019-10-29 12:35:33 +01:00
Gleb Natapov
e5e44bfda2 client_state: fix get_timestamp_for_paxos() to always advance a timestamp
Message-Id: <20191029102336.GL2866@scylladb.com>
2019-10-29 13:07:33 +02:00
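The invariant the fix above restores can be sketched as a tiny generator (not Scylla's actual client_state code): even when the clock stalls or goes backwards, every call returns a value strictly greater than the previous one.

```cpp
#include <algorithm>
#include <cstdint>

// Minimal monotonic timestamp generator; `now` stands in for a
// microsecond wall-clock reading.
class timestamp_generator {
    std::int64_t _last = 0;
public:
    std::int64_t next(std::int64_t now) {
        // Always advance: take the clock if it moved forward, otherwise
        // bump the last issued timestamp by one.
        _last = std::max(now, _last + 1);
        return _last;
    }
};
```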
Tomasz Grabiec
c2a4c915f3 Merge "Fix a few issues with CAS requests" from Vladimir D.
There are a few issues at the CQL layer, because of which the result of
a CAS request execution may differ between Scylla and Cassandra. Mostly,
it happens when static columns are involved. The goal of this patch set
is to fix these issues, thus making Scylla's implementation of CAS yield
the same results as Cassandra's.
2019-10-29 11:50:15 +01:00
Rafael Ávila de Espíndola
c74864447b types: Simplify validate_visitor for strings
We have different types for ascii and utf8, so there is no need for
an extra if.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191024232911.22700-1-espindola@scylladb.com>
2019-10-29 11:02:55 +02:00
Nadav Har'El
d69ab1b588 CDC: (atomic) delta + (non-optional) pre-image data columns
Merged patch series by Calle Wilund, with a few fixes by Piotr Jastrzębski:

Adds delta and pre-image data column writes for the atomic columns in a
cdc-enabled table.

Note that in this patch set it is still unconditional. Adding option support
comes in next set.

Uses code more or less derived from alternator to select the pre-image,
using the raw query interface, so query generation should be fairly
low-overhead. Pre-image and delta mutations are mixed in with the actual
modification mutations to generate the full cdc log (sans post-image).
2019-10-29 09:39:28 +02:00
Calle Wilund
7db393fe12 cdc_test: Add helper methods + preimage test
Add filtering, sorting etc helpers + simple pre-image test

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-29 07:49:05 +01:00
Vladimir Davydov
65b86d155e cql: add static row to CAS failure result if there are static conditions
Even if no rows match the clustering key restrictions of a conditional
statement with static column conditions, we still must include the
static column value in the CAS failure result set. For example,
the following conditional DELETE statement

  create table t(k int, c int, s int static, v int, primary key(k, c));
  insert into t(k, s) values(1, 1);
  delete v from t where k=1 and c=1 if v=1 and s=1;

must return

  [applied=False, v=null, s=1]

not just

  [applied=False, v=null, s=null]

To fix that, set partition_slice::option::always_return_static_content
for querying rows used for checking conditions so that we have the
static row in update_parameters::prefetch_data even if no regular row
matches clustering column restrictions. Plus modify cas_request::
applies_to() so that it sets is_in_cas_result_set flag for the static
row in case there are static column conditions, but the result set
happens to be empty.

As pointed out by Tomek, there's another reason to set partition_slice::
option::always_return_static_content apart from building a correct
result set on CAS failure. There could be a batch with two statements,
one with clustering key restrictions which select no row, and another
statement with only static column conditions. If we didn't enable this
flag, we wouldn't get a static row even if it exists, and static column
conditions would evaluate as if the static row didn't exist, for
example, the following batch

  create table t(k int, c int, s int static, primary key(k, c));
  insert into t(k, s) values(1, 1);
  begin batch
  insert into t(k, c) values(1, 1) if not exists
  update t set s = 2 where k = 1 if s = 1
  apply batch;

would fail although it clearly must succeed.
2019-10-28 22:30:37 +03:00
Vladimir Davydov
e0b31dd273 query: add flag to return static row on partition with no rows
A SELECT statement that has clustering key restrictions isn't supposed
to return static content if no regular row matches the restrictions,
see #589. However, for the CAS statement we do need to return static
content on failure so this patch adds a flag that allows the caller to
override this behavior.
2019-10-28 21:50:44 +03:00
Vladimir Davydov
57d284d254 cql: exclude statements not checked by cas from result set
Apart from conditional statements, there may be other reading statements
in a batch, e.g. ones manipulating lists. We must not include rows
fetched for them in the CAS result set. For instance, the following CAS
batch:

  create table t(p int, c int, i int, l list<int>, primary key(p, c));
  insert into t(p, c, i) values(1, 1, 1)
  insert into t(p, c, i, l) values(1, 1, 1, [1, 2, 3])
  begin batch
  update t set i=3 where p=1 and c=1 if i=2
  update t set l=l-[2] where p=1 and c=2
  apply batch;

is supposed to return

  [applied] | p | c | i
  ----------+---+---+---
     False  | 1 | 1 | 1

not

  [applied] | p | c | i
  ----------+---+---+---
     False  | 1 | 1 | 1
     False  | 1 | 2 | 1

To filter out such collateral rows from the result set, let's mark rows
checked by conditional statements with a special flag.
2019-10-28 21:50:43 +03:00
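The filtering described in the commit above can be sketched in a few lines of Python. This is an illustrative model, not Scylla's actual prefetch_data structure; only the flag name is_in_cas_result_set comes from the commits:

```python
def cas_result_rows(rows):
    # Keep only rows a conditional statement actually checked; collateral
    # rows fetched for non-conditional statements (e.g. list operations)
    # have the flag clear and are excluded from the CAS result set.
    return [r for r in rows if r["is_in_cas_result_set"]]

rows = [
    {"p": 1, "c": 1, "i": 1, "is_in_cas_result_set": True},   # checked by IF i=2
    {"p": 1, "c": 2, "i": 1, "is_in_cas_result_set": False},  # fetched for l=l-[2]
]
assert cas_result_rows(rows) == [rows[0]]
```

With this rule, the batch in the example returns only the (1, 1, 1) row, as expected.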
Vladimir Davydov
74b9e80e4c cql: fix EXISTS check that applies only to static columns
If a CQL statement only updates static columns, i.e. has no clustering
key restrictions, we still fetch a regular row so that we can check it
against EXISTS condition. In this case we must be especially careful: we
can't simply pass the row to modification_statement::applies_to, because
it may turn out that the row has no static columns set, i.e. there's in
fact no static row in the partition. So we filter out such rows without
static columns right in cas_request::applies_to before passing them
further to modification_statement::applies_to.

Example:

  create table t(p int, c int, s int static, primary key(p, c));
  insert into t(p, c) values(1, 1);
  insert into t(p, s) values(1, 1) if not exists;

The conditional statement must succeed in this case.
2019-10-28 21:49:37 +03:00
Vladimir Davydov
8fbf344f03 cql: ignore clustering key if statement checks only static columns
In case a CQL statement has only static columns conditions, we must
ignore clustering key restrictions.

Example:

  create table t(p int, c int, s int static, v int, primary key(p, c));
  insert into t(p, s) values(1, 1);
  update t set v=1 where p=1 and c=1 if s=1;

This conditional statement must successfully insert row (p=1, c=1, v=1)
into the table even though there's no regular row with p=1 and c=1 in
the table before it's executed, because the statement condition only
applies to the static column s, which exists and matches.
2019-10-28 21:13:19 +03:00
Vladimir Davydov
54cf903bb2 cql: differentiate static from regular EXISTS conditions
If a modification statement doesn't have a clustering column restriction
while the table has static columns, then EXISTS condition just needs to
check if there's a static row in the partition, i.e. it doesn't need to
select any regular rows. Let's treat such an EXISTS condition like a static
column condition so that we can ignore its clustering key range while
checking CAS conditions.
2019-10-28 21:13:05 +03:00
Vladimir Davydov
934a87999f cql: turn prefetch_data::row into struct
This will allow us to add helper methods and store extra info in each
row. For example, we can add a method for checking if a row has static
columns. Also, to build CAS result set, we need to differentiate rows
fetched to check conditions from those fetched for reading operations.
Using struct as row container will allow us to store this information in
each prefetched row.
2019-10-28 21:12:52 +03:00
Vladimir Davydov
bdd62b8bc3 cql: remove static column check from create_clustering_ranges
The check is pointless, because we check exactly the same while
preparing the statement, see process_where_clause() method of
modification_statement.
2019-10-28 21:12:43 +03:00
Vladimir Davydov
a8ddbffa75 cql: fix applies_only_to_static_columns check
Currently, we set _sets_regular_columns/_sets_static_columns flags when
adding regular/static conditions to modification_statement. We use them
in applies_only_to_static_columns() function that returns true iff
_sets_static_columns is set and _sets_regular_columns is clear. We
assume that if this function returns true then the statement only deals
with static columns and so must not have clustering key restrictions.
Usually, that's true, but there's one exception: DELETE FROM ...
statement that deletes whole rows. Technically, this statement doesn't
have any column operations, i.e. _sets_regular_columns flag is clear.
So if such a statement happens to have a static condition, we will
assume that it only applies to static columns and mistakenly raise an
error.

Example:

  create table t(k int, c int, s int static, v int, primary key(k, c));
  delete from t where k=1 and c=1 if s=1;

To fix this, let's not set the above mentioned flags when adding
conditions and instead check if _column_conditions array is empty in
applies_only_to_static_columns().
2019-10-28 21:12:36 +03:00
Vladimir Davydov
fbb11dac11 cql: set conditions before processing where clause
modification_statement::process_where_clause() assumes that both
operations and conditions have been added to the statement when it's
called: it uses this information to raise an error in case the statement
restrictions are incompatible with operations or conditions. Currently,
operations are set before this function is called, but not conditions.
This results in "Invalid restrictions on clustering columns since
the {} statement modifies only static columns" error while trying to
execute the following statements:

  create table t(k int, c int, s int static, v int, primary key(k, c));
  delete s from t where k=1 and c=1 if v=1;
  update t set s=1 where k=1 and c=1 if v=1;

Fix this by always initializing conditions before processing WHERE
clause.
2019-10-28 21:12:22 +03:00
Botond Dénes
edc1750297 scylla-gdb.py: introduce scylla smp-queues
Print a histogram of the number of async work items in the shard's
outgoing smp queues.
Example:

    (gdb) scylla smp-queues
        10747 17 ->  3 ++++++++++++++++++++++++++++++++++++++++
          721 17 -> 19 ++
          247 17 -> 20 +
          233 17 -> 10 +
          210 17 -> 14 +
          205 17 ->  4 +
          204 17 ->  5 +
          198 17 -> 16 +
          197 17 ->  6 +
          189 17 -> 11 +
          181 17 ->  1 +
          179 17 -> 13 +
          176 17 ->  2 +
          173 17 ->  0 +
          163 17 ->  8 +
            1 17 ->  9 +

Useful for identifying the target shard, when `scylla task_histogram`
indicates a high number of async work items.

To produce the histogram the command goes over all virtual objects in
memory and identifies the source and target queues of each
`seastar::smp_message_queue::async_work_item` object. Practically the
source queue will always be that of the current shard. As this scales
with the number of virtual objects in memory, it can take some time to
run. An alternative implementation would be to instead read the actual
smp queues, but the code of that is scary so I went for the simpler and
more reliable solution.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191028132456.37796-1-bdenes@scylladb.com>
2019-10-28 15:42:55 +02:00
Tomasz Grabiec
3b37027598 Merge "lwt: implement basic lightweight transactions support" from Kostja
This patch set introduces light-weight transactions support to
ScyllaDB. It is a subset of the full series, which adds
basic LWT support and which has been reviewed thus far.
2019-10-28 11:45:28 +01:00
Tomasz Grabiec
f745819ed7 Merge "lwt: paxos protocol implementation" from Gleb
This is paxos implementation for LWT. LWT itself is not included in the
patch, so the code is essentially not wired yet (except the read path).
2019-10-28 11:29:40 +01:00
Avi Kivity
f8ba96efcf Merge "test_udt_mutations fixes" from Benny
"
mutation_test/test_udt_mutations kept failing on my machine and I tracked it down to the 3rd patch in this series (use int64_t constants for long_type). While at it, this series also fixes a comment and the end iterator in BOOST_REQUIRE(std::all_of(...))

mutation_test: test_udt_mutations: fixup udt comment
mutation_test: test_udt_mutations: fix end iterator in call to std::all_of
mutation_test: test_udt_mutations: use int64_t constants for long_type

Test: mutation_test(dev, debug)
"

* 'test_udt_mutations-fixes' of https://github.com/bhalevy/scylla:
  mutation_test: test_udt_mutations: use int64_t constants for long_type
  mutation_test: test_udt_mutations: fix end iterator in call to std::all_of
  mutation_test: test_udt_mutations: fixup udt comment
2019-10-28 10:43:52 +02:00
Calle Wilund
36328acf60 cql_assertions: Change signature to accept sstring 2019-10-28 06:16:12 +01:00
Calle Wilund
7d98f735ee cdc: Add static columns to data/preimage mutations
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-28 06:16:12 +01:00
Calle Wilund
19bba5608a cdc: Create and perform a pre-image select for mutations
As well as generating pre-image rows in the resulting log mutation

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-28 06:16:12 +01:00
Calle Wilund
d4ee1938c7 cdc: Add modification record for regular atomic values in mutations
Fills in the data columns for regular columns iff they are
atomic (not unfrozen collections)
2019-10-28 06:16:12 +01:00
Calle Wilund
3fdcbd9dff cdc: Set row op in log
Adds the actual operation (partition delete, range delete, update) to the
cdc log.
2019-10-28 06:16:12 +01:00
Calle Wilund
8a6b72f47e cdc: Add pre-image select generator method
Based on a mutation, creates a pre-image select operation.

Note: this uses a raw proxy query to shortcut parsing etc.,
instead of trying to cache the generated query. The hypothesis
is that this is essentially faster.

The routine assumes all rows in a mutation touch the same static/regular
columns. If this is not always true it will need additional
calculations.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-28 06:16:12 +01:00
Calle Wilund
d74f32b07a cql3::untyped_result_set: Add constructor from cql3::result_set
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-28 06:16:12 +01:00
Calle Wilund
3ed7a9dd69 cql3::untyped_result_set: Add view getter to make non-intrusive read cheaper
Also use in actual data conversion.
2019-10-28 06:16:12 +01:00
Calle Wilund
451bb7447d cdc: Add log / log data column operation types and make data cols tuples of these
Makes static/regular data columns tuple<op, value, ttl> as per spec.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-28 06:16:12 +01:00
Konstantin Osipov
e555dc502e lwt: implement basic lightweight transactions support
Support single-statement conditional updates as well as batches.

This patch almost fully rewrites column_condition.cc, implementing
is_satisfied_by().

Most of the remaining complications in column_condition implementation
come from the need to properly handle frozen and multi-cell
collection in predicates - up until now it was not possible
to compare entire collection values between each other. This is further
complicated since multi-cell lists and sets are returned as maps.

We can no longer assume that the columns fetched by prefetch operation
are non-frozen collections. IF EXISTS/IF NOT EXISTS condition
fetches all columns; besides, a column may be needed to check another
condition.

When fetching the old row for LWT or to apply updates on list/columns,
we now calculate precisely the list of columns to fetch.

The primary key columns are also included in CAS batch result set,
and are thus also prefetched (the user needs them to figure out which
statements failed to apply).

The patch is cross-checked for compatibility with cassandra-3.11.4-1545-g86812fa502
but does deviate from the origin in handling of conditions on static
row cells. This is addressed in a future series.
2019-10-27 23:42:49 +03:00
Konstantin Osipov
67e68dabf0 lwt: ensure we don't crash when we get a LIKE 2019-10-27 23:42:49 +03:00
Konstantin Osipov
f8f36d066c lwt: check for unsupported collection type in condition element access
We don't support conditions with element access on non-frozen UDTs,
check that only supported collection types are supplied.
2019-10-27 23:42:49 +03:00
Konstantin Osipov
c9f0adf616 lwt: rewrite cql3::raw::column_condition::prepare()
Restructure the code to avoid quite a bit of code duplication.
2019-10-27 23:42:47 +03:00
Konstantin Osipov
c2217df4d8 lwt: reorganize column_condition declaration and add comments 2019-10-27 23:42:03 +03:00
Konstantin Osipov
22b0240fe7 lwt: remove useless code in column_condition.hh
Each column_condition and raw::column_condition construction case had a
static method wrapping its constructor, simply supplying some defaults.

This neither improves clarity nor maintainability.
2019-10-27 23:42:03 +03:00
Konstantin Osipov
3e25b83391 lwt: propagate if_exists condition from the parser to AST
UPDATE ... IF EXISTS is legal, but the IF EXISTS condition
was not propagated from the parser to the AST (raw::update_statement).
2019-10-27 23:42:03 +03:00
Konstantin Osipov
df28985295 lwt: introduce cql_statement_opt_metadata
cql_statement_opt_metadata is an interim node
in the CQL (prepared) statement hierarchy, parenting
modification_statement and batch_statement. If there
is an IF condition in such statements, they return a result set,
and thus have result set metadata.
The metadata itself is filled in a subsequent patch.
2019-10-27 23:42:03 +03:00
Vladimir Davydov
c8869e803e lwt: remove commented out validateWhereClauseForConditions
This logic was implemented in validate_where_clause_for_conditions()
method of modification_statement class.
2019-10-27 23:42:03 +03:00
Konstantin Osipov
eb5e82c6a1 lwt: add CAS where clause validation
Add checks for conditional modification statement limitations:
- WHERE clustering_key IN (list) IF condition is not supported
  since a condition is evaluated for a single row/cell, so
  allowing multiple rows to match the WHERE clause would create
  ambiguity,
- the same is true for conditional range deletions.
- ensure all clustering restrictions are eq for conditional delete

  We must not allow statements like

  create table t(p int, c int, v int, primary key (p, c));
  delete from t where p=1 and c>0 if v=1;

  because there may be more than one row in a partition satisfying the
  WHERE clause, in which case it's unclear which of them should satisfy
  the IF condition: all or just one.

  Raising an error on such a statement is consistent with Cassandra's
  behavior.
2019-10-27 23:42:03 +03:00
Konstantin Osipov
203eb3eccc lwt: sleep a random amount of time when retrying CAS
Sleep a random interval between 0 and 100 ms before retrying CAS.
Reuse sleep function, make the distribution object thread local.
2019-10-27 23:42:03 +03:00
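The randomized backoff described above can be sketched in Python. The 0-100 ms bound comes from the commit message; the function name and the explicit RNG parameter are illustrative, not Scylla's code:

```python
import random

CAS_RETRY_MAX_SLEEP_MS = 100  # upper bound from the commit message

def cas_retry_delay_ms(rng: random.Random) -> float:
    # Uniform random delay before retrying a contended CAS round.
    # The jitter reduces repeated collisions between competing
    # coordinators proposing for the same partition.
    return rng.uniform(0, CAS_RETRY_MAX_SLEEP_MS)
```

Each retry draws a fresh delay, so two contending coordinators are unlikely to wake up and clash again at the same instant.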
Konstantin Osipov
0674fab05c lwt: implement storage_proxy::cas()
Introduce service::cas_request abstract base class
which can be used to parameterize Paxos logic.

Implement storage_proxy::cas() - compare and swap - the storage proxy
entry point for lightweight transactions.
2019-10-27 23:42:03 +03:00
Gleb Natapov
70adf65341 storage_proxy: make mutation holder responsible for mutation operation
Currently the code that manipulates mutations during write needs to
check what kind of mutations they are and (sometimes) choose different
code paths. This patch encapsulates the differences in virtual
functions of mutation_holder object, so that high level code will not
concern itself with the details. The functions that are added:
apply_locally(), apply_remotely() and store_hint().
2019-10-27 23:21:51 +03:00
Gleb Natapov
b3e01a45d7 lwt: storage_proxy: implement paxos protocol
This patch adds all functionality needed for Paxos protocol. The
implementation does not strictly adhere to Paxos paper since the original
paper allows setting a value only once, while for LWT we need to be able
to make another Paxos round after "learn" phase completes, which requires
things like repair to be introduced.
2019-10-27 23:21:51 +03:00
Gleb Natapov
8d6201a23b lwt: Add RPC verbs needed for paxos implementation
The Paxos protocol has three stages: prepare, accept, learn. This patch
adds an RPC verb for each of those stages. To be terminology-compatible
with Cassandra, the patch calls those stages prepare, propose, commit.
2019-10-27 23:21:51 +03:00
Gleb Natapov
d1774693bf lwt: Define state needed by paxos and persist it
Paxos protocol relies on replicas having a state that persists over
crashes/restarts. This patch defines such state and stores it in the
database itself in the paxos table to make it persistent.

The stored state is:
  in_progress_ballot    - promised ballot
  proposal              - accepted value
  proposal_ballot       - the ballot of the accepted value
  most_recent_commit    - most recently learned value
  most_recent_commit_at - the ballot of the most recently learned value
2019-10-27 23:21:51 +03:00
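The stored state listed above can be modeled as a small record. A hypothetical Python sketch follows: the field names come from the commit message, but the types (ints for ballots, bytes for mutations) and the promise helper are illustrative only — real ballots are timeuuids and the state is persisted in the paxos table:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PaxosState:
    # Durable per-partition Paxos state, field names from the commit.
    in_progress_ballot: Optional[int] = None    # promised ballot
    proposal: Optional[bytes] = None            # accepted value
    proposal_ballot: Optional[int] = None       # ballot of the accepted value
    most_recent_commit: Optional[bytes] = None  # most recently learned value
    most_recent_commit_at: Optional[int] = None # ballot of that learned value

def promise(state: PaxosState, ballot: int) -> bool:
    # A replica promises a ballot only if it is newer than any ballot it
    # already promised; persisting the promise before replying is what
    # makes the protocol survive crashes/restarts.
    if state.in_progress_ballot is None or ballot > state.in_progress_ballot:
        state.in_progress_ballot = ballot
        return True
    return False
```

An older ballot arriving after a newer promise is simply rejected, which is the invariant the persisted in_progress_ballot exists to enforce.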
Gleb Natapov
15b935b95d lwt: add data structures needed for paxos implementation
This patch adds two data structures that will be used by Paxos. The first
is "proposal", which contains a ballot and a mutation representing
the value the Paxos protocol is trying to set. The second is
"prepare_response", which is the value returned by the Paxos prepare stage.
It contains currently accepted value (if any) and most recently
learned value (again, if any). The latter is used to "repair" replicas
that missed the previous "learn" message.
2019-10-27 23:21:51 +03:00
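The two structures can be sketched as plain records. This is an illustrative Python model under assumed types (int ballots, bytes mutations), not the actual C++ definitions:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Proposal:
    # A ballot plus the mutation this Paxos round is trying to decide on.
    ballot: int
    update: bytes

@dataclass
class PrepareResponse:
    # Returned by the prepare stage: the currently accepted value (if any)
    # and the most recently learned value (if any); the latter repairs
    # replicas that missed the earlier "learn" message.
    promised: bool
    accepted: Optional[Proposal] = None
    most_recent_commit: Optional[Proposal] = None
```

A fresh replica with nothing accepted or learned would answer prepare with both optional fields empty.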
Benny Halevy
1895fb276e mutation_test: test_udt_mutations: use int64_t constants for long_type
Otherwise they are decomposed and serialized as 4-byte int32.

For example, on my machine cell[1] looked like this:
{0002, atomic_cell{0000000310600000;ts=0;expiry=-1,ttl=0}}

and it failed cells_equal against:
{0002, atomic_cell{0000000300000000;ts=0;expiry=-1,ttl=0}}

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-10-27 20:51:29 +02:00
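The mismatch above comes down to value width: a 4-byte int32 and an 8-byte int64 encoding of the same number are different byte strings, so cells_equal fails. A quick Python illustration (big-endian, as CQL serializes integers; the variable names are ours):

```python
import struct

value = 3
as_int32 = struct.pack(">i", value)  # 4-byte big-endian, i.e. CQL int
as_int64 = struct.pack(">q", value)  # 8-byte big-endian, i.e. CQL bigint

assert as_int32 == b"\x00\x00\x00\x03"
assert as_int64 == b"\x00\x00\x00\x00\x00\x00\x00\x03"
assert as_int32 != as_int64  # hence the cells_equal failure
```

Using int64_t constants in the test makes the literals decompose to the 8-byte form that long_type expects.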
Benny Halevy
fec772538c mutation_test: test_udt_mutations: fix end iterator in call to std::all_of
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-10-27 19:49:25 +02:00
Benny Halevy
9c8cf9f51d mutation_test: test_udt_mutations: fixup udt comment
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-10-27 19:47:43 +02:00
Benny Halevy
76581e7f14 docs/debugging.md: fix gdb command for retrieving shared libraries information
This correct command is `info sharedlibrary`.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191027153541.27286-1-bhalevy@scylladb.com>
2019-10-27 18:15:09 +02:00
Dejan Mircevski
2a136ba1bc alternator: Fix race condition in set_routes()
server::set_routes() was setting the value of server::_callbacks.
This led to a race condition, as set_routes() is invoked on every
shard simultaneously.  It is also unnecessary, since _callbacks can be
initialized in the constructor.

Fixes #5220.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-10-27 12:31:24 +02:00
Avi Kivity
27ef73f4f1 Merge "Report file I/O in CQL tracing when reading from sstables." from Kamil
"
Introduce the traced_file class which wraps a file, adding CQL trace messages before and after every operation that returns a future.
Use this file to trace reads from SSTable data and index files.

Fixes #4908.
"

* 'traced_file' of https://github.com/kbr-/scylla:
  sstables: report sstable index file I/O in CQL tracing
  sstables: report sstable data file I/O in CQL tracing
  tracing: add traced_file class
2019-10-26 22:53:37 +03:00
Avi Kivity
2b856a7317 Merge "Support non-frozen UDTs." from Kamil
"
This change allows creating tables with non-frozen UDT columns. Such columns can then have single fields modified or deleted.

I had to do some refactoring first. Please read the initial commit messages, they are pretty descriptive of what happened (read the commits in the order they are listed on my branch: https://github.com/kbr-/scylla/commits/udt, starting from kbr-@8eee36e, in order to understand them). I also wrote a bunch of documentation in the code.

Fixes #2201.
"

* 'udt' of https://github.com/kbr-/scylla: (64 commits)
  tests: too many UDT fields check test
  collection_mutation: add a FIXME.
  tests: add a non-frozen UDT materialized view test
  tests: add a UDT mutation test.
  tests: add a non-frozen UDT "JSON INSERT" test.
  tests: add a non-frozen UDT to for_each_schema_change.
  tests: more non-frozen UDT tests.
  tests: move some UDT tests from cql_query_test.cc to new file.
  types: handle trailing nulls in tuples/UDTs better.
  cql3: enable deleting single fields of non-frozen UDTs.
  cql3: enable setting single fields of a non-frozen UDT.
  cql3: enable non-frozen UDTs.
  cql3: introduce user_types::marker.
  cql3: generalize function_call::make_terminal to UDTs.
  cql3: generalize insert_prepared_json_statement::execute_set_value to UDTs.
  cql3: use a dedicated setter operation for inserting user types.
  cql3: introduce user_types::value.
  types: introduce to_bytes_opt_vec function.
  cql3: make user_types::delayed_value::bind_internal return vector<bytes_opt>.
  cql3: make cql3_type::raw_ut::to_string distinguish frozenness.
  ...
2019-10-26 22:53:37 +03:00
Piotr Sarna
657e7ef5a5 alternator: add alternator health check
The health check is performed simply by issuing a GET request
to the alternator port - it returns the following status 200
response when the server is healthy:

$ curl -i localhost:8000
HTTP/1.1 200 OK
Content-Type: text/plain
Content-Length: 23
Server: Seastar httpd
Date: 21 Oct 2019 12:55:33 GMT

healthy: localhost:8000

This commit comes with a test.
Fixes #5050
Message-Id: <3050b3819661ee19640c78372e655470c1e1089c.1571921618.git.sarna@scylladb.com>
2019-10-26 18:14:18 +03:00
Botond Dénes
01e913397a tests: memtable_test: flush_reader_test: compare compacted mutations
To filter out artificial differences due to different representation of
an equivalent set of writes.

Fixes: #5207

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191024103718.29266-1-bdenes@scylladb.com>
2019-10-26 18:14:18 +03:00
Kamil Braun
432ef7c9af sstables: report sstable index file I/O in CQL tracing
Use tracing::make_traced_file when reading from the index file in
index_reader.
2019-10-25 14:10:28 +02:00
Kamil Braun
394c36835a sstables: report sstable data file I/O in CQL tracing
Use tracing::make_traced_file when creating an sstable input_stream.
To achieve that, trace_state needs to be plumbed down through some
functions.
2019-10-25 14:10:28 +02:00
Kamil Braun
a8c9d1206a tracing: add traced_file class
This is a thin wrapper over the `seastar::file` class which adds
CQL trace messages before and after I/O operations.
2019-10-25 14:10:24 +02:00
Kamil Braun
2889edea3e tests: too many UDT fields check test 2019-10-25 12:05:10 +02:00
Kamil Braun
adfc04ebec collection_mutation: add a FIXME.
We could use iterators over cells instead of a vector of cells
in collection_mutation(_view)_description. Then some use cases could
provide iterators that construct the cells "on the fly".
2019-10-25 12:05:10 +02:00
Kamil Braun
45d2a96980 tests: add a non-frozen UDT materialized view test 2019-10-25 12:05:10 +02:00
Kamil Braun
e0c233ede1 tests: add a UDT mutation test. 2019-10-25 12:05:08 +02:00
Kamil Braun
a21d12faae tests: add a non-frozen UDT "JSON INSERT" test. 2019-10-25 12:04:44 +02:00
Kamil Braun
ae3464da45 tests: add a non-frozen UDT to for_each_schema_change. 2019-10-25 12:04:44 +02:00
Kamil Braun
b87b700e66 tests: more non-frozen UDT tests. 2019-10-25 12:04:44 +02:00
Kamil Braun
474742ac5d tests: move some UDT tests from cql_query_test.cc to new file. 2019-10-25 12:04:44 +02:00
Kamil Braun
612de1f4e3 types: handle trailing nulls in tuples/UDTs better.
Comparing user types after adding new fields was bugged.
In the following scenario:

create type ut (a int);
create table cf (a int primary key, b frozen<ut>);
insert into cf (a, b) values (0, (0));
alter type ut add b int;
select * from cf where b = {a:0,b:null};

the row with a = 0 should be returned, even though the value stored in the database is shorter
(by one null) than the value given by the user. Until now it wouldn't
have been.
2019-10-25 12:04:44 +02:00
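The fix described above amounts to padding the shorter tuple with trailing nulls before comparing. A simplified Python sketch of that equality rule (illustrative only, not Scylla's comparator):

```python
def udt_equal(a, b):
    # Pad the shorter tuple with trailing nulls (None) before comparing,
    # so a value written before "ALTER TYPE ... ADD" still matches one
    # written after the type gained a field.
    n = max(len(a), len(b))
    pad = lambda v: tuple(v) + (None,) * (n - len(v))
    return pad(a) == pad(b)

stored = (0,)        # written when ut had only field a
queried = (0, None)  # {a:0, b:null} after "alter type ut add b int"
assert udt_equal(stored, queried)
```

With this rule the SELECT in the example finds the row written before the ALTER TYPE.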
Kamil Braun
1a9034e38a cql3: enable deleting single fields of non-frozen UDTs.
This was already possible by setting the field to null, but now it
supports the DELETE syntax.
2019-10-25 12:04:44 +02:00
Kamil Braun
4d271051dd cql3: enable setting single fields of a non-frozen UDT.
The commit introduces the necessary modifications to the grammar,
a set_field raw operation, and a setter_by_field operation.
2019-10-25 12:04:44 +02:00
Kamil Braun
e74b5deb5d cql3: enable non-frozen UDTs.
Add a cluster feature for non-frozen UDTs.

If the cluster supports non-frozen UDTs, do not return an error
message when trying to create a table with a non-frozen user type.
2019-10-25 12:04:44 +02:00
Kamil Braun
7ac7a3994d cql3: introduce user_types::marker.
cql3::user_types::marker is a dedicated cql3::abstract_marker for user
type placeholders in prepared CQL queries. When bound, it returns a
user_types::value.
2019-10-25 12:04:44 +02:00
Kamil Braun
36999c94f4 cql3: generalize function_call::make_terminal to UDTs.
Use the dedicated user_types::value.
There is no way this code can be executed now, so I left a TODO.
2019-10-25 12:04:44 +02:00
Kamil Braun
49a7461345 cql3: generalize insert_prepared_json_statement::execute_set_value to UDTs.
For user types, use their dedicated setter and value.
2019-10-25 12:04:44 +02:00
Kamil Braun
40f9ce2781 cql3: use a dedicated setter operation for inserting user types.
cql3::user_types::setter is a dedicated cql3::operation
for inserting and updating user types. It handles the multi-cell
(non-frozen) case.
2019-10-25 12:04:44 +02:00
Kamil Braun
51be1e3e9d cql3: introduce user_types::value.
This is a dedicated multi_item_terminal for user type values.
Will be useful in future commits.
2019-10-25 12:04:44 +02:00
Kamil Braun
abe6c2d3d2 types: introduce to_bytes_opt_vec function.
It converts a vector<bytes_view_opt> to a vector<bytes_opt>.
Used in a bunch of places.
2019-10-25 12:04:44 +02:00
Kamil Braun
8ff2aebd76 cql3: make user_types::delayed_value::bind_internal return vector<bytes_opt>.
Previously it returned vector<cql3::raw_value>, even though we don't use
unset values when setting a UDT value (fields that are not provided
become nulls; that's how C* does it).
This simplifies future implementation of user_types::{value, setter}.
2019-10-25 12:04:44 +02:00
Kamil Braun
f0a3af6adc cql3: make cql3_type::raw_ut::to_string distinguish frozenness.
This is used in error messages and may be useful.
2019-10-25 12:04:44 +02:00
Kamil Braun
c89de228e3 cql3: generalize some error messages to UDTs 2019-10-25 12:04:44 +02:00
Kamil Braun
fd3bc27418 cql3: disallow non-frozen UDTs when creating secondary indexes 2019-10-25 12:04:44 +02:00
Kamil Braun
ff0bd0bb7a cql3: check for nested non-frozen UDTs in create_type_statement. 2019-10-25 12:04:44 +02:00
Kamil Braun
adf857e9ed cql3: add cql3_type::is_user_type.
This will be used in future commits.
2019-10-25 12:04:44 +02:00
Kamil Braun
6ccb1ee19f cql3: generalize create_table_statement::raw_statement::prepare to UDTs.
Check for UDT with nested non-frozen collection.
Check for UDT with COMPACT STORAGE.
Check for UDT inside PRIMARY KEY.
2019-10-25 12:04:44 +02:00
Kamil Braun
a8c7670722 types: add multi_cell field to user_type_impl.
is_value_compatible_with_internal and update_user_type were generalized
to the non-frozen case.

For now, all user_type_impls in the code are non-multi-cell (frozen).
This will be changed in future commits.
2019-10-25 12:04:44 +02:00
Kamil Braun
b904d04925 cql3: add a TODO to implement column_conditions for UDTs.
This will become relevant after LWT is implemented.
2019-10-25 12:04:44 +02:00
Kamil Braun
44534a4a0a sstables: generalize some comments to UDTs. 2019-10-25 12:04:44 +02:00
Kamil Braun
b38b8af0f2 schema: generalize compound_name to UDTs. 2019-10-25 12:04:44 +02:00
Kamil Braun
270cf2b289 query-result-set: generalize result_set_builder to UDTs. 2019-10-25 12:04:44 +02:00
Kamil Braun
2ada219f2c view: generalize create_virtual_column and maybe_make_virtual to UDTs. 2019-10-25 12:04:44 +02:00
Kamil Braun
574e1cd514 tests: generalize timestamp_based_spliiting_writer and bucket_writer to UDTs. 2019-10-25 12:04:44 +02:00
Kamil Braun
6da89e40df tests: generalize random_schema.cc:generate_collection to UDTs. 2019-10-25 12:04:44 +02:00
Kamil Braun
0fbfb67cbb tests: generalize mutation_test.cc summaries to UDTs. 2019-10-25 12:04:44 +02:00
Kamil Braun
a3a2f65fbf types: generalize serialize_for_cql to UDTs.
Also introduces a helper "linearized" function, which implements
a pattern occurring in all serialize_for_cql_aux functions.
2019-10-25 12:04:44 +02:00
Kamil Braun
05d4b2e1a4 tests: generalize data_model.cc:mutation_description::build to UDTs. 2019-10-25 12:04:44 +02:00
Kamil Braun
338fde672a mp_row_consumer: generalize consume_cell (kl) and consume_column (mc) to UDTs. 2019-10-25 12:04:44 +02:00
Kamil Braun
5e447e3250 mutation_partition_view: generalize read_collection_cell to UDTs. 2019-10-25 12:04:44 +02:00
Kamil Braun
90927c075a converting_mutation_partition_applier: generalize accept_cell to UDTs. 2019-10-25 12:04:42 +02:00
Kamil Braun
d9baff0e4b collection_mutation: generalize collection_mutation.cc:difference to UDTs. 2019-10-25 10:49:19 +02:00
Kamil Braun
a344019b25 collection_mutation: generalize collection_mutation_view::last_update to UDTs. 2019-10-25 10:49:19 +02:00
Kamil Braun
691f00408d collection_mutation: generalize merge to UDTs. 2019-10-25 10:49:19 +02:00
Kamil Braun
7f5cd8e8ce collection_mutation: generalize collection_mutation_view_description::materialize to UDTs. 2019-10-25 10:49:19 +02:00
Kamil Braun
20b42b1155 collection_mutation: generalize collection_mutation_view::is_any_live to UDTs. 2019-10-25 10:49:19 +02:00
Kamil Braun
323370e4ba collection_mutation: generalize deserialize_collection_mutation to UDTs. 2019-10-25 10:49:19 +02:00
Kamil Braun
393974df3b cql3: make {lists,maps,sets}::value::from_serialized take const {}_type&.
This will simplify the code a bit where from_serialized is used
after switching to visitors. Also reduces the number of shared_ptr
copies.
2019-10-25 10:49:19 +02:00
Kamil Braun
4327bba0db types: introduce (de)serialize_field_index functions.
These functions are used to translate field indices, which are used to
identify fields inside UDTs, from/to a serialized representation to be
stored inside sstables and mutations.
They do it in a way that is compatible with C*.
2019-10-25 10:49:19 +02:00
Kamil Braun
90d05eb627 cql3: reject too long user-defined types 2019-10-25 10:49:19 +02:00
Kamil Braun
0f8f950b74 cql3: optimize multi_item_terminal::get_elements().
Now it returns const std::vector<bytes_opt>& instead of
std::vector<bytes_opt>.
2019-10-25 10:49:19 +02:00
Kamil Braun
4374982de0 types: collection_type_impl::to_value becomes serialize_for_cql.
The purpose of collection_type_impl::to_value was to serialize a
collection for sending over CQL. The corresponding function in origin
is called serializeForNativeProtocol, but the name is a bit lengthy,
so I settled for serialize_for_cql.

The method now became a free-standing function, using the visit
function to perform a dispatch on the collection type instead
of a virtual call. This also makes it easier to generalize it to UDTs
in future commits.

Remove the old serialize_for_native_protocol with a FIXME: implement
inside. It was already implemented (to_value), just called differently.

Remove dead methods: enforce_limit and serialized_values. The
corresponding methods in C* are auxiliary methods used inside
serializeForNativeProtocol. In our case, the entire algorithm
is wholly written in serialize_for_cql.
2019-10-25 10:49:19 +02:00
Kamil Braun
e5c0a992ef cql3: make cql3_type::raw::to_string private.
It only needs to be used in operator<<, which is a friend
of cql3_type::raw.
2019-10-25 10:42:58 +02:00
Kamil Braun
ff4d857a9d cql3: remove a dynamic_pointer_cast to user_type_impl.
There exists a method to check if something is a user type:
is_user_type(); use it instead.
2019-10-25 10:42:58 +02:00
Kamil Braun
d8f8908d34 types: introduce user_type_impl::idx_of_field method.
Each field of a user type has its index inside the type.
This method allows finding it easily, which is needed in a bunch of
places.
2019-10-25 10:42:58 +02:00
Kamil Braun
c77643a345 cql3: make cql3_type::_frozen protected. Add is_frozen() method.
No one modifies _frozen from the outside.
Moving the field to `protected` makes it harder to introduce bugs.
2019-10-25 10:42:58 +02:00
Kamil Braun
d83ebe1092 collection_mutation: move collection_type_impl::difference to collection_mutation.hh. 2019-10-25 10:42:58 +02:00
Kamil Braun
7e3bbe548c collection_mutation: move collection_type_impl::merge to collection_mutation.hh. 2019-10-25 10:42:58 +02:00
Kamil Braun
a41277a7cd collection_mutation: move collection_type_impl::last_update to collection_mutation_view 2019-10-25 10:42:58 +02:00
Kamil Braun
30802f5814 collection_mutation: move collection_type_impl::is_any_live to collection_mutation_view 2019-10-25 10:42:58 +02:00
Kamil Braun
e16ba76c2e collection_mutation: move collection_type_impl::is_empty to collection_mutation_view. 2019-10-25 10:42:58 +02:00
Kamil Braun
bbdb438d89 collection_mutation: easier (de)serialization of collection_mutation(s).
`collection_type_impl::serialize_mutation_form`
became `collection_mutation(_view)_description::serialize`.

Previously callers had to cast their data_type down to collection_type
to use serialize_mutation_form. Now it's done inside `serialize`.
In the future `serialize` will be generalized to handle UDTs.

`collection_type_impl::deserialize_mutation_form`
became a free-standing function `deserialize_collection_mutation`
with similar benefits. Actually, no one needs to call this function
manually because of the next paragraph.

A common pattern consisting of linearizing data inside a `collection_mutation_view`
followed by calling `deserialize_mutation_form` has been abstracted out
as a `with_deserialized` method inside collection_mutation_view.

serialize_mutation_form_only_live was removed,
because it hadn't been used anywhere.
2019-10-25 10:42:58 +02:00
Kamil Braun
e4101679e4 collection_mutation: generalize constructor of collection_mutation to abstract_type.
The constructor doesn't use anything specific to collection_type_impl.
In the future it will also handle non-frozen user types.
2019-10-25 10:42:58 +02:00
Kamil Braun
b1d16c1601 types: move collection_type_impl::mutation(_view) out of collection_type_impl.
collection_type_impl::mutation became collection_mutation_description.
collection_type_impl::mutation_view became collection_mutation_view_description.
These classes now reside inside collection_mutation.hh.

Additional documentation has been written for these classes.

Related function implementations were moved to collection_mutation.cc.

This makes it easier to generalize these classes to non-frozen UDTs in future commits.
The new names (together with documentation) better describe their purpose.
2019-10-25 10:19:45 +02:00
Kamil Braun
c0d3e6c773 atomic_cell: move collection_mutation(_view) to a new file.
The classes 'collection_mutation' and 'collection_mutation_view'
were moved to a separate header, collection_mutation.hh.

Implementations of functions that operate on these classes,
including some methods of collection_type_impl, were moved
to a separate compilation unit, collection_mutation.cc.

This makes it easier to modify these structures in future commits
in order to generalize them for non-frozen User Defined Types.

Some additional documentation has been written for collection_mutation.
2019-10-25 10:19:45 +02:00
Kamil Braun
c90ea1056b Remove mutation_partition_applier.
It had been replaced by partition_builder
in commit dc290f0af7.
2019-10-25 10:19:45 +02:00
Asias He
f32ae00510 gossip: Limit number of pending gossip ACK2 messages
Similar to "gossip: Limit number of pending gossip ACK messages", limit
the number of pending gossip ACK2 messages in gossiper::handle_ack_msg.

Fixes #5210
2019-10-25 12:44:28 +08:00
Asias He
15148182ab gossip: Limit number of pending gossip ACK messages
In a cross-dc large cluster, the receiver node of the gossip SYN message
might be slow to send the gossip ACK message. The ack messages can be
large if the payload of the application state is big, e.g.,
CACHE_HITRATES with a lot of tables. As a result, the unlimited ACK
message can consume unlimited amount of memory which causes OOM
eventually.

To fix, this patch queues the SYN message and handles it later if the
previous ACK message is still being sent. However, we only store the
latest SYN message. Since the latest SYN message from peer has the
latest information, so it is safe to drop the previous SYN message and
keep the latest one only. After this patch, there can be at most 1
pending SYN message and 1 pending ACK message per peer node.

Fixes #5210
2019-10-25 12:44:28 +08:00
Nadav Har'El
8bffb800e1 alternator: Use system_auth.roles for alternator authorization
Merged patch series from Piotr Sarna:

This series couples system_auth.roles with authorization routines
in alternator. The `salted_hash` field, which is every user's hashed
password, is used as a secret key for the signature generation
in alternator.
This series also adds related expiration verifications for alternator
signatures.
It also comes with more test cases and docs updates.

Tests: alternator(local, remote), manual

Piotr Sarna (11):
  alternator: add extracting key from system_auth.roles
  alternator: futurize verify_signature function
  alternator: move the api handler to a separate function
  alternator: use keys from system_auth.roles for authorization
  alternator: add key cache to authorization
  alternator-test: add a wrong password test
  alternator: verify that the signature has not expired
  alternator: add additional datestamp verification
  alternator-test: add tests for expired signatures
  docs: update alternator entry for authorization
  alternator-test: add authorization to README

 alternator-test/conftest.py                |   2 +-
 alternator-test/test_authorization.py      |  44 ++++++++-
 alternator-test/test_describe_endpoints.py |   2 +-
 alternator/auth.hh                         |  15 ++-
 alternator/server.hh                       |  10 +-
 alternator/auth.cc                         |  62 +++++++++++-
 alternator/server.cc                       | 106 ++++++++++++---------
 alternator-test/README.md                  |  28 ++++++
 docs/alternator/alternator.md              |   7 +-
 9 files changed, 221 insertions(+), 55 deletions(-)
2019-10-23 20:51:08 +03:00
Tomasz Grabiec
e621db591e Merge "Fix TTL serialization breakage" from Avi
Commit 93270dd changed gc_clock to be 64-bit, to fix the Y2038
problem. While 64-bit tombstone::deletion_time is serialized in a
compatible way, TTLs (gc_clock::duration) were not.

This patchset reverts TTL serialization to the 32-bit serialization
format, and also allows opting-in to the 64-bit format in case a
cluster was installed with the broken code. Only Scylla 3.1.0 is
vulnerable.

Fixes #4855

Tests: unit (dev)
2019-10-23 18:23:26 +02:00
Tomasz Grabiec
71720be4f7 Merge "storage_service: Reject nodetool cleanup when there are pending ranges" from Asias
From Shlomi:

4 node cluster Node A, B, C, D (Node A: seed)
cassandra-stress write n=10000000 -pop seq=1..10000000 -node <seed-node>
cassandra-stress read duration=10h -pop seq=1..10000000 -node <seed-node>
while read is progressing
Node D: nodetool decommission
Node A: nodetool status node - wait for UL
Node A: nodetool cleanup (while decommission progresses)

I get the error on c-s once decommission ends
  java.io.IOException: Operation x0 on key(s) [383633374d31504b5030]: Data returned was not validated
The problem is that when a node gets new ranges (e.g., a bootstrapping node, or the
existing nodes after a node is removed or decommissioned), nodetool cleanup will
remove data within the new ranges which the node just got from other nodes.

To fix, we should reject the nodetool cleanup when there are pending ranges on that node.

Note, rejecting nodetool cleanup is not full protection because new ranges
can be assigned to the node while cleanup is still in progress. However, it is
a good start to reject until we have a full protection solution.

Refs: #5045
2019-10-23 17:45:41 +02:00
Avi Kivity
2970578677 config: add configuration option for 3.1.0 heritage clusters
Scylla 3.1.0 broke the serialization format for TTLs. Later versions
corrected it, but if a cluster was originally installed as 3.1.0,
it will use the broken serialization forever. This configuration option
allows upgrades from 3.1.0 to succeed, by enabling the broken format
even for later versions.
2019-10-23 18:36:35 +03:00
Avi Kivity
bf4c319399 gc_clock, serialization: define new serialization for gc_clock::duration (aka TTLs)
Scylla 3.1.0 inadvertently changed the serialization format of TTLs
(internally represented as gc_clock::duration) from 32-bit to 64-bit,
as part of preparation for Y2038 (which comes earlier for TTLed cells).
This breaks mutations transported in a mixed cluster.

To fix this, we revert back to the 32-bit format, unless we're in a 3.1.0-
heritage cluster, in which case we use the 64-bit format. Overflow of
a TTL is not a concern, since TTLs are capped to 20 years by the TTL layer.
An assertion is added to verify this.

This patch only defines a variable to indicate we're in
a 3.1.0 heritage cluster, but a way to set it is left to
a later patch.
2019-10-23 18:36:33 +03:00
Avi Kivity
771e028c1a Update seastar submodule
* seastar 6bcb17c964...2963970f6b (4):
  > Merge "IPv6 scope support and network interface impl" from Calle
  > noncopyable_function: do not copy uninitialized data
  > Merge "Move smp and smp queue out of reactor" from Asias
  > Consolidate posix socket implementations
2019-10-23 16:43:02 +03:00
Piotr Sarna
472e3cb4e1 alternator-test: add authorization to README
The README paragraph informs about turning on authorization with:
   alternator-enforce-authorization: true
and has a short note on how to set up the secret key for tests.
2019-10-23 15:05:39 +02:00
Piotr Sarna
280eb28324 docs: update alternator entry for authorization
The document now mentions that secret keys are extracted
from the system_auth.roles table.
2019-10-23 15:05:39 +02:00
Piotr Sarna
ebb0af3500 alternator-test: add tests for expired signatures
The first test case ensures that expired signatures are not accepted,
while the second one checks that signatures with dates that reach out
too far into the future are also refused.
2019-10-23 15:05:39 +02:00
Piotr Sarna
a0a33ae4f3 alternator: add additional datestamp verification
The authorization signature contains both a full obligatory date header
and a shortened datestamp - an additional verification step ensures that
the shortened stamp matches the full date.
2019-10-23 15:05:39 +02:00
Piotr Sarna
718cba10a1 alternator: verify that the signature has not expired
AWS signatures have a 15min expiration policy. For compatibility,
the same policy is applied for alternator requests. The policy also
ensures that signatures extending more than 15 minutes into the future
are treated as unsafe and thus not accepted.
2019-10-23 15:05:39 +02:00
Piotr Sarna
e90c4a8130 alternator-test: add a wrong password test
The additional test case submits a request as a user that is expected
to exist (in the local setup), but the provided password is incorrect.
It also updates test_wrong_key_access so it uses an empty string
for trying to authenticate as a nonexistent user - in order to cover
more corner cases.
2019-10-23 15:05:39 +02:00
Piotr Sarna
524b03dea5 alternator: add key cache to authorization
In order to avoid fetching keys from system_auth.roles system table
on every request, a cache layer is introduced. And in order not to
reinvent the wheel, the existing implementation of loading_cache
with max size 1024 and a 1 minute timeout is used.
2019-10-23 15:05:39 +02:00
Piotr Sarna
6dee7737d7 alternator: use keys from system_auth.roles for authorization
Instead of having a hardcoded secret key, the server now verifies
an actual key extracted from system_auth.roles system table.
This commit comes with a test update - instead of 'whatever':'whatever',
the credentials used for a local run are 'alternator':'secret_pass',
which matches the initial contents of system_auth.roles table,
which acts as a key store.

Fixes #5046
2019-10-23 15:05:39 +02:00
Piotr Sarna
388b492040 alternator: move the api handler to a separate function
The lambda used for handling the api request has grown a little bit
too large, so it's moved to a separate method. Along with it,
the callbacks are now remembered inside the class itself.
2019-10-23 15:05:39 +02:00
Piotr Sarna
a93cf12668 alternator: futurize verify_signature function
The verify_signature utility will later be coupled with Scylla
authorization. In order to prepare for that, it is first transformed
into a function that returns future<>, and it also becomes a member
of class server. The reason for making it a member function is that
it will make it easier to implement a server-local key cache.
2019-10-23 15:05:39 +02:00
Piotr Sarna
dc310baa2d alternator: add extracting key from system_auth.roles
As a first step towards coupling alternator authorization with Scylla
authorization, a helper function for extracting the key (salted_hash)
belonging to the user is added.
2019-10-23 15:05:39 +02:00
Asias He
f876580740 storage_service: Reject nodetool cleanup when there are pending ranges
From Shlomi:

4 node cluster Node A, B, C, D (Node A: seed)
cassandra-stress write n=10000000 -pop seq=1..10000000 -node <seed-node>
cassandra-stress read duration=10h -pop seq=1..10000000 -node <seed-node>
while read is progressing
Node D: nodetool decommission
Node A: nodetool status node - wait for UL
Node A: nodetool cleanup (while decommission progresses)

I get the error on c-s once decommission ends
  java.io.IOException: Operation x0 on key(s) [383633374d31504b5030]: Data returned was not validated

The problem is that when a node gets new ranges (e.g., a bootstrapping node, or the
existing nodes after a node is removed or decommissioned), nodetool cleanup will
remove data within the new ranges which the node just got from other nodes.

To fix, we should reject the nodetool cleanup when there are pending ranges on that node.

Note, rejecting nodetool cleanup is not full protection because new ranges
can be assigned to the node while cleanup is still in progress. However, it is
a good start to reject until we have a full protection solution.

Refs: #5045
2019-10-23 19:20:36 +08:00
Asias He
a39c8d0ed0 Revert "storage_service: remove storage_service::_is_bootstrap_mode."
It will be needed by "storage_service: Reject nodetool cleanup when
there are pending ranges"

This reverts commit dbca327b46.
2019-10-23 19:20:36 +08:00
Raphael S. Carvalho
fc120a840d compaction: don't rely on undefined behavior when making garbage collected writer
Argument evaluation order is UB, so it's not guaranteed that
c->make_garbage_collected_sstable_writer() is called before
compaction is moved to run().

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20191023052647.9066-1-raphaelsc@scylladb.com>
2019-10-23 11:04:51 +03:00
Benny Halevy
3b3611b57a mutation_diff: standard input support
Also, note that the file name is now properly quoted, as
it may contain space characters.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-10-23 08:29:58 +03:00
Benny Halevy
6feb4d5207 mutation_diff: accept diff_command option
To support using other diff tools than colordiff

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-10-23 08:29:47 +03:00
Tomasz Grabiec
dfac542466 Merge "extend multi-cell list & set type support" from Kostja
Make it possible to compare multi-cell lists and sets serialized
as maps with literal values and serialize them to network using
a standard format (vector of values).

This is a pre-requisite patch for column condition evaluation
in light-weight transactions.
2019-10-23 07:39:57 +03:00
Nadav Har'El
774f8aa4b8 docs/debugging.md: add guide on how to debug cores
Merged patch series from Botond Dénes:

This series extends the existing docs/debugging.md with a detailed guide
on how to debug Scylla coredumps. The intended target audience is
developers who are debugging their first core, hence the level of
details (hopefully enough). That said this should be just as useful for
seasoned debuggers just quickly looking up some snippet they can't
remember exactly. A Troubleshooting chapter is also added in this
series for commonly-met problems.

I decided to create this guide after myself having struggled for more
than a day on just opening(!) a coredump that was produced on Ubuntu.
As my main source, I used the How-to-debug-a-coredump page from the
internal wiki, which contains much useful information on debugging
coredumps. However, I found it to be missing some crucial information, as
well as being very terse, thus primarily useful for experienced
debuggers who can fill in the blanks. The reason I'm not extending said
wiki page is that I think this information should not be hidden in some
internal wiki page. Also, docs/debugging.md now seems to be a much
better base for such a document. This document was started as a
comprehensive debugging manual for beginners (but not just).

You will notice that the information on how to debug cores from
CentOS/Redhat is quite sparse. This is because I have no experience
with such cores, so for now the respective chapters are just stubs. I
intend to complete them in the future after having gained the necessary
experience and knowledge, however those being in possession of said
knowledge are more than welcome to send a patch. :)

Botond Dénes (4):
	docs/debugging.md: demote 'Starting GDB' and 'Using GDB'
	docs/debugging.md: fix formatting issues
	docs/debugging.md: add 'Debugging coredumps' subchapter
	docs/debugging.md: add 'Troubleshooting' subchapter

  docs/debugging.md | 240 +++++++++++++++++++++++++++++++++++++++++++---
  1 file changed, 228 insertions(+), 12 deletions(-)
2019-10-23 07:39:57 +03:00
Rafael Ávila de Espíndola
b3372be679 install-dependencies: Add Lua
Add lua as a dependency in preparation for UDF. This patch goes in
first to allow for a frozen toolchain update.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
[avi: update frozen toolchain image]
Message-Id: <20191018231442.11864-2-espindola@scylladb.com>
2019-10-23 07:39:57 +03:00
Konstantin Osipov
a30c08e04e lwt: support for multi-cell set & list value serialization 2019-10-22 17:40:42 +03:00
Piotr Jastrzebski
eb8ae06ced cdc: Return db_context::builder by reference
from it's with_* functions.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-22 17:13:43 +03:00
Konstantin Osipov
605755e3f6 lwt: support for multi-cell map & list comparison with literal values
Multi-cell lists and maps may be stored in different formats: as sorted
vectors of pairs of values, when retrieved from storage, or as sorted
vectors of values, when created from parser literals or supplied as
parameter values.

Implement a specialized compare for use when receiver and parameter
representation don't match.

Add helpers.
2019-10-22 17:07:33 +03:00
Raphael S. Carvalho
3b6583990d sstables: Fix sluggish backlog controller with incremental compaction
The problem is that backlog tracker is not being updated properly after
incremental compaction.
When replacing sstables earlier, we tell backlog tracker that we're done
with exhausted sstables[1], but we *don't* tell it about the new, sealed
sstables created that will replace the exhausted ones.
[1]: an exhausted sstable is one that can be replaced early by compaction.
We need to notify backlog tracker about every sstable replacement which
was triggered by incremental compaction.
Otherwise, the backlog for a table that enables incremental compaction will
be lower than it actually should be. That's because new sstables being
tracked as partial decrease the backlog, whereas the exhausted ones
increase it.
The formula for a table's backlog is basically:
backlog(sstable set + compacting(1) - partial(2))
(1) compacting includes all compaction's input sstables, but the
exhausted ones are removed from it (correct behavior).
(2) partial includes all compaction's output sstables, but the ones
that replaced the exhausted sstables aren't removed from it (incorrect
behavior).
This problem is fixed by making the backlog tracker *fully* aware of the early
replacement, not only the exhausted sstables, but also the new sstables
that replaced the exhausted ones. The new sstables need to be moved
inside the tracker from partial state to the regular one.

Fixes #5157.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20191016002838.23811-1-raphaelsc@scylladb.com>
2019-10-22 16:19:57 +03:00
Vladimir Davydov
6c6689f779 cql: refactor statement accounting
Rather than passing a pointer to a cql_stats member corresponding to
the statement type, pass a reference to a cql_stats object and use
statement_type, which is already stored in modification_statement, for
determining which counter to increment. This will allow us to account
conditional statements, which will have a separate set of counters,
right in modification_statement::execute() - all we'll need to do is
add the new counters and bump them in case execute_with_condition is
called.

While we are at it, remove extra inclusions from statement_type.hh so as
not to introduce any extra dependencies for cql_stats.hh users.

Message-Id: <20191022092258.GC21588@esperanza>
2019-10-22 12:39:14 +03:00
Nadav Har'El
51fc6c7a8e make static_row optional to reduce memory footprint
Merged patch series from Avi Kivity:

The static row can be rare: many tables don't have them, and tables
that do will often have mutations without them (if the static row
is rarely updated, it may be present in the cache and in readers,
but absent in memtable mutations). However, it always consumes ~100
bytes of memory, even if it is not present, due to the row's overhead.

Change it to be optional by allocating it as an external object rather
than inlined into mutation_partition. This adds overhead when the
static row is present (17 bytes for the reference, back reference,
and lsa allocator overhead).

perf_simple_query appears to be marginally (2%) faster. Footprint is
reduced by ~9% for a cache entry, 12% in memtables. More details are
provided in the patch commitlog.

Tests: unit (debug)

Avi Kivity (4):
  managed_ref: add get() accessor
  managed_ref: add external_memory_usage()
  mutation_partition: introduce lazy_row
  mutation_partition: make static_row optional to reduce memory
    footprint

 cell_locking.hh                          |   2 +-
 converting_mutation_partition_applier.hh |   4 +-
 mutation_partition.hh                    | 284 ++++++++++++++++++++++-
 partition_builder.hh                     |   4 +-
 utils/managed_ref.hh                     |  12 +
 flat_mutation_reader.cc                  |   2 +-
 memtable.cc                              |   2 +-
 mutation_partition.cc                    |  45 +++-
 mutation_partition_serializer.cc         |   2 +-
 partition_version.cc                     |   4 +-
 tests/multishard_mutation_query_test.cc  |   2 +-
 tests/mutation_source_test.cc            |   2 +-
 tests/mutation_test.cc                   |  12 +-
 tests/sstable_mutation_test.cc           |  10 +-
 14 files changed, 355 insertions(+), 32 deletions(-)
2019-10-22 12:25:15 +03:00
Avi Kivity
bc03b0fd47 Merge "Some refactoring of node startup code" from Kamil
"
The node startup code (in particular the functions storage_service::prepare_to_join and storage_service::join_token_ring) is complicated and hard to understand.

This patch set aims to simplify it at least a bit by removing some dead code, moving code around so it's easier to understand and adding some comments that explain what the code does.
I did it to help me prepare for implementing generation and gossiping of CDC streams.
"

* 'bootstrap-refactors' of https://github.com/kbr-/scylla:
  storage_service: more comments in join_token_ring
  db: remove system_keyspace::update_local_tokens
  db: improve documentation for update_tokens and get_saved_tokens in system_keyspace
  storage_service: remove storage_service::_is_bootstrap_mode.
  storage_service: simplify storage_service::bootstrap method
  storage_service: fix typo in handle_state_moving
  storage_service: remove unnecessary use of stringstream
  storage_service: remove redundant call to update_tokens during join_token_ring
  storage_service: remove storage_service::set_tokens method.
  storage_service: remove is_survey_mode
  storage_service::handle_state_normal: tokens_to_update* -> owned_tokens
  storage_service::handle_state_normal: remove local_tokens_to_remove
  db::system_keyspace::update_tokens: take tokens by const ref
  db::system_keyspace::prepare_tokens: make static, take tokens by const ref
  token_metadata::update_normal_tokens: take tokens by const ref
2019-10-22 12:11:11 +03:00
Asias He
0a52ecb6df gossip: Fix max generation drift measure
Assume n1 and n2 in a cluster with generation number g1, g2. The
cluster runs for more than 1 year (MAX_GENERATION_DIFFERENCE). When n1
reboots with generation g1' which is time based, n2 will see
g1' > g2 + MAX_GENERATION_DIFFERENCE and reject n1's gossip update.

To fix, check the generation drift with generation value this node would
get if this node were restarted.

This is a backport of CASSANDRA-10969.

Fixes #5164
2019-10-21 20:20:55 +02:00
Kamil Braun
f1c26bf5c9 storage_service: more comments in join_token_ring
Explain why a call to update_normal_tokens is needed.
2019-10-21 11:11:03 +02:00
Kamil Braun
fb1e35f032 db: remove system_keyspace::update_local_tokens
That was dead code.
2019-10-21 11:11:03 +02:00
Kamil Braun
1b0c8e5d99 db: improve documentation for update_tokens and get_saved_tokens in system_keyspace 2019-10-21 11:11:03 +02:00
Kamil Braun
dbca327b46 storage_service: remove storage_service::_is_bootstrap_mode.
The flag did nothing. It was used in one place to check if there's a
bug, but it can easily be proven by reading the code that the check
would never pass.
2019-10-21 11:11:03 +02:00
Kamil Braun
b757a19f84 storage_service: simplify storage_service::bootstrap method
The storage_service::bootstrap method took a parameter: tokens to
bootstrap with. However, this method is only called in one place
(join_token_ring) with only one parameter: _bootstrap_tokens. It doesn't
make sense to call this method anywhere else with any other parameter.

This commit also adds a comment explaining what the method does and
moves it into the private section of storage_service.
2019-10-21 11:11:03 +02:00
Kamil Braun
84b41bd89b storage_service: fix typo in handle_state_moving 2019-10-21 11:11:03 +02:00
Kamil Braun
2ff4f9b8f4 storage_service: remove unnecessary use of stringstream 2019-10-21 11:11:03 +02:00
Kamil Braun
06cc7d409d storage_service: remove redundant call to update_tokens during join_token_ring
When a non-seed node was bootstrapping, system_keyspace::update_tokens
was called twice: first right after the tokens were generated (or
received if we were replacing a different node) in the call to
`bootstrap`, and then later in join_token_ring. The second call was
redundant.

The join_token_ring call was also redundant if we were not bootstrapping
and had tokens saved previously (e.g. when restarting). In that case we
would have read them from LOCAL and then save the same tokens again.

This commit removes the redundant call and inserts calls to
update_tokens where they are necessary, when new tokens are generated.
The aim is to make the code easier to understand.

It also adds a comment which explains why the tokens don't need to be
generated in one of the cases.
2019-10-21 11:11:03 +02:00
Kamil Braun
a223864f81 storage_service: remove storage_service::set_tokens method.
After commit 36ccf72f3c, this method
was used only in one place.
Its name did not make it obvious what it does and when it is safe to call it.
This commit pulls out the code from set_tokens to the point where it was
called (join_token_ring). The code can only be understood in context.

This code was also saving the tokens to the LOCAL table before
retrieving them from this table again. There is no point in doing that:
1. there are no races, since when join_token_ring is running, it is the
only function which can call system_keyspace::update_tokens (which saves them to the
LOCAL table). There cannot be multiple instances of join_token_ring.
2. Even if there was a race, this wouldn't fix anything. The tokens we
retrieve from LOCAL by calling get_local_tokens().get0() could already
be different in the LOCAL table when the get0() returns.
2019-10-21 11:09:59 +02:00
Kamil Braun
36ccf72f3c storage_service: remove is_survey_mode
That was dead, untested code, making it unnecessarily hard
to implement new features.
2019-10-21 10:38:49 +02:00
Kamil Braun
602c7268cc storage_service::handle_state_normal: tokens_to_update* -> owned_tokens
Replace the two variables:
    tokens_to_update_in_metadata
    tokens_to_update_in_system_keyspace
which were exactly the same, with one variable owned_tokens.
The new name describes what the variable IS instead of what it's used for.

Add a comment to clarify what "owned" means: those are the tokens the
node chose and any collision was resolved positively for this node.

Move the variable definition further down in the code, where it's
actually needed.
2019-10-21 10:38:49 +02:00
Kamil Braun
2db07c697f storage_service::handle_state_normal: remove local_tokens_to_remove
That was dead code.
Removing tokens is handled inside remove_endpoint, using the
endpoints_to_remove set.
2019-10-21 10:38:49 +02:00
Kamil Braun
8c8a17a0fe db::system_keyspace::update_tokens: take tokens by const ref 2019-10-21 10:38:49 +02:00
Kamil Braun
00dcea3478 db::system_keyspace::prepare_tokens: make static, take tokens by const ref 2019-10-21 10:38:49 +02:00
Kamil Braun
e4ac4db1c5 token_metadata::update_normal_tokens: take tokens by const ref 2019-10-21 10:38:45 +02:00
Nadav Har'El
765dc86de4 Fix legacy token column handling for local indexes
Merged patch series from Piotr Sarna:

Calculating the select statement for given view_info structure
used to work fine, but once local indexes were introduced, a subtle
bug appeared: the legacy token column does not exist in local indexes
and a valid clustering key column was omitted instead.
That results in potentially incorrect partition slices being used later
in read-before-write.
There's a long term plan for removing select_statement from
view info altogether, but nonetheless the bug needs to be fixed first.

Branch: master, 3.1

Tests: unit(dev) + manual confirmation that a correct legacy column is picked
2019-10-20 16:04:40 +03:00
Nadav Har'El
631846a852 CDC: Implement minimal version that logs only primary key of each change
Merge a patch series from Piotr Jastrzębski (haaawk):

This PR introduces CDC in its minimal version.

It is possible now to create a table with CDC enabled or to enable/disable
CDC on existing table. There is a management of CDC log and description
related to enabling/disabling CDC for a table.

For now only primary key of the changed data is logged.

To be able to co-locate cdc streams with related base table partitions it
was needed to propagate the information about the number of shards per node.
This was node through gossip.

There is an assumption that all the nodes use the same value for
sharding_ignore_msb_bits. If it does not hold we would have to gossip
sharding_ignore_msb_bits around together with the number of shards.

Fixes #4986.

Tests: unit(dev, release, debug)
2019-10-20 11:41:01 +03:00
Botond Dénes
4aa734f238 scylla-gdb.py: scylla generate_object_graph: use correct obj in edges
Currently, the function that generates the graph edges (and vertices)
with a breadth-first traversal of the object graph accidentally uses the
object that is the starting point of the graph as the "to" part of each
edge. This results in the graph having each of its edges point to the
starting point, as if all objects in it referenced said object directly.
Fix by using the currently examined object instead.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191018113019.95093-1-bdenes@scylladb.com>
2019-10-18 13:48:20 +02:00
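The fixed traversal can be sketched as follows (an illustrative Python sketch, not the actual scylla-gdb.py code; `referrers_of` is a hypothetical stand-in for the memory scan that finds referrers of an object):

```python
from collections import deque

def build_edges(start, referrers_of):
    # BFS outward from `start`, visiting the referrers of each examined
    # object. Each edge must point at the *currently examined* object --
    # the bug fixed above had every edge pointing at `start` instead.
    edges, seen, queue = [], {start}, deque([start])
    while queue:
        obj = queue.popleft()
        for referrer in referrers_of.get(obj, ()):
            edges.append((referrer, obj))  # "to" = examined object
            if referrer not in seen:
                seen.add(referrer)
                queue.append(referrer)
    return edges
```

With the bug, every emitted edge would have been `(referrer, start)`, making the whole graph appear to reference the starting object directly.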
Botond Dénes
4dff50b7a4 docs/debugging.md: add 'Troubleshooting' subchapter
To the 'Debugging Scylla with GDB' chapter.
2019-10-18 10:08:23 +03:00
Botond Dénes
77ea086975 docs/debugging.md: add 'Debugging coredumps' subchapter
To the 'Debugging Scylla with GDB' chapter. The '### Debugging
relocatable binaries built with the toolchain' subchapter is demoted to
be just a section in this new subchapter. It is also renamed to
'Relocatable binaries'.
This subchapter intends to be a complete guide on how to debug coredumps
from how to obtain the correct version of all the binaries all the way
to how to correctly open the core with GDB.
2019-10-18 10:08:23 +03:00
Pekka Enberg
f01d0e011c Update seastar submodule
* seastar e888b1df...6bcb17c9 (4):
  > iotune: don't crash in sequential read test if hitting EOF
  > Remove FindBoost.cmake from install files
  > Merge "Move reactor backend out of reactor" from Asias
  > fair_queue: Add fair_queue.cc
2019-10-18 08:45:22 +03:00
Piotr Jastrzebski
2b26e3c904 test: change test_partition_key_logging to test_primary_key_logging
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 11:28:23 +02:00
Piotr Jastrzebski
997be35ef3 modification_statement: log in cdc clustering key of a change
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 11:28:23 +02:00
Piotr Jastrzebski
d8718a4ffc test: add test_partition_key_logging
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 11:28:23 +02:00
Piotr Jastrzebski
96c800ed0b modification_statement: log in cdc partition key of a change
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 11:28:23 +02:00
Piotr Jastrzebski
a1edb68b16 test: check that alter table with cdc manages log and desc
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 11:28:23 +02:00
Piotr Jastrzebski
a45c894032 alter_table_statement: handle 'with cdc ='
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 11:28:23 +02:00
Piotr Jastrzebski
629cdb5065 test: check that drop table with cdc removes log and desc
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 11:28:23 +02:00
Piotr Jastrzebski
57c3377b1f cql_test_env: add require_table_does_not_exist assertion
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 11:28:23 +02:00
Piotr Jastrzebski
50d53cd43e drop_table_statement: remove cdc log and desc if cdc is enabled
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 11:28:23 +02:00
Piotr Jastrzebski
b9d6635fc5 test: check that create table with cdc sets up log and desc
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 11:28:23 +02:00
Piotr Jastrzebski
81a34168a3 create_table_statement: handle 'with cdc ='
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 11:28:14 +02:00
Piotr Jastrzebski
6e29f5e826 create_table_statement: prepare announce_migration for cdc
This patch wraps the announce_migration logic in a lambda
that will be used both when cdc is enabled and when it's not.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 10:55:31 +02:00
Piotr Jastrzebski
a9e43f4e86 test: add test_with_cdc_parameter
At the moment, this test only checks that table
creation and alteration sets cdc_options property
on a table correctly.

Future patches will extend this test to cover more
CDC aspects.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 10:55:31 +02:00
Piotr Jastrzebski
8c6d860402 cql3: add cdc table property
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 10:55:31 +02:00
Piotr Jastrzebski
386221da84 schema_tables: handle 'cdc' options
cdc options will be stored in scylla_tables to preserve
compatibility with Cassandra.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 10:55:31 +02:00
Piotr Jastrzebski
8df942a320 schema_builder: handle schema::_cdc_options
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 10:55:31 +02:00
Piotr Jastrzebski
ca9536a771 schema: add _cdc_options field
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 10:55:31 +02:00
Piotr Jastrzebski
f079dce7b1 snitch: Provide getter for ignore_msb_bits of an endpoint
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 10:55:31 +02:00
Piotr Jastrzebski
afe520ad77 gossip: Add application_state::IGNORE_MSB_BITS
We would like to share with other nodes
the value of ignore_msb_bits property used by the node.

This is needed because CDC will operate on
streams of changes. Each shard on each node
will have its own stream that will be identified
by a stream_id. Stream_id will be selected in
such a way that using stream_id as partition key
will locate partition identified by stream_id on
a node and shard that the stream belongs to.

To be able to generate such stream_id we need
to know ignore_msb_bits property value for each node.

IMPORTANT NOTE: At this point CDC does not support
topology changes. It will work only on a stable cluster.
Support for topology modifications will be added in
later steps.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 10:55:31 +02:00
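Why both the shard count and ignore_msb_bits are needed per node can be sketched as follows (a simplified illustration of a biased-token-to-shard mapping, not Scylla's exact arithmetic):

```python
def shard_of(token, shard_count, ignore_msb_bits):
    # Simplified sketch: bias the signed 64-bit murmur3 token to an
    # unsigned value, drop the ignored most-significant bits, then
    # scale the result into [0, shard_count).
    biased = (token + 2**63) % 2**64           # unsigned bias
    biased = (biased << ignore_msb_bits) % 2**64  # ignore top bits
    return (biased * shard_count) >> 64
```

To place a CDC stream on a given node and shard, a stream_id token must be found for which this mapping lands on the target, which is why both SHARD_COUNT and IGNORE_MSB_BITS must be known for every node.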
Piotr Jastrzebski
b9d5851830 snitch: Provide getter for shard_count of an endpoint
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 10:55:31 +02:00
Piotr Jastrzebski
a66d7cfe57 gossip: Add application_state::SHARD_COUNT
We would like to share with other nodes
the number of shards available at the node.

This is needed because CDC will operate on
streams of changes. Each shard on each node
will have its own stream that will be identified
by a stream_id. Stream_id will be selected in
such a way that using stream_id as partition key
will locate partition identified by stream_id on
a node and shard that the stream belongs to.

To be able to generate such stream_id we need
to know how many shards are on each node.

IMPORTANT NOTE: At this point CDC does not support
topology changes. It will work only on a stable cluster.
Support for topology modifications will be added in
later steps.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 10:55:31 +02:00
Piotr Jastrzebski
f7ce8e4f2b cdc: Add flag guarding its usage
At first, CDC will only be enabled when experimental flag is on.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 10:55:31 +02:00
Tomasz Grabiec
d7c3e48e8c Merge "Prepare modification_statement for LWT" from Kostja
Refactor modification_statement to enable lightweight
transaction implementation.

This patch set re-arranges the logic of
modification_statement::get_mutations() and uses
a column mask to identify the columns to prefetch.
It also pre-computes a few modification statement properties
at prepare, assuming the prepared statement is invalidated if
the underlying schema changes.
2019-10-17 10:51:00 +02:00
Konstantin Osipov
5d3bf03811 lwt: pre-compute modification_statement properties at prepare
They are used more extensively with introduction of lightweight
transactions, and pre-computing makes it easier to reason about
complexity of the scenarios where they are involved.
2019-10-16 22:44:44 +03:00
Konstantin Osipov
6e0f76ea60 lwt: use column mask to build partition_slice
Pre-compute column mask of columns to prefetch when preparing
a modification statement and use it to build partition_slice
object for read command. Fetch only the required columns.

Lightweight transactions build on this by adding the
columns used in conditions and in the CAS result set to the column
mask of columns to read. Batch statements unite all column
masks to build a single relation for all rows modified by
conditional statements of a batch.
2019-10-16 22:44:37 +03:00
Konstantin Osipov
f32a7a0763 lwt: move option set for modification statement read command
Move the option set for read command to update_parameters
class, since this class encapsulates the logic of working
with the read command result.
2019-10-16 22:41:00 +03:00
Konstantin Osipov
c0f0ab5edd lwt: introduce column mask
Introduce a bitset container which can be used to compute
all columns used in a query.

Add a partition_slice constructor which uses the bitset.
2019-10-16 22:40:55 +03:00
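The column-mask idea can be sketched as a bitset keyed by each column's ordinal id (a hedged illustration in Python; the names and interface are hypothetical, not the C++ container introduced here):

```python
class ColumnMask:
    # Sketch of a column bitset keyed by ordinal id.
    def __init__(self):
        self.bits = 0

    def set(self, ordinal_id):
        self.bits |= 1 << ordinal_id

    def test(self, ordinal_id):
        return bool(self.bits & (1 << ordinal_id))

    def union(self, other):
        # Batch statements unite the masks of all their conditional
        # statements into a single mask.
        merged = ColumnMask()
        merged.bits = self.bits | other.bits
        return merged
```

A partition_slice built from such a mask then fetches only the columns whose bits are set.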
Konstantin Osipov
a00b9a92b3 lwt: refactor modification statement get_mutations()
Refactor get_mutations() so that the read command and
apply_updates() functions can be used in lightweight transactions.

Move read_command creation to an own method, as well as apply_updates().
Rewrite get_mutations() using the new API.

Avoid unnecessary shared pointers.
2019-10-16 22:32:51 +03:00
Tomasz Grabiec
7b7e4be049 Merge "lwt: introduce column_definition::ordinal_id" from Kostja
Introduce a column definition ordinal_id and use it in boosted
update_parameters::prefetch_data as a column index of a full row.

Lightweight transactions prefetch data and return a result set.
Make sure update_parameters::prefetch_data can serve as a
single representation of prefetched list cells as well as
condition cells and as a CAS result set.

I have a lot of plans for column_definition::ordinal_id, it
simplifies a lot of operations with columns and will also be
used for building a bitset of columns used in a query
or in multiple queries of a batch.
2019-10-16 15:11:10 +02:00
Konstantin Osipov
a2b629c3a1 lwt: boost update_parameters to serve as a CAS result set
In modification_statement/batch_statement, we need to prefetch data to
1) apply list operations
2) evaluate CAS conditions
3) return CAS result set.

Boost update_parameters::prefetch_data to serve as a single result set
for all of the above. In case of a batch, store multiple rows for
multiple clustering keys involved in the batch.

Use an ordered set for columns and rows to make sure 3) CAS result set
is returned to the client in an ordered manner.

Deserialize the primary key and add it to result set rows since
it is returned to the client as part of CAS result set.

Index columns using ordinal_id - this allows having a single
set for all columns and makes columns easy to look up.

Remove an extra memcpy to build view objects when looking
up a cell by primary key, use partition_key/clustering_key
objects for lookup.
2019-10-16 15:56:50 +03:00
Konstantin Osipov
a450c25946 lwt: remove dead code in cql3/update_parameters.hh 2019-10-16 15:48:40 +03:00
Konstantin Osipov
a4ccbece5c lwt: remove an unnecessary optional around prefetch_data
Get rid of an unnecessary optional around
update_parameters::prefetch_data.

update_parameters won't own prefetch_data in the future anyway,
since prefetch_data can be shared among multiple modification
statements of a batch, each statement having its own options
and hence its own update_parameters instance.
2019-10-16 15:48:25 +03:00
Konstantin Osipov
7a399ebe0d lwt: move prefetch_data_builder to update_parameters.cc
Move prefetch_data_builder class from modification_statement.cc
to update_parameters.cc.

We're going to share the same builder to build a result set
for condition evaluation and to apply updates of batch statements, so we
need to share it.

No other changes.
2019-10-16 15:48:08 +03:00
Konstantin Osipov
fa73421198 lwt: introduce column_definition::ordinal_id
Make sure every column in the schema, be it a column of partition
key, clustering key, static or regular one, has a unique ordinal
identifier.

This makes it easy to compute the set of columns used in a query,
as well as index row cells.

Allow to get column definition in schema by ordinal id.
2019-10-16 15:46:25 +03:00
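A minimal sketch of what assigning a unique ordinal per column could look like (illustrative only; the actual ordering and names in the schema code are assumptions):

```python
def assign_ordinal_ids(pk_cols, ck_cols, static_cols, regular_cols):
    # Sketch: number every column consecutively regardless of kind, so
    # any column can index into a single flat set or row of cells.
    return {name: i for i, name in
            enumerate([*pk_cols, *ck_cols, *static_cols, *regular_cols])}
```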
Avi Kivity
543e6974b9 Merge "Fix Incremental Compaction Efficiency" from Raphael
"
Incremental compaction code to release exhausted sstables was inefficient because
it was basically preventing any release from ever happening. So a new solution is
implemented to make incremental compaction approach actually efficient while
being cautious about not introducing data resurrection. This solution consists of
storing GC'able tombstones in a temporary sstable and keeping it till the end of
compaction. Overhead is avoided by not enabling it to strategies that don't work
with runs composed of multiple fragments.

Fixes #4531.

tests: unit, longevity 1TB for incremental compaction
"

* 'fix_incremental_compaction_efficiency/v6' of https://github.com/raphaelsc/scylla:
  tests: Check that partition is not resurrected on compaction failure
  tests: Add sstable compaction test for gc-only mutation compactor consumer
  sstables: Fix Incremental Compaction Efficiency
2019-10-16 15:15:53 +03:00
Tomasz Grabiec
054b53ac06 Merge "Introduce scylla generate_object_graph and improve scylla find and scylla fiber" from Botond
Introduce `scylla generate_object_graph`, a command which generates a
visual object graph, where vertices are objects and edges are
references. The graph starts from the object specified by the user. The
graph allows visual inspection of the object graph and hopefully allows
the user to identify the object in question.

Add the `--resolve` flag to `scylla find`. When specified, `scylla find`
will attempt to resolve the first pointer in the found objects as a vtable
pointer. If successful the pointer as well as the resolved  symbol will
be added to the listing.

In the listing of `scylla fiber` also print the starting task (as the
first item).
2019-10-15 20:11:16 +02:00
Tomasz Grabiec
c76f905497 Merge "scylla-gdb.py: improve the toolbox for investigating OOMs (but not just)" from Botond
This mini-series contains assorted improvements that I found very useful
while debugging OOM crashes in the past weeks:
* A wrapper for `std::list`.
* A wrapper for `std::variant`.
* Making `scylla find` usable from python code.
* Improvements to `scylla sstables` and `scylla task_histogram`
  commands.
* The `$downcast_vptr()` convenience function.
* The `$dereference_lw_shared_ptr()` convenience function.

Convenience functions in gdb are similar to commands, with some key
differences:
* They have a defined argument list.
* They can return values.
* They can be part of any gdb expression in which functions are allowed.

This makes them very useful for performing operations on values and then
returning them, so that the developer can use the result in the gdb shell.
2019-10-15 19:54:09 +02:00
Avi Kivity
acc433b286 mutation_partition: make static_row optional to reduce memory footprint
The static row can be rare: many tables don't have them, and tables
that do will often have mutations without them (if the static row
is rarely updated, it may be present in the cache and in readers,
but absent in memtable mutations). However, it always consumes ~100
bytes of memory, even when it is not present, due to row's overhead.

Change it to be optional by using lazy_row instead of row. Some call
sites treewide were adjusted to deal with the extra indirection.

perf_simple_query appears to improve by 2%, from 163krps to 165 krps,
though it's hard to be sure due to noisy measurements.

memory_footprint comparisons (before/after):

mutation footprint:		       mutation footprint:
 - in cache:	 1096		        - in cache:	992
 - in memtable:	 854		        - in memtable:	750
 - in sstable:	 351		        - in sstable:	351
 - frozen:	 540		        - frozen:	540
 - canonical:	 827		        - canonical:	827
 - query result: 342		        - query result: 342

 sizeof(cache_entry) = 112	        sizeof(cache_entry) = 112
 -- sizeof(decorated_key) = 36	        -- sizeof(decorated_key) = 36
 -- sizeof(cache_link_type) = 32        -- sizeof(cache_link_type) = 32
 -- sizeof(mutation_partition) = 200    -- sizeof(mutation_partition) = 96
 -- -- sizeof(_static_row) = 112        -- -- sizeof(_static_row) = 8
 -- -- sizeof(_rows) = 24	        -- -- sizeof(_rows) = 24
 -- -- sizeof(_row_tombstones) = 40     -- -- sizeof(_row_tombstones) = 40

 sizeof(rows_entry) = 232	        sizeof(rows_entry) = 232
 sizeof(lru_link_type) = 16	        sizeof(lru_link_type) = 16
 sizeof(deletable_row) = 168	        sizeof(deletable_row) = 168
 sizeof(row) = 112		        sizeof(row) = 112
 sizeof(atomic_cell_or_collection) = 8  sizeof(atomic_cell_or_collection) = 8

Tests: unit (dev)
2019-10-15 15:42:05 +03:00
Avi Kivity
88613e6882 mutation_partition: introduce lazy_row
lazy_row adds indirection to the row class, in order to reduce storage requirements
when the row is not present. The intent is to use it for the static row, which is
not present in many schemas, and is often not present in writes even in schemas that
have a static row.

Indirection is done using managed_ref, which is lsa-compatible.

lazy_row implements most of row's methods, and a few more:
 - get(), get_existing(), and maybe_create(): bypass the abstraction and the
   underlying row
 - some methods that accept a row parameter also have an overload with a lazy_row
   parameter
2019-10-15 15:42:05 +03:00
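The lazy_row idea can be illustrated with a short Python sketch (a hedged analogue, not the C++ managed_ref-based implementation; method names mirror those mentioned above):

```python
class LazyRow:
    # Sketch: pay one pointer-sized slot when the static row is absent,
    # instead of a full row's ~100 bytes of overhead.
    __slots__ = ('_row',)

    def __init__(self):
        self._row = None            # absent: costs one slot only

    def maybe_create(self):
        if self._row is None:
            self._row = {}          # allocate the real row on first write
        return self._row

    def get_existing(self):
        return self._row            # may be None; caller must check
```

The trade-off is an extra indirection on access, which the perf numbers above suggest is more than paid for by the smaller footprint.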
Avi Kivity
efe8fa6105 managed_ref: add external_memory_usage()
Like other managed containers, add external_memory_usage() so we can account
for a partition's memory footprint in memtable/cache.
2019-10-15 15:41:42 +03:00
Botond Dénes
71923577a4 docs/debugging.md: fix formatting issues 2019-10-15 14:40:24 +03:00
Botond Dénes
4babd116d8 docs/debugging.md: demote 'Starting GDB' and 'Using GDB'
They really belong to the 'Introduction' chapter, instead of being
separate chapters of their own.
2019-10-15 14:40:20 +03:00
Pekka Enberg
0c1dad0838 Merge "Misc documentation cleanup" from Botond
"Delete README-DPDK.md, move IDL.md to docs/ and fix
docs/review-checklist.md to point to scylla's coding style document,
instead of seastar's."

* 'documentation-cleanup/v3' of https://github.com/denesb/scylla:
  docs/review-checklist.md: point to scylla's coding-style.md instead of seastar's
  docs: mv coding-style.md docs/
  rm README-DPDK.md
  docs: mv IDL.md docs/
2019-10-15 12:53:49 +02:00
Pekka Enberg
b466d7ee33 Merge "Misc documentation cleanup" from Botond
"Delete README-DPDK.md, move IDL.md to docs/ and fix
docs/review-checklist.md to point to scylla's coding style document,
instead of seastar's."

* 'documentation-cleanup/v3' of https://github.com/denesb/scylla:
  docs/review-checklist.md: point to scylla's coding-style.md instead of seastar's
  docs: mv coding-style.md docs/
  rm README-DPDK.md
  docs: mv IDL.md docs/
2019-10-15 08:53:22 +03:00
Benny Halevy
fef3342a34 test: random_schema::make_ckeys: fix infinite loop
Allow returning fewer random clustering keys than requested since
the schema may limit the total number we can generate, for example,
if there is only one boolean clustering column.

Fixes #5161

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-10-15 08:52:39 +03:00
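The fix can be sketched like this (illustrative Python, not the actual test-harness code; `column_domains` is a hypothetical stand-in for the value space of each clustering column):

```python
import itertools
import random

def make_ckeys(column_domains, count, seed=0):
    # Sketch of the fix: return at most as many distinct clustering keys
    # as the schema can express (one boolean column allows only 2),
    # instead of looping forever waiting for more to appear.
    space = list(itertools.product(*column_domains))
    random.Random(seed).shuffle(space)
    return space[:min(count, len(space))]
```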
Botond Dénes
544f38ea6d docs/review-checklist.md: point to scylla's coding-style.md instead of seastar's 2019-10-15 08:23:08 +03:00
Botond Dénes
56df6fbd58 docs: mv coding-style.md docs/
It is not discoverable in its current location (root directory) due to
the sheer number of source files in there.
2019-10-15 08:23:08 +03:00
Botond Dénes
c0706e52ce rm README-DPDK.md
Probably a leftover from the era when seastar and scylla shared the same
git repo.
2019-10-15 08:23:01 +03:00
Botond Dénes
061ac53332 docs: mv IDL.md docs/
Documentation should be in docs/.
2019-10-15 08:21:09 +03:00
Piotr Sarna
9e98b51aaa view: fix view_info select statement for local indexes
Calculating the select statement for given view_info structure
used to work fine, but once local indexes were introduced, a subtle
bug appeared: the legacy token column does not exist in local indexes
and a valid clustering key column was omitted instead.
That results in potentially incorrect partition slices being used later
in read-before-write.
There's a long term plan for removing select_statement from
view info altogether, but nonetheless the bug needs to be fixed first.
2019-10-14 17:14:19 +02:00
Piotr Sarna
2ee8c6f595 index: add is_global_index() utility
The helper function is useful for determining whether a given schema
represents a global index.
2019-10-14 17:13:32 +02:00
Botond Dénes
b2e10a3f2f scylla-gdb.py: introduce scylla generate_object_graph
When investigating OOM:s a prominent pattern is a size class that is
exploded, using up most of the available memory alone. If one is lucky,
the objects causing the OOM are instances of some virtual class, making
their identification easy. Other times the objects are referenced by
instances of some virtual class, allowing their identification with some
work. However there are cases where neither these objects nor their
direct referrers are instances of virtual classes. This is the case
`scylla generate_object_graph` intends to help with.

scylla generate_object_graph, as its name suggests, generates the
object graph of the requested object. The object graph is a directed
graph, where vertices are objects and edges are references between them,
going from referrer to referee. The vertices contain information such as
the address of the object, its size, whether it is live or not,
and, if applicable, the address and symbol name of its vtable. The edges
contain the list of offsets the referrer has references at. The
generated graph is an image, which allows the visual inspection of the
object graph, allowing the developer to notice patterns and hopefully
identify the problematic objects.

The graph is generated with the help of `graphviz`. The command
generates `.dot` files which can be converted to images with the help of
the `dot` utility. The command can do this if the output file is one of
the supported image formats (e.g. `png`), otherwise only the `.dot` file
is generated, leaving the actual image generation to the user.
2019-10-14 16:21:18 +03:00
Botond Dénes
f9e8e54603 scylla-gdb.py: boost scylla find
Add `--resolve` flag, which will make the command attempt to resolve the
first pointer of the found objects as a vtable pointer. If this is
successful the vtable pointer as well as the symbol name will be added
to the listing. This in particular makes backtracing continuation chains
a breeze, as the continuation object the searched one depends on can be
found at a glance in the resulting listing (instead of having to manually
probe each item).

The arguments of `scylla find` are now parsed via `argparse`. While at
it, support for all the size classes supported by the underlying `find`
command was added, in addition to `w` and `g`. However, the syntax for
specifying the size class has changed: it now has to be
specified with the `-s|--size` command line argument, instead of by passing
`-w` or `-g`.
2019-10-14 16:21:18 +03:00
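The `--resolve` heuristic can be sketched as follows (an illustrative Python sketch; the inputs — the object's first quadword, the text-section bounds, and a symbol table keyed by address — are stand-ins for what gdb provides):

```python
def resolve_vptr(first_qword, text_start, text_end, symbol_table):
    # Sketch: the first quadword of a live object is plausibly a vtable
    # pointer only if it falls inside the executable's text section; if
    # so, resolve it to the nearest preceding known symbol.
    if not (text_start <= first_qword < text_end):
        return None
    candidates = [addr for addr in symbol_table if addr <= first_qword]
    return symbol_table[max(candidates)] if candidates else None
```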
Botond Dénes
0773104f32 scylla_fiber: also print the task that is the starting point of the fiber
Or in other words, the task that is the argument of the search. Example:
    (gdb) scylla fiber 0x60001a305910
    Starting task: (task*) 0x000060001a305910 0x0000000004aa5260 vtable for seastar::continuation<...> + 16
    #0  (task*) 0x0000600016217c80 0x0000000004aa5288 vtable for seastar::continuation<...> + 16
    #1  (task*) 0x000060000ac42940 0x0000000004aa2aa0 vtable for seastar::continuation<...> + 16
    #2  (task*) 0x0000600023f59a50 0x0000000004ac1b30 vtable for seastar::continuation<...> + 16
2019-10-14 13:36:25 +03:00
Botond Dénes
1a8846c04a scylla-gdb.py: move the code finding text_start and text_end to get_text_range()
This code is currently duplicated in `find_vptrs()` and
`scylla_task_histogram`. Refactor it out into a function.
The code is also improved in two ways:
* Make the search stricter, ensuring (hopefully) that indeed the
  executable's text section is found, not that of the first object in
  the `gdb file` listing.
* Throw an exception in the case when the search fails.
2019-10-14 13:25:28 +03:00
Raphael S. Carvalho
7f1a2156c7 table: Don't account for shared SSTables in compaction backlog tracker
We don't want to add shared sstables to table's backlog tracker because:
1) table's backlog tracker has only an influence on regular compaction
2) shared sstables never undergo regular compaction; they're handled by
resharding, which has its own backlog tracker.

Such sstables belong to more than one shard, meaning that currently
they're added to the backlog tracker of every shard that owns them.
But such sstables end up being resharded on a shard
that may be completely random, so increasing the backlog of all shards
such sstables belong to won't lead to faster resharding. Also, table's
backlog tracker is supposed to deal only with regular compaction.

Accounting for shared sstables in table's tracker may lead to incorrect
speed up of regular compactions because the controller is not aware
that some relevant part of the backlog is due to pending resharding.
The fix is about ignoring sstables that will be resharded and let
table's backlog tracker account only for sstables that can be worked on
by regular compaction, and rely on resharding controlling itself
with its own tracker.
NOTE: this doesn't fix the resharding controlling issue completely,
as described in #4952. We'll still need to throttle regular compaction
on behalf of resharding. So subsequent work may be about:
- move resharding to its own priority class, perhaps streaming.
- make a resharding's backlog tracker accounts for sstables in all of
its pending jobs, not only the ongoing ones (currently limited to 1 by shard).
- limit compaction shares when resharding is in progress.
This only fixes the issue in which the controller for regular compaction
shouldn't account for sstables completely exclusive to resharding.

Fixes #5077.
Refs #4952.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190924022109.17400-1-raphaelsc@scylladb.com>
2019-10-13 10:14:13 +03:00
Raphael S. Carvalho
88611d41d0 sstables: Fix major compaction's space amplification with incremental compaction
Incremental compaction efficiency depends on all references to compacted
sstables being released, because the file descriptors of sstable
components are only closed once the sstable object is destructed.
Incremental compaction is not working for major compaction because references
to released sstables are being kept in the compaction manager, which prevents
their disk usage from being released. So the space amplification would be
the same as with a non-incremental approach, i.e. needs twice the amount of
used disk space for the table(s). With this issue fixed, the database now
becomes very major compaction friendly, the space requirement becoming very
low, a constant which is roughly number of fragments being currently compacted
multiplied by fragment size (1GB by default), for each table involved.

Fixes #5140.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20191003211927.24153-1-raphaelsc@scylladb.com>
2019-10-13 09:55:11 +03:00
Raphael S. Carvalho
17c66224f7 tests: Check that partition is not resurrected on compaction failure
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2019-10-13 00:06:51 -03:00
Raphael S. Carvalho
6301a10fd7 tests: Add sstable compaction test for gc-only mutation compactor consumer
Make sure a gc'able-tombstone-only sstable is properly generated with data that
comes from the regular compaction's input sstable.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2019-10-12 21:38:53 -03:00
Raphael S. Carvalho
91260cf91b sstables: Fix Incremental Compaction Efficiency
Compaction prevents data resurrection from happening by checking that there's
no way that data shadowed by a GC'able tombstone will survive alone, after
a failure for example.

Consider the following scenario:
We have two runs A and B, each divided to 5 fragments, A1..A5, B1..B5.

They have the following token ranges:

 A:  A1=[0, 3]   A2=[4, 7]  A3=[8, 11]   A4=[12, 15]    A5=[16,18]
B is the same as A's ranges, offset by 1:

 B:  B1=[1,4]    B2=[5,8]  B3=[9,12]    B4=[13,16]    B5=[17,19]

Let's say we are finished flushing output until position 10 in the compaction.
We are currently working on A3 and B3, so obviously those cannot be deleted.
Because B2 overlaps with A3, we cannot delete B2 either.
Otherwise, B2 could have a GC'able tombstone that shadows data in A3, and after
B2 is gone, dead data in A3 could be resurrected *on failure*.
Now, A2 overlaps with B2 which we couldn't delete yet, so we can't delete A2.
Now A2 overlaps with B1 so we can't delete B1. And B1 overlaps with A1 so
we can't delete A1. So we can't delete any fragment.

The problem with this approach is obvious, fragments can potentially not be
released due to data dependency, so incremental compaction efficiency is
severely reduced.
To fix it, let's not purge GC'able tombstones right away in the mutation
compactor step. Instead, let's have compaction writing them to a separate
sstable run that would be deleted in the end of compaction.
By making sure that tombstone information from all compacting sstables is not
lost, we no longer need to have incremental compaction imposing lots of
restriction on which fragments could be released. Now, any sstable which data
is safe in a new sstable can be released right away. In addition, incremental
compaction will only take place if compaction procedure is working with one
multi-fragment sstable run at least.

Fixes #4531.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2019-10-12 21:36:03 -03:00
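The release-dependency chain described in this message can be checked mechanically. Below is an illustrative Python sketch of the *old* release rule (not the compaction code itself): a fragment cannot be released while it overlaps any fragment that itself cannot be released. Running it on the A/B runs above pins all ten fragments, which is exactly the inefficiency the patch removes.

```python
def overlaps(r1, r2):
    # Closed token ranges [lo, hi] overlap iff neither is wholly before
    # the other.
    return r1[0] <= r2[1] and r2[0] <= r1[1]

def pinned_fragments(fragments, in_progress):
    # Fixed point of the old rule: start from the fragments currently
    # being compacted and keep pinning anything overlapping a pinned one.
    pinned = set(in_progress)
    changed = True
    while changed:
        changed = False
        for name, rng in fragments.items():
            if name not in pinned and any(
                    overlaps(rng, fragments[p]) for p in pinned):
                pinned.add(name)
                changed = True
    return pinned

# The two runs from the scenario above.
A = {'A1': (0, 3), 'A2': (4, 7), 'A3': (8, 11),
     'A4': (12, 15), 'A5': (16, 18)}
B = {'B1': (1, 4), 'B2': (5, 8), 'B3': (9, 12),
     'B4': (13, 16), 'B5': (17, 19)}
frags = {**A, **B}
```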
Kamil Braun
ef9d5750c8 view: fix bug in virtual columns.
When creating a virtual column of non-frozen map type,
the wrong type was used for the map's keys.

Fixes #5165.
2019-10-11 20:47:06 +03:00
Avi Kivity
f12feec2c9 Update seastar submodule
* seastar 1f68be436f...e888b1df9c (8):
  > sharded: Make map work with mapper that returns a future
  > cmake: Remove FindBoost.cmake
  > Reduce noncopyable_function instruction cache footprint
  > doc: add Loops section to the tutorial
  > Merge "Move file related code out of reactor" from Asias
  > Merge "Move the io_queue code out of reactor" from Asias
  > cmake: expose seastar_perf_testing lib
  > future: class doc: explain why discarding a future is bad

 - main.cc now includes new file io_queue.hh
 - perf tests now include seastar perf utilities via user, not
   system, includes since those are not exported
2019-10-10 18:17:28 +03:00
Nadav Har'El
33027a36b4 alternator: Add authorization
Merged patch set from Piotr Sarna:

Refs #5046

This commit adds handling "Authorization:" header in incoming requests.
The signature sent in the authorization is recomputed server-side
and compared with what the client sent. In case of a mismatch,
UnrecognizedClientException is returned.
The signature computation is based on boto3 Python implementation
and uses gnutls to compute HMAC hashes.

This series is rebased on a previous HTTPS series in order to ease
merging these two. As such, it depends on the HTTPS series being
merged first.

Tests: alternator(local, remote)

The series also comes with a simple authorization test and a docs update.

Piotr Sarna (6):
  alternator: migrate split() function to string_view
  alternator: add computing the auth signature
  config: add alternator_enforce_authorization entry
  alternator: add verifying the auth signature
  alternator-test: add a basic authorization test case
  docs: update alternator authorization entry

 alternator-test/test_authorization.py |  34 ++++++++
 configure.py                          |   1 +
 alternator/{server.hh => auth.hh}     |  22 ++---
 alternator/server.hh                  |   3 +-
 db/config.hh                          |   1 +
 alternator/auth.cc                    |  88 ++++++++++++++++++++
 alternator/server.cc                  | 112 +++++++++++++++++++++++---
 db/config.cc                          |   1 +
 main.cc                               |   2 +-
 docs/alternator/alternator.md         |   7 +-
 10 files changed, 241 insertions(+), 30 deletions(-)
 create mode 100644 alternator-test/test_authorization.py
 copy alternator/{server.hh => auth.hh} (58%)
 create mode 100644 alternator/auth.cc
2019-10-10 15:57:46 +03:00
Nadav Har'El
df62499710 docs/isolation.md: copy-edit
Minor spelling and syntax corrections. No new content or semantic changes.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191010093457.20439-1-nyh@scylladb.com>
2019-10-10 15:17:28 +03:00
Piotr Dulikowski
c04e8c37aa distributed_loader: populate non-system keyspaces in parallel
Before this change, when populating non-system keyspaces, each data
directory was scanned and for each entry (keyspace directory),
a keyspace was populated. This was done in a serial fashion - populating
of one keyspace was not started until the previous one was done.

Loading keyspaces in such a fashion can introduce unnecessary waiting
when there is a large number of keyspaces in one data directory. The
population process is I/O-intensive and barely uses the CPU.

This change enables parallel loading of keyspaces per data directory.
Populating the next keyspace does not wait for the previous one.

A benchmark was performed measuring startup time, with the following
setup:
  - 1 data directory,
  - 200 keyspaces,
  - 2 tables in each keyspace, with the following schema:
      CREATE TABLE tbl (a int, b int, c int, PRIMARY KEY(a, b))
        WITH CLUSTERING ORDER BY (b DESC),
  - 1024 rows in each table, with values (i, 2*i, 3*i) for i in 0..1023,
  - ran on 6-core virtual machine running on i7-8750H CPU,
  - compiled in dev mode,
  - parameters: --smp 6 --max-io-requests 4 --developer-mode=yes
      --datadir $DIR --commitlog-directory $DIR
      --hints-directory $DIR --view-hints-directory $DIR

The benchmark tested:
  - boot time, by comparing timestamp of the first message in log,
    and timestamp of the following message:
      "init - Scylla version ... initialization completed."
  - keyspace population time, by comparing timestamps of messages:
      "init - loading non-system sstables"
    and
      "init - starting view update generator"

The benchmark was run 5 times for sequential and parallel version,
with the following results:
  - sequential: boot 31.620s, keyspace population 6.051s
  - parallel:   boot 29.966s, keyspace population 4.360s

Keyspace population time decreased by ~27.95%, and overall boot time
by about ~5.23%.

Tests: unit(release)

Fixes #2007
2019-10-10 15:12:23 +03:00
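The serial-to-parallel change described above can be sketched with Python's asyncio (hypothetical keyspace names and a simulated I/O step, not Scylla's actual loader code):

```python
import asyncio

async def populate_keyspace(name: str, loaded: list) -> None:
    # Simulate an I/O-bound population step; the CPU is barely used.
    await asyncio.sleep(0)
    loaded.append(name)

async def populate_all(keyspaces: list) -> list:
    loaded = []
    # Parallel version: populating the next keyspace does not wait
    # for the previous one to finish.
    await asyncio.gather(*(populate_keyspace(ks, loaded) for ks in keyspaces))
    return loaded

keyspaces = [f"ks{i}" for i in range(200)]
loaded = asyncio.run(populate_all(keyspaces))
print(len(loaded))  # 200
```

The serial version would simply `await` each `populate_keyspace()` in a loop, which is exactly the waiting the patch removes.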
Piotr Sarna
6ca55d3c83 docs: update alternator authorization entry
The entry now contains a comment that computing a signature works,
but is still based on a hardcoded key.
2019-10-10 13:51:00 +02:00
Piotr Sarna
23798b7301 alternator-test: add a basic authorization test case
The test case ensures that passing wrong credentials results
in an UnrecognizedClientException.
2019-10-10 13:51:00 +02:00
Piotr Sarna
97cbb9a2c7 alternator: add verifying the auth signature
The signature sent in the "Authorization:" header is now verified
by computing the signature server-side with a matching secret key
and confirming that the signatures match.
Currently the secret key is hardcoded to be "whatever" in order
to work with current tests, but it should be replaced
by a proper key store.

Refs #5046
2019-10-10 13:51:00 +02:00
Piotr Sarna
e245b54502 config: add alternator_enforce_authorization entry
The config entry will be used to turn authorization for alternator
requests on and off. The default is currently off, since the key store
is not implemented yet.
2019-10-10 13:51:00 +02:00
Piotr Sarna
589a22d078 alternator: add computing the auth signature
A function for computing the auth signature from user requests
is added, along with helper functions. The implementation
is based on gnutls's HMAC.

Refs #5046
2019-10-10 13:51:00 +02:00
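The AWS-style signature computation that boto3 performs client-side (and which the server recomputes to verify) can be sketched with Python's standard hmac module. This is a simplified illustration of the SigV4 key-derivation chain, not Scylla's gnutls-based implementation; the inputs are made-up examples:

```python
import hashlib
import hmac

def derive_signing_key(secret: str, date: str, region: str, service: str) -> bytes:
    # SigV4 derives the signing key through a chain of HMAC-SHA256 steps.
    def sign(key: bytes, msg: str) -> bytes:
        return hmac.new(key, msg.encode(), hashlib.sha256).digest()
    k_date = sign(("AWS4" + secret).encode(), date)
    k_region = sign(k_date, region)
    k_service = sign(k_region, service)
    return sign(k_service, "aws4_request")

def signature(secret: str, date: str, region: str, service: str,
              string_to_sign: str) -> str:
    # The final signature is an HMAC of the canonicalized request
    # ("string to sign") under the derived key, hex-encoded.
    key = derive_signing_key(secret, date, region, service)
    return hmac.new(key, string_to_sign.encode(), hashlib.sha256).hexdigest()

sig = signature("whatever", "20191010", "us-east-1", "dynamodb",
                "example-string-to-sign")
print(len(sig))  # 64 hex characters
```

Server-side, the recomputed signature is compared with the one the client sent; a mismatch yields UnrecognizedClientException.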
Piotr Sarna
ca58b46b4c alternator: migrate split() function to string_view
The implementation of string split was based on sstring type for
simplicity, but it turns out that more generic std::string_view
will be beneficial later to avoid unneeded string copying.
Unfortunately boost::split does not cooperate well with string views,
so a simple manual implementation is provided instead.
2019-10-10 13:50:59 +02:00
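The manual split can be illustrated in Python (a sketch of the scanning logic only; the real C++ code returns std::string_view pieces over the original buffer to avoid copies):

```python
def split(s: str, delim: str) -> list:
    # Simple manual split: scan for the delimiter and collect pieces,
    # preserving empty tokens like boost::split would.
    pieces = []
    start = 0
    while True:
        pos = s.find(delim, start)
        if pos == -1:
            pieces.append(s[start:])
            return pieces
        pieces.append(s[start:pos])
        start = pos + len(delim)

print(split("a/b//c", "/"))  # ['a', 'b', '', 'c']
```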
Botond Dénes
52afbae1e5 README.md: add links to other documentation sources
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191010103926.34705-3-bdenes@scylladb.com>
2019-10-10 14:15:01 +03:00
Botond Dénes
e52712f82c docs: add README.md
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191010103926.34705-2-bdenes@scylladb.com>
2019-10-10 14:14:09 +03:00
Amnon Heiman
64c2d28a7f database: Add counter for the number of schema changes
Schema changes can have a big effect on performance; typically they
should be rare events.

It is useful to monitor how frequently the schema changes.
This patch adds a counter that increases each time the schema changes.

After this patch the metrics would look like:

scylla_database_schema_changed{shard="0",type="derive"} 2

Fixes #4785

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2019-10-08 17:54:49 +02:00
Asias He
b89ced4635 streaming: Do not open rpc stream connection if reader has no data
We can use the reader::peek() to check if the reader contains any data.
If not, do not open the rpc stream connection. It helps to reduce the
port usage.

Refs: #4943
2019-10-08 10:31:02 +02:00
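The peek-before-connect pattern can be sketched as follows (hypothetical Reader and connection types for illustration, not Scylla's rpc stream API):

```python
import asyncio

class Reader:
    def __init__(self, fragments):
        self._fragments = list(fragments)

    async def peek(self):
        # Return the first fragment without consuming it, or None if empty.
        await asyncio.sleep(0)
        return self._fragments[0] if self._fragments else None

opened = []

async def stream(reader: Reader) -> bool:
    # Only open the rpc stream connection if the reader has data;
    # this saves a connection (and a port) when there is nothing to send.
    if await reader.peek() is None:
        return False
    opened.append("connection")
    return True

print(asyncio.run(stream(Reader([]))))       # False, no connection opened
print(asyncio.run(stream(Reader(["row"]))))  # True
```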
Konstantin Osipov
94006d77b1 lwt: add cas_contention_timeout_in_ms to config
Make the default conform to the origin.
Message-Id: <20191006154532.54856-3-kostja@scylladb.com>
2019-10-08 00:02:35 +02:00
Konstantin Osipov
383e17162a lwt: implement query_options::check_serial_consistency()
Both in a single-statement transaction and in a batch
we expect that serial consistency is provided. Move the
check to query_options class and make it available for
reuse.

Keep get_serial_consistency() around for use in
transport/server.cc.
Message-Id: <20191006154532.54856-2-kostja@scylladb.com>
2019-10-08 00:02:35 +02:00
Piotr Sarna
36a1905e98 storage_proxy: handle unstarted write cancelling
When another node is reported to be down, view updates queued
for it are cancelled, but some of them may already be initiated.
Right now, cancelling such a write resulted in an exception,
but on conceptual level it's not really an exception, since
this behaviour is expected.
Previous version of this patch was based on introducing a special
exception type that was later handled specially, but it's not clear
if it's a good direction. Instead, this patch simply makes this
path non-exceptional, as was originally done by Nadav in the first
version of the series that introduced handling unstarted write
cancellations. Additionally, a message containing the information
that a write is cancelled is logged with debug level.
2019-10-07 16:55:36 +03:00
Vladimir Davydov
e8bcb34ed4 api: drop /storage_proxy/metrics/cas_read/condition_not_met
There's no such metric in Cassandra (although Cassandra's docs mistakenly
say it exists). Having it would make no sense anyway, so let's drop it.

Message-Id: <b4f7a6ad278235c443cb8ea740bfa6399f8e4ee1.1570434332.git.vdavydov@scylladb.com>
2019-10-07 16:54:39 +03:00
Piotr Sarna
5ab134abef alternator-test: update HTTPS section of README
README.md has 3 fixes applied:
 - s/alternator_tls_port/alternator_https_port
 - conf directory is mentioned more explicitly
 - it now correctly states that the self-signed certificate
   warning *is* explicitly ignored in tests
Message-Id: <e5767f7dbea260852fc2fa9b613e1bebf490cc78.1570444085.git.sarna@scylladb.com>
2019-10-07 14:51:16 +03:00
Avi Kivity
8ed6f94a16 Merge "Fix handling of schema alters and eviction in cache" from Tomasz
"
Fixes #5134, Eviction concurrent with preempted partition entry update after
  memtable flush may allow stale data to be populated into cache.

Fixes #5135, Cache reads may miss some writes if schema alter followed by a
  read happened concurrently with preempted partition entry update.

Fixes #5127, Cache populating read concurrent with schema alter may use the
  wrong schema version to interpret sstable data.

Fixes #5128, Reads of multi-row partitions concurrent with memtable flush may
  fail or cause a node crash after schema alter.
"

* tag 'fix-cache-issues-with-schema-alter-and-eviction-v2' of github.com:tgrabiec/scylla:
  tests: row_cache: Introduce test_alter_then_preempted_update_then_memtable_read
  tests: row_cache_stress_test: Verify all entries are evictable at the end
  tests: row_cache_stress_test: Exercise single-partition reads
  tests: row_cache_stress_test: Add periodic schema alters
  tests: memtable_snapshot_source: Allow changing the schema
  tests: simple_schema: Prepare for schema altering
  row_cache: Record upgraded schema in memtable entries during update
  memtable: Extract memtable_entry::upgrade_schema()
  row_cache, mvcc: Prevent locked snapshots from being evicted
  row_cache: Make evict() not use invalidate_unwrapped()
  mvcc: Introduce partition_snapshot::touch()
  row_cache, mvcc: Do not upgrade schema of entries which are being updated
  row_cache: Use the correct schema version to populate the partition entry
  delegating_reader: Optimize fill_buffer()
  row_cache, memtable: Use upgrade_schema()
  flat_mutation_reader: Introduce upgrade_schema()
2019-10-07 14:43:36 +03:00
Nadav Har'El
f2f0f5eb0f alternator: add https support
Merged patch series from Piotr Sarna:

This series adds HTTPS support for Alternator.
The series comes with --https option added to alternator-test, which makes
the test harness run all the tests with HTTPS instead of HTTP. All the tests
pass, albeit with security warnings that a self-signed x509 certificate was
used and it should not be trusted.

Fixes #5042
Refs scylladb/seastar#685

Patches:
  docs: update alternator entry on HTTPS
  alternator-test: suppress the "Unverified HTTPS request" warning
  alternator-test: add HTTPS info to README.md
  alternator-test: add HTTPS to test_describe_endpoints
  alternator-test: add --https parameter
  alternator: add HTTPS support
  config: add alternator HTTPS port
2019-10-07 12:38:20 +03:00
Avi Kivity
969113f0c9 Update seastar submodule
* seastar c21a7557f9...1f68be436f (6):
  > scheduling: Add per scheduling group data support
  > build: Include dpdk as a single object in libseastar.a
  > sharded: fix foreign_ptr's move assignment
  > build: Fix DPDK libraries linking in pkg-config file
  > http server: https using tls support
  > Make output_stream blurb Doxygen
2019-10-07 12:18:49 +03:00
Nadav Har'El
754add1688 alternator: fix Expected's BEGINS_WITH error handling
The BEGINS_WITH condition in conditional updates (via Expected) requires
that the given operand be either a string or a binary. Any other operand
should result in a validation exception - not a failed condition as we
generate now.

This patch fixes the test for this case so it will succeed against
Amazon DynamoDB (before this patch it fails - this failure was masked by
a typo before commit 332ffa77ea). The patch
then fixes our code to handle this case correctly.

Note that BEGINS_WITH handling of wrong types is now asymmetrical: A bad
type in the operand is now handled differently from a bad type in the
attribute's value. We add another check to the test to verify that this
is the case.

Fixes #5141

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191006080553.4135-1-nyh@scylladb.com>
2019-10-06 17:16:55 +03:00
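The asymmetry described above can be sketched in Python (hypothetical helper and exception names; the real code lives in Scylla's alternator Expected handling):

```python
class ValidationException(Exception):
    pass

def check_begins_with(attribute_value, operand):
    # A bad type in the operand is a validation error, matching DynamoDB...
    if not isinstance(operand, (str, bytes)):
        raise ValidationException("BEGINS_WITH operand must be string or binary")
    # ...while a bad type in the attribute's value merely fails the condition.
    if not isinstance(attribute_value, type(operand)):
        return False
    return attribute_value.startswith(operand)

print(check_begins_with("hello", "he"))  # True
print(check_begins_with(17, "he"))       # False, condition simply fails
```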
Botond Dénes
d0fa5dc34d scylla-gdb.py: introduce the downcast_vptr convenience function
When debugging one constantly has to inspect object for which only
a "virtual pointer" is available, that is a pointer that points to a
common parent class or interface.
Finding the concrete type and downcasting the pointer is easy enough but
why do it manually when it is possible to automate it trivially?
$downcast_vptr() returns any virtual pointer given to it, cast to the
actual concrete object.
Example:
    (gdb) p $1
    $2 = (flat_mutation_reader::impl *) 0x60b03363b900
    (gdb) p $downcast_vptr(0x60b03363b900)
    $3 = (combined_mutation_reader *) 0x60b03363b900
    # The return value can also be dereferenced on the spot.
    (gdb) p *$downcast_vptr($1)
    $4 = {<flat_mutation_reader::impl> = {_vptr.impl = 0x46a3ea8 <vtable
    for combined_mutation_reader+16>, _buffer = {_impl = {<std::al...
2019-10-04 17:45:47 +03:00
Botond Dénes
434a41d39b scylla-gdb.py: introduce the dereference_lw_shared_ptr convenience function
Dereferencing an `seastar::lw_shared_ptr` is another tedious manual
task. The stored pointer (`_p`) has to be casted to the right subclass
of `lw_shared_ptr_counter_base`, which involves inspecting the code,
then make writing a cast expression that gdb is willing to parse. This
is something machines are so much better at doing.
`$dereference_lw_shared_ptr` returns a pointer to the actual pointed-to
object, given an instance of `seastar::lw_shared_ptr`.
Example:
    (gdb) p $1._read_context
    $2 = {_p = 0x60b00b068600}
    (gdb) p $dereference_lw_shared_ptr($1._read_context)
    $3 = {<seastar::enable_lw_shared_from_this<cache::read_context>>
    = {<seastar::lw_shared_ptr_counter_base> = {_count = 1}, ...
2019-10-04 17:45:47 +03:00
Botond Dénes
f5de002318 scylla-gdb.py: scylla_sstables: also print the sstable filename
And expose the method that obtains the file-name of an sstable object to
python code.
2019-10-04 17:45:32 +03:00
Botond Dénes
ad7a668be9 scylla-gdb.py: scylla_task_histogram: expose internal parameters
Make all the parameters of the sampling tweakable via command line
arguments. I strived to keep full backward compatibility, but due to the
limitations of `argparse` there is one "breaking" change. The optional
positional size argument is now a non-positional argument as `argparse`
doesn't support optional positional arguments.
Added documentation for both the command itself as well as for all the
arguments.
2019-10-04 17:44:40 +03:00
Botond Dénes
7767cc486e scylla-gdb.py: make scylla_find usable from python code 2019-10-04 17:44:40 +03:00
Botond Dénes
9cdea440ef scylla-gdb.py: add std_variant, a wrapper for std::variant
Allows conveniently obtaining the active member via calling `get()`.
2019-10-04 17:44:40 +03:00
Botond Dénes
55e9097dd9 scylla-gdb.py: add std_list, a wrapper for an std::list
std_list makes an `std::list` instance accessible from python code just
like a regular (read-only) python container.
2019-10-04 17:44:40 +03:00
Botond Dénes
b8f0b3ba93 std_optional: fix get()
Apparently there is now another layer of indirection: `std::_Storage`.
2019-10-04 17:43:40 +03:00
Tomasz Grabiec
020a537ade tests: row_cache: Introduce test_alter_then_preempted_update_then_memtable_read 2019-10-04 11:38:13 +02:00
Tomasz Grabiec
ebedefac29 tests: row_cache_stress_test: Verify all entries are evictable at the end 2019-10-04 11:38:12 +02:00
Tomasz Grabiec
1b95f5bf60 tests: row_cache_stress_test: Exercise single-partition reads
make_single_key_reader() currently doesn't actually create
single-partition readers because it doesn't set
mutation_reader::forwarding::no when it creates individual
readers. The readers will default to mutation_reader::forwarding::yes
and actually create scanning readers in preparation for
fast-forwarding across partitions.

Fix by passing mutation_reader::forwarding::no.
2019-10-04 11:38:12 +02:00
Tomasz Grabiec
81dd17da4e tests: row_cache_stress_test: Add periodic schema alters
Reproduces #5127.
2019-10-03 22:03:29 +02:00
Tomasz Grabiec
2fc144e1a8 tests: memtable_snapshot_source: Allow changing the schema 2019-10-03 22:03:29 +02:00
Tomasz Grabiec
22dde90dba tests: simple_schema: Prepare for schema altering
Currently, methods of simple_schema assume that table's schema doesn't
change. Accessors like get_value() assume that rows were generated
using simple_schema::_s. Because of that, the column_definition& for
the "v" column is cached in the instance. That column_definition&
cannot be used to access objects created with a different schema
version. To allow using simple_schema after schema changes,
column_definition& caching is now tagged with the table schema version
of origin. Methods which access schema-dependent objects, like
get_value(), are now accepting schema& corresponding to the objects.

Also, it's now possible to tell simple_schema to use a different
schema version in its generator methods.
2019-10-03 22:03:29 +02:00
Tomasz Grabiec
e6afc89735 row_cache: Record upgraded schema in memtable entries during update
Cache update may defer in the middle of moving of partition entry
from a flushed memtable to the cache. If the schema was changed since
the entry was written, it upgrades the schema of the partition_entry
first but doesn't update the schema_ptr in memtable_entry. The entry
is removed from the memtable afterward. If a memtable reader
encounters such an entry, it will try to upgrade it assuming it's
still at the old schema.

That is undefined behavior in general, which may include:

 - read failures due to bad_alloc, if fixed-size cells are interpreted
   as variable-sized cells, and we misinterpret a value for a huge
   size

 - wrong read results

 - node crash

This doesn't result in a permanent corruption, restarting the node
should help.

It is more likely to happen the more rows there are in a
partition. It's unlikely to happen with single-row partitions.

Introduced in 70c7277.

Fixes #5128.
2019-10-03 22:03:29 +02:00
Tomasz Grabiec
ea461a3884 memtable: Extract memtable_entry::upgrade_schema() 2019-10-03 22:03:29 +02:00
Tomasz Grabiec
90d6c0b9a2 row_cache, mvcc: Prevent locked snapshots from being evicted
If the whole partition entry is evicted while being updated from the
memtable, a subsequent read may populate the partition using the old
version of data if it attempts to do it before cache update advances
past that partition. Partial eviction is not affected because
populating reads will notice that there is a newer snapshot
corresponding to the updater.

This can happen only in OOM situations where the whole cache gets evicted.

Affects only tables with multi-row partitions, which are the only ones
that can experience the update of partition entry being preempted.

Introduced in 70c7277.

Fixes #5134.
2019-10-03 22:03:29 +02:00
Tomasz Grabiec
57a93513bd row_cache: Make evict() not use invalidate_unwrapped()
invalidate_unwrapped() calls cache_entry::evict(), which cannot be
called concurrently with cache update. invalidate() serializes it
properly by calling do_update(), but evict() doesn't. The purpose of
evict() is to stress eviction in tests, which can happen concurrently
with cache update. Switch it to use memory reclaimer, so that it's
both correct and more realistic.

evict() is used only in tests.
2019-10-03 22:03:28 +02:00
Tomasz Grabiec
c88a4e8f47 mvcc: Introduce partition_snapshot::touch() 2019-10-03 22:03:28 +02:00
Tomasz Grabiec
25e2f87a37 row_cache, mvcc: Do not upgrade schema of entries which are being updated
When a read enters a partition entry in the cache, it first upgrades
it to the current schema of the cache. The same happens when an entry
is updated after a memtable flush. Upgrading the entry is currently
performed by squashing all versions and replacing them with a single
upgraded version. That has a side effect of detaching all snapshots
from the partition entry. Partition entry update on memtable flush is
writing into a snapshot. If that snapshot is detached by a schema
upgrade, the entry will be missing writes from the memtable which fall
into continuous ranges in that entry which have not yet been updated.

This can happen only if the update of the entry is preempted and the
schema was altered during that, and a read hit that partition before
the update went past it.

Affects only tables with multi-row partitions, which are the only ones
that can experience the update of partition entry being preempted.

The problem is fixed by locking updated entries and not upgrading
schema of locked entries. cache_entry::read() is prepared for this,
and will upgrade on-the-fly to the cache's schema.

Fixes #5135
2019-10-03 22:03:28 +02:00
Tomasz Grabiec
0675088818 row_cache: Use the correct schema version to populate the partition entry
The sstable reader which populates the partition entry in the cache is
using the schema of the partition entry snapshot, which will be the
schema of the cache at the time the partition was entered. If there
was a schema change after the cache reader entered the partition but
before it created the sstable reader, the cache populating reader will
interpret sstable fragments using the wrong schema version. That is
more likely if partitions have many rows, and the front of the
partition is populated. With single-row partitions that's unlikely to
happen.

That is undefined behavior in general, which may include:

 - read failures due to bad_alloc, if fixed-size cells are
   interpreted as variable-sized cells, and we misinterpret
   a value for a huge size

 - wrong read results

 - node crash

This doesn't result in a permanent corruption, restarting the node
should help.

Fixes #5127.
2019-10-03 22:03:28 +02:00
Tomasz Grabiec
10992a8846 delegating_reader: Optimize fill_buffer()
Use move_buffer_content_to() which is faster than fill_buffer_from()
because it doesn't involve popping and pushing the fragments across
buffers. We save on size estimation costs.
2019-10-03 22:03:28 +02:00
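The difference between the two buffer-transfer strategies can be illustrated with Python deques (an analogy only, not the actual flat_mutation_reader code):

```python
from collections import deque

def fill_buffer_from(src: deque, dst: deque) -> deque:
    # Slow path: pop fragments one by one and push them across buffers,
    # paying a per-fragment cost (including size re-estimation in Scylla).
    while src:
        dst.append(src.popleft())
    return dst

def move_buffer_content_to(src: deque, dst: deque) -> deque:
    # Fast path: when the destination is empty, hand over the whole
    # buffer wholesale with no per-fragment popping or pushing.
    if not dst:
        return src
    return fill_buffer_from(src, dst)

buf = move_buffer_content_to(deque([1, 2, 3]), deque())
print(list(buf))  # [1, 2, 3]
```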
Piotr Sarna
07ac3ea632 docs: update alternator entry on HTTPS
The HTTPS entry is updated - HTTPS is now supported, but still
lacks the same features as HTTP - CRC headers, etc.
2019-10-03 19:10:30 +02:00
Piotr Sarna
b63077a8dc alternator-test: suppress the "Unverified HTTPS request" warning
Running with --https and a self-signed certificate results in a flood
of expected warnings, that the connection is not to be trusted.
These warnings are silenced, as users running a local test with --https
usually use self-signed certificates.
2019-10-03 19:10:30 +02:00
Piotr Sarna
e65fd490da alternator-test: add HTTPS info to README.md
A short paragraph about running tests with `--https` and configuring
the cluster to work correctly with this parameter is added to README.md.
2019-10-03 19:10:30 +02:00
Piotr Sarna
0d28d7f528 alternator-test: add HTTPS to test_describe_endpoints
The test_describe_endpoints test spawns another client connection
to the cluster, so it needs to be HTTPS-aware in order to work properly
with --https parameter.
2019-10-03 19:10:30 +02:00
Piotr Sarna
9fd77ed81d alternator-test: add --https parameter
Running with --https parameter will result in sending the requests
via HTTPS instead of HTTP. By default, port 8043 is used for a local
cluster. Before running pytest --https, make sure that Scylla
was properly configured to initialize an HTTPS alternator server
by providing the alternator_tls_port parameter.

The HTTPS-based connection runs with verification disabled,
otherwise it would not work with self-signed certificates,
which are useful for tests.
2019-10-03 19:10:30 +02:00
Piotr Sarna
e1b0537149 alternator: add HTTPS support
By providing a server based on a TLS socket, it's now possible
to serve HTTPS requests in alternator. The HTTPS server is enabled
by setting its port in scylla.yaml: alternator_tls_port=XXXX.
Alternator TLS relies on the existing TLS configuration,
which is provided by certificate, keyfile, truststore, priority_string
options.

Fixes #5042
2019-10-03 19:10:30 +02:00
Piotr Sarna
b42eb8b80a config: add alternator HTTPS port
The config variable will be used to set up a TLS-based server
for serving alternator HTTPS requests.
2019-10-03 19:10:29 +02:00
Nadav Har'El
9d4e71bbc6 alternator-test: fix misleading xfail message
The test test_update_expression_function_nesting() fails because DynamoDB
doesn't allow an expression like list_append(list_append(:val1, :val2), :val3)
but Alternator doesn't check for this (and supports this expression).

The "xfail" message was outdated, suggesting that the test fails because
the "SET" expression isn't supported - but it is. So replace the message
by a more accurate one.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190915104708.30471-1-nyh@scylladb.com>
2019-10-03 18:45:03 +03:00
Nadav Har'El
9747019e7b alternator: implement additional Expected operators
Merged patch set from Dejan Mircevski implementing some of the
missing operators for Expected: NE, IN, NULL and NOT_NULL.

Patches:
  alternator: Factor out Expected operand checks
  alternator: Implement NOT_NULL operator in Expected
  alternator: Implement NULL operator in Expected
  alternator: Fix expected_1_null testcase
  alternator: Implement IN operator in Expected
  alternator: Implement NE operator in Expected
  alternator: Factor out common code in Expected
2019-10-03 18:12:38 +03:00
Konstantin Osipov
25ffd36d21 lwt: prepare the expression tree for IF condition evaluation
Frozen empty lists/maps/sets are not equal to a null value,
while multi-cell empty lists/maps/sets are equal to null values.

Return a NULL value for an empty multi-cell set or list
if we know the receiver is not frozen - this makes it
easy to compare the parameter with the receiver.

Add a test case for inserting an empty list or set
- the result is indistinguishable from NULL value.
Message-Id: <20191003092157.92294-2-kostja@scylladb.com>
2019-10-03 14:56:25 +02:00
Avi Kivity
3cb081eb84 Merge " hinted handoff: fix races during shutdown and draining" from Vlad
"
Fix races that may lead to use-after-free events and file system level exceptions
during shutdown and drain.

The root cause of use-after-free events in question is that space_watchdog blocks on
end_point_hints_manager::file_update_mutex() and we need to make sure this mutex is alive as long as
it's accessed even if the corresponding end_point_hints_manager instance
is destroyed in the context of manager::drain_for().

File system exceptions may occur when space_watchdog attempts to scan a
directory while it's being deleted from the drain_for() context.
In case of such an exception new hints generation is going to be blocked
- including for materialized views, till the next space_watchdog round (in 1s).

Issues that are fixed are #4685 and #4836.

Tested as follows:
 1) Patched the code in order to trigger the race with (a lot) higher
    probability and running slightly modified hinted handoff replace
    dtest with a debug binary 100 times. A side effect of this
    testing was the discovery of #4836.
 2) Using the same patch as above tested that there are no crashes and
    nodes survive stop/start sequences (they were not without this series)
    in the context of all hinted handoff dtests. Ran the whole set of
    tests with dev binary for 10 times.
"

* 'hinted_handoff_race_between_drain_for_and_space_watchdog_no_global_lock-v2' of https://github.com/vladzcloudius/scylla:
  hinted handoff: fix a race on a directory removal between space_watchdog and drain_for()
  hinted handoff: make taking file_update_mutex safe
  db::hints::manager::drain_for(): fix alignment
  db::hints::manager: serialize calls to drain_for()
  db::hints: cosmetics: indentation and missing method qualifier
2019-10-03 14:38:00 +03:00
Tomasz Grabiec
aad1307b14 row_cache, memtable: Use upgrade_schema() 2019-10-03 13:28:33 +02:00
Tomasz Grabiec
3177732b35 flat_mutation_reader: Introduce upgrade_schema() 2019-10-03 13:28:33 +02:00
Asias He
a9b95f5f01 repair: Fix tracker::start and tracker::done in case of error
The operation after gate.enter() in tracker::start() can fail and throw,
we should call gate.leave() in such case to avoid unbalanced enter and
leave calls. tracker::done() has similar issue too.

Fix it by removing the gate enter and leave logic in tracker start and
done. A helper tracker::run() is introduced to take care of the gate and
repair status.

In addition, the error log is improved. It now logs exceptions on all
shards in the summary. e.g.,

[shard 0] repair - repair id 1 failed: std::runtime_error
({shard 0: std::runtime_error (error0), shard 1: std::runtime_error (error1)})

Fixes #5074
2019-10-03 13:33:02 +03:00
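The balanced enter/leave fix can be sketched as a run() helper that owns the gate (a Python sketch with a minimal hypothetical Gate class, mirroring Seastar's gate semantics):

```python
class Gate:
    def __init__(self):
        self.count = 0

    def enter(self):
        self.count += 1

    def leave(self):
        self.count -= 1

class Tracker:
    def __init__(self):
        self._gate = Gate()

    def run(self, func):
        # Guarantee the gate is left even if func throws, keeping
        # enter/leave calls balanced.
        self._gate.enter()
        try:
            return func()
        finally:
            self._gate.leave()

def failing_repair():
    raise RuntimeError("error0")

t = Tracker()
try:
    t.run(failing_repair)
except RuntimeError as e:
    print(f"repair failed: {e}")  # repair failed: error0
print(t._gate.count)  # 0, gate stays balanced despite the failure
```

Without the helper, a throw between enter() and leave() would leave the gate count permanently elevated, which is exactly the bug being fixed.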
Botond Dénes
00b432b61d querier_cache: correctly account entries evicted on insertion in the population
Currently, the population stat is not increased for entries that are
evicted immediately on insert; however, the code that does the eviction
still decreases the population stat, leading to an imbalance and, in
some cases, the underflow of the population stat. To fix, unconditionally
increase the population stat upon inserting an entry, regardless of
whether it is immediately evicted or not.

Fixes: #5123

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191001153215.82997-1-bdenes@scylladb.com>
2019-10-03 11:49:44 +03:00
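The accounting fix (increment the population stat unconditionally on insert, even when the insert immediately triggers an eviction) can be sketched like this (a toy cache illustrating only the counter logic, not the real querier_cache):

```python
class QuerierCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = []
        self.population = 0  # stat: number of cached entries

    def _evict_one(self):
        self.entries.pop(0)
        self.population -= 1

    def insert(self, entry):
        # Fix: count the insertion first, unconditionally, so the
        # eviction path's decrement is always balanced and the stat
        # can never underflow.
        self.entries.append(entry)
        self.population += 1
        while len(self.entries) > self.capacity:
            self._evict_one()

cache = QuerierCache(capacity=1)
cache.insert("q1")
cache.insert("q2")  # evicts q1 immediately
print(cache.population)  # 1, no underflow
```

The buggy version skipped the increment for immediately-evicted entries while still running the decrement, so the stat drifted downward.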
Dejan Mircevski
ac98385d04 alternator: Factor out Expected operand checks
Put all AttributeValueList size verification under
verify_operand_count(), rather than have some cases invoke
verify_operand_count() while others verify it in check_*() functions.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-10-02 17:11:58 -04:00
Dejan Mircevski
de18b3240b alternator: Implement NOT_NULL operator in Expected
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-10-02 16:23:59 -04:00
Dejan Mircevski
75960639a4 alternator: Implement NULL operator in Expected
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-10-02 16:19:14 -04:00
Dejan Mircevski
e4fd5f3ef0 alternator: Fix expected_1_null testcase
Testcase "For NULL, AttributeValueList must be empty" accidentally
used NOT_NULL instead of NULL.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-10-02 16:19:14 -04:00
Dejan Mircevski
b7ac510581 alternator: Implement IN operator in Expected
Add check_IN() and a switch case that invokes it.  Reactivate IN
tests.  Add a testcase for non-scalar attribute values.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-10-02 16:17:38 -04:00
Dejan Mircevski
56efa55a06 alternator: Implement NE operator in Expected
Recognize "NE" as a new operator type, add check_NE() function, invoke
it in verify_expected_one(), and reactivate NE tests.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-10-02 14:47:13 -04:00
Dejan Mircevski
af0462d127 alternator: Factor out common code in Expected
Operand-count verification will be repeated a lot as more operators
are implemented, so factor it out into verify_operand_count().

Also move `got` null checks to check_* functions, which reduces
duplication at call sites.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-10-02 14:36:57 -04:00
Konstantin Osipov
e8c13efb41 lwt: move mutation hashers to mutation.hh
Prepare mutation hashers for reuse in CAS implementation.
Message-Id: <20190930202409.40561-2-kostja@scylladb.com>
2019-10-01 19:49:31 +02:00
Konstantin Osipov
6cde985946 lwt: remove code that no longer serves as a reference
Remove ifdef'ed Java code, since LWT implementation
is based on the current state of the origin.
Message-Id: <20190930201022.40240-2-kostja@scylladb.com>
2019-10-01 19:46:15 +02:00
Konstantin Osipov
4d214b624b lwt: ensure enum_set::of is constexpr.
This allows using it to initialize const static members.
Message-Id: <20190930200530.40063-2-kostja@scylladb.com>
2019-10-01 19:45:56 +02:00
Tomasz Grabiec
3b9bf9d448 Merge "storage_proxy: replace variadic futures with structs" from Avi
Seastar variadic futures are deprecated, so replace with structs to
avoid nasty deprecation warnings.
2019-10-01 19:32:55 +02:00
Avi Kivity
162730862d storage_proxy: remove variadic future from query_partition_key_range_concurrent()
Seastar variadic futures are deprecated, so replace with a nice struct.
2019-09-30 21:33:44 +03:00
Avi Kivity
968b34a2b4 storage_proxy: remove variadic future from digest_read_resolver
Seastar variadic futures are deprecated, so replace with a nice
struct.
2019-09-30 21:32:17 +03:00
Avi Kivity
90096da9f3 managed_ref: add get() accessor
While a managed_ref emulates a reference more closely than it does
a pointer, it is still nullable, so add a get() (similar to
unique_ptr::get()) that can be nullptr if the reference is null.

The immediate use will be mutation_partition::_static_row, which
is often empty and takes up about 10% of a cache entry.
2019-09-30 20:55:36 +03:00
Nadav Har'El
c9aae13fae docs/alternator/getting-started.md: fix indentation in example code
The example Python code had wrong indentation, and wouldn't actually
work if naively copy-pasted. Noticed by Noam Hasson.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190929091440.28042-1-nyh@scylladb.com>
2019-09-30 13:03:29 +03:00
Avi Kivity
c6b66d197b Merge "Couple of preparatory patches for lwt" from Gleb
"
This is a collection of assorted patches that will be needed for LWT.
Most of them are trivial, but one touches a lot of files, so it has a
good chance of causing rebase headaches (I already had to rebase it on
top of Alternator). Let's push them earlier instead of carrying them in
the lwt branch.
"

* 'gleb/lwt-prepare-v2' of github.com:scylladb/seastar-dev:
  lwt: make _last_timestamp_micros static
  lwt: Add client_state::get_timestamp_for_paxos() function
  lwt: Pass client_state reference all the way to storage_proxy::query
  exceptions: Add a constructor for unavailable_exception that allows providing a custom message
  serializer: Add std::variant support
  lwt: Add missing functions to utils/UUID_gen.hh
2019-09-29 13:02:26 +03:00
Avi Kivity
9e990725d9 Merge "Simplify and explain from_varint_to_integer #5031" from Rafael
"
This is the second version of the patch series. The previous one was just the second patch; this one adds more tests and another patch to make it easier to test that the new code has the same behavior as the old one.
"

* 'espindola/overflow-is-intentional' of https://github.com/espindola/scylla:
  types: Simplify and explain from_varint_to_integer
  Add more cast tests
2019-09-29 11:27:55 +03:00
Tomasz Grabiec
b0e0f29b06 db: read: Filter out sstables using their first and last keys
Affects single-partition reads only.

Refs #5113

When executing a query on the replica we do several things in order to
narrow down the sstable set we read from.

For tables which use LeveledCompactionStrategy, we store sstables in
an interval set and we select only sstables whose partition ranges
overlap with the queried range. Other compaction strategies don't
organize the sstables and will select all sstables at this stage. The
reasoning behind this is that for non-LCS compaction strategies the
sstables' ranges will typically overlap and using interval sets in
this case would not be effective and would result in quadratic (in
sstable count) memory consumption.

The assumption for overlap does not hold if the sstables come from
repair or streaming, which generates non-overlapping sstables.

At a later stage, for single-partition queries, we use the sstables'
bloom filter (kept in memory) to drop sstables which surely don't
contain given partition. Then we proceed to sstable indexes to narrow
down the data file range.

Tables which don't use LCS will do unnecessary I/O to read index pages
for single-partition reads if the partition is outside of the
sstable's range and the bloom filter is ineffective (Refs #5112).

This patch fixes the problem by consulting sstable's partition range
in addition to the bloom filter, so that the non-overlapping sstables
will be filtered out with certainty and not depend on bloom filter's
efficiency.

It's also faster to drop sstables based on the keys than the bloom
filter.

Tests:
  - unit (dev)
  - manual using cqlsh

Reviewed-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190927122505.21932-1-tgrabiec@scylladb.com>
2019-09-28 19:42:57 +03:00
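The filtering step above boils down to an interval-overlap check against the sstable's own key range. A minimal sketch of the idea, using illustrative names (`sstable_span`, `may_contain_token`) rather than Scylla's actual types:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: an sstable whose [first_token, last_token] interval
// does not contain the queried token can be dropped with certainty, before
// (and independently of) the bloom filter check.
struct sstable_span {
    int64_t first_token;
    int64_t last_token;
};

inline bool may_contain_token(const sstable_span& s, int64_t query_token) {
    // inclusive on both ends, matching an sstable's first/last keys
    return s.first_token <= query_token && query_token <= s.last_token;
}
```

Non-overlapping sstables (e.g. produced by repair or streaming) are excluded by this test alone, regardless of bloom filter efficiency.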
Tomasz Grabiec
b93cc21a94 sstables: Fix partition key count estimation for a range
The method sstable::estimated_keys_for_range() was severely
under-estimating the number of partitions in an sstable for a given
token range.

The first reason is that it underestimated the number of sstable index
pages covered by the range, by one. In extreme, if the requested range
falls into a single index page, we will assume 0 pages, and report 1
partition. The reason is that we were using
get_sample_indexes_for_range(), which returns entries with the keys
falling into the range, not entries for pages which may contain the
keys.

A single page can have a lot of partitions though. By default, there
is a 1:20000 ratio between summary entry size and the data file size
covered by it. If partitions are small, that can be many hundreds of
partitions.

Another reason is that we underestimate the number of partitions in an
index page. We multiply the number of pages by:

   (downsampling::BASE_SAMPLING_LEVEL * _components->summary.header.min_index_interval)
     / _components->summary.header.sampling_level

Using defaults, that means multiplying by 128. In the cassandra-stress
workload a single partition takes about 300 bytes in the data file and
summary entry is 22 bytes. That means a single page covers 22 * 20'000
= 440'000 bytes of the data file, which contains about 1'466
partitions. So we underestimate by an order of magnitude.

Underestimating the number of partitions will result in too small
bloom filters being generated for the sstables which are the output of
repair or streaming. This will make the bloom filters ineffective
which results in reads selecting more sstables than necessary.

The fix is to base the estimation on the number of index pages which
may contain keys for the range, and multiply that by the average key
count per index page.

Fixes #5112.
Refs #4994.

The output of test_key_count_estimation:

Before:

count = 10000
est = 10112
est([-inf; +inf]) = 512
est([0; 0]) = 128
est([0; 63]) = 128
est([0; 255]) = 128
est([0; 511]) = 128
est([0; 1023]) = 128
est([0; 4095]) = 256
est([0; 9999]) = 512
est([5000; 5000]) = 1
est([5000; 5063]) = 1
est([5000; 5255]) = 1
est([5000; 5511]) = 1
est([5000; 6023]) = 128
est([5000; 9095]) = 256
est([5000; 9999]) = 256
est(non-overlapping to the left) = 1
est(non-overlapping to the right) = 1

After:

count = 10000
est = 10112
est([-inf; +inf]) = 10112
est([0; 0]) = 2528
est([0; 63]) = 2528
est([0; 255]) = 2528
est([0; 511]) = 2528
est([0; 1023]) = 2528
est([0; 4095]) = 5056
est([0; 9999]) = 10112
est([5000; 5000]) = 2528
est([5000; 5063]) = 2528
est([5000; 5255]) = 2528
est([5000; 5511]) = 2528
est([5000; 6023]) = 5056
est([5000; 9095]) = 7584
est([5000; 9999]) = 7584
est(non-overlapping to the left) = 0
est(non-overlapping to the right) = 0

Tests:
  - unit (dev)

Reviewed-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190927141339.31315-1-tgrabiec@scylladb.com>
2019-09-28 19:36:43 +03:00
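The corrected arithmetic described above can be sketched as follows. This is an illustrative model with a flat page layout and invented names (`estimate_keys`, `page_span`), not the actual sstable summary code:

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the fix: count index pages that MAY contain keys for the range
// (a range falling inside a single page counts as one page, never zero),
// then multiply by the average key count per page instead of a fixed factor.
inline uint64_t estimate_keys(uint64_t range_first, uint64_t range_last,
                              uint64_t keys_per_page, uint64_t page_span) {
    // index of the page that may hold each end of the range
    uint64_t first_page = range_first / page_span;
    uint64_t last_page = range_last / page_span;
    uint64_t pages = last_page - first_page + 1;  // inclusive, so >= 1
    return pages * keys_per_page;
}
```

With the numbers from the commit message (a page covering 440'000 bytes at ~300 bytes per partition, i.e. ~1'466 partitions per page), a single-page range now yields ~1'466 keys rather than 1.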
Piotr Sarna
10f90d0e25 types: remove deprecated comment
The comment does not apply anymore, as this definition is no more
in database.hh.
Message-Id: <a0b6ff851e1e3bcb5fcd402fbf363be7af0219af.1569580556.git.sarna@scylladb.com>
2019-09-27 19:32:17 +02:00
Dejan Mircevski
9a89e0c5ec dbuild: Update README on interactive mode
`dbuild` was recently (24c732057) updated to run in interactive mode
when given no arguments; we can now update the README to mention that.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-09-27 16:33:27 +02:00
Dejan Mircevski
f8638d8ae1 alternator: Add build byproducts to .gitignore
Add .pytest_cache and expressions.tokens to the top-level .gitignore.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-09-27 16:18:45 +02:00
Dejan Mircevski
332ffa77ea alternator: Actually use BEGINS_WITH in its tests
For some reason, BEGINS_WITH tests used EQ as comparison operator.

Tests: pytest test_expected.py

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-09-26 22:41:34 +03:00
Tomasz Grabiec
5b0e48f25b Merge "toppartitions: don't transport schema_ptr across shards" from Avi
When the toppartitions operation gathers results, it copies partition
keys with their schema_ptr:s. When these schema_ptr:s are copied
or destroyed, they can cause leaks or premature frees of the schema
in its original shard since reference count operations are not atomic.

Fix that by converting the schema_ptr to a global_schema_ptr during
transportation.

Fixes #5104 (direct bug)
Fixes #5018 (schema prematurely freed, toppartitions previously executed on that node)
Fixes #4973 (corrupted memory pool of the same size class as schema, toppartitions previously executed on that node)

Tests: new test added that fails with the existing code in debug mode,
manual toppartitions test
2019-09-26 17:09:54 +02:00
Avi Kivity
36b4d55b28 tests: add test for toppartitions cross-shard schema_ptr copy 2019-09-26 17:40:46 +03:00
Avi Kivity
670f398a8a toppartitions: do not copy schema_ptr:s in item keys across shards
Copying schema_ptrs across shards results in memory corruption since
lw_shared_ptr does not use atomic operations for reference counts.
Prevent that by converting schema_ptr:s to global_schema_ptr:s before
shipping them across shards in the map operation, and converting them
back to local schema_ptr:s in the reduce operation.
2019-09-26 17:26:40 +03:00
Avi Kivity
f015bd69b7 toppartitions: compare schemas using schema::id(), not pointer to schema
This allows keys from different stages in the schema's life to compare equal.
This is safe since the partition key cannot change, unlike the rest of the schema.

More importantly, it will allow us to compare keys made local after a pass through
global_schema_ptr, which does not guarantee that the schema_ptr conversion will be
the same even when starting with the same global_schema_ptr.
2019-09-26 17:15:46 +03:00
Avi Kivity
ea4976a128 schema_registry: mark global_schema_ptr move constructor noexcept
Throwing move constructors are a pain, so we should try to make
them noexcept. Currently, global_schema_ptr's move constructor
throws an exception if used illegally (moving from a different shard);
this patch changes it to an assert, on the grounds that this error
is impossible to recover from.

The direct motivation for the patch is the desire to store objects
containing a global_schema_ptr in a chunked_vector, to move lists
of partition keys across shards for the toppartitions functionality.
chunked_vector currently requires noexcept move constructors for its
value_type.
2019-09-26 16:56:59 +03:00
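The assert-instead-of-throw pattern from this commit can be sketched in a few lines. `cross_shard_ref` and `current_shard()` below are illustrative stand-ins, not Scylla's actual `global_schema_ptr` or shard API:

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

inline unsigned current_shard() { return 0; }  // stand-in for the real shard id

// Sketch: the move constructor asserts its precondition rather than throwing,
// so the type is nothrow-move-constructible, which chunked_vector-like
// containers require of their value_type.
class cross_shard_ref {
    unsigned _owner_shard;
public:
    cross_shard_ref() : _owner_shard(current_shard()) {}
    cross_shard_ref(cross_shard_ref&& o) noexcept : _owner_shard(o._owner_shard) {
        // moving from a different shard is an unrecoverable programming error:
        // abort instead of throwing
        assert(_owner_shard == current_shard());
    }
};

static_assert(std::is_nothrow_move_constructible_v<cross_shard_ref>,
              "required to store the type in a chunked_vector");
```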
Avi Kivity
ba64ec78cf messaging_service: use rpc::tuple instead of variadic futures for rpc
Since variadic future<> is deprecated, switch to rpc::tuple for multiple
return values in rpc calls. This is more or less mechanical translation.
2019-09-26 12:09:31 +02:00
Tomasz Grabiec
9183e28f2c Merge "Recreate dependent user types" from Rafael
When a user type changes we were not recreating other user types that
use it. This patch series fixes that and makes it clear which code is
responsible for it.

In the system.types table a user type refers to another by name. When
a user type is modified, only its entry in the table is changed.

At runtime a user type has a direct pointer to the types it uses. To
handle the discrepancy we need to recreate any dependent types when an
entry in system.types changes.

Fixes #5049
2019-09-26 12:06:32 +02:00
Gleb Natapov
e0b303b432 lwt: make _last_timestamp_micros static
If each client_state has its own copy of the variable two clients may
generate timestamps that clash and needlessly create contention. Making
the variable shared between all client_state on the same shard will make
sure this will not happen to two clients on the same shard. It may still
happen for two clients on two different shards or two different nodes.
2019-09-26 11:44:00 +03:00
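The per-shard uniqueness argument above rests on a single monotonic counter shared by all clients on a shard. A minimal sketch of such a timestamp generator, with illustrative names (the real code lives in client_state):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Sketch: one shared last-timestamp value per shard guarantees that two
// clients on the same shard never draw the same microsecond timestamp,
// even if the wall clock has not advanced between calls.
inline int64_t next_unique_timestamp(int64_t& last_micros, int64_t now_micros) {
    last_micros = std::max(now_micros, last_micros + 1);
    return last_micros;
}
```

As the commit notes, this does not help across shards or across nodes; there the clash remains possible.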
Gleb Natapov
622d21f740 lwt: Add client_state::get_timestamp_for_paxos() function
Paxos needs a unique timestamp that is greater than some other
timestamp, so that the next round has a better chance to succeed.
Add a function that returns such a timestamp.
2019-09-26 11:44:00 +03:00
Gleb Natapov
e72a105b5e lwt: Pass client_state reference all the way to storage_proxy::query
client_state holds a state to generate monotonically increasing unique
timestamp. Queries with a SERIAL consistency level need it to generate
a paxos round.
2019-09-26 11:44:00 +03:00
Gleb Natapov
556f65e8a1 exceptions: Add a constructor for unavailable_exception that allows providing a custom message 2019-09-26 11:44:00 +03:00
Gleb Natapov
209414b4eb serializer: Add std::variant support 2019-09-26 11:44:00 +03:00
Gleb Natapov
f9209e27d4 lwt: Add missing functions to utils/UUID_gen.hh
Some lwt related code is missing in our UUID implementation. Add it.
2019-09-26 11:44:00 +03:00
Rafael Ávila de Espíndola
5af8b1e4a3 types: recreate dependent user types.
In the system.types table a user type refers to another by name. When
a user type is modified, only its entry in the table is changed.

At runtime a user type has a direct pointer to the types it uses. To
handle the discrepancy we need to recreate any dependent types when an
entry in system.types changes.

Fixes #5049

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-25 15:41:45 -07:00
Rafael Ávila de Espíndola
4c3209c549 types: Don't include dependent user types in update.
The way schema changes propagate is by editing the system tables and
comparing the before and after state.

When a user type A uses another user type B and we modify B, the
representation of A in the system table doesn't change, so this code
was not producing any changes on the diff that the receiving side
uses.

Deleting it makes it clear that it is the receiver's responsibility to
handle dependent user types.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-25 15:41:45 -07:00
Rafael Ávila de Espíndola
34eddafdb0 types: Don't modify the type list in db::cql_type_parser::raw_builder
With this patch db::cql_type_parser::raw_builder creates a local copy
of the list of existing types and uses that internally. By doing that
build() should have no observable behavior other than returning the
new types.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-25 15:41:45 -07:00
Rafael Ávila de Espíndola
d6b2e3b23b types: pass a reference to prepare_internal
We were never passing a null pointer and never saving a copy of the
lw_shared_ptr. Passing a reference is more flexible as not all callers
are required to hold the user_types_metadata in a lw_shared_ptr.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-25 15:40:30 -07:00
Avi Kivity
03260dd910 Update seastar submodule
* seastar b56a8c5045...c21a7557f9 (3):
  > net: socket::{set,get}_reuseaddr() should not be virtual
  > iotune: print verbose message in case of shutdown errors
  > iotune: close test file on shutdown

Fixes #4946.
2019-09-25 16:08:32 +03:00
Tomasz Grabiec
06b9818e98 Merge "storage_proxy: tolerate view_update_write_response_handler id not found on shutdown" from Benny
1. Add assert in remove_response_handler to make crashes like in #5032 easier to understand.
2. Look up the view_update_write_response_handler id before calling timeout_cb and tolerate it not being found.
   Just log a warning if this happens.

Fixes #5032
2019-09-25 14:49:42 +02:00
Avi Kivity
83bc59a89f Merge "mvcc: Fix incorrect schema version being used to copy the mutation when applying (#5099)" from Tomasz
"
Currently affects only counter tables.

Introduced in 27014a2.

mutation_partition(s, mp) is incorrect because it uses s to interpret
mp, while it should use mp_schema.

We may hit this if the current node has a newer schema than the
incoming mutation. This can happen during table schema altering when we receive the
mutation from a node which hasn't processed the schema change yet.

This is undefined behavior in general. If the alter was adding or
removing columns, this may result in corruption of the write where
values of one column are inserted into a different column.

Fixes #5095.
"

* 'fix-schema-alter-counter-tables' of https://github.com/tgrabiec/scylla:
  mvcc: Fix incorrect schema version being used to copy the mutation when applying
  mutation_partition: Track and validate schema version in debug builds
  tests: Use the correct schema to access mutation_partition
2019-09-25 15:30:22 +03:00
Tomasz Grabiec
11440ff792 mvcc: Fix incorrect schema version being used to copy the mutation when applying
Currently affects only counter tables.

Introduced in 27014a2.

mutation_partition(s, mp) is incorrect, because it uses s to interpret
mp, while it should use mp_schema.

We may hit this if the current node has a newer schema than the
incoming mutation. This can happen during alter when we receive the
mutation from a node which hasn't processed the schema change yet.

This is undefined behavior in general. If the alter was adding or
removing columns, this may result in corruption of the write where
values of one column are inserted into a different column.

Fixes #5095.
2019-09-25 11:28:07 +02:00
Tomasz Grabiec
bce0dac751 mutation_partition: Track and validate schema version in debug builds
This patch makes mutation_partition validate the invariant that it's
supposed to be accessed only with the schema version which it conforms
to.

Refs #5095
2019-09-25 10:27:06 +02:00
Avi Kivity
721fa44c4f Update seastar submodule
* seastar e51a1a8ed9...b56a8c5045 (3):
  > net: add support for UNIX-domain sockets
  > future: Warn on promise::set_exception with no corresponding future or task
  > Merge "Handle exceptions in repeat_until_value and misc cleanups" from Rafael
2019-09-25 11:21:57 +03:00
Benny Halevy
e9388b3f03 storage_proxy::drain_on_shutdown fixup indentation
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-25 11:19:50 +03:00
Benny Halevy
b7c7af8a75 storage_proxy: validate id from view_update_handlers_list
Handle a race where a write handler is removed from _response_handlers
but not yet from _view_update_handlers_list.

Fixes #5032

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-25 11:19:50 +03:00
Benny Halevy
1fea5f5904 storage_proxy: refactor remove_response_handler
Refactor remove_response_handler_entry out of remove_response_handler,
to be called on a valid iterator found by _response_handlers.find(id).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-25 11:19:50 +03:00
Benny Halevy
592c4bcfc2 storage_proxy: remove_response_handler: assert id was found
Help identify cases like seen in #5032 where the handler id
wasn't found from the on_down -> timeout_cb path.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-25 11:19:50 +03:00
Raphael S. Carvalho
571fa94eb5 sstables/compaction_manager: Don't perform upgrade on shared SSTables
compaction_manager::perform_sstable_upgrade() fails when it feeds
the compaction mechanism with shared sstables. Shared sstables should
be ignored when performing upgrade, waiting instead for reshard to pick
them up in parallel. Whenever a shared sstable is brought up either
on restart or via refresh, reshard procedure kicks in.
Reshard picks the highest supported format so the upgrade for
shared sstable will naturally take place.

Fixes #5056.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190925042414.4330-1-raphaelsc@scylladb.com>
2019-09-25 11:18:40 +03:00
Asias He
19e8c14ad1 gossiper: Improve the gossip timer callback lock handling (#5097)
- Update the outdated comments in do_stop_gossiping. It was
  storage_service not storage_proxy that used the lock. More
  importantly, storage_service does not use it any more.

- Drop the unused timer_callback_lock and timer_callback_unlock API

- Use with_semaphore to make sure the semaphore usage is balanced.

- Add log in gossiper::do_stop_gossiping when it tries to take the
  semaphore to help debug hang during the shutdown.

Refs: #4891
Refs: #4971
2019-09-25 10:46:38 +03:00
Tomasz Grabiec
4d9b176aaa tests: Use the correct schema to access mutation_partition 2019-09-24 19:46:57 +02:00
Botond Dénes
425cc0c104 doc: add debugging.md
A documentation file that is intended to be a place for anything
debugging related: getting started tutorial, tips and tricks and
advanced guides.
For now it contains a short introduction, some selected links to
more in-depth documentation and some tips and tricks that I could think
of off the top of my head.
One of those tricks describes how to load cores obtained from
relocatable packages inside the `dbuild` container. I originally
intended to add that to `tools/toolchain/README.md` but was convinced
that `docs/debugging.md` would be a better place for this.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190924133110.15069-1-bdenes@scylladb.com>
2019-09-24 20:18:45 +03:00
Botond Dénes
d57ab83bc8 querier_cache: add inserted stat
Recently we have seen a case where the population stat of the cache was
corrupt, either due to misaccounting or some more serious corruption.
When debugging something like that it would have been useful to know how
many items have been inserted to the cache. I also believe that such a
counter could be useful generally as well.

Refs: #4918

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190924083429.43038-1-bdenes@scylladb.com>
2019-09-24 10:52:49 +02:00
Avi Kivity
8e8a048ada Merge "lsa: Assert no cross-shard region locking #5090" from Tomasz
"
We observed an abort on bad_alloc which was not caused by real OOM,
but could be explained by cache region being locked from a different
shard, which is not allowed, concurrently with memory reclamation.

It's impossible now to prove this, or, if that was indeed the case, to
determine which code path was attempting such lock. This patch adds an
assert which would catch such incorrect locking at the attempt.

Refs #4978

Tests:
 - unit (dev, release, debug)
"

* 'assert-no-xshard-lsa-locking' of https://github.com/tgrabiec/scylla:
  lsa: Assert no cross-shard region locking
  tests: Make managed_vector_test a seastar test
2019-09-23 19:52:47 +03:00
Avi Kivity
79d17f3c80 Update seastar submodule
* seastar 2a526bb120...e51a1a8ed9 (2):
  > rpc: introduce rpc::tuple as a way to move away from variadic future
  > shared_future: don't warn on broken futures
2019-09-23 19:50:40 +03:00
Avi Kivity
1b8009d10c sstables: compaction_manager: #include seastarx.hh
Make it easier for the IDE to resolve references to the seastar
namespace. In any case include files should be stand-alone and not
depend on previously included files.
2019-09-23 16:12:49 +02:00
Avi Kivity
07af9774b3 relocatable: erase build directory from executable and debug info
The build directory is meaningless, since it is typically some
directory in a continuous integration server. That means someone
debugging the relocatable package needs to issue the gdb command
'set substitute-path' with the correct arguments, or they lose
source debugging. Doing so in the relocatable package build saves
this step.

The default build is not modified, since a typical local build
benefits from having the paths hardcoded, as the debugger will
find the sources automatically.
2019-09-23 13:08:15 +02:00
Tomasz Grabiec
eb08ab7ed9 lsa: Assert no cross-shard region locking
We observed an abort on bad_alloc which was not caused by real OOM,
but could be explained by cache region being locked from a different
shard, which is not allowed, concurrently with memory reclamation.

It's impossible now to prove this, or, if that was indeed the case, to
determine which code path was attempting such lock. This patch adds an
assert which would catch such incorrect locking at the attempt.

Refs #4978
2019-09-23 12:51:29 +02:00
Tomasz Grabiec
8bedcd6696 tests: Make managed_vector_test a seastar test
LSA will depend on seastar reactor being present.
2019-09-23 12:51:24 +02:00
Raphael S. Carvalho
b4cf429aab sstables/LCS: Fix increased write amplification due to incorrect SSTable demotion
LCS demotes an SSTable from a given level when it thinks that level is inactive.
An inactive level means N rounds (compaction attempts) without any activity in it,
in other words, no SSTable has been promoted to it.
The problem happens because the metadata that tracks inactiveness of each level
can be incorrectly updated when there's an ongoing compaction. LCS has parallel
compaction disabled. So if a table finds itself running a long operation like
cleanup that blocks minor compaction, LCS could incorrectly think that many
levels need demotion, and by the time cleanup finishes, some demotions would
incorrectly take place.
This problem is fixed by only updating the counter that tracks inactiveness
when compaction completes, so it's not incorrectly updated when there's an
ongoing compaction for the table.

Fixes #4919.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190917235708.8131-1-raphaelsc@scylladb.com>
2019-09-22 10:46:38 +03:00
Eliran Sinvani
280715ad45 Storage proxy: protect against infinite recursion in query_partition_key_range_concurrent
A recent fix to #3767 limited the amount of ranges that
can return from query_ranges_to_vnodes_generator. This, combined
with a large number of token ranges, can lead to
an infinite recursion. The algorithm multiplies by a factor of
2 (actually a shift left by one) the amount of requested
tokens in each recursion iteration. As long as the requested
number of ranges is greater than 0, the recursion is implicit,
and each call is scheduled separately since the call is inside
a continuation of a map reduce.
But if the amount of iterations is large enough (~32) the
counter for requested ranges zeros out and from that moment on
two things will happen:
1. The counter will remain 0 forever (0*2 == 0)
2. The map reduce future will be immediately available and this
will result in the continuation being invoked immediately.
The latter causes the recursive call to be a "regular" recursive call,
i.e. through the stack and not the task queue of the scheduler, and
the former causes this recursion to be infinite.
The combination creates a stack that keeps growing and eventually
overflows resulting in undefined behavior (due to memory overrun).

This patch prevents the problem from happening: it limits the growth of
the concurrency counter to twice the last amount of tokens returned
by the query_ranges_to_vnodes_generator, and also makes sure it does
not get stuck at zero.

Testing: * Unit test in dev mode.
         * Modified add 50 dtest that reproduce the problem

Fixes #4944

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <20190922072838.14957-1-eliransin@scylladb.com>
2019-09-22 10:33:31 +03:00
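The doubling-with-clamp fix described above can be sketched in isolation. Names (`next_concurrency`) are illustrative; the real logic lives inside query_partition_key_range_concurrent:

```cpp
#include <cassert>
#include <cstddef>

// Sketch: the concurrency counter is doubled each round, but (1) it may
// never stay at zero (0 * 2 == 0 would otherwise recurse forever), and
// (2) it is capped at twice the number of ranges the generator last
// returned, so it cannot grow without bound.
inline size_t next_concurrency(size_t current, size_t ranges_returned) {
    size_t next = current * 2;
    if (next == 0) {
        next = 1;                      // never get stuck at zero
    }
    size_t cap = 2 * ranges_returned;
    if (cap > 0 && next > cap) {
        next = cap;                    // at most twice the last result size
    }
    return next;
}
```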
Gleb Natapov
73e3d0a283 messaging_service: enable reuseaddr on messaging service rpc
Fixes #4943

Message-Id: <20190918152405.GV21540@scylladb.com>
2019-09-19 11:43:03 +03:00
Rafael Ávila de Espíndola
4d0916a094 commitlog: Handle gate_closed_exception
Before this patch, if the _gate is closed, with_gate throws and
forward_to is not executed. When the promise<> p is destroyed it marks
its _task as a broken promise.

What happens next depends on the branch.

On master, we warn when the shared_future is destroyed, so this patch
changes the warning from a broken_promise to a gate closed.

On 3.1, we warn when the promises in shared_future::_peers are
destroyed since they no longer have a future attached: The future that
was attached was the "auto f" just before the with_gate call, and it
is destroyed when with_gate throws. The net result is that this patch
fixes the warning in 3.1.

I will send a patch to seastar to make the warning on master more
consistent with the warning in 3.1.

Fixes #4394

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190917211915.117252-1-espindola@scylladb.com>
2019-09-17 23:41:21 +02:00
Avi Kivity
60656d1959 Update seastar submodule
* seastar 84d8e9fe9b...2a526bb120 (1):
  > iotune: fix exception handling in case test file creation fails

Fixes #5001.
2019-09-16 19:39:14 +03:00
Glauber Costa
c9f2d1d105 do not crash in user-defined operations if the controller is disabled
Scylla currently crashes if we run manual operations like nodetool
compact with the controller disabled. While we neither like nor
recommend running with the controller disabled, due to some corner cases
in the controller algorithm we are not yet at the point in which we can
deprecate this and are sometimes forced to disable it.

The reason for the crash is that manual operations will invoke
_backlog_of_shares, which returns the backlog needed to
create a certain number of shares. That scans the existing control
points, but when we run without the controller there are no control
points and we crash.

Backlog doesn't matter if the controller is disabled, and the return
value of this function will be immaterial in this case. So to avoid the
crash, we return something right away if the controller is disabled.

Fixes #5016

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2019-09-16 18:26:57 +02:00
Avi Kivity
d77171e10e build: adjust libthread_db file name to match gdb expectations
gdb searches for libthread_db.so using its canonical name of libthread_db.so.1 rather
than the file name of libthread_db-1.0.so, so use that name to store the file in the
archive.

Fixes #4996.
2019-09-16 14:48:42 +02:00
Avi Kivity
7502985112 Update seastar submodule
* seastar b3fb4aaab3...84d8e9fe9b (8):
  > Use aio fsync if available
  > Merge "fix some tcp connection bugs and add reuseaddr option to a client socket" from Gleb
  > lz4: use LZ4_decompress_safe
  > reactor: document seastar::remove_file()
  > core/file.hh: remove redundant std::move()
  > core/{file,sstring}: do not add `const` to return value
  > http/api_docs: always call parent constructor
  > Add input_stream blurb
2019-09-16 11:52:55 +03:00
Piotr Sarna
feec3825aa view: degrade shutdown bookkeeping update failures log to warn
Currently, if updating bookkeeping operations for view building fails,
we log the error message and continue. However, during shutdown,
some errors are more likely to happen due to existing issues
like #4384. To differentiate actual errors from semi-expected
errors during shutdown, the latter are now logged with a warning
level instead of error.

Fixes #4954
2019-09-16 10:13:06 +03:00
Piotr Sarna
f912122072 main: log unexpected errors thrown on shutdown (#4993)
Shutdown routines are usually implemented via the deferred_action
mechanism, which runs a function in its destructor. We thus expect
the function to be noexcept, but unfortunately it's not always
the case. Throwing in the destructor results in terminating the program
anyway, but before we do that, the exception can be logged so it's
easier to investigate and pinpoint the issue.

Example output before the patch:
INFO  2019-09-10 12:49:05,858 [shard 0] view - Stopping view builder
terminate called without an active exception
Aborting on shard 0.
Backtrace:
  0x000000000184a9ad
(...)

Example output after the patch:
INFO  2019-09-10 12:49:05,858 [shard 0] view - Stopping view builder
ERROR 2019-09-10 12:49:05,858 [shard 0] init - Unexpected error on shutdown: std::runtime_error (Hello there!)
terminate called without an active exception
Aborting on shard 0.
Backtrace:
  0x000000000184a9ad
(...)
2019-09-16 09:42:55 +03:00
Rafael Ávila de Espíndola
1d9ba4c79b types: Simplify and explain from_varint_to_integer
This simplifies the implementation of from_varint_to_integer and
avoids using the fact that a static_cast from cpp_int to uint64_t
seems to just keep the low 64 bits.

The boost release notes
(https://www.boost.org/users/history/version_1_67_0.html) imply that
the conversion function should return the maximum value a uint64_t can
hold if the original value is too large.

The idea of using a & with ~0 is a suggestion from the boost release
notes.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-15 14:44:54 -07:00
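The masking idea from the commit above can be shown without boost. Below, `unsigned __int128` (a GCC/Clang extension) stands in for boost's cpp_int, and `low_64_bits` is an illustrative name, not the function from types.cc:

```cpp
#include <cassert>
#include <cstdint>

// Sketch: instead of relying on static_cast's truncation behavior for a
// multiprecision integer, explicitly keep the low 64 bits with an & of ~0
// before converting, making the truncation intentional and well-defined.
inline uint64_t low_64_bits(unsigned __int128 v) {
    return static_cast<uint64_t>(v & static_cast<uint64_t>(~0ULL));
}
```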
Rafael Ávila de Espíndola
6611e9faf7 Add more cast tests
These cover converting a varint to a value smaller than 64 bits.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-15 14:44:54 -07:00
Benny Halevy
c22ad90c04 scyllatop: livedata, metric: expire absent metrics
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-15 19:48:09 +03:00
Benny Halevy
6e807a56e1 scyllatop: livedata: update all metrics based on new discovered list
Update the current results dictionary using the Metric.discover method.

New results are added and missing results are marked as absent
(both full metrics and specific keys).

Previously, with prometheus, each metric.update called query_list,
resulting in O(n^2) behavior when all metrics were updated, as in the
scylla_top dtest - causing a test timeout when testing a debug build.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-15 19:45:34 +03:00
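The single-pass update scheme from the commit above can be sketched as follows (hypothetical names, not scyllatop's actual code): one discovery snapshot refreshes every metric, instead of one query per metric.

```python
def update_all(current: dict, discovered: dict) -> dict:
    """Refresh all metrics from one discovery snapshot - O(n), not O(n^2).

    Metrics present in `discovered` get new values; metrics that
    disappeared are marked absent instead of being re-queried one by one.
    """
    for name, value in discovered.items():
        current[name] = {"value": value, "absent": False}
    for name in current.keys() - discovered.keys():
        current[name]["absent"] = True
    return current
```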
Benny Halevy
16de4600a0 scyllatop: metric: return discover results as dict
So that we can easily search by symbol for updating
multiple results in a single pass.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-15 16:07:19 +03:00
Benny Halevy
02707621d4 scyllatop: metric: update_info in discover
So that all metric information can be retrieved in a single pass.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-15 16:07:19 +03:00
Benny Halevy
3861460d3b scyllatop: metric: refactor update method
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-15 16:07:19 +03:00
Benny Halevy
99ab60fc27 scyllatop: metric: add_to_results
In preparation for changing results to a dict,
use a method to add a new metric to the results.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-15 16:07:19 +03:00
Benny Halevy
b489556807 scyllatop: metric: refactor discover and discover_with_help
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-15 16:07:19 +03:00
Benny Halevy
8f7c721907 scyllatop: livedata: get rid of _setupUserSpecifiedMetrics
Add self._metricPatterns member and merge _setupUserSpecifiedMetrics
with _initializeMetrics.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-15 16:07:19 +03:00
Benny Halevy
c17aee0dd3 scyllatop: add debug logging
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-15 16:07:19 +03:00
Tomasz Grabiec
79935df959 commitlog: replay: Respect back-pressure from memtable space to prevent OOM
Commit log replay was bypassing memtable space back-pressure, and if
replay was faster than memtable flush, it could lead to OOM.

The fix is to call database::apply_in_memory() instead of
table::apply(). The former blocks when memtable space is full.

Fixes #4982.

Tests:
  - unit (release)
  - manual, replay with and without memtable flush failing

Message-Id: <1568381952-26256-1-git-send-email-tgrabiec@scylladb.com>
2019-09-15 11:51:56 +03:00
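The back-pressure idea from the commit above can be sketched as a toy model (not Scylla's API; replay must wait for memtable space instead of growing it without bound):

```python
class MemtableSpace:
    """Toy back-pressure sketch: apply_in_memory waits for space -
    modeled here by a synchronous flush - before accepting more data."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.used = 0
        self.flushes = 0

    def flush(self):
        self.flushes += 1
        self.used = 0

    def apply_in_memory(self, size: int):
        if self.used + size > self.capacity:  # memtable space exhausted
            self.flush()                      # replay blocks until space frees
        self.used += size
```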
Tomasz Grabiec
3c49b2960b gdb: Introduce 'scylla memtables'
Example output:

(gdb) scylla memtables
table "ks_truncate"."standard1":
  (memtable*) 0x60c0005a5500: total=131072, used=131072, free=0, flushed=0
table "keyspace1"."standard1":
  (memtable*) 0x60c0005a6000: total=5144444928, used=4512728524, free=631716404, flushed=0
  (memtable*) 0x60c0005a8a80: total=426901504, used=374294312, free=52607192, flushed=0
  (memtable*) 0x60c000eb6a80: total=0, used=0, free=0, flushed=0
table "system_traces"."sessions_time_idx":
  (memtable*) 0x60c0005a4d80: total=131072, used=131072, free=0, flushed=0


Message-Id: <1568133476-22463-1-git-send-email-tgrabiec@scylladb.com>
2019-09-15 10:39:55 +03:00
Kamil Braun
9bf4fe669f Auto-expand replication_factor for NetworkTopologyStrategy (#4667)
If the user supplies 'replication_factor' to the 'NetworkTopologyStrategy' class,
it will be expanded into a replication factor for each existing DC for convenience.

Resolves #4210.

Signed-off-by: Kamil Braun <kbraun@scylladb.com>
2019-09-15 10:38:09 +03:00
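The expansion described above can be sketched in Python (hypothetical signature; the real code lives in NetworkTopologyStrategy's option handling):

```python
def expand_replication_factor(options: dict, datacenters: list) -> dict:
    """Expand a bare 'replication_factor' option into a per-DC factor.

    {'replication_factor': '3'} with DCs dc1, dc2 becomes
    {'dc1': '3', 'dc2': '3'}; an explicit per-DC option wins.
    """
    options = dict(options)
    rf = options.pop("replication_factor", None)
    if rf is not None:
        for dc in datacenters:
            options.setdefault(dc, rf)
    return options
```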
Tomasz Grabiec
8517eecc28 Revert "Simplify db::cql_type_parser::parse"
This reverts commit 7f64a6ec4b.

Fixes #5011

The reverted commit exposes #3760 for all schemas, not only those
which have UDTs.

The problem is that table schema deserialization now requires keyspace
to be present. If the replica hasn't received schema changes which
introduce the keyspace yet, the write will fail.
2019-09-12 12:45:21 +02:00
Nadav Har'El
67a07e9cbc README.md: mention Alternator
Mention on the top-level README.md that Scylla by default is compatible
with Cassandra, but also has experimental support for DynamoDB's API.
Provide links to alternator/alternator.md and alternator/getting-started.md
with more information about this feature.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190911080913.10141-1-nyh@scylladb.com>
2019-09-11 18:01:58 +03:00
Avi Kivity
c08921b55a Merge "Alternator - Add support for DynamoDB Compatible API in Scylla" from Nadav & Piotr
"
In this patch set, written by Piotr Sarna and myself, we add Alternator - a new
Scylla feature adding compatibility with the API of Amazon DynamoDB(TM).
DynamoDB's API uses JSON-encoded requests and responses which are sent over
an HTTP or HTTPS transport. It is described in detail on Amazon's site:
https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/

Our goal is that any application written to use Amazon DynamoDB could
be run, unmodified, against Scylla with Alternator enabled. However, at this
stage the Alternator implementation is incomplete, and some of DynamoDB's
API features are not yet supported. The extent of Alternator's compatibility
with DynamoDB is described in the document docs/alternator/alternator.md
included in this patch set. The same document also describes Alternator's
design (and also points to a longer design document).

By default, Scylla continues to listen only to Cassandra API requests and not
DynamoDB API requests. To enable DynamoDB-API compatibility, you must set
the alternator-port configuration option (via command line or YAML) to the port on
which you wish to listen for DynamoDB API requests. For more information, see
docs/alternator/alternator.md. The document docs/alternator/getting-started.md
also contains some examples of how to get started with Alternator.
"

* 'alternator' of https://github.com/nyh/scylla: (272 commits)
  Added comments about DAX, monitoring and more
  alternator: fix usage of client_state
  alternator-test: complete test_expected.py for rest of comparison operators
  alternator-test: reproduce bug in Expected with EQ of set value
  alternator: implement the Expected request parameter
  alternator: add returning PAY_PER_REQUEST billing mode
  alternator: update docs/alternator.md on GSI/LSI situation
  Alternator: Add getting started document for alternator
  move alternator.md to its own directory
  alternator-test: add xfail test for GSI with 2 regular columns
  alternator/executor.cc: Latencies should use steady_clock
  alternator-test: fix LSI tests
  alternator-test: fix test_describe_endpoints.py for AWS run
  alternator-test: test_describe_endpoints.py without configuring AWS
  alternator: run local tests without configuring AWS
  alternator-test: add LSI tests
  alternator-test: bump create table time limit to 200s
  alternator: add basic LSI support
  alternator: rename reserved column name "attrs"
  alternator: migrate make_map_element_restriction to string view
  ...
2019-09-11 18:01:05 +03:00
Dor Laor
7d639d058e Added comments about DAX, monitoring and more 2019-09-11 18:01:05 +03:00
Nadav Har'El
c953aa3e20 alternator-test: complete test_expected.py for rest of comparison operators
This patch adds tests for all the missing comparison operators in the
Expected parameter (the old-style parameter for conditional operations).
All these new tests are now xfailing on Alternator (and succeeding on
DynamoDB), because these operators are not yet implemented in Alternator
(we only implemented EQ and BEGINS_WITH, so far - the rest are easy but
need to be implemented).

The test_expected.py is now hopefully comprehensive, covering the entire
feature set of the "Expected" parameter and all its various cases and
subcases.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190910092208.23461-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
23bb3948ee alternator-test: reproduce bug in Expected with EQ of set value
Our implementation of the "EQ" operator in Expected (conditional
operation) just compares the JSON representation of the values.
This is almost always correct, but unfortunately incorrect for
sets - two equal sets can serialize their elements in a
different order.

This patch just adds an (xfailing) test for this bug.

The bug itself can be fixed in the future in one of several ways
including changing the implementation of EQ, or changing the
serialization of sets so they'll always be sorted in the same
way.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190909125147.16484-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
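The bug described above can be demonstrated in Python, using DynamoDB's "SS" string-set encoding (illustrative sketch; the fix shown is the "compare as sets" option mentioned in the commit):

```python
import json

def eq_by_json(a, b) -> bool:
    # Buggy approach: compares serialized text, so element order matters.
    return json.dumps(a) == json.dumps(b)

def eq_set_aware(a, b) -> bool:
    # Fix sketch: compare DynamoDB-style string sets {"SS": [...]} as sets.
    if isinstance(a, dict) and isinstance(b, dict) and a.keys() == b.keys() == {"SS"}:
        return set(a["SS"]) == set(b["SS"])
    return a == b
```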
Nadav Har'El
13d657b20d alternator: implement the Expected request parameter
In this patch we implement the Expected parameter for the UpdateItem,
PutItem and DeleteItem operations. This parameter allows a conditional
update - i.e., do an update only if the existing value of the item
matches some condition.
This is the older form of conditional updates, but is still used by many
applications, including Amazon's Tic-Tac-Toe demo.

As usual, we do not yet provide isolation guarantees for read-modify-write
operations - the item is simply read before the modification, and there is
no protection against concurrent operation. This will of course need to be
addressed in the future.

The Expected parameter has a relatively large number of variations, and most
of them are supported by this code, except that currently only two comparison
operators are supported (EQ and BEGINS_WITH) out of the 13 listed in the
documentation. The rest will be implemented later.

This patch also includes comprehensive tests for the Expected feature.
These tests are almost exhaustive, except for one missing part (labeled FIXME) -
among the 13 comparison operations, the tests only check the EQ and BEGINS_WITH
operators. We'll later need to add checks to the rest of them as well.
As usual, all the tests pass on Amazon DynamoDB, and after this patch all
of them succeed on Alternator too.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190905125558.29133-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
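The Expected evaluation described above can be sketched minimally in Python, supporting the same two operators the commit implements (EQ and BEGINS_WITH). Plain values are used here for brevity; real DynamoDB attribute values are typed JSON objects.

```python
def check_expected(item: dict, expected: dict) -> bool:
    """Evaluate an 'Expected' condition against an item before a write.

    `expected` follows the DynamoDB shape:
    {attr: {"ComparisonOperator": op, "AttributeValueList": [operand]}}.
    """
    for attr, cond in expected.items():
        op = cond["ComparisonOperator"]
        operand = cond["AttributeValueList"][0]
        value = item.get(attr)
        if op == "EQ":
            if value != operand:
                return False
        elif op == "BEGINS_WITH":
            if not (isinstance(value, str) and value.startswith(operand)):
                return False
        else:
            raise NotImplementedError(op)  # remaining 11 operators
    return True
```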
Piotr Sarna
c5fc48d1ee alternator: add returning PAY_PER_REQUEST billing mode
In order for Spark jobs to work correctly, a hardcoded PAY_PER_REQUEST
billing mode entry is returned when describing a table with
a DescribeTable request.
Also, one test case in test_describe_table.py is no longer marked XFAIL.
Message-Id: <a4e6d02788d8be48b389045e6ff8c1628240197c.1567688894.git.sarna@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
b58eadd6c9 alternator: update docs/alternator.md on GSI/LSI situation
Update docs/alternator.md on the current level of compatibility of our
GSI and LSI implementation vs. DynamoDB.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190904120730.12615-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Eliran Sinvani
a6f600c54f Alternator: Add getting started document for alternator
This patch adds a getting started document for alternator,
it explains how to start up a cluster that has an alternator
API port open and how to test that it works using either an
application or some simple and minimal python scripts.
The goal of the document is to get a user to have an up and
running docker based cluster with alternator support in the
shortest time possible.
2019-09-11 18:01:05 +03:00
Eliran Sinvani
573ff2de35 move alternator.md to its own directory
As part of trying to make alternator more accessible
to users, we expect more documents to be created so
it seems like a good idea to give all of the alternator
docs their own directory.
2019-09-11 18:01:05 +03:00
Piotr Sarna
6579a3850a alternator-test: add xfail test for GSI with 2 regular columns
When updating the second regular base column that is also a view
key, the code in Scylla will assume it only needs to update an entry
instead of replacing an old one. This leads to inconsistencies
exposed in the test case.
Message-Id: <5dfeb9f61f986daa6e480e9da4c7aabb5a09a4ec.1567599461.git.sarna@scylladb.com>
2019-09-11 18:01:05 +03:00
Amnon Heiman
722b4b6e98 alternator/executor.cc: Latencies should use steady_clock
To get correct latency estimations, the executor should use a
monotonic (steady) clock.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2019-09-11 18:01:05 +03:00
Piotr Sarna
b470137cea alternator-test: fix LSI tests
LSI tests are amended, so they no longer needlessly XPASS:
 * two xpassing tests are no longer marked XFAIL
 * there's an additional test for partial projection
   that succeeds on DynamoDB but does not yet work in alternator
Message-Id: <0418186cb6c8a91de84837ffef9ac0947ea4e3d3.1567585915.git.sarna@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
dc1d577421 alternator-test: fix test_describe_endpoints.py for AWS run
The previous patch fixed test_describe_endpoints.py for a local run
without an AWS configuration. But when running with "--aws", we do
need to use that AWS configuration, and this patch fixes this case.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
897dffb977 alternator-test: test_describe_endpoints.py without configuring AWS
Even when running against a local Alternator, Boto3 wants to know the
region name, and AWS credentials, even though they aren't actually needed.
For a local run, we can supply garbage values for these settings, to
allow a user who never configured AWS to run tests locally.
Running against "--aws" will, of course, still require the user to
configure AWS.

The previous patch already fixed this for most tests, this patch fixes the
same issue in test_describe_endpoints.py, which had a separate copy of the
problematic code.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
b39101cd04 alternator: run local tests without configuring AWS
Even when running against a local Alternator, Boto3 wants to know the
region name, and AWS credentials, even though they aren't actually needed.
For a local run, we can supply garbage values for these settings, to
allow a user who never configured AWS to run tests locally.
Running against "--aws" will, of course, still require the user to
configure AWS.

Also modified the README to be clearer, and more focused on the local
runs.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190708121420.7485-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Piotr Sarna
efff187deb alternator-test: add LSI tests
Cases for local secondary indexes are added - loosely based on
test_gsi.py suite.
2019-09-11 18:01:05 +03:00
Piotr Sarna
927dc87b9c alternator-test: bump create table time limit to 200s
Unfortunately the previous 100s limit proved insufficient
for creating tables with both local and global indexes attached
to them. Empirically, 200s was chosen as a safe default,
as the longest test oscillated around 100s with a deviation of 10s.
2019-09-11 18:01:05 +03:00
Piotr Sarna
2fcd1ff8a9 alternator: add basic LSI support
With this patch, LocalSecondaryIndexes can be added to a table
during its creation. The implementation is heavily shared
with GlobalSecondaryIndexes and as such suffers from the same TODOs:
projections, describing more details in DescribeTable, etc.
2019-09-11 18:01:05 +03:00
Nadav Har'El
7b8917b5cb alternator: rename reserved column name "attrs"
We currently reserve the column name "attrs" for a map of attributes,
so the user is not allowed to use this name as a name of a key.

We plan to lift this reservation in a future patch, but until we do,
let's at least choose a more obscure name to forbid - in this patch ":attrs".
It is even less likely that a user will want to use this specific name
as a column name.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190903133508.2033-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Piotr Sarna
ef7903a90f alternator: migrate make_map_element_restriction to string view
In order to elide unnecessary copying and allow more copy elision
in the future, make_map_element_restriction helper function
uses string_view instead of a const string reference.
Message-Id: <1a3e82e7046dc40df604ee7fbea786f3853fee4d.1567502264.git.sarna@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
fc946ddfba alternator: clean error, not a crash, on reserved column name
Currently, we reserve the name ATTRS_COLUMN_NAME ("attrs") - the user
cannot use it as a key column name (key of the base table or GSI or LSI)
because we use this name for the attribute map we add to the schema.

Currently, if the user does attempt to create such a key column, the
result is undefined (sometimes corrupt sstables, sometimes outright crashes).
This patch fixes it to become a clean error, saying that this column name is
currently reserved.

The test test_create_table_special_column_name now cleanly fails, instead
of crashing Scylla, so it is converted from "skip" to "xfail".

Eventually we need to solve this issue completely (e.g., in rare cases
rename columns to allow us to reserve a name like ATTRS_COLUMN_NAME,
or alternatively, instead of using the fixed name ATTRS_COLUMN_NAME, pick
one different from the key column names). But until we do,
better fail with a clear error instead of a crash.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190901102832.7452-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
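The validation added by the commit above can be sketched in Python (hypothetical names; the reserved name at this point in the series is "attrs", per ATTRS_COLUMN_NAME):

```python
RESERVED = "attrs"  # name reserved for the internal attribute map

class ValidationError(Exception):
    pass

def validate_key_names(key_names):
    """Fail with a clear error if a key column uses the reserved internal
    name, instead of silently producing corrupt sstables or crashing."""
    for name in key_names:
        if name == RESERVED:
            raise ValidationError(
                f"Column name '{name}' is currently reserved; "
                f"please pick a different name")
```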
Piotr Sarna
d64980f2ae alternator-test: add initial test_condition_expression file
The file initially consists of a very simple case that succeeds
with `--aws` and expectedly fails without it, because the expression
is not implemented yet.
2019-09-11 18:01:05 +03:00
Piotr Sarna
80edc00f62 alternator-test: add tests for unsupported expressions
The test cases are marked XFAIL, as their expressions are not yet
supported in alternator. With `--aws`, they pass.
2019-09-11 18:01:05 +03:00
Pekka Enberg
380a7be54b dist/docker: Add support for Alternator
This adds a "alternator-address" and "alternator-port" configuration
options to the Docker image, so people can enable Alternator with
"docker run" with:

  docker run --name some-scylla -d <image> --alternator-port=8080
Message-Id: <20190902110920.19269-1-penberg@scylladb.com>
2019-09-11 18:01:05 +03:00
Piotr Sarna
3fae8239fa alternator: throw on unsupported expressions
When an unsupported expression parameter is encountered -
KeyConditionExpression, ConditionExpression and FilterExpression
are among these - alternator returns an error instead of ignoring
the parameter.
2019-09-11 18:01:05 +03:00
Amnon Heiman
811df711fb alternator/executor: update the latencies histogram
This patch updates the latencies histogram for the get, put, delete and
update operations.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2019-09-11 18:01:05 +03:00
Amnon Heiman
4a6d1f5559 alternator/stats metrics: use labels and estimated histogram
This patch makes two changes to the alternator stats:
1. It adds an estimated_histogram for the get, put, update and delete
operations.

2. It changes the metrics naming so that the operation becomes a label,
which makes the metrics easier to handle, aggregate and display.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
de53ed7cdd alternator_test: mark test_gsi_3 as passing
The test_gsi_3, involving creating a GSI with two key columns which weren't
previously a base key, now passes, so drop the "xfail" marker.

We still have problems with such materialized views, but not in the simple
scenario tested by test_gsi_3.

Later we should create a new test for the scenario which still fails, if
any.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Piotr Sarna
0e6338ffd9 alternator: allow creating GSI with 2 base regular columns
Creating an underlying materialized view with 2 regular base columns
is risky in Scylla, as the second column's liveness will not be correctly
taken into account when ensuring view row liveness.
Still, in case specific conditions are met:
 * the regular base column value is always present in the base row
 * no TTLs are involved
then the materialized view will behave as expected.

Creating a GSI with 2 base regular columns issues a warning,
as it should be performed with care.
Message-Id: <5ce8642c1576529d43ea05e5c4bab64d122df829.1567159633.git.sarna@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
3325e76c6f alternator: fix default BillingMode
It is important that BillingMode should default to PROVISIONED, as it
does on DynamoDB. This allows old clients, which don't specify
BillingMode at all, to specify ProvisionedThroughput as allowed with
PROVISIONED.

Also added a test case for this case (where BillingMode is absent).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190829193027.7982-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
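The defaulting logic described above can be sketched in Python (illustrative; error type and the PAY_PER_REQUEST/ProvisionedThroughput exclusion mirror DynamoDB's documented behavior, not necessarily Alternator's exact code):

```python
def get_billing_mode(request: dict) -> str:
    """BillingMode defaults to PROVISIONED when absent, as on DynamoDB,
    so old clients may keep specifying ProvisionedThroughput."""
    mode = request.get("BillingMode", "PROVISIONED")
    if mode not in ("PROVISIONED", "PAY_PER_REQUEST"):
        raise ValueError(f"Unknown BillingMode: {mode}")
    if mode == "PAY_PER_REQUEST" and "ProvisionedThroughput" in request:
        raise ValueError("ProvisionedThroughput not allowed with PAY_PER_REQUEST")
    return mode
```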
Nadav Har'El
395a97e928 alternator: correct error on missing index or table
When querying on a missing index, DynamoDB returns different errors in
case the entire table is missing (ResourceNotFoundException) or the table
exists and just the index is missing (ValidationException). We didn't
make this distinction, and always returned ValidationException, but this
confuses clients that expect ResourceNotFoundException - e.g., Amazon's
Tic-Tac-Toe demo.

This patch adds a test for the first case (the completely missing table) -
we already had a test for the second case - and returns the correct
error codes. As usual the test passes against DynamoDB as well as Alternator,
ensure they behave the same.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190829174113.5558-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
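The error distinction described above can be sketched in Python (hypothetical data model mapping table names to their index sets; exception names match the DynamoDB error codes):

```python
class ResourceNotFoundException(Exception):
    pass

class ValidationException(Exception):
    pass

def lookup_index(tables: dict, table: str, index: str):
    """A missing table is ResourceNotFoundException; a missing index on an
    existing table is ValidationException - matching DynamoDB's behavior."""
    if table not in tables:
        raise ResourceNotFoundException(f"Requested resource not found: {table}")
    if index not in tables[table]:
        raise ValidationException(f"Index {index} not found on table {table}")
    return (table, index)
```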
Nadav Har'El
62c4ed8ee3 alternator: improve request logging
We needlessly split the trace-level log message for the request into two
messages - one containing just the operation's name, and one with the
parameters. Moreover we printed them in the opposite order (parameters
first, then the operation). So this patch combines them into one log
message.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190829165341.3600-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
f755c22577 alternator-test: reproduce bug with using "attrs" as key column name
Alternator puts in the Scylla table a column called "attrs" for all the
non-key attributes. If the user happens to choose the same name, "attrs",
for one of the key columns, the result of writing two different columns
with the same name is a mess of corrupt sstables.

This test reproduces this bug (and works against DynamoDB of course).

Because the test doesn't cleanly fail, but rather leaves Scylla in a bad
state from which it can't fully recover, the test is marked as "skip"
until we fix this bug.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190828135644.23248-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Piotr Sarna
6b27eaf4d0 alternator: remove redundant key checks in UpdateItem
Updating key columns is not allowed in UpdateItem requests,
but the series introducing GSI support for regular columns
also introduced redundant duplicate checks of this kind.
This condition is already checked in resolve_update_path helper function
and existing test_update_expression_cannot_modify_key test makes sure that
the condition is checked.
Message-Id: <00f83ab631f93b263003fb09cd7b055bee1565cd.1567086111.git.sarna@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
04a117cda3 alternator-test: improve test_update_expression_cannot_modify_key
The test test_update_expression_cannot_modify_key() verifies that an
update expression cannot modify one of the key columns. The existing
test only tried the SET and REMOVE actions - this patch makes the
test more complete by also testing the ADD and DELETE actions.

This patch also makes the expected exception more picky - we now
expect that the exception message contains the word "key" (as it,
indeed, does on both DynamoDB and Alternator). If we get any other
exception, there may be a problem.

The test passed before this patch, and passes now as well - it's just
stricter now.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190829135650.30928-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Piotr Sarna
81a97b2ac0 alternator-test: add test case for GSI with both keys
A case which adds a global secondary index on a table with both
hash and sort keys is added.
2019-09-11 18:01:05 +03:00
Piotr Sarna
615603877c alternator: use from_single_value instead of from_singular in ck
The code previously used clustering_key::from_singular() to compute
a clustering key value. It works fine, but has two issues:
1. involves one redundant deserialization stage compared to
   from_single_value
2. does not work with compound clustering keys, which can appear
   when using indexes
2019-09-11 18:01:05 +03:00
Piotr Sarna
4474ceceed alternator-test: enable passing tests
With more GSI features implemented, tests with XPASS status are promoted
to being enabled.

One test case (test_gsi_describe) is partially done as DescribeTable
now contains index names, but we could try providing more attributes
(e.g. IndexSizeBytes and ItemCount from the test case), so the test
is left in the XFAIL state.
2019-09-11 18:01:05 +03:00
Piotr Sarna
f922d6d771 alternator: Add 'mismatch' to serialization error message
In order to match the tests and origin more properly, the error message
for mismatched types is updated so it contains the word 'mismatch'.
2019-09-11 18:01:05 +03:00
Piotr Sarna
9dceea14f9 alternator: add describing GSI in DescribeTable
The DescribeTable request now contains the list of index names
as well. None of the attributes of the list are marked as 'required'
in the documentation, so currently the implementation provides
index names only.
2019-09-11 18:01:05 +03:00
Piotr Sarna
938a06e4c0 alternator: allow adding GSI-related regular columns to schema
In order to be able to create a Global Secondary Index over a regular
column, this column is upgraded from being a map entry to being a full
member of the schema. As such, it's possible to use this column
definition in the underlying materialized view's key.
2019-09-11 18:01:05 +03:00
Piotr Sarna
2a123925ca alternator: add handling regular columns with schema definitions
In order to prepare alternator for adding regular columns to schema,
i.e. in order to create a materialized view over them,
the code is changed so that updating no longer assumes that only keys
are included in the table schema.
2019-09-11 18:01:05 +03:00
Piotr Sarna
befa2fdc80 alternator: start fetching all regular columns
Since in the future we may want to have more regular columns
in alternator tables' schemas, the code is changed accordingly,
so all regular columns will be fetched instead of just the attribute
map.
2019-09-11 18:01:05 +03:00
Piotr Sarna
53044645aa alternator: avoid creating empty collection mutations
If no regular column attributes are passed to PutItem, the attr
collector serializes an empty collection mutation nonetheless
and sends it. It's redundant, so instead, if the attr collector
is empty, the collection does not get serialized and sent to replicas.
2019-09-11 18:01:05 +03:00
Nadav Har'El
317954fe19 alternator-test: add license blurbs
Add copyright and license blurbs to all alternator-test source files.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190825161018.10358-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
c9eb9d9c76 alternator: update license blurbs
Update all the license blurbs to the one we use in the open-source
Scylla project, licensed under the AGPL.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190825160321.10016-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Piotr Sarna
d6e671b04f alternator: add initial tracing to requests
Each request provides basic tracing information about itself.

Example output from tracing:

cqlsh> select request, parameters from system_traces.sessions
           where session_id = 39813070-c4ea-11e9-8572-000000000000;
 request          | parameters
------------------+-----------------------------------------------------
 Alternator Query | {'query': '{"TableName": "alternator_test_15664",
                    "KeyConditions": {"p": {"AttributeValueList":
                    [{"S": "T0FE0QCS0X"}], "ComparisonOperator": "EQ"}}}'}

cqlsh> select session_id, activity from system_traces.events
           where session_id = 39813070-c4ea-11e9-8572-000000000000;
 session_id                           | activity
--------------------------------------+-----------------------------
 39813070-c4ea-11e9-8572-000000000000 |                    Querying
 39813070-c4ea-11e9-8572-000000000000 | Performing a database query
2019-09-11 18:01:05 +03:00
Piotr Sarna
cb791abb9d alternator: enable query tracing
Probabilistic tracing can be enabled via REST API. Alternator will
from now on create tracing sessions for its operations as well.

Examples:

 # trace around 0.1% of all requests
curl -X POST http://localhost:10000/storage_service/trace_probability?probability=0.001
 # trace everything
curl -X POST http://localhost:10000/storage_service/trace_probability?probability=1
2019-09-11 18:01:05 +03:00
Piotr Sarna
6c8c31bfc9 alternator: add client state
Keeping an instance of client_state is a convenient way of being able
to use tracing for alternator. It's also currently used in paging,
so adding a client state to executor removes the need of keeping
a dummy value.
2019-09-11 18:01:05 +03:00
Piotr Sarna
1ca9dc5d47 alternator: use correct string views in serialization
String views used in JSON serialization should use not only the pointer
returned by rapidjson, but also the string length, as it may contain
\0 characters.
Additionally, one unnecessary copy is elided.
2019-09-11 18:01:05 +03:00
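The embedded-NUL problem described above can be demonstrated with a small Python sketch (an analog, not the rapidjson C++ code): a NUL-terminated view loses data, while a pointer-plus-length view keeps it.

```python
def c_string_view(buf: bytes) -> bytes:
    # Buggy analog: a NUL-terminated view drops everything after the first \0.
    return buf.split(b"\x00", 1)[0]

def sized_view(buf: bytes, length: int) -> bytes:
    # Fix sketch: pointer + explicit length preserves embedded \0 bytes.
    return buf[:length]
```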
Nadav Har'El
32b898db7b alternator: docs/alternator.md: link to a longer document
Add a link to a longer document (currently, around 40 pages) about
DynamoDB's features and how we implemented or may implement them in
Alternator.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190825121201.31747-2-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
a5c3d11ccb alternator: document choice of RF
After changing the choice of RF in a previous patch, let's update the
relevant part of docs/alternator.md.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190825121201.31747-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
d20ec9f492 alternator: expand docs/alternator.md
Expand docs/alternator.md with new sections about how to run Alternator,
and a very brief introduction to its design.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190818164628.12531-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
9b0ef1a311 alternator: refuse CreateTable if uses unsupported features
If a user tries to create a table with an unsupported feature -
a local secondary index, a user-defined encryption key, or
streams (CDC) - let's refuse the table creation, so the application
doesn't continue thinking this feature is available to it.

The "Tags" feature is also not supported, but it is more harmless
(it is used mostly for accounting purposes) so we do not fail the
table creation because of it.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190818125528.9091-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Piotr Sarna
ab25472034 alternator: migrate to visitor pattern in serialization
Types can now be processed with a visitor pattern, which is more neat
than a chain of if statements.
Message-Id: <256429b7593d8ad8dff737d8ddb356991fb2a423.1566386758.git.sarna@scylladb.com>
2019-09-11 18:01:05 +03:00
Piotr Sarna
42d2910f2c alternator: add from_string with raw pointer to rjson
from_string is a family of functions that create rjson values from
strings - it is now extended to accept a raw pointer and size.
Message-Id: <d443e2e4dcc115471202759ecc3641ec902ed9e4.1566386758.git.sarna@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
2f53423a2f alternator: automatically choose RF: 1 or 3
In CQL, before a user can create a table, they must create a keyspace to
contain this table and, among other things, specify this keyspace's RF.

But in the DynamoDB API, there is no "create keyspace" operation - the
user just creates a table, and there is no way, and no opportunity,
to specify the requested RF. Presumably, Amazon always uses the same
RF for all tables, most likely 3, although this is not officially
documented anywhere.

The existing code creates the keyspace during Scylla boot, with RF=1.
This RF=1 always works, and is a good choice for a one-node test run,
but was a really bad choice for a real cluster with multiple nodes, so
this patch fixes this choice:

With this patch, the keyspace creation is delayed - it doesn't happen
when the first node of the cluster boots, but only when the user creates
the first table. Presumably, at that time, the cluster is already up,
so at that point we can make the obvious choice automatically: a one-node
cluster will get RF=1, a >=3 node cluster will get RF=3. The choice of
RF is logged - and the choice of RF=1 is considered a warning.

Note that with this patch, keyspace creation is still automatic as it
was before. The user may manually create the keyspace via CQL, to
override this automatic choice. In the future we may also add additional
keyspace configuration options via configuration flags or new REST
requests, and the keyspace management code will also likely change
as we start to support clusters with multiple regions and global
tables. But for now, I think the automatic method is easiest for
users who want to test-drive Alternator without reading lengthy
instructions on how to set up the keyspace.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190820180610.5341-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
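The decision rule from this commit (RF=1 for a one-node cluster, RF=3 once the cluster has three or more nodes) can be sketched in a few lines; `choose_replication_factor` is a hypothetical name, and the two-node case, which the message doesn't spell out, is kept at RF=1 here:

```python
def choose_replication_factor(live_node_count: int) -> int:
    """Pick the RF for the auto-created Alternator keyspace.

    One node -> RF=1 (good for test runs); three or more nodes -> RF=3.
    The two-node case is not spelled out in the commit message, so this
    sketch conservatively keeps RF=1 for it.
    """
    return 3 if live_node_count >= 3 else 1

assert choose_replication_factor(1) == 1   # one-node test cluster
assert choose_replication_factor(5) == 3   # real cluster
```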
Piotr Sarna
1a1935eb72 alternator-test: add a test for wrong BEGINS_WITH target type
The test ensures that passing a non-compatible type to BEGINS WITH,
e.g. a number, results in a validation error.
Tested both locally and remotely.
Message-Id: <894a10d3da710d97633dd12b6ac54edccc18be82.1566291989.git.sarna@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
b7b998568f alternator: add to CreateTable verification of BillingMode setting
We allow BillingMode to be set to either PAY_PER_REQUEST (the default)
or PROVISIONED, although neither mode is fully implemented: In the former
case the payment isn't accounted, and in the latter case the throughput
limits are not enforced.
But other settings for BillingMode are now refused, and we add a new test
to verify that.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190818122919.8431-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
66a2af4f7d alternator-test: require a new-enough boto library
The alternator tests want to exercise many of the DynamoDB API features,
so they need a recent enough version of the client libraries, boto3
and botocore. In particular, only in botocore 1.12.54, released a year
ago, was support for BillingMode added - and we rely on this to create
pay-per-request tables for our tests.

Instead of letting the user run with an old version of this library and
get dozens of mysterious errors, in this patch we add a test to conftest.py
which cleanly aborts the test if the libraries aren't new enough, and
recommends a "pip" command to upgrade these libraries.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190819121831.26101-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
64bf2b29a8 alternator-test: exhaustive tests for DescribeTable operation
The DescribeTable operation was currently implemented to return the
minimal information that libraries and applications usually need from
it, namely verifying that some table exists. However, this operation
is actually supposed to return a lot more information fields (e.g.,
the size of the table, its creation date, and more) which we currently
don't return.

This patch adds a new test file, test_describe_table.py, testing all
these additional attributes that DescribeTable is supposed to return.
Several of the tests are marked xfail (expected to fail) because we
did not implement these attributes yet.

The test is exhaustive except for attributes that have to do with four
major features which will be tested together with these features: GSI,
LSI, streams (CDC), and backup/restore.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190816132546.2764-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
fbd2f5077d alternator: enable timeouts on requests
Currently Alternator starts all Scylla requests (including both reads
and writes) without any timeout set. Because of bugs and/or network
problems, requests can theoretically hang and waste Scylla resources for
hours, long after the client has given up on them and closed their
connection.

The DynamoDB protocol doesn't let a user specify which timeout to use,
so we should just use something "reasonable", in this patch 10 seconds.
Remember that all DynamoDB read and write requests are small (even scans
just scan a small piece), so 10 seconds should be above and beyond
anything we actually expect to see in practice.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190812105132.18651-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
b2bd3bbc1f alternator: add "--alternator-address" configuration parameter
So far we had the "--alternator-port" option, allowing the user to configure
the port on which the Alternator server listens, but the server always
listened on any address. It is important to also be able to configure the listen
address - it is useful in tests running several instances of Scylla on
the same machine, and useful in multi-homed machines with several interfaces.

So this patch adds the "--alternator-address" option, defaulting to 0.0.0.0
(to listen on all interfaces). It works like the many other "--*-address"
options that Scylla already has.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190808204641.28648-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
ea41dd2cf8 alternator: docs/alternator.md more about filtering support
Give more details about what is, and what isn't, currently
supported in filtering of Scan (and Query) results.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190811094425.30951-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Piotr Sarna
88eed415bd alternator: fix indentation
It turns out that recent rjson patches introduced some buggy
tabs instead of spaces due to bad IDE configuration. The indentation
is restored to spaces.
2019-09-11 18:01:05 +03:00
Piotr Sarna
3c11428d8d alternator-test: add QueryFilter validation cases
QueryFilter validation was recently supplemented with non-key column
checks, which are hereby tested.
2019-09-11 18:01:05 +03:00
Piotr Sarna
0e0dc14302 alternator-test: add scan case for key equality filtering
With key equality filtering enabled, a test case for scanning is provided.
2019-09-11 18:01:05 +03:00
Piotr Sarna
f1641caa41 alternator: add filtering for key equality
Until now, filtering in alternator was possible only for non-key
column equality relations. This commit adds support for equality
relations for key columns.
2019-09-11 18:01:05 +03:00
Piotr Sarna
a2828f9daa alternator: add validation to QueryFilter
QueryFilter, according to docs, can only contain non-key attributes.
2019-09-11 18:01:05 +03:00
Piotr Sarna
d055658fff alternator: add computing key bounds from filtering
Alternator allows passing hash and sort key restrictions
as filters - it is, however, better to incorporate these restrictions
directly into partition and clustering ranges, if possible.
It's also necessary, as optimizations inside restrictions_filter
assume that it will not be fed unneeded rows - e.g. if filtering
is not needed on partition key restrictions, they will not be checked.
2019-09-11 18:01:05 +03:00
Piotr Sarna
9c05051b59 alternator: extract getting key value subfunction
Currently the only utility function for getting key bytes
from JSON was to parse a document with the following format:
"key_column_name" : { "key_column_type" : VALUE }.
However, it's also useful to parse only the inner document, i.e.:
{ "key_column_type" : VALUE }.
2019-09-11 18:01:05 +03:00
Piotr Sarna
c84019116a alternator: make make_map_element_restriction static
The function has no outside users and thus does not need to be exposed.
2019-09-11 18:01:05 +03:00
Piotr Sarna
3ee99a89b1 alternator: register filtering metrics
Three metrics related to filtering are added to alternator:
 - total rows read during filtering operations
 - rows read and matched by filtering
 - rows read and dropped by filtering
2019-09-11 18:01:05 +03:00
Piotr Sarna
b3e35dab26 alternator: add bumping filtering stats
When filtering is used in querying or scanning, the number of total
filtered rows is added to stats.
2019-09-11 18:01:05 +03:00
Piotr Sarna
a6d098d3eb alternator: add cql_stats to alternator stats
Some underlying operations (e.g. paging) make use of cql_stats
structure from CQL3. As such, cql_stats structure is added
to alternator stats in order to gather and use these statistics.
2019-09-11 18:01:05 +03:00
Piotr Sarna
3ae54892cd alternator: fix a comment typo
s/Miscellenous/Miscellaneous/g
2019-09-11 18:01:05 +03:00
Piotr Sarna
ccf778578a alternator: register read-before-write stats
Read-before-write stat counters were already introduced, but the metrics
needs to be added to a metric group as well in order to be available
for users.
2019-09-11 18:01:05 +03:00
Nadav Har'El
6f81d0cb15 alternator: initial support for GSI
This patch adds partial support for GSI (Global Secondary Index) in
Alternator, implemented using a materialized view in Scylla.

This initial version only supports the specific cases of the index indexing
a column which was already part of the base table's key - e.g., indexing
what used to be a sort key (clustering key) in the base table. Indexing
of non-key attributes (which today live in a map) is not yet supported in
this version.

Creation of a table with GSIs is supported, and so is deleting the table.
UpdateTable which adds a GSI to an existing table is not yet supported.
Query and Scan operations on the index are supported.
DescribeTable does not yet list the GSIs as it should.

Seven previously-failing tests now pass, so their "xfail" tag is removed.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190808090256.12374-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Piotr Sarna
33611acf44 alternator: add stats for read-before-write
A simple metric counting how many read-before-writes were executed
is added.
Message-Id: <d8cc1e9d77e832bbdeff8202a9f792ceb4f1e274.1565274797.git.sarna@scylladb.com>
2019-09-11 18:01:05 +03:00
Piotr Sarna
ae59340c15 alternator: complement rjson.hh comments
Some comments in rjson.hh header file were not clear and are hereby
amended.
Message-Id: <7fa4e2cf39b95c176af31fe66f404a6a51a25bec.1565275276.git.sarna@scylladb.com>
2019-09-11 18:01:04 +03:00
Piotr Sarna
5eb583ab09 alternator: remove missing key FIXME
The case for missing key in update_item was already properly fixed
along with migrating from libjsoncpp to rapidjson, but one FIXME
remained in the code by mistake.

Message-Id: <94b3cf53652aa932a661153c27aa2cb1207268c7.1565271432.git.sarna@scylladb.com>
2019-09-11 18:01:04 +03:00
Piotr Sarna
436f806341 alternator: remove decimal_type FIXME
Decimal precision problems were already solved by commit
d5a1854d93c9448b1d22c2d02eb1c46a286c5404, but one FIXME
remained in the code by mistake.

Message-Id: <381619e26f8362a8681b83e6920052919acf1142.1565271198.git.sarna@scylladb.com>
2019-09-11 18:01:04 +03:00
Piotr Sarna
b29b753196 alternator: add comments to rjson
The rapidjson library needs to be used with caution in order to
provide maximum performance and avoid undefined behavior.
Comments added to rjson.hh describe provided methods and potential
pitfalls to avoid.
Message-Id: <ba94eda81c8dd2f772e1d336b36cae62d39ed7e1.1565270214.git.sarna@scylladb.com>
2019-09-11 18:01:04 +03:00
Piotr Sarna
7b02c524d0 alternator: remove a pointer-based workaround for future<json>
With libjsoncpp we were forced to work around the problem of
non-noexcept constructors by using an intermediate unique pointer.
Objects provided by rapidjson have correct noexcept specifiers,
so the workaround can be dropped.
2019-09-11 18:01:04 +03:00
Piotr Sarna
cb29d6485e alternator: migrate to rapidjson library
Profiling alternator implied that JSON parsing takes up a fair amount
of CPU, and as such should be optimized. libjsoncpp is a standard
library for handling JSON objects, but it also proves slower than
rapidjson, which is hereby used instead.
The results indicated that libjsoncpp used roughly 30% of CPU
for a single-shard alternator instance under stress, while rapidjson
dropped that usage to 18% without optimizations.
Future optimizations should include eliding object copying, string copying
and perhaps experimenting with different JSON allocators.
2019-09-11 18:01:04 +03:00
Piotr Sarna
0fd1354ef9 alternator: add handling rapidjson errors in the server
If a JSON parsing error is encountered, it is transformed
to a validation exception and returned to the user in JSON form.
2019-09-11 18:01:04 +03:00
Piotr Sarna
7064b3a2bf alternator: add rapidjson helper functions
Migrating from libjsoncpp to rapidjson proved to be beneficial
for parsing performance. As a first step, a set of helper functions
is provided to ease the migration process.
2019-09-11 18:01:04 +03:00
Piotr Sarna
0b0bfc6e54 alternator: add missing namespaces to status_type
error.hh file implicitly assumed that seastar:: namespace is available
when it's included, which is not always the case. To remedy that,
seastar::httpd namespace is used explicitly.
2019-09-11 18:01:04 +03:00
Nadav Har'El
56309db085 alternator: correct catch table-already-exists exception
Our CreateTable handler assumed that the function
migration_manager::announce_new_column_family()
returns a failed future if the table already exists. But in some of
our code branches, this is not the case - the function itself throws
instead of returning a failed future. The solution is to use
seastar::futurize_apply() to handle both possibilities (direct exception
or future holding an exception).

This fixes a failure of the test_table.py::test_create_table_already_exists
test case.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 18:01:04 +03:00
Nadav Har'El
d74b203dee alternator: add docs/alternator.md
This adds a new document, docs/alternator.md, about Alternator.

The scope of this document should be expanded in the future. We begin
here by introducing Alternator and its current compatibility level with
Amazon DynamoDB, but it should later grow to explain the design of Alternator
and how it maps the DynamoDB data model onto Scylla's.

Whether this document should remain a short high-level overview, or a long
and detailed design document, remains an open question.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190805085340.17543-1-nyh@scylladb.com>
2019-09-11 18:01:04 +03:00
Piotr Sarna
75ee13e5f2 dependencies: add rapidjson
The rapidjson fast JSON parsing library is used instead of libjsoncpp
in the Alternator subproject.

[avi: update toolchain image to include the new dependency]

Message-Id: <a48104dec97c190e3762f927973a08a74fb0c773.1564995712.git.sarna@scylladb.com>
2019-09-11 18:00:44 +03:00
Nadav Har'El
5eaf73a292 alternator: fix sharing of a seastar::shared_ptr between threads
The function attrs_type() returns a supposed singleton, but because
it is a seastar::shared_ptr we can't use the same one for multiple
threads, and need to use a separate one per thread.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190804163933.13772-1-nyh@scylladb.com>
2019-09-11 16:06:05 +03:00
Nadav Har'El
1b1ede9288 alternator: fix cross-shard use of CQL type objects
The CQL type singletons like utf8_type et al. are separate for separate
shards and cannot be used across shards. So whatever hash tables we use
to find them, also needs to be per-shard. If we fail to do this, we
get errors running the debug build with multiple shards.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190804165904.14204-1-nyh@scylladb.com>
2019-09-11 16:05:39 +03:00
Nadav Har'El
7eae889513 alternator-test: some more GSI tests
Expand the GSI test suite. The most important new test is
test_gsi_key_not_in_index(), where the index's key includes just one of
the base table's key columns, but not a second one. In this case, the
Scylla implementation will nevertheless need to add the second key column
to the view (as a clustering key), even though it isn't considered a key
column by the DynamoDB API.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190718085606.7763-1-nyh@scylladb.com>
2019-09-11 16:05:38 +03:00
Nadav Har'El
10ad60f7de alternator: ListTables should not list materialized views
Our ListTables implementation uses get_column_families(), which lists both
base tables and materialized views. We will use materialized views to
implement DynamoDB's secondary indexes, and those should not be listed in
the results of ListTables.

The patch also includes a test for this.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190717133103.26321-2-nyh@scylladb.com>
2019-09-11 16:04:29 +03:00
Nadav Har'El
676ada4576 alternator-test: move list_tables to util.py
The list_tables() utility function was used only in test_table.py
but I want to use it elsewhere too (in GSI test) so let's move it
to util.py.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190717133103.26321-1-nyh@scylladb.com>
2019-09-11 16:04:28 +03:00
Piotr Sarna
f3963865f5 alternator: make set_sum exception more user-friendly
As in case of set_diff, an exception message in set_sum should include
the user-provided request (ADD) rather than our internal helper function
set_sum.
2019-09-11 16:03:27 +03:00
Piotr Sarna
9dd8644e4a alternator-tests: enable DELETE case for sets
UpdateExpression's case for DELETE operation for sets is enabled.
2019-09-11 16:03:26 +03:00
Piotr Sarna
2b215b159c alternator: implement set DELETE
UpdateExpression's DELETE operation for set is implemented on top
of set_diff helper function.
2019-09-11 16:02:25 +03:00
Piotr Sarna
fe72a6740c alternator: add set difference helper function
A function for computing the set difference of two sets represented
as JSON is added.
2019-09-11 16:01:03 +03:00
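A minimal sketch of such a set-difference helper, assuming DynamoDB's typed-JSON set representation (e.g. `{"SS": ["a", "b"]}`); the real helper works on rjson values, not Python dicts:

```python
def set_diff(set1: dict, set2: dict) -> dict:
    """Compute set1 minus set2 for DynamoDB typed sets such as
    {"SS": ["a", "b"]}.  Illustrative sketch only; both operands
    must carry the same type tag (SS, NS or BS)."""
    (type1, values1), = set1.items()
    (type2, values2), = set2.items()
    if type1 != type2:
        raise ValueError("mismatched set types: %s vs %s" % (type1, type2))
    to_remove = set(values2)
    # Preserve the order of the left operand, drop anything in set2.
    return {type1: [v for v in values1 if v not in to_remove]}

assert set_diff({"SS": ["a", "b", "c"]}, {"SS": ["b"]}) == {"SS": ["a", "c"]}
```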
Nadav Har'El
e13c56be0b alternator: fail attempt to create table with GSI
Although we do not support GSI yet, until now we silently ignored
CreateTable's GSI parameter, and the user wouldn't know the table
wasn't created as intended.

In this patch, GSI is still unsupported, but now CreateTable will
fail with an error message that GSI is not supported.

We need to change some of the tests which test the error path, and
expect an error - but should not consider a table creation error
as the expected error.

After this patch, test_gsi.py still fails all the tests on
Alternator, but much more quickly :-)

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190711161420.18547-1-nyh@scylladb.com>
2019-09-11 16:00:01 +03:00
Piotr Sarna
336c90daaa alternator-test: add stub case for set add duplication
The test case for adding two sets with common values is added.
This case is a stub, because boto3 transforms the result into a Python
set, which removes duplicates on its own. A proper TODO is left
in order to migrate this case to a lower-level API and check
the returned JSON directly for lack of duplicates.
2019-09-11 16:00:00 +03:00
Piotr Sarna
67c95cb303 alternator-test: enable tests for ADD operation
Tests for UpdateExpression::ADD are enabled.
2019-09-11 15:59:59 +03:00
Piotr Sarna
f29c2f6895 alternator: add ADD operation
UpdateExpression is now able to perform ADD operation on both numbers
and sets.
2019-09-11 15:59:00 +03:00
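The two ADD behaviors described above - numeric addition and set union - can be sketched as follows, assuming DynamoDB's typed-JSON values; `add_op` is a hypothetical helper, not Scylla's actual code:

```python
from decimal import Decimal

def add_op(old: dict, addend: dict) -> dict:
    """Sketch of UpdateExpression's ADD semantics: numbers are summed,
    sets are unioned.  Values use DynamoDB's typed-JSON form, e.g.
    {"N": "1"} or {"SS": ["a"]}."""
    (t_old, v_old), = old.items()
    (t_new, v_new), = addend.items()
    if t_old != t_new:
        raise ValueError("ADD operands must have matching types")
    if t_old == "N":
        # Exact decimal arithmetic, no floating-point precision loss.
        return {"N": str(Decimal(v_old) + Decimal(v_new))}
    # Set types (SS, NS, BS): union, keeping first-seen order, no duplicates.
    return {t_old: list(v_old) + [v for v in v_new if v not in v_old]}

assert add_op({"N": "1.5"}, {"N": "2"}) == {"N": "3.5"}
assert add_op({"SS": ["a", "b"]}, {"SS": ["b", "c"]}) == {"SS": ["a", "b", "c"]}
```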
Piotr Sarna
a5f2926056 alternator: add helper function for adding sets
A helper function that allows creating a set sum out of two sets
represented in JSON is added.
2019-09-11 15:57:41 +03:00
Piotr Sarna
18686ff288 alternator: add unwrap_set
It will be needed later to implement adding sets.
2019-09-11 15:56:15 +03:00
Piotr Sarna
09993cf857 alternator: add get_item_type_string helper function
It will be useful later for ensuring that parameters for various
functions have matching types.
2019-09-11 15:52:31 +03:00
Nadav Har'El
d54c82209c alternator: fix Query verification of appropriate key columns
The Query operation's conditions can be used to search for a particular
hash key or both hash and sort keys - but not any other combinations.
We previously forgot to verify most errors, so in this patch we add
missing verifications - and tests to confirm we fail the query when
DynamoDB does.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190711132720.17248-1-nyh@scylladb.com>
2019-09-11 15:51:27 +03:00
Nadav Har'El
fbe63ddcc4 alternator-test: more GSI tests
Add more tests for GSI - tests that DescribeTable describes the GSI,
and test the case of more than one GSI for a base table.

Unfortunately, creating an empty table with two GSIs routinely takes
on DynamoDB more than a full minute (!), so because we now have a
test with two GSIs, I had to increase the timeout in create_test_table().

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190711112911.14703-1-nyh@scylladb.com>
2019-09-11 15:51:26 +03:00
Piotr Sarna
a3be9dda7f alternator-test: enable if_not_exists-related tests
Test cases that relied on the implementation of if_not_exists are
enabled.
2019-09-11 15:51:25 +03:00
Piotr Sarna
cec82490d2 alternator: implement if_not_exists
The if_not_exists function is implemented on the basis of recently added
read-before write mechanism.
2019-09-11 15:50:22 +03:00
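The semantics being implemented can be sketched on top of the read-before-write result; `previous_item` here is a plain dict standing in for the previously fetched item, and the helper is a hypothetical stand-in, not Scylla's implementation:

```python
def if_not_exists(previous_item, path, default):
    """Sketch of DynamoDB's if_not_exists(path, default): use the
    attribute's current value when the previously read item has one,
    otherwise fall back to the supplied default.  previous_item is
    None when the item does not exist yet."""
    if previous_item is not None and path in previous_item:
        return previous_item[path]
    return default

# Typical use: SET views = if_not_exists(views, :zero) + :one
assert if_not_exists({"views": 7}, "views", 0) == 7
assert if_not_exists(None, "views", 0) == 0
```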
Piotr Sarna
b14e3c0e72 alternator: rename holds_path to a more generic name
The holds_path() utility function is actually used to check if a value
needs read before write, so its name is changed to more fitting
check_needs_read_before_write.
2019-09-11 15:49:19 +03:00
Nadav Har'El
5fc7b0507e alternator: fix bug in collection mutations
Alternator currently keeps an item's attributes inside a map, and we
had a serious bug in the way we build mutations for this map:

We didn't know there was a requirement to build this mutation sorted by
the attribute's name. When we neglect to do this sorting, this confuses
Scylla's merging algorithms, which assume collection cells are thus
sorted, and the result can be duplicate cells in a collection, and the
visible effect is a mutation that seems to be ignored - because both
old and new values exist in the collection.

So this patch includes a new helper class, "attribute_collector", which
helps collect attribute updates (put and del) and extract them in correctly
sorted order. This helper class also eliminates some duplication of
arcane code to create collection cells or deletions of collection cells.

This patch includes a simple test that previously failed, and one xfail
test that failed just because of this bug (this was the test that exposed
this bug). Both tests now succeed.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190709160858.6316-1-nyh@scylladb.com>
2019-09-11 15:48:18 +03:00
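A rough Python analogue of the helper class described above, showing only the sorting idea: collect puts and deletes keyed by attribute name, then emit them in sorted order so the merge code never sees out-of-order cells (hypothetical sketch, not the C++ attribute_collector):

```python
class AttributeCollector:
    """Collect put/delete operations on a map's cells and emit them
    sorted by attribute name, since the merging code assumes collection
    cells arrive in key order."""
    def __init__(self):
        self._ops = {}

    def put(self, name, value):
        self._ops[name] = ("put", value)

    def delete(self, name):
        self._ops[name] = ("del", None)

    def to_mutation(self):
        # Emitting in sorted order avoids the duplicate-cell bug.
        return [(name,) + op for name, op in sorted(self._ops.items())]

c = AttributeCollector()
c.put("zebra", 1)
c.delete("apple")
assert [name for name, *_ in c.to_mutation()] == ["apple", "zebra"]
```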
Nadav Har'El
5cce53fed9 alternator-test: exhaustive tests for GSI
This patch adds what is hopefully an exhaustive test suite for the
global secondary indexing (GSI) feature, and all its various
complications and corner cases of how GSIs can be created, deleted,
named, written, read, and more (the tests are heavily documented to
explain what they are testing).

All these tests pass on DynamoDB, and fail on Alternator, so they are
marked "xfail". As we develop the GSI feature in Alternator piece by
piece, we should make these tests start to pass.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190708160145.13865-1-nyh@scylladb.com>
2019-09-11 15:48:17 +03:00
Nadav Har'El
9eea90d30d alternator-test: another test for BatchWriteItem
This adds another test for BatchWriteItem: That if one of the operations is
invalid - e.g., has a wrong key type - the entire batch is rejected, and
none of its operations are done - not even the valid ones.

The test succeeds, because we already handle this case correctly.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190707134610.30613-1-nyh@scylladb.com>
2019-09-11 15:48:16 +03:00
Nadav Har'El
01f4cf1373 alternator-test: test UpdateItem's SET with #reference
Test an operation like SET #one = #two, where the RHS has a reference
to a name, rather than the name itself. Also verify that DynamoDB
gives an error if ExpressionAttributeNames includes names not needed
by either the left or right hand side of such assignments.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190708133311.11843-1-nyh@scylladb.com>
2019-09-11 15:48:15 +03:00
Piotr Sarna
e482f27e2f alternator-test: add test for reading key before write
The test case checks if reading keys in order to use their values
in read-before-write updates works fine.
2019-09-11 15:48:14 +03:00
Piotr Sarna
7b605d5bec alternator-test: add test case for nested read-before-write
A test for read-before-write in nested paths (inside a function call
or inside a +/- operator) is added.
2019-09-11 15:48:13 +03:00
Piotr Sarna
da795d8733 alternator-test: enable basic read-before-write cases
With unsafe read-before-write implemented, simple cases can be enabled
by removing their xfail flag.
2019-09-11 15:48:12 +03:00
Piotr Sarna
2e473b901a alternator: fix indentation 2019-09-11 15:48:09 +03:00
Piotr Sarna
bf13564a9d alternator: add unsafe read-before-write to update_item
In order to serve update requests that depend on read-before-write,
a proper helper function which fetches the existing item with a given
key from the database is added.
This read-before-write mechanism is not considered safe, because it
provides no linearizability guarantees and offers no synchronization
protection. As such, it should be considered a placeholder that works
fine on a single machine and/or with no concurrent access to the same key.
2019-09-11 15:45:21 +03:00
Piotr Sarna
2fb711a438 alternator: add context parameters to calculate_value
The calculate_value utility function is going to need more context
in order to resolve paths present in the right-hand side of update_item
operators: update_info and schema.
2019-09-11 15:40:17 +03:00
Piotr Sarna
cbe1836883 alternator: add allowing key columns when resolving path
Historically, resolving a path checked for key columns, which are not
allowed to be on the left-hand side of the assignment. However, path
resolving will now also be used for right-hand side, where it should
be allowed to use the key value.
2019-09-11 15:39:15 +03:00
Piotr Sarna
20a6077fb3 alternator: add optional previous item to calculate_value
In order to implement read-before-write in the future, calculate_value
now accepts an additional parameter: previous_item. If read-before-write
was performed, previous_item will contain an item for the given key
which already exists in the database at the time of the update.
2019-09-11 15:38:13 +03:00
Piotr Sarna
784aaaa8ff alternator: move describe_item implementation up
It will be needed later to add read-before-write to update_item.
2019-09-11 15:37:13 +03:00
Nadav Har'El
bd4dfa3724 alternator-test: move create_test_table() to util.py
This patch moves the create_test_table() utility function, which creates
a test table with a unique name, from the fixtures (conftest.py) to
util.py. This will allow reusing this function in tests which need to
create tables but not through the existing fixtures. In particular
we will need to do this for GSI (global secondary index) tests
in the next patch.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190708104438.5830-1-nyh@scylladb.com>
2019-09-11 15:37:12 +03:00
Nadav Har'El
ce13a0538c alternator-test: expand tests of duplicate items in BatchWriteItem
The tests we had for BatchWriteItem's refusal to accept duplicate keys
only used test_table_s, with just a hash key. This patch adds tests
for test_table, i.e., a table with both hash and sort keys - to check
that we check duplicates in that case correctly as well.

Moreover, the expanded tests also verify that although identical
keys are not allowed, keys that share just one component (hash or
sort key) but differ in the other are fine.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190705191737.22235-1-nyh@scylladb.com>
2019-09-11 15:37:11 +03:00
Nadav Har'El
9bc2685a92 alternator-test: run local tests without configuring AWS
Even when running against a local Alternator, Boto3 wants to know the
region name, and AWS credentials, even though they aren't actually needed.
For a local run, we can supply garbage values for these settings, to
allow a user who never configured AWS to run tests locally.
Running against "--aws" will, of course, still require the user to
configure AWS.

Also modified the README to be clearer, and more focused on the local
runs.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190708121420.7485-1-nyh@scylladb.com>
2019-09-11 15:37:10 +03:00
Nadav Har'El
cb42c75e0a alternator-test: don't hardcode us-east-1 region
For "--aws" tests, use the default region chosen by the user in the
AWS configuration (~/.aws/config or environment variable), instead of
hard-coding "us-east-1".

Patch by Pekka Enberg.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190708105852.6313-1-nyh@scylladb.com>
2019-09-11 15:37:09 +03:00
Piotr Sarna
8f9e720f10 alternator-test: enable precision test for add
With big_decimal-based implementation, the precision test passes.
Message-Id: <6d631a43901a272cb9ebd349cb779c9677ce471e.1562318971.git.sarna@scylladb.com>
2019-09-11 15:37:08 +03:00
Piotr Sarna
78e495fac3 alternator: allow arithmetics without losing precision
Calculating value represented as 'v1 + v2' or 'v1 - v2' was previously
implemented with a double type, which offers limited precision.
From now on, these computations are based on big_decimal, which
allows returning values without losing precision.
This patch depends on 'add big_decimal arithmetic operators' series.
Message-Id: <f741017fe3d3287fa70618068bdc753bfc903e74.1562318971.git.sarna@scylladb.com>
2019-09-11 15:36:08 +03:00
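The precision difference the commit describes is easy to demonstrate; Python's `Decimal` stands in for Scylla's `big_decimal` here:

```python
from decimal import Decimal

# With doubles (what the old implementation used), the low-order digit
# vanishes: 64-bit floats near 1e22 are spaced roughly 2 million apart.
assert 1e22 + 1 == 1e22

# Decimal-based arithmetic keeps every digit of the result.
assert Decimal("1e22") + Decimal("1") == Decimal("10000000000000000000001")
```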
Piotr Sarna
466f25b1e8 alternator-test: enable batch duplication cases
With duplication checks implemented, batch write and delete tests
no longer need to be marked @xfail.
Message-Id: <6c5864607e06e8249101bd711dac665743f78d9f.1562325663.git.sarna@scylladb.com>
2019-09-11 15:36:07 +03:00
Piotr Sarna
eb7ada8387 alternator: add checking for duplicate keys in batches
Batch writes and batch deletes do not allow multiple entries
for the same key. This patch implements checking for duplicated
entries and throws an error if applicable.
Message-Id: <450220ba74f26a0893430cb903e4749f978dfd31.1562325663.git.sarna@scylladb.com>
2019-09-11 15:35:01 +03:00
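The duplicate check can be sketched like this, with each batch entry reduced to its key dict; the helper and its error message are illustrative, not Scylla's actual code:

```python
def check_duplicate_keys(requests):
    """Reject a BatchWriteItem-style request list that contains two
    operations on the same key.  Each request is represented here just
    by its key dict, e.g. {"p": "a", "c": "1"}."""
    seen = set()
    for key in requests:
        # Normalize the dict into a hashable, order-independent form.
        frozen = tuple(sorted(key.items()))
        if frozen in seen:
            raise ValueError("provided list of item keys contains duplicates")
        seen.add(frozen)

check_duplicate_keys([{"p": "a"}, {"p": "b"}])  # distinct keys: accepted
```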
Nadav Har'El
b810fa59c4 alternator-test: move utility functions to a new "util.py"
Move some common utility functions to a common file "util.py"
instead of repeating them in many test files.

The utility functions include random_string(), random_bytes(),
full_scan(), full_query(), and multiset() (the more general
version, which also supports freezing nested dicts).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190705081013.1796-1-nyh@scylladb.com>
2019-09-11 15:35:00 +03:00
Nadav Har'El
2fb77ed9ad alternator: use std::visit for reading std::variant
The idiomatic way to use an std::variant depending on the type it holds is to use
std::visit. This modern API makes it unnecessary to write many boiler-plate
functions to test and cast the type of the variant, and makes it impossible
to forget one of the options. So in this patch we throw out the old ways,
and welcome the new.

Thanks to Piotr Sarna for the idea.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190704205625.20300-1-nyh@scylladb.com>
2019-09-11 15:33:57 +03:00
Nadav Har'El
4d07e2b7c5 alternator: support BatchGetItem
This patch adds to Alternator an implementation of the BatchGetItem
operation, which allows to start a number of GetItem requests in parallel
in a single request.

The implementation is almost complete - the only missing feature is the
ability to ask only for non-top-level attributes in ProjectionExpression.
Everything else should work, and this patch also includes tests which,
as usual, pass on DynamoDB and now also on Alternator.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:33:50 +03:00
Nadav Har'El
d1a5512a35 alternator: fix second boot
Amazingly, it appears we never tested booting Alternator a second time :-)

Our initialization code creates a new keyspace, and was supposed to ignore
the error if this keyspace already existed - but we thought the error would
come as an exceptional future, which it didn't - it came as a thrown
exception. So we need to change handle_exception() to a try/catch.

With this patch, I can kill Alternator and it will correctly start again.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:22:48 +03:00
Nadav Har'El
374162f759 alternator: generate error on spurious key columns
Operations which take a key as parameter, namely GetItem, UpdateItem,
DeleteItem and BatchWriteItem's DeleteRequest, already fail if the given
key is missing one of the necessary key attributes, or has the wrong types
for them. But they should also fail if the given key has spurious
attributes beyond those actually needed in a key.

So this patch adds this check, and tests to confirm that we do these checks
correctly.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:21:50 +03:00
Nadav Har'El
da4da6afbf alternator: fix PutItem to really replace item.
The PutItem operation, and also the PutRequest of BatchWriteItem, are
supposed to completely replace the item - not to merge the new value with
the previous value. We implemented this wrongly - we just wrote the new
item, forgetting to write a tombstone to remove the old item.

So this patch fixes these operations, and adds tests which confirm the
fix (as usual, these tests pass on DynamoDB, failed on Alternator before
this patch, and pass after the patch).
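
The semantic difference can be modeled in a few lines (a toy Python model of the table; the real fix writes a row tombstone before the new cells, not a dict assignment):

```python
store = {}  # stands in for the table

def put_item_wrong(key, attrs):
    # Old behavior: merge the new attributes into the existing item.
    store.setdefault(key, {}).update(attrs)

def put_item(key, attrs):
    # Fixed behavior: the new item completely replaces the old one.
    store[key] = dict(attrs)

put_item('k', {'a': 1, 'b': 2})
put_item('k', {'c': 3})
assert store['k'] == {'c': 3}  # old attributes 'a' and 'b' are gone
```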

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:20:55 +03:00
Nadav Har'El
a0fffcebde alternator: add support for DeleteRequest in BatchWriteItem
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:20:01 +03:00
Nadav Har'El
83b91d4b49 alternator: add DeleteItem
Add support for the DeleteItem operation, which deletes an item.

The basic deletion operation is supported. Still not supported are:

1. Parameters to conditionally delete (ConditionalExpression or Expected)
2. Parameters to return pre-delete content
3. ReturnItemCollectionMetrics (statistics relevant for tables with LSI)

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:19:46 +03:00
Nadav Har'El
b09603ed9b alternator: cleaner error on DeleteRequest
In BatchWriteItem, we currently only support the PutRequest operation.
If a user tries to use DeleteRequest (which we don't support yet), he
will get a bizarre error. Let's test the request type more carefully,
and print a better error message. This will also be the place where
eventually we'll actually implement the DeleteRequest.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:16:02 +03:00
Nadav Har'El
a7f7ce1a73 alternator-test: tests for BatchWriteItem
This patch adds more comprehensive tests for the BatchWriteItem operation,
in a new file batch_test.py. The one test we already had for it was also
moved from test_item.py here.

Some of the tests still xfail for two reasons:
1. Support for the DeleteRequest operation of BatchWriteItem is missing.
2. Tests that forbid duplicate keys in the same request are missing.

As usual, all tests succeed on DynamoDB, and hopefully (I tried...)
cover all the BatchWriteItem features.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:16:01 +03:00
Nadav Har'El
a8dd3044e2 alternator: support (most of) ProjectionExpression
DynamoDB has two similar parameters - AttributesToGet and
ProjectionExpression - which are supported by the GetItem, Scan and
Query operations. Until now we supported only the older AttributesToGet,
and this patch adds support to the newer ProjectionExpression.

Besides having a different syntax, the main difference between
AttributesToGet and ProjectionExpression is that the latter also
allows fetching only a specific nested attribute, e.g., a.b[3].c.
We do not support this feature yet, although it would not be
hard to add it: With our current data representation, it means
fetching the top-level attribute 'a', whose value is a JSON, and then
post-filtering it to take out only the '.b[3].c'. We'll do that
later.

This patch also adds more test cases to test_projection_expression.py.
All tests except three which check the nested attributes now pass,
and those three xfail (they succeed on DynamoDB, and fail as expected
on Alternator), reminding us what still needs to be done.
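
The deferred post-filtering step could look roughly like this (an illustrative Python sketch of walking a document path such as a.b[3].c; the function names are made up):

```python
import re

def parse_path(path):
    """Split a DynamoDB-style document path like 'a.b[3].c' into components."""
    parts = []
    for name, index in re.findall(r'([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]', path):
        parts.append(int(index) if index else name)
    return parts

def extract(doc, path):
    """Walk a nested dict/list according to the parsed path components."""
    for part in parse_path(path):
        doc = doc[part]
    return doc

item = {'a': {'b': [10, 11, 12, {'c': 'found'}]}}
assert parse_path('a.b[3].c') == ['a', 'b', 3, 'c']
assert extract(item, 'a.b[3].c') == 'found'
```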

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:15:01 +03:00
Nadav Har'El
98c4e646a5 alternator-test: tests for yet-unimplemented ProjectionExpression
Our GetItem, Query and Scan implementations support the AttributesToGet
parameter to fetch only a subset of the attributes, but we don't yet
support the more elaborate ProjectionExpression parameter, which is
similar but has a different syntax and also allows specifying nested
document paths.

This patch adds extensive testing of all the ProjectionExpression features.
All these tests pass against DynamoDB, but fail against the current
Alternator so they are marked "xfail". These tests will be helpful for
developing the ProjectionExpression feature.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:15:00 +03:00
Nadav Har'El
7c9e64ed81 alternator-test: more tests for AttributesToGet parameter
The AttributesToGet parameter - saying which attributes to fetch for each
item - is already supported in the GetItem, Query and Scan operations.
However, we only had a test for it for Scan. This patch adds
similar tests also for the GetItem and Query operations.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:14:59 +03:00
Nadav Har'El
9c53f33003 alternator-test: another test for top-level attribute overwrite
Yet another test for overwriting a top-level attribute which contains
a nested document - here, overwriting it by just a string.

This test passes. In the current implementation we don't yet support
updates to specific attribute paths (e.g. a.b[3].c), but we do support
writing and over-writing top-level attributes well.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:14:58 +03:00
Nadav Har'El
f6fa971e96 alternator: initial implementation of "+" and "-" in UpdateExpression
This patch implements the last (finally!) syntactic feature of the
UpdateExpression - the ability to do SET a=val1+val2 (where, as
before, each of the values can be a reference to a value, an
attribute path, or a function call).

The implementation is not perfect: It adds the values as double-precision
numbers, which can lose precision. So the patch adds a new test which
checks that the precision isn't lost - a test that currently fails
(xfail) on Alternator, but passes on DynamoDB. The pre-existing test
for adding small integers now passes on Alternator.
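
The precision issue is easy to demonstrate (illustrative Python, standing in for the double-vs-decimal distinction in the C++ code):

```python
from decimal import Decimal, getcontext

# DynamoDB numbers carry up to 38 significant decimal digits.
getcontext().prec = 38

big = 10**20
# As an IEEE double, adding 1 to 1e20 is lost entirely:
assert float(big) + 1.0 == float(big)
# With arbitrary-precision decimals the increment survives:
assert Decimal(big) + Decimal(1) == Decimal("100000000000000000001")
```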

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:14:01 +03:00
Nadav Har'El
a5af962d80 alternator: support the list_append() function in UpdateExpression
In the previous patch we added function-call support in the UpdateExpression
parser. In this patch we add support for one such function - list_append().
This function takes two values, confirms they are lists, and concatenates
them. After this patch only one function remains unimplemented:
if_not_exists().

We also split the test we already had for list_append() into two tests:
One uses only value references (":val") and passes after this patch.
The second test also uses references to other attributes and will only
work after we start supporting read-modify-write.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:13:07 +03:00
Nadav Har'El
9d2eba1c75 alternator: parse more types of values in UpdateExpression
Until this patch, in update expressions like "SET a = :val", we only
allowed the right-hand-side of the assignment to be a reference to a
value stored in the request - like ":val" in the above example.

But DynamoDB also allows the value to be an attribute path (e.g.,
"a.b[3].c"), and it can also be a function of a bunch of other values.
This patch adds support for parsing all these value types.

This patch only adds the correct parsing of these additional types of
values, but they are still not supported: reading existing attributes
(i.e., read-modify-write operations) is still not supported, and
neither of the two functions which UpdateExpression needs to support
is supported yet. Nevertheless, the parsing is now correct, and
the "unknown_function" test starts to pass.

Note that DynamoDB allows the right-hand side of an assignment to be
not only a single value, but also value+value and value-value. This
possibility is not yet supported by the parser and will be added
later.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:12:06 +03:00
Piotr Sarna
cb50207c7b alternator-test: add initial filtering test for scans
Currently the only supported case is equality on non-key attributes.
More complex filtering tests are also included in test_query.py.
2019-09-11 15:12:05 +03:00
Piotr Sarna
b5eb3aed10 alternator-test: add initial filtering test for query
The test cases verify that equality-based filtering on non-key
attributes works fine. It also contains test stubs for key filtering
and non-equality attribute filtering.
2019-09-11 15:12:04 +03:00
Piotr Sarna
319e946d8f alternator-test: diversify attribute values in filled test table
Filled test table used to have identical non-key attributes for all
rows. These values are now diversified in order to allow writing
filtering test cases.
2019-09-11 15:12:03 +03:00
Piotr Sarna
e4516617eb alternator: add filtering to Query
Query requests now accept QueryFilter parameter.
2019-09-11 15:11:10 +03:00
Piotr Sarna
4ea02bec89 alternator: enable filtering for Scan
Scans can now accept ScanFilter parameter to perform filtering
on returned rows.
2019-09-11 15:10:12 +03:00
Piotr Sarna
8cb078f757 alternator: add initial filtering implementation
Filtering is currently only implemented for the equality operator
on non-key attributes.
Next steps (TODO) involve:
1. Implementing filtering for key restrictions
2. Implementing non-key attribute filtering for operators other than EQ.
   It, in turn, may involve introducing a 'map value restrictions' notion
   to Scylla, since it now only allows equality restrictions on map
   values (alternator attributes are currently kept in a CQL map).
3. Implementing FilterExpression in addition to deprecated QueryFilter
2019-09-11 15:08:50 +03:00
Nadav Har'El
aa94e7e680 alternator: clean up parsing of attribute-path components
Before this patch, we read either an attribute name like "name" or
a reference to one "#name", as one type of token - NAME.
However, while attribute paths indeed can use either one, in some other
contexts - such as a function name - only "name" is allowed, so we
need to distinguish between two types of tokens: NAME and NAMEREF.

While separating those, I noticed that we incorrectly allowed a "#"
followed by *zero* alphanumeric characters to be considered a NAMEREF,
which it shouldn't. In other words, NAMEREF should have ALNUM+, not ALNUM*.
Same for VALREF, which can't be just a ":" with nothing after it.
So this patch fixes these mistakes, and adds tests for them.
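
The ALNUM+ vs ALNUM* distinction can be illustrated with ordinary regexes (a Python sketch; the real lexer is an ANTLR grammar, and these patterns are simplified assumptions):

```python
import re

# NAME is a plain identifier; NAMEREF is '#' followed by ONE OR MORE
# alphanumerics (ALNUM+), so a bare '#' must not match. Same for VALREF.
NAME = re.compile(r'[A-Za-z_][A-Za-z0-9_]*\Z')
NAMEREF = re.compile(r'#[A-Za-z0-9_]+\Z')   # '+' not '*'
VALREF = re.compile(r':[A-Za-z0-9_]+\Z')    # same fix for value references

assert NAME.match('name') and not NAME.match('#name')
assert NAMEREF.match('#name')
assert not NAMEREF.match('#')   # would wrongly match with ALNUM*
assert not VALREF.match(':')
```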

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:08:36 +03:00
Nadav Har'El
13476c8202 alternator: complain about unused values or names in UpdateExpression
DynamoDB complains, and fails an update, if the update contains in
ExpressionAttributeNames or ExpressionAttributeValues names which aren't
used by the expression.

Let's do the same, although sadly this means more work to track which
of the references we've seen and which we haven't.

This patch makes two previously xfail (expected fail) tests become
successful tests on Alternator (they always succeeded against DynamoDB).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:07:35 +03:00
Nadav Har'El
c4fc02082b alternator-test: complete test for UpdateItem's UpdateExpression
The existing tests in test_update_expression.py thoroughly tested the
UpdateExpression features which we currently support. But tests for
features which Alternator *doesn't* yet support were partial.

In this patch, we add a large number of new tests to
test_update_expression.py aiming to cover ALL the features of
UpdateExpression, regardless of whether we already support it in
Alternator or not. Every single feature and esoteric edge-case I could
discover is covered in these tests - and as far as I know these tests
now cover the *entire* UpdateExpression feature. All the tests succeed
on DynamoDB, and confirm our understanding of what DynamoDB actually does
on all these cases.

After this patch, test_update_expression.py is a whopper, with 752 lines of
code and 37 separate test functions. 23 out of these 37 tests are still
"xfail" - they succeed on DynamoDB but fail on Alternator, because of
several features we are still missing. Those missing features include
direct updates of nested attributes, read-modify-write updates (e.g.,
"SET a=b" or "SET a=a+1"), functions (e.g., "SET a = list_append(a, :val)"),
the ADD and DELETE operations on sets, and various other small missing
pieces.

The benefit of this whopper test is two-fold: First, it will allow us
to test our implementation as we continue to fill it (i.e., "test-driven
development"). Second, all these tested edge cases basically
"reverse engineer" how DynamoDB's expression parser is supposed to work,
and we will need this knowledge to implement the still-missing features of
UpdateExpression.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:07:34 +03:00
Nadav Har'El
ede5943401 alternator-test: test for UpdateItem's UpdateExpression
This patch adds an extensive array of tests for UpdateItem's UpdateExpression
support, which was introduced in the previous patch.

The tests include verification of various edge cases of the parser, support
for ":value" and "#name" references, functioning SET and REMOVE operations,
combinations of multiple such operations, and much more.

As usual, all these tests were ran and succeed on DynamoDB, as well as on
Alternator - to confirm Alternator behaves the same as DynamoDB.

There are two tests marked "xfail" (expected to fail), because Alternator
still doesn't support the attribute copy syntax (e.g., "SET a = b",
doing a read-before-write).

There are some additional areas which we don't support - such as the DELETE
and ADD operations or SET with functions - but those areas aren't yet tested
in these tests.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:07:33 +03:00
Nadav Har'El
4baa0d3b67 alternator: enable support for UpdateItem's UpdateExpression
For the UpdateItem operation, so far we supported updates via the
AttributeUpdates parameter, specifying which attributes to set or remove
and how. But this parameter is considered deprecated, and DynamoDB supports
a more elaborate way to modify attributes, via an "UpdateExpression".

In the previous patch we added a function to parse such an UpdateExpression,
and in this patch we use the result of this parsing to actually perform
the required updates.

UpdateExpression is only partially supported after this patch. The basic
"SET" and "REMOVE" operations are supported, but various other cases aren't
fully supported and will be fixed in followup patches. The following
patch will add extensive tests to confirm exactly what works correctly
with the new UpdateExpression support.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:06:34 +03:00
Nadav Har'El
829bafd181 alternator: add expression parsers
The DynamoDB protocol is based on JSON, and most DynamoDB requests describe
the operation and its parameters via JSON objects such as maps and lists.
However, in some types of requests an "expression" is passed as a single
string, and we need to parse this string. These cases include:
1. Attribute paths, such as "a[3].b.c", are used in projection
 expressions as well as inside other expressions described below.
2. Condition expressions, such as "(NOT (a=b OR c=d)) AND e=f",
 used in conditional updates, filters, and other places.
3. Update expressions, such as "SET #a.b = :x, c = :y DELETE d"

This patch introduces the framework to parse these expressions, and
an implementation of parsing update expressions. These update expressions
will be used in the UpdateItem operation in the next patch.

All these expression syntaxes are very simple: Most of them could be
parsed as regular expressions, or at most a simple hand-written lexical
analyzer and recursive-descent parser. Nevertheless, we decided to specify
these parsers in the same ANTLR3 language already used in the Scylla
project for parsing CQL, hopefully making these parsers easier to reason
about, and easier to change if needed - and reducing the amount of
boilerplate code.

The parsing of update expressions is mostly complete, except that in SET
actions only the "path = value" form is supported, and not yet forms
such as "path1 = path2" (which does read-before-write) or
"path1 = path1 + value" or "path = function(...)".

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:06:12 +03:00
Nadav Har'El
f0f50607a7 alternator-test: split nested-document tests to new file
We need to write more tests for various case of handling
nested documents and nested attributes. Let's collect them
all in the same test file.

This patch mostly moves existing code, but also adds one
small test, test_nested_document_attribute_write, which
just writes a nested document and reads it back (it's
mostly covered by the existing test_put_and_get_attribute_types,
but is specifically about a nested document).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:06:11 +03:00
Nadav Har'El
12abe8e797 alternator-test: make local test the default
We usually run Alternator tests against the local Alternator - testing
against AWS DynamoDB is rarer, and usually just done when writing the
test. So let's make "pytest" without parameters default to testing locally.
To test against AWS, use "pytest --aws" explicitly.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:06:10 +03:00
Piotr Sarna
b67f22bfc6 alternator: move related functions to serialization.cc
Existing functions related to serialization and deserialization
are moved to the serialization.cc source file.
Message-Id: <fb49a08b05fdfcf7473e6a7f0ac53f6eaedc0144.1559646761.git.sarna@scylladb.com>
2019-09-11 15:06:05 +03:00
Piotr Sarna
fdba9866fc alternator: apply new serialization to reads and writes
Attributes for reads (GetItem, Query, Scan, ...) and writes (PutItem,
UpdateItem, ...) are now serialized and deserialized in binary form
instead of raw JSON, provided that their type is S, B, BOOL or N.
Optimized serialization for the rest of the types will be introduced
as follow-ups.
Message-Id: <6aa9979d5db22ac42be0a835f8ed2931dae208c1.1559646761.git.sarna@scylladb.com>
2019-09-11 15:02:21 +03:00
Piotr Sarna
b3fd4b5660 alternator: add simple attribute serialization routines
Attributes used to be written into the database in raw JSON format,
which is far from optimal. This patch introduces more robust
serialization routines for simple alternator types: S, B, BOOL, N.
Serialization uses the first byte to encode attribute type
and follows with serializing data in binary form.
More complex types (sets, lists, etc.) are currently still
serialized in raw JSON and will be optimized in follow-up patches.
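
The scheme reads roughly like this sketch (Python for illustration only; the actual tag byte values and C++ routines differ, and the N type is omitted here):

```python
# One made-up type tag byte, then the value in binary form.
TAGS = {'S': b'\x01', 'B': b'\x02', 'BOOL': b'\x03'}

def serialize(attr_type, value):
    if attr_type == 'S':
        return TAGS['S'] + value.encode('utf-8')
    if attr_type == 'B':
        return TAGS['B'] + value
    if attr_type == 'BOOL':
        return TAGS['BOOL'] + (b'\x01' if value else b'\x00')
    raise ValueError('complex types still go through raw JSON')

def deserialize(data):
    tag, payload = data[:1], data[1:]
    if tag == TAGS['S']:
        return 'S', payload.decode('utf-8')
    if tag == TAGS['B']:
        return 'B', payload
    if tag == TAGS['BOOL']:
        return 'BOOL', payload == b'\x01'
    raise ValueError('unknown type tag')

assert deserialize(serialize('S', 'hello')) == ('S', 'hello')
```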
Message-Id: <10955606455bbe9165affb8ac8fba4d9e7c3705f.1559646761.git.sarna@scylladb.com>
2019-09-11 15:01:07 +03:00
Piotr Sarna
27f00d1693 alternator: move error class to a separate header
Error class definitions were previously in server.hh, but they
are separate entities - future .cc files can use the errors without
needing to include the server definitions.
Message-Id: <b5689e0f4c9f9183161eafff718f45dd8a61b653.1559646761.git.sarna@scylladb.com>
2019-09-11 14:52:58 +03:00
Nadav Har'El
52810d1103 configure.py: move alternator source files to separate list
For some unknown reason we put the list of alternator source files
in configure.py inside the "api" list. Let's move it into a separate
list.

We could have just put it in the scylla_core list, but that would cause
frequent and annoying patch conflicts when people add alternator source
files and Scylla core source files concurrently.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:52:39 +03:00
Nadav Har'El
d4b3c493ad alternator: stub support for UpdateItem with UpdateExpression
So far for UpdateItem we only supported the old-style AttributeUpdates
parameter, not the newer UpdateExpression. This patch begins the path
to supporting UpdateExpression. First, trying to use *both* parameters
should result in an error, and this patch does this (and tests this).
Second, passing neither parameters is allowed, and should result in
an *empty* item being created.

Finally, since today we do not yet support UpdateExpression, this patch
will cause UpdateItem to fail if UpdateExpression is used, instead of
silently being ignored as we did so far.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:51:40 +03:00
Nadav Har'El
04856a81f5 alternator-tests: two simple test for nested documents
This patch adds two simple tests for nested documents, which pass:

test_nested_document_attribute_overwrite() tests what happens when
we UpdateItem a top-level attribute to a dictionary. We already tested
this works on an empty item in a previous test, but now we check what
happens when the attribute already existed, and already was a dictionary,
and now we update it to a new dictionary. In the test attribute a was
{b:3, c:4} and now we update it to {c:5}. The test verifies that the new
dictionary completely replaces the old one - the two are not merged.
The new value of the attribute is just {c:5}, *not* {b:3, c:5}.

The second test verifies that the AttributeUpdates parameter of
UpdateItem cannot be used to update just a nested attribute.
Any dot in the attribute name is considered an actual dot - not
part of a path of attribute names.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:51:39 +03:00
Nadav Har'El
b782d1ef8d alternator-test: test_query.py: change item list comparison
Comparing two lists of items without regard for order is not trivial.
For this reason some tests in test_query.py only compare arrays of sort
keys, and those tests are fine.

But other tests used a trick of converting a list of items into a set
via set_of_frozen_elements() and comparing these sets. This trick is almost
correct, but it can miss cases where items repeat.

So in this patch, we replace the set_of_frozen_elements() approach by
a similar one using a multiset (set with repetitions) instead of a set.
A multiset in Python is "collections.Counter". This is the same approach
we recently started using in test_scan.py.
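
A minimal version of this multiset comparison (assuming flat items; the real multiset() helper also freezes nested dicts):

```python
from collections import Counter

def multiset(items):
    # Freeze each item (a dict) into a hashable form, then count
    # occurrences so repeated items are not missed.
    return Counter(tuple(sorted(d.items())) for d in items)

a = [{'p': 1}, {'p': 1}, {'p': 2}]
b = [{'p': 1}, {'p': 2}]
assert multiset(a) != multiset(b)  # a set-based compare would say equal
assert multiset(a) == multiset([{'p': 2}, {'p': 1}, {'p': 1}])  # order ignored
```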

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:51:38 +03:00
Nadav Har'El
15f47a351e alternator: remove unused code
Remove the incomplete and unused function to convert DynamoDB type names
to ScyllaDB type objects:

DynamoDB has a different set of types relevant for keys and for attributes.
We already have a separate function, parse_key_type(), for parsing key
types, and for attributes - we don't currently parse the type names at
all (we just save them as JSON strings), so the function we removed here
wasn't used, and was in fact #if'ed out. It was never completed, and it now
started to decay (the type for numbers is wrong), so we're better off
completely removing it.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:50:44 +03:00
Nadav Har'El
b63bd037ea alternator: implement correct "number" type for keys
This patch implements a fully working number type for keys, and now
Alternator fully and correctly supports every key type - strings, byte
arrays, and numbers.

The patch also adds a test which verifies that Scylla correctly sorts
number sort keys, and also correctly retrieves them to the full precision
guaranteed by DynamoDB (38 decimal digits).

The implementation uses Scylla's "decimal" type, which supports arbitrary
precision decimal floating point, and in particular supports the precision
specified by DynamoDB. However, "decimal" is actually over-qualified for
this use, so might not be optimal for the more specific requirements of
DynamoDB. So a FIXME is left to optimize this case in the future.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:49:47 +03:00
Nadav Har'El
cb1b2b1fc2 alternator-test: test_scan.py: change item list comparison
Comparing two lists of items without regard for order is not trivial.
test_scan.py currently has two ways of doing this, both unsatisfactory:

1. We convert each list to a set via set_of_frozen_elements(), and compare
   the sets. But this comparison can miss cases where items repeat.

2. We use sorted() on the list. This doesn't work on Python 3 because
   it removed the ability to compare (with "<") dictionaries.

So in this patch, we replace both by a new approach, similar to the first
one except we use a multiset (set with repetitions) instead of a set.
A multiset in Python is "collections.Counter".

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:49:46 +03:00
Nadav Har'El
4a1b6bf728 alternator-test: drop "test_2_tables" fixture
Creating and deleting tables is the slowest part of our tests,
so we should lower the number of tables our tests create.

We had a "test_2_tables" fixture as a way to create two
tables, but since our tests already create other tables
for testing different key types, it's faster to reuse those
tables - instead of creating two more unused tables.

On my system, a "pytest --local", running all 38 tests
locally, drops from 25 seconds to 20 seconds.

As a bonus, we also have one fewer fixture ;-)

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:49:45 +03:00
Nadav Har'El
013fb1ae38 alternator-test: fix errors in len/length variable name
Also change "xrage" to "range" to appease Python 3

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:49:44 +03:00
Nadav Har'El
30a123d8ad DynamoDB limits the size of hash keys to 2048 bytes, sort keys
to 1024 bytes, and the entire item to 400 KB which therefore also
limits the size of one attribute. This test checks that we can
reach up to these limits, with binary keys and attributes.

The test does *not* check what happens once we exceed these
limits. In such a case, DynamoDB throws an error (I checked that
manually) but Alternator currently simply succeeds. If in the
future we decide to add similar artificial limits to Alternator,
we should add such tests as well.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:49:43 +03:00
Nadav Har'El
b91eca28bd alternator-test: don't use "len" as a parameter name
"len" is an unfortunate choice for a variable name, in case one
day the implementation may want to call the built-in "len" function.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:49:42 +03:00
Nadav Har'El
e21e0e6a37 alternator-test: test sort-key ordering - for both string and binary keys
We already have a test for *string* sort-key ordering of items returned
by the Scan operation, and this test adds a similar test for the Query
operation. We verify that items are retrieved in the desired sorted
order (sorted by the aptly-named sort key) and not in creation order
or any other wrong order.

But beyond just checking that Query works as expected (it should,
given it uses the same machinery as Scan), the nice thing about this
test is that it doesn't create a new table - it uses a shared table
and creates one random partition inside it. This makes this test
faster and easier to write (no need for a new fixture), and most
importantly - easily allows us to write similar tests for other
key types.

So this patch also tests the correct ordering of *binary* sort keys.
It helped expose bugs in previous versions of the binary key implementation.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:49:41 +03:00
Nadav Har'El
1d058cf753 alternator-test: test item operations with binary keys
Simple tests for item operations (PutItem, GetItem) with binary key instead
of string for the hash and sort keys. We need to be able to store such
keys, and then retrieve them correctly.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:49:40 +03:00
Nadav Har'El
4bfd5d7ed1 alternator: add support for bytes as key columns
Until now we only supported string for key columns (hash or sort key).
This patch adds support for the bytes type (a.k.a binary or blob) as well.
The last missing type to be supported in keys is the number type.

Note that in JSON, bytes values are represented with base64 encoding,
so we need to decode them before storing the decoded value, and re-encode
when the user retrieves the value. The decoding is important not just
for saving storage space (the encoding is 4/3 the size of the decoded)
but also for correct *sorting* of the binary keys.
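
The sorting point is easy to demonstrate (Python, using the standard base64 module):

```python
import base64

# Base64 text does not sort in the same order as the raw bytes it
# encodes, so keys must be stored decoded to sort correctly.
raw = [b'\x00', b'\xfb']
encoded = [base64.b64encode(r) for r in raw]

assert sorted(raw) == [b'\x00', b'\xfb']
# '+' sorts before 'A' in ASCII, so the encoded order is reversed:
assert sorted(encoded) == [b'+w==', b'AA==']
assert [base64.b64decode(e) for e in sorted(encoded)] != sorted(raw)
```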

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:49:35 +03:00
Nadav Har'El
57b46a92d7 alternator: add base64 encoding and decoding functions
The DynamoDB API uses base64 encoding to encode binary blobs as JSON
strings. So we need functions to do these conversions.

This code was "inspired" by https://github.com/ReneNyffenegger/cpp-base64
but doesn't actually copy code from it.

I didn't write any specific unit tests for this code, but it will be
exercised and tested in a following patch which tests Alternator's use
of these functions.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:46:13 +03:00
Piotr Sarna
0980fde9d5 alternator-test: add dedicated BEGINS_WITH case to Query
BEGINS_WITH behaves in a special way when a key postfix
consists of <255> bytes. The initial test does not use that
and instead checks UTF-8 characters, but once bytes type
is implemented for keys, it should also test specifically for
corner cases, like strings that consist of <255> bytes only.
Message-Id: <fe10d7addc1c9d095f7a06f908701bb2990ce6fe.1558603189.git.sarna@scylladb.com>
2019-09-11 14:46:12 +03:00
Piotr Sarna
5bc7bb00e0 alternator-test: rename test_query_with_paginator
Paginator is an implementation detail and does not belong in the name,
and thus the test is renamed to test_query_basic_restrictions.
Message-Id: <849bc9d210d0faee4bb8479306654f2a59e18517.1558524028.git.sarna@scylladb.com>
2019-09-11 14:46:11 +03:00
Piotr Sarna
9e2ecf5188 alternator: fix string increment for BEGINS_WITH
The BEGINS_WITH statement increments a string in order to compute
the upper bound for a clustering range of a query.
Unfortunately, the previous implementation was not correct,
as it appended a <0> byte if the last character was <255>,
instead of incrementing the last-but-one character.
If the string consists of <255> bytes only, the returned
upper bound is infinite.
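
A sketch of the corrected increment (Python for illustration; the actual fix is in Alternator's C++ code, and the function name is assumed):

```python
def increment(prefix: bytes):
    """Exclusive upper bound for begins_with(prefix), or None if unbounded."""
    b = bytearray(prefix)
    while b and b[-1] == 0xff:
        b.pop()          # carry past trailing <255> bytes
    if not b:
        return None      # all bytes were <255>: no finite upper bound
    b[-1] += 1
    return bytes(b)

assert increment(b'abc') == b'abd'
assert increment(b'ab\xff') == b'ac'   # carry into the last-but-one character
assert increment(b'\xff\xff') is None  # unbounded
```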
Message-Id: <3a569f08f61fca66cc4f5d9e09a7188f6daad578.1558524028.git.sarna@scylladb.com>
2019-09-11 14:45:17 +03:00
Nadav Har'El
7b9180cd99 alternator: common get_read_consistency() function
We had several places in the code that need to parse the
ConsistentRead flag in the request. Let's add a function
that does this, and while at it, checks for more error
cases and also returns LOCAL_QUORUM and LOCAL_ONE instead
of QUORUM and ONE.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:44:24 +03:00
Nadav Har'El
56907bf6c6 alternator: for writes, use LOCAL_QUORUM instead of QUORUM
As Shlomi suggested in the past, it is more likely that when we
eventually support global tables, we will use LOCAL_QUORUM,
not QUORUM. So let's switch to that now.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:44:20 +03:00
Nadav Har'El
8c347cc786 alternator-test: verify that table with only hash key also works
So far, all of the tests in test_item.py (for PutItem, GetItem, UpdateItem),
were arbitrarily done on a test table with both hash key and sort key
(both with string type). While this covers most of the code paths, we still
need to verify that the case where there is *not* a sort key, also works
fine. E.g., maybe we have a bug where a missing clustering key is handled
incorrectly or an error is incorrectly reported in that case?

But in this patch we add tests for the hash-key-only case, and see that
it already works correctly. No bug :-)

We add a new fixture test_table_s for creating a test table with just
a single string key. Later we'll probably add more of these test tables
for additional key types.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:41:16 +03:00
Nadav Har'El
c53b2ebe4d alternator-test: also test for missing part of key
Another type of key type error can be to forget part of the key
(the hash or sort key). Let's test that too (it already works correctly,
no need to patch the code).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:41:15 +03:00
Nadav Har'El
f58abb76d6 alternator: gracefully handle wrong key types
When a table has a hash key or sort key of a certain type (this can
be string, bytes, or number), one cannot try to choose an item using
values of different types.

We previously did not handle this case gracefully, and PutItem handled
it particularly badly - writing malformed data to the sstable and basically
hanging Scylla. In this patch we fix the pk_from_json() and ck_from_json()
functions to verify the expected type, and fail gracefully if the user
sent the wrong type.

This patch also adds tests for these failures, for the GetItem, PutItem,
and UpdateItem operations.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:40:23 +03:00
Nadav Har'El
9ee912d5cf alternator: correct handling of missing item in GetItem
According to the documentation, trying to GetItem a non-existent item
should result in an empty response - NOT a response with an empty "Item"
map as we do before this patch.

This patch fixes this case, and adds a test case for it. As usual,
we verify that the test case also works on Amazon DynamoDB, to verify
DynamoDB really behaves the way we thik it does.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:39:32 +03:00
Nadav Har'El
7f73f561d5 alternator: fix support for empty items
If an empty item (i.e., no attributes except the key) is created, or an item
becomes empty (by deleting its existing attributes), the empty item must be
maintained - it cannot just disappear. To do this in Scylla, we must add a
row marker - otherwise an empty attribute map is not enough to keep the
row alive.

This patch includes 4 test cases for all the various ways an empty item can be
created empty or non-empty item be emptied, and verifies that the empty item
can be correctly retrieved (as usual, to verify that our expectation of
"correctness" is indeed correct, we run the same tests against DynamoDB).
All these 4 tests failed before this patch, and now succeed.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:38:40 +03:00
Nadav Har'El
95ed2f7de8 alternator: remove two unused lines of code
These lines of code were superfluous and their result unused: the
make_item_mutation() function finds the pk and ck on its own.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:37:49 +03:00
Nadav Har'El
eb81b31132 alternator: add statistics
This patch adds a statistics framework to Alternator: Executor has (for
each shard) a _stats object which contains counters for various events,
and also is in charge of making these counters visible via Scylla's regular
metrics API (http://localhost:9180/metrics).

This patch includes a counter for each of DynamoDB's operation types,
and we increase the ones we support when handled. We also added counters
for total operations and unsupported operations (operation types we don't
yet handle). In the future we can easily add many more counters: Define
the counter in stats.hh, export it in stats.cc, and increment it
where relevant in executor.cc (or server.cc).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:36:26 +03:00
Piotr Sarna
d267e914ad alternator-test: add initial Query test
The test covers simple restrictions on primary keys.
Message-Id: <2a7119d380a9f8572210571c565feb8168d43001.1558356119.git.sarna@scylladb.com>
2019-09-11 14:36:25 +03:00
Piotr Sarna
b309c9d54b alternator: implement basic Query
The implementation covers the following restrictions
 - equality for hash key;
 - equality, <, <=, >, >=, between, begins_with for sort key.
Message-Id: <021989f6d0803674cbd727f9b8b3815433ceeea5.1558356119.git.sarna@scylladb.com>
2019-09-11 14:36:16 +03:00
Piotr Sarna
8571046d3e alternator: move do_query to separate function
A fair portion of code from scan() will be used later to implement
query(), so it's extracted as a helper function.
Message-Id: <d3bc163a1cb2032402768fcbc6a447192fba52a4.1558356119.git.sarna@scylladb.com>
2019-09-11 14:31:31 +03:00
Nadav Har'El
4a8b2c794d alternator-test: another edge case for Scan with AttributesToGet
Ask to retrieve only an attribute name which *none* of the items have.
The result should be a silly list of empty items, and indeed it is.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:31:30 +03:00
Nadav Har'El
c766d1153d alternator-test: shorten test_scan.py by reusing full_scan more
Use full_scan() in another test instead of open-coding the scan.
There are two more tests that could have used full_scan(), but
since they seem to be specifically adding more assertions or
using a different API ("paginators"), I decided to leave them
as-is. But new tests should use full_scan().

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:31:29 +03:00
Nadav Har'El
2666b29c77 alternator-test: test AttributesToGet parameter in Scan request
This is a short, but extensive, test to the AttributesToGet parameter
to Scan, which allows selecting only some of the attributes for output.

The AttributesToGet feature has several non-obvious features. Firstly,
it doesn't require that any key attributes be selected. So since each
item may have different non-key attributes, some scanned items may
be missing some of the selected columns, and some of the items may
even be missing *all* the selected columns - in which case DynamoDB
returns an empty item (and doesn't entirely skip this item). This
test covers all these cases, and it adds yet another item to the
'filled_test_table' fixture, one which has different attributes,
so we can see these issues.

As usual, this test passes in both DynamoDB and Alternator, to
assure we correspond to the *right* behavior, not just what we
think is right.

This test actually exposed a bug in the way our code returned
empty items (items which had none of the selected columns),
a bug which was fixed by the previous patch.

Instead of having yet another copy of table-scanning code, this
patch adds a utility function full_scan(), to scan an entire
table (with optional extra parameters for the scan) and return
the result as an array. We should simplify existing tests in
test_scan.py by using this new function.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:31:28 +03:00
Avi Kivity
446faba49c Merge "dbuild: add --image option, help, and usage" from Benny
* tag 'dbuild-image-help-usage-v1' of github.com:bhalevy/scylla:
  dbuild: add usage
  dbuild: add help option
  dbuild: list available images when no image arg is given
  dbuild: add --image option
2019-09-11 14:30:45 +03:00
Nadav Har'El
f871a4bc87 alternator: fix bug in returning an empty item in a Scan
When a Scan selects only certain attributes, and none of the key
attributes are selected, for some of the scanned items *nothing*
will remain to be output, but still Dynamo outputs an empty item
in this case. Our code had a bug where after each item we "moved"
the object leaving behind a null object, not an empty map, so a
completely empty item wasn't output as an empty map as expected,
and resulted in boto3 failing to parse the response.

This simple one-line patch fixes the bug, by resetting the item
to an empty map after moving it out.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:30:37 +03:00
Piotr Sarna
8525b14271 alternator: add lookup table for requests
Instead of using a really long if-else chain, requests are now
looked up via a routing table.
Message-Id: <746a34b754c3070aa9cbeaf98a6e7c6781aaee65.1557914794.git.sarna@scylladb.com>
2019-09-11 14:29:59 +03:00
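A routing table like the one this commit introduces can be sketched in Python (handler names and stub bodies are hypothetical; the real handlers live on the Alternator executor):

```python
# Hypothetical handler stubs standing in for the real executor methods.
def handle_list_tables(request):
    return {"TableNames": []}

def handle_describe_endpoints(request):
    return {"Endpoints": []}

# One dict lookup replaces a long if-else chain over the operation name
# taken from the request's X-Amz-Target header.
ROUTES = {
    "ListTables": handle_list_tables,
    "DescribeEndpoints": handle_describe_endpoints,
}

def dispatch(op, request):
    handler = ROUTES.get(op)
    if handler is None:
        raise ValueError(f"Unsupported operation {op}")
    return handler(request)
```

Unknown operations fall out of the table naturally, which pairs well with the "return a standard API error for unknown operations" commit later in this history.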
Piotr Sarna
f3440f2e4a alternator-test: migrate filled_test_table to use batches
Filled test table fixture now takes advantage of batch writes
in order to run faster.
Message-Id: <e299cdffa9131d36465481ca3246199502d65e0c.1557914382.git.sarna@scylladb.com>
2019-09-11 14:29:58 +03:00
Piotr Sarna
4c3bdd3021 alternator-test: add batch writing test case
Message-Id: <a950799dd6d31db429353d9220b63aa96676a7a7.1557914382.git.sarna@scylladb.com>
2019-09-11 14:29:57 +03:00
Piotr Sarna
c0ecd1a334 alternator: add basic BatchWriteItem
The initial implementation only supports PutRequest requests,
without serving DeleteRequest properly.
Message-Id: <451bcbed61f7eb2307ff5722de33c2e883563643.1557914382.git.sarna@scylladb.com>
2019-09-11 14:29:50 +03:00
Nadav Har'El
9a0c13913d alternator: improve where DescribeEndpoints gets its information
Instead of blindly returning "localhost:8000" in response to
DescribeEndpoints and for sure causing us problems in the future,
the right thing to do is to return the same domain name which the
user originally used to get to us, be it "localhost:8000" or
"some.domain.name:1234". But how can we know what this domain name
was? Easy - this is why HTTP 1.1 added a mandatory "Host:" header,
and the DynamoDB driver I tested (boto3) adds it as expected,
indeed with the expected value of "localhost:8000" on my local setup.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:25:22 +03:00
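The Host-header approach described above can be sketched as a small pure function (the response shape follows the DynamoDB DescribeEndpoints format as I understand it; treat the `CachePeriodInMinutes` value and exact field names as assumptions):

```python
def describe_endpoints(headers: dict) -> dict:
    # Echo back the domain the client actually used to reach us, taken from
    # the mandatory HTTP/1.1 Host header, instead of a hardcoded address.
    host = headers.get("Host")
    if not host:
        raise ValueError("DescribeEndpoints requires a Host: header")
    return {"Endpoints": [{"Address": host, "CachePeriodInMinutes": 1440}]}
```

So a client that connected to "some.domain.name:1234" gets that same name back, as the commit intends.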
Nadav Har'El
a4a3b2fe43 alternator-test: test for sort order of items in a single partition
Although different partitions are returned by a Scan in (seemingly)
random order, items in a single partition need to be returned sorted
by their sort key. This adds a test to verify this.

This patch adds to the filled_test_table fixture, which until now
had just one item in each partition, another partition (with the key
"long") with 164 additional items. The test_scan_sort_order_string
test then scans this table, and verifies that the items are really
returned in sorted order.

The sort order is, of course, string order. So we have the first
item with sort key "1", then "10", then "100", then "101", "102",
etc. When we implement numeric keys we'll need to add a version
of this test which uses a numeric clustering key and verifies the
sort order is numeric.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:25:21 +03:00
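The sort order the test checks is plain lexicographic string ordering, which a few lines of Python make concrete:

```python
# Items in one partition come back sorted by sort key; with string keys
# that order is lexicographic, so "10" precedes "2".
keys = [str(i) for i in range(1, 12)]   # "1" .. "11"
string_sorted = sorted(keys)            # the order a string sort key yields
assert string_sorted[:4] == ["1", "10", "11", "2"]
```

A numeric clustering key, once implemented, would instead yield 1, 2, ..., 11, which is why the commit calls for a separate numeric version of the test.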
Nadav Har'El
32c388b48c alternator: fix clustering key setup
Because of a typo, we incorrectly set the table's sort key as a second
partition key column instead of a clustering key column. This has bad
but subtle consequences - such as that the items are *not* sorted
according to the sort key. So in this patch we fix the typo.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:24:30 +03:00
Nadav Har'El
29e0f68ee0 alternator: add initial implementation of DescribeEndpoints
DescribeEndpoints is not a very important API (and by default, clients
don't use it) but I wanted to understand how DynamoDB responds to it,
and what better way than to write a test :-)

And then, if we already have a test, let's implement this request in
Scylla as well. This is a silly implementation, which always returns
"localhost:8000". In the future, this will need to be configurable -
we're not supposed to return *this* server's IP address here, but rather
a domain name which can be used to get to all servers.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:22:47 +03:00
Avi Kivity
211b0d3eb4 Merge "sstables, gdb: Centralize tracking of sstable instances" from Tomasz
"
Currently, GDB scripts locate sstables by scanning the heap for
bag_sstable_set containers. That has disadvantages:

  - not all containers are considered

  - it's extremely slow on large heaps

  - fragile, new containers can be added, and we won't even know

This series fixes all above by adding a per-shard sstable tracker
which tracks sstable objects in a linked-list.
"

* 'sstable-tracker' of github.com:tgrabiec/scylla:
  gdb: Use sstable tracker to get the list of sstables
  gdb: Make intrusive_list recognize member_hook links
  sstables: Track whether sstable was already open or not
  sstables: Track all instances of sstable objects
  sstables: Make sstable object not movable
  sstables: Move constructor out of line
2019-09-11 14:22:41 +03:00
Nadav Har'El
982b5e60e7 alternator: unify and improve TableName field handling
Most of the request types need a TableName parameter, specifying the
name of the table they operate on. There's a lot of boilerplate code
required to get this table name and verify that it is valid (the parameter
exists, is a string, passes DynamoDB's naming rules, and the table
actually exists), which resulted in a lot of code duplication - and
in some cases missing checks.

So this patch introduces two utility functions, get_table_name()
and get_table(), to fetch a table name or the schema of an existing
table, from the request, with all necessary validation. If validation
fails, the appropriate api_error() is thrown so the user gets the
right error message.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:21:53 +03:00
Nadav Har'El
b8fc783171 alternator-test: clean up conftest.py
Remove unused random-string code from conftest.py, and also add a
TODO comment about how we should speed up the filled_test_table fixture by
using a batch write - when that becomes available in Alternator.
(right now this fixture takes almost 4 seconds to prepare on a local
Alternator, and a whopping 3 minutes (!) to prepare on DynamoDB).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:21:52 +03:00
Piotr Sarna
a4387079ac alternator-test: add initial scan test
Message-Id: <c28ff1d38930527b299fe34e9295ecd25607398c.1557757402.git.sarna@scylladb.com>
2019-09-11 14:21:51 +03:00
Piotr Sarna
b6d148c9e0 alternator-test: add filled test table fixture
The fixture creates a test table and fills it with random data,
which can be later used for testing reads.
Message-Id: <649a8b8928e1899c5cbd82d65d745a464c1163c8.1557757402.git.sarna@scylladb.com>
2019-09-11 14:21:50 +03:00
Piotr Sarna
4def674731 alternator: implement basic scan
The most basic version of Scan request is implemented.
It still contains a list of TODOs, among them support for the Segments
parameter for scan parallelism.
Message-Id: <5d1bfc086dbbe64b3674b0053e58a0439e64909b.1557757402.git.sarna@scylladb.com>
2019-09-11 14:21:39 +03:00
Piotr Sarna
0ce3866fb5 alternator: lower debug messages verbosity in the HTTP server
The HTTP server still uses WARN log level to log debug messages,
which is way higher than necessary. These messages are degraded
to TRACE level.
Message-Id: <59559277f2548d4046001bebff45ab2d3b7063b5.1557744617.git.sarna@scylladb.com>
2019-09-11 14:12:40 +03:00
Nadav Har'El
d45220fb39 alternator-test: simplify test_put_and_get_attribute_types
The test test_put_and_get_attribute_types needlessly named all the
different attributes and their variables, causing a lot of repetition
and chance for mistakes when adding additional attributes to the test.

In this rewrite, we only have a list of items, and automatically build
attributes with them as values (using sequential names for the attributes)
and check we read back the same item (Python's dict equality operator
checks the equality recursively, as expected).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:12:39 +03:00
Nadav Har'El
ea32841dab alternator-test: test all attribute types
Although we planned to initially support only string types, it turns out
for the attributes (*not* the key), we actually support all types already,
including all scalar types (string, number, bool, binary and null) and
more complex types (list, nested document, and sets).

This adds a tests which PutItem's these types and verifies that we can
retrieve them.

Note that this test deals with top-level attributes only. There is no
attempt to modify only a nested attribute (and with the current code,
it wouldn't work).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:12:38 +03:00
Nadav Har'El
c645538061 alternator-test: rewrite ListTables test
In our tests, we cannot really assume that ListTables returns *only*
the tables we created for the test, or even that a page size of 100 will
be enough to list our 3 tables. The issue is that on a shared DynamoDB, or
in hypothetical cases where multiple tests are run in parallel, or previous
tests had catastrophic errors and failed to clean up, we have no idea how
many unrelated tables there are in the system. There may be hundreds of
them. So every ListTables test will need to use paging.

So in this re-implementation, we begin with a list_tables() utility function
which calls ListTables multiple times to fetch all tables, and return the
resulting list (we assume this list isn't so huge it becomes unreasonable
to hold it in memory). We then use this utility function to fetch the table
list with various page sizes, and check that the test tables we created are
listed in the resulting list.

There's no longer a separate test for "all" tables (really was a page of 100
tables) and smaller pages (1,2,3,4) - we now have just one test that does the
page sizes 1,2,3,4, 50 and 100.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:12:37 +03:00
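The list_tables() utility described above can be sketched as a pagination loop; the parameter names (`Limit`, `ExclusiveStartTableName`, `LastEvaluatedTableName`) follow the DynamoDB ListTables API, and `FakeClient` is a hypothetical stand-in for a real driver client:

```python
def list_all_tables(client, page_size=100):
    # Keep calling ListTables until no LastEvaluatedTableName is returned,
    # accumulating every page into one list.
    tables, start = [], None
    while True:
        kwargs = {"Limit": page_size}
        if start is not None:
            kwargs["ExclusiveStartTableName"] = start
        resp = client.list_tables(**kwargs)
        tables.extend(resp["TableNames"])
        start = resp.get("LastEvaluatedTableName")
        if start is None:
            return tables

class FakeClient:
    # Minimal in-memory stand-in implementing ListTables paging semantics.
    def __init__(self, names):
        self.names = sorted(names)
    def list_tables(self, Limit=100, ExclusiveStartTableName=None):
        i = 0
        if ExclusiveStartTableName is not None:
            i = self.names.index(ExclusiveStartTableName) + 1
        page = self.names[i:i + Limit]
        resp = {"TableNames": page}
        if i + Limit < len(self.names):
            resp["LastEvaluatedTableName"] = page[-1]
        return resp
```

The test then only has to assert that its own tables appear somewhere in the combined list, regardless of how many unrelated tables exist.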
Piotr Sarna
6b83e17b74 alternator: add tests to ListTables command
Test cases cover both listing appropriate table names
and pagination.
Message-Id: <e7d5f1e5cce10c86c47cdfb4d803149488935ec0.1557402320.git.sarna@scylladb.com>
2019-09-11 14:12:36 +03:00
Piotr Sarna
dfbf4ffe0f alternator-test: add 2 tables fixture
For some tests, more than 1 table is needed, so another fixture
that provided two additional test tables is added.
Message-Id: <75ae9de5cc1bca19594db1f0bc03260f83459380.1557402320.git.sarna@scylladb.com>
2019-09-11 14:12:35 +03:00
Piotr Sarna
b6dde25bcc alternator: implement ListTables
ListTables is used to extract all table names created so far.
Message-Id: <04f4d804a40ff08a38125f36351e56d7426d2e3d.1557402320.git.sarna@scylladb.com>
2019-09-11 14:10:54 +03:00
Piotr Sarna
b73a9f3744 alternator: use trace level for debug messages
In the early development stage, warn level was used for all
debug messages, while it's more appropriate to use 'trace' or 'debug'.
Message-Id: <419ca5a22bc356c6e47fce80b392403cefbee14d.1557402320.git.sarna@scylladb.com>
2019-09-11 14:10:02 +03:00
Nadav Har'El
4ed9aa4fb4 alternator-test: cleanup in conftest.py
This patch cleans up some comments and reorganizes some functions in
conftest.py, where the test_table fixture was defined. The goal is to
later add additional types of test tables with different schemas (e.g.,
just a partition key, different key types, etc.) without too much
code duplication.

This patch doesn't change anything functional in the tests, and they
still pass ("pytest --local" runs all tests against the local Alternator).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:10:01 +03:00
Nadav Har'El
5c564b7117 alternator: make ck_from_json() easier to use
The ck_from_json() utility function is easier to use if it handles
the no-clustering-key case itself, instead of requiring every caller
to handle that case separately.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:09:06 +03:00
Nadav Har'El
3ae0066aae alternator: add support for UpdateItem's DELETE operation
So far we supported UpdateItem only with PUT operations - this patch
adds support for DELETE operations, to delete specific attributes from
an item.

Only the case of a missing value is supported. DynamoDB also provides
the ability to pass the old value, and only perform the deletion if
the value and/or its type is still up-to-date - but we don't support
this yet and fail such a request if it is attempted.

This patch also includes a test for this case in alternator-test/

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:08:57 +03:00
Nadav Har'El
81679d7401 alternator-test: add tests for UpdateItem
Add initial tests for UpdateItem. Only the features currently supported
by our code (only string attributes, only "PUT" action) are tested.

As usual, this test (like all others) was tested to pass on both DynamoDB
and Alternator.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:03:10 +03:00
Nadav Har'El
0c2a440f7f alternator: add initial UpdateItem implementation
Add an initial UpdateItem implementation. As PutItem and GetItem we
are still limited to string attributes. This initial implementation
of UpdateItem implements only the "PUT" action (not "DELETE" and
certainly not "ADD") and not any of the more advanced options.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:03:00 +03:00
Piotr Sarna
686d1d9c3c alternator: add attrs_column() helper function
Message-Id: <d93ae70ccd27fe31d0bc6915a20d83d7a85342cf.1557223199.git.sarna@scylladb.com>
2019-09-11 13:08:52 +03:00
Piotr Sarna
6ad9b10317 alternator: make constant names more explicit
KEYSPACE and ATTRS constants refer to their names, not objects,
so they're named more explicitly.
Message-Id: <14b1f00d625e041985efbc4cbde192bd447cbf03.1557223199.git.sarna@scylladb.com>
2019-09-11 13:07:14 +03:00
Piotr Sarna
2975ca668c alternator: remove inaccessible return statement
Message-Id: <afaef20e7e110fa23271fb8c3dc40cec0716efb6.1557223199.git.sarna@scylladb.com>
2019-09-11 13:06:21 +03:00
Piotr Sarna
6e8db5ac6a alternator: inline keywords
It was decided that all alternator-specific keywords can be inlined
in code instead of defining them as constants.
Message-Id: <6dffb9527cfab2a28b8b95ac0ad614c18027f679.1557223199.git.sarna@scylladb.com>
2019-09-11 13:04:38 +03:00
Nadav Har'El
50a69174b3 alternator: some cleanups in validate_table_name()
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 13:03:44 +03:00
Nadav Har'El
0e06d82a1f alternator: clean up api_error() interface
All operation-generated error messages should have the 400 HTTP error
code. It's a real nag to have to type it every time. So make it the
default.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 13:01:47 +03:00
Nadav Har'El
0634629a79 alternator-test: test for error on creating an already-existing table
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 13:01:46 +03:00
Nadav Har'El
6fe6cf0074 alternator: correct error when trying to CreateTable an existing table
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 13:00:54 +03:00
Nadav Har'El
871dd7b908 alternator: fix return object from PutItem
Without special options, PutItem should return nothing (an empty
JSON result). Previously we had trouble doing this, because instead
of return an empty JSON result, we converted an empty string into
JSON :-) So the existing code had an ugly workaround which worked,
sort of, for the Python driver but not for the Java driver.

The correct fix, in this patch, is to invent a new type json_string
which is a string *already* in JSON and doesn't need further conversion,
so we can use it to return the empty result. PutItem now works from
YCSB's Java driver.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 13:00:47 +03:00
Nadav Har'El
ae1ee91f3c alternator-test: more examples in README.md
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:56:07 +03:00
Nadav Har'El
886438784c alternator-test: test table name limit of 222 bytes, instead of 255.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:56:06 +03:00
Nadav Har'El
28e7fa20ed alternator: limit table names to 222 bytes
Although we would like to allow table names up to 222 bytes, this is not
currently possible because Scylla tacks an additional 33 bytes on to create
a directory name, and directory names are limited to 255 bytes.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:55:07 +03:00
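The 222-byte cap follows from the arithmetic in the commit message: a 255-byte directory-name limit minus the 33 bytes Scylla appends. A sketch of the resulting check (constants from the message; function name hypothetical):

```python
DIR_NAME_LIMIT = 255    # typical filesystem limit on one directory-name component
SCYLLA_SUFFIX = 33      # bytes Scylla appends when building the table directory
MAX_TABLE_NAME = DIR_NAME_LIMIT - SCYLLA_SUFFIX   # = 222

def validate_table_name_length(name: str) -> None:
    # Measure in bytes, not characters, since the directory name is byte-limited.
    if len(name.encode("utf-8")) > MAX_TABLE_NAME:
        raise ValueError("ValidationException: table name too long")
```

This is why the matching test (next commit up) probes the 222-byte boundary rather than DynamoDB's own 255.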
Nadav Har'El
a702e5a727 alternator-test: verify appropriate error when invalid key type is used
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:55:06 +03:00
Nadav Har'El
8af58b0801 alternator: better key type parsing
The supported key types are just S(tring), B(lob), or N(umber).
Other types are valid for attributes, but not for keys, and should
not be accepted. And wrong types used should result in the appropriate
user-visible error.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:54:12 +03:00
Nadav Har'El
6cdcf5abac alternator-test: additional cases of invalid schemas in CreateTable
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:54:11 +03:00
Nadav Har'El
9839183157 alternator: better invalid schema detection for CreateTable
To be correct, CreateTable's input parsing needs to work in reverse from
what it did: first, the key columns are listed in KeySchema, and then
each of these (and potentially more, e.g., from indexes) needs to appear
in AttributeDefinitions.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:53:22 +03:00
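The reversed validation can be sketched as follows (field names `AttributeName`/`AttributeType`/`KeySchema` follow the DynamoDB CreateTable API; the helper itself is hypothetical): start from KeySchema and require each named key attribute to be defined with a key-capable type.

```python
def check_key_schema(key_schema, attribute_definitions):
    # Every attribute named in KeySchema must appear in AttributeDefinitions
    # with a type valid for keys: S(tring), B(inary) or N(umber).
    defined = {d["AttributeName"]: d["AttributeType"] for d in attribute_definitions}
    for key in key_schema:
        name = key["AttributeName"]
        if name not in defined:
            raise ValueError(f"ValidationException: key {name} not defined")
        if defined[name] not in ("S", "B", "N"):
            raise ValueError(f"ValidationException: bad key type {defined[name]}")
```

Extra entries in AttributeDefinitions are tolerated here, matching the note that indexes may define more attributes than the base key schema uses.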
Nadav Har'El
8bfbc1bae5 alternator-test: tests for CreateTable with bad schema
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:53:21 +03:00
Benny Halevy
0f01a4c1b8 dbuild: add usage
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-11 12:53:02 +03:00
Benny Halevy
f43bffdf9c dbuild: add help option
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-11 12:52:50 +03:00
Nadav Har'El
dc34c92899 alternator: better error handling for schema errors in CreateTable
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:52:31 +03:00
Nadav Har'El
77de0af40f alternator-test: test for PutItem to nonexistant table
We expect to see the right error code, not some "internal error".

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:52:30 +03:00
Nadav Har'El
ca3553c880 alternator: PutItem: appropriate error for a non-existant table
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:51:38 +03:00
Nadav Har'El
275a07cf10 alternator-test: add another column to test_basic_string_put_and_get()
Just to make sure our success isn't limited to just a single non-key
attribute, let's add another one.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:51:37 +03:00
Nadav Har'El
6ca72b5fed alternator: GetItem should by default returns all the columns, not none
The test

  pytest --local test_item.py::test_basic_string_put_and_get

Now passes.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:51:31 +03:00
Benny Halevy
c840c43fa7 dbuild: list available images when no image arg is given
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-11 12:51:26 +03:00
Nadav Har'El
9920143fb7 alternator: change empty return of PutItem
Without any arguments, PutItem should return no data at all. But somehow,
for reasons I don't understand, the boto3 driver gets confused from an
empty JSON thinking it isn't JSON at all. If we return a structure with
an empty "attributes" fields, boto3 is happy.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:49:20 +03:00
Nadav Har'El
8dec31d23b alternator: add initial implementation of DeleteTable
Add an initial implementation of DeleteTable, enough for making the

   pytest --local test_table.py::test_create_and_delete_table

Pass.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:45:42 +03:00
Nadav Har'El
41d4b88e78 alternator: on unknown operation, return standard API error
When given an unknown operation (we didn't implement yet many of them...)
we should throw the appropriate api_error, not some random exception.

This allows the client to understand the operation is not supported
and stop retrying - instead of retrying thinking this was a weird
internal error.

For example the test
   pytest --local test_table.py::test_create_and_delete_table

Now fails immediately, saying Unsupported operation DeleteTable.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:45:04 +03:00
Nadav Har'El
1b1921bc94 alternator: fix JSON in DescribeTable response
The structure's name in DescribeTable's output is supposed to be called
"Table", not "TableDescription". Putting it in the wrong place caused the
driver's table creation waiters to fail.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:44:14 +03:00
Nadav Har'El
6a455035ba alternator: validate table name in CreateTable
Validate the table name in CreateTable, and if it doesn't fit DynamoDB's
requirements, return the appropriate error as drivers expect.

With this patch, test_table.py::test_create_table_unsupported_names
now passes (albeit with a one-minute pause - this is a bug with keep-alive
support...).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:43:24 +03:00
Nadav Har'El
0da214c2fe alternator-test: test_create_table_unsupported_names minor fix
Check the expected error message to contain just ValidationException
instead of an overly specific text message from DynamoDB, so we aren't
so constrained in our own messages' wording.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:43:23 +03:00
Nadav Har'El
4f721a0637 alternator-test: test for creating table with very long name
Dynamo allows table names up to 255 characters, but when this is tested on
Alternator, the results are disastrous: mkdir with such a long directory
name fails, Scylla considers this an unrecoverable "I/O error", and exits
the server.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:43:22 +03:00
Nadav Har'El
6967dd3d8f test-table: test DescribeTable on non-existent table
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:43:21 +03:00
Nadav Har'El
d0cdc65b4c Add "--local" option to run test against local Scylla installation
For example "pytest --local test_item.py"

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:43:21 +03:00
Nadav Har'El
079c7c3737 test_item.py: basic string put and get test
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:43:20 +03:00
Nadav Har'El
4550f3024d test_table fixture: be quicker to realize table was created.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:43:19 +03:00
Nadav Har'El
f1f76ed17b test_table fixture: automatically delete
Automatically delete the test table when the test ends.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:43:18 +03:00
Nadav Har'El
a946e255c6 test_item.py: start testing CRUD operations
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:43:17 +03:00
Nadav Har'El
4d7d871930 Start to use "test fixtures"
Start to use "test fixtures" defined in conftest.py: The connection to
the DynamoDB API, and also temporary tables, can be reused between multiple
tests.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:43:16 +03:00
Nadav Har'El
6984ccf462 Add some table tests and README
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:43:15 +03:00
Nadav Har'El
f66ec337f7 alternator: very initial implementation of DescribeTable
This initial implementation is enough to pass a test of getting a
failure for a non-existent table -
test_table.py::test_describe_table_non_existent_table
and to recognize an existing table. But it's still missing a lot
of fields for an existing table (among others, the schema).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:41:32 +03:00
Nadav Har'El
ad9eb0a003 alternator: errors should be output from server as Dynamo drivers expect
Exceptions from the handlers need to be output in a certain way - as
a JSON with specific fields - as DynamoDB drivers expect them to be.
If a handler throws an alternator::api_error with these specific fields,
they are output, but any other exception is converted into the same
format as an "Internal Error".

After this patch, executor code can throw an alternator::api_error and
the client will receive this error in the right format.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:40:55 +03:00
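The shape of the error body DynamoDB drivers expect is roughly a JSON object with a type field and a message; a hedged sketch of the conversion described above (the field names and namespace prefix follow DynamoDB's wire format, and the function name is illustrative, not taken from this patch):

```python
import json

def error_response(error_type="InternalServerError",
                   message="Internal Error", http_code=400):
    """Render an exception as a DynamoDB-style JSON error body.

    Drivers dispatch on the "__type" field; any exception that is not
    an api_error collapses to the internal-error defaults above.
    """
    body = json.dumps({
        "__type": "com.amazonaws.dynamodb.v20120810#" + error_type,
        "message": message,
    })
    return http_code, body

code, body = error_response("ResourceNotFoundException",
                            "Requested resource not found", 400)
```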
Nadav Har'El
db49bc6141 alternator: add alternator::api_error exception type
DynamoDB error messages are returned in JSON format and expect specific
information: Some HTTP error code (often but not always 400), a string
error "type" and a user-readable message. Code that wants to return
user-visible exceptions should use this type, and in the next patch we
will translate it to the appropriate JSON string.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:39:26 +03:00
Nadav Har'El
9d72bc3167 alternator: table creation time is in seconds
The "Timestamp" type returned for CreationDateTime can be one of several
things but if it is a number, it is supposed to be the time in *seconds*
since the epoch - not in milliseconds. Returning milliseconds as we
wrongly did causes boto3 (AWS's Python driver) to throw a parse exception
on this response.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:38:41 +03:00
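The fix amounts to dividing the internal millisecond clock reading by 1000 before serializing CreationDateTime; a trivial sketch (function name is illustrative, not Scylla's):

```python
def creation_date_time(epoch_ms):
    """DynamoDB's numeric Timestamp is *seconds* since the epoch,
    so convert an internal milliseconds reading before returning it."""
    return epoch_ms / 1000.0
```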
Nadav Har'El
c0518183c2 alternator: require alternator-port configuration
Until now, we always opened the Alternator port along with Scylla's
regular ports (CQL etc.). This should really be made optional.

With this patch, by default Alternator does NOT start and does not
open a port. Run Scylla with --alternator-port=8000 to open an Alternator
API port on port 8000, as was the default until now. It's also possible
to set this in scylla.yaml.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:38:31 +03:00
Piotr Sarna
2ec78164bc alternator: add minimal HTTP interface
The interface works on port 8000 by default and provides
the most basic alternator operations - it's an incomplete
set without validation, meant to allow testing as early as possible.
2019-09-11 12:34:18 +03:00
Benny Halevy
443e0275ab dbuild: add --image option
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-11 11:46:33 +03:00
Tomasz Grabiec
06154569d5 gdb: Use sstable tracker to get the list of sstables 2019-09-10 17:05:19 +02:00
Tomasz Grabiec
a141d30eca gdb: Make intrusive_list recognize member_hook links
GDB now gives "struct boost::intrusive::member_hook" from template_arguments()
2019-09-10 17:05:19 +02:00
Tomasz Grabiec
c014c79d4b sstables: Track whether sstable was already open or not
Some sstable objects correspond to sstables which are being written
and are not sealed yet. Such sstables don't have all the fields
filled-in. Tools which calculate statistics (like GDB scripts) need to
distinguish such sstables.
2019-09-10 17:05:18 +02:00
Tomasz Grabiec
33bef82f6b sstables: Track all instances of sstable objects
Will make it easier to collect statistics about sstable in-memory metadata.
2019-09-10 17:05:16 +02:00
Tomasz Grabiec
fd74504e87 sstables: Make sstable object not movable
Will be easier to add non-movable fields.

We don't really need it to be movable, all instances should be managed
by a shared pointer.
2019-09-10 17:04:54 +02:00
Tomasz Grabiec
589c7476e0 sstables: Move constructor out of line 2019-09-10 17:04:54 +02:00
Tomasz Grabiec
785fe281e7 gdb: scylla sstables: Print table name
Message-Id: <1568121825-32008-1-git-send-email-tgrabiec@scylladb.com>
2019-09-10 16:36:21 +03:00
Glauber Costa
6651f96a70 sstables: do not keep sharding information from scylla metadata in memory (#4915)
There is no reason to keep parts of the Scylla Metadata component in memory
after it is read, parsed, and its information fed into the SSTable.

We have seen systems in which the Scylla metadata component is one
of the heaviest memory users, more than the Summary and Filter.

In particular, we use the token metadata, which is the largest part of the
Scylla component, to calculate a single integer -> the shards that are
responsible for this SSTable. Once we do that, we never use it again.

Tests: unit (release/debug) + manual scylla write load + reshard.

Fixes #4951

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2019-09-09 22:28:51 +03:00
Tomasz Grabiec
a09479e63c Merge "Validate position in partition monotonicity" from Benny
Introduce mutation_fragment_stream_validator class and use it as a
filter to flat_mutation_reader::consume_in_thread from
sstable::write_components to validate partition region and optionally
clustering key monotonicity.

Fixes #4803
2019-09-09 15:38:31 +02:00
Benny Halevy
42f6462837 config: enable_sstable_key_validation by default in debug build
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-09 15:30:59 +03:00
Benny Halevy
34d306b982 config: add enable_sstable_key_validation option
Key monotonicity validation incurs overhead to store the last key and compare against it,
therefore provide an option to enable/disable it (disabled by default).

Refs #4804

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-09 15:30:59 +03:00
Benny Halevy
507c99c011 mutation_fragment_stream_validator: add compare_keys flag
Storing and comparing keys is expensive.
Add a flag to enable/disable this feature (disabled by default).
Without the flag, only the partition region monotonicity is
validated, allowing repeated clustering rows, regardless of
clustering keys.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-09 15:30:59 +03:00
Benny Halevy
bc2ef1d409 mutation_fragment: declare partition_region operator<< in header file
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-09 15:30:59 +03:00
Benny Halevy
496467d0a2 sstables: writer: Validate input mutation fragment stream
Fixes #4803
Refs #4804

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-09 15:30:59 +03:00
Benny Halevy
a37acee68f position_in_partition: define operator=(position_in_partition_view)
The respective constructor is explicit.
Define this assignment operator to be used by flat_mutation_reader
mutation_fragment_stream_validator filter so that it can use
mutation_fragment::position() verbatim and keep its internal
state as position_in_partition.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-09 15:30:59 +03:00
Benny Halevy
41b60b8bc5 compaction: s/filter_func/make_partition_filter/
It expresses the purpose of this function better
as suggested by Tomasz Grabiec.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-09 15:30:59 +03:00
Benny Halevy
24c7320575 dbuild: run interactive shell by default
If not given any other args to run, just run an interactive shell.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190909113140.9130-1-bhalevy@scylladb.com>
2019-09-09 15:15:57 +03:00
Nadav Har'El
2543760ee6 docs/metrics.md: document additional "labels"
Recently we started to make more use of the concept of metric labels - several
metrics which share the same name, but differ in the value of some label
such as "group" (for different scheduling groups).

This patch documents this feature in docs/metrics.md, gives the example of
scheduling groups, and explains a couple more relevant Prometheus syntax
tricks.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190909113803.15383-1-nyh@scylladb.com>
2019-09-09 15:15:57 +03:00
Botond Dénes
59a96cd995 scylla-gdb.py: introduce scylla task-queues
This command provides an overview of the reactors task queues.
Example:
   id name                             shares  tasks
 A 00 "main"                           1000.00 4
   01 "atexit"                         1000.00 0
   02 "streaming"                       200.00 0
 A 03 "compaction"                      171.51 1
   04 "mem_compaction"                 1000.00 0
*A 05 "statement"                      1000.00 2
   06 "memtable"                          8.02 0
   07 "memtable_to_cache"               200.00 0

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190906060039.42301-1-bdenes@scylladb.com>
2019-09-09 15:15:57 +03:00
Avi Kivity
8e8975730d Update seastar submodule
* seastar cb7026c16f...b3fb4aaab3 (10):
  > Revert "scheduling groups: Adding per scheduling group data support"
  > scheduling groups: Adding per scheduling group data support
  > rpc: check that two servers are not created with the same streaming id
  > future: really ignore exceptions in ignore_ready_future
  > iostream: Constify eof() function
  > apply.hh: add missing #include for size_t
  > scheduling_group_demo: add explicit yields since future::get() no longer does
  > Fix buffer size used when calling accept4()
  > future-util: reduce allocations and continuations in parallel_for_each
  > rpc: lz4_decompressor: Add a static constexpr variable declaration for Cpp14 compatibility
2019-09-09 15:15:34 +03:00
Gleb Natapov
9e9f64d90e messaging_service: configure different streaming domain for each rpc server
A streaming domain identifies a server across shards. Each server should
have a different one.

Fixes: #4953

Message-Id: <20190908085327.GR21540@scylladb.com>
2019-09-08 14:05:40 +03:00
Piotr Sarna
01410c9770 transport: make sure returning connection errors happens inside the gate.
Previously, the gate could get
closed too early, which would result in shutting down the server
before it had an opportunity to respond to the client.

Refs #4818
2019-09-08 13:23:20 +03:00
Avi Kivity
5663218fac Merge "types: Fix decimal to integer and varint to integer conversion" from Rafael
"
The release notes for boost 1.67.0 includes:

Breaking Change: When converting a multiprecision integer to a narrower type, if the value is too large (or negative) to fit in the smaller type, then the result is either the maximum (or minimum) value of the target type.

Since we just moved out of boost 1.66, we have to update our code.

This fixes issue #4960
"

* 'espindola/fix-4960' of https://github.com/espindola/scylla:
  types: fix varint to integer conversion
  types: extract a from_varint_to_integer from make_castas_fctn_from_decimal_to_integer
  types: fix decimal to integer conversion
  types: extract helper for converting a decimal to a cppint
  types: rename and detemplate make_castas_fctn_from_decimal_to_integer
2019-09-08 10:45:42 +03:00
Avi Kivity
244218e483 Merge "simplify date type" from Rafael
"
With this patch series one has to be explicit to create a date_type_impl and now there is only the one documented difference between date_type_impl and timestamp_type_impl.
"

* 'espindola/simplify-date-type' of https://github.com/espindola/scylla:
  types: Reduce duplication around date_type_impl
  types: Don't use date_type_native_type when we want a timestamp
  types: Remove timestamp_native_type
  types: Don't specialize data_type_for for db_clock::time_point
  types: Make it harder to create date_type
2019-09-08 10:21:48 +03:00
Rafael Ávila de Espíndola
3bac4ebac7 types: Reduce duplication around date_type_impl
According to the comments, the only difference between date_type_impl
and timestamp_type_impl is the comparison function.

This patch makes that explicit by merging all code paths except:

* The warning when converting between the two
* The compare function

The date_type_impl type can still be user visible via very old
sstables or via the thrift protocol. It is not clear if we still need
to support either, but with this patch it is easy to do so.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-07 10:07:33 -07:00
Rafael Ávila de Espíndola
36d40b4858 types: Don't use date_type_native_type when we want a timestamp
In these cases it is pretty clear that the original code wanted to
create a timestamp_type data_value but was creating a date_type one
because of the old defaults.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-07 10:07:33 -07:00
Rafael Ávila de Espíndola
01cd21c04d types: Remove timestamp_native_type
Now that we know that anything expecting a date_type has been
converted to date_type_native_type, switch to using
db_clock::time_point when we want a timestamp_type.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-07 10:07:33 -07:00
Rafael Ávila de Espíndola
df6c2d1230 types: Don't specialize data_type_for for db_clock::time_point
This also moves every user to date_type_native_type. A followup patch
will convert to timestamp_type when appropriate.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-07 10:07:33 -07:00
Rafael Ávila de Espíndola
e09fa2dcff types: Make it harder to create date_type
date_type was replaced with timestamp_type, but it was very easy to
create a date_type instead of a timestamp_type by accident.

This patch changes the code so that a date_type is no longer
implicitly used when constructing a data_value. All existing code that
was depending on this is converted to explicitly using
date_type_native_type. A followup patch will convert to timestamp_type
when appropriate.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-07 10:07:33 -07:00
Gleb Natapov
f78b2c5588 transport: remove remaining cruft related to cql's server load balancing
Commit 7e3805ed3d removed the load balancing code from the cql
server, but it did not remove most of the cruft that load balancing
introduced. Most of the complexity (and probably the main reason the
code never worked properly) is around the service::client_state class, which
is copied before being passed to the request processor (because in the past
the processing could have happened on another shard) and then merged back
into the "master copy" because request processing may have changed it.

This commit removes all this copying. The client_request is passed as a
reference all the way to the lowest layer that needs it, and its copy
constructor is removed to make sure nobody copies it by mistake.

tests: dev, default c-s load of 3 node cluster

Message-Id: <20190906083050.GA21796@scylladb.com>
2019-09-07 18:17:53 +03:00
Avi Kivity
3b5aa13437 Merge "Optimize type find" from Rafael
"
This avoids a double dispatch on _kind and also removes a few shared_ptr copies.

The extra work was a small regression from the recent types refactoring.
"

* 'espindola/optimize_type_find' of https://github.com/espindola/scylla:
  types: optimize type find implementation
  types: Avoid shared_ptr copies
2019-09-07 18:14:36 +03:00
Gleb Natapov
5b9dc00916 test: fix query_processor_test::test_query_counters to use SERIAL consistency correctly
It is not possible to scan a table with SERIAL consistency; it can only be
used to read a single partition.

Message-Id: <20190905143023.GQ21540@scylladb.com>
2019-09-07 18:07:01 +03:00
Gleb Natapov
e52ebfb957 cql3: remove unused next_timestamp() function
next_timestamp() just calls get_timestamp() directly and nobody uses it
anyway.

Message-Id: <20190905101648.GO21540@scylladb.com>
2019-09-05 17:20:21 +03:00
Botond Dénes
783277fb02 stream_session: STREAM_MUTATION_FRAGMENTS: print errors in receive and distribute phase
Currently when an error happens during the receive and distribute phase
it is swallowed and we just return a -1 status to the remote. We only
log errors that happen during responding with the status. This means
that when streaming fails, we only know that something went wrong, but
the node on which the failure happened doesn't log anything.

Fix by also logging errors happening in the receive and distribute
phase. Also mention the phase in which the error happened in both error
log messages.

Refs: #4901
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190903115735.49915-1-bdenes@scylladb.com>
2019-09-05 13:43:00 +02:00
Rafael Ávila de Espíndola
dd81e94684 types: fix varint to integer conversion
The previous code was using the boost::multiprecision::cpp_int to
integer conversion, but that doesn't have the same semantics as cql
for signed numbers.

This fixes the dtest cql_cast_test.py:CQLCastTest.cast_varint_test.

Fixes #4960

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-04 15:08:14 -07:00
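The semantics mismatch behind this fix can be made concrete: since boost 1.67 a narrowing conversion saturates at the target type's min/max, while CQL casts are expected to behave like a two's-complement truncation. A hedged sketch of the wraparound behavior in Python (illustrative, not Scylla's C++ code):

```python
def wraparound_cast(value, bits):
    """Truncate an arbitrary-precision integer to a signed `bits`-wide
    integer with two's-complement wraparound (CQL-style cast semantics),
    rather than saturating at the type's min/max."""
    mask = (1 << bits) - 1
    low = value & mask            # keep only the low `bits` bits
    if low >= 1 << (bits - 1):    # reinterpret the top bit as the sign
        low -= 1 << bits
    return low
```

With saturation, casting 2**40 + 5 to a 32-bit int would yield INT32_MAX; with wraparound it yields 5, which is what the dtest expects.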
Rafael Ávila de Espíndola
263e18b625 types: extract a from_varint_to_integer from make_castas_fctn_from_decimal_to_integer
It will be used when converting varint to integer too.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-04 15:08:14 -07:00
Rafael Ávila de Espíndola
2d453b8e17 types: fix decimal to integer conversion
The previous code was using the boost::multiprecision::cpp_rational to
integer conversion, but that doesn't have the same semantics as cql.

This patch avoids creating a cpp_rational in the first place and works
just with integers.

This fixes the dtest cql_cast_test.py:CQLCastTest.cast_decimal_test.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-04 15:08:14 -07:00
Rafael Ávila de Espíndola
fb760774dd types: extract helper for converting a decimal to a cppint
It will also be used in the decimal to integer conversion.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-04 15:08:07 -07:00
Rafael Ávila de Espíndola
40e6882906 types: rename and detemplate make_castas_fctn_from_decimal_to_integer
It was only ever used for varint.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-04 14:54:47 -07:00
Avi Kivity
301246f6c0 storage_proxy: protect _view_update_handlers_list iterators from invalidation
on_down() iterates over _view_update_handlers_list, but it yields during iteration,
and while it yields, elements in that list can be removed, resulting in a
use-after-free.

Prevent this by registering iterators that can be potentially invalidated, and
any time we remove an element from the list, check whether we're removing an element
that is being pointed to by a live iterator. If that is the case, advance the iterator
so that it points at a valid element (or at the end of the list).

Fixes #4912.

Tests: unit (dev)
2019-09-04 17:19:28 +03:00
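The protection scheme described in this commit can be modeled in miniature: keep a registry of live cursors and fix up any cursor affected by a removal, so a traversal that yields (blocks) mid-way never dereferences a freed element. A toy index-based sketch in Python (not Scylla's intrusive-list implementation):

```python
class SafeList:
    """A list that tracks live iteration cursors and repairs any cursor
    affected by a removal, so iteration that yields mid-way never
    points at a deleted element."""

    def __init__(self, items):
        self.items = list(items)
        self.cursors = []          # live iteration positions (indices)

    def register(self):
        """Start a tracked iteration; returns a cursor id."""
        self.cursors.append(0)
        return len(self.cursors) - 1

    def remove_at(self, idx):
        del self.items[idx]
        for i, pos in enumerate(self.cursors):
            # A cursor past the removed slot shifts back by one; a cursor
            # AT the removed slot now refers to the next element (the
            # analogue of advancing the invalidated iterator).
            if pos > idx:
                self.cursors[i] = pos - 1
```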
Tomasz Grabiec
9f5826fd4b Merge "Use canonical mutations for background schema sync" from Botond
Currently the background schema sync (push/pull) uses frozen mutation to
send the schema mutations over the wire to the remote node. For this to
work correctly, both nodes have to have the exact same schema for the
system schema tables, as attempting to unpack the frozen mutation with
the wrong schema leads to undefined behaviour.
To avoid this and to ensure syncing schema between nodes with different
schema table schema versions is defined we migrate the background
schema sync to use canonical mutations for the transfer of the schema
mutations. Canonical mutations are immune to this problem, as they
support deserializing with any version of the schema, older or newer
one.

The foreground schema sync mechanisms -- the on-demand schema pulls on
reads and writes -- already use canonical mutations to transmit the
schema mutations.

It is important to note that due to this change, column-level
incompatibilities between the schema mutations and the schema used to
deserialize them will be hidden. This is undesired and should be fixed
in a follow-up (#4956). Table level incompatibilities are detected and
schema mutations containing them will be rejected just like before.

This patch adds canonical mutation support to the two background schema
sync verbs:
* `DEFINITIONS_UPDATE` (schema push)
* `MIGRATION_REQUEST` (schema pull)

Both verbs still support the old frozen mutation schema transfer, albeit
that path is now much less efficient. After all nodes are upgraded, the
pull verb can effectively avoid sending frozen mutations altogether,
completely migrating to canonical mutations. Unfortunately this was not
possible for the push verb, so that one now has an overhead as it needs
to send both the frozen and canonical mutations.

Fixes: #4273
2019-09-04 13:58:14 +02:00
Benny Halevy
bc29520eb8 flat_mutation_reader: consume_in_thread: add mutation_filter
For validating mutation_fragment's monotonicity.

Note: forwarding constructor allows implicit conversion by
current callers.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-04 13:42:37 +03:00
Rafael Ávila de Espíndola
000514e7cc sstable: close file_writer if an exception is thrown
The previous code was not exception safe and would eventually cause a
file to be destroyed without being closed, causing an assert failure.

Unfortunately it doesn't seem to be possible to test this without
error injection, since using an invalid directory fails before this
code is executed.

Fixes #4948

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190904002314.79591-1-espindola@scylladb.com>
2019-09-04 13:28:55 +03:00
Botond Dénes
7adc764b6e messaging_service: add canonical_support to schema pull and push verbs
The verbs are:
* DEFINITIONS_UPDATE (push)
* MIGRATION_REQUEST (pull)

Support was added in a backward-compatible way. The push verb sends
both the old frozen mutation parameter and the new optional canonical
mutation parameter. It is expected that new nodes will use the latter,
while old nodes will fall-back to the former. The pull verb has a new
optional `options` parameter, which for now contains a single flag:
`remote_supports_canonical_mutation_retval`. This flag, if set, means
that the remote node supports the new canonical mutation return value,
thus the old frozen mutations return value can be left empty.
2019-09-04 10:32:44 +03:00
Botond Dénes
d9a8ff15d8 service::migration_manager: add canonical_mutation merge_schema_from() overload
Add an overload which takes a vector of canonical mutations. Going
forward, this is the overload to use.
2019-09-04 10:32:44 +03:00
Botond Dénes
e02b93cae1 schema_tables: convert_schema_to_mutations: return canonical_mutations
In preparation for the schema push/pull migration to canonical
mutations, convert the method producing the schema mutations to return a
vector of canonical mutations. The only user, MIGRATION_REQUEST verb,
converts the canonical mutations back to frozen mutations. This is very
inefficient, but this path will only be used in mixed clusters. After
all nodes are upgraded the verb will be sending the canonical mutations
directly instead.
2019-09-04 08:47:20 +03:00
Rafael Ávila de Espíndola
b100f95adc types: optimize type find implementation
This turns find into a template so there is only one switch over the
kind of each type in the search.

To evaluate the change in code size, I added [[noinline]] to
find and obtained the following results.

The release column in the before case has extra entries
because the functions are sufficiently complex to trigger gcc to split
them into hot + cold parts.

before:
                      dev                         release (hot + cold split)
find                  0x35f               = 863   0x3d5 + 0x112               = 1255
references_duration   0x62 + 0x22 + 0x8   = 140   0x55 + 0x1f + 0x2a + 0x8    = 166
references_user_type  0x6b + 0x26 + 0x111 = 418   0x65 + 0x1f + 0x32 + 0x11b  = 465

after:
                      dev                          release
find                  0xd6 + 0x1b4        = 650    0xd2 + 0x1f5               = 711
references_duration   0x13                = 19     0x13                       = 19
references_user_type  0x1a                = 26     0x21                       = 33

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-03 08:23:21 -07:00
Rafael Ávila de Espíndola
e0065b414e types: Avoid shared_ptr copies
They are somewhat expensive (in code size at least) and not needed
everywhere.

Inside the getter the variables are 'const data_type&', so we can
return that. Everything still works when a copy is needed, but in code
that just wants to check a property we avoid the copy.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-09-03 07:43:35 -07:00
Benny Halevy
bdfb73f67d scripts/create-relocatable-package: ldd: print executable name in exception
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190903080511.534-1-bhalevy@scylladb.com>
2019-09-03 15:34:38 +03:00
Avi Kivity
294a86122e Merge "nonroot installer" from Takuya
"
This is nonroot installer patchset v9.
"

* 'nonroot_v9' of https://github.com/syuu1228/scylla:
  dist/common/scripts: support nonroot mode on setup scripts
  reloc/python3: add install.sh on python relocatable package
  install.sh: add --nonroot mode
  dist/common/systemd: untemplatize *.service, use drop-in units instead
  dist/debian: delete debian/*.install, debian/*.dirs
2019-09-03 15:33:20 +03:00
Piotr Sarna
7b297865e1 transport: wait for the connections to finish when stopping (#4818)
During CQL request processing, a gate is used to ensure that
the connection is not shut down until all ongoing requests
are done. However, the gate might have been left too early
if the database was not ready to respond immediately - which
could result in trying to respond to an already closed connection
later. This issue is solved by postponing leaving the gate
until the continuation chain that handles the request is finished.

Refs #4808
2019-09-03 14:49:11 +03:00
Avi Kivity
8fb59915bb Merge "Minor cleanup patches for sstables" from Asias
* 'cleanup_sstables' of https://github.com/asias/scylla:
  sstables: Move leveled_compaction_strategy implementation to source file
  sstables: Include dht/i_partitioner.hh for dht::partition_range
2019-09-03 14:47:44 +03:00
Takuya ASADA
31ddb2145a dist/common/scripts: support nonroot mode on setup scripts
Since nonroot mode requires everything to run as a non-privileged user,
most of the setup scripts cannot be used in nonroot mode.
We only provide the following functions in nonroot mode:
 - EC2 check
 - IO setup
 - Node exporter installer
 - Dev mode setup
The rest of the functions will be skipped by scylla_setup.
To implement nonroot mode in the setup scripts, scylla_util provides
utility functions that abstract the differences in directory structure between
a normal installation and nonroot mode.
2019-09-03 20:06:35 +09:00
Takuya ASADA
cfa8885ae1 reloc/python3: add install.sh on python relocatable package
To support nonroot installation of scylla-python3, add install.sh to the
scylla-python3 relocatable package.
2019-09-03 20:06:30 +09:00
Takuya ASADA
2de14e0800 install.sh: add --nonroot mode
This implements a way to install Scylla without requiring root privileges;
it is not distribution dependent and does not use a package manager.
2019-09-03 20:06:24 +09:00
Takuya ASADA
cde798dba5 dist/common/systemd: untemplatize *.service, use drop-in units instead
Since systemd units can override parameters using drop-in units, we don't need
mustache templates for them.

Also, drop the --disttype and --target options from install.sh since they are not
required anymore, and introduce --sysconfdir instead for non-redhat distributions.
2019-09-03 20:06:15 +09:00
Takuya ASADA
49a360f234 dist/debian: delete debian/*.install, debian/*.dirs
Since ac9b115, we switched to install.sh on Debian so we don't rely on .deb
specific packaging scripts anymore.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2019-09-03 20:06:09 +09:00
Benny Halevy
7827e3f11d tests: test_large_data: do not stop database
Now that compaction returns only after the compacted sstables are
deleted, we no longer need to stop the database to force waiting
for deletes (that were previously done asynchronously).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-02 12:15:38 +03:00
Benny Halevy
19b67d82c9 table::on_compaction_completion: fix indentation
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-02 12:15:38 +03:00
Benny Halevy
8dd6e13468 table::on_compaction_completion: wait for background deletes
Don't let background deletes accumulate uncontrollably.

Fixes #4909

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-02 12:15:38 +03:00
Benny Halevy
da6645dc2c table: refresh_snapshot before deleting any sstables
The row cache must not hold references to any sstable we're
about to delete.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-09-02 12:15:29 +03:00
Nadav Har'El
6c4ad93296 api/compaction_manager: do not hold map on the stack
Merged patch series by Amnon Heiman:

This patch fixes a bug that a map is held on the stack and then is used
by a future.

Instead, the map is now moved to the relevant lambda function.

Fixes #4824
2019-09-01 13:16:34 +03:00
Avi Kivity
e962beea20 toolchain: update to Fedora 30 and gcc 9.2
In Fedora 30 we have a new boost version, so we no longer need to
use our patched boost, and we can also remove the scylladb/toolchain copr.
2019-09-01 12:05:26 +03:00
Piotr Sarna
23c891923e main: make sure view_builder doesn't propagate semaphore errors
Stopping services which occurs in a destructor of deferred_action
should not throw, or it will end the program with
terminate(). View builder breaks a semaphore during its shutdown,
which results in propagating a broken_semaphore exception,
which in turn results in throwing an exception during stop().get().
In order to fix that issue, semaphore exceptions are explicitly
ignored, since they're expected to appear during shutdown.

Fixes #4875
2019-09-01 11:59:57 +03:00
Tomasz Grabiec
c8f8a9450f Merge "Improve cpu instruction set support checks" from Avi
To prevent termination with SIGILL, tighten the instruction set
support checks. First, check for CLMUL too. Second, add a check in
scylla_prepare to catch the problem early.

Fixes #4921.
2019-08-30 16:54:04 +02:00
Avi Kivity
07010af44c scylla_prepare: verify processor satisfies instruction set requirements
Scylla requires the CLMUL and SSE 4.2 instruction sets and will fail without them.
There is a check in main(), but that happens after the code is running and it may
already be too late. Add a check in scylla_prepare which runs before the main
executable.
2019-08-29 15:34:29 +03:00
Avi Kivity
9579946e72 main: extend CPU feature check to verify that PCLMUL is available
Since 79136e895f, we use the pclmul instruction set,
so check it is there.
2019-08-29 15:13:32 +03:00
Gleb Natapov
e61a86bbb2 to_string: Add operator<< overload for std::tuple.
Message-Id: <20190829100902.GN21540@scylladb.com>
2019-08-29 13:35:02 +03:00
Rafael Ávila de Espíndola
036f51927c sstables: Remove unused include
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190827210424.37848-1-espindola@scylladb.com>
2019-08-28 11:32:44 +03:00
Benny Halevy
869b518dca sstables: auto-delete unsealed sstables
Fixes #4807

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190827082044.27223-1-bhalevy@scylladb.com>
2019-08-28 09:46:17 +03:00
Botond Dénes
969aa22d51 configure.py: promote unused result warning to error
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190827111428.6829-2-bdenes@scylladb.com>
2019-08-28 09:46:17 +03:00
Botond Dénes
480b42b84f tests/gossip_test: silence discarded future warning
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190827111428.6829-1-bdenes@scylladb.com>
2019-08-28 09:46:17 +03:00
Avi Kivity
d85339e734 Update seastar submodule
* seastar 20bfd61955...cb7026c16f (2):
  > net: dpdk: suppress discarded future warning
  > Merge "Optimize promises in then/then_wrapped" from Rafael
2019-08-28 09:46:17 +03:00
Avi Kivity
f1d73d0c13 Merge "systemd: put scylla processes in systemd slices. #4743" from Glauber
"
It is well known that seastar applications, like Scylla, do not play
well with external processes: CPU usage from external processes may
confuse the I/O and CPU schedulers and create stalls.

We have also recently seen that memory usage from other applications'
anonymous and page cache memory can bring the system to OOM.

Linux has a very good infrastructure for resource control contributed by
amazingly bright engineers in the form of cgroup controllers. This
infrastructure is exposed by SystemD in the form of slices: a
hierarchical structure to which controllers can be attached.

In true systemd way, the hierarchy is implicit in the filenames of the
slice files. A "-" symbol defines the hierarchy, so the files that this
patch presents, scylla-server and scylla-helper, essentially create a
"scylla" cgroup at the top level with "server" and "helper" children.

Later we mark the Services needed to run scylla as belonging to one
or the other through the Slice= directive.

Scylla DBAs can benefit from this setup by using the systemd-run
utility to fire ad-hoc commands.

Let's say for example that someone wants to hypothetically run a backup
and transfer files to an external object store like S3, making sure that
the amount of page cache used won't create swap pressure leading to
database timeouts.

One can then run something like:

sudo systemd-run --uid=`id -u scylla` --gid=`id -g scylla` -t --slice=scylla-helper.slice /path/to/my/magical_backup_tool

(or even better, the backup tool can itself be a systemd timer)
"

* 'slices' of https://github.com/glommer/scylla:
  systemd: put scylla processes in systemd slices.
  move postinst steps to an external script
2019-08-26 20:16:55 +03:00
Benny Halevy
20083be9f6 sstables: delete_atomically: fix misplaced parenthesis in pending_delete_log warning message
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190818064637.9207-1-bhalevy@scylladb.com>
2019-08-26 19:50:21 +03:00
Avi Kivity
b9e9d7d379 Merge "Resolve discarded future warnings" from Botond
"
The warning for discarded futures will only become useful, once we can
silence all present warnings and flip the flag to make it become error.
Then it will start being useful in finding new, accidental discarding of
futures.
This series silences all remaining warnings in the Scylla codebase. For
those cases where it was obvious that the future is discarded on
purpose, the author taking all necessary precaution (handling exception)
the warning was simply silenced by casting the future to void and
adding a relevant comment. Where the discarding seems to have been done
in error, I have fixed the code to not discard it. To the rest of the
sites I added a FIXME to fix the discarding.
"

* 'resolve-discarded-future-warnings/v4.2' of https://github.com/denesb/scylla:
  treewide: silence discarded future warnings for questionable discards
  treewide: silence discarded future warnings for legit discards
  tests: silence discarded future warnings
  tests/cql_query_test.cc: convert some tests to thread
2019-08-26 19:40:25 +03:00
Botond Dénes
136fc856c5 treewide: silence discarded future warnings for questionable discards
This patch silences the remaining discarded future warnings, those
where it cannot be determined with reasonable confidence that this was
indeed the actual intent of the author, or that the discarding of the
future could lead to problems. For all those places a FIXME is added,
with the intent that these will soon be followed up with an actual fix.
I deliberately haven't fixed any of these, even if the fix seems
trivial. It is too easy to overlook a bad fix mixed in with so many
mechanical changes.
2019-08-26 19:28:43 +03:00
Botond Dénes
fddd9a88dd treewide: silence discarded future warnings for legit discards
This patch silences those future discard warnings where it is clear that
discarding the future was actually the intent of the original author,
*and* they did the necessary precautions (handling errors). The patch
also adds some trivial error handling (logging the error) in some
places, which were lacking this, but otherwise look ok. No functional
changes.
2019-08-26 18:54:44 +03:00
Botond Dénes
cff4c4932d tests: silence discarded future warnings 2019-08-26 18:54:44 +03:00
Botond Dénes
486fa8c10c tests/cql_query_test.cc: convert some tests to thread
Some tests are currently discarding futures unjustifiably; however,
adding code to wait on these futures is quite inconvenient due to the
continuation style code of these tests. Convert them to run in a seastar
thread to make the fix easier.
2019-08-26 18:54:44 +03:00
Tomasz Grabiec
ac5ff4994a service: Announce the new schema version when features are enabled
Introduced in c96ee98.

We call update_schema_version() after features are enabled and we
recalculate the schema version. This method does not update gossip,
though. The node will still use its database::version() to decide on
syncing, so it will not sync and stay inconsistent in gossip until the
next schema change.

We should call update_schema_version_and_announce() instead so that
the gossip state is also updated.

There is no actual schema inconsistency, but the joining node will
think there is and will wait indefinitely. Making a random schema
change would unblock it.

Fixes #4647.

Message-Id: <1566825684-18000-1-git-send-email-tgrabiec@scylladb.com>
2019-08-26 17:54:59 +03:00
Avi Kivity
a7b82af4c3 Update seastar submodule
* seastar afc5bbf511...20bfd61955 (18):
  > reactor: closing file used to check if direct_io is supported
  > future: set_coroutine(): s/state()/_state/
  > tests/perf/perf_test.hh: suppress discarded future warning
  > tests: rpc: fix memory leak in timeout wraparound tests
  > Revert "future-util: reduce allocations and continuations in parallel_for_each"
  > reactor: fix rename_priority_class() build failure in C++14 mode
  > future: mark future_state_base::failed() as unlikely
  > future-util: reduce allocations and continuations in parallel_for_each
  > future-utils: generalize when_all_estimate_vector_capacity()
  > output_stream: Add comment on sequentiality
  > docs/tutorial.md: minor cleanups in first section
  > core: fix a race in execution stages (Fixes #4856, fixes #4766)
  > semaphore: use semaphore's clock type in with_semaphore()/get_units()
  > future: fix doxygen documentation for promise<>
  > sharded: fixed detecting stop method when building with clang
  > reactor: fixed locking error in rename_priority_class
  > Assert that append_challenged_posix_file_impl are closed.
  > rpc: correctly handle huge timeouts
2019-08-26 15:37:58 +03:00
Asias He
3ea1255020 storage_service: Use sleep_abortable instead of sleep (#4697)
Make the sleep abortable so that it is able to break the loop during
shutdown.

Fixes #4885
2019-08-26 13:35:44 +03:00
Asias He
2f24fd9106 sstables: Move leveled_compaction_strategy implementation to source file
It is better than putting everything in the header.
2019-08-26 16:49:48 +08:00
Asias He
b69138c4e4 sstables: Include dht/i_partitioner.hh for dht::partition_range
Get rid of one FIXME.
2019-08-26 16:35:18 +08:00
Nadav Har'El
b60d201a11 API: column_family.cc Add get_built_indexes implementation
Merged patch series from Amnon Heiman amnon@scylladb.com

This patch adds an implementation of the get built indexes API and removes a
FIXME.

The API returns the list of secondary indexes that belong to a column family
and have already been fully built.

Example:

CREATE KEYSPACE scylla_demo WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};

CREATE TABLE scylla_demo.mytableID ( uid uuid, text text, time timeuuid, PRIMARY KEY (uid, time) );
CREATE index on scylla_demo.mytableID (time);

$ curl -X GET 'http://localhost:10000/column_family/built_indexes/scylla_demo%3Amytableid'
["mytableid_time_idx"]
2019-08-25 18:37:44 +03:00
Amnon Heiman
2d3185fa7d column_family.cc: remove unhandled future
The sum_ratio struct is a helper struct that is used when calculating
ratio over multiple shards.

Originally it was created thinking that it may need to use a future; in
practice it was never used and the future was ignored.

This patch removes the future from the implementation and eliminates an
unhandled-future warning from the compilation.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2019-08-25 16:51:14 +03:00
Amnon Heiman
21dee3d8ef API: column_family.cc Add get_build_index implementation
This patch adds an implementation of the get build index API and removes a
FIXME.

The API returns the list of built secondary indexes that belong to a column family.

Example:

CREATE KEYSPACE scylla_demo WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};

CREATE TABLE scylla_demo.mytableID ( uid uuid, text text, time timeuuid, PRIMARY KEY (uid, time) );
CREATE index on scylla_demo.mytableID (time);

$ curl -X GET 'http://localhost:10000/column_family/built_indexes/scylla_demo%3Amytableid'
["mytableid_time_idx"]

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2019-08-25 16:46:49 +03:00
Juliana Oliveira
711ed76c82 auth: standard_role_manager: read null columns as false
When a role is created through the `create role` statement, the
'is_superuser' and 'can_login' columns are set to false by default.
Likewise, `list roles`, `alter roles` and `* roles` operations
expect to find a boolean when reading the same columns.

This is not the case, though, when a user directly inserts to
`system_auth.roles` and doesn't set those columns. Even though
manually creating roles is not a desired day-to-day operation,
it is an insert just like any other and it should work.

`* roles` operations, on the other hand, are not prepared for
these deviations. If a user manually creates a role and doesn't
set boolean values for those columns, `* roles` will return all
sorts of errors. This happens because `* roles` explicitly
expects a boolean and casts for it.

This patch makes `* roles` friendlier by treating the boolean
variable as `false` - inside the `* roles` context - when the
actual value is `null`; the stored `null` value itself is unchanged.

Fixes #4280

Signed-off-by: Juliana Oliveira <juliana@scylladb.com>
Message-Id: <20190816032617.61680-1-juliana@scylladb.com>
2019-08-25 11:52:43 +03:00
Pekka Enberg
118a141f5d scylla_blocktune.py: Kill btrfs related FIXME
The scylla_blocktune.py has a FIXME for btrfs from 2016, which is no
longer relevant for Scylla deployments, as Red Hat dropped support for
the file system in 2017.

Message-Id: <20190823114013.31112-1-penberg@scylladb.com>
2019-08-24 20:40:08 +03:00
Botond Dénes
18581cfb76 multishard_mutation_query: create_reader(): use the caller's priority class
The priority class the shard reader was created with was hardcoded to be
`service::get_local_sstable_query_read_priority()`. At the time this
code was written, priority classes could not be passed to other shards,
so this method, receiving its priority class parameters from another
shard, could not use it. This is now fixed, so we can just use whatever
the caller wants us to use.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190823115111.68711-1-bdenes@scylladb.com>
2019-08-23 16:10:43 +02:00
Tomasz Grabiec
080989d296 Merge "cql3: cartesian product limits" from Avi
Cartesian products (generated by IN restrictions) can grow very large,
even for short queries. This can overwhelm server resources.

Add limit checking for cartesian products, and configuration items for
users that are not satisfied with the default of 100 records fetched.

Fixes #4752.

Tests: unit (dev), manual test with SIGHUP.
2019-08-21 19:35:59 +02:00
Avi Kivity
67b0d379e0 main: add glue between db::config and cql3::cql_config
Copy values between the flat db::config and the hierarchical cql_config, adding
observers to keep the values updated.
2019-08-21 19:35:59 +02:00
Avi Kivity
8c7ad1d4cd cql: single_column_clustering_key_restrictions: limit cartesian products
Cartesian products (via IN restrictions) make it easy to generate huge
primary key sets with simple queries, overflowing server resources. Limit
them in the coordinator and report an exception instead of trying to
execute a query that would consume all of our memory.

A unit test is added.
2019-08-21 19:35:59 +02:00
Avi Kivity
3a44fa9988 cql3, treewide: introduce empty cql3::cql_config class and propagate it
We need a way to configure the cql interpreter and runtime. So far we relied
on accessing the configuration class via various backdoors, but that causes
its own problems around initialization order and testability. To avoid that,
this patch adds an empty cql_config class and propagates it from main.cc
(and from tests) to the cql interpreter via the query_options class, which is
already passed everywhere.

Later patches will fill it with contents.
2019-08-21 19:35:59 +02:00
Rafael Ávila de Espíndola
86c29256eb types: Fix references_user_type
This was broken since the type refactoring. It was checking the static
type, which is always abstract_type. Unfortunately we only had dtests
for this.

This can probably be optimized to avoid the double switch over kind,
but it is probably better to do the simple fix first.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190821155354.47704-1-espindola@scylladb.com>
2019-08-21 19:13:59 +03:00
Dejan Mircevski
ea9d358df9 cql3: Optimize LIKE regex construction
Currently we create a regex from the LIKE pattern for every row
considered during filtering, even though the pattern is always the
same.  This is wasteful, especially since we require costly
optimization in the regex compiler.  Fix it by reusing the regex
whenever the pattern is unchanged since the last call.

Tests: unit (dev)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-08-21 16:45:47 +03:00
Piotr Sarna
526f4c42aa storage_proxy: fix iterator liveness issue in on_down (#4876)
The loop over view update handlers used a guard in order to ensure
that the object is not prematurely destroyed (thus invalidating
the iterator), but the guard itself was not in the right scope.
Fixed by replacing the 'for' loop with a 'while' loop, which moves
the iterator incrementation inside the scope in which it's still
guarded and valid.

Fixes #4866
2019-08-21 15:44:43 +03:00
Avi Kivity
4ef7429c4a build: build seastar in build directory
Currently, seastar is built in seastar/build/{mode}. This means we have two build
directories: build/{mode} and seastar/build/{mode}.

This patch changes that to have only a single build directory (build/{mode}). It
does that by calling Seastar's cmake directly instead of through Seastar's
./configure.py.  However, to support dpdk, if that is enabled it calls cmake
through Seastar's ./cooking.sh (similar to what Seastar's ./configure.py does).

All ./configure.py flags are translated to cmake variables, in the same way that
Seastar does.

Contains fix from Rafael to pass the flags for the correct mode.
2019-08-21 13:10:17 +02:00
Rafael Ávila de Espíndola
278b6abb2b Improve documentation on the system.large_* tables
This clarifies that "rows" are clustering rows and that there is no
information about individual collection elements.

The patch also documents some properties common to all these tables.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190820171204.48739-1-espindola@scylladb.com>
2019-08-21 10:36:25 +03:00
Vlad Zolotarov
d253846c91 hinted handoff: fix a race on a directory removal between space_watchdog and drain_for()
The endpoint directories scanned by space_watchdog may get deleted
by the manager::drain_for().

If a deleted directory is given to a lister::scan_dir() this will end up
in an exception and as a result a space_watchdog will skip this round
and hinted handoff is going to be disabled (for all agents including MVs)
for the whole space_watchdog round.

Let's make sure this doesn't happen by serializing the scanning and deletion
using end_point_hints_manager::file_update_mutex.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2019-08-20 11:46:46 -04:00
Vlad Zolotarov
b34c36baa2 hinted handoff: make taking file_update_mutex safe
end_point_hints_manager::file_update_mutex is taken by space_watchdog
but while space_watchdog is waiting for it the corresponding
end_point_hints_manager instance may get destroyed by manager::drain_for()
or by manager::stop().

This will end up in a use-after-free event.

Let's change the end_point_hints_manager's API in a way that would prevent
such an unsafe locking:

   - Introduce the with_file_update_mutex().
   - Make end_point_hints_manager::file_update_mutex() method private.

Fixes #4685
Fixes #4836

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2019-08-20 11:26:19 -04:00
Vlad Zolotarov
dbad9fcc7d db::hints::manager::drain_for(): fix alignment
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2019-08-20 10:58:36 -04:00
Vlad Zolotarov
7a12b46fc9 db::hints::manager: serialize calls to drain_for()
If drain_for() is running together with itself: one instance for the local
node and one for some other node, erasing of elements from the _ep_managers
map may lead to a use-after-free event.

Let's serialize drain_for() calls with a semaphore.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2019-08-20 10:58:36 -04:00
Vlad Zolotarov
09600f1779 db::hints: cosmetics: indentation and missing method qualifier
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2019-08-20 10:58:36 -04:00
Avi Kivity
698b72b501 relocatable: switch from run-time relocation to install-time relocation
Our current relocation works by invoking the dynamic linker with the
executable as an argument. This confuses gdb since the kernel records
the dynamic linker as the executable, not the real executable.

Switch to install-time relocation with patchelf: when installing the
executable and libraries, all paths are known, and we can update the
path to the dynamic loader and to the dynamic libraries.

Since patchelf itself is dynamically linked, we have to relocate it
dynamically (with the old method of invoking it via the dynamic linker).
This is okay since it's a one-time operation and since we don't expect
to debug core dumps of patchelf crashes.

We lose the ability to run scylla directly from the uninstalled
tarball, but since the nonroot installer is already moving in the
direction of requiring install.sh, that is not a great loss, and
certainly the ability to debug is more important.

dh_strip barfs on some binaries which were treated with patchelf,
so exclude them from dh_strip. This doesn't lose any functionality,
since these binaries didn't have debug information to begin with
(they are already-stripped Fedora executables).

Fixes #4673.
2019-08-20 00:25:43 +02:00
Botond Dénes
4cb873abfe query::trim_clustering_row_ranges_to(): fix handling of non-full prefix keys
Non-full prefix keys are currently not handled correctly as all keys
are treated as if they were full prefixes, and therefore they represent
a point in the key space. Non-full prefixes however represent a
sub-range of the key space and therefore require null extending before
they can be treated as a point.
As a quick reminder, `key` is used to trim the clustering ranges such
that they only cover positions >= the key. Thus,
`trim_clustering_row_ranges_to()` does the equivalent of intersecting
each range with (key, inf). When `key` is a prefix, this would exclude
all positions that are prefixed by key as well, which is not desired.

Fixes: #4839
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190819134950.33406-1-bdenes@scylladb.com>
2019-08-20 00:24:51 +02:00
Avi Kivity
21d6f0bb16 Merge "Add LIKE test cases for all non-string types #4859" from Dejan
"
Follow-up to #4610, where a review comment asked for test coverage on all types. Existing tests cover all the types admissible in LIKE, while this PR adds coverage for all inadmissible types.

Tests: unit (dev)
"

* 'like-nonstring' of https://github.com/dekimir/scylla:
  cql_query_test: Add LIKE tests for all types
  cql_query_test: Remove LIKE-nonstring-pattern case
  cql_query_test: Move a testcase elsewhere in file
2019-08-20 00:24:51 +02:00
Tomasz Grabiec
6813ae22b0 Merge "Handle termination signals during streaming" from Avi
In b197924, we changed the shutdown process not to rely on the global
reactor-defined exit, but instead added a local variable to hold the
shutdown state. However, we did not propagate that state everywhere,
and now streaming processes are not able to abort.

Fix that by enhancing stop_signal with a sharded<abort_source> member
that can be propagated to services. Propagate it to storage_service
and thence to boot_strapper and range_streamer so that streaming
processes can be aborted.

Fixes #4674
Fixes #4501

Tests: unit (dev), manual bootstrap test
2019-08-20 00:24:51 +02:00
Avi Kivity
2c7435418a Merge "database: assign proper io priority for streaming view updates" from Piotr
"
Streamed view updates parasitized on writing io priority, which is
reserved for user writes - it's now properly bound to streaming
write priority.

Verified manually by checking appropriate io metrics: scylla_io_queue_total_bytes{class="streaming_write" ...} vs scylla_io_queue_total_bytes{class="query" ...}

Tests: unit(dev)
"

* 'assign_proper_io_priority_to_streaming_view_updates' of https://github.com/psarna/scylla:
  db,view: wrap view update generation in stream scheduling group
  database: assign proper io priority for streaming view updates
2019-08-20 00:24:51 +02:00
Pekka Enberg
d0eecbf3bb api/storage_proxy: Wire up hinted-handoff status to API
We support hinted-handoff now, so let's return it's status via the API.

Message-Id: <20190819080006.18070-1-penberg@scylladb.com>
2019-08-20 00:24:50 +02:00
Piotr Sarna
3cc5a04301 db,view: wrap view update generation in stream scheduling group
Generating view updates is used by streaming, so the service itself
should also run under the matching scheduling group.
2019-08-20 00:24:50 +02:00
Piotr Sarna
1ab07b80b4 database: assign proper io priority for streaming view updates
Streamed view updates parasitized on writing io priority, which is
reserved for user writes - it's now properly bound to streaming
write priority.
2019-08-20 00:24:50 +02:00
Tomasz Grabiec
b9447d0319 Revert "relocatable: switch from run-time relocation to install-time relocation"
This reverts commit 4ecce2d286.

Should be committed via the next branch.
2019-08-20 00:22:30 +02:00
Avi Kivity
4ecce2d286 relocatable: switch from run-time relocation to install-time relocation
Our current relocation works by invoking the dynamic linker with the
executable as an argument. This confuses gdb since the kernel records
the dynamic linker as the executable, not the real executable.

Switch to install-time relocation with patchelf: when installing the
executable and libraries, all paths are known, and we can update the
path to the dynamic loader and to the dynamic libraries.

Since patchelf itself is dynamically linked, we have to relocate it
dynamically (with the old method of invoking it via the dynamic linker).
This is okay since it's a one-time operation and since we don't expect
to debug core dumps of patchelf crashes.

We lose the ability to run scylla directly from the uninstalled
tarball, but since the nonroot installer is already moving in the
direction of requiring install.sh, that is not a great loss, and
certainly the ability to debug is more important.

dh_strip barfs on some binaries which were treated with patchelf,
so exclude them from dh_strip. This doesn't lose any functionality,
since these binaries didn't have debug information to begin with
(they are already-stripped Fedora executables).

Fixes #4673.
2019-08-20 00:20:19 +02:00
Glauber Costa
da260ecd61 systemd: put scylla processes in systemd slices.
It is well known that seastar applications, like Scylla, do not play
well with external processes: CPU usage from external processes may
confuse the I/O and CPU schedulers and create stalls.

We have also recently seen that memory usage from other applications'
anonymous and page cache memory can bring the system to OOM.

Linux has a very good infrastructure for resource control contributed by
amazingly bright engineers in the form of cgroup controllers. This
infrastructure is exposed by SystemD in the form of slices: a
hierarchical structure to which controllers can be attached.

In true systemd way, the hierarchy is implicit in the filenames of the
slice files. A "-" symbol defines the hierarchy, so the files that this
patch presents, scylla-server and scylla-helper, essentially create a
"scylla" cgroup at the top level with "server" and "helper" children.

Later we mark the Services needed to run scylla as belonging to one
or the other through the Slice= directive.

Scylla DBAs can benefit from this setup by using the systemd-run
utility to fire ad-hoc commands.

Let's say for example that someone wants to hypothetically run a backup
and transfer files to an external object store like S3, making sure that
the amount of page cache used won't create swap pressure leading to
database timeouts.

One can then run something like:

```
   sudo systemd-run --uid=`id -u scylla` --gid=`id -g scylla` -t --slice=scylla-helper.slice /path/to/my/magical_backup_tool
```

(or even better, the backup tool can itself be a systemd timer)

Changes from last version:
- No longer use the CPUQuota
- Minor typo fixes
- postinstall fixup for small machines

Benchmark results:
==================

Test: read from disk, with 100% disk util using a single i3.xlarge (4 vCPUs).
We have to fill the cache as we read, so this should stress CPU, memory and
disk I/O.

cassandra-stress command:
```
  cassandra-stress read no-warmup duration=5m -rate threads=20 -node 10.2.209.188 -pop dist=uniform\(1..150000000\)
```

Baseline results:

```
Results:
Op rate                   :   13,830 op/s  [READ: 13,830 op/s]
Partition rate            :   13,830 pk/s  [READ: 13,830 pk/s]
Row rate                  :   13,830 row/s [READ: 13,830 row/s]
Latency mean              :    1.4 ms [READ: 1.4 ms]
Latency median            :    1.4 ms [READ: 1.4 ms]
Latency 95th percentile   :    2.4 ms [READ: 2.4 ms]
Latency 99th percentile   :    2.8 ms [READ: 2.8 ms]
Latency 99.9th percentile :    3.4 ms [READ: 3.4 ms]
Latency max               :   12.0 ms [READ: 12.0 ms]
Total partitions          :  4,149,130 [READ: 4,149,130]
Total errors              :          0 [READ: 0]
Total GC count            : 0
Total GC memory           : 0.000 KiB
Total GC time             :    0.0 seconds
Avg GC time               :    NaN ms
StdDev GC time            :    0.0 ms
Total operation time      : 00:05:00
```

Question 1:
===========

Does putting scylla in a special slice affect its performance ?

Results with Scylla running in a slice:

```
Results:
Op rate                   :   13,811 op/s  [READ: 13,811 op/s]
Partition rate            :   13,811 pk/s  [READ: 13,811 pk/s]
Row rate                  :   13,811 row/s [READ: 13,811 row/s]
Latency mean              :    1.4 ms [READ: 1.4 ms]
Latency median            :    1.4 ms [READ: 1.4 ms]
Latency 95th percentile   :    2.2 ms [READ: 2.2 ms]
Latency 99th percentile   :    2.6 ms [READ: 2.6 ms]
Latency 99.9th percentile :    3.3 ms [READ: 3.3 ms]
Latency max               :   23.2 ms [READ: 23.2 ms]
Total partitions          :  4,151,409 [READ: 4,151,409]
Total errors              :          0 [READ: 0]
Total GC count            : 0
Total GC memory           : 0.000 KiB
Total GC time             :    0.0 seconds
Avg GC time               :    NaN ms
StdDev GC time            :    0.0 ms
Total operation time      : 00:05:00
```

*Conclusion* : No significant change

Question 2:
===========

What happens when there is a CPU hog running in the same server as scylla?

CPU hog:

```
   taskset -c 0 /bin/sh -c "while true; do true; done" &
   taskset -c 1 /bin/sh -c "while true; do true; done" &
   taskset -c 2 /bin/sh -c "while true; do true; done" &
   taskset -c 3 /bin/sh -c "while true; do true; done" &
   sleep 330
```

Scenario 1: CPU hog runs freely:

```
Results:
Op rate                   :    2,939 op/s  [READ: 2,939 op/s]
Partition rate            :    2,939 pk/s  [READ: 2,939 pk/s]
Row rate                  :    2,939 row/s [READ: 2,939 row/s]
Latency mean              :    6.8 ms [READ: 6.8 ms]
Latency median            :    5.3 ms [READ: 5.3 ms]
Latency 95th percentile   :   11.0 ms [READ: 11.0 ms]
Latency 99th percentile   :   14.9 ms [READ: 14.9 ms]
Latency 99.9th percentile :   17.1 ms [READ: 17.1 ms]
Latency max               :   26.3 ms [READ: 26.3 ms]
Total partitions          :    884,460 [READ: 884,460]
Total errors              :          0 [READ: 0]
Total GC count            : 0
Total GC memory           : 0.000 KiB
Total GC time             :    0.0 seconds
Avg GC time               :    NaN ms
StdDev GC time            :    0.0 ms
Total operation time      : 00:05:00
```

Scenario 2: CPU hog runs inside scylla-helper slice

```
Results:
Op rate                   :   13,527 op/s  [READ: 13,527 op/s]
Partition rate            :   13,527 pk/s  [READ: 13,527 pk/s]
Row rate                  :   13,527 row/s [READ: 13,527 row/s]
Latency mean              :    1.5 ms [READ: 1.5 ms]
Latency median            :    1.4 ms [READ: 1.4 ms]
Latency 95th percentile   :    2.4 ms [READ: 2.4 ms]
Latency 99th percentile   :    2.9 ms [READ: 2.9 ms]
Latency 99.9th percentile :    3.8 ms [READ: 3.8 ms]
Latency max               :   18.7 ms [READ: 18.7 ms]
Total partitions          :  4,069,934 [READ: 4,069,934]
Total errors              :          0 [READ: 0]
Total GC count            : 0
Total GC memory           : 0.000 KiB
Total GC time             :    0.0 seconds
Avg GC time               :    NaN ms
StdDev GC time            :    0.0 ms
Total operation time      : 00:05:00
```

*Conclusion*: With systemd slice we can keep the performance very close to
baseline

Question 3:
===========

What happens when there is a CPU hog running in the same server as scylla?

I/O hog: (Data in the cluster is 2x size of memory)

```
while true; do
	find /var/lib/scylla/data -type f -exec grep glauber {} +
done
```

Scenario 1: I/O hog runs freely:

```
Results:
Op rate                   :    7,680 op/s  [READ: 7,680 op/s]
Partition rate            :    7,680 pk/s  [READ: 7,680 pk/s]
Row rate                  :    7,680 row/s [READ: 7,680 row/s]
Latency mean              :    2.6 ms [READ: 2.6 ms]
Latency median            :    1.3 ms [READ: 1.3 ms]
Latency 95th percentile   :    7.8 ms [READ: 7.8 ms]
Latency 99th percentile   :   10.9 ms [READ: 10.9 ms]
Latency 99.9th percentile :   16.9 ms [READ: 16.9 ms]
Latency max               :   40.8 ms [READ: 40.8 ms]
Total partitions          :  2,306,723 [READ: 2,306,723]
Total errors              :          0 [READ: 0]
Total GC count            : 0
Total GC memory           : 0.000 KiB
Total GC time             :    0.0 seconds
Avg GC time               :    NaN ms
StdDev GC time            :    0.0 ms
Total operation time      : 00:05:00
```

Scenario 2: I/O hog runs in the scylla-helper systemd slice:

```
Results:
Op rate                   :   13,277 op/s  [READ: 13,277 op/s]
Partition rate            :   13,277 pk/s  [READ: 13,277 pk/s]
Row rate                  :   13,277 row/s [READ: 13,277 row/s]
Latency mean              :    1.5 ms [READ: 1.5 ms]
Latency median            :    1.4 ms [READ: 1.4 ms]
Latency 95th percentile   :    2.4 ms [READ: 2.4 ms]
Latency 99th percentile   :    2.9 ms [READ: 2.9 ms]
Latency 99.9th percentile :    3.5 ms [READ: 3.5 ms]
Latency max               :  183.4 ms [READ: 183.4 ms]
Total partitions          :  3,984,080 [READ: 3,984,080]
Total errors              :          0 [READ: 0]
Total GC count            : 0
Total GC memory           : 0.000 KiB
Total GC time             :    0.0 seconds
Avg GC time               :    NaN ms
StdDev GC time            :    0.0 ms
Total operation time      : 00:05:00
```

*Conclusion*: With systemd slice we can keep the performance very close to
baseline

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2019-08-19 14:31:28 -04:00
Avi Kivity
c32f9a8f7b dht: check for aborts during streaming
Propagate the abort_source from main() into boot_strapper and range_stream and
check for aborts at strategic points. This includes aborting running stream_plans
and aborting sleeps between retries.

Fixes #4674
2019-08-18 20:41:07 +03:00
Avi Kivity
5af6f5aa22 main: expose SIGINT/SIGTERM as abort_source
In order to propagate stop signals, expose them as sharded<abort_source>. This
allows propagating the signal to all shards, and integrating it with
sleep_abortable().

Because sharded<abort_source>::stop() will block, we'll now require stop_signal
to run in a thread (which is already the case).
2019-08-18 20:15:26 +03:00
Avi Kivity
20aed3398d Merge "Simplify types" from Rafael
"
This is hopefully the last large refactoring on the way of UDF.

In UDF we have to convert internal types to Lua and back. Currently
almost all our types are hidden in types.cc and expose functionality
via virtual functions. While it should be possible to add
convert_{to|from}_lua virtual functions, that seems like a bad design.

In compilers, the type definition is normally public and different
passes know how to reason about each type. The alias analysis knows
about int and floats, not the other way around.

This patch series is inspired by both the LLVM RTTI
(https://www.llvm.org/docs/HowToSetUpLLVMStyleRTTI.html) and
std::variant.

The series makes the types public, adds a visit function and converts
the various virtual methods to just use visit. As a small example of
why this is useful, it then moves a bit of cql3 and json specific
logic out of types.cc and types.hh. In a similar way, the UDF code
will be able to use visit to convert objects to Lua.

In comparison with the previous versions, this series doesn't require the intermediate step of converting void* to data_value& in a few member functions.

This version also has fewer double dispatches, and I am fairly confident it has all the tools needed to avoid double dispatch entirely.
"

* 'simplify-types-v3' of https://github.com/espindola/scylla: (80 commits)
  types: Move abstract_type visit to a header
  types: Move uuid_type_impl to a header
  types: Move inet_addr_type_impl to a header
  types: Move varint_type_impl to a header
  types: Move timeuuid_type_impl to a header
  types: Move date_type_impl to a header
  types: Move bytes_type_impl to a header
  types: Move utf8_type_impl to a header
  types: Move ascii_type_impl to a header
  types: Move string_type_impl to a header
  types: Move time_type_impl to a header
  types: Move simple_date_type_impl to a header
  types: Move timestamp_type_impl to a header
  types: Move duration_type_impl to a header
  types: Move decimal_type_impl to a header
  types: Move floating point types  to a header
  types: Move boolean_type_impl to a header
  types: Move integer types to a header
  types: Move integer_type_impl to a header
  types: Move simple_type_impl to a header
  ...
2019-08-18 19:04:05 +03:00
Takuya ASADA
f574112301 dist/debian: handle --dist correctly
Commit ac9b115 mistakenly ignores the --dist option.
It should set 'housekeeping' template variable to 'enable'.

Fixes #4857

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190816120127.14099-1-syuu@scylladb.com>
2019-08-18 15:00:33 +03:00
Avi Kivity
14d40cc659 Update seastar submodule
* seastar fe2b5b0c6...afc5bbf51 (4):
  > Merge "handle discarded futures or suppress warning" from Benny
  > Remove variadic futures from the Seastar implementation
  > Revert "Merge "handle discarded futures or suppress warning" from Benny"
  > io_queue: Forward declare smp class
2019-08-17 12:18:18 +03:00
Dejan Mircevski
48bb89fcb7 cql_query_test: Add LIKE tests for all types
As requested in a prior code review [1], ensure that LIKE cannot be
used on any non-string type.

[1] https://github.com/scylladb/scylla/pull/4610#pullrequestreview-255590129

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-08-16 17:55:35 -04:00
Dejan Mircevski
ef071bf7ce cql_query_test: Remove LIKE-nonstring-pattern case
This testcase was previously commented out, pending a fix that cannot
be made.  Currently it is impossible to validate the marker-value type
at filtering time.  The value is entered into the options object under
its presumed type of string, regardless of what it was made from.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-08-16 17:07:44 -04:00
Dejan Mircevski
20e688e703 cql_query_test: Move a testcase elsewhere in file
Somehow this test case sits in the middle of LIKE-operator tests:
test_alter_type_on_compact_storage_with_no_regular_columns_does_not_crash

Move it so LIKE test cases are contiguous.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-08-16 17:07:44 -04:00
Glauber Costa
ffc328c924 move postinst steps to an external script
There are systemd-related steps done in both rpm and deb builds.
Move that to a script so we avoid duplication.

The tests are still somewhat distribution-specific, so they need
to be adapted a bit.

Also note that this also fixes a bug with rpm as a side-effect:
rpm does not call daemon-reload after potentially changing the
systemd files (it is only implied during postun operations, that
happen during uninstall). daemon-reload was called explicitly for
debian packages, and now it is called for both.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2019-08-15 10:43:17 -04:00
Rafael Ávila de Espíndola
7f0a434cfa types: Move abstract_type visit to a header
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola
dccefd1ddb types: Move uuid_type_impl to a header
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola
038728a381 types: Move inet_addr_type_impl to a header
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola
1966416cb3 types: Move varint_type_impl to a header
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola
9229f99c86 types: Move timeuuid_type_impl to a header
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola
993f132619 types: Move date_type_impl to a header
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola
a299ed3b9b types: Move bytes_type_impl to a header
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola
09ac2a1bc6 types: Move utf8_type_impl to a header
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola
da472a65ec types: Move ascii_type_impl to a header
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola
b98bac65b0 types: Move string_type_impl to a header
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola
3e5b1e2630 types: Move time_type_impl to a header
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola
909df932ac types: Move simple_date_type_impl to a header
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola
8f3bebb6e8 types: Move timestamp_type_impl to a header
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 16:25:43 -07:00
Rafael Ávila de Espíndola
3260153d35 types: Move duration_type_impl to a header
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 16:25:42 -07:00
Rafael Ávila de Espíndola
2f6a26b1c1 types: Move decimal_type_impl to a header
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 16:25:42 -07:00
Rafael Ávila de Espíndola
480ca52b59 types: Move floating point types to a header
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 16:25:42 -07:00
Rafael Ávila de Espíndola
6a4ec7488e types: Move boolean_type_impl to a header
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 16:25:42 -07:00
Rafael Ávila de Espíndola
404b26d3fa types: Move integer types to a header
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 16:25:42 -07:00
Rafael Ávila de Espíndola
bd3e725605 types: Move integer_type_impl to a header
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 16:25:42 -07:00
Rafael Ávila de Espíndola
03aca28dc5 types: Move simple_type_impl to a header
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 16:25:42 -07:00
Rafael Ávila de Espíndola
e8ba37fa5a types: Move counter_type_impl to a header
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 16:25:42 -07:00
Rafael Ávila de Espíndola
cb03c79a48 types: Move empty_type_impl to a header
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 16:25:42 -07:00
Rafael Ávila de Espíndola
1cb7127bf3 types: Make abstract_type::serialize a static helper
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 16:25:42 -07:00
Rafael Ávila de Espíndola
b175657ee7 types: Devirtualize abstract_type::validate
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 16:25:42 -07:00
Rafael Ávila de Espíndola
bf96f1111c types: Make abstract_type::serialized_size a static helper
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 16:25:41 -07:00
Rafael Ávila de Espíndola
6831e05471 types: Move functions that use abstract_type::serialized_size out of line
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
047e34a31d types: Remove serialize_value
It is no longer needed.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
1e0663c56c types: Devirtualize abstract_type::from_string
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
68b26047cc types: Devirtualize abstract_type::serialize
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
18da5f9001 types: Devirtualize abstract_type::from_json_object
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
b987b2dcbe types: Devirtualize abstract_type::to_json_string
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
b4bc888eac types: Refactor abstract_type::serialized_size
The following logic was duplicated:

* For all types, if value is null, the result is zero.
* For non collection types, if the native object is empty, the result
  is zero.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
968365b7e3 types: Devirtualize abstract_type::serialized_size
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
793bc50d69 types: Delete abstract_type::validate_collection_member
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
37686964f0 types: Devirtualize abstract_type::hash
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
396f5c7656 types: Devirtualize abstract_type::native_typeid
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
492043a77d types: Devirtualize abstract_type::native_value_delete
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
4d849d7742 types: Devirtualize abstract_type::native_value_clone
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
ba887b7e56 types: Delete abstract_type::native_value_destroy
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
5c0e78d70c types: Delete abstract_type::native_value_move
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
2bc6471a1e types: Delete abstract_type::native_value_copy
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
33394dfdc1 types: Delete abstract_type::native_value_size
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
c22ca2f9c9 types: Delete abstract_type::native_value_alignment
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
37c0f5b985 types: Devirtualize get_string
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
f633f70616 types: Devirtualize abstract_type::is_value_compatible_with_internal
It now is a static helper.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
19c9a033d9 types: Devirtualize abstract_type::is_compatible_with
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
d245d08045 types: Devirtualize abstract_type::is_string
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
ae30d78ca9 types: Devirtualize abstract_type::equal
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
f087756684 types: Implement less with compare
We defined less for some types and compare for others. There is no
type for which compare is substantially more expensive, so define it
for all types and implement less with compare.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
9bbf55e9c0 types: Devirtualize abstract_type::compare
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
a5daa8d258 types: Devirtualize abstract_type::less
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
a3e898a648 types: Devirtualize abstract_type::deserialize
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
8145faa66f types: Inline is_byte_order_comparable into only user
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
325418db16 types: Devirtualize abstract_type::is_byte_order_comparable
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
d2b063877b types: Devirtualize abstract_type::is_byte_order_equal
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
21da060b24 types: Devirtualize abstract_type::update_user_type
The type walking is similar to what the find function does, but
refactoring it doesn't seem worth it if these are the only two uses.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
ae6e96a1e2 types: Refactor references_duration and references_user_type
With this patch the logic for walking all nested types is moved to a
helper function. It also fixes reversed_type_impl not being handled in
references_duration.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
25a5631a46 types: Devirtualize abstract_type::references_user_type
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
544337f380 types: Devirtualize abstract_type::references_duration
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
a6b48bda03 types: Devirtualize abstract_type::is_native
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
f5b4fe5685 types: Devirtualize abstract_type::is_atomic
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
ec09fb94cb types: Devirtualize abstract_type::is_multi_cell
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
1bea7747ce types: Devirtualize abstract_type::is_tuple
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
1581805a8d types: Devirtualize abstract_type::is_collection
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
1137695cb2 types: Devirtualize abstract_type::is_counter
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
d3ba0d132a types: Devirtualize abstract_type::is_user_type
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
0ff539500f types: Devirtualize abstract_type::cql3_type_name_impl
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
5314b489e3 types: Devirtualize abstract_type::get_cql3_kind_impl
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
2f0c64844f types: Devirtualize abstract_type::is_reversed
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
33d2ec8e1c types: Devirtualize abstract_type::underlying_type
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
064db9b92e types: Devirtualize abstract_type::to_string_impl
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
69d6fd21d2 types: Add a listlike_collection_type_impl class
With this we can share code that wants to access the element type of
set and list.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
a4837301a6 types: Move _is_multi_cell to collection_type_impl
It was duplicated in each concrete collection type.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
de6d6c46a1 types: Remove collection_type_impl::kind
All uses have been switched to abstract_type::kind.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
c80c19459e types: Add a visitor over data_value
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
5701051857 types: Add a generic visit over abstract_type
The API is inspired by std::variant.

This bridges the runtime type of an abstract_type object to a
compile-time overload resolution. For example, it is possible to have
a single lambda visit a string_type_impl, even though it corresponds
to two leaf types (ascii and utf8).

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
e5c7deaeb5 types: Add a kind to abstract_type
The type hierarchy is closed, so we can give each leaf an enum value.

This will be used to implement a visitor pattern and reduce code
duplication.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
5c098eb7d0 types: Add more tests for abstract_type::to_string_impl
The corresponding code is correct, but while refactoring it I noticed
that no tests would fail if it were broken.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
096de10eee types: Remove abstract_type::equals
All types are interned, so we can just compare the pointers.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Rafael Ávila de Espíndola
6a8ffb35ff types: Make a few concrete_type member functions public
These only use public member functions from data_value, so there is no
reason for not making them public too.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-08-14 10:02:00 -07:00
Gleb Natapov
1779c3b7f6 move admission control semaphore from cql server to storage_service
There are two reasons for the move. First, the cql server's lifetime
is shorter than storage_proxy's, and the latter stores references to
the semaphore in each service_permit it holds. Second - we want thrift
(and in the future other user APIs) to share the same admission control
memory pool.

Fixes #4844

Message-Id: <20190814142614.GT17984@scylladb.com>
2019-08-14 18:49:56 +03:00
Gleb Natapov
a1e9e6faa2 storage_service: remove outdated comment
We do in fact stop the cql server in storage_service::drain_on_shutdown(),
which is called from main.cc during shutdown.

Message-Id: <20190814085027.GP17984@scylladb.com>
2019-08-14 11:52:49 +03:00
Avi Kivity
9f512509c7 github: remove github pull request template (#4833)
Since we do accept pull requests (in a long-running experiment), the
pull request template suggesting not to use them is inaccurate, and
many requesters forget to remove the boilerplate.

Remove the outdated template.
2019-08-14 09:28:39 +03:00
Pekka Enberg
595434a554 Merge "docker: relax permission checks" from Avi
"Commit e3f7fe4 added file owner validation to prevent Scylla from
 crashing when it tries to touch a file it doesn't own. However, under
 docker, we cannot expect to pass this check since user IDs are from
 different namespaces: the process runs in a container namespace, but the
 data files usually come from a mounted volume, and so their uids are
 from the host namespace.

 So we need to relax the check. We do this by reverting b1226fb, which
 causes Scylla to run as euid 0 in docker, and by special-casing euid 0
 in the ownership verification step.

 Fixes #4823."

* 'docker-euid-0' of git://github.com/avikivity/scylla:
  main: relax file ownership checks if running under euid 0
  Revert "dist/docker/redhat: change user of scylla services to 'scylla'"
2019-08-13 19:55:05 +03:00
Tomasz Grabiec
64ff1b6405 cql: alter type: Format field name as text instead of hex
Fixes #4841

Message-Id: <1565702635-26214-1-git-send-email-tgrabiec@scylladb.com>
2019-08-13 16:25:48 +03:00
Tomasz Grabiec
34cff6ed6b types: Fix abort on type alter which affects a compact storage table with no regular columns
Fixes #4837

Message-Id: <1565702247-23800-1-git-send-email-tgrabiec@scylladb.com>
2019-08-13 16:25:02 +03:00
Avi Kivity
1ed3356e0e main: relax file ownership checks if running under euid 0
During startup, we check that the data files are owned by our euid.
But in a container environment, this is impossible to enforce because
uid/username mappings are different between the host and the container,
and the data files are likely to be mounted from the host.

To allow for such environments, relax the checks if euid=0. This
both matches what happens in a container (programs run as root) and
the kernel access checks (euid 0 can do anything).

We can reconsider this when container uid mapping is better developed.

Fixes #4823.
Fixes #4536.
2019-08-13 14:36:08 +03:00
Avi Kivity
ca28fdc37d Revert "dist/docker/redhat: change user of scylla services to 'scylla'"
This reverts commit b1226fb15a. When the
data volume is mounted from the host (as is usual in container
deployments), we can't expect that the files will be owned by the
in-container scylla user. So that commit didn't really fix #4536.

A follow-up patch will relax the check so it passes in a container
environment.
2019-08-13 14:36:00 +03:00
Pekka Enberg
fed38f5179 reloc/build_reloc.sh: Add '--configure-flags' command line option
This adds a '--configure-flags FLAGS' command line option, which
overrides the flags passed to scylla.git 'configure.py' script. We need
this for flexibility of custom builds in Jenkins pipelines, for example.

Message-Id: <20190813095428.13590-1-penberg@scylladb.com>
2019-08-13 14:05:25 +03:00
Tomasz Grabiec
0cf4fab2ca Merge "Multishard combining reader more robust reader recreation" from Botond
Make the reader recreation logic more robust, by moving away from
deciding which fragments have to be dropped based on a bunch of
special cases, instead replacing this with a general logic which just
drops all already seen fragments (based on their position).  Special
handling is added for the case when the last position is a range
tombstone with a non full prefix starting position.  Reproducer unit
tests are added for both cases.

Refs #4695
Fixes #4733
2019-08-13 11:53:07 +02:00
Gleb Natapov
00c4078af3 cache_hitrate_calculator: do not ignore a future returned from gossiper::add_local_application_state
We should wait for a future returned from add_local_application_state() to
resolve before issuing new calculation, otherwise two
add_local_application_state() may run simultaneously for the same state.

Fixes #4838.

Message-Id: <20190812082158.GE17984@scylladb.com>
2019-08-13 11:48:38 +03:00
Botond Dénes
fe58324fb9 tests: test_multishard_combining_reader_as_mutation_source: don't copy mutations cross shard
It's illegal. Freeze-unfreeze them instead when crossing shard
boundaries.
2019-08-13 10:16:02 +03:00
Botond Dénes
d746fb59a7 mutation_reader_test: harden test_multishard_combining_reader_as_mutation_source
Add `single_fragment_buffer` test variable. When set, the shard readers
are created with a max buffer size of 1, effectively causing them to
read a single fragment at a time. This, when combined with
`evict_readers=true` will stress the recreate reader logic to the max.
2019-08-13 10:16:02 +03:00
Botond Dénes
899afc0661 flat_mutation_reader_assertions: produces_range_tombstone(): be more lenient
Be more tolerant of different but equivalent representations of range
deletions. When expecting a range tombstone, keep reading range
tombstones while these can be merged with the cumulative range
tombstone, resulting from the merging of the previous range tombstones.
This results in tests tolerating range tombstones that are split into
several, potentially overlapping range tombstones, representing the
same underlying deletion.
2019-08-13 10:16:02 +03:00
Botond Dénes
53e1dca5ca tests/mutation_source_test: generate_mutation_sets() add row that falls into deleted prefix
This is tailored to the multishard_combining_reader, to make sure it
doesn't lose rows that follow a range tombstone with a prefix starting
position (i.e. rows whose keys fall into that prefix).
2019-08-13 09:47:55 +03:00
Botond Dénes
6bfe468a17 multishard_combining_reader: remote_reader::recreate_reader(): restore indentation 2019-08-13 09:47:55 +03:00
Botond Dénes
68353acc1c multishard_combining_reader: remote_reader: use next instead of last pos
Currently the remote reader uses the last seen fragment's position to
calculate the position the reader should continue from when the reader
is recreated after having been evicted. Recently it was discovered that
this logic breaks down badly when this last position is a non-full
clustering prefix (a range tombstone start bound). In this case, if only
the last position is available, there is no good way of computing the
starting position. Starting after this position will potentially miss
any rows that fall into the prefix (the current behaviour). Starting
from before it will cause all range tombstones with said prefix to be
re-emitted, causing other problems. A better solution is to exploit the
fact that sometimes we also know what the next fragment is.
These "some" times are the exact times that are problematic with the
current approach -- when the last fragment is a range tombstone.
Exploiting this extra knowledge allows for a much better way for
calculating the starting position: instead of maintaining the last
position, we maintain the next position, which is always safe to start
from. This is not always possible, but in many cases we can know for
sure what the next position is, for example if the last position was a
static row we can be sure the next position is the first clustering
position (or partition end). In the few cases where we cannot calculate
the next position we fall back to the previous logic and start from
*after* the last position. The good news is that in these remaining
cases (the last fragment is a clustering row) it is safe to do so.

This patch also does some refactoring of the remote-reader internals,
all fill-buffer related logic is grouped together in a single
`fill_buffer()` method.
2019-08-13 09:47:55 +03:00
Botond Dénes
3949189918 multishard_combining_reader: remote_reader::do_fill_buffer(): reorganize drop logic
To make it more readable.
2019-08-13 09:47:55 +03:00
Botond Dénes
20c06adf80 position_in_partition: add for_partition_start() 2019-08-13 09:47:55 +03:00
Botond Dénes
87973498a1 query: refactor trim_clustering_row_ranges_to()
Allow expressing `pos` in term of a `position_in_partition_view`, which
allows finer control of the exact position, allowing specifying position
before, at or after a certain key.
The previous overload is kept for backward compatibility, invoking the
new overload behind the curtains.
2019-08-13 09:47:55 +03:00
Botond Dénes
3a5e7db9b6 tests: add unit test for query::trim_clustering_row_ranges_to()
We are about to do a major refactoring of this method. Add extensive
unit tests to ensure we don't break it in the process.
2019-08-13 09:47:55 +03:00
Botond Dénes
1b4e88b972 position_in_partition_view: add get_bound_weight() 2019-08-13 09:47:55 +03:00
Avi Kivity
0d0ee20f76 Merge "Implement sstable_info API command (info on sstables)" from Calle
"
Refs #4726

Implement the api portion of a "describe sstables" command.

Adds REST types for collecting both fixed and dynamic attributes, some grouped. Allows extensions to add attributes as well. (Hint hint)
"

* 'sstabledesc' of https://github.com/elcallio/scylla:
  api/storage_service: Add "sstable_info" command
  sstables/compress: Make compressor pointer accessible from compression info
  sstables.hh: Add attribute description API to file extension
  sstables.hh: Add compression component accessor
  sstables.hh: Make "has_component" public
2019-08-12 21:16:08 +03:00
Dejan Mircevski
8be147d069 cql3: Handle empty LIKE pattern
Match SQL's LIKE in allowing an empty pattern, which matches only
an empty text field.

Tests: unit (dev)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-08-12 19:48:31 +03:00
Rafael Ávila de Espíndola
99c7f8457d logalloc: Add a migrators_base that is common to debug and release
This simplifies the debug implementation and it now should work with
scylla-gdb.py.

It is not clear what, if anything, is lost by not using random
ids. They were never being reused in the debug implementation anyway.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190618144755.31212-1-espindola@scylladb.com>
2019-08-12 19:44:55 +03:00
Calle Wilund
2b19bfbfbc types: Remove obsolete "FIXME"
inet_addr_type_impl has supported ipv6 for some time now.
Message-Id: <20190812142731.6384-1-calle@scylladb.com>
2019-08-12 17:30:15 +03:00
Calle Wilund
1afc899e37 type_parser: Fix/improve exception messages
Removes long-standing FIXME for message detail
Also simplifies some code, removing duplication.

Message-Id: <20190812134144.2417-1-calle@scylladb.com>
2019-08-12 17:03:43 +03:00
Calle Wilund
fdf2017487 cql3::term: Remove unneeded const_cast
Removed no longer needed FIXME (to_string became const long ago)

Message-Id: <20190812133943.2011-1-calle@scylladb.com>
2019-08-12 17:00:46 +03:00
Amnon Heiman
6a0490c419 api/compaction_manager: indentation 2019-08-12 14:04:40 +03:00
Amnon Heiman
8181601f0e api/compaction_manager: do not hold map on the stack
This patch fixes a bug where a map was held on the stack and then
used by a future.

Instead, the map is now wrapped with do_with.

Fixes #4824

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2019-08-12 14:04:00 +03:00
Asias He
131acc09cc repair: Adjust parallelism according to memory size (#4696)
After commit 8a0c4d5 (Merge "Repair switch to rpc stream" from Asias),
we increased the row buffer size for repair from 512KiB to 32MiB per
repair instance. We allow repairing 16 ranges (16 repair instances) in
parallel per repair request. So, a node can consume 16 * 32MiB = 512MiB
per user requested repair. In addition, the repair master node can hold
data from all the repair followers, so the memory usage on repair master
can be larger than 512MiB. We need to provide a way to limit the memory
usage.

In this patch, we limit the total memory used by repair to 10% of the
shard memory. The number of ranges that can be repaired in parallel is:

max_repair_ranges_in_parallel = max_repair_memory / max_repair_memory_per_range.

For example, if each shard has 4096MiB of memory, then we will have
max_repair_ranges_in_parallel = 4096MiB * 10% / 32MiB = 12.

Fixes #4675
2019-08-12 11:09:27 +03:00
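The cap described in the commit above can be sketched as follows (an illustrative simplification; the names and constants are assumptions, not the actual repair code):

```cpp
#include <cstddef>

constexpr std::size_t MiB = std::size_t(1024) * 1024;

// Sketch of the limit: total repair memory is capped at 10% of the shard's
// memory, and each range in flight buffers up to 32MiB.
std::size_t max_repair_ranges_in_parallel(std::size_t shard_memory,
                                          std::size_t per_range_memory = 32 * MiB) {
    std::size_t max_repair_memory = shard_memory / 10;
    std::size_t n = max_repair_memory / per_range_memory;
    return n > 0 ? n : 1; // always allow at least one range
}
```

For a 4096MiB shard this gives roughly 409.6MiB of repair memory and 12 parallel ranges, matching the example above.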
Avi Kivity
e6cde72d2b Merge "Fix cql server admission control to take all leftover work into account" from Gleb
"
Current admission control takes a permit when a cql request starts and
releases it when the reply is sent, but some requests may leave background
work behind after that point (some because there is genuine background
work to do, like completing a write or doing a read repair, and some because
a read/write may get stuck in a queue longer than the request's timeout), so
after Scylla replies with a timeout some resources are still occupied.

The series fixes this by passing the permit down to storage_proxy where
it is held until all background work is completed.

Fixes #4768
"

* 'gleb/admission-v3' of github.com:scylladb/seastar-dev:
  transport: add a metric to follow memory available for service permit.
  storage_proxy: store a permit in a read executor
  storage_proxy: store a permit in a write response handler
  Pass service permit to storage_proxy
  transport: introduce service_permit class and use it instead of semaphore_units
  transport: hold admission a permit until a reply is sent
  transport: remove cql server load balancer
2019-08-12 11:02:37 +03:00
Gleb Natapov
3e27c2198a transport: add a metric to follow memory available for service permit.
Add a metric to follow memory available for service permit. When this
memory is close to zero the cql server stops admitting new requests.
2019-08-12 10:20:43 +03:00
Gleb Natapov
7d7b1685aa storage_proxy: store a permit in a read executor
A read executor exists until the read operation completes in its entirety,
so storing a permit there guarantees that it is freed only after no
background work is left for the request on this server.
2019-08-12 10:20:43 +03:00
Gleb Natapov
d5ced800f0 storage_proxy: store a permit in a write response handler
A write response handler exists until the write operation completes in its
entirety, so storing a permit there guarantees that it is freed only after
no background work is left for the request on this server.
2019-08-12 10:20:43 +03:00
Gleb Natapov
6a4207f202 Pass service permit to storage_proxy
The current cql transport code acquires a permit before processing a query
and releases it when the query gets a reply, but some queries leave work
behind. If that work is allowed to accumulate without any limit, a server
may eventually run out of memory. To prevent that, the permit system should
account for the background work as well. This patch is a first step in that
direction: it passes a permit down to storage proxy, where it will later be
held by background work.
2019-08-12 10:20:43 +03:00
Raphael S. Carvalho
b436c41128 compaction_manager: Prevent sstable runs from being partially compacted
The manager trims sstables off to allow compaction jobs to proceed in
parallel according to their weights. The problem is that the trimming
procedure is not sstable-run aware, so it could incorrectly remove only a
subset of an sstable run, leading to partial sstable-run compaction.

Compaction of a partial sstable run could lead to inefficiency because the
run structure would be messed up, affecting all amplification factors, and
the same generation could even end up being compacted twice.

This is fixed by making the trim procedure respect sstable runs.

Fixes #4773.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190730042023.11351-1-raphaelsc@scylladb.com>
2019-08-11 17:20:20 +03:00
Gleb Natapov
ddff7f48cf transport: introduce service_permit class and use it instead of semaphore_units
service_permit is a new class that allows sharing a permit among
different parts of request processing many of which can complete
at different times.
2019-08-11 16:08:55 +03:00
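The sharing semantics described above can be sketched with shared ownership (a hypothetical simplification; the real service_permit wraps semaphore_units and lives in the transport code):

```cpp
#include <memory>

// Sketch only: the permit holds its units via shared_ptr, so each stage of
// request processing can keep its own copy, and the underlying resources
// are released only when the last copy is destroyed.
struct units { int count; };            // stands in for semaphore_units

class service_permit {
    std::shared_ptr<units> _units;
public:
    explicit service_permit(int n)
        : _units(std::make_shared<units>(units{n})) {}
    service_permit(const service_permit&) = default;   // cheap to share
    long holders() const { return _units.use_count(); }
};
```

Each part of request processing that may finish at a different time simply keeps a copy of the permit.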
Gleb Natapov
2daa72b7dc transport: hold an admission permit until a reply is sent
The current code releases the admission permit too soon. If new requests
are admitted faster than the client reads replies back, the reply queue
can grow very big. This patch defers the service permit release until
after a reply is sent.
2019-08-11 16:08:55 +03:00
Gleb Natapov
7e3805ed3d transport: remove cql server load balancer
It is buggy, unused and unnecessarily complicates the code.
2019-08-11 16:08:52 +03:00
Nadav Har'El
f9d6eaf5ff reconcilable_result: switch to chunked_vector
Merged patch series from Avi Kivity:

In rare but valid cases (reconciling many tombstones, paging disabled),
a reconciled_result can grow large. This triggers large allocation
warnings. Switch to chunked_vector to avoid the large allocation.
In passing, fix chunked_vector's begin()/end() const correctness, and
add the reverse iterator function family which is needed by the conversion.

Fixes #4780.

Tests: unit (dev)

Commit Summary

    utils: chunked_vector: make begin()/end() const correct
    utils::chunked_vector: add rbegin() and related iterators
    reconcilable_result: use chunked_vector to hold partitions
2019-08-11 16:03:13 +03:00
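The idea behind the chunked_vector switch can be sketched like this (a toy simplification, not the actual utils::chunked_vector): elements live in fixed-size chunks, so growth never requires one contiguous large allocation.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Toy sketch of chunked storage: growth allocates one bounded chunk at a
// time instead of reallocating an ever-larger contiguous buffer.
template <typename T, std::size_t ChunkSize = 1024>
class chunked {
    std::vector<std::vector<T>> _chunks;
    std::size_t _size = 0;
public:
    void push_back(T v) {
        if (_size % ChunkSize == 0) {
            _chunks.emplace_back();
            _chunks.back().reserve(ChunkSize); // each allocation is bounded
        }
        _chunks.back().push_back(std::move(v));
        ++_size;
    }
    T& operator[](std::size_t i) {
        return _chunks[i / ChunkSize][i % ChunkSize];
    }
    std::size_t size() const { return _size; }
};
```

Indexing stays O(1) via one division and one modulo, while the largest single allocation is bounded by the chunk size.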
Avi Kivity
ce2b0b2682 Merge "Add listen/rpc "prefer_ipv6" options to DNS lookup #4775" from Calle
"
Add listen/rpc "prefer_ipv6" options to DNS lookup of bind addresses for API/rpc/prometheus etc.

Fixes #4751

Adds using a preferred address family to dns name lookups related to
listen address and rpc address, adhering to the respective "prefer" options.

API, prometheus and broadcast address are all considered to be covered by
the "listen_interface_prefer_ipv6" option.

Note: scylla does not yet support actual interface binding, but these
options should apply equally to address name parameters.

Setting a "prefer_ipv6" option automatically enables the ipv6 dns family query.
"

* 'calle/ipv6' of https://github.com/elcallio/scylla:
  init: Use the "prefer_ipv6" options available for rpc/listen address/interface
  inet_address: Add optional "preferred type" to lookup
  config: Add rpc_interface_prefer_ipv6 parameter
  config: Add listen_interface_prefer_ipv6 parameter
  config.cc: Fix enable_ipv6_dns_lookup actual param name
2019-08-11 15:21:45 +03:00
Pekka Enberg
73113c0ea4 utils/fb_utilities.hh: Kill obsolete FIXME and commented out Java code
The FIXME was added in the very first commit ("utils: Convert
utils/FBUtilities.java") that introduced the fb_utilities class as a
stub. However, we have long implemented the parts that we actually use,
so drop the FIXME as obsolete. In addition, drop the remaining
uncommented Java code as unused and also obsolete.

Message-Id: <20190808182758.1155-1-penberg@scylladb.com>
2019-08-11 10:26:36 +03:00
Botond Dénes
fd925f6049 position_in_partition_view: add constructor with bound_weight
This is a low level constructor which allows directly providing a bound
weight to go with the key.
2019-08-09 10:54:27 +03:00
Pekka Enberg
547c072f93 dbuild: Make Maven local repository accessible
The Maven build tool ("mvn"), which is used by scylla-jmx and
scylla-tools-java, stores dependencies in a local repository stored at
$HOME/.m2. Make sure it's accessible to dbuild.

Message-Id: <20190808140216.26141-1-penberg@scylladb.com>
2019-08-08 17:36:13 +03:00
Avi Kivity
8f19b16fe4 Update seastar submodule
* seastar ed608e3c9e...fe2b5b0c6b (2):
  > Merge "handle discarded futures or suppress warning" from Benny
  > output_stream: Add close() blurb
2019-08-08 16:22:38 +03:00
Avi Kivity
4a5ec61438 Update seastar submodule
* seastar a1cf07858b...ed608e3c9e (4):
  > core: Add ability to abort on EBADF and ENOTSOCK
  > Revert "Merge "handle discarded futures or suppress warning" from Benny"
  > Merge "handle discarded futures or suppress warning" from Benny
  > reactor: remove replace variadic future<pollable_fd, socket_address> with future<tuple>
2019-08-08 14:22:29 +03:00
Raphael S. Carvalho
76cde84540 sstables/compaction_manager: Fix logic for filtering out partial sstable runs
ignore_partial_runs() brings confusion because i__p__r() being true
doesn't mean filtering partial runs out of compaction. It actually means
not caring about the compaction of a partial run.

The logic was wrong because any compaction strategy that chooses not to
ignore partial sstable runs[1] would have any fragment composing them
incorrectly become a candidate for compaction.
This problem could make compaction include only a subset of the fragments
composing the partial run, or even make the same fragment be compacted
twice due to parallel compaction.

[1]: a partial sstable run is an sstable that is still being generated by
compaction and as a result cannot be selected as a candidate whatsoever.

The fix makes sure a partial sstable run has none of its fragments
selected for compaction, and also renames i__p__r.

Fixes #4729.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190807022814.12567-1-raphaelsc@scylladb.com>
2019-08-08 14:11:35 +03:00
Pekka Enberg
7d4bf10d87 docs/building-packages.md: Document how to build Scylla packages
This documents the steps needed to build Scylla's Linux packages with
the relocatable package infrastructure we use today.

Message-Id: <20190807134017.4275-1-penberg@scylladb.com>
2019-08-08 14:11:35 +03:00
Pekka Enberg
79cece9f33 toolchain: Fix default command for dbuild Docker image
Running "dbuild" without a build command fails as follows:

  $ ./tools/toolchain/dbuild
  Error: This command has to be run under the root user.

Israel Fruchter discovered that the default command of our Docker image is this:

  "Cmd": [
    "bash",
    "-c",
    "dnf -y install python3-cassandra-driver && dnf clean all"
   ]

Let's make "/bin/bash" the default command instead, which will make
"dbuild" with no build command to return to the host shell.

Message-Id: <20190807133955.4202-1-penberg@scylladb.com>
2019-08-08 14:11:35 +03:00
Pekka Enberg
76cdec222f build_reloc.sh: Remove "--with" passed to "configure.py"
The build_reloc.sh script passes "--with=scylla" and "--with=iotune" to
the configure.py script. This is redundant as the
"scylla-package.tar.gz" target of ninja already limits itself to them.

Removing the "--with" options allows building unit tests after a
relocatable package has been built without having to rebuild anything.

Message-Id: <20190807130505.30089-1-penberg@scylladb.com>
2019-08-07 16:28:00 +03:00
Avi Kivity
e548bdb2e8 thrift, transport: switch to new seastar accept() API (#4814)
Seastar switched accept() to return a single struct instead of a variadic future,
adjust the code to the new API to avoid deprecation warnings.
2019-08-07 15:23:26 +02:00
Pekka Enberg
f68fffd99a reloc/build_reloc.sh: Make build mode configurable
Add a '--mode <mode>' command line option to the 'build_reloc.sh' script
so that we can create relocatable packages for debug builds.

The '--mode' command line option defaults to 'release' so existing users
are unaffected.

Message-Id: <20190807120759.32634-1-penberg@scylladb.com>
2019-08-07 16:19:37 +03:00
Asias He
fee26b9f6e repair: Fix use after free in do_estimate_partitions_on_local_shard (#4813)
We need to keep the sstables object alive during the operation of
do_for_each.

Notes: No need to backport to 3.1.

Fixes #4811
2019-08-07 15:19:21 +02:00
Asias He
49a73aa2fc streaming: Move stream_mutation_fragments_cmd to a new file (#4812)
Avoid including the lengthy stream_session.hh in messaging_service.

More importantly, fix the build because currently messaging_service.cc
and messaging_service.hh do not include stream_mutation_fragments_cmd.
I am not sure why it builds on my machine. Spotted this when backporting
the "streaming: Send error code from the sender to receiver" to 3.0
branch.

Refs: #4789
2019-08-07 14:59:46 +02:00
Asias He
288371ce75 streaming: Do not call rpc stream flush in send_mutation_fragments
The stream close() guarantees the data sent will be flushed. No need to
call the stream flush() since the stream is not reused.

Follow up fix for commit bac987e32a (streaming: Send error code from
the sender to receiver).

Refs #4789
2019-08-07 14:31:17 +02:00
Avi Kivity
689fc72bab Update seastar submodule
* seastar d199d27681...a1cf07858b (1):
  > Merge 'Do not return a variadic future form server_socket::accept()' from Avi

Seastar's configure.py now has --api-level=1, to keep us on the old
variadic-future server_socket::accept() API.
2019-08-06 18:37:27 +03:00
Avi Kivity
97f66c72af Update seastar submodule
* seastar d90834443c...d199d27681 (3):
  > sharded: support for non-cooperative service types
  > shared_future: silence warning about discarded future
  > Fix backtrace suppression message in cpu_stall_detector.

Fixes #4560.
2019-08-06 18:00:48 +03:00
Asias He
bac987e32a streaming: Send error code from the sender to receiver
In case of an error on the sender side, the sender does not propagate the
error to the receiver; it just closes the stream. As a result, the
receiver gets nullopt from the source in get_next_mutation_fragment and
passes a mutation_fragment_opt with no value to the generating_reader. In
turn, the generating_reader generates end of stream. However, the last
element that the generating_reader has generated can be any type of
mutation_fragment. This makes the sstable that consumes the
generating_reader violate the mutation-fragment stream rule.

To fix, we need to propagate the error. However, RPC streaming does not
support propagating errors in the framework; the user has to send an error
code explicitly.
Fixes: #4789
2019-08-06 16:54:56 +02:00
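The explicit error signalling can be sketched like this (types and names are illustrative assumptions, not the actual streaming code): the sender's final frame carries a status code, which the receiver translates into either a clean end of stream or an exception, so a sender failure can no longer masquerade as a normal end of stream.

```cpp
#include <cstdint>
#include <optional>
#include <stdexcept>

enum class stream_status : std::uint8_t { more = 0, end_of_stream = 1, error = 2 };

struct frame {
    stream_status status;
    std::optional<int> payload; // stands in for a mutation fragment
};

// Receiver side: only a frame explicitly marked end_of_stream produces
// clean EOF; an error frame is surfaced as an exception.
std::optional<int> next_fragment(const frame& f) {
    switch (f.status) {
    case stream_status::more:          return f.payload;
    case stream_status::end_of_stream: return std::nullopt;
    case stream_status::error:         throw std::runtime_error("peer failed");
    }
    throw std::logic_error("bad stream status");
}
```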
Piotr Jastrzebski
24f6d90a45 sstables: add test of sstables_mutation_reader for missing partition_end
Reproduces #4783

Issue was fixed by 9b8ac5ecbc

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-08-06 15:11:19 +03:00
Calle Wilund
6c62e5741e init: Use the "prefer_ipv6" options available for rpc/listen address/interface
Fixes #4751

Adds using a preferred address family to dns name lookups related to
listen address and rpc address, adhering to the respective "prefer" options.

API, prometheus and broadcast address are all considered to be covered by
the "listen_interface_prefer_ipv6" option.

Note: scylla does not yet support actual interface binding, but these
options should apply equally to address name parameters.

Setting a "prefer_ipv6" option automatically enables the ipv6 dns family query.
2019-08-06 08:32:10 +00:00
Calle Wilund
6c0c1309b3 inet_address: Add optional "preferred type" to lookup
Allows using a priority in the address-family dns lookup, i.e. prefer ipv4/ipv6 if available.
2019-08-06 08:32:10 +00:00
Calle Wilund
d3410f0e48 config: Add rpc_interface_prefer_ipv6 parameter
As already existing in scylla.yaml
2019-08-06 08:32:10 +00:00
Calle Wilund
0028cecb8e config: Add listen_interface_prefer_ipv6 parameter
As already existing in scylla.yaml.
https://github.com/apache/cassandra/blob/cassandra-3.11/conf/cassandra.yaml#L622
2019-08-06 08:32:10 +00:00
Calle Wilund
39d18178eb config.cc: Fix enable_ipv6_dns_lookup actual param name
When adding option (and iterating through config refactoring)
the member name and the config param name got out of sync
2019-08-06 08:32:09 +00:00
Calle Wilund
298da3fc4b api/storage_service: Add "sstable_info" command
Assembles information and attributes of sstables in one or more
column families.

v2:
* Use (not really legal) nested "type" in json
* Rename "table" param to "cf" for consistency
* Some comments on data sizes
* Stream result to avoid huge string allocations on final json
2019-08-06 08:14:15 +00:00
Calle Wilund
95a8ff12e7 sstables/compress: Make compressor pointer accessible from compression info 2019-08-06 07:07:44 +00:00
Calle Wilund
d15c63627c sstables.hh: Add attribute description API to file extension 2019-08-06 07:07:44 +00:00
Calle Wilund
4c67d702c2 sstables.hh: Add compression component accessor 2019-08-06 07:07:44 +00:00
Calle Wilund
770f912221 sstables.hh: Make "has_component" public 2019-08-06 07:07:44 +00:00
Avi Kivity
b77c4e68c2 Merge "Add Zstandard compression #4802" from Kamil
"
This adds the option to compress sstables using the Zstandard algorithm
(https://facebook.github.io/zstd/).

To use, pass 'sstable_compression': 'org.apache.cassandra.io.compress.ZstdCompressor'
to the 'compression' argument when creating a table.
You can also specify a 'compression_level' (default is 3). See Zstd documentation for the available
compression levels.

Resolves #2613.

This PR also fixes a bug in sstables/compress.cc, where chunk length in bytes
was passed to the compressor as chunk length in kilobytes. Fortunately,
none of the compressors implemented until now used this parameter.

Example usage (assuming there exists a keyspace 'a'):

    create table a.a (a text primary key, b int) with compression = {'sstable_compression': 'org.apache.cassandra.io.compress.ZstdCompressor', 'compression_level': 1, 'chunk_length_in_kb': '64'};

Notes:

 1. The code uses an external dependency: https://github.com/facebook/zstd. Since I'm using "experimental" features of the library (using my own allocated memory to store the compression/decompression contexts), according to the library's documentation we need to link it statically (https://github.com/facebook/zstd/blob/dev/lib/zstd.h#L63). I added a git submodule.
 2. The compressor performs some dynamic allocations. Depending on the specified chunk length and/or compression level the allocations might be big and seastar throws warnings. But with reasonable chunk length sizes it should be OK.
 3. It doesn't yet provide an option to train it with dictionaries, but that should be easy to add in another commit.
"

* 'zstd' of https://github.com/kbr-/scylla:
  Configure: rename seastar_pool to submodule_pool, add more submodules to the pool
  Add unit tests for Zstd compression
  Enable tests that use compressed sstable files
  Add ZStandard compression
  Fix the value of the chunk length parameter passed to compressors
2019-08-05 16:29:27 +03:00
Botond Dénes
23cc6d6fb2 make_flat_mutation_reader_from_fragments: reader: silence discarded future warning
The fragment reader calls `fast_forward_to()` from its constructor to
discard fragments that fall outside the query range. Mmove the
the fast-forward code in to an internal void returning method, and call
that from both the constructor and `fast_forward_to()`, to avoid a
warning on a discarded future<>.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190801133942.10744-1-bdenes@scylladb.com>
2019-08-05 16:21:50 +03:00
Kamil Braun
3a0308f76f Configure: rename seastar_pool to submodule_pool, add more submodules to the pool
Signed-off-by: Kamil Braun <kbraun@scylladb.com>
2019-08-05 14:55:56 +02:00
Kamil Braun
c3c7c06e10 Add unit tests for Zstd compression
Signed-off-by: Kamil Braun <kbraun@scylladb.com>
2019-08-05 14:55:56 +02:00
Kamil Braun
8b58cdab0a Enable tests that use compressed sstable files
The files in tests/sstables/3.x/compressed/ were not used in the tests.
This commit:
- renames the directory to tests/sstables/3.x/lz4/,
- adds analogous directories and files for other compressors,
- adds tests using these files,
- does some minor refactoring.

Signed-off-by: Kamil Braun <kbraun@scylladb.com>
2019-08-05 14:55:56 +02:00
Kamil Braun
f14e6e73bb Add ZStandard compression
This adds the option to compress sstables using the Zstandard algorithm
(https://facebook.github.io/zstd/).
To use, pass 'sstable_compression': 'org.apache.cassandra.io.compress.ZstdCompressor'
to the 'compression' argument when creating a table.
You can also specify a 'compression_level'. See Zstd documentation for the available
compression levels.
Resolves #2613.

Signed-off-by: Kamil Braun <kbraun@scylladb.com>
2019-08-05 14:55:53 +02:00
Kamil Braun
7a61bcb021 Fix the value of the chunk length parameter passed to compressors
This commit also fixes a bug in sstables/compress.cc, where chunk length in bytes
was passed to the compressor as chunk length in kilobytes. Fortunately,
none of the compressors implemented until now used this parameter.

Signed-off-by: Kamil Braun <kbraun@scylladb.com>
2019-08-05 14:31:33 +02:00
Avi Kivity
95c0804731 Merge "Catch unclosed partition sstable write #4794" from Tomasz
"
Not emitting partition_end for a partition is incorrect. SStable
writer assumes that it is emitted. If it's not, the sstable will not
be written correctly. The partition index entry for the last partition
will be left partially written, which will result in errors during
reads. Also, statistics and sstable key ranges will not include the
last partition.

It's better to catch this problem at the time of writing, and not
generate bad sstables.

Another way of handling this would be to implicitly generate a
partition_end, but I don't think that we should do this. We cannot
trust the mutation stream when invariants are violated, we don't know
if this was really the last partition which was supposed to be
written. So it's safer to fail the write.

Enabled for both mc and la/ka.

Passing --abort-on-internal-error on the command line will switch to
aborting instead of throwing an exception.

The reason we don't abort by default is that it may bring the whole
cluster down and cause unavailability, while it may not be necessary
to do so. It's safer to fail just the affected operation,
e.g. repair. However, failing the operation with an exception leaves
little information for debugging the root cause. So the idea is that the
user would enable aborts on only one of the nodes in the cluster to
get a core dump and not bring the whole cluster down.
"

* 'catch-unclosed-partition-sstable-write' of https://github.com/tgrabiec/scylla:
  sstables: writer: Validate that partition is closed when the input mutation stream ends
  config, exceptions: Add helper for handling internal errors
  utils: config_file: Introduce named_value::observe()
2019-08-04 15:18:31 +03:00
Asias He
3b39a59135 storage_service: Replicate and advertise tokens early in the boot up process
When a node is restarted, there is a race between gossip starting (other
nodes will mark this node up again and send requests) and the tokens being
replicated to other shards. Here is an example:

- n1, n2
- n2 is down, n1 think n2 is down
- n2 starts again, n2 starts gossip service, n1 thinks n2 is up and sends
  reads/writes to n2, but n2 hasn't replicated the token_metadata to all
  the shards.
- n2 complains:
  token_metadata - sorted_tokens is empty in first_token_index!
  token_metadata - sorted_tokens is empty in first_token_index!
  token_metadata - sorted_tokens is empty in first_token_index!
  token_metadata - sorted_tokens is empty in first_token_index!
  token_metadata - sorted_tokens is empty in first_token_index!
  token_metadata - sorted_tokens is empty in first_token_index!
  storage_proxy - Failed to apply mutation from $ip#4: std::runtime_error
  (sorted_tokens is empty in first_token_index!)

The code path looks like below:

0 storage_service::init_server
1    prepare_to_join()
2          add gossip application state of NET_VERSION, SCHEMA and so on.
3         _gossiper.start_gossiping().get()
4    join_token_ring()
5           _token_metadata.update_normal_tokens(tokens, get_broadcast_address());
6           replicate_to_all_cores().get()
7           storage_service::set_gossip_tokens() which adds the gossip application state of TOKENS and STATUS

The race talked above is at line 3 and line 6.

To fix, we can replicate the token_metadata early after it is filled
with the tokens read from system table before gossip starts. So that
when other nodes think this restarting node is up, the tokens are
already replicated to all the shards.

In addition, this patch also fixes the issue that other nodes might see
a node miss the TOKENS and STATUS application state in gossip if that
node failed in the middle of a restarting process, i.e., it is killed
after line 3 and before line 7. As a result we could not replace the
node.

Tests: update_cluster_layout_tests.py
Fixes: #4709
Fixes: #4723
2019-08-04 15:18:31 +03:00
Avi Kivity
aebb9bd755 Merge "tests/mutation_source_test: pass query time to populate" from Botond
"
Altough 733c68cb1 made sure to synchronize the query time used for
compaction happening in the mutation_source_test suite and that
happening in the `flat_mutation_assertions` class, there remained
another hidden compaction that potentially could use a different
timestamp and hence produce false positive test failures. This was
hastily fixed by cea3338e3, by just increasing the TTL of cells, thus
avoiding possible differences in compaction output. This mini-series is
the proper fix to this problem. It passes a query time to the populate
function, allowing the users of the mutation source test suite to
forward it to any compaction they might be doing on the data. The quick
fix is reverted in favor of the proper fix.

Refs: #4747
"

* 'mutation_source_tests_proper_ttl_fix/v1' of https://github.com/denesb/scylla:
  Revert "tests/mutation_source_tests: generate_mutation_sets() use larger ttl"
  tests/sstable_mutation_test: test_sstable_conforms_to_mutation_source: use query_time
  tests/mutation_source_test: add populate_fn overload with query_time
2019-08-04 15:18:31 +03:00
Tomasz Grabiec
43c7144133 sstables: writer: Validate that partition is closed when the input mutation stream ends
Not emitting partition_end for a partition is incorrect. Sstable
writer assumes that it is emitted. If it's not, the sstable will not
be written correctly. The partition index entry for the last partition
will be left partially written, which may result in errors during
reads. Also, statistics and sstable key ranges will not include the
last partition.

It's better to catch this problem at the time of writing, and not
generate bad sstables.

Another way of handling this would be to implicitly generate a
partition_end, but I don't think that we should do this. We cannot
trust the mutation stream when invariants are violated, we don't know
if this was really the last partition which was supposed to be
written. So it's safer to fail the write.

Enabled for both mc and la/ka.
2019-08-02 11:13:54 +02:00
Tomasz Grabiec
bf70ee3986 config, exceptions: Add helper for handling internal errors
The handler is intended to be called when internal invariants are
violated and the operation cannot safely continue. The handler either
throws (default) or aborts, depending on configuration option.

Passing --abort-on-internal-error on the command line will switch to
aborting.

The reason we don't abort by default is that it may bring the whole
cluster down and cause unavailability, while it may not be necessary
to do so. It's safer to fail just the affected operation,
e.g. repair. However, failing the operation with an exception leaves
little information for debugging the root cause. So the idea is that the
user would enable aborts on only one of the nodes in the cluster to
get a core dump and not bring the whole cluster down.
2019-08-02 11:13:54 +02:00
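A minimal sketch of such a helper, assuming a single global flag (the actual code wires the flag to the --abort-on-internal-error config option via named_value::observe(), so the names here are illustrative):

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>

// Sketch only, not the real implementation. The default throws so that only
// the affected operation fails; flipping the flag aborts instead, leaving a
// core dump for debugging on the one node where it was enabled.
inline bool abort_on_internal_error = false;

[[noreturn]] inline void on_internal_error(const std::string& msg) {
    if (abort_on_internal_error) {
        std::abort();
    }
    throw std::runtime_error("internal error: " + msg);
}
```

Call sites invoke this whenever an internal invariant is found violated and the operation cannot safely continue.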
Tomasz Grabiec
61a9cfbfa9 utils: config_file: Introduce named_value::observe() 2019-08-02 11:13:53 +02:00
Avi Kivity
093d2cd7e5 reconcilable_result: use chunked_vector to hold partitions
Usually, a reconcilable_result holds very few partitions (1 is common),
since the page size is limited by 1MB. But if we have paging disabled or
if we are reconciling a range full of tombstones, we may see many more.
This can cause large allocations.

Change to chunked_vector to prevent those large allocations, as they
can be quite expensive.

Fixes #4780.
2019-08-01 18:49:13 +03:00
Avi Kivity
eaa9a5b0d7 utils::chunked_vector: add rbegin() and related iterators
Needed as an std::vector replacement.
2019-08-01 18:39:47 +03:00
Avi Kivity
df6faae980 utils: chunked_vector: make begin()/end() const correct
begin() of a const vector should return a const_iterator, to avoid
giving the caller the ability to mutate it.

This slipped through since iterator's constructor does a const_cast.

Noticed by code inspection.
2019-08-01 18:38:53 +03:00
Botond Dénes
0b748bb8fe Revert "tests/mutation_source_tests: generate_mutation_sets() use larger ttl"
This reverts commit cea3338e38.

The above was a quick fix to allow the tests to pass, there is a proper
fix now.
2019-08-01 13:05:46 +03:00
Botond Dénes
ac91f1f6b8 tests/sstable_mutation_test: test_sstable_conforms_to_mutation_source: use query_time
Use the query_time passed in to the populate function and forward it to
the sstable constructor, so that the compaction happening during sstable
write uses the same query time that any compaction done by the mutation
source test suite does.
2019-08-01 13:04:21 +03:00
Botond Dénes
ce1ed2cb70 tests/mutation_source_test: add populate_fn overload with query_time
So tests that do compaction can pass the query_time they used for it to
clients that do some compaction themselves, making sure all compactions
happen with the same query time, avoiding false positive test failures.
2019-08-01 13:03:03 +03:00
Vlad Zolotarov
15eaf2fd8e dist: scylla_util.py: get_mode_cpuset(): don't print false-alarm error messages
Don't let perftune.py print a false-alarm error message when we calculate
a compute CPU set for tuning modes.

This may happen when we calculate a CPU set for non-MQ tuning modes on
small systems on which these modes are forbidden because they would
result in a zero CPU set, e.g. sq_split on a system with a single
physical core.

We are going to utilize the newly introduced --get-cpu-mask-quiet
execution mode added to seastar/script/perftune.py by the
"perftune.py: introduce --get-cpu-mask-quiet" series, which returns a
zero CPU set if that's what it turns out to be, instead of exiting with
an error as --get-cpu-mask would do in such a case.

The rest of scylla_util.py logic is going to handle a zero CPU set
returned by get_mode_cpuset() correctly.

Fixes #4211
Fixes #4443

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <20190731212901.9510-1-vladz@scylladb.com>
2019-08-01 11:14:39 +03:00
Botond Dénes
339be3853d foreign_reader: silence warning about discarded future
And add a comment explaining why this is fine.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190801062234.69081-1-bdenes@scylladb.com>
2019-08-01 10:11:24 +03:00
Avi Kivity
47b0f40d27 Merge "introduce metrics for non-local queries" from Konstantin
"
A fix for #4338 "storage_proxy add a counter for cql requests
that arrived to a non replica"

Such requests should be tracked since forwarding them to a correct
replica can create a lot of network noise and incur significant performance
penalty.

The current metrics are considered insufficient after the introduction
of heat-weighted load balancing.
"

Fixes #4388.

* 'gh-4338' of https://github.com/kostja/scylla:
  metrics: introduce a metric for non-local reads
  metrics: account writes forwarded by a coordinator in an own metric.
2019-08-01 10:09:33 +03:00
Avi Kivity
77686ab889 Merge "Make SSTable cleanup run aware" from Raphael
"
Fixes #4663.
Fixes #4718.
"

* 'make_cleanup_run_aware_v3' of https://github.com/raphaelsc/scylla:
  tests/sstable_datafile_test: Check cleaned sstable is generated with expected run id
  table: Make SSTable cleanup run aware
  compaction: introduce constants for compaction descriptor
  compaction: Make it possible to config the identifier of the output sstable run
  table: do not rely on undefined behavior in cleanup_sstables
2019-07-31 19:10:22 +03:00
Botond Dénes
a41e8f0bcf query::consume_page: move away from variadic future
Require the `consumer` to return 0 or 1 value in its future. Update all
downstream code.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190731140440.57295-1-bdenes@scylladb.com>
2019-07-31 18:49:47 +03:00
Avi Kivity
320fd2be60 Update seastar submodule
* seastar 3f88e9068b...d90834443c (12):
  > Print warning when somaxconn lower than backlog parameter used for listen()
  > Merge "perftune.py: introduce --get-cpu-mask-quiet" from Vlad
  > seastar-json2code: Handle "$ref"-usage for nested object types properly
  > Make future [[nodiscard]]
  > Allow pass listen_options to http_server::listen
  > Handle EPOLLHUP and EPOLLERR from epoll explicitly
  > reactor: fix false positives in the stall detector due to large task queue
  > Merge "Small asan related improvements" from Rafael
  > thread: reduce allocations during context switch
  > thread: remove deprecated thread_scheduling_group and its unit test
  > reactor: make _polls to be non atomic
  > reactor: remove unused _tasks_processed variable
2019-07-31 18:30:10 +03:00
Takuya ASADA
60ec8b2a04 install.sh: install everything when --pkg is not specified
In the previous commit ac9b115a8f, install.sh required specifying a single package using --pkg; there was no way to select all.
It should select all packages when running install.sh without --pkg.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190731013245.5857-1-syuu@scylladb.com>
2019-07-31 16:43:57 +03:00
Asias He
5d3e4d7b73 messaging_service: Check if messaging_service is stopped before get_rpc_client
get_rpc_client assumes the messaging_service is not stopped. We should check
is_stopping() before we call get_rpc_client.

We do such check in existing code, e.g., send_message and friends. Do
the same check in the newly introduced
make_sink_and_source_for_stream_mutation_fragments() and friends for row
level repair.

Fixes: #4767
2019-07-31 11:44:57 +03:00
Avi Kivity
74349bdf7e Merge "Partially devirtualize CQL restrictions" from Piotr
"
This series is a batch of first small steps towards devirtualizing CQL restrictions:
 - one artificial parent class in the hierarchy is removed: abstract_restriction
 - the following functions are devirtualized:
    * is_EQ()
    * is_IN()
    * is_slice()
    * is_contains()
    * is_LIKE()
    * is_on_token()
    * is_multi_column()

Future steps can involve the following:
 - introducing a std::variant of restriction targets: it's either a column
   or a vector of columns
 - introducing a std::variant of restriction values: it's one of:
   {term, term_slice, std::vector<term>, abstract_marker}

The steps above will allow devirtualizing most of the remaining virtual functions
in favor of std::visit. They will also reduce the number of subclasses,
e.g. what's currently `token_restriction::IN_with_values` can be just an instance
of `restriction`, knowing that it's on a token, having a target of std::vector<column>
and a value of std::vector<term>.

Tests: unit(dev), dtest: cql_tests, cql_additional_tests
"

* 'refactor_restrictions_2' of https://github.com/psarna/scylla:
  cql3: devirtualize is_on_token()
  cql3: devirtualize is_multi_column()
  cql3: devirtualize is_EQ, is_IN, is_contains, is_slice, is_LIKE
  tests: add enum_set adding case
  cql3: allow adding enum_sets
  cql3: remove abstract_restriction class
2019-07-31 11:44:57 +03:00
Vlad Zolotarov
9df53b8bca configure.py: ignore 'thrift -version' exit code
(At least) on Ubuntu 19 'thrift -version' prints the expected
string but its exit status is non-zero:

$ thrift -version
Thrift version 0.9.1
$ echo $?
1

We don't really care about the exit status but rather about the printed
version string. If there is some problem with the command,
e.g. it's missing, the printed string is not going to be as expected
anyway - so let's verify that explicitly by checking the format of the
returned string in that case.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <20190722211729.24225-1-vladz@scylladb.com>
2019-07-31 11:44:57 +03:00
Botond Dénes
cea3338e38 tests/mutation_source_tests: generate_mutation_sets() use larger ttl
Currently all cells generated by this method use a ttl of 1. This
causes test flakiness as tests often compact the input and output
mutations to weed out artificial differences between them. If this
compaction is not done with the exact same query time, then some cells
will be expired in one compaction but not in the other.
733c68cb1 attempted to solve this by passing the same query time to
`flat_mutation_reader_assertions::produce_compacted()` as well as
`mutation_partition::compact_for_query()` when compacting the input
mutation. However a hidden compaction spot remained: the ka/la sstable
writer also does some compaction, and for this it uses the time point
passed to the `sstable` constructor, which defaults to
`gc_clock::now()`. This leads to false positive failures in
`sstable_mutation_test.cc`.
At this point I don't know what the original intent was behind this low
`ttl` value. To solve the immediate problem of the tests failing, I
increased it. If it turns out that this `ttl` value has a good reason,
we can do a more involved fix, of making sure all sstables written also
get the same query time as that used for the compaction.

Fixes: #4747

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190731081522.22915-1-bdenes@scylladb.com>
2019-07-31 11:44:57 +03:00
Piotr Sarna
2f65144a20 cql3: devirtualize is_on_token()
Instead of being a virtual function, is_on_token leverages
the existing enum inside the `restriction` class.
2019-07-29 17:18:50 +02:00
Piotr Sarna
68aa42c545 cql3: devirtualize is_multi_column()
Instead of being a virtual function, is_multi_column leverages
an enum.
2019-07-29 17:18:50 +02:00
Piotr Sarna
83fbfe5a4f cql3: devirtualize is_EQ, is_IN, is_contains, is_slice, is_LIKE
Instead of virtual functions, operation for each restriction
is determined by an enum value it stores.
2019-07-29 17:18:49 +02:00
Piotr Sarna
e9798354ae tests: add enum_set adding case 2019-07-29 17:15:51 +02:00
Piotr Sarna
989c31f68b cql3: allow adding enum_sets
Enum set can now be added to another enum set in order to create
a sum of both.
2019-07-29 17:15:51 +02:00
Piotr Sarna
5e06801f12 cql3: remove abstract_restriction class
All restrictions inherit from `abstract_restriction` class,
which has only one parent class: `restriction`. To simplify the
inheritance tree, `restriction` and `abstract_restriction`
are merged into one class named `restriction`.
2019-07-29 15:54:39 +02:00
Botond Dénes
733c68cb13 tests: flat_reader_assertions::produces_compacted(): add query_time param
`produces_compacted()` is usually used in tandem with another
compaction done on the expected output (`m` param). This is usually done
so that even though the reader works with an uncompacted stream, the
checking of the result will not fail due to insignificant
changes to the data, e.g. expired collection cells dropped while merging
two collections. Currently, the two compactions, the one inside
`produce_compacted()` and the one done by the caller uses two separate
calls to `gc_clock::now()` to obtain the query time. This can lead to
off-by-one errors in the two query times and subsequently artificial
differences between the two compacted mutations, ultimately failing the
test due to a false-positive.
To prevent this allow callers to pass in a query time, the same they
used to compact the input mutation (`m`).

This solves another source of flakiness in unit tests using the mutation
source test suite.

Refs: #4695
Fixes: #4747
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190726144032.3411-1-bdenes@scylladb.com>
2019-07-28 10:59:50 +03:00
Botond Dénes
f215286525 tests/mutation_reader_tests: move away from variadic futures
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190724101005.19126-1-bdenes@scylladb.com>
2019-07-27 13:21:24 +03:00
Botond Dénes
0f30bc0004 mutation_reader: move away from variadic futures
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190724102246.20450-1-bdenes@scylladb.com>
2019-07-27 13:21:24 +03:00
Botond Dénes
6742c77229 scylla-gdb.py: fix scylla_ptr
Broken since b3adabda2.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190726140532.124406-1-bdenes@scylladb.com>
2019-07-27 13:21:24 +03:00
Avi Kivity
b272db368f sstable: index_reader: close index_reader::reader more robustly
If we had an error while reading, then we would have failed to close
the reader, which in turn can cause memory corruption. Make the
closing more robust by using then_wrapped (that doesn't skip on
exception) and log the error for analysis.

Fixes #4761.
2019-07-26 14:26:04 +02:00
Avi Kivity
fcf3195e54 Update seastar submodule
* seastar c1be3c912f...3f88e9068b (3):
  > reactor: improve handling of connect storms
  > json: Make date formatter use RFC8601/RFC3339 format
  > reactor: fix deadlock of stall detector vs dlopen

Fixes #4759.
2019-07-25 18:29:54 +03:00
Takuya ASADA
ac9b115a8f dist/debian: use install.sh on Debian
Currently, install.sh is only used for building .rpm; we have a similar build script
under dist/debian, which sometimes becomes inconsistent with install.sh.
Since most of the package build process is the same, we should share install.sh between
the .rpm and .deb package build processes.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190725123207.2326-1-syuu@scylladb.com>
2019-07-25 18:22:42 +03:00
Botond Dénes
6dd8c4da83 test_multishard_combining_reader_non_strictly_monotonic_positions: use the same deletion_time for tombstones
Across all calls to `make_fragments_with_non_monotonic_positions()`, to
prevent off-by-one errors between the separately generated test input
and expected output. This problem was already supposed to be fixed by
5f22771ea8 but for some reason that only
used the same deletion time *inside* a single call, which will still
fall short in some cases.
This should hopefully fix this problem for good.

Refs: #4695
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190724073240.125975-1-bdenes@scylladb.com>
2019-07-25 12:37:34 +02:00
Kamil Braun
148d4649d6 Add option to create a XUnit output file for non-boost tests in test.py. (#4757)
If the user specifies an output file name using "--xunit=<filename>",
test.py will write the test results of non-boost tests to the file in the XUnit XML format.
Every boost test creates its own results file already.
Resolves #4680.

Signed-off-by: Kamil Braun <kbraun@scylladb.com>
2019-07-25 12:47:47 +03:00
Vlad Zolotarov
53cf90b075 ec2_snitch: properly build the AWS meta server address
Explicitly pass the port number of the AWS metadata server API
when creating a corresponding socket.

This patch fixes the regression introduced by 4ef940169f.

Fixes #4719

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2019-07-25 10:50:01 +03:00
Tomasz Grabiec
3af8431a40 Merge "compaction: allow collecting purged data" from Botond
compaction: allow collecting purged data

Allow the compaction initiator to pass an additional consumer that will
consume any data that is purged during the compaction process. This
allows the separate retention of these dead cells and tombstones until
some long-running process like compaction safely finishes. If the
process fails or is interrupted the purged data can be used to prevent
data resurrection.

This patch was developed to serve as the basis for a solution to #4531
but it is not a complete solution in and of itself.

This series is a continuation of the patch: "[PATCH v1 1/3] Introduce
Garbage Collected Consumer to Mutation Compactor" by Raphael S.
Carvalho <raphaelsc@scylladb.com>.

Refs: #4531

* https://github.com/denesb/scylla.git compaction_collect_purged_data/v8:
  Introduce compaction_garbage_collector interface
  collection_type_impl::mutation: compact_and_expire() add collector
    parameter
  row: add garbage_collector
  row_marker: de-inline compact_and_expire()
  row_marker: add garbage_collector
  Introduce Garbage Collected Consumer to Mutation Compactor
  tests: mutation_writer_test.cc/generate_mutations() ->
    random_schema.hh/generate_random_mutations()
  tests/random_schema: generate_random_mutations(): remove `engine`
    parameter
  tests/random_schema: add assert to make_clustering_key()
  tests/random_schema: generate_random_mutations(): allow customizing
    generated data
  tests: random_schema: futurize generate_random_mutations()
  random_schema: generate_random_mutations(): restore indentation
  data_model: extend ttl and expiry support
  tests/random_schema: generate_random_mutations(): generate partition
    tombstone
  random_schema: add ttl and expiry support
  tests/random: add get_bool() overload with random engine param
  random_schema: generate_random_mutations(): ensure partitions are
    unique
  tests: add unit tests for the data stream split in compaction
2019-07-23 17:12:28 +02:00
Avi Kivity
44b5878011 Merge "Fix possible stalls in row level repair" from Asias
"
After switching to rpc stream interface, we increased the row buffer
size. Code that works on the buffer without yielding can stall the reactor.

This series fixes the issue by futurizing the code or running it in a thread
and yielding.

Fixes: #4642
"

* 'repair_switch_to_rpc_stream_fix_stall' of https://github.com/asias/scylla:
  repair: Enable rpc stream in row level repair
  repair: Wrap with foreign_ptr to avoid cross cpu free
  repair: Futurize get_repair_rows_size and row_buf_size
  repair: Avoid calling get_repair_rows_size in get_sync_boundary
  repair: Futurize row_buf_csum
  repair: Yield inside get_set_diff
  repair: Use get_repair_rows_size helper in get_sync_boundary
  repair: Avoid stall in do_estimate_partitions_on_local_shard
  remove get_row_diff
  repair: Futurize get_row_diff to avoid stall
  repair: Fix possible stall in request_row_hashes
  repair: Allow default construct for repair_row
  repair: Remove apply_rows
  repair: Run get_row_diff_with_rpc_stream in a thread
  repair: Run get_row_diff_and_update_peer_row_hash_sets inside a thread
  repair: Run get_row_diff inside a thread
  repair: Add apply_rows_on_master_in_thread
  repair: Add apply_rows_on_follower
  repair: Futurize working_row_hashes
  repair: Remove get_full_row_hashes helper
2019-07-22 15:54:06 +03:00
Avi Kivity
9e630eb734 Update seastar submodule
* seastar 44a300cd50...c1be3c912f (9):
  > execution_stage: prevent unbounded growth
  > io queues: Add renaming functionality to io priority class
  > scheduling: Add rename functionality to scheduling groups
  > net: Add listen_backlog option for posix stack
  > future: deprecate variadic futures
  > include,tests: add workaround for missing guaranteed copy elision
  > core/dpdk_rte: handle 64+ cores
  > perftune: add a dry-run mode
  > build: support building dpdk on arm64

Fixes #4749.
2019-07-22 15:41:54 +03:00
Avi Kivity
e03c7003f1 toppartitions: fix race between listener removal and reads
Data listener reads are implemented as flat_mutation_readers, which
take a reference to the listener and then execute asynchronously.
The listener can be removed between the time when the reference is
taken and actual execution, resulting in a dangling pointer
dereference.

Fix by using a weak_ptr to avoid writing to a destroyed object. Note that writes
don't need protection because they execute atomically.

Fixes #4661.

Tests: unit (dev)
2019-07-22 13:26:18 +02:00
Avi Kivity
d730969278 Merge "make sure failure to create snapshots won't crash the node" from Glauber
"
Issue #4558 describes a situation in which failure to execute clearsnapshots
will hard crash the node. The problem is that clearsnapshots will internally use
lister::rmdir, which in turn has two in-tree users: clearing snapshots and clearing
temporary directories during sstable creation. The way it is currently
coded, it wraps the io functions in io_check, which means that failures
to remove the directory will crash the database.

We recently saw how benign failures crashed a database during
clearsnapshot: we had snapshot creation running in parallel, adding more
files to the directory that wasn't empty by the time of deletion. I
have also seen very often users add files to existing directories by
accident, which is another possibility to trigger that.

This patch removes the io_check from lister, and moves it to the caller
in which we want to be more strict. We still want to be strict about
the creation of temporary directories, since users shouldn't be touching
that in any way.

Also while working on that, I realized we have no tests for snapshots of
any kind in tree, so let's write some
"

* 'snapshots' of https://github.com/glommer/scylla:
  tests: add tests for snapshots.
  lister: don't crash the node on failure to remove snapshot
2019-07-22 11:09:23 +03:00
Rafael Ávila de Espíndola
636e2470b1 Always close commitlog files
We were using segment::_closed to decide whether _file was already
closed. Unfortunately they are not exactly the same thing. As far as
I understand it, segments can be closed and reused without actually
closing the file.

Found with a seastar patch that asserts on destroying an open
append_challenged_posix_file_impl.

Fixes #4745.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190721171332.7995-1-espindola@scylladb.com>
2019-07-22 10:08:57 +03:00
Vlad Zolotarov
5632c0776e tests: fix the compilation with fmt v5.3.0
Compilation fails with fmt release 5.3.0 when we print a bytes_view
using "{}" formatter.

The compiler's complaint is: "error: static assertion failed: mismatch
between char-types of context and argument"

Fix this by explicitly using to_hex() converter.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <20190716221231.22605-3-vladz@scylladb.com>
2019-07-21 16:42:54 +03:00
Nadav Har'El
db8d4a0cc6 Add computed columns
Merged patch series by Piotr Sarna:

This series introduces the concept of "computed" column, which represents
values not provided directly by the user, but computed on the fly -
possibly using other column values. It will be used in the future to
implement map value indexing, collection indexing, etc. Right now the only
use is the token column for secondary indexes - which is a column computed
from the base partition key value.

After this series, another one that depends on it and adds map value
indexing will be pushed.

Tests: unit(dev)

Piotr Sarna (14):
  schema: add computed info to column definition
  schema: add implementation of computing token column
  schema: allow marking columns as computed in schema builder
  service: add computed columns feature
  view: check for computed columns in view
  view: remove unused token_for function
  database: add fixing previous secondary index schemas
  tests: disable computed columns feature in schema change test
  tests: add schema change test regeneration comment
  db: add system_schema.computed_columns
  docs: init system_schema_keyspace.md with column computations
  tests: generate new test case for schema change + computed cols
  index: mark token column as 'computed' when creating mv
  tests: add checking computed columns in SI

 column_computation.hh                         |  63 ++++++++
 db/schema_features.hh                         |   4 +-
 db/schema_tables.hh                           |   4 +
 idl/frozen_schema.idl.hh                      |   1 +
 schema.hh                                     |  40 +++++
 schema_builder.hh                             |   4 +-
 schema_mutations.hh                           |  18 ++-
 service/storage_service.hh                    |   8 +
 view_info.hh                                  |   2 -
 database.cc                                   |   6 +-
 db/schema_tables.cc                           | 146 ++++++++++++++++--
 db/view/view.cc                               |  46 +++---
 index/secondary_index_manager.cc              |   2 +-
 schema.cc                                     |  58 ++++++-
 schema_mutations.cc                           |  14 +-
 service/storage_service.cc                    |   5 +
 tests/schema_change_test.cc                   |  63 ++++++--
 tests/secondary_index_test.cc                 |  28 ++++
 docs/system_schema_keyspace.md                |  40 +++++

 plus about 200 new test sstable files
2019-07-21 13:05:46 +03:00
Piotr Sarna
4d1eaf8478 tests: add checking computed columns in SI
The test case checks if token column generated for global indexing
is indeed only present in global indexes and is marked as a computed
column.
2019-07-19 11:58:42 +02:00
Piotr Sarna
a8f7d64a08 index: mark token column as 'computed' when creating mv
Secondary indexes use a computed token column to preserve proper
query ordering. This column is now marked as 'computed'.
2019-07-19 11:58:42 +02:00
Piotr Sarna
1c0ef5f9e9 tests: generate new test case for schema change + computed cols
The original "test_schema_digest_does_not_change" test case ensures
that schema digests will match for older nodes that do not support
all the features yet (including computed columns).
The additional case uses sstables generated after computed columns
are allowed, in order to make sure that the digest computed
including computed columns does not change spuriously as well.
2019-07-19 11:58:42 +02:00
Piotr Sarna
1e54752167 docs: init system_schema_keyspace.md with column computations
The documentation file for system_schema keyspace is introduced,
and its first entry describes the column_computation table.
2019-07-19 11:58:42 +02:00
Piotr Sarna
c1d5aef735 db: add system_schema.computed_columns
Information on which columns of a table are 'computed' is now kept
in system_schema.computed_columns system table.
2019-07-19 11:58:42 +02:00
Piotr Sarna
589200f5a2 tests: add schema change test regeneration comment
Schema change test might need regenerating every time a system table
is added. In order to save future developer's time on debugging this
test, a short description of that requirement is added.
2019-07-19 11:58:42 +02:00
Piotr Sarna
03ade01db7 tests: disable computed columns feature in schema change test
In order to make sure that old schema digest is not recomputed
and can be verified - computed columns feature is initially disabled
in schema_change_test.
The reason for that is as follows: running CQL test env assumes that
we are running the newest cluster with all features enabled. However,
the mere existence of some features might influence digest calculation.
So, in order for the existing test to work correctly, it should have
exactly the same set of cluster supported features as it had during
its creation. It used to be "all features", but now it's "all features
except computed columns". One can think of that as running a cluster
with some nodes not yet knowing what computed columns are, so they
are not taken into account when computing digests.
Additionally, a separate test case that takes computed column digest
into account will be generated and added in this series.
2019-07-19 11:58:42 +02:00
Piotr Sarna
17c323c096 database: add fixing previous secondary index schemas
If a schema was created before computed columns were implemented,
its token column may not have been marked as computed.
To remedy this, if no computed column is found, the schema
will be recreated.
The code will work correctly even without this patch in order to support
upgrading from legacy versions, but it's still important: it transforms
token columns from the legacy format to new computed format, which will
eventually (after a few release cycles) allow dropping the support for
legacy format altogether.
2019-07-19 11:58:42 +02:00
Piotr Sarna
3c5dd94306 view: remove unused token_for function
The function was only used once in code removed in this series.
2019-07-19 11:58:42 +02:00
Piotr Sarna
6a6871aa0e view: check for computed columns in view
Currently, having a 'computed' column in view update generation
indicates that token value needs to be generated and assigned to it.
2019-07-19 11:58:42 +02:00
Piotr Sarna
a0e02df36a service: add computed columns feature
Computed columns feature should be checked before creating
index schemas the new way - by adding computed column names
to system_schema.computed_columns.
2019-07-19 11:58:42 +02:00
Piotr Sarna
a1100e3737 schema: allow marking columns as computed in schema builder
In order to be able to transform legacy materialized view definitions,
builder is now able to mark an existing column as computed.
2019-07-19 11:58:41 +02:00
Piotr Sarna
65bf6d34fe schema: add implementation of computing token column
Computed column of 'token' type can now have its value computed.
2019-07-19 11:47:48 +02:00
Piotr Sarna
491b7a817f schema: add computed info to column definition
Some columns may represent not user-provided values, but ones computed
from other columns. Currently an example is token column used in secondary
indexes to provide proper ordering. In order to avoid hardcoding special
cases in execution stage, optional additional information for computed
columns is stored in column definition.
2019-07-19 11:47:46 +02:00
Tomasz Grabiec
7604980d63 database: Add missing partition slicing on streaming reader recreation
streaming_reader_lifecycle_policy::create_reader() was ignoring the
partition_slice passed to it and always creating the reader for the
full slice.

That's wrong because create_reader() is called when recreating a
reader after it's evicted. If the reader stopped in the middle of
partition we need to start from that point. Otherwise, fragments in
the mutation stream will appear duplicated or out of order, violating
assumptions of the consumers.

This was observed to result in repair writing incorrect sstables with
duplicated clustering rows, which results in
malformed_sstable_exception on read from those sstables.

Fixes #4659.

In v2:

  - Added an overload without partition_slice to avoid changing existing users which never slice

Tests:

  - unit (dev)
  - manual (3 node ccm + repair)

Backport: 3.1
Reviewed-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <1563451506-8871-1-git-send-email-tgrabiec@scylladb.com>
2019-07-18 18:35:28 +03:00
Asias He
64a4c0ede2 streaming: Do not open rpc stream connection if ranges are not relevant to a shard
Given a list of ranges to stream, stream_transfer_task will create a
reader with the ranges and create a rpc stream connection on all the shards.

When user provides ranges to repair with -st -et options, e.g.,
using scylla-manager, such ranges can belong to only one shard, repair
will pass such ranges to streaming.

As a result, only one shard will have data to send while the rpc stream
connections are created on all the shards, which can cause the kernel
run out of ports in some systems.

To mitigate the problem, do not open the connection if the ranges do not
belong to the shard at all.

Refs: #4708
2019-07-18 18:31:21 +03:00
Avi Kivity
51cff8ad23 Merge "Fix storage service for tests" from Botond
"
Fix another source of flakiness in mutation_reader_test. This one is caused by storage_service_for_tests lacking a config::broadcast_to_all_shards() call, triggering an invalid memory access (or SEGFAULT) when run on more than one shard.

Refs: #4695
"

* 'fix_storage_service_for_tests' of https://github.com/denesb/scylla:
  tests: storage_service_for_tests: broadcast config to all shards
  tests: move storage_service_for_tests impl to test_services.cc
2019-07-18 18:27:47 +03:00
Nadav Har'El
997b92a666 migration_manager: allow dropping table and all its views
The function announce_column_family_drop() drops (deletes) a base table
and all the materialized-views used for its secondary indexes, but not
other materialized views - if there are any, the operation refuses to
continue. This is exactly what CQL's "DROP TABLE" needs, because it is
not allowed to drop a table before manually dropping its views.

But there is no inherent reason why we can't support an operation
to delete a table and *all* its views - not just those related to indexes.
This patch adds such an option to announce_column_family_drop().
This option is not used by the existing CQL layer, but can be used
by other code automating operations programmatically without CQL.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190716150559.11806-1-nyh@scylladb.com>
2019-07-18 13:26:25 +02:00
Takuya ASADA
bd7d1b2d38 dist/common/systemd: change stop timeout sec to 900s
Currently scylla-server.service uses DefaultTimeoutStopSec = 90; if Scylla
is not able to shut down cleanly in 90 seconds we may have data corruption on the node.
Since we already set TimeoutStartSec = 900, we can use TimeoutSec to set both
TimeoutStartSec and TimeoutStopSec to 900.

See #4700

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190717095416.10652-1-syuu@scylladb.com>
2019-07-17 15:37:47 +03:00
Nadav Har'El
759752947b drop_index_statement: fix column_family()
All statement objects which derive from cf_statement, including
drop_index_statement, have a column_family() returning the name of the
column family involved in this statement. For most statements this is
known at the time of construction, because it is part of the statement,
but for "DROP INDEX", the user doesn't specify the table's name - just
the index name. So we need to override column_family() to find the
table name.

The existing implementation assert()ed that we can always find such
a table, but this is not true - for example, in a DROP INDEX with
"IF EXISTS", it is perfectly fine for no such table to exist. In this
case we don't want a crash, and not even an exception - it's fine that
we just return an empty table name.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190716180104.15985-1-nyh@scylladb.com>
2019-07-17 09:44:47 +03:00
Glauber Costa
be26cbd952 tests: add tests for snapshots.
While inspecting the snapshot code, I realized that we don't have any
tests for it. So I decided to add some.

Unfortunately I couldn't come up with a test of clearsnapshot reliably
failing to remove the directory: relying on create snapshot +
clearsnapshot is racy (not always happen), and other tricks that can be
used to reproduce this -- like creating a root-owned file inside the
snapshots directory -- is environment-dependent, and a bit ugly for unit
tests. Dtests would probably be a better place for that.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2019-07-16 13:35:53 -04:00
Glauber Costa
2008d982c3 lister: don't crash the node on failure to remove snapshot
lister::rmdir has two in-tree users: clearing snapshots and clearing
temporary directories during sstable creation. The way it is currently
coded, it wraps the io functions in io_check, which means that failures
to remove the directory will crash the database.

We recently saw how benign failures crashed a database during
clearsnapshot: we had snapshot creation running in parallel, adding more
files to the directory, so it wasn't empty by the time of deletion. I
have also often seen users add files to existing directories by
accident, which is another way to trigger this.

This patch removes the io_check from lister, and moves it to the caller
in which we want to be more strict. We still want to be strict about
the creation of temporary directories, since users shouldn't be touching
that in any way.

Fixes #4558

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2019-07-16 13:35:36 -04:00
Kamil Braun
4417e78125 Fix timestamp_type_impl::timestamp_from_string.
Now it accepts the 'z' or 'Z' timezone, denoting UTC+00:00.
Fixes #4641.

Signed-off-by: Kamil Braun <kbraun@scylladb.com>
2019-07-16 19:16:56 +03:00
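The normalization described in this fix can be sketched in Python (illustration only; `parse_timestamp` is a hypothetical stand-in for `timestamp_type_impl::timestamp_from_string`):

```python
from datetime import datetime

def parse_timestamp(s: str) -> datetime:
    # Accept a trailing 'z' or 'Z' as an alias for UTC+00:00 by
    # normalizing it to an explicit offset before parsing.
    if s and s[-1] in ('z', 'Z'):
        s = s[:-1] + '+00:00'
    return datetime.fromisoformat(s)
```

With this, `"2019-07-16T19:16:56Z"` and the lowercase `z` variant parse to the same UTC instant as the explicit `+00:00` form.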
Asias He
722ab3bb65 repair: Log repair id in check_failed_ranges
Add the word `id` before the repair id in the log. It makes it
easier to figure out what the number stands for.
2019-07-16 19:10:19 +03:00
Avi Kivity
43690ecbdf Merge "Fix disable_sstable_write synchronization with on_compaction_completion" from Benny
"
disable_sstable_write needs to acquire _sstable_deletion_sem to properly synchronize
with background deletions done by on_compaction_completion to ensure no sstables will
be created or deleted during reshuffle_sstables after
storage_service::load_new_sstables disables sstable writes.

Fixes #4622

Test: unit(dev), nodetool_additional_test.py migration_test.py
"

* 'scylla-4622-fix-disable-sstable-write' of https://github.com/bhalevy/scylla:
  table: document _sstables_lock/_sstable_deletion_sem locking order
  table: disable_sstable_write: acquire _sstable_deletion_sem
  table: uninline enable_sstable_write
  table: reshuffle_sstables: add log message
2019-07-16 19:06:58 +03:00
Amnon Heiman
399d79fc6f init: do not allow replace-address for seeds
If a node is a seed node, it cannot be started with
replace-address-first-boot or the replace-address flag.

The issue is that as a seed node it will generate new tokens instead of
replacing the existing ones the user expects it to replace when supplying
the flags.

This patch will throw a bad_configuration_error exception
in this case.

Fixes #3889

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2019-07-16 18:53:19 +03:00
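The check can be sketched in Python (illustration only; the config keys and helper name here are hypothetical stand-ins, and the real patch throws bad_configuration_error in C++):

```python
def validate_replace_flags(config: dict) -> None:
    # A seed node would generate new tokens instead of taking over the
    # replaced node's tokens, so replacing via a seed is rejected up front.
    is_seed = config.get('listen_address') in config.get('seeds', [])
    wants_replace = bool(config.get('replace_address')
                         or config.get('replace_address_first_boot'))
    if is_seed and wants_replace:
        raise ValueError('replace-address may not be used on a seed node')
```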
Calle Wilund
dbc3499fd1 server: Fix cql notification inet address serialization
Fixes #4717

A bug in the ipv6 support series caused inet_address serialization
to include an additional "size" parameter in the address chunk.

Message-Id: <20190716134254.20708-1-calle@scylladb.com>
2019-07-16 16:51:59 +03:00
Botond Dénes
b40cf1c43d tests: storage_service_for_tests: broadcast config to all shards
Due to recent changes to the config subsystem, configuration has to be
broadcast to all shards if one wishes to use it on them. The
`storage_service_for_tests` has a `sharded<gms::gossiper>` member, which
reads config values on initialization on each shard, causing a crash as
the configuration was initialized only on shard 0. Add a call to
`config::broadcast_to_all_shards()` to ensure all shards have access to
valid config values.
2019-07-16 10:37:17 +03:00
Botond Dénes
fc9f46d7c1 tests: move storage_service_for_tests impl to test_services.cc
Let's make it easier to find.
2019-07-16 10:36:49 +03:00
Raphael S. Carvalho
7180731d43 tests/sstable_datafile_test: Check cleaned sstable is generated with expected run id
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2019-07-15 23:39:50 -03:00
Raphael S. Carvalho
332c2ff710 table: Make SSTable cleanup run aware
The cleanup procedure will move any sstable out of its sstable run
because sstables are cleaned up individually and they end up receiving
a new run identifier, meaning a table may potentially end up with a
new sstable run for each of the sstables cleaned.

SSTable cleanup needs to be run aware, so that the run structure is
not messed up after the operation is done. Given that only some of the
fragments composing an sstable run may need cleanup, it's better to keep
them in their original sstable run.

Fixes #4663.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2019-07-15 23:39:47 -03:00
Raphael S. Carvalho
8c97e0e43e compaction: introduce constants for compaction descriptor
Make it easier for users, and also avoid duplicating knowledge
about descriptor defaults across the codebase.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2019-07-15 23:39:44 -03:00
Raphael S. Carvalho
a1db29e705 compaction: Make it possible to configure the identifier of the output sstable run
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2019-07-15 23:39:38 -03:00
Raphael S. Carvalho
0e732ed1cf table: do not rely on undefined behavior in cleanup_sstables
It shouldn't rely on argument evaluation order, which is undefined behavior.

Fixes #4718.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2019-07-15 23:39:22 -03:00
Paweł Dziepak
060e3f8ac2 mutation_partition: verify row::append_cell() precondition
row::append_cell() has a precondition that the new cell column id needs
to be larger than that of any other already existing cell. If this
precondition is violated the row will end up in an invalid state. This
patch adds assertion to make sure we fail early in such cases.
2019-07-15 23:25:06 +02:00
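A minimal sketch of the precondition (Python, illustrative only; the real row is a C++ structure):

```python
class Row:
    """Cells are kept ordered by column id; appends happen only at the tail."""

    def __init__(self):
        self._cells = []  # list of (column_id, cell) pairs

    def append_cell(self, column_id, cell):
        # Precondition: the new cell's column id must be larger than that
        # of every existing cell, otherwise the row becomes invalid.
        assert not self._cells or column_id > self._cells[-1][0], \
            "append_cell: column id not larger than existing cells"
        self._cells.append((column_id, cell))
```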
Botond Dénes
5f22771ea8 tests/mutation_reader_test stabilize test_multishard_combining_reader_non_strictly_monotonic_positions
Currently the
test_multishard_combining_reader_non_strictly_monotonic_positions is
flaky. The test is somewhat unconventional, in that it doesn't use the
same instance of data as the input to the test and as it's expected
output, instead it invokes the method which generates this data
(`make_fragments_with_non_monotonic_positions()`) twice, first to
generate the input, and a secondly to generate the expected output. This
means that the test is prone to any deviation in the data generated by
said method. One such deviation, discovered recently, is that the method
doesn't explicitly specify the deletion time of the generated range
tombstones. This results in this deletion time sometimes differing
between the test input and the expected output. Solve by explicitly
passing the same deletion time to all created range tombstones.

Refs: #4695
2019-07-15 23:24:16 +02:00
Tomasz Grabiec
14700c2ac4 Merge "Fix the system.size_estimates table" from Kamil
Fixes a segfault when querying for an empty keyspace.

Also, fixes an infinite loop on smp > 1. Queries to
system.size_estimates table which are not single-partition queries
caused Scylla to go into an infinite loop inside
multishard_combining_reader::fill_buffer. This happened because
multishard_combinind_reader assumes that shards return rows belonging
to separate partitions, which was not the case for
size_estimates_mutation_reader.

Fixes #4689.
2019-07-15 22:09:30 +02:00
Asias He
8774adb9d0 repair: Avoid deadlock in remove_repair_meta
Start n1, n2
Create ks with rf = 2
Run repair on n2
Stop n2 in the middle of repair
n1 will notice n2 is DOWN, gossip handler will remove repair instance
with n2 which calls remove_repair_meta().

Inside remove_repair_meta(), we have:

```
1        return parallel_for_each(*repair_metas, [repair_metas] (auto& rm) {
2            return rm->stop();
3        }).then([repair_metas, from] {
4            rlogger.debug("Removed all repair_meta for single node {}", from);
5        });
```

Since 3.1, we start 16 repair instances in parallel, which will create 16
readers. The reader semaphore has only 10 units.

At line 2, it calls

```
6    future<> stop() {
7       auto gate_future = _gate.close();
8       auto writer_future = _repair_writer.wait_for_writer_done();
9       return when_all_succeed(std::move(gate_future), std::move(writer_future));
10    }
```

The gate protects the reader while it reads data from disk:

```
11 with_gate(_gate, [] {
12   read_rows_from_disk
13        return _repair_reader.read_mutation_fragment() --> calls reader() to read data
14 })
```

So line 7 won't return until all the 16 readers return from the call of
reader().

The problem is, the reader won't release the reader semaphore until the
reader is destroyed!
So, even if 10 out of the 16 readers have finished reading, they won't
release the semaphore. As a result, the stop() hangs forever.

To fix in the short term, we can delete the reader, i.e., drop the
repair_meta object once it is stopped.

Refs: #4693
2019-07-15 21:51:57 +02:00
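The accounting behind the deadlock can be sketched with simple arithmetic (Python, illustrative only):

```python
SEMAPHORE_UNITS = 10   # reader semaphore capacity
READERS = 16           # repair instances started in parallel

# Each reader holds one unit from creation until it is *destroyed*.
# Destruction only happens after stop() returns, and stop() waits for
# all readers, so finished readers never return their units:
running = min(READERS, SEMAPHORE_UNITS)   # 10 readers acquire a unit
units_free = SEMAPHORE_UNITS - running    # 0 units left even after they finish
waiting = READERS - running               # 6 readers blocked forever
assert (units_free, waiting) == (0, 6)    # hence stop() hangs
```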
Benny Halevy
0e4567c881 table: document _sstables_lock/_sstable_deletion_sem locking order
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-07-15 19:20:35 +03:00
Botond Dénes
135c84c29a tests: add unit tests for the data stream split in compaction 2019-07-15 17:38:00 +03:00
Botond Dénes
719ad51bea random_schema: generate_random_mutations(): ensure partitions are unique
Duplicate partitions can appear as a result of the same partition key
generated more than once. For now we simply remove any duplicates. This
means that in some circumstances fewer partitions will be generated
than asked for.
2019-07-15 17:38:00 +03:00
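The deduplication can be sketched in Python (illustrative; the generator and key type are placeholders):

```python
def generate_unique_partitions(generate_key, count):
    # The key generator may produce the same partition key more than
    # once; drop duplicates, accepting that fewer than `count`
    # partitions may be returned.
    seen = set()
    partitions = []
    for _ in range(count):
        key = generate_key()
        if key not in seen:
            seen.add(key)
            partitions.append(key)
    return partitions
```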
Botond Dénes
eaedbed069 tests/random: add get_bool() overload with random engine param 2019-07-15 17:38:00 +03:00
Botond Dénes
057f9aa655 random_schema: add ttl and expiry support
When generating data, the user can now also generate ttls and
expiry for all generated atoms. This happens in a controlled way, via a
generator functor, very similar to how the timestamps are generated.
This functor is also used by `random_schema` to generate `deletion_time`
for all tombstones, so the user now has full control of when all of the
atoms can be GC'd.
2019-07-15 17:38:00 +03:00
Botond Dénes
76a853e345 tests/random_schema: generate_random_mutations(): generate partition tombstone 2019-07-15 17:38:00 +03:00
Botond Dénes
4d9f3e5705 data_model: extend ttl and expiry support 2019-07-15 17:38:00 +03:00
Botond Dénes
96d3c1efb1 random_schema: generate_random_mutations(): restore indentation 2019-07-15 17:38:00 +03:00
Botond Dénes
b26fe76fc1 tests: random_schema: futurize generate_random_mutations()
To avoid reactor stalls when generating many and/or large partitions.
2019-07-15 17:38:00 +03:00
Botond Dénes
cf135c6257 tests/random_schema: generate_random_mutations(): allow customizing generated data
Allow callers to specify the number of partitions generated, as well as
the number of clustering rows and range tombstones generated per
partition.
2019-07-15 17:38:00 +03:00
Botond Dénes
d2930ffa53 tests/random_schema: add assert to make_clustering_key()
Verify that the schema *does* indeed have clustering columns. Better an
assert than a cryptic "division by 0" exception deeper in the call stack.
2019-07-15 17:38:00 +03:00
Botond Dénes
d90ac6bd7b tests/random_schema: generate_random_mutations(): remove engine parameter
Use an internally created instance of the random engine. Passing a readily
seeded engine from the outside is pointless now that we have a mechanism
to seed entire test suites with a command line argument: the internal
engine is seeded from tests::random, so the seed of the test suite
determines the internal seed as well.

Update the sole user of this method (mutation_writer_test.cc) to not
generate local seeds anymore.
2019-07-15 17:38:00 +03:00
Botond Dénes
fd2f53f292 tests: mutation_writer_test.cc/generate_mutations() -> random_schema.hh/generate_random_mutations()
We plan on allowing other tests to use this method. The first step is to
make it available in a header.
2019-07-15 17:38:00 +03:00
Botond Dénes
7a4a609e88 Introduce Garbage Collected Consumer to Mutation Compactor
Introduce a consumer in the mutation compactor that will only consume
data that is purged away from the regular consumer. The goal is to
allow the compaction implementation to do whatever it wants with
the garbage collected data, like saving it to prevent
data resurrection from ever happening, as described in
issue #4531.
noop_compacted_fragments_consumer is made available for users that
don't need this capability.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2019-07-15 17:38:00 +03:00
Botond Dénes
4c2781edaa row_marker: add garbage_collector
The new collector parameter is a pointer to a
`compaction_garbage_collector` implementation. This collector is passed
the row_marker when it has expired and would be discarded.
The collector param is optional and defaults to nullptr.
2019-07-15 17:38:00 +03:00
Botond Dénes
7db2006162 row_marker: de-inline compact_and_expire() 2019-07-15 17:38:00 +03:00
Botond Dénes
4c7a7ffe8f row: add garbage_collector
The new collector parameter is a pointer to a
`compaction_garbage_collector` implementation. This collector is passed
all atoms that are expired and would be discarded. The body of
`compact_and_expire()` was changed so that it checks cells' tombstone
coverage before it checks their expiry, so that cells that are both
covered by a tombstone and also expired are not passed to the collector.
The collector is forwarded to
`collection_type_impl::mutation::compact_and_expire()` as well.
The collector param is optional and defaults to nullptr
2019-07-15 17:38:00 +03:00
Botond Dénes
307b48794d collection_type_impl::mutation: compact_and_expire() add collector parameter
The new collector parameter is a pointer to a
`compaction_garbage_collector` implementation. This collector is passed
all atoms that are expired and would be discarded. The body of
`compact_and_expire()` was changed so that it checks cells' tombstone
coverage before it checks their expiry, so that cells that are both
covered by a tombstone and also expired are not passed to the collector.
The collector param is optional and defaults to nullptr. To accommodate
the collector, which needs to know the column id, a new `column_id`
parameter was added as well.
2019-07-15 17:37:55 +03:00
Calle Wilund
1ed9a44396 utils::config_file: Propagate broadcast_to_all_shards to dependent files
Fixes #4713

Modifying config files to use sharded storage misses the fact
that extensions are allowed to add non-member config fields to
the main configuration, typically from "extra" config_file
objects.

Unless those "extra" files are broadcast when the main file is broadcast,
the values will not be readable from other shards.

This patch propagates the broadcast to all other config files
whose entries are in the top level object. This ensures we
always keep data up to date on config reload.

Message-Id: <20190715135851.19948-1-calle@scylladb.com>
2019-07-15 17:02:09 +03:00
Nadav Har'El
9cc9facbea configure.py: atomically overwrite build.ninja
configure.py currently takes some time to write build.ninja. If the user
interrupts (e.g., control-C) configure.py, it can leave behind a partial
or even empty build.ninja file. This is most frustrating when the user
didn't explicitly run "configure.py", but rather just ran "ninja" and
ninja decided to run configure.py, and after interrupting it the user
cannot run "ninja" again because build.ninja is gone. Another result of
losing build.ninja is that the user now needs to remember which parameters
to run "configure.py", because the old ones stored in build.ninja were lost.

The solution in this patch is simple: We write the new build.ninja contents
into a temporary file, not directly into build.ninja. Then, only when the
entire file has been successfully written, do we rename the temporary file
to its intended name - build.ninja.

Fixes #4706

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Reviewed-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190715122129.16033-1-nyh@scylladb.com>
2019-07-15 15:34:48 +03:00
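The write-then-rename pattern looks roughly like this (Python, the language configure.py is written in; the helper name is hypothetical):

```python
import os
import tempfile

def atomic_write(path: str, contents: str) -> None:
    # Write to a temporary file in the same directory, then rename it
    # over the target only after the whole file has been written; an
    # interrupted run leaves the old file untouched.
    dirname = os.path.dirname(path) or '.'
    fd, tmp = tempfile.mkstemp(dir=dirname, prefix='.build.ninja.')
    try:
        with os.fdopen(fd, 'w') as f:
            f.write(contents)
        os.replace(tmp, path)  # atomic rename on POSIX
    except BaseException:
        os.unlink(tmp)
        raise
```

The temporary file must live in the same directory as the target, since rename is only atomic within a filesystem.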
Botond Dénes
5002ebb73f Introduce compaction_garbage_collector interface
This interface can be used to implement a garbage collector that
collects atoms that are purged due to expiry during compaction.
The intended usage is collecting purged atoms for safekeeping until the
compaction process finishes safely, to be dropped only at the end when
the compaction is known to have finished successfully.
2019-07-15 15:30:43 +03:00
Eliran Sinvani
997a146c7f auth: Prevent race between role_manager and pasword_authenticator
When scylla is started for the first time with PasswordAuthenticator
enabled, it can happen that a record of the default superuser
is created in the table with the can_login and is_superuser
columns set to null. This happens because the module in charge of creating
the row is the role manager, while the module in charge of setting the
default password salted hash value is the password authenticator.
Those two modules are started together. In the case when the
password authenticator finishes its initialization first, then in the
period until the role manager completes its initialization, the row
contains those null columns, and any login attempt in this period
will cause a memory access violation since those columns are not
expected to ever be null. This patch removes the race by starting
the password authenticator and authorizer only after the role manager
has finished its initialization.

Tests:
  1. Unit tests (release)
  2. Auth and cqlsh auth related dtests.

Fixes #4226

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <20190714124839.8392-1-eliransin@scylladb.com>
2019-07-14 16:19:57 +03:00
Rafael Ávila de Espíndola
67c624d967 Add documentation for large_rows and large_cells
Fixes #4552

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190614151907.20292-1-espindola@scylladb.com>
2019-07-12 19:21:26 +03:00
Amnon Heiman
1c6dec139f API: compaction_manager add get pending tasks by table
The pending tasks by table name API returns an array of pending tasks by
keyspace/table names.

After this patch the following command would work:
curl -X GET 'http://localhost:10000/compaction_manager/metrics/pending_tasks_by_table'

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2019-07-12 19:21:26 +03:00
Takuya ASADA
842f75d066 reloc: provide libthread_db.so.1 to debug thread on gdb
In the scylla-debuginfo package, we have /usr/lib/debug/opt/scylladb/libreloc/libthread_db-1.0.so-666.development-0.20190711.73a1978fb.el7.x86_64.debug
but we actually do not have libthread_db.so.1 in /opt/scylladb/libreloc,
since it does not appear in the ldd output for the scylla binary.

To debug threads, we need to add the library to the relocatable package manually.

Fixes #4673

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190711111058.7454-1-syuu@scylladb.com>
2019-07-12 19:21:26 +03:00
Piotr Sarna
ac7531d8d9 db,hints: decouple in-flight hints limits from resource manager
The resource manager is used to manage common resources between
various hints managers. In-flight hints used to be one of the shared
resources, but it proved to cause starvation when one manager eats
the whole limit - which may be especially painful if the background
materialized views hints manager starves the regular hints manager,
which can in turn start failing user writes because of admission control.
This patch makes the limit per-manager again,
which effectively reverts the limit to its original behavior.

Fixes #4483
Message-Id: <8498768e8bccbfa238e6a021f51ec0fa0bf3f7f9.1559649491.git.sarna@scylladb.com>
2019-07-12 19:21:26 +03:00
Rafael Ávila de Espíndola
4e7ffb80c0 cql: Fix use of UDT in reversed columns
We were missing calls to underlying_type in a few locations and so the
insert would think the given literal was invalid and the select would
refuse to fetch a UDT field.

Fixes #4672

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190708200516.59841-1-espindola@scylladb.com>
2019-07-12 19:21:26 +03:00
Kamil Braun
60a4867a5b Fix infinite looping when performing a range query on system.size_estimates.
Queries to the system.size_estimates table which are not single-partition queries
caused Scylla to go into an infinite loop inside multishard_combining_reader::fill_buffer.
This happened because multishard_combining_reader assumes that shards return rows belonging
to separate partitions, which was not the case for size_estimates_mutation_reader.
This commit fixes the issue and closes #4689.
2019-07-12 18:09:15 +02:00
Kamil Braun
ba5a02169e Fix segmentation fault when querying system.size_estimates for an empty keyspace. 2019-07-12 18:02:10 +02:00
Kamil Braun
a1665b74a9 Refactor size_estimates_virtual_reader
Move the implementation of size_estimates_mutation_reader
to a separate compilation unit to speed up compilation times
and increase readability.

Refactor tests to use seastar::thread.
2019-07-12 17:53:00 +02:00
Benny Halevy
6dad9baa1c table: disable_sstable_write: acquire _sstable_deletion_sem
`disable_sstable_write` needs to acquire `_sstable_deletion_sem`
to properly synchronize with background deletions done by
`on_compaction_completion` to ensure no sstables will be created
or deleted during `reshuffle_sstables` after
`storage_service::load_new_sstables` disables sstable writes.

Fixes #4622

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-07-11 12:14:44 +03:00
Benny Halevy
bbbd749f70 table: uninline enable_sstable_write
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-07-11 12:14:44 +03:00
Benny Halevy
c6bad3f3c2 table: reshuffle_sstables: add log message
To mark the point in time writes are disabled and
scanning of the data directory is beginning.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-07-11 12:14:44 +03:00
Asias He
aa8d7af4f0 repair: Enable rpc stream in row level repair
Add the row_level_diff_detect_algorithm::send_full_set_rpc_stream as
a supported algorithm. If both the repair master and the followers support it, the
master will use the rpc stream interface, otherwise use the old rpc verb
interface.
2019-07-11 08:59:48 +08:00
Asias He
38b72b398b repair: Wrap with foreign_ptr to avoid cross cpu free
The moved set_diff and rows will be freed on the target cpu instead of the
source cpu, which will cause a lot of cross-cpu frees.

To fix, wrap them in foreign_ptr.
2019-07-11 08:59:48 +08:00
Asias He
06c84be257 repair: Futurize get_repair_rows_size and row_buf_size
To prevent stalls when the number of rows inside the row buf is large.
2019-07-11 08:36:39 +08:00
Asias He
809c992b30 repair: Avoid calling get_repair_rows_size in get_sync_boundary
Instead of calling get_repair_rows_size(), which might stall with a large
number of rows, return the size of the rows from read_rows_from_disk.
2019-07-11 08:36:39 +08:00
Asias He
4d41f8e57e repair: Futurize row_buf_csum
To prevent stalls when the number of rows inside the row buf is large.
2019-07-11 08:36:39 +08:00
Asias He
0ef167c9c8 repair: Yield inside get_set_diff
get_set_diff always runs inside a thread, so we can
thread::maybe_yield() to avoid stall.
2019-07-11 08:36:39 +08:00
Asias He
f871d9edd4 repair: Use get_repair_rows_size helper in get_sync_boundary
We have a helper get_repair_rows_size to get the row size in the list.
2019-07-11 08:36:39 +08:00
Asias He
ccbc9fb0ca repair: Avoid stall in do_estimate_partitions_on_local_shard
Do not use boost::accumulate which does not yield. Use do_for_each for
each sstable to avoid stall.
2019-07-11 08:36:39 +08:00
Asias He
b7b5cb33e8 remove get_row_diff 2019-07-11 08:36:39 +08:00
Rafael Ávila de Espíndola
281f3a69f8 mc writer: Fix exception safety when closing _index_writer
This fixes a possible cause of #4614.

From the backtrace in that issue, it looks like a file is being closed
twice. The first point in the backtrace where that seems likely is in
the MC writer.

My first idea was to add a writer::close and make it the responsibility
of the code using the writer to call it. That way we would move work
out of the destructor.

That is a bit hard since the writer is destroyed from
flat_mutation_reader::impl::~consumer_adapter and that would need to
get a close function too.

This patch instead just fixes an exception safety issue. If
_index_writer->close() throws, _index_writer is still valid and
~writer will try to close it again.

If the exception was thrown after _completed.set_value(), that would
explain the assert about _completed.set_value() being called twice.

With this patch the path outside of the destructor now moves the
writer to a local variable before trying to close it.

Fixes #4614
Message-Id: <20190710171747.27337-1-espindola@scylladb.com>
2019-07-10 19:27:19 +02:00
Paweł Dziepak
eb7d17e5c5 lsa: make sure align_up_for_asan() doesn't cause reads past end of segment
In debug mode the LSA needs objects to be 8-byte aligned in order to
maximise coverage from the AddressSanitizer.

Usually `close_active()` creates a dummy objects that covers the end of
the segment being closed. However, it the last real objects ends in the
last eight bytes of the segment then that dummy won't be created because
of the alignment requirements. This broke exit conditions on loops
trying to read all objects in the segment and caused them to attempt to
dereference address at the end of the segment. This patch fixes that.

Fixes #4653.
2019-07-10 19:19:24 +02:00
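The 8-byte alignment in question is the usual round-up-to-a-power-of-two computation, sketched here in Python for illustration:

```python
def align_up(x: int, alignment: int = 8) -> int:
    # Round x up to the next multiple of `alignment`, which must be a
    # power of two. With 8-byte alignment, an object ending anywhere in
    # the last 8 bytes of a segment rounds up to (or past) the segment
    # end, which is why no dummy object could be placed after it.
    return (x + alignment - 1) & ~(alignment - 1)
```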
Avi Kivity
e32bdb6b90 Merge "Warn user about using SimpleStrategy with Multi DC deployment" from Kamil
"
If the user creates a keyspace with the 'SimpleStrategy' replication class
in a multi-datacenter environment, they will receive a warning in the CQL shell
and in the server logs.

Resolves #4481 and #4651.
"

* 'multidc' of https://github.com/kbr-/scylla:
  Warn user about using SimpleStrategy with Multi DC deployment
  Add warning support to the CQL binary protocol implementation
2019-07-10 16:47:07 +03:00
Avi Kivity
138b28ae43 Merge "Fix command line parsing and add logging." from Kamil
"
Fixes #4203 and #4141.
"

* 'cmdline' of https://github.com/kbr-/scylla:
  Add logging of parsed command line options
  Fix command line argument parsing in main.
2019-07-10 16:40:57 +03:00
Avi Kivity
405fd517b0 Merge "IPv6 support" from Calle
"
Fixes #2027

Modifies inet address type in scylla to use seastar::net::inet_address,
and removes explicit use of ipv4_addr in various network code in favour
of socket_address. Thus capable of resolving and binding to ipv6.

Adds config option to enable/disable ipv6 (default enabled), so
upgrading cluster can continue to work while running mixed version
nodes (since gossip message address serialization becomes different).
"

* 'calle/ipv6' of https://github.com/elcallio/scylla:
  test-serialization: Add small roundtrip test for inet address (v4 + v6)
  inet_address/init: Make ipv6 default enabled
  db::config: Add enable ipv6 switch (default off)
  gms::inet_address: Make serialization ipv6 aware
  Remove usage of inet_address::raw_addr()
  Replace use of "ipv4_addr" with socket_address
  inet_address: Add optional family to lookup
  gms::inet_address: Change inet_address to wrap actual seastar::net::inet_address
  types: Add ipv6_address support
2019-07-10 15:07:56 +03:00
Benny Halevy
b4dc118639 tests: logalloc_test: scale down test_region_groups
Post commit b3adabda2d
(Reduce logalloc differences between debug and release)
logalloc_test's memory footprint has grown, in particular
in test_region_groups, and it triggers the oom killer on
our test automation machines.

This patch scales down this test case so it requires less memory.

Fixes #4669

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-07-10 12:06:10 +02:00
Pekka Enberg
bb53c109b4 test.py: Add option for repeating test execution
This adds a '--repeat N' command line option to test.py, which can be
used to execute the tests N times. This is useful for finding flakey
tests, for example.

Message-Id: <20190710092115.15960-1-penberg@scylladb.com>
2019-07-10 12:42:39 +03:00
Botond Dénes
ce647fac9f timestamp_based_splitting_writer: fix the handling of partition tombstone
Currently the handling of partition tombstones is broken in multiple
ways:
* The partition-tombstone is lost when the bucket is calculated for its
timestamp (due to a misplaced `std::exchange()`).
* When the `partition_start` fragment (containing the partition
tombstone) is actually written to the bucket we emit another
`partition_start` fragment before it because the bucket has not seen
that partition before and we fail to notice that we are actually writing
the partition header.

This bug was allowed to fly under the radar because the unit test was
accidentally not creating partition tombstones in the generated data
(due to a mistake). It was discovered while working on unit tests for
another test and fixing the data generation function to actually
generate partition tombstones.

This patch fixes both problems in the handling of partition tombstones
but it doesn't yet fix the test. That is deferred until the patch
series which uncovered this bug is merged to avoid merge conflicts.
The other series mentioned here is: [PATCH v6 00/15] compaction: allow
collecting purged data

Fixes: #4683

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190710092427.122623-1-bdenes@scylladb.com>
2019-07-10 12:36:57 +03:00
Pekka Enberg
e6cc90aa98 test: add 'eventually' block to index paging test (#4681)
Without 'eventually', the test is flaky because the index may still
not be up to date when its conditions are checked.

Fixes #4670

Tests: unit(dev)
2019-07-10 11:46:03 +03:00
Kamil Braun
d6736a304a Add metric for failed memtable flushes
Resolves #3316.

Signed-off-by: Kamil Braun <kbraun@scylladb.com>
2019-07-10 11:30:10 +03:00
Amnon Heiman
2fbc5ea852 config_file.hh: get_value return a pointer to the value
The get_value method returns a pointer to the value that is used by the
value_to_json method.

The assumption is that the void pointer points to the actual value.

Fixes #4678

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2019-07-10 10:40:35 +03:00
Piotr Sarna
ebbe038d19 test: add 'eventually' block to index paging test
Without 'eventually', the test is flaky because the index may still
not be up to date when its conditions are checked.

Fixes #4670
2019-07-09 17:07:16 +02:00
Asias He
39ca044dab repair: Allow repair when a replica is down
Since commit bb56653 (repair: Sync schema from follower nodes before
repair), the behaviour of handling down node during repair has been
changed.  That is, if a repair follower is down, it will fail to sync
schema with it and the repair of the range will be skipped. This means
a range can not be repaired unless all the nodes for the replicas are up.

To fix, we filter out the nodes that are down, mark the repair as
partial, and repair with the nodes that are still up.

Tests: repair_additional_test:RepairAdditionalTest.repair_with_down_nodes_2b_test
Fixes: #4616
Backports: 3.1

Message-Id: <621572af40335cf5ad222c149345281e669f7116.1562568434.git.asias@scylladb.com>
2019-07-09 10:07:36 +03:00
Konstantin Osipov
56f3bda4c7 metrics: introduce a metric for non-local reads
A read which arrived at a non-replica and had to be forwarded to a
replica by the coordinator is accounted in its own metric,
reads_coordinator_outside_replica_set.
Most often such a read is produced by a driver which is unaware of
token distribution on the ring.

If a read was forwarded to another replica due to heat weighted
load balancing or query preference set by the user, it's not accounted
in the metric.

In the case of a multi-partition read (a query using an IN statement,
e.g. x in (1, 2, 3)), if any of the keys is read from a
non-local node, the read is accounted as non-local.
The rationale behind it is that if the user tries to be careful and send
IN queries only to the same vnode, they are rewarded with the counter
staying at zero, while if they send multi-partition IN queries without
any precautions, they will see the metric go up which gives them a
starting point for investigating performance problems.

Closes #4338
2019-07-08 19:23:38 +03:00
Calle Wilund
5dfc356380 test-serialization: Add small roundtrip test for inet address (v4 + v6)
Verify we get back what we put in.
2019-07-08 15:28:21 +00:00
Konstantin Osipov
da1d1b74da metrics: account writes forwarded by a coordinator in their own metric.
Add a metric to account writes which arrived at a non-replica and
had to be forwarded by a coordinator to a replica.

The name of the added metric is 'writes_coordinator_outside_replica_set'.

Do not account forwarded read repair writes, since they are already
accounted by a reads_coordinator_outside_replica_set metric, added in a
subsequent patch.

In scope of #4338.
2019-07-08 18:17:48 +03:00
Calle Wilund
3cfb79e0ff inet_address/init: Make ipv6 default enabled
Makes lookup find any (incl ipv6 numeric) address.
Init will look at enable_ipv6 and use an explicit ipv4 family lookup if
it is not enabled.
2019-07-08 14:13:10 +00:00
Calle Wilund
1f5e1d22bf db::config: Add enable ipv6 switch (default off)
Off by default to prevent problems during cluster migration when
needing to gossip with non-ipv6 aware nodes.
2019-07-08 14:13:09 +00:00
Calle Wilund
c540e36fe2 gms::inet_address: Make serialization ipv6 aware
Because inet_address was initially hardcoded to
ipv4, its wire format is not very forward compatible.
Since we potentially need to communicate with older version nodes, we
manually define the new serial format for inet_address to be:

ipv4: 4  bytes address
ipv6: 4  bytes marker 0xffffffff (invalid address)
      16 bytes data -> address
2019-07-08 14:13:09 +00:00
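The wire format described above can be illustrated with a short sketch. Python is used purely for illustration (the real code is Scylla's C++ serializer), and the function names are hypothetical; only the byte layout comes from the commit message.

```python
import ipaddress

# The commit designates 0xffffffff (an invalid address) as the marker,
# so a 4-byte-only reader never mistakes an ipv6 payload for a real peer.
IPV6_MARKER = b"\xff\xff\xff\xff"

def serialize_inet_address(addr: str) -> bytes:
    ip = ipaddress.ip_address(addr)
    if ip.version == 4:
        return ip.packed                 # ipv4: 4 bytes of address
    return IPV6_MARKER + ip.packed       # ipv6: 4-byte marker + 16 bytes

def deserialize_inet_address(data: bytes) -> str:
    if data[:4] == IPV6_MARKER:
        return str(ipaddress.ip_address(data[4:20]))
    return str(ipaddress.ip_address(data[:4]))
```

The ipv4 encoding is byte-for-byte identical to the old format, which is what keeps the scheme compatible with older-version nodes.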
Calle Wilund
e9816efe06 Remove usage of inet_address::raw_addr() 2019-07-08 14:13:09 +00:00
Calle Wilund
4ef940169f Replace use of "ipv4_addr" with socket_address
Allows the various sockets to use ipv6 address binding if so configured.
2019-07-08 14:13:09 +00:00
Calle Wilund
5ba545f493 inet_address: Add optional family to lookup 2019-07-08 14:13:09 +00:00
Calle Wilund
5fd811ec8a gms::inet_address: Change inet_address to wrap actual seastar::net::inet_address
Thus it handles all types net::inet_address can handle, i.e. ipv6.
2019-07-08 14:13:09 +00:00
Calle Wilund
482fd72ca2 types: Add ipv6_address support
As ipv4, just redirect to inet_address.
2019-07-08 14:09:25 +00:00
Asias He
b7abaa04da repair: Futurize get_row_diff to avoid stall
The copy of _working_row_buf and boost::copy_range can stall if the
number of rows is large. Futurize get_row_diff to avoid the stall.
2019-07-08 15:22:16 +08:00
Asias He
a4b24e44a3 repair: Fix possible stall in request_row_hashes
The std::find_if and std::copy can stall if the number of rows is large.
Introduce a helper, move_row_buf_to_working_row_buf, that moves the rows
while yielding to avoid stalls.
2019-07-08 15:22:16 +08:00
Asias He
b48dc42e73 repair: Allow default construct for repair_row
All members of repair_row are now optional. Enable the default
constructor so that _row_buf.resize() can work.
2019-07-08 15:22:16 +08:00
Asias He
18fb0714a0 repair: Remove apply_rows
It is not used any more. The user now calls
apply_rows_on_master_in_thread and apply_rows_on_follower instead.
2019-07-08 15:22:16 +08:00
Asias He
882530ce26 repair: Run get_row_diff_with_rpc_stream in a thread
So that we can make get_row_diff_source_op run inside a thread, in turn
it can now call apply_rows_on_master_in_thread which eliminates stall.
2019-07-08 15:22:16 +08:00
Asias He
948b833d74 repair: Run get_row_diff_and_update_peer_row_hash_sets inside a thread
So it can use apply_rows_on_master_in_thread which eliminates stall.
2019-07-08 15:22:16 +08:00
Asias He
7f29d13984 repair: Run get_row_diff inside a thread
So it can use apply_rows_on_master_in_thread which eliminates stall.
2019-07-08 15:22:16 +08:00
Asias He
6b2e3946fb repair: Add apply_rows_on_master_in_thread
Like apply_rows, except it runs inside a thread and runs on the master
node only.
2019-07-08 15:22:16 +08:00
Asias He
7c6a29027f repair: Add apply_rows_on_follower
Add a version of apply_rows for the follower node only.
2019-07-08 15:22:16 +08:00
Asias He
cc14c6e0c4 repair: Futurize working_row_hashes
To avoid stall when the number of rows is big.
2019-07-08 15:22:16 +08:00
Asias He
f3d2ba6ec7 repair: Remove get_full_row_hashes helper
It is a simple wrapper for working_row_hashes and is used only once. Remove it.
2019-07-08 15:22:16 +08:00
Benny Halevy
a0499bbd31 lister::guarantee_type: do not follow symlink
Similar to commit 9785754e0d
lister::guarantee_type needs to check the entry's type,
not the symlink it may point to.

Fixes #4606

The nodetool_refresh_with_wrong_upload_modes_test dtest creates a broken
symlink and following it fails, as it should, with the default follow_symlink::yes

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190626110734.4558-1-bhalevy@scylladb.com>
2019-07-07 15:29:28 +03:00
Avi Kivity
63edd46562 Merge "Expand big decimal with arithmetic operators" from Piotr
"
This miniseries expands the big_decimal interface with convenience operators
(-=, +, -), provides test cases for it and makes one of the constructors
explicit.

Tests: unit(dev)
"

* 'expand_big_decimal_interface' of https://github.com/psarna/scylla:
  utils: make string-based big decimal constructor explicit
  tests: add more operators to big decimal tests
  utils: add operators to big_decimal
2019-07-06 12:26:08 +03:00
Avi Kivity
24caf0824d Merge "Complete the LIKE operator" from Dejan
"
Implement LIKE parsing, intermediate representation, and query processing. Add tests
for this implementation (leaving the LIKE functionality tests in
tests/like_matcher_test.cc).

Refs #4477.
"

* 'finish-like' of https://github.com/dekimir/scylla:
  cql3: Add LIKE operator to CQL grammar
  cql3: Ensure LIKE filtering for partition columns
  cql3: Add LIKE restriction
  cql3: Add LIKE relation
2019-07-06 12:26:08 +03:00
kbr-
8995945052 Implement tuple_type_impl::to_string_impl. (#4645)
Resolves #4633.

Signed-off-by: Kamil Braun <kbraun@scylladb.com>
2019-07-06 12:26:08 +03:00
Avi Kivity
187859ad78 review-checklist: mention that the guidelines are not absolute rules and can be overridden 2019-07-06 12:26:08 +03:00
Kamil Braun
c0915c40eb Warn user about using SimpleStrategy with Multi DC deployment
If the user creates a keyspace with the 'SimpleStrategy' replication class
in a multi-datacenter environment, they will receive a warning in the CQL shell
and in the server logs.
Resolves #4481.

Signed-off-by: Kamil Braun <kbraun@scylladb.com>
2019-07-05 09:25:03 +02:00
Kamil Braun
35dbe9371c Add warning support to the CQL binary protocol implementation
The CQL binary protocol v4 adds support for server-side warnings:
https://github.com/apache/cassandra/blob/trunk/doc/native_protocol_v4.spec
This adds a convenient API to add warnings to messages returned to the user.
Resolves #4651.

Signed-off-by: Kamil Braun <kbraun@scylladb.com>
2019-07-05 09:24:56 +02:00
Kamil Braun
2f0f53ac72 Add logging of parsed command line options
The recognized command line options are now being printed when Scylla is run,
together with the whole command used.
Fixes #4203.
2019-07-05 09:00:28 +02:00
Piotr Sarna
eed2543bcc utils: make string-based big decimal constructor explicit
As a rule of thumb, single-parameter constructors should be explicit
in order to avoid unexpected implicit conversions.
2019-07-04 11:33:00 +02:00
Piotr Sarna
7e722f8dd5 tests: add more operators to big decimal tests 2019-07-04 11:32:57 +02:00
Piotr Sarna
a5e41408ec utils: add operators to big_decimal
For convenience, operators -=, + and - are implemented on top of +=.
2019-07-04 11:32:53 +02:00
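The pattern described above, deriving -=, + and - from a single += primitive, can be sketched like this. The `BigDecimal` class below is a hypothetical reduction over plain ints, not Scylla's actual big_decimal; it only illustrates the layering of operators.

```python
class BigDecimal:
    def __init__(self, value: int):
        self.value = value

    def __iadd__(self, other):
        # The one primitive operator: += mutates self in place.
        self.value += other.value
        return self

    def __isub__(self, other):
        # -= implemented on top of += via negation.
        self += BigDecimal(-other.value)
        return self

    def __add__(self, other):
        # Binary + copies the left operand, then reuses +=.
        result = BigDecimal(self.value)
        result += other
        return result

    def __sub__(self, other):
        result = BigDecimal(self.value)
        result -= other
        return result
```

Implementing the binary operators in terms of the compound ones keeps the arithmetic in one place, so a fix to += automatically propagates to -, + and -=.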
Dejan Mircevski
6727e8f073 cql3: Add LIKE operator to CQL grammar
Extend the grammar with LIKE and add CQL query tests for it.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-07-04 11:01:13 +02:00
Dejan Mircevski
1c583de8bb cql3: Ensure LIKE filtering for partition columns
Partition columns are implicitly filtered whenever possible, avoiding
expensive post-processing.  But there are exceptions, eg, when
partition key is only partially restricted, or for CONTAINS
expressions.  Here we add LIKE to this list of exceptions.

Also fix compute_bounds() to punt on LIKE restrictions, which cannot
be translated into meaningful bounds.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-07-04 10:59:13 +02:00
Dejan Mircevski
63cec653e5 cql3: Add LIKE restriction
This restriction leverages like_matcher to perform filtering.

Make single_column_relation::new_LIKE_restriction() return this new
restriction.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-07-04 10:58:56 +02:00
Dejan Mircevski
21d7722594 cql3: Add LIKE relation
Add a new type of relation with operator LIKE.  Handle it in
relation::to_restriction by introducing a new virtual method for it.
The temporary implementation of this method returns null; that will be
replaced in a subsequent patch.

Add abstract_type::is_string() to recognize string columns and
disallow LIKE operator on non-string columns.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-07-04 10:54:30 +02:00
Kamil Braun
f155a2d334 Fix command line argument parsing in main.
Command line arguments are parsed twice in Scylla: once in main and once in Seastar's app_template::run.
The first parse is there to check if the "--version" flag is present --- in this case the version is printed
and the program exists.  The second parsing is correct; however, most of the arguments were improperly treated
as positional arguments during the first parsing (e.g., "--network host" would treat "host" as a positional argument).
This happened because the arguments weren't known to the command line parser.
This commit fixes the issue by moving the parsing code until after the arguments are registered.
Resolves #4141.

Signed-off-by: Kamil Braun <kbraun@scylladb.com>
2019-07-03 14:11:34 +02:00
Avi Kivity
8a0c4d508a Merge "Repair switch to rpc stream" from Asias
"
The put_row_diff, get_row_diff and get_full_row_hashes verbs are switched
to use rpc stream instead of rpc verb. They are the verbs that could
send big rpc messages. The rpc stream sink and source are created per
repair follower for each of the above 3 verbs. The sink and source are
shared for multiple requests during the entire repair operation for a
given range, so there is no overhead to set up the rpc stream.

The row buffer is now increased to 32MiB from 256KiB, giving better
bandwidth on high-latency links. The downside of a bigger row buffer is
a reduced probability that all the rows inside a row buffer are identical.
This causes more full hashes to be exchanged. To address this issue, the
plan is to add better set reconciliation algorithm in addition to the
current send full hashes.

I compared rebuild using regular stream plan with repair using rpc
stream. With 2 nodes, 1 smp, 8M rows, delete all data on one of the
node before repair or rebuild.

    repair using seastar rpc verb

Time to complete: 82.17s

    rebuild using regular streaming which uses seastar rpc stream

Time to complete: 63.87s

    repair using seastar rpc stream

Time to complete: 68.48s

For 1) and 3), the improvement is 16.6% (repair using rpc verb v.s. repair using rpc stream)

For 2) and 3), the difference is 7.2% (repair v.s. stream)

The result is promising for the future repair-based bootstrap/replace node operations.

NOTE: We do not actually enable rpc stream in row level repair for now. We
will enable it after we fix the stall issues caused by handling
bigger row buffers.

Fixes #4581
"

* 'repair_switch_to_rpc_stream_v9' of https://github.com/asias/scylla: (45 commits)
  docs: Add RPC stream doc for row level repair
  repair: Mark some of the helper functions static
  repair: Increase max row buf size
  repair: Hook rpc stream version of verbs in row level repair
  repair: Add use_rpc_stream to repair_meta
  repair: Add is_rpc_stream_supported
  repair: Add needs_all_rows flag to put_row_diff
  repair: Optimize get_row_diff
  repair: Register repair_get_full_row_hashes_with_rpc_strea
  repair: Register repair_put_row_diff_with_rpc_stream
  repair: Register repair_get_row_diff_with_rpc_stream
  repair: Add repair_get_full_row_hashes_with_rpc_stream_handler
  repair: Add repair_put_row_diff_with_rpc_stream_handler
  repair: Add repair_get_row_diff_with_rpc_stream_handler
  repair: Add repair_get_full_row_hashes_with_rpc_stream_process_op
  repair: Add repair_put_row_diff_with_rpc_stream_process_op
  repair: Add repair_get_row_diff_with_rpc_stream_process_op
  repair: Add put_row_diff_with_rpc_stream
  repair: Add put_row_diff_sink_op
  repair: Add put_row_diff_source_op
  ...
2019-07-03 10:08:55 +03:00
Asias He
f686f0b9d6 docs: Add RPC stream doc for row level repair
This documents RPC stream usage in row level repair.
2019-07-03 08:09:57 +08:00
Asias He
78ae5af203 repair: Mark some of the helper functions static
They are used only inside repair/row_level.cc. Make them static.
2019-07-03 08:09:57 +08:00
Asias He
e8c13444ba repair: Increase max row buf size
If the cluster supports row level repair with rpc stream interface, we
can use bigger row buf size to have better repair bandwidth in high
latency links.
2019-07-03 08:01:37 +08:00
Asias He
7d08a8d223 repair: Hook rpc stream version of verbs in row level repair
If rpc stream is supported, use the rpc stream version of the
get_row_diff, put_row_diff, get_full_row_hashes.
2019-07-03 08:01:37 +08:00
Asias He
fccaa0324f repair: Add use_rpc_stream to repair_meta
Determine if rpc stream should be used.
2019-07-03 08:01:37 +08:00
Asias He
7bf0c646be repair: Add is_rpc_stream_supported
Given a row_level_diff_detect_algorithm, return if this algo supports
rpc stream interface.
2019-07-03 08:01:04 +08:00
Asias He
1c92643f02 repair: Add needs_all_rows flag to put_row_diff
So we can avoid copying _working_row_buf in get_row_diff on the master
node if there is only one follower node and all repair rows are needed
by the follower node.
2019-07-03 07:56:22 +08:00
Asias He
6595417567 repair: Optimize get_row_diff
Move _working_row_buf instead of copying it if this is the follower
node, or the master node with only one follower. In these cases,
_working_row_buf will not be used after this function, so we can move
it.
2019-07-03 07:56:22 +08:00
Asias He
c4eb0ee361 repair: Register repair_get_full_row_hashes_with_rpc_strea
Register the get_full_row_hashes rpc stream verb.
2019-07-03 07:56:22 +08:00
Asias He
b56cced5b8 repair: Register repair_put_row_diff_with_rpc_stream
Register the put_row_diff rpc stream verb.
2019-07-03 07:56:22 +08:00
Asias He
67130031b1 repair: Register repair_get_row_diff_with_rpc_stream
Register the get_row_diff rpc stream verb.
2019-07-03 07:56:22 +08:00
Asias He
f255f902bd repair: Add repair_get_full_row_hashes_with_rpc_stream_handler
It is the handler for the get_full_row_hashes rpc stream verb on the
receiving side.
2019-07-03 07:56:17 +08:00
Asias He
e3267ad98c repair: Add repair_put_row_diff_with_rpc_stream_handler
It is the handler for the put_row_diff rpc stream verb on the
receiving side.
2019-07-03 07:55:24 +08:00
Asias He
06ac014261 repair: Add repair_get_row_diff_with_rpc_stream_handler
It is the handler for the get_row_diff rpc stream verb on the receiving
side.
2019-07-03 07:54:43 +08:00
Asias He
5f25969da3 repair: Add repair_get_full_row_hashes_with_rpc_stream_process_op
It is the helper for the get_full_row_hashes rpc stream verb handler.
2019-07-03 07:54:03 +08:00
Asias He
39d5a9446e repair: Add repair_put_row_diff_with_rpc_stream_process_op
It is the helper for the put_row_diff rpc stream verb handler.
2019-07-03 07:53:21 +08:00
Asias He
049e793fe5 repair: Add repair_get_row_diff_with_rpc_stream_process_op
It is the helper for the get_row_diff rpc stream verb handler.
2019-07-03 07:52:12 +08:00
Avi Kivity
fca1ae69ff database: convert _cfg from a pointer to a reference
_cfg cannot be null, so it can be converted to a reference to
indicate this. Follow-up to fe59997efe.
2019-07-02 17:57:50 +02:00
Calle Wilund
f317d7a975 commitlog: Simplify commitlog extension iteration
Fixes #4640

Iterating extensions in commitlog.cc should mimic that in sstables.cc,
i.e. a simple future-chain. Should also use same order for read and
write open, as we should preserve transformation stack order.

Message-Id: <20190702150028.18042-1-calle@scylladb.com>
2019-07-02 18:37:44 +03:00
Takuya ASADA
332a6931c4 dist/redhat: fix install path of scripts
Due to recent changes, install.sh mistakenly copies dist/common/scripts to
/opt/scylladb/scripts/scripts; it should be /opt/scylladb/scripts.
The same applies to /opt/scylladb/scyllatop.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190702120030.13729-1-syuu@scylladb.com>
2019-07-02 17:29:33 +03:00
Asias He
b1188f299e repair: Add put_row_diff_with_rpc_stream
It is rpc stream version of put_row_diff. It uses rpc stream instead of
rpc verb to put the repair rows to follower nodes.
2019-07-02 21:22:41 +08:00
Asias He
31b30486a7 repair: Add put_row_diff_sink_op
It is a helper that works on the sink() of the put_row_diff
rpc stream verb.
2019-07-02 21:22:41 +08:00
Asias He
dbe035649b repair: Add put_row_diff_source_op
It is a helper that works on the source() of the put_row_diff rpc
stream verb.
2019-07-02 21:22:41 +08:00
Asias He
72d3563da1 repair: Add get_row_diff_with_rpc_stream
It is rpc stream version of get_row_diff. It uses rpc stream instead of
rpc verb to get the repair rows from follower nodes.
2019-07-02 21:22:41 +08:00
Asias He
4cb44baa08 repair: Add get_row_diff_sink_op
It is a helper that works on the sink() of the get_row_diff rpc
stream verb.
2019-07-02 21:22:41 +08:00
Asias He
a1e19514f9 repair: Add get_row_diff_source_op
It is a helper that works on the source() of the
get_row_diff rpc stream verb.
2019-07-02 21:22:41 +08:00
Asias He
473bd7599c repair: Add get_full_row_hashes_with_rpc_stream
It is rpc stream version of get_full_row_hashes. It uses rpc stream
instead of rpc verb to get the repair hashes data from follower nodes.
2019-07-02 21:22:41 +08:00
Asias He
1e2a598fe7 repair: Add get_full_row_hashes_sink_op
It is a helper that works on the sink() of the get_full_row_hashes
rpc stream verb.
2019-07-02 21:22:41 +08:00
Asias He
149c54b000 repair: Add get_full_row_hashes_source_op
It is a helper that works on the source() of the
get_full_row_hashes rpc stream verb.
2019-07-02 21:22:41 +08:00
Asias He
b3e7299032 repair: Add sink and source object into repair_meta
They will soon be used to sync repair hashes and repair rows between
master and follower nodes.
2019-07-02 21:22:41 +08:00
Asias He
acd40fd529 repair: Add sink_source_for_put_row_diff
Use sink_source_for_repair to define sink_source_for_put_row_diff
with sink = repair_row_on_wire_with_cmd, source = repair_stream_cmd
for REPAIR_PUT_ROW_DIFF_WITH_RPC_STREAM rpc stream verb.
2019-07-02 21:22:41 +08:00
Asias He
4405f7a6ff repair: Add sink_source_for_get_row_diff
Use sink_source_for_repair to define sink_source_for_get_row_diff with
sink = repair_hash_with_cmd, source = repair_row_on_wire_with_cmd for
REPAIR_GET_ROW_DIFF_WITH_RPC_STREAM rpc stream verb.
2019-07-02 21:22:41 +08:00
Asias He
0bffd07e7e repair: Add sink_source_for_get_full_row_hashes
Use the sink_source_for_repair to define
sink_source_for_get_full_row_hashes with sink = repair_stream_cmd,
source = repair_hash_with_cmd for
REPAIR_GET_FULL_ROW_HASHES_WITH_RPC_STREAM rpc stream verb.
2019-07-02 21:22:41 +08:00
Asias He
8400dafa12 repair: Add sink_source_for_repair helper class
It is used to store the sink and source objects for the rpc stream verbs
used by row level repair.
2019-07-02 21:22:41 +08:00
Asias He
37b3de4ea0 messaging_service: Add REPAIR_GET_FULL_ROW_HASHES_WITH_RPC_STREAM support
It is used by row level repair.
2019-07-02 21:18:55 +08:00
Asias He
a7c7ba9765 messaging_service: Add REPAIR_PUT_ROW_DIFF_WITH_RPC_STREAM support
It is used by row level repair.
2019-07-02 21:18:55 +08:00
Asias He
dc92bda93b messaging_service: Add REPAIR_GET_ROW_DIFF_WITH_RPC_STREAM support 2019-07-02 21:18:55 +08:00
Asias He
f312c95b74 messaging_service: Add do_make_sink_source helper
It is used by the row level repair rpc stream verbs to make sink and
source object.
2019-07-02 21:18:55 +08:00
Asias He
bc295a00a6 messaging_service: Add rpc stream verb for row level repair
- REPAIR_GET_ROW_DIFF_WITH_RPC_STREAM

Get repair rows from follower nodes

- REPAIR_PUT_ROW_DIFF_WITH_RPC_STREAM

Put repair rows to follower nodes

- REPAIR_GET_FULL_ROW_HASHES_WITH_RPC_STREAM:

Get full hashes from follower nodes
2019-07-02 21:18:55 +08:00
Asias He
c93113f3a5 idl: Add repair_row_on_wire_with_cmd 2019-07-02 21:18:54 +08:00
Asias He
a90fb24efc idl: Add repair_hash_with_cmd 2019-07-02 21:18:37 +08:00
Asias He
599d40fbe9 idl: Add repair_stream_cmd 2019-07-02 21:18:15 +08:00
Asias He
672c24f6b0 idl: Add send_full_set_rpc_stream for row_level_diff_detect_algorithm 2019-07-02 21:17:36 +08:00
Avi Kivity
c987397e52 transport: reject initial frames with wild body sizes (#4620)
If someone opens a connection to port 9042 and sends some random bytes,
there is a 1 in 64 probability we'll recognize it as a valid frame
(since we only check the version byte, allowing versions 1-4) and we'll
try to read frame.length bytes for the body. If this value is very large,
we'll run out of memory very quickly.

Fix this by checking for reasonable body size (100kB). The initial message
must be a STARTUP, whose body is a [string map] of options, of which just
three are recognized. 100kB is plenty for future expansion.

Note that this does not replace true security on listening ports and
only serves to protect against mistakes, not attacks. An attacker can
easily exhaust server memory by opening many connections and trickle-feeding
them small amounts of data so they appear alive.

We can't use the config item native_transport_max_frame_size_in_mb,
because that can be legitimately large (and the default is atrocious,
256MB).

Fixes #4366.
2019-07-01 19:02:34 +02:00
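The guard described above amounts to a sanity check on the very first frame's header, before any body is read. The sketch below is illustrative only: the function name and argument names are hypothetical, while the version range (1-4) and the 100 kB limit come from the commit message.

```python
MAX_STARTUP_BODY = 100 * 1024   # 100 kB: ample for a STARTUP [string map]

def validate_initial_frame(version: int, body_length: int) -> None:
    """Reject a first frame whose header cannot belong to a sane STARTUP
    message, so random bytes cannot trigger a huge body allocation."""
    if not 1 <= version <= 4:
        raise ValueError("unsupported native protocol version")
    if body_length < 0 or body_length > MAX_STARTUP_BODY:
        raise ValueError("initial frame body size out of range")
```

As the commit stresses, this only protects against mistakes (and random bytes) on the port, not against deliberate attacks.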
Tomasz Grabiec
eb496b5eae Merge "Allow changing configuration at runtime" from Avi
This patchset allows changing the configuration at runtime, The user
triggers this by editing the configuration file normally, then
signalling the database with SIGHUP (as is traditional).

The implementation is somewhat complicated due the need to store
non-atomic mutable state per-shard and to synchronize the values in
all shards. This is somewhat similar to Seastar's sharded<>, but that
cannot be used since the configuration is read before Seastar is
initialized (due to the need to read command-line options).

Tests: unit (dev, debug), manual test with extra prints (dev)

Ref #2689
Fixes #2517.
2019-07-01 15:04:59 +02:00
Avi Kivity
28a514820d Update seastar submodule
* seastar a5b9f77d52...44a300cd50 (1):
  > build: fix dpdk library link order

Should fix the build with dpdk enabled.
2019-07-01 11:56:59 +03:00
Takuya ASADA
02c6db29c8 dist/debian: manage *.pyc as a part of package
Since 828b63f4fb only added *.pyc files to the .rpm
package, we also need to add them to the .deb package.

See #4612

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190629023739.8472-1-syuu@scylladb.com>
2019-06-30 15:54:42 +03:00
Avi Kivity
af2a3859f6 Update seastar submodule
* seastar b629d5ef7a...a5b9f77d52 (6):
  > perftune.py: add comment explaining why we don't log errors when binding NVMe IRQs for all but i3.nonmetal machines
  > sharded: do a two phase shutdown for sharded services
  > chunked_fifo: add iterator
  > perftune.py: fix the i3 metal detection pattern
  > core/memory: remove translation api
  > reactor: file_type: offer option to not follow symbolic links
2019-06-30 11:32:21 +03:00
Avi Kivity
2abe015150 database: allow live update of the compaction_enforce_min_threshold config item
Change the type from bool to updateable_value<bool> throughout the dependency
chain and mark it as live updateable.

In theory we should also observe the value and trigger compaction if it changes,
but I don't think it is worthwhile.
2019-06-28 16:43:25 +03:00
Avi Kivity
c98d1ea942 tests: cql_test_env: prepare config for updateable values
Once we start using updateable_value<>, we must make it refer
to the updateable_value_source<> on the same shard, and to do
that we need to call broadcast_to_all_shards() first (this
creates the per-shard copy).
2019-06-28 16:43:25 +03:00
Avi Kivity
8cffec37aa main: re-read configuration file on SIGHUP
Trap SIGHUP and signal a loop to re-read the configuration file.
2019-06-28 16:43:25 +03:00
Avi Kivity
2ee07bb09b main: preserve config::client_encryption_options configuration source
With dynamically updateable configuration, tracking the source of a value
is more important, since we'll accept or reject updates depending on the source.

Fix the source of client_encryption_options, which we RMW, by preserving the original
source.
2019-06-28 16:43:25 +03:00
Avi Kivity
6061a833a3 config: make values updateable
Replace the per-shard value we store with an updateable_value_source, which
allows updating it dynamically and allows users to track changes.

The broadcast_to_all_shards() function is augmented to apply modifications
when called on a live system.
2019-06-28 16:43:25 +03:00
Avi Kivity
f7de01d082 config: store copies of config items per shard
Since some of our values are not atomic (strings) and the administrative
information needed to track references to values is also not atomic, we will
need to store them per-shard. To do that we add a vector of per-shard data
to config_file, where each element is itself a vector of configuration items.

Since we need to operate generically on items (copying them from shard to shard)
we store them in a type-erased form.

Only mutable state is stored per-shard.
2019-06-28 16:43:25 +03:00
Avi Kivity
fb23cd1ff6 Introduce updatable_value
The updateable_value and updateable_value_source classes allow broadcasting
configuration changes across the application. The updateable_value_source class
represents a value that can be updated, and updateable_value tracks its source
and reflects changes. A typical use replaces "uint64_t config_item" with
"updateable_value<uint64_t> config_item", and from now on changes to the source
will be reflected in config_item. For more complicated uses, which must run some
callback when configuration changes, you can also call
config_item.observe(callback) to be actively notified of changes.
2019-06-28 16:43:25 +03:00
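The mechanism described above can be sketched as a value source that pushes changes to observers. The class below is an illustrative Python reduction whose names mirror the commit's description; it is not the C++ implementation, and the skip-when-unchanged behaviour reflects the follow-up commit that avoids firing notifiers when nothing changed.

```python
class UpdateableValueSource:
    """Holds the master copy of a config item and notifies observers."""

    def __init__(self, value):
        self._value = value
        self._observers = []

    def get(self):
        return self._value

    def set(self, value):
        if value == self._value:
            return               # skip notification when nothing changed
        self._value = value
        for callback in self._observers:
            callback(value)

    def observe(self, callback):
        # Actively notify the callback of future changes.
        self._observers.append(callback)
```

A config item then holds a reference to its source, so editing the configuration file and signalling SIGHUP can propagate new values to every reader, with callbacks for the cases that need to react.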
Avi Kivity
8d7c1c7231 db: seed_provider_type: add operator==()
Dynamically updateable configuration requires checking whether configuration items
changed or not, so we can skip firing notifiers for the common case where nothing
changed.

This patch adds a comparison operator for seed_provider_type, which was missing it.
2019-06-28 16:43:25 +03:00
Avi Kivity
da2a98cde6 config: don't allow assignment to config values
Currently, we allow adjusting configuration via

  cfg.whatever() = 5;

by returning a mutable reference from cfg.whatever(). Soon, however, this operation
will have side effects (updating all references to the config item, and triggering
notifiers). While this can be done with a proxy, it is too tricky.

Switch to an ordinary setter interface:

  cfg.whatever.set(5);

Because boost::program_options no longer gets a reference to the value to be written
to, we have to move the update to a notifier, and the value_ex() function has to
be adjusted to infer whether it was called with a vector type after it is
called, not before.
2019-06-28 16:43:25 +03:00
Avi Kivity
b146fd1356 config: make noncopyable
config_file and db::config are soon not going to be copyable. The reason is that
in order to support live updating, we'll need per-shard copies of each value,
and per-shard tracking of references to values. While these can be copied, it
will be an asynchronous operation and thus cannot be done from a copy constructor.

So to prepare for these changes, replace all copies of db::config by references
and delete config_file's copy constructor.

Some existing references had to be made const in order to adapt the const-ness
of db::config now being propagated (rather than being terminated by a non-const
copy).
2019-06-28 16:43:25 +03:00
Avi Kivity
fe59997efe database: don't copy config object
Copying the config object breaks the link between the original and the copied
object, so updates to config items will not be visible. To allow updates, don't
copy any more, and instead keep a pointer.

The pointer won't work well once config is updateable, since the same object is
shared across multiple shards, but that can be addressed later.
2019-06-28 15:20:39 +03:00
Avi Kivity
339699b627 database: remove default constructor
Currently, database::_cfg is a copy of the global configuration. But this means
that we have multiple master copies of the configuration, which makes updating
the configuration harder. In order to eliminate the copy we have to eliminate the
database default constructor, which creates a config object, so that all
remaining constructors can receive config by reference and retain that reference.
2019-06-28 15:20:39 +03:00
Avi Kivity
70d8127400 gossip_test: pass configuration to database object
We want to eliminate the default database constructor (to be explained in
the next patch), so eliminate its only use in gossip_test, using the regular
constructor instead.
2019-06-28 15:20:39 +03:00
Glauber Costa
d916601ea4 toppartitions: fix typo
toppartitons -> toppartitions

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190627160937.7842-1-glauber@scylladb.com>
2019-06-27 19:13:58 +03:00
Tomasz Grabiec
e071445373 Merge "More precise poisoning in logalloc" from Rafael
With this unused descriptors and objects should always be poisoned.

 * https://github.com/espindola/scylla/ align-descriptors-so-that-they-are-poisoned-v4:
 Convert macros to inline functions
 More precise poisoning in logalloc
2019-06-27 16:30:40 +02:00
Takuya ASADA
eabb872789 dist/redhat: install /usr/sbin symlinks correctly
In the current scylla.spec, the shell glob pattern "scylla_*setup" is not
correctly expanded; it mistakenly created a symlink named "/usr/sbin/scylla_*setup".
We need to expand the glob and create a symlink for each setup script.

Fixes #4605

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190627053530.10406-2-syuu@scylladb.com>
2019-06-27 14:22:40 +03:00
Takuya ASADA
828b63f4fb dist/redhat: manage *.pyc as a part of package
Since we don't install .pyc files in our package, python3 will generate .pyc
files when a setup script is launched for the first time.
We then have unmanaged files under the script directory, which remain when
the Scylla package is upgraded or removed.

We need to compile *.py when we generate the relocatable package, and add the
compiled .pyc files to the .rpm/.deb packages.

Fixes #4612

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190627053530.10406-1-syuu@scylladb.com>
2019-06-27 14:22:39 +03:00
Rafael Ávila de Espíndola
d8dbacc7f6 More precise poisoning in logalloc
This change aligns descriptors and values to 8 bytes so that poisoning
a descriptor or value doesn't interfere with other descriptors and
values.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-06-26 13:13:48 -07:00
Rafael Ávila de Espíndola
6a2accb483 Convert macros to inline functions
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-06-26 13:13:48 -07:00
Avi Kivity
dd76943125 Merge "Segregate data when streaming by timestamp for time window compaction strategy" from Botond
"
When writing streamed data into sstables, while using time window
compaction strategy, we have to emit a new sstable for each time window.
Otherwise we can end up with sstables, mixing data from wildly different
windows, ruining the compaction strategy's ability to drop entire
sstables when all data within is expired. This gets worse as these mixed
sstables get compacted together with sstables that used to contain a
single time window.

This series provides a solution by segregating the data by its atoms'
time windows. This is done for the new RPC streaming, the new row-level
repair, memtable flush and compaction, ensuring that the segregation
requirement is respected at all times.

Fixes: #2687
"

* 'segregate-data-into-sstables-by-time-window-streaming/v2.1' of ssh://github.com/denesb/scylla:
  streaming,repair: restore indentation
  repair: pass the data stream through the compaction strategy's interposer consumer
  streaming: pass the data stream through the compaction strategy's interposer consumer
  TWCS: implement add_interposer_consumer()
  compaction_strategy: add add_interposer_consumer()
  Add mutation_source_metadata
  tests: add unit test for timestamp_based_splitting_writer
  Add timestamp_based_splitting_writer
  Introduce mutation_writer namespace
2019-06-26 19:18:52 +03:00
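The core idea of the segregation can be sketched in a few lines: classify each atom by the time window its timestamp falls into, and route it to a per-window output. This is an illustrative Python model, not Scylla's actual C++ `timestamp_based_splitting_writer`; the window size is an assumption:

```python
from collections import defaultdict

WINDOW_SECONDS = 3600  # assumed 1-hour windows for illustration

def window_of(timestamp: int) -> int:
    # The classifier: maps a timestamp to its time window.
    return timestamp // WINDOW_SECONDS

def segregate(rows):
    """rows: iterable of (timestamp, payload).
    Returns window -> payloads, one 'sstable writer' per time window,
    so no output ever mixes atoms from different windows."""
    outputs = defaultdict(list)
    for ts, payload in rows:
        outputs[window_of(ts)].append(payload)
    return dict(outputs)

result = segregate([(10, "a"), (3700, "b"), (20, "c")])
```

Each per-window output can then be dropped wholesale once everything in its window expires, which is exactly the property TWCS relies on.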
Tomasz Grabiec
3e30a33e31 Merge "Introduce tests::random_schema" from Botond
Most of our tests use overly simplistic schemas (`simple_schema`) or
very specialized ones that focus on exercising a specific area of the
tested code. This is fine in most places as not all code is schema
dependent; however, practice has shown that there can be nasty bugs
hiding in dark corners that only appear with a schema that has a
specific combination of types.

This series introduces `tests::random_schema` a utility class for
generating random schemas and random data for them. An important goal is
to make using random schemas in tests as simple and convenient as
possible, therefore fostering the appearance of tests using random
schemas.

Random schema was developed to help testing code I'm currently working
on, which segregates data by time-windows. As I wasn't confident in my
ability to think of every possible combination of types that can break
my code, I came up with random-schema to help me find these corner
cases. So far I consider it a success: it already found bugs in my code
that I'm not sure I would have found if I had relied on specific
schemas. It also found bugs in unrelated areas of the code which proves
my point in the first paragraph.

* https://github.com/denesb/scylla.git random_schema/v5:
  tests/data_model: approximate to the modeled data structures
  data_value: add ascii constructor
  tests/random-utils.hh: add stepped_int_distribution
  tests/random-utils.hh: get_int() add overloads that accept external
    rand engine
  tests/random-utils.hh: add get_real()
  tests: introduce random_schema
2019-06-26 18:10:20 +02:00
Botond Dénes
12b8405720 streaming,repair: restore indentation
Deferred from the previous two patches.
2019-06-26 18:45:36 +03:00
Botond Dénes
e3f4692868 repair: pass the data stream through the compaction strategy's interposer consumer 2019-06-26 18:45:36 +03:00
Botond Dénes
9c2407573c streaming: pass the data stream through the compaction strategy's interposer consumer 2019-06-26 18:45:36 +03:00
Botond Dénes
ee563928df TWCS: implement add_interposer_consumer()
Exploit the interposer customization point to inject a consumer that will
segregate the mutation stream based on the contained atoms' timestamps,
allowing the requirements of TWCS to be maintained every time sstables
are written to disk.
For the implementation, `timestamp_based_splitting_writer` is used,
with a classifier that maps timestamps to windows.
2019-06-26 18:45:36 +03:00
Tomasz Grabiec
2d3e3640df Merge "Collection: use utils::chunked_vector to store the cells" from Botond
This is a band-aid patch that is supposed to fix the immediate problem
of large collections causing large allocations. The proper fix is to
use IMR, but that will take time. In the meantime, alleviate the
pressure on the memory allocator by using a chunked storage collection
(utils::chunked_vector) instead of std::vector. In the linked issue
seastar::chunked_fifo was also proposed as the container to use;
however, chunked_fifo is not traversable in reverse, which disqualifies
it from this role.

Refs: #3602
2019-06-26 15:32:25 +02:00
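The benefit of chunked storage over a flat vector is that the largest single allocation stays bounded by the chunk size, while reverse traversal (which disqualified `chunked_fifo`) still works. A toy Python model of the idea, not the actual C++ `utils::chunked_vector`:

```python
class ChunkedVector:
    """Toy model of chunked storage: many small fixed-size chunks instead
    of one contiguous buffer, so a huge collection never triggers one huge
    allocation. Unlike a chunked FIFO it supports reverse traversal."""
    CHUNK = 4  # deliberately tiny chunk size, for illustration only

    def __init__(self):
        self._chunks = [[]]

    def push_back(self, item):
        if len(self._chunks[-1]) == self.CHUNK:
            self._chunks.append([])      # allocate a new small chunk
        self._chunks[-1].append(item)

    def __iter__(self):                  # forward traversal
        for chunk in self._chunks:
            yield from chunk

    def __reversed__(self):              # reverse traversal, chunk by chunk
        for chunk in reversed(self._chunks):
            yield from reversed(chunk)

v = ChunkedVector()
for i in range(10):
    v.push_back(i)
forward = list(v)
backward = list(reversed(v))
```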
Botond Dénes
a280dcfe4c compaction_strategy: add add_interposer_consumer()
This will be the customization point for compaction strategies, used to
inject a specific interposer consumer that can manipulate the fragment
stream so that it satisfies the requirements of the compaction strategy.
For now the only candidate for injecting such an interposer is
time-window compaction strategy, which needs to write sstables that
only contain atoms belonging to the same time-window. By default no
interposer is injected.
Also add an accompanying customization point
`adjust_partition_estimate()` which returns the estimated per-sstable
partition-estimate that the interposer will produce.
2019-06-26 15:45:59 +03:00
Botond Dénes
3ce902a4be Add mutation_source_metadata
This struct contains metadata regarding a mutation_source. Currently
it contains the min and max timestamp. This will be used later by
compaction strategies to determine whether a given mutation stream has
to be split or not.
2019-06-26 15:45:59 +03:00
Botond Dénes
25d7cbedc0 tests: add unit test for timestamp_based_splitting_writer 2019-06-26 15:45:59 +03:00
Botond Dénes
df29600eec Add timestamp_based_splitting_writer
This writer implements the core logic of time-window based data
segregation. It splits the fragment stream provided by a reader, such
that each atom (cell) in the stream will be written into a consumer
based on the time-window its timestamp belongs to. The end result is
that each consumer will only see fragments whose atoms all have
timestamps belonging to the same time-window.
When a mutation fragment has atoms belonging to different time-windows,
it is split into as many fragments as needed so each has only atoms
that belong to the same time-window.
2019-06-26 15:45:59 +03:00
Botond Dénes
2693f1838a Introduce mutation_writer namespace
Currently there is a single mutation_writer: `multishard_writer`,
however in the next patch we are going to add another one. This is the
right moment to move these into a common namespace (and folder), we
have way too much stuff scattered already in the top-level namespace
(and folder).
Also rename `tests/multishard_writer_test.cc` to
`tests/mutation_writer_test.cc`, this test-suite will be the home of all
the different mutation writer's unit test cases.
2019-06-26 15:45:59 +03:00
Avi Kivity
adcc95dddc Merge "sstable: mc: reader: Optimize multi-partition scans for data sets with small partitions" from Tomasz
"
Currently, parser and the consumer save its state and return the
control to the caller, which then figures out that it needs to enter a
new partition, and that it doesn't need to skip. We do it twice, after
row end, and after row start. All this work could be avoided if the
consumer installed by the reader adjusted its state and pushed the
fragments on the spot. This patch achieves just that.

This results in less CPU overhead.

The ka/la reader is left still stopping after row end.

Brings a 20% improvement in frag/s for a full scan in perf_fast_forward (Haswell, NVMe):

perf_fast_forward -c1 -m1G --run-tests=small-partition-skips:

Before:

   read    skip      time (s)   iterations     frags     frag/s    mad f/s    max f/s    min f/s    avg aio    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    cpu
-> 1       0         0.952372            4   1000000    1050009        755    1050765    1046585      976.0    971     124256       1       0        0        0        0        0        0        0  99.7%
After:

   read    skip      time (s)   iterations     frags     frag/s    mad f/s    max f/s    min f/s    avg aio    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    cpu
-> 1       0         0.790178            4   1000000    1265538       1150    1266687    1263684      975.0    971     124256       2       0        0        0        0        0        0        0  99.6%

Tests: unit (dev)
"

* 'sstable-optimize-partition-scans' of https://github.com/tgrabiec/scylla:
  sstable: mc: reader: Do not stop parsing across partitions
  sstables: reader: Move some parser state from sstable_mutation_reader to mp_row_consumer_reader
  sstables: reader: Simplify _single_partition_read checking
  sstables: reader: Update stats from on_next_partition()
  sstables: mutation_fragment_filter: Drop unnecessary calls to _walker.out_of_range()
  sstables: ka/la: reader make push_ready_fragments() safe to call many times
  sstables: mc: reader: Move out-of-range check out of push_ready_fragments()
  sstables: reader: Return void from push_ready_fragments()
  sstables: reader: Rename on_end_of_stream() to on_out_of_clustering_range()
  sstables: ka/la: reader: Make sure push_ready_fragments() does not miss to emit partition_end
2019-06-26 13:19:12 +03:00
Avi Kivity
06a9596491 tests: cql_test_env: disable commitlog O_DSYNC
O_DSYNC causes commitlog to pre-allocate each commitlog segment by writing
zeroes into it. In normal operation, this is amortized over the many
times the segment will be reused. In tests, this is wasteful, but under
the default workstation configuration with /tmp using tmpfs, no actual
writes occur.

However on a non-default configuration with /tmp mounted on a real disk,
this causes huge disk I/O and eventually a crash (observed in
schema_change_test). The crash is likely only caused indirectly, as the
extra I/O (exacerbated by many tests running in parallel) causes timeouts.

I reproduced this problem by running 15 copies of schema_change_test in
parallel with /tmp mounted on a real filesystem. Without this change, I
usually observe one or two of the copies crashing, with the change they
complete (and much more quickly, too).
2019-06-26 12:15:53 +02:00
Asias He
f0f0beba2e repair: Move the global tracker object into repair_service
The tracker object was a static object in repair.cc. At the time we initialize
it, we do not know smp::count, so we have to initialize the _repairs
object on the fly, when it is first used.

    void init_repair_info() {
        if (_repairs.size() != smp::count) {
            _repairs.resize(smp::count);
        }
    }

This introduces a race if init_repair_info is called on different
threads (shards).

To fix, put the tracker object inside the newly introduced
repair_service object which is created in main.cc.

Fixes #4593
Message-Id: <b1adef1c0528354d2f92f8aaddc3c4bee5dc8a0a.1561537841.git.asias@scylladb.com>
2019-06-26 12:53:10 +03:00
Botond Dénes
572a738777 collection: use chunked_vector to store cells
This is a quick fix to the immediate problem of large collections causing
large allocations, triggering stalls or OOM. The proper fix is to
use IMR for storing the cells, but that is a complex change that will
require time, so let's not stall/OOM in the meantime.
2019-06-26 11:40:44 +03:00
Botond Dénes
c68ffc330e types: don't copy collection_type_impl::mutation_view
Just because it's a view doesn't mean it's cheap to copy.
2019-06-26 11:39:41 +03:00
Asias He
fb3f0125ee repair: Add default constructor for partition_key_and_mutation_fragments
This is useful when we want to add an empty
partition_key_and_mutation_fragments.
2019-06-26 09:12:55 +08:00
Asias He
3fc53a6b72 repair: Add send_full_set_rpc_stream in row_level_diff_detect_algorithm
It is used to negotiate if the master can use the rpc stream interface
to transfer data.
2019-06-26 09:12:55 +08:00
Asias He
6054a56333 repair: Add repair_row_on_wire_with_cmd
It is used to contain both a repair cmd and repair_row_on_wire object.
2019-06-26 09:12:55 +08:00
Asias He
9f36d775dc repair: Add repair_hash_with_cmd
It is a wrapper contains both a repair cmd and repair_hash object.
2019-06-26 09:12:55 +08:00
Asias He
6b59279e26 repair: Add repair_stream_cmd
It is used by row level repair to add small protocol on top of the rpc stream
interface.
2019-06-26 09:12:55 +08:00
Rafael Ávila de Espíndola
94d2194c77 dht: token: Simplify operator<
While this is a strict weak ordering, it is not obvious and duplicates
a bit of logic. This patch simplifies it by using tri_compare.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190621204820.37874-1-espindola@scylladb.com>
2019-06-25 19:06:30 +03:00
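The pattern the patch applies is generic: keep the ordering logic in a single three-way comparison and derive `operator<` from its sign. An illustrative sketch (plain integers standing in for tokens; not Scylla's actual `dht::token` code):

```python
def tri_compare(a, b) -> int:
    """Three-way compare: negative, zero, or positive.
    The single source of truth for the ordering."""
    return (a > b) - (a < b)

def less(a, b) -> bool:
    # operator< becomes a one-liner with no duplicated comparison logic,
    # and the strict weak ordering property is inherited from tri_compare.
    return tri_compare(a, b) < 0
```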
Tomasz Grabiec
269e65a8db Merge "Sync schema before repair" from Asias
This series makes sure new schema is propagated to repair master and
follower nodes before repair.

Fixes #4575

* dev.git asias/repair_pull_schema_v2:
  migration_manager: Add sync_schema
  repair: Sync schema from follower nodes before repair
2019-06-25 19:05:29 +03:00
Amos Kong
f0cd589a75 dist: suppress the yaml load warning
YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated,
as the default Loader is unsafe. Please read https://msg.pyyaml.org/load
for full details.

Fix this by using the new safe interface, yaml.safe_load().

Signed-off-by: Amos Kong <amos@scylladb.com>
Cc: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <9b68601845117274573474ede0341cc81f80efa6.1561156205.git.amos@scylladb.com>
2019-06-25 19:05:29 +03:00
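The change in client code is a one-word swap. A minimal sketch using PyYAML (the config snippet is made up for illustration): `yaml.load()` without an explicit `Loader=` argument emits `YAMLLoadWarning` and can construct arbitrary Python objects from untrusted input, while `yaml.safe_load()` builds plain Python data only and emits no warning.

```python
import yaml

doc = "cluster_name: test\nnum_tokens: 256\n"

# Instead of: yaml.load(doc)           <- warns, and is unsafe
config = yaml.safe_load(doc)           # plain dicts/lists/scalars only
```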
Avi Kivity
fc629bb14f Merge "cql3: lift infinite bound check" from Benny & Piotr
"
If the database supports infinite bound range deletions,
the CQL layer will no longer throw an error indicating that both ranges
need to be specified.

Fixes #432

Update test_range_deletion_scenarios unit test accordingly.
"

* 'cql3-lift-infinite-bound-check' of https://github.com/bhalevy/scylla:
  cql3: lift infinite bound check if it's supported
  service: enable infinite bound range deletions with mc
  database: add flag for infinite bound range deletions
2019-06-25 19:05:29 +03:00
Nadav Har'El
a88c9ca5a5 Merge branch 'add_proper_aggregation_for_paged_indexing_2' of git://github.com/psarna/scylla into next
Piotr Sarna says:

Fixes #4540
This series adds proper handling of aggregation for paged indexed queries.
Before this series, returned results were presented to the user in a
partial, per-page manner, while they should have been returned as a single
aggregated value.

Tests: unit(dev)

Piotr Sarna (8):
  cql3: split execute_base_query implementation
  cql3: enable explicit copying of query_options
  cql3: add a query options constructor with explicit page size
  cql3: add proper aggregation to paged indexing
  cql3: make DEFAULT_COUNT_PAGE_SIZE constant public
  tests: add query_options to cquery_nofail
  tests: add indexing + paging + aggregation test case
  tests: add indexing+paging test case for clustering keys
2019-06-25 19:05:29 +03:00
Avi Kivity
7195f75fb2 Update seastar submodule
* seastar ded50bd8a4...b629d5ef7a (9):
  > sharded: no_sharded_instance_exception: fix grammar
  > core,net: output_stream: remove redundant std::move()
  > perftune: make sure that ethtool -K has a chance of succeeding
  > net/dpdk: upgrade to dpdk-19.05
  > perftune.py: Fix a few more places where we use deprecated pyudev.Device ones
  > reactor: provide an uptime function
  > rpc: add sink::flush() to streaming api
  > Use a table to document the various build modes
  > foreign_ptr: Fix compilation error due to unused variable
2019-06-25 19:05:29 +03:00
Avi Kivity
9d21341733 review-checklist.md: add common checks
- code style
 - naming
 - micro-performance
 - concurrency
 - unit-testing
 - templates and type erasure
 - singletons
2019-06-25 19:05:29 +03:00
Piotr Sarna
efa7951ea5 main: stop view builder conditionally
The view builder is started only if it's enabled in config,
via the view_building=true variable. Unfortunately, stopping
the builder was unconditional, which may result in failed
assertions during shutdown. To remedy this, view building
is stopped only if it was previously started.

Fixes #4589
2019-06-25 19:05:29 +03:00
Asias He
bb5665331c repair: Sync schema from follower nodes before repair
Since commit "repair: Use the same schema version for repair master and
followers", the repair master and followers use the same schema version
that the master decides to use during the whole repair operation. If the
master has an older version of the schema, repair could ignore data that
makes use
of the new schema, e.g., writes to new columns.

To fix, always sync the schema agreement before repair.

The master node pulls schema from followers and applies locally. The
master then uses the "merged" schema. The followers use
get_schema_for_write() to pull the "merged" schema.

Fixes #4575
Backports: 3.1
2019-06-25 17:13:47 +08:00
Asias He
14c1a71860 migration_manager: Add sync_schema
Makes sure this node knows about all schema changes known by
"nodes" that were made prior to this call.

Refs: #4575
Backports: 3.1
2019-06-25 17:13:47 +08:00
Botond Dénes
d00cb4916c tests: introduce random_schema
random_schema is a utility class that provides methods for generating
random schemas as well as generating data (mutations) for them. The aim
is to make using random schemas in tests as simple and convenient as
is using `simple_schema`. For this reason the interface of
`random_schema` follows closely that of `simple_schema` to the extent
that it makes sense. An important difference is that `random_schema`
relies on `data_model` to actually build mutations. So all its
mutation-related operations work with `data_model::mutation_descrition`
instead of actual `mutation` objects. Once the user arrived at the
desired mutation description they can generate an actual mutation via
`data_model::mutation_description::build()`.

In addition to the `random_schema` class, the `random_schema.hh` header
exposes the generic utility classes for generating types and values
that it internally uses.

random_schema is fully deterministic. Using the same seed and the same
set of operations is guaranteed to result in generating the same schema
and data.
2019-06-25 12:01:33 +03:00
Botond Dénes
070d72ee23 tests/random-utils.hh: add get_real() 2019-06-25 12:01:33 +03:00
Botond Dénes
2d9f6c3b63 tests/random-utils.hh: get_int() add overloads that accept external rand engine 2019-06-25 12:01:33 +03:00
Botond Dénes
2a7710129e tests/random-utils.hh: add stepped_int_distribution 2019-06-25 12:01:33 +03:00
Botond Dénes
a3f9932a2f data_value: add ascii constructor
To allow a `data_value` with `ascii_type` to be constructed.
2019-06-25 12:01:33 +03:00
Botond Dénes
1bd8b77770 tests/data_model: approximate to the modeled data structures
Make the data modelling structures model their "real" counterparts
more closely, allowing the user greater control over the produced data.
The changes:
* Add timestamp to atomic_value (which is now a struct, not just an
    alias to bytes).
* Add tombstone to collection.
* Add row_tombstone to row.
* Add bound kinds and tombstone to range_tombstone.

Great care was taken to preserve backward compatibility, to avoid
unnecessary changes in existing code.
2019-06-25 12:01:33 +03:00
Piotr Sarna
add40d4e59 cql3: lift infinite bound check if it's supported
If the database supports infinite bound range deletions,
the CQL layer will no longer throw an error indicating that both ranges
need to be specified.

[bhalevy] Update test_range_deletion_scenarios unit test accordingly.

Fixes #432

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-06-24 15:58:34 +03:00
Piotr Sarna
c19fdc4c90 service: enable infinite bound range deletions with mc
As soon as it's agreed that the cluster supports sstables in mc format,
infinite bound range deletions in statements can be safely enabled.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-06-24 15:58:28 +03:00
Piotr Sarna
e77ef849af database: add flag for infinite bound range deletions
Database can only support infinite bound range deletions if sstable mc
format is supported. As a first step to implement these checks,
an appropriate flag is added to database.
2019-06-24 15:57:47 +03:00
Piotr Sarna
b668ee2b2d tests: add indexing+paging test case for clustering keys
Indexing a non-prefix part of the clustering key has a separate
code path (see issue #3405), so it deserves a separate test case.
2019-06-24 14:51:17 +02:00
Piotr Sarna
3d9a37f28f tests: add indexing + paging + aggregation test case
Indexed queries used to erroneously return partial per-page results
for aggregation queries. This test case used to reproduce the problem
and now guards against regressions.

Refs #4540
2019-06-24 14:06:42 +02:00
Piotr Sarna
60cafcc39c tests: add query_options to cquery_nofail
The cquery_nofail utility is extended, so it can accept custom
query options, just like execute_cql does.
2019-06-24 14:06:41 +02:00
Piotr Sarna
fe18638de3 cql3: make DEFAULT_COUNT_PAGE_SIZE constant public
The constant will be later used in test scenarios.
2019-06-24 13:21:37 +02:00
Piotr Sarna
bb08af7e68 cql3: add proper aggregation to paged indexing
Aggregated and paged filtering needs to aggregate the results
from all pages in order to avoid returning partial per-page
results. It's a little bit more complicated than regular aggregation,
because each paging state needs to be translated between the base
table and the underlying view. The routine keeps fetching pages
from the underlying view, which are then used to fetch base rows,
which go straight to the result set builder.

Fixes #4540
2019-06-24 13:21:32 +02:00
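The shape of the fix can be sketched as: keep fetching pages internally and fold them into one aggregate before answering the client, instead of returning each page's partial result. This is an illustrative Python sketch of the control flow, not the actual cql3 routine; `fetch_page` is a hypothetical stand-in for the view-page/base-row lookup:

```python
def aggregated_count(fetch_page):
    """fetch_page(paging_state) -> (rows, next_paging_state or None).
    Keeps pulling pages and accumulating, emitting one aggregate at
    the end rather than a partial result per page."""
    total, state = 0, None
    while True:
        rows, state = fetch_page(state)
        total += len(rows)
        if state is None:          # no more pages: return the full aggregate
            return total

# Three pages: two rows, one row, then an empty terminal page.
pages = [(["r1", "r2"], 1), (["r3"], 2), ([], None)]
result = aggregated_count(lambda s: pages[0 if s is None else s])
```

In the real code each internal paging state additionally has to be translated between the base table and the underlying view, as the commit message describes.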
Piotr Sarna
97d476b90f cql3: add a query options constructor with explicit page size
For internal use, there already exists a query_options constructor
that copies data from another query_options with overwritten paging
state. This commit adds an option to overwrite page size as well.
2019-06-24 13:21:32 +02:00
Piotr Sarna
fa89e220ef cql3: enable explicit copying of query_options 2019-06-24 12:57:04 +02:00
Piotr Sarna
7a8b243ce4 cql3: split execute_base_query implementation
In order to handle aggregation queries correctly, the function that
returns base query results is split into two, so it's possible to
access raw query results, before they're converted into end-user
CQL message.
2019-06-24 12:57:03 +02:00
Benny Halevy
b1e78313fe log_histogram: log_heap_options::bucket_of: avoid calling pow2_rank(0)
pow2_rank is undefined for 0.
bucket_of currently works around that by using a bitmask of 0.
To allow asserting that count_{leading,trailing}_zeros are not
called with 0, we want to avoid it at all call sites.

Fixes #4153

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190623162137.2401-1-bhalevy@scylladb.com>
2019-06-23 19:32:51 +03:00
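The hazard and the guard can be shown in miniature. `count_leading_zeros`/`count_trailing_zeros` (and hence `pow2_rank`) are undefined for 0, so the call site must special-case zero rather than mask around it. An illustrative Python sketch, not the actual `log_histogram` internals:

```python
def pow2_rank(x: int) -> int:
    """floor(log2(x)); undefined for 0, like count_leading_zeros."""
    assert x > 0, "pow2_rank(0) is undefined"
    return x.bit_length() - 1

def bucket_of(count: int) -> int:
    # Guard at the call site: pow2_rank is never reached with 0,
    # so the undefined case cannot occur.
    return 0 if count == 0 else pow2_rank(count)
```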
Avi Kivity
779b378785 Merge "Fix partitioned_sstable_set by making it self sufficient" from Raphael & Benny
"
partitioned_sstable_set is not self sufficient because it relies on
compatible_ring_position_view, which in turn relies on lifetime of
sstable object. This leads to use-after-free. Fix this problem by
introducing compatible_ring_position and using it in p__s__s.

Fixes #4572.

Test: unit (dev), compaction dtests (dev)
"

* 'projects/fix_partitioned_sstable_set/v4' of ssh://github.com/bhalevy/scylla:
  tests: Test partitioned sstable set's self-sufficiency
  sstables: Fix partitioned_sstable_set by making it self sufficient
  Introduce compatible_ring_position and compatible_ring_position_or_view
2019-06-23 17:17:18 +03:00
Raphael S. Carvalho
14fa7f6c02 tests: Test partitioned sstable set's self-sufficiency
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-06-23 16:29:13 +03:00
Raphael S. Carvalho
293557a34e sstables: Fix partitioned_sstable_set by making it self sufficient
Partitioned sstable set is not self sufficient, because it uses compatible_ring_position_view
as the key for its interval map, which is constructed from a decorated key in the sstable object.
If the sstable object is destroyed, as when compaction releases it early, the partitioned set
potentially no longer works because c__r__p__v would store information that is already freed,
meaning its use implies use-after-free.
Therefore, the problem happens when partitioned set tries to access the interval of its
interval map and uses freed information from c__r__p__v.

Fix is about using the newly introduced compatible_ring_position_or_view which can hold a
ring_position, meaning that partitioned set is no longer dependent on lifetime of sstable
object.

Retire compatible_ring_position_view.hh as it is now unused.

Fixes #4572.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-06-23 16:29:13 +03:00
Raphael S. Carvalho
9a83561700 Introduce compatible_ring_position and compatible_ring_position_or_view
The motivation for supporting ring position is that containers using
it can be self sufficient. The existing compatible_ring_position_view
could lead to use-after-free when the ring position data it was built
from is gone.

The motivation for compatible_ring_position_or_view is to allow lookup
on containers that don't support different key types using c__r__p,
and also to avoid unnecessary copies.
If the user is provided only with a ring_position_view, c__r__p__or_v
could be built from it and used for lookups.
Converting ring_position_view to ring_position is very bug prone because
there could be information lost in the process.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-06-23 16:29:12 +03:00
Rafael Ávila de Espíndola
65ac0a831c Add to_string_impl that takes a data_value
Currently to_string takes raw bytes. This means that to print a
data_value it has to first be serialized to be passed to to_string,
which then deserializes it.

This patch adds a virtual to_string_impl that takes a data_value and
implements a now non-virtual to_string on top of it.

I don't expect this to have a performance impact. It mostly documents
how to access a data_value without converting it to bytes.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190620183449.64779-3-espindola@scylladb.com>
2019-06-23 16:03:06 +03:00
Rafael Ávila de Espíndola
3bd5dd7570 Add a few more tests of data_value::to_string
I found that no tests covered this code while refactoring it.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190620183449.64779-2-espindola@scylladb.com>
2019-06-23 16:03:06 +03:00
Nadav Har'El
6e87bca65d storage_proxy: fix race and crash in case of MV and other node shutdown
Recently, in merge commit 2718c90448,
we added the ability to cancel pending view-update requests when we detect
that the target node went down. This is important for view updates because
these have a very long timeout (5 minutes), and we wanted to make this
timeout even longer.

However, the implementation caused a race: Between *creating* the update's
request handler (create_write_response_handler()) and actually starting
the request with this handler (mutate_begin()), there is a preemption point
and we may end up deleting the request handler before starting the request.
So mutate_begin() must gracefully handle the case of a missing request
handler, and not crash with a segmentation fault as it did before this patch.

Eventually the lifetime management of request handlers could be refactored
to avoid this delicate fix (which requires more comments to explain than
code), or even better, it would be more correct to cancel individual writes
when a node goes down, not drop the entire handler (see issue #4523).
However, for now, let's not do such invasive changes and just fix bug that
we set out to fix.

Fixes #4386.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190620123949.22123-1-nyh@scylladb.com>
2019-06-23 16:03:06 +03:00
Asias He
b99c75429a repair: Avoid searching all the rows in to_repair_rows_on_wire
The repair_rows in row_list are sorted. It is only possible for the
current repair_row to share the same partition key with the last
repair_row inserted into repair_rows_on_wire, so there is no need to
search from the beginning of repair_rows_on_wire, which had quadratic
complexity. To fix, look only at the last item in repair_rows_on_wire.

Fixes #4580
Message-Id: <08a8bfe90d1a6cf16b67c210151245879418c042.1561001271.git.asias@scylladb.com>
2019-06-23 16:03:06 +03:00
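Because the rows arrive sorted by partition key, a new row can only belong to the most recently emitted group, so checking just the last group keeps the pass O(n). An illustrative Python sketch of the grouping (simplified stand-in for the actual repair code):

```python
def to_rows_on_wire(sorted_rows):
    """sorted_rows: iterable of (partition_key, row), sorted by key.
    Groups rows of the same partition together in one linear pass."""
    on_wire = []                          # list of (partition_key, [rows])
    for key, row in sorted_rows:
        if on_wire and on_wire[-1][0] == key:
            on_wire[-1][1].append(row)    # same partition as the last group
        else:
            on_wire.append((key, [row]))  # new partition starts a new group
    return on_wire

grouped = to_rows_on_wire([("k1", 1), ("k1", 2), ("k2", 3)])
```

Searching all existing groups for each row, as the old code did, would degrade to O(n²) on large row lists; the sortedness guarantee makes the last-group check sufficient.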
Benny Halevy
883cb4318f Merge pull request #4583 from bhalevy/init-and-shutdown-logging
Init and shutdown logging
2019-06-23 16:03:06 +03:00
Rafael Ávila de Espíndola
3660caff77 Reduce memory used by all tests
Tests without custom flags were already being run with -m2G. Tests
with custom flags have to manually specify it, but some were missing
it. This could cause tests to fail with std::bad_alloc when two
concurrent tests tried to allocate all the memory.

This patch adds -m2G to all tests that were missing it.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190620002921.101481-1-espindola@scylladb.com>
2019-06-23 16:03:06 +03:00
Avi Kivity
9229afe64f Merge "Fix infinite paging for indexed queries" from Piotr
"
Fixes #4569

This series fixes the infinite paging for indexed queries issue.
Before this fix, paging indexes tended to end up in an infinite loop
of returning pages with 0 results but with the has_more_pages flag set to true,
which confused the drivers.

Tests: unit(dev)
Branches: 3.0, 3.1
"

* 'fix_infinite_paging_for_indexed_queries' of https://github.com/psarna/scylla:
  tests: add test case for finishing index paging
  cql3: fix infinite paging for indexed queries
2019-06-23 16:03:06 +03:00
Takuya ASADA
2135d2ae7f dist/debian: install capabilities.conf on postinst script
We still have the "{{^jessie}}" tag on the scylla-server systemd unit file to
skip using AmbientCapabilities on Debian 8, but it no longer works
since we moved to a single binary .deb package for all Debian variants:
we must share the same systemd unit file across all of them.

To do so we need a separate file under /etc/systemd to define
AmbientCapabilities; create the file while running the postinst script only
if the distribution is not Debian 8, just like we do in the .rpm.

See #3344
See #3486

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190619064224.23035-1-syuu@scylladb.com>
2019-06-23 16:03:06 +03:00
Tomasz Grabiec
46341bd63f gdb: Print coordinator stats related to memory usage from 'scylla memory'
Example:

 Coordinator:
  fg writes:            150
  bg writes:          39980, 21429280 B
  fg reads:               0
  bg reads:               0
  hints:                  0 B
  view hints:             0 B

Reviewed-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <1559906745-17150-1-git-send-email-tgrabiec@scylladb.com>
2019-06-23 16:03:06 +03:00
Tomasz Grabiec
f7e79b07d1 lsa: Respect the reclamation step hint from seastar allocator
This will allow us to reduce the amount of segment compaction when
reclaiming on behalf of a large allocation because we'll evict much
more up front.

Tests:
  - unit (dev)

Reviewed-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <1559906584-16770-1-git-send-email-tgrabiec@scylladb.com>
2019-06-23 16:03:06 +03:00
Tomasz Grabiec
c5184b3dd0 gdb: Print region_impl pointer from scylla lsa
Reviewed-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <1559906684-17019-1-git-send-email-tgrabiec@scylladb.com>
2019-06-23 16:03:06 +03:00
Alexys Jacob
98bc9edf6f thrift/: support version 0.11+ after THRIFT-2221
Thrift 0.11 changed to generate c++ code with
std::shared_ptr instead of boost::shared_ptr.

- https://issues.apache.org/jira/browse/THRIFT-2221

This was forcing scylla to stick with older versions
of thrift.

Fixes issue #3097.

thrift: add type aliases to build with old and new versions.

update to using namespace =
2019-06-23 16:03:06 +03:00
Takuya ASADA
e4320d6537 dist/debian: run 'systemctl daemon-reload' automatically on package install/uninstall
Since we cannot use dh --with=systemd (we don't want to
automatically enable systemd units; we manage them with our setup scripts),
we have to do 'systemctl daemon-reload' manually.
(With dh --with=systemd, the systemd helper automatically provides such
scripts.)

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190618000210.28972-1-syuu@scylladb.com>
2019-06-23 16:03:06 +03:00
Rafael Ávila de Espíndola
8c067c26d9 Add support for the sanitize build mode in scylla
Running tests in debug mode takes 25:22.08 in my machine. Using
sanitize instead takes that down to 10:46.39.

The mode is opt in, in that it must be explicitly selected with
"configure.py --mode=sanitize" or "ninja sanitize". It must also be
explicitly passed to test.py.

Unfortunately building with asan, optimizations and debug info is
very slow and there is nothing like -gline-tables-only in gcc.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190617170007.44117-1-espindola@scylladb.com>
2019-06-23 16:03:06 +03:00
Benny Halevy
1fd91eb616 main: add logging for deferred stopping
Increase visibility of init messages to help diagnose init
and shutdown issues.

Ref #4384

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-06-20 13:04:36 +03:00
Benny Halevy
cbbe5a519a main: improve init logging
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-06-20 13:04:36 +03:00
Benny Halevy
e96b1afdbd supervisor::notify log at info level rather than trace
Increase visibility of init messages to help diagnose init
and shutdown issues.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-06-20 13:04:36 +03:00
Tomasz Grabiec
fa2ed3ecce sstable: mc: reader: Do not stop parsing across partitions
Currently, the parser and the consumer save their state and return
control to the caller, which then figures out that it needs to enter a
new partition, and that it doesn't need to skip. We do it twice, after
row end, and after row start. All this work could be avoided if the
consumer installed by the reader adjusted its state and pushed the
fragments on the spot. This patch achieves just that.

This results in less CPU overhead.

The ka/la reader is left still stopping after row end.

Brings a 20% improvement in frag/s for a full scan in perf_fast_forward (Haswell, NVMe):

 perf_fast_forward -c1 -m1G  --run-tests=small-partition-skips:

Before:

   read    skip      time (s)   iterations     frags     frag/s    mad f/s    max f/s    min f/s    avg aio    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    cpu
-> 1       0         0.952372            4   1000000    1050009        755    1050765    1046585      976.0    971     124256       1       0        0        0        0        0        0        0  99.7%

After:

   read    skip      time (s)   iterations     frags     frag/s    mad f/s    max f/s    min f/s    avg aio    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    cpu
-> 1       0         0.790178            4   1000000    1265538       1150    1266687    1263684      975.0    971     124256       2       0        0        0        0        0        0        0  99.6%
2019-06-19 14:29:02 +02:00
Tomasz Grabiec
386079472a sstables: reader: Move some parser state from sstable_mutation_reader to mp_row_consumer_reader
This state will be needed by the consumer to handle crossing partition
boundaries on its own.

While at it, document it.
2019-06-19 14:29:02 +02:00
Tomasz Grabiec
92cb07debd sstables: reader: Simplify _single_partition_read checking
The old code was making advance_to_next_partition() behave
incorrectly when _single_partition_read, which was compensated by a
check in read_partition().

Cleaner to exit early.
2019-06-19 14:29:02 +02:00
Tomasz Grabiec
7f4c041ba0 sstables: reader: Update stats from on_next_partition()
After partition_start is emitted directly from the parser's consumer,
read_partition() will not always be called for each produced partition.
2019-06-19 14:29:02 +02:00
Tomasz Grabiec
0964a8fb38 sstables: mutation_fragment_filter: Drop unnecessary calls to _walker.out_of_range()
out_of_range() cannot change to true when the position falls into the
ranges, we only need to check it when it falls outside them.
2019-06-19 14:29:02 +02:00
Tomasz Grabiec
556ccf4373 sstables: ka/la: reader make push_ready_fragments() safe to call many times
Not a bug fix, just makes the implementation more robust against changes.

Before this patch this might have resulted in partition_end being
pushed many times.
2019-06-19 14:29:01 +02:00
Tomasz Grabiec
ef6edff673 sstables: mc: reader: Move out-of-range check out of push_ready_fragments()
Currently, calling push_ready_fragments() with _mf_filter disengaged
or with _mf_filter->out_of_range() causes it to call
_reader->on_out_of_clustering_range(), which emits the partition_end
fragment. It's incorrect to emit this fragment twice, or zero times,
so correctness depends on the fact that push_ready_fragments() is
called exactly once when transitioning between partitions.

This is proved to be tricky to ensure, especially after partition_end
starts to be emitted in a different path as well. Ensuring that
push_ready_fragments() is *NOT* called after partition_end is emitted
from consume_partition_end() becomes tricky.

After having to fix this problem many times after unrelated changes to
the flow, I decided that it's better to refactor.

This change moves the call of on_out_of_clustering_range() out of
push_ready_fragments(), making the latter safe to call any number of
times.

The _mf_filter->out_of_range() check is moved to sites which update
the filter.

It's also good because it gets rid of conditionals.
2019-06-19 14:29:01 +02:00
Tomasz Grabiec
552fe21812 sstables: reader: Return void from push_ready_fragments()
The result is ignored, which is fine, so make it official to avoid
confusion.
2019-06-19 14:29:01 +02:00
Tomasz Grabiec
1488b57933 sstables: reader: Rename on_end_of_stream() to on_out_of_clustering_range()
The old name is confusing, because we're not always ending the stream
when we call it.
2019-06-19 14:29:01 +02:00
Tomasz Grabiec
9b8ac5ecbc sstables: ka/la: reader: Make sure push_ready_fragments() does not miss to emit partition_end
Currently, if there is a fragment in _ready and _out_of_range was set
after row end was consumed, push_ready_fragments() would return
without emitting partition_end.

This is problematic once we make consume_row_start() emit
partition_start directly, because we will want to assume that all
fragments for the previous partition are emitted by then. If they're
not, then we'd emit partition_start before partition_end for the
previous partition. The fix is to make sure that
push_ready_fragments() emits everything.
2019-06-19 14:14:38 +02:00
Piotr Sarna
b8cadc928c tests: add test case for finishing index paging
The test case makes sure that paging over indexed queries does not
result in an infinite loop.

Refs #4569
2019-06-19 14:10:13 +02:00
Piotr Sarna
88f3ade16f cql3: fix infinite paging for indexed queries
Indexed queries need to translate between view table paging state
and base table paging state, in order to be able to page the results
correctly. One of the stages of this translation is overwriting
the paging state obtained from the base query, in order to return
view paging state to the user, so it can be used for fetching next
pages. Unfortunately, in the original implementation the paging
state was overwritten only if more pages were available,
while if 'remaining' pages were equal to 0, nothing was done.
This is not enough, because the paging state of the base query
needs to be overwritten unconditionally - otherwise a guard paging state
value of 'remaining == 0' is returned back to the client along with
'has_more_pages = true', which will result in an infinite loop.
This patch correctly overwrites the base paging state unconditionally.

Fixes #4569
2019-06-19 14:10:13 +02:00
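The bug and fix in the paging-state commit above reduce to a minimal model like the one below. All names (`paging_state`, `overwrite_buggy`, `overwrite_fixed`) are illustrative stand-ins for the real cql3 types, not Scylla's API.

```cpp
// Minimal model: the view paging state must replace the base paging state
// unconditionally. Otherwise a base state with remaining == 0 leaks to the
// client alongside has_more_pages == true, and the client loops forever
// re-requesting the same page.
struct paging_state {
    int remaining = 0;           // rows the client may still fetch
};

struct query_result {
    paging_state state;
    bool has_more_pages = false;
};

// Buggy translation: only overwrites when more pages remain.
inline void overwrite_buggy(query_result& r, const paging_state& view_state) {
    if (r.state.remaining > 0) {
        r.state = view_state;
    }
}

// Fixed translation: overwrite unconditionally.
inline void overwrite_fixed(query_result& r, const paging_state& view_state) {
    r.state = view_state;
}
```

With `remaining == 0` and `has_more_pages == true`, the buggy path returns the stale guard value to the client; the fixed path always returns the translated view state.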
Tomasz Grabiec
cd1ff1fe02 Merge "Use same schema version for repair nodes" from Asias
This patch set fixes repair nodes using different schema version and
optimizes the hashing thanks to the fact that all nodes now use the
same schema version.

Fixes: #4549

* seastar-dev.git asias/repair_use_same_schema.v3:
  repair: Use the same schema version for repair master and followers
  repair: Hash column kind and id instead of column name and type name
2019-06-18 12:42:53 +02:00
Asias He
4285801af9 repair: Hash column kind and id instead of column name and type name
It is guaranteed repair nodes use the same schema. It is faster to hash
column kind and id.

Changing the hashing of mutation fragment causes incompatibility with
mixed clusters. Let's backport to the 3.1 release, which includes row
level repair for the first time and is not released yet.

Refs: #4549
Backports: 3.1
2019-06-18 18:27:21 +08:00
Asias He
3db136f81e repair: Use the same schema version for repair master and followers
Before this patch, repair master and followers use their own schema
version at the point repair starts independently. The schemas can be
different due to schema change. Repair uses the schema to serialize
mutation_fragment and deserialize the mutation_fragment received from
peer nodes. Using different schema versions to serialize and deserialize
causes undefined behaviour.

To fix, we use the schema the repair master decides for all the repair
nodes involved.

On top of this patch, we could do another step to make sure all nodes
have the latest schema. But let's do it in a separate patch.

Fixes: #4549
Backports: 3.1
2019-06-18 18:27:21 +08:00
Rafael Ávila de Espíndola
8672eddff2 Document the best practices for when to use asserts/exceptions/logs
The intention is just to document what is currently done. If someone
wants to propose changes, that can be done after the current practices
have been documented.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190524135109.29436-1-espindola@scylladb.com>
2019-06-18 12:13:01 +03:00
Rafael Ávila de Espíndola
26c0814a88 Add test large collection warning
This was already working, but we were not testing for it.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190617181706.66490-1-espindola@scylladb.com>
2019-06-18 10:27:55 +02:00
Nadav Har'El
6aab1a61be Fix deciding whether a query uses indexing
The code that decides whether a query should use indexing was buggy - a partition key index might have influenced the decision even if the whole partition key was passed in the query (which effectively means that indexing it is not necessary).

Fixes #4539

Closes https://github.com/scylladb/scylla/pull/4544

Merged from branch 'fix_deciding_whether_a_query_uses_indexing' of git://github.com/psarna/scylla
  tests: add case for partition key index and filtering
  cql3: fix deciding if a query uses indexing
2019-06-18 01:01:14 +03:00
Takuya ASADA
7320c966bc dist/common/scripts/scylla_setup: don't proceed with empty NIC name
Currently the NIC selection prompt in scylla_setup proceeds with setup
even when the user just presses Enter at the prompt.
The prompt should ask for the NIC name again until the user enters a valid NIC name.

Fixes #4517
Message-Id: <20190617124925.11559-1-syuu@scylladb.com>
2019-06-17 15:52:29 +03:00
Avi Kivity
938b74f47a Merge "Fix gcc9 build" from Paweł
"
These patches fix remaining issues with the gcc9 build, which involve two gcc9 bugs and a stricter warning.

Tests: unit(debug, dev, release).
"

* 'fix-gcc9-build' of https://github.com/pdziepak/scylla:
  dht/ring_position: silence complaints about uninitialised _token_bound
  xx_hasher: disable -Warray-bounds
  api/column_family: work around gcc9 bug in seastar::future<std::any>
2019-06-17 15:23:24 +03:00
Tomasz Grabiec
f798f724c8 frozen_mutation: Guard against unfreezing using wrong schema
Currently, calling unfreeze() using the wrong version of the schema
results in undefined behavior. That can cause hard-to-debug
problems. Better to throw in such cases.

Refs #4549.

Tests:
  - unit (dev)
Message-Id: <1560459022-23786-1-git-send-email-tgrabiec@scylladb.com>
2019-06-17 15:23:24 +03:00
Asias He
f32371727b repair: Avoid copying position in to_repair_rows_list
No need to make a copy because it is not used to construct repair_row
any more since commit 9079790f85 (repair:
Avoid writing row with same partition key and clustering key more than
once). Use mf->position() instead.

Refs: #4510
Backports: 3.1
Message-Id: <7b21edcc3368036b6357b5136314c0edc22ad4d2.1560753672.git.asias@scylladb.com>
2019-06-17 15:23:24 +03:00
Paweł Dziepak
483f66332b dht/ring_position: silence complaints about uninitialised _token_bound 2019-06-17 13:11:20 +01:00
Paweł Dziepak
82b8450922 xx_hasher: disable -Warray-bounds
In release mode gcc9 has a false positive warning about out of bound
access in xxhash implementation:

./xxHash/xxhash.c:799:27: error: array subscript -3 is outside array bounds of ‘long unsigned int [1]’ [-Werror=array-bounds]

This is solved by disabling -Warray-bounds in the xxhash code.
2019-06-17 13:09:54 +01:00
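The standard way to confine such a false positive to third-party code, as the commit above describes for xxHash, is to bracket just that code with diagnostic pragmas so `-Warray-bounds` stays active everywhere else. The hash function below is a stand-in (a toy FNV-1a), not the xxHash internals:

```cpp
#include <cstddef>
#include <cstdint>

// -Warray-bounds is disabled only for the bracketed region (in the real
// tree, around the xxHash sources) and re-enabled afterwards.
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Warray-bounds"

// Stand-in for the third-party code that trips the gcc9 false positive.
inline std::uint64_t toy_hash(const unsigned char* data, std::size_t len) {
    std::uint64_t h = 0xcbf29ce484222325ull;      // FNV-1a, for illustration
    for (std::size_t i = 0; i < len; ++i) {
        h = (h ^ data[i]) * 0x100000001b3ull;
    }
    return h;
}

#pragma GCC diagnostic pop
```

The push/pop pair ensures the suppression does not leak into any code included after it.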
Paweł Dziepak
8a13d96203 api/column_family: work around gcc9 bug in seastar::future<std::any>
There is a gcc9 bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90415
that makes it impossible to pass std::any through a seastar::future<T>.
Fortunately, there is only one user of seastar::future<std::any> in
Scylla and it is not performance-critical. This patch avoids the gcc9
bug by using seastar::future<std::unique_ptr<std::any>>.
2019-06-17 13:06:28 +01:00
Glauber Costa
91b71a0b1a do not allow multiple snapshot operations at the same time
We saw a node crashing today with nodetool clearsnapshot being called.
After investigation, the reason is that nodetool clearsnapshot was called
at the same time a new snapshot was created with the same tag. nodetool
clearsnapshot can't delete all files in the directory, because new files
had by then been created in that directory, and crashes on I/O error.

There are many problems with allowing those operations to proceed in
parallel. Even if we fix the code not to crash and return an error on
directory non-empty, the moment they do any amount of work in parallel
the result of the operation becomes undefined. Some files in the
snapshot may have been deleted by clear, for example, and a user may
then not be able to properly restore from the backup if this snapshot
was used to generate a backup.

Moreover, although we could lock at the granularity of a keyspace or
column family, I think we should use a big hammer here and lock the
entire snapshot creation/deletion to avoid surprises (for example, if a
user requests creation of a snapshot for all keyspaces, and another
process requests clear of a single keyspace)

Fixes #4554

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190614174438.9002-1-glauber@scylladb.com>
2019-06-16 10:30:13 +03:00
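The "big hammer" described above can be sketched as a single lock serializing every snapshot operation. Scylla itself would use a seastar semaphore rather than a blocking mutex, and the names here (`snapshot_controller` etc.) are illustrative, not the actual code:

```cpp
#include <algorithm>
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// One global lock per node: creation and deletion can never interleave,
// so clearsnapshot cannot race with files being created in the same
// snapshot directory.
class snapshot_controller {
    std::mutex _op_lock;                  // serializes ALL snapshot operations
    std::vector<std::string> _snapshots;  // stand-in for on-disk snapshot dirs
public:
    void create_snapshot(const std::string& tag) {
        std::lock_guard<std::mutex> g(_op_lock);
        _snapshots.push_back(tag);
    }
    void clear_snapshot(const std::string& tag) {
        std::lock_guard<std::mutex> g(_op_lock);
        _snapshots.erase(std::remove(_snapshots.begin(), _snapshots.end(), tag),
                         _snapshots.end());
    }
    std::size_t count() {
        std::lock_guard<std::mutex> g(_op_lock);
        return _snapshots.size();
    }
};
```

Locking at keyspace or column-family granularity would allow more concurrency, but as the commit argues, the coarse lock avoids surprises such as an all-keyspaces create racing a single-keyspace clear.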
Rafael Ávila de Espíndola
44eb939aa6 Use the sanitizer flags from seastar
In practice, we always want to use the same sanitizer flags with
seastar and scylla. Seastar was already marking its sanitizer flags
public, so what was missing was exporting the link flags via pkgconfig
and dropping the duplicates from scylla.

I am doing this after wasting some time editing the wrong file.

This depends on the seastar patch to export the sanitizer flags in
pkgconfig.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-06-16 09:21:10 +03:00
Takuya ASADA
f582a759ee dist: merge /usr/lib/scylla to /opt/scylladb
We used to use /opt/scylladb just for Scylla build toolchain and
dependency libraries, not for Scylla main package.
But since we merged relocatable package, Scylla main binary and
dependency libraries are all located under /opt/scylladb, only
setup scripts remained on /usr/lib/scylla.
It is strange to keep using both /usr/lib/<app name> and /opt/<app name>;
we should merge them into a single place.

Message-Id: <20190614011038.17827-1-syuu@scylladb.com>
2019-06-14 21:03:36 +03:00
Piotr Jastrzebski
a41c9763a9 sstables: distinguish empty and missing cellpath
Before this patch mc sstables writer was ignoring
empty cellpaths. This is a wrong behaviour because
it is possible to have empty key in a map. In such case,
our writer creats a wrong sstable that we can't read back.
This is becaus a complex cell expects cellpath for each
simple cell it has. When writer ignores empty cellpath
it writes nothing and instead it should write a length
of zero to the file so that we know there's an empty cellpath.

Fixes #4533

Tests: unit(release)

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <46242906c691a56a915ca5994b36baf87ee633b7.1560532790.git.piotr@scylladb.com>
2019-06-14 20:36:41 +03:00
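The cellpath fix above comes down to length-prefixed framing: an empty path must still be written as an explicit zero length, or the reader desynchronizes. A minimal sketch, with an encoding (single-byte length) that is illustrative rather than the real mc format:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Each cell of a complex column carries a length-prefixed cell path.
// Writing nothing for an empty path breaks the one-path-per-cell
// invariant the reader relies on; writing length 0 keeps it.
inline void write_cell_path(std::vector<std::uint8_t>& out,
                            const std::string& path) {
    out.push_back(static_cast<std::uint8_t>(path.size())); // length, may be 0
    out.insert(out.end(), path.begin(), path.end());
}

inline std::string read_cell_path(const std::vector<std::uint8_t>& in,
                                  std::size_t& pos) {
    std::size_t len = in[pos++];
    std::string path(in.begin() + pos, in.begin() + pos + len);
    pos += len;
    return path;
}
```

Round-tripping an empty path followed by a non-empty one shows why the zero-length marker matters: without it, the reader would consume the next cell's bytes as this cell's path.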
Asias He
9079790f85 repair: Avoid writing row with same partition key and clustering key more than once
Consider

   master: row(pk=1, ck=1, col=10)
follower1: row(pk=1, ck=1, col=20)
follower2: row(pk=1, ck=1, col=30)

When repair runs, master fetches row(pk=1, ck=1, col=20) and row(pk=1,
ck=1, col=30) from follower1 and follower2.

Then repair master sends row(pk=1, ck=1, col=10) and row(pk=1, ck=1,
col=30) to follower1, follower1 will write the row with the same
pk=1, ck=1 twice, which violates uniqueness constraints.

To fix, we apply a row with the same pk and ck onto the previous row.
We only need this on the repair follower because the rows can come from
multiple nodes. On the repair master, we have an sstable writer per
follower, so the rows fed into an sstable writer can come from only a
single node.

Tests: repair_additional_test.py:RepairAdditionalTest.repair_same_row_diff_value_3nodes_test
Fixes: #4510
Message-Id: <cb4fbba1e10fb0018116ffe5649c0870cda34575.1560405722.git.asias@scylladb.com>
2019-06-13 17:19:19 +02:00
Asias He
912ce53fc5 repair: Allow repair_row to initialize partially
On repair follower node, only decorated_key_with_hash and the
mutation_fragment inside repair_row are used in apply_rows() to apply
the rows to disk. Allow repair_row to initialize partially and throw if
the uninitialized member is accessed to be safe.
Message-Id: <b4e5cc050c11b1bafcf997076a3e32f20d059045.1560405722.git.asias@scylladb.com>
2019-06-13 17:18:53 +02:00
Benny Halevy
2fd2713fda conf: update conf/scylla.yaml default large data warning thresholds
They are currently inconsistent with db/config.cc
and missing compaction_large_cell_warning_threshold_mb

Fixes #4551

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190613133657.15370-1-bhalevy@scylladb.com>
2019-06-13 16:45:27 +03:00
Benny Halevy
4ad06c7eeb tests/perf: provide random-seed option
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190613114307.31038-2-bhalevy@scylladb.com>
2019-06-13 14:45:49 +03:00
Benny Halevy
43e4631e6a tests: random-utils: use seastar::testing::local_random_engine
To provide test reproducibility use the seastar local_random_engine.

To reproduce a run, use the --random-seed command line option
with the seed printed accordingly.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190613114307.31038-1-bhalevy@scylladb.com>
2019-06-13 14:45:48 +03:00
Benny Halevy
fe2d629e20 mutation_reader_test: test_multishard_combining_reader_reading_empty_table: fix non-atomic sharing of shards_touched
It needs to be a std::vector<std::atomic<bool>>
otherwise threads step on each other in shared memory.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190613112359.21884-1-bhalevy@scylladb.com>
2019-06-13 14:44:43 +03:00
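The fix above follows a general pattern: a plain `std::vector<bool>` written concurrently is a data race (its bit-packing even makes distinct elements share a byte), while `std::vector<std::atomic<bool>>` gives each flag its own atomic object. A self-contained sketch with illustrative names:

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Each worker sets its own "touched" flag; the atomics make the concurrent
// writes well-defined. The vector is sized up front because
// std::atomic<bool> is neither copyable nor movable element-wise.
inline std::vector<std::atomic<bool>> touch_all(unsigned n) {
    std::vector<std::atomic<bool>> touched(n);  // value-initialized to false
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < n; ++i) {
        workers.emplace_back([&touched, i] {
            touched[i].store(true, std::memory_order_relaxed);
        });
    }
    for (auto& t : workers) {
        t.join();
    }
    return touched;
}
```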
Piotr Sarna
2c2122e057 tests: add a test case for filtering clustering key
The test case makes sure that clustering key restriction
columns are fetched for filtering if they form a clustering key prefix,
but not a primary key prefix (partition key columns are missing).

Ref #4541
Message-Id: <3612dc1c6c22c59ac9184220a2e7f24e8d18407c.1560410018.git.sarna@scylladb.com>
2019-06-13 10:38:56 +03:00
Piotr Sarna
c4b935780b cql3: fix qualifying clustering key restrictions for filtering
Clustering key restrictions can sometimes avoid filtering if they form
a prefix, but that can happen only if the whole partition key is
restricted as well.

Ref #4541
Message-Id: <9656396ee831e29c2b8d3ad4ef90c4a16ab71f4b.1560410018.git.sarna@scylladb.com>
2019-06-13 10:38:47 +03:00
Piotr Sarna
adeea0a022 cql3: fix fetching clustering key columns for filtering
When a column is not present in the select clause, but used for
filtering, it usually needs to be fetched from replicas.
Sometimes it can be avoided, e.g. if primary key columns form a valid
prefix - then, they will be optimized out before filtering itself.
However, clustering key prefix can only be qualified for this
optimization if the whole partition key is restricted - otherwise
the clustering columns still need to be present for filtering.

This commit also fixes tests in cql_query_test suite, because they now
expect more values - columns fetched for filtering will be present as
well (only internally, the clients receive only data they asked for).

Fixes #4541
Message-Id: <f08ebae5562d570ece2bb7ee6c84e647345dfe48.1560410018.git.sarna@scylladb.com>
2019-06-13 10:38:37 +03:00
Glauber Costa
8a3fe3ac9b debian: correctly relocate python scripts
Relocation of python scripts mentions scylla-server in paths explicitly.
It should use {{product}} instead. The current build is failing when
{{product}} is different than scylla-server

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190613012518.28784-1-glauber@scylladb.com>
2019-06-13 09:39:36 +03:00
Takuya ASADA
b1226fb15a dist/docker/redhat: change user of scylla services to 'scylla'
On branch-3.1 / master, we are getting following error:

ERROR 2019-06-11 10:58:49,156 [shard 0] database - /var/lib/scylla/data: File not owned by current euid: 0. Owner is: 999
ERROR 2019-06-11 10:58:49,156 [shard 0] init - Failed owner and mode verification: std::runtime_error (File not owned by current euid: 0. Owner is: 999)
ERROR 2019-06-11 10:58:49,156 [shard 0] database - /var/lib/scylla/hints: File not owned by current euid: 0. Owner is: 999
ERROR 2019-06-11 10:58:49,156 [shard 0] init - Failed owner and mode verification: std::runtime_error (File not owned by current euid: 0. Owner is: 999)
ERROR 2019-06-11 10:58:49,156 [shard 0] database - /var/lib/scylla/commitlog: File not owned by current euid: 0. Owner is: 999
ERROR 2019-06-11 10:58:49,156 [shard 0] init - Failed owner and mode verification: std::runtime_error (File not owned by current euid: 0. Owner is: 999)
ERROR 2019-06-11 10:58:49,156 [shard 0] database - /var/lib/scylla/view_hints: File not owned by current euid: 0. Owner is: 999
ERROR 2019-06-11 10:58:49,156 [shard 0] init - Failed owner and mode verification: std::runtime_error (File not owned by current euid: 0. Owner is: 999)

It seems like owner verification of the data directory fails because
the scylla-server process is running as root but the data directory is
owned by scylla, so we should run the services as the scylla user.

Fixes #4536
Message-Id: <20190611113142.23599-1-syuu@scylladb.com>
2019-06-12 20:29:06 +03:00
Takuya ASADA
60d8a99f05 dist/common/scripts/scylla_setup: verify system umask is acceptable for scylla-server
To avoid a 'Bad permissions' error when the user has changed the default
umask, we need to verify that the system umask is acceptable for scylla-server.

Fixes #4157

Message-Id: <20190612130343.6043-1-syuu@scylladb.com>
2019-06-12 20:29:06 +03:00
Avi Kivity
cac812661c Update seastar submodule
* seastar 253d6cb...ded50bd (14):
  > Only export sanitizer flags if used
  > perftune.py: use pyudev.Devices methods instead of deprecated pyudev.Device ones
  > Add a Sanitize build mode
  > Merge "perftune.py : new tuning modes" from Vlad
  > reactor: clarify how submit_to() destroys the function object
  > Export the sanitizer flags via pkgconfig
  > smp: Delete unprocessed work items
  > iotune: fixed finding mountpoint infinite loop
  > net: Fix dereferencing moved object
  > Always enable the exception scalability hack
  > Merge "Simple cleanups in future.hh" from Rafael
  > tests: introduce testing::local_random_engine
  > core/deleter: Fix abort when append() is called twice with a shared deleter
  > rpc stream: do not crash if a stream is used after eos
2019-06-12 20:28:48 +03:00
Asias He
b463d7039c repair: Introduce get_combined_row_hash_response
Currently, REPAIR_GET_COMBINED_ROW_HASH RPC verb returns only the
repair_hash object. In the future, we will use set reconciliation
algorithm to decode the full row hashes in working row buf. It is useful
to return the number of rows inside working row buf in addition to the
combined row hashes to make sure the decode is successful.

It is also better to use a wrapper class for the verb response so we can
extend the return values later more easily with IDL.

Fixes #4526
Message-Id: <93be47920b523f07179ee17e418760015a142990.1559771344.git.asias@scylladb.com>
2019-06-12 13:51:29 +03:00
Takuya ASADA
30414d9c23 dist/ami: install scylla debug symbols by default
On AMI creation, install scylla-debuginfo by default.

closes #4542

Message-Id: <20190612102355.21386-1-syuu@scylladb.com>
2019-06-12 13:49:46 +03:00
Eliran Sinvani
2b44d8ed42 cql: Allow user manipulation queries to use cql keywords for a name
This commit allows the CREATE/DROP/ALTER USER cql queries
to use cql keywords for the user name (for example "empty").

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <20190612104301.8322-1-eliransin@scylladb.com>
2019-06-12 13:48:10 +03:00
Dejan Mircevski
a52a56bfc0 utils: Add like_matcher
A utility for matching text with LIKE patterns, and a battery of
tests.

Tests: unit(dev,debug)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-06-12 13:14:53 +03:00
Piotr Sarna
7b2de7ac5b tests: add case for partition key index and filtering
The test ensures that partition key index does not influence
filtering decisions for regular columns.

Ref #4539
2019-06-12 11:53:02 +02:00
Rafael Ávila de Espíndola
bf87b7e1df logalloc: Use asan to poison free areas
With this patch, when using asan, we poison segment memory that has
been allocated from the system but should not be accessible to user
code.

Should help with debugging use-after-free bugs.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190607140313.5988-1-espindola@scylladb.com>
2019-06-12 11:46:45 +02:00
Piotr Sarna
adc51e57c1 cql3: fix deciding if a query uses indexing
The code that decides whether a query should use indexing
was buggy - a partition key index might have influenced the decision
even if the whole partition key was passed in the query (which
effectively means that indexing it is not necessary).

Fixes #4539
2019-06-12 11:44:16 +02:00
Raphael S. Carvalho
62aa0ea3fa sstables: fix log of failure on large data entry deletion by fixing use-after-move
Fixes #4532.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190527200828.25339-1-raphaelsc@scylladb.com>
2019-06-12 10:55:46 +03:00
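The bug class fixed above has a generic shape: a value is `std::move()`d into a sink and then used again, e.g. in a log message, so the log sees a moved-from (typically empty) string. The sketch below shows the fixed ordering; the names are hypothetical, not Scylla's code:

```cpp
#include <string>
#include <utility>

// Fixed pattern: capture what the log needs BEFORE moving the value away
// (or reorder so the log statement runs first).
inline std::string log_then_store_fixed(std::string name,
                                        std::string& storage) {
    std::string logged = name;    // copy for the log message before the move
    storage = std::move(name);    // ownership transferred to the sink
    return logged;                // what the log line would actually show
}
```

The buggy variant would log `name` after the `std::move(name)`, printing an empty string instead of the entry being deleted.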
Juliana Oliveira
43f92ae6d5 cql: functions: add min/max/count for boolean type
Explicitly add min/max/count functions and tests for
boolean type.

Tests: unit (release)

Signed-off-by: Juliana Oliveira <juliana@scylladb.com>
Message-Id: <20190612015215.GA2618@shenzou.localdomain>
2019-06-12 10:11:08 +03:00
Benny Halevy
3ad005ba17 build-ami: fix branch detection failure when not in git tree
Introduced in 513d01d53e

The script is trying to determine the branch to shallow clone
when an rpm is missing and has to be built.
This functionality in the current implementation assumes it is being run inside
a git repository, but that may not be the case if the script is triggered after
local rpms were placed in the local directory.
This happens when putting all necessary rpm files in: dist/ami/files
And then running: dist/ami/build_ami.sh --localrpm
The dist/ami/ and dist/ami/files are the only ones required for this action so
querying the git repository in that situation makes no sense.

Fixes #4535

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190611112455.13862-1-bhalevy@scylladb.com>
2019-06-11 19:08:02 +03:00
Piotr Sarna
1a5e5433bf cql3: make add_restriction helper functions public
In order to allow building statement restrictions manually
instead of providing WHERE clause from CQL layer, helper functions
that add single restrictions are made public.
Message-Id: <31fa23a5e5ef927128f23b9fcb3362a2582d86bb.1560237237.git.sarna@scylladb.com>
2019-06-11 16:01:35 +03:00
Tomasz Grabiec
8c4baab81e Merge "view: ignore duplicated key entries in progress virtual reader" from Piotr S.
Build progress virtual reader uses Scylla-specific
scylla_views_builds_in_progress table in order to represent legacy
views_builds_in_progress rows. The Scylla-specific table contains
additional cpu_id clustering key part, which is trimmed before
returning it to the user. That may cause duplicated clustering row
fragments to be emitted by the reader, which may cause undefined
behaviour in consumers.  The solution is to keep track of previous
clustering keys for each partition and drop fragments that would cause
duplication. That way if any shard is still building a view, its
progress will be returned, and if many shards are still building, the
returned value will indicate the progress of a single arbitrary shard.

Fixes #4524
Tests:
unit(dev) + custom monotonicity checks from tgrabiec@scylladb.com
2019-06-11 13:55:25 +02:00
Piotr Sarna
85a3a4b458 view: ignore duplicated key entries in progress virtual reader
Build progress virtual reader uses Scylla-specific
scylla_views_builds_in_progress table in order to represent
legacy views_builds_in_progress rows. The Scylla-specific table contains
additional cpu_id clustering key part, which is trimmed before returning
it to the user. That may cause duplicated clustering row fragments to be
emitted by the reader, which may cause undefined behaviour in consumers.
The solution is to keep track of previous clustering keys for each
partition and drop fragments that would cause duplication. That way if
any shard is still building a view, its progress will be returned,
and if many shards are still building, the returned value will indicate
the progress of a single arbitrary shard.

Fixes #4524
Tests:
unit(dev) + custom monotonicity checks from <tgrabiec@scylladb.com>
2019-06-11 13:01:31 +02:00
Nadav Har'El
5ef928a63d coding-style.md: mention "using namespace seastar"
All Scylla code is written with "using namespace seastar", i.e., no
"seastar::" prefix for Seastar symbols. Document this in the coding style.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190610203948.18075-1-nyh@scylladb.com>
2019-06-11 10:39:03 +03:00
Calle Wilund
26702612f3 api.hh: Fix bool parsing in req_param
Fixes #4525

req_param uses boost::lexical_cast to convert text->var.
However, lexical_cast does not handle textual booleans,
thus param=true causes not only wrong values, but
exceptions.

Message-Id: <20190610140511.15478-1-calle@scylladb.com>
2019-06-10 17:11:47 +03:00
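The root cause above is that `boost::lexical_cast<bool>` uses stream extraction without `boolalpha`, so it accepts only "0"/"1" and throws on "true". A sketch of the explicit textual-boolean parsing such a fix needs (the function name is illustrative, not Scylla's API):

```cpp
#include <stdexcept>
#include <string>

// Accepts the textual forms lexical_cast rejects, and still rejects garbage
// with a clear error instead of a bad_lexical_cast surfacing from the API.
inline bool parse_bool_param(const std::string& s) {
    if (s == "true" || s == "1") {
        return true;
    }
    if (s == "false" || s == "0") {
        return false;
    }
    throw std::invalid_argument("not a boolean: " + s);
}
```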
Gleb Natapov
9213d56a06 storage_proxy: align background and foreground repair metric names
One is plural another is not, make them all plural.

Message-Id: <20190605135940.GI25001@scylladb.com>
2019-06-10 11:34:36 +03:00
Benny Halevy
2017de9387 build-ami: delete extra parenthesis in branch_arg calculation
Fixing a typo

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190610062113.5604-1-bhalevy@scylladb.com>
2019-06-10 11:29:44 +03:00
Avi Kivity
591d2968cc storage_proxy: limit resources consumed in cross-shard operations
Currently, each shard protects itself by not reading from rpc and the native
transport if in-flight requests consume too much memory for that shard. However,
if all shards then forward their requests to some other shard, then that shard
can easily run out of memory since its load can be multiplied by the number of
shards that send it requests.

To protect against this, use the new Seastar smp_service_group infrastructure.
We create three groups: read, write, and write ack (the latter is needed to
avoid ABBA deadlocks if shard A exhausts all its resources sending writes to shard B,
and shard B simultaneously does the same; neither will be able to send
acknowledgements, so if the writes are throttled, they will never be unthrottled
until a timeout occurs).

Range scans are not addressed by this patch since they are handled by
multishard_mutation_query, which has its own complex cross-shard communication
scheme, but it will need a similar solution.

Ref #1105 (missing range scan protection)

Tests: unit (dev)
Message-Id: <20190512142243.17795-1-avi@scylladb.com>
2019-06-07 10:53:23 +02:00
Vlad Zolotarov
20a610f6bc fix_system_distributed_tables.py: declare the 'port' argument as 'int'
If a port value is passed as a string, it makes cluster.connect()
fail with Python 3.4.

Let's fix this by explicitly declaring a 'port' argument as 'int'.

Fixes #4527

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <20190606133321.28225-1-vladz@scylladb.com>
2019-06-06 20:19:57 +03:00
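The fix pattern is standard argparse usage; a sketch (argument name and default are illustrative):

```python
import argparse

parser = argparse.ArgumentParser(description="fix system distributed tables")
# Without type=int, args.port would be the string "9042", which some
# driver versions refuse; type=int makes argparse do the conversion.
parser.add_argument("--port", type=int, default=9042)

args = parser.parse_args(["--port", "10000"])
assert args.port == 10000 and isinstance(args.port, int)
```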
Benny Halevy
c188f838bc build-ami: use ssh git URLs
Rather than https, for cert-based passwordless access.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190606133648.15877-2-bhalevy@scylladb.com>
2019-06-06 20:02:13 +03:00
Benny Halevy
513d01d53e build-ami: use current git branch for shallow-clone of other repos
We want to use the same branch on the other repos build-ami needs
as the one we're building for.  Automatically find the current branch
using the `git branch` command.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190606133648.15877-1-bhalevy@scylladb.com>
2019-06-06 20:02:13 +03:00
Juliana Oliveira
fd83f61556 Add a warning for partitions with too many rows
This patch adds a warning option for situations where the row
count may get bigger than initially designed. Through the
warning, users can be aware of possible data modeling problems.

The threshold is initially set to '100,000'.

Tests: unit (dev)

Message-Id: <20190528075612.GA24671@shenzou.localdomain>
2019-06-06 19:48:57 +03:00
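The mechanism amounts to counting rows per partition and warning once past a configurable threshold; a schematic sketch (function and logger names are hypothetical; only the threshold value comes from the message above):

```python
import logging

TOO_MANY_ROWS_THRESHOLD = 100_000  # initial threshold from the commit message

logger = logging.getLogger("large_data")

def check_partition_rows(partition_key: str, row_count: int) -> bool:
    """Return True (and warn) when a partition exceeds the row threshold."""
    if row_count > TOO_MANY_ROWS_THRESHOLD:
        logger.warning(
            "partition %s has %d rows, exceeding the threshold of %d; "
            "this may indicate a data modeling problem",
            partition_key, row_count, TOO_MANY_ROWS_THRESHOLD,
        )
        return True
    return False

assert check_partition_rows("pk1", 100_001)
assert not check_partition_rows("pk2", 42)
```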
Piotr Sarna
74f6ab7599 db: drop unnecessary double computation when feeding hash
When feeding hash for schema digest, compact_for_schema_digest
is mistakenly called twice, which may result in needless recomputation.

Message-Id: <8f52201cf428a55e7057d8438025275023eb9288.1559826555.git.sarna@scylladb.com>
2019-06-06 16:16:47 +03:00
Rafael Ávila de Espíndola
b3adabda2d Reduce logalloc differences between debug and release
A lot of code in Scylla is only reachable if SEASTAR_DEFAULT_ALLOCATOR
is not defined. In particular, refill_emergency_reserve in the default
allocator case is empty, but in the seastar allocator case it compacts
segments.

I am trying to debug a crash that seems to involve memory corruption
around the lsa allocator, and being able to use a debug build for that
would be awesome.

This patch reduces the differences between the two cases by having a
common segment_pool that defers only a few operations to different
segment_store implementations.

Tests: unit (debug, dev)

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190606020937.118205-1-espindola@scylladb.com>
2019-06-06 12:55:56 +03:00
Nadav Har'El
95bab04cf9 docs/metrics.md: "instance" label no longer comes from Scylla
Prometheus needs to remember which "instance" (node) each measurement
came from. But it doesn't actually need Scylla to tell it the instance
name - it knows which node it got each measurement from.

After Seastar commit 79281ef287
which fixed Seastar issue https://github.com/scylladb/seastar/issues/477,
the "instance" label on measurements no longer comes from Scylla but rather
is added by Prometheus. This patch corrects the documentation to explain the
current situation, instead of incorrectly saying that Scylla adds the
"instance" label itself.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190602074629.14336-1-nyh@scylladb.com>
2019-06-06 12:42:30 +03:00
Piotr Sarna
f50f418066 types: isolate deserializing iterator to separate file
In order to be used outside types.cc, listlike deserializing iterator
is moved to a separate header.

Message-Id: <d9416e6a8d170aa4936826b54ca7be4acb4ec8e6.1559745816.git.sarna@scylladb.com>
2019-06-05 17:46:51 +03:00
Pekka Enberg
eb00095bca relocate_python_scripts.py: Fix node-exporter install on Debian variants
The relocatable Python is built from Fedora packages. Unfortunately TLS
certificates are in a different location on Debian variants, which
causes "node_exporter_install" to fail as follows:

  Traceback (most recent call last):
    File "/usr/lib/scylla/libexec/node_exporter_install", line 58, in <module>
      data = curl('https://github.com/prometheus/node_exporter/releases/download/v{version}/node_exporter-{version}.linux-amd64.tar.gz'.format(version=VERSION), byte=True)
    File "/usr/lib/scylla/scylla_util.py", line 40, in curl
      with urllib.request.urlopen(req) as res:
    File "/opt/scylladb/python3/lib64/python3.7/urllib/request.py", line 222, in urlopen
      return opener.open(url, data, timeout)
    File "/opt/scylladb/python3/lib64/python3.7/urllib/request.py", line 525, in open
      response = self._open(req, data)
    File "/opt/scylladb/python3/lib64/python3.7/urllib/request.py", line 543, in _open
      '_open', req)
    File "/opt/scylladb/python3/lib64/python3.7/urllib/request.py", line 503, in _call_chain
      result = func(*args)
    File "/opt/scylladb/python3/lib64/python3.7/urllib/request.py", line 1360, in https_open
      context=self._context, check_hostname=self._check_hostname)
    File "/opt/scylladb/python3/lib64/python3.7/urllib/request.py", line 1319, in do_open
      raise URLError(err)
  urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1056)>
  Unable to retrieve version information
  node exporter setup failed.

Fix the problem by overriding the SSL_CERT_FILE environment variable to
point to the correct location of the TLS bundle.

Message-Id: <20190604175434.24534-1-penberg@scylladb.com>
2019-06-04 21:12:21 +03:00
Piotr Sarna
b3396dbb57 types: migrate to_json_string to use bytes view
The to_json_string utility implementation was based on const references
instead of views, which can be a source of unnecessary memory copying.
This patch migrates all to_json_string to use bytes_view and leaves
the const reference version as a thin wrapper.

Message-Id: <2bf9f1951b862f8e8a2211cb4e83852e7ac70c67.1559654014.git.sarna@scylladb.com>
2019-06-04 19:17:46 +03:00
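The same copy-avoidance idea can be shown with Python buffer views; a sketch (the checksum helpers are hypothetical stand-ins for to_json_string and its wrapper):

```python
def checksum(data) -> int:
    """Core implementation accepts any buffer (bytes, bytearray, memoryview)
    and iterates it through a zero-copy view."""
    return sum(memoryview(data)) & 0xFFFFFFFF

def checksum_of_bytes(data: bytes) -> int:
    """Thin wrapper kept for existing callers, mirroring the const& shim."""
    return checksum(data)

payload = b"abc"
assert checksum(payload) == checksum_of_bytes(payload) == (97 + 98 + 99)

# Slicing a memoryview yields a window into the same storage; slicing bytes copies.
view = memoryview(payload)[1:]
assert view.obj is payload          # shares storage, no copy
assert payload[1:] is not payload   # a fresh copy
```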
Avi Kivity
06d77aa548 Merge "Introduce queue reader" from Botond
"
Technically queue_reader already exists, however so far it was a
private utility in `multishard_writer.cc`. This mini-series makes it
public and generally useful. The interface is made safer and simpler and
the implementation is improved so it doesn't have two separate buffers.
Also, unit tests are added.

Tests: mutation_reader_test:debug/test_queue_reader, multishard_writer_test:debug
"

* 'queue_reader/v2' of https://github.com/denesb/scylla:
  queue_reader: use the reader's buffer as the queue
  Make queue_reader public
2019-06-04 13:46:15 +03:00
Botond Dénes
2ccd8ee47c queue_reader: use the reader's buffer as the queue
The queue reader currently uses two buffers, a `_queue` that the
producer pushes fragments into and its internal `_buffer` where these
fragments eventually end up being served to the consumer from.
This double buffering is not necessary. Change the reader to allow the
producer to push fragments directly into the internal `_buffer`. This
complicates the code a little bit, as the producer logic of
`seastar::queue` has to be folded into the queue reader. On the other
hand this introduces proper memory consumption management, as well as
reduces the amount of consumed memory and eliminates the possibility of
outside code meddling with the queue. Another big advantage of the
change is that there is now an explicit way to communicate the EOS
condition, no need to push a disengaged `mutation_fragment_opt`.

The producer of the queue reader now pushes the fragments into the
reader via an opaque `queue_reader_handle` object, which has the
producer methods of `seastar::queue`.

Existing users of queue readers are refactored to use the new interface.

Since the code is more complex now, unit tests are added as well.
2019-06-04 13:39:26 +03:00
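The interface shape described above, a producer-side handle with an explicit end-of-stream call instead of a disengaged optional, can be sketched as follows (class names are illustrative, not the C++ API, and this toy is single-threaded):

```python
from collections import deque

class QueueReader:
    """Consumer side: serves fragments from a single internal buffer."""
    def __init__(self):
        self._buffer = deque()
        self._ended = False

    def pop(self):
        if self._buffer:
            return self._buffer.popleft()
        if self._ended:
            raise StopIteration("end of stream")
        raise BlockingIOError("would block: no fragment available yet")

class QueueReaderHandle:
    """Producer side: the only way to feed the reader's buffer."""
    def __init__(self, reader: QueueReader):
        self._reader = reader

    def push(self, fragment):
        assert not self._reader._ended, "push after end-of-stream"
        self._reader._buffer.append(fragment)

    def push_end_of_stream(self):
        # EOS is an explicit signal, not a sentinel "empty" fragment.
        self._reader._ended = True

reader = QueueReader()
handle = QueueReaderHandle(reader)
handle.push("fragment-1")
handle.push_end_of_stream()
assert reader.pop() == "fragment-1"
```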
Glauber Costa
cbaea172cd python3: add the cassandra driver to the relocatable package
We have a script in tree that fixes the schema for distributed system
tables, like tracing, should they change their schema. We use it all the
time but unfortunately it is not distributed with the scylla package,
which makes it using it harder (we want to do this in the server, but
consistent updates will take a while).

One of the problems with the script today that makes distributing it
harder is that it uses the python3 cassandra driver, which we don't want
to have as a server dependency. But now with the relocatable packages in
place there is no reason not to just add it.

[avi: adjust tools/toolchain/image to point to a new image with
 python3-cassandra-driver]
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190603162447.24215-1-glauber@scylladb.com>
2019-06-03 19:34:55 +03:00
Konstantin Osipov
29c27bfc28 storage_proxy: remove unnecessary lambdas in metrics binding
Remove unnecessary lambdas when binding metrics of the storage proxy.
Message-Id: <20190603133753.1724-1-kostja@scylladb.com>
2019-06-03 16:55:16 +03:00
Botond Dénes
a597e46792 Make queue_reader public
Extract it from `multishard_writer.cc` and move it to
`mutation_reader.{hh,cc}` so other code can start using it too.
2019-06-03 12:08:37 +03:00
Takuya ASADA
25112408a7 dist/debian: support relocatable python3 on Debian variants
Unlike CentOS, Debian variants have a python3 package in the official repository,
so we don't have to use relocatable python3 on these distributions.
However, the official python3 version differs across distributions, which
may cause issues.
Also, our scripts and packaging implementation increasingly presuppose the
existence of relocatable python3, which is causing issues on Debian
variants.

Switching to relocatable python3 on Debian variants avoids these issues and
makes it easier to manage Scylla python3 environments across multiple
distributions.

Fixes #4495

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190531112707.20082-1-syuu@scylladb.com>
2019-06-02 14:59:43 +03:00
Raphael S. Carvalho
f360d5a936 sstables: export output operator for sstable run
It wasn't being exported in any header.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190527182246.19007-1-raphaelsc@scylladb.com>
2019-06-02 10:25:51 +03:00
Avi Kivity
7a0c6cd583 Revert "dist/debian: support relocatable python3 on Debian variants"
This reverts commit 4d119cbd6d. It breaks build_deb.sh:

18:39:56 +	seastar/scripts/perftune.py seastar/scripts/seastar-addr2line seastar/scripts/perftune.py
18:39:56 Traceback (most recent call last):
18:39:56   File "./relocate_python_scripts.py", line 116, in <module>
18:39:56     fixup_scripts(archive, args.scripts)
18:39:56   File "./relocate_python_scripts.py", line 104, in fixup_scripts
18:39:56     fixup_script(output, script)
18:39:56   File "./relocate_python_scripts.py", line 79, in fixup_script
18:39:56     orig_stat = os.stat(script)
18:39:56 FileNotFoundError: [Errno 2] No such file or directory: '/data/jenkins/workspace/scylla-master/unified-deb/scylla/build/debian/scylla-package/+'
18:39:56 make[1]: *** [debian/rules:19: override_dh_auto_install] Error 1
2019-05-29 13:58:41 +03:00
Konstantin Osipov
fcd52d6187 Update README.md with more recent build instructions on Ubuntu
Building on Ubuntu 18 or 19 following the current build instructions
doesn't work. Add information about a few pitfalls. Switch README.md
to recommending dbuild and move the details to HACKING.md.

Message-Id: <20190520152738.GA15198@atlas>
2019-05-29 12:26:12 +03:00
Takuya ASADA
4d119cbd6d dist/debian: support relocatable python3 on Debian variants
Unlike CentOS, Debian variants have a python3 package in the official repository,
so we don't have to use relocatable python3 on these distributions.
However, the official python3 version differs across distributions, which
may cause issues.
Also, our scripts and packaging implementation increasingly presuppose the
existence of relocatable python3, which is causing issues on Debian
variants.

Switching to relocatable python3 on Debian variants avoids these issues and
makes it easier to manage Scylla python3 environments across multiple
distributions.

Fixes #4495

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190526105138.677-1-syuu@scylladb.com>
2019-05-26 13:56:30 +03:00
Glauber Costa
71c4375a66 scylla_io_setup: adjust values for i3en instances
Apparently we are having some issues running iotune on the i3en instances,
as the values do not always make sense. We believe it is something that XFS
is doing, and running fio directly on the device (no filesystem) provides
more meaningful results (more in line with AWS's published expected values).

For now, let's use fio instead. In this patch I have run fio for our 4
dimensions in each of the three types of disks (large, xlarge, 3xlarge).

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190524111454.27956-1-glauber@scylladb.com>
2019-05-24 19:37:58 +03:00
Avi Kivity
53dfaf9121 Update seastar submodule
* seastar 5cb1234b0...253d6cb69 (3):
  > reactor: disable nowait aio again
  > Merge "Restructure `timer` implementations to avoid circular dependencies" from Jesse
  > Fix build command in building-docker.md
2019-05-24 14:33:05 +03:00
Raphael S. Carvalho
cabeb12b4e sstables: add output operator for sstable run
The output will look as follows:

Run = {
	Identifier: 647044fd-d3d4-43c4-b014-b546943ead0d
	Fragments = {
		1471=-9223317893235177836:-7063220874380325121
		1478=5924386327138804918:8070482595977135657
		1472=-7063202587832032132:-4903425074566642766
		1473=-4903298949436784325:-2739716797579745183
		1474=-2739703419744073436:-589328117804966275
		1477=3734534455848060136:5924372906965333873
		1476=1579822226461317527:3734518878340722529
		1475=-589322393539097068:1579813857236466583
		1479=8070499046054048682:9223317594733741806
	}
}

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190524043331.5093-1-raphaelsc@scylladb.com>
2019-05-24 08:36:08 +03:00
Paweł Dziepak
899ebe483a Merge "Fix empty counters handling in MC" from Piotr
"
Before this patchset empty counters were incorrectly persisted for
MC format. No value was written to disk for them. The correct way
is to still write a header that informs the counter is empty.

We also need to make sure that reading wrongly persisted empty
counters works because customers may have sstables with wrongly
persisted empty counters.

Fixes #4363
"

* 'haaawk/4363/v3' of github.com:scylladb/seastar-dev:
  sstables: add test for empty counters
  docs: add CorrectEmptyCounters to sstable-scylla-format
  sstables: Add a feature for empty counters in Scylla.db.
  sstables: Write header for empty counters
  sstables: Remove unused variables in make_counter_cell
  sstables: Handle empty counter value in read path
2019-05-23 13:05:53 +01:00
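The serialization principle behind the fix, always write the emptiness-indicating header and keep the reader tolerant of the legacy headerless form, can be sketched like this (the byte layout is invented for illustration, not the actual SSTable 'mc' format):

```python
import struct

def write_counter_shards(shards):
    """Always emit the shard-count header; a zero-valued header marks an
    empty counter instead of emitting no bytes at all (the old bug)."""
    out = struct.pack("<H", len(shards))
    for value in shards:
        out += struct.pack("<q", value)
    return out

def read_counter_shards(data):
    if not data:
        # Tolerate the legacy encoding, where an empty counter was
        # (incorrectly) written with no header at all.
        return []
    (count,) = struct.unpack_from("<H", data, 0)
    return [struct.unpack_from("<q", data, 2 + 8 * i)[0] for i in range(count)]

assert write_counter_shards([]) == b"\x00\x00"     # header present, zero shards
assert read_counter_shards(b"") == []              # legacy empty form still readable
assert read_counter_shards(write_counter_shards([5, -1])) == [5, -1]
```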
Piotr Jastrzebski
fdbf4f6f53 sstables: add test for empty counters
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-05-23 10:10:24 +02:00
Piotr Jastrzebski
e91e1a1dde docs: add CorrectEmptyCounters to sstable-scylla-format
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-05-23 10:10:24 +02:00
Piotr Jastrzebski
a962696e44 sstables: Add a feature for empty counters in Scylla.db.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-05-23 10:10:24 +02:00
Piotr Jastrzebski
b35030ae7e sstables: Write header for empty counters
When storing an empty counter we should still
write its header that indicates the emptiness.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-05-23 10:10:08 +02:00
Amnon Heiman
f3b6c5fe2f API: storage_proxy add CAS and View endpoints
Some nodetool commands in 3.0 use the CAS and View metrics.

CAS is not implemented and we don't have all the metrics for View
but we still don't want those nodetool commands to fail.

After this patch the following would work and will return empty:

curl -X GET --header 'Accept: application/json' 'http://localhost:10000/storage_proxy/metrics/cas_read/moving_average_histogram'

curl -X GET --header 'Accept: application/json' 'http://localhost:10000/storage_proxy/metrics/view_write/moving_average_histogram'

curl -X GET --header 'Accept: application/json' 'http://localhost:10000/storage_proxy/metrics/cas_write/moving_average_histogram'

This patch is needed for #4416

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <20190521141235.20856-1-amnon@scylladb.com>
2019-05-22 14:25:17 +03:00
Avi Kivity
698f52d257 Merge "tests: Replace ad-hoc cql utilities with general ones" from Dejan
"
One local utility function in cql_query_test.cc duplicates an existing
exception_predicate member.  Another can be generalized for wider use
in the future.  This patch accomplishes both, retiring a to-do item.

Tests: unit (dev)
"

* 'use-utils-predicate-in-cql_test' of https://github.com/dekimir/scylla:
  tests/cql: Replace equery() with cquery_nofail()
  tests: Add cquery_nofail() utility
  tests: Drop redundant function
2019-05-22 10:09:12 +03:00
Dejan Mircevski
09acb32d35 tests/cql: Replace equery() with cquery_nofail()
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-05-21 23:38:09 -04:00
Dejan Mircevski
a9849ecba7 tests: Add cquery_nofail() utility
Most tests await the result of cql_test_env::execute_cql().  Most
would also benefit from reporting errors with top-level location
included.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-05-21 23:28:14 -04:00
Dejan Mircevski
1d8bfc4173 tests: Drop redundant function
make_predicate_for_exception_message_fragment() is redundant now that
exception_utils has landed.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-05-21 23:28:14 -04:00
Avi Kivity
d481521a2e Update seastar submodule
* seastar 3f7a5e1...5cb1234 (5):
  > build: Help Seastar to find Boost on Fedora 30
  > Merge 'Reinstate nowait aio support' from Avi
  > Fix documentation link in README.md
  > sharded: add variants to invoke_on() that accept an smp_service_group
  > improve error message on AIO setup failure
2019-05-21 20:15:09 +03:00
Benny Halevy
fae4ca756c cql3: select_statement: provide default initializer for parameters::_bypass_cache
Fixes #4503

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190521143300.22753-1-bhalevy@scylladb.com>
2019-05-21 20:06:40 +03:00
Piotr Jastrzebski
a6484b28a1 sstables: Remove unused variables in make_counter_cell
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-05-21 12:07:31 +02:00
Piotr Jastrzebski
f711cce024 sstables: Handle empty counter value in read path
Due to a bug in an sstable writer, empty counters
were stored without a header.

The correct way of storing an empty counter is to still write
a header that indicates the emptiness.

The next patch in this series fixes the write path,
but we have to make sure that we handle incorrectly
serialized counters in the read path because there
may exist sstables with counters stored without a header.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-05-21 12:07:12 +02:00
Takuya ASADA
a55330a10b dist/ami: output scylla version information to AMI tags and description
Users may want to know which versions of packages were used for the AMI;
it's good to have that in the AMI tags and description.

To do this, we need to download the .rpm from the specified .repo and extract
version information from the .rpm.

Fixes #4499

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190520123924.14060-2-syuu@scylladb.com>
2019-05-20 15:46:06 +03:00
Takuya ASADA
abe44c28c5 dist/ami: build scylla-python3 when specified --localrpm
Since we switched to relocatable python3, we need to build it for AMI too.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190520123924.14060-1-syuu@scylladb.com>
2019-05-20 15:46:05 +03:00
Konstantin Osipov
25087536bc main: developer-mode configuration option uses dash, not underscore
Message-Id: <20190520115524.101871-1-kostja@scylladb.com>
2019-05-20 15:14:11 +03:00
Calle Wilund
1e37e1d40c commitlog: Add optional use of O_DSYNC mode
Refs #3929

Optionally enables O_DSYNC mode for segment files, and when
enabled ignores actual flushing and just barriers any ongoing
writes.

If using O_DSYNC mode, we will not only truncate the file
to max size, but also do an actual initial write of zeros
to it, since XFS (intended target) has observably less good
behaviour on non-physical file blocks. Once written (and maybe
recycled) we should have rather satisfying throughput on writes.

Note that the O_DSYNC behaviour is hidden behind a default-disabled
option. While the user should seldom need to worry about this, we
should add some sort of logic in main/init that, unless specified by
the user, evaluates the commitlog disk and sets this to true if it is
using XFS and looks ok. This is because using O_DSYNC on filesystems
like EXT4 has quite horrible performance.

All above statements about performance and O_DSYNC behaviour
are based on a sampling of benchmark results (modified fsqual)
on a statistically non-significant selection of disks. However,
at least there the observed behaviour is a rather large
difference between ::fallocate:ed disk area vs. actually written
using O_DSYNC on XFS, and O_DSYNC on EXT4.

Note also that measurements on O_DSYNC vs. no O_DSYNC does not
take into account the wall-clock time of doing manual disk flush.
This is intentionally ignored, since in the commitlog case, at
least using periodic mode, flushes are relatively rare.

Message-Id: <20190520120331.10229-1-calle@scylladb.com>
2019-05-20 15:10:48 +03:00
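A rough POSIX-level sketch of the described behaviour, zero-filling a segment in bounded chunks through an O_DSYNC file descriptor (sizes here are toy values, not Scylla's):

```python
import os
import tempfile

CHUNK = 128 * 1024               # zero the file in 128 KiB writes, not one huge write
SEGMENT_SIZE = 1 * 1024 * 1024   # toy segment size; real commitlog segments are 32 MiB

path = os.path.join(tempfile.mkdtemp(), "segment.log")
# O_DSYNC makes every write a data-synchronizing write; fall back to 0 where absent.
flags = os.O_WRONLY | os.O_CREAT | getattr(os, "O_DSYNC", 0)
fd = os.open(path, flags, 0o644)
try:
    zeros = bytes(CHUNK)
    written = 0
    while written < SEGMENT_SIZE:
        # Physically writing zeros (rather than only truncating/fallocating)
        # gives XFS real blocks to overwrite later, which behaves much better
        # under O_DSYNC than writes into unwritten extents.
        written += os.write(fd, zeros)
finally:
    os.close(fd)

assert os.path.getsize(path) == SEGMENT_SIZE
```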
Avi Kivity
d92973ba86 Merge "scylla-gdb.py: scylla_fiber: add fallback mode" from Botond
"
Add a fallback-mode that can be used when the `scylla ptr` cannot be
used, either because the application is not built with the seastar
allocator, or due to bugs. The fallback mode relies on a more primitive
method for determining how much memory to scan looking for task pointers
inside the task object. This mode, being more primitive, is less prone
to errors, but is more wasteful and less precise.
"

* 'scylla-fiber-fallback-mode/v2' of https://github.com/denesb/scylla:
  scylla-gdb.py: scylla_fiber: add fallback mode
  scylla-gdb.py: scylla_ptr: add is_seastar_allocator_used()
  scylla-gdb.py: pointer_metadata: allow constructing from non-seastar pointers
  scylla-gdb.py: scylla_fiber: fix misaligned text in docstring
2019-05-19 18:34:55 +03:00
Takuya ASADA
4b08a3f906 reloc/python3: add license files on relocatable python3 package
It's better to have license files on our python3 distribution.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190516094329.13273-1-syuu@scylladb.com>
2019-05-19 18:30:19 +03:00
Jesse Haber-Kucharsky
68353a8265 build: Don't build iotune unconditionally
We compile Seastar unconditionally so that changes to Seastar files are
reflected in Scylla when it's built.

We don't need to unconditionally build `iotune` in the same way.

`iotune` is still listed as a build artifact, so it will be built if
`ninja` is invoked without a particular target.

However, building a specific target (like `ninja build/dev/scylla`) will
not build `iotune`.

Fixes #4165

Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <9fb96a281580a8743e04d5dd11398be53960cb58.1558100815.git.jhaberku@scylladb.com>
2019-05-19 18:24:05 +03:00
Avi Kivity
5a276d44af Merge "row_cache: Make invalidate() preemptible" from Tomasz
"
This patchset fixes reactor stalls caused by cache invalidation not being preemptible.
This becomes a problem when there is a lot of partitions in cache inside the invalidated range.

This affects high-level operations like nodetool refresh, table
truncation, repair and streaming.

Fixes #2683

The improvement on stalls was measured using tests/perf_row_cache_update:

  Before:

    Small partitions, no overwrites:
    invalidation: 339.420624 [ms], preemption: {count: 2, 99%: 0.008239 [ms], max: 339.422144 [ms]}
    Small partition with a few rows:
    invalidation: 191.855331 [ms], preemption: {count: 2, 99%: 0.008239 [ms], max: 191.856816 [ms]}
    Large partition, lots of small rows:
    invalidation: 0.959328 [ms], preemption: {count: 2, 99%: 0.008239 [ms], max: 0.961453 [ms]}

  After:

    Small partitions, no overwrites:
    invalidation: 400.505554 [ms], preemption: {count: 843, 99%: 0.545791 [ms], max: 0.502340 [ms]}
    Small partition with a few rows:
    invalidation: 306.352600 [ms], preemption: {count: 644, 99%: 0.545791 [ms], max: 0.506464 [ms]}
    Large partition, lots of small rows:
    invalidation: 0.963660 [ms], preemption: {count: 2, 99%: 0.009887 [ms], max: 0.963264 [ms]}

The maximum scheduling latency went down from 339 ms to 0.5 ms (task quota).

Tests:
  - unit (dev)
"

* tag 'cache-preemptible-invalidation-v2' of github.com:tgrabiec/scylla:
  row_cache: Make invalidate() preemptible
  row_cache: Switch _prev_snapshot_pos to be a ring_position_ext
  dht: Introduce ring_position_ext
  dht: ring_position_view: Take key by const pointer
  tests: perf_row_cache_update: Rename 'stall' to 'preemption' to avoid confusion
  tests: perf_row_cache_update: Report stalls around invalidation
2019-05-19 10:47:46 +03:00
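The essence of making such a loop preemptible, doing bounded batches of work with explicit yield points between them, can be sketched as follows (the generator's yields stand in for seastar's need_preempt() checks):

```python
def invalidate_preemptibly(partitions, budget=128):
    """Drop partitions in bounded batches, yielding control between batches.

    Each resumption performs at most `budget` units of work, so no single
    scheduling slice stalls for the whole invalidated range.
    """
    done = 0
    for p in partitions:
        p.clear()            # the actual unit of invalidation work
        done += 1
        if done % budget == 0:
            yield done       # preemption point: let other tasks run
    yield done

partitions = [dict(x=1) for _ in range(300)]
resumptions = list(invalidate_preemptibly(partitions, budget=128))
# Three scheduling slices instead of one 300-partition stall:
assert resumptions == [128, 256, 300]
assert all(not p for p in partitions)
```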
Takuya ASADA
f625284113 dist/debian: apply product name variable on override_dh_auto_install
To make product name templatization work correctly, we cannot use
"debian/scylla-server" as the package contents directory path; we
need to use a template like "debian/{{product}}-server" instead.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190517121946.18248-1-syuu@scylladb.com>
2019-05-19 10:46:08 +03:00
Gleb Natapov
31bf4cfb5e cache_hitrate_calculator: make cache hitrate calculation preemptable
The calculation is done in a non-preemptable loop over all tables, so if
the number of tables is very large it may take a while, since we also build
a string for gossiper state. Make the loop preemptable and also make
the string calculation more efficient by preallocating memory for it.
Message-Id: <20190516132748.6469-3-gleb@scylladb.com>
2019-05-16 15:32:36 +02:00
Gleb Natapov
4517c56a57 cache_hitrate_calculator: do not copy stats map for each cpu
invoke_on_all() copies provided function for each shard it is executed
on, so by moving stats map into the capture we copy it for each shard
too. Avoid it by putting it into the top level object which is already
captured by reference.
Message-Id: <20190516132748.6469-2-gleb@scylladb.com>
2019-05-16 15:32:24 +02:00
Dejan Mircevski
8dcb35913a table: Avoid needless allocation of cell lockers
All `table` instances currently unconditionally allocate a cell locker
for counter cells, though not all need one.  Since the lockers occupy
quite a bit of memory (as reported in #4441), it's wasteful to
allocate them when unneeded.

Fixes #4441.

Tests: unit (dev, debug)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Message-Id: <20190515190910.87931-1-dejan@scylladb.com>
2019-05-16 11:10:38 +03:00
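The underlying pattern is deferring a heavyweight allocation until it is actually needed; a sketch (class names and pool size are illustrative, not the cell_locker API):

```python
import threading

class Table:
    """Sketch of allocating a per-table lock pool only for tables that use it."""
    def __init__(self, has_counters: bool):
        self._has_counters = has_counters
        self._counter_locks = None      # deferred: most tables never pay for it

    def counter_locks(self):
        assert self._has_counters, "only counter tables lock counter cells"
        if self._counter_locks is None:
            # Stand-in for the (memory-hungry) cell locker allocation.
            self._counter_locks = [threading.Lock() for _ in range(16)]
        return self._counter_locks

plain = Table(has_counters=False)
counters = Table(has_counters=True)
assert plain._counter_locks is None            # no allocation for plain tables
assert len(counters.counter_locks()) == 16     # allocated on first use
```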
Avi Kivity
5b2c8847c7 Merge "Pre timestamp based data segregation cleanup" from Botond
"
This series contains loosely related generic cleanup patches that the
timestamp based data segregation series depends on. Most of the patches
have to do with making headers self-sustainable, that is compilable on
their own. This was needed to be able to ensure that the new headers
introduced or touched by that series are self-sustainable too.
This series also introduces `schema_fwd.hh` which contains a forward
declaration of `schema` and `schema_ptr` classes. No effort was made to
find and replace all existing ad-hoc schema forward declarations in the
source tree.
"

* 'pre-timestamp-based-data-segregation-cleanup/v1' of https://github.com/denesb/scylla:
  encoding_stats.hh: add missing include
  sstables/time_window_compaction_strategy.hh: make self-sufficient
  sstables/size_tiered_compaction_strategy.hh: make self-sufficient
  sstables/compaction_strategy_impl.hh: make header self-sufficient
  compaction_strategy.hh: use schema_fwd.hh
  db/extensions.hh: use schema_fwd.hh
  Add schema_fwd.hh
2019-05-15 17:37:06 +03:00
Asias He
51c4f8cc47 repair: Fix use after free in remove_repair_meta for repair_metas
We should capture repair_metas so that it will not be freed until the
parallel_for_each is finished.

Fixes: #4333
Tests: repair_additional_test.py:RepairAdditionalTest.repair_kill_1_test
Message-Id: <237b20a359122a639330f9f78c67568410aef014.1557922403.git.asias@scylladb.com>
2019-05-15 17:22:51 +03:00
Calle Wilund
e7003f1051 sstable: Make all sstable components subject to file extensions
Makes opening all sstable components go through the same file open
routine, optionally applying extensions to each (except TOC, which
is special).

Also ensures we read Scylla metadata before other non-TOC
components, as we might need this for extensions (hint hint).

Message-Id: <20190513201821.14417-1-calle@scylladb.com>
2019-05-15 17:14:58 +03:00
Botond Dénes
a0010f52c5 scylla-gdb.py: scylla_fiber: add fallback mode
The current implementation of the `scylla fiber` command relies on the
`scylla ptr` command to provide metadata on pointers, more
specifically the boundaries of the region the object they point to
occupies. However, in debug mode, seastar is using the standard allocator
and thus the `scylla ptr` command doesn't work.
To work around this, provide a fallback mode for debug builds. This mode
assumes pointers point to the start of objects and scans a
configurable region of memory. While less exact than the variant relying
on `scylla ptr` it still works reasonably well.
The size of the to-be-scanned memory region can be set using the
`--scanned-region-size` command line argument. This defaults to 512.

Additionally, add a flag (`--force-fallback-mode`) to force using the
fallback mode. This is useful if `scylla ptr` is not working for any
reason.
2019-05-15 15:46:42 +03:00
Botond Dénes
c78d667153 scylla-gdb.py: scylla_ptr: add is_seastar_allocator_used()
Determines whether the application is using the seastar allocator or
not. This is done by attempting to resolve the
`seastar::memory::cpu_mem` symbol. To avoid the expensive symbol lookup
the result is cached. This means that loading a new inferior will
possibly return the wrong value. The cache can be flushed by re-sourcing
the `scylla-gdb.py` script.
2019-05-15 15:44:38 +03:00
Botond Dénes
c3a06da8fb scylla-gdb.py: pointer_metadata: allow constructing from non-seastar pointers 2019-05-15 15:43:34 +03:00
Botond Dénes
4964671e83 scylla-gdb.py: scylla_fiber: fix misaligned text in docstring 2019-05-15 15:43:29 +03:00
Avi Kivity
8e19121e98 Merge "Implement simple selection alongside aggregation" from Dejan
"
Although CQL allows SELECT statements with both simple and aggregate
selectors, Scylla disallows them.  This patch removes that restriction
and ensures that mixed simple/aggregate selection works as specified
both with and without GROUP BY.

Tests: unit (dev)
"

* 'aggregate-and-simple-select-together' of https://github.com/dekimir/scylla:
  cql: Fix mixed selection with GROUP BY
  cql: Allow mixing of aggregate and simple selectors
2019-05-14 20:03:58 +03:00
Dejan Mircevski
f9b00a4318 cql: Fix mixed selection with GROUP BY
GROUP BY is currently supported by simple_selection, the class used
when all selectors are simple.  But when selectors are mixed, we use
selection_with_processing, which does not yet support GROUP BY.  This
patch fixes that.

It also adapts one testcase in filtering_test to the new behavior of
simple_selector.  The test currently expects the last value seen, but
simple_selector now outputs the first value seen.

(More details: the WHERE clause implicitly selects the columns it
references, and unit tests are forced to provide expected values for
these columns.  The user-visible result is unchanged in the test;
users never see the WHERE column values due to filtering in
cql::transport, outside unit tests.)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-05-14 12:50:39 -04:00
Dejan Mircevski
06e3b36164 cql: Allow mixing of aggregate and simple selectors
Scylla currently rejects SELECT statements with both simple and
aggregate selectors, but Cassandra allows them.  This patch brings
parity to Scylla.

Fixes #4447.

Tests: unit (dev)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-05-14 10:34:02 -04:00
Botond Dénes
fe3b798b51 scylla-gdb.py: scylla fiber: add seastar::smp_message_queue::async_work_item to the whitelist
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <4c49fcf5391e027eae68707c9e6ab2f9188c2ea4.1557838171.git.bdenes@scylladb.com>
2019-05-14 17:09:32 +03:00
Avi Kivity
82b91c1511 Merge "gc_clock: Fix hashing to be backwards-compatible" from Tomasz
"
Commit d0f9e00 changed the representation of the gc_clock::duration
from int32_t to int64_t.

Mutation hashing uses appending_hash<gc_clock::time_point>, which by
default feeds duration::count() into the hasher. duration::rep changed
from int32_t to int64_t, which changes the value of the hash.

This affects schema digest and query digests, resulting in mismatches
between nodes during a rolling upgrade.

Fixes #4460.
Refs #4485.
"

* tag 'fix-gc_clock-digest-v2.1' of github.com:tgrabiec/scylla:
  tests: Add test which verifies that schema digest stays the same
  tests: Add sstables for the schema digest test
  schema_tables, storage_service: Make schema digest insensitive to expired tombstones in empty partition
  db/schema_tables: Move feed_hash_for_schema_digest() to .cc file
  hashing: Introduce type-erased interface for the hasher
  hashing: Introduce C++ concept for the hasher
  hashers: Rename hasher to cryptopp_hasher
  gc_clock: Fix hashing to be backwards-compatible
2019-05-14 16:59:50 +03:00
Tomasz Grabiec
285ada5035 Merge "config: remove _make_config_values macro" from Avi
The _make_config_values macro reduces duplication (both the item name
and the types need to be available as C++ identifiers and as runtime
strings), but is hard to work with. The macro is huge and editors
don't handle it well, errors aren't identified at the correct
location, and since the macro doesn't have types, it's hard to
refactor.

This series replaces the macro with ordinary C++ code. Some repetition is
introduced, but IMO the result is easier to maintain than the macro. As a
bonus the bulk of the code is moved away from the header file.

Tests: unit (dev), manual testing of the config REST API

* https://github.com/avikivity/scylla config-no-macro/v2
  config: make the named_value type name available without requiring
    _make_config_values
  config: remove value_status from named_value template parameter list
  config: add named_value::value_as_json()
  api: config: stop using _make_config_values
  config: auto-add named_values into config_file
  config: add allowed_values parameter to named_value constructor
  config: convert _make_config_values to individual named_value member
    declarations and initializers
2019-05-14 16:00:23 +03:00
Avi Kivity
987739898f docs: document SSTable Scylla.db component
Document the format and meaning of the various bits of the Scylla.db component.
Message-Id: <20190513081605.7394-1-avi@scylladb.com>
2019-05-14 16:00:23 +03:00
Avi Kivity
786ce70dfc doc: mention the Slack workspace as a place to get help
Message-Id: <20190514090420.5598-1-avi@scylladb.com>
2019-05-14 16:00:23 +03:00
Botond Dénes
c2ec78358b encoding_stats.hh: add missing include 2019-05-14 13:27:30 +03:00
Botond Dénes
eeacf45b4a sstables/time_window_compaction_strategy.hh: make self-sufficient 2019-05-14 13:27:30 +03:00
Botond Dénes
9953cecc83 sstables/size_tiered_compaction_strategy.hh: make self-sufficient 2019-05-14 13:27:30 +03:00
Botond Dénes
d02c2253a5 sstables/compaction_strategy_impl.hh: make header self-sufficient
Add missing includes and forward declarations. De-inline some methods.
2019-05-14 13:27:30 +03:00
Botond Dénes
20d9d18ab3 compaction_strategy.hh: use schema_fwd.hh 2019-05-14 13:27:30 +03:00
Botond Dénes
690ef09b8f db/extensions.hh: use schema_fwd.hh 2019-05-14 13:27:30 +03:00
Botond Dénes
48bf1d5629 Add schema_fwd.hh 2019-05-14 13:27:30 +03:00
Tomasz Grabiec
6159d5522d tests: Add test which verifies that schema digest stays the same
(cherry picked from commit 8019634dba)
2019-05-14 10:43:06 +02:00
Tomasz Grabiec
815295547d tests: Add sstables for the schema digest test
Generated by running test_schema_digest_does_not_change with
regenerate set to true.

(cherry picked from commit 1f2995c8c5)
2019-05-14 10:43:06 +02:00
Tomasz Grabiec
9de071d214 schema_tables, storage_service: Make schema digest insensitive to expired tombstones in empty partition
Schema digest is calculated by querying for mutations of all schema
tables, then compacting them so that all tombstones in them are
dropped. However, even if the mutation becomes empty after compaction,
we still feed its partition key. If the same mutations were compacted
prior to the query, because the tombstones expire, we won't get any
mutation at all and won't feed the partition key. So schema digest
will change once an empty partition of some schema table is compacted
away.

That's not a problem during normal cluster operation because the
tombstones will expire at all nodes at the same time, and schema
digest, although changes, will change to the same value on all nodes
at about the same time.

This fix changes digest calculation to not feed any digest for
partitions which are empty after compaction.

The digest returned by schema_mutations::digest() is left unchanged by
this patch. It affects the table schema version calculation. It's not
changed because the version is calculated on boot, where we don't yet
know all the cluster features. It's possible to fix this but it's more
complicated, so this patch defers that.

Refs #4485.
2019-05-14 10:43:06 +02:00
Tomasz Grabiec
3a4a903674 db/schema_tables: Move feed_hash_for_schema_digest() to .cc file 2019-05-14 10:43:06 +02:00
Tomasz Grabiec
b0eecdcb8f hashing: Introduce type-erased interface for the hasher
The motivation is to allow hiding the definition of functions
accepting a hasher. For one, this reduces (re)compilation times,
because we can put the definition in a .cc file.
2019-05-14 10:43:06 +02:00
Avi Kivity
1cf72b39a5 Merge "Unbreak the Unbreakable Linux" from Glauber
"
scylla_setup is currently broken for OEL. This happens because the
OS detection code checks for RHEL and Fedora. CentOS returns itself
as RHEL, but OEL does not.
"

* 'unbreakable' of github.com:glommer/scylla:
  scylla_setup: be nicer about unrecognized OS
  scylla_util: recognize OEL as part of the RHEL family
2019-05-13 21:38:21 +03:00
Glauber Costa
3b64727244 scylla_setup: be nicer about unrecognized OS
Right now if the user tries to execute this in an unrecognized OS, the
following will be thrown:

  Traceback (most recent call last):
   File "/usr/lib/scylla/libexec/scylla_setup", line 214, in <module>
     do_verify_package('scylla-enterprise-jmx')
   File "/usr/lib/scylla/libexec/scylla_setup", line 73, in do_verify_package
     if res != 0:
  UnboundLocalError: local variable 'res' referenced before assignment

It would be a lot nicer to exit gracefully and print a message saying what
is going on. This was caught when running on OEL, which the previous patch
fixed. Still, there are other unknown OSes out there that users may try to
run on.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2019-05-13 14:31:49 -04:00
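The traceback above comes from `res` being referenced before any branch assigned it. A minimal sketch of the graceful-exit pattern (hypothetical function and distro names, not the actual scylla_setup code):

```python
import sys

def do_verify_package(pkg, distro):
    """Return the package-verify exit status, or exit with a clear message."""
    res = None
    if distro == 'redhat':
        res = 0  # stand-in for: rpm -q <pkg>
    elif distro == 'debian':
        res = 0  # stand-in for: dpkg -s <pkg>
    if res is None:
        # Previously control fell through to `if res != 0:` and raised
        # UnboundLocalError; a message plus sys.exit() is a lot nicer.
        sys.exit('Unrecognized OS: cannot verify package {}'.format(pkg))
    return res
```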
Glauber Costa
6c15ae5b36 scylla_util: recognize OEL as part of the RHEL family
Oracle Linux is a RHEL-like distribution and we support it just fine, but our
new incarnation of scylla_setup is failing to recognize it.

os-release for OEL is a bit different. It doesn't have an ID_LIKE string, and
only shows an ID string, which is set to 'ol'. So let's recognize this.

Fixes: #4493
Branches: 3.1
Signed-off-by: Glauber Costa <glauber@scylladb.com>
2019-05-13 14:31:38 -04:00
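Since OEL publishes `ID=ol` with no `ID_LIKE`, family detection has to match the ID directly; an illustrative sketch of the idea (not the actual scylla_util code):

```python
def os_family(os_release):
    """Map parsed os-release fields to a package-manager family."""
    like = os_release.get('ID_LIKE', '').split()
    osid = os_release.get('ID', '')
    # 'ol' (Oracle Linux) ships no ID_LIKE, so it must be matched by ID.
    if osid in ('rhel', 'centos', 'fedora', 'ol') or 'rhel' in like or 'fedora' in like:
        return 'redhat'
    if osid in ('debian', 'ubuntu') or 'debian' in like:
        return 'debian'
    return 'unknown'
```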
Tomasz Grabiec
77fb34821b row_cache: Make invalidate() preemptible
This change inserts preemption points between removal of partitions.

The main complication is in maintaining consistency in the face of
concurrent population or eviction. We use the same mechanism which is
used by memtable updates. _prev_snapshot_pos is the ring position
which partitions the ring into the part which is already updated in
cache and the one which is yet to be updated. That position should be
set accordingly on preemption.

In case of invalidation, updating means removing all entries in the
range and marking the range as discontinuous.  When resuming
invalidation of a range we continue from _prev_snapshot_pos as the
lower bound.

This affects high-level operations like nodetool refresh, table
truncation, repair and streaming.

Fixes #2683

The improvement on stalls was measured using tests/perf_row_cache_update:

Before

Small partitions, no overwrites:
invalidation: 339.420624 [ms], preemption: {count: 2, 99%: 0.008239 [ms], max: 339.422144 [ms]}
Small partition with a few rows:
invalidation: 191.855331 [ms], preemption: {count: 2, 99%: 0.008239 [ms], max: 191.856816 [ms]}
Large partition, lots of small rows:
invalidation: 0.959328 [ms], preemption: {count: 2, 99%: 0.008239 [ms], max: 0.961453 [ms]}

After:

Small partitions, no overwrites:
invalidation: 400.505554 [ms], preemption: {count: 843, 99%: 0.545791 [ms], max: 0.502340 [ms]}
Small partition with a few rows:
invalidation: 306.352600 [ms], preemption: {count: 644, 99%: 0.545791 [ms], max: 0.506464 [ms]}
Large partition, lots of small rows:
invalidation: 0.963660 [ms], preemption: {count: 2, 99%: 0.009887 [ms], max: 0.963264 [ms]}

The maximum scheduling latency went down from 339 ms to 0.5 ms (task quota).
2019-05-13 19:32:00 +02:00
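The mechanism above — remove a bounded batch, record the boundary, resume from it after preemption — can be sketched in Python (the real code is C++/Seastar; everything here is illustrative):

```python
def invalidate(entries, start, batch=2):
    """Remove all keys >= start from `entries`, a few at a time, yielding at
    each preemption point. prev_snapshot_pos plays the role of
    _prev_snapshot_pos: keys below it are already updated, keys at or above
    it are yet to be processed, so work can resume from it."""
    prev_snapshot_pos = start
    while True:
        chunk = [k for k in sorted(entries) if k >= prev_snapshot_pos][:batch]
        if not chunk:
            return
        for k in chunk:
            del entries[k]
        prev_snapshot_pos = chunk[-1]
        yield prev_snapshot_pos  # preemption point
```

Each yielded value is a consistent resume bound, mirroring how concurrent population consults _prev_snapshot_pos.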
Tomasz Grabiec
595e1a540e row_cache: Switch _prev_snapshot_pos to be a ring_position_ext
dht::ring_position cannot represent all ring_position_view instances,
in particular those obtained from
dht::ring_position_view::for_range_start(). To allow using the latter,
switch to views.
2019-05-13 19:30:50 +02:00
Tomasz Grabiec
1530224377 dht: Introduce ring_position_ext
It's an owning version of ring_position_view.

Note that ring_position has a narrower domain than the
ring_position_view for historical reasons, so we cannot use that.
2019-05-13 19:30:50 +02:00
Tomasz Grabiec
b08180c7fa dht: ring_position_view: Take key by const pointer 2019-05-13 19:30:39 +02:00
Tomasz Grabiec
ed697306be tests: perf_row_cache_update: Rename 'stall' to 'preemption' to avoid confusion 2019-05-13 19:18:20 +02:00
Tomasz Grabiec
b516e5fdbf tests: perf_row_cache_update: Report stalls around invalidation 2019-05-13 10:47:03 +02:00
Avi Kivity
a8b3cb8a28 Update seastar submodule
* seastar f73690e...3f7a5e1 (7):
  > Revert "Make sure all allocations/deallocations are properly byte aligned"
  > http: fix request content for POST requests
  > doc: discourage generic lambdas and unconstrained templates
  > smp: add smp_service_group for smp::submit_to() resource control
  > Revert "smp: add smp_service_group for smp::submit_to() resource control"
  > smp: add smp_service_group for smp::submit_to() resource control
  > Make sure all allocations/deallocations are properly byte aligned
2019-05-12 13:32:41 +03:00
Tomasz Grabiec
fd349a3c65 hashing: Introduce C++ concept for the hasher 2019-05-10 12:54:30 +02:00
Tomasz Grabiec
5c2f5b522d hashers: Rename hasher to cryptopp_hasher
So that we can introduce a truly generic interface named "hasher".
2019-05-10 12:54:08 +02:00
Tomasz Grabiec
b7ece4b884 gc_clock: Fix hashing to be backwards-compatible
Commit d0f9e00 changed the representation of the gc_clock::duration
from int32_t to int64_t.

Mutation hashing uses appending_hash<gc_clock::time_point>, which by
default feeds duration::count() into the hasher. duration::rep changed
from int32_t to int64_t, which changes the value of the hash.

This affects schema digest and query digests, resulting in mismatches
between nodes during a rolling upgrade.

Fixes #4460.

(cherry picked from commit 549d0eb2f3)
2019-05-10 12:48:46 +02:00
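The fix's idea — keep feeding the hasher the old 32-bit representation even though duration::rep is now int64_t — can be illustrated in Python; the byte width is the point, while the little-endian packing and md5 are mere stand-ins for the real appending_hash:

```python
import hashlib
import struct

def digest_legacy(count):
    # Before commit d0f9e00 the rep was int32_t: 4 bytes reach the hasher.
    return hashlib.md5(struct.pack('<I', count)).hexdigest()

def digest_naive(count):
    # Feeding the widened 64-bit count changes the digest -- the #4460 bug.
    return hashlib.md5(struct.pack('<q', count)).hexdigest()

def digest_compat(count):
    # Truncate back to 32 bits before hashing so digests agree with those
    # computed by not-yet-upgraded nodes during a rolling upgrade.
    return hashlib.md5(struct.pack('<I', count & 0xffffffff)).hexdigest()
```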
Avi Kivity
fdace36fa5 Merge "Fixes for GCC9 build" from Paweł
"
This series contains fixes for GCC9 build, mostly corrections needed
after changes in libstdc++. With this series and a workaround for
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90415 (not included)
Scylla builds and passes unit tests with GCC9 (tested on Fedora 30,
development mode only).

Tests: unit(dev with gcc8 and gcc9).
"

* tag 'gcc9-fixes/v1' of https://github.com/pdziepak/scylla:
  tests/imr: add missing noexcept
  counters: bytes_view::pointer is not const pointer
  imr/fundamental: use bytes_view::const_pointer for const pointer
2019-05-09 21:51:24 +03:00
Paweł Dziepak
96eec203bd tests/imr: add missing noexcept
The concepts require that serialisers passed to the IMR are noexcept.
GCC9 started verifying that.
2019-05-09 17:38:24 +01:00
Paweł Dziepak
ae9e083b02 counters: bytes_view::pointer is not const pointer
In libstdc++ for gcc9 std::basic_string_view::pointer isn't const any
more. As a result the compiler is complaining about reinterpret_cast
casting away const. The solution is to use std::conditional<> to choose
between const pointer for counter view and non-const pointer for mutable
counter view.
2019-05-09 17:31:35 +01:00
Paweł Dziepak
c19576319f imr/fundamental: use bytes_view::const_pointer for const pointer
In libstdc++ shipped with gcc9 std::basic_string_view::pointer is no
longer constant, which is causing the compiler to complain about
dropping const in reinterpret_cast. The solution is to use
std::basic_string_view::const_pointer.
2019-05-09 17:30:15 +01:00
Paweł Dziepak
49b4aeca4d Merge "hinted handoff: prevent sending attempts" from Vlad
"
Fix the broken logic that is meant to prevent sending hints when node is
in a DOWN NORMAL state.
"

* 'hinted_handoff_stop_sending_to_down_node-v2' of https://github.com/vladzcloudius/scylla:
  hints_manager: rename the state::ep_state_is_not_normal enum value
  hinted handoff: fix the logic that detects that the destination node is in DN state
  hinted_handoff: sender::can_send(): optimize gossiper::is_alive(ep) check
  hinted handoff: end_point_hints_manager::sender: use _gossiper instead of _shard_manager.local_gossiper()
  types.cc: fix the compilation with fmt v5.3.0
2019-05-09 15:18:57 +01:00
Avi Kivity
db536776d9 tools: toolchain: fix dbuild in interactive mode regression
Before ede1d248af, running "tools/toolchain/dbuild -it -- bash" was
a nice way to play in the toolchain environment, for example to start
a debugger. But that commit caused containers to run in detached mode,
which is incompatible with interactive mode.

To restore the old behavior, detect that the user wants interactive mode,
and run the container in non-detached mode instead. Add the --rm flag
so the container is removed after execution (as it was before ede1d248af).
Message-Id: <20190506175942.27361-1-avi@scylladb.com>
2019-05-09 15:01:21 +02:00
Dejan Mircevski
d5f587b83d Narrow down build dependences of duration_test
In 0ea6df, duration_test was made to link against all tests/*.o files.
This isn't necessary, as it only needs tests/exception_utils.o.  This
patch narrows down duration_test's dependences to only
exception_utils.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Message-Id: <20190508211630.108228-1-dejan@scylladb.com>
2019-05-09 15:01:21 +02:00
Dejan Mircevski
e4ec89473e tests: Cover indexing errors in frozen collections
Add new test cases:
- disallow creating a non-FULL index on frozen collections
- disallow repeated creation of a FULL index on frozen collections
- disallow FULL indexes on non-frozen collections
- disallow referencing frozen-map entries in the WHERE clause

Also add error-message expectations to existing test cases.

Fixes #3654.

Tests: unit (dev)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Message-Id: <20190509025806.124499-1-dejan@scylladb.com>
2019-05-09 15:25:11 +03:00
Dejan Mircevski
4eeec4a452 tests: drop util.hh
The file tests/util.hh was somehow committed despite being `git mv`ed to
tests/exception_utils.hh.

Tests: unit (dev)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Message-Id: <20190508210203.106295-1-dejan@scylladb.com>
2019-05-09 14:45:33 +03:00
Takuya ASADA
19a973cd05 dist/ami: fix wrong path of SCYLLA-PRODUCT-FILE
Since the other build_*.sh scripts are meant to run inside an extracted
relocatable package, they have SCYLLA-PRODUCT-FILE at the top of the
directory. build_ami.sh does not run in such a condition, so we need to
run SCYLLA-VERSION-GEN first and then refer to build/SCYLLA-PRODUCT-FILE.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190509110621.27468-1-syuu@scylladb.com>
2019-05-09 14:45:31 +03:00
Vlad Zolotarov
f07c341efc hints_manager: rename the state::ep_state_is_not_normal enum value
Rename this state value to better reflect the reality:
state::ep_state_is_not_normal -> state::ep_state_left_the_ring

The manager gets to this state when the destination Node has left the ring.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2019-05-08 15:46:47 -04:00
Vlad Zolotarov
93ba700458 hinted handoff: fix the logic that detects that the destination node is in DN state
When a node is in a DN state its gossiper state may be NORMAL, SHUTDOWN
or "" depending on the use case.

In addition, if a node has been removed from the ring its state is
also going to be removed from the gossiper_state map.

Let's consider the above when deciding whether a node is in the DN state.

Fixes #4461

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2019-05-08 14:53:01 -04:00
Glauber Costa
a23531ebd5 Support AWS i3en instances
AWS just released their new instances, the i3en instances.  The instance
type is already verified to work well with Scylla; the only adjustments
we need are to advertise that we support it and to pre-fill the disk
information according to the performance numbers obtained by running the
instance.

Fixes #4486
Branches: 3.1

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190508170831.6003-1-glauber@scylladb.com>
2019-05-08 20:09:44 +03:00
Avi Kivity
a86fdeb02b Merge "Implement GROUP BY" from Dejan
"
Cassandra has supported GROUP BY in SELECT statements since 2016
(v3.10), while ScyllaDB currently treats it as a syntax error.  To
achieve parity with Cassandra in this important bit of functionality,
this patch adds full support for GROUP BY, from parsing to validation
to implementation to testing.
"

* 'groupby-implPP' of https://github.com/dekimir/scylla:
  Implement grouping in selection processing
  Propagate GROUP BY indices to result_set_builder
  Process GROUP BY columns into select_statement
  Parse GROUP BY clause, store column identifiers
2019-05-08 18:35:12 +03:00
Dejan Mircevski
d51e4a589d Implement grouping in selection processing
Make result_set_builder obey its _group_by_cell_indices by recognizing
group boundaries and resetting the selectors.

Also make simple_selectors work correctly when grouping.

Fixes #2206.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-05-08 11:05:36 -04:00
Dejan Mircevski
c3929aee3a Propagate GROUP BY indices to result_set_builder
Ensure that the indices recorded in select_statement are passed to
result_set_builder when one is created for processing the cell values.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-05-08 10:10:10 -04:00
Dejan Mircevski
274a77f45e Process GROUP BY columns into select_statement
Validate raw GROUP BY identifiers and translate them into
a select_statement member.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-05-08 10:10:10 -04:00
Dejan Mircevski
e1fb414805 Parse GROUP BY clause, store column identifiers
Extend the grammar file with GROUP BY, collect the column identifiers,
and store them in raw::select_statement.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-05-08 10:09:22 -04:00
Avi Kivity
ab3f044daa Revert "Merge "gc_clock: Fix hashing to be backwards-compatible" from Tomasz"
This reverts commit dcb263b36b, reversing
changes made to a6759dc6aa. schema_change_test
fails consistently on master with it.
2019-05-08 16:19:38 +03:00
JP-Reddy
56420dc650 scylla_io_setup: TypeError in iotune_args array from scylla_io_setup script
Whenever the iotune_args array uses "--smp", it needs cpudata.smp()
which returns an integer instead of a string. So when iotune_args is
passed to subprocess.check_call(), it actually throws "TypeError:
expected str, bytes or os.PathLike object, not int" but
"%s did not pass validation tests, it may not be on XFS..." is shown as
the exception.

Even though the user inputs correct arguments, the call might still throw an
error and confuse the user into thinking they passed the wrong arguments.

One simple fix is to use str(cpudata.smp()) instead of cpudata.smp().

Signed-off-by: JP-Reddy <guthijp.reddy@gmail.com>
Message-Id: <20190406070118.48477-1-guthijp.reddy@gmail.com>
2019-05-07 20:13:54 +03:00
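The one-line nature of the fix is easy to show: every argv element handed to subprocess must already be a string. A sketch, with sys.executable standing in for the iotune binary:

```python
import subprocess
import sys

def run_iotune(smp):
    """Build iotune-style arguments; cpudata.smp() returns an int, and
    passing it unconverted makes subprocess raise TypeError."""
    args = [sys.executable, '-c', 'pass', '--smp', str(smp)]
    return subprocess.check_call(args)
```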
Paweł Dziepak
8a16cbc50d Merge "treewide: adjust for gcc 9" from Avi
"
gcc 9 complains a lot about pessimizing moves, narrowing conversions, and
has tighter deduction rules, plus other nice warnings. Fix problems found
by it, and make some non-problems compile without warnings.
"

* tag 'gcc9/v1' of https://github.com/avikivity/scylla:
  types: fix pessimizing moves
  thrift: fix pessimizing moves
  tests: fix pessimizing moves
  tests: cql_query_test: silence narrowing conversion warning
  test: cql_auth_syntax_test: fix ambiguity due to parser uninitialized<T>
  table: fix potentially wrong schema when reading from zero sstables
  storage_proxy: fix pessimizing moves
  memtable: fix pessimizing moves
  IDL: silence narrowing conversion in bool serializer
  compaction: fix pessimizing moves
  cache: fix pessimizing moves
  locator: fix pessimizing moves
  database: fix pessimizing moves
  cql: fix pessimizing moves
  cql parser: fix conversion from uninitalized<T> to optional<T> with gcc 9
2019-05-07 12:19:29 +01:00
Avi Kivity
43867fe618 types: fix pessimizing moves
Remove pessimizing moves, as reported by gcc 9.
2019-05-07 10:01:36 +03:00
Avi Kivity
1b760297f5 thrift: fix pessimizing moves
Remove pessimizing moves, as reported by gcc 9.
2019-05-07 10:01:15 +03:00
Avi Kivity
0ff6e48e77 tests: fix pessimizing moves
Remove pessimizing moves, as reported by gcc 9.
2019-05-07 10:00:58 +03:00
Avi Kivity
b60d58d6bd tests: cql_query_test: silence narrowing conversion warning
Make it explicit to gcc 9 that the conversion to bool is intended.
2019-05-07 09:59:44 +03:00
Avi Kivity
5636b621a7 test: cql_auth_syntax_test: fix ambiguity due to parser uninitialized<T>
gcc 9 is unable to decide whether to call role_name's copy or move
constructor. Help it by casting.
2019-05-07 09:58:21 +03:00
Avi Kivity
add20eb9a6 table: fix potentially wrong schema when reading from zero sstables
We use the schema during creation of the mutation_source rather than
during the query itself. Likely they're the same, and since no rows
are returned from a zero-sstable query, harmless. But gcc 9 complains.

Fix by using the query's schema.
2019-05-07 09:56:30 +03:00
Avi Kivity
985a30a01c storage_proxy: fix pessimizing moves
Remove pessimizing moves, as reported by gcc 9.
2019-05-07 09:56:09 +03:00
Avi Kivity
fd3c493961 memtable: fix pessimizing moves
Remove pessimizing moves, as reported by gcc 9.
2019-05-07 09:55:53 +03:00
Avi Kivity
17c268cd55 IDL: silence narrowing conversion in bool serializer
bool serializers are now aliases to int8_t serializers, but gcc 9
complains about narrowing conversions, due to the path int8_t -> int -> bool.

A bad narrowing conversion here cannot happen in practice, but massage
the code a little to silence it.
2019-05-07 09:28:24 +03:00
Avi Kivity
d7cbd3dc61 compaction: fix pessimizing moves
Remove pessimizing moves, as reported by gcc 9.
2019-05-07 09:28:12 +03:00
Avi Kivity
9c7eb95f78 cache: fix pessimizing moves
Remove pessimizing moves, as reported by gcc 9.
2019-05-07 09:27:50 +03:00
Avi Kivity
c42d59d805 locator: fix pessimizing moves
Remove pessimizing moves, as reported by gcc 9.
2019-05-07 09:27:27 +03:00
Avi Kivity
96a0073929 database: fix pessimizing moves
Remove pessimizing moves, as reported by gcc 9.
2019-05-07 09:26:58 +03:00
Avi Kivity
03e9cdbfb0 cql: fix pessimizing moves
Remove pessimizing moves, as reported by gcc 9.
2019-05-07 09:26:20 +03:00
Avi Kivity
c26ec176dd cql parser: fix conversion from uninitialized<T> to optional<T> with gcc 9
We use uninitialized<T> (wrapping an optional<T>) to adjust to the
parser's way of laying out the code, but this fails with gcc 9
(presumably for the correct reasons) when converting from
uninitialized<T> back to optional<T>. Add a conversion operator
to make it build.
2019-05-07 09:21:22 +03:00
Dejan Mircevski
0ea6df2cd1 tests: Add predicates for checking exception messages
Many tests verify exception messages.  Currently, they do so via
verbose lambdas or inner functions that hide test-failure locations.
This patch adds utilities for quick creation of message-checking tests
and replaces existing ad-hoc methods with these new utilities.

Tests: unit (dev)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Message-Id: <20190506210006.124645-1-dejan@scylladb.com>
2019-05-07 07:11:07 +03:00
Avi Kivity
dcb263b36b Merge "gc_clock: Fix hashing to be backwards-compatible" from Tomasz
"
Commit d0f9e00 changed the representation of the gc_clock::duration
from int32_t to int64_t.

Mutation hashing uses appending_hash<gc_clock::time_point>, which by
default feeds duration::count() into the hasher. duration::rep changed
from int32_t to int64_t, which changes the value of the hash.

This affects schema digest and query digests, resulting in mismatches
between nodes during a rolling upgrade.

Fixes #4460.

Branches: 3.1
"

* tag 'fix-gc_clock-digest-v1' of github.com:tgrabiec/scylla:
  tests: Add test which verifies that schema digest stays the same
  tests: Add sstables for the schema digest test
  gc_clock: Fix hashing to be backwards-compatible
2019-05-07 07:04:40 +03:00
Tomasz Grabiec
8019634dba tests: Add test which verifies that schema digest stays the same 2019-05-06 18:43:43 +02:00
Tomasz Grabiec
1f2995c8c5 tests: Add sstables for the schema digest test
Generated by running test_schema_digest_does_not_change with
regenerate set to true.
2019-05-06 18:43:43 +02:00
Tomasz Grabiec
549d0eb2f3 gc_clock: Fix hashing to be backwards-compatible
Commit d0f9e00 changed the representation of the gc_clock::duration
from int32_t to int64_t.

Mutation hashing uses appending_hash<gc_clock::time_point>, which by
default feeds duration::count() into the hasher. duration::rep changed
from int32_t to int64_t, which changes the value of the hash.

This affects schema digest and query digests, resulting in mismatches
between nodes during a rolling upgrade.

Fixes #4460.
2019-05-06 18:43:43 +02:00
Avi Kivity
a6759dc6aa Update seastar submodule
* seastar 4cdccae...f73690e (16):
  > sstring: silence technically correct but unhelpful warning in sstring move ctor
  > cmake: add a seastar_supports_flag function
  > future: Fix build with libc++'s non-trivially-constructible  std::tuple<>
  > Revert "Make sure all allocations are properly bytes aligned"
  > Merge "future: simplify future_state management" from Rafael
  > Make sure all allocations are properly bytes aligned
  > util/log: use correct clock type
  > core/reactor: don't assume system_clock::duration is in nanoseconds
  > Merge "Optimize the future_state move constructor" from Rafael
  > rpc: don't use boost/variant.hpp directly
  > core/memory: Omit [[gnu::leaf]] attribute on clang
  > Fix build with std::filesystem
  > Merge "Fix clang build and tests" from Rafael
  > cmake: Move ) out of quotes
  > Merge "Fix some bugs found by (or perhaps in) gcc 9" by Avi
  > Deduplicate Seastar dependencies management in CMake scripts
2019-05-06 19:17:37 +03:00
Gleb Natapov
1d851a3892 messaging: catch an error that sending of CLIENT_ID may return
Avoid a warning about unhandled exception.

Message-Id: <20190506122718.GL21208@scylladb.com>
2019-05-06 18:13:51 +03:00
Glauber Costa
79a5351651 scylla-housekeeping: timeout eventually
scylla-housekeeping always wants to run in the installation to check if
we are running the latest version. This happens regardless of whether or
not we said yes or no to the housekeeping scylla_setup question - as
that question only deals with whether or not we want to do this through
a timer.

It is fine to try to run scylla-housekeeping, as long as we time it out.
The current code doesn't.

The naive solution is to add a timeout parameter to urllib.request.open.
However, that timeout is not respected and in my tests I saw real
timeouts up to four times higher than the timeout we set. For a reasonable 5s
timeout, this means a 20s real timeout, which can lead to a very bad user
experience. This seems to be a known problem with this module according
to a quick Google search.

This patch then takes a slightly more complex solution and uses
multiprocess to enforce a well-defined user-visible timeout.

Fixes #3980

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190506122335.5707-1-glauber@scylladb.com>
2019-05-06 17:37:59 +03:00
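The process-based bound can be sketched as follows: run the request in a child process and join with a timeout; if the child is still alive, terminate it. Names and the sleeping stand-in for urlopen are illustrative, not the actual scylla-housekeeping code:

```python
import multiprocessing
import time

def slow_fetch(url, out):
    # Stand-in for urllib.request.urlopen(url).read(), whose own timeout
    # parameter was observed to overshoot by up to 4x.
    time.sleep(10)
    out.put(b'response')

def fetch_with_timeout(url, timeout):
    out = multiprocessing.Queue()
    p = multiprocessing.Process(target=slow_fetch, args=(url, out))
    p.start()
    p.join(timeout)          # well-defined wall-clock bound
    if p.is_alive():
        p.terminate()        # give up: the user waits at most `timeout`
        p.join()
        return None
    return out.get()
```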
Gleb Natapov
b8188e1e2f storage_proxy: avoid copying of a topology and endpoint array in batchlog code
batchlog make copies of topology and endpoint array in batchlog endpoint
choosing code. There is a remark that at least endpoint copy is
deliberate because Cassandra code has it. We do not have to follow. Our
endpoint calculation code is atomic, so we can use a reference.

Message-Id: <20190506115815.GK21208@scylladb.com>
2019-05-06 17:36:50 +03:00
Raphael S. Carvalho
ef5681486f compaction: do not unconditionally delete a new sstable in interrupted compaction
After incremental compaction, new sstables may have already replaced old
sstables at any point, meaning that a new sstable may be in use by the table
and an old sstable already deleted while the compaction itself is UNFINISHED.
Therefore, we should *NEVER* delete a new sstable unconditionally for an
interrupted compaction, or data loss could happen.
To fix it, we'll only delete new sstables that didn't replace anything
in the table, meaning they are unused.

Found the problem while auditing the code.

Fixes #4479.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190506134723.16639-1-raphaelsc@scylladb.com>
2019-05-06 16:55:36 +03:00
Avi Kivity
1c65ba6e66 Use correct scylla_tables schema for removing version column
Mutations carry their schema, so use that instead of bring in a global schema,
which may change as features are added.
Message-Id: <20190505132542.6472-1-avi@scylladb.com>
2019-05-06 13:51:08 +02:00
Paweł Dziepak
51e98e0e11 tests/perf_fast_forward: report average number of aio operations
perf_fast_forward is used to detect performance regressions. The two
main metrics used for this are fragments per second and the number of
IO operations. The former is a median of several runs, but the latter is
just the actual number of asynchronous IO operations performed in the run
that happened to be picked as the median frag/s-wise. There's not always
a direct correlation between frag/s and aio, and the latter can vary,
which makes it hard to compare.

In order to make this easier a new metric was introduced: "average aio"
which reports the average number of asynchronous IO operations performed
in a run. This should produce much more stable results and therefore
make the comparison more meaningful.
Message-Id: <20190430134401.19238-1-pdziepak@scylladb.com>
2019-05-06 11:47:31 +02:00
Piotr Sarna
cf8d2a5141 Revert "view: cache is_index for view pointer"
This reverts commit dbe8491655.
Caching the value was not done in a correct manner, which resulted
in longevity tests failures.

Fixes #4478

Branches: 3.1

Message-Id: <762ca9db618ca2ed7702372fbafe8ecd193dcf4d.1557129652.git.sarna@scylladb.com>
2019-05-06 11:45:46 +03:00
Benny Halevy
d9136f96f3 commitlog: descriptor: skip leading path from filename
std::regex_match of the leading path may run out of stack
with long paths in debug builds.

Use rfind instead to look up the last '/' in the pathname
and skip past it if found.

Fixes #4464

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190505144133.4333-1-bhalevy@scylladb.com>
2019-05-05 17:51:56 +03:00
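The approach the commit describes, rfind for the last '/' instead of a regex over the whole path, can be sketched in Python (the actual implementation is C++; the function name here is hypothetical):

```python
def strip_leading_path(pathname: str) -> str:
    """Return the filename component of pathname.

    Mirrors the fix: rather than regex-matching the whole path (which
    can exhaust the stack on very long inputs), find the last '/' and
    skip everything up to and including it.
    """
    sep = pathname.rfind('/')
    return pathname[sep + 1:] if sep != -1 else pathname

print(strip_leading_path('/var/lib/scylla/commitlog/CommitLog-2-42.log'))
# CommitLog-2-42.log
```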
Benny Halevy
3a2fa82d6e time_window_backlog_tracker: fix use after free
Fixes #4465

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190430094209.13958-1-bhalevy@scylladb.com>
2019-05-05 12:47:51 +03:00
Glauber Costa
47d04e49e8 scylla_setup: respect user's decision not to call housekeeping
The setup script asks the user whether or not housekeeping should
be called, and in the first time the script is executed this decision
is respected.

However, if the script is invoked again, that decision is not respected.

This is because the check has the form:

 if (housekeeping_cfg_file_exists) {
    version_check = ask_user();
 }
 if (version_check) { do_version_check() } else { dont_do_it() }

When it should have the form:

 if (housekeeping_cfg_file_exists) {
    version_check = ask_user();
    if (version_check) { do_version_check() } else { dont_do_it() }
 }

(Thanks python)

This is problematic in systems that are not connected to the internet, since
housekeeping will fail to run and crash the setup script.

Fixes #4462

Branches: master, branch-3.1
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190502034211.18435-1-glauber@scylladb.com>
2019-05-02 18:46:41 +03:00
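The corrected control flow from the message above, as a runnable sketch (function and parameter names are hypothetical):

```python
def maybe_version_check(housekeeping_cfg_exists, ask_user, do_check, dont_check):
    """Corrected flow: the user's answer is only consulted, and acted
    on, when the housekeeping config file exists, so a 'no' answer is
    respected on repeat runs too."""
    if housekeeping_cfg_exists:
        if ask_user():
            do_check()
        else:
            dont_check()

calls = []
maybe_version_check(True, lambda: False,
                    lambda: calls.append('check'),
                    lambda: calls.append('skip'))
print(calls)  # ['skip']
```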
Glauber Costa
99c00547ad make scylla_util OS detection robust against empty lines
Newer versions of RHEL ship the os-release file with empty lines at the
end, which our script was not prepared to handle. As such, scylla_setup
would fail.

This patch makes our OS detection robust against that.

Fixes #4473

Branches: master, branch-3.1
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190502152224.31307-1-glauber@scylladb.com>
2019-05-02 18:33:35 +03:00
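A minimal sketch of the hardened detection, assuming a simple KEY=VALUE os-release format (the real logic lives in scylla_util; the function name is hypothetical):

```python
def parse_os_release(text: str) -> dict:
    """Parse /etc/os-release content, tolerating blank lines.

    Blank (or whitespace-only) lines, as shipped by newer RHEL
    releases, are skipped instead of tripping up the KEY=VALUE split.
    """
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or '=' not in line:
            continue
        key, _, value = line.partition('=')
        info[key] = value.strip('"')
    return info

sample = 'NAME="Red Hat Enterprise Linux"\nID="rhel"\n\n'
print(parse_os_release(sample)['ID'])  # rhel
```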
Paweł Dziepak
cf451f0e62 Merge "gdb: Fixes and improvements to memory analysis" from Tomasz
"
One of the fixes is for incorrect recognition of memory pages as belonging
or not belonging to small allocation pools in some cases.

Also, compensates for https://github.com/scylladb/seastar/issues/608 in "scylla memory",
which improves the accuracy of the small allocation pool report.

Fixes "scylla task_histogram" to not look into pages which do not belong to live
small allocation pool spans.

Fixes #4367
Fixes #4368
"

* tag 'gdb-fix-span-qualification-v2' of github.com:tgrabiec/scylla:
  gdb: Print size of large allocations in 'scylla ptr'
  gdb: Fix 'scylla ptr' for free pages
  gdb: Set is_live and offset for large allocations properly in 'scylla ptr'
  gdb: Fix 'scylla ptr' misqualifying pointers
  gdb: Make 'scylla memory' show unused memory in small pools
  gdb: Fix small pool memory usage reporting in 'scylla memory'
  gdb: Switch 'scylla memory' to use the span_checker to find large spans
  gdb: Switch task_histogram to use the span_checker
  gdb: Introduce span_checker
2019-05-02 14:25:30 +01:00
Gleb Natapov
95c6d19f6c batchlog_manager: fix array out of bound access
The endpoint_filter() function assumes that each bucket of a
std::unordered_multimap contains elements with the same key only, so
its size can be used to tell how many elements with a particular key
there are. But this is not the case: elements with different keys may
share a bucket. Fix it by counting keys directly.

Fixes #3229

Message-Id: <20190501133127.GE21208@scylladb.com>
2019-05-01 17:30:11 +03:00
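The corrected counting idea, counting keys directly rather than trusting hash-bucket sizes, can be sketched in Python (the real code operates on a C++ std::unordered_multimap; names here are hypothetical):

```python
from collections import Counter

def endpoints_per_rack(rack_endpoint_pairs):
    """Count how many endpoints share each rack key.

    A hash-table bucket may hold entries for several distinct keys, so
    bucket sizes cannot stand in for per-key counts; counting the keys
    themselves is always correct.
    """
    return dict(Counter(rack for rack, _ in rack_endpoint_pairs))

pairs = [('rack1', 'ep1'), ('rack1', 'ep2'), ('rack2', 'ep3')]
print(endpoints_per_rack(pairs))  # {'rack1': 2, 'rack2': 1}
```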
Nadav Har'El
2710f382de secondary index: expand test of secondary-index and UPDATE requests
The existing unit test test_secondary_index_contains_virtual_columns
reproduced a bug (issue #4144) with indexing of primary-key columns,
but it only actually tested clustering columns. In issue #4471 there
was a question of whether we may still have a bug when indexing
*partition-key* columns. This patch adds a test verifying that
we don't, and that this case works well too.

Refs #4144
Refs #4471

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190501113500.25900-1-nyh@scylladb.com>
2019-05-01 12:53:23 +01:00
Vlad Zolotarov
274b9d8069 hinted_handoff: sender::can_send(): optimize gossiper::is_alive(ep) check
gossiper::is_alive() has a lot of unneeded checks (e.g. is_me(ep)) that
are irrelevant for the HH use case, so we may safely skip them.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2019-04-25 23:16:07 -04:00
Vlad Zolotarov
74b4076ceb hinted handoff: end_point_hints_manager::sender: use _gossiper instead of _shard_manager.local_gossiper()
sender has its own reference to the local gossiper - use it.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2019-04-25 23:04:02 -04:00
Vlad Zolotarov
fe82437dea types.cc: fix the compilation with fmt v5.3.0
Compilation fails with fmt release 5.3.0 when we print a bytes_view
using the "{}" formatter.

The compiler's complaint is: "error: static assertion failed: mismatch
between char-types of context and argument"

Fix this by explicitly using to_hex() converter.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2019-04-25 23:04:02 -04:00
Avi Kivity
9a6c86e2a7 config: convert _make_config_values to individual named_value member declarations and initializers
While causing some duplication (names are explicitly instead of implicitly
stringified, and names are repeated in the member declaration and initializer),
it is overall more maintainable than the huge macro. It is easier to overload
named_value constructors when you can get error reporting on the line where the error
occurs, for example.
2019-04-23 16:29:03 +03:00
Avi Kivity
4b3c2f6514 config: add allowed_values parameter to named_value constructor
The _make_config_values() macro supplies an optional list of allowed values
for a config item, so support that, even though no one uses it yet.
2019-04-23 16:29:03 +03:00
Avi Kivity
d959fbfc16 config: auto-add named_values into config_file
By passing a config_file into named_value, we remove another call to the
_make_config_values() macro.
2019-04-23 16:29:03 +03:00
Avi Kivity
b663cd1765 api: config: stop using _make_config_values
Now that named_value::value_as_json() exists, make use of it to report the
current value of a configuration variable via the REST API, instead of
_make_config_values().
2019-04-23 16:29:03 +03:00
Avi Kivity
6033b6a079 config: add named_value::value_as_json()
Currently, the REST API does its own conversion of named_value into json.
This requires it to use the _make_config_values macro to perform iteration
of all config items, since it needs to preserve the concrete type of the item
while iterating, so it can select the correct json conversion.

Since we want to remove that macro, we need to provide a different way to
convert a config item to json. So this patch adds a value_as_json().

To hide json_return_value from the rest of the system, we extend config_type
with a conversion function to handle the details. This usually calls
the json_return_type constructor directly, but when it doesn't have default
translation, it interposes a conversion into a type that json recognizes.

I didn't bother maintaining the existing type names, since they're C++
names which don't make sense for the UI.
2019-04-23 16:28:19 +03:00
Avi Kivity
db3f61776f config: remove value_status from named_value template parameter list
The value_status is only needed at run-time, and removing it from the
template parameter list reduces type proliferation (which leads to code
bloat) and simplifies the code.
2019-04-23 16:15:28 +03:00
Avi Kivity
daf5744daa config: make the named_value type name available without requiring _make_config_values
I want to remove the _make_config_values macro, but it is needed now in
api/config.cc to make the type names available. So as a first step, copy the
type names to config_src. Further changes can extract it from there.

Because we want to add more type information in the following patches, place the type
name in a new config_type object, instead of allocating a string_view in
config_src.
2019-04-23 16:13:54 +03:00
Tomasz Grabiec
1b1f241c94 gdb: Print size of large allocations in 'scylla ptr' 2019-04-09 13:44:15 +02:00
Tomasz Grabiec
cda1781a77 gdb: Fix 'scylla ptr' for free pages
Fixes a runtime error which happens because the setter is expected to
take an argument, but our definition doesn't take one. We're not
really expecting the setter to be called with False, so don't use setter
semantics.
2019-04-09 13:44:15 +02:00
Tomasz Grabiec
13efabe74c gdb: Set is_live and offset for large allocations properly in 'scylla ptr'
Before:

  (gdb) scylla ptr 0x601000860003
  thread 1, large, free

After:

  (gdb) scylla ptr 0x601000860003
  thread 1, large, live (0x601000860000 +3)

Omission from e1ea4db7ca.
2019-04-09 13:22:06 +02:00
Tomasz Grabiec
4002d8db7c gdb: Fix 'scylla ptr' misqualifying pointers
It can be that page::pool is != nullptr and page::offset_in_span is 0
for a page which is inside a large allocation span (live or
dead). This may lead to misqualification of a pointer as belonging to
a small allocation pool.

Only the first page of a span contains reliable information. This
patch changes the code to use the span_checker, which knows the real
boundaries of spans and exposes reliable information via the span
object.

Fixes #4368
2019-04-09 13:22:06 +02:00
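The span_checker idea referenced above, classifying a pointer by independently recorded span boundaries instead of per-page metadata that is only reliable on a span's first page, can be sketched as follows (the data layout here is hypothetical; the real checker is a gdb helper):

```python
import bisect

class SpanChecker:
    """Record span start pages up front, then classify any page by
    binary search over the known boundaries, instead of trusting
    per-page pool/offset fields that are only valid on a span's
    first page."""

    def __init__(self, spans):
        # spans: list of (start_page, n_pages)
        self.starts = sorted(start for start, _ in spans)
        self.sizes = dict(spans)

    def span_of(self, page):
        """Return the start page of the span containing 'page',
        or None if the page belongs to no known span."""
        i = bisect.bisect_right(self.starts, page) - 1
        if i < 0:
            return None
        start = self.starts[i]
        return start if page < start + self.sizes[start] else None

checker = SpanChecker([(0, 4), (8, 2)])
print(checker.span_of(2), checker.span_of(5), checker.span_of(9))
# 0 None 8
```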
Tomasz Grabiec
4d3399ee1f gdb: Make 'scylla memory' show unused memory in small pools
Example output:

Small pools:
objsz spansz    usedobj       memory       unused  wst%
    1   4096          0            0            0   0.0
    1   4096          0            0            0   0.0
    1   4096          0            0            0   0.0
    1   4096          0            0            0   0.0
    2   4096          0            0            0   0.0
    2   4096          0            0            0   0.0
    3   4096          0            0            0   0.0
    3   4096          0            0            0   0.0
    4   4096          0            0            0   0.0
    5   4096          0            0            0   0.0
    6   4096          0            0            0   0.0
    7   4096          0            0            0   0.0
    8   4096        241         8192         6264  76.5
   10   4096          0         8192         8192  99.9
   12   4096      35943       454656        23340   1.4
   14   4096          0         8192         8192  99.8
   16   4096       1171        24576         5840  23.8
   20   4096       1007        24576         4436  17.7
   24   4096      59380      1437696        12576   0.5
   28   4096        548        16384         1040   6.2
   32   4096      69433      2314240        92384   0.3
   40   4096      36447      1564672       106792   0.4
   48   4096      34099      1748992       112240   0.4
2019-04-09 13:22:05 +02:00
Tomasz Grabiec
ac7a393be5 gdb: Fix small pool memory usage reporting in 'scylla memory'
Uses span_checker to work around for corrupted _pages_in_use.

Refs https://github.com/scylladb/seastar/issues/608

As a bonus, calculates use_count correctly for fallback spans.
2019-04-09 13:22:05 +02:00
Tomasz Grabiec
d0567476e5 gdb: Switch 'scylla memory' to use the span_checker to find large spans
Simplifies code.
2019-04-09 13:22:05 +02:00
Tomasz Grabiec
4b748e601c gdb: Switch task_histogram to use the span_checker
It can be that page::pool is != nullptr and page::offset_in_span is 0
for a page which is inside a large allocation span (live or
dead). This may lead to misqualification of that span as belonging to
a small allocation pool and interpreting its contents as if it
contained small objects.

Only the first page of a span contains reliable information. This
patch changes the code to use the span_checker, which knows the real
boundaries of spans and exposes reliable information via the span
object.

Another problem was that the command scanned dead spans as well.
This is no longer the case after this patch.

I've seen this command report thousands of no longer live sstable
writers and various continuations because of those problems.

Fixes #4367
2019-04-09 13:22:05 +02:00
Tomasz Grabiec
c7215a2f67 gdb: Introduce span_checker
The purpose is to encapsulate iteration and lookup of seastar
allocator memory spans.
2019-04-09 13:22:05 +02:00
2903 changed files with 67238 additions and 19312 deletions


@@ -1,4 +0,0 @@
Scylla doesn't use pull-requests, please send a patch to the [mailing list](mailto:scylladb-dev@googlegroups.com) instead.
See our [contributing guidelines](../CONTRIBUTING.md) and our [Scylla development guidelines](../HACKING.md) for more information.
If you have any questions please don't hesitate to send a mail to the [dev list](mailto:scylladb-dev@googlegroups.com).

5
.gitignore vendored

@@ -19,3 +19,8 @@ CMakeLists.txt.user
__pycache__
.gdbinit
resources
.pytest_cache
/expressions.tokens
tags
testlog/*
test/*/*.reject

5
.gitmodules vendored

@@ -1,6 +1,6 @@
[submodule "seastar"]
path = seastar
url = ../seastar
url = ../scylla-seastar
ignore = dirty
[submodule "swagger-ui"]
path = swagger-ui
@@ -12,3 +12,6 @@
[submodule "libdeflate"]
path = libdeflate
url = ../libdeflate
[submodule "zstd"]
path = zstd
url = ../zstd


@@ -97,7 +97,7 @@ scan_scylla_source_directories(
service
sstables
streaming
tests
test
thrift
tracing
transport


@@ -1,6 +1,6 @@
# Asking questions or requesting help
Use the [ScyllaDB user mailing list](https://groups.google.com/forum/#!forum/scylladb-users) for general questions and help.
Use the [ScyllaDB user mailing list](https://groups.google.com/forum/#!forum/scylladb-users) or the [Slack workspace](http://slack.scylladb.com) for general questions and help.
# Reporting an issue


@@ -22,6 +22,15 @@ Scylla depends on the system package manager for its development dependencies.
Running `./install-dependencies.sh` (as root) installs the appropriate packages based on your Linux distribution.
On Ubuntu and Debian based Linux distributions, some packages
required to build Scylla are missing in the official upstream:
- libthrift-dev and libthrift
- antlr3-c++-dev
Try running ```sudo ./scripts/scylla_current_repo``` to add Scylla upstream,
and get the missing packages from it.
### Build system
**Note**: Compiling Scylla requires, conservatively, 2 GB of memory per native
@@ -47,7 +56,7 @@ $ ./configure.py --help
The most important option is:
- `--{enable,disable}-dpdk`: [DPDK](http://dpdk.org/) is a set of libraries and drivers for fast packet processing. During development, it's not necessary to enable support even if it is supported by your platform.
- `--enable-dpdk`: [DPDK](http://dpdk.org/) is a set of libraries and drivers for fast packet processing. During development, it's not necessary to enable support even if it is supported by your platform.
Source files and build targets are tracked manually in `configure.py`, so the script needs to be updated when new files or targets are added or removed.
@@ -55,6 +64,7 @@ To save time -- for instance, to avoid compiling all unit tests -- you can also
```bash
$ ninja-build build/release/tests/schema_change_test
$ ninja-build build/release/service/storage_proxy.o
```
You can also specify a single mode. For example
@@ -194,7 +204,7 @@ cqlsh and nodetool. They are available at
https://github.com/scylladb/scylla-tools-java and can be built with
```bash
$ ./install-dependencies.sh
$ sudo ./install-dependencies.sh
$ ant jar
```


@@ -5,8 +5,6 @@ F: Filename, directory, or pattern for the subsystem
---
AUTH
M: Paweł Dziepak <pdziepak@scylladb.com>
M: Duarte Nunes <duarte@scylladb.com>
R: Calle Wilund <calle@scylladb.com>
R: Vlad Zolotarov <vladz@scylladb.com>
R: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
@@ -14,22 +12,17 @@ F: auth/*
CACHE
M: Tomasz Grabiec <tgrabiec@scylladb.com>
M: Paweł Dziepak <pdziepak@scylladb.com>
R: Piotr Jastrzebski <piotr@scylladb.com>
F: row_cache*
F: *mutation*
F: tests/mvcc*
COMMITLOG / BATCHLOG
M: Paweł Dziepak <pdziepak@scylladb.com>
M: Duarte Nunes <duarte@scylladb.com>
R: Calle Wilund <calle@scylladb.com>
F: db/commitlog/*
F: db/batch*
COORDINATOR
M: Paweł Dziepak <pdziepak@scylladb.com>
M: Duarte Nunes <duarte@scylladb.com>
R: Gleb Natapov <gleb@scylladb.com>
F: service/storage_proxy*
@@ -49,12 +42,10 @@ M: Pekka Enberg <penberg@scylladb.com>
F: cql3/*
COUNTERS
M: Paweł Dziepak <pdziepak@scylladb.com>
F: counters*
F: tests/counter_test*
GOSSIP
M: Duarte Nunes <duarte@scylladb.com>
M: Tomasz Grabiec <tgrabiec@scylladb.com>
R: Asias He <asias@scylladb.com>
F: gms/*
@@ -65,14 +56,11 @@ F: dist/docker/*
LSA
M: Tomasz Grabiec <tgrabiec@scylladb.com>
M: Paweł Dziepak <pdziepak@scylladb.com>
F: utils/logalloc*
MATERIALIZED VIEWS
M: Duarte Nunes <duarte@scylladb.com>
M: Pekka Enberg <penberg@scylladb.com>
R: Nadav Har'El <nyh@scylladb.com>
R: Duarte Nunes <duarte@scylladb.com>
M: Nadav Har'El <nyh@scylladb.com>
F: db/view/*
F: cql3/statements/*view*
@@ -82,14 +70,12 @@ F: dist/*
REPAIR
M: Tomasz Grabiec <tgrabiec@scylladb.com>
M: Duarte Nunes <duarte@scylladb.com>
R: Asias He <asias@scylladb.com>
R: Nadav Har'El <nyh@scylladb.com>
F: repair/*
SCHEMA MANAGEMENT
M: Tomasz Grabiec <tgrabiec@scylladb.com>
M: Duarte Nunes <duarte@scylladb.com>
M: Pekka Enberg <penberg@scylladb.com>
F: db/schema_tables*
F: db/legacy_schema_migrator*
@@ -98,15 +84,13 @@ F: schema*
SECONDARY INDEXES
M: Pekka Enberg <penberg@scylladb.com>
M: Duarte Nunes <duarte@scylladb.com>
R: Nadav Har'El <nyh@scylladb.com>
M: Nadav Har'El <nyh@scylladb.com>
R: Pekka Enberg <penberg@scylladb.com>
F: db/index/*
F: cql3/statements/*index*
SSTABLES
M: Tomasz Grabiec <tgrabiec@scylladb.com>
M: Duarte Nunes <duarte@scylladb.com>
R: Raphael S. Carvalho <raphaelsc@scylladb.com>
R: Glauber Costa <glauber@scylladb.com>
R: Nadav Har'El <nyh@scylladb.com>
@@ -114,18 +98,17 @@ F: sstables/*
STREAMING
M: Tomasz Grabiec <tgrabiec@scylladb.com>
M: Duarte Nunes <duarte@scylladb.com>
R: Asias He <asias@scylladb.com>
F: streaming/*
F: service/storage_service.*
THRIFT TRANSPORT LAYER
M: Duarte Nunes <duarte@scylladb.com>
F: thrift/*
ALTERNATOR
M: Nadav Har'El <nyh@scylladb.com>
F: alternator/*
F: alternator-test/*
THE REST
M: Avi Kivity <avi@scylladb.com>
M: Paweł Dziepak <pdziepak@scylladb.com>
M: Duarte Nunes <duarte@scylladb.com>
M: Tomasz Grabiec <tgrabiec@scylladb.com>
M: Nadav Har'El <nyh@scylladb.com>
F: *


@@ -1,29 +0,0 @@
Seastar and DPDK
================
Seastar uses the Data Plane Development Kit to drive NIC hardware directly. This
provides an enormous performance boost.
To enable DPDK, specify `--enable-dpdk` to `./configure.py`, and `--dpdk-pmd` as a
run-time parameter. This will use the DPDK package provided as a git submodule with the
seastar sources.
To use your own self-compiled DPDK package, follow this procedure:
1. Setup host to compile DPDK:
- Ubuntu
`sudo apt-get install -y build-essential linux-image-extra-$(uname -r)`
2. Prepare a DPDK SDK:
- Download the latest DPDK release: `wget http://dpdk.org/browse/dpdk/snapshot/dpdk-1.8.0.tar.gz`
- Untar it.
- Edit config/common_linuxapp: set CONFIG_RTE_MBUF_REFCNT and CONFIG_RTE_LIBRTE_KNI to 'n'.
- For DPDK 1.7.x: edit config/common_linuxapp:
- Set CONFIG_RTE_LIBRTE_PMD_BOND to 'n'.
- Set CONFIG_RTE_MBUF_SCATTER_GATHER to 'n'.
- Set CONFIG_RTE_LIBRTE_IP_FRAG to 'n'.
- Start the tools/setup.sh script as root.
- Compile a linuxapp target (option 9).
- Install IGB_UIO module (option 11).
- Bind some physical port to IGB_UIO (option 17).
- Configure hugepage mappings (option 14/15).
3. Run a configure.py: `./configure.py --dpdk-target <Path to untared dpdk-1.8.0 above>/x86_64-native-linuxapp-gcc`.


@@ -2,21 +2,22 @@
## Quick-start
To get the build going quickly, Scylla offers a [frozen toolchain](tools/toolchain/README.md)
which would build and run Scylla using a pre-configured Docker image.
Using the frozen toolchain will also isolate all of the installed
dependencies in a Docker container.
Assuming you have met the toolchain prerequisites, which is running
Docker in user mode, building and running is as easy as:
```bash
$ git submodule update --init --recursive
$ sudo ./install-dependencies.sh
$ ./configure.py --mode=release
$ ninja-build -j4 # Assuming 4 system threads.
$ ./build/release/scylla
$ # Rejoice!
```
$ ./tools/toolchain/dbuild ./configure.py
$ ./tools/toolchain/dbuild ninja build/release/scylla
$ ./tools/toolchain/dbuild ./build/release/scylla --developer-mode 1
```
Please see [HACKING.md](HACKING.md) for detailed information on building and developing Scylla.
**Note**: GCC >= 8.1.1 is require to compile Scylla.
**Note**: See [frozen toolchain](tools/toolchain/README.md) for a way to build and run
on an older distribution.
**Note**: GCC >= 8.1.1 is required to compile Scylla.
## Running Scylla
@@ -26,10 +27,10 @@ Please see [HACKING.md](HACKING.md) for detailed information on building and dev
```
* run Scylla with one CPU and ./tmp as data directory
* run Scylla with one CPU and ./tmp as work directory
```
./build/release/scylla --datadir tmp --commitlog-directory tmp --smp 1
./build/release/scylla --workdir tmp --smp 1
```
* For more run options:
@@ -37,6 +38,24 @@ Please see [HACKING.md](HACKING.md) for detailed information on building and dev
./build/release/scylla --help
```
## Scylla APIs and compatibility
By default, Scylla is compatible with Apache Cassandra and its APIs - CQL and
Thrift. There is also experimental support for the API of Amazon DynamoDB,
but being experimental it needs to be explicitly enabled to be used. For more
information on how to enable the experimental DynamoDB compatibility in Scylla,
and the current limitations of this feature, see
[Alternator](docs/alternator/alternator.md) and
[Getting started with Alternator](docs/alternator/getting-started.md).
## Documentation
Documentation can be found in [./docs](./docs) and on the
[wiki](https://github.com/scylladb/scylla/wiki). There is currently no clear
definition of what goes where, so when looking for something be sure to check
both.
Seastar documentation can be found [here](http://docs.seastar.io/master/index.html).
User documentation can be found [here](https://docs.scylladb.com/).
## Building Fedora RPM
As a pre-requisite, you need to install [Mock](https://fedoraproject.org/wiki/Mock) on your machine:
@@ -80,4 +99,5 @@ docker run -p $(hostname -i):9042:9042 -i -t <image name>
## Contributing to Scylla
[Hacking howto](HACKING.md)
[Guidelines for contributing](CONTRIBUTING.md)


@@ -1,7 +1,7 @@
#!/bin/sh
PRODUCT=scylla
VERSION=666.development
VERSION=3.3.4
if test -f version
then

78
alternator-test/README.md Normal file

@@ -0,0 +1,78 @@
Tests for Alternator that should also pass, identically, against DynamoDB.
Tests use the boto3 library for the AWS API, and the pytest framework
(both are available from Linux distributions, or with "pip install").
To run all tests against the local installation of Alternator on
http://localhost:8000, just run `pytest`.
Some additional pytest options:
* To run all tests in a single file, do `pytest test_table.py`.
* To run a single specific test, do `pytest test_table.py::test_create_table_unsupported_names`.
* Additional useful pytest options, especially useful for debugging tests:
* -v: show the names of each individual test running instead of just dots.
* -s: show the full output of running tests (by default, pytest captures the test's output and only displays it if a test fails)
Add the `--aws` option to test against AWS instead of the local installation.
For example - `pytest --aws test_item.py` or `pytest --aws`.
If you plan to run tests against AWS and not just a local Scylla installation,
the file ~/.aws/credentials should be configured with your AWS key:
```
[default]
aws_access_key_id = XXXXXXXXXXXXXXXXXXXX
aws_secret_access_key = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```
and ~/.aws/config with the default region to use in the test:
```
[default]
region = us-east-1
```
## HTTPS support
In order to run tests with HTTPS, run pytest with the `--https` parameter. Note that the Scylla cluster needs to be provided
with the alternator\_https\_port configuration option in order to initialize an HTTPS server.
Moreover, running an instance of an HTTPS server requires a certificate. Here's how to easily generate
a key and a self-signed certificate, which is sufficient to run `--https` tests:
```
openssl genrsa 2048 > scylla.key
openssl req -new -x509 -nodes -sha256 -days 365 -key scylla.key -out scylla.crt
```
If this pair is put into `conf/` directory, it will be enough
to allow the alternator HTTPS server to think it's been authorized and properly certified.
Still, boto3 library issues warnings that the certificate used for communication is self-signed,
and thus should not be trusted. For the sake of running local tests this warning is explicitly ignored.
## Authorization
By default, boto3 prepares a properly signed Authorization header with every request.
In order to confirm the authorization, the server recomputes the signature by using
user credentials (user-provided username + a secret key known by the server),
and then checks if it matches the signature from the header.
Early alternator code did not verify signatures at all, which is also allowed by the protocol.
A partial implementation of the authorization verification can be allowed by providing a Scylla
configuration parameter:
```yaml
alternator_enforce_authorization: true
```
The implementation is currently coupled with Scylla's system\_auth.roles table,
which means that an additional step needs to be performed when setting up Scylla
as the test environment. Tests will use the following credentials:
Username: `alternator`
Secret key: `secret_pass`
With CQLSH, it can be achieved by executing this snippet:
```bash
cqlsh -x "INSERT INTO system_auth.roles (role, salted_hash) VALUES ('alternator', 'secret_pass')"
```
Most tests expect the authorization to succeed, so they will pass even with `alternator_enforce_authorization`
turned off. However, test cases from `test_authorization.py` may require this option to be turned on,
so enabling it is advised.
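The verification scheme described above, where the server recomputes the signature from its copy of the secret and compares it to the header, can be sketched with a plain HMAC (a simplification: real requests use AWS Signature V4 with scoped key derivation, not a single HMAC over the request):

```python
import hashlib
import hmac

def sign(secret_key: str, string_to_sign: str) -> str:
    """Toy stand-in for request signing: HMAC-SHA256 over the
    canonicalized request, keyed by the user's secret."""
    return hmac.new(secret_key.encode(), string_to_sign.encode(),
                    hashlib.sha256).hexdigest()

def verify(secret_key: str, string_to_sign: str, claimed: str) -> bool:
    # The server recomputes the signature from its copy of the secret
    # and compares in constant time.
    return hmac.compare_digest(sign(secret_key, string_to_sign), claimed)

sig = sign('secret_pass', 'GET /')
print(verify('secret_pass', 'GET /', sig))  # True
print(verify('wrong_pass', 'GET /', sig))   # False
```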

179
alternator-test/conftest.py Normal file

@@ -0,0 +1,179 @@
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# This file contains "test fixtures", a pytest concept described in
# https://docs.pytest.org/en/latest/fixture.html.
# A "fixture" is some sort of setup which an individual test requires to run.
# The fixture has setup code and teardown code, and if multiple tests
# require the same fixture, it can be set up only once - while still allowing
# the user to run individual tests and automatically set up the fixtures they need.
import pytest
import boto3
from util import create_test_table
# Test that the Boto libraries are new enough. These tests want to test a
# large variety of DynamoDB API features, and to do this we need a new-enough
# version of the Boto libraries (boto3 and botocore) so that they can
# access all these API features.
# In particular, the BillingMode feature was added in botocore 1.12.54.
import botocore
import sys
from distutils.version import LooseVersion
if (LooseVersion(botocore.__version__) < LooseVersion('1.12.54')):
pytest.exit("Your Boto library is too old. Please upgrade it,\ne.g. using:\n sudo pip{} install --upgrade boto3".format(sys.version_info[0]))
# By default, tests run against a local Scylla installation on localhost:8000/.
# The "--aws" option can be used to run against Amazon DynamoDB in the us-east-1
# region.
def pytest_addoption(parser):
parser.addoption("--aws", action="store_true",
help="run against AWS instead of a local Scylla installation")
parser.addoption("--https", action="store_true",
help="communicate via HTTPS protocol on port 8043 instead of HTTP when"
" running against a local Scylla installation")
# "dynamodb" fixture: set up client object for communicating with the DynamoDB
# API. Currently this chooses either Amazon's DynamoDB in the default region
# or a local Alternator installation on http://localhost:8000 - depending on the
# existence of the "--aws" option. In the future we should provide options
# for choosing other Amazon regions or local installations.
# We use scope="session" so that all tests will reuse the same client object.
@pytest.fixture(scope="session")
def dynamodb(request):
if request.config.getoption('aws'):
return boto3.resource('dynamodb')
else:
# Even though we connect to the local installation, Boto3 still
# requires us to specify dummy region and credential parameters,
# otherwise the user is forced to properly configure ~/.aws even
# for local runs.
local_url = 'https://localhost:8043' if request.config.getoption('https') else 'http://localhost:8000'
# Disable verifying in order to be able to use self-signed TLS certificates
verify = not request.config.getoption('https')
# Silencing the 'Unverified HTTPS request warning'
if request.config.getoption('https'):
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
return boto3.resource('dynamodb', endpoint_url=local_url, verify=verify,
region_name='us-east-1', aws_access_key_id='alternator', aws_secret_access_key='secret_pass')
# "test_table" fixture: Create and return a temporary table to be used in tests
# that need a table to work on. The table is automatically deleted at the end.
# We use scope="session" so that all tests will reuse the same client object.
# This "test_table" creates a table which has a specific key schema: both a
# partition key and a sort key, and both are strings. Other fixtures (below)
# can be used to create different types of tables.
#
# TODO: Although we are careful about deleting temporary tables when the
# fixture is torn down, in some cases (e.g., interrupted tests) we can be left
# with some tables not deleted, and they will never be deleted. Because all
# our temporary tables have the same test_table_prefix, we can actually find
# and remove these old tables with this prefix. We can have a fixture, which
# test_table will require, which on teardown will delete all remaining tables
# (possibly from an older run). Because the table's name includes the current
# time, we can also remove just tables older than a particular age. Such
# mechanism will allow running tests in parallel, without the risk of deleting
# a parallel run's temporary tables.
@pytest.fixture(scope="session")
def test_table(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'c', 'KeyType': 'RANGE' }
],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'c', 'AttributeType': 'S' },
])
yield table
# We get back here when this fixture is torn down. We ask Dynamo to delete
# this table, but not wait for the deletion to complete. The next time
# we create a test_table fixture, we'll choose a different table name
# anyway.
table.delete()
# The following fixtures test_table_* are similar to test_table but create
# tables with different key schemas.
@pytest.fixture(scope="session")
def test_table_s(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, ],
AttributeDefinitions=[ { 'AttributeName': 'p', 'AttributeType': 'S' } ])
yield table
table.delete()
@pytest.fixture(scope="session")
def test_table_b(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, ],
AttributeDefinitions=[ { 'AttributeName': 'p', 'AttributeType': 'B' } ])
yield table
table.delete()
@pytest.fixture(scope="session")
def test_table_sb(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' } ],
AttributeDefinitions=[ { 'AttributeName': 'p', 'AttributeType': 'S' }, { 'AttributeName': 'c', 'AttributeType': 'B' } ])
yield table
table.delete()
@pytest.fixture(scope="session")
def test_table_sn(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' } ],
AttributeDefinitions=[ { 'AttributeName': 'p', 'AttributeType': 'S' }, { 'AttributeName': 'c', 'AttributeType': 'N' } ])
yield table
table.delete()
# "filled_test_table" fixture: Create a temporary table to be used in tests
# that involve reading data - GetItem, Scan, etc. The table is filled with
# 328 items - each consisting of a partition key, clustering key and two
# string attributes. 164 of the items are in a single partition (with the
# partition key 'long') and the 164 other items are each in a separate
# partition. Finally, a 329th item is added with different attributes.
# This table is supposed to be read from, not updated nor overwritten.
# This fixture returns both a table object and the description of all items
# inserted into it.
@pytest.fixture(scope="session")
def filled_test_table(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'c', 'KeyType': 'RANGE' }
],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'c', 'AttributeType': 'S' },
])
count = 164
items = [{
'p': str(i),
'c': str(i),
'attribute': "x" * 7,
'another': "y" * 16
} for i in range(count)]
items = items + [{
'p': 'long',
'c': str(i),
'attribute': "x" * (1 + i % 7),
'another': "y" * (1 + i % 16)
} for i in range(count)]
items.append({'p': 'hello', 'c': 'world', 'str': 'and now for something completely different'})
with table.batch_writer() as batch:
for item in items:
batch.put_item(item)
yield table, items
table.delete()

# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Tests for authorization
import pytest
import botocore
from botocore.exceptions import ClientError
import boto3
import requests
# Test that trying to perform an operation signed with a wrong key
# will not succeed
def test_wrong_key_access(request, dynamodb):
print("Please make sure authorization is enforced in your Scylla installation: alternator_enforce_authorization: true")
url = dynamodb.meta.client._endpoint.host
with pytest.raises(ClientError, match='UnrecognizedClientException'):
if url.endswith('.amazonaws.com'):
boto3.client('dynamodb',endpoint_url=url, aws_access_key_id='wrong_id', aws_secret_access_key='').describe_endpoints()
else:
verify = not url.startswith('https')
boto3.client('dynamodb',endpoint_url=url, region_name='us-east-1', aws_access_key_id='whatever', aws_secret_access_key='', verify=verify).describe_endpoints()
# A similar test, but this time the user is expected to exist in the database (for local tests)
def test_wrong_password(request, dynamodb):
print("Please make sure authorization is enforced in your Scylla installation: alternator_enforce_authorization: true")
url = dynamodb.meta.client._endpoint.host
with pytest.raises(ClientError, match='UnrecognizedClientException'):
if url.endswith('.amazonaws.com'):
boto3.client('dynamodb',endpoint_url=url, aws_access_key_id='alternator', aws_secret_access_key='wrong_key').describe_endpoints()
else:
verify = not url.startswith('https')
boto3.client('dynamodb',endpoint_url=url, region_name='us-east-1', aws_access_key_id='alternator', aws_secret_access_key='wrong_key', verify=verify).describe_endpoints()
# A test ensuring that expired signatures are not accepted
def test_expired_signature(dynamodb, test_table):
url = dynamodb.meta.client._endpoint.host
print(url)
headers = {'Content-Type': 'application/x-amz-json-1.0',
'X-Amz-Date': '20170101T010101Z',
'X-Amz-Target': 'DynamoDB_20120810.DescribeEndpoints',
'Authorization': 'AWS4-HMAC-SHA256 Credential=alternator/2/3/4/aws4_request SignedHeaders=x-amz-date;host Signature=123'
}
response = requests.post(url, headers=headers, verify=False)
assert not response.ok
assert "InvalidSignatureException" in response.text and "Signature expired" in response.text
# A test ensuring that signatures that exceed current time too much are not accepted.
# Watch out - this test is valid only for roughly the next 1000 years; it
# will need to be updated after that.
def test_signature_too_futuristic(dynamodb, test_table):
url = dynamodb.meta.client._endpoint.host
print(url)
headers = {'Content-Type': 'application/x-amz-json-1.0',
'X-Amz-Date': '30200101T010101Z',
'X-Amz-Target': 'DynamoDB_20120810.DescribeEndpoints',
'Authorization': 'AWS4-HMAC-SHA256 Credential=alternator/2/3/4/aws4_request SignedHeaders=x-amz-date;host Signature=123'
}
response = requests.post(url, headers=headers, verify=False)
assert not response.ok
assert "InvalidSignatureException" in response.text and "Signature not yet current" in response.text
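The two tests above exercise the server-side rule that a request's `X-Amz-Date` must be close to the server's clock. A minimal sketch of such a window check follows; the 15-minute skew is an assumption (it is AWS's documented default), not necessarily what Alternator enforces, and the function name is hypothetical:

```python
from datetime import datetime, timedelta

# Sketch of a clock-skew check: accept a signature only if its
# X-Amz-Date (e.g. '20170101T010101Z') is within `skew` of `now`.
def amz_date_in_window(amz_date, now, skew=timedelta(minutes=15)):
    signed = datetime.strptime(amz_date, '%Y%m%dT%H%M%SZ')
    return abs(now - signed) <= skew
```

Dates far in the past fail with "Signature expired" and far-future ones with "Signature not yet current", matching the two assertions above.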

# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Tests for batch operations - BatchWriteItem, BatchGetItem.
# Note that various other tests in other files also use these operations,
# so they are actually tested by other tests as well.
import pytest
from botocore.exceptions import ClientError
from util import random_string, full_scan, full_query, multiset
# Test ensuring that items inserted by a batched statement can be properly extracted
# via GetItem. Schema has both hash and sort keys.
def test_basic_batch_write_item(test_table):
count = 7
with test_table.batch_writer() as batch:
for i in range(count):
batch.put_item(Item={
'p': "batch{}".format(i),
'c': "batch_ck{}".format(i),
'attribute': str(i),
'another': 'xyz'
})
for i in range(count):
item = test_table.get_item(Key={'p': "batch{}".format(i), 'c': "batch_ck{}".format(i)}, ConsistentRead=True)['Item']
assert item['p'] == "batch{}".format(i)
assert item['c'] == "batch_ck{}".format(i)
assert item['attribute'] == str(i)
assert item['another'] == 'xyz'
# Test batch write to a table with only a hash key
def test_batch_write_hash_only(test_table_s):
items = [{'p': random_string(), 'val': random_string()} for i in range(10)]
with test_table_s.batch_writer() as batch:
for item in items:
batch.put_item(item)
for item in items:
assert test_table_s.get_item(Key={'p': item['p']}, ConsistentRead=True)['Item'] == item
# Test batch delete operation (DeleteRequest): We create a bunch of items, and
# then delete them all.
def test_batch_write_delete(test_table_s):
items = [{'p': random_string(), 'val': random_string()} for i in range(10)]
with test_table_s.batch_writer() as batch:
for item in items:
batch.put_item(item)
for item in items:
assert test_table_s.get_item(Key={'p': item['p']}, ConsistentRead=True)['Item'] == item
with test_table_s.batch_writer() as batch:
for item in items:
batch.delete_item(Key={'p': item['p']})
# Verify that all items are now missing:
for item in items:
assert not 'Item' in test_table_s.get_item(Key={'p': item['p']}, ConsistentRead=True)
# Test the same batch including both writes and delete. Should be fine.
def test_batch_write_and_delete(test_table_s):
p1 = random_string()
p2 = random_string()
test_table_s.put_item(Item={'p': p1})
assert 'Item' in test_table_s.get_item(Key={'p': p1}, ConsistentRead=True)
assert not 'Item' in test_table_s.get_item(Key={'p': p2}, ConsistentRead=True)
with test_table_s.batch_writer() as batch:
batch.put_item({'p': p2})
batch.delete_item(Key={'p': p1})
assert not 'Item' in test_table_s.get_item(Key={'p': p1}, ConsistentRead=True)
assert 'Item' in test_table_s.get_item(Key={'p': p2}, ConsistentRead=True)
# It is forbidden to update the same key twice in the same batch.
# DynamoDB says "Provided list of item keys contains duplicates".
def test_batch_write_duplicate_write(test_table_s, test_table):
p = random_string()
with pytest.raises(ClientError, match='ValidationException.*duplicates'):
with test_table_s.batch_writer() as batch:
batch.put_item({'p': p})
batch.put_item({'p': p})
c = random_string()
with pytest.raises(ClientError, match='ValidationException.*duplicates'):
with test_table.batch_writer() as batch:
batch.put_item({'p': p, 'c': c})
batch.put_item({'p': p, 'c': c})
# But it is fine to touch items with one component the same, but the other not.
other = random_string()
with test_table.batch_writer() as batch:
batch.put_item({'p': p, 'c': c})
batch.put_item({'p': p, 'c': other})
batch.put_item({'p': other, 'c': c})
def test_batch_write_duplicate_delete(test_table_s, test_table):
p = random_string()
with pytest.raises(ClientError, match='ValidationException.*duplicates'):
with test_table_s.batch_writer() as batch:
batch.delete_item(Key={'p': p})
batch.delete_item(Key={'p': p})
c = random_string()
with pytest.raises(ClientError, match='ValidationException.*duplicates'):
with test_table.batch_writer() as batch:
batch.delete_item(Key={'p': p, 'c': c})
batch.delete_item(Key={'p': p, 'c': c})
# But it is fine to touch items with one component the same, but the other not.
other = random_string()
with test_table.batch_writer() as batch:
batch.delete_item(Key={'p': p, 'c': c})
batch.delete_item(Key={'p': p, 'c': other})
batch.delete_item(Key={'p': other, 'c': c})
def test_batch_write_duplicate_write_and_delete(test_table_s, test_table):
p = random_string()
with pytest.raises(ClientError, match='ValidationException.*duplicates'):
with test_table_s.batch_writer() as batch:
batch.delete_item(Key={'p': p})
batch.put_item({'p': p})
c = random_string()
with pytest.raises(ClientError, match='ValidationException.*duplicates'):
with test_table.batch_writer() as batch:
batch.delete_item(Key={'p': p, 'c': c})
batch.put_item({'p': p, 'c': c})
# But it is fine to touch items with one component the same, but the other not.
other = random_string()
with test_table.batch_writer() as batch:
batch.delete_item(Key={'p': p, 'c': c})
batch.put_item({'p': p, 'c': other})
batch.put_item({'p': other, 'c': c})
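The three duplicate tests above all exercise one rule: DynamoDB rejects a batch in which two requests (puts or deletes alike) name the same full key. A simplified sketch of that check, with a hypothetical function name and flattened requests (real BatchWriteItem wraps keys in PutRequest/DeleteRequest envelopes):

```python
# Sketch of the duplicate-key rule: a batch is invalid if two of its
# requests resolve to the same key tuple, regardless of request type.
def has_duplicate_keys(requests, key_attrs):
    """requests: list of item/key dicts; key_attrs: e.g. ('p',) or ('p', 'c')."""
    seen = set()
    for r in requests:
        key = tuple(r[a] for a in key_attrs)
        if key in seen:
            return True
        seen.add(key)
    return False
```

Items sharing only one key component, as in the "other" cases above, produce distinct tuples and pass.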
# Test that BatchWriteItem's PutRequest completely replaces an existing item.
# It shouldn't merge it with a previously existing value. See also the same
# test for PutItem - test_put_item_replace().
def test_batch_put_item_replace(test_table_s, test_table):
p = random_string()
with test_table_s.batch_writer() as batch:
batch.put_item(Item={'p': p, 'a': 'hi'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hi'}
with test_table_s.batch_writer() as batch:
batch.put_item(Item={'p': p, 'b': 'hello'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 'hello'}
c = random_string()
with test_table.batch_writer() as batch:
batch.put_item(Item={'p': p, 'c': c, 'a': 'hi'})
assert test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item'] == {'p': p, 'c': c, 'a': 'hi'}
with test_table.batch_writer() as batch:
batch.put_item(Item={'p': p, 'c': c, 'b': 'hello'})
assert test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item'] == {'p': p, 'c': c, 'b': 'hello'}
# Test that if one of the batch's operations is invalid, because a key
# column is missing or has the wrong type, the entire batch is rejected
# before any write is done.
def test_batch_write_invalid_operation(test_table_s):
# test key attribute with wrong type:
p1 = random_string()
p2 = random_string()
items = [{'p': p1}, {'p': 3}, {'p': p2}]
with pytest.raises(ClientError, match='ValidationException'):
with test_table_s.batch_writer() as batch:
for item in items:
batch.put_item(item)
for p in [p1, p2]:
assert not 'Item' in test_table_s.get_item(Key={'p': p}, ConsistentRead=True)
# test missing key attribute:
p1 = random_string()
p2 = random_string()
items = [{'p': p1}, {'x': 'whatever'}, {'p': p2}]
with pytest.raises(ClientError, match='ValidationException'):
with test_table_s.batch_writer() as batch:
for item in items:
batch.put_item(item)
for p in [p1, p2]:
assert not 'Item' in test_table_s.get_item(Key={'p': p}, ConsistentRead=True)
# Basic test for BatchGetItem, reading several entire items.
# Schema has both hash and sort keys.
def test_batch_get_item(test_table):
items = [{'p': random_string(), 'c': random_string(), 'val': random_string()} for i in range(10)]
with test_table.batch_writer() as batch:
for item in items:
batch.put_item(item)
keys = [{k: x[k] for k in ('p', 'c')} for x in items]
# We use the low-level batch_get_item API for lack of a more convenient
# API. At least it spares us the need to encode the key's types...
reply = test_table.meta.client.batch_get_item(RequestItems = {test_table.name: {'Keys': keys, 'ConsistentRead': True}})
print(reply)
got_items = reply['Responses'][test_table.name]
assert multiset(got_items) == multiset(items)
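The tests here stay well under BatchGetItem's documented limit of 100 keys per request. A helper for longer key lists would chunk them first; this sketch (hypothetical name) covers only the chunking, and a full client would also have to re-request any `UnprocessedKeys` returned in the reply:

```python
# Split a key list into BatchGetItem-sized chunks.
# 100 is DynamoDB's documented per-request key limit.
def chunk_keys(keys, n=100):
    return [keys[i:i + n] for i in range(0, len(keys), n)]
```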
# Same, but with a schema that has just a hash key.
def test_batch_get_item_hash(test_table_s):
items = [{'p': random_string(), 'val': random_string()} for i in range(10)]
with test_table_s.batch_writer() as batch:
for item in items:
batch.put_item(item)
keys = [{k: x[k] for k in ('p',)} for x in items]
reply = test_table_s.meta.client.batch_get_item(RequestItems = {test_table_s.name: {'Keys': keys, 'ConsistentRead': True}})
got_items = reply['Responses'][test_table_s.name]
assert multiset(got_items) == multiset(items)
# Test what we get if we try to read two *missing* values in addition to
# an existing one. It turns out the missing items are simply not returned,
# with no sign they are missing.
def test_batch_get_item_missing(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p})
reply = test_table_s.meta.client.batch_get_item(RequestItems = {test_table_s.name: {'Keys': [{'p': random_string()}, {'p': random_string()}, {'p': p}], 'ConsistentRead': True}})
got_items = reply['Responses'][test_table_s.name]
assert got_items == [{'p' : p}]
# If all the keys requested from a particular table are missing, we still
# get a response array for that table - it's just empty.
def test_batch_get_item_completely_missing(test_table_s):
reply = test_table_s.meta.client.batch_get_item(RequestItems = {test_table_s.name: {'Keys': [{'p': random_string()}], 'ConsistentRead': True}})
got_items = reply['Responses'][test_table_s.name]
assert got_items == []
# Test BatchGetItem with AttributesToGet
def test_batch_get_item_attributes_to_get(test_table):
items = [{'p': random_string(), 'c': random_string(), 'val1': random_string(), 'val2': random_string()} for i in range(10)]
with test_table.batch_writer() as batch:
for item in items:
batch.put_item(item)
keys = [{k: x[k] for k in ('p', 'c')} for x in items]
for wanted in [['p'], ['p', 'c'], ['val1'], ['p', 'val2']]:
reply = test_table.meta.client.batch_get_item(RequestItems = {test_table.name: {'Keys': keys, 'AttributesToGet': wanted, 'ConsistentRead': True}})
got_items = reply['Responses'][test_table.name]
expected_items = [{k: item[k] for k in wanted if k in item} for item in items]
assert multiset(got_items) == multiset(expected_items)
# Test BatchGetItem with ProjectionExpression (just a simple one, with
# top-level attributes)
def test_batch_get_item_projection_expression(test_table):
items = [{'p': random_string(), 'c': random_string(), 'val1': random_string(), 'val2': random_string()} for i in range(10)]
with test_table.batch_writer() as batch:
for item in items:
batch.put_item(item)
keys = [{k: x[k] for k in ('p', 'c')} for x in items]
for wanted in [['p'], ['p', 'c'], ['val1'], ['p', 'val2']]:
reply = test_table.meta.client.batch_get_item(RequestItems = {test_table.name: {'Keys': keys, 'ProjectionExpression': ",".join(wanted), 'ConsistentRead': True}})
got_items = reply['Responses'][test_table.name]
expected_items = [{k: item[k] for k in wanted if k in item} for item in items]
assert multiset(got_items) == multiset(expected_items)

# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Test for the DescribeEndpoints operation
import boto3
# Test that the DescribeEndpoints operation works as expected: that it
# returns one endpoint (it may return more, but it never does this in
# Amazon), and this endpoint can be used to make more requests.
def test_describe_endpoints(request, dynamodb):
endpoints = dynamodb.meta.client.describe_endpoints()['Endpoints']
# It is not strictly necessary that only a single endpoint be returned,
# but this is what Amazon DynamoDB does today (and so does Alternator).
assert len(endpoints) == 1
for endpoint in endpoints:
assert 'CachePeriodInMinutes' in endpoint.keys()
address = endpoint['Address']
# Check that the address is a valid endpoint by checking that we can
# send it another describe_endpoints() request ;-) Note that the
# address does not include the "http://" or "https://" prefix, and
# we need to choose one manually.
prefix = "https://" if request.config.getoption('https') else "http://"
verify = not request.config.getoption('https')
url = prefix + address
if address.endswith('.amazonaws.com'):
boto3.client('dynamodb',endpoint_url=url, verify=verify).describe_endpoints()
else:
# Even though we connect to the local installation, Boto3 still
# requires us to specify dummy region and credential parameters,
# otherwise the user is forced to properly configure ~/.aws even
# for local runs.
boto3.client('dynamodb',endpoint_url=url, region_name='us-east-1', aws_access_key_id='alternator', aws_secret_access_key='secret_pass', verify=verify).describe_endpoints()
# Nothing to check here - if the above call failed with an exception,
# the test would fail.

# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Tests for the DescribeTable operation.
# Some attributes used only by a specific major feature will be tested
# elsewhere:
# 1. Tests for describing tables with global or local secondary indexes
# (the GlobalSecondaryIndexes and LocalSecondaryIndexes attributes)
# are in test_gsi.py and test_lsi.py.
# 2. Tests for the stream feature (LatestStreamArn, LatestStreamLabel,
# StreamSpecification) will be in the tests devoted to the stream
# feature.
# 3. Tests for describing a restored table (RestoreSummary, TableId)
# will be together with tests devoted to the backup/restore feature.
import pytest
from botocore.exceptions import ClientError
import re
import time
from util import multiset
# Test that DescribeTable correctly returns the table's name and state
def test_describe_table_basic(test_table):
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
assert got['TableName'] == test_table.name
assert got['TableStatus'] == 'ACTIVE'
# Test that DescribeTable correctly returns the table's schema, in
# AttributeDefinitions and KeySchema attributes
def test_describe_table_schema(test_table):
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
expected = { # Copied from test_table()'s fixture
'KeySchema': [ { 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'c', 'KeyType': 'RANGE' }
],
'AttributeDefinitions': [
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'c', 'AttributeType': 'S' },
]
}
assert got['KeySchema'] == expected['KeySchema']
# The list of attribute definitions may be arbitrarily reordered
assert multiset(got['AttributeDefinitions']) == multiset(expected['AttributeDefinitions'])
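The `multiset` helper imported from `util` is not shown in this diff. A plausible reimplementation that supports the comparisons used throughout these tests (lists of dicts, order-insensitive, duplicates counted) is sketched below; it assumes attribute values are hashable, which holds for the string values used here:

```python
from collections import Counter

# Hypothetical sketch of util.multiset: count items order-insensitively.
# Dicts are unhashable, so each one is canonicalized into a frozenset
# of its (key, value) pairs before counting.
def multiset(items):
    return Counter(frozenset(item.items()) if isinstance(item, dict) else item
                   for item in items)
```

Two lists then compare equal exactly when they contain the same items with the same multiplicities, in any order.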
# Test that DescribeTable correctly returns the table's billing mode,
# in the BillingModeSummary attribute.
def test_describe_table_billing(test_table):
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
assert got['BillingModeSummary']['BillingMode'] == 'PAY_PER_REQUEST'
# The BillingModeSummary should also contain a
# LastUpdateToPayPerRequestDateTime attribute, which is a date.
# We don't know what date this is supposed to be, but something we
# do know is that the test table was created already with this billing
# mode, so the table creation date should be the same as the billing
# mode setting date.
assert 'LastUpdateToPayPerRequestDateTime' in got['BillingModeSummary']
assert got['BillingModeSummary']['LastUpdateToPayPerRequestDateTime'] == got['CreationDateTime']
# Test that DescribeTable correctly returns the table's creation time.
# We don't know what this creation time is supposed to be, so this test
# cannot be very thorough... We currently just test against something we
# know to be wrong - returning the *current* time, which changes on every
# call.
@pytest.mark.xfail(reason="DescribeTable does not return table creation time")
def test_describe_table_creation_time(test_table):
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
assert 'CreationDateTime' in got
time1 = got['CreationDateTime']
time.sleep(1)
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
time2 = got['CreationDateTime']
assert time1 == time2
# Test that DescribeTable returns the table's estimated item count
# in the ItemCount attribute. Unfortunately, there's not much we can
# really test here... The documentation says that the count can be
# delayed by six hours, so the number we get here may have no relation
# to the current number of items in the test table. The attribute should exist,
# though. This test does NOT verify that ItemCount isn't always returned as
# zero - such a stub implementation would pass this test.
@pytest.mark.xfail(reason="DescribeTable does not return table item count")
def test_describe_table_item_count(test_table):
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
assert 'ItemCount' in got
# Similar test for estimated size in bytes - TableSizeBytes - which again,
# may reflect the size as long as six hours ago.
@pytest.mark.xfail(reason="DescribeTable does not return table size")
def test_describe_table_size(test_table):
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
assert 'TableSizeBytes' in got
# Test the ProvisionedThroughput attribute returned by DescribeTable.
# This is a very partial test: Our test table is configured without
# provisioned throughput, so obviously it will not have interesting settings
# for it. DynamoDB returns zeros for some of the attributes, even though
# the documentation suggests missing values should have been fine too.
@pytest.mark.xfail(reason="DescribeTable does not return provisioned throughput")
def test_describe_table_provisioned_throughput(test_table):
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
assert got['ProvisionedThroughput']['NumberOfDecreasesToday'] == 0
assert got['ProvisionedThroughput']['WriteCapacityUnits'] == 0
assert got['ProvisionedThroughput']['ReadCapacityUnits'] == 0
# This is a silly test for the RestoreSummary attribute in DescribeTable -
# it should not exist in a table not created by a restore. When testing
# the backup/restore feature, we will have more meaningful tests for the
# value of this attribute in that case.
def test_describe_table_restore_summary(test_table):
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
assert not 'RestoreSummary' in got
# This is a silly test for the SSEDescription attribute in DescribeTable -
# by default, a table is encrypted with AWS-owned keys, not using client-
# owned keys, and the SSEDescription attribute is not returned at all.
def test_describe_table_encryption(test_table):
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
assert not 'SSEDescription' in got
# This is a silly test for the StreamSpecification attribute in DescribeTable -
# when there are no streams, this attribute should be missing.
def test_describe_table_stream_specification(test_table):
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
assert not 'StreamSpecification' in got
# Test that the table has an ARN, a unique identifier for the table which
# includes which zone it is on, which account, and of course the table's
# name. The ARN format is described in
# https://docs.aws.amazon.com/general/latest/gr/aws-arns-and-namespaces.html#genref-arns
@pytest.mark.xfail(reason="DescribeTable does not return ARN")
def test_describe_table_arn(test_table):
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
assert 'TableArn' in got and got['TableArn'].startswith('arn:')
# Test that the table has a TableId.
# TODO: Figure out what is this TableId supposed to be, it is just a
# unique id that is created with the table and never changes? Or anything
# else?
@pytest.mark.xfail(reason="DescribeTable does not return TableId")
def test_describe_table_id(test_table):
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
assert 'TableId' in got
# DescribeTable error path: trying to describe a non-existent table should
# result in a ResourceNotFoundException.
def test_describe_table_non_existent_table(dynamodb):
with pytest.raises(ClientError, match='ResourceNotFoundException') as einfo:
dynamodb.meta.client.describe_table(TableName='non_existent_table')
# As one of the first error-path tests that we wrote, let's test in more
# detail that the error reply has the appropriate fields:
response = einfo.value.response
print(response)
err = response['Error']
assert err['Code'] == 'ResourceNotFoundException'
assert re.match('Requested resource not found: Table: non_existent_table not found', err['Message'])

alternator-test/test_gsi.py (new file, 874 lines)
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Tests of GSI (Global Secondary Indexes)
#
# Note that many of these tests are slower than usual, because many of them
# need to create new tables and/or new GSIs of different types, operations
# which are extremely slow in DynamoDB, often taking minutes (!).
import pytest
import time
from botocore.exceptions import ClientError, ParamValidationError
from util import create_test_table, random_string, full_scan, full_query, multiset, list_tables
# GSIs only support eventually consistent reads, so tests that involve
# writing to a table and then expect to read something from it cannot be
# guaranteed to succeed without retrying the read. The following utility
# functions make it easy to write such tests.
# Note that in practice, the repeated reads are almost never necessary:
# Amazon claims that "Changes to the table data are propagated to the global
# secondary indexes within a fraction of a second, under normal conditions"
# and indeed, in practice, the tests here almost always succeed without a
# retry.
def assert_index_query(table, index_name, expected_items, **kwargs):
for i in range(3):
if multiset(expected_items) == multiset(full_query(table, IndexName=index_name, **kwargs)):
return
print('assert_index_query retrying')
time.sleep(1)
assert multiset(expected_items) == multiset(full_query(table, IndexName=index_name, **kwargs))
def assert_index_scan(table, index_name, expected_items, **kwargs):
for i in range(3):
if multiset(expected_items) == multiset(full_scan(table, IndexName=index_name, **kwargs)):
return
print('assert_index_scan retrying')
time.sleep(1)
assert multiset(expected_items) == multiset(full_scan(table, IndexName=index_name, **kwargs))
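The two helpers above share one retry shape: try a condition a few times with a short sleep, to absorb GSI eventual-consistency lag, then assert. That shape can be factored into a generic form; this sketch uses a hypothetical name and is not part of the tests themselves:

```python
import time

# Retry a boolean predicate a few times before giving up; returns True
# on the first success, otherwise the result of one final attempt.
def eventually(predicate, attempts=3, delay=1):
    for _ in range(attempts):
        if predicate():
            return True
        time.sleep(delay)
    return predicate()
```

With it, `assert_index_scan` reduces to asserting `eventually(lambda: multiset(expected_items) == multiset(full_scan(table, IndexName=index_name, **kwargs)))` followed by one plain assert for a readable failure message.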
# Although quite silly, it is actually allowed to create an index which is
# identical to the base table.
def test_gsi_identical(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }],
AttributeDefinitions=[{ 'AttributeName': 'p', 'AttributeType': 'S' }],
GlobalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [{ 'AttributeName': 'p', 'KeyType': 'HASH' }],
'Projection': { 'ProjectionType': 'ALL' }
}
])
items = [{'p': random_string(), 'x': random_string()} for i in range(10)]
with table.batch_writer() as batch:
for item in items:
batch.put_item(item)
# Scanning the entire table directly or via the index yields the same
# results (in different order).
assert multiset(items) == multiset(full_scan(table))
assert_index_scan(table, 'hello', items)
# We can't scan a non-existent index
with pytest.raises(ClientError, match='ValidationException'):
full_scan(table, IndexName='wrong')
table.delete()
# One of the simplest forms of a non-trivial GSI: The base table has a hash
# and sort key, and the index reverses those roles. Other attributes are just
# copied.
@pytest.fixture(scope="session")
def test_table_gsi_1(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'c', 'KeyType': 'RANGE' }
],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'c', 'AttributeType': 'S' },
],
GlobalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [
{ 'AttributeName': 'c', 'KeyType': 'HASH' },
{ 'AttributeName': 'p', 'KeyType': 'RANGE' },
],
'Projection': { 'ProjectionType': 'ALL' }
}
],
)
yield table
table.delete()
def test_gsi_simple(test_table_gsi_1):
items = [{'p': random_string(), 'c': random_string(), 'x': random_string()} for i in range(10)]
with test_table_gsi_1.batch_writer() as batch:
for item in items:
batch.put_item(item)
c = items[0]['c']
# The index allows a query on just a specific sort key, which isn't
# allowed on the base table.
with pytest.raises(ClientError, match='ValidationException'):
full_query(test_table_gsi_1, KeyConditions={'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}})
expected_items = [x for x in items if x['c'] == c]
assert_index_query(test_table_gsi_1, 'hello', expected_items,
KeyConditions={'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}})
# Scanning the entire table directly or via the index yields the same
# results (in different order).
assert_index_scan(test_table_gsi_1, 'hello', full_scan(test_table_gsi_1))
def test_gsi_same_key(test_table_gsi_1):
c = random_string()
# All these items have the same sort key 'c' but different hash key 'p'
items = [{'p': random_string(), 'c': c, 'x': random_string()} for i in range(10)]
with test_table_gsi_1.batch_writer() as batch:
for item in items:
batch.put_item(item)
assert_index_query(test_table_gsi_1, 'hello', items,
KeyConditions={'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}})
# Check we get an appropriate error when trying to read a non-existing index
# of an existing table. Although the documentation specifies that a
# ResourceNotFoundException should be returned if "The operation tried to
# access a nonexistent table or index", in fact in the specific case that
# the table does exist but an index does not - we get a ValidationException.
def test_gsi_missing_index(test_table_gsi_1):
with pytest.raises(ClientError, match='ValidationException.*wrong_name'):
full_query(test_table_gsi_1, IndexName='wrong_name',
KeyConditions={'x': {'AttributeValueList': [1], 'ComparisonOperator': 'EQ'}})
with pytest.raises(ClientError, match='ValidationException.*wrong_name'):
full_scan(test_table_gsi_1, IndexName='wrong_name')
# Nevertheless, if the table itself does not exist, a query should return
# a ResourceNotFoundException, not ValidationException:
def test_gsi_missing_table(dynamodb):
with pytest.raises(ClientError, match='ResourceNotFoundException'):
dynamodb.meta.client.query(TableName='nonexistent_table', IndexName='any_name', KeyConditions={'x': {'AttributeValueList': [1], 'ComparisonOperator': 'EQ'}})
with pytest.raises(ClientError, match='ResourceNotFoundException'):
dynamodb.meta.client.scan(TableName='nonexistent_table', IndexName='any_name')
# Verify that strongly-consistent reads on GSI are *not* allowed.
@pytest.mark.xfail(reason="GSI strong consistency not checked")
def test_gsi_strong_consistency(test_table_gsi_1):
with pytest.raises(ClientError, match='ValidationException.*Consistent'):
full_query(test_table_gsi_1, KeyConditions={'c': {'AttributeValueList': ['hi'], 'ComparisonOperator': 'EQ'}}, IndexName='hello', ConsistentRead=True)
with pytest.raises(ClientError, match='ValidationException.*Consistent'):
full_scan(test_table_gsi_1, IndexName='hello', ConsistentRead=True)
# Verify that a GSI is correctly listed in describe_table
@pytest.mark.xfail(reason="DescribeTable provides index names only, no size or item count")
def test_gsi_describe(test_table_gsi_1):
desc = test_table_gsi_1.meta.client.describe_table(TableName=test_table_gsi_1.name)
assert 'Table' in desc
assert 'GlobalSecondaryIndexes' in desc['Table']
gsis = desc['Table']['GlobalSecondaryIndexes']
assert len(gsis) == 1
gsi = gsis[0]
assert gsi['IndexName'] == 'hello'
assert 'IndexSizeBytes' in gsi # actual size depends on content
assert 'ItemCount' in gsi
assert gsi['Projection'] == {'ProjectionType': 'ALL'}
assert gsi['IndexStatus'] == 'ACTIVE'
assert gsi['KeySchema'] == [{'KeyType': 'HASH', 'AttributeName': 'c'},
{'KeyType': 'RANGE', 'AttributeName': 'p'}]
# TODO: check also ProvisionedThroughput, IndexArn
# When a GSI's key includes an attribute not in the base table's key, we
# need to remember to add its type to AttributeDefinitions.
def test_gsi_missing_attribute_definition(dynamodb):
with pytest.raises(ClientError, match='ValidationException.*AttributeDefinitions'):
create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
AttributeDefinitions=[ { 'AttributeName': 'p', 'AttributeType': 'S' } ],
GlobalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [ { 'AttributeName': 'c', 'KeyType': 'HASH' } ],
'Projection': { 'ProjectionType': 'ALL' }
}
])
# test_table_gsi_1_hash_only is a variant of test_table_gsi_1: It's another
# case where the index doesn't involve non-key attributes. Again the base
# table has a hash and sort key, but in this case the index has *only* a
# hash key (which is the base's hash key). In the materialized-view-based
# implementation, we need to remember the other part of the base key as a
# clustering key.
@pytest.fixture(scope="session")
def test_table_gsi_1_hash_only(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'c', 'KeyType': 'RANGE' }
],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'c', 'AttributeType': 'S' },
],
GlobalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [
{ 'AttributeName': 'c', 'KeyType': 'HASH' },
],
'Projection': { 'ProjectionType': 'ALL' }
}
],
)
yield table
table.delete()
def test_gsi_key_not_in_index(test_table_gsi_1_hash_only):
# Test with items with different 'c' values:
items = [{'p': random_string(), 'c': random_string(), 'x': random_string()} for i in range(10)]
with test_table_gsi_1_hash_only.batch_writer() as batch:
for item in items:
batch.put_item(item)
c = items[0]['c']
expected_items = [x for x in items if x['c'] == c]
assert_index_query(test_table_gsi_1_hash_only, 'hello', expected_items,
KeyConditions={'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}})
# Test items with the same sort key 'c' but different hash key 'p'
c = random_string()
items = [{'p': random_string(), 'c': c, 'x': random_string()} for i in range(10)]
with test_table_gsi_1_hash_only.batch_writer() as batch:
for item in items:
batch.put_item(item)
assert_index_query(test_table_gsi_1_hash_only, 'hello', items,
KeyConditions={'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}})
# Scanning the entire table directly or via the index yields the same
# results (in different order).
assert_index_scan(test_table_gsi_1_hash_only, 'hello', full_scan(test_table_gsi_1_hash_only))
# A second scenario of GSI. Base table has just hash key, Index has a
# different hash key - one of the non-key attributes from the base table.
@pytest.fixture(scope="session")
def test_table_gsi_2(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'x', 'AttributeType': 'S' },
],
GlobalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [
{ 'AttributeName': 'x', 'KeyType': 'HASH' },
],
'Projection': { 'ProjectionType': 'ALL' }
}
])
yield table
table.delete()
def test_gsi_2(test_table_gsi_2):
items1 = [{'p': random_string(), 'x': random_string()} for i in range(10)]
x1 = items1[0]['x']
x2 = random_string()
items2 = [{'p': random_string(), 'x': x2} for i in range(10)]
items = items1 + items2
with test_table_gsi_2.batch_writer() as batch:
for item in items:
batch.put_item(item)
expected_items = [i for i in items if i['x'] == x1]
assert_index_query(test_table_gsi_2, 'hello', expected_items,
KeyConditions={'x': {'AttributeValueList': [x1], 'ComparisonOperator': 'EQ'}})
expected_items = [i for i in items if i['x'] == x2]
assert_index_query(test_table_gsi_2, 'hello', expected_items,
KeyConditions={'x': {'AttributeValueList': [x2], 'ComparisonOperator': 'EQ'}})
# Test that when a table has a GSI, if the indexed attribute is missing, the
# item is added to the base table but not the index.
def test_gsi_missing_attribute(test_table_gsi_2):
p1 = random_string()
x1 = random_string()
test_table_gsi_2.put_item(Item={'p': p1, 'x': x1})
p2 = random_string()
test_table_gsi_2.put_item(Item={'p': p2})
# Both items are now in the base table:
assert test_table_gsi_2.get_item(Key={'p': p1})['Item'] == {'p': p1, 'x': x1}
assert test_table_gsi_2.get_item(Key={'p': p2})['Item'] == {'p': p2}
# But only the first item is in the index: It can be found using a
# Query, and a scan of the index won't find it (but a scan on the base
# will).
assert_index_query(test_table_gsi_2, 'hello', [{'p': p1, 'x': x1}],
KeyConditions={'x': {'AttributeValueList': [x1], 'ComparisonOperator': 'EQ'}})
assert any([i['p'] == p1 for i in full_scan(test_table_gsi_2)])
# Note: with eventually consistent read, we can't really be sure that
# an item will "never" appear in the index. We do this test last,
# so if we had a bug and such item did appear, hopefully we had enough
# time for the bug to become visible. At least sometimes.
assert not any([i['p'] == p2 for i in full_scan(test_table_gsi_2, IndexName='hello')])
# Test that when a table has a GSI, if the indexed attribute has the wrong
# type, the update operation is rejected, and the item is added to neither
# the base table nor the index. This is different from the case of a
# *missing* attribute, where the item is added to the base table but not
# the index.
# The following three tests test_gsi_wrong_type_attribute_{put,update,batch}
# test updates using PutItem, UpdateItem, and BatchWriteItem respectively.
def test_gsi_wrong_type_attribute_put(test_table_gsi_2):
# PutItem with wrong type for 'x' is rejected, item isn't created even
# in the base table.
p = random_string()
with pytest.raises(ClientError, match='ValidationException.*mismatch'):
test_table_gsi_2.put_item(Item={'p': p, 'x': 3})
assert not 'Item' in test_table_gsi_2.get_item(Key={'p': p}, ConsistentRead=True)
def test_gsi_wrong_type_attribute_update(test_table_gsi_2):
# An UpdateItem with wrong type for 'x' is also rejected, but naturally
# if the item already existed, it remains as it was.
p = random_string()
x = random_string()
test_table_gsi_2.put_item(Item={'p': p, 'x': x})
with pytest.raises(ClientError, match='ValidationException.*mismatch'):
test_table_gsi_2.update_item(Key={'p': p}, AttributeUpdates={'x': {'Value': 3, 'Action': 'PUT'}})
assert test_table_gsi_2.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'x': x}
def test_gsi_wrong_type_attribute_batch(test_table_gsi_2):
# In a BatchWriteItem, if any update is forbidden, the entire batch is
# rejected, and none of the updates happen at all.
p1 = random_string()
p2 = random_string()
p3 = random_string()
items = [{'p': p1, 'x': random_string()},
{'p': p2, 'x': 3},
{'p': p3, 'x': random_string()}]
with pytest.raises(ClientError, match='ValidationException.*mismatch'):
with test_table_gsi_2.batch_writer() as batch:
for item in items:
batch.put_item(item)
for p in [p1, p2, p3]:
assert not 'Item' in test_table_gsi_2.get_item(Key={'p': p}, ConsistentRead=True)
# A third scenario of GSI. Index has a hash key and a sort key, both are
# non-key attributes from the base table. This scenario may be very
# difficult to implement in Alternator because Scylla's materialized-views
# implementation only allows one new key column in the view, and here
# we need two (which, also, aren't actual columns, but map items).
@pytest.fixture(scope="session")
def test_table_gsi_3(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'a', 'AttributeType': 'S' },
{ 'AttributeName': 'b', 'AttributeType': 'S' }
],
GlobalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [
{ 'AttributeName': 'a', 'KeyType': 'HASH' },
{ 'AttributeName': 'b', 'KeyType': 'RANGE' }
],
'Projection': { 'ProjectionType': 'ALL' }
}
])
yield table
table.delete()
def test_gsi_3(test_table_gsi_3):
items = [{'p': random_string(), 'a': random_string(), 'b': random_string()} for i in range(10)]
with test_table_gsi_3.batch_writer() as batch:
for item in items:
batch.put_item(item)
assert_index_query(test_table_gsi_3, 'hello', [items[3]],
KeyConditions={'a': {'AttributeValueList': [items[3]['a']], 'ComparisonOperator': 'EQ'},
'b': {'AttributeValueList': [items[3]['b']], 'ComparisonOperator': 'EQ'}})
def test_gsi_update_second_regular_base_column(test_table_gsi_3):
items = [{'p': random_string(), 'a': random_string(), 'b': random_string(), 'd': random_string()} for i in range(10)]
with test_table_gsi_3.batch_writer() as batch:
for item in items:
batch.put_item(item)
items[3]['b'] = 'updated'
test_table_gsi_3.update_item(Key={'p': items[3]['p']}, AttributeUpdates={'b': {'Value': 'updated', 'Action': 'PUT'}})
assert_index_query(test_table_gsi_3, 'hello', [items[3]],
KeyConditions={'a': {'AttributeValueList': [items[3]['a']], 'ComparisonOperator': 'EQ'},
'b': {'AttributeValueList': [items[3]['b']], 'ComparisonOperator': 'EQ'}})
# Test that when a table has a GSI, if the indexed attribute is missing, the
# item is added to the base table but not the index.
# This is the same feature we already tested in test_gsi_missing_attribute()
# above, but on a different table: In that test we used test_table_gsi_2,
# with one indexed attribute, and in this test we use test_table_gsi_3 which
# has two base regular attributes in the view key, and more possibilities
# of which value might be missing. Reproduces issue #6008.
def test_gsi_missing_attribute_3(test_table_gsi_3):
p = random_string()
a = random_string()
b = random_string()
# First, add an item with a missing "a" value. It should appear in the
# base table, but not in the index:
test_table_gsi_3.put_item(Item={'p': p, 'b': b})
assert test_table_gsi_3.get_item(Key={'p': p})['Item'] == {'p': p, 'b': b}
# Note: with eventually consistent read, we can't really be sure that
# an item will "never" appear in the index. We hope that if a bug exists
# and such an item did appear, sometimes the delay here will be enough
# for the unexpected item to become visible.
assert not any([i['p'] == p for i in full_scan(test_table_gsi_3, IndexName='hello')])
# Same thing for an item with a missing "b" value:
test_table_gsi_3.put_item(Item={'p': p, 'a': a})
assert test_table_gsi_3.get_item(Key={'p': p})['Item'] == {'p': p, 'a': a}
assert not any([i['p'] == p for i in full_scan(test_table_gsi_3, IndexName='hello')])
# And for an item missing both:
test_table_gsi_3.put_item(Item={'p': p})
assert test_table_gsi_3.get_item(Key={'p': p})['Item'] == {'p': p}
assert not any([i['p'] == p for i in full_scan(test_table_gsi_3, IndexName='hello')])
# A fourth scenario of GSI. Two GSIs on a single base table.
@pytest.fixture(scope="session")
def test_table_gsi_4(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'a', 'AttributeType': 'S' },
{ 'AttributeName': 'b', 'AttributeType': 'S' }
],
GlobalSecondaryIndexes=[
{ 'IndexName': 'hello_a',
'KeySchema': [
{ 'AttributeName': 'a', 'KeyType': 'HASH' },
],
'Projection': { 'ProjectionType': 'ALL' }
},
{ 'IndexName': 'hello_b',
'KeySchema': [
{ 'AttributeName': 'b', 'KeyType': 'HASH' },
],
'Projection': { 'ProjectionType': 'ALL' }
}
])
yield table
table.delete()
# Test that a base table with two GSIs updates both as expected.
def test_gsi_4(test_table_gsi_4):
items = [{'p': random_string(), 'a': random_string(), 'b': random_string()} for i in range(10)]
with test_table_gsi_4.batch_writer() as batch:
for item in items:
batch.put_item(item)
assert_index_query(test_table_gsi_4, 'hello_a', [items[3]],
KeyConditions={'a': {'AttributeValueList': [items[3]['a']], 'ComparisonOperator': 'EQ'}})
assert_index_query(test_table_gsi_4, 'hello_b', [items[3]],
KeyConditions={'b': {'AttributeValueList': [items[3]['b']], 'ComparisonOperator': 'EQ'}})
# Verify that describe_table lists the two GSIs.
def test_gsi_4_describe(test_table_gsi_4):
desc = test_table_gsi_4.meta.client.describe_table(TableName=test_table_gsi_4.name)
assert 'Table' in desc
assert 'GlobalSecondaryIndexes' in desc['Table']
gsis = desc['Table']['GlobalSecondaryIndexes']
assert len(gsis) == 2
assert multiset([g['IndexName'] for g in gsis]) == multiset(['hello_a', 'hello_b'])
# A scenario for GSI in which the base table has both a hash and a sort key
@pytest.fixture(scope="session")
def test_table_gsi_5(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' } ],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'c', 'AttributeType': 'S' },
{ 'AttributeName': 'x', 'AttributeType': 'S' },
],
GlobalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [
{ 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'x', 'KeyType': 'RANGE' },
],
'Projection': { 'ProjectionType': 'ALL' }
}
])
yield table
table.delete()
def test_gsi_5(test_table_gsi_5):
items1 = [{'p': random_string(), 'c': random_string(), 'x': random_string()} for i in range(10)]
p1, x1 = items1[0]['p'], items1[0]['x']
p2, x2 = random_string(), random_string()
items2 = [{'p': p2, 'c': random_string(), 'x': x2} for i in range(10)]
items = items1 + items2
with test_table_gsi_5.batch_writer() as batch:
for item in items:
batch.put_item(item)
expected_items = [i for i in items if i['p'] == p1 and i['x'] == x1]
assert_index_query(test_table_gsi_5, 'hello', expected_items,
KeyConditions={'p': {'AttributeValueList': [p1], 'ComparisonOperator': 'EQ'},
'x': {'AttributeValueList': [x1], 'ComparisonOperator': 'EQ'}})
expected_items = [i for i in items if i['p'] == p2 and i['x'] == x2]
assert_index_query(test_table_gsi_5, 'hello', expected_items,
KeyConditions={'p': {'AttributeValueList': [p2], 'ComparisonOperator': 'EQ'},
'x': {'AttributeValueList': [x2], 'ComparisonOperator': 'EQ'}})
# Verify that DescribeTable correctly returns the schema of both base-table
# and secondary indexes. KeySchema is given for each of the base table and
# indexes, and AttributeDefinitions is merged for all of them together.
def test_gsi_5_describe_table_schema(test_table_gsi_5):
got = test_table_gsi_5.meta.client.describe_table(TableName=test_table_gsi_5.name)['Table']
# Copied from test_table_gsi_5 fixture
expected_base_keyschema = [
{ 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'c', 'KeyType': 'RANGE' } ]
expected_gsi_keyschema = [
{ 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'x', 'KeyType': 'RANGE' } ]
expected_all_attribute_definitions = [
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'c', 'AttributeType': 'S' },
{ 'AttributeName': 'x', 'AttributeType': 'S' } ]
assert got['KeySchema'] == expected_base_keyschema
gsis = got['GlobalSecondaryIndexes']
assert len(gsis) == 1
assert gsis[0]['KeySchema'] == expected_gsi_keyschema
# The list of attribute definitions may be arbitrarily reordered
assert multiset(got['AttributeDefinitions']) == multiset(expected_all_attribute_definitions)
# Similar DescribeTable schema test for test_table_gsi_2. The peculiarity
# in that table is that the base table has only a hash key p, and the
# index has only a hash key x. Now, while internally Scylla needs to add "p" as a
# clustering key in the materialized view (in Scylla the view key always
# contains the base key), when describing the table, "p" shouldn't be
# returned as a range key, because the user didn't ask for it.
# This test reproduces issue #5320.
@pytest.mark.xfail(reason="GSI DescribeTable spurious range key (#5320)")
def test_gsi_2_describe_table_schema(test_table_gsi_2):
got = test_table_gsi_2.meta.client.describe_table(TableName=test_table_gsi_2.name)['Table']
# Copied from test_table_gsi_2 fixture
expected_base_keyschema = [ { 'AttributeName': 'p', 'KeyType': 'HASH' } ]
expected_gsi_keyschema = [ { 'AttributeName': 'x', 'KeyType': 'HASH' } ]
expected_all_attribute_definitions = [
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'x', 'AttributeType': 'S' } ]
assert got['KeySchema'] == expected_base_keyschema
gsis = got['GlobalSecondaryIndexes']
assert len(gsis) == 1
assert gsis[0]['KeySchema'] == expected_gsi_keyschema
# The list of attribute definitions may be arbitrarily reordered
assert multiset(got['AttributeDefinitions']) == multiset(expected_all_attribute_definitions)
# All tests above involved "ProjectionType: ALL". This test checks how
# "ProjectionType:: KEYS_ONLY" works. We note that it projects both
# the index's key, *and* the base table's key. So items which had different
# base-table keys cannot suddenly become the same item in the index.
@pytest.mark.xfail(reason="GSI not supported")
def test_gsi_projection_keys_only(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'x', 'AttributeType': 'S' },
],
GlobalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [
{ 'AttributeName': 'x', 'KeyType': 'HASH' },
],
'Projection': { 'ProjectionType': 'KEYS_ONLY' }
}
])
items = [{'p': random_string(), 'x': random_string(), 'y': random_string()} for i in range(10)]
with table.batch_writer() as batch:
for item in items:
batch.put_item(item)
wanted = ['p', 'x']
expected_items = [{k: x[k] for k in wanted if k in x} for x in items]
assert_index_scan(table, 'hello', expected_items)
table.delete()
# Test for "ProjectionType:: INCLUDE". The secondary table includes the
# its own and the base's keys (as in KEYS_ONLY) plus the extra keys given
# in NonKeyAttributes.
@pytest.mark.xfail(reason="GSI not supported")
def test_gsi_projection_include(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'x', 'AttributeType': 'S' },
],
GlobalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [
{ 'AttributeName': 'x', 'KeyType': 'HASH' },
],
'Projection': { 'ProjectionType': 'INCLUDE',
'NonKeyAttributes': ['a', 'b'] }
}
])
# Some items have the projected attributes a,b and some don't:
items = [{'p': random_string(), 'x': random_string(), 'a': random_string(), 'b': random_string(), 'y': random_string()} for i in range(10)]
items = items + [{'p': random_string(), 'x': random_string(), 'y': random_string()} for i in range(10)]
with table.batch_writer() as batch:
for item in items:
batch.put_item(item)
wanted = ['p', 'x', 'a', 'b']
expected_items = [{k: x[k] for k in wanted if k in x} for x in items]
assert_index_scan(table, 'hello', expected_items)
table.delete()
# DynamoDB's documentation says the "Projection" argument of GlobalSecondaryIndexes is
# mandatory, and indeed Boto3 enforces that it must be passed. The
# documentation then goes on to claim that the "ProjectionType" member of
# "Projection" is optional - and Boto3 allows it to be missing. But in
# fact, it is not allowed to be missing: DynamoDB complains: "Unknown
# ProjectionType: null".
@pytest.mark.xfail(reason="GSI not supported")
def test_gsi_missing_projection_type(dynamodb):
with pytest.raises(ClientError, match='ValidationException.*ProjectionType'):
create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }],
AttributeDefinitions=[{ 'AttributeName': 'p', 'AttributeType': 'S' }],
GlobalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [{ 'AttributeName': 'p', 'KeyType': 'HASH' }],
'Projection': {}
}
])
# update_table() for creating a GSI is an asynchronous operation.
# The table's TableStatus changes from ACTIVE to UPDATING for a short while
# and then goes back to ACTIVE, but the new GSI's IndexStatus appears as
# CREATING, until eventually (after a *long* time...) it becomes ACTIVE.
# During the CREATING phase, at some point the Backfilling attribute also
# appears, until it eventually disappears. We need to wait until all three
# markers indicate completion.
# Unfortunately, while boto3 has a client.get_waiter('table_exists') to
# wait for a table to exist, there is no such function to wait for an
# index to come up, so we need to code it ourselves.
def wait_for_gsi(table, gsi_name):
start_time = time.time()
# Surprisingly, even for tiny tables this can take a very long time
# on DynamoDB - often many minutes!
for i in range(300):
time.sleep(1)
desc = table.meta.client.describe_table(TableName=table.name)
table_status = desc['Table']['TableStatus']
if table_status != 'ACTIVE':
print('%d Table status still %s' % (i, table_status))
continue
index_desc = [x for x in desc['Table']['GlobalSecondaryIndexes'] if x['IndexName'] == gsi_name]
assert len(index_desc) == 1
index_status = index_desc[0]['IndexStatus']
if index_status != 'ACTIVE':
print('%d Index status still %s' % (i, index_status))
continue
# When the index is ACTIVE, this must be after backfilling completed
assert not 'Backfilling' in index_desc[0]
print('wait_for_gsi took %d seconds' % (time.time() - start_time))
return
raise AssertionError("wait_for_gsi did not complete")
# Similarly to how wait_for_gsi() waits for a GSI to finish adding,
# this function waits for a GSI to be finally deleted.
def wait_for_gsi_gone(table, gsi_name):
start_time = time.time()
for i in range(300):
time.sleep(1)
desc = table.meta.client.describe_table(TableName=table.name)
table_status = desc['Table']['TableStatus']
if table_status != 'ACTIVE':
print('%d Table status still %s' % (i, table_status))
continue
if 'GlobalSecondaryIndexes' in desc['Table']:
index_desc = [x for x in desc['Table']['GlobalSecondaryIndexes'] if x['IndexName'] == gsi_name]
if len(index_desc) != 0:
index_status = index_desc[0]['IndexStatus']
print('%d Index status still %s' % (i, index_status))
continue
print('wait_for_gsi_gone took %d seconds' % (time.time() - start_time))
return
raise AssertionError("wait_for_gsi_gone did not complete")
# All tests above involved creating a new table with a GSI up-front. This
# test will test creating a base table *without* a GSI, putting data in
# it, and then adding a GSI with the UpdateTable operation. This starts
# a backfilling stage - where data is copied to the index - and when this
# stage is done, the index is usable. Items whose indexed column contains
# the wrong type are silently ignored and not added to the index (it would
# not have been possible to add such items if the GSI was already configured
# when they were added).
@pytest.mark.xfail(reason="GSI not supported")
def test_gsi_backfill(dynamodb):
# First create, and fill, a table without GSI. The items in items1
# will have the appropriate string type for 'x' and will later get
# indexed. Items in items2 have no value for 'x', and in items3 'x' is
# not a string; so the items in items2 and items3 will be missing
# from the index we'll create later.
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
AttributeDefinitions=[ { 'AttributeName': 'p', 'AttributeType': 'S' } ])
items1 = [{'p': random_string(), 'x': random_string(), 'y': random_string()} for i in range(10)]
items2 = [{'p': random_string(), 'y': random_string()} for i in range(10)]
items3 = [{'p': random_string(), 'x': i} for i in range(10)]
items = items1 + items2 + items3
with table.batch_writer() as batch:
for item in items:
batch.put_item(item)
assert multiset(items) == multiset(full_scan(table))
# Now use UpdateTable to create the GSI
dynamodb.meta.client.update_table(TableName=table.name,
AttributeDefinitions=[{ 'AttributeName': 'x', 'AttributeType': 'S' }],
GlobalSecondaryIndexUpdates=[ { 'Create':
{ 'IndexName': 'hello',
'KeySchema': [{ 'AttributeName': 'x', 'KeyType': 'HASH' }],
'Projection': { 'ProjectionType': 'ALL' }
}}])
# update_table is an asynchronous operation. We need to wait until it
# finishes and the table is backfilled.
wait_for_gsi(table, 'hello')
# As explained above, only items in items1 got copied to the gsi,
# and Scan on them works as expected.
# Note that we don't need to retry the reads here (i.e., use the
# assert_index_scan() or assert_index_query() functions) because after
# we waited for backfilling to complete, we know all the pre-existing
# data is already in the index.
assert multiset(items1) == multiset(full_scan(table, IndexName='hello'))
# We can also use Query on the new GSI, to search on the attribute x:
assert multiset([items1[3]]) == multiset(full_query(table,
IndexName='hello',
KeyConditions={'x': {'AttributeValueList': [items1[3]['x']], 'ComparisonOperator': 'EQ'}}))
# Let's also test that we cannot add another index with a name
# that already exists
with pytest.raises(ClientError, match='ValidationException.*already exists'):
dynamodb.meta.client.update_table(TableName=table.name,
AttributeDefinitions=[{ 'AttributeName': 'y', 'AttributeType': 'S' }],
GlobalSecondaryIndexUpdates=[ { 'Create':
{ 'IndexName': 'hello',
'KeySchema': [{ 'AttributeName': 'y', 'KeyType': 'HASH' }],
'Projection': { 'ProjectionType': 'ALL' }
}}])
table.delete()
# Test deleting an existing GSI using UpdateTable
@pytest.mark.xfail(reason="GSI not supported")
def test_gsi_delete(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'x', 'AttributeType': 'S' },
],
GlobalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [
{ 'AttributeName': 'x', 'KeyType': 'HASH' },
],
'Projection': { 'ProjectionType': 'ALL' }
}
])
items = [{'p': random_string(), 'x': random_string()} for i in range(10)]
with table.batch_writer() as batch:
for item in items:
batch.put_item(item)
# So far, we have the index for "x" and can use it:
assert_index_query(table, 'hello', [items[3]],
KeyConditions={'x': {'AttributeValueList': [items[3]['x']], 'ComparisonOperator': 'EQ'}})
# Now use UpdateTable to delete the GSI for "x"
dynamodb.meta.client.update_table(TableName=table.name,
GlobalSecondaryIndexUpdates=[{ 'Delete':
{ 'IndexName': 'hello' } }])
# update_table is an asynchronous operation. We need to wait until it
# finishes and the GSI is removed.
wait_for_gsi_gone(table, 'hello')
# Now index is gone. We cannot query using it.
with pytest.raises(ClientError, match='ValidationException.*hello'):
full_query(table, IndexName='hello',
KeyConditions={'x': {'AttributeValueList': [items[3]['x']], 'ComparisonOperator': 'EQ'}})
table.delete()
# Utility function for creating a new table with a GSI of the given
# name and, if creation was successful, deleting it. Useful for testing
# which GSI names work.
def create_gsi(dynamodb, index_name):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }],
AttributeDefinitions=[{ 'AttributeName': 'p', 'AttributeType': 'S' }],
GlobalSecondaryIndexes=[
{ 'IndexName': index_name,
'KeySchema': [{ 'AttributeName': 'p', 'KeyType': 'HASH' }],
'Projection': { 'ProjectionType': 'ALL' }
}
])
# Verify that the GSI wasn't just ignored, as Scylla originally did ;-)
assert 'GlobalSecondaryIndexes' in table.meta.client.describe_table(TableName=table.name)['Table']
table.delete()
# Like table names (tested in test_table.py), index names must also
# be 3-255 characters and match the regex [a-zA-Z0-9._-]+. This test
# is similar to test_create_table_unsupported_names(), but for GSI names.
# Note that Scylla is actually more limited in the length of the index
# names, because both table name and index name, together, have to fit in
# 221 characters. But we don't verify here this specific limitation.
def test_gsi_unsupported_names(dynamodb):
# Unfortunately, the boto library tests for names shorter than the
# minimum length (3 characters) immediately, and failure results in
# ParamValidationError. But the other invalid names are passed to
# DynamoDB, which returns an HTTP response code, which results in a
# ClientError exception.
with pytest.raises(ParamValidationError):
create_gsi(dynamodb, 'n')
with pytest.raises(ParamValidationError):
create_gsi(dynamodb, 'nn')
with pytest.raises(ClientError, match='ValidationException.*nnnnn'):
create_gsi(dynamodb, 'n' * 256)
with pytest.raises(ClientError, match='ValidationException.*nyh'):
create_gsi(dynamodb, 'nyh@test')
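The naming rule quoted above (3-255 characters matching [a-zA-Z0-9._-]+) can be sketched as a standalone validity check; `is_valid_index_name` is our own illustrative helper, not part of the test suite or of boto3:

```python
import re

# 3-255 characters from [a-zA-Z0-9._-], per the DynamoDB naming rule above.
NAME_RE = re.compile(r'[a-zA-Z0-9._-]{3,255}')

def is_valid_index_name(name):
    # fullmatch anchors the pattern to the whole string.
    return NAME_RE.fullmatch(name) is not None

# The names exercised in these tests behave accordingly:
assert not is_valid_index_name('nn')            # too short
assert not is_valid_index_name('n' * 256)       # too long
assert not is_valid_index_name('nyh@test')      # '@' is not allowed
assert is_valid_index_name('.alternator_test')  # fine by DynamoDB's rule
```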
# On the other hand, names following the above rules should be accepted. Even
# names which the Scylla rules forbid, such as a name starting with a dot.
def test_gsi_non_scylla_name(dynamodb):
create_gsi(dynamodb, '.alternator_test')
# Index names with 255 characters are allowed in Dynamo. In Scylla, the
# limit is different - the sum of both table and index length cannot
# exceed 211 characters. So we test a much shorter limit.
# (compare test_create_and_delete_table_very_long_name()).
def test_gsi_very_long_name(dynamodb):
#create_gsi(dynamodb, 'n' * 255) # works on DynamoDB, but not on Scylla
create_gsi(dynamodb, 'n' * 190)
# Verify that ListTables does not list materialized views used for indexes.
# This is hard to test, because we don't really know which table names
# should be listed beyond those we created, and don't want to assume that
# no other test runs in parallel with us. So the method we chose is to use a
# unique random name for an index, and check that no table contains this
# name. This assumes that materialized-view names are composed using the
# index's name (which is currently what we do).
@pytest.fixture(scope="session")
def test_table_gsi_random_name(dynamodb):
index_name = random_string()
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'c', 'KeyType': 'RANGE' }
],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'c', 'AttributeType': 'S' },
],
GlobalSecondaryIndexes=[
{ 'IndexName': index_name,
'KeySchema': [
{ 'AttributeName': 'c', 'KeyType': 'HASH' },
{ 'AttributeName': 'p', 'KeyType': 'RANGE' },
],
'Projection': { 'ProjectionType': 'ALL' }
}
],
)
yield [table, index_name]
table.delete()
def test_gsi_list_tables(dynamodb, test_table_gsi_random_name):
table, index_name = test_table_gsi_random_name
# Check that the random "index_name" isn't a substring of any table name:
tables = list_tables(dynamodb)
for name in tables:
assert index_name not in name
# But of course, the table's name should be in the list:
assert table.name in tables

# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Tests for the health check
import requests
# Test that a health check can be performed with a GET request
def test_health_works(dynamodb):
url = dynamodb.meta.client._endpoint.host
response = requests.get(url)
assert response.ok
assert response.content.decode('utf-8').strip() == 'healthy: {}'.format(url.replace('https://', '').replace('http://', ''))
# Test that a health check only works for the root URL ('/')
def test_health_only_works_for_root_path(dynamodb):
url = dynamodb.meta.client._endpoint.host
for suffix in ['/abc', '/-', '/index.htm', '/health']:
print(url + suffix)
response = requests.get(url + suffix, verify=False)
assert response.status_code in range(400, 405)

# Tests for the CRUD item operations: PutItem, GetItem, UpdateItem, DeleteItem
import pytest
from botocore.exceptions import ClientError
from decimal import Decimal
from util import random_string, random_bytes
# Basic test for creating a new item with a random name, and reading it back
# with strong consistency.
# Only the string type is used for keys and attributes. None of the various
# optional PutItem features (Expected, ReturnValues, ReturnConsumedCapacity,
# ReturnItemCollectionMetrics, ConditionalOperator, ConditionExpression,
# ExpressionAttributeNames, ExpressionAttributeValues) are used, and
# for GetItem strong consistency is requested as well as all attributes,
# but no other optional features (AttributesToGet, ReturnConsumedCapacity,
# ProjectionExpression, ExpressionAttributeNames)
def test_basic_string_put_and_get(test_table):
p = random_string()
c = random_string()
val = random_string()
val2 = random_string()
test_table.put_item(Item={'p': p, 'c': c, 'attribute': val, 'another': val2})
item = test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item']
assert item['p'] == p
assert item['c'] == c
assert item['attribute'] == val
assert item['another'] == val2
# Similar to test_basic_string_put_and_get, just uses UpdateItem instead of
# PutItem. Because the item does not yet exist, it should work the same.
def test_basic_string_update_and_get(test_table):
p = random_string()
c = random_string()
val = random_string()
val2 = random_string()
test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'attribute': {'Value': val, 'Action': 'PUT'}, 'another': {'Value': val2, 'Action': 'PUT'}})
item = test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item']
assert item['p'] == p
assert item['c'] == c
assert item['attribute'] == val
assert item['another'] == val2
# Test put_item and get_item of various types for the *attributes*,
# including both scalars as well as nested documents, lists and sets.
# The full list of types tested here:
# number, boolean, bytes, null, list, map, string set, number set,
# binary set.
# The keys are still strings.
# Note that only top-level attributes are written and read in this test -
# this test does not attempt to modify *nested* attributes.
# See https://boto3.amazonaws.com/v1/documentation/api/latest/reference/customizations/dynamodb.html
# on how to pass these various types to Boto3's put_item().
def test_put_and_get_attribute_types(test_table):
key = {'p': random_string(), 'c': random_string()}
test_items = [
Decimal("12.345"),
42,
True,
False,
b'xyz',
None,
['hello', 'world', 42],
{'hello': 'world', 'life': 42},
{'hello': {'test': 'hi', 'hello': True, 'list': [1, 2, 'hi']}},
set(['hello', 'world', 'hi']),
set([1, 42, Decimal("3.14")]),
set([b'xyz', b'hi']),
]
item = { str(i) : test_items[i] for i in range(len(test_items)) }
item.update(key)
test_table.put_item(Item=item)
got_item = test_table.get_item(Key=key, ConsistentRead=True)['Item']
assert item == got_item
# The test_empty_* tests below verify support for empty items, with no
# attributes except the key. This is a difficult case for Scylla, because
# for an empty row to exist, Scylla needs to add a "CQL row marker".
# There are several ways to create empty items - via PutItem, UpdateItem
# and deleting attributes from non-empty items, and we need to check them
# all, in several test_empty_* tests:
def test_empty_put(test_table):
p = random_string()
c = random_string()
test_table.put_item(Item={'p': p, 'c': c})
item = test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item']
assert item == {'p': p, 'c': c}
def test_empty_put_delete(test_table):
p = random_string()
c = random_string()
test_table.put_item(Item={'p': p, 'c': c, 'hello': 'world'})
test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'hello': {'Action': 'DELETE'}})
item = test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item']
assert item == {'p': p, 'c': c}
def test_empty_update(test_table):
p = random_string()
c = random_string()
test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={})
item = test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item']
assert item == {'p': p, 'c': c}
def test_empty_update_delete(test_table):
p = random_string()
c = random_string()
test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'hello': {'Value': 'world', 'Action': 'PUT'}})
test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'hello': {'Action': 'DELETE'}})
item = test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item']
assert item == {'p': p, 'c': c}
# Test error handling when UpdateItem is passed a bad "Action" field.
def test_update_bad_action(test_table):
p = random_string()
c = random_string()
val = random_string()
with pytest.raises(ClientError, match='ValidationException'):
test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'attribute': {'Value': val, 'Action': 'NONEXISTENT'}})
# A more elaborate UpdateItem test, updating different attributes at different
# times. Includes PUT and DELETE operations.
def test_basic_string_more_update(test_table):
p = random_string()
c = random_string()
val1 = random_string()
val2 = random_string()
val3 = random_string()
val4 = random_string()
test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'a3': {'Value': val1, 'Action': 'PUT'}})
test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'a1': {'Value': val1, 'Action': 'PUT'}})
test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'a2': {'Value': val2, 'Action': 'PUT'}})
test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'a1': {'Value': val3, 'Action': 'PUT'}})
test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'a3': {'Action': 'DELETE'}})
item = test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item']
assert item['p'] == p
assert item['c'] == c
assert item['a1'] == val3
assert item['a2'] == val2
assert 'a3' not in item
# Test that item operations on a non-existent table name fail with the correct
# error code.
def test_item_operations_nonexistent_table(dynamodb):
with pytest.raises(ClientError, match='ResourceNotFoundException'):
dynamodb.meta.client.put_item(TableName='non_existent_table',
Item={'a':{'S':'b'}})
# Fetching a non-existent item. According to the DynamoDB doc, "If there is no
# matching item, GetItem does not return any data and there will be no Item
# element in the response."
def test_get_item_missing_item(test_table):
p = random_string()
c = random_string()
assert not "Item" in test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)
# Test that if we have a table with string hash and sort keys, we can't read
# or write items with other key types to it.
def test_put_item_wrong_key_type(test_table):
b = random_bytes()
s = random_string()
n = Decimal("3.14")
# Should succeed (correct key types)
test_table.put_item(Item={'p': s, 'c': s})
assert test_table.get_item(Key={'p': s, 'c': s}, ConsistentRead=True)['Item'] == {'p': s, 'c': s}
# Should fail (incorrect hash key types)
with pytest.raises(ClientError, match='ValidationException'):
test_table.put_item(Item={'p': b, 'c': s})
with pytest.raises(ClientError, match='ValidationException'):
test_table.put_item(Item={'p': n, 'c': s})
# Should fail (incorrect sort key types)
with pytest.raises(ClientError, match='ValidationException'):
test_table.put_item(Item={'p': s, 'c': b})
with pytest.raises(ClientError, match='ValidationException'):
test_table.put_item(Item={'p': s, 'c': n})
# Should fail (missing hash key)
with pytest.raises(ClientError, match='ValidationException'):
test_table.put_item(Item={'c': s})
# Should fail (missing sort key)
with pytest.raises(ClientError, match='ValidationException'):
test_table.put_item(Item={'p': s})
def test_update_item_wrong_key_type(test_table, test_table_s):
b = random_bytes()
s = random_string()
n = Decimal("3.14")
# Should succeed (correct key types)
test_table.update_item(Key={'p': s, 'c': s}, AttributeUpdates={})
assert test_table.get_item(Key={'p': s, 'c': s}, ConsistentRead=True)['Item'] == {'p': s, 'c': s}
# Should fail (incorrect hash key types)
with pytest.raises(ClientError, match='ValidationException'):
test_table.update_item(Key={'p': b, 'c': s}, AttributeUpdates={})
with pytest.raises(ClientError, match='ValidationException'):
test_table.update_item(Key={'p': n, 'c': s}, AttributeUpdates={})
# Should fail (incorrect sort key types)
with pytest.raises(ClientError, match='ValidationException'):
test_table.update_item(Key={'p': s, 'c': b}, AttributeUpdates={})
with pytest.raises(ClientError, match='ValidationException'):
test_table.update_item(Key={'p': s, 'c': n}, AttributeUpdates={})
# Should fail (missing hash key)
with pytest.raises(ClientError, match='ValidationException'):
test_table.update_item(Key={'c': s}, AttributeUpdates={})
# Should fail (missing sort key)
with pytest.raises(ClientError, match='ValidationException'):
test_table.update_item(Key={'p': s}, AttributeUpdates={})
# Should fail (spurious key columns)
with pytest.raises(ClientError, match='ValidationException'):
test_table.update_item(Key={'p': s, 'c': s, 'spurious': s}, AttributeUpdates={})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': s, 'c': s}, AttributeUpdates={})
def test_get_item_wrong_key_type(test_table, test_table_s):
b = random_bytes()
s = random_string()
n = Decimal("3.14")
# Should succeed (correct key types) but have empty result
assert not "Item" in test_table.get_item(Key={'p': s, 'c': s}, ConsistentRead=True)
# Should fail (incorrect hash key types)
with pytest.raises(ClientError, match='ValidationException'):
test_table.get_item(Key={'p': b, 'c': s})
with pytest.raises(ClientError, match='ValidationException'):
test_table.get_item(Key={'p': n, 'c': s})
# Should fail (incorrect sort key types)
with pytest.raises(ClientError, match='ValidationException'):
test_table.get_item(Key={'p': s, 'c': b})
with pytest.raises(ClientError, match='ValidationException'):
test_table.get_item(Key={'p': s, 'c': n})
# Should fail (missing hash key)
with pytest.raises(ClientError, match='ValidationException'):
test_table.get_item(Key={'c': s})
# Should fail (missing sort key)
with pytest.raises(ClientError, match='ValidationException'):
test_table.get_item(Key={'p': s})
# Should fail (spurious key columns)
with pytest.raises(ClientError, match='ValidationException'):
test_table.get_item(Key={'p': s, 'c': s, 'spurious': s})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.get_item(Key={'p': s, 'c': s})
def test_delete_item_wrong_key_type(test_table, test_table_s):
b = random_bytes()
s = random_string()
n = Decimal("3.14")
# Should succeed (correct key types)
test_table.delete_item(Key={'p': s, 'c': s})
# Should fail (incorrect hash key types)
with pytest.raises(ClientError, match='ValidationException'):
test_table.delete_item(Key={'p': b, 'c': s})
with pytest.raises(ClientError, match='ValidationException'):
test_table.delete_item(Key={'p': n, 'c': s})
# Should fail (incorrect sort key types)
with pytest.raises(ClientError, match='ValidationException'):
test_table.delete_item(Key={'p': s, 'c': b})
with pytest.raises(ClientError, match='ValidationException'):
test_table.delete_item(Key={'p': s, 'c': n})
# Should fail (missing hash key)
with pytest.raises(ClientError, match='ValidationException'):
test_table.delete_item(Key={'c': s})
# Should fail (missing sort key)
with pytest.raises(ClientError, match='ValidationException'):
test_table.delete_item(Key={'p': s})
# Should fail (spurious key columns)
with pytest.raises(ClientError, match='ValidationException'):
test_table.delete_item(Key={'p': s, 'c': s, 'spurious': s})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.delete_item(Key={'p': s, 'c': s})
# Most of the tests here arbitrarily used a table with both hash and sort keys
# (both strings). Let's check that a table with *only* a hash key works ok
# too, for PutItem, GetItem, and UpdateItem.
def test_only_hash_key(test_table_s):
s = random_string()
test_table_s.put_item(Item={'p': s, 'hello': 'world'})
assert test_table_s.get_item(Key={'p': s}, ConsistentRead=True)['Item'] == {'p': s, 'hello': 'world'}
test_table_s.update_item(Key={'p': s}, AttributeUpdates={'hi': {'Value': 'there', 'Action': 'PUT'}})
assert test_table_s.get_item(Key={'p': s}, ConsistentRead=True)['Item'] == {'p': s, 'hello': 'world', 'hi': 'there'}
# Tests for item operations in tables with non-string hash or sort keys.
# These tests focus only on the type of the key - everything else is as
# simple as we can (string attributes, no special options for GetItem
# and PutItem). These tests also focus on individual items only, and
# not about the sort order of sort keys - this should be verified in
# test_query.py, for example.
def test_bytes_hash_key(test_table_b):
# Bytes values are passed using base64 encoding, whose padding has edge
# cases depending on len%3 and len%4. So let's try various lengths.
for length in range(10, 18):
p = random_bytes(length)
val = random_string()
test_table_b.put_item(Item={'p': p, 'attribute': val})
assert test_table_b.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'attribute': val}
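To see why lengths 10 through 17 cover base64's edge cases, here is a quick sketch of how the encoded length and padding vary with the input length modulo 3:

```python
import base64

# base64 encodes each 3-byte group as 4 characters, padding with '=' when
# the input length is not a multiple of 3; lengths 10..17 hit every case.
for n in range(10, 18):
    encoded = base64.b64encode(b'x' * n)
    assert len(encoded) == 4 * ((n + 2) // 3)      # always a multiple of 4
    assert encoded.count(b'=') == (3 - n % 3) % 3  # 0, 2 or 1 pad chars
```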
def test_bytes_sort_key(test_table_sb):
p = random_string()
c = random_bytes()
val = random_string()
test_table_sb.put_item(Item={'p': p, 'c': c, 'attribute': val})
assert test_table_sb.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item'] == {'p': p, 'c': c, 'attribute': val}
# Tests for using a large binary blob as hash key, sort key, or attribute.
# DynamoDB strictly limits the size of the binary hash key to 2048 bytes,
# and binary sort key to 1024 bytes, and refuses anything larger. The total
# size of an item is limited to 400KB, which also limits the size of the
# largest attributes. For more details on these limits, see
# https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Limits.html
# Alternator currently does *not* have these limitations, and can accept much
# larger keys and attributes, but what we do in the following tests is to verify
# that items up to DynamoDB's maximum sizes also work well in Alternator.
def test_large_blob_hash_key(test_table_b):
b = random_bytes(2048)
test_table_b.put_item(Item={'p': b})
assert test_table_b.get_item(Key={'p': b}, ConsistentRead=True)['Item'] == {'p': b}
def test_large_blob_sort_key(test_table_sb):
s = random_string()
b = random_bytes(1024)
test_table_sb.put_item(Item={'p': s, 'c': b})
assert test_table_sb.get_item(Key={'p': s, 'c': b}, ConsistentRead=True)['Item'] == {'p': s, 'c': b}
def test_large_blob_attribute(test_table):
p = random_string()
c = random_string()
b = random_bytes(409500) # a bit less than 400KB
test_table.put_item(Item={'p': p, 'c': c, 'attribute': b })
assert test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item'] == {'p': p, 'c': c, 'attribute': b}
# Checks that it is not allowed to use, in a single UpdateItem request, both
# old-style AttributeUpdates and new-style UpdateExpression.
def test_update_item_two_update_methods(test_table_s):
p = random_string()
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
AttributeUpdates={'a': {'Value': 3, 'Action': 'PUT'}},
UpdateExpression='SET b = :val1',
ExpressionAttributeValues={':val1': 4})
# Verify that having neither AttributeUpdates nor UpdateExpression is
# allowed, and results in creation of an empty item.
def test_update_item_no_update_method(test_table_s):
p = random_string()
assert not "Item" in test_table_s.get_item(Key={'p': p}, ConsistentRead=True)
test_table_s.update_item(Key={'p': p})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p}
# Test GetItem with the AttributesToGet parameter. Result should include the
# selected attributes only - if one wants the key attributes as well, one
# needs to select them explicitly. When no key attributes are selected,
# some items may have *none* of the selected attributes. Those items are
# returned too, as empty items - they are not outright missing.
def test_getitem_attributes_to_get(dynamodb, test_table):
p = random_string()
c = random_string()
item = {'p': p, 'c': c, 'a': 'hello', 'b': 'hi'}
test_table.put_item(Item=item)
for wanted in [ ['a'], # only non-key attribute
['c', 'a'], # a key attribute (sort key) and non-key
['p', 'c'], # entire key
['nonexistent'] # Our item doesn't have this
]:
got_item = test_table.get_item(Key={'p': p, 'c': c}, AttributesToGet=wanted, ConsistentRead=True)['Item']
expected_item = {k: item[k] for k in wanted if k in item}
assert expected_item == got_item
# Basic test for DeleteItem, with hash key only
def test_delete_item_hash(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p})
assert 'Item' in test_table_s.get_item(Key={'p': p}, ConsistentRead=True)
test_table_s.delete_item(Key={'p': p})
assert 'Item' not in test_table_s.get_item(Key={'p': p}, ConsistentRead=True)
# Basic test for DeleteItem, with hash and sort key
def test_delete_item_sort(test_table):
p = random_string()
c = random_string()
key = {'p': p, 'c': c}
test_table.put_item(Item=key)
assert 'Item' in test_table.get_item(Key=key, ConsistentRead=True)
test_table.delete_item(Key=key)
assert 'Item' not in test_table.get_item(Key=key, ConsistentRead=True)
# Test that PutItem completely replaces an existing item. It shouldn't merge
# it with a previously existing value, as UpdateItem does!
# We test for a table with just hash key, and for a table with both hash and
# sort keys.
def test_put_item_replace(test_table_s, test_table):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hi'}
test_table_s.put_item(Item={'p': p, 'b': 'hello'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 'hello'}
c = random_string()
test_table.put_item(Item={'p': p, 'c': c, 'a': 'hi'})
assert test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item'] == {'p': p, 'c': c, 'a': 'hi'}
test_table.put_item(Item={'p': p, 'c': c, 'b': 'hello'})
assert test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item'] == {'p': p, 'c': c, 'b': 'hello'}

alternator-test/test_lsi.py
# Tests of LSI (Local Secondary Indexes)
#
# Note that many of these tests are slower than usual, because many of them
# need to create new tables and/or new LSIs of different types, operations
# which are extremely slow in DynamoDB, often taking minutes (!).
import pytest
import time
from botocore.exceptions import ClientError, ParamValidationError
from util import create_test_table, random_string, full_scan, full_query, multiset, list_tables
# Currently, Alternator's LSIs only support eventually consistent reads, so tests
# that involve writing to a table and then expect to read something from it cannot
# be guaranteed to succeed without retrying the read. The following utility
# functions make it easy to write such tests.
def assert_index_query(table, index_name, expected_items, **kwargs):
for i in range(3):
if multiset(expected_items) == multiset(full_query(table, IndexName=index_name, **kwargs)):
return
print('assert_index_query retrying')
time.sleep(1)
assert multiset(expected_items) == multiset(full_query(table, IndexName=index_name, **kwargs))
def assert_index_scan(table, index_name, expected_items, **kwargs):
for i in range(3):
if multiset(expected_items) == multiset(full_scan(table, IndexName=index_name, **kwargs)):
return
print('assert_index_scan retrying')
time.sleep(1)
assert multiset(expected_items) == multiset(full_scan(table, IndexName=index_name, **kwargs))
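These helpers compare results through multiset(), imported from util. The real util.py is not shown here, so the following is only a plausible sketch of such a helper, under the assumption that items are flat dicts: it freezes each (unhashable) dict into a sorted tuple of pairs and counts occurrences, so order is ignored but duplicates matter.

```python
from collections import Counter

def multiset(items):
    # Freeze each dict into a hashable, order-independent form, then count.
    return Counter(tuple(sorted(item.items())) for item in items)

a = [{'p': 'x', 'c': '1'}, {'p': 'y', 'c': '2'}]
b = [{'c': '2', 'p': 'y'}, {'p': 'x', 'c': '1'}]  # same items, other order
assert multiset(a) == multiset(b)
assert multiset(a) != multiset(a + [a[0]])        # duplicates are counted
```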
# Although quite silly, it is actually allowed to create an index which is
# identical to the base table.
def test_lsi_identical(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' }],
AttributeDefinitions=[{ 'AttributeName': 'p', 'AttributeType': 'S' }, { 'AttributeName': 'c', 'AttributeType': 'S' }],
LocalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [{ 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' }],
'Projection': { 'ProjectionType': 'ALL' }
}
])
items = [{'p': random_string(), 'c': random_string()} for i in range(10)]
with table.batch_writer() as batch:
for item in items:
batch.put_item(item)
# Scanning the entire table directly or via the index yields the same
# results (in different order).
assert multiset(items) == multiset(full_scan(table))
assert_index_scan(table, 'hello', items)
# We can't scan a non-existent index
with pytest.raises(ClientError, match='ValidationException'):
full_scan(table, IndexName='wrong')
table.delete()
# Checks that providing a hash key different from the base table's is not
# allowed, and neither is providing duplicated keys or no sort key at all
def test_lsi_wrong(dynamodb):
with pytest.raises(ClientError, match='ValidationException.*'):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'a', 'AttributeType': 'S' },
{ 'AttributeName': 'b', 'AttributeType': 'S' }
],
LocalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [
{ 'AttributeName': 'b', 'KeyType': 'HASH' },
{ 'AttributeName': 'p', 'KeyType': 'RANGE' }
],
'Projection': { 'ProjectionType': 'ALL' }
}
])
table.delete()
with pytest.raises(ClientError, match='ValidationException.*'):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'a', 'AttributeType': 'S' },
{ 'AttributeName': 'b', 'AttributeType': 'S' }
],
LocalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [
{ 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'p', 'KeyType': 'RANGE' }
],
'Projection': { 'ProjectionType': 'ALL' }
}
])
table.delete()
with pytest.raises(ClientError, match='ValidationException.*'):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'a', 'AttributeType': 'S' },
{ 'AttributeName': 'b', 'AttributeType': 'S' }
],
LocalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [
{ 'AttributeName': 'p', 'KeyType': 'HASH' }
],
'Projection': { 'ProjectionType': 'ALL' }
}
])
table.delete()
# A simple scenario for LSI. The base table has hash and sort keys; the index
# keeps the base hash key and adds a different sort key - one of the non-key
# attributes from the base table.
@pytest.fixture(scope="session")
def test_table_lsi_1(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' } ],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'c', 'AttributeType': 'S' },
{ 'AttributeName': 'b', 'AttributeType': 'S' },
],
LocalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [
{ 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'b', 'KeyType': 'RANGE' }
],
'Projection': { 'ProjectionType': 'ALL' }
}
])
yield table
table.delete()
def test_lsi_1(test_table_lsi_1):
items1 = [{'p': random_string(), 'c': random_string(), 'b': random_string()} for i in range(10)]
p1, b1 = items1[0]['p'], items1[0]['b']
p2, b2 = random_string(), random_string()
items2 = [{'p': p2, 'c': p2, 'b': b2}]
items = items1 + items2
with test_table_lsi_1.batch_writer() as batch:
for item in items:
batch.put_item(item)
expected_items = [i for i in items if i['p'] == p1 and i['b'] == b1]
assert_index_query(test_table_lsi_1, 'hello', expected_items,
KeyConditions={'p': {'AttributeValueList': [p1], 'ComparisonOperator': 'EQ'},
'b': {'AttributeValueList': [b1], 'ComparisonOperator': 'EQ'}})
expected_items = [i for i in items if i['p'] == p2 and i['b'] == b2]
assert_index_query(test_table_lsi_1, 'hello', expected_items,
KeyConditions={'p': {'AttributeValueList': [p2], 'ComparisonOperator': 'EQ'},
'b': {'AttributeValueList': [b2], 'ComparisonOperator': 'EQ'}})
# A second LSI scenario. The base table has both hash and sort keys, and a
# local index is created on each non-key attribute
@pytest.fixture(scope="session")
def test_table_lsi_4(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' } ],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'c', 'AttributeType': 'S' },
{ 'AttributeName': 'x1', 'AttributeType': 'S' },
{ 'AttributeName': 'x2', 'AttributeType': 'S' },
{ 'AttributeName': 'x3', 'AttributeType': 'S' },
{ 'AttributeName': 'x4', 'AttributeType': 'S' },
],
LocalSecondaryIndexes=[
{ 'IndexName': 'hello_' + column,
'KeySchema': [
{ 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': column, 'KeyType': 'RANGE' }
],
'Projection': { 'ProjectionType': 'ALL' }
} for column in ['x1','x2','x3','x4']
])
yield table
table.delete()
def test_lsi_4(test_table_lsi_4):
items1 = [{'p': random_string(), 'c': random_string(),
'x1': random_string(), 'x2': random_string(), 'x3': random_string(), 'x4': random_string()} for i in range(10)]
i_values = items1[0]
i5 = random_string()
items2 = [{'p': i5, 'c': i5, 'x1': i5, 'x2': i5, 'x3': i5, 'x4': i5}]
items = items1 + items2
with test_table_lsi_4.batch_writer() as batch:
for item in items:
batch.put_item(item)
for column in ['x1', 'x2', 'x3', 'x4']:
expected_items = [i for i in items if (i['p'], i[column]) == (i_values['p'], i_values[column])]
assert_index_query(test_table_lsi_4, 'hello_' + column, expected_items,
KeyConditions={'p': {'AttributeValueList': [i_values['p']], 'ComparisonOperator': 'EQ'},
column: {'AttributeValueList': [i_values[column]], 'ComparisonOperator': 'EQ'}})
expected_items = [i for i in items if (i['p'], i[column]) == (i5, i5)]
assert_index_query(test_table_lsi_4, 'hello_' + column, expected_items,
KeyConditions={'p': {'AttributeValueList': [i5], 'ComparisonOperator': 'EQ'},
column: {'AttributeValueList': [i5], 'ComparisonOperator': 'EQ'}})
def test_lsi_describe(test_table_lsi_4):
    desc = test_table_lsi_4.meta.client.describe_table(TableName=test_table_lsi_4.name)
    assert 'Table' in desc
    assert 'LocalSecondaryIndexes' in desc['Table']
    lsis = desc['Table']['LocalSecondaryIndexes']
    assert(sorted([lsi['IndexName'] for lsi in lsis]) == ['hello_x1', 'hello_x2', 'hello_x3', 'hello_x4'])
    # TODO: check projection and key params
    # TODO: check also ProvisionedThroughput, IndexArn
# A table with selective projection - only keys are projected into the index
@pytest.fixture(scope="session")
def test_table_lsi_keys_only(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' } ],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'c', 'AttributeType': 'S' },
{ 'AttributeName': 'b', 'AttributeType': 'S' }
],
LocalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [
{ 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'b', 'KeyType': 'RANGE' }
],
'Projection': { 'ProjectionType': 'KEYS_ONLY' }
}
])
yield table
table.delete()
# Check that it's possible to extract a non-projected attribute from the index,
# as the documentation promises
def test_lsi_get_not_projected_attribute(test_table_lsi_keys_only):
items1 = [{'p': random_string(), 'c': random_string(), 'b': random_string(), 'd': random_string()} for i in range(10)]
p1, b1, d1 = items1[0]['p'], items1[0]['b'], items1[0]['d']
p2, b2, d2 = random_string(), random_string(), random_string()
items2 = [{'p': p2, 'c': p2, 'b': b2, 'd': d2}]
items = items1 + items2
with test_table_lsi_keys_only.batch_writer() as batch:
for item in items:
batch.put_item(item)
expected_items = [i for i in items if i['p'] == p1 and i['b'] == b1 and i['d'] == d1]
assert_index_query(test_table_lsi_keys_only, 'hello', expected_items,
KeyConditions={'p': {'AttributeValueList': [p1], 'ComparisonOperator': 'EQ'},
'b': {'AttributeValueList': [b1], 'ComparisonOperator': 'EQ'}},
Select='ALL_ATTRIBUTES')
expected_items = [i for i in items if i['p'] == p2 and i['b'] == b2 and i['d'] == d2]
assert_index_query(test_table_lsi_keys_only, 'hello', expected_items,
KeyConditions={'p': {'AttributeValueList': [p2], 'ComparisonOperator': 'EQ'},
'b': {'AttributeValueList': [b2], 'ComparisonOperator': 'EQ'}},
Select='ALL_ATTRIBUTES')
expected_items = [{'d': i['d']} for i in items if i['p'] == p2 and i['b'] == b2 and i['d'] == d2]
assert_index_query(test_table_lsi_keys_only, 'hello', expected_items,
KeyConditions={'p': {'AttributeValueList': [p2], 'ComparisonOperator': 'EQ'},
'b': {'AttributeValueList': [b2], 'ComparisonOperator': 'EQ'}},
Select='SPECIFIC_ATTRIBUTES', AttributesToGet=['d'])
# Check that only projected attributes can be extracted
@pytest.mark.xfail(reason="LSI in alternator currently only implements full projections")
def test_lsi_get_all_projected_attributes(test_table_lsi_keys_only):
items1 = [{'p': random_string(), 'c': random_string(), 'b': random_string(), 'd': random_string()} for i in range(10)]
p1, b1, d1 = items1[0]['p'], items1[0]['b'], items1[0]['d']
p2, b2, d2 = random_string(), random_string(), random_string()
items2 = [{'p': p2, 'c': p2, 'b': b2, 'd': d2}]
items = items1 + items2
with test_table_lsi_keys_only.batch_writer() as batch:
for item in items:
batch.put_item(item)
expected_items = [{'p': i['p'], 'c': i['c'],'b': i['b']} for i in items if i['p'] == p1 and i['b'] == b1]
assert_index_query(test_table_lsi_keys_only, 'hello', expected_items,
KeyConditions={'p': {'AttributeValueList': [p1], 'ComparisonOperator': 'EQ'},
'b': {'AttributeValueList': [b1], 'ComparisonOperator': 'EQ'}})
# Check that strongly consistent reads are allowed for LSI
def test_lsi_consistent_read(test_table_lsi_1):
items1 = [{'p': random_string(), 'c': random_string(), 'b': random_string()} for i in range(10)]
p1, b1 = items1[0]['p'], items1[0]['b']
p2, b2 = random_string(), random_string()
items2 = [{'p': p2, 'c': p2, 'b': b2}]
items = items1 + items2
with test_table_lsi_1.batch_writer() as batch:
for item in items:
batch.put_item(item)
expected_items = [i for i in items if i['p'] == p1 and i['b'] == b1]
assert_index_query(test_table_lsi_1, 'hello', expected_items,
KeyConditions={'p': {'AttributeValueList': [p1], 'ComparisonOperator': 'EQ'},
'b': {'AttributeValueList': [b1], 'ComparisonOperator': 'EQ'}},
ConsistentRead=True)
expected_items = [i for i in items if i['p'] == p2 and i['b'] == b2]
assert_index_query(test_table_lsi_1, 'hello', expected_items,
KeyConditions={'p': {'AttributeValueList': [p2], 'ComparisonOperator': 'EQ'},
'b': {'AttributeValueList': [b2], 'ComparisonOperator': 'EQ'}},
ConsistentRead=True)
# A table with both gsi and lsi present
@pytest.fixture(scope="session")
def test_table_lsi_gsi(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' } ],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'c', 'AttributeType': 'S' },
{ 'AttributeName': 'x1', 'AttributeType': 'S' },
],
GlobalSecondaryIndexes=[
{ 'IndexName': 'hello_g1',
'KeySchema': [
{ 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'x1', 'KeyType': 'RANGE' }
],
'Projection': { 'ProjectionType': 'KEYS_ONLY' }
}
],
LocalSecondaryIndexes=[
{ 'IndexName': 'hello_l1',
'KeySchema': [
{ 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'x1', 'KeyType': 'RANGE' }
],
'Projection': { 'ProjectionType': 'KEYS_ONLY' }
}
])
yield table
table.delete()
# Test that GSI and LSI can coexist, even if they're identical
def test_lsi_and_gsi(test_table_lsi_gsi):
desc = test_table_lsi_gsi.meta.client.describe_table(TableName=test_table_lsi_gsi.name)
assert 'Table' in desc
assert 'LocalSecondaryIndexes' in desc['Table']
assert 'GlobalSecondaryIndexes' in desc['Table']
lsis = desc['Table']['LocalSecondaryIndexes']
gsis = desc['Table']['GlobalSecondaryIndexes']
assert(sorted([lsi['IndexName'] for lsi in lsis]) == ['hello_l1'])
assert(sorted([gsi['IndexName'] for gsi in gsis]) == ['hello_g1'])
items = [{'p': random_string(), 'c': random_string(), 'x1': random_string()} for i in range(17)]
p1, c1, x1 = items[0]['p'], items[0]['c'], items[0]['x1']
with test_table_lsi_gsi.batch_writer() as batch:
for item in items:
batch.put_item(item)
for index in ['hello_g1', 'hello_l1']:
expected_items = [i for i in items if i['p'] == p1 and i['x1'] == x1]
assert_index_query(test_table_lsi_gsi, index, expected_items,
KeyConditions={'p': {'AttributeValueList': [p1], 'ComparisonOperator': 'EQ'},
'x1': {'AttributeValueList': [x1], 'ComparisonOperator': 'EQ'}})

# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Test for operations on items with *nested* attributes.
import pytest
from botocore.exceptions import ClientError
from util import random_string
# Test that we can write a top-level attribute that is a nested document, and
# read it back correctly.
def test_nested_document_attribute_write(test_table_s):
    nested_value = {
        'a': 3,
        'b': {'c': 'hello', 'd': ['hi', 'there', {'x': 'y'}, '42']},
    }
    p = random_string()
    test_table_s.put_item(Item={'p': p, 'a': nested_value})
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': nested_value}
# Test that if we have a top-level attribute that is a nested document (i.e.,
# a dictionary), updating this attribute will replace it entirely by a new
# nested document - not merge the new content into the old content.
def test_nested_document_attribute_overwrite(test_table_s):
    p = random_string()
    test_table_s.put_item(Item={'p': p, 'a': {'b': 3, 'c': 4}, 'd': 5})
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': {'b': 3, 'c': 4}, 'd': 5}
    test_table_s.update_item(Key={'p': p}, AttributeUpdates={'a': {'Value': {'c': 5}, 'Action': 'PUT'}})
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': {'c': 5}, 'd': 5}
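# The replace-not-merge semantics tested above match plain Python dict
# assignment rather than dict.update(); a standalone sketch:

```python
# A PUT on a top-level attribute behaves like plain assignment: the old
# nested document is replaced wholesale.
item = {'p': 'x', 'a': {'b': 3, 'c': 4}, 'd': 5}
item['a'] = {'c': 5}
assert item == {'p': 'x', 'a': {'c': 5}, 'd': 5}

# A merge (dict.update) would instead have kept the old 'b' key:
merged = {'b': 3, 'c': 4}
merged.update({'c': 5})
assert merged == {'b': 3, 'c': 5}
```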
# Moreover, we can overwrite an entire nested document by, say, a string,
# and that's also fine.
def test_nested_document_attribute_overwrite_2(test_table_s):
    p = random_string()
    test_table_s.put_item(Item={'p': p, 'a': {'b': 3, 'c': 4}, 'd': 5})
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': {'b': 3, 'c': 4}, 'd': 5}
    test_table_s.update_item(Key={'p': p}, AttributeUpdates={'a': {'Value': 'hi', 'Action': 'PUT'}})
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hi', 'd': 5}
# Verify that AttributeUpdates cannot be used to update a nested attribute -
# trying to use a dot in the name of the attribute, will just create one with
# an actual dot in its name.
def test_attribute_updates_dot(test_table_s):
    p = random_string()
    test_table_s.update_item(Key={'p': p}, AttributeUpdates={'a.b': {'Value': 3, 'Action': 'PUT'}})
    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a.b': 3}

# Tests for the various operations (GetItem, Query, Scan) with a
# ProjectionExpression parameter.
#
# ProjectionExpression is an expansion of the legacy AttributesToGet
# parameter. Both parameters request that only a subset of the attributes
# be fetched for each item, instead of all of them. But while AttributesToGet
# was limited to top-level attributes, ProjectionExpression can request also
# nested attributes.
import pytest
from botocore.exceptions import ClientError
from util import random_string, full_scan, full_query, multiset
# Basic test for ProjectionExpression, requesting only top-level attributes.
# Result should include the selected attributes only - if one wants the key
# attributes as well, one needs to select them explicitly. When no key
# attributes are selected, an item may have *none* of the selected
# attributes, and is returned as an empty item.
def test_projection_expression_toplevel(test_table):
p = random_string()
c = random_string()
item = {'p': p, 'c': c, 'a': 'hello', 'b': 'hi'}
test_table.put_item(Item=item)
for wanted in [ ['a'], # only non-key attribute
['c', 'a'], # a key attribute (sort key) and non-key
['p', 'c'], # entire key
['nonexistent'] # Our item doesn't have this
]:
got_item = test_table.get_item(Key={'p': p, 'c': c}, ProjectionExpression=",".join(wanted), ConsistentRead=True)['Item']
expected_item = {k: item[k] for k in wanted if k in item}
assert expected_item == got_item
# Various simple tests for ProjectionExpression's syntax, using only top-level
# attributes.
def test_projection_expression_toplevel_syntax(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hello', 'b': 'hi'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a')['Item'] == {'a': 'hello'}
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='#name', ExpressionAttributeNames={'#name': 'a'})['Item'] == {'a': 'hello'}
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a,b')['Item'] == {'a': 'hello', 'b': 'hi'}
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression=' a , b ')['Item'] == {'a': 'hello', 'b': 'hi'}
# Missing or unused names in ExpressionAttributeNames are errors:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='#name', ExpressionAttributeNames={'#wrong': 'a'})['Item'] == {'a': 'hello'}
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='#name', ExpressionAttributeNames={'#name': 'a', '#unused': 'b'})['Item'] == {'a': 'hello'}
# It is not allowed to fetch the same top-level attribute twice (or in
# general, list two overlapping attributes). We get an error like
# "Invalid ProjectionExpression: Two document paths overlap with each
# other; must remove or rewrite one of these paths; path one: [a], path
# two: [a]".
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a,a')['Item']
# A comma with nothing after it is a syntax error:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a,')['Item']
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression=',a')['Item']
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a,,b')['Item']
# An empty ProjectionExpression is not allowed. DynamoDB recognizes its
# syntax, but then writes: "Invalid ProjectionExpression: The expression
# can not be empty".
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='')['Item']
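# The syntax rules exercised above (no empty expression, no dangling commas,
# no duplicate attributes) can be mimicked by a small validator.
# validate_projection is a hypothetical helper for illustration only - it is
# not part of Alternator, boto3, or this test suite, and it handles only
# top-level names:

```python
import re

def validate_projection(expr):
    """Validate a comma-separated list of top-level attribute names,
    mirroring the errors DynamoDB reports for ProjectionExpression."""
    if not expr.strip():
        raise ValueError('The expression can not be empty')
    names = [part.strip() for part in expr.split(',')]
    # Each name must be an identifier or a #placeholder; an empty part
    # means a dangling or doubled comma:
    if not all(re.fullmatch(r'#?[A-Za-z_][A-Za-z0-9_]*', n) for n in names):
        raise ValueError('Syntax error in ProjectionExpression')
    if len(set(names)) != len(names):
        raise ValueError('Two document paths overlap with each other')
    return names

assert validate_projection(' a , b ') == ['a', 'b']
for bad in ['', 'a,', ',a', 'a,,b', 'a,a']:
    try:
        validate_projection(bad)
        assert False, bad
    except ValueError:
        pass
```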
# The following two tests are similar to test_projection_expression_toplevel()
# which tested the GetItem operation - but these test Scan and Query.
# Both test ProjectionExpression with only top-level attributes.
def test_projection_expression_scan(filled_test_table):
table, items = filled_test_table
for wanted in [ ['another'], # only non-key attributes (one item doesn't have it!)
['c', 'another'], # a key attribute (sort key) and non-key
['p', 'c'], # entire key
['nonexistent'] # none of the items have this attribute!
]:
got_items = full_scan(table, ProjectionExpression=",".join(wanted))
expected_items = [{k: x[k] for k in wanted if k in x} for x in items]
assert multiset(expected_items) == multiset(got_items)
def test_projection_expression_query(test_table):
p = random_string()
items = [{'p': p, 'c': str(i), 'a': str(i*10), 'b': str(i*100) } for i in range(10)]
with test_table.batch_writer() as batch:
for item in items:
batch.put_item(item)
for wanted in [ ['a'], # only non-key attributes
['c', 'a'], # a key attribute (sort key) and non-key
['p', 'c'], # entire key
['nonexistent'] # none of the items have this attribute!
]:
got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, ProjectionExpression=",".join(wanted))
expected_items = [{k: x[k] for k in wanted if k in x} for x in items]
assert multiset(expected_items) == multiset(got_items)
# The previous tests all fetched only top-level attributes. They could all
# be written using AttributesToGet instead of ProjectionExpression (and,
# in fact, we do have similar tests with AttributesToGet in other files),
# but the previous tests checked that the alternative syntax works correctly.
# The following test checks fetching more elaborate attribute paths from
# nested documents.
@pytest.mark.xfail(reason="ProjectionExpression does not yet support attribute paths")
def test_projection_expression_path(test_table_s):
p = random_string()
test_table_s.put_item(Item={
'p': p,
'a': {'b': [2, 4, {'x': 'hi', 'y': 'yo'}], 'c': 5},
'b': 'hello'
})
# Fetching the entire nested document "a" works, of course:
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a')['Item'] == {'a': {'b': [2, 4, {'x': 'hi', 'y': 'yo'}], 'c': 5}}
# If we fetch a.b, we get only the content of b - but it's still inside
# the a dictionary:
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b')['Item'] == {'a': {'b': [2, 4, {'x': 'hi', 'y': 'yo'}]}}
# Similarly, fetching a.b[0] gives us a one-element array in a dictionary.
# Note that [0] is the first element of an array.
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b[0]')['Item'] == {'a': {'b': [2]}}
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b[2]')['Item'] == {'a': {'b': [{'x': 'hi', 'y': 'yo'}]}}
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b[2].y')['Item'] == {'a': {'b': [{'y': 'yo'}]}}
# Trying to read any sort of non-existent attribute returns an empty item.
# This includes a non-existent top-level attribute, an attempt to read
# beyond the end of an array or a non-existent member of a dictionary, as
# well as paths which begin with a non-existent prefix.
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='x')['Item'] == {}
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b[3]')['Item'] == {}
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.x')['Item'] == {}
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.x.y')['Item'] == {}
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b[3].x')['Item'] == {}
# We can read multiple paths - the results are merged into one object
# structured the same way as in the original item:
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b[0],a.b[1]')['Item'] == {'a': {'b': [2, 4]}}
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b[0],a.c')['Item'] == {'a': {'b': [2], 'c': 5}}
# It is not allowed to read the same path multiple times. The error from
# DynamoDB looks like: "Invalid ProjectionExpression: Two document paths
# overlap with each other; must remove or rewrite one of these paths;
# path one: [a, b, [0]], path two: [a, b, [0]]".
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b[0],a.b[0]')['Item']
# Two paths are considered to "overlap" if the content of one path
# contains the content of the second path. So requesting both "a" and
# "a.b[0]" is not allowed.
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a,a.b[0]')['Item']
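# The path semantics above (a projected list element survives at its original
# nesting level, and a missing path vanishes entirely) can be sketched with a
# recursive helper. This is an illustrative model only, not Alternator's
# implementation:

```python
def project_path(value, path):
    """Walk one parsed document path (dict keys and int list indexes) into a
    nested value. Returns (found, projected) where the projection keeps the
    original nesting, e.g. ['a', 'b', 0] on {'a': {'b': [2, 4]}} yields
    (True, {'a': {'b': [2]}})."""
    if not path:
        return True, value
    step, rest = path[0], path[1:]
    if isinstance(step, int):
        # A list index: valid only on a list, and only within bounds.
        if isinstance(value, list) and 0 <= step < len(value):
            found, sub = project_path(value[step], rest)
            if found:
                return True, [sub]
    elif isinstance(value, dict) and step in value:
        found, sub = project_path(value[step], rest)
        if found:
            return True, {step: sub}
    return False, None

item = {'a': {'b': [2, 4, {'x': 'hi', 'y': 'yo'}], 'c': 5}}
assert project_path(item, ['a', 'b', 0]) == (True, {'a': {'b': [2]}})
assert project_path(item, ['a', 'b', 2, 'y']) == (True, {'a': {'b': [{'y': 'yo'}]}})
assert project_path(item, ['a', 'b', 3]) == (False, None)
assert project_path(item, ['x']) == (False, None)
```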
@pytest.mark.xfail(reason="ProjectionExpression does not yet support attribute paths")
def test_query_projection_expression_path(test_table):
p = random_string()
items = [{'p': p, 'c': str(i), 'a': {'x': str(i*10), 'y': 'hi'}, 'b': 'hello' } for i in range(10)]
with test_table.batch_writer() as batch:
for item in items:
batch.put_item(item)
got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, ProjectionExpression="a.x")
expected_items = [{'a': {'x': x['a']['x']}} for x in items]
assert multiset(expected_items) == multiset(got_items)
@pytest.mark.xfail(reason="ProjectionExpression does not yet support attribute paths")
def test_scan_projection_expression_path(test_table):
# This test is similar to test_query_projection_expression_path above,
# but uses a scan instead of a query. The scan will generate unrelated
# partitions created by other tests (hopefully not too many...) that we
# need to ignore. We also need to ask for "p" too, so we can filter by it.
p = random_string()
items = [{'p': p, 'c': str(i), 'a': {'x': str(i*10), 'y': 'hi'}, 'b': 'hello' } for i in range(10)]
with test_table.batch_writer() as batch:
for item in items:
batch.put_item(item)
got_items = [ x for x in full_scan(test_table, ProjectionExpression="p, a.x") if x['p'] == p]
expected_items = [{'p': p, 'a': {'x': x['a']['x']}} for x in items]
assert multiset(expected_items) == multiset(got_items)
# It is not allowed to use both ProjectionExpression and its older cousin,
# AttributesToGet, together. If trying to do this, DynamoDB produces an error
# like "Can not use both expression and non-expression parameters in the same
# request: Non-expression parameters: {AttributesToGet} Expression
# parameters: {ProjectionExpression}
def test_projection_expression_and_attributes_to_get(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hello', 'b': 'hi'})
with pytest.raises(ClientError, match='ValidationException.*both'):
test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a', AttributesToGet=['b'])['Item']
with pytest.raises(ClientError, match='ValidationException.*both'):
full_scan(test_table_s, ProjectionExpression='a', AttributesToGet=['a'])
with pytest.raises(ClientError, match='ValidationException.*both'):
full_query(test_table_s, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, ProjectionExpression='a', AttributesToGet=['a'])

# Tests for the Query operation
import random
import pytest
from botocore.exceptions import ClientError, ParamValidationError
from decimal import Decimal
from util import random_string, random_bytes, full_query, multiset
from boto3.dynamodb.conditions import Key, Attr
# Test that querying works fine with the stock paginator
def test_query_basic_restrictions(dynamodb, filled_test_table):
test_table, items = filled_test_table
paginator = dynamodb.meta.client.get_paginator('query')
# EQ
got_items = []
for page in paginator.paginate(TableName=test_table.name, KeyConditions={
'p' : {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'}
}):
got_items += page['Items']
print(got_items)
assert multiset([item for item in items if item['p'] == 'long']) == multiset(got_items)
# LT
got_items = []
for page in paginator.paginate(TableName=test_table.name, KeyConditions={
'p' : {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'},
'c' : {'AttributeValueList': ['12'], 'ComparisonOperator': 'LT'}
}):
got_items += page['Items']
print(got_items)
assert multiset([item for item in items if item['p'] == 'long' and item['c'] < '12']) == multiset(got_items)
# LE
got_items = []
for page in paginator.paginate(TableName=test_table.name, KeyConditions={
'p' : {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'},
'c' : {'AttributeValueList': ['14'], 'ComparisonOperator': 'LE'}
}):
got_items += page['Items']
print(got_items)
assert multiset([item for item in items if item['p'] == 'long' and item['c'] <= '14']) == multiset(got_items)
# GT
got_items = []
for page in paginator.paginate(TableName=test_table.name, KeyConditions={
'p' : {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'},
'c' : {'AttributeValueList': ['15'], 'ComparisonOperator': 'GT'}
}):
got_items += page['Items']
print(got_items)
assert multiset([item for item in items if item['p'] == 'long' and item['c'] > '15']) == multiset(got_items)
# GE
got_items = []
for page in paginator.paginate(TableName=test_table.name, KeyConditions={
'p' : {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'},
'c' : {'AttributeValueList': ['14'], 'ComparisonOperator': 'GE'}
}):
got_items += page['Items']
print(got_items)
assert multiset([item for item in items if item['p'] == 'long' and item['c'] >= '14']) == multiset(got_items)
# BETWEEN
got_items = []
for page in paginator.paginate(TableName=test_table.name, KeyConditions={
'p' : {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'},
'c' : {'AttributeValueList': ['155', '164'], 'ComparisonOperator': 'BETWEEN'}
}):
got_items += page['Items']
print(got_items)
assert multiset([item for item in items if item['p'] == 'long' and item['c'] >= '155' and item['c'] <= '164']) == multiset(got_items)
# BEGINS_WITH
got_items = []
for page in paginator.paginate(TableName=test_table.name, KeyConditions={
'p' : {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'},
'c' : {'AttributeValueList': ['11'], 'ComparisonOperator': 'BEGINS_WITH'}
}):
print([item for item in items if item['p'] == 'long' and item['c'].startswith('11')])
got_items += page['Items']
print(got_items)
assert multiset([item for item in items if item['p'] == 'long' and item['c'].startswith('11')]) == multiset(got_items)
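# Each KeyConditions comparison operator exercised above corresponds to a
# plain Python predicate on the sort key, which is exactly how the expected
# item sets are computed. A compact, illustrative mapping (note BETWEEN is
# inclusive on both bounds):

```python
# Illustrative mapping from KeyConditions comparison operators to the
# Python predicates the test uses to compute its expected item sets.
ops = {
    'LT': lambda c, args: c < args[0],
    'LE': lambda c, args: c <= args[0],
    'GT': lambda c, args: c > args[0],
    'GE': lambda c, args: c >= args[0],
    'BETWEEN': lambda c, args: args[0] <= c <= args[1],
    'BEGINS_WITH': lambda c, args: c.startswith(args[0]),
}
assert ops['BETWEEN']('160', ['155', '164'])   # inclusive range
assert not ops['BETWEEN']('165', ['155', '164'])
assert ops['BEGINS_WITH']('115', ['11'])
```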
# Test that KeyConditionExpression parameter is supported
@pytest.mark.xfail(reason="KeyConditionExpression not supported yet")
def test_query_key_condition_expression(dynamodb, filled_test_table):
test_table, items = filled_test_table
paginator = dynamodb.meta.client.get_paginator('query')
got_items = []
for page in paginator.paginate(TableName=test_table.name, KeyConditionExpression=Key("p").eq("long") & Key("c").lt("12")):
got_items += page['Items']
print(got_items)
assert multiset([item for item in items if item['p'] == 'long' and item['c'] < '12']) == multiset(got_items)
def test_begins_with(dynamodb, test_table):
paginator = dynamodb.meta.client.get_paginator('query')
items = [{'p': 'unorthodox_chars', 'c': sort_key, 'str': 'a'} for sort_key in [u'ÿÿÿ', u'cÿbÿ', u'cÿbÿÿabg'] ]
with test_table.batch_writer() as batch:
for item in items:
batch.put_item(item)
# TODO(sarna): Once bytes type is supported, the \xFF character should be tested
got_items = []
for page in paginator.paginate(TableName=test_table.name, KeyConditions={
'p' : {'AttributeValueList': ['unorthodox_chars'], 'ComparisonOperator': 'EQ'},
'c' : {'AttributeValueList': [u'ÿÿ'], 'ComparisonOperator': 'BEGINS_WITH'}
}):
got_items += page['Items']
print(got_items)
assert sorted([d['c'] for d in got_items]) == sorted([d['c'] for d in items if d['c'].startswith(u'ÿÿ')])
got_items = []
for page in paginator.paginate(TableName=test_table.name, KeyConditions={
'p' : {'AttributeValueList': ['unorthodox_chars'], 'ComparisonOperator': 'EQ'},
'c' : {'AttributeValueList': [u'cÿbÿ'], 'ComparisonOperator': 'BEGINS_WITH'}
}):
got_items += page['Items']
print(got_items)
assert sorted([d['c'] for d in got_items]) == sorted([d['c'] for d in items if d['c'].startswith(u'cÿbÿ')])
def test_begins_with_wrong_type(dynamodb, test_table_sn):
    paginator = dynamodb.meta.client.get_paginator('query')
    with pytest.raises(ClientError, match='ValidationException'):
        for page in paginator.paginate(TableName=test_table_sn.name, KeyConditions={
            'p' : {'AttributeValueList': ['unorthodox_chars'], 'ComparisonOperator': 'EQ'},
            'c' : {'AttributeValueList': [17], 'ComparisonOperator': 'BEGINS_WITH'}
        }):
            pass
# Items returned by Query should be sorted by the sort key. The following
# tests verify that this is indeed the case, for the three allowed key types:
# strings, binary, and numbers. These tests test not just the Query operation,
# but inherently that the sort-key sorting works.
def test_query_sort_order_string(test_table):
# Insert a lot of random items in one new partition:
# str(i) has a non-obvious sort order (e.g., "100" comes before "2") so is a nice test.
p = random_string()
items = [{'p': p, 'c': str(i)} for i in range(128)]
with test_table.batch_writer() as batch:
for item in items:
batch.put_item(item)
got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}})
assert len(items) == len(got_items)
# Extract just the sort key ("c") from the items
sort_keys = [x['c'] for x in items]
got_sort_keys = [x['c'] for x in got_items]
# Verify that got_sort_keys are already sorted (in string order)
assert sorted(got_sort_keys) == got_sort_keys
# Verify that got_sort_keys are a sorted version of the expected sort_keys
assert sorted(sort_keys) == got_sort_keys
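# The non-obvious order mentioned in the comment is plain lexicographic
# comparison of the decimal strings, character by character:

```python
keys = [str(i) for i in (2, 10, 100)]
# Lexicographic order compares character by character, so "100" < "2":
assert sorted(keys) == ['10', '100', '2']
```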
def test_query_sort_order_bytes(test_table_sb):
# Insert a lot of random items in one new partition:
# We arbitrarily use random_bytes with a random length.
p = random_string()
items = [{'p': p, 'c': random_bytes(10)} for i in range(128)]
with test_table_sb.batch_writer() as batch:
for item in items:
batch.put_item(item)
got_items = full_query(test_table_sb, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}})
assert len(items) == len(got_items)
sort_keys = [x['c'] for x in items]
got_sort_keys = [x['c'] for x in got_items]
# Boto3's "Binary" objects are sorted as if bytes are signed integers.
# This isn't the order that DynamoDB itself uses (byte 0 should be first,
# not byte -128). Sorting the byte array ".value" works.
assert sorted(got_sort_keys, key=lambda x: x.value) == got_sort_keys
assert sorted(sort_keys) == got_sort_keys
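# The signed-vs-unsigned pitfall described in the comments can be reproduced
# without boto3: if bytes were compared as signed integers, 0xFF (-1) would
# sort before 0x00, whereas DynamoDB orders raw bytes unsigned:

```python
bufs = [b'\xff', b'\x00', b'\x7f']
# Python compares bytes objects as unsigned values, matching DynamoDB:
assert sorted(bufs) == [b'\x00', b'\x7f', b'\xff']
# Comparing bytes as *signed* integers (what boto3's Binary wrapper
# effectively does) gives a different order:
signed = sorted(bufs, key=lambda b: [x - 256 if x > 127 else x for x in b])
assert signed == [b'\xff', b'\x00', b'\x7f']
```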
def test_query_sort_order_number(test_table_sn):
# This is a list of numbers, sorted in correct order, and each suitable
# for accurate representation by Alternator's number type.
numbers = [
Decimal("-2e10"),
Decimal("-7.1e2"),
Decimal("-4.1"),
Decimal("-0.1"),
Decimal("-1e-5"),
Decimal("0"),
Decimal("2e-5"),
Decimal("0.15"),
Decimal("1"),
Decimal("1.00000000000000000000000001"),
Decimal("3.14159"),
Decimal("3.1415926535897932384626433832795028841"),
Decimal("31.4"),
Decimal("1.4e10"),
]
# Insert these numbers, in random order, into one partition:
p = random_string()
items = [{'p': p, 'c': num} for num in random.sample(numbers, len(numbers))]
with test_table_sn.batch_writer() as batch:
for item in items:
batch.put_item(item)
# Finally, verify that we get back exactly the same numbers (with identical
# precision), and in their original sorted order.
got_items = full_query(test_table_sn, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}})
got_sort_keys = [x['c'] for x in got_items]
assert got_sort_keys == numbers
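# Decimal is used here because it preserves precision that a 64-bit float
# cannot, which the exact-roundtrip assertion above depends on:

```python
from decimal import Decimal

a = Decimal('1')
b = Decimal('1.00000000000000000000000001')
# Decimal preserves the 26-digit difference the test relies on...
assert a < b
# ...while conversion to a 64-bit float collapses both to the same value:
assert float(a) == float(b)
```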
def test_query_filtering_attributes_equality(filled_test_table):
test_table, items = filled_test_table
query_filter = {
"attribute" : {
"AttributeValueList" : [ "xxxx" ],
"ComparisonOperator": "EQ"
}
}
got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'}}, QueryFilter=query_filter)
print(got_items)
assert multiset([item for item in items if item['p'] == 'long' and item['attribute'] == 'xxxx']) == multiset(got_items)
query_filter = {
"attribute" : {
"AttributeValueList" : [ "xxxx" ],
"ComparisonOperator": "EQ"
},
"another" : {
"AttributeValueList" : [ "yy" ],
"ComparisonOperator": "EQ"
}
}
got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'}}, QueryFilter=query_filter)
print(got_items)
assert multiset([item for item in items if item['p'] == 'long' and item['attribute'] == 'xxxx' and item['another'] == 'yy']) == multiset(got_items)
# Test that FilterExpression works as expected
@pytest.mark.xfail(reason="FilterExpression not supported yet")
def test_query_filter_expression(filled_test_table):
test_table, items = filled_test_table
got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'}}, FilterExpression=Attr("attribute").eq("xxxx"))
print(got_items)
assert multiset([item for item in items if item['p'] == 'long' and item['attribute'] == 'xxxx']) == multiset(got_items)
got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'}}, FilterExpression=Attr("attribute").eq("xxxx") & Attr("another").eq("yy"))
print(got_items)
assert multiset([item for item in items if item['p'] == 'long' and item['attribute'] == 'xxxx' and item['another'] == 'yy']) == multiset(got_items)
# QueryFilter may only refer to non-key attributes - naming a key attribute
# (hash or sort key) in the filter is rejected with a ValidationException.
def test_query_filtering_key_equality(filled_test_table):
test_table, items = filled_test_table
with pytest.raises(ClientError, match='ValidationException'):
query_filter = {
"c" : {
"AttributeValueList" : [ "5" ],
"ComparisonOperator": "EQ"
}
}
got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'}}, QueryFilter=query_filter)
print(got_items)
with pytest.raises(ClientError, match='ValidationException'):
query_filter = {
"attribute" : {
"AttributeValueList" : [ "x" ],
"ComparisonOperator": "EQ"
},
"p" : {
"AttributeValueList" : [ "5" ],
"ComparisonOperator": "EQ"
}
}
got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'}}, QueryFilter=query_filter)
print(got_items)
# Test Query with the AttributesToGet parameter. Result should include the
# selected attributes only - if one wants the key attributes as well, one
# needs to select them explicitly. When no key attributes are selected,
# some items may have *none* of the selected attributes. Those items are
# returned too, as empty items - they are not outright missing.
def test_query_attributes_to_get(dynamodb, test_table):
p = random_string()
items = [{'p': p, 'c': str(i), 'a': str(i*10), 'b': str(i*100) } for i in range(10)]
with test_table.batch_writer() as batch:
for item in items:
batch.put_item(item)
for wanted in [ ['a'], # only non-key attributes
['c', 'a'], # a key attribute (sort key) and non-key
['p', 'c'], # entire key
['nonexistent'] # none of the items have this attribute!
]:
got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, AttributesToGet=wanted)
expected_items = [{k: x[k] for k in wanted if k in x} for x in items]
assert multiset(expected_items) == multiset(got_items)
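The dict comprehension above is the whole model of AttributesToGet: project each item down to the wanted keys, and keep items that have none of them as empty dicts rather than dropping them. A plain-Python illustration:

```python
# Model of the AttributesToGet projection used in the test: keep only the
# wanted keys; an item with none of them projects to an empty dict.
items = [{'p': 'k', 'c': '0', 'a': '0'}, {'p': 'k', 'c': '1'}]
wanted = ['a']
projected = [{k: x[k] for k in wanted if k in x} for x in items]
# The second item has no 'a', so it becomes {} - returned, not missing,
# matching the behavior the test asserts against the server.
assert projected == [{'a': '0'}, {}]
```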
# Test that in a table with both hash key and sort key, which keys we can
# Query by: We can Query by the hash key, by a combination of both hash and
# sort keys, but *cannot* query by just the sort key, and obviously not
# by any non-key column.
def test_query_which_key(test_table):
p = random_string()
c = random_string()
p2 = random_string()
c2 = random_string()
item1 = {'p': p, 'c': c}
item2 = {'p': p, 'c': c2}
item3 = {'p': p2, 'c': c}
for i in [item1, item2, item3]:
test_table.put_item(Item=i)
# Query by hash key only:
got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}})
expected_items = [item1, item2]
assert multiset(expected_items) == multiset(got_items)
# Query by hash key *and* sort key (this is basically a GetItem):
got_items = full_query(test_table, KeyConditions={
'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'},
'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}
})
expected_items = [item1]
assert multiset(expected_items) == multiset(got_items)
# Query by sort key alone is not allowed. DynamoDB reports:
# "Query condition missed key schema element: p".
with pytest.raises(ClientError, match='ValidationException'):
full_query(test_table, KeyConditions={
'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}
})
# Query by a non-key isn't allowed, for the same reason - that the
# actual hash key (p) is missing in the query:
with pytest.raises(ClientError, match='ValidationException'):
full_query(test_table, KeyConditions={
'z': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}
})
# If we try both p and a non-key we get a complaint that the sort
# key is missing: "Query condition missed key schema element: c"
with pytest.raises(ClientError, match='ValidationException'):
full_query(test_table, KeyConditions={
'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'},
'z': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}
})
# If we try p, c and another key, we get an error that
# "Conditions can be of length 1 or 2 only".
with pytest.raises(ClientError, match='ValidationException'):
full_query(test_table, KeyConditions={
'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'},
'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'},
'z': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}
})
# Test the "Select" parameter of Query. The default Select mode,
# ALL_ATTRIBUTES, returns items with all their attributes. Other modes
# allow returning just specific attributes or just counting the results
# without returning items at all.
@pytest.mark.xfail(reason="Select not supported yet")
def test_query_select(test_table_sn):
numbers = [Decimal(i) for i in range(10)]
# Insert these numbers, in random order, into one partition:
p = random_string()
items = [{'p': p, 'c': num, 'x': num} for num in random.sample(numbers, len(numbers))]
with test_table_sn.batch_writer() as batch:
for item in items:
batch.put_item(item)
# Verify that we get back the numbers in their sorted order. By default,
# query returns all attributes:
got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}})['Items']
got_sort_keys = [x['c'] for x in got_items]
assert got_sort_keys == numbers
got_x_attributes = [x['x'] for x in got_items]
assert got_x_attributes == numbers
# Select=ALL_ATTRIBUTES does exactly the same as the default - return
# all attributes:
got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Select='ALL_ATTRIBUTES')['Items']
got_sort_keys = [x['c'] for x in got_items]
assert got_sort_keys == numbers
got_x_attributes = [x['x'] for x in got_items]
assert got_x_attributes == numbers
# Select=ALL_PROJECTED_ATTRIBUTES is not allowed on a base table (it
# is just for indexes, when IndexName is specified)
with pytest.raises(ClientError, match='ValidationException'):
test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Select='ALL_PROJECTED_ATTRIBUTES')
# Select=SPECIFIC_ATTRIBUTES requires that either an AttributesToGet
# or a ProjectionExpression parameter appears, but then does nothing
# beyond what those parameters already do:
with pytest.raises(ClientError, match='ValidationException'):
test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Select='SPECIFIC_ATTRIBUTES')
got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Select='SPECIFIC_ATTRIBUTES', AttributesToGet=['x'])['Items']
expected_items = [{'x': i} for i in numbers]
assert got_items == expected_items
got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Select='SPECIFIC_ATTRIBUTES', ProjectionExpression='x')['Items']
assert got_items == expected_items
# Select=COUNT just returns a count - not any items
got = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Select='COUNT')
assert got['Count'] == len(numbers)
assert not 'Items' in got
# Check again that we also get a count - not just with Select=COUNT,
# but without Select=COUNT we also get the items:
got = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}})
assert got['Count'] == len(numbers)
assert 'Items' in got
# Select with some unknown string generates a validation exception:
with pytest.raises(ClientError, match='ValidationException'):
test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Select='UNKNOWN')
# Test that the "Limit" parameter can be used to return only some of the
# items in a single partition. The items returned are the first in the
# sorted order.
def test_query_limit(test_table_sn):
numbers = [Decimal(i) for i in range(10)]
# Insert these numbers, in random order, into one partition:
p = random_string()
items = [{'p': p, 'c': num} for num in random.sample(numbers, len(numbers))]
with test_table_sn.batch_writer() as batch:
for item in items:
batch.put_item(item)
# Verify that we get back the numbers in their sorted order.
# First, no Limit so we should get all numbers (we have few of them, so
# it all fits in the default 1MB limitation)
got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}})['Items']
got_sort_keys = [x['c'] for x in got_items]
assert got_sort_keys == numbers
# Now try a few different Limit values, and verify that the query
# returns exactly the first Limit sorted numbers.
for limit in [1, 2, 3, 7, 10, 17, 100, 10000]:
got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Limit=limit)['Items']
assert len(got_items) == min(limit, len(numbers))
got_sort_keys = [x['c'] for x in got_items]
assert got_sort_keys == numbers[0:limit]
# Unfortunately, the boto3 library forbids a Limit of 0 on its own,
# before even sending a request, so we can't test how the server responds.
with pytest.raises(ParamValidationError):
test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Limit=0)
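The loop above compares against numbers[0:limit] even when limit far exceeds the partition size. That is safe because Python slices clamp to the sequence length instead of raising, so one assertion covers both the truncating and the non-truncating Limit values:

```python
# Why numbers[0:limit] works for limit > len(numbers): slicing clamps
# to the sequence length rather than raising an IndexError.
numbers = list(range(10))
assert numbers[0:3] == [0, 1, 2]
assert numbers[0:10000] == numbers
assert len(numbers[0:10000]) == min(10000, len(numbers))
```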
# In test_query_limit we tested only that Limit stops the result after the
# requested number of items. Here we test that such a stopped result
# can be resumed, via the LastEvaluatedKey/ExclusiveStartKey paging mechanism.
def test_query_limit_paging(test_table_sn):
numbers = [Decimal(i) for i in range(20)]
# Insert these numbers, in random order, into one partition:
p = random_string()
items = [{'p': p, 'c': num} for num in random.sample(numbers, len(numbers))]
with test_table_sn.batch_writer() as batch:
for item in items:
batch.put_item(item)
# Verify that full_query() returns all these numbers, in sorted order.
# full_query() will do a query with the given limit, and resume it again
# and again until the last page.
for limit in [1, 2, 3, 7, 10, 17, 100, 10000]:
got_items = full_query(test_table_sn, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Limit=limit)
got_sort_keys = [x['c'] for x in got_items]
assert got_sort_keys == numbers
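A self-contained model (hypothetical - the real full_query() lives in util.py) of the resume loop being exercised here: each response carries up to Limit items plus a LastEvaluatedKey, and the next request passes it back as ExclusiveStartKey until a response arrives without one.

```python
# Toy model of the LastEvaluatedKey/ExclusiveStartKey paging protocol,
# run over a plain sorted list instead of a DynamoDB partition.
def fake_query(rows, limit, exclusive_start_key=None):
    start = 0 if exclusive_start_key is None else rows.index(exclusive_start_key) + 1
    page = rows[start:start + limit]
    resp = {'Items': page}
    if start + limit < len(rows):          # more rows remain: offer a resume point
        resp['LastEvaluatedKey'] = page[-1]
    return resp

rows = list(range(20))
for limit in [1, 3, 7, 100]:
    got, pos = [], None
    while True:                            # the loop full_query() performs
        resp = fake_query(rows, limit, pos)
        got += resp['Items']
        pos = resp.get('LastEvaluatedKey')
        if pos is None:
            break
    assert got == rows                     # every page size reassembles the whole
```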
# Test that the ScanIndexForward parameter works, and can be used to
# return items sorted in reverse order. Combining this with Limit can
# be used to return the last items instead of the first items of the
# partition.
@pytest.mark.xfail(reason="ScanIndexForward not supported yet")
def test_query_reverse(test_table_sn):
numbers = [Decimal(i) for i in range(20)]
# Insert these numbers, in random order, into one partition:
p = random_string()
items = [{'p': p, 'c': num} for num in random.sample(numbers, len(numbers))]
with test_table_sn.batch_writer() as batch:
for item in items:
batch.put_item(item)
# Verify that we get back the numbers in their sorted order or reverse
# order, depending on the ScanIndexForward parameter being True or False.
# First, no Limit so we should get all numbers (we have few of them, so
# it all fits in the default 1MB limitation)
reversed_numbers = list(reversed(numbers))
got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, ScanIndexForward=True)['Items']
got_sort_keys = [x['c'] for x in got_items]
assert got_sort_keys == numbers
got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, ScanIndexForward=False)['Items']
got_sort_keys = [x['c'] for x in got_items]
assert got_sort_keys == reversed_numbers
# Now try a few different Limit values, and verify that the query
# returns exactly the first Limit sorted numbers - in regular or
# reverse order, depending on ScanIndexForward.
for limit in [1, 2, 3, 7, 10, 17, 100, 10000]:
got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Limit=limit, ScanIndexForward=True)['Items']
assert len(got_items) == min(limit, len(numbers))
got_sort_keys = [x['c'] for x in got_items]
assert got_sort_keys == numbers[0:limit]
got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Limit=limit, ScanIndexForward=False)['Items']
assert len(got_items) == min(limit, len(numbers))
got_sort_keys = [x['c'] for x in got_items]
assert got_sort_keys == reversed_numbers[0:limit]
# Test that paging also works properly with reverse order
# (ScanIndexForward=false), i.e., reverse-order queries can be resumed
@pytest.mark.xfail(reason="ScanIndexForward not supported yet")
def test_query_reverse_paging(test_table_sn):
numbers = [Decimal(i) for i in range(20)]
# Insert these numbers, in random order, into one partition:
p = random_string()
items = [{'p': p, 'c': num} for num in random.sample(numbers, len(numbers))]
with test_table_sn.batch_writer() as batch:
for item in items:
batch.put_item(item)
reversed_numbers = list(reversed(numbers))
# Verify that with ScanIndexForward=False, full_query() returns all
# these numbers in reversed sorted order - getting pages of Limit items
# at a time and resuming the query.
for limit in [1, 2, 3, 7, 10, 17, 100, 10000]:
got_items = full_query(test_table_sn, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, ScanIndexForward=False, Limit=limit)
got_sort_keys = [x['c'] for x in got_items]
assert got_sort_keys == reversed_numbers

@@ -0,0 +1,226 @@
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Tests for the ReturnValues parameter for the different update operations
# (PutItem, UpdateItem, DeleteItem).
import pytest
from botocore.exceptions import ClientError
from util import random_string
# Test trivial support for the ReturnValues parameter in PutItem, UpdateItem
# and DeleteItem - test that "NONE" works (and changes nothing), while a
# completely unsupported value gives an error.
# This test is useful to check that before the ReturnValues parameter is fully
# implemented, it returns an error when a still-unsupported ReturnValues
# option is attempted in the request - instead of simply being ignored.
def test_trivial_returnvalues(test_table_s):
# PutItem:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi'})
ret=test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='NONE')
assert not 'Attributes' in ret
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='DOG')
# UpdateItem:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi', 'b': 'dog'})
ret=test_table_s.update_item(Key={'p': p}, ReturnValues='NONE',
UpdateExpression='SET b = :val',
ExpressionAttributeValues={':val': 'cat'})
assert not 'Attributes' in ret
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, ReturnValues='DOG',
UpdateExpression='SET a = a + :val',
ExpressionAttributeValues={':val': 1})
# DeleteItem:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi'})
ret=test_table_s.delete_item(Key={'p': p}, ReturnValues='NONE')
assert not 'Attributes' in ret
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.delete_item(Key={'p': p}, ReturnValues='DOG')
# Test the ReturnValues parameter on a PutItem operation. Only two settings
# are supported for this parameter for this operation: NONE (the default)
# and ALL_OLD.
@pytest.mark.xfail(reason="ReturnValues not supported")
def test_put_item_returnvalues(test_table_s):
# By default, the previous value of an item is not returned:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi'})
ret=test_table_s.put_item(Item={'p': p, 'a': 'hello'})
assert not 'Attributes' in ret
# Using ReturnValues=NONE is the same:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi'})
ret=test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='NONE')
assert not 'Attributes' in ret
# With ReturnValues=ALL_OLD, the old value of the item is returned
# in an "Attributes" attribute:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi'})
ret=test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='ALL_OLD')
assert ret['Attributes'] == {'p': p, 'a': 'hi'}
# Other ReturnValue options - UPDATED_OLD, ALL_NEW, UPDATED_NEW,
# are supported by other operations but not by PutItem:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='UPDATED_OLD')
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='ALL_NEW')
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='UPDATED_NEW')
# Also, obviously, an unsupported setting like "DOG" results in an error:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='DOG')
# The ReturnValues value is case sensitive, so while "NONE" is supported
# (and tested above), "none" isn't:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='none')
# Test the ReturnValues parameter on a DeleteItem operation. Only two settings
# are supported for this parameter for this operation: NONE (the default)
# and ALL_OLD.
@pytest.mark.xfail(reason="ReturnValues not supported")
def test_delete_item_returnvalues(test_table_s):
# By default, the previous value of an item is not returned:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi'})
ret=test_table_s.delete_item(Key={'p': p})
assert not 'Attributes' in ret
# Using ReturnValues=NONE is the same:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi'})
ret=test_table_s.delete_item(Key={'p': p}, ReturnValues='NONE')
assert not 'Attributes' in ret
# With ReturnValues=ALL_OLD, the old value of the item is returned
# in an "Attributes" attribute:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi'})
ret=test_table_s.delete_item(Key={'p': p}, ReturnValues='ALL_OLD')
assert ret['Attributes'] == {'p': p, 'a': 'hi'}
# Other ReturnValue options - UPDATED_OLD, ALL_NEW, UPDATED_NEW,
# are supported by other operations but not by DeleteItem:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.delete_item(Key={'p': p}, ReturnValues='UPDATED_OLD')
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.delete_item(Key={'p': p}, ReturnValues='ALL_NEW')
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.delete_item(Key={'p': p}, ReturnValues='UPDATED_NEW')
# Also, obviously, an unsupported setting like "DOG" results in an error:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.delete_item(Key={'p': p}, ReturnValues='DOG')
# The ReturnValues value is case sensitive, so while "NONE" is supported
# (and tested above), "none" isn't:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.delete_item(Key={'p': p}, ReturnValues='none')
# Test the ReturnValues parameter on a UpdateItem operation. All five
# settings are supported for this parameter for this operation: NONE
# (the default), ALL_OLD, UPDATED_OLD, ALL_NEW and UPDATED_NEW.
@pytest.mark.xfail(reason="ReturnValues not supported")
def test_update_item_returnvalues(test_table_s):
# By default, the previous value of an item is not returned:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi', 'b': 'dog'})
ret=test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = :val',
ExpressionAttributeValues={':val': 'cat'})
assert not 'Attributes' in ret
# Using ReturnValues=NONE is the same:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi', 'b': 'dog'})
ret=test_table_s.update_item(Key={'p': p}, ReturnValues='NONE',
UpdateExpression='SET b = :val',
ExpressionAttributeValues={':val': 'cat'})
assert not 'Attributes' in ret
# With ReturnValues=ALL_OLD, the entire old value of the item (even
# attributes we did not modify) is returned in an "Attributes" attribute:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi', 'b': 'dog'})
ret=test_table_s.update_item(Key={'p': p}, ReturnValues='ALL_OLD',
UpdateExpression='SET b = :val',
ExpressionAttributeValues={':val': 'cat'})
assert ret['Attributes'] == {'p': p, 'a': 'hi', 'b': 'dog'}
# With ReturnValues=UPDATED_OLD, only the overwritten attributes of the
# old item are returned in an "Attributes" attribute:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi', 'b': 'dog'})
ret=test_table_s.update_item(Key={'p': p}, ReturnValues='UPDATED_OLD',
UpdateExpression='SET b = :val, c = :val2',
ExpressionAttributeValues={':val': 'cat', ':val2': 'hello'})
assert ret['Attributes'] == {'b': 'dog'}
# Even if an update overwrites an attribute by the same value again,
# this is considered an update, and the old value (identical to the
# new one) is returned:
ret=test_table_s.update_item(Key={'p': p}, ReturnValues='UPDATED_OLD',
UpdateExpression='SET b = :val',
ExpressionAttributeValues={':val': 'cat'})
assert ret['Attributes'] == {'b': 'cat'}
# Deleting an attribute also counts as overwriting it, of course:
ret=test_table_s.update_item(Key={'p': p}, ReturnValues='UPDATED_OLD',
UpdateExpression='REMOVE b')
assert ret['Attributes'] == {'b': 'cat'}
# With ReturnValues=ALL_NEW, the entire new value of the item (including
# old attributes we did not modify) is returned:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi', 'b': 'dog'})
ret=test_table_s.update_item(Key={'p': p}, ReturnValues='ALL_NEW',
UpdateExpression='SET b = :val',
ExpressionAttributeValues={':val': 'cat'})
assert ret['Attributes'] == {'p': p, 'a': 'hi', 'b': 'cat'}
# With ReturnValues=UPDATED_NEW, only the new value of the updated
# attributes are returned. Note that "updated attributes" means
# the newly set attributes - it doesn't require that these attributes
# have any previous values
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi', 'b': 'dog'})
ret=test_table_s.update_item(Key={'p': p}, ReturnValues='UPDATED_NEW',
UpdateExpression='SET b = :val, c = :val2',
ExpressionAttributeValues={':val': 'cat', ':val2': 'hello'})
assert ret['Attributes'] == {'b': 'cat', 'c': 'hello'}
# Deleting an attribute also counts as overwriting it, but the deleted
# attribute is not returned in the response - so 'Attributes' is absent here.
ret=test_table_s.update_item(Key={'p': p}, ReturnValues='UPDATED_NEW',
UpdateExpression='REMOVE b')
assert not 'Attributes' in ret
# In the above examples, UPDATED_NEW is not useful because it just
# returns the new values we already know from the request... UPDATED_NEW
# becomes more useful in read-modify-write operations:
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 1})
ret=test_table_s.update_item(Key={'p': p}, ReturnValues='UPDATED_NEW',
UpdateExpression='SET a = a + :val',
ExpressionAttributeValues={':val': 1})
assert ret['Attributes'] == {'a': 2}
# An unsupported setting like "DOG" also results in an error:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, ReturnValues='DOG',
UpdateExpression='SET a = a + :val',
ExpressionAttributeValues={':val': 1})
# The ReturnValues value is case sensitive, so while "NONE" is supported
# (and tested above), "none" isn't:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, ReturnValues='none',
UpdateExpression='SET a = a + :val',
ExpressionAttributeValues={':val': 1})
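The five modes exercised above can be summarized by a small reference model (hypothetical, pure Python - not the server implementation): given the item before and after the update, plus the set of attribute names the update touched, compute what the response's 'Attributes' field should contain. The "touched" set is an explicit input because, as tested above, overwriting an attribute with its old value still counts as an update.

```python
# Hypothetical reference model of UpdateItem's five ReturnValues modes.
def returned_attributes(old, new, updated, mode):
    # old/new: item before/after the update; updated: attribute names touched
    if mode == 'NONE':
        return None                                 # no Attributes in response
    if mode == 'ALL_OLD':
        return dict(old)
    if mode == 'ALL_NEW':
        return dict(new)
    if mode == 'UPDATED_OLD':
        return {k: old[k] for k in updated if k in old}
    if mode == 'UPDATED_NEW':
        return {k: new[k] for k in updated if k in new}
    raise ValueError('ValidationException: ' + mode)  # e.g. 'DOG', 'none'

old = {'p': 'x', 'a': 'hi', 'b': 'dog'}
new = {'p': 'x', 'a': 'hi', 'b': 'cat', 'c': 'hello'}
updated = {'b', 'c'}                                # SET b = :val, c = :val2
assert returned_attributes(old, new, updated, 'ALL_OLD') == old
assert returned_attributes(old, new, updated, 'UPDATED_OLD') == {'b': 'dog'}
assert returned_attributes(old, new, updated, 'UPDATED_NEW') == {'b': 'cat', 'c': 'hello'}
```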

@@ -0,0 +1,252 @@
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Tests for the Scan operation
import pytest
from botocore.exceptions import ClientError
from util import random_string, full_scan, full_scan_and_count, multiset
from boto3.dynamodb.conditions import Attr
# Test that scanning works fine with/without pagination
def test_scan_basic(filled_test_table):
test_table, items = filled_test_table
for limit in [None,1,2,4,33,50,100,9007,16*1024*1024]:
pos = None
got_items = []
while True:
if limit:
response = test_table.scan(Limit=limit, ExclusiveStartKey=pos) if pos else test_table.scan(Limit=limit)
assert len(response['Items']) <= limit
else:
response = test_table.scan(ExclusiveStartKey=pos) if pos else test_table.scan()
pos = response.get('LastEvaluatedKey', None)
got_items += response['Items']
if not pos:
break
assert len(items) == len(got_items)
assert multiset(items) == multiset(got_items)
def test_scan_with_paginator(dynamodb, filled_test_table):
test_table, items = filled_test_table
paginator = dynamodb.meta.client.get_paginator('scan')
got_items = []
for page in paginator.paginate(TableName=test_table.name):
got_items += page['Items']
assert len(items) == len(got_items)
assert multiset(items) == multiset(got_items)
for page_size in [1, 17, 1234]:
got_items = []
for page in paginator.paginate(TableName=test_table.name, PaginationConfig={'PageSize': page_size}):
got_items += page['Items']
assert len(items) == len(got_items)
assert multiset(items) == multiset(got_items)
# Although partitions are scanned in seemingly-random order, inside a
# partition items must be returned by Scan sorted in sort-key order.
# This test verifies this, for string sort key. We'll need separate
# tests for the other sort-key types (number and binary)
def test_scan_sort_order_string(filled_test_table):
test_table, items = filled_test_table
got_items = full_scan(test_table)
assert len(items) == len(got_items)
# Extract just the sort key ("c") from the partition "long"
items_long = [x['c'] for x in items if x['p'] == 'long']
got_items_long = [x['c'] for x in got_items if x['p'] == 'long']
# Verify that got_items_long are already sorted (in string order)
assert sorted(got_items_long) == got_items_long
# Verify that got_items_long are a sorted version of the expected items_long
assert sorted(items_long) == got_items_long
# Test Scan with the AttributesToGet parameter. Result should include the
# selected attributes only - if one wants the key attributes as well, one
# needs to select them explicitly. When no key attributes are selected,
# some items may have *none* of the selected attributes. Those items are
# returned too, as empty items - they are not outright missing.
def test_scan_attributes_to_get(dynamodb, filled_test_table):
table, items = filled_test_table
for wanted in [ ['another'], # only non-key attributes (one item doesn't have it!)
['c', 'another'], # a key attribute (sort key) and non-key
['p', 'c'], # entire key
['nonexistent'] # none of the items have this attribute!
]:
print(wanted)
got_items = full_scan(table, AttributesToGet=wanted)
expected_items = [{k: x[k] for k in wanted if k in x} for x in items]
assert multiset(expected_items) == multiset(got_items)
def test_scan_with_attribute_equality_filtering(dynamodb, filled_test_table):
table, items = filled_test_table
scan_filter = {
"attribute" : {
"AttributeValueList" : [ "xxxxx" ],
"ComparisonOperator": "EQ"
}
}
got_items = full_scan(table, ScanFilter=scan_filter)
expected_items = [item for item in items if "attribute" in item.keys() and item["attribute"] == "xxxxx" ]
assert multiset(expected_items) == multiset(got_items)
scan_filter = {
"another" : {
"AttributeValueList" : [ "y" ],
"ComparisonOperator": "EQ"
},
"attribute" : {
"AttributeValueList" : [ "xxxxx" ],
"ComparisonOperator": "EQ"
}
}
got_items = full_scan(table, ScanFilter=scan_filter)
expected_items = [item for item in items if "attribute" in item.keys() and item["attribute"] == "xxxxx" and item["another"] == "y" ]
assert multiset(expected_items) == multiset(got_items)
# Test that FilterExpression works as expected
@pytest.mark.xfail(reason="FilterExpression not supported yet")
def test_scan_filter_expression(filled_test_table):
test_table, items = filled_test_table
got_items = full_scan(test_table, FilterExpression=Attr("attribute").eq("xxxx"))
print(got_items)
assert multiset([item for item in items if 'attribute' in item.keys() and item['attribute'] == 'xxxx']) == multiset(got_items)
got_items = full_scan(test_table, FilterExpression=Attr("attribute").eq("xxxx") & Attr("another").eq("yy"))
print(got_items)
assert multiset([item for item in items if 'attribute' in item.keys() and 'another' in item.keys() and item['attribute'] == 'xxxx' and item['another'] == 'yy']) == multiset(got_items)
def test_scan_with_key_equality_filtering(dynamodb, filled_test_table):
table, items = filled_test_table
scan_filter_p = {
"p" : {
"AttributeValueList" : [ "7" ],
"ComparisonOperator": "EQ"
}
}
scan_filter_c = {
"c" : {
"AttributeValueList" : [ "9" ],
"ComparisonOperator": "EQ"
}
}
scan_filter_p_and_attribute = {
"p" : {
"AttributeValueList" : [ "7" ],
"ComparisonOperator": "EQ"
},
"attribute" : {
"AttributeValueList" : [ "x"*7 ],
"ComparisonOperator": "EQ"
}
}
scan_filter_c_and_another = {
"c" : {
"AttributeValueList" : [ "9" ],
"ComparisonOperator": "EQ"
},
"another" : {
"AttributeValueList" : [ "y"*16 ],
"ComparisonOperator": "EQ"
}
}
# Filtering on the hash key
got_items = full_scan(table, ScanFilter=scan_filter_p)
expected_items = [item for item in items if "p" in item.keys() and item["p"] == "7" ]
assert multiset(expected_items) == multiset(got_items)
# Filtering on the sort key
got_items = full_scan(table, ScanFilter=scan_filter_c)
expected_items = [item for item in items if "c" in item.keys() and item["c"] == "9"]
assert multiset(expected_items) == multiset(got_items)
# Filtering on the hash key and an attribute
got_items = full_scan(table, ScanFilter=scan_filter_p_and_attribute)
expected_items = [item for item in items if "p" in item.keys() and "attribute" in item.keys() and item["p"] == "7" and item["attribute"] == "x"*7]
assert multiset(expected_items) == multiset(got_items)
# Filtering on the sort key and an attribute
got_items = full_scan(table, ScanFilter=scan_filter_c_and_another)
expected_items = [item for item in items if "c" in item.keys() and "another" in item.keys() and item["c"] == "9" and item["another"] == "y"*16]
assert multiset(expected_items) == multiset(got_items)
# Test the "Select" parameter of Scan. The default Select mode,
# ALL_ATTRIBUTES, returns items with all their attributes. Other modes
# allow returning just specific attributes or just counting the results
# without returning items at all.
@pytest.mark.xfail(reason="Select not supported yet")
def test_scan_select(filled_test_table):
test_table, items = filled_test_table
got_items = full_scan(test_table)
# By default, a scan returns all the items, with all their attributes:
# query returns all attributes:
got_items = full_scan(test_table)
assert multiset(items) == multiset(got_items)
# Select=ALL_ATTRIBUTES does exactly the same as the default - return
# all attributes:
got_items = full_scan(test_table, Select='ALL_ATTRIBUTES')
assert multiset(items) == multiset(got_items)
# Select=ALL_PROJECTED_ATTRIBUTES is not allowed on a base table (it
# is just for indexes, when IndexName is specified)
with pytest.raises(ClientError, match='ValidationException'):
full_scan(test_table, Select='ALL_PROJECTED_ATTRIBUTES')
# Select=SPECIFIC_ATTRIBUTES requires that either a AttributesToGet
# or ProjectionExpression appears, but then really does nothing beyond
# what AttributesToGet and ProjectionExpression already do:
with pytest.raises(ClientError, match='ValidationException'):
full_scan(test_table, Select='SPECIFIC_ATTRIBUTES')
wanted = ['c', 'another']
got_items = full_scan(test_table, Select='SPECIFIC_ATTRIBUTES', AttributesToGet=wanted)
expected_items = [{k: x[k] for k in wanted if k in x} for x in items]
assert multiset(expected_items) == multiset(got_items)
got_items = full_scan(test_table, Select='SPECIFIC_ATTRIBUTES', ProjectionExpression=','.join(wanted))
assert multiset(expected_items) == multiset(got_items)
# Select=COUNT just returns a count - not any items
(got_count, got_items) = full_scan_and_count(test_table, Select='COUNT')
assert got_count == len(items)
assert got_items == []
# Check that we also get a count in regular scans - not just with
# Select=COUNT, but without Select=COUNT we get both items and count:
(got_count, got_items) = full_scan_and_count(test_table)
assert got_count == len(items)
assert multiset(items) == multiset(got_items)
# Select with some unknown string generates a validation exception:
with pytest.raises(ClientError, match='ValidationException'):
full_scan(test_table, Select='UNKNOWN')
# Test parallel scan, i.e., the Segments and TotalSegments options.
# In the following test we check that these parameters allow splitting
# a scan into multiple parts, and that these parts are in fact disjoint,
# and their union is the entire contents of the table. We do not actually
# try to run these queries in *parallel* in this test.
@pytest.mark.xfail(reason="parallel scan not supported yet")
def test_scan_parallel(filled_test_table):
test_table, items = filled_test_table
for nsegments in [1, 2, 17]:
print('Testing TotalSegments={}'.format(nsegments))
got_items = []
for segment in range(nsegments):
got_items.extend(full_scan(test_table, TotalSegments=nsegments, Segment=segment))
# The following comparison verifies that each of the expected item
# in items was returned in one - and just one - of the segments.
assert multiset(items) == multiset(got_items)
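`full_scan` and `full_scan_and_count` are also helpers from the suite's `util` module. boto3's Scan returns at most one page of results plus a `LastEvaluatedKey` when more data remains, so a complete scan must loop, resuming with `ExclusiveStartKey`. A hedged sketch of such a pager (the names and exact signatures are assumptions inferred from how the tests above call them):

```python
def full_scan_and_count(table, **kwargs):
    # Scan the whole table, following LastEvaluatedKey page by page.
    # Returns (count, items): Scan responses carry a per-page 'Count'
    # even when Select=COUNT suppresses the items themselves.
    items = []
    count = 0
    response = table.scan(**kwargs)
    while True:
        count += response.get('Count', 0)
        items.extend(response.get('Items', []))
        last = response.get('LastEvaluatedKey')
        if last is None:
            break
        response = table.scan(ExclusiveStartKey=last, **kwargs)
    return (count, items)

def full_scan(table, **kwargs):
    # Most tests only care about the items, not the count.
    return full_scan_and_count(table, **kwargs)[1]
```

This also explains why `test_scan_parallel` can simply concatenate per-segment results: each segment is itself fully paged before moving to the next.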


@@ -0,0 +1,276 @@
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Tests for basic table operations: CreateTable, DeleteTable, ListTables.
import pytest
from botocore.exceptions import ClientError
from util import list_tables, test_table_name, create_test_table, random_string
# Utility function for creating a table with a given name and some valid
# schema. This function initiates the table's creation, but doesn't
# wait for the table to actually become ready.
def create_table(dynamodb, name, BillingMode='PAY_PER_REQUEST', **kwargs):
return dynamodb.create_table(
TableName=name,
BillingMode=BillingMode,
KeySchema=[
{
'AttributeName': 'p',
'KeyType': 'HASH'
},
{
'AttributeName': 'c',
'KeyType': 'RANGE'
}
],
AttributeDefinitions=[
{
'AttributeName': 'p',
'AttributeType': 'S'
},
{
'AttributeName': 'c',
'AttributeType': 'S'
},
],
**kwargs
)
# Utility function for creating a table with a given name, and then deleting
# it immediately, waiting for these operations to complete. Since the wait
# uses DescribeTable, this function requires all of CreateTable, DescribeTable
# and DeleteTable to work correctly.
# Note that in DynamoDB, table deletion takes a very long time, so tests
# successfully using this function are very slow.
def create_and_delete_table(dynamodb, name, **kwargs):
table = create_table(dynamodb, name, **kwargs)
table.meta.client.get_waiter('table_exists').wait(TableName=name)
table.delete()
table.meta.client.get_waiter('table_not_exists').wait(TableName=name)
##############################################################################
# Test creating a table, and then deleting it, waiting for each operation
# to have completed before proceeding. Since the wait uses DescribeTable,
# this test requires all of CreateTable, DescribeTable and DeleteTable to
# function properly in their basic use cases.
# Unfortunately, this test is extremely slow with DynamoDB because deleting
# a table is extremely slow until it really happens.
def test_create_and_delete_table(dynamodb):
create_and_delete_table(dynamodb, 'alternator_test')
# DynamoDB documentation specifies that table names must be 3-255 characters,
# and match the regex [a-zA-Z0-9._-]+. Names not matching these rules should
# be rejected, and no table be created.
def test_create_table_unsupported_names(dynamodb):
from botocore.exceptions import ParamValidationError, ClientError
# Interestingly, the boto library tests for names shorter than the
# minimum length (3 characters) immediately, and failure results in
# ParamValidationError. But the other invalid names are passed to
# DynamoDB, which returns an HTTP response code, which results in a
# ClientError exception.
with pytest.raises(ParamValidationError):
create_table(dynamodb, 'n')
with pytest.raises(ParamValidationError):
create_table(dynamodb, 'nn')
with pytest.raises(ClientError, match='ValidationException'):
create_table(dynamodb, 'n' * 256)
with pytest.raises(ClientError, match='ValidationException'):
create_table(dynamodb, 'nyh@test')
# On the other hand, names following the above rules should be accepted. Even
# names which the Scylla rules forbid, such as a name starting with .
def test_create_and_delete_table_non_scylla_name(dynamodb):
create_and_delete_table(dynamodb, '.alternator_test')
# Names with 255 characters are allowed in DynamoDB, but they are not currently
# supported in Scylla because we create a directory whose name is the table's
# name followed by 33 bytes (underscore and UUID). So currently, we only
# correctly support names with length up to 222.
def test_create_and_delete_table_very_long_name(dynamodb):
# In the future, this should work:
#create_and_delete_table(dynamodb, 'n' * 255)
# But for now, only 222 works:
create_and_delete_table(dynamodb, 'n' * 222)
# We cannot test the following on DynamoDB because it will succeed
# (DynamoDB allows up to 255 bytes)
#with pytest.raises(ClientError, match='ValidationException'):
# create_table(dynamodb, 'n' * 223)
# Test that creating a table with an invalid schema returns a
# ValidationException error.
def test_create_table_invalid_schema(dynamodb):
# The name of the table "created" by this test shouldn't matter, the
# creation should not succeed anyway.
with pytest.raises(ClientError, match='ValidationException'):
dynamodb.create_table(
TableName='name_doesnt_matter',
BillingMode='PAY_PER_REQUEST',
KeySchema=[
{ 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'c', 'KeyType': 'HASH' }
],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'c', 'AttributeType': 'S' },
],
)
with pytest.raises(ClientError, match='ValidationException'):
dynamodb.create_table(
TableName='name_doesnt_matter',
BillingMode='PAY_PER_REQUEST',
KeySchema=[
{ 'AttributeName': 'p', 'KeyType': 'RANGE' },
{ 'AttributeName': 'c', 'KeyType': 'RANGE' }
],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'c', 'AttributeType': 'S' },
],
)
with pytest.raises(ClientError, match='ValidationException'):
dynamodb.create_table(
TableName='name_doesnt_matter',
BillingMode='PAY_PER_REQUEST',
KeySchema=[
{ 'AttributeName': 'c', 'KeyType': 'RANGE' }
],
AttributeDefinitions=[
{ 'AttributeName': 'c', 'AttributeType': 'S' },
],
)
with pytest.raises(ClientError, match='ValidationException'):
dynamodb.create_table(
TableName='name_doesnt_matter',
BillingMode='PAY_PER_REQUEST',
KeySchema=[
{ 'AttributeName': 'c', 'KeyType': 'HASH' },
{ 'AttributeName': 'p', 'KeyType': 'RANGE' },
{ 'AttributeName': 'z', 'KeyType': 'RANGE' }
],
AttributeDefinitions=[
{ 'AttributeName': 'c', 'AttributeType': 'S' },
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'z', 'AttributeType': 'S' }
],
)
with pytest.raises(ClientError, match='ValidationException'):
dynamodb.create_table(
TableName='name_doesnt_matter',
BillingMode='PAY_PER_REQUEST',
KeySchema=[
{ 'AttributeName': 'c', 'KeyType': 'HASH' },
],
AttributeDefinitions=[
{ 'AttributeName': 'z', 'AttributeType': 'S' }
],
)
with pytest.raises(ClientError, match='ValidationException'):
dynamodb.create_table(
TableName='name_doesnt_matter',
BillingMode='PAY_PER_REQUEST',
KeySchema=[
{ 'AttributeName': 'k', 'KeyType': 'HASH' },
],
AttributeDefinitions=[
{ 'AttributeName': 'k', 'AttributeType': 'Q' }
],
)
# Test that trying to create a table that already exists fails in the
# appropriate way (ResourceInUseException)
def test_create_table_already_exists(dynamodb, test_table):
with pytest.raises(ClientError, match='ResourceInUseException'):
create_table(dynamodb, test_table.name)
# Test that BillingMode error path works as expected - only the values
# PROVISIONED or PAY_PER_REQUEST are allowed. The former requires
# ProvisionedThroughput to be set, the latter forbids it.
# If BillingMode is outright missing, it defaults (as original
# DynamoDB did) to PROVISIONED so ProvisionedThroughput is allowed.
def test_create_table_billing_mode_errors(dynamodb, test_table):
with pytest.raises(ClientError, match='ValidationException'):
create_table(dynamodb, test_table_name(), BillingMode='unknown')
# billing mode is case-sensitive
with pytest.raises(ClientError, match='ValidationException'):
create_table(dynamodb, test_table_name(), BillingMode='pay_per_request')
# PAY_PER_REQUEST cannot come with a ProvisionedThroughput:
with pytest.raises(ClientError, match='ValidationException'):
create_table(dynamodb, test_table_name(),
BillingMode='PAY_PER_REQUEST', ProvisionedThroughput={'ReadCapacityUnits': 10, 'WriteCapacityUnits': 10})
# On the other hand, PROVISIONED requires ProvisionedThroughput:
# By the way, ProvisionedThroughput not only needs to appear, it must
# have both ReadCapacityUnits and WriteCapacityUnits - but we can't test
# this with boto3, because boto3 has its own verification that if
# ProvisionedThroughput is given, it must have the correct form.
with pytest.raises(ClientError, match='ValidationException'):
create_table(dynamodb, test_table_name(), BillingMode='PROVISIONED')
# If BillingMode is completely missing, it defaults to PROVISIONED, so
# ProvisionedThroughput is required
with pytest.raises(ClientError, match='ValidationException'):
dynamodb.create_table(TableName=test_table_name(),
KeySchema=[{ 'AttributeName': 'p', 'KeyType': 'HASH' }],
AttributeDefinitions=[{ 'AttributeName': 'p', 'AttributeType': 'S' }])
# Our first implementation had a special column name called "attrs" where
# we stored a map for all non-key columns. If the user tried to name one
# of the key columns with this same name, the result was a disaster - Scylla
# goes into a bad state after trying to write data with two updates to same-
# named columns.
special_column_name1 = 'attrs'
special_column_name2 = ':attrs'
@pytest.fixture(scope="session")
def test_table_special_column_name(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[
{ 'AttributeName': special_column_name1, 'KeyType': 'HASH' },
{ 'AttributeName': special_column_name2, 'KeyType': 'RANGE' }
],
AttributeDefinitions=[
{ 'AttributeName': special_column_name1, 'AttributeType': 'S' },
{ 'AttributeName': special_column_name2, 'AttributeType': 'S' },
],
)
yield table
table.delete()
@pytest.mark.xfail(reason="special attrs column not yet hidden correctly")
def test_create_table_special_column_name(test_table_special_column_name):
s = random_string()
c = random_string()
h = random_string()
expected = {special_column_name1: s, special_column_name2: c, 'hello': h}
test_table_special_column_name.put_item(Item=expected)
got = test_table_special_column_name.get_item(Key={special_column_name1: s, special_column_name2: c}, ConsistentRead=True)['Item']
assert got == expected
# Test that all tables we create are listed, and pagination works properly.
# Note that the DynamoDB setup we run this against may have hundreds of
# other tables, for all we know. We just need to check that the tables we
# created are indeed listed.
def test_list_tables_paginated(dynamodb, test_table, test_table_s, test_table_b):
my_tables_set = {table.name for table in [test_table, test_table_s, test_table_b]}
for limit in [1, 2, 3, 4, 50, 100]:
print("testing limit={}".format(limit))
list_tables_set = set(list_tables(dynamodb, limit))
assert my_tables_set.issubset(list_tables_set)
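`list_tables` is likewise imported from `util`. ListTables pages its results through `LastEvaluatedTableName` / `ExclusiveStartTableName`, which is what the `limit` loop above is exercising. A sketch of how such a helper might drive boto3's paging (an assumption about the helper's shape, not the repository's code):

```python
def list_tables(dynamodb, limit=100):
    # Collect all table names, asking for at most `limit` names per call
    # and resuming from LastEvaluatedTableName until it disappears.
    client = dynamodb.meta.client
    names = []
    kwargs = {'Limit': limit}
    while True:
        response = client.list_tables(**kwargs)
        names.extend(response['TableNames'])
        last = response.get('LastEvaluatedTableName')
        if last is None:
            return names
        kwargs['ExclusiveStartTableName'] = last
```

With `limit=1` this forces one request per table, so the subset assertion above verifies that no table is lost across page boundaries.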
# Test that pagination limit is validated
def test_list_tables_wrong_limit(dynamodb):
# lower limit (min. 1) is imposed by boto3 library checks
with pytest.raises(ClientError, match='ValidationException'):
dynamodb.meta.client.list_tables(Limit=101)


@@ -0,0 +1,854 @@
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Tests for the UpdateItem operations with an UpdateExpression parameter
import random
import string
import pytest
from botocore.exceptions import ClientError
from decimal import Decimal
from util import random_string
# The simplest test of using UpdateExpression to set a top-level attribute,
# instead of the older AttributeUpdates parameter.
# Checks only one "SET" action in an UpdateExpression.
def test_update_expression_set(test_table_s):
p = random_string()
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = :val1',
ExpressionAttributeValues={':val1': 4})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 4}
# An empty UpdateExpression is NOT allowed, and generates a "The expression
# can not be empty" error. This contrasts with an empty AttributeUpdates which
# is allowed, and results in the creation of an empty item if it didn't exist
# yet (see test_empty_update()).
def test_update_expression_empty(test_table_s):
p = random_string()
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='')
# A basic test with multiple SET actions in one expression
def test_update_expression_set_multi(test_table_s):
p = random_string()
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET x = :val1, y = :val1',
ExpressionAttributeValues={':val1': 4})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'x': 4, 'y': 4}
# SET can be used to copy an existing attribute to a new one
def test_update_expression_set_copy(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hello'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hello'}
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET b = a')
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hello', 'b': 'hello'}
# Copying a non-existing attribute generates an error
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET c = z')
# It turns out that attributes to be copied are read before the SET
# starts to write, so "SET x = :val1, y = x" does not work...
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET x = :val1, y = x', ExpressionAttributeValues={':val1': 4})
# SET z=z does nothing if z exists, or fails if it doesn't
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = a')
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hello', 'b': 'hello'}
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET z = z')
# We can also use name references in either LHS or RHS of SET, e.g.,
# SET #one = #two. We need to also take the references used in the RHS
# when we want to complain about unused names in ExpressionAttributeNames.
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #one = #two',
ExpressionAttributeNames={'#one': 'c', '#two': 'a'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hello', 'b': 'hello', 'c': 'hello'}
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #one = #two',
ExpressionAttributeNames={'#one': 'c', '#two': 'a', '#three': 'z'})
# Test for a read-before-write action where the value to be read is nested inside a "-" operator
def test_update_expression_set_nested_copy(test_table_s):
p = random_string()
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #n = :two',
ExpressionAttributeNames={'#n': 'n'}, ExpressionAttributeValues={':two': 2})
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #nn = :seven - #n',
ExpressionAttributeNames={'#nn': 'nn', '#n': 'n'}, ExpressionAttributeValues={':seven': 7})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'n': 2, 'nn': 5}
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #nnn = :nnn',
ExpressionAttributeNames={'#nnn': 'nnn'}, ExpressionAttributeValues={':nnn': [2,4]})
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #nnnn = list_append(:val1, #nnn)',
ExpressionAttributeNames={'#nnnn': 'nnnn', '#nnn': 'nnn'}, ExpressionAttributeValues={':val1': [1,3]})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'n': 2, 'nn': 5, 'nnn': [2,4], 'nnnn': [1,3,2,4]}
# Test for getting a key value with read-before-write
def test_update_expression_set_key(test_table_sn):
p = random_string()
test_table_sn.update_item(Key={'p': p, 'c': 7});
test_table_sn.update_item(Key={'p': p, 'c': 7}, UpdateExpression='SET #n = #p',
ExpressionAttributeNames={'#n': 'n', '#p': 'p'})
test_table_sn.update_item(Key={'p': p, 'c': 7}, UpdateExpression='SET #nn = #c + #c',
ExpressionAttributeNames={'#nn': 'nn', '#c': 'c'})
assert test_table_sn.get_item(Key={'p': p, 'c': 7}, ConsistentRead=True)['Item'] == {'p': p, 'c': 7, 'n': p, 'nn': 14}
# Simple test for the "REMOVE" action
def test_update_expression_remove(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hello', 'b': 'hi'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hello', 'b': 'hi'}
test_table_s.update_item(Key={'p': p}, UpdateExpression='REMOVE a')
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 'hi'}
# Demonstrate that although all DynamoDB examples give UpdateExpression
# action names in uppercase - e.g., "SET", it can actually be any case.
def test_update_expression_action_case(test_table_s):
p = random_string()
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET b = :val1', ExpressionAttributeValues={':val1': 3})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 3}
test_table_s.update_item(Key={'p': p}, UpdateExpression='set b = :val1', ExpressionAttributeValues={':val1': 4})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 4}
test_table_s.update_item(Key={'p': p}, UpdateExpression='sEt b = :val1', ExpressionAttributeValues={':val1': 5})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 5}
# Demonstrate that whitespace is ignored in UpdateExpression parsing.
def test_update_expression_action_whitespace(test_table_s):
p = random_string()
test_table_s.update_item(Key={'p': p}, UpdateExpression='set b = :val1', ExpressionAttributeValues={':val1': 4})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 4}
test_table_s.update_item(Key={'p': p}, UpdateExpression=' set b=:val1 ', ExpressionAttributeValues={':val1': 5})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 5}
# In UpdateExpression, the attribute name can appear directly in the expression
# (without a "#placeholder" notation) only if it is a single "token" as
# determined by DynamoDB's lexical analyzer rules: Such token is composed of
# alphanumeric characters whose first character must be alphabetic. Other
# names cause the parser to see multiple tokens, and produce syntax errors.
def test_update_expression_name_token(test_table_s):
p = random_string()
# Alphanumeric names starting with an alphabetical character work
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET alnum = :val1', ExpressionAttributeValues={':val1': 1})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['alnum'] == 1
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET Alpha_Numeric_123 = :val1', ExpressionAttributeValues={':val1': 2})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['Alpha_Numeric_123'] == 2
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET A123_ = :val1', ExpressionAttributeValues={':val1': 3})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['A123_'] == 3
# But alphanumeric names cannot start with underscore or digits.
# DynamoDB's lexical analyzer doesn't recognize them, and produces
# a ValidationException looking like:
# Invalid UpdateExpression: Syntax error; token: "_", near: "SET _123"
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET _123 = :val1', ExpressionAttributeValues={':val1': 3})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET _abc = :val1', ExpressionAttributeValues={':val1': 3})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET 123a = :val1', ExpressionAttributeValues={':val1': 3})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET 123 = :val1', ExpressionAttributeValues={':val1': 3})
# Various other non-alphanumeric characters split a token and are NOT allowed
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET hi-there = :val1', ExpressionAttributeValues={':val1': 3})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET hi$there = :val1', ExpressionAttributeValues={':val1': 3})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET "hithere" = :val1', ExpressionAttributeValues={':val1': 3})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET !hithere = :val1', ExpressionAttributeValues={':val1': 3})
# In addition to the literal names, DynamoDB also allows references to any
# name, using the "#reference" syntax. It turns out the reference name is
# also a token following the rules as above, with one interesting point:
# since "#" already started the token, the next character may be any
# alphanumeric and doesn't need to be only alphabetical.
# Note that the reference target - the actual attribute name - can include
# absolutely any characters, and we use silly_name below as an example
silly_name = '3can include any character!.#='
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #Alpha_Numeric_123 = :val1', ExpressionAttributeValues={':val1': 4}, ExpressionAttributeNames={'#Alpha_Numeric_123': silly_name})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'][silly_name] == 4
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #123a = :val1', ExpressionAttributeValues={':val1': 5}, ExpressionAttributeNames={'#123a': silly_name})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'][silly_name] == 5
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #123 = :val1', ExpressionAttributeValues={':val1': 6}, ExpressionAttributeNames={'#123': silly_name})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'][silly_name] == 6
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #_ = :val1', ExpressionAttributeValues={':val1': 7}, ExpressionAttributeNames={'#_': silly_name})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'][silly_name] == 7
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #hi-there = :val1', ExpressionAttributeValues={':val1': 7}, ExpressionAttributeNames={'#hi-there': silly_name})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #!hi = :val1', ExpressionAttributeValues={':val1': 7}, ExpressionAttributeNames={'#!hi': silly_name})
# Just a "#" is not enough as a token. Interestingly, DynamoDB will
# find the bad name in ExpressionAttributeNames before it actually tries
# to parse UpdateExpression, but we can verify the parse fails too by
# using a valid but irrelevant name in ExpressionAttributeNames:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET # = :val1', ExpressionAttributeValues={':val1': 7}, ExpressionAttributeNames={'#': silly_name})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET # = :val1', ExpressionAttributeValues={':val1': 7}, ExpressionAttributeNames={'#a': silly_name})
# There is also the value references, ":reference", for the right-hand
# side of an assignment. These have similar naming rules like "#reference".
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :Alpha_Numeric_123', ExpressionAttributeValues={':Alpha_Numeric_123': 8})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 8
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :123a', ExpressionAttributeValues={':123a': 9})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 9
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :123', ExpressionAttributeValues={':123': 10})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 10
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :_', ExpressionAttributeValues={':_': 11})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 11
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :hi!there', ExpressionAttributeValues={':hi!there': 12})
# Just a ":" is not enough as a token.
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :', ExpressionAttributeValues={':': 7})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :', ExpressionAttributeValues={':a': 7})
# Trying to use a :reference on the left-hand side of an assignment will
# not work. In DynamoDB, it's a different type of token (and generates
# syntax error).
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET :a = :b', ExpressionAttributeValues={':a': 1, ':b': 2})
# Multiple actions are allowed in one expression, but actions are divided
# into clauses (SET, REMOVE, DELETE, ADD) and each of those can only appear
# once.
def test_update_expression_multi(test_table_s):
p = random_string()
# We can have two SET actions in one SET clause:
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :val1, b = :val2', ExpressionAttributeValues={':val1': 1, ':val2': 2})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 1, 'b': 2}
# But not two SET clauses - we get error "The "SET" section can only be used once in an update expression"
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :val1 SET b = :val2', ExpressionAttributeValues={':val1': 1, ':val2': 2})
# We can have a REMOVE and a SET clause (note no comma between clauses):
test_table_s.update_item(Key={'p': p}, UpdateExpression='REMOVE a SET b = :val2', ExpressionAttributeValues={':val2': 3})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 3}
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET c = :val2 REMOVE b', ExpressionAttributeValues={':val2': 3})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'c': 3}
# The same clause (e.g., SET) cannot be used twice, even if interleaved with something else
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :val1 REMOVE a SET b = :val2', ExpressionAttributeValues={':val1': 1, ':val2': 2})
# Trying to modify the same item twice in the same update is forbidden.
# For "SET a=:v REMOVE a" DynamoDB says: "Invalid UpdateExpression: Two
# document paths overlap with each other; must remove or rewrite one of
# these paths; path one: [a], path two: [a]".
# It is actually good for Scylla that such updates are forbidden, because had
# we allowed "SET a=:v REMOVE a" the result would be surprising - because data
# wins over a delete with the same timestamp, so "a" would be set despite the
# REMOVE command appearing later in the command line.
def test_update_expression_multi_overlap(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hello'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hello'}
# Neither "REMOVE a SET a = :v" nor "SET a = :v REMOVE a" are allowed:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='REMOVE a SET a = :v', ExpressionAttributeValues={':v': 'hi'})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :v REMOVE a', ExpressionAttributeValues={':v': 'yo'})
# It's also not allowed to set a twice in the same clause
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :v1, a = :v2', ExpressionAttributeValues={':v1': 'yo', ':v2': 'he'})
# Obviously, the paths are compared after the name references are evaluated
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #a1 = :v1, #a2 = :v2', ExpressionAttributeValues={':v1': 'yo', ':v2': 'he'}, ExpressionAttributeNames={'#a1': 'a', '#a2': 'a'})
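# A tiny self-contained model (our own sketch, not Alternator's actual
# reconciliation code) of the tie-break rule described above: when a write
# and a delete carry the same timestamp, the write (data) wins, which is
# why "SET a=:v REMOVE a" would surprisingly leave 'a' set if it were allowed.

```python
def reconcile(write_ts, delete_ts):
    # Data wins on a timestamp tie, so a delete only takes effect when it
    # is strictly newer than the competing write (a sketch, not real code).
    return 'deleted' if delete_ts > write_ts else 'data'

# Same timestamp: the write survives, despite the delete.
assert reconcile(write_ts=10, delete_ts=10) == 'data'
# Strictly newer delete: the delete takes effect.
assert reconcile(write_ts=10, delete_ts=11) == 'deleted'
```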
# The problem isn't just with identical paths - we can't modify two paths that
# "overlap" in the sense that one is the ancestor of the other.
@pytest.mark.xfail(reason="nested updates not yet implemented")
def test_update_expression_multi_overlap_nested(test_table_s):
p = random_string()
with pytest.raises(ClientError, match='ValidationException.*overlap'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :val1, a.b = :val2',
ExpressionAttributeValues={':val1': {'b': 7}, ':val2': 'there'})
test_table_s.put_item(Item={'p': p, 'a': {'b': {'c': 2}}})
with pytest.raises(ClientError, match='ValidationException.*overlap'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a.b = :val1, a.b.c = :val2',
ExpressionAttributeValues={':val1': 'hi', ':val2': 'there'})
# In the previous test we saw that *modifying* the same attribute twice in
# the same update is forbidden; but it *is* allowed to *read* an attribute
# in the same update that also modifies it, and we check this here.
def test_update_expression_multi_with_copy(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hello'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hello'}
# "REMOVE a SET b = a" works: as noted in test_update_expression_set_copy()
# the value of 'a' is read before the actual REMOVE operation happens.
test_table_s.update_item(Key={'p': p}, UpdateExpression='REMOVE a SET b = a')
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 'hello'}
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET c = b REMOVE b')
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'c': 'hello'}
# Test case where a :val1 is referenced, without being defined
def test_update_expression_set_missing_value(test_table_s):
p = random_string()
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = :val1',
ExpressionAttributeValues={':val2': 4})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = :val1')
# It is forbidden for ExpressionAttributeValues to contain values not used
# by the expression. DynamoDB produces an error like: "Value provided in
# ExpressionAttributeValues unused in expressions: keys: {:val1}"
def test_update_expression_spurious_value(test_table_s):
p = random_string()
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :val1',
ExpressionAttributeValues={':val1': 3, ':val2': 4})
# Test case where a #name is referenced, without being defined
def test_update_expression_set_missing_name(test_table_s):
p = random_string()
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET #name = :val1',
ExpressionAttributeValues={':val2': 4},
ExpressionAttributeNames={'#wrongname': 'hello'})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET #name = :val1',
ExpressionAttributeValues={':val2': 4})
# It is forbidden for ExpressionAttributeNames to contain names not used
# by the expression. DynamoDB produces an error like: "Value provided in
# ExpressionAttributeNames unused in expressions: keys: {#b}"
def test_update_expression_spurious_name(test_table_s):
p = random_string()
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #a = :val1',
ExpressionAttributeNames={'#a': 'hello', '#b': 'hi'},
ExpressionAttributeValues={':val1': 3, ':val2': 4})
# Test that the key attributes (hash key or sort key) cannot be modified
# by an update
def test_update_expression_cannot_modify_key(test_table):
p = random_string()
c = random_string()
with pytest.raises(ClientError, match='ValidationException.*key'):
test_table.update_item(Key={'p': p, 'c': c},
UpdateExpression='SET p = :val1', ExpressionAttributeValues={':val1': 4})
with pytest.raises(ClientError, match='ValidationException.*key'):
test_table.update_item(Key={'p': p, 'c': c},
UpdateExpression='SET c = :val1', ExpressionAttributeValues={':val1': 4})
with pytest.raises(ClientError, match='ValidationException.*key'):
test_table.update_item(Key={'p': p, 'c': c}, UpdateExpression='REMOVE p')
with pytest.raises(ClientError, match='ValidationException.*key'):
test_table.update_item(Key={'p': p, 'c': c}, UpdateExpression='REMOVE c')
with pytest.raises(ClientError, match='ValidationException.*key'):
test_table.update_item(Key={'p': p, 'c': c},
UpdateExpression='ADD p :val1', ExpressionAttributeValues={':val1': 4})
with pytest.raises(ClientError, match='ValidationException.*key'):
test_table.update_item(Key={'p': p, 'c': c},
UpdateExpression='ADD c :val1', ExpressionAttributeValues={':val1': 4})
with pytest.raises(ClientError, match='ValidationException.*key'):
test_table.update_item(Key={'p': p, 'c': c},
UpdateExpression='DELETE p :val1', ExpressionAttributeValues={':val1': set(['cat', 'mouse'])})
with pytest.raises(ClientError, match='ValidationException.*key'):
test_table.update_item(Key={'p': p, 'c': c},
UpdateExpression='DELETE c :val1', ExpressionAttributeValues={':val1': set(['cat', 'mouse'])})
# As sanity check, verify we *can* modify a non-key column
test_table.update_item(Key={'p': p, 'c': c}, UpdateExpression='SET a = :val1', ExpressionAttributeValues={':val1': 4})
assert test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item'] == {'p': p, 'c': c, 'a': 4}
test_table.update_item(Key={'p': p, 'c': c}, UpdateExpression='REMOVE a')
assert test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item'] == {'p': p, 'c': c}
# Test that trying to start an expression with some nonsense like HELLO
# instead of SET, REMOVE, ADD or DELETE, fails.
def test_update_expression_nonexistent_clause(test_table_s):
p = random_string()
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='HELLO b = :val1',
ExpressionAttributeValues={':val1': 4})
# Test support for "SET a = :val1 + :val2", "SET a = :val1 - :val2"
# Only exactly these combinations work - e.g., it's a syntax error to
# try to add three values. Trying to add a string also fails.
def test_update_expression_plus_basic(test_table_s):
p = random_string()
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = :val1 + :val2',
ExpressionAttributeValues={':val1': 4, ':val2': 3})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 7}
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = :val1 - :val2',
ExpressionAttributeValues={':val1': 5, ':val2': 2})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 3}
# Only the addition of exactly two values is supported!
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = :val1 + :val2 + :val3',
ExpressionAttributeValues={':val1': 4, ':val2': 3, ':val3': 2})
# Only numeric values can be added - other things like strings or lists
# cannot be added, and we get an error like "Incorrect operand type for
# operator or function; operator or function: +, operand type: S".
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = :val1 + :val2',
ExpressionAttributeValues={':val1': 'dog', ':val2': 3})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = :val1 + :val2',
ExpressionAttributeValues={':val1': ['a', 'b'], ':val2': ['1', '2']})
# While most of the Alternator code just saves high-precision numbers
# unchanged, the "+" and "-" operations need to calculate with them, and
# we should check the calculation isn't done with some lower-precision
# representation, e.g., double
def test_update_expression_plus_precision(test_table_s):
p = random_string()
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = :val1 + :val2',
ExpressionAttributeValues={':val1': Decimal("1"), ':val2': Decimal("10000000000000000000000")})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': Decimal("10000000000000000000001")}
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = :val2 - :val1',
ExpressionAttributeValues={':val1': Decimal("1"), ':val2': Decimal("10000000000000000000000")})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': Decimal("9999999999999999999999")}
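# A quick, self-contained illustration (plain Python, no DynamoDB involved)
# of why this matters: double-precision float silently loses the low-order
# digit in this calculation, while Decimal preserves it.

```python
from decimal import Decimal

big = Decimal("10000000000000000000000")
# Decimal keeps full precision:
assert big + Decimal("1") == Decimal("10000000000000000000001")
# The same calculation through double-precision float collapses back to
# the original value, losing the "+ 1":
assert float(big) + 1.0 == float(big)
```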
# Test support for "SET a = b + :val2" et al., i.e., a version of the
# above test_update_expression_plus_basic with read before write.
def test_update_expression_plus_rmw(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 2})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 2
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = a + :val1',
ExpressionAttributeValues={':val1': 3})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 5
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = :val1 + a',
ExpressionAttributeValues={':val1': 4})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 9
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = :val1 + a',
ExpressionAttributeValues={':val1': 1})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['b'] == 10
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = b + a')
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 19
# Test the list_append() function in SET, for the most basic use case of
# concatenating two value references. Because this is the first test of
# functions in SET, we also test some generic features of how functions
# are parsed.
def test_update_expression_list_append_basic(test_table_s):
p = random_string()
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = list_append(:val1, :val2)',
ExpressionAttributeValues={':val1': [4, 'hello'], ':val2': ['hi', 7]})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': [4, 'hello', 'hi', 7]}
# Unlike the operation name "SET", function names are case-sensitive!
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = LIST_APPEND(:val1, :val2)',
ExpressionAttributeValues={':val1': [4, 'hello'], ':val2': ['hi', 7]})
# As usual, extra spaces are ignored by the parser:
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = list_append( :val1  ,  :val2 )',
ExpressionAttributeValues={':val1': ['a'], ':val2': ['b']})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': ['a', 'b']}
# The list_append function only allows two parameters. The parser can
# correctly parse fewer or more, but then an error is generated: "Invalid
# UpdateExpression: Incorrect number of operands for operator or function;
# operator or function: list_append, number of operands: 1".
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = list_append(:val1)',
ExpressionAttributeValues={':val1': ['a']})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = list_append(:val1, :val2, :val3)',
ExpressionAttributeValues={':val1': [4, 'hello'], ':val2': [7], ':val3': ['a']})
# If list_append is used on value which isn't a list, we get
# error: "Invalid UpdateExpression: Incorrect operand type for operator
# or function; operator or function: list_append, operand type: S"
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = list_append(:val1, :val2)',
ExpressionAttributeValues={':val1': [4, 'hello'], ':val2': 'hi'})
# Additional list_append() tests, also using attribute paths as parameters
# (i.e., read-modify-write).
def test_update_expression_list_append(test_table_s):
p = random_string()
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = :val1',
ExpressionAttributeValues={':val1': ['hi', 2]})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == ['hi', 2]
# Often, list_append is used to append items to a list attribute
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = list_append(a, :val1)',
ExpressionAttributeValues={':val1': [4, 'hello']})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == ['hi', 2, 4, 'hello']
# But it can also be used to just concatenate in other ways:
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = list_append(:val1, a)',
ExpressionAttributeValues={':val1': ['dog']})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == ['dog', 'hi', 2, 4, 'hello']
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = list_append(a, :val1)',
ExpressionAttributeValues={':val1': ['cat']})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['b'] == ['dog', 'hi', 2, 4, 'hello', 'cat']
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET c = list_append(a, b)')
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['c'] == ['dog', 'hi', 2, 4, 'hello', 'dog', 'hi', 2, 4, 'hello', 'cat']
# As usual, #references are allowed instead of inline names:
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET #name1 = list_append(#name2,:val1)',
ExpressionAttributeValues={':val1': [8]},
ExpressionAttributeNames={'#name1': 'a', '#name2': 'a'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == ['dog', 'hi', 2, 4, 'hello', 8]
# Test the "if_not_exists" function in SET
# The test also checks additional features of function-call parsing.
def test_update_expression_if_not_exists(test_table_s):
p = random_string()
# Since attribute a doesn't exist, set it:
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = if_not_exists(a, :val1)',
ExpressionAttributeValues={':val1': 2})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 2
# Now the attribute does exist, so set does nothing:
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = if_not_exists(a, :val1)',
ExpressionAttributeValues={':val1': 3})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 2
# if_not_exists can also be used to check one attribute and set another,
# but note that if_not_exists(a, :val) means a's value if it exists,
# otherwise :val!
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = if_not_exists(c, :val1)',
ExpressionAttributeValues={':val1': 4})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['b'] == 4
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 2
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = if_not_exists(c, :val1)',
ExpressionAttributeValues={':val1': 5})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['b'] == 5
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = if_not_exists(a, :val1)',
ExpressionAttributeValues={':val1': 6})
# note how because 'a' does exist, its value is copied, overwriting b's
# value:
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['b'] == 2
# The parser expects function parameters to be value references, paths,
# or nested call to functions. Other crap will cause syntax errors:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = if_not_exists(non@sense, :val1)',
ExpressionAttributeValues={':val1': 6})
# if_not_exists() requires that the first parameter be a path. However,
# the parser doesn't know this, and allows a function parameter to also
# be a value reference or a function call. If we try one of these other
# things the parser succeeds, but we get a later error, looking like:
# "Invalid UpdateExpression: Operator or function requires a document
# path; operator or function: if_not_exists"
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = if_not_exists(if_not_exists(a, :val2), :val1)',
ExpressionAttributeValues={':val1': 6, ':val2': 3})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = if_not_exists(:val2, :val1)',
ExpressionAttributeValues={':val1': 6, ':val2': 3})
# Surprisingly, if the wrong argument is a :val value reference, the
# parser first tries to look it up in ExpressionAttributeValues (and
# fails if it's missing), before realizing any value reference would be
# wrong... So the following fails like the above does - but with a
# different error message (which we do not check here): "Invalid
# UpdateExpression: An expression attribute value used in expression
# is not defined; attribute value: :val2"
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = if_not_exists(:val2, :val1)',
ExpressionAttributeValues={':val1': 6})
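# A minimal local model (our own helper, not Alternator code) of the
# if_not_exists(path, default) semantics exercised above: the attribute's
# current value wins when it exists, otherwise the default is used.

```python
def if_not_exists(item, attr, default):
    # Mirrors 'SET x = if_not_exists(attr, :default)' for top-level attributes.
    return item[attr] if attr in item else default

item = {'a': 2}
item['b'] = if_not_exists(item, 'c', 4)   # 'c' missing: the default 4 is used
assert item == {'a': 2, 'b': 4}
item['b'] = if_not_exists(item, 'a', 6)   # 'a' exists: its value 2 wins
assert item == {'a': 2, 'b': 2}
```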
# When the expression parser parses a function call f(value, value), each
# value may itself be a function call - ad infinitum. So expressions like
# list_append(if_not_exists(a, :val1), :val2) are legal and so is deeper
# nesting.
@pytest.mark.xfail(reason="for unknown reason, DynamoDB does not allow nesting list_append")
def test_update_expression_function_nesting(test_table_s):
p = random_string()
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = list_append(if_not_exists(a, :val1), :val2)',
ExpressionAttributeValues={':val1': ['a', 'b'], ':val2': ['cat', 'dog']})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == ['a', 'b', 'cat', 'dog']
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = list_append(if_not_exists(a, :val1), :val2)',
ExpressionAttributeValues={':val1': ['a', 'b'], ':val2': ['1', '2']})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == ['a', 'b', 'cat', 'dog', '1', '2']
# I don't understand why the following expression isn't accepted, but it
# isn't! It produces an "Invalid UpdateExpression: The function is not
# allowed to be used this way in an expression; function: list_append".
# I don't know how to explain it. In any case, the *parsing* works -
# this is not a syntax error - the failure is in some verification later.
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = list_append(list_append(:val1, :val2), :val3)',
ExpressionAttributeValues={':val1': ['a'], ':val2': ['1'], ':val3': ['hi']})
# Ditto, the following passes the parser but fails some later check with
# the same error message as above.
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = list_append(list_append(list_append(:val1, :val2), :val3), :val4)',
ExpressionAttributeValues={':val1': ['a'], ':val2': ['1'], ':val3': ['hi'], ':val4': ['yo']})
# Verify how in SET expressions, "+" (or "-") nests with functions.
# We discover that f(x)+f(y) works but f(x+y) does NOT (results in a syntax
# error on the "+"). This means that the parser has two separate rules:
# 1. set_action: SET path = value + value
# 2. value: VALREF | NAME | NAME (value, ...)
def test_update_expression_function_plus_nesting(test_table_s):
p = random_string()
# As explained above, this - with "+" outside the expression, works:
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = if_not_exists(b, :val1)+:val2',
ExpressionAttributeValues={':val1': 2, ':val2': 3})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['b'] == 5
# ...but this - with the "+" inside an expression parameter, is a syntax
# error:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET c = if_not_exists(c, :val1+:val2)',
ExpressionAttributeValues={':val1': 5, ':val2': 4})
# This test tries to use an undefined function "f". This, obviously, fails,
# but were we to actually print the error we would see "Invalid
# UpdateExpression: Invalid function name; function: f". Not a syntax error.
# This means that the parser accepts any alphanumeric name as a function
# name, and only the later use of this function fails because it's not one
# of the supported functions.
def test_update_expression_unknown_function(test_table_s):
p = random_string()
with pytest.raises(ClientError, match='ValidationException.*f'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = f(b,c,d)')
with pytest.raises(ClientError, match='ValidationException.*f123_hi'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = f123_hi(b,c,d)')
# Just like unreferenced column names parsed by the DynamoDB parser,
# function names must also start with an alphabetic character. Trying
# to use _f as a function name will result in an actual syntax error,
# on the "_" token.
with pytest.raises(ClientError, match='ValidationException.*yntax error'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = _f(b,c,d)')
# Test "ADD" operation for numbers
def test_update_expression_add_numbers(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 3, 'b': 'hi'})
test_table_s.update_item(Key={'p': p},
UpdateExpression='ADD a :val1',
ExpressionAttributeValues={':val1': 4})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 7
# If the value to be added isn't a number, we get an error like "Invalid
# UpdateExpression: Incorrect operand type for operator or function;
# operator: ADD, operand type: STRING".
with pytest.raises(ClientError, match='ValidationException.*type'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='ADD a :val1',
ExpressionAttributeValues={':val1': 'hello'})
# Similarly, if the attribute we're adding to isn't a number, we get an
# error like "An operand in the update expression has an incorrect data
# type"
with pytest.raises(ClientError, match='ValidationException.*type'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='ADD b :val1',
ExpressionAttributeValues={':val1': 1})
# Test "ADD" operation for sets
def test_update_expression_add_sets(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': set(['dog', 'cat', 'mouse']), 'b': 'hi'})
test_table_s.update_item(Key={'p': p},
UpdateExpression='ADD a :val1',
ExpressionAttributeValues={':val1': set(['pig'])})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == set(['dog', 'cat', 'mouse', 'pig'])
# TODO: right now this test won't detect duplicated values in the returned result,
# because boto3 parses a set out of the returned JSON anyway. This check should
# leverage a lower-level API (if one exists) to ensure that the JSON contains no duplicates
# in the set representation. It has been verified manually.
test_table_s.put_item(Item={'p': p, 'a': set(['beaver', 'lynx', 'coati']), 'b': 'hi'})
test_table_s.update_item(Key={'p': p},
UpdateExpression='ADD a :val1',
ExpressionAttributeValues={':val1': set(['coati', 'beaver', 'badger'])})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == set(['beaver', 'badger', 'lynx', 'coati'])
# The value to be added needs to be a set of the same type - it can't
# be a single element or anything else. If the value has the wrong type,
# we get an error like "Invalid UpdateExpression: Incorrect operand type
# for operator or function; operator: ADD, operand type: STRING".
with pytest.raises(ClientError, match='ValidationException.*type'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='ADD a :val1',
ExpressionAttributeValues={':val1': 'hello'})
# Test "DELETE" operation for sets
def test_update_expression_delete_sets(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': set(['dog', 'cat', 'mouse']), 'b': 'hi'})
test_table_s.update_item(Key={'p': p},
UpdateExpression='DELETE a :val1',
ExpressionAttributeValues={':val1': set(['cat', 'mouse'])})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == set(['dog'])
# Deleting an element not present in the set is not an error - it just
# does nothing
test_table_s.update_item(Key={'p': p},
UpdateExpression='DELETE a :val1',
ExpressionAttributeValues={':val1': set(['pig'])})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == set(['dog'])
# The value to be deleted must be a set of the same type - it can't
# be a single element or anything else. If the value has the wrong type,
# we get an error like "Invalid UpdateExpression: Incorrect operand type
# for operator or function; operator: DELETE, operand type: STRING".
with pytest.raises(ClientError, match='ValidationException.*type'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='DELETE a :val1',
ExpressionAttributeValues={':val1': 'hello'})
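# For reference, the ADD and DELETE set semantics tested above map directly
# onto Python's set union and difference (a local analogy, not Alternator
# code):

```python
a = {'dog', 'cat', 'mouse'}
a |= {'pig'}                  # ADD: union with the given set
assert a == {'dog', 'cat', 'mouse', 'pig'}
a -= {'cat', 'mouse', 'rat'}  # DELETE: set difference; absent elements ('rat') are ignored
assert a == {'dog', 'pig'}
```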
######## Tests for paths and nested attribute updates:
# A dot inside a name in ExpressionAttributeNames is a literal dot, and
# results in a top-level attribute with an actual dot in its name - not
# a nested attribute path.
def test_update_expression_dot_in_name(test_table_s):
p = random_string()
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #a = :val1',
ExpressionAttributeValues={':val1': 3},
ExpressionAttributeNames={'#a': 'a.b'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a.b': 3}
# A basic test for direct update of a nested attribute: One of the top-level
# attributes is itself a document, and we update only one of that document's
# nested attributes.
@pytest.mark.xfail(reason="nested updates not yet implemented")
def test_update_expression_nested_attribute_dot(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': {'b': 3, 'c': 4}, 'd': 5})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': {'b': 3, 'c': 4}, 'd': 5}
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a.c = :val1',
ExpressionAttributeValues={':val1': 7})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': {'b': 3, 'c': 7}, 'd': 5}
# Of course we can also add new nested attributes, not just modify
# existing ones:
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a.d = :val1',
ExpressionAttributeValues={':val1': 3})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': {'b': 3, 'c': 7, 'd': 3}, 'd': 5}
# Similar test, for a list: one of the top-level attributes is a list, we
# can update one of its items.
@pytest.mark.xfail(reason="nested updates not yet implemented")
def test_update_expression_nested_attribute_index(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': ['one', 'two', 'three']})
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a[1] = :val1',
ExpressionAttributeValues={':val1': 'hello'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': ['one', 'hello', 'three']}
# Test that just like happens in top-level attributes, also in nested
# attributes, setting them replaces the old value - potentially an entire
# nested document, by the whole value (which may have a different type)
@pytest.mark.xfail(reason="nested updates not yet implemented")
def test_update_expression_nested_different_type(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': {'b': 3, 'c': {'one': 1, 'two': 2}}})
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a.c = :val1',
ExpressionAttributeValues={':val1': 7})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': {'b': 3, 'c': 7}}
# Yet another test of a nested attribute update. This one uses deeper
# level of nesting (dots and indexes), adds #name references to the mix.
@pytest.mark.xfail(reason="nested updates not yet implemented")
def test_update_expression_nested_deep(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': {'b': 3, 'c': ['hi', {'x': {'y': [3, 5, 7]}}]}})
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a.c[1].#name.y[1] = :val1',
ExpressionAttributeValues={':val1': 9}, ExpressionAttributeNames={'#name': 'x'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == {'b': 3, 'c': ['hi', {'x': {'y': [3, 9, 7]}}]}
# A deep path can also appear on the right-hand-side of an assignment
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a.z = a.c[1].#name.y[1]',
ExpressionAttributeNames={'#name': 'x'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a']['z'] == 9
# A REMOVE operation can be used to remove nested attributes, and also
# individual list items.
@pytest.mark.xfail(reason="nested updates not yet implemented")
def test_update_expression_nested_remove(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': {'b': 3, 'c': ['hi', {'x': {'y': [3, 5, 7]}, 'q': 2}]}})
test_table_s.update_item(Key={'p': p}, UpdateExpression='REMOVE a.c[1].x.y[1], a.c[1].q')
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == {'b': 3, 'c': ['hi', {'x': {'y': [3, 7]}}]}
# The DynamoDB documentation specifies: "When you use SET to update a list
# element, the contents of that element are replaced with the new data that
# you specify. If the element does not already exist, SET will append the
# new element at the end of the list."
# So if we take a three-element list a and set a[7], the new element
# will be appended at the end of the list, not placed at position 7.
@pytest.mark.xfail(reason="nested updates not yet implemented")
def test_nested_attribute_update_array_out_of_bounds(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': ['one', 'two', 'three']})
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a[7] = :val1',
ExpressionAttributeValues={':val1': 'hello'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': ['one', 'two', 'three', 'hello']}
# The DynamoDB documentation also says: "If you add multiple elements
# in a single SET operation, the elements are sorted in order by element
# number."
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a[84] = :val1, a[37] = :val2',
ExpressionAttributeValues={':val1': 'a1', ':val2': 'a2'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': ['one', 'two', 'three', 'hello', 'a2', 'a1']}
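# A hypothetical pure-Python sketch (function name and shape are ours, not
# DynamoDB's) of the documented semantics: an out-of-range index appends,
# and multiple SETs in one expression are applied in order of element number.

```python
def apply_list_sets(lst, sets):
    # 'sets' maps element number -> new value, as in 'SET a[84]=:v1, a[37]=:v2'
    result = list(lst)
    for index, value in sorted(sets.items()):
        if index < len(result):
            result[index] = value  # in range: replace the element
        else:
            result.append(value)   # out of range: append at the end
    return result

assert apply_list_sets(['one', 'two', 'three'], {7: 'hello'}) == ['one', 'two', 'three', 'hello']
assert apply_list_sets(['one', 'two', 'three', 'hello'], {84: 'a1', 37: 'a2'}) == ['one', 'two', 'three', 'hello', 'a2', 'a1']
```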
# Test what happens if we try to write to a.b, which would only make sense if
# a were a nested document, but a doesn't exist, or exists and is NOT a nested
# document but rather a scalar or list or something.
# DynamoDB actually detects this case and prints an error:
# ClientError: An error occurred (ValidationException) when calling the
# UpdateItem operation: The document path provided in the update expression
# is invalid for update
# Because Scylla doesn't read before write, it cannot detect this as an error,
# so we'll probably want to allow for that possibility as well.
@pytest.mark.xfail(reason="nested updates not yet implemented")
def test_nested_attribute_update_bad_path_dot(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hello', 'b': ['hi']})
with pytest.raises(ClientError, match='ValidationException.*path'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a.c = :val1',
ExpressionAttributeValues={':val1': 7})
with pytest.raises(ClientError, match='ValidationException.*path'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET b.c = :val1',
ExpressionAttributeValues={':val1': 7})
with pytest.raises(ClientError, match='ValidationException.*path'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET c.c = :val1',
ExpressionAttributeValues={':val1': 7})
# Similarly for other types of bad paths - using [0] on something which
# isn't an array,
@pytest.mark.xfail(reason="nested updates not yet implemented")
def test_nested_attribute_update_bad_path_array(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hello'})
with pytest.raises(ClientError, match='ValidationException.*path'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a[0] = :val1',
ExpressionAttributeValues={':val1': 7})

141
alternator-test/util.py Normal file

@@ -0,0 +1,141 @@
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Various utility functions which are useful for multiple tests
import string
import random
import collections
import time
def random_string(length=10, chars=string.ascii_uppercase + string.digits):
    return ''.join(random.choice(chars) for x in range(length))

def random_bytes(length=10):
    return bytearray(random.getrandbits(8) for _ in range(length))

# Utility functions for reading the full results of a scan or query into an
# array of items:
# TODO: add to full_scan and full_query by default ConsistentRead=True, as
# it's not useful for tests without it!
def full_scan(table, **kwargs):
    response = table.scan(**kwargs)
    items = response['Items']
    while 'LastEvaluatedKey' in response:
        response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'], **kwargs)
        items.extend(response['Items'])
    return items

# full_scan_and_count returns a (count, items) tuple, both as returned by
# the server. Note that count isn't simply len(items) - the server returns
# them independently. e.g., with Select='COUNT' the items are not returned,
# but count is.
def full_scan_and_count(table, **kwargs):
    response = table.scan(**kwargs)
    items = []
    count = 0
    if 'Items' in response:
        items.extend(response['Items'])
    if 'Count' in response:
        count = count + response['Count']
    while 'LastEvaluatedKey' in response:
        response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'], **kwargs)
        if 'Items' in response:
            items.extend(response['Items'])
        if 'Count' in response:
            count = count + response['Count']
    return (count, items)
# Utility function for fetching the entire results of a query into an array of items
def full_query(table, **kwargs):
    response = table.query(**kwargs)
    items = response['Items']
    while 'LastEvaluatedKey' in response:
        response = table.query(ExclusiveStartKey=response['LastEvaluatedKey'], **kwargs)
        items.extend(response['Items'])
    return items

# To compare two lists of items (each is a dict) without regard for order,
# "==" is not good enough because it will fail if the order is different.
# The following function, multiset(), converts the list into a multiset
# (set with duplicates) where order doesn't matter, so the multisets can
# be compared.
def freeze(item):
    if isinstance(item, dict):
        return frozenset((key, freeze(value)) for key, value in item.items())
    elif isinstance(item, list):
        return tuple(freeze(value) for value in item)
    return item

def multiset(items):
    return collections.Counter([freeze(item) for item in items])
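A short self-contained illustration of why the freeze()/multiset() pair above is needed: plain list comparison is order-sensitive, while a Counter of hashable "frozen" items is not, yet still distinguishes duplicates (the example items here are made up for illustration).

```python
import collections

# Self-contained copies of freeze() and multiset() from util.py above.
def freeze(item):
    if isinstance(item, dict):
        return frozenset((key, freeze(value)) for key, value in item.items())
    elif isinstance(item, list):
        return tuple(freeze(value) for value in item)
    return item

def multiset(items):
    return collections.Counter([freeze(item) for item in items])

a = [{'p': 'x', 'v': 1}, {'p': 'y', 'v': 2}]
b = [{'p': 'y', 'v': 2}, {'p': 'x', 'v': 1}]
assert a != b                      # plain list comparison is order-sensitive
assert multiset(a) == multiset(b)  # multiset comparison ignores order
# Duplicates still matter: a multiset is not a plain set.
assert multiset(a + a) != multiset(a)
```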
test_table_prefix = 'alternator_test_'
def test_table_name():
    current_ms = int(round(time.time() * 1000))
    # In the off chance that test_table_name() is called twice in the same millisecond...
    if test_table_name.last_ms >= current_ms:
        current_ms = test_table_name.last_ms + 1
    test_table_name.last_ms = current_ms
    return test_table_prefix + str(current_ms)
test_table_name.last_ms = 0

def create_test_table(dynamodb, **kwargs):
    name = test_table_name()
    print("fixture creating new table {}".format(name))
    table = dynamodb.create_table(TableName=name,
        BillingMode='PAY_PER_REQUEST', **kwargs)
    waiter = table.meta.client.get_waiter('table_exists')
    # Recheck every second instead of the default, lower, frequency. This can
    # save a few seconds on AWS with its very slow table creation, and even
    # more in tests against Scylla with its faster table-creation turnaround.
    waiter.config.delay = 1
    waiter.config.max_attempts = 200
    waiter.wait(TableName=name)
    return table
# DynamoDB's ListTables request returns up to a single page of table names
# (e.g., up to 100) and it is up to the caller to call it again and again
# to get the next page. This is a utility function which calls it repeatedly,
# as many times as necessary, to get the entire list.
# We deliberately return a list and not a set, because we want the caller
# to be able to recognize bugs in ListTables which cause the same table
# to be returned twice.
def list_tables(dynamodb, limit=100):
    ret = []
    pos = None
    while True:
        if pos:
            page = dynamodb.meta.client.list_tables(Limit=limit, ExclusiveStartTableName=pos)
        else:
            page = dynamodb.meta.client.list_tables(Limit=limit)
        results = page.get('TableNames', None)
        assert(results)
        ret = ret + results
        newpos = page.get('LastEvaluatedTableName', None)
        if not newpos:
            break
        # It doesn't make sense for Dynamo to tell us we need more pages, but
        # not send anything in *this* page!
        assert len(results) > 0
        assert newpos != pos
        # Note that we only checked that we got back tables, not that we got
        # any new tables not already in ret. So a buggy implementation might
        # still cause an endless loop getting the same tables again and again.
        pos = newpos
    return ret
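The pagination contract that list_tables() relies on can be exercised without a server. The sketch below uses a hypothetical in-memory stub, `fake_list_tables` (a stand-in invented here, not part of boto3), which pages through a fixed table list the way DynamoDB's ListTables does: each response carries `TableNames` and, when more pages remain, a `LastEvaluatedTableName` cursor.

```python
# Hypothetical in-memory stub of a paged ListTables.
TABLES = ['t%02d' % i for i in range(7)]

def fake_list_tables(Limit, ExclusiveStartTableName=None):
    start = 0 if ExclusiveStartTableName is None else TABLES.index(ExclusiveStartTableName) + 1
    page = TABLES[start:start + Limit]
    response = {'TableNames': page}
    if start + Limit < len(TABLES):
        # More pages remain: tell the caller where to resume.
        response['LastEvaluatedTableName'] = page[-1]
    return response

# The same resume-from-cursor loop as list_tables() above.
ret = []
pos = None
while True:
    if pos:
        page = fake_list_tables(Limit=3, ExclusiveStartTableName=pos)
    else:
        page = fake_list_tables(Limit=3)
    ret.extend(page['TableNames'])
    pos = page.get('LastEvaluatedTableName')
    if not pos:
        break
assert ret == TABLES  # three pages of 3+3+1 names, concatenated in order
```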

147
alternator/auth.cc Normal file

@@ -0,0 +1,147 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "alternator/error.hh"
#include "log.hh"
#include <string>
#include <string_view>
#include <gnutls/crypto.h>
#include <seastar/util/defer.hh>
#include "hashers.hh"
#include "bytes.hh"
#include "alternator/auth.hh"
#include <fmt/format.h>
#include "auth/common.hh"
#include "auth/password_authenticator.hh"
#include "auth/roles-metadata.hh"
#include "cql3/query_processor.hh"
#include "cql3/untyped_result_set.hh"
namespace alternator {
static logging::logger alogger("alternator-auth");
static hmac_sha256_digest hmac_sha256(std::string_view key, std::string_view msg) {
hmac_sha256_digest digest;
int ret = gnutls_hmac_fast(GNUTLS_MAC_SHA256, key.data(), key.size(), msg.data(), msg.size(), digest.data());
if (ret) {
throw std::runtime_error(fmt::format("Computing HMAC failed ({}): {}", ret, gnutls_strerror(ret)));
}
return digest;
}
static hmac_sha256_digest get_signature_key(std::string_view key, std::string_view date_stamp, std::string_view region_name, std::string_view service_name) {
auto date = hmac_sha256("AWS4" + std::string(key), date_stamp);
auto region = hmac_sha256(std::string_view(date.data(), date.size()), region_name);
auto service = hmac_sha256(std::string_view(region.data(), region.size()), service_name);
auto signing = hmac_sha256(std::string_view(service.data(), service.size()), "aws4_request");
return signing;
}
static std::string apply_sha256(std::string_view msg) {
sha256_hasher hasher;
hasher.update(msg.data(), msg.size());
return to_hex(hasher.finalize());
}
static std::string format_time_point(db_clock::time_point tp) {
time_t time_point_repr = db_clock::to_time_t(tp);
std::string time_point_str;
time_point_str.resize(17);
::tm time_buf;
// strftime writes the terminating null character as well
std::strftime(time_point_str.data(), time_point_str.size(), "%Y%m%dT%H%M%SZ", ::gmtime_r(&time_point_repr, &time_buf));
time_point_str.resize(16);
return time_point_str;
}
void check_expiry(std::string_view signature_date) {
//FIXME: The default 15min can be changed with X-Amz-Expires header - we should honor it
std::string expiration_str = format_time_point(db_clock::now() - 15min);
std::string validity_str = format_time_point(db_clock::now() + 15min);
if (signature_date < expiration_str) {
throw api_error("InvalidSignatureException",
fmt::format("Signature expired: {} is now earlier than {} (current time - 15 min.)",
signature_date, expiration_str));
}
if (signature_date > validity_str) {
throw api_error("InvalidSignatureException",
fmt::format("Signature not yet current: {} is still later than {} (current time + 15 min.)",
signature_date, validity_str));
}
}
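The check above works by comparing timestamp *strings*: because "%Y%m%dT%H%M%SZ" is fixed-width with the most significant fields first, lexicographic order coincides with chronological order. A minimal Python sketch of the same ±15-minute window (the `check_expiry` name and `ValueError` are just for this sketch):

```python
from datetime import datetime, timedelta, timezone

FMT = '%Y%m%dT%H%M%SZ'  # same fixed-width format the C++ code produces

def check_expiry(signature_date, now=None):
    # Fixed-width, most-significant-field-first format means plain string
    # comparison orders timestamps chronologically.
    now = now or datetime.now(timezone.utc)
    expiration = (now - timedelta(minutes=15)).strftime(FMT)
    validity = (now + timedelta(minutes=15)).strftime(FMT)
    if signature_date < expiration:
        raise ValueError('Signature expired: %s' % signature_date)
    if signature_date > validity:
        raise ValueError('Signature not yet current: %s' % signature_date)

now = datetime(2020, 6, 22, 12, 0, 0, tzinfo=timezone.utc)
check_expiry('20200622T115000Z', now)      # 10 minutes old: accepted
try:
    check_expiry('20200622T114000Z', now)  # 20 minutes old: rejected
    raised = False
except ValueError:
    raised = True
assert raised
```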
std::string get_signature(std::string_view access_key_id, std::string_view secret_access_key, std::string_view host, std::string_view method,
std::string_view orig_datestamp, std::string_view signed_headers_str, const std::map<std::string_view, std::string_view>& signed_headers_map,
std::string_view body_content, std::string_view region, std::string_view service, std::string_view query_string) {
auto amz_date_it = signed_headers_map.find("x-amz-date");
if (amz_date_it == signed_headers_map.end()) {
throw api_error("InvalidSignatureException", "X-Amz-Date header is mandatory for signature verification");
}
std::string_view amz_date = amz_date_it->second;
check_expiry(amz_date);
std::string_view datestamp = amz_date.substr(0, 8);
if (datestamp != orig_datestamp) {
throw api_error("InvalidSignatureException",
format("X-Amz-Date date does not match the provided datestamp. Expected {}, got {}",
orig_datestamp, datestamp));
}
std::string_view canonical_uri = "/";
std::stringstream canonical_headers;
for (const auto& header : signed_headers_map) {
canonical_headers << fmt::format("{}:{}", header.first, header.second) << '\n';
}
std::string payload_hash = apply_sha256(body_content);
std::string canonical_request = fmt::format("{}\n{}\n{}\n{}\n{}\n{}", method, canonical_uri, query_string, canonical_headers.str(), signed_headers_str, payload_hash);
std::string_view algorithm = "AWS4-HMAC-SHA256";
std::string credential_scope = fmt::format("{}/{}/{}/aws4_request", datestamp, region, service);
std::string string_to_sign = fmt::format("{}\n{}\n{}\n{}", algorithm, amz_date, credential_scope, apply_sha256(canonical_request));
hmac_sha256_digest signing_key = get_signature_key(secret_access_key, datestamp, region, service);
hmac_sha256_digest signature = hmac_sha256(std::string_view(signing_key.data(), signing_key.size()), string_to_sign);
return to_hex(bytes_view(reinterpret_cast<const int8_t*>(signature.data()), signature.size()));
}
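The signature computed above is standard AWS Signature Version 4: a chain of HMAC-SHA256 operations derives a signing key from the secret key, date, region, and service, and that key then signs the string-to-sign. A minimal Python sketch of the same derivation chain (the secret key is the placeholder value used in AWS's documentation examples, and the string-to-sign here is a dummy):

```python
import hashlib
import hmac

def hmac_sha256(key, msg):
    return hmac.new(key, msg.encode('utf-8'), hashlib.sha256).digest()

def get_signature_key(secret_key, date_stamp, region, service):
    # The SigV4 key-derivation chain, mirroring get_signature_key() above.
    k_date = hmac_sha256(('AWS4' + secret_key).encode('utf-8'), date_stamp)
    k_region = hmac_sha256(k_date, region)
    k_service = hmac_sha256(k_region, service)
    return hmac_sha256(k_service, 'aws4_request')

key = get_signature_key('wJalrXUtnFEMI/K7MDENG+bPxRfiCYEXAMPLEKEY',
                        '20150830', 'us-east-1', 'iam')
# The derived key signs the string-to-sign; the hex digest is the signature.
signature = hmac.new(key, b'dummy string-to-sign', hashlib.sha256).hexdigest()
assert len(key) == 32 and len(signature) == 64
```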
future<std::string> get_key_from_roles(cql3::query_processor& qp, std::string username) {
static const sstring query = format("SELECT salted_hash FROM {} WHERE {} = ?",
auth::meta::roles_table::qualified_name(), auth::meta::roles_table::role_col_name);
auto cl = auth::password_authenticator::consistency_for_user(username);
auto timeout = auth::internal_distributed_timeout_config();
return qp.process(query, cl, timeout, {sstring(username)}, true).then_wrapped([username = std::move(username)] (future<::shared_ptr<cql3::untyped_result_set>> f) {
auto res = f.get0();
auto salted_hash = std::optional<sstring>();
if (res->empty()) {
throw api_error("UnrecognizedClientException", fmt::format("User not found: {}", username));
}
salted_hash = res->one().get_opt<sstring>("salted_hash");
if (!salted_hash) {
throw api_error("UnrecognizedClientException", fmt::format("No password found for user: {}", username));
}
return make_ready_future<std::string>(*salted_hash);
});
}
}

46
alternator/auth.hh Normal file

@@ -0,0 +1,46 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <string>
#include <string_view>
#include <array>
#include "gc_clock.hh"
#include "utils/loading_cache.hh"
namespace cql3 {
class query_processor;
}
namespace alternator {
using hmac_sha256_digest = std::array<char, 32>;
using key_cache = utils::loading_cache<std::string, std::string>;
std::string get_signature(std::string_view access_key_id, std::string_view secret_access_key, std::string_view host, std::string_view method,
std::string_view orig_datestamp, std::string_view signed_headers_str, const std::map<std::string_view, std::string_view>& signed_headers_map,
std::string_view body_content, std::string_view region, std::string_view service, std::string_view query_string);
future<std::string> get_key_from_roles(cql3::query_processor& qp, std::string username);
}

111
alternator/base64.cc Normal file

@@ -0,0 +1,111 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
// The DynamoAPI dictates that "binary" (a.k.a. "bytes" or "blob") values
// be encoded in the JSON API as base64-encoded strings. This is code to
// convert byte arrays to base64-encoded strings, and back.
#include "base64.hh"
#include <ctype.h>
// Arrays for quickly converting to and from an integer between 0 and 63,
// and the character used in base64 encoding to represent it.
static class base64_chars {
public:
static constexpr const char* to =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
// from[] is indexed by an arbitrary unsigned char, so it needs all 256 entries.
int8_t from[256];
base64_chars() {
static_assert(strlen(to) == 64);
for (int i = 0; i < 256; i++) {
from[i] = 255; // signal invalid character
}
for (int i = 0; i < 64; i++) {
from[(unsigned) to[i]] = i;
}
}
} base64_chars;
std::string base64_encode(bytes_view in) {
std::string ret;
ret.reserve(((4 * in.size() / 3) + 3) & ~3);
int i = 0;
unsigned char chunk3[3]; // chunk of input
for (auto byte : in) {
chunk3[i++] = byte;
if (i == 3) {
ret += base64_chars.to[ (chunk3[0] & 0xfc) >> 2 ];
ret += base64_chars.to[ ((chunk3[0] & 0x03) << 4) + ((chunk3[1] & 0xf0) >> 4) ];
ret += base64_chars.to[ ((chunk3[1] & 0x0f) << 2) + ((chunk3[2] & 0xc0) >> 6) ];
ret += base64_chars.to[ chunk3[2] & 0x3f ];
i = 0;
}
}
if (i) {
// i can be 1 or 2.
for(int j = i; j < 3; j++)
chunk3[j] = '\0';
ret += base64_chars.to[ ( chunk3[0] & 0xfc) >> 2 ];
ret += base64_chars.to[ ((chunk3[0] & 0x03) << 4) + ((chunk3[1] & 0xf0) >> 4) ];
if (i == 2) {
ret += base64_chars.to[ ((chunk3[1] & 0x0f) << 2) + ((chunk3[2] & 0xc0) >> 6) ];
} else {
ret += '=';
}
ret += '=';
}
return ret;
}
bytes base64_decode(std::string_view in) {
int i = 0;
int8_t chunk4[4]; // chunk of input, each byte converted to 0..63;
std::string ret;
ret.reserve(in.size() * 3 / 4);
for (unsigned char c : in) {
uint8_t dc = base64_chars.from[c];
if (dc == 255) {
// Any unexpected character, including the "=" character usually
// used for padding, signals the end of the decode.
break;
}
chunk4[i++] = dc;
if (i == 4) {
ret += (chunk4[0] << 2) + ((chunk4[1] & 0x30) >> 4);
ret += ((chunk4[1] & 0xf) << 4) + ((chunk4[2] & 0x3c) >> 2);
ret += ((chunk4[2] & 0x3) << 6) + chunk4[3];
i = 0;
}
}
if (i) {
// i can be 2 or 3, meaning 1 or 2 more output characters
if (i>=2)
ret += (chunk4[0] << 2) + ((chunk4[1] & 0x30) >> 4);
if (i==3)
ret += ((chunk4[1] & 0xf) << 4) + ((chunk4[2] & 0x3c) >> 2);
}
// FIXME: This copy is sad. The problem is we need back "bytes",
// but "bytes" doesn't have efficient append like std::string does.
// To fix this we need to use bytes' "uninitialized" feature.
return bytes(ret.begin(), ret.end());
}
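The encoder above packs each 3-byte input chunk into four 6-bit indices into the 64-character alphabet. The same bit manipulation, transcribed to Python and checked against the standard library's encoder:

```python
import base64

ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'

def encode_chunk(b):
    # Pack one full 3-byte chunk into four 6-bit indices, exactly as the
    # main loop of base64_encode() does above.
    return (ALPHABET[(b[0] & 0xfc) >> 2]
          + ALPHABET[((b[0] & 0x03) << 4) + ((b[1] & 0xf0) >> 4)]
          + ALPHABET[((b[1] & 0x0f) << 2) + ((b[2] & 0xc0) >> 6)]
          + ALPHABET[b[2] & 0x3f])

assert encode_chunk(b'Man') == base64.b64encode(b'Man').decode()  # 'TWFu'
assert encode_chunk(b'\x00\xff\x00') == base64.b64encode(b'\x00\xff\x00').decode()
```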

34
alternator/base64.hh Normal file

@@ -0,0 +1,34 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <string_view>
#include "bytes.hh"
#include "rjson.hh"
std::string base64_encode(bytes_view);
bytes base64_decode(std::string_view);
inline bytes base64_decode(const rjson::value& v) {
return base64_decode(std::string_view(v.GetString(), v.GetStringLength()));
}

564
alternator/conditions.cc Normal file

@@ -0,0 +1,564 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include <list>
#include <map>
#include <string_view>
#include "alternator/conditions.hh"
#include "alternator/error.hh"
#include "cql3/constants.hh"
#include <unordered_map>
#include "rjson.hh"
#include "serialization.hh"
#include "base64.hh"
#include <stdexcept>
namespace alternator {
static logging::logger clogger("alternator-conditions");
comparison_operator_type get_comparison_operator(const rjson::value& comparison_operator) {
static std::unordered_map<std::string, comparison_operator_type> ops = {
{"EQ", comparison_operator_type::EQ},
{"NE", comparison_operator_type::NE},
{"LE", comparison_operator_type::LE},
{"LT", comparison_operator_type::LT},
{"GE", comparison_operator_type::GE},
{"GT", comparison_operator_type::GT},
{"IN", comparison_operator_type::IN},
{"NULL", comparison_operator_type::IS_NULL},
{"NOT_NULL", comparison_operator_type::NOT_NULL},
{"BETWEEN", comparison_operator_type::BETWEEN},
{"BEGINS_WITH", comparison_operator_type::BEGINS_WITH},
{"CONTAINS", comparison_operator_type::CONTAINS},
{"NOT_CONTAINS", comparison_operator_type::NOT_CONTAINS},
};
if (!comparison_operator.IsString()) {
throw api_error("ValidationException", format("Invalid comparison operator definition {}", rjson::print(comparison_operator)));
}
std::string op = comparison_operator.GetString();
auto it = ops.find(op);
if (it == ops.end()) {
throw api_error("ValidationException", format("Unsupported comparison operator {}", op));
}
return it->second;
}
static ::shared_ptr<cql3::restrictions::single_column_restriction::contains> make_map_element_restriction(const column_definition& cdef, std::string_view key, const rjson::value& value) {
bytes raw_key = utf8_type->from_string(sstring_view(key.data(), key.size()));
auto key_value = ::make_shared<cql3::constants::value>(cql3::raw_value::make_value(std::move(raw_key)));
bytes raw_value = serialize_item(value);
auto entry_value = ::make_shared<cql3::constants::value>(cql3::raw_value::make_value(std::move(raw_value)));
return make_shared<cql3::restrictions::single_column_restriction::contains>(cdef, std::move(key_value), std::move(entry_value));
}
static ::shared_ptr<cql3::restrictions::single_column_restriction::EQ> make_key_eq_restriction(const column_definition& cdef, const rjson::value& value) {
bytes raw_value = get_key_from_typed_value(value, cdef, type_to_string(cdef.type));
auto restriction_value = ::make_shared<cql3::constants::value>(cql3::raw_value::make_value(std::move(raw_value)));
return make_shared<cql3::restrictions::single_column_restriction::EQ>(cdef, std::move(restriction_value));
}
::shared_ptr<cql3::restrictions::statement_restrictions> get_filtering_restrictions(schema_ptr schema, const column_definition& attrs_col, const rjson::value& query_filter) {
clogger.trace("Getting filtering restrictions for: {}", rjson::print(query_filter));
auto filtering_restrictions = ::make_shared<cql3::restrictions::statement_restrictions>(schema, true);
for (auto it = query_filter.MemberBegin(); it != query_filter.MemberEnd(); ++it) {
std::string_view column_name(it->name.GetString(), it->name.GetStringLength());
const rjson::value& condition = it->value;
const rjson::value& comp_definition = rjson::get(condition, "ComparisonOperator");
const rjson::value& attr_list = rjson::get(condition, "AttributeValueList");
comparison_operator_type op = get_comparison_operator(comp_definition);
if (op != comparison_operator_type::EQ) {
throw api_error("ValidationException", "Filtering is currently implemented for EQ operator only");
}
if (attr_list.Size() != 1) {
throw api_error("ValidationException", format("EQ restriction needs exactly 1 attribute value: {}", rjson::print(attr_list)));
}
if (const column_definition* cdef = schema->get_column_definition(to_bytes(column_name.data()))) {
// Primary key restriction
filtering_restrictions->add_restriction(make_key_eq_restriction(*cdef, attr_list[0]), false, true);
} else {
// Regular column restriction
filtering_restrictions->add_restriction(make_map_element_restriction(attrs_col, column_name, attr_list[0]), false, true);
}
}
return filtering_restrictions;
}
namespace {
struct size_check {
// True iff size passes this check.
virtual bool operator()(rapidjson::SizeType size) const = 0;
// Check description, such that format("expected array {}", check.what()) is human-readable.
virtual sstring what() const = 0;
};
class exact_size : public size_check {
rapidjson::SizeType _expected;
public:
explicit exact_size(rapidjson::SizeType expected) : _expected(expected) {}
bool operator()(rapidjson::SizeType size) const override { return size == _expected; }
sstring what() const override { return format("of size {}", _expected); }
};
struct empty : public size_check {
bool operator()(rapidjson::SizeType size) const override { return size < 1; }
sstring what() const override { return "to be empty"; }
};
struct nonempty : public size_check {
bool operator()(rapidjson::SizeType size) const override { return size > 0; }
sstring what() const override { return "to be non-empty"; }
};
} // anonymous namespace
// Check that array has the expected number of elements
static void verify_operand_count(const rjson::value* array, const size_check& expected, const rjson::value& op) {
if (!array || !array->IsArray()) {
throw api_error("ValidationException", "With ComparisonOperator, AttributeValueList must be given and an array");
}
if (!expected(array->Size())) {
throw api_error("ValidationException",
format("{} operator requires AttributeValueList {}, instead found list size {}",
op, expected.what(), array->Size()));
}
}
struct rjson_engaged_ptr_comp {
bool operator()(const rjson::value* p1, const rjson::value* p2) const {
return rjson::single_value_comp()(*p1, *p2);
}
};
// It's not enough to compare underlying JSON objects when comparing sets,
// as internally they're stored in an array, and the order of elements is
// not important in set equality. See issue #5021
static bool check_EQ_for_sets(const rjson::value& set1, const rjson::value& set2) {
if (set1.Size() != set2.Size()) {
return false;
}
std::set<const rjson::value*, rjson_engaged_ptr_comp> set1_raw;
for (auto it = set1.Begin(); it != set1.End(); ++it) {
set1_raw.insert(&*it);
}
for (const auto& a : set2.GetArray()) {
if (set1_raw.count(&a) == 0) {
return false;
}
}
return true;
}
// Check if two JSON-encoded values match with the EQ relation
static bool check_EQ(const rjson::value* v1, const rjson::value& v2) {
if (!v1) {
return false;
}
if (v1->IsObject() && v1->MemberCount() == 1 && v2.IsObject() && v2.MemberCount() == 1) {
auto it1 = v1->MemberBegin();
auto it2 = v2.MemberBegin();
if ((it1->name == "SS" && it2->name == "SS") || (it1->name == "NS" && it2->name == "NS") || (it1->name == "BS" && it2->name == "BS")) {
return check_EQ_for_sets(it1->value, it2->value);
}
}
return *v1 == v2;
}
// Check if two JSON-encoded values match with the NE relation
static bool check_NE(const rjson::value* v1, const rjson::value& v2) {
return !v1 || *v1 != v2; // null is unequal to anything.
}
// Check if two JSON-encoded values match with the BEGINS_WITH relation
static bool check_BEGINS_WITH(const rjson::value* v1, const rjson::value& v2) {
// BEGINS_WITH requires that its single operand (v2) be a string or
// binary - otherwise it's a validation error. However, problems with
// the stored attribute (v1) will just return false (no match).
if (!v2.IsObject() || v2.MemberCount() != 1) {
throw api_error("ValidationException", format("BEGINS_WITH operator encountered malformed AttributeValue: {}", v2));
}
auto it2 = v2.MemberBegin();
if (it2->name != "S" && it2->name != "B") {
throw api_error("ValidationException", format("BEGINS_WITH operator requires String or Binary in AttributeValue, got {}", it2->name));
}
if (!v1 || !v1->IsObject() || v1->MemberCount() != 1) {
return false;
}
auto it1 = v1->MemberBegin();
if (it1->name != it2->name) {
return false;
}
if (it2->name == "S") {
std::string_view val1(it1->value.GetString(), it1->value.GetStringLength());
std::string_view val2(it2->value.GetString(), it2->value.GetStringLength());
return val1.substr(0, val2.size()) == val2;
} else /* it2->name == "B" */ {
// TODO (optimization): Check the begins_with condition directly on
// the base64-encoded string, without making a decoded copy.
bytes val1 = base64_decode(it1->value);
bytes val2 = base64_decode(it2->value);
return val1.substr(0, val2.size()) == val2;
}
}
static std::string_view to_string_view(const rjson::value& v) {
return std::string_view(v.GetString(), v.GetStringLength());
}
static bool is_set_of(const rjson::value& type1, const rjson::value& type2) {
return (type2 == "S" && type1 == "SS") || (type2 == "N" && type1 == "NS") || (type2 == "B" && type1 == "BS");
}
// Check if two JSON-encoded values match with the CONTAINS relation
static bool check_CONTAINS(const rjson::value* v1, const rjson::value& v2) {
if (!v1) {
return false;
}
const auto& kv1 = *v1->MemberBegin();
const auto& kv2 = *v2.MemberBegin();
if (kv2.name != "S" && kv2.name != "N" && kv2.name != "B") {
throw api_error("ValidationException",
format("CONTAINS operator requires a single AttributeValue of type String, Number, or Binary, "
"got {} instead", kv2.name));
}
if (kv1.name == "S" && kv2.name == "S") {
return to_string_view(kv1.value).find(to_string_view(kv2.value)) != std::string_view::npos;
} else if (kv1.name == "B" && kv2.name == "B") {
return base64_decode(kv1.value).find(base64_decode(kv2.value)) != bytes::npos;
} else if (is_set_of(kv1.name, kv2.name)) {
for (auto i = kv1.value.Begin(); i != kv1.value.End(); ++i) {
if (*i == kv2.value) {
return true;
}
}
} else if (kv1.name == "L") {
for (auto i = kv1.value.Begin(); i != kv1.value.End(); ++i) {
if (!i->IsObject() || i->MemberCount() != 1) {
clogger.error("check_CONTAINS received a list whose element is malformed");
return false;
}
const auto& el = *i->MemberBegin();
if (el.name == kv2.name && el.value == kv2.value) {
return true;
}
}
}
return false;
}
// Check if two JSON-encoded values match with the NOT_CONTAINS relation
static bool check_NOT_CONTAINS(const rjson::value* v1, const rjson::value& v2) {
if (!v1) {
return false;
}
return !check_CONTAINS(v1, v2);
}
// Check if a JSON-encoded value equals any element of an array, which must have at least one element.
static bool check_IN(const rjson::value* val, const rjson::value& array) {
if (!array[0].IsObject() || array[0].MemberCount() != 1) {
throw api_error("ValidationException",
format("IN operator encountered malformed AttributeValue: {}", array[0]));
}
const auto& type = array[0].MemberBegin()->name;
if (type != "S" && type != "N" && type != "B") {
throw api_error("ValidationException",
"IN operator requires AttributeValueList elements to be of type String, Number, or Binary ");
}
if (!val) {
return false;
}
bool have_match = false;
for (const auto& elem : array.GetArray()) {
if (!elem.IsObject() || elem.MemberCount() != 1 || elem.MemberBegin()->name != type) {
throw api_error("ValidationException",
"IN operator requires all AttributeValueList elements to have the same type ");
}
if (!have_match && *val == elem) {
// Can't return yet, must check types of all array elements. <sigh>
have_match = true;
}
}
return have_match;
}
static bool check_NULL(const rjson::value* val) {
return val == nullptr;
}
static bool check_NOT_NULL(const rjson::value* val) {
return val != nullptr;
}
// Check if two JSON-encoded values match with cmp.
template <typename Comparator>
bool check_compare(const rjson::value* v1, const rjson::value& v2, const Comparator& cmp) {
if (!v2.IsObject() || v2.MemberCount() != 1) {
throw api_error("ValidationException",
format("{} requires a single AttributeValue of type String, Number, or Binary",
cmp.diagnostic));
}
const auto& kv2 = *v2.MemberBegin();
if (kv2.name != "S" && kv2.name != "N" && kv2.name != "B") {
throw api_error("ValidationException",
format("{} requires a single AttributeValue of type String, Number, or Binary",
cmp.diagnostic));
}
if (!v1 || !v1->IsObject() || v1->MemberCount() != 1) {
return false;
}
const auto& kv1 = *v1->MemberBegin();
if (kv1.name != kv2.name) {
return false;
}
if (kv1.name == "N") {
return cmp(unwrap_number(*v1, cmp.diagnostic), unwrap_number(v2, cmp.diagnostic));
}
if (kv1.name == "S") {
return cmp(std::string_view(kv1.value.GetString(), kv1.value.GetStringLength()),
std::string_view(kv2.value.GetString(), kv2.value.GetStringLength()));
}
if (kv1.name == "B") {
return cmp(base64_decode(kv1.value), base64_decode(kv2.value));
}
clogger.error("check_compare panic: LHS type equals RHS type, but one is in {N,S,B} while the other isn't");
return false;
}
struct cmp_lt {
template <typename T> bool operator()(const T& lhs, const T& rhs) const { return lhs < rhs; }
static constexpr const char* diagnostic = "LT operator";
};
struct cmp_le {
// bytes only has <, so we cannot use <=.
template <typename T> bool operator()(const T& lhs, const T& rhs) const { return lhs < rhs || lhs == rhs; }
static constexpr const char* diagnostic = "LE operator";
};
struct cmp_ge {
// bytes only has <, so we cannot use >=.
template <typename T> bool operator()(const T& lhs, const T& rhs) const { return rhs < lhs || lhs == rhs; }
static constexpr const char* diagnostic = "GE operator";
};
struct cmp_gt {
// bytes only has <, so we cannot use >.
template <typename T> bool operator()(const T& lhs, const T& rhs) const { return rhs < lhs; }
static constexpr const char* diagnostic = "GT operator";
};
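The four comparator structs above derive LE, GE and GT from `operator<` and `operator==` alone, because the `bytes` type only provides `<`. A self-contained sketch of the same derivation, with a hypothetical `blob` type standing in for `bytes`:

```cpp
#include <cstring>

// Hypothetical type that, like Scylla's `bytes`, exposes only < and ==.
struct blob {
    const char* p;
    bool operator<(const blob& o) const { return std::strcmp(p, o.p) < 0; }
    bool operator==(const blob& o) const { return std::strcmp(p, o.p) == 0; }
};

// Same tricks as cmp_le / cmp_ge / cmp_gt above:
template <typename T> bool le(const T& a, const T& b) { return a < b || a == b; } // <= from < and ==
template <typename T> bool ge(const T& a, const T& b) { return b < a || a == b; } // >= by swapping operands
template <typename T> bool gt(const T& a, const T& b) { return b < a; }           // >  by swapping operands
```

Swapping the operands avoids requiring `>`, `>=` or `<=` on the compared type at all.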
// True if v is between lb and ub, inclusive. Throws if lb > ub.
template <typename T>
bool check_BETWEEN(const T& v, const T& lb, const T& ub) {
if (ub < lb) {
throw api_error("ValidationException",
format("BETWEEN operator requires lower_bound <= upper_bound, but {} > {}", lb, ub));
}
return cmp_ge()(v, lb) && cmp_le()(v, ub);
}
static bool check_BETWEEN(const rjson::value* v, const rjson::value& lb, const rjson::value& ub) {
if (!v) {
return false;
}
if (!v->IsObject() || v->MemberCount() != 1) {
throw api_error("ValidationException", format("BETWEEN operator encountered malformed AttributeValue: {}", *v));
}
if (!lb.IsObject() || lb.MemberCount() != 1) {
throw api_error("ValidationException", format("BETWEEN operator encountered malformed AttributeValue: {}", lb));
}
if (!ub.IsObject() || ub.MemberCount() != 1) {
throw api_error("ValidationException", format("BETWEEN operator encountered malformed AttributeValue: {}", ub));
}
const auto& kv_v = *v->MemberBegin();
const auto& kv_lb = *lb.MemberBegin();
const auto& kv_ub = *ub.MemberBegin();
if (kv_lb.name != kv_ub.name) {
throw api_error(
"ValidationException",
format("BETWEEN operator requires the same type for lower and upper bound; instead got {} and {}",
kv_lb.name, kv_ub.name));
}
if (kv_v.name != kv_lb.name) { // Cannot compare different types, so v is NOT between lb and ub.
return false;
}
if (kv_v.name == "N") {
const char* diag = "BETWEEN operator";
return check_BETWEEN(unwrap_number(*v, diag), unwrap_number(lb, diag), unwrap_number(ub, diag));
}
if (kv_v.name == "S") {
return check_BETWEEN(std::string_view(kv_v.value.GetString(), kv_v.value.GetStringLength()),
std::string_view(kv_lb.value.GetString(), kv_lb.value.GetStringLength()),
std::string_view(kv_ub.value.GetString(), kv_ub.value.GetStringLength()));
}
if (kv_v.name == "B") {
return check_BETWEEN(base64_decode(kv_v.value), base64_decode(kv_lb.value), base64_decode(kv_ub.value));
}
throw api_error("ValidationException",
format("BETWEEN operator requires AttributeValueList elements to be of type String, Number, or Binary; instead got {}",
kv_lb.name));
}
// Verify one Expect condition on one attribute (whose content is "got")
// for the verify_expected() below.
// This function returns true or false depending on whether the condition
// succeeded - it does not throw ConditionalCheckFailedException.
// However, it may throw ValidationException on input validation errors.
static bool verify_expected_one(const rjson::value& condition, const rjson::value* got) {
const rjson::value* comparison_operator = rjson::find(condition, "ComparisonOperator");
const rjson::value* attribute_value_list = rjson::find(condition, "AttributeValueList");
const rjson::value* value = rjson::find(condition, "Value");
const rjson::value* exists = rjson::find(condition, "Exists");
// There are three types of conditions that Expected supports:
// A value, not-exists, and a comparison of some kind. Each allows
// and requires a different combinations of parameters in the request
if (value) {
if (exists && (!exists->IsBool() || exists->GetBool() != true)) {
throw api_error("ValidationException", "Cannot combine Value with Exists!=true");
}
if (comparison_operator) {
throw api_error("ValidationException", "Cannot combine Value with ComparisonOperator");
}
return check_EQ(got, *value);
} else if (exists) {
if (comparison_operator) {
throw api_error("ValidationException", "Cannot combine Exists with ComparisonOperator");
}
if (!exists->IsBool() || exists->GetBool() != false) {
throw api_error("ValidationException", "Exists!=false requires Value");
}
// Remember Exists=false, so we're checking that the attribute does *not* exist:
return !got;
} else {
if (!comparison_operator) {
throw api_error("ValidationException", "Missing ComparisonOperator, Value or Exists");
}
comparison_operator_type op = get_comparison_operator(*comparison_operator);
switch (op) {
case comparison_operator_type::EQ:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_EQ(got, (*attribute_value_list)[0]);
case comparison_operator_type::NE:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_NE(got, (*attribute_value_list)[0]);
case comparison_operator_type::LT:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_compare(got, (*attribute_value_list)[0], cmp_lt{});
case comparison_operator_type::LE:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_compare(got, (*attribute_value_list)[0], cmp_le{});
case comparison_operator_type::GT:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_compare(got, (*attribute_value_list)[0], cmp_gt{});
case comparison_operator_type::GE:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_compare(got, (*attribute_value_list)[0], cmp_ge{});
case comparison_operator_type::BEGINS_WITH:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_BEGINS_WITH(got, (*attribute_value_list)[0]);
case comparison_operator_type::IN:
verify_operand_count(attribute_value_list, nonempty(), *comparison_operator);
return check_IN(got, *attribute_value_list);
case comparison_operator_type::IS_NULL:
verify_operand_count(attribute_value_list, empty(), *comparison_operator);
return check_NULL(got);
case comparison_operator_type::NOT_NULL:
verify_operand_count(attribute_value_list, empty(), *comparison_operator);
return check_NOT_NULL(got);
case comparison_operator_type::BETWEEN:
verify_operand_count(attribute_value_list, exact_size(2), *comparison_operator);
return check_BETWEEN(got, (*attribute_value_list)[0], (*attribute_value_list)[1]);
case comparison_operator_type::CONTAINS:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_CONTAINS(got, (*attribute_value_list)[0]);
case comparison_operator_type::NOT_CONTAINS:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_NOT_CONTAINS(got, (*attribute_value_list)[0]);
}
throw std::logic_error(format("Internal error: corrupted operator enum: {}", int(op)));
}
}
// Verify that the existing values of the item (previous_item) match the
// conditions given by the Expected and ConditionalOperator parameters
// (if they exist) in the request (an UpdateItem, PutItem or DeleteItem).
// This function will throw a ConditionalCheckFailedException API error
// if the values do not match the condition, or ValidationException if there
// are errors in the format of the condition itself.
void verify_expected(const rjson::value& req, const std::unique_ptr<rjson::value>& previous_item) {
const rjson::value* expected = rjson::find(req, "Expected");
if (!expected) {
return;
}
if (!expected->IsObject()) {
throw api_error("ValidationException", "'Expected' parameter, if given, must be an object");
}
// ConditionalOperator can be "AND" for requiring all conditions, or
// "OR" for requiring one condition, and defaults to "AND" if missing.
const rjson::value* conditional_operator = rjson::find(req, "ConditionalOperator");
bool require_all = true;
if (conditional_operator) {
if (!conditional_operator->IsString()) {
throw api_error("ValidationException", "'ConditionalOperator' parameter, if given, must be a string");
}
std::string_view s(conditional_operator->GetString(), conditional_operator->GetStringLength());
if (s == "AND") {
// require_all is already true
} else if (s == "OR") {
require_all = false;
} else {
throw api_error("ValidationException", "'ConditionalOperator' parameter must be AND, OR or missing");
}
if (expected->GetObject().ObjectEmpty()) {
throw api_error("ValidationException", "'ConditionalOperator' parameter cannot be specified for empty Expression");
}
}
for (auto it = expected->MemberBegin(); it != expected->MemberEnd(); ++it) {
const rjson::value* got = nullptr;
if (previous_item && previous_item->IsObject() && previous_item->HasMember("Item")) {
got = rjson::find((*previous_item)["Item"], rjson::string_ref_type(it->name.GetString()));
}
bool success = verify_expected_one(it->value, got);
if (success && !require_all) {
// When !require_all, one success is enough!
return;
} else if (!success && require_all) {
// When require_all, one failure is enough!
throw api_error("ConditionalCheckFailedException", "Failed condition.");
}
}
// If we got here and require_all, none of the checks failed, so succeed.
// If we got here and !require_all, all of the checks failed, so fail.
if (!require_all) {
throw api_error("ConditionalCheckFailedException", "None of ORed Expect conditions were successful.");
}
}
} // namespace alternator
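The loop in `verify_expected()` encodes the AND/OR short-circuit rules in two mirrored branches. A standalone sketch of just that control flow, with conditions reduced to hypothetical `std::function<bool()>` predicates (the real code evaluates `verify_expected_one` per attribute):

```cpp
#include <functional>
#include <stdexcept>
#include <vector>

// Mirrors verify_expected()'s loop: with require_all (AND), the first failure
// aborts; without it (OR), the first success returns, and failure is only
// reported after every condition has been tried.
inline void check_conditions(const std::vector<std::function<bool()>>& conds, bool require_all) {
    for (const auto& cond : conds) {
        bool success = cond();
        if (success && !require_all) {
            return;                                          // OR: one success is enough
        }
        if (!success && require_all) {
            throw std::runtime_error("Failed condition.");   // AND: one failure is enough
        }
    }
    if (!require_all) {
        throw std::runtime_error("None of the ORed conditions succeeded.");
    }
}
```

Note the asymmetry: AND succeeds by falling off the end of the loop, while OR fails by falling off the end.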

alternator/conditions.hh Normal file

@@ -0,0 +1,49 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
/*
* This file contains definitions and functions related to placing conditions
* on Alternator queries (equivalent of CQL's restrictions).
*
* With conditions, it's possible to add criteria to selection requests (Scan, Query)
* and use them for narrowing down the result set, by means of filtering or indexing.
*
* Ref: https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Condition.html
*/
#pragma once
#include "cql3/restrictions/statement_restrictions.hh"
#include "serialization.hh"
namespace alternator {
enum class comparison_operator_type {
EQ, NE, LE, LT, GE, GT, IN, BETWEEN, CONTAINS, NOT_CONTAINS, IS_NULL, NOT_NULL, BEGINS_WITH
};
comparison_operator_type get_comparison_operator(const rjson::value& comparison_operator);
::shared_ptr<cql3::restrictions::statement_restrictions> get_filtering_restrictions(schema_ptr schema, const column_definition& attrs_col, const rjson::value& query_filter);
void verify_expected(const rjson::value& req, const std::unique_ptr<rjson::value>& previous_item);
}

alternator/error.hh Normal file

@@ -0,0 +1,50 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <seastar/http/httpd.hh>
#include "seastarx.hh"
namespace alternator {
// DynamoDB's error messages are described in detail in
// https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Programming.Errors.html
// An error message has a "type", e.g., "ResourceNotFoundException", a coarser
// HTTP code (almost always, 400), and a human readable message. Eventually these
// will be wrapped into a JSON object returned to the client.
class api_error : public std::exception {
public:
using status_type = httpd::reply::status_type;
status_type _http_code;
std::string _type;
std::string _msg;
api_error(std::string type, std::string msg, status_type http_code = status_type::bad_request)
: _http_code(std::move(http_code))
, _type(std::move(type))
, _msg(std::move(msg))
{ }
api_error() = default;
virtual const char* what() const noexcept override { return _msg.c_str(); }
};
}

alternator/executor.cc Normal file

File diff suppressed because it is too large

alternator/executor.hh Normal file

@@ -0,0 +1,71 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <seastar/core/future.hh>
#include <seastar/http/httpd.hh>
#include "seastarx.hh"
#include <seastar/json/json_elements.hh>
#include "service/storage_proxy.hh"
#include "service/migration_manager.hh"
#include "service/client_state.hh"
#include "stats.hh"
namespace alternator {
class executor {
service::storage_proxy& _proxy;
service::migration_manager& _mm;
public:
using client_state = service::client_state;
stats _stats;
static constexpr auto ATTRS_COLUMN_NAME = ":attrs";
static constexpr auto KEYSPACE_NAME = "alternator";
executor(service::storage_proxy& proxy, service::migration_manager& mm) : _proxy(proxy), _mm(mm) {}
future<json::json_return_type> create_table(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
future<json::json_return_type> describe_table(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
future<json::json_return_type> delete_table(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
future<json::json_return_type> put_item(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
future<json::json_return_type> get_item(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
future<json::json_return_type> delete_item(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
future<json::json_return_type> update_item(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
future<json::json_return_type> list_tables(client_state& client_state, std::string content);
future<json::json_return_type> scan(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
future<json::json_return_type> describe_endpoints(client_state& client_state, std::string content, std::string host_header);
future<json::json_return_type> batch_write_item(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
future<json::json_return_type> batch_get_item(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
future<json::json_return_type> query(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
future<> start();
future<> stop() { return make_ready_future<>(); }
future<> maybe_create_keyspace();
static tracing::trace_state_ptr maybe_trace_query(client_state& client_state, sstring_view op, sstring_view query);
};
}

alternator/expressions.cc Normal file

@@ -0,0 +1,98 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "expressions.hh"
#include "alternator/expressionsLexer.hpp"
#include "alternator/expressionsParser.hpp"
#include <seastarx.hh>
#include <seastar/core/print.hh>
#include <seastar/util/log.hh>
#include <functional>
namespace alternator {
template <typename Func, typename Result = std::result_of_t<Func(expressionsParser&)>>
Result do_with_parser(std::string input, Func&& f) {
expressionsLexer::InputStreamType input_stream{
reinterpret_cast<const ANTLR_UINT8*>(input.data()),
ANTLR_ENC_UTF8,
static_cast<ANTLR_UINT32>(input.size()),
nullptr };
expressionsLexer lexer(&input_stream);
expressionsParser::TokenStreamType tstream(ANTLR_SIZE_HINT, lexer.get_tokSource());
expressionsParser parser(&tstream);
auto result = f(parser);
return result;
}
parsed::update_expression
parse_update_expression(std::string query) {
try {
return do_with_parser(query, std::mem_fn(&expressionsParser::update_expression));
} catch (...) {
throw expressions_syntax_error(format("Failed parsing UpdateExpression '{}': {}", query, std::current_exception()));
}
}
std::vector<parsed::path>
parse_projection_expression(std::string query) {
try {
return do_with_parser(query, std::mem_fn(&expressionsParser::projection_expression));
} catch (...) {
throw expressions_syntax_error(format("Failed parsing ProjectionExpression '{}': {}", query, std::current_exception()));
}
}
template<class... Ts> struct overloaded : Ts... { using Ts::operator()...; };
template<class... Ts> overloaded(Ts...) -> overloaded<Ts...>;
namespace parsed {
void update_expression::add(update_expression::action a) {
std::visit(overloaded {
[&] (action::set&) { seen_set = true; },
[&] (action::remove&) { seen_remove = true; },
[&] (action::add&) { seen_add = true; },
[&] (action::del&) { seen_del = true; }
}, a._action);
_actions.push_back(std::move(a));
}
void update_expression::append(update_expression other) {
if ((seen_set && other.seen_set) ||
(seen_remove && other.seen_remove) ||
(seen_add && other.seen_add) ||
(seen_del && other.seen_del)) {
throw expressions_syntax_error("Each of SET, REMOVE, ADD, DELETE may only appear once in UpdateExpression");
}
std::move(other._actions.begin(), other._actions.end(), std::back_inserter(_actions));
seen_set |= other.seen_set;
seen_remove |= other.seen_remove;
seen_add |= other.seen_add;
seen_del |= other.seen_del;
}
} // namespace parsed
} // namespace alternator
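`update_expression::add()` above dispatches on the action variant with the `overloaded` helper defined just before it. A minimal self-contained reproduction of that `std::visit` idiom, with illustrative tag types in place of the real `action::set` / `action::remove` / `action::add` structs:

```cpp
#include <string>
#include <variant>

// Same pattern as the `overloaded` helper in expressions.cc: inherit all
// lambdas' operator() and let std::visit pick the matching overload.
template<class... Ts> struct visitor : Ts... { using Ts::operator()...; };
template<class... Ts> visitor(Ts...) -> visitor<Ts...>;

struct set_op {};
struct remove_op {};
struct add_op {};
using op = std::variant<set_op, remove_op, add_op>;

inline std::string kind(const op& o) {
    return std::visit(visitor {
        [] (const set_op&)    { return std::string("SET"); },
        [] (const remove_op&) { return std::string("REMOVE"); },
        [] (const add_op&)    { return std::string("ADD"); }
    }, o);
}
```

The deduction guide (second line) is what allows `visitor { lambda1, lambda2, ... }` without spelling out the closure types.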

alternator/expressions.g Normal file

@@ -0,0 +1,214 @@
/*
* Copyright 2019 ScyllaDB
*
* This file is part of Scylla. See the LICENSE.PROPRIETARY file in the
* top-level directory for licensing information.
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
/*
* The DynamoDB protocol is based on JSON, and most DynamoDB requests
* describe the operation and its parameters via JSON objects such as maps
* and lists. Nevertheless, in some types of requests an "expression" is
* passed as a single string, and we need to parse this string. These
* cases include:
* 1. Attribute paths, such as "a[3].b.c", are used in projection
* expressions as well as inside other expressions described below.
* 2. Condition expressions, such as "(NOT (a=b OR c=d)) AND e=f",
* used in conditional updates, filters, and other places.
* 3. Update expressions, such as "SET #a.b = :x, c = :y DELETE d"
*
* All these expression syntaxes are very simple: Most of them could be
* parsed as regular expressions, and the parenthesized condition expression
* could be done with a simple hand-written lexical analyzer and recursive-
* descent parser. Nevertheless, we decided to specify these parsers in the
* ANTLR3 language already used in the Scylla project, hopefully making these
* parsers easier to reason about, and easier to change if needed - and
* reducing the amount of boiler-plate code.
*/
grammar expressions;
options {
language = Cpp;
}
@parser::namespace{alternator}
@lexer::namespace{alternator}
/* TODO: explain what these traits things are. I haven't seen them explained
 * in any document... Compilation fails without them because definitions of
 * "expressionsLexerTraits" and "expressionsParserTraits" are needed.
 */
@lexer::traits {
class expressionsLexer;
class expressionsParser;
typedef antlr3::Traits<expressionsLexer, expressionsParser> expressionsLexerTraits;
}
@parser::traits {
typedef expressionsLexerTraits expressionsParserTraits;
}
@lexer::header {
#include "alternator/expressions.hh"
// ANTLR generates a bunch of unused variables and functions. Yuck...
#pragma GCC diagnostic ignored "-Wunused-variable"
#pragma GCC diagnostic ignored "-Wunused-function"
}
@parser::header {
#include "expressionsLexer.hpp"
}
/* By default, ANTLR3 composes elaborate syntax-error messages, saying which
* token was unexpected, where, and so on, but then dutifully writes these
* error messages to the standard error, and returns from the parser as if
* everything was fine, with a half-constructed output object! If we define
* the "displayRecognitionError" method, it will be called upon to build this
* error message, and we can instead throw an exception to stop the parsing
* immediately. This is good enough for now, for our simple needs, but if
* we ever want to show more information about the syntax error, Cql3.g
* contains an elaborate implementation (it would be nice if we could reuse
* it, not duplicate it).
* Unfortunately, we have to repeat the same definition twice - once for the
* parser, and once for the lexer.
*/
@parser::context {
void displayRecognitionError(ANTLR_UINT8** token_names, ExceptionBaseType* ex) {
throw expressions_syntax_error("syntax error");
}
}
@lexer::context {
void displayRecognitionError(ANTLR_UINT8** token_names, ExceptionBaseType* ex) {
throw expressions_syntax_error("syntax error");
}
}
/*
* Lexical analysis phase, i.e., splitting the input up to tokens.
* Lexical analyzer rules have names starting in capital letters.
* "fragment" rules do not generate tokens, and are just aliases used to
* make other rules more readable.
* Characters *not* listed here, e.g., '=', '(', etc., will be handled
* as individual tokens on their own right.
* Whitespace spans are skipped, so do not generate tokens.
*/
WHITESPACE: (' ' | '\t' | '\n' | '\r')+ { skip(); };
/* shortcuts for case-insensitive keywords */
fragment A:('a'|'A');
fragment B:('b'|'B');
fragment C:('c'|'C');
fragment D:('d'|'D');
fragment E:('e'|'E');
fragment F:('f'|'F');
fragment G:('g'|'G');
fragment H:('h'|'H');
fragment I:('i'|'I');
fragment J:('j'|'J');
fragment K:('k'|'K');
fragment L:('l'|'L');
fragment M:('m'|'M');
fragment N:('n'|'N');
fragment O:('o'|'O');
fragment P:('p'|'P');
fragment Q:('q'|'Q');
fragment R:('r'|'R');
fragment S:('s'|'S');
fragment T:('t'|'T');
fragment U:('u'|'U');
fragment V:('v'|'V');
fragment W:('w'|'W');
fragment X:('x'|'X');
fragment Y:('y'|'Y');
fragment Z:('z'|'Z');
/* These keywords must appear before the generic NAME token below,
* because NAME matches too, and the first to match wins.
*/
SET: S E T;
REMOVE: R E M O V E;
ADD: A D D;
DELETE: D E L E T E;
fragment ALPHA: 'A'..'Z' | 'a'..'z';
fragment DIGIT: '0'..'9';
fragment ALNUM: ALPHA | DIGIT | '_';
INTEGER: DIGIT+;
NAME: ALPHA ALNUM*;
NAMEREF: '#' ALNUM+;
VALREF: ':' ALNUM+;
/*
* Parsing phase - parsing the string of tokens generated by the lexical
* analyzer defined above.
*/
path_component: NAME | NAMEREF;
path returns [parsed::path p]:
root=path_component { $p.set_root($root.text); }
( '.' name=path_component { $p.add_dot($name.text); }
| '[' INTEGER ']' { $p.add_index(std::stoi($INTEGER.text)); }
)*;
update_expression_set_value returns [parsed::value v]:
VALREF { $v.set_valref($VALREF.text); }
| path { $v.set_path($path.p); }
| NAME { $v.set_func_name($NAME.text); }
'(' x=update_expression_set_value { $v.add_func_parameter($x.v); }
(',' x=update_expression_set_value { $v.add_func_parameter($x.v); })*
')'
;
update_expression_set_rhs returns [parsed::set_rhs rhs]:
v=update_expression_set_value { $rhs.set_value(std::move($v.v)); }
( '+' v=update_expression_set_value { $rhs.set_plus(std::move($v.v)); }
| '-' v=update_expression_set_value { $rhs.set_minus(std::move($v.v)); }
)?
;
update_expression_set_action returns [parsed::update_expression::action a]:
path '=' rhs=update_expression_set_rhs { $a.assign_set($path.p, $rhs.rhs); };
update_expression_remove_action returns [parsed::update_expression::action a]:
path { $a.assign_remove($path.p); };
update_expression_add_action returns [parsed::update_expression::action a]:
path VALREF { $a.assign_add($path.p, $VALREF.text); };
update_expression_delete_action returns [parsed::update_expression::action a]:
path VALREF { $a.assign_del($path.p, $VALREF.text); };
update_expression_clause returns [parsed::update_expression e]:
SET s=update_expression_set_action { $e.add(s); }
(',' s=update_expression_set_action { $e.add(s); })*
| REMOVE r=update_expression_remove_action { $e.add(r); }
(',' r=update_expression_remove_action { $e.add(r); })*
| ADD a=update_expression_add_action { $e.add(a); }
(',' a=update_expression_add_action { $e.add(a); })*
| DELETE d=update_expression_delete_action { $e.add(d); }
(',' d=update_expression_delete_action { $e.add(d); })*
;
// Note the "EOF" token at the end of the update expression. We want the
// parser to match the entire string given to it - not just its beginning!
update_expression returns [parsed::update_expression e]:
(update_expression_clause { e.append($update_expression_clause.e); })* EOF;
projection_expression returns [std::vector<parsed::path> v]:
p=path { $v.push_back(std::move($p.p)); }
(',' p=path { $v.push_back(std::move($p.p)); } )* EOF;

alternator/expressions.hh Normal file

@@ -0,0 +1,41 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <string>
#include <stdexcept>
#include <vector>
#include "expressions_types.hh"
namespace alternator {
class expressions_syntax_error : public std::runtime_error {
public:
using runtime_error::runtime_error;
};
parsed::update_expression parse_update_expression(std::string query);
std::vector<parsed::path> parse_projection_expression(std::string query);
} /* namespace alternator */


@@ -0,0 +1,166 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <vector>
#include <string>
#include <variant>
/*
* Parsed representation of expressions and their components.
*
* Types in alternator::parse namespace are used for holding the parse
* tree - objects generated by the Antlr rules after parsing an expression.
* Because of the way Antlr works, all these objects are default-constructed
* first, and then assigned when the rule is completed, so all these types
* have only default constructors - but setter functions to set them later.
*/
namespace alternator {
namespace parsed {
// "path" is an attribute's path in a document, e.g., a.b[3].c.
class path {
// All paths have a "root", a top-level attribute, and any number of
// "dereference operators" - each either an index (e.g., "[2]") or a
// dot (e.g., ".xyz").
std::string _root;
std::vector<std::variant<std::string, unsigned>> _operators;
public:
void set_root(std::string root) {
_root = std::move(root);
}
void add_index(unsigned i) {
_operators.emplace_back(i);
}
void add_dot(std::string name) {
_operators.emplace_back(std::move(name));
}
const std::string& root() const {
return _root;
}
bool has_operators() const {
return !_operators.empty();
}
};
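The class above stores a path like "a.b[3].c" as a root plus a vector of dot/index operators. A standalone sketch showing how such a representation round-trips back to text, using a simplified replica with a hypothetical `to_string()` (not present in the real class):

```cpp
#include <string>
#include <variant>
#include <vector>

// Simplified stand-in for parsed::path: a root attribute name plus
// dereference operators, each either a dot-name or an array index.
struct mini_path {
    std::string root;
    std::vector<std::variant<std::string, unsigned>> ops;

    // Reassemble the textual form, e.g. "a.b[3].c".
    std::string to_string() const {
        std::string s = root;
        for (const auto& op : ops) {
            if (auto* name = std::get_if<std::string>(&op)) {
                s += "." + *name;
            } else {
                s += "[" + std::to_string(std::get<unsigned>(op)) + "]";
            }
        }
        return s;
    }
};
```

The `std::variant` alternative (string vs unsigned) is what distinguishes `.name` dereferences from `[index]` ones, exactly as in `_operators` above.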
// "value" is a value used in the right hand side of an assignment
// expression, "SET a = ...". It can be a reference to a value included in
// the request (":val"), a path to an attribute from the existing item
// (e.g., "a.b[3].c"), or a function of other such values.
// Note that the real right-hand-side of an assignment is actually a bit
// more general - it allows either a value, or a value+value or value-value -
// see class set_rhs below.
struct value {
struct function_call {
std::string _function_name;
std::vector<value> _parameters;
};
std::variant<std::string, path, function_call> _value;
void set_valref(std::string s) {
_value = std::move(s);
}
void set_path(path p) {
_value = std::move(p);
}
void set_func_name(std::string s) {
_value = function_call {std::move(s), {}};
}
void add_func_parameter(value v) {
std::get<function_call>(_value)._parameters.emplace_back(std::move(v));
}
};
// The right-hand-side of a SET in an update expression can be either a
// single value (see above), or value+value, or value-value.
class set_rhs {
public:
char _op; // '+', '-', or 'v' (single value)
value _v1;
value _v2;
void set_value(value&& v1) {
_op = 'v';
_v1 = std::move(v1);
}
void set_plus(value&& v2) {
_op = '+';
_v2 = std::move(v2);
}
void set_minus(value&& v2) {
_op = '-';
_v2 = std::move(v2);
}
};
class update_expression {
public:
struct action {
path _path;
struct set {
set_rhs _rhs;
};
struct remove {
};
struct add {
std::string _valref;
};
struct del {
std::string _valref;
};
std::variant<set, remove, add, del> _action;
void assign_set(path p, set_rhs rhs) {
_path = std::move(p);
_action = set { std::move(rhs) };
}
void assign_remove(path p) {
_path = std::move(p);
_action = remove { };
}
void assign_add(path p, std::string v) {
_path = std::move(p);
_action = add { std::move(v) };
}
void assign_del(path p, std::string v) {
_path = std::move(p);
_action = del { std::move(v) };
}
};
private:
std::vector<action> _actions;
bool seen_set = false;
bool seen_remove = false;
bool seen_add = false;
bool seen_del = false;
public:
void add(action a);
void append(update_expression other);
bool empty() const {
return _actions.empty();
}
const std::vector<action>& actions() const {
return _actions;
}
};
} // namespace parsed
} // namespace alternator
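The parse-tree classes above are plain default-constructed holders filled in by Antlr rule setters. As a rough illustration of the same variant-based shape, here is a minimal, hypothetical stand-in (`mini_path` is not part of the codebase) that renders a path back to DynamoDB syntax:

```cpp
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-in for alternator::parsed::path: a root attribute plus
// a list of dereference operators, each either a dot-name or an index.
struct mini_path {
    std::string root;
    std::vector<std::variant<std::string, unsigned>> ops;

    // Render back to DynamoDB path syntax, e.g. "a.b[3].c".
    std::string to_string() const {
        std::string out = root;
        for (const auto& op : ops) {
            if (auto* name = std::get_if<std::string>(&op)) {
                out += "." + *name;
            } else {
                out += "[" + std::to_string(std::get<unsigned>(op)) + "]";
            }
        }
        return out;
    }
};
```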

alternator/rjson.cc (new file, +172 lines)
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "rjson.hh"
#include "error.hh"
#include <seastar/core/print.hh>
namespace rjson {
static allocator the_allocator;
std::string print(const rjson::value& value) {
string_buffer buffer;
writer writer(buffer);
value.Accept(writer);
return std::string(buffer.GetString());
}
rjson::value copy(const rjson::value& value) {
return rjson::value(value, the_allocator);
}
rjson::value parse(const std::string& str) {
return parse_raw(str.c_str(), str.size());
}
rjson::value parse_raw(const char* c_str, size_t size) {
rjson::document d;
d.Parse(c_str, size);
if (d.HasParseError()) {
throw rjson::error(format("Parsing JSON failed: {}", GetParseError_En(d.GetParseError())));
}
rjson::value& v = d;
return std::move(v);
}
rjson::value& get(rjson::value& value, rjson::string_ref_type name) {
auto member_it = value.FindMember(name);
if (member_it != value.MemberEnd())
return member_it->value;
else {
throw rjson::error(format("JSON parameter {} not found", name));
}
}
const rjson::value& get(const rjson::value& value, rjson::string_ref_type name) {
auto member_it = value.FindMember(name);
if (member_it != value.MemberEnd())
return member_it->value;
else {
throw rjson::error(format("JSON parameter {} not found", name));
}
}
rjson::value from_string(const std::string& str) {
return rjson::value(str.c_str(), str.size(), the_allocator);
}
rjson::value from_string(const sstring& str) {
return rjson::value(str.c_str(), str.size(), the_allocator);
}
rjson::value from_string(const char* str, size_t size) {
return rjson::value(str, size, the_allocator);
}
const rjson::value* find(const rjson::value& value, string_ref_type name) {
auto member_it = value.FindMember(name);
return member_it != value.MemberEnd() ? &member_it->value : nullptr;
}
rjson::value* find(rjson::value& value, string_ref_type name) {
auto member_it = value.FindMember(name);
return member_it != value.MemberEnd() ? &member_it->value : nullptr;
}
void set_with_string_name(rjson::value& base, const std::string& name, rjson::value&& member) {
base.AddMember(rjson::value(name.c_str(), name.size(), the_allocator), std::move(member), the_allocator);
}
void set_with_string_name(rjson::value& base, const std::string& name, rjson::string_ref_type member) {
base.AddMember(rjson::value(name.c_str(), name.size(), the_allocator), rjson::value(member), the_allocator);
}
void set(rjson::value& base, rjson::string_ref_type name, rjson::value&& member) {
base.AddMember(name, std::move(member), the_allocator);
}
void set(rjson::value& base, rjson::string_ref_type name, rjson::string_ref_type member) {
base.AddMember(name, rjson::value(member), the_allocator);
}
void push_back(rjson::value& base_array, rjson::value&& item) {
base_array.PushBack(std::move(item), the_allocator);
}
bool single_value_comp::operator()(const rjson::value& r1, const rjson::value& r2) const {
auto r1_type = r1.GetType();
auto r2_type = r2.GetType();
// null is the smallest type and is comparable with every other type; nothing is less than null
if (r1_type == rjson::type::kNullType || r2_type == rjson::type::kNullType) {
return r1_type < r2_type;
}
// across different types, only null, false and true are mutually comparable; any other type mix is an error
if (r1_type != r2_type) {
if (r1_type > rjson::type::kTrueType || r2_type > rjson::type::kTrueType) {
throw rjson::error(format("Types are not comparable: {} {}", r1, r2));
}
}
switch (r1_type) {
case rjson::type::kNullType:
// fall-through
case rjson::type::kFalseType:
// fall-through
case rjson::type::kTrueType:
return r1_type < r2_type;
case rjson::type::kObjectType:
throw rjson::error("Object type comparison is not supported");
case rjson::type::kArrayType:
throw rjson::error("Array type comparison is not supported");
case rjson::type::kStringType: {
const size_t r1_len = r1.GetStringLength();
const size_t r2_len = r2.GetStringLength();
size_t len = std::min(r1_len, r2_len);
int result = std::strncmp(r1.GetString(), r2.GetString(), len);
return result < 0 || (result == 0 && r1_len < r2_len);
}
case rjson::type::kNumberType: {
if (r1.IsInt() && r2.IsInt()) {
return r1.GetInt() < r2.GetInt();
} else if (r1.IsUint() && r2.IsUint()) {
return r1.GetUint() < r2.GetUint();
} else if (r1.IsInt64() && r2.IsInt64()) {
return r1.GetInt64() < r2.GetInt64();
} else if (r1.IsUint64() && r2.IsUint64()) {
return r1.GetUint64() < r2.GetUint64();
} else {
// it's safe to call GetDouble() on any number type
return r1.GetDouble() < r2.GetDouble();
}
}
default:
return false;
}
}
} // end namespace rjson
std::ostream& std::operator<<(std::ostream& os, const rjson::value& v) {
return os << rjson::print(v);
}
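The kStringType branch of single_value_comp orders strings byte-wise over the common prefix and breaks ties by length. The same logic, extracted into a standalone sketch (`json_string_less` is a hypothetical name):

```cpp
#include <algorithm>
#include <cstring>
#include <string>

// Standalone sketch of the kStringType branch of single_value_comp:
// byte-wise comparison over the common prefix; on a tie, the shorter
// string orders first.
bool json_string_less(const std::string& a, const std::string& b) {
    size_t len = std::min(a.size(), b.size());
    int result = std::strncmp(a.data(), b.data(), len);
    return result < 0 || (result == 0 && a.size() < b.size());
}
```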

alternator/rjson.hh (new file, +163 lines)
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
/*
* rjson is a wrapper over rapidjson library, providing fast JSON parsing and generation.
*
* rapidjson has strict copy elision policies, which, among other things, involves
* using provided char arrays without copying them and allows copying objects only explicitly.
* As such, one should be careful when passing strings with limited liveness
* (e.g. data underneath local std::strings) to rjson functions, because created JSON objects
 * may end up relying on dangling char pointers. All rjson functions that create JSON values
 * from strings come in two variants: one taking string_ref_type (more efficient, for strings
 * known to live at least as long as the resulting object, e.g. a static char array) and one
 * taking std::string. The string_ref_type variants should be used *only* if the liveness of
 * the string is guaranteed; otherwise the result is undefined behaviour.
* Also, bear in mind that methods exposed by rjson::value are generic, but some of them
* work fine only for specific types. In case the type does not match, an rjson::error will be thrown.
 * Examples of such mismatched usage are calling MemberCount() on a JSON value not of object type
* or calling Size() on a non-array value.
*/
#include <string>
#include <stdexcept>
namespace rjson {
class error : public std::exception {
std::string _msg;
public:
error() = default;
error(const std::string& msg) : _msg(msg) {}
virtual const char* what() const noexcept override { return _msg.c_str(); }
};
}
// rapidjson configuration macros
#define RAPIDJSON_HAS_STDSTRING 1
// Default rjson policy is to use assert() - which is dangerous for two reasons:
// 1. assert() can be turned off with -DNDEBUG
// 2. assert() crashes a program
// Fortunately, the default policy can be overridden, and so rapidjson errors will
// throw an rjson::error exception instead.
#define RAPIDJSON_ASSERT(x) do { if (!(x)) throw rjson::error(std::string("JSON error: condition not met: ") + #x); } while (0)
#include <rapidjson/document.h>
#include <rapidjson/writer.h>
#include <rapidjson/stringbuffer.h>
#include <rapidjson/error/en.h>
#include <seastar/core/sstring.hh>
#include "seastarx.hh"
namespace rjson {
using allocator = rapidjson::CrtAllocator;
using encoding = rapidjson::UTF8<>;
using document = rapidjson::GenericDocument<encoding, allocator>;
using value = rapidjson::GenericValue<encoding, allocator>;
using string_ref_type = value::StringRefType;
using string_buffer = rapidjson::GenericStringBuffer<encoding>;
using writer = rapidjson::Writer<string_buffer, encoding>;
using type = rapidjson::Type;
// Returns an object representing JSON's null
inline rjson::value null_value() {
return rjson::value(rapidjson::kNullType);
}
// Returns an empty JSON object - {}
inline rjson::value empty_object() {
return rjson::value(rapidjson::kObjectType);
}
// Returns an empty JSON array - []
inline rjson::value empty_array() {
return rjson::value(rapidjson::kArrayType);
}
// Returns an empty JSON string - ""
inline rjson::value empty_string() {
return rjson::value(rapidjson::kStringType);
}
// Convert the JSON value to a string with JSON syntax, the opposite of parse().
// The representation is dense - without any redundant indentation.
std::string print(const rjson::value& value);
// Copies given JSON value - involves allocation
rjson::value copy(const rjson::value& value);
// Parses a JSON value from given string or raw character array.
// The string/char array liveness does not need to be persisted,
// as both parse() and parse_raw() will allocate member names and values.
// Throws rjson::error if parsing failed.
rjson::value parse(const std::string& str);
rjson::value parse_raw(const char* c_str, size_t size);
// Creates a JSON value (of JSON string type) out of internal string representations.
// The string value is copied, so str's liveness does not need to be persisted.
rjson::value from_string(const std::string& str);
rjson::value from_string(const sstring& str);
rjson::value from_string(const char* str, size_t size);
// Returns a pointer to JSON member if it exists, nullptr otherwise
rjson::value* find(rjson::value& value, rjson::string_ref_type name);
const rjson::value* find(const rjson::value& value, rjson::string_ref_type name);
// Returns a reference to JSON member if it exists, throws otherwise
rjson::value& get(rjson::value& value, rjson::string_ref_type name);
const rjson::value& get(const rjson::value& value, rjson::string_ref_type name);
// Sets a member in given JSON object by moving the member - allocates the name.
// Throws if base is not a JSON object.
void set_with_string_name(rjson::value& base, const std::string& name, rjson::value&& member);
// Sets a string member in given JSON object by assigning its reference - allocates the name.
// NOTICE: member string liveness must be ensured to be at least as long as base's.
// Throws if base is not a JSON object.
void set_with_string_name(rjson::value& base, const std::string& name, rjson::string_ref_type member);
// Sets a member in given JSON object by moving the member.
// NOTICE: name liveness must be ensured to be at least as long as base's.
// Throws if base is not a JSON object.
void set(rjson::value& base, rjson::string_ref_type name, rjson::value&& member);
// Sets a string member in given JSON object by assigning its reference.
// NOTICE: name liveness must be ensured to be at least as long as base's.
// NOTICE: member liveness must be ensured to be at least as long as base's.
// Throws if base is not a JSON object.
void set(rjson::value& base, rjson::string_ref_type name, rjson::string_ref_type member);
// Adds a value to a JSON list by moving the item to its end.
// Throws if base_array is not a JSON array.
void push_back(rjson::value& base_array, rjson::value&& item);
struct single_value_comp {
bool operator()(const rjson::value& r1, const rjson::value& r2) const;
};
} // end namespace rjson
namespace std {
std::ostream& operator<<(std::ostream& os, const rjson::value& v);
}
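The RAPIDJSON_ASSERT override above converts failed invariants into exceptions instead of calling abort(), so checks survive -DNDEBUG builds and cannot crash the server. A minimal sketch of the same pattern, with hypothetical names (`json_error`, `JSON_CHECK`):

```cpp
#include <stdexcept>
#include <string>

// Hypothetical error type mirroring rjson::error.
struct json_error : std::runtime_error {
    explicit json_error(const std::string& msg) : std::runtime_error(msg) {}
};

// Same pattern as the RAPIDJSON_ASSERT override: a failed condition throws
// instead of aborting, and keeps working even when NDEBUG is defined.
#define JSON_CHECK(x) \
    do { if (!(x)) throw json_error(std::string("JSON error: condition not met: ") + #x); } while (0)

// Returns true if the check fired for the given input.
bool violates(int size) {
    try {
        JSON_CHECK(size >= 0);
        return false;
    } catch (const json_error&) {
        return true;
    }
}
```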

alternator/serialization.cc (new file, +261 lines)
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "base64.hh"
#include "log.hh"
#include "serialization.hh"
#include "error.hh"
#include "rapidjson/writer.h"
#include "concrete_types.hh"
#include "cql3/type_json.hh"
static logging::logger slogger("alternator-serialization");
namespace alternator {
type_info type_info_from_string(std::string type) {
static thread_local const std::unordered_map<std::string, type_info> type_infos = {
{"S", {alternator_type::S, utf8_type}},
{"B", {alternator_type::B, bytes_type}},
{"BOOL", {alternator_type::BOOL, boolean_type}},
{"N", {alternator_type::N, decimal_type}}, //FIXME: Replace with custom Alternator type when implemented
};
auto it = type_infos.find(type);
if (it == type_infos.end()) {
return {alternator_type::NOT_SUPPORTED_YET, utf8_type};
}
return it->second;
}
type_representation represent_type(alternator_type atype) {
static thread_local const std::unordered_map<alternator_type, type_representation> type_representations = {
{alternator_type::S, {"S", utf8_type}},
{alternator_type::B, {"B", bytes_type}},
{alternator_type::BOOL, {"BOOL", boolean_type}},
{alternator_type::N, {"N", decimal_type}}, //FIXME: Replace with custom Alternator type when implemented
};
auto it = type_representations.find(atype);
if (it == type_representations.end()) {
throw std::runtime_error(format("Unknown alternator type {}", int8_t(atype)));
}
return it->second;
}
struct from_json_visitor {
const rjson::value& v;
bytes_ostream& bo;
void operator()(const reversed_type_impl& t) const { visit(*t.underlying_type(), from_json_visitor{v, bo}); };
void operator()(const string_type_impl& t) {
bo.write(t.from_string(sstring_view(v.GetString(), v.GetStringLength())));
}
void operator()(const bytes_type_impl& t) const {
bo.write(base64_decode(v));
}
void operator()(const boolean_type_impl& t) const {
bo.write(boolean_type->decompose(v.GetBool()));
}
void operator()(const decimal_type_impl& t) const {
bo.write(t.from_string(sstring_view(v.GetString(), v.GetStringLength())));
}
// default
void operator()(const abstract_type& t) const {
bo.write(from_json_object(t, Json::Value(rjson::print(v)), cql_serialization_format::internal()));
}
};
bytes serialize_item(const rjson::value& item) {
if (item.IsNull() || item.MemberCount() != 1) {
throw api_error("ValidationException", format("An item can contain only one attribute definition: {}", item));
}
auto it = item.MemberBegin();
type_info type_info = type_info_from_string(it->name.GetString()); // JSON keys are guaranteed to be strings
if (type_info.atype == alternator_type::NOT_SUPPORTED_YET) {
slogger.trace("Non-optimal serialization of type {}", it->name.GetString());
return bytes{int8_t(type_info.atype)} + to_bytes(rjson::print(item));
}
bytes_ostream bo;
bo.write(bytes{int8_t(type_info.atype)});
visit(*type_info.dtype, from_json_visitor{it->value, bo});
return bytes(bo.linearize());
}
struct to_json_visitor {
rjson::value& deserialized;
const std::string& type_ident;
bytes_view bv;
void operator()(const reversed_type_impl& t) const { visit(*t.underlying_type(), to_json_visitor{deserialized, type_ident, bv}); };
void operator()(const decimal_type_impl& t) const {
auto s = to_json_string(*decimal_type, bytes(bv));
//FIXME(sarna): unnecessary copy
rjson::set_with_string_name(deserialized, type_ident, rjson::from_string(s));
}
void operator()(const string_type_impl& t) {
rjson::set_with_string_name(deserialized, type_ident, rjson::from_string(reinterpret_cast<const char *>(bv.data()), bv.size()));
}
void operator()(const bytes_type_impl& t) const {
std::string b64 = base64_encode(bv);
rjson::set_with_string_name(deserialized, type_ident, rjson::from_string(b64));
}
// default
void operator()(const abstract_type& t) const {
rjson::set_with_string_name(deserialized, type_ident, rjson::parse(t.to_string(bytes(bv))));
}
};
rjson::value deserialize_item(bytes_view bv) {
rjson::value deserialized(rapidjson::kObjectType);
if (bv.empty()) {
throw api_error("ValidationException", "Serialized value empty");
}
alternator_type atype = alternator_type(bv[0]);
bv.remove_prefix(1);
if (atype == alternator_type::NOT_SUPPORTED_YET) {
slogger.trace("Non-optimal deserialization of alternator type {}", int8_t(atype));
return rjson::parse_raw(reinterpret_cast<const char *>(bv.data()), bv.size());
}
type_representation type_representation = represent_type(atype);
visit(*type_representation.dtype, to_json_visitor{deserialized, type_representation.ident, bv});
return deserialized;
}
std::string type_to_string(data_type type) {
static thread_local std::unordered_map<data_type, std::string> types = {
{utf8_type, "S"},
{bytes_type, "B"},
{boolean_type, "BOOL"},
{decimal_type, "N"}, // FIXME: use a specialized Alternator number type instead of the general decimal_type
};
auto it = types.find(type);
if (it == types.end()) {
throw std::runtime_error(format("Unknown type {}", type->name()));
}
return it->second;
}
bytes get_key_column_value(const rjson::value& item, const column_definition& column) {
std::string column_name = column.name_as_text();
std::string expected_type = type_to_string(column.type);
const rjson::value& key_typed_value = rjson::get(item, rjson::value::StringRefType(column_name.c_str()));
if (!key_typed_value.IsObject() || key_typed_value.MemberCount() != 1) {
throw api_error("ValidationException",
format("Missing or invalid value object for key column {}: {}", column_name, item));
}
return get_key_from_typed_value(key_typed_value, column, expected_type);
}
bytes get_key_from_typed_value(const rjson::value& key_typed_value, const column_definition& column, const std::string& expected_type) {
auto it = key_typed_value.MemberBegin();
if (it->name.GetString() != expected_type) {
throw api_error("ValidationException",
format("Type mismatch: expected type {} for key column {}, got type {}",
expected_type, column.name_as_text(), it->name.GetString()));
}
if (column.type == bytes_type) {
return base64_decode(it->value);
} else {
return column.type->from_string(it->value.GetString());
}
}
rjson::value json_key_column_value(bytes_view cell, const column_definition& column) {
if (column.type == bytes_type) {
std::string b64 = base64_encode(cell);
return rjson::from_string(b64);
} if (column.type == utf8_type) {
return rjson::from_string(std::string(reinterpret_cast<const char*>(cell.data()), cell.size()));
} else if (column.type == decimal_type) {
// FIXME: use specialized Alternator number type, not the more
// general "decimal_type". A dedicated type can be more efficient
// in storage space and in parsing speed.
auto s = to_json_string(*decimal_type, bytes(cell));
return rjson::from_string(s);
} else {
// We shouldn't get here, we shouldn't see such key columns.
throw std::runtime_error(format("Unexpected key type: {}", column.type->name()));
}
}
partition_key pk_from_json(const rjson::value& item, schema_ptr schema) {
std::vector<bytes> raw_pk;
// FIXME: this is a loop, but we really allow only one partition key column.
for (const column_definition& cdef : schema->partition_key_columns()) {
bytes raw_value = get_key_column_value(item, cdef);
raw_pk.push_back(std::move(raw_value));
}
return partition_key::from_exploded(raw_pk);
}
clustering_key ck_from_json(const rjson::value& item, schema_ptr schema) {
if (schema->clustering_key_size() == 0) {
return clustering_key::make_empty();
}
std::vector<bytes> raw_ck;
// FIXME: this is a loop, but we really allow only one clustering key column.
for (const column_definition& cdef : schema->clustering_key_columns()) {
bytes raw_value = get_key_column_value(item, cdef);
raw_ck.push_back(std::move(raw_value));
}
return clustering_key::from_exploded(raw_ck);
}
big_decimal unwrap_number(const rjson::value& v, std::string_view diagnostic) {
if (!v.IsObject() || v.MemberCount() != 1) {
throw api_error("ValidationException", format("{}: invalid number object", diagnostic));
}
auto it = v.MemberBegin();
if (it->name != "N") {
throw api_error("ValidationException", format("{}: expected number, found type '{}'", diagnostic, it->name));
}
if (it->value.IsNumber()) {
// FIXME(sarna): should use big_decimal constructor with numeric values directly:
return big_decimal(rjson::print(it->value));
}
if (!it->value.IsString()) {
throw api_error("ValidationException", format("{}: improperly formatted number constant", diagnostic));
}
return big_decimal(it->value.GetString());
}
const std::pair<std::string, const rjson::value*> unwrap_set(const rjson::value& v) {
if (!v.IsObject() || v.MemberCount() != 1) {
return {"", nullptr};
}
auto it = v.MemberBegin();
const std::string it_key = it->name.GetString();
if (it_key != "SS" && it_key != "BS" && it_key != "NS") {
return {"", nullptr};
}
return std::make_pair(it_key, &(it->value));
}
}
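serialize_item and deserialize_item above frame each value as a one-byte type tag followed by the payload bytes. A hypothetical miniature of that framing (`tag`, `serialize`, `deserialize` are illustrative names, with std::string standing in for bytes):

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical miniature of the serialize_item/deserialize_item framing:
// one leading tag byte identifying the type, followed by the raw payload.
enum class tag : int8_t { S = 0, B = 1 };

std::string serialize(tag t, const std::string& payload) {
    std::string out;
    out.push_back(static_cast<char>(t));
    out += payload;
    return out;
}

std::pair<tag, std::string> deserialize(const std::string& bytes) {
    if (bytes.empty()) {
        // Mirrors the "Serialized value empty" validation in deserialize_item.
        throw std::runtime_error("Serialized value empty");
    }
    return {static_cast<tag>(bytes[0]), bytes.substr(1)};
}
```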

New file, +72 lines (file name not captured)
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <string>
#include <string_view>
#include "types.hh"
#include "schema.hh"
#include "keys.hh"
#include "rjson.hh"
#include "utils/big_decimal.hh"
namespace alternator {
enum class alternator_type : int8_t {
S, B, BOOL, N, NOT_SUPPORTED_YET
};
struct type_info {
alternator_type atype;
data_type dtype;
};
struct type_representation {
std::string ident;
data_type dtype;
};
type_info type_info_from_string(std::string type);
type_representation represent_type(alternator_type atype);
bytes serialize_item(const rjson::value& item);
rjson::value deserialize_item(bytes_view bv);
std::string type_to_string(data_type type);
bytes get_key_column_value(const rjson::value& item, const column_definition& column);
bytes get_key_from_typed_value(const rjson::value& key_typed_value, const column_definition& column, const std::string& expected_type);
rjson::value json_key_column_value(bytes_view cell, const column_definition& column);
partition_key pk_from_json(const rjson::value& item, schema_ptr schema);
clustering_key ck_from_json(const rjson::value& item, schema_ptr schema);
// If v encodes a number (i.e., it is a {"N": ...} object), returns an object representing it.
// Otherwise, raises ValidationException with the given diagnostic.
big_decimal unwrap_number(const rjson::value& v, std::string_view diagnostic);
// Checks if a given JSON object encodes a set (i.e., it is a {"SS": [...]}, {"NS": [...]},
// or {"BS": [...]} object), and if so returns the set's type tag and a pointer to the set.
// If the object does not encode a set, the returned value is {"", nullptr}.
const std::pair<std::string, const rjson::value*> unwrap_set(const rjson::value& v);
}

alternator/server.cc (new file, +314 lines)
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "alternator/server.hh"
#include "log.hh"
#include <seastar/http/function_handlers.hh>
#include <seastar/json/json_elements.hh>
#include <seastarx.hh>
#include "error.hh"
#include "rjson.hh"
#include "auth.hh"
#include <cctype>
#include "cql3/query_processor.hh"
static logging::logger slogger("alternator-server");
using namespace httpd;
namespace alternator {
static constexpr auto TARGET = "X-Amz-Target";
inline std::vector<std::string_view> split(std::string_view text, char separator) {
std::vector<std::string_view> tokens;
if (text == "") {
return tokens;
}
while (true) {
auto pos = text.find_first_of(separator);
if (pos != std::string_view::npos) {
tokens.emplace_back(text.data(), pos);
text.remove_prefix(pos + 1);
} else {
tokens.emplace_back(text);
break;
}
}
return tokens;
}
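The split helper above has two edge cases worth noting: empty input yields no tokens, while a trailing separator yields a trailing empty token. A standalone copy, for illustration:

```cpp
#include <string_view>
#include <vector>

// Standalone copy of the split helper above, to illustrate its edge cases:
// "" -> {}, and "a." -> {"a", ""} (a trailing separator produces a trailing
// empty token).
std::vector<std::string_view> split(std::string_view text, char separator) {
    std::vector<std::string_view> tokens;
    if (text == "") {
        return tokens;
    }
    while (true) {
        auto pos = text.find_first_of(separator);
        if (pos != std::string_view::npos) {
            tokens.emplace_back(text.data(), pos);
            text.remove_prefix(pos + 1);
        } else {
            tokens.emplace_back(text);
            break;
        }
    }
    return tokens;
}
```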
// DynamoDB HTTP error responses are structured as follows
// https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Programming.Errors.html
// Our handlers throw an exception to report an error. If the exception
// is of type alternator::api_error, it is unwrapped and reported to the
// client in the expected format. Other exceptions are unexpected, and are
// reported as an Internal Server Error.
class api_handler : public handler_base {
public:
api_handler(const future_json_function& _handle) : _f_handle(
[_handle](std::unique_ptr<request> req, std::unique_ptr<reply> rep) {
return seastar::futurize_apply(_handle, std::move(req)).then_wrapped([rep = std::move(rep)](future<json::json_return_type> resf) mutable {
if (resf.failed()) {
// Exceptions of type api_error are wrapped as JSON and
// returned to the client as expected. Other types of
// exceptions are unexpected, and returned to the user
// as an internal server error:
api_error ret;
try {
resf.get();
} catch (api_error &ae) {
ret = ae;
} catch (rjson::error & re) {
ret = api_error("ValidationException", re.what());
} catch (...) {
ret = api_error(
"Internal Server Error",
format("Internal server error: {}", std::current_exception()),
reply::status_type::internal_server_error);
}
// FIXME: what is this version number?
rep->_content += "{\"__type\":\"com.amazonaws.dynamodb.v20120810#" + ret._type + "\"," +
"\"message\":\"" + ret._msg + "\"}";
rep->_status = ret._http_code;
slogger.trace("api_handler error case: {}", rep->_content);
return make_ready_future<std::unique_ptr<reply>>(std::move(rep));
}
slogger.trace("api_handler success case");
auto res = resf.get0();
if (res._body_writer) {
rep->write_body("json", std::move(res._body_writer));
} else {
rep->_content += res._res;
}
return make_ready_future<std::unique_ptr<reply>>(std::move(rep));
});
}), _type("json") { }
api_handler(const api_handler&) = default;
future<std::unique_ptr<reply>> handle(const sstring& path,
std::unique_ptr<request> req, std::unique_ptr<reply> rep) override {
return _f_handle(std::move(req), std::move(rep)).then(
[this](std::unique_ptr<reply> rep) {
rep->done(_type);
return make_ready_future<std::unique_ptr<reply>>(std::move(rep));
});
}
protected:
future_handler_function _f_handle;
sstring _type;
};
class health_handler : public handler_base {
virtual future<std::unique_ptr<reply>> handle(const sstring& path, std::unique_ptr<request> req, std::unique_ptr<reply> rep) override {
rep->set_status(reply::status_type::ok);
rep->write_body("txt", format("healthy: {}", req->get_header("Host")));
return make_ready_future<std::unique_ptr<reply>>(std::move(rep));
}
};
future<> server::verify_signature(const request& req) {
if (!_enforce_authorization) {
slogger.debug("Skipping authorization");
return make_ready_future<>();
}
auto host_it = req._headers.find("Host");
if (host_it == req._headers.end()) {
throw api_error("InvalidSignatureException", "Host header is mandatory for signature verification");
}
auto authorization_it = req._headers.find("Authorization");
if (authorization_it == req._headers.end()) {
throw api_error("InvalidSignatureException", "Authorization header is mandatory for signature verification");
}
std::string host = host_it->second;
std::vector<std::string_view> credentials_raw = split(authorization_it->second, ' ');
std::string credential;
std::string user_signature;
std::string signed_headers_str;
std::vector<std::string_view> signed_headers;
for (std::string_view entry : credentials_raw) {
std::vector<std::string_view> entry_split = split(entry, '=');
if (entry_split.size() != 2) {
if (entry != "AWS4-HMAC-SHA256") {
throw api_error("InvalidSignatureException", format("Only AWS4-HMAC-SHA256 algorithm is supported. Found: {}", entry));
}
continue;
}
std::string_view auth_value = entry_split[1];
// Commas appear as an additional (quite redundant) delimiter
if (auth_value.back() == ',') {
auth_value.remove_suffix(1);
}
if (entry_split[0] == "Credential") {
credential = std::string(auth_value);
} else if (entry_split[0] == "Signature") {
user_signature = std::string(auth_value);
} else if (entry_split[0] == "SignedHeaders") {
signed_headers_str = std::string(auth_value);
signed_headers = split(auth_value, ';');
std::sort(signed_headers.begin(), signed_headers.end());
}
}
std::vector<std::string_view> credential_split = split(credential, '/');
if (credential_split.size() != 5) {
throw api_error("ValidationException", format("Incorrect credential information format: {}", credential));
}
std::string user(credential_split[0]);
std::string datestamp(credential_split[1]);
std::string region(credential_split[2]);
std::string service(credential_split[3]);
std::map<std::string_view, std::string_view> signed_headers_map;
for (const auto& header : signed_headers) {
signed_headers_map.emplace(header, std::string_view());
}
for (auto& header : req._headers) {
std::string header_str;
header_str.resize(header.first.size());
std::transform(header.first.begin(), header.first.end(), header_str.begin(), ::tolower);
auto it = signed_headers_map.find(header_str);
if (it != signed_headers_map.end()) {
it->second = std::string_view(header.second);
}
}
auto cache_getter = [] (std::string username) {
return get_key_from_roles(cql3::get_query_processor().local(), std::move(username));
};
return _key_cache.get_ptr(user, cache_getter).then([this, &req,
user = std::move(user),
host = std::move(host),
datestamp = std::move(datestamp),
signed_headers_str = std::move(signed_headers_str),
signed_headers_map = std::move(signed_headers_map),
region = std::move(region),
service = std::move(service),
user_signature = std::move(user_signature)] (key_cache::value_ptr key_ptr) {
std::string signature = get_signature(user, *key_ptr, std::string_view(host), req._method,
datestamp, signed_headers_str, signed_headers_map, req.content, region, service, "");
if (signature != std::string_view(user_signature)) {
_key_cache.remove(user);
throw api_error("UnrecognizedClientException", "The security token included in the request is invalid.");
}
});
}
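The function above parses an AWS SigV4 `Authorization` header by splitting on spaces, then on `=`, and trimming trailing commas before comparing field names. A minimal standalone sketch of the same parsing (names and the sample header are illustrative, not Scylla's actual API):

```cpp
#include <map>
#include <string>
#include <vector>

// Split s on delim, skipping empty tokens.
static std::vector<std::string> split(const std::string& s, char delim) {
    std::vector<std::string> out;
    size_t start = 0;
    while (start <= s.size()) {
        size_t end = s.find(delim, start);
        if (end == std::string::npos) {
            end = s.size();
        }
        if (end > start) {
            out.push_back(s.substr(start, end - start));
        }
        start = end + 1;
    }
    return out;
}

// Parse "AWS4-HMAC-SHA256 Credential=..., SignedHeaders=..., Signature=..."
// into a key/value map; the bare algorithm token has no '=' and is skipped.
std::map<std::string, std::string> parse_authorization(const std::string& header) {
    std::map<std::string, std::string> fields;
    for (const auto& entry : split(header, ' ')) {
        auto kv = split(entry, '=');
        if (kv.size() != 2) {
            continue; // e.g. the bare "AWS4-HMAC-SHA256" algorithm token
        }
        std::string value = kv[1];
        if (!value.empty() && value.back() == ',') {
            value.pop_back(); // commas are a redundant extra delimiter
        }
        fields[kv[0]] = value;
    }
    return fields;
}
```

As in the code above, the `Credential` value then splits on `/` into exactly five components (user, datestamp, region, service, terminator), which is what the `credential_split.size() != 5` check enforces.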
future<json::json_return_type> server::handle_api_request(std::unique_ptr<request>&& req) {
_executor.local()._stats.total_operations++;
sstring target = req->get_header(TARGET);
std::vector<std::string_view> split_target = split(target, '.');
//NOTICE(sarna): Target consists of Dynamo API version followed by a dot '.' and operation type (e.g. CreateTable)
std::string op = split_target.empty() ? std::string() : std::string(split_target.back());
slogger.trace("Request: {} {}", op, req->content);
return verify_signature(*req).then([this, op, req = std::move(req)] () mutable {
auto callback_it = _callbacks.find(op);
if (callback_it == _callbacks.end()) {
_executor.local()._stats.unsupported_operations++;
throw api_error("UnknownOperationException",
format("Unsupported operation {}", op));
}
//FIXME: Client state can provide more context, e.g. client's endpoint address
// We use unique_ptr because client_state cannot be moved or copied
return do_with(std::make_unique<executor::client_state>(executor::client_state::internal_tag()), [this, callback_it = std::move(callback_it), op = std::move(op), req = std::move(req)] (std::unique_ptr<executor::client_state>& client_state) mutable {
client_state->set_raw_keyspace(executor::KEYSPACE_NAME);
tracing::trace_state_ptr trace_state = executor::maybe_trace_query(*client_state, op, req->content);
tracing::trace(trace_state, op);
return callback_it->second(_executor.local(), *client_state, trace_state, std::move(req)).finally([trace_state] {});
});
});
}
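As the NOTICE comment above explains, the target header carries the API version followed by a dot and the operation type, and the dispatch key is the last dot-separated component. A one-function sketch of that extraction (the sample target string is an assumed example of the format):

```cpp
#include <string>

// Extract the operation name from a target such as
// "DynamoDB_20120810.CreateTable": everything after the last '.'.
std::string op_from_target(const std::string& target) {
    auto pos = target.rfind('.');
    return pos == std::string::npos ? target : target.substr(pos + 1);
}
```

The extracted name is then looked up in the `_callbacks` map; an unknown name bumps `unsupported_operations` and raises `UnknownOperationException`.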
void server::set_routes(routes& r) {
api_handler* req_handler = new api_handler([this] (std::unique_ptr<request> req) mutable {
return handle_api_request(std::move(req));
});
r.add(operation_type::POST, url("/"), req_handler);
r.add(operation_type::GET, url("/"), new health_handler);
}

//FIXME: A way to immediately invalidate the cache should be considered,
// e.g. when the system table which stores the keys is changed.
// For now, this propagation may take up to 1 minute.
server::server(seastar::sharded<executor>& e)
: _executor(e), _key_cache(1024, 1min, slogger), _enforce_authorization(false)
, _callbacks{
{"CreateTable", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) {
return e.maybe_create_keyspace().then([&e, &client_state, req = std::move(req), trace_state = std::move(trace_state)] () mutable { return e.create_table(client_state, std::move(trace_state), req->content); }); }
},
{"DescribeTable", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.describe_table(client_state, std::move(trace_state), req->content); }},
{"DeleteTable", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.delete_table(client_state, std::move(trace_state), req->content); }},
{"PutItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.put_item(client_state, std::move(trace_state), req->content); }},
{"UpdateItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.update_item(client_state, std::move(trace_state), req->content); }},
{"GetItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.get_item(client_state, std::move(trace_state), req->content); }},
{"DeleteItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.delete_item(client_state, std::move(trace_state), req->content); }},
{"ListTables", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.list_tables(client_state, req->content); }},
{"Scan", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.scan(client_state, std::move(trace_state), req->content); }},
{"DescribeEndpoints", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.describe_endpoints(client_state, req->content, req->get_header("Host")); }},
{"BatchWriteItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.batch_write_item(client_state, std::move(trace_state), req->content); }},
{"BatchGetItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.batch_get_item(client_state, std::move(trace_state), req->content); }},
{"Query", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.query(client_state, std::move(trace_state), req->content); }},
} {
}
future<> server::init(net::inet_address addr, std::optional<uint16_t> port, std::optional<uint16_t> https_port, std::optional<tls::credentials_builder> creds, bool enforce_authorization) {
_enforce_authorization = enforce_authorization;
if (!port && !https_port) {
return make_exception_future<>(std::runtime_error("Either regular port or TLS port"
" must be specified in order to init an alternator HTTP server instance"));
}
return seastar::async([this, addr, port, https_port, creds] {
try {
_executor.invoke_on_all([] (executor& e) {
return e.start();
}).get();
if (port) {
_control.start().get();
_control.set_routes(std::bind(&server::set_routes, this, std::placeholders::_1)).get();
_control.listen(socket_address{addr, *port}).get();
slogger.info("Alternator HTTP server listening on {} port {}", addr, *port);
}
if (https_port) {
_https_control.start().get();
_https_control.set_routes(std::bind(&server::set_routes, this, std::placeholders::_1)).get();
_https_control.server().invoke_on_all([creds] (http_server& serv) {
return serv.set_tls_credentials(creds->build_server_credentials());
}).get();
_https_control.listen(socket_address{addr, *https_port}).get();
slogger.info("Alternator HTTPS server listening on {} port {}", addr, *https_port);
}
} catch (...) {
slogger.error("Failed to set up Alternator HTTP server on {} port {}, TLS port {}: {}",
addr, port ? std::to_string(*port) : "OFF", https_port ? std::to_string(*https_port) : "OFF", std::current_exception());
std::throw_with_nested(std::runtime_error(
format("Failed to set up Alternator HTTP server on {} port {}, TLS port {}",
addr, port ? std::to_string(*port) : "OFF", https_port ? std::to_string(*https_port) : "OFF")));
}
});
}
}

alternator/server.hh Normal file
View File

@@ -0,0 +1,54 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "alternator/executor.hh"
#include <seastar/core/future.hh>
#include <seastar/http/httpd.hh>
#include <seastar/net/tls.hh>
#include <optional>
#include <alternator/auth.hh>
namespace alternator {
class server {
using alternator_callback = std::function<future<json::json_return_type>(executor&, executor::client_state&, tracing::trace_state_ptr, std::unique_ptr<request>)>;
using alternator_callbacks_map = std::unordered_map<std::string_view, alternator_callback>;
seastar::httpd::http_server_control _control;
seastar::httpd::http_server_control _https_control;
seastar::sharded<executor>& _executor;
key_cache _key_cache;
bool _enforce_authorization;
alternator_callbacks_map _callbacks;
public:
server(seastar::sharded<executor>& executor);
seastar::future<> init(net::inet_address addr, std::optional<uint16_t> port, std::optional<uint16_t> https_port, std::optional<tls::credentials_builder> creds, bool enforce_authorization);
private:
void set_routes(seastar::httpd::routes& r);
future<> verify_signature(const seastar::httpd::request& r);
future<json::json_return_type> handle_api_request(std::unique_ptr<request>&& req);
};
}

alternator/stats.cc Normal file
View File

@@ -0,0 +1,98 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "stats.hh"
#include <seastar/core/metrics.hh>
namespace alternator {
const char* ALTERNATOR_METRICS = "alternator";
stats::stats() : api_operations{} {
// Register one metric per API operation, labeled by operation name.
seastar::metrics::label op("op");
_metrics.add_group("alternator", {
#define OPERATION(name, CamelCaseName) \
seastar::metrics::make_total_operations("operation", api_operations.name, \
seastar::metrics::description("number of operations via Alternator API"), {op(CamelCaseName)}),
#define OPERATION_LATENCY(name, CamelCaseName) \
seastar::metrics::make_histogram("op_latency", \
seastar::metrics::description("Latency histogram of an operation via Alternator API"), {op(CamelCaseName)}, [this]{return api_operations.name.get_histogram(1,20);}),
OPERATION(batch_write_item, "BatchWriteItem")
OPERATION(create_backup, "CreateBackup")
OPERATION(create_global_table, "CreateGlobalTable")
OPERATION(create_table, "CreateTable")
OPERATION(delete_backup, "DeleteBackup")
OPERATION(delete_item, "DeleteItem")
OPERATION(delete_table, "DeleteTable")
OPERATION(describe_backup, "DescribeBackup")
OPERATION(describe_continuous_backups, "DescribeContinuousBackups")
OPERATION(describe_endpoints, "DescribeEndpoints")
OPERATION(describe_global_table, "DescribeGlobalTable")
OPERATION(describe_global_table_settings, "DescribeGlobalTableSettings")
OPERATION(describe_limits, "DescribeLimits")
OPERATION(describe_table, "DescribeTable")
OPERATION(describe_time_to_live, "DescribeTimeToLive")
OPERATION(get_item, "GetItem")
OPERATION(list_backups, "ListBackups")
OPERATION(list_global_tables, "ListGlobalTables")
OPERATION(list_tables, "ListTables")
OPERATION(list_tags_of_resource, "ListTagsOfResource")
OPERATION(put_item, "PutItem")
OPERATION(query, "Query")
OPERATION(restore_table_from_backup, "RestoreTableFromBackup")
OPERATION(restore_table_to_point_in_time, "RestoreTableToPointInTime")
OPERATION(scan, "Scan")
OPERATION(tag_resource, "TagResource")
OPERATION(transact_get_items, "TransactGetItems")
OPERATION(transact_write_items, "TransactWriteItems")
OPERATION(untag_resource, "UntagResource")
OPERATION(update_continuous_backups, "UpdateContinuousBackups")
OPERATION(update_global_table, "UpdateGlobalTable")
OPERATION(update_global_table_settings, "UpdateGlobalTableSettings")
OPERATION(update_item, "UpdateItem")
OPERATION(update_table, "UpdateTable")
OPERATION(update_time_to_live, "UpdateTimeToLive")
OPERATION_LATENCY(put_item_latency, "PutItem")
OPERATION_LATENCY(get_item_latency, "GetItem")
OPERATION_LATENCY(delete_item_latency, "DeleteItem")
OPERATION_LATENCY(update_item_latency, "UpdateItem")
});
_metrics.add_group("alternator", {
seastar::metrics::make_total_operations("unsupported_operations", unsupported_operations,
seastar::metrics::description("number of unsupported operations via Alternator API")),
seastar::metrics::make_total_operations("total_operations", total_operations,
seastar::metrics::description("number of total operations via Alternator API")),
seastar::metrics::make_total_operations("reads_before_write", reads_before_write,
seastar::metrics::description("number of performed read-before-write operations")),
seastar::metrics::make_total_operations("filtered_rows_read_total", cql_stats.filtered_rows_read_total,
seastar::metrics::description("number of rows read during filtering operations")),
seastar::metrics::make_total_operations("filtered_rows_matched_total", cql_stats.filtered_rows_matched_total,
seastar::metrics::description("number of rows read and matched during filtering operations")),
seastar::metrics::make_total_operations("filtered_rows_dropped_total", [this] { return cql_stats.filtered_rows_read_total - cql_stats.filtered_rows_matched_total; },
seastar::metrics::description("number of rows read and dropped during filtering operations")),
});
}
}
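The `OPERATION` macro above is an X-macro-style pattern: one line per API call expands into a labeled counter registration against the corresponding `api_operations` field. A minimal standalone illustration of the same technique, with hypothetical names rather than Scylla's metrics API:

```cpp
#include <cstdint>
#include <map>
#include <string>

// Per-operation counters, mirroring the api_operations struct shape.
struct counters {
    uint64_t get_item = 0;
    uint64_t put_item = 0;
};

// Each OPERATION(field, label) line registers one labeled counter;
// adding an operation is a single new line inside the list.
std::map<std::string, uint64_t*> register_ops(counters& c) {
    std::map<std::string, uint64_t*> m;
#define OPERATION(field, label) m[label] = &c.field;
    OPERATION(get_item, "GetItem")
    OPERATION(put_item, "PutItem")
#undef OPERATION
    return m;
}
```

The design choice is the same as in the real code: the operation list is written once, and the macro expansion keeps the counter fields and their externally visible labels in sync.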

alternator/stats.hh Normal file
View File

@@ -0,0 +1,95 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <cstdint>
#include <seastar/core/metrics_registration.hh>
#include "seastarx.hh"
#include "utils/estimated_histogram.hh"
#include "cql3/stats.hh"
namespace alternator {
// Object holding per-shard statistics related to Alternator.
// While this object is alive, these metrics are also registered to be
// visible by the metrics REST API, with the "alternator" prefix.
class stats {
public:
stats();
// Count of DynamoDB API operations by types
struct {
uint64_t batch_get_item = 0;
uint64_t batch_write_item = 0;
uint64_t create_backup = 0;
uint64_t create_global_table = 0;
uint64_t create_table = 0;
uint64_t delete_backup = 0;
uint64_t delete_item = 0;
uint64_t delete_table = 0;
uint64_t describe_backup = 0;
uint64_t describe_continuous_backups = 0;
uint64_t describe_endpoints = 0;
uint64_t describe_global_table = 0;
uint64_t describe_global_table_settings = 0;
uint64_t describe_limits = 0;
uint64_t describe_table = 0;
uint64_t describe_time_to_live = 0;
uint64_t get_item = 0;
uint64_t list_backups = 0;
uint64_t list_global_tables = 0;
uint64_t list_tables = 0;
uint64_t list_tags_of_resource = 0;
uint64_t put_item = 0;
uint64_t query = 0;
uint64_t restore_table_from_backup = 0;
uint64_t restore_table_to_point_in_time = 0;
uint64_t scan = 0;
uint64_t tag_resource = 0;
uint64_t transact_get_items = 0;
uint64_t transact_write_items = 0;
uint64_t untag_resource = 0;
uint64_t update_continuous_backups = 0;
uint64_t update_global_table = 0;
uint64_t update_global_table_settings = 0;
uint64_t update_item = 0;
uint64_t update_table = 0;
uint64_t update_time_to_live = 0;
utils::estimated_histogram put_item_latency;
utils::estimated_histogram get_item_latency;
utils::estimated_histogram delete_item_latency;
utils::estimated_histogram update_item_latency;
} api_operations;
// Miscellaneous event counters
uint64_t total_operations = 0;
uint64_t unsupported_operations = 0;
uint64_t reads_before_write = 0;
// CQL-derived stats
cql3::cql_stats cql_stats;
private:
// The metric_groups object holds this stat object's metrics registered
// as long as the stats object is alive.
seastar::metrics::metric_groups _metrics;
};
}

View File

@@ -13,7 +13,7 @@
{
"method":"GET",
"summary":"get row cache save period in seconds",
"type":"int",
"type": "long",
"nickname":"get_row_cache_save_period_in_seconds",
"produces":[
"application/json"
@@ -35,7 +35,7 @@
"description":"row cache save period in seconds",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -48,7 +48,7 @@
{
"method":"GET",
"summary":"get key cache save period in seconds",
"type":"int",
"type": "long",
"nickname":"get_key_cache_save_period_in_seconds",
"produces":[
"application/json"
@@ -70,7 +70,7 @@
"description":"key cache save period in seconds",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -83,7 +83,7 @@
{
"method":"GET",
"summary":"get counter cache save period in seconds",
"type":"int",
"type": "long",
"nickname":"get_counter_cache_save_period_in_seconds",
"produces":[
"application/json"
@@ -105,7 +105,7 @@
"description":"counter cache save period in seconds",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -118,7 +118,7 @@
{
"method":"GET",
"summary":"get row cache keys to save",
"type":"int",
"type": "long",
"nickname":"get_row_cache_keys_to_save",
"produces":[
"application/json"
@@ -140,7 +140,7 @@
"description":"row cache keys to save",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -153,7 +153,7 @@
{
"method":"GET",
"summary":"get key cache keys to save",
"type":"int",
"type": "long",
"nickname":"get_key_cache_keys_to_save",
"produces":[
"application/json"
@@ -175,7 +175,7 @@
"description":"key cache keys to save",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -188,7 +188,7 @@
{
"method":"GET",
"summary":"get counter cache keys to save",
"type":"int",
"type": "long",
"nickname":"get_counter_cache_keys_to_save",
"produces":[
"application/json"
@@ -210,7 +210,7 @@
"description":"counter cache keys to save",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -448,7 +448,7 @@
{
"method": "GET",
"summary": "Get key entries",
"type": "int",
"type": "long",
"nickname": "get_key_entries",
"produces": [
"application/json"
@@ -568,7 +568,7 @@
{
"method": "GET",
"summary": "Get row entries",
"type": "int",
"type": "long",
"nickname": "get_row_entries",
"produces": [
"application/json"
@@ -688,7 +688,7 @@
{
"method": "GET",
"summary": "Get counter entries",
"type": "int",
"type": "long",
"nickname": "get_counter_entries",
"produces": [
"application/json"

View File

@@ -121,7 +121,7 @@
"description":"The minimum number of sstables in queue before compaction kicks off",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -172,7 +172,7 @@
"description":"The maximum number of sstables in queue before compaction kicks off",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -223,7 +223,7 @@
"description":"The maximum number of sstables in queue before compaction kicks off",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
},
{
@@ -231,7 +231,7 @@
"description":"The minimum number of sstables in queue before compaction kicks off",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -544,7 +544,7 @@
"summary":"sstable count for each level. empty unless leveled compaction is used",
"type":"array",
"items":{
"type":"int"
"type": "long"
},
"nickname":"get_sstable_count_per_level",
"produces":[
@@ -636,7 +636,7 @@
"description":"Duration (in milliseconds) of monitoring operation",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
},
{
@@ -644,7 +644,7 @@
"description":"number of the top partitions to list",
"required":false,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
},
{
@@ -652,7 +652,7 @@
"description":"capacity of stream summary: determines amount of resources used in query processing",
"required":false,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -921,7 +921,7 @@
{
"method":"GET",
"summary":"Get memtable switch count",
"type":"int",
"type": "long",
"nickname":"get_memtable_switch_count",
"produces":[
"application/json"
@@ -945,7 +945,7 @@
{
"method":"GET",
"summary":"Get all memtable switch count",
"type":"int",
"type": "long",
"nickname":"get_all_memtable_switch_count",
"produces":[
"application/json"
@@ -1082,7 +1082,7 @@
{
"method":"GET",
"summary":"Get read latency",
"type":"int",
"type": "long",
"nickname":"get_read_latency",
"produces":[
"application/json"
@@ -1235,7 +1235,7 @@
{
"method":"GET",
"summary":"Get all read latency",
"type":"int",
"type": "long",
"nickname":"get_all_read_latency",
"produces":[
"application/json"
@@ -1251,7 +1251,7 @@
{
"method":"GET",
"summary":"Get range latency",
"type":"int",
"type": "long",
"nickname":"get_range_latency",
"produces":[
"application/json"
@@ -1275,7 +1275,7 @@
{
"method":"GET",
"summary":"Get all range latency",
"type":"int",
"type": "long",
"nickname":"get_all_range_latency",
"produces":[
"application/json"
@@ -1291,7 +1291,7 @@
{
"method":"GET",
"summary":"Get write latency",
"type":"int",
"type": "long",
"nickname":"get_write_latency",
"produces":[
"application/json"
@@ -1444,7 +1444,7 @@
{
"method":"GET",
"summary":"Get all write latency",
"type":"int",
"type": "long",
"nickname":"get_all_write_latency",
"produces":[
"application/json"
@@ -1460,7 +1460,7 @@
{
"method":"GET",
"summary":"Get pending flushes",
"type":"int",
"type": "long",
"nickname":"get_pending_flushes",
"produces":[
"application/json"
@@ -1484,7 +1484,7 @@
{
"method":"GET",
"summary":"Get all pending flushes",
"type":"int",
"type": "long",
"nickname":"get_all_pending_flushes",
"produces":[
"application/json"
@@ -1500,7 +1500,7 @@
{
"method":"GET",
"summary":"Get pending compactions",
"type":"int",
"type": "long",
"nickname":"get_pending_compactions",
"produces":[
"application/json"
@@ -1524,7 +1524,7 @@
{
"method":"GET",
"summary":"Get all pending compactions",
"type":"int",
"type": "long",
"nickname":"get_all_pending_compactions",
"produces":[
"application/json"
@@ -1540,7 +1540,7 @@
{
"method":"GET",
"summary":"Get live ss table count",
"type":"int",
"type": "long",
"nickname":"get_live_ss_table_count",
"produces":[
"application/json"
@@ -1564,7 +1564,7 @@
{
"method":"GET",
"summary":"Get all live ss table count",
"type":"int",
"type": "long",
"nickname":"get_all_live_ss_table_count",
"produces":[
"application/json"
@@ -1580,7 +1580,7 @@
{
"method":"GET",
"summary":"Get live disk space used",
"type":"int",
"type": "long",
"nickname":"get_live_disk_space_used",
"produces":[
"application/json"
@@ -1604,7 +1604,7 @@
{
"method":"GET",
"summary":"Get all live disk space used",
"type":"int",
"type": "long",
"nickname":"get_all_live_disk_space_used",
"produces":[
"application/json"
@@ -1620,7 +1620,7 @@
{
"method":"GET",
"summary":"Get total disk space used",
"type":"int",
"type": "long",
"nickname":"get_total_disk_space_used",
"produces":[
"application/json"
@@ -1644,7 +1644,7 @@
{
"method":"GET",
"summary":"Get all total disk space used",
"type":"int",
"type": "long",
"nickname":"get_all_total_disk_space_used",
"produces":[
"application/json"
@@ -2100,7 +2100,7 @@
{
"method":"GET",
"summary":"Get speculative retries",
"type":"int",
"type": "long",
"nickname":"get_speculative_retries",
"produces":[
"application/json"
@@ -2124,7 +2124,7 @@
{
"method":"GET",
"summary":"Get all speculative retries",
"type":"int",
"type": "long",
"nickname":"get_all_speculative_retries",
"produces":[
"application/json"
@@ -2204,7 +2204,7 @@
{
"method":"GET",
"summary":"Get row cache hit out of range",
"type":"int",
"type": "long",
"nickname":"get_row_cache_hit_out_of_range",
"produces":[
"application/json"
@@ -2228,7 +2228,7 @@
{
"method":"GET",
"summary":"Get all row cache hit out of range",
"type":"int",
"type": "long",
"nickname":"get_all_row_cache_hit_out_of_range",
"produces":[
"application/json"
@@ -2244,7 +2244,7 @@
{
"method":"GET",
"summary":"Get row cache hit",
"type":"int",
"type": "long",
"nickname":"get_row_cache_hit",
"produces":[
"application/json"
@@ -2268,7 +2268,7 @@
{
"method":"GET",
"summary":"Get all row cache hit",
"type":"int",
"type": "long",
"nickname":"get_all_row_cache_hit",
"produces":[
"application/json"
@@ -2284,7 +2284,7 @@
{
"method":"GET",
"summary":"Get row cache miss",
"type":"int",
"type": "long",
"nickname":"get_row_cache_miss",
"produces":[
"application/json"
@@ -2308,7 +2308,7 @@
{
"method":"GET",
"summary":"Get all row cache miss",
"type":"int",
"type": "long",
"nickname":"get_all_row_cache_miss",
"produces":[
"application/json"
@@ -2324,7 +2324,7 @@
{
"method":"GET",
"summary":"Get cas prepare",
"type":"int",
"type": "long",
"nickname":"get_cas_prepare",
"produces":[
"application/json"
@@ -2348,7 +2348,7 @@
{
"method":"GET",
"summary":"Get cas propose",
"type":"int",
"type": "long",
"nickname":"get_cas_propose",
"produces":[
"application/json"
@@ -2372,7 +2372,7 @@
{
"method":"GET",
"summary":"Get cas commit",
"type":"int",
"type": "long",
"nickname":"get_cas_commit",
"produces":[
"application/json"

View File

@@ -118,7 +118,7 @@
{
"method": "GET",
"summary": "Get pending tasks",
"type": "int",
"type": "long",
"nickname": "get_pending_tasks",
"produces": [
"application/json"
@@ -127,6 +127,24 @@
}
]
},
{
"path": "/compaction_manager/metrics/pending_tasks_by_table",
"operations": [
{
"method": "GET",
"summary": "Get pending tasks by table name",
"type": "array",
"items": {
"type": "pending_compaction"
},
"nickname": "get_pending_tasks_by_table",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path": "/compaction_manager/metrics/completed_tasks",
"operations": [
@@ -163,7 +181,7 @@
{
"method": "GET",
"summary": "Get bytes compacted",
"type": "int",
"type": "long",
"nickname": "get_bytes_compacted",
"produces": [
"application/json"
@@ -179,7 +197,7 @@
"description":"A row merged information",
"properties":{
"key":{
"type":"int",
"type": "long",
"description":"The number of sstable"
},
"value":{
@@ -244,6 +262,23 @@
}
}
},
"pending_compaction": {
"id": "pending_compaction",
"properties": {
"cf": {
"type": "string",
"description": "The column family name"
},
"ks": {
"type":"string",
"description": "The keyspace name"
},
"task": {
"type":"long",
"description": "The number of pending tasks"
}
}
},
"history": {
"id":"history",
"description":"Compaction history information",

View File

@@ -110,7 +110,7 @@
{
"method":"GET",
"summary":"Get count down endpoint",
"type":"int",
"type": "long",
"nickname":"get_down_endpoint_count",
"produces":[
"application/json"
@@ -126,7 +126,7 @@
{
"method":"GET",
"summary":"Get count up endpoint",
"type":"int",
"type": "long",
"nickname":"get_up_endpoint_count",
"produces":[
"application/json"
@@ -180,11 +180,11 @@
"description": "The endpoint address"
},
"generation": {
"type": "int",
"type": "long",
"description": "The heart beat generation"
},
"version": {
"type": "int",
"type": "long",
"description": "The heart beat version"
},
"update_time": {
@@ -209,7 +209,7 @@
"description": "Holds a version value for an application state",
"properties": {
"application_state": {
"type": "int",
"type": "long",
"description": "The application state enum index"
},
"value": {
@@ -217,7 +217,7 @@
"description": "The version value"
},
"version": {
"type": "int",
"type": "long",
"description": "The application state version"
}
}

View File

@@ -75,7 +75,7 @@
{
"method":"GET",
"summary":"Returns files which are pending for archival attempt. Does NOT include failed archive attempts",
"type":"int",
"type": "long",
"nickname":"get_current_generation_number",
"produces":[
"application/json"
@@ -99,7 +99,7 @@
{
"method":"GET",
"summary":"Get heart beat version for a node",
"type":"int",
"type": "long",
"nickname":"get_current_heart_beat_version",
"produces":[
"application/json"

View File

@@ -99,7 +99,7 @@
{
"method": "GET",
"summary": "Get create hint count",
"type": "int",
"type": "long",
"nickname": "get_create_hint_count",
"produces": [
"application/json"
@@ -123,7 +123,7 @@
{
"method": "GET",
"summary": "Get not stored hints count",
"type": "int",
"type": "long",
"nickname": "get_not_stored_hints_count",
"produces": [
"application/json"

View File

@@ -191,7 +191,7 @@
{
"method":"GET",
"summary":"Get the version number",
"type":"int",
"type": "long",
"nickname":"get_version",
"produces":[
"application/json"

View File

@@ -105,7 +105,7 @@
{
"method":"GET",
"summary":"Get the max hint window",
"type":"int",
"type": "long",
"nickname":"get_max_hint_window",
"produces":[
"application/json"
@@ -128,7 +128,7 @@
"description":"max hint window in ms",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -141,7 +141,7 @@
{
"method":"GET",
"summary":"Get max hints in progress",
"type":"int",
"type": "long",
"nickname":"get_max_hints_in_progress",
"produces":[
"application/json"
@@ -164,7 +164,7 @@
"description":"max hints in progress",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -177,7 +177,7 @@
{
"method":"GET",
"summary":"get hints in progress",
"type":"int",
"type": "long",
"nickname":"get_hints_in_progress",
"produces":[
"application/json"
@@ -602,7 +602,7 @@
{
"method": "GET",
"summary": "Get cas write metrics",
"type": "int",
"type": "long",
"nickname": "get_cas_write_metrics_unfinished_commit",
"produces": [
"application/json"
@@ -632,7 +632,7 @@
{
"method": "GET",
"summary": "Get cas write metrics",
"type": "int",
"type": "long",
"nickname": "get_cas_write_metrics_condition_not_met",
"produces": [
"application/json"
@@ -647,7 +647,7 @@
{
"method": "GET",
"summary": "Get cas read metrics",
"type": "int",
"type": "long",
"nickname": "get_cas_read_metrics_unfinished_commit",
"produces": [
"application/json"
@@ -671,28 +671,13 @@
}
]
},
{
"path": "/storage_proxy/metrics/cas_read/condition_not_met",
"operations": [
{
"method": "GET",
"summary": "Get cas read metrics",
"type": "int",
"nickname": "get_cas_read_metrics_condition_not_met",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path": "/storage_proxy/metrics/read/timeouts",
"operations": [
{
"method": "GET",
"summary": "Get read metrics",
"type": "int",
"type": "long",
"nickname": "get_read_metrics_timeouts",
"produces": [
"application/json"
@@ -707,7 +692,7 @@
{
"method": "GET",
"summary": "Get read metrics",
"type": "int",
"type": "long",
"nickname": "get_read_metrics_unavailables",
"produces": [
"application/json"
@@ -791,6 +776,36 @@
}
]
},
{
"path": "/storage_proxy/metrics/cas_read/moving_average_histogram",
"operations": [
{
"method": "GET",
"summary": "Get CAS read rate and latency histogram",
"$ref": "#/utils/rate_moving_average_and_histogram",
"nickname": "get_cas_read_metrics_latency_histogram",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path": "/storage_proxy/metrics/view_write/moving_average_histogram",
"operations": [
{
"method": "GET",
"summary": "Get view write rate and latency histogram",
"$ref": "#/utils/rate_moving_average_and_histogram",
"nickname": "get_view_write_metrics_latency_histogram",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path": "/storage_proxy/metrics/range/moving_average_histogram",
"operations": [
@@ -812,7 +827,7 @@
{
"method": "GET",
"summary": "Get range metrics",
"type": "int",
"type": "long",
"nickname": "get_range_metrics_timeouts",
"produces": [
"application/json"
@@ -827,7 +842,7 @@
{
"method": "GET",
"summary": "Get range metrics",
"type": "int",
"type": "long",
"nickname": "get_range_metrics_unavailables",
"produces": [
"application/json"
@@ -872,7 +887,7 @@
{
"method": "GET",
"summary": "Get write metrics",
"type": "int",
"type": "long",
"nickname": "get_write_metrics_timeouts",
"produces": [
"application/json"
@@ -887,7 +902,7 @@
{
"method": "GET",
"summary": "Get write metrics",
"type": "int",
"type": "long",
"nickname": "get_write_metrics_unavailables",
"produces": [
"application/json"
@@ -956,6 +971,21 @@
}
]
},
{
"path": "/storage_proxy/metrics/cas_write/moving_average_histogram",
"operations": [
{
"method": "GET",
"summary": "Get CAS write rate and latency histogram",
"$ref": "#/utils/rate_moving_average_and_histogram",
"nickname": "get_cas_write_metrics_latency_histogram",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path":"/storage_proxy/metrics/read/estimated_histogram/",
"operations":[
@@ -978,7 +1008,7 @@
{
"method":"GET",
"summary":"Get read latency",
"type":"int",
"type": "long",
"nickname":"get_read_latency",
"produces":[
"application/json"
@@ -1010,7 +1040,7 @@
{
"method":"GET",
"summary":"Get write latency",
"type":"int",
"type": "long",
"nickname":"get_write_latency",
"produces":[
"application/json"
@@ -1042,7 +1072,7 @@
{
"method":"GET",
"summary":"Get range latency",
"type":"int",
"type": "long",
"nickname":"get_range_latency",
"produces":[
"application/json"


@@ -458,7 +458,7 @@
{
"method":"GET",
"summary":"Return the generation value for this node.",
"type":"int",
"type": "long",
"nickname":"get_current_generation_number",
"produces":[
"application/json"
@@ -646,7 +646,7 @@
{
"method":"POST",
"summary":"Trigger a cleanup of keys on a single keyspace",
"type":"int",
"type": "long",
"nickname":"force_keyspace_cleanup",
"produces":[
"application/json"
@@ -678,7 +678,7 @@
{
"method":"GET",
"summary":"Scrub (deserialize + reserialize at the latest version, skipping bad rows if any) the given keyspace. If columnFamilies array is empty, all CFs are scrubbed. Scrubbed CFs will be snapshotted first, if disableSnapshot is false",
"type":"int",
"type": "long",
"nickname":"scrub",
"produces":[
"application/json"
@@ -726,7 +726,7 @@
{
"method":"GET",
"summary":"Rewrite all sstables to the latest version. Unlike scrub, it does not skip bad rows and does not snapshot sstables first.",
"type":"int",
"type": "long",
"nickname":"upgrade_sstables",
"produces":[
"application/json"
@@ -800,7 +800,7 @@
"summary":"Return an array with the ids of the currently active repairs",
"type":"array",
"items":{
"type":"int"
"type": "long"
},
"nickname":"get_active_repair_async",
"produces":[
@@ -816,7 +816,7 @@
{
"method":"POST",
"summary":"Invoke repair asynchronously. You can track repair progress by passing the returned id to the repair status API",
"type":"int",
"type": "long",
"nickname":"repair_async",
"produces":[
"application/json"
@@ -947,7 +947,7 @@
"description":"The repair ID to check for status",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -1277,18 +1277,18 @@
},
{
"name":"dynamic_update_interval",
"description":"integer, in ms (default 100)",
"description":"interval in ms (default 100)",
"required":false,
"allowMultiple":false,
"type":"integer",
"type":"long",
"paramType":"query"
},
{
"name":"dynamic_reset_interval",
"description":"integer, in ms (default 600,000)",
"description":"interval in ms (default 600,000)",
"required":false,
"allowMultiple":false,
"type":"integer",
"type":"long",
"paramType":"query"
},
{
@@ -1493,7 +1493,7 @@
"description":"Stream throughput",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -1501,7 +1501,7 @@
{
"method":"GET",
"summary":"Get stream throughput mb per sec",
"type":"int",
"type": "long",
"nickname":"get_stream_throughput_mb_per_sec",
"produces":[
"application/json"
@@ -1517,7 +1517,7 @@
{
"method":"GET",
"summary":"get compaction throughput mb per sec",
"type":"int",
"type": "long",
"nickname":"get_compaction_throughput_mb_per_sec",
"produces":[
"application/json"
@@ -1539,7 +1539,7 @@
"description":"compaction throughput",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -1943,7 +1943,7 @@
{
"method":"GET",
"summary":"Returns the threshold for warning of queries with many tombstones",
"type":"int",
"type": "long",
"nickname":"get_tombstone_warn_threshold",
"produces":[
"application/json"
@@ -1965,7 +1965,7 @@
"description":"tombstone debug threshold",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -1978,7 +1978,7 @@
{
"method":"GET",
"summary":"",
"type":"int",
"type": "long",
"nickname":"get_tombstone_failure_threshold",
"produces":[
"application/json"
@@ -2000,7 +2000,7 @@
"description":"tombstone debug threshold",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -2013,7 +2013,7 @@
{
"method":"GET",
"summary":"Returns the threshold for rejecting queries due to a large batch size",
"type":"int",
"type": "long",
"nickname":"get_batch_size_failure_threshold",
"produces":[
"application/json"
@@ -2035,7 +2035,7 @@
"description":"batch size debug threshold",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -2059,7 +2059,7 @@
"description":"throttle in kb",
"required":true,
"allowMultiple":false,
"type":"int",
"type": "long",
"paramType":"query"
}
]
@@ -2072,7 +2072,7 @@
{
"method":"GET",
"summary":"Get load",
"type":"int",
"type": "long",
"nickname":"get_metrics_load",
"produces":[
"application/json"
@@ -2088,7 +2088,7 @@
{
"method":"GET",
"summary":"Get exceptions",
"type":"int",
"type": "long",
"nickname":"get_exceptions",
"produces":[
"application/json"
@@ -2104,7 +2104,7 @@
{
"method":"GET",
"summary":"Get total hints in progress",
"type":"int",
"type": "long",
"nickname":"get_total_hints_in_progress",
"produces":[
"application/json"
@@ -2120,7 +2120,7 @@
{
"method":"GET",
"summary":"Get total hints",
"type":"int",
"type": "long",
"nickname":"get_total_hints",
"produces":[
"application/json"
@@ -2164,7 +2164,42 @@
]
}
]
}
},
{
"path":"/storage_service/sstable_info",
"operations":[
{
"method":"GET",
"summary":"SSTable information",
"type":"array",
"items":{
"type":"table_sstables"
},
"nickname":"sstable_info",
"produces":[
"application/json"
],
"parameters":[
{
"name":"keyspace",
"description":"The keyspace",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"cf",
"description":"column family name",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
}
]
}
],
"models":{
"mapper":{
@@ -2324,6 +2359,92 @@
"description":"The endpoint details"
}
}
},
"named_maps":{
"id":"named_maps",
"properties":{
"group":{
"type":"string"
},
"attributes":{
"type":"array",
"items":{
"type":"mapper"
}
}
}
},
"sstable":{
"id":"sstable",
"properties":{
"size":{
"type":"long",
"description":"Total size in bytes of sstable"
},
"data_size":{
"type":"long",
"description":"The size in bytes on disk of data"
},
"index_size":{
"type":"long",
"description":"The size in bytes on disk of index"
},
"filter_size":{
"type":"long",
"description":"The size in bytes on disk of filter"
},
"timestamp":{
"type":"datetime",
"description":"File creation time"
},
"generation":{
"type":"long",
"description":"SSTable generation"
},
"level":{
"type":"long",
"description":"SSTable level"
},
"version":{
"type":"string",
"enum":[
"ka", "la", "mc"
],
"description":"SSTable version"
},
"properties":{
"type":"array",
"description":"SSTable attributes",
"items":{
"type":"mapper"
}
},
"extended_properties":{
"type":"array",
"description":"SSTable extended attributes",
"items":{
"type":"named_maps"
}
}
}
},
"table_sstables":{
"id":"table_sstables",
"description":"Per-table SSTable info and attributes",
"properties":{
"keyspace":{
"type":"string"
},
"table":{
"type":"string"
},
"sstables":{
"type":"array",
"items":{
"$ref":"sstable"
}
}
}
}
}
}


@@ -32,7 +32,7 @@
{
"method":"GET",
"summary":"Get number of active outbound streams",
"type":"int",
"type": "long",
"nickname":"get_all_active_streams_outbound",
"produces":[
"application/json"
@@ -48,7 +48,7 @@
{
"method":"GET",
"summary":"Get total incoming bytes",
"type":"int",
"type": "long",
"nickname":"get_total_incoming_bytes",
"produces":[
"application/json"
@@ -72,7 +72,7 @@
{
"method":"GET",
"summary":"Get all total incoming bytes",
"type":"int",
"type": "long",
"nickname":"get_all_total_incoming_bytes",
"produces":[
"application/json"
@@ -88,7 +88,7 @@
{
"method":"GET",
"summary":"Get total outgoing bytes",
"type":"int",
"type": "long",
"nickname":"get_total_outgoing_bytes",
"produces":[
"application/json"
@@ -112,7 +112,7 @@
{
"method":"GET",
"summary":"Get all total outgoing bytes",
"type":"int",
"type": "long",
"nickname":"get_all_total_outgoing_bytes",
"produces":[
"application/json"
@@ -154,7 +154,7 @@
"description":"The peer"
},
"session_index":{
"type":"int",
"type": "long",
"description":"The session index"
},
"connecting":{
@@ -211,7 +211,7 @@
"description":"The ID"
},
"files":{
"type":"int",
"type": "long",
"description":"Number of files to transfer. Can be 0 if nothing to transfer for some streaming request."
},
"total_size":{
@@ -242,7 +242,7 @@
"description":"The peer address"
},
"session_index":{
"type":"int",
"type": "long",
"description":"The session index"
},
"file_name":{


@@ -52,6 +52,21 @@
}
]
},
{
"path":"/system/uptime_ms",
"operations":[
{
"method":"GET",
"summary":"Get system uptime, in milliseconds",
"type":"long",
"nickname":"get_system_uptime",
"produces":[
"application/json"
],
"parameters":[]
}
]
},
{
"path":"/system/logger/{name}",
"operations":[


@@ -22,6 +22,7 @@
#pragma once
#include <seastar/json/json_elements.hh>
#include <type_traits>
#include <boost/lexical_cast.hpp>
#include <boost/algorithm/string/split.hpp>
#include <boost/algorithm/string/classification.hpp>
@@ -231,7 +232,22 @@ public:
return;
}
try {
value = T{boost::lexical_cast<Base>(param)};
// boost::lexical_cast does not use boolalpha. Converting a
// true/false throws exceptions. We don't want that.
if constexpr (std::is_same_v<Base, bool>) {
// Cannot use boolalpha because we (probably) want to
// accept 1 and 0 as well as true and false. And True. And fAlse.
std::transform(param.begin(), param.end(), param.begin(), ::tolower);
if (param == "true" || param == "1") {
value = T(true);
} else if (param == "false" || param == "0") {
value = T(false);
} else {
throw boost::bad_lexical_cast{};
}
} else {
value = T{boost::lexical_cast<Base>(param)};
}
} catch (boost::bad_lexical_cast&) {
throw bad_param_exception(format("{} ({}): type error - should be {}", name, param, boost::units::detail::demangle(typeid(Base).name())));
}
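The boolean branch added above can be sketched as a standalone helper. This is a minimal illustration of the same parsing rule (accept "true"/"false" in any casing plus "1"/"0", reject everything else); `parse_bool_param` is a hypothetical name, not part of the patch, and it throws `std::invalid_argument` where the patch throws `boost::bad_lexical_cast`:

```cpp
#include <algorithm>
#include <cctype>
#include <stdexcept>
#include <string>

// Hypothetical sketch of the case-insensitive boolean parsing above.
// Lower-cases the input, then matches the four accepted spellings.
bool parse_bool_param(std::string param) {
    std::transform(param.begin(), param.end(), param.begin(),
                   [](unsigned char c) { return std::tolower(c); });
    if (param == "true" || param == "1") {
        return true;
    }
    if (param == "false" || param == "0") {
        return false;
    }
    throw std::invalid_argument("not a boolean: " + param);
}
```

With this, "True", "fAlse", and "1" all parse, matching the behavior the comment in the patch describes.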


@@ -23,6 +23,8 @@
#include "service/storage_proxy.hh"
#include <seastar/http/httpd.hh>
namespace service { class load_meter; }
namespace api {
struct http_context {
@@ -31,9 +33,11 @@ struct http_context {
httpd::http_server_control http_server;
distributed<database>& db;
distributed<service::storage_proxy>& sp;
service::load_meter& lmeter;
http_context(distributed<database>& _db,
distributed<service::storage_proxy>& _sp)
: db(_db), sp(_sp) {
distributed<service::storage_proxy>& _sp,
service::load_meter& _lm)
: db(_db), sp(_sp), lmeter(_lm) {
}
};


@@ -26,7 +26,7 @@
#include "sstables/sstables.hh"
#include "utils/estimated_histogram.hh"
#include <algorithm>
#include "db/system_keyspace_view_types.hh"
#include "db/data_listeners.hh"
extern logging::logger apilog;
@@ -53,8 +53,7 @@ std::tuple<sstring, sstring> parse_fully_qualified_cf_name(sstring name) {
return std::make_tuple(name.substr(0, pos), name.substr(end));
}
const utils::UUID& get_uuid(const sstring& name, const database& db) {
auto [ks, cf] = parse_fully_qualified_cf_name(name);
const utils::UUID& get_uuid(const sstring& ks, const sstring& cf, const database& db) {
try {
return db.find_uuid(ks, cf);
} catch (std::out_of_range& e) {
@@ -62,6 +61,11 @@ const utils::UUID& get_uuid(const sstring& name, const database& db) {
}
}
const utils::UUID& get_uuid(const sstring& name, const database& db) {
auto [ks, cf] = parse_fully_qualified_cf_name(name);
return get_uuid(ks, cf, db);
}
future<> foreach_column_family(http_context& ctx, const sstring& name, function<void(column_family&)> f) {
auto uuid = get_uuid(name, ctx.db.local());
@@ -71,28 +75,28 @@ future<> foreach_column_family(http_context& ctx, const sstring& name, function<
}
future<json::json_return_type> get_cf_stats(http_context& ctx, const sstring& name,
int64_t column_family::stats::*f) {
int64_t column_family_stats::*f) {
return map_reduce_cf(ctx, name, int64_t(0), [f](const column_family& cf) {
return cf.get_stats().*f;
}, std::plus<int64_t>());
}
future<json::json_return_type> get_cf_stats(http_context& ctx,
int64_t column_family::stats::*f) {
int64_t column_family_stats::*f) {
return map_reduce_cf(ctx, int64_t(0), [f](const column_family& cf) {
return cf.get_stats().*f;
}, std::plus<int64_t>());
}
static future<json::json_return_type> get_cf_stats_count(http_context& ctx, const sstring& name,
utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {
return map_reduce_cf(ctx, name, int64_t(0), [f](const column_family& cf) {
return (cf.get_stats().*f).hist.count;
}, std::plus<int64_t>());
}
static future<json::json_return_type> get_cf_stats_sum(http_context& ctx, const sstring& name,
utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {
auto uuid = get_uuid(name, ctx.db.local());
return ctx.db.map_reduce0([uuid, f](database& db) {
// Histograms information is sample of the actual load
@@ -108,14 +112,14 @@ static future<json::json_return_type> get_cf_stats_sum(http_context& ctx, const
static future<json::json_return_type> get_cf_stats_count(http_context& ctx,
utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {
return map_reduce_cf(ctx, int64_t(0), [f](const column_family& cf) {
return (cf.get_stats().*f).hist.count;
}, std::plus<int64_t>());
}
static future<json::json_return_type> get_cf_histogram(http_context& ctx, const sstring& name,
utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {
utils::UUID uuid = get_uuid(name, ctx.db.local());
return ctx.db.map_reduce0([f, uuid](const database& p) {
return (p.find_column_family(uuid).get_stats().*f).hist;},
@@ -126,7 +130,7 @@ static future<json::json_return_type> get_cf_histogram(http_context& ctx, const
});
}
static future<json::json_return_type> get_cf_histogram(http_context& ctx, utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
static future<json::json_return_type> get_cf_histogram(http_context& ctx, utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {
std::function<utils::ihistogram(const database&)> fun = [f] (const database& db) {
utils::ihistogram res;
for (auto i : db.get_column_families()) {
@@ -142,7 +146,7 @@ static future<json::json_return_type> get_cf_histogram(http_context& ctx, utils:
}
static future<json::json_return_type> get_cf_rate_and_histogram(http_context& ctx, const sstring& name,
utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {
utils::UUID uuid = get_uuid(name, ctx.db.local());
return ctx.db.map_reduce0([f, uuid](const database& p) {
return (p.find_column_family(uuid).get_stats().*f).rate();},
@@ -153,7 +157,7 @@ static future<json::json_return_type> get_cf_rate_and_histogram(http_context& c
});
}
static future<json::json_return_type> get_cf_rate_and_histogram(http_context& ctx, utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
static future<json::json_return_type> get_cf_rate_and_histogram(http_context& ctx, utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {
std::function<utils::rate_moving_average_and_histogram(const database&)> fun = [f] (const database& db) {
utils::rate_moving_average_and_histogram res;
for (auto i : db.get_column_families()) {
@@ -250,12 +254,11 @@ class sum_ratio {
uint64_t _n = 0;
T _total = 0;
public:
future<> operator()(T value) {
void operator()(T value) {
if (value > 0) {
_total += value;
_n++;
}
return make_ready_future<>();
}
// Returns average value of all registered ratios.
T get() && {
@@ -404,11 +407,11 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_memtable_switch_count.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats(ctx,req->param["name"] ,&column_family::stats::memtable_switch_count);
return get_cf_stats(ctx,req->param["name"] ,&column_family_stats::memtable_switch_count);
});
cf::get_all_memtable_switch_count.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats(ctx, &column_family::stats::memtable_switch_count);
return get_cf_stats(ctx, &column_family_stats::memtable_switch_count);
});
// FIXME: this refers to partitions, not rows.
@@ -453,67 +456,67 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_pending_flushes.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats(ctx,req->param["name"] ,&column_family::stats::pending_flushes);
return get_cf_stats(ctx,req->param["name"] ,&column_family_stats::pending_flushes);
});
cf::get_all_pending_flushes.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats(ctx, &column_family::stats::pending_flushes);
return get_cf_stats(ctx, &column_family_stats::pending_flushes);
});
cf::get_read.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats_count(ctx,req->param["name"] ,&column_family::stats::reads);
return get_cf_stats_count(ctx,req->param["name"] ,&column_family_stats::reads);
});
cf::get_all_read.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats_count(ctx, &column_family::stats::reads);
return get_cf_stats_count(ctx, &column_family_stats::reads);
});
cf::get_write.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats_count(ctx, req->param["name"] ,&column_family::stats::writes);
return get_cf_stats_count(ctx, req->param["name"] ,&column_family_stats::writes);
});
cf::get_all_write.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats_count(ctx, &column_family::stats::writes);
return get_cf_stats_count(ctx, &column_family_stats::writes);
});
cf::get_read_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_histogram(ctx, req->param["name"], &column_family::stats::reads);
return get_cf_histogram(ctx, req->param["name"], &column_family_stats::reads);
});
cf::get_read_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_rate_and_histogram(ctx, req->param["name"], &column_family::stats::reads);
return get_cf_rate_and_histogram(ctx, req->param["name"], &column_family_stats::reads);
});
cf::get_read_latency.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats_sum(ctx,req->param["name"] ,&column_family::stats::reads);
return get_cf_stats_sum(ctx,req->param["name"] ,&column_family_stats::reads);
});
cf::get_write_latency.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats_sum(ctx, req->param["name"] ,&column_family::stats::writes);
return get_cf_stats_sum(ctx, req->param["name"] ,&column_family_stats::writes);
});
cf::get_all_read_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_histogram(ctx, &column_family::stats::writes);
return get_cf_histogram(ctx, &column_family_stats::writes);
});
cf::get_all_read_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_rate_and_histogram(ctx, &column_family::stats::writes);
return get_cf_rate_and_histogram(ctx, &column_family_stats::writes);
});
cf::get_write_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_histogram(ctx, req->param["name"], &column_family::stats::writes);
return get_cf_histogram(ctx, req->param["name"], &column_family_stats::writes);
});
cf::get_write_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_rate_and_histogram(ctx, req->param["name"], &column_family::stats::writes);
return get_cf_rate_and_histogram(ctx, req->param["name"], &column_family_stats::writes);
});
cf::get_all_write_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_histogram(ctx, &column_family::stats::writes);
return get_cf_histogram(ctx, &column_family_stats::writes);
});
cf::get_all_write_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_rate_and_histogram(ctx, &column_family::stats::writes);
return get_cf_rate_and_histogram(ctx, &column_family_stats::writes);
});
cf::get_pending_compactions.set(r, [&ctx] (std::unique_ptr<request> req) {
@@ -529,11 +532,11 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_live_ss_table_count.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats(ctx, req->param["name"], &column_family::stats::live_sstable_count);
return get_cf_stats(ctx, req->param["name"], &column_family_stats::live_sstable_count);
});
cf::get_all_live_ss_table_count.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats(ctx, &column_family::stats::live_sstable_count);
return get_cf_stats(ctx, &column_family_stats::live_sstable_count);
});
cf::get_unleveled_sstables.set(r, [&ctx] (std::unique_ptr<request> req) {
@@ -792,25 +795,25 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_cas_prepare.set(r, [] (std::unique_ptr<request> req) {
//TBD
unimplemented();
//auto id = get_uuid(req->param["name"], ctx.db.local());
return make_ready_future<json::json_return_type>(0);
cf::get_cas_prepare.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](column_family& cf) {
return cf.get_stats().estimated_cas_prepare;
},
utils::estimated_histogram_merge, utils_json::estimated_histogram());
});
cf::get_cas_propose.set(r, [] (std::unique_ptr<request> req) {
//TBD
unimplemented();
//auto id = get_uuid(req->param["name"], ctx.db.local());
return make_ready_future<json::json_return_type>(0);
cf::get_cas_propose.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](column_family& cf) {
return cf.get_stats().estimated_cas_propose;
},
utils::estimated_histogram_merge, utils_json::estimated_histogram());
});
cf::get_cas_commit.set(r, [] (std::unique_ptr<request> req) {
//TBD
unimplemented();
//auto id = get_uuid(req->param["name"], ctx.db.local());
return make_ready_future<json::json_return_type>(0);
cf::get_cas_commit.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](column_family& cf) {
return cf.get_stats().estimated_cas_commit;
},
utils::estimated_histogram_merge, utils_json::estimated_histogram());
});
cf::get_sstables_per_read_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
@@ -821,11 +824,11 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_tombstone_scanned_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_histogram(ctx, req->param["name"], &column_family::stats::tombstone_scanned);
return get_cf_histogram(ctx, req->param["name"], &column_family_stats::tombstone_scanned);
});
cf::get_live_scanned_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_histogram(ctx, req->param["name"], &column_family::stats::live_scanned);
return get_cf_histogram(ctx, req->param["name"], &column_family_stats::live_scanned);
});
cf::get_col_update_time_delta_histogram.set(r, [] (std::unique_ptr<request> req) {
@@ -843,13 +846,28 @@ void set_column_family(http_context& ctx, routes& r) {
return true;
});
cf::get_built_indexes.set(r, [](const_req) {
// FIXME
// Currently there are no index support
return std::vector<sstring>();
cf::get_built_indexes.set(r, [&ctx](std::unique_ptr<request> req) {
auto [ks, cf_name] = parse_fully_qualified_cf_name(req->param["name"]);
return db::system_keyspace::load_view_build_progress().then([ks, cf_name, &ctx](const std::vector<db::system_keyspace::view_build_progress>& vb) mutable {
std::set<sstring> vp;
for (auto b : vb) {
if (b.view.first == ks) {
vp.insert(b.view.second);
}
}
std::vector<sstring> res;
auto uuid = get_uuid(ks, cf_name, ctx.db.local());
column_family& cf = ctx.db.local().find_column_family(uuid);
res.reserve(cf.get_index_manager().list_indexes().size());
for (auto&& i : cf.get_index_manager().list_indexes()) {
if (vp.find(secondary_index::index_table_name(i.metadata().name())) == vp.end()) {
res.emplace_back(i.metadata().name());
}
}
return make_ready_future<json::json_return_type>(res);
});
});
cf::get_compression_metadata_off_heap_memory_used.set(r, [](const_req) {
// FIXME
// Currently there are no information on the compression


@@ -39,14 +39,14 @@ template<class Mapper, class I, class Reducer>
future<I> map_reduce_cf_raw(http_context& ctx, const sstring& name, I init,
Mapper mapper, Reducer reducer) {
auto uuid = get_uuid(name, ctx.db.local());
using mapper_type = std::function<std::any (database&)>;
using reducer_type = std::function<std::any (std::any, std::any)>;
using mapper_type = std::function<std::unique_ptr<std::any>(database&)>;
using reducer_type = std::function<std::unique_ptr<std::any>(std::unique_ptr<std::any>, std::unique_ptr<std::any>)>;
return ctx.db.map_reduce0(mapper_type([mapper, uuid](database& db) {
return I(mapper(db.find_column_family(uuid)));
}), std::any(std::move(init)), reducer_type([reducer = std::move(reducer)] (std::any a, std::any b) mutable {
return I(reducer(std::any_cast<I>(std::move(a)), std::any_cast<I>(std::move(b))));
})).then([] (std::any r) {
return std::any_cast<I>(std::move(r));
return std::make_unique<std::any>(I(mapper(db.find_column_family(uuid))));
}), std::make_unique<std::any>(std::move(init)), reducer_type([reducer = std::move(reducer)] (std::unique_ptr<std::any> a, std::unique_ptr<std::any> b) mutable {
return std::make_unique<std::any>(I(reducer(std::any_cast<I>(std::move(*a)), std::any_cast<I>(std::move(*b)))));
})).then([] (std::unique_ptr<std::any> r) {
return std::any_cast<I>(std::move(*r));
});
}
@@ -70,14 +70,14 @@ future<json::json_return_type> map_reduce_cf(http_context& ctx, const sstring& n
struct map_reduce_column_families_locally {
std::any init;
std::function<std::any (column_family&)> mapper;
std::function<std::any (std::any, std::any)> reducer;
future<std::any> operator()(database& db) const {
auto res = seastar::make_lw_shared<std::any>(init);
std::function<std::unique_ptr<std::any>(column_family&)> mapper;
std::function<std::unique_ptr<std::any>(std::unique_ptr<std::any>, std::unique_ptr<std::any>)> reducer;
future<std::unique_ptr<std::any>> operator()(database& db) const {
auto res = seastar::make_lw_shared<std::unique_ptr<std::any>>(std::make_unique<std::any>(init));
return do_for_each(db.get_column_families(), [res, this](const std::pair<utils::UUID, seastar::lw_shared_ptr<table>>& i) {
*res = reducer(*res.get(), mapper(*i.second.get()));
*res = std::move(reducer(std::move(*res), mapper(*i.second.get())));
}).then([res] {
return *res;
return std::move(*res);
});
}
};
@@ -85,16 +85,17 @@ struct map_reduce_column_families_locally {
template<class Mapper, class I, class Reducer>
future<I> map_reduce_cf_raw(http_context& ctx, I init,
Mapper mapper, Reducer reducer) {
using mapper_type = std::function<std::any (column_family&)>;
using reducer_type = std::function<std::any (std::any, std::any)>;
using mapper_type = std::function<std::unique_ptr<std::any>(column_family&)>;
using reducer_type = std::function<std::unique_ptr<std::any>(std::unique_ptr<std::any>, std::unique_ptr<std::any>)>;
auto wrapped_mapper = mapper_type([mapper = std::move(mapper)] (column_family& cf) mutable {
return I(mapper(cf));
return std::make_unique<std::any>(I(mapper(cf)));
});
auto wrapped_reducer = reducer_type([reducer = std::move(reducer)] (std::any a, std::any b) mutable {
return I(reducer(std::any_cast<I>(std::move(a)), std::any_cast<I>(std::move(b))));
auto wrapped_reducer = reducer_type([reducer = std::move(reducer)] (std::unique_ptr<std::any> a, std::unique_ptr<std::any> b) mutable {
return std::make_unique<std::any>(I(reducer(std::any_cast<I>(std::move(*a)), std::any_cast<I>(std::move(*b)))));
});
return ctx.db.map_reduce0(map_reduce_column_families_locally{init, std::move(wrapped_mapper), wrapped_reducer}, std::any(init), wrapped_reducer).then([] (std::any res) {
return std::any_cast<I>(std::move(res));
return ctx.db.map_reduce0(map_reduce_column_families_locally{init,
std::move(wrapped_mapper), wrapped_reducer}, std::make_unique<std::any>(init), wrapped_reducer).then([] (std::unique_ptr<std::any> res) {
return std::any_cast<I>(std::move(*res));
});
}
@@ -108,9 +109,9 @@ future<json::json_return_type> map_reduce_cf(http_context& ctx, I init,
}
future<json::json_return_type> get_cf_stats(http_context& ctx, const sstring& name,
int64_t column_family::stats::*f);
int64_t column_family_stats::*f);
future<json::json_return_type> get_cf_stats(http_context& ctx,
int64_t column_family::stats::*f);
int64_t column_family_stats::*f);
}
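The hunk above moves the map-reduce plumbing from passing `std::any` by value to passing `std::unique_ptr<std::any>`, so only a pointer moves between reduction steps. A minimal sketch of that wrap/unwrap pattern, under the assumption that `wrap` and `unwrap` are illustrative names (the patch inlines this in lambdas):

```cpp
#include <any>
#include <memory>
#include <utility>

// Hypothetical helpers mirroring the pattern in the patch: a typed value I
// crosses the type-erased reducer boundary inside a unique_ptr<std::any>,
// and is recovered with std::any_cast at the end.
template <typename I>
std::unique_ptr<std::any> wrap(I value) {
    return std::make_unique<std::any>(std::move(value));
}

template <typename I>
I unwrap(std::unique_ptr<std::any> p) {
    return std::any_cast<I>(std::move(*p));
}
```

A round trip like `unwrap<int>(wrap(42))` yields the original value while never copying the `std::any` itself between steps.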


@@ -24,6 +24,7 @@
#include "api/api-doc/compaction_manager.json.hh"
#include "db/system_keyspace.hh"
#include "column_family.hh"
#include <utility>
namespace api {
@@ -38,6 +39,16 @@ static future<json::json_return_type> get_cm_stats(http_context& ctx,
return make_ready_future<json::json_return_type>(res);
});
}
static std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash> sum_pending_tasks(std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>&& a,
const std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>& b) {
for (auto&& i : b) {
if (i.second) {
a[i.first] += i.second;
}
}
return std::move(a);
}
void set_compaction_manager(http_context& ctx, routes& r) {
cm::get_compactions.set(r, [&ctx] (std::unique_ptr<request> req) {
@@ -61,6 +72,32 @@ void set_compaction_manager(http_context& ctx, routes& r) {
});
});
cm::get_pending_tasks_by_table.set(r, [&ctx] (std::unique_ptr<request> req) {
return ctx.db.map_reduce0([&ctx](database& db) {
return do_with(std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>(), [&ctx, &db](std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>& tasks) {
return do_for_each(db.get_column_families(), [&tasks](const std::pair<utils::UUID, seastar::lw_shared_ptr<table>>& i) {
table& cf = *i.second.get();
tasks[std::make_pair(cf.schema()->ks_name(), cf.schema()->cf_name())] = cf.get_compaction_strategy().estimated_pending_compactions(cf);
return make_ready_future<>();
}).then([&tasks] {
return std::move(tasks);
});
});
}, std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>(), sum_pending_tasks).then(
[](const std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>& task_map) {
std::vector<cm::pending_compaction> res;
res.reserve(task_map.size());
for (auto i : task_map) {
cm::pending_compaction task;
task.ks = i.first.first;
task.cf = i.first.second;
task.task = i.second;
res.emplace_back(std::move(task));
}
return make_ready_future<json::json_return_type>(res);
});
});
cm::force_user_defined_compaction.set(r, [] (std::unique_ptr<request> req) {
//TBD
// FIXME

View File

@@ -44,14 +44,14 @@ json::json_return_type get_json_return_type(const db::seed_provider_type& val) {
return json::json_return_type(val.class_name);
}
std::string format_type(const std::string& type) {
std::string_view format_type(std::string_view type) {
if (type == "int") {
return "integer";
}
return type;
}
future<> get_config_swagger_entry(const std::string& name, const std::string& description, const std::string& type, bool& first, output_stream<char>& os) {
future<> get_config_swagger_entry(std::string_view name, const std::string& description, std::string_view type, bool& first, output_stream<char>& os) {
std::stringstream ss;
if (first) {
first=false;
@@ -88,23 +88,29 @@ future<> get_config_swagger_entry(const std::string& name, const std::string& de
}
namespace cs = httpd::config_json;
#define _get_config_value(name, type, deflt, status, desc, ...) if (id == #name) {return get_json_return_type(ctx.db.local().get_config().name());}
#define _get_config_description(name, type, deflt, status, desc, ...) f = f.then([&os, &first] {return get_config_swagger_entry(#name, desc, #type, first, os);});
void set_config(std::shared_ptr < api_registry_builder20 > rb, http_context& ctx, routes& r) {
rb->register_function(r, [] (output_stream<char>& os) {
return do_with(true, [&os] (bool& first) {
rb->register_function(r, [&ctx] (output_stream<char>& os) {
return do_with(true, [&os, &ctx] (bool& first) {
auto f = make_ready_future();
_make_config_values(_get_config_description)
for (auto&& cfg_ref : ctx.db.local().get_config().values()) {
auto&& cfg = cfg_ref.get();
f = f.then([&os, &first, &cfg] {
return get_config_swagger_entry(cfg.name(), std::string(cfg.desc()), cfg.type_name(), first, os);
});
}
return f;
});
});
cs::find_config_id.set(r, [&ctx] (const_req r) {
auto id = r.param["id"];
_make_config_values(_get_config_value)
for (auto&& cfg_ref : ctx.db.local().get_config().values()) {
auto&& cfg = cfg_ref.get();
if (id == cfg.name()) {
return cfg.value_as_json();
}
}
throw bad_param_exception(sstring("No such config entry: ") + id);
});
}

View File

@@ -76,7 +76,7 @@ future_json_function get_server_getter(std::function<uint64_t(const rpc::stats&)
auto get_shard_map = [f](messaging_service& ms) {
std::unordered_map<gms::inet_address, unsigned long> map;
ms.foreach_server_connection_stats([&map, f] (const rpc::client_info& info, const rpc::stats& stats) mutable {
map[gms::inet_address(net::ipv4_address(info.addr))] = f(stats);
map[gms::inet_address(info.addr.addr())] = f(stats);
});
return map;
};

View File

@@ -47,6 +47,10 @@ static future<json::json_return_type> sum_timed_rate_as_obj(distributed<proxy>&
});
}
httpd::utils_json::rate_moving_average_and_histogram get_empty_moving_average() {
return timer_to_json(utils::rate_moving_average_and_histogram());
}
static future<json::json_return_type> sum_timed_rate_as_long(distributed<proxy>& d, utils::timed_rate_moving_average proxy::stats::*f) {
return sum_timed_rate(d, f).then([](const utils::rate_moving_average& val) {
return make_ready_future<json::json_return_type>(val.count);
@@ -77,12 +81,9 @@ void set_storage_proxy(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(0);
});
sp::get_hinted_handoff_enabled.set(r, [](std::unique_ptr<request> req) {
//TBD
// FIXME
// hinted handoff is not supported currently,
// so we should return false
return make_ready_future<json::json_return_type>(false);
sp::get_hinted_handoff_enabled.set(r, [&ctx](std::unique_ptr<request> req) {
auto enabled = ctx.db.local().get_config().hinted_handoff_enabled();
return make_ready_future<json::json_return_type>(enabled);
});
sp::set_hinted_handoff_enabled.set(r, [](std::unique_ptr<request> req) {
@@ -246,68 +247,40 @@ void set_storage_proxy(http_context& ctx, routes& r) {
});
});
sp::get_cas_read_timeouts.set(r, [](std::unique_ptr<request> req) {
//TBD
// FIXME
// cas is not supported yet, so just return 0
return make_ready_future<json::json_return_type>(0);
sp::get_cas_read_timeouts.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timed_rate_as_long(ctx.sp, &proxy::stats::cas_read_timeouts);
});
sp::get_cas_read_unavailables.set(r, [](std::unique_ptr<request> req) {
//TBD
// FIXME
// cas is not supported yet, so just return 0
return make_ready_future<json::json_return_type>(0);
sp::get_cas_read_unavailables.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timed_rate_as_long(ctx.sp, &proxy::stats::cas_read_unavailables);
});
sp::get_cas_write_timeouts.set(r, [](std::unique_ptr<request> req) {
//TBD
// FIXME
// cas is not supported yet, so just return 0
return make_ready_future<json::json_return_type>(0);
sp::get_cas_write_timeouts.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timed_rate_as_long(ctx.sp, &proxy::stats::cas_write_timeouts);
});
sp::get_cas_write_unavailables.set(r, [](std::unique_ptr<request> req) {
//TBD
// FIXME
// cas is not supported yet, so just return 0
return make_ready_future<json::json_return_type>(0);
sp::get_cas_write_unavailables.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timed_rate_as_long(ctx.sp, &proxy::stats::cas_write_unavailables);
});
sp::get_cas_write_metrics_unfinished_commit.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
sp::get_cas_write_metrics_unfinished_commit.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_stats(ctx.sp, &proxy::stats::cas_write_unfinished_commit);
});
sp::get_cas_write_metrics_contention.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
sp::get_cas_write_metrics_contention.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_estimated_histogram(ctx, &proxy::stats::cas_write_contention);
});
sp::get_cas_write_metrics_condition_not_met.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
sp::get_cas_write_metrics_condition_not_met.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_stats(ctx.sp, &proxy::stats::cas_write_condition_not_met);
});
sp::get_cas_read_metrics_unfinished_commit.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
sp::get_cas_read_metrics_unfinished_commit.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_stats(ctx.sp, &proxy::stats::cas_read_unfinished_commit);
});
sp::get_cas_read_metrics_contention.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
});
sp::get_cas_read_metrics_condition_not_met.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
sp::get_cas_read_metrics_contention.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_estimated_histogram(ctx, &proxy::stats::cas_read_contention);
});
sp::get_read_metrics_timeouts.set(r, [&ctx](std::unique_ptr<request> req) {
@@ -377,6 +350,21 @@ void set_storage_proxy(http_context& ctx, routes& r) {
sp::get_write_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timer_stats(ctx.sp, &proxy::stats::write);
});
sp::get_cas_write_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timer_stats(ctx.sp, &proxy::stats::cas_write);
});
sp::get_cas_read_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timer_stats(ctx.sp, &proxy::stats::cas_read);
});
sp::get_view_write_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
//TBD
// FIXME
// No View metrics are available, so just return empty moving average
return make_ready_future<json::json_return_type>(get_empty_moving_average());
});
sp::get_read_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timer_stats(ctx.sp, &proxy::stats::read);

View File

@@ -22,13 +22,16 @@
#include "storage_service.hh"
#include "api/api-doc/storage_service.json.hh"
#include "db/config.hh"
#include <optional>
#include <time.h>
#include <boost/range/adaptor/map.hpp>
#include <boost/range/adaptor/filtered.hpp>
#include <service/storage_service.hh>
#include <db/commitlog/commitlog.hh>
#include <gms/gossiper.hh>
#include <db/system_keyspace.hh>
#include <seastar/http/exception.hh>
#include "service/storage_service.hh"
#include "service/load_meter.hh"
#include "db/commitlog/commitlog.hh"
#include "gms/gossiper.hh"
#include "db/system_keyspace.hh"
#include "seastar/http/exception.hh"
#include "repair/repair.hh"
#include "locator/snitch_base.hh"
#include "column_family.hh"
@@ -37,6 +40,7 @@
#include "sstables/compaction_manager.hh"
#include "sstables/sstables.hh"
#include "database.hh"
#include "db/extensions.hh"
sstables::sstable::version_types get_highest_supported_format();
@@ -52,27 +56,22 @@ static sstring validate_keyspace(http_context& ctx, const parameters& param) {
throw bad_param_exception("Keyspace " + param["keyspace"] + " Does not exist");
}
static std::vector<ss::token_range> describe_ring(const sstring& keyspace) {
std::vector<ss::token_range> res;
for (auto d : service::get_local_storage_service().describe_ring(keyspace)) {
ss::token_range r;
r.start_token = d._start_token;
r.end_token = d._end_token;
r.endpoints = d._endpoints;
r.rpc_endpoints = d._rpc_endpoints;
for (auto det : d._endpoint_details) {
ss::endpoint_detail ed;
ed.host = det._host;
ed.datacenter = det._datacenter;
if (det._rack != "") {
ed.rack = det._rack;
}
r.endpoint_details.push(ed);
static ss::token_range token_range_endpoints_to_json(const dht::token_range_endpoints& d) {
ss::token_range r;
r.start_token = d._start_token;
r.end_token = d._end_token;
r.endpoints = d._endpoints;
r.rpc_endpoints = d._rpc_endpoints;
for (auto det : d._endpoint_details) {
ss::endpoint_detail ed;
ed.host = det._host;
ed.datacenter = det._datacenter;
if (det._rack != "") {
ed.rack = det._rack;
}
res.push_back(r);
r.endpoint_details.push(ed);
}
return res;
return r;
}
void set_storage_service(http_context& ctx, routes& r) {
@@ -174,13 +173,13 @@ void set_storage_service(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(res);
});
ss::describe_any_ring.set(r, [&ctx](const_req req) {
return describe_ring("");
ss::describe_any_ring.set(r, [&ctx](std::unique_ptr<request> req) {
return make_ready_future<json::json_return_type>(stream_range_as_array(service::get_local_storage_service().describe_ring(""), token_range_endpoints_to_json));
});
ss::describe_ring.set(r, [&ctx](const_req req) {
auto keyspace = validate_keyspace(ctx, req.param);
return describe_ring(keyspace);
ss::describe_ring.set(r, [&ctx](std::unique_ptr<request> req) {
auto keyspace = validate_keyspace(ctx, req->param);
return make_ready_future<json::json_return_type>(stream_range_as_array(service::get_local_storage_service().describe_ring(keyspace), token_range_endpoints_to_json));
});
ss::get_host_id_map.set(r, [](const_req req) {
@@ -190,11 +189,11 @@ void set_storage_service(http_context& ctx, routes& r) {
});
ss::get_load.set(r, [&ctx](std::unique_ptr<request> req) {
return get_cf_stats(ctx, &column_family::stats::live_disk_space_used);
return get_cf_stats(ctx, &column_family_stats::live_disk_space_used);
});
ss::get_load_map.set(r, [] (std::unique_ptr<request> req) {
return service::get_local_storage_service().get_load_map().then([] (auto&& load_map) {
ss::get_load_map.set(r, [&ctx] (std::unique_ptr<request> req) {
return ctx.lmeter.get_load_map().then([] (auto&& load_map) {
std::vector<ss::map_string_double> res;
for (auto i : load_map) {
ss::map_string_double val;
@@ -252,6 +251,9 @@ void set_storage_service(http_context& ctx, routes& r) {
if (column_family.empty()) {
resp = service::get_local_storage_service().take_snapshot(tag, keynames);
} else {
if (keynames.empty()) {
throw httpd::bad_param_exception("The keyspace of column families must be specified");
}
if (keynames.size() > 1) {
throw httpd::bad_param_exception("Only one keyspace allowed when specifying a column family");
}
@@ -302,17 +304,24 @@ void set_storage_service(http_context& ctx, routes& r) {
if (column_families.empty()) {
column_families = map_keys(ctx.db.local().find_keyspace(keyspace).metadata().get()->cf_meta_data());
}
return ctx.db.invoke_on_all([keyspace, column_families] (database& db) {
std::vector<column_family*> column_families_vec;
auto& cm = db.get_compaction_manager();
for (auto cf : column_families) {
column_families_vec.push_back(&db.find_column_family(keyspace, cf));
return service::get_local_storage_service().is_cleanup_allowed(keyspace).then([&ctx, keyspace,
column_families = std::move(column_families)] (bool is_cleanup_allowed) mutable {
if (!is_cleanup_allowed) {
return make_exception_future<json::json_return_type>(
std::runtime_error("Can not perform cleanup operation when topology changes"));
}
return parallel_for_each(column_families_vec, [&cm] (column_family* cf) {
return cm.perform_cleanup(cf);
return ctx.db.invoke_on_all([keyspace, column_families] (database& db) {
std::vector<column_family*> column_families_vec;
auto& cm = db.get_compaction_manager();
for (auto cf : column_families) {
column_families_vec.push_back(&db.find_column_family(keyspace, cf));
}
return parallel_for_each(column_families_vec, [&cm] (column_family* cf) {
return cm.perform_cleanup(cf);
});
}).then([]{
return make_ready_future<json::json_return_type>(0);
});
}).then([]{
return make_ready_future<json::json_return_type>(0);
});
});
@@ -596,9 +605,7 @@ void set_storage_service(http_context& ctx, routes& r) {
});
ss::join_ring.set(r, [](std::unique_ptr<request> req) {
return service::get_local_storage_service().join_ring().then([] {
return make_ready_future<json::json_return_type>(json_void());
});
return make_ready_future<json::json_return_type>(json_void());
});
ss::is_joined.set(r, [] (std::unique_ptr<request> req) {
@@ -858,7 +865,7 @@ void set_storage_service(http_context& ctx, routes& r) {
});
ss::get_metrics_load.set(r, [&ctx](std::unique_ptr<request> req) {
return get_cf_stats(ctx, &column_family::stats::live_disk_space_used);
return get_cf_stats(ctx, &column_family_stats::live_disk_space_used);
});
ss::get_exceptions.set(r, [](const_req req) {
@@ -900,6 +907,133 @@ void set_storage_service(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(map_to_key_value(std::move(status), res));
});
});
ss::sstable_info.set(r, [&ctx] (std::unique_ptr<request> req) {
auto ks = api::req_param<sstring>(*req, "keyspace", {}).value;
auto cf = api::req_param<sstring>(*req, "cf", {}).value;
// The size of this vector is bound by ks::cf. I.e. it is at most Nks + Ncf long
// which is not small, but not huge either.
// The size of this vector is bound by ks::cf. I.e. it is at most Nks + Ncf long
using table_sstables_list = std::vector<ss::table_sstables>;
return do_with(table_sstables_list{}, [ks, cf, &ctx](table_sstables_list& dst) {
return service::get_local_storage_service().db().map_reduce([&dst](table_sstables_list&& res) {
for (auto&& t : res) {
auto i = std::find_if(dst.begin(), dst.end(), [&t](const ss::table_sstables& t2) {
return t.keyspace() == t2.keyspace() && t.table() == t2.table();
});
if (i == dst.end()) {
dst.emplace_back(std::move(t));
continue;
}
auto& ssd = i->sstables;
for (auto&& sd : t.sstables._elements) {
auto j = std::find_if(ssd._elements.begin(), ssd._elements.end(), [&sd](const ss::sstable& s) {
return s.generation() == sd.generation();
});
if (j == ssd._elements.end()) {
i->sstables.push(std::move(sd));
}
}
}
}, [ks, cf](const database& db) {
// see above
table_sstables_list res;
auto& ext = db.get_config().extensions();
for (auto& t : db.get_column_families() | boost::adaptors::map_values) {
auto& schema = t->schema();
if ((ks.empty() || ks == schema->ks_name()) && (cf.empty() || cf == schema->cf_name())) {
// at most Nsstables long
ss::table_sstables tst;
tst.keyspace = schema->ks_name();
tst.table = schema->cf_name();
for (auto sstable : *t->get_sstables_including_compacted_undeleted()) {
auto ts = db_clock::to_time_t(sstable->data_file_write_time());
::tm t;
::gmtime_r(&ts, &t);
ss::sstable info;
info.timestamp = t;
info.generation = sstable->generation();
info.level = sstable->get_sstable_level();
info.size = sstable->bytes_on_disk();
info.data_size = sstable->ondisk_data_size();
info.index_size = sstable->index_size();
info.filter_size = sstable->filter_size();
info.version = sstable->get_version();
if (sstable->has_component(sstables::component_type::CompressionInfo)) {
auto& c = sstable->get_compression();
auto cp = sstables::get_sstable_compressor(c);
ss::named_maps nm;
nm.group = "compression_parameters";
for (auto& p : cp->options()) {
ss::mapper e;
e.key = p.first;
e.value = p.second;
nm.attributes.push(std::move(e));
}
if (!cp->options().count(compression_parameters::SSTABLE_COMPRESSION)) {
ss::mapper e;
e.key = compression_parameters::SSTABLE_COMPRESSION;
e.value = cp->name();
nm.attributes.push(std::move(e));
}
info.extended_properties.push(std::move(nm));
}
sstables::file_io_extension::attr_value_map map;
for (auto* ep : ext.sstable_file_io_extensions()) {
map.merge(ep->get_attributes(*sstable));
}
for (auto& p : map) {
struct {
const sstring& key;
ss::sstable& info;
void operator()(const std::map<sstring, sstring>& map) const {
ss::named_maps nm;
nm.group = key;
for (auto& p : map) {
ss::mapper e;
e.key = p.first;
e.value = p.second;
nm.attributes.push(std::move(e));
}
info.extended_properties.push(std::move(nm));
}
void operator()(const sstring& value) const {
ss::mapper e;
e.key = key;
e.value = value;
info.properties.push(std::move(e));
}
} v{p.first, info};
std::visit(v, p.second);
}
tst.sstables.push(std::move(info));
}
res.emplace_back(std::move(tst));
}
}
std::sort(res.begin(), res.end(), [](const ss::table_sstables& t1, const ss::table_sstables& t2) {
return t1.keyspace() < t2.keyspace() || (t1.keyspace() == t2.keyspace() && t1.table() < t2.table());
});
return res;
}).then([&dst] {
return make_ready_future<json::json_return_type>(stream_object(dst));
});
});
});
}
}

View File

@@ -30,6 +30,10 @@ namespace api {
namespace hs = httpd::system_json;
void set_system(http_context& ctx, routes& r) {
hs::get_system_uptime.set(r, [](const_req req) {
return std::chrono::duration_cast<std::chrono::milliseconds>(engine().uptime()).count();
});
hs::get_all_logger_names.set(r, [](const_req req) {
return logging::logger_registry().get_all_logger_names();
});

View File

@@ -21,8 +21,8 @@
#include "atomic_cell.hh"
#include "atomic_cell_or_collection.hh"
#include "counters.hh"
#include "types.hh"
#include "types/collection.hh"
/// LSA migrator for cells with irrelevant type
///
@@ -148,35 +148,6 @@ atomic_cell_or_collection::atomic_cell_or_collection(const abstract_type& type,
{
}
static collection_mutation_view get_collection_mutation_view(const uint8_t* ptr)
{
auto f = data::cell::structure::get_member<data::cell::tags::flags>(ptr);
auto ti = data::type_info::make_collection();
data::cell::context ctx(f, ti);
auto view = data::cell::structure::get_member<data::cell::tags::cell>(ptr).as<data::cell::tags::collection>(ctx);
auto dv = data::cell::variable_value::make_view(view, f.get<data::cell::tags::external_data>());
return collection_mutation_view { dv };
}
collection_mutation_view atomic_cell_or_collection::as_collection_mutation() const {
return get_collection_mutation_view(_data.get());
}
collection_mutation::collection_mutation(const collection_type_impl& type, collection_mutation_view v)
: _data(imr_object_type::make(data::cell::make_collection(v.data), &type.imr_state().lsa_migrator()))
{
}
collection_mutation::collection_mutation(const collection_type_impl& type, bytes_view v)
: _data(imr_object_type::make(data::cell::make_collection(v), &type.imr_state().lsa_migrator()))
{
}
collection_mutation::operator collection_mutation_view() const
{
return get_collection_mutation_view(_data.get());
}
bool atomic_cell_or_collection::equals(const abstract_type& type, const atomic_cell_or_collection& other) const
{
auto ptr_a = _data.get();
@@ -231,7 +202,7 @@ size_t atomic_cell_or_collection::external_memory_usage(const abstract_type& t)
size_t external_value_size = 0;
if (flags.get<data::cell::tags::external_data>()) {
if (flags.get<data::cell::tags::collection>()) {
external_value_size = get_collection_mutation_view(_data.get()).data.size_bytes();
external_value_size = as_collection_mutation().data.size_bytes();
} else {
auto cell_view = data::cell::atomic_cell_view(t.imr_state().type_info(), view);
external_value_size = cell_view.value_size();
@@ -244,6 +215,61 @@ size_t atomic_cell_or_collection::external_memory_usage(const abstract_type& t)
+ imr_object_type::size_overhead + external_value_size;
}
std::ostream&
operator<<(std::ostream& os, const atomic_cell_view& acv) {
if (acv.is_live()) {
return fmt_print(os, "atomic_cell{{{},ts={:d},expiry={:d},ttl={:d}}}",
acv.is_counter_update()
? "counter_update_value=" + to_sstring(acv.counter_update_value())
: to_hex(acv.value().linearize()),
acv.timestamp(),
acv.is_live_and_has_ttl() ? acv.expiry().time_since_epoch().count() : -1,
acv.is_live_and_has_ttl() ? acv.ttl().count() : 0);
} else {
return fmt_print(os, "atomic_cell{{DEAD,ts={:d},deletion_time={:d}}}",
acv.timestamp(), acv.deletion_time().time_since_epoch().count());
}
}
std::ostream&
operator<<(std::ostream& os, const atomic_cell& ac) {
return os << atomic_cell_view(ac);
}
std::ostream&
operator<<(std::ostream& os, const atomic_cell_view::printer& acvp) {
auto& type = acvp._type;
auto& acv = acvp._cell;
if (acv.is_live()) {
std::ostringstream cell_value_string_builder;
if (type.is_counter()) {
if (acv.is_counter_update()) {
cell_value_string_builder << "counter_update_value=" << acv.counter_update_value();
} else {
cell_value_string_builder << "shards: ";
counter_cell_view::with_linearized(acv, [&cell_value_string_builder] (counter_cell_view& ccv) {
cell_value_string_builder << ::join(", ", ccv.shards());
});
}
} else {
cell_value_string_builder << type.to_string(acv.value().linearize());
}
return fmt_print(os, "atomic_cell{{{},ts={:d},expiry={:d},ttl={:d}}}",
cell_value_string_builder.str(),
acv.timestamp(),
acv.is_live_and_has_ttl() ? acv.expiry().time_since_epoch().count() : -1,
acv.is_live_and_has_ttl() ? acv.ttl().count() : 0);
} else {
return fmt_print(os, "atomic_cell{{DEAD,ts={:d},deletion_time={:d}}}",
acv.timestamp(), acv.deletion_time().time_since_epoch().count());
}
}
std::ostream&
operator<<(std::ostream& os, const atomic_cell::printer& acp) {
return operator<<(os, static_cast<const atomic_cell_view::printer&>(acp));
}
std::ostream& operator<<(std::ostream& os, const atomic_cell_or_collection::printer& p) {
if (!p._cell._data.get()) {
return os << "{ null atomic_cell_or_collection }";
@@ -253,9 +279,9 @@ std::ostream& operator<<(std::ostream& os, const atomic_cell_or_collection::prin
if (dc::structure::get_member<dc::tags::flags>(p._cell._data.get()).get<dc::tags::collection>()) {
os << "collection ";
auto cmv = p._cell.as_collection_mutation();
os << to_hex(cmv.data.linearize());
os << collection_mutation_view::printer(*p._cdef.type, cmv);
} else {
os << p._cell.as_atomic_cell(p._cdef);
os << atomic_cell_view::printer(*p._cdef.type, p._cell.as_atomic_cell(p._cdef));
}
return os << " }";
}

View File

@@ -153,6 +153,14 @@ public:
}
friend std::ostream& operator<<(std::ostream& os, const atomic_cell_view& acv);
class printer {
const abstract_type& _type;
const atomic_cell_view& _cell;
public:
printer(const abstract_type& type, const atomic_cell_view& cell) : _type(type), _cell(cell) {}
friend std::ostream& operator<<(std::ostream& os, const printer& acvp);
};
};
class atomic_cell_mutable_view final : public basic_atomic_cell_view<mutable_view::yes> {
@@ -219,30 +227,12 @@ public:
static atomic_cell make_live_uninitialized(const abstract_type& type, api::timestamp_type timestamp, size_t size);
friend class atomic_cell_or_collection;
friend std::ostream& operator<<(std::ostream& os, const atomic_cell& ac);
};
class collection_mutation_view;
// Represents a mutation of a collection. Actual format is determined by collection type,
// and is:
// set: list of atomic_cell
// map: list of pair<atomic_cell, bytes> (for key/value)
// list: tbd, probably ugly
class collection_mutation {
public:
using imr_object_type = imr::utils::object<data::cell::structure>;
imr_object_type _data;
collection_mutation() {}
collection_mutation(const collection_type_impl&, collection_mutation_view v);
collection_mutation(const collection_type_impl&, bytes_view bv);
operator collection_mutation_view() const;
};
class collection_mutation_view {
public:
atomic_cell_value_view data;
class printer : atomic_cell_view::printer {
public:
printer(const abstract_type& type, const atomic_cell_view& cell) : atomic_cell_view::printer(type, cell) {}
friend std::ostream& operator<<(std::ostream& os, const printer& acvp);
};
};
class column_definition;

View File

@@ -34,14 +34,12 @@ template<>
struct appending_hash<collection_mutation_view> {
template<typename Hasher>
void operator()(Hasher& h, collection_mutation_view cell, const column_definition& cdef) const {
cell.data.with_linearized([&] (bytes_view cell_bv) {
auto ctype = static_pointer_cast<const collection_type_impl>(cdef.type);
auto m_view = ctype->deserialize_mutation_form(cell_bv);
::feed_hash(h, m_view.tomb);
for (auto&& key_and_value : m_view.cells) {
::feed_hash(h, key_and_value.first);
::feed_hash(h, key_and_value.second, cdef);
}
cell.with_deserialized(*cdef.type, [&] (collection_mutation_view_description m_view) {
::feed_hash(h, m_view.tomb);
for (auto&& key_and_value : m_view.cells) {
::feed_hash(h, key_and_value.first);
::feed_hash(h, key_and_value.second, cdef);
}
});
}
};

View File

@@ -22,6 +22,7 @@
#pragma once
#include "atomic_cell.hh"
#include "collection_mutation.hh"
#include "schema.hh"
#include "hashing.hh"

View File

@@ -33,6 +33,7 @@
#include "auth/resource.hh"
#include "seastarx.hh"
#include "exceptions/exceptions.hh"
namespace auth {
@@ -52,9 +53,9 @@ struct role_config_update final {
///
/// A logical argument error for a role-management operation.
///
class roles_argument_exception : public std::invalid_argument {
class roles_argument_exception : public exceptions::invalid_request_exception {
public:
using std::invalid_argument::invalid_argument;
using exceptions::invalid_request_exception::invalid_request_exception;
};
class role_already_exists : public roles_argument_exception {

View File

@@ -39,7 +39,7 @@
#include "db/consistency_level_type.hh"
#include "exceptions/exceptions.hh"
#include "log.hh"
#include "service/migration_listener.hh"
#include "service/migration_manager.hh"
#include "utils/class_registrator.hh"
#include "database.hh"
@@ -77,17 +77,23 @@ private:
void on_update_view(const sstring& ks_name, const sstring& view_name, bool columns_changed) override {}
void on_drop_keyspace(const sstring& ks_name) override {
_authorizer.revoke_all(
// Do it in the background.
(void)_authorizer.revoke_all(
auth::make_data_resource(ks_name)).handle_exception_type([](const unsupported_authorization_operation&) {
// Nothing.
}).handle_exception([] (std::exception_ptr e) {
log.error("Unexpected exception while revoking all permissions on dropped keyspace: {}", e);
});
}
void on_drop_column_family(const sstring& ks_name, const sstring& cf_name) override {
_authorizer.revoke_all(
// Do it in the background.
(void)_authorizer.revoke_all(
auth::make_data_resource(
ks_name, cf_name)).handle_exception_type([](const unsupported_authorization_operation&) {
// Nothing.
}).handle_exception([] (std::exception_ptr e) {
log.error("Unexpected exception while revoking all permissions on dropped table: {}", e);
});
}
@@ -108,14 +114,14 @@ static future<> validate_role_exists(const service& ser, std::string_view role_n
service::service(
permissions_cache_config c,
cql3::query_processor& qp,
::service::migration_manager& mm,
::service::migration_notifier& mn,
std::unique_ptr<authorizer> z,
std::unique_ptr<authenticator> a,
std::unique_ptr<role_manager> r)
: _permissions_cache_config(std::move(c))
, _permissions_cache(nullptr)
, _qp(qp)
, _migration_manager(mm)
, _mnotifier(mn)
, _authorizer(std::move(z))
, _authenticator(std::move(a))
, _role_manager(std::move(r))
@@ -135,18 +141,19 @@ service::service(
service::service(
permissions_cache_config c,
cql3::query_processor& qp,
::service::migration_notifier& mn,
::service::migration_manager& mm,
const service_config& sc)
: service(
std::move(c),
qp,
mm,
mn,
create_object<authorizer>(sc.authorizer_java_name, qp, mm),
create_object<authenticator>(sc.authenticator_java_name, qp, mm),
create_object<role_manager>(sc.role_manager_java_name, qp, mm)) {
}
future<> service::create_keyspace_if_missing() const {
future<> service::create_keyspace_if_missing(::service::migration_manager& mm) const {
auto& db = _qp.db();
if (!db.has_keyspace(meta::AUTH_KS)) {
@@ -160,22 +167,24 @@ future<> service::create_keyspace_if_missing() const {
// We use min_timestamp so that default keyspace metadata will lose to any manual adjustments.
// See issue #2129.
return _migration_manager.announce_new_keyspace(ksm, api::min_timestamp, false);
return mm.announce_new_keyspace(ksm, api::min_timestamp, false);
}
return make_ready_future<>();
}
future<> service::start() {
return once_among_shards([this] {
return create_keyspace_if_missing();
future<> service::start(::service::migration_manager& mm) {
return once_among_shards([this, &mm] {
return create_keyspace_if_missing(mm);
}).then([this] {
return when_all_succeed(_role_manager->start(), _authorizer->start(), _authenticator->start());
return _role_manager->start().then([this] {
return when_all_succeed(_authorizer->start(), _authenticator->start());
});
}).then([this] {
_permissions_cache = std::make_unique<permissions_cache>(_permissions_cache_config, *this, log);
}).then([this] {
return once_among_shards([this] {
_migration_manager.register_listener(_migration_listener.get());
_mnotifier.register_listener(_migration_listener.get());
return make_ready_future<>();
});
});
@@ -184,9 +193,9 @@ future<> service::start() {
future<> service::stop() {
// Only one of the shards has the listener registered, but let's try to
// unregister on each one just to make sure.
_migration_manager.unregister_listener(_migration_listener.get());
return _permissions_cache->stop().then([this] {
return _mnotifier.unregister_listener(_migration_listener.get()).then([this] {
return _permissions_cache->stop();
}).then([this] {
return when_all_succeed(_role_manager->stop(), _authorizer->stop(), _authenticator->stop());
});
}

View File

@@ -28,6 +28,7 @@
#include <seastar/core/future.hh>
#include <seastar/core/sstring.hh>
#include <seastar/util/bool_class.hh>
#include <seastar/core/sharded.hh>
#include "auth/authenticator.hh"
#include "auth/authorizer.hh"
@@ -42,6 +43,7 @@ class query_processor;
namespace service {
class migration_manager;
class migration_notifier;
class migration_listener;
}
@@ -76,13 +78,15 @@ public:
///
/// All state associated with access-control is stored externally to any particular instance of this class.
///
class service final {
/// peering_sharded_service inheritance is needed to be able to access shard local authentication service
/// given an object from another shard. Used for bouncing lwt requests to correct shard.
class service final : public seastar::peering_sharded_service<service> {
permissions_cache_config _permissions_cache_config;
std::unique_ptr<permissions_cache> _permissions_cache;
cql3::query_processor& _qp;
::service::migration_manager& _migration_manager;
::service::migration_notifier& _mnotifier;
std::unique_ptr<authorizer> _authorizer;
@@ -97,7 +101,7 @@ public:
service(
permissions_cache_config,
cql3::query_processor&,
::service::migration_manager&,
::service::migration_notifier&,
std::unique_ptr<authorizer>,
std::unique_ptr<authenticator>,
std::unique_ptr<role_manager>);
@@ -110,10 +114,11 @@ public:
service(
permissions_cache_config,
cql3::query_processor&,
::service::migration_notifier&,
::service::migration_manager&,
const service_config&);
future<> start();
future<> start(::service::migration_manager&);
future<> stop();
@@ -159,7 +164,7 @@ public:
private:
future<bool> has_existing_legacy_users() const;
future<> create_keyspace_if_missing() const;
future<> create_keyspace_if_missing(::service::migration_manager& mm) const;
};
future<bool> has_superuser(const service&, const authenticated_user&);


@@ -101,8 +101,8 @@ static future<std::optional<record>> find_record(cql3::query_processor& qp, std:
return std::make_optional(
record{
row.get_as<sstring>(sstring(meta::roles_table::role_col_name)),
row.get_as<bool>("is_superuser"),
row.get_as<bool>("can_login"),
row.get_or<bool>("is_superuser", false),
row.get_or<bool>("can_login", false),
(row.has("member_of")
? row.get_set<sstring>("member_of")
: role_set())});
@@ -203,7 +203,7 @@ future<> standard_role_manager::migrate_legacy_metadata() const {
internal_distributed_timeout_config()).then([this](::shared_ptr<cql3::untyped_result_set> results) {
return do_for_each(*results, [this](const cql3::untyped_result_set_row& row) {
role_config config;
config.is_superuser = row.get_as<bool>("super");
config.is_superuser = row.get_or<bool>("super", false);
config.can_login = true;
return do_with(

build_id.cc Normal file

@@ -0,0 +1,71 @@
/*
* Copyright (C) 2019 ScyllaDB
*/
#include "build_id.hh"
#include <fmt/printf.h>
#include <link.h>
#include <seastar/core/align.hh>
#include <sstream>
using namespace seastar;
static const Elf64_Nhdr* get_nt_build_id(dl_phdr_info* info) {
auto base = info->dlpi_addr;
const auto* h = info->dlpi_phdr;
auto num_headers = info->dlpi_phnum;
for (int i = 0; i != num_headers; ++i, ++h) {
if (h->p_type != PT_NOTE) {
continue;
}
auto* p = reinterpret_cast<const char*>(base) + h->p_vaddr;
auto* e = p + h->p_memsz;
while (p != e) {
const auto* n = reinterpret_cast<const Elf64_Nhdr*>(p);
if (n->n_type == NT_GNU_BUILD_ID) {
return n;
}
p += sizeof(Elf64_Nhdr);
p += n->n_namesz;
p = align_up(p, 4);
p += n->n_descsz;
p = align_up(p, 4);
}
}
assert(0 && "no NT_GNU_BUILD_ID note");
}
static int callback(dl_phdr_info* info, size_t size, void* data) {
std::string& ret = *(std::string*)data;
std::ostringstream os;
// The first DSO is always the main program, which has an empty name.
assert(strlen(info->dlpi_name) == 0);
auto* n = get_nt_build_id(info);
auto* p = reinterpret_cast<const char*>(n);
p += sizeof(Elf64_Nhdr);
p += n->n_namesz;
p = align_up(p, 4);
const char* desc = p;
for (unsigned i = 0; i < n->n_descsz; ++i) {
fmt::fprintf(os, "%02x", (unsigned char)*(desc + i));
}
ret = os.str();
return 1;
}
std::string get_build_id() {
std::string ret;
int r = dl_iterate_phdr(callback, &ret);
assert(r == 1);
return ret;
}

build_id.hh Normal file

@@ -0,0 +1,9 @@
/*
* Copyright (C) 2019 ScyllaDB
*/
#pragma once
#include <string>
std::string get_build_id();


@@ -38,6 +38,7 @@ class bytes_ostream {
public:
using size_type = bytes::size_type;
using value_type = bytes::value_type;
using fragment_type = bytes_view;
static constexpr size_type max_chunk_size() { return 128 * 1024; }
private:
static_assert(sizeof(value_type) == 1, "value_type is assumed to be one byte long");
@@ -93,6 +94,29 @@ public:
return _current != other._current;
}
};
using const_iterator = fragment_iterator;
class output_iterator {
public:
using iterator_category = std::output_iterator_tag;
using difference_type = std::ptrdiff_t;
using value_type = bytes_ostream::value_type;
using pointer = bytes_ostream::value_type*;
using reference = bytes_ostream::value_type&;
friend class bytes_ostream;
private:
bytes_ostream* _ostream = nullptr;
private:
explicit output_iterator(bytes_ostream& os) : _ostream(&os) { }
public:
reference operator*() const { return *_ostream->write_place_holder(1); }
output_iterator& operator++() { return *this; }
output_iterator operator++(int) { return *this; }
};
private:
inline size_type current_space_left() const {
if (!_current) {
@@ -289,6 +313,11 @@ public:
return _size;
}
// For the FragmentRange concept
size_type size_bytes() const {
return _size;
}
bool empty() const {
return _size == 0;
}
@@ -326,6 +355,8 @@ public:
fragment_iterator begin() const { return { _begin.get() }; }
fragment_iterator end() const { return { nullptr }; }
output_iterator write_begin() { return output_iterator(*this); }
boost::iterator_range<fragment_iterator> fragments() const {
return { begin(), end() };
}


@@ -61,6 +61,7 @@ class cache_flat_mutation_reader final : public flat_mutation_reader::impl {
// - _last_row points at a direct predecessor of the next row which is going to be read.
// Used for populating continuity.
// - _population_range_starts_before_all_rows is set accordingly
// - _underlying is engaged and fast-forwarded
reading_from_underlying,
end_of_stream
@@ -99,7 +100,13 @@ class cache_flat_mutation_reader final : public flat_mutation_reader::impl {
// forward progress is not guaranteed in case iterators are getting constantly invalidated.
bool _lower_bound_changed = false;
// Points to the underlying reader conforming to _schema,
// either to *_underlying_holder or _read_context->underlying().underlying().
flat_mutation_reader* _underlying = nullptr;
std::optional<flat_mutation_reader> _underlying_holder;
future<> do_fill_buffer(db::timeout_clock::time_point);
future<> ensure_underlying(db::timeout_clock::time_point);
void copy_from_cache_to_buffer();
future<> process_static_row(db::timeout_clock::time_point);
void move_to_end();
@@ -186,23 +193,22 @@ future<> cache_flat_mutation_reader::process_static_row(db::timeout_clock::time_
return make_ready_future<>();
} else {
_read_context->cache().on_row_miss();
return _read_context->get_next_fragment(timeout).then([this] (mutation_fragment_opt&& sr) {
if (sr) {
assert(sr->is_static_row());
maybe_add_to_cache(sr->as_static_row());
push_mutation_fragment(std::move(*sr));
}
maybe_set_static_row_continuous();
return ensure_underlying(timeout).then([this, timeout] {
return (*_underlying)(timeout).then([this] (mutation_fragment_opt&& sr) {
if (sr) {
assert(sr->is_static_row());
maybe_add_to_cache(sr->as_static_row());
push_mutation_fragment(std::move(*sr));
}
maybe_set_static_row_continuous();
});
});
}
}
inline
void cache_flat_mutation_reader::touch_partition() {
if (_snp->at_latest_version()) {
rows_entry& last_dummy = *_snp->version()->partition().clustered_rows().rbegin();
_snp->tracker()->touch(last_dummy);
}
_snp->touch();
}
inline
@@ -232,14 +238,36 @@ future<> cache_flat_mutation_reader::fill_buffer(db::timeout_clock::time_point t
});
}
inline
future<> cache_flat_mutation_reader::ensure_underlying(db::timeout_clock::time_point timeout) {
if (_underlying) {
return make_ready_future<>();
}
return _read_context->ensure_underlying(timeout).then([this, timeout] {
flat_mutation_reader& ctx_underlying = _read_context->underlying().underlying();
if (ctx_underlying.schema() != _schema) {
_underlying_holder = make_delegating_reader(ctx_underlying);
_underlying_holder->upgrade_schema(_schema);
_underlying = &*_underlying_holder;
} else {
_underlying = &ctx_underlying;
}
});
}
inline
future<> cache_flat_mutation_reader::do_fill_buffer(db::timeout_clock::time_point timeout) {
if (_state == state::move_to_underlying) {
if (!_underlying) {
return ensure_underlying(timeout).then([this, timeout] {
return do_fill_buffer(timeout);
});
}
_state = state::reading_from_underlying;
_population_range_starts_before_all_rows = _lower_bound.is_before_all_clustered_rows(*_schema);
auto end = _next_row_in_range ? position_in_partition(_next_row.position())
: position_in_partition(_upper_bound);
return _read_context->fast_forward_to(position_range{_lower_bound, std::move(end)}, timeout).then([this, timeout] {
return _underlying->fast_forward_to(position_range{_lower_bound, std::move(end)}, timeout).then([this, timeout] {
return read_from_underlying(timeout);
});
}
@@ -280,7 +308,7 @@ future<> cache_flat_mutation_reader::do_fill_buffer(db::timeout_clock::time_poin
inline
future<> cache_flat_mutation_reader::read_from_underlying(db::timeout_clock::time_point timeout) {
return consume_mutation_fragments_until(_read_context->underlying().underlying(),
return consume_mutation_fragments_until(*_underlying,
[this] { return _state != state::reading_from_underlying || is_buffer_full(); },
[this] (mutation_fragment mf) {
_read_context->cache().on_row_miss();


@@ -35,6 +35,7 @@
#include "idl/uuid.dist.impl.hh"
#include "idl/keys.dist.impl.hh"
#include "idl/mutation.dist.impl.hh"
#include <iostream>
canonical_mutation::canonical_mutation(bytes data)
: _data(std::move(data))
@@ -79,7 +80,8 @@ mutation canonical_mutation::to_mutation(schema_ptr s) const {
if (version == m.schema()->version()) {
auto partition_view = mutation_partition_view::from_view(mv.partition());
m.partition().apply(*m.schema(), partition_view, *m.schema());
mutation_application_stats app_stats;
m.partition().apply(*m.schema(), partition_view, *m.schema(), app_stats);
} else {
column_mapping cm = mv.mapping();
converting_mutation_partition_applier v(cm, *m.schema(), m.partition());
@@ -88,3 +90,81 @@ mutation canonical_mutation::to_mutation(schema_ptr s) const {
}
return m;
}
static sstring bytes_to_text(bytes_view bv) {
sstring ret(sstring::initialized_later(), bv.size());
std::copy_n(reinterpret_cast<const char*>(bv.data()), bv.size(), ret.data());
return ret;
}
std::ostream& operator<<(std::ostream& os, const canonical_mutation& cm) {
auto in = ser::as_input_stream(cm._data);
auto mv = ser::deserialize(in, boost::type<ser::canonical_mutation_view>());
column_mapping mapping = mv.mapping();
auto partition_view = mutation_partition_view::from_view(mv.partition());
fmt::print(os, "{{canonical_mutation: ");
fmt::print(os, "table_id {} schema_version {} ", mv.table_id(), mv.schema_version());
fmt::print(os, "partition_key {} ", mv.key());
class printing_visitor : public mutation_partition_view_virtual_visitor {
std::ostream& _os;
const column_mapping& _cm;
bool _first = true;
bool _in_row = false;
private:
void print_separator() {
if (!_first) {
fmt::print(_os, ", ");
}
_first = false;
}
public:
printing_visitor(std::ostream& os, const column_mapping& cm) : _os(os), _cm(cm) {}
virtual void accept_partition_tombstone(tombstone t) override {
print_separator();
fmt::print(_os, "partition_tombstone {}", t);
}
virtual void accept_static_cell(column_id id, atomic_cell ac) override {
print_separator();
auto&& entry = _cm.static_column_at(id);
fmt::print(_os, "static column {} {}", bytes_to_text(entry.name()), atomic_cell::printer(*entry.type(), ac));
}
virtual void accept_static_cell(column_id id, collection_mutation_view cmv) override {
print_separator();
auto&& entry = _cm.static_column_at(id);
fmt::print(_os, "static column {} {}", bytes_to_text(entry.name()), collection_mutation_view::printer(*entry.type(), cmv));
}
virtual void accept_row_tombstone(range_tombstone rt) override {
print_separator();
fmt::print(_os, "row tombstone {}", rt);
}
virtual void accept_row(position_in_partition_view pipv, row_tombstone rt, row_marker rm, is_dummy, is_continuous) override {
if (_in_row) {
fmt::print(_os, "}}, ");
}
fmt::print(_os, "{{row {} tombstone {} marker {}", pipv, rt, rm);
_in_row = true;
_first = false;
}
virtual void accept_row_cell(column_id id, atomic_cell ac) override {
print_separator();
auto&& entry = _cm.regular_column_at(id);
fmt::print(_os, "column {} {}", bytes_to_text(entry.name()), atomic_cell::printer(*entry.type(), ac));
}
virtual void accept_row_cell(column_id id, collection_mutation_view cmv) override {
print_separator();
auto&& entry = _cm.regular_column_at(id);
fmt::print(_os, "column {} {}", bytes_to_text(entry.name()), collection_mutation_view::printer(*entry.type(), cmv));
}
void finalize() {
if (_in_row) {
fmt::print(_os, "}}");
}
}
};
printing_visitor pv(os, mapping);
partition_view.accept(mapping, pv);
pv.finalize();
fmt::print(os, "}}");
return os;
}


@@ -26,6 +26,7 @@
#include "database_fwd.hh"
#include "mutation_partition_visitor.hh"
#include "mutation_partition_serializer.hh"
#include <iosfwd>
// Immutable mutation form which can be read using any schema version of the same table.
// Safe to access from other shards via const&.
@@ -52,4 +53,5 @@ public:
const bytes& representation() const { return _data; }
friend std::ostream& operator<<(std::ostream& os, const canonical_mutation& cm);
};

cdc/cdc.cc Normal file

@@ -0,0 +1,835 @@
/*
* Copyright (C) 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include <utility>
#include <algorithm>
#include <boost/range/irange.hpp>
#include <seastar/util/defer.hh>
#include <seastar/core/thread.hh>
#include "cdc/cdc.hh"
#include "bytes.hh"
#include "database.hh"
#include "db/config.hh"
#include "dht/murmur3_partitioner.hh"
#include "partition_slice_builder.hh"
#include "schema.hh"
#include "schema_builder.hh"
#include "service/migration_listener.hh"
#include "service/storage_service.hh"
#include "types/tuple.hh"
#include "cql3/statements/select_statement.hh"
#include "cql3/multi_column_relation.hh"
#include "cql3/tuples.hh"
#include "log.hh"
#include "json.hh"
using locator::snitch_ptr;
using locator::token_metadata;
using locator::topology;
using seastar::sstring;
using service::migration_notifier;
using service::storage_proxy;
namespace std {
template<> struct hash<std::pair<net::inet_address, unsigned int>> {
std::size_t operator()(const std::pair<net::inet_address, unsigned int> &p) const {
return std::hash<net::inet_address>{}(p.first) ^ std::hash<int>{}(p.second);
}
};
}
using namespace std::chrono_literals;
static logging::logger cdc_log("cdc");
namespace cdc {
static schema_ptr create_log_schema(const schema&, std::optional<utils::UUID> = {});
static schema_ptr create_stream_description_table_schema(const schema&, std::optional<utils::UUID> = {});
static future<> populate_desc(db_context ctx, const schema& s);
}
class cdc::cdc_service::impl : service::migration_listener::empty_listener {
friend cdc_service;
db_context _ctxt;
bool _stopped = false;
public:
impl(db_context ctxt)
: _ctxt(std::move(ctxt))
{
_ctxt._migration_notifier.register_listener(this);
}
~impl() {
assert(_stopped);
}
future<> stop() {
return _ctxt._migration_notifier.unregister_listener(this).then([this] {
_stopped = true;
});
}
void on_before_create_column_family(const schema& schema, std::vector<mutation>& mutations, api::timestamp_type timestamp) override {
if (schema.cdc_options().enabled()) {
auto& db = _ctxt._proxy.get_db().local();
auto logname = log_name(schema.cf_name());
if (!db.has_schema(schema.ks_name(), logname)) {
// in seastar thread
auto log_schema = create_log_schema(schema);
auto stream_desc_schema = create_stream_description_table_schema(schema);
auto& keyspace = db.find_keyspace(schema.ks_name());
auto log_mut = db::schema_tables::make_create_table_mutations(keyspace.metadata(), log_schema, timestamp);
auto stream_mut = db::schema_tables::make_create_table_mutations(keyspace.metadata(), stream_desc_schema, timestamp);
mutations.insert(mutations.end(), std::make_move_iterator(log_mut.begin()), std::make_move_iterator(log_mut.end()));
mutations.insert(mutations.end(), std::make_move_iterator(stream_mut.begin()), std::make_move_iterator(stream_mut.end()));
}
}
}
void on_before_update_column_family(const schema& new_schema, const schema& old_schema, std::vector<mutation>& mutations, api::timestamp_type timestamp) override {
bool is_cdc = new_schema.cdc_options().enabled();
bool was_cdc = old_schema.cdc_options().enabled();
// We need to create or modify the log & stream schemas iff the cdc status changed (was != is),
// or unconditionally if cdc is on now, since then any actual base schema change affects the
// derived columns etc. as well.
if (was_cdc || is_cdc) {
auto logname = log_name(old_schema.cf_name());
auto descname = desc_name(old_schema.cf_name());
auto& db = _ctxt._proxy.get_db().local();
auto& keyspace = db.find_keyspace(old_schema.ks_name());
auto log_schema = was_cdc ? db.find_column_family(old_schema.ks_name(), logname).schema() : nullptr;
auto stream_desc_schema = was_cdc ? db.find_column_family(old_schema.ks_name(), descname).schema() : nullptr;
if (!is_cdc) {
auto log_mut = db::schema_tables::make_drop_table_mutations(keyspace.metadata(), log_schema, timestamp);
auto stream_mut = db::schema_tables::make_drop_table_mutations(keyspace.metadata(), stream_desc_schema, timestamp);
mutations.insert(mutations.end(), std::make_move_iterator(log_mut.begin()), std::make_move_iterator(log_mut.end()));
mutations.insert(mutations.end(), std::make_move_iterator(stream_mut.begin()), std::make_move_iterator(stream_mut.end()));
return;
}
auto new_log_schema = create_log_schema(new_schema, log_schema ? std::make_optional(log_schema->id()) : std::nullopt);
auto new_stream_desc_schema = create_stream_description_table_schema(new_schema, stream_desc_schema ? std::make_optional(stream_desc_schema->id()) : std::nullopt);
auto log_mut = log_schema
? db::schema_tables::make_update_table_mutations(keyspace.metadata(), log_schema, new_log_schema, timestamp, false)
: db::schema_tables::make_create_table_mutations(keyspace.metadata(), new_log_schema, timestamp)
;
auto stream_mut = stream_desc_schema
? db::schema_tables::make_update_table_mutations(keyspace.metadata(), stream_desc_schema, new_stream_desc_schema, timestamp, false)
: db::schema_tables::make_create_table_mutations(keyspace.metadata(), new_stream_desc_schema, timestamp)
;
mutations.insert(mutations.end(), std::make_move_iterator(log_mut.begin()), std::make_move_iterator(log_mut.end()));
mutations.insert(mutations.end(), std::make_move_iterator(stream_mut.begin()), std::make_move_iterator(stream_mut.end()));
}
}
void on_before_drop_column_family(const schema& schema, std::vector<mutation>& mutations, api::timestamp_type timestamp) override {
if (schema.cdc_options().enabled()) {
auto logname = log_name(schema.cf_name());
auto descname = desc_name(schema.cf_name());
auto& db = _ctxt._proxy.get_db().local();
auto& keyspace = db.find_keyspace(schema.ks_name());
auto log_schema = db.find_column_family(schema.ks_name(), logname).schema();
auto stream_desc_schema = db.find_column_family(schema.ks_name(), descname).schema();
auto log_mut = db::schema_tables::make_drop_table_mutations(keyspace.metadata(), log_schema, timestamp);
auto stream_mut = db::schema_tables::make_drop_table_mutations(keyspace.metadata(), stream_desc_schema, timestamp);
mutations.insert(mutations.end(), std::make_move_iterator(log_mut.begin()), std::make_move_iterator(log_mut.end()));
mutations.insert(mutations.end(), std::make_move_iterator(stream_mut.begin()), std::make_move_iterator(stream_mut.end()));
}
}
void on_create_column_family(const sstring& ks_name, const sstring& cf_name) override {
// This callback is done on all shards. Only do the work once.
if (engine().cpu_id() != 0) {
return;
}
auto& db = _ctxt._proxy.get_db().local();
auto& cf = db.find_column_family(ks_name, cf_name);
auto schema = cf.schema();
if (schema->cdc_options().enabled()) {
populate_desc(_ctxt, *schema).get();
}
}
void on_update_column_family(const sstring& ks_name, const sstring& cf_name, bool columns_changed) override {
on_create_column_family(ks_name, cf_name);
}
void on_drop_column_family(const sstring& ks_name, const sstring& cf_name) override {}
future<std::tuple<std::vector<mutation>, result_callback>> augment_mutation_call(
lowres_clock::time_point timeout,
std::vector<mutation>&& mutations
);
template<typename Iter>
future<> append_mutations(Iter i, Iter e, schema_ptr s, lowres_clock::time_point, std::vector<mutation>&);
};
cdc::cdc_service::cdc_service(service::storage_proxy& proxy)
: cdc_service(db_context::builder(proxy).build())
{}
cdc::cdc_service::cdc_service(db_context ctxt)
: _impl(std::make_unique<impl>(std::move(ctxt)))
{
_impl->_ctxt._proxy.set_cdc_service(this);
}
future<> cdc::cdc_service::stop() {
return _impl->stop();
}
cdc::cdc_service::~cdc_service() = default;
cdc::options::options(const std::map<sstring, sstring>& map) {
if (map.find("enabled") == std::end(map)) {
return;
}
for (auto& p : map) {
if (p.first == "enabled") {
_enabled = p.second == "true";
} else if (p.first == "preimage") {
_preimage = p.second == "true";
} else if (p.first == "postimage") {
_postimage = p.second == "true";
} else if (p.first == "ttl") {
_ttl = std::stoi(p.second);
} else {
throw exceptions::configuration_exception("Invalid CDC option: " + p.first);
}
}
}
std::map<sstring, sstring> cdc::options::to_map() const {
if (!_enabled) {
return {};
}
return {
{ "enabled", _enabled ? "true" : "false" },
{ "preimage", _preimage ? "true" : "false" },
{ "postimage", _postimage ? "true" : "false" },
{ "ttl", std::to_string(_ttl) },
};
}
sstring cdc::options::to_sstring() const {
return json::to_json(to_map());
}
bool cdc::options::operator==(const options& o) const {
return _enabled == o._enabled && _preimage == o._preimage && _postimage == o._postimage && _ttl == o._ttl;
}
bool cdc::options::operator!=(const options& o) const {
return !(*this == o);
}
namespace cdc {
using operation_native_type = std::underlying_type_t<operation>;
using column_op_native_type = std::underlying_type_t<column_op>;
sstring log_name(const sstring& table_name) {
static constexpr auto cdc_log_suffix = "_scylla_cdc_log";
return table_name + cdc_log_suffix;
}
sstring desc_name(const sstring& table_name) {
static constexpr auto cdc_desc_suffix = "_scylla_cdc_desc";
return table_name + cdc_desc_suffix;
}
static schema_ptr create_log_schema(const schema& s, std::optional<utils::UUID> uuid) {
schema_builder b(s.ks_name(), log_name(s.cf_name()));
b.set_comment(sprint("CDC log for %s.%s", s.ks_name(), s.cf_name()));
b.with_column("stream_id", uuid_type, column_kind::partition_key);
b.with_column("time", timeuuid_type, column_kind::clustering_key);
b.with_column("batch_seq_no", int32_type, column_kind::clustering_key);
b.with_column("operation", data_type_for<operation_native_type>());
b.with_column("ttl", long_type);
auto add_columns = [&] (const schema::const_iterator_range_type& columns, bool is_data_col = false) {
for (const auto& column : columns) {
auto type = column.type;
if (is_data_col) {
type = tuple_type_impl::get_instance({ /* op */ data_type_for<column_op_native_type>(), /* value */ type, /* ttl */long_type});
}
b.with_column("_" + column.name(), type);
}
};
add_columns(s.partition_key_columns());
add_columns(s.clustering_key_columns());
add_columns(s.static_columns(), true);
add_columns(s.regular_columns(), true);
if (uuid) {
b.set_uuid(*uuid);
}
return b.build();
}
static schema_ptr create_stream_description_table_schema(const schema& s, std::optional<utils::UUID> uuid) {
schema_builder b(s.ks_name(), desc_name(s.cf_name()));
b.set_comment(sprint("CDC description for %s.%s", s.ks_name(), s.cf_name()));
b.with_column("node_ip", inet_addr_type, column_kind::partition_key);
b.with_column("shard_id", int32_type, column_kind::partition_key);
b.with_column("created_at", timestamp_type, column_kind::clustering_key);
b.with_column("stream_id", uuid_type);
if (uuid) {
b.set_uuid(*uuid);
}
return b.build();
}
// This function assumes setup_stream_description_table was called on |s| before the call to this
// function.
static future<> populate_desc(db_context ctx, const schema& s) {
auto& db = ctx._proxy.get_db().local();
auto desc_schema =
db.find_schema(s.ks_name(), desc_name(s.cf_name()));
auto log_schema =
db.find_schema(s.ks_name(), log_name(s.cf_name()));
auto belongs_to = [&](const gms::inet_address& endpoint,
const unsigned int shard_id,
const int shard_count,
const unsigned int ignore_msb_bits,
const utils::UUID& stream_id) {
const auto log_pk = partition_key::from_singular(*log_schema,
data_value(stream_id));
const auto token = ctx._partitioner.decorate_key(*log_schema, log_pk).token();
if (ctx._token_metadata.get_endpoint(ctx._token_metadata.first_token(token)) != endpoint) {
return false;
}
const auto owning_shard_id = dht::murmur3_partitioner(shard_count, ignore_msb_bits).shard_of(token);
return owning_shard_id == shard_id;
};
std::vector<mutation> mutations;
const auto ts = api::new_timestamp();
const auto ck = clustering_key::from_single_value(
*desc_schema, timestamp_type->decompose(ts));
auto cdef = desc_schema->get_column_definition(to_bytes("stream_id"));
for (const auto& dc : ctx._token_metadata.get_topology().get_datacenter_endpoints()) {
for (const auto& endpoint : dc.second) {
const auto decomposed_ip = inet_addr_type->decompose(endpoint.addr());
const unsigned int shard_count = ctx._snitch->get_shard_count(endpoint);
const unsigned int ignore_msb_bits = ctx._snitch->get_ignore_msb_bits(endpoint);
for (unsigned int shard_id = 0; shard_id < shard_count; ++shard_id) {
const auto pk = partition_key::from_exploded(
*desc_schema, { decomposed_ip, int32_type->decompose(static_cast<int>(shard_id)) });
mutations.emplace_back(desc_schema, pk);
auto stream_id = utils::make_random_uuid();
while (!belongs_to(endpoint, shard_id, shard_count, ignore_msb_bits, stream_id)) {
stream_id = utils::make_random_uuid();
}
auto value = atomic_cell::make_live(*uuid_type,
ts,
uuid_type->decompose(stream_id));
mutations.back().set_cell(ck, *cdef, std::move(value));
}
}
}
return ctx._proxy.mutate(std::move(mutations),
db::consistency_level::QUORUM,
db::no_timeout,
nullptr,
empty_service_permit());
}
db_context::builder::builder(service::storage_proxy& proxy)
: _proxy(proxy)
{}
db_context::builder& db_context::builder::with_migration_notifier(service::migration_notifier& migration_notifier) {
_migration_notifier = migration_notifier;
return *this;
}
db_context::builder& db_context::builder::with_token_metadata(locator::token_metadata& token_metadata) {
_token_metadata = token_metadata;
return *this;
}
db_context::builder& db_context::builder::with_snitch(locator::snitch_ptr& snitch) {
_snitch = snitch;
return *this;
}
db_context::builder& db_context::builder::with_partitioner(dht::i_partitioner& partitioner) {
_partitioner = partitioner;
return *this;
}
db_context db_context::builder::build() {
return db_context{
_proxy,
_migration_notifier ? _migration_notifier->get() : service::get_local_storage_service().get_migration_notifier(),
_token_metadata ? _token_metadata->get() : service::get_local_storage_service().get_token_metadata(),
_snitch ? _snitch->get() : locator::i_endpoint_snitch::get_local_snitch_ptr(),
_partitioner ? _partitioner->get() : dht::global_partitioner()
};
}
class transformer final {
public:
using streams_type = std::unordered_map<std::pair<net::inet_address, unsigned int>, utils::UUID>;
private:
db_context _ctx;
schema_ptr _schema;
schema_ptr _log_schema;
utils::UUID _time;
bytes _decomposed_time;
::shared_ptr<const transformer::streams_type> _streams;
const column_definition& _op_col;
ttl_opt _cdc_ttl_opt;
clustering_key set_pk_columns(const partition_key& pk, int batch_no, mutation& m) const {
const auto log_ck = clustering_key::from_exploded(
*m.schema(), { _decomposed_time, int32_type->decompose(batch_no) });
auto pk_value = pk.explode(*_schema);
size_t pos = 0;
for (const auto& column : _schema->partition_key_columns()) {
assert (pos < pk_value.size());
auto cdef = m.schema()->get_column_definition(to_bytes("_" + column.name()));
auto value = atomic_cell::make_live(*column.type,
_time.timestamp(),
bytes_view(pk_value[pos]),
_cdc_ttl_opt);
m.set_cell(log_ck, *cdef, std::move(value));
++pos;
}
return log_ck;
}
void set_operation(const clustering_key& ck, operation op, mutation& m) const {
m.set_cell(ck, _op_col, atomic_cell::make_live(*_op_col.type, _time.timestamp(), _op_col.type->decompose(operation_native_type(op)), _cdc_ttl_opt));
}
partition_key stream_id(const net::inet_address& ip, unsigned int shard_id) const {
auto it = _streams->find(std::make_pair(ip, shard_id));
if (it == std::end(*_streams)) {
throw std::runtime_error(format("No stream found for node {} and shard {}", ip, shard_id));
}
return partition_key::from_exploded(*_log_schema, { uuid_type->decompose(it->second) });
}
public:
transformer(db_context ctx, schema_ptr s, ::shared_ptr<const transformer::streams_type> streams)
: _ctx(ctx)
, _schema(std::move(s))
, _log_schema(ctx._proxy.get_db().local().find_schema(_schema->ks_name(), log_name(_schema->cf_name())))
, _time(utils::UUID_gen::get_time_UUID())
, _decomposed_time(timeuuid_type->decompose(_time))
, _streams(std::move(streams))
, _op_col(*_log_schema->get_column_definition(to_bytes("operation")))
{
if (_schema->cdc_options().ttl()) {
_cdc_ttl_opt = std::chrono::seconds(_schema->cdc_options().ttl());
}
}
// TODO: is pre-image data based on a query enough? We only have actual column data. Do we need
// more details like tombstones/ttl? Probably not, but keep in mind.
mutation transform(const mutation& m, const cql3::untyped_result_set* rs = nullptr) const {
auto& t = m.token();
auto&& ep = _ctx._token_metadata.get_endpoint(
_ctx._token_metadata.first_token(t));
if (!ep) {
throw std::runtime_error(format("No owner found for key {}", m.decorated_key()));
}
auto shard_id = dht::murmur3_partitioner(_ctx._snitch->get_shard_count(*ep), _ctx._snitch->get_ignore_msb_bits(*ep)).shard_of(t);
mutation res(_log_schema, stream_id(ep->addr(), shard_id));
auto& p = m.partition();
if (p.partition_tombstone()) {
// Partition deletion
auto log_ck = set_pk_columns(m.key(), 0, res);
set_operation(log_ck, operation::partition_delete, res);
} else if (!p.row_tombstones().empty()) {
// range deletion
int batch_no = 0;
for (auto& rt : p.row_tombstones()) {
auto set_bound = [&] (const clustering_key& log_ck, const clustering_key_prefix& ckp) {
auto exploded = ckp.explode(*_schema);
size_t pos = 0;
for (const auto& column : _schema->clustering_key_columns()) {
if (pos >= exploded.size()) {
break;
}
auto cdef = _log_schema->get_column_definition(to_bytes("_" + column.name()));
auto value = atomic_cell::make_live(*column.type,
_time.timestamp(),
bytes_view(exploded[pos]),
_cdc_ttl_opt);
res.set_cell(log_ck, *cdef, std::move(value));
++pos;
}
};
{
auto log_ck = set_pk_columns(m.key(), batch_no, res);
set_bound(log_ck, rt.start);
// TODO: separate inclusive/exclusive range
set_operation(log_ck, operation::range_delete_start, res);
++batch_no;
}
{
auto log_ck = set_pk_columns(m.key(), batch_no, res);
set_bound(log_ck, rt.end);
// TODO: separate inclusive/exclusive range
set_operation(log_ck, operation::range_delete_end, res);
++batch_no;
}
}
} else {
// should be update or deletion
int batch_no = 0;
for (const rows_entry& r : p.clustered_rows()) {
auto ck_value = r.key().explode(*_schema);
std::optional<clustering_key> pikey;
const cql3::untyped_result_set_row * pirow = nullptr;
if (rs) {
for (auto& utr : *rs) {
bool match = true;
for (auto& c : _schema->clustering_key_columns()) {
auto rv = utr.get_view(c.name_as_text());
auto cv = r.key().get_component(*_schema, c.component_index());
if (rv != cv) {
match = false;
break;
}
}
if (match) {
pikey = set_pk_columns(m.key(), batch_no, res);
set_operation(*pikey, operation::pre_image, res);
pirow = &utr;
++batch_no;
break;
}
}
}
auto log_ck = set_pk_columns(m.key(), batch_no, res);
size_t pos = 0;
for (const auto& column : _schema->clustering_key_columns()) {
assert (pos < ck_value.size());
auto cdef = _log_schema->get_column_definition(to_bytes("_" + column.name()));
res.set_cell(log_ck, *cdef, atomic_cell::make_live(*column.type, _time.timestamp(), bytes_view(ck_value[pos]), _cdc_ttl_opt));
if (pirow) {
assert(pirow->has(column.name_as_text()));
res.set_cell(*pikey, *cdef, atomic_cell::make_live(*column.type, _time.timestamp(), bytes_view(ck_value[pos]), _cdc_ttl_opt));
}
++pos;
}
std::vector<bytes_opt> values(3);
auto process_cells = [&](const row& r, column_kind ckind) {
r.for_each_cell([&](column_id id, const atomic_cell_or_collection& cell) {
auto& cdef = _schema->column_at(ckind, id);
auto* dst = _log_schema->get_column_definition(to_bytes("_" + cdef.name()));
// TODO: collections.
if (cdef.is_atomic()) {
column_op op;
values[1] = values[2] = std::nullopt;
auto view = cell.as_atomic_cell(cdef);
if (view.is_live()) {
op = column_op::set;
values[1] = view.value().linearize();
if (view.is_live_and_has_ttl()) {
values[2] = long_type->decompose(data_value(view.ttl().count()));
}
} else {
op = column_op::del;
}
values[0] = data_type_for<column_op_native_type>()->decompose(data_value(static_cast<column_op_native_type>(op)));
res.set_cell(log_ck, *dst, atomic_cell::make_live(*dst->type, _time.timestamp(), tuple_type_impl::build_value(values), _cdc_ttl_opt));
if (pirow && pirow->has(cdef.name_as_text())) {
values[0] = data_type_for<column_op_native_type>()->decompose(data_value(static_cast<column_op_native_type>(column_op::set)));
values[1] = pirow->get_blob(cdef.name_as_text());
values[2] = std::nullopt;
assert(std::addressof(res.partition().clustered_row(*_log_schema, *pikey)) != std::addressof(res.partition().clustered_row(*_log_schema, log_ck)));
assert(pikey->explode() != log_ck.explode());
res.set_cell(*pikey, *dst, atomic_cell::make_live(*dst->type, _time.timestamp(), tuple_type_impl::build_value(values), _cdc_ttl_opt));
}
} else {
cdc_log.warn("Non-atomic cell ignored {}.{}:{}", _schema->ks_name(), _schema->cf_name(), cdef.name_as_text());
}
});
};
process_cells(r.row().cells(), column_kind::regular_column);
process_cells(p.static_row().get(), column_kind::static_column);
set_operation(log_ck, operation::update, res);
++batch_no;
}
}
return res;
}
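The `values[3]` vector built by `process_cells` above encodes each regular cell as an (operation, value, TTL) triple before packing it into a tuple cell. A minimal stand-alone sketch of that triple (names and types hypothetical, not the real `tuple_type_impl` machinery):

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <utility>

// Hypothetical model of the three slots process_cells() fills:
//   slot 0 = the column_op, slot 1 = the value (set only),
//   slot 2 = the TTL in seconds (set only, and only if the source
//   cell is live with a TTL).
enum class column_op : int8_t { set = 0, del = 1, add = 2 };

struct log_cell {
    column_op op;
    std::optional<std::string> value; // nullopt unless op == set
    std::optional<int64_t> ttl;       // nullopt unless the cell had a TTL
};

// Mirrors the live/dead branch: a live cell becomes a 'set' with its
// value (and TTL, if any); a dead cell becomes a 'del' with both
// remaining slots empty.
inline log_cell make_log_cell(bool live, std::string value,
                              std::optional<int64_t> ttl) {
    if (!live) {
        return {column_op::del, std::nullopt, std::nullopt};
    }
    return {column_op::set, std::move(value), std::move(ttl)};
}
```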
static db::timeout_clock::time_point default_timeout() {
return db::timeout_clock::now() + 10s;
}
future<lw_shared_ptr<cql3::untyped_result_set>> pre_image_select(
service::client_state& client_state,
db::consistency_level cl,
const mutation& m)
{
auto& p = m.partition();
if (p.partition_tombstone() || !p.row_tombstones().empty() || p.clustered_rows().empty()) {
return make_ready_future<lw_shared_ptr<cql3::untyped_result_set>>();
}
dht::partition_range_vector partition_ranges{dht::partition_range(m.decorated_key())};
auto&& pc = _schema->partition_key_columns();
auto&& cc = _schema->clustering_key_columns();
std::vector<query::clustering_range> bounds;
if (cc.empty()) {
bounds.push_back(query::clustering_range::make_open_ended_both_sides());
} else {
for (const rows_entry& r : p.clustered_rows()) {
auto& ck = r.key();
bounds.push_back(query::clustering_range::make_singular(ck));
}
}
std::vector<const column_definition*> columns;
columns.reserve(_schema->all_columns().size());
std::transform(pc.begin(), pc.end(), std::back_inserter(columns), [](auto& c) { return &c; });
std::transform(cc.begin(), cc.end(), std::back_inserter(columns), [](auto& c) { return &c; });
query::column_id_vector static_columns, regular_columns;
auto sk = column_kind::static_column;
auto rk = column_kind::regular_column;
// TODO: this assumes all mutations touch the same set of columns. This might not be true, and we may need to do more horrible set operation here.
for (auto& [r, cids, kind] : { std::tie(p.static_row().get(), static_columns, sk), std::tie(p.clustered_rows().begin()->row().cells(), regular_columns, rk) }) {
r.for_each_cell([&](column_id id, const atomic_cell_or_collection&) {
auto& cdef =_schema->column_at(kind, id);
cids.emplace_back(id);
columns.emplace_back(&cdef);
});
}
auto selection = cql3::selection::selection::for_columns(_schema, std::move(columns));
auto partition_slice = query::partition_slice(std::move(bounds), std::move(static_columns), std::move(regular_columns), selection->get_query_options());
auto command = ::make_lw_shared<query::read_command>(_schema->id(), _schema->version(), partition_slice, query::max_partitions);
return _ctx._proxy.query(_schema, std::move(command), std::move(partition_ranges), cl, service::storage_proxy::coordinator_query_options(default_timeout(), empty_service_permit(), client_state)).then(
[s = _schema, partition_slice = std::move(partition_slice), selection = std::move(selection)] (service::storage_proxy::coordinator_query_result qr) -> lw_shared_ptr<cql3::untyped_result_set> {
cql3::selection::result_set_builder builder(*selection, gc_clock::now(), cql_serialization_format::latest());
query::result_view::consume(*qr.query_result, partition_slice, cql3::selection::result_set_builder::visitor(builder, *s, *selection));
auto result_set = builder.build();
if (!result_set || result_set->empty()) {
return {};
}
return make_lw_shared<cql3::untyped_result_set>(*result_set);
});
}
};
// This class is used to build a mapping from <node ip, shard id> to stream_id.
// It is used as a consumer for rows returned by the query to the CDC description table.
class streams_builder {
const schema& _schema;
transformer::streams_type _streams;
net::inet_address _node_ip = net::inet_address();
unsigned int _shard_id = 0;
api::timestamp_type _latest_row_timestamp = api::min_timestamp;
utils::UUID _latest_row_stream_id = utils::UUID();
public:
streams_builder(const schema& s) : _schema(s) {}
void accept_new_partition(const partition_key& key, uint32_t row_count) {
auto exploded = key.explode(_schema);
_node_ip = value_cast<net::inet_address>(inet_addr_type->deserialize(exploded[0]));
_shard_id = static_cast<unsigned int>(value_cast<int>(int32_type->deserialize(exploded[1])));
_latest_row_timestamp = api::min_timestamp;
_latest_row_stream_id = utils::UUID();
}
void accept_new_partition(uint32_t row_count) {
assert(false);
}
void accept_new_row(
const clustering_key& key,
const query::result_row_view& static_row,
const query::result_row_view& row) {
auto row_iterator = row.iterator();
api::timestamp_type timestamp = value_cast<db_clock::time_point>(
timestamp_type->deserialize(key.explode(_schema)[0])).time_since_epoch().count();
if (timestamp <= _latest_row_timestamp) {
return;
}
_latest_row_timestamp = timestamp;
for (auto&& cdef : _schema.regular_columns()) {
if (cdef.name_as_text() != "stream_id") {
row_iterator.skip(cdef);
continue;
}
auto val_opt = row_iterator.next_atomic_cell();
assert(val_opt);
val_opt->value().with_linearized([&] (bytes_view bv) {
_latest_row_stream_id = value_cast<utils::UUID>(uuid_type->deserialize(bv));
});
}
}
void accept_new_row(const query::result_row_view& static_row, const query::result_row_view& row) {
assert(false);
}
void accept_partition_end(const query::result_row_view& static_row) {
_streams.emplace(std::make_pair(_node_ip, _shard_id), _latest_row_stream_id);
}
transformer::streams_type build() {
return std::move(_streams);
}
};
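`streams_builder` keeps, per `(node ip, shard)` partition, the stream id of the row with the greatest timestamp seen so far. A seastar-free sketch of that accumulation, with `std::string` standing in for the IP and UUID types (names hypothetical):

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Toy accumulator: on each row, keep the stream id only if the row's
// timestamp is strictly greater than the latest one seen for that
// (ip, shard) pair -- matching the `timestamp <= _latest_row_timestamp`
// early return in streams_builder::accept_new_row.
class latest_stream_map {
    std::map<std::pair<std::string, unsigned>,
             std::pair<int64_t, std::string>> _latest;
public:
    void accept_row(const std::string& ip, unsigned shard,
                    int64_t timestamp, const std::string& stream_id) {
        auto key = std::make_pair(ip, shard);
        auto it = _latest.find(key);
        if (it == _latest.end() || timestamp > it->second.first) {
            _latest[key] = {timestamp, stream_id};
        }
    }
    std::map<std::pair<std::string, unsigned>, std::string> build() const {
        std::map<std::pair<std::string, unsigned>, std::string> out;
        for (auto& [k, v] : _latest) {
            out.emplace(k, v.second);
        }
        return out;
    }
};
```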
static future<::shared_ptr<transformer::streams_type>> get_streams(
db_context ctx,
const sstring& ks_name,
const sstring& cf_name,
lowres_clock::time_point timeout,
service::query_state& qs) {
auto s =
ctx._proxy.get_db().local().find_schema(ks_name, desc_name(cf_name));
query::read_command cmd(
s->id(),
s->version(),
partition_slice_builder(*s).with_no_static_columns().build());
return ctx._proxy.query(
s,
make_lw_shared(std::move(cmd)),
{dht::partition_range::make_open_ended_both_sides()},
db::consistency_level::QUORUM,
{timeout, qs.get_permit(), qs.get_client_state()}).then([s = std::move(s)] (auto qr) mutable {
return query::result_view::do_with(*qr.query_result,
[s = std::move(s)] (query::result_view v) {
auto slice = partition_slice_builder(*s)
.with_no_static_columns()
.build();
streams_builder builder{ *s };
v.consume(slice, builder);
return ::make_shared<transformer::streams_type>(builder.build());
});
});
}
template <typename Func>
future<std::vector<mutation>>
transform_mutations(std::vector<mutation>& muts, decltype(muts.size()) batch_size, Func&& f) {
return parallel_for_each(
boost::irange(static_cast<decltype(muts.size())>(0), muts.size(), batch_size),
std::move(f))
.then([&muts] () mutable { return std::move(muts); });
}
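`transform_mutations` iterates `boost::irange(0, muts.size(), batch_size)`, i.e. the strided batch start indices 0, batch_size, 2*batch_size, and so on. A plain sketch of that stride (`batch_starts` is a hypothetical helper name):

```cpp
#include <cstddef>
#include <vector>

// Produce the same index sequence boost::irange(0, n, step) yields:
// every step-th index below n, starting at 0.
std::vector<std::size_t> batch_starts(std::size_t n, std::size_t step) {
    std::vector<std::size_t> out;
    for (std::size_t i = 0; i < n; i += step) {
        out.push_back(i);
    }
    return out;
}
```

With `batch_size == 1`, as used in `augment_mutation_call` below, this degenerates to every index.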
} // namespace cdc
future<std::tuple<std::vector<mutation>, cdc::result_callback>>
cdc::cdc_service::impl::augment_mutation_call(lowres_clock::time_point timeout, std::vector<mutation>&& mutations) {
// we do all this because in the case of batches, we can have mixed schemas.
auto e = mutations.end();
auto i = std::find_if(mutations.begin(), e, [](const mutation& m) {
return m.schema()->cdc_options().enabled();
});
if (i == e) {
return make_ready_future<std::tuple<std::vector<mutation>, cdc::result_callback>>(std::make_tuple(std::move(mutations), result_callback{}));
}
mutations.reserve(2 * mutations.size());
return do_with(std::move(mutations), service::query_state(service::client_state::for_internal_calls(), empty_service_permit()), [this, timeout, i](std::vector<mutation>& mutations, service::query_state& qs) {
return transform_mutations(mutations, 1, [this, &mutations, timeout, &qs] (int idx) {
auto& m = mutations[idx];
auto s = m.schema();
if (!s->cdc_options().enabled()) {
return make_ready_future<>();
}
// For batches/multiple mutations this is super inefficient. Either partition the mutation set by schema
// and re-use streams, or, probably better, add a cache so this lookup is a no-op on the second mutation.
return get_streams(_ctxt, s->ks_name(), s->cf_name(), timeout, qs).then([this, s = std::move(s), &qs, &mutations, idx](::shared_ptr<transformer::streams_type> streams) mutable {
auto& m = mutations[idx]; // should not really be needed because of the reserve() above, but let's be conservative
transformer trans(_ctxt, s, streams);
if (!s->cdc_options().preimage()) {
mutations.emplace_back(trans.transform(m));
return make_ready_future<>();
}
// Note: a further improvement here would be to coalesce the pre-image selects into one
// iff a batch contains several modifications to the same table. OTOH, batches are rare(?),
// so this is premature.
auto f = trans.pre_image_select(qs.get_client_state(), db::consistency_level::LOCAL_QUORUM, m);
return f.then([trans = std::move(trans), &mutations, idx] (lw_shared_ptr<cql3::untyped_result_set> rs) mutable {
mutations.push_back(trans.transform(mutations[idx], rs.get()));
});
});
}).then([](std::vector<mutation> mutations) {
return make_ready_future<std::tuple<std::vector<mutation>, cdc::result_callback>>(std::make_tuple(std::move(mutations), result_callback{}));
});
});
}
bool cdc::cdc_service::needs_cdc_augmentation(const std::vector<mutation>& mutations) const {
return std::any_of(mutations.begin(), mutations.end(), [](const mutation& m) {
return m.schema()->cdc_options().enabled();
});
}
future<std::tuple<std::vector<mutation>, cdc::result_callback>>
cdc::cdc_service::augment_mutation_call(lowres_clock::time_point timeout, std::vector<mutation>&& mutations) {
return _impl->augment_mutation_call(timeout, std::move(mutations));
}

cdc/cdc.hh (new file)

@@ -0,0 +1,142 @@
/*
* Copyright (C) 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <functional>
#include <optional>
#include <map>
#include <string>
#include <vector>
#include <seastar/core/future.hh>
#include <seastar/core/lowres_clock.hh>
#include <seastar/core/shared_ptr.hh>
#include <seastar/core/sstring.hh>
#include "exceptions/exceptions.hh"
#include "timestamp.hh"
#include "cdc_options.hh"
class schema;
using schema_ptr = seastar::lw_shared_ptr<const schema>;
namespace locator {
class snitch_ptr;
class token_metadata;
} // namespace locator
namespace service {
class migration_notifier;
class storage_proxy;
class query_state;
} // namespace service
namespace dht {
class i_partitioner;
} // namespace dht
class mutation;
class partition_key;
namespace cdc {
class db_context;
// Callback to be invoked on mutation finish to fix
// the whole bit about post-image.
// TODO: decide on what the parameters are to be for this.
using result_callback = std::function<future<>()>;
/// \brief CDC service, responsible for schema listeners
///
/// CDC service will listen for schema changes and iff CDC is enabled/changed
/// create/modify/delete corresponding log tables etc as part of the schema change.
///
class cdc_service {
class impl;
std::unique_ptr<impl> _impl;
public:
future<> stop();
cdc_service(service::storage_proxy&);
cdc_service(db_context);
~cdc_service();
// If any of the mutations have CDC enabled, this optionally selects the pre-image and adds the
// appropriate augmenting mutations for the log entries.
// Iff post-image is enabled for any of these, a non-empty callback is also
// returned to be invoked post the mutation query.
future<std::tuple<std::vector<mutation>, result_callback>> augment_mutation_call(
lowres_clock::time_point timeout,
std::vector<mutation>&& mutations
);
bool needs_cdc_augmentation(const std::vector<mutation>&) const;
};
struct db_context final {
service::storage_proxy& _proxy;
service::migration_notifier& _migration_notifier;
locator::token_metadata& _token_metadata;
locator::snitch_ptr& _snitch;
dht::i_partitioner& _partitioner;
class builder final {
service::storage_proxy& _proxy;
std::optional<std::reference_wrapper<service::migration_notifier>> _migration_notifier;
std::optional<std::reference_wrapper<locator::token_metadata>> _token_metadata;
std::optional<std::reference_wrapper<locator::snitch_ptr>> _snitch;
std::optional<std::reference_wrapper<dht::i_partitioner>> _partitioner;
public:
builder(service::storage_proxy& proxy);
builder& with_migration_notifier(service::migration_notifier& migration_notifier);
builder& with_token_metadata(locator::token_metadata& token_metadata);
builder& with_snitch(locator::snitch_ptr& snitch);
builder& with_partitioner(dht::i_partitioner& partitioner);
db_context build();
};
};
// cdc log table operation
enum class operation : int8_t {
// note: these values will eventually be read by a third party, probably not privy to this
// enum decl, so don't change the constant values (or the datatype).
pre_image = 0, update = 1, row_delete = 2, range_delete_start = 3, range_delete_end = 4, partition_delete = 5
};
// cdc log data column operation
enum class column_op : int8_t {
// Same as "operation": do not edit the values or the underlying type unless you _really_ want to.
set = 0, del = 1, add = 2,
};
seastar::sstring log_name(const seastar::sstring& table_name);
seastar::sstring desc_name(const seastar::sstring& table_name);
} // namespace cdc

cdc/cdc_options.hh (new file)

@@ -0,0 +1,51 @@
/*
* Copyright (C) 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <map>
#include <seastar/core/sstring.hh>
#include "seastarx.hh"
namespace cdc {
class options final {
bool _enabled = false;
bool _preimage = false;
bool _postimage = false;
int _ttl = 86400; // 24h in seconds
public:
options() = default;
options(const std::map<sstring, sstring>& map);
std::map<sstring, sstring> to_map() const;
sstring to_sstring() const;
bool enabled() const { return _enabled; }
bool preimage() const { return _preimage; }
bool postimage() const { return _postimage; }
int ttl() const { return _ttl; }
bool operator==(const options& o) const;
bool operator!=(const options& o) const;
};
} // namespace cdc


@@ -68,7 +68,7 @@ public:
public:
explicit iterator(const mutation_partition& mp)
: _mp(mp)
-    , _current(position_in_partition_view(position_in_partition_view::static_row_tag_t()), mp.static_row())
+    , _current(position_in_partition_view(position_in_partition_view::static_row_tag_t()), mp.static_row().get())
{ }
iterator(const mutation_partition& mp, mutation_partition::rows_type::const_iterator it)


@@ -1,3 +0,0 @@
# Scylla Coding Style
Please see the [Seastar style document](https://github.com/scylladb/seastar/blob/master/coding-style.md).

collection_mutation.cc (new file)

@@ -0,0 +1,476 @@
/*
* Copyright (C) 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "types/collection.hh"
#include "types/user.hh"
#include "concrete_types.hh"
#include "atomic_cell_or_collection.hh"
#include "mutation_partition.hh"
#include "compaction_garbage_collector.hh"
#include "combine.hh"
#include "collection_mutation.hh"
collection_mutation::collection_mutation(const abstract_type& type, collection_mutation_view v)
: _data(imr_object_type::make(data::cell::make_collection(v.data), &type.imr_state().lsa_migrator())) {}
collection_mutation::collection_mutation(const abstract_type& type, const bytes_ostream& data)
: _data(imr_object_type::make(data::cell::make_collection(fragment_range_view(data)), &type.imr_state().lsa_migrator())) {}
static collection_mutation_view get_collection_mutation_view(const uint8_t* ptr)
{
auto f = data::cell::structure::get_member<data::cell::tags::flags>(ptr);
auto ti = data::type_info::make_collection();
data::cell::context ctx(f, ti);
auto view = data::cell::structure::get_member<data::cell::tags::cell>(ptr).as<data::cell::tags::collection>(ctx);
auto dv = data::cell::variable_value::make_view(view, f.get<data::cell::tags::external_data>());
return collection_mutation_view { dv };
}
collection_mutation::operator collection_mutation_view() const
{
return get_collection_mutation_view(_data.get());
}
collection_mutation_view atomic_cell_or_collection::as_collection_mutation() const {
return get_collection_mutation_view(_data.get());
}
bool collection_mutation_view::is_empty() const {
auto in = collection_mutation_input_stream(data);
auto has_tomb = in.read_trivial<bool>();
return !has_tomb && in.read_trivial<uint32_t>() == 0;
}
template <typename F>
GCC6_CONCEPT(requires std::is_invocable_r_v<const data::type_info&, F, collection_mutation_input_stream&>)
static bool is_any_live(const atomic_cell_value_view& data, tombstone tomb, gc_clock::time_point now, F&& read_cell_type_info) {
auto in = collection_mutation_input_stream(data);
auto has_tomb = in.read_trivial<bool>();
if (has_tomb) {
auto ts = in.read_trivial<api::timestamp_type>();
auto ttl = in.read_trivial<gc_clock::duration::rep>();
tomb.apply(tombstone{ts, gc_clock::time_point(gc_clock::duration(ttl))});
}
auto nr = in.read_trivial<uint32_t>();
for (uint32_t i = 0; i != nr; ++i) {
auto& type_info = read_cell_type_info(in);
auto vsize = in.read_trivial<uint32_t>();
auto value = atomic_cell_view::from_bytes(type_info, in.read(vsize));
if (value.is_live(tomb, now, false)) {
return true;
}
}
return false;
}
bool collection_mutation_view::is_any_live(const abstract_type& type, tombstone tomb, gc_clock::time_point now) const {
return visit(type, make_visitor(
[&] (const collection_type_impl& ctype) {
auto& type_info = ctype.value_comparator()->imr_state().type_info();
return ::is_any_live(data, tomb, now, [&type_info] (collection_mutation_input_stream& in) -> const data::type_info& {
auto key_size = in.read_trivial<uint32_t>();
in.skip(key_size);
return type_info;
});
},
[&] (const user_type_impl& utype) {
return ::is_any_live(data, tomb, now, [&utype] (collection_mutation_input_stream& in) -> const data::type_info& {
auto key_size = in.read_trivial<uint32_t>();
auto key = in.read(key_size);
return utype.type(deserialize_field_index(key))->imr_state().type_info();
});
},
[&] (const abstract_type& o) -> bool {
throw std::runtime_error(format("collection_mutation_view::is_any_live: unknown type {}", o.name()));
}
));
}
template <typename F>
GCC6_CONCEPT(requires std::is_invocable_r_v<const data::type_info&, F, collection_mutation_input_stream&>)
static api::timestamp_type last_update(const atomic_cell_value_view& data, F&& read_cell_type_info) {
auto in = collection_mutation_input_stream(data);
api::timestamp_type max = api::missing_timestamp;
auto has_tomb = in.read_trivial<bool>();
if (has_tomb) {
max = std::max(max, in.read_trivial<api::timestamp_type>());
(void)in.read_trivial<gc_clock::duration::rep>();
}
auto nr = in.read_trivial<uint32_t>();
for (uint32_t i = 0; i != nr; ++i) {
auto& type_info = read_cell_type_info(in);
auto vsize = in.read_trivial<uint32_t>();
auto value = atomic_cell_view::from_bytes(type_info, in.read(vsize));
max = std::max(value.timestamp(), max);
}
return max;
}
api::timestamp_type collection_mutation_view::last_update(const abstract_type& type) const {
return visit(type, make_visitor(
[&] (const collection_type_impl& ctype) {
auto& type_info = ctype.value_comparator()->imr_state().type_info();
return ::last_update(data, [&type_info] (collection_mutation_input_stream& in) -> const data::type_info& {
auto key_size = in.read_trivial<uint32_t>();
in.skip(key_size);
return type_info;
});
},
[&] (const user_type_impl& utype) {
return ::last_update(data, [&utype] (collection_mutation_input_stream& in) -> const data::type_info& {
auto key_size = in.read_trivial<uint32_t>();
auto key = in.read(key_size);
return utype.type(deserialize_field_index(key))->imr_state().type_info();
});
},
[&] (const abstract_type& o) -> api::timestamp_type {
throw std::runtime_error(format("collection_mutation_view::last_update: unknown type {}", o.name()));
}
));
}
std::ostream& operator<<(std::ostream& os, const collection_mutation_view::printer& cmvp) {
fmt::print(os, "{{collection_mutation_view ");
cmvp._cmv.with_deserialized(cmvp._type, [&os, &type = cmvp._type] (const collection_mutation_view_description& cmvd) {
bool first = true;
fmt::print(os, "tombstone {}", cmvd.tomb);
visit(type, make_visitor(
[&] (const collection_type_impl& ctype) {
auto&& key_type = ctype.name_comparator();
auto&& value_type = ctype.value_comparator();
for (auto&& [key, value] : cmvd.cells) {
if (!first) {
fmt::print(os, ", ");
}
fmt::print(os, "{}: {}", key_type->to_string(key), atomic_cell_view::printer(*value_type, value));
first = false;
}
},
[&] (const user_type_impl& utype) {
for (auto&& [raw_idx, value] : cmvd.cells) {
if (!first) {
fmt::print(os, ", ");
}
auto idx = deserialize_field_index(raw_idx);
fmt::print(os, "{}: {}", utype.field_name_as_string(idx), atomic_cell_view::printer(*utype.type(idx), value));
first = false;
}
},
[&] (const abstract_type& o) {
// Not throwing an exception in this likely-to-be-debug context
fmt::print(os, "attempted to pretty-print collection_mutation_view_description with type {}", o.name());
}
));
});
fmt::print(os, "}}");
return os;
}
collection_mutation_description
collection_mutation_view_description::materialize(const abstract_type& type) const {
collection_mutation_description m;
m.tomb = tomb;
m.cells.reserve(cells.size());
visit(type, make_visitor(
[&] (const collection_type_impl& ctype) {
auto& value_type = *ctype.value_comparator();
for (auto&& e : cells) {
m.cells.emplace_back(to_bytes(e.first), atomic_cell(value_type, e.second));
}
},
[&] (const user_type_impl& utype) {
for (auto&& e : cells) {
m.cells.emplace_back(to_bytes(e.first), atomic_cell(*utype.type(deserialize_field_index(e.first)), e.second));
}
},
[&] (const abstract_type& o) {
throw std::runtime_error(format("attempted to materialize collection_mutation_view_description with type {}", o.name()));
}
));
return m;
}
bool collection_mutation_description::compact_and_expire(column_id id, row_tombstone base_tomb, gc_clock::time_point query_time,
can_gc_fn& can_gc, gc_clock::time_point gc_before, compaction_garbage_collector* collector)
{
bool any_live = false;
auto t = tomb;
tombstone purged_tomb;
if (tomb <= base_tomb.regular()) {
tomb = tombstone();
} else if (tomb.deletion_time < gc_before && can_gc(tomb)) {
purged_tomb = tomb;
tomb = tombstone();
}
t.apply(base_tomb.regular());
utils::chunked_vector<std::pair<bytes, atomic_cell>> survivors;
utils::chunked_vector<std::pair<bytes, atomic_cell>> losers;
for (auto&& name_and_cell : cells) {
atomic_cell& cell = name_and_cell.second;
auto cannot_erase_cell = [&] {
return cell.deletion_time() >= gc_before || !can_gc(tombstone(cell.timestamp(), cell.deletion_time()));
};
if (cell.is_covered_by(t, false) || cell.is_covered_by(base_tomb.shadowable().tomb(), false)) {
continue;
}
if (cell.has_expired(query_time)) {
if (cannot_erase_cell()) {
survivors.emplace_back(std::make_pair(
std::move(name_and_cell.first), atomic_cell::make_dead(cell.timestamp(), cell.deletion_time())));
} else if (collector) {
losers.emplace_back(std::pair(
std::move(name_and_cell.first), atomic_cell::make_dead(cell.timestamp(), cell.deletion_time())));
}
} else if (!cell.is_live()) {
if (cannot_erase_cell()) {
survivors.emplace_back(std::move(name_and_cell));
} else if (collector) {
losers.emplace_back(std::move(name_and_cell));
}
} else {
any_live |= true;
survivors.emplace_back(std::move(name_and_cell));
}
}
if (collector) {
collector->collect(id, collection_mutation_description{purged_tomb, std::move(losers)});
}
cells = std::move(survivors);
return any_live;
}
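The per-cell triage in `compact_and_expire` above can be summarized: expired cells are rewritten as dead markers, dead cells survive only while they cannot yet be garbage-collected, and live cells always survive. A simplified sketch, where a single `can_erase` flag stands in for the real `gc_before`/`can_gc` check and `losers` stands in for what is handed to the `compaction_garbage_collector`:

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

enum class cell_state { live, expired, dead };

struct triage_result {
    std::vector<std::string> survivors; // cells kept in the collection
    std::vector<std::string> losers;    // cells handed to the collector
    bool any_live = false;
};

triage_result triage(const std::vector<std::pair<std::string, cell_state>>& cells,
                     bool can_erase) {
    triage_result r;
    for (auto& [name, st] : cells) {
        switch (st) {
        case cell_state::live:
            r.any_live = true;
            r.survivors.push_back(name);
            break;
        case cell_state::expired: // would become a dead marker
        case cell_state::dead:
            (can_erase ? r.losers : r.survivors).push_back(name);
            break;
        }
    }
    return r;
}
```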
template <typename Iterator>
static collection_mutation serialize_collection_mutation(
const abstract_type& type,
const tombstone& tomb,
boost::iterator_range<Iterator> cells) {
auto element_size = [] (size_t c, auto&& e) -> size_t {
return c + 8 + e.first.size() + e.second.serialize().size();
};
auto size = accumulate(cells, (size_t)4, element_size);
size += 1;
if (tomb) {
size += sizeof(tomb.timestamp) + sizeof(tomb.deletion_time);
}
bytes_ostream ret;
ret.reserve(size);
auto out = ret.write_begin();
*out++ = bool(tomb);
if (tomb) {
write(out, tomb.timestamp);
write(out, tomb.deletion_time.time_since_epoch().count());
}
auto writeb = [&out] (bytes_view v) {
serialize_int32(out, v.size());
out = std::copy_n(v.begin(), v.size(), out);
};
// FIXME: overflow?
serialize_int32(out, boost::distance(cells));
for (auto&& kv : cells) {
auto&& k = kv.first;
auto&& v = kv.second;
writeb(k);
writeb(v.serialize());
}
return collection_mutation(type, ret);
}
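The byte layout written above (and consumed by `deserialize_collection_mutation` further down) is: a `has_tomb` flag, an optional tombstone timestamp and deletion time, a cell count, then length-prefixed key/value pairs. A simplified round-trip sketch of the tombstone-less case, assuming host byte order (the real code goes through `bytes_ostream` and `serialize_int32`):

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Layout modeled here:
//   [u8 has_tomb = 0][u32 cell_count]
//   cell_count x ([u32 key_size][key bytes][u32 value_size][value bytes])
using cell_list = std::vector<std::pair<std::string, std::string>>;

static void put_u32(std::vector<uint8_t>& out, uint32_t v) {
    uint8_t buf[4];
    std::memcpy(buf, &v, 4);
    out.insert(out.end(), buf, buf + 4);
}

std::vector<uint8_t> encode_cells(const cell_list& cells) {
    std::vector<uint8_t> out;
    out.push_back(0); // has_tomb = false in this sketch
    put_u32(out, static_cast<uint32_t>(cells.size()));
    for (auto& [k, v] : cells) {
        put_u32(out, static_cast<uint32_t>(k.size()));
        out.insert(out.end(), k.begin(), k.end());
        put_u32(out, static_cast<uint32_t>(v.size()));
        out.insert(out.end(), v.begin(), v.end());
    }
    return out;
}

cell_list decode_cells(const std::vector<uint8_t>& in) {
    size_t pos = 1; // skip has_tomb (assumed false)
    auto get_u32 = [&] {
        uint32_t v;
        std::memcpy(&v, in.data() + pos, 4);
        pos += 4;
        return v;
    };
    auto get_bytes = [&] {
        uint32_t n = get_u32();
        std::string s(reinterpret_cast<const char*>(in.data() + pos), n);
        pos += n;
        return s;
    };
    cell_list cells;
    uint32_t nr = get_u32();
    for (uint32_t i = 0; i != nr; ++i) {
        auto k = get_bytes();
        auto v = get_bytes();
        cells.emplace_back(std::move(k), std::move(v));
    }
    return cells;
}
```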
collection_mutation collection_mutation_description::serialize(const abstract_type& type) const {
return serialize_collection_mutation(type, tomb, boost::make_iterator_range(cells.begin(), cells.end()));
}
collection_mutation collection_mutation_view_description::serialize(const abstract_type& type) const {
return serialize_collection_mutation(type, tomb, boost::make_iterator_range(cells.begin(), cells.end()));
}
template <typename C>
GCC6_CONCEPT(requires std::is_base_of_v<abstract_type, std::remove_reference_t<C>>)
static collection_mutation_view_description
merge(collection_mutation_view_description a, collection_mutation_view_description b, C&& key_type) {
using element_type = std::pair<bytes_view, atomic_cell_view>;
auto compare = [&] (const element_type& e1, const element_type& e2) {
return key_type.less(e1.first, e2.first);
};
auto merge = [] (const element_type& e1, const element_type& e2) {
// FIXME: use std::max()?
return std::make_pair(e1.first, compare_atomic_cell_for_merge(e1.second, e2.second) > 0 ? e1.second : e2.second);
};
// applied to a tombstone, returns a predicate checking whether a cell is killed by
// the tombstone
auto cell_killed = [] (const std::optional<tombstone>& t) {
return [&t] (const element_type& e) {
if (!t) {
return false;
}
// tombstone wins if timestamps equal here, unlike row tombstones
if (t->timestamp < e.second.timestamp()) {
return false;
}
return true;
// FIXME: should we consider TTLs too?
};
};
collection_mutation_view_description merged;
merged.cells.reserve(a.cells.size() + b.cells.size());
combine(a.cells.begin(), std::remove_if(a.cells.begin(), a.cells.end(), cell_killed(b.tomb)),
b.cells.begin(), std::remove_if(b.cells.begin(), b.cells.end(), cell_killed(a.tomb)),
std::back_inserter(merged.cells),
compare,
merge);
merged.tomb = std::max(a.tomb, b.tomb);
return merged;
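A toy model of the merge semantics above: cells covered by the opposite side's tombstone are dropped first (the tombstone winning timestamp ties, as the comment in `cell_killed` notes), then of two surviving same-key cells the newer one wins. Keys and timestamps are simplified to `std::string`/`int64_t`:

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <map>
#include <optional>
#include <string>

using cells_t = std::map<std::string, int64_t>; // key -> cell timestamp

cells_t merge_cells(cells_t a, std::optional<int64_t> a_tomb,
                    cells_t b, std::optional<int64_t> b_tomb) {
    // Drop cells whose timestamp is <= the opposite tombstone's
    // (tombstone wins on equal timestamps).
    auto strip = [] (cells_t& cs, std::optional<int64_t> t) {
        if (!t) return;
        for (auto it = cs.begin(); it != cs.end();) {
            it = (it->second <= *t) ? cs.erase(it) : std::next(it);
        }
    };
    strip(a, b_tomb);
    strip(b, a_tomb);
    cells_t out = a;
    for (auto& [k, ts] : b) {
        auto [it, inserted] = out.emplace(k, ts);
        if (!inserted) {
            it->second = std::max(it->second, ts); // newer cell wins
        }
    }
    return out;
}
```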
}
collection_mutation merge(const abstract_type& type, collection_mutation_view a, collection_mutation_view b) {
return a.with_deserialized(type, [&] (collection_mutation_view_description a_view) {
return b.with_deserialized(type, [&] (collection_mutation_view_description b_view) {
return visit(type, make_visitor(
[&] (const collection_type_impl& ctype) {
return merge(std::move(a_view), std::move(b_view), *ctype.name_comparator());
},
[&] (const user_type_impl& utype) {
return merge(std::move(a_view), std::move(b_view), *short_type);
},
[] (const abstract_type& o) -> collection_mutation_view_description {
throw std::runtime_error(format("collection_mutation merge: unknown type: {}", o.name()));
}
)).serialize(type);
});
});
}
template <typename C>
GCC6_CONCEPT(requires std::is_base_of_v<abstract_type, std::remove_reference_t<C>>)
static collection_mutation_view_description
difference(collection_mutation_view_description a, collection_mutation_view_description b, C&& key_type)
{
collection_mutation_view_description diff;
diff.cells.reserve(std::max(a.cells.size(), b.cells.size()));
auto it = b.cells.begin();
for (auto&& c : a.cells) {
while (it != b.cells.end() && key_type.less(it->first, c.first)) {
++it;
}
if (it == b.cells.end() || !key_type.equal(it->first, c.first)
|| compare_atomic_cell_for_merge(c.second, it->second) > 0) {
auto cell = std::make_pair(c.first, c.second);
diff.cells.emplace_back(std::move(cell));
}
}
if (a.tomb > b.tomb) {
diff.tomb = a.tomb;
}
return diff;
}
collection_mutation difference(const abstract_type& type, collection_mutation_view a, collection_mutation_view b)
{
return a.with_deserialized(type, [&] (collection_mutation_view_description a_view) {
return b.with_deserialized(type, [&] (collection_mutation_view_description b_view) {
return visit(type, make_visitor(
[&] (const collection_type_impl& ctype) {
return difference(std::move(a_view), std::move(b_view), *ctype.name_comparator());
},
[&] (const user_type_impl& utype) {
return difference(std::move(a_view), std::move(b_view), *short_type);
},
[] (const abstract_type& o) -> collection_mutation_view_description {
throw std::runtime_error(format("collection_mutation difference: unknown type: {}", o.name()));
}
)).serialize(type);
});
});
}
template <typename F>
GCC6_CONCEPT(requires std::is_invocable_r_v<std::pair<bytes_view, atomic_cell_view>, F, collection_mutation_input_stream&>)
static collection_mutation_view_description
deserialize_collection_mutation(collection_mutation_input_stream& in, F&& read_kv) {
collection_mutation_view_description ret;
auto has_tomb = in.read_trivial<bool>();
if (has_tomb) {
auto ts = in.read_trivial<api::timestamp_type>();
auto ttl = in.read_trivial<gc_clock::duration::rep>();
ret.tomb = tombstone{ts, gc_clock::time_point(gc_clock::duration(ttl))};
}
auto nr = in.read_trivial<uint32_t>();
ret.cells.reserve(nr);
for (uint32_t i = 0; i != nr; ++i) {
ret.cells.push_back(read_kv(in));
}
assert(in.empty());
return ret;
}
collection_mutation_view_description
deserialize_collection_mutation(const abstract_type& type, collection_mutation_input_stream& in) {
return visit(type, make_visitor(
[&] (const collection_type_impl& ctype) {
// value_comparator(), ugh
auto& type_info = ctype.value_comparator()->imr_state().type_info();
return deserialize_collection_mutation(in, [&type_info] (collection_mutation_input_stream& in) {
// FIXME: we could probably avoid the need for size
auto ksize = in.read_trivial<uint32_t>();
auto key = in.read(ksize);
auto vsize = in.read_trivial<uint32_t>();
auto value = atomic_cell_view::from_bytes(type_info, in.read(vsize));
return std::make_pair(key, value);
});
},
[&] (const user_type_impl& utype) {
return deserialize_collection_mutation(in, [&utype] (collection_mutation_input_stream& in) {
// FIXME: we could probably avoid the need for size
auto ksize = in.read_trivial<uint32_t>();
auto key = in.read(ksize);
auto vsize = in.read_trivial<uint32_t>();
auto value = atomic_cell_view::from_bytes(
utype.type(deserialize_field_index(key))->imr_state().type_info(), in.read(vsize));
return std::make_pair(key, value);
});
},
[&] (const abstract_type& o) -> collection_mutation_view_description {
throw std::runtime_error(format("deserialize_collection_mutation: unknown type {}", o.name()));
}
));
}

collection_mutation.hh (new file, 139 lines)
@@ -0,0 +1,139 @@
/*
* Copyright (C) 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "utils/chunked_vector.hh"
#include "schema_fwd.hh"
#include "gc_clock.hh"
#include "atomic_cell.hh"
#include "cql_serialization_format.hh"
#include "marshal_exception.hh"
#include "utils/linearizing_input_stream.hh"
#include <iosfwd>
class abstract_type;
class bytes_ostream;
class compaction_garbage_collector;
class row_tombstone;
class collection_mutation;
// An auxiliary struct used to (de)construct collection_mutations.
// Unlike collection_mutation, which is a serialized blob, this struct makes it easy to inspect
// the logical units of information (tombstone and cells) inside the mutation.
struct collection_mutation_description {
tombstone tomb;
// FIXME: use iterators?
// we never iterate over `cells` more than once, so there is no need to store them in memory.
// In some cases instead of constructing the `cells` vector, it would be more efficient to provide
// a one-time-use forward iterator which returns the cells.
utils::chunked_vector<std::pair<bytes, atomic_cell>> cells;
// Expires cells based on query_time. Expires tombstones based on max_purgeable and gc_before.
// Removes cells covered by tomb or this->tomb.
bool compact_and_expire(column_id id, row_tombstone tomb, gc_clock::time_point query_time,
can_gc_fn&, gc_clock::time_point gc_before, compaction_garbage_collector* collector = nullptr);
// Packs the data to a serialized blob.
collection_mutation serialize(const abstract_type&) const;
};
// Similar to collection_mutation_description, except that it doesn't store the cells' data, only observes it.
struct collection_mutation_view_description {
tombstone tomb;
// FIXME: use iterators? See the fixme in collection_mutation_description; the same considerations apply here.
utils::chunked_vector<std::pair<bytes_view, atomic_cell_view>> cells;
// Copies the observed data, storing it in a collection_mutation_description.
collection_mutation_description materialize(const abstract_type&) const;
// Packs the data to a serialized blob.
collection_mutation serialize(const abstract_type&) const;
};
using collection_mutation_input_stream = utils::linearizing_input_stream<atomic_cell_value_view, marshal_exception>;
// Given a linearized collection_mutation_view, returns an auxiliary struct allowing the inspection of each cell.
// The struct is an observer of the data given by the collection_mutation_view and is only valid while the
// passed in `collection_mutation_input_stream` is alive.
// The function needs to be given the type of stored data to reconstruct the structural information.
collection_mutation_view_description deserialize_collection_mutation(const abstract_type&, collection_mutation_input_stream&);
class collection_mutation_view {
public:
atomic_cell_value_view data;
// Is this a noop mutation?
bool is_empty() const;
// Is any of the stored cells live (not deleted nor expired) at the time point `tp`,
// given the later of the tombstones `t` and the one stored in the mutation (if any)?
// Requires a type to reconstruct the structural information.
bool is_any_live(const abstract_type&, tombstone t = tombstone(), gc_clock::time_point tp = gc_clock::time_point::min()) const;
// The maximum of timestamps of the mutation's cells and tombstone.
api::timestamp_type last_update(const abstract_type&) const;
// Given a function that operates on a collection_mutation_view_description,
// calls it on the corresponding description of `this`.
template <typename F>
inline decltype(auto) with_deserialized(const abstract_type& type, F f) const {
auto stream = collection_mutation_input_stream(data);
return f(deserialize_collection_mutation(type, stream));
}
class printer {
const abstract_type& _type;
const collection_mutation_view& _cmv;
public:
printer(const abstract_type& type, const collection_mutation_view& cmv)
: _type(type), _cmv(cmv) {}
friend std::ostream& operator<<(std::ostream& os, const printer& cmvp);
};
};
// A serialized mutation of a collection of cells.
// Used to represent mutations of collections (lists, maps, sets) or non-frozen user defined types.
// It contains a sequence of cells, each representing a mutation of a single entry (element or field) of the collection.
// Each cell has an associated 'key' (or 'path'). The meaning of each (key, cell) pair is:
// for sets: the key is the serialized set element, the cell contains no data (except liveness information),
// for maps: the key is the serialized map element's key, the cell contains the serialized map element's value,
// for lists: the key is a timeuuid identifying the list entry, the cell contains the serialized value,
// for user types: the key is an index identifying the field, the cell contains the value of the field.
// The mutation may also contain a collection-wide tombstone.
class collection_mutation {
public:
using imr_object_type = imr::utils::object<data::cell::structure>;
imr_object_type _data;
collection_mutation() {}
collection_mutation(const abstract_type&, collection_mutation_view);
collection_mutation(const abstract_type& type, const bytes_ostream& data);
operator collection_mutation_view() const;
};
collection_mutation merge(const abstract_type&, collection_mutation_view, collection_mutation_view);
collection_mutation difference(const abstract_type&, collection_mutation_view, collection_mutation_view);
// Serializes the given collection of cells to a sequence of bytes ready to be sent over the CQL protocol.
bytes serialize_for_cql(const abstract_type&, collection_mutation_view, cql_serialization_format);

column_computation.hh (new file, 63 lines)

@@ -0,0 +1,63 @@
/*
* Copyright (C) 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "bytes.hh"
class schema;
class partition_key;
class clustering_row;
class column_computation;
using column_computation_ptr = std::unique_ptr<column_computation>;
/*
* Column computation represents a computation performed in order to obtain a value for a computed column.
* A description of computed columns is also available in docs/system_schema_keyspace.md. They hold values
* that are not provided directly by the user, but rather computed from other column values and possibly other sources.
* This class is able to serialize/deserialize column computations and to perform the computation itself,
* based on a given schema, partition key and clustering row. It is the caller's responsibility to provide
* enough data in the clustering row for the computation to succeed. In particular,
* generating a value might involve performing a read-before-write if the computation is performed
* on more values than are present in the update request.
*/
class column_computation {
public:
virtual ~column_computation() = default;
static column_computation_ptr deserialize(bytes_view raw);
static column_computation_ptr deserialize(const Json::Value& json);
virtual column_computation_ptr clone() const = 0;
virtual bytes serialize() const = 0;
virtual bytes_opt compute_value(const schema& schema, const partition_key& key, const clustering_row& row) const = 0;
};
class token_column_computation : public column_computation {
public:
virtual column_computation_ptr clone() const override {
return std::make_unique<token_column_computation>(*this);
}
virtual bytes serialize() const override;
virtual bytes_opt compute_value(const schema& schema, const partition_key& key, const clustering_row& row) const override;
};


@@ -0,0 +1,36 @@
/*
* Copyright (C) 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "schema.hh"
#include "collection_mutation.hh"
class atomic_cell;
class row_marker;
class compaction_garbage_collector {
public:
virtual ~compaction_garbage_collector() = default;
virtual void collect(column_id id, atomic_cell) = 0;
virtual void collect(column_id id, collection_mutation_description) = 0;
virtual void collect(row_marker) = 0;
};


@@ -21,14 +21,19 @@
#pragma once
#include <seastar/core/future.hh>
#include <seastar/util/noncopyable_function.hh>
#include "schema_fwd.hh"
#include "sstables/shared_sstable.hh"
#include "exceptions/exceptions.hh"
#include "sstables/compaction_backlog_manager.hh"
class table;
using column_family = table;
class schema;
using schema_ptr = lw_shared_ptr<const schema>;
class flat_mutation_reader;
struct mutation_source_metadata;
namespace sstables {
@@ -47,6 +52,8 @@ class sstable_set;
struct compaction_descriptor;
struct resharding_descriptor;
using reader_consumer = noncopyable_function<future<> (flat_mutation_reader)>;
class compaction_strategy {
::shared_ptr<compaction_strategy_impl> _compaction_strategy_impl;
public:
@@ -75,8 +82,8 @@ public:
// Return if optimization to rule out sstables based on clustering key filter should be applied.
bool use_clustering_key_filter() const;
// Return true if compaction strategy ignores sstables coming from partial runs.
bool ignore_partial_runs() const;
// Return true if compaction strategy doesn't care if a sstable belonging to partial sstable run is compacted.
bool can_compact_partial_runs() const;
// An estimation of number of compaction for strategy to be satisfied.
int64_t estimated_pending_compactions(column_family& cf) const;
@@ -129,6 +136,10 @@ public:
sstable_set make_sstable_set(schema_ptr schema) const;
compaction_backlog_tracker& get_backlog_tracker();
uint64_t adjust_partition_estimate(const mutation_source_metadata& ms_meta, uint64_t partition_estimate);
reader_consumer make_interposer_consumer(const mutation_source_metadata& ms_meta, reader_consumer end_consumer);
};
// Creates a compaction_strategy object from one of the strategies available.

compatible_ring_position.hh (new file, 165 lines)

@@ -0,0 +1,165 @@
/*
* Copyright (C) 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "query-request.hh"
#include <optional>
#include <variant>
// Wraps ring_position_view so it is compatible with old-style C++: default
// constructor, stateless comparators, yada yada.
class compatible_ring_position_view {
const ::schema* _schema = nullptr;
// Optional to supply a default constructor, no more.
std::optional<dht::ring_position_view> _rpv;
public:
constexpr compatible_ring_position_view() = default;
compatible_ring_position_view(const schema& s, dht::ring_position_view rpv)
: _schema(&s), _rpv(rpv) {
}
const dht::ring_position_view& position() const {
return *_rpv;
}
const ::schema& schema() const {
return *_schema;
}
friend int tri_compare(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
return dht::ring_position_tri_compare(*x._schema, *x._rpv, *y._rpv);
}
friend bool operator<(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
return tri_compare(x, y) < 0;
}
friend bool operator<=(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
return tri_compare(x, y) <= 0;
}
friend bool operator>(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
return tri_compare(x, y) > 0;
}
friend bool operator>=(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
return tri_compare(x, y) >= 0;
}
friend bool operator==(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
return tri_compare(x, y) == 0;
}
friend bool operator!=(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
return tri_compare(x, y) != 0;
}
};
// Wraps ring_position so it is compatible with old-style C++: default
// constructor, stateless comparators, yada yada.
class compatible_ring_position {
schema_ptr _schema;
// Optional to supply a default constructor, no more.
std::optional<dht::ring_position> _rp;
public:
constexpr compatible_ring_position() = default;
compatible_ring_position(schema_ptr s, dht::ring_position rp)
: _schema(std::move(s)), _rp(std::move(rp)) {
}
dht::ring_position_view position() const {
return *_rp;
}
const ::schema& schema() const {
return *_schema;
}
friend int tri_compare(const compatible_ring_position& x, const compatible_ring_position& y) {
return dht::ring_position_tri_compare(*x._schema, *x._rp, *y._rp);
}
friend bool operator<(const compatible_ring_position& x, const compatible_ring_position& y) {
return tri_compare(x, y) < 0;
}
friend bool operator<=(const compatible_ring_position& x, const compatible_ring_position& y) {
return tri_compare(x, y) <= 0;
}
friend bool operator>(const compatible_ring_position& x, const compatible_ring_position& y) {
return tri_compare(x, y) > 0;
}
friend bool operator>=(const compatible_ring_position& x, const compatible_ring_position& y) {
return tri_compare(x, y) >= 0;
}
friend bool operator==(const compatible_ring_position& x, const compatible_ring_position& y) {
return tri_compare(x, y) == 0;
}
friend bool operator!=(const compatible_ring_position& x, const compatible_ring_position& y) {
return tri_compare(x, y) != 0;
}
};
// Wraps ring_position or ring_position_view so either is compatible with old-style C++: default
// constructor, stateless comparators, yada yada.
// The motivations for supporting both types are: making containers self-sufficient by not relying
// on callers to keep the ring position alive, allowing lookup in containers that don't support
// heterogeneous key types, and avoiding unnecessary copies.
class compatible_ring_position_or_view {
// Optional to supply a default constructor, no more.
std::optional<std::variant<compatible_ring_position, compatible_ring_position_view>> _crp_or_view;
public:
constexpr compatible_ring_position_or_view() = default;
explicit compatible_ring_position_or_view(schema_ptr s, dht::ring_position rp)
: _crp_or_view(compatible_ring_position(std::move(s), std::move(rp))) {
}
explicit compatible_ring_position_or_view(const schema& s, dht::ring_position_view rpv)
: _crp_or_view(compatible_ring_position_view(s, rpv)) {
}
dht::ring_position_view position() const {
struct rpv_accessor {
dht::ring_position_view operator()(const compatible_ring_position& crp) {
return crp.position();
}
dht::ring_position_view operator()(const compatible_ring_position_view& crpv) {
return crpv.position();
}
};
return std::visit(rpv_accessor{}, *_crp_or_view);
}
friend int tri_compare(const compatible_ring_position_or_view& x, const compatible_ring_position_or_view& y) {
struct schema_accessor {
const ::schema& operator()(const compatible_ring_position& crp) {
return crp.schema();
}
const ::schema& operator()(const compatible_ring_position_view& crpv) {
return crpv.schema();
}
};
return dht::ring_position_tri_compare(std::visit(schema_accessor{}, *x._crp_or_view), x.position(), y.position());
}
friend bool operator<(const compatible_ring_position_or_view& x, const compatible_ring_position_or_view& y) {
return tri_compare(x, y) < 0;
}
friend bool operator<=(const compatible_ring_position_or_view& x, const compatible_ring_position_or_view& y) {
return tri_compare(x, y) <= 0;
}
friend bool operator>(const compatible_ring_position_or_view& x, const compatible_ring_position_or_view& y) {
return tri_compare(x, y) > 0;
}
friend bool operator>=(const compatible_ring_position_or_view& x, const compatible_ring_position_or_view& y) {
return tri_compare(x, y) >= 0;
}
friend bool operator==(const compatible_ring_position_or_view& x, const compatible_ring_position_or_view& y) {
return tri_compare(x, y) == 0;
}
friend bool operator!=(const compatible_ring_position_or_view& x, const compatible_ring_position_or_view& y) {
return tri_compare(x, y) != 0;
}
};


@@ -1,64 +0,0 @@
/*
* Copyright (C) 2016 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "query-request.hh"
#include <optional>
// Wraps ring_position_view so it is compatible with old-style C++: default
// constructor, stateless comparators, yada yada.
class compatible_ring_position_view {
const schema* _schema = nullptr;
// Optional to supply a default constructor, no more.
std::optional<dht::ring_position_view> _rpv;
public:
constexpr compatible_ring_position_view() = default;
compatible_ring_position_view(const schema& s, dht::ring_position_view rpv)
: _schema(&s), _rpv(rpv) {
}
const dht::ring_position_view& position() const {
return *_rpv;
}
friend int tri_compare(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
return dht::ring_position_tri_compare(*x._schema, *x._rpv, *y._rpv);
}
friend bool operator<(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
return tri_compare(x, y) < 0;
}
friend bool operator<=(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
return tri_compare(x, y) <= 0;
}
friend bool operator>(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
return tri_compare(x, y) > 0;
}
friend bool operator>=(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
return tri_compare(x, y) >= 0;
}
friend bool operator==(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
return tri_compare(x, y) == 0;
}
friend bool operator!=(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
return tri_compare(x, y) != 0;
}
};


@@ -74,8 +74,8 @@ private:
* <len(value1)><value1><len(value2)><value2>...<len(value_n)><value_n>
*
*/
template<typename RangeOfSerializedComponents>
static void serialize_value(RangeOfSerializedComponents&& values, bytes::iterator& out) {
template<typename RangeOfSerializedComponents, typename CharOutputIterator>
static void serialize_value(RangeOfSerializedComponents&& values, CharOutputIterator& out) {
for (auto&& val : values) {
assert(val.size() <= std::numeric_limits<size_type>::max());
write<size_type>(out, size_type(val.size()));


@@ -248,15 +248,16 @@ private:
static size_t size(const data_value& val) {
return val.serialized_size();
}
template<typename Value, typename = std::enable_if_t<!std::is_same<data_value, std::decay_t<Value>>::value>>
static void write_value(Value&& val, bytes::iterator& out) {
template<typename Value, typename CharOutputIterator, typename = std::enable_if_t<!std::is_same<data_value, std::decay_t<Value>>::value>>
static void write_value(Value&& val, CharOutputIterator& out) {
out = std::copy(val.begin(), val.end(), out);
}
static void write_value(const data_value& val, bytes::iterator& out) {
template <typename CharOutputIterator>
static void write_value(const data_value& val, CharOutputIterator& out) {
val.serialize(out);
}
template<typename RangeOfSerializedComponents>
static void serialize_value(RangeOfSerializedComponents&& values, bytes::iterator& out, bool is_compound) {
template<typename RangeOfSerializedComponents, typename CharOutputIterator>
static void serialize_value(RangeOfSerializedComponents&& values, CharOutputIterator& out, bool is_compound) {
if (!is_compound) {
auto it = values.begin();
write_value(std::forward<decltype(*it)>(*it), out);

Some files were not shown because too many files have changed in this diff.